mirror of
https://github.com/KarolS/millfork.git
synced 2024-06-09 01:29:31 +00:00
Text literals in expressions, escape sequences, and more
parent
514e819ddf
commit
070ae395ee
|
@ -16,6 +16,12 @@
|
|||
|
||||
* Automatic selection of text encoding based on target platform.
|
||||
|
||||
* Text literals can now be used as expressions of type `pointer`.
|
||||
|
||||
* An extra `z` at the end of the name of the encoding means that the string is zero-terminated.
|
||||
|
||||
* **Potentially breaking change!** Curly braces in text literals are now used for escape sequences.
|
||||
|
||||
* **Potentially breaking change!** `scr` now refers to the default screencodes as defined for the platform.
|
||||
Code that uses both a custom platform definition and the `scr` encoding needs attention
|
||||
(either change `scr` to `petscr` or add `screen_encoding=petscr` in the platform definition file).
|
||||
|
|
|
@ -21,6 +21,10 @@
|
|||
|
||||
* [Types](lang/types.md)
|
||||
|
||||
* [Literals](lang/literals.md)
|
||||
|
||||
* [List of text encodings and escape sequences](lang/text.md)
|
||||
|
||||
* [Operators reference](lang/operators.md)
|
||||
|
||||
* [Functions](lang/functions.md)
|
||||
|
|
|
@ -21,6 +21,10 @@
|
|||
|
||||
* [Types](lang/types.md)
|
||||
|
||||
* [Literals](lang/literals.md)
|
||||
|
||||
* [List of text encodings and escape sequences](lang/text.md)
|
||||
|
||||
* [Operators reference](lang/operators.md)
|
||||
|
||||
* [Functions](lang/functions.md)
|
||||
|
|
|
@ -16,49 +16,41 @@ Hexadecimal: `$D323`, `0x2a2`
|
|||
|
||||
## String literals
|
||||
|
||||
String literals can be used as either array initializers or expressions of type `pointer`.
|
||||
|
||||
String literals are surrounded with double quotes and optionally followed by the name of the encoding:
|
||||
|
||||
"this is a string" ascii
|
||||
"this is also a string"
|
||||
|
||||
Characters between the quotes are interpreted literally,
|
||||
there are no ways to escape special characters or quotes.
|
||||
If there is no encoding name specified, then the `default` encoding is used.
|
||||
Two encoding names are special and refer to platform-specific encodings:
|
||||
`default` and `scr`.
|
||||
|
||||
You can also append `z` to the name of the encoding to make the string zero-terminated.
|
||||
This means that the string will have one extra byte appended, equal to 0.
|
||||
|
||||
"this is a zero-terminated string" asciiz
|
||||
"this is also a zero-terminated string"z
|
||||
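The effect of the `z` suffix can be sketched in a few lines of Scala. This is only an illustration, not the compiler's code: the object `EncodingName` and its methods `split` and `terminate` are hypothetical names, and the sketch assumes the suffix is simply stripped from the encoding name before the encoding is looked up.

```scala
object EncodingName {
  // Splits a trailing `z` off an encoding name.
  // Returns (base name, whether the string is zero-terminated).
  // A bare "z" yields an empty base name, which stands for the default encoding.
  def split(name: String): (String, Boolean) =
    (name.stripSuffix("z"), name.endsWith("z"))

  // Appends the terminating zero byte when requested.
  def terminate(bytes: List[Int], zeroTerminated: Boolean): List[Int] =
    if (zeroTerminated) bytes :+ 0 else bytes
}
```

Under these assumptions, `EncodingName.split("asciiz")` gives the base name `ascii` with the flag set, and `terminate` then adds the single extra byte described above.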
|
||||
Most characters between the quotes are interpreted literally.
|
||||
To allow characters that cannot be inserted normally,
|
||||
each encoding may define escape sequences.
|
||||
Every encoding is guaranteed to support at least
|
||||
`{n}` for new line,
|
||||
`{q}` for double quote
|
||||
and `{apos}` for single quote/apostrophe.
|
||||
|
||||
For the list of all text encodings and escape sequences, see [this page](./text.md).
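To make the brace syntax concrete, here is a minimal Scala sketch of how a literal could be split into plain characters and escape sequence names. The names `EscapeSplitter`, `Plain`, `Escape` and `tokenize` are hypothetical and exist only for this illustration; the actual encoder works on encoded bytes rather than on tokens.

```scala
object EscapeSplitter {
  sealed trait Token
  final case class Plain(c: Char) extends Token
  final case class Escape(name: String) extends Token

  // Splits a literal into plain characters and {name} escape sequences.
  // Returns Left with a message when an opening brace is never closed.
  def tokenize(s: String): Either[String, List[Token]] = {
    @scala.annotation.tailrec
    def loop(rest: List[Char], acc: List[Token]): Either[String, List[Token]] =
      rest match {
        case Nil => Right(acc.reverse)
        case '{' :: tail =>
          // Everything up to the next '}' is the escape sequence name.
          val (name, closing) = tail.span(_ != '}')
          closing match {
            case '}' :: xs => loop(xs, Escape(name.mkString) :: acc)
            case _ => Left("Unclosed escape sequence")
          }
        case c :: tail => loop(tail, Plain(c) :: acc)
      }
    loop(s.toList, Nil)
  }
}
```

With this sketch, `"hi{n}"` splits into two plain characters followed by the escape `n`, while `"{q"` is rejected because the brace is never closed.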
|
||||
|
||||
In some encodings, multiple characters are mapped to the same byte value,
|
||||
for compatibility with multiple variants.
|
||||
|
||||
Currently available encodings:
|
||||
|
||||
* `default` – default console encoding (can be omitted)
|
||||
|
||||
* `scr` – default screencodes
|
||||
(usually the same as `default`, a notable exception are the Commodore computers)
|
||||
|
||||
* `ascii` – standard ASCII
|
||||
|
||||
* `pet` or `petscii` – PETSCII (ASCII-like character set used by Commodore machines)
|
||||
|
||||
* `cbmscr` or `petscr` – Commodore screencodes
|
||||
|
||||
* `apple2` – Apple II charset ($A0–$FE)
|
||||
|
||||
* `bbc` – BBC Micro and ZX Spectrum character set
|
||||
|
||||
* `jis` or `jisx` – JIS X 0201
|
||||
|
||||
* `iso_de`, `iso_no`, `iso_se`, `iso_yu` – various variants of ISO/IEC-646
|
||||
|
||||
* `iso_dk`, `iso_fi` – aliases for `iso_no` and `iso_se` respectively
|
||||
|
||||
When programming for Commodore,
|
||||
use `pet` for strings you're printing using standard I/O routines
|
||||
and `petscr` for strings you're copying to screen memory directly.
|
||||
|
||||
If the characters in the literal cannot be encoded in the chosen encoding, an error is raised.
|
||||
However, if the command-line option `-flenient-encoding` is used,
|
||||
then literals using `default` and `scr` encodings replace unsupported characters with supported ones
|
||||
and a warning is issued.
|
||||
For example, if `-flenient-encoding` is enabled, then a literal `"£¥↑ž©ß"` is equivalent to:
|
||||
then literals using `default` and `scr` encodings replace unsupported characters with supported ones,
|
||||
skip unsupported escape sequences, and a warning is issued.
|
||||
For example, if `-flenient-encoding` is enabled, then a literal `"£¥↑ž©ß{lbrace}"` is equivalent to:
|
||||
|
||||
* `"£Y↑z(C)ss"` if the default encoding is `pet`
|
||||
|
||||
|
@ -83,10 +75,12 @@ Character literals are surrounded by single quotes and optionally followed by th
|
|||
|
||||
From the type system point of view, they are constants of type byte.
|
||||
|
||||
For the list of all text encodings and escape sequences, see [this page](./text.md).
|
||||
|
||||
If the characters in the literal cannot be encoded in the chosen encoding, an error is raised.
|
||||
However, if the command-line option `-flenient-encoding` is used,
|
||||
then literals using `default` and `scr` encodings replace unsupported characters with supported ones.
|
||||
If the replacement is one characacter long, only a warning is issued, otherwise an error is raised.
|
||||
If the replacement is one character long, only a warning is issued, otherwise an error is raised.
|
||||
|
||||
## Array initialisers
|
||||
|
||||
|
|
70
docs/lang/text.md
Normal file
|
@ -0,0 +1,70 @@
|
|||
[< back to index](../index.md)
|
||||
|
||||
# Text encodings and escape sequences
|
||||
|
||||
### Text encoding list
|
||||
|
||||
* `default` – default console encoding (can be omitted)
|
||||
|
||||
* `scr` – default screencodes
|
||||
(usually the same as `default`; the Commodore computers are a notable exception)
|
||||
|
||||
* `ascii` – standard ASCII
|
||||
|
||||
* `pet` or `petscii` – PETSCII (ASCII-like character set used by Commodore machines)
|
||||
|
||||
* `cbmscr` or `petscr` – Commodore screencodes
|
||||
|
||||
* `apple2` – Apple II charset ($A0–$FE)
|
||||
|
||||
* `bbc` – BBC Micro character set
|
||||
|
||||
* `sinclair` – ZX Spectrum character set
|
||||
|
||||
* `jis` or `jisx` – JIS X 0201
|
||||
|
||||
* `iso_de`, `iso_no`, `iso_se`, `iso_yu` – various variants of ISO/IEC-646
|
||||
|
||||
* `iso_dk`, `iso_fi` – aliases for `iso_no` and `iso_se` respectively
|
||||
|
||||
When programming for Commodore,
|
||||
use `pet` for strings you're printing using standard I/O routines
|
||||
and `petscr` for strings you're copying to screen memory directly.
|
||||
|
||||
### Escape sequences
|
||||
|
||||
##### Available everywhere
|
||||
|
||||
* `{n}` – new line
|
||||
|
||||
* `{q}` – double quote symbol
|
||||
|
||||
* `{apos}` – apostrophe/single quote
|
||||
|
||||
* `{x00}`–`{xff}` – a character of the given hexadecimal value
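The hexadecimal form can be illustrated with a short Scala sketch. The object `HexEscape` and its `parse` method are hypothetical names. The sketch checks both digits explicitly, because `Integer.parseInt` on its own would also accept a leading sign; that check is a choice made for this example.

```scala
object HexEscape {
  // Returns Some(value) for names such as "x41" or "Xff"
  // (an 'x' or 'X' followed by exactly two hexadecimal digits), None otherwise.
  def parse(name: String): Option[Int] = {
    val isHex = name.length == 3 &&
      (name(0) == 'x' || name(0) == 'X') &&
      name.tail.forall(c => Character.digit(c, 16) >= 0)
    if (isHex) Some(Integer.parseInt(name.tail, 16)) else None
  }
}
```

For instance, `HexEscape.parse("x41")` yields the value 65, while a name like `n` is not a hexadecimal escape and would be looked up among the named sequences instead.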
|
||||
|
||||
##### Available only in some encodings
|
||||
|
||||
* `{b}` – backspace
|
||||
|
||||
* `{lbrace}`, `{rbrace}` – opening and closing curly brace (only in encodings that support braces)
|
||||
|
||||
* `{up}`, `{down}`, `{left}`, `{right}` – control codes for moving the cursor
|
||||
|
||||
* `{white}`, `{black}`, `{red}`, `{green}`, `{blue}`, `{cyan}`, `{yellow}`, `{purple}` –
|
||||
control codes for changing the text color
|
||||
|
||||
* `{bgwhite}`, `{bgblack}`, `{bgred}`, `{bggreen}`, `{bgblue}`, `{bgcyan}`, `{bgyellow}`, `{bgpurple}` –
|
||||
control codes for changing the text background color
|
||||
|
||||
* `{reverse}`, `{reverseoff}` – inverted mode on/off
|
||||
|
||||
##### Escape sequence availability
|
||||
|
||||
Encoding | braces | backspace | cursor movement | text colour and reverse | background colour
|
||||
--|--|--|--|--|--
|
||||
`pet` | no | no | yes | yes | no
|
||||
`petscr` | no | no | no | no | no
|
||||
`sinclair` | yes | no | yes | yes | yes
|
||||
`ascii`, `iso_*` | yes | yes | no | no | no
|
||||
all the rest | yes | no | no | no | no
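The `pet` row of the table can be read as a lookup table with a fallback. The Scala sketch below is an illustration under stated assumptions: `PetsciiEscapes`, `table` and `encode` are hypothetical names, only a handful of sequences are listed, and the byte values are the ones the PETSCII codec in this commit uses. Lenient mode is modelled as a plain flag here, whereas in the compiler it applies only to the `default` and `scr` encodings when `-flenient-encoding` is set.

```scala
object PetsciiEscapes {
  // A subset of the PETSCII escape sequences and their control codes.
  val table: Map[String, List[Int]] = Map(
    "n" -> List(13),
    "white" -> List(5),
    "red" -> List(0x1c),
    "yellow" -> List(0x9e),
    "reverse" -> List(0x12),
    "reverseoff" -> List(0x92)
  )

  // Known sequences map to their bytes. Unknown ones are skipped in
  // lenient mode (empty result) and reported as an error otherwise.
  def encode(name: String, lenient: Boolean): Either[String, List[Int]] =
    table.get(name) match {
      case Some(bytes) => Right(bytes)
      case None if lenient => Right(Nil)
      case None => Left(s"Invalid escape sequence {$name} for encoding `pet`")
    }
}
```

Since `pet` has no curly braces, `encode("lbrace", lenient = true)` produces no bytes at all, which matches the "skip unsupported escape sequences" behaviour described for lenient encoding.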
|
|
@ -4,6 +4,8 @@
|
|||
|
||||
* [Hello world](hello_world/hello_world.mfk) (C64/C16/PET/VIC-20/Atari/Apple II/BBC Micro) – simple text output
|
||||
|
||||
* [Text encodings](c64/text_encodings.mfk) (C64/ZX Spectrum) – examples of text encoding features
|
||||
|
||||
## Commodore 64 examples
|
||||
|
||||
### Console I/O examples
|
||||
|
@ -12,8 +14,6 @@
|
|||
|
||||
* [Calculator](c64/calculator.mfk) – simple numeric input and output
|
||||
|
||||
* [Text encodings](c64/text_encodings.mfk) – examples of text encoding features
|
||||
|
||||
* [Panic](c64/panic_test.mfk) – how panic works on C64, showing the address of where it happened
|
||||
|
||||
### Graphical examples
|
||||
|
|
|
@ -1,30 +1,25 @@
|
|||
import stdio
|
||||
import c64_basic
|
||||
|
||||
array text1 = [ "enter first number:", 13, 0 ]
|
||||
array text2 = [ "enter second number:", 13, 0 ]
|
||||
array text3 = [ "the sum is:", 13, 0 ]
|
||||
array texte = [ "that wasn't a number, try again:", 13, 0 ]
|
||||
|
||||
void main() {
|
||||
word a
|
||||
word b
|
||||
|
||||
putstrz(text1)
|
||||
putstrz("enter first number:{n}"z)
|
||||
a = readword()
|
||||
while readword_err != 0 {
|
||||
putstrz(texte)
|
||||
putstrz("that wasn't a number, try again:{n}"z)
|
||||
a = readword()
|
||||
}
|
||||
|
||||
putstrz(text2)
|
||||
putstrz("enter second number:{n}"z)
|
||||
b = readword()
|
||||
while readword_err != 0 {
|
||||
putstrz(texte)
|
||||
putstrz("that wasn't a number, try again:{n}"z)
|
||||
b = readword()
|
||||
}
|
||||
|
||||
putstrz(text3)
|
||||
putstrz("the sum is:{n}"z)
|
||||
a += b
|
||||
putword(a)
|
||||
putchar(13)
|
||||
|
|
|
@ -2,7 +2,13 @@ import stdio
|
|||
|
||||
array p = [
|
||||
"this is an example", 13,
|
||||
"of multiline petscii text"
|
||||
"of {red}multiline {yellow}{reverse}",
|
||||
#if CBM
|
||||
"petscii",
|
||||
#else
|
||||
"ASCII",
|
||||
#endif
|
||||
"{reverseoff} {white}text"
|
||||
]
|
||||
|
||||
array s = [
|
||||
|
@ -15,9 +21,11 @@ array screen [1000] @$400
|
|||
void main(){
|
||||
byte i
|
||||
putstr(p, p.length)
|
||||
#if CBM_64
|
||||
for i,0,paralleluntil,s.length {
|
||||
screen[20 * 40 + i] = s[i]
|
||||
c64_color_ram[20 * 40 + i] = light_blue
|
||||
}
|
||||
#endif
|
||||
while(true){}
|
||||
}
|
|
@ -7,5 +7,7 @@ array hello_world = "hello world"
|
|||
|
||||
void main(){
|
||||
putstr(hello_world, hello_world.length)
|
||||
putchar(13)
|
||||
putstrz("hello world again"z)
|
||||
while(true){}
|
||||
}
|
|
@ -2,7 +2,7 @@
|
|||
;a single-load ZX Spectrum 48k program
|
||||
[compilation]
|
||||
arch=z80
|
||||
encoding=bbc
|
||||
encoding=sinclair
|
||||
modules=default_panic,zxspectrum,stdlib
|
||||
|
||||
[allocation]
|
||||
|
|
|
@ -241,6 +241,7 @@ object CompilationFlag extends Enumeration {
|
|||
// warning options
|
||||
ExtraComparisonWarnings,
|
||||
RorWarning,
|
||||
NonZeroTerminatedLiteralWarning,
|
||||
FatalWarnings,
|
||||
// special options for internal compiler use
|
||||
InternalCurrentlyOptimizingForMeasurement = Value
|
||||
|
|
|
@ -104,8 +104,14 @@ object Platform {
|
|||
|
||||
val codecName = cs.get(classOf[String], "encoding", "ascii")
|
||||
val srcCodecName = cs.get(classOf[String], "screen_encoding", codecName)
|
||||
val codec = TextCodec.forName(codecName, None)
|
||||
val srcCodec = TextCodec.forName(srcCodecName, None)
|
||||
val (codec, czt) = TextCodec.forName(codecName, None)
|
||||
if (czt) {
|
||||
ErrorReporting.error("Default encoding cannot be zero-terminated")
|
||||
}
|
||||
val (srcCodec, szt) = TextCodec.forName(srcCodecName, None)
|
||||
if (szt) {
|
||||
ErrorReporting.error("Default screen encoding cannot be zero-terminated")
|
||||
}
|
||||
|
||||
val as = conf.getSection("allocation")
|
||||
|
||||
|
|
|
@ -145,6 +145,7 @@ object AbstractExpressionCompiler {
|
|||
case 4 => env.get[Type]("long")
|
||||
}
|
||||
case GeneratedConstantExpression(c, t) => t
|
||||
case TextLiteralExpression(_) => env.get[Type]("pointer")
|
||||
case VariableExpression(name) =>
|
||||
env.get[TypedThing](name, expr.position).typ
|
||||
case HalfWordExpression(param, _) =>
|
||||
|
|
|
@ -154,6 +154,14 @@ abstract class AbstractStatementPreprocessor(ctx: CompilationContext, statements
|
|||
}
|
||||
}
|
||||
|
||||
def genName(characters: List[Expression]): String = {
|
||||
"textliteral$" ++ characters.flatMap{
|
||||
case LiteralExpression(n, _) =>
|
||||
f"$n%02x"
|
||||
case _ => ???
|
||||
}
|
||||
}
|
||||
|
||||
def optimizeExpr(expr: Expression, currentVarValues: VV): Expression = {
|
||||
val pos = expr.position
|
||||
expr match {
|
||||
|
@ -161,6 +169,12 @@ abstract class AbstractStatementPreprocessor(ctx: CompilationContext, statements
|
|||
expr
|
||||
case FunctionCallExpression("->", List(handle, FunctionCallExpression(method, params))) =>
|
||||
expr
|
||||
case TextLiteralExpression(characters) =>
|
||||
val name = genName(characters)
|
||||
if (ctx.env.maybeGet[Thing](name).isEmpty) {
|
||||
ctx.env.root.registerArray(ArrayDeclarationStatement(name, None, None, None, Some(LiteralContents(characters))).pos(pos), ctx.options)
|
||||
}
|
||||
VariableExpression(name).pos(pos)
|
||||
case VariableExpression(v) if currentVarValues.contains(v) =>
|
||||
val constant = currentVarValues(v)
|
||||
ErrorReporting.debug(s"Using node flow to replace $v with $constant", pos)
|
||||
|
|
|
@ -1218,6 +1218,7 @@ class Environment(val parent: Option[Environment], val prefix: String, val cpuFa
|
|||
case _:BooleanLiteralExpression => ()
|
||||
case _:LiteralExpression => ()
|
||||
case _:GeneratedConstantExpression => ()
|
||||
case _:TextLiteralExpression => ()
|
||||
case VariableExpression(name) =>
|
||||
checkName[VariableLikeThing]("Variable or constant", name, node.position)
|
||||
case IndexedExpression(name, index) =>
|
||||
|
|
|
@ -46,6 +46,13 @@ case class LiteralExpression(value: Long, requiredSize: Int) extends Expression
|
|||
override def getAllIdentifiers: Set[String] = Set.empty
|
||||
}
|
||||
|
||||
case class TextLiteralExpression(characters: List[Expression]) extends Expression {
|
||||
override def replaceVariable(variable: String, actualParam: Expression): Expression = this
|
||||
override def containsVariable(variable: String): Boolean = false
|
||||
override def isPure: Boolean = true
|
||||
override def getAllIdentifiers: Set[String] = Set.empty
|
||||
}
|
||||
|
||||
case class GeneratedConstantExpression(value: Constant, typ: Type) extends Expression {
|
||||
override def replaceVariable(variable: String, actualParam: Expression): Expression = this
|
||||
override def containsVariable(variable: String): Boolean = false
|
||||
|
|
|
@ -41,9 +41,11 @@ abstract class MfParser[T](fileId: String, input: String, currentDirectory: Stri
|
|||
newPosition
|
||||
}
|
||||
|
||||
val codec: P[(TextCodec, Boolean)] = P(position("text codec identifier") ~ identifier.?.map(_.getOrElse(""))).map {
|
||||
case (_, "" | "default") => options.platform.defaultCodec -> options.flag(CompilationFlag.LenientTextEncoding)
|
||||
case (_, "scr") => options.platform.screenCodec -> options.flag(CompilationFlag.LenientTextEncoding)
|
||||
val codec: P[((TextCodec, Boolean), Boolean)] = P(position("text codec identifier") ~ identifier.?.map(_.getOrElse(""))).map {
|
||||
case (_, "" | "default") => (options.platform.defaultCodec -> false) -> options.flag(CompilationFlag.LenientTextEncoding)
|
||||
case (_, "z" | "defaultz") => (options.platform.defaultCodec -> true) -> options.flag(CompilationFlag.LenientTextEncoding)
|
||||
case (_, "scr") => (options.platform.screenCodec -> false) -> options.flag(CompilationFlag.LenientTextEncoding)
|
||||
case (_, "scrz") => (options.platform.screenCodec -> true) -> options.flag(CompilationFlag.LenientTextEncoding)
|
||||
case (p, x) => TextCodec.forName(x, Some(p)) -> false
|
||||
}
|
||||
|
||||
|
@ -52,9 +54,12 @@ abstract class MfParser[T](fileId: String, input: String, currentDirectory: Stri
|
|||
val charAtom: P[LiteralExpression] = for {
|
||||
p <- position()
|
||||
c <- "'" ~/ CharPred(c => c >= ' ' && !invalidCharLiteralTypes(Character.getType(c))).! ~/ "'"
|
||||
(co, lenient) <- HWS ~ codec
|
||||
((co, zt), lenient) <- HWS ~ codec
|
||||
} yield {
|
||||
co.encode(options, Some(p), c.charAt(0), lenient = lenient) match {
|
||||
if (zt) {
|
||||
ErrorReporting.error("Zero-terminated encoding is not a valid encoding for a character literal", Some(p))
|
||||
}
|
||||
co.encode(options, Some(p), c.toList, lenient = lenient) match {
|
||||
case List(value) =>
|
||||
LiteralExpression(value, 1)
|
||||
case _ =>
|
||||
|
@ -71,9 +76,18 @@ abstract class MfParser[T](fileId: String, input: String, currentDirectory: Stri
|
|||
}
|
||||
}
|
||||
|
||||
val textLiteral: P[List[Expression]] = P(position() ~ doubleQuotedString ~/ HWS ~ codec).map {
|
||||
case (p, s, ((co, zt), lenient)) =>
|
||||
val characters = co.encode(options, None, s.toList, lenient = lenient).map(c => LiteralExpression(c, 1).pos(p))
|
||||
if (zt) characters :+ LiteralExpression(0,1)
|
||||
else characters
|
||||
}
|
||||
|
||||
val textLiteralAtom: P[TextLiteralExpression] = textLiteral.map(TextLiteralExpression)
|
||||
|
||||
val literalAtom: P[LiteralExpression] = charAtom | binaryAtom | hexAtom | octalAtom | quaternaryAtom | decimalAtom
|
||||
|
||||
val atom: P[Expression] = P(position() ~ (literalAtom | variableAtom)).map{case (p,a) => a.pos(p)}
|
||||
val atom: P[Expression] = P(position() ~ (literalAtom | variableAtom | textLiteralAtom)).map{case (p,a) => a.pos(p)}
|
||||
|
||||
val globalVariableDefinition: P[Seq[DeclarationStatement]] = variableDefinition(true)
|
||||
val localVariableDefinition: P[Seq[DeclarationStatement]] = variableDefinition(false)
|
||||
|
@ -150,9 +164,7 @@ abstract class MfParser[T](fileId: String, input: String, currentDirectory: Stri
|
|||
LiteralContents(slice.map(c => LiteralExpression(c & 0xff, 1)).toList)
|
||||
}
|
||||
|
||||
def arrayStringContents: P[ArrayContents] = P(position() ~ doubleQuotedString ~/ HWS ~ codec).map {
|
||||
case (p, s, (co, lenient)) => LiteralContents(s.flatMap(c => co.encode(options, None, c, lenient = lenient)).map(c => LiteralExpression(c, 1).pos(p)))
|
||||
}
|
||||
def arrayStringContents: P[ArrayContents] = textLiteral.map(LiteralContents(_))
|
||||
|
||||
def arrayLoopContents: P[ArrayContents] = for {
|
||||
identifier <- "for" ~ SWS ~/ identifier ~/ HWS ~ "," ~/ HWS ~ Pass
|
||||
|
|
|
@ -9,7 +9,11 @@ import millfork.node.Position
|
|||
/**
|
||||
* @author Karol Stasiak
|
||||
*/
|
||||
class TextCodec(val name: String, private val map: String, private val extra: Map[Char, Int], private val decompositions: Map[Char, String]) {
|
||||
class TextCodec(val name: String,
|
||||
private val map: String,
|
||||
private val extra: Map[Char, Int],
|
||||
private val decompositions: Map[Char, String],
|
||||
private val escapeSequences: Map[String, List[Int]]) {
|
||||
|
||||
private def isPrintable(c: Char) = {
|
||||
c.getType match {
|
||||
|
@ -52,31 +56,60 @@ class TextCodec(val name: String, private val map: String, private val extra: Ma
|
|||
if (s.forall(isPrintable)) f"`$s%s` ($u%s)"
|
||||
else u
|
||||
}
|
||||
|
||||
private def encodeImpl(options: CompilationOptions, position: Option[Position], c: Char, lenient: Boolean): Option[List[Int]] = {
|
||||
if (decompositions.contains(c)) {
|
||||
Some(decompositions(c).toList.flatMap(x => encodeImpl(options, position, x, lenient).getOrElse(Nil)))
|
||||
} else if (extra.contains(c)) Some(List(extra(c))) else {
|
||||
val index = map.indexOf(c)
|
||||
if (index >= 0) {
|
||||
Some(List(index))
|
||||
} else if (lenient) {
|
||||
val alternative = TextCodec.lossyAlternatives.getOrElse(c, Nil).:+("?").find(alts => alts.forall(alt => encodeImpl(options, position, alt, lenient = false).isDefined)).getOrElse("")
|
||||
ErrorReporting.warn(s"Cannot encode ${format(c)} in encoding `$name`, replaced it with ${format(alternative)}", options, position)
|
||||
Some(alternative.toList.flatMap(encodeImpl(options, position, _, lenient = false).get))
|
||||
} else {
|
||||
None
|
||||
private def encodeChar(options: CompilationOptions, position: Option[Position], c: Char, lenient: Boolean): Option[List[Int]] = {
|
||||
if (decompositions.contains(c)) {
|
||||
Some(decompositions(c).toList.flatMap(x => encodeChar(options, position, x, lenient).getOrElse(Nil)))
|
||||
} else if (extra.contains(c)) Some(List(extra(c))) else {
|
||||
val index = map.indexOf(c)
|
||||
if (index >= 0) {
|
||||
Some(List(index))
|
||||
} else if (lenient) {
|
||||
val alternative = TextCodec.lossyAlternatives.getOrElse(c, Nil).:+("?").find(alts => alts.forall(alt => encodeChar(options, position, alt, lenient = false).isDefined)).getOrElse("")
|
||||
ErrorReporting.warn(s"Cannot encode ${format(c)} in encoding `$name`, replaced it with ${format(alternative)}", options, position)
|
||||
Some(alternative.toList.flatMap(encodeChar(options, position, _, lenient = false).get))
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
def encode(options: CompilationOptions, position: Option[Position], s: List[Char], lenient: Boolean): List[Int] = s match {
|
||||
case '{' :: tail =>
|
||||
val (escSeq, closingBrace) = tail.span(_ != '}')
|
||||
closingBrace match {
|
||||
case '}' :: xs =>
|
||||
encodeEscapeSequence(options, escSeq.mkString(""), position, lenient) ++ encode(options, position, xs, lenient)
|
||||
case _ =>
|
||||
ErrorReporting.error(f"Unclosed escape sequence", position)
|
||||
Nil
|
||||
}
|
||||
case head :: tail =>
|
||||
(encodeChar(options, position, head, lenient) match {
|
||||
case Some(x) => x
|
||||
case None =>
|
||||
ErrorReporting.error(f"Invalid character ${format(head)} in string", position)
|
||||
Nil
|
||||
}) ++ encode(options, position, tail, lenient)
|
||||
case Nil => Nil
|
||||
}
|
||||
|
||||
def encode(options: CompilationOptions, position: Option[Position], c: Char, lenient: Boolean): List[Int] = {
|
||||
encodeImpl(options, position, c, lenient) match {
|
||||
case Some(x) => x
|
||||
case None =>
|
||||
ErrorReporting.error(f"Invalid character ${format(c)} in string", position)
|
||||
Nil
|
||||
private def encodeEscapeSequence(options: CompilationOptions, escSeq: String, position: Option[Position], lenient: Boolean): List[Int] = {
|
||||
if (escSeq.length == 3 && (escSeq(0) == 'X' || escSeq(0) == 'x')){
|
||||
try {
|
||||
return List(Integer.parseInt(escSeq.tail, 16))
|
||||
} catch {
|
||||
case _: NumberFormatException =>
|
||||
}
|
||||
}
|
||||
escapeSequences.getOrElse(escSeq, {
|
||||
if (lenient) {
|
||||
ErrorReporting.warn(s"Cannot encode escape sequence {$escSeq} in encoding `$name`, skipped it", options, position)
|
||||
} else {
|
||||
ErrorReporting.error(s"Invalid escape sequence {$escSeq} for encoding `$name`", position)
|
||||
}
|
||||
Nil
|
||||
})
|
||||
}
|
||||
|
||||
def decode(by: Int): Char = {
|
||||
|
@ -87,41 +120,81 @@ class TextCodec(val name: String, private val map: String, private val extra: Ma
|
|||
|
||||
object TextCodec {
|
||||
|
||||
def forName(name: String, position: Option[Position]): TextCodec = (position, name) match {
|
||||
case (_, "ascii") => TextCodec.Ascii
|
||||
case (_, "petscii") => TextCodec.Petscii
|
||||
case (_, "pet") => TextCodec.Petscii
|
||||
case (_, "cbmscr") => TextCodec.CbmScreencodes
|
||||
case (_, "petscr") => TextCodec.CbmScreencodes
|
||||
case (_, "atascii") => TextCodec.Atascii
|
||||
case (_, "atari") => TextCodec.Atascii
|
||||
case (_, "bbc") => TextCodec.Bbc
|
||||
case (_, "apple2") => TextCodec.Apple2
|
||||
case (_, "jis") => TextCodec.Jis
|
||||
case (_, "jisx") => TextCodec.Jis
|
||||
case (_, "iso_de") => TextCodec.IsoIec646De
|
||||
case (_, "iso_no") => TextCodec.IsoIec646No
|
||||
case (_, "iso_dk") => TextCodec.IsoIec646No
|
||||
case (_, "iso_se") => TextCodec.IsoIec646Se
|
||||
case (_, "iso_fi") => TextCodec.IsoIec646Se
|
||||
case (_, "iso_yu") => TextCodec.IsoIec646Yu
|
||||
case (p, x) =>
|
||||
ErrorReporting.error(s"Unknown string encoding: `$x`", p)
|
||||
TextCodec.Ascii
|
||||
def forName(name: String, position: Option[Position]): (TextCodec, Boolean) = {
|
||||
val zeroTerminated = name.endsWith("z")
|
||||
val cleanName = name.stripSuffix("z")
|
||||
val codec = (position, cleanName) match {
|
||||
case (_, "ascii") => TextCodec.Ascii
|
||||
case (_, "petscii") => TextCodec.Petscii
|
||||
case (_, "pet") => TextCodec.Petscii
|
||||
case (_, "cbmscr") => TextCodec.CbmScreencodes
|
||||
case (_, "petscr") => TextCodec.CbmScreencodes
|
||||
case (_, "atascii") => TextCodec.Atascii
|
||||
case (_, "atari") => TextCodec.Atascii
|
||||
case (_, "bbc") => TextCodec.Bbc
|
||||
case (_, "sinclair") => TextCodec.Sinclair
|
||||
case (_, "apple2") => TextCodec.Apple2
|
||||
case (_, "jis") => TextCodec.Jis
|
||||
case (_, "jisx") => TextCodec.Jis
|
||||
case (_, "iso_de") => TextCodec.IsoIec646De
|
||||
case (_, "iso_no") => TextCodec.IsoIec646No
|
||||
case (_, "iso_dk") => TextCodec.IsoIec646No
|
||||
case (_, "iso_se") => TextCodec.IsoIec646Se
|
||||
case (_, "iso_fi") => TextCodec.IsoIec646Se
|
||||
case (_, "iso_yu") => TextCodec.IsoIec646Yu
|
||||
case (p, _) =>
|
||||
ErrorReporting.error(s"Unknown string encoding: `$name`", p)
|
||||
TextCodec.Ascii
|
||||
}
|
||||
codec -> zeroTerminated
|
||||
}
|
||||
|
||||
val NotAChar = '\ufffd'
|
||||
|
||||
val Ascii = new TextCodec("ASCII", 0.until(127).map { i => if (i < 32) NotAChar else i.toChar }.mkString, Map.empty, Map.empty)
|
||||
private val DefaultOverrides: Map[Char, Int] = ('\u2400' to '\u2420').map(c => c->(c.toInt - 0x2400)).toMap + ('\u2421' -> 127)
|
||||
|
||||
val Apple2 = new TextCodec("APPLE-II", 0.until(255).map { i => if (i < 160) NotAChar else (i - 128).toChar }.mkString, Map.empty, Map.empty)
|
||||
//noinspection ScalaUnusedSymbol
|
||||
private val AsciiEscapeSequences: Map[String, List[Int]] = Map(
|
||||
"n" -> List(13),
|
||||
"t" -> List(9),
|
||||
"b" -> List(8),
|
||||
"q" -> List('\"'.toInt),
|
||||
"apos" -> List('\''.toInt),
|
||||
"lbrace" -> List('{'.toInt),
|
||||
"rbrace" -> List('}'.toInt))
|
||||
|
||||
//noinspection ScalaUnusedSymbol
|
||||
private val MinimalEscapeSequencesWithoutBraces: Map[String, List[Int]] = Map(
|
||||
"n" -> List(13),
|
||||
"apos" -> List('\''.toInt),
|
||||
"q" -> List('\"'.toInt))
|
||||
|
||||
//noinspection ScalaUnusedSymbol
|
||||
private val MinimalEscapeSequencesWithBraces: Map[String, List[Int]] = Map(
|
||||
"n" -> List(13),
|
||||
"apos" -> List('\''.toInt),
|
||||
"q" -> List('\"'.toInt),
|
||||
"lbrace" -> List('{'.toInt),
|
||||
"rbrace" -> List('}'.toInt))
|
||||
|
||||
val Ascii = new TextCodec("ASCII", 0.until(127).map { i => if (i < 32) NotAChar else i.toChar }.mkString, Map.empty, Map.empty, AsciiEscapeSequences)
|
||||
|
||||
val Apple2 = new TextCodec("APPLE-II", 0.until(255).map { i => if (i < 160) NotAChar else (i - 128).toChar }.mkString, Map.empty, Map.empty, MinimalEscapeSequencesWithBraces)
|
||||
|
||||
val IsoIec646De = new TextCodec("ISO-IEC-646-DE",
|
||||
"\ufffd" * 32 +
|
||||
" !\"#$%^'()*+,-./0123456789:;<=>?" +
|
||||
"§ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ^_" +
|
||||
"`abcdefghijklmnopqrstuvwxyzäöüß",
|
||||
Map.empty, Map.empty
|
||||
DefaultOverrides, Map.empty, AsciiEscapeSequences ++ Map(
|
||||
"UE" -> List('['.toInt),
|
||||
"OE" -> List('\\'.toInt),
|
||||
"AE" -> List(']'.toInt),
|
||||
"ue" -> List('{'.toInt),
|
||||
"oe" -> List('|'.toInt),
|
||||
"ae" -> List('}'.toInt),
|
||||
"ss" -> List('~'.toInt)
|
||||
)
|
||||
)
|
||||
|
||||
val IsoIec646Se = new TextCodec("ISO-IEC-646-SE",
|
||||
|
@ -136,7 +209,14 @@ object TextCodec {
|
|||
'Ü' -> '^'.toInt,
|
||||
'ü' -> '~'.toInt,
|
||||
'$' -> '¤'.toInt),
|
||||
Map.empty
|
||||
Map.empty, AsciiEscapeSequences ++ Map(
|
||||
"AE" -> List('['.toInt),
|
||||
"OE" -> List('\\'.toInt),
|
||||
"AA" -> List(']'.toInt),
|
||||
"ae" -> List('{'.toInt),
|
||||
"oe" -> List('|'.toInt),
|
||||
"aa" -> List('}'.toInt)
|
||||
)
|
||||
)
|
||||
|
||||
val IsoIec646No = new TextCodec("ISO-IEC-646-NO",
|
||||
|
@ -155,23 +235,31 @@ object TextCodec {
|
|||
'«' -> '"'.toInt,
|
||||
'»' -> '"'.toInt,
|
||||
'§' -> '#'.toInt),
|
||||
Map.empty
|
||||
Map.empty, AsciiEscapeSequences ++ Map(
|
||||
"AE" -> List('['.toInt),
|
||||
"OE" -> List('\\'.toInt),
|
||||
"AA" -> List(']'.toInt),
|
||||
"ae" -> List('{'.toInt),
|
||||
"oe" -> List('|'.toInt),
|
||||
"aa" -> List('}'.toInt)
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
val IsoIec646Yu = new TextCodec("ISO-IEC-646-YU",
|
||||
"\ufffd" * 32 +
|
||||
" !\"#$%^'()*+,-./0123456789:;<=>?" +
|
||||
"ŽABCDEFGHIJKLMNOPQRSTUVWXYZŠĐĆČ_" +
|
||||
"žabcdefghijklmnopqrstuvwxyzšđćč",
|
||||
Map('Ë' -> '$'.toInt, 'ë' -> '_'.toInt),
|
||||
Map.empty
|
||||
)
|
||||
Map.empty, AsciiEscapeSequences)
|
||||
|
||||
val CbmScreencodes = new TextCodec("CBM-Screen",
|
||||
"@abcdefghijklmnopqrstuvwxyz[£]↑←" +
|
||||
0x20.to(0x3f).map(_.toChar).mkString +
|
||||
"–ABCDEFGHIJKLMNOPQRSTUVWXYZ\ufffd\ufffd\ufffdπ",
|
||||
Map('^' -> 0x3E, '♥' -> 0x53, '♡' -> 0x53, '♠' -> 0x41, '♣' -> 0x58, '♢' -> 0x5A, '•' -> 0x51), Map.empty
|
||||
Map('^' -> 0x3E, '♥' -> 0x53, '♡' -> 0x53, '♠' -> 0x41, '♣' -> 0x58, '♢' -> 0x5A, '•' -> 0x51),
|
||||
Map.empty, MinimalEscapeSequencesWithoutBraces
|
||||
)
|
||||
|
||||
val Petscii = new TextCodec("PETSCII",
|
||||
|
@ -179,7 +267,25 @@ object TextCodec {
|
|||
0x20.to(0x3f).map(_.toChar).mkString +
|
||||
"@abcdefghijklmnopqrstuvwxyz[£]↑←" +
|
||||
"–ABCDEFGHIJKLMNOPQRSTUVWXYZ\ufffd\ufffd\ufffdπ",
|
||||
Map('^' -> 0x5E, '♥' -> 0x73, '♡' -> 0x73, '♠' -> 0x61, '♣' -> 0x78, '♢' -> 0x7A, '•' -> 0x71), Map.empty
|
||||
Map('^' -> 0x5E, '♥' -> 0x73, '♡' -> 0x73, '♠' -> 0x61, '♣' -> 0x78, '♢' -> 0x7A, '•' -> 0x71), Map.empty, Map(
|
||||
"n" -> List(13),
|
||||
"q" -> List('\"'.toInt),
|
||||
"apos" -> List('\''.toInt),
|
||||
"up" -> List(0x91),
|
||||
"down" -> List(0x11),
|
||||
"left" -> List(0x9d),
|
||||
"right" -> List(0x1d),
|
||||
"white" -> List(5),
|
||||
"black" -> List(0x90),
|
||||
"red" -> List(0x1c),
|
||||
"blue" -> List(0x1f),
|
||||
"green" -> List(0x1e),
|
||||
"cyan" -> List(0x9f),
|
||||
"purple" -> List(0x9c),
|
||||
"yellow" -> List(0x9e),
|
||||
"reverse" -> List(0x12),
|
||||
"reverseoff" -> List(0x92)
|
||||
)
|
||||
)
|
||||
|
||||
val Atascii = new TextCodec("ATASCII",
|
||||
|
@ -189,14 +295,49 @@ object TextCodec {
|
|||
"\ufffd" * 11 +
|
||||
0x20.to(0x5f).map(_.toChar).mkString +
|
||||
"♢abcdefghijklmnopqrstuvwxyz♠|",
|
||||
Map('♥' -> 0, '·' -> 0x14), Map.empty
|
||||
Map('♥' -> 0, '·' -> 0x14), Map.empty, MinimalEscapeSequencesWithBraces
|
||||
)
|
||||
|
||||
val Bbc = new TextCodec("BBC",
|
||||
"\ufffd" * 32 +
|
||||
0x20.to(0x5f).map(_.toChar).mkString +
|
||||
"£" + 0x61.to(0x7E).map(_.toChar).mkString + "©",
|
||||
Map('↑' -> '^'.toInt), Map.empty
|
||||
"£" + 0x61.to(0x7E).map(_.toChar).mkString + "©",
|
||||
Map('↑' -> '^'.toInt), Map.empty, MinimalEscapeSequencesWithBraces
|
||||
)
|
||||
|
||||
val Sinclair = new TextCodec("Sinclair",
|
||||
"\ufffd" * 32 +
|
||||
0x20.to(0x5f).map(_.toChar).mkString +
|
||||
"£" + 0x61.to(0x7E).map(_.toChar).mkString + "©",
|
||||
Map('↑' -> '^'.toInt), Map.empty, Map(
|
||||
"n" -> List(13),
|
||||
"q" -> List('\"'.toInt),
|
||||
"apos" -> List('\''.toInt),
|
||||
"lbrace" -> List('{'.toInt),
|
||||
"rbrace" -> List('}'.toInt),
|
||||
"up" -> List(11),
|
||||
"down" -> List(10),
|
||||
"left" -> List(8),
|
||||
"right" -> List(9),
|
||||
"white" -> List(0x10, 7),
|
||||
"black" -> List(0x10, 0),
|
||||
"red" -> List(0x10, 2),
|
||||
"blue" -> List(0x10, 1),
|
||||
"green" -> List(0x10, 4),
|
||||
"cyan" -> List(0x10, 5),
|
||||
"purple" -> List(0x10, 3),
|
||||
"yellow" -> List(0x10, 6),
|
||||
"bgwhite" -> List(0x11, 7),
|
||||
"bgblack" -> List(0x11, 0),
|
||||
"bgred" -> List(0x11, 2),
|
||||
"bgblue" -> List(0x11, 1),
|
||||
"bggreen" -> List(0x11, 4),
|
||||
"bgcyan" -> List(0x11, 5),
|
||||
"bgpurple" -> List(0x11, 3),
|
||||
"bgyellow" -> List(0x11, 6),
|
||||
"reverse" -> List(0x14, 1),
|
||||
"reverseoff" -> List(0x14, 0)
|
||||
)
|
||||
)
|
||||
|
||||
private val jisHalfwidthKatakanaOrder: String =
|
||||
|
@ -222,7 +363,7 @@ object TextCodec {
|
|||
1.to(0x3F).map(i => (i + 0xff60).toChar -> (i + 0xA1)).toMap,
|
||||
(("カキクケコサシスセソタチツテトハヒフヘホ")).zip(
|
||||
"ガギグゲゴザジズゼゾダヂヅデドバビブベボ").map { case (u, v) => v -> (u + "゛") }.toMap ++
|
||||
"ハヒフヘホ".zip("パピプペポ").map { case (h, p) => p -> (h + "゜") }.toMap
|
||||
"ハヒフヘホ".zip("パピプペポ").map { case (h, p) => p -> (h + "゜") }.toMap, MinimalEscapeSequencesWithBraces
|
||||
)
|
||||
|
||||
val lossyAlternatives: Map[Char, List[String]] = {
|
||||
|
|