mirror of https://github.com/KarolS/millfork.git synced 2024-06-09 01:29:31 +00:00

Text literals in expressions, escape sequences, and more

Karol Stasiak 2018-07-28 00:58:20 +02:00
parent 514e819ddf
commit 070ae395ee
18 changed files with 379 additions and 113 deletions

View File

@ -16,6 +16,12 @@
* Automatic selection of text encoding based on target platform.
* Text literals can now be used as expressions of type `pointer`.
* An extra `z` at the end of the encoding name means that the string is zero-terminated.
* **Potentially breaking change!** Curly braces in text literals are now used for escape sequences.
* **Potentially breaking change!** `scr` now refers to the default screencodes as defined for the platform.
Code that uses both a custom platform definition and the `scr` encoding needs attention
(either change `scr` to `petscr` or add `screen_encoding=petscr` in the platform definition file).

View File

@ -21,6 +21,10 @@
* [Types](lang/types.md)
* [Literals](lang/literals.md)
* [List of text encodings and escape sequences](lang/text.md)
* [Operators reference](lang/operators.md)
* [Functions](lang/functions.md)

View File

@ -21,6 +21,10 @@
* [Types](lang/types.md)
* [Literals](lang/literals.md)
* [List of text encodings and escape sequences](lang/text.md)
* [Operators reference](lang/operators.md)
* [Functions](lang/functions.md)

View File

@ -16,49 +16,41 @@ Hexadecimal: `$D323`, `0x2a2`
## String literals
String literals can be used as either array initializers or expressions of type `pointer`.
String literals are surrounded with double quotes and optionally followed by the name of the encoding:
"this is a string" ascii
"this is also a string"
Characters between the quotes are interpreted literally,
there are no ways to escape special characters or quotes.
If there is no encoding name specified, then the `default` encoding is used.
Two encoding names are special and refer to platform-specific encodings:
`default` and `scr`.
You can also append `z` to the name of the encoding to make the string zero-terminated.
This means that the string will have one extra byte appended, equal to 0.
"this is a zero-terminated string" asciiz
"this is also a zero-terminated string"z
Most characters between the quotes are interpreted literally.
To allow characters that cannot be inserted normally,
each encoding may define escape sequences.
Every encoding is guaranteed to support at least
`{n}` for new line,
`{q}` for double quote
and `{apos}` for single quote/apostrophe.
For the list of all text encodings and escape sequences, see [this page](./text.md).
In some encodings, multiple characters are mapped to the same byte value,
for compatibility with multiple variants.
Currently available encodings:
* `default` – default console encoding (can be omitted)
* `scr` – default screencodes
(usually the same as `default`; the Commodore computers are a notable exception)
* `ascii` – standard ASCII
* `pet` or `petscii` – PETSCII (ASCII-like character set used by Commodore machines)
* `cbmscr` or `petscr` – Commodore screencodes
* `apple2` – Apple II charset ($A0–$FE)
* `bbc` – BBC Micro and ZX Spectrum character set
* `jis` or `jisx` – JIS X 0201
* `iso_de`, `iso_no`, `iso_se`, `iso_yu` – various variants of ISO/IEC-646
* `iso_dk`, `iso_fi` – aliases for `iso_no` and `iso_se` respectively
When programming for Commodore,
use `pet` for strings you're printing using standard I/O routines
and `petscr` for strings you're copying to screen memory directly.
If the characters in the literal cannot be encoded in a particular encoding, an error is raised.
However, if the command-line option `-flenient-encoding` is used,
then literals using `default` and `scr` encodings replace unsupported characters with supported ones
and a warning is issued.
For example, if `-flenient-encoding` is enabled, then a literal `"£¥↑ž©ß"` is equivalent to:
then literals using `default` and `scr` encodings replace unsupported characters with supported ones,
skip unsupported escape sequences, and a warning is issued.
For example, if `-flenient-encoding` is enabled, then a literal `"£¥↑ž©ß{lbrace}"` is equivalent to:
* `"£Y↑z(C)ss"` if the default encoding is `pet`
@ -83,10 +75,12 @@ Character literals are surrounded by single quotes and optionally followed by th
From the type system point of view, they are constants of type byte.
For the list of all text encodings and escape sequences, see [this page](./text.md).
If the characters in the literal cannot be encoded in a particular encoding, an error is raised.
However, if the command-line option `-flenient-encoding` is used,
then literals using `default` and `scr` encodings replace unsupported characters with supported ones.
If the replacement is one characacter long, only a warning is issued, otherwise an error is raised.
If the replacement is one character long, only a warning is issued, otherwise an error is raised.
## Array initialisers

70
docs/lang/text.md Normal file
View File

@ -0,0 +1,70 @@
[< back to index](../index.md)
# Text encodings and escape sequences
### Text encoding list
* `default` – default console encoding (can be omitted)
* `scr` – default screencodes
(usually the same as `default`; the Commodore computers are a notable exception)
* `ascii` – standard ASCII
* `pet` or `petscii` – PETSCII (ASCII-like character set used by Commodore machines)
* `cbmscr` or `petscr` – Commodore screencodes
* `apple2` – Apple II charset ($A0–$FE)
* `bbc` – BBC Micro character set
* `sinclair` – ZX Spectrum character set
* `jis` or `jisx` – JIS X 0201
* `iso_de`, `iso_no`, `iso_se`, `iso_yu` – various variants of ISO/IEC-646
* `iso_dk`, `iso_fi` – aliases for `iso_no` and `iso_se` respectively
When programming for Commodore,
use `pet` for strings you're printing using standard I/O routines
and `petscr` for strings you're copying to screen memory directly.
### Escape sequences
##### Available everywhere
* `{n}` – new line
* `{q}` – double quote symbol
* `{apos}` – apostrophe/single quote
* `{x00}`–`{xff}` – a character of the given hexadecimal value
##### Available only in some encodings
* `{b}` – backspace
* `{lbrace}`, `{rbrace}` – opening and closing curly brace (only in encodings that support braces)
* `{up}`, `{down}`, `{left}`, `{right}` – control codes for moving the cursor
* `{white}`, `{black}`, `{red}`, `{green}`, `{blue}`, `{cyan}`, `{yellow}`, `{purple}` –
control codes for changing the text color
* `{bgwhite}`, `{bgblack}`, `{bgred}`, `{bggreen}`, `{bgblue}`, `{bgcyan}`, `{bgyellow}`, `{bgpurple}` –
control codes for changing the text background color
* `{reverse}`, `{reverseoff}` – inverted mode on/off
##### Escape sequence availability
Encoding | braces | backspace | cursor movement | text colour and reverse | background colour
--|--|--|--|--|--
`pet` | no | no | yes | yes | no
`petscr` | no | no | no | no | no
`sinclair` | yes | no | yes | yes | yes
`ascii`, `iso_*` | yes | yes | no | no | no
all the rest | yes | no | no | no | no
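
The brace-delimited escape sequences listed above can be illustrated with a small standalone sketch. It is not the compiler's `TextCodec`: the names `EscapeSketch`, `encode`, `escape` and `plain` are hypothetical, errors are returned as `Either` values instead of going through `ErrorReporting`, and plain characters are mapped with a placeholder function rather than a real PETSCII table. Only the escape values (`{n}` = 13, `{red}` = 0x1c, and so on) are taken from the PETSCII table in this commit.

```scala
object EscapeSketch {
  // Subset of the PETSCII escape table introduced by this commit.
  val petsciiEscapes: Map[String, List[Int]] = Map(
    "n" -> List(13),
    "q" -> List('"'.toInt),
    "red" -> List(0x1c),
    "reverse" -> List(0x12),
    "reverseoff" -> List(0x92)
  )

  // Hypothetical helper: resolves one escape name (the text between the braces).
  // {x00}..{xff} is accepted everywhere; other names must be in the table.
  def escape(name: String, escapes: Map[String, List[Int]]): Either[String, List[Int]] = {
    val isHex = name.length == 3 &&
      (name(0) == 'x' || name(0) == 'X') &&
      name.tail.forall(c => Character.digit(c, 16) >= 0)
    if (isHex) Right(List(Integer.parseInt(name.tail, 16)))
    else escapes.get(name).toRight(s"Invalid escape sequence {$name}")
  }

  // Hypothetical helper: walks the literal, treating `{...}` as an escape
  // sequence and passing every other character through `plain`.
  def encode(s: List[Char],
             escapes: Map[String, List[Int]],
             plain: Char => Int): Either[String, List[Int]] = s match {
    case '{' :: tail =>
      val (name, rest) = tail.span(_ != '}')
      rest match {
        case '}' :: xs =>
          for {
            head <- escape(name.mkString, escapes)
            more <- encode(xs, escapes, plain)
          } yield head ++ more
        case _ => Left("Unclosed escape sequence")
      }
    case c :: tail => encode(tail, escapes, plain).map(plain(c) :: _)
    case Nil => Right(Nil)
  }
}
```

With the placeholder mapping `_.toInt`, `EscapeSketch.encode("{red}hi{n}".toList, EscapeSketch.petsciiEscapes, _.toInt)` yields `Right(List(28, 104, 105, 13))`, while `{lbrace}` is rejected because the `pet` row of the table has no brace support.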

View File

@ -4,6 +4,8 @@
* [Hello world](hello_world/hello_world.mfk) (C64/C16/PET/VIC-20/Atari/Apple II/BBC Micro) – simple text output
* [Text encodings](c64/text_encodings.mfk) (C64/ZX Spectrum) – examples of text encoding features
## Commodore 64 examples
### Console I/O examples
@ -12,8 +14,6 @@
* [Calculator](c64/calculator.mfk) – simple numeric input and output
* [Text encodings](c64/text_encodings.mfk) – examples of text encoding features
* [Panic](c64/panic_test.mfk) – how panic works on C64, showing the address of where it happened
### Graphical examples

View File

@ -1,30 +1,25 @@
import stdio
import c64_basic
array text1 = [ "enter first number:", 13, 0 ]
array text2 = [ "enter second number:", 13, 0 ]
array text3 = [ "the sum is:", 13, 0 ]
array texte = [ "that wasn't a number, try again:", 13, 0 ]
void main() {
word a
word b
putstrz(text1)
putstrz("enter first number:{n}"z)
a = readword()
while readword_err != 0 {
putstrz(texte)
putstrz("that wasn't a number, try again:{n}"z)
a = readword()
}
putstrz(text2)
putstrz("enter second number:{n}"z)
b = readword()
while readword_err != 0 {
putstrz(texte)
putstrz("that wasn't a number, try again:{n}"z)
b = readword()
}
putstrz(text3)
putstrz("the sum is:{n}"z)
a += b
putword(a)
putchar(13)

View File

@ -2,7 +2,13 @@ import stdio
array p = [
"this is an example", 13,
"of multiline petscii text"
"of {red}multiline {yellow}{reverse}",
#if CBM
"petscii",
#else
"ASCII",
#endif
"{reverseoff} {white}text"
]
array s = [
@ -15,9 +21,11 @@ array screen [1000] @$400
void main(){
byte i
putstr(p, p.length)
#if CBM_64
for i,0,paralleluntil,s.length {
screen[20 * 40 + i] = s[i]
c64_color_ram[20 * 40 + i] = light_blue
}
#endif
while(true){}
}

View File

@ -7,5 +7,7 @@ array hello_world = "hello world"
void main(){
putstr(hello_world, hello_world.length)
putchar(13)
putstrz("hello world again"z)
while(true){}
}

View File

@ -2,7 +2,7 @@
;a single-load ZX Spectrum 48k program
[compilation]
arch=z80
encoding=bbc
encoding=sinclair
modules=default_panic,zxspectrum,stdlib
[allocation]

View File

@ -241,6 +241,7 @@ object CompilationFlag extends Enumeration {
// warning options
ExtraComparisonWarnings,
RorWarning,
NonZeroTerminatedLiteralWarning,
FatalWarnings,
// special options for internal compiler use
InternalCurrentlyOptimizingForMeasurement = Value

View File

@ -104,8 +104,14 @@ object Platform {
val codecName = cs.get(classOf[String], "encoding", "ascii")
val srcCodecName = cs.get(classOf[String], "screen_encoding", codecName)
val codec = TextCodec.forName(codecName, None)
val srcCodec = TextCodec.forName(srcCodecName, None)
val (codec, czt) = TextCodec.forName(codecName, None)
if (czt) {
ErrorReporting.error("Default encoding cannot be zero-terminated")
}
val (srcCodec, szt) = TextCodec.forName(srcCodecName, None)
if (szt) {
ErrorReporting.error("Default screen encoding cannot be zero-terminated")
}
val as = conf.getSection("allocation")

View File

@ -145,6 +145,7 @@ object AbstractExpressionCompiler {
case 4 => env.get[Type]("long")
}
case GeneratedConstantExpression(c, t) => t
case TextLiteralExpression(_) => env.get[Type]("pointer")
case VariableExpression(name) =>
env.get[TypedThing](name, expr.position).typ
case HalfWordExpression(param, _) =>

View File

@ -154,6 +154,14 @@ abstract class AbstractStatementPreprocessor(ctx: CompilationContext, statements
}
}
def genName(characters: List[Expression]): String = {
"textliteral$" ++ characters.flatMap{
case LiteralExpression(n, _) =>
f"$n%02x"
case _ => ???
}
}
def optimizeExpr(expr: Expression, currentVarValues: VV): Expression = {
val pos = expr.position
expr match {
@ -161,6 +169,12 @@ abstract class AbstractStatementPreprocessor(ctx: CompilationContext, statements
expr
case FunctionCallExpression("->", List(handle, FunctionCallExpression(method, params))) =>
expr
case TextLiteralExpression(characters) =>
val name = genName(characters)
if (ctx.env.maybeGet[Thing](name).isEmpty) {
ctx.env.root.registerArray(ArrayDeclarationStatement(name, None, None, None, Some(LiteralContents(characters))).pos(pos), ctx.options)
}
VariableExpression(name).pos(pos)
case VariableExpression(v) if currentVarValues.contains(v) =>
val constant = currentVarValues(v)
ErrorReporting.debug(s"Using node flow to replace $v with $constant", pos)
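
The effect of `genName` together with the `maybeGet` check in the `TextLiteralExpression` case can be sketched in isolation. The name of the backing array depends only on the encoded bytes, so two identical literals resolve to the same array and it is registered once. The sketch below is a simplification under stated assumptions: `LiteralNameSketch`, `arrays` and `intern` are hypothetical names, and a mutable map stands in for the compiler's `Environment`.

```scala
import scala.collection.mutable

object LiteralNameSketch {
  // Stand-in for genName: operates on raw byte values instead of
  // LiteralExpression nodes, formatting each one as two hex digits.
  def genName(bytes: List[Int]): String =
    "textliteral$" + bytes.map(b => f"$b%02x").mkString

  // Hypothetical stand-in for the environment's array registry.
  val arrays: mutable.LinkedHashMap[String, List[Int]] = mutable.LinkedHashMap.empty

  // Registers the array only if no array with that name exists yet,
  // then returns the name that the expression is rewritten to.
  def intern(bytes: List[Int]): String = {
    val name = genName(bytes)
    if (!arrays.contains(name)) arrays(name) = bytes
    name
  }
}
```

For example, `LiteralNameSketch.intern(List(104, 105, 0))` returns `textliteral$686900`, and a second call with the same bytes returns the same name without adding another entry to `arrays`.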

View File

@ -1218,6 +1218,7 @@ class Environment(val parent: Option[Environment], val prefix: String, val cpuFa
case _:BooleanLiteralExpression => ()
case _:LiteralExpression => ()
case _:GeneratedConstantExpression => ()
case _:TextLiteralExpression => ()
case VariableExpression(name) =>
checkName[VariableLikeThing]("Variable or constant", name, node.position)
case IndexedExpression(name, index) =>

View File

@ -46,6 +46,13 @@ case class LiteralExpression(value: Long, requiredSize: Int) extends Expression
override def getAllIdentifiers: Set[String] = Set.empty
}
case class TextLiteralExpression(characters: List[Expression]) extends Expression {
override def replaceVariable(variable: String, actualParam: Expression): Expression = this
override def containsVariable(variable: String): Boolean = false
override def isPure: Boolean = true
override def getAllIdentifiers: Set[String] = Set.empty
}
case class GeneratedConstantExpression(value: Constant, typ: Type) extends Expression {
override def replaceVariable(variable: String, actualParam: Expression): Expression = this
override def containsVariable(variable: String): Boolean = false

View File

@ -41,9 +41,11 @@ abstract class MfParser[T](fileId: String, input: String, currentDirectory: Stri
newPosition
}
val codec: P[(TextCodec, Boolean)] = P(position("text codec identifier") ~ identifier.?.map(_.getOrElse(""))).map {
case (_, "" | "default") => options.platform.defaultCodec -> options.flag(CompilationFlag.LenientTextEncoding)
case (_, "scr") => options.platform.screenCodec -> options.flag(CompilationFlag.LenientTextEncoding)
val codec: P[((TextCodec, Boolean), Boolean)] = P(position("text codec identifier") ~ identifier.?.map(_.getOrElse(""))).map {
case (_, "" | "default") => (options.platform.defaultCodec -> false) -> options.flag(CompilationFlag.LenientTextEncoding)
case (_, "z" | "defaultz") => (options.platform.defaultCodec -> true) -> options.flag(CompilationFlag.LenientTextEncoding)
case (_, "scr") => (options.platform.screenCodec -> false) -> options.flag(CompilationFlag.LenientTextEncoding)
case (_, "scrz") => (options.platform.screenCodec -> true) -> options.flag(CompilationFlag.LenientTextEncoding)
case (p, x) => TextCodec.forName(x, Some(p)) -> false
}
@ -52,9 +54,12 @@ abstract class MfParser[T](fileId: String, input: String, currentDirectory: Stri
val charAtom: P[LiteralExpression] = for {
p <- position()
c <- "'" ~/ CharPred(c => c >= ' ' && !invalidCharLiteralTypes(Character.getType(c))).! ~/ "'"
(co, lenient) <- HWS ~ codec
((co, zt), lenient) <- HWS ~ codec
} yield {
co.encode(options, Some(p), c.charAt(0), lenient = lenient) match {
if (zt) {
ErrorReporting.error("Zero-terminated encoding is not a valid encoding for a character literal", Some(p))
}
co.encode(options, Some(p), c.toList, lenient = lenient) match {
case List(value) =>
LiteralExpression(value, 1)
case _ =>
@ -71,9 +76,18 @@ abstract class MfParser[T](fileId: String, input: String, currentDirectory: Stri
}
}
val textLiteral: P[List[Expression]] = P(position() ~ doubleQuotedString ~/ HWS ~ codec).map {
case (p, s, ((co, zt), lenient)) =>
val characters = co.encode(options, None, s.toList, lenient = lenient).map(c => LiteralExpression(c, 1).pos(p))
if (zt) characters :+ LiteralExpression(0,1)
else characters
}
val textLiteralAtom: P[TextLiteralExpression] = textLiteral.map(TextLiteralExpression)
val literalAtom: P[LiteralExpression] = charAtom | binaryAtom | hexAtom | octalAtom | quaternaryAtom | decimalAtom
val atom: P[Expression] = P(position() ~ (literalAtom | variableAtom)).map{case (p,a) => a.pos(p)}
val atom: P[Expression] = P(position() ~ (literalAtom | variableAtom | textLiteralAtom)).map{case (p,a) => a.pos(p)}
val globalVariableDefinition: P[Seq[DeclarationStatement]] = variableDefinition(true)
val localVariableDefinition: P[Seq[DeclarationStatement]] = variableDefinition(false)
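
The zero-termination handling in this hunk comes down to two steps: a trailing `z` is split off the encoding name, and a 0 byte is appended to the encoded characters when it was present. A minimal sketch of both steps, assuming the hypothetical names `ZeroTerminationSketch`, `splitEncodingName` and `terminate` (none of which exist in the compiler):

```scala
object ZeroTerminationSketch {
  // Splits an encoding name into its base name and a zero-termination flag,
  // in the same way as the endsWith/stripSuffix pair in TextCodec.forName.
  def splitEncodingName(name: String): (String, Boolean) =
    (name.stripSuffix("z"), name.endsWith("z"))

  // Appends the terminating 0 byte only when the flag is set.
  def terminate(bytes: List[Int], zeroTerminated: Boolean): List[Int] =
    if (zeroTerminated) bytes :+ 0 else bytes
}
```

Under these assumptions, `splitEncodingName("asciiz")` gives `("ascii", true)` and a bare `"z"` gives `("", true)`; the empty base name corresponds to the case the parser maps to the platform's default encoding.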
@ -150,9 +164,7 @@ abstract class MfParser[T](fileId: String, input: String, currentDirectory: Stri
LiteralContents(slice.map(c => LiteralExpression(c & 0xff, 1)).toList)
}
def arrayStringContents: P[ArrayContents] = P(position() ~ doubleQuotedString ~/ HWS ~ codec).map {
case (p, s, (co, lenient)) => LiteralContents(s.flatMap(c => co.encode(options, None, c, lenient = lenient)).map(c => LiteralExpression(c, 1).pos(p)))
}
def arrayStringContents: P[ArrayContents] = textLiteral.map(LiteralContents(_))
def arrayLoopContents: P[ArrayContents] = for {
identifier <- "for" ~ SWS ~/ identifier ~/ HWS ~ "," ~/ HWS ~ Pass

View File

@ -9,7 +9,11 @@ import millfork.node.Position
/**
* @author Karol Stasiak
*/
class TextCodec(val name: String, private val map: String, private val extra: Map[Char, Int], private val decompositions: Map[Char, String]) {
class TextCodec(val name: String,
private val map: String,
private val extra: Map[Char, Int],
private val decompositions: Map[Char, String],
private val escapeSequences: Map[String, List[Int]]) {
private def isPrintable(c: Char) = {
c.getType match {
@ -52,31 +56,60 @@ class TextCodec(val name: String, private val map: String, private val extra: Ma
if (s.forall(isPrintable)) f"`$s%s` ($u%s)"
else u
}
private def encodeImpl(options: CompilationOptions, position: Option[Position], c: Char, lenient: Boolean): Option[List[Int]] = {
if (decompositions.contains(c)) {
Some(decompositions(c).toList.flatMap(x => encodeImpl(options, position, x, lenient).getOrElse(Nil)))
} else if (extra.contains(c)) Some(List(extra(c))) else {
val index = map.indexOf(c)
if (index >= 0) {
Some(List(index))
} else if (lenient) {
val alternative = TextCodec.lossyAlternatives.getOrElse(c, Nil).:+("?").find(alts => alts.forall(alt => encodeImpl(options, position, alt, lenient = false).isDefined)).getOrElse("")
ErrorReporting.warn(s"Cannot encode ${format(c)} in encoding `$name`, replaced it with ${format(alternative)}", options, position)
Some(alternative.toList.flatMap(encodeImpl(options, position, _, lenient = false).get))
} else {
None
private def encodeChar(options: CompilationOptions, position: Option[Position], c: Char, lenient: Boolean): Option[List[Int]] = {
if (decompositions.contains(c)) {
Some(decompositions(c).toList.flatMap(x => encodeChar(options, position, x, lenient).getOrElse(Nil)))
} else if (extra.contains(c)) Some(List(extra(c))) else {
val index = map.indexOf(c)
if (index >= 0) {
Some(List(index))
} else if (lenient) {
val alternative = TextCodec.lossyAlternatives.getOrElse(c, Nil).:+("?").find(alts => alts.forall(alt => encodeChar(options, position, alt, lenient = false).isDefined)).getOrElse("")
ErrorReporting.warn(s"Cannot encode ${format(c)} in encoding `$name`, replaced it with ${format(alternative)}", options, position)
Some(alternative.toList.flatMap(encodeChar(options, position, _, lenient = false).get))
} else {
None
}
}
}
def encode(options: CompilationOptions, position: Option[Position], s: List[Char], lenient: Boolean): List[Int] = s match {
case '{' :: tail =>
val (escSeq, closingBrace) = tail.span(_ != '}')
closingBrace match {
case '}' :: xs =>
encodeEscapeSequence(options, escSeq.mkString(""), position, lenient) ++ encode(options, position, xs, lenient)
case _ =>
ErrorReporting.error(f"Unclosed escape sequence", position)
Nil
}
case head :: tail =>
(encodeChar(options, position, head, lenient) match {
case Some(x) => x
case None =>
ErrorReporting.error(f"Invalid character ${format(head)} in string", position)
Nil
}) ++ encode(options, position, tail, lenient)
case Nil => Nil
}
def encode(options: CompilationOptions, position: Option[Position], c: Char, lenient: Boolean): List[Int] = {
encodeImpl(options, position, c, lenient) match {
case Some(x) => x
case None =>
ErrorReporting.error(f"Invalid character ${format(c)} in string", position)
Nil
private def encodeEscapeSequence(options: CompilationOptions, escSeq: String, position: Option[Position], lenient: Boolean): List[Int] = {
if (escSeq.length == 3 && (escSeq(0) == 'X' || escSeq(0) == 'x')){
try {
return List(Integer.parseInt(escSeq.tail, 16))
} catch {
case _: NumberFormatException =>
}
}
escapeSequences.getOrElse(escSeq, {
if (lenient) {
ErrorReporting.warn(s"Cannot encode escape sequence {$escSeq} in encoding `$name`, skipped it", options, position)
} else {
ErrorReporting.error(s"Invalid escape sequence {$escSeq} for encoding `$name`", position)
}
Nil
})
}
def decode(by: Int): Char = {
@ -87,41 +120,81 @@ class TextCodec(val name: String, private val map: String, private val extra: Ma
object TextCodec {
def forName(name: String, position: Option[Position]): TextCodec = (position, name) match {
case (_, "ascii") => TextCodec.Ascii
case (_, "petscii") => TextCodec.Petscii
case (_, "pet") => TextCodec.Petscii
case (_, "cbmscr") => TextCodec.CbmScreencodes
case (_, "petscr") => TextCodec.CbmScreencodes
case (_, "atascii") => TextCodec.Atascii
case (_, "atari") => TextCodec.Atascii
case (_, "bbc") => TextCodec.Bbc
case (_, "apple2") => TextCodec.Apple2
case (_, "jis") => TextCodec.Jis
case (_, "jisx") => TextCodec.Jis
case (_, "iso_de") => TextCodec.IsoIec646De
case (_, "iso_no") => TextCodec.IsoIec646No
case (_, "iso_dk") => TextCodec.IsoIec646No
case (_, "iso_se") => TextCodec.IsoIec646Se
case (_, "iso_fi") => TextCodec.IsoIec646Se
case (_, "iso_yu") => TextCodec.IsoIec646Yu
case (p, x) =>
ErrorReporting.error(s"Unknown string encoding: `$x`", p)
TextCodec.Ascii
def forName(name: String, position: Option[Position]): (TextCodec, Boolean) = {
val zeroTerminated = name.endsWith("z")
val cleanName = name.stripSuffix("z")
val codec = (position, cleanName) match {
case (_, "ascii") => TextCodec.Ascii
case (_, "petscii") => TextCodec.Petscii
case (_, "pet") => TextCodec.Petscii
case (_, "cbmscr") => TextCodec.CbmScreencodes
case (_, "petscr") => TextCodec.CbmScreencodes
case (_, "atascii") => TextCodec.Atascii
case (_, "atari") => TextCodec.Atascii
case (_, "bbc") => TextCodec.Bbc
case (_, "sinclair") => TextCodec.Sinclair
case (_, "apple2") => TextCodec.Apple2
case (_, "jis") => TextCodec.Jis
case (_, "jisx") => TextCodec.Jis
case (_, "iso_de") => TextCodec.IsoIec646De
case (_, "iso_no") => TextCodec.IsoIec646No
case (_, "iso_dk") => TextCodec.IsoIec646No
case (_, "iso_se") => TextCodec.IsoIec646Se
case (_, "iso_fi") => TextCodec.IsoIec646Se
case (_, "iso_yu") => TextCodec.IsoIec646Yu
case (p, _) =>
ErrorReporting.error(s"Unknown string encoding: `$name`", p)
TextCodec.Ascii
}
codec -> zeroTerminated
}
val NotAChar = '\ufffd'
val Ascii = new TextCodec("ASCII", 0.until(127).map { i => if (i < 32) NotAChar else i.toChar }.mkString, Map.empty, Map.empty)
private val DefaultOverrides: Map[Char, Int] = ('\u2400' to '\u2420').map(c => c->(c.toInt - 0x2400)).toMap + ('\u2421' -> 127)
val Apple2 = new TextCodec("APPLE-II", 0.until(255).map { i => if (i < 160) NotAChar else (i - 128).toChar }.mkString, Map.empty, Map.empty)
//noinspection ScalaUnusedSymbol
private val AsciiEscapeSequences: Map[String, List[Int]] = Map(
"n" -> List(13),
"t" -> List(9),
"b" -> List(8),
"q" -> List('\"'.toInt),
"apos" -> List('\''.toInt),
"lbrace" -> List('{'.toInt),
"rbrace" -> List('}'.toInt))
//noinspection ScalaUnusedSymbol
private val MinimalEscapeSequencesWithoutBraces: Map[String, List[Int]] = Map(
"n" -> List(13),
"apos" -> List('\''.toInt),
"q" -> List('\"'.toInt))
//noinspection ScalaUnusedSymbol
private val MinimalEscapeSequencesWithBraces: Map[String, List[Int]] = Map(
"n" -> List(13),
"apos" -> List('\''.toInt),
"q" -> List('\"'.toInt),
"lbrace" -> List('{'.toInt),
"rbrace" -> List('}'.toInt))
val Ascii = new TextCodec("ASCII", 0.until(127).map { i => if (i < 32) NotAChar else i.toChar }.mkString, Map.empty, Map.empty, AsciiEscapeSequences)
val Apple2 = new TextCodec("APPLE-II", 0.until(255).map { i => if (i < 160) NotAChar else (i - 128).toChar }.mkString, Map.empty, Map.empty, MinimalEscapeSequencesWithBraces)
val IsoIec646De = new TextCodec("ISO-IEC-646-DE",
"\ufffd" * 32 +
" !\"#$%^'()*+,-./0123456789:;<=>?" +
"§ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ^_" +
"`abcdefghijklmnopqrstuvwxyzäöüß",
Map.empty, Map.empty
DefaultOverrides, Map.empty, AsciiEscapeSequences ++ Map(
"UE" -> List('['.toInt),
"OE" -> List('\\'.toInt),
"AE" -> List(']'.toInt),
"ue" -> List('{'.toInt),
"oe" -> List('|'.toInt),
"ae" -> List('}'.toInt),
"ss" -> List('~'.toInt)
)
)
val IsoIec646Se = new TextCodec("ISO-IEC-646-SE",
@ -136,7 +209,14 @@ object TextCodec {
'Ü' -> '^'.toInt,
'ü' -> '~'.toInt,
'$' -> '¤'.toInt),
Map.empty
Map.empty, AsciiEscapeSequences ++ Map(
"AE" -> List('['.toInt),
"OE" -> List('\\'.toInt),
"AA" -> List(']'.toInt),
"ae" -> List('{'.toInt),
"oe" -> List('|'.toInt),
"aa" -> List('}'.toInt)
)
)
val IsoIec646No = new TextCodec("ISO-IEC-646-NO",
@ -155,23 +235,31 @@ object TextCodec {
'«' -> '"'.toInt,
'»' -> '"'.toInt,
'§' -> '#'.toInt),
Map.empty
Map.empty, AsciiEscapeSequences ++ Map(
"AE" -> List('['.toInt),
"OE" -> List('\\'.toInt),
"AA" -> List(']'.toInt),
"ae" -> List('{'.toInt),
"oe" -> List('|'.toInt),
"aa" -> List('}'.toInt)
)
)
val IsoIec646Yu = new TextCodec("ISO-IEC-646-YU",
"\ufffd" * 32 +
" !\"#$%^'()*+,-./0123456789:;<=>?" +
"ŽABCDEFGHIJKLMNOPQRSTUVWXYZŠĐĆČ_" +
"žabcdefghijklmnopqrstuvwxyzšđćč",
Map('Ë' -> '$'.toInt, 'ë' -> '_'.toInt),
Map.empty
)
Map.empty, AsciiEscapeSequences)
val CbmScreencodes = new TextCodec("CBM-Screen",
"@abcdefghijklmnopqrstuvwxyz[£]↑←" +
0x20.to(0x3f).map(_.toChar).mkString +
"ABCDEFGHIJKLMNOPQRSTUVWXYZ\ufffd\ufffd\ufffdπ",
Map('^' -> 0x3E, '♥' -> 0x53, '♡' -> 0x53, '♠' -> 0x41, '♣' -> 0x58, '♢' -> 0x5A, '•' -> 0x51), Map.empty
Map('^' -> 0x3E, '♥' -> 0x53, '♡' -> 0x53, '♠' -> 0x41, '♣' -> 0x58, '♢' -> 0x5A, '•' -> 0x51),
Map.empty, MinimalEscapeSequencesWithoutBraces
)
val Petscii = new TextCodec("PETSCII",
@ -179,7 +267,25 @@ object TextCodec {
0x20.to(0x3f).map(_.toChar).mkString +
"@abcdefghijklmnopqrstuvwxyz[£]↑←" +
"ABCDEFGHIJKLMNOPQRSTUVWXYZ\ufffd\ufffd\ufffdπ",
Map('^' -> 0x5E, '♥' -> 0x73, '♡' -> 0x73, '♠' -> 0x61, '♣' -> 0x78, '♢' -> 0x7A, '•' -> 0x71), Map.empty
Map('^' -> 0x5E, '♥' -> 0x73, '♡' -> 0x73, '♠' -> 0x61, '♣' -> 0x78, '♢' -> 0x7A, '•' -> 0x71), Map.empty, Map(
"n" -> List(13),
"q" -> List('\"'.toInt),
"apos" -> List('\''.toInt),
"up" -> List(0x91),
"down" -> List(0x11),
"left" -> List(0x9d),
"right" -> List(0x1d),
"white" -> List(5),
"black" -> List(0x90),
"red" -> List(0x1c),
"blue" -> List(0x1f),
"green" -> List(0x1e),
"cyan" -> List(0x9f),
"purple" -> List(0x9c),
"yellow" -> List(0x9e),
"reverse" -> List(0x12),
"reverseoff" -> List(0x92)
)
)
val Atascii = new TextCodec("ATASCII",
@ -189,14 +295,49 @@ object TextCodec {
"\ufffd" * 11 +
0x20.to(0x5f).map(_.toChar).mkString +
"♢abcdefghijklmnopqrstuvwxyz♠|",
Map('♥' -> 0, '·' -> 0x14), Map.empty
Map('♥' -> 0, '·' -> 0x14), Map.empty, MinimalEscapeSequencesWithBraces
)
val Bbc = new TextCodec("BBC",
"\ufffd" * 32 +
0x20.to(0x5f).map(_.toChar).mkString +
"£" + 0x61.to(0x7E).map(_.toChar).mkString + "©",
Map('↑' -> '^'.toInt), Map.empty
"£" + 0x61.to(0x7E).map(_.toChar).mkString + "©",
Map('↑' -> '^'.toInt), Map.empty, MinimalEscapeSequencesWithBraces
)
val Sinclair = new TextCodec("Sinclair",
"\ufffd" * 32 +
0x20.to(0x5f).map(_.toChar).mkString +
"£" + 0x61.to(0x7E).map(_.toChar).mkString + "©",
Map('↑' -> '^'.toInt), Map.empty, Map(
"n" -> List(13),
"q" -> List('\"'.toInt),
"apos" -> List('\''.toInt),
"lbrace" -> List('{'.toInt),
"rbrace" -> List('}'.toInt),
"up" -> List(11),
"down" -> List(10),
"left" -> List(8),
"right" -> List(9),
"white" -> List(0x10, 7),
"black" -> List(0x10, 0),
"red" -> List(0x10, 2),
"blue" -> List(0x10, 1),
"green" -> List(0x10, 4),
"cyan" -> List(0x10, 5),
"purple" -> List(0x10, 3),
"yellow" -> List(0x10, 6),
"bgwhite" -> List(0x11, 7),
"bgblack" -> List(0x11, 0),
"bgred" -> List(0x11, 2),
"bgblue" -> List(0x11, 1),
"bggreen" -> List(0x11, 4),
"bgcyan" -> List(0x11, 5),
"bgpurple" -> List(0x11, 3),
"bgyellow" -> List(0x11, 6),
"reverse" -> List(0x14, 1),
"reverseoff" -> List(0x14, 0)
)
)
private val jisHalfwidthKatakanaOrder: String =
@ -222,7 +363,7 @@ object TextCodec {
1.to(0x3F).map(i => (i + 0xff60).toChar -> (i + 0xA1)).toMap,
(("カキクケコサシスセソタチツテトハヒフヘホ")).zip(
"ガギグゲゴザジズゼゾダヂヅデドバビブベボ").map { case (u, v) => v -> (u + "゛") }.toMap ++
"ハヒフヘホ".zip("パピプペポ").map { case (h, p) => p -> (h + "゜") }.toMap
"ハヒフヘホ".zip("パピプペポ").map { case (h, p) => p -> (h + "゜") }.toMap, MinimalEscapeSequencesWithBraces
)
val lossyAlternatives: Map[Char, List[String]] = {