diff --git a/CHANGELOG.md b/CHANGELOG.md
index 48baef5d..115e892d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -22,7 +22,7 @@
 
 * Added `nullchar` constant as the null terminator for strings and `NULLCHAR` feature to define its value.
 
-* Added `vectrex`, `msx_br` and `koi7n2` text encodings.
+* Added `vectrex`, `msx_br`, `koi7n2`, `iso15`, `zx80` and `zx81` text encodings.
 
 * Fixed arithmetic promotion bugs for signed values.
 
diff --git a/docs/lang/literals.md b/docs/lang/literals.md
index fd46cb0e..b474d8cc 100644
--- a/docs/lang/literals.md
+++ b/docs/lang/literals.md
@@ -37,8 +37,12 @@ Two encoding names are special and refer to platform-specific encodings:
 
 You can also append `z` to the name of the encoding to make the string zero-terminated.
 This means that the string will have one extra byte appended, equal to `nullchar`.
-The exact value of `nullchar` is encoding-dependent: in the `vectrex` encoding it's $80,
-in other encodings it's 0 (this might be a subject to change in future versions).
+The exact value of `nullchar` is encoding-dependent:
+* in the `vectrex` encoding it's 128,
+* in the `zx80` encoding it's 1,
+* in the `zx81` encoding it's 11,
+* in other encodings it's 0 (this might be subject to change in future versions).
+
     "this is a zero-terminated string" asciiz
     "this is also a zero-terminated string"z
 
diff --git a/docs/lang/text.md b/docs/lang/text.md
index 2866a0a7..4c5037f1 100644
--- a/docs/lang/text.md
+++ b/docs/lang/text.md
@@ -29,12 +29,20 @@
 
 * `sinclair` – ZX Spectrum character set
 
+* `zx80` – ZX80 character set
+
+* `zx81` – ZX81 character set
+
 * `jis` or `jisx` – JIS X 0201
 
 * `iso_de`, `iso_no`, `iso_se`, `iso_yu` – various variants of ISO/IEC-646
 
 * `iso_dk`, `iso_fi` – aliases for `iso_no` and `iso_se` respectively
 
+* `iso15` – ISO 8859-15
+
+* `latin0`, `latin9`, `iso8859_15` – aliases for `iso15`
+
 * `msx_intl`, `msx_jp`, `msx_ru`, `msx_br` – MSX character encoding, International, Japanese, Russian and Brazilian respectively
 
 * `msx_us`, `msx_uk`, `msx_fr`, `msx_de` – aliases for `msx_intl`
@@ -61,8 +69,6 @@ Some escape sequences may expand to multiple characters. For example, in several
 
 * `{q}` – double quote symbol
 
-* `{apos}` – apostrophe/single quote
-
 * `{x00}`–`{xff}` – a character of the given hexadecimal value
 
 * `{copyright_year}` – this expands to the current year in digits
@@ -75,6 +81,8 @@ Some escape sequences may expand to multiple characters. For example, in several
 
 ##### Available only in some encodings
 
+* `{apos}` – apostrophe/single quote (available everywhere except for `zx80` and `zx81`)
+
 * `{n}` – new line
 
 * `{b}` – backspace
@@ -91,29 +99,31 @@ control codes for changing the text background color
 
 * `{reverse}`, `{reverseoff}` – inverted mode on/off
 
-* `{yen}`, `{pound}`, `{copy}` – yen symbol, pound symbol, copyright symbol
+* `{yen}`, `{pound}`, `{cent}`, `{euro}`, `{copy}` – yen symbol, pound symbol, cent symbol, euro symbol, copyright symbol
 
 ##### Character availability
 
-Encoding | lowercase letters | backslash | pound | yen | intl | card suits
----------|-------------------|-----------|-------|-----|------|-----------
-`pet`, | yes¹ | no | yes | no | none | yes¹
-`origpet` | yes¹ | yes | no | no | none | yes¹
-`oldpet` | yes² | yes | no | no | none | yes²
-`petscr` | yes¹ | no | yes | no | none | yes¹
-`petjp` | no | no | no | yes | katakana³ | yes³
-`petscrjp` | no | no | no | yes | katakana³ | yes³
-`sinclair`, `bbc` | yes | yes | yes | no | none | no
-`apple2` | no | yes | no | no | none | no
-`atascii` | yes | yes | no | no | none | yes
-`atasciiscr` | yes | yes | no | no | none | yes
-`jis` | yes | no | no | yes | both kana | no
-`msx_intl`,`msx_br` | yes | yes | yes | yes | Western | yes
-`msx_jp` | yes | no | no | yes | katakana | yes
-`msx_ru` | yes | yes | no | no | Russian⁴ | yes
-`koi7n2` | no | yes | no | no | Russian⁵ | no
-`vectrex` | no | yes | no | no | none | no
-all the rest | yes | yes | no | no | none | no
+Encoding | lowercase letters | backslash | currencies | intl | card suits
+---------|-------------------|-----------|------------|------|-----------
+`pet`, | yes¹ | no | £ | none | yes¹
+`origpet` | yes¹ | yes | | none | yes¹
+`oldpet` | yes² | yes | | none | yes²
+`petscr` | yes¹ | no | £ | none | yes¹
+`petjp` | no | no | ¥ | katakana³ | yes³
+`petscrjp` | no | no | ¥ | katakana³ | yes³
+`sinclair`, `bbc` | yes | yes | £ | none | no
+`zx80`, `zx81` | no | no | £ | none | no
+`apple2` | no | yes | | none | no
+`atascii` | yes | yes | | none | yes
+`atasciiscr` | yes | yes | | none | yes
+`jis` | yes | no | ¥ | both kana | no
+`iso15` | yes | yes | €¢£¥ | Western | no
+`msx_intl`,`msx_br` | yes | yes | ¢£¥ | Western | yes
+`msx_jp` | yes | no | ¥ | katakana | yes
+`msx_ru` | yes | yes | | Russian⁴ | yes
+`koi7n2` | no | yes | | Russian⁵ | no
+`vectrex` | no | yes | | none | no
+all the rest | yes | yes | | none | no
 
 1. `pet`, `origpet` and `petscr` cannot display card suit symbols and lowercase letters at the same time.
 Card suit symbols are only available in graphics mode,
@@ -144,7 +154,9 @@ Encoding | new line | braces | backspace | cursor movement | text colour | rever
 `oldpet` | yes | no | no | yes | no | yes | no
 `petscr`, `petscrjp`| no | no | no | no | no | no | no
 `sinclair` | yes | yes | no | yes | yes | yes | yes
+`zx80`,`zx81` | yes | no | yes | yes | no | no | no
 `ascii`, `iso_*` | yes | yes | yes | no | no | no | no
+`iso15` | yes | yes | yes | no | no | no | no
 `apple2` | no | yes | no | no | no | no | no
 `atascii` | yes | no | yes | yes | no | no | no
 `atasciiscr` | no | no | no | no | no | no | no
diff --git a/src/main/scala/millfork/parser/TextCodec.scala b/src/main/scala/millfork/parser/TextCodec.scala
index ed8bac52..43d0113a 100644
--- a/src/main/scala/millfork/parser/TextCodec.scala
+++ b/src/main/scala/millfork/parser/TextCodec.scala
@@ -185,6 +185,12 @@ object TextCodec {
       case (_, "vectrex") => TextCodec.Vectrex
       case (_, "koi7n2") => TextCodec.Koi7N2
       case (_, "short_koi") => TextCodec.Koi7N2
+      case (_, "zx80") => TextCodec.Zx80
+      case (_, "zx81") => TextCodec.Zx81
+      case (_, "iso8859_15") => TextCodec.Iso8859_15
+      case (_, "latin0") => TextCodec.Iso8859_15
+      case (_, "latin9") => TextCodec.Iso8859_15
+      case (_, "iso15") => TextCodec.Iso8859_15
       case (p, _) =>
         log.error(s"Unknown string encoding: `$name`", p)
         TextCodec.Ascii
@@ -194,7 +200,7 @@ object TextCodec {
 
   val NotAChar = '\ufffd'
 
-  private val DefaultOverrides: Map[Char, Int] = ('\u2400' to '\u2420').map(c => c->(c.toInt - 0x2400)).toMap + ('\u2421' -> 127)
+  private lazy val DefaultOverrides: Map[Char, Int] = ('\u2400' to '\u2420').map(c => c->(c.toInt - 0x2400)).toMap + ('\u2421' -> 127)
 
   //noinspection ScalaUnusedSymbol
   private val AsciiEscapeSequences: Map[String, List[Int]] = Map(
@@ -218,27 +224,47 @@ object TextCodec {
     "lbrace" -> List('{'.toInt),
     "rbrace" -> List('}'.toInt))
 
-  private val StandardKatakanaDecompositions: Map[Char, String] = {
+  private lazy val StandardKatakanaDecompositions: Map[Char, String] = {
     (("カキクケコサシスセソタチツテトハヒフヘホ")).zip(
       "ガギグゲゴザジズゼゾダヂヅデドバビブベボ").map { case (u, v) => v -> (u + "゛") }.toMap ++
       "ハヒフヘホ".zip("パピプペポ").map { case (h, p) => p -> (h + "゜") }.toMap
   }
 
-  private val StandardHiraganaDecompositions: Map[Char, String] = {
+  private lazy val StandardHiraganaDecompositions: Map[Char, String] = {
     (("かきくけこさしすせそたちつてとはひふへほ")).zip(
       "がぎぐげござじずぜぞだぢづでどばびぶべぼ").map { case (u, v) => v -> (u + "゛") }.toMap ++
       "はひふへほ".zip("ぱぴぷぺぽ").map { case (h, p) => p -> (h + "゜") }.toMap
   }
 
-  val Ascii = new TextCodec("ASCII", 0, 0.until(127).map { i => if (i < 32) NotAChar else i.toChar }.mkString, Map.empty, Map.empty, AsciiEscapeSequences)
+  lazy val Ascii = new TextCodec("ASCII", 0, 0.until(127).map { i => if (i < 32) NotAChar else i.toChar }.mkString, Map.empty, Map.empty, AsciiEscapeSequences)
 
-  val Apple2 = new TextCodec("APPLE-II", 0, 0.until(255).map { i =>
+  lazy val Iso8859_15 = new TextCodec("ISO 8859-15", 0,
+    "\ufffd" * 32 +
+    32.until(127).map { i => i.toChar }.mkString +
+    "\ufffd" +
+    "\ufffd" * 32 +
+    "\ufffd¡¢£€¥Š§š©ª«¬\ufffd®¯" +
+    "°±²³Žµ¶·ž¹º»ŒœŸ¿" +
+    "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ" +
+    "ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß" +
+    "àáâãäåæçèéêëìíîï" +
+    "ðñòóôõö÷øùúûüýþÿ",
+    Map.empty, Map.empty, AsciiEscapeSequences ++ Map(
+      "cent" -> List(0xA2),
+      "pound" -> List(0xA3),
+      "euro" -> List(0xA4),
+      "yen" -> List(0xA5),
+      "copy" -> List(0xA9),
+    )
+  )
+
+  lazy val Apple2 = new TextCodec("APPLE-II", 0, 0.until(255).map { i =>
     if (i < 0xa0) NotAChar
     else if (i < 0xe0) (i - 128).toChar
     else NotAChar
   }.mkString, ('a' to 'z').map(l => l -> (l - 'a' + 0xC1)).toMap, Map.empty, MinimalEscapeSequencesWithBraces)
 
-  val IsoIec646De = new TextCodec("ISO-IEC-646-DE", 0,
+  lazy val IsoIec646De = new TextCodec("ISO-IEC-646-DE", 0,
     "\ufffd" * 32 +
     " !\"#$%^'()*+,-./0123456789:;<=>?" +
     "§ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ^_" +
@@ -254,7 +280,7 @@ object TextCodec {
     )
   )
 
-  val IsoIec646Se = new TextCodec("ISO-IEC-646-SE", 0,
+  lazy val IsoIec646Se = new TextCodec("ISO-IEC-646-SE", 0,
     "\ufffd" * 32 +
     " !\"#¤%^'()*+,-./0123456789:;<=>?" +
     "@ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÅ^_" +
@@ -276,7 +302,7 @@ object TextCodec {
     )
   )
 
-  val IsoIec646No = new TextCodec("ISO-IEC-646-NO", 0,
+  lazy val IsoIec646No = new TextCodec("ISO-IEC-646-NO", 0,
     "\ufffd" * 32 +
     " !\"#$%^'()*+,-./0123456789:;<=>?" +
     "@ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ^_" +
@@ -303,7 +329,7 @@ object TextCodec {
   )
 
 
-  val IsoIec646Yu = new TextCodec("ISO-IEC-646-YU", 0,
+  lazy val IsoIec646Yu = new TextCodec("ISO-IEC-646-YU", 0,
     "\ufffd" * 32 +
     " !\"#$%^'()*+,-./0123456789:;<=>?" +
     "ŽABCDEFGHIJKLMNOPQRSTUVWXYZŠĐĆČ_" +
@@ -322,7 +348,7 @@ object TextCodec {
     )
   )
 
-  val CbmScreencodesJp = new TextCodec("CBM-Screen-JP", 0,
+  lazy val CbmScreencodesJp = new TextCodec("CBM-Screen-JP", 0,
     "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[¥]↑←" + // 00-1f
     0x20.to(0x3f).map(_.toChar).mkString +
     "タチツテトナニヌネノハヒフヘホマ" + // 40-4f
@@ -353,7 +379,7 @@ object TextCodec {
     )
   )
 
-  val Petscii = new TextCodec("PETSCII", 0,
+  lazy val Petscii = new TextCodec("PETSCII", 0,
     "\ufffd" * 32 +
     0x20.to(0x3f).map(_.toChar).mkString +
     "@abcdefghijklmnopqrstuvwxyz[£]↑←" +
@@ -384,7 +410,7 @@ object TextCodec {
     )
   )
 
-  val PetsciiJp = new TextCodec("PETSCII-JP", 0,
+  lazy val PetsciiJp = new TextCodec("PETSCII-JP", 0,
     "\ufffd" * 32 +
     0x20.to(0x3f).map(_.toChar).mkString +
     "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[¥]↑←" +
@@ -433,7 +459,7 @@ object TextCodec {
     )
   )
 
-  val Vectrex = new TextCodec("Vectrex", 0x80,
+  lazy val Vectrex = new TextCodec("Vectrex", 0x80,
     "\ufffd" * 32 +
     0x20.to(0x3f).map(_.toChar).mkString +
     "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_" +
@@ -444,7 +470,7 @@ object TextCodec {
     )
   )
 
-  val Koi7N2 = new TextCodec("KOI-7 N2", 0,
+  lazy val Koi7N2 = new TextCodec("KOI-7 N2", 0,
     "\ufffd" * 32 +
     " !\"#¤%&'()*+,-./" +
     "0123456789:;<=>?" +
@@ -463,7 +489,7 @@ object TextCodec {
     )
   )
 
-  val OldPetscii = new TextCodec("Old PETSCII", 0,
+  lazy val OldPetscii = new TextCodec("Old PETSCII", 0,
     "\ufffd" * 32 +
     0x20.to(0x3f).map(_.toChar).mkString +
     "@abcdefghijklmnopqrstuvwxyz[\\]↑←" +
@@ -485,7 +511,7 @@ object TextCodec {
     )
   )
 
-  val OriginalPetscii = new TextCodec("Original PETSCII", 0,
+  lazy val OriginalPetscii = new TextCodec("Original PETSCII", 0,
     "\ufffd" * 32 +
     0x20.to(0x3f).map(_.toChar).mkString +
     "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]↑←" +
@@ -507,7 +533,7 @@ object TextCodec {
     )
   )
 
-  val Atascii = new TextCodec("ATASCII", 0,
+  lazy val Atascii = new TextCodec("ATASCII", 0,
     "♡" +
     "\ufffd" * 15 +
     "♣\ufffd–\ufffd•" +
@@ -524,7 +550,7 @@ object TextCodec {
     )
   )
 
-  val AtasciiScreencodes = new TextCodec("ATASCII-Screen", 0,
+  lazy val AtasciiScreencodes = new TextCodec("ATASCII-Screen", 0,
     0x20.to(0x3f).map(_.toChar).mkString +
     0x40.to(0x5f).map(_.toChar).mkString +
     "♡" +
@@ -535,7 +561,7 @@ object TextCodec {
     Map('♥' -> 0x40, '·' -> 0x54), Map.empty, MinimalEscapeSequencesWithoutBraces
   )
 
-  val Bbc = new TextCodec("BBC", 0,
+  lazy val Bbc = new TextCodec("BBC", 0,
     "\ufffd" * 32 +
     0x20.to(0x5f).map(_.toChar).mkString +
     "£" + 0x61.to(0x7E).map(_.toChar).mkString + "©",
@@ -546,7 +572,7 @@ object TextCodec {
     )
   )
 
-  val Sinclair = new TextCodec("Sinclair", 0,
+  lazy val Sinclair = new TextCodec("Sinclair", 0,
     "\ufffd" * 32 +
     0x20.to(0x5f).map(_.toChar).mkString +
     "£" + 0x61.to(0x7E).map(_.toChar).mkString + "©",
@@ -583,6 +609,47 @@ object TextCodec {
     )
   )
 
+  lazy val Zx80 = new TextCodec("ZX80", 1,
+    " \ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd" +
+    "£$:?()-+*/=><;,." +
+    "0123456789" +
+    "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
+    "\ufffd" * (9 * 16) +
+    "\ufffd\ufffd\ufffd\ufffd\"",
+    ('a' to 'z').map(l => l -> (l - 'a' + 0x26)).toMap,
+    Map.empty, Map(
+      "pound" -> List(0x0c),
+      "q" -> List(0xd4),
+      "apos" -> List(212),
+      "n" -> List(0x76),
+      "b" -> List(0x77),
+      "up" -> List(0x70),
+      "down" -> List(0x71),
+      "left" -> List(0x72),
+      "right" -> List(0x73),
+    )
+  )
+
+  lazy val Zx81 = new TextCodec("ZX81", 11,
+    " \ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd" +
+    "£$:?()><=+-*/;,." +
+    "0123456789" +
+    "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
+    "\ufffd" * (8 * 16) +
+    "\"",
+    ('a' to 'z').map(l => l -> (l - 'a' + 0x26)).toMap,
+    Map.empty, Map(
+      "pound" -> List(0x0c),
+      "q" -> List(0xc0),
+      "n" -> List(0x76),
+      "b" -> List(0x77),
+      "up" -> List(0x70),
+      "down" -> List(0x71),
+      "left" -> List(0x72),
+      "right" -> List(0x73),
+    )
+  )
+
   private val jisHalfwidthKatakanaOrder: String =
     "\ufffd。「」、・ヲァィゥェォャュョッ" +
     "ーアイウエオカキクケコサシスセソ" +
@@ -590,7 +657,7 @@ object TextCodec {
     "ミムメモヤユヨラリルレロワン゛゜"
 
   //noinspection ScalaUnnecessaryParentheses
-  val Jis = new TextCodec("JIS-X-0201", 0,
+  lazy val Jis = new TextCodec("JIS-X-0201", 0,
     "\ufffd" * 32 +
     ' '.to('Z').mkString +
     "[¥]^_" +
@@ -610,7 +677,7 @@ object TextCodec {
     )
   )
 
-  val MsxWest = new TextCodec("MSX-International", 0,
+  lazy val MsxWest = new TextCodec("MSX-International", 0,
     "\ufffd" * 32 +
     (0x20 to 0x7e).map(_.toChar).mkString("") +
     "\ufffd" +
@@ -636,7 +703,7 @@ object TextCodec {
     )
   )
 
-  val MsxBr = new TextCodec("MSX-BR", 0,
+  lazy val MsxBr = new TextCodec("MSX-BR", 0,
     "\ufffd" * 32 +
     (0x20 to 0x7e).map(_.toChar).mkString("") +
     "\ufffd" +
@@ -662,7 +729,7 @@ object TextCodec {
     )
   )
 
-  val MsxRu = new TextCodec("MSX-RU", 0,
+  lazy val MsxRu = new TextCodec("MSX-RU", 0,
     "\ufffd" * 32 +
     (0x20 to 0x7e).map(_.toChar).mkString("") +
     "\ufffd" +
@@ -685,7 +752,7 @@ object TextCodec {
     )
   )
 
-  val MsxJp = new TextCodec("MSX-JP", 0,
+  lazy val MsxJp = new TextCodec("MSX-JP", 0,
     "\ufffd" * 32 +
     (0x20 to 0x7e).map(c => if (c == 0x5c) '¥' else c.toChar).mkString("") +
     "\ufffd" +
@@ -730,7 +797,7 @@ object TextCodec {
     )
   )
 
-  val lossyAlternatives: Map[Char, List[String]] = {
+  lazy val lossyAlternatives: Map[Char, List[String]] = {
     val allowLowercase: Map[Char, List[String]] = ('A' to 'Z').map(c => c -> List(c.toString.toLowerCase(Locale.ROOT))).toMap
     val allowUppercase: Map[Char, List[String]] = ('a' to 'z').map(c => c -> List(c.toString.toUpperCase(Locale.ROOT))).toMap
     val allowLowercaseCyr: Map[Char, List[String]] = ('а' to 'я').map(c => c -> List(c.toString.toUpperCase(Locale.ROOT))).toMap
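The encoding-dependent `nullchar` rule documented in `docs/lang/literals.md`, together with the `Zx80` and `Zx81` table layouts added to `TextCodec.scala`, can be illustrated with a small standalone model. This is a minimal sketch that assumes a codec maps each character to its index in the table string (skipping `\ufffd` placeholders), folds lowercase letters through the extra map onto the uppercase codes, and appends `nullchar` when the `z` suffix is used. `NullCharSketch`, `MiniCodec` and `encode` are hypothetical names for illustration only; they are not Millfork's `TextCodec` API, and escape sequences such as `{q}` are not modelled.

```scala
// Hypothetical, simplified model of the table lookup used by the Zx80/Zx81 codecs
// in this diff. It does not call Millfork's real TextCodec class.
object NullCharSketch {

  final case class MiniCodec(name: String, nullChar: Int, table: String, extra: Map[Char, Int]) {
    // Reverse lookup: character -> byte value. '\ufffd' marks unmapped positions and
    // is skipped; for duplicate characters the lowest index wins, because the pairs
    // are reversed before toMap, so earlier indices overwrite later ones.
    private val reverse: Map[Char, Int] =
      table.zipWithIndex.reverse.collect { case (c, i) if c != '\ufffd' => c -> i }.toMap ++ extra

    // Encodes `text`; with `zeroTerminated` (the `z` suffix) one extra byte,
    // equal to this encoding's nullchar, is appended.
    def encode(text: String, zeroTerminated: Boolean): List[Int] = {
      val body = text.toList.map { c =>
        reverse.getOrElse(c, throw new IllegalArgumentException(s"'$c' is not available in $name"))
      }
      if (zeroTerminated) body :+ nullChar else body
    }
  }

  // Lowercase letters are folded onto the uppercase codes 0x26..0x3F, as in the diff.
  private val foldLowercase: Map[Char, Int] = ('a' to 'z').map(l => l -> (l - 'a' + 0x26)).toMap

  // Table layouts copied from the Zx80 and Zx81 definitions in the diff.
  val Zx80: MiniCodec = MiniCodec("ZX80", 1,
    " " + "\ufffd" * 11 + "£$:?()-+*/=><;,." + "0123456789" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
      "\ufffd" * (9 * 16) + "\ufffd" * 4 + "\"",
    foldLowercase)

  val Zx81: MiniCodec = MiniCodec("ZX81", 11,
    " " + "\ufffd" * 11 + "£$:?()><=+-*/;,." + "0123456789" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
      "\ufffd" * (8 * 16) + "\"",
    foldLowercase)

  def hex(bytes: List[Int]): String = bytes.map(b => f"$b%02X").mkString(" ")

  def main(args: Array[String]): Unit = {
    println(hex(Zx81.encode("HELLO, world", zeroTerminated = true)))
    // 2D 2A 31 31 34 1A 00 3C 34 37 31 29 0B
    println(hex(Zx80.encode("HELLO, world", zeroTerminated = true)))
    // 2D 2A 31 31 34 1A 00 3C 34 37 31 29 01
  }
}
```

With the same input, only the final byte differs between the two encodings (0x0B for `zx81`, 0x01 for `zx80`), which is the encoding-dependent `nullchar` listed in `docs/lang/literals.md`. The model also reproduces why `"` lands at 0xD4 in `zx80` and 0xC0 in `zx81`: the position is simply the length of the table prefix before it.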