Add zx80, zx81 and iso15 encodings

2026-04-20 03:16:45 +00:00 · 2019-09-20 19:41:53 +02:00
parent 6c3523d5af
commit 680e94c3b7
4 changed files with 134 additions and 51 deletions
@@ -22,7 +22,7 @@

 * Added `nullchar` constant as the null terminator for strings and `NULLCHAR` feature to define its value.

-* Added `vectrex`, `msx_br` and `koi7n2` text encodings.
+* Added `vectrex`, `msx_br`, `koi7n2`, `iso15`, `zx80` and `zx81` text encodings.

 * Fixed arithmetic promotion bugs for signed values.

@@ -37,8 +37,12 @@ Two encoding names are special and refer to platform-specific encodings:

 You can also append `z` to the name of the encoding to make the string zero-terminated.
 This means that the string will have one extra byte appended, equal to `nullchar`.
-The exact value of `nullchar` is encoding-dependent: in the `vectrex` encoding it's $80,
-in other encodings it's 0 (this might be a subject to change in future versions).
+The exact value of `nullchar` is encoding-dependent:
+* in the `vectrex` encoding it's 128,
+* in the `zx80` encoding it's 1,
+* in the `zx81` encoding it's 11,
+* in other encodings it's 0 (this might be subject to change in future versions).
+

    "this is a zero-terminated string" asciiz
    "this is also a zero-terminated string"z
@@ -29,12 +29,20 @@

 * `sinclair` – ZX Spectrum character set

+* `zx80` – ZX80 character set
+
+* `zx81` – ZX81 character set
+
 * `jis` or `jisx` – JIS X 0201

 * `iso_de`, `iso_no`, `iso_se`, `iso_yu` – various variants of ISO/IEC-646
 
 * `iso_dk`, `iso_fi` – aliases for `iso_no` and `iso_se` respectively

+* `iso15` – ISO 8859-15
+
+* `latin0`, `latin9`, `iso8859_15` – aliases for `iso15`
+
 * `msx_intl`, `msx_jp`, `msx_ru`, `msx_br` – MSX character encoding, International, Japanese, Russian and Brazilian respectively

 * `msx_us`, `msx_uk`, `msx_fr`, `msx_de` – aliases for `msx_intl`
@@ -61,8 +69,6 @@ Some escape sequences may expand to multiple characters. For example, in several

 * `{q}` – double quote symbol

-* `{apos}` – apostrophe/single quote
-
 * `{x00}`–`{xff}` – a character of the given hexadecimal value

 * `{copyright_year}` – this expands to the current year in digits
@@ -75,6 +81,8 @@ Some escape sequences may expand to multiple characters. For example, in several

 ##### Available only in some encodings

+* `{apos}` – apostrophe/single quote (available everywhere except for `zx80` and `zx81`)
+
 * `{n}` – new line

 * `{b}` – backspace
@@ -91,29 +99,31 @@ control codes for changing the text background color

 * `{reverse}`, `{reverseoff}` – inverted mode on/off

-* `{yen}`, `{pound}`, `{copy}` – yen symbol, pound symbol, copyright symbol
+* `{yen}`, `{pound}`, `{cent}`, `{euro}`, `{copy}` – yen symbol, pound symbol, cent symbol, euro symbol, copyright symbol

 ##### Character availability

-Encoding | lowercase letters | backslash | pound | yen | intl | card suits  
---------|-------------------|-----------|-------|-----|------|-----------  
-`pet`,              | yes¹ | no  | yes | no   | none      | yes¹  
-`origpet`           | yes¹ | yes | no  | no   | none      | yes¹  
-`oldpet`            | yes² | yes | no  | no   | none      | yes²  
-`petscr`            | yes¹ | no  | yes | no   | none      | yes¹  
-`petjp`             | no   | no  | no  | yes  | katakana³ | yes³  
-`petscrjp`          | no   | no  | no  | yes  | katakana³ | yes³  
-`sinclair`, `bbc`   | yes  | yes | yes | no   | none      | no  
-`apple2`            | no   | yes | no  | no   | none      | no  
-`atascii`           | yes  | yes | no  | no   | none      | yes  
-`atasciiscr`        | yes  | yes | no  | no   | none      | yes  
-`jis`               | yes  | no  | no  | yes  | both kana | no  
-`msx_intl`,`msx_br` | yes  | yes | yes | yes  | Western   | yes   
-`msx_jp`            | yes  | no  | no  | yes  | katakana  | yes   
-`msx_ru`            | yes  | yes | no  | no   | Russian⁴   | yes   
-`koi7n2`            | no   | yes | no  | no   | Russian⁵  | no   
-`vectrex`           | no   | yes | no  | no   | none      | no   
-all the rest        | yes  | yes | no  | no   | none      | no  
+Encoding | lowercase letters | backslash | currencies | intl | card suits  
+---------|-------------------|-----------|------------|------|-----------  
+`pet`               | yes¹ | no  | £    | none      | yes¹  
+`origpet`           | yes¹ | yes |      | none      | yes¹  
+`oldpet`            | yes² | yes |      | none      | yes²  
+`petscr`            | yes¹ | no  | £    | none      | yes¹  
+`petjp`             | no   | no  | ¥    | katakana³ | yes³  
+`petscrjp`          | no   | no  | ¥    | katakana³ | yes³  
+`sinclair`, `bbc`   | yes  | yes | £    | none      | no  
+`zx80`, `zx81`      | no   | no  | £    | none      | no  
+`apple2`            | no   | yes |      | none      | no  
+`atascii`           | yes  | yes |      | none      | yes  
+`atasciiscr`        | yes  | yes |      | none      | yes  
+`jis`               | yes  | no  | ¥    | both kana | no  
+`iso15`             | yes  | yes | €¢£¥ | Western   | no   
+`msx_intl`,`msx_br` | yes  | yes | ¢£¥  | Western   | yes   
+`msx_jp`            | yes  | no  | ¥    | katakana  | yes   
+`msx_ru`            | yes  | yes |      | Russian⁴  | yes   
+`koi7n2`            | no   | yes |      | Russian⁵  | no   
+`vectrex`           | no   | yes |      | none      | no   
+all the rest        | yes  | yes |      | none      | no  
  
 1. `pet`, `origpet` and `petscr` cannot display card suit symbols and lowercase letters at the same time.
 Card suit symbols are only available in graphics mode,
@@ -144,7 +154,9 @@ Encoding | new line | braces | backspace | cursor movement | text colour | rever
 `oldpet`            | yes | no  | no  | yes | no  | yes | no  
 `petscr`, `petscrjp`| no  | no  | no  | no  | no  | no  | no  
 `sinclair`          | yes | yes | no  | yes | yes | yes | yes  
+`zx80`,`zx81`       | yes | no  | yes | yes | no  | no  | no  
 `ascii`, `iso_*`    | yes | yes | yes | no  | no  | no  | no  
+`iso15`             | yes | yes | yes | no  | no  | no  | no  
 `apple2`            | no  | yes | no  | no  | no  | no  | no  
 `atascii`           | yes | no  | yes | yes | no  | no  | no  
 `atasciiscr`        | no  | no  | no  | no  | no  | no  | no  
@@ -185,6 +185,12 @@ object TextCodec {
      case (_, "vectrex") => TextCodec.Vectrex
      case (_, "koi7n2") => TextCodec.Koi7N2
      case (_, "short_koi") => TextCodec.Koi7N2
+      case (_, "zx80") => TextCodec.Zx80
+      case (_, "zx81") => TextCodec.Zx81
+      case (_, "iso8859_15") => TextCodec.Iso8859_15
+      case (_, "latin0") => TextCodec.Iso8859_15
+      case (_, "latin9") => TextCodec.Iso8859_15
+      case (_, "iso15") => TextCodec.Iso8859_15
      case (p, _) =>
        log.error(s"Unknown string encoding: `$name`", p)
        TextCodec.Ascii
@@ -194,7 +200,7 @@ object TextCodec {

  val NotAChar = '\ufffd'

-  private val DefaultOverrides: Map[Char, Int] = ('\u2400' to '\u2420').map(c => c->(c.toInt - 0x2400)).toMap + ('\u2421' -> 127)
+  private lazy val DefaultOverrides: Map[Char, Int] = ('\u2400' to '\u2420').map(c => c->(c.toInt - 0x2400)).toMap + ('\u2421' -> 127)

  //noinspection ScalaUnusedSymbol
  private val AsciiEscapeSequences: Map[String, List[Int]] = Map(
@@ -218,27 +224,47 @@ object TextCodec {
    "lbrace" -> List('{'.toInt),
    "rbrace" -> List('}'.toInt))

-  private val StandardKatakanaDecompositions: Map[Char, String] = {
+  private lazy val StandardKatakanaDecompositions: Map[Char, String] = {
    (("カキクケコサシスセソタチツテトハヒフヘホ")).zip(
      "ガギグゲゴザジズゼゾダヂヅデドバビブベボ").map { case (u, v) => v -> (u + "゛") }.toMap ++
      "ハヒフヘホ".zip("パピプペポ").map { case (h, p) => p -> (h + "゜") }.toMap
  }
-  private val StandardHiraganaDecompositions: Map[Char, String] = {
+  private lazy val StandardHiraganaDecompositions: Map[Char, String] = {
    (("かきくけこさしすせそたちつてとはひふへほ")).zip(
      "がぎぐげござじずぜぞだぢづでどばびぶべぼ").map { case (u, v) => v -> (u + "゛") }.toMap ++
      "はひふへほ".zip("ぱぴぷぺぽ").map { case (h, p) => p -> (h + "゜") }.toMap
  }

-  val Ascii = new TextCodec("ASCII", 0, 0.until(127).map { i => if (i < 32) NotAChar else i.toChar }.mkString, Map.empty, Map.empty, AsciiEscapeSequences)
+  lazy val Ascii = new TextCodec("ASCII", 0, 0.until(127).map { i => if (i < 32) NotAChar else i.toChar }.mkString, Map.empty, Map.empty, AsciiEscapeSequences)

-  val Apple2 = new TextCodec("APPLE-II", 0, 0.until(255).map { i =>
+  lazy val Iso8859_15 = new TextCodec("ISO 8859-15", 0,
+    "\ufffd" * 32 +
+      32.until(127).map { i => i.toChar }.mkString +
+      "\ufffd" +
+      "\ufffd" * 32 +
+      "\ufffd¡¢£€¥Š§š©ª«¬\ufffd®¯" +
+      "°±²³Žµ¶·ž¹º»ŒœŸ¿" +
+      "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ" +
+      "ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß" +
+      "àáâãäåæçèéêëìíîï" +
+      "ðñòóôõö÷øùúûüýþÿ",
+    Map.empty, Map.empty, AsciiEscapeSequences ++ Map(
+      "cent" -> List(0xA2),
+      "pound" -> List(0xA3),
+      "euro" -> List(0xA4),
+      "yen" -> List(0xA5),
+      "copy" -> List(0xA9),
+    )
+  )
+
+  lazy val Apple2 = new TextCodec("APPLE-II", 0, 0.until(255).map { i =>
    if (i < 0xa0) NotAChar
    else if (i < 0xe0) (i - 128).toChar
    else NotAChar
  }.mkString,
    ('a' to 'z').map(l => l -> (l - 'a' + 0xC1)).toMap, Map.empty, MinimalEscapeSequencesWithBraces)

-  val IsoIec646De = new TextCodec("ISO-IEC-646-DE", 0,
+  lazy val IsoIec646De = new TextCodec("ISO-IEC-646-DE", 0,
    "\ufffd" * 32 +
      " !\"#$%^'()*+,-./0123456789:;<=>?" +
      "§ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ^_" +
@@ -254,7 +280,7 @@ object TextCodec {
    )
  )

-  val IsoIec646Se = new TextCodec("ISO-IEC-646-SE", 0,
+  lazy val IsoIec646Se = new TextCodec("ISO-IEC-646-SE", 0,
    "\ufffd" * 32 +
      " !\"#¤%^'()*+,-./0123456789:;<=>?" +
      "@ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÅ^_" +
@@ -276,7 +302,7 @@ object TextCodec {
    )
  )

-  val IsoIec646No = new TextCodec("ISO-IEC-646-NO", 0,
+  lazy val IsoIec646No = new TextCodec("ISO-IEC-646-NO", 0,
    "\ufffd" * 32 +
      " !\"#$%^'()*+,-./0123456789:;<=>?" +
      "@ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ^_" +
@@ -303,7 +329,7 @@ object TextCodec {
  )


-  val IsoIec646Yu = new TextCodec("ISO-IEC-646-YU", 0,
+  lazy val IsoIec646Yu = new TextCodec("ISO-IEC-646-YU", 0,
    "\ufffd" * 32 +
      " !\"#$%^'()*+,-./0123456789:;<=>?" +
      "ŽABCDEFGHIJKLMNOPQRSTUVWXYZŠĐĆČ_" +
@@ -322,7 +348,7 @@ object TextCodec {
    )
  )

-  val CbmScreencodesJp = new TextCodec("CBM-Screen-JP", 0,
+  lazy val CbmScreencodesJp = new TextCodec("CBM-Screen-JP", 0,
    "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[¥]↑←" + // 00-1f
      0x20.to(0x3f).map(_.toChar).mkString +
      "タチツテトナニヌネノハヒフヘホマ" + // 40-4f
@@ -353,7 +379,7 @@ object TextCodec {
    )
  )

-  val Petscii = new TextCodec("PETSCII", 0,
+  lazy val Petscii = new TextCodec("PETSCII", 0,
    "\ufffd" * 32 +
      0x20.to(0x3f).map(_.toChar).mkString +
      "@abcdefghijklmnopqrstuvwxyz[£]↑←" +
@@ -384,7 +410,7 @@ object TextCodec {
    )
  )

-  val PetsciiJp = new TextCodec("PETSCII-JP", 0,
+  lazy val PetsciiJp = new TextCodec("PETSCII-JP", 0,
    "\ufffd" * 32 +
      0x20.to(0x3f).map(_.toChar).mkString +
      "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[¥]↑←" +
@@ -433,7 +459,7 @@ object TextCodec {
    )
  )

-  val Vectrex = new TextCodec("Vectrex", 0x80,
+  lazy val Vectrex = new TextCodec("Vectrex", 0x80,
    "\ufffd" * 32 +
      0x20.to(0x3f).map(_.toChar).mkString +
      "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_" +
@@ -444,7 +470,7 @@ object TextCodec {
    )
  )

-  val Koi7N2 = new TextCodec("KOI-7 N2", 0,
+  lazy val Koi7N2 = new TextCodec("KOI-7 N2", 0,
    "\ufffd" * 32 +
      " !\"#¤%&'()*+,-./" +
      "0123456789:;<=>?" +
@@ -463,7 +489,7 @@ object TextCodec {
    )
  )

-  val OldPetscii = new TextCodec("Old PETSCII", 0,
+  lazy val OldPetscii = new TextCodec("Old PETSCII", 0,
    "\ufffd" * 32 +
      0x20.to(0x3f).map(_.toChar).mkString +
      "@abcdefghijklmnopqrstuvwxyz[\\]↑←" +
@@ -485,7 +511,7 @@ object TextCodec {
    )
  )

-  val OriginalPetscii = new TextCodec("Original PETSCII", 0,
+  lazy val OriginalPetscii = new TextCodec("Original PETSCII", 0,
    "\ufffd" * 32 +
      0x20.to(0x3f).map(_.toChar).mkString +
      "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]↑←" +
@@ -507,7 +533,7 @@ object TextCodec {
    )
  )

-  val Atascii = new TextCodec("ATASCII", 0,
+  lazy val Atascii = new TextCodec("ATASCII", 0,
    "♡" +
    "\ufffd" * 15 +
      "♣\ufffd–\ufffd•" +
@@ -524,7 +550,7 @@ object TextCodec {
    )
  )

-  val AtasciiScreencodes = new TextCodec("ATASCII-Screen", 0,
+  lazy val AtasciiScreencodes = new TextCodec("ATASCII-Screen", 0,
    0x20.to(0x3f).map(_.toChar).mkString +
      0x40.to(0x5f).map(_.toChar).mkString +
      "♡" +
@@ -535,7 +561,7 @@ object TextCodec {
    Map('♥' -> 0x40, '·' -> 0x54), Map.empty, MinimalEscapeSequencesWithoutBraces
  )

-  val Bbc = new TextCodec("BBC", 0,
+  lazy val Bbc = new TextCodec("BBC", 0,
    "\ufffd" * 32 +
      0x20.to(0x5f).map(_.toChar).mkString +
      "£" + 0x61.to(0x7E).map(_.toChar).mkString + "©",
@@ -546,7 +572,7 @@ object TextCodec {
    )
  )

-  val Sinclair = new TextCodec("Sinclair", 0,
+  lazy val Sinclair = new TextCodec("Sinclair", 0,
    "\ufffd" * 32 +
      0x20.to(0x5f).map(_.toChar).mkString +
      "£" + 0x61.to(0x7E).map(_.toChar).mkString + "©",
@@ -583,6 +609,47 @@ object TextCodec {
    )
  )

+  lazy val Zx80 = new TextCodec("ZX80", 1,
+    " \ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd" +
+      "£$:?()-+*/=><;,." +
+      "0123456789" +
+      "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
+      "\ufffd" * (9 * 16) +
+      "\ufffd\ufffd\ufffd\ufffd\"",
+    ('a' to 'z').map(l => l -> (l - 'a' + 0x26)).toMap,
+    Map.empty, Map(
+      "pound" -> List(0x0c),
+      "q" -> List(0xd4),
+      "apos" -> List(212),
+      "n" -> List(0x76),
+      "b" -> List(0x77),
+      "up" -> List(0x70),
+      "down" -> List(0x71),
+      "left" -> List(0x72),
+      "right" -> List(0x73),
+    )
+  )
+
+  lazy val Zx81 = new TextCodec("ZX81", 11,
+    " \ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd" +
+      "£$:?()><=+-*/;,." +
+      "0123456789" +
+      "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
+      "\ufffd" * (8 * 16) +
+      "\"",
+    ('a' to 'z').map(l => l -> (l - 'a' + 0x26)).toMap,
+    Map.empty, Map(
+      "pound" -> List(0x0c),
+      "q" -> List(0xc0),
+      "n" -> List(0x76),
+      "b" -> List(0x77),
+      "up" -> List(0x70),
+      "down" -> List(0x71),
+      "left" -> List(0x72),
+      "right" -> List(0x73),
+    )
+  )
+
  private val jisHalfwidthKatakanaOrder: String =
    "\ufffd。「」、・ヲァィゥェォャュョッ" +
    "ーアイウエオカキクケコサシスセソ" +
@@ -590,7 +657,7 @@ object TextCodec {
    "ミムメモヤユヨラリルレロワン゛゜"

  //noinspection ScalaUnnecessaryParentheses
-  val Jis = new TextCodec("JIS-X-0201", 0,
+  lazy val Jis = new TextCodec("JIS-X-0201", 0,
    "\ufffd" * 32 +
      ' '.to('Z').mkString +
      "[¥]^_" +
@@ -610,7 +677,7 @@ object TextCodec {
    )
  )

-  val MsxWest = new TextCodec("MSX-International", 0,
+  lazy val MsxWest = new TextCodec("MSX-International", 0,
    "\ufffd" * 32 +
      (0x20 to 0x7e).map(_.toChar).mkString("") +
      "\ufffd" +
@@ -636,7 +703,7 @@ object TextCodec {
    )
  )

-  val MsxBr = new TextCodec("MSX-BR", 0,
+  lazy val MsxBr = new TextCodec("MSX-BR", 0,
    "\ufffd" * 32 +
      (0x20 to 0x7e).map(_.toChar).mkString("") +
      "\ufffd" +
@@ -662,7 +729,7 @@ object TextCodec {
    )
  )

-  val MsxRu = new TextCodec("MSX-RU", 0,
+  lazy val MsxRu = new TextCodec("MSX-RU", 0,
    "\ufffd" * 32 +
      (0x20 to 0x7e).map(_.toChar).mkString("") +
      "\ufffd" +
@@ -685,7 +752,7 @@ object TextCodec {
    )
  )

-  val MsxJp = new TextCodec("MSX-JP", 0,
+  lazy val MsxJp = new TextCodec("MSX-JP", 0,
    "\ufffd" * 32 +
      (0x20 to 0x7e).map(c => if (c == 0x5c) '¥' else c.toChar).mkString("") +
      "\ufffd" +
@@ -730,7 +797,7 @@ object TextCodec {
    )
  )

-  val lossyAlternatives: Map[Char, List[String]] = {
+  lazy val lossyAlternatives: Map[Char, List[String]] = {
    val allowLowercase: Map[Char, List[String]] = ('A' to 'Z').map(c => c -> List(c.toString.toLowerCase(Locale.ROOT))).toMap
    val allowUppercase: Map[Char, List[String]] = ('a' to 'z').map(c => c -> List(c.toString.toUpperCase(Locale.ROOT))).toMap
    val allowLowercaseCyr: Map[Char, List[String]] = ('а' to 'я').map(c => c -> List(c.toString.toUpperCase(Locale.ROOT))).toMap