From c45cf7d51d8d0d5de978c90eb9856df27e1f807b Mon Sep 17 00:00:00 2001
From: Karol Stasiak <karol.m.stasiak@gmail.com>
Date: Wed, 18 Sep 2019 00:09:37 +0200
Subject: [PATCH] Add KOI-7 N2 and MSX-BR encodings. Some encoding
 enhancements.

---
 docs/lang/literals.md                         |   2 +
 docs/lang/text.md                             |  54 ++++---
 .../scala/millfork/parser/TextCodec.scala     | 132 +++++++++++++++---
 3 files changed, 151 insertions(+), 37 deletions(-)

diff --git a/docs/lang/literals.md b/docs/lang/literals.md
index d14c38b4..fd46cb0e 100644
--- a/docs/lang/literals.md
+++ b/docs/lang/literals.md
@@ -76,6 +76,8 @@ For example, if `-flenient-encoding` is enabled, then a literal `"£¥↑ž©ß"
 
 * `"?¥^z(C)ss"` if the default encoding is `jisx`
 
+* `"£¥^z(C)β"` if the default encoding is `msx_intl`
+
 Note that the final length of the string may vary.
 
 ## Character literals
diff --git a/docs/lang/text.md b/docs/lang/text.md
index a3259eea..f42c0bda 100644
--- a/docs/lang/text.md
+++ b/docs/lang/text.md
@@ -35,7 +35,7 @@
  
 * `iso_dk`, `iso_fi` – aliases for `iso_no` and `iso_se` respectively
 
-* `msx_intl`, `msx_jp`, `msx_ru` – MSX character encoding, International, Japanese and Russian respectively
+* `msx_intl`, `msx_jp`, `msx_ru`, `msx_br` – MSX character encoding, International, Japanese, Russian and Brazilian respectively
 
 * `msx_us`, `msx_uk`, `msx_fr`, `msx_de` – aliases for `msx_intl`
  
@@ -43,6 +43,8 @@
  
 * `atasciiscr` or `atariscr` – screencodes used by Atari 8-bit computers
 
+* `koi7n2` or `short_koi` – KOI-7 N2
+
 * `vectrex` – built-in Vectrex font
 
 When programming for Commodore,
@@ -93,23 +95,24 @@ control codes for changing the text background color
 
 ##### Character availability
 
-Encoding | lowercase letters | backslash | pound | yen | katakana | card suits  
---|--|--|--|--|--|--  
-`pet`, `origpet`   | yes¹ | no  | no  | no   | no   | yes¹  
-`oldpet`           | yes² | no  | no  | no   | no   | yes²  
-`petscr`           | yes¹ | no  | yes | no   | no   | yes¹  
-`petjp`            | no   | no  | no  | yes  | yes³ | yes³  
-`petscrjp`         | no   | no  | no  | yes  | yes³ | yes³  
-`sinclair`, `bbc`  | yes  | yes | yes | no   | no   | no  
-`apple2`           | no   | yes | no  | no   | no   | no  
-`atascii`          | yes  | yes | no  | no   | no   | yes  
-`atasciiscr`       | yes  | yes | no  | no   | no   | yes  
-`jis`              | yes  | no  | no  | yes  | yes  | no  
-`msx_intl`         | yes  | yes | yes | yes  | no   | yes   
-`msx_jp`           | yes  | no  | no  | yes  | yes  | yes   
-`msx_ru`           | yes  | yes | no  | no   | no   | yes   
-`vectrex`          | no   | yes | no  | no   | no   | no   
-all the rest       | yes  | yes | no  | no   | no   | no  
+Encoding | lowercase letters | backslash | pound | yen | intl | card suits  
+---------|-------------------|-----------|-------|-----|------|-----------  
+`pet`, `origpet`    | yes¹ | no  | no  | no   | none      | yes¹  
+`oldpet`            | yes² | no  | no  | no   | none      | yes²  
+`petscr`            | yes¹ | no  | yes | no   | none      | yes¹  
+`petjp`             | no   | no  | no  | yes  | katakana³ | yes³  
+`petscrjp`          | no   | no  | no  | yes  | katakana³ | yes³  
+`sinclair`, `bbc`   | yes  | yes | yes | no   | none      | no  
+`apple2`            | no   | yes | no  | no   | none      | no  
+`atascii`           | yes  | yes | no  | no   | none      | yes  
+`atasciiscr`        | yes  | yes | no  | no   | none      | yes  
+`jis`               | yes  | no  | no  | yes  | both kana | no  
+`msx_intl`,`msx_br` | yes  | yes | yes | yes  | Western   | yes   
+`msx_jp`            | yes  | no  | no  | yes  | katakana  | yes   
+`msx_ru`            | yes  | yes | no  | no   | Russian⁴   | yes   
+`koi7n2`            | no   | yes | no  | no   | Russian⁵  | no   
+`vectrex`           | no   | yes | no  | no   | none      | no   
+all the rest        | yes  | yes | no  | no   | none      | no  
   
 1. `pet`, `origpet` and `petscr` cannot display card suit symbols and lowercase letters at the same time.
 Card suit symbols are only available in graphics mode,
@@ -118,16 +121,23 @@ in which lowercase letters are displayed as uppercase and uppercase letters are
 2.  `oldpet` cannot display card suit symbols and lowercase letters at the same time.
 Card suit symbols are only available in graphics mode, in which lowercase letters are displayed as symbols. 
 
-3. `petjp` and `petscrjp` cannot display card suit symbols and katakana at the same time
+3. `petjp` and `petscrjp` cannot display card suit symbols and katakana at the same time.
 Card suit symbols are only available in graphics mode, in which katakana is displayed as symbols. 
 
-If the encoding does not support lowercase letters (e.g. `apple2`, `petjp`, `petscrjp`),
+4. Letter **Ё** and uppercase **Ъ** are not available.
+
+5. Only uppercase. Letters **Ё** and **Ъ** are not available.
+
+If the encoding does not support lowercase letters (e.g. `apple2`, `petjp`, `petscrjp`, `koi7n2`, `vectrex`),
 then text and character literals containing lowercase letters are automatically converted to uppercase. 
+Only unaccented Latin and Cyrillic letters will be converted as such.
+Accented Latin letters will not be converted and will fail to compile without `-flenient-encoding`.     
+To detect if your default encoding does not support lowercase letters, test `'A' == 'a'`.
 
 ##### Escape sequence availability
 
 Encoding | new line | braces | backspace | cursor movement | text colour | reverse | background colour  
---|--|--|--|--|--|--|--  
+---------|----------|--------|-----------|-----------------|-------------|---------|------------------  
 `pet`,`petjp`       | yes | no  | no  | yes | yes | yes | no  
 `origpet`           | yes | no  | no  | yes | no  | yes | no  
 `oldpet`            | yes | no  | no  | yes | no  | yes | no  
@@ -137,5 +147,7 @@ Encoding | new line | braces | backspace | cursor movement | text colour | rever
 `apple2`            | no  | yes | no  | no  | no  | no  | no  
 `atascii`           | yes | no  | yes | yes | no  | no  | no  
 `atasciiscr`        | no  | no  | no  | no  | no  | no  | no  
+`msx_*`             | yes | yes | yes | yes | no  | no  | no  
+`koi7n2`            | yes | no  | yes | no  | no  | no  | no  
 `vectrex`           | no  | no  | no  | no  | no  | no  | no  
 all the rest        | yes | yes | no  | no  | no  | no  | no
diff --git a/src/main/scala/millfork/parser/TextCodec.scala b/src/main/scala/millfork/parser/TextCodec.scala
index 5f1010aa..ed8bac52 100644
--- a/src/main/scala/millfork/parser/TextCodec.scala
+++ b/src/main/scala/millfork/parser/TextCodec.scala
@@ -181,7 +181,10 @@ object TextCodec {
       case (_, "msx_es") => TextCodec.MsxWest
       case (_, "msx_ru") => TextCodec.MsxRu
       case (_, "msx_jp") => TextCodec.MsxJp
+      case (_, "msx_br") => TextCodec.MsxBr
       case (_, "vectrex") => TextCodec.Vectrex
+      case (_, "koi7n2") => TextCodec.Koi7N2
+      case (_, "short_koi") => TextCodec.Koi7N2
       case (p, _) =>
         log.error(s"Unknown string encoding: `$name`", p)
         TextCodec.Ascii
@@ -441,6 +444,25 @@ object TextCodec {
     )
   )
 
+  val Koi7N2 = new TextCodec("KOI-7 N2", 0,
+    "\ufffd" * 32 +
+      " !\"#¤%&'()*+,-./" +
+      "0123456789:;<=>?" +
+      "@ABCDEFGHIJKLMNO" +
+      "PQRSTUVWXYZ[\\]^_" +
+      "ЮАБЦДЕФГХИЙКЛМНО" +
+      "ПЯРСТУЖВЬЫЗШЭЩЧ",
+    Map('↑' -> 0x5E, '$' -> 0x24) ++
+      ('a' to 'z').map(l => l -> l.toUpper.toInt).toMap ++
+      ('а' to 'я').filter(_ != 'ъ').map(l => l -> l.toUpper.toInt).toMap,
+    Map.empty, Map(
+      "n" -> List(13), // TODO: ?
+      "b" -> List(8), // TODO: ?
+      "q" -> List('\"'.toInt),
+      "apos" -> List('\''.toInt)
+    )
+  )
+
   val OldPetscii = new TextCodec("Old PETSCII", 0,
     "\ufffd" * 32 +
       0x20.to(0x3f).map(_.toChar).mkString +
@@ -600,15 +622,46 @@ object TextCodec {
       "Δ\ufffdω\ufffd\ufffd\ufffd\ufffd\ufffd" +
       "αβΓΠΣσµγΦθΩδ∞∅∈∩" +
       "≡±≥≤\ufffd\ufffd÷\ufffd\ufffd\ufffd\ufffd\ufffdⁿ²",
-    Map('ß' -> 0xE1, '¦' -> 0x7C),
+    Map('ß' -> 0xE1, '¦' -> 0x7C, 'Ő' -> 0xB4, 'ő' -> 0xB5, 'Ű' -> 0xB6, 'ű' -> 0xB7),
     Map('♥' -> "\u0001C", '♡' -> "\u0001C", '♢' -> "\u0001D", '♢' -> "\u0001D", '♣' -> "\u0001E", '♠' -> "\u0001F", '·' -> "\u0001G") ,
     MinimalEscapeSequencesWithBraces ++ Map(
+      "right" -> List(0x1c),
+      "left" -> List(0x1d),
+      "up" -> List(0x1e),
+      "down" -> List(0x1f),
+      "b" -> List(8),
       "n" -> List(13, 10),
       "pound" -> List(0x9c),
       "yen" -> List(0x9d),
     )
   )
 
+  val MsxBr = new TextCodec("MSX-BR", 0,
+    "\ufffd" * 32 +
+      (0x20 to 0x7e).map(_.toChar).mkString("") +
+      "\ufffd" +
+      "ÇüéâÁà\ufffdçêÍÓÚÂÊÔÀ" +
+      "ÉæÆôöòûùÿÖÜ¢£¥₧ƒ" +
+      "áíóúñÑªº¿⌐¬½¼¡«»" +
+      "ÃãĨĩÕõŨũĲĳ¾\ufffd\ufffd‰¶§" +
+      "\ufffd" * 24 +
+      "Δ\ufffdω\ufffd\ufffd\ufffd\ufffd\ufffd" +
+      "αβΓΠΣσµγΦθΩδ∞∅∈∩" +
+      "≡±≥≤\ufffd\ufffd÷\ufffd\ufffd\ufffd\ufffd\ufffdⁿ²",
+    Map('ß' -> 0xE1, '¦' -> 0x7C, 'Ő' -> 0xB4, 'ő' -> 0xB5, 'Ű' -> 0xB6, 'ű' -> 0xB7),
+    Map('♥' -> "\u0001C", '♡' -> "\u0001C", '♢' -> "\u0001D", '♢' -> "\u0001D", '♣' -> "\u0001E", '♠' -> "\u0001F", '·' -> "\u0001G") ,
+    MinimalEscapeSequencesWithBraces ++ Map(
+      "right" -> List(0x1c),
+      "left" -> List(0x1d),
+      "up" -> List(0x1e),
+      "down" -> List(0x1f),
+      "n" -> List(13, 10),
+      "b" -> List(8),
+      "pound" -> List(0x9c),
+      "yen" -> List(0x9d),
+    )
+  )
+
   val MsxRu = new TextCodec("MSX-RU", 0,
     "\ufffd" * 32 +
       (0x20 to 0x7e).map(_.toChar).mkString("") +
@@ -617,12 +670,19 @@ object TextCodec {
       "\ufffd" * 8 +
       "Δ\ufffdω\ufffd\ufffd\ufffd\ufffd\ufffd" +
       "αβΓΠΣσµγΦθΩδ∞∅∈∩" +
-      "≡±≥≤\ufffd\ufffd÷\ufffd\ufffd\ufffd\ufffd\ufffdⁿ²\ufffd\ufffd" +
+      "≡±≥≤\ufffd\ufffd÷\ufffd\ufffd\ufffd\ufffd\ufffdⁿ²\ufffd¤" +
       "юабцдефгхийклмнопярстужвьызшэщчъ" +
       "ЮАБЦДЕФГХИЙКЛМНОПЯРСТУЖВЬЫЗШЭЩ",
     Map('ß' -> 0xA1, '¦' -> 0x7C),
     Map('♥' -> "\u0001C", '♡' -> "\u0001C", '♢' -> "\u0001D", '♢' -> "\u0001D", '♣' -> "\u0001E", '♠' -> "\u0001F", '·' -> "\u0001G"),
-    MinimalEscapeSequencesWithBraces + ("n" -> List(13, 10))
+    MinimalEscapeSequencesWithBraces ++ Map(
+      "right" -> List(0x1c),
+      "left" -> List(0x1d),
+      "up" -> List(0x1e),
+      "down" -> List(0x1f),
+      "b" -> List(8),
+      "n" -> List(13, 10)
+    )
   )
 
   val MsxJp = new TextCodec("MSX-JP", 0,
@@ -660,6 +720,11 @@ object TextCodec {
     ) ++
       StandardHiraganaDecompositions ++ StandardKatakanaDecompositions,
     MinimalEscapeSequencesWithBraces ++ Map(
+      "right" -> List(0x1c),
+      "left" -> List(0x1d),
+      "up" -> List(0x1e),
+      "down" -> List(0x1f),
+      "b" -> List(8),
       "n" -> List(13, 10),
       "yen" -> List(0x5c)
     )
@@ -668,19 +733,33 @@ object TextCodec {
   val lossyAlternatives: Map[Char, List[String]] = {
     val allowLowercase: Map[Char, List[String]] = ('A' to 'Z').map(c => c -> List(c.toString.toLowerCase(Locale.ROOT))).toMap
     val allowUppercase: Map[Char, List[String]] = ('a' to 'z').map(c => c -> List(c.toString.toUpperCase(Locale.ROOT))).toMap
+    val allowLowercaseCyr: Map[Char, List[String]] = ('а' to 'я').map(c => c -> List(c.toString.toUpperCase(Locale.ROOT))).toMap
+    val allowUppercaseCyr: Map[Char, List[String]] = ('а' to 'я').map(c => c -> List(c.toString.toUpperCase(Locale.ROOT))).toMap
      val ligaturesAndSymbols: Map[Char, List[String]] = Map(
+       // commonly used alternative forms:
        '¦' -> List("|"),
        '|' -> List("¦"),
+       // Eszett:
        'ß' -> List("ss", "SS"),
+       'β' -> List("ß"),
+       // various ligatures:
        'ﬀ' -> List("ff", "FF"),
        'ﬂ' -> List("fl", "FL"),
        'ﬁ' -> List("fi", "FI"),
        'ﬃ' -> List("ffi", "FFI"),
        'ﬄ' -> List("ffl", "FFL"),
+       'ĳ' -> List("ij", "IJ"),
+       'Ĳ' -> List("IJ", "ij"),
+       // fractions:
        '½' -> List("1/2"),
        '¼' -> List("1/4"),
        '¾' -> List("3/4"),
+       // currencies:
+       '₧' -> List("Pt", "PT"),
+       '¢' -> List("c", "C"),
+       '$' -> List("¤"),
        '¥' -> List("Y", "y"),
+       // kanji:
        '円' -> List("¥", "Y", "y"),
        '年' -> List("Y", "y"),
        '月' -> List("M", "m"),
@@ -688,11 +767,13 @@ object TextCodec {
        '時' -> List("h", "H"),
        '分' -> List("m", "M"),
        '秒' -> List("s", "S"),
+       // card suits:
        '♥' -> List("H", "h"),
        '♠' -> List("S", "s"),
        '♡' -> List("H", "h"),
        '♢' -> List("D", "d"),
        '♣' -> List("C", "c"),
+       // Eastern punctuation:
        '。' -> List("."),
        '、' -> List(","),
        '・' -> List("-"),
@@ -701,33 +782,52 @@ object TextCodec {
        '」' -> List("]", ")"),
        '。' -> List("."),
        '。' -> List("."),
+       // quote marks:
+       '«' -> List("\""),
+       '»' -> List("\""),
+       '‟' -> List("\""),
+       '”' -> List("\""),
+       '„' -> List("\""),
+       '’' -> List("\'"),
+       '‘' -> List("\'"),
+       // pi:
+       'π' -> List("Π"),
+       'Π' -> List("π"),
+       // alternative symbols:
        '^' -> List("↑"),
        '↑' -> List("^"),
        '‾' -> List("~"),
        '¯' -> List("~"),
-       '«' -> List("\""),
-       '»' -> List("\""),
        '§' -> List("#"),
        '[' -> List("("),
        ']' -> List(")"),
        '{' -> List("("),
        '}' -> List(")"),
-       '§' -> List("#"),
-       '§' -> List("#"),
-       '©' -> List("(C)"),
-       'İ' -> List("I", "i"),
+       '©' -> List("(C)", "(c)"),
+       '®' -> List("(R)", "(r)"),
+       '‰' -> List("%."),
+       '×' -> List("x"),
+       '÷' -> List("/"),
        'ª' -> List("a", "A"),
        'º' -> List("o", "O"),
-       '‰' -> List("%."),
-       '÷' -> List("/"),
-       'ĳ' -> List("ij", "IJ"),
-       'Ĳ' -> List("IJ", "ij"),
+       // Turkish I with dot:
+       'İ' -> List("I", "i"),
+       // partially supported Russian letters:
+       'ё' -> List("е", "Ё", "Е"),
+       'Ё' -> List("Е", "ё", "е"),
+       'Ъ' -> List("ъ"),
+       'ъ' -> List("Ъ"),
+       // Latin lookalikes for Cyrillic:
+       'і' -> List("i", "I"),
+       'І' -> List("I", "i"),
+       'ј' -> List("j", "J"),
+       'Ј' -> List("J", "j"),
      )
     val accentedLetters: Map[Char, List[String]] = List(
       "áàäãåąāǎă" -> "a",
       "çčċćĉ" -> "c",
-      "đď" -> "d",
-      "ð" -> "dh",
+      "ðď" -> "d",
+      "đ" -> "dj",
       "éèêëęēėě" -> "e",
       "ğǧĝģġ" -> "g",
       "ħĥ" -> "h",
@@ -760,7 +860,7 @@ object TextCodec {
       else fw -> List(hw.toString)
     }.toMap
     val halfWidth = (0xff61 to 0xff9f).map{ c => c.toChar -> List(jisHalfwidthKatakanaOrder(c - 0xff60).toString)}.toMap
-    allowLowercase ++ allowUppercase ++ ligaturesAndSymbols ++ accentedLetters ++ hiragana ++ fullWidth ++ halfWidth
+    allowLowercase ++ allowUppercase ++ allowLowercaseCyr ++ allowUppercaseCyr ++ ligaturesAndSymbols ++ accentedLetters ++ hiragana ++ fullWidth ++ halfWidth
   }
 
 }