Parse character map information

This adds associations between scripts, regions, and character maps, and
parses them in a Go utility.
Dietrich Epp
2022-03-15 13:38:45 -04:00
parent 7bc44f4a5a
commit 022d11fa14
9 changed files with 270 additions and 284 deletions
+7 -1
@@ -2,4 +2,10 @@
This folder contains the script and region definitions for the Mac OS toolbox.
These constants are extracted from the `Script.h` file in Universal Interfaces.
- `extract.py`: Generate `script.csv` and `region.csv` from the `Script.h` file in Mac OS Universal Interfaces. The output of this program is checked in, so it does not need to be run again unless the logic is changed.
- `script.csv`: Constants identifying scripts.
- `region.csv`: Constants identifying localization regions.
- `charmap.csv`: Identifies character maps used by classic Mac OS. Each character map is given a name, a data file in the `../charmap` folder, and the script and, optionally, the regions it corresponds to. This mapping is taken from the readme in the charmap folder. More specific mappings in this file (those that list regions) take precedence over less specific mappings (those that do not). For example, Turkish is more specific than Roman.
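
The precedence rule described for `charmap.csv` can be sketched as follows. This is a minimal illustration, not the code from this commit: the `charmap` type and the `lookup` function are hypothetical names, and the sketch assumes a region appears in at most one row per script.

```go
package main

import "fmt"

// charmap is one row of charmap.csv. The type and field names are
// illustrative assumptions, not taken from the commit.
type charmap struct {
	Name    string
	File    string
	Script  string
	Regions []string // empty means "any region in this script"
}

// lookup returns the most specific character map for a script and region.
// A row that lists the region wins over a row that names only the script.
func lookup(maps []charmap, script, region string) (charmap, bool) {
	var fallback charmap
	found := false
	for _, m := range maps {
		if m.Script != script {
			continue
		}
		if len(m.Regions) == 0 {
			// Script-only row: remember it, but keep looking for a
			// region-specific match.
			if !found {
				fallback, found = m, true
			}
			continue
		}
		for _, r := range m.Regions {
			if r == region {
				return m, true
			}
		}
	}
	return fallback, found
}

func main() {
	maps := []charmap{
		{Name: "Roman", File: "ROMAN.TXT", Script: "smRoman"},
		{Name: "Turkish", File: "TURKISH.TXT", Script: "smRoman", Regions: []string{"verTurkey"}},
	}
	for _, region := range []string{"verTurkey", "verUS"} {
		m, _ := lookup(maps, "smRoman", region)
		fmt.Println(region, "->", m.Name)
	}
}
```

With these two rows, `verTurkey` resolves to Turkish, while any other region in `smRoman` falls back to Roman.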
+24
@@ -0,0 +1,24 @@
Name,File,Script,Regions
Roman,ROMAN.TXT,smRoman,
Turkish,TURKISH.TXT,smRoman,verTurkey
Croatian,CROATIAN.TXT,smRoman,verCroatia;verSlovenian;verYugoCroatian
Icelandic,ICELAND.TXT,smRoman,verIceland;verFaroeIsl
Romanian,ROMANIAN.TXT,smRoman,verRomania
Celtic,CELTIC.TXT,smRoman,verIreland;verScottishGaelic;verManxGaelic;verBreton;verWelsh
Gaelic,GAELIC.TXT,smRoman,verIrishGaelicScript
Greek,GREEK.TXT,smRoman,verGreece
Japanese,JAPANESE.TXT,smJapanese,
Chinese (Traditional),CHINTRAD.TXT,smTradChinese,
Korean,KOREAN.TXT,smKorean,
Arabic,ARABIC.TXT,smArabic,
Farsi,FARSI.TXT,smArabic,verIran
Hebrew,HEBREW.TXT,smHebrew,
Cyrillic,CYRILLIC.TXT,smCyrillic,
Devanagari,DEVANAGA.TXT,smDevanagari,
Gurmukhi,GURMUKHI.TXT,smGurmukhi,
Gujarati,GUJARATI.TXT,smGujarati,
Thai,,smThai,
Chinese (Simplified),CHINSIMP.TXT,smSimpChinese,
Tibetan,,smTibetan,
Inuit,INUIT.TXT,smEthiopic,verNunavut
Central European,CENTEURO.TXT,smCentralEuroRoman,