mirror of https://github.com/KarolS/millfork.git synced 2025-01-12 03:30:09 +00:00
[< back to index](../doc_index.md)
# Text encodings and escape sequences
### Defining custom encodings
Every encoding is defined in a `.tbl` file with an appropriate name.
The file is looked up in the directories on the include path, first directly, then in the `encoding` subdirectory.
TODO: document the file format.
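
The lookup order described above can be modelled as follows. This is a hedged sketch in Python, not the compiler's actual code: `find_encoding_file` is a hypothetical helper, and it assumes that each include directory is checked directly and then in its `encoding` subdirectory before moving on to the next directory. The text does not say whether the compiler interleaves the two checks this way.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_encoding_file(name: str, include_path: Iterable[Path]) -> Optional[Path]:
    """Return the first matching .tbl file, or None if nothing matches."""
    filename = f"{name}.tbl"
    for directory in include_path:
        # Assumption: per directory, try the file directly,
        # then try the `encoding` subdirectory.
        for candidate in (directory / filename, directory / "encoding" / filename):
            if candidate.is_file():
                return candidate
    return None
```

Under this assumption, a custom `myenc.tbl` placed in an earlier include directory shadows one of the same name found later on the path.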
### Text encoding list
* `default` default console encoding (can be omitted)
* `scr` default screencodes
(usually the same as `default`; the Commodore computers are a notable exception)
* `ascii` standard ASCII
* `petscii` or `pet` PETSCII (ASCII-like character set used by Commodore machines from VIC-20 onward)
* `petsciijp` or `petjp` PETSCII as used on Japanese versions of Commodore 64
* `origpetscii` or `origpet` old PETSCII (Commodore PET with original ROMs)
* `oldpetscii` or `oldpet` old PETSCII (Commodore PET with newer ROMs)
* `cbmscr` or `petscr` Commodore screencodes
* `cbmscrjp` or `petscrjp` Commodore screencodes as used on Japanese versions of Commodore 64
* `apple2` original Apple II charset ($A0 to $DF)
* `apple2e` Apple IIe charset
* `apple2c` alternative Apple IIc charset
* `apple2gs` Apple IIgs charset
* `macroman` Macintosh Western Latin charset
* `bbc` BBC Micro character set
* `sinclair` ZX Spectrum character set
* `zx80` ZX80 character set
* `zx81` ZX81 character set
* `jis` or `jisx` JIS X 0201
* `iso_de`, `iso_no`, `iso_se`, `iso_yu` various variants of ISO/IEC-646
* `iso_dk`, `iso_fi` aliases for `iso_no` and `iso_se` respectively
* `dmcs` DEC Multinational Character Set
* `lics` Lotus International Character Set
* `iso8859_1`, `iso8859_2`, `iso8859_3`,
`iso8859_4`, `iso8859_5`, `iso8859_7`,
`iso8859_9`, `iso8859_10`, `iso8859_13`,
`iso8859_14`, `iso8859_15`, `iso8859_16`
ISO 8859-1, ISO 8859-2, ISO 8859-3,
ISO 8859-4, ISO 8859-5, ISO 8859-7,
ISO 8859-9, ISO 8859-10, ISO 8859-13,
ISO 8859-14, ISO 8859-15, ISO 8859-16
* `iso1`, `latin1` aliases for `iso8859_1`
* `iso2`, `latin2` aliases for `iso8859_2`
* `iso3`, `latin3` aliases for `iso8859_3`
* `iso4`, `latin4` aliases for `iso8859_4`
* `iso5` alias for `iso8859_5`
* `iso7` alias for `iso8859_7`
* `iso9`, `latin5` aliases for `iso8859_9`
* `iso10`, `latin6` aliases for `iso8859_10`
* `iso13`, `latin7` aliases for `iso8859_13`
* `iso14`, `latin8` aliases for `iso8859_14`
* `iso15`, `latin9`, `latin0` aliases for `iso8859_15`
* `iso16`, `latin10` aliases for `iso8859_16`
* `brascii` BraSCII
* `cp437`, `cp850`, `cp851`, `cp852`, `cp855`, `cp858`, `cp866`
DOS codepages 437, 850, 851, 852, 855, 858, 866
* `mazovia` Mazovia encoding
* `kamenicky` Kamenický encoding
* `cp1250`, `cp1251`, `cp1252` Windows codepages 1250, 1251, 1252
* `msx_intl`, `msx_jp`, `msx_ru`, `msx_br` MSX character encoding, International, Japanese, Russian and Brazilian respectively
* `msx_us`, `msx_uk`, `msx_fr`, `msx_de` aliases for `msx_intl`
* `cpc_en`, `cpc_fr`, `cpc_es`, `cpc_da` Amstrad CPC character encoding, English, French, Spanish and Danish respectively
* `pcw` or `amstrad_cpm` Amstrad CP/M encoding, the US variant (language 0), as used on PCW machines
* `pokemon1en`, `pokemon1jp`, `pokemon1es`, `pokemon1fr` text encodings used in 1st generation Pokémon games,
English, Japanese, Spanish/Italian and French/German respectively
* `pokemon1it`, `pokemon1de` aliases for `pokemon1es` and `pokemon1fr` respectively
* `atascii` or `atari` ATASCII as seen on Atari 8-bit computers
* `atasciiscr` or `atariscr` screencodes used by Atari 8-bit computers
* `koi7n2` or `short_koi` KOI-7 N2
* `koi8r`, `koi8u`, `koi8ru`, `koi8e`, `koi8f`, `koi8t` various variants of KOI-8
* `vectrex` built-in Vectrex font
* `galaksija` text encoding used on Galaksija computers
* `trs80m1` text encoding used on TRS-80 Model 1
* `trs80m3` text encoding used on TRS-80 Model 3
* `coco` text encoding used on Tandy Color Computer
* `cocoscr` Tandy Color Computer screencodes
* `z1013` text encoding used on Robotron Z1013
* `ebcdic` EBCDIC codepage 037 (partial coverage)
* `utf8` UTF-8
* `utf16be`, `utf16le` UTF-16BE and UTF-16LE
When programming for Commodore,
use `petscii` for strings you're printing using standard I/O routines
and `petscr` for strings you're copying to screen memory directly.
When programming for Atari,
use `atascii` for strings you're printing using standard I/O routines
and `atasciiscr` for strings you're copying to screen memory directly.
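
The reason for keeping the two kinds of encoding apart is that the same glyph is stored as a different byte depending on whether it goes through the I/O routines or straight into screen memory. The sketch below is a Python model of that difference for Commodore machines, not Millfork code and not the compiler's implementation. `petscii_to_screencode` is a hypothetical helper, and it deliberately covers only the PETSCII range $20 to $5F, where the mapping is simple.

```python
def petscii_to_screencode(byte: int) -> int:
    """Convert a PETSCII byte in $20-$5F to the matching Commodore screencode."""
    if 0x20 <= byte <= 0x3F:
        # Space, digits and most punctuation keep their value.
        return byte
    if 0x40 <= byte <= 0x5F:
        # '@', the letters and a few symbols move down to $00-$1F.
        return byte - 0x40
    raise ValueError(f"byte ${byte:02X} is outside the range this sketch covers")


# The PETSCII bytes $48 $45 $4C $4C $4F become screencodes $08 $05 $0C $0C $0F.
screen_bytes = [petscii_to_screencode(b) for b in (0x48, 0x45, 0x4C, 0x4C, 0x4F)]
```

Writing the PETSCII bytes directly to screen memory would therefore show the wrong glyphs, which is why a separate screencode encoding exists.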
### Escape sequences
Escape sequences let you include characters in string literals that would otherwise be impossible to type.
Some escape sequences may expand to multiple characters. For example, in several encodings `{n}` expands to `{x0D}{x0A}`.
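
As an illustration of how one escape sequence can turn into several bytes, here is a hedged Python model of the expansion. It is not the compiler's implementation: `expand_escapes` is a hypothetical function, plain characters are assumed to be ASCII, and the table of named escapes is supplied by the caller because it differs between encodings.

```python
import re
from typing import Mapping

_ESCAPE = re.compile(r"\{([^{}]+)\}")
_HEX = re.compile(r"x([0-9a-fA-F]{2})")


def expand_escapes(text: str, named: Mapping[str, bytes]) -> bytes:
    """Expand {xNN} and named escapes; `named` models one encoding's table."""
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(text):
        # Copy the plain text before the escape (assumed to be ASCII here).
        out += text[pos:match.start()].encode("ascii")
        name = match.group(1)
        hex_match = _HEX.fullmatch(name)
        if hex_match:
            out.append(int(hex_match.group(1), 16))
        elif name in named:
            out += named[name]  # may be more than one byte
        else:
            raise ValueError(f"escape sequence {{{name}}} is not available")
        pos = match.end()
    out += text[pos:].encode("ascii")
    return bytes(out)


# A table for a hypothetical encoding in which {n} is CR LF.
crlf_table = {"n": b"\x0d\x0a"}
```

With that table, `expand_escapes("OK{n}", crlf_table)` yields the four bytes `4F 4B 0D 0A`, while `expand_escapes("A{x00}B", {})` yields `41 00 42`.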
##### Available everywhere
* `{x00}` to `{xff}` a character of the given hexadecimal value
* `{copyright_year}` this expands to the current year in digits
* `{program_name}` this expands to the name of the output file without the file extension
* `{program_name_upper}` the same, but uppercased
* `{nullchar}` the null terminator for strings (`"{nullchar}"` is equivalent to `""z`).
    The exact value of `{nullchar}` is encoding-dependent:
    * in the `vectrex` encoding it's `{x80}`,
    * in the `zx80` encoding it's `{x01}`,
    * in the `zx81` encoding it's `{x0b}`,
    * in the `petscr` and `petscrjp` encodings it's `{xe0}`,
    * in the `atasciiscr` encoding it's `{xdb}`,
    * in the `pokemon1*` encodings it's `{x50}`,
    * in the `cocoscr` encoding it's `{xd0}`,
    * in the `utf16be` and `utf16le` encodings it's exceptionally two bytes: `{x00}{x00}`,
    * in other encodings it's `{x00}` (this may be subject to change in future versions).
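
The list above amounts to a lookup table with a default. A minimal Python sketch of it follows; `nullchar` and `NULLCHAR` are hypothetical names, only the primary encoding names from the list are included (aliases such as `cbmscr` or `atariscr` are not resolved), and the values are copied from the list rather than taken from the compiler.

```python
# Terminator bytes per encoding, as listed above.
NULLCHAR = {
    "vectrex": b"\x80",
    "zx80": b"\x01",
    "zx81": b"\x0b",
    "petscr": b"\xe0",
    "petscrjp": b"\xe0",
    "atasciiscr": b"\xdb",
    "cocoscr": b"\xd0",
    "utf16be": b"\x00\x00",
    "utf16le": b"\x00\x00",
}


def nullchar(encoding: str) -> bytes:
    """Return the string terminator for an encoding name (aliases not handled)."""
    if encoding.startswith("pokemon1"):
        return b"\x50"
    # Every encoding not listed explicitly uses a single zero byte.
    return NULLCHAR.get(encoding, b"\x00")
```

Code that scans for the end of a string should compare against this value instead of assuming a zero byte.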
##### Available only in some encodings
* `{apos}` apostrophe/single quote (available everywhere except for `zx80`, `zx81` and `galaksija`)
* `{q}` double quote symbol (available everywhere except for `pokemon1*` encodings)
* `{n}` new line
* `{b}` backspace
* `{lbrace}`, `{rbrace}` opening and closing curly brace (only in encodings that support braces)
* `{up}`, `{down}`, `{left}`, `{right}` control codes for moving the cursor
* `{white}`, `{black}`, `{red}`, `{green}`, `{blue}`, `{cyan}`, `{yellow}`, `{purple}`
control codes for changing the text color (`petscii`, `petsciijp`, `sinclair` only)
* `{bgwhite}`, `{bgblack}`, `{bgred}`, `{bggreen}`, `{bgblue}`, `{bgcyan}`, `{bgyellow}`, `{bgpurple}`
control codes for changing the text background color (`sinclair` only)
* `{reverse}`, `{reverseoff}` inverted mode on/off
* `{yen}`, `{pound}`, `{cent}`, `{euro}`, `{copy}` yen symbol, pound symbol, cent symbol, euro symbol, copyright symbol
* `{nbsp}`, `{shy}` non-breaking space, soft hyphen
* `{pi}` letter π
* `{u0000}` to `{u1fffff}` Unicode codepoint (available in UTF encodings only)
##### Character availability
For ISO/DOS/Windows/UTF encodings, consult external sources.
Encoding | lowercase letters | backslash | currencies | intl | card suits
---------|-------------------|-----------|------------|------|-----------
`pet` | yes¹ | no | £ | none | yes¹
`origpet` | yes¹ | yes | | none | yes¹
`oldpet` | yes² | yes | | none | yes²
`petscr` | yes¹ | no | £ | none | yes¹
`petjp` | no | no | ¥ | katakana³ | yes³
`petscrjp` | no | no | ¥ | katakana³ | yes³
`sinclair`, `bbc` | yes | yes | £ | none | no
`zx80`, `zx81` | no | no | £ | none | no
`apple2` | no | yes | | none | no
`atascii` | yes | yes | | none | yes
`atasciiscr` | yes | yes | | none | yes
`z1013` | yes | yes | | none | yes
`jis` | yes | no | ¥ | both kana | no
`dmcs`,`lics` | yes | yes | ¢£¥ | Western | no
`brascii`,`macroman`| yes | yes | ¢£¥ | Western | no
`msx_intl`,`msx_br` | yes | yes | ¢£¥ | Western | yes
`msx_jp` | yes | no | ¥ | katakana | yes
`msx_ru` | yes | yes | | Russian⁴ | yes
`koi7n2` | no | yes | | Russian⁵ | no
`koi8*` | yes | yes | | Russian | no
`cpc_en` | yes | yes | £ | none | yes
`cpc_es` | yes | yes | | Spanish⁶ | yes
`cpc_fr` | yes | no | £ | French⁷ | yes
`cpc_da` | yes | no | £ | Nor/Dan. | yes
`vectrex` | no | yes | | none | no
`coco`,`cocoscr` | no | yes | | none | no
`pokemon1jp` | no | no | | both kana | no
`pokemon1en` | yes | no | | none | no
`pokemon1fr` | yes | no | | Ger/Fre. | no
`pokemon1es` | yes | no | | Spa/Ita. | no
`galaksija` | no | no | | Yugoslav⁸ | no
1. `pet`, `origpet` and `petscr` cannot display card suit symbols and lowercase letters at the same time.
Card suit symbols are only available in graphics mode,
in which lowercase letters are displayed as uppercase and uppercase letters are displayed as symbols.
2. `oldpet` cannot display card suit symbols and lowercase letters at the same time.
Card suit symbols are only available in graphics mode, in which lowercase letters are displayed as symbols.
3. `petjp` and `petscrjp` cannot display card suit symbols and katakana at the same time.
Card suit symbols are only available in graphics mode, in which katakana is displayed as symbols.
4. Letter **Ё** and uppercase **Ъ** are not available.
5. Only uppercase. Letters **Ё** and **Ъ** are not available.
6. No accented vowels.
7. Some accented vowels are not available.
8. Letter **Đ** is not available.
If the encoding does not support lowercase letters (e.g. `apple2`, `petjp`, `petscrjp`, `koi7n2`, `vectrex`),
then text and character literals containing lowercase letters are automatically converted to uppercase.
Only unaccented Latin and Cyrillic letters will be converted as such.
Accented Latin letters will not be converted and will fail to compile without `-flenient-encoding`.
To detect if your default encoding does not support lowercase letters, test `'A' == 'a'`.
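
The conversion rule can be pictured with the following Python model. It is a sketch under stated assumptions, not the compiler's logic: `to_uppercase_only` is a hypothetical function, "unaccented Cyrillic" is assumed to mean the 32 letters from U+0430 to U+044F (so **ё** is treated as outside the set), and only the strict behaviour without `-flenient-encoding` is modelled.

```python
LATIN = "abcdefghijklmnopqrstuvwxyz"
# Assumption: the unaccented Cyrillic letters are U+0430..U+044F.
CYRILLIC = "".join(chr(code) for code in range(0x0430, 0x0450))


def to_uppercase_only(text: str) -> str:
    """Model a literal being compiled for an encoding without lowercase letters."""
    out = []
    for ch in text:
        if ch in LATIN or ch in CYRILLIC:
            out.append(ch.upper())  # converted automatically
        elif ch.islower():
            # Other lowercase letters (e.g. accented Latin) are not converted.
            raise ValueError(f"cannot encode {ch!r} without lenient mode")
        else:
            out.append(ch)
    return "".join(out)
```

In this model `to_uppercase_only("a")` equals `"A"`, which mirrors why the `'A' == 'a'` test works for detecting such encodings.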
##### Escape sequence availability
The table below may be incomplete.
Encoding | new line | braces | backspace | cursor movement | text colour | reverse | background colour
---------|----------|--------|-----------|-----------------|-------------|---------|------------------
`pet`,`petjp` | yes | no | no | yes | yes | yes | no
`origpet` | yes | no | no | yes | no | yes | no
`oldpet` | yes | no | no | yes | no | yes | no
`petscr`, `petscrjp`| no | no | no | no | no | no | no
`sinclair` | yes | yes | no | yes | yes | yes | yes
`zx80`,`zx81` | yes | no | yes | yes | no | no | no
`ascii`, `iso_*` | yes | yes | yes | no | no | no | no
`iso8859_*`, `cp*` | yes | yes | yes | no | no | no | no
`apple2` | no | no | no | no | no | no | no
`apple2e` | no | yes | no | no | no | no | no
`apple2gs` | no | yes | no | no | no | no | no
`atascii` | yes | no | yes | yes | no | no | no
`atasciiscr` | no | no | no | no | no | no | no
`msx_*` | yes | yes | yes | yes | no | no | no
`koi7n2` | yes | no | yes | no | no | no | no
`koi8*` | yes | yes | yes | no | no | no | no
`vectrex` | no | no | no | no | no | no | no
`coco` | yes | no | yes | no | no | no | no
`cocoscr` | no | no | no | no | no | no | no
`utf*` | yes | yes | yes | no | no | no | no
all the rest | yes | yes | no | no | no | no | no