Text encodings and escape sequences
Defining custom encodings
Every encoding is defined in a `.tbl` file with an appropriate name.
The file is looked up in the directories on the include path, first directly, then in the `encoding` subdirectory.
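The lookup order can be modelled with a short Python sketch. This is not the compiler's actual code: the function name `find_encoding_table` is hypothetical, and the sketch assumes that each include directory is tried directly and then through its `encoding` subdirectory before moving on to the next directory, which the text above does not spell out.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_encoding_table(name: str, include_path: Iterable[Path]) -> Optional[Path]:
    """Return the first matching .tbl file, or None if nothing is found.

    Assumption: the per-directory order is "directly, then encoding/".
    """
    for directory in include_path:
        candidates = (
            directory / f"{name}.tbl",               # first directly
            directory / "encoding" / f"{name}.tbl",  # then in the encoding subdirectory
        )
        for candidate in candidates:
            if candidate.is_file():
                return candidate
    return None
```

With an include path of `[Path("include")]`, a custom encoding named `myenc` would be searched for as `include/myenc.tbl` and then `include/encoding/myenc.tbl`.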
TODO: document the file format.
Text encoding list
- `default` – default console encoding (can be omitted)
- `scr` – default screencodes (usually the same as `default`, a notable exception are the Commodore computers)
- `ascii` – standard ASCII
- `petscii` or `pet` – PETSCII (ASCII-like character set used by Commodore machines from VIC-20 onward)
- `petsciijp` or `petjp` – PETSCII as used on Japanese versions of Commodore 64
- `origpetscii` or `origpet` – old PETSCII (Commodore PET with original ROMs)
- `oldpetscii` or `oldpet` – old PETSCII (Commodore PET with newer ROMs)
- `cbmscr` or `petscr` – Commodore screencodes
- `cbmscrjp` or `petscrjp` – Commodore screencodes as used on Japanese versions of Commodore 64
- `apple2` – original Apple II charset ($A0–$DF)
- `apple2e` – Apple IIe charset
- `apple2c` – alternative Apple IIc charset
- `apple2gs` – Apple IIgs charset
- `macroman` – Macintosh Western Latin charset
- `bbc` – BBC Micro character set
- `sinclair` – ZX Spectrum character set
- `zx80` – ZX80 character set
- `zx81` – ZX81 character set
- `jis` or `jisx` – JIS X 0201
- `iso_de`, `iso_no`, `iso_se`, `iso_yu` – various variants of ISO/IEC-646
    - `iso_dk`, `iso_fi` – aliases for `iso_no` and `iso_se` respectively
- `dmcs` – DEC Multinational Character Set
- `lics` – Lotus International Character Set
- `iso8859_1`, `iso8859_2`, `iso8859_3`, `iso8859_4`, `iso8859_5`, `iso8859_7`, `iso8859_9`, `iso8859_10`, `iso8859_13`, `iso8859_14`, `iso8859_15`, `iso8859_16` – ISO 8859-1, ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-5, ISO 8859-7, ISO 8859-9, ISO 8859-10, ISO 8859-13, ISO 8859-14, ISO 8859-15, ISO 8859-16
    - `iso1`, `latin1` – aliases for `iso8859_1`
    - `iso2`, `latin2` – aliases for `iso8859_2`
    - `iso3`, `latin3` – aliases for `iso8859_3`
    - `iso4`, `latin4` – aliases for `iso8859_4`
    - `iso5` – alias for `iso8859_5`
    - `iso7` – alias for `iso8859_7`
    - `iso9`, `latin5` – aliases for `iso8859_9`
    - `iso10`, `latin6` – aliases for `iso8859_10`
    - `iso13`, `latin7` – aliases for `iso8859_13`
    - `iso14`, `latin8` – aliases for `iso8859_14`
    - `iso_15`, `latin9`, `latin0` – aliases for `iso8859_15`
    - `iso16`, `latin10` – aliases for `iso8859_16`
- `brascii` – BraSCII
- `cp437`, `cp850`, `cp851`, `cp852`, `cp855`, `cp858`, `cp866` – DOS codepages 437, 850, 851, 852, 855, 858, 866
- `mazovia` – Mazovia encoding
- `kamenicky` – Kamenický encoding
- `cp1250`, `cp1251`, `cp1252` – Windows codepages 1250, 1251, 1252
- `msx_intl`, `msx_jp`, `msx_ru`, `msx_br` – MSX character encoding, International, Japanese, Russian and Brazilian respectively
    - `msx_us`, `msx_uk`, `msx_fr`, `msx_de` – aliases for `msx_intl`
- `cpc_en`, `cpc_fr`, `cpc_es`, `cpc_da` – Amstrad CPC character encoding, English, French, Spanish and Danish respectively
- `pcw` or `amstrad_cpm` – Amstrad CP/M encoding, the US variant (language 0), as used on PCW machines
- `pokemon1en`, `pokemon1jp`, `pokemon1es`, `pokemon1fr` – text encodings used in 1st generation Pokémon games, English, Japanese, Spanish/Italian and French/German respectively
    - `pokemon1it`, `pokemon1de` – aliases for `pokemon1es` and `pokemon1fr` respectively
- `atascii` or `atari` – ATASCII as seen on Atari 8-bit computers
- `atasciiscr` or `atariscr` – screencodes used by Atari 8-bit computers
- `koi7n2` or `short_koi` – KOI-7 N2
- `koi8r`, `koi8u`, `koi8ru`, `koi8e`, `koi8f`, `koi8t` – various variants of KOI-8
- `vectrex` – built-in Vectrex font
- `galaksija` – text encoding used on Galaksija computers
- `coco` – text encoding used on Tandy Color Computer
- `cocoscr` – Tandy Color Computer screencodes
- `ebcdic` – EBCDIC codepage 037 (partial coverage)
- `utf8` – UTF-8
- `utf16be`, `utf16le` – UTF-16BE and UTF-16LE
When programming for Commodore,
use `petscii` for strings you're printing using standard I/O routines
and `petscr` for strings you're copying to screen memory directly.
When programming for Atari,
use `atascii` for strings you're printing using standard I/O routines
and `atasciiscr` for strings you're copying to screen memory directly.
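To illustrate why the two kinds of encodings are not interchangeable, here is a small Python sketch for the Commodore case. The mapping is not taken from this document: it is the commonly documented Commodore 64 relationship between PETSCII and screencodes, restricted to the range $20–$5F (punctuation, digits and unshifted letters). The function name `petscii_to_screencode` is hypothetical.

```python
def petscii_to_screencode(data: bytes) -> bytes:
    """Convert PETSCII bytes in the range $20-$5F to Commodore screencodes.

    Assumption: only the $20-$5F range is handled; other ranges follow
    different rules and are rejected here instead of being guessed.
    """
    out = bytearray()
    for b in data:
        if 0x20 <= b <= 0x3F:
            out.append(b)          # space, punctuation and digits are unchanged
        elif 0x40 <= b <= 0x5F:
            out.append(b - 0x40)   # '@' and letters move down to $00-$1F
        else:
            raise ValueError(f"byte ${b:02X} is outside the range handled by this sketch")
    return bytes(out)


# "HELLO" is 48 45 4C 4C 4F in PETSCII, but 08 05 0C 0C 0F as screencodes.
print(petscii_to_screencode(b"HELLO").hex(" "))
```

A string encoded for the I/O routines would therefore show the wrong glyphs if copied to screen memory unchanged, which is why the screencode encoding exists as a separate choice.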
Escape sequences
Escape sequences let you include characters in string literals that would otherwise be impossible to type.
Some escape sequences may expand to multiple characters. For example, in several encodings `{n}` expands to `{x0D}{x0A}`.
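The expansion can be pictured with a small Python model. It is only a sketch, not the compiler's implementation: the names `NAMED_ESCAPES` and `expand` are hypothetical, plain text is encoded as ASCII as a stand-in for a real encoding table, and the only named escape defined is `{n}`, using the two-byte expansion mentioned above.

```python
import re

# Hypothetical table of named escapes for one encoding.
# Only {n} -> {x0D}{x0A} comes from the text above.
NAMED_ESCAPES = {"n": bytes([0x0D, 0x0A])}

ESCAPE = re.compile(r"\{([^{}]+)\}")
HEX_ESCAPE = re.compile(r"x[0-9a-fA-F]{2}")


def expand(literal: str, named=NAMED_ESCAPES) -> bytes:
    """Expand {xNN} and named escapes; everything else is encoded as ASCII."""
    out = bytearray()
    pos = 0
    for match in ESCAPE.finditer(literal):
        out += literal[pos:match.start()].encode("ascii")
        name = match.group(1)
        if HEX_ESCAPE.fullmatch(name):
            out.append(int(name[1:], 16))   # {x00}-{xff}: one byte of that value
        elif name in named:
            out += named[name]              # may be more than one byte
        else:
            raise ValueError(f"unknown escape sequence {{{name}}}")
        pos = match.end()
    out += literal[pos:].encode("ascii")
    return bytes(out)


print(expand("A{n}B"))  # the single escape {n} contributes two bytes
```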
Available everywhere
- `{x00}`–`{xff}` – a character of the given hexadecimal value
- `{copyright_year}` – this expands to the current year in digits
- `{program_name}` – this expands to the name of the output file without the file extension
- `{program_name_upper}` – the same, but uppercased
- `{nullchar}` – the null terminator for strings (`"{nullchar}"` is equivalent to `""z`). The exact value of `{nullchar}` is encoding-dependent:
    - in the `vectrex` encoding it's `{x80}`,
    - in the `zx80` encoding it's `{x01}`,
    - in the `zx81` encoding it's `{x0b}`,
    - in the `petscr` and `petscrjp` encodings it's `{xe0}`,
    - in the `atasciiscr` encoding it's `{xdb}`,
    - in the `pokemon1*` encodings it's `{x50}`,
    - in the `cocoscr` encoding it's `{xd0}`,
    - in the `utf16be` and `utf16le` encodings it's exceptionally two bytes: `{x00}{x00}`,
    - in other encodings it's `{x00}` (this may be subject to change in future versions).
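The encoding-dependent terminator amounts to a lookup with a default. The following Python sketch restates the values from the list above. The names `NULLCHAR` and `nullchar` are hypothetical, aliases such as `cbmscr` are not resolved, and `cocoscr` is given the single byte `{xd0}` because that is the only value the list shows for it.

```python
# Terminator bytes per encoding, as listed above.
NULLCHAR = {
    "vectrex": b"\x80",
    "zx80": b"\x01",
    "zx81": b"\x0b",
    "petscr": b"\xe0",
    "petscrjp": b"\xe0",
    "atasciiscr": b"\xdb",
    "cocoscr": b"\xd0",        # assumption: the single byte shown in the list
    "utf16be": b"\x00\x00",    # two bytes
    "utf16le": b"\x00\x00",    # two bytes
}


def nullchar(encoding: str) -> bytes:
    """Return the bytes that {nullchar} stands for in the given encoding."""
    if encoding.startswith("pokemon1"):   # covers the pokemon1* family
        return b"\x50"
    return NULLCHAR.get(encoding, b"\x00")  # all other encodings use {x00}


def terminated(payload: bytes, encoding: str) -> bytes:
    """Model of a z-suffixed literal: already-encoded payload plus terminator."""
    return payload + nullchar(encoding)
```

Code that scans for the end of a string should therefore compare against the terminator of the encoding the string was written in, not against a hard-coded zero.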
Available only in some encodings
- `{apos}` – apostrophe/single quote (available everywhere except for `zx80`, `zx81` and `galaksija`)
- `{q}` – double quote symbol (available everywhere except for `pokemon1*` encodings)
- `{n}` – new line
- `{b}` – backspace
- `{lbrace}`, `{rbrace}` – opening and closing curly brace (only in encodings that support braces)
- `{up}`, `{down}`, `{left}`, `{right}` – control codes for moving the cursor
- `{white}`, `{black}`, `{red}`, `{green}`, `{blue}`, `{cyan}`, `{yellow}`, `{purple}` – control codes for changing the text color (`petscii`, `petsciijp`, `sinclair` only)
- `{bgwhite}`, `{bgblack}`, `{bgred}`, `{bggreen}`, `{bgblue}`, `{bgcyan}`, `{bgyellow}`, `{bgpurple}` – control codes for changing the text background color (`sinclair` only)
- `{reverse}`, `{reverseoff}` – inverted mode on/off
- `{yen}`, `{pound}`, `{cent}`, `{euro}`, `{copy}` – yen symbol, pound symbol, cent symbol, euro symbol, copyright symbol
- `{nbsp}`, `{shy}` – non-breaking space, soft hyphen
- `{pi}` – letter π
- `{u0000}`–`{u1fffff}` – Unicode codepoint (available in UTF encodings only)
Character availability
For ISO/DOS/Windows/UTF encodings, consult external sources.
Encoding | lowercase letters | backslash | currencies | intl | card suits |
---|---|---|---|---|---|
`pet` | yes¹ | no | £ | none | yes¹ |
`origpet` | yes¹ | yes | | none | yes¹ |
`oldpet` | yes² | yes | | none | yes² |
`petscr` | yes¹ | no | £ | none | yes¹ |
`petjp` | no | no | ¥ | katakana³ | yes³ |
`petscrjp` | no | no | ¥ | katakana³ | yes³ |
`sinclair`, `bbc` | yes | yes | £ | none | no |
`zx80`, `zx81` | no | no | £ | none | no |
`apple2` | no | yes | | none | no |
`atascii` | yes | yes | | none | yes |
`atasciiscr` | yes | yes | | none | yes |
`jis` | yes | no | ¥ | both kana | no |
`dmcs`, `lics` | yes | yes | ¢£¥ | Western | no |
`brascii`, `macroman` | yes | yes | ¢£¥ | Western | no |
`msx_intl`, `msx_br` | yes | yes | ¢£¥ | Western | yes |
`msx_jp` | yes | no | ¥ | katakana | yes |
`msx_ru` | yes | yes | | Russian⁴ | yes |
`koi7n2` | no | yes | | Russian⁵ | no |
`koi8*` | yes | yes | | Russian | no |
`cpc_en` | yes | yes | £ | none | yes |
`cpc_es` | yes | yes | | Spanish⁶ | yes |
`cpc_fr` | yes | no | £ | French⁷ | yes |
`cpc_da` | yes | no | £ | Nor/Dan. | yes |
`vectrex` | no | yes | | none | no |
`coco`, `cocoscr` | no | yes | | none | no |
`pokemon1jp` | no | no | | both kana | no |
`pokemon1en` | yes | no | | none | no |
`pokemon1fr` | yes | no | | Ger/Fre. | no |
`pokemon1es` | yes | no | | Spa/Ita. | no |
`galaksija` | no | no | | Yugoslav⁸ | no |

1. `pet`, `origpet` and `petscr` cannot display card suit symbols and lowercase letters at the same time. Card suit symbols are only available in graphics mode, in which lowercase letters are displayed as uppercase and uppercase letters are displayed as symbols.
2. `oldpet` cannot display card suit symbols and lowercase letters at the same time. Card suit symbols are only available in graphics mode, in which lowercase letters are displayed as symbols.
3. `petjp` and `petscrjp` cannot display card suit symbols and katakana at the same time. Card suit symbols are only available in graphics mode, in which katakana is displayed as symbols.
4. Letter Ё and uppercase Ъ are not available.
5. Only uppercase. Letters Ё and Ъ are not available.
6. No accented vowels.
7. Some accented vowels are not available.
8. Letter Đ is not available.

If the encoding does not support lowercase letters (e.g. `apple2`, `petjp`, `petscrjp`, `koi7n2`, `vectrex`),
then text and character literals containing lowercase letters are automatically converted to uppercase.
Only unaccented Latin and Cyrillic letters will be converted as such.
Accented Latin letters will not be converted and will fail to compile without `-flenient-encoding`.
To detect if your default encoding does not support lowercase letters, test `'A' == 'a'`.
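The conversion rule can be modelled in a few lines of Python. This is a sketch under assumptions, not the compiler's logic: the function name `fold_to_uppercase` is hypothetical, "unaccented Cyrillic" is taken to mean the basic range а–я, and the `?` placeholder used in lenient mode is invented for illustration, since the text does not say what `-flenient-encoding` substitutes.

```python
def fold_to_uppercase(text: str, lenient: bool = False) -> str:
    """Model of literal handling in an encoding without lowercase letters."""
    out = []
    for ch in text:
        if "a" <= ch <= "z" or "\u0430" <= ch <= "\u044f":
            out.append(ch.upper())   # unaccented Latin or basic Cyrillic: converted
        elif ch.islower():
            # Any other lowercase letter (for example an accented one) is not converted.
            if not lenient:
                raise ValueError(f"cannot encode {ch!r} without lenient mode")
            out.append("?")          # hypothetical placeholder, see the assumptions above
        else:
            out.append(ch)
    return "".join(out)


# In such an encoding 'A' and 'a' end up as the same character,
# which is what the 'A' == 'a' test relies on.
print(fold_to_uppercase("a") == fold_to_uppercase("A"))
```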
Escape sequence availability
The table below may be incomplete.
Encoding | new line | braces | backspace | cursor movement | text colour | reverse | background colour |
---|---|---|---|---|---|---|---|
`pet`, `petjp` | yes | no | no | yes | yes | yes | no |
`origpet` | yes | no | no | yes | no | yes | no |
`oldpet` | yes | no | no | yes | no | yes | no |
`petscr`, `petscrjp` | no | no | no | no | no | no | no |
`sinclair` | yes | yes | no | yes | yes | yes | yes |
`zx80`, `zx81` | yes | no | yes | yes | no | no | no |
`ascii`, `iso_*` | yes | yes | yes | no | no | no | no |
`iso8859_*`, `cp*` | yes | yes | yes | no | no | no | no |
`apple2` | no | yes | no | no | no | no | no |
`apple2` | no | no | no | no | no | no | no |
`apple2e` | no | yes | no | no | no | no | no |
`apple2gs` | no | yes | no | no | no | no | no |
`atascii` | yes | no | yes | yes | no | no | no |
`atasciiscr` | no | no | no | no | no | no | no |
`msx_*` | yes | yes | yes | yes | no | no | no |
`koi7n2` | yes | no | yes | no | no | no | no |
`koi8*` | yes | yes | yes | no | no | no | no |
`vectrex` | no | no | no | no | no | no | no |
`coco` | yes | no | yes | no | no | no | no |
`cocoscr` | no | no | no | no | no | no | no |
`utf*` | yes | yes | yes | no | no | no | no |
all the rest | yes | yes | no | no | no | no | no |