mirror of https://github.com/KarolS/millfork.git synced 2026-04-19 10:42:10 +00:00

Text encoding improvements

Karol Stasiak
2018-07-07 00:58:44 +02:00
parent 265f729b24
commit 2c8de8b6a5
31 changed files with 329 additions and 54 deletions
+6 -2
@@ -81,14 +81,18 @@ Default: no if targeting Ricoh, yes otherwise.
* `-fvariable-overlap`, `-fno-variable-overlap` Whether variables should overlap if their scopes do not intersect.
Default: yes.
* `-fbounds-checking`, `-fnobounds-checking` Whether should insert bounds checking on array access.
* `-fbounds-checking`, `-fno-bounds-checking` Whether the compiler should insert bounds checking on array access.
Default: no.
* `-fcompact-dispatch-params`, `-fnocompact-dispatch-params`
* `-fcompact-dispatch-params`, `-fno-compact-dispatch-params`
Whether parameter values in return dispatch statements may overlap other objects.
This may cause problems if the parameter table is stored next to a hardware register that has side effects when reading.
`.ini` equivalent: `compact_dispatch_params`. Default: yes.
* `-flenient-encoding`, `-fno-lenient-encoding`
Whether the compiler should allow for invalid characters in string/character literals that use the default encodings and replace them with alternatives.
`.ini` equivalent: `lenient_encoding`. Default: no.
## Optimization options
* `-O0` Disable all optimizations.
+9
@@ -26,6 +26,13 @@ Every platform is defined in an `.ini` file with an appropriate name.
* `z80` (Zilog Z80; experimental and very incomplete)
* `encoding` default encoding for console I/O, one of
`ascii`, `pet`/`petscii`, `petscr`/`cbmscr`, `atascii`, `bbc`, `jis`/`jisx`, `apple2`,
`iso_de`, `iso_no`/`iso_dk`, `iso_se`/`iso_fi`, `iso_yu`. Default: `ascii`
* `screen_encoding` default encoding for screencodes (literals with encoding specified as `scr`).
Default: the same as `encoding`.
* `modules` comma-separated list of modules that will be automatically imported
* other compilation options (they can be overridden using commandline options):
@@ -54,6 +61,8 @@ Every platform is defined in an `.ini` file with an appropriate name.
* `inline` - inline functions automatically by default, default is `false`.
* `ipo` - enable interprocedural optimization, default is `false`.
* `lenient_encoding` - allow for automatic substitution of invalid characters in string literals using the default encodings, default is `false`.
#### `[allocation]` section
+36 -4
@@ -16,9 +16,10 @@ Hexadecimal: `$D323`, `0x2a2`
## String literals
String literals are surrounded with double quotes and followed by the name of the encoding:
String literals are surrounded with double quotes and optionally followed by the name of the encoding:
"this is a string" ascii
"this is also a string"
Characters between the quotes are interpreted literally,
there are no ways to escape special characters or quotes.
@@ -28,11 +29,16 @@ for compatibility with multiple variants.
Currently available encodings:
* `default` default console encoding (can be omitted)
* `scr` default screencodes
(usually the same as `default`, a notable exception are the Commodore computers)
* `ascii` standard ASCII
* `pet` or `petscii` PETSCII (ASCII-like character set used by Commodore machines)
* `scr` Commodore screencodes
* `cbmscr` or `petscr` Commodore screencodes
* `apple2` Apple II charset ($A0-$FE)
@@ -46,16 +52,42 @@ Currently available encodings:
When programming for Commodore,
use `pet` for strings you're printing using standard I/O routines
and `scr` for strings you're copying to screen memory directly.
and `petscr` for strings you're copying to screen memory directly.
If the characters in the literal cannot be encoded in a particular encoding, an error is raised.
However, if the command-line option `-flenient-encoding` is used,
then literals using `default` and `scr` encodings replace unsupported characters with supported ones
and a warning is issued.
For example, if `-flenient-encoding` is enabled, then a literal `"£¥↑ž©ß"` is equivalent to:
* `"£Y↑z(C)ss"` if the default encoding is `pet`
* `"£Y↑z©ss"` if the default encoding is `bbc`
* `"?Y^z(C)ss"` if the default encoding is `ascii`
* `"?Y^ž(C)ss"` if the default encoding is `iso_yu`
* `"?Y^z(C)ß"` if the default encoding is `iso_de`
* `"?¥^z(C)ss"` if the default encoding is `jisx`
Note that the final length of the string may vary.
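The substitution behaviour described above can be illustrated with a minimal Python sketch. It is not the compiler's actual implementation: the `FALLBACKS` table, the `EXTRA` per-encoding character sets and the function name `encode_lenient` are all hypothetical, reconstructed only from the examples listed above, and the sketch assumes for simplicity that every encoding supports all printable ASCII characters.

```python
import string

# Simplifying assumption: every encoding supports printable ASCII.
PRINTABLE_ASCII = set(string.ascii_letters + string.digits + string.punctuation + " ")

# Hypothetical per-encoding extras, inferred from the documented examples only.
EXTRA = {
    "ascii": "",
    "pet": "£↑",
    "bbc": "£↑©",
    "iso_yu": "ž",
    "iso_de": "ß",
    "jisx": "¥",
}

# Hypothetical replacement table, inferred from the documented examples only.
FALLBACKS = {"£": "?", "¥": "Y", "↑": "^", "ž": "z", "©": "(C)", "ß": "ss"}


def encode_lenient(text, encoding, lenient=False):
    """Return (resulting text, list of warnings) for the given default encoding."""
    supported = PRINTABLE_ASCII | set(EXTRA[encoding])
    out, warnings = [], []
    for ch in text:
        if ch in supported:
            out.append(ch)
        elif lenient and ch in FALLBACKS:
            # Lenient mode: substitute and warn instead of failing.
            out.append(FALLBACKS[ch])
            warnings.append(f"replaced {ch!r} with {FALLBACKS[ch]!r}")
        else:
            # Strict mode (or no known replacement): raise an error.
            raise ValueError(f"cannot encode {ch!r} in {encoding}")
    return "".join(out), warnings


result, warnings = encode_lenient("£¥↑ž©ß", "pet", lenient=True)
print(result)         # the six-character input grows to nine characters
print(len(warnings))  # one warning per replaced character
```

Because a single character such as `©` may expand to `(C)`, the result can be longer than the input, which is why the final length of the string may vary.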
## Character literals
Character literals are surrounded by single quotes and followed by the name of the encoding:
Character literals are surrounded by single quotes and optionally followed by the name of the encoding:
'x' ascii
'W'
From the type system point of view, they are constants of type byte.
If the characters in the literal cannot be encoded in a particular encoding, an error is raised.
However, if the command-line option `-flenient-encoding` is used,
then literals using `default` and `scr` encodings replace unsupported characters with supported ones.
If the replacement is one character long, only a warning is issued, otherwise an error is raised.
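The one-character rule for character literals can be sketched in Python as follows. This is only an illustration under stated assumptions: `encode_char_lenient`, the `FALLBACKS` table and the assumed supported set (printable ASCII) are hypothetical and do not come from the compiler's source.

```python
import string

# Assumption: the default encoding supports exactly printable ASCII.
PRINTABLE_ASCII = set(string.ascii_letters + string.digits + string.punctuation + " ")

# Hypothetical replacement table, taken from the string literal examples.
FALLBACKS = {"ž": "z", "©": "(C)", "ß": "ss"}


def encode_char_lenient(ch, supported=PRINTABLE_ASCII, lenient=False):
    """Return (character, warning or None); raise ValueError on failure."""
    if ch in supported:
        return ch, None
    replacement = FALLBACKS.get(ch) if lenient else None
    if replacement is None:
        raise ValueError(f"cannot encode {ch!r}")
    if len(replacement) != 1:
        # A character literal is a single byte, so a longer replacement cannot fit.
        raise ValueError(f"replacement {replacement!r} for {ch!r} is too long")
    return replacement, f"replaced {ch!r} with {replacement!r}"


print(encode_char_lenient("ž", lenient=True)[0])  # a one-character replacement is accepted
```

In this sketch `'©'` is rejected even in lenient mode, because its replacement `(C)` is three characters long and would not fit in a constant of type byte.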
## Array initialisers
An array is initialized with either: