[< back to index](../doc_index.md)

# Literals and initializers

## Numeric literals

Decimal: `1`, `10`

Binary: `%0101`, `0b101001`

Quaternary: `0q2131`

Octal: `0o172`

Hexadecimal: `$D323`, `0x2a2`

When using Intel syntax for inline assembly, another hexadecimal syntax is available: `0D323H`, `2a2h`.
It is not allowed anywhere else.

The type of a literal is the smallest type of undefined signedness
that can fit either the unsigned or the signed representation of the value:
`200` is a `byte`, `4000` is a `word`, `75000` is an `int24`, etc.

However, padding the literal on the left with zeroes changes the type
to the smallest type that can fit the smallest number with the same number of digits and without padding.
For example, `0002` is of type `word`, because it has four digits and the smallest unpadded four-digit number, 1000, does not fit in one byte.

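
The sizing rule for decimal literals can be modelled in a few lines of Python. This is only a sketch of the rule as described here, not code taken from the compiler; the function names `bytes_needed` and `decimal_literal_size` are made up for this illustration, and other bases are not covered.

```python
def bytes_needed(value: int) -> int:
    """Smallest number of bytes whose unsigned range holds a non-negative value."""
    size = 1
    while value >= 256 ** size:
        size += 1
    return size


def decimal_literal_size(text: str) -> int:
    """Size in bytes of the type given to a decimal literal written as `text`."""
    value = int(text)
    if len(text) > 1 and text.startswith("0"):
        # Zero-padded: size the literal like the smallest unpadded number
        # with the same digit count, so "0002" is sized like 1000.
        value = 10 ** (len(text) - 1)
    return bytes_needed(value)


for literal in ["200", "4000", "75000", "0002"]:
    print(literal, decimal_literal_size(literal))
```

Under this model the four literals get sizes of 1, 2, 3 and 2 bytes, matching `byte`, `word`, `int24` and `word`.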

## String literals

String literals can be used as either array initializers or expressions of type `pointer`.

String literals are equivalent to constant arrays. Writing to them via their pointer is undefined behaviour.

If a string literal is used as an expression, then the text data will be located in the default code segment,
regardless of which code segment the current function is located in. This may be subject to change in future releases.

String literals are surrounded with double quotes and optionally followed by the name of the encoding:

    "this is a string" ascii
    "this is also a string"

If there is no encoding name specified, then the `default` encoding is used.
Two encoding names are special and refer to platform-specific encodings:
`default` and `scr`.


## Zero-terminated strings

You can also append `z` to the name of the encoding to make the string zero-terminated.
This means that the string will have a string terminator appended, usually a single byte.
The exact value of that byte is encoding-dependent:

* in the `vectrex` encoding it's 128,
* in the `zx80` encoding it's 1,
* in the `zx81` encoding it's 11,
* in the `petscr` and `petscrjp` encodings it's 224,
* in the `atascii` encoding it's 219,
* in the `utf16be` and `utf16le` encodings it's exceptionally two bytes: 0, 0,
* in other encodings it's 0 (this may be subject to change in future versions).

For example:

    "this is a zero-terminated string" asciiz
    "this is also a zero-terminated string"z

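
As a rough illustration, the terminator table can be written as a Python lookup. The byte values are copied from the list in this section; the helper names `terminator` and `zero_terminate` are hypothetical, and the text is assumed to be already encoded to bytes.

```python
# Terminator bytes per encoding name, as listed in this section.
TERMINATORS = {
    "vectrex": bytes([128]),
    "zx80": bytes([1]),
    "zx81": bytes([11]),
    "petscr": bytes([224]),
    "petscrjp": bytes([224]),
    "atascii": bytes([219]),
    "utf16be": bytes([0, 0]),
    "utf16le": bytes([0, 0]),
}


def terminator(encoding: str) -> bytes:
    """Terminator for an encoding; every encoding not listed uses a single 0."""
    return TERMINATORS.get(encoding, bytes([0]))


def zero_terminate(encoded_text: bytes, encoding: str) -> bytes:
    """Append the encoding-specific terminator to already-encoded text."""
    return encoded_text + terminator(encoding)


print(list(zero_terminate(b"hi", "ascii")))    # single 0 appended
print(list(zero_terminate(b"hi", "vectrex")))  # 128 appended
```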
The byte constant `nullchar` is defined to be equal to the string terminator in the `default` encoding (or, in other words, to `'{nullchar}'`)
and the byte constant `nullchar_scr` is defined to be equal to the string terminator in the `scr` encoding (`'{nullchar}'scr`).

You can override the values for `nullchar` and `nullchar_scr`
by defining preprocessor features `NULLCHAR` and `NULLCHAR_SCR` respectively.

Warning: If you define UTF-16 to be your default or screen encoding, you will encounter several problems:

* `nullchar` and `nullchar_scr` will still be bytes, equal to zero.
* the `string` module in the Millfork standard library will not work correctly.

## Length-prefixed strings (Pascal strings)

You can also prepend `p` to the name of the encoding to make the string length-prefixed.

The length is measured in bytes and doesn't include the zero terminator, if present.
In all encodings except for UTF-16 the prefix takes one byte,
which means that length-prefixed strings cannot be longer than 255 bytes.

In case of UTF-16, the length prefix contains the number of code units,
that is, the number of bytes divided by two.
The length is stored as two bytes, which allows for much longer strings (up to 65535 code units),
and is always little-endian,
even in case of the `utf16be` encoding or a big-endian processor.

    "this is a Pascal string" pascii
    "this is also a Pascal string"p
    "this is a zero-terminated Pascal string"pz

Note: A string that's both length-prefixed and zero-terminated does not count as a normal zero-terminated string!
To pass it to a function that expects a zero-terminated string, add 1 (or, in case of UTF-16, 2):

    pointer p
    p = "test"pz
    // putstrz(p) // won't work correctly
    putstrz(p + 1) // ok

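
The byte layout described in this section can be sketched in Python. This is a model under stated assumptions, not compiler code: `pascal_bytes` is a hypothetical helper, ASCII stands in for any single-byte encoding, and the terminator is taken to be 0 (or 0, 0 for UTF-16).

```python
def pascal_bytes(text: str, utf16: bool = False, big_endian: bool = False,
                 zero_terminated: bool = False) -> bytes:
    """Build the bytes of a length-prefixed string literal."""
    if utf16:
        payload = text.encode("utf-16-be" if big_endian else "utf-16-le")
        # The prefix counts code units and is always little-endian.
        prefix = (len(payload) // 2).to_bytes(2, "little")
        end = b"\x00\x00"
    else:
        payload = text.encode("ascii")
        if len(payload) > 255:
            raise ValueError("a one-byte prefix cannot describe more than 255 bytes")
        prefix = bytes([len(payload)])
        end = b"\x00"
    # The terminator, if any, is not counted in the prefix.
    return prefix + payload + (end if zero_terminated else b"")


data = pascal_bytes("test", zero_terminated=True)
print(list(data))      # length 4, then the text, then 0
print(list(data[1:]))  # skipping the prefix leaves a plain zero-terminated string
```

Skipping `len(prefix)` bytes is the Python counterpart of `p + 1` (or `p + 2` for UTF-16) in the Millfork snippet.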

## Escape sequences and miscellaneous compatibility issues

Most characters between the quotes are interpreted literally.
To allow characters that cannot be inserted normally,
each encoding may define escape sequences.
Every encoding is guaranteed to support at least
`{q}` for double quote
and `{apos}` for single quote/apostrophe.

The number of bytes used to represent given characters may differ from the number of characters.
For example, the `petjp`, `msx_jp` and `jis` encodings represent ポ as two separate characters, and therefore two bytes.

For the list of all text encodings and escape sequences, see [this page](./text.md).

In some encodings, multiple characters are mapped to the same byte value,
for compatibility with multiple variants.

If the characters in the literal cannot be encoded in a particular encoding, an error is raised.
However, if the command-line option `-flenient-encoding` is used,
then literals using the `default` and `scr` encodings have their unsupported characters replaced with supported ones
and their unsupported escape sequences skipped, and a warning is issued.
For example, if `-flenient-encoding` is enabled, then a literal `"£¥↑ž©ß"` is equivalent to:

* `"£Y↑z(C)ss"` if the default encoding is `pet`

* `"£Y↑z©ss"` if the default encoding is `bbc`

* `"?Y^z(C)ss"` if the default encoding is `ascii`

* `"?Y^ž(C)ss"` if the default encoding is `iso_yu`

* `"?Y^z(C)ß"` if the default encoding is `iso_de`

* `"?¥^z(C)ss"` if the default encoding is `jisx`

* `"£¥^z(C)β"` if the default encoding is `msx_intl`

Note that the final length of the string may vary.

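
To make the length change concrete, here is a small Python model of the `ascii` row of the list above. The replacement table is inferred character by character from that one example and is not the compiler's actual table; `lenient_ascii` is a hypothetical name.

```python
# Replacements read off the `ascii` example above (an assumption, not the real table).
ASCII_REPLACEMENTS = {
    "£": "?",
    "¥": "Y",
    "↑": "^",
    "ž": "z",
    "©": "(C)",
    "ß": "ss",
}


def lenient_ascii(text: str) -> str:
    """Replace characters outside ASCII using the table; unknown ones raise KeyError."""
    return "".join(ch if ord(ch) < 128 else ASCII_REPLACEMENTS[ch] for ch in text)


original = "£¥↑ž©ß"
replaced = lenient_ascii(original)
print(replaced, len(original), len(replaced))
```

Six input characters become nine output characters here, because `©` and `ß` expand to multi-character replacements.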

## Character literals

Character literals are surrounded by single quotes and optionally followed by the name of the encoding:

    'x' ascii
    'W'

Character literals have to be separated from preceding operators with whitespace:

    a='a' // wrong
    a = 'a' // ok

From the type system point of view, they are constants of type `byte`.

If the character cannot be represented as one byte, an error is raised.

For the list of all text encodings and escape sequences, see [this page](./text.md).

If the character in the literal cannot be encoded in a particular encoding, an error is raised.
However, if the command-line option `-flenient-encoding` is used,
then literals using the `default` and `scr` encodings have unsupported characters replaced with supported ones.
If the replacement is one character long, only a warning is issued; otherwise an error is raised.

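
The one-character restriction can be illustrated with a short Python sketch. It assumes an ASCII target and a small replacement table inferred from the `ascii` example in the string literal section; both the table and the name `lenient_char` are illustrative only.

```python
import warnings

# Assumed replacements for an ASCII target (illustrative, not the real table).
ASCII_REPLACEMENTS = {"£": "?", "¥": "Y", "ž": "z", "©": "(C)", "ß": "ss"}


def lenient_char(ch: str) -> int:
    """Byte value of a character literal under lenient encoding rules."""
    if ord(ch) < 128:
        return ord(ch)
    replacement = ASCII_REPLACEMENTS.get(ch)
    if replacement is None or len(replacement) != 1:
        # A multi-character replacement cannot fit in a single byte.
        raise ValueError(f"cannot encode {ch!r} as one byte")
    warnings.warn(f"replacing {ch!r} with {replacement!r}")
    return ord(replacement)


print(lenient_char("ž"))  # replaced by "z", with a warning
```

In this model `'ž'` yields the byte for `z`, while `'©'` raises an error because its replacement `(C)` is three characters long.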

## Struct constructors

You can create a constant of a given struct type by listing constant values of fields as arguments:

    struct point { word x, word y }
    point(5,6)


## Array initializers

An array is initialized with either:

* (only byte arrays) a string literal

* (only byte arrays) a `file` expression

* a `for`-style expression

* (only byte arrays) a format, followed by an array initializer:

    * `@word_le`: for every term of the array initializer, emit two bytes, the first being the low byte of the value and the second being the high byte:
      `@word_le [$1122]` is equivalent to `[$22, $11]`

    * `@word_be`: like the above, but in the opposite order:
      `@word_be [$1122]` is equivalent to `[$11, $22]`

    * `@word`: equivalent to `@word_le` on little-endian architectures and `@word_be` on big-endian architectures

    * `@long`, `@long_le`, `@long_be`: similar, but with four bytes:
      `@long_le [$11223344]` is equivalent to `[$44, $33, $22, $11]`;
      `@long_be [$11223344]` is equivalent to `[$11, $22, $33, $44]`

    * `@struct`: every term of the initializer is interpreted as a struct constructor (see below)
      and treated as a list of bytes with no padding:
      `@struct [s(1, 2)]` is equivalent to `[1, 2]` when `struct s {byte x, byte y}` is defined;
      `@struct [s2(1, 2), s2(3, 4)]` is equivalent to `[1, 0, 2, 0, 3, 0, 4, 0]` on little-endian machines when `struct s2 {word x, word y}` is defined

* a list of literals and/or other array initializers, surrounded by brackets

Examples:

    array a = [1, 2]
    array b = "----" scr
    array c = ["hello world!" ascii, 13]
    array d = file("d.bin")
    array d1 = file("d.bin", 128)
    array e = file("d.bin", 128, 256)
    array f = for x,0,until,8 [x * 3 + 5]  // equivalent to [5, 8, 11, 14, 17, 20, 23, 26]
    array(point) g = [point(2,3), point(5,6)]
    array(point) i = for x,0,until,100 [point(x, x+1)]

Trailing commas (`[1, 2,]`) are not allowed.

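
The `@word` and `@long` formats amount to splitting each value into bytes in a fixed order. A minimal Python model, with `emit` as a hypothetical helper name, reproduces the equivalences stated in the list of formats:

```python
def emit(values, width, byteorder):
    """Flatten integers into a byte list, `width` bytes each, in the given order."""
    out = []
    for value in values:
        out.extend(value.to_bytes(width, byteorder))
    return out


print([hex(b) for b in emit([0x1122], 2, "little")])   # models @word_le [$1122]
print([hex(b) for b in emit([0x1122], 2, "big")])      # models @word_be [$1122]
print([hex(b) for b in emit([0x11223344], 4, "little")])  # models @long_le
# @struct [s2(1, 2), s2(3, 4)] with two word fields, little-endian, no padding:
print(emit([1, 2, 3, 4], 2, "little"))
```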

String literals are laid out in the arrays as-is, flat.
To have an array of pointers to strings, wrap each string in `pointer(...)`:

    // a.length = 12; identical to [$48, $45, $4C, $4C, $4F, 0, $57, $4F, $52, $4C, $44, 0]
    array a = [ "hello"z, "world"z ]
    // b.length = 2
    array(pointer) b = [ pointer("hello"z), pointer("world"z) ]

The parameters for `file` are: file path, optional start offset, optional length
(if only two parameters are present, then the second one is assumed to be the start offset).
The `file` expression is expanded at compile time to an array of bytes equal to the bytes contained in the file.
If the start offset is present, then that many bytes at the start of the file are skipped.
If the length is present, then only that many bytes are taken; otherwise, all bytes until the end of the file are taken.

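
The offset and length semantics of `file` correspond to a simple slice. The following Python sketch models them; `file_bytes` is a hypothetical name, and behaviour for offsets past the end of the file is not specified here, so it is simply whatever slicing gives.

```python
from pathlib import Path
from typing import Optional


def file_bytes(path: str, start: int = 0, length: Optional[int] = None) -> bytes:
    """Bytes of the file, skipping `start` bytes and taking `length` bytes (or the rest)."""
    data = Path(path).read_bytes()
    end = None if length is None else start + length
    return data[start:end]


# Models of the three forms shown earlier (the file name is just an example):
#   file("d.bin")            -> file_bytes("d.bin")
#   file("d.bin", 128)       -> file_bytes("d.bin", 128)
#   file("d.bin", 128, 256)  -> file_bytes("d.bin", 128, 256)
```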

The `for`-style expression has a variable, a starting index, a direction, a final index,
and a parameterizable array initializer.
The initializer is repeated for every value of the variable in the given range.

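
The repetition can be modelled in Python as concatenating one copy of the initializer per loop value. Only the `until` direction from the examples is modelled (final index excluded); `for_until` is a hypothetical helper, and the treatment of multi-element initializers is an assumption based on the word "repeated".

```python
def for_until(start, end, initializer):
    """Concatenate initializer(x) for x = start, start + 1, ..., end - 1."""
    result = []
    for x in range(start, end):
        result.extend(initializer(x))
    return result


# Models: for x,0,until,8 [x * 3 + 5]
print(for_until(0, 8, lambda x: [x * 3 + 5]))
```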

Struct constructors look like a function call, where the callee name is the name of the struct type
and the parameters are the values of fields in the order of declaration.
Fields of arithmetic, pointer and enum types are specified using normal expressions.
Fields of struct types are specified using struct constructors.
Fields of union types cannot be specified.

Certain built-in functions are allowed only in constant expressions,
which can be useful in initializers:

* `sin(x, n)` – returns _n_·**sin**(*x*π/128)

* `cos(x, n)` – returns _n_·**cos**(*x*π/128)

* `tan(x, n)` – returns _n_·**tan**(*x*π/128)

* `min(x,...)` – returns the smallest argument

* `max(x,...)` – returns the largest argument
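
The trigonometric functions use an angle scale in which 256 steps make a full circle. A Python model of the formulas is sketched below; rounding to the nearest integer is an assumption, since the rounding mode is not stated here, and `const_sin` and `const_cos` are hypothetical names.

```python
import math


def const_sin(x: int, n: int) -> int:
    """Model of sin(x, n): n * sin(x * pi / 128), rounded (rounding is assumed)."""
    return round(n * math.sin(x * math.pi / 128))


def const_cos(x: int, n: int) -> int:
    """Model of cos(x, n): n * cos(x * pi / 128), rounded (rounding is assumed)."""
    return round(n * math.cos(x * math.pi / 128))


# One full period sampled at 256 points, as a lookup table might be built.
table = [const_sin(x, 127) for x in range(256)]
print(table[0], table[64], table[128], table[192])
```

The four printed samples correspond to the angles 0, π/2, π and 3π/2.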