document the long type

2026-04-19 20:16:51 +00:00 · 2025-10-04 22:17:51 +02:00
parent e63921009c
commit 71ffbe2ba7
8 changed files with 73 additions and 45 deletions
@@ -34,11 +34,14 @@ No linker

 Data types
 ----------
- There are byte, word (16 bits) and float datatypes for numbers. There are no bigger integer types natively available.
- There is no automatic type enlargement: calculations remain within the data type of the operands. Any overflow silently wraps or truncates.
+- There are byte, word (16 bits), long (32 bits) and float datatypes for numbers.
+- floats are available as a native data type on systems that have a supported floating point library in ROM.
+- **There is no automatic type enlargement:** all calculations remain within the data type of the operands. Any overflow silently wraps or truncates.
  You'll have to add explicit casts to increase the size of the value if required.
  For example when adding two byte variables having values 100 and 200, the result won't be 300, because that doesn't fit in a byte. It will be 44.
  You'll have to cast one or both of the *operands* to a word type first if you want to accomodate the actual result value of 300.
+  Similarly, ``long v = w1 * w2`` doesn't automatically give you the full 32 bits multiplication result: the calculation is still constrained to the word range.
+  If you need the full 32 bits result you'll have to call a specialized routine such as ``math.mul32()`` or ``math.mul16_last_upper()``.
 - strings and arrays are allocated once, statically, and never resized.
 - strings and arrays are mutable: you can change their contents, but always keep the original storage size in mind to avoid overwriting memory outside of the buffer.
 - maximum string length is 255 characters + a trailing 0 byte.
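The "no automatic type enlargement" rule above can be sketched as a small program. This is a minimal illustration rather than an official example: the variable names are made up, and it assumes ``math.mul32`` behaves as described in the library reference changed by this same commit.

```prog8
%import textio
%import math
%zeropage basicsafe

main {
    sub start() {
        ubyte b1 = 100
        ubyte b2 = 200
        ubyte narrow = b1 + b2                      ; byte calculation: 300 wraps to 44
        uword wide = (b1 as uword) + (b2 as uword)  ; operands widened first: 300

        txt.print_ub(narrow)
        txt.nl()
        txt.print_uw(wide)
        txt.nl()

        word w1 = 1000
        word w2 = 2000
        long product = math.mul32(w1, w2)           ; full 32 bits result: 2000000
        txt.print_l(product)
        txt.nl()
    }
}
```

Writing ``long product = w1 * w2`` instead would keep the multiplication within the word range, which is why the explicit routine is used here.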
@@ -41,7 +41,7 @@ Mac OS (and Linux, and WSL2 on Windows):

 **Or, use the Gradle build system to build it yourself from source:**

-The Gradle build system is used to build the compiler.
+The Gradle build system is used to build the compiler. You will also need Java version 17 or higher to build it.
 The most interesting gradle commands to run are probably the ones listed below.
 (Note: if you have a recent gradle installed on your system already, you can probably replace the ``./gradlew`` wrapper commands with just the regular ``gradle`` command.)

@@ -68,7 +68,7 @@ For normal use, the ``installDist`` task should suffice and after succesful comp

 .. hint::
    Development and testing is done on Linux using the IntelliJ IDEA IDE,
-    but the actual prog8 compiler should run on all operating systems that provide a java runtime (version 11 or newer).
+    but the actual prog8 compiler should run on all operating systems that provide a Java runtime (version 11 or newer).
    If you do have trouble building or running the compiler on your operating system, please let me know!

    To successfully build and debug in IDEA, you have to do two things manually first:
@@ -99,7 +99,8 @@ It's easy to compile yourself, but a recent precompiled .exe (only for Windows)
 *You need at least version 1.58.0 of this assembler.*
 If you are on Linux, there's probably a "64tass" package in the repositories, but check if it is a recent enough version.

-A **Java runtime (jre or jdk), version 11 or newer**  is required to run the prog8 compiler itself.
+A **Java runtime (jre or jdk), version 11 or newer**  is required to run the prog8 compiler itself. You need version 17 or higher if you want to
+build the compiler from source.
 If you're scared of Oracle's licensing terms, get one of the versions of another vendor. Even Microsoft provides their own version.
 Other OpenJDK builds can be found at `Adoptium <https://adoptium.net/temurin/releases/?version=11>`_ .
 For MacOS you can also use the Homebrew system to install a recent version of OpenJDK.
@@ -114,10 +114,28 @@ mkword (msb, lsb)
    So mkword($80, $22) results in $8022.

    .. note::
-        The arguments to the mkword() function are in 'natural' order that is first the msb then the lsb.
+        The arguments are in 'natural' left to right reading order, that is, first the msb then the lsb.
        Don't get confused by how the system actually stores this 16-bit word value in memory (which is
        in little-endian format, so lsb first then msb)

+mklong (msb, b2, b1, lsb)
+    Efficiently create a long value from four bytes (the msb, then b2, then b1, and finally the lsb). Avoids multiplication and shifting.
+    So mklong($12, $34, $56, $78) results in $12345678.
+
+    .. note::
+        The arguments are in 'natural' left to right reading order, that is, first the msb then the lsb.
+        Don't get confused by how the system actually stores this 32-bit long value in memory (which is
+        in little-endian format, so lsb first, then b1, b2 and finally the msb)
+
+mklong2 (msw, lsw)
+    Efficiently create a long value from two words (the msw, and the lsw). Avoids multiplication and shifting.
+    So mklong2($1234, $abcd) results in $1234abcd.
+
+    .. note::
+        The arguments are in 'natural' left to right reading order, that is, first the msw then the lsw.
+        Don't get confused by how the system actually stores this 32-bit long value in memory (which is
+        in little-endian format, so lsw first then the msw)
+
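As a rough sketch of the two constructors described above (variable names are illustrative only), both calls take their arguments most significant part first:

```prog8
%import textio
%zeropage basicsafe

main {
    sub start() {
        long a = mklong($12, $34, $56, $78)     ; msb first, lsb last: $12345678
        long b = mklong2($1234, $abcd)          ; msw first, lsw last: $1234abcd

        txt.print_l(a)      ; decimal 305419896
        txt.nl()
        txt.print_l(b)      ; decimal 305441741
        txt.nl()
    }
}
```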
 offsetof (Struct.field)
    The offset in bytes of the given field in the struct. The first field will always have offset 0.
    Usually you just reference the fields directly but in some cases it might be useful to know how many
@@ -135,6 +153,9 @@ peekw (address)
    Caution: when using peekw to get words out of an array pointer, make sure the array is *not* a split word array
    (peekw requires the LSB and MSB of the word value to be consecutive in memory).

+peekl (address)
+    reads the signed long value at the given address in memory. The long is read in the usual little-endian lsb/msb byte order.
+
 peekf (address)
    reads the float value at the given address in memory. On CBM machines, this reads 5 bytes.

@@ -148,6 +169,9 @@ pokebool (address, value)
 pokew (address, value)
    writes the word value at the given address in memory, in usual little-endian lsb/msb byte order.

+pokel (address, value)
+    writes the signed long value at the given address in memory, in usual little-endian lsb/msb byte order.
+
 pokef (address, value)
    writes the float value at the given address in memory. On CBM machines, this writes 5 bytes.
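A small round trip with ``pokel`` and ``peekl`` can make the little-endian layout visible. This is a sketch under assumptions: ``SCRATCH`` is a hypothetical address that is presumed to be free RAM on the target machine, so pick a location that is actually unused in your program.

```prog8
%import textio
%zeropage basicsafe

main {
    sub start() {
        const uword SCRATCH = $6000     ; assumption: unused RAM on this system
        long value = $12345678

        pokel(SCRATCH, value)           ; stores 4 bytes, lsb first

        ; dump the individual bytes in memory order: 78 56 34 12
        txt.print_ubhex(@(SCRATCH), false)
        txt.print_ubhex(@(SCRATCH+1), false)
        txt.print_ubhex(@(SCRATCH+2), false)
        txt.print_ubhex(@(SCRATCH+3), false)
        txt.nl()

        long back = peekl(SCRATCH)      ; reads the same 4 bytes back as a long
        txt.print_l(back)               ; decimal 305419896
        txt.nl()
    }
}
```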

@@ -792,6 +816,9 @@ but perhaps the provided ones can be of service too.
    Returns the absolute difference, or distance, between the two word values.
    (This routine is more efficient than doing a compare and a subtract separately, or using abs)

+``mul32 (word w1, word w2) -> long``
+   Returns the 32 bits signed long result of w1 * w2
+
 ``mul16_last_upper () -> uword``
    Fetches the upper 16 bits of the previous 16*16 bit multiplication.
    To avoid corrupting the result, it is best performed immediately after the multiplication.
@@ -1150,6 +1150,13 @@ so pay attention to any jumps and rts instructions in the inlined code!
    You can use them directly but their name isn't very descriptive, so it may be useful to define
    an alias for them when using them regularly.

+.. note::
+    Dealing with **long** arguments and return values:
+    A long takes 4 bytes (or 2 words, if you will). *There is no register definition specific to long types*.
+    The way you specify the 'register' for a long argument or return value for an asmsub is by using a *virtual register pair*.
+    For example, you can use R0+R1, R2+R3, R4+R5 and so on to take a long value instead.
+    The syntax to use as a 'register' name for those pairs is ``R0R1_32``, ``R2R3_32``, ``R4R5_32`` and so on.
+

 External subroutines
 ^^^^^^^^^^^^^^^^^^^^
@@ -130,7 +130,7 @@ dealing with all of them separately.  You first define the struct type like so::
        bool elite
    }

-You can use boolean fields, numeric fields (byte, word, float), and pointer fields (including str, which is translated into ^^ubyte).
+You can use boolean fields, numeric fields (byte, word, long, float), and pointer fields (including str, which is translated into ^^ubyte).
 You cannot nest struct types nor put arrays in them as a field.
 Fields in a struct are 'packed' (meaning the values are placed back-to-back in memory), and placed in memory in order of declaration. This guarantees exact size and place of the fields.
 ``sizeof()`` knows how to calculate the combined size of a struct, and ``offsetof()`` can be used to get the byte offset of a given field in the struct.
@@ -3,12 +3,10 @@ TODO

 LONG TYPE
 ---------
- document the new long type! and mklong(a,b,c,d) and mklong2(w1,w2) , print_l , print_ulhex (& conv.str_l) and pokel, peekl, cbm.SETTIML/RDTIML, math.mul32, verafx.muls/muls16, and the use of R0:R1 when doing LONG calculations, asmsub call convention: @R0R1_32 to specify a 32 bits long combined register R0:R1
- how hard is it to also implement the other comparison operators (<,>,<=,>=) on longs?
+- implement the other comparison operators (<,>,<=,>=) on longs
 - implement LONG testcases in testmemory


-
 STRUCTS and TYPED POINTERS
 --------------------------

@@ -39,6 +39,7 @@ Various examples::
    byte        counter = len([1, 2, 3]) * 20
    byte        age     = 2018 - 1974
    float       wallet  = 55.25
+    long        large   = 998877
    ubyte       x,y,z                   ; declare three ubyte variables x y and z
    str         name    = "my name is Alice"
    uword       address = &counter
@@ -173,8 +174,8 @@ type identifier  type                     storage size       example var declara
 ``bool``         boolean                  1 byte = 8 bits    ``bool myvar = true`` or ``bool myvar == false``
 ``word``         signed word              2 bytes = 16 bits  ``word myvar = -12345``
 ``uword``        unsigned word            2 bytes = 16 bits  ``uword myvar = $8fee``
-``long``         signed 32 bits integer   n/a                ``const long LARGE = $12345678``
-                                          (only for consts)
+``long``         signed 32 bits integer   4 bytes            ``long large = $12345678``
+                                                             there is no unsigned long type at the moment.
 ``float``        floating-point           5 bytes = 40 bits  ``float myvar = 1.2345``
                                                             stored in 5-byte cbm MFLPT format
 ``byte[x]``      signed byte array        x bytes            ``byte[4] myvar``
@@ -195,10 +196,10 @@ type identifier  type                     storage size       example var declara
 ``^^type``       typed pointer            2 bytes            pointer types are explained in their own chapter :ref:`pointers`
 ===============  =======================  =================  =========================================

-Integers (bytes, words)
-^^^^^^^^^^^^^^^^^^^^^^^
+Integers (bytes, words, longs)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Integers are 8 or 16 bit numbers and can be written in normal decimal notation,
+Integers are 8, 16 or 32 bit numbers and can be written in normal decimal notation,
 in hexadecimal and in binary notation. There is no octal notation. Hexadecimal has the '$' prefix,
 binary has the '%' prefix. Note that ``%`` is also the remainder operator so be careful: if you want to take the remainder
 of something with an operand starting with 1 or 0, you'll have to add a space in between, otherwise
@@ -210,7 +211,9 @@ For instance ``3_000_000`` is a valid decimal number and so is ``%1001_0001`` a
 A single character in single quotes such as ``'a'`` is translated into a byte integer,
 which is the PETSCII value for that character. You can prefix it with the desired encoding, like with strings, see :ref:`encodings`.

-**bytes versus words:**
+*Endianness:* all integers are stored in *little endian* byte order, so the least significant byte comes first and the most significant byte last.
+
+**bytes versus words versus longs:**

 Prog8 tries to determine the data type of integer values according to the table below,
 and sometimes the context in which they are used.
@@ -222,7 +225,7 @@ value                     datatype
 0 .. 255                  ubyte
 -32768 .. 32767           word
 0 .. 65535                uword
-2147483647 .. 2147483647 long (only for const)
+-2147483647 .. 2147483647 long  (there is no unsigned long right now)
 ========================= =================

 If the number fits in a byte but you really require it as a word value, you'll have to explicitly cast it: ``60 as uword``
@@ -233,15 +236,21 @@ to be done on word values, and don't want to explicitly have to cast everything
    uword  offset = column * 64       ; does (column * 64) as uword, wrong result?
    uword  offset = column * $0040    ; does (column as uword) * 64 , a word calculation

-Only for ``const`` numbers, you can use larger values (32 bits signed integers). The compiler can handle those
-internally in expressions. As soon as you have to actually store it into a variable,
-you have to make sure the resulting value fits into the byte or word size of the variable.
-
 .. attention::
    Doing math on signed integers can result in code that is a lot larger and slower than
    when using unsigned integers. Make sure you really need the signed numbers, otherwise
    stick to unsigned integers for efficiency.

+.. attention::
+    Not all operations on long integers are supported at the moment, although most common
+    operations should work fine.
+    There is no unsigned long type at the moment, but you can sometimes simply treat the signed
+    long value as an unsigned 32 bits value just fine.
+    Operations on long integers take a lot of instructions on 8-bit CPUs, so code that uses them
+    a lot will be much slower than when you restrict yourself to 8 or 16 bit values. Use long values sparingly.
+    **Several operations on long values require the use of the R0 and R1 virtual registers as temporary storage**
+    so if you are working with long values, you should assume that the contents of R0 and R1 are destroyed.
+

 Booleans
 ^^^^^^^^
@@ -538,7 +547,7 @@ Constants
 When using ``const``, the value of the 'variable' cannot be changed; it has become a compile-time constant value instead.
 You'll have to specify the initial value expression. This value is then used
 by the compiler everywhere you refer to the constant (and no memory is allocated
-for the constant itself). Onlythe simple numeric types (byte, word, float) can be defined as a constant.
+for the constant itself). Only the simple numeric types (byte, word, long, float) can be defined as a constant.
 If something is defined as a constant, very efficient code can usually be generated from it.
 Variables on the other hand can't be optimized as much, need memory, and more code to manipulate them.
 Note that a subset of the library routines in the ``math``, ``strings`` and ``floats`` modules are recognised in
@@ -1,32 +1,15 @@
 %import textio
-%import math
-%import verafx
 %zeropage basicsafe

 main {
-    %option verafxmuls
-
    sub start() {
+        &long ll = 5000

-        cx16.r5s = 22
-        cx16.r6s = -999
+        ll = $9988776655

-        cx16.r0s = cx16.r5s * cx16.r6s
-        txt.print_w(cx16.r0s)
-        txt.nl()
-
-        long lv = cx16.r5s * cx16.r6s
-        txt.print_l(lv)
-        txt.nl()
-
-
-        cx16.r5s = 5555
-        cx16.r6s = -9999
-        lv = cx16.r5s * cx16.r6s
-        txt.print_l(lv)
-        txt.nl()
-        lv = verafx.muls(cx16.r5s, cx16.r6s)
-        txt.print_l(lv)
-        txt.nl()
+        txt.print_ubhex(@(5000), false)
+        txt.print_ubhex(@(5001), false)
+        txt.print_ubhex(@(5002), false)
+        txt.print_ubhex(@(5003), false)
    }
 }