remove non-functional verafx.mult(). note: muls() is still there and just fine!

added documentation/source code comments noting that the cpu word*word multiplication routine does not produce the correct upper 16 bits for negative (signed) values.
Irmen de Jong 2024-09-28 00:43:05 +02:00
parent 4acf38031a
commit 9c7a645e18
6 changed files with 23 additions and 12 deletions
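
The commit message above says the CPU word*word multiplication does not produce the correct upper 16 bits, and the comments added in this commit narrow that to negative operands. The following is a minimal sketch of why that happens, written as a plain Python model of the arithmetic. It is not Prog8 code and not the actual `multiply_words` routine; the function names are hypothetical and exist only for illustration.

```python
# Plain Python model of the arithmetic only; this is not Prog8 code and not the
# actual multiply_words routine. All function names here are hypothetical.

def unsigned_mul16(a, b):
    """Multiply the raw 16-bit patterns of a and b as unsigned numbers.

    Returns (lower_word, upper_word) of the 32-bit product.
    """
    product = (a & 0xFFFF) * (b & 0xFFFF)
    return product & 0xFFFF, (product >> 16) & 0xFFFF


def signed_mul16(a, b):
    """Reference result: the true signed product, wrapped to 32 bits.

    Returns (lower_word, upper_word) in two's complement.
    """
    product = (a * b) & 0xFFFFFFFF
    return product & 0xFFFF, product >> 16


# Compare the unsigned model against the signed reference for a few operands.
for a, b in [(1000, 2000), (-2, 3), (-300, -7)]:
    u_lo, u_hi = unsigned_mul16(a, b)
    s_lo, s_hi = signed_mul16(a, b)
    print(f"{a} * {b}: lower ${u_lo:04x} vs ${s_lo:04x}, upper ${u_hi:04x} vs ${s_hi:04x}")
```

In this model the lower word always agrees with the signed reference, while the upper word only agrees when both operands are non-negative. That matches the note in the diff that only the lower 16 bits are reliable once a negative word value is involved.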

@@ -111,15 +111,17 @@ verafx {
  ; unsigned multiplication just passes the values as signed to muls
  ; if you do this yourself in your call to muls, it will save a few instructions.
- inline asmsub mult(uword value1 @R0, uword value2 @R1) clobbers(X) -> uword @AY, uword @R0 {
- ; Returns the 32 bits unsigned result in AY and R0 (lower word, upper word).
- %asm {{
- jsr verafx.muls
- }}
- }
+ ; TODO fix this: verafx.muls doesn't support unsigned values like this
+ ; inline asmsub mult(uword value1 @R0, uword value2 @R1) clobbers(X) -> uword @AY, uword @R0 {
+ ; ; Returns the 32 bits unsigned result in AY and R0 (lower word, upper word).
+ ; %asm {{
+ ; jsr verafx.muls
+ ; }}
+ ; }
  asmsub muls(word value1 @R0, word value2 @R1) clobbers(X) -> word @AY, word @R0 {
  ; Returns the 32 bits signed result in AY and R0 (lower word, upper word).
+ ; Vera Fx multiplication support only works on signed values!
  %asm {{
  lda #(2 << 1)
  sta cx16.VERA_CTRL ; $9F25

@@ -55,13 +55,14 @@ _multiplier = P8ZP_SCRATCH_REG
  multiply_words .proc
- ; -- multiply two 16-bit words into a 32-bit result (signed and unsigned)
+ ; -- multiply two 16-bit words into a 32-bit result (UNSIGNED)
  ; input: A/Y = first 16-bit number, multiply_words.multiplier = second 16-bit number
  ; output: multiply_words.result, 4-bytes/32-bits product, LSB order (low-to-high) low 16 bits also in AY.
- ; NOTE: the result (which includes the multiplier parameter on entry) is a 4-byte array.
- ; this routine could be faster if we could stick that into zeropage,
- ; but there currently is no way to use 4 consecutive bytes in ZP (without disabling irq and saving/restoring them)...
+ ; you can retrieve the upper 16 bits via math.mul16_last_upper()
+ ; NOTE FOR NEGATIVE VALUES:
+ ; The routine also works for NEGATIVE (signed) word values, but ONLY the lower 16 bits of the result are correct then!
+ ; Prog8 only uses those so that's not an issue, but math.mul16_last_upper() no longer gives the correct result here.
  ; mult62.a
  ; from: https://github.com/TobyLobster/multiply_test/blob/main/tests/mult62.a
@@ -179,7 +180,7 @@ _inner_loop2
  ldy result+1
  rts
- result .byte 0,0,0,0
+ result .byte 0,0,0,0 ; routine could be faster if this were in Zeropage...
  .pend

@@ -168,6 +168,9 @@ _sinecosR8 .char trunc(127.0 * sin(range(180+45) * rad(360.0/180.0)))
  ; for instance, simply printing a number may already result in new multiplication calls being performed
  ; - not all multiplications in the source code result in an actual multiplication call:
  ; some simpler multiplications will be optimized away into faster routines. These will not set the upper 16 bits at all!
+ ; - THE RESULT IS ONLY VALID IF THE MULTIPLICATION WAS DONE WITH UWORD ARGUMENTS (or two positive WORD arguments)
+ ; as soon as a negative word value (or 2) was used in the multiplication, these upper 16 bits are not valid!!
+ ; Suggestion (if you are on the Commander X16): use verafx.muls() to get a hardware accelerated 32 bit signed multiplication.
  %asm {{
  lda multiply_words.result+2
  ldy multiply_words.result+3

@@ -293,6 +293,8 @@ math {
  ; for instance, simply printing a number may already result in new multiplication calls being performed
  ; - not all multiplications in the source code result in an actual multiplication call:
  ; some simpler multiplications will be optimized away into faster routines. These will not set the upper 16 bits at all!
+ ; - THE RESULT IS ONLY VALID IF THE MULTIPLICATION WAS DONE WITH UWORD ARGUMENTS (or two positive WORD arguments)
+ ; as soon as a negative word value (or 2) was used in the multiplication, these upper 16 bits are not valid!!
  %ir {{
  syscall 33 (): r0.w
  returnr.w r0

@@ -787,6 +787,10 @@ but perhaps the provided ones can be of service too.
  It does not work for the verafx multiplication routines on the Commander X16!
  These have a different way to obtain the upper 16 bits of the result: just read cx16.r0.
+ **NOTE:** the result is only valid if the multiplication was done with uword arguments (or two positive word arguments).
+ As soon as a single negative word value (or both) was used in the multiplication, these upper 16 bits are not valid!
+ Suggestion (if you are on the Commander X16): use ``verafx.muls()`` to get a hardware accelerated 32 bit signed multiplication.
  ``crc16 (uword data, uword length) -> uword``
  Returns a CRC-16 (XMODEM) checksum over the given data buffer.
  Note: on the Commander X16, there is a CRC-16 routine in the kernal: cx16.memory_crc().

@@ -14,7 +14,6 @@ Compiler:
  - Can we support signed % (remainder) somehow?
  - Don't add "random" rts to %asm blocks but instead give a warning about it? (but this breaks existing behavior that others already depend on... command line switch? block directive?)
  - IR: implement missing operators in AssignmentGen (array shifts etc)
- - IR: CMPI+BSTEQ --> new BEQ reg,value,label instruction (like BGT etc)
  - instead of copy-pasting inline asmsubs, make them into a 64tass macro and use that instead.
  that will allow them to be reused from custom user written assembly code as well.
  - Multidimensional arrays and chained indexing, purely as syntactic sugar over regular arrays.