Commit Graph

40 Commits

Author SHA1 Message Date
Stephen Heumann
99a10590b1 Avoid out-of-range branches around asm code using dcl directives.
The branch range calculation treated dcl directives as taking 2 bytes rather than 4, which could result in out-of-range branches. These could result in linker errors (for forward branches) or silently generating wrong code (for backward branches).

This patch now treats dcb, dcw, and dcl as separate directives in the native-code layer, so the appropriate length can be calculated for each.

Here is an example of code affected by this:

int main(int argc, char **argv) {
top:
        if (!argc) { /* this caused a linker error */
                asm {
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                }
                goto top; /* this generated bad code with no error */
        }
}
2022-10-13 18:00:16 -05:00
Stephen Heumann
19683706cc Do not optimize code from asm statements.
Previously, the assembly-level optimizations applied to code in asm statements. In many cases, this was fine (and could even do useful optimizations), but occasionally the optimizations could be invalid. This was especially the case if the assembly involved tricky things like self-modifying code.

To avoid these problems, this patch makes the assembly optimizers ignore code from asm statements, so it is always emitted as-is, without any changes.

This fixes #34.
2022-10-12 22:03:37 -05:00
Stephen Heumann
ca21e33ba7 Generate more efficient code for indirect function calls. 2022-10-11 21:14:40 -05:00
Stephen Heumann
05ecf5eef3 Add option to use the declared type for float/double/comp params.
This differs from the usual ORCA/C behavior of treating all floating-point parameters as extended. With the option enabled, they will still be passed in the extended format, but will be converted to their declared type at the start of the function. This is needed for strict standards conformance, because you should be able to take the address of a parameter and get a usable pointer to its declared type. The difference in types can also affect the behavior of _Generic expressions.

The implementation of this is based on ORCA/Pascal, which already did the same thing (unconditionally) with real/double/comp parameters.
2022-09-18 21:16:46 -05:00
Stephen Heumann
60efb4d882 Generate better code for indexed jumps.
They now use a jmp (addr,X) instruction, rather than a more complicated code sequence using rts. This is an improvement that was suggested in an old Genie message from Todd Whitesel.
2022-07-18 21:18:26 -05:00
Stephen Heumann
c3567c81a4 Correct comments. 2022-07-11 18:23:36 -05:00
Stephen Heumann
9b31e7f72a Improve code generation for comparisons.
This converts comparisons like x > N (with constant N) to instead be evaluated as x >= N+1, since >= comparisons generate better code. This is possible as long as N is not the maximum value in the type, but in that case the comparison is always false. There are also a few other tweaks to the generated code in some cases.
2022-07-10 22:27:38 -05:00
Stephen Heumann
76e4b1f038 Optimize away some tax/tay instructions used only to set flags. 2022-07-10 17:35:56 -05:00
Stephen Heumann
393b7304a0 Optimize 16-bit multiplication by various constants.
This optimizes most multiplications by a power of 2 or the sum of two powers of 2, converting them to equivalent operations using shifts which should be faster than the general-purpose multiplication routine.
2022-07-06 22:24:54 -05:00
Stephen Heumann
497e5c036b Use new 16-bit unsigned multiply routine that complies with C standards.
This changes unsigned 16-bit multiplies to use the new ~CUMul2 routine in ORCALib, rather than ~UMul2 in SysLib. They differ in that ~CUMul2 gives the low-order 16 bits of the true result in case of overflow. The C standards require this behavior for arithmetic on unsigned types.
2022-07-06 22:22:02 -05:00
Stephen Heumann
161bb952e3 Dynamically allocate string space, and make it larger.
This increases the limit on total bytes of strings in a function, and also frees up space in the blank segment.
2022-06-08 22:09:30 -05:00
Stephen Heumann
e8d90a1b69 Do not generate extra zero bytes after certain string constants.
These extra bytes are unnecessary after the changes in commit 5871820e0c to make string constants explicitly include their null terminators.

The extra bytes would be generated for code like the following:

int main(void) {
        static char *s1 = "abc", *s2 = "def", *s3 = "ghi";
}
2022-01-29 18:27:03 -06:00
Stephen Heumann
fc515108f4 Make floating-point casts reduce the range and precision of numbers.
The C standards generally allow floating-point operations to be done with extra range and precision, but they require that explicit casts convert to the actual type specified. ORCA/C was not previously doing that.

This patch relies on some new library routines (currently in ORCALib) to do this precision reduction.

This fixes #64.
2021-03-06 22:28:39 -06:00
Stephen Heumann
f19d21365a Recognize more indirect long instructions in the native code optimizer.
These instructions can be generated for indirect accesses to quad values, and the optimization can sometimes make those code sequences more efficient (e.g. avoiding unnecessary reloads of Y).
2021-03-02 19:19:00 -06:00
Stephen Heumann
e3b24fb50b Add support for real to long long conversions. 2021-02-16 18:47:28 -06:00
Stephen Heumann
e38be489df Implement comparisons for signed long long.
These use a library function to perform the comparison.
2021-02-15 18:10:34 -06:00
Stephen Heumann
8faafcc7c8 Implement 64-bit shifts. 2021-02-12 15:06:15 -06:00
Stephen Heumann
30f2eda4f3 Generate code for long long to real conversions. 2021-02-11 12:41:58 -06:00
Stephen Heumann
05868667b2 Implement 64-bit division and remainder, signed and unsigned.
These operations rely on new library routines in ORCALib (~CDIV8 and ~UDIV8).
2021-02-05 12:42:48 -06:00
Stephen Heumann
08cf7a0181 Implement 64-bit multiplication support.
Signed multiplication uses the existing ~MUL8 routine in SysLib. Unsigned multiplication will use a new ~UMUL8 library routine.
2021-02-04 22:23:59 -06:00
Stephen Heumann
168a06b7bf Add support for emitting 64-bit constants in statically-initialized data. 2021-02-04 02:17:10 -06:00
Stephen Heumann
32b0d53b07 PLD/TCD should invalidate register==DP location correspondences.
I don't think this ever comes up in code from the ORCA code generator, but it can in inline assembly.
2021-02-02 18:36:18 -06:00
Stephen Heumann
ffe6c4e924 Spellcheck comments throughout the code.
There are no non-comment changes.
2020-01-29 17:09:52 -06:00
Stephen Heumann
e6a0769bed Fix register optimizer bug that generated bad code in some cases.
The register optimizer tracks when a register is known to contain the same value as a memory location (direct page or absolute) and does optimizations based on this. But it did not always recognize when this information had become invalid because of a subsequent store to the memory location, so it might perform invalid optimizations. This patch adds those checks.

This fixes #66.
2020-01-28 12:54:18 -06:00
Stephen Heumann
857e432896 Disable a native-code optimization that was generating bad code for %.
Specifically, it converted PLX followed by PHA to STA 1,S. This is invalid if the x value is actually used, which is a case that can come up in the code now generated for the % operator.

It might be possible to re-enable this optimization with tighter checks about where it's applied, but I don't think it's terribly important.

The below program demonstrates an example that was being miscompiled:

#pragma optimize -1
#include <stdio.h>
int main(void) {
        int a = 100, b = 200, c = 3, d = 4;
        printf("%i\n", (a+b) % (c+d)); /* should be 6 */
}
2018-09-10 19:29:16 -05:00
Stephen Heumann
2d43074d5a Make % operator give proper remainders even if one or both operands are negative.
Per the C standards, the % operator should give a remainder after division, such that (a/b)*b + a%b equals a (provided that a/b is representable). As such, the operation of % is defined for cases where either or both of the operands are negative. Since division truncates toward 0, a%b should give a negative result (or 0) in cases where a is negative.

Previously, the % operator was essentially behaving like the "mod" operator in Pascal, which is equivalent for positive operands but not if either operand is negative. It would generally give incorrect results in those cases, or in some cases give compile-time or run-time errors.

This patch addresses both 16-bit and 32-bit signed computations at run time, and operations in constant expressions. The approach at run time is to call existing division routines, which return the correct remainder, except always as a positive number. The generated code checks the sign of the first operand, and if it is negative negates the remainder.

The code generated is somewhat large (especially for the 32-bit case), so it might be sensible to put it in a library function and call that, but for now it's just generated in-line. This avoids introducing a dependency on a new library function, so the generated code remains compatible with older versions of ORCALib (e.g. the GNO one).

Fixes #10.
2018-09-10 18:21:17 -05:00
Stephen Heumann
bdd60d9d08 When using varargs stack repair, only disable native-code peephole opt in functions containing varargs calls.
There is no need to reduce the optimization in other functions, which will not contain any varargs stack repair code.
2018-01-13 21:37:28 -06:00
Stephen Heumann
44714767e5 Don't optimize out certain volatile stores.
This could occur due to the new native-code peephole optimizations for stz instructions, which can collapse consecutive identical ones down to one instruction. This is OK most of the time, but not when dealing with volatile variables, so disable it in that case.

The following test case shows the issue (look at the generated code):

#pragma optimize -1
volatile int a;
int main(void) {
        a = 0;
        a = 0;
}
2018-01-07 21:50:32 -06:00
Stephen Heumann
5312843b93 Don't invalidly eliminate certain stores of 0 to global/static variables.
This could happen in native-code peephole optimization if two stz instructions targeting different global/static locations occurred consecutively.

This was a regression introduced by commit a3170ea7.

The following program demonstrates the problem:

#pragma optimize 1+2+8+64
int i,j=1;
int main (void) {
    i = 0;
    j = 0;
    return j; /* should return 0 */
}
2017-12-13 20:03:48 -06:00
Stephen Heumann
a3170ea715 Do a few more native code peephole optimizations.
Patch from Kelvin Sherlock.
2017-10-21 20:36:21 -05:00
Stephen Heumann
ccd653ddb9 Move some more code out of the blank segment to make space for static data. 2017-10-21 20:36:21 -05:00
Stephen Heumann
cf1cd085d8 Don't block the LDA elimination optimization if a native code label is encountered.
This was an inadvertent change in commit 9d2bb600. This patch restores the old behavior with respect to d_lab.
2017-10-21 20:36:21 -05:00
Stephen Heumann
8c81b23b6f Expand the size of the object buffer from 64K to 128K, and use 32-bit values to track related sizes.
This allows functions that require an OMF segment byte count of up to 128K to be compiled, although the length in memory at run time is still limited to 64K. (The OMF segment byte count is usually larger, due to the size of relocation records, etc.)

This is useful for compiling large functions, e.g. the main interpreter loop in git. It also fixes the bug shown in the compca23 test case, where functions that require a segment of over 64K may appear to compile correctly but generate corrupted OMF segment headers. This related to tracking sizes with 16-bit values that could roll over.

This patch increases the memory needed at run time by 64K. This shouldn’t generally be a problem on systems with sufficient memory, although it does increase the minimum memory requirement a bit. If behavior in low-memory configurations is a concern, buffSize could be made into a run-time option.
2017-10-21 20:36:21 -05:00
Stephen Heumann
41fb05404e Don’t add the length of the last segment generated in the previous execution to that of the segment in the root file.
This would occur if ORCA/C remained in memory and was restarted after a previous execution, because the 'pc' value was not reinitialized. The ORCA linker seems to ignore the too-long segment length value, but ORCA/C should generate a correct value that actually corresponds to the length of the segment.
2017-10-21 20:36:21 -05:00
Stephen Heumann
709f9b3f25 Fix bug where comparing 32-bit values in static arrays or structs against 0 may give wrong results with large memory model.
The issue was that 16-bit absolute addressing (in the data bank) was being used to access the data to compare, but with the large memory model the static arrays or structs are not necessarily in the same bank, so absolute long addressing should be used.

This was sometimes causing failures in the C4.6.4.1.CC and C4.6.6.1.CC conformance tests in the ORCA/C test suite.

The following program often demonstrates the problem (depending on memory layout and contents):

#pragma memorymodel 1
#pragma optimize 1

#include <stdio.h>

int i;
char ch1[32000];
long L1[1];

int main (void)
{
    if (L1 [0] != 0)
        printf("%li\n", L1[0]); /* shouldn't print */

    /* buggy behavior can happen if the bank bytes of these pointers differ */
    printf("%p %p\n", &L1[0], &i);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
fd48d77c60 Don’t erroneously optimize out lda instructions in certain cases involving instructions the native-code optimizer didn’t know about.
This could cause problems when asm blocks contained instructions that the ORCA/C native code optimizer didn’t know about, as in the example below. It might also be possible to trigger this bug without asm blocks (particularly with the large memory model), but I haven’t run into a case that does.

The new approach conservatively assumes that unknown instructions block the optimization. This should be equivalent to the old code with respect to the instructions defined in CGI.pas, except that m_bit_imm should have been treated as blocking the optimization but was not. There are still some other potential problem cases with applying this lda-elimination optimization to arbitrary assembly code, but fixing them might interfere with the optimization in useful cases, so I’m leaving those alone for now.

Here is an example of a program with an asm block affected by this problem:

#pragma optimize 74
int x,y;

/* should print 2 when invoked with argc==1 */
int main(int argc, char **argv)
{
    x = argc;
    y = argc + 6;

    asm {
        lda #1
        pha
        eor >x
        bne done
        inc argc
done:   pla
    }

    printf("%i\n", argc);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
efb23003f7 Don't do an optimization that would move a store to a DP location above an indirect load using that DP location.
This generated invalid code in instances like the following. The code generated for "s = s->u.next" would update the most significant word of s first, then use an indirect load with the half-updated pointer value to update the least significant word of s. This would generally corrupt the result if the new and old pointers had different bank bytes.

#pragma optimize 79
#include <stdio.h>

struct S {
	int i;
	union {
		struct S * next;
	} u;
} s1 = {0, 0};

int main (void)
{
	struct S * s = &s1;
	s = s->u.next;
	if (s != 0)
		puts("compiler bug detected\n"); /* May not always be triggered, depending on memory contents. */
}
2017-10-21 20:36:20 -05:00
Stephen Heumann
97cca84713 When pointer arithmetic is used to initialize a global or static variable to point before the beginning of a string constant, initialize it to the value indicated by the pointer arithmetic.
Previously, such initializations would sometimes generate a garbage value pointing up to 65535 bytes beyond the start of the string constant. (This was due to a lack of sign-extension in the object code generation.)

Computing a pointer to before the start of an object invokes undefined behavior, so the previous behavior wasn't technically wrong, but it was unintuitive and served no useful purpose. The new behavior should at least be easier to understand and debug.
2017-10-21 20:36:20 -05:00
Stephen Heumann
46b6aa389f Change all text/source files to LF line endings. 2017-10-21 18:40:19 -05:00
mikew50
e72177985e ORCA/C 2.1.0 source from the Opus ][ CD 2017-10-01 17:47:47 -06:00