Commit Graph

52 Commits

Author SHA1 Message Date
Stephen Heumann a6ef872513 Add debugging option to detect illegal use of null pointers.
This adds debugging code to detect null pointer dereferences, as well as pointer arithmetic on null pointers (which is also undefined behavior, and can lead to later dereferences of the resulting pointers).

Note that ORCA/Pascal can already detect null pointer dereferences as part of its more general range-checking code. This implementation for ORCA/C will report the same error as ORCA/Pascal ("Subrange exceeded"). However, it does not include any of the other forms of range checking that ORCA/Pascal does, and (unlike in ORCA/Pascal) it is controlled by a separate flag from stack overflow checking.
2023-02-12 18:56:02 -06:00
Stephen Heumann a87aeef25b Ensure native peephole opt uses a jump table.
In ORCA/Pascal's code generation, a case statement may use a jump table or a sequence of comparisons depending on whether it is considered sparse. This one was just a little too sparse to use a jump table, but changing it to use one makes it considerably faster. To force generation of a jump table, this commit adds several more explicit cases (even though they don't do anything).
2022-12-20 20:31:24 -06:00
Stephen Heumann cf9f19c93d Optimize LDA+TAY to LDY (when A is unused after).
This pattern comes up in the new return code when returning a local variable.
2022-12-20 20:21:25 -06:00
Stephen Heumann 44499bdddb Make root files jump to the shutdown code rather than calling it.
This better reflects that the shutdown code will never return.
2022-12-11 22:14:09 -06:00
Stephen Heumann 17936a14ed Rework root file code for CDevs to avoid leaking user IDs.
Formerly, the code would allocate user IDs but never free them. The result was that one user ID was leaked for each time a CDev was opened and closed.

The new root code calls new cleanup code in ORCALib, which detects if the CDev is going away and deallocates its user ID if so.
2022-12-11 22:01:29 -06:00
Stephen Heumann ecca7a7737 Never make the segment in the root file dynamic.
This would previously happen if a segment directive with "dynamic" appeared before the first function in the program. That would cause the resulting program not to work, because the root segment needs to be a static segment at the start of the program, but if it is dynamic it would come after a jump table and a static segment of library code.

The root segments are also configured to refer to main or the NDA/CDA entry points using LEXPR records, so that they can be in dynamic segments (not that they necessarily should be). That change is intentionally not done for CDEV/XCMD/NBA, because they use code resources, which do not support dynamic segments, so it is better to force a linker error in these cases.
2022-12-11 14:46:38 -06:00
Stephen Heumann 1754607908 Add native peephole opts for stack repair code.
These mainly affect cases of multiple successive or nested function calls.
2022-12-10 21:56:16 -06:00
Stephen Heumann 32975b720f Allow native code peephole opt to be used when stack repair is enabled.
I think the reason this was originally disallowed is that the old code sequence for stack repair code (in ORCA/C 2.1.0) ended with TYA. If this was followed by STA dp or STA abs, the native code peephole optimizer (prior to commit 7364e2d2d3) would have turned the combination into a STY instruction. That is invalid if the value in A is needed. This could come up, e.g., when assigning the return value from a function to two different variables.

This is no longer an issue, because the current code sequence for stack repair code no longer ends in TYA and is not susceptible to the same kind of invalid optimization. So it is no longer necessary to disable the native code peephole optimizer when using stack repair code (either for all calls or just varargs calls).
2022-12-10 20:34:00 -06:00
Stephen Heumann 7364e2d2d3 Fix issue with native code optimization of TYA+STA.
This would be changed to STY, but that is invalid if the A value is needed afterward. This could affect the code for certain division operations (after the optimizations in commit 4470626ade).

Here is an example that would be miscompiled:

#pragma optimize -1
#include <stdio.h>
int main(void) {
        unsigned i = 55555;
        unsigned a,b;
        a = b = i / 10000;
        printf("%u %u\n", a,b);
}

Also, remove MVN from the list of "ASafe" instructions since it really isn't, although I don't think this was affecting anything in practice.
2022-12-10 19:37:48 -06:00
Stephen Heumann 6857913daa Make the object buffer dynamically resizable.
It will now grow as needed to accommodate large segments, subject to the constraints of available memory. In practice, this mostly affects the size of initialized static arrays that can be used.

This also removes any limit apart from memory size on how large the object representation produced by a "compile to memory" can be, and cleans up error reporting regarding size limits.
2022-12-06 21:49:20 -06:00
Stephen Heumann 8aedd42294 Optimize out TDC following TCD.
This can occur if the first code in the function (which could be an initializer) takes the address of a local variable.
2022-12-05 18:02:23 -06:00
Stephen Heumann d56cf7e666 Pass constant data to backend as pointers into buffer.
This avoids needing to generate many intermediate code records representing the data at most 8 bytes at a time, which should reduce memory use and probably improve performance for large initialized arrays or structs.
2022-12-03 00:14:15 -06:00
Stephen Heumann 99a10590b1 Avoid out-of-range branches around asm code using dcl directives.
The branch range calculation treated dcl directives as taking 2 bytes rather than 4, which could result in out-of-range branches. These could result in linker errors (for forward branches) or silently generating wrong code (for backward branches).

This patch now treats dcb, dcw, and dcl as separate directives in the native-code layer, so the appropriate length can be calculated for each.

Here is an example of code affected by this:

int main(int argc, char **argv) {
top:
        if (!argc) { /* this caused a linker error */
                asm {
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                }
                goto top; /* this generated bad code with no error */
        }
}
2022-10-13 18:00:16 -05:00
Stephen Heumann 19683706cc Do not optimize code from asm statements.
Previously, the assembly-level optimizations applied to code in asm statements. In many cases, this was fine (and could even do useful optimizations), but occasionally the optimizations could be invalid. This was especially the case if the assembly involved tricky things like self-modifying code.

To avoid these problems, this patch makes the assembly optimizers ignore code from asm statements, so it is always emitted as-is, without any changes.

This fixes #34.
2022-10-12 22:03:37 -05:00
Stephen Heumann ca21e33ba7 Generate more efficient code for indirect function calls. 2022-10-11 21:14:40 -05:00
Stephen Heumann 05ecf5eef3 Add option to use the declared type for float/double/comp params.
This differs from the usual ORCA/C behavior of treating all floating-point parameters as extended. With the option enabled, they will still be passed in the extended format, but will be converted to their declared type at the start of the function. This is needed for strict standards conformance, because you should be able to take the address of a parameter and get a usable pointer to its declared type. The difference in types can also affect the behavior of _Generic expressions.

The implementation of this is based on ORCA/Pascal, which already did the same thing (unconditionally) with real/double/comp parameters.
2022-09-18 21:16:46 -05:00
Stephen Heumann 60efb4d882 Generate better code for indexed jumps.
They now use a jmp (addr,X) instruction, rather than a more complicated code sequence using rts. This is an improvement that was suggested in an old Genie message from Todd Whitesel.
2022-07-18 21:18:26 -05:00
Stephen Heumann c3567c81a4 Correct comments. 2022-07-11 18:23:36 -05:00
Stephen Heumann 9b31e7f72a Improve code generation for comparisons.
This converts comparisons like x > N (with constant N) to instead be evaluated as x >= N+1, since >= comparisons generate better code. This is possible as long as N is not the maximum value in the type, but in that case the comparison is always false. There are also a few other tweaks to the generated code in some cases.
2022-07-10 22:27:38 -05:00
Stephen Heumann 76e4b1f038 Optimize away some tax/tay instructions used only to set flags. 2022-07-10 17:35:56 -05:00
Stephen Heumann 393b7304a0 Optimize 16-bit multiplication by various constants.
This optimizes most multiplications by a power of 2 or the sum of two powers of 2, converting them to equivalent operations using shifts which should be faster than the general-purpose multiplication routine.
2022-07-06 22:24:54 -05:00
Stephen Heumann 497e5c036b Use new 16-bit unsigned multiply routine that complies with C standards.
This changes unsigned 16-bit multiplies to use the new ~CUMul2 routine in ORCALib, rather than ~UMul2 in SysLib. They differ in that ~CUMul2 gives the low-order 16 bits of the true result in case of overflow. The C standards require this behavior for arithmetic on unsigned types.
2022-07-06 22:22:02 -05:00
Stephen Heumann 161bb952e3 Dynamically allocate string space, and make it larger.
This increases the limit on total bytes of strings in a function, and also frees up space in the blank segment.
2022-06-08 22:09:30 -05:00
Stephen Heumann e8d90a1b69 Do not generate extra zero bytes after certain string constants.
These extra bytes are unnecessary after the changes in commit 5871820e0c to make string constants explicitly include their null terminators.

The extra bytes would be generated for code like the following:

int main(void) {
        static char *s1 = "abc", *s2 = "def", *s3 = "ghi";
}
2022-01-29 18:27:03 -06:00
Stephen Heumann fc515108f4 Make floating-point casts reduce the range and precision of numbers.
The C standards generally allow floating-point operations to be done with extra range and precision, but they require that explicit casts convert to the actual type specified. ORCA/C was not previously doing that.

This patch relies on some new library routines (currently in ORCALib) to do this precision reduction.

This fixes #64.
2021-03-06 22:28:39 -06:00
Stephen Heumann f19d21365a Recognize more indirect long instructions in the native code optimizer.
These instructions can be generated for indirect accesses to quad values, and the optimization can sometimes make those code sequences more efficient (e.g. avoiding unnecessary reloads of Y).
2021-03-02 19:19:00 -06:00
Stephen Heumann e3b24fb50b Add support for real to long long conversions. 2021-02-16 18:47:28 -06:00
Stephen Heumann e38be489df Implement comparisons for signed long long.
These use a library function to perform the comparison.
2021-02-15 18:10:34 -06:00
Stephen Heumann 8faafcc7c8 Implement 64-bit shifts. 2021-02-12 15:06:15 -06:00
Stephen Heumann 30f2eda4f3 Generate code for long long to real conversions. 2021-02-11 12:41:58 -06:00
Stephen Heumann 05868667b2 Implement 64-bit division and remainder, signed and unsigned.
These operations rely on new library routines in ORCALib (~CDIV8 and ~UDIV8).
2021-02-05 12:42:48 -06:00
Stephen Heumann 08cf7a0181 Implement 64-bit multiplication support.
Signed multiplication uses the existing ~MUL8 routine in SysLib. Unsigned multiplication will use a new ~UMUL8 library routine.
2021-02-04 22:23:59 -06:00
Stephen Heumann 168a06b7bf Add support for emitting 64-bit constants in statically-initialized data. 2021-02-04 02:17:10 -06:00
Stephen Heumann 32b0d53b07 PLD/TCD should invalidate register==DP location correspondences.
I don't think this ever comes up in code from the ORCA code generator, but it can in inline assembly.
2021-02-02 18:36:18 -06:00
Stephen Heumann ffe6c4e924 Spellcheck comments throughout the code.
There are no non-comment changes.
2020-01-29 17:09:52 -06:00
Stephen Heumann e6a0769bed Fix register optimizer bug that generated bad code in some cases.
The register optimizer tracks when a register is known to contain the same value as a memory location (direct page or absolute) and does optimizations based on this. But it did not always recognize when this information had become invalid because of a subsequent store to the memory location, so it might perform invalid optimizations. This patch adds those checks.

This fixes #66.
2020-01-28 12:54:18 -06:00
Stephen Heumann 857e432896 Disable a native-code optimization that was generating bad code for %.
Specifically, it converted PLX followed by PHA to STA 1,S. This is invalid if the x value is actually used, which is a case that can come up in the code now generated for the % operator.

It might be possible to re-enable this optimization with tighter checks about where it's applied, but I don't think it's terribly important.

The below program demonstrates an example that was being miscompiled:

#pragma optimize -1
#include <stdio.h>
int main(void) {
        int a = 100, b = 200, c = 3, d = 4;
        printf("%i\n", (a+b) % (c+d)); /* should be 6 */
}
2018-09-10 19:29:16 -05:00
Stephen Heumann 2d43074d5a Make % operator give proper remainders even if one or both operands are negative.
Per the C standards, the % operator should give a remainder after division, such that (a/b)*b + a%b equals a (provided that a/b is representable). As such, the operation of % is defined for cases where either or both of the operands are negative. Since division truncates toward 0, a%b should give a negative result (or 0) in cases where a is negative.

Previously, the % operator was essentially behaving like the "mod" operator in Pascal, which is equivalent for positive operands but not if either operand is negative. It would generally give incorrect results in those cases, or in some cases give compile-time or run-time errors.

This patch addresses both 16-bit and 32-bit signed computations at run time, and operations in constant expressions. The approach at run time is to call existing division routines, which return the correct remainder, except always as a positive number. The generated code checks the sign of the first operand, and if it is negative negates the remainder.

The code generated is somewhat large (especially for the 32-bit case), so it might be sensible to put it in a library function and call that, but for now it's just generated in-line. This avoids introducing a dependency on a new library function, so the generated code remains compatible with older versions of ORCALib (e.g. the GNO one).

Fixes #10.
2018-09-10 18:21:17 -05:00
Stephen Heumann bdd60d9d08 When using varargs stack repair, only disable native-code peephole opt in functions containing varargs calls.
There is no need to reduce the optimization in other functions, which will not contain any varargs stack repair code.
2018-01-13 21:37:28 -06:00
Stephen Heumann 44714767e5 Don't optimize out certain volatile stores.
This could occur due to the new native-code peephole optimizations for stz instructions, which can collapse consecutive identical ones down to one instruction. This is OK most of the time, but not when dealing with volatile variables, so disable it in that case.

The following test case shows the issue (look at the generated code):

#pragma optimize -1
volatile int a;
int main(void) {
        a = 0;
        a = 0;
}
2018-01-07 21:50:32 -06:00
Stephen Heumann 5312843b93 Don't invalidly eliminate certain stores of 0 to global/static variables.
This could happen in native-code peephole optimization if two stz instructions targeting different global/static locations occurred consecutively.

This was a regression introduced by commit a3170ea7.

The following program demonstrates the problem:

#pragma optimize 1+2+8+64
int i,j=1;
int main (void) {
    i = 0;
    j = 0;
    return j; /* should return 0 */
}
2017-12-13 20:03:48 -06:00
Stephen Heumann a3170ea715 Do a few more native code peephole optimizations.
Patch from Kelvin Sherlock.
2017-10-21 20:36:21 -05:00
Stephen Heumann ccd653ddb9 Move some more code out of the blank segment to make space for static data. 2017-10-21 20:36:21 -05:00
Stephen Heumann cf1cd085d8 Don't block the LDA elimination optimization if a native code label is encountered.
This was an inadvertent change in commit 9d2bb600. This patch restores the old behavior with respect to d_lab.
2017-10-21 20:36:21 -05:00
Stephen Heumann 8c81b23b6f Expand the size of the object buffer from 64K to 128K, and use 32-bit values to track related sizes.
This allows functions that require an OMF segment byte count of up to 128K to be compiled, although the length in memory at run time is still limited to 64K. (The OMF segment byte count is usually larger, due to the size of relocation records, etc.)

This is useful for compiling large functions, e.g. the main interpreter loop in git. It also fixes the bug shown in the compca23 test case, where functions that require a segment of over 64K may appear to compile correctly but generate corrupted OMF segment headers. This related to tracking sizes with 16-bit values that could roll over.

This patch increases the memory needed at run time by 64K. This shouldn’t generally be a problem on systems with sufficient memory, although it does increase the minimum memory requirement a bit. If behavior in low-memory configurations is a concern, buffSize could be made into a run-time option.
2017-10-21 20:36:21 -05:00
Stephen Heumann 41fb05404e Don’t add the length of the last segment generated in the previous execution to that of the segment in the root file.
This would occur if ORCA/C remained in memory and was restarted after a previous execution, because the 'pc' value was not reinitialized. The ORCA linker seems to ignore the too-long segment length value, but ORCA/C should generate a correct value that actually corresponds to the length of the segment.
2017-10-21 20:36:21 -05:00
Stephen Heumann 709f9b3f25 Fix bug where comparing 32-bit values in static arrays or structs against 0 may give wrong results with large memory model.
The issue was that 16-bit absolute addressing (in the data bank) was being used to access the data to compare, but with the large memory model the static arrays or structs are not necessarily in the same bank, so absolute long addressing should be used.

This was sometimes causing failures in the C4.6.4.1.CC and C4.6.6.1.CC conformance tests in the ORCA/C test suite.

The following program often demonstrates the problem (depending on memory layout and contents):

#pragma memorymodel 1
#pragma optimize 1

#include <stdio.h>

int i;
char ch1[32000];
long L1[1];

int main (void)
{
    if (L1 [0] != 0)
        printf("%li\n", L1[0]); /* shouldn't print */

    /* buggy behavior can happen if the bank bytes of these pointers differ */
    printf("%p %p\n", &L1[0], &i);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann fd48d77c60 Don’t erroneously optimize out lda instructions in certain cases involving instructions the native-code optimizer didn’t know about.
This could cause problems when asm blocks contained instructions that the ORCA/C native code optimizer didn’t know about, as in the example below. It might also be possible to trigger this bug without asm blocks (particularly with the large memory model), but I haven’t run into a case that does.

The new approach conservatively assumes that unknown instructions block the optimization. This should be equivalent to the old code with respect to the instructions defined in CGI.pas, except that m_bit_imm should have been treated as blocking the optimization but was not. There are still some other potential problem cases with applying this lda-elimination optimization to arbitrary assembly code, but fixing them might interfere with the optimization in useful cases, so I’m leaving those alone for now.

Here is an example of a program with an asm block affected by this problem:

#pragma optimize 74
int x,y;

/* should print 2 when invoked with argc==1 */
int main(int argc, char **argv)
{
    x = argc;
    y = argc + 6;

    asm {
        lda #1
        pha
        eor >x
        bne done
        inc argc
done:   pla
    }

    printf("%i\n", argc);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann efb23003f7 Don't do an optimization that would move a store to a DP location above an indirect load using that DP location.
This generated invalid code in instances like the following. The code generated for "s = s->u.next" would update the most significant word of s first, then use an indirect load with the half-updated pointer value to update the least significant word of s. This would generally corrupt the result if the new and old pointers had different bank bytes.

#pragma optimize 79
#include <stdio.h>

struct S {
	int i;
	union {
		struct S * next;
	} u;
} s1 = {0, 0};

int main (void)
{
	struct S * s = &s1;
	s = s->u.next;
	if (s != 0)
		puts("compiler bug detected\n"); /* May not always be triggered, depending on memory contents. */
}
2017-10-21 20:36:20 -05:00
Stephen Heumann 97cca84713 When pointer arithmetic is used to initialize a global or static variable to point before the beginning of a string constant, initialize it to the value indicated by the pointer arithmetic.
Previously, such initializations would sometimes generate a garbage value pointing up to 65535 bytes beyond the start of the string constant. (This was due to a lack of sign-extension in the object code generation.)

Computing a pointer to before the start of an object invokes undefined behavior, so the previous behavior wasn't technically wrong, but it was unintuitive and served no useful purpose. The new behavior should at least be easier to understand and debug.
2017-10-21 20:36:20 -05:00