Commit Graph

120 Commits

Author SHA1 Message Date
Stephen Heumann 6b7414384f Fix code generation bug for indirect load/store of 64-bit values.
The issue was that when a 64-bit value was loaded via one pointer and stored via another, the load and store code could both use the Y register for their indexing. The two uses would clash with each other, potentially causing the loads to come from the wrong place.

Here are some examples that illustrate the problem:

/* example 1 */
int main(void) {
        struct {
                char c[16];
                long long x;
        } s = {.x = 0x1234567890abcdef}, *sp = &s;
        long long ll, *llp = &ll;
        *llp = sp->x;
        return ll != s.x; // should return 0
}

/* example 2 */
int main(void) {
        struct {
                char c[16];
                long long x;
        } s = {.x = 0x1234567890abcdef}, *sp = &s;
        long long ll, *llp = &ll;
        unsigned i = 0;
        *llp = sp[i].x;
        return ll != s.x; // should return 0
}

/* example 3 */
int main(void) {
        long long x[2] = {0, 0x1234567890abcdef}, *xp = x;
        long long ll, *llp = &ll;
        unsigned i = 1;
        *llp = xp[i];
        return ll != x[1]; // should return 0
}
2024-04-10 20:49:17 -05:00
Stephen Heumann 77e0b8fc59 Fix codegen error for some indirect accesses to 64-bit values.
The code was not properly adding in the offset of the 64-bit value from the pointed-to location, so the wrong memory location would be accessed. This affected indirect accesses to non-initial structure members when they were used as operands to certain operations.

Here is an example showing the problem:

#include <stdio.h>

long long x = 123456;

struct S {
        long long a;
        long long b;
} s = {0, 123456};

int main(void) {
        struct S *sp = &s;

        if (sp->b != x) {
                puts("error");
        }
}
2024-04-03 21:04:47 -05:00
Stephen Heumann 72234a4f2b Generate better code for most unsigned 32-bit comparisons. 2024-03-10 21:24:33 -05:00
Stephen Heumann 36f766a662 Generate better code for comparisons against constant 1 or 2. 2024-03-06 21:57:27 -06:00
Stephen Heumann 4c402fc883 Generate better code for certain equality/inequality comparisons. 2024-03-06 21:18:50 -06:00
Stephen Heumann ca0147507b Generate slightly better code for logical negation. 2024-03-06 17:04:51 -06:00
Stephen Heumann 24c6e72a83 Simplify some conditional branches.
This affects certain places where code like the following could be generated:

	bCC lab2
lab1	brl ...
lab2	...

If lab1 is no longer referenced due to previous optimizations, it can be removed. This then allows the bCC+brl combination to be shortened to a single conditional branch, if the target is close enough.

This introduces a flag for tracking and potentially removing labels that are only used as the target of one branch. This could be used more widely, but currently it is only used for the specific code sequences shown above. Using it in other places could potentially open up possibilities for invalid native-code optimizations that were previously blocked due to the presence of the label.
2024-03-05 22:20:34 -06:00
Stephen Heumann 8f07ca5d6c Generate better code for && and || with 32-bit operands. 2024-03-05 17:09:21 -06:00
Stephen Heumann 60b472a99e Optimize generated code for some indexing ops in large memory model.
This generates slightly better code for indexing a global/static char array with a signed 16-bit index and a positive offset, e.g. a[i+1].

Here is an example that is affected:

#pragma memorymodel 1
#pragma optimize -1
char a[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
int main(int argc, char *argv[]) {
        return a[argc+2];
}
2024-03-04 19:38:39 -06:00
Stephen Heumann 995885540b Correct a comment. 2024-03-04 19:03:13 -06:00
Stephen Heumann 74cec68dac Generate better code for some floating-point and long long constants.
We now recognize cases where the same value needs to be pushed for several consecutive words, so it is more efficient to load it into a register and push that rather than just using PEA instructions.
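
Here is a hypothetical example: passing a long long constant of 0 requires pushing the same zero word for four consecutive words, so it can be loaded into a register once and pushed repeatedly.

void f(long long x) {}

int main(void) {
        f(0);  /* four identical zero words pushed for the argument */
}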
2023-04-02 19:39:03 -05:00
Stephen Heumann ae89e77bbe Remove some unused or write-only variables. 2023-03-24 19:49:55 -05:00
Stephen Heumann 7e860e60df Generate better code for pc_ixa in large memory model.
This improves the code for certain array indexing operations.
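
Here is a hypothetical example of an array indexing operation that should be improved:

#pragma memorymodel 1
int a[1000];
int main(int argc, char *argv[]) {
        return a[argc];
}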
2023-03-23 18:41:16 -05:00
Stephen Heumann 3a64c5b977 Generate better code for stack array indexing in large memory model.
Any stack-allocated array must be < 32KB, so we can use the same approach as in the small memory model to compute indexes for it (which is considerably more efficient than the large-memory-model code).
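
Here is a hypothetical example with a stack-allocated array that can now use the small-memory-model indexing approach:

#pragma memorymodel 1
int main(void) {
        int a[50];      /* on the stack, so necessarily < 32KB */
        int i;
        for (i = 0; i < 50; i++)
                a[i] = i;
        return a[10];   /* should return 10 */
}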
2023-03-20 17:56:44 -05:00
Stephen Heumann cc36e9929f Remove some unused variables. 2023-03-20 11:12:48 -05:00
Stephen Heumann a5eafe56af Generate more efficient code for 16-bit signed comparisons.
The new code is smaller and (in the common case where the subtraction does not overflow) faster. It takes advantage of the fact that in overflow cases the carry flag always gets set to the opposite of the sign bit of the result.
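
Here is a hypothetical example of an affected comparison:

int f(int a, int b) {
        return a < b;   /* 16-bit signed comparison */
}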
2023-03-18 20:05:56 -05:00
Stephen Heumann 49deff3c86 Generate more efficient code for certain conditionals.
This will change a "jump if true" to "jump if false" (or vice versa) and logically negate the condition in certain cases where that generates better code.

An assembly peephole optimization for certain "branch to branch" instructions is also added. (Certain conditionals could generate these.)
2023-03-14 21:32:20 -05:00
Stephen Heumann 85890e0b6b Give an error if assembly code tries to use direct page addressing for a local variable that is out of range.
This could previously cause bad code to be produced with no error reported.
2023-03-04 21:06:07 -06:00
Stephen Heumann a6ef872513 Add debugging option to detect illegal use of null pointers.
This adds debugging code to detect null pointer dereferences, as well as pointer arithmetic on null pointers (which is also undefined behavior, and can lead to later dereferences of the resulting pointers).

Note that ORCA/Pascal can already detect null pointer dereferences as part of its more general range-checking code. This implementation for ORCA/C will report the same error as ORCA/Pascal ("Subrange exceeded"). However, it does not include any of the other forms of range checking that ORCA/Pascal does, and (unlike in ORCA/Pascal) it is controlled by a separate flag from stack overflow checking.
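
Here is a hypothetical example of code the new checks would catch at run time (with the option enabled):

#include <stdio.h>

int main(void) {
        int *p = NULL;
        printf("%d\n", *p);     /* null pointer dereference */
}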
2023-02-12 18:56:02 -06:00
Stephen Heumann 40260bb8a0 Remove an unnecessary instruction from stack check code.
The intention may have been to set the flags based on the return value, but that is not part of the calling convention and nothing should be relying on it.
2023-01-14 19:09:45 -06:00
Stephen Heumann 2958619726 Fix varargs stack repair.
Varargs-only stack repair (i.e. using #pragma optimize bit 3 but not bit 6) was broken by commit 32975b720f. It removed some code that was needed to allocate the direct page location used to hold the stack pointer value in that case. This would lead to invalid code being produced, which could cause a crash when run. The fix is to revert the erroneous parts of commit 32975b720f (which do not affect its core purpose of enabling intermediate code peephole optimization to be used when stack repair code is active).
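
Here is a hypothetical example of an affected configuration (assuming bit 3 corresponds to the pragma value 8):

#pragma optimize 8      /* varargs stack repair only */
#include <stdio.h>

int main(void) {
        printf("%d\n", 42);     /* varargs call gets stack repair code */
}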
2023-01-08 15:15:32 -06:00
Stephen Heumann 854a6779a9 Generate even better code for constant returns.
If the return value is just a numeric constant or static address, it can simply be loaded right before the RTL instruction, avoiding any data movement.

This could actually be applied to a somewhat broader class of expressions, but for now it only applies to numeric or address constants, for which it is clearly correct.
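
Here is a hypothetical example where the constant can now be loaded immediately before the RTL instruction:

int f(void) {
        return 42;
}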
2022-12-19 21:18:41 -06:00
Stephen Heumann d68e0b268f Generate more efficient code for a single return at end of function.
When a function has a single return statement at the end and meets certain other constraints, we now generate a different intermediate code instruction to evaluate the return value as part of the return operation, rather than assigning it to (effectively) a variable and then reading that value again to return it.

This approach could actually be used for all returns in C code, but for now we only use it for a single return at the end. Directly applying it in other cases could increase the code size by duplicating the function epilogue code.
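
Here is a hypothetical example of a function with a single return at the end:

int add(int a, int b) {
        return a + b;   /* evaluated as part of the return operation */
}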
2022-12-19 18:52:46 -06:00
Stephen Heumann 32975b720f Allow native code peephole opt to be used when stack repair is enabled.
I think the reason this was originally disallowed is that the old code sequence for stack repair code (in ORCA/C 2.1.0) ended with TYA. If this was followed by STA dp or STA abs, the native code peephole optimizer (prior to commit 7364e2d2d3) would have turned the combination into a STY instruction. That is invalid if the value in A is needed. This could come up, e.g., when assigning the return value from a function to two different variables.

This is no longer an issue, because the current code sequence for stack repair code no longer ends in TYA and is not susceptible to the same kind of invalid optimization. So it is no longer necessary to disable the native code peephole optimizer when using stack repair code (either for all calls or just varargs calls).
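
Here is a hypothetical example of the kind of code the old invalid optimization could have broken:

int f(void) {
        return 42;
}

int a, b;

int main(void) {
        a = b = f();    /* the return value in A is needed for both stores */
        return a - b;   /* should return 0 */
}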
2022-12-10 20:34:00 -06:00
Stephen Heumann fb5a2fcf33 Generate more efficient code for shifts by 13, 14, or 15 bits. 2022-12-07 21:54:53 -06:00
Stephen Heumann 0c4660d5fc Generate better code for pc_mov in some cases.
This allows it to use MVN-based copying code in more cases, including when moving to/from local variables on the stack. This is slightly shorter and more efficient than calling a helper function.
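
Here is a hypothetical example of a copy that should now use the MVN-based code:

struct big { char data[64]; };
struct big g;

int main(void) {
        struct big local;       /* local variable on the stack */
        local = g;
        return local.data[0];
}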
2022-12-05 17:58:30 -06:00
Stephen Heumann 2550081517 Fix bug with 4-byte comparisons against globals in large memory model.
Long addressing was not being used to access the values, which could lead to mis-evaluation of comparisons against values in global structs, unions, or arrays, depending on the memory layout.

This could sometimes affect the c99desinit.c test, when run with large memory model and at least intermediate code peephole optimization. It could also affect this simpler test (depending on memory layout):

#pragma memorymodel 1
#pragma optimize 1
struct S {
        void *p;
} s = {&s};
int main(void) {
        return s.p != &s; /* should be 0 */
}
2022-12-04 21:54:29 -06:00
Stephen Heumann 19683706cc Do not optimize code from asm statements.
Previously, the assembly-level optimizations applied to code in asm statements. In many cases, this was fine (and could even do useful optimizations), but occasionally the optimizations could be invalid. This was especially the case if the assembly involved tricky things like self-modifying code.

To avoid these problems, this patch makes the assembly optimizers ignore code from asm statements, so it is always emitted as-is, without any changes.
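
Here is a hypothetical example (assuming ORCA/C's asm-block syntax) of an asm statement that is now emitted as-is:

int x;

int main(void) {
        asm {
                lda #123
                sta x
        }
        return x;       /* should return 123 */
}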

This fixes #34.
2022-10-12 22:03:37 -05:00
Stephen Heumann ca21e33ba7 Generate more efficient code for indirect function calls. 2022-10-11 21:14:40 -05:00
Stephen Heumann 05ecf5eef3 Add option to use the declared type for float/double/comp params.
This differs from the usual ORCA/C behavior of treating all floating-point parameters as extended. With the option enabled, they will still be passed in the extended format, but will be converted to their declared type at the start of the function. This is needed for strict standards conformance, because you should be able to take the address of a parameter and get a usable pointer to its declared type. The difference in types can also affect the behavior of _Generic expressions.

The implementation of this is based on ORCA/Pascal, which already did the same thing (unconditionally) with real/double/comp parameters.
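
Here is a hypothetical example where the declared type matters (with the new option enabled, it should return 1):

int f(float x) {
        float *p = &x;          /* a usable pointer to float */
        return _Generic(*p, float: 1, default: 0);
}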
2022-09-18 21:16:46 -05:00
Stephen Heumann 60efb4d882 Generate better code for indexed jumps.
They now use a jmp (addr,X) instruction, rather than a more complicated code sequence using rts. This is an improvement that was suggested in an old Genie message from Todd Whitesel.
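
Indexed jumps typically arise from switch statements with a dense set of case values; here is a hypothetical example:

int f(int i) {
        switch (i) {
        case 0: return 10;
        case 1: return 11;
        case 2: return 12;
        case 3: return 13;
        default: return -1;
        }
}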
2022-07-18 21:18:26 -05:00
Stephen Heumann bdf8ed4f29 Simplify some code. 2022-07-17 18:15:29 -05:00
Stephen Heumann 417fd1ad9c Generate better code for && and ||. 2022-07-11 21:16:18 -05:00
Stephen Heumann 312a3a09b9 Generate better code for long long >= comparisons. 2022-07-11 19:20:55 -05:00
Stephen Heumann 687a5eaa45 Generate better code for pc_not on boolean operands. 2022-07-11 18:54:39 -05:00
Stephen Heumann b5b76b624c Use pei rather than load+push in a few places. 2022-07-11 18:42:14 -05:00
Stephen Heumann 607211d38e Rearrange some labels to facilitate branch-shortening optimization. 2022-07-11 18:39:00 -05:00
Stephen Heumann 9b31e7f72a Improve code generation for comparisons.
This converts comparisons like x > N (with constant N) to instead be evaluated as x >= N+1, since >= comparisons generate better code. This is possible as long as N is not the maximum value in the type; if N is the maximum, x > N is always false anyway. There are also a few other tweaks to the generated code in some cases.
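
Here is a hypothetical example of a comparison that would be rewritten:

int f(unsigned x) {
        return x > 5;   /* evaluated as x >= 6 */
}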
2022-07-10 22:27:38 -05:00
Stephen Heumann 76e4b1f038 Optimize away some tax/tay instructions used only to set flags. 2022-07-10 17:35:56 -05:00
Stephen Heumann bf40e861aa Fix indentation. 2022-07-10 13:12:10 -05:00
Stephen Heumann 2dff68e6ae Eliminate an unnecessary instruction in quad-to-word conversion.
The TAY instruction would set the flags, but that is unnecessary because pc_cnv is a "NeedsCondition" operation (and some other conversions also do not reliably set the flags).

The code is also changed to preserve the Y register, possibly facilitating register optimizations.
2022-07-09 21:48:56 -05:00
Stephen Heumann 4470626ade Optimize division/remainder by various constants.
This generally covers powers of two and certain other values. (Details differ for signed/unsigned div/rem.)
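
Here are hypothetical examples of the kinds of operations covered:

unsigned f(unsigned x) {
        return x / 8;   /* power of two: can be done with shifts */
}

int g(int x) {
        return x % 16;  /* signed remainder: handled with its own sequence */
}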
2022-07-09 15:05:47 -05:00
Stephen Heumann 054719aab2 Fix bug in code generation for the product of two constants.
This was a problem introduced in commit 393b7304a0. It could cause a compiler error for unoptimized array indexing code, e.g.:

int a[100];
int main(void) {
        return a[5];
}
2022-07-09 15:01:25 -05:00
Stephen Heumann f0d827eade Generate more efficient code for certain subtractions.
This affects 16-bit subtractions where only the left operand is "complex" (i.e. most things other than constants and simple loads). They were using an unnecessarily complicated code path suitable for the case where both operands are complex.
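
Here is a hypothetical example with a complex left operand and a constant right operand:

int f(int a, int b) {
        return (a + b) - 5;
}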
2022-07-07 18:38:41 -05:00
Stephen Heumann 7898c619c8 Fix several cases where a condition might not be evaluated correctly.
These could occur because the code for certain operations was assumed to set the z flag based on the result value, but did not actually do so. The affected operations were shifts, loads or stores of bit-fields, and ? : expressions.

Here is an example showing the problem with a shift:

#pragma optimize 1
int main(void) {
        int i = 1, j = 0;
        return (i >> j) ? 1 : 0;
}

Here is an example showing the problem with a bit-field load:

struct {
        signed int i : 16;
} s = {1};
int main(void) {
        return (s.i) ? 1 : 0;
}

Here is an example showing the problem with a bit-field store:

#pragma optimize 1
struct {
        signed int i : 16;
} s;
int main(void) {
        return (s.i = 1) ? 1 : 0;
}

Here is an example showing the problem with a ? : expression:

#pragma optimize 1
int main(void) {
        int a = 5;
        return (a ? (a<<a) : 0) ? 0 : 1;
}
2022-07-07 18:26:37 -05:00
Stephen Heumann 393b7304a0 Optimize 16-bit multiplication by various constants.
This optimizes most multiplications by a power of 2 or the sum of two powers of 2, converting them to equivalent operations using shifts which should be faster than the general-purpose multiplication routine.
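
Here is a hypothetical example: 10 = 8 + 2 is the sum of two powers of 2, so the multiplication can become (x << 3) + (x << 1).

int f(int x) {
        return x * 10;
}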
2022-07-06 22:24:54 -05:00
Stephen Heumann 497e5c036b Use new 16-bit unsigned multiply routine that complies with C standards.
This changes unsigned 16-bit multiplies to use the new ~CUMul2 routine in ORCALib, rather than ~UMul2 in SysLib. They differ in that ~CUMul2 gives the low-order 16 bits of the true result in case of overflow. The C standards require this behavior for arithmetic on unsigned types.
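
Here is a hypothetical example relying on this behavior (int is 16 bits in ORCA/C):

int main(void) {
        unsigned a = 300, b = 300;
        return a * b != 24464;  /* 90000 mod 65536 == 24464; should return 0 */
}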
2022-07-06 22:22:02 -05:00
Stephen Heumann 161bb952e3 Dynamically allocate string space, and make it larger.
This increases the limit on total bytes of strings in a function, and also frees up space in the blank segment.
2022-06-08 22:09:30 -05:00
Stephen Heumann a85846cc80 Fix codegen bug with pc_bno in some cases with a 64-bit right operand.
The desired location for the quad result was not saved, so it could be overwritten when generating code for the left operand. This could result in incorrect code that might trash the stack.

Here is an example affected by this:

#pragma optimize 1
int main(void) {
        long long a, b=2;
        char c = (a=1,b);
}
2022-06-08 20:49:32 -05:00
Stephen Heumann daff1754b2 Make volatile loads from IO/softswitches access exactly the byte(s) specified.
Previously, one-byte loads were typically done by reading a 16-bit value and then masking off the upper 8 bits. This is a problem when accessing softswitches or slot IO locations, because reading the subsequent byte may have some undesired effect. Now, ORCA/C will do an 8-bit read for such cases, if the volatile qualifier is used.

There were also a couple optimizations that could occasionally result in not all the bytes of a larger value actually being read. These are now disabled for volatile loads that may access softswitches or IO.

These changes should make ORCA/C more suitable for writing low-level software like device drivers.
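
Here is a hypothetical example of a one-byte volatile read from a softswitch (0xC000 is the keyboard data location on the Apple IIGS):

int main(void) {
        char key = *(volatile char *) 0xC000;   /* single 8-bit read; 0xC001 is not touched */
        return key;
}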
2022-05-23 21:10:29 -05:00