Commit Graph

65 Commits

Author SHA1 Message Date
Stephen Heumann
83537fd3c7 Disable a peephole optimization that can produce bad code.
The optimization applies to code sequences like:
	dec abs
	lda abs
	beq ...
where the dec and lda were supposed to refer to the same location.

There were two problems with this optimization as written:
- It considered the dec and lda to refer to the same location even if they were actually references to different elements of the same array.
- It did not work in the case where the A register value was needed in subsequent code.

The first of these was already an issue in previous ORCA/C releases, as in the following example:

#pragma optimize -1
int x[2] = {0,0};
int main(void) {
        --x[0];
        if (x[1] != 0)
                return 123;
        return 0; /* should return 0 */
}

I do not believe the second problem was triggered by any code sequences generated in previous releases of ORCA/C, but it can be triggered after commit 4c402fc88, e.g. by the following example:

#pragma optimize -1
int x = 1;
int main(void) {
        int y = 123;
        --x;
        return x == 0; /* should return 1 */
}

Since the circumstances where this peephole optimization was validly triggered are pretty obscure, just disabling it should have a minimal impact on the generated code.
2024-03-17 21:31:18 -05:00
Stephen Heumann
24c6e72a83 Simplify some conditional branches.
This affects certain places where code like the following could be generated:

	bCC lab2
lab1	brl ...
lab2	...

If lab1 is no longer referenced due to previous optimizations, it can be removed. This then allows the bCC+brl combination to be shortened to a single conditional branch, if the target is close enough.

This introduces a flag for tracking and potentially removing labels that are only used as the target of one branch. This could be used more widely, but currently it is only used for the specific code sequences shown above. Using it in other places could potentially open up possibilities for invalid native-code optimizations that were previously blocked due to the presence of the label.
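
For illustration (a hedged sketch, using bcc as the conditional branch; the actual condition varies):

	bcc lab2
lab1	brl target
lab2	...

Once lab1 is unreferenced and removed, the bcc+brl pair can collapse to a single branch on the opposite condition, provided target is in range:

	bcs target
lab2	...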
2024-03-05 22:20:34 -06:00
Stephen Heumann
0f18fa63b5 Optimize some additional cases of a branch to a branch.
This covers patterns like

	bCC lab
	???
	???
lab:	bra/brl ...

These can come up in the new code for 32-bit ||, but also in cases like "if (i > 0) ...".
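
For example (a hedged sketch, assuming the final destination is close enough for the conditional branch to reach), the bCC is retargeted straight past the intermediate branch:

	bCC target
	???
	???
lab:	bra/brl target

(The branch at lab remains, since it can still be reached by falling through the intervening code.)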
2024-03-05 17:16:17 -06:00
Stephen Heumann
9a56a50f5f Support FPE card auto-detection.
The second parameter of #pragma float is now optional; if it is missing or invalid, the FPE slot is auto-detected by the start-up code. This is done by calling the new ~InitFloat function in the FPE version of SysFloat.
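
A hedged usage sketch (the first parameter selects the floating-point support; its value here is only a placeholder assumption):

#pragma float 1 3   /* FPE card in slot 3, set explicitly */
#pragma float 1     /* slot omitted: auto-detected by the start-up code */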
2023-06-26 18:33:54 -05:00
Stephen Heumann
0021fd81bc #pragma float: Generate code in the .root file to set the FPE slot.
This allows valid FPE-using programs to be compiled using only #pragma float, with no changes needed to the code itself.

The slot-setting code is only generated if the slot is 1..7, and even then it can be overridden by calling setfpeslot(), so this should not cause compatibility problems for existing code.
2023-06-17 18:13:31 -05:00
Stephen Heumann
0b3f48157e Simplify code for writing out extended constants.
This removes the need for the CnvSX function, so it is removed.
2023-04-04 18:11:04 -05:00
Stephen Heumann
7e860e60df Generate better code for pc_ixa in large memory model.
This improves the code for certain array indexing operations.
2023-03-23 18:41:16 -05:00
Stephen Heumann
cc36e9929f Remove some unused variables. 2023-03-20 11:12:48 -05:00
Stephen Heumann
cbf32e5b71 Comment out an unused peephole optimization involving BVS.
The code generator never generates this code sequence (and did not do so even prior to the last commit), so having a peephole optimization for it is pointless.
2023-03-18 20:09:49 -05:00
Stephen Heumann
a5eafe56af Generate more efficient code for 16-bit signed comparisons.
The new code is smaller and (in the common case where the subtraction does not overflow) faster. It takes advantage of the fact that in overflow cases the carry flag always gets set to the opposite of the sign bit of the result.
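
A hedged sketch of the idea, for a signed test of a < b (not necessarily the exact generated sequence):

	lda a
	sec
	sbc b
	bvc noOvfl	; no overflow: N is the true sign of a-b
	bcs isLess	; overflow: C is the opposite of N, so it gives the true sign
	bra notLess
noOvfl	bmi isLess	; a negative difference means a < b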
2023-03-18 20:05:56 -05:00
Stephen Heumann
49deff3c86 Generate more efficient code for certain conditionals.
This will change a "jump if true" to "jump if false" (or vice versa) and logically negate the condition in certain cases where that generates better code.

An assembly peephole optimization for certain "branch to branch" instructions is also added. (Certain conditionals could generate these.)
2023-03-14 21:32:20 -05:00
Stephen Heumann
7c8ec41148 Optimize some assembly code sequences that can occur for array access.
Here is an example that benefits from the new optimizations:

#pragma optimize 7
void f(char *a, unsigned i, unsigned n) {
        a[i] = (a[i] & 0xF0) | n;
}
2023-03-09 17:53:45 -06:00
Stephen Heumann
c6ba1e1c1c Use bit operations rather than division in a few places.
This should produce faster code.
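
The idea, sketched in C with hypothetical helpers (a hedged illustration; the actual changes are to the compiler's own source):

unsigned half_slow(unsigned bytes) {
        return bytes / 2;       /* calls a general division routine */
}
unsigned half_fast(unsigned bytes) {
        return bytes >> 1;      /* same result for unsigned values, faster */
}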
2023-03-06 22:52:52 -06:00
Stephen Heumann
a6ef872513 Add debugging option to detect illegal use of null pointers.
This adds debugging code to detect null pointer dereferences, as well as pointer arithmetic on null pointers (which is also undefined behavior, and can lead to later dereferences of the resulting pointers).

Note that ORCA/Pascal can already detect null pointer dereferences as part of its more general range-checking code. This implementation for ORCA/C will report the same error as ORCA/Pascal ("Subrange exceeded"). However, it does not include any of the other forms of range checking that ORCA/Pascal does, and (unlike in ORCA/Pascal) it is controlled by a separate flag from stack overflow checking.
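
For example, a hedged sketch of code the new checks flag at run time:

#include <stddef.h>
int main(void) {
        int *p = NULL;
        int *q = p + 2; /* pointer arithmetic on a null pointer: now detected */
        return *p;      /* null dereference: reported as "Subrange exceeded" */
}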
2023-02-12 18:56:02 -06:00
Stephen Heumann
a87aeef25b Ensure native peephole opt uses a jump table.
In ORCA/Pascal's code generation, a case statement may use a jump table or a sequence of comparisons depending on whether it is considered sparse. This one was just a little too sparse to use a jump table, but changing it to use one makes it considerably faster. To force generation of a jump table, this commit adds several more explicit cases (even though they don't do anything).
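
The same technique, sketched in C (a hedged illustration; the actual change is in the compiler's Pascal source):

/* a sparse switch may compile to a chain of comparisons; explicit
   do-nothing cases make it dense enough to force a jump table */
void dispatch(int op) {
        switch (op) {
                case 10: /* ... */ break;
                case 13: /* ... */ break;
                case 16: /* ... */ break;
                case 11: case 12: case 14: case 15: break; /* padding only */
        }
}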
2022-12-20 20:31:24 -06:00
Stephen Heumann
cf9f19c93d Optimize LDA+TAY to LDY (when A is unused after).
This pattern comes up in the new return code when returning a local variable.
2022-12-20 20:21:25 -06:00
Stephen Heumann
44499bdddb Make root files jump to the shutdown code rather than calling it.
This better reflects that the shutdown code will never return.
2022-12-11 22:14:09 -06:00
Stephen Heumann
17936a14ed Rework root file code for CDevs to avoid leaking user IDs.
Formerly, the code would allocate user IDs but never free them. The result was that one user ID was leaked each time a CDev was opened and closed.

The new root code calls new cleanup code in ORCALib, which detects if the CDev is going away and deallocates its user ID if so.
2022-12-11 22:01:29 -06:00
Stephen Heumann
ecca7a7737 Never make the segment in the root file dynamic.
This would previously happen if a segment directive with "dynamic" appeared before the first function in the program. That would cause the resulting program not to work, because the root segment needs to be a static segment at the start of the program, but if it is dynamic it would come after a jump table and a static segment of library code.

The root segments are also configured to refer to main or the NDA/CDA entry points using LEXPR records, so that they can be in dynamic segments (not that they necessarily should be). That change is intentionally not done for CDEV/XCMD/NBA, because they use code resources, which do not support dynamic segments, so it is better to force a linker error in these cases.
2022-12-11 14:46:38 -06:00
Stephen Heumann
1754607908 Add native peephole opts for stack repair code.
These mainly affect cases of multiple successive or nested function calls.
2022-12-10 21:56:16 -06:00
Stephen Heumann
32975b720f Allow native code peephole opt to be used when stack repair is enabled.
I think the reason this was originally disallowed is that the old code sequence for stack repair code (in ORCA/C 2.1.0) ended with TYA. If this was followed by STA dp or STA abs, the native code peephole optimizer (prior to commit 7364e2d2d3) would have turned the combination into a STY instruction. That is invalid if the value in A is needed. This could come up, e.g., when assigning the return value from a function to two different variables.

This is no longer an issue, because the current code sequence for stack repair code no longer ends in TYA and is not susceptible to the same kind of invalid optimization. So it is no longer necessary to disable the native code peephole optimizer when using stack repair code (either for all calls or just varargs calls).
2022-12-10 20:34:00 -06:00
Stephen Heumann
7364e2d2d3 Fix issue with native code optimization of TYA+STA.
This would be changed to STY, but that is invalid if the A value is needed afterward. This could affect the code for certain division operations (after the optimizations in commit 4470626ade).

Here is an example that would be miscompiled:

#pragma optimize -1
#include <stdio.h>
int main(void) {
        unsigned i = 55555;
        unsigned a,b;
        a = b = i / 10000;
        printf("%u %u\n", a,b);
}

Also, remove MVN from the list of "ASafe" instructions since it really isn't, although I don't think this was affecting anything in practice.
2022-12-10 19:37:48 -06:00
Stephen Heumann
6857913daa Make the object buffer dynamically resizable.
It will now grow as needed to accommodate large segments, subject to the constraints of available memory. In practice, this mostly affects the size of initialized static arrays that can be used.

This also removes any limit apart from memory size on how large the object representation produced by a "compile to memory" can be, and cleans up error reporting regarding size limits.
2022-12-06 21:49:20 -06:00
Stephen Heumann
8aedd42294 Optimize out TDC following TCD.
This can occur if the first code in the function (which could be an initializer) takes the address of a local variable.
2022-12-05 18:02:23 -06:00
Stephen Heumann
d56cf7e666 Pass constant data to backend as pointers into buffer.
This avoids needing to generate many intermediate code records representing the data at most 8 bytes at a time, which should reduce memory use and probably improve performance for large initialized arrays or structs.
2022-12-03 00:14:15 -06:00
Stephen Heumann
99a10590b1 Avoid out-of-range branches around asm code using dcl directives.
The branch range calculation treated dcl directives as taking 2 bytes rather than 4, which could result in out-of-range branches. These could result in linker errors (for forward branches) or silently generating wrong code (for backward branches).

This patch now treats dcb, dcw, and dcl as separate directives in the native-code layer, so the appropriate length can be calculated for each.

Here is an example of code affected by this:

int main(int argc, char **argv) {
top:
        if (!argc) { /* this caused a linker error */
                asm {
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                        dcl 0
                }
                goto top; /* this generated bad code with no error */
        }
}
2022-10-13 18:00:16 -05:00
Stephen Heumann
19683706cc Do not optimize code from asm statements.
Previously, the assembly-level optimizations applied to code in asm statements. In many cases, this was fine (and could even do useful optimizations), but occasionally the optimizations could be invalid. This was especially the case if the assembly involved tricky things like self-modifying code.

To avoid these problems, this patch makes the assembly optimizers ignore code from asm statements, so it is always emitted as-is, without any changes.

This fixes #34.
2022-10-12 22:03:37 -05:00
Stephen Heumann
ca21e33ba7 Generate more efficient code for indirect function calls. 2022-10-11 21:14:40 -05:00
Stephen Heumann
05ecf5eef3 Add option to use the declared type for float/double/comp params.
This differs from the usual ORCA/C behavior of treating all floating-point parameters as extended. With the option enabled, they will still be passed in the extended format, but will be converted to their declared type at the start of the function. This is needed for strict standards conformance, because you should be able to take the address of a parameter and get a usable pointer to its declared type. The difference in types can also affect the behavior of _Generic expressions.
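
For example (a hedged sketch of the difference in behavior):

#include <stdio.h>
void f(float x) {
        float *p = &x;  /* with the option, a usable pointer to a real float */
        /* prints "float" with the option enabled; otherwise the parameter
           effectively retains the extended format it was passed in */
        puts(_Generic(x, float: "float", default: "extended"));
}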

The implementation of this is based on ORCA/Pascal, which already did the same thing (unconditionally) with real/double/comp parameters.
2022-09-18 21:16:46 -05:00
Stephen Heumann
60efb4d882 Generate better code for indexed jumps.
They now use a jmp (addr,X) instruction, rather than a more complicated code sequence using rts. This is an improvement that was suggested in an old Genie message from Todd Whitesel.
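
A hedged sketch of the new pattern (table built with the dcw directive; not the exact generated code):

	asl a	; scale the index: table entries are 2-byte addresses
	tax
	jmp (table,X)	; dispatch in a single instruction
table	dcw lab0	; hypothetical jump targets
	dcw lab1
	dcw lab2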
2022-07-18 21:18:26 -05:00
Stephen Heumann
c3567c81a4 Correct comments. 2022-07-11 18:23:36 -05:00
Stephen Heumann
9b31e7f72a Improve code generation for comparisons.
This converts comparisons like x > N (with constant N) to instead be evaluated as x >= N+1, since >= comparisons generate better code. This is possible as long as N is not the maximum value of the type; if it is, the comparison x > N is simply always false. There are also a few other tweaks to the generated code in some cases.
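
For example, an unsigned 16-bit x > 5 is evaluated as x >= 6, which maps directly onto the carry flag (a hedged sketch):

	lda x
	cmp #6
	bcs isTrue	; carry set: x >= 6, i.e. x > 5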
2022-07-10 22:27:38 -05:00
Stephen Heumann
76e4b1f038 Optimize away some tax/tay instructions used only to set flags. 2022-07-10 17:35:56 -05:00
Stephen Heumann
393b7304a0 Optimize 16-bit multiplication by various constants.
This optimizes most multiplications by a power of 2 or the sum of two powers of 2, converting them to equivalent operations using shifts which should be faster than the general-purpose multiplication routine.
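
For example (a hedged illustration; 10 = 8 + 2, a sum of two powers of 2):

unsigned scale(unsigned x) {
        return x * 10;  /* compiled roughly as (x << 3) + (x << 1) */
}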
2022-07-06 22:24:54 -05:00
Stephen Heumann
497e5c036b Use new 16-bit unsigned multiply routine that complies with C standards.
This changes unsigned 16-bit multiplies to use the new ~CUMul2 routine in ORCALib, rather than ~UMul2 in SysLib. They differ in that ~CUMul2 gives the low-order 16 bits of the true result in case of overflow. The C standards require this behavior for arithmetic on unsigned types.
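
For example (ORCA/C's unsigned int is 16 bits):

#include <stdio.h>
int main(void) {
        unsigned u = 50000u;
        u = u * 2;          /* true result 100000 overflows 16 bits */
        printf("%u\n", u);  /* must print 34464, i.e. 100000 mod 65536 */
}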
2022-07-06 22:22:02 -05:00
Stephen Heumann
161bb952e3 Dynamically allocate string space, and make it larger.
This increases the limit on total bytes of strings in a function, and also frees up space in the blank segment.
2022-06-08 22:09:30 -05:00
Stephen Heumann
e8d90a1b69 Do not generate extra zero bytes after certain string constants.
These extra bytes are unnecessary after the changes in commit 5871820e0c to make string constants explicitly include their null terminators.

The extra bytes would be generated for code like the following:

int main(void) {
        static char *s1 = "abc", *s2 = "def", *s3 = "ghi";
}
2022-01-29 18:27:03 -06:00
Stephen Heumann
fc515108f4 Make floating-point casts reduce the range and precision of numbers.
The C standards generally allow floating-point operations to be done with extra range and precision, but they require that explicit casts convert to the actual type specified. ORCA/C was not previously doing that.

This patch relies on some new library routines (currently in ORCALib) to do this precision reduction.
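
For example (a hedged sketch):

#include <stdio.h>
int main(void) {
        float f = 16777216.0f;  /* 2^24: the next float up is 2^24 + 2 */
        /* f + 1.0f may be computed as exactly 16777217 in extended precision,
           but the explicit cast must round it back to float, giving 16777216 */
        if ((float)(f + 1.0f) == f)
                puts("cast reduced the result to float precision");
        return 0;
}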

This fixes #64.
2021-03-06 22:28:39 -06:00
Stephen Heumann
f19d21365a Recognize more indirect long instructions in the native code optimizer.
These instructions can be generated for indirect accesses to quad values, and the optimization can sometimes make those code sequences more efficient (e.g. avoiding unnecessary reloads of Y).
2021-03-02 19:19:00 -06:00
Stephen Heumann
e3b24fb50b Add support for real to long long conversions. 2021-02-16 18:47:28 -06:00
Stephen Heumann
e38be489df Implement comparisons for signed long long.
These use a library function to perform the comparison.
2021-02-15 18:10:34 -06:00
Stephen Heumann
8faafcc7c8 Implement 64-bit shifts. 2021-02-12 15:06:15 -06:00
Stephen Heumann
30f2eda4f3 Generate code for long long to real conversions. 2021-02-11 12:41:58 -06:00
Stephen Heumann
05868667b2 Implement 64-bit division and remainder, signed and unsigned.
These operations rely on new library routines in ORCALib (~CDIV8 and ~UDIV8).
2021-02-05 12:42:48 -06:00
Stephen Heumann
08cf7a0181 Implement 64-bit multiplication support.
Signed multiplication uses the existing ~MUL8 routine in SysLib. Unsigned multiplication will use a new ~UMUL8 library routine.
2021-02-04 22:23:59 -06:00
Stephen Heumann
168a06b7bf Add support for emitting 64-bit constants in statically-initialized data. 2021-02-04 02:17:10 -06:00
Stephen Heumann
32b0d53b07 PLD/TCD should invalidate register==DP location correspondences.
I don't think this ever comes up in code from the ORCA code generator, but it can in inline assembly.
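
A hedged sketch of the hazard in inline assembly:

	lda dploc	; optimizer may record: A == DP location dploc
	tcd	; the direct page moves, so that note is now stale
	lda dploc	; same operand, different memory: must not be removed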
2021-02-02 18:36:18 -06:00
Stephen Heumann
ffe6c4e924 Spellcheck comments throughout the code.
There are no non-comment changes.
2020-01-29 17:09:52 -06:00
Stephen Heumann
e6a0769bed Fix register optimizer bug that generated bad code in some cases.
The register optimizer tracks when a register is known to contain the same value as a memory location (direct page or absolute) and does optimizations based on this. But it did not always recognize when this information had become invalid because of a subsequent store to the memory location, so it might perform invalid optimizations. This patch adds those checks.
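
A hedged sketch of the kind of sequence involved:

	lda x	; optimizer records: A == x
	inc x	; x is modified, so the note must be discarded
	lda x	; without the fix, this reload could be wrongly removed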

This fixes #66.
2020-01-28 12:54:18 -06:00
Stephen Heumann
857e432896 Disable a native-code optimization that was generating bad code for %.
Specifically, it converted PLX followed by PHA to STA 1,S. This is invalid if the value pulled into X is actually used, which is a case that can come up in the code now generated for the % operator.

It might be possible to re-enable this optimization with tighter checks about where it's applied, but I don't think it's terribly important.

The below program demonstrates an example that was being miscompiled:

#pragma optimize -1
#include <stdio.h>
int main(void) {
        int a = 100, b = 200, c = 3, d = 4;
        printf("%i\n", (a+b) % (c+d)); /* should be 6 */
}
2018-09-10 19:29:16 -05:00