In ORCA/Pascal's code generation, a case statement may be compiled to a jump table or to a sequence of comparisons, depending on whether it is considered sparse. The case statement in question was just slightly too sparse to get a jump table, but forcing one makes it considerably faster. To do so, this commit adds several more explicit cases (even though they don't do anything).
Formerly, the code would allocate user IDs but never free them. The result was that one user ID was leaked for each time a CDev was opened and closed.
The new root code calls new cleanup code in ORCALib, which detects if the CDev is going away and deallocates its user ID if so.
This would previously happen if a segment directive with "dynamic" appeared before the first function in the program. That would cause the resulting program not to work, because the root segment needs to be a static segment at the start of the program, but if it is dynamic it would come after a jump table and a static segment of library code.
The root segments are also configured to refer to main or the NDA/CDA entry points using LEXPR records, so that they can be in dynamic segments (not that they necessarily should be). That change is intentionally not done for CDEV/XCMD/NBA, because they use code resources, which do not support dynamic segments, so it is better to force a linker error in these cases.
I think the reason this was originally disallowed is that the old code sequence for stack repair code (in ORCA/C 2.1.0) ended with TYA. If this was followed by STA dp or STA abs, the native code peephole optimizer (prior to commit 7364e2d2d3) would have turned the combination into a STY instruction. That is invalid if the value in A is needed. This could come up, e.g., when assigning the return value from a function to two different variables.
This is no longer an issue, because the current code sequence for stack repair code no longer ends in TYA and is not susceptible to the same kind of invalid optimization. So it is no longer necessary to disable the native code peephole optimizer when using stack repair code (either for all calls or just varargs calls).
This would be changed to STY, but that is invalid if the A value is needed afterward. This could affect the code for certain division operations (after the optimizations in commit 4470626ade).
Here is an example that would be miscompiled:
#pragma optimize -1
#include <stdio.h>
int main(void) {
    unsigned i = 55555;
    unsigned a, b;
    a = b = i / 10000;
    printf("%u %u\n", a, b);
}
Also, remove MVN from the list of "ASafe" instructions since it really isn't, although I don't think this was affecting anything in practice.
It will now grow as needed to accommodate large segments, subject to the constraints of available memory. In practice, this mostly affects the size of initialized static arrays that can be used.
This also removes any limit apart from memory size on how large the object representation produced by a "compile to memory" can be, and cleans up error reporting regarding size limits.
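As a rough illustration of a buffer that grows on demand instead of having a fixed limit, here is a minimal sketch in C. It is not ORCA/C's actual code: `buf_t` and `buf_append` are hypothetical names, and the doubling growth policy is an assumption. The point is that the only remaining limit is whether `realloc` succeeds, and a failure is reported as an error rather than overrunning the buffer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer: capacity doubles on demand, so the
   only limit on its size is available memory. */
typedef struct {
    unsigned char *data;
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
} buf_t;

/* Append n bytes from src. Returns 0 on success, or -1 if memory is
   exhausted; on failure the buffer is left unchanged. */
int buf_append(buf_t *b, const void *src, size_t n)
{
    if (n > SIZE_MAX - b->len)
        return -1;                          /* total length would overflow */
    if (b->len + n > b->cap) {
        size_t newcap = b->cap ? b->cap : 256;
        while (newcap < b->len + n) {
            if (newcap > SIZE_MAX / 2) {    /* doubling would overflow */
                newcap = b->len + n;
                break;
            }
            newcap *= 2;
        }
        unsigned char *p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;                      /* report instead of overrunning */
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```

Doubling keeps the number of reallocations logarithmic in the final size, which matters when large initialized arrays are appended piece by piece.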
This avoids needing to generate many intermediate code records representing the data at most 8 bytes at a time, which should reduce memory use and probably improve performance for large initialized arrays or structs.
The branch range calculation treated dcl directives as taking 2 bytes rather than 4, which could result in out-of-range branches. These could result in linker errors (for forward branches) or silently generating wrong code (for backward branches).
This patch now treats dcb, dcw, and dcl as separate directives in the native-code layer, so the appropriate length can be calculated for each.
Here is an example of code affected by this:
int main(int argc, char **argv) {
top:
if (!argc) { /* this caused a linker error */
asm {
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
}
goto top; /* this generated bad code with no error */
}
}
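The length bookkeeping can be modeled roughly as follows. This is a hypothetical sketch, not the native-code layer's actual data structures (`directive_t`, `directive_length`, `code_size`, and `short_branch_fits` are invented for illustration). With 33 `dcl 0` directives, counting each as 2 bytes gives 66 bytes, which looks within short-branch range, while the real size is 132 bytes, which is not.

```c
#include <stddef.h>

/* Hypothetical model of the data directives in an asm block. */
typedef enum { DIR_DCB, DIR_DCW, DIR_DCL } directive_t;

/* Bytes emitted by each directive (a single operand is assumed). */
static int directive_length(directive_t d)
{
    switch (d) {
    case DIR_DCB: return 1;   /* byte */
    case DIR_DCW: return 2;   /* word */
    case DIR_DCL: return 4;   /* long: the old code counted this as 2 */
    }
    return 0;
}

/* Total size of a run of directives. */
static long code_size(const directive_t *dirs, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += directive_length(dirs[i]);
    return total;
}

/* A 65816 short branch (BEQ, BNE, BRA, ...) holds a signed 8-bit
   displacement, relative to the byte after the 2-byte branch. */
static int short_branch_fits(long displacement)
{
    return displacement >= -128 && displacement <= 127;
}
```

Underestimating a directive's length makes a branch look closer than it is, which is why a too-small count can turn a needed long branch into an out-of-range short one.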
Previously, the assembly-level optimizations applied to code in asm statements. In many cases, this was fine (and could even do useful optimizations), but occasionally the optimizations could be invalid. This was especially the case if the assembly involved tricky things like self-modifying code.
To avoid these problems, this patch makes the assembly optimizers ignore code from asm statements, so it is always emitted as-is, without any changes.
This fixes #34.
This differs from the usual ORCA/C behavior of treating all floating-point parameters as extended. With the option enabled, they will still be passed in the extended format, but will be converted to their declared type at the start of the function. This is needed for strict standards conformance, because you should be able to take the address of a parameter and get a usable pointer to its declared type. The difference in types can also affect the behavior of _Generic expressions.
The implementation of this is based on ORCA/Pascal, which already did the same thing (unconditionally) with real/double/comp parameters.
They now use a jmp (addr,X) instruction, rather than a more complicated code sequence using rts. This is an improvement that was suggested in an old Genie message from Todd Whitesel.
This converts comparisons like x > N (with constant N) to be evaluated as x >= N+1 instead, since >= comparisons generate better code. This is possible as long as N is not the maximum value of the type; if it is, the comparison is always false. There are also a few other tweaks to the generated code in some cases.
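A sketch of why the rewrite is safe, assuming ORCA/C's 16-bit `int` (modeled here with `int16_t`; the function name is hypothetical). For integer types, x > N and x >= N+1 select exactly the same values, and the only constant for which N+1 cannot be formed is the type's maximum, where the answer is always false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Evaluate "x > n" for a constant n the way the rewrite does. */
static bool greater_than_const(int16_t x, int16_t n)
{
    if (n == INT16_MAX)
        return false;           /* no 16-bit value exceeds the maximum */
    return x >= n + 1;          /* n + 1 <= INT16_MAX here, so no overflow */
}
```

Checking the maximum first is what keeps N+1 from overflowing in a 16-bit int.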
This optimizes most multiplications by a power of 2 or the sum of two powers of 2, converting them to equivalent operations using shifts which should be faster than the general-purpose multiplication routine.
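For example, x*10 can be computed as (x<<3) + (x<<1), since 10 = 8 + 2. The following C sketch (hypothetical helper `mul_by_shifts`, unsigned 16-bit operands assumed) shows the decomposition: it finds the set bits of the constant and combines the corresponding shifts, declining when more than two bits are set. Because it works modulo 65536, it gives the same result as a 16-bit unsigned multiply even on overflow.

```c
#include <stdint.h>

/* Multiply x by a constant c that has at most two bits set (a power of 2,
   or a sum of two powers of 2) using only shifts and one addition.
   Returns 0 and leaves *out untouched if c does not qualify, in which
   case a general-purpose multiply would be used instead. */
static int mul_by_shifts(uint16_t x, uint16_t c, uint16_t *out)
{
    int bits[2] = {0, 0}, count = 0;
    for (int i = 0; i < 16; i++) {
        if (c & (1u << i)) {
            if (count == 2)
                return 0;               /* three or more bits set */
            bits[count++] = i;
        }
    }
    if (count == 0) {                   /* c == 0 */
        *out = 0;
        return 1;
    }
    /* x is promoted to int, so shifts of up to 15 cannot overflow. */
    uint16_t result = (uint16_t)(x << bits[0]);
    if (count == 2)
        result = (uint16_t)(result + (uint16_t)(x << bits[1]));
    *out = result;
    return 1;
}
```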
This changes unsigned 16-bit multiplies to use the new ~CUMul2 routine in ORCALib, rather than ~UMul2 in SysLib. They differ in that ~CUMul2 gives the low-order 16 bits of the true result in case of overflow. The C standards require this behavior for arithmetic on unsigned types.
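The required result can be modeled in C as the full product reduced to 16 bits. This is only a model of the semantics, not ORCALib's implementation; `unsigned_mul16` is a hypothetical name. For example, 300 * 300 = 90000, and 90000 - 65536 = 24464.

```c
#include <stdint.h>

/* Result C requires for a 16-bit unsigned multiply: the low-order 16 bits
   of the true product (the product modulo 65536). The multiply is done in
   32 bits so the true product is never truncated before the final mask. */
static uint16_t unsigned_mul16(uint16_t a, uint16_t b)
{
    uint32_t full = (uint32_t)a * (uint32_t)b;  /* up to 32 significant bits */
    return (uint16_t)(full & 0xFFFFu);
}
```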
These extra bytes are unnecessary after the changes in commit 5871820e0c to make string constants explicitly include their null terminators.
The extra bytes would be generated for code like the following:
int main(void) {
    static char *s1 = "abc", *s2 = "def", *s3 = "ghi";
}
The C standards generally allow floating-point operations to be done with extra range and precision, but they require that explicit casts convert to the actual type specified. ORCA/C was not previously doing that.
This patch relies on some new library routines (currently in ORCALib) to do this precision reduction.
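A minimal C illustration of what an explicit cast must do, assuming a host where `double` is IEEE double precision and `float` is IEEE single precision. It is not the ORCALib routine; `round_to_float` is a hypothetical name. However much precision the value was computed with, casting it to `float` must round it to the nearest `float`, so for a value like 0.1 the cast result differs from the original.

```c
/* Model of the effect an explicit (float) cast must have: the value is
   rounded to float precision, discarding any extra precision, and then
   widened back only so the two can be compared. */
static double round_to_float(double x)
{
    return (double)(float)x;
}
```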
This fixes #64.
These instructions can be generated for indirect accesses to quad values, and the optimization can sometimes make those code sequences more efficient (e.g. avoiding unnecessary reloads of Y).
The register optimizer tracks when a register is known to contain the same value as a memory location (direct page or absolute) and does optimizations based on this. But it did not always recognize when this information had become invalid because of a subsequent store to the memory location, so it might perform invalid optimizations. This patch adds those checks.
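The bookkeeping can be sketched as follows. This is a hypothetical model, not the actual optimizer code: it tracks a single register, assumes 16-bit accesses, and treats any store to a different kind of location (direct page vs. absolute) as possibly aliasing, since direct page is itself ordinary memory. The key step is `note_store_other`, which forgets the association whenever a store may have overwritten the tracked location.

```c
#include <stdbool.h>

/* Hypothetical model of one register's "known contents". */
typedef enum { LOC_NONE, LOC_DP, LOC_ABS } loc_kind_t;

typedef struct {
    loc_kind_t kind;   /* LOC_NONE: nothing is known */
    unsigned addr;     /* direct-page offset or absolute address */
} reg_info_t;

/* After LDA loc (or STA loc), A equals the 16-bit value at loc. */
static void note_load(reg_info_t *r, loc_kind_t kind, unsigned addr)
{
    r->kind = kind;
    r->addr = addr;
}

/* A 16-bit store from another source (STX, STY, STZ, ...) to addr. */
static void note_store_other(reg_info_t *r, loc_kind_t kind, unsigned addr)
{
    if (r->kind == LOC_NONE)
        return;
    /* A different kind of location might alias, so forget it. Same kind:
       the 2-byte ranges overlap if the start addresses differ by < 2. */
    if (r->kind != kind || (addr + 1 >= r->addr && r->addr + 1 >= addr))
        r->kind = LOC_NONE;
}

/* A later LDA loc is redundant only if A is still known to hold loc. */
static bool load_is_redundant(const reg_info_t *r, loc_kind_t kind,
                              unsigned addr)
{
    return r->kind != LOC_NONE && r->kind == kind && r->addr == addr;
}
```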
This fixes #66.
Specifically, it converted PLX followed by PHA to STA 1,S. This is invalid if the value in X is actually used, which can happen with the code now generated for the % operator.
It might be possible to re-enable this optimization with tighter checks about where it's applied, but I don't think it's terribly important.
The following program was being miscompiled:
#pragma optimize -1
#include <stdio.h>
int main(void) {
    int a = 100, b = 200, c = 3, d = 4;
    printf("%i\n", (a+b) % (c+d)); /* should be 6 */
}
Per the C standards, the % operator should give a remainder after division, such that (a/b)*b + a%b equals a (provided that a/b is representable). As such, the operation of % is defined for cases where either or both of the operands are negative. Since division truncates toward 0, a%b should give a negative result (or 0) in cases where a is negative.
Previously, the % operator was essentially behaving like the "mod" operator in Pascal, which is equivalent for positive operands but not if either operand is negative. It would generally give incorrect results in those cases, or in some cases give compile-time or run-time errors.
This patch addresses both 16-bit and 32-bit signed computations at run time, as well as operations in constant expressions. The approach at run time is to call the existing division routines, which return the remainder with the correct magnitude but always as a nonnegative number. The generated code checks the sign of the first operand and, if it is negative, negates the remainder.
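The run-time fix-up can be sketched in C for the 16-bit case. This is an illustrative model (the name `signed_rem16` is hypothetical), not the code ORCA/C emits: it computes the remainder of the magnitudes, as the existing division routines effectively do, then gives it the sign of the dividend. For example, -7 % 3 must be -1, because (-7/3)*3 + (-1) = (-2)*3 - 1 = -7, whereas a Pascal-style mod, which is never negative, gives 2.

```c
#include <stdint.h>

/* Signed 16-bit remainder with C semantics (division truncates toward 0).
   b must be nonzero. */
static int16_t signed_rem16(int16_t a, int16_t b)
{
    /* Magnitudes in 32 bits so that -(-32768) does not overflow. */
    int32_t ua = a < 0 ? -(int32_t)a : a;
    int32_t ub = b < 0 ? -(int32_t)b : b;
    int32_t r = ua % ub;        /* remainder of magnitudes: always >= 0 */
    if (a < 0)
        r = -r;                 /* give it the sign of the dividend */
    return (int16_t)r;
}
```

Only the dividend's sign needs checking: under truncating division the divisor's sign never affects the remainder, so 7 % -3 is 1.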
The code generated is somewhat large (especially for the 32-bit case), so it might be sensible to put it in a library function and call that, but for now it's just generated in-line. This avoids introducing a dependency on a new library function, so the generated code remains compatible with older versions of ORCALib (e.g. the GNO one).
Fixes #10.
This could occur due to the new native-code peephole optimizations for stz instructions, which can collapse consecutive identical ones down to one instruction. This is OK most of the time, but not when dealing with volatile variables, so disable it in that case.
The following test case shows the issue (look at the generated code):
#pragma optimize -1
volatile int a;
int main(void) {
    a = 0;
    a = 0;
}
This could happen in native-code peephole optimization if two stz instructions targeting different global/static locations occurred consecutively.
This was a regression introduced by commit a3170ea7.
The following program demonstrates the problem:
#pragma optimize 1+2+8+64
int i,j=1;
int main (void) {
    i = 0;
    j = 0;
    return j; /* should return 0 */
}
This allows functions that require an OMF segment byte count of up to 128K to be compiled, although the length in memory at run time is still limited to 64K. (The OMF segment byte count is usually larger, due to the size of relocation records, etc.)
This is useful for compiling large functions, e.g. the main interpreter loop in git. It also fixes the bug shown in the compca23 test case, where functions that require a segment of over 64K may appear to compile correctly but generate corrupted OMF segment headers. This was caused by tracking sizes in 16-bit values that could roll over.
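The roll-over can be illustrated with a small C sketch (hypothetical helper names; not ORCA/C's code). With a 16-bit counter, a segment needing 66000 bytes is recorded as 464 bytes (66000 - 65536), the kind of silent wraparound that produced bad OMF segment headers; a wider counter keeps the true size.

```c
#include <stdint.h>

/* Accumulating a segment length in 16 bits wraps silently past 65535. */
static uint16_t add_len16(uint16_t total, uint16_t n)
{
    return (uint16_t)(total + n);   /* result is reduced modulo 65536 */
}

/* A 32-bit accumulator has room for sizes well beyond 64K. */
static uint32_t add_len32(uint32_t total, uint16_t n)
{
    return total + n;
}
```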
This patch increases the memory needed at run time by 64K. This shouldn’t generally be a problem on systems with sufficient memory, although it does increase the minimum memory requirement a bit. If behavior in low-memory configurations is a concern, buffSize could be made into a run-time option.
This would occur if ORCA/C remained in memory and was restarted after a previous execution, because the 'pc' value was not reinitialized. The ORCA linker seems to ignore the too-long segment length value, but ORCA/C should generate a correct value that actually corresponds to the length of the segment.
The issue was that 16-bit absolute addressing (in the data bank) was being used to access the data to compare, but with the large memory model the static arrays or structs are not necessarily in the same bank, so absolute long addressing should be used.
This was sometimes causing failures in the C4.6.4.1.CC and C4.6.6.1.CC conformance tests in the ORCA/C test suite.
The following program often demonstrates the problem (depending on memory layout and contents):
#pragma memorymodel 1
#pragma optimize 1
#include <stdio.h>
int i;
char ch1[32000];
long L1[1];
int main (void)
{
    if (L1[0] != 0)
        printf("%li\n", L1[0]); /* shouldn't print */
    /* buggy behavior can happen if the bank bytes of these pointers differ */
    printf("%p %p\n", (void *)&L1[0], (void *)&i);
}
This could cause problems when asm blocks contained instructions that the ORCA/C native code optimizer didn’t know about, as in the example below. It might also be possible to trigger this bug without asm blocks (particularly with the large memory model), but I haven’t run into a case that does.
The new approach conservatively assumes that unknown instructions block the optimization. This should be equivalent to the old code with respect to the instructions defined in CGI.pas, except that m_bit_imm should have been treated as blocking the optimization but was not. There are still some other potential problem cases with applying this lda-elimination optimization to arbitrary assembly code, but fixing them might interfere with the optimization in useful cases, so I’m leaving those alone for now.
Here is an example of a program with an asm block affected by this problem:
#pragma optimize 74
#include <stdio.h>
int x,y;
/* should print 2 when invoked with argc==1 */
int main(int argc, char **argv)
{
    x = argc;
    y = argc + 6;
    asm {
        lda #1
        pha
        eor >x
        bne done
        inc argc
        done: pla
    }
    printf("%i\n", argc);
}
This generated invalid code in instances like the following. The code generated for "s = s->u.next" would update the most significant word of s first, then use an indirect load with the half-updated pointer value to update the least significant word of s. This would generally corrupt the result if the new and old pointers had different bank bytes.
#pragma optimize 79
#include <stdio.h>
struct S {
    int i;
    union {
        struct S * next;
    } u;
} s1 = {0, 0};
int main (void)
{
    struct S * s = &s1;
    s = s->u.next;
    if (s != 0)
        puts("compiler bug detected\n"); /* May not always be triggered, depending on memory contents. */
}
Previously, such initializations would sometimes generate a garbage value pointing up to 65535 bytes beyond the start of the string constant. (This was due to a lack of sign-extension in the object code generation.)
Computing a pointer to before the start of an object invokes undefined behavior, so the previous behavior wasn't technically wrong, but it was unintuitive and served no useful purpose. The new behavior should at least be easier to understand and debug.
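The missing sign extension can be illustrated with a hypothetical C sketch (function names invented for illustration). An offset of -1 stored in 16 bits is the bit pattern 0xFFFF; widening it without sign extension yields +65535 instead of -1.

```c
#include <stdint.h>

/* Widening without sign extension: the bit pattern is reused as an
   unsigned value, so negative offsets become large positive ones. */
static int32_t offset_zero_extended(int16_t off)
{
    return (int32_t)(uint16_t)off;
}

/* Widening with sign extension preserves the signed offset. */
static int32_t offset_sign_extended(int16_t off)
{
    return (int32_t)off;
}
```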