This problem could cause "duplicate symbol" and "undeclared identifier" errors, for example in the following program:
typedef int f1( void );
void bar( void ) {
int i;
f1 *foo;
int baz;
i = 10;
}
int foo;
long baz;
The C standards define "pp-number" tokens handled by the preprocessor using a syntax that encompasses various things that aren't valid integer or floating constants, or are constants too large for ORCA/C to handle. These cases would previously give errors even in code skipped by the preprocessor. With this patch, most such errors in skipped code are now ignored.
This is useful, e.g., to allow for #ifdefed-out code containing 64-bit constants.
There are still some cases involving pp-numbers that should be allowed but aren't, particularly in the context of macros.
This should give C99-compatible behavior, as far as it goes. The functions aren't actually inlined, but that's just a quality-of-implementation issue. No C standard requires actual inlining.
Non-static inline functions are still not supported. The C99 semantics for them are more complicated, and they're less widely used, so they're a lower priority for now.
The "inline" function specifier can currently only come after the "static" storage class specifier. This relates to a broader issue where not all legal orderings of declaration specifiers are supported.
Since "inline" was already treated as a keyword in ORCA/C, this shouldn't create any extra compatibility issues for C89 code.
This allows code like the following to compile:
#if 0
#if some bogus stuff !
#endif
#endif
This is what the C standards require. The change affects #if, #ifdef, and #ifndef directives.
This may be needed to handle code targeted at other compilers that allow pseudo-functions such as "__has_feature" in preprocessor expressions.
This is necessary to make the "compile" command halt and not process additional source files, as well as to make occ stop and not run the linker.
Previously, this was not happening when the #error directive was used, or when an undefined label was used in a goto.
This addressed the issue with the compco07.c test case.
Previously the result type was based on the operand types (using the arithmetic conversions), which is incorrect. The following program illustrates the issue:
#include <stdio.h>
int main(void)
{
/* should print "1 0 2 2" */
printf("%i %i %lu %lu\n", 0L || 2, 0.0 && 2,
sizeof(1L || 5), sizeof(1.0 && 2.5));
}
The following is an example of a program that requires this:
#include <stdio.h>
int main(void)
{
int true = !0.0;
int false = !1.1;
printf("%i %i\n", true, false);
}
This is as required by C90. C99 and later require (u)intmax_t, which must be 64-bit or greater.
The following example shows problems with the previous behavior:
#if (30000 + 30000 == 60000) && (1 << 16 == 0x10000)
int main(void) {}
#else
#error "preprocessor error"
#endif
This avoids various problems with inappropriately processing these elements, which should not be recognized as such at preprocessing time. For example, the following program should compile without errors, but did not:
typedef long foo;
#if int+1
#if foo-1
int main(void) {}
#endif
#endif
The following is an example that would give a compile error before this patch:
int main(void)
{
unsigned long i = 1 % 3000000000;
}
The remainder operation still does not work properly for signed types when either operand is negative. It gives either errors or incorrect values in various cases, both when evaluated at compile time and run time. Fully addressing this (including the run-time cases) would require library updates.
The following example shows cases that were mis-evaluated:
/* Should print "3 10000" */
#include <stdio.h>
int main(void)
{
printf("%lu %lu\n", 100000ul / 30000ul, 100000ul % 30000ul);
}
This is a problem introduced by the scanner changes between ORCA/C 2.1.0 and ORCA/C 2.1.1 B3.
The following examples demonstrate the problem:
#define m m
m
#define f(x) f(x)
f(a)
This occurred because the values were being rounded rather than truncated when converted to long, unsigned long, or unsigned int.
This was causing problems in the C6.2.3.5.CC test case when compiled with optimization.
The below program demonstrates the problem:
#pragma optimize 1
#include <stdio.h>
int main (void)
{
long L;
unsigned int ui;
unsigned long ul;
L = -1.5;
ui = 1.5;
ul = 1.5;
printf("%li %u %lu\n", L, ui, ul); /* should print "-1 1 1" */
}
This fixes the compco09.c test case.
This implementation permits duplicate copies of type qualifiers to appear. This is technically illegal in C90, but it’s legal in C99 and later, and ORCA/C already allows this in other contexts.
This allows functions that require an OMF segment byte count of up to 128K to be compiled, although the length in memory at run time is still limited to 64K. (The OMF segment byte count is usually larger, due to the size of relocation records, etc.)
This is useful for compiling large functions, e.g. the main interpreter loop in git. It also fixes the bug shown in the compca23 test case, where functions that require a segment of over 64K may appear to compile correctly but generate corrupted OMF segment headers. This related to tracking sizes with 16-bit values that could roll over.
This patch increases the memory needed at run time by 64K. This shouldn’t generally be a problem on systems with sufficient memory, although it does increase the minimum memory requirement a bit. If behavior in low-memory configurations is a concern, buffSize could be made into a run-time option.
This would occur if ORCA/C remained in memory and was restarted after a previous execution, because the 'pc' value was not reinitialized. The ORCA linker seems to ignore the too-long segment length value, but ORCA/C should generate a correct value that actually corresponds to the length of the segment.
This is necessary to compile some very large functions, such as the main interpreter loop in Git.
This consumes about 8K of extra memory for the additional label records.
The issue was that 16-bit absolute addressing (in the data bank) was being used to access the data to compare, but with the large memory model the static arrays or structs are not necessarily in the same bank, so absolute long addressing should be used.
This was sometimes causing failures in the C4.6.4.1.CC and C4.6.6.1.CC conformance tests in the ORCA/C test suite.
The following program often demonstrates the problem (depending on memory layout and contents):
#pragma memorymodel 1
#pragma optimize 1
#include <stdio.h>
int i;
char ch1[32000];
long L1[1];
int main (void)
{
if (L1 [0] != 0)
printf("%li\n", L1[0]); /* shouldn't print */
/* buggy behavior can happen if the bank bytes of these pointers differ */
printf("%p %p\n", &L1[0], &i);
}
This could cause problems when asm blocks contained instructions that the ORCA/C native code optimizer didn’t know about, as in the example below. It might also be possible to trigger this bug without asm blocks (particularly with the large memory model), but I haven’t run into a case that does.
The new approach conservatively assumes that unknown instructions block the optimization. This should be equivalent to the old code with respect to the instructions defined in CGI.pas, except that m_bit_imm should have been treated as blocking the optimization but was not. There are still some other potential problem cases with applying this lda-elimination optimization to arbitrary assembly code, but fixing them might interfere with the optimization in useful cases, so I’m leaving those alone for now.
Here is an example of a program with an asm block affected by this problem:
#pragma optimize 74
int x,y;
/* should print 2 when invoked with argc==1 */
int main(int argc, char **argv)
{
x = argc;
y = argc + 6;
asm {
lda #1
pha
eor >x
bne done
inc argc
done: pla
}
printf("%i\n", argc);
}
The issue was that one of the procedures used for CSE would recursively call itself for every 'next' link in the code of the basic block. To avoid this, I made it loop back to the top instead (i.e. did a manual tail-call elimination transformation).
This problem could be observed with large switch statements as in the following example, although other codes with very large basic blocks might have triggered it too. Whether ORCA/C actually crashes will depend on the memory layout--in my testing, this example consistently caused it to crash when running under GNO:
#pragma optimize 16
int main (int argc, char **argv)
{
switch (argc)
{
case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
case 8: case 9: case 10: case 11: case 12: case 13: case 14: case 15:
case 16: case 17: case 18: case 19: case 20: case 21: case 22: case 23:
case 24: case 25: case 26: case 27: case 28: case 29: case 30: case 31:
case 32: case 33: case 34: case 35: case 36: case 37: case 38: case 39:
case 40: case 41: case 42:
case 262:
;
}
}
Also, fix a case where an uninitialized value could be used, potentially resulting in errors not being reported (although I haven’t seen that in practice).
This fixes problems where >>= operations might not use an arithmetic shift in certain cases where they should, as in the below program:
#include <stdio.h>
int main (void)
{
int i;
unsigned u;
long l;
unsigned long ul;
i = -1;
u = 1;
i >>= u;
printf("%i\n", i); /* should be -1 */
l = -1;
ul = 3;
l >>= ul;
printf("%li\n", l); /* should be -1 */
}
This is necessary so that subsequent processing sees the correct expression type and proceeds accordingly. In particular, without this patch, the casts in the below program were erroneously ignored:
#include <stdio.h>
int main(void)
{
unsigned int u;
unsigned char c;
c = 0;
u = (unsigned char)~c;
printf("%u\n", u);
c = 200;
printf("%i\n", (unsigned char)(c+c));
}
This is as required by the C standards: the type of the right operand should not affect the result type.
The following program demonstrates problems with the old behavior:
#include <stdio.h>
int main(void)
{
unsigned long ul;
long l;
unsigned u;
int i;
ul = 0x8000 << 1L; /* should be 0 */
printf("%lx\n", ul);
l = -1 >> 1U; /* should be -1 */
printf("%ld\n", l);
u = 0xFF10;
l = 8;
ul = u << l; /* should be 0x1000 */
printf("%lx\n", ul);
l = -4;
ul = 1;
l = l >> ul; /* should be -2 */
printf("%ld\n", l);
}
The following demonstrates cases that would erroneously be allowed (and misleadingly give a size of 0) before:
#include <stdio.h>
struct s *S;
int main(void)
{
printf("%lu %lu\n", sizeof(struct s), sizeof *S);
}
The following test case demonstrates the problem:
#include <stdio.h>
int main (void)
{
if (sizeof(int) - 5 < 0) puts("error 1");
if (sizeof &main - 9 < 0) puts("error 2");
}
This affected binary or unary expressions with constant operands, at least one of which was of type unsigned long.
The following test case demonstrates the problem:
#include <stdio.h>
int main (void)
{
if (0 + 0x80000000ul < 0) puts("error 1");
if (~0ul < 0) puts("error 2");
}
This cuts a few instructions from code like what is shown in commit affbe9. (It also works around the bug with that example, although that patch addresses the root cause of the problem.)
This generated invalid code in instances like the following. The code generated for "s = s->u.next" would update the most significant word of s first, then use an indirect load with the half-updated pointer value to update the least significant word of s. This would generally corrupt the result if the new and old pointers had different bank bytes.
#pragma optimize 79
#include <stdio.h>
struct S {
int i;
union {
struct S * next;
} u;
} s1 = {0, 0};
int main (void)
{
struct S * s = &s1;
s = s->u.next;
if (s != 0)
puts("compiler bug detected\n"); /* May not always be triggered, depending on memory contents. */
}
ORCA/C previously allowed struct/union members to be declared with incomplete type. Because of this, it allowed C99-style flexible array members to be declared, albeit by accident rather than by design. In some basic testing, these seem to work correctly, except that they could be initialized and that would give rise to odd behavior.
I have restricted it to allowing flexible array members only in the cases allowed by C99/C11, and otherwise disallowing members with incomplete type. I have also prohibited initializing flexible array members.
Also, generate better code for zero-initializing small arrays.
The problem was that the code would call the library routine ~ZERO with a size of 1, but it only works properly with a size of 2 or more. While adding a check here, I also changed it to not call ~ZERO for other small arrays (<=10 bytes), since it is generally more efficient to just initialize them directly.
The initializations in the following are examples that could trigger the problem:
int main(void)
{
struct { int i; char s[1]; } foo = {1, 0};
char arr[2][1] = {2};
}
This may be someone trying to use a C11-style anonymous struct/union, which should be flagged as an error until and unless those are supported. Otherwise, it probably just indicates that the programmer is confused. In any case, an error should be flagged for it.
C89 restricts bit fields to (signed) int and unsigned int only, although later standards note that additional types may be supported. ORCA/C supports the other integer types as an extension.
This fixes the compco01.c test case.