Commit Graph

651 Commits

Author SHA1 Message Date
Stephen Heumann
d9523c145c Allow unknown preprocessor directives in skipped blocks.
For example, the following should not generate an error:

#if 0
#warning "..."
#endif
2017-10-21 20:36:21 -05:00
Stephen Heumann
e242f03501 Don't attempt bogus common subexpression elimination when loading structures on the stack.
Previously, the structure load would be treated as a common subexpression eligible for elimination, but the structure would always be treated as if it had a size of 4 bytes. If it did not, this would generally lead to a crash. (I'm also not sure if dependency analysis was being performed properly for these structures.)

The following program illustrates the problem:

#pragma optimize 17
struct mystruct { char x; } ms;
static void foo(struct mystruct pk) {}
int main(void)
{
    struct mystruct *p = &ms;
    foo(*p);
    foo(*p);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
c46cf79c79 Increase the maximum allowed number of local variables from 200 to 220. 2017-10-21 20:36:21 -05:00
Stephen Heumann
ccd653ddb9 Move some more code out of the blank segment to make space for static data. 2017-10-21 20:36:21 -05:00
Stephen Heumann
ad31ecfcae Fix bug where 32-bit addition and subtraction results are not saved in some cases.
This could happen in certain cases where the destination is not considered "simple" (e.g. because it is a local array location that does not fit in the direct page).

The following program demonstrates the problem:

#pragma optimize 1
int main(void) {
    long temp1 = 1, temp2 = 2, A[64];
    long B[2] = {0};
    B[1] = temp1 + temp2;
    return B[1]; /* should return 3 */
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
afe3e9586b Fix bad code generation in some cases where compound assignment operators are used to update a value through a pointer.
This could occur because a temporary location might be used both in the l-value and r-value computations, but the final assignment code assumed it still had the value from the l-value computation.

The following function demonstrates this problem (*ip is not updated, and *p is trashed):

void badinc(char **p, int *ip)
{
    *ip += *(*p)++;
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
5969d80e57 If 'volatile' is used within a function, only reduce optimization for that function.
Previously, several optimizations would be disabled for the rest of the translation unit whenever the keyword 'volatile' appeared. Now, if 'volatile' is used within a function, it only reduces optimization for that function, since whatever was declared as 'volatile' will be out of scope after the function is over. Uses of 'volatile' outside functions still behave as before.
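
As a rough illustration (not taken from the original commit; the function and variable names are hypothetical), the sketch below uses 'volatile' only inside poll_flag(). Under the behavior described above, only that function should be compiled with reduced optimization, while sum_to() would remain eligible for the usual optimizations.

```c
/* Hypothetical example: names are illustrative, not from ORCA/C. */
static int flag_storage = 1;

/* 'volatile' appears only inside this function, so (per the commit
   message) only this function is compiled with reduced optimization. */
int poll_flag(void)
{
    volatile int *flag = &flag_storage;
    int reads = 0;

    while (*flag && reads < 3)
        reads++;
    return reads;
}

/* No 'volatile' here; the declaration above is out of scope, so this
   function can be optimized normally. */
int sum_to(int n)
{
    int i, total = 0;

    for (i = 1; i <= n; i++)
        total += i;
    return total;
}
```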
2017-10-21 20:36:21 -05:00
Stephen Heumann
1502e48188 Fix bug causing incorrect code generation in programs that use the 'volatile' keyword.
This bug could both cause accesses to volatile variables to be omitted, and also cause other expressions to be erroneously optimized out in certain circumstances.

As an example, both the access of x and the call to bar() would be erroneously removed in the following program:

#pragma optimize 1
volatile int x;
int bar(void);
void foo(void)
{
    if(x) ;
    if(bar()) ;
}

Note that this patch disables even more optimizations than previously if the 'volatile' keyword is used anywhere in a translation unit. This is necessary for correctness given the current design of ORCA/C, but it means that care should be taken to avoid unnecessary use of 'volatile'.
2017-10-21 20:36:21 -05:00
Stephen Heumann
fb612e14d1 Fix bug causing functions with 254 bytes of locals and parameters to crash on return.
This bug occurred because the generated code tried to store part of the return address to a direct page offset of 256, but instead an offset of 0 was used, resulting in an invalid return address and (typically) a crash. It could occur if the function took one or more parameters, and the total size of parameters and local variables (including compiler-generated ones) was 254 bytes.

The following program demonstrates the problem:

int main(int argc, char **argv) {
    char x[244];
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
e642a6f3fd Update test to account for enlarged string space. 2017-10-21 20:36:21 -05:00
Stephen Heumann
10ca3bcc73 Properly generate const-qualified structs and unions.
Global structs and unions with the const qualifier were not being generated in object files. This occurred because they were represented as having "defined types" and the code was not handling those types properly.

The following example demonstrates this problem:

const struct x { int i; } X = {9};
int main(void) {
    return X.i;
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
f79887c565 Don't erroneously pop the symbol table at declarations of a pointer to a typedef'd function type.
This problem could cause "duplicate symbol" and "undeclared identifier" errors, for example in the following program:

typedef int f1( void );
void bar( void ) {
    int i;
    f1 *foo;
    int baz;
    i = 10;
}
int foo;
long baz;
2017-10-21 20:36:21 -05:00
Stephen Heumann
5321ef2f84 Treat array types as compatible with corresponding pointer types in function prototypes.
This permits code like the following to compile, as it should:

int foo(int*);
int foo(int[]);
int foo(int[42]);
2017-10-21 20:36:21 -05:00
Stephen Heumann
8ca3d5f4f0 Allow skipped code to contain pp-numbers that are not valid numeric constants.
The C standards define "pp-number" tokens handled by the preprocessor using a syntax that encompasses various things that aren't valid integer or floating constants, or are constants too large for ORCA/C to handle. These cases would previously give errors even in code skipped by the preprocessor. With this patch, most such errors in skipped code are now ignored.

This is useful, e.g., to allow for #ifdefed-out code containing 64-bit constants.

There are still some cases involving pp-numbers that should be allowed but aren't, particularly in the context of macros.
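
A hedged illustration of the kind of skipped code this is meant to tolerate (it is not taken from the ORCA/C test suite, and the identifiers are hypothetical). Both initializers inside the skipped block are single pp-numbers under the standard's grammar, but neither could be accepted as a constant in live code:

```c
#if 0
/* Skipped by the preprocessor. Each initializer is one pp-number token. */
unsigned long long big = 0x123456789ABCDEF0;   /* 64-bit constant */
int odd = 12ab.34e+xyz;                        /* not a valid constant at all */
#endif

/* Live code after the skipped block compiles normally. */
int after_skipped_block(void)
{
    return 1;
}
```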
2017-10-21 20:36:21 -05:00
Stephen Heumann
02de5f4137 Increase the total size of string constants permitted in each function.
The size limit is increased from 8000 bytes to 12500 bytes. This was needed to compile some functions with many string constants.
2017-10-21 20:36:21 -05:00
Stephen Heumann
4cff395745 Move some code from the blank segment to named load segments.
This frees up some space in the blank segment for more static data.
2017-10-21 20:36:21 -05:00
Stephen Heumann
227731a1a8 Allow "static inline" function declarations.
This should give C99-compatible behavior, as far as it goes. The functions aren't actually inlined, but that's just a quality-of-implementation issue. No C standard requires actual inlining.

Non-static inline functions are still not supported. The C99 semantics for them are more complicated, and they're less widely used, so they're a lower priority for now.

The "inline" function specifier can currently only come after the "static" storage class specifier. This relates to a broader issue where not all legal orderings of declaration specifiers are supported.

Since "inline" was already treated as a keyword in ORCA/C, this shouldn't create any extra compatibility issues for C89 code.
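
A minimal sketch of the supported form, using hypothetical function names. Note that "static" comes before "inline", matching the ordering restriction mentioned above:

```c
/* "static" first, then "inline": the only ordering accepted for now. */
static inline int square(int x)
{
    return x * x;
}

/* The call may or may not actually be inlined; the result is the same. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}
```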
2017-10-21 20:36:21 -05:00
Stephen Heumann
6ea43d34a1 Skip tokens following preprocessing directives on lines that are skipped.
This allows code like the following to compile:

#if 0
#if some bogus stuff !
#endif
#endif

This is what the C standards require. The change affects #if, #ifdef, and #ifndef directives.

This may be needed to handle code targeted at other compilers that allow pseudo-functions such as "__has_feature" in preprocessor expressions.
2017-10-21 20:36:21 -05:00
Stephen Heumann
db2a09bd1d Allow trailing comma in enum (as in C99).
Patch from Kelvin Sherlock.
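
For illustration only (the enum and function names are hypothetical, not from the patch), a declaration of this form is now accepted:

```c
enum color {
    RED,
    GREEN,
    BLUE,   /* trailing comma after the last enumerator, as C99 permits */
};

/* Enumerator values are unaffected by the trailing comma. */
int color_count(void)
{
    return BLUE + 1;
}
```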
2017-10-21 20:36:21 -05:00
Stephen Heumann
a75d18a45c Make the main function return 0 if execution reaches the closing brace of its body.
This is required by C99 and later. It’s not required by C89, but it’s allowed and produces more predictable behavior.
2017-10-21 20:36:21 -05:00
Stephen Heumann
90c291808a Always report a non-zero error level when there is an error.
This is necessary to make the "compile" command halt and not process additional source files, as well as to make occ stop and not run the linker.

Previously, this was not happening when the #error directive was used, or when an undefined label was used in a goto.

This addressed the issue with the compco07.c test case.
2017-10-21 20:36:21 -05:00
Stephen Heumann
280f67e846 Make || and && operators in constant expressions yield results of type int.
Previously the result type was based on the operand types (using the arithmetic conversions), which is incorrect. The following program illustrates the issue:

#include <stdio.h>
int main(void)
{
    /* should print "1 0 2 2" */
    printf("%i %i %lu %lu\n", 0L || 2, 0.0 && 2,
           sizeof(1L || 5), sizeof(1.0 && 2.5));
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
40bed0a93e Allow application of the unary ! operator to floating constants.
The following is an example of a program that requires this:

#include <stdio.h>
int main(void)
{
    int true = !0.0;
    int false = !1.1;
    printf("%i %i\n", true, false);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
cd9a424499 Remove bogus code for applying unary &, ++, and -- operators to integer constants.
This is already precluded from executing by an earlier test, so there is no functional change.
2017-10-21 20:36:21 -05:00
Stephen Heumann
7ab1875b54 Evaluate preprocessor expressions using 32-bit long/unsigned long types.
This is as required by C90. C99 and later require (u)intmax_t, which must be 64-bit or greater.

The following example shows problems with the previous behavior:

#if (30000 + 30000 == 60000) && (1 << 16 == 0x10000)
int main(void) {}
#else
#error "preprocessor error"
#endif
2017-10-21 20:36:21 -05:00
Stephen Heumann
60df52c268 Disallow the application of the unary * operator to integer constants.
It must only be applied to pointer types.
2017-10-21 20:36:21 -05:00
Stephen Heumann
4c0b02f32e Always convert keywords and typedef names to identifiers when processing preprocessor expressions.
This avoids various problems with inappropriately processing these elements, which should not be recognized as such at preprocessing time. For example, the following program should compile without errors, but did not:

typedef long foo;
#if int+1
#if foo-1
int main(void) {}
#endif
#endif
2017-10-21 20:36:21 -05:00
Stephen Heumann
7fa74d1183 Allow floating literals that are immediately cast to integer types to appear in integer constant expressions.
The C standards say these should be permitted. The following declaration shows an example:

int a[(int)5.5];
2017-10-21 20:36:21 -05:00
Stephen Heumann
0cf948e3bd Fix to make statically-evaluated unsigned comparisons evaluate to 1 if true.
Previously, they evaluated to -1. The following example shows the problem:

#include <stdio.h>
int main(void)
{
    printf("%i\n", 0U <= 5U);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
018f5a7548 Fix bug where constant expressions involving unsigned int and signed long operands would be evaluated as unsigned long.
The following example demonstrates the problem:

#include <stdio.h>
int main (void)
{
	printf("%i\n", -5L < 10U); /* should print "1" */
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
b6b2121a9e Allow unsigned long values greater than 2^31 to be used as the second operand to % in constant expressions.
The following is an example that would give a compile error before this patch:

int main(void)
{
    unsigned long i = 1 % 3000000000;
}

The remainder operation still does not work properly for signed types when either operand is negative. It gives either errors or incorrect values in various cases, both when evaluated at compile time and run time. Fully addressing this (including the run-time cases) would require library updates.
2017-10-21 20:36:21 -05:00
Stephen Heumann
f4ad0fab80 Fix bug in unsigned 32-bit division and remainder routines, which could cause mis-evaluation of constant expressions.
The following example shows cases that were mis-evaluated:

/* Should print "3 10000" */
#include <stdio.h>
int main(void)
{
	printf("%lu %lu\n", 100000ul / 30000ul, 100000ul % 30000ul);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
d0b4b75970 Fix problem where if a macro's name appeared inside the macro, it would be expanded repeatedly, leading to a crash.
This is a problem introduced by the scanner changes between ORCA/C 2.1.0 and ORCA/C 2.1.1 B3.

The following examples demonstrate the problem:

#define m m
m

#define f(x) f(x)
f(a)
2017-10-21 20:36:21 -05:00
Stephen Heumann
5c81d970b5 Fix issue where statically-evaluated conversions from floating-point values to some integer types could yield wrong values.
This occurred because the values were being rounded rather than truncated when converted to long, unsigned long, or unsigned int.

This was causing problems in the C6.2.3.5.CC test case when compiled with optimization.

The program below demonstrates the problem:

#pragma optimize 1
#include <stdio.h>
int main (void)
{
   long L;
   unsigned int ui;
   unsigned long ul;
   L = -1.5;
   ui = 1.5;
   ul = 1.5;
   printf("%li %u %lu\n", L, ui, ul); /* should print "-1 1 1" */
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
f099222af6 Don't give an error when calling functions with const-qualified parameter types or returning from functions with const-qualified return type.
This fixes the compco12.c test case.
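
The compco12.c test case is not reproduced here. The following is only a guess at the kind of code involved, with hypothetical names:

```c
/* Hypothetical example: const-qualified parameter and return types. */
const long twice(const long n)
{
    return n * 2;       /* returning from a function with a const-qualified return type */
}

long call_twice(void)
{
    return twice(21);   /* calling a function with a const-qualified parameter type */
}
```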
2017-10-21 20:36:21 -05:00
Stephen Heumann
49ddf5abf1 Allow type qualifiers for pointer types to be used in type names (in casts and sizeof expressions).
This fixes the compco09.c test case.

This implementation permits duplicate copies of type qualifiers to appear. This is technically illegal in C90, but it’s legal in C99 and later, and ORCA/C already allows this in other contexts.
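
The compco09.c test case is not reproduced here. As a sketch of what is presumably meant (function names are hypothetical), qualified pointer types can now appear as type names in both contexts:

```c
#include <stddef.h>

/* Qualified pointer type used as a type name in sizeof. */
size_t const_pointer_size(void)
{
    return sizeof(char * const);
}

/* Qualified pointer type used as a type name in a cast. */
char *cast_to_const_pointer(char *p)
{
    return (char * const)p;
}
```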
2017-10-21 20:36:21 -05:00
Stephen Heumann
cf1cd085d8 Don't block the LDA elimination optimization if a native code label is encountered.
This was an inadvertent change in commit 9d2bb600. This patch restores the old behavior with respect to d_lab.
2017-10-21 20:36:21 -05:00
Stephen Heumann
8c81b23b6f Expand the size of the object buffer from 64K to 128K, and use 32-bit values to track related sizes.
This allows functions that require an OMF segment byte count of up to 128K to be compiled, although the length in memory at run time is still limited to 64K. (The OMF segment byte count is usually larger, due to the size of relocation records, etc.)

This is useful for compiling large functions, e.g. the main interpreter loop in Git. It also fixes the bug shown in the compca23 test case, where functions that require a segment of over 64K may appear to compile correctly but generate corrupted OMF segment headers. This was related to tracking sizes with 16-bit values that could roll over.

This patch increases the memory needed at run time by 64K. This shouldn’t generally be a problem on systems with sufficient memory, although it does increase the minimum memory requirement a bit. If behavior in low-memory configurations is a concern, buffSize could be made into a run-time option.
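
The rollover itself can be sketched in portable C. This is not the ORCA/C code (which is written in Pascal); the function names are hypothetical and only show how a 16-bit running size silently wraps where a 32-bit one does not:

```c
#include <stdint.h>

/* A size tracked in 16 bits wraps modulo 65536 once it passes 64K. */
uint16_t add_size16(uint16_t total, uint16_t n)
{
    return (uint16_t)(total + n);
}

/* The same running total tracked in 32 bits keeps the true value. */
uint32_t add_size32(uint32_t total, uint32_t n)
{
    return total + n;
}
```

For example, adding 10000 to a running size of 60000 gives 4464 in the 16-bit version (70000 - 65536) and 70000 in the 32-bit version.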
2017-10-21 20:36:21 -05:00
Stephen Heumann
41fb05404e Don’t add the length of the last segment generated in the previous execution to that of the segment in the root file.
This would occur if ORCA/C remained in memory and was restarted after a previous execution, because the 'pc' value was not reinitialized. The ORCA linker seems to ignore the too-long segment length value, but ORCA/C should generate a correct value that actually corresponds to the length of the segment.
2017-10-21 20:36:21 -05:00
Stephen Heumann
a4bffe65e5 Increase the limit on the number of intermediate code labels in a function from 2400 to 3200.
This is necessary to compile some very large functions, such as the main interpreter loop in Git.

This consumes about 8K of extra memory for the additional label records.
2017-10-21 20:36:21 -05:00
Stephen Heumann
709f9b3f25 Fix bug where comparing 32-bit values in static arrays or structs against 0 may give wrong results with large memory model.
The issue was that 16-bit absolute addressing (in the data bank) was being used to access the data to compare, but with the large memory model the static arrays or structs are not necessarily in the same bank, so absolute long addressing should be used.

This was sometimes causing failures in the C4.6.4.1.CC and C4.6.6.1.CC conformance tests in the ORCA/C test suite.

The following program often demonstrates the problem (depending on memory layout and contents):

#pragma memorymodel 1
#pragma optimize 1

#include <stdio.h>

int i;
char ch1[32000];
long L1[1];

int main (void)
{
    if (L1 [0] != 0)
        printf("%li\n", L1[0]); /* shouldn't print */

    /* buggy behavior can happen if the bank bytes of these pointers differ */
    printf("%p %p\n", &L1[0], &i);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
fd48d77c60 Don’t erroneously optimize out lda instructions in certain cases involving instructions the native-code optimizer didn’t know about.
This could cause problems when asm blocks contained instructions that the ORCA/C native code optimizer didn’t know about, as in the example below. It might also be possible to trigger this bug without asm blocks (particularly with the large memory model), but I haven’t run into a case that does.

The new approach conservatively assumes that unknown instructions block the optimization. This should be equivalent to the old code with respect to the instructions defined in CGI.pas, except that m_bit_imm should have been treated as blocking the optimization but was not. There are still some other potential problem cases with applying this lda-elimination optimization to arbitrary assembly code, but fixing them might interfere with the optimization in useful cases, so I’m leaving those alone for now.

Here is an example of a program with an asm block affected by this problem:

#pragma optimize 74
#include <stdio.h>
int x,y;

/* should print 2 when invoked with argc==1 */
int main(int argc, char **argv)
{
    x = argc;
    y = argc + 6;

    asm {
        lda #1
        pha
        eor >x
        bne done
        inc argc
done:   pla
    }

    printf("%i\n", argc);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
aa7084dada Fix problem where long basic blocks could lead to a crash due to stack overflow in common subexpression elimination.
The issue was that one of the procedures used for CSE would recursively call itself for every 'next' link in the code of the basic block. To avoid this, I made it loop back to the top instead (i.e. did a manual tail-call elimination transformation).

This problem could be observed with large switch statements as in the following example, although other code with very large basic blocks might have triggered it too. Whether ORCA/C actually crashes will depend on the memory layout; in my testing, this example consistently caused it to crash when running under GNO:

#pragma optimize 16
int main (int argc, char **argv)
{
    switch (argc)
    {
        case 0:  case 1:  case 2:  case 3:  case 4:  case 5:  case 6:  case 7:
        case 8:  case 9:  case 10: case 11: case 12: case 13: case 14: case 15:
        case 16: case 17: case 18: case 19: case 20: case 21: case 22: case 23:
        case 24: case 25: case 26: case 27: case 28: case 29: case 30: case 31:
        case 32: case 33: case 34: case 35: case 36: case 37: case 38: case 39:
        case 40: case 41: case 42:
        case 262:
        ;
    }
}
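
The manual tail-call elimination mentioned above can be sketched in C as follows. This is not the actual ORCA/C procedure (which is in Pascal); the node type and function names are hypothetical stand-ins for the intermediate-code records and the CSE routine:

```c
#include <stddef.h>

/* Hypothetical stand-in for one record in a basic block's code list. */
struct node {
    int value;
    struct node *next;
};

/* Before: one recursive call (and one stack frame) per 'next' link,
   so a very long list can overflow the stack. */
long sum_recursive(const struct node *n, long acc)
{
    if (n == NULL)
        return acc;
    return sum_recursive(n->next, acc + n->value);
}

/* After: the tail call is replaced by looping back to the top,
   so stack use no longer grows with the length of the list. */
long sum_iterative(const struct node *n)
{
    long acc = 0;

    while (n != NULL) {
        acc += n->value;
        n = n->next;
    }
    return acc;
}
```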
2017-10-21 20:36:21 -05:00
Stephen Heumann
896c60d18c Determine the type to use for computing a >>= or <<= operation based only on the left operand.
Also, fix a case where an uninitialized value could be used, potentially resulting in errors not being reported (although I haven’t seen that in practice).

This fixes problems where >>= operations might not use an arithmetic shift in certain cases where they should, as in the program below:

#include <stdio.h>
int main (void)
{
    int i;
    unsigned u;
    long l;
    unsigned long ul;

    i = -1;
    u = 1;
    i >>= u;
    printf("%i\n", i); /* should be -1 */

    l = -1;
    ul = 3;
    l >>= ul;
    printf("%li\n", l); /* should be -1 */
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
67a29304a7 Fix to consistently update expressionType to reflect the application of usual unary conversions (promotions).
This is necessary so that subsequent processing sees the correct expression type and proceeds accordingly. In particular, without this patch, the casts in the program below were erroneously ignored:

#include <stdio.h>
int main(void)
{
    unsigned int u;
    unsigned char c;

    c = 0;
    u = (unsigned char)~c;
    printf("%u\n", u);

    c = 200;
    printf("%i\n", (unsigned char)(c+c));
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
4f736079f0 If the day of the month is in the range 1-9, report it in __DATE__ with a leading space character rather than a leading 0.
This is what is required by the C standards.
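
The required layout is "Mmm dd yyyy", with the day space-padded to two characters. A minimal sketch of that formatting in C (format_date is a hypothetical helper, not the ORCA/C implementation):

```c
#include <stdio.h>

/* Hypothetical helper: formats a date in the layout the C standards
   specify for __DATE__. */
void format_date(char *out, size_t size, const char *month, int day, int year)
{
    /* %2d pads single-digit days with a space; %02d would pad with a 0. */
    snprintf(out, size, "%s %2d %d", month, day, year);
}
```

With a 12-byte buffer, format_date(buf, sizeof buf, "Oct", 5, 2017) produces "Oct  5 2017", with two spaces before the 5.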
2017-10-21 20:36:21 -05:00
Stephen Heumann
1994e7e353 Properly report dates in the year 2000 and beyond with the __DATE__ macro.
This should now work properly through the end of the year 2155, i.e. the limit of what the ReadTimeHex format can represent.
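
The 2155 limit is consistent with a year stored as a single byte offset from 1900. That encoding is assumed here rather than stated in the commit, and year_from_byte is a hypothetical helper, not ORCA/C code:

```c
/* Assumption: the year field is one byte holding (year - 1900). */
unsigned year_from_byte(unsigned char year_byte)
{
    return 1900u + year_byte;
}
```

Under that assumption, a byte value of 117 corresponds to 2017, and the largest byte value, 255, corresponds to 2155.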
2017-10-21 20:36:21 -05:00
Stephen Heumann
af48935d43 Fix so the type of shift expressions depends only on the (promoted) type of their left operand.
This is as required by the C standards: the type of the right operand should not affect the result type.

The following program demonstrates problems with the old behavior:

#include <stdio.h>

int main(void)
{
    unsigned long ul;
    long l;
    unsigned u;
    int i;

    ul = 0x8000 << 1L; /* should be 0 */
    printf("%lx\n", ul);

    l = -1 >> 1U; /* should be -1 */
    printf("%ld\n", l);

    u = 0xFF10;
    l = 8;
    ul = u << l; /* should be 0x1000 */
    printf("%lx\n", ul);

    l = -4;
    ul = 1;
    l = l >> ul; /* should be -2 */
    printf("%ld\n", l);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
5618b2810e Don’t produce spurious error messages when #error is used with tokens other than a string constant.
According to the C standards, #error is supposed to take an arbitrary sequence of pp-tokens, not just a string constant.
2017-10-21 20:36:21 -05:00
Stephen Heumann
8b4c83f527 Give an error when trying to use sizeof on incomplete struct or union types.
The following demonstrates cases that would erroneously be allowed (and misleadingly give a size of 0) before:

#include <stdio.h>
struct s *S;
int main(void)
{
        printf("%lu %lu\n", sizeof(struct s), sizeof *S);
}
2017-10-21 20:36:21 -05:00