Commit Graph

76 Commits

Author SHA1 Message Date
Stephen Heumann
60df52c268 Disallow the application of the unary * operator to integer constants.
It must only be applied to pointer types.
2017-10-21 20:36:21 -05:00
Stephen Heumann
4c0b02f32e Always convert keywords and typedef names to identifiers when processing preprocessor expressions.
This avoids various problems with inappropriately processing these elements, which should not be recognized as such at preprocessing time. For example, the following program should compile without errors, but did not:

typedef long foo;
#if int+1
#if foo-1
int main(void) {}
#endif
#endif
2017-10-21 20:36:21 -05:00
Stephen Heumann
7fa74d1183 Allow floating literals that are immediately cast to integer types to appear in integer constant expressions.
The C standards say these should be permitted. The following declaration shows an example:

int a[(int)5.5];
2017-10-21 20:36:21 -05:00
Stephen Heumann
0cf948e3bd Fix to make statically-evaluated unsigned comparisons evaluate to 1 if true.
Previously, they evaluated to -1. The following example shows the problem:

#include <stdio.h>
int main(void)
{
    printf("%i\n", 0U <= 5U);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
018f5a7548 Fix bug where constant expressions involving unsigned int and signed long operands would be evaluated as unsigned long.
The following example demonstrates the problem:

#include <stdio.h>
int main (void)
{
	printf("%i\n", -5L < 10U); /* should print "1" */
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
b6b2121a9e Allow unsigned long values greater than 2^31 to be used as the second operand to % in constant expressions.
The following is an example that would give a compile error before this patch:

int main(void)
{
    unsigned long i = 1 % 3000000000;
}

The remainder operation still does not work properly for signed types when either operand is negative. It gives either errors or incorrect values in various cases, both when evaluated at compile time and run time. Fully addressing this (including the run-time cases) would require library updates.
2017-10-21 20:36:21 -05:00
Stephen Heumann
f4ad0fab80 Fix bug in unsigned 32-bit division and remainder routines, which could cause mis-evaluation of constant expressions.
The following example shows cases that were mis-evaluated:

/* Should print "3 10000" */
#include <stdio.h>
int main(void)
{
	printf("%lu %lu\n", 100000ul / 30000ul, 100000ul % 30000ul);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
d0b4b75970 Fix problem where if a macro's name appeared inside the macro, it would be expanded repeatedly, leading to a crash.
This is a problem introduced by the scanner changes between ORCA/C 2.1.0 and ORCA/C 2.1.1 B3.

The following examples demonstrate the problem:

#define m m
m

#define f(x) f(x)
f(a)
2017-10-21 20:36:21 -05:00
Stephen Heumann
5c81d970b5 Fix issue where statically-evaluated conversions from floating-point values to some integer types could yield wrong values.
This occurred because the values were being rounded rather than truncated when converted to long, unsigned long, or unsigned int.

This was causing problems in the C6.2.3.5.CC test case when compiled with optimization.

The below program demonstrates the problem:

#pragma optimize 1
#include <stdio.h>
int main (void)
{
   long L;
   unsigned int ui;
   unsigned long ul;
   L = -1.5;
   ui = 1.5;
   ul = 1.5;
   printf("%li %u %lu\n", L, ui, ul); /* should print "-1 1 1" */
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
f099222af6 Don't give an error when calling functions with const-qualified parameter types or returning from functions with const-qualified return type.
This fixes the compco12.c test case.
2017-10-21 20:36:21 -05:00
Stephen Heumann
49ddf5abf1 Allow type qualifiers for pointer types to be used in type names (in casts and sizeof expressions).
This fixes the compco09.c test case.

This implementation permits duplicate copies of type qualifiers to appear. This is technically illegal in C90, but it’s legal in C99 and later, and ORCA/C already allows this in other contexts.
2017-10-21 20:36:21 -05:00
Stephen Heumann
cf1cd085d8 Don't block the LDA elimination optimization if a native code label is encountered.
This was an inadvertent change in commit 9d2bb600. This patch restores the old behavior with respect to d_lab.
2017-10-21 20:36:21 -05:00
Stephen Heumann
8c81b23b6f Expand the size of the object buffer from 64K to 128K, and use 32-bit values to track related sizes.
This allows functions that require an OMF segment byte count of up to 128K to be compiled, although the length in memory at run time is still limited to 64K. (The OMF segment byte count is usually larger, due to the size of relocation records, etc.)

This is useful for compiling large functions, e.g. the main interpreter loop in git. It also fixes the bug shown in the compca23 test case, where functions that require a segment of over 64K may appear to compile correctly but generate corrupted OMF segment headers. This related to tracking sizes with 16-bit values that could roll over.

This patch increases the memory needed at run time by 64K. This shouldn’t generally be a problem on systems with sufficient memory, although it does increase the minimum memory requirement a bit. If behavior in low-memory configurations is a concern, buffSize could be made into a run-time option.
2017-10-21 20:36:21 -05:00
Stephen Heumann
41fb05404e Don’t add the length of the last segment generated in the previous execution to that of the segment in the root file.
This would occur if ORCA/C remained in memory and was restarted after a previous execution, because the 'pc' value was not reinitialized. The ORCA linker seems to ignore the too-long segment length value, but ORCA/C should generate a correct value that actually corresponds to the length of the segment.
2017-10-21 20:36:21 -05:00
Stephen Heumann
a4bffe65e5 Increase the limit on the number of intermediate code labels in a function from 2400 to 3200.
This is necessary to compile some very large functions, such as the main interpreter loop in Git.

This consumes about 8K of extra memory for the additional label records.
2017-10-21 20:36:21 -05:00
Stephen Heumann
709f9b3f25 Fix bug where comparing 32-bit values in static arrays or structs against 0 may give wrong results with large memory model.
The issue was that 16-bit absolute addressing (in the data bank) was being used to access the data to compare, but with the large memory model the static arrays or structs are not necessarily in the same bank, so absolute long addressing should be used.

This was sometimes causing failures in the C4.6.4.1.CC and C4.6.6.1.CC conformance tests in the ORCA/C test suite.

The following program often demonstrates the problem (depending on memory layout and contents):

#pragma memorymodel 1
#pragma optimize 1

#include <stdio.h>

int i;
char ch1[32000];
long L1[1];

int main (void)
{
    if (L1 [0] != 0)
        printf("%li\n", L1[0]); /* shouldn't print */

    /* buggy behavior can happen if the bank bytes of these pointers differ */
    printf("%p %p\n", &L1[0], &i);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
fd48d77c60 Don’t erroneously optimize out lda instructions in certain cases involving instructions the native-code optimizer didn’t know about.
This could cause problems when asm blocks contained instructions that the ORCA/C native code optimizer didn’t know about, as in the example below. It might also be possible to trigger this bug without asm blocks (particularly with the large memory model), but I haven’t run into a case that does.

The new approach conservatively assumes that unknown instructions block the optimization. This should be equivalent to the old code with respect to the instructions defined in CGI.pas, except that m_bit_imm should have been treated as blocking the optimization but was not. There are still some other potential problem cases with applying this lda-elimination optimization to arbitrary assembly code, but fixing them might interfere with the optimization in useful cases, so I’m leaving those alone for now.

Here is an example of a program with an asm block affected by this problem:

#pragma optimize 74
int x,y;

/* should print 2 when invoked with argc==1 */
int main(int argc, char **argv)
{
    x = argc;
    y = argc + 6;

    asm {
        lda #1
        pha
        eor >x
        bne done
        inc argc
done:   pla
    }

    printf("%i\n", argc);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
aa7084dada Fix problem where long basic blocks could lead to a crash due to stack overflow in common subexpression elimination.
The issue was that one of the procedures used for CSE would recursively call itself for every 'next' link in the code of the basic block. To avoid this, I made it loop back to the top instead (i.e. did a manual tail-call elimination transformation).

This problem could be observed with large switch statements as in the following example, although other codes with very large basic blocks might have triggered it too. Whether ORCA/C actually crashes will depend on the memory layout--in my testing, this example consistently caused it to crash when running under GNO:

#pragma optimize 16
int main (int argc, char **argv)
{
    switch (argc)
    {
        case 0:  case 1:  case 2:  case 3:  case 4:  case 5:  case 6:  case 7:
        case 8:  case 9:  case 10: case 11: case 12: case 13: case 14: case 15:
        case 16: case 17: case 18: case 19: case 20: case 21: case 22: case 23:
        case 24: case 25: case 26: case 27: case 28: case 29: case 30: case 31:
        case 32: case 33: case 34: case 35: case 36: case 37: case 38: case 39:
        case 40: case 41: case 42:
        case 262:
        ;
    }
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
896c60d18c Determine the type to use for computing a >>= or <<= operation based only on the left operand.
Also, fix a case where an uninitialized value could be used, potentially resulting in errors not being reported (although I haven’t seen that in practice).

This fixes problems where >>= operations might not use an arithmetic shift in certain cases where they should, as in the below program:

#include <stdio.h>
int main (void)
{
    int i;
    unsigned u;
    long l;
    unsigned long ul;

    i = -1;
    u = 1;
    i >>= u;
    printf("%i\n", i); /* should be -1 */

    l = -1;
    ul = 3;
    l >>= ul;
    printf("%li\n", l); /* should be -1 */
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
67a29304a7 Fix to consistently update expressionType to reflect the application of usual unary conversions (promotions).
This is necessary so that subsequent processing sees the correct expression type and proceeds accordingly. In particular, without this patch, the casts in the below program were erroneously ignored:

#include <stdio.h>
int main(void)
{
    unsigned int u;
    unsigned char c;

    c = 0;
    u = (unsigned char)~c;
    printf("%u\n", u);

    c = 200;
    printf("%i\n", (unsigned char)(c+c));
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
4f736079f0 If the day of the month is in the range 1-9, report it in __DATE__ with a leading space character rather than a leading 0.
This is what is required by the C standards.
2017-10-21 20:36:21 -05:00
Stephen Heumann
1994e7e353 Properly report dates in the year 2000 and beyond with the __DATE__ macro.
This should now work properly through the end of the year 2155, i.e. the limit of what the ReadTimeHex format can represent.
2017-10-21 20:36:21 -05:00
Stephen Heumann
af48935d43 Fix so the type of shift expressions depends only on the (promoted) type of their left operand.
This is as required by the C standards: the type of the right operand should not affect the result type.

The following program demonstrates problems with the old behavior:

#include <stdio.h>

int main(void)
{
    unsigned long ul;
    long l;
    unsigned u;
    int i;

    ul = 0x8000 << 1L; /* should be 0 */
    printf("%lx\n", ul);

    l = -1 >> 1U; /* should be -1 */
    printf("%ld\n", l);

    u = 0xFF10;
    l = 8;
    ul = u << l; /* should be 0x1000 */
    printf("%lx\n", ul);

    l = -4;
    ul = 1;
    l = l >> ul; /* should be -2 */
    printf("%ld\n", l);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
5618b2810e Don’t produce spurious error messages when #error is used with tokens other than a string constant.
According to the C standards, #error is supposed to take an arbitrary sequence of pp-tokens, not just a string constant.
2017-10-21 20:36:21 -05:00
Stephen Heumann
8b4c83f527 Give an error when trying to use sizeof on incomplete struct or union types.
The following demonstrates cases that would erroneously be allowed (and misleadingly give a size of 0) before:

#include <stdio.h>
struct s *S;
int main(void)
{
        printf("%lu %lu\n", sizeof(struct s), sizeof *S);
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
5950002d03 Fix so the result of sizeof has type unsigned long (i.e. size_t), not signed long.
The following test case demonstrates the problem:

#include <stdio.h>

int main (void)
{
        if (sizeof(int) - 5 < 0) puts("error 1");
        if (sizeof &main - 9 < 0) puts("error 2");
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
26a800e667 Fix bug where certain integer constant expressions would have type signed long when they should have type unsigned long.
This affected binary or unary expressions with constant operands, at least one of which was of type unsigned long.

The following test case demonstrates the problem:

#include <stdio.h>
int main (void)
{
        if (0 + 0x80000000ul < 0) puts("error 1");
        if (~0ul < 0) puts("error 2");
}
2017-10-21 20:36:21 -05:00
Stephen Heumann
b8c1ea753f Allow 'extern' storage-class specifier in function definitions.
Per the C standards, this is allowed and is equivalent to not specifying a storage-class specifier.
2017-10-21 20:36:21 -05:00
Stephen Heumann
88279484af Generate better code when accessing structs and unions within other structures.
This cuts a few instructions from code like what is shown in commit affbe9. (It also works around the bug with that example, although that patch addresses the root cause of the problem.)
2017-10-21 20:36:20 -05:00
Stephen Heumann
efb23003f7 Don't do an optimization that would move a store to a DP location above an indirect load using that DP location.
This generated invalid code in instances like the following. The code generated for "s = s->u.next" would update the most significant word of s first, then use an indirect load with the half-updated pointer value to update the least significant word of s. This would generally corrupt the result if the new and old pointers had different bank bytes.

#pragma optimize 79
#include <stdio.h>

struct S {
	int i;
	union {
		struct S * next;
	} u;
} s1 = {0, 0};

int main (void)
{
	struct S * s = &s1;
	s = s->u.next;
	if (s != 0)
		puts("compiler bug detected\n"); /* May not always be triggered, depending on memory contents. */
}
2017-10-21 20:36:20 -05:00
Stephen Heumann
45cc0a0721 Prevent struct/union members from having incomplete type, except for flexible array member.
ORCA/C previously allowed struct/union members to be declared with incomplete type. Because of this, it allowed C99-style flexible array members to be declared, albeit by accident rather than by design. In some basic testing, these seem to work correctly, except that they could be initialized and that would give rise to odd behavior.

I have restricted it to allowing flexible array members only in the cases allowed by C99/C11, and otherwise disallowing members with incomplete type. I have also prohibited initializing flexible array members.
2017-10-21 20:36:20 -05:00
Stephen Heumann
972b0109a4 Fix a problem where zero-initializing a one-byte array would crash the system.
Also, generate better code for zero-initializing small arrays.

The problem was that the code would call the library routine ~ZERO with a size of 1, but it only works properly with a size of 2 or more. While adding a check here, I also changed it to not call ~ZERO for other small arrays (<=10 bytes), since it is generally more efficient to just initialize them directly.

The initializations in the following are examples that could trigger the problem:

int main(void)
{
    struct { int i; char s[1]; } foo = {1, 0};
    char arr[2][1] = {2};
}
2017-10-21 20:36:20 -05:00
Stephen Heumann
fa5974199d Don't allow the unary * operator to be applied to structs and unions. 2017-10-21 20:36:20 -05:00
Stephen Heumann
4247eb2c91 Fix bug where line number was off by one when using defaults.h.
This fixes the compco11.c test case.
2017-10-21 20:36:20 -05:00
Stephen Heumann
b019c59803 Flag an error if a struct or enum field declaration contains no declarator (i.e. no name).
This may be someone trying to use a C11-style anonymous struct/union, which should be flagged as an error until and unless those are supported. Otherwise, it probably just indicates that the programmer is confused. In any case, an error should be flagged for it.
2017-10-21 20:36:20 -05:00
Stephen Heumann
1077c35a49 Restrict bit fields to having integer types, and to having no more bits than the type specified.
C89 restricts bit fields to (signed) int and unsigned int only, although later standards note that additional types may be supported. ORCA/C supports the other integer types as an extension.

This fixes the compco01.c test case.
2017-10-21 20:36:20 -05:00
Stephen Heumann
f7c3c26794 Report an error when trying to cast a void expression to a non-void type. 2017-10-21 20:36:20 -05:00
Stephen Heumann
89f1dbce9b Report an error when void values are used in binary expressions.
These would previously be allowed, with the void value treated as if it was unsigned long.

Void values are still allowed for the second and third operands of the ?: operator, but only if they are both void. This is as required by the C standard.
2017-10-21 20:36:20 -05:00
Stephen Heumann
9558a88571 Fix test cases that used a non-brace-enclosed expression to initialize the first element of a union.
This is not permitted under standard C, and the previous commit caused this non-standard construct to stop working.
2017-10-21 20:36:20 -05:00
Stephen Heumann
05a0ce738a Permit unions to be initialized by assignment from expressions of the appropriate union type.
This patch should also permit the union initialization code to handle unions containing bit fields, but for the time being they are still prohibited by code elsewhere in the compiler.
2017-10-21 20:36:20 -05:00
Stephen Heumann
5e7d9e8278 Note that non-support for bit fields in unions is a non-standard limitation of ORCA/C. 2017-10-21 20:36:20 -05:00
Stephen Heumann
00ace776c4 Properly support declarations with incomplete structure types that are completed later in the same scope.
This fixes a problem with ORCA/C conformance test C5.6.0.1.CC, which was introduced by commit bf9fa66. A slightly more involved fix was needed to preserve the correct behavior while avoiding the memory trashing fixed by that patch.
2017-10-21 20:36:20 -05:00
Stephen Heumann
b83ed7b17b Do not erroneously treat integer constant expressions of type unsigned int as signed in certain contexts.
Unsigned constants in the range 0x8000-0xFFFF were erroneously being treated like negative signed values in some contexts, including in array declarators and in the case labels of switch statements where the expression switched over has type long or unsigned long.

This could lead to bogus compile errors for array declarations and typedefs such as the following:

typedef char foo[0x8000];

It could also lead to cases in switch statements not being properly matched, as in the following program:

#include <stdio.h>
int main(void)
{
    long i = 0xFF00;
    switch (i) {
        case 0xFF00:
            puts("good");
            break;
        default:
            puts("bad");
    }
}
2017-10-21 20:36:20 -05:00
Stephen Heumann
0705a337b0 Allow case labels in switch statements that are out of range of the type being switched on.
The case label values are converted to the promoted type of the expression being switched on, as if by a cast. In practice, this means discarding the high bits of a 32-bit value to produce a 16-bit one.

Code requiring this is dubious and would be a good candidate for a warning or a lint error, but it's allowed under the C standards.

The following code demonstrates the issue:

#include <stdio.h>
int main(void)
{
    int i = 0x1234;
    switch (i) {
        case 0xABCD1234:
            puts("good");
            break;
        default:
            puts("bad");
    }
}
2017-10-21 20:36:20 -05:00
Stephen Heumann
953a7c36d5 Optimization to avoid generating unnecessary sign-extension code when converting UChar values to Long/ULong.
This ameliorates a performance regression due to promoting char to int instead of unsigned int.
2017-10-21 20:36:20 -05:00
Stephen Heumann
cc75a9b12b Fix a problem where a conversion to a smaller type could be improperly omitted when followed by a conversion to a larger type.
This also fixes a logic error that may have permitted other conversions to be improperly omitted in some cases.

The following program demonstrates the problem (should print 211):

#pragma optimize 1
#include <stdio.h>
int main(void)
{
        unsigned int i = 1234;
        long l = (unsigned char)(i+1);
        printf("%li\n", l);
}
2017-10-21 20:36:20 -05:00
Stephen Heumann
0e82755334 Fix error that could happen when a function call was used in a comparison expression.
This resulted from the addition of the signed-to-unsigned comparison optimization. Specifically, it calls TypeOf for the expressions on each side of the comparison, and this did not handle function calls. That support has now been added, and will give the proper return type for direct and indirect calls to C functions. The IR for tool calls doesn't include the return type (just the number of bytes), so we return cgVoid for them. This is OK for the present use case.
2017-10-21 20:36:20 -05:00
Stephen Heumann
e00d47aaaf Fix invalid memory access due to uninitialized pointer.
(From Kelvin Sherlock.)
2017-10-21 20:36:20 -05:00
Stephen Heumann
dce39a851e Peephole optimization: fix some places where an expression with side effects could be removed during optimization.
Without this fix, an expression of the form "0 * exp" would be reduced to simply "0" unless exp contained a function call; other side effects of exp (such as assignments or increments) would be removed.

A similar issue could occur with additions that use the same expression on both sides of the "+": after optimization, it would only be evaluated once. I think the cases addressed here are all undefined behavior under the C standards, so the old behavior wasn't technically wrong, but the new behavior is still less confusing.
2017-10-21 20:36:20 -05:00
Stephen Heumann
bd19465ab2 Fix so that reaching definitions are properly computed in loop optimization when there is unreachable code.
Specifically, this ensures that the depth-first numbering of basic blocks starts from 1, which is what ReachingDefinitions expects. Without this fix, reaching definitions wouldn't be correctly computed for functions that contain unreachable basic blocks (including the implicit one to return at the end). This could result in invalid hoisting of operations out of the loop.

This fixes the compca26.c test case.
2017-10-21 20:36:20 -05:00