Commit Graph

225 Commits

Author SHA1 Message Date
Stephen Heumann
80b96c1147 Ensure % with negative operands is not mis-optimized in intermediate code.
This will not be triggered in most cases, but might be if one of the operand expressions was itself subject to optimization.
2018-09-14 19:18:45 -05:00
Stephen Heumann
fedd275395 Fix issues where static initialization may generate the wrong number of bytes.
This relates to unions or structs that are "filled" with zeros because the initializer does not include explicit terms for them, and that contain bit-fields or (for unions) do not start with the longest member.

The following program is an example that was miscompiled:

#include <stdio.h>

struct BF {
        int i:3;
        int j:4;
};

union U {
        int i;
        long l;
};

struct Outer1 {
        int n;
        struct BF bf[7];
        union U u[5];
};

struct Outer2 {
        long p;
        struct Outer1 o1;
        long q;
};

int main(void) {
        static struct Outer2 s = {1,{0},212};
        printf("%li %li\n", s.p, s.q);
}
2018-09-14 19:02:21 -05:00
Stephen Heumann
4d10fbae01 Fix several initialization issues.
This fixes case 1 (dealing with run-time initialization of structures containing bit-fields) and case 2 (dealing with initialization of structs where initializer values are not provided for all elements) from issue #59. It also fixes cases that could result in invalid initialization of unions if their first element was not the longest, as in the following example:

#include <stdio.h>
union U {
        int i;
        long l;
};
int main(void) {
        union U a[5] = {1,2,3,4};
        printf("a[0].i=%i, a[1].i=%i, a[2].i=%i, a[3].i=%i, a[4].i=%i\n",
                a[0].i, a[1].i, a[2].i, a[3].i, a[4].i);
}
2018-09-14 18:13:33 -05:00
Stephen Heumann
8b4213cd5a Fix bug where the condition check of the ?: operator may be mis-evaluated.
This could happen in certain cases where the condition codes might not be set at expected. The following program gives an example:

#pragma optimize 1
#include <stdio.h>
int one(void) {return 1;}
int negative_one(void) {return -1;}
int main(void) {
    puts((one() + negative_one()) ? "A" : "B");
}

This could also occur if the condition used the % operator, particularly after the recent changes to it.

Also, add unsigned multiplication, division, and modulo operations to the list of those that may not set the condition codes based on the result value, both in this and other contexts.

Detected based on several programs from FizzBuzz-C.
2018-09-14 13:25:40 -05:00
Stephen Heumann
60484d6f69 Fix for including system headers via macros.
This makes something like the following work:

#define STDIO_H <stdio.h>
#include STDIO_H

It didn't previously, because workString would be overwritten by NextToken. The effect in this case was that it would erroneously try to include the header <hh>, rather than <stdio.h>.

Detected based on a couple programs from FizzBuzz-C.
2018-09-13 21:59:46 -05:00
Stephen Heumann
95f5ec9c13 Don't print a whole bunch of spaces for an error message if the column number is 0.
This could happen, e.g., for a "'}' expected" error at end-of-file. It occurred because the 0..maxint type being used caused the Pascal compiler to use unsigned comparisons, which were inappropriate here.
2018-09-10 21:55:02 -05:00
Stephen Heumann
857e432896 Disable a native-code optimization that was generating bad code for %.
Specifically, it converted PLX followed by PHA to STA 1,S. This is invalid if the x value is actually used, which is a case that can come up in the code now generated for the % operator.

It might be possible to re-enable this optimization with tighter checks about where it's applied, but I don't think it's terribly important.

The below program demonstrates an example that was being miscompiled:

#pragma optimize -1
#include <stdio.h>
int main(void) {
        int a = 100, b = 200, c = 3, d = 4;
        printf("%i\n", (a+b) % (c+d)); /* should be 6 */
}
2018-09-10 19:29:16 -05:00
Stephen Heumann
a359543769 Add EACCES as another name for EACCESS in <errno.h>.
The EACCES name is used in the ORCA/C manual, and also matches the name in POSIX and GNO. EACCESS is retained as an alias for compatibility.
2018-09-10 18:26:24 -05:00
Stephen Heumann
2d43074d5a Make % operator give proper remainders even if one or both operands are negative.
Per the C standards, the % operator should give a remainder after division, such that (a/b)*b + a%b equals a (provided that a/b is representable). As such, the operation of % is defined for cases where either or both of the operands are negative. Since division truncates toward 0, a%b should give a negative result (or 0) in cases where a is negative.

Previously, the % operator was essentially behaving like the "mod" operator in Pascal, which is equivalent for positive operands but not if either operand is negative. It would generally give incorrect results in those cases, or in some cases give compile-time or run-time errors.

This patch addresses both 16-bit and 32-bit signed computations at run time, and operations in constant expressions. The approach at run time is to call existing division routines, which return the correct remainder, except always as a positive number. The generated code checks the sign of the first operand, and if it is negative negates the remainder.

The code generated is somewhat large (especially for the 32-bit case), so it might be sensible to put it in a library function and call that, but for now it's just generated in-line. This avoids introducing a dependency on a new library function, so the generated code remains compatible with older versions of ORCALib (e.g. the GNO one).

Fixes #10.
2018-09-10 18:21:17 -05:00
Stephen Heumann
e8d89c9b39 Add prototype for _Exit() function from C99. 2018-09-10 17:41:53 -05:00
Stephen Heumann
3352e874dd Update a test to account for correct behavior of #line directive. 2018-09-10 17:40:57 -05:00
Stephen Heumann
f7010f1746 Update release notes. 2018-09-08 23:08:17 -05:00
Stephen Heumann
dc0f7dc4e4 Correct most of the LDBL_* values in <float.h>.
These had been the values appropriate for double, not accounting for the fact that long double is now the 80-bit extended type.

The integer LDBL_* values have now all been updated to be correct, as has LDBL_EPSILON. LDBL_MAX and LDBL_MIN are still not correct, because ORCA/C internally processes floating-point constants in the double format, and so the correct values would get rounded to INF and 0, respectively.

Note that the SANE 80-bit extended format is almost like the x87 80-bit extended format used for long double on many modern systems, but not entirely. The difference is that SANE allows the biased exponent field of an extended value (either normalized or denormalized) to be 0, yielding an effective exponent of 0-16383 = -16383. x87 uses an effective exponent of -16382 in these cases (which it considers to be pseudo-denormalized or denormalized), yielding values that are twice as large as the SANE values. This difference causes LDBL_MIN_EXP to be different, and would also cause LDBL_MIN to be different if it could be represented correctly.
2018-09-08 22:39:30 -05:00
Stephen Heumann
4de0035d77 Remove support for the non-standard "long float" type.
This was an alias for double, but it's non-standard and undocumented. Apparently it existed in some other pre-standard compilers, but it's not in any version of standard C, and I can't find any evidence of it being used. Considering the possibility for confusion, I think it's best to remove it.
2018-09-08 18:12:48 -05:00
Stephen Heumann
15b1c88d44 Give accurate error message if a numeric constant is too long.
Previously, "integer overflow" was reported in this case, even for floating constants.
2018-09-08 14:40:06 -05:00
Stephen Heumann
f6381b7523 Indicate errors at correct positions when the source line contains tabs.
Previously, the error markers would generally be misaligned in this case, because a tab would expand to no spaces (in ORCA/Shell) or multiple spaces (in most other environments), but the error-printing code would use a single space to try to line up with it.

The solution adopted is just to print tabs in the error lines at the positions where they occur in the source lines. The actual amount of space displayed will depend on the console being used, but in any case it should line up correctly with the source line.
2018-09-07 17:48:19 -05:00
Stephen Heumann
d33ac61af3 Fix bug where exit-to-editor may use wrong position in certain cases.
This could happen in certain situations where an error is detected at the end of a line (for example with "cannot redefine a macro" errors).

Fixes #40.
2018-09-06 23:55:25 -05:00
Stephen Heumann
f70f03d473 Update release notes. 2018-09-06 23:27:42 -05:00
Stephen Heumann
6f8546c964 Report errors for use of inappropriate types as operands to various operators.
Mainly, this detects errors in several cases where a pointer could inappropriately be used where an arithmetic type was expected. In some cases, other types (e.g. structs) could be used too.
2018-09-06 21:21:23 -05:00
Stephen Heumann
dc1b0aa29f Add lint flag to check for several forms of undefined behavior in computations.
This adds lint bit 5 (a value of 32), which currently enables checking for the following conditions:

*Integer overflow from arithmetic in constant expressions (currently only of type int).
*Invalid constant shift counts (negative, or >= the width of the type)
*Division by (constant) zero.

These (mainly the first two) can be indicative of code that was designed for larger type sizes and needs changes to support 16-bit int.
2018-09-05 23:48:35 -05:00
Stephen Heumann
b2785dfe0e Treat constant expressions cast to char as promoting to int, not unsigned int.
The following program demonstrates the problem:

#include <stdio.h>
int main(void) {
    long l = (char)1 - (char)5;
    printf("%li\n", l);  /* should print -4 */
}
2018-09-05 12:49:02 -05:00
Stephen Heumann
cb0497687e Update documentation to cover format checking and other recent changes. 2018-09-03 18:21:39 -05:00
Stephen Heumann
0e341e1153 Record line numbers for certain auto variable declarations with initializers.
This allows debuggers to stop on the declaration lines, and also provides trace-back information for them if that feature is enabled.

Currently, this applies only to declarations that occur after the first statement in the block. As such, it doesn't change the handling of traditional pre-C99-style declarations at the beginning of a block.
2018-09-02 15:22:58 -05:00
Stephen Heumann
55dbc718c1 Small format checker adjustments.
Format checking for "%p" is improved: in the case of scanf, the corresponding argument must be a pointer to a pointer.
2018-09-02 15:12:52 -05:00
Stephen Heumann
d3aff49550 Use escape sequences in format strings printed by the format checker.
Error messages are now lined up to account for the actual printed length of any escape sequences.
2018-09-02 00:18:26 -05:00
Stephen Heumann
7493ffa76c Support the pause-on-errors feature for format checker messages. 2018-09-01 23:18:48 -05:00
Stephen Heumann
ecff1e3479 Don't give an error when format string is not a string literal.
This may be indicative of dubious code in some cases, but it can also have valid use cases that are not easy to modify.

If desired, this could be re-enabled under a separate lint flag (analogous to -Wformat-nonliteral in GCC/Clang, which is available as an option but is not included in -Wformat or -Wall).
2018-09-01 22:36:31 -05:00
Stephen Heumann
94ec3c73c7 Various tweaks to printf/scanf format checking.
It is now stricter about invalid length modifiers. Also, "comp" is accepted where a floating-point type is expected (like the others, it is promoted to extended), and size specifiers are allowed with %n (now pretty much supported in the latest library version).
2018-09-01 22:25:38 -05:00
Stephen Heumann
69f086367c Adjust how messages from the printf/scanf format checker are displayed.
Mainly, this causes the messages from the format checker to be displayed after the relevant line is printed, along with any other error messages. The wording and formatting of some of the messages is also slightly adjusted, but there should be no substantive change in what is warned about.
2018-09-01 19:59:52 -05:00
Stephen Heumann
9ff3407c60 Avoid producing invalid string literals in #pragma expand output.
Previously, the characters ", /, and ? within string literals were not escaped in #pragma expand output, which could result in them being erroneously interpreted as ending the string literal, starting an escape sequence, or being part of a trigraph (respectively). Also, escape sequences were output in hexadecimal format. Since there is no length limit on hexadecimal escape sequences, this could result in subsequent characters in the string being interpreted as part of the escape sequence.

This fixes the issues by escaping the characters ", /, and ?, and by using three-digit octal escape sequences rather than hexadecimal ones.
2018-09-01 16:11:18 -05:00
Stephen Heumann
a6f1211ee6 Properly treat #line directive as giving the next line number, not the current one. 2018-08-31 21:46:10 -05:00
Stephen Heumann
463da80902 Small tweaks to <stdarg.h>.
*Use a typedef rather than a macro definition for va_list. (The C standards specify that va_list is a type, although this would make a practical difference only if someone #undef'd it.)

*Don't include a semicolon in va_start(), so it expands to an expression rather than a statement. This could make a difference in a construct like "if (...) va_start(...); else ...".
2018-08-28 18:58:27 -05:00
Stephen Heumann
0b8d4ce3e4 Fix invalid static initialization of bitfields occupying three bytes.
An extra, fourth byte was being generated for the bitfield(s). This would cause all subsequent members of the struct and any enclosing object not to be initialized at the proper locations, which would generally corrupt their values.

The following program illustrates the issue:

#include <stdio.h>

struct X {
    int a:9;
    int b:9;
    int c;
} x = {123,234,12345};

int main(void) {
    printf("x.a = %i, x.b = %i, x.b = %i\n", x.a, x.b, x.c);
}
2018-04-11 23:42:04 -05:00
Stephen Heumann
0cfed00b52 Fix problem with static initialization of bitfields followed by non-bitfield members.
The initialized bytes for the bitfield(s) could wind up improperly being placed after those for the non-bitfield, generally corrupting both values.

The following program illustrates the problem:

#include <stdio.h>

struct X {
    int a:9;
    int b;
} x = {42,123};

int main(void) {
    printf("x.a = %i, x.b = %i\n", x.a, x.b);
}
2018-04-11 23:30:48 -05:00
Stephen Heumann
04b0143eff Update release notes to cover my recent changes.
printf/scanf format checking is not yet covered.
2018-04-10 23:36:18 -05:00
Stephen Heumann
caabb5addf Allow declarations in first clause of for loop (C99). 2018-04-01 16:48:11 -05:00
Stephen Heumann
275e1f080b Add a new flag to control whether mixed declarations are allowed and C99 scope rules are used.
#pragma ignore bit 4 (a value of 16) now controls these. It is on by default (allowing them), but turning it off will restore the C89 rules.
2018-04-01 14:14:18 -05:00
Stephen Heumann
2be0ef0de5 Allow initialization of static/global variables with the address of members of const structs.
Note that this code currently permits discarding the const qualifier via such an initialization. That should give a diagnostic, but currently it doesn't in this or various other cases.

The following code (derived from a csmith-generated test case) illustrates the problem:

struct S0 {
   const long  f4;
};
const struct S0 g_149;
const long *g_311 = &g_149.f4;
2018-03-31 21:40:43 -05:00
Stephen Heumann
a9f7f97a2f Avoid errors from attempting common subexpression elimination on the left subexpression of the comma operator.
This could happen because the left subexpression does not produce a result for use in the enclosing expression, and therefore is not of the form expected by the CSE code.

The following program (derived from a csmith-generated test case) illustrates the problem:

#pragma optimize 16
int main(void) {
    int i;
    i, (i, 1);
}
2018-03-31 20:10:50 -05:00
Stephen Heumann
fd0ff033ad Fix code generation for certain cases where addresses are stored to local variables that don't fit in the direct page.
There was a bug when storing addresses generated by expressions like &a[i], where a is a global array and i is a variable. In certain cases where the destination location was a local variable that didn't fit in the direct page, the result of the address calculation would be stored to the wrong location on the stack. This failed to give the correct result, and could also sometimes cause crashes or other problems due to stack corruption.

The following program (derived from a csmith-generated test case) illustrates the issues:

#pragma optimize 1
long g_87[5];
static int g_242 = 4;

int main(void) {
    char l_298[256];
    long *l_284[3] = {0, 0, &g_87[g_242]};
    return l_284[2]-g_87; /* should be 4 */
}
2018-03-31 15:45:05 -05:00
Stephen Heumann
bfe4416623 Fix some cases where decay of array types to pointer types was not handled correctly.
Specifically:
*The result of pointer arithmetic (or equivalent operations like &a[i]) always has pointer type.
*Array types decay to integer types in the context of comparison operations, so it is legal to compare two differently-sized arrays with the same element type.

The following program (partially derived from a csmith-generated test case) illustrates the issues:

int main(void) {
    int a[2], b[10];
    if (a == b) ; /* legal */
    if (&a[1] != &b[0]) ; /* legal */
    return sizeof(&b[1]); /* Should be sizeof(int*), i.e. 4 on GS */
}
2018-03-29 19:41:32 -05:00
Stephen Heumann
37cf771eee Fix bug where ++/-- operations would use the wrong location for local variables that don't fit in the direct page.
The code would trash other data on the stack, which could corrupt other variables and in some cases lead to crashes.

The following program (derived from a csmith-generated test case) shows the problem:

#pragma optimize -1
int main(void) {
    char arr[256] = {0};
    char l_565[3][2] = {{3,4}, {5,6}, {7,8}};
    l_565[0][0]++;
    return l_565[0][0];
}
2018-03-27 23:10:49 -05:00
Stephen Heumann
7605b7bbf2 Fix bug where bitwise binary ops on 32-bit values will be miscalculated and trash the stack in certain cases.
The following program (derived from a csmith-generated test case) demonstrates the crash:

#pragma optimize 8+64
#include <stdio.h>
long g = 0;
int main (void) {
    long l = 0x10305070;
    printf("%08lx\n", l ^ (g = (1 , 0x12345678)));
}
2018-03-27 20:11:45 -05:00
Kelvin Sherlock
c2b24f2854 TermError doesn't always initalize the error message. commonly seen with compile +T. 2018-03-27 20:19:39 -04:00
Stephen Heumann
29de867039 Fix bug causing functions with 254 bytes of locals to crash on return.
This was a bug with the code for moving the return address. It would generate a "LDA 0" instruction when it was trying to load the value at DP+256.

The following program (derived from a csmith-generated test case) demonstrates the crash:

#pragma optimize 8
int main (int argc, char **argv) {
    char s[0xFC];
}
2018-03-26 23:30:26 -05:00
Stephen Heumann
21493271b9 Fix optimizer bug where tests of long or floating-point constants can trash the stack.
This problem could lead to crashes in code like the following (derived from a csmith-generated test case):

#pragma optimize 1
int main (void)
{
    if (1L) ;
}
2018-03-26 21:54:01 -05:00
Stephen Heumann
f2d15b8fc7 Fix optimizer bug where casts with unused results could sometimes cause stack corruption.
This problem could lead to crashes in code like the following (derived from a csmith-generated test case):

#pragma optimize 1
static int main(void) {
    long i = 2;
    (long)(i > 1);
}
2018-03-26 19:57:18 -05:00
Stephen Heumann
7f94876fa8 Fix mis-optimization of "expression && non-zero constant" operations with 32-bit type.
The previous code may have been intended to convert this to a "!=0" test, which would have been valid if correctly implemented, but with the current code generator that actually yields worse code than the original version, so for now I just removed the optimization for this case.

This problem could lead to crashes in code like the following (derived from a csmith-generated test case):

#pragma optimize 1
int main(int argc, char *argv[]){
    long l_57 = argc;

    return (4 ^ l_57) && 6;
}
2018-03-26 18:30:45 -05:00
Stephen Heumann
db98f7842d Fix mis-evaluation of certain equality comparisons with intermediate code optimization.
This affected comparisons of the form "logical operation or comparison == constant other than 0 or 1". These should always evaluate to 0 (false), but could mis-evaluate to true due to the bad optimization.

The following program gives an example showing the problem:

#pragma optimize 1
int main(void) {
        int i = 0, j = 42;
        return (i || j) == 123;
}
2018-03-26 18:20:36 -05:00
Stephen Heumann
7266b1d613 Properly support use of const structs and unions in assignments and as function arguments.
Here's an example that shows the issues (derived from a csmith-generated test case):

struct S {
   unsigned f;
};

void f1(struct S p) {
    printf("%u\n", p.f);
}

int main(void) {
    const struct S l = {123};
    struct S s;

    f1(l);

    s = l;
    printf("%u\n", s.f);
}
2018-03-25 20:56:15 -05:00