Previously, the logic for this was incorrect and would lead to a null pointer dereference in the compiler. In most cases the generated code would not actually change the pointer.
The following program demonstrates the issue:
#include <stdio.h>
#pragma memorymodel 1
typedef char bigarray[0x20000];
bigarray big[5];
int main(void) {
bigarray *p = big;
p++;
printf("%p %p\n", (void*)big, (void*)p);
}
These are initially entered into the symbol table with no known type (itype = nil), so this case should be accounted for in NewSymbol.
This typically would not cause a problem, but might if the zero page contained certain values
In certain rare cases, constant subexpression elimination could set the left subtree of a pc_bno operation in the intermediate code to nil. This could lead to null pointer dereferences, sometimes resulting in a crash or error during native code generation.
The below program sometimes demonstrates the problem (dependent on zero page contents):
#pragma optimize 16
struct F {int *p;};
void foo(struct F* f)
{
struct {int c;} s = {0};
++f->p;
s.c |= *--f->p;
}
This could cause spurious errors, or in some cases bad code generation.
The following example illustrates the problem:
#include <stdio.h>
enum {A,B,C};
/* arr was treated as having a size of 1, rather than 3 */
char arr[(int)C+1] = {1,2,3}; /* incorrectly gave an error for initializer */
int main(void) {
static int i = (int)C+1; /* incorrectly gave an error */
printf("%zu\n", sizeof(arr));
printf("%i\n", (int)C+1); /* OK */
printf("%i\n", i);
}
const structs are wrapped in definedType. The debugger symbol table code is unaware of this, which results in missing or incomplete entries.
example:
const struct { int a; int b; } cs;
cs: isForwardDeclared = false; class = ident
4 byte constant defined type of
4 byte struct: 223978
const struct { int a; int b; } *pcs;
pcs: isForwardDeclared = false; class = ident
4 byte pointer to
4 byte constant defined type of
4 byte struct: 224145
const struct { const struct { const int a; } a[2]; } csa[5];
csa: isForwardDeclared = false; class = ident
20 byte 5 element array of
4 byte constant defined type of
4 byte struct: 225155
const struct { const struct { const int a; } a[2]; } *cspa[5];
cspa: isForwardDeclared = false; class = ident
20 byte 5 element array of
4 byte pointer to
4 byte constant defined type of
4 byte struct: 224850
This change unwraps the definedType so the underlying type info can be placed in the debugger symbol table.
There still aren't prototypes for the main SANE calls, since they aren't really designed to be called directly from C, and may take variable numbers of parameters depending on the operation.
Bumping the version forces regeneration of any sym files created by old ORCA/C versions with the bug that was just fixed.
A couple sanity checks are also introduced when reading sym files, including one that would have caught that bug.
When these invalid sym files were used during subsequent compiles, certain type pointers (for what should be const-qualified struct or union types) could be left uninitialized, or possibly initialized pointing to different types. This could result in spurious errors or potentially in other problems.
This relates to unions or structs that are "filled" with zeros because the initializer does not include explicit terms for them, and that contain bit-fields or (for unions) do not start with the longest member.
The following program is an example that was miscompiled:
#include <stdio.h>
struct BF {
int i:3;
int j:4;
};
union U {
int i;
long l;
};
struct Outer1 {
int n;
struct BF bf[7];
union U u[5];
};
struct Outer2 {
long p;
struct Outer1 o1;
long q;
};
int main(void) {
static struct Outer2 s = {1,{0},212};
printf("%li %li\n", s.p, s.q);
}
This fixes case 1 (dealing with run-time initialization of structures containing bit-fields) and case 2 (dealing with initialization of structs where initializer values are not provided for all elements) from issue #59. It also fixes cases that could result in invalid initialization of unions if their first element was not the longest, as in the following example:
#include <stdio.h>
union U {
int i;
long l;
};
int main(void) {
union U a[5] = {1,2,3,4};
printf("a[0].i=%i, a[1].i=%i, a[2].i=%i, a[3].i=%i, a[4].i=%i\n",
a[0].i, a[1].i, a[2].i, a[3].i, a[4].i);
}
This could happen in certain cases where the condition codes might not be set at expected. The following program gives an example:
#pragma optimize 1
#include <stdio.h>
int one(void) {return 1;}
int negative_one(void) {return -1;}
int main(void) {
puts((one() + negative_one()) ? "A" : "B");
}
This could also occur if the condition used the % operator, particularly after the recent changes to it.
Also, add unsigned multiplication, division, and modulo operations to the list of those that may not set the condition codes based on the result value, both in this and other contexts.
Detected based on several programs from FizzBuzz-C.
This makes something like the following work:
#define STDIO_H <stdio.h>
#include STDIO_H
It didn't previously, because workString would be overwritten by NextToken. The effect in this case was that it would erroneously try to include the header <hh>, rather than <stdio.h>.
Detected based on a couple programs from FizzBuzz-C.
This could happen, e.g., for a "'}' expected" error at end-of-file. It occurred because the 0..maxint type being used caused the Pascal compiler to use unsigned comparisons, which were inappropriate here.
Specifically, it converted PLX followed by PHA to STA 1,S. This is invalid if the x value is actually used, which is a case that can come up in the code now generated for the % operator.
It might be possible to re-enable this optimization with tighter checks about where it's applied, but I don't think it's terribly important.
The below program demonstrates an example that was being miscompiled:
#pragma optimize -1
#include <stdio.h>
int main(void) {
int a = 100, b = 200, c = 3, d = 4;
printf("%i\n", (a+b) % (c+d)); /* should be 6 */
}
Per the C standards, the % operator should give a remainder after division, such that (a/b)*b + a%b equals a (provided that a/b is representable). As such, the operation of % is defined for cases where either or both of the operands are negative. Since division truncates toward 0, a%b should give a negative result (or 0) in cases where a is negative.
Previously, the % operator was essentially behaving like the "mod" operator in Pascal, which is equivalent for positive operands but not if either operand is negative. It would generally give incorrect results in those cases, or in some cases give compile-time or run-time errors.
This patch addresses both 16-bit and 32-bit signed computations at run time, and operations in constant expressions. The approach at run time is to call existing division routines, which return the correct remainder, except always as a positive number. The generated code checks the sign of the first operand, and if it is negative negates the remainder.
The code generated is somewhat large (especially for the 32-bit case), so it might be sensible to put it in a library function and call that, but for now it's just generated in-line. This avoids introducing a dependency on a new library function, so the generated code remains compatible with older versions of ORCALib (e.g. the GNO one).
Fixes#10.
These had been the values appropriate for double, not accounting for the fact that long double is now the 80-bit extended type.
The integer LDBL_* values have now all been updated to be correct, as has LDBL_EPSILON. LDBL_MAX and LDBL_MIN are still not correct, because ORCA/C internally processes floating-point constants in the double format, and so the correct values would get rounded to INF and 0, respectively.
Note that the SANE 80-bit extended format is almost like the x87 80-bit extended format used for long double on many modern systems, but not entirely. The difference is that SANE allows the biased exponent field of an extended value (either normalized or denormalized) to be 0, yielding an effective exponent of 0-16383 = -16383. x87 uses an effective exponent of -16382 in these cases (which it considers to be pseudo-denormalized or denormalized), yielding values that are twice as large as the SANE values. This difference causes LDBL_MIN_EXP to be different, and would also cause LDBL_MIN to be different if it could be represented correctly.
This was an alias for double, but it's non-standard and undocumented. Apparently it existed in some other pre-standard compilers, but it's not in any version of standard C, and I can't find any evidence of it being used. Considering the possibility for confusion, I think it's best to remove it.
Previously, the error markers would generally be misaligned in this case, because a tab would expand to no spaces (in ORCA/Shell) or multiple spaces (in most other environments), but the error-printing code would use a single space to try to line up with it.
The solution adopted is just to print tabs in the error lines at the positions where they occur in the source lines. The actual amount of space displayed will depend on the console being used, but in any case it should line up correctly with the source line.
Mainly, this detects errors in several cases where a pointer could inappropriately be used where an arithmetic type was expected. In some cases, other types (e.g. structs) could be used too.
This adds lint bit 5 (a value of 32), which currently enables checking for the following conditions:
*Integer overflow from arithmetic in constant expressions (currently only of type int).
*Invalid constant shift counts (negative, or >= the width of the type)
*Division by (constant) zero.
These (mainly the first two) can be indicative of code that was designed for larger type sizes and needs changes to support 16-bit int.
The following program demonstrates the problem:
#include <stdio.h>
int main(void) {
long l = (char)1 - (char)5;
printf("%li\n", l); /* should print -4 */
}
This allows debuggers to stop on the declaration lines, and also provides trace-back information for them if that feature is enabled.
Currently, this applies only to declarations that occur after the first statement in the block. As such, it doesn't change the handling of traditional pre-C99-style declarations at the beginning of a block.
This may be indicative of dubious code in some cases, but it can also have valid use cases that are not easy to modify.
If desired, this could be re-enabled under a separate lint flag (analogous to -Wformat-nonliteral in GCC/Clang, which is available as an option but is not included in -Wformat or -Wall).
It is now stricter about invalid length modifiers. Also, "comp" is accepted where a floating-point type is expected (like the others, it is promoted to extended), and size specifiers are allowed with %n (now pretty much supported in the latest library version).