This covers code like the following:
int main(void) {
auto int a[20];
static int *p = &a[5];
}
Previously, this would compile without error, and then either give a linker error or be linked to refer to the global symbol "a" (if there was one).
Any stack-allocated array must be < 32KB, so we can use the same approach as in the small memory model to compute indexes for it (which is considerably more efficient than the large-memory-model code).
The code generator never generates this code sequence (and did not do so even prior to the last commit), so having a peephole optimization for it is pointless.
The new code is smaller and (in the common case where the subtraction does not overflow) faster. It takes advantage of the fact that in overflow cases the carry flag always gets set to the opposite of the sign bit of the result.
This will change a "jump if true" to "jump if false" (or vice versa) and logically negate the condition in certain cases where that generates better code.
An assembly peephole optimization for certain "branch to branch" instructions is also added. (Certain conditionals could generate these.)
The hash algorithm has been modified to include a rotate at each step. This should improve the quality of hashes and reduce the number of collisions. However, probably the more important change for performance is to do the modulo computation by repeated subtraction rather than by calling a slow library function.
This could occur because when FindSymbol was called to look for symbols in all spaces, it would find a tag in an inner scope before a typedef in an outer scope. The processing order has been changed to look for regular symbols (including typedefs) in any scope, and only look for tags if no regular symbol is found.
Here is an example illustrating the problem:
typedef int T;
int main(void) {
struct T;
T x;
}
This occurred due to looking for the symbol in all namespaces rather than only variable space.
Here is an example affected by this:
int X;
int main(void) {
struct X {int i;};
static int *i = &X;
}
If an identifier is used as a typedef in an outer scope but then declared as something else in an inner scope (e.g. a variable name or tag), and that same identifier is the next token after the end of the inner scope, it would not be recognized properly as a typedef name leading to spurious errors.\
Here is an example that triggered this:
typedef char Type;
void f(int Type);
Type t;
Here is another one:
int main(void) {
typedef int S;
if (1)
(struct S {int a;} *)0;
S x;
}
The old approach would call GenerateCode twice for each parameter expression. Now, it is only called once. This is faster, and also avoids some oddities with error handling. With the previous approach, expressionType would not be set if there was an error in the expression, which could lead to additional spurious errors. Also, a lint message treated as a warning could appear twice.
This could happen in some cases where one subexpression of a larger expression was a constant. One effect of this was to cause spurious "lint: implicit conversion changes value of constant" messages in certain cases (when that lint check was enabled). It may also have caused certain errors to be missed in other situations.
It is legal for source files that do not include <stdio.h> to define static functions named printf, scanf, etc. These obviously are not the standard library functions and are not required to work the some way as them, so they should not be subject to format checking.
This adds debugging code to detect null pointer dereferences, as well as pointer arithmetic on null pointers (which is also undefined behavior, and can lead to later dereferences of the resulting pointers).
Note that ORCA/Pascal can already detect null pointer dereferences as part of its more general range-checking code. This implementation for ORCA/C will report the same error as ORCA/Pascal ("Subrange exceeded"). However, it does not include any of the other forms of range checking that ORCA/Pascal does, and (unlike in ORCA/Pascal) it is controlled by a separate flag from stack overflow checking.
The intention may have been to set the flags based on the return value, but that is not part of the calling convention and nothing should be relying on it.
These are erroneous, in situations where the expression is used for its value. For function return types, this violates a constraint (C17 6.5.2.2 p1), so a diagnostic is required. We also now diagnose this issue for identifier expressions or unary * (indirection) expressions. These cases cause undefined behavior per C17 6.3.2.1 p2, so a diagnostic is not required, but it is nice to give one.
Varargs-only stack repair (i.e. using #pragma optimize bit 3 but not bit 6) was broken by commit 32975b720fb4d79. It removed some code that was needed to allocate the direct page location used to hold the stack pointer value in that case. This would lead to invalid code being produced, which could cause a crash when run. The fix is to revert the erroneous parts of commit 32975b720fb4d79 (which do not affect its core purpose of enabling intermediate code peephole optimization to be used when stack repair code is active).
Redeclaration of a pascal function could cause spurious errors when using strict type checking. (This was similar to the issue fixed in commit b5b276d0f40844, but this time occurring due to the CompTypes call in NewSymbol.) There may also have been subtle misbehavior in other corner cases.
Now the reversal of parameters for pascal functions is applied only once, in Declarator prior to calling NewSymbol. This ensures that symbols for pascal functions have the correct types whenever they are processed, and also simplifies the previous code, where the parameters could be reversed, un-reversed, and re-reversed in three separate places.
The pc_rev intermediate code always returns a value, so the check is not needed, and (since the generated code does not jump to a return label) it can yield false positives.
This occurs when the constant value is out of range of the type being assigned to. This is likely indicative of an error, or of code that assumes types have larger ranges than they do in ORCA/C (e.g. 32-bit int).
This intentionally does not report cases where a value is assigned to a signed type but is within the range of the corresponding unsigned type, or vice versa. These may be done intentionally, e.g. setting an unsigned value to "-1" or setting a signed value using a hex constant with the high bit set. Also, only conversions to 8-bit or 16-bit integer types are currently checked.
A macro is used to control whether struct timespec is declared, because GNO might want to declare it in other headers, and this would allow it to avoid duplicate declarations. (This will still require changes in the GNO headers. Currently, they declare struct timespec with different field names, although the layout is the same.)
In ORCA/Pascal's code generation, a case statement may use a jump table or a sequence of comparisons depending on whether it is considered sparse. This one was just a little too sparse to use a jump table, but changing it to use one makes it considerably faster. To force generation of a jump table, this commit adds several more explicit cases (even though they don't do anything).
If the return value is just a numeric constant or static address, it can simply be loaded right before the RTL instruction, avoiding any data movement.
This could actually be applied to a somewhat broader class of expressions, but for now it only applies to numeric or address constants, for which it is clearly correct.