The basic approach is to generate a single expression tree containing the code for the initialization plus the reference to the compound literal (or its address). The various subexpressions are joined together with pc_bno pcodes, similar to the code generated for the comma operator. The initializer expressions are placed in a balanced binary tree, so that it is not excessively deep.
Note: Common subexpression elimination has poor performance for very large trees. This is not specific to compound literals, but compound literals for relatively large arrays can run into this issue. It will eventually complete and generate a correct program, but it may be quite slow. To avoid this, turn off CSE.
This allows those tokens (asm, comp, extended, pascal, and segment) to be used as identifiers, consistent with the C standards.
A new pragma (#pragma extensions) is introduced to control this. It might also be used for other things in the future.
The string representation of macro tokens is needed for some preprocessor operations, but we get this in other ways (e.g. based on tokenStart/tokenEnd).
This could happen if a header was saved in the sym file, but the sym file data was not actually used because the source code in the main file did not match what was saved.
Previously, these might or might not be saved (based on the contents of uninitialized memory), but in many cases they were. This was unnecessary, since these macros are automatically defined when the scanner is initialized. Reading them from the sym file could result in duplicate copies of them in the macro list. This is usually harmless, but might result in #undefs of macros from the command line not working properly.
This would occur if the macro had already been saved in the sym file and the #undef occurred before a subsequent #include that was also recorded in the sym file. The solution is simply to terminate sym file generation if an #undef of an already-saved macro is encountered.
Here is an example showing the problem:
test.c:
#include "test1.h"
#undef x
#include "test2.h"
int main(void) {
#ifdef x
return x;
#else
return y;
#endif
}
test1.h:
#define x 27
test2.h:
#define y 6
Macros and include paths from the cc= parameters may be included in the symbol file, so incorrect behavior could result if the symbol file was used for a later compilation with different cc= parameters.
There were a couple issues that could occur with #pragma keep and sym files:
*If a source file used #pragma keep but it was overridden by KEEP= on the command line or {KeepName} in the shell, then the overriding keep name would be saved to the sym file. It would therefore be applied to subsequent compilations even if it was no longer specified in the command line or shell variable.
*If a source file used #pragma keep, that keep name would be recorded in the sym file. On subsequent compilations, it would always be used, overriding any keep name specified by the command line or shell, contrary to the usual rule that the name on the command line takes priority.
With this patch, the keep name recorded in the sym file (if any) should always be the one specified by #pragma keep, but it can be overridden as usual.
There were several existing optimizations that could change behavior in ways that violated the IEEE standard with regard to infinities, NaNs, or signed zeros. They are now gated behind a new #pragma optimize flag. This change allows intermediate code peephole optimization and common subexpression elimination to be used while maintaining IEEE conformance, but also keeps the rule-breaking optimizations available if desired.
See section F.9.2 of recent C standards for a discussion of how these optimizations violate IEEE rules.
We previously ignored this, but it is a constraint violation under the C standards, so it should be reported as an error.
GCC and Clang allow this as an extension, as we were effectively doing previously. We will follow the standards for now, but if there was demand for such an extension in ORCA/C, it could be re-introduced subject to a #pragma ignore flag.
These are needed to correctly distinguish pointer types in _Generic. They should also be used for type compatibility checks in other contexts, but currently are not.
This also fixes a couple small problems related to type qualifiers:
*restrict was not allowed to appear after * in type-names
*volatile status was not properly recorded in sym files
Here is an example of using _Generic to distinguish pointer types based on the qualifiers of the pointed-to type:
#include <stdio.h>
#define f(e) _Generic((e),\
int * restrict *: 1,\
int * volatile const *: 2,\
int **: 3,\
default: 0)
#define g(e) _Generic((e),\
int *: 1,\
const int *: 2,\
volatile int *: 3,\
default: 0)
int main(void) {
int * restrict * p1;
int * volatile const * p2;
int * const * p3;
// should print "1 2 0 1"
printf("%i %i %i %i\n", f(p1), f(p2), f(p3), f((int * restrict *)0));
int *q1;
const int *q2;
volatile int *q3;
const volatile int *q4;
// should print "1 2 3 0"
printf("%i %i %i %i\n", g(q1), g(q2), g(q3), g(q4));
}
Here is an example of a problem resulting from volatile not being recorded in sym files (if a sym file was present, the read of x was lifted out of the loop):
#pragma optimize -1
static volatile int x;
#include <stdio.h>
int main(void) {
int y;
for (unsigned i = 0; i < 100; i++) {
y = x*2 + 7;
}
}
These were previously treated as having type int. This resulted in incorrect results from sizeof, and would also be a problem for _Generic if it was implemented.
Note that this creates a token kind of "charconst", but this is not the kind for character constants in the source code. Those have type int, so their kind is intconst. The new kinds of "tokens" are created only through casts of constant expressions.
The FENV_ACCESS pragma is now implemented. It causes floating-point operations to be evaluated at run time to the maximum extent possible, so that they can affect and be affected by the floating-point environment. It also disables optimizations that might evaluate floating-point operations at compile time or move them around calls to the <fenv.h> functions.
The FP_CONTRACT and CX_LIMITED_RANGE pragmas are also recognized, but they have no effect. (FP_CONTRACT relates to "contracting" floating-point expressions in a way that ORCA/C does not do, and CX_LIMITED_RANGE relates to complex arithmetic, which ORCA/C does not support.)
This means that floating-point constants can now have the range and precision of the extended type (aka long double), and floating-point constant expressions evaluated within the compiler also have that same range and precision (matching expressions evaluated at run time). This new behavior is intended to match the behavior specified in the C99 and later standards for FLT_EVAL_METHOD 2.
This fixes the previous problem where long double constants and constant expressions of type long double were not represented and evaluated with the full range and precision that they should be. It also gives extra range and precision to constants and constant expressions of type double or float. This may have pluses and minuses, but at any rate it is consistent with the existing behavior for expressions evaluated at run time, and with one of the possible models of floating point evaluation specified in the C standards.
Currently, the actual values they can have are still constrained to the 32-bit range. Also, there are some bits of functionality (e.g. for initializers) that are not implemented yet.
This is contrary to the C standards, but ORCA/C historically permitted it (as do some other compilers), and I think there is a fair amount of existing code that relies on it.
In the #pragma lint line, the integer indicating the checks to perform can now optionally be followed by a semicolon and another integer. If these are present and the second integer is 0, then the lint checks will be performed, but will be treated as warnings rather than errors, so that they do not cause compilation to fail.
Specifically, the following six punctuator tokens are now supported:
<: :> <% %> %: %:%:
These behave the same as the existing tokens [, ], {, }, #, and ## (respectively), apart from their spelling.
This can be useful when the full ASCII character set cannot easily be displayed or input (e.g. on the IIgs text screen with certain language settings).
Specifically, the following will now be tokenized as keywords:
_Alignas
_Alignof
_Atomic
_Bool
_Complex
_Generic
_Imaginary
_Noreturn
_Static_assert
_Thread_local
restrict
('inline' was also added as a standard keyword in C99, but ORCA/C already treated it as such.)
The parser currently has no support for any of these keywords, so for now errors will still be generated if they are used, but this is a first step toward adding support for them.
Bumping the version forces regeneration of any sym files created by old ORCA/C versions with the bug that was just fixed.
A couple sanity checks are also introduced when reading sym files, including one that would have caught that bug.