This generally simplifies things, and always generates individual initializer records for each explicit initialization of a bit-field (which was previously done for automatic initialization, but not static).
This should work correctly for automatic initialization, but needs corresponding code changes in GenSymbols for static initialization.
This could maybe be simplified to just fill on levels with braces, but I want to consider that after implementing designated initializers for structs and unions.
The idea (not yet implemented) is to use this to support out-of-order initialization. For automatic variables, we can just initialize the subobjects in the order that initializers appear. For static variables, we will eventually need to reorder the initializers in order, but this can be done based on their recorded displacements.
The part of the declaration within the header could be ignored on subsequent compilations using the .sym file, which could lead to errors or misbehavior.
(This also applies to headers that end in the middle of a _Static_assert(...) or segment directive.)
Commit 9cc72c88452bf6b75 introduced otherch tokens but did not properly update these tables to account for them. This would cause * not to be accepted as the first character in an expression, and might also cause other problems.
This is preparatory to supporting designated initializers.
Any struct/union type with an anonymous member now forces .sym file generation to end, since we do not have a scheme for serializing this information in a .sym file. It would be possible to do so, but for now we just avoid this situation for simplicity.
This is a minimal implementation that does not actually inline anything, but it is intended to implement the semantics defined by the C99 and later standards.
One complication is that a declaration that appears somewhere after the function body may create an external definition for a function that appeared to be an inline definition when it was defined. To support this while preserving ORCA/C's general one-pass compilation strategy, we generate code even for inline definitions, but treat them as private and add the prefix "~inline~" to the name. If they are "un-inlined" based on a later declaration, we generate a stub with external linkage that just jumps to the apparently-inline function.
This still has a few issues. A \ token may not be followed by u or U (because this triggers UCN processing). We should scan through the whole possible UCN until we can confirm whether it is actually a UCN, but that would require more lookahead. Also, \ is not handled correctly in stringization (it should form escape sequences).
This implements the catch-all category for preprocessing tokens for "each non-white-space character that cannot be one of the above" (C17 section 6.4). These may appear in skipped code, or in macros or macro parameters if they are never expanded or are stringized during macro processing. The affected characters are $, @, `, and many extended characters.
It is still an error if these tokens are used in contexts where they remain present after preprocessing. If #pragma ignore bit 0 is clear, these characters are also reported as errors in skipped code or preprocessor constructs.
If the extern declaration refers to a global variable/function for which a declaration is already visible, the inner declaration should have the composite type (and it is an error if the types are incompatible).
This affects programs like the following:
static char a[60] = {5};
int main(void) {
extern char a[];
return sizeof(a)+a[0]; /* should return 65 */
}
Function declarations within a block are now entered within its symbol table rather than moved to the global one. Several error checks are also added or tightened.
This fixes at least one bug: if a function declared within a block had the same name as a variable in an outer scope, the symbol table entry for that variable could be corrupted, leading to spurious errors or incorrect code generation. This example program illustrates the problem:
/* This should compile without errors and return 2 */
int f(void) {return 1;}
int g(void) {return 2;}
int main(void) {
int (*f)(void) = g;
{
int f(void);
}
f = g;
return f();
}
Errors now detected include:
*Duplicate declarations of a static variable within a block (with the second one initialized)
*Duplicate declarations of the same variable as static and non-static
*Declaration of the same identifier as a typedef and a variable (at file scope)
This detects errors in the following cases that were previously missed:
* A function declaration and definition being part of the same overall declaration, e.g.:
void f(void), g(void) {}
* A function declaration (not definition) with no declaration specifiers, e.g.:
f(void);
(Function definitions with no declaration specifiers continue to be accepted by default, consistent with C90 rules.)
Previously, it generally just used the later type (except for function types where only the earlier one included a prototype). One effect of this is that if a global array is first declared with a size and then redeclared without one, the size information is lost, causing the proper space not to be allocated.
See C17 section 6.2.7 p4.
Here is an example affected by the array issue (dump the object file to see the size allocated):
int foo[50];
int foo[];
This seemed to be aimed at supporting lazy allocation of symbol tables. That could be a useful optimization, but the code that existed was incomplete and did not do anything useful. That or similar code could be reintroduced as part of a full implementation of lazy allocation, if it is ever done.
A function declared "inline" with an explicit "extern" storage class has the same semantics as if "inline" was omitted. (It is not an inline definition as defined in the C standards.) The "inline" specifier suggests that the function should be inlined, but it is legal to just ignore it, as we already do for "static inline" functions.
Also add a test for the inline function specifier.
This still works by "reconstructing" the string literal text, rather than just using what was in the source code. This is not what the standards specify and can result in slightly different behavior in some corner cases, but for realistic cases it is probably fine.
According to the C standards (C17 section 6.10.3 p8), they should not be subject to macro replacement.
A similar change also applies to the "STDC" in #pragma STDC ... (but we still allow macros for other pragmas, which is allowed as part of the implementation-defined behavior of #pragma).
Here is an example affected by this issue:
#define ifdef ifndef
#ifdef foobar
#error "foobar defined?"
#else
int main(void) {}
#endif
This could access arbitrary memory locations, and could theoretically cause misbehavior including falsely recognizing the token as a pragma or accessing a softswitch/IO location.
In certain error cases, tokens from subsequent lines could get treated as part of a preprocessor expression, causing subsequent code to be essentially ignored and producing strange error messages.
Here is an example (with an error) affected by this:
#pragma optimize 0 0
int main(void) {}
The scanner has been updated so that ... should always get recognized as a single token, so this is no longer necessary as a workaround. Any code that actually uses separate . . . is non-standard and will need to be changed.