*Initialization of floating-point variables from unsigned long expressions with value > LONG_MAX would give the wrong value.
*Initialization of floating-point variables from (unsigned) long long expressions would give the wrong value.
*Initialization of _Bool variables should give 0 or 1, as per the usual rules for conversion to _Bool.
*Initialization of integer variables from floating-point expressions should be allowed, applying the usual conversions.
There is a tradeoff of code size vs. speed, since a sequence of STZ instructions is faster than a call to ~ZERO but could be quite large for a big array or struct. We now use ~ZERO for initializations of over 50 bytes to avoid excessive code bloat; the exact number chosen is somewhat arbitrary.
This avoids needing to generate many intermediate code records representing the data at most 8 bytes at a time, which should reduce memory use and probably improve performance for large initialized arrays or structs.
Static initialization of arrays/structs/unions now essentially "executes" the initializer records to fill in a buffer (and keep track of relocations), then emits pcode to represent that initialized state. This supports overlapping and out-of-order initializer records, as can be produced by designated initialization.
This prohibits empty initializers ({}) for arrays of unknown size, consistent with C23 requirements. Previous versions of C did not allow empty initializers at all, but ORCA/C historically did in some cases, so this patch still allows them for structs/unions/arrays of known size.
When the expression is initially parsed, we do not necessarily know whether it is the initializer for the struct/union or for its first member. That needs to be determined based on the type. To support that, a new function is added to evaluate the expression separately from using it to initialize an object.
This is currently used in a couple places in the designated initializer code (solving the problem with #pragma expand in the last commit). It could probably be used elsewhere too, but for now it is not.
As noted previously, there is some ambiguity in the standards about how anonymous structs/unions participate in initialization. ORCA/C follows the model that they do participate as structs or unions, and designated initialization of them is implemented accordingly.
This currently has a slight issue in that extra copies of the anonymous member field name will be printed in #pragma expand output.
This generally simplifies things, and always generates individual initializer records for each explicit initialization of a bit-field (which was previously done for automatic initialization, but not static).
This should work correctly for automatic initialization, but needs corresponding code changes in GenSymbols for static initialization.
This could maybe be simplified to just fill on levels with braces, but I want to consider that after implementing designated initializers for structs and unions.
The idea (not yet implemented) is to use this to support out-of-order initialization. For automatic variables, we can just initialize the subobjects in the order that initializers appear. For static variables, we will eventually need to reorder the initializers in order, but this can be done based on their recorded displacements.
The part of the declaration within the header could be ignored on subsequent compilations using the .sym file, which could lead to errors or misbehavior.
(This also applies to headers that end in the middle of a _Static_assert(...) or segment directive.)
Commit 9cc72c8845 introduced otherch tokens but did not properly update these tables to account for them. This would cause * not to be accepted as the first character in an expression, and might also cause other problems.
This is preparatory to supporting designated initializers.
Any struct/union type with an anonymous member now forces .sym file generation to end, since we do not have a scheme for serializing this information in a .sym file. It would be possible to do so, but for now we just avoid this situation for simplicity.
This is a minimal implementation that does not actually inline anything, but it is intended to implement the semantics defined by the C99 and later standards.
One complication is that a declaration that appears somewhere after the function body may create an external definition for a function that appeared to be an inline definition when it was defined. To support this while preserving ORCA/C's general one-pass compilation strategy, we generate code even for inline definitions, but treat them as private and add the prefix "~inline~" to the name. If they are "un-inlined" based on a later declaration, we generate a stub with external linkage that just jumps to the apparently-inline function.
This still has a few issues. A \ token may not be followed by u or U (because this triggers UCN processing). We should scan through the whole possible UCN until we can confirm whether it is actually a UCN, but that would require more lookahead. Also, \ is not handled correctly in stringization (it should form escape sequences).
This implements the catch-all category for preprocessing tokens for "each non-white-space character that cannot be one of the above" (C17 section 6.4). These may appear in skipped code, or in macros or macro parameters if they are never expanded or are stringized during macro processing. The affected characters are $, @, `, and many extended characters.
It is still an error if these tokens are used in contexts where they remain present after preprocessing. If #pragma ignore bit 0 is clear, these characters are also reported as errors in skipped code or preprocessor constructs.
If the extern declaration refers to a global variable/function for which a declaration is already visible, the inner declaration should have the composite type (and it is an error if the types are incompatible).
This affects programs like the following:
static char a[60] = {5};
int main(void) {
extern char a[];
return sizeof(a)+a[0]; /* should return 65 */
}