For the moment, it behaves as expected with regard to token merging and stringization, but otherwise it doesn't do anything. (It can be used in attributes, but those aren't implemented yet.)
These are tokens that follow the syntax for a preprocessing number, but not for an integer or floating constant after preprocessing. They are now allowed within the preprocessing phases of the compiler. They are not legal after preprocessing, but they may be used as operands of the # and ## preprocessor operators to produce legal tokens.
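For example (a hypothetical illustration): pasting can turn a pp-number that is not a valid constant into one that is.

#define PASTE(a,b) a##b
double d = PASTE(1e, 5);   /* 1e alone is not a valid constant, but the
                              pasted token 1e5 is a valid floating constant */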
Specifically, this affects the case where a macro argument ends with the name of a function-like macro that takes 0 parameters. When that argument is initially expanded, the macro should not be expanded, even if parentheses appear later in the macro it is being passed to or in the subsequent program code. This is because the C standards specify that "The argument’s preprocessing tokens are completely macro replaced before being substituted as if they formed the rest of the preprocessing file with no other preprocessing tokens being available." (The macro may still be expanded at a later stage, but that depends on other rules that determine whether the expansion is suppressed.) The logic for this was already present for macros taking one or more arguments; this extends it to apply to function-like macros taking zero arguments as well.
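A hypothetical example of the case in question:

#define ZERO() 0
#define PASS(x) x()
int i = PASS(ZERO);   /* ZERO is not expanded while the argument is expanded,
                         even though "()" follows in PASS's replacement list;
                         the rescan of ZERO() then expands it to 0. The final
                         result is the same here, but the point at which the
                         expansion happens matters for the suppression rules. */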
I'm not sure that this makes any practical difference while cycles of mutually-referential macros still aren't handled correctly (issue #48), but if that were fixed then there would be some cases that depend on this behavior.
Previously, there were a couple of problems:
* If the parameter that was passed an empty argument appeared directly after the ##, the ## would be permanently removed from the macro record, affecting subsequent uses of the macro even if the argument was not empty.
* If the parameter that was passed an empty argument appeared between two ## operators, both would effectively be skipped, so the tokens to the left of the first ## and to the right of the second would not be combined.
This example illustrates both issues (not expected to compile; just check preprocessor output):
#pragma expand 1
#define x(a,b,c) a##b##c
x(1, ,3)
x(a,b,c)
Previously, it was not necessarily set correctly for the newly-generated token. This would result in incorrect behavior if that token was an operand to another ## operator, as in the following example:
#define x(a,b,c) a##b##c
x(1,2,3)
If a #define appears within a function, it could use the local memory pool for string allocation (via Malloc in NextToken, line 5785), which can lead to a dangling memory reference when the macro is expanded. Here is an example illustrating the problem:
void function(void) {
#define TEXT "abc"
    static struct {
        char text[sizeof(TEXT)];
    } template = { TEXT };
}
The second parameter of #pragma float is now optional, and if it is missing or invalid then the FPE slot is auto-detected by the start-up code. This is done by calling the new ~InitFloat function in the FPE version of SysFloat.
This allows valid FPE-using programs to be compiled using only #pragma float, with no changes needed to the code itself.
The slot-setting code is only generated if the slot is 1..7, and even then it can be overridden by calling setfpeslot(), so this should not cause compatibility problems for existing code.
This could occur because when FindSymbol was called to look for symbols in all spaces, it would find a tag in an inner scope before a typedef in an outer scope. The processing order has been changed to look for regular symbols (including typedefs) in any scope, and only look for tags if no regular symbol is found.
Here is an example illustrating the problem:
typedef int T;
int main(void) {
    struct T;
    T x;
}
This adds debugging code to detect null pointer dereferences, as well as pointer arithmetic on null pointers (which is also undefined behavior, and can lead to later dereferences of the resulting pointers).
Note that ORCA/Pascal can already detect null pointer dereferences as part of its more general range-checking code. This implementation for ORCA/C will report the same error as ORCA/Pascal ("Subrange exceeded"). However, it does not include any of the other forms of range checking that ORCA/Pascal does, and (unlike in ORCA/Pascal) it is controlled by a separate flag from stack overflow checking.
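For instance, code like the following (a hypothetical illustration) would be flagged at run time when the checks are enabled:

int main(void) {
    int *p = 0;
    int x = *p;       /* null pointer dereference: reported */
    int *q = p + 1;   /* pointer arithmetic on a null pointer: also reported */
    return 0;
}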
These are erroneous in situations where the expression is used for its value. For function return types, this violates a constraint (C17 6.5.2.2 p1), so a diagnostic is required. We also now diagnose this issue for identifier expressions and unary * (indirection) expressions. Those cases cause undefined behavior per C17 6.3.2.1 p2, so a diagnostic is not required, but it is nice to give one.
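A hypothetical example of the newly-diagnosed cases:

struct S;                /* incomplete type */
struct S f(void);
extern struct S s;

void g(struct S *p) {
    f();   /* constraint violation: incomplete return type (C17 6.5.2.2 p1) */
    s;     /* undefined behavior: identifier expression of incomplete type */
    *p;    /* undefined behavior: indirection yielding an incomplete type */
}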
This occurs when the constant value is out of range of the type being assigned to. This is likely indicative of an error, or of code that assumes types have larger ranges than they do in ORCA/C (e.g. 32-bit int).
This intentionally does not report cases where a value is assigned to a signed type but is within the range of the corresponding unsigned type, or vice versa. These may be done intentionally, e.g. setting an unsigned value to "-1" or setting a signed value using a hex constant with the high bit set. Also, only conversions to 8-bit or 16-bit integer types are currently checked.
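Hypothetical examples of what is and is not reported, given ORCA/C's 16-bit int:

int i = 100000;      /* reported: exceeds the range of 16-bit int */
char c = 999;        /* reported: exceeds the range of char */
unsigned u = -1;     /* not reported: within the range of the corresponding signed type */
short s = 0xFFFF;    /* not reported: within the range of unsigned short */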
I think the reason this was originally disallowed is that the old code sequence for stack repair code (in ORCA/C 2.1.0) ended with TYA. If this was followed by STA dp or STA abs, the native code peephole optimizer (prior to commit 7364e2d2d329d81) would have turned the combination into a STY instruction. That is invalid if the value in A is needed. This could come up, e.g., when assigning the return value from a function to two different variables.
This is no longer an issue, because the current code sequence for stack repair code no longer ends in TYA and is not susceptible to the same kind of invalid optimization. So it is no longer necessary to disable the native code peephole optimizer when using stack repair code (either for all calls or just varargs calls).
This is necessary both to detect errors (using unary + on non-arithmetic types) and to correctly perform the integer promotions when unary + is used (which can be detected with sizeof or _Generic).
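For example (hypothetical), the promotion is visible through sizeof:

char c = 0;
int promoted = (sizeof(+c) == sizeof(int));   /* 1: unary + promotes c to int */
/* +"abc" would now be diagnosed: unary + requires an arithmetic operand */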
This provides a more straightforward way to place the compiler in a "strict conformance" mode. This could essentially be achieved by setting several pragma options, but having a single setting is simpler. "Compatibility modes" for older standards can also be selected, although these actually continue to enable most C17 features (since they are unlikely to cause compatibility problems for older code).
It will now grow as needed to accommodate large segments, subject to the constraints of available memory. In practice, this mostly affects the size of initialized static arrays that can be used.
This also removes any limit apart from memory size on how large the object representation produced by a "compile to memory" can be, and cleans up error reporting regarding size limits.
With the addition of designated initializers, ORCA/C now supports all the major mandatory language features added between C90 and C17, apart from those made optional by C11. There are still various small areas of nonconformance and a number of missing library functions, but at this point it is reasonable for ORCA/C to report itself as being a C17 implementation.
This is currently used in a couple of places in the designated initializer code (solving the problem with #pragma expand in the last commit). It could probably be used elsewhere too, but for now it is not.
This is a minimal implementation that does not actually inline anything, but it is intended to implement the semantics defined by the C99 and later standards.
One complication is that a declaration that appears somewhere after the function body may create an external definition for a function that appeared to be an inline definition when it was defined. To support this while preserving ORCA/C's general one-pass compilation strategy, we generate code even for inline definitions, but treat them as private and add the prefix "~inline~" to the name. If they are "un-inlined" based on a later declaration, we generate a stub with external linkage that just jumps to the apparently-inline function.
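A hypothetical illustration of the situation described above:

inline int twice(int x) { return 2 * x; }   /* apparently an inline definition */

extern int twice(int x);   /* this later declaration means the definition above
                              must provide the external definition, so a stub
                              with external linkage is generated that jumps to
                              the ~inline~ version */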
This still has a few issues. A \ token cannot currently be followed by u or U, because that triggers UCN processing. We should scan through the whole possible UCN until we can confirm whether it is actually a UCN, but that would require more lookahead. Also, \ is not handled correctly in stringization (it should form escape sequences).
This implements the catch-all category for preprocessing tokens for "each non-white-space character that cannot be one of the above" (C17 section 6.4). These may appear in skipped code, or in macros or macro parameters if they are never expanded or are stringized during macro processing. The affected characters are $, @, `, and many extended characters.
It is still an error if these tokens are used in contexts where they remain present after preprocessing. If #pragma ignore bit 0 is clear, these characters are also reported as errors in skipped code or preprocessor constructs.
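Hypothetical examples of the newly-accepted uses (assuming #pragma ignore bit 0 is set):

#define AT @          /* OK as long as AT is never expanded */
#define STR(x) #x
char s[] = STR(@);    /* OK: @ is removed by stringization during preprocessing */
#if 0
$ ` @                 /* OK in skipped code */
#endif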
A function declared "inline" with an explicit "extern" storage class has the same semantics as if "inline" was omitted. (It is not an inline definition as defined in the C standards.) The "inline" specifier suggests that the function should be inlined, but it is legal to just ignore it, as we already do for "static inline" functions.
Also add a test for the inline function specifier.
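For example (hypothetical):

extern inline int cube(int x) { return x * x * x; }
/* same semantics as:  int cube(int x) { return x * x * x; }
   This is an external definition, not an inline definition, and the
   inline hint may simply be ignored. */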
This still works by "reconstructing" the string literal text, rather than just using what was in the source code. This is not what the standards specify and can result in slightly different behavior in some corner cases, but for realistic cases it is probably fine.
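For example (hypothetical):

#define STR(x) #x
const char *s = STR( 1   +   2 );   /* yields "1 + 2": the text is rebuilt from
                                       the tokens rather than copied verbatim
                                       from the source */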
According to the C standards (C17 section 6.10.3 p8), they should not be subject to macro replacement.
A similar change also applies to the "STDC" in #pragma STDC ... (but macro replacement is still performed for other pragmas, which is permitted as part of the implementation-defined behavior of #pragma).
Here is an example affected by this issue:
#define ifdef ifndef
#ifdef foobar
#error "foobar defined?"
#else
int main(void) {}
#endif
This could access arbitrary memory locations, and could theoretically cause misbehavior including falsely recognizing the token as a pragma or accessing a softswitch/IO location.
In certain error cases, tokens from subsequent lines could get treated as part of a preprocessor expression, causing subsequent code to be essentially ignored and producing strange error messages.
Here is an example (with an error) affected by this:
#pragma optimize 0 0
int main(void) {}
The scanner has been updated so that ... should always be recognized as a single token, making this workaround unnecessary. Any code that actually uses three separate . tokens is non-standard and will need to be changed.
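For example (hypothetical):

int take_args(int n, ...);      /* "..." is now always scanned as one token */
int old_form(int n, . . .);     /* no longer accepted: three separate "." tokens */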
This does not really do anything, because ORCA/C does not support multithreading, but the C11 and later standards indicate it should be allowed anyway.