205 Commits

Author SHA1 Message Date
Stephen Heumann
0fe9424373 Disable recognition of trigraphs in C23 mode. 2024-08-30 13:43:59 -05:00
Stephen Heumann
b27fc52dc4 Recognize :: as a punctuator token (C23).
For the moment, it behaves as expected with regard to token merging and stringization, but otherwise it doesn't do anything. (It can be used in attributes, but those aren't implemented yet.)
2024-08-29 22:04:53 -05:00
Stephen Heumann
1e415aadd7 Parse _DecimalN keywords as (unsupported) type specifiers.
This gives a clearer error message when they are used and helps to keep the subsequent parsing on track.
2024-08-29 13:24:08 -05:00
Stephen Heumann
9ecbd42c1a Recognize new C23 keywords as keywords (in C23 mode).
They are recognized as keywords, but they currently don't do anything.
2024-08-28 21:58:59 -05:00
Stephen Heumann
f59c2cf93d Add a C23 standard option to the compiler.
The default is still c17compat.
2024-08-27 22:06:55 -05:00
Stephen Heumann
a9e0b13e1c Require at least one hex digit in \x escape sequences.
This is required by the grammar given in the C standards.
2024-07-31 17:05:57 -05:00
Stephen Heumann
69320cd4d8 Detect some erroneous numeric constants that were being allowed.
These include tokens like 0x, 0b, and 1.2LL.
2024-04-23 22:07:19 -05:00
Stephen Heumann
8278f7865a Support unconvertible preprocessing numbers.
These are tokens that follow the syntax for a preprocessing number, but not for an integer or floating constant after preprocessing. They are now allowed within the preprocessing phases of the compiler. They are not legal after preprocessing, but they may be used as operands of the # and ## preprocessor operators to produce legal tokens.
2024-04-23 21:39:14 -05:00
Stephen Heumann
a545685ab4 Use more correct logic for expanding macros in macro parameters.
Specifically, this affects the case where a macro argument ends with the name of a function-like macro that takes 0 parameters. When that argument is initially expanded, the macro should not be expanded, even if there are parentheses within the macro that it is being passed to or the subsequent program code. This is the case because the C standards specify that "The argument’s preprocessing tokens are completely macro replaced before being substituted as if they formed the rest of the preprocessing file with no other preprocessing tokens being available." (The macro may still be expanded at a later stage, but that depends on other rules that determine whether the expansion is suppressed.) The logic for this was already present for the case of macros taking one or more argument; this extends it to apply to function-like macros taking zero arguments as well.

I'm not sure that this makes any practical difference while cycles of mutually-referential macros still aren't handled correctly (issue #48), but if that were fixed then there would be some cases that depend on this behavior.
2024-02-26 22:31:46 -06:00
Stephen Heumann
84fdb5c975 Fix handling of empty macro arguments as ## operands.
Previously, there were a couple problems:

*If the parameter that was passed an empty argument appeared directly after the ##, the ## would permanently be removed from the macro record, affecting subsequent uses of the macro even if the argument was not empty.
*If the parameter that was passed an empty argument appeared between two ## operators, both would effectively be skipped, so the tokens to the left of the first ## and to the right of the second would not be combined.

This example illustrates both issues (not expected to compile; just check preprocessor output):

#pragma expand 1
#define x(a,b,c) a##b##c
x(1, ,3)
x(a,b,c)
2024-02-22 21:55:46 -06:00
Stephen Heumann
d1847d40be Set numString properly for numeric tokens generated by ##.
Previously, it was not necessarily set correctly for the newly-generated token. This would result in incorrect behavior if that token was an operand to another ## operator, as in the following example:

#define x(a,b,c) a##b##c
x(1,2,3)
2024-02-22 21:42:05 -06:00
Kelvin Sherlock
586229e6eb #define should always use the global pool....
if a #define is within a function, it could use the local memory pool for string allocation (via Malloc in NextToken, line 5785) which can lead to a dangling memory reference when the macro is expanded.

void function(void) {

#define TEXT "abc"

static struct {
	char text[sizeof(TEXT)];
} template = { TEXT };

}
2024-01-15 22:05:32 -06:00
Stephen Heumann
9a56a50f5f Support FPE card auto-detection.
The second parameter of #pragma float is now optional, and if it missing or invalid then the FPE slot is auto-detected by the start-up code. This is done by calling the new ~InitFloat function in the FPE version of SysFloat.
2023-06-26 18:33:54 -05:00
Stephen Heumann
0021fd81bc #pragma float: Generate code in the .root file to set the FPE slot.
This allows valid FPE-using programs to be compiled using only #pragma float, with no changes needed to the code itself.

The slot-setting code is only generated if the slot is 1..7, and even then it can be overridden by calling setfpeslot(), so this should not cause compatibility problems for existing code.
2023-06-17 18:13:31 -05:00
Stephen Heumann
c2262929e9 Fix handling of #pragma float.
This was not getting recognized properly, because float is a keyword rather than an identifier.
2023-04-30 21:47:19 -05:00
Stephen Heumann
9d5360e844 Comment out some unused error messages. 2023-04-30 21:38:34 -05:00
Stephen Heumann
4c903a5331 Remove a few unused variables. 2023-04-04 18:11:41 -05:00
Stephen Heumann
cc36e9929f Remove some unused variables. 2023-03-20 11:12:48 -05:00
Stephen Heumann
137188ff4f Comment out an obsolete error message. 2023-03-17 19:47:30 -05:00
Stephen Heumann
ea056f1fbb Avoid listing the first line twice when a pre-include file is used. 2023-03-15 20:43:43 -05:00
Stephen Heumann
30a04d42c5 Require preprocessor conditionals to be balanced in each include file.
This is required by the standard syntax for a preprocessing file (C17 6.10), which must be a "group" (or empty).
2023-03-07 19:00:13 -06:00
Stephen Heumann
3406dbd3ae Prevent a tag declared in an inner scope from shadowing a typedef.
This could occur because when FindSymbol was called to look for symbols in all spaces, it would find a tag in an inner scope before a typedef in an outer scope. The processing order has been changed to look for regular symbols (including typedefs) in any scope, and only look for tags if no regular symbol is found.

Here is an example illustrating the problem:

typedef int T;
int main(void) {
        struct T;
        T x;
}
2023-03-06 21:38:05 -06:00
Stephen Heumann
85890e0b6b Give an error if assembly code tries to use direct page addressing for a local variable that is out of range.
This could previously cause bad code to be produced with no error reported.
2023-03-04 21:06:07 -06:00
Stephen Heumann
a6ef872513 Add debugging option to detect illegal use of null pointers.
This adds debugging code to detect null pointer dereferences, as well as pointer arithmetic on null pointers (which is also undefined behavior, and can lead to later dereferences of the resulting pointers).

Note that ORCA/Pascal can already detect null pointer dereferences as part of its more general range-checking code. This implementation for ORCA/C will report the same error as ORCA/Pascal ("Subrange exceeded"). However, it does not include any of the other forms of range checking that ORCA/Pascal does, and (unlike in ORCA/Pascal) it is controlled by a separate flag from stack overflow checking.
2023-02-12 18:56:02 -06:00
Stephen Heumann
03fc7a43b9 Give an error for expressions with incomplete struct/union types.
These are erroneous, in situations where the expression is used for its value. For function return types, this violates a constraint (C17 6.5.2.2 p1), so a diagnostic is required. We also now diagnose this issue for identifier expressions or unary * (indirection) expressions. These cases cause undefined behavior per C17 6.3.2.1 p2, so a diagnostic is not required, but it is nice to give one.
2023-01-09 21:58:53 -06:00
Stephen Heumann
61a2cd1e5e Consistently report "compiler error" for unrecognized error codes.
The old approach of calling Error while in the middle of writing error messages did not work reliably.
2023-01-09 18:46:33 -06:00
Stephen Heumann
245dd0a3f4 Add lint check for implicit conversions that change a constant's value.
This occurs when the constant value is out of range of the type being assigned to. This is likely indicative of an error, or of code that assumes types have larger ranges than they do in ORCA/C (e.g. 32-bit int).

This intentionally does not report cases where a value is assigned to a signed type but is within the range of the corresponding unsigned type, or vice versa. These may be done intentionally, e.g. setting an unsigned value to "-1" or setting a signed value using a hex constant with the high bit set. Also, only conversions to 8-bit or 16-bit integer types are currently checked.
2023-01-03 18:57:32 -06:00
Stephen Heumann
fe62f70d51 Add lint option to check for unused variables. 2022-12-12 21:47:32 -06:00
Stephen Heumann
32975b720f Allow native code peephole opt to be used when stack repair is enabled.
I think the reason this was originally disallowed is that the old code sequence for stack repair code (in ORCA/C 2.1.0) ended with TYA. If this was followed by STA dp or STA abs, the native code peephole optimizer (prior to commit 7364e2d2d329d81) would have turned the combination into a STY instruction. That is invalid if the value in A is needed. This could come up, e.g., when assigning the return value from a function to two different variables.

This is no longer an issue, because the current code sequence for stack repair code no longer ends in TYA and is not susceptible to the same kind of invalid optimization. So it is no longer necessary to disable the native code peephole optimizer when using stack repair code (either for all calls or just varargs calls).
2022-12-10 20:34:00 -06:00
Stephen Heumann
e71fe5d785 Treat unary + as an actual operator, not a no-op.
This is necessary both to detect errors (using unary + on non-arithmetic types) and to correctly perform the integer promotions when unary + is used (which can be detected with sizeof or _Generic).
2022-12-09 19:03:38 -06:00
Stephen Heumann
bb1bd176f4 Add a command-line option to select the C standard to use.
This provides a more straightforward way to place the compiler in a "strict conformance" mode. This could essentially be achieved by setting several pragma options, but having a single setting is simpler. "Compatibility modes" for older standards can also be selected, although these actually continue to enable most C17 features (since they are unlikely to cause compatibility problems for older code).
2022-12-07 21:35:15 -06:00
Stephen Heumann
6857913daa Make the object buffer dynamically resizable.
It will now grow as needed to accommodate large segments, subject to the constraints of available memory. In practice, this mostly affects the size of initialized static arrays that can be used.

This also removes any limit apart from memory size on how large the object representation produced by a "compile to memory" can be, and cleans up error reporting regarding size limits.
2022-12-06 21:49:20 -06:00
Stephen Heumann
c06d78bb5e Add __STDC_VERSION__ macro.
With the addition of designated initializers, ORCA/C now supports all the major mandatory language features added between C90 and C17, apart from those made optional by C11. There are still various small areas of nonconformance and a number of missing library functions, but at this point it is reasonable for ORCA/C to report itself as being a C17 implementation.
2022-12-04 22:25:02 -06:00
Stephen Heumann
dc305a86b2 Add flag to suppress printing of put-back tokens with #pragma expand.
This is currently used in a couple places in the designated initializer code (solving the problem with #pragma expand in the last commit). It could probably be used elsewhere too, but for now it is not.
2022-11-28 21:22:56 -06:00
Stephen Heumann
b6d3dfb075 Designated initializers for arrays, part 1.
This can parse designated initializers for arrays, but does not create proper initializer records for them.
2022-11-26 15:22:58 -06:00
Stephen Heumann
3f450bdb80 Support "inline" function definitions without static or extern.
This is a minimal implementation that does not actually inline anything, but it is intended to implement the semantics defined by the C99 and later standards.

One complication is that a declaration that appears somewhere after the function body may create an external definition for a function that appeared to be an inline definition when it was defined. To support this while preserving ORCA/C's general one-pass compilation strategy, we generate code even for inline definitions, but treat them as private and add the prefix "~inline~" to the name. If they are "un-inlined" based on a later declaration, we generate a stub with external linkage that just jumps to the apparently-inline function.
2022-11-19 23:04:22 -06:00
Stephen Heumann
ab368d442a Allow \ as an "other character" preprocessing token.
This still has a few issues. A \ token may not be followed by u or U (because this triggers UCN processing). We should scan through the whole possible UCN until we can confirm whether it is actually a UCN, but that would require more lookahead. Also, \ is not handled correctly in stringization (it should form escape sequences).
2022-11-08 20:46:48 -06:00
Stephen Heumann
9cc72c8845 Support "other character" preprocessing tokens.
This implements the catch-all category for preprocessing tokens for "each non-white-space character that cannot be one of the above" (C17 section 6.4). These may appear in skipped code, or in macros or macro parameters if they are never expanded or are stringized during macro processing. The affected characters are $, @, `, and many extended characters.

It is still an error if these tokens are used in contexts where they remain present after preprocessing. If #pragma ignore bit 0 is clear, these characters are also reported as errors in skipped code or preprocessor constructs.
2022-11-08 18:58:50 -06:00
Stephen Heumann
f31b5ea1e6 Allow "extern inline" functions.
A function declared "inline" with an explicit "extern" storage class has the same semantics as if "inline" was omitted. (It is not an inline definition as defined in the C standards.) The "inline" specifier suggests that the function should be inlined, but it is legal to just ignore it, as we already do for "static inline" functions.

Also add a test for the inline function specifier.
2022-10-29 19:43:57 -05:00
Stephen Heumann
f54d0e1854 Require that main have no function specifiers.
This enforces a constraint in the C standards (for a hosted environment).
2022-10-29 18:36:51 -05:00
Stephen Heumann
e5428b21d2 Do not skip over the character after _Pragma(...). 2022-10-29 15:55:44 -05:00
Stephen Heumann
4702df9aac Support Unicode strings and some escape sequences in _Pragma.
This still works by "reconstructing" the string literal text, rather than just using what was in the source code. This is not what the standards specify and can result in slightly different behavior in some corner cases, but for realistic cases it is probably fine.
2022-10-25 22:47:22 -05:00
Stephen Heumann
e63d827049 Do not do macro expansion on preprocessor directive names.
According to the C standards (C17 section 6.10.3 p8), they should not be subject to macro replacement.

A similar change also applies to the "STDC" in #pragma STDC ... (but we still allow macros for other pragmas, which is allowed as part of the implementation-defined behavior of #pragma).

Here is an example affected by this issue:

#define ifdef ifndef
#ifdef foobar
#error "foobar defined?"
#else
int main(void) {}
#endif
2022-10-25 22:40:20 -05:00
Stephen Heumann
e0b27db652 Do not try to interpret non-identifier tokens as pragma names.
This could access arbitrary memory locations, and could theoretically cause misbehavior including falsely recognizing the token as a pragma or accessing a softswitch/IO location.
2022-10-25 22:26:30 -05:00
Stephen Heumann
81353a9f8a Always interpret the digit sequence in #line as decimal.
This is what the standards call for.
2022-10-23 13:47:59 -05:00
Stephen Heumann
65ec29ee3e Use 32-bit representation for line numbers.
C99 and later specify that line numbers set via #line can be up to 2147483647, so they need to be represented as (at least) a 32-bit value.
2022-10-22 21:46:12 -05:00
Stephen Heumann
760c932fea Initial implementation of _Pragma (C99).
This works for typical cases, but does not yet handle Unicode strings, very long strings, or certain escape sequences.
2022-10-22 17:08:54 -05:00
Stephen Heumann
946c6c1d55 Always end preprocessor expression processing at end of line.
In certain error cases, tokens from subsequent lines could get treated as part of a preprocessor expression, causing subsequent code to be essentially ignored and producing strange error messages.

Here is an example (with an error) affected by this:

#pragma optimize 0 0
int main(void) {}
2022-10-21 18:51:53 -05:00
Stephen Heumann
bdf212ec6b Remove support for separate . . . as equivalent to a ... token.
The scanner has been updated so that ... should always get recognized as a single token, so this is no longer necessary as a workaround. Any code that actually uses separate . . .  is non-standard and will need to be changed.
2022-10-19 18:14:14 -05:00
Stephen Heumann
6d8ca42734 Parse the _Thread_local storage-class specifier.
This does not really do anything, because ORCA/C does not support multithreading, but the C11 and later standards indicate it should be allowed anyway.
2022-10-18 21:01:26 -05:00