Commit Graph

56 Commits

Author SHA1 Message Date
Stephen Heumann 8278f7865a Support unconvertible preprocessing numbers.
These are tokens that follow the syntax for a preprocessing number, but not for an integer or floating constant after preprocessing. They are now allowed within the preprocessing phases of the compiler. They are not legal after preprocessing, but they may be used as operands of the # and ## preprocessor operators to produce legal tokens.
2024-04-23 21:39:14 -05:00
Stephen Heumann 0aee669746 Update version number for ORCA/C 2.2.1 development. 2024-01-15 21:47:00 -06:00
Stephen Heumann 0021fd81bc #pragma float: Generate code in the .root file to set the FPE slot.
This allows valid FPE-using programs to be compiled using only #pragma float, with no changes needed to the code itself.

The slot-setting code is only generated if the slot is 1..7, and even then it can be overridden by calling setfpeslot(), so this should not cause compatibility problems for existing code.
2023-06-17 18:13:31 -05:00
Stephen Heumann 5c96042423 Update ORCA/C version number to 2.2.0 B7.
Also tweak documentation wording in a couple places.
2023-04-06 18:53:34 -05:00
Stephen Heumann be291b2423 Show "Including ..." lines for all headers when using a sym file.
Previously, only the last header from a group of headers represented together would be listed.
2023-03-20 17:22:13 -05:00
Stephen Heumann 3ac55a64bf Use an improved hash function for symbol tables.
The hash algorithm has been modified to include a rotate at each step. This should improve the quality of hashes and reduce the number of collisions. However, probably the more important change for performance is to do the modulo computation by repeated subtraction rather than by calling a slow library function.
2023-03-06 21:54:14 -06:00
Stephen Heumann a6ef872513 Add debugging option to detect illegal use of null pointers.
This adds debugging code to detect null pointer dereferences, as well as pointer arithmetic on null pointers (which is also undefined behavior, and can lead to later dereferences of the resulting pointers).

Note that ORCA/Pascal can already detect null pointer dereferences as part of its more general range-checking code. This implementation for ORCA/C will report the same error as ORCA/Pascal ("Subrange exceeded"). However, it does not include any of the other forms of range checking that ORCA/Pascal does, and (unlike in ORCA/Pascal) it is controlled by a separate flag from stack overflow checking.
2023-02-12 18:56:02 -06:00
Stephen Heumann fe62f70d51 Add lint option to check for unused variables. 2022-12-12 21:47:32 -06:00
Stephen Heumann ecca7a7737 Never make the segment in the root file dynamic.
This would previously happen if a segment directive with "dynamic" appeared before the first function in the program. That would cause the resulting program not to work, because the root segment needs to be a static segment at the start of the program, but if it is dynamic it would come after a jump table and a static segment of library code.

The root segments are also configured to refer to main or the NDA/CDA entry points using LEXPR records, so that they can be in dynamic segments (not that they necessarily should be). That change is intentionally not done for CDEV/XCMD/NBA, because they use code resources, which do not support dynamic segments, so it is better to force a linker error in these cases.
2022-12-11 14:46:38 -06:00
Stephen Heumann e71fe5d785 Treat unary + as an actual operator, not a no-op.
This is necessary both to detect errors (using unary + on non-arithmetic types) and to correctly perform the integer promotions when unary + is used (which can be detected with sizeof or _Generic).
2022-12-09 19:03:38 -06:00
Stephen Heumann c06d78bb5e Add __STDC_VERSION__ macro.
With the addition of designated initializers, ORCA/C now supports all the major mandatory language features added between C90 and C17, apart from those made optional by C11. There are still various small areas of nonconformance and a number of missing library functions, but at this point it is reasonable for ORCA/C to report itself as being a C17 implementation.
2022-12-04 22:25:02 -06:00
Stephen Heumann 5500833180 Record which anon struct/union an anonymous member field came from.
This is preparatory to supporting designated initializers.

Any struct/union type with an anonymous member now forces .sym file generation to end, since we do not have a scheme for serializing this information in a .sym file. It would be possible to do so, but for now we just avoid this situation for simplicity.
2022-11-25 22:32:59 -06:00
Stephen Heumann 3f450bdb80 Support "inline" function definitions without static or extern.
This is a minimal implementation that does not actually inline anything, but it is intended to implement the semantics defined by the C99 and later standards.

One complication is that a declaration that appears somewhere after the function body may create an external definition for a function that appeared to be an inline definition when it was defined. To support this while preserving ORCA/C's general one-pass compilation strategy, we generate code even for inline definitions, but treat them as private and add the prefix "~inline~" to the name. If they are "un-inlined" based on a later declaration, we generate a stub with external linkage that just jumps to the apparently-inline function.
2022-11-19 23:04:22 -06:00
Stephen Heumann 9cc72c8845 Support "other character" preprocessing tokens.
This implements the catch-all category for preprocessing tokens for "each non-white-space character that cannot be one of the above" (C17 section 6.4). These may appear in skipped code, or in macros or macro parameters if they are never expanded or are stringized during macro processing. The affected characters are $, @, `, and many extended characters.

It is still an error if these tokens are used in contexts where they remain present after preprocessing. If #pragma ignore bit 0 is clear, these characters are also reported as errors in skipped code or preprocessor constructs.
2022-11-08 18:58:50 -06:00
Stephen Heumann 4702df9aac Support Unicode strings and some escape sequences in _Pragma.
This still works by "reconstructing" the string literal text, rather than just using what was in the source code. This is not what the standards specify and can result in slightly different behavior in some corner cases, but for realistic cases it is probably fine.
2022-10-25 22:47:22 -05:00
Stephen Heumann e3a3548443 Fix line numbering via #line when using a .sym file.
The line numbering would be off by one in this case.
2022-10-22 21:56:16 -05:00
Stephen Heumann 65ec29ee3e Use 32-bit representation for line numbers.
C99 and later specify that line numbers set via #line can be up to 2147483647, so they need to be represented as (at least) a 32-bit value.
2022-10-22 21:46:12 -05:00
Stephen Heumann 99e268e3b9 Implement support for anonymous structures and unions (C11).
Note that this implementation allows anonymous structures and unions to participate in initialization. That is, you can have a braced initializer list corresponding to an anonymous structure or union. Also, anonymous structures within unions follow the initialization rules for structures (and vice versa).

I think the better interpretation of the standard text is that anonymous structures and unions cannot participate in initialization as such, and instead their members are treated as members of the containing structure or union for purposes of initialization. However, all other compilers I am aware of allow anonymous structures and unions to participate in initialization, so I have implemented it that way too.
2022-10-16 18:44:19 -05:00
Stephen Heumann 4fe9c90942 Parse ... as a single punctuator token.
This accords with its definition in the C standards. For the time being, the old form of three separate tokens is still accepted too, because the ... token may not be scanned correctly in the obscure case where there is a line continuation between the second and third dots.

One observable effect of this is that there are no longer spaces between the dots in #pragma expand output.
2022-10-10 18:06:01 -05:00
Stephen Heumann 05ecf5eef3 Add option to use the declared type for float/double/comp params.
This differs from the usual ORCA/C behavior of treating all floating-point parameters as extended. With the option enabled, they will still be passed in the extended format, but will be converted to their declared type at the start of the function. This is needed for strict standards conformance, because you should be able to take the address of a parameter and get a usable pointer to its declared type. The difference in types can also affect the behavior of _Generic expressions.

The implementation of this is based on ORCA/Pascal, which already did the same thing (unconditionally) with real/double/comp parameters.
2022-09-18 21:16:46 -05:00
Stephen Heumann 2f75f47140 Update ORCA/C version number to 2.2.0 B6. 2022-07-19 20:40:52 -05:00
Stephen Heumann 3c2b492618 Add support for compound literals within functions.
The basic approach is to generate a single expression tree containing the code for the initialization plus the reference to the compound literal (or its address). The various subexpressions are joined together with pc_bno pcodes, similar to the code generated for the comma operator. The initializer expressions are placed in a balanced binary tree, so that it is not excessively deep.

Note: Common subexpression elimination has poor performance for very large trees. This is not specific to compound literals, but compound literals for relatively large arrays can run into this issue. It will eventually complete and generate a correct program, but it may be quite slow. To avoid this, turn off CSE.
2022-06-08 21:34:12 -05:00
Stephen Heumann a1d57c4db3 Allow ORCA/C-specific keywords to be disabled via a new pragma.
This allows those tokens (asm, comp, extended, pascal, and segment) to be used as identifiers, consistent with the C standards.

A new pragma (#pragma extensions) is introduced to control this. It might also be used for other things in the future.
2022-03-26 18:45:47 -05:00
Stephen Heumann 6cfe8cc886 Remove an unused string representation of macro tokens.
The string representation of macro tokens is needed for some preprocessor operations, but we get this in other ways (e.g. based on tokenStart/tokenEnd).
2022-02-21 18:39:39 -06:00
Stephen Heumann 1e98a63bf4 Avoid generating duplicate "Including ..." messages.
This could happen if a header was saved in the sym file, but the sym file data was not actually used because the source code in the main file did not match what was saved.
2022-02-16 21:31:49 -06:00
Stephen Heumann c96cf4f1dd Do not save predefined and command-line macros in the sym file.
Previously, these might or might not be saved (based on the contents of uninitialized memory), but in many cases they were. This was unnecessary, since these macros are automatically defined when the scanner is initialized. Reading them from the sym file could result in duplicate copies of them in the macro list. This is usually harmless, but might result in #undefs of macros from the command line not working properly.
2022-02-13 20:17:33 -06:00
Stephen Heumann 5d7c002819 Fix bug causing some #undefs to be ignored when using a sym file.
This would occur if the macro had already been saved in the sym file and the #undef occurred before a subsequent #include that was also recorded in the sym file. The solution is simply to terminate sym file generation if an #undef of an already-saved macro is encountered.

Here is an example showing the problem:

test.c:
#include "test1.h"
#undef x
#include "test2.h"

int main(void) {
#ifdef x
        return x;
#else
        return y;
#endif
}

test1.h:
#define x 27

test2.h:
#define y 6
2022-02-13 16:33:43 -06:00
Stephen Heumann 913a333f9f Record the cc= string in the symbol file and require it to match.
Macros and include paths from the cc= parameters may be included in the symbol file, so incorrect behavior could result if the symbol file was used for a later compilation with different cc= parameters.
2022-02-12 19:45:04 -06:00
Stephen Heumann bd811559d6 Fix issues with keep names in sym files.
There were a couple issues that could occur with #pragma keep and sym files:

*If a source file used #pragma keep but it was overridden by KEEP= on the command line or {KeepName} in the shell, then the overriding keep name would be saved to the sym file. It would therefore be applied to subsequent compilations even if it was no longer specified in the command line or shell variable.

*If a source file used #pragma keep, that keep name would be recorded in the sym file. On subsequent compilations, it would always be used, overriding any keep name specified by the command line or shell, contrary to the usual rule that the name on the command line takes priority.

With this patch, the keep name recorded in the sym file (if any) should always be the one specified by #pragma keep, but it can be overridden as usual.
2022-02-06 21:49:08 -06:00
Stephen Heumann b43036409e Add a new optimize flag for FP math optimizations that break IEEE rules.
There were several existing optimizations that could change behavior in ways that violated the IEEE standard with regard to infinities, NaNs, or signed zeros. They are now gated behind a new #pragma optimize flag. This change allows intermediate code peephole optimization and common subexpression elimination to be used while maintaining IEEE conformance, but also keeps the rule-breaking optimizations available if desired.

See section F.9.2 of recent C standards for a discussion of how these optimizations violate IEEE rules.
2021-11-29 20:31:15 -06:00
Stephen Heumann 692ebaba85 Structs or arrays may not contain structs with a flexible array member.
We previously ignored this, but it is a constraint violation under the C standards, so it should be reported as an error.

GCC and Clang allow this as an extension, as we were effectively doing previously. We will follow the standards for now, but if there was demand for such an extension in ORCA/C, it could be re-introduced subject to a #pragma ignore flag.
2021-10-17 22:22:42 -05:00
Stephen Heumann 5871820e0c Support UTF-8/16/32 string literals and character constants (C11).
These have u8, u, or U prefixes, respectively. The types char16_t and char32_t (defined in <uchar.h>) are used for UTF-16 and UTF-32 code points.
2021-10-11 20:54:37 -05:00
Stephen Heumann 894baac94f Give an error if assigning to a whole struct or union that has a const member.
Such structs or unions are not modifiable lvalues, so they cannot be assigned to as a whole. Any non-const fields can be assigned to individually.
2021-09-11 18:12:58 -05:00
Stephen Heumann a8682e28d3 Give an error for pointer assignments that discard qualifiers.
This is controlled by #pragma ignore bit 5, which is now a more general "loose type checks" bit.
2021-09-10 17:58:20 -05:00
Stephen Heumann b16210a50b Record volatile and restrict qualifiers in types.
These are needed to correctly distinguish pointer types in _Generic. They should also be used for type compatibility checks in other contexts, but currently are not.

This also fixes a couple small problems related to type qualifiers:
*restrict was not allowed to appear after * in type-names
*volatile status was not properly recorded in sym files

Here is an example of using _Generic to distinguish pointer types based on the qualifiers of the pointed-to type:

#include <stdio.h>

#define f(e) _Generic((e),\
        int * restrict *: 1,\
        int * volatile const *: 2,\
        int **: 3,\
        default: 0)

#define g(e) _Generic((e),\
        int *: 1,\
        const int *: 2,\
        volatile int *: 3,\
        default: 0)

int main(void) {
        int * restrict * p1;
        int * volatile const * p2;
        int * const * p3;

        // should print "1 2 0 1"
        printf("%i %i %i %i\n", f(p1), f(p2), f(p3), f((int * restrict *)0));

        int *q1;
        const int *q2;
        volatile int *q3;
        const volatile int *q4;

        // should print "1 2 3 0"
        printf("%i %i %i %i\n", g(q1), g(q2), g(q3), g(q4));
}

Here is an example of a problem resulting from volatile not being recorded in sym files (if a sym file was present, the read of x was lifted out of the loop):

#pragma optimize -1
static volatile int x;
#include <stdio.h>
int main(void) {
        int y;
        for (unsigned i = 0; i < 100; i++) {
                y = x*2 + 7;
        }
}
2021-08-30 18:19:58 -05:00
Stephen Heumann 979852be3c Use the right types for constants cast to character types.
These were previously treated as having type int. This resulted in incorrect results from sizeof, and would also be a problem for _Generic if it was implemented.

Note that this creates a token kind of "charconst", but this is not the kind for character constants in the source code. Those have type int, so their kind is intconst. The new kinds of "tokens" are created only through casts of constant expressions.
2021-03-07 13:38:21 -06:00
Stephen Heumann 8f8e7f12e2 Distinguish the different types of floating-point constants.
As with expressions, the type does not actually limit the precision and range of values represented.
2021-03-07 00:48:51 -06:00
Stephen Heumann f368071146 Do some more checks for invalid sym files. 2021-03-06 15:02:51 -06:00
Stephen Heumann f9f79983f8 Implement the standard pragmas, in particular FENV_ACCESS.
The FENV_ACCESS pragma is now implemented. It causes floating-point operations to be evaluated at run time to the maximum extent possible, so that they can affect and be affected by the floating-point environment. It also disables optimizations that might evaluate floating-point operations at compile time or move them around calls to the <fenv.h> functions.

The FP_CONTRACT and CX_LIMITED_RANGE pragmas are also recognized, but they have no effect. (FP_CONTRACT relates to "contracting" floating-point expressions in a way that ORCA/C does not do, and CX_LIMITED_RANGE relates to complex arithmetic, which ORCA/C does not support.)
2021-03-06 00:57:13 -06:00
Stephen Heumann 4ad7a65de6 Process floating-point values within the compiler using the extended type.
This means that floating-point constants can now have the range and precision of the extended type (aka long double), and floating-point constant expressions evaluated within the compiler also have that same range and precision (matching expressions evaluated at run time). This new behavior is intended to match the behavior specified in the C99 and later standards for FLT_EVAL_METHOD 2.

This fixes the previous problem where long double constants and constant expressions of type long double were not represented and evaluated with the full range and precision that they should be. It also gives extra range and precision to constants and constant expressions of type double or float. This may have pluses and minuses, but at any rate it is consistent with the existing behavior for expressions evaluated at run time, and with one of the possible models of floating point evaluation specified in the C standards.
2021-03-04 23:58:08 -06:00
Stephen Heumann 793f0a57cc Initial support for constants with long long types.
Currently, the actual values they can have are still constrained to the 32-bit range. Also, there are some bits of functionality (e.g. for initializers) that are not implemented yet.
2021-02-03 23:11:23 -06:00
Stephen Heumann 085cd7eb1b Initial code to recognize 'long long' as a type. 2021-01-29 22:27:11 -06:00
Stephen Heumann f0a3808c18 Add a new #pragma ignore option to treat char and unsigned char as compatible.
This is contrary to the C standards, but ORCA/C historically permitted it (as do some other compilers), and I think there is a fair amount of existing code that relies on it.
2020-05-22 17:11:13 -05:00
Stephen Heumann c0b2b44cad Add a new representation of C basic types and use it for type checking.
This allows us to distinguish int from short, etc.
2020-03-01 15:00:02 -06:00
Stephen Heumann 77dcfdf3ee Implement support for macros with variable arguments (C99). 2020-01-31 20:07:10 -06:00
Stephen Heumann ffe6c4e924 Spellcheck comments throughout the code.
There are no non-comment changes.
2020-01-29 17:09:52 -06:00
Stephen Heumann fe6c410271 Allow #pragma lint messages to optionally be treated as warnings.
In the #pragma lint line, the integer indicating the checks to perform can now optionally be followed by a semicolon and another integer. If these are present and the second integer is 0, then the lint checks will be performed, but will be treated as warnings rather than errors, so that they do not cause compilation to fail.
2020-01-25 11:29:12 -06:00
Stephen Heumann 9036a98e1c Implement support for digraphs.
Specifically, the following six punctuator tokens are now supported:

<: :> <% %> %: %:%:

These behave the same as the existing tokens [, ], {, }, #, and ## (respectively), apart from their spelling.

This can be useful when the full ASCII character set cannot easily be displayed or input (e.g. on the IIgs text screen with certain language settings).
2020-01-04 21:49:50 -06:00
Stephen Heumann 0184e3db7b Recognize the new keywords from C99 and C11 as such.
Specifically, the following will now be tokenized as keywords:

_Alignas
_Alignof
_Atomic
_Bool
_Complex
_Generic
_Imaginary
_Noreturn
_Static_assert
_Thread_local
restrict

('inline' was also added as a standard keyword in C99, but ORCA/C already treated it as such.)

The parser currently has no support for any of these keywords, so for now errors will still be generated if they are used, but this is a first step toward adding support for them.
2020-01-03 22:48:53 -06:00
Stephen Heumann 411f911b60 Bump sym file version and add a couple sanity checks.
Bumping the version forces regeneration of any sym files created by old ORCA/C versions with the bug that was just fixed.

A couple sanity checks are also introduced when reading sym files, including one that would have caught that bug.
2018-09-15 00:42:05 -05:00