Commit Graph

81 Commits

Author SHA1 Message Date
Stephen Heumann 8278f7865a Support unconvertible preprocessing numbers.
These are tokens that follow the syntax for a preprocessing number, but not for an integer or floating constant after preprocessing. They are now allowed within the preprocessing phases of the compiler. They are not legal after preprocessing, but they may be used as operands of the # and ## preprocessor operators to produce legal tokens.
2024-04-23 21:39:14 -05:00
Stephen Heumann 0aee669746 Update version number for ORCA/C 2.2.1 development. 2024-01-15 21:47:00 -06:00
Stephen Heumann 1aa654628a Update ORCA/C version number to 2.2.0 final. 2023-07-16 22:51:16 -05:00
Stephen Heumann 9dad2b6186 Update displayed version number to mark this as a development version. 2023-04-16 14:29:02 -05:00
Stephen Heumann 5c96042423 Update ORCA/C version number to 2.2.0 B7.
Also tweak documentation wording in a couple places.
2023-04-06 18:53:34 -05:00
Stephen Heumann 245dd0a3f4 Add lint check for implicit conversions that change a constant's value.
This occurs when the constant value is out of range of the type being assigned to. This is likely indicative of an error, or of code that assumes types have larger ranges than they do in ORCA/C (e.g. 32-bit int).

This intentionally does not report cases where a value is assigned to a signed type but is within the range of the corresponding unsigned type, or vice versa. These may be done intentionally, e.g. setting an unsigned value to "-1" or setting a signed value using a hex constant with the high bit set. Also, only conversions to 8-bit or 16-bit integer types are currently checked.
2023-01-03 18:57:32 -06:00
Stephen Heumann 09fbfb1905 Maintain a pool of empty symbol tables that can be reused.
The motivation for this is that allocating and clearing symbol tables is a common operation, especially with C99+, where a construct like "if (...) { ... }" involves three levels of scope with their own symbol tables. In some tests, it could take an appreciable fraction of total execution time (sometimes ~10%).

This patch allows symbol tables that have already been allocated and cleared to be reused for a subsequent scope, as long as they are still empty. It does this by maintaining a pool of empty symbol tables and taking one from there rather than allocating a new one when possible.

We impose a somewhat arbitrary limit of MaxBlock/150000 on the number of symbol tables we keep, to avoid filling up memory with them. It would probably be better to use purgeable handles here, but that would be a little more work, and this should be good enough for now.
2022-12-13 21:14:23 -06:00
Stephen Heumann fe62f70d51 Add lint option to check for unused variables. 2022-12-12 21:47:32 -06:00
Stephen Heumann e71fe5d785 Treat unary + as an actual operator, not a no-op.
This is necessary both to detect errors (using unary + on non-arithmetic types) and to correctly perform the integer promotions when unary + is used (which can be detected with sizeof or _Generic).
2022-12-09 19:03:38 -06:00
Stephen Heumann bb1bd176f4 Add a command-line option to select the C standard to use.
This provides a more straightforward way to place the compiler in a "strict conformance" mode. This could essentially be achieved by setting several pragma options, but having a single setting is simpler. "Compatibility modes" for older standards can also be selected, although these actually continue to enable most C17 features (since they are unlikely to cause compatibility problems for older code).
2022-12-07 21:35:15 -06:00
Stephen Heumann 6857913daa Make the object buffer dynamically resizable.
It will now grow as needed to accommodate large segments, subject to the constraints of available memory. In practice, this mostly affects the size of initialized static arrays that can be used.

This also removes any limit apart from memory size on how large the object representation produced by a "compile to memory" can be, and cleans up error reporting regarding size limits.
2022-12-06 21:49:20 -06:00
Stephen Heumann 28e119afb1 Rework static initialization to support new-style initializer records.
Static initialization of arrays/structs/unions now essentially "executes" the initializer records to fill in a buffer (and keep track of relocations), then emits pcode to represent that initialized state. This supports overlapping and out-of-order initializer records, as can be produced by designated initialization.
2022-12-02 21:55:57 -06:00
Stephen Heumann 5f8a6baa94 Get rid of an unnecessary field in initializer records.
The "isStructOrUnion" information can now be determined simply by the type in the record.
2022-11-26 20:29:31 -06:00
Stephen Heumann d1edc8821d Record the type being initialized in auto initializer records. 2022-11-26 19:58:01 -06:00
Stephen Heumann cd9931a60c Record displacement from start of object in initializer records.
The idea (not yet implemented) is to use this to support out-of-order initialization. For automatic variables, we can just initialize the subobjects in the order that initializers appear. For static variables, we will eventually need to reorder the initializers in order, but this can be done based on their recorded displacements.
2022-11-26 19:27:17 -06:00
Stephen Heumann 8cfc14b50a Rename itype field of initializerRecord to basetype. 2022-11-26 15:45:26 -06:00
Stephen Heumann 5500833180 Record which anon struct/union an anonymous member field came from.
This is preparatory to supporting designated initializers.

Any struct/union type with an anonymous member now forces .sym file generation to end, since we do not have a scheme for serializing this information in a .sym file. It would be possible to do so, but for now we just avoid this situation for simplicity.
2022-11-25 22:32:59 -06:00
Stephen Heumann 3f450bdb80 Support "inline" function definitions without static or extern.
This is a minimal implementation that does not actually inline anything, but it is intended to implement the semantics defined by the C99 and later standards.

One complication is that a declaration that appears somewhere after the function body may create an external definition for a function that appeared to be an inline definition when it was defined. To support this while preserving ORCA/C's general one-pass compilation strategy, we generate code even for inline definitions, but treat them as private and add the prefix "~inline~" to the name. If they are "un-inlined" based on a later declaration, we generate a stub with external linkage that just jumps to the apparently-inline function.
2022-11-19 23:04:22 -06:00
Stephen Heumann 9cc72c8845 Support "other character" preprocessing tokens.
This implements the catch-all category for preprocessing tokens for "each non-white-space character that cannot be one of the above" (C17 section 6.4). These may appear in skipped code, or in macros or macro parameters if they are never expanded or are stringized during macro processing. The affected characters are $, @, `, and many extended characters.

It is still an error if these tokens are used in contexts where they remain present after preprocessing. If #pragma ignore bit 0 is clear, these characters are also reported as errors in skipped code or preprocessor constructs.
2022-11-08 18:58:50 -06:00
Stephen Heumann 65ec29ee3e Use 32-bit representation for line numbers.
C99 and later specify that line numbers set via #line can be up to 2147483647, so they need to be represented as (at least) a 32-bit value.
2022-10-22 21:46:12 -05:00
Stephen Heumann 859aa4a20a Do not enter the editor with a negative file displacement.
This could happen due to errors on the command line, e.g.:

cmpl +T +E file.c cc=(invalid)
2022-10-22 12:54:59 -05:00
Stephen Heumann 99e268e3b9 Implement support for anonymous structures and unions (C11).
Note that this implementation allows anonymous structures and unions to participate in initialization. That is, you can have a braced initializer list corresponding to an anonymous structure or union. Also, anonymous structures within unions follow the initialization rules for structures (and vice versa).

I think the better interpretation of the standard text is that anonymous structures and unions cannot participate in initialization as such, and instead their members are treated as members of the containing structure or union for purposes of initialization. However, all other compilers I am aware of allow anonymous structures and unions to participate in initialization, so I have implemented it that way too.
2022-10-16 18:44:19 -05:00
Stephen Heumann 4fe9c90942 Parse ... as a single punctuator token.
This accords with its definition in the C standards. For the time being, the old form of three separate tokens is still accepted too, because the ... token may not be scanned correctly in the obscure case where there is a line continuation between the second and third dots.

One observable effect of this is that there are no longer spaces between the dots in #pragma expand output.
2022-10-10 18:06:01 -05:00
Stephen Heumann 1fa3ec8fdd Eliminate global variables for declaration specifiers.
They are now represented in local structures instead. This keeps the representation of declaration specifiers together and eliminates the need for awkward and error-prone code to save and restore the global variables.
2022-10-01 21:28:16 -05:00
Stephen Heumann 711549392c Update displayed version number to mark this as a development version. 2022-07-25 18:33:32 -05:00
Stephen Heumann 2f75f47140 Update ORCA/C version number to 2.2.0 B6. 2022-07-19 20:40:52 -05:00
Stephen Heumann 00e7fe7125 Increase some size limits.
The new value of maxLocalLabel is aligned with the C99+ requirement to support "511 identifiers with block scope declared in one block".

The value of maxLabel is now the maximum it can be while keeping the size of the labelTab array under 32 KiB. (I'm not entirely sure the address calculations in the code generated by ORCA/Pascal would work correctly beyond that.)
2022-07-08 21:30:14 -05:00
Stephen Heumann 3c2b492618 Add support for compound literals within functions.
The basic approach is to generate a single expression tree containing the code for the initialization plus the reference to the compound literal (or its address). The various subexpressions are joined together with pc_bno pcodes, similar to the code generated for the comma operator. The initializer expressions are placed in a balanced binary tree, so that it is not excessively deep.

Note: Common subexpression elimination has poor performance for very large trees. This is not specific to compound literals, but compound literals for relatively large arrays can run into this issue. It will eventually complete and generate a correct program, but it may be quite slow. To avoid this, turn off CSE.
2022-06-08 21:34:12 -05:00
Stephen Heumann 06e17cd8f5 Give an error if file names or command-line parameters are too long.
The source file name, keep name, NAMES= string, and cc= string are all restricted to 255 characters, but these limits were not previously enforced, and exceeding them could lead to strange behavior.
2022-02-12 15:42:15 -06:00
Stephen Heumann bd811559d6 Fix issues with keep names in sym files.
There were a couple issues that could occur with #pragma keep and sym files:

*If a source file used #pragma keep but it was overridden by KEEP= on the command line or {KeepName} in the shell, then the overriding keep name would be saved to the sym file. It would therefore be applied to subsequent compilations even if it was no longer specified in the command line or shell variable.

*If a source file used #pragma keep, that keep name would be recorded in the sym file. On subsequent compilations, it would always be used, overriding any keep name specified by the command line or shell, contrary to the usual rule that the name on the command line takes priority.

With this patch, the keep name recorded in the sym file (if any) should always be the one specified by #pragma keep, but it can be overridden as usual.
2022-02-06 21:49:08 -06:00
Stephen Heumann 785a6997de Record source file changes within a function as part of debug info.
This affects functions whose body spans multiple files due to includes, or is treated as doing so due to #line directives. ORCA/C will now generate a COP 6 instruction to record each source file change, allowing debuggers to properly track the flow of execution across files.
2022-02-05 18:32:11 -06:00
Stephen Heumann 7322428e1d Add an option to print file names in error messages.
This can help identify if an error is in the main source file or an include file.
2022-02-04 22:10:50 -06:00
Stephen Heumann 73d194c12f Allow string constants with up to 32760 bytes.
This allows the length of the string plus a few extra bytes used internally to be represented by a 16-bit integer. Since the size limit for memory allocations has been raised, there is no good reason to impose a shorter limit on strings.

Note that C99 and later specify a minimum translation limit for string constants of at least 4095 characters.
2021-10-24 21:43:43 -05:00
Stephen Heumann 692ebaba85 Structs or arrays may not contain structs with a flexible array member.
We previously ignored this, but it is a constraint violation under the C standards, so it should be reported as an error.

GCC and Clang allow this as an extension, as we were effectively doing previously. We will follow the standards for now, but if there was demand for such an extension in ORCA/C, it could be re-introduced subject to a #pragma ignore flag.
2021-10-17 22:22:42 -05:00
Stephen Heumann 5871820e0c Support UTF-8/16/32 string literals and character constants (C11).
These have u8, u, or U prefixes, respectively. The types char16_t and char32_t (defined in <uchar.h>) are used for UTF-16 and UTF-32 code points.
2021-10-11 20:54:37 -05:00
Stephen Heumann 7ae830ae7e Initial support for compound literals.
Compound literals outside of functions should work at this point.

Compound literals inside of functions are not fully implemented, so they are disabled for now. (There is some code to support them, but the code to actually initialize them at the appropriate time is not written yet.)
2021-09-16 18:34:55 -05:00
Stephen Heumann 851d7d0787 Update displayed version number to mark this as a development version. 2021-09-15 18:34:27 -05:00
Stephen Heumann 617c46095d Update ORCA/C version number to 2.2.0 B5. 2021-09-12 18:12:36 -05:00
Stephen Heumann 894baac94f Give an error if assigning to a whole struct or union that has a const member.
Such structs or unions are not modifiable lvalues, so they cannot be assigned to as a whole. Any non-const fields can be assigned to individually.
2021-09-11 18:12:58 -05:00
Stephen Heumann b16210a50b Record volatile and restrict qualifiers in types.
These are needed to correctly distinguish pointer types in _Generic. They should also be used for type compatibility checks in other contexts, but currently are not.

This also fixes a couple small problems related to type qualifiers:
*restrict was not allowed to appear after * in type-names
*volatile status was not properly recorded in sym files

Here is an example of using _Generic to distinguish pointer types based on the qualifiers of the pointed-to type:

#include <stdio.h>

#define f(e) _Generic((e),\
        int * restrict *: 1,\
        int * volatile const *: 2,\
        int **: 3,\
        default: 0)

#define g(e) _Generic((e),\
        int *: 1,\
        const int *: 2,\
        volatile int *: 3,\
        default: 0)

int main(void) {
        int * restrict * p1;
        int * volatile const * p2;
        int * const * p3;

        // should print "1 2 0 1"
        printf("%i %i %i %i\n", f(p1), f(p2), f(p3), f((int * restrict *)0));

        int *q1;
        const int *q2;
        volatile int *q3;
        const volatile int *q4;

        // should print "1 2 3 0"
        printf("%i %i %i %i\n", g(q1), g(q2), g(q3), g(q4));
}

Here is an example of a problem resulting from volatile not being recorded in sym files (if a sym file was present, the read of x was lifted out of the loop):

#pragma optimize -1
static volatile int x;
#include <stdio.h>
int main(void) {
        int y;
        for (unsigned i = 0; i < 100; i++) {
                y = x*2 + 7;
        }
}
2021-08-30 18:19:58 -05:00
Stephen Heumann 979852be3c Use the right types for constants cast to character types.
These were previously treated as having type int. This resulted in incorrect results from sizeof, and would also be a problem for _Generic if it was implemented.

Note that this creates a token kind of "charconst", but this is not the kind for character constants in the source code. Those have type int, so their kind is intconst. The new kinds of "tokens" are created only through casts of constant expressions.
2021-03-07 13:38:21 -06:00
Stephen Heumann 8f8e7f12e2 Distinguish the different types of floating-point constants.
As with expressions, the type does not actually limit the precision and range of values represented.
2021-03-07 00:48:51 -06:00
Stephen Heumann f9f79983f8 Implement the standard pragmas, in particular FENV_ACCESS.
The FENV_ACCESS pragma is now implemented. It causes floating-point operations to be evaluated at run time to the maximum extent possible, so that they can affect and be affected by the floating-point environment. It also disables optimizations that might evaluate floating-point operations at compile time or move them around calls to the <fenv.h> functions.

The FP_CONTRACT and CX_LIMITED_RANGE pragmas are also recognized, but they have no effect. (FP_CONTRACT relates to "contracting" floating-point expressions in a way that ORCA/C does not do, and CX_LIMITED_RANGE relates to complex arithmetic, which ORCA/C does not support.)
2021-03-06 00:57:13 -06:00
Stephen Heumann 4ad7a65de6 Process floating-point values within the compiler using the extended type.
This means that floating-point constants can now have the range and precision of the extended type (aka long double), and floating-point constant expressions evaluated within the compiler also have that same range and precision (matching expressions evaluated at run time). This new behavior is intended to match the behavior specified in the C99 and later standards for FLT_EVAL_METHOD 2.

This fixes the previous problem where long double constants and constant expressions of type long double were not represented and evaluated with the full range and precision that they should be. It also gives extra range and precision to constants and constant expressions of type double or float. This may have pluses and minuses, but at any rate it is consistent with the existing behavior for expressions evaluated at run time, and with one of the possible models of floating point evaluation specified in the C standards.
2021-03-04 23:58:08 -06:00
Stephen Heumann cf463ff155 Support switch statements using long long expressions. 2021-02-17 19:41:46 -06:00
Stephen Heumann 2408c9602c Make expressionValue a saturating approximation of the true value for long long expressions.
This gives sensible behavior for several things in the parser, e.g. where all negative values or all very large values should be disallowed.
2021-02-04 12:44:44 -06:00
Stephen Heumann a59a2427fd Add some support for ++/-- on long long values.
Some more complex cases require pc_ind, which is not implemented yet.
2021-02-04 12:35:28 -06:00
Stephen Heumann 168a06b7bf Add support for emitting 64-bit constants in statically-initialized data. 2021-02-04 02:17:10 -06:00
Stephen Heumann 793f0a57cc Initial support for constants with long long types.
Currently, the actual values they can have are still constrained to the 32-bit range. Also, there are some bits of functionality (e.g. for initializers) that are not implemented yet.
2021-02-03 23:11:23 -06:00
Stephen Heumann 2222e4a0b4 Restore old order of baseTypeEnum values.
The ordinal values of these are hard-coded in code for handling pc_cnv/pc_cnn, so let's avoid changing them.
2021-01-29 23:23:03 -06:00