924 Commits

Author SHA1 Message Date
Stephen Heumann
08f1380d21 Allow some C23 features in earlier compatibility modes.
Currently, this covers the following features, which should not cause compatibility problems:

-Recognize :: as a punctuator
-Allow one-argument _Static_assert
-Let variadic macro invocations omit final comma for empty varargs
-Define va_start() such that the second parameter is not required
-Allow UCNs less than \u00A0 in string literals and character constants
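The following is a short illustration (not taken from the ORCA/C sources or tests) of two of these features: a one-argument `_Static_assert` and a variadic macro invocation that omits the variable arguments entirely. Per this commit, ORCA/C accepts both in pre-C23 modes. Other compilers may warn about them outside C23 mode. The names `FIRST` and `first_of_one` are hypothetical.

```c
/* One-argument static assertion: the message string is omitted. */
_Static_assert(sizeof(long) >= sizeof(int));

/* Variadic macro whose variable arguments may be left out entirely,
   so FIRST(42) is accepted, not just FIRST(42,). */
#define FIRST(x, ...) (x)

int first_of_one(void) {
    return FIRST(42);
}
```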
2024-09-14 15:24:53 -05:00
Stephen Heumann
fd0b4920f6 Allow Mac OS Roman characters $F6 and $FF to be used in identifiers.
The corresponding Unicode characters (U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT and U+02C7 CARON) have the XID_Start and XID_Continue properties, so they are allowable in identifiers per C23 rules. We will just allow them in all language modes, since C99 through C17 permit the use of "other implementation-defined characters".
2024-09-14 14:23:56 -05:00
Stephen Heumann
ead95bcb12 Implement C23 changes to universal character names.
As of C23, UCNs within string literals or character constants can contain any valid Unicode code point, including ASCII characters or control characters.

The validity of UCNs within identifiers is now defined based on the XID_Start and XID_Continue Unicode properties. A helper program is used to generate tables of the allowed characters based on a Unicode data file. These can be updated for future Unicode versions by re-running the helper program using the updated Unicode data files.
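The commit does not show the format of the generated tables. As a hedged sketch, assuming the helper program emits sorted, non-overlapping ranges of code points, a lookup could be a binary search like the one below. The table is a hand-written excerpt of XID_Start code points, not the generated data, and all names here are hypothetical. A second table would be needed for XID_Continue.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One inclusive range of Unicode code points. */
typedef struct {
    uint32_t first, last;
} CodeRange;

/* Illustrative excerpt only; a generated table would list every
   XID_Start range from the Unicode data file, sorted by code point. */
static const CodeRange xid_start_excerpt[] = {
    {0x0041, 0x005A},   /* A-Z */
    {0x0061, 0x007A},   /* a-z */
    {0x00C0, 0x00D6},   /* Latin-1 letters before U+00D7 MULTIPLICATION SIGN */
    {0x00D8, 0x00F6},   /* Latin-1 letters before U+00F7 DIVISION SIGN */
    {0x02C6, 0x02C7},   /* MODIFIER LETTER CIRCUMFLEX ACCENT, CARON */
};

/* Binary search over sorted, non-overlapping ranges. */
static bool in_ranges(uint32_t cp, const CodeRange *tab, size_t n) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < tab[mid].first)
            hi = mid;
        else if (cp > tab[mid].last)
            lo = mid + 1;
        else
            return true;
    }
    return false;
}

bool is_xid_start_excerpt(uint32_t cp) {
    return in_ranges(cp, xid_start_excerpt,
                     sizeof xid_start_excerpt / sizeof xid_start_excerpt[0]);
}
```

Storing ranges rather than one entry per character keeps such a table small, and regenerating it for a new Unicode version only changes the data, not the lookup code.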
2024-09-13 22:14:43 -05:00
Stephen Heumann
bae40bc615 Define va_start() such that the second parameter is not required (C23). 2024-08-30 21:24:48 -05:00
Stephen Heumann
67a94eda4c Let variadic macro invocations omit final comma for empty varargs (C23). 2024-08-30 21:19:37 -05:00
Stephen Heumann
41440e1db1 Add unreachable() macro (C23).
This could be used for optimizations and/or for static or dynamic error checking, but currently it is not.
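In C23, `unreachable()` is provided by `<stddef.h>`. The sketch below shows a typical use after an exhaustive `switch`. It assumes a hypothetical no-op fallback for compilers or modes whose `<stddef.h>` lacks the macro; the `color` enum and `color_name` function are illustrative only.

```c
#include <stddef.h>

/* Hypothetical fallback for pre-C23 modes: expands to nothing useful. */
#ifndef unreachable
#define unreachable() ((void)0)
#endif

enum color { RED, GREEN, BLUE };

const char *color_name(enum color c) {
    switch (c) {
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    }
    /* Callers promise never to pass any other value; a compiler may use
       this for optimization or checking, but is not required to. */
    unreachable();
    return NULL;
}
```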
2024-08-30 20:39:24 -05:00
Stephen Heumann
b35cc8e2e5 Include width macros in <limits.h> and <stdint.h> (C23). 2024-08-30 18:40:03 -05:00
Stephen Heumann
22627cbc0d Permit one-argument version of static_assert in C23 mode. 2024-08-30 13:52:09 -05:00
Stephen Heumann
1db26a88ad Give u8-prefixed strings "array of unsigned char" type in C23.
This is a change from C11/C17, where they were arrays of char.
2024-08-30 13:45:20 -05:00
Stephen Heumann
0fe9424373 Disable recognition of trigraphs in C23 mode. 2024-08-30 13:43:59 -05:00
Stephen Heumann
b27fc52dc4 Recognize :: as a punctuator token (C23).
For the moment, it behaves as expected with regard to token merging and stringization, but otherwise it doesn't do anything. (It can be used in attributes, but those aren't implemented yet.)
2024-08-29 22:04:53 -05:00
Stephen Heumann
1e415aadd7 Parse _DecimalN keywords as (unsupported) type specifiers.
This gives a clearer error message when they are used and helps to keep the subsequent parsing on track.
2024-08-29 13:24:08 -05:00
Stephen Heumann
f125a6640c Handle alignas/alignof/bool/static_assert/thread_local keywords (C23).
These are now treated equivalently to the old versions that start with _.

Note that thread_localsy and alignassy are canonicalized to _Thread_localsy and _Alignassy when recorded as declaration modifiers.
2024-08-29 13:16:01 -05:00
Stephen Heumann
9ecbd42c1a Recognize new C23 keywords as keywords (in C23 mode).
They are recognized as keywords, but they currently don't do anything.
2024-08-28 21:58:59 -05:00
Stephen Heumann
f59c2cf93d Add a C23 standard option to the compiler.
The default is still c17compat.
2024-08-27 22:06:55 -05:00
Stephen Heumann
3a0dbd2e15 Update version number for ORCA/C 2.3.0 development. 2024-08-27 21:55:57 -05:00
Stephen Heumann
b363a2c006 Update ORCA/C version number to 2.2.1 final.
Also, update the release notes.
2024-08-21 19:05:15 -05:00
Stephen Heumann
55898ddd60 Adjust a test for long numeric constants.
The 255-character limit for numeric constants now includes any suffix.
2024-08-21 18:22:05 -05:00
Stephen Heumann
347ad00ff7 Allow the operand of sizeof to be an un-parenthesized compound literal.
This is allowed based on the C standard syntax, but it previously gave a spurious error in ORCA/C, because the parenthesized type name at the beginning of the compound literal was parsed as the complete operand to sizeof.

Here is an example program affected by this:

int main(void) {
        return sizeof (char[]){1,2,3}; // should return 3
}
2024-08-11 20:33:36 -05:00
Stephen Heumann
5f59f152ed Allow function pointer == NULL comparisons under strict type checks.
These were improperly being flagged as an error, because (void*)0 was not being recognized as a null pointer constant in this context.
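A minimal example (not from the commit) of the kind of comparison that was being rejected under strict type checks. Comparing a function pointer with `NULL` is valid C even when `NULL` is defined as `((void *)0)`, because that is still a null pointer constant. The function names are hypothetical.

```c
#include <stddef.h>

int answer(void) { return 42; }

/* Calls fp if it is set; the fp == NULL comparison is the construct
   that strict type checking should accept. */
int call_if_set(int (*fp)(void)) {
    if (fp == NULL)
        return -1;
    return fp();
}
```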
2024-08-02 20:26:40 -05:00
Stephen Heumann
a9e0b13e1c Require at least one hex digit in \x escape sequences.
This is required by the grammar given in the C standards.
2024-07-31 17:05:57 -05:00
Stephen Heumann
e11c24bc24 Add two constant definitions related to list controls.
These constants are documented in Programmer's Reference for System 6.0.
2024-07-04 12:54:46 -05:00
Stephen Heumann
6cf573a87c Generate better code for equality comparisons against -1/0xFFFF. 2024-07-04 12:47:16 -05:00
Stephen Heumann
69320cd4d8 Detect some erroneous numeric constants that were being allowed.
These include tokens like 0x, 0b, and 1.2LL.
2024-04-23 22:07:19 -05:00
Stephen Heumann
8278f7865a Support unconvertible preprocessing numbers.
These are tokens that follow the syntax for a preprocessing number, but not for an integer or floating constant after preprocessing. They are now allowed within the preprocessing phases of the compiler. They are not legal after preprocessing, but they may be used as operands of the # and ## preprocessor operators to produce legal tokens.
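As an illustration (the macro and function names are hypothetical), `0x` and `1.2LL` are preprocessing numbers that are not valid constants by themselves, as noted in the previous commit, yet they can still appear as operands of `##` and `#`:

```c
#define CAT(a, b) a##b
#define STR(x)    #x

/* 0x alone is not a valid integer constant, but pasting it with the
   preprocessing number 1F yields the valid constant 0x1F. */
int pasted_hex(void) { return CAT(0x, 1F); }

/* 1.2LL is not a valid constant, but it can still be stringized. */
const char *stringized(void) { return STR(1.2LL); }
```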
2024-04-23 21:39:14 -05:00
Stephen Heumann
6b7414384f Fix code generation bug for indirect load/store of 64-bit values.
The issue was that if a 64-bit value was being loaded via one pointer and stored via another, the load and store parts could both use the Y register for their indexing, and they would clash with each other, potentially causing the value to be loaded from the wrong place.

Here are some examples that illustrate the problem:

/* example 1 */
int main(void) {
        struct {
                char c[16];
                long long x;
        } s = {.x = 0x1234567890abcdef}, *sp = &s;
        long long ll, *llp = &ll;
        *llp = sp->x;
        return ll != s.x; // should return 0
}

/* example 2 */
int main(void) {
        struct {
                char c[16];
                long long x;
        } s = {.x = 0x1234567890abcdef}, *sp = &s;
        long long ll, *llp = &ll;
        unsigned i = 0;
        *llp = sp[i].x;
        return ll != s.x; // should return 0
}

/* example 3 */
int main(void) {
        long long x[2] = {0, 0x1234567890abcdef}, *xp = x;
        long long ll, *llp = &ll;
        unsigned i = 1;
        *llp = xp[i];
        return ll != x[1]; // should return 0
}
2024-04-10 20:49:17 -05:00
Stephen Heumann
77e0b8fc59 Fix codegen error for some indirect accesses to 64-bit values.
The code was not properly adding in the offset of the 64-bit value from the pointed-to location, so the wrong memory location would be accessed. This affected indirect accesses to non-initial structure members, when used as operands to certain operations.

Here is an example showing the problem:

#include <stdio.h>

long long x = 123456;

struct S {
        long long a;
        long long b;
} s = {0, 123456};

int main(void) {
        struct S *sp = &s;

        if (sp->b != x) {
                puts("error");
        }
}
2024-04-03 21:04:47 -05:00
Stephen Heumann
50636bd28b Fix code generation for qualified struct or union function parameters.
They were not being properly recognized as structs/unions, so they were being passed by address rather than by value as they should be.

Here is an example affected by this:

struct S {int a,b,c,d;};

int f(struct S s) {
    return s.a + s.b + s.c + s.d;
}

int main(void) {
    const struct S s = {1,2,3,4};
    return f(s);
}
2024-04-01 20:37:51 -05:00
Stephen Heumann
83537fd3c7 Disable a peephole optimization that can produce bad code.
The optimization applies to code sequences like:
	dec abs
	lda abs
	beq ...
where the dec and lda were supposed to refer to the same location.

There were two problems with this optimization as written:
-It considered the dec and lda to refer to the same location even if they were actually references to different elements of the same array.
-It did not work in the case where the A register value was needed in subsequent code.

The first of these was already an issue in previous ORCA/C releases, as in the following example:

#pragma optimize -1
int x[2] = {0,0};
int main(void) {
        --x[0];
        if (x[1] != 0)
                return 123;
        return 0; /* should return 0 */
}

I do not believe the second problem was triggered by any code sequences generated in previous releases of ORCA/C, but it can be triggered after commit 4c402fc88, e.g. by the following example:

#pragma optimize -1
int x = 1;
int main(void) {
        int y = 123;
        --x;
        return x == 0; /* should return 1 */
}

Since the circumstances where this peephole optimization was triggered validly are pretty obscure, just disabling it should have a minimal impact on the generated code.
2024-03-17 21:31:18 -05:00
Stephen Heumann
81934109fc Fix issues with type names in the third expression of a for loop.
There were a couple of issues here:
*If the type name contained a semicolon (for struct/union member declarations), a spurious error would be reported.
*Tags or enumeration constants declared in the type name should be in scope within the loop, but were not.

These both stemmed from the way the parser handled the third expression, which was to save the tokens from it and re-inject them at the end of the loop. To get the scope issues right, the expression really needs to be evaluated at the point where it occurs, so we now do that. To enable that while still placing the code at the end of the loop, a mechanism to remove and re-insert sections of generated code is introduced.

Here is an example illustrating the issues:

int main(void) {
        int i, j, x;
        for (i = 0; i < 123; i += sizeof(struct {int a;}))
                for (j = 0; j < 123; j += sizeof(enum E {A,B,C}))
                        x = i + j + A;
}
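The commit does not describe how the remove-and-reinsert mechanism is implemented. As a rough sketch only, assuming generated code is kept as a singly linked list, the step expression can be processed where it occurs, cut out of the list, and spliced back after the loop body. The names `Code`, `emit`, `detach_after`, `reinsert`, and `compile_for_loop` are hypothetical and do not reflect ORCA/C's actual data structures.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical generated-code node. */
typedef struct Code {
    const char *text;
    struct Code *next;
} Code;

typedef struct {
    Code *first, *last;   /* a detached run of nodes */
} Segment;

static Code *head, *tail;   /* the code generated so far */

static void emit(const char *text) {
    Code *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->text = text;
    c->next = NULL;
    if (tail != NULL)
        tail->next = c;
    else
        head = c;
    tail = c;
}

/* Cut off everything emitted after 'mark' (NULL: everything). */
static Segment detach_after(Code *mark) {
    Segment s;
    s.first = (mark != NULL) ? mark->next : head;
    s.last = (s.first != NULL) ? tail : NULL;
    if (mark != NULL)
        mark->next = NULL;
    else
        head = NULL;
    tail = mark;
    return s;
}

/* Append a detached segment at the current end of the code. */
static void reinsert(Segment s) {
    if (s.first == NULL)
        return;
    if (tail != NULL)
        tail->next = s.first;
    else
        head = s.first;
    tail = s.last;
}

/* "for (init; test; step) body": the step is processed where it occurs,
   then moved after the body. The node texts are written to out,
   separated by spaces, and the list is freed. */
void compile_for_loop(char *out, size_t outsize) {
    emit("init");
    emit("test");
    Code *mark = tail;
    emit("step");
    Segment step = detach_after(mark);
    emit("body");
    reinsert(step);
    emit("jump-to-test");

    out[0] = '\0';
    while (head != NULL) {
        Code *next = head->next;
        if (out[0] != '\0')
            strncat(out, " ", outsize - strlen(out) - 1);
        strncat(out, head->text, outsize - strlen(out) - 1);
        free(head);
        head = next;
    }
    tail = NULL;
}
```

Because the step's code is generated at its source position, any tags or enumeration constants it declares are entered into scope before the body is parsed.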
2024-03-13 22:09:25 -05:00
Stephen Heumann
72234a4f2b Generate better code for most unsigned 32-bit comparisons. 2024-03-10 21:24:33 -05:00
Stephen Heumann
36f766a662 Generate better code for comparisons against constant 1 or 2. 2024-03-06 21:57:27 -06:00
Stephen Heumann
4c402fc883 Generate better code for certain equality/inequality comparisons. 2024-03-06 21:18:50 -06:00
Stephen Heumann
ca0147507b Generate slightly better code for logical negation. 2024-03-06 17:04:51 -06:00
Stephen Heumann
24c6e72a83 Simplify some conditional branches.
This affects certain places where code like the following could be generated:

	bCC lab2
lab1	brl ...
lab2 ...

If lab1 is no longer referenced due to previous optimizations, it can be removed. This then allows the bCC+brl combination to be shortened to a single conditional branch, if the target is close enough.

This introduces a flag for tracking and potentially removing labels that are only used as the target of one branch. This could be used more widely, but currently it is only used for the specific code sequences shown above. Using it in other places could potentially open up possibilities for invalid native-code optimizations that were previously blocked due to the presence of the label.
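A hedged sketch of the rewrite described above, over a hypothetical, simplified instruction array (not ORCA/C's native-code representation). Per-label reference counts decide whether lab1 can go, and a caller-supplied `near[]` array stands in for the real check on conditional-branch range. The 65816 conditions are paired so that `c ^ 1` is the opposite test.

```c
#include <stdbool.h>

/* Hypothetical, simplified instruction records for a peephole pass. */
typedef enum { I_BCC, I_BRL, I_LABEL, I_OTHER, I_DELETED } Kind;

/* beq/bne, bcc/bcs, bmi/bpl, bvc/bvs: c ^ 1 gives the opposite test. */
enum { C_EQ, C_NE, C_CC, C_CS, C_MI, C_PL, C_VC, C_VS };

typedef struct {
    Kind kind;
    int cond;   /* condition, for I_BCC only */
    int label;  /* branch target, or the label an I_LABEL defines */
} Ins;

/*
 *       bCC  lab2                    b!CC target
 * lab1  brl  target     becomes  lab2 ...
 * lab2  ...
 *
 * when lab1 has no remaining references and near[target] is true.
 * Removed entries are marked I_DELETED rather than compacted.
 */
int simplify_bcc_brl(Ins *code, int n, int *refs, const bool *near) {
    int changes = 0;
    for (int i = 0; i + 3 < n; i++) {
        Ins *b = &code[i], *lab1 = &code[i + 1];
        Ins *brl = &code[i + 2], *lab2 = &code[i + 3];
        if (b->kind != I_BCC || lab1->kind != I_LABEL || brl->kind != I_BRL
            || lab2->kind != I_LABEL || b->label != lab2->label)
            continue;
        if (refs[lab1->label] != 0 || !near[brl->label])
            continue;
        refs[lab2->label]--;      /* the bCC no longer targets lab2 */
        b->cond ^= 1;             /* invert the test and branch straight */
        b->label = brl->label;    /* to the target (its count is unchanged,
                                     since the brl is removed) */
        lab1->kind = I_DELETED;
        brl->kind = I_DELETED;
        changes++;
    }
    return changes;
}
```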
2024-03-05 22:20:34 -06:00
Stephen Heumann
0f18fa63b5 Optimize some additional cases of a branch to a branch.
This covers patterns like

	bCC lab
	???
	???
lab:	bra/brl ...

These can come up in the new code for 32-bit ||, but also in cases like "if (i > 0) ...".
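The general idea can be sketched as branch threading: a conditional branch whose destination is itself an unconditional `bra`/`brl` is retargeted to the final destination, if that destination is within conditional-branch range. This is an illustrative sketch over a hypothetical instruction array, not the exact set of patterns ORCA/C matches; `near[]` stands in for the real range check.

```c
#include <stdbool.h>

typedef enum { OP_BCC, OP_BRA, OP_BRL, OP_LABEL, OP_OTHER } Op;

typedef struct {
    Op op;
    int label;  /* branch target, or the label an OP_LABEL defines */
} Insn;

/* Index of the instruction just after the definition of 'label', or -1. */
static int after_label(const Insn *code, int n, int label) {
    for (int i = 0; i < n; i++)
        if (code[i].op == OP_LABEL && code[i].label == label)
            return (i + 1 < n) ? i + 1 : -1;
    return -1;
}

/* Retarget bCC lab, where lab: bra/brl dest, to branch to dest directly. */
int thread_branches(Insn *code, int n, int *refs, const bool *near) {
    int changes = 0;
    for (int i = 0; i < n; i++) {
        if (code[i].op != OP_BCC)
            continue;
        int j = after_label(code, n, code[i].label);
        if (j < 0 || (code[j].op != OP_BRA && code[j].op != OP_BRL))
            continue;
        int dest = code[j].label;
        if (dest == code[i].label || !near[dest])
            continue;
        refs[code[i].label]--;   /* lab loses one reference */
        refs[dest]++;            /* dest gains one */
        code[i].label = dest;
        changes++;
    }
    return changes;
}
```

Keeping the reference counts up to date matters because a label whose count drops to zero may enable further simplifications, such as the one in the commit above.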
2024-03-05 17:16:17 -06:00
Stephen Heumann
8f07ca5d6c Generate better code for && and || with 32-bit operands. 2024-03-05 17:09:21 -06:00
Stephen Heumann
60b472a99e Optimize generated code for some indexing ops in large memory model.
This generates slightly better code for indexing a global/static char array with a signed 16-bit index and a positive offset, e.g. a[i+1].

Here is an example that is affected:

#pragma memorymodel 1
#pragma optimize -1
char a[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
int main(int argc, char *argv[]) {
        return a[argc+2];
}
2024-03-04 19:38:39 -06:00
Stephen Heumann
995885540b Correct a comment. 2024-03-04 19:03:13 -06:00
Stephen Heumann
34c5be5cab Update readme files for version 2.2.1. 2024-02-28 20:11:13 -06:00
Stephen Heumann
75a928e273 Change division-by-zero tests to cases requiring a constant expression. 2024-02-27 13:06:45 -06:00
Stephen Heumann
a545685ab4 Use more correct logic for expanding macros in macro parameters.
Specifically, this affects the case where a macro argument ends with the name of a function-like macro that takes 0 parameters. When that argument is initially expanded, the macro should not be expanded, even if it is followed by parentheses in the macro that it is being passed to or in the subsequent program code. This is the case because the C standards specify that "The argument’s preprocessing tokens are completely macro replaced before being substituted as if they formed the rest of the preprocessing file with no other preprocessing tokens being available." (The macro may still be expanded at a later stage, but that depends on other rules that determine whether the expansion is suppressed.) The logic for this was already present for the case of macros taking one or more arguments; this extends it to apply to function-like macros taking zero arguments as well.

I'm not sure that this makes any practical difference while cycles of mutually-referential macros still aren't handled correctly (issue #48), but if that were fixed then there would be some cases that depend on this behavior.
2024-02-26 22:31:46 -06:00
Stephen Heumann
ce94f4e2b6 Do not pass negative sizes to strncat or strncmp in tests.
These parameters are of type size_t, which is unsigned.
2024-02-24 18:57:50 -06:00
Stephen Heumann
84fdb5c975 Fix handling of empty macro arguments as ## operands.
Previously, there were a couple of problems:

*If the parameter that was passed an empty argument appeared directly after the ##, the ## would permanently be removed from the macro record, affecting subsequent uses of the macro even if the argument was not empty.
*If the parameter that was passed an empty argument appeared between two ## operators, both would effectively be skipped, so the tokens to the left of the first ## and to the right of the second would not be combined.

This example illustrates both issues (not expected to compile; just check preprocessor output):

#pragma expand 1
#define x(a,b,c) a##b##c
x(1, ,3)
x(a,b,c)
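The same two cases can also be checked in a program that compiles and runs. This sketch uses the macro from the example above under the hypothetical name CAT3: an empty middle argument should make `1 ## (empty) ## 3` produce 13, and a later use with all arguments present must still paste all three tokens.

```c
#define CAT3(a, b, c) a##b##c

int abc = 7;

/* Empty middle argument: the tokens on both sides are still pasted. */
int empty_middle(void) { return CAT3(1, , 3); }

/* A later use of the same macro must be unaffected by the earlier one. */
int all_present(void) { return CAT3(a, b, c); }
```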
2024-02-22 21:55:46 -06:00
Stephen Heumann
d1847d40be Set numString properly for numeric tokens generated by ##.
Previously, it was not necessarily set correctly for the newly-generated token. This would result in incorrect behavior if that token was an operand to another ## operator, as in the following example:

#define x(a,b,c) a##b##c
x(1,2,3)
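A runnable variant of this example (the macro is renamed PASTE3 here, and the function names are hypothetical). In each case, the first `##` produces a new numeric token that then becomes an operand of the second `##`.

```c
#define PASTE3(a, b, c) a##b##c

/* 1##2 yields the new numeric token 12, which is then pasted with 3. */
int chained_decimal(void) { return PASTE3(1, 2, 3); }

/* 0x##1 yields 0x1, which is then pasted with F to form 0x1F. */
int chained_hex(void) { return PASTE3(0x, 1, F); }
```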
2024-02-22 21:42:05 -06:00
Stephen Heumann
c671bb71a5 Document recent library bug fixes. 2024-02-22 21:20:33 -06:00
Stephen Heumann
a646a03b5e Avoid possible errors when using postfix ++/-- on pointer expressions.
There was code that would attempt to use the cType field of the type record, but this is only valid for scalar types, not pointer types. In the case of a pointer type, the upper two bytes of the pointer would be interpreted as a cType value, and if they happened to have one of the values being tested for, incorrect intermediate code would be generated. The lower two bytes of the pointer would be used as a baseType value; this would most likely result in "compiler error" messages from the code generator, but might cause incorrect code generation with no errors if that value happened to correspond to a real baseType.

Code like the following might cause this error, although it only occurs if pointers have certain values and therefore depends on the memory layout at compile time:

void f(const int **p) {
    (*p)++;
}

This bug was introduced in commit f2a66a524a.
2024-02-09 20:45:14 -06:00
Stephen Heumann
7ca30d7784 Do not give a compile error for division of any integer constant by 0.
Division by zero produces undefined behavior if it is evaluated, but in general we cannot tell whether a given expression will actually be evaluated at run time, so we should not report this as a compile-time error.

We still report an error for division by zero in constant expressions that need to be evaluated at compile time. We also still produce a lint message about division by zero if the appropriate flag is enabled.
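A common way this situation arises is a guarded division macro instantiated with a constant zero, as in this sketch (`PER_UNIT` and `per_unit_zero` are hypothetical names). The expansion contains `100 / 0`, but that operand of `?:` is never evaluated, so the program is valid. A compiler may still warn about it. By contrast, something like `enum { BAD = 100 / 0 };` requires the division to be evaluated at compile time and remains an error.

```c
/* Divides total by units, or yields 0 when units is 0. */
#define PER_UNIT(total, units) ((units) != 0 ? (total) / (units) : 0)

int per_unit_zero(void) {
    /* Expands to ((0) != 0 ? (100) / (0) : 0); the division is never
       evaluated, so this is not undefined behavior. */
    return PER_UNIT(100, 0);
}
```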
2024-02-02 20:03:34 -06:00
Stephen Heumann
c9dc566c10 Update release notes. 2024-02-02 18:31:48 -06:00
Stephen Heumann
2ca4aba5c4 Correct a misspelled error code in <gsos.h>. 2024-01-18 17:55:41 -06:00