In certain error cases, tokens from subsequent lines could get treated as part of a preprocessor expression, causing subsequent code to be essentially ignored and producing strange error messages.
Here is an example (with an error) affected by this:
#pragma optimize 0 0
int main(void) {}
The scanner has been updated so that ... should always get recognized as a single token, so this is no longer necessary as a workaround. Any code that actually uses separate . . . is non-standard and will need to be changed.
This does not really do anything, because ORCA/C does not support multithreading, but the C11 and later standards indicate it should be allowed anyway.
The main changes made to most tests are:
*Declarations always include explicit types, not relying on implicit int. The declaration of main in most test programs is changed to be "int main (void) {...}", adding an explicit return type and a prototype. (There are still some non-prototyped functions, though.)
*Functions are always declared before use, either by including a header or by providing a declaration for the specific function. The latter approach is usually used for printf, to avoid requiring ORCA/C to process stdio.h when compiling every test case (which might make test runs noticeably slower).
*Make all return statements in non-void functions (e.g. main) return a value.
*Avoid some instances of undefined behavior and type errors in printf and scanf calls.
Several miscellaneous bugs are also fixed.
There are still a couple test cases that intentionally rely on the C89 behavior, to ensure it still works.
If a struct contained a function pointer with a prototyped parameter list, processing the parameters could reset the declaredTagOrEnumConst flag, potentially leading to a spurious error, as in this example:
struct S {
int (*f)(int);
};
This also gives a better error for structs declared as containing functions.
Note that this implementation allows anonymous structures and unions to participate in initialization. That is, you can have a braced initializer list corresponding to an anonymous structure or union. Also, anonymous structures within unions follow the initialization rules for structures (and vice versa).
I think the better interpretation of the standard text is that anonymous structures and unions cannot participate in initialization as such, and instead their members are treated as members of the containing structure or union for purposes of initialization. However, all other compilers I am aware of allow anonymous structures and unions to participate in initialization, so I have implemented it that way too.
This is necessary to correctly handle line continuations in a few places:
* Between an initial . and the subsequent digit in a floating constant
* Between the third and fourth characters of a %:%: digraph
* Between the second and third dots of a ... token
Previously, these would not be tokenized correctly, leading to spurious errors in the first and second cases above.
Here is a sample program illustrating the problem:
int printf(const char * restrict, ..\
\
??/
.);
int main(void) {
double d = .??/
\
??/
\
1234;
printf("%f\n", d);
}
They are supposed to be macros, according to the C standards. This ordinarily doesn't matter, but it can be detected by #ifdef, as in the following program:
#include <stdio.h>
#ifdef stdin
int main(void) {
puts("stdin is a macro");
}
#endif
Previously, it included some instances that violate the standard constraint that a declaration must declare a declarator, a tag, or an enum constant. As of commit f263066f616, this constraint is now enforced, so those cases would (properly) give errors.
C90 had constraints requiring # and ## tokens to only appear in preprocessing directives, but C99 and later removed those constraints, so this code is no longer necessary when targeting current languages versions. (It would be necessary in a "strict C90" mode, if that was ever implemented.)
The main practical effect of this is that # and ## tokens can be passed as parameters to macros, provided the macro either ignores or stringizes that parameter. # and ## tokens still have no role in the grammar of the C language after preprocessing, so they will be an unexpected token and produce some kind of error if they appear anywhere.
This also contains a change to ensure that a line containing one or more illegal characters (e.g. $) and then a # is not treated as a preprocessing directive.
The branch range calculation treated dcl directives as taking 2 bytes rather than 4, which could result in out-of-range branches. These could result in linker errors (for forward branches) or silently generating wrong code (for backward branches).
This patch now treats dcb, dcw, and dcl as separate directives in the native-code layer, so the appropriate length can be calculated for each.
Here is an example of code affected by this:
int main(int argc, char **argv) {
top:
if (!argc) { /* this caused a linker error */
asm {
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
}
goto top; /* this generated bad code with no error */
}
}
Previously, the assembly-level optimizations applied to code in asm statements. In many cases, this was fine (and could even do useful optimizations), but occasionally the optimizations could be invalid. This was especially the case if the assembly involved tricky things like self-modifying code.
To avoid these problems, this patch makes the assembly optimizers ignore code from asm statements, so it is always emitted as-is, without any changes.
This fixes#34.
If one step of peephole optimization produced code that can be further optimized with more peephole optimizations, that additional optimization was not always done. This makes sure the additional optimization is done in several such cases.
This was particularly likely to affect functions containing asm blocks (because CheckLabels would never trigger rescanning in them), but could also occur in other cases.
Here is an example affected by this (generating inefficient code to load a[1]):
#pragma optimize 1
int a[10];
void f(int x) {}
int main(int argc, char **argv) {
if (argc) return 0;
f(a[1]);
}
This accords with its definition in the C standards. For the time being, the old form of three separate tokens is still accepted too, because the ... token may not be scanned correctly in the obscure case where there is a line continuation between the second and third dots.
One observable effect of this is that there are no longer spaces between the dots in #pragma expand output.
This enforces the constraint from C17 section 6.7 p2 that declarations "shall declare at least a declarator (other than the parameters of a function or the members of a structure or union), a tag, or the members of an enumeration."
Somewhat relaxed rules are used for enums in the default loose type checking mode, similar to what GCC and Clang do.
A declaration of this exact form always declares the tag T within the current scope, and as such makes this "struct T" a distinct type from any other "struct T" type in an outer scope. (Similarly for unions.)
See C17 section 6.7.2.3 p7 (and corresponding places in all other C standards).
Here is an example of a program affected by this:
struct S {char a;};
int main(void) {
struct S;
struct S *sp;
struct S {long b;} s;
sp = &s;
sp->b = sizeof(*sp);
return s.b;
}
They are now represented in local structures instead. This keeps the representation of declaration specifiers together and eliminates the need for awkward and error-prone code to save and restore the global variables.
This differs from the usual ORCA/C behavior of treating all floating-point parameters as extended. With the option enabled, they will still be passed in the extended format, but will be converted to their declared type at the start of the function. This is needed for strict standards conformance, because you should be able to take the address of a parameter and get a usable pointer to its declared type. The difference in types can also affect the behavior of _Generic expressions.
The implementation of this is based on ORCA/Pascal, which already did the same thing (unconditionally) with real/double/comp parameters.
The added characters are accented roman letters that were added to the Mac OS Roman character set at some time after it was first defined. Some IIGS fonts include them, although others do not.
The "known issue" about not issuing required diagnostics is removed because ORCA/C has gotten significantly better about that, particularly if strict type checking is enabled. There are still probably some diagnostics that are missed, but it is no longer a big enough issue to be called out more prominently than other bugs.
If strict type checking is enabled, this will prohibit redefinition of enums, like:
enum E {a,b,c};
enum E {x,y,z};
It also prohibits use of an "enum E" type specifier if the enum has not been previously declared (with its constants).
These things were historically supported by ORCA/C, but they are prohibited by constraints in section 6.7.2.3 of C99 and later. (The C90 wording was different and less clear, but I think they were not intended to be valid there either.)
This affects code like the following:
enum E {a,b,c};
int main(void) {
enum E e;
struct E {int x;}; /* or: enum E {x,y,z}; */
}
The line "enum E e;" should refer to the enum type declared in the outer scope, but not redeclare it in the inner scope. Therefore, a subsequent struct, union, or enum declaration using the same tag in the same scope is acceptable.
They now use a jmp (addr,X) instruction, rather than a more complicated code sequence using rts. This is an improvement that was suggested in an old Genie message from Todd Whitesel.
This avoids strangeness where an enum tag declared within a typedef declaration would act like a typedef. For example, the following would compile without error:
typedef enum E {a,b,c} T;
E e;
This covers code like the following, which is very dubious but does not seem to be clearly prohibited by the standards:
int main(void) {
void *vp;
*vp;
}
Previously, this would do an indirect load of a four-byte value at the location, but then treat it as void. This could lead to the four-byte value being left on the stack, eventually causing a crash. Now we just evaluate the pointer expression (in case it has side effects), but effectively cast it to void without dereferencing it.
This avoids any possible issue with code possibly expecting __FE_DFL_ENV to be in the data bank when using the large memory model, although I don't think that happened in practice.