This does not really do anything, because ORCA/C does not support multithreading, but the C11 and later standards indicate it should be allowed anyway.
Note that this implementation allows anonymous structures and unions to participate in initialization. That is, you can have a braced initializer list corresponding to an anonymous structure or union. Also, anonymous structures within unions follow the initialization rules for structures (and vice versa).
I think the better interpretation of the standard text is that anonymous structures and unions cannot participate in initialization as such, and instead their members are treated as members of the containing structure or union for purposes of initialization. However, all other compilers I am aware of allow anonymous structures and unions to participate in initialization, so I have implemented it that way too.
This is necessary to correctly handle line continuations in a few places:
* Between an initial . and the subsequent digit in a floating constant
* Between the third and fourth characters of a %:%: digraph
* Between the second and third dots of a ... token
Previously, these would not be tokenized correctly, leading to spurious errors in the first and second cases above.
Here is a sample program illustrating the problem:
int printf(const char * restrict, ..\
\
??/
.);
int main(void) {
double d = .??/
\
??/
\
1234;
printf("%f\n", d);
}
The branch range calculation treated dcl directives as taking 2 bytes rather than 4, which could result in out-of-range branches. These could result in linker errors (for forward branches) or silently generating wrong code (for backward branches).
This patch now treats dcb, dcw, and dcl as separate directives in the native-code layer, so the appropriate length can be calculated for each.
Here is an example of code affected by this:
int main(int argc, char **argv) {
top:
if (!argc) { /* this caused a linker error */
asm {
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
dcl 0
}
goto top; /* this generated bad code with no error */
}
}
Previously, the assembly-level optimizations applied to code in asm statements. In many cases, this was fine (and could even do useful optimizations), but occasionally the optimizations could be invalid. This was especially the case if the assembly involved tricky things like self-modifying code.
To avoid these problems, this patch makes the assembly optimizers ignore code from asm statements, so it is always emitted as-is, without any changes.
This fixes#34.
A declaration of this exact form always declares the tag T within the current scope, and as such makes this "struct T" a distinct type from any other "struct T" type in an outer scope. (Similarly for unions.)
See C17 section 6.7.2.3 p7 (and corresponding places in all other C standards).
Here is an example of a program affected by this:
struct S {char a;};
int main(void) {
struct S;
struct S *sp;
struct S {long b;} s;
sp = &s;
sp->b = sizeof(*sp);
return s.b;
}
This differs from the usual ORCA/C behavior of treating all floating-point parameters as extended. With the option enabled, they will still be passed in the extended format, but will be converted to their declared type at the start of the function. This is needed for strict standards conformance, because you should be able to take the address of a parameter and get a usable pointer to its declared type. The difference in types can also affect the behavior of _Generic expressions.
The implementation of this is based on ORCA/Pascal, which already did the same thing (unconditionally) with real/double/comp parameters.
The added characters are accented roman letters that were added to the Mac OS Roman character set at some time after it was first defined. Some IIGS fonts include them, although others do not.
The "known issue" about not issuing required diagnostics is removed because ORCA/C has gotten significantly better about that, particularly if strict type checking is enabled. There are still probably some diagnostics that are missed, but it is no longer a big enough issue to be called out more prominently than other bugs.
If strict type checking is enabled, this will prohibit redefinition of enums, like:
enum E {a,b,c};
enum E {x,y,z};
It also prohibits use of an "enum E" type specifier if the enum has not been previously declared (with its constants).
These things were historically supported by ORCA/C, but they are prohibited by constraints in section 6.7.2.3 of C99 and later. (The C90 wording was different and less clear, but I think they were not intended to be valid there either.)
This affects code like the following:
enum E {a,b,c};
int main(void) {
enum E e;
struct E {int x;}; /* or: enum E {x,y,z}; */
}
The line "enum E e;" should refer to the enum type declared in the outer scope, but not redeclare it in the inner scope. Therefore, a subsequent struct, union, or enum declaration using the same tag in the same scope is acceptable.
The optimization could turn an unsigned comparison "x <= 0xFFFF" into "x < 0".
Here is an example affected by this:
int main(void) {
unsigned i = 1;
return (i <= 0xffff);
}
These could occur because the code for certain operations was assumed to set the z flag based on the result value, but did not actually do so. The affected operations were shifts, loads or stores of bit-fields, and ? : expressions.
Here is an example showing the problem with a shift:
#pragma optimize 1
int main(void) {
int i = 1, j = 0;
return (i >> j) ? 1 : 0;
}
Here is an example showing the problem with a bit-field load:
struct {
signed int i : 16;
} s = {1};
int main(void) {
return (s.i) ? 1 : 0;
}
Here is an example showing the problem with a bit-field store:
#pragma optimize 1
struct {
signed int i : 16;
} s;
int main(void) {
return (s.i = 1) ? 1 : 0;
}
Here is an example showing the problem with a ? : expression:
#pragma optimize 1
int main(void) {
int a = 5;
return (a ? (a<<a) : 0) ? 0 : 1;
}
This changes unsigned 16-bit multiplies to use the new ~CUMul2 routine in ORCALib, rather than ~UMul2 in SysLib. They differ in that ~CUMul2 gives the low-order 16 bits of the true result in case of overflow. The C standards require this behavior for arithmetic on unsigned types.
Now rewind() will always be called as a function. In combination with an update to the rewind() function in ORCALib, this will ensure that the error indicator is always cleared, as required by the C standards.
This was non-standard in various ways, mainly in regard to pointer types. It has been rewritten to closely follow the specification in the C standards.
Several helper functions dealing with types have been introduced. They are currently only used for ? :, but they might also be useful for other purposes.
New tests are also introduced to check the behavior for the ? : operator.
This fixes#35 (including the initializer-specific case).
This affects expressions like &*a (where a is an array) or &*"string". In most contexts, these undergo array-to-pointer conversion anyway, but as an operand of sizeof they do not. This leads to sizeof either giving the wrong value (the size of the array rather than of a pointer) or reporting an error when the array size is not recorded as part of the type (which is currently the case for string constants).
In combination with an earlier patch, this fixes#8.
The basic approach is to generate a single expression tree containing the code for the initialization plus the reference to the compound literal (or its address). The various subexpressions are joined together with pc_bno pcodes, similar to the code generated for the comma operator. The initializer expressions are placed in a balanced binary tree, so that it is not excessively deep.
Note: Common subexpression elimination has poor performance for very large trees. This is not specific to compound literals, but compound literals for relatively large arrays can run into this issue. It will eventually complete and generate a correct program, but it may be quite slow. To avoid this, turn off CSE.
It should only be done after all the ## operators in the macro have been evaluated, potentially merging together several tokens via successive ## operators.
Here is an example illustrating the problem:
#define merge(a,b,c) a##b##c
#define foobar
#define foobarbaz a
int merge(foo,bar,baz) = 42;
int main(void) {
return a;
}
If such macros were used within other macros, they would generally not be expanded, due to the order in which operations were evaluated during preprocessing.
This is actually an issue that was fixed by the changes from ORCA/C 2.1.0 to 2.1.1 B3, but then broken again by commit d0b4b75970.
Here is an example with the name of a keyword:
#define X long int
#define long
X x;
int main(void) {
return sizeof(x); /* should be sizeof(int) */
}
Here is an example with the name of a typedef:
typedef short T;
#define T long
#define X T
X x;
int main(void) {
return sizeof(x); /* should be sizeof(long) */
}
This allows those tokens (asm, comp, extended, pascal, and segment) to be used as identifiers, consistent with the C standards.
A new pragma (#pragma extensions) is introduced to control this. It might also be used for other things in the future.
This did not work correctly before, because such tokens were recorded as starting with the third character of the trigraph.
Here is an example affected by this:
#define mkstr(a) # a
#include <stdio.h>
int main(void) {
puts(mkstr(??!));
puts(mkstr(??!??!));
puts(mkstr('??<'));
puts(mkstr(+??!));
puts(mkstr(+??'));
}
A suffix will now be printed on any integer constant with a type other than int, or any floating constant with a type other than double. This ensures that all constants have the correct types, and also serves as documentation of the types.
Previously, continuations or trigraphs would be included in the string as-is, which should not be the case because they are (conceptually) processed in earlier compilation phases. Initial trigraphs still do not get stringized properly, because the token starting position is not recorded correctly for them.
This fixes code like the following:
#define mkstr(a) # a
#include <stdio.h>
int main(void) {
puts(mkstr(a\
bc));
puts(mkstr(qr\
));
puts(mkstr(\
xy));
puts(mkstr(12??/
34));
puts(mkstr('??<'));
}
This is necessary for correct behavior if such tokens are subsequently stringized with #. Previously, only the first half of the token would be produced.
Here is an example demonstrating the issue:
#define mkstr(a) # a
#define in_between(a) mkstr(a)
#define joinstr(a,b) in_between(a ## b)
#include <stdio.h>
int main(void) {
puts(joinstr(123,456));
puts(joinstr(abc,def));
puts(joinstr(dou,ble));
puts(joinstr(+,=));
puts(joinstr(:,>));
}
They were not being saved, which would result in ORCA/C not searching the proper paths when looking for an include file after the sym file had ended. Here is an example showing the problem:
#pragma path "include"
#include <stdio.h>
int k = 50;
#include "n.h" /* will not find include:n.h */
There were various places where the flag for macro expansions was saved, set to false, and then later restored. If #pragma expand was used within those areas, it would not be properly applied. Here is an example showing that problem:
void f(void
#pragma expand 1
) {}
This could also affect some uses of #pragma expand within precompiled headers, e.g.:
#pragma expand 1
#include "a.h"
#undef foobar
#include "b.h"
...
Also, add a note saying that code in precompiled headers will not be expanded. (This has always been the case, but was not clearly documented.)
This would occur if the macro had already been saved in the sym file and the #undef occurred before a subsequent #include that was also recorded in the sym file. The solution is simply to terminate sym file generation if an #undef of an already-saved macro is encountered.
Here is an example showing the problem:
test.c:
#include "test1.h"
#undef x
#include "test2.h"
int main(void) {
#ifdef x
return x;
#else
return y;
#endif
}
test1.h:
#define x 27
test2.h:
#define y 6
Macros and include paths from the cc= parameters may be included in the symbol file, so incorrect behavior could result if the symbol file was used for a later compilation with different cc= parameters.
There were a couple issues that could occur with #pragma keep and sym files:
*If a source file used #pragma keep but it was overridden by KEEP= on the command line or {KeepName} in the shell, then the overriding keep name would be saved to the sym file. It would therefore be applied to subsequent compilations even if it was no longer specified in the command line or shell variable.
*If a source file used #pragma keep, that keep name would be recorded in the sym file. On subsequent compilations, it would always be used, overriding any keep name specified by the command line or shell, contrary to the usual rule that the name on the command line takes priority.
With this patch, the keep name recorded in the sym file (if any) should always be the one specified by #pragma keep, but it can be overridden as usual.
This affects functions whose body spans multiple files due to includes, or is treated as doing so due to #line directives. ORCA/C will now generate a COP 6 instruction to record each source file change, allowing debuggers to properly track the flow of execution across files.
This causes __FILE__ to give the name of an include file if used within it, which seems to be what the standards intend (and what other compilers do). It also affects the file name recorded in debugging information for functions declared in an include file.
(Note that occ will generate a #line directive before an #append, essentially to work around the problem this patch fixes. After the patch, such a #line directive is effectively ignored. This should be OK, although it may result in a difference in whether a full or partial pathname is used for __FILE__ and in debug info.)
There were several forms that should be permitted but were not, such as &"str"[1], &*"str", &*a (where a is an array), and &*f (where f is a function).
This fixes#15 and also certain other cases illustrated in the following example:
char a[10];
int main(void);
static char *s1 = &"string"[1];
static char *s2 = &*"string";
static char *s3 = &*a;
static int (*f2)(void)=&*main;
This is necessary to allow declarations of pascal-qualified function pointers as members of a structure, among other things.
Note that the behavior for "pascal" now differs from that for the standard function specifiers, which have more restrictive rules for where they can be used. This is justified by the fact that the "pascal" qualifier is allowed and meaningful for function pointer types, so it should be able to appear anywhere they can.
This fixes#28.
The parameters of the underlying function type were not being reversed when applying the "pascal" qualifier to a function pointer type. This resulted in the parameters not being in the expected order when a call was made using such a function pointer. This could result in spurious errors in some cases or inappropriate parameter conversions in others.
This fixes#75.