This differs from the usual ORCA/C behavior of treating all floating-point parameters as extended. With the option enabled, they will still be passed in the extended format, but will be converted to their declared type at the start of the function. This is needed for strict standards conformance, because you should be able to take the address of a parameter and get a usable pointer to its declared type. The difference in types can also affect the behavior of _Generic expressions.
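Here is an example illustrating the difference (with the option enabled, the comments describe the behavior):
int f(float x) {
float *p = &x; /* p is a usable pointer to float */
return _Generic(*p, float: 1, default: 2); /* yields 1, not 2 */
}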
The implementation of this is based on ORCA/Pascal, which already did the same thing (unconditionally) with real/double/comp parameters.
If strict type checking is enabled, this will prohibit redefinition of enums, like:
enum E {a,b,c};
enum E {x,y,z};
It also prohibits use of an "enum E" type specifier if the enum has not been previously declared (with its constants).
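For example, this is now an error if there is no prior declaration of enum E:
enum E e;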
These things were historically supported by ORCA/C, but they are prohibited by constraints in section 6.7.2.3 of C99 and later. (The C90 wording was different and less clear, but I think they were not intended to be valid there either.)
This makes a macro defined on the command line like -Dfoo=-1 consist of two tokens, the same as it would if defined in code. (Previously, it was just one token.)
This also somewhat expands the set of macros accepted on the command line. A prefix of +, -, *, &, ~, or ! (the one-character unary operators) can now be used ahead of any identifier, number, or string. Empty macro definitions like -Dfoo= are also permitted.
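For example, -Dfoo=-1 now behaves the same as the following definition in code:
#define foo -1
with foo expanding to the two tokens - and 1.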
The basic approach is to generate a single expression tree containing the code for the initialization plus the reference to the compound literal (or its address). The various subexpressions are joined together with pc_bno pcodes, similar to the code generated for the comma operator. The initializer expressions are placed in a balanced binary tree, so that it is not excessively deep.
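As an illustration, for a compound literal like:
(int[]){1, 2, 3}
the stores of 1, 2, and 3 plus the reference to the literal are combined into one such tree, with the three initializer expressions placed in the balanced binary tree.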
Note: Common subexpression elimination has poor performance for very large trees. This is not specific to compound literals, but compound literals for relatively large arrays can run into this issue. It will eventually complete and generate a correct program, but it may be quite slow. To avoid this, turn off CSE.
Expansion of a macro whose name is produced by ## should only be done after all the ## operators in the macro have been evaluated, potentially merging together several tokens via successive ## operators.
Here is an example illustrating the problem:
#define merge(a,b,c) a##b##c
#define foobar
#define foobarbaz a
int merge(foo,bar,baz) = 42;
int main(void) {
return a;
}
If such macros were used within other macros, they would generally not be expanded, due to the order in which operations were evaluated during preprocessing.
This is actually an issue that was fixed by the changes from ORCA/C 2.1.0 to 2.1.1 B3, but then broken again by commit d0b4b75970.
Here is an example with the name of a keyword:
#define X long int
#define long
X x;
int main(void) {
return sizeof(x); /* should be sizeof(int) */
}
Here is an example with the name of a typedef:
typedef short T;
#define T long
#define X T
X x;
int main(void) {
return sizeof(x); /* should be sizeof(long) */
}
This is part of the general requirement that macro redefinitions be "identical" as defined in the standard.
This affects code like:
#define x [
#define x <:
Here [ and <: form the same token, but they are spelled differently, so the redefinition does not count as "identical" and is now diagnosed.
This allows those tokens (asm, comp, extended, pascal, and segment) to be used as identifiers, consistent with the C standards.
A new pragma (#pragma extensions) is introduced to control this. It might also be used for other things in the future.
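Usage might look like this (assuming a flags operand where clearing the low-order bit disables the extra keywords; the exact semantics are defined by the pragma's documentation):
#pragma extensions 0
int comp; /* accepted as an ordinary identifier */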
This did not work correctly before, because such tokens were recorded as starting with the third character of the trigraph.
Here is an example affected by this:
#define mkstr(a) # a
#include <stdio.h>
int main(void) {
puts(mkstr(??!));
puts(mkstr(??!??!));
puts(mkstr('??<'));
puts(mkstr(+??!));
puts(mkstr(+??'));
}
A suffix will now be printed on any integer constant with a type other than int, or any floating constant with a type other than double. This ensures that all constants have the correct types, and also serves as documentation of the types.
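For example, a constant 70000 (which has type long, since int is 16 bits in ORCA/C) would now print as 70000L, and a float constant 0.5 would print as 0.5F, while plain int and double constants are still printed without suffixes.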
Previously, continuations or trigraphs would be included in the string as-is, which should not be the case because they are (conceptually) processed in earlier compilation phases. Initial trigraphs still do not get stringized properly, because the token starting position is not recorded correctly for them.
This fixes code like the following:
#define mkstr(a) # a
#include <stdio.h>
int main(void) {
puts(mkstr(a\
bc));
puts(mkstr(qr\
));
puts(mkstr(\
xy));
puts(mkstr(12??/
34));
puts(mkstr('??<'));
}
This is necessary for correct behavior if such tokens are subsequently stringized with #. Previously, only the first half of the token would be produced.
Here is an example demonstrating the issue:
#define mkstr(a) # a
#define in_between(a) mkstr(a)
#define joinstr(a,b) in_between(a ## b)
#include <stdio.h>
int main(void) {
puts(joinstr(123,456));
puts(joinstr(abc,def));
puts(joinstr(dou,ble));
puts(joinstr(+,=));
puts(joinstr(:,>));
}
The string representation of macro tokens is needed for some preprocessor operations, but we get this in other ways (e.g. based on tokenStart/tokenEnd).
These are conceptually separate operations occurring in different phases of the translation process. This change means that ## can no longer merge string constants: such operations will give an error about an illegal token. Cases like this are technically undefined behavior, so the old behavior could have been permitted, but it is clearer and more consistent with other compilers to treat this as an error.
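Here is an example that now gives an error:
#define concat(a,b) a##b
char *s = concat("abc", "def"); /* error: ## cannot merge the two strings into one token */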
This ultimately should be supported, but that will be more work. For now, we just set the string representation to '?', which will usually give an error when merged. (Previously, whatever was at memory location 0 would be treated as the string representation of the token. Frequently this would just be an empty string, leading to no error but incorrect results.)
This is necessary for correct operation of the # and ## preprocessor operators on the tokens from such macros.
Integers with a sign character still have the non-standard property of being treated as a single token, so they cannot be used with ##, but in most cases such uses will now give an error.
If the appended file was another C file and that file contained an #include, this would create an invalid record in the sym file. The record would cover a range of memory extending from the buffer holding the original file to the buffer holding the appended file. In general, these buffers are not contiguous, so superfluous data from other parts of memory would be included in the sym file. Such a record would normally just be treated as invalid on subsequent compiles, but it could theoretically be very large (depending on the memory layout) and might contain sensitive data from other parts of memory.
They were not being saved, which would result in ORCA/C not searching the proper paths when looking for an include file encountered after the end of the data recorded in the sym file. Here is an example showing the problem:
#pragma path "include"
#include <stdio.h>
int k = 50;
#include "n.h" /* will not find include:n.h */
There were various places where the flag for macro expansions was saved, set to false, and then later restored. If #pragma expand was used within those areas, it would not be properly applied. Here is an example showing that problem:
void f(void
#pragma expand 1
) {}
This could also affect some uses of #pragma expand within precompiled headers, e.g.:
#pragma expand 1
#include "a.h"
#undef foobar
#include "b.h"
...
Also, add a note saying that code in precompiled headers will not be expanded. (This has always been the case, but was not clearly documented.)
Previously, these might or might not be saved (based on the contents of uninitialized memory), but in many cases they were. This was unnecessary, since these macros are automatically defined when the scanner is initialized. Reading them from the sym file could result in duplicate copies of them in the macro list. This is usually harmless, but might result in #undefs of macros from the command line not working properly.
This would occur if the macro had already been saved in the sym file and the #undef occurred before a subsequent #include that was also recorded in the sym file. The solution is simply to terminate sym file generation if an #undef of an already-saved macro is encountered.
Here is an example showing the problem:
test.c:
#include "test1.h"
#undef x
#include "test2.h"
int main(void) {
#ifdef x
return x;
#else
return y;
#endif
}
test1.h:
#define x 27
test2.h:
#define y 6
There were a couple of issues that could occur with #pragma keep and sym files:
*If a source file used #pragma keep but it was overridden by KEEP= on the command line or {KeepName} in the shell, then the overriding keep name would be saved to the sym file. It would therefore be applied to subsequent compilations even if it was no longer specified in the command line or shell variable.
*If a source file used #pragma keep, that keep name would be recorded in the sym file. On subsequent compilations, it would always be used, overriding any keep name specified by the command line or shell, contrary to the usual rule that the name on the command line takes priority.
With this patch, the keep name recorded in the sym file (if any) should always be the one specified by #pragma keep, but it can be overridden as usual.
This affects functions whose body spans multiple files due to includes, or is treated as doing so due to #line directives. ORCA/C will now generate a COP 6 instruction to record each source file change, allowing debuggers to properly track the flow of execution across files.
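Here is an example of a function spanning two files (the file names are illustrative):
f.c:
int f(int x) {
#include "fbody.h"
}
fbody.h:
return x + 1;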
This causes __FILE__ to give the name of an include file if used within it, which seems to be what the standards intend (and what other compilers do). It also affects the file name recorded in debugging information for functions declared in an include file.
(Note that occ will generate a #line directive before an #append, essentially to work around the problem this patch fixes. After the patch, such a #line directive is effectively ignored. This should be OK, although it may result in a difference in whether a full or partial pathname is used for __FILE__ and in debug info.)
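For example, if an include file named header.h (an illustrative name) contains:
static const char *name = __FILE__;
then name will now be "header.h" rather than the name of the file that included it.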
There were several existing optimizations that could change behavior in ways that violated the IEEE standard with regard to infinities, NaNs, or signed zeros. They are now gated behind a new #pragma optimize flag. This change allows intermediate code peephole optimization and common subexpression elimination to be used while maintaining IEEE conformance, but also keeps the rule-breaking optimizations available if desired.
See section F.9.2 of recent C standards for a discussion of how these optimizations violate IEEE rules.
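For example, simplifications like the following are among those now gated behind the flag:
x + 0.0 /* not equivalent to x when x is -0.0 */
x - x /* not 0.0 when x is a NaN or an infinity */
x * 0.0 /* not 0.0 when x is a NaN or an infinity */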
This allows the length of the string plus a few extra bytes used internally to be represented by a 16-bit integer. Since the size limit for memory allocations has been raised, there is no good reason to impose a shorter limit on strings.
Note that C99 and later specify a minimum translation limit for string constants of at least 4095 characters.
We previously ignored this, but it is a constraint violation under the C standards, so it should be reported as an error.
GCC and Clang allow this as an extension, as we were effectively doing previously. We will follow the standards for now, but if there was demand for such an extension in ORCA/C, it could be re-introduced subject to a #pragma ignore flag.
The code for this was recursive and could overflow if there were several dozen consecutive string literals. It has been changed to only use one level of recursion, avoiding the problem.
Compound literals outside of functions should work at this point.
Compound literals inside of functions are not fully implemented, so they are disabled for now. (There is some code to support them, but the code to actually initialize them at the appropriate time is not written yet.)
The standard wording is not always clear on these cases, but I think at least some of them should be allowed and others may be undefined behavior (which we can choose to allow). At any rate, this allows non-standard escape sequences targeted at other compilers to appear in skipped-over code.
There probably ought to be similar handling for #defines that are never expanded, but that would require more code changes.
This applies to octal and hexadecimal sequences with out-of-range values, and also to unrecognized escape characters. The C standards say both of these cases are syntax/constraint violations requiring a diagnostic.
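Here are examples of each case, all of which now receive a diagnostic:
char a = '\q'; /* unrecognized escape character */
char b = '\400'; /* octal escape out of range */
char c = '\x100'; /* hexadecimal escape out of range */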