These are now treated equivalently to the old versions that start with _.
Note that thread_localsy and alignassy are canonicalized to _Thread_localsy and _Alignassy when recorded as declaration modifiers.
This is allowed based on the C standard syntax, but it previously gave a spurious error in ORCA/C, because the parenthesized type name at the beginning of the compound literal was parsed as the complete operand to sizeof.
Here is an example program affected by this:
int main(void) {
    return sizeof (char[]){1,2,3}; // should return 3
}
They were not being properly recognized as structs/unions, so they were being passed by address rather than by value as they should be.
Here is an example affected by this:
struct S {int a,b,c,d;};
int f(struct S s) {
    return s.a + s.b + s.c + s.d;
}
int main(void) {
    const struct S s = {1,2,3,4};
    return f(s);
}
There was code that would attempt to use the cType field of the type record, but this is only valid for scalar types, not pointer types. In the case of a pointer type, the upper two bytes of the pointer would be interpreted as a cType value, and if they happened to have one of the values being tested for, incorrect intermediate code would be generated. The lower two bytes of the pointer would be used as a baseType value; this would most likely result in "compiler error" messages from the code generator, but might cause incorrect code generation with no errors if that value happened to correspond to a real baseType.
Code like the following might cause this error, although it only occurs if pointers have certain values and therefore depends on the memory layout at compile time:
void f(const int **p) {
    (*p)++;
}
This bug was introduced in commit f2a66a524a23d.
Division by zero produces undefined behavior if it is evaluated, but in general we cannot tell whether a given expression will actually be evaluated at run time, so we should not report this as a compile-time error.
We still report an error for division by zero in constant expressions that need to be evaluated at compile time. We also still produce a lint message about division by zero if the appropriate flag is enabled.
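For instance, here is a minimal sketch (not from the original change) of code that should now compile without a hard error:

int f(int flag) {
    if (flag)
        return 1 / 0;   // undefined behavior only if this is actually executed
    return 0;
}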
This could occur because when FindSymbol was called to look for symbols in all spaces, it would find a tag in an inner scope before a typedef in an outer scope. The processing order has been changed to look for regular symbols (including typedefs) in any scope, and only look for tags if no regular symbol is found.
Here is an example illustrating the problem:
typedef int T;
int main(void) {
    struct T;
    T x;
}
The old approach would call GenerateCode twice for each parameter expression. Now, it is only called once. This is faster, and also avoids some oddities with error handling. With the previous approach, expressionType would not be set if there was an error in the expression, which could lead to additional spurious errors. Also, a lint message treated as a warning could appear twice.
This could happen in some cases where one subexpression of a larger expression was a constant. One effect of this was to cause spurious "lint: implicit conversion changes value of constant" messages in certain cases (when that lint check was enabled). It may also have caused certain errors to be missed in other situations.
It is legal for source files that do not include <stdio.h> to define static functions named printf, scanf, etc. These obviously are not the standard library functions and are not required to work the same way as those functions, so they should not be subject to format checking.
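Here is a sketch of the kind of code that should not be subject to format checking (the signature chosen for the local printf is arbitrary):

static int printf(const char *s, int n) {   // not the library function
    return n;
}
int main(void) {
    return printf("not a format string %q", 42);   // no format checking applies
}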
This adds debugging code to detect null pointer dereferences, as well as pointer arithmetic on null pointers (which is also undefined behavior, and can lead to later dereferences of the resulting pointers).
Note that ORCA/Pascal can already detect null pointer dereferences as part of its more general range-checking code. This implementation for ORCA/C will report the same error as ORCA/Pascal ("Subrange exceeded"). However, it does not include any of the other forms of range checking that ORCA/Pascal does, and (unlike in ORCA/Pascal) it is controlled by a separate flag from stack overflow checking.
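A minimal sketch of code the new checking would flag at run time, assuming the corresponding flag is enabled:

int main(void) {
    int *p = 0;
    int *q = p + 1;   // pointer arithmetic on a null pointer
    return *p;        // null pointer dereference
}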
These are erroneous, in situations where the expression is used for its value. For function return types, this violates a constraint (C17 6.5.2.2 p1), so a diagnostic is required. We also now diagnose this issue for identifier expressions or unary * (indirection) expressions. These cases cause undefined behavior per C17 6.3.2.1 p2, so a diagnostic is not required, but it is nice to give one.
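A sketch of the kinds of expressions now diagnosed, assuming the change covers expressions of incomplete type used for their value (as the cited constraints suggest):

struct S;             // incomplete type
struct S f(void);
struct S *sp;
int main(void) {
    f();              // constraint violation: C17 6.5.2.2 p1
    *sp;              // undefined behavior: C17 6.3.2.1 p2
    return 0;
}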
Varargs-only stack repair (i.e. using #pragma optimize bit 3 but not bit 6) was broken by commit 32975b720fb4d79. It removed some code that was needed to allocate the direct page location used to hold the stack pointer value in that case. This would lead to invalid code being produced, which could cause a crash when run. The fix is to revert the erroneous parts of commit 32975b720fb4d79 (which do not affect its core purpose of enabling intermediate code peephole optimization to be used when stack repair code is active).
This occurs when the constant value is out of range of the type being assigned to. This is likely indicative of an error, or of code that assumes types have larger ranges than they do in ORCA/C (e.g. 32-bit int).
This intentionally does not report cases where a value is assigned to a signed type but is within the range of the corresponding unsigned type, or vice versa. These may be done intentionally, e.g. setting an unsigned value to "-1" or setting a signed value using a hex constant with the high bit set. Also, only conversions to 8-bit or 16-bit integer types are currently checked.
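A sketch of what would and would not be reported, assuming the lint check is enabled and ORCA/C's 16-bit int:

int main(void) {
    int i;
    unsigned u;
    i = 100000;   // out of range of both int and unsigned int: reported
    u = -1;       // within the range of the corresponding signed type: not reported
    i = 0xFFFF;   // within the range of the corresponding unsigned type: not reported
    return 0;
}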
I think the reason this was originally disallowed is that the old code sequence for stack repair code (in ORCA/C 2.1.0) ended with TYA. If this was followed by STA dp or STA abs, the native code peephole optimizer (prior to commit 7364e2d2d329d81) would have turned the combination into a STY instruction. That is invalid if the value in A is needed. This could come up, e.g., when assigning the return value from a function to two different variables.
This is no longer an issue, because the current code sequence for stack repair code no longer ends in TYA and is not susceptible to the same kind of invalid optimization. So it is no longer necessary to disable the native code peephole optimizer when using stack repair code (either for all calls or just varargs calls).
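As a hypothetical illustration of the scenario mentioned above (a return value assigned to two variables):

int g(void);
int a, b;
void h(void) {
    // With the old TYA-based stack repair sequence, rewriting TYA / STA into
    // STY would have left A without the value needed for the second store.
    b = a = g();
}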
This is necessary both to detect errors (using unary + on non-arithmetic types) and to correctly perform the integer promotions when unary + is used (which can be detected with sizeof or _Generic).
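For example, the promotion can be observed like this (a sketch, relying on ORCA/C's 16-bit int):

#include <stdio.h>
int main(void) {
    char c = 0;
    printf("%i\n", _Generic(+c, int: 1, char: 2, default: 0));   // should print 1
    printf("%i\n", (int)sizeof(+c));                             // should print 2, i.e. sizeof(int)
}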
This affects calls to a varargs function that do not actually supply any arguments beyond the fixed portion of the argument list, e.g. printf("foo"). Since these calls do not supply any variable arguments, they clearly do not include any extra variable arguments beyond those used by the function, so the standards-conformance issue that requires varargs stack repair code does not apply.
It is possible that the call may include too few variable arguments, but that is illegal behavior, and it will trash the stack even when varargs stack repair code is present (although in some cases programs may "get away" with it). If stack repair code around these calls is still desired, then general stack repair code can be enabled for all function calls.
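A sketch of which calls are affected, assuming varargs-only stack repair is enabled via the appropriate #pragma optimize bit (bit 3, as described above):

#include <stdio.h>
int main(void) {
    printf("foo");        // no variable arguments: no repair code needed
    printf("%i\n", 42);   // variable argument present: repair code still generated
    return 0;
}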
This is a minimal implementation that does not actually inline anything, but it is intended to implement the semantics defined by the C99 and later standards.
One complication is that a declaration that appears somewhere after the function body may create an external definition for a function that appeared to be an inline definition when it was defined. To support this while preserving ORCA/C's general one-pass compilation strategy, we generate code even for inline definitions, but treat them as private and add the prefix "~inline~" to the name. If they are "un-inlined" based on a later declaration, we generate a stub with external linkage that just jumps to the apparently-inline function.
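A sketch of the "un-inlining" scenario described above:

inline int f(int x) {
    return x + 1;
}
extern int f(int x);   // appears after the body: f must now have an external definition
int main(void) {
    return f(2);
}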
They are now represented in local structures instead. This keeps the representation of declaration specifiers together and eliminates the need for awkward and error-prone code to save and restore the global variables.
This covers code like the following, which is very dubious but does not seem to be clearly prohibited by the standards:
int main(void) {
    void *vp;
    *vp;
}
Previously, this would do an indirect load of a four-byte value at the location, but then treat it as void. This could lead to the four-byte value being left on the stack, eventually causing a crash. Now we just evaluate the pointer expression (in case it has side effects), but effectively cast it to void without dereferencing it.
This was non-standard in various ways, mainly in regard to pointer types. It has been rewritten to closely follow the specification in the C standards.
Several helper functions dealing with types have been introduced. They are currently only used for ? :, but they might also be useful for other purposes.
New tests are also introduced to check the behavior for the ? : operator.
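For example, the standard rules give these result types, which _Generic can observe (a sketch, not one of the new tests):

#include <stdio.h>
int main(void) {
    int *p = 0;
    const int *cp = 0;
    void *vp = 0;
    // should print "1 1"
    printf("%i %i\n", _Generic(1 ? p : cp, const int *: 1, default: 0),
                      _Generic(1 ? p : vp, void *: 1, default: 0));
}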
This fixes #35 (including the initializer-specific case).
This affects expressions like &*a (where a is an array) or &*"string". In most contexts, these undergo array-to-pointer conversion anyway, but as an operand of sizeof they do not. This leads to sizeof either giving the wrong value (the size of the array rather than of a pointer) or reporting an error when the array size is not recorded as part of the type (which is currently the case for string constants).
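For example (a sketch, not from the original change):

int main(void) {
    int a[10];
    // both comparisons should be true, so this returns 1
    return (sizeof &*a == sizeof (int *)) && (sizeof &*"abc" == sizeof (char *));
}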
In combination with an earlier patch, this fixes #8.
The basic approach is to generate a single expression tree containing the code for the initialization plus the reference to the compound literal (or its address). The various subexpressions are joined together with pc_bno pcodes, similar to the code generated for the comma operator. The initializer expressions are placed in a balanced binary tree, so that it is not excessively deep.
Note: Common subexpression elimination has poor performance for very large trees. This is not specific to compound literals, but compound literals for relatively large arrays can run into this issue. It will eventually complete and generate a correct program, but it may be quite slow. To avoid this, turn off CSE.
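To make the balanced-tree idea described above concrete, here is a minimal sketch in C (not the compiler's actual code; Node and Join are hypothetical stand-ins for the intermediate-code structures and the pc_bno join):

#include <stdlib.h>
typedef struct Node {
    struct Node *left, *right;
} Node;
static Node *Join(Node *left, Node *right) {   // models joining two subtrees with pc_bno
    Node *n = malloc(sizeof(Node));
    n->left = left;
    n->right = right;
    return n;
}
static Node *BuildBalanced(Node *exprs[], int lo, int hi) {
    int mid;
    if (lo == hi)
        return exprs[lo];   // a single initializer expression
    mid = (lo + hi) / 2;
    return Join(BuildBalanced(exprs, lo, mid),
                BuildBalanced(exprs, mid + 1, hi));   // tree depth stays O(log n)
}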
Previously, one-byte loads were typically done by reading a 16-bit value and then masking off the upper 8 bits. This is a problem when accessing softswitches or slot IO locations, because reading the subsequent byte may have some undesired effect. Now, ORCA/C will do an 8-bit read for such cases, if the volatile qualifier is used.
There were also a couple optimizations that could occasionally result in not all the bytes of a larger value actually being read. These are now disabled for volatile loads that may access softswitches or IO.
These changes should make ORCA/C more suitable for writing low-level software like device drivers.
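A sketch of the kind of access this enables (the address and its hardware meaning are illustrative):

#define SOFTSWITCH (*(volatile unsigned char *) 0xC000)
int main(void) {
    unsigned char value = SOFTSWITCH;   // now a single 8-bit read; the adjacent byte is not touched
    return value;
}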
This affects field selection expressions where the left expression is a struct/union assignment or a function call returning a struct or union. Such expressions should be accepted, but they were giving spurious errors.
The following program illustrates the problem:
struct S {int a,b;} x, y={2,3};
struct S f(void) {
    struct S s = {7,8};
    return s;
}
int main(void) {
    return f().a + (x=y).b;
}
Compound literals outside of functions should work at this point.
Compound literals inside of functions are not fully implemented, so they are disabled for now. (There is some code to support them, but the code to actually initialize them at the appropriate time is not written yet.)
These rules are used if loose type checks are disabled. They are intended to strictly implement the constraints in C17 sections 6.5.9 and 6.5.10.
This patch also fixes a bug where object pointer comparisons to "const void *" should be permitted but were not.
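A sketch of a comparison that the strict rules permit, including the previously rejected "const void *" case:

int main(void) {
    int x;
    int *ip = &x;
    const void *cvp = &x;
    return ip == cvp;   // object pointer vs. pointer to qualified void: allowed
}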
This causes an error to be produced when trying to assign to these fields, which was being allowed before. It is also necessary for correct behavior of _Generic in some cases.
These are needed to correctly distinguish pointer types in _Generic. They should also be used for type compatibility checks in other contexts, but currently are not.
This also fixes a couple small problems related to type qualifiers:
* restrict was not allowed to appear after * in type-names
* volatile status was not properly recorded in sym files
Here is an example of using _Generic to distinguish pointer types based on the qualifiers of the pointed-to type:
#include <stdio.h>
#define f(e) _Generic((e),\
    int * restrict *: 1,\
    int * volatile const *: 2,\
    int **: 3,\
    default: 0)
#define g(e) _Generic((e),\
    int *: 1,\
    const int *: 2,\
    volatile int *: 3,\
    default: 0)
int main(void) {
    int * restrict * p1;
    int * volatile const * p2;
    int * const * p3;
    // should print "1 2 0 1"
    printf("%i %i %i %i\n", f(p1), f(p2), f(p3), f((int * restrict *)0));
    int *q1;
    const int *q2;
    volatile int *q3;
    const volatile int *q4;
    // should print "1 2 3 0"
    printf("%i %i %i %i\n", g(q1), g(q2), g(q3), g(q4));
}
Here is an example of a problem resulting from volatile not being recorded in sym files (if a sym file was present, the read of x was lifted out of the loop):
#pragma optimize -1
static volatile int x;
#include <stdio.h>
int main(void) {
    int y;
    for (unsigned i = 0; i < 100; i++) {
        y = x*2 + 7;
    }
}
This should be allowed, but it previously could lead to spurious errors in contexts like argument lists, where a comma would normally be expected to end the expression.
The following example program demonstrated the problem:
#include <stdlib.h>
int main(void) {
    return abs(1 ? 2,-3 : 4);
}
For now, this is only used for _Generic expressions. Eventually, it should probably replace the current CompTypes, but CompTypes currently performs somewhat looser checks that are suitable for some situations, so adjustments would be needed at some call sites.