The motivation for this is that allocating and clearing symbol tables is a common operation, especially with C99+, where a construct like "if (...) { ... }" involves three levels of scope with their own symbol tables. In some tests, it could take an appreciable fraction of total execution time (sometimes ~10%).
This patch allows symbol tables that have already been allocated and cleared to be reused for a subsequent scope, as long as they are still empty. It does this by maintaining a pool of empty symbol tables and taking one from there rather than allocating a new one when possible.
We impose a somewhat arbitrary limit of MaxBlock/150000 on the number of symbol tables we keep, to avoid filling up memory with them. It would probably be better to use purgeable handles here, but that would be a little more work, and this should be good enough for now.
This mostly implements the rule in C17 6.9 p3, which requires a definition to be provided only if the function is used in an expression. Per that rule, we should also exclude most sizeof or _Alignof operands, but we don't do that yet.
This would previously happen if a segment directive with "dynamic" appeared before the first function in the program. That would cause the resulting program not to work, because the root segment needs to be a static segment at the start of the program, but if it is dynamic it would come after a jump table and a static segment of library code.
The root segments are also configured to refer to main or the NDA/CDA entry points using LEXPR records, so that they can be in dynamic segments (not that they necessarily should be). That change is intentionally not done for CDEV/XCMD/NBA, because they use code resources, which do not support dynamic segments, so it is better to force a linker error in these cases.
This avoids needing to generate many intermediate code records representing the data at most 8 bytes at a time, which should reduce memory use and probably improve performance for large initialized arrays or structs.
Static initialization of arrays/structs/unions now essentially "executes" the initializer records to fill in a buffer (and keep track of relocations), then emits pcode to represent that initialized state. This supports overlapping and out-of-order initializer records, as can be produced by designated initialization.
This is a minimal implementation that does not actually inline anything, but it is intended to implement the semantics defined by the C99 and later standards.
One complication is that a declaration that appears somewhere after the function body may create an external definition for a function that appeared to be an inline definition when it was defined. To support this while preserving ORCA/C's general one-pass compilation strategy, we generate code even for inline definitions, but treat them as private and add the prefix "~inline~" to the name. If they are "un-inlined" based on a later declaration, we generate a stub with external linkage that just jumps to the apparently-inline function.
If the extern declaration refers to a global variable/function for which a declaration is already visible, the inner declaration should have the composite type (and it is an error if the types are incompatible).
This affects programs like the following:
static char a[60] = {5};
int main(void) {
extern char a[];
return sizeof(a)+a[0]; /* should return 65 */
}
Function declarations within a block are now entered within its symbol table rather than moved to the global one. Several error checks are also added or tightened.
This fixes at least one bug: if a function declared within a block had the same name as a variable in an outer scope, the symbol table entry for that variable could be corrupted, leading to spurious errors or incorrect code generation. This example program illustrates the problem:
/* This should compile without errors and return 2 */
int f(void) {return 1;}
int g(void) {return 2;}
int main(void) {
int (*f)(void) = g;
{
int f(void);
}
f = g;
return f();
}
Errors now detected include:
*Duplicate declarations of a static variable within a block (with the second one initialized)
*Duplicate declarations of the same variable as static and non-static
*Declaration of the same identifier as a typedef and a variable (at file scope)
Previously, it generally just used the later type (except for function types where only the earlier one included a prototype). One effect of this is that if a global array is first declared with a size and then redeclared without one, the size information is lost, causing the proper space not to be allocated.
See C17 section 6.2.7 p4.
Here is an example affected by the array issue (dump the object file to see the size allocated):
int foo[50];
int foo[];
This seemed to be aimed at supporting lazy allocation of symbol tables. That could be a useful optimization, but the code that existed was incomplete and did not do anything useful. That or similar code could be reintroduced as part of a full implementation of lazy allocation, if it is ever done.
Note that this implementation allows anonymous structures and unions to participate in initialization. That is, you can have a braced initializer list corresponding to an anonymous structure or union. Also, anonymous structures within unions follow the initialization rules for structures (and vice versa).
I think the better interpretation of the standard text is that anonymous structures and unions cannot participate in initialization as such, and instead their members are treated as members of the containing structure or union for purposes of initialization. However, all other compilers I am aware of allow anonymous structures and unions to participate in initialization, so I have implemented it that way too.
This differs from the usual ORCA/C behavior of treating all floating-point parameters as extended. With the option enabled, they will still be passed in the extended format, but will be converted to their declared type at the start of the function. This is needed for strict standards conformance, because you should be able to take the address of a parameter and get a usable pointer to its declared type. The difference in types can also affect the behavior of _Generic expressions.
The implementation of this is based on ORCA/Pascal, which already did the same thing (unconditionally) with real/double/comp parameters.
This was non-standard in various ways, mainly in regard to pointer types. It has been rewritten to closely follow the specification in the C standards.
Several helper functions dealing with types have been introduced. They are currently only used for ? :, but they might also be useful for other purposes.
New tests are also introduced to check the behavior for the ? : operator.
This fixes#35 (including the initializer-specific case).
The parameters of the underlying function type were not being reversed when applying the "pascal" qualifier to a function pointer type. This resulted in the parameters not being in the expected order when a call was made using such a function pointer. This could result in spurious errors in some cases or inappropriate parameter conversions in others.
This fixes#75.
This was broken by the varargs changes in commit a20d69a211. The code was not accounting for the internal representation of the parameters being in reverse order, so it was basing address calculations on the first fixed parameter rather than the last one, resulting in the wrong number of bytes being removed from the stack (generally causing a crash).
This affected the c99stdio.c test case, and is now also covered in c99stdarg.c.
In the new implementation, variable arguments are not removed until the end of the function. This allows variable argument processing to be restarted, and it prevents the addresses of local variables from changing in the middle of the function. The requirement to turn off stack repair code around varargs functions is also removed.
This fixes#58.
We previously ignored this, but it is a constraint violation under the C standards, so it should be reported as an error.
GCC and Clang allow this as an extension, as we were effectively doing previously. We will follow the standards for now, but if there was demand for such an extension in ORCA/C, it could be re-introduced subject to a #pragma ignore flag.
This gives a clearer pattern of matching ORCA/C's historical behavior if loose type checks are used, and the documentation is updated accordingly.
It also avoids breaking existing code that may be relying on the old behavior. I am aware of at least one place that does (conflicting declarations of InstallNetDriver in GNO's <gno/gno.h>).
This is a change that was introduced in C17. However, it actually keeps things closer to ORCA/C's historical behavior, which generally ignored qualifiers in return types.
These will check that the prototypes (if present) match in number and types of arguments. This primarily affects operations on function pointers, since similar checks were already done elsewhere on function declarations themselves.
This causes an error to be produced when trying to assign to these fields, which was being allowed before. It is also necessary for correct behavior of _Generic in some cases.
These are needed to correctly distinguish pointer types in _Generic. They should also be used for type compatibility checks in other contexts, but currently are not.
This also fixes a couple small problems related to type qualifiers:
*restrict was not allowed to appear after * in type-names
*volatile status was not properly recorded in sym files
Here is an example of using _Generic to distinguish pointer types based on the qualifiers of the pointed-to type:
#include <stdio.h>
#define f(e) _Generic((e),\
int * restrict *: 1,\
int * volatile const *: 2,\
int **: 3,\
default: 0)
#define g(e) _Generic((e),\
int *: 1,\
const int *: 2,\
volatile int *: 3,\
default: 0)
int main(void) {
int * restrict * p1;
int * volatile const * p2;
int * const * p3;
// should print "1 2 0 1"
printf("%i %i %i %i\n", f(p1), f(p2), f(p3), f((int * restrict *)0));
int *q1;
const int *q2;
volatile int *q3;
const volatile int *q4;
// should print "1 2 3 0"
printf("%i %i %i %i\n", g(q1), g(q2), g(q3), g(q4));
}
Here is an example of a problem resulting from volatile not being recorded in sym files (if a sym file was present, the read of x was lifted out of the loop):
#pragma optimize -1
static volatile int x;
#include <stdio.h>
int main(void) {
int y;
for (unsigned i = 0; i < 100; i++) {
y = x*2 + 7;
}
}
For now, this is only used for _Generic expressions. Eventually, it should probably replace the current CompTypes, but CompTypes currently performs somewhat looser checks that are suitable for some situations, so adjustments would be needed at some call sites.
This gives the name of the current function, as if the following definition appeared at the beginning of the function body:
static const char __func__[] = "function-name";
For now, "long long" is represented with the existing code for the SANE comp format, since their representation is the same except for the comp NaN. This allows existing debuggers that support comp to work with it. The code for "unsigned long long" includes the unsigned flag, so it is unambiguous.
This is contrary to the C standards, but ORCA/C historically permitted it (as do some other compilers), and I think there is a fair amount of existing code that relies on it.
These are initially entered into the symbol table with no known type (itype = nil), so this case should be accounted for in NewSymbol.
This typically would not cause a problem, but might if the zero page contained certain values
const structs are wrapped in definedType. The debugger symbol table code is unaware of this, which results in missing or incomplete entries.
example:
const struct { int a; int b; } cs;
cs: isForwardDeclared = false; class = ident
4 byte constant defined type of
4 byte struct: 223978
const struct { int a; int b; } *pcs;
pcs: isForwardDeclared = false; class = ident
4 byte pointer to
4 byte constant defined type of
4 byte struct: 224145
const struct { const struct { const int a; } a[2]; } csa[5];
csa: isForwardDeclared = false; class = ident
20 byte 5 element array of
4 byte constant defined type of
4 byte struct: 225155
const struct { const struct { const int a; } a[2]; } *cspa[5];
cspa: isForwardDeclared = false; class = ident
20 byte 5 element array of
4 byte pointer to
4 byte constant defined type of
4 byte struct: 224850
This change unwraps the definedType so the underlying type info can be placed in the debugger symbol table.