If one step of peephole optimization produced code that could be further improved by additional peephole optimizations, that follow-up optimization was not always performed. This change ensures the additional optimization is done in several such cases.
This was particularly likely to affect functions containing asm blocks (because CheckLabels would never trigger rescanning in them), but could also occur in other cases.
Here is an example affected by this (generating inefficient code to load a[1]):
#pragma optimize 1
int a[10];
void f(int x) {}
int main(int argc, char **argv) {
    if (argc) return 0;
    f(a[1]);
}
This differs from the usual ORCA/C behavior of treating all floating-point parameters as extended. With the option enabled, they will still be passed in the extended format, but will be converted to their declared type at the start of the function. This is needed for strict standards conformance, because you should be able to take the address of a parameter and get a usable pointer to its declared type. The difference in types can also affect the behavior of _Generic expressions.
The implementation of this is based on ORCA/Pascal, which already did the same thing (unconditionally) with real/double/comp parameters.
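As a minimal illustration (assuming the option is enabled; the function names here are just for the example), taking the address of a float parameter yields a pointer that genuinely refers to a float:

#include <stdio.h>

void g(float x) {
    float *p = &x;        /* a usable pointer to the parameter's declared type */
    *p = 1.0f;            /* modifies x through the float pointer */
    printf("%f\n", (double)x);
}

int main(void) {
    g(2.0f);              /* the argument is still passed in the extended format */
    return 0;
}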
The optimization could turn an unsigned comparison "x <= 0xFFFF" into "x < 0".
Here is an example affected by this:
int main(void) {
    unsigned i = 1;
    return (i <= 0xffff);
}
Previously, one-byte loads were typically done by reading a 16-bit value and then masking off the upper 8 bits. This is a problem when accessing softswitches or slot IO locations, because reading the subsequent byte may have some undesired effect. Now, ORCA/C will do an 8-bit read for such cases, if the volatile qualifier is used.
There were also a couple of optimizations that could occasionally result in not all the bytes of a larger value actually being read. These are now disabled for volatile loads that may access softswitches or IO.
These changes should make ORCA/C more suitable for writing low-level software like device drivers.
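As a small sketch of the kind of access this enables (the $C000 keyboard-data softswitch is used here purely as an illustrative address):

/* With this change, reading through a volatile char pointer performs a true
   one-byte read, so the byte at the following address (which may have side
   effects when read) is never touched. */
volatile unsigned char *keyData = (volatile unsigned char *)0xC000;

unsigned char readKey(void) {
    return *keyData;    /* a single 8-bit read */
}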
There were several existing optimizations that could change behavior in ways that violated the IEEE standard with regard to infinities, NaNs, or signed zeros. They are now gated behind a new #pragma optimize flag. This change allows intermediate code peephole optimization and common subexpression elimination to be used while maintaining IEEE conformance, but also keeps the rule-breaking optimizations available if desired.
See section F.9.2 of recent C standards for a discussion of how these optimizations violate IEEE rules.
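For example, folding a multiplication by zero at compile time is one such rule-breaking transformation (a sketch, not necessarily one of the exact optimizations involved):

/* Rewriting x * 0.0 as 0.0 violates IEEE semantics: if x is an infinity or a
   NaN the result should be a NaN, and if x is negative (or -0.0) the result
   should be -0.0 rather than +0.0. */
double times_zero(double x) {
    return x * 0.0;
}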
This was a bug introduced in commit c95d8d9f9b.
Here is an example of an affected program:
#pragma optimize 1
#include <stdio.h>
int main(void) {
    int i = 123;
    double d = i;
    printf("%f\n", d);
}
This could read and write a byte beyond the value being modified. This normally would not matter, but theoretically could in some cases involving concurrency.
The C standards generally allow floating-point operations to be done with extra range and precision, but they require that explicit casts convert to the actual type specified. ORCA/C was not previously doing that.
This patch relies on some new library routines (currently in ORCALib) to do this precision reduction.
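A sketch of the semantics this enforces (the routines that perform the rounding are internal to ORCALib):

#include <stdio.h>

int main(void) {
    long double x = 1.0L / 3.0L;   /* carried in extended precision */
    double d = (double)x;          /* the cast must round to double precision */
    /* The comparison is now false, since the cast discarded the extra bits. */
    printf("%d\n", (long double)d == x);
    return 0;
}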
This fixes #64.
The FENV_ACCESS pragma is now implemented. It causes floating-point operations to be evaluated at run time to the maximum extent possible, so that they can affect and be affected by the floating-point environment. It also disables optimizations that might evaluate floating-point operations at compile time or move them around calls to the <fenv.h> functions.
The FP_CONTRACT and CX_LIMITED_RANGE pragmas are also recognized, but they have no effect. (FP_CONTRACT relates to "contracting" floating-point expressions in a way that ORCA/C does not do, and CX_LIMITED_RANGE relates to complex arithmetic, which ORCA/C does not support.)
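A short example of the standard usage that FENV_ACCESS now supports (just a sketch; it assumes the <fenv.h> routines behave as described in the standard):

#include <fenv.h>
#include <stdio.h>

#pragma STDC FENV_ACCESS ON

int main(void) {
    feclearexcept(FE_ALL_EXCEPT);
    double y = 1.0 / 3.0;            /* with FENV_ACCESS on, evaluated at run time */
    printf("inexact: %d\n", fetestexcept(FE_INEXACT) != 0);
    (void)y;
    return 0;
}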
This means that floating-point constants can now have the range and precision of the extended type (aka long double), and floating-point constant expressions evaluated within the compiler also have that same range and precision (matching expressions evaluated at run time). This new behavior is intended to match the behavior specified in the C99 and later standards for FLT_EVAL_METHOD 2.
This fixes the previous problem where long double constants and constant expressions of type long double were not represented and evaluated with the full range and precision that they should be. It also gives extra range and precision to constants and constant expressions of type double or float. This may have pluses and minuses, but at any rate it is consistent with the existing behavior for expressions evaluated at run time, and with one of the possible models of floating point evaluation specified in the C standards.
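For instance, a constant expression of type double now matches the same computation performed at run time (a small illustration, assuming no optimization folds the run-time division):

#include <stdio.h>

int main(void) {
    double x = 1.0, y = 3.0;
    /* Both quotients are now computed in extended precision, so the comparison
       is true; previously the compile-time constant kept only double precision. */
    printf("%d\n", 1.0 / 3.0 == x / y);
    return 0;
}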
The code of PeepHoleOptimization is now big enough that it triggers bogus "Relative address out of range" errors from the linker. This is a linker bug and should be fixed there.
When an expression that the intermediate code peephole optimizer could reduce to a constant was cast to a char type, the resulting value could be outside the range of that type.
The following program illustrates the problem:
#pragma optimize 1
#include <stdio.h>
int main(void) {
    int i = 0;
    i = (unsigned char)(i | -1);
    printf("%i\n", i);
}
Previously, intermediate code peephole optimization could discard the side effects of such subexpressions (e.g. the increments below) when folding them to a constant.
The following example program demonstrates the problem:
#pragma optimize 1
int main(void) {
    int i = 0;
    long j = 0;
    ++i | -1;
    ++i & 0;
    ++j | -1;
    ++j & 0;
    return i+j; /* should be 4 */
}
Currently, the actual values they can have are still constrained to the 32-bit range. Also, there are some bits of functionality (e.g. for initializers) that are not implemented yet.
This generalizes the heuristic approach for checking whether _Noreturn functions could execute to the end of the function, extending it to apply to any function with a non-void return type. These checks use the same #pragma lint bit but give different messages depending on the situation.
This uses a heuristic that may produce both false positives and false negatives, but any false positives should reflect extraneous code at the end of the function that is not actually reachable.
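A sketch of the kind of function the generalized check now reports (the required #pragma lint setting is not shown here):

/* A non-void function that can reach its closing brace without returning a
   value: when x is zero, execution falls off the end of the function. */
int sign(int x) {
    if (x > 0) return 1;
    if (x < 0) return -1;
}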
In certain rare cases, common subexpression elimination could set the left subtree of a pc_bno operation in the intermediate code to nil. This could lead to null pointer dereferences, sometimes resulting in a crash or error during native code generation.
The below program sometimes demonstrates the problem (dependent on zero page contents):
#pragma optimize 16
struct F {int *p;};
void foo(struct F* f)
{
    struct {int c;} s = {0};
    ++f->p;
    s.c |= *--f->p;
}
This could happen because the left subexpression does not produce a result for use in the enclosing expression, and therefore is not of the form expected by the CSE code.
The following program (derived from a csmith-generated test case) illustrates the problem:
#pragma optimize 16
int main(void) {
    int i;
    i, (i, 1);
}
This problem could lead to crashes in code like the following (derived from a csmith-generated test case):
#pragma optimize 1
int main (void)
{
    if (1L) ;
}
This problem could lead to crashes in code like the following (derived from a csmith-generated test case):
#pragma optimize 1
static int main(void) {
    long i = 2;
    (long)(i > 1);
}
The previous code may have been intended to convert this to a "!= 0" test, which would have been valid if correctly implemented. With the current code generator, however, that actually yields worse code than the original version, so for now I just removed the optimization for this case.
This problem could lead to crashes in code like the following (derived from a csmith-generated test case):
#pragma optimize 1
int main(int argc, char *argv[]){
    long l_57 = argc;
    return (4 ^ l_57) && 6;
}
This affected comparisons of the form "(logical operation or comparison) == constant", where the constant is something other than 0 or 1. These should always evaluate to 0 (false), but could mis-evaluate to true due to the bad optimization.
The following program gives an example showing the problem:
#pragma optimize 1
int main(void) {
    int i = 0, j = 42;
    return (i || j) == 123;
}
Such subexpressions are not of the right form to work with the existing code, because they do not generate a value for use in the enclosing expression. For now, the code has been changed to simply not remove the subexpression in these cases. Alternative code could be written to make it work, but that might be more trouble than it's worth.
Here's an example that shows the problem (derived from a csmith-generated test case):
#pragma optimize 32+1 /* also had a problem with just 32 */
int main(void) {
    int x, y=10; /* also had problems if x was global */
    do {
        x=42, y-=1;
    } while (y);
    return x+y;
}
These mainly related to situations where the optimization of multiple natural loops (including those created by continue statements) could interact to generate invalid results. Invalid optimizations could also be performed in certain other cases where there were multiple goto statements targeting a single label and at least one of them formed a loop.
These issues are addressed by appropriately adjusting the control flow and updating various data structures after each loop is processed during loop invariant removal.
This fixes #18 (compca18.c).
These cases should now always work when using an expression of type unsigned as the index. They will work in some cases but not others when using an int as the index: making those cases work consistently would require more extensive changes and/or a speed hit, so I haven't done it for now.
Note that this now uses an "unsigned multiply" operation for all 16-bit index computations. This should actually work even when the index is a negative signed value, because it will wind up producing (the low-order 16 bits of) the right answer. The signed multiply, on the other hand, generally does not produce the low-order 16 bits of the right answer in cases where it overflows.
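As a worked illustration of the low-order-bits argument, consider an index of -1 into an array of 2-byte ints (the arithmetic below just mimics the 16-bit computation on a host compiler):

#include <stdio.h>

int main(void) {
    /* An index of -1 needs a byte offset of -2. Reinterpreted as unsigned,
       -1 is 0xFFFF; 0xFFFF * 2 = 0x1FFFE, whose low 16 bits are 0xFFFE,
       which is exactly -2 when read back as a signed 16-bit offset. */
    unsigned offset = (0xFFFFu * 2u) & 0xFFFFu;
    printf("0x%X\n", offset);    /* prints 0xFFFE */
    return 0;
}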
The following program is an example that was miscompiled (both with and without optimization):
int c[20000] = {3};
int main(void) {
    int *p;
    unsigned i = 17000;
    p = c + 17000u;
    return *(p-i); /* should return 3 */
}