Right now, decimal constants can have long long types based on their suffix, but they are still limited to a maximum value of 2^32-1.
This also implements the C99 change where decimal constants without a u suffix always have signed types. Thus, decimal constants of 2^31 and up now have type long long, even if their values could be represented in the type unsigned long.
Currently, the actual values they can have are still constrained to the 32-bit range. Also, there are some bits of functionality (e.g. for initializers) that are not implemented yet.
This affects command lines like:
cmpl myprog.c cc=(-da=+) ...
Previously, this would be accepted, but a was actually defined to 0 rather than +.
Now, this gives an error, consistent with other tokens that are not supported in such definitions on the command line. (Perhaps we should support definitions using any tokens, but that would require bigger code changes.)
This also cleans up some related code to avoid possible null-pointer dereferences.
For now, "long long" is represented with the existing code for the SANE comp format, since their representation is the same except for the comp NaN. This allows existing debuggers that support comp to work with it. The code for "unsigned long long" includes the unsigned flag, so it is unambiguous.
This affects cases where the floating value, truncated to an integer, is outside the range of the destination type. Previously, the result value might appear to be an int value outside the range of the character type.
These situations are undefined behavior under the C standards, so this was not technically a bug, but the new behavior is less surprising. (Note that it still may not raise the "invalid" floating-point exception in some cases where Annex F would call for that.)
The basic issue with all of these is that they failed to sign-extend the 8-bit signed char value to the full 16-bit A register. This could make certain operations on negative signed char values appear to yield positive values outside the range of signed char.
The following example code demonstrates the problems:
#include <stdio.h>
signed char f(void) {return -50;}
int main(void) {
long l = -123;
int i = -99;
signed char sc = -47;
signed char *scp = ≻
printf("%i\n", (signed char)l);
printf("%i\n", (signed char)i);
printf("%i\n", f());
printf("%i\n", (*scp)++);
printf("%i\n", *scp = -32);
}
There are several conversions that do not set the necessary flags, so they must be set separately before doing a comparison. Without this fix, comparisons of a value that was just converted might be mis-evaluated.
This led to bugs where the wrong side of an "if" could be followed in some cases, as in the below examples:
#include <stdio.h>
int g(void) {return 50;}
signed char h(void) {return 50;}
long lf(void) {return 50;}
int main(void) {
signed char sc = 50;
if ((int)(signed char)g()) puts("OK1");
if ((int)h()) puts("OK2");
if ((int)sc) puts("OK3");
if ((int)lf()) puts("OK4");
}
These would generally not work correctly on bit-fields, or on floating-point values that were in a structure or were accessed via a pointer.
The below program is an example that would demonstrate problems:
#include <stdio.h>
int main(void) {
struct {
signed int i:7;
unsigned long int j:6;
_Bool b:1;
double d;
} s = {-10, -20, 0, 5.0};
double d = 70.0, *dp = &d;
printf("%i\n", (int)s.i++);
printf("%i\n", (int)s.i--);
printf("%i\n", (int)++s.i);
printf("%i\n", (int)--s.i);
printf("%i\n", (int)s.i);
printf("%i\n", (int)s.j++);
printf("%i\n", (int)s.j--);
printf("%i\n", (int)++s.j);
printf("%i\n", (int)--s.j);
printf("%i\n", (int)s.j);
printf("%i\n", s.b++);
printf("%i\n", s.b--);
printf("%i\n", ++s.b);
printf("%i\n", --s.b);
printf("%i\n", s.b);
printf("%f\n", s.d++);
printf("%f\n", s.d--);
printf("%f\n", ++s.d);
printf("%f\n", --s.d);
printf("%f\n", s.d);
printf("%f\n", (*dp)++);
printf("%f\n", (*dp)--);
printf("%f\n", ++*dp);
printf("%f\n", --*dp);
printf("%f\n", *dp);
}
This is contrary to the C standards, but ORCA/C historically permitted it (as do some other compilers), and I think there is a fair amount of existing code that relies on it.
This is normally 1 (indicating a hosted implementation, where the full standard library is available and the program starts by executing main()), but it is 0 if one of the pragmas for special types of programs with different entry points has been used.
This was not happening for declared identifiers (variables and functions) or for enum constants, as demonstrated in the following example:
enum {a,b,c};
#if b
#error "bad b"
#endif
int x = 0;
#if x
#error "bad x"
#endif
This generalizes the heuristic approach for checking whether _Noreturn functions could execute to the end of the function, extending it to apply to any function with a non-void return type. These checks use the same #pragma lint bit but give different messages depending on the situation.
This uses a heuristic that may produce both false positives and false negatives, but any false positives should reflect extraneous code at the end of the function that is not actually reachable.
We now insert spaces corresponding to whitespace between tokens, and string tokens are enclosed in quotes.
There are still issues with (at least) escape sequences in strings and comments between tokens.