These were previously treated as having type int. This resulted in incorrect results from sizeof, and would also be a problem for _Generic if it was implemented.
Note that this creates a token kind of "charconst", but this is not the kind for character constants in the source code. Those have type int, so their kind is intconst. The new kinds of "tokens" are created only through casts of constant expressions.
The FENV_ACCESS pragma is now implemented. It causes floating-point operations to be evaluated at run time to the maximum extent possible, so that they can affect and be affected by the floating-point environment. It also disables optimizations that might evaluate floating-point operations at compile time or move them around calls to the <fenv.h> functions.
The FP_CONTRACT and CX_LIMITED_RANGE pragmas are also recognized, but they have no effect. (FP_CONTRACT relates to "contracting" floating-point expressions in a way that ORCA/C does not do, and CX_LIMITED_RANGE relates to complex arithmetic, which ORCA/C does not support.)
This means that floating-point constants can now have the range and precision of the extended type (aka long double), and floating-point constant expressions evaluated within the compiler also have that same range and precision (matching expressions evaluated at run time). This new behavior is intended to match the behavior specified in the C99 and later standards for FLT_EVAL_METHOD 2.
This fixes the previous problem where long double constants and constant expressions of type long double were not represented and evaluated with the full range and precision that they should be. It also gives extra range and precision to constants and constant expressions of type double or float. This may have pluses and minuses, but at any rate it is consistent with the existing behavior for expressions evaluated at run time, and with one of the possible models of floating point evaluation specified in the C standards.
Note that we currently defer evaluation of such expressions to run time if the long long value cannot be represented exactly in a double, because statically-evaluated floating point expressions use the double format rather than the extended (long double) format used at run time.
Right now, decimal constants can have long long types based on their suffix, but they are still limited to a maximum value of 2^32-1.
This also implements the C99 change where decimal constants without a u suffix always have signed types. Thus, decimal constants of 2^31 and up now have type long long, even if their values could be represented in the type unsigned long.
Currently, the actual values they can have are still constrained to the 32-bit range. Also, there are some bits of functionality (e.g. for initializers) that are not implemented yet.
This affects command lines like:
cmpl myprog.c cc=(-da=+) ...
Previously, this would be accepted, but a was actually defined to 0 rather than +.
Now, this gives an error, consistent with other tokens that are not supported in such definitions on the command line. (Perhaps we should support definitions using any tokens, but that would require bigger code changes.)
This also cleans up some related code to avoid possible null-pointer dereferences.
This is contrary to the C standards, but ORCA/C historically permitted it (as do some other compilers), and I think there is a fair amount of existing code that relies on it.
This is normally 1 (indicating a hosted implementation, where the full standard library is available and the program starts by executing main()), but it is 0 if one of the pragmas for special types of programs with different entry points has been used.
This generalizes the heuristic approach for checking whether _Noreturn functions could execute to the end of the function, extending it to apply to any function with a non-void return type. These checks use the same #pragma lint bit but give different messages depending on the situation.
This uses a heuristic that may produce both false positives and false negatives, but any false positives should reflect extraneous code at the end of the function that is not actually reachable.
We now insert spaces corresponding to whitespace between tokens, and string tokens are enclosed in quotes.
There are still issues with (at least) escape sequences in strings and comments between tokens.
Currently, this only flags return statements, not cases where they may execute to the end of the function. (Whether the function will actually return is not decidable in general, although it may be in special cases).
This currently checks for:
*Calls to undefined functions (same as bit 0)
*Parameters not declared in K&R-style function definitions
*Declarations or type names with no type specifiers (includes but is broader than the condition checked by bit 1)
Previously, the designated initializer syntax could confuse the parser enough to cause null pointer dereferences. This avoids that, and also gives a more meaningful error message to the user.
In the #pragma lint line, the integer indicating the checks to perform can now optionally be followed by a semicolon and another integer. If these are present and the second integer is 0, then the lint checks will be performed, but will be treated as warnings rather than errors, so that they do not cause compilation to fail.
These were previously allowed in some cases, but not as the last argument to a macro. Also, stringization and concatenation of them did not behave according to the standards.
In combination with earlier patches, this fixes#53.
Also, if the lint flag requiring explicit function types is set, then also require that K&R-style parameters be explicitly declared with types, rather than not being declared and defaulting to int. (This is a requirement in C99 and later.)
Previously, these would report "identifier expected"; now they correctly say "')' expected".
This introduces a new UnexpectedTokenError procedure that can be used more generally for cases where the expected token may differ based on context.
_Thread_local is recognized but gives a "not supported" error. It could arguably be 'supported' trivially by saying the execution of an ORCA/C program is just one thread and so no special handling is needed, but that likely isn't what someone using it would expect.
There would be a possible issue if a "static" or "typedef" storage class specifier occurred after a type specifier that required memory to be allocated for it, because that memory conceptually might be in the local pool, but static objects are processed at the end of the translation unit, so their types need to stick around. In practice, this should not occur, because the local pool isn't currently used for much (in particular, not for statements or declarations in the body of a function). We give an error in case this somehow might occur.
In combination with preceding commits, this fixes#14. Declaration specifiers can now appear in any order, as required by the C standards.
_Bool, _Complex, _Imaginary, _Atomic, restrict, and _Alignas are now recognized in types, but all except restrict and _Alignas will give an error saying they are not supported.
This also introduces uniform definitions of the syntactic classes of tokens that can be used in declaration specifiers and related constructs (currently used in some places but not yet in others).
Type specifiers and type qualifiers can now appear in any order, as specified by the C standards. However, storage class specifiers and function specifiers still cannot be freely mixed with them.
Specifically, the following six punctuator tokens are now supported:
<: :> <% %> %: %:%:
These behave the same as the existing tokens [, ], {, }, #, and ## (respectively), apart from their spelling.
This can be useful when the full ASCII character set cannot easily be displayed or input (e.g. on the IIgs text screen with certain language settings).
Specifically, the following will now be tokenized as keywords:
_Alignas
_Alignof
_Atomic
_Bool
_Complex
_Generic
_Imaginary
_Noreturn
_Static_assert
_Thread_local
restrict
('inline' was also added as a standard keyword in C99, but ORCA/C already treated it as such.)
The parser currently has no support for any of these keywords, so for now errors will still be generated if they are used, but this is a first step toward adding support for them.
This could happen in some very obscure cases like using these macros for the names of segments or include files. The fix is to just terminate precompiled header generation if they are encountered.
The issue was that invalid sym files could be generated if an #include is encountered within an #if or #ifdef block in the main source file. The fix (for now) is to simply terminate precompiled header generation if such an #include is encountered.
Fixes#2.
This makes something like the following work:
#define STDIO_H <stdio.h>
#include STDIO_H
It didn't previously, because workString would be overwritten by NextToken. The effect in this case was that it would erroneously try to include the header <hh>, rather than <stdio.h>.
Detected based on a couple programs from FizzBuzz-C.
This could happen, e.g., for a "'}' expected" error at end-of-file. It occurred because the 0..maxint type being used caused the Pascal compiler to use unsigned comparisons, which were inappropriate here.
Previously, the error markers would generally be misaligned in this case, because a tab would expand to no spaces (in ORCA/Shell) or multiple spaces (in most other environments), but the error-printing code would use a single space to try to line up with it.
The solution adopted is just to print tabs in the error lines at the positions where they occur in the source lines. The actual amount of space displayed will depend on the console being used, but in any case it should line up correctly with the source line.
This adds lint bit 5 (a value of 32), which currently enables checking for the following conditions:
*Integer overflow from arithmetic in constant expressions (currently only of type int).
*Invalid constant shift counts (negative, or >= the width of the type)
*Division by (constant) zero.
These (mainly the first two) can be indicative of code that was designed for larger type sizes and needs changes to support 16-bit int.
Mainly, this causes the messages from the format checker to be displayed after the relevant line is printed, along with any other error messages. The wording and formatting of some of the messages is also slightly adjusted, but there should be no substantive change in what is warned about.
Previously, the characters ", /, and ? within string literals were not escaped in #pragma expand output, which could result in them being erroneously interpreted as ending the string literal, starting an escape sequence, or being part of a trigraph (respectively). Also, escape sequences were output in hexadecimal format. Since there is no length limit on hexadecimal escape sequences, this could result in subsequent characters in the string being interpreted as part of the escape sequence.
This fixes the issues by escaping the characters ", /, and ?, and by using three-digit octal escape sequences rather than hexadecimal ones.
commit 4265329097538640e9e21202f1b141bcd42a44f3
Author: Kelvin Sherlock <ksherlock@gmail.com>
Date: Fri Mar 23 21:45:32 2018 -0400
indent to match standard indent.
commit 783518fbeb01d2df43ef2083d3341004c05e4e2e
Author: Kelvin Sherlock <ksherlock@gmail.com>
Date: Fri Mar 23 20:21:15 2018 -0400
clean up the typenames
commit 29b627ecf5ca9b8a143761f85a1807a6ca35ddd9
Author: Kelvin Sherlock <ksherlock@gmail.com>
Date: Fri Mar 23 20:18:04 2018 -0400
enable feature_hh, warn about %n with non-int modifier.
commit fc4ac8129e3772c4eda36658e344ec475938369c
Author: Kelvin Sherlock <ksherlock@gmail.com>
Date: Fri Mar 23 15:13:47 2018 -0400
warn thar %lc, %ls, etc are unsupported.
commit 7e6b433ba0552f7e52f0f034d398e9195c764326
Author: Kelvin Sherlock <ksherlock@gmail.com>
Date: Fri Mar 23 13:36:25 2018 -0400
warn about hh/ll modifier (if not supported)
commit 1943c9979d0013f9f38045ec04a962fbf0269f31
Author: Kelvin Sherlock <ksherlock@gmail.com>
Date: Fri Mar 23 11:42:41 2018 -0400
use error facilities for format errors.
commit 7811168f56dca1387055574ba8d32638da2fad96
Author: Kelvin Sherlock <ksherlock@gmail.com>
Date: Thu Mar 22 15:34:21 2018 -0400
add feature flags to disable c99 enhancements until orca lib is updated.
commit c2149cc5953155cfc3c3b4d0483cd25fb946b055
Author: Kelvin Sherlock <ksherlock@gmail.com>
Date: Thu Mar 22 08:59:10 2018 -0400
Add printf/scanf format checking [WIP]
This parses out the xprintf / xscanf format string and compares it with the function arguments.
enabled via #pragma lint 16.
This should implement the C standard rules about making macro name tokens ineligible for replacement, except that it does not currently handle cases of nested replacements (such as a cycle of mutually-referential macros).
This fixes#12. There are still a couple other bugs with macro expansion in obscure cases, but I'll consider them separate issues.
This is what is required by the C standards.
This partially reverts a change in ORCA/C 2.1.0, which should only have been applied to hexadecimal escape sequences.
This is based on a patch from Kelvin Sherlock, and in turn on code from MPW IIgs ORCA/C, but with modifications to be more standards-compliant.
Bit 1 in #pragma ignore controls a new option to (non-standardly) treat character constants with three or more characters as having type long, so they can contain up to four bytes.
Note that this patch orders the bytes the opposite way from MPW IIgs ORCA/C, but the same way as GCC and Clang.
These are enabled when bit 15 is set in the #pragma debug directive.
Support is still needed to ensure these work properly with pre-compiled headers.
This patch is from Kelvin Sherlock.
The C standards define "pp-number" tokens handled by the preprocessor using a syntax that encompasses various things that aren't valid integer or floating constants, or are constants too large for ORCA/C to handle. These cases would previously give errors even in code skipped by the preprocessor. With this patch, most such errors in skipped code are now ignored.
This is useful, e.g., to allow for #ifdefed-out code containing 64-bit constants.
There are still some cases involving pp-numbers that should be allowed but aren't, particularly in the context of macros.
This should give C99-compatible behavior, as far as it goes. The functions aren't actually inlined, but that's just a quality-of-implementation issue. No C standard requires actual inlining.
Non-static inline functions are still not supported. The C99 semantics for them are more complicated, and they're less widely used, so they're a lower priority for now.
The "inline" function specifier can currently only come after the "static" storage class specifier. This relates to a broader issue where not all legal orderings of declaration specifiers are supported.
Since "inline" was already treated as a keyword in ORCA/C, this shouldn't create any extra compatibility issues for C89 code.
This allows code like the following to compile:
#if 0
#if some bogus stuff !
#endif
#endif
This is what the C standards require. The change affects #if, #ifdef, and #ifndef directives.
This may be needed to handle code targeted at other compilers that allow pseudo-functions such as "__has_feature" in preprocessor expressions.
This avoids various problems with inappropriately processing these elements, which should not be recognized as such at preprocessing time. For example, the following program should compile without errors, but did not:
typedef long foo;
#if int+1
#if foo-1
int main(void) {}
#endif
#endif
This is a problem introduced by the scanner changes between ORCA/C 2.1.0 and ORCA/C 2.1.1 B3.
The following examples demonstrate the problem:
#define m m
m
#define f(x) f(x)
f(a)
The following demonstrates cases that would erroneously be allowed (and misleadingly give a size of 0) before:
#include <stdio.h>
struct s *S;
int main(void)
{
printf("%lu %lu\n", sizeof(struct s), sizeof *S);
}
ORCA/C previously allowed struct/union members to be declared with incomplete type. Because of this, it allowed C99-style flexible array members to be declared, albeit by accident rather than by design. In some basic testing, these seem to work correctly, except that they could be initialized and that would give rise to odd behavior.
I have restricted it to allowing flexible array members only in the cases allowed by C99/C11, and otherwise disallowing members with incomplete type. I have also prohibited initializing flexible array members.
This may be someone trying to use a C11-style anonymous struct/union, which should be flagged as an error until and unless those are supported. Otherwise, it probably just indicates that the programmer is confused. In any case, an error should be flagged for it.
C89 restricts bit fields to (signed) int and unsigned int only, although later standards note that additional types may be supported. ORCA/C supports the other integer types as an extension.
This fixes the compco01.c test case.
This occurred because the code to handle the function-like macro use would read the following token, which could prompt processing of the following preprocessing directive in an inappropriate context.
The following example illustrates the problem (the error message would be printed):
#define A()
#define FOO 1
A()
#if !FOO
# error "shouldn't get here"
#endif