Commit Graph

592 Commits

Author SHA1 Message Date
Stephen Heumann
21f266c5df Require use of digraphs in macro redefinitions to match the original.
This is part of the general requirement that macro redefinitions be "identical" as defined in the standard.

This affects code like:

#define x [
#define x <:
2022-04-05 19:47:22 -05:00
Stephen Heumann
a1d57c4db3 Allow ORCA/C-specific keywords to be disabled via a new pragma.
This allows those tokens (asm, comp, extended, pascal, and segment) to be used as identifiers, consistent with the C standards.

A new pragma (#pragma extensions) is introduced to control this. It might also be used for other things in the future.
2022-03-26 18:45:47 -05:00
Stephen Heumann
b2edeb4ad1 Properly stringize tokens that start with a trigraph.
This did not work correctly before, because such tokens were recorded as starting with the third character of the trigraph.

Here is an example affected by this:

#define mkstr(a) # a
#include <stdio.h>
int main(void) {
        puts(mkstr(??!));
        puts(mkstr(??!??!));
        puts(mkstr('??<'));
        puts(mkstr(+??!));
        puts(mkstr(+??'));
}
2022-03-25 18:10:13 -05:00
Stephen Heumann
f531f38463 Use suffixes on numeric constants in #pragma expand output.
A suffix will now be printed on any integer constant with a type other than int, or any floating constant with a type other than double. This ensures that all constants have the correct types, and also serves as documentation of the types.
2022-03-01 19:46:14 -06:00
Stephen Heumann
182cf66754 Properly stringize tokens with line continuations or non-initial trigraphs.
Previously, continuations or trigraphs would be included in the string as-is, which should not be the case because they are (conceptually) processed in earlier compilation phases. Initial trigraphs still do not get stringized properly, because the token starting position is not recorded correctly for them.

This fixes code like the following:

#define mkstr(a) # a
#include <stdio.h>
int main(void) {
        puts(mkstr(a\
bc));
        puts(mkstr(qr\
));
        puts(mkstr(\
xy));
        puts(mkstr(12??/
34));
        puts(mkstr('??<'));
}
2022-03-01 19:01:11 -06:00
Stephen Heumann
fec7b57ec2 Generate a string representation of tokens merged with ##.
This is necessary for correct behavior if such tokens are subsequently stringized with #. Previously, only the first half of the token would be produced.

Here is an example demonstrating the issue:

#define mkstr(a) # a
#define in_between(a) mkstr(a)
#define joinstr(a,b) in_between(a ## b)
#include <stdio.h>
int main(void) {
        puts(joinstr(123,456));
        puts(joinstr(abc,def));
        puts(joinstr(dou,ble));
        puts(joinstr(+,=));
        puts(joinstr(:,>));
}
2022-02-22 18:48:34 -06:00
Stephen Heumann
6cfe8cc886 Remove an unused string representation of macro tokens.
The string representation of macro tokens is needed for some preprocessor operations, but we get this in other ways (e.g. based on tokenStart/tokenEnd).
2022-02-21 18:39:39 -06:00
Stephen Heumann
8f27b8abdb Print any ## tokens in #pragma expand output.
Note that ## will not currently be recognized as a token in some contexts, leading to it not being printed.
2022-02-20 20:53:37 -06:00
Stephen Heumann
bf7a6fa5db Use separate functions for merging tokens with ## and merging adjacent strings.
These are conceptually separate operations occurring in different phases of the translation process. This change means that ## can no longer merge string constants: such operations will give an error about an illegal token. Cases like this are technically undefined behavior, so the old behavior could have been permitted, but it is clearer and more consistent with other compilers to treat this as an error.
2022-02-20 20:16:08 -06:00
Stephen Heumann
26e1bfc253 Allow generation of digraphs via ## token merging. 2022-02-20 18:57:03 -06:00
Stephen Heumann
2b062a8392 Make ## token merging on character constants give an error.
This ultimately should be supported, but that will be more work. For now, we just set the string representation to '?', which will usually give an error when merged. (Previously, whatever was at memory location 0 would be treated as the string representation of the token. Frequently this would just be an empty string, leading to no error but incorrect results.)
2022-02-20 16:19:00 -06:00
Stephen Heumann
da978932bf Save string representation of macros defined on command line.
This is necessary for correct operation of the # and ## preprocessor operators on the tokens from such macros.

Integers with a sign character still have the non-standard property of being treated as a single token, so they cannot be used with ##, but in most cases such uses will now give an error.
2022-02-20 15:35:49 -06:00
Stephen Heumann
2a9ec8fc43 Explicitly terminate PCH generation if there is an initialized variable.
Initialized variables have always been one of the things that stops PCH generation, but previously this was only detected when trying to write out the symbol records at the point of a later #include. On a subsequent compile using the sym file, nothing would recognize that PCH generation had stopped for this reason, so the PCH code would recognize the later #include as a potential opportunity to extend the sym file, and therefore would delete it to force regeneration next time. This led to the sym file being deleted and regenerated on alternate compiles, so its full benefit was not realized.

There is code in Header.pas to abort PCH generation if an initialized symbol is found. That is probably superfluous after this change, but it has been left in place for now.
2022-02-19 14:22:25 -06:00
Stephen Heumann
aabbadb34b Terminate header generation if #warning is encountered.
This is necessary to ensure that the warning message is printed on subsequent compiles.
2022-02-19 14:06:15 -06:00
Stephen Heumann
a73dce103b Terminate PCH generation if an #append is encountered.
If the appended file was another C file and that file contained an #include, this would create an invalid record in the sym file. It would record memory from the buffer holding the original file to the buffer holding the appended file. In general, these are not contiguous, so superfluous data from other parts of memory would be included in the sym file. This record would normally just be treated as invalid on subsequent compiles, but it could theoretically be very large (depending on the memory layout) and might contain sensitive data from other parts of memory.
2022-02-19 14:05:07 -06:00
Stephen Heumann
1e98a63bf4 Avoid generating duplicate "Including ..." messages.
This could happen if a header was saved in the sym file, but the sym file data was not actually used because the source code in the main file did not match what was saved.
2022-02-16 21:31:49 -06:00
Stephen Heumann
f2d6625300 Save #pragma path directives in sym files.
They were not being saved, which would result in ORCA/C not searching the proper paths when looking for an include file after the sym file had ended. Here is an example showing the problem:

#pragma path "include"
#include <stdio.h>
int k = 50;
#include "n.h" /* will not find include:n.h */
2022-02-15 21:27:35 -06:00
Stephen Heumann
30fcc7227f Tweak comments in Scanner.asm.
There are no code changes.
2022-02-15 20:51:16 -06:00
Stephen Heumann
3893db1346 Make sure #pragma expand is properly applied in all cases.
There were various places where the flag for macro expansions was saved, set to false, and then later restored. If #pragma expand was used within those areas, it would not be properly applied. Here is an example showing that problem:

void f(void
#pragma expand 1
) {}

This could also affect some uses of #pragma expand within precompiled headers, e.g.:

#pragma expand 1
#include "a.h"
#undef foobar
#include "b.h"
...

Also, add a note saying that code in precompiled headers will not be expanded. (This has always been the case, but was not clearly documented.)
2022-02-15 20:50:02 -06:00
Stephen Heumann
8c0d65616c Remove an unnecessary variable. 2022-02-13 21:36:03 -06:00
Stephen Heumann
c96cf4f1dd Do not save predefined and command-line macros in the sym file.
Previously, these might or might not be saved (based on the contents of uninitialized memory), but in many cases they were. This was unnecessary, since these macros are automatically defined when the scanner is initialized. Reading them from the sym file could result in duplicate copies of them in the macro list. This is usually harmless, but might result in #undefs of macros from the command line not working properly.
2022-02-13 20:17:33 -06:00
Stephen Heumann
b493dcb1da Add lint check to require whitespace after names of object-like macros.
This is a requirement added in C99, so it is added as part of the C99 syntax checks.

This affects definitions like:

#define foo;
2022-02-13 19:44:56 -06:00
Stephen Heumann
c169c2bf92 Fully prohibit redefinition of predefined macros.
Code like the following was previously being allowed:

#define __STDC__ /* no tokens */
2022-02-13 18:10:45 -06:00
Stephen Heumann
5d7c002819 Fix bug causing some #undefs to be ignored when using a sym file.
This would occur if the macro had already been saved in the sym file and the #undef occurred before a subsequent #include that was also recorded in the sym file. The solution is simply to terminate sym file generation if an #undef of an already-saved macro is encountered.

Here is an example showing the problem:

test.c:
#include "test1.h"
#undef x
#include "test2.h"

int main(void) {
#ifdef x
        return x;
#else
        return y;
#endif
}

test1.h:
#define x 27

test2.h:
#define y 6
2022-02-13 16:33:43 -06:00
Stephen Heumann
b231782442 Add option to use a custom pre-include file.
This is a file that will be included before the source file is processed. If specified, it is used instead of the default .h file.
2022-02-12 21:36:39 -06:00
Stephen Heumann
913a333f9f Record the cc= string in the symbol file and require it to match.
Macros and include paths from the cc= parameters may be included in the symbol file, so incorrect behavior could result if the symbol file was used for a later compilation with different cc= parameters.
2022-02-12 19:45:04 -06:00
Stephen Heumann
06e17cd8f5 Give an error if file names or command-line parameters are too long.
The source file name, keep name, NAMES= string, and cc= string are all restricted to 255 characters, but these limits were not previously enforced, and exceeding them could lead to strange behavior.
2022-02-12 15:42:15 -06:00
Stephen Heumann
bd811559d6 Fix issues with keep names in sym files.
There were a couple issues that could occur with #pragma keep and sym files:

*If a source file used #pragma keep but it was overridden by KEEP= on the command line or {KeepName} in the shell, then the overriding keep name would be saved to the sym file. It would therefore be applied to subsequent compilations even if it was no longer specified in the command line or shell variable.

*If a source file used #pragma keep, that keep name would be recorded in the sym file. On subsequent compilations, it would always be used, overriding any keep name specified by the command line or shell, contrary to the usual rule that the name on the command line takes priority.

With this patch, the keep name recorded in the sym file (if any) should always be the one specified by #pragma keep, but it can be overridden as usual.
2022-02-06 21:49:08 -06:00
Stephen Heumann
9cdf199c3a Clarify that sym files still need to be deleted when adding defaults.h.
The old wording made it sound like it applied only to .sym files generated by an old version of ORCA/C, but that is not the case.
2022-02-06 19:06:51 -06:00
Stephen Heumann
5f03dee66a Allow negated long long constants in cc= defines.
These are still treated as one token, like other negated numbers specified in cc=(-d...).
2022-02-06 15:33:42 -06:00
Stephen Heumann
efb363a04d Update a comment. 2022-02-06 15:08:04 -06:00
Stephen Heumann
7d4f923470 Improve error handling for cc= options on command line. 2022-02-06 14:24:22 -06:00
Stephen Heumann
785a6997de Record source file changes within a function as part of debug info.
This affects functions whose body spans multiple files due to includes, or is treated as doing so due to #line directives. ORCA/C will now generate a COP 6 instruction to record each source file change, allowing debuggers to properly track the flow of execution across files.
2022-02-05 18:32:11 -06:00
Stephen Heumann
5ac79ff36c Stop capitalizing file names in debug information.
This does not seem to be necessary for any of the debuggers (at least in their latest versions), and it obviously causes problems with case-sensitive filesystems.
2022-02-04 22:15:02 -06:00
Stephen Heumann
7322428e1d Add an option to print file names in error messages.
This can help identify if an error is in the main source file or an include file.
2022-02-04 22:10:50 -06:00
Stephen Heumann
4cb2106ee4 Change the name of the current source file on an #include or #append.
This causes __FILE__ to give the name of an include file if used within it, which seems to be what the standards intend (and what other compilers do). It also affects the file name recorded in debugging information for functions declared in an include file.

(Note that occ will generate a #line directive before an #append, essentially to work around the problem this patch fixes. After the patch, such a #line directive is effectively ignored. This should be OK, although it may result in a difference in whether a full or partial pathname is used for __FILE__ and in debug info.)
2022-02-03 22:22:33 -06:00
Stephen Heumann
dce9d36edd Comment out unused error messages and update docs about errors. 2022-02-01 22:16:57 -06:00
Stephen Heumann
e36503508a Allow more forms of address expressions in static initializers.
There were several forms that should be permitted but were not, such as &"str"[1], &*"str", &*a (where a is an array), and &*f (where f is a function).

This fixes #15 and also certain other cases illustrated in the following example:

char a[10];
int main(void);
static char *s1 = &"string"[1];
static char *s2 = &*"string";
static char *s3 = &*a;
static int (*f2)(void)=&*main;
2022-01-29 21:59:25 -06:00
Stephen Heumann
e8d90a1b69 Do not generate extra zero bytes after certain string constants.
These extra bytes are unnecessary after the changes in commit 5871820e0c to make string constants explicitly include their null terminators.

The extra bytes would be generated for code like the following:

int main(void) {
        static char *s1 = "abc", *s2 = "def", *s3 = "ghi";
}
2022-01-29 18:27:03 -06:00
Stephen Heumann
02fbf97a1e Work around the SANE comp conversion bug in another place.
This affects casts to comp evaluated at compile time, e.g.:

static comp c = (comp)-53019223785472.0;
2022-01-22 18:22:37 -06:00
Stephen Heumann
5357e65859 Fix indentation of a few lines. 2022-01-22 18:21:11 -06:00
Stephen Heumann
f4b0993007 Specify correct location for the default .h file. 2022-01-17 18:27:39 -06:00
Stephen Heumann
242bef1f6e Correct a comment. 2022-01-17 18:27:10 -06:00
Stephen Heumann
8eda03436a Preserve qualifiers when changing float/double/comp parameters to extended.
Changing the type is still non-standard, but at least this allows us to detect and report write-to-const errors.
2022-01-17 18:26:28 -06:00
Stephen Heumann
6f0b94bb7c Allow the pascal qualifier to appear anywhere types are used.
This is necessary to allow declarations of pascal-qualified function pointers as members of a structure, among other things.

Note that the behavior for "pascal" now differs from that for the standard function specifiers, which have more restrictive rules for where they can be used. This is justified by the fact that the "pascal" qualifier is allowed and meaningful for function pointer types, so it should be able to appear anywhere they can.

This fixes #28.
2022-01-13 20:11:43 -06:00
Stephen Heumann
b1bc840ec8 Reverse order of parameters for pascal function pointer types.
The parameters of the underlying function type were not being reversed when applying the "pascal" qualifier to a function pointer type. This resulted in the parameters not being in the expected order when a call was made using such a function pointer. This could result in spurious errors in some cases or inappropriate parameter conversions in others.

This fixes #75.
2022-01-13 19:38:22 -06:00
Stephen Heumann
3acf5844c2 Save and restore type spec when evaluating expressions in a type name.
Failing to do this could allow the type spec to be overwritten if the expression contained another type name within it (e.g. a cast). This could cause the wrong type to be computed, which could lead to incorrect behavior for constructs that use type names, e.g. sizeof.

Here is an example program that demonstrated the problem:

int main(void) {
        return sizeof(short[(long)50]);
}
2022-01-12 21:53:23 -06:00
Stephen Heumann
3b35a65b1d Give an error if a function pointer is redefined as a function.
This gives an error for code like the following, which was previously allowed:

void (*p)(int);
void p(int i) {}

Note that the opposite order still does not give a compiler error, but does give linker errors. Making sure we give a compiler error for all similar cases would require larger changes, but this patch at least catches some erroneous cases that were previously being allowed.
2022-01-12 18:31:32 -06:00
Stephen Heumann
61a382de0b Report parameter type errors the same way with and without strict type checks.
They were being reported as "type conflict" by default, but as "duplicate symbol" if strict checks were used.
2022-01-12 18:31:04 -06:00
Stephen Heumann
b5b276d0f4 Do not give a spurious error for redeclarations of a pascal function.
This could occur with strict type checking on, because the parameter types were compared at a point where they had been reversed for the original declaration but not for the subsequent one.

Here is an example that would give an error:

#pragma ignore 24
extern pascal void func(int, long);
extern pascal void func(int, long);
2022-01-12 18:30:28 -06:00