ORCA-C

Commit Graph

Author	SHA1	Message	Date
Stephen Heumann	8278f7865a	Support unconvertible preprocessing numbers. These are tokens that follow the syntax for a preprocessing number, but not for an integer or floating constant after preprocessing. They are now allowed within the preprocessing phases of the compiler. They are not legal after preprocessing, but they may be used as operands of the # and ## preprocessor operators to produce legal tokens.	2024-04-23 21:39:14 -05:00
Stephen Heumann	e71fe5d785	Treat unary + as an actual operator, not a no-op. This is necessary both to detect errors (using unary + on non-arithmetic types) and to correctly perform the integer promotions when unary + is used (which can be detected with sizeof or _Generic).	2022-12-09 19:03:38 -06:00
Stephen Heumann	bb1bd176f4	Add a command-line option to select the C standard to use. This provides a more straightforward way to place the compiler in a "strict conformance" mode. This could essentially be achieved by setting several pragma options, but having a single setting is simpler. "Compatibility modes" for older standards can also be selected, although these actually continue to enable most C17 features (since they are unlikely to cause compatibility problems for older code).	2022-12-07 21:35:15 -06:00
Stephen Heumann	92a3af1d5f	Fix icp/isp tables to account for otherch. Commit `9cc72c8845` introduced otherch tokens but did not properly update these tables to account for them. This would cause * not to be accepted as the first character in an expression, and might also cause other problems.	2022-11-25 23:25:58 -06:00
Stephen Heumann	9cc72c8845	Support "other character" preprocessing tokens. This implements the catch-all category for preprocessing tokens for "each non-white-space character that cannot be one of the above" (C17 section 6.4). These may appear in skipped code, or in macros or macro parameters if they are never expanded or are stringized during macro processing. The affected characters are $, @, `, and many extended characters. It is still an error if these tokens are used in contexts where they remain present after preprocessing. If #pragma ignore bit 0 is clear, these characters are also reported as errors in skipped code or preprocessor constructs.	2022-11-08 18:58:50 -06:00
Stephen Heumann	b8b7dc2c2b	Remove code that treats # as an illegal character in most places. C90 had constraints requiring # and ## tokens to only appear in preprocessing directives, but C99 and later removed those constraints, so this code is no longer necessary when targeting current language versions. (It would be necessary in a "strict C90" mode, if that was ever implemented.) The main practical effect of this is that # and ## tokens can be passed as parameters to macros, provided the macro either ignores or stringizes that parameter. # and ## tokens still have no role in the grammar of the C language after preprocessing, so they will be an unexpected token and produce some kind of error if they appear anywhere. This also contains a change to ensure that a line containing one or more illegal characters (e.g. $) and then a # is not treated as a preprocessing directive.	2022-10-13 18:35:26 -05:00
Stephen Heumann	4fe9c90942	Parse ... as a single punctuator token. This accords with its definition in the C standards. For the time being, the old form of three separate tokens is still accepted too, because the ... token may not be scanned correctly in the obscure case where there is a line continuation between the second and third dots. One observable effect of this is that there are no longer spaces between the dots in #pragma expand output.	2022-10-10 18:06:01 -05:00
Stephen Heumann	4e76f62b0e	Allow additional letters in identifiers. The added characters are accented roman letters that were added to the Mac OS Roman character set at some time after it was first defined. Some IIGS fonts include them, although others do not.	2022-08-01 19:59:49 -05:00
Stephen Heumann	3c2b492618	Add support for compound literals within functions. The basic approach is to generate a single expression tree containing the code for the initialization plus the reference to the compound literal (or its address). The various subexpressions are joined together with pc_bno pcodes, similar to the code generated for the comma operator. The initializer expressions are placed in a balanced binary tree, so that it is not excessively deep. Note: Common subexpression elimination has poor performance for very large trees. This is not specific to compound literals, but compound literals for relatively large arrays can run into this issue. It will eventually complete and generate a correct program, but it may be quite slow. To avoid this, turn off CSE.	2022-06-08 21:34:12 -05:00
Stephen Heumann	5871820e0c	Support UTF-8/16/32 string literals and character constants (C11). These have u8, u, or U prefixes, respectively. The types char16_t and char32_t (defined in <uchar.h>) are used for UTF-16 and UTF-32 code points.	2021-10-11 20:54:37 -05:00
Stephen Heumann	979852be3c	Use the right types for constants cast to character types. These were previously treated as having type int. This resulted in incorrect results from sizeof, and would also be a problem for _Generic if it was implemented. Note that this creates a token kind of "charconst", but this is not the kind for character constants in the source code. Those have type int, so their kind is intconst. The new kinds of "tokens" are created only through casts of constant expressions.	2021-03-07 13:38:21 -06:00
Stephen Heumann	8f8e7f12e2	Distinguish the different types of floating-point constants. As with expressions, the type does not actually limit the precision and range of values represented.	2021-03-07 00:48:51 -06:00
Stephen Heumann	4ad7a65de6	Process floating-point values within the compiler using the extended type. This means that floating-point constants can now have the range and precision of the extended type (aka long double), and floating-point constant expressions evaluated within the compiler also have that same range and precision (matching expressions evaluated at run time). This new behavior is intended to match the behavior specified in the C99 and later standards for FLT_EVAL_METHOD 2. This fixes the previous problem where long double constants and constant expressions of type long double were not represented and evaluated with the full range and precision that they should be. It also gives extra range and precision to constants and constant expressions of type double or float. This may have pluses and minuses, but at any rate it is consistent with the existing behavior for expressions evaluated at run time, and with one of the possible models of floating point evaluation specified in the C standards.	2021-03-04 23:58:08 -06:00
Stephen Heumann	793f0a57cc	Initial support for constants with long long types. Currently, the actual values they can have are still constrained to the 32-bit range. Also, there are some bits of functionality (e.g. for initializers) that are not implemented yet.	2021-02-03 23:11:23 -06:00
Stephen Heumann	ffe6c4e924	Spellcheck comments throughout the code. There are no non-comment changes.	2020-01-29 17:09:52 -06:00
Stephen Heumann	656868a095	Implement support for universal character names in identifiers.	2020-01-20 17:22:06 -06:00
Stephen Heumann	d24dacf01a	Add initial support for universal character names. This currently only works in character constants or strings, not identifiers.	2020-01-19 23:59:54 -06:00
Stephen Heumann	7e822819b7	Allow the WDM instruction to be used in the mini-assembler. This can be useful under emulators that may implement special functionality using WDM. It is implemented as taking a one-byte numeric operand.	2020-01-11 21:58:21 -06:00
Stephen Heumann	3121a465f1	Implement the _Alignof operator (from C11). In ORCA/C, the alignment of all object types is 1.	2020-01-06 20:17:29 -06:00
Stephen Heumann	9036a98e1c	Implement support for digraphs. Specifically, the following six punctuator tokens are now supported: <: :> <% %> %: %:%: These behave the same as the existing tokens [, ], {, }, #, and ## (respectively), apart from their spelling. This can be useful when the full ASCII character set cannot easily be displayed or input (e.g. on the IIgs text screen with certain language settings).	2020-01-04 21:49:50 -06:00
Stephen Heumann	0184e3db7b	Recognize the new keywords from C99 and C11 as such. Specifically, the following will now be tokenized as keywords: _Alignas _Alignof _Atomic _Bool _Complex _Generic _Imaginary _Noreturn _Static_assert _Thread_local restrict ('inline' was also added as a standard keyword in C99, but ORCA/C already treated it as such.) The parser currently has no support for any of these keywords, so for now errors will still be generated if they are used, but this is a first step toward adding support for them.	2020-01-03 22:48:53 -06:00
Stephen Heumann	5b26b8cc5b	Expand all tabs in assembly files to spaces. This allows the code to be displayed properly on GitHub and in modern text editors, which typically do not support the irregularly-spaced tab stops used for ORCA/M code. It also avoids any possibility of problems building the code if the SysTabs file is missing or has been customized with non-standard tab stops.	2018-02-10 21:55:24 -06:00
Stephen Heumann	46b6aa389f	Change all text/source files to LF line endings.	2017-10-21 18:40:19 -05:00
mikew50	e72177985e	ORCA/C 2.1.0 source from the Opus ][ CD	2017-10-01 17:47:47 -06:00

24 Commits
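
Several of the commits above describe language behavior that can be observed from ordinary C code. The following is a minimal sketch in standard C11, not code taken from the ORCA/C sources. It illustrates three of those behaviors: unary + performing the integer promotions (detectable with sizeof or _Generic), a constant cast to a character type having that character type rather than int, and digraphs standing in for brackets and braces. All function names are hypothetical and chosen only for this illustration. The sketch assumes a C11 compiler on which char promotes to int.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* A constant cast to char has type char, so its size is 1, not sizeof(int). */
static_assert(sizeof((char)1) == 1, "a constant cast to char has size 1");

/* Size of the result of unary + applied to a char operand.
   The integer promotions apply, so this is sizeof(int), not sizeof(char). */
size_t size_of_promoted_char(void)
{
    char c = 'x';
    return sizeof(+c);
}

/* Returns 1 if the expression +c has type int, 0 otherwise. */
int unary_plus_yields_int(void)
{
    char c = 'x';
    (void)c;  /* c appears only in an unevaluated _Generic operand */
    return _Generic(+c, int: 1, default: 0);
}

/* The same function body written with digraphs:
   <% %> stand for { }, and <: :> stand for [ ]. */
int sum_with_digraphs(void)
<%
    int a<:3:> = <% 1, 2, 3 %>;
    return a<:0:> + a<:1:> + a<:2:>;
%>
```

Calling these from a test program should show size_of_promoted_char() equal to sizeof(int), unary_plus_yields_int() returning 1, and sum_with_digraphs() returning 6. A compiler that treated unary + as a no-op would instead report the size and type of char for the first two.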