prog8/CompilerDevelopment.md at c786acc39bdece74c523fa577489151e85627fc3

mirror of https://github.com/irmen/prog8.git synced 2024-07-08 10:29:09 +00:00

meisl c786acc39b + CompilerDevelopment.md, outlining what to do to improve testability (atm only for the parsing stage)

2021-07-02 15:41:38 +02:00

Just a few remarks upfront:

- There is the (gradle/IDEA) module `parser`: that's the parser generated by ANTLR4, in Java. The only file to be edited here is the grammar, `prog8.g4`.
- Then we have the module `compilerAst`, in Kotlin, which uses `parser` and adds AST nodes. Here we put our additions to the generated parser, including any tests of the parsing stage.
  - The name is a bit misleading, as this module isn't (or rather shouldn't be; see below) about compiling, only about the parsing stage.
  - Also, the tree that comes out isn't much of an abstraction, but rather still more or less a parse tree (this might very well change).
  - However, let's not rename the module yet. We'll find a good name during refactoring.

Problems with `compilerAst`:

- `ModuleImporter.kt` does (Prog8) module resolution. That's not the parser's job.
- During parsing, character literals are turned into UBYTEs (since there is no basic type such as CHAR). That's bad because it depends on a specific character encoding (`IStringEncoding` in `compilerAst/src/prog8/ast/AstToplevel.kt`) of some target platform. Note that strings are indeed encoded later, in the `compiler` module.
- The same argument applies to `IMemSizer` and, though I'm not entirely sure about that, to `IBuiltinFunctions`.
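
To make the character-literal problem concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: `CharEncoder` is a hypothetical one-method stand-in (the real `IStringEncoding` may look quite different), the two encodings are toy tables rather than real platform encodings, and `Expr`, `CharLiteral`, `UByteLiteral` and `lower` are simplified placeholders for AST nodes and a transformation step.

```kotlin
// Hypothetical stand-in for a target encoding; NOT the real IStringEncoding.
fun interface CharEncoder {
    fun encode(c: Char): Int   // value in 0..255
}

// Two toy encodings (assumptions, not real platform tables).
val asciiLike = CharEncoder { c -> c.code and 0xFF }
val caseSwapped = CharEncoder { c ->
    when {
        c in 'a'..'z' -> c.uppercaseChar().code
        c in 'A'..'Z' -> c.lowercaseChar().code
        else -> c.code and 0xFF
    }
}

// Simplified placeholder AST nodes.
sealed class Expr
data class CharLiteral(val value: Char) : Expr()
data class UByteLiteral(val value: Int) : Expr()

// The only place that needs to know the target encoding.
fun lower(e: Expr, enc: CharEncoder): Expr = when (e) {
    is CharLiteral -> UByteLiteral(enc.encode(e.value))
    is UByteLiteral -> e
}
```

With these toy encodings, `lower(CharLiteral('a'), asciiLike)` and `lower(CharLiteral('a'), caseSwapped)` give different UBYTE values for the same source text. A parser that performs this conversion itself therefore produces a tree that is already tied to one target.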

Steps to take, in order:

1. Introduce an abstraction `SourceCode` that encapsulates the origin and actual loading of Prog8 source code:
   - from the local file system (use case: user programs)
   - from resources (prog8lib)
   - from plain strings (for testing)
2. Introduce a minimal interface to the outside. Input: `SourceCode`; output: a tree with a `Module` node as the root.
   - This will be the Kotlin singleton `Prog8Parser` with the main method `parseModule`,
   - plus, optionally, methods for registering/unregistering a listener with the parser.
   - Anything related to the lexer, error strategies, and character/token streams is hidden from the outside.
   - To make a clear distinction between the generated parser (and lexer) and `Prog8Parser`, and to discourage direct use of the generated code, we'll rename the existing `prog8Parser`/`prog8Lexer` to `Prog8ANTLRParser` and `Prog8ANTLRLexer` and move them to package `prog8.parser.generated`.
3. Introduce an AST node `CharLiteral` and keep such nodes until after identifier resolution and type checking; insert an AST transformation step there that turns them into UBYTE constants (literals).
4. Remove uses of `IStringEncoding` from module `compilerAst`; none should be necessary anymore.
5. Move `IStringEncoding` to module `compiler`.
6. Do the same with `ModuleImporter`, then rewrite it (addressing #46).
7. Refactor AST nodes and grammar: fewer generated parse tree nodes (`XyzContext`), less intermediary stuff (private classes in `Antr2Kotlin.kt` [sic]), more compact code. Also: nicer names, such as simply `StringLiteral` instead of `StringLiteralValue`.
8. Re-think `IStringEncoding` to address #38.
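
The first two steps could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `SourceCode`, `Prog8Parser`, `parseModule` and `Module` come from the list above, but the subclass names (`FromString`, `FromFile`, `FromResource`), the members `origin` and `readText`, and the shape of `Module` are hypothetical. The body of `parseModule` is a placeholder that merely collects non-empty lines; the real one would set up the character stream, lexer, token stream and error strategy and run the generated parser, all hidden behind this single entry point.

```kotlin
import java.io.File
import java.io.FileNotFoundException

// Hypothetical sketch of the SourceCode abstraction: one subclass per origin.
sealed class SourceCode {
    /** Human-readable origin, e.g. for error messages. */
    abstract val origin: String

    /** Loads the complete source text; may throw on I/O errors. */
    abstract fun readText(): String

    // Plain strings (for testing).
    class FromString(private val text: String) : SourceCode() {
        override val origin = "<string>"
        override fun readText() = text
    }

    // Local file system (user programs).
    class FromFile(private val file: File) : SourceCode() {
        override val origin: String = file.absolutePath
        override fun readText() = file.readText()
    }

    // Classpath resources (library modules).
    class FromResource(private val path: String) : SourceCode() {
        override val origin = "resource:$path"
        override fun readText(): String {
            val stream = SourceCode::class.java.getResourceAsStream(path)
                ?: throw FileNotFoundException("no such resource: $path")
            return stream.bufferedReader().use { it.readText() }
        }
    }
}

// Placeholder for the real AST root node (assumed shape).
data class Module(val origin: String, val lines: List<String>)

// The single entry point; lexer, streams and error handling stay internal.
object Prog8Parser {
    fun parseModule(src: SourceCode): Module {
        // Placeholder only: no real parsing happens here.
        val lines = src.readText().lines().map { it.trim() }.filter { it.isNotEmpty() }
        return Module(src.origin, lines)
    }
}
```

A test of the parsing stage would then need nothing but a string, for example `Prog8Parser.parseModule(SourceCode.FromString("main {\n}\n"))`, with no files, module resolution or target encoding involved.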
