prog8/CompilerDevelopment.md

#### Just a few remarks upfront:
* There is the (gradle/IDEA) module `parser`: that's the parser generated by ANTLR4, in Java. The only file to be edited here is the grammar, `prog8.g4`.
* Then we have the module `compilerAst` - in Kotlin - which uses `parser` and adds AST nodes. Our additions to the generated code go here, *including any tests of the parsing stage*.
  - the name is a bit misleading, as this module isn't (or rather, shouldn't be; see below) about *compiling*, only about the parsing stage
  - also, the resulting tree isn't much of an *abstraction*; it is still more or less a parse tree (this may well change).
  - **However, let's not *yet* rename the module.** We'll find a good name during refactoring.

#### Problems with `compilerAst`:
* `ModuleImporter.kt` does (Prog8) module resolution. That's not the parser's job.
* During parsing, character literals are turned into UBYTEs (since there is no basic type such as CHAR). That's bad because the result depends on the character encoding of a specific target platform (`IStringEncoding` in `compilerAst/src/prog8/ast/AstToplevel.kt`). Note that *strings* are indeed encoded later, in the `compiler` module.
* The same argument applies to `IMemSizer` and, though this is less certain, to `IBuiltinFunctions`.
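A minimal sketch makes the encoding problem concrete. All names here are hypothetical and are not the actual `IStringEncoding` API: `CharEncoding`, `asciiLike`, `screenCodeLike` and `encodeAtParseTime`. The second table is only loosely modeled on Commodore screen codes, where 'A'..'Z' are 1..26. The point is that a parse-time conversion bakes one target's byte value into the AST.

```kotlin
// Hypothetical stand-in for a target-specific encoding; NOT the real IStringEncoding API.
fun interface CharEncoding {
    fun encode(c: Char): UByte
}

// Toy table 1: ASCII code points (only defined for c < 128).
val asciiLike = CharEncoding { c ->
    require(c.code < 128) { "not ASCII: $c" }
    c.code.toUByte()
}

// Toy table 2: screen-code-like, 'A'..'Z' -> 1..26 (nothing else is supported here).
val screenCodeLike = CharEncoding { c ->
    require(c in 'A'..'Z') { "not in this toy table: $c" }
    (c - 'A' + 1).toUByte()
}

// What converting a char literal during parsing amounts to:
// the parser has to be told which target it is parsing for.
fun encodeAtParseTime(literal: Char, target: CharEncoding): UByte = target.encode(literal)
```

With these definitions, `encodeAtParseTime('A', asciiLike)` gives 65 and `encodeAtParseTime('A', screenCodeLike)` gives 1. The same source text yields different ASTs depending on the target.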

#### Steps to take, in order:
1. introduce an abstraction `SourceCode` that encapsulates the origin and actual loading of Prog8 source code
   - from the local file system (use case: user programs)
   - from resources (prog8lib)
   - from plain strings (for testing)
2. introduce a minimal interface to the outside: input is a `SourceCode`, output is a tree with a `Module` node as its root
   - this will be the Kotlin singleton `Prog8Parser` with the main method `parseModule`
   - plus, optionally, methods for registering/unregistering a listener with the parser
   - anything related to the lexer, error strategies, character/token streams is hidden from the outside
   - to make a clear distinction between the *generated* parser (and lexer) and `Prog8Parser`, and to discourage using the generated code directly, we'll rename the existing `prog8Parser`/`prog8Lexer` to `Prog8ANTLRParser`/`Prog8ANTLRLexer` and move them to package `prog8.parser.generated`
3. introduce an AST node `CharLiteral` and keep these nodes until after identifier resolution and type checking; at that point, insert an AST transformation step that turns them into UBYTE constants (literals)
4. remove uses of `IStringEncoding` from module `compilerAst`; none should be necessary anymore
5. move `IStringEncoding` to module `compiler`
6. do the same with `ModuleImporter`, then rewrite it (addressing #46)
7. refactor AST nodes and grammar: fewer generated parse-tree nodes (`XyzContext`), less intermediary stuff (private classes in `Antr2Kotlin.kt` [sic]), more compact code. Also: nicer names, such as simply `StringLiteral` instead of `StringLiteralValue`
8. re-think `IStringEncoding` to address #38
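Steps 1 and 2 can be sketched together in Kotlin. Everything here is hypothetical and made up for illustration:

- the subclass names `FromFile`/`FromResource`/`FromString`;
- the reduced `Module` class;
- `ParseListener` and the `addListener`/`removeListener` methods.

`parseModule` only keeps the non-blank lines, where a real implementation would drive the ANTLR-generated parser internally.

```kotlin
import java.nio.file.Path

// Step 1 (sketch): one type that hides where Prog8 source text comes from.
sealed class SourceCode {
    abstract val origin: String      // human-readable origin, e.g. for error messages
    abstract fun readText(): String  // loads the actual source text

    class FromFile(private val path: Path) : SourceCode() {
        override val origin: String get() = path.toString()
        override fun readText(): String = path.toFile().readText()
    }

    class FromResource(private val name: String) : SourceCode() {
        override val origin: String get() = "resource:$name"
        override fun readText(): String {
            // A leading '/' makes the name absolute on the classpath.
            val stream = SourceCode::class.java.getResourceAsStream(name)
                ?: throw IllegalArgumentException("resource not found: $name")
            return stream.bufferedReader().use { it.readText() }
        }
    }

    class FromString(private val text: String) : SourceCode() {
        override val origin: String get() = "<string>"
        override fun readText(): String = text
    }
}

// Hypothetical, minimal stand-in for the real Module AST node.
data class Module(val origin: String, val statements: List<String>)

fun interface ParseListener {
    fun moduleParsed(module: Module)
}

// Step 2 (sketch): the only entry point other modules would see.
object Prog8Parser {
    private val listeners = mutableListOf<ParseListener>()

    fun addListener(listener: ParseListener) { listeners += listener }
    fun removeListener(listener: ParseListener) { listeners -= listener }

    fun parseModule(src: SourceCode): Module {
        // Placeholder for the hidden part: building ANTLR char/token streams,
        // installing an error strategy, running the generated parser and
        // converting its parse tree. Here we merely keep the non-blank lines.
        val statements = src.readText().lineSequence()
            .map { it.trim() }
            .filter { it.isNotEmpty() }
            .toList()
        val module = Module(src.origin, statements)
        listeners.forEach { it.moduleParsed(module) }
        return module
    }
}
```

A parser test could then call `Prog8Parser.parseModule(SourceCode.FromString("main {\n}\n"))` without touching the file system or any lexer or token-stream type. Hiding those details is the point of steps 1 and 2.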
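Step 3 might look roughly like the sketch below. The following names are hypothetical:

- the reduced expression classes `Expr`, `CharLiteral`, `NumericLiteral`, `BinaryExpr` and `DataType`;
- the functions `typeOf` and `lowerCharLiterals`.

The `encode` parameter stands in for whatever `IStringEncoding` becomes. Type checking treats a `CharLiteral` as a UBYTE without knowing any encoding; only the later transformation step needs one.

```kotlin
enum class DataType { UBYTE, UWORD }

// Hypothetical, heavily reduced expression AST.
sealed class Expr
data class CharLiteral(val value: Char) : Expr()
data class NumericLiteral(val value: Int, val type: DataType) : Expr()
data class BinaryExpr(val left: Expr, val op: String, val right: Expr) : Expr()

// Type checking (simplified rule): a char literal is known to be a UBYTE
// without consulting any character encoding.
fun typeOf(e: Expr): DataType = when (e) {
    is CharLiteral -> DataType.UBYTE
    is NumericLiteral -> e.type
    is BinaryExpr ->
        if (typeOf(e.left) == DataType.UWORD || typeOf(e.right) == DataType.UWORD) DataType.UWORD
        else DataType.UBYTE
}

// Transformation step, run after identifier resolution and type checking:
// the only place that needs the target's character encoding.
fun lowerCharLiterals(e: Expr, encode: (Char) -> Int): Expr = when (e) {
    is CharLiteral -> {
        val code = encode(e.value)
        require(code in 0..255) { "'${e.value}' does not fit in a UBYTE" }
        NumericLiteral(code, DataType.UBYTE)
    }
    is NumericLiteral -> e
    is BinaryExpr -> BinaryExpr(lowerCharLiterals(e.left, encode), e.op, lowerCharLiterals(e.right, encode))
}
```

Because `CharLiteral` survives until this step, `compilerAst` itself never has to call an encoding. This is what makes steps 4 and 5 possible.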