prog8/CompilerDevelopment.md

#### Just a few remarks upfront:
* There is the (gradle/IDEA) module `parser`: that's the parser generated by ANTLR4, in Java. The only file to be edited here is the grammar, `prog8.g4`.
* Then we have the Kotlin module `compilerAst`, which uses `parser` and adds AST nodes. Here we put our additions to the generated code, *including any tests of the parsing stage*.
  - the name is a bit misleading, as this module isn't (or rather, shouldn't be; see below) about *compiling*, only about the parsing stage
  - also, the tree that comes out isn't much of an *abstraction*; it is still more or less a parse tree (this might very well change).
  - **However, let's not *yet* rename the module.** We'll find a good name during refactoring.

#### Problems with `compilerAst`:
* `ModuleImporter.kt` does (Prog8) module resolution. That's not the parser's job.
* `ParsingFailedError` (in `ModuleParsing.kt`): this exception (which is actually *not* a `java.lang.Error`...) is thrown in a number of places where other exceptions would make more sense. For example, not finding a file should just yield a `NoSuchFileException`, not this one. The other problem is that it carries no information about the source of the parsing error, in particular no `Position`.
* During parsing, character literals are turned into UBYTEs (since there is no basic type such as CHAR). That's bad because it depends on a specific character encoding (`IStringEncoding` in `compilerAst/src/prog8/ast/AstToplevel.kt`) for a particular target platform. Note that *strings*, in contrast, are only encoded later, in the `compiler` module.
* The same argument applies to `IMemSizer` and possibly (not entirely sure about that) to `IBuiltinFunctions`.
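
To make the missing-`Position` problem concrete, here is a minimal sketch of the `ParseError : ParsingFailedError` subclass proposed in the steps. The `Position` fields, its `toString` format and the one-field `SourceCode` stub are assumptions for illustration, not the actual `compilerAst` definitions; `failingParse` and `describeFailure` are hypothetical helpers. Because `ParseError` is a subclass, every existing `catch (e: ParsingFailedError)` keeps working, while new tests can insist on the more specific type and its location data.

```kotlin
// Simplified stand-ins: field names and toString format are assumptions,
// not the actual compilerAst definitions.
data class Position(val file: String, val line: Int, val startCol: Int, val endCol: Int) {
    override fun toString() = "$file:$line:$startCol"
}

/** Hypothetical stub: only records where the source text came from. */
class SourceCode(val origin: String)

/** The existing, unspecific exception (an Exception, not a java.lang.Error). */
open class ParsingFailedError(message: String) : Exception(message)

/** The proposed subclass: additionally records *where* parsing failed. */
class ParseError(message: String, val source: SourceCode, val position: Position) :
    ParsingFailedError("$position $message")

/** Hypothetical failing parse, used only to exercise the exception. */
fun failingParse(src: SourceCode): Nothing =
    throw ParseError("expected '}'", src, Position(src.origin, 3, 0, 1))

fun describeFailure(): String = try {
    failingParse(SourceCode("test.p8"))
} catch (e: ParsingFailedError) {
    // Old catch sites keep compiling; new code can ask for the precise location.
    if (e is ParseError) "line ${e.position.line} in ${e.source.origin}" else "unknown location"
}
```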

#### Steps to take, in conceptual (!) order:

(note: all these steps have been implemented, rejected or otherwise solved now.)

1. introduce an abstraction `SourceCode` that encapsulates the origin and actual loading of Prog8 source code
   - from the local file system (use case: user programs)
   - from resources (prog8lib)
   - from plain strings (for testing)
2. add subclass `ParseError : ParsingFailedError`, which adds information about the *source of the parsing error* (`SourceCode` and `Position`). We cannot just replace `ParsingFailedError` right away because it is so widely used (even in the `compiler` module). Therefore we'll subclass it for the time being, add more and more tests requiring the new one to be thrown (or NOT to be thrown), and transition gradually.
3. introduce a minimal interface to the outside, input: `SourceCode`, output: a tree with a `Module` node as the root
   - this will be the Kotlin singleton `Prog8Parser` with the main method `parseModule`
   - plus, optionally, methods for registering/unregistering a listener with the parser
   - the *only* exception ever thrown / reported to listeners (TBD) will be `ParseError`
   - anything related to the lexer, error strategies, character/token streams is hidden from the outside
   - to make a clear distinction between the *generated* parser (and lexer) and `Prog8Parser`, and to discourage directly using the generated code, we'll rename the existing `prog8Parser`/`prog8Lexer` to `Prog8ANTLRParser` and `Prog8ANTLRLexer` and move them to package `prog8.parser.generated`
4. introduce AST node `CharLiteral` and keep these nodes until after identifier resolution and type checking; at that point, insert an AST transformation step that turns them into UBYTE constants (literals)
5. remove uses of `IStringEncoding` from module `compilerAst` - none should be necessary anymore
6. move `IStringEncoding` to module `compiler`
7. do the same with `ModuleImporter`, then rewrite it (addressing #46)
8. refactor AST nodes and grammar: fewer generated parse tree nodes (`XyzContext`), fewer intermediary constructs (private classes in `Antrl2Kotlin.kt`), more compact code. Also: nicer names, such as simply `StringLiteral` instead of `StringLiteralValue`
9. re-think `IStringEncoding` to address #38
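
The `SourceCode` abstraction from step 1 could be sketched as a sealed class with one subclass per origin. The subclass names `File`, `Resource` and `Text`, the `origin` property and `readText()` are assumptions made for this sketch, not necessarily what was implemented. Note that a missing file or resource surfaces as a plain `java.nio.file.NoSuchFileException`, in line with the remark about `ParsingFailedError` above.

```kotlin
import java.nio.file.Files
import java.nio.file.NoSuchFileException
import java.nio.file.Path

/** Hypothetical sketch of the SourceCode abstraction; subclass names are assumptions. */
sealed class SourceCode {
    /** Human-readable description of the origin, e.g. for error messages. */
    abstract val origin: String

    /** Loads the complete source text. */
    abstract fun readText(): String

    /** User programs on the local file system. */
    class File(private val path: Path) : SourceCode() {
        override val origin: String = path.toString()
        override fun readText(): String {
            // A missing file is reported as a plain NoSuchFileException.
            if (!Files.isRegularFile(path)) throw NoSuchFileException(path.toString())
            return String(Files.readAllBytes(path), Charsets.UTF_8)
        }
    }

    /** Library modules (prog8lib) bundled as classpath resources; pass an absolute name (leading '/'). */
    class Resource(private val name: String) : SourceCode() {
        override val origin: String = "resource:$name"
        override fun readText(): String {
            val stream = SourceCode::class.java.getResourceAsStream(name)
                ?: throw NoSuchFileException(origin)
            return stream.bufferedReader(Charsets.UTF_8).use { it.readText() }
        }
    }

    /** Plain strings, for tests. */
    class Text(private val text: String) : SourceCode() {
        override val origin: String = "<string>"
        override fun readText(): String = text
    }
}
```

Making the class sealed lets the parser and error reporting handle all origins uniformly, while tests can feed source text directly via `SourceCode.Text` without touching the file system.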
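
The minimal interface from step 3 might take roughly the following shape. Everything here is a hypothetical stand-in: the stub classes `SourceCode`, `Position`, `ParseError` and `Module`, the listener methods `addErrorListener`/`removeErrorListener`, and above all `runGeneratedParser`, which merely checks brace balance where the real code would drive `Prog8ANTLRLexer`/`Prog8ANTLRParser`. What the sketch is meant to show is the structure: a single public entry point `parseModule`, no ANTLR types in any public signature, and `ParseError` as the only exception that escapes, reported to listeners before it is thrown.

```kotlin
// Simplified stand-ins (assumptions, not the real compilerAst classes).
class SourceCode(val origin: String, val text: String)
data class Position(val file: String, val line: Int, val col: Int)
open class ParsingFailedError(message: String) : Exception(message)
class ParseError(message: String, val source: SourceCode, val position: Position) :
    ParsingFailedError("$position: $message")
class Module(val name: String, val statements: List<String>)

/** Single public entry point; lexer, token streams and error strategies stay private. */
object Prog8Parser {
    private val listeners = mutableListOf<(ParseError) -> Unit>()

    fun addErrorListener(listener: (ParseError) -> Unit) { listeners += listener }
    fun removeErrorListener(listener: (ParseError) -> Unit) { listeners -= listener }

    fun parseModule(src: SourceCode): Module {
        val statements = runGeneratedParser(src)
        return Module(src.origin.substringAfterLast('/').removeSuffix(".p8"), statements)
    }

    // Hypothetical placeholder for the generated ANTLR parser: only checks brace balance
    // and returns the non-blank lines as "statements".
    private fun runGeneratedParser(src: SourceCode): List<String> {
        val lines = src.text.lines()
        var depth = 0
        lines.forEachIndexed { lineIdx, line ->
            line.forEachIndexed { col, ch ->
                if (ch == '{') depth++
                if (ch == '}' && --depth < 0) fail(src, Position(src.origin, lineIdx + 1, col), "unexpected '}'")
            }
        }
        if (depth != 0) fail(src, Position(src.origin, lines.size, 0), "missing '}'")
        return lines.map { it.trim() }.filter { it.isNotEmpty() }
    }

    // Report to listeners first, then throw: ParseError is the only exception that escapes.
    private fun fail(src: SourceCode, pos: Position, msg: String): Nothing {
        val error = ParseError(msg, src, pos)
        listeners.forEach { it(error) }
        throw error
    }
}
```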
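
Step 4's late lowering of `CharLiteral` can be illustrated on a toy expression tree. The node classes other than `CharLiteral`, the `CharEncoder` interface and `lowerCharLiterals` are hypothetical simplifications. The point is that the character encoding is passed in by the later compilation stage, so the AST module itself never needs `IStringEncoding` (which is what makes steps 5 and 6 possible).

```kotlin
// Hypothetical, heavily simplified expression nodes (not the real compilerAst classes).
sealed interface Expression
data class CharLiteral(val value: Char) : Expression
data class UByteLiteral(val value: UByte) : Expression
data class Identifier(val name: String) : Expression
data class BinaryExpression(val left: Expression, val operator: String, val right: Expression) : Expression

/** Target-specific encoding, supplied by the compiler rather than by the AST module. */
fun interface CharEncoder {
    fun encode(ch: Char): UByte
}

/** Meant to run after identifier resolution and type checking: replaces each CharLiteral by a UBYTE constant. */
fun lowerCharLiterals(expr: Expression, encoder: CharEncoder): Expression = when (expr) {
    is CharLiteral -> UByteLiteral(encoder.encode(expr.value))
    is BinaryExpression -> expr.copy(
        left = lowerCharLiterals(expr.left, encoder),
        right = lowerCharLiterals(expr.right, encoder)
    )
    is UByteLiteral, is Identifier -> expr
}
```

For example, with a plain ASCII encoder (`CharEncoder { ch -> ch.code.toUByte() }`), the expression `c == 'A'` becomes `c == 65`; a different target would simply pass a different encoder.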
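
Once module resolution lives in the `compiler` module (step 7), its core could be as small as the following sketch. `ImportResolver`, its search-directory list and the lookup rule (`%import foo` resolves to the first `foo.p8` found in the given directories, in order) are assumptions for illustration; a fallback to library resources, for instance via a `SourceCode` resource variant, would go after the loop but is not shown. A missing module is reported as a plain `NoSuchFileException`, not as a `ParsingFailedError`.

```kotlin
import java.nio.file.Files
import java.nio.file.NoSuchFileException
import java.nio.file.Path

/**
 * Hypothetical import resolver living in the `compiler` module:
 * `%import foo` is looked up as `foo.p8` in each search directory, in order.
 */
class ImportResolver(private val searchDirs: List<Path>) {
    fun resolve(moduleName: String): Path {
        val fileName = "$moduleName.p8"
        for (dir in searchDirs) {
            val candidate = dir.resolve(fileName)
            if (Files.isRegularFile(candidate)) return candidate
        }
        // Not found anywhere: a plain file-system error, not a parsing error.
        throw NoSuchFileException(fileName, null, "searched: ${searchDirs.joinToString()}")
    }
}
```

Keeping resolution separate from parsing means the parser only ever receives a `SourceCode`, and import-related failures can be tested without parsing anything.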