Last week the first beta version of my programming language and its compiler was released. If you're interested, you can find it at https://github.com/irmen/prog8 and the documentation is at https://prog8.readthedocs.io/
Here's the development story behind it so far:
Last year my interest in the venerable Commodore-64 was sparked again by recent demo scene productions and SID music compositions that I really liked. I grew up with the Commodore-64; it was the first computer I ever used. Another trigger was a few YouTube videos by "The 8-Bit Guy" about "Planet X2", a game he made and published for the C64. I decided to create a high-level programming language for it, both so I could later use it to write new programs for the C64 and to learn a bit about compiler construction. I started off writing everything in Python, but I didn't like where it was going and had trouble restructuring and refactoring the growing code base, even with the PyCharm IDE (execution speed was never an issue up to that point, though). So I switched to Kotlin to see if it would fit my project better, and to learn this promising language at the same time.
I knew very little about creating compilers (and I guess I still don't). After a bit of studying, I settled on the following strategies to generate code for the very constrained 8-bit 6502:
- The source code is parsed with an ANTLR4-generated parser into an AST, which is then optimized a bit (constant folding and the like).
- The compiler then generates an intermediate code program from the AST that targets a stack-based virtual machine. Besides a set of general opcodes, it contains a fair number of specialized opcodes that helped me target the 6502 and its memory model better. I have no idea whether this is a usual approach; it just evolved that way. I felt I needed those special opcodes, otherwise the assembly code generator would have been too difficult to write. By the way, the intermediate code can be executed in an interpreter as long as it doesn't use things that are only available on the C64 and in its ROMs, which helped me a lot with debugging. Choosing a stack-based VM over a register-based VM seemed to fit the target machine better, since the 6502 is extremely register-constrained (it has only three 8-bit general registers: A, X and Y).
- The intermediate code is then translated into 6502 assembly code by a simple pattern matcher that maps intermediate instruction patterns to 6502 assembly source.
- The assembly source is assembled into an actual binary program for the C64 with the 64tass assembler.
- Expressions are evaluated on a split LSB/MSB stack (separate arrays for the low and high bytes) indexed by the X register. Bytes, words and floats can all be used on this stack.
- Subroutine parameters are passed via variables at fixed memory locations, while return values are passed via the evaluation stack. I think that is needed to allow subroutine calls to be used as functions inside an expression, but maybe there are more efficient ways to do that?
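The constant-folding step mentioned in the first item can be illustrated with a small sketch. It is written in Python only for brevity (the real compiler is Kotlin), and the node classes `Num`, `Var` and `BinOp` are hypothetical stand-ins, not prog8's actual AST. It also assumes unsigned byte arithmetic that wraps at 8 bits.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]

# Assumption: unsigned byte arithmetic that wraps around at 8 bits.
OPS = {
    "+": lambda a, b: (a + b) & 0xFF,
    "-": lambda a, b: (a - b) & 0xFF,
    "*": lambda a, b: (a * b) & 0xFF,
}


def fold(expr: Expr) -> Expr:
    """Bottom-up constant folding: a BinOp whose operands are both constants
    is replaced by a single Num holding the computed value."""
    if isinstance(expr, BinOp):
        left = fold(expr.left)    # fold children first
        right = fold(expr.right)
        if isinstance(left, Num) and isinstance(right, Num) and expr.op in OPS:
            return Num(OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    return expr  # Num and Var are already as simple as they get
```

Because children are folded before their parent is examined, `x + 2 * 3` becomes `x + 6` in a single pass, and `200 + 100` folds to `44` under the 8-bit wraparound assumption.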
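The list above mentions intermediate code for a stack-based VM that can also be executed by an interpreter. A minimal sketch of such an interpreter follows. The opcode names (`push_var`, `add_b`, `inc_var`, ...) are hypothetical; they only illustrate the idea of a few general stack opcodes plus a specialized one (`inc_var`) that works on memory directly, much like the 6502 `INC` instruction, without touching the stack.

```python
from typing import Dict, List, Tuple

# (opcode, operand) pairs; a hypothetical opcode set, not prog8's real IR.
Instr = Tuple[str, object]


def run(program: List[Instr], memory: Dict[str, int]) -> Dict[str, int]:
    """Execute stack-VM code against a dict of byte variables and return it."""
    stack: List[int] = []
    for op, arg in program:
        if op == "push_const":
            stack.append(arg)
        elif op == "push_var":
            stack.append(memory[arg])
        elif op == "pop_var":
            memory[arg] = stack.pop()
        elif op == "add_b":  # byte add, wraps at 8 bits
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFF)
        elif op == "mul_b":  # byte multiply, wraps at 8 bits
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & 0xFF)
        elif op == "inc_var":  # specialized opcode: no stack traffic at all
            memory[arg] = (memory[arg] + 1) & 0xFF
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory
```

For example, `z = (x + 3) * y` becomes `push_var x, push_const 3, add_b, push_var y, mul_b, pop_var z`; running it with `x = 5` and `y = 4` leaves `z = 32`.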
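A pattern matcher that maps intermediate instruction sequences to 6502 assembly could look roughly like the sketch below. Everything here is assumed for illustration: the opcode names, the `STACK_LO` label, and the convention that the stack grows downward with X pointing at the next free slot (`dex` after a push, `inx` before a pop). Longer patterns are tried first, so a `push_var`/`pop_var` pair becomes a plain `lda`/`sta` copy instead of a round trip through the stack.

```python
from typing import List, Tuple

Instr = Tuple[str, object]  # (opcode, operand); hypothetical IR

# Each rule: a tuple of IR opcodes and a function from their operands to asm lines.
# Longer patterns come first so they win over the single-instruction fallbacks.
RULES = [
    (("push_var", "pop_var"), lambda a: [f"lda {a[0]}", f"sta {a[1]}"]),
    (("push_const", "pop_var"), lambda a: [f"lda #{a[0]}", f"sta {a[1]}"]),
    (("inc_var",), lambda a: [f"inc {a[0]}"]),
    (("push_const",), lambda a: [f"lda #{a[0]}", "sta STACK_LO,x", "dex"]),
    (("push_var",), lambda a: [f"lda {a[0]}", "sta STACK_LO,x", "dex"]),
    (("pop_var",), lambda a: ["inx", "lda STACK_LO,x", f"sta {a[0]}"]),
    # pop the top byte into A, add it to the byte below, store the sum in place
    (("add_b",), lambda a: ["inx", "lda STACK_LO,x", "clc",
                            "adc STACK_LO+1,x", "sta STACK_LO+1,x"]),
]


def translate(program: List[Instr]) -> List[str]:
    """Greedy left-to-right translation using the first rule that matches."""
    asm: List[str] = []
    i = 0
    while i < len(program):
        for pattern, emit in RULES:
            window = program[i:i + len(pattern)]
            if tuple(op for op, _ in window) == pattern:
                asm.extend(emit([arg for _, arg in window]))
                i += len(pattern)
                break
        else:
            raise ValueError(f"no pattern for {program[i][0]}")
    return asm
```

The fallback rules show why naive stack code gets bulky: `y = 3 + x` (`push_const 3, push_var x, add_b, pop_var y`) still produces 14 instructions, because no rule matches the whole expression at once.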
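The split LSB/MSB evaluation stack from the list above can be modelled in a few lines. This is a simulation under assumed conventions (256-entry arrays, growing downward, the attribute `x` standing in for the X register), not the real runtime. The point is that one index addresses both halves, so a byte and a word each occupy a single slot and one `inx`/`dex` moves past either. Floats are left out of this sketch.

```python
class SplitStack:
    """Evaluation stack with separate low-byte and high-byte arrays that share
    one index, the way the X register indexes both halves on the 6502."""

    def __init__(self, size: int = 256) -> None:
        self.lo = [0] * size
        self.hi = [0] * size
        self.x = size - 1  # index of the next free slot; the stack grows downward

    def push_byte(self, value: int) -> None:
        self.lo[self.x] = value & 0xFF  # high half of this slot is left untouched
        self.x -= 1

    def pop_byte(self) -> int:
        self.x += 1
        return self.lo[self.x]

    def push_word(self, value: int) -> None:
        self.lo[self.x] = value & 0xFF          # LSB goes into the low array
        self.hi[self.x] = (value >> 8) & 0xFF   # MSB goes into the high array
        self.x -= 1

    def pop_word(self) -> int:
        self.x += 1
        return self.lo[self.x] | (self.hi[self.x] << 8)
```

Pushing the word `0x1234` stores `0x34` in `lo[255]` and `0x12` in `hi[255]`; a byte pushed afterwards lands in slot 254, and popping both restores the original values.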
I'm struggling to generate compact assembly code: everything has to fit in less than 40 KB to be able to load it on a C64 without trickery.
The programming language supports several data types (byte, word, float, string), but because the 6502 is an 8-bit CPU, dealing with anything other than bytes quickly turns into a mess: it takes a lot of instructions to combine bytes into words (let alone floats) and to perform calculations on them.
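To make the cost of words concrete: the 6502 adds only 8 bits at a time, so a 16-bit addition is two `ADC` instructions chained through the carry flag. The sketch below models that sequence in Python, with the corresponding 6502 instructions as comments. The `adc` helper models binary mode only (decimal mode is ignored), and the operands are assumed to lie in 0..0xFFFF. A byte addition needs four instructions (`clc`, `lda`, `adc`, `sta`); the word version needs seven.

```python
from typing import Tuple


def adc(a: int, b: int, carry: int) -> Tuple[int, int]:
    """Model of 6502 ADC in binary mode: 8-bit add with carry in, returns (result, carry out)."""
    total = a + b + carry
    return total & 0xFF, total >> 8


def add_words(x: int, y: int) -> int:
    carry = 0                                    # clc
    lo, carry = adc(x & 0xFF, y & 0xFF, carry)   # lda x / adc y / sta r
    hi, carry = adc(x >> 8, y >> 8, carry)       # lda x+1 / adc y+1 / sta r+1
    return (hi << 8) | lo                        # final carry is dropped: wraps at 16 bits
```

`add_words(0x00FF, 0x0001)` gives `0x0100`: the low-byte add overflows, and the carry it produces is what moves the 1 into the high byte.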
While the split LSB/MSB evaluation stack works, the generated code spends a lot of instructions simply pushing bytes onto the stack and popping them off again. A LOT of code is generated to evaluate seemingly simple expressions.
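One common way to cut down that push/pop traffic is a peephole pass over the intermediate code before assembly generation. This is a standard technique, not something the compiler is stated to do. The sketch assumes hypothetical stack opcodes (`push_var`, `push_const`, `pop_var`) and rewrites a push immediately followed by a pop into a single direct `copy_var` or `set_var`, dropping `push_var x; pop_var x` entirely. It repeats until nothing changes, because removing a pair can make a new push/pop pair adjacent.

```python
from typing import List, Tuple

Instr = Tuple[str, object]  # (opcode, operand); hypothetical IR


def peephole(program: List[Instr]) -> List[Instr]:
    """Replace adjacent push/pop pairs with direct moves until no rule fires."""
    changed = True
    while changed:
        changed = False
        out: List[Instr] = []
        i = 0
        while i < len(program):
            pair = program[i:i + 2]
            if (len(pair) == 2 and pair[1][0] == "pop_var"
                    and pair[0][0] in ("push_var", "push_const")):
                (op, src), (_, dst) = pair
                if op == "push_const":
                    out.append(("set_var", (src, dst)))   # constant straight into variable
                elif src != dst:
                    out.append(("copy_var", (src, dst)))  # variable to variable, no stack
                # push_var x; pop_var x is a no-op and is simply dropped
                i += 2
                changed = True
            else:
                out.append(program[i])
                i += 1
        program = out  # each change shortens the program, so the loop terminates
    return program
```

For `push_var a, push_var b, pop_var b, pop_var d`, the first pass drops the inner no-op pair, and the second pass turns the now-adjacent `push_var a, pop_var d` into `copy_var (a, d)`.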
I don't have experience with other strategies for this. Also, while I know a little about the optimizations a compiler performs, I have no idea how to actually implement things such as dataflow analysis (to find values that could be reused) or common subexpression elimination (to simplify expressions). But hey, it works for now :) The compiler code is fairly modular, though, so adding more optimizers shouldn't be too hard, if you know how to do the actual optimization.
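Common subexpression elimination within a single expression is less work than it might sound. The sketch below is a hash-based variant, closely related to local value numbering. It walks a hypothetical tuple-based expression tree, remembers each distinct subtree it has already computed, and reuses that temporary instead of evaluating the subtree again. It assumes expressions have no side effects (no subroutine calls or reads of I/O registers inside them) and does not treat `a + b` and `b + a` as equal; dataflow analysis across statements is a larger job and is not shown.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical expression form: a variable name, an int literal, or (op, left, right).
Expr = Union[str, int, Tuple[str, "Expr", "Expr"]]


def cse(expr: Expr) -> Tuple[List[str], str]:
    """Emit three-address code for expr, computing each distinct subtree only once."""
    code: List[str] = []
    seen: Dict[Expr, str] = {}  # structurally equal subtree -> temp holding its value

    def emit(e: Expr) -> str:
        if not isinstance(e, tuple):
            return str(e)        # variables and literals need no temp
        if e in seen:
            return seen[e]       # common subexpression: reuse the earlier result
        op, left, right = e
        lhs, rhs = emit(left), emit(right)
        temp = f"t{len(seen)}"
        seen[e] = temp
        code.append(f"{temp} = {lhs} {op} {rhs}")
        return temp

    result = emit(expr)
    return code, result
```

For `(a + b) * (a + b)` this produces `t0 = a + b` followed by `t1 = t0 * t0`, so the addition is evaluated once instead of twice.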
I learned a lot from building this. While the language and compiler work, they're still toys, to be honest. I also learned the Kotlin language in the process, and I've come to like it a lot.