mirror of
https://github.com/irmen/prog8.git
synced 2026-04-19 20:16:51 +00:00
15 KiB
15 KiB
Agents.md
Context and instructions for AI Agents to work on this project.
Project Overview
- This project is a compiler for the Prog8 programming language.
- Prog8 is a programming language primarily targeting 8-bit retro systems with the 6502 CPU, such as the Commodore 64, Commodore 128, and Commander X16.
- The compiler has a 6502 code generator backend, and an IR code generator.
- The IR code is meant to be used in a new machine specific code generator backend (primarily 6502 but maybe 68000 as well later)
- The compiler includes a simple 'virtual machine' that can execute the IR code directly via interpretation.
- Prog8 source files have .p8 extension - these are not LUA or PICO-8 source files in this case!
- Prog8 source files are a "module" that can contain 1 or more "blocks". They can also import other modules, from internal library files or from source files on the filesystem.
- The prog8 compiler is written mostly in Kotlin, those files have the .kt extension.
- The standard library is mostly written in Prog8 and assembly code, and can be found in the "compiler" module, in the 'res/prog8lib' directory.
- Kotlin version 2.3 is used for the compiler implementation.
- Java 11 is used as Java runtime version.
- ANTLR4 version 4.13 is used for the parser implementation.
- Dependent library versions can be found in 'build.gradle.kts' and in the IntelliJ IDEA configuration files in .idea/libraries
- The compiler main entrypoint is in the "compiler" module, in src/prog8/CompilerMain.kt
Prog8 language feature hints
General & Setup
- an overview of the language features can be found in the documentation file docs/source/introduction.rst
- the syntax and grammar is specified in an ANTLR4 grammar file found in the parser directory
- a program consists of a 'main' block containing the 'start' subroutine entry point, and zero or more other subroutines. Additional blocks and subroutines in those can be present too.
- module imports are done using "%import modulename" note: this does not always mean that the scope "modulename" gets defined! Example: "%import textio" imports the textio module, but it defines the "txt" prefix where all the routines are in.
Datatypes & Variables
- available primitive datatypes: bool, byte, ubyte, word, uword, long, float, str. ubyte/uword are unsigned. long=4 bytes (SIGNED only - no unsigned long yet), float=5-byte Microsoft format, str=0-terminated ubytes (max 255 chars).
- there are also arrays (max 256 bytes, or 512 for split word arrays), pointers, and structs. Prog8 does not yet have by-value struct variables, only pointer to structs. Pointers can point to primitive types or struct types. Use
memory()+ pointers for data larger than array limits. - word arrays split by default: LSB and MSB bytes stored in separate arrays for efficient 6502 access. With @nosplit this can be overridden to use regular sequential storage.
- prog8 has C-style pointer arithmetic when adding or subtracting integers from pointers. pointer syntax differs from C: Dereference with
@(ptr)orptr[index]. - while there are larger than byte datatypes, the intended compiler target is a 6502 CPU system which is 8 bit so operations on larger datatypes are expensive. Words are still somewhat okay, but longs and floats in particular are very inefficient. Try to avoid them unless needed for correctness.
- all variables (including parameters) are statically allocated exactly once; there is no call stack, so recursion and reentrancy are not possible by default. All variables are zero-initialized (globals at program start, locals on subroutine entry).
- variables should not be placed in zeropage (with @zp and @requirezp) often, because there is only limited zeropage memory space, except for pointer variables: those should usually be in zeropage.
Virtual Registers & Stack
- Zeropage scratch variables: The compiler provides these predefined zeropage scratch variables:
P8ZP_SCRATCH_B1(byte),P8ZP_SCRATCH_REG(byte),P8ZP_SCRATCH_W1(word),P8ZP_SCRATCH_W2(word),P8ZP_SCRATCH_PTR(word). No other zeropage locations can be used - assembly routines must only use these predefined scratch variables. If additional storage is needed, define regular variables in the BSS section instead. You can also use the cx16 virtual registers (cx16.r0-cx16.r15) as temporary storage. - the 16 'virtual registers' (cx16.r0 - cx16.r15) are available on ALL targets, not just CX16. They're fast 16-bit global variables but NOT preserved across subroutine calls. R12-R15 are especially dangerous: long operations may clobber them without warning. In IRQ handlers, save/restore with cx16.save_virtual_registers() / cx16.restore_virtual_registers(). You can give them descriptive names using aliases:
alias score = cx16.r7but using regular variables is preferred. - the CPU hardware stack can be manipulated via builtin functions: push(), pushw(), pushl(), pushf() and pop(), popw(), popl(), popf(). These can manually implement recursion if needed.
Logic & Control Flow
- the syntax for boolean logical operators is 'and', 'or', 'xor', 'not'. Bitwise operators are '&', '|', '^', '~', and '<<','>>' for bit shifting left and right respectively. All operators are documented in docs/source/programming.rst
- CPU status flags can be tested with if_cs, if_cc, if_z, if_nz, etc. which compile to single 6502 branch instructions but require careful handling of flag state.
- use 'when' statements with choice blocks to avoid multiple 'if' statements.
- use 'repeat' statements instead of loops if the iteration count is not needed inside the loop body.
- use if-expressions instead of if-statements if the goal is to assign a single value to a variable based on a simple choice between two values.
- there is a 'defer' statement that can be used to defer the execution of a statement(s) until flow returns from the current scope.
- it is allowed to use 'goto' with labels, and jump lists, if needed; because that generates very optimal code
Subroutines & Return Values
- everything is public: No private/public modifiers. All symbols accessible via fully qualified names from anywhere.
- no function overloading: Each subroutine must have a unique name (except for some builtin functions).
- subroutines can return 0, 1 or more return value(s). They can be assigned to multiple variables in a single multi-variable assignment: a,b,c = routine(). Values can be skipped using 'void'.
- the
voidkeyword has two forms: (1) prefix formvoid routine()suppresses all return values from a subroutine call, (2) assignment forma, void, c = routine()skips specific return values in multi-return assignments. - subroutines can be nested. Nested subroutines have direct access to all variables defined in their parent scope.
Standard library
- to discover what modules and routines are available in the standard library, FIRST consult docs/source/_static/symboldumps/ - there's a skeleton file per compilation target (e.g., skeletons-cx16.txt, skeletons-c64.txt) listing ALL available modules, subroutines, and builtin functions with their signatures.
- standard library source code is in the 'res/prog8lib' directory. See docs/source/libraries.rst for detailed descriptions of builtin functions and library routines.
- text output is done via the 'textio' module (txt.print, txt.chrout, etc.), math routines are in 'math', and string conversion routines are in 'conv'. Note: conv module uses
str_<type>naming for number-to-string (e.g.,str_uword,str_long), NOT<type>2str. String-to-number usesstr2<type>(e.g.,str2uword,str2long). However, for printing numbers you don't need explicit conversion - txt module has direct routines:txt.print_b(byte),txt.print_ub(ubyte),txt.print_w(word),txt.print_uw(uword),txt.print_l(long),txt.print_bool(bool).
Syntax & Formatting
- numeric literal syntax:
$prefix for hex ($FFnot0xFF),%prefix for binary (%1010not0b1010). Underscores allowed for readability:25_000_000. No leading-zero octal notation. No type suffixes: long literals are just regular numbers (e.g.,12345678not12345678L), the type is determined by context (variable type or cast). - for loop syntax:
for i in 0 to 10 { ... }(use downto when counting down) - NOTfor i = 0 to 10or C-stylefor(i=0; i<10; i++) - semicolons start comments:
; this is a comment- they do NOT end statements. There is NO statement separator (unlike C/Java's;). One statement per line only. Multi-line comments use/* ... */. - trailing commas allowed:
[1, 2, 3,]is valid syntax. - Prog8 source files are indented with 4 spaces, no tabs.
- assembly source files (*.asm) can be indented with either spaces or tabs
- The assembly source code uses 64tass syntax, NOT ca65/cc65 or other assemblers. Key 64tass syntax:
.proc/.pendfor procedures,_labelfor local labels,.byte/.word/.dwordfor data,=for equates, zero-page variables defined with=. Instructions likerol,ror,asl,lsrrequire an explicit operand - userol a,ror a, etc. for the accumulator, not justrolorror.
Other Key differences from other languages (C, Python, etc.)
- no automatic type widening:
byte*byte=byte(may overflow!),word*word=word, etc. Explicitly cast operands:word result = (bytevar as word) * 1000. Hex literals with full width (e.g.,$0040) also promote. Compiler does not warn by default. - no block scope:
for,if/elseblocks do NOT introduce new scope. Only subroutines introduce scope. Variables declared anywhere in a subroutine are hoisted to the top. - qualified names from top level: Must use full qualified names (e.g.,
cx16.r0), not relative imports.
Project Module Descriptions
beanshell- EXPERIMENTAL/UNFINISHED Contains BeanShell integration for scripting capabilities within the compilerbenchmark-c- C implementations for performance comparison and benchmarkingbenchmark-program- Benchmark programs to test compiler output performancecodeCore- Core code generation utilities and shared componentscodeGenCpu6502- 6502 CPU-specific code generator backend (generates assembly for 6502-based systems)codeGenExperimental- Experimental code generators that are under developmentcodeGenIntermediate- Intermediate representation (IR) code generatorcodeGenVirtual- Virtual machine code generator backendcodeOptimizers- Optimization passes for improving generated code efficiencycompiler- Main compiler executable and top-level compiler logiccompilerAst- complicated Abstract Syntax Tree (AST) where most optimizations also run on, is later transformed into the simpleAstdocs- Documentation files for the Prog8 language and compilerexamples- Example Prog8 programs demonstrating language featuresintermediate- Components related to the intermediate representation (IR) of Prog8 programslanguageServer- Language Server Protocol implementation for IDE integrationparser- ANTLR4 Parser implementation for the Prog8 language syntaxscripts- Utility scripts for development, testing, and deploymentsimpleAst- Simplified AST that is used to run the code generator backends fromsyntax-files- Syntax definition files for editors and IDEsvirtualmachine- Virtual machine implementation that can execute IR code
Key Information
- never read the files and directories that are ignored via the .aiignore and .gitignore files
- never perform any git source control write/update/add/commit/branch operations. Read and status operations are allowed.
- git log/history queries can be useful for understanding when/why a feature was added or tracking down when a bug was introduced, but for locating code use grep_search or glob instead.
- Architecture decisions: separation of frontend/parser, IR intermediate representation, multiple backends
Dev environment tips
Commands to build the compiler
- use the system installed gradle command instead of the gradle wrapper.
- IMPORTANT: Always run
gradle installdist installshadowdistto rebuild the compiler after any code modifications. gradle build- Full build of the compiler including running the full test suitegradle clean- Clean build artifactsgradle compileKotlin- Compile only the Kotlin source codegradle installdist- Create the compiler JARs and executable filegradle installshadowdist- Create the single "fat" compiler JAR and executable file
Compilation Output Files
*.prg- The final compiled program file for the target system (e.g., Commander X16)*.asm- Generated assembly code from the Prog8 source*.list- Generated full assembly listing file from the Prog8 source*.p8ir- Intermediate representation file, can be executed in the Virtual Machine*.vice-mon-list- Vice emulator monitor list file for debugging
Commands to run tests
gradle test --tests "*TestName*"- Run specific test classesgradle build --refresh-dependencies- Refresh dependencies during development- Unit tests are written using KoTest
- Several modules in the project contain unit tests, but most of them live in the "compiler" module in the 'test' directory.
- when writing TEST programs, always add these directives at the top:
%zeropage basicsafeand%option no_sysinit- this keeps zeropage usage safe and skips system initialization for faster, simpler test programs.
Commands to run the Prog8 Compiler after building it
- the prog8c compiler executable can be found in the compiler/build/install/prog8c/bin folder (this is already added to the shell's path)
- the
-emuswitch can be used to directly execute the resulting program in an emulator after successful compilation. - the
-checkswitch performs a quick syntax/semantic check only - it will NOT produce any output files (no .prg, .asm, etc.). Use it only for fast error checking during development. prog8c -target cx16 input.p8- Compile a Prog8 source file "input.p8" for the CommanderX16 targetprog8c -target cx16 -check input.p8- Quickly check a Prog8 source file "input.p8" for compiler errors, no output binary is producedprog8c -target cx16 -emu input.p8- Compile and execute a prog8 file in the CommanderX16 emulatorprog8c -target c64 -emu input.p8- Compile and execute a prog8 file in the Commodore-64 emulatorprog8c -target virtual input.p8- Compile a prog8 file for the IR/Virtual machine targetprog8c -target virtual -emu input.p8- Compile and directly execute a prog8 file in the Virtual Machineprog8c -vm input.p8ir- Execute an existing prog8 program, compiled in IR form, in the Virtual Machinex16emu -scale 2 -prg input.prg- Just load an existing compiled program in the CommanderX16 emulator. Ignore any errors and warnings, because the emulator doesn't produce any output on STDOUT.x64sc input.prg- run an existing compiled program in the Commodore-64 emulator. Ignore any errors and warnings, because the emulator doesn't produce any output on STDOUT.
TODO Items
The file docs/source/todo.rst contains a comprehensive list of things that still have to be fixed, implemented, or optimized.