BT API
The BASIC Tokenizer API is a set of reusable code that can be used to parse a text-based AppleSoft BASIC program and generate the appropriate tokens. It also has multiple types of visitors that can rewrite the resulting parse tree to rearrange the code (calling them optimizations is a bit over the top).
Overview
Generally, the usage pattern is:
- Set up the Configuration.
- Read the tokens.
- Parse the tokens into a Program.
- Apply transformations, if applicable.
Code snippets
Configuration config = Configuration.builder()
.sourceFile(this.sourceFile)
.build();
The Configuration class also lets you set the BASIC start address (defaults to 0x801) and the maximum line length (in bytes; defaults to 255, but feel free to experiment). Some of the classes report output via the debug stream, which defaults to a null stream (no output). Replace it with System.out or another PrintStream to see that output.
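The defaults described above can be illustrated with a minimal builder sketch. The class ConfigSketch, its String-typed sourceFile field and its method names are hypothetical stand-ins, not the library's actual Configuration API. Only the default values (0x801, 255 bytes, a silent debug stream) come from the text. It needs Java 11+ for OutputStream.nullOutputStream().

```java
import java.io.OutputStream;
import java.io.PrintStream;

// Hypothetical stand-in for Configuration: names and types are assumptions;
// only the default values mirror the documentation.
final class ConfigSketch {
    final String sourceFile;
    final int startAddress;
    final int maxLineLength;
    final PrintStream debugStream;

    private ConfigSketch(Builder b) {
        this.sourceFile = b.sourceFile;
        this.startAddress = b.startAddress;
        this.maxLineLength = b.maxLineLength;
        this.debugStream = b.debugStream;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private String sourceFile;
        private int startAddress = 0x801;  // default BASIC start address
        private int maxLineLength = 255;   // default maximum line length, in bytes
        // Default debug stream silently discards everything written to it.
        private PrintStream debugStream = new PrintStream(OutputStream.nullOutputStream());

        Builder sourceFile(String sourceFile) { this.sourceFile = sourceFile; return this; }
        Builder startAddress(int startAddress) { this.startAddress = startAddress; return this; }
        Builder maxLineLength(int maxLineLength) { this.maxLineLength = maxLineLength; return this; }
        Builder debugStream(PrintStream debugStream) { this.debugStream = debugStream; return this; }

        ConfigSketch build() {
            return new ConfigSketch(this);
        }
    }

    public static void main(String[] args) {
        ConfigSketch config = ConfigSketch.builder()
                .sourceFile("circles-timing.bas")
                .debugStream(System.out)  // route debug output to the console instead of discarding it
                .build();
        config.debugStream.printf("start=0x%X maxLineLength=%d%n", config.startAddress, config.maxLineLength);
    }
}
```

The builder keeps every option optional. Callers override only what they need, which matches how the snippet above sets just the source file.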
Queue<Token> tokens = TokenReader.tokenize(config.sourceFile);
The list of tokens uses a loose definition of "token". It follows a compiler's sense of the term rather than only AppleSoft's keyword tokens, and includes numbers, end-of-line markers (they're significant), AppleSoft tokens, strings, comments, identifiers, etc.
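To illustrate this compiler-style notion of tokens, here is a minimal tokenizer sketch. TokenizerSketch, its Type enum and the small keyword subset are hypothetical and are not the real TokenReader. One simplification: keywords must be separated by spaces or punctuation, whereas real AppleSoft also finds keywords inside unbroken text. It uses records, so it needs Java 16+.

```java
import java.util.ArrayDeque;
import java.util.Locale;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of a "compiler-style" tokenizer; not the real TokenReader.
// Simplification: keywords must be separated by spaces or punctuation, unlike
// real AppleSoft, which finds keywords even inside unbroken text such as FORI=1TO9.
final class TokenizerSketch {
    enum Type { NUMBER, EOL, KEYWORD, STRING, COMMENT, IDENTIFIER, SYNTAX }

    record Tok(Type type, String text) { }

    // Small illustrative subset of AppleSoft keywords.
    private static final Set<String> KEYWORDS = Set.of(
            "PRINT", "GOTO", "GOSUB", "RETURN", "CALL", "IF", "THEN", "FOR", "TO", "NEXT", "END");

    static Queue<Tok> tokenize(String source) {
        Queue<Tok> tokens = new ArrayDeque<>();
        int n = source.length();
        int i = 0;
        while (i < n) {
            char c = source.charAt(i);
            if (c == '\n') {
                tokens.add(new Tok(Type.EOL, "\n"));  // end of line is significant
                i++;
            } else if (Character.isWhitespace(c)) {
                i++;                                   // other whitespace is dropped
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < n && (Character.isDigit(source.charAt(i)) || source.charAt(i) == '.')) i++;
                tokens.add(new Tok(Type.NUMBER, source.substring(start, i)));
            } else if (c == '"') {
                int start = i++;
                // A string ends at the closing quote or, if unterminated, at end of line.
                while (i < n && source.charAt(i) != '"' && source.charAt(i) != '\n') i++;
                if (i < n && source.charAt(i) == '"') i++;
                tokens.add(new Tok(Type.STRING, source.substring(start, i)));
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < n && Character.isLetterOrDigit(source.charAt(i))) i++;
                if (i < n && (source.charAt(i) == '$' || source.charAt(i) == '%')) i++;  // type suffix
                String word = source.substring(start, i).toUpperCase(Locale.ROOT);
                if (word.equals("REM")) {
                    // A comment runs to the end of the line; the EOL itself is tokenized next.
                    int end = source.indexOf('\n', i);
                    if (end < 0) end = n;
                    tokens.add(new Tok(Type.COMMENT, source.substring(i, end).trim()));
                    i = end;
                } else {
                    tokens.add(new Tok(KEYWORDS.contains(word) ? Type.KEYWORD : Type.IDENTIFIER, word));
                }
            } else {
                tokens.add(new Tok(Type.SYNTAX, String.valueOf(c)));  // ':', '=', '(', ...
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("10 PRINT \"HI\": GOTO 10\n20 REM DONE\n").forEach(System.out::println);
    }
}
```

Note that the line number, the colon and the end-of-line marker all become tokens in their own right. Each of them matters to a parser that has to rebuild lines and statements.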
Parser parser = new Parser(tokens);
Program program = parser.parse();
The Program is now the parsed version of the BASIC program. Visitors may be used to report on the tree, gather information from it, or manipulate it.
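The visitor idea can be sketched on a heavily simplified tree. Here each visit method returns a possibly rewritten node, so program.accept(visitor) yields a new Program. The records, the Visitor interface and both visitor classes are hypothetical illustrations, not the library's classes. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a parsed program and a Visitor whose
// visit methods return a (possibly rewritten) node. Not the library's classes.
final class VisitorSketch {
    record Statement(String text) { }
    record Line(int number, List<Statement> statements) { }
    record Program(List<Line> lines) {
        Program accept(Visitor visitor) { return visitor.visit(this); }
    }

    interface Visitor {
        // Default traversal rebuilds the tree; returning null from a visit drops that node.
        default Program visit(Program program) {
            List<Line> lines = new ArrayList<>();
            for (Line line : program.lines()) {
                Line result = visit(line);
                if (result != null) lines.add(result);
            }
            return new Program(lines);
        }

        default Line visit(Line line) {
            List<Statement> statements = new ArrayList<>();
            for (Statement statement : line.statements()) {
                Statement result = visit(statement);
                if (result != null) statements.add(result);
            }
            return new Line(line.number(), statements);
        }

        default Statement visit(Statement statement) { return statement; }
    }

    // Gathers information: counts statements and leaves the tree unchanged.
    static final class StatementCounter implements Visitor {
        int count;
        @Override public Statement visit(Statement statement) { count++; return statement; }
    }

    // Manipulates the tree: drops empty statements (stray colons).
    static final class DropEmptyStatements implements Visitor {
        @Override public Statement visit(Statement statement) {
            return statement.text().isBlank() ? null : statement;
        }
    }

    public static void main(String[] args) {
        Program program = new Program(List.of(
                new Line(10, List.of(new Statement("PRINT 1"), new Statement(""), new Statement("PRINT 2")))));
        StatementCounter counter = new StatementCounter();
        program.accept(counter);
        System.out.println(counter.count + " statements before");
        program = program.accept(new DropEmptyStatements());
        System.out.println(program.lines().get(0).statements().size() + " statements after");
    }
}
```

Putting the traversal in default methods means a concrete visitor overrides only the node type it cares about.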
Directives
The framework supports directives embedded in the BASIC source.
$embed
The $embed directive allows a binary to be embedded within the resulting application. Please note that once the application is loaded on the Apple II, the program cannot be altered, or the computer will crash. Usage example:
5 $embed "read.time.bin", "0x0260"
The $embed directive must be last on the line (if there are comments, be sure to use the REMOVE_REM_STATEMENTS optimization). It takes two parameters, the file name and the target address, both given as strings.
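The directive's documented syntax, two quoted string parameters with nothing following, can be checked with a small parser sketch. EmbedDirectiveSketch and its Embed record are hypothetical. Using Integer.decode for the address is this sketch's choice, not necessarily what the library does. It needs Java 16+ for records.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the documented syntax: $embed "file", "address".
// Only the syntax comes from the documentation; the names are illustrative.
final class EmbedDirectiveSketch {
    record Embed(String fileName, int targetAddress) { }

    // Both parameters are quoted strings; matches() below also rejects anything
    // after the directive, mirroring the "last on the line" rule.
    private static final Pattern EMBED =
            Pattern.compile("\\$embed\\s+\"([^\"]+)\"\\s*,\\s*\"([^\"]+)\"");

    static Embed parse(String statement) {
        Matcher m = EMBED.matcher(statement.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a well-formed $embed directive: " + statement);
        }
        // Integer.decode accepts "0x0260"; note that it would read "0260" (leading zero) as octal.
        return new Embed(m.group(1), Integer.decode(m.group(2)));
    }

    public static void main(String[] args) {
        Embed embed = parse("$embed \"read.time.bin\", \"0x0260\"");
        System.out.printf("%s -> $%04X%n", embed.fileName(), embed.targetAddress());
    }
}
```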
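From the circles-timing.bas sample, this is the beginning of the program in memory:
0801:9A 09 00 00 8C 32 30 36 32 3A AB 31 00 A9 2B 85
     \___/ \___/ \____________/ \/ \___/ \/ \_______...
     Ptr, Line 0, CALL 2062, :, GOTO 1, end of line, Assembly code...
Line 0 calls the machine code that starts immediately after its $00 terminator ($080E, i.e. 2062) and then continues with GOTO 1. The next-line pointer ($099A) skips over that code to the next BASIC line.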
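The byte layout above can be checked with a short decoder sketch. It relies on the standard AppleSoft line format: a 2-byte little-endian next-line pointer, a 2-byte little-endian line number, the tokenized body, then a $00 terminator. The class name is hypothetical, and the token table deliberately covers only the two tokens in this dump ($8C = CALL, $AB = GOTO). It needs Java 16+ for records.

```java
import java.util.Map;

// Sketch: decode the first tokenized line of an AppleSoft program from raw bytes.
// Layout: next-line pointer (2 bytes, little-endian), line number (2 bytes,
// little-endian), tokenized body, $00 terminator. Token table is a tiny subset.
final class LineDecoderSketch {
    private static final Map<Integer, String> TOKENS = Map.of(0x8C, "CALL", 0xAB, "GOTO");

    record DecodedLine(int nextLinePointer, int lineNumber, String text, int terminatorOffset) { }

    static DecodedLine decode(int[] bytes) {
        int next = bytes[0] | (bytes[1] << 8);     // little-endian: $9A,$09 -> $099A
        int number = bytes[2] | (bytes[3] << 8);
        StringBuilder text = new StringBuilder();
        int i = 4;
        while (bytes[i] != 0x00) {
            int b = bytes[i++];
            if (b >= 0x80) {
                // Bytes with the high bit set are keyword tokens.
                text.append(TOKENS.getOrDefault(b, String.format("<$%02X>", b))).append(' ');
            } else {
                text.append((char) b);              // plain ASCII: digits, ':' and so on
            }
        }
        return new DecodedLine(next, number, text.toString(), i);
    }

    public static void main(String[] args) {
        int start = 0x801;
        int[] dump = {0x9A, 0x09, 0x00, 0x00, 0x8C, 0x32, 0x30, 0x36, 0x32,
                      0x3A, 0xAB, 0x31, 0x00, 0xA9, 0x2B, 0x85};
        DecodedLine line = decode(dump);
        System.out.printf("next=$%04X line %d: %s%n", line.nextLinePointer(), line.lineNumber(), line.text());
        // The machine code starts right after the $00 terminator.
        System.out.println("code starts at " + (start + line.terminatorOffset() + 1));
    }
}
```

Running main prints the pointer $099A and line 0 as CALL 2062:GOTO 1, and it reports that the code starts at 2062, which is exactly the CALL target.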
Optimizations
Optimizations are mechanisms that rewrite the Program, typically making it smaller. Optimization itself is an enum with a create method that sets up the corresponding Visitor.
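The "enum with a create method" pattern can be sketched as follows. The program model (one list of statement strings per line), Config and Visitor are simplified stand-ins. REMOVE_EMPTY_STATEMENTS is an assumed constant name; only REMOVE_REM_STATEMENTS appears in this documentation. It needs Java 16+ for records.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Sketch of an enum whose constants hand out visitors via create(config).
// Config, Visitor and the list-of-strings program model are simplified stand-ins;
// REMOVE_EMPTY_STATEMENTS is an assumed name.
final class OptimizationSketch {
    record Config(int maxLineLength) { }

    // A "visitor" here is simply a function from program to rewritten program.
    interface Visitor extends UnaryOperator<List<List<String>>> { }

    enum Optimization {
        REMOVE_EMPTY_STATEMENTS {
            @Override
            Visitor create(Config config) {
                return program -> program.stream()
                        .map(line -> line.stream()
                                .filter(s -> !s.isBlank())
                                .collect(Collectors.toList()))
                        .collect(Collectors.toList());
            }
        },
        REMOVE_REM_STATEMENTS {
            @Override
            Visitor create(Config config) {
                // AppleSoft treats everything after REM as comment text.
                return program -> program.stream()
                        .map(line -> line.stream()
                                .filter(s -> !s.trim().toUpperCase(Locale.ROOT).startsWith("REM"))
                                .collect(Collectors.toList()))
                        .collect(Collectors.toList());
            }
        };

        // Each constant builds a fresh visitor; config is available for options such as maxLineLength.
        abstract Visitor create(Config config);
    }

    public static void main(String[] args) {
        Config config = new Config(255);
        List<List<String>> program = List.of(List.of("PRINT 1", "", "REM HELLO"), List.of("GOTO 10"));
        program = Optimization.REMOVE_EMPTY_STATEMENTS.create(config).apply(program);
        program = Optimization.REMOVE_REM_STATEMENTS.create(config).apply(program);
        System.out.println(program);
    }
}
```

A factory method per constant keeps the set of optimizations enumerable, while each call still gets a freshly configured visitor.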
Current optimizations are:
- Remove empty statements removes all extra colons, for example where the application used : to indicate nesting, or where they were simply left in by accident.
- Remove REM statements removes all comments.
- Merge lines identifies all lines that are not the target of a GOTO/GOSUB-type action and rewrites the program by merging each such line with the one before it. The idea is that a BASIC program is stored as a linked list of lines, so shortening the list shortens the search path when a line is looked up. The default maximum line length is 255 bytes.
- Renumber renumbers the application, beginning with line 0. This makes decoding a tiny bit more efficient, since the line numbers referenced in the token stream (for example after GOTO) are shorter.
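The merge-lines and renumber ideas can be sketched on a simplified text model. This is not the library's implementation, which works on the parse tree. The sketch makes several assumptions:
- Statements are uppercase text.
- Jump targets are found by a regex over GOTO/GOSUB/THEN followed by line numbers.
- Line length is counted in characters rather than tokenized bytes.
- Renumbering uses a step of 1.
- Merging is refused after IF or REM. This is a conservative guard added here.

It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of "merge lines" and "renumber" on a simplified text model (assumptions:
// uppercase statements, regex-detected jumps, length measured in characters).
final class MergeRenumberSketch {
    record Line(int number, List<String> statements) {
        String text() { return String.join(":", statements); }
    }

    // GOTO/GOSUB/THEN followed by one or more line numbers (covers ON ... GOTO lists).
    private static final Pattern JUMP = Pattern.compile("(GOTO|GOSUB|THEN)\\s*(\\d+(?:\\s*,\\s*\\d+)*)");
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Every line number referenced by a GOTO/GOSUB/THEN.
    static Set<Integer> jumpTargets(List<Line> program) {
        Set<Integer> targets = new HashSet<>();
        for (Line line : program) {
            Matcher jump = JUMP.matcher(line.text());
            while (jump.find()) {
                Matcher n = NUMBER.matcher(jump.group(2));
                while (n.find()) targets.add(Integer.parseInt(n.group()));
            }
        }
        return targets;
    }

    // Appends each non-target line to the line before it while the result fits in maxLength.
    static List<Line> mergeLines(List<Line> program, int maxLength) {
        Set<Integer> targets = jumpTargets(program);
        List<Line> out = new ArrayList<>();
        for (Line line : program) {
            if (!out.isEmpty() && !targets.contains(line.number())) {
                Line prev = out.get(out.size() - 1);
                String prevText = prev.text().toUpperCase(Locale.ROOT);
                // Conservative guard: appending after IF would make the code conditional,
                // and appending after REM would turn it into comment text.
                boolean safe = !prevText.contains("IF") && !prevText.contains("REM");
                if (safe && prev.text().length() + 1 + line.text().length() <= maxLength) {
                    List<String> merged = new ArrayList<>(prev.statements());
                    merged.addAll(line.statements());
                    out.set(out.size() - 1, new Line(prev.number(), merged));
                    continue;
                }
            }
            out.add(line);
        }
        return out;
    }

    // Renumbers from 0 in steps of 1 (the step is an assumption) and rewrites jump targets.
    static List<Line> renumber(List<Line> program) {
        Map<Integer, Integer> mapping = new HashMap<>();
        for (int i = 0; i < program.size(); i++) mapping.put(program.get(i).number(), i);
        List<Line> out = new ArrayList<>();
        for (int i = 0; i < program.size(); i++) {
            List<String> statements = new ArrayList<>();
            for (String s : program.get(i).statements()) statements.add(remapJumps(s, mapping));
            out.add(new Line(i, statements));
        }
        return out;
    }

    private static String remapJumps(String statement, Map<Integer, Integer> mapping) {
        return JUMP.matcher(statement).replaceAll(jump -> {
            String numbers = NUMBER.matcher(jump.group(2)).replaceAll(n -> {
                int old = Integer.parseInt(n.group());
                return String.valueOf(mapping.getOrDefault(old, old));  // unknown targets stay as-is
            });
            return Matcher.quoteReplacement(jump.group(1) + " " + numbers);
        });
    }

    public static void main(String[] args) {
        List<Line> program = List.of(
                new Line(10, List.of("PRINT \"A\"")),
                new Line(20, List.of("PRINT \"B\"")),
                new Line(30, List.of("GOTO 20")),
                new Line(40, List.of("END")));
        List<Line> merged = mergeLines(program, 255);
        renumber(merged).forEach(l -> System.out.println(l.number() + " " + l.text()));
    }
}
```

Here line 20 survives because it is a GOTO target, and lines 30 and 40 fold into it. After renumbering, main prints 0 PRINT "A" and 1 PRINT "B":GOTO 1:END.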
Sample use:
program = program.accept(Optimization.REMOVE_REM_STATEMENTS.create(config));