Tokenizer Overview
Generally, the usage pattern is:
- Set up the Configuration.
- Read the tokens. Token readers are: ModernTokenReader and ClassicTokenReader.
- Parse the tokens into a Program.
- Apply transformations, if applicable.
Code snippets
Configuration config = Configuration.builder()
        .sourceFile(this.sourceFile)
        .build();
The Configuration class also allows the BASIC start address to be set (defaults to 0x801) and the maximum line length to be set (in bytes; defaults to 255, but feel free to experiment). Some of the classes report output via the debug stream, which defaults to a simple null stream (no output); replace it with System.out or another PrintStream.
Queue<Token> tokens = ModernTokenReader.tokenize(config.sourceFile);
The list of tokens is a loose interpretation of "token". It is closer to a compiler's sense of tokens: numbers, end-of-line markers (they're significant), AppleSoft tokens, strings, comments, identifiers, etc.
Parser parser = new Parser(tokens);
Program program = parser.parse();
The Program is now the parsed version of the BASIC program. Various Visitors may be used to report, gather information, or manipulate the tree in various ways.
ByteVisitor byteVisitor = Visitors.byteVisitor(config);
byte[] programData = byteVisitor.dump(program);
Finally, the ByteVisitor will transform the program into the tokenized form.
Directives
The framework allows embedding of directives.
$embed
NOTE: It appears that DOS 3.3 rewrites the resulting application and messes up the linked list of lines. ProDOS does not.
$embed will allow a binary to be embedded within the resulting application and can move it to a destination in memory. Please note that once the application is loaded on the Apple II, the program cannot be altered as the computer will crash.
Options:
- file=<string>, required. Specifies the file to load.
- moveto=<addr>, optional. If provided, generates code to move the binary to the destination. Automatically CALLed.
- var=<variable>, optional. If provided, the address is assigned to the variable specified.
Note that the current parser does not handle hex formats (at all). Instead, you may provide the address as a string that starts with a $ or 0x prefix.
Usage example:
5 $embed file="read.time.bin", moveto="0x0260"
The $embed directive must be last on the line (if there are comments, be sure to use the REMOVE_REM_STATEMENTS optimization).
From the circles-timing.bas sample, this is the beginning of the program:
0801:9A 09 00 00 8C 32 30 36 32 3A AB 31 00 A9 2B 85
     \___/ \___/ \____________/ \___/ \_______...
      Ptr, Line 0, CALL 2062,   :, GOTO 1, Assembly code...
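The layout of that first line can be checked by hand. The following is a minimal sketch, not part of the framework: the class and method names are hypothetical, and it assumes only the standard tokenized line layout (a two-byte pointer to the next line, a two-byte line number, the tokens, and a zero terminator). It shows that the byte following the terminator sits at address 2062, which is the address the line CALLs.

```java
/** Sketch only: decodes the header of the first tokenized line in the dump. */
class LineHeaderSketch {
    /** Reads a 16-bit little-endian value (low byte first). */
    static int word(int[] mem, int offset) {
        return (mem[offset] & 0xFF) | ((mem[offset + 1] & 0xFF) << 8);
    }

    /** Returns the address of the first byte after the line's zero terminator. */
    static int addressAfterLine(int[] mem, int base) {
        int index = 4;                 // skip next-line pointer and line number
        while (mem[index] != 0) {
            index++;                   // walk the tokens up to the terminator
        }
        return base + index + 1;
    }

    public static void main(String[] args) {
        // The sixteen bytes shown at $0801 in the dump.
        int[] dump = {0x9A, 0x09, 0x00, 0x00, 0x8C, 0x32, 0x30, 0x36,
                      0x32, 0x3A, 0xAB, 0x31, 0x00, 0xA9, 0x2B, 0x85};
        System.out.printf("next line at $%04X%n", word(dump, 0));
        System.out.printf("line number %d%n", word(dump, 2));
        System.out.printf("code starts at %d%n", addressAfterLine(dump, 0x0801));
    }
}
```

The pointer decodes to $099A and the line number to 0, so the embedded assembly begins immediately after the BASIC line that jumps over it.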
The move code is based on what Beagle Bros put into their Peeks, Pokes, and Pointers poster. (See Memory Move under the Useful Calls; the CALL -468 entry.)
LDA #<embeddedStart
STA $3C
LDA #>embeddedStart
STA $3D
LDA #<embeddedEnd
STA $3E
LDA #>embeddedEnd
STA $3F
LDA #<targetAddress
STA $42
LDA #>targetAddress
STA $43
LDY #0
JMP $FE2C
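To make the size of the move code concrete, here is a rough sketch that emits the listing above as machine code. It is not the framework's generator: the class name and the example end address are hypothetical, and it assumes only the standard 6502 encodings (A9 for LDA immediate, 85 for STA zero page, A0 for LDY immediate, 4C for JMP absolute).

```java
import java.io.ByteArrayOutputStream;

/** Sketch only: assembles the memory-move stub shown above into bytes. */
class MoveStubSketch {
    /** Emits LDA #value followed by STA zeroPage. */
    static void loadAndStore(ByteArrayOutputStream out, int value, int zeroPage) {
        out.write(0xA9);               // LDA #imm
        out.write(value & 0xFF);
        out.write(0x85);               // STA zp
        out.write(zeroPage & 0xFF);
    }

    static byte[] assemble(int embeddedStart, int embeddedEnd, int targetAddress) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        loadAndStore(out, embeddedStart, 0x3C);        // source start, low byte
        loadAndStore(out, embeddedStart >> 8, 0x3D);   // source start, high byte
        loadAndStore(out, embeddedEnd, 0x3E);          // source end, low byte
        loadAndStore(out, embeddedEnd >> 8, 0x3F);     // source end, high byte
        loadAndStore(out, targetAddress, 0x42);        // destination, low byte
        loadAndStore(out, targetAddress >> 8, 0x43);   // destination, high byte
        out.write(0xA0);               // LDY #0
        out.write(0x00);
        out.write(0x4C);               // JMP $FE2C (address stored low byte first)
        out.write(0x2C);
        out.write(0xFE);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // $0850 is a made-up end address; $0260 is the moveto value from the usage example.
        byte[] stub = assemble(0x082B, 0x0850, 0x0260);
        System.out.println("stub length = " + stub.length);
    }
}
```

The stub comes to 29 bytes. If the embedded binary follows the stub directly (an assumption), a stub placed at 2062 puts the binary at 2091, or $082B, which is consistent with the A9 2B 85 bytes at the end of the dump.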
$shape
$shape will generate a shape table based either on the source (src=) or binary (bin=) shape table provided. Source shape table generation is based on the shape table support in the st tool, where the source format is described in more detail.
Overall format is as follows:
$shape ( src="path" [ ,label=variable | ,assign=(varname1="label1" [,varname2="label2"]*) ]
       | bin="path" )
       [,poke=yes(default)|no]
       [,address=<variable>]
       [,init=yes|no ]
Shape from source
With the src= option, the shape table is generated on the fly from shape source. For example, the following shape source will insert a shape named "mouse" into the BASIC program:
; extracted from NEW MOUSE
.bitmap mouse
..........*X..
....XXXX.XX...
...XXXXXXXX...
.XXXXXXXXXXX..
XX.XXXXXXX.XX.
X...XXXXXXXXXX
XX............
.XXX.XX.......
...XXX........
Options on the source include:
- label=variable, which indicates a label is really a variable name; in the example, the variable name would be "MOUSE".
- assign=(...) will define a mapping from the label in the source to the BASIC variable name. An assign=(m="mouse") will define the variable M to be the shape number for the mouse.
Shape from binary
By using the bin= option, an already existing binary shape table can be inserted into the code. There are no additional options available in this case.
General options
- poke=yes|no (default=yes) will embed a POKE 232,<lowAddr>:POKE 233,<highAddr> into the line of code.
- address=<variable>, if supplied, will assign the address to a variable; therefore an address=AD will embed the variable AD into the line of code.
- init=yes|no (default=yes) will embed a simple ROT=0:SCALE=1 into the line of code for simple shape initialization.
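As an illustration of what the poke option produces, the sketch below splits a 16-bit address into the low and high bytes that go into locations 232 and 233. The class name, method name, and example address are hypothetical; this is not the framework's code, only the arithmetic behind the POKE pair.

```java
/** Sketch only: builds the POKE pair that points Applesoft at a shape table. */
class ShapePokeSketch {
    static String pokeStatement(int address) {
        int low = address & 0xFF;          // low byte goes to location 232
        int high = (address >> 8) & 0xFF;  // high byte goes to location 233
        return "POKE 232," + low + ":POKE 233," + high;
    }

    public static void main(String[] args) {
        // $0956 is a made-up table address used purely for illustration.
        System.out.println(pokeStatement(0x0956));
    }
}
```

For the made-up address $0956 the result is POKE 232,86:POKE 233,9.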
$hex
If embedding hexadecimal addresses into an application makes sense, the $hex directive allows that to be done in a rudimentary manner.
Sample:
10 call $hex value="fc58"
Yields:
10 call -936
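The arithmetic behind that result can be reproduced with a short sketch. This is not the directive's implementation: the class and method names are hypothetical, and treating values above 32767 as negative (by subtracting 65536) is an assumption that happens to match the sample.

```java
/** Sketch only: converts a hex address to a signed decimal, as in the sample. */
class HexDirectiveSketch {
    static int toSignedAddress(String hex) {
        int value = Integer.parseInt(hex, 16);   // accepts upper or lower case digits
        if (value < 0 || value > 0xFFFF) {
            throw new IllegalArgumentException("not a 16-bit address: " + hex);
        }
        // Assumption: addresses in the upper half of memory become negative.
        return value > 32767 ? value - 65536 : value;
    }

    public static void main(String[] args) {
        System.out.println(toSignedAddress("fc58"));
    }
}
```

Here fc58 is 64600 in decimal, and 64600 - 65536 gives -936.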
Optimizations
Optimizations are mechanisms to rewrite the Program, typically making the program smaller. Optimization itself is an enum which has a create method to set up the Visitor.
Current optimizations are:
- Remove empty statements will remove all extra colons. For example, the application in question may have used : to indicate nesting. Or they may just be accidents!
- Remove REM statements will remove all comments.
- Extract constant values will find all constant numerical references, insert a line 0 with assignments, and finally replace all the numbers with the appropriate variable name. The hypothesis is that the BASIC interpreter only parses the number once.
- Merge lines will identify all lines that are not a target of a GOTO/GOSUB-type action and rewrite the line by merging it with others. The concept involved is that the BASIC program is just a linked list and shortening the list will shorten the search path. The default max length in bytes is set to 255.
- Renumber will renumber the application, beginning with line 0. This makes the decoding a tiny bit more efficient in that the number to decode will be smaller in the token stream.
Sample use:
program = program.accept(Optimization.REMOVE_REM_STATEMENTS.create(config));
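To illustrate the idea behind Merge lines, here is a simplified, self-contained sketch. It does not use the framework's Program or Visitor types: the Line record and merge method are hypothetical, lines are plain text rather than tokens, and length is measured in characters rather than tokenized bytes. It also ignores statements such as IF or REM that would make merging unsafe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Sketch only: merges lines that are not GOTO/GOSUB targets into the previous line. */
class MergeLinesSketch {
    record Line(int number, String text) {}

    static List<Line> merge(List<Line> lines, Set<Integer> targets, int maxLength) {
        List<Line> merged = new ArrayList<>();
        for (Line line : lines) {
            // A line that something jumps to must keep its own line number.
            if (!merged.isEmpty() && !targets.contains(line.number())) {
                Line previous = merged.get(merged.size() - 1);
                String combined = previous.text() + ":" + line.text();
                if (combined.length() <= maxLength) {
                    merged.set(merged.size() - 1, new Line(previous.number(), combined));
                    continue;
                }
            }
            merged.add(line);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Line> program = List.of(
                new Line(10, "HOME"),
                new Line(20, "PRINT \"HI\""),
                new Line(30, "X=X+1"),
                new Line(40, "GOTO 30"));
        for (Line line : merge(program, Set.of(30), 255)) {
            System.out.println(line.number() + " " + line.text());
        }
    }
}
```

In this example line 30 is a GOTO target, so it survives as its own line, while lines 20 and 40 are folded into the lines before them. Four lines become two.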