## Synopsis A sprite compiler that targets 16-bit 65816 assembly code on the Apple IIgs computer. The sprite compiler uses informed search techniques to generate optimal code for whole-sprite rendering. ## Example The compiler takes a simple masked, sparse byte sequence which are represented by (data, mask, offset) tuples. During the search, it tracks the state of the 65816 CPU registers in order to find an optimal sequence of operations to generated the sprite data. The space of possible actions are defined by the subclasses of the CodeSequence class. Currently, the compiler can only handle short, unmasked sequences, but it does correctly find optimal code sequences. Here is a sample of the code that the compiler generates ### Data = $11 ### ``` TCS ; 2 cycles SEP #$10 ; 3 cycles LDA #$11 ; 2 cycles STA 00,s ; 4 cycles REP #$10 ; 3 cycles ; Total Cost = 14 cycles ``` ### Data = $11 $22 ### ``` TCS ; 2 cycles LDA #$2211 ; 3 cycles STA 00,s ; 5 cycles ; Total Cost = 10 cycles ``` ### Data = $11 $22 $22 ``` TCS ; 2 cycles LDA #$2222 ; 3 cycles STA 02,s ; 5 cycles LDA #$2211 ; 3 cycles STA 01,s ; 5 cycles ; Total Cost = 18 cycles ``` ### Data = $11 $22 $11 $22 ### ``` TCS ; 2 cycles LDA #$2211 ; 3 cycles STA 00,s ; 5 cycles STA 02,s ; 5 cycles ; Total Cost = 15 cycles ``` ### Data = $11 $22 $33 $44 $55 $66 ### ``` ADC #5 ; 3 cycles TCS ; 2 cycles PEA $6655 ; 5 cycles PEA $4433 ; 5 cycles PEA $2211 ; 5 cycles ; Total Cost = 20 cycles ``` ### Data = $11 $22 $11 $22 $11 $22 $11 $22 ### ``` ADC #7 ; 3 cycles TCS ; 2 cycles LDA #$2211 ; 3 cycles PHA ; 4 cycles PHA ; 4 cycles PHA ; 4 cycles PHA ; 4 cycles ; Total Cost = 24 cycles ``` ### Data = ($11, 0), ($11, 160), ($11, 320) ### A simple sprite three lines tall. ``` TCS ; 2 cycles SEP #$10 ; 3 cycles LDA #$11 ; 2 cycles PHA ; 3 cycles STA A1,s ; 4 cycles REP #$10 ; 3 cycles TSC ; 2 cycles ADC #321 ; 3 cycles TCS ; 2 cycles SEP #$10 ; 3 cycles LDA #$11 ; 2 cycles PHA ; 3 cycles REP #$10 ; 3 cycles ; Total Cost = 35 cycles ``` ## Limitations ## * The search is quite memory intensive and grows too fast to handle multi-line sprite data yet. Future versions will incorporate more aggressive heuristic and Iterative Deepening A-Star search to mitigate the memory usage. * Carry Bit Tracking. If the stack is moved backwards, the code does not track that the carry is now set, so the next ADC instruction will be off by one. Proper carry bit tracking should remove the need for any CLC/SEC instructions. * High/Low Register States. The state of a register (A/X/Y/S/D) is itentified to be in one of three states: IMMEDIATE, SCREEN_OFFSET, UNINITIALIZED. There are cases where high and low bytes of a register can be in a different state and the compiler does not track this scenario. Ex. ``` TCS ; A = SCREEN_OFFSET SEP #$10 ; Endable 8-bit accumulator LDA #$11 ; Low Byte = IMMEDIATE / High Byte = SCREEN_OFFSET ``` Currently, the accumulator will be marked as having an IMMEDIATE value of $0011, rather than the intederminant value of $--11. If there are 16-bit data values equal to $0011, then the compiler could emit incorrect code. * Heuristic is too ooptimistic. The Heuristic function does not take into account the cost of masked data stores, stack movement, or switrched between 8-bit and 16-bit modes. As such, it is too optimistic in many cases and causes the effectrive branching factor of the search to become too large. For example, one test case has a true cost of 37 cycles, but the heuristic estimated a total cost of 19 cycles from the initial state. * Successor function allows non-productive moves. Specifically, it allows a switch to/from 8/16-bit mode at any point, e.g. SEP/REP instructions. As such, the expander will propose long sequences of SEP/REP/SEP/REP/SEP/REP/... Since each instruction is cheaper than any code that actually writes data, many states are expanded that a trivially redundent until their cumulative cost is greater that a store. This can actually be quite high for 16-bit masked stores (16 cycles), which is 5 SEP/REP expansions. ## License MIT License