## Synopsis

A sprite compiler that targets 16-bit 65816 assembly code on the Apple IIgs computer.  The sprite compiler uses informed search techniques to generate optimal code for whole-sprite rendering.

## Example

The compiler takes a sparse, masked byte sequence represented as (data, mask, offset) tuples.  During the search, it tracks the state of the 65816 CPU registers in order to find an optimal sequence of operations that generates the sprite data.  The space of possible actions is defined by the subclasses of the CodeSequence class.
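
As a rough illustration of the input format, the tuples can be modelled as below. This is a minimal Python sketch, not the compiler's own code: the `SpriteByte` type, the `contiguous` and `sparse` helpers, and the convention that a mask of `0x00` means "fully opaque" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SpriteByte:
    """One byte of sprite data (hypothetical type, not the compiler's own)."""
    data: int    # value to store, 0x00-0xFF
    mask: int    # assumed convention: 0x00 means fully opaque
    offset: int  # byte offset from the sprite's first byte


def contiguous(values: Iterable[int]) -> List[SpriteByte]:
    """Unmasked run of bytes at offsets 0, 1, 2, ..."""
    return [SpriteByte(v, 0x00, i) for i, v in enumerate(values)]


def sparse(pairs: Iterable[Tuple[int, int]]) -> List[SpriteByte]:
    """Unmasked bytes given as (data, offset) pairs, sorted by offset."""
    return sorted((SpriteByte(d, 0x00, o) for d, o in pairs),
                  key=lambda b: b.offset)


# "Data = $11 $22" and a sprite that is three lines tall (160 bytes per line)
two_bytes = contiguous([0x11, 0x22])
three_lines = sparse([(0x11, 0), (0x11, 160), (0x11, 320)])
```

Keeping the offset explicit is what lets the same structure describe both a contiguous run and a sprite whose bytes sit on different lines.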

Currently, the compiler can only handle short, unmasked sequences, but it does find optimal code sequences for them.  Here are some samples of the code that the compiler generates:

### Data = $11 ###

```
	TCS       ; 2 cycles
	SEP	#$10  ; 3 cycles
	LDA	#$11  ; 2 cycles
	STA	00,s  ; 4 cycles
	REP	#$10  ; 3 cycles
; Total Cost = 14 cycles
```

### Data = $11 $22 ###

```
	TCS         ; 2 cycles
	LDA	#$2211  ; 3 cycles
	STA	00,s    ; 5 cycles
; Total Cost = 10 cycles
```

### Data = $11 $22 $22 ###

```
        TCS             ; 2 cycles
        LDA     #$2222  ; 3 cycles
        STA     02,s    ; 5 cycles
        LDA     #$2211  ; 3 cycles
        STA     01,s    ; 5 cycles
; Total Cost = 18 cycles
```

### Data = $11 $22 $11 $22 ###

```
	TCS         ; 2 cycles
	LDA	#$2211  ; 3 cycles
	STA	00,s    ; 5 cycles
	STA	02,s    ; 5 cycles
; Total Cost = 15 cycles
```

### Data = $11 $22 $33 $44 $55 $66 ###

```
	ADC	#5     ; 3 cycles
	TCS        ; 2 cycles
	PEA	$6655  ; 5 cycles
	PEA	$4433  ; 5 cycles
	PEA	$2211  ; 5 cycles
; Total Cost = 20 cycles
```

### Data = $11 $22 $11 $22 $11 $22 $11 $22 ###

```
	ADC	#7      ; 3 cycles
	TCS         ; 2 cycles
	LDA	#$2211  ; 3 cycles
	PHA         ; 4 cycles
	PHA         ; 4 cycles
	PHA         ; 4 cycles
	PHA         ; 4 cycles
; Total Cost = 24 cycles
```

### Data = ($11, 0), ($11, 160), ($11, 320) ###

A simple sprite three lines tall.

```
	TCS             ; 2 cycles
	SEP     #$10    ; 3 cycles
	LDA     #$11    ; 2 cycles
	PHA             ; 3 cycles
	STA     A1,s    ; 4 cycles
	REP     #$10    ; 3 cycles
	TSC             ; 2 cycles
	ADC     #321    ; 3 cycles
	TCS             ; 2 cycles
	SEP     #$10    ; 3 cycles
	LDA     #$11    ; 2 cycles
	PHA             ; 3 cycles
	REP     #$10    ; 3 cycles
; Total Cost = 35 cycles
```
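
The totals in the listings above are the sums of the per-instruction cycle annotations. A small Python sketch like the following can check a listing mechanically; the `total_cycles` helper is hypothetical and simply assumes every instruction line carries a `; N cycles` comment in the format shown above.

```python
import re


def total_cycles(listing: str) -> int:
    """Sum every '; N cycles' annotation in an assembly listing.

    The '; Total Cost = N cycles' line is not counted, because the
    pattern requires the number to follow the semicolon directly.
    """
    return sum(int(n) for n in re.findall(r";\s*(\d+)\s+cycles", listing))


listing = """
    TCS         ; 2 cycles
    LDA #$2211  ; 3 cycles
    STA 00,s    ; 5 cycles
    STA 02,s    ; 5 cycles
; Total Cost = 15 cycles
"""

computed = total_cycles(listing)
```

For the `$11 $22 $11 $22` listing, `computed` matches the stated total of 15 cycles.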

## Limitations ##

 * The search is quite memory intensive, and its memory use grows too fast to handle multi-line sprite data yet.  Future versions will incorporate a more aggressive heuristic and Iterative Deepening A* (IDA*) search to reduce memory usage.

 * Carry Bit Tracking.  If the stack is moved backwards, the code does not track that the carry is now set, so the next ADC instruction will be off by one.  Proper carry bit tracking should remove the need for any CLC/SEC instructions.
 
 * High/Low Register States. The state of a register (A/X/Y/S/D) is identified as one of three states: IMMEDIATE, SCREEN_OFFSET, or UNINITIALIZED.  There are cases where the high and low bytes of a register are in different states, and the compiler does not track this scenario. For example:
 ```
	TCS             ; A = SCREEN_OFFSET
	SEP     #$10    ; Enable 8-bit accumulator
	LDA     #$11    ; Low Byte = IMMEDIATE / High Byte = SCREEN_OFFSET
```
Currently, the accumulator will be marked as having an IMMEDIATE value of $0011, rather than the indeterminate value of $--11.  If there are 16-bit data values equal to $0011, then the compiler could emit incorrect code.
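
One possible way to avoid this is to track the two halves of the accumulator separately. The Python sketch below is only an illustration of that idea, not the compiler's current design: the `Accumulator` class and its method names are hypothetical, and `None` stands in for "not a known immediate" (for example, a byte that still holds part of the screen offset).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Accumulator:
    """Per-byte knowledge of A (hypothetical model for illustration)."""
    lo: Optional[int]  # known immediate low byte, or None if unknown
    hi: Optional[int]  # known immediate high byte, or None if unknown

    def load8(self, value: int) -> "Accumulator":
        # An 8-bit LDA replaces only the low byte; the high byte is untouched.
        return Accumulator(value & 0xFF, self.hi)

    def load16(self, value: int) -> "Accumulator":
        # A 16-bit LDA replaces both bytes.
        return Accumulator(value & 0xFF, (value >> 8) & 0xFF)

    def holds16(self, value: int) -> bool:
        # True only when both bytes are known and match the requested word.
        return self.lo == (value & 0xFF) and self.hi == ((value >> 8) & 0xFF)


after_tcs = Accumulator(lo=None, hi=None)   # A = SCREEN_OFFSET
after_lda8 = after_tcs.load8(0x11)          # A = $--11
```

With this model, `after_lda8.holds16(0x0011)` is false, so a search using it would not reuse the accumulator for a 16-bit store of $0011.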

* Heuristic is too optimistic.  The heuristic function does not take into account the cost of masked data stores, stack movement, or switching between 8-bit and 16-bit modes.  As such, it underestimates the remaining cost in many cases, which causes the effective branching factor of the search to become too large. For example, one test case has a true cost of 37 cycles, but the heuristic estimated a total cost of 19 cycles from the initial state.

* Successor function allows non-productive moves.  Specifically, it allows a switch to/from 8/16-bit mode (i.e. SEP/REP instructions) at any point.  As such, the expander will propose long sequences of SEP/REP/SEP/REP/SEP/REP/...  Since each of these instructions is cheaper than any code that actually writes data, many states are expanded that are trivially redundant until their cumulative cost is greater than that of a store.  This can actually be quite high for 16-bit masked stores (16 cycles), which is 5 SEP/REP expansions.
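
The SEP/REP chains described in the last item could be cut off by refusing to propose a mode switch directly after another mode switch. The following Python sketch shows that filter under two assumptions: the search toggles a single register-width setting, and operations are identified by their mnemonic. The `allowed` and `successors` functions are hypothetical and do not correspond to the compiler's actual successor function.

```python
from typing import List, Optional, Sequence

MODE_SWITCHES = {"SEP", "REP"}


def allowed(previous_op: Optional[str], candidate_op: str) -> bool:
    """Reject a mode switch that directly follows another mode switch."""
    return not (previous_op in MODE_SWITCHES and candidate_op in MODE_SWITCHES)


def successors(path: Sequence[str], candidates: Sequence[str]) -> List[str]:
    """Filter the candidate operations for the state reached by `path`."""
    previous_op = path[-1] if path else None
    return [op for op in candidates if allowed(previous_op, op)]


after_sep = successors(["TCS", "SEP"], ["REP", "SEP", "LDA", "STA"])
after_tcs = successors(["TCS"], ["SEP", "LDA", "STA"])
```

Here `after_sep` contains only `LDA` and `STA`, while `after_tcs` still offers `SEP`. Under the single-setting assumption, a switch followed immediately by another switch either returns to the earlier mode or changes nothing, so discarding those paths should not remove any optimal solution.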

## License

MIT License