* Split the creation of the sprite stamps from adding the
sprites themselves. This allows for 48 stamps that can
be pre-rendered and quickly reassigned to sprites for
animations.
* Inlined all calls to PushDirtyTile. This both removed
significant overhead from calling the small function and,
since almost all callers were checking multiple tiles, let
us avoid incrementing the count each time and instead apply
a single increment at the end.
* Switched from recording each tile that a sprite intersects
with each frame to only recording the top-left tile and the
overlap size. This reduced overhead for larger sprites
and removed the need for an end-of-list marker.
* Much more aggressive caching of Sprite and Tile Store
values in order to streamline the inner tile dispatch
routines.
* Moved TileStore and Sprites (and other supporting
data structures) into a separate data bank. This was needed
purely for size reasons, but it also provides
micro-optimizations by opening up the use of abs,y
addressing modes.
* Revamped multi-sprite rendering code to avoid the need to
copy any masks. All stacked sprites can now be drawn
via a sequence of and [addrX],y; ora (addrX),y where
addrX is set once per tile.
* General streamlining to reduce overhead. This work was
focused on removing as much per-tile overhead as possible.
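
As a rough illustration of the coverage-recording and dirty-list changes listed above, here is a minimal C sketch (the engine itself is 65816 assembly, so this only models the idea). It assumes 8x8-pixel tiles and a 41x26 tile store. The names SpriteCoverage, sprite_coverage, mark_coverage_dirty and the is_dirty flags are hypothetical and do not come from the source.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_W      8    /* assumed tile width in pixels */
#define TILE_H      8    /* assumed tile height in pixels */
#define STORE_COLS  41   /* hypothetical tile store width */
#define STORE_ROWS  26   /* hypothetical tile store height */

/* What is recorded per sprite: one tile index plus the overlap size,
   instead of a marker-terminated list of every intersected tile. */
typedef struct {
    uint16_t top_left;   /* index of the top-left tile the sprite touches */
    uint8_t  cols;       /* number of tile columns overlapped */
    uint8_t  rows;       /* number of tile rows overlapped */
} SpriteCoverage;

static uint16_t dirty_tiles[STORE_COLS * STORE_ROWS];
static uint8_t  is_dirty[STORE_COLS * STORE_ROWS];
static uint16_t dirty_count;

/* Compute coverage from a sprite's pixel bounding box. */
static SpriteCoverage sprite_coverage(int x, int y, int w, int h) {
    assert(x >= 0 && y >= 0 && w > 0 && h > 0);
    int col0 = x / TILE_W;
    int row0 = y / TILE_H;
    int col1 = (x + w - 1) / TILE_W;
    int row1 = (y + h - 1) / TILE_H;
    assert(col1 < STORE_COLS && row1 < STORE_ROWS);

    SpriteCoverage c;
    c.top_left = (uint16_t)(row0 * STORE_COLS + col0);
    c.cols = (uint8_t)(col1 - col0 + 1);
    c.rows = (uint8_t)(row1 - row0 + 1);
    return c;
}

/* The "push dirty tile" step is written inline in the loop, and the
   shared count is stored back once after all tiles have been checked. */
static void mark_coverage_dirty(SpriteCoverage c) {
    uint16_t n = dirty_count;               /* local running count */
    for (int r = 0; r < c.rows; r++) {
        for (int col = 0; col < c.cols; col++) {
            uint16_t t = (uint16_t)(c.top_left + r * STORE_COLS + col);
            if (!is_dirty[t]) {
                is_dirty[t] = 1;
                dirty_tiles[n++] = t;
            }
        }
    }
    dirty_count = n;                        /* single update at the end */
}
```

Under these assumptions, a 16x16 sprite at pixel (12, 5) is recorded as top-left tile 1 with a 3x3 overlap, rather than as nine separate tile entries.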

All of the sprite rendering has been deferred down to the level of
the tile drawing. Sprites are no longer drawn/erased, but instead
a sprite sheet is generated in AddSprite and referenced by the
renderer.

Because there is no longer a single off-screen buffer that holds
a copy of all the rendered sprites, the TileStore size must be
expanded to hold a reference to the sprite data address for each
tile. This increase in data structure size requires the TileStore
to be put into its own bank, along with appropriate code restructuring.

The benefits of the rewrite are significant:
1. Sprites are never drawn/erased off-screen. They are only
ever drawn directly to the screen or play field.
2. The concept of "damaged" sprites is gone. Every dirty tile
automatically renders just the portion of a sprite that it
intersects.
These two properties result in a substantial increase in throughput.
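
To make the per-tile rendering concrete, here is a minimal C sketch of how a dirty tile could composite its background with the sprites stacked on it, assuming each tile entry holds pointers to the matching slice of each sprite's data and mask. The TileEntry layout, TILE_WORDS, MAX_STACK and render_dirty_tile are hypothetical names for illustration. The mask convention (set bits keep the background, sprite data is zero where transparent) is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_WORDS 16   /* hypothetical: 8 rows x 2 16-bit words per tile */
#define MAX_STACK  4    /* hypothetical limit on sprites stacked on a tile */

/* Hypothetical per-tile record: the tile's own pixels plus, for each
   overlapping sprite, the address of the slice of pre-rendered sprite
   data and mask that falls inside this tile. */
typedef struct {
    const uint16_t *tile_data;
    const uint16_t *sprite_data[MAX_STACK];
    const uint16_t *sprite_mask[MAX_STACK];
    int             sprite_count;
} TileEntry;

/* Draw one dirty tile: start from the background word, then apply each
   stacked sprite as "AND with mask, OR with data". No mask is copied or
   merged beforehand, and nothing is drawn to an off-screen buffer. */
static void render_dirty_tile(const TileEntry *t, uint16_t *dest) {
    assert(t->sprite_count >= 0 && t->sprite_count <= MAX_STACK);
    for (int w = 0; w < TILE_WORDS; w++) {
        uint16_t px = t->tile_data[w];
        for (int s = 0; s < t->sprite_count; s++) {
            px = (uint16_t)((px & t->sprite_mask[s][w]) | t->sprite_data[s][w]);
        }
        dest[w] = px;
    }
}
```

Because each tile only reads the sprite slice that overlaps it, there is no separate step to erase or repair a sprite. Redrawing the dirty tile is enough.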