* Split the creation of the sprite stamps from adding the
sprites themselves. This allows for 48 stamps that can
be pre-rendered and quickly reassigned to sprites for
animations.
* Inlined all calls to PushDirtyTile. This removed the
significant overhead of calling such a small function and,
since almost all callers were checking multiple tiles, we
were able to avoid incrementing the count each time and
just apply a single increment at the end.
* Switched from recording each tile that a sprite intersects
with each frame to only recording the top-left tile and the
overlap size. This reduced overhead for larger sprites
and removed the need for an end-of-list marker.
* Much more aggressive caching of Sprite and Tile Store
values in order to streamline the inner tile dispatch
routines.
* Moved TileStore and Sprites (and other supporting
data structures) into a separate data bank. This was needed
mainly for size reasons, but it also provides micro-optimizations
by opening up the use of abs,y addressing modes.
* Revamped multi-sprite rendering code to avoid the need to
copy any masks. All stacked sprites can now be drawn
via a sequence of and [addrX],y; ora (addrX),y where
addrX is set once per tile.
* General streamlining to reduce overhead. This work was
focused on removing as much per-tile overhead as possible.
* Added a TS_LAST_VBUFF cached value in the tile store
* Added a fast path for single sprite w/fast test
* Improved raster timing granularity for visual profiling
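A rough model of the "top-left tile plus overlap size" record from the list above, written in Python rather than 65816 assembly. The 8x8 tile size and the names coverage and tiles are assumptions for illustration only, not the engine's actual layout or routines.

```python
TILE_W, TILE_H = 8, 8  # assumed tile size in pixels (hypothetical)


def coverage(x, y, w, h):
    """Return (top-left column, top-left row, tiles wide, tiles high)
    for a sprite whose top-left pixel is (x, y) and whose size is w x h."""
    col0, row0 = x // TILE_W, y // TILE_H
    col1, row1 = (x + w - 1) // TILE_W, (y + h - 1) // TILE_H
    return col0, row0, col1 - col0 + 1, row1 - row0 + 1


def tiles(cov):
    """Expand the compact record back into individual (col, row) tiles.
    No end-of-list marker is needed because the size is known up front."""
    col0, row0, cols, rows = cov
    return [(col0 + c, row0 + r) for r in range(rows) for c in range(cols)]
```

For example, a 16x16 sprite at pixel (12, 4) is stored as four numbers, (1, 0, 3, 3), instead of a nine-entry tile list plus terminator.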
All of the code paths are generally good. Both the _RenderSprites
and _ApplyDirtyTiles functions take a fair bit of raster time. Will
continue to try and streamline the data structures and code to
reduce overhead.
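The dirty-tile bookkeeping change (inlining PushDirtyTile and updating the count once) can be modeled roughly as below. This is a Python sketch, not the real assembly: DirtyList, push_dirty_tile, and mark_tiles_inlined are hypothetical names, and the per-tile dirty flag is an assumed detail.

```python
class DirtyList:
    """Hypothetical stand-in for the dirty tile list and its count."""

    def __init__(self, tile_count):
        self.entries = [0] * tile_count      # fixed-size list of tile ids
        self.is_dirty = [False] * tile_count
        self.count = 0


def push_dirty_tile(dl, tile):
    """Old style: one call per tile, shared count updated every time."""
    if not dl.is_dirty[tile]:
        dl.is_dirty[tile] = True
        dl.entries[dl.count] = tile
        dl.count += 1


def mark_tiles_inlined(dl, tile_ids):
    """New style: the check is inlined at the call site and the shared
    count is written back once, after all tiles have been handled."""
    n = dl.count                             # held locally, like a register
    for tile in tile_ids:
        if not dl.is_dirty[tile]:
            dl.is_dirty[tile] = True
            dl.entries[n] = tile
            n += 1
    dl.count = n                             # single update at the end
```

Both versions leave the list in the same state; the second simply touches the shared count once per sprite instead of once per tile.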
All of the sprite rendering has been deferred down to the level of
the tile drawing. Sprites are no longer drawn/erased, but instead
a sprite sheet is generated in AddSprite and referenced by the
renderer.
Because there is no longer a single off-screen buffer that holds
a copy of all the rendered sprites, the TileStore size must be
expanded to hold a reference to the sprite data address for each
tile. This increase in data structure size requires the TileStore
to be put into its own bank, along with appropriate code restructuring.
The benefits to the rewrite are significant:
1. Sprites are never drawn/erased off-screen. They are only
ever drawn directly to the screen or play field.
2. The concept of "damaged" sprites is gone. Every dirty tile
automatically renders just the portion of a sprite that it
intersects.
These two properties result in a substantial increase in throughput.
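A minimal Python model of per-tile compositing, where each dirty tile pulls only its own portion of each sprite from the pre-rendered sheet. The flat list layout, the stride parameter, and the name render_tile are assumptions for illustration; the real code works on 65816 memory with mask and data at addresses set once per tile.

```python
def render_tile(tile_words, sheet_data, sheet_mask, sprite_offsets, stride=1):
    """Composite the stacked sprites over one tile.

    tile_words     -- background words for this tile, one per row
    sheet_data     -- pre-rendered sprite pixel words
    sheet_mask     -- matching mask words (1 bits keep what is underneath)
    sprite_offsets -- per-sprite start index into the sheet for this tile,
                      bottom-most sprite first
    """
    out = []
    for row, word in enumerate(tile_words):
        for base in sprite_offsets:
            i = base + row * stride
            # Same shape as the and / ora pair: mask first, then merge data.
            word = (word & sheet_mask[i]) | sheet_data[i]
        out.append(word & 0xFFFF)
    return out
```

Because each tile only stores an offset per sprite, nothing is drawn or erased ahead of time; a tile that is not dirty never touches the sprite data at all.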
There is enough room in the 32-byte exception handler to inline the
9-byte epilogue when generating the code sequence for mixed BG1/BG0
rendering.
This code sequence is generated once and run for as many frames as the
word appears on screen, so saving an unconditional branch (3 cycles) on
every run at a one-time cost of 60 cycles is probably worth it.
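As a quick check of the trade-off, assuming the 60 cycles are paid once at code-generation time and the 3 cycles are saved on every run of the generated sequence (function names here are hypothetical):

```python
def break_even_runs(one_time_cost, saved_per_run):
    """Number of runs needed before the one-time cost is paid back."""
    return -(-one_time_cost // saved_per_run)  # ceiling division


def net_cycles(one_time_cost, saved_per_run, runs):
    """Cycles saved overall after the given number of runs
    (negative means the inlining has not paid for itself yet)."""
    return runs * saved_per_run - one_time_cost
```

With these numbers, break_even_runs(60, 3) is 20, so the inlined epilogue pays for itself once the sequence has run 20 times.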
The SEP/REP pairs that are used to move in and out
of 8-bit mode to do the single-byte pushes on the left
and right edges of the screen can also be used to clear
the necessary carry and overflow flags.
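A small Python model of how this works on the 65816 status register: SEP sets every flag bit named in its operand and REP clears them, so widening the REP mask clears carry and overflow at no extra cost. The flag bit positions are the standard 65816 ones; the exact masks used in the engine are an assumption here.

```python
FLAG_C = 0x01  # carry
FLAG_M = 0x20  # accumulator width (1 = 8-bit)
FLAG_V = 0x40  # overflow


def sep(p, mask):
    """SEP #mask: set the status bits selected by mask."""
    return (p | mask) & 0xFF


def rep(p, mask):
    """REP #mask: clear the status bits selected by mask."""
    return p & ~mask & 0xFF


# Enter 8-bit mode for the single-byte push, then leave it while also
# clearing carry and overflow in the same REP instruction.
p = FLAG_V | FLAG_C                      # flags left over from earlier code
p = sep(p, FLAG_M)                       # 8-bit accumulator
p = rep(p, FLAG_M | FLAG_V | FLAG_C)     # back to 16-bit, C and V cleared
```

In this model the combined operand is 0x61, which replaces a REP #$20 followed by separate CLC and CLV instructions.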