* Split the creation of the sprite stamps from adding the
sprites themselves. This allows for 48 stamps that can
be pre-rendered and quickly reassigned to sprites for
animations.
* Inlined all calls to PushDirtyTile. This removed the
significant overhead of calling such a small function and,
since almost all callers were checking multiple tiles, we
were able to avoid incrementing the count each time and
just add a single increment at the end.
* Switched from recording each tile that a sprite intersects
each frame to only recording the top-left tile and the
overlap size. This reduced overhead for larger sprites
and removed the need for an end-of-list marker.
* Much more aggressive caching of Sprite and Tile Store
values in order to streamline the inner tile dispatch
routines.
* Moved TileStore and Sprites (and other supporting
data structures) into a separate data bank. This was needed
mainly for size reasons, but it also provides
micro-optimizations by opening up the use of abs,y
addressing modes.
* Revamped multi-sprite rendering code to avoid the need to
copy any masks. All stacked sprites can now be drawn
via a sequence of and [addrX],y; ora (addrX),y where
addrX is set once per tile.
* General streamlining to reduce overhead. This work was
focused on removing as much per-tile overhead as possible.
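
As a rough illustration of the PushDirtyTile and top-left/overlap-size
items above, here is a minimal Python model (not the engine's 65816
code). The names DirtyList and mark_sprite_tiles, and the 41-tile row
stride, are assumptions made for this sketch. Wrap-around at the
edges of the tile store is ignored.

```python
# Hypothetical model of the dirty-tile bookkeeping; all names and the
# 41x26 tile store dimensions are illustrative assumptions.
TILE_STORE_WIDTH = 41
TILE_STORE_HEIGHT = 26


class DirtyList:
    def __init__(self, capacity):
        self.flags = [False] * capacity   # one dirty flag per tile
        self.entries = [0] * capacity     # tile indices queued for drawing
        self.count = 0                    # number of valid entries


def mark_sprite_tiles(dirty, top_left, cols, rows, stride=TILE_STORE_WIDTH):
    """Mark the cols x rows block of tiles starting at top_left as dirty.

    The sprite only stores its top-left tile index and overlap size, so
    no per-sprite tile list or end-of-list marker is needed. The "push"
    is written inline and the shared count is written back once.
    """
    n = dirty.count                       # work on a local copy of the count
    for r in range(rows):
        base = top_left + r * stride
        for c in range(cols):
            idx = base + c
            if not dirty.flags[idx]:      # inlined push: skip tiles already queued
                dirty.flags[idx] = True
                dirty.entries[n] = idx
                n += 1
    dirty.count = n                       # single update at the end


dirty = DirtyList(TILE_STORE_WIDTH * TILE_STORE_HEIGHT)
mark_sprite_tiles(dirty, top_left=0, cols=2, rows=2)   # queues tiles 0, 1, 41, 42
mark_sprite_tiles(dirty, top_left=1, cols=2, rows=2)   # only 2 and 43 are new
```

Two overlapping 2x2 sprites end up queuing six distinct tiles, and the
shared count is written once per sprite instead of once per tile.
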
This significantly simplifies the dispatch process by creating a
proper backing store for the tiles. Most values that were
calculated on the fly are now stored as constants in the tile
store.
Also, all tile updates are run through the dirty tile list, which
solved a chicken-and-egg problem of which order to do sprites vs
new tiles and affords a lot of optimizations since tile rendering
is deferred and each tile is only drawn at most once per frame.
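
The deferred-rendering idea can be sketched in a few lines of Python.
This is a simplified model under stated assumptions, not the actual
implementation: TileStore here is a hypothetical class, and set_tile,
touch_sprite, and render_frame are names invented for the example.

```python
# Hypothetical model: every change to a tile goes through the dirty list,
# and drawing happens once per frame from that list.
class TileStore:
    def __init__(self, size):
        self.tile_id = [0] * size     # backing store: current tile per slot
        self.dirty = [False] * size   # per-slot dirty flag
        self.dirty_list = []          # slots queued for this frame
        self.draw_calls = []          # stand-in for actual rendering

    def _push_dirty(self, idx):
        if not self.dirty[idx]:       # queue each slot at most once
            self.dirty[idx] = True
            self.dirty_list.append(idx)

    def set_tile(self, idx, tile_id):
        # A new tile only updates the backing store; nothing is drawn yet.
        self.tile_id[idx] = tile_id
        self._push_dirty(idx)

    def touch_sprite(self, idx):
        # A sprite overlapping a slot just marks that slot dirty.
        self._push_dirty(idx)

    def render_frame(self):
        # Drawing reads the final state, so the order in which tile and
        # sprite updates arrived during the frame does not matter.
        for idx in self.dirty_list:
            self.draw_calls.append((idx, self.tile_id[idx]))
            self.dirty[idx] = False
        self.dirty_list.clear()


store = TileStore(16)
store.set_tile(3, 7)      # tile update first, then sprite
store.touch_sprite(3)
store.touch_sprite(4)     # sprite first, then tile update
store.set_tile(4, 9)
store.render_frame()      # each slot is drawn once with its final tile
```

Because drawing is driven only by the dirty list, slots 3 and 4 are
each drawn once, regardless of whether the sprite or the tile change
was recorded first.
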