Small optimization to save 1 cycle per line

Lucas Scharenbroich 2021-07-07 21:13:18 -05:00
parent 39f61087ce
commit 6b32d61fa9
1 changed file with 42 additions and 15 deletions


@@ -2,6 +2,14 @@
; this implementation does is that it does break up the slam into chunks of scan lines to allow
; time for interrupts to be serviced in a timely manner.
;
; This is a fairly basic slam in that it does not try to align the direct page. An enhanced
; slammer could use the fact that the layout of scan lines relative to page boundaries repeats
; every 8 scan lines (8 * 160 = 1280 = 5 * 256 bytes), but some lines would then need to be split
; into two slams to keep the direct page aligned.
;
; At best, this saves 1 cycle per word, or 80 cycles per full-width line, which is the cost of
; only about 12 additional instructions, so this optimization is unlikely to lead to a net
; improvement.
;
; A = base address of top-left edge of the screen
; Y = number of scanlines to blit
; X = width of the screen in bytes
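The header comment says page-aligned addresses repeat every 8 scan lines. The following is a minimal Python sketch of why. It assumes a page-aligned screen base (the hypothetical `base` parameter, default 0) and 160-byte lines. Line starts fall back onto a page boundary after lcm(160, 256) = 1280 bytes, and any line that straddles a 256-byte boundary is one that would have to be split to keep the direct page aligned. The names `REPEAT_LINES` and `split_points` are illustrative only.

```python
from math import lcm

LINE_BYTES = 160  # full-width scan line: 80 words
PAGE = 256        # a direct page is aligned when its low byte (DL) is 0

# Line starts realign with page boundaries after lcm(160, 256) bytes.
REPEAT_LINES = lcm(LINE_BYTES, PAGE) // LINE_BYTES


def split_points(line, base=0):
    """Page boundaries that fall strictly inside scan line `line`.

    Assumes the hypothetical screen `base` is page-aligned. Each boundary
    marks where an aligned slam would have to end and a new one begin.
    """
    start = base + line * LINE_BYTES
    end = start + LINE_BYTES
    first = (start // PAGE + 1) * PAGE  # next page boundary after `start`
    return list(range(first, end, PAGE))


for line in range(REPEAT_LINES):
    offset = (line * LINE_BYTES) % PAGE  # where the line starts within its page
    print(line, offset, "split" if split_points(line) else "single")
```

Under these assumptions, lines 1, 3, 4 and 6 of each 8-line group straddle a page boundary, so half of the lines would need two slams.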
@@ -23,11 +31,14 @@ PEISlam
sta :inner+1
clc ; clear before the loop, since nothing in the loop affects the carry bit
bra :outer ; hop into the entry point. The loop control logic is next because
; the block of PEI instructions is too large to reach with short
; branches from the code after :pei_end
brl :outer ; hop into the entry point.
:control
]dp equ 158
lup 80 ; A full width screen is 160 bytes / 80 words
pei ]dp
]dp equ ]dp-2
--^
:pei_end
tdc ; Move to the next line
adc #160
tcd
@@ -37,9 +48,20 @@ PEISlam
dey ; decrement the total counter; if zero, we're done
beq :exit
dex ; decrement the inner counter
bne :inner ; if not zero, no break; go to the next line
dex ; decrement the inner counter. Both counters are set
beq :restore ; up so that they fall through by default to save a cycle
; per loop iteration.
:inner jmp $0000 ; 25 cycles of overhead per line. A full-width slam executes all
; 80 PEI instructions, each of which we expect to take 7 cycles
; since the direct page is not aligned. So the total overhead is
; 25 / (25 + 7 * 80) = 4.27% of execution
;
; Without the interrupt breaks, we could remove the dex/beq test
; and save 4 cycles per loop, which takes the overhead down to
; only 3.6%
:restore
tsx ; save the current stack
_R0W0 ; restore the execution environment and
lda :stk_save ; give a few cycles to catch some interrupts
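The overhead figures in the `:inner` comment can be checked with a short Python sketch. The sketch assumes native-mode 65816 branch timing (2 cycles when not taken, 3 when taken). It takes the 25-cycle per-line overhead and the 7-cycle unaligned PEI from the comment as given. The names `overhead`, `OLD_TAIL` and `NEW_TAIL` are illustrative. The sketch also shows where this commit's 1-cycle-per-line saving comes from: the old `dex`/`bne :inner` took its branch on most lines, while the new `dex`/`beq :restore` falls through into `:inner`. Both versions still execute the `jmp`.

```python
# Cycle figures quoted in the :inner comment (taken as given, not measured here).
PEI_WORDS = 80      # PEIs per full-width line (160 bytes / 80 words)
PEI_UNALIGNED = 7   # PEI cycles when the direct page low byte is non-zero

# Assumed native-mode 65816 branch timing: 2 cycles not taken, 3 cycles taken.
DEX = 2
OLD_TAIL = DEX + 3  # dex + bne :inner, taken on most lines
NEW_TAIL = DEX + 2  # dex + beq :restore, not taken on most lines


def overhead(per_line, words=PEI_WORDS, pei_cycles=PEI_UNALIGNED):
    """Fraction of each line's cycles spent on loop control instead of PEIs."""
    return per_line / (per_line + pei_cycles * words)


print(f"{overhead(25):.2%}")             # 4.27% with the interrupt check
print(f"{overhead(25 - NEW_TAIL):.2%}")  # 3.61% without the dex/beq test
print(OLD_TAIL - NEW_TAIL)               # 1 cycle saved per line
```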
@@ -51,7 +73,7 @@ PEISlam
txs ; set the stack address to the right edge
ldx #8 ; Enable interrupts at least once every 8 lines
_R1W1
:inner jmp $0000
bra :inner
:exit
_R0W0
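The loop tail can be modelled in Python to show which lines open an interrupt window. In the tail, `dey`/`beq :exit` runs first, then `dex`/`beq :restore`, and X is reloaded with `ldx #8`. This is a sketch under stated assumptions. It assumes at least one line and that X holds 8 on entry, because the entry code at `:outer` is not shown in this diff. `interrupt_points` is a hypothetical name.

```python
def interrupt_points(total_lines, window=8):
    """Lines after which :restore runs and interrupts get a chance.

    Assumes total_lines >= 1 and that X starts at `window` (an assumption;
    the entry code is not shown).
    """
    y = total_lines  # Y: total scan lines remaining
    x = window       # X: lines remaining before the next interrupt window
    points = []
    line = 0
    while True:
        line += 1          # jmp into the PEI block: one line slammed
        y -= 1             # dey
        if y == 0:         # beq :exit
            break
        x -= 1             # dex
        if x == 0:         # beq :restore, taken once every `window` lines
            points.append(line)
            x = window     # ldx #8 before returning to :inner
        # otherwise fall through to :inner for the next line
    return points


print(interrupt_points(20))  # [8, 16]
```

Because the `dey` test runs first, the routine never opens an interrupt window after the final line. It leaves through `:exit` instead.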
@@ -62,14 +84,6 @@ PEISlam
pld
rts
]dp equ 158
lup 80 ; A full width screen is 160 bytes / 80 words
pei ]dp
]dp equ ]dp-2
--^
:pei_end
jmp :control
:stk_save ds 2
:screen_width ds 2
@@ -108,6 +122,19 @@