dos33fsprogs/tfv/OPTIMIZATION

2017-11-24 19:40:50 +00:00
Original implementation:
Multiplying 1.0 * 2.0 = 2.0, took 707 cycles
Multiplying ff.ff * ff.ff = 0.0, took 761 cycles
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 88,179
Cycles: mode7= 76,077
Cycles: lookup_map= 33,920
Cycles: put_sprite= 2,561
==================================
Total = 200,971
Frame Rate = 4.98 fps
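The Total and Frame Rate lines in each entry appear to follow from the
per-routine counts: sum the cycles, then divide the clock rate by the sum.
A minimal sketch, assuming a flat 1,000,000 cycles per second (an assumption
that happens to reproduce the figures in this log; the real machine's clock
is slightly faster). The names frame_rate and CLOCK_HZ are illustrative only.

```python
# Assumed clock rate; this value reproduces the fps figures in this log.
CLOCK_HZ = 1_000_000

def frame_rate(cycles):
    """Return (total cycles, frames per second) for a per-routine cycle table."""
    total = sum(cycles.values())
    return total, CLOCK_HZ / total

# Cycle counts copied from the "Original implementation" entry.
original = {
    "flying": 162, "getkey": 46, "page_flip": 26, "multiply": 88_179,
    "mode7": 76_077, "lookup_map": 33_920, "put_sprite": 2_561,
}
total, fps = frame_rate(original)
print(f"Total = {total:,}  Frame Rate = {fps:.2f} fps")
```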
Update Multiply to use zero page addresses:
Multiplying 1.0 * 2.0 = 2.0, took 616 cycles
Multiplying ff.ff * ff.ff = 0.0, took 664 cycles
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 76,561
Cycles: mode7= 76,077
Cycles: lookup_map= 33,920
Cycles: put_sprite= 2,561
===================================
Total = 189,353
Frame Rate = 5.28 fps
2017-11-25 06:30:22 +00:00
Update to use "fast multiply" with a 2kB squares-table lookup:
Multiplying 1.0 * 2.0 = 2.0, took 228 cycles
Multiplying ff.ff * ff.ff = 0.0, took 272 cycles
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 27,041
Cycles: mode7= 76,077
Cycles: lookup_map= 33,920
Cycles: put_sprite= 2,561
=================================
Total = 139,833
Frame Rate = 7.15 fps
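A squares-table multiply usually rests on the identity
a*b = ((a+b)^2 - (a-b)^2) / 4, which replaces the shift-and-add loop with two
table lookups and a subtract. A minimal sketch of that identity for unsigned
8-bit operands follows. The table layout here is an assumption and need not
match the 2kB layout used by the assembly; SQUARES and fast_mul8 are
illustrative names.

```python
# Table of floor(n*n/4) for every possible sum of two 8-bit values (0..510).
SQUARES = [(n * n) // 4 for n in range(511)]

def fast_mul8(a, b):
    """Unsigned 8x8 -> 16-bit multiply using two table lookups and a subtract."""
    assert 0 <= a <= 255 and 0 <= b <= 255
    # a+b and |a-b| always have the same parity, so the two floor()
    # roundings cancel and the result is exact.
    return SQUARES[a + b] - SQUARES[abs(a - b)]

# Exhaustive check against the built-in multiply.
assert all(fast_mul8(a, b) == a * b for a in range(256) for b in range(256))
```

A 16-bit multiply can then be assembled from several 8x8 partial products.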
2017-11-26 02:55:45 +00:00
Update to optimize fast multiply (reusing NUM1H, return results in register)
Multiplying 1.0 * 2.0 = 2.0, took 234 cycles
Multiplying ff.ff * ff.ff = 0.0, took 278 cycles
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 24,935
Cycles: mode7= 73,925
Cycles: lookup_map= 33,920
Cycles: put_sprite= 2,561
=================================
Total = 135,575
Frame Rate = 7.38 fps
2017-11-26 04:27:55 +00:00
Add a cache to lookup_map
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 24,935
Cycles: mode7= 73,445
Cycles: lookup_map= 24,649
Cycles: put_sprite= 2,561
=================================
Total = 125,824
Frame Rate = 7.95 fps
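The log does not say how the lookup_map cache works. One plausible shape,
shown purely as an assumption, is a one-entry cache: neighbouring screen
pixels often land on the same map cell, so remembering the last coordinates
and result skips the full lookup. CachedMap and tile_color are hypothetical
names, and the checkerboard is a stand-in for the real map data.

```python
class CachedMap:
    """Wrap a map lookup with a one-entry cache (hypothetical design)."""

    def __init__(self, lookup):
        self.lookup = lookup
        self.last_key = None
        self.last_value = None
        self.misses = 0

    def get(self, x, y):
        key = (x, y)
        if key != self.last_key:
            # Miss: do the full lookup and remember the result.
            self.misses += 1
            self.last_key = key
            self.last_value = self.lookup(x, y)
        return self.last_value

def tile_color(x, y):
    # Stand-in for the real map: a checkerboard of two colour values.
    return 1 if (x + y) % 2 == 0 else 2

cached = CachedMap(tile_color)
# 40 screen columns, each map cell covering 4 adjacent columns.
row = [cached.get(sx // 4, 0) for sx in range(40)]
print(cached.misses, "full lookups for", len(row), "pixels")
```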
2017-11-26 05:10:58 +00:00
Don't draw sky every frame
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 24,935
Cycles: mode7= 69,099
Cycles: lookup_map= 24,649
Cycles: put_sprite= 2,561
=================================
Total = 121,478
Frame Rate = 8.23 fps
Move checking if over water out of critical section
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 24,935
Cycles: mode7= 54,374
Cycles: lookup_map= 24,712
Cycles: put_sprite= 2,561
=================================
Total = 106,841
Frame Rate = 9.36 fps
Move to 40x40 mode
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 49,613
Cycles: mode7= 123,701
Cycles: lookup_map= 48,847
Cycles: put_sprite= 2,561
=================================
Total = 224,981
Frame Rate = 4.44 fps
2017-11-26 23:35:50 +00:00
Remove some unnecessary zero page copies in the mode7 code
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 49,613
Cycles: mode7= 141,269
Cycles: lookup_map= 21,718
Cycles: put_sprite= 2,561
================================
Total = 215,420
Frame Rate = 4.64 fps
2017-11-27 00:20:58 +00:00
A few more minor cleanups in the Y loop
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 49,613
Cycles: mode7= 140,858
Cycles: lookup_map= 21,718
Cycles: put_sprite= 2,561
===============================
Total = 215,009
Frame Rate = 4.65 fps
2017-11-27 01:51:24 +00:00
Add some self-modifying code to inner loop:
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 49,613
Cycles: mode7= 131,610
Cycles: lookup_map= 21,718
Cycles: put_sprite= 2,561
================================
Total = 205,761
Frame Rate = 4.86 fps
More self-modifying code, also move SCREEN_X to X register
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 49,613
Cycles: mode7= 118,034
Cycles: lookup_map= 22,747
Cycles: put_sprite= 2,561
================================
Total = 193,214
Frame Rate = 5.18 fps
Remove unneeded precision in the 8.8 x 8.8 fixed point multiply
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 43,588
Cycles: mode7= 118,034
Cycles: lookup_map= 22,747
Cycles: put_sprite= 2,561
================================
Total = 187,189
Frame Rate = 5.34 fps
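Exactly which precision was removed is not recorded here. As an illustration
only: in an 8.8 x 8.8 multiply truncated back to 8.8, the low byte of the
low*low partial product can never reach the result, so it does not need to be
computed. The sketch below also reproduces the two test multiplies from the
log. The sign/magnitude handling in smul_8_8 is an assumption, and all
function names are hypothetical.

```python
def mul_8_8(a, b):
    """Multiply two unsigned 8.8 fixed-point values, truncating to 8.8."""
    ah, al = a >> 8, a & 0xFF
    bh, bl = b >> 8, b & 0xFF
    # Only the high byte of al*bl can reach the 8.8 result; its low byte
    # falls entirely below the kept fraction bits.
    result = ((ah * bh) << 8) + ah * bl + al * bh + ((al * bl) >> 8)
    return result & 0xFFFF  # discard overflow above the integer byte

def to_signed(v):
    """Interpret a 16-bit pattern as a two's-complement value."""
    return v - 0x10000 if v & 0x8000 else v

def smul_8_8(a, b):
    """Signed 8.8 multiply built on the unsigned one via sign/magnitude."""
    sa, sb = to_signed(a), to_signed(b)
    mag = mul_8_8(abs(sa), abs(sb))
    return (-mag if (sa < 0) != (sb < 0) else mag) & 0xFFFF

print(hex(smul_8_8(0x0100, 0x0200)))  # 1.0 * 2.0
print(hex(smul_8_8(0xFFFF, 0xFFFF)))  # ff.ff * ff.ff
```

Here ff.ff is -1/256, so the true product 1/65536 is below the smallest 8.8
step and truncates to 0.0, matching the test output in the entries above.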
2017-11-29 06:04:07 +00:00
In-line unsigned multiply inside of signed multiply (save 12 cycles)
Multiplying 1.0 * 2.0 = 2.0, took 198 cycles
Multiplying ff.ff * ff.ff = 0.0, took 218 cycles
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 40,888
Cycles: mode7= 118,034
Cycles: lookup_map= 22,747
Cycles: put_sprite= 2,561
================================
Total = 184,489
Frame Rate = 5.42 fps
2017-11-30 04:17:32 +00:00
Have loop counter count down from 40 instead of count up (avoid compare)
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 40,888
Cycles: mode7= 115,538
Cycles: lookup_map= 22,747
Cycles: put_sprite= 2,561
================================
Total = 181,993
Frame Rate = 5.49 fps
2017-11-30 05:13:23 +00:00
Move spacez updates out of line and also add some self-modifying code
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 40,680
Cycles: mode7= 114,830
Cycles: lookup_map= 22,747
Cycles: put_sprite= 2,561
================================
Total = 181,077
Frame Rate = 5.52 fps
Re-arranged multiply result register to allow more optimization.
This looks like a pessimization, but it's because the cycle counting code
had been undercounting and missed a few add routines :(
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 40,680
Cycles: mode7= 115,150
Cycles: lookup_map= 22,747
Cycles: put_sprite= 2,561
================================
Total = 181,397
Frame Rate = 5.51 fps
2017-12-01 05:01:34 +00:00
Make lookup_map inline. Again it looks like an impressive speedup
but some of this was fixing the cycle count estimates.
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 40,680
Cycles: mode7= 111,882
Cycles: lookup_map= 19,872
Cycles: put_sprite= 2,561
================================
Total = 175,254
Frame Rate = 5.71 fps
Each cycle removed from the inner X loop saves
32*40 = 1280 cycles per frame
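To put that number in context, the sketch below applies the 32*40 figure to
the most recent total in this log. It assumes the same 1,000,000 cycles per
second used to reproduce the frame rates above; after_trim and the constants
are illustrative names.

```python
CLOCK_HZ = 1_000_000        # assumed clock rate, consistent with the fps figures
INNER_ITERATIONS = 32 * 40  # inner X loop passes per frame, from the note above

def after_trim(total_cycles, cycles_removed):
    """Return (new total, new fps) after trimming cycles from the inner X loop."""
    new_total = total_cycles - cycles_removed * INNER_ITERATIONS
    return new_total, CLOCK_HZ / new_total

# Start from the last total recorded in this log and remove one cycle.
total, fps = after_trim(175_254, 1)
print(f"Total = {total:,}  Frame Rate = {fps:.2f} fps")
```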