dos33fsprogs/tfv/OPTIMIZATION

Original implementation:
Multiplying 1.0 * 2.0 = 2.0, took 707 cycles
Multiplying ff.ff * ff.ff = 0.0, took 761 cycles
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 88,179
Cycles: mode7= 76,077
Cycles: lookup_map= 33,920
Cycles: put_sprite= 2,561
==================================
Total = 200,971
Frame Rate = 4.98 fps
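The Total and Frame Rate lines in each table can be re-derived from the per-routine counts. A small sketch of that arithmetic follows. It assumes a clock of exactly 1,000,000 cycles per second. The log does not state this, but that value reproduces the 4.98 fps figure above.

```python
# Assumption: the log treats the CPU clock as exactly 1,000,000 cycles/second,
# which is the value that reproduces the "4.98 fps" figure.
CLOCK_HZ = 1_000_000

# Per-frame cycle counts copied from the "Original implementation" table.
cycles = {
    "flying": 162,
    "getkey": 46,
    "page_flip": 26,
    "multiply": 88_179,
    "mode7": 76_077,
    "lookup_map": 33_920,
    "put_sprite": 2_561,
}


def frame_rate(total_cycles, clock_hz=CLOCK_HZ):
    """Frames per second if every frame costs total_cycles cycles."""
    return clock_hz / total_cycles


total = sum(cycles.values())
fps = frame_rate(total)
print(f"Total = {total:,}")
print(f"Frame Rate = {fps:.2f} fps")
```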
Update Multiply to use zero page addresses:
Multiplying 1.0 * 2.0 = 2.0, took 616 cycles
Multiplying ff.ff * ff.ff = 0.0, took 664 cycles
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 76,561
Cycles: mode7= 76,077
Cycles: lookup_map= 33,920
Cycles: put_sprite= 2,561
===================================
Total = 189,353
Frame Rate = 5.28 fps
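The two multiply test cases are consistent with signed 8.8 fixed point (high byte integer, low byte fraction): ff.ff is then -1/256, and its square is too small to survive truncation, hence 0.0. The sketch below only models that arithmetic. The signed 8.8 reading is an inference from the logged results, and the helper names are made up for illustration.

```python
def to_signed16(raw):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    raw &= 0xFFFF
    return raw - 0x10000 if raw & 0x8000 else raw


def fixed_mul(a_raw, b_raw):
    """Multiply two 8.8 values given as 16-bit patterns, return a 16-bit pattern."""
    product = to_signed16(a_raw) * to_signed16(b_raw)  # 16.16 intermediate
    return (product >> 8) & 0xFFFF  # keep the middle 16 bits (8.8 result)


def fmt(raw):
    """Render a 16-bit pattern as hex 'II.FF' (integer byte, fraction byte)."""
    return f"{(raw >> 8) & 0xFF:02x}.{raw & 0xFF:02x}"


print(fmt(fixed_mul(0x0100, 0x0200)))  # 1.0 * 2.0
print(fmt(fixed_mul(0xFFFF, 0xFFFF)))  # ff.ff * ff.ff
```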
Update to use "fast multiply" with a 2kB squares table lookup:
Multiplying 1.0 * 2.0 = 2.0, took 228 cycles
Multiplying ff.ff * ff.ff = 0.0, took 272 cycles
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 27,041
Cycles: mode7= 76,077
Cycles: lookup_map= 33,920
Cycles: put_sprite= 2,561
=================================
Total = 139,833
Frame Rate = 7.15 fps
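A squares-table multiply is usually based on the quarter-square identity a*b = ((a+b)^2 - (a-b)^2) / 4, which trades the shift-and-add loop for two table lookups and a subtract. The sketch below shows the identity for unsigned 8-bit operands. It is an illustration only: the log does not show the real table layout in the 2kB or how the 16-bit operands are split, so the table shape and function name here are assumptions.

```python
# f(x) = floor(x*x / 4) for x = 0..510 covers every a+b of two 8-bit values.
SQUARES = [(x * x) // 4 for x in range(511)]


def fast_mul8(a, b):
    """Unsigned 8x8 -> 16 bit multiply using two lookups and one subtract."""
    if not (0 <= a <= 255 and 0 <= b <= 255):
        raise ValueError("operands must fit in 8 bits")
    # a+b and a-b always have the same parity, so the two floor() errors
    # cancel and the result is exact.
    return SQUARES[a + b] - SQUARES[abs(a - b)]


print(fast_mul8(3, 5))
print(fast_mul8(255, 255))
```

A 16x16 multiply can be assembled from four such 8x8 partial products, which may be why the per-call cost in the log drops by well over half rather than to a handful of cycles.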
Update to optimize fast multiply (reusing NUM1H, return results in register):
Multiplying 1.0 * 2.0 = 2.0, took 234 cycles
Multiplying ff.ff * ff.ff = 0.0, took 278 cycles
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 24,935
Cycles: mode7= 73,925
Cycles: lookup_map= 33,920
Cycles: put_sprite= 2,561
=================================
Total = 135,575
Frame Rate = 7.38 fps
Add a cache to lookup_map:
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 24,935
Cycles: mode7= 73,445
Cycles: lookup_map= 24,649
Cycles: put_sprite= 2,561
=================================
Total = 125,824
Frame Rate = 7.95 fps
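The log does not say what kind of cache was added. One plausible form is a one-entry cache: neighbouring screen pixels often land on the same map coordinate, so remembering the last lookup skips the slow path. The sketch below assumes that form. The map contents, sizes, and names are all hypothetical.

```python
MAP_SIZE = 8  # hypothetical map dimensions
TILE_MAP = [[(x + y) % 4 for x in range(MAP_SIZE)] for y in range(MAP_SIZE)]

slow_lookups = 0  # counts how often the expensive path runs


def slow_lookup(map_x, map_y):
    """Stand-in for the full (expensive) map lookup."""
    global slow_lookups
    slow_lookups += 1
    return TILE_MAP[map_y % MAP_SIZE][map_x % MAP_SIZE]


_last_key = None
_last_color = None


def lookup_map(map_x, map_y):
    """Return the tile colour, reusing the previous answer when the key repeats."""
    global _last_key, _last_color
    key = (map_x, map_y)
    if key != _last_key:
        _last_color = slow_lookup(map_x, map_y)
        _last_key = key
    return _last_color


# Consecutive pixels on one scanline frequently repeat a coordinate.
scanline = [(0, 0), (0, 0), (0, 0), (1, 0), (1, 0), (2, 1)]
colors = [lookup_map(x, y) for x, y in scanline]
print(colors, slow_lookups)
```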
Don't draw sky every frame:
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 24,935
Cycles: mode7= 69,099
Cycles: lookup_map= 24,649
Cycles: put_sprite= 2,561
=================================
Total = 121,478
Frame Rate = 8.23 fps
Move checking if over water out of critical section:
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 24,935
Cycles: mode7= 54,374
Cycles: lookup_map= 24,712
Cycles: put_sprite= 2,561
=================================
Total = 106,841
Frame Rate = 9.36 fps
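In the table above flying rises from 162 to 187 cycles while mode7 drops by about 14,700, which fits a check that used to run inside the drawing loop and now runs once per frame. The sketch below shows that kind of hoisting in general terms. The function names, the WATER value, and the data are hypothetical, not taken from the real code.

```python
WATER = 2  # hypothetical tile value meaning "water"


def draw_frame_in_loop(pixels, tile_under_ship):
    """Before: the water test is repeated for every pixel drawn."""
    checks = 0
    out = []
    for color in pixels:
        over_water = tile_under_ship == WATER  # loop-invariant, yet recomputed
        checks += 1
        out.append((color, over_water))
    return out, checks


def draw_frame_hoisted(pixels, tile_under_ship):
    """After: the water test runs once, outside the critical loop."""
    over_water = tile_under_ship == WATER
    checks = 1
    out = [(color, over_water) for color in pixels]
    return out, checks


pixels = [1, 1, 3, 2, 2, 0]
before, checks_before = draw_frame_in_loop(pixels, WATER)
after, checks_after = draw_frame_hoisted(pixels, WATER)
print(before == after, checks_before, checks_after)
```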
Move to 40x40 mode:
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 49,613
Cycles: mode7= 123,701
Cycles: lookup_map= 48,847
Cycles: put_sprite= 2,561
=================================
Total = 224,981
Frame Rate = 4.44 fps
Remove some unnecessary zero page copies in the mode7 code:
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 49,613
Cycles: mode7= 141,269
Cycles: lookup_map= 21,718
Cycles: put_sprite= 2,561
================================
Total = 215,420
Frame Rate = 4.64 fps
A few more minor cleanups in the Y loop:
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 49,613
Cycles: mode7= 140,858
Cycles: lookup_map= 21,718
Cycles: put_sprite= 2,561
===============================
Total = 215,009
Frame Rate = 4.65 fps
Add some self-modifying code to inner loop:
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 49,613
Cycles: mode7= 131,610
Cycles: lookup_map= 21,718
Cycles: put_sprite= 2,561
================================
Total = 205,761
Frame Rate = 4.86 fps
More self-modifying code, also move SCREEN_X to X register:
Cycles: flying= 187
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 49,613
Cycles: mode7= 118,034
Cycles: lookup_map= 22,747
Cycles: put_sprite= 2,561
================================
Total = 193,214
Frame Rate = 5.18 fps
Each cycle removed from the inner X loop saves
32*40 = 1280 cycles per frame
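To put the 1280-cycle figure in terms of frame rate, the sketch below applies it to the last total in this log (193,214). It assumes a 1,000,000 cycles/second clock, which matches the logged frame rates, and assumes the inner X loop body runs 40 times for each of 32 rows. The log gives only the product, so that split is a guess.

```python
CLOCK_HZ = 1_000_000      # assumption: matches the frame rates in this log
Y_ITERATIONS = 32         # assumption: rows drawn per frame
X_ITERATIONS = 40         # assumption: inner-loop passes per row
TOTAL_BEFORE = 193_214    # last Total in the log

CYCLES_PER_REMOVED = Y_ITERATIONS * X_ITERATIONS  # 1280


def total_after(cycles_removed):
    """Per-frame total after shaving cycles_removed off the inner X loop body."""
    return TOTAL_BEFORE - cycles_removed * CYCLES_PER_REMOVED


for removed in (1, 5, 10):
    total = total_after(removed)
    print(f"-{removed} cycles: total = {total:,}, {CLOCK_HZ / total:.2f} fps")
```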