dos33fsprogs/tfv/OPTIMIZATION
2017-11-25 21:55:45 -05:00

61 lines
1.9 KiB
Plaintext

Original implementation:
Multiplying 1.0 * 2.0 = 2.0, took 707 cycles
Multiplying ff.ff * ff.ff = 0.0, took 761 cycles
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 88,179
Cycles: mode7= 76,077
Cycles: lookup_map= 33,920
Cycles: put_sprite= 2,561
==================================
Total = 200,971
Frame Rate = 4.98 fps
Update Multiply to use zero page addresses:
Multiplying 1.0 * 2.0 = 2.0, took 616 cycles
Multiplying ff.ff * ff.ff = 0.0, took 664 cycles
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 76,561
Cycles: mode7= 76,077
Cycles: lookup_map= 33,920
Cycles: put_sprite= 2,561
===================================
Total = 189,353
Frame Rate = 5.28 fps
Update to use "fast multiply" w 2kB squares table lookup:
Multiplying 1.0 * 2.0 = 2.0, took 228 cycles
Multiplying ff.ff * ff.ff = 0.0, took 272 cycles
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 27,041
Cycles: mode7= 76,077
Cycles: lookup_map= 33,920
Cycles: put_sprite= 2,561
=================================
Total = 139,833
Frame Rate = 7.15 fps
Update to optimize fast multiply (reusing NUM1H, return results in register)
Multiplying 1.0 * 2.0 = 2.0, took 234 cycles
Multiplying ff.ff * ff.ff = 0.0, took 278 cycles
Cycles: flying= 162
Cycles: getkey= 46
Cycles: page_flip= 26
Cycles: multiply= 24,935
Cycles: mode7= 73,925
Cycles: lookup_map= 33,920
Cycles: put_sprite= 2,561
=================================
Total = 135,575
Frame Rate = 7.38 fps