In #135 we switched from a static OpcodeGrabber table to a
curOpcodeGrabber pointer in ppc_main_opcode. This adds an extra
indirection (the generated assembly has an additional load), which
reduces execution speed.

Instead, pass the opcode grabber as a parameter to ppc_main_opcode,
and have ppc_exec_inner keep it up to date (via an EXEF_OPCODE
exception flag).
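A minimal sketch of the idea, not the actual emulator code: the table
pointer is held in a local inside the execution loop and handed to the
dispatch function as an argument, and it is reloaded only when a flag
says it changed. All names here (SketchCpu, EXEF_OPCODE_SKETCH,
main_opcode, exec_inner, the two-entry tables and the toy decode) are
hypothetical stand-ins.

```cpp
#include <cstddef>
#include <cstdint>

using std::uint32_t;

// Hypothetical handler type: returns a small tag so the flow is observable.
using Handler = int (*)(uint32_t opcode);

// Hypothetical flag bit, standing in for an "opcode table changed" flag.
constexpr uint32_t EXEF_OPCODE_SKETCH = 1u << 0;

struct SketchCpu {
    const Handler* table = nullptr;  // table the slow path reloads from
    uint32_t exec_flags = 0;
};
SketchCpu g_cpu;

int op_plain(uint32_t) { return 1; }
int op_alt(uint32_t) { return 2; }
int op_switch(uint32_t);  // defined below, after the tables exist

// Toy tables: slot 0 is an ordinary instruction, slot 1 switches tables.
const Handler table_plain[2] = {op_plain, op_switch};
const Handler table_alt[2] = {op_alt, op_switch};

// Pretend instruction that changes the active table and raises the flag.
int op_switch(uint32_t) {
    g_cpu.table = table_alt;
    g_cpu.exec_flags |= EXEF_OPCODE_SKETCH;
    return 0;
}

// The table arrives as a parameter, so no global pointer is loaded here.
inline int main_opcode(const Handler* table, uint32_t opcode) {
    return table[opcode & 1u](opcode);  // toy decode: low bit picks the slot
}

// Runs `count` instructions and returns the sum of the handler tags.
int exec_inner(const uint32_t* code, std::size_t count) {
    const Handler* table = g_cpu.table;  // local copy, can live in a register
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += main_opcode(table, code[i]);
        if (g_cpu.exec_flags & EXEF_OPCODE_SKETCH) {
            // Slow path: refresh the local copy only when it changed.
            g_cpu.exec_flags &= ~EXEF_OPCODE_SKETCH;
            table = g_cpu.table;
        }
    }
    return sum;
}
```

In this sketch the flag test is the only per-instruction cost of
keeping the table current; a real loop would presumably fold it into
an existing exception-flag check.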
Also fixes FPU instructions in ppctests - we now need to set the FP
MSR bit when initializing the CPU.
When the FP bit in the MSR is cleared, floating point instructions
should trigger a "no FPU" exception rather than run normally. This
appears to be required to allow correct graphical rendering under
Mac OS X - the FP bit is cleared via mtmsr and rfi instructions, and
something else appears to rely on the exception being thrown.
Implemented by maintaining a parallel version of the OpcodeGrabber
table (OpcodeGrabberNoFPU) which contains alternate implementations
for all the floating point instructions. We switch the table whenever
the MSR value changes, so the FP bit does not have to be tested on
each instruction. This should minimize the overhead of the check.
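A rough sketch of the parallel-table approach, under stated
assumptions: FpTable, make_table, set_msr, dispatch and the handler
names are hypothetical, only primary opcodes 59 and 63 are treated as
floating point (a real table would cover the FP loads and stores too),
and the "exception" is reduced to a return tag. MSR_FP uses the
PowerPC FP-available mask 0x2000.

```cpp
#include <array>
#include <cstdint>

using std::uint32_t;

constexpr uint32_t MSR_FP = 0x2000;  // FP-available bit in the PowerPC MSR

// Hypothetical handler type: returns a tag instead of doing real work.
using FpHandler = int (*)(uint32_t opcode);

int run_integer(uint32_t) { return 0; }   // any non-FP instruction
int run_fp(uint32_t) { return 1; }        // normal FP implementation
int raise_no_fpu(uint32_t) { return 2; }  // stand-in for the "no FPU" exception

using FpTable = std::array<FpHandler, 64>;  // indexed by primary opcode

// Build a table whose FP slots use the given handler.
FpTable make_table(FpHandler fp_handler) {
    FpTable t;
    t.fill(run_integer);
    t[59] = fp_handler;  // single-precision FP arithmetic group
    t[63] = fp_handler;  // double-precision FP arithmetic group
    return t;
}

const FpTable table_fpu = make_table(run_fp);
const FpTable table_no_fpu = make_table(raise_no_fpu);

struct FpuSketchCpu {
    uint32_t msr = 0;
    const FpTable* active = &table_no_fpu;  // FP bit is clear initially
};

// Every MSR write goes through here, so the table is chosen once per change.
void set_msr(FpuSketchCpu& cpu, uint32_t value) {
    cpu.msr = value;
    cpu.active = (value & MSR_FP) ? &table_fpu : &table_no_fpu;
}

// Dispatch never looks at the MSR; the table already encodes the FP state.
int dispatch(const FpuSketchCpu& cpu, uint32_t opcode) {
    return (*cpu.active)[opcode >> 26](opcode);
}
```

The design trade is a second table in memory in exchange for no FP
check in the FP handlers themselves.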
Replace it with an explicit opcode parameter that is passed around. That
is both slightly easier to reason about (to trace where it comes from)
and slightly faster, since it can be read from a register.
On my machine, this reduces the time for a verbose boot of Mac OS X
10.2.8 to output "Welcome to Macintosh" from 31.8s to 30.6s (average
of 5 runs, measured using deterministic mode and looking at when
execution reaches PC 0x90004a88).