Some of these accept a 4th register "C" so they exist in the opcode table 32 times.
Some of these don't accept a 4th register "C" so they exist in the opcode table once.
Instead of a primary opcode lookup table with 64 entries and a few
smaller tables with 4-2048 entries, use a single 64 * 2048 (128K)
entry table to dispatch opcodes.
Helps with performance, since we avoid the function call overhead for
some frequently-used instructions (e.g. branch, integer, floating point).
Saves ~2 seconds from the time to Welcome to Macintosh (same measurement
methodology as #125)
Secondarily also makes opcode registration/decoding a bit more uniform,
and scannable, since it's now all in initialize_ppc_opcode_table.
Replace it wth an explicit opcode parameter that is passed around. That
is both slightly easier to reason about (to trace where it comes from)
and slightly faster, since it can be read from a register.
On my machine takes booting to "Welcome to Macintosh" being output in
a verbose boot of Mac OS X 10.2.8 from 31.8s to 30.6s (average of 5
runs, measured using deterministic mode and looking at when execution
reaches PC 0x90004a88).
ppc_opcode16 and other functions are only needed in the implementation in
ppcexec.cpp, they don't need to be in the header.
fp_return_double and fp_return_uint64 have no uses (as of 2141a72b873763995b3428353dc7fd9d5bb47abb)
can can thus be removed altogether.
Similarly ppc_fpu_off has no uses (as of bb3f4e596e3f18f9414daa94e3639d2c192e93ec)
and can be removed.
Adds support for a --deterministic command-line option that makes
repeated runs the same:
- Keyboard and mouse input is ignored
- The sound server does a periodic pull from the DMA channel (so that
it gets drained), but only does so via a periodic timer (instead of
being driven by a cubeb callback, which could arrive at different
times)
- Disk image writes are disabled (reads of a modified area still
work via an in-memory copy)
- NVRAM writes are disabled
- The current time that ViaCuda initializes the guest OS is always the
same.
This makes execution exactly the same each time, which should
make debugging of more subtle issues easier.
To validate that the deterministic mode is working, I've added a
periodic log of the current "time" (measured in cycle count), PC
and opcode. When comparing two runs with --log-no-uptime, the generated
log files are identical.
Shutdown will enter the debugger or quit depending on the execution mode.
Quit is different from shutdown since it is triggered outside the guest by using the host Quit menu item.
Last use of grab_return was removed in f204caa9079aa94d90e1a8ef650b845283c1d46a.
grab_breakpoint was added in 2bd717e2931cba5be3152f92b3cca5e82e446759 but
never used.
There's no reason for it to be a global, we always set it and use it
in instruction implementations, and we never read it directly.
Perhaps the compiler could optimize this away, but it's better to be
simpler (and also be easier to read).
Mode 1 contains real addressing mode entries, which by definition cannot
be using segment registers. By skipping over them, we can shave off a
couple of seconds from the 10.2 boot time.
They happen surprisingly often, and flushing the TLB is expensive
because we need to walk over all entries.
Takes booting 10.2 on a Beige G3 from binary start to "Welcome to Macintosh"
from 58s to 38s on my machine.
Keeps track of instructions (including operands) that are executed,
to see if there are any hotspots that could be optimized or fastpaths
that should be added.
Also adds a mode where CPU profiler data is periodically output, to
make it easier to get at these instruction counts during startup.