Replace it with an explicit opcode parameter that is passed around. That
is both slightly easier to reason about (to trace where it comes from)
and slightly faster, since it can be read from a register.
On my machine this reduces the time to reach the "Welcome to Macintosh"
message in a verbose boot of Mac OS X 10.2.8 from 31.8s to 30.6s (average
of 5 runs, measured using deterministic mode and looking at when
execution reaches PC 0x90004a88).
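A minimal sketch of the idea, not the actual emulator code: the names OpcodeHandler, reg_d_field and dispatch are hypothetical. Each handler receives the instruction word as an argument instead of reading it from a global, so the value can stay in a register and its origin is visible at the call site.

```cpp
#include <cstdint>

// Before (sketch): handlers would read a file-scope variable such as
//     static uint32_t current_opcode;   // hypothetical name
// After: the instruction word is an explicit parameter.
using OpcodeHandler = uint32_t (*)(uint32_t opcode);

// Example handler: extract the rD field (instruction bits 6..10 in
// PowerPC numbering, i.e. bits 21..25 counted from the LSB).
inline uint32_t reg_d_field(uint32_t opcode) {
    return (opcode >> 21) & 0x1F;
}

// The dispatcher passes the opcode along instead of storing it globally.
inline uint32_t dispatch(OpcodeHandler handler, uint32_t opcode) {
    return handler(opcode);
}
```

For instance, dispatching reg_d_field on 0x38600001 (addi r3,0,1) yields 3.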
There's no reason for it to be a global: we always set it and use it
in instruction implementations, and we never read it directly.
Perhaps the compiler could optimize this away, but it's better to be
simpler (and also be easier to read).
None of the POWER opcodes uses it now, plus it is a duplicate of ppc_setsoov (though ppc_setsoov is inline, so it would probably have to be moved before it could be used in poweropcodes.cpp).
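Use U instead of UL. With the U suffix, the literal gets the smallest unsigned type (starting at unsigned int) that can represent the value. Since 0xFFFFFFFF fits in 32 bits, 0xFFFFFFFFU is a 32-bit unsigned value wherever unsigned int is 32 bits wide.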
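As a small illustration of the suffix rule, assuming a platform where unsigned int is 32 bits wide (common, but not guaranteed by the C++ standard); low_word is a hypothetical helper, not a function from the source tree:

```cpp
#include <cstdint>
#include <type_traits>

// With the U suffix the literal is an unsigned int when the value fits.
// With UL it would be an unsigned long, whose width differs between
// platforms (64 bits on LP64 systems, 32 bits on 64-bit Windows).
static_assert(std::is_same<decltype(0xFFFFFFFFU), unsigned int>::value,
              "0xFFFFFFFFU is expected to be an unsigned int here");

// Hypothetical helper: keep only the low 32 bits of a 64-bit value.
inline uint32_t low_word(uint64_t value) {
    return uint32_t(value & 0xFFFFFFFFU);
}
```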
Including bits of rot_sh in the rA and MQ calculations is nonsensical since it is a rotation count and not a source of bits to be extracted or rotated.
The mask is not complicated, so we don't need to use power_rot_mask.
Fix carry flag calculation. Anding with the rotation count (n = rB) is nonsensical.
(r & ~mask) is the rotated word ANDed with the complement of the generated mask of n zeros followed by 32 - n ones.
The manual says this 32-bit result is ORed together. This means all the bits are ORed together which is equivalent to saying 0 if all zeros and 1 if any ones. In other words: (r & ~mask) != 0.
This boolean is ANDed with bit 0 of rS to produce the carry. int32_t(rS) < 0 tests bit 0. The && operator treats each side as a boolean, so the explicit "!= 0" test can be omitted.
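The carry rule described above can be sketched as follows. This is an illustration under assumptions, not the emulator's code: rotl32 and shift_right_carry are hypothetical names, n is assumed to be in the range 0..31, and the rotated word is assumed to be rS rotated left by 32 - n.

```cpp
#include <cstdint>

// Rotate left by sh; the count is reduced modulo 32 so that a count of
// 0 or 32 never produces an out-of-range shift.
inline uint32_t rotl32(uint32_t value, unsigned sh) {
    sh &= 31;
    return sh ? (value << sh) | (value >> (32 - sh)) : value;
}

// Carry for an algebraic right shift by n (0..31): set when the source
// is negative and at least one 1 bit is shifted out.
inline bool shift_right_carry(uint32_t rS, unsigned n) {
    uint32_t r    = rotl32(rS, 32 - n);   // rotated word
    uint32_t mask = 0xFFFFFFFFU >> n;     // n zeros followed by 32 - n ones
    // (r & ~mask) holds the bits shifted out; && converts it to a boolean.
    return (r & ~mask) && int32_t(rS) < 0;
}
```

With rS = 0x80000001 and n = 1 the shifted-out bit is 1 and rS is negative, so the carry is set; with rS = 0x00000001 the same bit is shifted out but rS is positive, so it is not.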
If bit 26 of rB is set then the mask should be all ones.
If bit 26 of rB is set then rA should be all ones or all zeros (depending on the sign bit of rA).
Test bit 26 of rB instead of using >= 0x20 to determine which operation to perform.
The two operations need to be switched such that rA is cleared when bit 26 is set.
Don't forget to store the result in rA.
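To illustrate why testing bit 26 differs from comparing against 0x20, here is a short sketch. The names RB_BIT_26, bit26_set and at_least_0x20 are hypothetical; the only fact relied on is that PowerPC numbers bits from the most significant end, so bit 26 of a 32-bit register has the value 0x20.

```cpp
#include <cstdint>

// Bit 0 is the MSB and bit 31 the LSB, so bit 26 is 1 << (31 - 26).
constexpr uint32_t RB_BIT_26 = 1U << (31 - 26);

// Test used to select the operation.
inline bool bit26_set(uint32_t rB) {
    return (rB & RB_BIT_26) != 0;
}

// The comparison it replaces; it also fires when only higher bits are set.
inline bool at_least_0x20(uint32_t rB) {
    return rB >= 0x20;
}
```

For rB = 0x40 the comparison is true although bit 26 is clear, so the two tests select different operations.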
Test bit 26 of rB instead of using >= 0x20 to determine which operation to perform.
Since the mask is not complicated, we don't need to use power_rot_mask.
It is redundant to test bit 0 of rS and then use bit 0 of rS in the case when bit 0 of rS is set.
In the case when bit 0 of rS is not set, using bit 0 of rS is incorrect since it results in no change of rA.
Operands are supposed to be two's complement numbers.
Calculate overflow first before calculating condition codes because the overflow condition is copied from XER.
Fix OV calculation. Previously, it was using power_setsoov which I think is only for add and subtract operations.
Fix CR calculation. It's supposed to depend on the low order 32 bits that are placed into MQ.
- Fix CR calculation. It depends on whether a match occurred and only the EQ flag is affected.
- Remove bytes_copied. We can subtract bytes_remaining from bytes_to_load to calculate that.
- Initialize ppc_result_d to zero so that bitmask is not needed to add new bytes to it. This is ok since the manual says that bytes that are not loaded are undefined.
Calculate overflow first before calculating condition codes because the overflow condition is copied from XER.
Fix OV calculation. Previously, it was using power_setsoov which I think is only for add and subtract operations. doz does a subtract but only if the result is supposed to be positive, therefore a negative result indicates an overflow.
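A rough sketch of the overflow rule described for doz, under the assumption that the instruction computes rB - rA when rA is not algebraically greater than rB and zero otherwise. DozResult and difference_or_zero are hypothetical names, and XER and CR updates are left out.

```cpp
#include <cstdint>

struct DozResult {
    uint32_t value;     // result written to the destination register
    bool     overflow;  // value for the OV bit
};

// "Difference or zero": operands are treated as two's complement numbers.
inline DozResult difference_or_zero(uint32_t rA, uint32_t rB) {
    if (int32_t(rA) >= int32_t(rB))
        return {0, false};            // result is clamped to zero
    uint32_t diff = rB - rA;          // modular 32-bit subtraction
    // The true difference is positive here, so a negative 32-bit result
    // means it did not fit: that is the overflow case.
    return {diff, int32_t(diff) < 0};
}
```

For example, rA = 0x80000000 and rB = 0x7FFFFFFF produce 0xFFFFFFFF with overflow set, while rA = 3 and rB = 10 produce 7 without overflow.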
dividend and divisor are supposed to be two's complement numbers.
Fix OV calculation. Previously, it was using power_setsoov which I think is only for add and subtract operations.
Fix CR calculation. It depends on the remainder, not the quotient.
dividend is supposed to be a two's complement number.
Fix test for dividend = -0x80000000 and divisor = -1. Previously, the test was assuming dividend was a 32-bit value from rA.
Fix OV calculation. Previously, it was using power_setsoov which I think is only for add and subtract operations.
Fix CR calculation. It depends on the remainder, not the quotient.
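The point about the dividend width can be sketched as below, assuming the dividend is the 64-bit two's complement value formed from rA (high word) and MQ (low word) and that overflow means the quotient does not fit in 32 bits. div_overflows is a hypothetical helper that only computes the overflow condition, not the instruction's results.

```cpp
#include <cstdint>

// True when dividing the 64-bit dividend rA||MQ by the 32-bit divisor rB
// cannot produce a quotient representable as a signed 32-bit value.
inline bool div_overflows(uint32_t rA, uint32_t mq, uint32_t rB) {
    int64_t dividend = int64_t((uint64_t(rA) << 32) | mq);
    int32_t divisor  = int32_t(rB);
    if (divisor == 0)
        return true;                  // division by zero
    if (dividend == INT64_MIN && divisor == -1)
        return true;                  // avoid undefined behaviour in C++
    int64_t quotient = dividend / divisor;
    return quotient < INT32_MIN || quotient > INT32_MAX;
}
```

With rA = 0xFFFFFFFF and MQ = 0x80000000 the dividend is -0x80000000, so dividing by -1 overflows. With rA = 0 and the same MQ the dividend is +0x80000000 and the quotient -0x80000000 fits, which is why the test has to look at the full 64-bit dividend.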
For MPC601 CPUs, all values of rA return 64 though the manual says undefined values of rA produce undefined results.
For non-MPC601 CPUs, if this instruction is included (such as for risu DPPC) then return results that are obtained from a G4 running Mac OS 9.2.2.
Making a negative value positive requires the unary negate operator rather than the binary AND operator, since negative numbers are stored in two's complement.
If ov is set then clear overflow when overflow doesn't happen.
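A minimal sketch of the difference between negating and masking the sign bit. The function names are hypothetical and flag handling (OV, CR) is omitted.

```cpp
#include <cstdint>

// Absolute value: negate negative inputs. Unsigned negation wraps, so
// 0x80000000 maps to itself (the overflow case).
inline uint32_t abs_sketch(uint32_t rA) {
    return int32_t(rA) < 0 ? 0U - rA : rA;
}

// Negative absolute value: negate positive inputs.
inline uint32_t nabs_sketch(uint32_t rA) {
    return int32_t(rA) > 0 ? 0U - rA : rA;
}

// The incorrect approach: clearing the sign bit does not negate a
// two's complement number.
inline uint32_t clear_sign_bit(uint32_t rA) {
    return rA & 0x7FFFFFFFU;
}
```

For rA = 0xFFFFFFFB (-5), abs_sketch returns 5, whereas clearing the sign bit returns 0x7FFFFFFB.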
doz and dozi were storing the result into the wrong register.
nabs was not taking into account two's complement storage of numbers
and was just setting the sign bit.
These two instructions are used in the implementation of text
measurement in native QuickDraw on 7.1.2/the PDM ROM, and the incorrect
values were resulting in nothing being rendered. With the fix text
appears when booting from the 7.1.2 CD.
Result of running IWYU (https://include-what-you-use.org/) and
applying most of the suggestions about unnecessary includes and
forward declarations.
Was motivated by observing that <thread> was being included in
ppcopcodes.cpp even though it was unused (found while researching
the use of threads), but seems generally good to help with build
times and correctness.
Use explicit cast when converting large integer types to smaller integer types when it is known that the most significant bytes are not required.
For pcidevice, check the ROM file size before casting to int. We'll allow expansion ROM sizes up to 4MB but usually they are 64K, sometimes 128K, rarely 256K.
For machinefactory, change the type to size_t so that it can correctly get the size of files that are larger than 4GB; it already checks that the file size is 4MB before we need to cast to uint32_t.
For floppyimg, check the image size before casting to int. For raw images, only allow files up to 2MB. For DiskCopy42 images, it already checks the file size, so do the cast after that.
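The check-then-cast pattern described in these changes might look like the following sketch. MAX_ROM_SIZE and checked_size_to_int are hypothetical names chosen for illustration; the 4MB limit is the expansion ROM limit mentioned above.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical limit matching the 4MB expansion ROM bound.
constexpr std::size_t MAX_ROM_SIZE = 4 * 1024 * 1024;

// Validate the size first, then narrow with an explicit cast so the
// conversion is visibly intentional.
inline int checked_size_to_int(std::size_t size, std::size_t max_size) {
    if (size > max_size ||
        size > static_cast<std::size_t>(std::numeric_limits<int>::max())) {
        throw std::length_error("file is larger than the allowed maximum");
    }
    return static_cast<int>(size);
}
```

A 64K ROM passes the check and is returned as 65536; a file one byte larger than MAX_ROM_SIZE raises std::length_error before any narrowing happens.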
All opcodes should be emulated now. There was also a significant amount of clean-up, particularly with lscbx and the bit rotation/shifting instructions.