All tests are randomly generated, and the end results were achieved empirically using a believed-good 68000 emulation. Some have subsequently been patched as and when 68000 emulation faults are found. JSON formatting is not guaranteed to be consistent.
Since the 68000 isn't fully orthogonal in its instruction encodings — in the ORI example above some modes and some sizes are illegal, those opcodes being used for other instructions — the tests labelled ORI will include both: (i) all valid ORIs; and (ii) some other instructions. I didn't consider this worth fixing.
Every generated opcode is followed by three words of mostly-random data; this data — and almost all other random numbers — never has the lowest bit set, and contains 00 where the size field would be if used for an An + Xn + d addressing mode.
1. MOVE is mostly untested; MOVEq is well-tested and other MOVEs appear within the test set as per the approximate generation algorithm above but due to an error in the generation of move.json, all of its opcodes are $2000 less than they should be, causing them to hit various instructions other than MOVE;
2. there is sparse coverage of the rotates and shifts: LS[L/R], AS[L/R], RO[L/R] and ROX[L/R]; and
3. there are similarly few tests of MULU.
Issues with comparing results between multiple emulators in the case of unusual instructions mean that no tests have been generated for:
1. MOVE [to or from] SR;
2. TRAP;
3. TRAPV;
4. MOVE [to or from] USP;
5. STOP;
6. RTE;
7. Bcc where the offset is an odd number; or
8. BSR where the offset is an odd number.
For both Bcc and BSR, there is good coverage of even-quantity offsets.
Lack of good documentation for the meaning of N and Z flags for DIVU and DIVS in the case of overflow means that the results here may or may not be correct; there was no consensus between emulators and I have been unable to find information on what the proper answers should be.