The implementation is a bit tricky as it requires to take different code paths for the //e, the //c and the IIgs. Additionally the //c only provides a VBL IRQ flag supposed to be used by an IRQ handler to determine what triggered the IRQ. However, masking IRQs on the CPU, activating the VBL IRQ, clearing any pending VBL IRQs and then polling for the IRQ flag does the trick.
As described e.g. in the Apple IIe Technote #6: 'The Apple II Paddle Circuits' it doesn't work to call PREAD several times in immediate succession. However, so far the Apple II joystick driver did just that in order to read the two joystick axis.
Therefore the driver now uses a custom routine that reads both paddles _at_the_same_time_. The code doing so requires nearly twice the cycles meaning that the overall time for a joy_read() stays roughly the same. However, twice the cycles in the read loop means half the resolution. But for the cc65 joystick driver use case that doesn't hurt at all as the driver is supposed to only detect neutral vs. left/right and up/down.
CPU accelerators are supposed to detect access to $C070 and slow down for some time automatically. However, the IIgs rather comes with a modified ROM routine. Therefore it is necessary to manually slow down the IIgs when replacing the ROM routine.
Cleared the screen at the beginning of each demo instead of at the end. Setting the colors before clearing makes it more reliable and consistent across platforms.
The general approach of cl65 when generating the command lines to be executed is to first put options and the put files. However, this doesn't work well with the --lib option which would rather need to be put when libraries in general are put. I opted to not add this special behavior to cl65 as
* the use case for the --lib option is _VERY_ specific
* cl65 is after all a wrapper for ordinary use cases