For quite some time I deliberately didn't add cursor support to the Apple II CONIO imöplementation. I consider it inappropriate to increase the size of cgetc() unduly for a rather seldom used feature.
There's no hardware cursor on the Apple II so displaying a cursor during keyboard input means reading the character stored at the cursor location, writing the cursor character, reading the keyboard and finally writing back the character read initially.
The naive approach is to reuse the part of cputc() that determines the memory location of the character at the cursor position in order to read the character stored there. However that means to add at least one additional JSR / RTS pair to cputc() adding 4 bytes and 12 cycles :-( Apart from that this approach means still a "too" large cgetc().
The approach implemented instead is to include all functionality required by cgetc() into cputc() - which is to read the current character before writing a new one. This may seem surprising at first glance but an LDA(),Y / TAX sequence adds only 3 bytes and 7 cycles so it cheaper than the JSR / RTS pair and allows to brings down the code increase in cgetc() down to a reasonable value.
However so far the internal cputc() code in question saved the X register. Now it uses the X register to return the old character present before writing the new character for cgetc(). This requires some rather small adjustments in other functions using that internal cputc() code.
About all CONIO functions offering a <...>xy variant call
popa
_gotoxy
By providing an internal gotoxy variant that starts with a popa all those CONIO function can be shortened by 3 bytes. As soon as program calls more than one CONIO function this means an overall code size reduction.