There were two aspects of this behavior that were considered undesirable:
- Although the safe inlining is in general desirable it should only be enabled if asked for it - like any other optimization.
- The option name -Os implies that it is a safe option, the potentially unsafe inlining should have a more explicit name.
So now:
- The option -Os enables the safe inlining.
- The new option --eagerly-inline-funcs enables the potentially unsafe inlining (including the safe inlining).
Additionally was added:
- The option --inline-stdfuncs that does like -Os enable the safe inlining but doesn't enable optimizations.
- The pragma inline-stdfuncs that works identical to --inline-stdfuncs.
- The pragma allow-eager-inline that enables the potentially unsafe inlining but doesn't include the safe inlining. That means that by itself it only marks code as safe for potentially unsafe inlining but doesn't actually enable any inlining.
The 'all' target deliberately doesn't build the doc nor the samples. But that doesn't mean that the Makefiles in the 'doc' and 'samples' directories must default to the (empty) 'all' target.
For quite some time I deliberately didn't add cursor support to the Apple II CONIO imöplementation. I consider it inappropriate to increase the size of cgetc() unduly for a rather seldom used feature.
There's no hardware cursor on the Apple II so displaying a cursor during keyboard input means reading the character stored at the cursor location, writing the cursor character, reading the keyboard and finally writing back the character read initially.
The naive approach is to reuse the part of cputc() that determines the memory location of the character at the cursor position in order to read the character stored there. However that means to add at least one additional JSR / RTS pair to cputc() adding 4 bytes and 12 cycles :-( Apart from that this approach means still a "too" large cgetc().
The approach implemented instead is to include all functionality required by cgetc() into cputc() - which is to read the current character before writing a new one. This may seem surprising at first glance but an LDA(),Y / TAX sequence adds only 3 bytes and 7 cycles so it cheaper than the JSR / RTS pair and allows to brings down the code increase in cgetc() down to a reasonable value.
However so far the internal cputc() code in question saved the X register. Now it uses the X register to return the old character present before writing the new character for cgetc(). This requires some rather small adjustments in other functions using that internal cputc() code.