There was a problem with the fix in commit 0047b755c9660: an instruction should have had an immediate operand, but did not because it was missing the #. This might cause the code to behave incorrectly, depending on memory contents.
This includes changes to fread to fix two problems. The first is that fread would ignore and discard characters put back with ungetc(). The second is that it would generally return the wrong value if reading from stdin with an element size other than 1 (it would return the count of bytes, not elements).
This fixes#9 (the first problem mentioned above).
These are the stdio.asm changes that were present in the beta source code on Opus ][, but had been reverted in commit e3c0c962d4. As it turns out, the changes to stdio.asm were OK--the issue was simply that the definitions of stdin/stdout/stderr and the associated initialization code in vars.asm had not been updated to account for the new version of the FILE structure. That has now been done, allowing the changes to work properly.
This fixes#7.
This adds a check for the filename argument being null, in which case it bails out and returns NULL. Previously, it would just dereference the null pointer and treat the contents of memory location 0 as a filename, with unpredictable results. A null filename indicates freopen should try to reopen the same file with a different mode, but the degree of support for that is implementation-defined. We still don't really support it, but at least this will report the failure and avoid unpredictable behavior.
It also removes some unused code at the end, but still forces fputc and fgetc to be linked in. These are needed because there are weak references to them in putchar and getchar, which may need to be used if stdin/stdout have been reopened.
This is what the standards require. Note that the mark is only set to EOF when actually writing to the underlying file, not merely buffering data to write later. This is consistent with the usual POSIX implementation using O_APPEND.
This would seek to the wrong location, and in some cases try to seek before the beginning of the file, leading to an error condition.
The problem stemmed from fseek calling fflush, which sets up the stream's buffer state in a way appropriate for writing but not for reading. It then calls ftell, which (for a read-only stream) would misinterpret this state and miscalculate the current position.
The fix is to make fflush work correctly on a read-only stream, setting up the state for reading. (User code calling fflush on a read-only stream would be undefined behavior, but since fseek does it we need to make it work.)
This fixes#6.
Most files already used spaces, but three used tabs for indentation. These have been converted to use spaces. This allows the files to be displayed with proper formatting in modern editors and on GitHub. It also removes any dependency on SysTabs settings when assembling them.
The spacing in fpextra.asm was also modified to use standard column positions.
There are no non-whitespace changes in this commit.
The C standards require it to do this, in addition to calling fseek.
Here is a test that can show the issue (in a realistic program, the indicator would be set due to an actual IO error, but for testing purposes this just sets it explicitly):
#include <stdio.h>
int main(void) {
FILE *f = tmpfile();
if (!f) return 0;
f->_flag |= _IOERR;
if (!ferror(f)) puts("bad ferror");
rewind(f);
if (ferror(f)) puts("rewind does not reset ferror");
fclose(f);
}
The core loop uses the 65816's decimal mode to create a BCD representation of the number, then we convert that to ASCII to print it. The idea is based on some code from Kent Dickey, with adjustments to fit this codebase. This technique should be faster, and also should typically result in smaller code, since the long long division routine will not need to be linked into every program that uses printf.
The new m16.int64 file contains a new "negate8" macro (for 64-bit negation), as well as an improved version of "ph8". These have been added to the individual *.macros files, but if those are regenerated, m16.int64 should be used as an input to macgen ahead of m16.ORCA (which contains the original version of ph8).
EOF should not be returned in this case.
I think it shouldn't actually write anything to the destination in this case, but it currently does. That problem remains unfixed for the moment; addressing it would require us to have our own internal buffer.
Conversions using it will not actually work right yet (unless assignment is suppressed), because the necessary changes have not been made in SysFloat, but the infrastructure is now in place to allow for those changes.
This was not being reported unless assignment was suppressed, because the "sta ~eofFound" instruction was too late.
(Found based on test cases by Rich Felker.)
Code in ~scanf would call ~RemoveWord to remove the extra parameters, but ~RemoveWord is designed to be called from one of the ~Scan_... subroutines, which is (in effect) called by JSR and thus has an extra word on the stack for its return address. This stack misalignment caused ~RemoveWord to overwrite a word of the caller's stack when called from the code to remove extra parameters.
This could cause a crash in the following program:
#include <stdio.h>
void f(void) {
int a,b;
sscanf("Z", "%i%i", &a, &b);
}
int main(void) {
f();
}
This may cause scanf to return EOF, if no conversions have been done.
This seems to be the intent of the C standards, although the wording could be clearer. It is also consistent with all the other implementations I tested.
This applies to + or - signs for 'd' or 'i' conversions, and to '0x' or '0X' prefixes for 'i' and 'x' conversions. (The new support for signs on u/o/x conversions did not have the problem.)
These cases were treated as if they successfully matched a number, but now they are correctly treated as a failure, causing no assignment to be done and causing the function to return without processing any further directives.
(Incidentally, the handling of '0x' without following digits is something that differs in modern C libraries: macOS libc, glibc, and musl are all different. Our new behavior should match that of musl, which I believe is what is correct under the C standards.)
The 'll' length modifier is now fully supported for the d, i, o, u, x, and X conversion specifiers. The 'n' conversion specifier can also store to a long long, but the value is still limited to 64k. The 'j' length modifier (for intmax_t) is also now treated as specifying a 64-bit value.
The 'b' conversion specifier is an ORCA extension to support p-strings, but lower case letters are reserved for use in future C standards, and in fact C23 is likely to use 'b' for integers in binary notation. For the time being, 'b' is still supported with its existing meaning, but it is considered deprecated. It may be removed in the future if we want to support the C23 behavior. Upper-case letters are available to use for extensions, so 'P' should remain available for our use.
For example, the output of the following should have only two leading zeros (and the rest of the field width padded with spaces):
#include <stdio.h>
int main(void) {
printf("%016.5i\n", 123);
}
If the field width is given by an argument (when * is used in the format specification) and the value given is negative, it should be treated like a positive field width with the - flag. If a negative argument is given for the precision, it should be treated as if the precision was not specified.
The following is an example of code that behaved incorrectly:
#include <stdio.h>
int main(void) {
printf("%*iXX\n", -50, 123);
}
Currently, it works as follows:
*The 'll' length modifier is recognized. 'j' (for intmax_t) is also now treated as denoting a 64-bit type.
*The 'x', 'X', and 'o' format specifiers have full support for 64-bit types.
*The 'n' format specifier can write a 64-bit integer, but only actually supports values up to 64k.
*The 'd', 'i', and 'u' format specifiers can consume a 64-bit value, but they only print it correctly if it is within the range of 32-bit long/unsigned long.
Tabs have been expanded to spaces in several files that use mainly spaces for indentation.
The files ctype.asm, stdio.asm, and string.asm consistently use tabs for indentation. The tabs in these files have been left alone, except that a few tabs between sentences in comments were changed to spaces. One space-indented line in stdio.asm was changed to use a tab.
This could happen due to a race condition, which shouldn't generally be an issue on the GS but is at least theoretically possible under GNO or Golden Gate.
This could happen if a scan error occurred and certain conversion specifiers containing the % character in a scanset (e.g. %4[%]) appeared as subsequent elements in the format string. The fix is to more thoroughly parse the format string when cleaning up after a scan error.
This fixes#26.
perror() should write only a single new-line. Also, it should not write the prefix string, colon and space if the prefix string pointer is NULL or if it points to an empty string.
These all behave identically to 'f'. Note that none of these accept the hexadecimal floating-point representation as input, which they should under C99.