Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
import { byte, KnownValues, Memory, word } from '../types';
|
|
|
|
import { STRING_TO_TOKEN } from './tokens';
|
|
|
|
import { TXTTAB, PRGEND, VARTAB, ARYTAB, STREND } from './zeropage';
|
|
|
|
|
|
|
|
/** Default address for program start */
|
2021-03-31 00:27:44 +00:00
|
|
|
const PROGRAM_START = 0x801;
|
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
/** Parse states. Starts in `NORMAL`. */
|
|
|
|
enum STATES {
|
|
|
|
/**
|
|
|
|
* Tries to tokenize the input. Transitions:
|
|
|
|
* * `"`: `STRING`
|
|
|
|
* * `REM`: `COMMENT`
|
|
|
|
* * `DATA`: `DATA`
|
|
|
|
*/
|
|
|
|
NORMAL = 0,
|
|
|
|
/**
|
|
|
|
* Stores the input exactly. Tranistions:
|
|
|
|
* * `"`: `NORMAL`
|
|
|
|
*/
|
|
|
|
STRING = 1,
|
|
|
|
/** Stores the input exactly up until the end of the line. No transitions. */
|
|
|
|
COMMENT = 2,
|
|
|
|
/**
|
|
|
|
* Stores the input exactly. Transitions:
|
|
|
|
* * `:`: `NORMAL`
|
|
|
|
* * `"`: `DATA_QUOTE`
|
|
|
|
*/
|
|
|
|
DATA = 3,
|
|
|
|
/**
|
|
|
|
* Stores the input exactly. Transitions:
|
|
|
|
* * `"`: `DATA`
|
|
|
|
*/
|
|
|
|
DATA_QUOTE = 4,
|
|
|
|
}
|
2021-03-31 00:27:44 +00:00
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
function writeByte(mem: Memory, addr: word, val: byte) {
|
|
|
|
const page = addr >> 8;
|
|
|
|
const off = addr & 0xff;
|
|
|
|
|
|
|
|
return mem.write(page, off, val);
|
|
|
|
}
|
2021-03-31 00:27:44 +00:00
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
function writeWord(mem: Memory, addr: word, val: byte) {
|
|
|
|
const lsb = val & 0xff;
|
|
|
|
const msb = val >> 8;
|
2021-03-31 00:27:44 +00:00
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
writeByte(mem, addr, lsb);
|
|
|
|
writeByte(mem, addr + 1, msb);
|
|
|
|
}
|
|
|
|
|
|
|
|
class LineBuffer implements IterableIterator<string> {
|
|
|
|
private prevChar: number = 0;
|
|
|
|
constructor(private readonly line: string, private curChar: number = 0) { }
|
|
|
|
|
|
|
|
[Symbol.iterator](): IterableIterator<string> {
|
|
|
|
return this;
|
2021-03-31 00:27:44 +00:00
|
|
|
}
|
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
clone(): LineBuffer {
|
|
|
|
return new LineBuffer(this.line, this.curChar);
|
|
|
|
}
|
2021-03-31 00:27:44 +00:00
|
|
|
|
2022-06-25 15:26:46 +00:00
|
|
|
next(): IteratorResult<string, string | undefined> {
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
if (this.atEnd()) {
|
|
|
|
return { done: true, value: undefined };
|
|
|
|
}
|
|
|
|
this.prevChar = this.curChar;
|
|
|
|
return { done: false, value: this.line[this.curChar++] };
|
2021-03-31 00:27:44 +00:00
|
|
|
}
|
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
/**
|
|
|
|
* Tries to match the input token at the current buffer location. If
|
|
|
|
* the token matches, the current buffer location is advanced passed
|
|
|
|
* the token and this method returns `true`. Otherwise, this method
|
|
|
|
* returns `false`.
|
2022-06-25 15:26:46 +00:00
|
|
|
*
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
* The input is assumed to be an all-uppercase string and the tokens
|
|
|
|
* in the buffer are uppercased before the comparison.
|
2022-06-25 15:26:46 +00:00
|
|
|
*
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
* @param token An all-uppercase string to match.
|
|
|
|
*/
|
|
|
|
lookingAtToken(token: string): boolean {
|
|
|
|
const oldCurChar = this.curChar;
|
|
|
|
const oldPrevChar = this.prevChar;
|
|
|
|
let possibleToken = '';
|
|
|
|
for (const char of this) {
|
|
|
|
if (char === ' ') {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
possibleToken += char;
|
|
|
|
if (possibleToken.length === token.length) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (possibleToken.toUpperCase() === token) {
|
|
|
|
// Matched; set prevChar to before the match.
|
|
|
|
this.prevChar = oldCurChar;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
// No match; restore state.
|
|
|
|
this.curChar = oldCurChar;
|
|
|
|
this.prevChar = oldPrevChar;
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
backup() {
|
|
|
|
this.curChar = this.prevChar;
|
|
|
|
}
|
2021-03-31 00:27:44 +00:00
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
peek(): string {
|
|
|
|
if (this.atEnd()) {
|
|
|
|
throw new RangeError(`Reading past the end of ${this.line}`);
|
|
|
|
}
|
|
|
|
return this.line[this.curChar];
|
|
|
|
}
|
|
|
|
|
|
|
|
atEnd(): boolean {
|
|
|
|
return this.curChar >= this.line.length;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
export default class ApplesoftCompiler {
|
|
|
|
private lines: Map<number, byte[]> = new Map();
|
|
|
|
|
|
|
|
/**
|
2022-07-14 03:34:50 +00:00
|
|
|
* Loads an Applesoft BASIC program into memory.
|
2022-06-25 15:26:46 +00:00
|
|
|
*
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
* @param mem Memory, including zero page, into which the program is
|
|
|
|
* loaded.
|
|
|
|
* @param program A string with a BASIC program to compile (tokenize).
|
|
|
|
* @param programStart Optional start address of the program. Defaults to
|
2022-07-14 03:34:50 +00:00
|
|
|
* standard Applesoft program address, 0x801.
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
*/
|
|
|
|
static compileToMemory(mem: Memory, program: string, programStart: word = PROGRAM_START) {
|
|
|
|
const compiler = new ApplesoftCompiler();
|
|
|
|
compiler.compile(program);
|
|
|
|
const compiledProgram: Uint8Array = compiler.program(programStart);
|
|
|
|
|
|
|
|
for (let i = 0; i < compiledProgram.byteLength; i++) {
|
|
|
|
writeByte(mem, programStart + i, compiledProgram[i]);
|
|
|
|
}
|
|
|
|
// Set zero page locations. Applesoft is weird because, when a line
|
|
|
|
// is inserted, PRGEND is copied to VARTAB in the beginning, but then
|
|
|
|
// VARTAB is manipulated to make space for the line, then PRGEND is
|
|
|
|
// set from VARTAB. There's also a bug in NEW at D657 where the carry
|
|
|
|
// flag is not cleared, so it can add 2 or 3. The upshot, though, is
|
|
|
|
// that PRGEND and VARTAB end up being 1 or 2 bytes past the end of
|
|
|
|
// the program. From my tests is the emulator, it's usually 1, so
|
|
|
|
// that's what we're going with here.
|
|
|
|
const prgend = programStart + compiledProgram.byteLength + 1;
|
|
|
|
writeWord(mem, TXTTAB, programStart);
|
|
|
|
writeWord(mem, PRGEND, prgend);
|
|
|
|
writeWord(mem, VARTAB, prgend);
|
|
|
|
writeWord(mem, ARYTAB, prgend);
|
|
|
|
writeWord(mem, STREND, prgend);
|
|
|
|
}
|
|
|
|
|
|
|
|
private readLineNumber(lineBuffer: LineBuffer): number {
|
|
|
|
let lineNoStr = '';
|
|
|
|
|
|
|
|
for (const character of lineBuffer) {
|
|
|
|
if (/\d/.test(character)) {
|
|
|
|
lineNoStr += character;
|
|
|
|
} else {
|
|
|
|
lineBuffer.backup();
|
|
|
|
break;
|
2021-03-31 00:27:44 +00:00
|
|
|
}
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
}
|
|
|
|
if (lineNoStr.length === 0) {
|
|
|
|
throw new Error('Missing line number');
|
|
|
|
}
|
2021-03-31 00:27:44 +00:00
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
return parseInt(lineNoStr, 10);
|
|
|
|
}
|
|
|
|
|
|
|
|
private readToken(lineBuffer: LineBuffer): byte {
|
|
|
|
// Try to match a token
|
|
|
|
for (const possibleToken in STRING_TO_TOKEN) {
|
|
|
|
if (lineBuffer.lookingAtToken(possibleToken)) {
|
|
|
|
// NOTE(flan): This special token-preference
|
2022-07-14 03:34:50 +00:00
|
|
|
// logic is straight from the Applesoft BASIC
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
// code (D5BE-D5CA in the Apple //e ROM).
|
|
|
|
|
|
|
|
// Found a token
|
|
|
|
if (possibleToken === 'AT' && !lineBuffer.atEnd()) {
|
|
|
|
const lookAhead = lineBuffer.peek();
|
|
|
|
// ATN takes precedence over AT
|
|
|
|
if (lookAhead === 'N') {
|
|
|
|
lineBuffer.next();
|
|
|
|
return STRING_TO_TOKEN['ATN'];
|
|
|
|
}
|
|
|
|
// TO takes precedence over AT
|
|
|
|
if (lookAhead === 'O') {
|
|
|
|
// Backup to before the token
|
|
|
|
lineBuffer.backup();
|
|
|
|
// and emit the 'A' (upper- or lower-case)
|
2022-06-25 15:26:46 +00:00
|
|
|
return lineBuffer.next().value?.charCodeAt(0) ?? 0;
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
}
|
2021-03-31 00:27:44 +00:00
|
|
|
}
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
return STRING_TO_TOKEN[possibleToken];
|
2021-03-31 00:27:44 +00:00
|
|
|
}
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
}
|
2021-03-31 00:27:44 +00:00
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
// If not a token, output the character upper-cased
|
2022-06-25 15:26:46 +00:00
|
|
|
return lineBuffer.next().value?.toUpperCase().charCodeAt(0) ?? 0;
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
private compileLine(line: string | null | undefined) {
|
|
|
|
const result: byte[] = [];
|
|
|
|
if (!line) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
const lineBuffer = new LineBuffer(line);
|
|
|
|
let state: KnownValues<typeof STATES> = STATES.NORMAL;
|
|
|
|
|
|
|
|
const lineNumber = this.readLineNumber(lineBuffer);
|
|
|
|
if (lineNumber < 0 || lineNumber > 65535) {
|
|
|
|
throw new Error('Line number out of range');
|
|
|
|
}
|
|
|
|
|
|
|
|
// Read the rest of the line
|
|
|
|
for (const character of lineBuffer) {
|
|
|
|
const charCode = character.charCodeAt(0);
|
|
|
|
switch (state) {
|
|
|
|
case STATES.NORMAL:
|
|
|
|
// Skip spaces
|
|
|
|
if (character === ' ') {
|
2021-03-31 00:27:44 +00:00
|
|
|
break;
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
}
|
2021-03-31 00:27:44 +00:00
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
// Transition to parsing a string
|
|
|
|
if (character === '"') {
|
|
|
|
result.push(charCode);
|
|
|
|
state = STATES.STRING;
|
|
|
|
break;
|
|
|
|
}
|
2021-03-31 00:27:44 +00:00
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
// Shorthand for PRINT (D580 in Apple //e ROM)
|
|
|
|
if (character === '?') {
|
|
|
|
result.push(STRING_TO_TOKEN['PRINT']);
|
|
|
|
break;
|
|
|
|
}
|
2021-03-31 00:27:44 +00:00
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
// Try to parse a token or character
|
|
|
|
lineBuffer.backup();
|
|
|
|
{
|
|
|
|
const token = this.readToken(lineBuffer);
|
|
|
|
if (token === STRING_TO_TOKEN['REM']) {
|
|
|
|
state = STATES.COMMENT;
|
|
|
|
}
|
|
|
|
if (token === STRING_TO_TOKEN['DATA']) {
|
|
|
|
state = STATES.DATA;
|
|
|
|
}
|
|
|
|
result.push(token);
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
case STATES.COMMENT:
|
|
|
|
result.push(character.charCodeAt(0));
|
|
|
|
break;
|
|
|
|
case STATES.STRING:
|
|
|
|
if (character === '"') {
|
|
|
|
state = STATES.NORMAL;
|
|
|
|
}
|
|
|
|
result.push(character.charCodeAt(0));
|
|
|
|
break;
|
|
|
|
case STATES.DATA:
|
|
|
|
if (character === ':') {
|
|
|
|
state = STATES.NORMAL;
|
|
|
|
}
|
|
|
|
if (character === '"') {
|
|
|
|
state = STATES.DATA_QUOTE;
|
|
|
|
}
|
|
|
|
result.push(character.charCodeAt(0));
|
|
|
|
break;
|
|
|
|
case STATES.DATA_QUOTE:
|
|
|
|
if (character === '"') {
|
|
|
|
state = STATES.DATA;
|
|
|
|
}
|
|
|
|
result.push(character.charCodeAt(0));
|
|
|
|
break;
|
2021-03-31 00:27:44 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
this.lines.set(lineNumber, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
compile(program: string) {
|
2021-03-31 00:27:44 +00:00
|
|
|
const lines = program.split(/[\r\n]+/g);
|
|
|
|
|
|
|
|
while (lines.length) {
|
|
|
|
const line = lines.shift();
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
this.compileLine(line);
|
2021-03-31 00:27:44 +00:00
|
|
|
}
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/** Returns the compiled program at the given start address. */
|
|
|
|
program(programStart: word = PROGRAM_START): Uint8Array {
|
|
|
|
const result: byte[] = [];
|
2021-03-31 00:27:44 +00:00
|
|
|
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
// Lines can be inserted out of order, but they should be in order
|
|
|
|
// when tokenized.
|
|
|
|
const lineNumbers = [...this.lines.keys()].sort();
|
|
|
|
|
|
|
|
for (const lineNo of lineNumbers) {
|
2022-06-25 15:26:46 +00:00
|
|
|
const lineBytes = this.lines.get(lineNo) || [];
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
const nextLineAddr = programStart + result.length + 4
|
|
|
|
+ lineBytes.length + 1; // +1 for the zero at end of line
|
|
|
|
result.push(nextLineAddr & 0xff, nextLineAddr >> 8);
|
|
|
|
result.push(lineNo & 0xff, lineNo >> 8);
|
|
|
|
result.push(...lineBytes);
|
|
|
|
result.push(0x00);
|
2021-03-31 00:27:44 +00:00
|
|
|
}
|
Applesoft compiler fixes (#98)
* Add tests for Applesoft compiler in preparation for refactoring
While refactoring the compiler, I found several small bugs:
* Lower-case letters in strings and REM statements were converted
to upper-case.
* Lines are stored in the order received, not sorted by line number.
* Does not prefer `ATN` to `AT`.
* Does not prefer `TO` to `AT`.
* `DATA` statements don't preserve spaces.
* `DATA` statements don't preserve lowercase.
These will be fixed in the upcoming refactoring.
* Refactor the Applesoft Compiler
Before, the compiler had a few bugs that were not trivial to solve
because the implementation was in one heavily-nested function.
In this refactoring of the compiler, things like tokenization have
been split into separate methods which makes them a bit easier to
understand.
This refactoring also passes all of the tests.
* Set `PRGEND` when compiling to memory
Before, `PRGEND` was not adjusted which made round-tripping from
the Applesoft compiler to the decompiler not work. This change
now updates `PRGEND` with the end-of-program + 2 bytes which seems
to be the most frequent value that I have observed.
* Fix two compiler bugs
In debugging the decompiler, I noticed two bugs in the compiler:
* The first character after a line number was skipped.
* `?` was not accepted as a shortcut for `PRINT`.
This change fixes these two problems and adds tests.
* Ignore spaces more aggressively
It turns out that Applesoft happily accepts 'T H E N' for `THEN`
but the parser did not. This change fixes that and adds tests for
some odd cases.
Interestingly, this means that there are some valid statements
that Applesoft can never parse correctly because it is greedy
and ignores (most) spaces. For example, `NOT RACE` will always
parse as `NOTRACE` even though `NOT RACE` is a valid expression.
* Move tokens into a separate file
Because the token lists are just maps in opposite directions, put
them in the same file. In the future, maybe we can build one
automatically.
* Fix `apple2.ts`
I had neglected to actually update `apple2.ts` to use the new
compiler and decompiler. They now do.
Also, the decompiler can be created from `Memory`. It assumes,
though, that the zero page pointers to the start and end of the
program are correct.
* Address comments
* No more `as const` for tokens.
* Extracted zero page constants to their own file.
Co-authored-by: Will Scullin <scullin@scullin.com>
2022-06-24 03:41:45 +00:00
|
|
|
result.push(0x00, 0x00);
|
|
|
|
|
|
|
|
return new Uint8Array(result);
|
2021-03-31 00:27:44 +00:00
|
|
|
}
|
|
|
|
}
|