SixtyPical
Version 0.16. Work-in-progress, everything is subject to change.
SixtyPical is a 6502-like programming language with advanced static analysis.
"6502-like" means that it has restrictions similar to those of programming in 6502 assembly (e.g. the programmer must choose the registers that values will be stored in) and is concomitantly easy for a compiler to translate to 6502 machine language code.
"Advanced static analysis" includes abstract interpretation, where we go through the program step by step, tracking not just the changes that happen during a specific execution of the program, but sets of changes that could possibly happen in any run of the program. This lets us determine that certain things can never happen, which we can then formulate as safety checks.
In practice, this means it catches things like
- you forgot to clear carry before adding something to the accumulator
- a subroutine that you call trashes a register you thought was preserved
- you tried to read or write a byte beyond the end of a byte array
- you tried to write the address of something that was not a routine, to a jump vector
and suchlike. It also provides some convenient operations based on machine-language programming idioms, such as
- copying values from one register to another (via a third register when there are no underlying instructions that directly support it); this includes 16-bit values, which are copied in two steps
- explicit tail calls
- indirect subroutine calls
The reference implementation can analyze and compile SixtyPical programs to 6502 machine code.
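As a rough illustration of the abstract-interpretation idea described above, here is a minimal Python sketch that tracks what is known about the carry flag across every path through a toy program. It is not the reference implementation's analyzer: the instruction encoding, the `analyze` function, and the three-valued state are all assumptions made for the example.

```python
# Hypothetical three-valued abstract state for the carry flag.
CLEAR, SET, UNKNOWN = "clear", "set", "unknown"


def join(a, b):
    """Merge the carry states arriving from two control-flow paths."""
    return a if a == b else UNKNOWN


def analyze(program, carry=UNKNOWN):
    """Return a list of error messages for a toy instruction list.

    An instruction is either a mnemonic string or a tuple
    ("if", then_block, else_block). This encoding is invented here.
    """
    errors = []

    def run(block, carry):
        for instr in block:
            if isinstance(instr, tuple) and instr[0] == "if":
                _, then_block, else_block = instr
                # Consider both branches, not just the one a given run takes.
                carry = join(run(then_block, carry), run(else_block, carry))
            elif instr == "clc":
                carry = CLEAR
            elif instr == "sec":
                carry = SET
            elif instr == "adc":
                if carry != CLEAR:
                    errors.append("adc with carry %s" % carry)
                carry = UNKNOWN  # the addition itself may set or clear carry
        return carry

    run(program, carry)
    return errors
```

With this sketch, `analyze(["clc", "adc"])` reports nothing, while a program that clears carry in only one branch of an `if` before an `adc` is flagged, because the merged state is no longer known to be clear.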
Quick Start
If you have the VICE emulator installed, from this directory, you can run
./loadngo.sh c64 eg/c64/hearts.60p
and it will compile the hearts.60p source code and automatically start it in the x64 emulator.
You can try the loadngo.sh script on other sources in the eg directory tree, which contains more extensive examples, including an entire game(-like program); see eg/README.md for a listing.
Documentation
- Design Goals
- SixtyPical specification
- SixtyPical revision history
- Literate test suite for SixtyPical syntax
- Literate test suite for SixtyPical analysis
- Literate test suite for SixtyPical compilation
- Literate test suite for SixtyPical fallthru optimization
- 6502 Opcodes used/not used in SixtyPical
- Output formats supported by sixtypical
TODO
low and high address operators
To turn word type into byte.
Trying to remember if we have a compelling case for this or not. The best I can think of is for implementing 16-bit cmp in an efficient way. Maybe we should see if we can get by with 16-bit cmp instead though.
The problem is that once a byte is extracted, putting it back into a word is awkward. The address operators have to modify a destination in a special way. That is, when you say st a, >word, you are updating word to be word & $ff | a << 8, or something like that.
Is that consistent with st? Well, probably it is, but we have to explain it.
It might make more sense, then, for it to be "part of the operation" instead of "part of the reference"; something like st.hi x, word; st.lo y, word. Dunno.
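To make the intended update concrete, here is a small Python sketch of what storing into the high or low byte of a word would compute. The function names `st_hi` and `st_lo` are hypothetical and only echo the st.hi / st.lo spelling floated above; nothing here is existing SixtyPical behaviour.

```python
def st_hi(value, word):
    """Return `word` with its high byte replaced by the 8-bit `value`."""
    return (word & 0x00FF) | ((value & 0xFF) << 8)


def st_lo(value, word):
    """Return `word` with its low byte replaced by the 8-bit `value`."""
    return (word & 0xFF00) | (value & 0xFF)
```

Either way it is spelled, the store is a read-modify-write on the word: the other byte has to survive, which is what makes it different from a plain st.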
Save values
This preserves them, so that, semantically, they can be used later even though they are trashed (or otherwise alternately used) inside the block.
Inside the block, we set them as writeable (but not meaningful). When the block exits, we restore whatever status they had.
This act will trash a, both in the block, and outside it, unless the value being saved is a. One idiom would be something like
save a { save var {
...
} }
which would save all values. Maybe abbreviate this to
save a, var {
...
}
This can use the stack. But it need not use the stack.
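One way the analysis side of save could behave is sketched below in Python. The `Context` class, its status flags, and the rule used for trashing a are assumptions drawn from the description above, not the analyzer's actual data structures.

```python
from contextlib import contextmanager


class Context:
    """Hypothetical analysis context: maps a location name to a set of flags."""

    def __init__(self):
        self.status = {}

    @contextmanager
    def save(self, *locations):
        # Remember whatever status each saved location had on entry.
        saved = {loc: set(self.status.get(loc, ())) for loc in locations}
        # Saving anything other than `a` goes through `a`, so `a` is trashed
        # unless it is itself among the saved values.
        trashes_a = "a" not in locations
        for loc in locations:
            self.status[loc] = {"writeable"}  # writeable but not meaningful
        if trashes_a:
            self.status["a"] = {"writeable"}
        try:
            yield self
        finally:
            # On exit, restore the entry status of every saved location.
            self.status.update(saved)
            if trashes_a:
                self.status["a"] = {"writeable"}
```

Under this model, `with ctx.save("a", "var"):` leaves both locations as they were on entry, while `with ctx.save("var"):` restores var but leaves a without a meaningful value.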
Make all symbols forward-referencable
Basically, don't do symbol-table lookups when parsing, but do have a more formal "symbol resolution" (linking) phase right after parsing.
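A minimal Python sketch of the two-phase idea follows. The toy line format ("define NAME" / "call NAME") and both function names are invented for the example and have nothing to do with real SixtyPical syntax.

```python
def parse(lines):
    """First pass: record definitions and references without any lookups."""
    definitions, references = {}, []
    for number, line in enumerate(lines, start=1):
        words = line.split()
        if not words:
            continue
        if words[0] == "define":
            definitions[words[1]] = number
        elif words[0] == "call":
            references.append((words[1], number))
    return definitions, references


def resolve(definitions, references):
    """Second pass: bind every reference, now that all definitions are known."""
    missing = sorted({name for name, _ in references if name not in definitions})
    if missing:
        raise NameError("undefined symbols: " + ", ".join(missing))
    return {(name, line): definitions[name] for name, line in references}
```

Because lookups only happen in `resolve`, a reference on line 1 to something defined on line 2 resolves without any special handling.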
Associate each pointer with the buffer it points into
Check that the buffer being read or written through a pointer appears in the appropriate inputs or outputs set.
In the analysis, when we obtain a pointer, we need to record, in context, what buffer that pointer came from.
When we write through that pointer, we need to set that buffer as written.
When we read through the pointer, we need to check that the buffer is readable.
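The bookkeeping described above might look roughly like this in Python. The class and method names are hypothetical, and treating "readable" as "declared as an input, or already written" is an assumption of this sketch.

```python
class PointerAnalysis:
    """Hypothetical context that remembers which buffer each pointer came from."""

    def __init__(self, inputs, outputs):
        self.inputs = set(inputs)
        self.outputs = set(outputs)
        self.points_into = {}  # pointer name -> buffer name
        self.written = set()

    def obtain(self, pointer, buffer):
        # Record, in context, the buffer this pointer was taken from.
        self.points_into[pointer] = buffer

    def write_through(self, pointer):
        buffer = self.points_into[pointer]
        if buffer not in self.outputs:
            raise ValueError("%s is not an output of this routine" % buffer)
        self.written.add(buffer)

    def read_through(self, pointer):
        buffer = self.points_into[pointer]
        # Assumption: a buffer is readable if it is an input or was written.
        if buffer not in self.inputs and buffer not in self.written:
            raise ValueError("%s is not readable here" % buffer)
```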
Table overlays
They are uninitialized, but the twist is, the address is a buffer that is an input to and/or output of the routine. So, they are defined (insofar as the buffer is defined.)
They are therefore a "view" of a section of a buffer.
This is slightly dangerous since it does permit aliases: the buffer and the table refer to the same memory.
Although, if they are static, you could say, in the routine in which they are static, as soon as you've established one, you can no longer use the buffer; and the ones you establish must be disjoint.
(That seems to be the most compelling case for restricting them to static.)
An alternative would be static pointers, which are currently not possible because pointers must be zero-page, thus @, thus uninitialized.
Question "consistent initialization"
Question the value of the "consistent initialization" principle for if statement analysis.
Part of this is the trashes at the end; I think what it should be is that the set of trashes after the if is the union of the trashes in each of the branches; this would obviate the need to trash values explicitly, but if you tried to access them afterwards, it would still error.
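The proposed rule is small enough to sketch in Python. Both function names are hypothetical, and this shows only the proposal, not what the analyzer currently does.

```python
def trashes_after_if(then_trashes, else_trashes):
    """Proposed rule: anything trashed in either branch is trashed afterwards."""
    return set(then_trashes) | set(else_trashes)


def check_access(location, trashed):
    """Reading a trashed location after the if would still be an error."""
    if location in trashed:
        raise ValueError("%s is trashed at this point" % location)
```

For example, if one branch trashes x and the other trashes y, then both x and y count as trashed after the if, and neither branch needed an explicit trash.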
Tail-call optimization
More generally, define a block as having zero or one gotos at the end. (And gotos cannot appear elsewhere.)
If a block ends in a call, can that be converted to end in a goto? Why not? I think it can.
The constraints should iron out the same both ways.
And, once we have this, why do we need goto to be in tail position, strictly?
As long as the routine has consistent type context every place it exits, that should be fine.
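As a sketch of the rewrite being considered, here is a Python function that turns a trailing call into a goto (on the 6502, a JSR followed by RTS becoming a JMP). The tuple encoding of instructions and the function name are assumptions, and the sketch does not check that the type contexts agree, which is the part that would actually need care.

```python
def tail_call_optimize(block):
    """Return a copy of `block` with a final call rewritten as a goto.

    `block` is a list of (opcode, operand) tuples; this encoding is
    invented for the example.
    """
    if block and block[-1][0] == "call":
        return block[:-1] + [("goto", block[-1][1])]
    return list(block)
```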
"Include" directives
Search a searchlist of include paths. And use them to make libraries of routines.
One such library routine might be an interrupt routine type for various architectures.
Since "the supervisor" has stored values on the stack, we should be able to trash them
with impunity, in such a routine.
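The searchlist lookup mentioned at the start of this section could be sketched in Python as below. The function name and the first-match-wins policy are assumptions; no such directive exists yet.

```python
from pathlib import Path


def find_include(name, search_paths):
    """Return the first file called `name` found along `search_paths`."""
    for directory in search_paths:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError("%s not found in include paths" % name)
```

Searching in order lets a project-local directory shadow a shared library directory simply by listing it first.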