mirror of https://github.com/catseye/SixtyPical.git synced 2024-11-22 17:32:01 +00:00

SixtyPical

This document describes the SixtyPical programming language version 0.11, both its static semantics (the capabilities and limits of the static analyses it defines) and its runtime semantics (with reference to the semantics of 6502 machine code.)

This document is nominally normative, but the tests in the tests directory are even more normative.

Refer to the bottom of this document for an EBNF grammar of the syntax of the language.

Types

There are five primitive types in SixtyPical:

  • bit (2 possible values)
  • byte (256 possible values)
  • word (65536 possible values)
  • routine (code stored somewhere in memory, read-only)
  • pointer (address of a byte in a buffer)

There are also three type constructors:

  • T table[N] (N entries, 1 ≤ N ≤ 256; each entry holds a value of type T, where T is byte, word, or vector)
  • buffer[N] (N entries; each entry is a byte; 1 ≤ N ≤ 65536)
  • vector T (address of a value of type T; T must be a routine type)

User-defined

A program may define its own types using the typedef feature. Typedefs must occur before everything else in the program. A typedef takes a type expression and an identifier which has not previously been used in the program. It associates that identifier with that type. This is merely a type alias; two types with different names will compare as equal if they denote the same underlying type.

Memory locations

A primary concept in SixtyPical is the memory location. At any given point in time during execution, each memory location is either uninitialized or initialized. At any given point in the program text, too, each memory location is either uninitialized or initialized. Wherever it is one or the other during execution, it is the same in the corresponding place in the program text; thus, it is a static property.

There are four general kinds of memory location. The first three are pre-defined and built-in.

Registers

Each of these holds a byte. They are initially uninitialized.

a
x
y

Flags

Each of these holds a bit. They are initially uninitialized.

c (carry)
z (zero)
v (overflow)
n (negative)

Constants

It may be strange to think of constants as memory locations, but keep in mind that a memory location in SixtyPical need not map to a memory location in the underlying hardware. All constants are read-only. Each is initially initialized with the value that corresponds with its name.

They come in bit, byte, and word types. There are two bit constants,

off
on

two hundred and fifty-six byte constants,

0
1
...
255

and sixty-five thousand five hundred and thirty-six word constants,

word 0
word 1
...
word 65535

Note that if a word constant is between 256 and 65535, the leading word token can be omitted.
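
As a rough illustration of the rule above, the following Python sketch classifies a literal by the type it would denote. The function name `literal_type` and the token-list representation are hypothetical and are not taken from any SixtyPical implementation.

```python
def literal_type(tokens):
    """Return 'byte' or 'word' for a literal given as a list of tokens.

    Hypothetical helper: tokens is e.g. ["word", "5"] or ["300"].
    """
    if tokens[0] == "word":
        value = int(tokens[1])
        if not 0 <= value <= 65535:
            raise ValueError("word constant out of range")
        return "word"
    value = int(tokens[0])
    if 0 <= value <= 255:
        return "byte"   # small literals are bytes unless prefixed with `word`
    if 256 <= value <= 65535:
        return "word"   # in this range the `word` token may be omitted
    raise ValueError("constant out of range")
```

Under this reading, `5` is a byte constant, `word 5` is a word constant, and `300` can only be a word constant.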

User-defined

There may be any number of user-defined memory locations. They are defined by giving the type (which may be any type except bit and routine) and the name.

byte pos

An address in memory may be given explicitly on a user-defined memory location.

byte table[256] screen @ 1024

Or, a user-defined memory location may be given an initial value. But in this case, an explicit address in memory cannot be given.

byte pos : 0

A user-defined vector memory location is decorated with inputs, outputs and trashes lists like a routine (see below), and it may only hold addresses of routines which are compatible. (Meaning, the routine's inputs (resp. outputs, trashes) must be a subset of the vector's inputs (resp. outputs, trashes.))

vector routine
         inputs a, score
         outputs x
         trashes y
  actor_logic @ $c000
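
The compatibility rule can be pictured as three subset tests. The sketch below is only a model of that rule in Python; the `Constraints` class, the `is_compatible` function, and the sample routines `quiet` and `noisy` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    inputs: frozenset
    outputs: frozenset
    trashes: frozenset


def is_compatible(routine, vector):
    """True if the routine's address may be stored in the vector."""
    return (routine.inputs <= vector.inputs
            and routine.outputs <= vector.outputs
            and routine.trashes <= vector.trashes)


# The vector declared above: inputs a, score; outputs x; trashes y.
actor_logic = Constraints(frozenset({"a", "score"}),
                          frozenset({"x"}),
                          frozenset({"y"}))

# Hypothetical routines: one asks for less than the vector allows,
# the other trashes a location (c) that the vector does not mention.
quiet = Constraints(frozenset({"a"}), frozenset({"x"}), frozenset())
noisy = Constraints(frozenset({"a"}), frozenset({"x"}), frozenset({"y", "c"}))
```

Here `quiet` could be assigned to `actor_logic`, while `noisy` could not, because it trashes `c`.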

Note that in the code of a routine, if a memory location is named by a user-defined symbol, it is an address in memory, and can be read and written. But if it is named by a literal integer, either decimal or hexadecimal, it is a constant and can only be read (and when read always yields that constant value). So, for instance, to read the value at screen above, in the code, you would need to reference the symbol screen; attempting to read 1024 would not work.

This is actually useful, at least at this point, as you can rely on the fact that literal integers in the code are always immediate values. (But this may change at some point.)

Buffers and Pointers

Roughly speaking, a buffer is a table that can be longer than 256 bytes, and a pointer is an address within a buffer.

A pointer is implemented as a zero-page memory location, and accessing the buffer pointed to is implemented with "indirect indexed" addressing, as in

LDA ($02), Y
STA ($02), Y
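
For readers less familiar with this addressing mode, here is a small Python model of what the two instructions above do on a 6502. The function names and the flat `bytearray` memory are assumptions made for the sketch; it says nothing about how a SixtyPical compiler is actually written.

```python
def read_indirect_indexed(memory, zp, y):
    """Model of LDA (zp),Y: fetch a 16-bit little-endian base address from
    zero page, add Y, and read the byte found there."""
    base = memory[zp] | (memory[(zp + 1) & 0xFF] << 8)
    return memory[(base + y) & 0xFFFF]


def write_indirect_indexed(memory, zp, y, value):
    """Model of STA (zp),Y: same address calculation, but stores a byte."""
    base = memory[zp] | (memory[(zp + 1) & 0xFF] << 8)
    memory[(base + y) & 0xFFFF] = value & 0xFF


memory = bytearray(65536)
memory[0x02], memory[0x03] = 0x00, 0x04        # pointer at $02 holds $0400
write_indirect_indexed(memory, 0x02, 5, 100)   # stores 100 at $0405
```

The pointer itself occupies two zero-page bytes, and the y register supplies the offset into the buffer.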

There are extended instruction modes for using these types of memory location. See copy below, but here is some illustrative example code:

copy ^buf, ptr           // this is the only way to initialize a pointer
add ptr, 4               // ok, but only if it does not exceed buffer's size
ld y, 0                  // you must set this to something yourself
copy [ptr] + y, byt      // read memory through pointer, into byte
copy 100, [ptr] + y      // write memory through pointer (still trashes a)

where ptr is a user-defined storage location of pointer type, and the + y part is mandatory.

Routines

Every routine must list all the memory locations it reads from, which we call its inputs, and all the memory locations it writes to. The latter we divide into two groups: its outputs which it intentionally initializes, and its trashes, which it does not care about, and leaves uninitialized. For example, if it uses a register to temporarily store an intermediate value used in a multiplication, that register has no meaning outside of the multiplication, and is one of the routine's trashes.

It is common to say that the trashes are the memory locations that are not preserved by the routine.

routine foo
  inputs a, score
  outputs x
  trashes y {
    ...
}

The union of the outputs and trashes is sometimes collectively called "the WRITES" of the routine, for historical reasons and as shorthand.

Routines may call only routines previously defined in the program source. Thus, directly recursive routines are not allowed. (However, routines may also call routines via vectors, which are dynamically assigned. In this case, there is, for the time being, no check for recursive calls.)

For a SixtyPical program to be run, there must be one routine called main. This routine is executed when the program is run.

The memory locations given as inputs to a routine are considered to be initialized at the beginning of the routine. Various instructions cause memory locations to be initialized after they are executed. Calling a routine which trashes some memory locations causes those memory locations to be uninitialized after that routine is called. At the end of a routine, all memory locations listed as outputs must be initialized.

A literal word can be given instead of the body of the routine. This word is the absolute address of an "external" routine located in memory but not defined by the SixtyPical program.

routine chrout
  inputs a
  trashes a
  @ 65490

Instructions

Instructions are inspired by, and in many cases closely resemble, the 6502 instruction set. However, in many cases they do not map 1:1 to 6502 instructions. If a SixtyPical instruction cannot be translated validly to one or more 6502 instructions while retaining all the stated constraints, that's a static error in a SixtyPical program, and technically any implementation of SixtyPical, even an interpreter, should flag it up.

ld

ld <dest-memory-location>, <src-memory-location> [+ <index-memory-location>]

Reads from src and writes to dest.

  • It is illegal if dest is not a register.
  • It is illegal if dest does not occur in the WRITES of the current routine.
  • It is illegal if src is not of same type as dest (i.e., is not a byte.)
  • It is illegal if src is uninitialized.

After execution, dest is considered initialized. The flags z and n may be changed by this instruction; they must be named in the WRITES, and they are considered initialized after it has executed.

If and only if src is a byte table, the index-memory-location must be given.

Some combinations, such as ld x, y, are illegal because they do not map to underlying opcodes.
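
The constraints on the basic form of ld can be sketched as a single checking function. This is a simplified model in Python, not the reference analyzer: `check_ld` and its parameters are hypothetical, table indexing is left out, and the caller is assumed to include any constant used as src in `initialized`, since constants are always initialized.

```python
REGISTERS = {"a", "x", "y"}


def check_ld(dest, src, src_type, writes, initialized):
    """Return the set of initialized locations after `ld dest, src`,
    or raise ValueError naming the violated rule."""
    if dest not in REGISTERS:
        raise ValueError("dest is not a register")
    if not {dest, "z", "n"} <= set(writes):
        raise ValueError("dest, z and n must all be in WRITES")
    if src_type != "byte":
        raise ValueError("src is not a byte")
    if src not in initialized:
        raise ValueError("src is uninitialized")
    # dest and the affected flags are initialized afterwards
    return set(initialized) | {dest, "z", "n"}
```

For instance, with `score` initialized and `a`, `z`, `n` in the WRITES, `ld a, score` leaves `a`, `z`, `n`, and `score` initialized.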

There is another mode of ld which reads into the a register indirectly through a pointer.

ld a, [<src-memory-location>] + y

The memory location in this syntax must be a pointer.

This syntax copies the contents of memory at the pointer (offset by the y register) into a register (which must be the a register.)

In addition to the constraints above, y must be initialized before this mode is used.

st

st <src-memory-location>, <dest-memory-location> [+ <index-memory-location>]

Reads from src and writes to dest.

  • It is illegal if dest is a register or if dest is read-only.
  • It is illegal if dest does not occur in the WRITES of the current routine.
  • It is illegal if src is not of same type as dest.
  • It is illegal if src is uninitialized.

After execution, dest is considered initialized. No flags are changed by this instruction (unless of course dest is a flag.)

If and only if dest is a byte table, the index-memory-location must be given.

There is another mode of st which writes a into memory, indirectly through a pointer.

st a, [<dest-memory-location>] + y

The memory location in this syntax must be a pointer.

This syntax copies the contents of the a register into the contents of memory at the pointer (offset by the y register).

In addition to the constraints above, y must be initialized before this mode is used.

copy

copy <src-memory-location>, <dest-memory-location>

Reads from src and writes to dest. Differs from st in that it is able to copy more general types of data (for example, vectors), and it trashes the z and n flags and the a register.

  • It is illegal if dest is read-only.
  • It is illegal if dest does not occur in the WRITES of the current routine.
  • It is illegal if src is not of same type as dest.
  • It is illegal if src is uninitialized.

After execution, dest is considered initialized, and z, n, and a are considered uninitialized.

There are two extra modes that this instruction can be used in. The first is to load an address into a pointer:

copy ^<src-memory-location>, <dest-memory-location>

This copies the address of src into dest. In this case, src must be of type buffer, and dest must be of type pointer. src will not be considered a memory location that is read, since it is only its address that is being retrieved.

The second is to read or write indirectly through a pointer.

copy [<src-memory-location>] + y, <dest-memory-location>
copy <src-memory-location>, [<dest-memory-location>] + y

In both of these, the memory location in the []+y syntax must be a pointer.

The first copies the contents of memory at the pointer (offset by the y register) into a byte memory location.

The second copies a literal byte, or a byte memory location, into the contents of memory at the pointer (offset by the y register).

In addition to the constraints above, y must be initialized before this mode is used.

add

add <dest-memory-location>, <src-memory-location>

Adds the contents of src to dest and stores the result in dest.

  • It is illegal if src OR dest OR c is uninitialized.
  • It is illegal if dest is read-only.
  • It is illegal if dest does not occur in the WRITES of the current routine.

Affects n, z, c, and v flags, requiring that they be in the WRITES, and initializing them afterwards.

dest and src continue to be initialized afterwards.

In addition, if dest is of word type, then src must also be of word type, and in this case this instruction trashes the a register.

NOTE: If dest is a pointer, the addition does not check if the result of the pointer arithmetic continues to be valid (within a buffer) or not.
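
The requirement that c be initialized suggests that the carry takes part in the addition. Assuming a byte add behaves like the 6502 ADC instruction in binary mode (an assumption of this sketch, not something stated above), its effect on the result and flags can be modelled in Python as follows. The name `add_byte` is hypothetical.

```python
def add_byte(dest, src, c):
    """Model of a byte `add`: returns (result, flags) for dest + src + c."""
    total = dest + src + c
    result = total & 0xFF
    flags = {
        "c": 1 if total > 0xFF else 0,
        "z": 1 if result == 0 else 0,
        "n": result >> 7,
        # signed overflow: both operands share a sign the result lacks
        "v": 1 if (~(dest ^ src) & (dest ^ result) & 0x80) else 0,
    }
    return result, flags
```

Under this model, adding 1 to 127 with carry clear gives 128 with n and v set, and adding 100 to 200 gives 44 with c set.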

inc

inc <dest-memory-location>

Increments the value in dest. Does not honour carry.

  • It is illegal if dest is uninitialized.
  • It is illegal if dest is read-only.
  • It is illegal if dest does not occur in the WRITES of the current routine.

Affects n and z flags, requiring that they be in the WRITES, and initializing them afterwards.

sub

sub <dest-memory-location>, <src-memory-location>

Subtracts the contents of src from dest and stores the result in dest.

  • It is illegal if src OR dest OR c is uninitialized.
  • It is illegal if dest is read-only.
  • It is illegal if dest does not occur in the WRITES of the current routine.

Affects n, z, c, and v flags, requiring that they be in the WRITES, and initializing them afterwards.

dest and src continue to be initialized afterwards.

dec

dec <dest-memory-location>

Decrements the value in dest. Does not honour carry.

  • It is illegal if dest is uninitialized.
  • It is illegal if dest is read-only.
  • It is illegal if dest does not occur in the WRITES of the current routine.

Affects n and z flags, requiring that they be in the WRITES, and initializing them afterwards.

cmp

cmp <dest-memory-location>, <src-memory-location>

Subtracts the contents of src from dest (without considering carry) but does not store the result anywhere, only sets the resulting flags.

  • It is illegal if src OR dest is uninitialized.

Affects n, z, and c flags, requiring that they be in the WRITES, and initializing them afterwards.
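
A small Python model may help in reading the flags after a cmp. It assumes the flags follow the 6502 CMP instruction, which the text above does not spell out; the function name `cmp_byte` is hypothetical.

```python
def cmp_byte(dest, src):
    """Model of `cmp dest, src` on bytes: returns the resulting flags."""
    diff = (dest - src) & 0xFF
    return {
        "c": 1 if dest >= src else 0,   # set when no borrow was needed
        "z": 1 if dest == src else 0,   # set when the operands are equal
        "n": diff >> 7,                 # bit 7 of the discarded difference
    }
```

This is why a test such as `cmp y, 25` followed by `if z` checks for equality.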

and, or, xor

and <dest-memory-location>, <src-memory-location>
or <dest-memory-location>, <src-memory-location>
xor <dest-memory-location>, <src-memory-location>

Applies the given bitwise Boolean operation to src and dest and stores the result in dest.

  • It is illegal if src OR dest is uninitialized.
  • It is illegal if dest is read-only.
  • It is illegal if dest does not occur in the WRITES of the current routine.

Affects n and z flags, requiring that they be in the WRITES of the current routine, and sets them as initialized afterwards.

dest and src continue to be initialized afterwards.

shl, shr

shl <dest-memory-location>
shr <dest-memory-location>

shl shifts the dest left one bit position. The rightmost position becomes c, and c becomes the bit that was shifted off the left.

shr shifts the dest right one bit position. The leftmost position becomes c, and c becomes the bit that was shifted off the right.

  • It is illegal if dest is a register besides a.
  • It is illegal if dest is read-only.
  • It is illegal if dest OR c is uninitialized.
  • It is illegal if dest does not occur in the WRITES of the current routine.

Affects the c flag, requiring that it be in the WRITES of the current routine, and it continues to be initialized afterwards.
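
Because the vacated bit is filled from c, both instructions are rotations through the carry rather than plain shifts. The following Python sketch models the description above for byte values; the function names simply mirror the instruction names and are not part of any implementation.

```python
def shl(value, c):
    """Rotate left through carry: returns (new_value, new_c)."""
    result = ((value << 1) | c) & 0xFF   # old c enters at the right
    return result, (value >> 7) & 1      # old bit 7 becomes the new c


def shr(value, c):
    """Rotate right through carry: returns (new_value, new_c)."""
    result = (value >> 1) | (c << 7)     # old c enters at the left
    return result, value & 1             # old bit 0 becomes the new c
```

To obtain a plain shift, c would have to hold off before the instruction is executed.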

call

call <executable-name>

Transfers execution to the given executable, whether that is a previously-defined routine, or a vector location which contains the address of a routine which will be called indirectly. Execution will be transferred back to the current routine, when execution of the executable is finished.

  • It is illegal if any of the memory locations listed in the called routine's inputs are uninitialized immediately before the call.

Just after the call,

  • All memory locations listed in the called routine's trashes are considered to now be uninitialized.
  • All memory locations listed in the called routine's outputs are considered to now be initialized.
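
The effect of a call on the analysis state can be sketched as a function from one set of initialized locations to the next. This Python model is illustrative only; `apply_call` is a hypothetical name.

```python
def apply_call(initialized, inputs, outputs, trashes):
    """Return the locations initialized after a call, or raise ValueError
    if any of the callee's inputs is uninitialized before it."""
    missing = set(inputs) - set(initialized)
    if missing:
        raise ValueError("uninitialized inputs: " + ", ".join(sorted(missing)))
    # trashes become uninitialized, outputs become initialized
    return (set(initialized) - set(trashes)) | set(outputs)
```

Calling the routine foo shown earlier (inputs a, score; outputs x; trashes y) with a, score, and y initialized leaves a, score, and x initialized.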

goto

goto <executable-name>

Unilaterally transfers execution to the given executable. Execution will not be transferred back to the current routine when execution of the executable is finished; rather, it will be transferred back to the caller of the current routine.

If goto is used in a routine, it must be in tail position. That is, it must be the final instruction in the routine.

Just before the goto,

  • It is illegal if any of the memory locations in the target routine's inputs list is uninitialized.

In addition,

  • The target executable's WRITES must not include any locations that are not already included in the current routine's WRITES.
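
The two goto conditions can be modelled as set comparisons. The Python sketch below uses the hypothetical name `check_goto` and returns a list of problems rather than stopping at the first one; it leaves out the tail-position rule, which concerns program structure rather than sets of locations.

```python
def check_goto(initialized, current_writes, target_inputs, target_writes):
    """Return a list of rule violations for a goto (empty if it is legal)."""
    problems = []
    missing = set(target_inputs) - set(initialized)
    if missing:
        problems.append("uninitialized inputs: " + ", ".join(sorted(missing)))
    extra = set(target_writes) - set(current_writes)
    if extra:
        problems.append("target writes outside current WRITES: "
                        + ", ".join(sorted(extra)))
    return problems
```

The second condition exists because, from the caller's point of view, everything the target writes is written by the current routine.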

if

if <src-memory-location> {
    <true-branch>
} else {
    <false-branch>
}

Executes the true-branch if the value in src is nonzero, otherwise executes the false-branch. The false-branch is optional and may be omitted; in this case it is treated like an empty block.

  • It is illegal if src is not z, c, n, or v.
  • It is illegal if src is not initialized.
  • It is illegal if any location initialized at the end of the true-branch is not initialized at the end of the false-branch, and vice versa.

The sense of the test can be inverted with not.
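
The branch-consistency rule amounts to requiring that both branches end with the same set of initialized locations. A minimal Python model, with the hypothetical name `merge_branches`:

```python
def merge_branches(after_true, after_false):
    """Return the locations initialized after an `if`, or raise ValueError
    if the two branches disagree."""
    differing = set(after_true) ^ set(after_false)
    if differing:
        raise ValueError("initialized in only one branch: "
                         + ", ".join(sorted(differing)))
    return set(after_true)
```

When the false-branch is omitted, `after_false` is simply the state before the if, so the true-branch must not initialize anything new or trash anything that was initialized.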

repeat

repeat {
    <block>
} until <src-memory-location>

Executes the block repeatedly until the src (observed at the end of the execution of the block) is non-zero. The block is always executed at least once.

  • It is illegal if any memory location is uninitialized at the exit of the loop when that memory location is initialized at the start of the loop.

To simulate a "while" loop, use an if internal to the block, like

repeat {
    cmp y, 25
    if z {
    }
} until z

"until" is optional, but if omitted, must be replaced with "forever":

repeat {
    cmp y, 25
    if z {
    }
} forever

The sense of the test can be inverted with not.

repeat {
    cmp y, 25
    if z {
    }
} until not z

Grammar

Program ::= {TypeDefn} {Defn} {Routine}.
TypeDefn::= "typedef" Type Ident<new>.
Defn    ::= Type Ident<new> [Constraints] (":" Literal | "@" LitWord).
Type    ::= TypeTerm ["table" TypeSize].
TypeTerm::= "byte"
          | "word"
          | "buffer" TypeSize
          | "pointer"
          | "vector" TypeTerm
          | "routine" Constraints
          | "(" Type ")"
          .
TypeSize::= "[" LitWord "]".
Constraints ::= ["inputs" LocExprs] ["outputs" LocExprs] ["trashes" LocExprs].
Routine ::= "define" Ident<new> Type (Block | "@" LitWord)
          | "routine" Ident<new> Constraints (Block | "@" LitWord)
          .
LocExprs::= LocExpr {"," LocExpr}.
LocExpr ::= Register | Flag | Literal | Ident.
Register::= "a" | "x" | "y".
Flag    ::= "c" | "z" | "n" | "v".
Literal ::= LitByte | LitWord.
LitByte ::= "0" ... "255".
LitWord ::= "0" ... "65535".
Block   ::= "{" {Instr} "}".
Instr   ::= "ld" LocExpr "," LocExpr ["+" LocExpr]
          | "st" LocExpr "," LocExpr ["+" LocExpr]
          | "add" LocExpr "," LocExpr
          | "sub" LocExpr "," LocExpr
          | "cmp" LocExpr "," LocExpr
          | "and" LocExpr "," LocExpr
          | "or" LocExpr "," LocExpr
          | "xor" LocExpr "," LocExpr
          | "shl" LocExpr
          | "shr" LocExpr
          | "inc" LocExpr
          | "dec" LocExpr
          | "call" Ident<routine>
          | "goto" Ident<executable>
          | "if" ["not"] LocExpr Block ["else" Block]
          | "repeat" Block ("until" ["not"] LocExpr | "forever")
          | "copy" LocExpr "," LocExpr ["+" LocExpr]
          .