THIS DOCUMENT IS A WORK IN PROGRESS

MG's Davex Forth is a Forth system implementing the Forth 2012 Core word set.

This implementation runs under the Apple II Davex shell.  It requires a version
of Davex that implements the xgetln2 call, which is documented on the project
page but is not actually implemented in stock Davex.  You may contact me to get
an experimental fork of Davex that does implement it.

Additionally, the following are implemented:

* The Exception word set.
* The following words from the Core Extensions word set:
    .( .R U.R 2>R 2R> 2R@ :NONAME AGAIN BUFFER: C" COMPILE, DEFER DEFER! DEFER@
    ERASE FALSE HEX MARKER NIP PAD PARSE PARSE-NAME PICK REFILL
    RESTORE-INPUT SAVE-INPUT SOURCE-ID TO TRUE TUCK U> UNUSED VALUE WITHIN \
* The following words from the Double Number word set:
    DABS DNEGATE D. D.R
* The Facility word set.
* The following words from the Programming-Tools word set:
    .S ? WORDS
* The following words from the Programming-Tools extension word set:
    BYE STATE
* The following words from the String word set:
    BLANK
* Words supporting the Apple II+ProDOS+Davex environment (documented below)

Implementation-defined options (Forth 2012 4.1.1):

  * No address alignment is required for cells or characters.
  * EMIT sends non-printing characters to the output device.
  * ACCEPT allows all editing that Davex allows, except for history.
  * The character set is the Apple II normal character set.
    Characters are stored high-bit OFF.
  * There are no character set extensions.
  * Control characters match a space character in PARSE-NAME only.
  * The control-flow stack is implemented on the parameter stack as addresses
    to be resolved later by words that consume them.
  * Digits larger than 35 convert to lower-case letters.  If BASE is larger
    than 35, number parsing becomes case-sensitive.
  * After input terminates, the cursor is on the beginning of the next line.
    If no exception occurs, after the line is executed, the system will display
    the system prompt.
  * When an exception occurs outside of CATCH, the system will display the
    exception number, will forget any current word being defined, and
    resume user input through QUIT.
  * The input line terminator is the carriage return.
  * The maximum size of a counted string is 255 characters.
  * The maximum size of a parsed string is limited by memory for PARSE and
    PARSE-NAME, and 34 characters for WORD.
  * The maximum size of a definition name is 16 characters.
  * ENVIRONMENT? never returns anything but false.
  * The user input device is the keyboard unless redirected by Davex.
  * The user output device is the screen unless redirected by Davex.
  * The dictionary starts 256 bytes beyond the lowest memory allowed by
    Davex and grows upward.
  * An address unit contains 8 bits.
  * Numbers are 16 bits with the sign (if used) in the high bit.  Numbers
    are stored little-endian.  Arithmetic is 16-bit except for mixed-precision.
    No 32-bit by 32-bit division is implemented.
  * Ranges:
      n: -32768..32767
      +n: 0..32767
      u: 0..65535
      d: -2147483648..2147483647
      +d: 0..2147483647
      ud: 0..4294967295
  * There are no read-only data space regions.
  * The buffer for WORD is 35 bytes and is shared with the pictured numeric
    output.  The buffer will move with the dictionary end.
  * One cell is two address units (16 bits total).
  * One character is one address unit (8 bits total).
  * The keyboard terminal input buffer is 252 bytes.
  * The pictured numeric output string buffer is 35 bytes and shared with WORD.
    The buffer will move with the dictionary end.
  * The size of the PAD is 128 bytes.  The PAD will move with the dictionary
    end and its usable size will shrink by one byte for each byte that UNUSED is
    less than 179.  If PAD is used under this circumstance the behavior is
    undefined.
  * The system is not case-sensitive when finding dictionary names.
  * The system prompt is either '[OK]' in the compilation state, or 'OK' in
    the interpretation state, and is displayed after the previous input is
    evaluated successfully.
  * Division rounding is floored by default, but /MOD and M/MOD are deferred
    words that may be used to change the rounding of those and their derived
    single- and mixed-precision words, respectively.
  * STATE takes the value 1 when compiling a definition before any DOES>,
    and 2 after DOES>.
  * Integer overflow is truncated to the low bits, except in UM/MOD and SM/REM
    (and derived operations) where result overflow results in an exception.
  * The current definition may be found after DOES>.
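
The cell size and overflow behavior listed above can be checked from the
prompt.  This is a minimal sketch that uses only standard Core words; the
printed values follow from the 16-bit cells and ranges described above and
assume that BASE is decimal:

    DECIMAL
    1 CELLS .      \ prints 2: one cell is two address units
    1 CHARS .      \ prints 1: one character is one address unit
    32767 1+ .     \ prints -32768: overflow truncates to the low 16 bits
    -1 U.          \ prints 65535: the same bits read as unsigned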

Ambiguous conditions (Forth 2012 4.1.2):

General:

  * When a parsed name is neither a dictionary word nor a number, an exception
    is thrown.
  * When a definition name exceeds the maximum allowed length, an exception
    is thrown.
  * When addressing a region not listed in the data space, the system allows
    the access with the consequences being left as an exercise for the
    programmer.
  * Passing incorrect argument types results in the argument being used as if
    it were the expected type, possibly causing undefined behavior.
  * An execution token may be found for a compile-only word.  Executing it
    via EXECUTE outside of the compilation context results in undefined
    behavior.
  * Dividing by zero throws an exception.
  * Data stack overflow throws an exception. Return stack overflow results in
    undefined behavior.
  * Insufficient space for loop-control variables results in undefined behavior.
  * Insufficient space in the dictionary results in undefined behavior.
  * Interpreting a word with undefined interpretation semantics throws an
    exception.
  * Modifying the contents of the input buffer may result in undefined behavior.
    Modifying the contents of a compiled string literal is allowed but it cannot
    be changed in size.  The change is permanent within the lifetime of the
    program.  See below for interpreted string literals.
  * Overflowing the pictured numeric string output buffer may collide with the
    end of the dictionary.
  * Overflowing a parsed string with WORD throws an exception.  PARSE and
    PARSE-NAME effectively allow any length string to be parsed up to the
    end of the line or input buffer.
  * Producing a number out of range results in overflow and truncation of the
    result, *except* that when mixed-precision division overflows, an exception
    is thrown.
  * Data stack underflow throws an exception.  Return stack underflow results in
    undefined behavior.
  * Unexpected end of the input buffer while parsing a name returns a zero-
    length string.
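
Because dividing by zero throws an exception, it can be trapped with CATCH
from the Exception word set.  A minimal sketch: SAFE/ is a hypothetical name,
and the number printed in the failing case is whatever this system assigns to
division by zero, which this document does not specify:

    \ Returns the quotient and 0, or two undefined cells and a nonzero code.
    : SAFE/ ( n1 n2 -- n3 0 | x1 x2 ior ) ['] / CATCH ;
    10 2 SAFE/ . .      \ prints 0 5
    10 0 SAFE/ .        \ prints the exception number
    2DROP               \ discard the two cells left behind by CATCH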

Specific:

  * >IN past the size of the input buffer results in termination of parsing.
  * RECURSE after DOES> results in recursion to the definition being compiled
    that contains the DOES>.
  * RESTORE-INPUT requires the current input source to be the same that was
    used during SAVE-INPUT or undefined behavior results.
  * Data space containing definitions may only be de-allocated by a MARKER or
    the behavior is undefined.
  * No ambiguous conditions result from alignment requirements (there are none).
  * The data space pointer cannot be misaligned; alignment is not required.
  * PICK with insufficient stack throws an exception.
  * Loop control parameters unavailable results in undefined behavior.
  * Executing IMMEDIATE affects the last definition with a name.
  * TO relies on >BODY; if >BODY cannot be used on the word, an exception is
    thrown.  That said, all words defined by CREATE, VALUE, CONSTANT, :,
    :NONAME, DEFER and their derivatives have a body.  This means that TO may
    modify the first execution token within a colon definition.  It can also
    be used to alter a (non-system) CONSTANT or the target of DEFER.
  * When name is not found by POSTPONE, [COMPILE], etc., an exception is thrown
    and the current definition being compiled is discarded.
  * If parameters are not of the same type in DO, the loop proceeds as if they
    were the same type.
  * POSTPONE, [COMPILE], etc. applied to TO result in TO's execution token
    being compiled, making the word a parsing word.
  * WORD is limited to 34 chars + length, which is less than the maximum length
    of a counted string.  An exception will be thrown if the parsed word exceeds
    the maximum.
  * If u is greater than the number of bits in a cell for LSHIFT and RSHIFT,
    the result will be zero.
  * With regards to >BODY and DOES>, all secondary words have a body.  DOES>
    will alter any secondary unless it was created with DEFER.
  * Pictured numeric output words used outside of <# and #>, but before any
    <# may write to unintended locations in memory, resulting in undefined
    behavior.  It is generally safe to use them immediately after the #>, but
    the c-addr,u pair returned by #> will no longer be valid.
  * Accessing an unassigned deferred word throws an exception.
  * Attempting to assign an xt to a word not defined by DEFER throws an
    exception, when using DEFER! and derivatives.
  * POSTPONE, [COMPILE], etc. used to resolve a deferred word results in
    undefined behavior unless the deferred word is declared IMMEDIATE.
  * S\" is not implemented, so \x not followed by two hexadecimal digits is
    not applicable.
  * Similarly, a \ before any character not defined for S\" is not applicable.
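
The deferred-word conditions above can be exercised as follows.  This is a
sketch only: GREET, HELLO and TRY-GREET are hypothetical names, and DEFER! is
used for the assignment because IS does not appear among the Core Extensions
words listed as implemented:

    DEFER GREET
    \ Run GREET, reporting any exception through MESSAGE instead of QUIT.
    : TRY-GREET ( -- ) ['] GREET CATCH ?DUP IF MESSAGE THEN ;
    TRY-GREET                  \ GREET is unassigned: an exception is reported
    : HELLO ( -- ) ." Hello" CR ;
    ' HELLO ' GREET DEFER!     \ assign HELLO to GREET
    TRY-GREET                  \ prints Hello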

Other system documentation (Forth 2012 4.1.3):

  * No non-standard words use PAD.
  * Terminal facilities are the same as those provided by Davex.
  * Program space available is about 1.5K.
  * The return stack is 128 cells, and is implemented in the 6502 stack.  Some
    cells are used by the host system software.
  * The data stack is 128 cells.  The data stack is split: the low unit and
    high unit of any cell on the stack are not adjacent in memory.
  * The system dictionary space is approximately 8K.
  
Non-standard words included:

COLD ( x1..xn -- ): Restart the interpreter, resetting the dictionary.

RDROP ( r: x -- ): drop the top of the return stack

-ROT ( x1 x2 x3 -- x3 x1 x2 ): rotate in the opposite direction from ROT

LAST: return the address of the last named dictionary entry

S/REM: explicit towards-zero 16-bit division.

F/MOD: explicit floored 16-bit division.

M/MOD: mixed-precision division defaulting to floored behavior.  Used for
calculations by other system words, may be changed to towards-zero division
using ' SM/REM ' M/MOD DEFER!

XKEY ( c1 -- c2 ): use Davex to read a key with c1 as the character under
the cursor.

MAXLEN ( -- u ): return maximum size that can be requested via ACCEPT.

X3U. ( d -- ): print an unsigned integer of up to 24 bits, in base 10, via
Davex.

MESSAGE ( n -- ): prints "Msg #" followed by n.  Can be replaced with something
more verbose using DEFER!
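
For example, a more descriptive handler could be installed like this.  A
sketch only: VERBOSE-MESSAGE is a hypothetical name and the text it prints is
arbitrary:

  : VERBOSE-MESSAGE ( n -- ) ." Exception number " . ;
  ' VERBOSE-MESSAGE ' MESSAGE DEFER!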

ABORT!: like ABORT but an IMMEDIATE word.

0SP: empty the parameter stack

CATBUFF: return Davex CATBUFF address.

FBUFF, FBUFF2, FBUFF3: return the address of the respective Davex buffer.  Each
is 512 bytes.

.FTYPE ( u -- ): use Davex to print ProDOS file type.

.ACCESS ( u -- ): use Davex to print ProDOS access bits.

.SD ( u -- ): use Davex to print ProDOS slot and drive.

CSTYPE: use Davex to print a counted string.

CS+CS ( c-addr1 c-addr2 -- ): append counted string c-addr2 to c-addr1.

CS+ ( c-addr char -- ): append character char to counted string c-addr

CS/- ( c-addr -- ): remove the last ProDOS path component from counted string
c-addr

CS+/ ( c-addr -- ): append a / to counted string c-addr, but only if it does
not already end with one.

CSMOVE ( c-addr1 c-addr2 -- ): copy counted string c-addr1 to c-addr2.

PLACE ( c-addr1 u c-addr2 -- ): place string described by c-addr1,u as a
counted string at c-addr2
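
The counted-string words above are meant to be combined when building ProDOS
paths.  The following is a sketch under stated assumptions: PATH is a
hypothetical buffer sized for a length byte plus a 64-character ProDOS path,
and CSTYPE is assumed to take the address of the counted string:

  CREATE PATH 65 ALLOT       \ hypothetical buffer: length byte + 64 chars
  S" /FOO" PATH PLACE        \ PATH now holds the counted string /FOO
  PATH CS+/                  \ append a slash unless one is already there
  PATH CHAR B CS+            \ append the single character B
  PATH CSTYPE                \ print the result via Davex
  PATH CS/-                  \ remove the last path component again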

BUILD_LOCAL ( c-addr -- c-addr' ): call Davex xbuild_local

REDIRECT? ( -- f ): return whether Davex input or output is redirected: b0=1
if input is redirected, b1=1 if output is.

+REDIRECT, -REDIRECT: affect Davex I/O redirection.

U% ( u1 u2 -- u ): use Davex to calculate the percentage of u1 that u2 is.

3U% ( d1 d2 -- u ): use Davex to calculate the percentage of d1 that d2 is, up
to 24-bit.

Y/N ( -- f ): use Davex to ask "? (y/n)" returning true if Y was pressed.

Y/N2 ( u -- f ): u is either 'y' or 'n'.  Perform as Y/N above, but use u as
the default if space or return are pressed.

BELL: sound the Davex bell as configured by the user.

.DATE ( u -- ): use Davex to print a ProDOS date word.

.TIME ( u -- ): use Davex to print a ProDOS time word.

.P8_ERR ( u -- ): use Davex to print a ProDOS error message.

<DIR ( c-addr -- ): open a new directory level using the path pointed to by
c-addr.

<<DIR ( c-addr -- ): open a new directory level relative to the directory
already open by <DIR.

DIR+ ( -- flag ): read one directory entry into CATBUFF and return a truthy
value (the address of CATBUFF), or false if there are no more entries.

DIR>: close the current directory level and open the previous one if it was
open.  This must be used once for each <DIR or <<DIR that was used.

WAIT? ( -- f ): returns true if the user wants to soft-abort.  Pauses if the
user types SPACE.  Should be done once per line printed.
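
A typical use is to poll WAIT? after each line of a long listing.  In this
sketch LIST-NUMS is a hypothetical word; u must be nonzero because DO always
runs its body at least once:

  : LIST-NUMS ( u -- ) 0 DO I . CR WAIT? IF LEAVE THEN LOOP ;
  100 LIST-NUMS    \ SPACE pauses; a soft-abort request ends the loop early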

IOPOLL: give Davex a chance to send stuff to printer, etc.

DIRTY: tell Davex that its config is dirty.

.VER ( u -- ): print a version number via Davex.

XINFO ( u -- u1(ay) u2(x) true | false ): call Davex xshell_info.  If
successful, returns true at the top of the stack with the info in u1 and u2,
representing the AY and X registers, respectively.

MLI ( u c-addr -- ): issue ProDOS call u with parameter list at c-addr.  Throws
an exception if ProDOS returns an error.  The exception number is in the range
$FExx, where xx is the ProDOS error number.
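
Since MLI reports ProDOS errors by throwing, a caller can trap them with CATCH
and recover the ProDOS error number from the low byte of the exception number.
A sketch, assuming that .P8_ERR expects the bare ProDOS error number; TRY-MLI
is a hypothetical name:

  : TRY-MLI ( u c-addr -- )
    ['] MLI CATCH          \ on error: two undefined cells, then the code
    ?DUP IF NIP NIP 255 AND .P8_ERR THEN ;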

HTAB ( u -- ): set cursor horizontal position

VTAB ( u -- ): set cursor vertical position

2S>D ( n1 n2 -- d1 d2 ): convert two singles to two doubles

UML/MOD ( ud u -- u-rem ud-quot ): 32/16 division with 32-bit quotient and
16-bit remainder.
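
As a worked example, dividing 1048576 (16 * 65536) by 3 gives a quotient of
349525, which does not fit in 16 bits, and a remainder of 1.  The double is
entered as two cells, low cell first, to avoid assuming that the interpreter
accepts double-number literals:

  DECIMAL
  0 16 3 UML/MOD   ( leaves remainder 1 under the double quotient 349525 )
  D. .             \ prints 349525 1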


Notes for standard words:

/MOD defaults to floored division but may be changed to towards-zero division
using ' S/REM ' /MOD DEFER!

Similarly, M/MOD performs the same function for derived mixed-precision words,
and can be changed via ' SM/REM ' M/MOD DEFER!
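
The two rounding modes differ only when the operands have different signs.
The following sketch assumes that S/REM and F/MOD share the stack effect of
/MOD ( n1 n2 -- rem quot ), which is implied by their use as replacements but
is not stated explicitly in this document:

  DECIMAL
  -7 2 /MOD . .           \ floored (default): prints -4 1
  ' S/REM ' /MOD DEFER!   \ switch /MOD to towards-zero division
  -7 2 /MOD . .           \ towards zero: prints -3 -1
  ' F/MOD ' /MOD DEFER!   \ restore floored division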

S" and C": In interpretation mode, S" and C" use FBUFF3 (documented above),
split into two 256-byte regions and alternating between the two. I.e. the first
S" or C" uses FBUFF3+0, the second FBUFF3+256, the third back to FBUFF3+0
again.  No effort is made to bounds-check.
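
A consequence is that only the two most recent interpreted strings are valid
at any time.  A minimal sketch of the effect at the prompt:

  S" one" S" two"             \ each string lands in its own 256-byte region
  TYPE SPACE TYPE CR          \ prints two one
  S" one" S" two" S" three"   \ three reuses the region that held one
  2DROP 2DROP TYPE CR         \ no longer prints one

A string that must survive longer can be copied elsewhere first, for example
with PLACE.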


Examples:

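\ Print the name of a ProDOS volume or directory entry (length in low nibble).
: prname ( addr -- ) dup c@ 15 and swap 1+ swap type ;
\ ON_LINE parameter list: 2 parameters, unit 0 (all units), buffer at FBUFF.
create online_parms 2 c, 0 c, fbuff ,
\ List online volumes; 197 is the ProDOS ON_LINE call ($C5).
: online ( -- ) 197 online_parms mli 16 0 do 16 i * fbuff + dup c@ dup 15 and
  if .sd space [char] / emit prname cr else 2drop leave then loop ;
\ Print one directory entry: name, then the file type stored at offset 16.
: prent ( addr -- ) dup prname space 16 + c@ .ftype cr ;
\ List a directory.  prent consumes its argument so nothing is left behind.
: cat ( c-addr -- ) <dir begin dir+ dup if dup prent then 0= until dir> ;

c" /foo" cat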

Implementation internals/Hacking

This Forth uses the direct-threaded model.  Forth is implemented as a virtual
machine that may be freely mixed with 6502 code.

The stack would preferably be implemented on the zero page, but Davex does not
give us enough room to have an acceptably-sized stack.  Therefore the ZP
contains working registers and system variables instead.  This makes the
system slower but somewhat space-efficient with regards to math operations.
Some, if not all, of this slowness is made up for by the direct-threaded model.

As a direct-threaded Forth, each compiled instruction generally refers
to a code address, not a code field address.  The exception is an instruction
in the range $0000..$00FF.  Since no code is allowed on the zero page, these
are implemented as fast literal numbers and are immediately pushed onto the
parameter stack.

The following macros are defined in the source to aid readability:

  ENTER - enter the Forth VM, cells representing compiled Forth code follow
  immediately.  This starts a new thread by pushing the previous Forth IP
  to the stack.  This implements the compiled semantics of a colon definition.
  
  EXIT - exit the current thread and return to the previous thread.
  
  CODE - exit the current thread and return to native code, which immediately
  follows.
  
  NEXT - used at the end of a primitive to execute the next Forth instruction.
  
  PUSHNEXT - used at the end of a primitive to optimize the common case of 
  jsr pushay followed by NEXT.
  
The dictionary is implemented as follows:

No-name definitions (defined by :NONAME, for instance) are headerless and
not searchable.

Definitions with names are stored in the following format:

  Offset  Use
  ------- ---
  $00-$01 Link to previous named definition, $0000 if this is the last one.
  $02     Flags and name length, b7 is always set.
          b0-b3 are name length, b4 is the "smudge" bit, b5 is the compile-only
          flag, and b6 is the IMMEDIATE flag.
  $03-n   Name, ASCII with high bit off.
  n+1-m   Code field, this address is returned by ' (is the execution token).
  m+1     Body, for deferred words.
  m+3     Body, for colon definitions and CREATEd words.
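
The header layout can be explored from Forth itself.  The words below are a
sketch under stated assumptions: the names are hypothetical, LAST is assumed
to have the stack effect ( -- addr ) with addr pointing at offset $00 of the
header, and the low four bits of the flags byte are assumed to hold the name
length directly:

  \ Print the name of the most recent named definition.
  : .LAST ( -- ) LAST 2 + DUP C@ 15 AND SWAP 1+ SWAP TYPE ;
  \ True if the most recent named definition is IMMEDIATE (b6 = 64).
  : LAST-IMM? ( -- f ) LAST 2 + C@ 64 AND 0= 0= ;
  \ Count named definitions by following the link fields down to $0000.
  : #NAMED ( -- u ) 0 LAST BEGIN DUP WHILE SWAP 1+ SWAP @ REPEAT DROP ;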
  
Since each code field begins with native code, words defined from within
Forth itself begin with a JSR ($20) or JMP ($4C) opcode.  JSR is used for
all definitions except deferred words, which use JMP.

From an execution token for a named word, the header can be found by scanning
backwards from the xt for the high bit of the flags.

The compile-only flag is used to flag system words that can only be used
at compilation time, such as looping/control-flow words.  This bit may be
used in the future to automatically compile a noname definition in the
interpretation state when such a word is encountered, allowing such words to
be used at any time.  For now using words with this flag in the interpretation
state throws an exception.

The "smudge" bit is used when a definition is open.  If the definition is
aborted due to an error, the smudge bit will still be set and the system will
delete the unfinished definition.  DOES> resets the smudge bit.

In the interpreter source code, the following macros are defined to aid
readability and ensure consistent system dictionary data:

  dsstart - start the dictionary

  dword dname,fname,flags - create a word with the given label, Forth name, and
  flags.

  hword dname,fname,flags - create a headerless definition. fname and flags are
  ignored but should be provided so that a headerless word can be changed to
  a normal one and vice-versa.
  
  dwordq dname,fname,flags - as dword, but each ' in the Forth name is replaced
  with a ", which is required due to an assembler limitation.  An equivalent
  hwordq is not provided since a headerless word does not have a Forth name.
  
  dchain dname - change the dictionary chain so the next word will link to
  dname instead.
  
  eword - end a definition started with one of the above.
  
  dconst dname,fname,value,flags - define a constant with the given value.
  This macro results in a primitive that cannot be altered.
  
  dvar dname,fname,value,flags - define a variable, equivalent to CREATE 1
  CELLS ALLOT.  The scoped label val is the address of the value.
  
  hvar dname,fname,value,flags - as dvar but produce a headerless definition.
  
  dvalue dname,fname,value,flags - define a VALUE. The scoped label val is
  the address of the value.
  
  hvalue dname,fname,value,flags - define a headerless VALUE.
  
All of the definitions produced by the above contain a scoped label, xt, that
is the address used for the execution token of the word, and must be used when
hand-compiling definitions.  For instance:

  dword MY2DROP,"MY2DROP"
    ENTER
    .addr DROP::xt
    .addr DROP::xt
    EXIT
  eword

dname is the label name to be used for the assembler, and will be used for
hand-compiled Forth code in the interpreter.

fname is the Forth name, what is used inside the interpreter.

flags are the flag bits for the word.  They are always optional.  The high bit
will always be set.

value is the initial value (variables, values) or set value of the constants.