THIS DOCUMENT IS A WORK IN PROGRESS

MG's Davex Forth is a Forth system implementing the Forth 2012 Core word set.

This implementation runs under the Apple II Davex shell. It requires a version of Davex that implements the xgetln2 call, which is documented on the project page but does not actually exist. You may contact me to get an experimental fork of Davex that does implement it.

Additionally, the following are implemented:

* The Exception word set.
* The following words from the Core Extensions word set:
  .( .R U.R 2>R 2R> 2R@ :NONAME AGAIN BUFFER: C" COMPILE, DEFER DEFER! DEFER@ ERASE FALSE HEX MARKER NIP PAD PARSE PARSE-NAME PICK REFILL RESTORE-INPUT SAVE-INPUT SOURCE-ID TO TRUE TUCK U> UNUSED VALUE WITHIN \
* The following words from the Double Number word set:
  DABS DNEGATE D. D.R
* The Facility word set.
* The following words from the Programming-Tools word set:
  .S ? WORDS
* The following words from the Programming-Tools extension word set:
  BYE STATE
* The following words from the String word set:
  BLANK
* Words supporting the Apple II+ProDOS+Davex environment (documented below)

Implementation-defined options (Forth 2012 4.1.1):

* No address alignment is required for cells or characters.
* EMIT sends non-printing characters to the output device.
* ACCEPT allows all editing that Davex allows, except for history.
* The character set is the Apple II normal character set. Characters are stored high-bit OFF.
* There are no character set extensions.
* Control characters match a space character in PARSE-NAME only.
* The control-flow stack is implemented on the parameter stack as addresses to be resolved later by words that consume them.
* Digits larger than 35 convert to lower-case letters. If BASE is larger than 35, number parsing becomes case-sensitive.
* After input terminates, the cursor is at the beginning of the next line. If no exception occurs, the system displays the system prompt after the line is executed.
* When an exception occurs outside of CATCH, the system will display the exception number, forget any word currently being defined, and resume user input through QUIT.
* The input line terminator is the carriage return.
* The maximum size of a counted string is 255 characters.
* The maximum size of a parsed string is limited by memory for PARSE and PARSE-NAME, and is 34 characters for WORD.
* The maximum size of a definition name is 16 characters.
* ENVIRONMENT? never returns anything but false.
* The user input device is the keyboard unless redirected by Davex.
* The user output device is the screen unless redirected by Davex.
* The dictionary starts at 256 bytes beyond the lowest memory allowed by Davex and works its way up.
* An address unit contains 8 bits.
* Numbers are 16 bits with the sign (if used) in the high bit. Numbers are stored little-endian. Arithmetic is 16-bit except for mixed-precision. No 32-bit by 32-bit division is implemented.
* Ranges:
  n: -32768..32767
  +n: 0..32767
  u: 0..65535
  d: -2147483648..2147483647
  +d: 0..2147483647
  ud: 0..4294967295
* There are no read-only data space regions.
* The buffer for WORD is 35 bytes and is shared with the pictured numeric output. The buffer will move with the dictionary end.
* One cell is two address units (16 bits total).
* One character is one address unit (8 bits total).
* The keyboard terminal input buffer is 252 bytes.
* The pictured numeric output string buffer is 35 bytes and is shared with WORD. The buffer will move with the dictionary end.
* The size of the PAD is 128 bytes. The PAD will move with the dictionary end, and its usable size will shrink by one byte for each byte that UNUSED is less than 179. If PAD is used under this circumstance, the behavior is undefined.
* The system is not case-sensitive when finding dictionary names.
* The system prompt is either '[OK]' in the compilation state, or 'OK' in the interpretation state, and is displayed after the previous input is evaluated successfully.
* Division rounding is floored by default, but /MOD and M/MOD are deferred words that may be used to change the rounding of those and their derived single- and mixed-precision words, respectively.
* STATE takes the value 1 when compiling a definition before any DOES>, and 2 after DOES>.
* Integer overflow is truncated to the low bits, except in UM/MOD and SM/REM (and derived operations), where result overflow results in an exception.
* The current definition may be found after DOES>.

Ambiguous conditions (Forth 2012 4.1.2):

General:

* When a parsed name is neither a dictionary word nor a number, an exception is thrown.
* When a definition name exceeds the maximum allowed length, an exception is thrown.
* When addressing a region not listed in the data space, the system allows the access, with the consequences being left as an exercise for the programmer.
* Passing incorrect argument types results in the argument being used as if it were the expected type, possibly causing undefined behavior.
* An execution token may be found for a compile-only word. Executing it via EXECUTE outside of the compilation context results in undefined behavior.
* Dividing by zero throws an exception.
* Data stack overflow throws an exception. Return stack overflow results in undefined behavior.
* Insufficient space for loop-control variables results in undefined behavior.
* Insufficient space in the dictionary results in undefined behavior.
* Interpreting a word with undefined interpretation semantics throws an exception.
* Modifying the contents of the input buffer may result in undefined behavior. Modifying the contents of a compiled string literal is allowed, but it cannot be changed in size. The change is permanent within the lifetime of the program. See below for interpreted string literals.
* Overflowing the pictured numeric string output buffer may collide with the end of the dictionary.
* Overflowing a parsed string with WORD throws an exception.
  PARSE and PARSE-NAME effectively allow any length string to be parsed up to the end of the line or input buffer.
* Producing a number out of range results in overflow and truncation of the result, *except* that when mixed-precision division overflows, an exception is thrown.
* Data stack underflow throws an exception. Return stack underflow results in undefined behavior.
* Unexpected end of the input buffer while parsing a name returns a zero-length string.

Specific:

* >IN past the size of the input buffer results in termination of parsing.
* RECURSE after DOES> results in recursion to the definition being compiled that contains the DOES>.
* RESTORE-INPUT requires the current input source to be the same one that was used during SAVE-INPUT, or undefined behavior results.
* Data space containing definitions may only be de-allocated by a MARKER, or the behavior is undefined.
* No ambiguous conditions result from alignment requirements (there are none).
* The data space pointer cannot be misaligned; alignment is not required.
* PICK with insufficient stack throws an exception.
* Unavailable loop control parameters result in undefined behavior.
* Executing IMMEDIATE affects the last definition with a name.
* TO relies on >BODY; if >BODY cannot be used on the word, an exception is thrown. That being said, all words defined by CREATE, VALUE, CONSTANT, :, :NONAME, DEFER and their derivatives have a body. This means that TO may modify the first execution token within a colon definition. It can also be used to alter a (non-system) CONSTANT or the target of DEFER.
* When a name is not found by POSTPONE, [COMPILE], etc., an exception is thrown and the current definition being compiled is discarded.
* If parameters are not of the same type in DO, the loop proceeds as if they were the same type.
* POSTPONE, [COMPILE], etc. applied to TO result in TO's execution token being compiled, making the word a parsing word.
* WORD is limited to 34 chars + length, which is less than the maximum length of a counted string. An exception will be thrown if the parsed word exceeds the maximum.
* If u is greater than the number of bits in a cell for LSHIFT and RSHIFT, the result will be zero.
* With regard to >BODY and DOES>, all secondary words have a body. DOES> will alter any secondary unless it was created with DEFER.
* Pictured numeric output words used outside of <# and #>, but before any <#, may write to unintended locations in memory, resulting in undefined behavior. It is generally safe to use them immediately after the #>, but the c-addr,u pair returned by #> will no longer be valid.
* Accessing an unassigned deferred word throws an exception.
* Attempting to assign an xt to a word not defined by DEFER throws an exception when using DEFER! and derivatives.
* POSTPONE, [COMPILE], etc. used to resolve a deferred word results in undefined behavior unless the deferred word is declared IMMEDIATE.
* S\" is not implemented, so \x not followed by two hexadecimal digits is not applicable.
* Similarly, a \ before any character not defined for S\" is not applicable.

Other system documentation (Forth 2012 4.1.3):

* No non-standard words use PAD.
* Terminal facilities are the same as those provided by Davex.
* Program space available is about 1.5K.
* The return stack is 128 cells, and is implemented in the 6502 stack. Some cells are used by the host system software.
* The data stack is 128 cells. The data stack is split: the low unit and high unit of any cell on the stack are not adjacent in memory.
* The system dictionary space is approximately 8K.

Non-standard words included:

COLD ( x1..xn -- ): restart the interpreter, resetting the dictionary.
RDROP ( r: x -- ): drop the top of the return stack.
-ROT: rotate in the opposite direction to ROT.
LAST: return the address of the last named dictionary entry.
S/REM: explicit towards-zero 16-bit division.
F/MOD: explicit floored 16-bit division.
M/MOD: mixed-precision division defaulting to floored behavior. Used for calculations by other system words; may be changed to towards-zero division using ' SM/REM ' M/MOD DEFER!
XKEY ( c1 -- c2 ): use Davex to read a key with c1 as the character under the cursor.
MAXLEN ( -- u ): return the maximum size that can be requested via ACCEPT.
X3U. ( d -- ): print an unsigned integer of up to 24 bits, in base 10, via Davex.
MESSAGE ( n -- ): print "Msg #" followed by n. Can be replaced with something more verbose using DEFER!
ABORT!: like ABORT but an IMMEDIATE word.
0SP: empty the parameter stack.
CATBUFF: return the Davex CATBUFF address.
FBUFF, FBUFF2, FBUFF3: return the address of the respective Davex buffer. Each is 512 bytes.
.FTYPE ( u -- ): use Davex to print a ProDOS file type.
.ACCESS ( u -- ): use Davex to print ProDOS access bits.
.SD ( u -- ): use Davex to print a ProDOS slot and drive.
CSTYPE: use Davex to print a counted string.
CS+CS ( c-addr1 c-addr2 -- ): append counted string c-addr2 to c-addr1.
CS+ ( c-addr char -- ): append character char to counted string c-addr.
CS/- ( c-addr -- ): remove the last ProDOS path component from counted string c-addr.
CS+/ ( c-addr -- ): append a / to counted string c-addr, but only if it does not already end with one.
CSMOVE ( c-addr1 c-addr2 -- ): copy counted string c-addr1 to c-addr2.
PLACE ( c-addr1 u c-addr2 -- ): place the string described by c-addr1,u as a counted string at c-addr2.
BUILD_LOCAL ( c-addr -- c-addr' ): call Davex xbuild_local.
REDIRECT? ( -- f ): return whether Davex input or output is redirected; b0=1 if input, b1=1 if output.
+REDIRECT, -REDIRECT: affect Davex I/O redirection.
U% ( u1 u2 -- u ): use Davex to calculate the percentage of u1 that u2 is.
3U% ( d1 d2 -- u ): use Davex to calculate the percentage of d1 that d2 is, up to 24-bit.
Y/N ( -- f ): use Davex to ask "? (y/n)", returning true if Y was pressed.
Y/N2 ( u -- f ): u is either 'y' or 'n'. Perform as Y/N above, but use u as the default if space or return is pressed.
BELL: sound the Davex bell as configured by the user.
.DATE ( u -- ): use Davex to print a ProDOS date word.
.TIME ( u -- ): use Davex to print a ProDOS time word.
.P8_ERR ( u -- ): use Davex to print a ProDOS error message.
: close current directory level and opens the previous one if it was open. must use this once for each
D ( n1 n2 -- d1 d2 ): convert two singles to two doubles.
UML/MOD ( ud u -- u-rem ud-quot ): 32/16 division with 32-bit quotient and 16-bit remainder.

Notes for standard words:

/MOD defaults to floored division but may be changed to towards-zero division using ' S/REM ' /MOD DEFER! Similarly, M/MOD performs the same function for derived mixed-precision words, and can be changed via ' SM/REM ' M/MOD DEFER!

S" and C": In interpretation mode, S" and C" use FBUFF3 (documented above), split into two 256-byte regions and alternating between the two. That is, the first S" or C" uses FBUFF3+0, the second FBUFF3+256, and the third FBUFF3+0 again. No effort is made to bounds-check.

Examples:

    : prname dup c@ 15 and swap 1+ swap type ;
    create online_parms 2 c, 0 c, fbuff ,
    : online 197 online_parms mli 16 0 do 16 i * fbuff + dup c@ dup 15 and if .sd space [char] / emit prname cr else 2drop leave then loop ;
    : prent dup dup prname space 16 + c@ .ftype cr ;
    : cat ;
    c" /foo" cat

Implementation internals/Hacking

This Forth uses the direct-threaded model. Forth is implemented as a virtual machine that may be freely mixed with 6502 code.

The stack would preferably be implemented on the zero page, but Davex does not give us enough room to have an acceptably-sized stack. Therefore the ZP contains working registers and system variables instead. This makes the system slower but somewhat space-efficient with regard to math operations. Some, if not all, of this slowness is made up for by the direct-threaded model.

As a direct-threaded Forth, each compiled instruction generally refers to a code address, not a code field address.
The exception is an instruction in the range $0000..$00FF. Since no code is allowed on the zero page, these are implemented as fast literal numbers and are immediately pushed onto the parameter stack.

The following macros are defined in the source to aid readability:

ENTER - enter the Forth VM; cells representing compiled Forth code follow immediately. This starts a new thread by pushing the previous Forth IP to the stack. This implements the compiled semantics of a colon definition.
EXIT - exit the current thread and return to the previous thread.
CODE - exit the current thread and return to native code, which immediately follows.
NEXT - used at the end of a primitive to execute the next Forth instruction.
PUSHNEXT - used at the end of a primitive to optimize the common case of jsr pushay followed by NEXT.

The dictionary is implemented as follows:

No-name definitions (defined by :NONAME, for instance) are headerless and not searchable.

Definitions with names are stored in the following format:

Offset   Use
-------  ---
$00-$01  Link to previous named definition, $0000 if this is the last one.
$02      Flags and name length, b7 is always set. b0-b3 are name length, b4 is the "smudge" bit, b5 is the compile-only flag, and b6 is the IMMEDIATE flag.
$03-n    Name, ASCII with high bit off.
n+1-m    Code field, this address is returned by ' (is the execution token).
m+1      Body, for deferred words.
m+3      Body, for colon definitions and CREATEd words.

Since each code field begins with native code, words defined from within Forth itself begin with a JSR ($20) or JMP ($4C) opcode. JSR is used for all definitions except deferred words, which use JMP.

From an execution token for a named word, the header can be found by scanning backwards from the xt for the high bit of the flags.

The compile-only flag is used to flag system words that can only be used at compilation time, such as looping/control-flow words.
This bit may be used in the future to automatically compile a noname definition in the interpretation state when such a word is encountered, allowing such words to be used at any time. For now, using words with this flag in the interpretation state throws an exception.

The "smudge" bit is used when a definition is open. If the definition is aborted due to an error, the smudge bit will still be set and the system will delete the unfinished definition. DOES> resets the smudge bit.

In the interpreter source code, the following macros are defined to aid readability and ensure consistent system dictionary data:

dsstart - start the dictionary.
dword dname,fname,flags - create a word with the given label, Forth name, and flags.
hword dname,fname,flags - create a headerless definition. fname and flags are ignored but should be provided so that a headerless word can be changed to a normal one and vice versa.
dwordq dname,fname,flags - as dword, but the Forth name will have each ' replaced with a ", required due to an assembler limitation. An equivalent hwordq is not provided since a headerless word does not have a Forth name.
dchain dname - change the dictionary chain so the next word will link to dname instead.
eword - end a definition started with one of the above.
dconst dname,fname,value,flags - define a constant with the given value. This macro results in a primitive that cannot be altered.
dvar dname,fname,value,flags - define a variable, equivalent to CREATE 1 CELLS ALLOT. The scoped label val is the address of the value.
hvar dname,fname,value,flags - as dvar, but produce a headerless definition.
dvalue dname,fname,value,flags - define a VALUE. The scoped label val is the address of the value.
hvalue dname,fname,value,flags - define a headerless VALUE.

All of the definitions produced by the above contain a scoped label, xt, that is the address used for the execution token of the word, and must be used when hand-compiling definitions.
For instance:

    dword MY2DROP,"MY2DROP"
      ENTER
      .addr DROP::xt
      .addr DROP::xt
      EXIT
    eword

dname is the label name to be used for the assembler, and will be used for hand-compiled Forth code in the interpreter.

fname is the Forth name, which is the name used inside the interpreter.

flags are the flag bits for the word. They are always optional. The high bit will always be set.

value is the initial value (variables, values) or the fixed value (constants).
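
The header layout described earlier (a flags/length byte with b7 always set, followed by a name stored high-bit off, followed by the code field) suggests how a name might be recovered from an execution token. The following is only a sketch under those stated assumptions, not code from the interpreter source. XT>FLAGS, XT>NAME and .NAME are hypothetical names chosen for illustration, and the xt is assumed to belong to a named word; for a headerless definition the backward scan would wander into unrelated memory.

```forth
\ Hypothetical helpers; none of these words are part of MG's Davex Forth.

\ Scan backwards from the xt until a byte with the high bit set is found.
\ Assumes the name bytes all have the high bit off, so the first such
\ byte is the flags/length byte of the header.
: XT>FLAGS ( xt -- addr )
  BEGIN 1- DUP C@ 128 AND UNTIL ;

\ The name starts one byte after the flags byte; b0-b3 hold its length.
: XT>NAME ( xt -- c-addr u )
  XT>FLAGS DUP 1+ SWAP C@ 15 AND ;

\ Print the stored name of a named word.
: .NAME ( xt -- )
  XT>NAME TYPE ;
```

Under these assumptions, ' DUP .NAME should print the name as stored in the header of DUP. The sketch reads memory only, so it does not disturb the dictionary.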