Ophis/doc/tutor1.sgm

<chapter id="part1">
  <title>The basics</title>

  <para>
    In this first part of the tutorial we will create a
    simple <quote>Hello World</quote> program to run on the Commodore
    64.  This will cover:

    <itemizedlist>
      <listitem><para>How to make programs run on a Commodore 64</para></listitem>
      <listitem><para>Writing simple code with labels</para></listitem>
      <listitem><para>Numeric and string data</para></listitem>
      <listitem><para>Invoking the assembler</para></listitem>
    </itemizedlist>
  </para>

  <section>
    <title>A note on numeric notation</title>

    <para>
      Throughout these tutorials, I will be using a lot of both
      decimal and hexadecimal notation.  Hex numbers will have a
      dollar sign in front of them.  Thus, 100 = $64, and $100 = 256.
    </para>
  </section>

  <section>
    <title>Producing Commodore 64 programs</title>

    <para>
       Commodore 64 programs are stored in
       the <filename>PRG</filename> format on disk.  Some emulators
       (such as CCS64 or VICE) can run <filename>PRG</filename>
       programs directly; others need them to be transferred to
       a <filename>D64</filename> image first.
    </para>

    <para>
      The <filename>PRG</filename> format is ludicrously simple.  It
      has two bytes of header data: This is a little-endian number
      indicating the starting address.  The rest of the file is a
      single continuous chunk of data loaded into memory, starting at
      that address.  BASIC memory starts at memory location 2048, and
      that's probably where we'll want to start.
    </para>

    <para>
      Well, not quite.  We want our program to be callable from BASIC,
      so we should have a BASIC program at the start.  We guess the
      size of a simple one line BASIC program to be about 16 bytes.
      Thus, we start our program at memory location 2064 ($0810), and
      the BASIC program looks like this:
    </para>

    <programlisting>
10 SYS 2064
    </programlisting>

    <para>
      We <userinput>SAVE</userinput> this program to a file, then
      study it with a hex dumper.  It's 15 bytes long:
    </para>

    <screen>
00000000  01 08 0c 08 0a 00 9e 20  32 30 36 34 00 00 00     |....... 2064...|
    </screen>

    <para>
      The first two bytes are the memory location: $0801.  The rest of
      the data breaks down as follows:
    </para>

    <table frame="all">
      <title>BASIC program breakdown</title>
      <tgroup cols='3'>
        <thead>
          <row>
            <entry align="center">File Offsets</entry>
            <entry align="center">Memory Locations</entry>
            <entry align="center">Value</entry>
          </row>
        </thead>
        <tbody>
          <row><entry>0-1</entry><entry>Nowhere</entry><entry>2-byte pointer to where in memory to load the rest of the file ($0801).</entry></row>
          <row><entry>2-3</entry><entry>$0801-$0802</entry><entry>2-byte pointer to the next line of BASIC code ($080C).</entry></row>
          <row><entry>4-5</entry><entry>$0803-$0804</entry><entry>2-byte line number ($000A = 10).</entry></row>
          <row><entry>6</entry><entry>$0805</entry><entry>Byte code for the <userinput>SYS</userinput> command.</entry></row>
          <row><entry>7-11</entry><entry>$0806-$080A</entry><entry>The rest of the line, which is just the string <quote> 2064</quote>.</entry></row>
          <row><entry>12</entry><entry>$080B</entry><entry>Null byte, terminating the line.</entry></row>
          <row><entry>13-14</entry><entry>$080C-$080D</entry><entry>2-byte pointer to the next line of BASIC code ($0000 = end of program).</entry></row>
        </tbody>
      </tgroup>
    </table>

    <para>
      That's 15 bytes, of which 13 are actually loaded into memory.
      We started at 2049, so we need 2 more bytes of filler to make
      our code actually start at location 2064.  These 17 bytes will
      give us the file format and the BASIC code we need to have our
      machine language program run.
    </para>

    <para>
      These are just bytes&mdash;indistinguishable from any other sort of
      data.  In Ophis, bytes of data are specified with
      the <literal>.byte</literal> command.  We'll also have to tell
      Ophis what the program counter should be, so that it knows what
      values to assign to our labels.  The <literal>.org</literal>
      (origin) command tells Ophis this.  Thus, the Ophis code for our
      header and linking info is:
    </para>

    <programlisting>
.byte $01, $08, $0C, $08, $0A, $00, $9E, $20
.byte $32, $30, $36, $34, $00, $00, $00, $00
.byte $00, $00
.org $0810
    </programlisting>

    <para>
       This gets the job done, but it's completely incomprehensible,
       and it only uses two directives&mdash;not very good for a
       tutorial.  Here's a more complicated, but much clearer, way of
       saying the same thing.
    </para>

    <programlisting>
.word $0801
.org  $0801

        .word next, 10       ; Next line and current line number
        .byte $9e," 2064",0  ; SYS 2064
next:   .word 0              ; End of program

.advance 2064
    </programlisting>

    <para>
      This code has many advantages over the first.

      <itemizedlist>
        <listitem><para> It describes better what is actually
          happening.  The <literal>.word</literal> directive at the
          beginning indicates a 16-bit value stored in the typical
          65xx way (small byte first).  This is followed by
          an <literal>.org</literal> statement, so we let the
          assembler know right away where everything is supposed to
          be.
        </para></listitem>
        <listitem><para>
          Instead of hardcoding in the value $080C, we instead use a
          label to identify the location it's pointing to.  Ophis
          will compute the address of <literal>next</literal> and
          put that value in as data.  We also describe the line
          number in decimal since BASIC line numbers
          generally <emphasis>are</emphasis> in decimal.  Labels are
          defined by putting their name, then a colon, as seen in
          the definition of <literal>next</literal>.
        </para></listitem>
        <listitem><para>
          Instead of putting in the hex codes for the string part of
          the BASIC code, we included the string directly.  Each
          character in the string becomes one byte.
        </para></listitem>
        <listitem><para>
          Instead of adding the buffer ourselves, we
          used <literal>.advance</literal>, which outputs zeros until
          the specified address is reached.  Attempting
          to <literal>.advance</literal> backwards produces an
          assemble-time error. (If we wanted to output something
          besides zeros, we could add it as a second
          argument: <literal>.advance 2064,$FF</literal>, for
          instance.)
        </para></listitem>
        <listitem><para>
          It has comments that explain what the data are for.  The
          semicolon is the comment marker; everything from a semicolon
          to the end of the line is ignored.
        </para></listitem>
      </itemizedlist>
    </para>

    <para>
      We can do better still, though. That initial starting address
      of 2064 was only ever a guess; now that we know that we overshot
      by two bytes, we can simply change the starting address to 2062
      and omit the <literal>.advance</literal> directive entirely. In
      fact, we can even remove the space before the number and make it
      2061 instead&mdash;BASIC doesn't need that space in its
      instruction and it's arguably a wasted byte.
    </para>
  </section>

  <section>
    <title>Related commands and options</title>

    <para>
      This code includes constants that are both in decimal and in
      hex.  It is also possible to specify constants in octal, binary,
      or with an ASCII character.

      <itemizedlist>
        <listitem><para>To specify decimal constants, simply write the number.</para></listitem>
        <listitem><para>To specify hexadecimal constants, put a $ in front.</para></listitem>
        <listitem><para>To specify octal constants, put a 0 (zero) in front.</para></listitem>
        <listitem><para>To specify binary constants, put a % in front.</para></listitem>
        <listitem><para>To specify ASCII constants, put an apostrophe in front.</para></listitem>
      </itemizedlist>

      Example: 65 = $41 = 0101 = %1000001 = 'A
    </para>
    <para>
      There are other commands besides <literal>.byte</literal>
      and <literal>.word</literal> to specify data.  In particular,
      the <literal>.dword</literal> command specifies four-byte values
      which some applications will find useful.  Also, some linking
      formats (such as the <filename>SID</filename> format) have
      header data in big-endian (high byte first) format.
      The <literal>.wordbe</literal> and <literal>.dwordbe</literal>
      directives provide a way to specify multibyte constants in
      big-endian formats cleanly.
    </para>
  </section>

  <section>
    <title>Writing the actual code</title>

    <para>
      Now that we have our header information, let's actually write
      the <quote>Hello world</quote> program.  It's pretty
      short&mdash;a simple loop that steps through a hardcoded array
      until it reaches a 0 or outputs 256 characters.  It then returns
      control to BASIC with an <literal>RTS</literal> statement.
    </para>

    <para>
      Each character in the array is passed as an argument to a
      subroutine at memory location $FFD2.  This is part of the
      Commodore 64's BIOS software, which its development
      documentation calls the KERNAL.  Location $FFD2 prints out the
      character corresponding to the character code in the
      accumulator.
    </para>

    <programlisting>
        ldx #0
loop:   lda hello, x
        beq done
        jsr $ffd2
        inx
        bne loop
done:   rts

hello:  .byte "HELLO, WORLD!", 0
    </programlisting>

    <para>
      The complete, final source is available in
      the <xref linkend="tutor1-src" endterm="tutor1-fname"> file.
    </para>
  </section>
  <section>
    <title>Assembling the code</title>

    <para>
       The Ophis assembler is a collection of Python modules,
       controlled by a master script.  On Windows, this should all
       have been combined into an executable
       file <command>ophis.exe</command>; on other platforms, the
       Ophis modules should be in the library and
       the <command>ophis</command> script should be in your path.
       Typing <command>ophis</command> with no arguments should give a
       summary of available command line options.
    </para>

    <para>
      Ophis takes a list of source files and produces an output file
      based on assembling each file you give it, in order. You can add
      a line to your program like this to name the output file:
    </para>

<programlisting>
.outfile "hello.prg"
</programlisting>

    <para>
      Alternately, you can use the <option>-o</option> option on the
      command line. This will override any <literal>.outfile</literal>
      directives. If you don't specify any name, it will put the
      output into a file named <filename>ophis.bin</filename>.
    </para>

    <para>
      If you are using Ophis as part of some larger toolchain, you can
      also make it run in <emphasis>pipe mode</emphasis>. If you give
      a dash <option>-</option> as an input file or as the output
      target, Ophis will use standard input or output for
      communication.
    </para>

    <table frame="all">
      <title>Ophis Options</title>
      <tgroup cols='2'>
        <thead>
          <row>
            <entry align="center">Option</entry>
            <entry align="center">Effect</entry>
          </row>
        </thead>
        <tbody>
          <row><entry><option>-o FILE</option></entry><entry>Overrides the default filename for output.</entry></row>
          <row><entry><option>-l FILE</option></entry><entry>Specifies an optional listing file that gives the emitted binary in human-readable form, with disassembly.</entry></row>
          <row><entry><option>-m FILE</option></entry><entry>Specifies an optional map file that gives the in-source names for every label used in the program.</entry></row>
          <row><entry><option>-u</option></entry><entry>Allows the 6510 undocumented opcodes as listed in the VICE documentation.</entry></row>
          <row><entry><option>-c</option></entry><entry>Allows opcodes and addressing modes added by the 65C02.</entry></row>
          <row><entry><option>-4</option></entry><entry>Allows opcodes and addressing modes added by the 4502. (Experimental.)</entry></row>
          <row><entry><option>-q</option></entry><entry>Quiet operation.  Only reports warnings and errors.</entry></row>
          <row><entry><option>-v</option></entry><entry>Verbose operation.  Reports files as they are loaded.</entry></row>
        </tbody>
      </tgroup>
    </table>

    <para>
      The only options Ophis demands are an input file and an output
      file.  Here's a sample session, assembling the tutorial file
      here:
    </para>
    <screen>
localhost$ ophis -v hello1.oph
Loading hello1.oph
Assembly complete: 45 bytes output (14 code, 29 data, 2 filler)
    </screen>
    <para>
      This will produce a file
      named <filename>hello.prg</filename>. If your emulator can
      run <filename>PRG</filename> files directly, this file will now
      run (and print <computeroutput>HELLO, WORLD!</computeroutput>)
      as many times as you type <userinput>RUN</userinput>.
      Otherwise, use a <filename>D64</filename> management utility to
      put the <filename>PRG</filename> on a <filename>D64</filename>,
      then load and run the file off that. If you have access to a
      device like the 1541 Ultimate II, you can even load the file
      directly into the actual hardware.
    </para>
  </section>
</chapter>