Overhaul README

2024-06-09 18:29:32 +00:00 · 2022-07-22 14:28:48 +01:00 · 2022-07-22 14:28:48 +01:00 · b243e7afb4
commit b243e7afb4
parent 30c79accd7
1 changed files with 108 additions and 168 deletions
--- a/README.md
+++ b/README.md
@@ -2,143 +2,43 @@
 High quality audio player for streaming audio over Ethernet, for the Apple II.
 **Dedicated to Woz on his 70th birthday.  Thank you for a lifetime of enjoyment exploring your wonderful creation.**
 Requires:
 *  Uthernet II (currently assumed to be in slot 1)
 *  Enhanced //e or (untested) //gs.  
-    * The player will run on 6502 (and should even run on a 16KB machine, although the disk image uses ProDOS) but about
+    * The player should run on a 6502, but about 10% _faster_ than on a 65c02 (and with lower audio quality, until
-    10% _faster_ on a 6502 than 65c02 (and with lower audio quality, until the encoder understands this).  See "future
+    the encoder understands this).  See "future work" below.
    work" below.
 NOTE: IP addresses are hardcoded to 10.0.0.1 for the server and 10.0.65.02 for the Apple II.  This is not
 currently configurable without reassembling.
-## What this does
+The audio encoder runs on a modern machine, and produces an encoded audio file suitable for playback on the Apple
 II, via ethernet streaming.
-The audio encoder runs on your modern machine, and produces a bytestream suitable for playback on the Apple II, via
+To encode audio, ][-Sound simulates the movement of the Apple II speaker clock cycle by cycle, and computes the exact
-ethernet streaming.
+clock cycles at which to invert the applied speaker voltage, so that the speaker traces out the desired audio waveform as accurately
 as possible.
-It works by simulating the movement of the Apple II speaker at 1-cycle resolution, and computing the exact cycles
+The resulting audio file causes the Apple II to follow this speaker trajectory with cycle-level precision when it is
-that the speaker cone should switch direction so that it traces out the desired audio waveform as accurately as
+played, and typically ends up toggling the speaker about 100,000 times/second.
 possible.  This includes looking some number of cycles into the future to anticipate upcoming changes in the waveform
 (e.g. sudden spikes), so the speaker can be pre-positioned to best accommodate them.
-The resulting bytestream directs the Apple II to follow this speaker trajectory with cycle-level precision, and
+TODO: link KansasFest 2022 slides/video
 typically ends up toggling the speaker about 110000 times/second.
-The core audio playback code is small enough (~190 bytes) to fit in page 3.  i.e. would have been small enough to type
+## Usage
 in from a magazine back in the day.  The megabytes of audio data would have been hard to type in though ;)  Plus,
 Uthernets didn't exist back then (although a Slinky RAM card would let you do something similar, see Future Work below).
-# Implementation
+The simplest usage is:
 The audio player uses [delta modulation](https://en.wikipedia.org/wiki/Delta_modulation) to produce the audio signal.
 This signal is constructed based on an electrical model of how the Apple II behaves in response to input, which we
 simulate to optimize the audio quality.
 Delta modulation with an RC circuit is also called "BTC", after https://www.romanblack.com/picsound.htm who described
 a number of variations on these (Apple II-like) audio circuits and Delta modulation audio encoding algorithms.  See e.g.
 Oliver Schmidt's [PLAY.BTC](https://github.com/oliverschmidt/Play-BTc) for an Apple II implementation that plays from
 memory at 33KHz.
 The big difference with our approach is that we are able to target a 1MHz sampling rate, i.e. manipulate the speaker
 with 1-cycle precision, by choosing how the "player opcodes" are chained together by the ethernet bytestream.
 The catch is that once we have toggled the speaker we can't toggle it again until at least 10 cycles have passed (9
 cycles on 6502), but we can pick any such interval >= 10 cycles (except for 11 cycles because of 65x02 opcode timing
 limitations).  Successive choices are independent.
 In other words, we are able to choose a precise sequence of clock cycles in which to toggle the speaker, but there is a
 "cooldown" period and these cannot be spaced too close together.
 The minimum period of 10 cycles is already short enough that it produces high-quality audio even if we only modulate
 the speaker at a fixed cadence of 10 cycles (i.e. at 102.4KHz instead of 1MHz), although in practice a fixed 14-cycle
 period gave better quality (10 cycles produced a quiet but audible background tone coming from some kind of harmonic --
 perhaps an interaction with the every-64-cycle "long cycle" of the Apple II).  The initial version of ][-Sound used this
 approach (and also used the "spare" 4 cycles for a page-flipping trick to visualize the audio bitstream while playing).
 We can also use another trick to improve audio quality further: certain 65x02 opcodes will access memory multiple times
 during execution (sometimes called "false reads").  For example, the INC $C030,X opcode executes for 7 cycles and will
 access memory location $C030+X on cycles 4,5,6,7 (for values of X that do not result in page-crossing).  So by making
 sure X=0 we can toggle the speaker 4 times in 7 cycles.
 We use the following opcodes to cover all of the timing possibilities: NOP; STA $zp; STA $C030; STA $C030,X; INC $C030;
 INC $C030,X
 This improves audio quality by XXX%
 ## Player
 The player consists of some ethernet setup code and a core playback loop of "player opcodes", which are the basic
 operations that are dispatched to by the bytestream.
 Some other tricks used here:
 - The minimal 10-cycle (9-cycle) speaker loop is: STA $C030; JMP (WDATA), where we use an undocumented property of the
  Uthernet II: I/O registers on the WDATA don't wire up all of the address lines, so they are also accessible at
  other address offsets.  In particular WDATA+1 is a duplicate copy of WMODE.  In our case WMODE happens to be 0x3.
  This lets us use WDATA as a dynamic jump table into page 3, where we place our player code.  We then choose the
  network byte stream to contain the low-order byte of the target address we want to jump to next, and we'll
  indirect-jump to $03xx.
 - There are many potential combinations of opcodes we could choose to produce patterns of speaker access.  If we limit
  to simple cases (e.g. 2 and 3-cycle padding opcodes, plus STA $C030) then the optimal solution can be easily
  constructed by hand, but this is infeasible when we include additional "exotic" choices like INC $C030.  Instead, we
  machine-generate this part of the player code.
 - To do this, we compute all possible sequences of our candidate 65x02 opcodes up to maximum cycle count, and then
  determine the subset that allows access to the largest range of speaker trajectories, subject to the space constraint
  of fitting within page 3.  We also make use of the property that the player can jump to any opcode within these sequences,
  which allows much greater coverage.
 - By chaining together these "player opcodes", we can toggle the speaker with a wide variety of cycle patterns, though
  successive player opcodes always have a gap of at least 10 cycles between speaker toggles.  However even this cooldown
  gap amounts to 102.4KHz which is far beyond audible range. 
 - As with my [\]\[-Vision](https://github.com/KrisKennaway/ii-vision) streaming video+audio player, we schedule a "slow
  path" dispatch to occur every 2KB in the byte stream, and use this to manage the socket buffers (ACK the read 2KB and
  wait until at least 2KB more is available, which is usually non-blocking).  While doing this we need to maintain a
  regular (non-audible) tick cadence so the speaker is in a known trajectory.  We can also partly compensate for this in
  the audio encoder. 
 ## Encoding
 The encoder models the Apple II speaker as an RC circuit with given time constant and simulates it at 1MHz (i.e.
 cycle-level) time resolution.
 At every step we evaluate the possible next choices for the player, i.e. which player "opcode" we should branch to
 next, considering the effect this will have on the speaker movement.  For example, an opcode that will run for 10 cycles
 and invert the speaker voltage on cycle 4. 
 To optimize the audio quality we look ahead some defined number of cycles (e.g. 20 cycles gives good results) and choose
 a speaker trajectory that minimizes errors over this range, considering all possible sequences of opcodes that we could
 choose to schedule during this cycle window.  This makes the encoding exponentially slower, but improves quality since
 it allows us to e.g. anticipate large amplitude changes by pre-moving the speaker to better approximate them.
 This also needs to take into account scheduling the "slow path" every 2048 output bytes, where the Apple II will manage
 the TCP socket buffer while ticking the speaker at a constant cadence (currently chosen to be every 14 cycles XXX).  Since
 we know this is happening we can compensate for it, i.e. look ahead to this upcoming slow path and pre-position the
 speaker so that it introduces the least error during this "dead" period when we're keeping the speaker in a net-neutral
 position.
 ```
-$ ./encode_audio.py <input> <step size> <lookahead steps> <output.a2s>
+$ ./encode_audio.py <input> <output.a2s>
 ```
 where: 
 *  `input` is the audio file to encode.  .mp3, .wav and probably others are supported.
 *  `step size` is the fractional movement from current voltage to target voltage that we assume the Apple II speaker is
   making during each clock cycle.  A value of 500 (i.e. moving 1/500 of the distance) seems to be about right for my
   Apple //e.  This corresponds to a time constant of about 500us for the speaker RC circuit. XXX
 *  `lookahead steps` defines how many cycles into the future we want to look when optimizing.  This is exponentially
   slower since we have to evaluate all possible sequences of player opcodes that could be chosen within the lookahead
   horizon.  A value of 20 gives good quality.
 *  `output.a2s` is the output file to write to.
 TODO: document flags
 ## Serving
 This runs an HTTP server listening on port 1977 to which the player connects, then unidirectionally streams it the data.
@@ -147,75 +47,115 @@ This runs a HTTP server listening on port 1977 to which the player connects, the
 $ ./play_audio.py <filename.a2s>
 ```
-# Theory of operation
+# Details
-When we access $C030 it inverts the applied voltage across the speaker, and left to itself this results in an audio
+## Theory of operation
 "click".  When we invert the applied voltage, the speaker initially responds by moving asymptotically towards
 the new voltage level, before developing oscillations that decay in amplitude over the following few milliseconds.
-Electrically, the speaker behaves like an [RLC circuit](https://en.wikipedia.org/wiki/RLC_circuit), and the change in
+Control of the Apple II speaker has very limited hardware support: accessing a special memory location ($C030 hex)
-applied voltage produces an oscillating audio waveform.  (Actually this seems to be an approximation, and the actual
+causes the voltage across the speaker to be inverted (toggled high/low), which causes the speaker cone to begin
-audio output looks more like the sum of _two_ RLC circuits, with different frequencies - I'd like to understand this
+switching position (in/out).  By itself, a single memory access causes the speaker to emit a 'click'.  Producing more
-better)
+complex sounds from the Apple II requires accessing the speaker address repeatedly, under direct CPU control.
-If we actuate the speaker frequently enough, these oscillations don't have time to develop and we can ignore them, so
+][-Sound uses a highly optimized audio player running on the Apple II that is capable of accessing the speaker
-the modeling becomes simpler.  This amounts to approximating the RLC circuit by an
+on _arbitrary_ clock cycles (i.e. at the maximum possible 1MHz resolution), as long as successive accesses are at least
-[RC circuit](https://en.wikipedia.org/wiki/RC_circuit) which is easier to simulate.
+10 cycles apart.
-With some empirical tuning of the time constant of this RC circuit, we can accurately model how the Apple II speaker
+The audio encoder uses [delta modulation](https://en.wikipedia.org/wiki/Delta_modulation) to produce the audio output.
-will respond to voltage changes, and use this to make the speaker "trace out" our desired waveform.  We can't do this
+The audio stream is constructed based on a simulation of how the Apple II speaker behaves in response to changes in input
-exactly -- the speaker will zig-zag around the target waveform because we can only move it in finite jumps -- so there
+voltage, which is used to optimize the audio quality.
 is some left-over "quantization noise" that manifests as background static, though in our case this is barely noticeable.
-In practise the resulting audio also sometimes contains clicks or "crackling".  This problem is also found in other
+Delta modulation has been previously used for Apple II audio playback from memory, e.g. Oliver Schmidt's [PLAY.BTC](https://github.com/oliverschmidt/Play-BTc)
-Apple II audio playback techniques (e.g. PWM) and (from looking at audio waveforms) it seems to be due to the speaker
+implements delta modulation at about 33KHz frequency and with 33KHz precision, i.e. every ~30 cycles it either toggles
-falling over into the non-linear oscillation mode.  i.e. we haven't successfully managed to keep it in the linear
+the speaker or leaves it untouched for another 30 cycles.
 regime.  Perhaps it will be necessary to model the full RLC circuit behaviour to control for this.
-## Future work
+The big difference with our approach is that we are able to achieve 1MHz precision and 100KHz frequency, i.e. ][-Sound
 is able to toggle the speaker at _any_ clock cycle (1MHz precision), as long as successive toggles are at least 10
 cycles apart (100KHz frequency).
-### Ethernet configuration
+The other major improvement is in the accuracy of the Apple II speaker simulation.  Previous delta modulation
 implementations modeled the speaker as an [RC circuit](https://en.wikipedia.org/wiki/RC_circuit).  This follows
 https://www.romanblack.com/picsound.htm, which describes a number of variations of (Apple II-like) audio circuits and
 delta modulation audio encoding algorithms under the name "Binary Time Constant" audio.
 Instead, ][-Sound models the speaker as an [RLC circuit](https://en.wikipedia.org/wiki/RLC_circuit), i.e. a damped harmonic oscillator, which matches the actual
 speaker response much more closely.  At very short timescales the response of an RLC circuit (oscillatory response to
 applied voltage with exponential damping) looks approximately like that of an RC circuit (exponential response to
 applied voltage), which is why the simpler approach still gives reasonable results.
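The damped harmonic oscillator model can be sketched as a cycle-by-cycle simulation.  This is only an illustration: the function name, the integration scheme (semi-implicit Euler) and the resonance/decay values are assumptions chosen for readability, not the fitted parameters used by the encoder.

```python
import math

CPU_HZ = 1_020_484  # approximate Apple II CPU clock rate (assumption)


def simulate_speaker(toggle_cycles, n_cycles, freq_hz=4000.0, decay=1000.0):
    """Return the modeled speaker position at every CPU cycle.

    The speaker is treated as a damped harmonic oscillator that is pulled
    towards the applied voltage (+1 or -1).  freq_hz and decay are
    hypothetical placeholder values, not measured ones.
    """
    dt = 1.0 / CPU_HZ
    w0_sq = (2.0 * math.pi * freq_hz) ** 2
    toggles = set(toggle_cycles)
    voltage, x, v = 1.0, 0.0, 0.0
    positions = []
    for cycle in range(n_cycles):
        if cycle in toggles:
            voltage = -voltage  # an access to $C030 inverts the voltage
        # x'' = w0^2 * (voltage - x) - 2 * decay * x'
        accel = w0_sq * (voltage - x) - 2.0 * decay * v
        v += accel * dt
        x += v * dt
        positions.append(x)
    return positions


# A single voltage step left alone rings before settling (the "click"),
# while toggling every 10 cycles keeps the cone close to the midpoint.
click = simulate_speaker([], 20_000)
fast = simulate_speaker(range(0, 2_000, 10), 2_000)
```

In this sketch the untouched speaker overshoots the applied voltage and then rings down, whereas rapid toggling holds it within a small range, which is the regime the encoder works in.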
 ## Player
 The player consists of some ethernet setup code and a core playback loop of "player opcodes", which are the basic
 operations that are dispatched to by the audio bytestream.
 Some other tricks used here:
 - The minimal 10-cycle (9-cycle) speaker loop is: `STA $C030; JMP (WDATA)`, where we use an undocumented property of the
  Uthernet II: the special I/O registers at $C0nx (which are used for communication with the onboard W5100 hardware)
  don't wire up all of the address lines, so they are also accessible at other address offsets.  In particular WDATA+1 is a duplicate copy of WMODE.  In our case WMODE happens to be 0x3.
  This lets us use WDATA as a dynamic jump table into page 3, where we place our player code.  We then choose the
  network byte stream to contain the low-order byte of the target address we want to jump to next, and we'll
  indirect-jump to $03xx.
 - The core audio playback loop is a carefully chosen sequence of 6502 opcodes that can be chained together (via this
  `JMP (WDATA)` trick) to access the speaker at any interval of >=10 CPU cycles.  This only requires 16 bytes of space 
  which easily fits within page 3.
 - By chaining together these "player opcodes", we can toggle the speaker at arbitrary clock cycles, but no more often
  than every 10 cycles.  This gives an upper bound of 102.4KHz for speaker accesses, which means a maximum audio
  frequency of 51.2KHz, far outside the audible range.  This may seem like overkill, but a high modulation frequency is
  desirable in delta modulation to limit "quantization error", i.e. to allow zig-zagging back and forth as closely as
  possible around the target waveform.
 - As with my [\]\[-Vision](https://github.com/KrisKennaway/ii-vision) streaming video+audio player, we schedule a "slow
  path" dispatch to occur every 2KB in the byte stream, and use this to manage the socket buffers (ACK the read 2KB and
  wait until at least 2KB more is available, which is usually non-blocking).  While doing this we need to maintain a
  regular speaker cadence so the speaker is in a known trajectory.  We can also partly compensate for this in
  the audio encoder. 
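The `JMP (WDATA)` dispatch described in the list above can be modeled in a few lines.  This is a hedged sketch rather than the player's actual code: the function names are made up for illustration, and the only facts taken from the text are that the high byte read back is 0x03 and that toggles must be at least 10 cycles apart.

```python
WMODE_VALUE = 0x03  # value the text says is read back at WDATA+1
MIN_GAP = 10        # minimum cycles between toggles on a 65c02, per the text


def jump_targets(stream_bytes):
    """Address reached by each JMP (WDATA): high byte fixed, low byte streamed."""
    return [(WMODE_VALUE << 8) | low for low in stream_bytes]


def toggle_schedule(gaps, start=0):
    """Turn a list of cycle gaps into absolute toggle cycles, checking the cooldown."""
    cycles = []
    t = start
    for gap in gaps:
        if gap < MIN_GAP:
            raise ValueError(f"gap of {gap} cycles is below the {MIN_GAP}-cycle minimum")
        t += gap
        cycles.append(t)
    return cycles


targets = jump_targets(bytes([0x00, 0x4A, 0xFF]))
schedule = toggle_schedule([10, 12, 10])
```

Each byte of the network stream therefore selects one entry point in page 3, and the sequence of entry points determines the cycle gaps between speaker toggles.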
 ## Encoding
 The encoder models the Apple II speaker as an RLC circuit with parameters (resonance frequency and envelope decay rate)
 fitted to the observed speaker response, and simulates the speaker response at 1MHz (i.e. cycle-level) time resolution.
 At every step we evaluate the possible next choices for the player, i.e. which player "opcode" we should branch to
 next, considering the effect this will have on the speaker movement.  For example, an opcode that will run for 10 cycles
 and invert the speaker voltage on cycle 4. 
 To optimize the audio quality we look ahead some defined number of cycles (e.g. 30 cycles gives good results) and choose
 a speaker trajectory that minimizes errors over this range, considering all possible sequences of opcodes that we could
 choose to schedule during this cycle window.  This makes the encoding exponentially slower, but improves quality since
 it allows us to e.g. anticipate large amplitude changes by pre-moving the speaker to better approximate them.
 This also needs to take into account scheduling the "slow path" every 2048 output bytes, where the Apple II will manage
 the TCP socket buffer while ticking the speaker at some constant cadence of (a, b) cycles.  Since
 we know this is happening we can compensate for it, i.e. look ahead to this upcoming slow path and pre-position the
 speaker so that it introduces the least error during this period when we have to step away from direct cycle-level control of the speaker position.
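The lookahead search can be illustrated with a much-simplified sketch.  Everything here is an assumption made for the example: each "choice" is reduced to a number of cycles to wait before the next toggle, the menu of intervals is invented, the speaker parameters are placeholders, and the slow path is ignored.  The real encoder works on player opcodes with their own toggle patterns.

```python
import math

DT = 1.0 / 1_020_484                        # approximate cycle time (assumption)
W0_SQ = (2.0 * math.pi * 4000.0) ** 2       # hypothetical resonance
DECAY = 1000.0                              # hypothetical decay rate
INTERVALS = (10, 12, 13, 14, 16, 20)        # hypothetical menu of toggle gaps


def advance(x, v, voltage, cycles, target, t0):
    """Hold `voltage` for `cycles` cycles; return new state and squared error."""
    err = 0.0
    for i in range(cycles):
        if t0 + i >= len(target):
            break
        accel = W0_SQ * (voltage - x) - 2.0 * DECAY * v
        v += accel * DT
        x += v * DT
        err += (x - target[t0 + i]) ** 2
    return x, v, err


def best_error(x, v, voltage, t, target, horizon_end):
    """Smallest error reachable between cycle t and the lookahead horizon."""
    if t >= horizon_end:
        return 0.0
    best = math.inf
    for n in INTERVALS:
        nx, nv, err = advance(x, v, voltage, min(n, horizon_end - t), target, t)
        err += best_error(nx, nv, -voltage, t + n, target, horizon_end)
        best = min(best, err)
    return best


def encode(target, lookahead=30):
    """Pick toggle gaps one at a time, each judged over the lookahead window."""
    x, v, voltage, t, chosen = 0.0, 0.0, 1.0, 0, []
    while t < len(target):
        horizon_end = min(t + lookahead, len(target))
        best_n, best_err = INTERVALS[0], math.inf
        for n in INTERVALS:
            nx, nv, err = advance(x, v, voltage, min(n, horizon_end - t), target, t)
            err += best_error(nx, nv, -voltage, t + n, target, horizon_end)
            if err < best_err:
                best_n, best_err = n, err
        # Commit only the first choice, then re-plan from the new state.
        x, v, _ = advance(x, v, voltage, best_n, target, t)
        voltage = -voltage
        t += best_n
        chosen.append(best_n)
    return chosen


target = [0.1 * math.sin(2.0 * math.pi * 1000.0 * i * DT) for i in range(600)]
gaps = encode(target)
```

The cost grows exponentially with the lookahead because every sequence of choices that fits in the window is evaluated, which is the trade-off the text describes.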
 # Future work
 ## Ethernet configuration
 Hard-coding the ethernet config is not especially user friendly.  This should be configurable at runtime.
-### 6502 support
+## In-memory playback
-The player relies heavily on the JMP (indirect) 6502 opcode, which has a different cycle count on the 6502 (5 cycles)
+This level of audio quality requires a high bit rate, about 92KB/sec, so 1 minute of audio requires about 5.5MB of data.
 and 65c02 (6 cycles).  This means the player will be about 10% **faster** on a 6502 (e.g. II+, Unenhanced //e), but audio
 quality will be off until the encoder is made aware of this and able to compensate.
 This might be one of the few pieces of software for which a 65c02 at the same clock speed causes a measurable
 performance degradation (adding almost a minute to playback of an 8-minute song, until I compensated for it).
 Hat tip to Scott Duensing who noticed that my sample audio sounded "a tad slow", which turned out to be due to this
 1-cycle difference!
 ### Better encoding performance
 The encoder is written in Python and is about 30x slower than real-time at a reasonable quality level.  Further
 optimizations are possible but rewriting in e.g. C++ should give a large performance boost.
 ### Modeling as RLC circuit
 Modeling the full RLC circuit behaviour may give insight into the "crackling" audio behaviour, and/or allow for better
 controlling this.  As this is a second-order differential equation the simulation will be more complex and therefore
 slower.
 ### In-memory playback
 This level of audio quality requires high bit rate, about 85KB/sec.  So 1 minute of audio requires about 5MB of data.
 A "Slinky" style memory card (RamFactor etc) uses a very similar I/O mechanism to the Uthernet II, i.e. a $C0xx address
 that auto-increments through the onboard memory space.  So it should be straightforward to extend ][-Sound to support
-RamFactor playback (I don't have one though).
+RamFactor playback.
 Playback from bank-switched memory (e.g. RamWorks) should also be feasible, though would require a small amount of 
 extra code to add the player opcode to switch banks.
-The other option is to reduce bitrate (and therefore audio quality).  Existing in-memory delta modulation players exist,
+The other option is to reduce bitrate (and therefore audio quality).  I think it should also be possible to improve
-e.g. Oliver Schmidt's [PLAY.BTC](https://github.com/oliverschmidt/Play-BTc), though tooling for producing well-optimized
+in-memory playback quality at similar bitrate, through using some of the cycle-level targeting techniques (though
-audio data for them did not exist.  It should be possible to adapt the ][-sound encoder to produce better-quality audio
+probably not at full 1-cycle resolution).
 for these existing players.
-I think it should also be possible to improve in-memory playback quality at similar bitrate, through using some of the
+## 6502 support
-cycle-level targeting techniques (though perhaps not at full 1-cycle resolution).
+
 The player relies heavily on the JMP (indirect) 6502 opcode, which has a different cycle count on the 6502 (5 cycles)
 and 65c02 (6 cycles).  This means the player will be about 10% **faster** on a 6502 (e.g. II+, Unenhanced //e), but
 audio quality will be off until the encoder is made aware of this and able to compensate.
 This might be one of the few pieces of software for which a 65c02 at the same clock speed causes a measurable
 performance degradation (adding almost a minute to playback of an 8-minute song).  Hat tip to Scott Duensing, who noticed
 that my sample audio sounded "a tad slow", which turned out to be due to hearing this 1-cycle timing difference!
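The size of this effect can be checked with some quick arithmetic.  This sketch assumes the worst case in which every dispatch is the minimal `STA $C030; JMP (WDATA)` loop (4 cycles for the store plus 5 or 6 for the indirect jump); real streams mix in longer player opcodes, so the actual difference will be somewhat smaller.  The function name is illustrative.

```python
CYCLES_6502 = 4 + 5   # STA $C030 + JMP (indirect) on a 6502
CYCLES_65C02 = 4 + 6  # STA $C030 + JMP (indirect) on a 65c02


def playback_seconds(encoded_seconds, actual_cycles, assumed_cycles):
    """Playback time when the CPU spends actual_cycles per assumed_cycles of audio."""
    return encoded_seconds * actual_cycles / assumed_cycles


song = 8 * 60  # an 8-minute song, in seconds

# Encoded for 6502 timing but played on a 65c02: roughly 53 seconds longer.
slow = playback_seconds(song, CYCLES_65C02, CYCLES_6502)

# Encoded for 65c02 timing but played on a 6502: 10% shorter.
fast = playback_seconds(song, CYCLES_6502, CYCLES_65C02)
```

Under this assumption the 65c02 adds a little under a minute to an 8-minute song, in line with the figure quoted above.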