mirror of https://github.com/KrisKennaway/ii-sound.git (synced 2024-06-09 18:29:32 +00:00)

Overhaul README to describe current status. Still some work needed.

parent f17db175a5, commit 3ef80ec6cf

README.md (103 lines changed)

@@ -24,7 +24,10 @@ that the speaker cone should switch direction so that it traces out the desired
possible. This includes looking some number of cycles into the future to anticipate upcoming changes in the waveform
(e.g. sudden spikes), so the speaker can be pre-positioned to best accommodate them.

The resulting bytestream directs the Apple II to follow this speaker trajectory with cycle-level precision, and
typically ends up toggling the speaker about 110000 times/second.

XXX new player size

The actual audio playback code is small enough (~150 bytes) to fit in page 3, i.e. it would have been small enough to type
in from a magazine back in the day. The megabytes of audio data would have been hard to type in though ;) Plus,

@@ -33,19 +36,13 @@ Uthernets didn't exist back then (although a Slinky RAM card would let you do so
# Implementation

The audio player uses [delta modulation](https://en.wikipedia.org/wiki/Delta_modulation) to produce the audio signal.
This signal is constructed based on an electrical model of how the Apple II behaves in response to input, which we
simulate to optimize the audio quality.
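
The following Python sketch is a minimal illustration of plain delta modulation against a simulated speaker, not ][-Sound's encoder. It assumes a first-order (RC) model in which the cone moves 1/`step_size` of the remaining distance towards the applied voltage each cycle, a fixed 14-cycle decision cadence, and a greedy one-step choice between toggling $C030 and leaving it alone. `simulate_cycles` and `greedy_delta_modulate` are hypothetical names.

```python
import math
from typing import List, Tuple


def simulate_cycles(position: float, voltage: float, cycles: int, step_size: float) -> float:
    """Advance a first-order (RC) speaker model by `cycles` CPU cycles.

    Each cycle the cone moves 1/step_size of the remaining distance towards the
    applied voltage, which is +1.0 or -1.0 depending on the last $C030 access.
    """
    for _ in range(cycles):
        position += (voltage - position) / step_size
    return position


def greedy_delta_modulate(
    target: List[float], period: int = 14, step_size: float = 500.0
) -> Tuple[List[bool], List[float]]:
    """One-step delta modulation at a fixed cadence (hypothetical helper).

    target[i] is the desired cone position at the end of decision period i.
    Returns the toggle decisions and the simulated cone positions.
    """
    position, voltage = 0.0, 1.0
    toggles: List[bool] = []
    trace: List[float] = []
    for wanted in target:
        # Simulate both choices for the next `period` cycles and keep the closer one.
        keep = simulate_cycles(position, voltage, period, step_size)
        flip = simulate_cycles(position, -voltage, period, step_size)
        toggle = abs(flip - wanted) < abs(keep - wanted)
        if toggle:
            voltage = -voltage  # i.e. access $C030
        position = flip if toggle else keep
        toggles.append(toggle)
        trace.append(position)
    return toggles, trace


if __name__ == "__main__":
    # A quiet 1 kHz sine, one decision every 14 cycles of an assumed ~1.02 MHz clock.
    clock_hz, period = 1_020_000, 14
    wave = [0.2 * math.sin(2 * math.pi * 1000 * i * period / clock_hz) for i in range(2000)]
    toggles, trace = greedy_delta_modulate(wave, period=period)
    worst = max(abs(a - b) for a, b in zip(trace, wave))
    print(f"{sum(toggles)} toggles, worst tracking error {worst:.3f}")
```

Because each decision only looks one period ahead, the simulated cone zig-zags around the target. The variable cycle timings and multi-step lookahead described in this README are what the real encoder adds on top of this basic idea.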

Delta modulation with an RC circuit is also called "BTC", after [Roman Black](https://www.romanblack.com/picsound.htm),
who described a number of variations on these (Apple II-like) audio circuits and delta modulation audio encoding
algorithms. See e.g. Oliver Schmidt's [PLAY.BTC](https://github.com/oliverschmidt/Play-BTc) for an Apple II
implementation that plays from memory at 33KHz.

The big difference with our approach is that we are able to target a 1MHz sampling rate, i.e. manipulate the speaker
with 1-cycle precision, by choosing how the "player opcodes" are chained together by the ethernet bytestream.
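
As an illustration of "chaining by bytestream", here is a small Python sketch of how an encoder might serialize a chain of player opcodes. It assumes the page-3 jump-table trick described under "Player": each stream byte is the low byte of the next entry point, and the high byte comes from WMODE = $03, mirrored at WDATA+1. The labels and addresses in `ENTRY_POINTS` are invented for illustration.

```python
from typing import Dict, List

WMODE = 0x03  # Uthernet II WMODE value, which JMP (WDATA) reads as the high byte

# Hypothetical player-opcode entry points in page 3 ($0300-$03FF); real addresses differ.
ENTRY_POINTS: Dict[str, int] = {
    "tick": 0x0300,
    "pad2_then_tick": 0x0340,
    "pad3_then_tick": 0x0380,
}


def to_bytestream(labels: List[str]) -> bytes:
    """Encode a chain of player opcodes as the low bytes of their page-3 entry points."""
    out = bytearray()
    for label in labels:
        address = ENTRY_POINTS[label]
        if address >> 8 != WMODE:
            raise ValueError(f"{label} at ${address:04X} is not in page {WMODE}")
        out.append(address & 0xFF)
    return bytes(out)


def jump_target(stream_byte: int) -> int:
    """Where JMP (WDATA) lands: low byte read from WDATA, high byte from WDATA+1 (= WMODE)."""
    return (WMODE << 8) | stream_byte


if __name__ == "__main__":
    stream = to_bytestream(["tick", "pad3_then_tick", "tick"])
    print(stream.hex(), [f"${jump_target(b):04X}" for b in stream])  # 008000 ['$0300', '$0380', '$0300']
```

Because the high byte is fixed by hardware, every jump target must lie in page 3. This is why the player code (and its machine-generated opcode sequences) has to fit in 256 bytes.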

@@ -58,10 +55,20 @@ In other words, we are able to choose a precise sequence of clock cycles in whic

The minimum period of 10 cycles is already short enough that it produces high-quality audio even if we only modulate
the speaker at a fixed cadence of 10 cycles (i.e. at 102.4KHz instead of 1MHz), although in practice a fixed 14-cycle
period gave better quality (10 cycles produced a quiet but audible background tone coming from some kind of harmonic --
perhaps an interaction with the every-64-cycle "long cycle" of the Apple II). The initial version of ][-Sound used this
approach (and also used the "spare" 4 cycles for a page-flipping trick to visualize the audio bitstream while playing).

We can also use another trick to improve audio quality further: certain 65x02 opcodes will access memory multiple times
during execution (sometimes called "false reads"). For example, the INC $C030,X opcode executes for 7 cycles and will
access memory location $C030+X on cycles 4, 5, 6 and 7 (for values of X that do not result in page-crossing). So by
making sure X=0 we can toggle the speaker 4 times in 7 cycles.

We use the following opcodes to cover all of the timing possibilities: NOP; STA $zp; STA $C030; STA $C030,X; INC $C030;
INC $C030,X.
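
To make the false-read idea concrete, the following hypothetical Python sketch expands a chain of opcodes into the absolute cycles on which $C030 is accessed. Only the INC $C030,X pattern (7 cycles, accesses on cycles 4 to 7) comes from the text. NOP (2 cycles, no access) and STA $C030 (4 cycles, write on the last cycle) use standard 65x02 timings, and the other opcodes are omitted for brevity. The table is an assumption, not ][-Sound's actual opcode data.

```python
from typing import Dict, List, Tuple

# Hypothetical timing table: opcode -> (total cycles, 1-based cycles that access $C030).
# Every access toggles the speaker. Assumes X=0, so INC $C030,X never crosses a page.
OPCODES: Dict[str, Tuple[int, Tuple[int, ...]]] = {
    "NOP": (2, ()),
    "STA $C030": (4, (4,)),
    "INC $C030,X": (7, (4, 5, 6, 7)),
}


def speaker_access_cycles(sequence: List[str]) -> Tuple[List[int], int]:
    """Return the absolute (1-based) cycles that touch $C030, plus the total cycle count."""
    accesses: List[int] = []
    elapsed = 0
    for name in sequence:
        cycles, offsets = OPCODES[name]
        accesses.extend(elapsed + offset for offset in offsets)
        elapsed += cycles
    return accesses, elapsed


if __name__ == "__main__":
    print(speaker_access_cycles(["INC $C030,X", "NOP", "STA $C030"]))  # ([4, 5, 6, 7, 13], 13)
```

An encoder can feed toggle times like these into its speaker simulation, which is how a single exotic opcode adds cadences that a plain STA $C030 chain cannot produce.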

This improves audio quality by XXX%.

## Player

The player consists of some ethernet setup code and a core playback loop of "player opcodes", which are the basic

@@ -72,19 +79,29 @@ Some other tricks used here:
- The minimal 10-cycle (9-cycle) speaker loop is: STA $C030; JMP (WDATA), where we use an undocumented property of the
Uthernet II: its I/O registers don't have all of the address lines wired up, so each register is also accessible at
other address offsets. In particular WDATA+1 is a duplicate copy of WMODE. In our case WMODE happens to be 0x3.
This lets us use WDATA as a dynamic jump table into page 3, where we place our player code. We then choose the
network byte stream to contain the low-order byte of the target address we want to jump to next, and we'll
indirect-jump to $03xx.

- We prepend runs of 2- or 3-cycle padding opcodes (NOP, dummy 3-cycle store) to allow jumping into an opcode at
various entry points to give additional delay variants.
- There are many potential combinations of opcodes we could choose to produce patterns of speaker access. If we limit
ourselves to simple cases (e.g. 2- and 3-cycle padding opcodes, plus STA $C030) then the optimal solution can be easily
constructed by hand, but this is infeasible when we include additional "exotic" choices like INC $C030. Instead, we
machine-generate this part of the player code.

- The choice of cycle lengths for the delay+tick and delay-only opcodes is such that we can obtain any delay period
between speaker ticks (except those shorter than 10 cycles, or exactly 11) by chaining them together.
- To do this, we compute all possible sequences of our candidate 65x02 opcodes up to a maximum cycle count, and then
determine the subset that allows access to the largest range of speaker trajectories, subject to the space constraint
of fitting within page 3. We also make use of the property that the player can jump to any opcode within these
sequences, which allows much greater coverage.

- By chaining together these "player opcodes", we can toggle the speaker with a wide variety of cycle patterns, though
successive player opcodes always have a gap of at least 10 cycles between speaker toggles. However, even this cooldown
gap corresponds to a toggle rate of 102.4KHz, which is far beyond the audible range.

- As with my [\]\[-Vision](https://github.com/KrisKennaway/ii-vision) streaming video+audio player, we schedule a "slow
path" dispatch to occur every 2KB in the byte stream, and use this to manage the socket buffers (ACK the read 2KB and
wait until at least 2KB more is available, which is usually non-blocking). While doing this we need to maintain a
regular (non-audible) tick cadence so the speaker is in a known trajectory. We can also partly compensate for this in
the audio encoder.

## Encoding

@@ -101,7 +118,7 @@ choose to schedule during this cycle window. This makes the encoding exponentia
it allows us to e.g. anticipate large amplitude changes by pre-moving the speaker to better approximate them.
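
The lookahead can be sketched as a depth-limited exhaustive search. The search tries every sequence of toggle gaps over the next few toggles and scores each one by simulated squared tracking error. It commits only to the first choice of the best sequence, then repeats from there. This Python sketch assumes a first-order speaker model and a made-up set of candidate gaps (`CANDIDATE_GAPS`). It ignores the slow path and the real player-opcode table, so it shows the shape of the search rather than ][-Sound's encoder.

```python
from typing import List, Sequence, Tuple

CANDIDATE_GAPS = (10, 12, 13, 14)  # hypothetical cycle gaps between successive speaker toggles
STEP_SIZE = 500.0                  # cone moves 1/STEP_SIZE of the remaining distance per cycle


def run(position: float, voltage: float, target: Sequence[float], start: int, cycles: int) -> Tuple[float, float]:
    """Hold `voltage` for `cycles` cycles from cycle `start`; return (position, squared error)."""
    error = 0.0
    for t in range(start, start + cycles):
        position += (voltage - position) / STEP_SIZE
        wanted = target[t] if t < len(target) else 0.0
        error += (position - wanted) ** 2
    return position, error


def best_gap(position: float, voltage: float, target: Sequence[float], start: int, depth: int) -> Tuple[float, int]:
    """Try every sequence of `depth` toggles; return (lowest total error, first gap used)."""
    if depth == 0 or start >= len(target):
        return 0.0, 0
    best = (float("inf"), 0)
    for gap in CANDIDATE_GAPS:
        # Toggle the speaker (flip the applied voltage), then hold for `gap` cycles.
        new_position, error = run(position, -voltage, target, start, gap)
        future, _ = best_gap(new_position, -voltage, target, start + gap, depth - 1)
        best = min(best, (error + future, gap))
    return best


def encode(target: Sequence[float], lookahead: int = 3) -> List[int]:
    """Commit to the first gap of the best `lookahead`-deep sequence, then slide forward."""
    if lookahead < 1:
        raise ValueError("lookahead must be at least 1")
    position, voltage, t = 0.0, 1.0, 0
    gaps: List[int] = []
    while t < len(target):
        _, gap = best_gap(position, voltage, target, t, lookahead)
        voltage = -voltage
        position, _ = run(position, voltage, target, t, gap)
        gaps.append(gap)
        t += gap
    return gaps
```

For example, `encode([0.0] * 600, lookahead=2)` returns toggle gaps drawn from `CANDIDATE_GAPS` that together cover at least 600 cycles. Each step scores `len(CANDIDATE_GAPS) ** lookahead` sequences (64 with the defaults), which is where the exponential slowdown comes from.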

This also needs to take into account scheduling the "slow path" every 2048 output bytes, where the Apple II will manage
the TCP socket buffer while ticking the speaker at a constant cadence (currently chosen to be every 14 cycles XXX). Since
we know this is happening we can compensate for it, i.e. look ahead to this upcoming slow path and pre-position the
speaker so that it introduces the least error during this "dead" period when we're keeping the speaker in a net-neutral
position.
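
The dead period can be sketched in Python under a first-order speaker model. The sketch assumes the slow path toggles the speaker exactly every 14 cycles and lasts an arbitrary 2800 cycles; the real duration isn't given here. Wherever the encoder leaves the cone, it decays towards a small ripple around 0. This is why pre-positioning the speaker beforehand can make the transition cost less error. `fixed_cadence` is a hypothetical helper.

```python
from typing import List

STEP_SIZE = 500.0  # cone moves 1/STEP_SIZE of the remaining distance per cycle


def fixed_cadence(position: float, voltage: float, cycles: int, period: int = 14) -> List[float]:
    """Toggle the speaker every `period` cycles for `cycles` cycles; return the cone trajectory."""
    trajectory: List[float] = []
    for t in range(cycles):
        if t % period == 0:
            voltage = -voltage  # access $C030
        position += (voltage - position) / STEP_SIZE
        trajectory.append(position)
    return trajectory


if __name__ == "__main__":
    # Hypothetical: entering a 2800-cycle slow path with the cone far from neutral (0.8).
    trace = fixed_cadence(position=0.8, voltage=1.0, cycles=2800)
    print(f"start {trace[0]:.3f}, end {trace[-1]:.3f}")
```

The symmetric 14-cycle square wave has a mean of zero, so the steady state is a ripple of roughly 0.014 either side of 0. Any starting offset decays with the same time constant as the RC model.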

@@ -116,7 +133,7 @@ where:

* `step size` determines the fractional movement from current voltage to target voltage that we assume the Apple II
speaker makes during each clock cycle. A value of 500 (i.e. moving 1/500 of the remaining distance each cycle) seems to
be about right for my
Apple //e. This corresponds to a time constant of about 500us for the speaker RC circuit. XXX

* `lookahead steps` defines how many cycles into the future we want to look when optimizing. This is exponentially
slower since we have to evaluate all possible sequences of player opcodes that could be chosen within the lookahead

@@ -132,6 +149,31 @@ This runs a HTTP server listening on port 1977 to which the player connects, the
```
$ ./play_audio.py <filename.a2s>
```

# Theory of operation

When we access $C030 it inverts the applied voltage across the speaker, and left to itself this results in an audio
"click". When we invert the applied voltage, the speaker initially responds by moving asymptotically towards
the new voltage level, before developing oscillations that decay in amplitude over the following few milliseconds.

Electrically, the speaker behaves like an [RLC circuit](https://en.wikipedia.org/wiki/RLC_circuit), and the change in
applied voltage produces an oscillating audio waveform. (Actually this seems to be an approximation, and the actual
audio output looks more like the sum of _two_ RLC circuits, with different frequencies - I'd like to understand this
better.)

If we actuate the speaker frequently enough, these oscillations don't have time to develop and we can ignore them, so
the modeling becomes simpler. This amounts to approximating the RLC circuit by an
[RC circuit](https://en.wikipedia.org/wiki/RC_circuit), which is easier to simulate.

With some empirical tuning of the time constant of this RC circuit, we can accurately model how the Apple II speaker
will respond to voltage changes, and use this to make the speaker "trace out" our desired waveform. We can't do this
exactly -- the speaker will zig-zag around the target waveform because we can only move it in finite jumps -- so there
is some left-over "quantization noise" that manifests as background static, though in our case this is barely noticeable.
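
The per-cycle `step size` model used by the encoder can be related to an RC time constant. If the cone moves 1/`step_size` of the remaining distance each cycle, the remaining distance after n cycles is (1 - 1/step_size)^n. This equals the RC response exp(-t/τ) when τ = -1/ln(1 - 1/step_size) cycles. The Python sketch below computes this, assuming a CPU clock of about 1.02 MHz.

```python
import math

CLOCK_HZ = 1_020_000  # assumed approximate Apple II CPU clock


def time_constant_cycles(step_size: float) -> float:
    """RC time constant, in CPU cycles, implied by moving 1/step_size of the way each cycle."""
    return -1.0 / math.log1p(-1.0 / step_size)


def step_response(step_size: float, cycles: int) -> float:
    """Fraction of a voltage step the cone has covered after `cycles` cycles."""
    return 1.0 - (1.0 - 1.0 / step_size) ** cycles


if __name__ == "__main__":
    tau = time_constant_cycles(500)
    print(f"tau = {tau:.1f} cycles = {tau / CLOCK_HZ * 1e6:.0f} us")  # tau = 499.5 cycles = 490 us
    # After one time constant the cone has covered 1 - 1/e (about 63%) of the step.
    print(f"{step_response(500, round(tau)):.3f}")
```

For a step size of 500 this gives roughly 490us, consistent with the "about 500us" figure quoted for the `step size` option. `math.log1p` is used because 1/step_size is small, where it is more accurate than `math.log(1 - x)`.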

In practice the resulting audio also sometimes contains clicks or "crackling". This problem is also found in other
Apple II audio playback techniques (e.g. PWM) and (from looking at audio waveforms) it seems to be due to the speaker
falling over into the non-linear oscillation mode, i.e. we haven't successfully managed to keep it in the linear
regime. Perhaps it will be necessary to model the full RLC circuit behaviour to control for this.

## Future work

### Ethernet configuration

@@ -155,24 +197,11 @@ Hat tip to Scott Duensing who noticed that my sample audio sounded "a tad slow",
The encoder is written in Python and is about 30x slower than real-time at a reasonable quality level. Further
optimizations are possible, but rewriting in e.g. C++ should give a large performance boost.

### Better quality?

We can tick the speaker more frequently than every 10 cycles using a couple of methods:

- chaining multiple STA $C030 together, e.g. to give a 4/.../4/4/10 cadence.

- by exploiting 6502 opcodes that repeatedly access memory during execution, including "false reads". During the course
of executing a 6502 opcode, the CPU may access memory locations multiple times (up to 4 times, during successive clock
cycles). This would give additional options for (partial) control of the speaker in the <10-cycle period regime.

Early results suggest that using these exotic opcode variants (e.g. INC $C030) may give a quality boost.

### Measure speaker time constants

It would be interesting to measure the time constant of the speaker circuit directly (e.g. via oscilloscope) instead
of tuning by ear by picking a value whose output "sounds best".

Different Apple II models may well have different speaker characteristics - so far I've only tested on a single //e.

### Modeling as RLC circuit

Modeling the full RLC circuit behaviour may give insight into the "crackling" audio behaviour, and/or allow us to
control it better. As this is a second-order differential equation the simulation will be more complex and therefore
slower.
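
The following minimal Python sketch shows what a second-order model involves. It treats the cone position like the capacitor voltage of a series RLC circuit, i.e. a damped oscillator `x'' + 2ζω₀x' + ω₀²x = ω₀²v`, stepped once per CPU cycle with semi-implicit Euler. The resonant frequency (3 kHz), damping ratio (0.2) and ~1.02 MHz clock are placeholders, not measurements of a real Apple II speaker. The point is that a voltage step now overshoots and rings, which the first-order RC model cannot represent.

```python
import math
from typing import List

CLOCK_HZ = 1_020_000  # assumed approximate CPU clock; one simulation step per cycle
RESONANCE_HZ = 3_000  # placeholder resonant frequency, not a measured value
DAMPING = 0.2         # placeholder damping ratio (< 1 means underdamped, i.e. ringing)


def rlc_step_response(cycles: int, voltage: float = 1.0) -> List[float]:
    """Cone position after a voltage step, using a damped second-order (RLC-like) model.

    Integrates x'' + 2*zeta*w0*x' + w0**2 * x = w0**2 * voltage with semi-implicit Euler.
    """
    w0 = 2 * math.pi * RESONANCE_HZ
    dt = 1.0 / CLOCK_HZ
    position, velocity = 0.0, 0.0
    trace: List[float] = []
    for _ in range(cycles):
        acceleration = w0 * w0 * (voltage - position) - 2 * DAMPING * w0 * velocity
        velocity += acceleration * dt  # update velocity first...
        position += velocity * dt      # ...then position, using the new velocity
        trace.append(position)
    return trace


if __name__ == "__main__":
    trace = rlc_step_response(5_000)  # about 5 ms of simulated time
    print(f"peak {max(trace):.2f}, final {trace[-1]:.3f}")
```

Each cycle now carries two state variables instead of one, so every candidate sequence the encoder evaluates costs more to simulate. Fitting ω₀ and ζ would also need measured data rather than tuning by ear.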

### In-memory playback