Fixup player timings and opcode variants for 65c02 timings since JMP (indirect) takes 6 cycles not 5!

It should be possible to also accommodate 6502 timings in a followup.

h/t to Scott Duensing who noticed that my sample audio sounded "a tad
slow", which turned out to be due to this 1-cycle difference (which
added up to almost an extra minute of playback on an 8-minute song).

Add comments and tidy up the code a bit.

Flesh out README some more.
kris 2020-08-16 23:15:30 +01:00
parent 9e0e1fcbcb
commit 4767ee51fd
6 changed files with 284 additions and 334 deletions


@ -26,43 +26,46 @@ possible. This includes looking some number of cycles into the future to antici
The resulting bytestream directs the Apple II to follow this speaker trajectory with cycle-level precision.
The actual audio playback code is small enough to fit in page 3. i.e. would have been small enough to type in from a
magazine back in the day (the megabytes of audio data would have been hard to type in though). Plus, Uthernets didn't
exist back then (although a Slinky RAM card would let you do something similar, see Future Work below).
The actual audio playback code is small enough (~150 bytes) to fit in page 3, i.e. it would have been small enough to type
in from a magazine back in the day. The megabytes of audio data would have been hard to type in though ;) Plus,
Uthernets didn't exist back then (although a Slinky RAM card would let you do something similar, see Future Work below).
# Implementation
## Player
The audio player uses [delta modulation](https://en.wikipedia.org/wiki/Delta_modulation) to produce the audio signal.
How this works is by modeling the Apple II speaker as an [RC circuit](https://en.wikipedia.org/wiki/RC_circuit). When
we tick the speaker (access $C030) it inverts the applied voltage across it, and the speaker responds by moving
asymptotically towards the new applied voltage level. With some empirical tuning of the time constant of this RC
circuit, we can precisely model how the Apple II speaker will respond to voltage changes, and use this to make the
speaker "trace out" our desired waveform. We can't do this exactly so there is some left-over quantization noise that
manifests as background static.
we access $C030 it inverts the applied voltage across the speaker, and the speaker responds by moving
asymptotically towards the new applied voltage level. Left to itself this results in an audio "tick". With some
empirical tuning of the time constant of this RC circuit, we can precisely model how the Apple II speaker will respond
to voltage changes, and use this to make the speaker "trace out" our desired waveform. We can't do this exactly --
the speaker will zig-zag around the target waveform because we can only move it in finite steps -- so there is some
left-over quantization noise that manifests as background static, though in our case this is barely noticeable.
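As a rough illustration of the RC model described above, here is a minimal sketch of how the speaker position could be simulated one cycle at a time. The helper names (`voltages_from_ticks`, `simulate_speaker`) are made up for this example and are not part of the encoder; the update rule (move 1/`step_size` of the remaining distance each cycle) and the value of 500 follow the description of the `step size` parameter in this README.

```python
import math


def voltages_from_ticks(tick_cycles, n_cycles, voltage=1.0):
    """Applied voltage per cycle, inverting on every cycle listed in tick_cycles."""
    out = []
    for cycle in range(n_cycles):
        if cycle in tick_cycles:
            voltage = -voltage  # accessing $C030 inverts the applied voltage
        out.append(voltage)
    return out


def simulate_speaker(voltages, step_size, position=0.0):
    """Speaker position after each cycle, modeled as a discrete RC response."""
    positions = []
    for v in voltages:
        # Move 1/step_size of the way towards the currently applied voltage.
        position += (v - position) / step_size
        positions.append(position)
    return positions


STEP_SIZE = 500  # assumed value, as suggested for an Apple //e

# Holding the voltage for STEP_SIZE cycles covers roughly 1 - 1/e of the gap,
# which is why step_size behaves like a time constant measured in cycles.
trace = simulate_speaker(voltages_from_ticks(set(), STEP_SIZE), STEP_SIZE)
print("fraction covered: %.3f (1 - 1/e = %.3f)" % (trace[-1], 1 - math.exp(-1)))
```

With no ticks the speaker simply relaxes towards the applied voltage; passing a set of tick cycles makes it zig-zag, which is the behaviour the encoder exploits.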
Delta modulation with an RC circuit is also called "BTC", after https://www.romanblack.com/picsound.htm who described
a number of variations on these (Apple II-like) audio circuits and Delta modulation audio encoding algorithms. See e.g.
Oliver Schmidt's [PLAY.BTC](https://github.com/oliverschmidt/Play-BTc) for an Apple II implementation that plays from
memory.
memory at 33KHz.
The big difference with our approach is that we are able to target a 1-cycle resolution, i.e. modulate the audio at
1MHz. The caveat is that we once we toggle the speaker there is a "cooldown period" of 10 cycles (9 cycles on 6502)
until we can toggle it again, though we can target any period larger than 11 (i.e. possible values are every 10, 12, 13,
14, ... cycles). Successive choices are independent.
The big difference with our approach is that we are able to target a 1MHz sampling rate, i.e. manipulate the speaker
with 1-cycle precision, by choosing how the "player opcodes" are chained together by the ethernet bytestream.
The catch is that once we have toggled the speaker we can't toggle it again until at least 10 cycles have passed (9
cycles on 6502), but we can pick any such interval >= 10 cycles (except for 11 cycles because of 65x02 opcode timing
limitations). Successive choices are independent.
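The claim that every interval >= 10 cycles except 11 is reachable can be checked with a short sketch. It assumes the opcode lengths introduced by this commit (TICK_10, TICK_12, TICK_13, TICK_14, TICK_15, TICK_17 and NOTICK_6) and that the gap between two ticks is the length of one TICK opcode plus any number of preceding NOTICK_6 opcodes; `reachable_intervals` is a name invented for the example.

```python
TICK_LENGTHS = (10, 12, 13, 14, 15, 17)  # cycle lengths of the TICK_n opcodes
NOTICK_LENGTH = 6                        # cycle length of NOTICK_6


def reachable_intervals(limit):
    """All tick-to-tick intervals <= limit built from k * NOTICK_6 + TICK_n."""
    reachable = set()
    for tick in TICK_LENGTHS:
        interval = tick
        while interval <= limit:
            reachable.add(interval)
            interval += NOTICK_LENGTH
    return reachable


missing = [n for n in range(10, 41) if n not in reachable_intervals(40)]
print(missing)  # prints [11]
```

This works because the six TICK lengths cover every residue modulo 6, so padding with NOTICK_6 fills in everything above them; 11 is the only value below the smallest member of its residue class (17).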
In other words, we are able to choose a precise sequence of clock cycles in which to toggle the speaker, but these
cannot be spaced too close together.
In other words, we are able to choose a precise sequence of clock cycles in which to toggle the speaker, but there is a
"cooldown" period and these cannot be spaced too close together.
This minimum period of 10 cycles is already short enough that it produces high-quality audio even if we only modulate
the speaker at a fixed cadence of 10 cycles (i.e. at 102.4KHz), although in practice a fixed 14-cycle period gave better
audio (10 cycles produces a quiet but audible background tone coming from some kind of harmonic). The initial version
of ][-Sound used this approach (and used the "spare" 4 cycles for a page-flipping trick to visualize the audio bitstream
while playing).
The minimum period of 10 cycles is already short enough that it produces high-quality audio even if we only modulate
the speaker at a fixed cadence of 10 cycles (i.e. at 102.4KHz instead of 1MHz), although in practice a fixed 14-cycle
period gave better quality (10 cycles produces a quiet but audible background tone coming from some kind of harmonic --
perhaps an interaction with the every-65-cycle "long cycle" of the Apple II). The initial version of ][-Sound used this
approach (and also used the "spare" 4 cycles for a page-flipping trick to visualize the audio bitstream while playing).
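For reference, the cadence figures quoted above follow directly from the clock rate. A small sketch, assuming the 1.024MHz figure that the encoder uses as its sample rate (`cadence_khz` is a name made up here):

```python
CLOCK_HZ = 1024 * 1000  # same value the encoder uses for sample_rate


def cadence_khz(period_cycles):
    """Tick rate in KHz when ticking once every period_cycles cycles."""
    return CLOCK_HZ / period_cycles / 1000


for period in (10, 14):
    print("%d cycles -> %.1f KHz" % (period, cadence_khz(period)))
# 10 cycles -> 102.4 KHz
# 14 cycles -> 73.1 KHz
```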
The player consists of some ethernet setup code and a core playback loop of "player opcodes", which are the
## Player
The player consists of some ethernet setup code and a core playback loop of "player opcodes", which are the basic
operations that are dispatched to by the bytestream.
Some other tricks used here:
@ -80,8 +83,8 @@ Some other tricks used here:
- As with my [\]\[-Vision](https://github.com/KrisKennaway/ii-vision) streaming video+audio player, we schedule a "slow
path" dispatch to occur every 2KB in the byte stream, and use this to manage the socket buffers (ACK the read 2KB and
wait until at least 2KB more is available, which is usually non-blocking). While doing this we need to maintain the
13 cycle cadence so the speaker is in a known trajectory. We can compensate for this in the audio encoder.
wait until at least 2KB more is available, which is usually non-blocking). While doing this we need to maintain a
regular tick cadence so the speaker is in a known trajectory. We can compensate for this in the audio encoder.
## Encoding
@ -98,7 +101,7 @@ choose to schedule during this cycle window. This makes the encoding exponentia
it allows us to e.g. anticipate large amplitude changes by pre-moving the speaker to better approximate them.
This also needs to take into account scheduling the "slow path" every 2048 output bytes, where the Apple II will manage
the TCP socket buffer while ticking the speaker at a constant cadence (currently chosen to be every 13 cycles). Since
the TCP socket buffer while ticking the speaker at a constant cadence (currently chosen to be every 14 cycles). Since
we know this is happening we can compensate for it, i.e. look ahead to this upcoming slow path and pre-position the
speaker so that it introduces the least error during this "dead" period when we're keeping the speaker in a net-neutral
position.
@ -115,8 +118,9 @@ where:
making during each clock cycle. A value of 500 (i.e. moving 1/500 of the distance) seems to be about right for my
Apple //e. This corresponds to a time constant of about 500us for the speaker RC circuit.
* `lookahead steps` defines how far into the future we want to look when optimizing. This is exponentially slower
since we have to evaluate all 2^N possible combinations of tick/no-tick. A value of 15-20 gives good quality.
* `lookahead steps` defines how many cycles into the future we want to look when optimizing. This is exponentially
slower since we have to evaluate all possible sequences of player opcodes that could be chosen within the lookahead
horizon. A value of 20 gives good quality.
* `output.a2s` is the output file to write to.
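To make the cost of `lookahead steps` more concrete, here is a simplified, pure-Python sketch of the idea: enumerate every opcode sequence that covers the lookahead horizon, simulate the speaker for each, and keep the first opcode of the sequence with the smallest squared error. It is not the encoder's implementation (which vectorizes this with numpy and deduplicates equivalent sequences); the three-opcode subset, the function names and the way the voltage sign is carried between opcodes are assumptions made for the example.

```python
STEP_SIZE = 500  # assumed speaker step size


def tick(length):
    """Relative voltages for a TICK opcode: inverted for its final 7 cycles."""
    return [1.0] * (length - 7) + [-1.0] * 7


# Illustrative subset of the player opcodes, keyed by name.
SCHEDULES = {
    "TICK_10": tick(10),
    "TICK_14": tick(14),
    "NOTICK_6": [1.0] * 6,
}


def sequences(horizon):
    """Yield every opcode sequence that just covers `horizon` cycles."""
    for name, sched in SCHEDULES.items():
        if len(sched) >= horizon:
            yield (name,)
        else:
            for rest in sequences(horizon - len(sched)):
                yield (name,) + rest


def squared_error(seq, position, voltage, target):
    """Total squared error of playing seq against target, cut at the horizon."""
    err = 0.0
    cycle = 0
    for name in seq:
        sched = SCHEDULES[name]
        for rel in sched:
            if cycle >= len(target):
                return err
            position += (voltage * rel - position) / STEP_SIZE
            err += (position - target[cycle]) ** 2
            cycle += 1
        voltage *= sched[-1]  # a TICK leaves the applied voltage inverted
    return err


def best_first_opcode(position, voltage, target):
    """First opcode of the lowest-error sequence over the lookahead horizon."""
    best = min(sequences(len(target)),
               key=lambda seq: squared_error(seq, position, voltage, target))
    return best[0]


# Speaker at rest with +1.0 applied; the target sits below it for 20 cycles,
# so the best move is to tick as early as possible.
print(best_first_opcode(0.0, 1.0, [-0.5] * 20))  # prints TICK_10
```

The number of sequences grows exponentially with the horizon, which is why larger `lookahead steps` values slow the encoder down so sharply.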
@ -137,7 +141,7 @@ Hard-coding the ethernet config is not especially user friendly. This should be
### 6502 support
The player relies heavily on the JMP (indirect) 6502 opcode, which has a different cycle count on the 6502 (5 cycles)
and 65c02 (6 cycles). This means the player will be about 10% faster on a 6502 (e.g. II+, Unenhanced //e), but audio
and 65c02 (6 cycles). This means the player will be about 10% **faster** on a 6502 (e.g. II+, Unenhanced //e), but audio
quality will be off until the encoder is made aware of this and able to compensate.
This might be one of the few pieces of software for which a 65c02 at the same clock speed causes a measurable
@ -152,13 +156,13 @@ optimizations are possible but rewriting in e.g. C++ should give a large perform
We can tick the speaker more frequently than 10 cycles using a couple of methods:
- chaining multiple STA $C030 together, e.g. to give a 4/.../4/4/9 cadence.
- chaining multiple STA $C030 together, e.g. to give a 4/.../4/4/10 cadence.
- by exploiting 6502 "false reads". During the course of executing a 6502 opcode, the CPU may access memory locations
multiple times (up to 4 times, during successive clock cycles). This would give additional options for (partial)
control of the speaker in the <10-cycle period regime.
It remains to be seen to what extent these approaches may effect audio quality.
- by exploiting 6502 opcodes that repeatedly access memory during execution, including "false reads". During the course
of executing a 6502 opcode, the CPU may access memory locations multiple times (up to 4 times, during successive clock
cycles). This would give additional options for (partial) control of the speaker in the <10-cycle period regime.
Early results suggest that using these exotic opcode variants (e.g. INC $C030) may give a quality boost.
### Measure speaker time constants
@ -183,5 +187,5 @@ e.g. Oliver Schmidt's [PLAY.BTC](https://github.com/oliverschmidt/Play-BTc), tho
audio data for them did not exist. It should be possible to adapt the ][-sound encoder to produce better-quality audio
for these existing players.
I think it should also be possible to improve quality at similar bitrate, through using some of the cycle-level targeting
techniques (though perhaps not at full 1-cycle resolution).
I think it should also be possible to improve in-memory playback quality at similar bitrate, through using some of the
cycle-level targeting techniques (though perhaps not at full 1-cycle resolution).


@ -1,101 +1,78 @@
#!/usr/bin/env python3
# Delta modulation audio encoder.
#
# Models the Apple II speaker as an RC circuit with given time constant
# and computes a sequence of speaker ticks at multiples of 13-cycle intervals
# to approximate the target audio waveform.
# Simulates the Apple II speaker at 1MHz (i.e. cycle-level) resolution,
# by modeling it as an RC circuit with given time constant. In order to
# reproduce a target audio waveform, we upscale it to 1MHz sample rate,
# and compute the sequence of player opcodes to best reproduce this waveform.
#
# To optimize the audio quality we look ahead some defined number of steps and
# choose a speaker trajectory that minimizes errors over this range. e.g.
# this allows us to anticipate large amplitude changes by pre-moving
# Since the player opcodes are chosen to allow ticking the speaker during any
# given clock cycle (though with some limits on the minimum time
# between ticks), this means that we are able to control the Apple II speaker
# with cycle-level precision, which results in high audio fidelity with low
# noise.
#
# To further optimize the audio quality we look ahead some defined number of
# cycles and choose a speaker trajectory that minimizes errors over this range.
# e.g. this allows us to anticipate large amplitude changes by pre-moving
# the speaker to better approximate them.
#
# This also needs to take into account scheduling the "slow path" every 2048
# output bytes, where the Apple II will manage the TCP socket buffer while
# ticking the speaker every 13 cycles. Since we know this is happening
# we can compensate for it, i.e. look ahead to this upcoming slow path and
# pre-position the speaker so that it introduces the least error during
# this "dead" period when we're keeping the speaker in a net-neutral position.
# This also needs to take into account scheduling the "slow path" opcode every
# 2048 output bytes, where the Apple II will manage the TCP socket buffer while
# ticking the speaker at a regular cadence of 14 cycles to keep it in a
# net-neutral position. When looking ahead we can also (partially)
# compensate for this "dead" period by pre-positioning.
import collections
import sys
import librosa
import numpy
from typing import List, Tuple
from eta import ETA
import opcodes
#
# # TODO: test
# @functools.lru_cache(None)
# def lookahead_patterns(
# lookahead: int, slowpath_distance: int,
# voltage: float) -> numpy.ndarray:
# initial_voltage = voltage
# patterns = set()
#
# slowpath_pre_bits = 0
# slowpath_post_bits = 0
# if slowpath_distance <= 0:
# slowpath_pre_bits = min(12 + slowpath_distance, lookahead)
# elif slowpath_distance <= lookahead:
# slowpath_post_bits = lookahead - slowpath_distance
#
# enumerate_bits = lookahead - slowpath_pre_bits - slowpath_post_bits
# assert slowpath_pre_bits + enumerate_bits + slowpath_post_bits == lookahead
#
# for i in range(2 ** enumerate_bits):
# voltage = initial_voltage
# pattern = []
# for j in range(slowpath_pre_bits):
# voltage = -voltage
# pattern.append(voltage)
#
# for j in range(enumerate_bits):
# voltage = 1.0 if ((i >> j) & 1) else -1.0
# pattern.append(voltage)
#
# for j in range(slowpath_post_bits):
# voltage = -voltage
# pattern.append(voltage)
#
# patterns.add(tuple(pattern))
#
# res = numpy.array(list(patterns), dtype=numpy.float32)
# return res
# TODO: add flags to parametrize options
def lookahead(step_size: int, initial_position: float, data: numpy.ndarray,
offset: int,
voltages: numpy.ndarray):
offset: int, voltages: numpy.ndarray):
"""Evaluate effects of multiple potential opcode sequences and pick best.
We simulate the speaker voltage trajectory resulting from applying multiple
voltage profiles, compute the resulting squared error relative to the
target waveform, and pick the best one.
We use numpy to vectorize the computation since it has better scaling
performance with more opcode choices, although also has a larger fixed
overhead.
"""
positions = numpy.empty((voltages.shape[0], voltages.shape[1] + 1),
dtype=numpy.float32)
positions[:, 0] = initial_position
target_val = data[offset:offset + voltages.shape[1]]
# total_error = numpy.zeros(shape=voltages.shape[0], dtype=numpy.float32)
scaled_voltages = voltages / step_size
for i in range(0, voltages.shape[1]):
positions[:, i + 1] = positions[:, i] + (
voltages[:, i] - positions[:, i]) / step_size
# err = numpy.power(numpy.abs(positions - target_val[i]), 2)
# total_error += err
try:
err = positions[:, 1:] - target_val
except ValueError:
print(offset, len(data), positions.shape, target_val.shape)
raise
positions[:, i + 1] = (
scaled_voltages[:, i] + positions[:, i] * (1 - 1 / step_size))
err = positions[:, 1:] - target_val
total_error = numpy.sum(numpy.power(err, 2), axis=1)
best = numpy.argmin(total_error)
return best
# TODO: share implementation with lookahead
def evolve(opcode: opcodes.Opcode, starting_position, starting_voltage,
step_size, data, starting_idx):
# Skip ahead to end of this opcode
"""Apply the effects of playing a single opcode to completion.
Returns new state.
"""
opcode_length = opcodes.cycle_length(opcode)
voltages = starting_voltage * opcodes.CYCLE_SCHEDULE[opcode]
voltages = starting_voltage * opcodes.VOLTAGE_SCHEDULE[opcode]
position = starting_position
total_err = 0.0
v = starting_voltage
@ -105,8 +82,10 @@ def evolve(opcode: opcodes.Opcode, starting_position, starting_voltage,
total_err += err ** 2
return position, v, total_err, starting_idx + opcode_length
@profile
def sample(data: numpy.ndarray, step: int, lookahead_steps: int):
def audio_bytestream(data: numpy.ndarray, step: int, lookahead_steps: int):
"""Computes optimal sequence of player opcodes to reproduce audio data."""
dlen = len(data)
data = numpy.concatenate([data, numpy.zeros(lookahead_steps)]).astype(
numpy.float32)
@ -119,7 +98,9 @@ def sample(data: numpy.ndarray, step: int, lookahead_steps: int):
eta = ETA(total=1000)
i = 0
last_updated = 0
while i < int(dlen / 100):
opcode_counts = collections.defaultdict(int)
while i < dlen:
if (i - last_updated) > int((dlen / 1000)):
eta.print_status()
last_updated = i
@ -131,8 +112,10 @@ def sample(data: numpy.ndarray, step: int, lookahead_steps: int):
opcode_idx = lookahead(step, position, data, i, voltage * voltages)
opcode = pruned_opcodes[opcode_idx].opcodes[0]
opcode_counts[opcode] += 1
yield opcode
# TODO: round position and memoize, and use in lookahead too
position, voltage, new_error, i = evolve(
opcode, position, voltage, step, data, i)
@ -140,18 +123,25 @@ def sample(data: numpy.ndarray, step: int, lookahead_steps: int):
frame_offset = (frame_offset + 1) % 2048
for _ in range(frame_offset % 2048, 2047):
yield opcodes.Opcode.NOTICK_5
yield opcodes.Opcode.NOTICK_6
yield opcodes.Opcode.EXIT
eta.done()
print("Total error %f" % total_err)
print("Opcodes used:")
for v, k in sorted(list(opcode_counts.items()), key=lambda kv: kv[1],
reverse=True):
print("%s: %d" % (v, k))
def preprocess(
filename: str, target_sample_rate: int,
normalize: float = 0.5) -> numpy.ndarray:
"""Upscale input audio to target sample rate and normalize signal."""
data, _ = librosa.load(filename, sr=target_sample_rate, mono=True)
max_value = numpy.percentile(data, 90)
max_value = numpy.percentile(data, 100)
data /= max_value
data *= normalize
@ -161,13 +151,19 @@ def preprocess(
def main(argv):
serve_file = argv[1]
step = int(argv[2])
# TODO: if we're not looking ahead beyond the longest (non-slowpath) opcode
# then this will reduce quality, e.g. a long NOTICK and TICK will
# both look the same over a too-short horizon, but have different results.
lookahead_steps = int(argv[3])
out = argv[4]
# TODO: PAL Apple ][ clock rate is slightly different
sample_rate = int(1024. * 1000)
data = preprocess(serve_file, sample_rate)
with open(out, "wb+") as f:
for opcode in sample(data, step, lookahead_steps):
for opcode in audio_bytestream(data, step, lookahead_steps):
f.write(bytes([opcode.value]))


@ -4,67 +4,71 @@ import numpy
from typing import Dict, List, Tuple, Iterable
# TODO: support 6502 cycle counts as well
class Opcode(enum.Enum):
TICK_12 = 0x00
TICK_17 = 0x08
TICK_15 = 0x09
TICK_13 = 0x0a
TICK_11 = 0x0b
TICK_9 = 0x0c
"""Audio player opcodes representing atomic units of audio playback work."""
TICK_17 = 0x00
TICK_15 = 0x01
TICK_13 = 0x02
TICK_14 = 0x0a
TICK_12 = 0x0b
TICK_10 = 0x0c
NOTICK_6 = 0x0f
NOTICK_8 = 0x12
NOTICK_11 = 0x17
NOTICK_9 = 0x18
NOTICK_7 = 0x19
NOTICK_5 = 0x1a
EXIT = 0x1d
SLOWPATH = 0x2d
EXIT = 0x12
SLOWPATH = 0x22
def make_tick_cycles(length) -> numpy.ndarray:
def make_tick_voltages(length) -> numpy.ndarray:
"""Voltage sequence for a NOP; ...; STA $C030; JMP (WDATA)."""
c = numpy.full(length, 1.0, dtype=numpy.float32)
for i in range(length - 6, length):
for i in range(length - 7, length): # TODO: 6502
c[i] = -1.0
return c
def make_notick_cycles(length) -> numpy.ndarray:
def make_notick_voltages(length) -> numpy.ndarray:
"""Voltage sequence for a NOP; ...; JMP (WDATA)."""
return numpy.full(length, 1.0, dtype=numpy.float32)
def make_slowpath_cycles() -> numpy.ndarray:
length = 12 * 13
def make_slowpath_voltages() -> numpy.ndarray:
"""Voltage sequence for slowpath TCP processing."""
length = 8 * 14 + 10 # TODO: 6502
c = numpy.full(length, 1.0, dtype=numpy.float32)
voltage_high = True
for i in range(12):
for i in range(8):
voltage_high = not voltage_high
for j in range(3 + 13 * i, min(length, 3 + 13 * (i + 1))):
for j in range(3 + 14 * i, min(length, 3 + 14 * (i + 1))):
c[j] = 1.0 if voltage_high else -1.0
return c
# XXX rename to voltages
CYCLE_SCHEDULE = {
Opcode.TICK_12: make_tick_cycles(12),
Opcode.TICK_17: make_tick_cycles(17),
Opcode.TICK_15: make_tick_cycles(15),
Opcode.TICK_13: make_tick_cycles(13),
Opcode.TICK_11: make_tick_cycles(11),
Opcode.TICK_9: make_tick_cycles(9),
Opcode.NOTICK_8: make_notick_cycles(8),
Opcode.NOTICK_11: make_notick_cycles(11),
Opcode.NOTICK_9: make_notick_cycles(9),
Opcode.NOTICK_7: make_notick_cycles(7),
Opcode.NOTICK_5: make_notick_cycles(5),
Opcode.SLOWPATH: make_slowpath_cycles()
# Sequence of applied voltage inversions that result from executing each player
# opcode, at each processor cycle. We assume the starting applied voltage is
# 1.0.
VOLTAGE_SCHEDULE = {
Opcode.TICK_17: make_tick_voltages(17),
Opcode.TICK_15: make_tick_voltages(15),
Opcode.TICK_13: make_tick_voltages(13),
Opcode.TICK_14: make_tick_voltages(14),
Opcode.TICK_12: make_tick_voltages(12),
Opcode.TICK_10: make_tick_voltages(10),
Opcode.NOTICK_6: make_notick_voltages(6),
Opcode.SLOWPATH: make_slowpath_voltages(),
} # type: Dict[Opcode, numpy.ndarray]
def cycle_length(op: Opcode) -> int:
return len(CYCLE_SCHEDULE[op])
"""Returns the 65C02 cycle length of a player opcode."""
return len(VOLTAGE_SCHEDULE[op])
class _Opcodes:
"""Container for immutable Iterable[Opcode], to improve hash performance."""
def __init__(self, opcodes: Iterable[Opcode]):
self.opcodes = tuple(opcodes)
self._hash = hash(self.opcodes)
@ -72,31 +76,48 @@ class _Opcodes:
def __hash__(self):
return self._hash
# Guarantees each Tuple[Opcode] has a unique _Opcodes representation
_OPCODES_CACHE = {}
_OPCODES_SINGLETON = {}
@functools.lru_cache(None)
def Opcodes(opcodes: Tuple[Opcode]):
return _OPCODES_CACHE.setdefault(opcodes, _Opcodes(opcodes))
"""Returns unique _Opcodes representation for Tuple[Opcode]."""
return _OPCODES_SINGLETON.setdefault(opcodes, _Opcodes(opcodes))
@functools.lru_cache(None)
def opcode_choices(frame_offset: int) -> List[Opcode]:
"""Returns sorted list of valid opcodes for given frame offset.
Sorted by decreasing cycle length, so that if two opcodes produce equally
good results, we'll pick the one with the longest cycle count to reduce the
stream bitrate.
"""
if frame_offset == 2047:
return [Opcode.SLOWPATH]
opcodes = set(CYCLE_SCHEDULE.keys()) - {Opcode.SLOWPATH}
# Prefer longer opcodes to have a more compact bytestream
# XXX if we aren't looking ahead beyond 1 opcode we should
# pick the shortest?
opcodes = set(VOLTAGE_SCHEDULE.keys()) - {Opcode.SLOWPATH}
return sorted(list(opcodes), key=cycle_length, reverse=True)
@functools.lru_cache(None)
def opcode_lookahead(
frame_offset: int,
lookahead_cycles: int) -> Tuple[_Opcodes]:
"""Computes all valid sequences of opcodes spanning lookahead_cycles."""
return tuple(Opcodes(ops) for ops in
_opcode_lookahead(frame_offset, lookahead_cycles))
@functools.lru_cache(None)
def _opcode_lookahead(
frame_offset: int,
lookahead_cycles: int) -> Tuple[Tuple[Opcode]]:
"""Recursively enumerates all valid opcode sequences."""
ch = opcode_choices(frame_offset)
ops = []
for op in ch:
@ -104,23 +125,14 @@ def _opcode_lookahead(
ops.append((op,))
else:
for res in _opcode_lookahead((frame_offset + 1) % 2048,
lookahead_cycles - cycle_length(op)):
lookahead_cycles - cycle_length(op)):
ops.append((op,) + res)
return tuple(ops) # XXX type
@functools.lru_cache(None)
def opcode_lookahead(
frame_offset: int,
lookahead_cycles: int) -> Tuple[_Opcodes]:
return tuple(Opcodes(ops) for ops in
_opcode_lookahead(frame_offset, lookahead_cycles))
_CYCLES_CACHE = {}
return tuple(ops) # TODO: fix return type
class Cycles:
"""Container for immutable Tuple[float], to improve hash performance."""
def __init__(self, cycles: Tuple[float]):
self.cycles = cycles
self._hash = hash(cycles)
@ -129,22 +141,36 @@ class Cycles:
return self._hash
# Guarantees each Tuple[float] has a unique Cycles representation
_CYCLES_SINGLETON = {}
@functools.lru_cache(None)
def cycle_lookahead(
opcodes: _Opcodes,
lookahead_cycles: int
) -> Cycles:
"""Computes the applied voltage effects of a sequence of opcodes.
i.e. produces the sequence of applied voltage changes that will result
from executing these opcodes, limited to the next lookahead_cycles.
"""
cycles = []
for op in opcodes.opcodes:
cycles.extend(CYCLE_SCHEDULE[op])
cycles.extend(VOLTAGE_SCHEDULE[op])
trunc_cycles = tuple(cycles[:lookahead_cycles])
return _CYCLES_CACHE.setdefault(trunc_cycles, Cycles(trunc_cycles))
return _CYCLES_SINGLETON.setdefault(trunc_cycles, Cycles(trunc_cycles))
@functools.lru_cache(None)
def prune_opcodes(
opcodes: Tuple[_Opcodes], lookahead_cycles: int
) -> Tuple[List[_Opcodes], numpy.ndarray]:
"""Deduplicate a tuple of opcode sequences that are equivalent.
For each opcode sequence whose effect is the same when truncated to
lookahead_cycles, retains the first such opcode sequence.
"""
seen_cycles = set()
pruned_opcodes = []
pruned_cycles = []
@ -156,11 +182,4 @@ def prune_opcodes(
pruned_opcodes.append(ops)
pruned_cycles.append(cycles.cycles)
return pruned_opcodes, numpy.array(pruned_cycles, dtype=numpy.float32)
if __name__ == "__main__":
lah = 50
ops = opcode_lookahead(0, lah)
pruned = prune_opcodes(ops, lah)
print(len(ops), len(pruned[0]))
return pruned_opcodes, numpy.array(pruned_cycles, dtype=numpy.float32)

Binary file not shown.


@ -4,19 +4,22 @@
; Created by Kris Kennaway on 27/07/2020.
; Copyright © 2020 Kris Kennaway. All rights reserved.
;
; Delta modulation audio player for streaming audio over Ethernet (often called "BTC" in the Apple II community, after
; https://www.romanblack.com/picsound.htm who described various Apple II-like audio circuits and audio encoding
; algorithms).
; Delta modulation audio player for streaming audio over Ethernet.
;
; How this works is by modeling the Apple II speaker as an RC circuit. When we tick the speaker it inverts the voltage
; across it, and the speaker responds by moving asymptotically towards the new level. With some empirical tuning of
; the time constant of this RC circuit, we can precisely model how the speaker will respond to voltage changes, and use
; this to make the speaker "trace out" our desired waveform. We can't do this precisely so there is some left-over
; quantization noise that manifests as background static.
; How this works is by modeling the Apple II speaker as an RC circuit. Delta modulation with an RC circuit is often
; called "BTC", after https://www.romanblack.com/picsound.htm.
;
; This player uses a 13-cycle period, i.e. about 78.7KHz sampling rate. We could go as low as 9 cycles for the period,
; but there is an audible 12.6KHz harmonic that I think is due to interference between the 9 cycle period and the
; every-65-cycle "long cycle" of the Apple II CPU. 13 cycles evenly divides 65 so this avoids the harmonic.
; When we tick the speaker it inverts the applied voltage across it, and the speaker responds by moving asymptotically
; towards the new level. With some empirical tuning of the time constant of this RC circuit (which seems to be about
; 500 us), we can precisely model how the speaker will respond to voltage changes, and use this to make the speaker
; "trace out" our desired waveform. We can't do this precisely -- the speaker will zig-zag around the target waveform
; because we can only move it in finite steps -- so there is some left-over quantization noise that manifests as
; background static.
;
; This player is capable of manipulating the speaker with 1-cycle precision, i.e. a 1MHz sampling rate, depending on
; how the "player opcodes" are chained together by the ethernet bytestream. The catch is that once we have toggled
; the speaker we can't toggle it again until at least 10 cycles have passed, but we can pick any interval >= 10 cycles
; (except for 11 because of 6502 opcode timing limitations).
;
; Some other tricks used here:
;
@ -27,7 +30,7 @@
; byte stream to contain the low-order byte of the target address we want to jump to next.
; - Since our 13-cycle period gives us 4 "spare" cycles over the minimal 9, that also lets us do a page-flipping trick
; to visualize the audio bitstream while playing.
; - As with my II-Vision streaming video+audio player, we schedule a "slow path" dispatch to occur every 2KB in the
; - As with my ][-Vision streaming video+audio player, we schedule a "slow path" dispatch to occur every 2KB in the
; byte stream, and use this to manage the socket buffers (ACK the read 2KB and wait until at least 2KB more is
; available, which is usually non-blocking). While doing this we need to maintain the 13 cycle cadence so the
; speaker is in a known trajectory. We can compensate for this in the audio encoder.
@ -92,6 +95,7 @@ STESTABLISHED = $17
PRODOS = $BF00 ; ProDOS MLI entry point
RESET_VECTOR = $3F2 ; Reset vector
COUT = $FDED
HOME = $FC58
TICK = $C030 ; where the magic happens
TEXTOFF = $C050
@ -152,7 +156,7 @@ reset_w5100:
STA WDATA ; SET RECEIVE BUFFER
STA WDATA ; SET TRANSMIT BUFFER
; CONFIGRE SOCKET 0 FOR TCP
; CONFIGURE SOCKET 0 FOR TCP
LDA #>S0MR
STA WADRH
@ -260,93 +264,65 @@ setup:
CPX #(end_copy_page1 - begin_copy_page1+1)
BNE @0
; pretty colours
STA TEXTOFF
STA FULLSCR
LDA #$22
LDX #$04
LDY #$08
JSR fill
LDA #$66
LDX #$08
LDY #$0c
JSR fill
; clear screen
jsr HOME
; to restore after checkrecv
LDY #>RXBASE
LDA #>S0RXRSR
STA WADRH
JMP checkrecv
fill:
STX @1+2
STY @2+1
; The actual player code, which will be copied to $3xx for execution
;
; opcode cycle counts are for 65c02, for 6502 they are 1 less because JMP (indirect) is 5 cycles instead of 6.
PHA
@0:
PLA
LDX #$00
@1:
STA $0400,X
INX
CPX #$78
BNE @1
PHA
CLC
LDA @1+1
ADC #$80
STA @1+1
LDA @1+2
ADC #$00
STA @1+2
@2:
CMP #$08
BNE @0
PLA
RTS
; The actual player code
; TODO: evaluate whether it's worth adding longer NOTICK variants. They are less commonly needed than TICK because
; we typically don't want to leave the speaker alone for a long period of time - it's unlikely that the target waveform
; exactly tracks what the speaker will do without intervention.
begin_copy_page1:
; $300
tick_12: ; ticks on cycle 7 of 12
STA zpdummy
STA $C030
JMP (WDATA)
; combinations of the following tick_even and tick_odd opcodes are enough to recover all tick intervals >= 10 cycles,
; except for 11:
;
; even tick intervals
; 10 = TICK_10
; 12 = TICK_12
; 14 = TICK_14
; 16 = NOTICK_6 + TICK_10
; 18 = NOTICK_6 + TICK_12
; 20 = NOTICK_6 + TICK_14
; 22 = NOTICK_6 + NOTICK_6 + TICK_10
; 24 = ...
;
; odd tick intervals
; 11 = ?
; 13 = TICK_13
; 15 = TICK_15
; 17 = TICK_17
; 19 = NOTICK_6 + TICK_13
; 21 = NOTICK_6 + TICK_15
; 23 = NOTICK_6 + TICK_17
; 25 = NOTICK_6 + NOTICK_6 + TICK_13
; 27 = ...
; $308
; ticks on cycle count 2n+4 out of 2n+9, minimum 4 out of 9
; 9, 11, 13, 15, 17
; only need up to tick_17 because others come from combinations
tick_n_odd:
NOP
NOP
NOP
NOP
STA $C030
JMP (WDATA)
; $300
tick_odd: ; (NOTICK_6), (TICK_10), TICK_13, TICK_15, TICK_17
NOP ; 2
NOP ; 2
STA zpdummy ; 3
STA $C030 ; 4
JMP (WDATA) ; 6
; $30a
tick_even: ; NOTICK_6, TICK_10, TICK_12, TICK_14
NOP ; 2
NOP ; 2
STA $C030 ; 4
JMP (WDATA) ; 6
; $312
notick_8:
STA zpdummy
JMP (WDATA)
; $317
; 2n+5 cycles, minimum 5
; only need 5,7,9,11
; then 13 = 8+5
notick_n_odd:
NOP
NOP
NOP
JMP (WDATA)
; $31d
; Quit to ProDOS
exit:
INC RESET_VECTOR+2 ; Invalidate power-up byte
@ -363,15 +339,22 @@ exit_parmtable:
; Manage W5100 socket buffer and ACK TCP stream.
;
; In order to simplify the buffer management we expect this ACK opcode to consume
; the last 4 bytes in a 2K "TCP frame". i.e. we can assume that we need to consume
; exactly 2K from the W5100 socket buffer.
; In order to simplify the buffer management we expect this ACK opcode to consume the last 4 bytes in a 2K "TCP frame".
; i.e. we can assume that we need to consume exactly 2K from the W5100 socket buffer.
;
; While during this we need to keep ticking the speaker every 13 cycles to maintain the same
; net position of the speaker cone. It might be possible to compensate for some other cadence in the encoder,
; but this risks introducing unwanted harmonics. We end up ticking 12 times assuming we don't stall waiting for
; the socket buffer to refill. In that case audio is already going to be disrupted though.
slowpath: ;$32d
; While doing this we need to keep ticking the speaker at a regular cadence to maintain the same net position of the
; speaker cone. We choose to tick every 14 cycles, which requires adding in minimal NOP padding.
;
; We end up ticking 8 times with 10 cycles left over, assuming we don't stall waiting for the socket buffer to refill.
;
; From the point of view of speaker voltages this slowpath is equivalent to the following opcode sequence:
; TICK_6 (TICK_14 * 7) with 4 cycles left over, adding 4 to the effective n of the next TICK_n we jump to (as chosen by
; the encoder).
;
; If we do stall waiting for data then there is no need to worry about maintaining an even cadence, because audio
; will already be disrupted (since the encoder won't have predicted it, so will be tracking wrong). The speaker will
; resynchronize within a few hundred microseconds though.
slowpath: ;$322
STA TICK ; 4
; Save the W5100 address pointer so we can come back here later
@ -381,73 +364,49 @@ slowpath: ;$32d
; Read Received Read pointer
LDA #>S0RXRD ; 2
STA zpdummy ; 3
STA TICK ; 4 [13]
STA WADRH ; 4
LDX #<S0RXRD ; 2
STA zpdummy ; 3
STA TICK ; 4 [13]
STX WADRL ; 4
NOP ; 2
STA zpdummy ; 3
STA TICK ; 4 [ 13]
STA TICK ; 4 [14]
LDX #<S0RXRD ; 2
STX WADRL ; 4
LDA WDATA ; 4 Read high byte
STA TICK ; 4 [14]
; No need to read low byte since it's guaranteed to be 0 since we're at the end of a 2K frame.
; Update new Received Read pointer
; We have received an additional 2KB
CLC ; 2
STA zpdummy ; 3
STA TICK ; 4 [13]
ADC #$08 ; 2
STX WADRL ; 4 Reset address pointer, X still has #<S0RXRD
STA zpdummy ; 3
STA TICK ; 4 [13]
NOP ; 2
STA TICK ; 4 [14]
STA WDATA ; 4 Store new high byte
; No need to store low byte since it's unchanged at 0
; Send the Receive command
LDA #<S0CR ; 2
STA zpdummy ; 3
STA TICK ; 4 [13]
STA WADRL ; 4
STA TICK ; 4 [14]
LDA #SCRECV ; 2
STA zpdummy ; 3
STA TICK ; 4 [13]
STA WDATA ; 4
checkrecv:
LDA #<S0RXRSR ; 2 Socket 0 Received Size register
STA zpdummy ; 3
LDX #$07 ; 2
STA TICK ; 4 [14]
; we might loop an unknown number of times here waiting for data but the default should be to fall
; straight through
@0:
STA TICK ; 4
STA WADRL ; 4
LDX #$07; 2 could move out of loop but need to pad cycles anyway
STA zpdummy ; 3
STA TICK ; 4 [13]
CPX WDATA ; 4 High byte of received size
BCC @1 ; 2
BCS @0 ; 3
@1:
NOP ; 2
STA TICK ; 4 [13]
STA TICK ; 4 [14]
BCS @0 ; 2 in common case when there is already sufficient data waiting.
; point W5100 back into the RX buffer where we left off
; There is data to read - we don't care exactly how much because it's at least 2K
@ -458,10 +417,10 @@ checkrecv:
; Since we're using an 8K socket, that means we don't have to do any work to manage the read pointer!
STY WADRH ; 4
LDX #$00 ; 2
STA zpdummy ; 3
STA TICK ; 4
NOP ; 2
STA TICK ; 4 [14]
STX WADRL ; 4
JMP (WDATA) ; 5
JMP (WDATA) ; 6 [10/14]
end_copy_page1:
.endproc


@ -1,28 +0,0 @@
import sys
import librosa
import numpy
import soundfile as sf
def preprocess(
filename: str, target_sample_rate: int,
normalize: float = 0.5) -> numpy.ndarray:
data, _ = librosa.load(filename, sr=target_sample_rate, mono=True)
max_value = numpy.percentile(data, 90)
data /= max_value
data *= normalize
return data
def main(argv):
serve_file = argv[1]
out = argv[2]
sample_rate = int(1024. * 1000)
sf.write(out, preprocess(serve_file, sample_rate), sample_rate)
if __name__ == "__main__":
main(sys.argv)