mirror of
https://github.com/transistorfet/moa.git
synced 2024-12-22 12:29:51 +00:00
Added part 2 of the Emulating the Genesis series
This commit is contained in:
parent
90d4f0f526
commit
3fc1f6270b
733
docs/posts/2022-01-emulating_the_sega_genesis_part2.md
Normal file
733
docs/posts/2022-01-emulating_the_sega_genesis_part2.md
Normal file
@ -0,0 +1,733 @@
|
||||
|
||||
Emulating the Sega Genesis - Part II
|
||||
====================================
|
||||
|
||||
###### *Written December 2021 by transistor_fet*
|
||||
|
||||
|
||||
A few months ago, I wrote a 68000 emulator in Rust named
|
||||
[Moa](https://jabberwocky.ca/projects/moa/). My original goal was to emulate a simple
|
||||
[computer](https://jabberwocky.ca/projects/computie/) I had previously built. After only a few
|
||||
weeks, I had that software up and running in the emulator, and my attention turned to what other
|
||||
platforms with 68000s I could try emulating. My thoughts quickly turned to the Sega Genesis and
|
||||
without thinking about it too much, I dove right in. What started as an unserious half-thought of
|
||||
"wouldn't that be cool" turned into a few months of fighting documentation, game programming hacks,
|
||||
and my sanity with some side quests along the way, all in the name of finding and squashing bugs in
|
||||
the 68k emulator I had already written.
|
||||
|
||||
This is Part II in the series. If you haven't already read [Part
|
||||
I](https://jabberwocky.ca/posts/2022-01-emulating_the_sega_genesis_part1.html), you might want to do
|
||||
so. It covers setting up the emulator, getting some game ROMs to run, and implementing the DMA and
|
||||
memory features of the VDP. Part II covers adding a graphical frontend to Moa, and then
|
||||
implementing a first attempt at generating video output. [Part
|
||||
III](https://jabberwocky.ca/posts/2022-01-emulating_the_sega_genesis_part3.html) will be about
|
||||
debugging the various problems in the VDP and CPU implementations to get a working emulator capable
|
||||
of playing games. For more detail on the 68000 and the basic design of Moa, check out [Making a
|
||||
68000 Emulator in Rust](https://jabberwocky.ca/posts/2021-11-making_an_emulator.html).
|
||||
|
||||
* [Previously](#previously)
|
||||
* [Choosing A Graphics Crate](#choosing-a-graphics-crate)
|
||||
* [Abstracting Out The Frontend](#abstracting-out-the-frontend)
|
||||
* [Updating Windows](#updating-windows)
|
||||
* [The VDP Device](#the-vdp-device)
|
||||
* [Colours And Patterns](#colours-and-patterns)
|
||||
* [Scrolls And Sprites](#scrolls-and-sprites)
|
||||
* [Next Time](#next-time)
|
||||
|
||||
|
||||
Previously
|
||||
----------
|
||||
|
||||
In less than a week, I had taken my 68000 emulator, plugged some Sega Genesis ROMs into it, written
|
||||
some boilerplate devices, and implemented the memory and DMA operations of the Genesis' video
|
||||
display processor (VDP). I could watch the log messages produced by the emulator as the ROMs were
|
||||
being interpreted instruction by instruction, reading the controller data, writing to the VDP, and
|
||||
generally doing things. (Doing what? I didn't quite know yet).
|
||||
|
||||
The CPU interface to the VDP was pretty much done, so the next thing to work on was actually
|
||||
displaying output. The data, which is already loaded into VRAM, CRAM, and VSRAM, as well as the
|
||||
internal VDP registers, needs to be turned into pictures. While the real hardware would generate a
|
||||
CRT video signal which would directly control a CRT-based television, the emulator will generate a
|
||||
single frame of video at a time, stored in a buffer, and display that buffer in a local window on
|
||||
the host computer.
|
||||
|
||||
|
||||
Choosing A Graphics Crate
|
||||
-------------------------
|
||||
|
||||
Before I could implement the display output, I needed something to draw the images onto. There are
|
||||
quite a few Rust crates available to create a GUI window and update it with 2D graphics. Most of
|
||||
these are of course intended for making games, and also include ways of getting key presses as
|
||||
input, which I'll also need. I looked at [Piston](https://www.piston.rs), which I've used before on
|
||||
other projects, [Macroquad](https://macroquad.rs/), which also supports web assembly as well as
|
||||
desktop targets, [Pixels](https://github.com/parasyte/pixels), which is intended specifically for 2D
|
||||
games, and [Minifb](https://github.com/emoon/rust_minifb), which is also specifically for 2D
|
||||
applications, but is much simpler. I also tried out
|
||||
[libretro](https://github.com/koute/libretro-backend), which is specifically made for video game
|
||||
emulation, but I found it much more restrictive than the others because of it's narrow focus.
|
||||
|
||||
Initially I had wanted to run the simulation in another thread so that I could run it the same way I
|
||||
had been for emulating Computie, but most of these libraries are set up to use a single main loop
|
||||
where everything happens in-line with the screen updating and input reading. For a normal game, the
|
||||
gameplay and any frame updating would be done just before submitting the frame buffer to be drawn to
|
||||
the screen, and then the library would block until the next frame is needed.
|
||||
|
||||
In order to run it inline, I added a function to `System` to only run the step functions for a given
|
||||
amount of simulated time before returning, which would allow the simulation to proceed until the
|
||||
next frame is updated before it updates the GUI window. Even though there is no specific
|
||||
coordination between when the frame is updated vs how much simulated time has passed, it still works
|
||||
surprisingly well. That said I'm still concerned with the fact that the emulator is not
|
||||
cycle-accurate yet, so I still wanted the option of running it in another thread. This was a major
|
||||
factor in choosing a library.
|
||||
|
||||
Piston is feature rich, and it's modular design allows it to be tailored for any given use, but it's
|
||||
a bit more than I need for this project. It includes all sorts of drawing primitives for 2D
|
||||
graphics, but what I really want is to just draw individual pixels to a buffer, or update the entire
|
||||
screen at once from a pre-drawn buffer. The size of the compiled binary using Piston was over 100MB
|
||||
too, with Pixels coming in a close second in size.
|
||||
|
||||
Pixels was also a bit more than I needed. It's based on `wgpu` which supports all sorts of GPU
|
||||
features like shading, but I don't need any of that since the Genesis will only generate raw pixel
|
||||
data, so the extra features just make it more complicated than it needs to be.
|
||||
|
||||
Macroquad is much lighter, by comparison, but because it can run in a web browser, it has some
|
||||
restrictions on how the updating can be done. It uses an `async` main function because in
|
||||
WebAssembly, the main loop cannot block like a normal loop without inhibiting other background code.
|
||||
It would be neat to run Moa in web assembly, but porting it to run in web assembly could be a lot of
|
||||
work that I don't intend to do any time soon, and since Macroquad wont easily support the threaded
|
||||
mode, I'll leave it as a possible secondary frontend for a later date.
|
||||
|
||||
Libretro has a similar restriction in that you give it a function to be called each time a frame is
|
||||
needed, and the simulation would have to be run in that time. It also really doesn't fit with my
|
||||
hope for a more general simulation platform, as it's specifically intended for game emulators, so
|
||||
this definitely isn't a good choice as my primary frontend.
|
||||
|
||||
That left Minifb, which turned out to be the best fit. It's very simple without a lot of features.
|
||||
You create a window (just one function call), a frame buffer to fill the window, and a main loop
|
||||
which just has to call a function to update the frame buffer to the screen. An optional frame
|
||||
limiter can be used, which causes the update function to block until the next frame is needed, but
|
||||
the simulation can be run on a separate thread too. Input can be read during the main loop, and the
|
||||
main loop is implemented entirely in the application-side, so it's a bit more flexible without being
|
||||
complicated for my needs. The compiled binary comes in at only 5 or 6 MB as well, which nice.
|
||||
|
||||
|
||||
Abstracting Out The Frontend
|
||||
----------------------------
|
||||
|
||||
As usual, given my obsession with flexibility and modularity, I set up the frontend so that it can
|
||||
be swapped out with another one. I need both a console-only "frontend" for Computie, and the minifb
|
||||
frontend for the Sega Genesis, so I made separate crates for each. Cargo has a "workspaces" feature
|
||||
to make it easy to organize a single repository into multiple crates, each compiled separately. The
|
||||
Cargo.toml file in the root directory of the project needs the following lines:
|
||||
|
||||
```toml
|
||||
[workspace]
|
||||
members = [".", "frontends/moa-console", "frontends/moa-minifb"]
|
||||
```
|
||||
|
||||
The first directory is the current directory, which has the common Moa code in `src/`, and
|
||||
dependencies for that crate also go in the same Cargo.toml file in the root of the project. Each of
|
||||
the frontends have their own Cargo.toml files with their own dependencies and feature flags. They
|
||||
also contain files in `frontends/<name>/src/bin/<machine>.rs` which will be compiled into separate
|
||||
binaries, one for each machine that can be run using that frontend, which makes it easier to run
|
||||
different machines. In this configuration, if using cargo to compile and run a given binary, both
|
||||
the crate and binary name must be specified or else cargo will use the configured default.
|
||||
|
||||
The Genesis machine can be run using:
|
||||
```
|
||||
cargo run -p moa-minifb --bin moa-genesis
|
||||
```
|
||||
|
||||
The `moa-genesis` binary looks something like this, with all the details hidden in `init_frontend`
|
||||
and `start`:
|
||||
|
||||
```rust
|
||||
use moa_minifb;
|
||||
use moa::machines::genesis::build_genesis;
|
||||
|
||||
fn main() {
|
||||
let mut frontend = MiniFrontend::init_frontend();
|
||||
let mut system = build_genesis(&frontend).unwrap();
|
||||
|
||||
frontend.start(system);
|
||||
}
|
||||
```
|
||||
|
||||
In order to interface between the common Moa devices and the frontend, there is a `Host` trait which
|
||||
is implemented in each frontend, and passed to the machine building function that's called from the
|
||||
frontend machine-specific binary. Through that trait, a callback can be registered by whichever
|
||||
devices need to output video data. Separate callbacks can be registered for getting data from the
|
||||
keyboard or controllers.
|
||||
|
||||
```rust
|
||||
pub trait Host {
|
||||
fn add_window(&mut self, _updater: Box<dyn WindowUpdater>) -> Result<(), Error> {
|
||||
// Default implementation if it's not defined in the 'impl Host for ...'
|
||||
Err(Error::new("This frontend doesn't support windows"))
|
||||
}
|
||||
|
||||
fn register_controller(&mut self, _device: ControllerDevice, _input: Box<dyn ControllerUpdater>) -> Result<(), Error> {
|
||||
// Default implementation if it's not defined in the 'impl Host for ...'
|
||||
Err(Error::new("This frontend doesn't support game controllers"))
|
||||
}
|
||||
}
|
||||
|
||||
pub trait WindowUpdater: Send {
|
||||
fn get_size(&mut self) -> (u32, u32);
|
||||
fn update_frame(&mut self, width: u32, height: u32, bitmap: &mut [u32]);
|
||||
}
|
||||
|
||||
pub trait ControllerUpdater: Send {
|
||||
fn update_controller(&mut self, event: ControllerEvent);
|
||||
}
|
||||
```
|
||||
|
||||
The `Host` trait is implemented by each frontend and passed to the function that builds the machine
|
||||
configuration, before the simulation is started. The machine configuration can choose to create a
|
||||
window only when needed by that machine. Not shown in the above snippet are the `Host` functions
|
||||
to create PTYs (used only by Computie), and to register a keyboard updater (used by the TRS-80 and
|
||||
Macintosh machines).
|
||||
|
||||
```rust
|
||||
pub struct MiniFrontend {
|
||||
pub updater: Option<Box<dyn WindowUpdater>>,
|
||||
}
|
||||
|
||||
impl Host for MiniFrontend {
|
||||
fn add_window(&self, updater: Box<dyn WindowUpdater>) -> Result<(), Error> {
|
||||
if self.updater.is_some() {
|
||||
return Err(Error::new("A window updater has already been registered with the frontend"));
|
||||
}
|
||||
self.updater = Some(updater);
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
impl MiniFrontend {
|
||||
pub fn init_frontend() -> MiniFrontend {
|
||||
MiniFrontend {
|
||||
updater: None,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn start(&self, mut system: System) {
|
||||
let buffer = vec![0; WIDTH * HEIGHT]
|
||||
let options = minifb::WindowOptions::default();
|
||||
let mut window = minifb::Window::new("Test", WIDTH, HEIGHT, options).unwrap();
|
||||
|
||||
// Limit to max ~60 fps update rate
|
||||
window.limit_update_rate(Some(Duration::from_micros(16600)));
|
||||
|
||||
while window.is_open() && !window.is_key_down(Key::Escape) {
|
||||
// Run the simulation for 16.6ms, the same as the frame limiter
|
||||
system.run_for(16_600_000).unwrap();
|
||||
|
||||
if let Some(updater) = self.updater {
|
||||
updater.update_frame(WIDTH as u32, HEIGHT as u32, &mut buffer);
|
||||
window.update_with_buffer(&buffer, WIDTH, HEIGHT).unwrap();
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
There's not much to it. Only one window can be created at the momemnt, and input is not yet
|
||||
supported. The threaded option is also not shown here. Before long, the code grew more
|
||||
complicated, and now includes parsing of command line arguments with the `clap` crate. To see the
|
||||
latest, check out the latest [Genesis machine-specific
|
||||
binary](https://github.com/transistorfet/moa/blob/main/frontends/moa-minifb/src/bin/moa-genesis.rs)
|
||||
and the [MiniFB host impl and main
|
||||
loop](https://github.com/transistorfet/moa/blob/main/frontends/moa-minifb/src/lib.rs)
|
||||
|
||||
|
||||
Updating Windows
|
||||
----------------
|
||||
|
||||
I now needed to implement the `WindowUpdater` trait and to do this, I made a `Frame` object which
|
||||
holds a single frame in a buffer. When the update function is called, the frame buffer will be
|
||||
copied to the minifb buffer. Pixel data can be fed to the `.blit` function using an Iterator,
|
||||
rather than using yet another intermediate buffer for individual images drawn to the frame. The
|
||||
`Arc<Mutex<Frame>>` used in the updater is necessary for the threaded version, so that the frame can
|
||||
be held both by the simulation thread and by the UI thread at the same time.
|
||||
|
||||
```rust
|
||||
// If a pixel has this value, it wont be copied to the buffer
|
||||
const MASK_COLOUR: u32 = 0xFFFFFFFF;
|
||||
|
||||
pub trait BlitableSurface {
|
||||
fn blit<B: Iterator<Item=u32>>(&mut self, pos_x: u32, pos_y: u32, bitmap: B, width: u32, height: u32);
|
||||
fn clear(&mut self, value: u32);
|
||||
}
|
||||
|
||||
#[derive(Clone)]
|
||||
pub struct Frame {
|
||||
pub width: u32,
|
||||
pub height: u32,
|
||||
pub bitmap: Vec<u32>,
|
||||
}
|
||||
|
||||
impl Frame {
|
||||
pub fn new(width: u32, height: u32) -> Self {
|
||||
Self { width, height, bitmap: vec![0; (width * height) as usize] }
|
||||
}
|
||||
}
|
||||
|
||||
impl BlitableSurface for Frame {
|
||||
fn blit<B: Iterator<Item=u32>>(&mut self, pos_x: u32, pos_y: u32, mut bitmap: B, width: u32, height: u32) {
|
||||
for y in pos_y..(pos_y + height) {
|
||||
for x in pos_x..(pos_x + width) {
|
||||
match bitmap.next().unwrap() {
|
||||
MASK_COLOUR =>
|
||||
{ },
|
||||
value if x < self.width && y < self.height =>
|
||||
{ self.bitmap[(x + (y * self.width)) as usize] = value; },
|
||||
_ =>
|
||||
{ },
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn clear(&mut self, value: u32) {
|
||||
let value = if value == MASK_COLOUR { 0 } else { value };
|
||||
for i in 0..((self.width as usize) * (self.height as usize)) {
|
||||
self.bitmap[i] = value;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub struct FrameUpdateWrapper(Arc<Mutex<Frame>>);
|
||||
|
||||
impl WindowUpdater for FrameUpdateWrapper {
|
||||
fn get_size(&mut self) -> (u32, u32) {
|
||||
let frame = self.0.lock().unwrap();
|
||||
(frame.width, frame.height)
|
||||
}
|
||||
|
||||
fn update_frame(&mut self, width: u32, _height: u32, bitmap: &mut [u32]) {
|
||||
let frame = self.0.lock().unwrap();
|
||||
for y in 0..frame.height {
|
||||
for x in 0..frame.width {
|
||||
bitmap[(x + (y * width)) as usize] =
|
||||
frame.bitmap[(x + (y * frame.width)) as usize];
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
I know I'm copying the buffer twice when updating the frame, but it's a compromise in the name of
|
||||
flexibility. A buffer held only by the frontend is passed into `WindowUpdate::update_frame()`, into
|
||||
which the frame data is copied, and then that frontend buffer is passed to minifb's
|
||||
`.update_with_buffer()` function. The trouble is that minifb uses its `.update_with_buffer()`
|
||||
function to also limit the frame rate. The frame limiter is set to 60 updates a second, and the
|
||||
delay that's necessary to achieve that limit happens before the update function returns. The delay
|
||||
is typically between 10 to 12 ms in order to achieve a total 16.6ms delay between frames.
|
||||
|
||||
If the shared frame buffer is passed to the minifb update function, the lock on the frame will be
|
||||
held until the delay expires and the function returns. If the simulation code is run between
|
||||
frames, then holding the lock isn't so much a problem because it's only after the update function
|
||||
returns that the frame will be accessed, but if the simulation is run in a separate thread, then the
|
||||
lock contention makes the frame rate excruciatingly slow. Since I wanted the option to run the
|
||||
simulation in a separate thread, and since the extra buffer copy is negligible on a typical desktop,
|
||||
I've kept it, but if I ever try to run in on slower hardware, I might see about redesigning this.
|
||||
|
||||
There is also the issue of partial frames. Currently the VDP simulation code draws an entire frame
|
||||
at once instead of more accurately drawing it line by line spread out over many calls to its step
|
||||
function. I suspect that this causes the issue with the Sonic 2 title screen where Tails appears to
|
||||
be an incorrect colour. The ROM might be trying to change something while the image is being
|
||||
updated in order to pack more colours onto the screen at once.
|
||||
|
||||
The advantage of updating all at once, however, is that the frame will always be completely drawn
|
||||
when the frame buffer lock is obtained. I had originally written some code to use two frames,
|
||||
swapping them after each draw cycle is complete, so that there's always a complete frame available
|
||||
when the screen is updated. If the simulation is running slow, then the same completed frame will
|
||||
be sent more than once. If it's too fast, then some frames wont be sent to the screen at all before
|
||||
being redrawn. This turned out to not be necessary because of the all-at-once update, but I will
|
||||
likely need to change this in the future, so the `FrameSwapper` code has been left in place in the
|
||||
actual Moa code.
|
||||
|
||||
|
||||
The VDP Device
|
||||
--------------
|
||||
|
||||
I can now add the frame buffer to the VDP device, including the call to `.add_window()` to register
|
||||
it with the frontend, and some logic to handle the horizontal and vertical interrupts.
|
||||
|
||||
```rust
|
||||
pub struct Ym7101State {
|
||||
pub regs: [22; u8], // The internal registers of the VDP
|
||||
|
||||
pub last_clock: Clock,
|
||||
pub h_clock: u32, // Used to generate the horizontal interrupt
|
||||
pub v_clock: u32, // Used to generate the vertical interrupt
|
||||
pub h_scanlines: u32,
|
||||
|
||||
... // Additional fields used by memory/DMA code
|
||||
}
|
||||
|
||||
pub struct Ym7101 {
|
||||
pub frame: Arc<Mutex<Frame>>, // The output video frame, shared with the frontend
|
||||
pub state: Ym7101State,
|
||||
}
|
||||
|
||||
impl Ym7101 {
|
||||
pub fn new<H: Host>(host: &H) -> Ym7101 {
|
||||
let frame = Frame::new(320, 224);
|
||||
|
||||
host.add_window(Box::new(frame.clone()));
|
||||
|
||||
Ym7101 {
|
||||
frame,
|
||||
state: Ym7101State::new(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Steppable for Ym7101 {
|
||||
fn step(&mut self, system: &System) -> Result<ClockElapsed, Error> {
|
||||
// Calculate the actual time since the last step occured
|
||||
let diff = (system.clock - self.state.last_clock) as u32;
|
||||
self.state.last_clock = system.clock;
|
||||
|
||||
// Reset the interrupts
|
||||
if self.state.reset_int {
|
||||
system.get_interrupt_controller().set(false, 4, 28)?;
|
||||
system.get_interrupt_controller().set(false, 6, 30)?;
|
||||
}
|
||||
|
||||
self.state.h_clock += diff;
|
||||
if self.state.h_clock > 63_500 {
|
||||
self.state.h_clock -= 63_500;
|
||||
|
||||
// Trigger the horizontal interrupt
|
||||
// The H Interrupt register has the number of lines that should be
|
||||
// drawn before the interrupt occurs
|
||||
self.state.h_scanlines = self.state.h_scanlines.wrapping_add(1);
|
||||
if self.state.hsync_int_enabled() && self.state.h_scanlines >= self.state.regs[REG_H_INTERRUPT] {
|
||||
self.state.h_scanlines = 0;
|
||||
system.get_interrupt_controller().set(true, 4, 28)?;
|
||||
self.state.reset_int = true;
|
||||
}
|
||||
}
|
||||
|
||||
self.state.v_clock += diff;
|
||||
if self.state.v_clock > 16_630_000 {
|
||||
self.state.v_clock -= 16_630_000;
|
||||
|
||||
// Trigger the vertical interrupt
|
||||
if self.state.vsync_int_enabled() {
|
||||
system.get_interrupt_controller().set(true, 6, 30)?;
|
||||
self.state.reset_int = true;
|
||||
}
|
||||
|
||||
// Lock the frame and update it
|
||||
let mut frame = self.frame.lock().unwrap();
|
||||
self.state.draw_frame(&mut frame);
|
||||
}
|
||||
|
||||
// Run the DMA transfer if configured
|
||||
self.state.step_dma(system)?;
|
||||
|
||||
Ok((1_000_000_000 / 13_423_294) * 4)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The `.step()` function will increment the `v_clock` until it has counted the number of nanoseconds
|
||||
between refreshes on an NTSC television, which is 16.63 milliseconds. At that point, it will
|
||||
trigger the vertical interrupt and update the frame all at once. There is also a horizontal
|
||||
interrupt which can be configured to trigger only after a certain number of lines have been drawn.
|
||||
|
||||
I've shown here the original way I implemented interrupts, which was only intended to be temporary.
|
||||
It definitely caused problems and was fixed not long after, but I'll go into more detail in Part III
|
||||
where I talk about the process of debugging.
|
||||
|
||||
|
||||
Colours And Patterns
|
||||
--------------------
|
||||
|
||||
Before anything can be drawn, there needs to be some colours, and all the colours drawn by the VDP
|
||||
are stored in the CRAM. The CRAM can hold up to 4 different 16-colour palettes, with each palette
|
||||
colour specified as a 9-bit RGB colour, which is actually organized as a 12-bit colour in a 16-bit
|
||||
memory location. (Yikes) The extra bits are not actually implemented in the hardware VDP to save
|
||||
space on the chip, but for the purposes of emulating, I'll just store each colour in a 16-bit word.
|
||||
This means CRAM will be 128 bytes in size. The colours are actually stored in BGR order in the
|
||||
word, so a middle red colour would be 0x008 and a middle blue colour would be 0x800. To make the
|
||||
brightest white, the colour value would be 0xEEE (since the lower bit of each RGB component is
|
||||
always 0).
|
||||
|
||||
Technically this allows up to 512 different colours to be displayed, but only 64 of them can be on
|
||||
the screen at once because of the limited palette size. A further limit is that the `0` colour of
|
||||
each palette is a special mask colour which won't be drawn, so that any pixel below it will show
|
||||
through. This is especially useful for sprites, so that they can be drawn on top of the other
|
||||
graphics without squared edges. There are also special highlight and shadow modes which shift the
|
||||
colour output either high or low so that the 9-bit colour value is spread across only half of the
|
||||
colour range, which increases the total possible colours that can be displayed. That feature has
|
||||
some complex conditions to determine when to use the shadow or highlight mode, so I just left it
|
||||
unimplemented for the time being.
|
||||
|
||||
All graphics generated by the VDP are made up of 8x8 pixel images called cells. A cell is generated
|
||||
using a pattern which contains the pixel data, in combination with a palette number. Each pixel in
|
||||
the pattern is a 4-bit number, which selects one of 16 colours from the current colour palette.
|
||||
That means each pattern is 32 bytes long, and each byte in the pattern will contain two pixels of
|
||||
data with the upper nibble being the first pixel, and the lower nibble being the second. The first
|
||||
pixel in the first byte corresponds to the upper left hand corner of the pattern, and the pixels are
|
||||
organized from left to right, top to bottom.
|
||||
|
||||
All the patterns must be in VRAM to be drawn, and they start from address 0 in VRAM, with the first
|
||||
32 bytes being pattern 0, the next 32 bytes being pattern 1, and so on. When a pattern is
|
||||
referenced, an 11-bit number is used, which is then multiplied by 32 (or shifted to the left by 5
|
||||
bits) to get the address in VRAM where the patterns starts. So a pattern can be anywhere in the
|
||||
64KB of VRAM, so long as it's aligned to a 32 byte boundary. The pattern reference also includes
|
||||
the palette number to use for drawing the pattern, and whether the pattern should be reversed in the
|
||||
horizontal and/or vertical direction. This allows the same pattern to be used more often, saving
|
||||
space in VRAM.
|
||||
|
||||
Since generating the pattern data takes some translation, I opted to use an iterator to return the
|
||||
data. Each iteration will return one of 64 pixels as a 32-bit number which can then be written
|
||||
directly to the frame buffer.
|
||||
|
||||
```rust
|
||||
pub struct PatternIterator<'a> {
|
||||
state: &'a Ym7101State, // A stored reference which is needed to access the colour values
|
||||
palette: u8, // The palette number (0-3)
|
||||
base: usize, // The base address in VRAM where the pattern starts
|
||||
h_rev: bool, // Whether to reverse it horizontally
|
||||
v_rev: bool, // Whether to reverse it vertically
|
||||
line: i8, // Current line (needed by reversing code)
|
||||
col: i8, // Current column (needed by reversing code)
|
||||
second: bool, // Whether this is the second pixel (lower nibble) in the byte
|
||||
}
|
||||
|
||||
impl<'a> PatternIterator<'a> {
|
||||
pub fn new(state: &'a Ym7101State, start: u32, palette: u8, h_rev: bool, v_rev: bool) -> Self {
|
||||
Self {
|
||||
state,
|
||||
palette,
|
||||
base: start as usize,
|
||||
h_rev,
|
||||
v_rev,
|
||||
line: 0,
|
||||
col: 0,
|
||||
second: false,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl<'a> Iterator for PatternIterator<'a> {
|
||||
type Item = u32;
|
||||
|
||||
fn next(&mut self) -> Option<Self::Item> {
|
||||
let line = (if !self.v_rev { self.line } else { 7 - self.line }) as usize;
|
||||
let column = (if !self.h_rev { self.col } else { 3 - self.col }) as usize;
|
||||
let offset = self.base + line * 4 + column;
|
||||
|
||||
let value = if (!self.h_rev && !self.second) || (self.h_rev && self.second) {
|
||||
self.state.get_palette_colour(self.palette, self.state.vram[offset] >> 4)
|
||||
} else {
|
||||
self.state.get_palette_colour(self.palette, self.state.vram[offset] & 0x0f)
|
||||
};
|
||||
|
||||
if !self.second {
|
||||
self.second = true;
|
||||
} else {
|
||||
self.second = false;
|
||||
self.col += 1;
|
||||
if self.col >= 4 {
|
||||
self.col = 0;
|
||||
self.line += 1;
|
||||
}
|
||||
}
|
||||
|
||||
Some(value)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
It's a bit messy, and I'm sure I could optimize it, but it works for now. At this point I'm still
|
||||
focused on getting something working more than cleaning up and optimizing the code. (Note: I ended
|
||||
up throwing this code away, which goes to show it's not always worth getting hung up on the quality
|
||||
of code when in early development and things are changing rapidly. I'll talk more about the change
|
||||
in Part III).
|
||||
|
||||
|
||||
Scrolls And Sprites
|
||||
-------------------
|
||||
|
||||
Now that there's a way to draw cells to the screen, how is the VDP told which ones to draw and
|
||||
where? They can either be specified in a scroll table, or they can be added as a sprite. There are
|
||||
two moveable planes called Scroll A and Scroll B, the tables for which are stored in VRAM and the
|
||||
starting address of each table is stored in their own VDP registers. Each table is an array of
|
||||
16-bit words where each word is called a pattern name and contains the pattern number, the colour
|
||||
palette to render it with, two bits to reverse the pattern in the horizontal and/or vertical
|
||||
direction, and a priority bit used to determine the draw order of the different planes. The exact
|
||||
format in memory is better shown at
|
||||
[megadrive.org](https://wiki.megadrive.org/index.php?title=VDP_Scrolls) and
|
||||
[segaretro.org](https://segaretro.org/Sega_Mega_Drive/Planes)
|
||||
|
||||
Both scrolls must be the same size, but the size can be any combination of 32, 64, or 128 cells in
|
||||
either direction. This means they are usually bigger than the size of the screen itself, which for
|
||||
the NTSC version is usually 40 x 28 cells. The portion of the scroll that's drawn on the screen can
|
||||
be controlled using the scrolling features of the VDP, which at its simplest is just two numbers per
|
||||
plane to specify the vertical and horizontal offset of the scroll relative to the upper left corner
|
||||
of the screen, but at it's most complex can have a different offset for each line of pixels which
|
||||
controls the horizontal position of each line. I left the scrolling functionality unimplemented
|
||||
until the scroll planes were working, so I'll go into more detail about it later. For the logo and
|
||||
title screens, there usually isn't a scroll offset, so displaying the upper left corner of the
|
||||
scroll at the upper left corner of the screen should still display something.
|
||||
|
||||
There is also a special fixed plane called the window, but not many games seem to need it and my
|
||||
early attempts at implementing it caused weird graphics, so I left it for later.
|
||||
|
||||
The other way to draw to the screen is using sprites. Like the scrolls, a table of sprite data is
|
||||
stored in VRAM and a special register contains the address of the start of that table. Each entry
|
||||
in the table is four 16-bit words instead of just one, with each entry corresponding to an
|
||||
independent sprite. For each sprite, there is a vertical and horizontal position (relative to the
|
||||
upper left corner of the screen minus 128 pixels in each direction), the size of the sprite in cells
|
||||
(a single sprite can be comprised of up to 4 x 4 cells), the pattern name to use for the first cell
|
||||
(in the same format as the scrolls), and a link number. The organization of a sprite entry is more
|
||||
clearly displayed [here](https://wiki.megadrive.org/index.php?title=VDP_Sprites)
|
||||
|
||||
The link number is used to determine the sprite priority. Sprite 0, which is the first entry in the
|
||||
table, is always the highest priority sprite. Its link number is the index in the sprite table for
|
||||
the next highest priority sprite, which could be anywhere in the sprite table. That sprite's link
|
||||
number would then point to the next highest priority sprite and so on until a sprite has a link
|
||||
number of 0. When multiple sprites are on top of each other, the highest priority sprite will
|
||||
appear on top.
|
||||
|
||||
<p align="center">
|
||||
<img src="images/2022-01/sonic2-scroll-breakdown.png" title="A breakdown of the screen planes in Sonic 2" />
|
||||
</p>
|
||||
|
||||
This shows the breakdown of the different planes that are combined to form the final output. From
|
||||
the top left to the bottom right is Scroll B (the background), Scroll A (the foreground), the
|
||||
Sprites (the moveable graphics), and then the final image is the three planes combined.
|
||||
|
||||
The following code is my first attempt at implementing this. (Please note: this code contains many
|
||||
issues but it wasn't until after much debugging that I figured out what they all were. I'd like to
|
||||
describe the process of debugging this code in Part III, so I'm showing the code I started with
|
||||
here. Bonus points to anyone who can figure them out without reading onward).
|
||||
|
||||
```rust
|
||||
pub fn draw_frame(&mut self, frame: &mut Frame) {
|
||||
self.draw_background(frame);
|
||||
self.draw_cell_table(frame, ((self.regs[REG_SCROLL_B_ADDR] as u16) << 13) as u32);
|
||||
self.draw_cell_table(frame, ((self.regs[REG_SCROLL_A_ADDR] as u16) << 10) as u32);
|
||||
self.draw_sprites(frame);
|
||||
}
|
||||
|
||||
fn draw_background(&mut self, frame: &mut Frame) {
|
||||
let bg_colour = self.get_palette_colour((self.regs[REG_BACKGROUND] & 0x30) >> 4, self.regs[REG_BACKGROUND] & 0x0f);
|
||||
frame.clear(bg_colour);
|
||||
}
|
||||
|
||||
fn draw_cell_table(&mut self, frame: &mut Frame, cell_table: u32) {
|
||||
let (scroll_h, scroll_v) = self.get_scroll_size();
|
||||
let (cells_h, cells_v) = self.get_screen_size();
|
||||
|
||||
for cell_y in 0..cells_v {
|
||||
for cell_x in 0..cells_h {
|
||||
let pattern_name = read_beu16(&self.vram[(cell_table + ((cell_x + (cell_y * scroll_h)) << 1) as u32) as usize..]);
|
||||
let iter = self.get_pattern_iter(pattern_name);
|
||||
frame.blit((cell_x << 3) as u32, (cell_y << 3) as u32, iter, 8, 8);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn build_link_list(&mut self, sprite_table: usize, links: &mut [usize]) -> usize {
|
||||
links[0] = 0;
|
||||
let mut i = 0;
|
||||
loop {
|
||||
let link = self.vram[sprite_table + (links[i] * 8) + 3];
|
||||
if link == 0 || link > 80 {
|
||||
break;
|
||||
}
|
||||
i += 1;
|
||||
links[i] = link as usize;
|
||||
}
|
||||
i
|
||||
}
|
||||
|
||||
fn draw_sprites(&mut self, frame: &mut Frame) {
|
||||
let sprite_table = (self.regs[REG_SPRITES_ADDR] as usize) << 9;
|
||||
let (cells_h, cells_v) = self.get_screen_size();
|
||||
let (pos_limit_h, pos_limit_v) = (if cells_h == 32 { 383 } else { 447 }, if cells_v == 28 { 351 } else { 367 });
|
||||
|
||||
let mut links = [0; 80];
|
||||
let lowest = self.build_link_list(sprite_table, &mut links);
|
||||
|
||||
for i in (0..lowest + 1).rev() {
|
||||
let sprite_data = &self.vram[(sprite_table + (links[i] * 8))..];
|
||||
|
||||
let v_pos = read_beu16(&sprite_data[0..]);
|
||||
let size = sprite_data[2];
|
||||
let pattern_name = read_beu16(&sprite_data[4..]);
|
||||
let h_pos = read_beu16(&sprite_data[6..]);
|
||||
|
||||
let (size_h, size_v) = (((size >> 2) & 0x03) as u16 + 1, (size & 0x03) as u16 + 1);
|
||||
let h_rev = (pattern_name & 0x0800) != 0;
|
||||
let v_rev = (pattern_name & 0x1000) != 0;
|
||||
|
||||
for ih in 0..size_h {
|
||||
for iv in 0..size_v {
|
||||
let (h, v) = (if !h_rev { ih } else { size_h - ih }, if !v_rev { iv } else { size_v - iv });
|
||||
let (x, y) = (h_pos + h * 8, v_pos + v * 8);
|
||||
if x > 128 && x < pos_limit_h && y > 128 && y < pos_limit_v {
|
||||
let iter = self.get_pattern_iter(pattern_name + (h * size_v) + v);
|
||||
|
||||
frame.blit(x as u32 - 128, y as u32 - 128, iter, 8, 8);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn get_pattern_iter<'a>(&'a self, pattern_name: u16) -> PatternIterator<'a> {
|
||||
let pattern_addr = (pattern_name & 0x07FF) << 5;
|
||||
let pattern_palette = ((pattern_name & 0x6000) >> 13) as u8;
|
||||
let h_rev = (pattern_name & 0x0800) != 0;
|
||||
let v_rev = (pattern_name & 0x1000) != 0;
|
||||
PatternIterator::new(&self, pattern_addr as u32, pattern_palette, h_rev, v_rev)
|
||||
}
|
||||
|
||||
fn get_scroll_size(&self) -> (u16, u16) {
|
||||
let h = scroll_size(self.regs[REG_SCROLL_SIZE] & 0x03);
|
||||
let v = scroll_size((self.regs[REG_SCROLL_SIZE] >> 4) & 0x03);
|
||||
(h, v)
|
||||
}
|
||||
|
||||
fn get_screen_size(&self) -> (u16, u16) {
|
||||
let h_cells = if (self.regs[REG_MODE_SET_4] & MODE4_BF_H_CELL_MODE) == 0 { 32 } else { 40 };
|
||||
let v_cells = if (self.regs[REG_MODE_SET_2] & MODE2_BF_V_CELL_MODE) == 0 { 28 } else { 30 };
|
||||
(h_cells, v_cells)
|
||||
}
|
||||
```
|
||||
|
||||
And after running this with the Sonic 1 ROM, I'm greeted by...
|
||||
|
||||
<p align="center">
|
||||
<img src="images/2022-01/sonic1-broken-oct-26.png" title="Sonic 1 broken SEGA screen" />
|
||||
</p>
|
||||
|
||||
Well it kinda looks like the SEGA logo but why are the colours so wrong...
|
||||
|
||||
|
||||
Next Time
|
||||
---------
|
||||
|
||||
After about two weeks or so of working on the Sega Genesis support for Moa, reading up on the
|
||||
internal workings of the console, implementing a simple swappable frontend, and implementing my
|
||||
first best guess of how it should work, I could display a very mangled image with bright magenta
|
||||
colours instead of blues. To be honest, I was a bit dejected. I was hoping to have something that
|
||||
at least looked coherent before I added my work in progress to git. Fiddling with it wasn't
|
||||
improving matters at all, so it wasn't going to be a quick fix.
|
||||
|
||||
I was going to have roll up my sleeves and grind it out. This is where the real journey began,
|
||||
tirelessly debugging until I hit a wall, taking some detours, working on other projects for a while,
|
||||
eventually returning to it, isolating the problems with the help of some test ROMs and another
|
||||
Genesis emulator as a reference, and finally getting it working. Stay tuned for the (not so)
|
||||
thrilling conclusion in [Part
|
||||
III](https://jabberwocky.ca/posts/2022-01-emulating_the_sega_genesis_part3.html) of Emulating The
|
||||
Sega Genesis.
|
||||
|
BIN
docs/posts/images/2022-01/sonic1-broken-oct-26.png
Normal file
BIN
docs/posts/images/2022-01/sonic1-broken-oct-26.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 28 KiB |
BIN
docs/posts/images/2022-01/sonic2-scroll-breakdown.png
Normal file
BIN
docs/posts/images/2022-01/sonic2-scroll-breakdown.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 24 KiB |
Loading…
Reference in New Issue
Block a user