diff --git a/zdocs/cpu/powerpc/mmu.md b/zdocs/cpu/powerpc/mmu.md new file mode 100644 index 0000000..399ac36 --- /dev/null +++ b/zdocs/cpu/powerpc/mmu.md @@ -0,0 +1,23 @@ +## Disabling BAT translation + +BAT translation can be disabled by invalidating BAT registers. This is somewhat CPU specific. +MPC601 implements its own format for BAT registers that differs from the PowerPC specification. + +MPC601-specific lower BAT registers has the "V" bit. If it's cleared, the corresponding BAT pair +is invalid and won't be used for address translation. To invalidate BATs on MPC601, it's enough +to write NULL to lower BAT registers. That's exactly what PowerMac 6100 ROM does: + ``` +li r0, 0 +mtspr ibat0l, r0 +mtspr ibat1l, r0 +mtspr ibat2l, r0 +``` + +PowerPC CPUs starting with 603 uses the BAT register format described in the PowerPC specification. +The upper BAT registers contain two bits: Vs (supervisor state valid bit) and Vp (problem/user state valid bit). +PowerPC Architecture First Edition from 1993 gives the following code: + +```BAT_entry_valid = (Vs & ~MSR_PR) | (Vp & MSR_PR)``` + +If neither Vs nor Vp is set, the corresponding BAT pair isn't valid and doesn't participate in address translation. +To invalidate BATs on non-601, it's sufficient to set the upper BAT register to 0x00000000. diff --git a/zdocs/cpu/powerpc/mmuemu.md b/zdocs/cpu/powerpc/mmuemu.md new file mode 100644 index 0000000..4e6e988 --- /dev/null +++ b/zdocs/cpu/powerpc/mmuemu.md @@ -0,0 +1,107 @@ +# PowerPC Memory Management Unit Emulation + +Emulation of a [memory management unit](https://en.wikipedia.org/wiki/Memory_management_unit) +(MMU) in a full system emulator is considered a hard task. The biggest challenge is to do it fast. + +In this article, I'm going to present a solution for a reasonably fast emulation +of the PowerPC MMU. + +This article is based on ideas presented in the paper "Optimizing Memory Emulation +in Full System Emulators" by Xin Tong and Motohiro Kawahito (IBM Research Laboratory). + +## PowerPC MMU operation + +The operation of the PowerPC MMU can be described using the following pseudocode: + +``` +VA is the virtual address of some memory to be accessed +PA is the physical address of some memory translated by the MMU +AT is access type we want to perform + +if address translation is enabled: + PA = block_address_translation(VA) + if not PA: + PA = page_address_translation(VA) +else: + PA = VA + +if access_permitted(PA, AT): + perfom_phys_access(PA) +else: + generate_mmu_exception(VA, PA, AT) +``` + +A HW MMU usually performs several operations in a parallel fashion so the final +address translation and memory access only take a few CPU cycles. +The slowest part is the page address translation because it requires accessing +system memory that is usually an order of magnitudes slower than the CPU. +To mitigate this, a PowerPC CPU includes some very fast on-chip memory used for +building various [caches](https://en.wikipedia.org/wiki/CPU_cache) like +instruction/data cache as well as +[translation lookaside buffers (TLB)](https://en.wikipedia.org/wiki/Translation_lookaside_buffer). + +## PowerPC MMU emulation + +### Issues + +An attempt to mimic the HW MMU operation in software will likely have a poor +performance. That's because modern hardware can perform several tasks in parallel. +However, software has to do almost everything serially. Thus, accessing some memory +with address translation enabled can take up to 300 host instructions! Considering +the fact that every 10th instruction is a load and every 15th instruction is a store, +it will be nearly impossible to achieve a performance comparable to that of the +original system. + +Off-loading some operations to the MMU of the host CPU for speeding up emulation +isn't feasible because Apple's computers often have hardware being accessed like an +usual memory. Thus, an emulator needs to distinguish between accesses to real memory +(ROM or RAM) from accesses to memory-mapped peripheral devices. The only way to +do that is to maintain special software descriptors for each virtual memory region +and consult them on each memory access. + +### Solution + +My solution for a reasonable fast MMU emulation employs a software TLB. It's +used for all memory accesses even when address translation is disabled. + +The first stage of the SoftTLB uses a +[direct-mapped cache](https://en.wikipedia.org/wiki/Cache_placement_policies#Direct-mapped_cache) +called **primary TLB** in my implementation. That's because this kind of cache +is the fastest one - one lookup requires up to 15 host instructions. Unfortunately, +this cache is not enough to cover all memory accesses due to a high number of +collisions, i.e. when several distinct memory pages are mapped to the same cache +location. + +That's why, the so-called **secondary TLB** was introduced. Secondary TLB is a +4-way fully associative cache. A lookup in this cache is slower than a lookup in the +primary TLB. But it's still much faster than performing a full page table walk +requiring hundreds of host instructions. + +All translations for memory-mapped I/O go into the secondary TLB because accesses +to such devices tend to be slower than real memory accesses in the real HW anyways. +Moreover, they usually bypass CPU caches (cache-inhibited accesses). But there +are exceptions from this rule, for example, video memory. + +When no translation for a virtual address was found in either cache, a full address +translation including the full page table walk is performed. This path is the +slowest one. Fortunately, the probabilty that this path will be taken seems to be +very low. + +The complete algorithm looks like that: +``` +VA is the virtual address of some memory to be accessed +PA is the physical address of some memory translated by the MMU +AT is access type we want to perform + +PA = lookup_primary_tlb(VA) +if VA in the primary TLB: + perform_memory_access(PA, ART) +else: + PA = lookup_secondary_tlb(VA) + if VA not in the secondary TLB: + PA = perform_full_address_translation(VA) + refill_secondary_tlb(VA, PA) + if is_real_memory(PA): + refill_primary_tlb(VA, PA) + perform_memory_access(PA, ART) +```