This patch helps keep the audio from breaking up on slow machines when using
SDL audio. On those machines the sound still breaks up every so often, but
not nearly as often as before. It is much easier on the ears. Notably, the
system beeps often no longer have a pause in them.
A "slow machine" here means a G4 at 1 GHz or below.
This first patch gets B2 and SS to build under Leopard and Tiger.
I tested this on a 32-bit Intel Mac running 10.5.6 like so:
B2
./autogen.sh --disable-standalone-gui --enable-vosf --enable-sdl-video --enable-sdl-audio --enable-addressing=real --without-esd --without-gtk --without-mon --without-x
SS
./autogen.sh --disable-standalone-gui --enable-vosf --enable-sdl-video --disable-sdl-audio --enable-addressing=real --without-esd --without-gtk --without-mon --without-x --enable-jit
There is also a little tweak so that you can use SDL audio in SheepShaver when building for Mac OS X.
Previously, SheepShaver would usually hang if it was unable to access the ROM
file on startup, due to a race between media_poll_func() and DarwinSysExit().
This change eliminates the race by ensuring that media_poll_func() always ends
up waiting in CFRunLoopRun(), which allows us to terminate the polling thread
in a consistent way.
This fixes the mapping of SDL mouse-button numbers to MacOS/ADB mouse-button numbers,
to correct the reversal of the middle and right buttons. Most useful in conjunction
with a multi-button mouse enabler such as TheMouse2B:
http://hyperarchive.lcs.mit.edu/HyperArchive/Archive/cfg/themouse-2b-11.hqx
... which can turn a right-click into a control-click.
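The corrected mapping can be sketched as follows. This is only an illustration, not the actual patch: sdl_to_adb_button() is a hypothetical helper, the SDL values are those of SDL_BUTTON_LEFT/MIDDLE/RIGHT, and the ADB-side numbering (0 = primary, 1 = right, 2 = middle) is an assumption about how the emulated side counts buttons.

```cpp
#include <cassert>

// SDL 1.2 button numbers (the values of SDL_BUTTON_LEFT/MIDDLE/RIGHT).
enum { kSdlButtonLeft = 1, kSdlButtonMiddle = 2, kSdlButtonRight = 3 };

// Hypothetical helper: translate an SDL button number into the button index
// passed to the emulated ADB mouse. Assumed ADB numbering: 0 = primary,
// 1 = right, 2 = middle. Returns -1 for buttons that are not forwarded
// (for example the wheel "buttons" 4 and 5).
int sdl_to_adb_button(int sdl_button) {
    switch (sdl_button) {
        case kSdlButtonLeft:   return 0;
        case kSdlButtonRight:  return 1;  // a plain "button - 1" would give 2 here
        case kSdlButtonMiddle: return 2;  // and 1 here, which is the reversal
        default:               return -1;
    }
}
```

A naive "SDL number minus one" translation is exactly what swaps the middle and right buttons, so the sketch spells out each case instead.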
The CDROM status call "WhoIsThere" (csCode 97) is now implemented. Apart from
eliminating "WARNING: Unknown CDROMStatus(97)" complaints from the console log,
this does not appear to have had any effects whatsoever.
A typo in the implementation of the CDROM status call "GetCDFeatures" has been
corrected per Technical Note DV22:
http://developer.apple.com/technotes/dv/dv_22.html
Software cursor mode is now supported, although currently the existing hardware
cursor mode is used whenever possible. (Software mode will be used if you are
running with a recent version of SDL's Quartz video driver, since a bug in SDL
1.2.11 and later prevents the hardware cursor from working properly with that
driver.)
In hardware cursor mode, the hot-spot is now determined heuristically. Formerly
it could not be determined and was always (1,1), an annoyance for many cursors
other than the arrow.
In hardware cursor mode, the cursor will now be hidden when requested by the
emulated OS (such as when you are typing in a text field).
In hardware cursor mode, some cursor image formats that the code does not handle
correctly will now be rejected, causing the emulated OS to revert temporarily to
software cursor mode. Formerly you would just end up with random garbage for a
cursor. This typically happened for grayscale or color cursors; rejecting images
with rowBytes != 2 eliminates the worst cases.
SheepShaver window a number of times (somewhere around 30 or 40 times will do
it), SheepShaver appears to lock up. This occurs because SDL posts application
activate/deactivate events to its event queue when the mouse moves in/out of the
SheepShaver window, but these events are never consumed, and as a result, the
event queue fills up. Thereafter, no new events can be posted, and user inputs
are ignored. The fix is to consume SDL_ACTIVEEVENT in handle_events().
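The failure mode can be reproduced with a toy model. The sketch below does not use SDL at all: BoundedQueue, EventType and this handle_events() are hypothetical stand-ins, and the capacity is arbitrary. It only shows why an event type that is never drained eventually blocks every other event.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

enum class EventType { Active, MouseMotion, KeyDown };

// Toy stand-in for a fixed-size event queue: post() fails once it is full.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    bool post(EventType e) {
        if (q_.size() >= capacity_) return false;  // queue full, event is lost
        q_.push_back(e);
        return true;
    }

    // Remove every queued event of the given type; returns how many were removed.
    std::size_t drain(EventType type) {
        std::size_t before = q_.size();
        q_.erase(std::remove(q_.begin(), q_.end(), type), q_.end());
        return before - q_.size();
    }

    std::size_t size() const { return q_.size(); }

private:
    std::size_t capacity_;
    std::deque<EventType> q_;
};

// Hypothetical event handler: it only pulls the event types it knows about.
// With consume_active == false, Active events pile up forever.
void handle_events(BoundedQueue& q, bool consume_active) {
    q.drain(EventType::MouseMotion);
    q.drain(EventType::KeyDown);
    if (consume_active) q.drain(EventType::Active);
}
```

With consume_active set to false, a few activate/deactivate events are enough to fill a small queue, after which post() rejects key and mouse events; draining the Active events restores normal operation.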
file I/O to the external filesystem. The application-specified ioPosMode parameter must
be masked off appropriately in extfs.cpp:fs_set_fpos(), as is done elsewhere in the file.
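A minimal sketch of the masking, assuming the classic Mac OS positioning constants (fsAtMark = 0, fsFromStart = 1, fsFromLEOF = 2, fsFromMark = 3) occupy the low two bits of ioPosMode and that applications may set unrelated flag bits above them. resolve_fpos() is a hypothetical helper, not the code in extfs.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Classic Mac OS positioning modes, stored in the low two bits of ioPosMode.
enum { fsAtMark = 0, fsFromStart = 1, fsFromLEOF = 2, fsFromMark = 3 };

// Hypothetical helper: compute the new file mark from a raw ioPosMode value.
// The upper bits may carry unrelated flags, so only the low two bits select
// the positioning mode.
int64_t resolve_fpos(uint16_t pos_mode, int64_t offset, int64_t mark, int64_t eof) {
    switch (pos_mode & 3) {
        case fsAtMark:    return mark;           // stay where we are
        case fsFromStart: return offset;         // absolute position
        case fsFromLEOF:  return eof + offset;   // relative to logical EOF
        case fsFromMark:  return mark + offset;  // relative to current mark
    }
    return -1;  // not reached: (pos_mode & 3) is always 0..3
}
```

Without the mask, a value such as 0x20 | fsFromStart would match none of the four cases, and the request would be treated as invalid even though the application asked for a perfectly ordinary seek.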
- Rename X86_SSE_CC_NE to X86_SSE_CC_NEQ (match Intel reference manual)
- Rename MOVDLX to MOVDXD (%Xmm register as Destination)
- Rename MOVDQX to MOVQXD (%Xmm register as Destination)
- Rename MOVDXL to MOVDXS (%Xmm register as Source)
- Rename MOVDXQ to MOVQXS (%Xmm register as Source)
explicitly generated from mig. The advantage of that is to provide a "fast"
path for x86_64 on Leopard too (fault address in code[1]).
By "fast", this means +33% faster than explicitly calling thread_get_state(), but
still pretty slow (40 usec/fault). This is on par with the i386 code path though.
Leopard kernel faster? This is pure marketing hype. For 32-bit applications,
Mach exception recovery is 60% slower. For 64-bit applications, this is up
to 40% faster though. In any case, MacOS X remains pretty slow wrt. Linux...
environment variable: SIGSEGV_MACH_FAULT. It can be set to "direct" to
assume the fault address comes from code[1] argument, or "slow" to use
the slow path through thread_get_state(EXCEPTION_STATE)->faultvaddr.
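Reading such a setting could look like the sketch below. Only the variable name and the two values "direct" and "slow" come from the text above; the FaultPath enum, the function names and the choice to fall back to an automatic mode for unset or unrecognized values are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical representation of the selected fault-address strategy.
enum class FaultPath { Auto, Direct, Slow };

// Map the textual setting to a strategy. Unset or unknown values fall back
// to Auto (an assumption: the real code may behave differently).
FaultPath parse_fault_path(const char* value) {
    if (value == nullptr) return FaultPath::Auto;
    if (std::strcmp(value, "direct") == 0) return FaultPath::Direct;
    if (std::strcmp(value, "slow") == 0) return FaultPath::Slow;
    return FaultPath::Auto;
}

// Read the setting from the environment.
FaultPath fault_path_from_env() {
    return parse_fault_path(std::getenv("SIGSEGV_MACH_FAULT"));
}
```

Keeping the parsing separate from getenv() makes the mapping easy to check without touching the process environment.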
in the bundle. This is faster and more accurate as this avoids emulation.
Also clean up the code to prepare for the use of lib uaccess on hpux/ia64.
XXX: this will need explicit use of uint64_t to define registers because
HP/UX supports ILP32 while all registers are 64 bits wide, so "unsigned long"
is too narrow to hold them.
complex than expected but it was fun to play with. Who designed this ISA?
I'd love to see how the decoder is implemented in HW, by all means it is
not "simplified" unless I missed some pattern...
XNU 792.21.3 (10.4.10) and XNU 1228 (10.5.0), exception handler code[1] always
contains the fault address nowadays. So make it the default fast path but keep
provisions to check that at run-time first.
This yields a nearly 4x improvement in SIGSEGV recovery but MacOS X is still
suboptimal wrt. Linux, so VOSF is still not possible with frameskip == 0.
XXX: the ppc kernel had bugs that caused DAR (put into code[1]) to be incorrectly
decoded. This would need a broader test audience or a more careful audit of the
source changes.
- set slirp client hostname
- fix slirp redirection on systems without a useful host IP address
- separate alias_addr (10.0.2.2) from our_addr (Ed Swierk)
- fix 32+ KB packets handling (Ed Swierk)
- fix UDP broadcast translation error
- solaris port (Ben Taylor)
on Tiger+ to store FInfo and FXInfo. Otherwise, plain old .finfo/ helpers are
used. "Safe" flags and fields are always synchronized to/from MacOS X.
BTW, CFString leak was fixed at the same time.
I am adding functionality to support this. For the moment, I've only
added the platform-specific conversion for MacOSX (i.e. UTF-8 -> MacRoman),
but others can be added later.
Rather, use an address override prefix (0x67) though Intel Core optimization
reference guide says to avoid LCP prefixes. In practise, impact on performance
is measurably marginal on e.g. Speedometer tests.
Not quite the way I wanted to do it but it will do for now.
(on a real Mac, the real audio hardware should be able to pull/grab the data
from our buffers - an extra thread with its own set of buffers is wasteful!)
- Don't export transfer types definitions (formerly used by older API)
- Handle ADD instructions in ix86_skip_instruction() (generated by icc 9.1)
- Use "%p" format for EIP/RIP addresses
if you have changed the depth since boot (seems to be something strange
with the parameters that I still haven't worked out). If this happens,
we now put a suggested workaround in the warning message.
This reduces the number of Screen_fault_handler() calls by 80%. i.e. VOSF
is now viable on this turtle MacOS X. Besides, since there is no buffer
comparison, idle sleep can really be effective. SheepShaver in idle mode
on my PBG4 now goes below 8% of CPU resources instead of 70-80% with
bounding boxes based video refreshes.
Caveat: if your program doesn't use standard MacOS routines that call NQD,
then you can expect slower (visual) performance. However, I do think the
new default behavior (VOSF+NQD) is the most common.
This does not improve graphics performance but helps the CPU because it reduces
the number of bytes transferred to the actual screen. I saw an improvement of up
to 26% in frameskip 4 800x600x16 but also a hit of 3% with frameskip 0.
The next step is to use NQD bounding boxes to help detect dirty areas.
So far, this is the best I can do without VOSF working (MacOS X performance
bugs -- pitifully slow Mach syscalls)
- Properly handle migration from "screenmodes" and "windowmodes" to "screen"
- Fix has_mode() logic to really test for actual mode availability. i.e.
no longer start in large screen mode if user specified a max size.
- Call user handler for KERN_INVALID_ADDRESS too (SIGBUS)
- Check for VALID_THREAD_STATE_FLAVOR in forward_exception()
- Return KERN_FAILURE if forward_exception() got an unknown behavior code
Other bugs fixed:
- CD-ROM media are polled and now can be changed without rebooting
- Buffer overflow, memory leak and extra wait in CD-ROM ejection code
them. So, if someone has BeOS and wants to give it a try, please change and
test this new code. Corner case could be a resume_thread() when emul_thread
is not suspended.
Fixlet to powerrom_cpu: call idle_resume() from TriggerInterrupt().
prefs items changes but it should now be simpler to add other ethernet
emulation means (slirp, tap-win32).
# Basilisk II driver mode
ether {guid}
becomes
ether b2ether
etherguid {guid}
# Basilisk II Router mode
routerenabled true
becomes
ether router
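The migration rules above can be sketched as a small line rewriter. This is an illustration only: migrate_ether_pref() is a hypothetical helper, and it assumes that an old-style GUID value is written with a leading '{'. Lines it does not recognize, including "routerenabled false", are passed through unchanged.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: rewrite one old-style prefs line into its new form.
// Returns one or two lines; unrecognized lines are returned unchanged.
std::vector<std::string> migrate_ether_pref(const std::string& line) {
    const std::string ether = "ether ";

    // Router mode: "routerenabled true" becomes "ether router".
    if (line == "routerenabled true") return {"ether router"};

    // Driver mode: "ether {guid}" becomes "ether b2ether" + "etherguid {guid}".
    // Assumption: a GUID value starts with '{'.
    if (line.compare(0, ether.size(), ether) == 0) {
        std::string value = line.substr(ether.size());
        if (!value.empty() && value[0] == '{')
            return {"ether b2ether", "etherguid " + value};
    }

    return {line};
}
```

Lines already in the new format, such as "ether router", do not start with a brace after the keyword and therefore survive a second pass untouched.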
as BasiliskIIGUI.app, or /Applications/BasiliskII.app if none was found.
Also make yet another arrangement for MacOS X "difference". This scenario
was not working: WarningAlert -> ErrorAlert, the latter was not performed
because the exit status was not properly filled in sip->si_status...
- Rewrote dispatch loop to accommodate GTK+1.2 for MacOS X (which likes
neither threads nor forks(!)). The latter also requires an additional patch
to the version 0.7 available on SourceForge
- Run-time detect JIT capability so that we could hopefully use the ppc GUI
on intel based Macs (check!)
STANDALONE_GUI. This is the second step towards a more interesting GUI similar
to VMware's. Communication from/to the GUI is handled by some lightweight RPC.
Note: The step should be enough to provide a tiny GTK GUI for MacOS X.
<http://lists.nongnu.org/archive/html/qemu-devel/2006-04/msg00245.html>
This improves slirp performance a lot, especially for FTP passive mode
transfers, i.e. they are now as fast as non-passive mode. I get
approx. 800 KB/sec in B2 and 500 KB/sec in SheepShaver (over a DSL line).
In native env, the max download data rate from my ISP is around 950 KB/sec.
up to 1 GB of Mac RAM in both REAL_ADDRESSING and DIRECT_ADDRESSING modes.
NetBSD 2.0 can use the Linux linker script. However, I could not verify 1G
support since my installation does not permit this.
arches. This probably already worked in the past but I have just verified
that Basilisk II works with up to 1 GB of Mac RAM in DIRECT_ADDRESSING or
REAL_ADDRESSING mode.
BTW, a quick Speedometer 4 CPU performance test showed a +15% speed increase
in real addressing mode vs. direct addressing. x86 arches don't benefit much
from that mode since they support complex address modes already (beyond plain
load/store).
TODO: check on MacOS X for Intel in order to reduce the test to darwin*:*)
addressing in REAL_ADDRESSING mode. Only support platforms with proper
linker scripts to map the whole Mac memory from address 0. Warning fix.
NOTE: when compiled with --enable-addressing=real on Linux {x86,x86_64},
you can now address up to 1.5 GB in Basilisk II.
This was only an experiment. Improvement was marginal: only +3% on AMD64
(an Athlon 64 3200+). However, it may be interesting to test it on EM64T
(e.g. newer P4s) since an older P3/800, hence in 32-bit mode, got a +15%
improvement in Speedometer 4 benchmarks.
Rationale: lahf/seto sequences avoid load/stores to the stack (push/pop)
and it was thus hoped to be faster.
Anyhow, SAHF_SETO_PROFITABLE can only be enabled manually at this time.
Edit your generated Makefile for testing, but first make sure your CPU
supports lahf in 64-bit mode (lahf_lm flag in /proc/cpuinfo).
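The check for the lahf_lm flag can also be done programmatically. The sketch below is a hypothetical helper, not part of the build system; it assumes the usual Linux /proc/cpuinfo layout in which each CPU has a "flags" line followed by a colon and a space-separated list of flag names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: return true if any "flags" line in the given
// /proc/cpuinfo text lists the requested flag as a whole word.
bool cpuinfo_has_flag(const std::string& cpuinfo_text, const std::string& flag) {
    std::istringstream lines(cpuinfo_text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.compare(0, 5, "flags") != 0) continue;  // not a flags line
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::istringstream words(line.substr(colon + 1));
        std::string word;
        while (words >> word) {
            if (word == flag) return true;  // exact match, not a substring
        }
    }
    return false;
}
```

To use it, read /proc/cpuinfo into a string (for example with std::ifstream and a std::stringstream) and call cpuinfo_has_flag(text, "lahf_lm") before enabling SAHF_SETO_PROFITABLE in the Makefile.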
- In the instruction skipper code, add a huge kludge (trampoline) to forcibly
zero out %global registers when requested. Otherwise, Solaris/SPARC turned
out to use %g1 during signal handling, and the zero we could have written
to there vanished. This assumes [%sp-8] is valid to use (ABI states data
below %sp is undefined though)
to a generic instruction handler (untranslated code). This caused problems
on MacOS X for Intel where the unaligned stack conditions turned out to be
more visible. The performance loss is really negligible and this is the right
fix now anyway.