GCC has become too smart - we need to slice the binary created to be sure the
address of the trap is within the test addresses. This is why each trap occurs
between two case labels and a new section of assembly code is set in between.
giving it to the host OS, and don't clear clipboard every time as some
apps will put many varieties of the same data in succession...
however, a better fix would be to patch the ROM ZeroScrap function in a
similar way as we patch GetScrap/PutScrap
Attached is a set of patches to port the precise timer that is currently used in the Linux and BeOS builds of SheepShaver to Mac OS X (and any other Mach-based operating systems).
Currently, the Linux build uses the clock_gettime() function to get nanosecond-precision time, and falls back on gettimeofday() if it is not present. Unfortunately, Mac OS X does not currently support clock_gettime(), and gettimeofday() has only microsecond granularity. The Mach kernel, however, has a clock_get_time() function that does very nearly the same thing as clock_gettime(). The patches to BasiliskII cause the timing functions such as timer_current_time() to use clock_get_time() instead of gettimeofday() on Mach-based systems that do not support clock_gettime().
The changes to SheepShaver involve the precise timer. The existing code for Linux uses pthreads and real-time signals to handle the timing. Mac OS X unfortunately does not seem to support real-time signals, so Mach calls are again used to suspend and resume the timer thread in order to attempt to duplicate the Linux and BeOS versions of the timer. The code is somewhat ugly right now, as I decided to leave alone the pre-existing style of the source file, which unfortunately involves #ifdefs scattered throughout the file and some duplication of code. A future patch may want to clean this up to separate out the OS-specific code and put it all together at the top of the file. However, for the time being, this seems to work.
This has not been extensively tested, because I have not been able to get my hands on a good test-case app for the classic Mac OS that would run inside the emulator and try out the timer. However, performance does seem to be better than with the pre-existing code, and nothing seems to have blown up as far as I can tell. I did find a game via a Google search - Cap'n Magneto - that is known to have problems with Basilisk/SheepShaver's legacy 60 Hz timer, and the opening fade-to-color for this game appears to run much more smoothly with the precise timer code in place.
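The time query described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patches: the function name precise_time_ns() is hypothetical, the presence of clock_gettime() is guessed from whether CLOCK_REALTIME is defined (the real build would rely on a configure check), and CALENDAR_CLOCK is an arbitrary choice of Mach clock.

```c
#include <stdint.h>
#include <time.h>
#include <sys/time.h>

#ifdef __MACH__
#include <mach/mach.h>
#include <mach/clock.h>
#endif

#define NS_PER_SEC 1000000000ull

/* Hypothetical helper: current time in nanoseconds since the clock's epoch. */
uint64_t precise_time_ns(void)
{
#if defined(CLOCK_REALTIME)
    /* POSIX path: nanosecond-precision clock_gettime(). */
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) == 0)
        return (uint64_t)ts.tv_sec * NS_PER_SEC + (uint64_t)ts.tv_nsec;
#elif defined(__MACH__)
    /* Mach path: ask the kernel clock service via clock_get_time(). */
    clock_serv_t clock_port;
    mach_timespec_t mts;
    if (host_get_clock_service(mach_host_self(), CALENDAR_CLOCK,
                               &clock_port) == KERN_SUCCESS) {
        kern_return_t kr = clock_get_time(clock_port, &mts);
        mach_port_deallocate(mach_task_self(), clock_port);
        if (kr == KERN_SUCCESS)
            return (uint64_t)mts.tv_sec * NS_PER_SEC + (uint64_t)mts.tv_nsec;
    }
#endif
    /* Last resort: gettimeofday() only has microsecond granularity. */
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * NS_PER_SEC + (uint64_t)tv.tv_usec * 1000u;
}
```

A real implementation would probably fetch the clock service port once and cache it instead of looking it up on every call.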
SheepShaver includes the C errno string in many error messages. One case is when it calls the memory allocation routines in the Basilisk II vm_alloc.cpp program.
This works when the memory allocation routine uses functions that set errno (such as mmap or malloc). For example, running SheepShaver on a Linux host produces meaningful error messages.
The problem is that when run on an OS X host, the memory allocation uses Mach routines such as vm_allocate, which do not set errno.
So when SheepShaver reported the error, it used a stale value of errno, which happened to be 17. The result was an extremely misleading error message: "Cannot map RAM: File already exists".
The fix is to change vm_alloc so that it translates Mach return codes into POSIX errno values.
It also initializes errno to 0 at the start of the memory allocation routine, so that no matter what path it takes, it won't return a stale value.
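A minimal sketch of that idea follows. The function names mach_to_errno() and finish_alloc() are hypothetical, and the particular errno value chosen for each Mach code is an assumption made for illustration; the mapping in vm_alloc.cpp may differ. On non-Mach hosts the sketch defines stand-in constants so that it still builds.

```c
#include <errno.h>

#ifdef __MACH__
#include <mach/kern_return.h>
#else
/* Stand-ins so the sketch builds off Mach; values mirror <mach/kern_return.h>. */
typedef int kern_return_t;
enum {
    KERN_SUCCESS = 0,
    KERN_INVALID_ADDRESS = 1,
    KERN_PROTECTION_FAILURE = 2,
    KERN_NO_SPACE = 3,
    KERN_INVALID_ARGUMENT = 4
};
#endif

/* Hypothetical mapping from a Mach return code to a POSIX errno value. */
int mach_to_errno(kern_return_t kr)
{
    switch (kr) {
    case KERN_SUCCESS:            return 0;
    case KERN_INVALID_ADDRESS:    return EFAULT;
    case KERN_PROTECTION_FAILURE: return EACCES;
    case KERN_NO_SPACE:           return ENOMEM;
    case KERN_INVALID_ARGUMENT:   return EINVAL;
    default:                      return EINVAL; /* anything else: generic error */
    }
}

/* Hypothetical wrapper: clear errno first so no stale value can leak out,
   then record the translated result. Returns 0 on success, -1 on failure. */
int finish_alloc(kern_return_t kr)
{
    errno = 0;
    if (kr == KERN_SUCCESS)
        return 0;
    errno = mach_to_errno(kr);
    return -1;
}
```

With something like this in place, a caller that formats its message from errno reports the actual allocation failure instead of whatever an earlier, unrelated call left behind.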
Fixes copy/paste errors in the Windows version of SheepShaver, wherein pasted
text would have a trailing null character or extra garbage after the end.
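The message above does not say how the fix works. One plausible approach, sketched here purely as an assumption, is to compute the text length from the clipboard buffer by stopping at the first null character and never reading past the buffer size. The function name clipboard_text_length() is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: number of text bytes in a clipboard buffer.
   Stops at the first NUL and never reads beyond buf_size, so neither the
   terminator nor any bytes after it are counted as text. */
size_t clipboard_text_length(const char *buf, size_t buf_size)
{
    size_t n = 0;
    while (n < buf_size && buf[n] != '\0')
        n++;
    return n;
}
```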
Fix for bug: SheepShaver compiled with VOSF off will not display
fullscreen on OS X. The VM boots, but the display is entirely black.
This was expected, I suppose, since video_refresh_dga() didn't
actually attempt to draw anything!
The patch fixes this. Notes:
* video_refresh_window() now takes an argument of type driver_base,
since nothing specific to driver_window was used
* video_refresh_dga() can now call video_refresh_window_static()
* update_display_static_bbox() now respects the destination having a
different bytes-per-row from the source
* fullscreen modes are now created for all depths
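The bytes-per-row note above can be illustrated with a small sketch. It is not the code from update_display_static_bbox(); the function name copy_rect() and its parameters are hypothetical. The point is only that source and destination row offsets must be computed with their own pitch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a rectangle of width_bytes x height, starting at
   byte column x_bytes and row y, between two framebuffers whose rows may
   have different lengths (pitches). */
void copy_rect(uint8_t *dst, size_t dst_pitch,
               const uint8_t *src, size_t src_pitch,
               size_t x_bytes, size_t y,
               size_t width_bytes, size_t height)
{
    for (size_t row = 0; row < height; row++) {
        /* Each buffer uses its own pitch to locate the start of the row. */
        memcpy(dst + (y + row) * dst_pitch + x_bytes,
               src + (y + row) * src_pitch + x_bytes,
               width_bytes);
    }
}
```

Using the source pitch for both buffers would only work when the two happen to match, which is not guaranteed for a fullscreen surface.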
Here is a patch to allow compiling of SS and B2 with an SDL Framework. You can
get this by downloading from:
http://www.libsdl.org/release/SDL-1.2.13.dmg
Here is how I tested on an intel 32-bit mac with Mac OS X 10.5.6:
SS ./autogen.sh --disable-standalone-gui --enable-vosf --enable-sdl-framework --enable-sdl-framework-prefix=/Users/mzs/Library/Frameworks --enable-sdl-video --disable-sdl-audio --enable-addressing=real --without-esd --without-gtk --without-mon --without-x
SS ./autogen.sh --disable-standalone-gui --enable-vosf --disable-sdl-framework --disable-sdl-video --disable-sdl-audio --enable-addressing=real --without-esd --without-gtk --without-mon --with-x
B2 ./autogen.sh --disable-standalone-gui --enable-vosf --enable-sdl-framework --enable-sdl-framework-prefix=/Users/mzs/Library/Frameworks --enable-sdl-video --enable-sdl-audio --enable-addressing=real --without-esd --without-gtk --without-mon --without-x --enable-jit-compiler
B2 ./autogen.sh --disable-standalone-gui --enable-vosf --disable-sdl-framework --disable-sdl-video --disable-sdl-audio --enable-addressing=real --with-esd --without-gtk --without-mon --with-x --enable-jit-compiler
(esound does not really work on mac, it needs some better coreaudio patches.)
configure.ac for SS has two little additional fixes so that the Cocoa prefs gui
does not get built if you are building for X11 and so that you can use esd, sdl,
or coreaudio for sound.
I was testing some other SS patches and I noticed that when I ran an X11
build of SS, not all the video modes I expected were in the
control strip. The reason is that Mac OS X 10.5 changed the form of the
DISPLAY environment variable, which looks like
this in Leopard:
/tmp/launch-XXXXXX/:0
The Xs are like in mktemp.
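As an illustration of coping with that form, the sketch below extracts the display number by looking for the last colon rather than assuming that the string starts with a host name or a colon. The function name display_number() is hypothetical and this is not the code from the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the display number from a DISPLAY value such
   as ":0", "host:10.0" or "/tmp/launch-XXXXXX/:0", or -1 if none is found. */
int display_number(const char *display)
{
    const char *colon = display ? strrchr(display, ':') : NULL;
    if (colon == NULL)
        return -1;

    char *end;
    long n = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || n < 0)
        return -1;              /* no digits after the colon */
    return (int)n;              /* any ".screen" suffix is ignored */
}
```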
Here is a patch that has a shell script cpr.sh to recursively copy directories while
discarding things that cause problems at least on 10.4 when making the .app bundles.
This patch helps to keep the audio from breaking up on slow machines when using
SDL audio. On those slow machines you do still get the break-up every so often
but the sound tends not to break up nearly as often. It is much better on the
ears. Notably, the system beeps often no longer have a pause in them.
A slow machine here means a G4 at 1 GHz or below.
This first patch gets B2 and SS to build under Leopard and Tiger.
I tested this on a 32-bit intel 10.5.6 mac like so:
B2
./autogen.sh --disable-standalone-gui --enable-vosf --enable-sdl-video --enable-sdl-audio --enable-addressing=real --without-esd --without-gtk --without-mon --without-x
SS
./autogen.sh --disable-standalone-gui --enable-vosf --enable-sdl-video --disable-sdl-audio --enable-addressing=real --without-esd --without-gtk --without-mon --without-x --enable-jit
There is also a little tweak so that you can use sdl audio in SheepShaver when building for Mac OS X.
Previously, SheepShaver would usually hang if it was unable to access the ROM
file on startup, due to a race between media_poll_func() and DarwinSysExit().
This change eliminates the race by ensuring that media_poll_func() always ends
up waiting in CFRunLoopRun(), which allows us to terminate the polling thread
in a consistent way.
This fixes the mapping of SDL mouse-button numbers to MacOS/ADB mouse-button numbers,
to correct the reversal of the middle and right buttons. Most useful in conjunction
with a multi-button mouse enabler such as TheMouse2B:
http://hyperarchive.lcs.mit.edu/HyperArchive/Archive/cfg/themouse-2b-11.hqx
... which can turn a right-click into a control-click.
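The corrected mapping can be sketched as below. SDL numbers its buttons 1 (left), 2 (middle), 3 (right); the ADB numbering used here (0 = left, 1 = right, 2 = middle) is an assumption about the emulator's convention, and the function name sdl_to_adb_button() and the SDL_BTN_* constants are hypothetical stand-ins so that the sketch builds without SDL.

```c
/* Stand-ins for SDL_BUTTON_LEFT / SDL_BUTTON_MIDDLE / SDL_BUTTON_RIGHT. */
enum { SDL_BTN_LEFT = 1, SDL_BTN_MIDDLE = 2, SDL_BTN_RIGHT = 3 };

/* Hypothetical helper: map an SDL button number to an ADB button number,
   or -1 for buttons that have no ADB equivalent (e.g. wheel events). */
int sdl_to_adb_button(int sdl_button)
{
    switch (sdl_button) {
    case SDL_BTN_LEFT:   return 0;
    case SDL_BTN_RIGHT:  return 1;  /* not 2: this is the swap being fixed */
    case SDL_BTN_MIDDLE: return 2;
    default:             return -1;
    }
}
```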
The CDROM status call "WhoIsThere" (csCode 97) is now implemented. Apart from
eliminating "WARNING: Unknown CDROMStatus(97)" complaints from the console log,
this does not appear to have had any effects whatsoever.
A typo in the implementation of the CDROM status call "GetCDFeatures" has been
corrected per Technical Note DV22:
http://developer.apple.com/technotes/dv/dv_22.html
Software cursor mode is now supported, although currently the existing hardware
cursor mode is used whenever possible. (Software mode will be used if you are
running with a recent version of SDL's Quartz video driver, since a bug in SDL
1.2.11 and later prevents the hardware cursor from working properly with that
driver.)
In hardware cursor mode, the hot-spot is now determined heuristically. Formerly
it could not be determined and was always (1,1), an annoyance for many cursors
other than the arrow.
In hardware cursor mode, the cursor will now be hidden when requested by the
emulated OS (such as when you are typing in a text field).
In hardware cursor mode, some cursor image formats that the code does not handle
correctly will now be rejected, causing the emulated OS to revert temporarily to
software cursor mode. Formerly you would just end up with random garbage for a
cursor. This typically happened for grayscale or color cursors; rejecting images
with rowBytes != 2 eliminates the worst cases.
SheepShaver window a number of times (somewhere around 30 or 40 times will do
it), SheepShaver appears to lock up. This occurs because SDL posts application
activate/deactivate events to its event queue when the mouse moves in/out of the
SheepShaver window, but these events are never consumed, and as a result, the
event queue fills up. Thereafter, no new events can be posted, and user inputs
are ignored. The fix is to consume SDL_ACTIVEEVENT in handle_events().
file I/O to the external filesystem. The application-specified ioPosMode parameter must
be masked off appropriately in extfs.cpp:fs_set_fpos(), as is done elsewhere in the file.
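A minimal sketch of the masking follows. The positioning constants are the classic File Manager values; the function name resolve_fpos() and its parameters are hypothetical, and the assumption is that only the low two bits of ioPosMode select the positioning mode while the higher bits carry unrelated flags.

```c
#include <stdint.h>

/* Classic Mac OS File Manager positioning modes. */
enum { fsAtMark = 0, fsFromStart = 1, fsFromLEOF = 2, fsFromMark = 3 };

/* Hypothetical helper: compute the new file position from ioPosMode and
   ioPosOffset, given the current mark and the logical end of file. */
long resolve_fpos(uint16_t io_pos_mode, long offset, long mark, long eof)
{
    switch (io_pos_mode & 3) {          /* mask off the flag bits first */
    case fsAtMark:    return mark;
    case fsFromStart: return offset;
    case fsFromLEOF:  return eof + offset;
    default:          return mark + offset;   /* fsFromMark */
    }
}
```

Without the mask, a value such as fsFromStart combined with a flag bit would match none of the cases that compare against the bare constants.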
- Rename X86_SSE_CC_NE to X86_SSE_CC_NEQ (match Intel reference manual)
- Rename MOVDLX to MOVDXD (%Xmm register as Destination)
- Rename MOVDQX to MOVQXD (%Xmm register as Destination)
- Rename MOVDXL to MOVDXS (%Xmm register as Source)
- Rename MOVDXQ to MOVQXS (%Xmm register as Source)
explicitly generated from mig. The advantage of that is to provide a "fast"
path for x86_64 on Leopard too (fault address in code[1]).
By "fast", this means +33% faster than an explicit thread_get_state() call but
still pretty slow (40 usec/fault). This is on par with the i386 code path though.
Leopard kernel faster? This is pure marketing hype. For 32-bit applications,
Mach exception recovery is 60% slower. For 64-bit applications, this is up
to 40% faster though. In any case, MacOS X remains pretty slow wrt. Linux...
environment variable: SIGSEGV_MACH_FAULT. It can be set to "direct" to
assume the fault address comes from code[1] argument, or "slow" to use
the slow path through thread_get_state(EXCEPTION_STATE)->faultvaddr.
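Reading such a setting could look like the sketch below. The enum, both function names and the treatment of an unset or unrecognized value (fall back to automatic detection) are assumptions for illustration, not the code in sigsegv.cpp.

```c
#include <stdlib.h>
#include <string.h>

typedef enum {
    MACH_FAULT_AUTO,    /* unset or unrecognized: decide at run time */
    MACH_FAULT_DIRECT,  /* trust code[1] as the fault address */
    MACH_FAULT_SLOW     /* query the thread's exception state */
} mach_fault_mode;

/* Hypothetical helper: interpret the value of SIGSEGV_MACH_FAULT. */
mach_fault_mode parse_mach_fault_mode(const char *value)
{
    if (value == NULL)
        return MACH_FAULT_AUTO;
    if (strcmp(value, "direct") == 0)
        return MACH_FAULT_DIRECT;
    if (strcmp(value, "slow") == 0)
        return MACH_FAULT_SLOW;
    return MACH_FAULT_AUTO;
}

/* Hypothetical helper: read the mode from the environment. */
mach_fault_mode mach_fault_mode_from_env(void)
{
    return parse_mach_fault_mode(getenv("SIGSEGV_MACH_FAULT"));
}
```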
in the bundle. This is faster and more accurate as this avoids emulation.
Also clean up the code to prepare for the use of lib uaccess on hpux/ia64.
XXX: this will need explicit use of uint64_t to define registers because
HP/UX is ILP32 capable and all registers are 64-bit capable so "unsigned long"
won't fit.
complex than expected but it was fun to play with. Who designed this ISA?
I'd love to see how the decoder is implemented in HW, by all means it is
not "simplified" unless I missed some pattern...
XNU 792.21.3 (10.4.10) and XNU 1228 (10.5.0), exception handler code[1] always
contains the fault address nowadays. So make it the default fast path but keep
provisions to check that at run-time first.
This yields a nearly 4x improvement in SIGSEGV recovery but MacOS X is still
suboptimal wrt. Linux, so VOSF is still not possible with frameskip == 0.
XXX: the ppc kernel had bugs that caused DAR (put into code[1]) to be incorrectly
decoded. This would need a broader test audience or more careful audit of the
source changes.
- set slirp client hostname
- fix slirp redirection on systems without a useful host IP address
- separate alias_addr (10.0.2.2) from our_addr (Ed Swierk)
- fix handling of 32+ KB packets (Ed Swierk)
- fix UDP broadcast translation error
- solaris port (Ben Taylor)
on Tiger+ to store FInfo and FXInfo. Otherwise, plain old .finfo/ helpers are
used. "Safe" flags and fields are always synchronized to/from MacOS X.
BTW, CFString leak was fixed at the same time.
I am adding functionality to support this. For the moment, I've only
added the platform-specific conversion for MacOSX (ie: UTF8 -> MacRoman),
but others can be added later.
Rather, use an address override prefix (0x67) though Intel Core optimization
reference guide says to avoid LCP prefixes. In practice, the measured impact on
performance is marginal on e.g. Speedometer tests.
Not quite the way I wanted to do it but it will do for now.
(on a real Mac, the real audio hardware should be able to pull/grab the data
from our buffers - an extra thread with its own set of buffers is wasteful!)
- Don't export transfer types definitions (formerly used by older API)
- Handle ADD instructions in ix86_skip_instruction() (generated by icc 9.1)
- Use "%p" format for EIP/RIP addresses
if you have changed the depth since boot (seems to be something strange
with the parameters that I still haven't worked out). If this happens,
we now put a suggested workaround in the warning message.
This reduces the number of Screen_fault_handler() calls by 80%. i.e. VOSF
is now viable on this turtle MacOS X. Besides, since there is no buffer
comparison, idle sleep can really be effective. SheepShaver in idle mode
on my PBG4 now goes below 8% of CPU resources instead of 70-80% with
bounding boxes based video refreshes.
Caveat: if your program doesn't use standard MacOS routines that call NQD,
then you can expect slower (visual) performance. However, I do think the
new default behavior (VOSF+NQD) is the most common.
This does not improve graphics performance but helps CPU because it reduces
the number of bytes transferred to the actual screen. I saw an improvement by up
to 26% in frameskip 4 800x600x16 but also a hit by 3% with frameskip 0.
The next step is to use NQD bounding boxes to help detect dirty areas.
So far, this is the best I can do without VOSF working (MacOS X performance
bugs -- pitifully slow Mach syscalls)
- Properly handle migration from "screenmodes" and "windowmodes" to "screen"
- Fix has_mode() logic to really test for actual mode availability. i.e.
no longer start in large screen mode if user specified a max size.
- Call user handler for KERN_INVALID_ADDRESS too (SIGBUS)
- Check for VALID_THREAD_STATE_FLAVOR in forward_exception()
- Return KERN_FAILURE if forward_exception() got an unknown behavior code
Other bugs fixed:
- CD-ROM media are polled and now can be changed without rebooting
- Buffer overflow, memory leak and extra wait in CD-ROM ejection code
them. So, if someone has BeOS and wants to give it a try, please change and
test this new code. Corner case could be a resume_thread() when emul_thread
is not suspended.
Fixlet to powerrom_cpu: call idle_resume() from TriggerInterrupt().
prefs items changes but it should now be simpler to add other ethernet
emulation means (slirp, tap-win32).
# Basilisk II driver mode
ether {guid}
becomes
ether b2ether
etherguid {guid}
# Basilisk II Router mode
routerenabled true
becomes
ether router
as BasiliskIIGUI.app, or /Applications/BasiliskII.app if none was found.
Also make yet another arrangement for MacOS X "difference". This scenario
was not working: WarningAlert -> ErrorAlert, the latter was not performed
because the exit status was not properly filled in sip->si_status...
- Rewrote dispatch loop to accommodate GTK+1.2 for MacOS X (which doesn't
like threads nor forks(!)). The latter also requires an additional patch
to the version 0.7 available on SourceForge
- Run-time detect JIT capability so that we could hopefully use the ppc GUI
on intel based Macs (check!)
STANDALONE_GUI. This is the second step towards a more interesting GUI similar
to VMware's. Communication from/to the GUI is handled by some lightweight RPC.
Note: The step should be enough to provide a tiny GTK GUI for MacOS X.
<http://lists.nongnu.org/archive/html/qemu-devel/2006-04/msg00245.html>
This does improve slirp performance a lot, especially in FTP passive mode
transfers, i.e. they are now as fast as non-passive mode. I get
approx. 800 KB/sec in B2 and 500 KB/sec in SheepShaver (over a DSL line).
In native env, the max download data rate from my ISP is around 950 KB/sec.
up to 1 GB of Mac RAM in both REAL_ADDRESSING and DIRECT_ADDRESSING modes.
NetBSD 2.0 can use the Linux linker script. However, I could not verify 1G
support since my installation does not permit this.
arches. This probably already worked in the past but I have just verified
that Basilisk II works with up to 1 GB of Mac RAM in DIRECT_ADDRESSING or
REAL_ADDRESSING mode.
BTW, a quick Speedometer 4 CPU performance test showed a +15% speed increase
in real addressing mode vs. direct addressing. x86 arches don't benefit much
from that mode since they support complex address modes already (beyond plain
load/store).
TODO: check on MacOS X for Intel so as to reduce the test to darwin*:*)
addressing in REAL_ADDRESSING mode. Only support platforms with proper
linker scripts to map the whole Mac memory from address 0. Warning fix.
NOTE: when compiled with --enable-addressing=real on Linux {x86,x86_64},
you can now address up to 1.5 GB in Basilisk II.
This was only an experiment. Improvement was marginal: only +3% on AMD64
(an Athlon 64 3200+). However, it may be interesting to test it on EM64T
(e.g. newer P4s) since an older P3/800, hence in 32-bit mode, got a +15%
improvement in Speedometer 4 benchmarks.
Rationale: lahf/seto sequences avoid load/stores to the stack (push/pop)
and it was thus hoped to be faster.
Anyhow, SAHF_SETO_PROFITABLE can only be enabled manually at this time.
Edit your generated Makefile for testing, but first make sure your CPU
supports lahf in 64-bit mode (lahf_lm flag in /proc/cpuinfo).
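The /proc/cpuinfo check mentioned above can be automated along these lines. Both function names are hypothetical and the sketch is Linux-specific. It matches whole tokens only, so that a flag name contained in a longer flag is not mistaken for a match.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: does a "flags : ..." line contain flag as a whole
   whitespace-delimited token? Returns 1 if so, 0 otherwise. */
int has_cpu_flag(const char *flags_line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags_line;
    if (len == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags_line) || p[-1] == ' ' ||
                     p[-1] == '\t' || p[-1] == ':';
        int ends = p[len] == '\0' || p[len] == ' ' ||
                   p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p += len;               /* keep searching after this partial hit */
    }
    return 0;
}

/* Hypothetical helper: scan a cpuinfo-style file for the flag. Returns 0 if
   the file cannot be opened. Lines longer than the buffer are read in
   pieces, so a token split across two pieces could be missed. */
int cpuinfo_has_flag(const char *path, const char *flag)
{
    char line[4096];
    int found = 0;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0)
            found = has_cpu_flag(line, flag);
    }
    fclose(f);
    return found;
}
```

Calling cpuinfo_has_flag("/proc/cpuinfo", "lahf_lm") before enabling SAHF_SETO_PROFITABLE would give the same answer as inspecting the file by hand.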
- In the instruction skipper code, add a huge kludge (trampoline) to forcibly
zero out %global registers when requested. Otherwise, Solaris/SPARC turned
out to use %g1 during signal handling, and the zero we could have written
to there vanished. This assumes [%sp-8] is valid to use (ABI states data
below %sp is undefined though)
to a generic instruction handler (untranslated code). This caused problems
on MacOS X for Intel where the unaligned stack conditions turned out to be
more visible. Performance loss is really negligible and this is the right
fix now anyway.