Executor Overview

What Executor is

Executor is a commercial emulator that allows PCs to run many Macintosh applications. Executor does not require Macintosh ROMs or a Macintosh System file and contains no Apple code itself. Executor was written entirely by engineers without Macintosh backgrounds who have not disassembled any of Apple's ROMs or System file.

Limitations

Because Executor was written strictly from publicly available documentation (Inside Macintosh, Tech. Notes, etc.), programs which make use of undocumented features of MacOS may fail under Executor. Furthermore, there are some portions of MacOS that we haven't implemented yet. Executor is sufficiently large that there are probably bugs in some of our code as well. We realize these are major limitations, but this paper is primarily concerned with implementation details that are interesting to our fellow programmers as opposed to feature sets and limitations which are of more concern to end users and our marketing department.

Design Goals

Our goal is for Executor to be accurate, fast and portable. Beyond that, completeness is a secondary issue.

Accuracy means that each subsystem that we implement should behave exactly according to the functional specs for the subsystem that we've derived from a combination of reading documentation, writing test cases and running programs under Executor.

Fast is harder to qualify. As programmers we like to use advanced techniques that will result in programs running under Executor as quickly as possible. Unfortunately, we have a limited number of engineer hours in a week and most engineering time is spent implementing new subsystems or finding and fixing subtle incompatibilities. We're proud of the speed that we've obtained so far, but we know that we can do better in the future.

Portability is the ability to support multiple platforms from the same source base. A platform is a combination of CPU, operating system and graphics device or windowing system. Executor currently supports Intel 80[3456]86 and compatible CPUs and Motorola m680[34]0 CPUs, runs under DOS, Linux and NEXTSTEP, and can interact with VGA, SVGA, Display PostScript and X-Windows. To get the best performance on some architectures we do use architecture-specific code, but we also write portable versions to be used where the platform-specific versions can't be. Although never supported as a product, Executor was ported to DEC's Alpha, but since ARDI has no Alpha and DEC lost interest, the Alpha port is no longer current. ROMlib, ARDI's rewrite of the MacOS OS and Toolbox routines, has also been ported (although not recently) to a wide variety of platforms, including MIPS, m88k, Clipper, IBM RT, SPARC and even VAX based systems.

Those three design goals have led us in the direction of dynamic code generation for both the 680x0 emulation and for our blitter. In both cases we use high level descriptions of what we want accomplished and then use special purpose tools at compile time to translate these high level descriptions into constructs that we can then use at run time.

High level descriptions are less error prone, allowing us to document the semantics that we wish to see in our synthetic CPU or blitter using a special purpose language that is directly suited to the task at hand, rather than a general purpose language like C or the traditional language of speed freaks -- assembler.

High level descriptions also lend themselves to portability. We have our tools generate portable constructs for the general case and, with a little more programming effort, faster architecture specific constructs for the architectures that we consider most important.

Since the conversion from high level description to useful construct takes place at compile time, there is no need to worry about the CPU cycles spent doing the mapping. This allows us to design our code by thinking: "At runtime, what would be the optimal instruction sequence to perform a specific task?" Once we know the answer to that question we can ask: "How can we represent, at a high level, the task being accomplished by that optimal set of instructions?" Then, the final question is: "Given what we want to generate and how we want to represent it, what does the compile time mapping look like?" The entire time we're pondering those three questions, we're keeping accuracy, portability and efficiency in mind.

Executor Subsystems

Synthetic CPU

Overview

Syn68k is the name of the synthetic CPU that Executor 2 uses. Syn68k is both highly portable and fast. The portable core of Syn68k, which works by dynamically compiling 680x0 code into an efficient interpreted form, was designed to run on all major CPUs. On supported architectures, Syn68k can also translate 680x0 code into native code that the host processor can run directly.

Syngen

Syngen analyzes a lisp-like file describing the bit patterns and semantics of the 680x0 instruction set and produces lookup tables and C code for the runtime system to use. The code and tables generated by syngen depend somewhat on the characteristics of the host processor; for example, on a little endian machine it is advantageous to byte swap some extracted 680x0 operands at translation time instead of at runtime.

The 680x0 description file can describe multiple ways to emulate any particular 680x0 opcode. The runtime system looks at what CC bits are live after the instruction and chooses the fastest variant it can legally use. In Figure 1, we have two CC variants of lsrw; one computes no CC bits, and the other computes all of them.
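The variant selection can be pictured with a small sketch. This is not Syn68k's actual code: the type `cc_variant`, the function `pick_variant`, the variant names and the "fastest first" ordering are all assumptions made for illustration. The only idea taken from the text is that the runtime may use any variant that computes every CC bit still live after the instruction, and prefers the cheapest such variant.

```c
#include <assert.h>
#include <stddef.h>

/* The five 680x0 condition code bits, as laid out in the CCR. */
enum { CC_C = 1, CC_V = 2, CC_Z = 4, CC_N = 8, CC_X = 16, CC_ALL = 31 };

/* One way to emulate an opcode (hypothetical structure). */
typedef struct {
    const char *name;
    unsigned cc_computed;   /* CC bits this variant keeps up to date */
} cc_variant;

/* Variants are assumed to be sorted fastest first.  A variant is legal
   when it computes every CC bit that is live after the instruction. */
static const cc_variant *pick_variant(const cc_variant *v, size_t n,
                                      unsigned live_cc)
{
    assert(v != NULL);
    for (size_t i = 0; i < n; i++)
        if ((live_cc & ~v[i].cc_computed) == 0)
            return &v[i];
    return NULL;   /* no variant computes enough CC bits */
}

/* The two lsrw variants mentioned in the text (names are made up). */
static const cc_variant lsrw_variants[] = {
    { "lsrw_no_cc",  0 },
    { "lsrw_all_cc", CC_ALL },
};
```

With no CC bits live, `pick_variant(lsrw_variants, 2, 0)` selects the variant that computes nothing; as soon as any bit is live, for example `CC_Z`, only the all-bits variant is legal.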

The 680x0 description file can also specify which 680x0 operands should be "expanded" to become implicitly known by the corresponding synthetic opcode. For example, fully expanding out "addl dx,dy" would result in 64 synthetic opcodes, one for each combination of data register operands. This results in smaller and faster synthetic opcodes at the expense of increasing the total number of synthetic opcodes. To conserve space, we only "expand out" common 680x0 opcodes. On host architectures where we can compile to native code, we don't waste space by "expanding out" common synthetic opcodes.
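To make the trade-off concrete, here is a rough sketch (not Syngen output) contrasting an unexpanded handler, which must decode its register numbers every time it runs, with one of the 64 handlers that full expansion of "addl dx,dy" would produce. The function names are hypothetical. The bit layout is the standard 680x0 encoding of a long add between data registers: 0xD080 with the destination register in bits 11-9 and the source register in bits 2-0.

```c
#include <assert.h>
#include <stdint.h>

/* Unexpanded: one handler, register numbers decoded on every execution. */
static void addl_generic(uint32_t d[8], uint16_t opcode)
{
    unsigned src = opcode & 7;          /* bits 2-0: source data register */
    unsigned dst = (opcode >> 9) & 7;   /* bits 11-9: destination register */
    d[dst] += d[src];
}

/* Expanded: the operands are implicit in the synthetic opcode, so each
   handler is a single constant-register add.  Only one of 64 is shown. */
static void addl_d3_d5(uint32_t d[8])
{
    d[5] += d[3];
}

/* Count the distinct 680x0 opcode words that full expansion must cover. */
static int count_expanded_addl(void)
{
    int n = 0;
    for (unsigned dst = 0; dst < 8; dst++)
        for (unsigned src = 0; src < 8; src++) {
            uint16_t opcode = (uint16_t)(0xD080u | (dst << 9) | src);
            assert((opcode & 0xF1F8u) == 0xD080u);  /* still a long add */
            n++;
        }
    return n;
}
```

The expanded handler does no decoding at all, which is why it is smaller and faster, but 64 such handlers are needed for this one instruction form.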

Interpreted Code

Our interpreted code consists of contiguous sequences of "synthetic opcodes" and their operands. Syngen can generate ANSI C, but when

• More than four contiguous longs are completely in the Region, and the number of longs equals 0 mod 4.

• More than four contiguous longs are completely in the Region, and the number of longs equals 1 mod 4.

• More than four contiguous longs are completely in the Region, and the number of longs equals 2 mod 4.

• More than four contiguous longs are completely in the Region, and the number of longs equals 3 mod 4.

The particular case encountered determines which function pointer to load from a lookup table corresponding to the current drawing mode. For example, the "patCopy" drawing mode has one table of function pointers, "patXor" another. There are also some special case tables for drawing patterns that are either all zero bits or all one bits.

The main blitter doesn't care what drawing mode is being used, since it does all mode-specific work through the supplied function pointer table.

Each function pointer points to a function that generates 80x86 code for the appropriate case. For example, one function generates code for a "patCopy" to three contiguous longs, one generates code for "patXor" only to certain specified bits within one long, etc.

The blitter compilation engine marches through the Region scanline from left to right, calling code generation functions as it goes. The generated code is accrued into a 32-byte aligned buffer on the stack. In this way, the blitter constructs a subroutine to draw the Region.

The compilation engine isn't very complicated. The tricky part is the numerous generation subroutines, which need to be fast since they are called so often and need to be easy to write since there are so many of them. For each drawing mode there's one for each case the compilation engine cares about. For pattern drawing modes, there are separate specialized subroutines for patterns that can be entirely expressed in one 32-bit value ("short/narrow" patterns) and for patterns that can be expressed as one 32-bit value per row but that vary from row to row ("tall/narrow"), as well as "wide" variants of both. Beyond that, there are some versions specialized for 80486 and higher processors (which have the "bswap" instruction).

Generating fast and robust code generators is where the Perl meta-assembler comes into play.

The meta-assembler takes as input an assembly language template, and generates as output Pentium-scheduled assembly code that outputs an 80x86 binary for the input template. This process only takes place when Executor is compiled. Got it? This can be a little confusing, so a few examples are in order.

Here is perhaps the simplest template:

@meta copy_short_narrow_1

movl %eax,@param_offset@(%edi)

@endmeta

This template describes what should be done when the blitter wants to write one long to memory. The meta-assembler processes that into this 80x86 assembly code which is to be called by the blitter compilation engine:

.align 4,0x90

_xdblt_copy_short_narrow_1:

movw $0x8789,(%edi)

movl %eax,2(%edi)

addl $6,%edi

ret

The subroutine that the meta-assembler has produced above, when executed, will generate the movl instruction from the template followed by its argument. The meta-assembler has deduced that the "movl" in the example template assembles to the two bytes 0x89 0x87 (opcode and addressing-mode byte), which the generated code stores as the little-endian 16-bit constant 0x8789.

Let's take a look at a more complicated template. This template handles the case where we want to bitwise OR a pattern to the destination bitmap, and the number of longs to transfer equals zero mod 4 (e.g. if the blitter wants to OR 36 longs to memory):

@meta or_short_narrow_many_mod_0

addl $@param_offset@,%edi

movl $@param_l_cnt_div_4@,%ecx

1: orl %eax,(%edi)

orl %eax,4(%edi)

orl %eax,8(%edi)

orl %eax,12(%edi)

addl $16,%edi

decl %ecx

jnz 1b

@lit leal (%eax,%edx,4),%ecx

@lit addl %ecx,edi_offset

@endmeta

The meta-assembler compiles that to this:

.align 4,0x90

_xdblt_or_short_narrow_many_mod_0:

movw $0xC781,(%edi)

movl %eax,2(%edi)

movl $0x47090709,11(%edi)

movb $0xB9,6(%edi)

movl $0x8470904,15(%edi)

movl $0x754910C7,23(%edi)

movl $0x830C4709,19(%edi)

movb $0xEF,27(%edi)

movl %edx,%ecx

shrl $2,%ecx

movl %ecx,7(%edi)

addl $28,%edi

leal (%eax,%edx,4),%ecx

addl %ecx,edi_offset

ret

This mechanically generated subroutine generates the executable 80x86 binary for the "or_short_narrow_many_mod_0" template. It gets called by the blitter compilation engine when it needs code to OR a bunch of longs to memory.

The output of the meta-assembler isn't meant for human consumption. As such, the output contains a hodge-podge of magic numbers (0x47090709, 0xB9, 0x8470904, etc.). These numbers are fixed machine code values corresponding to opcodes, constant operands, and other values.

Even though this subroutine is longer than the previous example, it still doesn't take very long to execute. Furthermore, it only gets called when the blitter has determined that many longs are to be ORed to memory, so the time taken actually blitting to memory will typically dwarf the time taken to execute these 15 code generation instructions.

The meta-assembler is a Perl script that works by running numerous syntactically modified versions of the assembly template through "gas", the GNU assembler, and examining the output bytes to discover which bits are fixed opcode bits and which bits correspond to operands. Once it has figured out what goes where, it generates 80x86 assembly code which writes out the constant bytes and computes and writes out the operand bytes. That code is run through a simple Pentium instruction scheduler and the meta-assembler is done. This entire process is, of course, done only once, when Executor is compiled.
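The discovery step can be pictured with a small sketch. It is not the Perl script itself: the two byte sequences that an 80x86 assembler produces for two different values of @param_offset@ are simply hard-coded here (the values 0x12345678 and 0xEDCBA987 are arbitrary choices that differ in every bit), and the hypothetical function `find_operand_bytes` marks every byte that differs as operand data. The real meta-assembler works at the bit level and, as described above, tries numerous variants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "movl %eax,0x12345678(%edi)" and "movl %eax,0xEDCBA987(%edi)" as
   encoded for the 80x86; hard-coded here instead of running gas. */
static const uint8_t variant_a[6] = { 0x89, 0x87, 0x78, 0x56, 0x34, 0x12 };
static const uint8_t variant_b[6] = { 0x89, 0x87, 0x87, 0xA9, 0xCB, 0xED };

/* Set is_operand[i] to 1 where the two encodings differ (operand bytes)
   and to 0 where they agree (fixed opcode bytes).  Returns the number
   of operand bytes found. */
static size_t find_operand_bytes(const uint8_t *a, const uint8_t *b,
                                 size_t len, uint8_t *is_operand)
{
    size_t count = 0;
    assert(a != NULL && b != NULL && is_operand != NULL);
    for (size_t i = 0; i < len; i++) {
        is_operand[i] = (uint8_t)((a[i] ^ b[i]) != 0);
        count += is_operand[i];
    }
    return count;
}
```

For this template the first two bytes come out as fixed and the last four as operand bytes, which matches the generated code shown earlier: a constant 16-bit store followed by a 32-bit store of the operand.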

A Portable Dynamically Recompiling Blitter

Although the meta-assembler-based blitter works only on 80x86 processors, Executor itself can run on non-Intel processors. On other CPUs (such as the 68040 used in the NeXTstation) Executor's blitter works somewhat differently.

The basic idea is still the same: translate Region scanlines into an efficient form once and then use that efficient form each time the scanline gets drawn. This time, however, the "efficient form" is processor independent, and the blitter is written entirely in C.

As is the case with the 80x86-specific blitter, the portable blitter compilation engine examines scanline start/stop pairs and identifies which of several cases is appropriate. One case is "output three longs", another is "output only certain pixels within the current long", and so on.

Like the 80x86-specific blitter, the particular case encountered determines which entry in a lookup table will be used. But there the similarity ends. The lookup tables contain pointers to C code labels[1] rather than to routines that generate 80x86 code on the fly.

Each scanline gets translated into an array of opcodes for the "blitter opcode interpreter" (which will be described below). Each opcode is stored in one of these C structs:

struct

{

/* Pointer to C code to handle this opcode. */

const void *label;

/* Offset into scanline */

int32 offset;

/* Extra operand with different uses. */

int32 arg;

};

For example, consider the case where the blitter wants to write out five contiguous longs from a "simple" pattern starting 64 bytes into the current row. In this case, "label" would equal "&&copy_short_narrow_many_5", "offset" would equal 64, and "arg" would equal 5.

The Blitter Opcode Interpreter

The blitter opcode interpreter is machine generated C code created by a Perl script when Executor is compiled. That Perl script takes as input C code snippets that tell it how to handle particular drawing modes, and produces as output C code for an interpreter.

Here is the template taken as input by the Perl script for the "copy_short_narrow" case. This is the simple case where the pixels for the pattern being displayed can be stored entirely within one 32-bit long (for example, solid white or solid black).

begin_mode cpy_shrt_narrow max_unwrap

repeat @dst@ = v;

mask @dst@ = (@dst@ & ~arg) | (v & arg);

end_mode

The "repeat" field tells the Perl script what C code to generate for the simple case where all pixels in the destination long are to be affected. The "mask" field tells it what to do when it must only modify certain bits in the target long and must leave others alone. The "max_unwrap" keyword tells the Perl script to unroll the new blitting loop.

The generated interpreter takes as input an array of blitter opcode structs, which it then proceeds to interpret once for each row to be drawn.

Here is the section of the (machine-generated) interpreter that handles the copy_short_narrow cases. Remember that each "blitter opcode" is really just a pointer to one of these C labels. This code would get used when filling a rectangle with a solid color.

copy_short_narrow_mask:

*dst = (*dst & ~arg) | (v & arg);

JUMP_TO_NEXT;

copy_short_narrow_many_loop:

dst += 8;

copy_short_narrow_many_8:

dst[0] = v;

copy_short_narrow_many_7:

dst[1] = v;

copy_short_narrow_many_6:

dst[2] = v;

copy_short_narrow_many_5:

dst[3] = v;

copy_short_narrow_many_4:

dst[4] = v;

copy_short_narrow_many_3:

dst[5] = v;

copy_short_narrow_many_2:

dst[6] = v;

copy_short_narrow_many_1:

dst[7] = v;

if ((arg -= 8) > 0)

goto copy_short_narrow_many_loop;

JUMP_TO_NEXT;

Note how the inner blitting loop is "unwrapped" for speed. A blitter opcode would specify that 39 longs are to be output by making its "arg" field be 39 and the "label" field point to "copy_short_narrow_many_7", in the middle of the unwrapped loop (39 mod 8 equals 7). The interpreter would jump there and loop until all of the pixels had been written out, at 32 bytes per loop iteration. This is very fast, especially for portable code.

Of course, if any other pixels needed to be drawn, there would be additional blitter opcode structs telling the interpreter what to do. The interpreter dispatches to the next opcode by executing the "JUMP_TO_NEXT" macro, which automatically uses GCC's "goto void *" extension to "goto" the C label that handles the next opcode.
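A cut-down, self-contained version of this scheme is sketched below. It relies on the same two GCC extensions (label addresses and computed goto) but is otherwise our illustration rather than the machine-generated interpreter: the loop is unrolled four times instead of eight, it uses post-increment stores instead of indexed ones so that the destination pointer needs no bias when the loop is entered in the middle, and the names `blt_op`, `blt_interpret`, `blt_many` and the "done" opcode are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const void *label;   /* C label that handles this opcode */
    int32_t offset;      /* byte offset into the scanline */
    int32_t arg;         /* long count or bit mask, depending on opcode */
} blt_op;

enum { OP_MASK, OP_MANY_1, OP_MANY_2, OP_MANY_3, OP_MANY_4, OP_DONE, OP_COUNT };

/* Interpret ops against one row.  When ops is NULL, only export the
   label addresses, which are visible only inside this function. */
static void blt_interpret(const blt_op *ops, uint32_t *row, uint32_t v,
                          const void **labels_out)
{
    static const void *const labels[OP_COUNT] = {
        &&mask, &&many_1, &&many_2, &&many_3, &&many_4, &&done
    };
    uint32_t *dst;
    int32_t arg;

    if (ops == NULL) {
        assert(labels_out != NULL);
        for (int i = 0; i < OP_COUNT; i++)
            labels_out[i] = labels[i];
        return;
    }

/* Load the next opcode's operands, then jump to its handler. */
#define JUMP_TO_NEXT                      \
    do {                                  \
        dst = row + ops->offset / 4;      \
        arg = ops->arg;                   \
        goto *(ops++)->label;             \
    } while (0)

    JUMP_TO_NEXT;

mask:
    *dst = (*dst & ~(uint32_t)arg) | (v & (uint32_t)arg);
    JUMP_TO_NEXT;

many_4: *dst++ = v;
many_3: *dst++ = v;
many_2: *dst++ = v;
many_1: *dst++ = v;
    if ((arg -= 4) > 0)
        goto many_4;
    JUMP_TO_NEXT;

done:
    return;
#undef JUMP_TO_NEXT
}

/* Encode "write n longs starting at byte_offset": enter the unrolled
   loop at the label that makes the first pass write the remainder. */
static blt_op blt_many(const void **labels, int32_t byte_offset, int32_t n)
{
    blt_op op;
    assert(n >= 1);
    op.label = labels[OP_MANY_1 + (n - 1) % 4];
    op.offset = byte_offset;
    op.arg = n;
    return op;
}
```

A caller first fetches the label table with `blt_interpret(NULL, NULL, 0, labels)`, builds an array of `blt_op` structs ending in an `OP_DONE` entry, and then calls `blt_interpret` once per row. Writing seven longs, for example, enters the loop at `many_3`, stores three longs, and then makes one full pass of four.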

Development Tools

Free Software

It is true that ARDI has a very tight R&D budget, but we really don't skimp on the tools that we use to build Executor. We use free software to develop Executor because we like to push our tools very hard, and the only way we can do that and still sleep at night is to know that any bugs we find in those tools can be fixed quickly. With free software the worst case is that we fix the bugs ourselves, and that worst case is actually much better than the average case with non-free software, where you report a bug and pray for a patch. In reality it's rare that we even have to resort to the worst case, since reported bugs are often fixed in less than a day.

GCC

GCC is the Free Software Foundation's C compiler. It produces good code and has a powerful inline assembly syntax that allows optimization to be done on the expressions in the inline assembly without the optimization ruining the assembly you've written.

Another handy GCC extension is "typeof" which can be used in macros to cast a value to the type of a different value. The combination of powerful inline assembly and typeof allows us to have efficient macros that swap bytes in a 16 bit or 32 bit quantity. Since the Mac and PC are of different endianness, quick byte swapping routines are very important.
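As a rough illustration of the typeof half of that combination, the macros below swap bytes and cast the result back to the type of their argument. They are a sketch, not Executor's macros: the names SWAP16, SWAP32 and mac_to_host16 are ours, the swap is written in portable shifts rather than inline assembly, and the macros rely on GCC's "typeof" and statement-expression extensions.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value; the result is cast back to the
   type of the argument using GCC's typeof extension. */
#define SWAP16(x)                                                  \
    ({ uint16_t s_ = (uint16_t)(x);                                \
       (__typeof__(x))(uint16_t)((s_ << 8) | (s_ >> 8)); })

/* Reverse the four bytes of a 32-bit value, keeping the argument's type. */
#define SWAP32(x)                                                  \
    ({ uint32_t s_ = (uint32_t)(x);                                \
       (__typeof__(x))(((s_ & 0x000000FFu) << 24) |                \
                       ((s_ & 0x0000FF00u) << 8)  |                \
                       ((s_ & 0x00FF0000u) >> 8)  |                \
                       ((s_ & 0xFF000000u) >> 24)); })

/* Example use: a 16-bit value stored big-endian by a Mac program, as
   read on a little-endian PC. */
static uint16_t mac_to_host16(uint16_t raw)
{
    uint16_t host = SWAP16(raw);
    assert(SWAP16(host) == raw);   /* swapping twice is the identity */
    return host;
}
```

Because the result has the type of the argument, the same macro can be applied to signed and unsigned quantities, or to typedefs of them, without the caller writing a cast.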

As mentioned above in our synthetic CPU and portable blitter descriptions, we also use GCC's ability to take the address of a label and store it in a variable so that we can produce our own threaded code on the fly.

Hacked GCC

Because the source to GCC is available, it is possible, although not necessarily advisable, to hack in custom extensions. At ARDI we've done this twice in the past. At one time we used a set of locally written modifications to support the pascal keyword so that we could automatically call functions using Pascal calling conventions. At the same time we also supported '1234' (i.e. the ability to construct a 32-bit quantity out of four character constants inside apostrophes). Eventually we decided that we didn't get enough benefit from these extensions to make it worth patching new versions of GCC as they came out.

The other time we modified GCC was when we were porting Executor to DEC's Alpha processor. We were doing this under OSF/1, which uses 64-bit pointers. Since Executor needs to use the same internal representation that Macs use, we wanted a way to easily write 32-bit pointers to memory in such a way that they would be extended to 64 bits when they were read into a register for use. To do this we made GCC support "pointer bit-fields", a logical extension that allowed bit-field notation to be used when specifying pointers. At that time we didn't have a resident GCC expert, so we were lucky that such modifications basically consisted of taking out a few checks that disallowed such constructs. Once those checks were removed, pointer bit-fields "just worked".

DJGPP

DJGPP is DJ Delorie's (see http://www.delorie.com) port of GCC to MS-DOS. It allows DOS users to compile UNIX programs under DOS and to run them with little or no modification. DJGPP consists of GCC and associated development tools with a special UNIX-like C library and a "DOS extender". DOS extenders are used to combat OS inferiority. DOS is a 16-bit OS, whereas most relatively modern OSes are 32-bit. DOS extenders allow 32-bit programs to run under DOS. Executor is one such program. In fact, we use the DJGPP libraries and DOS extender but we don't actually use the DOS port of GCC, because we don't like DOS. We like Linux, and GCC is well structured, so we can do cross-compilation and cross-linking with the DJGPP libraries and build our DOS product under Linux. We completely compile the DOS version of Executor under Linux. We then copy the new Executor binary to a DOS partition, reboot to DOS, test Executor and then get the heck out of DOS. Even then, time spent using Executor feels more like using a Mac than using DOS.

Debugging Tools

Internally we have many debugging tools to help us figure out why an application may die or misbehave under Executor.

More Free Software

GDB in General

Almost all of our debugging is done under the GDB debugger. As with GCC, we're not using GDB because it's the free debugger; we're using the free debugger because it's GDB. GDB is quite powerful.

Whenever we find that a given application fails under Executor, we try to reproduce the failure under Linux. Debugging on a system that has complete memory protection and pre-emptive multi-tasking means that your system stays up even when your application crashes. There's also no need to worry that a misbehaving program is subtly corrupting other programs on the system.

Hardware Watch Points

Beyond the features that are handed to us due to the underlying robustness of the OS, GDB also supports hardware watch points, at least on 80x86 based PCs. 80x86 processors can use hardware to watch a small set of memory locations to see when they change. Since the checking is done by hardware, the program runs at full speed until the memory location is modified, at which point the debugger stops and tells us which instruction modified which memory address and what the old and new values are for that address.

As an example, assume we want to know when the low-memory global TheMenu is changing. Here is how it might look under GDB:

(gdb) watch TheMenu

Hardware watchpoint 1: TheMenu

(gdb) c

Continuing.

Hardware watchpoint 1: TheMenu

Old value = 0

New value = 768

C_HiliteMenu (mid=3) at menu.c:877

(gdb) swap16 768

$2 = 0x3

(gdb) c

Continuing.

Hardware watchpoint 1: TheMenu

Old value = 768

New value = 0

C_HiliteMenu (mid=0) at menu.c:877

(gdb) delete 3

(gdb) c

Continuing.

At the first (gdb) prompt above, we tell GDB that we want to be alerted whenever the expression "TheMenu" changes. GDB is clever enough to realize that it can watch that expression with a hardware watchpoint, so it assigns watchpoint 1 to the task. We then continue, which allows Executor to continue running whatever program it was already running.[2]

Eventually, when the menu bar was accessed, GDB told us that TheMenu had changed from 0 to 768. 768 may sound like a weird value for TheMenu to take, but this is on a byte swapped machine, so we need to swap that 16-bit value to see what TheMenu would look like to a Mac program, and we find that it's 3, a sane value for TheMenu after all. We let the program continue and later TheMenu is changed back to zero.

You can't see it, but in another window the source to Executor is displayed so that we are automatically shown the 877th line of menu.c when GDB's watch point triggers there.

The argument to the watch command is an arbitrary expression, so it is possible to watch for much more complex changes than our example demonstrated. Only relatively simple expressions can be handled by hardware watchpoints, though; the others are handled by software watchpoints, which are much slower.

Hacked GDB

Unlike GCC, where we made local modifications and then, upon reflection, threw them out, we have made a slight change to GDB that is a big win for debugging Executor (and Mac programs running under Executor) on PCs. GDB always knows how to disassemble the object code that it's running, and GDB is available for many architectures, so we modified GDB so that on the 80x86 we can do both 80x86 disassembly and 680x0 disassembly. That allows us to look at sections of memory within our emulator and see what 680x0 code is there.

In the example below, Executor is running the game Risk, when we interrupt Executor and then tell GDB to break in the routine alinehandler. We then continue until alinehandler is hit. We then disassemble, in 680x0 format, the first nine instructions at the location from which alinehandler was dispatched. After doing that we disassemble in 80x86 format the first nine instructions of alinehandler itself.

(gdb) b alinehandler

Breakpoint 6 at 0x17ce2d: file executor.c, line 369.

(gdb) c

Continuing.

Breakpoint 6, alinehandler (pc=3652006, ignored=0x0) at executor.c:369

(gdb) set m68k

(gdb) x/9i pc

0x37b9a6 : _SystemTask

0x37b9a8 : clrw sp@-

0x37b9aa : movew #-1,sp@-

0x37b9ae : pea a5@(-27598)

0x37b9b2 : _GetNextEvent

0x37b9b4 : moveb sp@+,d0

0x37b9b6 : tstb d0

0x37b9b8 : beqw 0x37ba0e <end+667542>

0x37b9bc : movew a5@(-27598),d0

(gdb) set m68k off

(gdb) x/9i alinehandler

<alinehandler>: pushl %ebp

<alinehandler+1>: movl %esp,%ebp

<alinehandler+3>: subl $0x28,%esp

<alinehandler+6>: pushl %esi

<alinehandler+7>: pushl %ebx

<alinehandler+8>: jmp 0x17ce10 <alinehandler+48>

<alinehandler+10>: nop

<alinehandler+11>: nop

<alinehandler+12>: nop

Being able to disassemble 680x0 code on the 80x86 required us to change approximately 50 source lines of GDB (remember, the 680x0 disassembly code was already present for use in GDB on 680x0 machines). We also added a set of tables so that a-line traps and low-memory globals are displayed by name, rather than by number.

Although our special circumstances led us to modify the GDB source code, GDB is customizable out of the box. We've defined a handful of macros that automate debugging tasks. Figure 6 is a macro that crawls through the stack in mac space.

For comparison, Figure 7 is what GDB produces when backtracking code that is compiled with GDB debugging symbols.

As you might guess, this disparity of information makes it much easier for us to track down bugs in our own code than to find bizarre incompatibilities in the code that is being run under the emulator.

Disassembler

Since GDB already knows how to disassemble 680x0 code it was possible to write a driver for GDB so that it can disassemble Mac programs. The driver is about 1,000 lines of C code, with another 500 lines describing the low-memory globals. Basically the driver knows about CODE resources and how intersegment jumps work. GDB normally doesn't produce labels for jump targets or the beginning of subroutines, so the driver adds those too, to make the output that much easier to read.

Run-time Aids

Because we're using our own set of OS and Toolbox routines, we can add code that is conditionally compiled into debug versions of Executor that can provide still more information than GDB or GDB macros can.

Debugtable, Debugnumber

Our A-line trap handler has a table, known as debugtable, of 4096 32-bit ints that it updates each time a trap is taken. Each time alinehandler is called, a variable known as "debugnumber" is incremented and then the value of debugnumber is stored in the slot in debugtable corresponding to the aline trap that was called. This allows us to see both what traps were recently executed and a complete list of every trap that an application makes, no matter how long the application has run.
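In outline, the bookkeeping looks something like the following sketch. The names `debugtable` and `debugnumber` and the table size come from the description above; the hook `note_trap`, the helper `most_recent_trap`, and the choice of indexing the table by the low 12 bits of the trap word are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NTRAPS 4096     /* one slot per A-line trap word 0xA000-0xAFFF */

static uint32_t debugtable[NTRAPS];
static uint32_t debugnumber;

/* Hypothetical hook called from the A-line trap handler: stamp the
   slot for this trap with the current trap sequence number. */
static void note_trap(uint16_t trap_word)
{
    assert((trap_word & 0xF000) == 0xA000);
    debugtable[trap_word & 0x0FFF] = ++debugnumber;
}

/* Return the trap word taken most recently, or 0 if none was taken. */
static uint16_t most_recent_trap(void)
{
    uint32_t best = 0;
    uint16_t word = 0;
    for (unsigned i = 0; i < NTRAPS; i++)
        if (debugtable[i] > best) {
            best = debugtable[i];
            word = (uint16_t)(0xA000u | i);
        }
    return word;
}
```

Sorting the non-zero slots by their stored value gives the order in which traps were last taken, and any slot that is still zero belongs to a trap the application never made.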

This scheme has its drawbacks. Traps that are dispatched via selectors are all lumped together. Traps whose addresses are taken and then are called by jumps through the address don't show up in debugtable. Although debugtable and debugnumber are perhaps the least sophisticated portion of Executor, they're still quite handy, since a visual inspection of the last 100 traps made before an application died often gives a good idea of where to start looking for the source of the incompatibility.

XX_slam

In the course of developing Executor, we did a major rewrite of our memory manager and our TextEdit replacement. In both cases it's not enough to just implement the APIs that are defined in Inside Macintosh; we also have to duplicate the in-memory data structures so that programs which count on them will run properly. To help us verify that we weren't adding new bugs when we rewrote those subsystems, we added routines that consistency check the data structures that each of those subsystems supports.

Because these consistency checks are thorough but time consuming, we call them "slams", and by default they are not enabled, even in debugging versions of Executor. When they are enabled, the data structures for each subsystem are slammed at the entry to a call that might modify one of the data structures and the data structure is slammed once again on exit of the routine. We can turn them on at run-time either by using a command line option when Executor is started or by using GDB to enable the slamming. This is something we should have done for all of Executor's subsystems from day one, since it's ever so helpful to be told that going into routine XXX, the heap was fine, but coming out the heap was corrupted.
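The pattern is simple enough to show on a toy example. Everything below is hypothetical and far simpler than Executor's memory manager: `toy_heap` is a stand-in data structure whose block sizes must always add up to the heap size, `toy_heap_ok` is the thorough check, and `toy_heap_split` is a routine that slams the structure on entry and again on exit when slamming is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a subsystem's data structure. */
typedef struct {
    size_t heap_size;
    size_t nblocks;
    size_t block_size[8];
} toy_heap;

static bool slam_enabled = false;   /* off by default, as in the text */
static unsigned slam_failures;

/* The consistency check itself: thorough, and therefore slow. */
static bool toy_heap_ok(const toy_heap *h)
{
    size_t total = 0;
    if (h->nblocks == 0 || h->nblocks > 8)
        return false;
    for (size_t i = 0; i < h->nblocks; i++) {
        if (h->block_size[i] == 0)
            return false;
        total += h->block_size[i];
    }
    return total == h->heap_size;
}

static void toy_heap_slam(const toy_heap *h, const char *where)
{
    (void)where;   /* a real version would report the routine name */
    if (slam_enabled && !toy_heap_ok(h))
        slam_failures++;
}

/* A routine that modifies the structure: slam on entry and on exit. */
static bool toy_heap_split(toy_heap *h, size_t i, size_t first_part)
{
    bool ok = false;
    assert(h != NULL);
    toy_heap_slam(h, "toy_heap_split entry");
    if (i < h->nblocks && h->nblocks < 8 &&
        first_part > 0 && first_part < h->block_size[i]) {
        for (size_t j = h->nblocks; j > i + 1; j--)
            h->block_size[j] = h->block_size[j - 1];
        h->block_size[i + 1] = h->block_size[i] - first_part;
        h->block_size[i] = first_part;
        h->nblocks++;
        ok = true;
    }
    toy_heap_slam(h, "toy_heap_split exit");
    return ok;
}
```

If the entry slam passes and the exit slam fails, the bug is in the routine itself; if the entry slam already fails, the corruption happened somewhere between the previous slammed call and this one.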

Image Viewer

Reading disassembled code is much easier than staring at hex numbers. Similarly, being able to view a portion of memory as some sort of PixMap (assuming that the memory really is a bit image) is also better than staring at a bunch of hex numbers. When we build Executor for X-Windows, we also build an image server that uses UNIX interprocess communication to communicate with the process being debugged under GDB. This allows us to monitor offscreen graphics, which can be very important when an application makes many graphics calls and eventually an abomination is drawn on the screen instead of what should have been drawn.

Our debugging arsenal includes other, more prosaic, tools. In fact, our debugging environment encourages the development of new tools, because it's so easy to leverage existing tools into new tools and even write new tools from scratch.

Future Plans

Much of VCPU, a successor to Syn68k, has already been written. VCPU performs many optimizations that Syn68k does not, including improved register allocation, dead subregister elimination, opcode "widening", and moving work outside of loops. VCPU has a clean high-level syntax for specifying both front ends and back ends, allowing it to dynamically compile both PowerPC and m68k binaries on any architecture we decide to support.

Although we don't explicitly mention it, the graphic subsystem one layer above the blitter already has hooks in it to allow use of graphics accelerators, where present. We plan a native port to Win32 and OS/2 and those ports should be able to use fancier graphic subsystems and also make use of the underlying network APIs.

Currently INITs and CDEVs do not run under Executor, but the same mechanisms that allow applications to run can also allow INITs and CDEVs to run. QuickTime and ATM will both be high priorities after Executor 2 ships.

We will also be developing compiler tools that will allow ISVs to natively compile CPU specific routines to be used when their applications are run under Executor. Executor already uses such gateways internally.

Already, multiple simultaneous instances of Executor can be run under NEXTSTEP and Linux (and to a lesser extent under Windows '95). Currently only Executor/NEXTSTEP handles PICT pasteboard cutting and pasting from one instantiation of Executor to another, and no versions of Executor do enough file locking to allow concurrent access of the same HFS volumes at once. This needs to be fixed, since either through shared text segments under UNIX and UNIX like operating systems or through DLLs under Microsoft operating systems, it can be made fairly efficient to run multiple instances of Executor simultaneously. When that is done, each instance of Executor has its own address space and is automatically scheduled by the underlying operating system scheduler. That means that Executor "inherits" memory-protection and pre-emptive multi-tasking from the underlying core operating system.

By properly exploiting this inheritance it should be possible to provide an environment that allows well-behaved Mac applications to run efficiently under a variety of PC operating systems with automatic protection from non-well-behaved applications.

One interesting variant on this theme would be to use Linux as the core OS, but to hide it from the end-user, for a net result of an 80x86 box that boots an efficient, robust MacOS-like environment.

[1]"What the heck is a pointer to a C code label?", you ask? gcc (the GNU C compiler) has a "pointer to label" extension to the C language which makes the statement "&&my_label" evaluate to a "void *" that points to the compiled code for "my_label:" within a C function. This, combined with gcc's "goto void *" extension, allows C programs to execute goto statements whose destinations are not known at compile time.

[2]I actually set this watchpoint in the session of Executor that I am using to run Word 5.1 for the Macintosh to compose this document (Executor/Linux on a 90 MHz Pentium).
