mirror of
https://github.com/ctm/executor.git
synced 2025-01-05 12:30:29 +00:00
643 lines
29 KiB
Plaintext
643 lines
29 KiB
Plaintext
|
Low priority wish list. This is just for random things that you
|
||
|
notice but don't intend to improve any time soon.
|
||
|
----------------------------------------------------------------------
|
||
|
Modify gdb so it assigns labels to branch targets "on the fly" and
|
||
|
remembers them.
|
||
|
|
||
|
gestalt.c should recognize the gestaltVMAttr selector, but return 0
|
||
|
for its value. See the comments preceding gestaltVMAttr in gestalt.c
|
||
|
for more information.
|
||
|
|
||
|
rsys/glue.h can probably go away.
|
||
|
|
||
|
FAKEASYNC and CALLCOMPLETION are big enough that they should be
|
||
|
functions, not macros.
|
||
|
|
||
|
Replace all bzero's with memset's.
|
||
|
|
||
|
Make rsys/libcproto.h and rsys/Olibcproto.h go away.
|
||
|
|
||
|
Give some thought to rootless windows during the code restructuring
|
||
|
phase. Also mapping Mac menus to Windows menus might be nice, for ISV
|
||
|
bundling and such.
|
||
|
|
||
|
Use LOCK_HANDLE_EXCURSION_* and THEPORT_SAVE_EXCURSION everywhere
|
||
|
appropriate. DrawDialog is one example where it would be handy.
|
||
|
|
||
|
Change slash.c so that DOUBLE_SLASH_ADVANCE just calls a subroutine
|
||
|
with alloca(strlen(path) + 1) as the first argument and path as the
|
||
|
second argument.
|
||
|
|
||
|
Fetch and read this document:
|
||
|
ftp://ftp.microsoft.com/developr/msdn/unix2n.zip.
|
||
|
|
||
|
Fix code like this:
|
||
|
if (Hx(c, contrlVis) &&
|
||
|
to use HxX and avoid the pointless byte swap.
|
||
|
|
||
|
Fix old Cliff macros that require you to avoid using a semicolon. Use
|
||
|
either the "do { } while (0)" trick or the ({ }) gccism. Avoiding the
|
||
|
semicolon confuses `indent' and some of these macros are not safe in
|
||
|
situations involving if/else statements.
|
||
|
|
||
|
Change (*item).p to item->p.
|
||
|
|
||
|
HOOKSAVEREGS() and HOOKRESTOREREGS(), and other old macros, should go
|
||
|
away. In general, check all macros in rsys/cruft.h. Some macros are
|
||
|
obsoleted by other suggestions in this file. With the new m68k port,
|
||
|
TONS of macros (for register calling conventions, preserving a5, etc.)
|
||
|
are all totally useless. They should be axed.
|
||
|
|
||
|
Some #if 0's are useful for future reference, but there are many
|
||
|
#if 0's that are totally dead and just cluttering things up. Such code
|
||
|
should go away.
|
||
|
|
||
|
A function "extern GrafPtr setport_return_orig_port(GrafPtr new_port);"
|
||
|
would make THE_PORT_SAVE_EXCURSION macro a little simpler and smaller,
|
||
|
especially considering what a mess `thePort' expands to.
|
||
|
|
||
|
Straighten out the whole "ROMlib_hook" thing, e.g. from script.c:
|
||
|
#if defined (BINCOMPAT)
|
||
|
ROMlib_hook(script_notsupported);
|
||
|
#endif /* BINCOMPAT */
|
||
|
|
||
|
maybe make romlib_stubs go away.
|
||
|
|
||
|
Make a ROMlib global variable for the Point {1, 1}, which we use in
|
||
|
many places, and perhaps for Point {CWC (1), CWC (1)}. I think
|
||
|
with gcc these could actually be macros:
|
||
|
#define POINT_1_1 ((Point) { 1, 1 })
|
||
|
#define POINT_1_1_be ((Point) { CWC (1), CWC (1) })
|
||
|
This is better than having a global struct because it lets gcc just
|
||
|
use a 32-bit constant, instead of a variable in memory. The drawback
|
||
|
is that taking pointers to this value will result in a new struct for
|
||
|
each .o file.
|
||
|
|
||
|
Work on making disasm jump the tracks less often.
|
||
|
|
||
|
Write a tool that takes a disasm file as input and can tell you where
|
||
|
decisions might have been made to get you to a certain point in the
|
||
|
code.
|
||
|
|
||
|
On the x86 we can add to a swapped 2-byte value with an addb and an
|
||
|
adcb (we can also subtract). This works for memory locations and for
|
||
|
%ax, %bx, %cx, %dx (various combinations of src and dest). So for a
|
||
|
memory operand, movw;rorw;addw;rorw;movw becomes addb;adcb.
|
||
|
|
||
|
Write an assembly filter to recognize and optimize this combo, and others?
|
||
|
|
||
|
We should be better about using convenience functions, like SetRect,
|
||
|
OffsetRect, RECT_WIDTH(), etc. instead of doing it by hand over and
|
||
|
over again.
|
||
|
|
||
|
Write a RECT_SET(rect, t, l, b, r) that gloms adjacent constant args
|
||
|
together and writes them out as a single long, etc. Or just fix
|
||
|
gcc to do this?
|
||
|
|
||
|
Replace many BlockMove's with memmove's. memmove is *FAR* faster
|
||
|
because it doesn't have to do syn68k junk to decompile code.
|
||
|
|
||
|
Axe all ONLY_DESTROY_BETWEEN_CODE_SEGMENTS stuff (it's been replaced
|
||
|
by syn68k block checksumming).
|
||
|
|
||
|
Use int or long, not INTEGER, for misc. loop counters. ints are
|
||
|
generally better on the x86 and RISC chips.
|
||
|
|
||
|
Merge rsys/prefs.h and rsys/flags.h
|
||
|
|
||
|
Make ROMlib_emptyvis go away.
|
||
|
|
||
|
Make ROMlib_installhandle go away (?)
|
||
|
|
||
|
Delete all empty, unused, and otherwise bogus header files, and all
|
||
|
references to them.
|
||
|
|
||
|
str255assign should probably call memmove instead of BlockMove
|
||
|
(which has additional baggage).
|
||
|
|
||
|
Clean up #define main oldmain cruft.
|
||
|
|
||
|
Make main() small.
|
||
|
|
||
|
Under DOS we could use the dpmi discard pages call to drastically cut
|
||
|
down on paging. For example, we can call this when large blocks of
|
||
|
memory are freed on the heap, or on our stack. Sandmann says this can
|
||
|
be a big win. It would be easy enough to write a function like:
|
||
|
void dispose_memory (void *start, uint32 num_bytes); That would
|
||
|
dispose any *complete* pages in that range (we couldn't toss the
|
||
|
partial pages on either end, obviously). We could call this routine
|
||
|
at various times to improve paging performance. This is an especially
|
||
|
big win because of round-robin Mac heap semantics, which are terrible
|
||
|
from a virtual memory perspective.
|
||
|
|
||
|
----------------------------------------------------------------------
|
||
|
Re-indent all the sources. We need to tell indent about all of our
|
||
|
types, or it does the wrong thing with:
|
||
|
|
||
|
(StringPtr) &foo
|
||
|
|
||
|
As far as indent knows, StringPtr is an expression, and the & is a
|
||
|
binary bitwise AND, so indent changes the code to:
|
||
|
|
||
|
(StringPtr) & foo
|
||
|
|
||
|
If you say "indent -T StringPtr" the right thing happens.
|
||
|
----------------------------------------------------------------------
|
||
|
|
||
|
Answer the question, "why is Executor so big?" and see where that leads.
|
||
|
|
||
|
Get no warnings when compiling with these CFLAGS:
|
||
|
"-O -Wall -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs"
|
||
|
|
||
|
Use NEXTSTEP gprof to determine a good link order to avoid paging. Or
|
||
|
perhaps write a Perl script that can do the same thing based on gprof
|
||
|
output.
|
||
|
|
||
|
Minimize long include chains.
|
||
|
|
||
|
Add `const' directives everywhere we can, even in A-line trap handlers
|
||
|
(although there is always the question of what happens if the trap is
|
||
|
patched out and the patched out trap modifies the value...)
|
||
|
|
||
|
It would be nice if we could leverage stroked font support from
|
||
|
windowing systems that support them. I think we just need an
|
||
|
interface that allows us to query the server for a list of font names
|
||
|
and also allows us to have it draw fonts in a bitmap that we can read
|
||
|
and then store in Executor space.
|
||
|
|
||
|
Modify syn68k to allow more "direct" A-line trap calls (for speed).
|
||
|
|
||
|
Get rid of all CW(constant), CL(constant), and esp. Cx(constant) <--
|
||
|
evil! These are easily located with some __builtin_constant_p magic
|
||
|
in those macros. In cases where the operand can be either a constant
|
||
|
or a non-constant (as in a macro def'n containing a CW), use `CWV' or
|
||
|
`CLV' as appropriate. Those macros decide at compile time whether to
|
||
|
swap the input as a constant or not.
|
||
|
|
||
|
Punt PascalToCCall and replace with N machine-generated functions that
|
||
|
are hardcoded to handle a particular set of arguments. We'd only
|
||
|
create one instantiation of each calling convention type and share it
|
||
|
among all functions with that signature. That lets us eliminate
|
||
|
Cliff's hack to handle traps with > 9 arguments and it's substantially
|
||
|
faster than the current approach.
|
||
|
|
||
|
Use hardcoded functions for all CToPascalCall and PascalToCCall.
|
||
|
We'll build one instantation of each required function based on the
|
||
|
types of traps we have *AND* also if we see any calls to
|
||
|
call_ptoc_[call signature] in the Executor sources we'll make sure
|
||
|
that exists as well. We won't need any more ridiculous kludges to get
|
||
|
flags for ctop calls (like right now we sometimes just use the flags
|
||
|
for another trap which we know has the same signature...blech!)
|
||
|
Increase locality of reference by sorting functions by reference count
|
||
|
(# of traps that use it), so little-used functions tend to go at the
|
||
|
end and common ones are all lumped together?
|
||
|
|
||
|
Unify syn68k and Executor byteswapping and stuff. Right now PUSHUL
|
||
|
does slow byte swapping of constants and stuff...blech.
|
||
|
|
||
|
Eliminate P8(...) etc. macros, and replace with normal C syntax
|
||
|
(perhaps preceded with magic keywords to indicate a trap...? keywords
|
||
|
not necessary if we do scheme-like hack, below) This will let us use
|
||
|
standard tools like etags and be less baffling for newbies. We may be
|
||
|
able to use this in conjunction with some tool that generates
|
||
|
prototypes (is there a makeproto or something?) to create ptocflags,
|
||
|
etc.
|
||
|
|
||
|
Have one (scheme-like?) file that describes information about each
|
||
|
trap, with information like: trap number, name, selector info, C
|
||
|
function that handles it, and maybe some info about calling convention
|
||
|
(normal pascal, or 1st arg is in a0, ret val in d0.w, whatever).
|
||
|
Maybe also throw in an int indicating a guess for how common that trap
|
||
|
is, for potential locality-of-reference hacks.
|
||
|
|
||
|
We could have a flag that says "even if this trap is patched out, we
|
||
|
won't call the patched out version inside ROMlib code." That way we
|
||
|
could say that if, for example, we call BitAnd, we'll just call BitAnd
|
||
|
and not check for a patched out version. This is a speed and space
|
||
|
improvement, and we can do it for some traps because any program that
|
||
|
relies on us making a particular sequence of trap calls from within
|
||
|
another trap will lose anyway. We'd clearly want to let them patch
|
||
|
out "big things", but stuff like SetRect, etc. we should just be able
|
||
|
to call directly. I think Apple has introduced the notion of traps
|
||
|
that cannot be patched out, too. We could also mark traps that say
|
||
|
"this trap cannot be patched out" (as a speed hack) and then complain
|
||
|
if the program attempts to patch it out. We could programmatically
|
||
|
generate a packed bit array for all traps indicating whether or not it
|
||
|
can be patched out.
|
||
|
|
||
|
We can avoid annotating the C code for traps with P8, etc, and
|
||
|
instead use this list to decide which C functions should be examined
|
||
|
and processed.
|
||
|
If we note information about calling conventions, we may be able to
|
||
|
machine-generate stubs that call the real C handler with the
|
||
|
appropriate args plucked from syn68k space, or even generate
|
||
|
scheme-like code to allow syn68k or vcpu to *directly* call the C
|
||
|
handler (when not patched out).
|
||
|
We could even annotate the scheme entry for each trap with
|
||
|
information about what logging actions should be taken for that trap
|
||
|
when in debugging mode; machine-generated glue that sits between the
|
||
|
trap invocation and the C function can handle the tracing
|
||
|
appropriately.
|
||
|
|
||
|
Right now we handle patched out traps in a slow and voluminous way,
|
||
|
with a large ?: expression for each C call to anything which is a trap
|
||
|
(see rsys/stubify.h). Instead, we could have a big table of function
|
||
|
pointers, which by default point to our C handler, but alternately
|
||
|
point to machine-generated stubs that call the patched out code. The
|
||
|
stubs can even be placed in a library (.a) so only the stubs we
|
||
|
possibly call internally will take any space in the executable. One
|
||
|
way to avoid having a separate stub for each trap which we can call is
|
||
|
to have the function pointer in the table take an extra uint32
|
||
|
argument at the end. This argument would be ignored by the "real"
|
||
|
handler (C calling convention ignores extra args), but the c-to-p stub
|
||
|
could use that uint32 as the address for the patched out code to call.
|
||
|
So we'd need one c-to-p stub for each trap call signature (arg types +
|
||
|
return type), rather than for each trap.
|
||
|
The problem with this approach is apps who slime the trap tables
|
||
|
directly, without calling SetTrapAddress(). Do people do this? We
|
||
|
don't handle it now because our tables are in the wrong byte order,
|
||
|
and (I think) not at the right address (Quark examines address
|
||
|
0x1008). We could catch this sliminess by write-protecting the trap
|
||
|
tables...hmmm. Even if we don't do that, we could still have stubs
|
||
|
which check to see if the trap is patched out, and if not then just
|
||
|
call the real routine. That would allow us to punt the space taken by
|
||
|
the ?: for every single invocation of the trap. We may also be able
|
||
|
to do other slimy things, like have legitimate C function pointers in
|
||
|
the trap tables, and tell syn68k to compile jumps to each of those
|
||
|
addresses specially or something (toss those addresses in its hash
|
||
|
table). Byte swapping those function pointers confuses the issue
|
||
|
though, since a swapped C function address may be legitimate 68k
|
||
|
code...
|
||
|
|
||
|
Have a scheme-like file describing low globals. It would describe
|
||
|
their name, type, address, and even initial contents on bootup,
|
||
|
launch, whatever. Some variables of course will have initial contents
|
||
|
too difficult to describe here...that's fine, just note that and let
|
||
|
them get set up at runtime with special case C code. At compile time
|
||
|
we can figure out what low memory should look like, and PackBits that
|
||
|
into a C array which gets unpacked at runtime when we initialize mem.
|
||
|
We can also put in trace information that describes what should be
|
||
|
done when these variables are accessed under a debugging vcpu or under
|
||
|
a debugging Executor with page 0 read/write trapped for logging
|
||
|
purposes. This file could generate the appropriate lowglobals.s for a
|
||
|
given host, and even make a .gdbinit that could print them (or "dump"
|
||
|
them) in the appropriate byte-swapped format.
|
||
|
|
||
|
Implement code to allow P_ExitToShell (etc.) to be compile-time
|
||
|
constants, and ask syn68k to put the callback at that address in
|
||
|
particular. It should be possible to machine-generate this
|
||
|
information. That way we can CLC() them, etc.
|
||
|
|
||
|
----------------------------------------------------------------------
|
||
|
Implement a "package" interface for things like vdriver, events, etc.
|
||
|
This would allow us to:
|
||
|
|
||
|
1) Support multiple vdrivers, etc. simultaneously. Right now we
|
||
|
have separate svgalib and x windows Linux Executors for no good
|
||
|
reason.
|
||
|
2) Dynamically load packages we need, and leave unused packages on disk.
|
||
|
3) Release updates for things like sound, video, etc. without an
|
||
|
entire Executor binary.
|
||
|
4) Distribute the source to various packages (like the svgalib front end).
|
||
|
5) Make it easier for new engineers to implement ports to different
|
||
|
systems. Once they implement the API, Executor should work.
|
||
|
6) Load initialization routines that won't be hit again into a
|
||
|
separate area of memory, for performance reasons.
|
||
|
|
||
|
Each package class will have a unique name (e.g. "vdriver"), and each
|
||
|
subclass will also have a unique name (e.g. "vga"). All global
|
||
|
symbols for that package will be preceded with "vdriver_vga_", etc.
|
||
|
|
||
|
All code outside those packages will just be able to call
|
||
|
"vdriver_shutdown". In reality that will be a macro for some entry of
|
||
|
a function pointer struct (or array?) corresponding to that package.
|
||
|
The real entry might point to "vdriver_vga_shutdown". HOWEVER, if
|
||
|
only one package is compiled in for that particular category (e.g. we
|
||
|
only build in "arch_i386_") then those macros will point directly to
|
||
|
that package and no function pointer table will exist.
|
||
|
|
||
|
We'll also need to do something about globally visible symbols,
|
||
|
e.g. "vdriver_width". There are a few ways to handle those. Probably
|
||
|
they should just be tossed in a struct for that package.
|
||
|
|
||
|
We can support subclassing (useful for different types of vga,
|
||
|
e.g. svgalib and VESA).
|
||
|
|
||
|
I think we should also facilitate NOPs by allowing packages to request
|
||
|
a magic function pointer for a function that returns 0 and one that
|
||
|
returns 1...we'll just share one instantiation of each of those
|
||
|
functions.
|
||
|
|
||
|
With multiple packages being allowed, we can't use preprocessor
|
||
|
conditionals in the same way we do now. For example, right now we
|
||
|
have VDRIVER_DISPLAYED_IN_WINDOW; this will be #defined for X but not
|
||
|
for svgalib...in the future we want both packages to be simultaneously
|
||
|
enabled. I think the way to handle this may be to have preprocessor
|
||
|
conditionals that say VDRIVER_{ANY,ALL,NONE}_DISPLAYED_IN_WINDOW,
|
||
|
based on what subset of vdriver packages installed support that
|
||
|
feature. That way we'd know we don't need the "far pointer" special
|
||
|
case code for either svgalib or X windows, but we do need it for VESA,
|
||
|
etc.
|
||
|
|
||
|
Each package should be able to report its dependencies so we
|
||
|
initialize the packages in the proper order.
|
||
|
|
||
|
Executor should have a list of required packages, like `arch', `os',
|
||
|
`vdriver', `events', so we can tell at compile time if a configuration
|
||
|
is not legitimate.
|
||
|
|
||
|
I think we want to use the package interface for things like block
|
||
|
devices, so we can have ASPI, BIOS, mscdex, HFV all be separate
|
||
|
packages. This is a little different than the previously described
|
||
|
model, since we might have several instantiations of this package
|
||
|
present at once. How do we handle this gracefully? If each package
|
||
|
is a struct full of function pointers and global variables, we can
|
||
|
just keep a pointer to the struct around, and call functions in that
|
||
|
struct. Or, for special packages with multiple simultaneous
|
||
|
instantiations, we can make the first arg to most calls of that
|
||
|
package be a pointer to one of those structs with the function
|
||
|
pointers, e.g. "blockdev_read (dev_info, buf, 0x200);" or whatever,
|
||
|
which would be a macro for "dev_info->read (buf, 0x200)". I dunno.
|
||
|
This is a bit confusing, since the natural approach involves having
|
||
|
distinct structs for each device (e.g. two HFV's have two different
|
||
|
structs) even though they use the same package to be processed. This
|
||
|
would mean the structs couldn't be exactly the same as the generic
|
||
|
package struct. Maybe that's not so bad.
|
||
|
|
||
|
Each package must provide:
|
||
|
[package]_init(allow args, or should all be void?)
|
||
|
[package]_shutdown (void)
|
||
|
|
||
|
I'm also toying with the idea of a stub function that tells whether
|
||
|
the rest of the package should be dynamically linked into the
|
||
|
executable. For example, an X11 package could see if X is running,
|
||
|
and if not we wouldn't bother loading up the code for the rest of the
|
||
|
package.
|
||
|
|
||
|
Should we just use ObjC or C++ for packages? I'm concerned about the
|
||
|
portability implications, although maybe ObjC is everywhere gcc is
|
||
|
these days.
|
||
|
----------------------------------------------------------------------
|
||
|
|
||
|
----------------------------------------------------------------------
|
||
|
[Note: I hacked up some of this stuff in ~mat/x86fetch.c]
|
||
|
|
||
|
A common operation right now is to deference a handle pointing to a
|
||
|
struct, grab some bytes in that handle, and swap them (an Hx
|
||
|
operation). We could make Executor smaller (and faster on i486's) by
|
||
|
having N machine-generated asm routines which do this for us...we'd
|
||
|
have to be more consistent about using Hx and HxX then.
|
||
|
The chosen asm routines could even use bswap or rorw;rorl;rorw
|
||
|
depending on whether they were on an i486 or an i386. We can use
|
||
|
gcc's register calling convention directive so calls to these routine
|
||
|
are fast; however, it may be even better to just make these be inline
|
||
|
asm `call' statements and make sure they smash no registers at all, so
|
||
|
gcc doesn't have to save any regs when calling the subroutine.
|
||
|
Question: should the return value come back in a different register
|
||
|
than the input value?
|
||
|
To be more specific: we'll have a slew of tiny routines which take
|
||
|
as an argument a handle in %eax, and each is hardcoded to return a
|
||
|
byte, word or long offset N bytes from that dereferenced handle.
|
||
|
Each routine can have an alternate entry point right before the main
|
||
|
function; this entry point would also swap the handle first (useful
|
||
|
when it was snarfed from a low global or whatever). So anyway, one of
|
||
|
these routines might look like:
|
||
|
|
||
|
.align 4 ; .align 16 under Linux, bleah
|
||
|
_fetch_swaph_offset12_swap32:
|
||
|
bswap %eax ; swap handle itself
|
||
|
_fetch_offset12_swap32:
|
||
|
movl (%eax),%eax ; dereference handle
|
||
|
bswap %eax ; put that in native order
|
||
|
movl 12(%eax),%eax ; fetch a long from that struct
|
||
|
bswap %eax ; byte swap it
|
||
|
ret ; and return
|
||
|
|
||
|
and on the i386 it would be generated as the following routine. The
|
||
|
first `jmp' is necessary so there are still two bytes between the two
|
||
|
entry points (bswap takes 2 bytes).
|
||
|
|
||
|
.align 4 ; .align 16 under Linux, bleah
|
||
|
_fetch_swaph_offset12_swap32:
|
||
|
jmp 2f ; need to do it this way, only got 2 bytes
|
||
|
_fetch_offset12_swap32:
|
||
|
movl (%eax),%eax ; dereference handle
|
||
|
call 1f ; byte swap it, who cares about i386 speed
|
||
|
; this hack keeps maximum routine size down
|
||
|
; although if these are aligned % 16
|
||
|
; and we have room, we can just inline
|
||
|
; this byte swap. Right now this takes
|
||
|
; 31 bytes, so there's no room.
|
||
|
movl 12(%eax),%eax ; fetch a long from that struct
|
||
|
1: rorw $8,%ax ; byte swap it
|
||
|
rorl $16,%eax
|
||
|
rorw $8,%ax
|
||
|
ret ; and return
|
||
|
2: pushl $_fetch_offset12_swap32 ; fake ret address
|
||
|
jmp 1b
|
||
|
|
||
|
Actually this is better done with xchgb on the 80386.
|
||
|
|
||
|
These functions would start out as the following routine with the same
|
||
|
entry points, and get lazily compiled to either the i386 or the i486
|
||
|
version:
|
||
|
|
||
|
.align 4
|
||
|
_fetch_swaph_offset12_swap32:
|
||
|
jmp 2f
|
||
|
_fetch_offset12_swap32:
|
||
|
pushl $_fetch_offset12_swap32 ; fake `ret' address
|
||
|
1: pushl $12 ; offset
|
||
|
pushl $4 ; value size
|
||
|
pushl $_fetch_swaph_offset12_swap32 ; start of code to rewrite
|
||
|
; WARNING!!!!!!!!!!!!!! If we do the `asm' hack to call the fetch routine,
|
||
|
; so gcc doesn't save as many regs, we need to make SURE that
|
||
|
; setup_x86_fetch_routine saves ALL registers, perhaps with pushal/popal.
|
||
|
; we can't do that here since we're about to smash our own code.
|
||
|
; We can call a stub routine which does a pushal/popal around a call
|
||
|
; to C code to fill in the stub.
|
||
|
jmp _setup_x86_fetch_routine
|
||
|
2: pushl $_fetch_swaph_offset12_swap32 ; fake `ret' address
|
||
|
jmp 1b
|
||
|
|
||
|
|
||
|
Here's one way to write the glue routine that preserves regs and calls
|
||
|
a routine to create the real routine. Maybe the entire routine to
|
||
|
lazily compile the translator should be written in assembly...it
|
||
|
basically just slams out a few bytes and fills in some holes with the
|
||
|
supplied operands.
|
||
|
|
||
|
_setup_x86_fetch_routine:
|
||
|
pushal ; save all regs
|
||
|
pushl 32(%esp) ; copy args to stack
|
||
|
pushl 40(%esp)
|
||
|
pushl 48(%esp)
|
||
|
call _create_x86_fetch_routine
|
||
|
addl $12,%esp ; pop off old args
|
||
|
popal ; restore regs
|
||
|
addl $12,$esp ; pop off old args
|
||
|
ret
|
||
|
|
||
|
We could also have a routine that writes a value to such a
|
||
|
dereferenced location.
|
||
|
We could put these routines in a library so we only link in the ones
|
||
|
we need. Their initial code could set itself up appropriately for the
|
||
|
i386 or i486 (self-modify). This way we only use the ones we need.
|
||
|
I'm not sure how we'd tell gcc to call different functions based on
|
||
|
offsetof() and sizeof() the field being dereferenced...tricky.
|
||
|
Actually if we use asm() directive to call this we might just be able
|
||
|
to make offsetof and sizeof be operands to the asm and have them get
|
||
|
textually glommed into the called routine name. Hehheh. We could
|
||
|
also easily machine-generate access macros for various structs, I
|
||
|
guess.
|
||
|
----------------------------------------------------------------------
|
||
|
|
||
|
Using special macros which call subroutines to access and byte swap
|
||
|
struct fields should help us keep Executor's size down on
|
||
|
architectures with strict alignment constraints. Otherwise each
|
||
|
struct field reference may involve more inline code to load individual
|
||
|
bytes and munge them together. Instead we can store that code in a
|
||
|
few subroutines.
|
||
|
|
||
|
Machine-generate accessor macros for structs. Not sure how useful
|
||
|
this is, and it would take some effort to preserve our naming
|
||
|
conventions. OTOH, we'd get consistent naming conventions when done,
|
||
|
and if we ONLY used accessor macros, we'd be closer to compiling on
|
||
|
non-gcc compilers. The code generated to access a struct field would
|
||
|
be *(long *)((char *)&foo + 14) or whatever...it could do clever
|
||
|
things when there are alignment problems on the host.
|
||
|
|
||
|
What would happen if we postprocessed our assembly and extracted all
|
||
|
sequences of 6 insns or more that show up 4 or more times into a
|
||
|
subroutine, and have everyone just call that? (Be careful with
|
||
|
anything that touches the stack, obviously). We could add magic asm
|
||
|
directives to mark time-critical code when we shouldn't do this. And
|
||
|
we could juke those shared subroutines to self-modify to use bswap
|
||
|
when necessary...hmmm...
|
||
|
|
||
|
Use hardware acceleration for drawing lines whenever we know the
|
||
|
accelerated line will be pixel for pixel correct.
|
||
|
|
||
|
Experiment with using pushl in blitter again. This used to lose when
|
||
|
interrupts came in and used the screen for stack space, but that may
|
||
|
have been before we started using V2 for everything (???)...DPMI might
|
||
|
never use our stack for anything (false under WinNT?)
|
||
|
|
||
|
replace all `NewHandle (...); memset (..., 0, ...); with `NewHandleClear (...)'
|
||
|
|
||
|
Use NewPtrSys() to rewrite code like this (from fileAccess.c):
|
||
|
saveZone = TheZone;
|
||
|
TheZone = SysZone;
|
||
|
cachedir = (char *) NewPtr(dirnamelen+1+MAXNAMLEN+1);
|
||
|
TheZone = saveZone;
|
||
|
NewPtrSysClear() for this code from main.c, etc.:
|
||
|
TheZone = SysZone;
|
||
|
UTableBase =
|
||
|
(DCtlHandlePtr) (long) CL (NewPtr (sizeof (UTableBase[0].p) * NDEVICES));
|
||
|
memset (CL (UTableBase), 0, sizeof (UTableBase[0].p) * NDEVICES);
|
||
|
UnitNtryCnt = CW (NDEVICES);
|
||
|
TheZone = ApplZone;
|
||
|
|
||
|
modify qPicstuff picture drawing engine to run in `debug' mode and
|
||
|
output readable text for the ops/data in the picture.
|
||
|
|
||
|
Use new consistent syntax for byte swapping and handle dereferencing
|
||
|
operations, with chains of operations concatenated right-to-left ala
|
||
|
LISP's CADDR, etc. So:
|
||
|
W = swap 16 bit value and cast back to typeof.
|
||
|
L = swap 32 bit value and cast back to typeof.
|
||
|
P = swap 32 bit "Point" value (0x12345678 -> 0x34127856).
|
||
|
D = dereference and swap
|
||
|
F = get struct field (takes additional arg with field name)
|
||
|
H = dereference, swap long, and get struct field (equivalent to `FD')
|
||
|
If the last letter is a `C', that means the input value is a
|
||
|
compile-time constant. We might wish to preface each macro with a `C'
|
||
|
as a standard prefix (for `convert'?) Otherwise we'd get macros like
|
||
|
`W', which may be too short. Examples, mapping old syntax to new:
|
||
|
CWC(n) -> CWC(n)
|
||
|
CL(n) -> CL(n)
|
||
|
Hx(h, field) -> CWH(h, field) or CLH(h, field), depending on size
|
||
|
HxX(CL(MainDevice), gdPMap) -> CHL(MainDevice, gdPMap)
|
||
|
Hx(CL(MainDevice), gdPMap) -> CLHL(MainDevice, gdPMap)
|
||
|
Hx(Hx(CL(MainDevice), gdPMap), baseAddr) -> CLHLHL(MainDevice, gdPMap, baseAddr)
|
||
|
Any or all of these routines can be inline assembly, if the host
|
||
|
architecture provides them. Any not provided will be automatically
|
||
|
provided or synthesized by nesting low level routines (e.g. CHL can be
|
||
|
written as CH(CL(...))). See also the idea above which describes how
|
||
|
to write Hx (==CLHL for long fields) on the x86 to take advantage of
|
||
|
the 486 bswap and make Executor smaller.
|
||
|
|
||
|
Need an OS-specific function that hints "now is a good time to yield
|
||
|
to another process", for busy loops, WaitNextEvent, etc.
|
||
|
|
||
|
Create a routine that makes a new rectangular region, and use it instead
|
||
|
of separate code to create and then set up the region?
|
||
|
|
||
|
Can the BOOLEANRET type go away now? What's it for?
|
||
|
|
||
|
PackBits the data in color_wheel_bits.c, and uncompress it on the fly?
|
||
|
|
||
|
A 68k interpreter that had nice debugging features:
|
||
|
- 68k breakpoints
|
||
|
- 68k watchpoints
|
||
|
- reading any memory location or range of memory locations
|
||
|
- writing any memory location or range of memory locations
|
||
|
- values appearing in registers
|
||
|
- single stepping
|
||
|
- identifying recent conditional branches that got us here
|
||
|
- backtrace facility that logs jsr's and rts's and so can
|
||
|
backtrace even when there's no frame pointer
|
||
|
|
||
|
We need a consistent naming convention for variables which hold big
|
||
|
endian values, perhaps ending such a variable with a "_be" or
|
||
|
something. But would this apply to Mac struct fields as well? Blah.
|
||
|
Renaming all the Mac struct fields sounds like a bad idea. But always
|
||
|
being conscious of endianness is probably a good thing. Perhaps just
|
||
|
append a "_be" to variables, but not to struct fields? Hmmm. What
|
||
|
about Mac low globals? I guess they sorta stand out because they use
|
||
|
a different capitalization convention anyway.
|
||
|
|
||
|
Use memcpy for code like this (from menu.c), or perhaps a convenience
|
||
|
function that does a NewHandle+memcpy:
|
||
|
h = NewHandle(hsize);
|
||
|
sp = (char *) STARH(mh) + soff;
|
||
|
dp = (char *) STARH(h);
|
||
|
ep = sp + hsize;
|
||
|
while (sp != ep)
|
||
|
*dp++ = *sp++;
|
||
|
Gotta be a little careful here, because if we pass STARH(mh) to the
|
||
|
NewHandle convenience function, that pointer may be invalidated if
|
||
|
memory moves during the real NewHandle operation.
|
||
|
|
||
|
In many places we have (now) pointless casts to Size. These should
|
||
|
go away.
|
||
|
BlockMove((Ptr) tm, (Ptr) &tml, (Size) sizeof(tml));
|
||
|
|
||
|
Prune down common.h as much as possible.
|
||
|
|
||
|
This qPicstuff.c macro is bogus:
|
||
|
#define SE(x) ((x & 0x80) ? x|(~0^0xff) : x & 0xff) /* sign extend */
|
||
|
It has no parens, evaluates its argument multiple times, and could
|
||
|
instead be "#define SE(x) ((int) (int8) (x))"
|
||
|
|
||
|
If we split up the Executor source tree into directories, it might
|
||
|
make sense to have a separate header file for accessor macros for each
|
||
|
data type, with the form TypeName_accessors.h, e.g.:
|
||
|
#include "Region_accessors.h"
|
||
|
|
||
|
Make a RECT_MID_Y and RECT_MID_X that compute the center coordinates of
|
||
|
a rect.
|
||
|
|
||
|
Make NumToString just call sprintf.
|
||
|
|
||
|
Make StringToNum call atoi, atol, or sscanf...
|
||
|
|
||
|
ROMlib_setuid seems unnecessary now that we don't have the mshort
|
||
|
hack. DJGPP has `setuid'. ROMlib_seteuid can also just be replaced
|
||
|
with seteuid.
|
||
|
|
||
|
If low globals were done with macros, like (*(INTEGER *)0x3CA) or
|
||
|
whatever, this would actually generate better code sometimes given
|
||
|
Mat's recent gcc mods. The modified gcc can tell that an address is
|
||
|
long-aligned and do good things.
|