Low priority wish list. This is just for random things that you notice but don't intend to improve any time soon.

----------------------------------------------------------------------

Modify gdb so it assigns labels to branch targets "on the fly" and remembers them.

gestalt.c should recognize the gestaltVMAttr selector, but return 0 for its value. See the comments preceding gestaltVMAttr in gestalt.c for more information.

rsys/glue.h can probably go away.

FAKEASYNC and CALLCOMPLETION are big enough that they should be functions, not macros.

Replace all bzero's with memset's.

Make rsys/libcproto.h and rsys/Olibcproto.h go away.

Give some thought to rootless windows during the code restructuring phase. Also mapping Mac menus to Windows menus might be nice, for ISV bundling and such.

Use LOCK_HANDLE_EXCURSION_* and THEPORT_SAVE_EXCURSION everywhere appropriate. DrawDialog is one example where it would be handy.

Change slash.c so that DOUBLE_SLASH_ADVANCE just calls a subroutine with alloca(strlen(path) + 1) as the first argument and path as the second argument.

Fetch and read this document: ftp://ftp.microsoft.com/developr/msdn/unix2n.zip.

Fix code like this:

    if (Hx(c, contrlVis) &&

to use HxX and avoid the pointless byte swap.

Fix old Cliff macros that require you to avoid using a semicolon. Use either the "do { } while (0)" trick or the ({ }) gccism. Avoiding the semicolon confuses `indent' and some of these macros are not safe in situations involving if/else statements.

Change (*item).p to item->p.

HOOKSAVEREGS() and HOOKRESTOREREGS(), and other old macros, should go away. In general, check all macros in rsys/cruft.h. Some macros are obsoleted by other suggestions in this file.

With the new m68k port, TONS of macros (for register calling conventions, preserving a5, etc.) are all totally useless. They should be axed.

Some #if 0's are useful for future reference, but there are many #if 0's that are totally dead and just cluttering things up.
Such code should go away.

A function "extern GrafPtr setport_return_orig_port(GrafPtr new_port);" would make the THEPORT_SAVE_EXCURSION macro a little simpler and smaller, especially considering what a mess `thePort' expands to.

Straighten out the whole "ROMlib_hook" thing, e.g. from script.c:

    #if defined (BINCOMPAT)
        ROMlib_hook(script_notsupported);
    #endif /* BINCOMPAT */

maybe make romlib_stubs go away.

Make a ROMlib global variable for the Point {1, 1}, which we use in many places, and perhaps for Point {CWC (1), CWC (1)}. I think with gcc these could actually be macros:

    #define POINT_1_1 ((Point) { 1, 1 })
    #define POINT_1_1_be ((Point) { CWC (1), CWC (1) })

This is better than having a global struct because it lets gcc just use a 32-bit constant, instead of a variable in memory. The drawback is that taking pointers to this value will result in a new struct for each .o file.

Work on making disasm jump the tracks less often.

Write a tool that takes a disasm file as input and can tell you where decisions might have been made to get you to a certain point in the code.

On the x86 we can add to a swapped 2-byte value with an addb and an adcb (we can also subtract). This works for memory locations and for %ax, %bx, %cx, %dx (various combinations of src and dest). So for a memory operand, movw;rorw;addw;rorw;movw becomes addb;adcb. Write an assembly filter to recognize and optimize this combo, and others?

We should be better about using convenience functions, like SetRect, OffsetRect, RECT_WIDTH(), etc. instead of doing it by hand over and over again.

Write a RECT_SET(rect, t, l, b, r) that gloms adjacent constant args together and writes them out as a single long, etc. Or just fix gcc to do this?

Replace many BlockMove's with memmove's. memmove is *FAR* faster because it doesn't have to do syn68k junk to decompile code.

Axe all ONLY_DESTROY_BETWEEN_CODE_SEGMENTS stuff (it's been replaced by syn68k block checksumming).

Use int or long, not INTEGER, for misc.
loop counters. ints are generally better on the x86 and RISC chips.

Merge rsys/prefs.h and rsys/flags.h

Make ROMlib_emptyvis go away.

Make ROMlib_installhandle go away (?)

Delete all empty, unused, and otherwise bogus header files, and all references to them.

str255assign should probably call memmove instead of BlockMove (which has additional baggage).

Clean up #define main oldmain cruft. Make main() small.

Under DOS we could use the dpmi discard pages call to drastically cut down on paging. For example, we can call this when large blocks of memory are freed on the heap, or on our stack. Sandmann says this can be a big win. It would be easy enough to write a function like:

    void dispose_memory (void *start, uint32 num_bytes);

That would dispose any *complete* pages in that range (we couldn't toss the partial pages on either end, obviously). We could call this routine at various times to improve paging performance. This is an especially big win because of round-robin Mac heap semantics, which are terrible from a virtual memory perspective.

----------------------------------------------------------------------

Re-indent all the sources. We need to tell indent about all of our types, or it does the wrong thing with:

    (StringPtr) &foo

As far as indent knows, StringPtr is an expression, and the & is a binary bitwise AND, so indent changes the code to:

    (StringPtr) & foo

If you say "indent -T StringPtr" the right thing happens.

----------------------------------------------------------------------

Answer the question, "why is Executor so big?" and see where that leads.

Get no warnings when compiling with these CFLAGS:

    "-O -Wall -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs"

Use NEXTSTEP gprof to determine a good link order to avoid paging. Or perhaps write a Perl script that can do the same thing based on gprof output.

Minimize long include chains.
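A minimal sketch of the page-rounding half of the dispose_memory idea above. It only works out which *complete* pages fall inside a range; the actual DPMI discard call is left out. PAGE_SIZE and complete_pages are made-up names for illustration, and the 4K page size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4K pages, as on the x86. */
#define PAGE_SIZE ((uintptr_t) 4096)

/* Hypothetical helper: find the complete pages inside
 * [start, start + num_bytes).  Stores the address of the first
 * complete page in *first_page and returns how many complete pages
 * there are (possibly 0).  Partial pages at either end are skipped. */
static size_t
complete_pages (uintptr_t start, uint32_t num_bytes, uintptr_t *first_page)
{
  uintptr_t end = start + num_bytes;
  /* Round the start up and the end down to page boundaries. */
  uintptr_t first = (start + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
  uintptr_t last = end & ~(PAGE_SIZE - 1);

  *first_page = first;
  return (last > first) ? (size_t) ((last - first) / PAGE_SIZE) : 0;
}
```

A real dispose_memory would presumably call this and then hand the resulting page range to whatever discard primitive the host provides.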
Add `const' directives everywhere we can, even in A-line trap handlers (although there is always the question of what happens if the trap is patched out and the patched out trap modifies the value...)

It would be nice if we could leverage stroked font support from windowing systems that support them. I think we just need an interface that allows us to query the server for a list of font names and also allows us to have it draw fonts in a bitmap that we can read and then store in Executor space.

Modify syn68k to allow more "direct" A-line trap calls (for speed).

Get rid of all CW(constant), CL(constant), and esp. Cx(constant) <-- evil! These are easily located with some __builtin_constant_p magic in those macros. In cases where the operand can be either a constant or a non-constant (as in a macro def'n containing a CW), use `CWV' or `CLV' as appropriate. Those macros decide at compile time whether to swap the input as a constant or not.

Punt PascalToCCall and replace with N machine-generated functions that are hardcoded to handle a particular set of arguments. We'd only create one instantiation of each calling convention type and share it among all functions with that signature. That lets us eliminate Cliff's hack to handle traps with > 9 arguments and it's substantially faster than the current approach.

Use hardcoded functions for all CToPascalCall and PascalToCCall. We'll build one instantiation of each required function based on the types of traps we have *AND* also if we see any calls to call_ptoc_[call signature] in the Executor sources we'll make sure that exists as well. We won't need any more ridiculous kludges to get flags for ctop calls (like right now we sometimes just use the flags for another trap which we know has the same signature...blech!)

Increase locality of reference by sorting functions by reference count (# of traps that use it), so little-used functions tend to go at the end and common ones are all lumped together?
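Going back to the CWV/CLV item above, here is a rough sketch of how a macro can pick a constant or a run-time swap using gcc's __builtin_constant_p. The names CWV_SKETCH, SWAP16_CONST and swap16_runtime are invented for this example and are not the real rsys macros.

```c
#include <stdint.h>

/* Pure-expression swap that gcc can fold when n is a constant. */
#define SWAP16_CONST(n) \
  ((uint16_t) ((((n) & 0xFFu) << 8) | (((n) >> 8) & 0xFFu)))

/* Run-time swap; on the x86 this could be a rorw. */
static inline uint16_t
swap16_runtime (uint16_t n)
{
  return (uint16_t) ((n << 8) | (n >> 8));
}

/* Hypothetical stand-in for CWV: decide at compile time whether the
 * operand is a constant, and swap it accordingly.  Relies on the gcc
 * builtin __builtin_constant_p. */
#define CWV_SKETCH(n) \
  (__builtin_constant_p (n) ? SWAP16_CONST (n) : swap16_runtime (n))
```

The same __builtin_constant_p test, turned around, is presumably how a plain CW could complain when handed a constant.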
Unify syn68k and Executor byteswapping and stuff. Right now PUSHUL does slow byte swapping of constants and stuff...blech.

Eliminate P8(...) etc. macros, and replace with normal C syntax (perhaps preceded with magic keywords to indicate a trap...? keywords not necessary if we do scheme-like hack, below) This will let us use standard tools like etags and be less baffling for newbies. We may be able to use this in conjunction with some tool that generates prototypes (is there a makeproto or something?) to create ptocflags, etc.

Have one (scheme-like?) file that describes information about each trap, with information like: trap number, name, selector info, C function that handles it, and maybe some info about calling convention (normal pascal, or 1st arg is in a0, ret val in d0.w, whatever). Maybe also throw in an int indicating a guess for how common that trap is, for potential locality-of-reference hacks.

We could have a flag that says "even if this trap is patched out, we won't call the patched out version inside ROMlib code." That way we could say that if, for example, we call BitAnd, we'll just call BitAnd and not check for a patched out version. This is a speed and space improvement, and we can do it for some traps because any program that relies on us making a particular sequence of trap calls from within another trap will lose anyway. We'd clearly want to let them patch out "big things", but stuff like SetRect, etc. we should just be able to call directly. I think Apple has introduced the notion of traps that cannot be patched out, too.

We could also mark traps that say "this trap cannot be patched out" (as a speed hack) and then complain if the program attempts to patch it out.

We could programmatically generate a packed bit array for all traps indicating whether or not it can be patched out.

We can avoid annotating the C code for traps with P8, etc, and instead use this list to decide which C functions should be examined and processed.
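The packed bit array mentioned above might look something like the following. This is only a sketch: NUM_TRAPS, the table name, and both helpers are made up, and in practice the table would be machine-generated from the trap description file rather than filled in at run time.

```c
#include <stdint.h>

/* Assumption: trap indices fit in 0..NUM_TRAPS-1. */
#define NUM_TRAPS 1024

/* One bit per trap; a set bit means "cannot be patched out". */
static uint8_t unpatchable_bits[NUM_TRAPS / 8];

/* Hypothetical helper: mark a trap as unpatchable. */
static void
mark_unpatchable (unsigned trap_index)
{
  unpatchable_bits[trap_index >> 3] |= (uint8_t) (1u << (trap_index & 7));
}

/* Hypothetical helper: nonzero if the program may patch this trap. */
static int
trap_can_be_patched (unsigned trap_index)
{
  return !(unpatchable_bits[trap_index >> 3] & (1u << (trap_index & 7)));
}
```

Something like SetTrapAddress could then check trap_can_be_patched and complain when a program tries to patch a trap that is marked.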
If we note information about calling conventions, we may be able to machine-generate stubs that call the real C handler with the appropriate args plucked from syn68k space, or even generate scheme-like code to allow syn68k or vcpu to *directly* call the C handler (when not patched out).

We could even annotate the scheme entry for each trap with information about what logging actions should be taken for that trap when in debugging mode; machine-generated glue that sits between the trap invocation and the C function can handle the tracing appropriately.

Right now we handle patched out traps in a slow and voluminous way, with a large ?: expression for each C call to anything which is a trap (see rsys/stubify.h). Instead, we could have a big table of function pointers, which by default point to our C handler, but alternately point to machine-generated stubs that call the patched out code. The stubs can even be placed in a library (.a) so only the stubs we possibly call internally will take any space in the executable.

One way to avoid having a separate stub for each trap which we can call is to have the function pointer in the table take an extra uint32 argument at the end. This argument would be ignored by the "real" handler (C calling convention ignores extra args), but the c-to-p stub could use that uint32 as the address for the patched out code to call. So we'd need one c-to-p stub for each trap call signature (arg types + return type), rather than for each trap.

The problem with this approach is apps who slime the trap tables directly, without calling SetTrapAddress(). Do people do this? We don't handle it now because our tables are in the wrong byte order, and (I think) not at the right address (Quark examines address 0x1008). We could catch this sliminess by write-protecting the trap tables...hmmm. Even if we don't do that, we could still have stubs which check to see if the trap is patched out, and if not then just call the real routine.
That would allow us to punt the space taken by the ?: for every single invocation of the trap.

We may also be able to do other slimy things, like have legitimate C function pointers in the trap tables, and tell syn68k to compile jumps to each of those addresses specially or something (toss those addresses in its hash table). Byte swapping those function pointers confuses the issue though, since a swapped C function address may be legitimate 68k code...

Have a scheme-like file describing low globals. It would describe their name, type, address, and even initial contents on bootup, launch, whatever. Some variables of course will have initial contents too difficult to describe here...that's fine, just note that and let them get set up at runtime with special case C code. At compile time we can figure out what low memory should look like, and PackBits that into a C array which gets unpacked at runtime when we initialize mem. We can also put in trace information that describes what should be done when these variables are accessed under a debugging vcpu or under a debugging Executor with page 0 read/write trapped for logging purposes. This file could generate the appropriate lowglobals.s for a given host, and even make a .gdbinit that could print them (or "dump" them) in the appropriate byte-swapped format.

Implement code to allow P_ExitToShell (etc.) to be compile-time constants, and ask syn68k to put the callback at that address in particular. It should be possible to machine-generate this information. That way we can CLC() them, etc.

----------------------------------------------------------------------

Implement a "package" interface for things like vdriver, events, etc. This would allow us to:

1) Support multiple vdrivers, etc. simultaneously. Right now we have separate svgalib and x windows Linux Executors for no good reason.

2) Dynamically load packages we need, and leave unused packages on disk.

3) Release updates for things like sound, video, etc.
without an entire Executor binary.

4) Distribute the source to various packages (like the svgalib front end).

5) Make it easier for new engineers to implement ports to different systems. Once they implement the API, Executor should work.

6) Load initialization routines that won't be hit again into a separate area of memory, for performance reasons.

Each package class will have a unique name (e.g. "vdriver"), and each subclass will also have a unique name (e.g. "vga"). All global symbols for that package will be preceded with "vdriver_vga_", etc.

All code outside those packages will just be able to call "vdriver_shutdown". In reality that will be a macro for some entry of a function pointer struct (or array?) corresponding to that package. The real entry might point to "vdriver_vga_shutdown". HOWEVER, if only one package is compiled in for that particular category (e.g. we only build in "arch_i386_") then those macros will point directly to that package and no function pointer table will exist.

We'll also need to do something about globally visible symbols, e.g. "vdriver_width". There are a few ways to handle those. Probably they should just be tossed in a struct for that package.

We can support subclassing (useful for different types of vga, e.g. svgalib and VESA).

I think we should also facilitate NOPs by allowing packages to request a magic function pointer for a function that returns 0 and one that returns 1...we'll just share one instantiation of each of those functions.

With multiple packages being allowed, we can't use preprocessor conditionals in the same way we do now. For example, right now we have VDRIVER_DISPLAYED_IN_WINDOW; this will be #defined for X but not for svgalib...in the future we want both packages to be simultaneously enabled. I think the way to handle this may be to have preprocessor conditionals that say VDRIVER_{ANY,ALL,NONE}_DISPLAYED_IN_WINDOW, based on what subset of vdriver packages installed support that feature.
That way we'd know we don't need the "far pointer" special case code for either svgalib or X windows, but we do need it for VESA, etc.

Each package should be able to report its dependencies so we initialize the packages in the proper order.

Executor should have a list of required packages, like `arch', `os', `vdriver', `events', so we can tell at compile time if a configuration is not legitimate.

I think we want to use the package interface for things like block devices, so we can have ASPI, BIOS, mscdex, HFV all be separate packages. This is a little different than the previously described model, since we might have several instantiations of this package present at once. How do we handle this gracefully? If each package is a struct full of function pointers and global variables, we can just keep a pointer to the struct around, and call functions in that struct. Or, for special packages with multiple simultaneous instantiations, we can make the first arg to most calls of that package be a pointer to one of those structs with the function pointers, e.g. "blockdev_read (dev_info, buf, 0x200);" or whatever, which would be a macro for "dev_info->read (buf, 0x200)". I dunno. This is a bit confusing, since the natural approach involves having distinct structs for each device (e.g. two HFV's have two different structs) even though they use the same package to be processed. This would mean the structs couldn't be exactly the same as the generic package struct. Maybe that's not so bad.

Each package must provide:

    [package]_init(allow args, or should all be void?)
    [package]_shutdown (void)

I'm also toying with the idea of a stub function that tells whether the rest of the package should be dynamically linked into the executable. For example, an X11 package could see if X is running, and if not we wouldn't bother loading up the code for the rest of the package.

Should we just use ObjC or C++ for packages?
I'm concerned about the portability implications, although maybe ObjC is everywhere gcc is these days.

----------------------------------------------------------------------

[Note: I hacked up some of this stuff in ~mat/x86fetch.c]

A common operation right now is to dereference a handle pointing to a struct, grab some bytes in that handle, and swap them (an Hx operation). We could make Executor smaller (and faster on i486's) by having N machine-generated asm routines which do this for us...we'd have to be more consistent about using Hx and HxX then. The chosen asm routines could even use bswap or rorw;rorl;rorw depending on whether they were on an i486 or an i386. We can use gcc's register calling convention directive so calls to these routines are fast; however, it may be even better to just make these be inline asm `call' statements and make sure they smash no registers at all, so gcc doesn't have to save any regs when calling the subroutine. Question: should the return value come back in a different register than the input value?

To be more specific: we'll have a slew of tiny routines which take as an argument a handle in %eax, and each is hardcoded to return a byte, word or long offset N bytes from that dereferenced handle. Each routine can have an alternate entry point right before the main function; this entry point would also swap the handle first (useful when it was snarfed from a low global or whatever).

So anyway, one of these routines might look like:

    .align 4                    ; .align 16 under Linux, bleah
_fetch_swaph_offset12_swap32:
    bswap %eax                  ; swap handle itself
_fetch_offset12_swap32:
    movl (%eax),%eax            ; dereference handle
    bswap %eax                  ; put that in native order
    movl 12(%eax),%eax          ; fetch a long from that struct
    bswap %eax                  ; byte swap it
    ret                         ; and return

and on the i386 it would be generated as the following routine.
The first `jmp' is necessary so there are still two bytes between the two entry points (bswap takes 2 bytes).

    .align 4                    ; .align 16 under Linux, bleah
_fetch_swaph_offset12_swap32:
    jmp 2f                      ; need to do it this way, only got 2 bytes
_fetch_offset12_swap32:
    movl (%eax),%eax            ; dereference handle
    call 1f                     ; byte swap it, who cares about i386 speed
                                ; this hack keeps maximum routine size down
                                ; although if these are aligned % 16
                                ; and we have room, we can just inline
                                ; this byte swap. Right now this takes
                                ; 31 bytes, so there's no room.
    movl 12(%eax),%eax          ; fetch a long from that struct
1:  rorw $8,%ax                 ; byte swap it
    rorl $16,%eax
    rorw $8,%ax
    ret                         ; and return
2:  pushl $_fetch_offset12_swap32 ; fake ret address
    jmp 1b

Actually this is better done with xchgb on the 80386.

These functions would start out as the following routine with the same entry points, and get lazily compiled to either the i386 or the i486 version:

    .align 4
_fetch_swaph_offset12_swap32:
    jmp 2f
_fetch_offset12_swap32:
    pushl $_fetch_offset12_swap32       ; fake `ret' address
1:  pushl $12                           ; offset
    pushl $4                            ; value size
    pushl $_fetch_swaph_offset12_swap32 ; start of code to rewrite
; WARNING!!!!!!!!!!!!!! If we do the `asm' hack to call the fetch routine,
; so gcc doesn't save as many regs, we need to make SURE that
; setup_x86_fetch_routine saves ALL registers, perhaps with pushal/popal.
; we can't do that here since we're about to smash our own code.
; We can call a stub routine which does a pushal/popal around a call
; to C code to fill in the stub.
    jmp _setup_x86_fetch_routine
2:  pushl $_fetch_swaph_offset12_swap32 ; fake `ret' address
    jmp 1b

Here's one way to write the glue routine that preserves regs and calls a routine to create the real routine. Maybe the entire routine to lazily compile the translator should be written in assembly...it basically just slams out a few bytes and fills in some holes with the supplied operands.
_setup_x86_fetch_routine:
    pushal                      ; save all regs
    pushl 32(%esp)              ; copy args to stack
    pushl 40(%esp)
    pushl 48(%esp)
    call _create_x86_fetch_routine
    addl $12,%esp               ; pop off copied args
    popal                       ; restore regs
    addl $12,%esp               ; pop off old args
    ret

We could also have a routine that writes a value to such a dereferenced location.

We could put these routines in a library so we only link in the ones we need. Their initial code could set itself up appropriately for the i386 or i486 (self-modify). This way we only use the ones we need. I'm not sure how we'd tell gcc to call different functions based on offsetof() and sizeof() the field being dereferenced...tricky. Actually if we use asm() directive to call this we might just be able to make offsetof and sizeof be operands to the asm and have them get textually glommed into the called routine name. Hehheh. We could also easily machine-generate access macros for various structs, I guess.

----------------------------------------------------------------------

Using special macros which call subroutines to access and byte swap struct fields should help us keep Executor's size down on architectures with strict alignment constraints. Otherwise each struct field reference may involve more inline code to load individual bytes and munge them together. Instead we can store that code in a few subroutines.

Machine-generate accessor macros for structs. Not sure how useful this is, and it would take some effort to preserve our naming conventions. OTOH, we'd get consistent naming conventions when done, and if we ONLY used accessor macros, we'd be closer to compiling on non-gcc compilers. The code generated to access a struct field would be *(long *)((char *)&foo + 14) or whatever...it could do clever things when there are alignment problems on the host.

What would happen if we postprocessed our assembly and extracted all sequences of 6 insns or more that show up 4 or more times into a subroutine, and have everyone just call that?
(Be careful with anything that touches the stack, obviously). We could add magic asm directives to mark time-critical code when we shouldn't do this. And we could juke those shared subroutines to self-modify to use bswap when necessary...hmmm...

Use hardware acceleration for drawing lines whenever we know the accelerated line will be pixel for pixel correct.

Experiment with using pushl in blitter again. This used to lose when interrupts came in and used the screen for stack space, but that may have been before we started using V2 for everything (???)...DPMI might never use our stack for anything (false under WinNT?)

Replace all `NewHandle (...); memset (..., 0, ...);' with `NewHandleClear (...)'.

Use NewPtrSys() to rewrite code like this (from fileAccess.c):

    saveZone = TheZone;
    TheZone = SysZone;
    cachedir = (char *) NewPtr(dirnamelen+1+MAXNAMLEN+1);
    TheZone = saveZone;

NewPtrSysClear() for this code from main.c, etc.:

    TheZone = SysZone;
    UTableBase = (DCtlHandlePtr) (long) CL (NewPtr (sizeof (UTableBase[0].p) * NDEVICES));
    memset (CL (UTableBase), 0, sizeof (UTableBase[0].p) * NDEVICES);
    UnitNtryCnt = CW (NDEVICES);
    TheZone = ApplZone;

Modify qPicstuff picture drawing engine to run in `debug' mode and output readable text for the ops/data in the picture.

Use new consistent syntax for byte swapping and handle dereferencing operations, with chains of operations concatenated right-to-left ala LISP's CADDR, etc. So:

    W = swap 16 bit value and cast back to typeof.
    L = swap 32 bit value and cast back to typeof.
    P = swap 32 bit "Point" value (0x12345678 -> 0x34127856).
    D = dereference and swap
    F = get struct field (takes additional arg with field name)
    H = dereference, swap long, and get struct field (equivalent to `FD')

If the last letter is a `C', that means the input value is a compile-time constant. We might wish to preface each macro with a `C' as a standard prefix (for `convert'?) Otherwise we'd get macros like `W', which may be too short.
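For reference, the three primitive swaps in the letter list above (W, L and P) could be written in plain C roughly as follows. The function names swap_w, swap_l and swap_p are invented for this sketch, and it skips the "cast back to typeof" part, which would need gcc's typeof in a macro.

```c
#include <stdint.h>

/* W: swap the two bytes of a 16 bit value. */
static uint16_t
swap_w (uint16_t v)
{
  return (uint16_t) ((v << 8) | (v >> 8));
}

/* L: reverse all four bytes of a 32 bit value. */
static uint32_t
swap_l (uint32_t v)
{
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8)
         | ((v >> 8) & 0x0000FF00u) | (v >> 24);
}

/* P: a Point is two 16 bit halves, so swap the bytes within each
 * half but leave the halves where they are. */
static uint32_t
swap_p (uint32_t v)
{
  return ((v & 0x00FF00FFu) << 8) | ((v >> 8) & 0x00FF00FFu);
}
```

The longer names in the scheme (D, F, H and the chained forms) would then be built by nesting these with a dereference or a field access.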
Examples, mapping old syntax to new:

    CWC(n) -> CWC(n)
    CL(n) -> CL(n)
    Hx(h, field) -> CWH(h, field) or CLH(h, field), depending on size
    HxX(CL(MainDevice), gdPMap) -> CHL(MainDevice, gdPMap)
    Hx(CL(MainDevice), gdPMap) -> CLHL(MainDevice, gdPMap)
    Hx(Hx(CL(MainDevice), gdPMap), baseAddr) -> CLHLHL(MainDevice, gdPMap, baseAddr)

Any or all of these routines can be inline assembly, if the host architecture provides them. Any not provided will be automatically provided or synthesized by nesting low level routines (e.g. CHL can be written as CH(CL(...))).

See also the idea above which describes how to write Hx (==CLHL for long fields) on the x86 to take advantage of the 486 bswap and make Executor smaller.

Need an OS-specific function that hints "now is a good time to yield to another process", for busy loops, WaitNextEvent, etc.

Create a routine that makes a new rectangular region, and use it instead of separate code to create and then set up the region?

Can the BOOLEANRET type go away now? What's it for?

PackBits the data in color_wheel_bits.c, and uncompress it on the fly?

A 68k interpreter that had nice debugging features:

    - 68k breakpoints
    - 68k watchpoints
        - reading any memory location or range of memory locations
        - writing any memory location or range of memory locations
        - values appearing in registers
    - single stepping
    - identifying recent conditional branches that got us here
    - backtrace facility that logs jsr's and rts's and so can backtrace even when there's no frame pointer

We need a consistent naming convention for variables which hold big endian values, perhaps ending such a variable with a "_be" or something. But would this apply to Mac struct fields as well? Blah. Renaming all the Mac struct fields sounds like a bad idea. But always being conscious of endianness is probably a good thing. Perhaps just append a "_be" to variables, but not to struct fields? Hmmm. What about Mac low globals?
I guess they sorta stand out because they use a different capitalization convention anyway.

Use memcpy for code like this (from menu.c), or perhaps a convenience function that does a NewHandle+memcpy:

    h = NewHandle(hsize);
    sp = (char *) STARH(mh) + soff;
    dp = (char *) STARH(h);
    ep = sp + hsize;
    while (sp != ep)
        *dp++ = *sp++;

Gotta be a little careful here, because if we pass STARH(mh) to the NewHandle convenience function, that pointer may be invalidated if memory moves during the real NewHandle operation.

In many places we have (now) pointless casts to Size. These should go away.

    BlockMove((Ptr) tm, (Ptr) &tml, (Size) sizeof(tml));

Prune down common.h as much as possible.

This qPicstuff.c macro is bogus:

    #define SE(x) ((x & 0x80) ? x|(~0^0xff) : x & 0xff) /* sign extend */

It doesn't parenthesize its argument, evaluates its argument multiple times, and could instead be "#define SE(x) ((int) (int8) (x))"

If we split up the Executor source tree into directories, it might make sense to have a separate header file for accessor macros for each data type, with the form TypeName_accessors.h, e.g.:

    #include "Region_accessors.h"

Make a RECT_MID_Y and RECT_MID_X that compute the center coordinates of a rect.

Make NumToString just call sprintf. Make StringToNum call atoi, atol, or sscanf...

ROMlib_setuid seems unnecessary now that we don't have the mshort hack. DJGPP has `setuid'. ROMlib_seteuid can also just be replaced with seteuid.

If low globals were done with macros, like (*(INTEGER *)0x3CA) or whatever, this would actually generate better code sometimes given Mat's recent gcc mods. The modified gcc can tell that an address is long-aligned and do good things.
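Going back to the qPicstuff.c SE() item above, here is a small sketch of the cast-based replacement. It uses int8_t from <stdint.h> as a stand-in for the int8 typedef (an assumption about how that type is defined), and it relies on the compiler wrapping out-of-range values when narrowing to a signed 8 bit type, which gcc does.

```c
#include <stdint.h>

/* Sign extend the low byte of x.  The argument is parenthesized and
 * evaluated exactly once, unlike the ?: version.  Narrowing a value
 * above 127 to int8_t is implementation-defined in C; gcc wraps it
 * modulo 256, which is the behavior wanted here. */
#define SE(x) ((int) (int8_t) (x))
```

Because the argument is only evaluated once, something like SE(*p++) is safe with this version, which it was not with the original.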