{ Created: Wednesday, November 1, 1989 DisAsmLookup.p Pascal Interface to the Macintosh Libraries Copyright Apple Computer, Inc. 1987-1991 All rights reserved This file is used in these builds: ROM System Change History (most recent first): <3> 8/8/91 JL Updated Copyright. <2> 3/13/91 JL Checked in MPW version. To Do: } {$IFC UNDEFINED UsingIncludes} {$SETC UsingIncludes := 0} {$ENDC} {$IFC NOT UsingIncludes} UNIT DisAsmLookup; INTERFACE {$ENDC} {$IFC UNDEFINED UsingDisAsmLookup} {$SETC UsingDisAsmLookup := 1} {$I+} {$SETC DisAsmLookupIncludes := UsingIncludes} {$SETC UsingIncludes := 1} {$IFC UNDEFINED UsingTypes} {$I $$Shell(PInterfaces)Types.p} {$ENDC} {$SETC UsingIncludes := DisAsmLookupIncludes} TYPE LookupRegs = (_A0_, _A1_, _A2_, _A3_, _A4_, _A5_, _A6_, _A7_, _PC_, _ABS_, _TRAP_, _IMM_); (*----------------------------------------------------------------------*) PROCEDURE Disassembler( DstAdjust: LongInt; {addr correction} VAR BytesUsed: Integer; {bytes used up } FirstByte: UNIV Ptr; {starting byte } VAR Opcode: UNIV Str255; {mnemonic } VAR Operand: UNIV Str255; {operand } VAR Comment: UNIV Str255; {comment } LookupProc: UNIV Ptr); {search proc } (* Disassembler is a Pascal routine to be called to disassemble a sequence of bytes. All MC68xxx, MC68881, and MC68851 instructions are supported. The sequence of bytes to be disassembled are pointed to by FirstByte. BytesUsed bytes starting at FirstByte are consumed by the disassembly, and the Opcode, Operand, and Comment strings returned as NULL TERMINATED Pascal strings (for easier manipulation with C). The caller is then free to format or use the output strings any way appropriate to the application. Depending on the opcode and effective address(s) (EA's) to be disassembled, the Opcode, Operand, and Comment strings contain the following information: Case Opcode Operand Comment ======================================================================= Non PC-relative EA's op.sz EA's ; 'cÉ' (for immediates) PC-relative EA's op.sz EA's ; address Toolbox traps DC.W $AXXX ; TB XXXX OS traps DC.W $AXXX ; OS XXXX Invalid bytes DC.W $XXXX ; ???? ======================================================================= For valid disassembly of processor instructions the appropriate MC68XXX opcode mnemonic is generated for the Opcode string along with a size attribute when required. The source and destination EA's are generated as the Operand along with a possible comment. Comments start with a ';'. Traps use a DC.W assembler directive as the Opcode with the trap word as the Operand and a comment indicating whether the trap is a toolbox or OS trap and what the trap number is. As described later the caller can generate symbolic substitutions into EA's and provide names for traps. Invalid instructions cause the string 'DC.W' to be returned in the Opcode string. Operand is '$XXXX' (the invalid word) with a comment of '; ????'. BytesUsed is 2. This is similar to the trap call case except for the comment. Note, the Operand EA's is syntatically similar to but NOT COMPATIBLE with the MPW assembler! This is because the Disassembler generates byte hex constants as "$XX" and word hex constants as "$XXXX". Negative values (e.g., $FF or $FFFF) produced by the Disassembler are treated as long word values by the MPW assembler. Thus it is assumed that Disassembler output will NOT be used as MPW assembler input. If that is the goal, then the caller must convert strings of the form $XX or $XXXX in the Operand string to their decimal equivalent. The routine ModifyOperand is provided in this unit to aid with the conversion process. Since a PC-relative comment is an address, the only address that the Disassembler knows about is the address of the code pointed to by FirstByte. Generally, that may be a buffer that has no relation to "reality", i.e., the actual code loaded into the buffer. Therefore, to allow the address comment to be mapped back to some actual address the caller may specify an adjustment factor, specified by DstAdjust, that is ADDED to the value that normally would be placed in the comment. Operand effective address strings are generated as a function of the effective address mode and a special case is made for A-trap opcode strings. In places where a possible symbolic reference could be substituted for an address (or a portion of an address), the Disassembler can call a user specified routine to do the substitution (using the LookupProc parameter described later). The following table summarizes the generated effective addresses and where symbolic substitutions (S) can be made: Mode Generated Effective Address Effective Address with Substitution ======================================================================== 0 Dn Dn 1 An An 2 (An) (An) 3 (An)+ (An)+ 4 -(An) -(An) 5 ¶(An) S(An) or just S (if An=A5, ¶³0) 6n ¶(An,Xn.Size*Scale) S(An,Xn.Size*Scale) 6n (BD,An,Xn.Size*Scale) (S,An,Xn.Size*Scale) 6n ([BD,An],Xm.Size*Scale,OD) ([S,An],Xm.Size*Scale,OD) 6n ([BD,An,Xn.Size*Scale],OD) ([S,An,Xn.Size*Scale],OD) 70 ¶ S 71 ¶ S 72 *±¶ S 73 *±¶(Xn.Size*Scale) S(Xn.Size*Scale) 73 (*±¶,Xn.Size*Scale) (S,Xn.Size*Scale) 73 ([*±¶],Xm.Size*Scale,OD) ([S],Xm.Size*Scale,OD) 73 ([*±¶,Xn.Size*Scale],OD) ([S,Xn.Size*Scale],OD) 74 #data S (#data made comment) A-traps $AXXX S (as opcode, AXXX made comment) ======================================================================== For A-traps, the substitution can be performed to substitute for the DC.W opcode string. If the substitution is made then the Disassembler will generate ,Sys and/or ,Immed flags as operands for Toolbox traps and ,AutoPop for OS traps when the bits in the trap word indicates these settings. | Generated | Substituted | Opcode Operand Comment | Opcode Operand Comment ======================================================================== Toolbox | DC.W $AXXX ; TB XXXX | S [,Sys][,Immed] ; AXXX OS | DC.W $AXXX ; OS XXXX | S [,AutoPop] ; AXXX ======================================================================== All displacements (¶, BD, OD) are hexadecimal values shown as a byte ($XX), word ($XXXX), or long ($XXXXXXXX) as appropriate. The *Scale is suppressed if 1. The Size is W or L. Note that effective address substitutions can only be made for "¶(An)", "BD,An", and "*±¶" cases. For all the effective address modes 5, 6n, 7n, and for A-traps, a coroutine (a procedure) whose address is specified by the LookupProc parameter is called by the Disassembler (if LookupProc is not NIL) to do the substitution (or A-trap comment) with a string returned by the proc. It is assumed that the proc pointed to by LookupProc is a level 1 Pascal proc declared as follows: PROCEDURE Lookup( PC: UNIV Ptr; {Addr of extension/trap word} BaseReg: LookupRegs; {Base register/lookup mode } Opnd: UNIV LongInt; {Trap word, PC addr, disp. } VAR S: Str255); {Returned substitution } or in C, pascal void LookUp(Ptr PC, /* Addr of extension/trap word */ LookupRegs BaseReg, /* Base register/lookup mode */ long Opnd, /* Trap word, PC addr, disp. */ char *S); /* Returned substitution */ PC = Pointer to instruction extension word or A-trap word in the buffer pointed to by the Disassembler's FirstByte parameter. BaseReg = This determines the meaning of the Opnd value and supplies the base register for the "¶(An)", "BD,An", and "*±¶" cases. BaseReg may contain any one of the following values: _A0_ = 0 ==> A0 _A1_ = 1 ==> A1 _A2_ = 2 ==> A2 _A3_ = 3 ==> A3 _A4_ = 4 ==> A4 _A5_ = 5 ==> A5 _A6_ = 6 ==> A6 _A7_ = 7 ==> A7 _PC_ = 8 ==> PC-relative (special case) _ABS_ = 9 ==> Abs addr (special case) _TRAP_ = 10 ==> Trap word (special case) _IMM_ = 11 ==> Immediate (special case) For absolute addressing (modes 70 and 71), BaseReg contains _ABS_. For A-traps, BaseReg would contain _TRAP_. For immediate data (mode 74), BaseReg would contain _IMM_. Opnd = The contents of this LongInt is determined by the BaseReg parameter just described. For BaseReg = _IMM_ (immediate data): Opnd contains the (extended) 32-bit immediate data specified by the instruction. For BaseReg = _TRAP_ (A-traps): Opnd is the entire trap word. The high order 16 bits of Opnd are zero. For BaseReg = _ABS_ (absolute effective address): Opnd contains the (extended) 32-bit address specifed by the instruction's effective address. Such addresses would generally be used to reference low memory globals on a Macintosh. For BaseReg = _PC_ (PC-relative effective address): Opnd contains the 32-bit address represented by "*±¶" adjusted by the Disassembler's DstAdjust parameter. For BaseReg = _An_ (effective address with a base register): Opnd contains the (sign-extended) 32-bit (base) displacement from the instruction's effective address. In the Macintosh environment, a BaseReg specifying A5 implies either global data references or Jump Table references. Positive Opnd values with an A5 BaseReg thus mean Jump Table references, while a negative offset would mean a global data reference. Base registers of A6 or A7 would usually mean local data. S = Pascal string returned from Lookup containing the effective address substitution string or a trap name for A-traps. S is set to null PRIOR to calling Lookup. If it is still null on return, the string is not used. If not null, then for A-traps, the returned string is used as the opcode string. In all other cases the string is substituted as shown in the above table. Depending on the application, the caller has three choices on how to use the Disassembler and an associated Lookup proc: (1). The caller can call just the Disassembler and provide his own Lookup proc. In that case the calling conventions discussed above must be followed. (2). The caller can provide NIL for the LookupProc parameter, in which case, NO Lookup proc will be called. (3). The caller can call first InitLookup (described below, a proc provided with this unit) and pass the address of this unit's standard Lookup proc when Disassembler is called. In this case all the control logic to determine the kind of substitution to be done is provided for the caller and all that need to be provided by the user are routines to look up any or all of the following: ¥ PC-relative references ¥ Jump Table references ¥ Absolute address references ¥ Trap names ¥ Immediate data names ¥ References with offsets from base registers *) PROCEDURE InitLookup(PCRelProc, JTOffProc, TrapProc, AbsAddrProc, IdProc, ImmDataProc: UNIV Ptr); {Prepare for use of this unit's Lookup proc. When Disassembler is called and the address of this unit's Lookup proc is specified, then for immediate data, PC-relative, Jump Table references, A-traps, absolute addresses, and offsets from a base register, the associated level 1 Pascal proc specified here is called (if not NIL -- all six addresses are preset to NIL). The calls assume the following declarations for these procs (see Lookup, below for further details): PROCEDURE PCRelProc(Address: UNIV LongInt; VAR S: UNIV Str255); PROCEDURE JTOffProc(A5JTOffset: UNIV Integer; VAR S: UNIV Str255); PROCEDURE TrapNameProc(TrapWord: UNIV Integer; VAR S: UNIV Str255); PROCEDURE AbsAddrProc(AbsAddr: UNIV LongInt; VAR S: UNIV Str255); PROCEDURE IdProc(BaseReg: LookupRegs; Offset: UNIV LongInt; VAR S: UNIV Str255); PROCEDURE ImmDataProc(ImmData: UNIV LongInt; VAR S: UNIV Str255); Note: InitLookup contains initialized data which requires initializing at load time (this is of concern only to users with assembler main programs.} PROCEDURE Lookup( PC: UNIV Ptr; {Addr of extension/trap word} BaseReg: LookupRegs; {Base register/lookup mode } Opnd: UNIV LongInt; {Trap word, PC addr, disp. } VAR S: Str255); {Returned substitution } {This is a standard Lookup proc available to the caller for calls to the Disassembler. If the caller elects to use this proc, then InitLookup MUST be called prior to any calls to the Disassembler. All the logic to determine the type of lookup is done by this proc. For PC-relative, Jump Table references, A-traps, absolute addresses, and offsets from a base register, the associated level 1 Pascal proc specified in the InitLookup call (if not NIL) is called. This scheme simplifies the Lookup mechanism by allowing the caller to deal with just the problems related to the application.} PROCEDURE LookupTrapName(TrapWord: UNIV Integer; VAR S: UNIV Str255); {This is a procedure provided to allow conversion of a trap instruction (in TrapWord) to its corresponding trap name (in S). It is provided primarily for use with the Disassembler and its address may be passed to InitLookup above for use by this unit's Lookup routine. Alternatively, there is nothing prohibiting the caller from using it directly for other purposes or by some other Lookup proc. Note: The tables in this proc make the size of this proc about 9500 bytes. The trap names are fully spelled out in upper and lower case.} PROCEDURE ModifyOperand(VAR Operand: UNIV Str255); {Scan an operand string, i.e., the null terminated Pascal string returned by the Disassembler (null MUST be present here) and modify negative hex values to negated positive value. For example, $FFFF(A5) would be modified to -$0001(A5). The operand to be processed is passed as the function's parameter which is edited "in place" and returned to the caller. This routine is essentially a pattern matcher and attempts to only modify 2, 4, and 8 digit hex strings in the operand that "might" be offsets from a base register. If the matching tests are passed, the same number of original digits are output (because that indicates a value's size -- byte, word, or long). For a hex string to be modified, the following tests must be passed: ¥ There must have been exactly 2, 4, or 8 digits. Only hex strings $XX, $XXXX, and $XXXXXXXX are possible candidates because that is the only way the Disassembler generates offsets. ¥ Hex string must be delimited by a "(" or a ",". The "(" allows offsets for $XXXX(An,...) and $XX(An,Xn) addressing modes. The comma allows for the MC68020 addressing forms. ¥ The "$X..." must NOT be preceded by a "±". This eliminates the possibility of modifying the offset of a PC-relative addressing mode always generated in the form "*±$XXXX". ¥ The "$X..." must NOT be preceded by a "#". This eliminates modifying immediate data. ¥ Value must be negative. Negative values are the only values we modify. A value $FFFF is modified to -$0001.} FUNCTION validMacsBugSymbol(symStart, limit: UNIV Ptr; symbol: StringPtr): StringPtr; C; {Check that the bytes pointed to by symStart represents a valid MacsBug symbol. The symbol must be fully contained in the bytes starting at symStart, up to, but not including, the byte pointed to by the limit parameter. If a valid symbol is NOT found, then NIL is returned as the function's result. However, if a valid symbol is found, it is copied to symbol (if it is not NIL) as a null terminated Pascal string, and return a pointer to where we think the FOLLOWING module begins. In the "old style" cases (see below) this will always be 8 or 16 bytes after the input symStart. For new style Apple Pascal and C cases this will depend on the symbol length, existence of a pad byte, and size of the constant (literal) area. In all cases, trailing blanks are removed from the symbol. A valid MacsBug symbol consists of the characters '_', '%', spaces, digits, and upper/lower case letters in a format determined by the first two bytes of the symbol as follows: 1st byte | 2nd byte | Byte | Range | Range | Length | Comments ======================================================================= $20 - $7F | $20 - $7F | 8 | "Old style" MacsBug symbol format $A0 - $7F | $20 - $7F | 8 | "Old style" MacsBug symbol format ----------------------------------------------------------------------- $20 - $7F | $80 - $FF | 16 | "Old style" MacApp symbol ab==>b.a $A0 - $7F | $80 - $FF | 16 | "Old style" MacApp symbol ab==>b.a ----------------------------------------------------------------------- $80 | $01 - $FF | n | n = 2nd byte (Apple symbol) $81 - $9F | $00 - $FF | m | m = BAnd(1st byte,$7F) (Apple symbol) ======================================================================= The formats are determined by whether bit 7 is set in the first and second bytes. This bit will removed when we find it or'ed into the first and/or second valid symbol characters. The first two formats in the above table are the basic "old style" (pre- existing) MacsBug formats. The first byte may or may not have bit 7 set the second byte is a valid symbol character. The first byte (with bit 7 removed) and the next 7 bytes are assumed to comprise the symbol. The second pair of formats are also "old style" formats, but used for MacApp symbols. Bit 7 set in the second character indicates these formats. The symbol is assumed to be 16 bytes with the second 8 bytes preceding the first 8 bytes in the generated symbol. For example, 12345678abcdefgh represents the symbol abcdefgh.12345678. The last pair of formats are reserved by Apple and generated by the MPW Pascal and C compilers. In these cases the value of the first byte is always between $80 and $9F, or with bit 7 removed, between $00 and $1F. For $00, the second byte is the length of the symbol with that many bytes following the second byte (thus a max length of 255). Values $01 to $1F represent the length itself. A pad byte may follow these variable length cases if the symbol does not end on a word boundary. Following the symbol and the possible pad byte is a word containing the size of the constants (literals) generated by the compiler. Note that if symStart actually does point to a valid MacsBug symbol, then you may use showMacsBugSymbol to convert the MacsBug symbol bytes to a string that could be used as a DC.B operand for disassembly purposes. This string explicitly shows the MacsBug symbol encodings.} FUNCTION endOfModule(address, limit: UNIV Ptr; symbol: StringPtr; VAR nextModule: UNIV Ptr): StringPtr; C; {Check to see if the specified memory address, contains a RTS, JMP (A0) or RTD #n instruction immediately followed by a valid MacsBug symbol. These sequences are the only ones which can determine an end of module when MacsBug symbols are present. During the check, the instruction and its following MacsBug symbol must be fully contained in the bytes starting at the specified address parameter, up to, but not including, the byte pointed to by the limit parameter. If the end of module is NOT found, then NIL is returned as the function's result. However, if a end of module is found, the MacsBug symbol is returned in symbol (if it is not NIL) as a null terminated Pascal string (with trailing blanks removed), and the functions returns the pointer to the start of the MacsBug symbol (i.e., address+2 for RTS or JMP (A0) and address+4 for RTD #n). This address may then be used as in input parameter to showMacsBugSymbol to convert the MacsBug symbol to a Disassembler operand string. Also returned in nextModule is where we think the FOLLOWING module begins. In the "old style" cases (see validMacsBugSymbol) this will always be 8 or 16 bytes after the input address. For new style the Apple Pascal and C cases this will depend on the symbol length, existence of a pad byte, and size of the constant (literal) area. See validMacsBugSymbol for a description of valid MacsBug symbol formats.} FUNCTION showMacsBugSymbol(symStart, limit: UNIV Ptr; operand: StringPtr; VAR bytesUsed: Integer): StringPtr; C; {Format a MacsBug symbol as a operand of a DC.B directive. The first one or two bytes of the symbol are generated as $80+'c' if they have there high high bits set. All other characters are shown as characters in a string constant. The pad byte, if present, is one is also shown as $00. When called, showMacsBugSymbol assumes that symStart is pointing at a valid MacsBug symbol as validated by the validMacsBugSymbol or endOfModule routines. As with validMacsBugSymbol, the symbol must be fully contained in the bytes starting at symStart up to, but not including, the byte pointed to by the limit parameter. The string is returned in the 'operand' parameter as a null terminated Pascal string. The function also returns a pointer to this string as its return value (NIL is returned only if the byte pointed to by the limit parameter is reached prior to processing the entire symbol -- which should not happen if properly validated). The number of bytes used for the symbol is returned in bytesUsed. Due to the way MacsBug symbols are encoded, bytesUsed may not necessarily be the same as the length of the operand string. A valid MacsBug symbol consists of the characters '_', '%', spaces, digits, and upper/lower case letters in a format determined by the first two bytes of the symbol as described in the validMacsBugSymbol routine.} {$ENDC} { UsingDisAsmLookup } {$IFC NOT UsingIncludes} END. {$ENDC}