supermario/base/SuperMarioProj.1994-02-09/Interfaces/PInterfaces/DisAsmLookUp.p
2019-06-29 23:17:50 +08:00


{
Created: Wednesday, November 1, 1989
DisAsmLookup.p
Pascal Interface to the Macintosh Libraries
Copyright Apple Computer, Inc. 1987-1991
All rights reserved
This file is used in these builds: ROM System
Change History (most recent first):
<3> 8/8/91 JL Updated Copyright.
<2> 3/13/91 JL Checked in MPW version.
To Do:
}
{$IFC UNDEFINED UsingIncludes}
{$SETC UsingIncludes := 0}
{$ENDC}
{$IFC NOT UsingIncludes}
UNIT DisAsmLookup;
INTERFACE
{$ENDC}
{$IFC UNDEFINED UsingDisAsmLookup}
{$SETC UsingDisAsmLookup := 1}
{$I+}
{$SETC DisAsmLookupIncludes := UsingIncludes}
{$SETC UsingIncludes := 1}
{$IFC UNDEFINED UsingTypes}
{$I $$Shell(PInterfaces)Types.p}
{$ENDC}
{$SETC UsingIncludes := DisAsmLookupIncludes}
TYPE
LookupRegs = (_A0_, _A1_, _A2_, _A3_, _A4_, _A5_, _A6_, _A7_,
_PC_, _ABS_, _TRAP_, _IMM_);
(*----------------------------------------------------------------------*)
PROCEDURE Disassembler( DstAdjust: LongInt; {addr correction}
VAR BytesUsed: Integer; {bytes used up }
FirstByte: UNIV Ptr; {starting byte }
VAR Opcode: UNIV Str255; {mnemonic }
VAR Operand: UNIV Str255; {operand }
VAR Comment: UNIV Str255; {comment }
LookupProc: UNIV Ptr); {search proc }
(*
Disassembler is a Pascal routine to be called to disassemble a sequence
of bytes. All MC68xxx, MC68881, and MC68851 instructions are supported.
The sequence of bytes to be disassembled is pointed to by FirstByte.
BytesUsed bytes starting at FirstByte are consumed by the disassembly,
and the Opcode, Operand, and Comment strings returned as NULL TERMINATED
Pascal strings (for easier manipulation with C). The caller is then free
to format or use the output strings any way appropriate to the
application.
Depending on the opcode and effective address(es) (EA's) to be
disassembled, the Opcode, Operand, and Comment strings contain the
following information:
Case                    Opcode  Operand  Comment
=======================================================================
Non PC-relative EA's    op.sz   EA's     ; 'c…' (for immediates)
PC-relative EA's        op.sz   EA's     ; address
Toolbox traps           DC.W    $AXXX    ; TB XXXX
OS traps                DC.W    $AXXX    ; OS XXXX
Invalid bytes           DC.W    $XXXX    ; ????
=======================================================================
For valid disassembly of processor instructions the appropriate MC68XXX
opcode mnemonic is generated for the Opcode string along with a size
attribute when required. The source and destination EA's are generated
as the Operand along with a possible comment. Comments start with a ';'.
Traps use a DC.W assembler directive as the Opcode with the trap word
as the Operand and a comment indicating whether the trap is a toolbox or
OS trap and what the trap number is. As described later, the caller can
generate symbolic substitutions into EA's and provide names for traps.
Invalid instructions cause the string 'DC.W' to be returned in the
Opcode string. Operand is '$XXXX' (the invalid word) with a comment of
'; ????'. BytesUsed is 2. This is similar to the trap call case except
for the comment.
Note that the Operand EA's are syntactically similar to, but NOT COMPATIBLE
with, the MPW assembler! This is because the Disassembler generates
byte hex constants as "$XX" and word hex constants as "$XXXX". Negative
values (e.g., $FF or $FFFF) produced by the Disassembler are treated as
long word values by the MPW assembler. Thus it is assumed that
Disassembler output will NOT be used as MPW assembler input. If that is
the goal, then the caller must convert strings of the form $XX or $XXXX
in the Operand string to their decimal equivalent. The routine
ModifyOperand is provided in this unit to aid with the conversion
process.
A PC-relative comment is an address, but the only address the
Disassembler knows about is that of the code pointed to by FirstByte.
Generally, that is a buffer whose address has no relation to "reality",
i.e., to where the code loaded into the buffer actually resides.
Therefore, to allow the address comment to be mapped back to an actual
address, the caller may specify an adjustment factor, DstAdjust, that
is ADDED to the value that normally would be placed in the comment.
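As a worked illustration, here is a small C sketch. The helper names are hypothetical, and addresses are modeled as plain 32-bit integers rather than taken from the Disassembler's code. If code that really lives at realBase has been copied into a buffer at bufBase, passing DstAdjust = realBase - bufBase makes a target computed inside the buffer come out as the real address.

```c
#include <assert.h>
#include <stdint.h>

/* DstAdjust to pass when code that really lives at realBase has been
   copied into a buffer at bufBase (hypothetical helper). */
static int32_t dstAdjustFor(uint32_t realBase, uint32_t bufBase)
{
    return (int32_t)(realBase - bufBase);    /* negative if the buffer is higher */
}

/* Address shown in a PC-relative comment: the target computed from the
   buffer contents plus DstAdjust, with 32-bit wraparound as on the 68000. */
static uint32_t shownAddress(uint32_t bufTarget, int32_t dstAdjust)
{
    return bufTarget + (uint32_t)dstAdjust;
}
```

For example, code copied from $00012000 into a buffer at $00400000 needs a DstAdjust of -$003EE000. A target found at $00400234 in the buffer is then reported as $00012234.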
Operand effective address strings are generated as a function of the
effective address mode and a special case is made for A-trap opcode
strings. In places where a possible symbolic reference could be
substituted for an address (or a portion of an address), the Disassembler
can call a user specified routine to do the substitution (using the
LookupProc parameter described later). The following table summarizes
the generated effective addresses and where symbolic substitutions (S)
can be made:
Mode     Generated Effective Address    Effective Address with Substitution
========================================================================
0        Dn                             Dn
1        An                             An
2        (An)                           (An)
3        (An)+                          (An)+
4        -(An)                          -(An)
5        ∂(An)                          S(An) or just S (if An=A5, 0)
6n       ∂(An,Xn.Size*Scale)            S(An,Xn.Size*Scale)
6n       (BD,An,Xn.Size*Scale)          (S,An,Xn.Size*Scale)
6n       ([BD,An],Xm.Size*Scale,OD)     ([S,An],Xm.Size*Scale,OD)
6n       ([BD,An,Xn.Size*Scale],OD)     ([S,An,Xn.Size*Scale],OD)
70                                      S
71                                      S
72       *±∂                            S
73       *±∂(Xn.Size*Scale)             S(Xn.Size*Scale)
73       (*±∂,Xn.Size*Scale)            (S,Xn.Size*Scale)
73       ([*±∂],Xm.Size*Scale,OD)       ([S],Xm.Size*Scale,OD)
73       ([*±∂,Xn.Size*Scale],OD)       ([S,Xn.Size*Scale],OD)
74       #data                          S (#data made comment)
A-traps  $AXXX                          S (as opcode, AXXX made comment)
========================================================================
For A-traps, the substitution replaces the DC.W opcode string. If the
substitution is made, the Disassembler also generates the ,Sys and/or
,Immed flags as operands for OS traps, and ,AutoPop for Toolbox traps,
when the corresponding bits in the trap word are set.
          |         Generated            |         Substituted
          | Opcode  Operand  Comment    | Opcode  Operand         Comment
========================================================================
Toolbox   | DC.W    $AXXX    ; TB XXXX  | S       [,AutoPop]      ; AXXX
OS        | DC.W    $AXXX    ; OS XXXX  | S       [,Sys][,Immed]  ; AXXX
========================================================================
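The trap word itself encodes whether a trap is a Toolbox or an OS trap and which flags apply. The C sketch below decodes a trap word using the standard A-trap layout documented in Inside Macintosh:
bit 11 selects Toolbox (1) or OS (0);
for Toolbox traps, bit 10 is the auto-pop bit and bits 0-9 are the trap number;
for OS traps, bits 10 and 9 are the ,Sys and ,Immed flags and bits 0-7 are the trap number.
The names are hypothetical. The exact bit tests the Disassembler performs are an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int      isToolbox;  /* bit 11: 1 = Toolbox trap, 0 = OS trap  */
    unsigned number;     /* Toolbox: bits 0-9; OS: bits 0-7        */
    int      autoPop;    /* Toolbox only: bit 10                   */
    int      sys;        /* OS only: bit 10, the Sys flag          */
    int      immed;      /* OS only: bit 9, the Immed flag         */
} TrapInfo;

static int isATrap(uint16_t word) { return (word & 0xF000u) == 0xA000u; }

static TrapInfo decodeTrap(uint16_t word)
{
    TrapInfo t = {0, 0, 0, 0, 0};
    assert(isATrap(word));
    t.isToolbox = (word & 0x0800u) != 0;
    if (t.isToolbox) {
        t.number  = word & 0x03FFu;
        t.autoPop = (word & 0x0400u) != 0;
    } else {
        t.number = word & 0x00FFu;
        t.sys    = (word & 0x0400u) != 0;
        t.immed  = (word & 0x0200u) != 0;
    }
    return t;
}

/* Operand flags that would follow a substituted trap name
   (out must hold at least 16 bytes). */
static void trapFlags(uint16_t word, char *out)
{
    TrapInfo t = decodeTrap(word);
    out[0] = '\0';
    if (t.autoPop) strcat(out, ",AutoPop");
    if (t.sys)     strcat(out, ",Sys");
    if (t.immed)   strcat(out, ",Immed");
}
```

For example, $A522 (_NewHandle with the Sys bit set) decodes as OS trap $22 with ,Sys. $A9F0 (_LoadSeg) decodes as Toolbox trap $1F0 with no flags.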
All displacements (∂, BD, OD) are hexadecimal values shown as a byte
($XX), word ($XXXX), or long ($XXXXXXXX) as appropriate. The *Scale is
suppressed if 1. The Size is W or L. Note that effective address
substitutions can only be made for "∂(An)", "BD,An", and "*±∂" cases.
For all the effective address modes 5, 6n, 7n, and for A-traps, a
coroutine (a procedure) whose address is specified by the LookupProc
parameter is called by the Disassembler (if LookupProc is not NIL) to
do the substitution (or A-trap comment) with a string returned by the
proc. It is assumed that the proc pointed to by LookupProc is a level 1
Pascal proc declared as follows:
PROCEDURE Lookup( PC: UNIV Ptr; {Addr of extension/trap word}
BaseReg: LookupRegs; {Base register/lookup mode }
Opnd: UNIV LongInt; {Trap word, PC addr, disp. }
VAR S: Str255); {Returned substitution }
or in C,
pascal void LookUp(Ptr PC, /* Addr of extension/trap word */
LookupRegs BaseReg, /* Base register/lookup mode */
long Opnd, /* Trap word, PC addr, disp. */
char *S); /* Returned substitution */
PC = Pointer to instruction extension word or A-trap word in the
buffer pointed to by the Disassembler's FirstByte parameter.
BaseReg = This determines the meaning of the Opnd value and supplies
the base register for the "∂(An)", "BD,An", and "*±∂" cases.
BaseReg may contain any one of the following values:
_A0_ = 0 ==> A0
_A1_ = 1 ==> A1
_A2_ = 2 ==> A2
_A3_ = 3 ==> A3
_A4_ = 4 ==> A4
_A5_ = 5 ==> A5
_A6_ = 6 ==> A6
_A7_ = 7 ==> A7
_PC_ = 8 ==> PC-relative (special case)
_ABS_ = 9 ==> Abs addr (special case)
_TRAP_ = 10 ==> Trap word (special case)
_IMM_ = 11 ==> Immediate (special case)
For absolute addressing (modes 70 and 71), BaseReg contains
_ABS_. For A-traps, BaseReg would contain _TRAP_. For
immediate data (mode 74), BaseReg would contain _IMM_.
Opnd = The contents of this LongInt is determined by the BaseReg
parameter just described.
For BaseReg = _IMM_ (immediate data):
Opnd contains the (extended) 32-bit immediate data specified
by the instruction.
For BaseReg = _TRAP_ (A-traps):
Opnd is the entire trap word. The high order 16 bits of
Opnd are zero.
For BaseReg = _ABS_ (absolute effective address):
Opnd contains the (extended) 32-bit address specified by
the instruction's effective address. Such addresses would
generally be used to reference low memory globals on a
Macintosh.
For BaseReg = _PC_ (PC-relative effective address):
Opnd contains the 32-bit address represented by "*±∂"
adjusted by the Disassembler's DstAdjust parameter.
For BaseReg = _An_ (effective address with a base register):
Opnd contains the (sign-extended) 32-bit (base)
displacement from the instruction's effective address.
In the Macintosh environment, a BaseReg specifying A5
implies either global data references or Jump Table
references. Positive Opnd values with an A5 BaseReg thus
mean Jump Table references, while a negative offset would
mean a global data reference. Base registers of A6 or A7
would usually mean local data.
S = Pascal string returned from Lookup containing the effective
address substitution string or a trap name for A-traps. S is
set to null PRIOR to calling Lookup. If it is still null on
return, the string is not used. If not null, then for A-traps,
the returned string is used as the opcode string. In all other
cases the string is substituted as shown in the above table.
Depending on the application, the caller has three choices on how to
use the Disassembler and an associated Lookup proc:
(1). The caller can call just the Disassembler and provide his own Lookup
proc. In that case the calling conventions discussed above must be
followed.
(2). The caller can provide NIL for the LookupProc parameter, in which
case, NO Lookup proc will be called.
(3). The caller can first call InitLookup (described below, a proc
provided with this unit) and pass the address of this unit's
standard Lookup proc when Disassembler is called. In this case all
the control logic to determine the kind of substitution to be done
is provided for the caller, and the user need only provide routines
to look up any or all of the following:
• PC-relative references
• Jump Table references
• Absolute address references
• Trap names
• Immediate data names
• References with offsets from base registers *)
PROCEDURE InitLookup(PCRelProc, JTOffProc, TrapProc, AbsAddrProc, IdProc, ImmDataProc: UNIV Ptr);
{Prepare for use of this unit's Lookup proc. When Disassembler is called
and the address of this unit's Lookup proc is specified, then for immediate
data, PC-relative, Jump Table references, A-traps, absolute addresses, and
offsets from a base register, the associated level 1 Pascal proc
specified here is called (if not NIL -- all six addresses are preset to
NIL). The calls assume the following declarations for these procs (see
Lookup, below, for further details):
PROCEDURE PCRelProc(Address: UNIV LongInt;
VAR S: UNIV Str255);
PROCEDURE JTOffProc(A5JTOffset: UNIV Integer;
VAR S: UNIV Str255);
PROCEDURE TrapNameProc(TrapWord: UNIV Integer;
VAR S: UNIV Str255);
PROCEDURE AbsAddrProc(AbsAddr: UNIV LongInt;
VAR S: UNIV Str255);
PROCEDURE IdProc(BaseReg: LookupRegs;
Offset: UNIV LongInt;
VAR S: UNIV Str255);
PROCEDURE ImmDataProc(ImmData: UNIV LongInt;
VAR S: UNIV Str255);
Note: InitLookup contains initialized data which requires initializing
at load time (this is of concern only to users with assembler
main programs).}
PROCEDURE Lookup( PC: UNIV Ptr; {Addr of extension/trap word}
BaseReg: LookupRegs; {Base register/lookup mode }
Opnd: UNIV LongInt; {Trap word, PC addr, disp. }
VAR S: Str255); {Returned substitution }
{This is a standard Lookup proc available to the caller for calls to the
Disassembler. If the caller elects to use this proc, then InitLookup
MUST be called prior to any calls to the Disassembler. All the logic
to determine the type of lookup is done by this proc. For immediate data,
PC-relative and Jump Table references, A-traps, absolute addresses, and
offsets from a base register, the associated level 1 Pascal proc
specified in the InitLookup call (if not NIL) is called.
This scheme simplifies the Lookup mechanism by allowing the caller
to deal with just the problems related to the application.}
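(*
This is a rough, portable C model of that dispatch, for illustration only. Everything here is an assumption rather than this unit's code:
the names;
the use of C strings instead of Str255;
the omission of the PC parameter;
routing positive A5 offsets to JTOffProc and all other base-register offsets to IdProc, following the A5 conventions described for Lookup's Opnd parameter.
NULL entries play the role of NIL procs. When a NULL entry is hit, S is left empty.

```c
#include <assert.h>
#include <string.h>

/* Same ordinal values as the Pascal LookupRegs enumeration. */
typedef enum { lrA0, lrA1, lrA2, lrA3, lrA4, lrA5, lrA6, lrA7,
               lrPC, lrABS, lrTRAP, lrIMM } LookupRegs;

typedef void ValueProc(long value, char *s);        /* value in, name out */
typedef void IdProcType(LookupRegs reg, long offset, char *s);

typedef struct {                 /* the six procs InitLookup records */
    ValueProc  *pcRel, *jtOff, *trapName, *absAddr, *immData;
    IdProcType *id;
} LookupProcs;

static LookupProcs gProcs;       /* static storage: all NULL, like the NIL presets */

static void initLookup(LookupProcs procs) { gProcs = procs; }

static void lookup(LookupRegs reg, long opnd, char *s)
{
    assert(s != NULL);
    s[0] = '\0';                 /* an empty S means "no substitution" */
    switch (reg) {
    case lrPC:   if (gProcs.pcRel)    gProcs.pcRel(opnd, s);    break;
    case lrABS:  if (gProcs.absAddr)  gProcs.absAddr(opnd, s);  break;
    case lrTRAP: if (gProcs.trapName) gProcs.trapName(opnd, s); break;
    case lrIMM:  if (gProcs.immData)  gProcs.immData(opnd, s);  break;
    default:                     /* lrA0 .. lrA7 */
        if (reg == lrA5 && opnd > 0) {
            if (gProcs.jtOff) gProcs.jtOff(opnd, s);
        } else if (gProcs.id) {
            gProcs.id(reg, opnd, s);
        }
        break;
    }
}

/* Sample caller-supplied procs (hypothetical). */
static void sampleAbsAddr(long addr, char *s)
{
    if (addr == 0x016A)      strcpy(s, "Ticks");      /* low-memory globals */
    else if (addr == 0x0904) strcpy(s, "CurrentA5");
}

static void sampleJTOff(long offset, char *s)
{
    (void)offset;
    strcpy(s, "JumpTableEntry");
}
```

Suppose the two sample procs are registered:
absolute address $016A yields "Ticks";
a positive A5 offset yields "JumpTableEntry";
a negative A5 offset yields an empty S, because no IdProc was registered.
*)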
PROCEDURE LookupTrapName(TrapWord: UNIV Integer;
VAR S: UNIV Str255);
{This is a procedure provided to allow conversion of a trap instruction
(in TrapWord) to its corresponding trap name (in S). It is provided
primarily for use with the Disassembler and its address may be passed to
InitLookup above for use by this unit's Lookup routine. Alternatively,
there is nothing prohibiting the caller from using it directly for other
purposes or by some other Lookup proc.
Note: The tables in this proc make the size of this proc about 9500
bytes. The trap names are fully spelled out in upper and lower
case.}
PROCEDURE ModifyOperand(VAR Operand: UNIV Str255);
{Scan an operand string, i.e., the null terminated Pascal string returned
by the Disassembler (the null MUST be present here), and change negative
hex values to negated positive values. For example, $FFFF(A5) would be
modified to -$0001(A5). The operand to be processed is passed as the
procedure's parameter, which is edited "in place" and returned to the
caller.
This routine is essentially a pattern matcher and attempts to only
modify 2, 4, and 8 digit hex strings in the operand that "might" be
offsets from a base register. If the matching tests are passed, the
same number of original digits are output (because that indicates a
value's size -- byte, word, or long).
For a hex string to be modified, the following tests must be passed:
There must have been exactly 2, 4, or 8 digits.
Only hex strings $XX, $XXXX, and $XXXXXXXX are possible candidates
because that is the only way the Disassembler generates offsets.
Hex string must be delimited by a "(" or a ",".
The "(" allows offsets for $XXXX(An,...) and $XX(An,Xn) addressing
modes. The comma allows for the MC68020 addressing forms.
The "$X..." must NOT be preceded by a "±".
This eliminates the possibility of modifying the offset of a
PC-relative addressing mode always generated in the form "*±$XXXX".
The "$X..." must NOT be preceded by a "#".
This eliminates modifying immediate data.
Value must be negative.
Negative values are the only values we modify. A value $FFFF is
modified to -$0001.}
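(*
This is a compact C sketch of that pattern matcher, an illustration rather than this unit's routine. It differs from ModifyOperand and relies on these assumptions:
it works on a plain C string and writes to a separate buffer instead of editing a Str255 in place;
the name is hypothetical;
"delimited by" is read as the character immediately after the digits;
"preceded by a ±" is read as a '+' or '-' immediately before the '$'.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copies in to out, rewriting negative $XX / $XXXX / $XXXXXXXX offsets as
   -$... with the same digit count. out needs 2 * strlen(in) + 1 bytes. */
static void modifyOperand(const char *in, char *out)
{
    size_t i = 0, o = 0, len;
    assert(in != NULL && out != NULL);
    len = strlen(in);
    while (i < len) {
        if (in[i] == '$') {
            size_t n = 0;
            while (i + 1 + n < len && isxdigit((unsigned char)in[i + 1 + n]))
                n++;
            char prev = (i > 0) ? in[i - 1] : '\0';
            char next = in[i + 1 + n];              /* '\0' at end of string */
            if ((n == 2 || n == 4 || n == 8) &&     /* only the sizes generated */
                (next == '(' || next == ',') &&     /* must look like an offset */
                prev != '+' && prev != '-' &&       /* not a PC-relative offset */
                prev != '#') {                      /* not immediate data       */
                char digits[9];
                memcpy(digits, in + i + 1, n);
                digits[n] = '\0';
                uint32_t v = (uint32_t)strtoul(digits, NULL, 16);
                uint32_t sign = (uint32_t)1 << (4 * n - 1);
                if (v & sign) {                     /* negative at this size */
                    /* two's-complement magnitude; wraps modulo 2^32 when n == 8 */
                    uint32_t mag = (sign << 1) - v;
                    o += (size_t)sprintf(out + o, "-$%0*lX", (int)n,
                                         (unsigned long)mag);
                    i += 1 + n;
                    continue;
                }
            }
        }
        out[o++] = in[i++];
    }
    out[o] = '\0';
}
```

For instance, $FFFF(A5) becomes -$0001(A5), while #$FFFF,D0 and $0010(A5) are left unchanged.
*)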
FUNCTION validMacsBugSymbol(symStart, limit: UNIV Ptr;
symbol: StringPtr): StringPtr; C;
{Check that the bytes pointed to by symStart represent a valid MacsBug
symbol. The symbol must be fully contained in the bytes starting at
symStart, up to, but not including, the byte pointed to by the limit
parameter.
If a valid symbol is NOT found, then NIL is returned as the function's
result. However, if a valid symbol is found, it is copied to symbol (if
symbol is not NIL) as a null terminated Pascal string, and the function
returns a pointer to where we think the FOLLOWING module begins. In the
"old style" cases (see below) this will always be 8 or 16 bytes after
the input symStart.
For new style Apple Pascal and C cases this will depend on the symbol
length, existence of a pad byte, and size of the constant (literal) area.
In all cases, trailing blanks are removed from the symbol.
A valid MacsBug symbol consists of the characters '_', '%', spaces,
digits, and upper/lower case letters in a format determined by the first
two bytes of the symbol as follows:
1st byte   | 2nd byte  | Byte   |
Range      | Range     | Length | Comments
=======================================================================
$20 - $7F  | $20 - $7F |   8    | "Old style" MacsBug symbol format
$A0 - $FF  | $20 - $7F |   8    | "Old style" MacsBug symbol format
-----------------------------------------------------------------------
$20 - $7F  | $80 - $FF |  16    | "Old style" MacApp symbol ab==>b.a
$A0 - $FF  | $80 - $FF |  16    | "Old style" MacApp symbol ab==>b.a
-----------------------------------------------------------------------
$80        | $01 - $FF |   n    | n = 2nd byte (Apple symbol)
$81 - $9F  | $00 - $FF |   m    | m = BAnd(1st byte,$7F) (Apple symbol)
=======================================================================
The formats are determined by whether bit 7 is set in the first and
second bytes. This bit is removed when it is found OR'ed into the first
and/or second valid symbol characters.
The first two formats in the above table are the basic "old style"
(pre-existing) MacsBug formats. The first byte may or may not have bit 7
set; the second byte is a valid symbol character. The first byte (with
bit 7 removed) and the next 7 bytes are assumed to comprise the symbol.
The second pair of formats are also "old style" formats, but used for
MacApp symbols. Bit 7 set in the second character indicates these
formats. The symbol is assumed to be 16 bytes with the second 8 bytes
preceding the first 8 bytes in the generated symbol. For example,
12345678abcdefgh represents the symbol abcdefgh.12345678.
The last pair of formats are reserved by Apple and generated by the MPW
Pascal and C compilers. In these cases the value of the first byte is
always between $80 and $9F, or with bit 7 removed, between $00 and $1F.
For $00, the second byte is the length of the symbol with that many bytes
following the second byte (thus a max length of 255). Values $01 to $1F
represent the length itself. A pad byte may follow these variable length
cases if the symbol does not end on a word boundary. Following the
symbol and the possible pad byte is a word containing the size of the
constants (literals) generated by the compiler.
Note that if symStart actually does point to a valid MacsBug symbol,
then you may use showMacsBugSymbol to convert the MacsBug symbol bytes to
a string that could be used as a DC.B operand for disassembly purposes.
This string explicitly shows the MacsBug symbol encodings.}
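(*
The two-byte format test in the table above can be sketched in C as follows. This is an illustration with hypothetical names, not this unit's validMacsBugSymbol. It only classifies the first two bytes and returns the value from the table's Byte Length column. It does not check the remaining characters, strip trailing blanks, or locate the following module.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { kInvalidSym, kOldMacsBug, kOldMacApp, kAppleSym } SymFormat;

/* True for '_', '%', space, digits and letters once bit 7 is stripped. */
static int isSymChar(unsigned char c)
{
    c &= 0x7F;
    return c == '_' || c == '%' || c == ' ' ||
           (c >= '0' && c <= '9') ||
           (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Classifies a symbol by its first two bytes and returns the byte length
   from the table (8, 16, n or m), or 0 if no format matches. */
static size_t symbolFormat(unsigned char b0, unsigned char b1, SymFormat *fmt)
{
    assert(fmt != NULL);
    if (b0 == 0x80 && b1 != 0x00) {           /* $80: length in 2nd byte */
        *fmt = kAppleSym;
        return b1;
    }
    if (b0 >= 0x81 && b0 <= 0x9F) {           /* $81-$9F: length in 1st byte */
        *fmt = kAppleSym;
        return b0 & 0x7F;
    }
    if (isSymChar(b0) && isSymChar(b1)) {     /* old style, bit 7 optional */
        *fmt = (b1 & 0x80) ? kOldMacApp : kOldMacsBug;
        return (b1 & 0x80) ? 16 : 8;
    }
    *fmt = kInvalidSym;
    return 0;
}
```
*)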
FUNCTION endOfModule(address, limit: UNIV Ptr; symbol: StringPtr;
VAR nextModule: UNIV Ptr): StringPtr; C;
{Check whether the specified memory address contains an RTS, JMP (A0), or
RTD #n instruction immediately followed by a valid MacsBug symbol. These
sequences are the only ones which can determine an end of module when
MacsBug symbols are present. During the check, the instruction and its
following MacsBug symbol must be fully contained in the bytes starting at
the specified address parameter, up to, but not including, the byte
pointed to by the limit parameter.
If the end of module is NOT found, then NIL is returned as the
function's result. However, if an end of module is found, the MacsBug
symbol is returned in symbol (if symbol is not NIL) as a null terminated
Pascal string (with trailing blanks removed), and the function returns
the pointer to the start of the MacsBug symbol (i.e., address+2 for RTS
or JMP (A0) and address+4 for RTD #n). This address may then be used as
an input parameter to showMacsBugSymbol to convert the MacsBug symbol to
a Disassembler operand string.
Also returned in nextModule is where we think the FOLLOWING module
begins. In the "old style" cases (see validMacsBugSymbol) this will
always be 8 or 16 bytes after the start of the MacsBug symbol. For the
new style Apple Pascal and C cases this will depend on the symbol length,
existence of a pad byte, and size of the constant (literal) area. See
validMacsBugSymbol for a description of valid MacsBug symbol formats.}
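(*
The instruction part of this check can be sketched in C using the standard 68000 encodings:
RTS = $4E75;
JMP (A0) = $4ED0;
RTD #n = $4E74, followed by a 16-bit displacement.
The name is hypothetical. A complete check would then validate the MacsBug symbol starting at the returned offset, as endOfModule does with the rules of validMacsBugSymbol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a big-endian 68000 word. */
static uint16_t word16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* If code starts with RTS, JMP (A0) or RTD #n, returns the number of bytes
   the instruction occupies, i.e. the offset of the MacsBug symbol that would
   follow it; otherwise returns 0. avail = bytes available before limit. */
static size_t moduleEndLength(const uint8_t *code, size_t avail)
{
    assert(code != NULL);
    if (avail < 2)
        return 0;
    switch (word16(code)) {
    case 0x4E75:              /* RTS */
    case 0x4ED0:              /* JMP (A0) */
        return 2;
    case 0x4E74:              /* RTD #n: opcode word + 16-bit displacement */
        return avail >= 4 ? 4 : 0;
    default:
        return 0;
    }
}
```
*)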
FUNCTION showMacsBugSymbol(symStart, limit: UNIV Ptr; operand: StringPtr;
VAR bytesUsed: Integer): StringPtr; C;
{Format a MacsBug symbol as an operand of a DC.B directive. The first one
or two bytes of the symbol are generated as $80+'c' if they have their
high bits set. All other characters are shown as characters in a
string constant. The pad byte, if present, is also shown as $00.
When called, showMacsBugSymbol assumes that symStart is pointing at a
valid MacsBug symbol as validated by the validMacsBugSymbol or
endOfModule routines. As with validMacsBugSymbol, the symbol must be
fully contained in the bytes starting at symStart up to, but not
including, the byte pointed to by the limit parameter.
The string is returned in the 'operand' parameter as a null terminated
Pascal string. The function also returns a pointer to this string as its
return value (NIL is returned only if the byte pointed to by the limit
parameter is reached prior to processing the entire symbol -- which
should not happen if properly validated). The number of bytes used for
the symbol is returned in bytesUsed. Due to the way MacsBug symbols are
encoded, bytesUsed may not necessarily be the same as the length of the
operand string.
A valid MacsBug symbol consists of the characters '_', '%', spaces,
digits, and upper/lower case letters in a format determined by the first
two bytes of the symbol as described in the validMacsBugSymbol routine.}
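(*
This C sketch covers the DC.B formatting rule for the "old style" formats only. Bytes with bit 7 set are written as $80+'c', and runs of ordinary characters as quoted strings. The name is hypothetical. The exact punctuation (commas between elements, single quotes) is an assumption. The Apple-style length bytes and the $00 pad byte are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes a DC.B operand for an old-style symbol of n bytes into out
   (9 * n + 2 bytes always suffice). */
static void dcbOperand(const unsigned char *sym, size_t n, char *out)
{
    size_t o = 0;
    int inQuote = 0;
    assert(sym != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char c = sym[i];
        if (c & 0x80) {                       /* bit 7 set: separate $80+'c' */
            if (inQuote) { out[o++] = '\''; inQuote = 0; }
            if (o > 0) out[o++] = ',';
            o += (size_t)sprintf(out + o, "$80+'%c'", c & 0x7F);
        } else {                              /* ordinary character: quoted run */
            if (!inQuote) {
                if (o > 0) out[o++] = ',';
                out[o++] = '\'';
                inQuote = 1;
            }
            out[o++] = (char)c;
        }
    }
    if (inQuote) out[o++] = '\'';
    out[o] = '\0';
}
```

For example, the bytes $C4 followed by 'oIt    ' come out as $80+'D','oIt    '.
*)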
{$ENDC} { UsingDisAsmLookup }
{$IFC NOT UsingIncludes}
END.
{$ENDC}