ld65

A Linker for ca65 Object modules

(C) Copyright 1998-1999 Ullrich von Bassewitz
(uz@musoftware.de)



Contents
--------

1. Overview

2. Usage

3. Detailed workings

4. Output configuration files
  4.1 Introduction
  4.2 Reference
  4.3 Builtin configurations

5. Bugs/Feedback

6. Copyright



1. Overview
-----------

ld65 is a replacement for the link65 linker that was part of the cc65 C
compiler suite developed by John R. Dunning. link65 had some problems and
the copyright does not permit some things which I wanted to be possible,
so I decided to write a completely new assembler/linker/archiver suite
for the cc65 compiler. ld65 is part of this suite.

The ld65 linker combines several object modules, producing an executable
file. The object modules may be read from a library created by the ar65
archiver (this is somewhat faster and more convenient). The linker was
designed to be as flexible as possible. It complements the features that
are built into the ca65 macroassembler:

  * Accept any number of segments to form an executable module.

  * Resolve arbitrary expressions stored in the object files.

  * In case of errors, use the meta information stored in the object
    files to produce helpful error messages. In case of undefined
    symbols, expression range errors, or symbol type mismatches, ld65 is
    able to tell you the exact location in the source, where the symbol
    was referenced.

  * Flexible output. The output of ld65 is highly configurable by a
    config file. More common platforms are supported by builtin
    configurations that may be activated by naming the target system.
    The output generation was designed with different output formats in
    mind, so adding other formats shouldn't be a great problem.



2. Usage
--------

The linker is called as follows:

        Usage: ld65 [options] module ...
        Options are:
                -m name         Create a map file
                -o name         Name the default output file
                -t type         Type of target system
                -v              Verbose mode
                -vm             Verbose map file
                -C name         Use linker config file
                -Ln name        Create a VICE label file
                -Lp             Mark write protected segments as such (VICE)
                -S addr         Set the default start address
                -V              Print linker version

The -m switch (which needs an argument that will be used as the filename
for the generated map file) will cause the linker to generate a map file.
The map file contains a detailed overview of the modules used, the sizes
of the different segments, and a table containing exported symbols.

The -o switch is used to give the name of the default output file.
Depending on your output configuration, this name may NOT be used as name
for the output file. However, for the builtin configurations, this name
is used for the output file name.

The argument for the -t switch is the name of the target system. Since
this switch will activate a builtin configuration, it may not be used
together with the -C option. The following target systems are defined
(* = currently unsupported):

        none
        atari
        c64
        c128
        ace
        plus4
        cbm610
        pet
        nes
        apple2

See section 4.3 for more information about the builtin configurations.

Using the -v option, you may enable more output that may help you to
locate problems. If an undefined symbol is encountered, -v causes the
linker to print a detailed list of the references (that is, source file
and line) for this symbol.

-C gives the name of an output config file to use. See section 4 for more
information about config files. -C may not be used together with -t.

-Ln allows you to create a file that contains all global labels and may
be loaded into the VICE emulator using the pb (playback) command. You may
use this to debug your code with VICE. Note: The label feature is very
new in VICE and has some bugs. If you have problems, please get the
latest VICE version.

Using -S you may define the default starting address.
If and how this address is used depends on the config file in use. For
the builtin configurations, only the "none" system honors an explicit
start address, all other builtin configs provide their own.

-V prints the version number of the linker. If you send any suggestions
or bugfixes, please include this number.

If one of the modules is not found in the current directory, and the
module name does not have a path component, the value of the environment
variable CC65_LIB is prepended to the name, and the linker tries to open
the module with this new name.



3. Detailed workings
--------------------

The linker does several things when combining object modules:

First, the command line is parsed from left to right. For each object
file encountered (object files are recognized by a magic word in the
header, so the linker does not care about the name), imported and
exported identifiers are read from the file and inserted in a table. If
a library name is given (libraries are also recognized by a magic word,
there are no special naming conventions), each module in the library is
checked to see whether one of its exports would satisfy an import from
other modules. All modules where this is the case are marked. If
duplicate identifiers are found, the linker issues a warning.

This procedure (parsing and reading from left to right) means that a
library may only satisfy references for object modules (given directly
or from a library) named BEFORE that library. With the command line

        ld65 crt0.o clib.lib test.o

the module test.o may not contain references to modules in the library
clib.lib. If this is the case, you have to change the order of the
modules on the command line:

        ld65 crt0.o test.o clib.lib

Step two is to read the configuration file, assign start addresses for
the segments, and define any linker symbols (see section 4).

After that, the linker is ready to produce an output file. Before doing
that, it checks its data for consistency.
That is, it checks for unresolved externals (if the output format is not
relocatable) and for symbol type mismatches (for example a zero page
symbol is imported by a module as absolute symbol).

Step four is to write the actual target files. In this step, the linker
will resolve any expressions contained in the segment data. Circular
references are also detected in this step (a symbol may have a circular
reference that goes unnoticed if the symbol is not used).

Step five is to output a map file with a detailed list of all modules,
segments and symbols encountered.

And, last step, if you give the -v switch twice, you get a dump of the
segment data. However, this may be quite unreadable if you're not a
developer:-)



4. Output configuration files
-----------------------------

Configuration files are used to describe the layout of the output
file(s). Two major topics are covered in a config file: The memory
layout of the target architecture, and the assignment of segments to
memory areas. In addition, several other attributes may be specified.

Case is ignored for keywords, that is, section or attribute names, but
it is NOT ignored for names and strings.



4.1 Introduction
----------------

Memory areas are specified in a "MEMORY" section. Let's have a look at an
example (this one describes the usable memory layout of the C64):

        MEMORY {
            RAM1:  start = $0800, size = $9800;
            ROM1:  start = $A000, size = $2000;
            RAM2:  start = $C000, size = $1000;
            ROM2:  start = $E000, size = $2000;
        }

As you can see, there are two ram areas and two rom areas. The names
(before the colon) are arbitrary names that must start with a letter,
with the remaining characters being letters or digits. The names of the
memory areas are used when assigning segments. As mentioned above, case
is significant for these names.

The syntax above is used in all sections of the config file. The name
("ROM1" etc.) is said to be an identifier, the remaining tokens up to the
semicolon specify attributes for this identifier.
You may use the equal sign to assign values to attributes, and you may
use a comma to separate attributes, you may also leave both out. But you
MUST use a semicolon to mark the end of the attributes for one
identifier. The section above may also have looked like this:

        # Start of memory section
        MEMORY {
            RAM1:  start $0800 size $9800;
            ROM1:  start $A000 size $2000;
            RAM2:  start $C000 size $1000;
            ROM2:  start $E000 size $2000;
        }

There are of course more attributes for a memory section than just start
and size. Start and size are mandatory attributes, that means, each
memory area defined MUST have these attributes given (the linker will
check that). I will cover other attributes later. As you may have
noticed, I've used a comment in the example above. Comments start with a
hash mark (`#'), the remainder of the line is ignored if this character
is found.

Let's assume you have written a program for your trusty old C64, and you
would like to run it. For testing purposes, it should run in the RAM
area. So we will start to assign segments to memory sections in the
SEGMENTS section:

        SEGMENTS {
            CODE:   load = RAM1, type = ro;
            RODATA: load = RAM1, type = ro;
            DATA:   load = RAM1, type = rw;
            BSS:    load = RAM1, type = bss, define = yes;
        }

What we are doing here is telling the linker, that all segments go into
the RAM1 memory area in the order specified in the SEGMENTS section. So
the linker will first write the CODE segment, then the RODATA segment,
then the DATA segment - but it will not write the BSS segment. Why? Enter
the segment type: For each segment specified, you may also specify a
segment attribute.
There are five possible segment attributes:

        ro      means readonly
        wprot   same as ro but will be marked as write protected in
                the VICE label file if -Lp is given
        rw      means read/write
        bss     means that this is an uninitialized segment
        empty   will not go in any output file

So, because we specified that the segment with the name BSS is of type
bss, the linker knows that this is uninitialized data, and will not
write it to an output file. This is an important point: For the
assembler, the BSS segment has no special meaning. You specify which
segments have the bss attribute when linking. This approach is much more
flexible than having one fixed bss segment, and is a result of the
design decision to support an arbitrary segment count.

If you specify "type = bss" for a segment, the linker will make sure
that this segment contains only uninitialized data (that is, zeroes),
and issue a warning if this is not the case.

For a bss type segment to be useful, it must be cleared somehow by your
program (this usually happens in the startup code - for example the
startup code for cc65 generated programs takes care of that). But how
does your code know where the segment starts, and how big it is? The
linker is able to give that information, but you must request it. This
is what we're doing with the "define = yes" attribute in the BSS
definitions. For each segment where this attribute is true, the linker
will export three symbols.

        __NAME_LOAD__   This is set to the address where the segment is
                        loaded.
        __NAME_RUN__    This is set to the run address of the segment.
                        We will cover run addresses later.
        __NAME_SIZE__   This is set to the segment size.

Replace "NAME" by the name of the segment, in the example above, this
would be "BSS". These symbols may be accessed by your code.

Now, as we've configured the linker to write the first three segments
and create symbols for the last one, there's only one question left:
Where does the linker put the data?
It would be very convenient to have the data in a file, wouldn't it?

We don't have any files specified above, and indeed, this is not needed
in a simple configuration like the one above. There is an additional
attribute "file" that may be specified for a memory area, that gives a
file name to write the area data into. If there is no file name given,
the linker will assign the default file name. This is "a.out" or the one
given with the -o option on the command line. Since the default
behaviour is ok for our purposes, I did not use the attribute in the
example above. Let's have a look at it now.

The "file" attribute (the keyword may also be written as "FILE" if you
like that better) takes a string enclosed in double quotes (`"') that
specifies the file where the data is written. You may specify the same
file several times, in that case the data for all memory areas having
this file name is written into this file, in the order of the memory
areas defined in the MEMORY section. Let's specify some file names in
the MEMORY section used above:

        MEMORY {
            RAM1:  start = $0800, size = $9800, file = %O;
            ROM1:  start = $A000, size = $2000, file = "rom1.bin";
            RAM2:  start = $C000, size = $1000, file = %O;
            ROM2:  start = $E000, size = $2000, file = "rom2.bin";
        }

The %O used here is a way to specify the default behaviour explicitly:
%O is replaced by a string (including the quotes) that contains the
default output name, that is, "a.out" or the name specified with the -o
option on the command line. Into this file, the linker will first write
any segments that go into RAM1, and will then append the segments for
RAM2, because the memory areas are given in this order. So, for the RAM
areas, nothing has really changed.

We've not used the ROM areas, but we will do that below, so we give the
file names here. Segments that go into ROM1 will be written to a file
named "rom1.bin", and segments that go into ROM2 will be written to a
file named "rom2.bin".
The name given on the command line is ignored in both cases.

Let us look now at a more complex example. Say, you've successfully
tested your new "Super Operating System" (SOS for short) for the C64,
and you will now go and replace the ROMs by your own code. When doing
that, you face a new problem: If the code runs in RAM, we need not care
about read/write data. But now, if the code is in ROM, we must care
about it. Remember the default segments (you may of course specify your
own):

        CODE            read only code
        RODATA          read only data
        DATA            read/write data
        BSS             uninitialized data, read/write

Since the BSS is not initialized, we need not care about it now, but
what about DATA? DATA contains initialized data, that is, data that was
explicitly assigned a value. And your program will rely on these values
on startup. Since there's no other way to remember the contents of the
data segment than storing it into one of the ROMs, we have to put it
there. But unfortunately, ROM is not writeable, so we have to copy it
into RAM before running the actual code.

The linker cannot help you copying the data from ROM into RAM (this must
be done by the startup code of your program), but it has some features
that will help you in this process.

First, you may not only specify a "load" attribute for a segment, but
also a "run" attribute. The "load" attribute is mandatory, and, if you
don't specify a "run" attribute, the linker assumes that load area and
run area are the same. We will use this feature for our data area:

        SEGMENTS {
            CODE:   load = ROM1, type = ro;
            RODATA: load = ROM2, type = ro;
            DATA:   load = ROM2, run = RAM2, type = rw, define = yes;
            BSS:    load = RAM2, type = bss, define = yes;
        }

Let's have a closer look at this SEGMENTS section. We specify that the
CODE segment goes into ROM1 (the one at $A000). The readonly data goes
into ROM2. Read/write data will be loaded into ROM2 but is run in RAM2.
That means that all references to labels in the DATA segment are
relocated to be in RAM2, but the segment is written to ROM2. All your
startup code has to do is to copy the data from its location in ROM2 to
the final location in RAM2.

So, how do you know where the data is located? This is the second
point where you get help from the linker. Remember the "define"
attribute? Since we have set this attribute to true, the linker will
define three external symbols for the data segment that may be accessed
from your code:

        __DATA_LOAD__   This is set to the address where the segment is
                        loaded, in this case, it is an address in ROM2.
        __DATA_RUN__    This is set to the run address of the segment,
                        in this case, it is an address in RAM2.
        __DATA_SIZE__   This is set to the segment size.

So, what your startup code must do is copy __DATA_SIZE__ bytes from
__DATA_LOAD__ to __DATA_RUN__ before any other routines are called. All
references to labels in the DATA segment are relocated to RAM2 by the
linker, so things will work properly.

There are some other attributes not covered above. Before starting the
reference section, I will discuss the remaining things here.

You may also request symbol definitions for memory areas. This may be
useful for things like a software stack, or an i/o area.

        MEMORY {
            STACK:  start = $C000, size = $1000, define = yes;
        }

This will define three external symbols that may be used in your code:

        __STACK_START__ This is set to the start of the memory area,
                        $C000 in this example.
        __STACK_SIZE__  The size of the area, here $1000.
        __STACK_LAST__  This is NOT the same as START+SIZE. Instead, it
                        is defined as the first address that is not used
                        by data. If we don't define any segments for this
                        area, the value will be the same as START.

A memory section may also have a type. Valid types are ro for readonly
memory and rw for read/write memory. The linker will ensure that no
segment marked as read/write or bss is put into a memory area that is
marked as readonly.

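The relation between load address, run address and the startup copy of
the DATA segment described above can be modelled in a few lines of
Python. This is only a sketch and not how ld65 is implemented: the area
start addresses are taken from the C64 MEMORY example, while the segment
sizes and the helper names place and startup_copy are made up for
illustration.

```python
# Simplified model of the SOS example. Area start addresses come from the
# C64 MEMORY example; the segment sizes are made-up illustration values.
MEMORY = {"ROM1": 0xA000, "RAM2": 0xC000, "ROM2": 0xE000}

# (name, load area, run area, size in bytes, define) in SEGMENTS order
SEGMENTS = [
    ("CODE",   "ROM1", "ROM1", 0x0400, False),
    ("RODATA", "ROM2", "ROM2", 0x0100, False),
    ("DATA",   "ROM2", "RAM2", 0x0020, True),
    ("BSS",    "RAM2", "RAM2", 0x0040, True),
]


def place(memory, segments):
    """Fill each area in segment order and return the exported symbols."""
    next_free = dict(memory)  # next unused address in every memory area
    symbols = {}
    for name, load, run, size, define in segments:
        load_addr = next_free[load]
        next_free[load] += size
        if run == load:
            run_addr = load_addr
        else:
            # A separate run area reserves space there as well.
            run_addr = next_free[run]
            next_free[run] += size
        if define:
            symbols[f"__{name}_LOAD__"] = load_addr
            symbols[f"__{name}_RUN__"] = run_addr
            symbols[f"__{name}_SIZE__"] = size
    return symbols


def startup_copy(mem, symbols):
    """Copy __DATA_SIZE__ bytes from __DATA_LOAD__ to __DATA_RUN__."""
    src = symbols["__DATA_LOAD__"]
    dst = symbols["__DATA_RUN__"]
    size = symbols["__DATA_SIZE__"]
    mem[dst:dst + size] = mem[src:src + size]


if __name__ == "__main__":
    syms = place(MEMORY, SEGMENTS)
    for key, value in syms.items():
        print(f"{key} = ${value:04X}")
```

With these assumed sizes, DATA is loaded behind RODATA in ROM2 but runs
at the start of RAM2, and BSS follows it in RAM2. Real startup code would
perform the same copy in 6502 assembly using the three linker symbols.
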
Unused memory in a memory area may be filled. Use the "fill = yes"
attribute to request this. The default value to fill unused space is
zero. If you don't like this, you may specify a byte value that is used
to fill these areas with the "fillval" attribute. This value is also
used to fill unfilled areas generated by the assembler's .ALIGN and .RES
directives.

Segments may be aligned to some memory boundary. Specify "align = num"
to request this feature. Num must be a power of two. To align all
segments on a page boundary, use

        SEGMENTS {
            CODE:   load = ROM1, type = ro, align = $100;
            RODATA: load = ROM2, type = ro, align = $100;
            DATA:   load = ROM2, run = RAM2, type = rw, define = yes,
                    align = $100;
            BSS:    load = RAM2, type = bss, define = yes, align = $100;
        }

If an alignment is requested, the linker will add enough space to the
output file, so that the new segment starts at an address that is
divisible by the given number without a remainder. All addresses are
adjusted accordingly. To fill the unused space, bytes of zero are used,
or, if the memory area has a "fillval" attribute, that value. Alignment
is always needed if you have used the .ALIGN command in the assembler.
The alignment of a segment must be equal to or greater than the
alignment used in the .ALIGN command. The linker will check that, and
issue a warning if the alignment of a segment is lower than the
alignment requested in a .ALIGN command of one of the modules making up
this segment.

For a given segment you may also specify a fixed offset into a memory
area or a fixed start address. Use this if you want the code to run at a
specific address (a prominent case is the interrupt vector table which
must go at address $FFFA). Only one of ALIGN or OFFSET or START may be
specified. If the directive creates empty space, it will be filled with
zero, or with the value specified with the "fillval" attribute if one is
given.
The linker will warn you if it is not possible to put the code at the
specified offset (this may happen if other segments in this area are too
large). Here's an example:

        SEGMENTS {
            VECTORS: load = ROM2, type = ro, start = $FFFA;
        }

or (for the segment definitions from above)

        SEGMENTS {
            VECTORS: load = ROM2, type = ro, offset = $1FFA;
        }

File names may be empty, data from segments assigned to a memory area
with an empty file name is discarded. This is useful if a memory area
has segments assigned that are empty (for example because they are of
type bss). In that case, the linker will create an empty output file.
This may be suppressed by assigning an empty file name to that memory
area.

The symbol %S may be used to access the default start address (that is,
$200 or the value given on the command line with the -S option).



4.2 Reference
-------------



4.3 Builtin configurations
--------------------------

Here is a list of the builtin configurations for the different target
types:

  none:
        MEMORY {
            RAM:  start = %S, size = $10000, file = %O;
        }
        SEGMENTS {
            CODE:   load = RAM, type = rw;
            RODATA: load = RAM, type = rw;
            DATA:   load = RAM, type = rw;
            BSS:    load = RAM, type = bss, define = yes;
        }

  atari:
        (non-existent)

  c64:
        MEMORY {
            RAM:  start = $7FF, size = $c801, file = %O;
        }
        SEGMENTS {
            CODE:   load = RAM, type = ro;
            RODATA: load = RAM, type = ro;
            DATA:   load = RAM, type = rw;
            BSS:    load = RAM, type = bss, define = yes;
        }

  c128:
        MEMORY {
            RAM:  start = $1bff, size = $a401, file = %O;
        }
        SEGMENTS {
            CODE:   load = RAM, type = ro;
            RODATA: load = RAM, type = ro;
            DATA:   load = RAM, type = rw;
            BSS:    load = RAM, type = bss, define = yes;
        }

  ace:
        (non-existent)

  plus4:
        MEMORY {
            RAM:  start = $0fff, size = $7001, file = %O;
        }
        SEGMENTS {
            CODE:   load = RAM, type = ro;
            RODATA: load = RAM, type = ro;
            DATA:   load = RAM, type = rw;
            BSS:    load = RAM, type = bss, define = yes;
        }

  cbm610:
        MEMORY {
            RAM:  start = $0001, size = $FFF0, file = %O;
        }
        SEGMENTS {
            CODE:   load = RAM, type = ro;
            RODATA: load = RAM, type = ro;
            DATA:   load = RAM, type = rw;
            BSS:    load = RAM, type = bss, define = yes;
        }

  pet:
        MEMORY {
            RAM:  start = $03FF, size = $7BFF, file = %O;
        }
        SEGMENTS {
            CODE:   load = RAM, type = ro;
            RODATA: load = RAM, type = ro;
            DATA:   load = RAM, type = rw;
            BSS:    load = RAM, type = bss, define = yes;
        }

  nes:
        MEMORY {
            RAM:  start = $0200, size = $0600, file = "";
            ROM:  start = $8000, size = $8000, file = %O;
        }
        SEGMENTS {
            CODE:    load = ROM, type = ro;
            RODATA:  load = ROM, type = ro;
            DATA:    load = ROM, run = RAM, type = rw, define = yes;
            BSS:     load = RAM, type = bss, define = yes;
            VECTORS: load = ROM, type = ro, start = $FFFA;
        }

  apple2:
        MEMORY {
            RAM:  start = $800, size = $8E00, file = %O;
        }
        SEGMENTS {
            CODE:   load = RAM, type = ro;
            RODATA: load = RAM, type = ro;
            DATA:   load = RAM, type = rw;
            BSS:    load = RAM, type = bss, define = yes;
        }

The "start" attribute for the RAM memory area of the CBM systems is two
less than the actual start of the basic RAM to account for the two bytes
load address that is needed on disk and supplied by the startup code.



5. Bugs/Feedback
----------------

If you have problems using the linker, if you find any bugs, or if
you're doing something interesting with it, I would be glad to hear from
you. Feel free to contact me by email (uz@musoftware.de).



6. Copyright
------------

ld65 (and all cc65 binutils) are (C) Copyright 1998 Ullrich von
Bassewitz. For usage of the binaries and/or sources the following
conditions do apply:

This software is provided 'as-is', without any expressed or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.

Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must not
   claim that you wrote the original software.
   If you use this software in a product, an acknowledgment in the
   product documentation would be appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not
   be misrepresented as being the original software.
3. This notice may not be removed or altered from any source
   distribution.