cc65 coding hints <author><url url="mailto:uz@cc65.org" name="Ullrich von Bassewitz"> <abstract> How to generate the most effective code with cc65. </abstract> <sect>Use prototypes<p> This will not only help to find errors between separate modules, it will also generate better code, since the compiler must not assume that a variable sized parameter list is in place and must not pass the argument count to the called function. This will lead to shorter and faster code. <sect>Don't declare auto variables in nested function blocks<p> Variable declarations in nested blocks are usually a good thing. But with cc65, there is a drawback: Since the compiler generates code in one pass, it must create the variables on the stack each time the block is entered and destroy them when the block is left. This causes a speed penalty and larger code. <sect>Remember that the compiler does no high level optimizations<p> The compiler needs hints from you about the code to generate. It will try to optimize the generated code, but follow the outline you gave in your C program. So for example, when accessing indexed data structures, get a pointer to the element and use this pointer instead of calculating the index again and again. If you want to have your loops unrolled, or loop invariant code moved outside the loop, you have to do that yourself. <sect>Longs are slow!<p> While long support is necessary for some things, it's really, really slow on the 6502. Remember that any long variable will use 4 bytes of memory, and any operation works on double the data compared to an int. <sect>Use unsigned types wherever possible<p> The 6502 CPU has no opcodes to handle signed values greater than 8 bit. So sign extension, test of signedness etc. has to be done with extra code. As a consequence, the code to handle signed operations is usually a bit larger and slower than the same code for unsigned types. <sect>Use chars instead of ints if possible<p> While in arithmetic operations, chars are immidiately promoted to ints, they are passed as chars in parameter lists and are accessed as chars in variables. The code generated is usually not much smaller, but it is faster, since accessing chars is faster. For several operations, the generated code may be better if intermediate results that are known not to be larger than 8 bit are casted to chars. You should especially use unsigned chars for loop control variables if the loop is known not to execute more than 255 times. <sect>Make the size of your array elements one of 1, 2, 4, 8<p> When indexing into an array, the compiler has to calculate the byte offset into the array, which is the index multiplied by the size of one element. When doing the multiplication, the compiler will do a strength reduction, that is, replace the multiplication by a shift if possible. For the values 2, 4 and 8, there are even more specialized subroutines available. So, array access is fastest when using one of these sizes. <sect>Expressions are evaluated from left to right<p> Since cc65 is not building an explicit expression tree when parsing an expression, constant subexpressions may not be detected and optimized properly if you don't help. Look at this example: <tscreen><verb> #define OFFS 4 int i; i = i + OFFS + 3; </verb></tscreen> The expression is parsed from left to right, that means, the compiler sees 'i', and puts it contents into the secondary register. Next is OFFS, which is constant. The compiler emits code to add a constant to the secondary register. Same thing again for the constant 3. So the code produced contains a fetch of 'i', two additions of constants, and a store (into 'i'). Unfortunately, the compiler does not see, that "OFFS + 3" is a constant for itself, since it does its evaluation from left to right. There are some ways to help the compiler to recognize expression like this: <enum> <item>Write "i = OFFS + 3 + i;". Since the first and second operand are constant, the compiler will evaluate them at compile time reducing the code to a fetch, one addition (secondary + constant) and one store. <item>Write "i = i + (OFFS + 3)". When seeing the opening parenthesis, the compiler will start a new expression evaluation for the stuff in the braces, and since all operands in the subexpression are constant, it will detect this and reduce the code to one fetch, one addition and one store. </enum> <sect>Use the preincrement and predecrement operators<p> The compiler is not always smart enough to figure out, if the rvalue of an increment is used or not. So it has to save and restore that value when producing code for the postincrement and postdecrement operators, even if this value is never used. To avoid the additional overhead, use the preincrement and predecrement operators if you don't need the resulting value. That means, use <tscreen><verb> ... ++i; ... </verb></tscreen> instead of <tscreen><verb> ... i++; ... </verb></tscreen> <sect>Use constants to access absolute memory locations<p> The compiler produces optimized code, if the value of a pointer is a constant. So, to access direct memory locations, use <tscreen><verb> #define VDC_STATUS 0xD601 *(char*)VDC_STATUS = 0x01; </verb></tscreen> That will be translated to <tscreen><verb> lda #$01 sta $D601 </verb></tscreen> The constant value detection works also for struct pointers and arrays, if the subscript is a constant. So <tscreen><verb> #define VDC ((unsigned char*)0xD600) #define STATUS 0x01 VDC[STATUS] = 0x01; </verb></tscreen> will also work. If you first load the constant into a variable and use that variable to access an absolute memory location, the generated code will be much slower, since the compiler does not know anything about the contents of the variable. <sect>Use initialized local variables<p> Initialization of local variables when declaring them gives shorter and faster code. So, use <tscreen><verb> int i = 1; </verb></tscreen> instead of <tscreen><verb> int i; i = 1; </verb></tscreen> But beware: To maximize your savings, don't mix uninitialized and initialized variables. Create one block of initialized variables and one of uniniitalized ones. The reason for this is, that the compiler will sum up the space needed for uninitialized variables as long as possible, and then allocate the space once for all these variables. If you mix uninitialized and initialized variables, you force the compiler to allocate space for the uninitialized variables each time, it parses an initialized one. So do this: <tscreen><verb> int i, j; int a = 3; int b = 0; </verb></tscreen> instead of <tscreen><verb> int i; int a = 3; int j; int b = 0; </verb></tscreen> The latter will work, but will create larger and slower code. <sect>Use the array operator [] even for pointers<p> When addressing an array via a pointer, don't use the plus and dereference operators, but the array operator. This will generate better code in some common cases. Don't use <tscreen><verb> char* a; char b, c; char b = *(a + c); </verb></tscreen> Use <tscreen><verb> char* a; char b, c; char b = a[c]; </verb></tscreen> instead. <sect>Use register variables with care<p> Register variables may give faster and shorter code, but they do also have an overhead. Register variables are actually zero page locations, so using them saves roughly one cycle per access. The calling routine may also use register variables, so the old values have to be saved on function entry and restored on exit. Saving and restoring has an overhead of about 70 cycles per 2 byte variable. It is easy to see, that - apart from the additional code that is needed to save and restore the values - you need to make heavy use of a variable to justify the overhead. As a general rule: Use register variables only for pointers that are dereferenced several times in your function, or for heavily used induction variables in a loop (with several 100 accesses). When declaring register variables, try to keep them together, because this will allow the compiler to save and restore the old values in one chunk, and not in several. And remember: Register variables must be enabled with <tt/-r/ or <tt/-Or/. <sect>Decimal constants greater than 0x7FFF are actually long ints<p> The language rules for constant numeric values specify that decimal constants without a type suffix that are not in integer range must be of type long int or unsigned long int. So a simple constant like 40000 is of type long int! This is often unexpected and may cause an expression to be evaluated with 32 bits. While in many cases the compiler takes care about it, in some places it can't. So be careful when you get a warning like <tscreen><verb> test.c(7): Warning: Constant is long </verb></tscreen> Use the <tt/U/, <tt/L/ or <tt/UL/ suffixes to tell the compiler the desired type of a numeric constant. <sect>Access to parameters in variadic functions is expensive<p> Since cc65 has the "wrong" calling order, the location of the fixed parameters in a variadic function (a function with a variable parameter list) depends on the number and size of variable arguments passed. Since this number and size is unknown at compile time, the compiler will generate code to calculate the location on the stack when needed. Because of this additional code, accessing the fixed parameters in a variadic function is much more expensive than access to parameters in a "normal" function. Unfortunately, this additional code is also invisible to the programmer, so it is easy to forget. As a rule of thumb, if you access such a parameter more than once, you should think about copying it into a normal variable and using this variable instead. </article>