Optimising 6502 Machine Code
----------------------------

by Steven Flintham
------------------

Optimising machine code is one of those things that doesn't seem immediately useful, but you never know when you might need it. For instance, it's surprising how many short pieces of code turn out to be just a few bytes over 256 bytes (one page).

The JSR/RTS method
------------------

One of the best-known methods of cutting a few bytes out of a piece of code involves replacing a JSR followed by an RTS with a JMP. For example:

    JSR &FFEE
    RTS

can be replaced with:

    JMP &FFEE

This saves one byte (and two bytes of stack). It works because the RTS at the end of the called subroutine (here OSWRCH at &FFEE) now returns directly to your routine's caller, doing the job of the RTS you removed.

The devious branch
------------------

The 65C02 and its immediate relatives contain the additional instruction BRA, which provides an unconditional branch to nearby code, taking only two bytes compared with three for an equivalent JMP. The standard 6502, however, has no such instruction, but in some situations a conditional branch whose condition is known to be true can be used to the same effect. For example:

        LDA flag
        CMP #10
        BEQ set_A_to_0
        LDA #1
        JMP skip
    .set_A_to_0
        LDA #0
    .skip

(which sets A to 0 if flag contains 10, and to 1 otherwise) can be replaced with:

        LDA flag
        CMP #10
        BEQ set_A_to_0
        LDA #1
        BNE skip
    .set_A_to_0
        LDA #0
    .skip

This saves one byte by replacing the JMP with a BNE. It works because loading A with 1 always clears the Z flag, so the BNE is always taken.

However, don't fall into the trap of contriving such a condition. For example, replacing:

    JMP code

with:

    SEC:BCS code

provides no advantage, as it occupies the same number of bytes (three) AND corrupts the C flag. In the first example, the instruction which had to be carried out anyway (LDA #1) cleared the Z flag, enabling us to take advantage of it.
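To make the byte counts in these two sections concrete, here is a minimal Python sketch (a toy encoder, not a real assembler) covering only the handful of 6502 instructions used above. The opcode table, the (mnemonic, mode, operand) tuple format and the placeholder address &1234 standing in for skip are illustrative assumptions; relative branches take their offset directly rather than a label.

```python
# Minimal sketch: encode a few 6502 instructions so the sizes of the
# alternatives above can be compared. This is NOT a full assembler:
# only the (mnemonic, addressing mode) pairs used in the examples are listed.
OPCODES = {
    ("JSR", "abs"): 0x20,
    ("RTS", "imp"): 0x60,
    ("JMP", "abs"): 0x4C,
    ("LDA", "imm"): 0xA9,
    ("BNE", "rel"): 0xD0,
    ("SEC", "imp"): 0x38,
    ("BCS", "rel"): 0xB0,
}

# Operand bytes that follow the opcode for each addressing mode.
OPERAND_BYTES = {"imp": 0, "imm": 1, "rel": 1, "abs": 2}


def assemble(program):
    """Encode a list of (mnemonic, mode, operand) tuples into bytes.

    Absolute addresses are stored low byte first, as on the real 6502.
    Relative branches take a signed offset (-128..127) instead of a label.
    """
    out = bytearray()
    for mnemonic, mode, operand in program:
        out.append(OPCODES[(mnemonic, mode)])
        if OPERAND_BYTES[mode] == 1:
            out.append(operand & 0xFF)  # two's complement for negative offsets
        elif OPERAND_BYTES[mode] == 2:
            out += operand.to_bytes(2, "little")
    return bytes(out)


def z_after_load(value):
    """LDA sets the Z flag if the loaded byte is zero and clears it otherwise."""
    return (value & 0xFF) == 0


jsr_rts = assemble([("JSR", "abs", 0xFFEE), ("RTS", "imp", None)])
jmp_only = assemble([("JMP", "abs", 0xFFEE)])
print(jsr_rts.hex(" "), "->", jmp_only.hex(" "))  # 20 ee ff 60 -> 4c ee ff

# Skipping forward over the two-byte LDA #0 (hypothetical address &1234 for skip).
skip_jmp = assemble([("JMP", "abs", 0x1234)])
skip_bne = assemble([("BNE", "rel", 2)])
skip_sec_bcs = assemble([("SEC", "imp", None), ("BCS", "rel", 2)])
print(len(skip_jmp), len(skip_bne), len(skip_sec_bcs))  # 3 2 3

# After LDA #1 the Z flag is clear, so the BNE is always taken.
print(z_after_load(1))  # False
```

The SEC:BCS line shows why the contrived version gains nothing: three bytes either way.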
If in doubt, don't use this method, as it can lead to subtle bugs: it has been suggested that the reported bug in some versions of ViewSpell when used with the ADFS is due to an incorrect assumption about a particular flag being set or cleared.

Don't write more code than you need
-----------------------------------

If you're writing an interrupt-driven routine or a ROM service call handler, you'll have to save the processor registers and status flags at some point. However, while you're deciding whether to accept the call, or whether it's the right interrupt, you often only need to save one or two registers. For instance, if you're writing a ROM service call handler:

    .service_call_handler
        PHP:PHA:STA temp:TXA:PHA:TYA:PHA:LDA temp
        CMP #4
        BEQ command
        CMP #9
        BEQ help
        PLA:TAY:PLA:TAX:PLA:PLP
        RTS

you don't need to save the X and Y registers unless you actually have to service a call (the code at command and help can then save them itself). This not only avoids the need for the TXA:PHA:TYA:PHA, but also avoids having to store the accumulator temporarily. The rewritten code is:

    .service_call_handler
        PHP:PHA
        CMP #4
        BEQ command
        CMP #9
        BEQ help
        PLA:PLP
        RTS

This is a total saving of 14 bytes, assuming that temp is not in zero page (12 bytes if it is).

Make the most of post-indexed indirect
--------------------------------------

On the 6502, the instructions:

    LDY #0
    LDA (zero_page),Y

occur frequently. When they do, try to make use of the fact that Y now holds zero. I won't give a full example, because this is most useful in complex situations which can't be demonstrated easily. As an illustration, if you are using this addressing mode to zero an area of memory, you could use TYA to zero the accumulator instead of LDA #0, saving one byte. However, that example would be contrived, because it would be simpler to LDA #0 at the beginning and USE the Y register to step through the area of memory.
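The 14-byte figure can be checked by adding up instruction sizes, which on the 6502 depend only on the addressing mode. The Python sketch below does this for both versions of the service call handler; the mode names and the handler() and imp() helpers are hypothetical, made up for this illustration.

```python
# Minimal sketch: on the 6502 an instruction's size depends only on its
# addressing mode, so a routine's size is the sum over its instructions.
MODE_SIZE = {"imp": 1, "imm": 2, "zp": 2, "abs": 3, "rel": 2}


def size(code):
    """Total bytes for a list of (mnemonic, mode) pairs."""
    return sum(MODE_SIZE[mode] for _mnemonic, mode in code)


def imp(*mnemonics):
    """Shorthand for a run of one-byte implied-mode instructions."""
    return [(m, "imp") for m in mnemonics]


def handler(save_xy, temp_mode="abs"):
    """Hypothetical model of the service call handler listings.

    save_xy=True is the original version; temp_mode says whether 'temp'
    is an absolute ("abs") or zero-page ("zp") address.
    """
    code = imp("PHP", "PHA")
    if save_xy:
        code += [("STA", temp_mode)] + imp("TXA", "PHA", "TYA", "PHA")
        code += [("LDA", temp_mode)]
    code += [("CMP", "imm"), ("BEQ", "rel"), ("CMP", "imm"), ("BEQ", "rel")]
    if save_xy:
        code += imp("PLA", "TAY", "PLA", "TAX", "PLA", "PLP", "RTS")
    else:
        code += imp("PLA", "PLP", "RTS")
    return code


print(size(handler(True)), size(handler(False)))          # 27 13
print(size(handler(True, "zp")) - size(handler(False)))   # 12

# Reusing Y=0 left by LDY #0: TYA (implied) instead of LDA #0 (immediate).
print(size([("LDA", "imm")]) - size([("TYA", "imp")]))    # 1
```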
Use your processor
------------------

If you're writing software which will definitely be used on a Master or a 6502 second processor, both of which contain a 65C02-compatible chip, use the extra instructions.

[ Editor's note : From my limited knowledge of BBC machine code, it seems that most of the following ways of optimising code are only compatible with Master machines, or a BBC B with a 6502 second processor. If you are considering using any of the following methods, consider very carefully the meaning of the word "definitely". There is very little point in making a program incompatible with much of the 6502 world simply for the sake of a few bytes (or because you only use a Master and have forgotten that certain instructions are unavailable on the BBC B). Steven himself has in the past made the absence of 65C02-specific instructions a requirement for code submitted to him for inclusion in his ADFSUtils ROM, and this seems to me a sensible step. You may feel that the particular program you are writing is so specialised as to be of use only to yourself, but it is more than likely that other users will also find your software useful, and I would very much like to feature a wide selection of fully compatible software in 8-Bit Software.

My views on compatibility are based largely on the fact that compatibility is very easy to ensure and at the same time benefits everyone. I spent around twelve months loading PD software onto Econet (or rather, getting other people to do it for me) only to find that many otherwise excellent pieces of software did not follow the Reference Manual's recommendation that "programs which might possibly be run in a network environment" should not use certain (small) areas of memory. All PD software will probably find its way onto a network eventually, because site licences for commercial software are so expensive! More to the point, networks may seem so few in number compared with the large number of individual users as to be virtually irrelevant (in the view of many programmers!), but the particular network in question had upwards of 250 users, which makes 8-Bit Software seem fairly small potatoes by comparison.

Anyway, enough of my digression about compatibility - the rest of it is in the "Submission Requirements" section. If your program contains some other fundamental incompatibility with the BBC B, then obviously you might as well use as many 65C02 instructions as you wish. - D.G.S. ]

The stack operations
--------------------

One very obvious example, which you might miss if you're not used to the extra instructions, is replacing lines like:

    PHP:PHA:TXA:PHA:TYA:PHA

with:

    PHP:PHA:PHX:PHY

This saves two bytes and avoids corrupting the accumulator. The matching restore sequence, PLA:TAY:PLA:TAX:PLA:PLP, can likewise become PLY:PLX:PLA:PLP, saving another two bytes.

Post-indexed indirect optimisation
----------------------------------

It's surprising how often code of the form:

    LDY #0
    LDA (&70),Y

appears in 6502 machine code. The 65C02 and family provide zero-page indirect addressing without post-indexing, allowing you to write:

    LDA (&70)

This saves two bytes and avoids corrupting the Y register. However, if you haven't got a 65C02, you can try to arrange the code so that setting Y to 0 performs some other useful function, as described in "Make the most of post-indexed indirect" above.

Use BRA
-------

If your code uses JMP to skip over short sections of code, use BRA instead, which saves one byte. Like the other branch instructions, BRA can only reach destinations within -128 to +127 bytes of the instruction following it.

Use STZ
-------

Don't forget that if you have to zero an area of memory, the 65C02 and family support an STZ instruction. For example:

    LDA #0
    STA address

can be replaced with:

    STZ address

This is two bytes shorter and also avoids corrupting the accumulator and setting the Z flag.

Postscript
----------

I'm sure there are many more techniques than those listed here, but these are the ones that I find most useful. A final word of warning, however - if you're optimising a program you've already written, keep a copy of the original source code!
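The 65C02 savings quoted in the sections above can be checked in the same way. The Python sketch below is a toy encoder, not a real assembler: it lists the NMOS 6502 opcodes alongside the 65C02-only ones used in those sections, and the mode names, the hypothetical target address &3000 and the assumed forward offset of 10 bytes for BRA are illustrative choices. It also rejects branch offsets outside -128 to +127, the range limit that applies to BRA as to every other branch.

```python
# Minimal sketch: encode NMOS 6502 sequences and their 65C02 replacements
# to compare sizes. Entries marked "65C02" do not exist on the BBC B's 6502.
OPCODES = {
    ("PHP", "imp"): 0x08,
    ("PHA", "imp"): 0x48,
    ("TXA", "imp"): 0x8A,
    ("TYA", "imp"): 0x98,
    ("PHX", "imp"): 0xDA,    # 65C02
    ("PHY", "imp"): 0x5A,    # 65C02
    ("LDY", "imm"): 0xA0,
    ("LDA", "imm"): 0xA9,
    ("LDA", "indy"): 0xB1,   # LDA (zp),Y
    ("LDA", "indzp"): 0xB2,  # LDA (zp), 65C02
    ("STA", "abs"): 0x8D,
    ("STZ", "abs"): 0x9C,    # 65C02
    ("JMP", "abs"): 0x4C,
    ("BRA", "rel"): 0x80,    # 65C02
}
OPERAND_BYTES = {"imp": 0, "imm": 1, "indy": 1, "indzp": 1, "rel": 1, "abs": 2}


def branch_in_range(offset):
    """A relative branch reaches only -128..+127 bytes from the next instruction."""
    return -128 <= offset <= 127


def assemble(*program):
    """Encode (mnemonic, mode[, operand]) tuples into bytes."""
    out = bytearray()
    for mnemonic, mode, *operand in program:
        out.append(OPCODES[(mnemonic, mode)])
        n = OPERAND_BYTES[mode]
        if mode == "rel" and not branch_in_range(operand[0]):
            raise ValueError("branch out of range: use JMP instead")
        if n == 1:
            out.append(operand[0] & 0xFF)
        elif n == 2:
            out += operand[0].to_bytes(2, "little")  # low byte first
    return bytes(out)


ADDRESS = 0x3000  # hypothetical target address

pairs = {
    "stack": (assemble(("PHP", "imp"), ("PHA", "imp"), ("TXA", "imp"),
                       ("PHA", "imp"), ("TYA", "imp"), ("PHA", "imp")),
              assemble(("PHP", "imp"), ("PHA", "imp"), ("PHX", "imp"), ("PHY", "imp"))),
    "indirect": (assemble(("LDY", "imm", 0), ("LDA", "indy", 0x70)),
                 assemble(("LDA", "indzp", 0x70))),
    # BRA target assumed to be 10 bytes ahead of the next instruction.
    "skip": (assemble(("JMP", "abs", ADDRESS)), assemble(("BRA", "rel", 10))),
    "zero": (assemble(("LDA", "imm", 0), ("STA", "abs", ADDRESS)),
             assemble(("STZ", "abs", ADDRESS))),
}

for name, (old, new) in pairs.items():
    print(f"{name}: {old.hex(' ')} ({len(old)}) -> {new.hex(' ')} ({len(new)})")
```

Running it prints one line per technique, for example "stack: 08 48 8a 48 98 48 (6) -> 08 48 da 5a (4)", matching the savings described in the text.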