Tuesday 9 April 2019

Multiplying on the 6502

The 6502 does not have a hardware multiply instruction, so we need to write a multiply routine in software. Here we need to multiply an 8-bit value by an 8-bit value, giving a 16-bit product; but the principle can be extended up to a 256-bit multiplicand, 256-bit multiplier and 512-bit product.

Multiplication in binary is exactly the same as the decimal multiplication you learned in primary school, but using only ones and zeros.  For instance, here is how we would perform 7 * 5 in binary:

0 1 1 1 * 0 1 0 1
         0 1 1 1
         0 1 0 1 *
-------------------
         0 1 1 1    (from the 1 in the units position)
     0 1 1 1 0 0    (from the 1 in the fours position)
-------------------
      1 1 1         (carry between bits)
 0 0 1 0 0 0 1 1
===================

We look at the multiplier, working from right to left.  If there is a 0 at that bit, we do nothing.  If there is a 1, we write down a copy of the multiplicand, lining its units digit up with the multiplier digit in question.  When we have all our partial product rows, we just add them up to get our final answer.  (In practice, we will keep adding as we go along.)
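
The procedure above, with its running total, can be sketched in a few lines of Python. Python is used only so the sketch is easy to run and check; the function name long_multiply is my own. It works on ordinary integers rather than bytes, so it shows the algorithm, not the 6502 mechanics.

```python
def long_multiply(multiplicand, multiplier):
    # Scan the multiplier from right to left; for every 1 bit, add a copy of
    # the multiplicand shifted to line up with that bit (a partial product).
    product = 0
    position = 0
    while multiplier:
        if multiplier & 1:                        # a 1 in this column
            product += multiplicand << position   # add the shifted partial product
        multiplier >>= 1                          # move one column to the left
        position += 1
    return product

print(bin(long_multiply(0b0111, 0b0101)))  # 0b100011, i.e. 35
```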

The 6502 has the rotate instructions ROL and ROR, which shift the accumulator or an addressed memory location left or right respectively, feeding the previous value of the carry flag in at one end as the new bit and catching the bit that falls out of the other end in the carry flag. There are also ASL and LSR, which behave the same way except that the bit fed in is always zero, exactly as if the carry flag had been cleared first. To shift a multi-byte value you would typically use LSR on the high byte and then ROR on each lower byte in turn, or ASL on the low byte and then ROL on each higher byte in turn; and if you wish to feed in a one, you can use SEC followed by ROL or ROR.
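
To make the carry plumbing concrete, here is a toy Python model of the four instructions acting on a single byte, plus the two multi-byte patterns just described, applied to a 16-bit value held as separate low and high bytes. Each function returns the new byte and the new carry flag. The names mirror the mnemonics but are my own, and every flag other than carry is ignored.

```python
def asl(value, carry):
    # Shift left: bit 7 goes to carry, a 0 comes in (incoming carry ignored).
    return (value << 1) & 0xFF, (value >> 7) & 1

def rol(value, carry):
    # Rotate left through carry: the old carry comes in at bit 0.
    return ((value << 1) | carry) & 0xFF, (value >> 7) & 1

def lsr(value, carry):
    # Shift right: bit 0 goes to carry, a 0 comes in (incoming carry ignored).
    return value >> 1, value & 1

def ror(value, carry):
    # Rotate right through carry: the old carry comes in at bit 7.
    return (value >> 1) | (carry << 7), value & 1

def shift16_left(lo, hi):
    # ASL on the low byte, then ROL on the high byte.
    lo, c = asl(lo, 0)
    hi, c = rol(hi, c)
    return lo, hi, c

def shift16_right(lo, hi):
    # LSR on the high byte, then ROR on the low byte.
    hi, c = lsr(hi, 0)
    lo, c = ror(lo, c)
    return lo, hi, c

print(shift16_left(0xFF, 0x01))   # (254, 3, 0): 0x01FF becomes 0x03FE
print(shift16_right(0x01, 0x03))  # (128, 1, 1): 0x0301 becomes 0x0180, a 1 falls out
```

Note how SEC followed by ROL corresponds to calling rol with a carry of 1, which feeds a one in at bit 0.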

The code:
.mult8
TXA              \ we are going to stomp on X
PHA
LDA #0
STA mulA+1       \ zero the high byte, extending mulA to 16 bits
STA product      \ zero product register
STA product+1
LDX #8           \ 8 bits
.mult8_1         \ BEGINNING OF LOOP
LSR mulB         \   shift B right
BCC mult8_2      \   if 0 fell out, skip all this
LDA mulB         \     set the high bit of B, so we regenerate
ORA #128         \     the original value after eight shifts
STA mulB         \     (carry is still set from the LSR)
LDA product      \     add shifted-A to the product
CLC
ADC mulA
STA product
LDA product+1    \     now the high byte
ADC mulA+1
STA product+1
.mult8_2         \  paths merge again
ASL mulA         \  shift A left
ROL mulA+1
DEX
BNE mult8_1      \ move on to the next bit
LDA mulA+1       \ effectively, shift A right 8 places
STA mulA
PLA
TAX              \ restore X
RTS              \ ..... and we're outta here
 
.mulA    EQUD 0  \ reserve 32 bits, in case we need a 16-bit multiply
.mulB    EQUW 0  \ reserve 16 bits
.product EQUD 0  \ reserve 32 bits

We begin by transferring X into the accumulator and pushing it onto the stack, so that we can retrieve it later. Then we zero the byte immediately after mulA, giving a clean 16-bit workspace with the 8-bit multiplicand in its low byte, along which we can shift A; and we zero the two bytes that will hold the product. The last step of the initialisation phase is to load X with 8, the number of times we are going to go round the loop.
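
Zeroing the byte after mulA is more than tidiness: when the routine returns, mulA+1 still holds a copy of the multiplicand (the final LDA/STA copies it down but does not clear it), so a later call would otherwise shift that leftover byte along with A. Below is a minimal Python sketch of the 16-bit pair, modelling mulA and mulA+1 as a little-endian low/high byte pair; the helper names and the leftover value 5 are illustrative assumptions.

```python
def pair_value(lo, hi):
    # Read a little-endian byte pair (like mulA, mulA+1) as one 16-bit number.
    return lo | (hi << 8)

def shift_pair_left(lo, hi, times):
    # Repeat ASL on the low byte and ROL on the high byte `times` times.
    for _ in range(times):
        carry = lo >> 7                      # bit 7 of the low byte falls out
        lo = (lo << 1) & 0xFF                # ASL mulA
        hi = ((hi << 1) | carry) & 0xFF      # ROL mulA+1 picks that bit up
    return lo, hi

clean = shift_pair_left(7, 0, 2)   # high byte zeroed first, as mult8 does
stale = shift_pair_left(7, 5, 2)   # hypothetical: 5 left over in mulA+1
print(pair_value(*clean))          # 28, i.e. 7 shifted left twice
print(pair_value(*stale))          # 5148, polluted by the leftover byte
```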

Each time around the loop, we shift B one place to the right; its rightmost bit falls off the end and is caught in the carry flag. We can then use a BCC instruction to skip right over the addition if the bit that fell out was a 0. If it was a 1, we first set the high bit of B, which will have the effect of regenerating the original value after 8 shifts, and then add A (suitably shifted to line up with the bit of B that was a 1) to the product. We know that the carry flag is set at this point, since LDA, ORA and STA leave it alone, but we can't make use of this as we are not adding a constant. (Otherwise, we could add one less than the value we wanted, and save the byte and two cycles spent clearing the carry.)
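
Two claims in the paragraph above can be checked mechanically: that LSR followed by setting the high bit whenever a 1 fell out amounts to a rotate right, so eight passes restore B; and that with the carry already set, adding a constant k needs only ADC #(k-1). Here is a Python sketch under simple modelling assumptions (binary mode only, no decimal flag; the function names are my own).

```python
def lsr_then_restore_high_bit(b):
    # LSR mulB: bit 0 falls out into the carry, a 0 comes in at bit 7.
    carry = b & 1
    b >>= 1
    # If a 1 fell out, LDA mulB / ORA #128 / STA mulB puts a 1 back at bit 7,
    # so together the two steps behave like an 8-bit rotate right.
    if carry:
        b |= 0x80
    return b, carry

def adc(acc, operand, carry):
    # Binary-mode ADC: add operand plus carry; return result byte and carry out.
    total = acc + operand + carry
    return total & 0xFF, total >> 8

# Eight passes bring every possible value of B back to where it started.
for original in range(256):
    b = original
    for _pass in range(8):
        b, _fell_out = lsr_then_restore_high_bit(b)
    assert b == original

# With the carry already set, ADC #(k - 1) adds k without a CLC. mult8 cannot
# use this because mulA is a variable, not a constant known in advance.
print(adc(100, 10 - 1, 1))   # (110, 0)
```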

Whichever bit was shifted out of B, both paths meet again at mult8_2. There we shift the 16-bit A left by one bit and decrement X; if the result is not zero, we go around the loop again.
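
Because the test comes after the decrement, LDX #8 gives exactly eight passes. A small Python model of the DEX/BNE pattern with an 8-bit X (the function name is my own) makes the count explicit, and also shows the wrap-around case: starting from 0, DEX gives 255 and the loop runs 256 times.

```python
def dex_bne_passes(start_x):
    # Count how often the loop body runs for:  LDX #start / .loop ... DEX / BNE loop
    # X is an 8-bit register, so DEX on 0 wraps round to 255.
    x = start_x & 0xFF
    passes = 0
    while True:
        passes += 1            # the loop body runs
        x = (x - 1) & 0xFF     # DEX
        if x == 0:             # BNE falls through once the result is zero
            return passes

print(dex_bne_passes(8))   # 8
print(dex_bne_passes(0))   # 256
```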

After eight rounds, A has been shifted left 8 bits, so the original multiplicand now sits in its high byte; and thanks to our setting the high bit of B whenever its low bit was 1, B is exactly as it originally was. We copy the high byte of A back down to mulA, where it started out, pull the value we pushed earlier from the stack, transfer it back to X and return whence we came.
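
To tie the walkthrough together, here is a byte-for-byte Python transliteration of mult8, assuming a plain dict stands in for the workspace bytes (the dict and its string keys are a modelling device, not part of the original). X is not modelled, since the routine saves and restores it.

```python
def mult8(mem):
    # Python model of mult8. `mem` maps the workspace labels to byte values.
    mem['mulA+1'] = 0                      # extend A to 16 bits
    mem['product'] = 0                     # zero the product
    mem['product+1'] = 0
    for _ in range(8):                     # LDX #8 ... DEX / BNE mult8_1
        fell_out = mem['mulB'] & 1         # LSR mulB
        mem['mulB'] >>= 1
        if fell_out:                       # BCC mult8_2 skips this when 0
            mem['mulB'] |= 0x80            # ORA #128 rebuilds B as we go
            low = mem['product'] + mem['mulA']                    # CLC / ADC mulA
            mem['product'] = low & 0xFF
            high = mem['product+1'] + mem['mulA+1'] + (low >> 8)  # ADC mulA+1
            mem['product+1'] = high & 0xFF
        carry = mem['mulA'] >> 7           # ASL mulA
        mem['mulA'] = (mem['mulA'] << 1) & 0xFF
        mem['mulA+1'] = ((mem['mulA+1'] << 1) | carry) & 0xFF    # ROL mulA+1
    mem['mulA'] = mem['mulA+1']            # LDA mulA+1 / STA mulA

mem = {'mulA': 7, 'mulB': 5}
mult8(mem)
print(mem['product'] | (mem['product+1'] << 8), mem['mulA'], mem['mulB'])  # 35 7 5
```

Looping this over all 65,536 pairs of inputs and comparing against a * b, and against the original A and B, is a cheap way to gain confidence in a transliteration like this before trusting it.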

Note that I have deliberately reserved double the required space for A, B and the product, so that the same workspace could be shared by an 8-bit and a 16-bit multiply subroutine, and/or a generic any-length multiply.
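
For the generic, any-length multiply mentioned above, one way to extend the same idea is sketched below in Python over little-endian byte lists. This is my own hypothetical generalisation, not code from this routine: it reads each multiplier bit directly rather than rotating B in place, and it assumes both operands have the same number of bytes, giving a product twice as long.

```python
def mult_bytes(a, b):
    # Hypothetical any-length shift-and-add multiply. `a` and `b` are
    # little-endian lists of n bytes each; the result is a 2n-byte list.
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same number of bytes")
    shifted = list(a) + [0] * n          # multiplicand widened to 2n bytes
    product = [0] * (2 * n)
    for byte in b:                       # multiplier bytes, lowest first
        for bit in range(8):             # and each byte's bits, lowest first
            if (byte >> bit) & 1:
                carry = 0                # CLC, then ADC byte by byte
                for i in range(2 * n):
                    total = product[i] + shifted[i] + carry
                    product[i] = total & 0xFF
                    carry = total >> 8
            carry = 0                    # ASL the low byte, ROL the rest
            for i in range(2 * n):
                out = shifted[i] >> 7
                shifted[i] = ((shifted[i] << 1) | carry) & 0xFF
                carry = out
    return product

result = mult_bytes([0x34, 0x12], [0x78, 0x56])                    # 0x1234 * 0x5678
print(int.from_bytes(bytes(result), 'little') == 0x1234 * 0x5678)  # True
```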