Friday, 17 July 2020

Multiplying on the 6502, but faster

There's a much faster way to do multiplication, and it all comes down to handling the business of adding up the shifted copies of the multiplicand.
Let's look at 7 * 5 in binary again:
0 1 1 1 * 0 1 0 1
         0 1 1 1
         0 1 0 1 *
----------------
0 0 1 0 0 0 1 1


0 0 0 0 0 1 0 1
0 1 1 1 ^ Add if 1
-------
0 1 1 1 0 1 0 1 -> shift right -> 0 0 1 1 1 0 1 0

0 0 1 1 1 0 1 0
0 1 1 1 ^ Don't add if 0
-------
0 0 1 1 1 0 1 0 -> shift right -> 0 0 0 1 1 1 0 1

0 0 0 1 1 1 0 1
0 1 1 1 ^ Add if 1
-------
1 0 0 0 1 1 0 1 -> shift right -> 0 1 0 0 0 1 1 0

0 1 0 0 0 1 1 0
0 1 1 1 ^ Don't add if 0
-------
0 1 0 0 0 1 1 0 -> shift right -> 0 0 1 0 0 0 1 1
Here we begin with zeros in the high bits of the product register, and a copy of the multiplier in the low bits of the product register.  Our model is only using 4 bit words, but of course the real 6502 uses 8 bits.

We loop for as many times as there are bits in the multiplier.

Each time around the loop, we examine the lowest bit of the product (which will actually be a bit from the multiplier; the units the first time around, the twos the second time around, and so forth).

If the lowest bit is a 1, we add the multiplicand to the high bits of the product.

Whether or not we performed an addition, we shift the product one bit to the right; discarding a bit from the multiplier.

After we are done going round the loop, we have the product in its rightful place.

This works because we have shifted the product a whole word to the right, so the units bit of the product is aligned to the units bit of a word in memory.  The copy of the multiplicand we might have added depending on the units bit of the multiplier has been shifted all the way down to the units position.  The copy that we might have added depending on the twos bit of the multiplier has been shifted right one time fewer, down to the twos position, and so forth.

Here is the process (assuming 8 bit multiplicand and multiplier and 16 bit product; but it can easily be extended) in pseudocode:

Begin with the multiplier in the bottom 8 bits of the product
Do the following 8 times:
    If bit 0 of the product is 1 then:
        Add the multiplicand to the top 8 bits of the product
    In any case:
    Shift the whole 16-bit product right
And here it is in 6502 assembly language!
.multiply          \ Expects multiplier in product
TYA                \ stash Y register on the stack
PHA
LDY #8             \ loop counter
LDA #0             \ initialise product
STA product+1
.mult_1
LDA product
AND #1             \ examine bit from multiplier
BEQ mult_2         \ skip adding if zero
CLC
LDA product+1
ADC multiplicand   \ if this sets a carry, it will just
STA product+1      \ get shifted into the product           
.mult_2 \ shift whole product one bit right
ROR product+1 \ bring in carry from addition ROR product \ discard used bit DEY \ decrease counter BNE mult_1 PLA TAY \ restore Y RTS
.product
EQUW 0
.multiplicand
EQUB 0

It's faster than my earlier idea because the addition is restricted to just the width of the multiplicand, not the full width of the product.

2 comments:

  1. The CLC should be before the BEQ mult_2. As it stands carry is undefined on entry (and also each time around the loop), so when we branch on BEQ mult_2 then the 2xRORs introduce spurious bits.

    ReplyDelete
    Replies
    1. No -- it's fine that way.

      The bit that gets shifted in stays out of the way of the way of the addition, and gets shifted right out the other end again by the last ROR.

      Delete