There's a much faster way to do multiplication, and it all comes down to handling the business of adding up the shifted copies of the multiplicand.
Let's look at 7 * 5 in binary again:
0 1 1 1 * 0 1 0 1 0 1 1 1 0 1 0 1 * ----------------
0 0 1 0 0 0 1 1
0 0 0 0 0 1 0 1
0 1 1 1 ^ Add if 1
-------
0 1 1 1 0 1 0 1 -> shift right -> 0 0 1 1 1 0 1 0
0 0 1 1 1 0 1 0
0 1 1 1 ^ Don't add if 0
-------
0 0 1 1 1 0 1 0 -> shift right -> 0 0 0 1 1 1 0 1
0 0 0 1 1 1 0 1
0 1 1 1 ^ Add if 1
-------
1 0 0 0 1 1 0 1 -> shift right -> 0 1 0 0 0 1 1 0
0 1 0 0 0 1 1 0
0 1 1 1 ^ Don't add if 0
-------
0 1 0 0 0 1 1 0 -> shift right -> 0 0 1 0 0 0 1 1
Here we begin with zeros in the high bits of the product register, and a copy of the multiplier in the low bits of the product register. Our model is only using 4 bit words, but of course the real 6502 uses 8 bits.
We loop for as many times as there are bits in the multiplier.
Each time around the loop, we examine the lowest bit of the product (which will actually be a bit from the multiplier; the units the first time around, the twos the second time around, and so forth).
If the lowest bit is a 1, we add the multiplicand to the high bits of the product.
Whether or not we performed an addition, we shift the product one bit to the right; discarding a bit from the multiplier.
After we are done going round the loop, we have the product in its rightful place.
This works because we have shifted the product a whole word to the right, so the units bit of the product is aligned to the units bit of a word in memory. The copy of the multiplicand we might have added depending on the units bit of the multiplier has been shifted all the way down to the units position. The copy that we might have added depending on the twos bit of the multiplier has been shifted right one time fewer, down to the twos position, and so forth.
Here is the process (assuming 8 bit multiplicand and multiplier and 16 bit product; but it can easily be extended) in pseudocode:
Begin with the multiplier in the bottom 8 bits of the product Do the following 8 times: If bit 0 of the product is 1 then: Add the multiplicand to the top 8 bits of the product In any case: Shift the whole 16-bit product right
And here it is in 6502 assembly language!
.multiply \ Expects multiplier in product TYA \ stash Y register on the stack PHA LDY #8 \ loop counter LDA #0 \ initialise product STA product+1 .mult_1 LDA product AND #1 \ examine bit from multiplier BEQ mult_2 \ skip adding if zero CLC LDA product+1 ADC multiplicand \ if this sets a carry, it will just STA product+1 \ get shifted into the product
.mult_2 \ shift whole product one bit right
ROR product+1 \ bring in carry from addition ROR product \ discard used bit DEY \ decrease counter BNE mult_1 PLA TAY \ restore Y RTS
.product
EQUW 0
.multiplicand
EQUB 0
It's faster than my earlier idea because the addition is restricted to just the width of the multiplicand, not the full width of the product.