Multiplication in binary is exactly the same as the decimal multiplication you learned in primary school, but using only ones and zeros. For instance, here is how we would perform 7 * 5 in binary:

0 1 1 1 * 0 1 0 1 0 1 1 1 0 1 0 1 * ------------------- 0 1 1 1 (from the 1 in the units position) 0 1 1 1 0 0 (from the 1 in the fours position) ------------------- 1 1 1 (carry between bits) 0 0 1 0 0 0 1 1 ===================

We look at the multiplier, working from right to left. If there is a 0 at that bit, we do nothing. If there is a 1, we write down a copy of the multiplicand, lining its units digit up with the multiplier digit in question. When we have all our partial product rows, we just add them up to get our final answer. (In practice, we will keep adding as we go along.)

The 6502 has bit-shifting instructions ROL and ROR, which shift the accumulator or addressed memory location left or right respectively; importing the previous value of the carry flag onto one end as a new digit, and catching the bit that falls out the other end in the carry flag. There are also ASL and LSR instructions, which do the same but clear the carry flag first so the newly-imported bit will always be zero. Typically you would use LSR on the high byte and ROR on the lower bytes in turn, or ASL on the low byte and ROL on the higher bytes in turn; but if you wish to import a one, you could do SEC followed by ROL or ROR.

The code:

.mult8 TXA \ we are going to stomp on X PHA LDA #0 STA mulA+1 \ extend A register to 16 bits STA product \ zero product register STA product+1 LDX #8 \ 8 bits .mult8_1 \ BEGINNING OF LOOP LSR mulB \ shift B right BCC mult8_2 \ if 0 fell out, skip all this LDA mulB \ set the high bit of B, so we regenerate ORA #128 \ the original value after eight shifts STA mulB \ LDA product \ add shifted-A to the product CLC ADC mulA STA product LDA product+1 \ now the high byte ADC mulA+1 STA product+1 .mult8_2 \ paths merge again ASL mulA \ shift A left ROL mulA+1 DEX BNE mult8_1 \ move on to the next bit LDA mulA+1 \ effectively, shift A right 8 places STA mulA PLA TAX \ restore X RTS \ ..... and we're outta here

.mulA EQUD 0 \ reserve 32 bits, in case we need a 16-bit multiply .mulB EQUW 0 \ reserve 16 bits .product EQUD 0 \ reserve 32 bitsWe begin by transferring the X register into the accumulator and pushing it onto the stack, so we can retrieve it later. Then we zero out the byte immediately after A, so we have a clean 16-bit workspace with an 8-bit value at the bottom end, along which we can shift A; and the two bytes to hold the product. The last step of the initialisation phase is to load the X register with 8 (the number of times we are going to go round).

Each time around the loop, we shift B to the right each time; the rightmost bit will fall off the end and be caught in the carry flag. We can then use a BCC instruction to skip right over the addition if the bit that fell out was a 0. If it was a 1, we first set the high bit of B -- which will have the effect of regenerating the original value after 8 shifts -- and then add A (suitably shifted to line up with the original bit position out of B that was a 1) to the product. We know that the carry flag is set, but can't make use of this as we are not adding a constant. (Otherwise, we could add one less than what we wanted to add, and save a byte and two cycles clearing the carry.)

Whatever bit shifted out of B, we eventually arrive at the same point. We shift A left 1 bit and decrease X. If the result is not zero, we go around the loop again.

After eight rounds, A has been shifted left 8 bits; and thanks to our setting the high bit whenever the low bit was 1, B is exactly as it originally was. We copy A back to where it started out, pull the value we pushed earlier from the stack, transfer it back to X and return whence we came.

Note, I have purposely reserved double the required space for A, B and the product; on the basis that the same workspace could be used by both an 8-bit and a 16-bit multiply subroutine, and/or a generic, any-length multiply.