Becoming Julie: 2019

The 6502 does not have a hardware multiply instruction, so we need to write ourselves a multiply routine in software. Here we need to multiply an 8-bit value by an 8-bit value, giving a 16-bit product, but the principle can be extended up to a 256-bit multiplicand, 256-bit multiplier and 512-bit product.

Multiplication in binary is exactly the same as the decimal multiplication you learned in primary school, but using only ones and zeros. For instance, here is how we would perform 7 * 5 in binary:

0 1 1 1 * 0 1 0 1
         0 1 1 1
         0 1 0 1 *
-------------------
         0 1 1 1    (from the 1 in the units position)
     0 1 1 1 0 0    (from the 1 in the fours position)
-------------------
      1 1 1         (carry between bits)
 0 0 1 0 0 0 1 1
===================

We look at the multiplier, working from right to left. If there is a 0 at that bit, we do nothing. If there is a 1, we write down a copy of the multiplicand, lining its units digit up with the multiplier digit in question. When we have all our partial product rows, we just add them up to get our final answer. (In practice, we will keep adding as we go along.)
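For anyone who would rather see that as code before diving into assembly language, here is a minimal sketch of the pencil-and-paper method in Python. It is only a model of the idea, not the 6502 routine, and the name `shift_and_add` is made up for illustration.

```python
def shift_and_add(multiplicand, multiplier, bits=8):
    """Multiply two unsigned integers by the pencil-and-paper binary method."""
    product = 0
    for position in range(bits):
        # Look at the multiplier one bit at a time, working right to left.
        if (multiplier >> position) & 1:
            # A 1 contributes a copy of the multiplicand, shifted left
            # so its units digit lines up with this multiplier bit.
            product += multiplicand << position
        # A 0 contributes nothing, so we do nothing.
    return product


print(format(shift_and_add(7, 5), "08b"))  # prints 00100011, i.e. 35
```

Note that the sketch adds each partial product as soon as it is found, rather than collecting all the rows first.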

The 6502 has bit-shifting instructions ROL and ROR, which shift the accumulator or addressed memory location left or right respectively, importing the previous value of the carry flag onto one end as a new digit and catching the bit that falls out the other end in the carry flag. There are also ASL and LSR instructions, which behave as if the carry flag had been cleared first, so the newly-imported bit will always be zero. Typically you would use LSR on the high byte and ROR on the lower bytes in turn, or ASL on the low byte and ROL on the higher bytes in turn; but if you wish to import a one, you could do SEC followed by ROL or ROR.
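A rough Python model of those four instructions may make the multi-byte trick clearer. The function names below are invented for this sketch; each one takes an 8-bit value (and, for the rotates, the incoming carry) and returns the new value together with the outgoing carry.

```python
def rol(value, carry):
    """Rotate left: carry comes in at bit 0, bit 7 falls out into carry."""
    return ((value << 1) | carry) & 0xFF, (value >> 7) & 1


def ror(value, carry):
    """Rotate right: carry comes in at bit 7, bit 0 falls out into carry."""
    return (value >> 1) | (carry << 7), value & 1


def asl(value):
    """Shift left: same as ROL with a zero coming in."""
    return rol(value, 0)


def lsr(value):
    """Shift right: same as ROR with a zero coming in."""
    return ror(value, 0)


def shift_left_16(lo, hi):
    """ASL on the low byte, then ROL on the high byte."""
    lo, carry = asl(lo)
    hi, carry = rol(hi, carry)
    return lo, hi, carry


def shift_right_16(lo, hi):
    """LSR on the high byte, then ROR on the low byte."""
    hi, carry = lsr(hi)
    lo, carry = ror(lo, carry)
    return lo, hi, carry
```

The carry flag is what passes the bit from one byte to the next: whatever falls out of the first byte is picked up by the rotate on its neighbour.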

The code:

.mult8
TXA              \ we are going to stomp on X
PHA
LDA #0
STA mulA+1       \ extend mulA to 16 bits
STA product      \ zero product register
STA product+1
LDX #8           \ 8 bits
.mult8_1         \ BEGINNING OF LOOP
LSR mulB         \   shift B right
BCC mult8_2      \   if 0 fell out, skip all this
LDA mulB         \     set the high bit of B, so we regenerate
ORA #128         \     the original value after eight shifts
STA mulB         \ 
LDA product      \     add shifted-A to the product
CLC
ADC mulA
STA product
LDA product+1    \     now the high byte
ADC mulA+1
STA product+1
.mult8_2         \  paths merge again
ASL mulA         \  shift A left
ROL mulA+1
DEX
BNE mult8_1      \ move on to the next bit
LDA mulA+1       \ effectively, shift A right 8 places
STA mulA
PLA
TAX              \ restore X
RTS              \ ..... and we're outta here

.mulA    EQUD 0  \ reserve 32 bits, in case we need a 16-bit multiply
.mulB    EQUW 0  \ reserve 16 bits
.product EQUD 0  \ reserve 32 bits

We begin by transferring the X register into the accumulator and pushing it onto the stack, so we can retrieve it later. Then we zero out the byte immediately after A (here, A and B mean the workspace locations mulA and mulB, not the accumulator), so we have a clean 16-bit workspace with an 8-bit value at the bottom end, along which we can shift A; and we zero the two bytes that will hold the product. The last step of the initialisation phase is to load the X register with 8 (the number of times we are going to go round the loop).

Each time around the loop, we shift B to the right; the rightmost bit will fall off the end and be caught in the carry flag. We can then use a BCC instruction to skip right over the addition if the bit that fell out was a 0. If it was a 1, we first set the high bit of B -- which will have the effect of regenerating the original value after 8 shifts -- and then add A (suitably shifted to line up with the original bit position out of B that was a 1) to the product. We know that the carry flag is set at this point, but can't make use of this as we are not adding a constant. (Otherwise, we could add one less than what we wanted to add, and save a byte and two cycles clearing the carry.)
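To illustrate that parenthetical trick, here is a tiny Python model of a binary-mode ADC. The `adc` function is a made-up stand-in for the instruction, and the values 10, 5 and 4 are arbitrary. It only helps when the operand is a constant known at assembly time, which is why it is no use to us when adding mulA.

```python
def adc(accumulator, operand, carry):
    """Model of binary-mode ADC: returns (new accumulator, new carry)."""
    total = accumulator + operand + carry
    return total & 0xFF, total >> 8


# The usual way to add 5: CLC, then ADC #5.
usual = adc(10, 5, 0)

# If the carry is already known to be set: skip the CLC and use ADC #4.
trick = adc(10, 4, 1)

print(usual == trick)  # prints True
```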

Whatever bit shifted out of B, we eventually arrive at the same point. We shift A left 1 bit and decrease X. If the result is not zero, we go around the loop again.

After eight rounds, A has been shifted left 8 bits; and thanks to our setting the high bit whenever the low bit was 1, B is exactly as it originally was. We copy A back to where it started out, pull the value we pushed earlier from the stack, transfer it back to X and return whence we came.
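As a sanity check on the walkthrough, the whole routine can be modelled step by step in Python. This is a sketch under the assumption that the model mirrors the assembly faithfully; `mult8_model` is an invented name, and the comments show which instructions each line stands in for.

```python
def mult8_model(mul_a, mul_b):
    """Model of mult8: returns (product, mulA afterwards, mulB afterwards)."""
    a16 = mul_a & 0xFF              # mulA, with mulA+1 zeroed
    b = mul_b & 0xFF
    product = 0                     # product and product+1 zeroed
    for _ in range(8):              # LDX #8 ... DEX / BNE mult8_1
        carry = b & 1               # LSR mulB: low bit drops into carry
        b >>= 1
        if carry:                   # BCC mult8_2 skips this for a 0
            b |= 0x80               # ORA #128: put the 1 back in at the top
            product = (product + a16) & 0xFFFF   # CLC / ADC, low then high
        a16 = (a16 << 1) & 0xFFFF   # ASL mulA / ROL mulA+1
    a_restored = a16 >> 8           # LDA mulA+1 / STA mulA
    return product, a_restored, b


print(mult8_model(7, 5))  # prints (35, 7, 5)
```

Shifting B right and putting any 1 that falls out back in at the top amounts to an 8-bit rotate, which is why eight rounds leave it unchanged.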

Note, I have purposely reserved double the required space for A, B and the product; on the basis that the same workspace could be used by both an 8-bit and a 16-bit multiply subroutine, and/or a generic, any-length multiply.
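For what a generic, any-length version might look like, here is a hedged Python sketch working on little-endian lists of bytes. It is not code from this project: `multiply_bytes` is a hypothetical name, and unlike mult8 it makes no attempt to regenerate the multiplier or multiplicand afterwards.

```python
def multiply_bytes(a_bytes, b_bytes):
    """Shift-and-add on two little-endian byte lists of equal length n.

    Returns the product as a little-endian list of 2n bytes.
    """
    n = len(a_bytes)
    a = list(a_bytes) + [0] * n     # extend the multiplicand to 2n bytes
    b = list(b_bytes)
    product = [0] * (2 * n)
    for _ in range(8 * n):
        # Shift b right one bit: LSR on the high byte, ROR on the rest.
        carry = 0
        for i in reversed(range(n)):
            carry, b[i] = b[i] & 1, (b[i] >> 1) | (carry << 7)
        if carry:
            # A 1 fell out: add the shifted multiplicand, low byte first.
            c = 0
            for i in range(2 * n):
                total = product[i] + a[i] + c
                product[i], c = total & 0xFF, total >> 8
        # Shift a left one bit: ASL on the low byte, ROL on the rest.
        carry = 0
        for i in range(2 * n):
            carry, a[i] = a[i] >> 7, ((a[i] << 1) | carry) & 0xFF
    return product


result = multiply_bytes([0x34, 0x12], [0xCD, 0xAB])   # 0x1234 * 0xABCD
print(int.from_bytes(bytes(result), "little") == 0x1234 * 0xABCD)  # prints True
```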

So I saw a post on a forum somewhere, a while ago, and it set a train of thought in motion, which has led me to take on a programming project. I can't reveal too many of the details here, in case the original poster is reading this and twigs to what I am doing. But the hardware platform is the BBC Micro, a popular 8-bit machine in the 1980s, and the project -- described in its most abstract terms -- consists of a relational database with three main tables; a visual editor; and a report generator.

One of the tables will be populated by some external means (because I can't be bothered to design a visual editor for it right now) by translating an input file from a standard format. Another will be fairly static, and will govern the visual representations of items that may be described in the input table. The third table will store the user's creative input which, in accordance with rules defined in the first table, governs the arrangement and interactions of items which are represented visually according to the second table.

(Don't worry if you don't understand any of that. It will probably get a bit more obvious.)

Rather than get an actual BBC Micro at this stage, and have to put up with slow cassette and disc loading times, and possibly have to replace the machine with a Master 128 (and get into a last-minute bidding war) if I can't fit the project into the Model B into which I was initially determined to shoehorn it, I've decided to use an emulator. In this case, the excellent BeebEm (instructions given for Ubuntu; but as a build-from-source, the second half should work on anything once you have the relevant packages and their -dev or -devel companions -- yes, you are a developer).

Of course, it helps that I grew up with a BBC Model B and know enough BASIC and 6502 assembly language to get the job done. I've also got a slight advantage as my visual editing isn't going to require fast-moving multi-coloured sprites or anything else that cannot be rendered using universally-acceptable OS calls. It can all be made completely out of the lines and triangles (yes, 32KB of ROM and the most complicated shape a Model B can draw is a triangle) that the operating system can already produce. I'd get no benefit out of hitting the display memory directly; it's not as though I could write a faster triangle-drawing routine than Sophie Wilson and Steve Furber already did. So by keeping to "legal moves", if I can't fit it into a 32K B, I can retarget it seamlessly at the 128.

First off, I'm going to start with the visualisation, because that will give me some important clues for structuring the data in Table Two.

Tuesday, 9 April 2019

Multiplying on the 6502

Saturday, 16 March 2019

A Retro Programming Project