Friday 17 July 2020

Multiplying on the 6502, but faster

There's a much faster way to do multiplication, and it all comes down to handling the business of adding up the shifted copies of the multiplicand.
Let's look at 7 * 5 in binary again:
0 1 1 1 * 0 1 0 1
         0 1 1 1
         0 1 0 1 *
----------------
0 0 1 0 0 0 1 1


0 0 0 0 0 1 0 1
0 1 1 1 ^ Add if 1
-------
0 1 1 1 0 1 0 1 -> shift right -> 0 0 1 1 1 0 1 0

0 0 1 1 1 0 1 0
0 1 1 1 ^ Don't add if 0
-------
0 0 1 1 1 0 1 0 -> shift right -> 0 0 0 1 1 1 0 1

0 0 0 1 1 1 0 1
0 1 1 1 ^ Add if 1
-------
1 0 0 0 1 1 0 1 -> shift right -> 0 1 0 0 0 1 1 0

0 1 0 0 0 1 1 0
0 1 1 1 ^ Don't add if 0
-------
0 1 0 0 0 1 1 0 -> shift right -> 0 0 1 0 0 0 1 1
Here we begin with zeros in the high bits of the product register, and a copy of the multiplier in the low bits of the product register.  Our model is only using 4 bit words, but of course the real 6502 uses 8 bits.

We loop for as many times as there are bits in the multiplier.

Each time around the loop, we examine the lowest bit of the product (which will actually be a bit from the multiplier; the units the first time around, the twos the second time around, and so forth).

If the lowest bit is a 1, we add the multiplicand to the high bits of the product.

Whether or not we performed an addition, we shift the product one bit to the right; discarding a bit from the multiplier.

After we are done going round the loop, we have the product in its rightful place.

This works because we have shifted the product a whole word to the right, so the units bit of the product is aligned to the units bit of a word in memory.  The copy of the multiplicand we might have added depending on the units bit of the multiplier has been shifted all the way down to the units position.  The copy that we might have added depending on the twos bit of the multiplier has been shifted right one time fewer, down to the twos position, and so forth.

Here is the process (assuming 8 bit multiplicand and multiplier and 16 bit product; but it can easily be extended) in pseudocode:

Begin with the multiplier in the bottom 8 bits of the product
Do the following 8 times:
    If bit 0 of the product is 1 then:
        Add the multiplicand to the top 8 bits of the product
    In any case:
    Shift the whole 16-bit product right
And here it is in 6502 assembly language!
.multiply          \ Expects multiplier in product
TYA                \ stash Y register on the stack
PHA
LDY #8             \ loop counter
LDA #0             \ initialise product
STA product+1
.mult_1
LDA product
AND #1             \ examine bit from multiplier
BEQ mult_2         \ skip adding if zero
CLC
LDA product+1
ADC multiplicand   \ if this sets a carry, it will just
STA product+1      \ get shifted into the product           
.mult_2 \ shift whole product one bit right
ROR product+1 \ bring in carry from addition ROR product \ discard used bit DEY \ decrease counter BNE mult_1 PLA TAY \ restore Y RTS
.product
EQUW 0
.multiplicand
EQUB 0

It's faster than my earlier idea because the addition is restricted to just the width of the multiplicand, not the full width of the product.

Thursday 16 July 2020

Creating "universal" .deb packages

This is a way to create a "universal" .deb package suitable for installing on Debian, Ubuntu and similar distributions such as Linux Mint; and on any machine architecture  (64-bit, 32-bit, even Raspberry Pi).

The package actually installs the source files under /usr/src/ and performs the compilation and installation of the actual binary in the post-installation phase.   The package manager will make sure all dependencies are installed before running the post-install script.  As long as all dependencies are correctly specified  (remember, if you need libfoo to run something, you will also need libfoo-dev to build it)  then the overall effect will be exactly the same as if they had installed a pre-compiled binary package; but you only have to .deb-ify it once, and you won't be expecting random strangers to install unknown binaries on their systems.

We are going to create a package called "wibble", which will contain this simple program:

#include <stdio.h>
int main() {
    printf("Wibble\n");
    return 0;
};

All it does is display "Wibble", but the same principles obviously are applicable to anything more ambitious.

Tuesday 9 April 2019

Multiplying on the 6502

The 6502 does not have a hardware multiply instruction, so we need to write us a multiply routine in software. Here we need to multiply an 8-bit value by an 8-bit value giving a 16-bit product, but the principle can be extended up to a 256-bit multiplicand, 256-bit multiplier and 512-bit product.

Multiplication in binary is exactly the same as the decimal multiplication you learned in primary school, but using only ones and zeros.  For instance, here is how we would perform 7 * 5 in binary:

0 1 1 1 * 0 1 0 1
         0 1 1 1
         0 1 0 1 *
-------------------
         0 1 1 1    (from the 1 in the units position)
     0 1 1 1 0 0    (from the 1 in the fours position)
-------------------
      1 1 1         (carry between bits)
 0 0 1 0 0 0 1 1
===================

We look at the multiplier, working from right to left.  If there is a 0 at that bit, we do nothing.  If there is a 1, we write down a copy of the multiplicand, lining its units digit up with the multiplier digit in question.  When we have all our partial product rows, we just add them up to get our final answer.  (In practice, we will keep adding as we go along.)

The 6502 has bit-shifting instructions ROL and ROR, which shift the accumulator or addressed memory location left or right respectively; importing the previous value of the carry flag onto one end as a new digit, and catching the bit that falls out the other end in the carry flag. There are also ASL and LSR instructions, which do the same but clear the carry flag first so the newly-imported bit will always be zero.  Typically you would use LSR on the high byte and ROR on the lower bytes in turn, or ASL on the low byte and ROL on the higher bytes in turn; but if you wish to import a one, you could do SEC followed by ROL or ROR.

The code:
.mult8
TXA              \ we are going to stomp on X
PHA
LDA #0
STA mulA+1       \ extend A register to 16 bits
STA product      \ zero product register
STA product+1
LDX #8           \ 8 bits
.mult8_1         \ BEGINNING OF LOOP
LSR mulB         \   shift B right
BCC mult8_2      \   if 0 fell out, skip all this
LDA mulB         \     set the high bit of B, so we regenerate
ORA #128         \     the original value after eight shifts
STA mulB         \ 
LDA product      \     add shifted-A to the product
CLC
ADC mulA
STA product
LDA product+1    \     now the high byte
ADC mulA+1
STA product+1
.mult8_2         \  paths merge again
ASL mulA         \  shift A left
ROL mulA+1
DEX
BNE mult8_1      \ move on to the next bit
LDA mulA+1       \ effectively, shift A right 8 places
STA mulA
PLA
TAX              \ restore X
RTS              \ ..... and we're outta here
 
.mulA    EQUD 0  \ reserve 32 bits, in case we need a 16-bit multiply
.mulB    EQUW 0  \ reserve 16 bits
.product EQUD 0  \ reserve 32 bits
We begin by transferring the X register into the accumulator and pushing it onto the stack, so we can retrieve it later.  Then we zero out the byte immediately after A, so we have a clean 16-bit workspace with an 8-bit value at the bottom end, along which we can shift A; and the two bytes to hold the product.  The last step of the initialisation phase is to load the X register with 8  (the number of times we are going to go round).

Each time around the loop, we shift B to the right each time; the rightmost bit will fall off the end and be caught in the carry flag.  We can then use a BCC instruction to skip right over the addition if the bit that fell out was a 0.  If it was a 1, we first set the high bit of B -- which will have the effect of regenerating the original value after 8 shifts -- and then add A  (suitably shifted to line up with the original bit position out of B that was a 1)  to the product.  We know that the carry flag is set, but can't make use of this as we are not adding a constant.  (Otherwise, we could add one less than what we wanted to add, and save a byte and two cycles clearing the carry.)

Whatever bit shifted out of B, we eventually arrive at the same point.  We shift A left 1 bit and decrease X.  If the result is not zero, we go around the loop again.

After eight rounds, A has been shifted left 8 bits; and thanks to our setting the high bit whenever the low bit was 1, B is exactly as it originally was.  We copy A back to where it started out, pull the value we pushed earlier from the stack, transfer it back to X and return whence we came.

Note, I have purposely reserved double the required space for A, B and the product; on the basis that the same workspace could be used by both an 8-bit and a 16-bit multiply subroutine, and/or a generic, any-length multiply.

Saturday 16 March 2019

A Retro Programming Project

So I saw a post on a forum somewhere, awhile ago, and it set a train of thought in motion.  Which has led me to take on a programming project.  I can't reveal too many of the details here, in case the original poster is reading this and twigs to what I am doing.  But, the hardware platform is the BBC Micro, a popular 8-bit machine in the 1980s, and the project -- described in its most abstract terms -- consists of a relational database with three main tables; a visual editor; and a report generator.

One of the tables will be populated by some external means  (because I can't be bothered to design a visual editor for it right now)  by translating an input file from a standard format.  Another will be fairly static, and will govern the visual representations of items that may be described in the input table.  The third table will store the user's creative input which, in accordance with rules defined in the first table, govern the arrangement and interactions of items which are represented visually according to the second table.

(Don't worry if you don't understand any of that.  It will probably get a bit more obvious.)

Rather than get an actual BBC Micro at this stage, and have to put up with slow cassette and disc loading times and possibly replace the machine if I can't fit it into the model B into which I was initially determined to shoehorn it, have to get a Master 128 instead and get into a last-minute bidding war, I've decided to use an emulator.  In this case, the excellent BeebEm  (instructions given for Ubuntu; but as a build-from-source, the second half should work on anything once you have the relevant packages and their -dev or -devel companions -- yes, you are a developer.)

Of course, it helps that I grew up with a BBC Model B and know enough BASIC and 6502 assembly language to get the job done.  I've also got a slight advantage as my visual editing isn't going to require fast-moving multi-coloured sprites or anything else that cannot be rendered using universally-acceptable OS calls.  It can all be made completely out of the lines and triangles  (yes, 32KB of ROM and the most complicated shape a Model B can draw is a triangle)  that the operating system can already produce.  I'd get no benefit out of hitting the display memory directly; it's not as though I could write a faster triangle-drawing routine than Sophie Wilson and Steve Furber already did.  So by keeping to "legal moves", if I can't fit it into a 32K B, I can retarget it seamlessly at the 128.

First off, I'm going to start with the visualisation, because that will give me some important clues for structuring the data in Table Two.

Tuesday 21 August 2018

Eid-ul-Adha: Celebrating an Obscenity

Today is the Muslim feast of Eid-ul-Adha.  This celebrates an event in both the Jewish / Christian and Islamic mythologies.  You can read it in Genesis 22 or Surah 37.  It's not for the squeamish.

Basically, God told Abraham  (or Ibrahim)  to sacrifice his son Isaac  (or Ishmael),  to prove he loved God more than even his own family.

So 'Bram told God in no uncertain terms to fuck off, and spent the rest of his days as a happy humanist, right?

Tuesday 18 April 2017

This year, I will definitely remember my mother's birthday

This year, I will definitely remember my mother's birthday.  For she was born on the 8 June, one year I am not allowed to mention.

That is also the last day anything is going to make sense, ever.

Thursday 27 October 2016

Ransomware (no, not that sort)

Yes, there is malicious software out there that encrypts files on your computer and directs you to a web page where, for a fee, you can buy the decryption key.  Or just restore from a recent backup  (you do make backups, right?)  But that's not what I'm here to talk about tonight.

Closed-source software vendors are holding their customers to ransom.  If you don't buy the latest version of their products, you won't be able to read anyone else's documents if they have upgraded to the latest version.  But what is especially galling is where electronically-identical hardware appliances have features enabled or disabled purely by software.