For a while I’ve gotten a LOT of questions about why the C64 version and my upcoming ASM version of Paku Paku track scores in BCD – Binary Coded Decimal. The simple fact is that it is WAY faster, even with the overhead of the long score additions. Sure, each score takes 8 bytes instead of four (I display a fake ‘0’ after it since all score values are multiples of ten; 9 digits is overkill)...
But the simple fact is that whilst yes, adding a value to the score takes eight additions with AAA (ASCII Adjust After Addition), that is still WAY faster than the multiple divides (16 in total) needed to turn a 32-bit integer into ASCII or even just BCD -- and you need it in one or the other to output to the screen! You basically NEED unpacked (aka 8 bits per digit) binary coded decimal or something like it to display numeric output in a way Joe Sixpack and Susie Sunshine can digest. "Normal people" aren't going to be happy if your game displays its scores in hexadecimal!
So take this long BCD addition:
%macro BCDAddAAA 0
	lodsb
	add   al, [es:di]
	aaa
	stosb
%endmacro

%macro BCDAdcAAA 0
	lodsb
	adc   al, [es:di]
	aaa
	stosb
%endmacro

; procedure addBCDUnpacked(var b1, b2:BCDUnpacked);
addBCDUnpacked:
; ACCEPTS
;   ES:DI = pointer to 8 byte BCD to add to
;   DS:SI = pointer to 8 byte BCD to add
; CORRUPTS
;   AX, DI, SI
; RETURNS
;   nothing
	BCDAddAAA
	BCDAdcAAA
	BCDAdcAAA
	BCDAdcAAA
	BCDAdcAAA
	BCDAdcAAA
	BCDAdcAAA
	BCDAdcAAA
	ret
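If you don't read 8086 assembly, here's roughly what that routine boils down to, as a C sketch (the function name is my own, not from the game; digits are stored one per byte, least significant digit first, just as LODSB/STOSB walk them):

```c
/* Rough C equivalent of addBCDUnpacked: dst += src, both 8 unpacked
 * BCD digits, least significant digit first. This mirrors what the
 * ADD/ADC + AAA pairs do: add the digits, then propagate a decimal
 * carry -- no divides anywhere. */
static void add_bcd_unpacked(unsigned char *dst, const unsigned char *src)
{
    unsigned carry = 0;
    for (int i = 0; i < 8; i++) {
        unsigned sum = dst[i] + src[i] + carry; /* ADD on digit 0, ADC after */
        carry = sum > 9;                        /* AAA sets the carry flag... */
        dst[i] = carry ? sum - 10 : sum;        /* ...and corrects AL to 0..9 */
    }
}
```

Note there's no final carry out: just like the assembly, an overflow past the eighth digit silently wraps, which is fine for a score counter that never gets that high.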
That takes around 300 clocks to execute. Seems like a long time – but consider how long it takes to convert an actual binary-stored 32-bit integer to decimal for displaying the result as text. EACH DIGIT takes a long divide, and since we’re on a 16-bit processor that means TWO divides per DIGIT we want to output. News flash, 8088/8086 16-bit divides take upwards of 144 clocks EACH. By the time you do that and figure in all the other overhead of preserving register values and the carry-by-divide trickery, you’re talking over 350 clocks per BYTE of output, or nearly 2800 clocks EVERY time you want to update the score on screen! Even with the ‘trick’ of doing divides all the time even on the first loop and using the resultant modulo, by the time it’s all said and done just calculating the characters to be displayed is taking 3000 clocks!
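For comparison, the divide-based conversion being described looks something like this C sketch (my own illustration, not the game's code). Each output digit costs a divide and a modulo, and for a 32-bit value the 8086 has to synthesize each of those from two 16-bit DIV instructions:

```c
#include <stdint.h>

/* Classic binary-to-decimal conversion: one long divide per output
 * digit. Fills buf with 9 ASCII digits, most significant first, plus
 * a NUL. On an 8086 each 32-by-10 divide below must itself be split
 * into two hardware DIVs (high word first, remainder carried into the
 * low word) -- the ~144 clocks EACH mentioned above. */
static void u32_to_decimal(uint32_t value, char buf[10])
{
    for (int i = 8; i >= 0; i--) {
        buf[i] = (char)('0' + value % 10); /* the expensive part */
        value /= 10;
    }
    buf[9] = '\0';
}
```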
Whereas with BCD all I have to do is LODSB, add 48, and call the font routine. Not even 16 clocks per byte on the display side.
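That display path really is trivial, as this C sketch shows (again my own illustration; same least-significant-digit-first storage as the add routine): per digit it's just a load and an add of 48.

```c
/* BCD-to-ASCII: per digit it's just load, add '0' (48), output.
 * No divides anywhere. Digits stored least significant first,
 * emitted most significant first for display. */
static void bcd_to_ascii(const unsigned char digits[8], char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = (char)('0' + digits[7 - i]);
    out[8] = '\0';
}
```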
Bottom line? I can do around 40 clocks for a 32-bit add and 3000 clocks for calculating what to show on screen, or I can do 320 clocks for an 8-digit BCD addition and ~150 clocks for calculating the output... and that's assuming I need it as ASCII -- if you have optimized number output routines handling 0..9 in graphics modes it's even faster.
Last time I checked, 3040 is a hell of a lot more than 470. That’s why BCD exists and where it is REALLY handy!
This only gets worse on 8-bit CPUs like the Z80, 6502, or 6809, where again you have very slow division, and at 8 bits you're talking some agonizingly complex trickery to turn a 32-bit integer (some cheat and use 24 bits for scores) into 8 or 9 digits of output. It's just faster and simpler to do your score calculations as unpacked BCD, since that's the format you NEED to show the result on screen.
It’s almost comical how many game codebases, even from that era, had showing numbers on screen as their actual bottleneck, all because of the conversion from binary integer to text characters.