Notes on the Intel 8086 processor's arithmetic-logic unit
Posted by elpocko 1 day ago
Comments
Comment by kens 1 day ago
Comment by gruturo 1 day ago
Out of curiosity: Is there anything you feel they could have done better in hindsight? Useless instructions, or inefficient ones, or "missing" ones? Either down at the transistor level, or in high-level design/philosophy (the segment/offset mechanism creating 20-bit addresses out of two 16-bit registers with thousands of overlaps sure comes to mind - if not a flat model, but that's probably asking too much of a 1979 design and its transistor limitations, I guess)?
Thanks!
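For concreteness, a minimal C sketch of the overlap being described (this is just the standard 8086 real-mode rule, physical = segment * 16 + offset, wrapped at 20 bits; roughly 4096 different segment:offset pairs alias each physical address):

    #include <stdint.h>
    #include <stdio.h>

    /* 8086 real-mode address formation: a 16-bit segment shifted left by 4,
     * plus a 16-bit offset, kept to 20 bits. Because the segment is only
     * shifted by 4 bits, many segment:offset pairs name the same byte. */
    static uint32_t phys(uint16_t seg, uint16_t off) {
        return (((uint32_t)seg << 4) + off) & 0xFFFFF;  /* wraps at 1 MB */
    }

    int main(void) {
        /* Three different pairs, one physical address. */
        printf("%05X\n", (unsigned)phys(0x1234, 0x0005));  /* 12345 */
        printf("%05X\n", (unsigned)phys(0x1230, 0x0045));  /* 12345 */
        printf("%05X\n", (unsigned)phys(0x1000, 0x2345));  /* 12345 */
        return 0;
    }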
Comment by kens 23 hours ago
Given those constraints, the design of the 8086 makes sense. In hindsight, though, considering that the x86 architecture has lasted for decades, there are a lot of things that could have been done differently. For example, the instruction encoding is a mess and didn't have an easy path for extending the instruction set. Trapping on invalid instructions would have been a good idea. The BCD instructions are not useful nowadays. Treating a register as two overlapping 8-bit registers (AL, AH) makes register renaming difficult in an out-of-order execution system. A flat address space would have been much nicer than segmented memory, as you mention. The concept of I/O operations vs memory operations was inherited from the Datapoint 2200; memory-mapped I/O would have been better. Overall, a more RISC-like architecture would have been good.
I can't really fault the 8086 designers for their decisions, since they made sense at the time. But if you could go back in a time machine, you could certainly give them a lot of advice!
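To illustrate the overlapping-register point, here is a rough C model of the aliasing (not the hardware, just the visible behaviour; little-endian host assumed): a write to AL leaves AH untouched, so the value of AX after a partial write has to be merged from the new byte and the old register, which is what makes renaming awkward.

    #include <stdint.h>
    #include <stdio.h>

    /* AL and AH alias the low and high bytes of the 16-bit AX register.
     * A write to AL changes only the low byte, so the new AX depends on
     * both the written byte and the previous contents of AH. */
    typedef union {
        uint16_t ax;
        struct { uint8_t al, ah; } b;   /* little-endian layout assumed */
    } reg_a;

    int main(void) {
        reg_a a = { .ax = 0x1234 };
        a.b.al = 0xFF;                             /* partial-register write */
        printf("AX = %04X\n", (unsigned)a.ax);     /* prints AX = 12FF */
        return 0;
    }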
Comment by bonzini 6 hours ago
AAA/AAS/DAA/DAS were used quite a lot by COBOL compilers. These days ASCII and BCD processing doesn't use them, but writing efficient routines without them takes very fast data paths (the microcode sequencer in the 8086 was pretty slow), large ALUs, and very fast multipliers (to divide by constant powers of 10).
I/O ports have always been weird though. :)
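For readers who haven't met these instructions, a rough C model of what ADD followed by DAA accomplishes for packed BCD (the real instruction consults the AF and CF flags left by the ADD; this sketch just recomputes them and is only meant to show the idea for valid BCD operands):

    #include <stdint.h>
    #include <stdio.h>

    /* Binary-add two packed-BCD bytes, then adjust each nibble that
     * overflowed past 9 by adding 6, the way DAA does. */
    static uint8_t add_daa(uint8_t al, uint8_t x, int *cf) {
        unsigned sum = (unsigned)al + x;
        int af = ((al & 0x0F) + (x & 0x0F)) > 0x0F;  /* carry out of low nibble */
        *cf = sum > 0xFF;                            /* carry out of the byte   */
        if ((sum & 0x0F) > 9 || af) sum += 0x06;
        if (sum > 0x9F || *cf)      { sum += 0x60; *cf = 1; }
        return (uint8_t)sum;
    }

    int main(void) {
        int carry;
        uint8_t r = add_daa(0x58, 0x64, &carry);     /* BCD 58 + 64 */
        printf("%d%02X\n", carry, (unsigned)r);      /* prints 122  */
        return 0;
    }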
Comment by gruturo 23 hours ago
Thanks for capturing my feeling very precisely! I was indeed thinking about what they could have done better with roughly the same number of transistors and the benefit of a time traveler :) And yes, the constraints you mention (8080 compatibility, etc.) do limit their leeway, so maybe we'd have to point the time machine a few years earlier and influence the 8080 first.
Comment by mjevans 14 hours ago
There are also the needs of the moment. Wasn't the 8086 a 'drop-in' replacement for the 8080, and also (offhand recollection) limited by the number of pins on some of its package options? This was still an era when it was common even for multiple series of computers from the same vendor to have incompatible architectures that required, at the very least, recompiling software if not writing whole new programs.
Comment by bcrl 1 day ago
A more personal question: is your reverse engineering work just a hobby or is it tied in with your day to day work?
Comment by kens 1 day ago
Comment by rogerbinns 21 hours ago
Comment by kens 21 hours ago
To understand why the 8086 uses little-endian, you need to go back to the Datapoint 2200, a 1970 desktop computer / smart terminal built from TTL chips (since this was pre-microprocessor). RAM was too expensive at the time, so the Datapoint 2200 used Intel shift-register memory chips along with a 1-bit serial ALU. To add numbers one bit at a time, you need to start with the lowest bit to handle carries, so little-endian is the practical ordering.
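A sketch of what bit-serial addition looks like (a 16-bit word chosen just for illustration); the carry is why the bits have to be consumed starting from the least significant end:

    #include <stdint.h>
    #include <stdio.h>

    /* One full adder plus a saved carry, fed one bit per step,
     * least-significant bit first. */
    static uint16_t serial_add(uint16_t a, uint16_t b) {
        uint16_t sum = 0;
        unsigned carry = 0;
        for (int i = 0; i < 16; i++) {               /* one bit per "clock" */
            unsigned ai = (a >> i) & 1, bi = (b >> i) & 1;
            sum |= (uint16_t)((ai ^ bi ^ carry) << i);
            carry = (ai & bi) | (carry & (ai ^ bi)); /* full-adder carry out */
        }
        return sum;                                  /* final carry dropped */
    }

    int main(void) {
        printf("%u\n", (unsigned)serial_add(12345, 6789));  /* prints 19134 */
        return 0;
    }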
Datapoint talked to Intel and Texas Instruments about replacing the board full of TTL chips with a single-chip processor. Texas Instruments created the TMX1795 processor and Intel slightly later created the 8008 processor. Datapoint rejected both chips and continued using TTL. Texas Instruments tried to sell the TMX1795 to Ford as an engine controller, but they were unsuccessful and the TMX1795 disappeared. Intel, however, marketed the 8008 chip as a general-purpose processor, creating the microprocessor as a product (along with the unrelated 4-bit 4004). Since the 8008 was essentially a clone of the Datapoint 2200 processor, it was little-endian. Intel improved the 8008 with the 8080 and 8085, then made the 16-bit 8086, which led to the modern x86 line. For backward compatibility, Intel kept the little-endian order (along with other influences of the Datapoint 2200). The point of this history is that x86 is little-endian because the Datapoint 2200 was a serial processor, not because little-endian makes sense. (Big-endian is the obvious ordering. Among other things, it is compatible with punch cards where everything is typed left-to-right in the normal way.)
Comment by variaga 20 hours ago
E.g. a 1 in bit 7 on an LE system always represents 2^7 for 8/16/32/64/whatever-bit word widths.
This is emphatically not true in BE systems, and as evidence I offer that IBM (natively BE), MIPS (natively BE), and ARM (natively LE but with a BE mode) all have different mappings of bit and byte indices/lanes in larger word widths* while all LE systems assign the bit/byte lanes the same way.
Using the bit 7 example:
- IBM 8-bit: bit 7 is in byte 0 and is equal to 2^0
- IBM 16-bit: bit 7 is in byte 0 and is equal to 2^8
- IBM 32-bit: bit 7 is in byte 0 and is equal to 2^24
- MIPS 16-bit: bit 7 is in byte 1 and is equal to 2^7
- MIPS 32-bit: bit 7 is in byte 3 and is equal to 2^7
- ARM 32-bit BE: bit 7 is in byte 0 and is equal to 2^31
Vs. every single LE system, regardless of word width:
- bit N is in byte (N//8) and is equal to 2^N
(And of course none of these match how ethernet orders bits/bytes, but that's a different topic)
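A small C demonstration of that formula on a little-endian host (function name here is just for illustration): the byte holding the 2^7 bit is byte 0 of the object at every width, whereas on a big-endian host it would be byte 1, 3, or 7 depending on the width.

    #include <stdint.h>
    #include <stdio.h>

    /* Return the index of the byte whose 0x80 bit is set, i.e. the byte
     * that holds the value 2^7 within the object. */
    static int byte_holding_bit7(const void *p, size_t size) {
        const uint8_t *bytes = p;
        for (size_t i = 0; i < size; i++)
            if (bytes[i] & 0x80) return (int)i;
        return -1;
    }

    int main(void) {
        uint16_t w16 = 1u << 7;
        uint32_t w32 = 1u << 7;
        uint64_t w64 = 1ull << 7;
        /* On a little-endian host all three print byte 0. */
        printf("16-bit: byte %d\n", byte_holding_bit7(&w16, sizeof w16));
        printf("32-bit: byte %d\n", byte_holding_bit7(&w32, sizeof w32));
        printf("64-bit: byte %d\n", byte_holding_bit7(&w64, sizeof w64));
        return 0;
    }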
Comment by mjevans 14 hours ago
However, I've always viewed Little Endian as 'bit 0' being on the leftmost / lowest-address part of the string of bits, while in Big Endian 'bit 0' is all the way to the right / highest address (but the smallest power).
If encoding or decoding an analog value, it makes sense to begin with the most significant bit first - but that mostly matters in a serial / output sense, not for machine-word transfers, which (at least in that era) were parallel (today, of course, we have multiple high-speed serial links between most chips, sometimes in parallel for wide paths).
Aside from the reduced complexity of aligned-only access, forcing the bus to a machine word naturally also aligns / packs fractions of that word on RISC systems, which tended to be the big-endian systems.
From that logical perspective, it might even make sense to think of RAM not in units of bytes but rather in units of whole machine words, parts of which might be accessed as fractions of a word.
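In that spirit, a toy sketch (entirely hypothetical; little-endian byte lanes assumed) of RAM modelled as an array of 32-bit machine words, where a byte access becomes one aligned word access plus a lane select:

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t ram[256];                        /* 1 KiB, word-addressed */

    static uint8_t load_byte(uint32_t byte_addr) {
        uint32_t word = ram[byte_addr / 4];          /* one aligned bus access  */
        unsigned lane = byte_addr % 4;               /* which quarter of a word */
        return (uint8_t)(word >> (8 * lane));
    }

    int main(void) {
        ram[0] = 0x44332211;                         /* bytes 0..3 of word 0 */
        for (uint32_t a = 0; a < 4; a++)
            printf("byte %u = %02X\n", (unsigned)a, (unsigned)load_byte(a));
        return 0;                                    /* prints 11 22 33 44   */
    }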