Earlier this year I bought a Raspberry Pi 3 to have as an AArch64 development machine. The fastest way to get familiar with an instruction set is to write a disassembler for it and I’ve made one for 64-bit ARM in R6RS Scheme as part of the machine-code project. The instruction set is called ARM A64, instructions are always 32 bits wide and they have a neat structure which is pretty fast to decode in software.
The architecture has 31 integer registers (x0-x30). There is also a stack pointer register and a zero register that always contains zeroes. Both these registers are encoded as register number 31, and it’s up to each instruction if an operand can use the stack pointer or the zero register. The x30 register is used to store the return address. These registers are all 64-bit registers and the lower 32 bits can be accessed using the names w0-w31. Operations that write to the lower 32 bits also clear the upper 32 bits, just like on AMD64.
There are also 32 registers usable as either floating point registers or 128-bit vector registers. As vectors they support different arrangements that are either 64 or 128 bits in total, containing 8-bit, 16-bit, 32-bit or 64-bit quantities. There are many instructions that operate on multiple quantities at the same time, which is an interesting way to speed up code. Multiple loop iterations can be run simultaneously.
The instructions are documented in the ARM ARM for ARMv8-A. I’ve counted, not including instruction aliases, 442 instruction mnemonics (things like ADD, EOR, B.EQ, etc). They are organized in what is basically a four-level table: main encoding, instruction group, decode group and instruction. Chapter C4 of the manual follows the same structure. This structure is nice for fast decoding, but it’s not strictly necessary since all encodings at the instruction level still need to have a unique meaning.
For each instruction mnemonic there can be multiple variants that
enable the instruction to handle different types of operands. An
example of this is the FMUL instruction that multiples two floating
point values. In a C program it would look like a = b * c
. In A64
assembler it might look like one of these, depending on what the
surrounding code does:
fmul s0, s1, s2 ;single precision floats
fmul d0, d1, d2 ;double precision floats
fmul v0.2s, v1.2s, v2.2s ;vectors with two singles
fmul v0.4s, v1.4s, v2.4s ;vectors with four singles
fmul v0.2d, v1.2d, v2.2d ;vectors with two doubles
fmul s0, s1, v2.s[0] ;multiplies by a vector element
fmul d0, d1, v2.d[0]
fmul v0.2s, v1.2s, v2.s[0] ;combinations of the above
fmul v0.4s, v1.4s, v2.s[0]
fmul v0.2d, v1.2d, v2.d[0]
That’s quite a few variants for a single mnemonic. Not all mnemonics have this many variants, but depending on how one counts I estimate that there are in total around 1000-2000 variants. The instruction set designers had to fit all these variants into 32 bits, while at the same time making space for instructions that encode relatively large immediate operands, and not forgetting about leaving space for future extensions. As if that wasn’t difficult enough, the instructions should also be easy to decode with hardware.
Instruction encodings
I’ve extracted the tables from my disassembler, rendered them with the bit-field package, and made them slightly interactive. If you’re reading this in a browser you can see the encodings below. The thing to notice is that each layer adds extra fixed bits: fields that must be a fixed 0 or 1 value. (The last level, the instruction level, is not shown in this table). Two encodings under the same parent always have some differences in these fields, so that they can be separated by an instruction decoder. Click an encoding to expand the next level of encodings.
There are many conventions in the field names. Instructions that take register operands encode them in fields named Rd, Rn and Rm. Immediate values (integers, PC-relative offsets, etc) are named imm. Fields that change the type of operation tend to be called opN or opcode. In general a few of the fields encode the operation (or the size of the operation) and the rest encode the operands.
Room to grow
The image below shows the encoding space of the instruction set. The x axis goes from 0 to 216-1 and encodes the lower 16 bits of the instruction space, and the y axis contains the upper 16 bits. The different colors denote different decode groups, i.e. all the encodings at the third level of the table above. (There is probably a better representation).
All the dark spots are places where ARMv8 does not have any allocated instructions, or the encoding is reserved. For many instructions there are some fields that have reserved encodings and these are also dark.
Even if instructions are kept to the fixed 32 bit encoding there is still plenty of room for the instruction set to grow.
Impression
ARM A64 is a quite clean instruction set with only a few quirks here and there in its encoding. Compared to AMD64 it has twice the amount of registers, a clean separation of load/store instructions, clean RISCy operands (mostly one destination register and two source registers) and of course the register names and most mnemonics are totally different. Both have 128-bit vector registers and 64-bit integer registers and a 64-bit address space. They look quite similar, except everything’s different.