Assembler Crash Course (x86-64)


Content

What is the goal of this crash course?
What is an assembler?
What can an assembler do?
What is a register?
What is memory?
What is a stack?
Addressing modes
Procedures
Volatile and non-volatile registers / Link to C

What is the goal of this crash course?

The goal of this crash course is to give an overview of assembly language programming, especially for OSC participants who do not yet have any assembly knowledge.

We don't expect you to be able to write complex assembler programs at the end, since you don't need to. However, we hope that this will give you at least some idea of what a high-level language program looks like in assembler, and that you will be able to write very small assembler functions yourself if given the appropriate help.

The concepts are explained using the x86_64 processor architecture as an example. This architecture is also known as amd64 or x64, comes from the companies AMD and Intel, and processors from these companies or compatible designs can be found in almost every modern PC. The notation used is that of the Netwide Assembler (NASM), which is also used in the development of the exercise operating system OOStuBS.

The "framework" of an assembler program is not explained here; it is best studied in an existing assembler file.

What is an assembler?

Strictly speaking, an assembler is a compiler that translates the code of an "assembler program" into machine code, i.e. zeros and ones. Unlike a C compiler, however, the assembler has an easy job, since (almost always) one assembler instruction corresponds to exactly one machine code instruction. The assembler program is therefore only a representation of the machine program that is (somewhat) more comfortable for humans:

Instead of

01001000 00000101 11101000 00000011 00000000 00000000

the programmer can use the assembler instruction

 add rax,1000

which (for x86_64 processors) means exactly the same thing:

Symbolic description	Machine code
add rax	01001000 00000101
1000 (dec.)	00000000 00000000 00000011 11101000

(The constant is encoded as a 32-bit value. Additionally, the assembler reverses the order of its bytes, so that the lowest byte comes first.)

01001000 00000101	11101000	00000011	00000000	00000000
add rax	byte 0 (lowest)	byte 1	byte 2	byte 3 (highest)
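
The byte swapping described above can be reproduced in a few lines of C. This is only a minimal sketch; the function name store_little_endian is made up for this illustration and is not part of NASM or OOStuBS.

```c
#include <stddef.h>
#include <stdint.h>

/* Store the lowest n bytes of value at out, least significant byte first
   (little-endian), the order x86_64 uses for constants in machine code.
   n must be at most 8. */
static void store_little_endian(uint64_t value, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint8_t)(value >> (8 * i));   /* byte i of the value */
    }
}
```

Called with the value 1000 (hexadecimal 3E8), the first byte written is 11101000 and the second is 00000011, which is the order shown above.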

In common usage, the term "assembler" refers less to the compiler than to the symbolic notation of the machine language. add rax,1000 is then an assembler instruction.

What can an assembler do?

An assembler can actually do very little, namely only what the processor understands directly. All the nice constructs of higher programming languages, which allow programmers to turn their algorithms into understandable, (fairly) error-free programs, are missing:

no complex statements
no comfortable for, while, repeat-until loops, but almost only gotos
no structured data types
no subprograms with parameter passing
...

Examples:

The C statement
```
sum = a + b + c + d;
```
is too complicated for an assembler and must therefore be split into several instructions. The x86_64 assembler can only add two numbers and store the result in one of the two "variables" (accumulator register) used. The following C program is therefore more like an assembler program:
```
       sum = a;
       sum = sum + b;
       sum = sum + c;
       sum = sum + d;
```
and would look like this in assembler:
```
       mov rax,[a]    ; sum = a        (sum is kept in rax)
       add rax,[b]    ; sum = sum + b
       add rax,[c]    ; sum = sum + c
       add rax,[d]    ; sum = sum + d
```

Simple if-then-else constructs are already too difficult for assemblers:

       if (a == 4711)
        {
          ...
        }
       else
        {
          ...
        }

and must therefore be expressed using gotos:

                 if (a != 4711)
                    goto unequal;
       equal:    ...
                 goto done;
       unequal:  ...
       done:     ...

In the x86_64 assembler it looks like this (assuming the value of a has already been loaded into rax):

                 cmp rax,4711
                 jne unequal
       equal:    ...
                 jmp done
       unequal:  ...
       done:     ...
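
For readers who want to try the goto form, here is a minimal compilable sketch in C. The function name is_4711 and the return values are assumptions made for this illustration; the final label is called done because break is a reserved word in C.

```c
/* Returns 1 if a equals 4711 and 0 otherwise, using only a conditional
   goto and an unconditional goto, like the cmp/jne/jmp sequence. */
static int is_4711(long a)
{
    int result;

    if (a != 4711)      /* compare and conditional jump */
        goto unequal;
    result = 1;         /* the "equal" branch */
    goto done;          /* skip the "unequal" branch */
unequal:
    result = 0;         /* the "unequal" branch */
done:
    return result;
}
```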

Simple counting loops are better supported by the processor. The following C program
```
       for (i = 0; i < 100; i++)
        {
          sum = sum + a;
        }
```
looks like this in assembler:
```
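                  mov rcx,100         ; loop counter: 100 iterations
       continue:  add rax,[a]         ; sum = sum + a (sum is kept in rax)
                  loop continue       ; rcx = rcx - 1, jump if rcx is not 0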
```
The loop instruction implicitly decrements the rcx register and executes the jump only if the register content is not 0 afterwards.
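
Written back in C, the behaviour of the loop instruction corresponds to a do-while loop that counts down. The following is a sketch only; the function name add_repeated is hypothetical, and the parameters are named after the registers they stand for.

```c
#include <stdint.h>

/* Adds a to rax exactly rcx times, the way the assembler loop does. */
static uint64_t add_repeated(uint64_t rax, uint64_t a, uint64_t rcx)
{
    do {
        rax += a;           /* continue: add rax,[a]                  */
        rcx--;              /* loop: first decrement rcx ...          */
    } while (rcx != 0);     /* ... then jump back if rcx is not 0     */
    return rax;
}
```

Because the counter is decremented before it is tested, the loop body always runs at least once. Starting with rcx equal to 0 therefore does not skip the loop but wraps around to a very large number of iterations.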

What is a register?

In the examples so far, the names of registers were always used instead of the variable names of the C program. A register is a tiny piece of hardware inside the processor which, in x86_64, can store up to 64 bits, i.e. 64 binary digits that are each either 0 or 1.

x86_64 CPUs have the following registers:

General-purpose registers
Name	Comment
rax	general purpose, special meaning for arithmetic instructions
rbx	general purpose
rcx	general purpose, special meaning for loops
rdx	general purpose
rbp	base pointer
rsi	source for string operations
rdi	destination for string operations
rsp	stack pointer
r8 to r15	general purpose

Segment registers
Name	Comment
cs	code segment
ds	data segment
ss	stack segment
es	any segment
fs	any segment
gs	any segment

Other registers
Name	Comment
rip	instruction pointer
rflags	CPU status

In addition, there are the 64-bit MMX registers MM0 to MM7 (which share their storage with the x87 floating-point registers) and the 128-bit SSE registers XMM0 to XMM15, but we do not use them here.

The lower parts of the registers rax, rbx, rcx and rdx have their own names, and the lower 32 bits of rbp, rsi, rdi, rsp, rflags and rip can be addressed in the same way. For the register rax, for example, it looks like this: eax for the lower 32 bits, ax for the lower 16 bits, al for bits 0 to 7 and ah for bits 8 to 15.
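
The relationship between rax and its named parts can be written down as masks and shifts. The helper names below (get_eax and so on) are invented for this sketch; in assembler the parts are simply used by name.

```c
#include <stdint.h>

/* Each helper extracts the part of a 64-bit rax value that the
   corresponding register name refers to. */
static uint32_t get_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); } /* bits 0..31 */
static uint16_t get_ax(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }     /* bits 0..15 */
static uint8_t  get_al(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }        /* bits 0..7  */
static uint8_t  get_ah(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); } /* bits 8..15 */
```

For rax = 0x1122334455667788 this gives eax = 0x55667788, ax = 0x7788, al = 0x88 and ah = 0x77.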

What is memory?

Most of the time, the registers are not enough to solve a problem. In this case, the main memory of the computer must be accessed, which can store considerably more information. To the assembler programmer, the main memory looks like a huge array of registers that are 8, 16, 32 or 64 bits "wide" as needed. So the smallest addressable unit is a byte (= 8 bits). Therefore, the size of the memory is also measured in bytes. In order to access a specific entry of the "main memory" array, the programmer must know the index, i.e. the address of the entry. The first byte of the main memory gets the address 0, the second the address 1 and so on.

In an assembler program, variables can be created by assigning a label to a memory address and reserving memory space of the desired size.

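[SECTION .data]
gruss:       db 'hello, world'   ; bytes (8 bits each)
unglueck:    dw 13               ; word (16 bits)
million:     dd 1000000          ; doubleword (32 bits)

[SECTION .text]
             mov eax,[million]   ; 32-bit register to match the 32-bit variable
             ...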
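
The picture of main memory as a byte array that is read in different widths can be sketched in C. The array size and the helper names store32, load32 and load16 are assumptions for this illustration only.

```c
#include <stdint.h>

/* A tiny stand-in for main memory: the array index is the address. */
static uint8_t memory[16];

/* Write a 32-bit value to four consecutive bytes, lowest byte first. */
static void store32(uint64_t address, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        memory[address + i] = (uint8_t)(value >> (8 * i));
}

/* Read four consecutive bytes as one 32-bit value. */
static uint32_t load32(uint64_t address)
{
    uint32_t value = 0;
    for (int i = 0; i < 4; i++)
        value |= (uint32_t)memory[address + i] << (8 * i);
    return value;
}

/* Read only two bytes from the same address as a 16-bit value. */
static uint16_t load16(uint64_t address)
{
    return (uint16_t)(memory[address] | (memory[address + 1] << 8));
}
```

If 1000000 is stored with store32, load32 at the same address returns 1000000 again, while load16 returns only the lower 16 bits (16960). This is why the width of the register should match the width of the variable.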

What is a stack?

You don't always want to think up a new label just to store the value of a register for a short time, for example because you need the register for a certain instruction but don't want to lose the old value. In this case you want something like a scratch pad. You get it with the stack. The stack is actually nothing more than a piece of main memory, except that fixed addresses are not used there; instead, the data to be saved is simply always written to the top (push) or fetched from the top (pop). So the access is quite simple, provided that one remembers in which order the data was put on the stack. A special register, the stack pointer rsp, always points to the top element of the stack. Since push and pop can only transfer 64 bits at a time, the stack is shown eight bytes wide in the following figure.
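
The push and pop mechanism can be modelled in a few lines of C. This is a simplified sketch: the stack size of eight entries is arbitrary, there are no overflow checks, and the model relies on the fact that the x86_64 stack grows towards lower addresses, so push decreases rsp and pop increases it.

```c
#include <stdint.h>

#define STACK_SLOTS 8

static uint64_t stack_memory[STACK_SLOTS];  /* the piece of memory used as stack */
static uint64_t rsp = STACK_SLOTS * 8;      /* byte offset; empty stack points past the end */

/* push: make room for eight bytes, then store the value at the new top. */
static void push(uint64_t value)
{
    rsp -= 8;
    stack_memory[rsp / 8] = value;
}

/* pop: fetch the value at the top, then release its eight bytes. */
static uint64_t pop(void)
{
    uint64_t value = stack_memory[rsp / 8];
    rsp += 8;
    return value;
}
```

Values come back in the reverse order in which they were pushed: after push(1) and push(2), the first pop returns 2 and the second returns 1.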

Addressing modes

Most instructions can take their operands either from registers, from memory or directly from a constant. With the mov instruction (among others) the following forms are possible, where the first operand always specifies the destination and the second always the source of the copy action:

Register addressing: The value of one register is transferred to another.
mov rbx,rdi
Immediate addressing: The constant is transferred into the register.
mov rbx,1000
Direct addressing: The value that is at the specified memory location is transferred to the register.
mov rbx,[1000]
Register indirect addressing: The value that is at the memory location specified by the second register is transferred to the first register.
mov rbx,[rax]
Base register addressing: The value at the memory location given by the sum of the contents of the second register and the constant is transferred to the first register.
mov rax,[10+rsi]
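
The five forms can be compared side by side in C if main memory is again modelled as a byte array. All names in this sketch (memory, load64, store64 and the mov_ functions) are hypothetical; each function returns the value that the destination register would receive.

```c
#include <stdint.h>
#include <string.h>

static uint8_t memory[2048];   /* stand-in for main memory, index = address */

/* Read or write eight bytes at the given address. */
static uint64_t load64(uint64_t address)
{
    uint64_t value;
    memcpy(&value, &memory[address], sizeof value);
    return value;
}

static void store64(uint64_t address, uint64_t value)
{
    memcpy(&memory[address], &value, sizeof value);
}

static uint64_t mov_register(uint64_t rdi)          { return rdi; }              /* mov rbx,rdi      */
static uint64_t mov_immediate(void)                 { return 1000; }             /* mov rbx,1000     */
static uint64_t mov_direct(void)                    { return load64(1000); }     /* mov rbx,[1000]   */
static uint64_t mov_register_indirect(uint64_t rax) { return load64(rax); }      /* mov rbx,[rax]    */
static uint64_t mov_base(uint64_t rsi)              { return load64(10 + rsi); } /* mov rax,[10+rsi] */
```

The square brackets in the assembler notation correspond to the call of load64: without brackets the operand itself is the value, with brackets it is the address where the value is found.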

Note: If an x86 processor operates in real mode (e.g. when working with the MS-DOS operating system), memory addresses are specified by a segment register and an offset. Here this is not necessary (it is even wrong), because OOStuBS runs in long mode and we have already initialized the segment registers for you.

Procedures

From the higher programming languages the concept of the function or procedure is known. The advantage of this concept over a goto is that the procedure can be called from any point in the program and the program is then continued at exactly the point that follows after the procedure call. The procedure itself does not need to know from where it was called and where it continues afterwards. This is done automatically somehow. But how?

The solution is that not only the data of the program, but also the program itself resides in main memory, and thus each machine code instruction has its own address. In order for the processor to execute a program, its instruction pointer must point to the beginning of the program, so the address of the first machine code instruction must be loaded into the special register instruction pointer rip. The processor will then execute that instruction and, normally, will then increment the contents of the instruction pointer by the length of the instruction in memory so that it points to the next machine instruction. In the case of a jump instruction, the instruction pointer is not simply advanced by the length of the instruction, but is additionally incremented or decremented by the specified relative offset to the destination.

To call a procedure or function (the two are the same in assembler), the method is the same as for a jump instruction, except that the old value of the instruction pointer (+ length of the instruction) is written to the stack beforehand. At the end of the function, a jump to the address stored on the stack is then sufficient to return to the caller.

In the x86 architecture, storing the return address on the stack is done implicitly using the call instruction. Similarly, the ret instruction also implicitly performs a jump to the address located on the stack:

; ----- Main program -----
;
main:  ...
       call f1
xy:    ...

; ----- Function f1
f1:    ...
       ret

If the function is to receive parameters, these are passed partly in CPU registers and partly on the stack, depending on the calling convention used. Here we use the System V AMD64 ABI, which is common on Unix-like systems. It specifies that the first six arguments are in the registers rdi, rsi, rdx, rcx, r8 and r9 and all other arguments (if any) are on the stack. A function call with two arguments then looks like this, for example:

      mov rdi, rax   ; first parameter for f1 (from rax)
      mov rsi, rbx   ; second parameter for f1 (from rbx)
      call f1

If the stack is used, the arguments must of course be removed from it afterwards. This is done either with pop or by adjusting the stack pointer directly. The stack arguments are pushed in reverse order, so that the seventh parameter ends up closest to the top of the stack.

      ; ...  the first six parameters are in registers
      push rbx       ; eighth parameter for f1 (from rbx), pushed first
      push rax       ; seventh parameter for f1 (from rax), pushed last
      call f1
      add rsp, 16    ; remove two parameters from the stack

The first six parameters can be accessed within the function directly via the registers. If a function needs seven or more parameters, the base pointer rbp is typically used. If it is saved right at the beginning of the function and then assigned the value of the stack pointer, the seventh parameter can always be accessed via [rbp+16], the eighth via [rbp+24], and so on. This is independent of how many push and pop operations have been used since the beginning of the function.

f1:   push rbp
      mov  rbp,rsp
      ...
      mov rbx,[rbp+16]   ; load 7th parameter into rbx
      mov rax,[rbp+24]   ; load 8th parameter into rax
      ...
      pop rbp
      ret
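
Seen from C, a function of this kind is simply one with eight parameters. The following sketch assumes that all parameters are 64-bit integers; the name f1_in_c and the computation in its body are made up for this illustration. A compiler that follows the System V AMD64 ABI passes p1 to p6 in rdi, rsi, rdx, rcx, r8 and r9 and places p7 and p8 on the stack.

```c
/* p1..p6 arrive in registers, p7 and p8 on the stack (System V AMD64 ABI).
   Only the two stack parameters are used here. */
long f1_in_c(long p1, long p2, long p3, long p4, long p5, long p6,
             long p7, long p8)
{
    (void)p1; (void)p2; (void)p3;   /* register parameters, unused in this sketch */
    (void)p4; (void)p5; (void)p6;
    return p7 - p8;                 /* the result is returned in rax */
}
```

The C programmer does not see any of this: the call f1_in_c(1, 2, 3, 4, 5, 6, 70, 8) looks like any other call, and the compiler generates the register moves, pushes and stack cleanup.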

Volatile and non-volatile registers / Link to C

To allow functions to be called from different places in the assembler program, it is important to specify which register contents may be changed by the function and which must still (or again) have the old value when the function is exited. Of course, the safest way is to basically store all needed registers on the stack at the beginning of the function and to reload them immediately before exiting the function.

However, the assembler programs generated by the GNU C compiler follow a slightly different strategy: They assume that many registers are only used for a short time anyway, for example as counters of small loops or to write the parameters for a function to the stack. Here, it would be pure waste to laboriously save the long outdated values at the beginning of a function and restore them at the end. Since you can't tell from looking at a register whether its contents are valuable or not, the System V AMD64 ABI, which the GNU C compiler follows, simply specifies that the registers rax, rcx, rdx, rdi, rsi, r8, r9, r10 and r11 are to be considered volatile registers whose contents may be overwritten by a called function. The register rax has a special role: It provides the return value of the function (if required). The values of the other registers (rbx, rbp, rsp and r12 to r15), on the other hand, must be saved before they may be overwritten by a function and restored before it returns. They are therefore called non-volatile registers.