Overview

This project is a primer on the AMD64 function calling process. Its goal is to break down what happens inside a function call at the CPU instruction level.

Topics covered include:

Runtime stack
Functionality of various registers (i.e. the instruction pointer, stack pointer, and base pointer)
Parameter passing
Returning a value
Caller and callee responsibilities

Getting Started

Reading this document is enough to understand the topics covered. However, it is recommended that you build the included assembly file and step through it instruction by instruction, testing your understanding by examining registers and memory locations as you go.

Building

GCC and the C Standard Library are required to build the program. On Ubuntu Linux or other Debian-based distributions, the dependencies can be installed with:

$ sudo apt install build-essential

Compiling the program:

$ gcc -g main.s -o main

Running and Debugging

Running the program and checking the return value:

$ ./main
$ echo $?

Debugging can be done with GDB:

$ gdb ./main

If you are not familiar with GDB, it has a built-in help command, and the GDB documentation and many online examples are also available. To get you started, the following commands set a breakpoint at main and start execution. The ni command executes the next instruction and stops. Pressing Enter repeats the last command, here ni.

(gdb) b main
(gdb) run
(gdb) ni
(gdb)

The actual output will look similar to:

(gdb) b main
Breakpoint 1 at 0x1178: file main.s, line 41.
(gdb) run
Starting program: /scratch/freddiehaddad/projects/assembly/main 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main () at main.s:41
41		push	%rbp
(gdb) ni
42		movq	%rsp, %rbp
(gdb) 
44		call	bar
(gdb)

Introduction

Before diving into the actual code, an understanding of a program's organization in memory and how a function call works will be helpful in the learning process.

Address Space

A commonly used memory organization for a program at runtime:

Low Memory      +-----------+
                |   Text    |    Program Instructions
                +-----------+
                |   Data    |    Initialized Global Variables
                +-----------+
                |    BSS    |    Uninitialized Global Variables
                +-----------+
                |   Heap    |    Dynamically Allocated Memory
                |     |     |
                |     v     |
                |           |
                |     ^     |
                |     |     |
                |   Stack   |    Runtime Stack
High Memory     +-----------+

In this layout, the runtime stack grows from high memory towards low memory and any dynamically allocated memory grows from low memory towards high memory. More simply put, the heap and the stack grow towards each other.

Calling Convention

You can think of a calling convention as a contract for how subroutines (i.e. functions) operate and how the caller and callee communicate, that is, how arguments are passed and how values are returned.

This agreed-upon standard includes:

How parameters are passed from the caller to the callee
Which registers the callee must preserve for the caller
How the program flow from one function to another will happen
How a stack frame will be created and destroyed

Function Call Process

Let's first consider what must happen for a function to call another function and eventually return.

Caller responsibilities:

pass arguments to the callee
preserve any registers designated caller saved
store the address to resume execution when the called function returns

Callee responsibilities:

preserve any registers designated callee saved
pass the return value back to the caller
restore any registers designated callee saved before returning

Immediately we can see the need for a contract between the caller and callee to ensure:

callee knows where the arguments are
callee knows the order in which arguments were passed
caller knows where the callee will save the return value
callee can restore the stack frame for the caller
the program can continue where it left off when the callee returns

Code Walkthrough

Everything discussed so far requires a lot of unpacking. We'll accomplish this by converting a program written in C to Assembly step by step (instruction by instruction).

C Program

We'll define two functions, foo and bar, in the C programming language and then translate their behavior into Assembly, following the interface and conventions discussed.

int foo(int _a, int _b)
{
	int a;
	int b;
	int c;

	a = _a;
	b = _b;

	c = a + b;
	return c;
}

int bar()
{
	int a;
	int b;
	int c;
	
	a = 1;
	b = 2;

	c = foo(a, b);

	return c;
}

Assembly

For simplicity, we will assume ILP64, meaning integers, longs, and pointers are all 64-bit (8 bytes). (The actual AMD64 ABI on Linux is LP64, where int is 32-bit.) Program flow will be captured starting inside bar, ignoring how we got to this point. We'll walk through the calling convention from here.

If we were to imagine the stack at the moment program flow moves to bar, it might look like this:

bar            +--------+
          rsp  | rip    |
               .        .
               .        .
          rbp  .        .

The return address, that is, the address of the instruction that follows the call to bar, sits at the top of the stack, just past the caller's stack frame. It got there because the call instruction pushes the current value of the instruction pointer rip (which by then already points at the next instruction) onto the stack, moving the stack pointer rsp down by 8 bytes. Lastly, call sets the instruction pointer to the address of bar.

Thinking about how we got to bar, the stack pointer and the base pointer registers represent the caller's stack frame. The values in those registers need to be preserved because bar must create its own stack frame, use it for its work, destroy it when finished, and restore the caller's stack frame.

Looking at the definition of bar, we can see the function defines three local variables (a, b, and c), calls foo with a and b as the arguments, stores the return value in c, and returns that value to the caller.

Let's start by setting up the stack frame. We need to save the caller's base pointer rbp value. This is achieved by using the push instruction.

push    %rbp

The push operation allocates space on the stack to hold the value of the rbp register and stores its value. The stack frame after this instruction looks like this:

          rsp  | rbp    |
bar            +--------+
               | rip    |
               .        .
               .        .
          rbp  .        .

After saving the value in the rbp register, we must update it to reflect the base of bar's stack frame. This can be achieved with the movq instruction.

movq     %rsp, %rbp

We move the value of the stack pointer rsp into the base pointer rbp since this will be the start of bar's stack frame.

The layout now:

     rsp, rbp  | rbp    |
bar            +--------+
               | rip    |
               .        .
               .        .
               .        .

Now we need to allocate space on the stack for the three local variables (a, b, and c). Remembering that we are treating all values as 64-bit (8 bytes), we need to grow our stack frame by 24 bytes. We can do this by subtracting 24 from the value in the stack pointer.

Recall that the stack grows from high to low memory towards the heap. In other words, the address of the top of the stack decreases as the stack grows, which is why we subtract. As a side note, you could also use the add instruction to add -24 to the stack pointer.

sub     $24, %rsp

or

add     $-24, %rsp

We have now allocated space on the stack for bar's local variables:

          rsp  | a      |
               | b      |
               | c      |
          rbp  | rbp    |
bar            +--------+
               | rip    |
               .        .
               .        .
               .        .

Next, we need to assign the values 1 and 2 to a and b, respectively. Looking at the rsp register, we can see that it's already pointing to the top of the stack. This is where a is. Since b is adjacent to a in memory, it's 8 bytes past a. With the top of the stack at a lower memory address than the bottom, the offsets to the local variables are positive from the stack pointer.

With all that in mind, we can assign the values:

movq     $1, 0(%rsp)
movq     $2, 8(%rsp)

Note: x86-64 (Intel and AMD) processors have general purpose registers that can be used instead of allocating memory on the stack. Storing the values on the stack is purely for educational purposes.

Note: The AMD64 ABI requires the stack pointer to be 16-byte aligned before a call instruction. Allocating 24 bytes violates this requirement. In practice, 32 bytes should be allocated to maintain proper alignment. This detail is omitted here for simplicity.

We are now ready to set up the call to foo.

As per the AMD64 calling convention, the following registers are used for the first two arguments to a function:

rdi is used for the first argument
rsi is used for the second argument

This can be achieved as follows:

movq    0(%rsp), %rdi
movq    8(%rsp), %rsi

With the arguments now in the proper registers, we are ready to call foo. This can be done with the call instruction:

call    foo

After the call to foo, the instruction pointer rip is pointing to the function foo and the runtime stack now looks like this:

foo            +--------+
          rsp  | rip    |
               | a      |
               | b      |
               | c      |
          rbp  | rbp    |
bar            +--------+
               | rip    |
               .        .
               .        .
               .        .

Recall that the call instruction pushes the return address (the value of the instruction pointer rip, which already points at the instruction after the call) onto the stack and adjusts the stack pointer.

Looking at the definition of foo, we can see that it needs three integers (a, b, and c) just like bar. Following the same steps as we did when entering bar, we would do the following:

push    %rbp
movq    %rsp, %rbp
sub     $24, %rsp

At the end of this sequence of instructions, the stack now looks like this:

          rsp  | a      |
               | b      |
               | c      |
          rbp  | rbp    |
foo            +--------+
               | rip    |
               | a      |
               | b      |
               | c      |
               | rbp    |
bar            +--------+
               | rip    |
               .        .
               .        .
               .        .

Next, we need to get the arguments a and b which the function expects and were passed in the rdi and rsi registers. The first thing foo does is assign these values to its local variables.

movq    %rdi, 0(%rsp)
movq    %rsi, 8(%rsp)

Adding the two values is slightly interesting because the add instruction takes only two operands. The second operand serves as both a source and the destination, so it is overwritten with the result of the operation.

You can think of

add     %rdi, %rsi

as

rsi = rdi + rsi

Now that rsi has the result of the operation, assigning it to c is achieved with a movq operation.

movq    %rsi, 16(%rsp)

Finally, foo needs to return the value to the caller. This is handled by placing the return value in the rax register.

movq    %rsi, %rax

or

movq   16(%rsp), %rax

With the return value placed in the proper register, it's time to tear down the stack and return to the caller.

To accomplish this, a few things need to happen:

Tear down the stack frame
Restore the previous values in the rsp and rbp registers
Update the instruction pointer (rip) to the next instruction in the caller's function

Let's start with restoring the stack by thinking about how we created it.

The instructions:

push    %rbp
movq    %rsp, %rbp
sub     $24, %rsp

Resulted in foo's stack:

          rsp  | a      |
               | b      |
               | c      |
          rbp  | rbp    |
foo            +--------+

In essence, if we undo the three actions we took when creating the stack frame, in reverse order, we should be able to restore the caller's frame.

Thus,

add     $24, %rsp
movq    %rbp, %rsp
pop     %rbp

would suffice to achieve our goal, returning the stack to its state at the moment we entered foo.

However, one observation reveals a minor optimization saving us the step of executing the add instruction.

Notice how the instruction

movq     %rbp, %rsp

collapses the stack frame on its own by setting the rsp register to the same value as rbp. Because rbp still points at the saved rbp slot, this discards the locals no matter how many bytes were allocated, so the add is redundant.

The pop instruction is doing two things that must be noted:

Writes the value at the top of the stack to the specified register (in this case rbp)
Adjusts the stack pointer rsp register value to point to the next element on the stack

As another side note, the two instructions can be reduced to a single instruction, leave, which does the same thing.

So, our final solution to the problem of restoring the stack frame can be reduced to:

movq    %rbp, %rsp
pop     %rbp

or

leave

partially restoring the stack to our desired state:

foo            +--------+
          rsp  | rip    |
               | a      |
               | b      |
               | c      |
          rbp  | rbp    |
bar            +--------+
               | rip    |
               .        .
               .        .
               .        .

With foo's stack frame destroyed, we are almost finished. The last step foo needs to take is returning control back to the caller. We can accomplish this with the ret instruction.

ret

The ret instruction is equivalent to popping the next value off the stack and placing it in the instruction pointer rip register.

After the ret instruction is executed, our stack frame is restored:

          rsp  | a      |
               | b      |
               | c      |
          rbp  | rbp    |
bar            +--------+
               | rip    |
               .        .
               .        .
               .        .

At last, we are back inside bar with the stack and base pointers restored and the return value ready for us in the rax register. The last two steps of bar are assigning the return value from foo to its local variable c and returning that value to the caller.

Since the return value is already in the rax register and bar doesn't make any changes to it, the return value is already set. Therefore, all we need to do is assign the return value to our local variable c.

movq    %rax, 16(%rsp)

Tearing down bar's stack frame works exactly as it did for foo.

leave
ret

After the final two instructions are executed, program flow will have returned to bar's caller and the program continues executing.

          rsp  .        .
               .        .
          rbp  .        .
