Before we start writing our Hello World program in Assembly, we need to know how to make system calls (syscalls) in Linux. Assembly programs usually use syscalls to communicate with the kernel (the OS): there are syscalls for writing, reading, exiting the program, and more.
| Instruction | Syscall # | Return value | arg0 | arg1 | arg2 | arg3 | arg4 | arg5 |
|---|---|---|---|---|---|---|---|---|
| SYSCALL | RAX | RAX | RDI | RSI | RDX | R10 | R8 | R9 |
- `RAX` - stores the system call number, and afterwards the return value (if there is one).
- `RDI` - stores the 0th argument passed when invoking a syscall.
- `RSI` - stores the 1st argument passed when invoking a syscall.
- `RDX` - stores the 2nd argument passed when invoking a syscall.
- `R10` - stores the 3rd argument passed when invoking a syscall.
- `R8` - stores the 4th argument passed when invoking a syscall.
- `R9` - stores the 5th argument passed when invoking a syscall.
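The register table can be made concrete with a small sketch. The same register rules apply no matter which language issues the instruction, so this one is written in C using GCC/Clang extended inline assembly, assuming an x86_64 Linux target. The helper name `raw_syscall3` is hypothetical. It loads the syscall number into `rax` and the first three arguments into `rdi`, `rsi` and `rdx`, executes `syscall`, and returns whatever the kernel left in `rax`. The `syscall` instruction itself overwrites `rcx` and `r11`, so the sketch declares them as clobbered.

```c
#include <unistd.h>  /* getpid(), used for comparison in the example below */

/* Hypothetical helper: issue a Linux x86_64 syscall with up to three
 * arguments, following the register table above (GCC/Clang only). */
static long raw_syscall3(long nr, long arg0, long arg1, long arg2)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                                 /* return value comes back in rax */
        : "a"(nr), "D"(arg0), "S"(arg1), "d"(arg2)  /* rax, rdi, rsi, rdx */
        : "rcx", "r11", "memory");                  /* syscall overwrites rcx and r11 */
    return ret;
}
```

For example, `raw_syscall3(39, 0, 0, 0)` asks for syscall 39 (`getpid` on x86_64) and should return the same value as `getpid()`. On failure, a raw syscall returns a negative error number in `rax` instead of setting `errno`.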
Every syscall has its own number and its own set of arguments, so you will need to look them up in the Linux System Call Table.
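If you have a C toolchain installed, the numbers can also be cross-checked against the system headers instead of being copied by hand. This is a minimal sketch, assuming an x86_64 Linux system whose C library provides `<sys/syscall.h>` (glibc and musl both do). It simply fails to compile if the numbers used in this tutorial don't match.

```c
#include <sys/syscall.h>  /* SYS_* syscall numbers for the target architecture */

/* Compile-time checks (C11) that the numbers used in this tutorial match
 * the x86_64 headers; on another architecture these checks would fail. */
_Static_assert(SYS_write == 1, "sys_write is 1 on x86_64");
_Static_assert(SYS_exit == 60, "sys_exit is 60 on x86_64");
```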
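Hello World in x86_64 Linux Assembly:

```nasm
BITS 64

section .data
    msg db "Hello World", 0xA   ; the bytes to print, ending in a newline
    len equ $ - msg             ; length of msg in bytes (12)

section .text
    global _start

_start:
    mov rax, 1      ; syscall number 1: sys_write
    mov rdi, 1      ; arg0: file descriptor 1 (stdout)
    mov rsi, msg    ; arg1: address of the bytes to write
    mov rdx, len    ; arg2: number of bytes to write
    syscall

    mov rax, 60     ; syscall number 60: sys_exit
    xor rdi, rdi    ; arg0: exit status 0
    syscall
```

It's also located in `code-snippets/hello-world.asm` if you want to run it yourself.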
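For comparison, here is the `sys_write` half of the program sketched in C, again with GCC/Clang inline assembly on x86_64 Linux. The name `write_hello` is hypothetical. It sets up the same four registers as the first four `mov` instructions. Taking the file descriptor as a parameter makes it easy to try `2` (stderr) instead of `1`.

```c
#include <unistd.h>

/* Same bytes as `msg db "Hello World", 0xA`: 11 characters plus a newline.
 * A C string literal also appends a NUL terminator, which db does not,
 * so the length is sizeof minus one (the analogue of `len equ $ - msg`). */
static const char msg[] = "Hello World\n";
#define MSG_LEN (sizeof msg - 1)

/* Hypothetical helper mirroring the four movs plus `syscall`:
 * write(fd, msg, len) via syscall number 1 (x86_64 Linux, GCC/Clang). */
static long write_hello(long fd)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)
        : "a"(1L),        /* rax = 1, sys_write */
          "D"(fd),        /* rdi = file descriptor */
          "S"(msg),       /* rsi = address of the bytes */
          "d"(MSG_LEN)    /* rdx = number of bytes */
        : "rcx", "r11", "memory");
    return ret;  /* bytes written, or a negative error number */
}
```

`write_hello(1)` should print the same line as the assembly program and return 12, the number of bytes written.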
- `BITS 64` - tells NASM to generate 64-bit code. With `-f elf64` this is already the default, so the line mainly documents intent.
- `section .data` - defines the data section.
  - This is where we put static data, like strings or constants, that the program will use.
- `msg db "Hello World", 0xA` - defines a label called `msg` that points to the bytes of "Hello World" followed by `0xA` (0x0A, a newline, aka `\n`).
  - `db` stands for *Define Byte*; it allocates a single byte or a sequence of bytes in memory.
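- `len equ $ - msg` - computes the length of `msg` automatically. `$` is the assembler's current address, so `$ - msg` is the number of bytes between the `msg` label and this line (12 here). `equ` defines a constant; it takes up no space in the program.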
- `section .text` - the code section. This is where we write the instructions; `_start:` (roughly like C's `main()`) has to be inside it.
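- `global _start` - declares `_start` as global, exporting the symbol so the linker (`ld`) can find it and use it as the program's entry point.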
- `_start:` - defines the `_start` label.
  - A label names a specific location in the code.
  - The label itself does not execute any code; the instructions following it do.
- `mov rax, 1` - moves the value 1 (the system call number for `sys_write`) into the `rax` register.
- `mov rdi, 1` - moves the value 1 (the file descriptor for `stdout`) into the `rdi` register. The standard file descriptors are:
  - 0 for `stdin`, standard input
  - 1 for `stdout`, standard output
  - 2 for `stderr`, standard error output
- `mov rsi, msg` - moves the address of the data we want to write (`msg` in this case) into `rsi`.
- `mov rdx, len` - moves the number of bytes we want to write into `rdx`, which is `len` in this case.
  - That is, the length of "Hello World" plus the newline (12 bytes).
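- `syscall` - performs the system call. Control passes to the kernel, which reads the syscall number from `rax` and the arguments from the registers set above, then writes the message.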
- `mov rax, 60` - moves the value 60 (the system call number for `sys_exit`) into the `rax` register.
- `xor rdi, rdi` - sets `rdi` to 0, which becomes the program's exit status. XORing a register with itself always gives 0.
  - Alternative: `mov rdi, 0`
  - Optional information about `mov rdi, 0` versus `xor rdi, rdi`: both leave 0 in `rdi`; the difference is instruction size. `mov rdi, 0` carries the value as an immediate inside the instruction, for example `48 C7 C7 00 00 00 00` (7 bytes), while `xor rdi, rdi` encodes as `48 31 FF` (3 bytes). Because it is smaller, `xor` is the usual way to zero a register on x86. One side effect to keep in mind: `xor` updates the flags, while `mov` does not.
- `syscall` - performs the system call again, this time `sys_exit`, so the program ends here and execution never continues past this instruction.
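The exit half is easiest to observe from outside the process: the value left in `rdi` becomes the exit status that a shell shows with `echo $?`. The sketch below is again C with inline assembly on x86_64 Linux, and it uses two hypothetical helpers, `raw_exit` and `exit_status_of`. The second runs the exit syscall in a forked child so the calling program keeps running, then reports the status the parent sees. Strictly, `sys_exit` ends only the calling thread, which for a single-threaded program like ours is the whole process.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper mirroring `mov rax, 60` / `mov rdi, status` / `syscall`.
 * sys_exit never returns, so the function is marked noreturn. */
static _Noreturn void raw_exit(long status)
{
    __asm__ volatile ("syscall"
                      :
                      : "a"(60L), "D"(status)  /* rax = 60 (sys_exit), rdi = status */
                      : "rcx", "r11", "memory");
    __builtin_unreachable();
}

/* Hypothetical helper: runs raw_exit(status) in a child process and returns
 * the exit status the parent observes (what `echo $?` would print), or -1. */
static int exit_status_of(long status)
{
    pid_t pid = fork();
    if (pid == 0)
        raw_exit(status);
    int wstatus;
    if (pid < 0 || waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

`exit_status_of(0)` corresponds to `xor rdi, rdi`. Only the low 8 bits of the status are visible to the parent, so values above 255 wrap around.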
To assemble the file we made, `hello-world.asm`, we convert it into an object file, `hello-world.o`:

```sh
nasm -f elf64 hello-world.asm -o hello-world.o
```

Then we link the object file `hello-world.o` into an executable file, `hello-world`:

```sh
ld hello-world.o -o hello-world
```