
In this post, we will implement a hello world program in assembly for Linux on an x86_64 machine. The program itself uses the older 32-bit (i386) system call interface, int 0x80, so we will assemble and link it as a 32-bit program.
Prerequisites
- Linux x86_64 (with 32-bit IA-32 emulation enabled, which is the default on most distributions)
- nasm
- ld
We all love hello world programs. They are the first programs we write when learning a new language. Therefore we will implement a hello world program in assembly.
There are a couple of things we need to know before we start writing our hello world program:
- What is assembly?
- What are registers?
- What are syscalls?
What is assembly?
Computers do not understand high-level languages like C, C++, Java or Python. The processor only executes machine code: instructions encoded as bytes. Compilers and interpreters take care of translating high-level code into those instructions. Assembly is a human-readable notation for machine code: each instruction, such as mov eax, 4, corresponds to one machine instruction, and a program called an assembler (we will use nasm) translates it into bytes. Because every processor architecture has its own instruction set, every architecture has its own assembly language. For example, x86_64 assembly and ARM assembly are different languages.
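To make the step from assembly to machine code concrete, here is a small sketch in Python (used only for illustration; it is not part of the toolchain) that hand-encodes two instructions our program uses, as they are encoded in 32-bit mode. mov r32, imm32 is the opcode byte 0xB8 plus the register number, followed by the 32-bit value in little-endian order, and int imm8 is 0xCD followed by the interrupt number. The function names are hypothetical; a real assembler such as nasm handles far more cases.

```python
import struct

# Register numbers used by the x86 "B8+r" encoding of mov r32, imm32.
REG_NUMBERS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
               "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_mov_imm32(register: str, value: int) -> bytes:
    """Encode 'mov <register>, <value>' (32-bit mode) as machine code bytes."""
    opcode = 0xB8 + REG_NUMBERS[register]
    # "<I" = 4-byte unsigned integer, little-endian, as x86 stores immediates.
    return bytes([opcode]) + struct.pack("<I", value)


def encode_int(vector: int) -> bytes:
    """Encode 'int <vector>': opcode 0xCD followed by the 8-bit vector."""
    return bytes([0xCD, vector])


# Part of the write call (mov ecx, message is skipped: its address is only
# known after linking).
code = (encode_mov_imm32("eax", 4)
        + encode_mov_imm32("ebx", 1)
        + encode_mov_imm32("edx", 14)
        + encode_int(0x80))
print(code.hex(" "))  # b8 04 00 00 00 bb 01 00 00 00 ba 0e 00 00 00 cd 80
```

You can compare these bytes with what nasm produces, for example with objdump -d hello.o once the program is assembled.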
What are registers?
Registers are small, very fast storage locations inside the processor that hold data temporarily. We will use registers to hold the system call number and its arguments: the file descriptor, the address of the message we want to print and its length.
What are syscalls?
User programs don't have direct access to hardware such as the screen or the disk; only the operating system kernel does. To ask the kernel to do something on our behalf, such as printing something to the screen, reading from a file or writing to a file, a program makes a system call (syscall).
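High-level languages go through the same syscalls under the hood. As an illustration in Python (assuming Linux or another POSIX system), os.write and os.read are thin wrappers around the write and read system calls. The sketch below writes bytes into a kernel pipe and reads them back; the helper name is hypothetical.

```python
import os


def roundtrip_through_pipe(data: bytes) -> bytes:
    """Send data through a kernel pipe using the write and read syscalls."""
    read_fd, write_fd = os.pipe()           # a kernel pipe: two new file descriptors
    try:
        written = os.write(write_fd, data)  # write: returns the number of bytes written
        assert written == len(data)
        return os.read(read_fd, written)    # read: returns up to `written` bytes
    finally:
        os.close(read_fd)
        os.close(write_fd)


print(roundtrip_through_pipe(b"Hello, World!\n"))
```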
Hello World
Now that we know what assembly is, what registers are and what syscalls are, we can start writing our hello world program.
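section .text
global _start          ; export _start so the linker can use it as the entry point
_start:
; ssize_t write(int fd, const void *buf, size_t count);
mov eax, 4             ; syscall number 4 = write (32-bit table)
mov ebx, 1             ; fd 1 = standard output
mov ecx, message       ; address of the message
mov edx, 14            ; length: 13 characters + newline
int 0x80               ; enter the kernel
; void _exit(int status);
mov eax, 1             ; syscall number 1 = exit (32-bit table)
xor ebx, ebx           ; status 0 = success
int 0x80               ; enter the kernel; this call does not return
section .data
message: db "Hello, World!",10   ; the message bytes followed by a newline (ASCII 10)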
Let's go through the code line by line.
section .text
global _start
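section .text starts the text section, which contains the code of our program. global _start makes the _start label visible outside this file so that the linker can find it; ld uses _start as the program's entry point by default.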
_start:
This is the entry point of our program. The operating system will start executing our program from this point.
As we said before, we need to use syscalls to interact with the operating system. The syscall we will use to print something to the screen is the write syscall. The write syscall has the following signature:
ssize_t write(int fd, const void *buf, size_t count);
The first argument is the file descriptor. The file descriptor is a number that represents a file. The file descriptor 0 represents the standard input, the file descriptor 1 represents the standard output and the file descriptor 2 represents the standard error.
The second argument is a pointer to the data we want to print.
The third argument is the size of the data we want to print.
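The return type is ssize_t, the signed counterpart of size_t. On success, write returns the number of bytes written; on error, the C library wrapper returns -1 and sets errno. (When we invoke the syscall directly with int 0x80, as we do here, the kernel instead returns a negative error code in eax.)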
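To see the error path, here is a small Python sketch (Python used only for illustration; the helper name is hypothetical) that writes to a file descriptor that has just been closed. The underlying write call fails with EBADF ("bad file descriptor"); the C library reports that as -1 with errno set, and Python turns it into an OSError carrying the same errno.

```python
import errno
import os


def write_to_closed_fd() -> int:
    """Return the errno produced by writing to an already closed descriptor."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    os.close(write_fd)                 # write_fd is no longer a valid descriptor
    try:
        os.write(write_fd, b"Hello, World!\n")
    except OSError as exc:             # write() returned -1 and set errno
        return exc.errno
    return 0                           # not expected: the write succeeded


err = write_to_closed_fd()
print(errno.errorcode[err])
```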
For a more detailed explanation of the write syscall, you can check its manual page:
man 2 write
We use man 2 here because we want the system call manual (section 2) rather than the manual for a shell command.
It's cute that we can check the manual of man itself ; )
man man
After some scrolling, it tells us which manual section is for what.
...
1 Executable programs or shell commands
2 System calls (functions provided by the kernel)
3 Library calls (functions within program libraries)
4 Special files (usually found in /dev)
5 File formats and conventions, e.g. /etc/passwd
6 Games
7 Miscellaneous (including macro packages and conventions), e.g. man(7), groff(7)
8 System administration commands (usually only for root)
9 Kernel routines [Non standard]
...
Coming back to our topic:
Invoking a syscall is similar to calling a function: we choose which syscall we want and pass it arguments. To select the syscall, we move its number into the eax register.
Syscall numbers can differ between operating systems and architectures. Even on Linux, 32-bit and 64-bit x86 use different tables: write is 4 in the 32-bit table used by int 0x80 but 1 in the 64-bit table. Therefore it's better to look the numbers up in the header files that come with the system. Since our program uses the 32-bit interface, the file we want is unistd_32.h.
We can find where the header file is located by using the find command.
~$ sudo find -O3 / -name "unistd_32.h"
/usr/include/x86_64-linux-gnu/asm/unistd_32.h
Then we can use the grep command to find occurrences of the write keyword in the header file.
~$ grep write /usr/include/x86_64-linux-gnu/asm/unistd_32.h
#define __NR_write 4
#define __NR_writev 146
#define __NR_pwrite64 181
#define __NR_pwritev 334
#define __NR_process_vm_writev 348
#define __NR_pwritev2 379
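If you want the numbers programmatically rather than by eye, these #define lines are easy to parse. Here is a small Python sketch (the function name is hypothetical) that turns __NR_ definitions into a dictionary; the sample text is taken from the grep output above.

```python
import re

# Matches lines such as "#define __NR_write 4".
NR_DEFINE = re.compile(r"^#define\s+__NR_(\w+)\s+(\d+)\s*$", re.MULTILINE)


def parse_syscall_numbers(header_text: str) -> dict:
    """Map syscall names to numbers from the text of a unistd_*.h header."""
    return {name: int(number) for name, number in NR_DEFINE.findall(header_text)}


sample = """#define __NR_write 4
#define __NR_writev 146
#define __NR_pwrite64 181
"""
numbers = parse_syscall_numbers(sample)
print(numbers["write"])  # 4
```

On a real system you would pass it the contents of the header file found with find, for example open("/usr/include/x86_64-linux-gnu/asm/unistd_32.h").read().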
We can see that the write syscall has the number 4. Let's move 4 into the eax register.
mov eax, 4 ; select the write syscall
To pass the arguments to the syscall we need to move the arguments to the ebx, ecx and edx registers. The first argument is the file descriptor. We want to print something to the standard output so we need to move the file descriptor 1 to the ebx register.
mov ebx, 1 ; move the file descriptor 1 to the ebx register
The second argument is a pointer to the data we want to print. We want to print the message "Hello, World!", so we move the address of the message into the ecx register. In NASM, a label used as an operand stands for its address, so no special prefix is needed. We will talk about the message itself later.
mov ecx, message ; move the address of the message to the ecx register
The third argument is the size of the data we want to print, which goes into the edx register. Our message consists of the 13 characters of "Hello, World!" followed by a newline, so its size is 14 bytes. We could calculate the size at run time by incrementing a counter for every byte until we reach a null terminator (this is what C's strlen does), but our message has no null terminator, and this would be unnecessary for a hello world tutorial anyway. Counting the characters by hand is enough. (NASM can also compute the length at assembly time with a line such as msglen equ $ - message placed right after the message.)
mov edx, 14 ; move the size of the message to the edx register
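For comparison, the counting approach looks like this as a Python sketch (strlen here is a hypothetical stand-in for C's function). It only works when the data ends with a 0 byte, which our message does not have; it would have to be defined as db "Hello, World!",10,0 for a loop like this to stop, and the length to pass to write would still be 14.

```python
def strlen(buf: bytes) -> int:
    """Count bytes until the first 0 byte, like C's strlen."""
    count = 0
    while buf[count] != 0:    # raises IndexError if there is no terminator
        count += 1
    return count


terminated = b"Hello, World!\n\x00"   # the message plus a null terminator
print(strlen(terminated))             # 14: 13 characters + newline
```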
Most Unix systems and derivatives do not use software interrupts, with the exception of interrupt 0x80, used to make system calls. This is accomplished by entering a 32-bit value corresponding to a kernel function into the EAX register of the processor and then executing INT 0x80.¹
Now that we have selected the write syscall and passed the arguments to the syscall we can invoke the syscall.
int 0x80 ; invoke the syscall
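The register setup for the 32-bit int 0x80 interface follows a fixed convention: the syscall number goes in eax and up to six arguments go in ebx, ecx, edx, esi, edi and ebp, in that order; the result comes back in eax. The Python sketch below (a hypothetical helper, for illustration only) just records that mapping, so you can check which register each write argument lands in. The message address used is also hypothetical.

```python
# Argument registers of the Linux i386 "int 0x80" convention, in order.
I386_ARG_REGISTERS = ("ebx", "ecx", "edx", "esi", "edi", "ebp")


def syscall_registers(number: int, *args: int) -> dict:
    """Return the register values to load before executing int 0x80."""
    if len(args) > len(I386_ARG_REGISTERS):
        raise ValueError("the i386 convention passes at most 6 arguments")
    registers = {"eax": number}
    registers.update(zip(I386_ARG_REGISTERS, args))
    return registers


MESSAGE_ADDR = 0x0804A000   # hypothetical address of the message label
print(syscall_registers(4, 1, MESSAGE_ADDR, 14))
```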
Every program needs to exit. We can exit our program by calling the exit syscall. Here's the signature of the exit syscall:
void _exit(int status);
If we don't exit, the program crashes with a segmentation fault. Nothing tells the processor to stop after our last instruction, so it keeps executing whatever bytes follow our code in memory until it reaches an invalid instruction or an invalid memory address.²
The exit syscall has the number 1 in the 32-bit table, so we move 1 into the eax register. Its only argument is the exit status. We want to exit successfully, so ebx must be 0; xor ebx, ebx does exactly that (any value XORed with itself is 0) and is a common, compact idiom for mov ebx, 0. Then we invoke the syscall.
Putting it all together, the exit sequence looks like this:
mov eax, 1 ; select the exit syscall
xor ebx, ebx ; move 0 to the ebx register
int 0x80 ; invoke the syscall
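The exit status is handed to the parent process; in a shell you can read it with echo $? right after running a program. The Python sketch below (hypothetical helper name) shows the same thing: it starts a child Python process that calls os._exit, a thin wrapper around _exit, and returns the status the parent receives. Only the low 8 bits of the status are kept, so use values from 0 to 255.

```python
import subprocess
import sys


def exit_status_of(status: int) -> int:
    """Run a child that calls _exit(status) and return the status we observe."""
    child = subprocess.run(
        [sys.executable, "-c", f"import os; os._exit({status})"],
        check=False,
    )
    return child.returncode


print(exit_status_of(0))   # 0: success, like xor ebx, ebx
print(exit_status_of(3))   # 3
```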
Now we can talk about the message. A program is divided into several sections. The text section contains the code of our program, and the data section contains its initialized data. For now, knowing these two sections is enough.
We want to print the message "Hello, World!", so we need to store it somewhere. In assembly, db ³ ("define byte") allocates space and fills it with the given bytes. We want 14 bytes: the 13 characters of the message followed by a newline character, which is 10 in ASCII.
section .data
message: db "Hello, World!",10
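As a rough model of what db emits, here is a Python sketch (the db function is hypothetical and only covers strings and small numbers): each character of a string becomes one byte and each number becomes one byte. Note that no 0 byte is added at the end; NASM's db does not null-terminate strings.

```python
def db(*operands) -> bytes:
    """Model NASM's db: strings become their bytes, numbers become one byte."""
    out = bytearray()
    for operand in operands:
        if isinstance(operand, str):
            out += operand.encode("ascii")   # one byte per character
        else:
            out.append(operand)              # must fit in 0..255
    return bytes(out)


message = db("Hello, World!", 10)
print(message, len(message))
```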
Assuming the source is saved as hello.asm, we can assemble it with nasm. The -f elf32 option tells nasm to produce a 32-bit ELF object file.
nasm -f elf32 hello.asm -o hello.o
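Then we need to link the object file. Because hello.o is a 32-bit object, we pass -m elf_i386 so that ld produces a 32-bit executable; on x86_64, ld produces 64-bit output by default and would reject the object.
ld -m elf_i386 -o hello hello.o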
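If the link step complains about an incompatible architecture, it helps to check what nasm actually produced. The first bytes of an ELF file say whether it is 32- or 64-bit (byte 4, EI_CLASS) and which CPU it targets (e_machine, a 16-bit field at offset 18; 3 means i386 and 62 means x86_64). The Python sketch below (hypothetical helper name) reads those two fields; the file command reports the same information.

```python
import os
import struct

ELF_CLASSES = {1: "ELF32", 2: "ELF64"}
MACHINES = {3: "i386", 62: "x86_64"}


def describe_elf(header: bytes) -> tuple:
    """Return (class, machine) from the first 20 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = ELF_CLASSES.get(header[4], "unknown")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return elf_class, MACHINES.get(machine, f"machine {machine}")


if os.path.exists("hello.o"):   # run next to the object file from nasm
    with open("hello.o", "rb") as f:
        print(describe_elf(f.read(20)))
```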
Now we can run our program.
not@shitdows:~$ ./hello
Hello, World!
Congratulations! You have written your first assembly program ; )
¹ What does "int 0x80" mean in assembly code?
² What happens if there is no exit system call in an assembly program?
³ Which variable size to use (db, dw, dd) with x86 assembly?