x86-64 & ARM64 Calling Conventions Demystified
Learn x86-64 and ARM64 calling conventions across Linux, macOS, and Windows. Master parameter passing, stack frames, and register preservation with hands-on assembly examples.
In Part 1 of this series, we explored how to make syscalls directly to the kernel. But real programs rarely invoke syscalls directly; instead, they call functions, which may themselves invoke syscalls deep in the call stack.
When one function calls another, both must agree on a contract: Where do arguments go? Which registers can I clobber? Who cleans up the stack? This contract is called a calling convention, and it varies dramatically between platforms. Understanding calling conventions is essential for any low-level programmer.
This comprehensive guide covers the three major calling conventions you’ll encounter when writing cross-platform assembly:
| Convention | Platforms | Architecture |
|---|---|---|
| System V AMD64 ABI | Linux, macOS, FreeBSD | x86-64 |
| Microsoft x64 | Windows | x86-64 |
| AAPCS64 | Linux, macOS, Windows | ARM64 |
What Calling Conventions Define
A calling convention specifies the rules that both the caller and callee must follow:
- Parameter passing — Which registers hold arguments, and in what order
- Return values — Where the result goes
- Register preservation — Which registers must be saved/restored by the callee
- Stack layout — Alignment requirements, shadow space, red zones
- Cleanup responsibility — Who adjusts the stack pointer after the call
Stack Frame Visualization
Understanding calling conventions becomes much easier when you visualize the memory layout for each ABI. Below is a comparison of how each architecture structures its stack frame during a function call.
System V AMD64 (Linux/macOS)
The System V convention is optimized for speed. It passes the first six integer arguments in registers and provides a “Red Zone” for leaf functions.
[Figure: System V AMD64 stack frame, showing the Red Zone (128 bytes)]
Windows x64 (Microsoft)
Windows requires Shadow Space (also known as Home Space) allocated by the caller, even if arguments are passed in registers. This space is used for spilling registers during debugging.
[Figure: Windows x64 stack frame, showing the Shadow Space (32 bytes)]
ARM64 (AAPCS64)
ARM64 uses the Link Register (x30) to store the return address. The Frame Pointer (x29) and Link Register are typically saved as a pair at the start of the stack frame.
[Figure: ARM64 stack frame, showing the saved x29 / x30 pair (16 bytes) and the stack arguments (9th onward)]
System V AMD64 ABI (Linux & macOS x86-64)
The System V AMD64 ABI is the most widely used calling convention on Unix-like systems. It’s efficient, using many registers for parameter passing which reduces stack operations.
Integer and Pointer Arguments
The first six integer/pointer arguments go in registers, in this order:
| Argument | Register |
|---|---|
| 1st | rdi |
| 2nd | rsi |
| 3rd | rdx |
| 4th | rcx |
| 5th | r8 |
| 6th | r9 |
Arguments beyond the sixth are pushed onto the stack in reverse order (right-to-left).
Floating-Point Arguments
The first eight floating-point arguments use vector registers: xmm0 - xmm7.
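Integer and floating-point arguments draw from separate register pools, so the two sequences advance independently of each other. The sketch below models that for scalar arguments only (no structs, no long double, no variadic calls). The function name sysv_assign and the 'i'/'f' signature encoding are hypothetical, invented for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: models System V AMD64 register assignment for
 * scalar arguments only. sig uses 'i' for an integer/pointer argument
 * and 'f' for a float/double argument. out[k] receives the location of
 * argument k, or "stack" once the matching register pool is exhausted. */
void sysv_assign(const char *sig, const char **out)
{
    static const char *int_regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    static const char *vec_regs[] = { "xmm0", "xmm1", "xmm2", "xmm3",
                                      "xmm4", "xmm5", "xmm6", "xmm7" };
    size_t next_int = 0, next_vec = 0;

    for (size_t k = 0; sig[k] != '\0'; k++) {
        if (sig[k] == 'f')
            out[k] = next_vec < 8 ? vec_regs[next_vec++] : "stack";
        else
            out[k] = next_int < 6 ? int_regs[next_int++] : "stack";
    }
}
```

For a signature such as (int, double, int, double), encoded as "ifif", the model places the arguments in rdi, xmm0, rsi, and xmm1: the second integer still gets the second integer register even though a double came before it.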
Variadic Note: For variadic functions (like printf), the caller must set al (the low 8 bits of rax) to an upper bound on the number of vector registers used for the call’s arguments (0 to 8). The callee uses this value to skip saving XMM registers that carry no arguments.
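In C, the compiler sets al at every variadic call site, so this rule only needs manual attention in hand-written assembly. As a minimal sketch of the callee side, the hypothetical function sum_doubles below reads its floating-point arguments through va_arg; the name and signature are assumptions made for this example.

```c
#include <stdarg.h>

/* Hypothetical example: sums `count` double arguments.
 * On System V AMD64 the compiler loads al at each call site before the
 * call instruction; nothing in the C source has to do it explicitly. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);   /* each double arrived in an XMM register or on the stack */
    va_end(ap);
    return total;
}
```

A call such as sum_doubles(3, 1.0, 2.0, 3.5) passes count in edi and the three doubles in xmm0 through xmm2, so a compiler would be expected to load 3 into al before the call.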
Return Values
| Type | Location |
|---|---|
| Integer/pointer (≤64 bits) | rax |
| Integer (128 bits) | rax (low), rdx (high) |
| Floating-point | xmm0 |
| Struct (small) | rax/rdx or xmm0/xmm1 |
| Struct (large) | Caller passes hidden pointer in rdi |
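The struct rows are easiest to see with two concrete types. The sketch below uses hypothetical struct names (pair64, quad64); the register comments describe what the table above implies for out-of-line calls on System V AMD64, and a compiler is free to do something else when it inlines the call.

```c
#include <stdint.h>

/* 16 bytes of integer fields: small enough to return in rax (lo) and rdx (hi). */
struct pair64 { int64_t lo; int64_t hi; };

/* 32 bytes: too large for registers, so the caller passes a hidden result
 * pointer in rdi and the visible argument moves to rsi. */
struct quad64 { int64_t v[4]; };

struct pair64 make_pair(int64_t a, int64_t b)
{
    struct pair64 p = { a, b };
    return p;                          /* returned in rax/rdx */
}

struct quad64 make_quad(int64_t seed)
{
    struct quad64 q = { { seed, seed + 1, seed + 2, seed + 3 } };
    return q;                          /* written through the hidden pointer */
}
```

Compiling this with optimization and reading the generated assembly is a quick way to confirm how a particular compiler classifies each struct.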
Register Preservation
This is critical: if you clobber a callee-saved register without restoring it, you’ll corrupt the caller’s state.
| Caller-saved (volatile) | Callee-saved (non-volatile) |
|---|---|
| rax, rcx, rdx, rsi, rdi, r8-r11 | rbx, rbp, r12-r15, rsp |
Callee-saved means: if your function uses these registers, you must save them at entry and restore them before returning.
Stack Requirements
- 16-byte alignment: The stack pointer (rsp) must be 16-byte aligned before the call instruction. Since call pushes an 8-byte return address, rsp will be misaligned by 8 bytes at function entry. Prologues typically subtract 8 (or 8 + 16n) to restore alignment.
- Red zone: Leaf functions (functions that don’t call other functions) can use 128 bytes below rsp without adjusting the stack pointer. This optimization avoids prologue/epilogue overhead for simple functions.
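The alignment rule is plain modular arithmetic, which the following sketch works through. The helper name sysv_frame_adjust and its parameters are hypothetical; it assumes every pushed register is 8 bytes and that the function is entered through an ordinary call instruction.

```c
#include <stddef.h>

/* Hypothetical helper: bytes to subtract from rsp in a prologue so that
 * rsp is 16-byte aligned at the next call instruction.
 *   pushes      - number of 8-byte registers pushed in the prologue
 *   local_bytes - space wanted for local variables
 * At function entry rsp sits 8 bytes past a 16-byte boundary, because
 * the call instruction pushed the return address. */
size_t sysv_frame_adjust(size_t pushes, size_t local_bytes)
{
    size_t used   = 8 + 8 * pushes;                /* return address + pushed registers */
    size_t total  = used + local_bytes;
    size_t padded = (total + 15) & ~(size_t)15;    /* round up to a multiple of 16 */
    return padded - used;
}
```

With no pushes and no locals the result is 8, the minimum adjustment mentioned above. With one push (for example push rbp) and no locals the result is 0, because the push alone restores alignment.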
Code Example
// Linux x86-64: add two numbers, print result
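// Build: cc -o example example.S   (the .S suffix runs the C preprocessor, which strips the // comments)
// Run: ./example
.intel_syntax noprefix              // Intel operand order; GNU as defaults to AT&T syntax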
.section .data
fmt: .asciz "Sum of %d and %d is %d\n"
.section .text
.global main
// int add_nums(int a, int b)
// Arguments: rdi = a, rsi = b
// Returns: rax = a + b
add_nums:
lea eax, [rdi + rsi] // eax = rdi + rsi
ret
main:
push rbp
mov rbp, rsp
// Call add_nums(5, 3)
mov edi, 5 // First argument
mov esi, 3 // Second argument
call add_nums // Result in eax
// Call printf(fmt, 5, 3, result)
mov ecx, eax // 4th arg: result
mov edx, 3 // 3rd arg: second number
mov esi, 5 // 2nd arg: first number
lea rdi, [rip + fmt] // 1st arg: format string
xor eax, eax // No vector arguments
call printf
xor eax, eax // Return 0
pop rbp
ret
Note the xor eax, eax before calling printf: zeroing al tells the variadic callee that no vector registers carry arguments. This is a key detail of the System V calling convention.
Microsoft x64 Calling Convention (Windows)
Windows uses a convention that prioritizes simplicity and debugging. Unlike System V, it requires the caller to provide “Shadow Space” for register arguments.
Key Differences from System V
| Aspect | System V | Windows x64 |
|---|---|---|
| Register arguments | 6 | 4 |
| Shadow space | None | 32 bytes required |
| Red zone | 128 bytes | None |
| Variadic handling | al = vector register count | No count; floating-point varargs are also copied into the matching integer register |
Argument Registers
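The Windows calling convention has only four register argument slots. Slots are positional: the Nth argument uses either the Nth integer register or the Nth XMM register, depending on its type, and the other register of that slot goes unused.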
| Argument | Integer/Pointer | Floating Point |
|---|---|---|
| 1st | rcx | xmm0 |
| 2nd | rdx | xmm1 |
| 3rd | r8 | xmm2 |
| 4th | r9 | xmm3 |
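Each row of the table is a single slot, so a register is chosen by argument position and only its register file depends on the type. The sketch below models that for scalar, non-variadic arguments. The name win64_assign and the signature encoding ('i' for integer/pointer, 'f' for float/double) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: models Microsoft x64 register assignment for
 * scalar arguments. sig uses 'i' for an integer/pointer argument and
 * 'f' for a float/double argument. The argument's position picks the
 * slot; its type picks the integer or XMM register of that slot. */
void win64_assign(const char *sig, const char **out)
{
    static const char *int_regs[] = { "rcx", "rdx", "r8", "r9" };
    static const char *vec_regs[] = { "xmm0", "xmm1", "xmm2", "xmm3" };

    for (size_t k = 0; sig[k] != '\0'; k++) {
        if (k >= 4)
            out[k] = "stack";                       /* fifth argument onward */
        else
            out[k] = (sig[k] == 'f') ? vec_regs[k] : int_regs[k];
    }
}
```

For (int, double, int, double), encoded as "ifif", the model yields rcx, xmm1, r8, and xmm3. Under System V the same signature would use rdi, xmm0, rsi, and xmm1, which is a common source of porting bugs.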
Shadow Space: The caller must always allocate 32 bytes of “shadow space” on the stack immediately above the return address. This space is technically “owned” by the callee, which can use it to store (spill) the first four parameters if needed for debugging or varargs processing.
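A function that makes calls usually reserves the shadow space once in its prologue, together with room for any stack arguments and alignment padding. The sketch below computes that reservation under stated assumptions: the helper name win64_call_area is hypothetical, every argument occupies one 8-byte slot, and space for local variables is not included.

```c
#include <stddef.h>

/* Hypothetical helper: bytes to reserve with one `sub rsp, N` so that
 * every call made by the function has its shadow space, its stack
 * arguments, and a 16-byte aligned rsp.
 *   pushes   - number of 8-byte registers pushed in the prologue
 *   max_args - largest argument count among the calls the function makes */
size_t win64_call_area(size_t pushes, size_t max_args)
{
    size_t slots  = max_args < 4 ? 4 : max_args;   /* shadow space is never less than 4 slots */
    size_t used   = 8 + 8 * pushes;                /* return address + pushed registers */
    size_t total  = used + 8 * slots;
    size_t padded = (total + 15) & ~(size_t)15;    /* round up to a multiple of 16 */
    return padded - used;
}
```

With no pushes and calls of at most four arguments the result is 40, which is 32 bytes of shadow space plus 8 bytes of alignment padding.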
Code Example (Windows)
; Windows x64: add two numbers, print result
; Assemble: ml64 /c example.asm
; Link: link /subsystem:console example.obj msvcrt.lib
.data
fmt db "Sum of %d and %d is %d", 10, 0
.code
extern printf:proc
add_nums proc
lea eax, [rcx + rdx] ; First two args in rcx, rdx
ret
add_nums endp
main proc
sub rsp, 40 ; 32 shadow + 8 alignment
; Call add_nums(5, 3)
mov ecx, 5
mov edx, 3
call add_nums
; Call printf(fmt, 5, 3, result)
mov r9d, eax ; 4th arg
mov r8d, 3 ; 3rd arg
mov edx, 5 ; 2nd arg
lea rcx, fmt ; 1st arg
call printf
xor eax, eax
add rsp, 40
ret
main endp
end
AAPCS64 (ARM64)
ARM64 platforms, including Apple Silicon, use AAPCS64 as their base calling convention, with some platform-specific rules layered on top.
Note on Apple ARM64 (macOS) deviations: While Apple Silicon Macs follow AAPCS64 for the majority of cases, Apple’s ABI has some documented deviations from the standard AAPCS64 spec. Most notably: rules for passing small structs and for variadic functions (va_list layout) differ from Linux ARM64. Code that mixes Apple ARM64 and standard AAPCS64 assumptions in those areas may behave differently across platforms. For general integer/pointer arguments, the conventions are identical. See Apple’s ARM64 ABI documentation for details.
Argument Passing
| Argument | Register |
|---|---|
| 1st-8th integer | x0-x7 |
| 1st-8th float | v0-v7 |
Register Preservation
| Caller-saved (Volatile) | Callee-saved (Non-Volatile) |
|---|---|
| x0-x17 (x18 is the platform register, reserved on macOS and Windows; avoid it in portable code) | x19-x28, x29 (fp), x30 (lr), sp |
Code Example (ARM64)
// ARM64 Linux: add two numbers, print result
// Assemble: cc -o example example.s
// Run: ./example
.section .data
fmt: .asciz "Sum of %d and %d is %d\n"
.section .text
.global main
add_nums:
add w0, w0, w1 // w0 = w0 + w1
ret
main:
stp x29, x30, [sp, #-16]!
mov x29, sp
// Call add_nums(5, 3)
mov w0, #5
mov w1, #3
bl add_nums
// Call printf(fmt, 5, 3, result)
mov w3, w0 // 4th arg
mov w2, #3 // 3rd arg
mov w1, #5 // 2nd arg
adrp x0, fmt
add x0, x0, :lo12:fmt
bl printf
mov w0, #0
ldp x29, x30, [sp], #16
ret
Side-by-Side Comparison
| Feature | System V AMD64 | Windows x64 | AAPCS64 |
|---|---|---|---|
| Integer args | 6 regs | 4 regs | 8 regs |
| Float args | 8 regs | 4 regs | 8 regs |
| Red zone | 128 bytes | None | None |
| Shadow space | None | 32 bytes | None |
| Stack alignment | 16 bytes | 16 bytes | 16 bytes |
| Return value | rax | rax | x0 |
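The register counts in the table determine how many arguments spill to the stack for a given signature. The sketch below encodes just those counts for scalar, non-variadic arguments. The enum abi type and the function stack_arg_count are hypothetical names, and platform-specific deviations such as Apple’s variadic rules are outside the model.

```c
#include <stddef.h>

enum abi { ABI_SYSV_AMD64, ABI_WIN64, ABI_AAPCS64 };

/* Hypothetical helper: number of arguments passed on the stack for a call
 * with n_int integer/pointer and n_float floating-point scalar arguments. */
size_t stack_arg_count(enum abi abi, size_t n_int, size_t n_float)
{
    switch (abi) {
    case ABI_SYSV_AMD64:   /* separate pools: 6 integer, 8 vector registers */
        return (n_int > 6 ? n_int - 6 : 0) + (n_float > 8 ? n_float - 8 : 0);
    case ABI_WIN64: {      /* 4 positional slots shared by both kinds */
        size_t n = n_int + n_float;
        return n > 4 ? n - 4 : 0;
    }
    case ABI_AAPCS64:      /* separate pools: 8 integer, 8 vector registers */
        return (n_int > 8 ? n_int - 8 : 0) + (n_float > 8 ? n_float - 8 : 0);
    }
    return 0;
}
```

A function with seven integer arguments puts one argument on the stack under System V, three under Windows x64, and none under AAPCS64.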
Common Pitfalls and Debugging Tips
1. Stack Alignment (The 16-byte rule)
All three major 64-bit conventions require the stack pointer (rsp or sp) to be 16-byte aligned at every call or bl instruction. On x86-64, a misaligned stack typically surfaces as a segfault when the callee executes an aligned SSE or AVX load or store such as movaps. On ARM64, the hardware can fault on any load or store that uses a misaligned sp as its base address.
2. Shadow Space Corruption
On Windows, even if you pass zero arguments, you must reserve 32 bytes at the top of the stack (typically by subtracting from rsp in your prologue) before calling a function. If you don’t, the callee may overwrite your locals, your saved registers, or your own return address when it spills its (non-existent) register arguments into the shadow space it assumes you provided.
3. Clobbering Callee-Saved Registers
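If your function uses rbx, rbp, or r12-r15 (on x86-64) or x19-x28 (on ARM64), you must save them to the stack and restore them before returning. On Windows x64 the callee-saved set also includes rdi, rsi, and xmm6-xmm15, so code ported from System V that treats rdi and rsi as scratch registers will corrupt its caller.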
What’s Next?
Now you understand the contracts governing function communication. In Part 3, we’ll explore program startup: what happens before main() runs and how command-line arguments are passed.