Build a Debugger from Scratch: Ultimate 3 OS Guide (Part 8)

Understanding the bridge between your code and the kernel is the final key to demystifying how software works. Up to this point in my Assembly Series, you've mastered how to load a process, how calling conventions manage functions, how memory is laid out, and finally, how to talk directly to the kernel via system calls.


So, what do we do with all this low-level power? We build something real.


In this final part of the series, we will combine all these concepts to build a debugger from scratch. And because we don’t believe in magic or hiding behind C wrappers, we are going to explore how to build its core engine entirely in pure Assembly across all three major operating systems: Linux, macOS, and Windows.

The Anatomy of a Debugger

Whenever you type run in GDB or LLDB (or g in WinDbg), you are invoking a very specific set of OS-provided debugging APIs. All debuggers share the same fundamental architecture: a core loop that manages the execution of another process.

This loop consists of four main phases:

Attach/Spawn: The debugger launches the target program as a child process and tells the operating system, “Hey, I’m watching this one.”
Event Loop: The debugger pauses and waits for an event (like a signal, a crash, or a breakpoint) from the target.
Inspect & Modify: While the target is suspended, the debugger peeks into its registers (like RIP or RAX) or reads/writes its memory. This is how you set breakpoints!
Resume: The debugger tells the OS to resume the target’s execution until the next event.

Interactive Lab: Breakpoint Mechanism Visualizer

[Interactive figure: the five phases of software breakpoint injection (original code, inject int 3 / BRK, trap on SIGTRAP, restore original bytes, continue execution), shown for x86-64 and ARM64.]

A Word on Security Models & macOS SIP

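Operating systems do not take kindly to processes spying on one another. On Linux, the kernel's ptrace access checks (and, on many distributions, the Yama ptrace_scope setting) stop a process from tracing arbitrary others unless it holds CAP_SYS_PTRACE. On Windows, you often need SeDebugPrivilege to attach to processes owned by other users or by the system.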

macOS takes this a step further with System Integrity Protection (SIP), which severely restricts reading another process’s memory. Does this mean we have to disable SIP to build our debugger?

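No. To debug a process on macOS without disabling SIP, the binaries involved simply need the right entitlements at code-signing time. The com.apple.security.get-task-allow key marks a binary as debuggable: when the target is signed with an entitlements.plist containing it (for example via codesign --entitlements entitlements.plist), the kernel allows a debugger to obtain its task port. The debugger binary itself is normally signed with the com.apple.security.cs.debugger entitlement.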

entitlements.plist xml


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>com.apple.security.get-task-allow</key>
  <true/>
</dict>
</plist>

1. Linux: The Power of ptrace

On Linux, exactly one system call reigns supreme for debugging: ptrace (Process Trace). It is the engine behind GDB and strace.

Let’s look at how we build the debugger loop in pure x86-64 Linux assembly.

We use fork (Syscall 57) to split our process. The child process calls ptrace with the PTRACE_TRACEME command (0) before invoking execve (Syscall 59) to launch the target. The parent process uses wait4 (Syscall 61) to catch the child whenever it pauses.
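wait4 reports why the child paused through a packed status word, and the parent has to decode it before deciding what to do next. Here is a minimal sketch of that decoding in C (used only so the bit layout can be compiled and checked; the function names are ours, not part of any API). It assumes the conventional Linux status encoding that the WIFEXITED / WIFSTOPPED macros implement.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative mirrors of the Linux wait-status macros, written out
   so the bit layout is visible. The names are hypothetical. */

/* Low 7 bits == 0: the child called exit(); exit code is in bits 8-15. */
static inline bool status_exited(int status)    { return (status & 0x7f) == 0; }
static inline int  status_exit_code(int status) { return (status >> 8) & 0xff; }

/* Low byte == 0x7f: the child is stopped; the stop signal is in bits 8-15. */
static inline bool status_stopped(int status)   { return (status & 0xff) == 0x7f; }
static inline int  status_stop_signal(int status) {
    assert(status_stopped(status));
    return (status >> 8) & 0xff;
}

/* Low 7 bits in 1..0x7e: the child was killed by that signal. */
static inline bool status_signaled(int status) {
    int low = status & 0x7f;
    return low != 0 && low != 0x7f;
}
static inline int status_term_signal(int status) { return status & 0x7f; }
```

For example, a child stopped by SIGTRAP (signal 5) reports 0x057f, while a clean exit(3) reports 0x0300. Only the "stopped" case should lead to PTRACE_CONT.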

linux_debugger.s asm


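  ; --- IN THE CHILD PROCESS ---
  ; ptrace(PTRACE_TRACEME, 0, 0, 0): ask to be traced by our parent
  mov rax, 101 ; sys_ptrace
  mov rdi, 0   ; PTRACE_TRACEME
  xor rsi, rsi
  xor rdx, rdx
  xor r10, r10 ; 4th syscall argument lives in r10, not rcx
  syscall      ; note: syscall clobbers rcx and r11
  
  ; Setup args and execute the target program...
  ; (the kernel stops a traced child with SIGTRAP once execve succeeds)
  mov rax, 59 ; sys_execve
  ; ... (arguments setup) ...
  syscall

  ; --- IN THE PARENT PROCESS (DEBUGGER) ---
wait_loop:
  ; wait4(child_pid, &status, 0, NULL)
  mov rax, 61  ; sys_wait4
  mov rdi, r12 ; r12 holds the child pid
  lea rsi, [rel status] ; status: a dword we reserved for the result
  xor rdx, rdx
  xor r10, r10
  syscall

  ; A negative return value means wait4 itself failed (e.g. no child left)
  test rax, rax
  js target_exited

  ; Low byte 0x7f means "stopped"; anything else means the target
  ; exited or was killed by a signal, so there is nothing to resume
  mov eax, dword [rel status]
  and eax, 0xff
  cmp eax, 0x7f
  jne target_exited

  ; Target stopped. Here, you could peek at registers!
  ; Resume target: ptrace(PTRACE_CONT, pid, 0, 0)
  ; (data = 0 resumes without passing the stop signal on to the target)
  mov rax, 101
  mov rdi, 7   ; PTRACE_CONT
  mov rsi, r12
  xor rdx, rdx
  xor r10, r10
  syscall

  jmp wait_loop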

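By calling ptrace with PTRACE_GETREGS (12), the parent can copy the exact state of the child’s CPU registers into a structure (user_regs_struct). PTRACE_GETREGS is an x86 request; on ARM64 Linux the equivalent is PTRACE_GETREGSET with the NT_PRSTATUS register set.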

On x86-64: We inspect the Instruction Pointer (RIP) and modify memory to inject an int 3 (opcode 0xCC) breakpoint.
On ARM64: We inspect the Program Counter (PC) and, instead of 0xCC, we write a BRK instruction (like 0xD4200000).
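The byte-level arithmetic behind both breakpoint styles is small enough to sketch. On x86-64, ptrace reads and writes memory a whole 8-byte word at a time, so planting int 3 means swapping only the lowest byte of the word (the byte at the target address on a little-endian machine) and remembering the original. After the trap, RIP points one byte past the 0xCC, so the debugger rewinds it by one before restoring. On ARM64 every instruction is 4 bytes, so the whole word is replaced by a BRK encoding. The helper names below are ours and purely illustrative; C is used only so the arithmetic can be checked.

```c
#include <assert.h>
#include <stdint.h>

#define INT3_OPCODE 0xCCu        /* x86-64: int 3 */
#define BRK_BASE    0xD4200000u  /* ARM64: BRK #0 */

/* x86-64: replace the lowest byte of a peeked word with int 3. */
static inline uint64_t plant_int3(uint64_t original_word) {
    return (original_word & ~(uint64_t)0xFF) | INT3_OPCODE;
}

/* x86-64: put the saved original byte back into the patched word. */
static inline uint64_t restore_byte(uint64_t patched_word, uint8_t saved_byte) {
    return (patched_word & ~(uint64_t)0xFF) | saved_byte;
}

/* x86-64: after the trap, RIP is one byte past the breakpoint. */
static inline uint64_t x86_breakpoint_address(uint64_t rip_after_trap) {
    assert(rip_after_trap >= 1);
    return rip_after_trap - 1;
}

/* ARM64: BRK #imm16 carries its immediate in bits 5..20. */
static inline uint32_t arm64_brk(uint16_t imm16) {
    return BRK_BASE | ((uint32_t)imm16 << 5);
}
```

With these, plant_int3(0x1122334455667788) gives 0x11223344556677CC, and arm64_brk(0) is the 0xD4200000 mentioned above.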

ptrace is incredibly powerful, but you must manually handle every single architecture detail (like the exact size and layout of user_regs_struct).
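To make "exact size and layout" concrete, here is a hedged sketch that mirrors the x86-64 user_regs_struct by hand and derives the offsets an assembly debugger would hard-code. The struct name, the REGS_OFF_* constants, and reg_at are ours; the field order is our reading of <sys/user.h> on x86-64 Linux, so check it against your own headers before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hand-written mirror of x86-64 Linux user_regs_struct (27 x 8 bytes).
   Assumption: field order matches <sys/user.h> on your system. */
struct regs_x86_64 {
    uint64_t r15, r14, r13, r12, rbp, rbx, r11, r10, r9, r8;
    uint64_t rax, rcx, rdx, rsi, rdi, orig_rax;
    uint64_t rip, cs, eflags, rsp, ss;
    uint64_t fs_base, gs_base, ds, es, fs, gs;
};

static_assert(sizeof(struct regs_x86_64) == 27 * 8,
              "register block should be 216 bytes");

/* Byte offsets to use from assembly after PTRACE_GETREGS fills a buffer. */
enum {
    REGS_OFF_RAX = offsetof(struct regs_x86_64, rax),
    REGS_OFF_RIP = offsetof(struct regs_x86_64, rip),
    REGS_OFF_RSP = offsetof(struct regs_x86_64, rsp)
};

/* Read one 64-bit register out of a raw register buffer. */
static inline uint64_t reg_at(const void *buf, size_t offset) {
    uint64_t value;
    memcpy(&value, (const unsigned char *)buf + offset, sizeof value);
    return value;
}
```

In assembly terms, this is why RIP would be read with something like mov rax, [rel regs + 128] once the buffer has been filled.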

2. macOS: ptrace and Mach Ports

macOS layers a BSD personality on top of the Mach microkernel. While macOS does have a ptrace system call (Syscall 0x200001A), Apple deliberately limited it. It only supports process-control requests (such as PT_TRACE_ME, PT_CONTINUE, PT_KILL) and has no requests for reading registers or memory.

To build our macOS loop in assembly (whether x86-64 or ARM64 Apple Silicon), we use fork (Syscall 0x2000002) and wait4 (Syscall 0x2000007).
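Those 0x2000000-prefixed numbers are not arbitrary: on x86-64 macOS, the value in RAX combines a syscall class with the BSD syscall number. A small sketch of the composition follows; the two constants mirror what we understand XNU's headers call SYSCALL_CLASS_SHIFT and SYSCALL_CLASS_UNIX, and the function name is ours.

```c
#include <assert.h>
#include <stdint.h>

#define SYSCALL_CLASS_SHIFT 24
#define SYSCALL_CLASS_UNIX  2u  /* BSD system calls */

/* Build the value loaded into RAX on x86-64 macOS for a BSD syscall.
   On ARM64, x16 takes the bare number instead (no class prefix). */
static inline uint32_t macos_x86_64_syscall(uint32_t unix_number) {
    assert(unix_number < (1u << SYSCALL_CLASS_SHIFT));
    return (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | unix_number;
}
```

So ptrace (26) becomes 0x200001A, fork (2) becomes 0x2000002, and wait4 (7) becomes 0x2000007.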

macos_debugger.s asm


  ; --- IN THE CHILD PROCESS (x86-64) ---
  ; ptrace(PT_TRACE_ME, 0, 0, 0)
  mov rax, 0x200001A ; sys_ptrace
  mov rdi, 0         ; PT_TRACE_ME
  xor rsi, rsi
  xor rdx, rdx
  xor r10, r10
  syscall

  ; --- IN THE CHILD PROCESS (ARM64) ---
  ; ptrace(PT_TRACE_ME, 0, 0, 0)
  mov x16, 26         ; sys_ptrace (no 0x2000000 class offset)
  mov x0, 0           ; PT_TRACE_ME
  mov x1, 0
  mov x2, 0
  mov x3, 0
  svc 0x80

But what if you want to read registers on macOS? ptrace has no request for that. You must ask the Mach kernel directly using Mach RPC functions like thread_get_state, after obtaining the target's task port (task_for_pid) and its thread ports (task_threads). Because constructing raw Mach messages in Assembly requires packing colossal RPC structs manually, most macOS debuggers (like LLDB) rely heavily on the C layer for the Mach APIs.

3. Windows: The Debug API Loop

Windows handles debugging completely differently. Instead of relying on signals like Linux or macOS, the Windows kernel exposes a dedicated, event-driven Debug API.

Instead of fork and ptrace, we use CreateProcessA directly, passing the DEBUG_ONLY_THIS_PROCESS creation flag. The OS will automatically attach your process as the debugger.

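We then enter a continuous WaitForDebugEvent loop. Because the Windows API on x64 uses the single Microsoft x64 calling convention (a fastcall-style scheme), our assembly passes the first four arguments in RCX, RDX, R8, and R9, and places any further arguments on the stack above a 32-byte shadow space.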
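CreateProcessA takes ten arguments, so six of them live on the stack. The offset of each one follows mechanically from the convention, and a tiny sketch makes the arithmetic explicit. The function names are ours; offsets are relative to RSP at the moment of the call instruction.

```c
#include <assert.h>

/* Offset from RSP (just before `call`) of the Nth argument, N >= 5.
   The first 32 bytes are shadow space for the four register arguments. */
static inline int win64_stack_arg_offset(int arg_index) {
    assert(arg_index >= 5);
    return 32 + 8 * (arg_index - 5);
}

/* Bytes of outgoing argument area a call needs: one 8-byte slot per
   argument, and never less than the 32-byte shadow space. */
static inline int win64_outgoing_area(int arg_count) {
    assert(arg_count >= 0);
    int slots = arg_count < 4 ? 4 : arg_count;
    return slots * 8;
}
```

dwCreationFlags is the sixth argument of CreateProcessA, which is why the listing writes it to [rsp+40]. RSP must also be 16-byte aligned at the call, so the amount actually subtracted from RSP is usually rounded up from the 80 bytes this yields for ten arguments.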

windows_debugger.s asm


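  ; CreateProcessA setup (shadow space and parameters)
  ; RCX, RDX, R8, R9 carry the first four arguments; the remaining six
  ; go on the stack above the 32-byte shadow space.
  ; dwCreationFlags (6th argument) = DEBUG_ONLY_THIS_PROCESS (value: 2)
  mov qword [rsp+40], 2    
  call CreateProcessA

  test eax, eax ; CreateProcessA returns 0 on failure
  jz exit_debugger

debug_loop:
  ; WaitForDebugEvent(&debug_event, INFINITE)
  lea rcx, [rel de] ; de = DEBUG_EVENT struct
  mov edx, 0xFFFFFFFF ; INFINITE timeout (a 32-bit DWORD)
  call WaitForDebugEvent

  test eax, eax ; returns 0 on failure
  jz exit_debugger

  ; Check if process exited (EXIT_PROCESS_DEBUG_EVENT = 5)
  mov eax, dword [rel de] ; dwDebugEventCode
  cmp eax, 5
  je exit_debugger

  ; Resume the target: ContinueDebugEvent(pid, tid, DBG_CONTINUE)
  ; Note: DBG_CONTINUE marks every exception as handled. A fuller
  ; debugger passes DBG_EXCEPTION_NOT_HANDLED (0x80010001) for
  ; exceptions it did not cause itself.
  mov ecx, dword [rel de + 4] ; dwProcessId
  mov edx, dword [rel de + 8] ; dwThreadId
  mov r8d, 0x00010002         ; DBG_CONTINUE
  call ContinueDebugEvent

  jmp debug_loop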

The DEBUG_EVENT structure is filled in with exact details whenever an exception occurs (like hitting a breakpoint), a DLL is loaded, or a thread starts. To read registers, call GetThreadContext() while the target is still suspended, that is, before ContinueDebugEvent, after setting ContextFlags in the CONTEXT structure (for example CONTEXT_FULL, which is built on the architecture-specific CONTEXT_AMD64 or CONTEXT_ARM64 flag).
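One decision the event loop has to make on every iteration is which status to hand to ContinueDebugEvent. The sketch below shows one common policy, not the only reasonable one: treat breakpoint and single-step exceptions as the debugger's own, and pass everything else back to the target. The numeric values mirror the Windows SDK constants named in the comments; the EV_, EXC_ and CONT_ names and the function itself are ours.

```c
#include <assert.h>
#include <stdint.h>

#define EV_EXCEPTION      1u           /* EXCEPTION_DEBUG_EVENT */
#define EV_EXIT_PROCESS   5u           /* EXIT_PROCESS_DEBUG_EVENT */
#define EV_LOAD_DLL       6u           /* LOAD_DLL_DEBUG_EVENT */
#define EV_LAST           9u           /* RIP_EVENT, highest event code */

#define EXC_BREAKPOINT    0x80000003u  /* EXCEPTION_BREAKPOINT */
#define EXC_SINGLE_STEP   0x80000004u  /* EXCEPTION_SINGLE_STEP */

#define CONT_HANDLED      0x00010002u  /* DBG_CONTINUE */
#define CONT_NOT_HANDLED  0x80010001u  /* DBG_EXCEPTION_NOT_HANDLED */

/* Pick the third argument for ContinueDebugEvent.
   exception_code is only meaningful when event_code is EV_EXCEPTION. */
static inline uint32_t continue_status(uint32_t event_code,
                                       uint32_t exception_code) {
    assert(event_code >= 1u && event_code <= EV_LAST);
    if (event_code != EV_EXCEPTION)
        return CONT_HANDLED;          /* DLL loads, thread starts, ... */
    if (exception_code == EXC_BREAKPOINT ||
        exception_code == EXC_SINGLE_STEP)
        return CONT_HANDLED;          /* traps the debugger asked for */
    return CONT_NOT_HANDLED;          /* let the target's handlers run */
}
```

An access violation (0xC0000005) in the target would therefore be continued with DBG_EXCEPTION_NOT_HANDLED, so the program's own exception handling still gets a chance to run.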

Concluding the Assembly Series

If you have followed along, you now know how to build a debugger from scratch. From simple instruction definitions to driving the kernel's debugging interfaces, we have traversed the full stack of how a computer truly executes code, and you have seen exactly what happens behind the scenes of your compiler, linker, and debugger.

While Assembly isn’t something most developers write by hand every day, understanding how it works equips you to diagnose issues, write hyper-optimized C/Rust code, and reverse engineer black-box software.

Welcome to the low level. Stay curious.