Introduction to exploit development - Part 4: Heap overflows
Overview
In Part 2 and Part 3, we explored stack-based vulnerabilities. Specifically focusing on buffer overflows that corrupt saved return addresses and enable shellcode execution. The stack's predictable structure (return addresses stored adjacent to buffers) makes exploitation relatively straightforward (well, not really, don't forget we had ASLR and stack canaries disabled).
Heap-based vulnerabilities is another avenue to exploit, albeit with different challenges. The heap lacks the convenient saved return addresses of the stack, requiring alternative techniques to achieve code execution. This post explores heap exploitation on ARM, covering:
- Heap memory allocation and management
- Differences between stack and heap memory regions
- The Global Offset Table (GOT) and Procedure Linkage Table (PLT)
- Exploiting heap overflows to overwrite GOT entries
- Hijacking control flow via function pointer corruption
All work below uses the ARM environment (Debian armhf VM or Raspberry Pi) configured in Part 1.
Understanding the heap
The heap is a memory region for dynamic allocation - memory requested at runtime rather than compile time. Unlike stack frames that are automatically managed by function calls and returns, heap memory must be explicitly allocated and freed by the program.
Heap vs. stack
Key differences:
| Aspect | Stack | Heap |
|---|---|---|
| Allocation | Automatic (compile-time known allocations, function calls, etc) | Manual (malloc, calloc) |
| Deallocation | Automatic (function return) | Manual (free) |
| Scope | Local to function | Global (accessible via pointers) |
| Growth direction | Downward (high → low addresses) | Upward (low → high addresses) |
| Fragmentation | No fragmentation | Can fragment with allocation/deallocation patterns |
| Overflow target | Saved return address, frame pointer | Adjacent heap chunks, metadata, function pointers |
Heap allocation in C
Functions for heap management:
malloc(size): Allocatessizebytes, returns pointer to allocated memorycalloc(count, size): Allocates and zero-initializescount * sizebytesrealloc(ptr, size): Resizes previously allocated memoryfree(ptr): Deallocates memory, returning it to the heap for reuse
Example:
int
Memory fragmentation
A common heap issue is fragmentation. Consider this allocation sequence:
char *a = ; // Allocate A
char *b = ; // Allocate B
char *c = ; // Allocate C
; // Free B (creates hole between A and C)
After freeing b, there's a 100-byte gap between a and c. If we later request 200 bytes, the allocator can't use this gap, leading to fragmentation. Different malloc implementations (dlmalloc, ptmalloc, jemalloc) handle fragmentation differently with varying performance tradeoffs.
Locating the heap in memory
To get a sense of heap placement, we can write a simple program to show us the memory addresses of variables allocated on the heap and stack:
hello.c
int
Compile and debug:
# Heap variable at: 0x403190
# Stack variable at: 0xbefff450
As mentioned, the heap starts at a much lower address (e.g., 0x00402000) than the stack (e.g., 0xbefdf000), and grows upward toward higher addresses.
Controlling execution flow via heap corruption
Stack overflows target saved return addresses conveniently located on the stack. Heap overflows require different targets. Common techniques:
- Function pointer overwriting: If structures on the heap contain function pointers, overflowing into those pointers redirects execution
- Virtual table (vtable) corruption: In some languages, we can overwrite object vtables to hijack virtual function calls
- Global Offset Table (GOT) overwriting: Corrupt GOT entries to redirect library function calls
- Heap metadata corruption: Exploit allocator metadata to achieve arbitrary writes
This post focuses on GOT overwriting, as it's powerful and demonstrates key concepts applicable to other techniques.
The Global Offset Table (GOT) and Procedure Linkage Table (PLT)
Modern programs use dynamic linking - they don't contain copies of library code (like printf, malloc, etc.). Instead, the dynamic linker (ld-linux.so) resolves library function addresses at runtime and stores them in the Global Offset Table.
Why dynamic linking?
Benefits:
- Smaller binaries: No need to include library code in each executable
- Shared libraries: Multiple programs share one copy of libc in memory
- Security updates: Updating libc fixes vulnerabilities in all programs that use it
Cost:
- Performance overhead: First call to a function requires symbol resolution
- Complexity: Additional indirection through GOT/PLT
Lazy binding optimization
To reduce startup time, dynamic linking uses lazy binding - library functions are resolved only when first called, not at program startup. This is implemented via the Procedure Linkage Table (PLT) and Global Offset Table (GOT).
The PLT contains stubs that:
- Check if the function has been resolved (by checking GOT)
- If not resolved, call the dynamic linker to resolve it
- If resolved, jump directly to the function
The GOT stores the actual addresses of resolved functions.
Dissecting heap1.c
To gain some understanding, let's trace a call to puts() through the PLT and GOT.
Take the following program:
heap1.c
int
Compile:
Examine sections:
Look for:
.pltsection: Contains PLT stubs (starts at0x000003f8in our example).gotsection: Contains GOT entries (starts at0x00002000in our example)
View relocations (GOT entries):
Output shows function offsets from GOT base:
Offset Info Type Sym.Value Sym. Name
00002014 00000616 R_ARM_JUMP_SLOT 00000000 free@GLIBC_2.4
00002018 00000716 R_ARM_JUMP_SLOT 00000000 puts@GLIBC_2.4
0000201c 00000816 R_ARM_JUMP_SLOT 00000000 malloc@GLIBC_2.4
This tells us puts is at offset 0x2018 (GOT base is 0x2000, so offset is 0x18).
Tracing GOT/PLT with GDB
Disassemble main:
()
Find the puts call (something like blx 0x430 <puts@plt>). This branches to the PLT stub.
Disassemble the PLT stub:
()
Output (example for ARM):
0x00000430 <+0>: add r12, pc, #0, 12
0x00000434 <+4>: add r12, r12, #4096 @ 0x1000
0x00000438 <+8>: ldr pc, [r12, #3040]! @ 0xbe0
These instructions calculate the GOT address for puts and load it into the program counter (PC). The calculation takes PC at execution (which is 0x438 due to ARM's 8-byte prefetching pipeline offset) + 0x1000 + 0xbe0. This matches the GOT entry we found with readelf above (0x2018)!
Next let's set a breakpoint and examine the actual GOT address. It's important to note that the GOT offset we see above is an offset from the start of our program:
()
()
()()
The GOT initially contains 0x004003f8 - an address back into the PLT. This is that "lazy binding" mechanism. On first call:
- PLT jumps to address in GOT (
0x004003f8) - That address points back to PLT code that calls the dynamic linker
- Dynamic linker resolves
putsand updates GOT with real address - Future calls to
putsjump directly to the real function
Continue execution past the puts call, then examine GOT again:
()
()
()
Now the GOT contains the real address of puts in libc (0xb6f0f339)!
Exploiting heap2.c
So what if we can overwrite a GOT entry before the function is called? This should allow us to redirect execution to our shellcode or another function. Let's try it!
This time we'll look at a deliberately vulnerable program (from exploit-exercises.com)) to demonstrate heap overflow via GOT overwrite.
heap2.c
;
void
int
Compile:
Understanding the vulnerability
Three internet structures are allocated on the heap, each containing an integer and a pointer to a separately allocated string. The strcpy invocations copy command-line arguments without bounds checking.
Memory layout after allocations:
[i1 structure: priority, name pointer]
[i1->name: 8-byte buffer]
[i2 structure: priority, name pointer]
[i2->name: 8-byte buffer]
[i3 structure: priority, ngdb ./heap2ame pointer]
[i3->name: 8-byte buffer]
If argv[1] exceeds 8 bytes, we overflow i1->name into adjacent heap memory. We can potentially overwrite:
- The
i2structure itself - Specifically, the
i2->namepointer
If we overwrite i2->name with the address of a GOT entry, the second strcpy(i2->name, argv[2]) will write argv[2] to that GOT entry!
Step 1: Finding the overflow offset
Use GDB with a pattern to determine how many bytes are needed to reach the i2->name pointer:
()
()
()()
The error reveals that i2->name was overwritten with 0x46464646, which is "FFFF" in ASCII - bytes 21-24 of our input. This means we need 20 bytes of padding followed by the 4-byte address we want to write into i2->name.
We can also calculate this from the struct layout:
i1->name buffer: 8 bytes
heap metadata: 8 bytes
i2->priority: 4 bytes
------------------------
Total to i2->name: 20 bytes
Step 2: Identifying the GOT target
We want to overwrite the GOT entry for puts (the last function called).
First, find the relocation offset using readelf:
|
# 00002020 00000916 R_ARM_JUMP_SLOT 00000000 puts@GLIBC_2.4
This shows the GOT entry is at offset 0x2020. To get the actual runtime address, we need to add the program's base address:
()
()
()
Look for the first mapping of the binary - this is the base address:
Start Addr End Addr Size Offset Perms objfile
0x400000 0x401000 0x1000 0x0 r-xp /home/user/heap2
Combining the two above, we now know that the GOT entry for puts is at 0x400000 + 0x2020 = 0x402020.
Step 3: Finding the winner() function
The program contains a winner() function that never gets called. We'll redirect puts to winner():
()
Winner is at 0x4005b4.
Since this is ARM Thumb code, we need to set the LSB to 1 for proper execution, giving us 0x4005b5.
Step 4: Crafting the exploit
Payload structure:
- First argument: 20 bytes of padding + GOT address of
puts(little-endian) - Second argument: Address of
winner()(little-endian)
The first strcpy overwrites i2->name with the GOT address.
The second strcpy writes the winner() address to that GOT entry.
Exploit:
Note the quotes around each $(printf ...), these are necessary because \x20 is a space character, and without quotes the shell would split the argument (I found this out the hard way >.>). Also, as mentioned, since this is ARM Thumb code, we set the LSB of the winner address (0x4005b5 instead of 0x4005b4) to ensure proper Thumb mode execution. The \x00 bytes complete the 4-byte addresses in little-endian format.
When executed, the program should print "and we have a winner @ [timestamp]" instead of "and that's a wrap folks!", proving we've successfully redirected puts to winner().
Step 5: Escalating to shellcode execution
Instead of calling winner(), we can redirect to shellcode stored in an environment variable. We'll use the NOP sled technique from Part 3 to make this more reliable.
First, we'll need to cheat a bit and recompile to allow an executable stack like before:
Then using the nop.py helper script from Part 3, export shellcode with a NOP sled:
The NOP sled creates a large landing zone, eliminating the need for precise address calculation. Environment variable addresses shift based on program name length and other factors - the sled compensates for this uncertainty.
Now exploit using the SHELLCODE address:
The nop.py buffer subcommand automatically:
- Finds the SHELLCODE environment variable address
- Compensates for program name length differences
- Adds an offset to land in the middle of the NOP sled
- Outputs the address in little-endian format
This redirects puts to our shellcode and then...unexpectedly FAILS. WHAT!?
It turns out that since I originally worked on this content in 2019, the security mechanisms provided by the Linux kernel have improved. Previously, setting an executable stack via -z execstack applied to all memory mappings, including the heap; but this seems to have been fixed. On ARM, this is now properly enforced at the hardware level via the XN (Execute Never) bit. I leave it as an exercise to the reader to figure out how to properly exploit this ;).
Modern heap exploitation challenges and techniques
As you can see, heap exploitation is generally more difficult than stack exploitation. In this post, we took one of the simplest examples possible, and look just how complicated it got. The reason for this is due to several reasons, for example:
- No convenient targets: Unlike stacks (return addresses), heaps don't have predictable control flow targets
- Allocator complexity: Different
mallocimplementations have different metadata layouts and corruption detection - Heap layout variability: Allocation order and sizes affect memory layout unpredictably
- Modern mitigations: Heap-specific protections (safe unlinking, malloc metadata integrity checks)
- There have been major advances here, which is why the last step for executing shellcode likely didn't work anymore. Blue team is doing a great job here!
The basic GOT overwriting via heap overflow that we covered is just one technique, modern heap exploitation goes deep and includes techniques such as:
Use-after-free (UAF): Accessing freed memory that has been reallocated. If the reallocated chunk contains attacker-controlled data, this can hijack function pointers or vtables.
Double-free: Freeing the same memory twice corrupts allocator metadata, potentially enabling arbitrary writes.
Heap spraying: Allocate many objects containing shellcode to increase likelihood of execution jumping to shellcode.
Tcache poisoning (glibc 2.26+): Corrupt tcache (per-thread cache) metadata to achieve arbitrary writes.
Fastbin dup: Exploit fastbin (small chunk cache) by creating duplicate entries, allowing overlapping allocations.
These techniques exploit allocator internals and require deep understanding of implementation details (ptmalloc2 for glibc, jemalloc, etc.). You're on your own for understanding these, I just helped dip your pinky toe into this complicated sea.
Conclusion
Heap exploitation requires understanding dynamic memory management, GOT/PLT mechanisms, and careful memory layout manipulation. We've demonstrated:
- Heap allocation basics and memory layout
- How GOT/PLT enable dynamic linking with lazy binding
- Overflowing heap buffers to corrupt adjacent chunk pointers
- Redirecting GOT entries to arbitrary functions or shellcode
The techniques shown here are foundational; understanding heap layouts, GOT mechanics, and pointer corruption provides the groundwork for more advanced exploitation. That being said, modern heap techniques will require deep research and efforts, which I leave to dear reader. I hope this post at least demonstrates that heap exploitation, while intricate, follows logical patterns once you understand the underlying memory management.
With that, this post concludes our exploration of fundamental exploitation techniques. We've progressed from basic stack overflows through position-independent shellcode to heap-based control flow hijacking. In the epilogue, we'll summarize what we've learned, discuss techniques we didn't cover, and explore paths forward for continued learning in binary exploitation and systems security.
References
- The Shellcoder's Handbook - Chris Anley, John Heasman, Felix Lindner, Gerardo Richarte
- Heap exploitation on ARM - Aditya Gupta
- LiveOverflow: How to exploit a Heap Overflow
- Protostar heap1 - exploit-exercises.com
- Understanding the heap: Global Offset Table
- Understanding the heap: Procedure Linkage Table
- Heap Memory in C Programming - Stack Overflow