Format String Bug
Reading memory off the stack
Format String is a dangerous bug that is easily exploitable. If manipulated correctly, you can leverage it to perform powerful actions such as reading from and writing to arbitrary memory locations.
Why it exists
In C, certain functions can take "format specifier" within strings. Let's look at an example:
This prints out:
So, it replaced %d
with the value, %f
with the float value and %x
with the hex representation.
This is a nice way in C of formatting strings (string concatenation is quite complicated in C). Let's try print out the same value in hex 3 times:
As expected, we get
What happens, however, if we don't have enough arguments for all the format specifiers?
Erm... what happened here?
The key here is that printf
expects as many parameters as format string specifiers, and in 32-bit it grabs these parameters from the stack. If there aren't enough parameters on the stack, it'll just grab the next values - essentially leaking values off the stack. And that's what makes it so dangerous.
How to abuse this
Surely if it's a bug in the code, the attacker can't do much, right? Well the real issue is when C code takes user-provided input and prints it out using printf
.
If we run this normally, it works at expected:
But what happens if we input format string specifieres, such as %x
?
It reads values off the stack and returns them as the developer wasn't expecting so many format string specifiers.
Choosing Offsets
To print the same value 3 times, using
Gets tedious - so, there is a better way in C.
The 1$
between tells printf to use the first parameter. However, this also means that attackers can read values an arbitrary offset from the top of the stack - say we know there is a canary at the 6th %p
- instead of sending %p %p %p %p %p %p
we can just do %6$p
. This allows us to be much more efficient.
Arbitrary Reads
In C, when you want to use a string you use a pointer to the start of the string - this is essentially a value that represents a memory address. So when you use the %s
format specifier, it's the pointer that gets passed to it. That means instead of reading a value of the stack, you read the value in the memory address it points at.
Now this is all very interesting - if you can find a value on the stack that happens to correspond to where you want to read, that is. But what if we could specify where we want to read? Well... we can.
Let's look back at the previous program and its output:
You may notice that the last two values contain the hex values of %x
. That's because we're reading the buffer. Here it's at the 4th offset - if we can write an address then point %s
at it, we can get an arbitrary write!
%p
is a pointer; generally, it returns the same as %x
just precedes it with a 0x
which makes it stand out more
As we can see, we're reading the value we inputted. Let's write a quick pwntools script that write the location of the ELF file and reads it with %s
- if all goes well, it should read the first bytes of the file, which is always \x7fELF
. Start with the basics:
Nice it works. The base address of the binary is 0x8048000
, so let's replace the 0x41424344
with that and read it with %s
:
It doesn't work.
The reason it doesn't work is that printf
stops at null bytes, and the very first character is a null byte. We have to put the format specifier first.
Let's break down the payload:
We add 4
|
because we want the address we write to fill one memory address, not half of one and half another, because that will result in reading the wrong addressThe offset is
%8$p
because the start of the buffer is generally at%6$p
. However, memory addresses are 4 bytes long each and we already have 8 bytes, so it's two memory addresses further along at%8$p
.
It still stops at the null byte, but that's not important because we get the output; the address is still written to memory, just not printed back.
Now let's replace the p
with an s
.
Of course, %s
will also stop at a null byte as strings in C are terminated with them. We have worked out, however, that the first bytes of an ELF file up to a null byte are \x7fELF\x01\x01\x01
.
Arbitrary Writes
Luckily C contains a rarely-used format specifier %n
. This specifier takes in a pointer (memory address) and writes there the number of characters written so far. If we can control the input, we can control how many characters are written an also where we write them.
Obviously, there is a small flaw - to write, say, 0x8048000
to a memory address, we would have to write that many characters - and generally buffers aren't quite that big. Luckily there are other format string specifiers for that. I fully recommend you watch this video to completely understand it, but let's jump into a basic binary.
Simple - we need to overwrite the variable auth
with the value 10. Format string vulnerability is obvious, but there's also no buffer overflow due to a secure fgets
.
Work out the location of auth
As it's a global variable, it's within the binary itself. We can check the location using readelf
to check for symbols.
Location of auth
is 0x0804c028
.
Writing the Exploit
We're lucky there's no null bytes, so there's no need to change the order.
Buffer is the 7th %p
.
And easy peasy:
Pwntools
As you can expect, pwntools has a handy feature for automating %n
format string exploits:
The offset
in this case is 7
because the 7th %p
read the buffer; the location is where you want to write it and the value is what. Note that you can add as many location-value pairs into the dictionary as you want.
You can also grab the location of the auth
symbol with pwntools:
Check out the pwntools tutorials for more cool features
Last updated