Moving on to heap exploitation does not require you to be a god at stack exploitation, but it will require a better understanding of C and how concepts such as pointers work. From time to time we will be discussing the glibc source code itself, and while this can be really overwhelming, it's incredibly good practice.
I'll do everything I can to make it as simple as possible. Most references (to start with) will be hyperlinks, so feel free to just keep the concept in mind for now; as you progress, understanding the source will become more and more important.
Occasionally different snippets of code will be from different glibc versions, and I'll do my best to note down which version they are from. The reason for this is that newer versions have a lot of protections that will obscure the basic logic of the operation, so we will start with older implementations and build up.
When we are done with a chunk's data, the chunk is freed using a function such as free(). This tells glibc that we are done with this portion of memory.
In the interest of being as efficient as possible, glibc makes a lot of effort to recycle previously-used chunks for future requests in the program. As an example, let's say we need 100 bytes to store a string input by the user. Once we are finished with it, we tell glibc we are no longer going to use it. Later in the program, we have to input another 100-byte string from the user. Why not reuse that same part of memory? There's no reason not to, right?
The bins are responsible for the bulk of this memory recycling. A bin is a (doubly- or singly-linked) list of free chunks. For efficiency, different bins are used for different sizes, and the operations on each bin differ as well to keep performance high.
When a chunk is freed, it is "moved" to the bin. This movement is not physical, but rather a pointer - a reference to the chunk - is stored somewhere in the list.
There are four types of bins: fastbins, the unsorted bin, small bins and large bins.
When a chunk is freed, the function that does the bulk of the work in glibc is _int_free(). I won't delve into the source code right now, but will provide hyperlinks to glibc 2.3, a very old version without security checks. You should have a go at familiarising yourself with what the code says, but bear in mind things have been moved about a bit to get to where they are in the present day! You can change the version on the left in bootlin to see how it's changed.
First, the size of the chunk is checked. If it is less than the largest fastbin size, the chunk is added to the correct fastbin
Otherwise, if the chunk is mmapped, it is munmapped
Finally, the chunk is consolidated with any free neighbours and put into the unsorted bin
What is consolidation? We'll be looking into this more concretely later, but it's essentially the process of finding other free chunks around the chunk being freed and combining them into one large chunk. This makes the reuse process more efficient.
Fastbins store small-sized chunks. There are 10 of these for chunks of size 16, 24, 32, 40, 48, 56, 64, 72, 80 or 88 bytes including metadata.
There is only one of these. When small and large chunks are freed, they end up in this bin to speed up allocation and deallocation requests.
Essentially, this bin gives the chunks one last shot at being used. Future malloc requests, if smaller than a chunk currently in the bin, split up that chunk into two pieces and return one of them, speeding up the process - this is the Last Remainder Chunk. If the chunk requested is larger, then the chunks in this bin get moved to the respective Small/Large bins.
There are 62 small bins of sizes 16, 24, ... , 504 bytes and, like fastbins, chunks of the same size are stored in the same bin. Small bins are doubly-linked, and allocation and deallocation are FIFO.
The purpose of the FD and BK pointers, as we saw before, is to point to the chunks ahead and behind in the bin.
Before ending up in the unsorted bin, contiguous small chunks (small chunks next to each other in memory) can coalesce (consolidate), meaning their sizes combine and become a bigger chunk.
There are 63 large bins, which can store chunks of different sizes. The free chunks are ordered in decreasing order of size, meaning insertions and deletions can occur at any point in the list.
The first 32 bins have a range of 64 bytes:
Like small chunks, large chunks can coalesce together before ending up in the unsorted bin.
Each bin is represented by two values, the HEAD and TAIL. As it sounds, HEAD is at the top and TAIL at the bottom. Most insertions happen at the HEAD, so in LIFO structures (such as the fastbins) reallocation occurs there too, whereas in FIFO structures (such as small bins) reallocation occurs at the TAIL. For fastbins, the TAIL is null.
Unlike the stack, the heap is an area of memory that can be dynamically allocated. This means that when you need new space, you can "request" more from the heap.
In C, this often means using functions such as malloc() to request the space. However, heap operations are comparatively slow and heap data can take up a lot of space. This means the developer has to tell libc when the heap data is "finished with", and it does this via calls to free(), which mark the area as available. But where there are humans there will be implementation flaws, and no amount of protection will ever ensure code is completely safe.
In the following sections, we will only discuss 64-bit systems (with the exception of some parts that were written long ago). The theory is the same, but pretty much any heap challenge (or real-world application) will be on 64-bit systems.
Fastbins are a singly-linked list of chunks. The point of these is that very small chunks are reused quickly and efficiently. To aid this, chunks of fastbin size do not consolidate (they are not absorbed into surrounding free chunks once freed).
A fastbin is a LIFO (Last-In-First-Out) structure, which means the last chunk to be added to the bin is the first chunk to come out of it. Glibc only keeps track of the HEAD, which points to the first chunk in the list (and is set to 0 if the fastbin is empty). Every chunk in the fastbin has an fd pointer, which points to the next chunk in the bin (or is 0 if it is the last chunk).
When a new chunk is freed, it's added at the front of the list (making it the head):
The fd of the newly-freed chunk is overwritten to point at the old head of the list
HEAD is updated to point to this new chunk, setting the new chunk as the head of the list
Let's have a visual demonstration (it will help)! Try out the following C program:
We get:
As you can see, the chunk a gets reassigned to chunk f, b to e and c to d. So, if we free() a chunk, there's a good chance our next malloc() - if it's of the same size - will use the same chunk.
It can be really confusing as to why we add and remove chunks from the start of the list (why not the end?), but it's really just the most efficient way to add an element. Let's say we have this fastbin setup:
In this case HEAD points to a, and a points onwards to b as the next chunk in the bin (because the fd field of a points to b). Now let's say we free another chunk c. If we want to add it to the end of the list like so:
We would have to update the fd pointer of b to point at c. But remember that glibc only keeps track of the first chunk in the list - it only has the HEAD stored. It has no information about the end of the list, which could be many chunks long. This means that to add c in at the end, it would first have to start at the head and traverse the entire list until it got to the last chunk, then overwrite the fd field of the last chunk to point at c and make c the last chunk.
Meanwhile, if it adds at the HEAD:
All we need to do is:
Set the fd of c to point at a
This is easy, as a was the old head, so glibc had a pointer to it stored already
HEAD is then updated to c, making it the head of the list
This is also easy, as the pointer to c is freely available
This has much less overhead!
For reallocating the chunk, the same principle applies - it's much easier to update HEAD to point to a by reading the fd of c than it is to traverse the entire list until it gets to the end.
Internally, every chunk - whether allocated or free - is stored in a structure. The difference is how the memory space is used.
When space is allocated from the heap using a function such as malloc(), a pointer to a heap address is returned. Every chunk has additional metadata that it has to store in both its used and free states.
The chunk has two sections - the metadata of the chunk (information about the chunk) and the user data, where the data is actually stored.
The size field is the overall size of the chunk, including metadata. It must be a multiple of 8 (in fact, a multiple of 16 on 64-bit systems), meaning the last 3 bits of size are always 0. This allows the flags A, M and P to take up that space, with A being the 3rd-last bit of size, M the 2nd-last and P the last.
The flags have special uses:
Free chunks have additional metadata to handle the linking between them.
In Consolidating Fastbins, I said that chunks that went to the unsorted bin would consolidate, but fastbins would not. This is technically not true - they just don't consolidate automatically; in order for them to consolidate, malloc_consolidate() has to be called. This function looks complicated, but it essentially just grabs all adjacent fastbin chunks and combines them into larger chunks, placing them in the unsorted bin.
Why do we care? Well, UAFs and the like are very nice to have, but a Read-After-Free on a fastbin chunk can only ever leak you a heap address, as the singly-linked lists only use the fd pointer, which points to another chunk (on the heap) or is NULL. We want to get a libc leak as well!
If we free enough adjacent fastbin chunks at once and trigger a call to malloc_consolidate(), they will consolidate to create a chunk that goes to the unsorted bin. The unsorted bin is doubly-linked, and acts accordingly - if a chunk is the only element in the list, both its fd and bk will point to a location in malloc_state, which is contained within libc.
This means that the more important thing for us to know is how we can trigger malloc_consolidate().
Some of the most important ways include:
Inputting a very long number into scanf (around 0x400 characters long). This works because the code responsible for it manages a scratch_buffer and assigns it 0x400 bytes, but uses malloc when the data is too big (along with realloc if it gets even bigger than the heap chunk, and free at the end - so it works to trigger those functions too, great for triggering hooks!).
Inputting something along the lines of %10000c into a format string vulnerability also triggers a chunk to be created.
Both of these work because a largebin allocation triggers malloc_consolidate. By checking the calls to the function in glibc 2.35, we can find other triggers.
It's possible for earlier or later glibc versions to have a greater or lesser number of calls to a specific function, so make sure to check for your version! You may find another way exists.
The most common and most important trigger: a call to malloc() requesting a chunk of largebin size will trigger a call to malloc_consolidate().
There is another call to it in the use_top section of _int_malloc(). This section is reached when the top chunk has to be used to service the request. The code checks whether the top chunk is large enough to service the request:
So, by filling the heap and requesting another chunk, we can trigger a call to malloc_consolidate().
(If both conditions fail, _int_malloc falls back to essentially using mmap to service the request.)
TODO
P is the PREV_INUSE flag, which is set when the previous adjacent chunk (the chunk ahead) is in use
M is the IS_MMAPPED flag, which is set when the chunk is allocated via mmap() rather than a heap mechanism such as malloc()
A is the NON_MAIN_ARENA flag, which is set when the chunk is not located in main_arena; we will get to arenas in a later section, but in essence every created thread is provided a different arena (up to a limit), and chunks in these arenas have the A bit set
prev_size is set if the previous adjacent chunk is free, as calculated by P being 0. If it is not free, the heap saves space and prev_size is part of the previous chunk's user data. If it is free, then prev_size stores the size of the previous chunk.
This can be seen in the struct:
If not, it checks whether there are fastchunks in the arena. If there are, it calls malloc_consolidate to attempt to regain space to service the request!
Calling malloc_trim() will also consolidate fastbins (which makes sense, given the name). Unlikely to ever be useful, but please do let me know if you find a use for it!
When changing malloc options using mallopt, malloc_consolidate() is called first. This is pretty useless, as mallopt is likely called once (if at all) in the program prelude, before the program does anything.
When a non-fast chunk is freed, it gets put into the Unsorted Bin. When new chunks are requested, glibc looks through the bins:
If the requested size is fastbin size, check the corresponding fastbin
If there is a chunk in it, return it
If the requested chunk is of smallbin size, check the corresponding smallbin
If there is a chunk in it, return it
If the requested chunk is large (of largebin size), we first consolidate the fastbins with malloc_consolidate(). We will get into the mechanisms of this at a later point, but essentially I lied earlier - fastbins do consolidate, but not on freeing!
Finally, we iterate through the chunks in the unsorted bin
If it is empty, we service the request through making the heap larger by moving the top chunk back and making space
If the requested size is equal to the size of the chunk in the bin, return the chunk
If it's smaller, split the chunk in the bin in two and return a portion of the correct size
If it's larger, the chunk is sorted into the appropriate small or large bin, and the search continues
One thing that is very easy to forget is what happens on allocation and what happens on freeing, as it can be a bit counter-intuitive. For example, the fastbin consolidation is triggered from an allocation!
Heap Overflow, much like a Stack Overflow, involves too much data being written to the heap. This can result in us overwriting data, most importantly pointers. Overwriting these pointers can cause user input to be copied to different locations if the program blindly trusts data on the heap.
To introduce this (it's easier to understand with an example) I will use two vulnerable binaries from Protostar.
It wouldn't be fun if there were no protections, right?
Using Xenial Xerus, try running:
Notice that it throws an error.
Is the chunk at the top of the bin the same as the chunk being inserted?
For example, the following code still works:
When removing the chunk from a fastbin, make sure the size falls into the fastbin's range
The previous protection could be bypassed by freeing another chunk in between the double-free and just doing a bit more work that way, but then you fall into this trap.
Namely, if you overwrite fd with something like 0x08041234, you have to make sure the metadata fits - i.e. the size field ahead of the data is completely correct - and that makes it harder, because you can't just write into the GOT, unless you get lucky.
Much like the name suggests, this technique involves us using data once it has been freed. The weakness here is that programmers often wrongly assume that once a chunk is freed it cannot be used, and don't bother writing checks to ensure the data is not accessed again. This means it is possible to write data to a free chunk, which is very dangerous.
TODO: binary
Reintroducing double-frees
Tcache poisoning is a fancy name for controlling the next pointer of tcache chunks - here, achieved via a double-free.
http://exploit.education/phoenix/heap-zero/
Luckily it gives us the source:
So let's analyse what it does:
Allocates two chunks on the heap
Sets the fp variable of chunk f to the address of nowinner
Copies the first command-line argument to the name variable of chunk d
Runs whatever the fp variable of f points at
The weakness here is clear - it calls a function pointer stored on the heap. Our input is copied there after the value is set and there's no bounds checking whatsoever, so we can overrun it easily.
Let's check out the heap in normal conditions.
We'll break right after the strcpy and see how it looks.
If we want, we can check the contents.
So, we can see that the function address is there, after our input in memory. Let's work out the offset.
Since we want to work out how many characters we need until the pointer, I'll just use a De Bruijn Sequence.
Let's break on and after the strcpy. That way we can check the location of the pointer, then immediately read it and calculate the offset.
So, the chunk with the pointer is located at 0x2493060. Let's continue until the next breakpoint.
radare2 is nice enough to tell us we corrupted the data. Let's analyse the chunk again.
Notice we overwrote the size field, so the chunk is much bigger. But now we can easily use the first value to work out the offset (we could also, knowing the location, have done pxq @ 0x02493060).
So, fairly simple - 80 characters, then the address of winner.
We need to remove the null bytes because argv doesn't allow them.
http://exploit.education/phoenix/heap-one/
This program:
Allocates a chunk on the heap for the heapStructure
Allocates another chunk on the heap for the name of that heapStructure
Repeats the process with another heapStructure
Copies the two command-line arguments to the name variables of the heapStructures
Prints something
Let's break on and after the first strcpy.
As we expected, we have two pairs of heapStructure and name chunks. We know the strcpy will be copying into wherever name points, so let's read the contents of the first heapStructure. Maybe this will give us a clue.
Look! The name pointer points to the name chunk! You can see the value 0x602030 being stored.
This isn't particularly a revelation in itself - after all, we knew there was a pointer in the chunk. But now we're certain, and we can definitely overwrite this pointer due to the lack of bounds checking. And because we can also control the value being written, this essentially gives us an arbitrary write!
And where better to target than the GOT?
The plan, therefore, becomes:
Pad until the location of the pointer
Overwrite the pointer with the GOT address of a function
Set the second parameter to the address of winner
Next time the function is called, it will call winner
But what function should we overwrite? The only function called after the strcpy is printf, according to the source code. And if we overwrite printf with winner it'll just recursively call itself forever.
Luckily, compilers like gcc compile printf as puts when there are no format arguments - we can see this with radare2:
So we can simply overwrite the GOT address of puts with winner. All we need to find now is the padding until the pointer, and then we're good to go.
Break on and after the strcpy again and analyse the second chunk's name pointer.
The pointer is originally at 0x8d9050; once the strcpy occurs, the value there is 0x41415041414f4141.
The offset is 40.
Again, null bytes aren't allowed in parameters so you have to remove them.
A double-free can take a bit of time to understand, but ultimately it is very simple.
Firstly, remember that for fast chunks in the fastbin, the location of the next chunk in the bin is specified by the fd pointer. This means that if chunk a points to chunk b, once chunk a is freed, the next chunk in the bin is chunk b.
In a double-free, we attempt to control fd. By overwriting it with an arbitrary memory address, we can tell malloc() where the next chunk is to be allocated. For example, say we overwrote a->fd to point at 0x12345678; once a is free, the next chunk on the list will be 0x12345678.
As it sounds, we have to free the chunk twice. But how does that help?
Let's watch the progress of the fastbin if we free an arbitrary chunk a twice:
Fairly logical.
But what happens if we called malloc() again for the same size?

Well, strange things would happen. a is both allocated (in the form of b) and free at the same time.
If you remember, the heap attempts to save as much space as possible, and when a chunk is free the fd pointer is written where the user data used to be.
But what does this mean?
When we write into the user data of b, we're writing into the fd of a at the same time.
And remember - controlling fd means we can control where the next chunk gets allocated! So we can write an address into the data of b, and that's where the next chunk gets placed.
Now, the next alloc will return a again. This doesn't matter - we want the one afterwards.
Boom - an arbitrary write.
We're still on Xenial Xerus, meaning both of the checks mentioned previously are still relevant. The bypass for the second check (malloc() memory corruption) is given to you in the form of fake metadata already set to a suitable size. Let's check the relevant parts of the source.
The fakemetadata variable is the fake size of 0x30, so you can focus on the double-free itself rather than the protection bypass. Directly after this is the admin variable, meaning that if you pull the exploit off and land on the fake metadata, you can just overwrite admin as proof.
users is a list of strings for the usernames, and userCount keeps track of the length of the array.
Prompts for input, takes in input. Note that main() itself prints out the location of fakemetadata, so we don't have to mess around with that at all.
createUser() allocates a chunk of size 0x20 on the heap (the real size is 0x30 including metadata, hence fakemetadata being 0x30), then sets the array entry as a pointer to that chunk. Input then gets written there.
Gets the index, prints out the details and free()s the chunk. Easy peasy.
Checks whether you overwrote admin with admin; if you did, mission accomplished!
There are literally no checks in place, so we have a plethora of options available - but this tutorial is about using a double-free, so that's what we'll use.
First let's make a skeleton of a script, along with some helper functions:
As we know from the fasttop protection, we can't allocate once then free twice - we'll have to free another chunk in between.
Let's check the progression of the fastbin by adding a pause() after every delete(). We'll hook on with radare2 using
Due to its size, the chunk will go into Fastbin 2, which we can check the contents of using dmhf 2 (dmhf analyses fastbins, and we can specify number 2).
Looks like the first chunk is located at 0xd58000. Let's keep going.
The next chunk (Chunk 1) has been added to the top of the fastbin; this chunk is located at 0xd58030.
Boom - we free Chunk 0 again, adding it to the fastbin for the second time. radare2 is nice enough to point out there's a double-free.
Now that we have a double-free, let's allocate Chunk 0 again and put in some random data. Because it's also considered free, the data we write is seen as being in the fd pointer of the chunk. Remember, the heap saves space, so fd when free is located exactly where the data is when allocated (probably explained better here).
So let's write to fd, and see what happens to the fastbin. Remove all the pause() instructions.
Run, debug, and dmhf 2.
The chunk from the last free() gets reused, and our "fake" fastbin location is in the list. Beautiful.
Let's push it to the top of the list by creating two more irrelevant users. We can also parse the fakemetadata location at the beginning of the exploit chain.
The reason we have to subtract 8 from fakemetadata is that the only thing we faked in the source is the size field, but prev_size is at the very front of the chunk metadata. If we point the fastbin freelist at the fakemetadata variable, it'll interpret it as prev_size and the 8 bytes afterwards as size, so we shift it all back 8 bytes to align it correctly.
Now we can control where we write, and we know where to write to.
First, let's replace the location we write to with where we want to:
Now let's finish it off by creating another user. Since we control the fastbin, this user gets written to the location of our fake metadata, giving us an almost arbitrary write.
The 8 null bytes are padding. If you read the source, you'll notice the metadata string is 16 bytes long rather than 8, so we need 8 more bytes of padding.
Awesome - we completed the level!
Mixing it up a bit - you can try the 32-bit version yourself. Same principle, offsets a bit different and stuff. I'll upload the binary when I can, but just compile it as 32-bit and try it yourself :)
When a chunk is removed from a bin, unlink() is called on the chunk. The unlink macro looks like this:
Note how values are written to locations derived from fd and bk - if we control both fd and bk, we can get an arbitrary write.
Consider the following example:
We want to write the value 0x1000000c to 0x5655578c. If we had the ability to create a fake free chunk, we could choose the values for fd and bk. In this example, we would set fd to 0x56555780 (bear in mind the first 0x8 bytes in 32-bit would be for the metadata, so P->fd is actually 8 bytes off P and P->bk is 12 bytes off) and bk to 0x10000000. Then when we unlink() this fake chunk, the process is as follows:
This may seem like a lot to take in - it's a lot of seemingly random numbers. What you need to understand is that P->fd just means 8 bytes off P and P->bk just means 12 bytes off P.
If you imagine the chunk looking like this:

Then the fd and bk pointers point at the start of the chunk - prev_size. So when overwriting the fd pointer here:
FD points to 0x56555780, and then 0xc gets added on for bk, making the write actually occur at 0x5655578c, which is what we wanted. That is why we fake fd and bk values lower than the actual intended write location.
In 64-bit, all the chunk data fields take up 0x8 bytes each, so the offsets for fd and bk will be 0x10 and 0x18 respectively.
The slight issue with the unlink exploit is that not only does fd get written where you want - bk gets written as well, and if the location you are writing either of these to is protected memory, the binary will crash.
More modern libc versions have a different version of the unlink macro, which looks like this:
Here unlink() checks the bk pointer of the forward chunk and the fd pointer of the backward chunk, making sure they point to P, which is unlikely if you fake a chunk. This quite significantly restricts where we can write using unlink.
A primitive double-free protection
Starting from glibc 2.29, the tcache was hardened by the addition of a second field in the tcache_entry struct, the key:
It's a pointer to a tcache_perthread_struct. In tcache_put(), we can see what key is set to:
The chunk being freed is the variable e. We can see here that before tcache_put() is called on it, a check is done:
The check determines whether the key field of the chunk e is already set to the address of the tcache_perthread_struct. Remember that this happens when the chunk is put into the tcache with tcache_put()! If the pointer is already there, there is a very high chance the chunk has already been freed - in which case, it's a double-free!
It's not a 100% guaranteed double-free though - as the comment above it says:
This test succeeds on double free. However, we don't 100% trust it (it also matches random payload data at a 1 in 2^<size_t> chance), so verify it's not an unlikely coincidence before aborting.
There is a 1/2^<size_t> chance that the key being tcache_perthread_struct already is a coincidence. To verify, glibc simply iterates through the tcache bin and compares the chunks to the one being freed:
It iterates through each entry, calling it tmp, and compares it to e. If they are equal, a double-free was detected.
You can think of the key as an effectively random value (due to ASLR) that gets checked against, and if it's the correct value then something is suspicious.
In fact, the key can even be helpful for us - the fd pointer of the tcache chunk is mangled, so a UAF does not guarantee a heap leak. The key field is not mangled, so if we can leak it instead, we get the location of tcache_perthread_struct - a heap leak, as it is always located at heap_base + 0x10.
The value of tcache_key does not really have to be a cryptographically secure random number. It only needs to be arbitrary enough so that it does not collide with values present in applications. [...]
This isn't a huge change - it's still only straight double-frees that are affected. We can no longer leak the heap via the key, however.
Starting from glibc 2.32, a new Safe-Linking mechanism was implemented to protect the singly-linked lists (the fastbins and tcachebins). The theory is to protect the fd pointer of free chunks in these bins with a mangling operation, making it more difficult to overwrite it with an arbitrary value.
Every single fd pointer is protected by PROTECT_PTR, which is undone by REVEAL_PTR:
Here, pos is the location of the current chunk and ptr the location of the chunk we are pointing to (which is NULL if the chunk is the last in the bin). Once again, ASLR is being used as protection! The >>12 gets rid of the predictable last 12 bits of ASLR, keeping only the random upper 52 bits (effectively 28, really, as the uppermost bits are pretty predictable):
So, what does this mean to an attacker?
Again, heap leaks are key. If we get a heap leak, we know both parts of the XOR in PROTECT_PTR, and we can easily recreate it to fake our own mangled pointer.
When trying to get a chunk e out of the tcache, alignment is checked.
The macros are defined side-by-side, but really aligned_OK is for addresses while misaligned_chunk is for chunks.
This alignment check means you would have to guess bits of entropy when brute-forcing a mangled pointer, as only 1 in 16 guesses will even produce a correctly-aligned chunk.
New and efficient heap management
Starting in glibc 2.26, a new heap feature called the tcache was released. The tcache was designed to be a performance booster, and the operation is very simple: every chunk size (up to size 0x410) has its own tcache bin, which can store up to 7 chunks. When a chunk of a specific size is allocated, the tcache bin is searched first. When it is freed, the chunk is added to the tcache bin; if it is full, it then goes to the standard fastbin/unsorted bin.
The tcache bin acts like a fastbin - it is a singly-linked list of free chunks of a specific size. The handling of the list, using fd pointers, is identical. As you can expect, the attacks on the tcache are also similar to the attacks on fastbins.
Ironically, years of defenses that were implemented into the fastbins - such as the double-free check - were ignored in the initial implementation of the tcache. This means that using the heap to attack a binary running under glibc 2.27 is easier than one running under 2.25!
When a chunk is freed and tcache_put() is called on it, the key field is set to the location of the tcache_perthread_struct. Why is this relevant? Let's check _int_free():
So, what can we do against this? Well, this protection doesn't affect us that much - it stops a simple double-free, but if we have any kind of UAF primitive we can easily overwrite e->key. Even with a single byte, we still have a 255/256 chance of overwriting it to something that doesn't match key. Creating fake tcache chunks doesn't matter either, as even in the latest glibc version there is no equivalent check on them, meaning tcache poisoning is still doable.
In glibc 2.34, the key field was changed. Instead of tcache_put() setting key to the location of the tcache_perthread_struct, it sets it to tcache_key:
Note the as well!
What is tcache_key? It's defined and set directly below, in the tcache_key_initialize() function:
It attempts to call __getrandom(), which is defined as a stub and, for Linux, just uses a syscall to read n random bytes. If that fails for some reason, it calls random_bits() instead, which generates a pseudo-random number seeded by the time. Long story short: tcache_key is random. The check is the same as before, and the operation is the same - it's just completely random rather than based on ASLR. As the comment above it says:
It's a very rudimentary protection - we use the current location and the location we point to in order to mangle it. From a programming standpoint, it has virtually no overhead or performance impact. PROTECT_PTR has been implemented in several locations, including two in _int_free() (for fastbins). You can find REVEAL_PTR used as well.
It might be tempting to say that a partial overwrite is still possible, but there is a new security check that comes along with this Safe-Linking mechanism: the alignment check. This check ensures that chunks are 16-byte aligned and is only relevant to singly-linked lists (like all of Safe-Linking). A quick Ctrl-F for unaligned in malloc.c will bring up plenty of different locations. The most important ones for us as attackers are probably the one in tcache_get() and the ones in _int_malloc().
There are three checks here. First in REMOVE_FB, the macro for removing a chunk from a fastbin:
Once on the victim chunk returned from the fastbin:
And lastly on every fastbin chunk while the remaining chunks are stashed into the tcache:
_int_free() checks the alignment if the tcache_entry's key is already set to the value it's meant to be and the whole double-free iteration check has to be done:
When all the fastbins are consolidated into the unsorted bin, they are checked for alignment:
Fastbin chunks are also checked for alignment in a handful of other functions that are not super important for attackers.
You may notice some of them use aligned_OK while others use misaligned_chunk.
MALLOC_ALIGN_MASK is defined as such:
MALLOC_ALIGNMENT is defined for i386 as 16. In binary that's 10000, so MALLOC_ALIGN_MASK is 1111, meaning the last 4 bits of the address are checked. This results in 16-byte alignment, as expected.