Reversing a Go stripped binary

Table of Contents

"You are a brand new member of the Gophers team wohoo! But before you become an official member, you need to check the Gopher Archive and see if you know someone in there so that you can talk to them. One small issue: You weren’t given the password to that archive, just the archive itself. It shouldn’t be a problem for you to find the password to that archive thanks to your Gopher reverse engineering talents!"

I chose to write a long blog post about this challenge as it is a Go binary (Download the file), which isn't faced a lot in CTFs, and it is also stripped - which makes it harder. With that explained and the description of the challenge above, let's get right into reversing it

File Analysis

When first downloading the file the main idea is to check what kind of file we are facing:

krypton@spacelab:~$ file gopher-archive
gopher-archive: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=DZY5-NZWffdx_11K_RuZ/ecYcwlsBPPJC4gB4vUTq/R6IjWjgUnvavmgqByNvV/V2BcxgTPHcU3p7JRtY_J, stripped

As we can see, it is a 'normal' linux executable file. One special thing is that we can see it's a Go build, so the file is a Go binary. We can also see that the binary is stripped, meaning that it will most likely be horrible to read the assembly code and recognize what the binary does.

When executing the binary we have a normal welcome message, with a Gopher ASCII art. It is asking for a password to access the archive. We can try some overflow but that of course does not work, and in the end it's a reversing challenge; not a pwn challenge

First look at disassembled program

Functions

Let's open up the binary in a disassembler; as expected it's kind of a mess due to the binary being stripped. We see lots of sub_XXXXXX function names and none of them directly seem to look like the main function of the binary. We however see a _start function but once again, that doesn't look like it's the main function.

Strings

In Go binaries strings are handled differently than C binaries, a pointer to the string is first loaded and then the length we need is moved in a register, as per Go's source code:

type stringStruct struct {
    str unsafe.Pointer
    len int
}

Looking at the strings, when searching for the string that appears when the password entered was incorrect we get the following:

810668945312588817841970012523233890533447265625[  @f | @))    |  | @))   l  0 _/  \nSorry, that password is incorrect!

Our string is inside that one, but it's definitely not used only at the place where we print the error message. Looking at the cross references of that string:

Code references, where it is used, of the long string

We see that it's used at only 4 places. We can now use GDB to place a breakpoint on each of the calls and see if it reaches the breakpoint before printing the welcome message.

Debugging

Finding the main function

Once we reach a breakpoint before the welcome message gets printed, we can simply use ni to get to the next instruction and hold Enter pressed until we see the welcome message with user prompt, here we must have called the main function. And indeed, before printing the welome message and asking for user input we had two important instructions:

0x4347c2    mov     rax, qword [rel data_4aa980]  {sub_489920}
0x4347c9    lea     rdx, [rel data_4aa980] ; Less important
0x4347d0    call    rax  {sub_489920}

We make a call to $rax which has the value of the address of the sub_489920 function. With that information, we've found the main function!

Disassembling the main function

Understanding the logic

Once we've found the main function we need to understand how the password checking system works. Now we can go inside the main function (at 0x489920) and look at the assembly code there.

We first see lots of calls being made one after the other to always the same function. The graph view confirms that there are no checks being made and the functions are just called with no specific behavior. We have a Gopher ASCII art printed when running the file, we can safely consider these calls as being responsible for printing this text. So we can rename the function sub_484100 to print and not look at its content, as it's not what we care about.

Once we come at the end of that list of calls, we see two calls followed by a condition. If the condition succeeded we go and print a single time. If the condition failed we go and print twice. So out of the error message we can say that left is the error message and right the success message. Now we just have to look at the functions sub_44afc0 and sub_48a3c0 to see what they do. One, or maybe both, will be the password check.

Debugging the two unknown functions

Second unknown function

Let's put a breakpoint on the first function, sub_44afc0. We don't hit it before having to put user input, so it's not responsible of asking for user input. But let's continue the debugging and use ni to get to the next function call. After calling the function we can look at the content of $al to see if the condition will pass or fail.

gef➤  p $al
$1 = 0x0

So the condition will succeed (0x0 & 0x0 = 0x0) and we will jump to 0x489df1. We can see that the function sub_48a3c0 will check if the password is valid, so let's rename it to isPasswordValid.

First unknown function

Before jumping in the isPasswordValid function we need to find where the user input is asked, just to gain more debugging experience

We have confirmed that it's not the function above the password checking function, so let's break one function call higher - at 0x489d2d. We run the program, then use ni and now we get the user input. After giving it we see our user input in the $rbx register.

gef➤  x/s $rbx
0xc0000b2000:   "1337test\n"

So our previously unknown function is, for our reversing task, kind of useless. But we can go on and rename the function above, sub_469900, to handleUserInput.

Disassembling the isPasswordValid function

String splitting

When first looking at the function we see a call to sub_468740 and before that we load the string "-./05:<=?CLMPSU..." in the $rcx register. As per Go's way of handling strings, we see the length of the string we will need for the next call getting moved in the $edi register:

0x48a3dd    lea     rcx, [rel data_4a1bdd]  {"-./05:<=?CLMPSUZ[]`hms{} + @ P […"}
0x48a3e4    mov     edi, 0x1

So we can see that the string used will be just "-". After that the call to the function is made and we can see that there is a comparison:

0x48a3f7    cmp     rbx, 0x2

Our string is getting split with "-" and checked if there are two strings at the end! We can confirm this supposition by using the debugger:

$rax   : 0x0000c0000b6040  →  0x0000c0000bc000  →  "hello-world-123-456"
$rbx   : 0x4

Here we have 4 elements after the string is getting split. We can rename the function sub_468740 to splitString.

Strings length

Continuing witht the assembly code, we have previously seen that it's doing a check if the elements are two, so we know our password will be something like XYZ-ZYX.

Looking at the next conditions, we see two checks made with 0x6, we can assume each part of the password will need to have a length of 6. So something like 123456-123456. We can confirm this with the debugger but I feel like I don't need to show how it's being done.

Moving forward we can see two functions being called; sub_48a0e0 and sub_48a2a0. And as always their return value in $al is being checked. Considering we have two strings part, we can guess the first call will check the first part of the string and the second call will check the second part of the string.

Debugging first string validation function

Disassembling the rough logic

The first function, sub_48a0e0, is responsible for validating the first part of the password, we can rename it. Looking at it with the disassembler we don't get a lot of information because there are lots of calls being made and at the end a check with a 0x28 long string - "df4c2d865b38db152e7f6bb4ad2f325de9570185".

0x48a222    lea     rbx, [rel data_4a84fe]  {"df4c2d865b38db152e7f6bb4ad2f325d…"}
0x48a229    mov     ecx, 0x28
0x48a22e    call    compare
0x48a233    test    al, al
0x48a235    je      0x48a248

I renamed the function sub_402940 to compare as it's pretty obvious by looking at the disassembled code of it. Now let's jump into a debugging session to see what this function does.

String getting lowered

When breaking at 0x48a0e0 and going one instruction after the other, we can see that after calling the function sub_468fc0 our first part of the string got lowered:

$rax   : 0x0000c000138010  →  0x00616161616161 ("abcdef"?)

So we can now rename this function, we don't need to dig deep into how that function works.

String getting reversed

After continuing we see nothing really big going on until we face a call at 0x48a1a9 on the function sub_44b240 that will pass the "abcdef" string as an argument.

Call to an unknown function with 'abcdef' as argument

Looking at the results after the call, we see in $rax register "fedcban:/bin:/snap/bin:/". Do you see something special? Yes! We can see that our abcdef string got reversed to fedcba - remember how strings are handled. Another function to rename :D

Loop over characters in string

If we continue with the next instructions we see we stumble into a loop, it can also be seen in the graph view:

Looking at it closer we see that we first give a value to the $edx register to be 0x1ca3 (7331). After that we jump to 0x48a1d2 and compare if $rbx is equals to $rcx if it is, then we jump to 0x48a213 and exit the loop. Otherwise we continue and load into the $rdi register $rcx+0x1, this is a counter. Considering $rbx contains the length of the string, we can say this loop goes over every character of the string.

Now if we look at what exactly the loop does, we see the following assembly code:

0x48a1c2    movsxd  rsi, esi
0x48a1c5    imul    rcx, rsi
0x48a1c9    add     rsi, rcx
0x48a1cc    add     rdx, rsi
0x48a1cf    mov     rcx, rdi ; Not really useful
0x48a1d2    mov     qword [rsp+0x28 {var_60_1}], rdx

Here is the logic explained when using ABCDEF-GHIJKL as password and i being 1 which means we are iterating over e (remember, it's reversed):

$esi contains the hex value of the current character and it is being moved to $rsi ($rsi : 0x65)
We multiply $rsi with $rcx which contains the current value of i (0x65*0x1 is stored in $rcx)
We add to $rsi the result of the multiplication (0x65+0x65 is stored in $rsi)
We add to $rdx the value of $rsi (0x1d09+0xca is stored in $rdx)
This resulting value is then stored in $rsp+0x28
We check if $rsp+0x28 is equals to 0x256c more below after calling another unknown function

So we can resume this loop as being like the following:

x = 0x1ca3
while (True):
    if i >= len(s):
        break
    x += ord(s[i]) + (i * ord(s[i]))
    i += 0x1
if x == 0x256c:
    # Do something

Hashing function

When two functions are called one after the other, the return value of the first function is passed as an argument to the second automatically, you will see this in the incoming steps.

As previously said, there is another unknown function going on (sub_489e60). Right after calling it, it is being checked if the length of the resulting string is 0x28 which is a 40, the only hashing algorithm I know that has a resulting length of 40 is SHA1. So let's try to hash fedcba with SHA1.

The string coming out of the function is "82639466494baa873844ae6cdc593cd54b5c054e" and our input SHA1 hashed is "122fe41b518797f9474d5f6f4665e411c449512c". That doesn't look like it's the same, maybe we should go deeper in the function, because Google confirmed SHA1 has a length of 40 characters. When putting a breakpoint on a function call we can use si to jump in the function itself. We enter a loop once again, this time a new string is getting generated by the loop:

$rax   : 0x0000c0001b0000  →  "8df67bd6d5e123d52a9a"
$rax   : 0x0000c0001b0000  →  "8df67bd6d5e123d52a9a9"
...
$rax   : 0x0000c0001b0000  →  "8df67bd6d5e123d52a9a966bb31cf719"

If we continue with the loop we see a new function called with string above as argument. Looking at the generated string, it's the SHA1 value above starting with 826. So what is happening during the first part? The resulting string has a length of 32 characters, this is MD5. So basically what is happening is result = SHA1(MD5(string)). We can confirm it easily by doing it ourselves and doing a SHA1 hash of "fedcba" and then make a MD5 hash of the resulting SHA1 hash.

Phew! That's it for this hashing function, we can rename it to something like md5sha1.

Comparison

At the end we check if the resulting string is equals to the value in $rbx and take 0x28 characters of that string.

$rbx   : 0x000000004a84fe  →  "df4c2d865b38db152e7f6bb4ad2f325de9570185failed to [...]"
$rcx   : 0x28

So we compare if MD5(SHA1(string)) is equals to "df4c2d865b38db152e7f6bb4ad2f325de9570185".

Finally! The first check is reverse engineered.

Debugging second string validation function

Disassembling the logic

The second check function is sub_48a2a0, so let's rename it.

Looking at it we have some similar pattern to the previous one:

0x48a2c0    call    lower
; ...
0x48a360    call    reverse
0x48a365    call    md5sha1
; ...
0x48a374    lea     rbx, [rel data_4a845e]  {"a2ce7d4c220df20b186f5458d9449a56…"}
0x48a37b    mov     ecx, 0x28
0x48a380    call    compare

Phew! We have basically everything needed to already finish the reversing of this function. It does everything the first function does, just without the loop!

Generating the flag

We are now finished with reversing the functions and we can start generating the flag!

First string

The first string validity in TL;DR is:

Reverse the string
Add the hex values of each character to a counter
That counter, add the hex values of each character multiplied by the current index of the counter
If the counter is equals 9580 the first check is valid
If the reversed string md5 hashed and then sha1 hashed is equals to ?"df4c2d865b38db152e7f6bb4ad2f325de9570185" the second check is valid
If both checks are valid, the first string is valid

Second string

The second string validity in TL;DR is:

Reverse the string
If the reversed string md5 hashed and then sha1 hashed is equals to "a2ce7d4c220df20b186f5458d9449a56e0e36149" the check is valid
If the check is valid, the second string is valid

solver.py

The entire code for the solver would be the following:

def solve_first() -> str:
    from string import ascii_lowercase
    import itertools
    import hashlib
 
    def iter():
        for s in itertools.product(ascii_lowercase, repeat=6):
            yield "".join(s)
    for s in iter():
        s = s[::-1]
        x = 7331
        i = 0
        while (True):
            if (i >= len(s)):
                break
            x += ord(s[i]) + (i * ord(s[i]))
            i += 1
        print(f"Trying {s[::-1]}", end="\r")
        if (x == 9580) and (hashlib.sha1(hashlib.md5(s.encode()).hexdigest().encode()).hexdigest() == 'df4c2d865b38db152e7f6bb4ad2f325de9570185'):
            return s
 
def solve_second() -> str:
    import hashlib
 
    with open('rockyou.txt', 'r', encoding="latin-1") as f:
        for l in f.read().splitlines():
            print(f"Trying {l}", end="\r")
            reversed = l[::-1]
            if (hashlib.sha1(hashlib.md5(reversed.encode()).hexdigest().encode()).hexdigest() == 'a2ce7d4c220df20b186f5458d9449a56e0e36149'):
                return l
 
first = solve_first()
print(f"Found first string: {first[::-1]}")
second = solve_second()
print(f"Found second string: {second}")
 
print(f"\nFlag/password: {first}-{second}")

Source

I could've used a wordlist in the first check but I decided to two different ways of solving it, even though it's really slow. I also tried with itertools.permutations and using multiprocessing but it wasn't really faster at getting the permutations. If you want to try yourself you can edit the first check by downloading, for example, a wordlist of English words and make a wordlist bruteforce just like the second check - that's what I did to actually get the string.

Conclusion

If you read the post in its entirety until here, you are amazing!

It was an overall awesome challenge and I've definitely learned a lot, not only because it was stripped but also because it was a Go binary and not a usual C binary. Took me lots of time to make this writeup and solve the challenge but I am happy about it :) Hopefully you enjoyed this challenge and if you want to try it yourself you can download the binary and have a look at it!