keygenme_v7 (Part 2)

14 June, 2020

This is the second post on the keygenme_v7 crackme. The first part can be found here. It is recommended that you read the first part, since it gently introduces some concepts that are applied in a much more advanced context here.

We left the first part having tackled a large chunk of the executable code in the binary, leaving only:

The unexplored branches of the window handler function at 0x402121
The hook applied to CharUpperW() at 0x4017BE
The function at 0x401856

The majority of the branches in the window handler function are uninteresting: they just set up the buttons and text, or tear down the window when it’s closed. The branch we are interested in is at 0x402181, which a debugger shows is reached when we click the ‘Register’ button. In this case, the jump is not taken, and two addresses are pushed to the stack followed by a ret, a pattern we’ve seen before. The two addresses are 0x40218E, the other side of this branch, and 0x401856. The ret pops the most recently pushed address, so execution jumps to 0x401856, with 0x40218E left on the stack as the return address.

sub_401856

Here, finally, is where the user input is processed. The function immediately starts with three calls to GetWindowTextLengthW(). The first two take the address 0x403024, the last takes 0x40302C. Using a debugger, we can see that the former is the ‘Username’ field and the latter is the ‘Serial’. The return value of each call is followed by a comparison and a jump, and if the jump is taken then the function returns. This imposes constraints on the number of characters in the two fields. These are:

Username is between 3 and 16 characters, inclusive
Serial is exactly 32 characters
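
Taken together, the three comparisons amount to something like the sketch below. The name lengths_ok and its parameters are hypothetical; in the binary the lengths come straight from GetWindowTextLengthW() and the checks are inlined at the top of sub_401856.

```c
#include <stdbool.h>

/* Hypothetical helper mirroring the length checks at the start of
   sub_401856. The real code obtains both lengths with
   GetWindowTextLengthW() instead of taking them as arguments. */
bool lengths_ok(int username_len, int serial_len)
{
    if (username_len < 3 || username_len > 16)
        return false;           /* username must be 3..16 characters */
    return serial_len == 32;    /* serial must be exactly 32 characters */
}
```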

If we supply input within those constraints (for now let’s use three As for the username, and 32 Bs for the serial), we get to the block of code at 0x4018A0. A lot of stack variables and one global are initialised here, most to zero, but [ebp-0x30] is set to -1, and the global byte at 0x403020 is set to 1. Two buffers are zeroed with memset(): the first is 34 bytes long, the second 66. Each of these buffers is then passed to the GetDlgItemTextW() function, which pulls the input strings out of the text fields. The username goes into the smaller buffer, the serial into the larger. It would therefore be useful to mark in the disassembler that:

[ebp+C0] = username
[ebp+104] = serial

In both cases, the maximum size in characters of the string passed to the function is the size we established in the constraints above, plus one for the null terminator (so 17 for the username, 33 for the serial). These are both exactly half the buffer sizes initialised earlier, and this is because both strings are WCHAR strings, and so each character uses two bytes. With the username and serial pulled from the UI, the address of the serial is pushed as the sole argument to CharUpperW(), which really calls the hooked function at 0x4017BE.
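
The relationship between the character limits and the buffer sizes can be checked with a few lines of C. This is only an illustrative sketch: the type wchar16, the constants and buffer_bytes are hypothetical names, with uint16_t standing in for the two-byte Windows WCHAR.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t wchar16;   /* stand-in for WCHAR: two bytes per character */

enum {
    USERNAME_MAX_CHARS = 16 + 1,   /* 16 characters plus terminator */
    SERIAL_MAX_CHARS   = 32 + 1    /* 32 characters plus terminator */
};

/* Number of bytes needed to hold max_chars two-byte characters. */
size_t buffer_bytes(size_t max_chars)
{
    return max_chars * sizeof(wchar16);
}
```

With these values, buffer_bytes() gives 34 for the username and 66 for the serial, matching the two memset() sizes.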

sub_4017BE

Immediately this function checks the global byte at 0x403020. If it is zero, it takes a jump right to the end of the function, which uses the stub created when the hook was made to call the real CharUpperW() and then exit. Effectively, the global byte is a switch to turn the additional behaviour of this function on or off. In this case, however, we know the byte is 1 as it was just set earlier, so the jump is not taken.

Instead then, the register esi is loaded with the base address of our string, and the 16-bit value at [esi] is compared to zero. Later in the code, esi is incremented by two, so we know this is a for loop that runs over the entire input string.

While looping through the string, each character is run through a switch statement. The switch only handles values in the range 48 to 57, which are the character codes for the digits 0-9. Each of these cases modifies the value slightly by adding or subtracting a number. This essentially means that each digit in the input string - in this case our serial - is swapped for another according to the following rules:

Input	0	1	2	3	4	5	6	7	8	9
Output	5	7	4	1	0	9	8	3	2	6

Once this loop exits, the end of the function is reached - the string is passed to the real CharUpperW() and returned. We therefore know that our input serial will now only consist of uppercase letters, and digits that have been swapped around.
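
Putting the pieces of sub_4017BE together, its effect on the serial can be approximated by the sketch below. The name transform_serial is hypothetical, the lookup table is a simplification (the binary adds or subtracts a constant in each switch case rather than indexing a table), and the ASCII-only uppercasing is an assumed stand-in for the real CharUpperW(). uint16_t is used in place of WCHAR.

```c
#include <stdint.h>

/* Substitution table from the rules above:
   index = input digit, value = output digit. */
static const char DIGIT_MAP[10] = {
    '5', '7', '4', '1', '0', '9', '8', '3', '2', '6'
};

/* Hypothetical re-implementation of the hooked CharUpperW():
   swaps digits in place, then uppercases ASCII letters. */
void transform_serial(uint16_t *s)
{
    for (; *s != 0; s++) {
        if (*s >= '0' && *s <= '9')
            *s = (uint16_t)DIGIT_MAP[*s - '0'];
        else if (*s >= 'a' && *s <= 'z')
            *s = (uint16_t)(*s - 'a' + 'A');  /* ASCII-only assumption */
    }
}
```

A keygen has to account for this step: any digit it wants the later checks to see must be entered as the corresponding input digit from the table.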

loc_40193D

Returning from the hooked CharUpperW() then goes to a line that initialises another stack variable to zero, then jumps to 0x40193D. The stack variable is used as a loop counter for a loop that goes over the entire username, but to spoil things a bit here, the loop doesn’t really do anything. It uses each character value to calculate two integers. One is never touched again, and the other is only used as a bound in the next loop at 0x4019D4, which itself does nothing. You can tell this using a disassembler and checking the cross references to the stack address - there are no reads from the address later in the program. This kind of useless code is an obfuscation technique designed to waste time while reverse engineering.

Skipping that code takes us to address 0x4019E5, which contains yet more useless code - it pushes eax and edx, then zeroes them and performs some arithmetic (that does nothing as the values are all zero), then pops the registers back again. At the end of this block, another stack variable is set to zero - another loop index - and a jump is taken to 0x401A03.

The beginning of this loop immediately checks whether the loop variable on the stack has reached 32 - the size of the input serial. If it is less, the jump is not taken. The loop variable is then tested against zero. If it is zero, execution jumps straight to 0x401A4E; if it is not, a small block executes first. That block uses the idiv instruction to get the remainder of the loop variable divided by 7. If the remainder is zero, a jump is again taken to 0x401A4E. Essentially then, we can interpret this bit of code as:

for (int i = 0; i < 32; i++)
{
    if ((i == 0) || (i % 7 == 0))
        // loc_401A4E
    else
        // loc_401A20
}

We’ll consider the case when either of the jumps is taken (0x401A4E) first, since it is the one that runs first in practice (the loop variable starts at zero). This bit yet again starts with the exact useless arithmetic code from before, at 0x4019E5. Perhaps it was a function that has been inlined. Skipping that takes us to a point where the loop variable is loaded into ecx, incremented twice, and then in a very roundabout way divided by 33 and the remainder taken. If the remainder is zero, a jump is taken to 0x401A86. If it isn’t, then the loop variable is stored in eax and AND’d with a bitmask of 0x80000003, followed by a jns - “jump if not sign”. If the loop variable is negative, the sign bit survives the AND (since the most significant bit of the bitmask is set), and the jump is not taken. The jump goes to 0x401A7E, which checks whether eax is zero after the AND. This will be the case if the loop variable is not negative and its lowest two bits are both clear, i.e. it is a multiple of 4. If eax is not zero, a jump is taken to 0x401CEF, otherwise execution continues to 0x401A86, where our previous jump could have gone. Returning to our code segment above, this can then be written out as:

for (int i = 0; i < 32; i++)
{
    if ((i == 0) || (i % 7 == 0))
    {
        if (((i + 2) % 33 == 0) || ((i >= 0) && ((i & 3) == 0)))
            // loc_401A86
        else
            // loc_401CEF
    }
    else
        // loc_401A20
}
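
To see which indices end up where, the branching can be wrapped in a small function. This is a sketch based on the description of the 0x80000003 mask above (loop variable not negative, lowest two bits clear); the enum and the name classify_index are hypothetical and only the labels come from the disassembly.

```c
/* Branch targets, named after the labels in the disassembly. */
enum branch { LOC_401A20, LOC_401A86, LOC_401CEF };

/* Hypothetical helper: returns the block that loop index i reaches. */
enum branch classify_index(int i)
{
    if (i == 0 || i % 7 == 0) {
        /* (i & 3) == 0 tests that the lowest two bits are clear */
        if ((i + 2) % 33 == 0 || (i >= 0 && (i & 3) == 0))
            return LOC_401A86;
        return LOC_401CEF;
    }
    return LOC_401A20;
}
```

For i from 0 to 31 this sends 0 and 28 to loc_401A86, sends 7, 14 and 21 to loc_401CEF, and sends every other index to loc_401A20. The (i + 2) % 33 test is only true for i = 31 in this range, which is not a multiple of 7, so it never decides the outcome here.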