keygenme_v7 (Part 1)

02 June, 2020

keygenme_v7 is a crackme imported from crackmes.de, rated difficulty level 3 and written in C/C++. It is a Windows-specific challenge, and is the first I’ve looked at with a GUI.

This writeup is split into two parts. The first covers the peripheral parts of the program - the startup and callbacks that will take us to the keygen algorithm. The second will cover the keygen itself, which is complex enough to warrant its own post, and builds on the concepts introduced here.

On starting up the program, we’re immediately met with two windows - a message box that constitutes the nag screen, and a window taking a username and serial. Entering some text into the two boxes and clicking “Register” doesn’t really seem to do anything for now.

Let’s take a look at the disassembly, starting from the entry point (at 0x4028E1), which immediately calls a function - 0x4012DA.

sub_4012DA

Looking at this function in a disassembler, we see some constant strings of DLL and function names, as well as calls to GetModuleHandleW() and GetProcAddress(). This is a common technique for loading functions from DLLs at runtime, and is likely used here as an obfuscation technique. The main takeaways from this that should be marked in our disassembler are:

  • dword_403008 = wcsncmp()
  • dword_403000 = memset()
  • dword_403004 = wcsncpy()
  • dword_40300C = memcpy()
  • dword_403010 = InitCommonControls()

After loading all of these, the function then calls GetCurrentProcess(), then OpenProcessToken() to get its own access token. We then see the constant string “SeDebugPrivilege”, followed by a call to LookupPrivilegeValueW(), and some shuffling of a TOKEN_PRIVILEGES struct before then calling AdjustTokenPrivileges(). Looking at the structure and function call, we see

  • The TokenHandle parameter is the token for this process
  • The DisableAllPrivileges parameter is FALSE, so we may be enabling or disabling based on the passed struct
  • The NewState parameter is our TOKEN_PRIVILEGES struct
  • This struct contains exactly 1 privilege
  • The LUID of that privilege is the one we looked up with the SeDebugPrivilege string
  • The attribute value attached to this LUID is 2, which if we look it up in winnt.h is the SE_PRIVILEGE_ENABLED constant.

Therefore, this part is enabling the SeDebugPrivilege privilege of the process. The authors of The Art of Memory Forensics: Detecting Malware and Threats in Windows, Linux and Mac Memory say:

This grants the ability to read from or write to another process’ private memory space. It allows malware to bypass the security boundaries that typically isolate processes. Practically all malware that performs code injection from user mode relies on enabling this privilege.

We can therefore expect that the program will need to read or write to the memory of another process later down the line.
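As a rough sketch of the structure described above, the block below rebuilds the NewState argument in portable C. The typedefs are stand-ins that mirror the winnt.h layouts of LUID, LUID_AND_ATTRIBUTES and TOKEN_PRIVILEGES so that it compiles without the Windows headers; the function name and the LUID value in the usage note are my own placeholders, not something taken from the binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins mirroring the winnt.h layouts (not the real Windows types). */
typedef struct { uint32_t LowPart; int32_t HighPart; } luid_t;
typedef struct { luid_t Luid; uint32_t Attributes; } luid_and_attributes_t;
typedef struct {
    uint32_t PrivilegeCount;
    luid_and_attributes_t Privileges[1];
} token_privileges_t;

/* Same value as SE_PRIVILEGE_ENABLED in winnt.h. */
#define PRIV_ENABLED 0x00000002u

/* Hypothetical helper: fill in a request that enables a single privilege. */
void make_enable_request(token_privileges_t *tp, luid_t privilege_luid)
{
    assert(tp != NULL);
    tp->PrivilegeCount = 1;                      /* one entry in the array */
    tp->Privileges[0].Luid = privilege_luid;     /* from the LUID lookup */
    tp->Privileges[0].Attributes = PRIV_ENABLED; /* the "2" seen in the disassembly */
}
```

In the real program the LUID comes from LookupPrivilegeValueW(), so any value passed in here (for example { 20, 0 }) is purely illustrative.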

That’s it for this function, and control returns back to the entry point function, _start.

sub_4028E1

Immediately after the previous call, the entry function calls the address that stores InitCommonControls() that was loaded earlier.

Once that call returns, the rest of this entry function is dedicated to setting up a WNDCLASSEXW structure to be passed to RegisterClassExW(). The main point we care about here is the lpfnWndProc member, a WNDPROC: an “application-defined function that processes messages sent to a window” (MSDN). We will come back to this function later.

If the call to RegisterClassExW() is successful, then we get the following strange sequence:

0x004029e0      6851254000             push    fcn.00402551 ; 0x402551 ; "U\x8b\xec\x81\xecP\x01"
0x004029e5      ff356c404000           push    dword [sym.imp.USER32.dll_CreateWindowExW]
0x004029eb      68f6294000             push    0x4029f6
0x004029f0      68b0134000             push    fcn.004013b0 ; 0x4013b0 ; "U\x8b\xecQQSVWj\nj\b\xff\x15,@@"
0x004029f5      c3                     ret

This is the entry function, so why does it end with a ret? Where would it be going? The answer lies in the pushes beforehand. Since ret pops the address at the top of the stack and sets the program counter to it, this is a long-winded way of saying call 004013b0 with a hand-picked return address. As we will see later, the other pushes give us both the argument(s) for this function call, and the future control flow for when that call ends. Let’s draw out the stack.

Before ret:

PC = 004029F5

+----------+ Top
| 004013b0 |
|----------|
| 004029f6 |
|----------|
| 761019A0 | // CreateWindowExW
|----------|
| 00402551 |
+----------+ Bottom

After ret:

PC = 004013b0

+----------+ Top
| 004029f6 |
|----------|
| 761019A0 | // CreateWindowExW
|----------|
| 00402551 |
+----------+ Bottom
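To make the two diagrams concrete, here is a small toy model of the push-then-ret sequence in portable C. It is only a sketch: the toy_cpu type and its helpers are names I made up, the stack is just an array, and the CreateWindowExW() address is passed in because it varies between systems.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a 32-bit stack: slots[top - 1] is the most recent push. */
typedef struct {
    uint32_t slots[8];
    size_t   top;
    uint32_t pc;
} toy_cpu;

void toy_push(toy_cpu *cpu, uint32_t value)
{
    assert(cpu != NULL && cpu->top < 8);
    cpu->slots[cpu->top++] = value;
}

/* ret: pop the top of the stack into the program counter. */
void toy_ret(toy_cpu *cpu)
{
    assert(cpu != NULL && cpu->top > 0);
    cpu->pc = cpu->slots[--cpu->top];
}

/* Replay the four pushes and the ret found at 0x4029e0..0x4029f5. */
void replay_entry_sequence(toy_cpu *cpu, uint32_t create_window_addr)
{
    assert(cpu != NULL);
    cpu->top = 0;
    toy_push(cpu, 0x00402551u);        /* pushed first, ends up at the bottom */
    toy_push(cpu, create_window_addr); /* address of CreateWindowExW */
    toy_push(cpu, 0x004029f6u);        /* left on top once ret has run */
    toy_push(cpu, 0x004013b0u);        /* consumed by ret as the jump target */
    toy_ret(cpu);
}
```

After replay_entry_sequence() the model has pc at 0x4013b0 with 0x4029f6 on top of the stack, which is exactly the state a real call from just before 0x4029f6 would have produced.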

sub_4013B0

As seen in the stack diagram, control flow now continues to 0x4013b0. The point of the push-then-ret strategy above over a simple call is likely obfuscation, and because of that we cannot be sure how many arguments this function takes - some or all of the addresses pushed before 4013b0 might be addresses to continue execution at, while some might be arguments to this function. We will have to analyse how the stack is used in this function to determine which is which. In addition to the standard function prolog, this function also immediately pushes a lot of registers. At 0x4013BC, our stack will now look like:

PC = 004013BC

+----------+ Top
| 00000008 |
|----------|
| 0000000A |
|----------|
|   edi    |
|----------|
|   esi    |
|----------|
|   ebx    |
|----------|
|   ecx    |
|----------|
|   ecx    |
|----------|
|   ebp    |
|----------|
| 004029f6 |
|----------|
| 761019A0 | // CreateWindowExW
|----------|
| 00402551 |
+----------+ Bottom

This could be saving the state of the registers to restore them at the end of the function; however, the double push of ecx stands out as strange. (Pushing ecx twice is often just a compact way for the compiler to reserve 8 bytes of local variable space, so it may not be saving anything at all.) Next up is a call to GetProcessHeap(), which takes no arguments, and then the returned HANDLE is pushed and HeapAlloc() is called. This function takes three arguments, the first being a handle to the heap we just obtained, and the next two being the top two constant values on the stack. Therefore, the dwFlags argument is 8 (HEAP_ZERO_MEMORY), and dwBytes is 10. Those two values are popped from our stack, meaning the top is now at the pushed value of edi. This function call has allocated 10 zeroed bytes of memory in the process heap.

The address in the heap returned is then passed to VirtualProtect(), along with a dwSize parameter of 10 and an flNewProtect value of 0x40 (PAGE_EXECUTE_READWRITE) marking our allocated memory as executable.

VirtualProtect() is then called again with almost the same parameters, except this time dwSize is 5, and the address being protected comes from the argument [ebp+8]. ebp points to the base of the current stack frame, which is set at the beginning of the function, and is therefore marked on this diagram:

PC = 004013DC

+----------+ Top
|   edi    |
|----------|
|   esi    |
|----------|
|   ebx    |
|----------|
|   ecx    |
|----------|
|   ecx    |
|----------|
|   ebp    | <-- ebp points here
|----------|
| 004029f6 | <-- ebp+4 (return address)
|----------|
| 761019A0 | <-- ebp+8 (arg 1, CreateWindowExW)
|----------|
| 00402551 | <-- ebp+C (arg 2)
+----------+ Bottom

Therefore, [ebp+8] is the address of the start of the CreateWindowExW() function that is being marked as executable. The five bytes being marked disassemble to a standard function prolog.

0x761019A0      8b ff                  mov     edi, edi
0x761019A2      55                     push    ebp
0x761019A3      8b ec                  mov     ebp, esp

This could of course be slightly different depending on your Windows version.

These 5 bytes are then copied to the start of our 10-byte buffer that was allocated at the start of the function.

The memset() function is then used to set the first byte of the external function (in this case the address 0x761019A0, which is CreateWindowExW()) to 0xE9. This is the opcode for the relative x86 JMP instruction, which takes an offset from the current position to jump to.

The function next pulls [ebp+C], which we can see from our diagram is 0x402551. This is stored in memory, and then the distance from CreateWindowExW() to that address is calculated and stored in memory as well. Since a relative jmp is measured from the end of the 5-byte instruction, the value works out as 0x402551 - (0x761019A0 + 5) = 0x8A300BAC. memcpy() is called with that as the source, a size of 4 bytes and the destination as our CreateWindowExW() function plus one (0x761019A1). Now, the first 5 bytes of CreateWindowExW() disassemble to:

0x761019A0      e9 ac 0b 30 8a         jmp     00402551

The function has rewritten CreateWindowExW() so that it instead jumps to a function in the program.
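The displacement arithmetic is easy to get wrong by five, so here is a minimal sketch of how such a jump can be encoded. The function name is my own; the only assumptions are the standard x86 rules that the E9 displacement is relative to the end of the 5-byte instruction and is stored little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 5-byte "jmp rel32" (opcode 0xE9) that sits at address `from`
   and lands on address `to`. */
void encode_jmp_rel32(uint32_t from, uint32_t to, uint8_t out[5])
{
    assert(out != NULL);
    /* Displacement is measured from the next instruction (from + 5);
       unsigned arithmetic wraps modulo 2^32, just like the CPU does. */
    uint32_t rel = to - (from + 5u);
    out[0] = 0xE9;
    out[1] = (uint8_t)(rel & 0xFFu);          /* least significant byte first */
    out[2] = (uint8_t)((rel >> 8) & 0xFFu);
    out[3] = (uint8_t)((rel >> 16) & 0xFFu);
    out[4] = (uint8_t)((rel >> 24) & 0xFFu);
}
```

Feeding it the addresses from this run (from 0x761019A0, to 0x00402551) reproduces the bytes e9 ac 0b 30 8a seen in the patched function.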

A similar procedure is performed for the last 5 bytes of the buffer that was allocated at the beginning - the relative distance from the buffer to the CreateWindowExW() function (plus five to avoid the new jmp) is calculated, and a relative jmp instruction is written. The 10-byte buffer now disassembles to:

0x00475500      8bff                   mov     edi, edi
0x00475502      55                     push    ebp
0x00475503      8bec                   mov     ebp, esp
0x00475505      e99bc4c875             jmp     0x761019A5

This function therefore takes two arguments - a function to hook and a function to hook that one with. It stores the first 5 bytes of the hooked function in a buffer, then overwrites those first 5 bytes of the function with a relative jump to the hooking function. The remainder of the 10-byte buffer is a relative jump to the rest of the hooked function. This means calling the hooked function directly instead calls the hooking function, and the hooked function is instead accessed through the newly allocated stub code.

The address of the stub code is placed in eax as the return value.
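Putting the pieces together, the whole hooking routine can be modelled on plain byte buffers. This is only a sketch of the technique under my own naming: it takes the addresses as numbers instead of touching real code, and leaves out the HeapAlloc() and VirtualProtect() calls that the real function needs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PATCH_LEN 5u

/* Little-endian "jmp rel32" located at address `from`, targeting `to`. */
static void write_jmp(uint8_t *dst, uint32_t from, uint32_t to)
{
    uint32_t rel = to - (from + PATCH_LEN);
    dst[0] = 0xE9;
    dst[1] = (uint8_t)(rel & 0xFFu);
    dst[2] = (uint8_t)((rel >> 8) & 0xFFu);
    dst[3] = (uint8_t)((rel >> 16) & 0xFFu);
    dst[4] = (uint8_t)((rel >> 24) & 0xFFu);
}

/* target_bytes: the first 5 bytes of the hooked function, loaded at target_addr.
   stub:         the 10-byte buffer, assumed to live at stub_addr.
   hook_addr:    the function that should run instead of the target. */
void install_hook_model(uint8_t target_bytes[5], uint32_t target_addr,
                        uint8_t stub[10], uint32_t stub_addr,
                        uint32_t hook_addr)
{
    assert(target_bytes != NULL && stub != NULL);
    /* 1. Save the original prolog at the start of the stub. */
    memcpy(stub, target_bytes, PATCH_LEN);
    /* 2. Overwrite the prolog with a jump to the hook. */
    write_jmp(target_bytes, target_addr, hook_addr);
    /* 3. Finish the stub with a jump to the rest of the target. */
    write_jmp(stub + PATCH_LEN, stub_addr + PATCH_LEN, target_addr + PATCH_LEN);
}
```

With the addresses from this run (target 0x761019A0, stub 0x00475500, hook 0x00402551) the model produces the same ten stub bytes as the listing above.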

At the end of our function, esp is incremented and then the stack is popped to give us a stack that looks like this:

PC = 0040144E

+----------+ Top
| 004029f6 |
|----------|
| 761019A0 | // hooked CreateWindowExW
|----------|
| 00402551 |
+----------+ Bottom

And therefore control flow returns to 0x4029f6.

loc_4029f6

This is a small piece of code that saves the newly allocated stub that calls into the real CreateWindowExW() to the constant address 0x4029FC, and then calls the hooking function again. This time, the CharUpperW() function is hooked with the function at address 0x4017be.

The control flow then returns to 0x402A17.

loc_402A17

Like before, the stub that calls into the real CharUpperW() is saved to a constant location, at 0x402A1D.

The function then starts to load 2-byte values into a space in the stack. Each value is pushed, then immediately popped into eax and saved to the stack buffer. This is likely done as an obfuscation technique. The last value stored is 0, which is easy to miss in a disassembly because it breaks the pattern and just uses xor eax, eax before then putting eax into the stack space. The values 0x13 and 0x72014 are then pushed, and the address of our stack space is saved to ecx, then the function at 0x402B3C is called. ecx carries the first argument under __fastcall on Windows x86, so the stack buffer is our first parameter to the function. Looking beyond this call, we see the buffer is then passed as the lpWindowName argument to CreateWindowExW() (which actually calls our hook). It is therefore safe to assume that 0x402B3C is probably a string deobfuscation function.

sub_402b3c

This function uses edi as a loop counter to iterate from 0 to the third argument (0x13 in our case from above). For each element, it adds the second argument to the loop counter, and uses idiv to get the remainder of the total divided by 255 (idiv places the remainder in edx). This remainder is then XOR’d with the character for the given element. We can quite easily turn this back into a C example with this information. Remember that each element of the string buffer is 2 bytes wide, and since we are on Windows this means it is a WCHAR array. We can also see from the assembly that the buffer is written back to, so it is an INOUT argument, rather than the function returning anything.

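#include <Windows.h>

/* XOR each 16-bit character with a rolling key, in place. */
void __fastcall str_deobf(WCHAR *input, DWORD key, DWORD size)
{
    DWORD i; /* unsigned, to match the type of size */
    for (i = 0; i < size; i++)
        input[i] ^= (WCHAR)((key + i) % 255);
}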

If we give it the sequence of 2-byte values built up by the code starting at 0x402A22, as well as the right size of 0x13 and the key of 0x72014, then we get the string “KeygenMe V7 - Trial” back, which we see from running the program is the window title.
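Because XOR is its own inverse, the same routine can also produce the obfuscated values, which gives us a way to try it out without the Windows headers. The sketch below is a portable mirror with uint16_t standing in for WCHAR; str_xor and widen_ascii are names of my own, and the encoded values it produces are derived from the algorithm rather than copied out of the binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable mirror of the routine: uint16_t stands in for WCHAR. */
void str_xor(uint16_t *buf, uint32_t key, uint32_t size)
{
    assert(buf != NULL);
    for (uint32_t i = 0; i < size; i++)
        buf[i] ^= (uint16_t)((key + i) % 255u);
}

/* Widen an ASCII string into 16-bit units. Returns the number of units
   written; no terminator is stored. */
uint32_t widen_ascii(const char *s, uint16_t *out, uint32_t cap)
{
    assert(s != NULL && out != NULL);
    uint32_t n = 0;
    while (s[n] != '\0' && n < cap) {
        out[n] = (uint16_t)(unsigned char)s[n];
        n++;
    }
    return n;
}
```

Widening “KeygenMe V7 - Trial” gives 0x13 units; one call to str_xor() with the key 0x72014 scrambles them, and a second call with the same key and size restores the title.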

As mentioned, when this function returns from being called at 0x402AB7, its output is then being used as input to the hook on CreateWindowExW(), which is called next.

sub_402551 (nWidth != 120)

This function starts by checking the nWidth parameter against 120. When it is called, that is not the case, so it takes the failure branch, which simply pushes all the arguments back and calls the CreateWindowExW() stub, calling the real function. This makes the main window and returns from the function.
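The shape of this hook can be sketched in a few lines. Everything here is a simplified model with names of my own: the real function takes all twelve CreateWindowExW() arguments, whereas this version keeps only nWidth, uses a stand-in for the stub, and reduces the alternate path (covered further on) to a counter.

```c
#include <assert.h>
#include <stddef.h>

#define NAG_WIDTH 120

/* Hypothetical stand-in for the stub that reaches the real CreateWindowExW(). */
typedef int (*create_window_fn)(int nWidth);

typedef struct {
    create_window_fn real_create; /* stub address saved by the hooking routine */
    int special_calls;            /* how often the nWidth == 120 path ran */
} hook_state;

/* Placeholder "real" function used for illustration: echoes the width. */
int fake_real_create(int nWidth)
{
    return nWidth;
}

/* Model of the hook: only a width of 120 triggers the alternate behaviour. */
int hooked_create_window(hook_state *st, int nWidth)
{
    assert(st != NULL && st->real_create != NULL);
    if (nWidth == NAG_WIDTH)
        st->special_calls++;        /* placeholder for the alternate path */
    return st->real_create(nWidth); /* pass the call through to the stub */
}
```

Using a magic argument value as a trigger keeps the hook invisible for ordinary calls, which is why the main window is created as normal here.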

loc_402A80

Returning from the call to the real CreateWindowExW(), the return value is stored and then passed to the UpdateWindow() function.

The rest of this function is an infinite loop of:

  • GetMessageW()
  • TranslateMessage()
  • DispatchMessageW()

This loop waits for and acts on window messages for the rest of the program’s runtime. The function that handles them is the one at 0x402121, which was passed as lpfnWndProc to RegisterClassExW() way back at the beginning of the entry function.

sub_402121

Since this function must conform to the WNDPROC callback structure, we know the arguments beforehand. This function is a large switch statement that handles each event type it’s interested in. I won’t look at all of them, as some of them do what you’d expect with no surprises.

The WM_CREATE event (the first branch, handled at 0x40235A) is notable because it includes a call to the hooked CreateWindowExW() with an nWidth of 120, calling the alternate behaviour.

sub_402551 (nWidth == 120)

When the hooked CreateWindowExW() is called with nWidth as 120, the branch at the beginning is not taken. The majority of the resultant code is taken up by initialising and deobfuscating two strings. The first says “Nag Screen”, and the second we can assume is the text of the nag screen message.

Before that, a block of memory (168 bytes) pointed to on the stack is initialised to zero, and the first four bytes of the block are set to the address of the MessageBoxW() function. The remaining bytes are filled with the deobfuscated strings, with a four-byte gap between the function address and the strings.

Two addresses are then pushed:

  • 0x4028B1 - the other side of the branch in this function, which calls the real CreateWindowExW() function
  • 0x401469

ret is then executed, returning control flow to 0x401469.

sub_401469

This function first calculates the difference between the address of the function at 0x401451 and that of a nullsub immediately after it at 0x401468. This is calculating the size of the function, which is 23 bytes.

The function then calls OpenProcess(), opening itself with 0x43A as the dwDesiredAccess parameter, which decodes to:

  • PROCESS_CREATE_THREAD
  • PROCESS_VM_OPERATION
  • PROCESS_VM_READ
  • PROCESS_VM_WRITE
  • PROCESS_QUERY_INFORMATION
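Decoding a mask like this by hand is error-prone, so a small lookup helps. The sketch below repeats the relevant values from winnt.h so that it builds without the Windows headers; the table and function names are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t bit; const char *name; } access_flag;

/* Values as defined in winnt.h, repeated so the sketch builds anywhere. */
static const access_flag k_flags[] = {
    { 0x0002u, "PROCESS_CREATE_THREAD" },
    { 0x0008u, "PROCESS_VM_OPERATION" },
    { 0x0010u, "PROCESS_VM_READ" },
    { 0x0020u, "PROCESS_VM_WRITE" },
    { 0x0400u, "PROCESS_QUERY_INFORMATION" },
};

/* Store the names of the known flags set in `mask` (at most `cap` of them),
   report how many were stored via `count`, and return the bits that were
   not decoded. */
uint32_t decode_access(uint32_t mask, const char **names, size_t cap, size_t *count)
{
    assert(names != NULL && count != NULL);
    *count = 0;
    for (size_t i = 0; i < sizeof k_flags / sizeof k_flags[0]; i++) {
        if ((mask & k_flags[i].bit) != 0 && *count < cap) {
            names[(*count)++] = k_flags[i].name;
            mask &= ~k_flags[i].bit;
        }
    }
    return mask;
}
```

For 0x43A this yields the five names above and a remainder of 0, confirming that no other access rights are hiding in the value.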

The function then uses VirtualAllocEx() to allocate two pieces of memory with protection PAGE_EXECUTE_READWRITE. The first is the size of the value calculated earlier (23), the latter is 168 bytes. The addresses are determined at runtime by passing NULL as the lpAddress parameter. The addresses are stored in ebx and edi respectively.

The two buffers are then written to with WriteProcessMemory(). The first has the entirety of the function at 0x401451 written to it. The second writes [ebp+8], which is our block of memory that was set up in the previous function which contains a function address, a gap of four null bytes, and then the two decoded strings.

CreateRemoteThread() is then called, with the first buffer as the start address, and the second as the parameter. The function then sleeps for 200 milliseconds.

From what has been seen here, this entire function call could effectively be replaced with a call to 0x401451, passing the block of memory as the parameter. Perhaps this is an obfuscation technique.

The function returns to the next address that was pushed. We have seen this before, it is the other arm of the branch in the function at 0x402551, that just calls into CreateWindowExW(). From there the control flow goes back to the WM_CREATE handler of the window event handler function, which is otherwise uninteresting. We will return to other branches of that function in a second, but first we need to look at the brief function called in a new thread here - 0x401451.

sub_401451

This is a small function that parses the 168-byte block given. We can see it pulls the first 4 bytes - the function address to call - to ecx. It then pulls [ecx+8], then [ecx+1E], then [ecx+4] and pushes each, as well as the constant 0x30. We can therefore assume this data contains three arguments starting at those offsets. We could represent this data as:

// sizeof=0xA8
struct function_parameter_pack {
  FARPROC func;       // +00
  DWORD   arg0;       // +04
  UINT8   arg1[0x16]; // +08
  UINT8   arg2[0x8A]; // +1E
};

And this function’s definition would then be:

void sub_401451(struct function_parameter_pack *f)
{
  f->func(f->arg0, f->arg2, f->arg1, 0x30);
}

Recall that f->func in this case is MessageBoxW(). This function signature exactly matches that, so perhaps this function only exists to obfuscate calls to MessageBoxW(). This call therefore translates to:

  • hWnd = NULL (the message box has no owner window)
  • lpText = The second, longer deobfuscated string (the nag screen message)
  • lpCaption = The first deobfuscated string (the string “Nag Screen”)
  • uType = MB_ICONEXCLAMATION

And sure enough, this call gives us the nag screen window.
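The layout can be sanity-checked with offsetof. The sketch below is a 32-bit mirror of the struct in which uint32_t replaces FARPROC so that the offsets hold on a 64-bit host too; the helper names are my own, and the message text in the usage note is a placeholder since the real string is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit layout mirror: uint32_t stands in for the 4-byte FARPROC. */
struct param_pack32 {
    uint32_t func;       /* +00 */
    uint32_t arg0;       /* +04 */
    uint8_t  arg1[0x16]; /* +08 */
    uint8_t  arg2[0x8A]; /* +1E */
};

/* Copy an ASCII string into dst as UTF-16LE, including the terminator. */
static void put_wide(uint8_t *dst, size_t cap, const char *s)
{
    size_t n = strlen(s);
    assert((n + 1) * 2 <= cap);
    for (size_t i = 0; i <= n; i++) {
        dst[2 * i] = (uint8_t)s[i];
        dst[2 * i + 1] = 0;
    }
}

/* Hypothetical helper: build the block the way the hook is described to. */
void fill_pack(struct param_pack32 *p, uint32_t func_addr,
               const char *caption, const char *text)
{
    assert(p != NULL && caption != NULL && text != NULL);
    memset(p, 0, sizeof *p);                 /* zero the whole block first */
    p->func = func_addr;                     /* address of MessageBoxW() */
    put_wide(p->arg1, sizeof p->arg1, caption);
    put_wide(p->arg2, sizeof p->arg2, text);
}
```

Note that “Nag Screen” is 10 characters, so with its terminator it fills the 0x16 bytes of arg1 exactly, which fits with the second string starting at +1E.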

Conclusion

Having looked at this function, only 2 remain unexplored - the hook for the CharUpperW() function at 0x4017BE, and the large function at 0x401856. We also have some unexplored branches of the window event handling function. These will all be looked at in part 2.

This initial look at the binary has revealed many obfuscation techniques, such as XOR-encoded strings, runtime function loading, hooked Win32 functions and calling functions by allocating new executable memory for them. These are all designed to make things tricky for static analysis, and as we go into the meat of the keygen algorithm it will be important to keep what has happened in this part in mind.