jump to navigation

Ramblings on static deobfuscation 2009/03/06

Posted by dividead in Reverse engineering.
add a comment

The past few days while not at work I have been thinking a bit about binary deobfuscation, as I’m spending some of my spare time cracking an obfuscated binary.

The anti-disassembly and anti-debugging tricks used in this binary are fairly extensive, ranging from jmps in the middle of instructions (why does this still throw IDA pro off guard this much? At least in a graph representation it should be easily possible to follow both execution paths, so that they can be displayed in an intuitive manner), to insertion of redundant code, nanomites all over the place and so on.

The redundant code introduced by their protection scheme is fairly standard though, and come down to inserted ranges of nop-equivalent code. These redundancies can be easily eliminated through static peephole optimizations, which I implemented in a simple IDC script. However, when one thinks about this a bit, peephole optimization to eliminate such code is a fairly unsafe process if the introduced redundant code is more sophisticated.

To illustrate this, I often ran into an instruction pair such as:

        xchg    eax, edx
        xchg    eax, edx

Of course my IDC script quickly eliminated this as redundant code, but now consider the following situation:

        xchg    eax, edx
        xchg    eax, edx

Obviously this means we cannot trivially eliminate these xchg pairs as they’re not part of the same basic block. We might be able to eliminate them though if all execution paths leading to ‘yadda’ also contain an instruction such as xchg eax, edx. This is already less trivial, but addressable by static analysis tools.

But how about the situation where the disassembler, say IDA, cannot find a cross reference to yadda in the first place? I.e. when yadda is a dynamically calculated target? This would mean that my script (and likely others that focus on static reduction of redundant code) will eliminate these instructions without a second thought, making it a destructive operation.

Now forceful construction of such situations is also doable; suppose at some point in the original program there is a xchg eax, edx instruction, we could decide to create a stub function in the program containing only assumed noop codes, par examplum the following:

        add     eax, 10
        sub     eax, 20
        lea     eax, [eax+10]
        test    eax, 0xbadc0de
        xchg    eax, edx
        xchg    eax, edx
        and     edx, 0xFFFFFFFF

This would appear like a function containing only redundant code, and we could insert calls to it in many places, which deobfuscators would likely eliminate altogether with the function. But if we start replacing xchg eax, edx instructions in our original program with dynamically calculated call instructions to the second xchg instruction in the function such an elimination would be a destructive operation.

This example presented such a trick by abusing two instructions in a row, but it may also be possible to perform this trick when an instruction with multiple bytes is involved, where for instance the operand is an instruction itself. Consider:

        and    esi, esi

Dynamically branching to yadda + 1 (which by the way is 0xF6 — the opcode for div) will also wreak havoc on unsuspecting redundancy elimination. Pure static analysis of binaries containing these tricks seems a fairly inadequate approach to me.