Ramblings on static deobfuscation 2009/03/06Posted by dividead in Reverse engineering.
The past few days while not at work I have been thinking a bit about binary deobfuscation, as I’m spending some of my spare time cracking an obfuscated binary.
The anti-disassembly and anti-debugging tricks used in this binary are fairly extensive, ranging from jmps in the middle of instructions (why does this still throw IDA pro off guard this much? At least in a graph representation it should be easily possible to follow both execution paths, so that they can be displayed in an intuitive manner), to insertion of redundant code, nanomites all over the place and so on.
The redundant code introduced by their protection scheme is fairly standard though, and come down to inserted ranges of nop-equivalent code. These redundancies can be easily eliminated through static peephole optimizations, which I implemented in a simple IDC script. However, when one thinks about this a bit, peephole optimization to eliminate such code is a fairly unsafe process if the introduced redundant code is more sophisticated.
To illustrate this, I often ran into an instruction pair such as:
xchg eax, edx xchg eax, edx
Of course my IDC script quickly eliminated this as redundant code, but now consider the following situation:
xchg eax, edx yadda: xchg eax, edx
Obviously this means we cannot trivially eliminate these xchg pairs as they’re not part of the same basic block. We might be able to eliminate them though if all execution paths leading to ‘yadda’ also contain an instruction such as xchg eax, edx. This is already less trivial, but addressable by static analysis tools.
But how about the situation where the disassembler, say IDA, cannot find a cross reference to yadda in the first place? I.e. when yadda is a dynamically calculated target? This would mean that my script (and likely others that focus on static reduction of redundant code) will eliminate these instructions without a second thought, making it a destructive operation.
Now forceful construction of such situations is also doable; suppose at some point in the original program there is a xchg eax, edx instruction, we could decide to create a stub function in the program containing only assumed noop codes, par examplum the following:
yadda: add eax, 10 sub eax, 20 lea eax, [eax+10] pushf test eax, 0xbadc0de popf xchg eax, edx xchg eax, edx and edx, 0xFFFFFFFF ret
This would appear like a function containing only redundant code, and we could insert calls to it in many places, which deobfuscators would likely eliminate altogether with the function. But if we start replacing xchg eax, edx instructions in our original program with dynamically calculated call instructions to the second xchg instruction in the function such an elimination would be a destructive operation.
This example presented such a trick by abusing two instructions in a row, but it may also be possible to perform this trick when an instruction with multiple bytes is involved, where for instance the operand is an instruction itself. Consider:
yadda: and esi, esi
Dynamically branching to yadda + 1 (which by the way is 0xF6 — the opcode for div) will also wreak havoc on unsuspecting redundancy elimination. Pure static analysis of binaries containing these tricks seems a fairly inadequate approach to me.