On Software Reverse Engineering
This article discusses methods of software reverse
engineering through a case study of the FLEXlm licensing system.
Problem Description
I downloaded IMSL CNL (C Numerical Library) 5.5 from ftp.vni.com, but it’s protected by FLEXlm. There
are different binary downloads for different operating systems and compilers, but the general
cracking techniques apply across platforms (in our case the FLEXlm license file
is certainly platform independent). Here we address CNL for Microsoft Windows
and Visual Studio. The product mainly includes libraries (static and dynamic)
of mathematical and statistical subroutines under the FLEXlm feature names “CMATH”
and “CSTAT”. The x86 distribution also has an optimized version of CMATH called
“CMPERF” that utilizes the bundled Intel MKL 6.1 to achieve high performance[1].
No additional licensing is needed for CMPERF. Since CMATH and CSTAT use the
same licensing mechanism, we will focus on CMATH from now on; the procedure for
CSTAT is entirely analogous.
The setup provides a simple program, cmath.c, for
validation purposes. Without loss of generality, we link its object file with cmath_s.lib (the
static library) to get the executable. It calls imsl_f_lin_sol_gen() and checks
the license file; if the license file is not valid, an error is reported.
Searching the web, we found the following license file for IMSL CNL 5.0:
SERVER hostname hostid 27000
DAEMON VNI "<vni_dir>\license\bin\bin.i386nt\vni.exe"
FEATURE CMATH VNI 5.0 permanent uncounted 3F23BE3056E4 HOSTID=ANY
FEATURE CSTAT VNI 5.0 permanent uncounted 2C60CD4570B0 HOSTID=ANY
As expected, trying it produced a “version incorrect” error;
replacing 5.0 with 5.5 produced an “incorrect softkey code” error, so the naïve
approach obviously does not work. In fact it takes quite sophisticated work to beat
FLEXlm. There are different levels of software cracking, and the associated
complexity ranges from relatively simple to dauntingly difficult, as we will see
later. Our task and the tools used are listed below.
Target:
Visual Numerics IMSL CNL 5.5
Protection:
Macrovision FLEXlm 9.2
Tools:
Microsoft Visual Studio 7.1 (CL, NMAKE, DUMPBIN, LIB, …)
RedHat Cygwin 1.3.5
IDM UltraEdit 9.0
Datarescue IDA Pro 4.3 (FLAIR, …)
URSoft W32Dasm 8.9
Sysinternals File Monitor 6.0
Resources:
Macrovision FLEXlm 9.2 SDK source code
Macrovision FLEXlm 8.1 SDK binary release
Preliminary Attempts
All hacking starts with gathering information. For
example, it’s very easy to find out that the target employs FLEXlm 9.2 for license
management by searching cmath_s.lib in UltraEdit. Reading the
relevant literature is also very helpful. There are some excellent essays in [4]
describing previous attacks against earlier versions of FLEXlm, which contain a
lot of precious knowledge.
In practice I loaded cmath.exe into the
W32Dasm debugger and started tracing pretty soon. Tracing through jungles of
assembly code is a horrendous experience: you can spend hours jumping back and
forth without getting any clue about what is really going on. But I managed to
figure out something anyway (with the help of File Monitor).
004133DE  E89D2F0100       call 00426380                     ;read license.dat
... ...
00413411 – 0041345B  ... ...                                 ;a loop read out “CMATH”
... ...
004138CD  E83E670000       call 0041A010                     ;return must be 0 to pass check
004138D2  83C41C           add esp, 0000001C
004138D5  8945F0           mov dword ptr [ebp-10], eax       ;return value stored in EAX
004138D8  837DF000         cmp dword ptr [ebp-10], 00000000
004138DC  0F848C070000     je 0041406E                       ;proceed to imsl_f_lin_sol_gen()
... ...
00413907  8B8D18F7FFFF     mov ecx, dword ptr [ebp+FFFFF718]
0041390D  33C0             xor eax, eax
0041390F  8A81A7444100     mov al, byte ptr [ecx+004144A7]
00413915  FF24857F444100   jmp dword ptr [4*eax+0041447F]    ;jump to error message
Apparently subroutine 0041A010 is the key. Stepping into
it reveals a more intricate structure, with complicated call chains inside.
In practice the procedure returns FFFFFFF8 (or -8), which should be the error
code. So we set a breakpoint at 004138D5, use W32Dasm’s “Modify Data” button
to change the value of the EAX register to 0, and let it go. Bang! The program is
fooled and yields the correct result as if we had the right license data.
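The addresses in the listing can be double-checked by hand. As a minimal sketch in Python (our choice of language for illustration; the helper names rel32_target and as_signed32 are ours, not part of any tool), a near CALL or JE stores its target as a little-endian 32-bit displacement relative to the next instruction, and FFFFFFF8 read as a signed 32-bit value is -8:

```python
def rel32_target(address: int, code: bytes) -> int:
    """Return the branch target of a CALL/JMP/Jcc rel32 instruction located at `address`."""
    # The 32-bit displacement is the last four bytes, stored little-endian.
    displacement = int.from_bytes(code[-4:], "little", signed=True)
    # The displacement is relative to the address of the NEXT instruction.
    next_instruction = address + len(code)
    return (next_instruction + displacement) & 0xFFFFFFFF


def as_signed32(value: int) -> int:
    """Interpret a 32-bit register value as a two's-complement integer."""
    return value - 0x100000000 if value & 0x80000000 else value


print(hex(rel32_target(0x004133DE, bytes.fromhex("E89D2F0100"))))    # 0x426380
print(hex(rel32_target(0x004138CD, bytes.fromhex("E83E670000"))))    # 0x41a010
print(hex(rel32_target(0x004138DC, bytes.fromhex("0F848C070000"))))  # 0x41406e
print(as_signed32(0xFFFFFFF8))                                       # -8
```

The three results agree with the operands W32Dasm prints for the two CALL instructions and the JE.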
Now things become clear: we can patch the instructions at
004138D5 – 004138DC so that the return value is always treated as 0. To do that we
consult [5] for the detailed x86 instruction formats of MOV and JMP; the
modified code follows. Note that a NOP is added to fill the extra byte
left by the change so that the modified executable has the same length as the
old one. In fact they differ only in the three lines that we have changed.
004138CD  E83E670000       call 0041A010                ;return must be 0 to pass check
004138D2  83C41C           add esp, 0000001C            ;stack pointer adjustment
004138D5  C745F000000000   mov [ebp-10], 00000000       ;pretend the return value is 0
004138DC  90               nop                          ;patch the extra byte to maintain code alignment
004138DD  E98C070000       jmp 0041406E                 ;unconditional jump to imsl_f_lin_sol_gen()
Of course patching cmath.exe is just a
proof of concept; the real goal is to patch the library itself. cmath.c per
se is embarrassingly short, and virtually all the contents of cmath.exe come
from cmath_s.lib, including both the IMSL functions and the FLEXlm code. To
locate the position of the above code, we search for the binary string
837DF0000F848C070000 (the code for the CMP and JE lines) in UltraEdit[2].
This leads us to the unique location in cmath_s.lib (file
offset 00106F70 – 00106F80) where that code lies. We change the bytes accordingly
and get a patched library. When we test the program again after linking it to the
patched library, everything works fine even though our license file is invalid.
So we’ve had our first success. Next we could do the same
thing to the DLL version of the library and, ideally, develop a utility to do it
automatically rather than manually, but we’ll omit that for now. The point is that
patching is usually the easiest and first step of cracking; it requires little
insight into the protection scheme. Moreover, patching works only for a
particular binary target: a patching utility cannot function for different
versions or on different platforms. Powerful and effective as it is,
patching is far from full reverse engineering.
More Analysis
We fooled FLEXlm by passing a fake “OK” to it, but we still
don’t know the true license code. Most software protection works in this way:
based on the user profile (name, organization, purchased feature, etc.) and
a certain algorithm, the program calculates some hash/checksum/license
code/signature and compares it to the one the user provides. If they match, the
user is authenticated. This mechanism is almost universal in the software we
have seen, and FLEXlm is no different. The code 3F23BE3056E4 in our license
file is the “SIGN=” signature; it is just the wrong one.
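To make the generic compute-and-compare scheme concrete, here is a toy sketch in Python. It is not FLEXlm's algorithm: the keyed hash (HMAC-SHA256), the secret VENDOR_SECRET, the field layout, and the function names are all assumptions made up for illustration. It only shows why a signature issued for one version stops matching when the version field is edited.

```python
import hashlib
import hmac

# Hypothetical vendor secret; a real product would hide and obfuscate its key material.
VENDOR_SECRET = b"example-vendor-secret"


def expected_signature(feature: str, version: str, expiry: str) -> str:
    """Derive a 12-hex-digit signature from the licensed fields (toy scheme)."""
    message = f"{feature}|{version}|{expiry}".encode("ascii")
    digest = hmac.new(VENDOR_SECRET, message, hashlib.sha256).hexdigest()
    return digest[:12].upper()


def is_licensed(feature: str, version: str, expiry: str, supplied: str) -> bool:
    """Recompute the signature and compare it with the one read from the license."""
    expected = expected_signature(feature, version, expiry)
    return hmac.compare_digest(expected, supplied.upper())


signature_for_5_0 = expected_signature("CMATH", "5.0", "permanent")
print(is_licensed("CMATH", "5.0", "permanent", signature_for_5_0))  # True
print(is_licensed("CMATH", "5.5", "permanent", signature_for_5_0))  # False
```

Because the version is part of the signed message, changing 5.0 to 5.5 without the secret invalidates the signature, which mirrors the “incorrect softkey code” error seen earlier.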
One more word about FLEXlm: it’s called FLEXlm because it
claims to provide a flexible solution to commercial software license
management. It does put a lot of effort into handling various situations
(counted/uncounted, feature/incremental, server/local, borrowing, trial, mobile,
…), but we are not interested in those things. What we want is the right
signature that enables us to run the specified target anytime, anywhere.
Since we have the source code of FLEXlm SDK 9.2, the
logical thing to do is to read it. The FLEXlm SDK is what Macrovision gives to
its clients, the so-called “vendors”, to help them ship their own “vendor
software” to the “end user”. In our case Visual Numerics is the vendor, IMSL
CNL 5.5 is the vendor software, and we are the end users. Each vendor has a
unique vendor name or vendor ID. As seen in the license, here it is “VNI”.
Usually vendors only get a binary release or partial source of the FLEXlm SDK, but we
had the luck to obtain the whole source, which is what FLEXlm is all about.
However, it turns out that the all-C source (no C++) is
not very readable. The project has evolved over more than a decade (version 1.0
was released in 1988); the old and new functions overlap and intertwine like spaghetti,
often with unnecessary redundancies; it is poorly commented and some old-style
coding conventions are very bad; the overuse of macros and preprocessing
directives is very annoying. Developed originally on UNIX platforms, it was
ported to the Windows environment using the NMAKE utility. To efficiently build and debug
such a large application we need a good IDE, but there is no IDE under Windows
that can take in the Makefile directly. As Visual Studio is possibly the best
IDE on Windows and has a “Makefile Project” capability, we set out to create a
VS7 project for the FLEXlm SDK. It took me some time to do that, since I needed to fix some
errors in the makefiles, but once it was done it was really convenient.
The core component of FLEXlm is lmgr.lib (or lmgr9a.dll), on
which all the others heavily depend. Vendors and end users are more familiar with
tools like lmgrd.exe, lmtools.exe, lmnewgen.exe, makekey.exe, etc.
After a successful build, we tried to generate a VNI license file but failed
because we don’t have their vendor keys and seeds. According to [2], [3], [4],
each vendor receives 5 vendor keys (VENDOR_KEY1, … VENDOR_KEY5) from
Macrovision and chooses 3 random seeds of its own (LM_SEED1, LM_SEED2,
LM_SEED3). These eight numbers are placed in lm_code.h and then
encrypted, obfuscated, and finally built into the target as well as the tools
that generate the license. Our job is of course to recover these numbers, but
how?
Or we can try a less difficult way: since the target will
calculate the real signature and compare it against the one it reads from the
license file, we can catch the real signature when the comparison takes
place, without knowing the vendor keys and seeds. This is not as easy as it may
seem; it requires us to set the breakpoint in the right place at the right
time. After hours of tracing we determined that simply following the
instruction flow was a dead end: it’s hopeless to locate the comparison code
this way unless FLEXlm uses some standard API like ANSI strncmp() or Win32
CompareString() (actually FLEXlm defines its own macro STRNCMP in l_privat.h).
This brings us to a fundamental question in reverse
engineering: how can we make sense of the chaotic, high-entropy assembly
code? There is no complete answer to this question, and we refer the
interested reader to [7] for some serious theoretical discussion. Here we
want to focus on a practical technology for this issue: FLAIR (a.k.a. FLIRT,
cf. [6]) introduced by IDA Pro.
We all know that debugging can be made much easier if the
application is built as a debug version and symbol files (.PDB, .DBG files) are
available. Symbols are information about the program, including identifier
(variable, function) names and memory offsets, source code line numbers, etc. In
a debug build the compiler/linker saves this information either into the application
binary or into separate symbol files. They enable debuggers to present the user with
code that closely resembles the original source (or even better, source-level
debugging as in VS). Naturally, symbols are stripped from the software releases
delivered to end users, as in our case.
But this is not the end of the story. Even without
symbols, we can still get something useful from the binary files, especially
libraries. On Windows, .EXE and .DLL files are in PE format while .OBJ and .LIB files (a LIB is
no more than a pile of OBJs stacked together) are in COFF format; in both formats
library calls are made via function names and arguments, which have to be
publicly visible[3]. For that purpose PE has imports & exports sections and
COFF has a symbol table (note that PE more or less contains COFF as a
subdivision). Visual Studio offers a couple of commands to explore them[4]:
F:\>dumpbin /disasm %vni_dir%\cnl55\cmath.exe
F:\>dumpbin /rawdata %vni_dir%\cnl55\cmath.exe
F:\>dumpbin /imports %vni_dir%\cnl55\cmath.exe
F:\>dumpbin /exports %vni_dir%\cnl55\bin\cmath.dll
F:\>dumpbin /imports %vni_dir%\cnl55\bin\cmath.dll
F:\>dumpbin /exports %vni_dir%\cnl55\lib\cmath.lib
F:\>dumpbin /symbols %vni_dir%\cnl55\lib\cmath_s.lib
F:\>dumpbin /linkermember %vni_dir%\cnl55\lib\cmath.lib
F:\>dumpbin /linkermember %vni_dir%\cnl55\lib\cmath_s.lib
F:\>dumpbin /archivemembers %vni_dir%\cnl55\lib\cmath.lib
F:\>dumpbin /archivemembers %vni_dir%\cnl55\lib\cmath_s.lib
F:\>lib /list %vni_dir%\cnl55\lib\cmath.lib
F:\>lib /list %vni_dir%\cnl55\lib\cmath_s.lib
F:\>lib /extract:vc++\flexlm.obj %vni_dir%\cnl55\lib\cmath_s.lib
F:\>dumpbin /symbols /disasm flexlm.obj
We must stress the difference between LIB and DLL here;
it’s more than merely static linking vs. dynamic linking. There are generally
three stages in developing a program:
Original: Source File (C, H, C++…), ASCII format
Intermediate: Object File (OBJ, LIB…), COFF format
Final: Image File (EXE, DLL, SYS…), PE format
The compiler processes source files to produce object files,
the linker takes in object files to output an image file, and the loader loads the image file
from disk into memory. Note that a DLL has already been processed by the linker
(after-linking), which turns identifiers into memory addresses, i.e. “replaces letters
with numbers”. In contrast, object files are “before-linking” and have to retain
the original symbols; otherwise the linker cannot resolve them. Hence a DLL is much
closer to an EXE than to a LIB in spite of its name as “library”. The “dynamic
linking” part is actually done by the system loader at runtime (some
fixups/relocations), not by the linker pass. You can say this is an M$ trick
that uses misleading terms to confuse people and conceal the technical gist, which
they are very good at.
From a hacker’s point of view, this means the later the
stage, the higher the entropy and the less the information. In particular, a LIB
provides more clues to us than a DLL does. A DLL exports only public APIs and hides
the private ones (e.g. in cmath.dll only IMSL APIs are exported
while FLEXlm functions are kept internal), but LIB symbols include both. In
fact, to build a program that calls DLL APIs we need to pass its import library
to the linker[5]. An import library is a COFF LIB file that serves as a symbol
reference (by pointing to the DLL) and does not contain function bodies.
Now we can find out what functions there are in a library
file, locate the function of interest, and even extract the corresponding object
file (only for LIB). The following is an example with imsl_f_lin_sol_gen(). As we
said before, debug info, which resides in the PE debug section or in separate symbol
files, is the best we can get short of source code (debug info lies
roughly between the first and second stages). Nevertheless, what we obtain here
is still very important in reverse engineering (debug-build symbols are a
superset of COFF symbols).
F:\>dumpbin /archivemembers /symbols %vni_dir%\cnl55\lib\cmath_s.lib | egrep "member|imsl_f_lin_sol_gen"
... ...
Archive member name at 1C8CEE: vc++\gmres.obj/
00B 00000000 SECT3 notype () External | _imsl_f_lin_sol_gen_min_residual
... ...
Archive member name at 1DAAAA: vc++\fspgen.obj/
00B 00000000 SECT3 notype () External | _imsl_f_lin_sol_gen_coordinate
05A 00000000 UNDEF notype () External | _imsl_f_lin_sol_gen
... ...
Archive member name at 1FC10E: /3128 vc++\fdmbndg.obj
007 00000000 SECT2 notype () External | _imsl_f_lin_sol_gen_band
... ...
Archive member name at 58AC3E: /6132 vc++\flinslg.obj
007 00000000 SECT2 notype () External | _imsl_f_lin_sol_gen
... ...
F:\>dumpbin /exports %vni_dir%\cnl55\bin\cmath.dll | grep -i imsl_f_lin_sol_gen
438 1B5 000432C0 imsl_f_lin_sol_gen
439 1B6 0023B130 imsl_f_lin_sol_gen_band
440 1B7 0024AE30 imsl_f_lin_sol_gen_coordinate
441 1B8 00259C30 imsl_f_lin_sol_gen_min_residual
F:\>lib /extract:vc++\flinslg.obj %vni_dir%\cnl55\lib\cmath_s.lib
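The “pile of OBJs” structure that these commands walk through is simple enough to read by hand. The following Python sketch builds a tiny archive in memory and lists its members; the helper names make_member and list_members and the sample member contents are ours, and the sketch assumes the common archive layout (an 8-byte “!<arch>” magic followed by 60-byte text headers) with short member names only. It ignores the linker members and the long-name table that a real .LIB also carries (which is where names such as “/3128” in the output above point).

```python
MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def make_member(name: str, data: bytes) -> bytes:
    """Build one archive member: 60-byte text header, data, padding to an even length."""
    header = (
        f"{name + '/':<16}"   # member name, terminated by '/'
        f"{0:<12}"            # timestamp
        f"{'':<6}{'':<6}"     # user id, group id
        f"{0:<8}"             # file mode
        f"{len(data):<10}"    # size of the member data in bytes
        "`\n"                 # end-of-header marker
    ).encode("ascii")
    assert len(header) == HEADER_SIZE
    return header + data + (b"\n" if len(data) % 2 else b"")


def list_members(archive: bytes):
    """Return (offset, name, size) for every member header found in the archive."""
    if not archive.startswith(MAGIC):
        raise ValueError("not an archive file")
    members = []
    offset = len(MAGIC)
    while offset + HEADER_SIZE <= len(archive):
        header = archive[offset:offset + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {offset:#x}")
        name = header[0:16].decode("ascii").rstrip()
        size = int(header[48:58].decode("ascii"))
        members.append((offset, name, size))
        offset += HEADER_SIZE + size + (size % 2)  # members start on even offsets
    return members


archive = MAGIC + make_member("a.obj", b"12345") + make_member("b.obj", b"abcd")
for offset, name, size in list_members(archive):
    print(f"Archive member name at {offset:X}: {name} ({size} bytes)")
```

The offsets printed this way play the same role as the “Archive member name at 1C8CEE” values reported by dumpbin /archivemembers.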
IDA FLAIR moves one step further. Although in source code
an API call looks like “x = imsl_f_lin_sol_gen(n, a, b, 0);”, it
appears in the disassembly as “call 004033B0” (static linking) or “call
[00402054]” (dynamic linking). Appropriate labeling of function
names alongside their memory addresses can drastically ease assembly
analysis, and that’s exactly what FLAIR does. Most debuggers have such
commenting functionality, but it is usually restricted to exported APIs. IDA tries to
extend it to include as many functions as possible.
The idea of FLAIR is to create a “signature” for every
identifiable library function so that when IDA analyzes the assembly code it
can recognize and label it. It is essentially a pattern recognition problem, as
its name indicates. Again, it works only for LIB, not for a release-version DLL,
due to their content differences. We must say that a DLL does have advantages
such as code sharing and a simpler main program. For instance, the size of the
statically linked cmath.exe is about 700KB but that of the
dynamically linked cmath.exe is less than 4KB. But as far as
cracking is concerned, LIB is way better than DLL (when tracing a DLL-linked
application, most of the time we are in the 10000000+ or 80000000+ area instead of
the familiar 00400000+ region; in the DLL-version cmath.exe the
instruction “102D12BA: call 102D7980” returns
FFFFFFF8).
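The signature idea can be illustrated with a much simplified sketch in Python. This is not IDA's actual signature file format or matching algorithm, and the real tool is more elaborate; the pattern syntax (two dots for a byte that varies between builds), the function name example_func, and the byte strings are assumptions chosen for illustration. The point is that bytes fixed by the compiler are matched exactly while link-time dependent bytes, such as a call displacement, are treated as wildcards.

```python
def parse_pattern(text: str):
    """Turn '558BECE8........5DC3' into byte values, with None for each '..' wildcard."""
    pairs = [text[i:i + 2] for i in range(0, len(text), 2)]
    return [None if pair == ".." else int(pair, 16) for pair in pairs]


def matches(code: bytes, pattern) -> bool:
    """True if the leading bytes of `code` agree with every non-wildcard pattern byte."""
    if len(code) < len(pattern):
        return False
    return all(expected is None or expected == actual
               for expected, actual in zip(pattern, code))


# Hypothetical signature: push ebp / mov ebp, esp / call rel32 / pop ebp / ret.
# The four displacement bytes of the call change with every link, so they are wildcards.
SIGNATURES = {"example_func": parse_pattern("558BECE8........5DC3")}


def identify(code: bytes):
    """Return the names of all signatures that match the start of `code`."""
    return [name for name, pattern in SIGNATURES.items() if matches(code, pattern)]


print(identify(bytes.fromhex("558BECE83E6700005DC3")))  # ['example_func']
print(identify(bytes.fromhex("558BECE8112233005DC3")))  # ['example_func']
print(identify(bytes.fromhex("558BECE93E6700005DC3")))  # []
```

The first two inputs differ only in the call displacement and are both recognized; the third uses a different opcode (E9 instead of E8) and is not.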
IDA FLAIR is not perfect (it can’t handle DLLs, some
functions can’t be identified, and false recognition can happen), yet it is very
practical. Its original goal is to isolate boilerplate APIs (such as Win32,
MFC, ATL, etc.[6]) so that people can focus on the main program algorithm
instead of those standard library
functions. In our case we are more interested in getting the FLEXlm functions
highlighted so we don’t need to step into every call to get the big picture of
the whole code maze. In practice we created signature files for cmath_s.lib and lmgr.lib, applied
them to cmath.exe in IDA, and FLAIR did very well:
it recognized most FLEXlm functions. As an outstanding static analysis
toolbox, IDA also offers a WinGraph32 feature called “Display Flow Chart”. I
found it especially useful for understanding the code when
viewed side by side with the source.
[1] Note the difference between IMSL and MKL. MKL functions are low-level, fundamental
subroutines like BLAS, LAPACK, FFT, etc. IMSL, on the other hand, contains many
more higher-level functions such as differential equation solvers and
statistical regressions. MKL provides good building blocks for many IMSL
functions.
[2] Remember that the x86 architecture is little-endian. In some applications searching for a binary string
can be tricky because we need to reverse the byte order. Fortunately UltraEdit
handles it well and we can enter the string as is.
[3] API calls in a DLL can be based on ordinals (indexes). Since ordinals map
bijectively to function names, this is an indirect way of calling by name.
However, COM DLL APIs may not be externally visible; see footnote 5.
[4] Here cmath_s.lib and cmath.dll are the
“real thing” while cmath.lib is just the import library for cmath.dll.
[5] This refers to a traditional SDK DLL. Unlike an SDK DLL, where the PE exports section is
indispensable, the newer COM DLL employs a totally different calling mechanism
called automation. Member methods are invoked through interface pointers rather
than being exported directly. Thus the tightly encapsulated COM DLL gives us
even less info and more of a challenge.
[6] They claim that modern real-life applications contain 50+% of such standard API
calls; see [6].