On Software Reverse Engineering

This article discusses methods of software reverse engineering, using the FLEXlm licensing system as a case study.

Problem Description

I downloaded IMSL CNL (C Numerical Library) 5.5 from ftp.vni.com, but it is protected by FLEXlm. There are different binary downloads for different operating systems and compilers, but the general cracking techniques apply across platforms (in our case the FLEXlm license file is certainly platform independent). Here we address CNL for Microsoft Windows and Visual Studio. The product mainly consists of libraries (static and dynamic) of mathematical and statistical subroutines under the FLEXlm feature names “CMATH” and “CSTAT”. The x86 distribution also has an optimized version of CMATH called “CMPERF” that uses the bundled Intel MKL 6.1 to achieve high performance[1]. No additional licensing is needed for CMPERF. Since CMATH and CSTAT use the same licensing mechanism, we will focus on CMATH from now on; the procedure for CSTAT is analogous.

The setup provides a simple program, cmath.c, for validation purposes. Without loss of generality, we link its object file with cmath_s.lib (the static library) to get the executable. It calls imsl_f_lin_sol_gen() and checks the license file; if the license file is not valid, an error is reported.

Searching the web, we found the following license file for IMSL CNL 5.0:

SERVER hostname hostid 27000
DAEMON VNI "<vni_dir>\license\bin\bin.i386nt\vni.exe"
FEATURE CMATH VNI 5.0 permanent uncounted 3F23BE3056E4 HOSTID=ANY
FEATURE CSTAT VNI 5.0 permanent uncounted 2C60CD4570B0 HOSTID=ANY

We tried it and, as expected, got a “version incorrect” error. After replacing 5.0 with 5.5 we got an “incorrect softkey code” error, so the naïve approach obviously does not work. In fact it takes quite sophisticated work to beat FLEXlm. There are different levels of software cracking, and the associated complexity ranges from relatively simple to dauntingly difficult, as we will see later. We now list our task and the tools used.

Target: Visual Numerics IMSL CNL 5.5

Protection: Macrovision FLEXlm 9.2

Tools:
    Microsoft Visual Studio 7.1 (CL, NMAKE, DUMPBIN, LIB, …)
    RedHat Cygwin 1.3.5
    IDM UltraEdit 9.0
    Datarescue IDA Pro 4.3 (FLAIR, …)
    URSoft W32Dasm 8.9
    Sysinternals File Monitor 6.0

Resources:
    Macrovision FLEXlm 9.2 SDK source code
    Macrovision FLEXlm 8.1 SDK binary release

Preliminary Attempts

All hacking starts with gathering information. For example, it is easy to find out that the target employs FLEXlm 9.2 for license management by searching cmath_s.lib in UltraEdit. Reading the relevant literature is also very helpful. There are some excellent essays in [4] describing previous attacks against earlier versions of FLEXlm, and they contain a lot of valuable knowledge.

In practice I loaded cmath.exe into the W32Dasm debugger and started tracing fairly soon. Tracing through jungles of assembly code is a horrendous experience: you can spend hours jumping back and forth without getting any clue about what is really going on. But I managed to figure out something anyway (with the help of File Monitor).

004133DE  E89D2F0100      call 00426380                      ;read license.dat
...
00413411 – 0041345B  ... ...                                 ;a loop read out “CMATH”
...
004138CD  E83E670000      call 0041A010                      ;return must be 0 to pass check
004138D2  83C41C          add esp, 0000001C
004138D5  8945F0          mov dword ptr [ebp-10], eax        ;return value stored in EAX
004138D8  837DF000        cmp dword ptr [ebp-10], 00000000
004138DC  0F848C070000    je 0041406E                        ;proceed to imsl_f_lin_sol_gen()
...
00413907  8B8D18F7FFFF    mov ecx, dword ptr [ebp+FFFFF718]
0041390D  33C0            xor eax, eax
0041390F  8A81A7444100    mov al, byte ptr [ecx+004144A7]
00413915  FF24857F444100  jmp dword ptr [4*eax+0041447F]     ;jump to error message

Apparently subroutine 0041A010 is the key. Stepping into it reveals a more intricate structure: there are complicated call chains inside. In our run the procedure returns FFFFFFF8 (that is, -8), which should be the error code. So we set a breakpoint at 004138D5, use W32Dasm’s “Modify Data” button to change the value of the EAX register to 0, and let the program continue. Bang! The program is fooled and yields the correct result as if we had the right license data.
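
The value FFFFFFF8 is the 32-bit two's-complement encoding of -8, which is how a debugger shows a negative return code held in EAX. A minimal Python sketch of the conversion (the helper name to_signed32 is ours, used only for illustration):

```python
def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    # If the sign bit (bit 31) is set, subtract 2**32 to get the negative value.
    return value - 0x100000000 if value & 0x80000000 else value


print(to_signed32(0xFFFFFFF8))  # -8
print(to_signed32(0x00000000))  # 0
```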

Now things become clear: we can patch the code section 004138D2 to 004138DC so that the return value is always treated as 0. To do that we consult [5] for the detailed x86 instruction formats of MOV and JMP; the modified code follows. Note that a NOP is added to fill the extra byte left by the change, so that the modified executable has the same length as the old one. In fact the two differ only in the three lines that we have changed.

004138CD  E83E670000      call 0041A010             ;return must be 0 to pass check
004138D2  83C41C          add esp, 0000001C         ;stack pointer adjustment
004138D5  C745F000000000  mov [ebp-10], 00000000    ;pretend the return value is 0
004138DC  90              nop                       ;patch the extra byte to maintain code alignment
004138DD  E98C070000      jmp 0041406E              ;unconditional jump to imsl_f_lin_sol_gen()
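
The target addresses shown in the listings can be checked against the raw bytes. A near CALL, JMP or JE stores a little-endian 32-bit displacement that is relative to the address of the next instruction. The sketch below decodes that displacement; the function name branch_target is hypothetical and the byte strings are copied from the listings.

```python
import struct


def branch_target(address: int, code: bytes, opcode_len: int) -> int:
    """Return the destination of a relative branch with a 32-bit displacement.

    address    -- address of the first byte of the instruction
    code       -- the instruction bytes (opcode followed by rel32)
    opcode_len -- number of opcode bytes before the displacement
    """
    # "<i" reads a little-endian signed 32-bit integer.
    (rel32,) = struct.unpack_from("<i", code, opcode_len)
    next_instruction = address + opcode_len + 4
    return (next_instruction + rel32) & 0xFFFFFFFF


# E8 = call rel32, 0F 84 = je rel32, E9 = jmp rel32
print(hex(branch_target(0x004138CD, bytes.fromhex("E83E670000"), 1)))    # 0x41a010
print(hex(branch_target(0x004138DC, bytes.fromhex("0F848C070000"), 2)))  # 0x41406e
print(hex(branch_target(0x004138DD, bytes.fromhex("E98C070000"), 1)))    # 0x41406e
```

Because the displacement is relative to the next instruction, the same four displacement bytes reach the same destination only if the instruction ends at the same address, which is why the one-byte NOP precedes the shorter JMP.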

Of course patching cmath.exe is just a proof of concept; the real goal is to patch the library itself. cmath.c per se is embarrassingly short, and virtually all the contents of cmath.exe come from cmath_s.lib, including both the IMSL functions and the FLEXlm code. To locate the position of the above code, we search for the byte string 837DF0000F848C070000 (the code for the CMP and JE lines) in UltraEdit[2]. This leads us to the unique location in cmath_s.lib (file offset 00106F70 – 00106F80) where that code lies. Change the bytes accordingly and we get a patched library. When we test the program again by linking it to the patched library, everything works fine even though our license file is invalid.

So we have had our first success. Next we could do the same thing to the DLL version of the library and, ideally, develop a utility to do that automatically rather than manually, but we will omit that for now. The point is that patching is usually the easiest and first step of cracking; it requires little insight into the protection scheme. Moreover, a patch works only for one particular binary target: the patching utility cannot work for different versions or on different platforms. Powerful and effective as it is, patching is far from full reverse engineering.

More Analysis

We fooled FLEXlm by passing a fake “OK” to it, but we still don’t know the true license code. Most software protection works this way: based on the user profile (name, organization, purchased feature, etc.) and a certain algorithm, the program calculates some hash, checksum, license code or signature and compares it to the one the user provides. If they match, the user is authenticated. This mechanism is common to almost all the software we have seen, and FLEXlm is no different. The code 3F23BE3056E4 in our license file is the “SIGN=” signature; it is just the wrong one.
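
To make the compute-and-compare mechanism concrete, here is a minimal sketch of such a check. It is not FLEXlm's actual algorithm: the secret, the message layout, the 12-digit truncation and the use of HMAC-SHA256 are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib
import hmac


def expected_signature(secret: bytes, feature: str, version: str, digits: int = 12) -> str:
    """Derive a short hex signature from the licensed fields and a vendor secret."""
    message = f"{feature}|{version}".encode("ascii")  # assumed message layout
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return digest[:digits].upper()


def license_is_valid(secret: bytes, feature: str, version: str, supplied: str) -> bool:
    """Recompute the signature and compare it with the one from the license file."""
    computed = expected_signature(secret, feature, version)
    return hmac.compare_digest(computed, supplied.upper())


secret = b"demo-secret"  # stands in for the vendor's private data
good = expected_signature(secret, "CMATH", "5.5")
print(license_is_valid(secret, "CMATH", "5.5", good))
print(license_is_valid(secret, "CMATH", "5.0", good))
```

In a scheme of this shape the version string is part of the signed message, so editing the version in the license file without recomputing the signature makes the comparison fail.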

One more word about FLEXlm: it is called FLEXlm because it claims to provide a flexible solution to commercial software license management. It does put a lot of effort into handling various situations (counted/uncounted, feature/incremental, server/local, borrowing, trial, mobile, …), but we are not interested in those things. What we want is the right signature that enables us to run the specified target anytime, anywhere.

Since we have the source code of FLEXlm SDK 9.2, the logical thing to do is to read it. The FLEXlm SDK is what Macrovision gives to its clients, the so-called “vendors”, to help them ship their own “vendor software” to the “end user”. In our case Visual Numerics is the vendor, IMSL CNL 5.5 is the vendor software, and we are the end users. Each vendor has a unique vendor name or vendor ID. As seen in the license, here it is “VNI”. Usually vendors get only a binary release or partial source of the FLEXlm SDK, but we had the luck to obtain the whole source, which is what FLEXlm is all about.

However, it turns out that the all-C source (no C++) is not very readable. The project has evolved over more than a decade (version 1.0 was released in 1988); old and new functions overlap and intertwine like spaghetti, often with unnecessary redundancy; it is poorly commented and some old-style coding conventions are very bad; the overuse of macros and preprocessing directives is very annoying. Developed originally on UNIX platforms, it was ported to the Windows environment using the NMAKE utility. To build and debug such a large application efficiently we need a good IDE, but there is no IDE under Windows that can take in the Makefile directly. As Visual Studio is possibly the best IDE on Windows and has a “Makefile Project” capability, we set out to create a VS7 project for the FLEXlm SDK. It took me some time to do that (some errors in the makefiles needed fixing), but once it is done it is really convenient.

The core component of FLEXlm is lmgr.lib (or lmgr9a.dll), on which all the others heavily depend. Vendors and end users are more familiar with tools like lmgrd.exe, lmtools.exe, lmnewgen.exe, makekey.exe, etc. After a successful build, we tried to generate a VNI license file but failed because we don’t have their vendor keys and seeds. According to [2], [3], [4], each vendor receives 5 vendor keys (VENDOR_KEY1, … VENDOR_KEY5) from Macrovision and chooses 3 random seeds itself (LM_SEED1, LM_SEED2, LM_SEED3). These eight numbers are placed in lm_code.h and then encrypted, obfuscated, and finally built into the target as well as the tools that generate the license. Our job is of course to recover these numbers, but how?

Or we can try a less difficult way: since the target will calculate the real signature and compare it against the one it reads from the license file, we can catch the real signature when the comparison takes place, without knowing the vendor keys and seeds. This is not as easy as it may seem; it requires us to set the breakpoint in the right place at the right time. After hours of tracing we determined that simply following the instruction flow was a dead end: it is hopeless to locate the comparison code this way unless FLEXlm uses some standard API like ANSI strncmp() or Win32 CompareString() (actually FLEXlm defines its own macro STRNCMP in l_privat.h).

This brings us to a fundamental question in reverse engineering: how can we make sense of the chaotic, high-entropy assembly code? There is no complete answer to this question, and we refer the interested reader to [7] for some serious theoretical discussion. Here we want to focus on one practical technology for this issue: FLAIR (a.k.a. FLIRT, cf. [6]), introduced by IDA Pro.

We all know that debugging is much easier if the application is built as a debug version and symbol files (.PDB, .DBG files) are available. Symbols are information about the program, including identifier (variable, function) names and memory offsets, source code line numbers, etc. In a debug build the compiler and linker save this information either into the application binary or into separate symbol files. They enable debuggers to present the user with code that closely resembles the original source (or even better, source-level debugging as in VS). Naturally, symbols are stripped from the software releases delivered to end users, as in our case.

But this is not the end of the story. Even without symbols, we can still get something useful from the binary files, especially libraries. On Windows .EXE and .DLL are in PE format while .OBJ and .LIB (a LIB is no more than a pile of OBJs stacked together) are in COFF format; in both formats library calls are made via function names and arguments, which have to be publicly visible[3]. For that purpose PE has imports and exports sections and COFF has a symbol table (note that PE more or less contains COFF as a subdivision). Visual Studio offers a couple of commands to explore them[4]:

F:\>dumpbin /disasm %vni_dir%\cnl55\cmath.exe
F:\>dumpbin /rawdata %vni_dir%\cnl55\cmath.exe
F:\>dumpbin /imports %vni_dir%\cnl55\cmath.exe
F:\>dumpbin /exports %vni_dir%\cnl55\bin\cmath.dll
F:\>dumpbin /imports %vni_dir%\cnl55\bin\cmath.dll
F:\>dumpbin /exports %vni_dir%\cnl55\lib\cmath.lib
F:\>dumpbin /symbols %vni_dir%\cnl55\lib\cmath_s.lib
F:\>dumpbin /linkermember %vni_dir%\cnl55\lib\cmath.lib
F:\>dumpbin /linkermember %vni_dir%\cnl55\lib\cmath_s.lib
F:\>dumpbin /archivemembers %vni_dir%\cnl55\lib\cmath.lib
F:\>dumpbin /archivemembers %vni_dir%\cnl55\lib\cmath_s.lib
F:\>lib /list %vni_dir%\cnl55\lib\cmath.lib
F:\>lib /list %vni_dir%\cnl55\lib\cmath_s.lib
F:\>lib /extract:vc++\flexlm.obj %vni_dir%\cnl55\lib\cmath_s.lib
F:\>dumpbin /symbols /disasm flexlm.obj
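
That a LIB is a pile of OBJs can be seen from its container layout, which is the common ar archive format: an 8-byte magic string followed by members, each preceded by a 60-byte text header. The sketch below lists raw member names and sizes. It builds a tiny synthetic archive rather than reading a real library, it does not resolve the long-name table that real LIB files use for names such as /6132, and the helper names are ours.

```python
MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def make_member(name: str, payload: bytes) -> bytes:
    """Build one archive member: a 60-byte header plus payload, padded to even length."""
    # Fields: name(16) date(12) uid(6) gid(6) mode(8) size(10) end-marker(2)
    header = f"{name:<16}{0:<12}{0:<6}{0:<6}{0:<8}{len(payload):<10}".encode("ascii") + b"`\n"
    padding = b"\n" if len(payload) % 2 else b""
    return header + payload + padding


def list_members(data: bytes):
    """Return (raw_name, size) for every member found in an ar-style archive."""
    if not data.startswith(MAGIC):
        raise ValueError("not an archive")
    members = []
    pos = len(MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header")
        name = header[0:16].decode("ascii").rstrip()
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, size))
        pos += HEADER_SIZE + size
        pos += pos % 2  # members start on even offsets
    return members


archive = MAGIC + make_member("vc++\\gmres.obj/", b"abc") + make_member("flexlm.obj/", b"data")
for name, size in list_members(archive):
    print(name, size)
```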

We must stress the difference between LIB and DLL here; it is more than merely static linking vs. dynamic linking. There are generally three stages in developing a program:

Original: Source File (C, H, C++…), ASCII format
Intermediate: Object File (OBJ, LIB…), COFF format
Final: Image File (EXE, DLL, SYS…), PE format

The compiler processes source files to produce object files, the linker takes in object files to output an image file, and the loader loads the image file from disk into memory. Note that a DLL has already been processed by the linker (it is “after-linking”), which turns identifiers into memory addresses, or “replaces letters with numbers”. In contrast, object files are “before-linking” and have to retain the original symbols; otherwise the linker cannot resolve them. Hence a DLL is much closer to an EXE than to a LIB in spite of being named a “library”. The “dynamic linking” part is actually done by the system loader at runtime (some fixups/relocations), not by the linker pass. You can say this is an M$ trick that uses misleading terms to confuse people and conceal the technical gist, which they are very good at.

From a hacker’s point of view, this means the later the stage, the higher the entropy and the less the information. In particular, a LIB provides more clues than a DLL does. A DLL exports only public APIs and hides the private ones (e.g. in cmath.dll only IMSL APIs are exported while FLEXlm functions are kept internal), but LIB symbols include both. In fact, to build a program that calls DLL APIs we need to pass its import library to the linker[5]. An import library is a COFF LIB file that serves as a symbol reference (by pointing to the DLL) and does not contain function bodies.

Now we can find out what functions are in a library file, locate the function we care about, and even extract the corresponding object file (only for a LIB). The following is an example with imsl_f_lin_sol_gen(). As we said before, debug info, which resides in the PE debug section or in separate symbol files, is the best we can get short of the source code itself (debug info lies roughly between the first and second stages). Nevertheless, what we obtain here is still very important in reverse engineering (debug-build symbols are a superset of COFF symbols).

F:\>dumpbin /archivemembers /symbols %vni_dir%\cnl55\lib\cmath_s.lib | egrep "member|imsl_f_lin_sol_gen"
... ...
Archive member name at 1C8CEE: vc++\gmres.obj/
00B 00000000 SECT3  notype ()    External     | _imsl_f_lin_sol_gen_min_residual
... ...
Archive member name at 1DAAAA: vc++\fspgen.obj/
00B 00000000 SECT3  notype ()    External     | _imsl_f_lin_sol_gen_coordinate
05A 00000000 UNDEF  notype ()    External     | _imsl_f_lin_sol_gen
... ...
Archive member name at 1FC10E: /3128           vc++\fdmbndg.obj
007 00000000 SECT2  notype ()    External     | _imsl_f_lin_sol_gen_band
... ...
Archive member name at 58AC3E: /6132           vc++\flinslg.obj
007 00000000 SECT2  notype ()    External     | _imsl_f_lin_sol_gen
... ...

F:\>dumpbin /exports %vni_dir%\cnl55\bin\cmath.dll | grep -i imsl_f_lin_sol_gen
        438  1B5 000432C0 imsl_f_lin_sol_gen
        439  1B6 0023B130 imsl_f_lin_sol_gen_band
        440  1B7 0024AE30 imsl_f_lin_sol_gen_coordinate
        441  1B8 00259C30 imsl_f_lin_sol_gen_min_residual

F:\>lib /extract:vc++\flinslg.obj %vni_dir%\cnl55\lib\cmath_s.lib
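
In the symbol listing, a section name such as SECT2 marks the member that defines a symbol, while UNDEF marks a member that merely references it. That is how flinslg.obj is picked as the object to extract. A minimal sketch of the lookup follows; it parses a sample copied from the output above, and the regular expressions and the function name find_definitions are our own assumptions about the line layout, not part of any tool.

```python
import re

SAMPLE = r"""Archive member name at 1C8CEE: vc++\gmres.obj/
00B 00000000 SECT3  notype ()    External     | _imsl_f_lin_sol_gen_min_residual
Archive member name at 1DAAAA: vc++\fspgen.obj/
00B 00000000 SECT3  notype ()    External     | _imsl_f_lin_sol_gen_coordinate
05A 00000000 UNDEF  notype ()    External     | _imsl_f_lin_sol_gen
Archive member name at 58AC3E: /6132           vc++\flinslg.obj
007 00000000 SECT2  notype ()    External     | _imsl_f_lin_sol_gen
"""

MEMBER_RE = re.compile(r"^Archive member name at [0-9A-F]+:\s*(.*)$")
SYMBOL_RE = re.compile(r"^[0-9A-F]+\s+[0-9A-F]{8}\s+(\S+)\s.*\|\s*(\S+)\s*$")


def find_definitions(listing: str, symbol: str):
    """Return the archive members in which symbol is defined (section is not UNDEF)."""
    current_member = None
    found = []
    for line in listing.splitlines():
        member = MEMBER_RE.match(line)
        if member:
            # Take the last token so that a leading long-name offset is skipped.
            current_member = member.group(1).split()[-1].rstrip("/")
            continue
        entry = SYMBOL_RE.match(line)
        if entry and entry.group(2) == symbol and entry.group(1) != "UNDEF":
            found.append(current_member)
    return found


print(find_definitions(SAMPLE, "_imsl_f_lin_sol_gen"))
```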

IDA FLAIR moves one step further. Although in source code an API call looks like “x = imsl_f_lin_sol_gen(n, a, b, 0);”, it appears in the disassembly as “call 004033B0” (static linking) or “call [00402054]” (dynamic linking). Labeling memory addresses with the appropriate function names can drastically ease the assembly analysis, and that is exactly what FLAIR does. Most debuggers have such commenting functionality, but it is usually restricted to exported APIs. IDA tries to extend it to include as many functions as possible.

The idea of FLAIR is to create a “signature” for every identifiable library function so that when IDA analyzes the assembly code it can recognize and label it. It is essentially a pattern recognition problem, as its name indicates. Again it works only for a LIB, not for a release-version DLL, due to their content differences. We must say that a DLL does have advantages such as code sharing and a simpler main program. For instance, the size of the statically linked cmath.exe is about 700KB but that of the dynamically linked cmath.exe is less than 4KB. But as far as cracking is concerned, a LIB is way better than a DLL (when tracing a DLL-linked application, most of the time we are in the 10000000+ or 80000000+ area instead of the familiar 00400000+ region; in the DLL version of cmath.exe the instruction “102D12BA: call 102D7980” returns FFFFFFF8).
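
The signature idea can be illustrated with a toy byte-pattern matcher. This is only a sketch of the general technique, not IDA's actual algorithm or file format: each function is described by its leading bytes, with “..” standing for bytes that vary between builds (for example the relocated operand of a CALL). The function names and the example signature are hypothetical.

```python
def compile_pattern(text: str):
    """Turn '55 8B EC E8 .. .. .. ..' into a list of byte values, None for wildcards."""
    text = text.replace(" ", "")
    pairs = [text[i:i + 2] for i in range(0, len(text), 2)]
    return [None if pair == ".." else int(pair, 16) for pair in pairs]


def matches_at(code: bytes, offset: int, pattern) -> bool:
    """Check whether pattern matches code starting at offset."""
    if offset + len(pattern) > len(code):
        return False
    return all(p is None or code[offset + i] == p for i, p in enumerate(pattern))


def label_functions(code: bytes, signatures: dict) -> dict:
    """Return {offset: name} for every place where a known signature matches."""
    compiled = {name: compile_pattern(text) for name, text in signatures.items()}
    labels = {}
    for offset in range(len(code)):
        for name, pattern in compiled.items():
            if matches_at(code, offset, pattern):
                labels[offset] = name
    return labels


# push ebp; mov ebp, esp; call <rel32>; pop ebp; ret (the call operand is wildcarded)
signatures = {"_demo_check": "55 8B EC E8 .. .. .. .. 5D C3"}
code = bytes.fromhex("90 90 55 8B EC E8 11 22 33 44 5D C3 90")
print(label_functions(code, signatures))  # {2: '_demo_check'}
```

A real tool has to deal with many functions sharing the same leading bytes and with very short functions, which is one source of the missed and false recognitions mentioned below.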

IDA FLAIR is not perfect (it can’t handle DLLs, some functions can’t be identified, false recognition can happen, …), yet it is very practical. Its original goal is to isolate boilerplate APIs (such as Win32, MFC, ATL, etc.[6]) so that people can focus on the main program algorithm instead of those standard library functions. In our case we are more interested in getting the FLEXlm functions highlighted so that we don’t need to step into every call to get a big picture of the whole code maze. In practice we created signature files from cmath_s.lib and lmgr.lib, applied them to cmath.exe in IDA, and FLAIR did very well: it recognized most FLEXlm functions. As an outstanding static analysis toolbox, IDA also offers a WinGraph32 feature called “Display Flow Chart”. I found it especially useful for understanding the code when viewed side by side with the source.

[1] Note the difference between IMSL and MKL. MKL functions are low-level, fundamental subroutines like BLAS, LAPACK, FFT, etc. IMSL, on the other hand, contains many more higher-level functions such as differential equation solvers and statistical regressions. MKL serves as a good set of building blocks for many IMSL functions.

[2] Remember that the x86 architecture is little-endian. In some applications searching for a byte string can be tricky because we need to reverse the byte order. Fortunately UltraEdit handles it well and we can enter the string as is.

[3] API calls in a DLL can also be based on ordinals (indexes). Since ordinals map bijectively to function names, this is an indirect way of calling by name. However, COM DLL APIs may not be visible from outside; see footnote 5.

[4] Here cmath_s.lib and cmath.dll are the “real thing” while cmath.lib is just the import library for cmath.dll.

[5] This refers to a traditional SDK DLL. Unlike an SDK DLL, where the PE exports section is indispensable, the newer COM DLL employs a totally different calling mechanism called automation. Member methods are invoked through interface pointers rather than being exported directly. Thus the tightly encapsulated COM DLL gives us even less information and more of a challenge.

[6] They claim that modern real-life applications contain 50+% of such standard API calls; see [6].
