On Software Reverse Engineering

Further Discussions

There are plenty of things worth discussing even though we

have fully reverse engineered the FLEXlm protection system. The most prominent

one is, where is VENDOR_KEY5? All essays on [4] say the 5th key is used to xor

encryption seeds and offer various techniques to uncover it (I tried some of

them but none worked). The source code also strongly suggests this, with all the

names like L_UNIQ_KEY5_FUNC, key5(), key5_uniqx, key5_order[], VKEY5(), etc.

Yet we were able to generate a correct license file without even knowing

VENDOR_KEY5. Isn’t that strange?

The only logical explanation is that VENDOR_KEY5 is

abandoned in the new version. In early versions (pre-8.0?) VENDOR_KEY5 was

truly vital in seed obfuscation, but it is dropped in newer versions, perhaps

as a countermeasure to hackers. In key5()

encryption seeds are xor-ed (partly) with code->keys[0] and code->keys[1], the

first two vendor keys rather than the fifth. Consequently specific tricks in

[4] are largely out of date. However, the old source code is preserved in

FLEXlm, bewildering everyone who tries to read it. You may argue this is done

intentionally to mislead hackers, but I think it is more likely attributable to

bad project management at Macrovision.

The old encoding/decoding algorithms include several

functions that are untouched in the checkout process: VKEY5(), l_svk(), l_key() and l_zinit().

Remember the constants x and z that are

never used? They’re updated in every version upgrade (see history comment) but

irrelevant in our practice. The ironic thing is that despite the alias L_UNIQ_KEY5_FUNC, l_n36_buff() has

nothing to do with VENDOR_KEY5 at all. In FLEXlm dialect KEY5 has become a

(false) symbol for obfuscation.

Back to the latest code, l_string_key() deserves

a second look. Before L_MOVELONG there is a test on (job->flags

& LM_FLAG_MAKE_OLD_KEY). The macro is defined as 0x00100000 in l_privat.h, and in

reality the tracing result was job->flags = 0x00104840 for cmath.exe and job->flags

= 0x00944000 for lmcrypt.exe, so they both enter the if{} block.

As the flag name indicates, we are making the old type license key – which is

no surprise, after all our keys are short and non-CRO in the first place. But

what’s the new type?

The difference between the new- and old-style licenses lies

in the “SIGN=” literal. The old key is a standalone hash string while the new

one has a “SIGN=” prefix. This seemingly minor detail actually matters a lot.

As demonstrated above, lmcrypt.exe determines the license

type by the existence of “SIGN=” and the new checksum for CMATH 5.5 is

SIGN=B5E1542279DC. Put it in a license file and cmath.exe pops up

the error message. However, this format incompatibility is restricted to the

keygen process; the checkout process only cares about the signature itself. In

other words, cmath.exe would not complain about a

“SIGN=6D5C01FD71C9” license line.

In the l_string_key() parameters, code->data[] has the

peculiar pattern of {52xxxxB8, 75yyyy0F} where xxxx and yyyy are random at each

run. Why are only the middle bytes random, and not all of them? Well, it’s because the

xor operand from job->mem_ptr2_bytes[] is

00zzww00 where zz = ww is random. But then why does that 32-bit word have zero

head and tail with two middle bytes equal? Believe it or not, it’s pure

coincidence.

First, in l_n36_buff() not all members of t->a[] (i.e. job->mem_ptr2_bytes[]) are

assigned to random values. This seems impossible because i loops

through the whole array in the generator uniqcode(). The

problem is the l_puts_rand1() afterward: it’s supposed to

shuffle and output the lines in random order, but the implementation may end up

writing only certain lines, depending on internal seeds. See lm_rand3.c – FLEXlm’s

proprietary RA (Random Algorithm) file – for source code. The seeds may be

modified by many functions, and when the time comes to output t->a[] it

happens to emit the following sequence: 0, 10, 3, 4, 6, 5, 5, 4, 5, 1, 2, 2.

Recall that the four byte indexes for the xor operand are 7, 3, 5, 11; no wonder t->a[3] and t->a[5] are

random while t->a[7] = t->a[11] = 0.

Second, we know the t->a[]

randomness comes from time(0). But the standard time function’s

resolution is only one second, which is rather crude for modern processors.

It takes less than 0.01s to execute l_n36_buff(), thus t->a[3] and t->a[5] are

assigned the same random value. All this explains the 00zzww00 (zz = ww)

pattern of the xor operand.

We want to point out that the l_puts_rand1()

implementation has to be classified as a bug. In fact the quality of the whole lm_rand3.c is quite

low; it gives too many predictable results to be called random. The preceding t->a[] sequence

is a good example: it’s partial (not covering the entire array), fixed (VNI and

I have the same l_n36_buff()), and weak (not realizing the

serious limitations of time(0)). Had it been more random, the

above exotic xor patterns would have disappeared. Macrovision would be much better off

if they just used Certicom’s library in lieu of their ugly code.

Macrovision did, however, do a terrific job in key/seed

obfuscation. They are initialized to shadow values and not recovered until the

last moment. This makes it easy for a careless cracker to fall into traps: if he/she

sets breakpoints arbitrarily, he/she will most likely intercept wrong

values. For instance, an article in [4] says FLEXlm verifies that the encryption seeds

are not the default ones, which is the following code section (0044414A – 00444289)

from l_init().

    if (!(job->options->flags & LM_OPTFLAG_CUSTOM_KEY5)
        && !L_STREQ(job->vendor, "demo")
        && (l_getattr(job, LMADMIN_API) != LMADMIN_API_VAL))
    {
        memcpy(&vc, &job->code, sizeof(vc));
        l_sg(job, job->vendor, &vc); /* calls l_key(), so it does not recover true seeds */
        if ((vc.data[0] == 0x87654321) || (vc.data[1] == 0x12345678))
        {
            LM_SET_ERRNO(job, LM_DEFAULT_SEEDS, 318, 0);
        }
        memset(&vc, 0, sizeof(vc));
    }

I interrupted it and got vc.data[0] =

2AD430F8, vc.data[1] = 0DF65D4F, which are not real seeds. Surely this is another

bug, because such validation should only appear on the vendor side (such as in lmnewgen.c) and

never on the user side; moreover, the above code in the target does not serve its

purpose. Of course if we look at the other side of the coin, we may say such

code successfully confuses hackers. I personally made many mistakes on this

issue.

In the end we succeeded in defeating FLEXlm protection at

three levels: 1. patch; 2. obtain license checksum; 3. obtain vendor keys and

seeds. The difficulty rises at each level. Patching is the easiest yet the

sharpest weapon of crackers. It may not be elegant but it’s very effective,

and its principles apply universally to all software protection systems. Levels 2

and 3 are more ambitious, and theoretically it is possible to devise a system

that is secure at these two levels.

As mentioned previously, FLEXlm carries quite a few

advanced features, especially CRO. Among the others, we think trash code should be

a top candidate. It has been proven to be very practical. With the

garbage-mixed core encryption/decryption algorithms, the target assembly alone

sheds little light on what’s really going on, and we have to resort to the

source code of lmnewgen.c to grasp it. If more junk were

added beyond lm_new.c the difficulty would increase

exponentially. Another practical choice is framework change at every new

version. In fact Macrovision did just that, but it is very troublesome to keep

new and old versions compatible (you do need to make your customers happy,

right?). Vendors can create their own filters too, by editing utils\pc.mak, which adds

one more layer of xors.

All these measures are practical, but they are just more

obfuscation, which has probably reached its limit. A new breakthrough requires new

theory, and that’s where CRO kicks in. CRO stands for Counterfeit Resistant

Option; it is an ECC public-key encryption system introduced in v7.2. We assume

that readers have a basic idea of how public-key algorithms work because

that’s not our topic here. We want to concentrate on its difference from the

traditional hashing method.

The central difference is symmetric vs. asymmetric and

one-way vs. bidirectional. Asymmetric encryption has two keys, public and

private. The CRO public key goes to the vendor software (checkout process) and

the private key is kept in lmcrypt.exe (keygen process) at the

vendor site. So even if the public key (which may be obfuscated) is compromised, the

private key is still safe provided the vendor doesn’t leak it. It is

computationally infeasible to derive the private key from the public key. In

contrast, symmetric encryption has only one set of keys, which is present in both

checkout and keygen processes. Once we discover it, it’s all done; that’s how

we accomplished level 3. But for CRO-enabled keys, level 3 is now officially

a daydream.

Level 2 is not any better. We said earlier that all

software protection must compare the right and the wrong to distinguish

legitimate and non-legitimate users. This is a true statement, but must “the

right and the wrong” be the right and wrong checksums? For hash calculation, the

answer is yes because it is a one-way function that cannot be performed in the

other way around. The implication is that the true hash code has to be calculated

on the fly, and a memory peek brings us to level 2. It is not that easy for CRO:

public-key encryption has the ability to go in either direction. The vendor keygen can

take the feature line ASCII as plaintext and encrypt it with the private key to

produce the license signature, which is only given to paying customers. Upon checkout

the vendor software reads the signature from the user license file as ciphertext,

decrypts it with the public key, and compares the decrypted string to the feature

line ASCII. In this way (digital signature), the real signature is never

calculated at the user site; what gets compared is just vanilla text visible to all

(note that here the feature line ASCII is the plaintext, but it’s publicly available;

the license signature is the cipher, but it’s the secret we want to protect; sort of

mind-boggling). So level 2 is also “mission impossible” now.

There is another way to prevent the direct comparison of

sensitive data. It’s widely used in password verification. At setup the

password is hashed (Windows) or used as key to encrypt a known string

(Unix/Linux, DES/Blowfish/…), and the resulting ciphertext is then stored. When a user

types in a password for login, the input goes through the same process and is

matched against the saved cipher. The plain password never gets compared. Does this

unidirectional plan contradict what we said? No. This plan will lose all the

flexibility of license management unless the ciphers are not saved in the vendor

software. If they are, then vendors must know the passwords prior to product

shipment, which means it can only be a static serial number. If not, then the

only imaginable place to store them is the vendor site. Vendors can maintain a

database of end user profiles including password ciphers, and the vendor software

can ask users to log in to the vendor website before real work begins. This may be

feasible (Microsoft already forces every user to activate Windows XP via the

Internet), but it will also provoke angry protest.

Fortunately we still have patching, the ultimate killer.

As long as we have the vendor software we can physically change it. We can also

tinker with CRO to replace the vendor public key with our own public key, but why bother

when we can patch much more conveniently, as described earlier? This is the

fundamental weakness of software sales and why piracy can only be tamed but

not eradicated. In the real world economic and legal measures are often more

useful in fighting piracy than technology. So much for the FLEXlm mechanism;

below we’re going to relax a little bit and offer our 2 cents on some issues.

We worked very hard to reverse engineer FLEXlm, and there is

no regret because our effort paid off. As a notable brand on the market, FLEXlm is

very popular among numerous software vendors (e.g. ANSYS, Fluent, Cadence,

Synopsys, UG, …) and has become an industry standard. Its customer base is a big

incentive for people to study it. Although I criticize its source code harshly,

to be fair, in license management it stands head and shoulders above the majority of other

software, which often has only the simplest serial code protection. Having the

FLEXlm experience, hacking the rest should be a piece of cake.

I have expressed my hatred toward FLEXlm coding style repeatedly

and it seems I’m not alone. In a document named “Macrovision Coding

Conventions” detailed instructions are given on how to write C language programs.

It sounds more like a new developer complaining about his/her frustration with

the Greek-like code. By the way I think FLEXlm should be rewritten in C++

(maybe a little Java too); even just some function wrappers would help. License

management is a task very suitable for OOP.

A few comments on debuggers (for static analyzers IDA Pro

is by far the best). VC7 has an integrated debugger that is the No.1 choice if

source code is available (developers love it). Most debuggers are for binary

executables. W32dasm is a small and efficient tool that requires few system

resources and can run as a normal Windows application. Because of its small size,

it also lacks some advanced functionality and handles large targets poorly.

The worst thing is that its author has stopped supporting it. If it were open

source software, then someone (I’d like to) could pick it up and continue to

improve this neat tool.

Ollydbg is similar to W32dasm and more powerful. It has an

important feature missing in W32dasm – breakpoints on memory access/change.

But I don’t like its UI layout. The world heavyweight champion of debuggers is

of course Numega SoftICE. This famous debugger can debug anything, even the

system kernel (W32dasm and Ollydbg can only handle user applications). It was

initially intended for driver development – the implementation itself is a

system driver – but now it’s used for all kinds of operations. Its largest

drawback is instability. Running at ring 0, it easily interferes with the OS and

frequently causes system crashes/freezes. In the end I had to uninstall it.

Microsoft also has two independent debuggers, windbg and kd. The first one is a

GUI application and the second is a command-line kernel debugger. I have no

experience with them.

There are topics we have not covered; some are not

important, some need another paper. To finish, we sum up some lessons we have

learned in reverse engineering:

Good tools and skillful use of them are vital to success;

Do not jump into disassembly tracing too hastily, gather

as much information as possible first;

Before dynamic tracing, do a thorough static analysis;

When reading source code, compare it with tracing result

to chart the control flow;

Data flow analysis is very useful;

Reverse engineering is laborious, tedious and rewarding

work; be patient.

References

[1] Visual Numerics, IMSL C Numerical Library 5.5 User’s Guide, 2003.

[2] Macrovision, FLEXlm Programmers Guide 8.1, February 2002.

[3] Macrovision, FLEXlm Reference Manual 8.1, February 2002.

[4] CrackZ, FLEXlm – “Dubious License Management”, http://www.woodmann.com/crackz/Flexlm.htm, 2003.

[5] Intel, IA-32 Intel Architecture Software Developer’s Manual, Volume 2: Instruction Set Reference, 2001.

[6] I. Guilfanov, Fast Library Identification and Recognition Technology, http://www.datarescue.com/idabase/flirt.htm, 1997.

[7] C. Cifuentes, Reverse Compilation Techniques, PhD thesis, Queensland University of Technology, 1994.

[8] NIST, FIPS Publication 186-2: Digital Signature Standard, 2000.
