Notes from: "Semantics-Aware Malware Detection"
[Christodorescu, Jha, Seshia, Song, Bryant]
--------
Problem:
--------
- old trend in malware: create a supervirus, infect millions with it
- new trend in malware: create a virus, create several derivatives of
that virus (each of which would require a distinct signature for AV
scanners) which differ from the base virus only slightly, have each
derivative infect smaller numbers of (possibly just local) hosts
--> so, how to make virus signatures for the main base virus that also
"catch" these derivatives, rather than having to have a sample of each
derivative and having to have even more bloated signature databases
------------------
malware detector : system which attempts to detect whether a program has
------------------ malicious intent
- evade detection via obfuscating the malware
- malware detectors use pattern matching are susceptible to obfuscation
-- a purely syntactic approach
- match some sequence of instructions (perhaps using regular expressions)
- but not matching based on what code DOES, matching based on
particular instruction sequences in code
----------------
Their approach :
----------------
- malware detection algo incorporates instruction semantics : seeking to
match based on malicious program traits
- so theoretically would catch derivative programs which have the same
functionality as the original
========
I. Intro
========
malware instance : program that has malicious intent
- e.g. viruses, trojans, worms
- can classify malware w.r.t. its propagation method and goal
malware detector : system that attempts to identify malware
- e.g. a virus scanner : uses sigs and other heuristics to identify malware
goal of malware writer : modify or morph malware so that it evades detection
- ideally do this derivative/variant-generation automatically
- e.g. evade detection via program obfuscation
-- e.g. virus encrypts its malicious payload and decrypts it during exec
-- e.g. polymorphism and metamorphism
-- polymorphic virus : obfuscates decryption loop using transformations
-- sample transformations :
-- nop insertion
-- code transposition (change order of instructions and put jumps
in in order to retain original semantics)
-- register reassignment (permute register allocation)
-- metamorphic virus : obfuscate entire virus
-- on replication, change code in variety of ways
-- e.g. code transposition, substitution of equivalent
instruction sequences, change of conditional jumps,
register reassignment
- another (obfuscation) technique : add new behaviors to existing malware
-- by adding new behavior to base code, able to evade detection even though
base code would have been detected
-- cf. Sobig.A through Sobig.F (worms), Netsky, Beagle (from 2004) to present
- commercial malware detectors : use simple pattern matching approach
-- match : program contains a sequence of instructions that is matched
by a regular expression
-- basic deficiency : ignore SEMANTICs of these instructions
-- not resilient to slight variations
-- so must use different patterns for detecting two malware instances
that are slight variations of each other
- suppose program P transformed by a compiler phase to produce program P^t
(maybe that compiler phase consisted of register allocation)
- translation-validation problem :
-- determine whether P^t simulates P
-- used to prove that various compiler phases preserve semantics of P
- view hacker obfuscations as compiler phases
-- then malware detection problem seems similar to translation-validation
problem
- after hacker obfuscation produces P' out of P (where P is a
known bad program), does P' have same semantics as P?
-- but hacker obfuscations will produce P' such that P' has *different*
semantics than P (added functionality, e.g.)
- so P and P' are not equivalent
-- plus, translation-validation makes assumptions a/b transformed program :
-- e.g. all branches in P' must match branches in P
- this does not necessarily hold for hacker transformations
-- all the same, some useful ideas from translation-validation :
(1) modeling semantics of instructions, (2) use decision procedures
- observe that certain malicious behaviors appear in all variants of a
certain malware (e.g. the decryption loop in a polymorphic virus)
-- but decryption loop, e.g., might appear in different form in variants
- algo for discovering specified malicious behaviors in a given program
- formal semantics : define problem of determining whether a program
exhibits a specified malicious behavior
-- in general, undecidable
- semantics-aware malware detection algo :
-- handle limited set of transformations
-- detects multiple variants with single template
-- more resilient than commercial AV scanners [claim]
==================================
II. Semantics of Malware Detection
==================================
(1) Specify malicious behavior : via templates
Template T : instruction sequence (I_T) which uses variables (V_T) and constants (C_T)
----------------
Sample template:
----------------
(1)A = const_addr1 // encrypted memory starts here
(2)B = const_addr2 // location to put decrypted memory
(3)condition(A)?TRUE : goto (7), FALSE : goto (4)
(4)mem[B] = f(mem[A])
(5)A = A + c
(6)B = B + d
goto (3)
(7)jump const_addr2
variables, V_T : { A, B }
symbolic constants, C_T : { const_addr1, const_addr2, c, d, f(x), p(x) }
-- symbolic constants : either n-ary function, F(n), or n-ary predicate, P(n)
-- actual constants (e.g. c, d) are 0-ary functions
-- n-ary function : takes n arguments, returns a value
-- n-ary predicate : takes n arguments, returns a boolean
-- c : TAU <==> the type of symbolic constant c is TAU
==> resilient to obfuscations :
-- register renaming (since use vars and constants)
-- changing starting addy of a memory block (since use vars)
==> When does an instruction sequence contain a malicious behavior?
-- if template and instruction sequence have the same effect on memory
-- IOW, the malicious behavior specified by the template is
demonstrated by the instruction sequence
==> also resilient to insertion of irrelevant instruction sequences
-----------------------------------------------------------------
Sample malware instance (instruction sequence) of above template:
-----------------------------------------------------------------
(1) eax = 0x403000
(2) ebx = 0x400000
(3) edx = eax + 3
(4) eax >= 406000 ?TRUE : goto (9), FALSE : goto (5)
(5) mem[ebx] = mem[edx - 3] << 2 + 1
(6) eax = eax + 4
(7) edx = edx + 4
(8) ebx = ebx + 4
goto (4)
(9) jump 0x400000
- edx above is totally unnecessary, could have written :
(1) eax = 0x403000
(2) ebx = 0x400000
(3) eax >= 0x406000 ?TRUE : goto (7), FALSE : goto (4)
(4) mem[ebx] = mem[eax] << 2 + 1
(5) eax = eax + 4
(6) ebx = ebx + 4
goto (3)
(7) jump 0x400000
- but this second instance is a more obvious match to the given template
- so provided first instance to illustrate that even when additional
registers are included, that the first instance matches would be caught
--> some resilience based on semantic matching to hacker obfuscation
- execution context : an assignment of values (of appropriate types) to
the set of symbolic constants, C_T
-- e.g. const_addr1 <-- 0x403000
const_addr2 <-- 0x400000
condition(X) <-- X >= 0x406000
f(X) <-- X << 2 + 1
c <-- 4
d<-- 4
-- EC_T : execution context for a template T
EC_T( c ) : where c is a constant, returns the value of that constant
-- EC_T : is a function with domain C_T such that
for all c in C_T, the type of c and the type of EC_T(c) are the same
- EC_T( T ) : the template obtained by replacing every constant c in C_T by EC_T(c)
- state s^T for the template T is a 3-tuple denoted : ( val^T, pc^T, mem^T )
-- val^T : V_T --> Values, is an assignment of values to the variables in V_T
-- pc^T : value for the program counter
-- mem^T : Addr --> Values, gives the memory content
- a state for the instruction sequence I is a 3-tuple (val, pc, mem)
-- val : Reg --> Values, is an assignment of values to the set of registers
-- pc : a value for the PC
-- mem : Addr --> Values, gives memory contents
- S^T is state space for template; S is state space for instruction sequence
- given EC_T for a template T and template is in state s^T
- if execute instruction i from the template EC_T(T) in state s^T, we
transition to a new state s_1^T and generate a system event e (event
may be sys call or kernel trap, e.g.); denote this via :
s^T ---[e]---> s_1^T
- if an instruction i does NOT generate a system event, e(i) = null
[it generates the null event]
- for every initial template state, s_0^T, executing T in an execution
context EC_T generates a sequence :
SIGMA( T, EC_T, s_0^T ) = s_0^T ---[e_1^T]---> s_1^T ---[e_2^T]---> ...
- for i >= 1,
s_i^T == state after executing i'th instruction from the template EC_T(T) and
e_i^T == the event generated by the i'th instruction
--> s_i^T ---> execute (i+1)-th instruction ---> generate event ---> s_(i+1)^T
- if template doesn't terminate, sequence can be infinite
SIGMA( I, s_0 ) == sequence when the instruction sequence I is executed
from the initial state s_0
==========================================================================
- Definition 1 : "instruction sequence I contains a behavior specified by T"
==========================================================================
an instruction sequence I contains a behavior specified by the template
T if there exists a program state s_0, an execution context EC_T and a
template state s_0^T such that mem( s_0^T ) == mem( s_0 ) and the two
sequences SIGMA( T, EC_T, s_0^T ) and SIGMA( I, s_0 ) are finite and :
-- note that the initial PCs don't have to be the same
------------------------------------
requirements for the two sequences :
------------------------------------
(1) SIGMA( T, EC_T, s_0^T )
= s_0^T ---[e_1^T]---> s_1^T ---[e_2^T]---> ... ---[e_k^T]---> s_k^T
SIGMA( I, s_0 )
= s_0 ---[e_1]---> s_1 ---[e_2]---> ... ---[e_r]---> s_r
Let affected( SIGMA( T, EC_T, s_0^T ) ) be the set of addresses such that :
mem(s_0^T)[a] != mem(s_k^T)[a]
(addresses whose contents before applying the template from the initial
state are different than AFTER applying the template to completion)
REQUIRED : mem( s_k^T )[a] == mem( s_r )[a] for all a in affected(...)
for any memory address whose contents change when applying the template,
the value of those contents are the same as the value of those contents
when applying the instruction sequence from the init state to completion
RECALL: the memory in the two starting states is identical
THIS SAYS: call M the set of memory addresses whose content changes
during the execution of the template. Then for every m in M,
mem[m] at the end of execution of the template must equal mem[m] at
the end of execution of the instruction sequence
--> NOTE that this is NOT the same thing as saying that the contents
of memory at the end of both must be identical
we only care about the memory addresses whose content changed
during execution of the template
this means that there could be some memory address A and its
contents could be changed during execution of the instruction
sequence BUT NOT during execution of the template
--> for example, edx in the given example
--> edx could have value 0 at the beginning of both
--> then edx could have value 0 at the end of the template but
have value 0x406000 at the end of the instruction sequence
==> this is OK; edx as such is "extraneous"
(2) ignoring null events, the event sequence < e_0^T, e_1^T, ..., e_k^T >
is a subsequence of < e_1, e_2, ..., e_r >
two system events e_i^T and e_j MATCH if the arguments to each are the
same and the return values from each are the same
(3) if pc( s_k^T ) is in affected( SIGMA( T, EC_T, s_0^T ) )
then pc( s_r ) is in affected( SIGMA( T, EC_T, s_0^T ) )
- if the program counter at the end of applying the template is an
address whose contents have changed (due to applying the template)
then the program counter after issuing the instruction sequence
must also have changed due to that instruction sequence
[A problem is that affected(...) contains ADDRESSES whereas pc( s_i^T )
refers to a VALUE or the *contents* of some address]
- do we assume that the *address* of the program counter stays static?
< Tangent re: condition 3; see cond_3_tangent.txt for details >
==============================================
- Definition 2 : Template Matching Problem (TMP)
==============================================
a program P satisfies a template T iff P contains an instruction
sequence I such that I contains a behavior specified by T
NOTATION : P |= T (P satisfies template T)
e.g. P is malware; T specifies decryption loop; if P contains *any*
instruction sequence I which contains the behavior specified by T
then P satisfies T (e.g. PGP should satisfy that T)
----------------
Variant Family :
----------------
- Let T be a set of templates (e.g. T = { T_1, T_2, ..., T_l } and
T_1 specifies decryption loop behavior, T_2 specifies email addy
search behavior, ... )
- The set T defines a variant family F :
{ P | for all T_i in T, P satisfies T_i }
-- note that it is possible for some family members, f in F,
to have *additional* behavior than what is specified by T
-- so adding functionality to a family member will not cause that
family member to be thrown out of the family
-- removing some functionality may result in booting the family
member though
------------------------------
- Theorem 1 : TMP is undecidable
------------------------------
Let M be a Turing machine and P_M be a program that uses instructions
and simulates M
Assume there is some address sp_addr which P_M doesn't alter the contents of
Before starting to simulate M, P_M sets mem[sp_addr] = 0
After simulating M, P_M sets mem[sp_addr] = 1
Consider template T_1 : mem[sp_addr] = 0
mem[sp_addr] = 1
So P_M satisfies template T_1 if and only if M halts which implies a
solution to the halting problem which is undecidable therefore...
------------------
A weaker semantics
------------------
Semantics from above may be too strict; imagine a template uses certain
memory locations to store temporary values then such mem locs should not
be checked for equality in comparing the executions of an instruction
sequence and a template.
core memory locations: core( SIGMA( T, EC_T, s_0^T ) ) is a subset of the
affected memory addresses
Then modified condition 1 :
- for all a in core( SIGMA( T, EC_T, s_0^T ) ),
mem( s_k^T )[a] = mem( s_r )[a]
- so we only check "core" memory locations for equality
How to define "core" memory locations?
- label each instruction in I_T as "temp" or "persistent"
- core memory locations = any address in affected( ... ) that is the
target of a PERSISTENT load or store
=======================================
III. Semantics-aware matching algorithm :
=======================================
- Sound but not complete
Sound : no false positives (anything that matches definitely matches)
--> every solution produced is indeed a solution
Complete : no false negatives (catch everything that matches)
--> produces all possible solutions
- tool takes as input : template and a binary (for MSFT Windows on Intel
IA-32 (x86) platform)
- tool outputs whether any fragment of the program contains the behavior
specified by the template
-- describes the matched fragment of the program plus relates template
vars to program registers and memory locations and lists irrelevant
instruction sequences that are part of the matched program fragment
- architecture : tool built on top of IDA Pro disassembler
- binary comes in and is disassembled, create control flow graphs from
it (one per program function), produce intermediate representation (IR)
-- IR generated using library of x86 instruction transformers
-- IR is architecture-independent
-- IR can still contain library and sys calls that are platform specific
--> remove this dependency by using summary functions expressed in IR
- not much info on this (how works exactly, examples, ...)
==> point of all this, though, is for IR to be platform- and
architecture-independent, so could conceivably use one template
to detect some virus with semantically identical variants
produced for Win32, Linux, ...
- so malware detector algo takes Template and program IR, interacts with :
- decision procedures, which include :
-- nop library, random execution, simplify, UCLID
and outputs YES or NO
nop library : predefined patterns of irrelevant instructions
randomized execution : randomized x86 interpreter
simplify : theorem prover
UCLID : decision procedure with bit-vector arithmetic
===============================================
- algo description, malware detection algo : A_MD
===============================================
- finds, for each template node, a matching node in the program
- a template node is an expression; e.g. "A = const_addr1", "condition(A)?",
- a program node is an expression; e.g. "eax = 0x403000", "eax >= 0x406000?"
- two nodes, n_T (from the template) and n_P (from the program) match if
there exists an assignment to vars from the n_T expression that unifies
it with n_P
- once two matching nodes are found, check whether def-use relationships
true b/n template nodes also hold true in the corresponding program
-- def-use relationship :
two nodes n_1 and n_2 are related by a def-use relationship IF
n_1's DEFinition of a var reaches n_2's USE of the same var
e.g. n_1 = statement S_i and n_2 = statement S_j and
n_1 defines some variable X and n_2 *uses* that value of X
(as set during statement S_i)
[see def_use_rp.txt in this folder for slighly more in-depth
though still somewhat elementary explanation]
- if all the nodes in the template have matching counterparts in the
program that satisfy these conditions, then the algo has found a
program fragment that satisfies the template and produces a proof of
this relationship
- formally, given a malicious code template T and a program P,
A_MD : is a predicate on pairs of programs and templates
A_MD : Programs x Templates ---> { "yes", "don't know" }
"don't know" == algo can't determine that prog satisfies the template
such that A_MD( P, T ) returns "yes" when P satisfies T under the
conditions below
----------------------
Simplified template T; X^key == X (bit-vector XOR) key
----------------------
(1) A = const_addr1// b_1
(2) B = const_addr2// b_2
(3) A > const_addr3 ? TRUE: goto 7, FALSE: goto 4// b_3
(4) mem[B] = mem[A]^key// b_4
(5) A = A + c// b_5
(6) B = B + d// b_6
goto 3 // they have another typo here
(7) jump const_addr2// b_7
---------------------
Simplified program P:
---------------------
(1) eax = 0x403000// a_1
(2) ebx = 0x400000// a_2
(3) nop// a_3
(4) ecx = eax + 1// a_4
(5) ecx - 1 > 0x406000 ? TRUE: goto 10, FALSE: goto 6// a_5
(6) eax = ecx - 1// a_6
(7) mem[ebx] = mem[eax]^5// a_7
(8) eax = eax + 1// a_8
(9) ebx = ebx + 2// a_9
goto 4
(10) jump 0x400000// a_10
===========================================================
For A_MD(P, T) = "yes", the following conditions must hold:
===========================================================
-----------------------------------------------
(1) matching of template nodes to program nodes
-----------------------------------------------
- each node n in T has to unify with a node m in P
- call n_T : the set of template nodes and n_P the set of program nodes
- IOW, there is an onto partial function f from the nodes of P to the
nodes of T (there may exist nodes p in n_P such that f(p) = not defined;
however for every node t in n_T, there exists some p in n_P such that
f(p) = t)
then f : { nodes in P } ---> { nodes in T }
{ n_P } ---> { n_T }
{ a_1, a_2, ..., a_m } ---> { b_1, b_2, ..., b_n }
m >= n
then: f( a_i ) = b_j for some i,j
and a_i and b_j are unifiable
B( X, n, m ), where n = f( m )
== the binding of variable X (where X is referred to in the template
instruction at node n) TO an expression referred to in the
program instruction at node m;
==> So is it okay if more than one program node maps to the SAME
template node? By the definition of f, this is allowed...
e.g. B(A,1,1) = eax alsoB(B,2,1) = eax
B(A,2,2) = ebxB(A,1,2) = ebx
B(A,3,5) = (ecx - 1)
B(A,5,8) = eax
B(B,6,9) = ebx
----------------------------------------------------------------------------------
sample definition of f [there is more than one possible that satisfies this cond]:
----------------------------------------------------------------------------------
a_1 --> b_1
a_2 -->b_2
a_3 -->undefined
a_4 -->undefined
a_5 -->b_3
a_6 -->undefined
a_7 -->b_4
a_8 -->b_5
a_9 -->b_6
a_10 --> b_7
---------------------------------
(2) preservation of def-use paths
---------------------------------
- for each node n in T, define def(n) as the set of vars defined in n
- for each node n in T, define use(n) as the set of vars USED in node n
---------- ------ ------
nodes in T def(n) use(n)
---------- ------ ------
b_1 A none
b_2 B none
b_3 none A
b_4 none A, B
b_5 A A
b_6 B B
b_7 none none
- nodes n_pre : predecessor to the template entry node
- nodes n_post : successor to all template exit nodes
- def( n_pre ) = use( n_post ) = V_T
- so def( n_pre ) = set of all nodes defined in the template?
- and use( n_post ) = set of all nodes used in the template?
- a def-use path is a sequence of nodes < n_1, n_2, ..., n_k > such that
+ for 1 <= i < k, there is an edge from n_i to n_(i+1) in T
[i.e. < n_1, n_2, ..., n_k > is a valid path through template T ]
+ I( n_1 ) == the instruction at node n_1
I( n_1 ) DEFines a variable that I( n_k ) USEs
[i.e. def( n_1 ) INTERSECT use( n_k ) != {} ]
+ none of the instructions at the intermediate nodes redefine the
value of the variables in def(n_1)
[i.e. def(n_1) INTERSECT def(n_i) = {} for 1 < i < k ]
- so from our template graph,
paths : n_1 --> n_2 --> n_3
n_3 --> n_4
n_3 --> n_7
n_4 --> n_5 --> n_6 --> n_3
def( A ) : { n_1, n_5 } use( A ) : { n_3, n_4, n_5 }
def( B ) : { n_2, n_6 } use( B ) : { n_4, n_6 }
---------------------------------------------
def-use relationships, denoted duse(n_1,n_2):
---------------------------------------------
A : duse( n_1, n_3 ) path : n_1 --> n_2 --> n_3
A : duse( n_1, n_4 ) path : n_1 --> n_2 --> n_3 --> n_4
A : duse( n_1, n_5 ) path : n_1 --> n_2 --> n_3 --> n_4 --> n_5
A : duse( n_5, n_3 ) path : n_5 --> n_6 --> n_3
A : duse( n_5, n_4 ) path : n_5 --> n_6 --> n_3 --> n_4
A : duse( n_5, n_5 ) path : n_5 --> n_6 --> n_3 --> n_4 --> n_5
B : duse( n_2, n_4 ) path : n_2 --> n_3 --> n_4
B : duse( n_2, n_6 ) path : n_2 --> n_3 --> n_4 --> n_5 --> n_6
B : duse( n_6, n_4 ) path : n_6 --> n_3 --> n_4
B : duse( n_6, n_6 ) path : n_6 --> n_3 --> n_4 --> n_5 --> n_6
- any def-use relationship in the template, e.g. between nodes n and n',
must hold between nodes m and m' WHERE n = f(m) and n' = f(m')
- consider two nodes n and n' in T that are def-use related
- let nodes n and n' in T be matched to nodes m and m' in P
- suppose X is a var that is defined in n and used in n'
B( X, n, m ) = r "X is bound to expression r in node m"
B( X, n', m' ) = r' "X is bound to expression r' in node m'"
- for example, n = n_1, n' = n_3, X = A
m = m_1, m' = m_5, X = A
then B(X, n_1, m_1 ) = eax
B(X, n_3, m_5 ) = (ecx - 1)
so r = eax and r' = (ecx - 1)
then need to make sure that value of (eax) after "eax = 0x403000" (m_1)
is equal to value of (ecx - 1) before "ecx - 1 > 0x406000" (m_5)
*for every program path corresponding to the template path from n_1 to n_3*
- in this case there is just one such program path
- need to make sure that value of r AFTER executing I(m) is same as
value of r' BEFORE executing I(m')
- let's try another example: n_6 and n_4 are related by def-use r/p in T
and n_6 corresponds to m_9 and n_4 corresponds to m_7
B( B, n_6, m_9 ) = ebx
B( B, n_4, m_7 ) = ebx
condition is trivially satisfied
===========================================================
- Theorem 2 : A_MD is sound with respect to the TMP semantics;
===========================================================
i.e. A_MD( P, T ) = "yes" ===> P |= T
- sound but not complete
- can show there exists a program P and a template T such that
P satisfies T but A_MD( P, T ) returns "don't know"
template T program P
---------- ---------
for ( i = 0; i < 10; i++ ) for ( i = 0; i < 10; i += 2 )
a[i] = 0; a[i] = 0;
for ( i = 1; i < 10; i += 2 )
a[i] = 0;
- A_MD won't catch this matching as the ordering of updates in the
template loop is different than that of the program loops
=================
- Local unification
=================
-- produces an assignment to vars in an instruction at a template node
such that this matches a program node instruction
-- program node expressions contain ground terms only
-- instantiate template vars with program expression
-- template node 1 ( A = const_addr1 ) can be unified with
program node 1 ( eax = 0x403000 ) OR
program node 2 ( ebx = 0x400000 )
but not with program node 8 ( eax = eax + 1 ) or 9 ( ebx = ebx + 2 )
++ so note that unification assigns to vars BUT ALSO to constants
-- [ t_1, p_1 ] { A <-- eax, const_addr1 <-- 0x403000 }
[ t_2, p_2 ] { B <-- ebx, const_addr2 <-- 0x400000 }
[ t_3, p_5 ] { A <-- ecx - 1, const_addr3 <-- 0x406000 }
[ t_4, p_7 ] { B <-- ebx, A <-- eax, key <-- 5 }
[ t_5, p_8 ] { A <-- eax, c <-- 1 }
[ t_6, p_9 ] { B <-- ebx, d <-- 2 }
[ t_7, p_10 ] { const_addr2 <-- 0x400000 }
--> B is only assigned ebx so we don't have to worry about B
--> const_addr1 is only assigned 0x403000
--> const_addr2 is only assigned 0x400000
--> key is only assigned 5
--> const_addr3 is only assigned 0x406000
--> c is only assigned 1
--> d is only assigned 2
--> A is assigned both: { eax, ecx - 1 }
RULES about producing an assignment to vars in an instruction at a
template node such that the resulting expr matches a prog node instr:
(1) a var in the template can be unified with any prog expr except for
assignment expressions
(2) a constant in the template can only be unified with a prog constant
(3) the memory function (in the template) can only be unified with the
program's memory function
(4) a predefined operator in the template (e.g. OR, AND, XOR, NOT,
BITAND, LEQ, PLUS, MINUS, ...) can only be unified with THE SAME
operator in the program
(5) an external function call in the template can only be unified with
the same external function call in the program
==> result of this is a binding relating template variables to program expressions
- note that we don't require that bindings be consistent (monomorphic) since
as above we have that A is bound to eax at one point yet bound to ecx - 1
at another
-- such a requirement would be overly restrictive (not resilient to
obfuscation attacks)
- instead we want to check program expressions bound to the same template
variable (e.g. eax is bound to A and so is ecx - 1 )
--> verify that such expressions change value in the same way that the
template variable changes value
---------------------=====================================
ACCOMPLISH THIS via : value preservation on def-use chains
---------------------=====================================
- for each def-use chain in the template, check :
whether the value of the program expression corresponding to the
template-variable use is the same as the value of the program
expression corresponding to the template-variable definition
- e.g. t_1 and t_3 are def-use related for variable A
p_1 maps to t_1 and p_5 maps to t_3
R : the program fragment between (exclusive between) p_1 and p_5
R : ebx = 0x400000
nop
ecx = eax + 1
so the value for A after p_1 must be the same as the value for A
before p_5
-- the value for A at p_1 is : eax
-- the value for A at p_5 is : ecx - 1
so, confirm that from < p_2, p_3, p_4 >, eax == (ecx - 1)
"the expressions corresponding to template variable A after p_1
must be equal in value to the expressions corresponding to
template variable A before p_5 (since p_1 and p_5 are related by
a def-use relationship for variable A)"
express this via a predicate :
for each path I in program fragment R
val_pre( I )(eax) = val_post( I )(ecx - 1)
[e.g. I = <p_2, p_3, p_4>]
val_pre( I )( x ) : the value of x before path I
val_post( I )( x ) : the value of x after path I
- for each match between a program node p_i and a template node t_j, the
algo checks whether the def-use chain is preserved for each program
expression corresponding to a template var used at the matched node
- does R maintain the value predicate PHI :
PHI = { for all I in R,
( val_pre<I>(eax) = val_post<I>( ecx - 1 ) )
}
- treat decision procedures as oracles that answer the question,
"Does a program fragment R maintain an invariant PHI?"
- another example, def-use chain t_5, t_3 maps to p_8, p_5
R : < p_9, p_4 >
: ebx = ebx + 2
ecx = eax + 1
definition of A in t_5 : eax
definition of A in t_3 : ecx - 1
val_pre( I )( eax ) =? val_post( ecx - 1 ) Yes.
- another example, def-use chain t_1, t_4 maps to p_1, p_7
R : < p_2, p_3, p_4, p_5, p_6 >
: ebx = 0x400000
nop
ecx = eax + 1
ecx - 1 > 0x406000 ?
eax = ecx - 1
definition of A in t_1 : eax
definition of A in t_5 : eax
val_pre( I )( eax ) =? val_post( eax ) Yes.
=====================================================
- Using decision procedures to check value-preservation
=====================================================
- want to figure out whether a program fragment P is a semantic nop
with respect to some conditions, e_1 and e_2
if -- for every path I in that program fragment -- the value of e_1
*before* that path is the same as the value of e_2 *after* that path
then that PHI( I, e_1, e_2 ) holds
if ( PHI( I, e_1, e_2 ) for every I in R holds )
then PHI( P, e_1, e_2 ) holds and D returns TRUE
if on the other hand, there exists some I' in R for which
PHI( I', e_1, e_2 ) == "don't know" or "false"
then D( P, e_1, e_2 ) = "don't know" or "false"
we can also determine whether: !PHI( P, e_1, e_2 ) holds...
if ( PHI( I, e_1, e_2 ) for every I in R *does not hold* )
then !PHI( P, e_1, e_2 ) does not hold
D^+ : a decision procedure for PHI( P, e_1, e_2 )
D^- : a decision procedure for !PHI( P, e_1, e_2 )
=======================================================================
- "semantic nops" : some program fragment I is a semantic nop with
respect to two conditions e_1 and e_2 if the value of e_1 before that
fragment is the same as the value of e_2 after that fragment
=======================================================================
- value-preservation oracle : decision procedure, determines whether a
given program fragment is a semantic nop w.r.t. certain prog vars
D determines whether the value predicate,
PHI( P, e_1, e_2 )
holds for all paths in P; i.e.
for all paths I in P,
( val_pre<I>(e_1) = val_post<I>(e_2) )
-------------
- NOP library : identifies sequences of nop instructions
-------------
-- annotate basic blocks as nop sequences when applicable
-- use actual nop instructions plus some sequences which are known to be nops
-----------------------------
- randomized symbolic execution
-----------------------------
-- program frag executed using a random initial state
-- at completion, check whether !PHI(P,e_1,e_2) is true
-- if so, at least one path in the program fragment is not a
semantic nop; and so whole P is NOT a semantic nop
-- else, whole P may be a semantic nop
--------------------------
- theorem proving : SIMPLIFY
--------------------------
-- value preservation problem formulated as a theorem to be proved
(as long as program frag has NO loops)
-- program frag represented as state transformer DELTA
-- then use SIMPLIFY to prove the formula
DELTA ==> PHI( P, e_1, e_2 )
- if SIMPLIFY confirms that the formula is valid,
then the program fragment is a semantic nop
-- can't handle programs which include bit-vector operations
2: ebx = 0x400000
3: nop
4: ecx = eax + 1
Simplify formula :
(IMPLIES (AND (EQ ebx_1 4194304)
(EQ ecx_4 (+ eax_pre 1) )
(EQ ecx_post ecx_4)
)
(EQ eax_pre (- eax_post 1))
)
same as:
(ebx_1 == 4194304) && (ecx_4 == (eax_pre + 1)) && (ecx_post == ecx_4)
==> ( eax_pre == (eax_post - 1) )
- theorem proving : UCLID
-- based on UCLID infinite-state bounded model checker
-- logic supported by UCLID is a superset of that supported by SIMPLIFY
-- UCLID precisely models int and bit-vector arith
-- model program fragment instructions as state transformers for each
register and for memory
-- UCLID simulates program frag for a given # of steps and determines
whether PHI( P, e_1, e_2 ) holds at the end of the simulation
=========================
Strengths and Limitations
=========================
----------------------------------------------------------------
4 classes of obfuscations, A_MD can handle (to varying degrees):
----------------------------------------------------------------
- code reordering, equivalent instruction replacement, register renaming,
garbage insertion
(1) code reordering : change order of instructions; maintain execution
order using jumps
- handled via use of control flow graphs
(2) register renaming : reassigns registers in selected program fragments
- so any tool searching for signature which is based on specific register
names will miss the obfuscated variant
- use templates; vars assigned registers only during unification
- so can detect any program with the same behavior as the template
irrespective of register allocation
(3) garbage insertion : insert instruction sequences that don't change
behavior of program
- uses decision procedures to reason about value predicates on def-use chains
(4) equivalent instruction replacement
- uses rich instruction set to replace groups of consecutive instructions
with other short instruction sequences that have the same semantics
- e.g. to add 1 to register X, can use: < add X, 1 > or < inc X >
- handle limited kind of this via normalize code into IR which supports only
semantically-disjoint operations
-----------
LIMITATIONS
-----------
- doesn't recognize that: "x = x + 2" is equivalent to "eax = eax << 1"
- can only detect programs that exhibit same ordering of memory updates
-- recall the problem where we populated a[i] with zero via pattern:
<1,2,3,...,20> versus <2,4,6,8,...,20,1,3,5,...,19>
-- comes from use of def-use chains for value preservation checking
but NB : automatically generating functionally-equivalent variants is hard
Used : (1) decryption-loop template and (2) mass email template