++++++++++++++++++++++++
In-line function hooking (see especially the Detours library from Hunt/Brubacher)
++++++++++++++++++++++++
Detours library (using it raw or unmodified):
- user-level hooking only
why? because what detours lib does is overwrite the first 5 bytes of
the hooked function ... and if the hooked function lives in kernel
space, then we'll get a fatal mem error when we try to overwrite
these bytes
--> confirmed : that's true. also the addys of kernel syscalls are not
accessible in the same way that addys of win32 api functions are, so we
can't use the existing framework to trivially obtain kernel function
addys, i.e. those found in the SSDT. And that's before considering
that, operating as a userland program, we wouldn't have sufficient
permissions to change the protection on such memory regions in the
way that detours does for user-space code.
==> in summary, not trivial to adapt detours to work for kernel syscall
hooking, is not clear that it would be better to adapt detours vs. to
start from scratch for such a goal.
- also check on whether detours is an NT/XP/2k/2003 only soln or whether
also works on '9x ===> seems not.
- works for all x86 versions of NT, Win2K, XP
"However, under Windows 95, Windows 98, and Windows ME, the DetourFunction*
APIs do not work unless the program is running under a debugger (the
process was created with the DEBUG_PROCESS flag on the call to the
CreateProcess* APIs)."
--> limitation derives from two sources:
(1) non-support of Win '9x (including Win ME) for
CreateRemoteThread(...) is why injdll doesn't work for Win '9x
(2) the fact that shared virtual memory (which goes from 2 GB to 3 GB
in Win '9x) is not copy-on-write. So VirtualProtect, VirtualProtectEx
don't work on memory regions w/in that range.
System DLLs are mapped into that area. So we can't (trivially) change
the permission on memory regions containing win32 api functions...
which means we can't overwrite the first five bytes of such functions
to cause control to transfer to our detour function.
==> note that if the process is created with the DEBUG_PROCESS
flag, DLLs *are* mapped with copy-on-write protection...
==> so does VirtualProtect{Ex} still fail in such a case?
if so, then even if we modify "withdll.exe" to create the
process with the DEBUG_PROCESS flag, we still can't use
detours in its present form to hook Win '9x
if not, then GetProcAddress(...) appears to operate
differently for Win '9x apps run in Debug mode, i.e. it
returns a debug thunk address, not the actual address
- so, would take some work to adapt detours to follow the
debug thunk to the actual code and overwrite that there?
Appears that if we ONLY use "withdll" with our target executables
and use -d:traceapi and change <detours/src/creatwith.cpp> function
DetourCreateProcessWithDll{A,W} to create the process with the
DEBUG_PROCESS creation flag, that hooking will work on Win '9x.
==> There may still be a hangup here : FlushInstructionCache(...) isn't
supported for Win '9x (or maybe it just works correctly,
transparently).
======================================================================
So I think the deal here is that the problem comes in overwriting the
first 5 bytes of target functions
Target functions by and large live in DLLs
DLLs live in shared virtual memory
For Win '9x, shared virtual memory is from 2 GB to 3 GB
And the function which changes the permissions on memory SO THAT 5
bytes of it can be overwritten DOESN'T work for Win '9x for shared
virtual memory
- VirtualProtect, VirtualProtectEx for Win '9x (and ME) don't work
for shared virtual mem (2GB to 3GB)
- system DLLs are loaded into shared virtual mem
- so if we want to change the code in one such DLL's function (which
presumably also lives in that 2 GB to 3 GB space) then we'd need
to first change the mem permissions on that area to READ/WRITE/EXEC
- for Win '9x we cannot make that change via VirtualProtect nor
VirtualProtectEx for any mem region w/in 2 GB to 3 GB
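==> quick sketch of the address test the above reasoning boils down to.
This is mine, not from the Detours source: the function names are made up,
and the bounds are simply the ones quoted in these notes (private arena
4 MB to 2 GB, shared arena 2 GB to 3 GB), hard-coded rather than queried
from the OS.

```c
#include <stdint.h>

/* Win '9x address-space bounds as stated in these notes (assumed values,
 * hard-coded for illustration). */
#define WIN9X_PRIVATE_BASE 0x00400000u  /* 4 MB */
#define WIN9X_SHARED_BASE  0x80000000u  /* 2 GB */
#define WIN9X_SHARED_END   0xBFFFFFFFu  /* last byte below 3 GB */

/* Returns 1 if addr lies in the shared arena (2 GB - 3 GB), i.e. where
 * system DLLs live and where VirtualProtect{Ex} won't change protection
 * on Win '9x; returns 0 otherwise. */
int win9x_in_shared_arena(uint32_t addr) {
    return addr >= WIN9X_SHARED_BASE && addr <= WIN9X_SHARED_END;
}

/* Returns 1 if addr lies in the per-process private arena (4 MB - 2 GB),
 * e.g. a user stack; returns 0 otherwise. */
int win9x_in_private_arena(uint32_t addr) {
    return addr >= WIN9X_PRIVATE_BASE && addr < WIN9X_SHARED_BASE;
}
```

So a patch target for which win9x_in_shared_arena(...) is true is exactly
the case where the 5-byte overwrite can't be set up via VirtualProtect{Ex}.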
I do believe that the "withdll" functionality still works... to inject
a DLL into a user process, however "injdll" -- which injects a dll
into an *existing* user process -- would NOT work since that fxnality
relies on CreateRemoteThread which isn't supported for Win '9x
"While Windows NT, Windows 2000 and Windows XP always map DLLs into
processes with copy-on-write mapping (which Detours needs in order
to patch the binary image), Windows 95, Windows 98, and Windows ME
only map DLLs with copy-on-write if the process was started with
the DEBUG_PROCESS flag on the call to CreateProcess." [README]
"Windows 95 doesn't implement copy-on-write in the operating system.
With copy-on-write, the operating system will share a common code
page in memory, but when a process writes to that memory, the memory
is copied so that the individual process gets its own copy that will
not interfere with any other process. In the Windows 95 architecture,
any memory that is above the 2GB line is shared among all processes.
If one process were to write a breakpoint to this shared memory area
without the copy-on-write, the breakpoint would apply to all processes,
not just the one being debugged."
======================================================================
CreateRemoteThread(...) not supported for Win '9x
- only supported for 2k, nt, xp
- a means to injecting a DLL into a targeted process
-- why is this necessary?
-- absent this injection, you can't force a process to call your
functions or, if it does, to be able to resolve those functions
- we want to force a process to call LoadLibrary(...) with our DLL as the arg
HANDLE CreateRemoteThread( HANDLE hProcess, // IN
LPSECURITY_ATTRIBUTES lpThreadAttributes, // IN
SIZE_T dwStackSize, // IN
LPTHREAD_START_ROUTINE lpStartAddress, // IN
LPVOID lpParameter, // IN
DWORD dwCreationFlags, // IN
LPDWORD lpThreadId // OUT
);
// hProcess : a handle to the process in which this thread is to be created
--> that handle must have the following access rights :
(1) PROCESS_CREATE_THREAD
(2) PROCESS_QUERY_INFORMATION
(3) PROCESS_VM_OPERATION, PROCESS_VM_READ, PROCESS_VM_WRITE
// lpThreadAttributes : pointer to security attributes of new thread
--> specifies security descriptor for new thread
--> if NULL, thread gets default security descriptor and the
returned thread handle canNOT be inherited
// dwStackSize : initial size of the new thread's stack in bytes
--> if 0, uses default size for the executable
// lpStartAddress : pointer to application-defined function to be
// executed by the thread; represents starting address of the thread
// in the remote process
--> ThreadProc function : is an application-defined function
- serves as starting address for a thread
DWORD WINAPI ThreadProc( LPVOID lpParameter );
--> lpParameter : thread data passed to the function
// lpParameter : pointer to var to be passed to thread function
// dwCreationFlags : flags that control creation of the thread
--> if 0, thread runs immediately after created
// lpThreadId : pointer to a var that receives the thread ID
--> if NULL, thread ID is not returned
- So how do we use this?
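/* ------------------------------------------------------------
 * A ThreadProc (or a string literal) that lives in OUR process
 * doesn't exist in the remote one. So : copy the DLL path into
 * the target process and use LoadLibraryA itself as the thread
 * start routine (one pointer arg, same shape as a ThreadProc).
 * Assumes kernel32.dll sits at the same base in both processes.
 * ------------------------------------------------------------ */
#include <windows.h>
// InjectHookDll : name made up for this note
// hProcessForHooking : caller-supplied handle with the access rights
// listed above; returns TRUE if the remote thread was created
BOOL InjectHookDll( HANDLE hProcessForHooking ) {
  const char dll[] = "C:\\HookTool.dll";
  HANDLE hThread;
  LPVOID remoteArg = VirtualAllocEx( hProcessForHooking, NULL, sizeof(dll),
                                     MEM_COMMIT, PAGE_READWRITE );
  if ( remoteArg == NULL ||
       !WriteProcessMemory( hProcessForHooking, remoteArg, dll, sizeof(dll), NULL ) )
    return FALSE;
  hThread = CreateRemoteThread( hProcessForHooking,
    NULL,       // thread attrs
    0,          // stack size
    (LPTHREAD_START_ROUTINE)GetProcAddress(
        GetModuleHandleA( "kernel32.dll" ), "LoadLibraryA" ), // fxn to execute
    remoteArg,  // argument to that fxn (lives in remote memory)
    0,          // creation flags
    NULL );     // thread ID wont be returned
  if ( hThread == NULL ) return FALSE;
  WaitForSingleObject( hThread, INFINITE );  // let LoadLibraryA finish
  CloseHandle( hThread );
  return TRUE;
}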
- so, the deal with detours is that there are a couple different
functionalities on which other functionalities build
+++++++++++++++++++++++++++++++++++++++++++++++++++++
withdll : is defined in <detours/samples/withdll.cpp>
+++++++++++++++++++++++++++++++++++++++++++++++++++++
e.g. usage : withdll -d:traceapi.dll myexe.exe
this will :
(1) create a process with the specified app name and (optional) args
- this process is created with the suspend flag so that it is
initially suspended
- this is done via : <detours/src/creatwith.cpp> function :
P = DetourCreateProcessWithDll{A,W}
(2) then <detours/src/creatwith.cpp> function :
InjectLibrary( P.hProcess,
P.hThread,
GetProcAddress(
GetModuleHandle{A,W}(kernel32.dll),
LoadLibrary{A,W} ),
traceapi.dll,
strlen(traceapi.dll) + 1 );
which :
(a) suspends the thread
(b) gets the thread's register context (ESP, EIP, EBP, ...)
-- which includes current stack pointer (ESP)
-- sets nCodeBase = ESP - { space for our assembly code +
space for our args }
==> we're going to write some assembly code and this is the
address (within the addy space of the given process) where
that code will begin (and so execution should begin)
-- will create a buffer with assembly code instructions which will :
(1) PUSH "your_dll_name" onto the stack
(2) CALL LoadLibrary (where (1) is arg to that call)
(3) restore the EAX, EBX, ..., ESI, EDI, EBP, ESP, ...
values to what they are in (b)
(4) JMP <to original code start, EIP from (b)>
-- then makes stack pointer point 4 below
-- and instruction pointer point to where your code will be
written to (nCodeBase)
- changes permissions to read/write nCodeBase
- writes starting at nCodeBase with above assembly code (which will
cause app to LoadLibrary( yourdll ) then restore the registers
to their current contents (before that call) then return to
the code they were originally going to execute)
- then calls FlushInstructionCache(...) so that the CPU executes the
newly written code (starting at nCodeBase) rather than any stale
instructions it may have cached for that range
- then sets the thread context so that the new ESP and EIP will take hold
- then resumes the thread's execution
==> basically inserts a LoadLibrary(...) call for an arbitrary
DLL (specified by you via the command line) into the process's
code so that this LoadLibrary is done before the process begins
executing then the process returns to exec'ing its normal /
original code.
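==> to make the stub concrete, here is a sketch of building such a byte
buffer. Assumptions (all mine, not read out of the Detours source) : the
function names, the exact layout (code first, DLL name right after it), and
the use of pushfd/pushad ... popad/popfd plus a "mov esp, imm32" in place of
restoring each register individually as described in (3). Addresses are
plain 32-bit numbers; nothing here touches a real process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* pushfd(1) pushad(1) push imm32(5) call rel32(5) popad(1) popfd(1)
 * mov esp,imm32(5) jmp rel32(5) */
#define STUB_CODE_SIZE 24u

/* Store a 32-bit value in little-endian (x86) byte order. */
static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* nCodeBase = ESP - { space for code + space for the DLL-name arg }. */
uint32_t stub_code_base(uint32_t esp, const char *dll_name) {
    assert(dll_name != NULL);
    return esp - (STUB_CODE_SIZE + (uint32_t)strlen(dll_name) + 1u);
}

/* Fills out[] with the stub as it would appear at address code_base.
 * out must hold STUB_CODE_SIZE + strlen(dll_name) + 1 bytes.
 * Returns the number of bytes written. */
size_t build_stub(uint8_t *out, uint32_t code_base, uint32_t load_library,
                  uint32_t orig_esp, uint32_t orig_eip, const char *dll_name) {
    size_t n = 0;
    size_t name_len;
    assert(out != NULL && dll_name != NULL);
    name_len = strlen(dll_name) + 1;

    out[n++] = 0x9C;                                   /* pushfd */
    out[n++] = 0x60;                                   /* pushad */
    out[n++] = 0x68;                                   /* push &dll_name */
    put_le32(out + n, code_base + STUB_CODE_SIZE); n += 4;
    out[n++] = 0xE8;                                   /* call LoadLibrary */
    put_le32(out + n, load_library - (code_base + (uint32_t)n + 4u)); n += 4;
    out[n++] = 0x61;                                   /* popad */
    out[n++] = 0x9D;                                   /* popfd */
    out[n++] = 0xBC;                                   /* mov esp, orig_esp */
    put_le32(out + n, orig_esp); n += 4;
    out[n++] = 0xE9;                                   /* jmp orig_eip */
    put_le32(out + n, orig_eip - (code_base + (uint32_t)n + 4u)); n += 4;

    memcpy(out + n, dll_name, name_len);               /* arg for the push */
    return n + name_len;
}
```

The rel32 operands of call/jmp are relative to the address of the NEXT
instruction, which is why each one subtracts (code_base + n + 4). In the
real thing this buffer would then be written at nCodeBase in the target
process and EIP pointed at it.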
--> so the 64k question is : is this fxnality supported on Win '9x?
- well, FlushInstructionCache in Win '9x has no effect
- for Win '9x, VirtualProtectEx cannot be used on any mem region
in shared virtual address space (0x80000000 - 0xbfffffff)
-- which is from 2 GB to 3 GB in the virtual addy space
-- as noted, this region is shared between processes
-- system DLLs are loaded here, also memory mapped files are
mapped here
==> So in this case the memory whose protection bits we want to
change lives on the user stack, which is probably somewhere in
the user virtual addy space (from 4 MB to 2 GB)
- so we should be able to call VirtualProtectEx on that area
- and we should be allowed to execute code that lives on the stack
(in that location that we've just written w/our assembly code)
- maybe the inability to call FlushInstructionCache to any
effect is a deal breaker but, if not, seems that this
functionality should hold for the win '9x model
+++++++++++++++++++++++++++++++++++++++++++++++++++
injdll : is defined in <detours/samples/injdll.cpp>
+++++++++++++++++++++++++++++++++++++++++++++++++++
e.g. usage : injdll -p:<pid> -d:traceapi.dll
- this injects a DLL into an already-executing process (the PID of
which is specified above as a command-line arg to this program)
- this opens the specified process
- then calls DetourContinueProcessWithDll{A,W} which is defined in
<detours/src/creatwith.cpp>
- and which does :
calls InjectLibraryOld (also in <detours/src/creatwith.cpp>)
which in turn calls CreateRemoteThread to inject the provided dll
(from the command line, e.g. "traceapi.dll" from above)
==> This (injdll) is the functionality that requires support for
CreateRemoteThread and so is NOT supported for Win '9x
except if the original process (which we are attempting to inject
a DLL into) was created in DEBUG mode, which is highly unlikely
- basically what this does is :
(1) open specified process
(2) allocates memory in that process with read/write permissions
(3) writes the ThreadFunc function and its argument (the name of the
DLL to inject) into that memory
(4) then calls CreateRemoteThread -- passing the addresses, within that
(remote) process's address space, of the ThreadFunc code and of the
argument that were just written
(5) then waits for that created thread to complete executing then
closes the handle to it then returns
==> won't work for Win '9x since CreateRemoteThread(...) not supported
on that (those) platforms
--------
I. Intro
--------
Detours : library for intercepting arbitrary win32 binary functions (read:
win32 api functions) on x86 machines
- interception code applied dynamically at run-time
- replaces first few instructions of target function (which we'll call OVERWRITTEN)
with an unconditional jump to the user-provided detour function
- then the trampoline function consists of: OVERWRITTEN then an
unconditional jump to the remainder of the target function
- the detour function can then invoke the target function as a subroutine
via invoking the trampoline function
- detours are inserted at execution time
-- code of target function modified in memory
- detours guaranteed to work "regardless of the method used by the app or
system code to locate the target function"
-- think they really mean : "regardless of whether the function is (in a
library) that is statically linked, dynamically linked or delay loaded..."
Detours also provides functions :
- to edit the IAT of any binary
- to inject a DLL into a new or an existing process
-- then the injected DLL can detour any win32 function "whether in
the application or system libraries"
- to attach arbitrary data segments to existing binaries
------------------
II. Implementation
------------------
===================================
A. Interception of binary functions
===================================
- at runtime, detours replaces first few instructions of target function
with an unconditional jump to user-provided detour function
- when execution reaches the target function, control jumps directly to
the user-supplied detour function
- detour function does whatever
- then detour function may return control to the source function (the
original caller) OR may invoke the trampoline function - which invokes
the target function without interception
- when the target function completes, it returns control to the detour function
- the detour function does whatever then returns control to the source function
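==> the control flow above, as a plain function-pointer analogy. This is
NOT how Detours intercepts anything (no code bytes get rewritten here); it
only mimics who calls whom. All names are made up for the note.

```c
/* The "target" : stands in for some win32 api function. */
int target_body(int ms) { return ms; }

/* The "trampoline" : in real Detours this holds OVERWRITTEN plus a jmp to
 * the rest of the target; here it is just a pointer to the unhooked body. */
int (*trampoline)(int) = target_body;

/* What callers actually reach; installing the detour repoints this. */
int (*target_entry)(int) = target_body;

int detour_calls = 0;

/* The "detour" : does whatever, invokes the target via the trampoline
 * (so without interception), does whatever, returns to the caller. */
int detour(int ms) {
    int r;
    detour_calls++;              /* "does whatever" on the way in  */
    r = trampoline(ms * 2);      /* target runs un-intercepted     */
    return r + 1;                /* "does whatever" on the way out */
}

void hook_on(void)  { target_entry = detour; }
void hook_off(void) { target_entry = target_body; }

/* The "source function" : the original caller of the target. */
int call_target(int ms) { return target_entry(ms); }
```

With the hook off, call_target(10) just runs the target; with it on, the
detour sees the call first, hands a modified arg to the target through the
trampoline, and gets control back before the caller does.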
++++
How?
++++
The detours library intercepts target functions by rewriting their
in-process binary image
- rewrites target function
- rewrites matching trampoline function
The tramp function can be allocated dynamically or statically.
- if statically, the trampoline always invokes the target function w/o
the detour
- before insert a detour, static trampoline contains single jump to target fxn
- after insert detour, trampoline contains OVERWRITTEN and jmp to remainder
of target function
-----------------------
To detour a function...
-----------------------
- allocs mem for dynamic tramp fxn (if no static tramp provided)
- enables write access to both the target and the tramp
- copies instructions from target to tramp until at least 5 bytes have
been copied (5 bytes == size of an unconditional jmp : 1-byte opcode
plus 32-bit relative offset)
- then adds a jmp instruction at end of tramp to the first non-copied
instruction of target fxn
- writes an unconditional jmp to the detour fxn as the first
instruction of the target fxn
- restores original page permissions on both target and tramp
- flushes CPU instruction cache
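==> a simulation of the copy/patch part of those steps on plain byte
buffers. Assumptions : the function names are mine, and the instruction
lengths are handed in by the caller as a hypothetical input (the real
library has to work out instruction boundaries itself). Copied instructions
that are themselves relative jumps/calls would need fixing up, which this
sketch ignores. Page permissions and the cache flush are left out since
there's no real code memory involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_SIZE 5u  /* E9 + rel32 */

/* Write "jmp dest" into code[], where code[] sits at virtual address at. */
static void write_jmp(uint8_t *code, uint32_t at, uint32_t dest) {
    uint32_t rel = dest - (at + JMP_SIZE);  /* relative to next instruction */
    code[0] = 0xE9;
    code[1] = (uint8_t)rel;         code[2] = (uint8_t)(rel >> 8);
    code[3] = (uint8_t)(rel >> 16); code[4] = (uint8_t)(rel >> 24);
}

/* target/tramp : byte buffers standing in for code at target_va/tramp_va.
 * insn_len     : lengths of the target's leading instructions (assumed
 *                known; hypothetical input).
 * Returns the number of bytes moved into the trampoline (>= 5), or 0 if
 * nothing was patched. */
size_t detour_install_sim(uint8_t *target, uint32_t target_va,
                          const uint8_t *insn_len, size_t n_insn,
                          uint8_t *tramp, size_t tramp_cap, uint32_t tramp_va,
                          uint32_t detour_va) {
    size_t copied = 0, i = 0;
    assert(target != NULL && insn_len != NULL && tramp != NULL);

    /* take whole instructions until at least 5 bytes are covered */
    while (copied < JMP_SIZE && i < n_insn) copied += insn_len[i++];
    if (copied < JMP_SIZE || copied + JMP_SIZE > tramp_cap) return 0;

    memcpy(tramp, target, copied);                      /* OVERWRITTEN  */
    write_jmp(tramp + copied, tramp_va + (uint32_t)copied,
              target_va + (uint32_t)copied);            /* -> remainder */
    write_jmp(target, target_va, detour_va);            /* -> detour    */
    return copied;
}
```

Note the order : the trampoline is completed before the target's first
bytes are overwritten, so the original instructions are never lost.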
==================================
B. Payloads and DLL Import Editing
==================================
Attach arbitrary data segments to a win32 binary ("payloads")
Edit DLL import address tables
Detours creates new section : .detours
- between export table (the RVA of which is specified in the 0'th entry
of the DataDirectory which itself is in the Optional Headers which are
part of the IMAGE_NT_HEADERS of a PE file) and the debug symbols
- debug symbols MUST reside last in a win32 binary
- the .detours section contains a detours header record and a copy of the
original PE header (PE header == IMAGE_NT_HEADERS)
- if modifying the IAT, detours creates new IAT, appends it to the copied
PE header, then makes the original PE header point to the new IAT
-- "makes the original PE header point to the new IAT" == change the
RVA stored in the second entry in the Data Directory (index 1, the
import table entry; strictly speaking the IAT proper has its own
entry at index 12) to contain the RVA of the new import table in
our .detours section
- any data segments to be added are then written at the end of the
.detours section then the debug symbols are appended
- reversal easy : restore original PE header from .detours section then
remove .detours section
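==> to see where those Data Directory entries physically sit, a small
reader over an in-memory copy of a PE32 file. The function and struct names
are made up for this note; the offsets are the standard PE32 ones (e_lfanew
at 0x3C, 4-byte "PE\0\0" signature, 20-byte file header, data directories
starting 96 bytes into the optional header). PE32+ (64-bit) images use a
different optional-header size and are simply rejected here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t rva; uint32_t size; } DataDir;

/* Read a little-endian 32-bit value. */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fetch DataDirectory[index] from a PE32 image held in img[0..len).
 * index 0 == export table, index 1 == import table.
 * Returns 0 on success, -1 if the buffer isn't a PE32 image we can read
 * or the index is out of range. */
int pe32_data_directory(const uint8_t *img, size_t len,
                        unsigned index, DataDir *out) {
    uint32_t pe, count;
    const uint8_t *opt;
    size_t off;
    assert(img != NULL && out != NULL);

    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z') return -1;
    pe = rd32(img + 0x3C);                       /* e_lfanew */
    if (pe > len || len - pe < 4u + 20u + 96u) return -1;
    if (img[pe] != 'P' || img[pe + 1] != 'E' ||
        img[pe + 2] != 0 || img[pe + 3] != 0) return -1;

    opt = img + pe + 4 + 20;                     /* optional header */
    if (opt[0] != 0x0B || opt[1] != 0x01) return -1;   /* magic 0x10B */
    count = rd32(opt + 92);                      /* NumberOfRvaAndSizes */
    if (index >= 16u || index >= count) return -1;

    off = (size_t)pe + 4 + 20 + 96 + 8u * (size_t)index;
    if (off + 8 > len) return -1;
    out->rva  = rd32(img + off);
    out->size = rd32(img + off + 4);
    return 0;
}
```

Calling it with index 1 returns the RVA/size pair that would have to be
rewritten to make the header point at a new import table.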
---------------------
Why create a new IAT?
---------------------
- preserves original IAT
- new IAT can contain renamed import DLLs and functions or entirely new
DLLs and functions
--> can make YOUR DLL be the first one loaded when an app runs
- question : so this is done at run-time? when, precisely? after the app
has been loaded? or before? (before makes more sense but how to modify
app's image in mem before that app has been loaded?)
Detours also provides routines for enumerating the binary files mapped
into an address space; can also locate payloads w/in those mapped binaries
- each payload identified by a 128-bit globally unique identifier (guid)
OK. I think this actually modifies the binaries on disk ... not just in mem
- which makes more sense
----------------------------------------------
Injecting a DLL into a new or existing process
----------------------------------------------
- inject : detours writes LoadLibrary(...) call into the target process
with VirtualAllocEx and WriteProcessMemory then invokes call with
CreateRemoteThread
==> believe this is NOT supported on Win '9x : CONFIRM
==> and figure : can detours itself be used on a Win '9x machine?
------------------
III. Using Detours
------------------
User code must include the detours.h header file and link with detours.lib
(1) to intercept a function with a static trampoline
- create the trampoline with the DETOUR_TRAMPOLINE macro
DETOUR_TRAMPOLINE( trampoline_prototype, target_name )
e.g. DETOUR_TRAMPOLINE( VOID WINAPI SleepTrampoline( DWORD ),
Sleep );
fyi, Actual Sleep function signature (from windows.h, is in kernel32.dll):
VOID Sleep( DWORD dwMilliseconds );
"Note that for proper interception: the prototype, target, trampoline,
and detour functions must all have exactly the same call signature
including number of arguments and calling convention."
intercepting the target function : invoke DetourFunctionWithTrampoline
with two args : (1) trampoline, (2) pointer to the detour function
note that the target function is already encoded in the trampoline
and so is not needed as an arg
e.g. DetourFunctionWithTrampoline( (PBYTE)SleepTrampoline,
(PBYTE)SleepDetour );
where
VOID WINAPI SleepDetour( DWORD dw ) {
return SleepTrampoline( dw );
}
(2) to intercept a function with a dynamic trampoline
- call DetourFunction with two arguments : (1) a pointer to the target
function and (2) a pointer to the detour function
- e.g.
#include <windows.h>
#include <detours.h>
VOID TargetFunction( VOID );   // the fxn to intercept; defined elsewhere
VOID (*DynamicTrampoline)(VOID) = NULL;
VOID DynamicDetour( VOID ) {
  return DynamicTrampoline();
}
int main( void ) {
  VOID (*DynamicTarget)(VOID) = TargetFunction;
  // FUNCPTR isn't defined anywhere in these notes, so cast explicitly
  DynamicTrampoline = (VOID (*)(VOID))DetourFunction( (PBYTE)DynamicTarget,
                                                      (PBYTE)DynamicDetour );
  ...
  // below function can be used w/either static or dynamic tramps
  DetourRemoveTrampoline( DynamicTrampoline );
  return 0;
}
- DetourFunction : allocates a new trampoline and inserts the
appropriate interception code into the target function
Static tramps very easy to use when target function is available as a link
symbol; when it isn't, use a dynamic trampoline.
DetourFindFunction : can find the pointer to a function when that function
is exported from a known DLL or if debugging symbols are available for the
target function's binary.
- takes two args : the name of the binary and the name of the function
- first tries via LoadLibrary(...) and GetProcAddress(...)
- then uses ImageHlp library to search available debugging symbols
- the fxn pointer returned by DetourFindFunction can be given to
DetourFunction to create a dynamic trampoline
Programmer's responsibility to make sure that no other threads are
exec'ing in addy space while a detour is inserted or removed
- one approach : call functions in the Detours library from a DLL main routine...
--------------
IV. Evaluation
--------------
Other approaches :
(1) call replacement in app source code
- calls to target fxn in app replaced with calls to detour fxn
- requires access to source code
(2) call replacement in app binary
- modify app binary to replace calls to target fxn w/calls to detour fxn
- requires being able to identify all applicable call sites
-- requires symbolic info which may not be present in general binaries
-- also would miss dynamically-linked calls to the target fxn
(i.e. which work by loading dll then getprocaddress(...)) as well
as calls which use late-demand binding?
(3) DLL redirection
- modify DLL import entries in binary to point to a detour DLL
- fails to intercept DLL internal calls and calls on pointers obtained
from GetProcAddress(...)
(4) Breakpoint trapping
- insert debugging breakpoint into the target function
- have debugging exception handler invoke the detour function
- but debugging exceptions suspend all application threads
- requires second OS process to catch the debug exception
==> heavy performance penalty
============
References :
============
http://research.microsoft.com/sn/detours/
http://research.microsoft.com/~galenh/Publications/HuntUsenixNt99.pdf
http://www.sisecure.com/pdf/cs-2003-01.pdf
+++++++++++++++++++
Detours Usage Notes
+++++++++++++++++++
(1) withdll
(2) setdll
(3)