Software versions: XP SP3 32-bit, Process Explorer 10.21
Part 1: Getting the CONTEXT
If you’ve used Process Explorer, chances are you’ve checked out a thread stack or two. If you’ve ever tried to implement something similar yourself, the combo of SuspendThread, GetThreadContext, ResumeThread, and StackWalk64 has more than likely done a sterling job of getting a user mode trace. But what about further up the stack, or those threads locked in kernel mode?
The DACL for threads belonging to the system process allows the THREAD_GET_CONTEXT permission for the administrators group [1], so the same formula should work, but alas it doesn’t. Any attempt to pass a thread handle with such rights to GetThreadContext returns FALSE with a last error of ERROR_INVALID_HANDLE. So how does Process Explorer handle it? The answer is it doesn’t, not really. StackWalk64 only uses the ebp, eip, and esp members of the CONTEXT struct, so those are all Process Explorer fills in, with help from a driver and, as is to be expected from SysInternals, a whole bunch of undocumented behaviour.
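To see why three registers can be enough for a basic trace, here is a minimal user mode sketch of a frame pointer walk over a simulated stack. It is not how StackWalk64 is implemented (the real function also copes with frames that omit the frame pointer). The FakeContext type, the walk_frames function, and the lo/hi bounds are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three CONTEXT members mentioned above. */
typedef struct FakeContext {
    uintptr_t eip;  /* current instruction pointer */
    uintptr_t esp;  /* current stack pointer (not needed by this simple walk) */
    uintptr_t ebp;  /* current frame pointer */
} FakeContext;

/*
 * Walk a chain of conventional frames. At each frame pointer:
 *   [ebp]            = the caller's saved ebp
 *   [ebp + one word] = the return address into the caller
 * lo and hi bound the memory we allow ourselves to read (the thread's stack).
 * Returns the number of code addresses written to out.
 */
size_t walk_frames(const FakeContext *ctx, uintptr_t lo, uintptr_t hi,
                   uintptr_t *out, size_t max_out)
{
    size_t n = 0;
    uintptr_t ebp;

    assert(ctx != NULL && out != NULL);
    if (max_out == 0)
        return 0;

    out[n++] = ctx->eip;  /* the innermost frame is wherever eip points */
    ebp = ctx->ebp;

    while (n < max_out &&
           ebp >= lo &&
           ebp + 2 * sizeof(uintptr_t) <= hi &&
           ebp % sizeof(uintptr_t) == 0) {
        const uintptr_t *frame = (const uintptr_t *)ebp;
        uintptr_t next = frame[0];  /* saved ebp of the caller */
        uintptr_t ret = frame[1];   /* return address into the caller */

        if (ret == 0)
            break;                  /* nothing sensible to report */
        out[n++] = ret;

        if (next <= ebp)
            break;                  /* callers must live at higher addresses */
        ebp = next;
    }
    return n;
}
```

Given a FakeContext whose ebp points into an array that mimics a stack, walk_frames reports eip first and then one return address per frame until the chain ends or leaves the permitted range.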
To get the stack trace, we obviously need to know where the stack area is for a given thread. Like its user mode compatriot the TEB (actually the NT_TIB part), the ETHREAD structure has two members which point to the top and the bottom of the address range which delimits the stack. Unlike the TEB, however, it has a third member pointing to the current top of the stack, and this pointer is the golden egg.
Before we use it, we first need to take the simple step of obtaining an ETHREAD pointer when given a handle or ID. Luckily, there are documented functions for each. For a handle, the conversion is done via ObReferenceObjectByHandle, and for an ID by PsLookupThreadByThreadId. Now that we have the ETHREAD and access to its KernelStack member, we’re ready to get our three pointers’ worth of data.
The first one to grab is the value used as the thread’s esp. A quick peek at the disassembly of Process Explorer’s Procexp100 driver reveals just how arduous this task is:
; starting from Procexp100.sys!+0x983
; esi = address of the output buffer, edi = (*ethread).kernelStack
lea eax,[edi+0Ch] ; copy kernelStack + 12 into eax
push eax ; use as argument for MmIsAddressValid
mov dword ptr [esi+4],eax ; save in output buffer
call dword ptr [PROCEXP100+0x290] ; call MmIsAddressValid via function pointer
test al,al ; test for success
je PROCEXP100+0x9b4 (f75289b4) ; goto error handler if it returned 0
And that’s it. The kernelStack value has 12 added to it and the result is saved in eax. This value is then pushed as the argument to MmIsAddressValid before being stashed in the output buffer. If you were expecting long, complicated code, you’ll be wanting to re-read the beginning of the article, because this is about as complex as it gets. For those who don’t know asm, the C equivalent is simply:
PVOID threadESP = kernelStack + 3; /* kernelStack is a UINT_PTR* */
outputBuffer->esp = threadESP;
if(MmIsAddressValid(threadESP))
{
    /* grab other two */
}
The other two values are read in much the same way:
; starting from Procexp100.sys!+0x994
; esi = address of the output buffer, edi = ethread->kernelStack
mov eax,dword ptr [esi+4] ; thread esp into eax
mov ecx,dword ptr [ebp+10h] ; nothing of note, ecx isn't used
mov eax,dword ptr [eax] ; dereference of thread esp is ebp (*esp = ebp)
mov dword ptr [esi+8],eax ; save ebp into output buffer
mov eax,dword ptr [edi+8] ; thread eip = *(kernelStack + 8)
mov dword ptr [esi],eax ; save eip into output buffer
and the equivalent C is simpler still:
outputBuffer->ebp = *(PVOID*)threadESP;
outputBuffer->eip = *(kernelStack + 2);
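The pointer arithmetic is easy to try out in user mode against a fake stack. The following is a minimal sketch under stated assumptions: FakeThreadCtx and decode_switch_frame are hypothetical names, and a simple bounds check stands in for MmIsAddressValid, which only exists in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user mode mirror of the driver's output buffer. */
typedef struct FakeThreadCtx {
    uintptr_t eip;
    uintptr_t esp;
    uintptr_t ebp;
} FakeThreadCtx;

/*
 * Decode the words found at kernelStack, following the layout described above:
 *   address of word 3  -> the thread's esp
 *   contents of word 3 -> the thread's ebp
 *   contents of word 2 -> the thread's eip
 * lo and hi describe the readable region; checking against them is an
 * assumption that replaces MmIsAddressValid for this simulation.
 * Returns 1 on success and 0 if the words fall outside [lo, hi).
 */
int decode_switch_frame(const uintptr_t *kernelStack,
                        const uintptr_t *lo, const uintptr_t *hi,
                        FakeThreadCtx *out)
{
    const uintptr_t *threadEsp;

    assert(kernelStack != NULL && out != NULL);

    threadEsp = kernelStack + 3;
    out->esp = (uintptr_t)threadEsp;  /* stored before validation, as in the driver */

    if ((uintptr_t)kernelStack < (uintptr_t)lo ||
        (uintptr_t)threadEsp >= (uintptr_t)hi)
        return 0;

    out->ebp = *threadEsp;
    out->eip = kernelStack[2];
    return 1;
}
```

Filling a small array with a made-up eip at index 3 and a made-up ebp at index 4, then passing the address of index 1 as kernelStack, yields those two values back along with the address of index 4 as esp.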
And that really is that. After all the buildup, the end was thoroughly anticlimactic. So much so that it’s covered in its entirety, plus error checking, in just 12 lines of assembly and even fewer in C. However, now that we know how the experts do it, we can write an equivalent driver function of our own:
/* output buffer struct */
typedef struct _ThreadCtx
{
    PVOID eip;
    PVOID esp;
    PVOID ebp;
} ThreadCtx;

/* first few members of the ethread / kthread structure are all we need
   note: this structure does change between OS releases, this layout works on 2000 and XP */
typedef struct _KTHREAD_2000_XP
{
    DISPATCHER_HEADER Header;
    LIST_ENTRY MutantListHead;
    PVOID InitialStack; /* Bottom of the stack memory region */
    PVOID StackLimit; /* top of the stack memory region */
    PVOID TEB;
    PVOID TlsArray;
    PVOID KernelStack; /* Current stack top */
    /* ... */
} KTHREAD_XP;

NTSTATUS GetThreadContext(ULONG* pThreadID, ULONG inSize, ThreadCtx* ctx, ULONG outSize, ULONG* bytesCopied)
{
    NTSTATUS stat = STATUS_SUCCESS;
    *bytesCopied = 0; /* Initialize bytes copied count*/
    /* validate user parameters */
    if(pThreadID && ctx && (inSize >= sizeof(*pThreadID)) && (outSize >= sizeof(*ctx)))
    {
        PETHREAD pThread = NULL;
        /* grab a pointer to the ETHREAD */
        if(NT_SUCCESS(stat = PsLookupThreadByThreadId((HANDLE)*pThreadID, &pThread)))
        {
            /* bend it to our superior version */
            KTHREAD_XP* kThread = (KTHREAD_XP*)pThread;
            /* Do the hustle as outlined above */
            UINT_PTR* kernelStack = (UINT_PTR*)kThread->KernelStack;
            PVOID threadESP = (PVOID)(kernelStack + 3);
            ctx->esp = threadESP;
            if(MmIsAddressValid(threadESP))
            {
                ctx->ebp = *(PVOID*)threadESP;
                ctx->eip = *(PVOID*)(kernelStack + 2);
                *bytesCopied = sizeof(*ctx);
            }
            else
            {
                stat = STATUS_ACCESS_VIOLATION;
            }
            /* finally release our hold on the thread */
            ObDereferenceObject(pThread);
        }
    }
    else
    {
        stat = STATUS_INFO_LENGTH_MISMATCH;
    }
    return stat;
}
And voilà, we have a simple context grabber function. Using it is simply a matter of hooking it up to an ioctl so it can be called from DeviceIoControl; we’ll also need a simple memcpy wrapper routine. Next time we’ll investigate how Process Explorer does this and write our own, along with the startup and shutdown code our driver needs, so that we end up with something functional. Finally, part 3 will tie up the user mode loose ends and give us a sample application.