Grabbing Kernel Thread Call Stacks the Process Explorer Way

February 11, 2009

Grabbing Kernel Thread Call Stacks the Process Explorer Way – Part 1

Filed under: Code,Windows — adeyblue @ 5:57 am

Software versions: XP SP3 32-bit, Process Explorer 10.21

Part 1: Getting the CONTEXT

If you’ve used Process Explorer, chances are you’ve checked out a thread stack or two. If you’ve ever tried to implement something similar yourself, the combo of SuspendThread, GetThreadContext, ResumeThread, and StackWalk64 has more than likely done a sterling job of getting a user mode trace. But what about further up the stack, or those threads locked in kernel mode?

The DACL for threads belonging to the system process allows the THREAD_GET_CONTEXT permission for the administrators group ^[1], so the same formula should work, but alas it doesn’t. Any attempt to pass a thread handle with such rights to GetThreadContext returns FALSE with a last error of ERROR_INVALID_HANDLE. So how does Process Explorer handle it? The answer is it doesn’t, not really. StackWalk64 only uses the ebp, eip, and esp members of the CONTEXT struct, so those are all it fills in, with help from a driver and, as is to be expected from Sysinternals, a whole bunch of undocumented behaviour.

To get the stack trace, we obviously need to know where the stack area is for a given thread. Like its user mode compatriot the TEB (actually the NT_TIB part), the ETHREAD structure has two members which point to the top and the bottom of the address range which delimits the stack. Unlike the TEB, however, it has a third member pointing to the current top of the stack, and this pointer is the golden egg.
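A small user mode model makes the relationship between those three pointers concrete. The struct and function names below are hypothetical stand-ins, not the real ETHREAD layout. The only assumption carried over from the kernel is that x86 stacks grow downwards, so a saved stack pointer should sit between the limit (lowest address) and the base (highest address).

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the three stack fields discussed
   above. This is NOT the real ETHREAD / KTHREAD layout. */
typedef struct StackFieldsModel
{
    uintptr_t initialStack; /* base of the stack: highest address */
    uintptr_t stackLimit;   /* far end of the stack region: lowest address */
    uintptr_t kernelStack;  /* saved stack pointer of a thread that isn't running */
} StackFieldsModel;

/* Returns 1 if the saved stack pointer lies inside the stack region and
   0 otherwise. Because the stack grows towards lower addresses, a sane
   value satisfies stackLimit <= kernelStack < initialStack. */
int SavedStackPointerLooksSane(const StackFieldsModel* fields)
{
    return fields->kernelStack >= fields->stackLimit
        && fields->kernelStack < fields->initialStack;
}
```

A check like this isn’t something the Process Explorer driver is shown doing here; it’s only a way of picturing which pointer is which.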

Before we use it, we first need to take the simple step of getting an ETHREAD when given a handle or ID. Luckily, there are documented functions for each. For a handle, the conversion is done via ObReferenceObjectByHandle, and for an ID by PsLookupThreadByThreadId. Now that we have it, and access to its KernelStack member, we’re ready to get our three pointers’ worth of data.

The first one to grab is the value used as the thread’s esp. A quick peek at the disassembly of Process Explorer’s Procexp100 driver reveals just how arduous this task is:

; starting from Procexp100.sys!+0x983
; esi = address of the output buffer, edi = (*ethread).kernelStack
lea     eax,[edi+0Ch] ; copy kernelStack + 12 into eax
push    eax ; use as argument for MmIsAddressValid
mov     dword ptr [esi+4],eax ; save in output buffer
call    dword ptr [PROCEXP100+0x290] ; call MmIsAddressValid via function pointer
test    al,al ; test for success
je      PROCEXP100+0x9b4 (f75289b4) ; goto error handler if it returned 0

And that’s it. The kernelStack value has 12 added to it and the result is saved in eax. This value is then pushed as the argument to MmIsAddressValid and stashed in the output buffer before the call is made. If you were expecting long, complicated code, you’ll be wanting to re-read the beginning of the article, because this is about as complex as it gets. For those who don’t know asm, the C equivalent is simply:

PVOID threadESP = kernelStack + 3; /* kernelStack is a UINT_PTR* */
outputBuffer->esp = threadESP;
if(MmIsAddressValid(threadESP))
{
    /* grab other two */
}

The other two values are read in much the same way:

; starting from Procexp100.sys!+0x994
; esi = address of the output buffer, edi = ethread->kernelStack
mov     eax,dword ptr [esi+4] ; thread esp into eax
mov     ecx,dword ptr [ebp+10h] ; nothing of note, ecx isn't used
mov     eax,dword ptr [eax] ; dereference of thread esp is ebp (*esp = ebp)
mov     dword ptr [esi+8],eax ; save ebp into output buffer
mov     eax,dword ptr [edi+8] ; thread eip = *(kernelStack + 8)
mov     dword ptr [esi],eax ; save eip into output buffer

and the equivalent C is even simpler still

outputBuffer->ebp = *(PVOID*)threadESP;
outputBuffer->eip = (PVOID)*(kernelStack + 2); /* kernelStack is a UINT_PTR*, so cast the value read */
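To see all three reads working together without loading a driver, the arithmetic can be modelled in user mode against a fake saved stack. This is only a sketch: ThreadCtxModel and ReadSavedContext are hypothetical names, the array stands in for whatever the kernel actually left on the thread’s stack, and nothing is assumed about slots 0 and 1 beyond the fact that the driver skips them.

```c
#include <stdint.h>

/* Same three fields as the driver's output buffer, held as integers so
   the model behaves the same on 32-bit and 64-bit builds. */
typedef struct ThreadCtxModel
{
    uintptr_t eip;
    uintptr_t esp;
    uintptr_t ebp;
} ThreadCtxModel;

/* savedStack plays the role of the thread's KernelStack pointer.
   Slots 0 and 1 are skipped, slot 2 supplies eip, and slot 3 is both
   where esp points and where the ebp value is read from. */
void ReadSavedContext(const uintptr_t* savedStack, ThreadCtxModel* out)
{
    /* three pointer-sized slots up: kernelStack + 12 bytes on 32-bit x86 */
    const uintptr_t* threadEsp = savedStack + 3;
    out->esp = (uintptr_t)threadEsp; /* the address itself is esp */
    out->ebp = *threadEsp;           /* mov eax,dword ptr [eax] */
    out->eip = savedStack[2];        /* mov eax,dword ptr [edi+8] */
}
```

Feeding it a four element array shows that eip comes out as element 2, ebp as element 3, and esp as the address of element 3.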

And that really is that. After all the buildup, the end was thoroughly anticlimactic. So much so that it’s covered in its entirety, plus error checking, in just 12 lines of assembly and even fewer in C. However, now that we know how the experts do it, we can write an equivalent driver function of our own:

/* output buffer struct */
typedef struct _ThreadCtx
{
    PVOID eip;
    PVOID esp;
    PVOID ebp;
} ThreadCtx;
 
/* first few members of the ethread / kthread structure are all we need
   note: this structure does change between OS releases, this layout works on 2000 and XP
*/
typedef struct _KTHREAD_2000_XP
{
    DISPATCHER_HEADER Header;
    LIST_ENTRY MutantListHead;
    PVOID InitialStack; /* base of the stack region: highest address, the stack grows down from here */
    PVOID StackLimit; /* far end of the stack region: lowest address the stack can grow to */
    PVOID TEB;
    PVOID TlsArray;
    PVOID KernelStack; /* Current stack top */
    /* ... */
} KTHREAD_XP;
 
NTSTATUS GetThreadContext(ULONG* pThreadID, ULONG inSize, ThreadCtx* ctx, ULONG outSize, ULONG* bytesCopied)
{
    NTSTATUS stat = STATUS_SUCCESS;
    *bytesCopied = 0; /* Initialize bytes copied count*/
    /* validate user parameters */
    if(pThreadID && ctx && (inSize >= sizeof(*pThreadID)) && (outSize >= sizeof(*ctx)))
    {
        PETHREAD pThread = NULL;
        /* grab a pointer to the ETHREAD */
        if(NT_SUCCESS(stat = PsLookupThreadByThreadId((HANDLE)(ULONG_PTR)*pThreadID, &pThread)))
        {
            /* bend it to our superior version */
            KTHREAD_XP* kThread = (KTHREAD_XP*)pThread;
            /* Do the hustle as outlined above */
            UINT_PTR* kernelStack = (UINT_PTR*)kThread->KernelStack;
            PVOID threadESP = (PVOID)(kernelStack + 3);
            ctx->esp = threadESP;
            if(MmIsAddressValid(threadESP))
            {
                ctx->ebp = *(PVOID*)threadESP;
                ctx->eip = *(PVOID*)(kernelStack + 2);
                *bytesCopied = sizeof(*ctx);
            }
            else
            {
                stat = STATUS_ACCESS_VIOLATION;
            }
            /* finally release our hold on the thread */
            ObDereferenceObject(pThread);
        }
    }
    else
    {
        stat = STATUS_INFO_LENGTH_MISMATCH;
    }
    return stat;
}

And voilà, we have a simple context grabber function. Using it is simply a matter of hooking it up to an ioctl so it can be called from DeviceIoControl, and we’ll also need a simple memcpy wrapper routine. Next time, we’ll investigate how Process Explorer does this and write our own, along with the startup and shutdown code needed by our driver, to end up with something functional. Finally, part 3 will tie up the user mode loose ends and give us a sample application.
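In the meantime, here is a rough idea of what the control code for such an ioctl might look like. The macro mirrors the bit layout of the WDK’s CTL_CODE macro. Everything prefixed with MY_ is a hypothetical name, and the function number 0x800 is picked arbitrarily from the range left open for custom codes. None of this is taken from Process Explorer’s driver.

```c
#include <stdint.h>

/* Values as defined in the Windows headers, renamed so this compiles anywhere */
#define MY_FILE_DEVICE_UNKNOWN 0x00000022u
#define MY_METHOD_BUFFERED     0u
#define MY_FILE_ANY_ACCESS     0u

/* Same bit layout as CTL_CODE: device type in bits 16-31, required access
   in bits 14-15, function number in bits 2-13, transfer method in bits 0-1 */
#define MY_CTL_CODE(deviceType, function, method, access) \
    ((uint32_t)(((deviceType) << 16) | ((access) << 14) | ((function) << 2) | (method)))

/* Hypothetical control code for the context grabber */
#define IOCTL_MY_GET_THREAD_CONTEXT \
    MY_CTL_CODE(MY_FILE_DEVICE_UNKNOWN, 0x800u, MY_METHOD_BUFFERED, MY_FILE_ANY_ACCESS)

/* Pulls the 12-bit function number back out of a control code */
uint32_t FunctionFromCtlCode(uint32_t code)
{
    return (code >> 2) & 0xFFFu;
}
```

A real driver would use CTL_CODE from the WDK headers directly, and the user mode side would pass the same value to DeviceIoControl with a ULONG thread ID as the input buffer and a ThreadCtx as the output buffer.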

Just Let It Flow
