Just Let It Flow

May 26, 2012

Process Thread Creation Notification – The Easy Way

Filed under: Code,Windows — adeyblue @ 3:22 am

If what you’re writing already requires a dll, or you can leverage an existing one, then you’re already set: DllMain gets called when threads are created and destroyed, and you can use that to your advantage. If you’re not writing one, or can’t use one, then you’re pretty much stuck for an answer. Conventional wisdom on the web seems to revolve around hooking CreateThread. However, with several methods of creating threads, called at various levels of the Win32 system, that isn’t always sufficient, especially if you want to execute code in the context of the new thread.

Dll thread_attach notifications work because when threads are created and torn down, ntdll loops around the internal structures corresponding to each module loaded in the process and calls their entry point if they meet certain criteria. The structure for the exe is included in the enumeration but as it doesn’t identify as a dll, its entry point isn’t called.

The thing to do, then, is modify it so that a) it looks like a dll and b) ntdll thinks our entry point is a DllMain. Usually, poking around officially undocumented stuff like this is at least a slight pain (after all, you’re not meant to be touching it). In this case, however, the structure is a single call away.

// in ntdll.dll
EXTERN_C
NTSYSAPI
NTSTATUS
NTAPI
LdrFindEntryForAddress(
    HMODULE hMod,
    LDR_DATA_TABLE_ENTRY** ppEntry
);

You give it an address in a module, and it gives you a pointer to the structure. Seems like a fair trade to me. Here’s what’s in it.

struct LDR_DATA_TABLE_ENTRY {
    LIST_ENTRY InLoadOrderModuleList;
    LIST_ENTRY InMemoryOrderModuleList;
    LIST_ENTRY InInitializationOrderModuleList;
    PVOID BaseAddress;
    PVOID EntryPoint;
    ULONG SizeOfImage;
    UNICODE_STRING FullDllName;
    UNICODE_STRING BaseDllName;
    ULONG Flags;
    SHORT LoadCount;
    SHORT TlsIndex;
    union
    {
        LIST_ENTRY HashLinks;
        PVOID SectionPointer;
    };
    ULONG Checksum;
    union
    {
        ULONG TimeDataStamp;
        PVOID LoadedImports;
    };
    PVOID EntryPointActivationContext;
    PVOID PatchInformation;
};

This is the XP version layout. More recent versions have appended extra fields, but we ignore them. The three fields of interest are BaseAddress, EntryPoint, and Flags.

EntryPoint and Flags are self-explanatory, but BaseAddress? Yep, as well as checking the flags, the looping code also excludes any entry whose BaseAddress is the same as GetModuleHandle(NULL). Changing it might sound like it could be a bit disabling to the program, especially if you know that GetModuleHandle can also loop over these structures. It isn’t in practice though. It only affects calls to GetModuleHandle(); the value returned from GetModuleHandle(NULL) is cached elsewhere, so modifying the entry leaves it unaffected.

Setting the new entrypoint to a DllMain-a-like is easy enough, and the BaseAddress value can be changed to BaseAddress += 2 (this enables it to still be used in kernel32 functions if anybody ever does GetModuleHandle()), yet we’ve neglected the flag values.

//
// Loader Data Table Entry Flags, from ReactOS
//
#define LDRP_STATIC_LINK                        0x00000002
#define LDRP_IMAGE_DLL                          0x00000004
#define LDRP_LOAD_IN_PROGRESS                   0x00001000
#define LDRP_UNLOAD_IN_PROGRESS                 0x00002000
#define LDRP_ENTRY_PROCESSED                    0x00004000
#define LDRP_ENTRY_INSERTED                     0x00008000
#define LDRP_CURRENT_LOAD                       0x00010000
#define LDRP_FAILED_BUILTIN_LOAD                0x00020000
#define LDRP_DONT_CALL_FOR_THREADS              0x00040000
#define LDRP_PROCESS_ATTACH_CALLED              0x00080000
#define LDRP_DEBUG_SYMBOLS_LOADED               0x00100000
#define LDRP_IMAGE_NOT_AT_BASE                  0x00200000
#define LDRP_COR_IMAGE                          0x00400000
#define LDRP_COR_OWNS_UNMAP                     0x00800000
#define LDRP_SYSTEM_MAPPED                      0x01000000
#define LDRP_IMAGE_VERIFYING                    0x02000000
#define LDRP_DRIVER_DEPENDENT_DLL               0x04000000
#define LDRP_ENTRY_NATIVE                       0x08000000
#define LDRP_REDIRECTED                         0x10000000
#define LDRP_NON_PAGED_DEBUG_INFO               0x20000000
#define LDRP_MM_LOADED                          0x40000000
#define LDRP_COMPAT_DATABASE_PROCESSED          0x80000000

There are a lot of them, yet we only need concern ourselves with three. LDRP_IMAGE_DLL and LDRP_PROCESS_ATTACH_CALLED need to be set to signal that we are a dll and that we’ve had our init code called. LDRP_DONT_CALL_FOR_THREADS needs to be clear, because being called for threads is exactly what we’re after!
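To see the flag juggling in isolation, here is a minimal, portable sketch. MakeFlagsDllLike, WouldBeCalledForThreads and DemoExePatch are hypothetical names made up for illustration, and the checks in WouldBeCalledForThreads only model what this post describes ntdll doing; the real loader code may well differ.

```cpp
#include <cassert>
#include <cstdint>

// Flag values copied from the ReactOS list above.
constexpr std::uint32_t kImageDll            = 0x00000004; // LDRP_IMAGE_DLL
constexpr std::uint32_t kEntryProcessed      = 0x00004000; // LDRP_ENTRY_PROCESSED
constexpr std::uint32_t kDontCallForThreads  = 0x00040000; // LDRP_DONT_CALL_FOR_THREADS
constexpr std::uint32_t kProcessAttachCalled = 0x00080000; // LDRP_PROCESS_ATTACH_CALLED

// Hypothetical helper: set the two "I'm an initialised dll" bits and clear
// the "don't call me for threads" bit, leaving every other bit alone.
constexpr std::uint32_t MakeFlagsDllLike(std::uint32_t flags)
{
    return (flags | kImageDll | kProcessAttachCalled) & ~kDontCallForThreads;
}

// Hypothetical model of the checks described in this post. It is an
// assumption for illustration, not ntdll's actual code.
constexpr bool WouldBeCalledForThreads(std::uint32_t flags,
                                       std::uintptr_t entryBase,
                                       std::uintptr_t exeBase)
{
    return entryBase != exeBase
        && (flags & kImageDll) != 0
        && (flags & kProcessAttachCalled) != 0
        && (flags & kDontCallForThreads) == 0;
}

// Usage sketch with made-up starting flags and a made-up image base.
inline bool DemoExePatch()
{
    const std::uintptr_t exeBase = 0x00400000;
    std::uint32_t flags = kEntryProcessed | kDontCallForThreads;

    // Untouched, the exe's entry fails every check in the model.
    assert(!WouldBeCalledForThreads(flags, exeBase, exeBase));

    flags = MakeFlagsDllLike(flags);
    // Bits we weren't interested in survive the patch.
    assert((flags & kEntryProcessed) != 0);

    // With the flags patched and the base nudged by 2, it passes.
    return WouldBeCalledForThreads(flags, exeBase + 2, exeBase);
}
```

Calling DemoExePatch() walks through the same three changes the real code makes to the exe's entry, just on plain integers rather than on the loader's live data.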

So, putting it all into motion:

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <winternl.h> // for UNICODE_STRING
#include <cstdio>
 
#define LDRP_IMAGE_DLL                          0x00000004
#define LDRP_DONT_CALL_FOR_THREADS              0x00040000
#define LDRP_PROCESS_ATTACH_CALLED              0x00080000
 
struct LDR_DATA_TABLE_ENTRY {
    LIST_ENTRY InLoadOrderModuleList;
    LIST_ENTRY InMemoryOrderModuleList;
    LIST_ENTRY InInitializationOrderModuleList;
    PVOID BaseAddress;
    PVOID EntryPoint;
    ULONG SizeOfImage;
    UNICODE_STRING FullDllName;
    UNICODE_STRING BaseDllName;
    ULONG Flags;
    SHORT LoadCount;
    SHORT TlsIndex;
    union
    {
        LIST_ENTRY HashLinks;
        PVOID SectionPointer;
    };
    ULONG Checksum;
    union
    {
        ULONG TimeDataStamp;
        PVOID LoadedImports;
    };
    PVOID EntryPointActivationContext;
    PVOID PatchInformation;
};
 
BOOL APIENTRY ThreadAndShutdownNotify(HMODULE hMod, DWORD reason, PVOID pDynamic)
{
    char buffer[100];
    switch(reason)
    {
        case DLL_THREAD_ATTACH:
        {
            sprintf(buffer, "Thread attach : %lu\n", GetCurrentThreadId());
        }
        break;
        case DLL_THREAD_DETACH:
        {
            sprintf(buffer, "Thread detach : %lu\n", GetCurrentThreadId());
        }
        break;
        case DLL_PROCESS_DETACH:
        {
            sprintf(buffer, "Process detach : %lu\n", GetCurrentThreadId());
        }
        break;
        default:
        {
            // any other reason leaves buffer unset, so don't print it
            return TRUE;
        }
    }
    OutputDebugStringA(buffer);
    puts(buffer);
    return TRUE;
}
 
DWORD WINAPI WaitThread(PVOID p)
{
    return WaitForSingleObject((HANDLE)p, INFINITE);
}
 
typedef NTSTATUS (NTAPI*pfnLdrFindEntryForAddress)(HMODULE hMod, LDR_DATA_TABLE_ENTRY** ppLdrData);
 
int main()
{
    HMODULE hNtdll = GetModuleHandleW(L"ntdll.dll");
    pfnLdrFindEntryForAddress LdrFindEntryForAddress = (pfnLdrFindEntryForAddress)GetProcAddress(hNtdll, "LdrFindEntryForAddress");
    LDR_DATA_TABLE_ENTRY* pEntry = NULL;
    // GetProcAddress returns NULL if the export is missing, so check before calling
    if(LdrFindEntryForAddress && NT_SUCCESS(LdrFindEntryForAddress(GetModuleHandle(NULL), &pEntry)))
    {
        pEntry->EntryPoint = (PVOID)&ThreadAndShutdownNotify;
        pEntry->Flags |= LDRP_PROCESS_ATTACH_CALLED | LDRP_IMAGE_DLL;
        pEntry->Flags &= ~(LDRP_DONT_CALL_FOR_THREADS);
        pEntry->BaseAddress = (PVOID)(((ULONG_PTR)pEntry->BaseAddress) + 2);
    }
    else
    {
        return puts("Something's strange, in the neighbourhood, and my phone doesn't work!!");
    }
    HANDLE hEvent4 = CreateEvent(NULL, TRUE, FALSE, NULL);
    HANDLE hThread[10];
    for(DWORD i = 0; i < ARRAYSIZE(hThread); ++i)
    {
        hThread[i] = CreateThread(NULL, 0, &WaitThread, hEvent4, 0, NULL);
    }
    SetEvent(hEvent4);
    WaitForMultipleObjects(ARRAYSIZE(hThread), hThread, TRUE, INFINITE);
    for(DWORD i = 0; i < ARRAYSIZE(hThread); ++i)
    {
        CloseHandle(hThread[i]);
    }
    CloseHandle(hEvent4);
    return 0;
}

And running it produces:

Thread attach : 4208
Thread attach : 3052
Thread attach : 2076
Thread attach : 3144
Thread attach : 1476
Thread attach : 516
Thread attach : 4224
Thread attach : 4320
Thread attach : 3620
Thread attach : 1280

Yay… wait a minute. Where are the detach and shutdown notifications? So much for mimicking DllMain. See those LIST_ENTRY structures at the head of the LDR_DATA_TABLE_ENTRY structure? Those are the three different orders in which the module entries are linked. LoadOrder is obvious, whereas non-obviously MemoryOrder is the same as LoadOrder, and InitializationOrder is the order that the DllMains were called… ah. Our LDR_DATA_TABLE_ENTRY is the first entry of the first two lists. However, since it isn’t a kosher dll, it’s not present in the third.
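If one structure sitting on several lists at once sounds odd, here is a rough, portable sketch of the idea. Link and Module are cut-down stand-ins for LIST_ENTRY and LDR_DATA_TABLE_ENTRY, and InitHead, PushBack, Contains and ModuleFromInitLink are hypothetical helpers invented for illustration, not anything ntdll exports.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for LIST_ENTRY: a forward and a backward link.
struct Link
{
    Link* flink;
    Link* blink;
};

// Stand-in for LDR_DATA_TABLE_ENTRY, trimmed down to two of its links.
struct Module
{
    Link loadOrder;   // plays the part of InLoadOrderModuleList
    Link initOrder;   // plays the part of InInitializationOrderModuleList
    const char* name;
};

// An empty circular list is a head that points at itself.
inline void InitHead(Link* head)
{
    head->flink = head;
    head->blink = head;
}

// Append a link at the tail of a circular list.
inline void PushBack(Link* head, Link* link)
{
    assert(head && link);
    link->flink = head;
    link->blink = head->blink;
    head->blink->flink = link;
    head->blink = link;
}

// Walk forwards from the head and report whether the link is on the list.
inline bool Contains(const Link* head, const Link* link)
{
    for (const Link* p = head->flink; p != head; p = p->flink)
    {
        if (p == link)
        {
            return true;
        }
    }
    return false;
}

// Recover the owning Module from a pointer to its initOrder link by
// subtracting the member's offset, the same trick CONTAINING_RECORD uses.
inline Module* ModuleFromInitLink(Link* link)
{
    return reinterpret_cast<Module*>(
        reinterpret_cast<char*>(link) - offsetof(Module, initOrder));
}
```

Push a module's loadOrder link onto one head but leave its initOrder link off the other, and Contains returns true for the first list and false for the second, which is the exe's situation here.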

ThreadAttach notifications work since ntdll walks the LoadOrder module list when calling entry points for thread attaches. The other notifications don’t work, since ntdll walks the InitOrder list for those. The solution, therefore, is to add ourselves to that one, and that’s quite a simple operation too. Where you insert the entry is up to you; before kernel32 seems as good a place as any though.
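The splice itself is ordinary doubly linked list surgery. Below is a small, portable sketch of it on a toy list. Link (with a name bolted on for printing), InsertAfter, ReverseOrder and DemoInsertAfterKernel32 are all hypothetical, and the module list is made up. It assumes, as the comment in the listing that follows says, that detach calls go in reverse list order.

```cpp
#include <cassert>
#include <string>

// Stand-in for LIST_ENTRY with a name attached so the order can be printed.
struct Link
{
    Link* flink;
    Link* blink;
    const char* name;
};

// An empty circular list is a head that points at itself.
inline void InitHead(Link* head)
{
    head->flink = head;
    head->blink = head;
}

// Insert entry directly after pos in forward (flink) order.
inline void InsertAfter(Link* pos, Link* entry)
{
    assert(pos && entry);
    entry->flink = pos->flink;
    entry->blink = pos;
    pos->flink->blink = entry;
    pos->flink = entry;
}

// Collect the names walking backwards from the head, which is the order
// detach calls are assumed to be made in.
inline std::string ReverseOrder(const Link* head)
{
    std::string out;
    for (const Link* p = head->blink; p != head; p = p->blink)
    {
        if (!out.empty())
        {
            out += ' ';
        }
        out += p->name;
    }
    return out;
}

// Build a made-up init order list, then splice the exe in after kernel32.
inline std::string DemoInsertAfterKernel32()
{
    Link head{nullptr, nullptr, "head"};
    Link ntdll{nullptr, nullptr, "ntdll"};
    Link k32{nullptr, nullptr, "kernel32"};
    Link user32{nullptr, nullptr, "user32"};
    Link exe{nullptr, nullptr, "exe"};

    InitHead(&head);
    InsertAfter(&head, &ntdll);
    InsertAfter(&ntdll, &k32);
    InsertAfter(&k32, &user32);
    InsertAfter(&k32, &exe); // forward order: ntdll kernel32 exe user32

    return ReverseOrder(&head);
}
```

DemoInsertAfterKernel32() returns "user32 exe kernel32 ntdll": sitting after kernel32 in the list puts the exe ahead of kernel32 when the list is walked backwards.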

In summation:

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <winternl.h> // for UNICODE_STRING
#include <cstdio>
 
#define LDRP_IMAGE_DLL                          0x00000004
#define LDRP_DONT_CALL_FOR_THREADS              0x00040000
#define LDRP_PROCESS_ATTACH_CALLED              0x00080000
 
struct LDR_DATA_TABLE_ENTRY {
    LIST_ENTRY InLoadOrderModuleList;
    LIST_ENTRY InMemoryOrderModuleList;
    LIST_ENTRY InInitializationOrderModuleList;
    PVOID BaseAddress;
    PVOID EntryPoint;
    ULONG SizeOfImage;
    UNICODE_STRING FullDllName;
    UNICODE_STRING BaseDllName;
    ULONG Flags;
    SHORT LoadCount;
    SHORT TlsIndex;
    union
    {
        LIST_ENTRY HashLinks;
        PVOID SectionPointer;
    };
    ULONG Checksum;
    union
    {
        ULONG TimeDataStamp;
        PVOID LoadedImports;
    };
    PVOID EntryPointActivationContext;
    PVOID PatchInformation;
};
 
BOOL APIENTRY ThreadAndShutdownNotify(HMODULE hMod, DWORD reason, PVOID pDynamic)
{
    char buffer[100];
    switch(reason)
    {
        case DLL_THREAD_ATTACH:
        {
            sprintf(buffer, "Thread attach : %lu\n", GetCurrentThreadId());
        }
        break;
        case DLL_THREAD_DETACH:
        {
            sprintf(buffer, "Thread detach : %lu\n", GetCurrentThreadId());
        }
        break;
        case DLL_PROCESS_DETACH:
        {
            sprintf(buffer, "Process detach : %lu\n", GetCurrentThreadId());
        }
        break;
        default:
        {
            // any other reason leaves buffer unset, so don't print it
            return TRUE;
        }
    }
    OutputDebugStringA(buffer);
    puts(buffer);
    return TRUE;
}
 
DWORD WINAPI WaitThread(PVOID p)
{
    return WaitForSingleObject((HANDLE)p, INFINITE);
}
 
void InsertIntoList(LIST_ENTRY* pOurListEntry, LIST_ENTRY* pK32ListEntry)
{
    // dll detach notifications are called in reverse list order,
    // so being after kernel32 in the list means being called before it.
    // Our forward link wants to point to whatever is after
    // pK32ListEntry and our back link wants to point to pK32ListEntry
    LIST_ENTRY* pEntryToInsertAfter = pK32ListEntry->Flink;
    pOurListEntry->Flink = pEntryToInsertAfter;
    pOurListEntry->Blink = pEntryToInsertAfter->Blink;
    pEntryToInsertAfter->Blink = pOurListEntry;
    pOurListEntry->Blink->Flink = pOurListEntry;
}
 
typedef NTSTATUS (NTAPI*pfnLdrFindEntryForAddress)(HMODULE hMod, LDR_DATA_TABLE_ENTRY** ppLdrData);
 
int main()
{
    HMODULE hNtdll = GetModuleHandleW(L"ntdll.dll");
    pfnLdrFindEntryForAddress LdrFindEntryForAddress = (pfnLdrFindEntryForAddress)GetProcAddress(hNtdll, "LdrFindEntryForAddress");
    LDR_DATA_TABLE_ENTRY* pEntry = NULL;
    // GetProcAddress returns NULL if the export is missing, so check before calling
    if(LdrFindEntryForAddress && NT_SUCCESS(LdrFindEntryForAddress(GetModuleHandle(NULL), &pEntry)))
    {
        pEntry->EntryPoint = (PVOID)&ThreadAndShutdownNotify;
        pEntry->Flags |= LDRP_PROCESS_ATTACH_CALLED | LDRP_IMAGE_DLL;
        pEntry->Flags &= ~(LDRP_DONT_CALL_FOR_THREADS);
        pEntry->BaseAddress = (PVOID)(((ULONG_PTR)pEntry->BaseAddress) + 2);
        LDR_DATA_TABLE_ENTRY* pK32Entry = NULL;
        // only link ourselves in if kernel32's entry was actually found
        if(NT_SUCCESS(LdrFindEntryForAddress(GetModuleHandleW(L"kernel32.dll"), &pK32Entry)))
        {
            InsertIntoList(&pEntry->InInitializationOrderModuleList, &pK32Entry->InInitializationOrderModuleList);
        }
    }
    else
    {
        return puts("Something's strange, in the neighbourhood, and my phone doesn't work!!");
    }
    HANDLE hEvent4 = CreateEvent(NULL, TRUE, FALSE, NULL);
    HANDLE hThread[10];
    for(DWORD i = 0; i < ARRAYSIZE(hThread); ++i)
    {
        hThread[i] = CreateThread(NULL, 0, &WaitThread, hEvent4, 0, NULL);
    }
    SetEvent(hEvent4);
    WaitForMultipleObjects(ARRAYSIZE(hThread), hThread, TRUE, INFINITE);
    for(DWORD i = 0; i < ARRAYSIZE(hThread); ++i)
    {
        CloseHandle(hThread[i]);
    }
    CloseHandle(hEvent4);
    return 0;
}

Output:

Thread attach : 4856
Thread attach : 3896
Thread attach : 4140
Thread attach : 3488
Thread attach : 188
Thread detach : 188
Thread attach : 3272
Thread attach : 3376
Thread attach : 1632
Thread detach : 4856
Thread attach : 4120
Thread detach : 3896
Thread detach : 4140
Thread detach : 3488
Thread attach : 3484
Thread detach : 3272
Thread detach : 3376
Thread detach : 1632
Thread detach : 4120
Thread detach : 3484
Process detach : 4720

Yay, for realsies this time. So that’s how you do it, with help from some jiggery-pokery. No hooks, no external code, just pure, clean, faffing around.

2 Comments »

  1. If you don’t want to mess with “undocumented” ntdll functions, you can use debugging functions to debug your own process. Then you’ll also get thread attach/detach notifications: http://msdn.microsoft.com/en-us/library/windows/desktop/ms679308.aspx

    Comment by Martins — May 26, 2012 @ 3:50 am

  2. You can’t use those functions. The debugging functions suspend all threads in the debugged process when an event happens, suspending the thread that’s trying to handle the notification in the process. Indeed, in recent Windows the first thing NtDebugActiveProcess does is check whether you’re trying to suspend the current process. If so, it rejects the attempt with STATUS_ACCESS_DENIED.

    Comment by adeyblue — May 26, 2012 @ 10:16 pm
