Just Let It Flow

January 18, 2010

A Whole Heap of Trouble

Filed under: Code,Windows — adeyblue @ 4:21 am

Contents:
  Introduction
  Leak Checking
    XP Functionality
      Enabling
      Stack Traces
    Vista/7 Upgrades
      HEAP_DEBUGGING_INFORMATION
    How it works
      Stack Collection
      Leak Checking
      Fallibilities
  Wrap up

Introduction


The heap manager, every program uses it. Whether hidden behind the CRT, COM, OLE, or the crusty old Local/GlobalAlloc, it’s at the centre of the vast majority of memory-related operations. Dealing out memory and reclaiming it are no doubt its most common uses, but it has a few more tricks up its sleeve; some well known, some less so. In this part, the operation of the leak checking facility will be investigated, including how to use it, how it works and how it’s evolved.

Leak Checking – XP Functionality

Enabling

If you were paying attention to MSDN or the DDK when XP was released, you’ll have come across the description of the ShutdownFlags registry value for the infamous Image File Execution Options registry key. I say this because the only useful mention of this value and its purpose has been removed from MSDN despite still being relevant [1]. In an IFEO entry for a program [2], the value and its data enable certain tasks to be performed during clean process termination. The value currently has two modes of operation providing the first line of attack in heap leak checking. Setting it to a data of 1 invokes the leak checking code in ntdll and, by default, produces debugger output like the following when the process shuts down.

HEAP[app.exe]: Inspecting leaks at process shutdown ...
Entry     User      Heap          Size    Req.Size     Flags
------------------------------------------------------------
0016FC98  0016FCA0  00160000     624c8  00000020  busy extra fill user_flag 
00172500  00172508  00160000        58  00000050  busy 
005525B0  005525B8  00550000        88  00000080  busy 
HEAP[app.exe]: 3 leaks detected.

Setting it to 2 or 3 triggers a breakpoint in addition to the above behaviour. While it’s good to know that your app has leaks (or not), I’m sure you’ll agree that a block address and size isn’t much to go on.

Stack Traces

What is really needed to start pinning down the source of the leak is a stack trace of where the allocation was made. Phase 2 of the battle starts by adding another registry entry, the more widely known and documented GlobalFlag value. Its presence controls various app specific or systemwide debug options and is usually controlled by the gflags executable distributed with the Debugging tools for Windows. However, as long as you know the values (here’s the cheat sheet) there’s no reason you can’t add it by hand. The option to enable is “Create user mode stack trace database”, value 0x1000. Setting that value/data in the registry and rerunning the program gives debugger output similar to the following.

HEAP[app.exe]: Inspecting leaks at process shutdown ...
Entry     User      Heap          Size    Req.Size     Flags
------------------------------------------------------------
0016FC98  0016FCA0  00160000     624c8  00000020  busy extra fill user_flag 
00172500  00172508  00160000        58  00000050  busy 
005525B0  005525B8  00550000        88  00000080  busy 
HEAP[app.exe]: 3 leaks detected.

Unfortunate though it is, the leak dumping code doesn’t give a hoot if there are stack traces or not. This is where the previously mentioned breakpoint comes into play. When hit, attach WinDbg and run the “!heap -l” command. The end of the tunnel is now much lighter.

0:000> !heap -l
Searching the memory for potential unreachable busy blocks.
Heap 002e0000
Heap 00010000
Heap 00020000
Heap 00210000
Heap 02300000
Scanning VM ...
Scanning references from 294 busy blocks (0 MBytes) ...
Entry     User      Heap      Segment       Size  PrevSize  Unused    Flags
-----------------------------------------------------------------------------
002f2730  002f2748  002e0000  002eebd0        28      -           18  LFH;busy  stack_trace
		77cbb234: ntdll!RtlAllocateHeap+0x00000274
		004031a2: app!wmain+0x000000b2
		00409558: app!__tmainCRTStartup+0x000001a8
		0040939f: app!wmainCRTStartup+0x0000000f
		77b61174: kernel32!BaseThreadInitThunk+0x0000000e
		77c9b3f5: ntdll!__RtlUserThreadStart+0x00000070
		77c9b3c8: ntdll!_RtlUserThreadStart+0x0000001b
 
 
002f3bb0  002f3bc8  002e0000  002e0000        68       458        18  busy  stack_trace
		77cbb234: ntdll!RtlAllocateHeap+0x00000274
		75e17589: KERNELBASE!LocalAlloc+0x0000005f
		004031b8: app!wmain+0x000000c8
		00409558: app!__tmainCRTStartup+0x000001a8
		0040939f: app!wmainCRTStartup+0x0000000f
		77b61174: kernel32!BaseThreadInitThunk+0x0000000e
		77c9b3f5: ntdll!__RtlUserThreadStart+0x00000070
		77c9b3c8: ntdll!_RtlUserThreadStart+0x0000001b
 
 
00212620  00212638  00210000  00210000        98       ad0        18  busy  stack_trace
		77cbb234: ntdll!RtlAllocateHeap+0x00000274
		7740ade8: msvcrt!_calloc_impl+0x00000136
		7740ae43: msvcrt!_calloc_crt+0x00000016
		77412015: msvcrt!__onexitinit+0x0000000c
		77411fc8: msvcrt!_cinit+0x0000001e
		77411a94: msvcrt!_core_crt_dll_init+0x000001b2
		7740a48c: msvcrt!_CRTDLL_INIT+0x0000001b
		77c9af24: ntdll!LdrpCallInitRoutine+0x00000014
		77c9fd2e: ntdll!LdrpRunInitializeRoutines+0x0000026f
		77ca90be: ntdll!LdrpInitializeProcess+0x0000138d
		77ca8fc0: ntdll!_LdrpInitialize+0x00000078
		77c9b2c5: ntdll!LdrInitializeThunk+0x00000010

If “!heap -l” doesn’t list the stack traces, “!heap -p -a ⟨blockAddr⟩” will. File and line information can be obtained by issuing an “ln ⟨address⟩” command, where the address is an IP from the stack trace, e.g.

0:000> ln 004031b8
f:\dev-cpp\projects\test\app\app.cpp(29)+0x16
(004031b8) app!wmain+0xc8

Unfortunately though, Visual Studio doesn’t have access to WinDbg extension commands, meaning you have to find the trace manually, which isn’t a fun exercise [3].

And, basic though it is, that’s the common functionality from XP to 7 as far as built-in code about leaks goes.

Vista/7 Upgrades

One detraction from the XP scheme of things is that it’s inflexible. It’s either enabled for all heaps or for none at all; there’s no middle ground. The stack trace collection also has to be explicitly enabled via the registry or in the image header via gflags, and it captures a static number of frames (32). Things have been redesigned as part of Vista’s upgrades though. The heap manager has sprouted support for finer grained control of debugging and with it, to the delight of non-WinDbg users, caller customized printing of stack traces. Best of all, the features can be controlled by a single API, rather than lots of sparsely documented registry keys. The main downside is that you still have to set the ShutdownFlags value in the registry to activate the leak checking.

HEAP_DEBUGGING_INFORMATION

Armed with a new info-level value of 0x80000002, HeapSetInformation now takes a HEAP_DEBUGGING_INFORMATION structure to configure the aforementioned options. It is currently not present in the public headers, but can be found in ntdll’s symbols. The layout is as follows (function typedef names are mine):

typedef void (NTAPI*ENUMLEAKPROC)(ULONG always0, HANDLE hHeap, PVOID pBlock, SIZE_T blockSize, ULONG numIps, PVOID* ppStack);
typedef NTSTATUS (NTAPI*INTERCEPTPROC)(HANDLE hHeap, UINT action, UINT stackFramesToCapture, ULONG* pOutput);
 
typedef struct _HEAP_DEBUGGING_INFORMATION
{
    INTERCEPTPROC InterceptorFunction;
    WORD InterceptorValue;
    DWORD ExtendedOptions;
    DWORD StackTraceDepth;
    SIZE_T MinTotalBlockSize;
    SIZE_T MaxTotalBlockSize;
    ENUMLEAKPROC HeapLeakEnumerationRoutine;
} HEAP_DEBUGGING_INFORMATION;

The purposes of the members are:

  • InterceptorFunction is of no use to the outside world. Besides 0, it can only be set to three functions internal to ntdll, RtlpStackTracePrefix, RtlpStackTraceDatabaseLogPrefix, and RtlpHeapTrkInterceptor. Fortunately, the second of those is the default option when NULL is specified, and the stack traces are captured via it.
  • InterceptorValue is like the above. It is only valid when the above is also valid and non-null, and is used as a number of stack frames to capture for the interceptor.
  • ExtendedOptions controls LFH heap debugging. If this is non-zero, the heap passed to HeapSetInformation is converted to a LFH heap if it is not one already. The low byte is then used to affect the DebugFlags member of the LFH’s HEAP_BUCKET structures after being doubled, xor’ed with the current flags, and’ed with 6 and xor’ed with the current flags again. I can’t find where they’re used so the effects are unknown.
  • StackTraceDepth is the number of ips to capture in the stack traces. Only the LOWORD is used. Setting it to 0 doesn’t enable trace collection, but it doesn’t disable it if active either. If the InterceptorFunction is valid, this member is ignored in favour of InterceptorValue.
  • MinTotalBlockSize is the minimum bucket size in bytes to apply the above flags to.
  • MaxTotalBlockSize is the maximum bucket size to apply the above flags to. If these members are both 0, the debug flags are applied to every LFH bucket.
  • HeapLeakEnumerationRoutine is the function called for every leak at program termination, even those without stack traces collected. It is called after all DLL_PROCESS_DETACH notifications have been sent, so it can only be reliably implemented by statically linked exe functions and ntdll exports. Note that this member is only used when NULL is passed as the hHeap parameter in HeapSetInformation.
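
The bit arithmetic described for ExtendedOptions is easier to follow when written out. The following is only a sketch of the stated sequence: the helper name MergeDebugFlags and the 8-bit width given to DebugFlags are assumptions made for illustration, not something taken from ntdll.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an ntdll function): applies the sequence described
// for ExtendedOptions to a bucket's current DebugFlags value.
// Assumption: DebugFlags is modelled as an 8-bit field.
inline std::uint8_t MergeDebugFlags(std::uint8_t currentFlags, std::uint32_t extendedOptions)
{
    const std::uint32_t lowByte = extendedOptions & 0xFFu;        // only the low byte is used
    const std::uint32_t doubled = lowByte * 2;                    // "doubled"
    const std::uint32_t changed = (doubled ^ currentFlags) & 6u;  // xor with current, and with 6
    return static_cast<std::uint8_t>(changed ^ currentFlags);     // xor with current again
}
```

Worked through, the net effect is that bits 1 and 2 of the result are taken from bits 0 and 1 of the low byte, while every other bit of the current flags is left alone. For example, MergeDebugFlags(0xF1, 0x03) gives 0xF7.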

One thing to keep in mind is that the above options are apply only. Once a specific option is enabled it cannot be turned off, except for the HeapLeakEnumerationRoutine which can be reset to NULL. This little sample shows how to initiate the leak checking on all heaps, and print the traces to the debugger.

// don't forget to add the shutdownflags value to the registry
#include <windows.h>
#include <iostream>
#include <cstdio>  // _snprintf
#include <cstdlib> // malloc
#include <cstring>
 
typedef void (NTAPI*RtlRaiseException)(PEXCEPTION_RECORD);
static RtlRaiseException rtlRaiseException;
 
static const ULONG HeapDebugInformation = 0x80000002;
 
typedef void (NTAPI*ENUMLEAKPROC)(ULONG always0, HANDLE hHeap, PVOID pBlock, SIZE_T blockSize, ULONG numIps, PVOID* ppStack);
typedef NTSTATUS (NTAPI*INTERCEPTPROC)(HANDLE hHeap, UINT action, UINT stackFramesToCapture, PVOID* pOutput);
 
typedef struct _HEAP_DEBUGGING_INFORMATION
{
    INTERCEPTPROC InterceptorFunction;
    WORD InterceptorValue;
    DWORD ExtendedOptions;
    DWORD StackTraceDepth;
    SIZE_T MinTotalBlockSize;
    SIZE_T MaxTotalBlockSize;
    ENUMLEAKPROC HeapLeakEnumerationRoutine;
} HEAP_DEBUGGING_INFORMATION;
 
void DoOutputDebugString(LPCSTR str)
{
	ULONG length = strlen(str) + 1;
	EXCEPTION_RECORD ex = {0};
	ex.ExceptionCode = DBG_PRINTEXCEPTION_C;
	ex.ExceptionAddress = &DoOutputDebugString;
	ex.NumberParameters = 2;
	ex.ExceptionInformation[0] = length;
	ex.ExceptionInformation[1] = reinterpret_cast<ULONG_PTR>(str);
	rtlRaiseException(&ex);
}
 
void NTAPI LeakReport(ULONG, HANDLE hHeap, PVOID pBlock, SIZE_T blockSize, ULONG numIps, PVOID* pStack)
{
    if(pBlock) // enumeration has ended when a NULL block is passed
    {
        char buffer[0x80];
        _snprintf(buffer, sizeof(buffer), "Leaked block at 0x%p of size %Iu from heap 0x%p\n", pBlock, blockSize, hHeap);
        DoOutputDebugString(buffer);
        if(pStack)
        {
            for(ULONG i = 0; i < numIps; ++i)
            {
                _snprintf(buffer, sizeof(buffer), "%lu. 0x%p\n", i + 1, pStack[i]);
                DoOutputDebugString(buffer);
            }
        }
    }
}
 
int main()
{
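    // the W variant is named explicitly so the wide string literal compiles whether or not UNICODE is defined
    rtlRaiseException = (RtlRaiseException)GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "RtlRaiseException");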
    HANDLE hHeap = GetProcessHeap();
    HEAP_DEBUGGING_INFORMATION hdi = {0};
    hdi.StackTraceDepth = 20;
    hdi.HeapLeakEnumerationRoutine = &LeakReport;
    HeapSetInformation(NULL, (HEAP_INFORMATION_CLASS)HeapDebugInformation, &hdi, sizeof(hdi));
    LPVOID pHeap = HeapAlloc(hHeap, 0, 0xcc);
    std::cout << "pHeap is at 0x" << pHeap << '\n';
    LPVOID pLocal = LocalAlloc(LPTR, 0x123);
    std::cout << "pLocal is at 0x" << pLocal << '\n';
    LPVOID pGlobal = GlobalAlloc(GPTR, 0x456);
    std::cout << "pGlobal is at 0x" << pGlobal << '\n';
    LPVOID pHeap2 = HeapAlloc(hHeap, 0, 0x80);
    std::cout << "pHeap2 is at 0x" << pHeap2 << '\n';
    LPVOID pNew = new char[77];
    std::cout << "pNew is at 0x" << pNew << '\n';
    LPVOID pMalloc = malloc(89);
    std::cout << "pMalloc is at 0x" << pMalloc << '\n';
}

Produces the following console output:

pHeap is at 0x00422518
pLocal is at 0x00422600
pGlobal is at 0x00422740
pHeap2 is at 0x00422BB0
pNew is at 0x00623720
pMalloc is at 0x00623950

And debugger output:

HEAP[app.exe]: Inspecting leaks at process shutdown ...
Leaked block at 0x0041FC28 of size 32 from heap 0x00410000
Leaked block at 0x0041FCA0 of size 32 from heap 0x00410000
Leaked block at 0x00421CD0 of size 32 from heap 0x00410000
Leaked block at 0x00422518 of size 204 from heap 0x00410000
1. 0x76EAB234
2. 0x00EF3ADC
3. 0x00EFB998
4. 0x00EFB7DF
5. 0x75D31174
6. 0x76E8B3F5
7. 0x76E8B3C8
Leaked block at 0x00422600 of size 291 from heap 0x00410000
1. 0x76EAB234
2. 0x75237589
3. 0x00EF3B39
4. 0x00EFB998
5. 0x00EFB7DF
6. 0x75D31174
7. 0x76E8B3F5
8. 0x76E8B3C8
Leaked block at 0x00422740 of size 1110 from heap 0x00410000
1. 0x76EAB234
2. 0x7523C495
3. 0x00EF3B96
4. 0x00EFB998
5. 0x00EFB7DF
6. 0x75D31174
7. 0x76E8B3F5
8. 0x76E8B3C8
Leaked block at 0x00422BB0 of size 128 from heap 0x00410000
1. 0x76EAB234
2. 0x00EF3BFA
3. 0x00EFB998
4. 0x00EFB7DF
5. 0x75D31174
6. 0x76E8B3F5
7. 0x76E8B3C8
Leaked block at 0x003625B8 of size 128 from heap 0x00360000
HEAP[app.exe]: 8 leaks detected.

The four blocks allocated with the Win32 functions were reported with stack traces intact. Four others were allocated before we set up the debugging, including one from a foreign heap. The two CRT allocations didn’t show up and, with the debug runtimes, never will, for reasons described later.

How it Works

Stack Collection

On XP, the function that does the stack collection, RtlLogStackBackTrace, is called directly at various places of interest such as heap creation, allocation, freeing and tag creation depending on the values of the globalflags or heap flags as appropriate. The function calls RtlCaptureStackBacktrace to get the most recent 32 entries on the stack, skipping the first. After being captured, the trace is added to a database (RtlpStackTraceDataBase) along with the number of frames, their hash, and number of times encountered for future reference. It returns a WORD sized index into the database to be saved in the allocation header.
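
The database idea above (store each distinct trace once, keep a hash and a hit count, hand back a small index) can be sketched in portable C++. This is a model only: the type names, the 1-based index convention and the FNV-1a hash are assumptions of mine and do not mirror the real RtlpStackTraceDataBase layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a stack trace database; names are illustrative.
struct TraceEntry
{
    std::vector<const void*> frames; // captured return addresses
    std::uint32_t hash;              // hash of the frames
    std::uint32_t hitCount;          // number of times this trace was logged
};

class TraceDatabase
{
public:
    // Logs a trace and returns a 16-bit index small enough to live in an
    // allocation header. Assumption: indices are 1-based and 0 means "none".
    std::uint16_t Log(const std::vector<const void*>& frames)
    {
        const std::uint32_t hash = HashFrames(frames);
        for (std::size_t i = 0; i < entries_.size(); ++i)
        {
            TraceEntry& e = entries_[i];
            if (e.hash == hash && e.frames == frames)
            {
                ++e.hitCount; // seen before: bump the count, reuse the index
                return static_cast<std::uint16_t>(i + 1);
            }
        }
        if (entries_.size() >= 0xFFFF)
            return 0; // index space exhausted
        entries_.push_back(TraceEntry{frames, hash, 1});
        return static_cast<std::uint16_t>(entries_.size());
    }

    // The returned pointer is only valid until the next call to Log.
    const TraceEntry* Lookup(std::uint16_t index) const
    {
        if (index == 0 || index > entries_.size())
            return nullptr;
        return &entries_[index - 1];
    }

private:
    // FNV-1a over the frame addresses; the hash ntdll uses is not described
    // in this article, so this choice is an assumption.
    static std::uint32_t HashFrames(const std::vector<const void*>& frames)
    {
        std::uint32_t h = 2166136261u;
        for (const void* p : frames)
        {
            const std::uintptr_t v = reinterpret_cast<std::uintptr_t>(p);
            for (std::size_t b = 0; b < sizeof(v); ++b)
            {
                h ^= static_cast<std::uint8_t>(v >> (b * 8));
                h *= 16777619u;
            }
        }
        return h;
    }

    std::vector<TraceEntry> entries_;
};
```

Logging the same trace twice returns the same index and only increments the hit count, which is why a WORD in the block header is enough to refer to a full trace.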

With the addition of the above debug options, Vista+ substitutes the hard-coded call to the stack trace collector for a call to one of the predefined interceptors, if one has been specified for the heap. In the HEAP_DEBUGGING_INFORMATION structure, the reason the InterceptorFunction must be one of the three inside ntdll is that only its position in the table of valid functions is saved. At the relevant times, the function at that index is called with contextual data including an enum defining the current operation. The interceptors that take stack traces are only interested in three of the current 8 defined actions (post-allocation, reallocation, deallocation) and operate almost exactly like the XP version, except the trace is entered into a structure array (RtlpHeapStackTraceLog) rather than a database.
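
Why an arbitrary function can’t be used as an interceptor follows directly from storing a table position instead of a pointer. A rough model is below; every name in it is a stand-in, and the action values and signature are assumptions rather than ntdll’s real definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical action codes; the article names only three of the eight
// defined actions, so these enumerators are assumptions.
enum class HeapAction { PostAllocate, Reallocate, Free, Other };

using Interceptor = int (*)(HeapAction action, unsigned framesToCapture);

// Stand-ins for the three functions inside ntdll; each reports which one ran.
inline int TracePrefix(HeapAction, unsigned) { return 1; }
inline int TraceDatabaseLogPrefix(HeapAction, unsigned) { return 2; }
inline int TrackingInterceptor(HeapAction, unsigned) { return 3; }

// Slot 0 is reserved for "no interceptor set on this heap".
static const Interceptor kInterceptorTable[] = {
    nullptr, &TracePrefix, &TraceDatabaseLogPrefix, &TrackingInterceptor };
static const std::size_t kInterceptorCount =
    sizeof(kInterceptorTable) / sizeof(kInterceptorTable[0]);

struct HeapModel
{
    std::uint8_t interceptorIndex = 0; // only the table position is stored
    unsigned frames = 0;
};

// Fails for any function that is not in the table, because there is no
// index that could be saved for it.
inline bool SetInterceptor(HeapModel& heap, Interceptor fn, unsigned frames)
{
    if (fn == nullptr)
        fn = &TraceDatabaseLogPrefix; // NULL selects the default described earlier
    for (std::size_t i = 1; i < kInterceptorCount; ++i)
    {
        if (kInterceptorTable[i] == fn)
        {
            heap.interceptorIndex = static_cast<std::uint8_t>(i);
            heap.frames = frames;
            return true;
        }
    }
    return false;
}

// Called at the points of interest; dispatches through the saved index.
inline int NotifyHeapEvent(const HeapModel& heap, HeapAction action)
{
    const Interceptor fn = kInterceptorTable[heap.interceptorIndex];
    return fn ? fn(action, heap.frames) : 0;
}
```

In this model a caller-supplied callback is rejected by SetInterceptor, which matches the restriction seen with InterceptorFunction.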

Leak Checking

After initializing a bunch of variables, leak checking starts off with RtlpReadProcessHeaps. In XP, the function walks all active heaps for busy regions, while in Vista+ it makes use of the new callback system which will be discussed in the next article. These busy regions are linked up to a leak list (RtlpLeakList) before having their address and size added to a map of active process memory (RtlpProcessMemoryMap) to be used later.

Secondarily, RtlpScanProcessVirtualMemory is called to scan the entire virtual address space of the process for page ranges that were writeable when initially allocated, in a committed state, do not have guard status, and aren’t in the memory map. When such a range is found, each pointer size area is checked against the map to see if it lies within a busy entry recorded during the walk. If so, the busy entry is removed from the leak list onto a list of busy blocks (RtlpBusyList). After the virtual scan is finished RtlpScanHeapAllocBlocks takes over and sweeps the entries on the busy list in the same manner as the virtual addresses. After this second scan, entries left in the leak list are considered to be leaks and reported.
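
The whole procedure amounts to a conservative reachability scan. Here is a simplified, portable model of it. It assumes the busy-list sweep keeps following references until no new blocks turn up, and it represents memory as plain vectors of pointer-sized values; the names Block and FindLeaks are mine, not ntdll’s.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// A busy heap allocation: its address, size, and the pointer-sized values
// it currently contains.
struct Block
{
    std::uintptr_t address;
    std::size_t size;
    std::vector<std::uintptr_t> words;
};

// Returns the addresses of blocks nothing refers to, in ascending order.
// 'roots' stands in for the pointer-sized values found in the writable,
// committed, non-heap pages of the process.
inline std::vector<std::uintptr_t> FindLeaks(const std::vector<Block>& busyBlocks,
                                             const std::vector<std::uintptr_t>& roots)
{
    // Step 1: every busy block goes on the leak list and into the memory map.
    std::map<std::uintptr_t, const Block*> memoryMap;
    std::set<std::uintptr_t> leakList;
    for (const Block& b : busyBlocks)
    {
        memoryMap[b.address] = &b;
        leakList.insert(b.address);
    }

    // Finds the busy block that 'value' points into, if any.
    auto blockContaining = [&](std::uintptr_t value) -> const Block* {
        auto it = memoryMap.upper_bound(value);
        if (it == memoryMap.begin())
            return nullptr;
        --it;
        const Block* b = it->second;
        return (value - b->address < b->size) ? b : nullptr;
    };

    // Moves a referenced block from the leak list to the busy list.
    std::vector<const Block*> busyList;
    auto markIfPointer = [&](std::uintptr_t value) {
        const Block* b = blockContaining(value);
        if (b && leakList.erase(b->address) == 1)
            busyList.push_back(b);
    };

    // Step 2: scan the non-heap memory.
    for (std::uintptr_t v : roots)
        markIfPointer(v);

    // Step 3: sweep the busy list; it grows as more blocks are reached.
    for (std::size_t i = 0; i < busyList.size(); ++i)
        for (std::uintptr_t v : busyList[i]->words)
            markIfPointer(v);

    // Whatever is still on the leak list is reported.
    return std::vector<std::uintptr_t>(leakList.begin(), leakList.end());
}
```

Note that a value only has to land somewhere inside a busy block to count as a reference, so any stray integer that happens to look like an address will keep a block off the leak list.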

Fallibilities

On the unfortunate side, the method described above will never pick up any leaks from the debug CRT, because it keeps a global pointer as the head of a linked list of all allocations. This is picked up during the virtual address scan and its links by the heap scan, disregarding them as leaks. Another downside to the method is that the checks on virtual space are passed by image code pages which are initially mapped as PAGE_EXECUTE_WRITECOPY. This leads to arbitrary chunks of instructions masquerading as valid pointers and their subsequent removal from the leak list.

Wrap Up

There you have it. A hopefully clearer picture of how to use one of Windows’ built-in debugging tools as well as how it goes about its business.


Notes

[1] The article in question is mirrored in its entirety at OSROnline. The “Built-in User Heap Leak Detection” section mentions the ShutdownFlags value.

[2] Basic setup instructions:
1. Open regedit to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\
2. Add a new key that is the same as your app’s file name
3. In the new key, add the ‘ShutdownFlags’ value as a DWORD with data of 3
4. Add a ‘GlobalFlag’ value as a DWORD with a hexadecimal data of 1000
4a. To limit the amount of heap space used for stack traces, add a StackTraceDatabaseSizeInMb value with data of the desired limit. A data of 0 is ignored.
5. Run your app as normal
6. If you didn’t run your app in the debugger, attach it when the breakpoint is hit. This is signified by an “app has stopped working” dialog in Vista+ and by the following dialog in XP:

Breakpoint Messages Box on XP
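
If you’d rather not click through regedit each time, steps 1 to 4 of note [2] can be expressed as a .reg file. The sketch below just builds the text of such a file; the helper name and its parameters are hypothetical, and it uses the older REGEDIT4 header so the output can be saved as a plain ANSI file.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: returns the text of a .reg file that creates the
// Image File Execution Options entry described in the steps above.
// exeName is the app's file name, e.g. "app.exe".
inline std::string MakeLeakCheckRegFile(const std::string& exeName,
                                        std::uint32_t shutdownFlags = 3,
                                        std::uint32_t globalFlag = 0x1000)
{
    // DWORD data in .reg files is written as eight hex digits.
    char values[96];
    std::snprintf(values, sizeof(values),
                  "\"ShutdownFlags\"=dword:%08x\r\n\"GlobalFlag\"=dword:%08x\r\n",
                  static_cast<unsigned>(shutdownFlags),
                  static_cast<unsigned>(globalFlag));

    return "REGEDIT4\r\n\r\n"
           "[HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\"
           "Image File Execution Options\\" + exeName + "]\r\n" + values;
}
```

Writing the returned string to a file such as leakcheck.reg and importing it from an elevated account should have the same effect as the manual steps. Deleting the key again turns the checks off.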

[3] To find the trace manually, make sure you have the correct symbols, then:
On Windows XP and Server 2003:

// these instructions are generally a formalised version of the process described
// at http://blogs.msdn.com/duetsupport/archive/2009/03/12/adventures-in-analyzing-high-memory-use-on-a-duet-client.aspx
 
1.
Open the callstack window and double-click on the top entry. It should be ntdll.dll!_RtlDetectHeapLeaks@0...
Open a Watch window and type "(_STACK_TRACE_DATABASE**)_RtlpStackTraceDatabase" (without the quotes) for the name.
Expand the entry twice and scroll down to the EntryIndexArray member
 
2.
Using the pointer value from the 'User' column of the leak output and a memory window, mentally go through the following function to get the block's trace index.
USHORT GetTraceIndex(BYTE* pUserPointer)
{
    // this is essentially RtlGetExtraStuffPointer
    BYTE* pBlockHeader = pUserPointer - 8;
    BYTE* returnedPointer = NULL;
    BYTE flag = *(pBlockHeader + 5);
    if(flag & 8)
    {
        returnedPointer = pBlockHeader - 0x10;
    }
    else
    {
        DWORD index = *((WORD*)pBlockHeader);
        returnedPointer = (pBlockHeader + (index * 8)) - 8;
    }
    WORD traceIndex = (*(ULONG_PTR*)returnedPointer) & 0xFFFF;
    return traceIndex;
}
 
3.
Go back to the watch window from step one and copy the address contained in the EntryIndexArray member to the Immediate window.
Add on to it, "-(sizeof(void*)*TraceIndex)" without quotes and substituting TraceIndex for the value gained in step two.
Paste the resulting address into a memory window.
On 32-bit machines, the stack trace starts at (offset 0xc) and is (offset 0xa) entries long.
On 64-bit, the stack trace starts at (offset 0x10) and is (offset 0xe) entries long

On Vista / 7, the callstack data is stored with the block, but accessing it isn’t much easier from a manual standpoint. Luckily with the leak callback enabled, you can copy and paste the below code and call it from within the callback. It returns an array of instruction pointers of ‘numIPs’ length.

typedef struct _StackTraceInfo
{
    ULONG unk;
    ULONG_PTR unk2;
    ULONG numFrames;
    PVOID* ips;
} StackTraceInfo;
 
PVOID* GetStackBackTraceFromUserPointer(BYTE* pUserData, ULONG* numIPs)
{
    // start here from a pointer returned from HeapAlloc or the 'User' field of a leak report
    DWORD amountToRewind = sizeof(void*) * 2;
    BYTE* pBlockStart = pUserData - amountToRewind;
    if((*(pBlockStart + amountToRewind - 1)) == 5)
    {
        ULONG offsetfromBeginning = (*(pBlockStart + amountToRewind - 2)) * amountToRewind;
        pBlockStart -= offsetfromBeginning;
    }
    // the following is essentially RtlpQueryBlockStackTrace
    // if you already have the value in the 'Block' field of the leak report,
    // skip the rewind above and use that value as pBlockStart instead
    DWORD heapEntrySize = sizeof(void*) * 2;
    DWORD index = *(pBlockStart + (heapEntrySize - 1));
    BYTE* endOfHeader = NULL;
    if(index & 0x40)
    {
        index &= 0x3F;
        endOfHeader = pBlockStart + (index * heapEntrySize) + heapEntrySize;
    }
    else if(index == 4)
    {
        endOfHeader = pBlockStart + (index * heapEntrySize) + heapEntrySize;
    }
    else
    {
        endOfHeader = (pBlockStart + heapEntrySize);
    }    
    // casts corrected so the pointer types match
    StackTraceInfo** addressOfStackTrace = (StackTraceInfo**)(pBlockStart + heapEntrySize);
    if((BYTE*)addressOfStackTrace == endOfHeader)
    {
        return NULL;
    }
    WORD type = *(WORD*)(endOfHeader - 8) - 1;
    if(type == 0)
    {
        if(numIPs)
        {
            *numIPs = *(WORD*)(endOfHeader - 6);
        }
        return (PVOID*)(pBlockStart + (heapEntrySize * 2));
    }
    StackTraceInfo* stackTraceEntry = (*addressOfStackTrace);
    if((type != 1) || (stackTraceEntry == NULL))
    {
        return NULL;
    }
    if(numIPs)
    {
        *numIPs = stackTraceEntry->numFrames;
    }
    return stackTraceEntry->ips;
}
