Contents:
Introduction
Leak Checking
XP Functionality
Enabling
Stack Traces
Vista/7 Upgrades
HEAP_DEBUGGING_INFORMATION
How it works
Stack Collection
Leak Checking
Fallibilities
Wrap up
Introduction
The heap manager: every program uses it. Whether hidden behind the CRT, COM, OLE, or the crusty old Local/GlobalAlloc, it’s at the centre of the vast majority of memory-related operations. Dealing out memory and reclaiming it are no doubt its most common uses, but it has a few more tricks up its sleeve; some well known, some less so. In this part, the leak checking facility is investigated: how to use it, how it works and how it has evolved.
Leak Checking – XP Functionality
Enabling
If you were paying attention to MSDN or the DDK when XP was released, you’ll have come across the description of the ShutdownFlags registry value for the infamous Image File Execution Options (IFEO) registry key. I say “when XP was released” because the only useful mention of this value and its purpose has since been removed from MSDN, despite still being relevant [1]. In an IFEO entry for a program [2], the value’s data enables certain tasks to be performed during clean process termination. The value currently has two modes of operation, which provide the first line of attack in heap leak checking. Setting its data to 1 invokes the leak checking code in ntdll and, by default, produces debugger output like the following when the process shuts down.
HEAP[app.exe]: Inspecting leaks at process shutdown ...
 Entry     User      Heap      Size   Req.Size  Flags
------------------------------------------------------------
0016FC98  0016FCA0  00160000  624c8  00000020  busy extra fill user_flag
00172500  00172508  00160000  58     00000050  busy
005525B0  005525B8  00550000  88     00000080  busy
HEAP[app.exe]: 3 leaks detected.
Setting it to 2 or 3 triggers a breakpoint in addition to the above behaviour. While it’s good to know that your app has leaks (or not), I’m sure you’ll agree that a block address and size isn’t much to go on.
Stack Traces
What is really needed to start pinning down the source of a leak is a stack trace of where the allocation was made. Phase 2 of the battle starts by adding another registry value, the more widely known and documented GlobalFlag. It controls various app-specific or system-wide debug options and is usually set with the gflags executable distributed with the Debugging Tools for Windows. However, as long as you know the values (here’s the cheat sheet) there’s no reason you can’t add it by hand. The option to enable is “Create user mode stack trace database”, value 0x1000. Setting that value in the registry and rerunning the program gives debugger output similar to the following.
HEAP[app.exe]: Inspecting leaks at process shutdown ...
 Entry     User      Heap      Size   Req.Size  Flags
------------------------------------------------------------
0016FC98  0016FCA0  00160000  624c8  00000020  busy extra fill user_flag
00172500  00172508  00160000  58     00000050  busy
005525B0  005525B8  00550000  88     00000080  busy
HEAP[app.exe]: 3 leaks detected.
Unfortunate though it is, the leak dumping code doesn’t give a hoot whether there are stack traces or not. This is where the previously mentioned breakpoint comes into play. When it’s hit, attach WinDbg and run the “!heap -l” command. The end of the tunnel is now much lighter.
0:000> !heap -l
Searching the memory for potential unreachable busy blocks.
Heap 002e0000
Heap 00010000
Heap 00020000
Heap 00210000
Heap 02300000
Scanning VM ...
Scanning references from 294 busy blocks (0 MBytes) ...
Entry     User      Heap      Segment       Size  PrevSize  Unused    Flags
-----------------------------------------------------------------------------
002f2730  002f2748  002e0000  002eebd0        28      -           18  LFH;busy stack_trace
        77cbb234: ntdll!RtlAllocateHeap+0x00000274
        004031a2: app!wmain+0x000000b2
        00409558: app!__tmainCRTStartup+0x000001a8
        0040939f: app!wmainCRTStartup+0x0000000f
        77b61174: kernel32!BaseThreadInitThunk+0x0000000e
        77c9b3f5: ntdll!__RtlUserThreadStart+0x00000070
        77c9b3c8: ntdll!_RtlUserThreadStart+0x0000001b
002f3bb0  002f3bc8  002e0000  002e0000        68     458          18  busy stack_trace
        77cbb234: ntdll!RtlAllocateHeap+0x00000274
        75e17589: KERNELBASE!LocalAlloc+0x0000005f
        004031b8: app!wmain+0x000000c8
        00409558: app!__tmainCRTStartup+0x000001a8
        0040939f: app!wmainCRTStartup+0x0000000f
        77b61174: kernel32!BaseThreadInitThunk+0x0000000e
        77c9b3f5: ntdll!__RtlUserThreadStart+0x00000070
        77c9b3c8: ntdll!_RtlUserThreadStart+0x0000001b
00212620  00212638  00210000  00210000        98     ad0          18  busy stack_trace
        77cbb234: ntdll!RtlAllocateHeap+0x00000274
        7740ade8: msvcrt!_calloc_impl+0x00000136
        7740ae43: msvcrt!_calloc_crt+0x00000016
        77412015: msvcrt!__onexitinit+0x0000000c
        77411fc8: msvcrt!_cinit+0x0000001e
        77411a94: msvcrt!_core_crt_dll_init+0x000001b2
        7740a48c: msvcrt!_CRTDLL_INIT+0x0000001b
        77c9af24: ntdll!LdrpCallInitRoutine+0x00000014
        77c9fd2e: ntdll!LdrpRunInitializeRoutines+0x0000026f
        77ca90be: ntdll!LdrpInitializeProcess+0x0000138d
        77ca8fc0: ntdll!_LdrpInitialize+0x00000078
        77c9b2c5: ntdll!LdrInitializeThunk+0x00000010
If “!heap -l” doesn’t list the stack traces, “!heap -p -a 〈blockAddr〉” will. File and line information can be obtained by issuing an “ln 〈address〉” command, where address is an IP from the stack trace, e.g.
0:000> ln 004031b8
f:\dev-cpp\projects\test\app\app.cpp(29)+0x16
(004031b8)   app!wmain+0xc8
Unfortunately, Visual Studio doesn’t have access to WinDbg extension commands, so you have to find the trace manually, which isn’t a fun exercise [3].
And, basic though it is, that’s the common functionality from XP to 7 as far as built-in code about leaks goes.
Vista/7 Upgrades
One drawback of the XP scheme of things is that it’s inflexible: it’s either enabled for all heaps or for none at all, with no middle ground. Stack trace collection also has to be explicitly enabled, via the registry or in the image header via gflags, and it captures a fixed number of frames (32). Things were redesigned as part of Vista’s upgrades, though. The heap manager has sprouted support for finer-grained control of debugging and with it, to the delight of non-WinDbg users, caller-customized printing of stack traces. Best of all, the features can be controlled by a single API rather than lots of sparsely documented registry values. The main downside is that you still have to set ShutdownFlags in the registry to activate the leak checking.
HEAP_DEBUGGING_INFORMATION
Armed with a new information class value of 0x80000002, HeapSetInformation now takes a HEAP_DEBUGGING_INFORMATION structure to configure the aforementioned options. The structure is currently not present in the public headers, but it can be found in ntdll’s symbols. Its layout is as follows (the function typedef names are mine):
typedef void (NTAPI* ENUMLEAKPROC)(ULONG always0, HANDLE hHeap, PVOID pBlock, SIZE_T blockSize, ULONG numIps, PVOID* ppStack);
typedef NTSTATUS (NTAPI* INTERCEPTPROC)(HANDLE hHeap, UINT action, UINT stackFramesToCapture, ULONG* pOutput);

typedef struct _HEAP_DEBUGGING_INFORMATION
{
    INTERCEPTPROC InterceptorFunction;
    WORD InterceptorValue;
    DWORD ExtendedOptions;
    DWORD StackTraceDepth;
    SIZE_T MinTotalBlockSize;
    SIZE_T MaxTotalBlockSize;
    ENUMLEAKPROC HeapLeakEnumerationRoutine;
} HEAP_DEBUGGING_INFORMATION;
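Since HeapSetInformation is handed sizeof(hdi), anyone declaring the structure by hand (from another language, say) needs to get the padding after the WORD member right. The sketch below mirrors the layout with portable fixed-width types. The name HeapDebuggingInformationMirror is made up, and the asserted offsets are an assumption: they are what a compiler produces under default alignment on 32-bit and 64-bit targets, not values read out of ntdll.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of HEAP_DEBUGGING_INFORMATION using fixed-width types:
// WORD -> uint16_t, DWORD -> uint32_t, SIZE_T -> std::size_t, function pointers -> void*.
struct HeapDebuggingInformationMirror {
    void*       InterceptorFunction;
    uint16_t    InterceptorValue;   // followed by two bytes of padding
    uint32_t    ExtendedOptions;
    uint32_t    StackTraceDepth;    // on 64-bit, followed by four bytes of padding
    std::size_t MinTotalBlockSize;
    std::size_t MaxTotalBlockSize;
    void*       HeapLeakEnumerationRoutine;
};

// Picks the expected value for the current pointer size.
constexpr std::size_t Expected(std::size_t for32, std::size_t for64) {
    return sizeof(void*) == 8 ? for64 : for32;
}

static_assert(offsetof(HeapDebuggingInformationMirror, InterceptorValue) == Expected(0x04, 0x08), "InterceptorValue");
static_assert(offsetof(HeapDebuggingInformationMirror, ExtendedOptions) == Expected(0x08, 0x0C), "ExtendedOptions");
static_assert(offsetof(HeapDebuggingInformationMirror, StackTraceDepth) == Expected(0x0C, 0x10), "StackTraceDepth");
static_assert(offsetof(HeapDebuggingInformationMirror, MinTotalBlockSize) == Expected(0x10, 0x18), "MinTotalBlockSize");
static_assert(offsetof(HeapDebuggingInformationMirror, MaxTotalBlockSize) == Expected(0x14, 0x20), "MaxTotalBlockSize");
static_assert(offsetof(HeapDebuggingInformationMirror, HeapLeakEnumerationRoutine) == Expected(0x18, 0x28), "HeapLeakEnumerationRoutine");
static_assert(sizeof(HeapDebuggingInformationMirror) == Expected(0x1C, 0x30), "total size");
```

If a foreign declaration disagrees with these numbers, the size passed to HeapSetInformation will be wrong too.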
The members are used as follows:
- InterceptorFunction is of no use to the outside world. Besides NULL, it can only be set to one of three functions internal to ntdll: RtlpStackTracePrefix, RtlpStackTraceDatabaseLogPrefix and RtlpHeapTrkInterceptor. Fortunately, the second of those is the default when NULL is specified, and the stack traces are captured through it.
- InterceptorValue is similar. It is only valid when InterceptorFunction is also valid and non-NULL, and it is used as the number of stack frames for the interceptor to capture.
- ExtendedOptions controls LFH debugging. If it is non-zero, the heap passed to HeapSetInformation is converted to an LFH heap if it is not one already. The low byte is then merged into the DebugFlags member of the LFH’s HEAP_BUCKET structures: it is doubled, xor’ed with the current flags, and’ed with 6, and xor’ed with the current flags again. I can’t find where these flags are used, so their effects are unknown.
- StackTraceDepth is the number of IPs to capture in the stack traces. Only the LOWORD is used. Setting it to 0 doesn’t enable trace collection, but it doesn’t disable it if it is already active either. If InterceptorFunction is valid, this member is ignored in favour of InterceptorValue.
- MinTotalBlockSize is the minimum bucket size in bytes to apply the above flags to.
- MaxTotalBlockSize is the maximum bucket size to apply the above flags to. If these members are both 0, the debug flags are applied to every LFH bucket.
- HeapLeakEnumerationRoutine is the function called for every leak at process termination, even those without collected stack traces. It is called after all DLL_PROCESS_DETACH notifications have been sent, so it can only be reliably implemented with functions statically linked into the exe and with ntdll exports. Note that this member is only used when NULL is passed as the hHeap parameter of HeapSetInformation.
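The ExtendedOptions sequence is the classic xor trick for copying a masked group of bits. The sketch below replays it so its net effect is visible: bits 0 and 1 of the low byte replace bits 1 and 2 of DebugFlags, and every other bit is left alone. The function names are mine, and treating DebugFlags as a 32-bit value is an assumption, since its real width isn’t given here. The LOWORD truncation of StackTraceDepth is included for completeness.

```cpp
#include <cassert>
#include <cstdint>

// Replays the sequence described for ExtendedOptions: double the low byte,
// xor with the current flags, and with 6, then xor with the current flags again.
uint32_t MergeBucketDebugFlags(uint32_t currentFlags, uint32_t extendedOptions) {
    const uint32_t doubled = (extendedOptions & 0xFFu) << 1;
    return ((doubled ^ currentFlags) & 0x6u) ^ currentFlags;
}

// The same result written as an explicit clear-then-insert of bits 1 and 2.
uint32_t MergeBucketDebugFlagsExplicit(uint32_t currentFlags, uint32_t extendedOptions) {
    return (currentFlags & ~0x6u) | (((extendedOptions & 0xFFu) << 1) & 0x6u);
}

// StackTraceDepth: only the LOWORD is used.
uint16_t EffectiveStackTraceDepth(uint32_t stackTraceDepth) {
    return static_cast<uint16_t>(stackTraceDepth & 0xFFFFu);
}
```

For example, merging an option byte of 3 into flags 0x9 gives 0xF, while an option byte of 5 only contributes its bit 0, because its bit 2 is shifted out of the mask.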
One thing to keep in mind is that the above options are apply-only. Once a specific option is enabled it cannot be turned off, except for HeapLeakEnumerationRoutine, which can be reset to NULL. This little sample shows how to initiate leak checking on all heaps and print the traces to the debugger.
// don't forget to add the ShutdownFlags value to the registry
#include <windows.h>
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// ntdll's RtlRaiseException, looked up at runtime. ntdll exports stay usable
// after the DLL_PROCESS_DETACH notifications, when the leak callback runs.
typedef void (NTAPI* RtlRaiseException)(PEXCEPTION_RECORD);
static RtlRaiseException rtlRaiseException;

static const ULONG HeapDebugInformation = 0x80000002;

typedef void (NTAPI* ENUMLEAKPROC)(ULONG always0, HANDLE hHeap, PVOID pBlock, SIZE_T blockSize, ULONG numIps, PVOID* ppStack);
typedef NTSTATUS (NTAPI* INTERCEPTPROC)(HANDLE hHeap, UINT action, UINT stackFramesToCapture, PVOID* pOutput);

typedef struct _HEAP_DEBUGGING_INFORMATION
{
    INTERCEPTPROC InterceptorFunction;
    WORD InterceptorValue;
    DWORD ExtendedOptions;
    DWORD StackTraceDepth;
    SIZE_T MinTotalBlockSize;
    SIZE_T MaxTotalBlockSize;
    ENUMLEAKPROC HeapLeakEnumerationRoutine;
} HEAP_DEBUGGING_INFORMATION;

// OutputDebugStringA built only on ntdll: raise the DBG_PRINTEXCEPTION_C
// exception that debuggers display as debug output
void DoOutputDebugString(LPCSTR str)
{
    ULONG length = static_cast<ULONG>(strlen(str) + 1);
    EXCEPTION_RECORD ex = {0};
    ex.ExceptionCode = DBG_PRINTEXCEPTION_C;
    ex.ExceptionAddress = reinterpret_cast<PVOID>(&DoOutputDebugString);
    ex.NumberParameters = 2;
    ex.ExceptionInformation[0] = length;
    ex.ExceptionInformation[1] = reinterpret_cast<ULONG_PTR>(str);
    __try
    {
        rtlRaiseException(&ex);
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        // no debugger consumed the output; swallow the exception
    }
}

// Called once for every leaked block, then once more with a NULL block
void NTAPI LeakReport(ULONG, HANDLE hHeap, PVOID pBlock, SIZE_T blockSize, ULONG numIps, PVOID* pStack)
{
    if(pBlock) // enumeration has ended when a NULL block is passed
    {
        char buffer[0x80];
        _snprintf(buffer, sizeof(buffer), "Leaked block at 0x%p of size %Iu from heap 0x%p\n", pBlock, blockSize, hHeap);
        DoOutputDebugString(buffer);
        if(pStack)
        {
            for(ULONG i = 0; i < numIps; ++i)
            {
                _snprintf(buffer, sizeof(buffer), "%lu. 0x%p\n", i + 1, pStack[i]);
                DoOutputDebugString(buffer);
            }
        }
    }
}

int main()
{
    rtlRaiseException = (RtlRaiseException)GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "RtlRaiseException");
    HANDLE hHeap = GetProcessHeap();

    // a NULL heap handle applies the options to all heaps and registers the callback
    HEAP_DEBUGGING_INFORMATION hdi = {0};
    hdi.StackTraceDepth = 20;
    hdi.HeapLeakEnumerationRoutine = &LeakReport;
    HeapSetInformation(NULL, (HEAP_INFORMATION_CLASS)HeapDebugInformation, &hdi, sizeof(hdi));

    // deliberately leak a block from each allocator
    LPVOID pHeap = HeapAlloc(hHeap, 0, 0xcc);
    std::cout << "pHeap is at 0x" << pHeap << '\n';
    LPVOID pLocal = LocalAlloc(LPTR, 0x123);
    std::cout << "pLocal is at 0x" << pLocal << '\n';
    LPVOID pGlobal = GlobalAlloc(GPTR, 0x456);
    std::cout << "pGlobal is at 0x" << pGlobal << '\n';
    LPVOID pHeap2 = HeapAlloc(hHeap, 0, 0x80);
    std::cout << "pHeap2 is at 0x" << pHeap2 << '\n';
    LPVOID pNew = new char[77];
    std::cout << "pNew is at 0x" << pNew << '\n';
    LPVOID pMalloc = malloc(89);
    std::cout << "pMalloc is at 0x" << pMalloc << '\n';
}
Produces the following console output:
pHeap is at 0x00422518
pLocal is at 0x00422600
pGlobal is at 0x00422740
pHeap2 is at 0x00422BB0
pNew is at 0x00623720
pMalloc is at 0x00623950
And debugger output:
HEAP[app.exe]: Inspecting leaks at process shutdown ...
Leaked block at 0x0041FC28 of size 32 from heap 0x00410000
Leaked block at 0x0041FCA0 of size 32 from heap 0x00410000
Leaked block at 0x00421CD0 of size 32 from heap 0x00410000
Leaked block at 0x00422518 of size 204 from heap 0x00410000
1. 0x76EAB234
2. 0x00EF3ADC
3. 0x00EFB998
4. 0x00EFB7DF
5. 0x75D31174
6. 0x76E8B3F5
7. 0x76E8B3C8
Leaked block at 0x00422600 of size 291 from heap 0x00410000
1. 0x76EAB234
2. 0x75237589
3. 0x00EF3B39
4. 0x00EFB998
5. 0x00EFB7DF
6. 0x75D31174
7. 0x76E8B3F5
8. 0x76E8B3C8
Leaked block at 0x00422740 of size 1110 from heap 0x00410000
1. 0x76EAB234
2. 0x7523C495
3. 0x00EF3B96
4. 0x00EFB998
5. 0x00EFB7DF
6. 0x75D31174
7. 0x76E8B3F5
8. 0x76E8B3C8
Leaked block at 0x00422BB0 of size 128 from heap 0x00410000
1. 0x76EAB234
2. 0x00EF3BFA
3. 0x00EFB998
4. 0x00EFB7DF
5. 0x75D31174
6. 0x76E8B3F5
7. 0x76E8B3C8
Leaked block at 0x003625B8 of size 128 from heap 0x00360000
HEAP[app.exe]: 8 leaks detected.
The four blocks allocated with the Win32 functions were reported with their stack traces intact. The other four were allocated before the debugging was set up, including one from a foreign heap. The two CRT allocations weren’t reported at all and, with the debug runtime, never will be, for reasons described later.
How it Works
Stack Collection
On XP, the function that does the stack collection, RtlLogStackBackTrace, is called directly at various places of interest, such as heap creation, allocation, freeing and tag creation, depending on the global flags or heap flags as appropriate. The function calls RtlCaptureStackBackTrace to get the most recent 32 entries on the stack, skipping the first. Once captured, the trace is added to a database (RtlpStackTraceDataBase) along with the number of frames, their hash, and the number of times the trace has been encountered. RtlLogStackBackTrace returns a WORD-sized index into the database, which is saved in the allocation header.
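To make the database idea concrete, here is a simplified model in portable C++. It is not ntdll’s implementation: TraceDatabase and its members are hypothetical names, the additive hash is an assumption, and reserving index 0 for “no trace” is my own convention. It does show the essential trick: identical traces are stored once, counted, and referred to by a 16-bit index small enough to live in an allocation header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct TraceEntry {
    std::vector<uintptr_t> frames;  // captured return addresses
    uint32_t hash;                  // hash of the frames, used to find duplicates quickly
    uint32_t count;                 // number of times this exact trace was logged
};

class TraceDatabase {
public:
    static constexpr std::size_t kMaxFrames = 32;  // XP captures at most 32 frames

    // Logs a trace and returns its 1-based index; 0 means the table is full.
    uint16_t Log(const uintptr_t* frames, std::size_t numFrames) {
        if (numFrames > kMaxFrames) numFrames = kMaxFrames;
        std::vector<uintptr_t> trace(frames, frames + numFrames);
        const uint32_t hash = Hash(trace);

        // Equal hashes don't guarantee equal traces, so compare the frames too.
        auto range = byHash_.equal_range(hash);
        for (auto it = range.first; it != range.second; ++it) {
            TraceEntry& entry = entries_[it->second - 1];
            if (entry.frames == trace) {
                ++entry.count;
                return it->second;
            }
        }
        if (entries_.size() >= 0xFFFF) return 0;  // indices must fit in a WORD
        entries_.push_back(TraceEntry{trace, hash, 1});
        const uint16_t index = static_cast<uint16_t>(entries_.size());
        byHash_.emplace(hash, index);
        return index;
    }

    const TraceEntry* Get(uint16_t index) const {
        if (index == 0 || index > entries_.size()) return nullptr;
        return &entries_[index - 1];
    }

private:
    // Additive hash over the frames (an assumption, not necessarily ntdll's).
    static uint32_t Hash(const std::vector<uintptr_t>& trace) {
        uint32_t h = 0;
        for (uintptr_t ip : trace) h += static_cast<uint32_t>(ip);
        return h;
    }

    std::vector<TraceEntry> entries_;
    std::unordered_multimap<uint32_t, uint16_t> byHash_;
};
```

In this model a second allocation from the same call site costs a lookup and a counter increment rather than another copy of the frames.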
With the addition of the above debug options, Vista+ replaces the hard-coded call to the stack trace collector with a call to one of the predefined interceptors, if one has been specified for the heap. The reason the InterceptorFunction member of HEAP_DEBUGGING_INFORMATION must be one of the three functions inside ntdll is that only its position in the table of valid functions is saved. At the relevant times, the function at that index is called with contextual data, including an enum value identifying the current operation. The interceptors that take stack traces are only interested in three of the eight currently defined actions (post-allocation, reallocation and deallocation) and operate almost exactly like the XP version, except that the trace is entered into a structure array (RtlpHeapStackTraceLog) rather than a database.
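The table-index scheme explains why arbitrary function pointers are refused. The model below is purely illustrative: the functions are stand-ins for the ntdll internals named earlier, the HeapAction enum collapses the eight real actions into four, and the return values are arbitrary markers showing which function ran. Passing NULL selects the default, as described for InterceptorFunction, and anything not in the table is rejected because there is no index to record for it.

```cpp
#include <cassert>
#include <cstddef>

// Simplified action codes; the real enum has eight values.
enum class HeapAction { PostAllocate, Reallocate, Free, Other };

using Interceptor = int (*)(HeapAction action, unsigned framesToCapture);

// Stand-ins for the three ntdll-internal interceptors (illustrative only).
// Return values mark which function ran; 0 means the action was ignored.
int StackTracePrefix(HeapAction, unsigned) { return 1; }
int StackTraceDatabaseLogPrefix(HeapAction action, unsigned) {
    // A trace collector only cares about allocation, reallocation and free.
    const bool wanted = action == HeapAction::PostAllocate ||
                        action == HeapAction::Reallocate || action == HeapAction::Free;
    return wanted ? 2 : 0;
}
int HeapTrkInterceptor(HeapAction, unsigned) { return 3; }

const Interceptor kValidInterceptors[] = {StackTracePrefix, StackTraceDatabaseLogPrefix,
                                          HeapTrkInterceptor};
constexpr int kNumInterceptors = sizeof(kValidInterceptors) / sizeof(kValidInterceptors[0]);

// Per-heap state: only the table position is kept, never the pointer itself.
struct HeapDebugState {
    int interceptorIndex = -1;  // -1: no interceptor installed
};

bool SetInterceptor(HeapDebugState& heap, Interceptor fn) {
    if (fn == nullptr) fn = StackTraceDatabaseLogPrefix;  // NULL selects the default
    for (int i = 0; i < kNumInterceptors; ++i) {
        if (kValidInterceptors[i] == fn) {
            heap.interceptorIndex = i;
            return true;
        }
    }
    return false;  // not in the table: nothing to record
}

int NotifyInterceptor(const HeapDebugState& heap, HeapAction action, unsigned frames) {
    if (heap.interceptorIndex < 0) return 0;
    return kValidInterceptors[heap.interceptorIndex](action, frames);
}
```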
Leak Checking
After initializing a bunch of variables, leak checking starts with RtlpReadProcessHeaps. On XP, this function walks all active heaps looking for busy regions, while on Vista+ it makes use of the new callback system, which will be discussed in the next article. These busy regions are linked into a leak list (RtlpLeakList), and their address and size are added to a map of active process memory (RtlpProcessMemoryMap) for later use.
Next, RtlpScanProcessVirtualMemory scans the entire virtual address space of the process for page ranges that were writeable when initially allocated, are committed, do not have guard status, and aren’t in the memory map. When such a range is found, each pointer-sized value in it is checked against the map to see if it lies within a busy entry recorded during the walk. If so, that entry is moved from the leak list onto a list of busy blocks (RtlpBusyList). After the virtual scan is finished, RtlpScanHeapAllocBlocks takes over and sweeps the contents of the entries on the busy list in the same manner. After this second scan, the entries left in the leak list are considered leaks and reported.
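Taken together, the two passes amount to a conservative reachability scan. Here’s a sketch over a simulated address space. BusyBlock and FindLeaks are hypothetical names, addresses are plain integers rather than real memory, and I’ve assumed the busy list is processed as a worklist, so that blocks discovered during the heap scan are themselves scanned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <vector>

struct BusyBlock {
    uintptr_t address;                // start of the block
    std::size_t size;                 // in bytes
    std::vector<uintptr_t> contents;  // the block's data as pointer-sized words
};

// Returns the indices of blocks that no scanned word points into.
std::vector<std::size_t> FindLeaks(const std::vector<BusyBlock>& blocks,
                                   const std::vector<uintptr_t>& rootWords) {
    // The memory map: block start address -> block index.
    std::map<uintptr_t, std::size_t> memoryMap;
    for (std::size_t i = 0; i < blocks.size(); ++i) memoryMap[blocks[i].address] = i;

    std::vector<bool> onLeakList(blocks.size(), true);  // every block starts as a suspect
    std::deque<std::size_t> busyList;                   // reachable blocks still to scan

    // If 'value' lies within a suspect block, move it from the leak list to the busy list.
    auto markIfPointer = [&](uintptr_t value) {
        auto it = memoryMap.upper_bound(value);
        if (it == memoryMap.begin()) return;
        --it;  // last block starting at or below 'value'
        const BusyBlock& b = blocks[it->second];
        if (value < b.address + b.size && onLeakList[it->second]) {
            onLeakList[it->second] = false;
            busyList.push_back(it->second);
        }
    };

    // Pass 1: the virtual memory scan over words outside any heap block.
    for (uintptr_t word : rootWords) markIfPointer(word);

    // Pass 2: sweep the contents of every block found to be busy.
    while (!busyList.empty()) {
        const std::size_t i = busyList.front();
        busyList.pop_front();
        for (uintptr_t word : blocks[i].contents) markIfPointer(word);
    }

    std::vector<std::size_t> leaks;
    for (std::size_t i = 0; i < blocks.size(); ++i)
        if (onLeakList[i]) leaks.push_back(i);
    return leaks;
}
```

A block that only points to itself is still reported, since nothing outside it refers to it. Conversely, a single reachable pointer to the head of a linked list clears every block in the chain, which is exactly the debug CRT problem described under Fallibilities.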
Fallibilities
On the unfortunate side, the method described above will never pick up any leaks from the debug CRT, because the debug CRT keeps a global pointer to the head of a linked list of all its allocations. The head pointer is found during the virtual address scan and the rest of the list is reached through its links during the heap scan, so none of the blocks are considered leaks. Another downside is that image code pages pass the checks on virtual space, because they are initially mapped as PAGE_EXECUTE_WRITECOPY. As a result, arbitrary chunks of instructions can masquerade as valid pointers and cause genuinely leaked blocks to be removed from the leak list.
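The page filter from the virtual scan can be written as a predicate over the MEMORY_BASIC_INFORMATION fields that VirtualQuery reports (AllocationProtect, State and Protect). This is my reading of the conditions listed earlier, not ntdll’s code: the function name is hypothetical, and “writeable when initially allocated” is assumed to mean any of the four writable protection constants. The constants are the real Win32 values, duplicated so the sketch also compiles off Windows. It makes the code page problem easy to see: an image page whose current protection is PAGE_EXECUTE_READ still passes, because only its allocation-time protection is consulted.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
// Win32 constant values, reproduced so the sketch compiles elsewhere.
constexpr uint32_t PAGE_READONLY = 0x02;
constexpr uint32_t PAGE_READWRITE = 0x04;
constexpr uint32_t PAGE_WRITECOPY = 0x08;
constexpr uint32_t PAGE_EXECUTE_READ = 0x20;
constexpr uint32_t PAGE_EXECUTE_READWRITE = 0x40;
constexpr uint32_t PAGE_EXECUTE_WRITECOPY = 0x80;
constexpr uint32_t PAGE_GUARD = 0x100;
constexpr uint32_t MEM_COMMIT = 0x1000;
constexpr uint32_t MEM_RESERVE = 0x2000;
#endif

// Would a region with these VirtualQuery attributes be scanned for pointers?
// (The "not in the memory map" condition is checked separately.)
bool RegionIsScanned(uint32_t allocationProtect, uint32_t state, uint32_t protect) {
    const uint32_t writable = PAGE_READWRITE | PAGE_WRITECOPY |
                              PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY;
    return (allocationProtect & writable) != 0 &&  // writeable when initially allocated
           state == MEM_COMMIT &&                  // committed
           (protect & PAGE_GUARD) == 0;            // not a guard page
}
```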
Wrap Up
There you have it: a hopefully clearer picture of how to use one of Windows’ built-in debugging tools, as well as how it goes about its business.
Notes
[1] The article in question is mirrored in its entirety at OSROnline. The “Built-in User Heap Leak Detection” section mentions the ShutdownFlags value.
[2] Basic setup instructions:
1. Open regedit and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\
2. Add a new key with the same name as your app’s executable file
3. In the new key, add the ‘ShutdownFlags’ value as a DWORD with data of 3
4. Add a ‘GlobalFlag’ value as a DWORD with a hexadecimal data of 1000
4a. To limit the amount of heap space used for stack traces, add a StackTraceDatabaseSizeInMb value with data of the desired limit. A data of 0 is ignored.
5. Run your app as normal
6. If you didn’t run your app under the debugger, attach one when the breakpoint is hit. This is signified by an “app has stopped working” dialog on Vista+ and by a similar error dialog on XP.
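If you’d rather script the registry part of these steps, something like the following would do it. It’s a sketch: the function names are mine, writing under HKEY_LOCAL_MACHINE needs administrator rights, and the Windows-only part is guarded so that the path helper can be used on its own. The values written are the ShutdownFlags data of 3 and GlobalFlag data of 0x1000 from steps 3 and 4.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// IFEO subkey for an executable, relative to HKEY_LOCAL_MACHINE.
std::wstring IfeoKeyPath(const std::wstring& exeName) {
    return L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\"
           L"Image File Execution Options\\" + exeName;
}

#ifdef _WIN32
// Writes one REG_DWORD value under an open key.
static bool SetDword(HKEY key, const wchar_t* name, DWORD data) {
    return RegSetValueExW(key, name, 0, REG_DWORD, reinterpret_cast<const BYTE*>(&data),
                          sizeof(data)) == ERROR_SUCCESS;
}

// ShutdownFlags = 3 (leak check plus breakpoint) and
// GlobalFlag = 0x1000 ("Create user mode stack trace database").
bool EnableShutdownLeakCheck(const std::wstring& exeName) {
    HKEY key = nullptr;
    if (RegCreateKeyExW(HKEY_LOCAL_MACHINE, IfeoKeyPath(exeName).c_str(), 0, nullptr,
                        REG_OPTION_NON_VOLATILE, KEY_SET_VALUE, nullptr, &key,
                        nullptr) != ERROR_SUCCESS) {
        return false;
    }
    const bool ok = SetDword(key, L"ShutdownFlags", 3) && SetDword(key, L"GlobalFlag", 0x1000);
    RegCloseKey(key);
    return ok;
}
#endif
```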
[3] To find the trace manually, make sure you have the correct symbols, then:
On Windows XP and Server 2003:
// these instructions are generally a formalised version of the process described
// at http://blogs.msdn.com/duetsupport/archive/2009/03/12/adventures-in-analyzing-high-memory-use-on-a-duet-client.aspx

1. Open the Call Stack window and double-click on the top entry. It should be ntdll.dll!_RtlDetectHeapLeaks@0... Open a Watch window and type "(_STACK_TRACE_DATABASE**)_RtlpStackTraceDatabase" (without the quotes) for the name. Expand the entry twice and scroll down to the EntryIndexArray member.

2. Using the pointer value from the 'User' column of the leak output and a memory window, mentally go through the following function to get the block's trace index.

USHORT GetTraceIndex(BYTE* pUserPointer)
{
    // this is essentially RtlGetExtraStuffPointer (32-bit header layout)
    BYTE* pBlockHeader = pUserPointer - 8;   // the 8-byte heap entry precedes the user data
    BYTE* returnedPointer = NULL;
    BYTE flag = *(pBlockHeader + 5);         // the header's Flags byte
    if(flag & 8)                             // HEAP_ENTRY_VIRTUAL_ALLOC
    {
        returnedPointer = pBlockHeader - 0x10;
    }
    else
    {
        // the first WORD of the header is the block size in 8-byte units;
        // the extra data occupies the last 8 bytes of the block
        DWORD index = *((WORD*)pBlockHeader);
        returnedPointer = (pBlockHeader + (index * 8)) - 8;
    }
    WORD traceIndex = (*(ULONG_PTR*)returnedPointer) & 0xFFFF;
    return traceIndex;
}

3. Go back to the watch window from step 1 and copy the address contained in the EntryIndexArray member to the Immediate window. Append "-(sizeof(void*)*TraceIndex)" (without quotes), substituting the value gained in step 2 for TraceIndex. Paste the resulting address into a memory window. On 32-bit machines, the stack trace starts at offset 0xc and the number of entries is at offset 0xa. On 64-bit, the stack trace starts at offset 0x10 and the number of entries is at offset 0xe.
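Steps 2 and 3 are mostly pointer arithmetic, so here they are as portable functions that work on a copy of the bytes rather than on live memory, which makes them handy for checking your mental arithmetic. The names are hypothetical, the code assumes the little-endian 32-bit header layout that GetTraceIndex uses, and the offsets are simply the ones quoted in step 3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Step 2: the same logic as GetTraceIndex, given a pointer to an 8-byte heap
// entry header inside a buffer that also holds the block's data.
uint16_t TraceIndexFromHeader(const uint8_t* blockHeader) {
    const uint8_t* extra = nullptr;
    if (blockHeader[5] & 0x08) {
        // virtually allocated block: the extra data sits 0x10 bytes before the header
        extra = blockHeader - 0x10;
    } else {
        // first WORD is the block size in 8-byte units (header included);
        // the extra data occupies the last 8 bytes of the block
        uint16_t sizeInUnits = 0;
        std::memcpy(&sizeInUnits, blockHeader, sizeof(sizeInUnits));
        extra = blockHeader + sizeInUnits * 8 - 8;
    }
    uint16_t traceIndex = 0;
    std::memcpy(&traceIndex, extra, sizeof(traceIndex));  // low WORD on little-endian
    return traceIndex;
}

// Step 3: the slot for a trace lies TraceIndex pointers below EntryIndexArray.
uintptr_t TraceSlotAddress(uintptr_t entryIndexArray, uint16_t traceIndex, std::size_t pointerSize) {
    return entryIndexArray - pointerSize * traceIndex;
}

struct TraceEntryOffsets {
    std::size_t depth;      // offset of the number of entries
    std::size_t backTrace;  // offset of the first IP
};

TraceEntryOffsets OffsetsFor(std::size_t pointerSize) {
    return pointerSize == 8 ? TraceEntryOffsets{0xE, 0x10} : TraceEntryOffsets{0xA, 0xC};
}
```

For example, a trace index of 5 with an EntryIndexArray value of 0x00380000 on a 32-bit machine gives a slot at 0x0037FFEC.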
On Vista / 7, the call stack data is stored with the block, but accessing it isn’t much easier from a manual standpoint. Luckily, with the leak callback enabled, you can copy and paste the code below and call it from within the callback. It returns an array of instruction pointers that is ‘numIPs’ entries long.
#include <windows.h>

typedef struct _StackTraceInfo
{
    ULONG unk;
    ULONG_PTR unk2;
    ULONG numFrames;
    PVOID* ips;
} StackTraceInfo;

PVOID* GetStackBackTraceFromUserPointer(BYTE* pUserData, ULONG* numIPs)
{
    // start here from a pointer returned from HeapAlloc (such as the block
    // passed to the leak callback) or the 'User' field of a leak report
    DWORD amountToRewind = sizeof(void*) * 2;
    BYTE* pBlockStart = pUserData - amountToRewind;
    if((*(pBlockStart + amountToRewind - 1)) == 5)
    {
        ULONG offsetfromBeginning = (*(pBlockStart + amountToRewind - 2)) * amountToRewind;
        pBlockStart -= offsetfromBeginning;
    }

    // the following is essentially RtlpQueryBlockStackTrace
    // (with the value in the 'Block' field of the leak report, set pBlockStart to it and start here)
    DWORD heapEntrySize = sizeof(void*) * 2;
    DWORD index = *(pBlockStart + (heapEntrySize - 1));
    BYTE* endOfHeader = NULL;
    if(index & 0x40)
    {
        index &= 0x3F;
        endOfHeader = pBlockStart + (index * heapEntrySize) + heapEntrySize;
    }
    else if(index == 4)
    {
        endOfHeader = pBlockStart + (index * heapEntrySize) + heapEntrySize;
    }
    else
    {
        endOfHeader = pBlockStart + heapEntrySize;
    }
    StackTraceInfo** addressOfStackTrace = (StackTraceInfo**)(pBlockStart + heapEntrySize);
    if((BYTE*)addressOfStackTrace == endOfHeader)
    {
        return NULL; // nothing follows the header, so there is no trace
    }
    WORD type = *(WORD*)(endOfHeader - 8) - 1;
    if(type == 0)
    {
        // the IPs are read directly from the block, two header-sized units in
        if(numIPs)
        {
            *numIPs = *(WORD*)(endOfHeader - 6);
        }
        return (PVOID*)(pBlockStart + (heapEntrySize * 2));
    }
    StackTraceInfo* stackTraceEntry = *addressOfStackTrace;
    if((type != 1) || (stackTraceEntry == NULL))
    {
        return NULL;
    }
    if(numIPs)
    {
        *numIPs = stackTraceEntry->numFrames;
    }
    return stackTraceEntry->ips;
}