Tuesday, 30 January 2024

Analyzing Windows heap usage with and without ETW

 

It has been a long time since I last wanted to discover whether (and where) a program was “leaking” heap allocations. Most programs that I developed myself just performed some task and exited; heap allocations (from all sources, including Microsoft and other third-party DLLs) probably rarely exceeded a few megabytes. I coded mostly in C# (garbage collected); most heap allocations directly under my control arose from native interop, and I adopted an approach of releasing memory when it was “easy” and did not obscure the main intent of the code – otherwise I “intentionally” allowed the memory to leak.

I mention the above because I am a heavy user of Event Tracing for Windows (ETW) but I had hitherto no experience of using ETW (or, indeed, any other tool) to investigate heap usage. It was only when I tried to help with a problem/question in a technical forum that I had a need to understand heap usage. The question was whether the Windows Filtering Platform API FwpmNetEventEnum unavoidably leaks heap allocations.

The first approach that came to mind was to use the User-Mode Dump Heap (UMDH) utility from the Debugging Tools for Windows kit. However, the “current” version did not seem to work. Searching the web for explanations uncovered the following quotes from other users who had encountered the problem:

According to a Microsoft employee, this is a known problem. I quote: "Yeah. It's not working and I don't know when/if it will ever be."

I also quote an email I got from a Microsoft Support guy: "Anyway, I have confirmation it is broken. The dev team owning the exe knows about it and when they can get to fixing it they will."

Fortunately, older versions of UMDH still work, and it quickly became apparent that FwpmNetEventEnum does leak heap allocations. Most Fwpm* routines use RPC to the Base Filtering Engine (BFE) service to perform their function. Those Fwpm* APIs that return complex data structures mostly use an [allocate(all_nodes)] attribute in the MIDL ACF (Application Configuration File) so that the data can be freed with a single call to midl_user_free; however, that attribute was not applied to the RPC routine at the core of FwpmNetEventEnum. A subsequent call to FwpmFreeMemory frees only the top-level allocation and not the additional embedded allocations.
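The difference between the two unmarshalling strategies can be illustrated with a toy model. This is only a sketch: ToyHeap, the two unmarshal_* functions and the node sizes are hypothetical names and values invented for the illustration, and nothing here calls the real RPC runtime. It shows why freeing only the pointer handed back to the caller releases everything in the all_nodes case but strands the embedded nodes in the node-by-node case.

```python
# Toy model of the two unmarshalling strategies; not the real RPC runtime.

class ToyHeap:
    """Tracks live allocations by handle so that leaks can be counted."""

    def __init__(self):
        self.live = {}
        self._next_handle = 1

    def alloc(self, size):
        handle = self._next_handle
        self._next_handle += 1
        self.live[handle] = size
        return handle

    def free(self, handle):
        del self.live[handle]


def unmarshal_node_by_node(heap, node_sizes):
    """One allocation per node; only the top-level handle reaches the caller."""
    handles = [heap.alloc(size) for size in node_sizes]
    return handles[0]


def unmarshal_all_nodes(heap, node_sizes):
    """A single allocation large enough to hold every node."""
    return heap.alloc(sum(node_sizes))


# Hypothetical sizes: a top-level structure followed by three embedded nodes.
node_sizes = [64, 32, 32, 128]

heap_a = ToyHeap()
heap_a.free(unmarshal_node_by_node(heap_a, node_sizes))  # caller frees the top level only
leaked_node_by_node = sum(heap_a.live.values())

heap_b = ToyHeap()
heap_b.free(unmarshal_all_nodes(heap_b, node_sizes))     # one free releases everything
leaked_all_nodes = sum(heap_b.live.values())

print(leaked_node_by_node, leaked_all_nodes)  # prints: 192 0
```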

The absence of the [allocate(all_nodes)] attribute could be confirmed with tools that dump embedded RPC data structures; one example of a heap allocation back-trace that demonstrated that complex data structures were being allocated node-by-node was:

ntdll!RtlpAllocateHeapInternal+0x80B4E
fwpuclnt!MIDL_user_allocate+0x19
RPCRT4!NdrSafeAllocate+0x47
RPCRT4!Ndr64ComplexStructUnmarshall+0x72D
RPCRT4!Ndr64EmbeddedPointerUnmarshall+0x366
RPCRT4!Ndr64UnionUnmarshall+0x2D9
RPCRT4!Ndr64ComplexStructUnmarshall+0x5F4
RPCRT4!Ndr64pPointerLayoutUnmarshallCallback+0x234
RPCRT4!Ndr64ConformantArrayUnmarshall+0x21C
RPCRT4!Ndr64TopLevelPointerUnmarshall+0x40F
RPCRT4!Ndr64TopLevelPointerUnmarshall+0x59D
RPCRT4!Ndr64pClientUnMarshal+0x2A1
RPCRT4!NdrpClientCall3+0x40C
RPCRT4!NdrClientCall3+0xEB
fwpuclnt!FwpmNetEventEnum5+0x70
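When many such back-traces have to be checked, the test "was this block allocated from inside the NDR unmarshalling code?" can be automated. The following is a minimal sketch that parses frames of the form module!symbol+0xoffset, using the trace above as its input; parse_frame and FRAME_RE are names made up for the example, and the check is a heuristic of my own rather than anything UMDH does.

```python
import re

# module!symbol+0xOFFSET, as printed in the back-trace above.
FRAME_RE = re.compile(
    r"^(?P<module>[^!\s]+)!(?P<symbol>[^+\s]+)\+0x(?P<offset>[0-9A-Fa-f]+)$"
)


def parse_frame(line):
    """Split one frame into (module, symbol, offset)."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised frame: %r" % line)
    return match.group("module"), match.group("symbol"), int(match.group("offset"), 16)


TRACE = """
ntdll!RtlpAllocateHeapInternal+0x80B4E
fwpuclnt!MIDL_user_allocate+0x19
RPCRT4!NdrSafeAllocate+0x47
RPCRT4!Ndr64ComplexStructUnmarshall+0x72D
RPCRT4!Ndr64EmbeddedPointerUnmarshall+0x366
RPCRT4!Ndr64UnionUnmarshall+0x2D9
RPCRT4!Ndr64ComplexStructUnmarshall+0x5F4
RPCRT4!Ndr64pPointerLayoutUnmarshallCallback+0x234
RPCRT4!Ndr64ConformantArrayUnmarshall+0x21C
RPCRT4!Ndr64TopLevelPointerUnmarshall+0x40F
RPCRT4!Ndr64TopLevelPointerUnmarshall+0x59D
RPCRT4!Ndr64pClientUnMarshal+0x2A1
RPCRT4!NdrpClientCall3+0x40C
RPCRT4!NdrClientCall3+0xEB
fwpuclnt!FwpmNetEventEnum5+0x70
"""

frames = [parse_frame(line) for line in TRACE.splitlines() if line.strip()]
symbols = [symbol for _, symbol, _ in frames]

# Frames are listed innermost first, so the allocator appearing before
# (i.e. deeper than) the client unmarshal routine means the block was
# allocated while a nested structure was being unmarshalled.
allocated_inside_unmarshal = (
    "NdrSafeAllocate" in symbols
    and "Ndr64pClientUnMarshal" in symbols
    and symbols.index("NdrSafeAllocate") < symbols.index("Ndr64pClientUnMarshal")
)

print(len(frames), allocated_inside_unmarshal)
```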


Heap Snapshots

I then turned my thoughts to understanding what type of bug could have been introduced into UMDH. There are several methods of obtaining the information needed to dump heap snapshot information (including heap allocation back-traces) about a process; the routines RtlQueryProcessDebugInformation and RtlQueryHeapInformation can both independently obtain the necessary information. The working versions of UMDH seem to have taken a different approach and used the routine ReadProcessMemory and knowledge of NTDLL internal data structures to gather the information.

The failing version of UMDH seems to have started using RtlQueryHeapInformation (with a HEAP_INFORMATION_CLASS value of HeapExtendedInformation (2)) to obtain information about heap allocations, but this information does not include any data that can be used to associate the allocation with a back-trace. There is, however, a HEAP_INFORMATION_CLASS value (5, let’s name it HeapStackTraceInformation) that returns information well suited for use by UMDH (i.e. it includes information about allocated heap blocks and back-traces for the allocations).

The back-traces returned by RtlQueryHeapInformation for HeapStackTraceInformation come from a different source than the back-traces created and stored when the Global Flag FLG_USER_STACK_TRACE_DB is set. The back-traces used by RtlQueryHeapInformation are enabled and disabled by RtlSetHeapInformation (also with a HEAP_INFORMATION_CLASS value of 5) or by creating a value named “FrontEndHeapDebugOptions” under the Image File Execution Options (IFEO) key for an image; this value can be set by the Windows Performance Recorder (WPR) command “wpr -snapshotconfig heap -name […]” (“wpr -snapshotconfig heap -pid […]” effectively calls RtlSetHeapInformation).

When comparing the two versions of the back-trace information for a given allocation, they mostly just differ in the first frame:

HeapStackTraceInformation:
ntdll!RtlpAllocateHeapInternal+0x80b49:
e8528d0500      call    ntdll!RtlpHpStackTraceAddStack

FLG_USER_STACK_TRACE_DB:
ntdll!RtlpAllocateHeapInternal+0x809dd:
e8ac1cffff      call    ntdll!RtlpCallInterceptRoutine

The back-traces can also differ in the depth of the back-trace captured and stored (HeapStackTraceInformation can save more frames).
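A small sketch of how two such back-traces might be compared while ignoring the expected differences (the first frame and the captured depth). The E8 call instructions shown above are five bytes long, so the return address is the call-site offset plus 5; for the HeapStackTraceInformation call site this gives +0x80B4E, which matches the first frame of the FwpmNetEventEnum back-trace shown earlier. The second trace is an assumption made for illustration: I have not verified that the stack trace database records exactly that first frame, and its depth is shortened arbitrarily. compare_traces and return_offset are names invented for the example.

```python
CALL_REL32_LENGTH = 5  # an E8 call is one opcode byte plus a 4-byte displacement


def return_offset(call_site_offset):
    """Offset of the instruction following a 5-byte relative call."""
    return call_site_offset + CALL_REL32_LENGTH


def compare_traces(a, b, skip=1):
    """Compare two back-traces (innermost frame first).

    The first `skip` frames and any frames beyond the shorter trace are
    ignored. Returns (callers_match, depth_difference).
    """
    a_rest, b_rest = a[skip:], b[skip:]
    common = min(len(a_rest), len(b_rest))
    return a_rest[:common] == b_rest[:common], len(a) - len(b)


shared_callers = [
    "fwpuclnt!MIDL_user_allocate+0x19",
    "RPCRT4!NdrSafeAllocate+0x47",
    "RPCRT4!Ndr64ComplexStructUnmarshall+0x72D",
]

snapshot_trace = (
    ["ntdll!RtlpAllocateHeapInternal+0x%X" % return_offset(0x80B49)] + shared_callers
)
# Assumption: first frame derived from the other call site, fewer frames stored.
stack_db_trace = (
    ["ntdll!RtlpAllocateHeapInternal+0x%X" % return_offset(0x809DD)] + shared_callers[:2]
)

callers_match, depth_difference = compare_traces(snapshot_trace, stack_db_trace)
print(snapshot_trace[0])  # ntdll!RtlpAllocateHeapInternal+0x80B4E
print(callers_match, depth_difference)  # True 1
```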

“wpr -singlesnapshot heap […]” uses EnableTraceEx2 to send an EVENT_CONTROL_CODE_CAPTURE_STATE to the Microsoft-Windows-Heap-Snapshot provider, using the EnableFilterDesc field of the EnableParameters parameter to select the “pids”. This causes RtlQueryHeapInformation with HeapStackTraceInformation to be executed in the target processes with the output being broken into chunks and logged into the trace session. Windows Performance Analyzer (WPA) can reassemble, analyze and display this data in a “Heap Snapshot” graph.

Heap Events

WPR provides another heap-related command: “wpr -heaptracingconfig […]”. This command creates/sets another value under IFEO, namely TracingFlags. These flags enable aspects of the User Mode Global Logger (UMGL), including events generated by the WMI HeapTraceProvider; this provider generates events for individual heap operations (HeapRangeCreate, HeapRangeReserve, HeapRangeRelease, HeapRangeDestroy, HeapCreate, HeapAllocation, HeapReallocation, HeapDestroy, HeapFree and more), and StackWalk back-traces can be configured for selected event types. WPA knows how to analyze and display these events too (in various graphs in the Memory category).
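The essence of turning such an event stream into a list of outstanding allocations is pairing each HeapAllocation with a later HeapFree for the same heap and address. The sketch below shows that idea only; it is not how WPA is implemented. The HeapEvent record, its field names and the sample events are hypothetical, and HeapReallocation and HeapDestroy handling is omitted to keep it short.

```python
from collections import namedtuple

# Hypothetical, simplified event record; real events carry more fields.
HeapEvent = namedtuple("HeapEvent", "kind heap address size stack")


def outstanding_allocations(events):
    """Return {(heap, address): event} for blocks allocated but never freed."""
    live = {}
    for event in events:
        key = (event.heap, event.address)
        if event.kind == "HeapAllocation":
            live[key] = event
        elif event.kind == "HeapFree":
            # A free with no matching allocation belongs to a block that
            # was allocated before tracing started; ignore it.
            live.pop(key, None)
    return live


events = [
    HeapEvent("HeapAllocation", 0x1000, 0xA0, 64, "stack-1"),
    HeapEvent("HeapAllocation", 0x1000, 0xB0, 32, "stack-2"),
    HeapEvent("HeapFree", 0x1000, 0xA0, 0, None),
    HeapEvent("HeapAllocation", 0x1000, 0xC0, 32, "stack-2"),
    HeapEvent("HeapFree", 0x1000, 0xD0, 0, None),
]

live = outstanding_allocations(events)

# Group the surviving bytes by allocating back-trace.
bytes_by_stack = {}
for event in live.values():
    bytes_by_stack[event.stack] = bytes_by_stack.get(event.stack, 0) + event.size

print(bytes_by_stack)  # {'stack-2': 64}
```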

The instrumentation for these events is obviously embedded in many NTDLL heap routines; for the HeapAllocation event, the instrumentation is embedded close to the heap stack tracing calls:

ntdll!RtlpAllocateHeapInternal+0x80aec:
e817a30500      call    ntdll!RtlpLogHeapAllocateEvent

If a process was started without heap tracing enabled via IFEO, heap tracing can still be enabled by directly setting the heap tracing bit in the _PEB.TracingFlags field (perhaps via a debugger); there does not seem to be any API that performs this function.
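As a sketch of what “setting the heap tracing bit” amounts to, the snippet below computes the new field value. It assumes the bit layout that the public ntdll symbols appear to show for _PEB.TracingFlags (HeapTracingEnabled at bit 0, then CritSecTracingEnabled and LibLoaderTracingEnabled); those positions are an assumption and should be checked with “dt ntdll!_PEB” on the target build. The starting value is hypothetical, and actually writing the result into the target process is left to the debugger.

```python
# Assumed bit layout of _PEB.TracingFlags; verify against the symbols
# for the target Windows build before relying on it.
HEAP_TRACING_ENABLED = 1 << 0
CRITSEC_TRACING_ENABLED = 1 << 1
LIBLOADER_TRACING_ENABLED = 1 << 2


def with_heap_tracing(tracing_flags):
    """Return the 32-bit TracingFlags value with the heap tracing bit set."""
    return (tracing_flags | HEAP_TRACING_ENABLED) & 0xFFFFFFFF


current = 0x00000004  # hypothetical value read from the PEB
updated = with_heap_tracing(current)
print(hex(updated))  # 0x5
```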