It has been a long time since I last wanted
to discover if/where a program was “leaking” heap allocations. Most programs
that I developed myself just performed some task and exited; heap allocations
(from all sources, including Microsoft and other third-party DLLs) probably
rarely exceeded a few megabytes. I coded mostly in C# (garbage collected), so
most heap allocations directly under my control arose from native interop. I
adopted an approach of releasing memory when it was “easy” and did not obscure
the main intent of the code; otherwise I “intentionally” allowed the memory to
leak.
I mention the above because I am a heavy
user of Event Tracing for Windows (ETW) but I had hitherto no experience of using
ETW (or, indeed, any other tool) to investigate heap usage. It was only when I
tried to help with a problem/question in a technical forum that I had a need to
understand heap usage. The question was whether the Windows Filtering Platform
API FwpmNetEventEnum unavoidably leaks heap allocations.
The first approach that came to mind was to
use the User-Mode Dump Heap (UMDH) utility from the Debugging Tools for Windows
kit. However, the “current” version did not seem to work. Searching the web for
explanations uncovered the following quotes from other users who had encountered
the problem:
According to a
Microsoft employee, this is a known problem. I quote: "Yeah. It's not
working and I don't know when/if it will ever be."
I also quote an
email I got from a Microsoft Support guy: "Anyway, I have confirmation it
is broken. The dev team owning the exe knows about it and when they can get to
fixing it they will."
Fortunately older versions of UMDH still
work and it quickly became apparent that FwpmNetEventEnum does leak heap
allocations. Most Fwpm* routines use RPC to the Base Filtering Engine (BFE)
service to perform their function. Those Fwpm* APIs that return complex data
structures mostly use an [allocate(all_nodes)] attribute in the MIDL ACF (Application Configuration File) so that
the data can be freed with a single call to midl_user_free; however, that
attribute was not applied to the RPC routine at the core of FwpmNetEventEnum. A
subsequent call to FwpmFreeMemory just frees the top-level allocation and not
the additional embedded allocations.
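The idea behind comparing two heap snapshots can be sketched in a few lines of Python. This is only a model of the technique, not UMDH's file format or algorithm: the snapshots are assumed to be plain dictionaries mapping a back-trace label to outstanding bytes, and the labels and byte counts below are invented for illustration.

```python
from collections import Counter

def diff_snapshots(before, after):
    """Return (back-trace, byte growth) pairs, largest growth first.

    Both arguments are hypothetical snapshots: dicts mapping a back-trace
    label to the number of bytes still allocated from that back-trace.
    """
    growth = Counter(after)
    growth.subtract(before)          # after - before, per back-trace
    return [(trace, delta) for trace, delta in growth.most_common() if delta > 0]

# Invented figures: one back-trace grows between snapshots, one is stable.
before = {"fwpuclnt!FwpmNetEventEnum5": 4096, "app!main": 512}
after = {"fwpuclnt!FwpmNetEventEnum5": 12288, "app!main": 512}

for trace, delta in diff_snapshots(before, after):
    print(f"{trace}: +{delta} bytes")
```

A back-trace whose outstanding bytes keep growing across repeated calls and matching frees is the usual sign of a leak.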
The absence of the [allocate(all_nodes)] attribute
could be confirmed with tools that dump embedded RPC data structures; one
example of a heap allocation back-trace that demonstrated that complex data
structures were being allocated node-by-node was:
ntdll!RtlpAllocateHeapInternal+0x80B4E
fwpuclnt!MIDL_user_allocate+0x19
RPCRT4!NdrSafeAllocate+0x47
RPCRT4!Ndr64ComplexStructUnmarshall+0x72D
RPCRT4!Ndr64EmbeddedPointerUnmarshall+0x366
RPCRT4!Ndr64UnionUnmarshall+0x2D9
RPCRT4!Ndr64ComplexStructUnmarshall+0x5F4
RPCRT4!Ndr64pPointerLayoutUnmarshallCallback+0x234
RPCRT4!Ndr64ConformantArrayUnmarshall+0x21C
RPCRT4!Ndr64TopLevelPointerUnmarshall+0x40F
RPCRT4!Ndr64TopLevelPointerUnmarshall+0x59D
RPCRT4!Ndr64pClientUnMarshal+0x2A1
RPCRT4!NdrpClientCall3+0x40C
RPCRT4!NdrClientCall3+0xEB
fwpuclnt!FwpmNetEventEnum5+0x70
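Back-traces in this module!symbol+0xoffset form are easy to process mechanically. The following is a minimal Python sketch (the helper name parse_frame and the regular expression are my own, not part of any tool) that splits the trace above into frames and counts the nested RPC unmarshalling routines, which is what suggests node-by-node allocation.

```python
import re

# module!symbol+0xOFFSET, as printed in the back-trace above.
FRAME_RE = re.compile(
    r"^(?P<module>[^!\s]+)!(?P<symbol>[^+\s]+)\+0x(?P<offset>[0-9A-Fa-f]+)$"
)

def parse_frame(line):
    """Split one frame into (module, symbol, offset)."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised frame: {line!r}")
    return m["module"], m["symbol"], int(m["offset"], 16)

BACKTRACE = """
ntdll!RtlpAllocateHeapInternal+0x80B4E
fwpuclnt!MIDL_user_allocate+0x19
RPCRT4!NdrSafeAllocate+0x47
RPCRT4!Ndr64ComplexStructUnmarshall+0x72D
RPCRT4!Ndr64EmbeddedPointerUnmarshall+0x366
RPCRT4!Ndr64UnionUnmarshall+0x2D9
RPCRT4!Ndr64ComplexStructUnmarshall+0x5F4
RPCRT4!Ndr64pPointerLayoutUnmarshallCallback+0x234
RPCRT4!Ndr64ConformantArrayUnmarshall+0x21C
RPCRT4!Ndr64TopLevelPointerUnmarshall+0x40F
RPCRT4!Ndr64TopLevelPointerUnmarshall+0x59D
RPCRT4!Ndr64pClientUnMarshal+0x2A1
RPCRT4!NdrpClientCall3+0x40C
RPCRT4!NdrClientCall3+0xEB
fwpuclnt!FwpmNetEventEnum5+0x70
"""

frames = [parse_frame(line) for line in BACKTRACE.splitlines() if line.strip()]

# Frames between the allocator and the client call that unmarshal nested data.
unmarshal_frames = [f for f in frames if f[0] == "RPCRT4" and "Unmarshall" in f[1]]

print("allocation site:", frames[0])
print("API entry point:", frames[-1])
print("nested unmarshalling frames:", len(unmarshal_frames))
```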
Heap Snapshots
I then turned my thoughts to understanding
what type of bug could have been introduced into UMDH. There are several
methods of obtaining the information needed to dump heap snapshot information (including
heap allocation back-traces) about a process; the routines RtlQueryProcessDebugInformation
and RtlQueryHeapInformation can both independently obtain the necessary
information. Working versions of UMDH seem to have taken a different approach,
using ReadProcessMemory and a knowledge of NTDLL internal data structures to
gather the information.
The failing version of UMDH seems to have
started using RtlQueryHeapInformation (with a HEAP_INFORMATION_CLASS value of HeapExtendedInformation
(2)) to obtain information about heap allocations, but this information does
not include any data that can be used to associate the allocation with a
back-trace. There is, however, a HEAP_INFORMATION_CLASS value (5, let’s name it
HeapStackTraceInformation) that returns information well suited for use by UMDH
(i.e. includes information about allocated heap blocks and back-traces for the
allocations).
The back-traces returned by RtlQueryHeapInformation
for HeapStackTraceInformation come from a different source compared to the
back-traces created and stored when the Global Flag FLG_USER_STACK_TRACE_DB is
set. The back-traces used by RtlQueryHeapInformation are enabled and disabled
by RtlSetHeapInformation (also with a HEAP_INFORMATION_CLASS value of 5) or by
creating a value named “FrontEndHeapDebugOptions” under the Image
File Execution Options (IFEO) key for an image; this value can be
set by the Windows Performance Recorder (WPR) command “wpr -snapshotconfig heap
-name […]” (“wpr -snapshotconfig heap -pid […]” effectively calls
RtlSetHeapInformation).
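Whether the IFEO value has been set for a given image can be checked by reading the registry. The sketch below assumes the usual IFEO location under HKEY_LOCAL_MACHINE; the helper names are mine, "notepad.exe" is just a placeholder image name, and the meaning of the value's contents is not documented, so the code only reports the raw value.

```python
# Usual location of the Image File Execution Options key (under HKLM).
IFEO_ROOT = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options"

def ifeo_key_path(image_name):
    """Registry path (relative to HKLM) of the IFEO key for an image."""
    return IFEO_ROOT + "\\" + image_name

def read_ifeo_value(image_name, value_name="FrontEndHeapDebugOptions"):
    """Return the raw IFEO value for an image, or None if it is absent.

    Windows only: winreg is imported lazily so that the path helper above
    remains usable on other platforms.
    """
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ifeo_key_path(image_name)) as key:
            value, _value_type = winreg.QueryValueEx(key, value_name)
            return value
    except FileNotFoundError:
        # Either the image has no IFEO key or the value was never created.
        return None

# Placeholder image name, for illustration only.
print(ifeo_key_path("notepad.exe"))
```

On Windows, read_ifeo_value("notepad.exe") returns None until the value has been created for that image.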
When comparing the two versions of the
back-trace information for a given allocation, they mostly just differ in the
first frame:
HeapStackTraceInformation:
ntdll!RtlpAllocateHeapInternal+0x80b49:
e8528d0500 call ntdll!RtlpHpStackTraceAddStack
FLG_USER_STACK_TRACE_DB:
ntdll!RtlpAllocateHeapInternal+0x809dd:
e8ac1cffff call ntdll!RtlpCallInterceptRoutine
The back-traces can also differ in the
depth of the back-trace captured and stored (HeapStackTraceInformation can save
more frames).
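The two call instructions shown above are both the five-byte E8 form (a near call with a 32-bit relative displacement), so the return address recorded in a back-trace is simply the call's offset plus five. A small Python sketch (the function name is my own) that decodes the bytes quoted above, with all offsets relative to ntdll!RtlpAllocateHeapInternal:

```python
import struct

def decode_call_rel32(code_hex, call_offset):
    """Decode an x86/x64 'E8 rel32' call.

    Returns (return_offset, target_offset), both expressed relative to the
    same base as call_offset.
    """
    code = bytes.fromhex(code_hex)
    if len(code) != 5 or code[0] != 0xE8:
        raise ValueError("not a five-byte E8 rel32 call")
    (rel32,) = struct.unpack("<i", code[1:])   # signed, little-endian
    return_offset = call_offset + len(code)    # address of the next instruction
    target_offset = return_offset + rel32      # displacement is relative to it
    return return_offset, target_offset

# Bytes and offsets copied from the disassembly above.
for label, code_hex, offset in [
    ("HeapStackTraceInformation", "e8528d0500", 0x80B49),
    ("FLG_USER_STACK_TRACE_DB", "e8ac1cffff", 0x809DD),
]:
    ret, target = decode_call_rel32(code_hex, offset)
    print(f"{label}: returns to +0x{ret:x}, calls +0x{target:x}")
```

The HeapStackTraceInformation call returns to +0x80b4e, which appears to match the first frame (ntdll!RtlpAllocateHeapInternal+0x80B4E) of the FwpmNetEventEnum back-trace shown earlier.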
“wpr -singlesnapshot heap […]” uses EnableTraceEx2
to send an EVENT_CONTROL_CODE_CAPTURE_STATE to the Microsoft-Windows-Heap-Snapshot
provider, using the EnableFilterDesc field of the EnableParameters parameter to
select the “pids”. This causes RtlQueryHeapInformation with HeapStackTraceInformation
to be executed in the target processes with the output being broken into chunks
and logged into the trace session. Windows Performance Analyzer (WPA) can
reassemble, analyze and display this data in a “Heap Snapshot” graph.
Heap Events
WPR provides another heap related command:
“wpr -heaptracingconfig […]”. This command creates/sets another value under
IFEO – namely TracingFlags. These flags enable aspects of the User Mode Global
Logger (UMGL), including events generated by the WMI HeapTraceProvider; this
provider generates events for individual heap events (HeapRangeCreate,
HeapRangeReserve, HeapRangeRelease, HeapRangeDestroy, HeapCreate,
HeapAllocation, HeapReallocation, HeapDestroy, HeapFree and more) and StackWalk
back-traces can be configured for selected event types. WPA knows how to analyze
and display these events too (in various graphs in the Memory category).
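The principle behind leak analysis from individual heap events is to pair each allocation with its free and see what remains. A minimal Python sketch follows, assuming the events have already been decoded into (kind, address, size) tuples. That tuple layout and the sample addresses are hypothetical, not the provider's real event schema, and reallocation events are ignored for brevity.

```python
def outstanding_allocations(events):
    """Return {address: size} for allocations that were never freed.

    events is an iterable of hypothetical (kind, address, size) tuples in
    time order; only HeapAllocation and HeapFree are handled here.
    """
    live = {}
    for kind, address, size in events:
        if kind == "HeapAllocation":
            live[address] = size
        elif kind == "HeapFree":
            live.pop(address, None)   # tolerate frees of untracked blocks
    return live

# Invented sample: three allocations, only one of which is freed.
events = [
    ("HeapAllocation", 0x1000, 64),
    ("HeapAllocation", 0x2000, 128),
    ("HeapFree", 0x1000, 0),
    ("HeapAllocation", 0x3000, 32),
]

leaked = outstanding_allocations(events)
print(f"{len(leaked)} blocks outstanding, {sum(leaked.values())} bytes")
```

With StackWalk back-traces attached to the allocation events, the outstanding blocks can then be grouped by allocating call stack.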
The instrumentation for these events is
evidently embedded in many NTDLL heap routines; for the HeapAllocation event,
the instrumentation sits close to the heap stack tracing calls:
ntdll!RtlpAllocateHeapInternal+0x80aec:
e817a30500 call ntdll!RtlpLogHeapAllocateEvent
If a process was started without heap
tracing enabled via IFEO, heap tracing can still be enabled by directly setting
the heap tracing bit in the _PEB.TracingFlags field (perhaps via a debugger);
there does not seem to be any API that performs this function.