In Detail

When investigating the behaviour of a Windows system, I often use a combination of crash/memory dumps and Event Tracing for Windows (ETW): dumps to view the state and ETW to view the temporal development of a system. My usual tools for viewing and manipulating the captured data are WPR (and other trace controllers), WPA, cdb/WinDbg and homebrew tools. Until now, I had never explored most of the “Memory” analysis graphs in WPA, especially the “Resident Set” and “Reference Set” graphs.

My intention is to describe the mechanics of using ETW events to generate the “Resident Set” and “Reference Set” graphs rather than how to interpret the resulting graphs from a memory performance perspective.

Resident Set

The Resident Set graph, in particular, displays “state” information which I would normally have obtained in a list-like form from a dump file. It was the data manipulation abilities of WPA (grouping, sorting, collapsing, etc.) that made it interesting to me, but I wanted to know what data was used so that I could validate the interpretation of the raw data and compare the results with dump file analysis results (e.g. kernel debugger commands such as “!memusage” and “!vm”).

One way of creating an ETW trace for Resident Set analysis is to use the WPR profile “ResidentSet”; this profile uses the System Keywords: CpuConfig, DiskIO, HardFaults, Loader, Memory, MemoryInfo, ProcessThread, Session, VirtualAllocation, VAMap (keyword “Memory” also sets keyword “Filename” implicitly).

One further event provider is used to provide memory usage information (Win32HeapRanges) alongside a few other providers that are enabled to enhance the presentation of stack information.

By enabling only selected keywords or by filtering the ETW trace to remove information, one can investigate how the ETW trace data is used to create the Resident Set graph.

“Memory” is the only essential keyword; even if the other keywords are not used, a Resident Set graph will still be available in WPA, albeit with fewer “details”. When this keyword is enabled, a number of event types are recorded, the most important of which is named “Memory: PageInMemory” (in the Trace Statistics view of a trace). One event of this type is logged for every physical page (4 kilobytes) of memory in the system, so 16 gigabytes of physical memory results in over 4 million events. These events are emitted without delay, so many ETW buffers are needed to avoid losing events.
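The event count follows directly from the amount of installed memory. A minimal sketch of the arithmetic, assuming 4 KiB pages and ignoring any physical address ranges that are not backed by RAM (the function name is hypothetical):

```python
PAGE_SIZE = 4 * 1024  # bytes per physical page (4 KiB), as stated in the text


def page_in_memory_event_count(physical_memory_bytes: int) -> int:
    """One "Memory: PageInMemory" event is expected per physical page."""
    return physical_memory_bytes // PAGE_SIZE


# 16 GiB of physical memory
print(page_in_memory_event_count(16 * 1024**3))  # 4194304
```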

There is, by default, no MOF definition for this type in the WMI/WBEM database/repository, but the event data is essentially a MMPFN_IDENTITY structure, a definition of which can be found in the Windows Research Kernel (WRK); the WRK definition is old, but only a few minor tweaks (new bit field meanings and enumeration values) appear to have been made in the intervening years. The sequence of these events is essentially a parallel to the MMPFN array, which is the basis for the debugger “!memusage” analysis of memory usage.

The MMPFN_IDENTITY structure includes fields that identify a “list” (3 bits that encode: zero, free, standby, modified, modified-no-write, bad, active, transition) and a “use” (4 bits); these “use” values map to ResidentSetPageCategory values used by WPA as shown in the following table:

ResidentSetPageCategory	MMPFN_IDENTITY derived information
AddressingWindowExtensionsPage	MMPFNUSE_AWEPAGE
DriverLockedSystemPage	MMPFNUSE_DRIVERLOCKPAGE
Image	MMPFNUSE_FILE + MMPFN_IDENTITY.u2.e1.Image
KernelStack	MMPFNUSE_KERNELSTACK
LargePage	MMPFNUSE_LARGEPAGE
MapFile	MMPFNUSE_FILE + not MMPFN_IDENTITY.u2.e1.Image
MetaFile	MMPFNUSE_METAFILE
NonPagedPool	MMPFNUSE_NONPAGEDPOOL
PagedPool	MMPFNUSE_PAGEDPOOL
PageTable	MMPFNUSE_PAGETABLE
PageFileMappedSection	MMPFNUSE_PAGEFILEMAPPED
SessionPrivate	MMPFNUSE_SESSIONPRIVATE
StraggleIOPage	MMPFNLIST_TRANSITION
SystemPage	MMPFNUSE_SYSTEMPTE
VirtualAlloc_PreTrace	MMPFNUSE_PROCESSPRIVATE
WsMetaData	MMPFNUSE_WSMETADATA
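The table above can be read as a simple lookup. The following is a rough sketch rather than WPA's actual implementation: the “use” values are kept as strings because their numeric encodings are not given here, the “list” values are assumed to be numbered in the order in which the text lists them, and giving the transition list precedence over the “use” value is my assumption.

```python
from enum import IntEnum


class PfnList(IntEnum):
    # Assumed numbering: the order in which the text lists the 3-bit values.
    ZERO = 0
    FREE = 1
    STANDBY = 2
    MODIFIED = 3
    MODIFIED_NO_WRITE = 4
    BAD = 5
    ACTIVE = 6
    TRANSITION = 7


# Direct "use" -> ResidentSetPageCategory mappings taken from the table.
USE_TO_CATEGORY = {
    "MMPFNUSE_AWEPAGE": "AddressingWindowExtensionsPage",
    "MMPFNUSE_DRIVERLOCKPAGE": "DriverLockedSystemPage",
    "MMPFNUSE_KERNELSTACK": "KernelStack",
    "MMPFNUSE_LARGEPAGE": "LargePage",
    "MMPFNUSE_METAFILE": "MetaFile",
    "MMPFNUSE_NONPAGEDPOOL": "NonPagedPool",
    "MMPFNUSE_PAGEDPOOL": "PagedPool",
    "MMPFNUSE_PAGETABLE": "PageTable",
    "MMPFNUSE_PAGEFILEMAPPED": "PageFileMappedSection",
    "MMPFNUSE_SESSIONPRIVATE": "SessionPrivate",
    "MMPFNUSE_SYSTEMPTE": "SystemPage",
    "MMPFNUSE_PROCESSPRIVATE": "VirtualAlloc_PreTrace",
    "MMPFNUSE_WSMETADATA": "WsMetaData",
}


def base_category(use: str, page_list: PfnList, image: bool) -> str:
    """Classify one page from MMPFN_IDENTITY-derived fields only."""
    if page_list == PfnList.TRANSITION:  # assumption: the list wins over the use
        return "StraggleIOPage"
    if use == "MMPFNUSE_FILE":
        # The Image bit (u2.e1.Image) separates Image from MapFile pages.
        return "Image" if image else "MapFile"
    return USE_TO_CATEGORY.get(use, "Unknown")


print(base_category("MMPFNUSE_FILE", PfnList.STANDBY, image=True))  # Image
```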

The following ResidentSetPageCategory values reliably (in principle, perhaps not in implementation) combine MMPFN_IDENTITY event information with additional event types as follows:

ResidentSetPageCategory	MMPFN_IDENTITY plus derived information	WPR Keyword
CopyOnWriteImage	MMPFNUSE_PROCESSPRIVATE + MapFile Rundown	VAMap
CopyOnWriteMapFile	MMPFNUSE_PROCESSPRIVATE + MapFile Rundown	VAMap
CopyOnWritePageFileMappedSection	MMPFNUSE_PROCESSPRIVATE + MapFile Rundown	VAMap
VirtualAlloc	MMPFNUSE_PROCESSPRIVATE + VirtualAlloc Rundown	VirtualAllocation
Win32Heap	MMPFNUSE_PROCESSPRIVATE + HeapRange Rundown	Win32HeapRanges
SessionCopyOnWriteImage	MMPFNUSE_SESSIONPRIVATE + MapFile Rundown	VAMap
Driver	MMPFNUSE_SYSTEMPTE + Image Rundown	Loader

The following ResidentSetPageCategory values unreliably/incorrectly combine MMPFN_IDENTITY event information with additional event types as follows:

ResidentSetPageCategory	MMPFN_IDENTITY plus derived information	WPR Keyword
UserStack	MMPFNUSE_PROCESSPRIVATE + Thread Rundown	ProcessThread
DriverFile	MMPFNUSE_FILE + FileName Rundown	Filename
Prefetcher	MMPFNUSE_FILE + FileName Rundown	Filename
RegistryFile	MMPFNUSE_METAFILE + FileName Rundown	Filename

The remaining ResidentSetPageCategory values are either not used or just not present in my system:

ResidentSetPageCategory	MMPFN_IDENTITY plus derived information
SystemCache
HyperSpace

The “current” implementation of the ResidentSetPageCategory classification for VirtualAlloc and Win32Heap uses only the “actual” events (rather than the “Rundown” events, which are also present) and so dramatically underestimates their values; the pages that are missed are attributed to VirtualAlloc_PreTrace.

The UserStack classification just uses the StackBase and StackLimit values from Thread Rundown/Create events, which just gives an initial view of the stack virtual address range (the initial stack commit size) and does not take account of the stack reserve size. At the time of the WPR trace, the stack may have grown and this could be determined by combining information from the thread events and the VirtualAlloc Rundown events, but such calculations are not currently performed.
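One way such a calculation could look is sketched below. It assumes that the VirtualAlloc Rundown events have already been reduced to hypothetical (base address, size) tuples for one process, and that the region containing the byte just below StackBase is the stack allocation; neither the tuple layout nor the function name comes from WPA.

```python
from typing import Iterable, Optional, Tuple

Region = Tuple[int, int]  # (base address, size in bytes), hypothetical layout


def stack_region(stack_base: int, regions: Iterable[Region]) -> Optional[Region]:
    """Find the allocation that contains the top of a thread's stack.

    The stack grows downwards, so StackBase is the (exclusive) upper bound
    and the byte at StackBase - 1 lies inside the stack allocation.
    """
    top = stack_base - 1
    for base, size in regions:
        if base <= top < base + size:
            return base, size
    return None


regions = [(0x10000, 0x4000), (0x200000, 0x100000)]
print(stack_region(0x300000, regions))  # (2097152, 1048576)
```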

The DriverFile and Prefetcher classification just matches filename extensions against “.sys” and “.pf” – perhaps a reasonable heuristic but normally just an uninteresting subdivision of pages on the standby list.
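The heuristic amounts to little more than the following sketch; treating the match as case-insensitive is my assumption, and the function name is hypothetical.

```python
import ntpath
from typing import Optional

EXTENSION_TO_CATEGORY = {".sys": "DriverFile", ".pf": "Prefetcher"}


def file_subcategory(filename: str) -> Optional[str]:
    """Return the extension-based category, or None if neither applies."""
    extension = ntpath.splitext(filename)[1].lower()
    return EXTENSION_TO_CATEGORY.get(extension)


print(file_subcategory(r"\Device\HarddiskVolume3\Windows\System32\drivers\ntfs.sys"))
```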

The current RegistryFile classification is just pure nonsense. Firstly, it uses name matching on the filenames of “MetaFile” pages; in this context, “meta files” are file system metafiles such as $Mft, $LogFile, index files (directories), etc. (and, therefore, not data files such as registry hives). Secondly, it just looks for a few familiar hive names such as “SYSTEM”, “SECURITY” and “DEFAULT”. One could try to rescue this classification by using the “RegistryHive” keyword to obtain a rundown of hive filenames and matching those names against filenames of “MapFile” pages, but one would still have to allow for differences in the filenames (e.g. \Device\HarddiskVolume3\Windows\system32\config\SOFTWARE vs. \SystemRoot\System32\Config\SOFTWARE).
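The filename differences could be bridged by normalizing both forms before comparing them. The sketch below is only an illustration: it assumes that \SystemRoot refers to a \Windows directory and that the volume prefix can simply be dropped, which would not hold on every system.

```python
def normalize_hive_path(path: str) -> str:
    """Reduce two common NT path spellings to a comparable lower-case form."""
    parts = [p for p in path.replace("/", "\\").lower().split("\\") if p]
    if parts and parts[0] == "systemroot":
        # Assumption: \SystemRoot is the \Windows directory.
        parts = ["windows"] + parts[1:]
    elif len(parts) >= 2 and parts[0] == "device":
        # Drop "\Device\HarddiskVolumeN"; the volume itself is ignored.
        parts = parts[2:]
    return "\\".join(parts)


a = normalize_hive_path(r"\Device\HarddiskVolume3\Windows\system32\config\SOFTWARE")
b = normalize_hive_path(r"\SystemRoot\System32\Config\SOFTWARE")
print(a == b)  # True
```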

Page Frame Number (PFN) Pages

The ability to group pages based on list, use, process, file, page priority and pool tag means that there are few large counts of pages that cannot be broken down into smaller counts. One such large, opaque, block is the ResidentSetPageCategory SystemPage. There is however an event in a ResidentSet trace that can divide this block: “Memory: KeMemUsage”; this event contains the virtual address of the PFN database and its page count. The PFN database typically forms a large portion of the SystemPage category, so separating it from that category could be helpful. The “Memory: KeMemUsage” event actually contains a “UsageType” field, but currently only one usage type is defined:

ntoskrnl!_PERFINFO_KERNELMEMORY_USAGE_TYPE
PerfInfoMemUsagePfnMetadata = 0n0
PerfInfoMemUsageMax = 0n1
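Separating the PFN database from the rest of the SystemPage category then reduces to a range check. The sketch below assumes that the virtual address of each SystemPage page is available from its event and that pages are 4 KiB in size; the function and parameter names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def in_pfn_database(virtual_address: int, pfn_db_base: int, pfn_db_pages: int) -> bool:
    """True if the address lies in the range reported by "Memory: KeMemUsage"."""
    pfn_db_end = pfn_db_base + pfn_db_pages * PAGE_SIZE
    return pfn_db_base <= virtual_address < pfn_db_end


base = 0x10000000  # made-up address, for illustration only
print(in_pfn_database(base + 3 * PAGE_SIZE, base, 4))  # True
print(in_pfn_database(base + 4 * PAGE_SIZE, base, 4))  # False
```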

Bad Pages

Depending on the duration of the ResidentSet trace, there may be one or more “Memory: MemInfo” events in the trace. These events are the raw data for the WPA “Memory Utilization” graph; they contain interesting summary information (including standby list repurposed counts) and other counts, including the number of “bad” pages. The bad page count is useful because when examining the “Memory: PageInMemory” events, pages with a “list” value of “bad” are likely to be found. The “list” value is represented in 3 bits and all 8 possible values have established meanings but it is possible to overload the meaning of the bad list value. When “!memusage” examines the PFN database, it is able to use “magic” values in other fields of the MMPFN structure to overload the meaning of the “bad” list, but this distinction cannot be deduced from the contents of the MMPFN_IDENTITY structure. “!memusage” describes these pages as “SLIST/Temp”.

Combined Pages

I was not aware of “Combined Pages” until I started looking at this topic; they are described in “Windows Internals, Seventh Edition, Part 1” Chapter 5, Section “Memory combining”. Combined pages can be identified from information in the MMPFN_IDENTITY and WPA does this; when exploring the ResidentSetPageCategory PageFileMappedSection pages (PFMappedSection) one will probably find “CombinedPage” pages. The ResidentSet trace will include a “Memory: MMStat” event that includes statistics about page combining activity.

Non-Tradeable Pages

To be done.

Summary of ResidentSet Tracing and Analysis

The ability to share the raw data with others, store the raw data, combine information from several kernel data structures without groveling through undocumented data, and use versatile user interface features to organize data makes this a useful feature.

Reference Set

The WPR keywords used in a ReferenceSet trace do not differ greatly from those used in a ResidentSet trace; one keyword is omitted (DiskIO) and three additional keywords are added: FootPrint, MemInfoWS and ReferenceSet.

As far as I can tell, the FootPrint keyword just ensures that Memory, Pool and Session rundowns are included in the trace, but other keywords in the ReferenceSet trace trigger these rundowns too. MemInfoWS causes a “Memory: MemInfoExWS” event to be logged twice per second, containing summary information (total counts) for shared pages in each working set; these events do not seem to be used in the “Reference Set” graph.

As would be expected, the ReferenceSet keyword is essential for a reference set analysis. Its effect seems to be, in quick succession, to log a “start” mark, to empty all working sets and to log a sequence of “Memory: InMemoryActive” events; it also logs a “stop” mark when the trace is stopped (but before the rundowns begin). “Memory: InMemoryActive” events contain the same information as “Memory: PageInMemory” events (a MMPFN_IDENTITY structure) but are only logged for “active” list PFN database entries.

In addition to their rundown behaviour, the keywords Memory, VirtualAllocation and VAMap generate events whenever relevant operations occur (e.g. adding pages to working sets, mapping/unmapping files or pagefile, allocating/freeing virtual memory or pool); these events occur in a ResidentSet trace too, but they can be ignored for Resident Set analysis/graphing.

Column Names

More than 60 column names are available in the Reference Set View Editor. The names often give an indication of how a view “works” (e.g. which event types are used and how information from events is combined to present the view) and not being able to guess what a name means is an indication that one might not have fully understood the purpose of a view.

The first change to the view that I wanted to make was to switch from megabytes to pages as the measure of size; most of the potential column names for page count include the text “w/o Offer” and I had no idea what this might imply. Subsequently, I guessed that this text is related to Video memory offer and reclaim (there is also a column name of “VidMm”, which adds weight to the guess). There is an event provider that could possibly generate events relevant to this activity (Microsoft-Windows-DxgKrnl) but this provider is not included in the “Reference Set” recording profile and the topic is too far from my interests to pursue in more detail.

Some of the column names, such as “COFF Group” (known to me in the context of object/executable file formats) seem irrelevant with respect to event tracing; as would be expected, no values appeared under this (and similar) column names when they were added to a view.

Reaccess

The column name “Reaccess”, combined with the potential values in the “Access Reason” (ReferenceSetReferenceReason) and “Release Reason” (ReferenceSetReleaseReason) columns, helped me to infer how the Reference Set view possibly works.

ReferenceSetReferenceReason	Related Event
PrivatePageAccess	Memory: PageAccess
SharablePageAccess	Memory: PageAccessEx
PageRangeAccess	Memory: PageRangeAccess
ActiveRundown	Memory: InMemoryActive
PoolAllocate	Pool: Allocate
PoolFree	Pool: Free
PageCombine	Memory: PageCombine
Reclaim	Microsoft-Windows-DxgKrnl

ReferenceSetReleaseReason	Related Event
PageRelease	Memory: PageRelease
PageRangeRelease	Memory: PageRangeRelease
VirtualAddressRangeEnd	Memory: VirtualFree (Flags includes MEM_RELEASE)
PageFileMappedSectionDelete	Section: Delete
PageCombine	Memory: PageCombine
VirtualAddressRangeDecommit	Memory: VirtualFree (Flags includes MEM_DECOMMIT)
PoolFree	Pool: Free
PoolAllocate	Pool: Allocate
ProcessEnd	Process: Delete
ThreadEnd	Thread: Delete
RemovedFromWorkingSet	Memory: RemoveFromWS
Offer	Microsoft-Windows-DxgKrnl
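The two VirtualFree rows of the release table differ only in the flag that is set. A small sketch of that distinction, assuming that the Flags field of the event carries the documented Win32 values of MEM_DECOMMIT and MEM_RELEASE (the function name is hypothetical):

```python
from typing import Optional

MEM_DECOMMIT = 0x4000  # Win32 constant
MEM_RELEASE = 0x8000   # Win32 constant


def virtual_free_release_reason(flags: int) -> Optional[str]:
    """Map the Flags of a "Memory: VirtualFree" event to a release reason."""
    if flags & MEM_RELEASE:
        return "VirtualAddressRangeEnd"
    if flags & MEM_DECOMMIT:
        return "VirtualAddressRangeDecommit"
    return None


print(virtual_free_release_reason(MEM_DECOMMIT))  # VirtualAddressRangeDecommit
```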

“Reaccess” seems to mean that a page has been “accessed” (added to a working set) twice without an intervening “release” (removal/eviction from a working set).
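Under that interpretation, counting reaccess occurrences is a matter of tracking which pages are currently “accessed”. The following sketch uses a hypothetical, already decoded event stream of (kind, process id, page) tuples; it illustrates the inferred rule and is not WPA's algorithm.

```python
from typing import Iterable, Tuple

Event = Tuple[str, int, int]  # (kind, process id, page), hypothetical layout


def count_reaccesses(events: Iterable[Event]) -> int:
    """Count accesses to pages that are already in the working set."""
    in_working_set = set()
    reaccesses = 0
    for kind, process_id, page in events:
        key = (process_id, page)
        if kind == "access":
            if key in in_working_set:
                reaccesses += 1  # second access without an intervening release
            in_working_set.add(key)
        elif kind == "release":
            in_working_set.discard(key)
    return reaccesses


events = [
    ("access", 4, 0x1000),
    ("access", 4, 0x1000),   # reaccess
    ("release", 4, 0x1000),
    ("access", 4, 0x1000),   # not a reaccess: the page was released first
]
print(count_reaccesses(events))  # 1
```

A missing release reason (such as the unmapping of a file) shows up in this model as a spurious reaccess, which is why the count is a useful indicator of tracking quality.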

The ReferenceSetReferenceReason values cover all of the paths causing a page to be added to a working set plus other values: Reclaim (video memory) and Pool Allocate/Free.

Pool Allocate/Free is orthogonal to working set growth/reduction; my guess is that both “Allocate” and “Free” are included as “access” reasons and both as “release” reasons to “balance the books”. The default keywords used in a Reference Set trace do not include the “Pool” keyword; the “Pool” events in the trace are just the “large”/“big” allocations (size plus overhead greater than or equal to one page) and frees, which are included via the “Memory” keyword.

The ReferenceSetReleaseReason values do not cover all of the paths causing a page to be removed from a working set (if one ignores RemovedFromWorkingSet, which is not enabled in the default capture) but do contain “other” values: Offer (video memory), Pool Allocate/Free (again) and ThreadEnd (presumably included to track user stack releases, but VirtualFree does this better). Missing from the list is the unmapping of files.

If one adds RemovedFromWorkingSet events to a trace (via the “WorkingSet” keyword), it is possible to completely eliminate “reaccess” occurrences; if one does not collect RemovedFromWorkingSet events but does use unmap file as a release reason, it is possible to reduce the number of “reaccess” occurrences to a handful (tens of events). Even on short traces, the existing WPA algorithm reports hundreds of thousands of such reaccess occurrences.

Enabling the “WorkingSet” keyword does mean that two additional high volume bursts of events are added to a Reference Set trace: another rundown of the active pages in the PFN database and all of the working set evictions that occur when the working sets are emptied at the start of the trace.

Verifying the accuracy of the working set tracking

A Reference Set trace contains a rundown of the active pages at the start of tracking (after working sets have been emptied) and a full rundown of the PFN database when tracking stops (when the trace is stopped). It is possible to compare the result of “initial state plus tracked page insertions/removals” against “final state” and the inconsistencies should be small; because it takes time to record the initial and final states, some insertions and removals that occur during the rundowns will not be attributed appropriately.
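The comparison can be sketched as a replay of the tracked events over the initial state, followed by a set difference against the final state. The event layout and names below are hypothetical, and the final rundown is assumed to have been filtered to “active” pages beforehand.

```python
from typing import Hashable, Iterable, Set, Tuple


def replay(initial: Iterable[Hashable],
           events: Iterable[Tuple[str, Hashable]]) -> Set[Hashable]:
    """Apply tracked insertions/removals to the initial working set state."""
    state = set(initial)
    for kind, key in events:
        if kind == "insert":
            state.add(key)
        elif kind == "remove":
            state.discard(key)
    return state


def inconsistencies(initial, events, final):
    """Return (predicted but not in final, in final but not predicted)."""
    predicted = replay(initial, events)
    final = set(final)
    return predicted - final, final - predicted


initial = {("pid4", 0x1000), ("pid4", 0x2000)}
events = [("insert", ("pid4", 0x3000)), ("remove", ("pid4", 0x1000))]
final = {("pid4", 0x2000), ("pid4", 0x3000), ("pid4", 0x4000)}
print(inconsistencies(initial, events, final))  # (set(), {('pid4', 16384)})
```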

The elimination of “reaccess” occurrences is also a strong sign that the working set tracking is functioning correctly.

Special handling for some events

Some “Memory: PageAccessEx” events indicate the conversion of a shared page to a process private page (e.g. when a page in a copy-on-write region is written to), and these events need to be detected if one is to accurately track working set state.

Some other “Memory: PageAccessEx” events indicate a ProcessId of 0; the stack at the time of such events typically looks like this:

MiLogMapFileEvent
MiMapViewOfImageSection
MiMapViewOfSection
MiMapViewOfSectionExCommon
MmMapViewOfSectionEx
MiMapProcessExecutable
MmInitializeProcessAddressSpace
PspAllocateProcess
NtCreateUserProcess

The page accesses are being recorded during the creation of the virtual address space of a new process; the ThreadId is that of the thread that called CreateProcess and can be used to “collect” the pages that are accessed. When the “Process: Create” event occurs, the attribution of the collected pages can be transferred to the new process.
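The “collect and transfer” step might be sketched as follows. The class and method names are hypothetical, and the sketch assumes that the “Process: Create” event can be associated with the thread that is creating the process.

```python
from collections import defaultdict


class PendingAttribution:
    """Hold pages accessed with ProcessId 0 until the new process is known."""

    def __init__(self):
        self.pending = defaultdict(list)  # creating thread id -> pages
        self.owned = defaultdict(list)    # process id -> pages

    def on_page_access(self, process_id: int, thread_id: int, page: int) -> None:
        if process_id == 0:
            # Address space of a new process is still being created.
            self.pending[thread_id].append(page)
        else:
            self.owned[process_id].append(page)

    def on_process_create(self, new_process_id: int, creator_thread_id: int) -> None:
        # Transfer the collected pages to the process that now exists.
        pages = self.pending.pop(creator_thread_id, [])
        self.owned[new_process_id].extend(pages)


tracker = PendingAttribution()
tracker.on_page_access(0, thread_id=1234, page=0x1000)
tracker.on_page_access(0, thread_id=1234, page=0x2000)
tracker.on_process_create(new_process_id=5678, creator_thread_id=1234)
print(tracker.owned[5678])  # [4096, 8192]
```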

The “Memory: PageAccessEx” events include a bit that indicates whether the page is/was in a user working set or a system working set; this user/system distinction must be observed when attributing a page.

Summary of ReferenceSet Tracing and Analysis

There seem to be serious flaws in how WPA attributes and tracks page ownership but the total impression given by the view is probably accurate enough for most practical purposes.

MOF classes relevant for memory analysis

MOF class definitions are available for some of the events that can usefully be processed during memory analysis:

WPA Event Name	MOF Event Type Name	MOF Class
Memory: VirtualAlloc	VirtualAlloc	PageFault_VirtualAlloc
Memory: VirtualFree	VirtualFree	PageFault_VirtualAlloc
Memory: VirtualAlloc Start Rundown	VirtualAllocDCStart	PageFault_VirtualAllocRundown
Memory: VirtualAlloc End Rundown	VirtualAllocDCEnd	PageFault_VirtualAllocRundown
Process: Create	Start	Process_V4_TypeGroup1
Process: Delete	End	Process_V4_TypeGroup1
Process: Start Rundown	DCStart	Process_V4_TypeGroup1
Process: End Rundown	DCEnd	Process_V4_TypeGroup1
Process: PerfCounters: End	PerfCtr	Process_V2_TypeGroup2
Process: PerfCounters: Rundown	PerfCtrRundown	Process_V2_TypeGroup2
Thread: Create	Start	Thread_V3_TypeGroup1
Thread: Delete	End	Thread_V3_TypeGroup1
Thread: Start Rundown	DCStart	Thread_V3_TypeGroup1
Thread: End Rundown	DCEnd	Thread_V3_TypeGroup1
Image: Load	Load	Image_Load_V2
Image: Unload	UnLoad	Image_Load_V2
Image: Start Rundown	DCStart	Image_Load_V2
Image: End Rundown	DCEnd	Image_Load_V2
Filename: Create	FileCreate	FileIo_V2_Name
Filename: Delete	FileDelete	FileIo_V2_Name
Filename: Rundown	FileRundown	FileIo_V2_Name
Pool: Allocate	PoolAllocation	PoolAllocFree
Pool: Free	PoolFree	PoolAllocFree
Mark	Mark	Mark_V0
EventTrace: Rundown Complete	RDComplete	RDComplete_V1

For some other events that might need to be processed, no MOF definition is available; in part, this may be due to the fact that many of the C data structures use unions, bit fields and arrays of structures which cannot be represented in MOF. In many cases, Microsoft definitions of these data structures can be found in the symbol files for some older DLLs; Geoff Chappell describes the background to this here and elsewhere.

WPA Event Name	C struct Name	Source
Memory: PageAccess	_MM_ETW_PAGE_INFO	urlmon.pdb
Memory: PageAccessEx	_MM_ETW_PAGE_INFO_EX	urlmon.pdb
Memory: PageCombine	_MM_ETW_PAGE_INFO_EX	urlmon.pdb
Memory: PageRelease	_MM_ETW_PAGE_INFO	urlmon.pdb
Memory: RemoveFromWS	_MM_ETW_PAGE_EXTRA_INFO	urlmon.pdb
Memory: PageInMemory	_MM_ETW_PAGE_INFO	urlmon.pdb
Memory: InMemoryActive	_MM_ETW_PAGE_INFO	urlmon.pdb
Memory: InMemoryActive Rundown	_MM_ETW_PAGE_INFO	urlmon.pdb
Memory: WS Shareable Rundown	_MM_ETW_WORKING_SET_PFN_RUNDOWN	urlmon.pdb
Memory: PageRangeAccess	_PERFINFO_PAGE_RANGE_IDENTITY	CertEnroll.pdb
Memory: PageRangeRelease	_PERFINFO_PAGE_RANGE_IDENTITY	CertEnroll.pdb
Memory: KeMemUsage	_PERFINFO_KERNELMEMORY_RANGE_USAGE	CertEnroll.pdb
Memory: MemInfo	_PERFINFO_MEMORY_INFORMATION	CertEnroll.pdb
Memory: MemInfoExWS	_PERFINFO_WORKINGSET_INFORMATION	CertEnroll.pdb
Memory: MMStat	_PERFINFO_PAGECOMBINE_AGGREGATE_STAT	CertEnroll.pdb
Section: Create	_PERFINFO_PFMAPPED_SECTION_INFORMATION	CertEnroll.pdb
Section: Delete	_PERFINFO_PFMAPPED_SECTION_INFORMATION	CertEnroll.pdb
Section: SectionObject Create	PERFINFO_PFMAPPED_SECTION_OBJECT_INFORMATION
Section: SectionObject Delete	PERFINFO_PFMAPPED_SECTION_OBJECT_INFORMATION
File: Map	MAPFILE_INFO
File: Unmap	MAPFILE_INFO
File: Start Rundown	MAPFILE_INFO
File: End Rundown	MAPFILE_INFO

The only type for which I could not find a definition is the type that I called MAPFILE_INFO (hence the italic rendition). The class MapFileTraceData in the PerfView KernelTraceEventParser.cs file gives a rough overview of the data contained in the event, but an analysis of the NT kernel routine MiFillMapFileInfo is needed to understand the details. Comparing any purported draft definition of the type with the interface IMappedFileLifetime and the raw event data can be helpful/informative.
