Tuesday 9 May 2023

Event Tracing for Windows (ETW) Data in Bug Check Secondary Dump Data

Secondary Dump Data seems to be a seldom discussed topic; a useful introduction is the “Inside Show” episode “Bugcheck Secondary Dump Data” by Andrew Richards and Chad Beeder (the main presenters of the “Defrag Tools” shows).

The Secondary Dump Data mechanism can be used by all developers of kernel-mode software (including Microsoft). The information included in the dump is whatever makes sense to the developer, so it can be difficult for someone else to interpret. Microsoft has added some debugger extensions to dump some of their secondary dump data. The table below shows some of the GUIDs used to identify items of secondary dump data and the debugger extension commands that displays them (where known).

GUID Symbol

Debugger Extension Command

mssmbios!SMBiosGuidAcpi

!sysinfo gbl

mssmbios!SMBiosGuidBios

!sysinfo cpuinfo cpuspeed cpumicrocode

mssmbios!SMBiosGuidRegisters

!sysinfo registers

mssmbios!SMBiosGuidSMBios

!sysinfo machineid

nt!EtwSecondaryDumpDataGuid

!wmitrace.strdump !wmitrace.logsave

nt!GUID_TRIAGEDUMP_DATA

 

nt!PopBlackBoxAcpiGuid

!ext.blackboxacpi

nt!PopBlackBoxBsdGuid

!ext.blackboxbsd

nt!PopBlackBoxCodeIntegrityGuid

 

nt!PopBlackBoxCrashedProcessGuid

!ext.blackboxcrashedprocess

nt!PopBlackBoxDxgDisplayGuid

!ext.blackboxdxg

nt!PopBlackBoxExplorerCoreStartupGuid

 

nt!PopBlackBoxExplorerLogonTasksGuid

 

nt!PopBlackBoxNtfsGuid

!ext.blackboxntfs

nt!PopBlackBoxPdcLockGuid

!ext.blackboxpdclock

nt!PopBlackBoxPnpDelayedRemoveWorkerGuid

!ext.blackboxpnpdelayedremoveworker

nt!PopBlackBoxPnpDeviceCompletionQueueGuid

!ext.blackboxpnpdevicecompletionqueue

nt!PopBlackBoxPnpEventWorkerGuid

!ext.blackboxpnpeventworker

nt!PopBlackBoxPnpGuid

!ext.blackboxpnp

nt!PopBlackBoxPoIrpGuid

 

nt!PopBlackBoxPoPepWorkOrderGuid

!ext.blackboxpopepworkorder

nt!PopBlackBoxPoPowerWatchdogGuid

!ext.blackboxpowerwatchdog

nt!PopBlackBoxScmGuid

 

nt!

!ext.blackboxstoremanager

nt!PopBlackBoxUsoCommitGuid

 

nt!PopBlackBoxWheaGuid

 

nt!PopBlackBoxWinLogonGuid

 

nt!PopBlackBoxWinLogonNotifyGuid

 

pci!PCI_CFG_RECORD_GUID

!pci

Wdf01000!WdfDumpGuid

!wdfkd.wdfcrashdump

 

The debugger (extension) commands “!rcdrkd.rcdrenumtag” and “.enumtag” show the tags and data included in the secondary dump data “stream” of a dump file.

Of particular interest to me is the EtwSecondaryDumpData; ETW trace sessions started with the flag EVENT_TRACE_ADDTO_TRIAGE_DUMP set in the logging mode will have their “in-memory” trace buffers added to the secondary dump data. On my Windows 11 system, the Microsoft sessions named “EventLog-System”, “Microsoft-Windows-Rdp-Graphics-RdpIdd-Trace” and “WiFiSession” have this flag set, as do some third party sessions (e.g. “IntelRST”).

In principle, trace files for the sessions can be extracted using debugger extension commands; in practice, a few bugs currently make this difficult.

EtwSecondaryDumpData starts with some values that can be used to reconstruct the TRACE_LOGFILE_HEADER of every trace in the system (values for ProviderVersion, TimerResolution, CpuSpeedInMHz,  BootTime and PerfFreq). Following this, for each session that has the EVENT_TRACE_ADDTO_TRIAGE_DUMP flag set, a summary of the WMI_LOGGER_CONTEXT (dt nt!_WMI_LOGGER_CONTEXT) for the session is added to the data, including the “Logger Name”.

In the WMI_LOGGER_CONTEXT structure, LoggerName is a UNICODE_STRING structure. The documentation for the “Length” member of UNICODE_STRING says “Specifies the length, in bytes, of the string pointed to by the Buffer member, not including the terminating NULL character, if any” (my emphasis). Older versions of Windows seem to interpret “Length” as number of Unicode characters (UTF-16) and copy twice the length of the Logger Name to the secondary dump data. The debugger extension “!wmitrace.strdump” follows this example and expects the same (double) length; this works on old secondary dump data but fails silently on secondary dump data of newer Windows versions which copy the correct name length (displays nothing beyond the first session name).

Note that “!wmitrace.strdump” detects whether a dump is a triage dump or a larger dump; if the dump is a triage dump (DUMP_TYPE_TRIAGE) then “!wmitrace.strdump” uses the secondary dump data but, if the dump is larger, then the full kernel WMI_LOGGER_CONTEXT structures are used.

The second bug (at the time of writing) is in the wmitrace routine that reads trace buffers (dt nt!_WMI_BUFFER_HEADER); the routine seems to expect that full buffers will be read (length = WMI_BUFFER_HEADER.BufferSize) although its parameters allow for sections of a buffer to be read. Since the “!wmitrace.logsave” debugger extension tries to read buffer sections, it normally fails.

It was possible to verify that the debugger extensions could work (by debugging the debugger and fixing-up the issues) but, in practice, it is currently more comfortable (and insightful) to re-implement the functionality.

This feature set (marking selected trace sessions with EVENT_TRACE_ADDTO_TRIAGE_DUMP and adding such session trace buffers to triage dumps) is very useful in remote support/diagnostic scenarios where exchanges of full dumps are undesirable (because of confidentiality and size issues) but some contextual/historical information is essential to understand the state of the limited amount of kernel memory in a triage dump.

Note that triage dumps that are not created from a bug check (e.g. triage dumps created by commands such as “kdbgctrl -td <pid> <file>”) do not include secondary dump data.

Event Tracing for Windows (ETW) Data in Bug Check Primary Minidump Data

There is a very limited amount of ETW data in the primary data of a small memory dump (minidump). The debugger extension commands “!wmitrace.dumpmini” and “!wmitrace.dumpminievent” are intended to dump this data; similar to some of the commands previously mentioned, these commands often choose/use one of the 3 “offset” fields in WMI_BUFFER_HEADER which indicates that the buffer is empty. As a workaround, the commands “!wmitrace.buffer poi(nt!KiCurrentEtwBufferBase)” and “!wmitrace.buffer poi(nt!KiCurrentErrLogBufferBase)” give a flavor of the information available.

“!wmitrace.dumpmini” displays the current buffer of the “Circular Kernel Context Logger” session for the current processor – the “used” portion of this buffer is explicitly added to the kernel memory dumped by a small memory dump. “!wmitrace.dumpminievent” displays the current buffer of the “EventLog-System” session if a “current” buffer was identified at the time of the dump (normally there is/was no “current” buffer for this session) and added to the kernel memory dumped.

UPDATE

While thinking about troubleshooting strategies for a problem, I noticed two more secondary dump data items to be interesting for me: nt!GUID_TRIAGEDUMP_DATA and ipt!GUID_IPT.

Triage Dump Data

The “triage dump data” secondary dump data (with GUID nt!GUID_TRIAGEDUMP_DATA) is a relatively simple list of kernel address ranges (base and size) grouped by “component” (a free-form Latin 1 string). This data can be attached to most dump types. In a “large” dump, it indicates which ranges should be added to a “carved” minidump (a minidump created from a larger dump).

The topic that I was thinking about was how to access the buffers that were being used by the Intel Processor Trace (IPT) facility at the time of the crash/dump; the IPT driver adds ranges describing these buffers to the triage dump data (using the component name “IPT”) making them easy to identify. This behaviour also makes it possible to share just carved minidumps when remotely diagnosing a problem and wanting access to the last recorded IPT entries.

Intel Processor Trace (IPT) Data in Bug Check Secondary Dump Data

If IPT is active at the time of a dump (including a “live dump”) then IPT secondary dump data is added to the dump (using the GUID ipt!GUID_IPT). The header of the secondary dump data is an IPT_TRACE structure and two TraceType values appear to be used: one to describe “thread” traces and one to describe “core” traces.

The thread trace secondary data is mostly just a simple summary of the number of threads being traced.

The core trace secondary data is more useful: it contains the kernel address ranges of the trace buffers plus the values of the relevant Intel Model-Specific Registers (MSRs) such as IA32_RTIT_OUTPUT_MASK_PTRS and IA32_RTIT_STATUS.

The usefulness of the trace data is still an open question for me. I have only experimented with the traces in conjunction with a “live” dump and the process of creating the dump is traced, meaning that the trace of the moments before the “dump” will almost certainly be overwritten. It is conceivable that, in the event of crash, the IPT mechanism will be stopped quickly (thus preserving the moments leading up to the crash).