Secondary Dump Data seems to be a seldom discussed topic; a useful introduction is the “Inside Show” episode “Bugcheck Secondary Dump Data” by Andrew Richards and Chad Beeder (the main presenters of the “Defrag Tools” shows).
The Secondary Dump
Data mechanism can be used by all developers of kernel-mode software (including
Microsoft). The information included in the dump is whatever makes sense to the
developer, so it can be difficult for someone else to interpret. Microsoft has
added some debugger extensions to dump some of their secondary dump data. The
table below shows some of the GUIDs used to identify items of secondary dump
data and the debugger extension commands that displays them (where known).
GUID Symbol |
Debugger
Extension Command |
mssmbios!SMBiosGuidAcpi |
!sysinfo gbl |
mssmbios!SMBiosGuidBios |
!sysinfo cpuinfo cpuspeed cpumicrocode |
mssmbios!SMBiosGuidRegisters |
!sysinfo registers |
mssmbios!SMBiosGuidSMBios |
!sysinfo machineid |
nt!EtwSecondaryDumpDataGuid |
!wmitrace.strdump
!wmitrace.logsave |
nt!GUID_TRIAGEDUMP_DATA |
|
nt!PopBlackBoxAcpiGuid |
!ext.blackboxacpi |
nt!PopBlackBoxBsdGuid |
!ext.blackboxbsd |
nt!PopBlackBoxCodeIntegrityGuid |
|
nt!PopBlackBoxCrashedProcessGuid |
!ext.blackboxcrashedprocess |
nt!PopBlackBoxDxgDisplayGuid |
!ext.blackboxdxg |
nt!PopBlackBoxExplorerCoreStartupGuid |
|
nt!PopBlackBoxExplorerLogonTasksGuid |
|
nt!PopBlackBoxNtfsGuid |
!ext.blackboxntfs |
nt!PopBlackBoxPdcLockGuid |
!ext.blackboxpdclock |
nt!PopBlackBoxPnpDelayedRemoveWorkerGuid |
!ext.blackboxpnpdelayedremoveworker |
nt!PopBlackBoxPnpDeviceCompletionQueueGuid |
!ext.blackboxpnpdevicecompletionqueue |
nt!PopBlackBoxPnpEventWorkerGuid |
!ext.blackboxpnpeventworker |
nt!PopBlackBoxPnpGuid |
!ext.blackboxpnp |
nt!PopBlackBoxPoIrpGuid |
|
nt!PopBlackBoxPoPepWorkOrderGuid |
!ext.blackboxpopepworkorder |
nt!PopBlackBoxPoPowerWatchdogGuid |
!ext.blackboxpowerwatchdog |
nt!PopBlackBoxScmGuid |
|
nt! |
!ext.blackboxstoremanager |
nt!PopBlackBoxUsoCommitGuid |
|
nt!PopBlackBoxWheaGuid |
|
nt!PopBlackBoxWinLogonGuid |
|
nt!PopBlackBoxWinLogonNotifyGuid |
|
pci!PCI_CFG_RECORD_GUID |
!pci |
Wdf01000!WdfDumpGuid |
!wdfkd.wdfcrashdump |
The debugger (extension) commands “!rcdrkd.rcdrenumtag”
and “.enumtag” show the tags and data included in the secondary dump data “stream”
of a dump file.
Of particular interest to me is the EtwSecondaryDumpData;
ETW trace sessions started with the flag EVENT_TRACE_ADDTO_TRIAGE_DUMP set in
the logging mode will have their “in-memory” trace buffers added to the
secondary dump data. On my Windows 11 system, the Microsoft sessions named “EventLog-System”,
“Microsoft-Windows-Rdp-Graphics-RdpIdd-Trace” and “WiFiSession” have this flag
set, as do some third party sessions (e.g. “IntelRST”).
In principle, trace files for the sessions
can be extracted using debugger extension commands; in practice, a few bugs
currently make this difficult.
EtwSecondaryDumpData starts with some
values that can be used to reconstruct the TRACE_LOGFILE_HEADER of every trace
in the system (values for ProviderVersion, TimerResolution, CpuSpeedInMHz, BootTime and PerfFreq). Following this, for
each session that has the EVENT_TRACE_ADDTO_TRIAGE_DUMP flag set, a summary of
the WMI_LOGGER_CONTEXT (dt nt!_WMI_LOGGER_CONTEXT) for the session is added to
the data, including the “Logger Name”.
In the WMI_LOGGER_CONTEXT structure,
LoggerName is a UNICODE_STRING structure. The documentation for the “Length”
member of UNICODE_STRING says “Specifies the length, in bytes, of the string pointed to by the Buffer member,
not including the terminating NULL character, if any” (my emphasis). Older
versions of Windows seem to interpret “Length” as number of Unicode characters (UTF-16)
and copy twice the length of the Logger Name to the secondary dump data. The
debugger extension “!wmitrace.strdump” follows this example and expects the
same (double) length; this works on old secondary dump data but fails silently
on secondary dump data of newer Windows versions which copy the correct name
length (displays nothing beyond the first session name).
Note that “!wmitrace.strdump” detects
whether a dump is a triage dump or a larger dump; if the dump is a triage dump (DUMP_TYPE_TRIAGE)
then “!wmitrace.strdump” uses the secondary dump data but, if the dump is
larger, then the full kernel WMI_LOGGER_CONTEXT structures are used.
The second bug (at the time of writing) is
in the wmitrace routine that reads trace buffers (dt nt!_WMI_BUFFER_HEADER);
the routine seems to expect that full buffers will be read (length = WMI_BUFFER_HEADER.BufferSize)
although its parameters allow for sections of a buffer to be read. Since the “!wmitrace.logsave”
debugger extension tries to read buffer sections, it normally fails.
It was possible to verify that the debugger
extensions could work (by debugging the debugger and fixing-up the issues) but,
in practice, it is currently more comfortable (and insightful) to re-implement
the functionality.
This feature set (marking selected trace
sessions with EVENT_TRACE_ADDTO_TRIAGE_DUMP and adding such session trace
buffers to triage dumps) is very useful in remote support/diagnostic scenarios
where exchanges of full dumps are undesirable (because of confidentiality and
size issues) but some contextual/historical information is essential to
understand the state of the limited amount of kernel memory in a triage dump.
Note that triage dumps that are not created
from a bug check (e.g. triage dumps created by commands such as “kdbgctrl -td
<pid> <file>”) do not include secondary dump data.
Event Tracing for Windows (ETW) Data in Bug Check Primary Minidump
Data
There is a very limited amount of ETW data
in the primary data of a small memory dump (minidump). The debugger extension
commands “!wmitrace.dumpmini” and “!wmitrace.dumpminievent” are intended to
dump this data; similar to some of the commands previously mentioned, these
commands often choose/use one of the 3 “offset” fields in WMI_BUFFER_HEADER
which indicates that the buffer is empty. As a workaround, the commands “!wmitrace.buffer
poi(nt!KiCurrentEtwBufferBase)” and “!wmitrace.buffer
poi(nt!KiCurrentErrLogBufferBase)” give a flavor of the information available.
“!wmitrace.dumpmini” displays the current
buffer of the “Circular Kernel Context Logger” session for the current processor
– the “used” portion of this buffer is explicitly added to the kernel memory
dumped by a small memory dump. “!wmitrace.dumpminievent” displays the current
buffer of the “EventLog-System” session if a “current” buffer was identified at
the time of the dump (normally there is/was no “current” buffer for this
session) and added to the kernel memory dumped.
UPDATE
While thinking about troubleshooting
strategies for a problem, I noticed two more secondary dump data items to be
interesting for me: nt!GUID_TRIAGEDUMP_DATA and ipt!GUID_IPT.
Triage Dump Data
The “triage dump data” secondary dump data
(with GUID nt!GUID_TRIAGEDUMP_DATA) is a relatively simple list of kernel
address ranges (base and size) grouped by “component” (a free-form Latin 1 string).
This data can be attached to most dump types. In a “large” dump, it indicates
which ranges should be added to a “carved” minidump (a minidump created from a
larger dump).
The topic that I was thinking about was how
to access the buffers that were being used by the Intel Processor Trace (IPT)
facility at the time of the crash/dump; the IPT driver adds ranges describing
these buffers to the triage dump data (using the component name “IPT”) making
them easy to identify. This behaviour also makes it possible to share just
carved minidumps when remotely diagnosing a problem and wanting access to the
last recorded IPT entries.
Intel Processor Trace (IPT) Data in Bug Check Secondary
Dump Data
If IPT is active at the time of a dump
(including a “live dump”) then IPT secondary dump data is added to the dump
(using the GUID ipt!GUID_IPT). The header of the secondary dump data is an
IPT_TRACE structure and two TraceType values appear to be used: one to describe
“thread” traces and one to describe “core” traces.
The thread trace secondary data is mostly just
a simple summary of the number of threads being traced.
The core trace secondary data is more
useful: it contains the kernel address ranges of the trace buffers plus the
values of the relevant Intel Model-Specific Registers (MSRs) such as IA32_RTIT_OUTPUT_MASK_PTRS
and IA32_RTIT_STATUS.
The usefulness of the trace data is still
an open question for me. I have only experimented with the traces in
conjunction with a “live” dump and the process of creating the dump is traced,
meaning that the trace of the moments before the “dump” will almost certainly
be overwritten. It is conceivable that, in the event of crash, the IPT mechanism
will be stopped quickly (thus preserving the moments leading up to the crash).
No comments:
Post a Comment