As mentioned in a previous posting, a suitably configured WPR/ETW trace (with Loader and ProcessorTrace events) provides most of the information needed to analyse the IPT event data, but practical considerations make a contemporaneous kernel memory dump (almost) a prerequisite for a successful analysis.
The reasons for preferring
a memory dump include load-time modifications to the code (especially “Import
Optimization”) and the difficulty of managing disassembly of many separate modules
(especially for inter-module calls).
I like to help other people troubleshoot their problems, and sharing ETW traces is often essential. There are sometimes concerns about the security implications of sharing trace data, but sufficiently sophisticated “problem owners” do sometimes share it; in most cases, however, sharing a (live) kernel memory dump would probably be considered too difficult or risky.
The approach that I have taken is to create a “dump” file from the information in an ETW data file. The first consideration was how to create a dump containing just selected kernel-mode modules, and the first thing that I tested was whether it was possible to load a kernel-mode module into a user-mode process. By “load”, I essentially mean “use LoadLibraryEx”. I knew that it was possible to map any PE module as an “image” using standard file mapping APIs, but I wanted the ntdll “Ldr” structures to be created so that the “dump” routines would record the kernel module as a loaded module. The documentation for LoadLibraryEx says:
If the specified
module is an executable module, static imports are not loaded; instead, the module
is loaded as if DONT_RESOLVE_DLL_REFERENCES was specified. See
the dwFlags parameter for more information.
In practice, “is an executable” means “is not a DLL” (IMAGE_FILE_HEADER.Characteristics does not include IMAGE_FILE_DLL). Many kernel-mode modules are not “DLLs”, but some are (fwpkclnt.sys, for example). Therefore, I specify the DONT_RESOLVE_DLL_REFERENCES flag explicitly to cover such cases (even though the documentation says “Do not use this value; it is provided only for backward compatibility.”).
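The “is not a DLL” test can be illustrated with a minimal Python sketch. It reads IMAGE_FILE_HEADER.Characteristics from the raw bytes of a PE file and reports whether LoadLibraryEx would already treat the module as an executable, and so implicitly skip static imports. The helper names are hypothetical. The offsets follow the standard DOS/PE header layout: e_lfanew is at 0x3C, and Characteristics is 18 bytes into the file header.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # bit in IMAGE_FILE_HEADER.Characteristics


def file_characteristics(image: bytes) -> int:
    """Return IMAGE_FILE_HEADER.Characteristics from the raw bytes of a PE file."""
    if image[:2] != b"MZ":
        raise ValueError("missing DOS header signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER follows the 4-byte "PE\0\0" signature. Characteristics
    # is its last field, at offset 18 (after Machine, NumberOfSections,
    # TimeDateStamp, PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader).
    (characteristics,) = struct.unpack_from("<H", image, e_lfanew + 4 + 18)
    return characteristics


def loader_skips_imports_implicitly(image: bytes) -> bool:
    """True if LoadLibraryEx would treat the module as an "executable" (no
    IMAGE_FILE_DLL bit) and behave as if DONT_RESOLVE_DLL_REFERENCES was passed.
    False for DLL-flagged kernel modules such as fwpkclnt.sys."""
    return not (file_characteristics(image) & IMAGE_FILE_DLL)
```

Since the flag is passed explicitly anyway, a check like this mainly shows which modules would otherwise have their static imports resolved.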
A quick test of MiniDumpWriteDump verified
that a usable dump file could be created with the kernel modules in the loaded
modules list. The next step is to apply “import optimization” to the loaded modules.
The article “Mitigating Spectre variant 2 with Retpoline on Windows” describes the process. The only additional “discovery” that I made is that some of the IMAGE_IMPORT_CONTROL_TRANSFER_DYNAMIC_RELOCATION entries contain the value 0x7FFFF (the maximum unsigned 19-bit value) in the IATIndex field, indicating that no optimization should be performed.
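As declared in winnt.h, each entry is a 32-bit DWORD with the bit fields PageRelativeOffset:12, IndirectCall:1 and IATIndex:19. As I understand the layout, the entries are grouped into page blocks that begin with an 8-byte (VirtualAddress, SizeOfBlock) header, like ordinary base relocations. The following is a minimal Python sketch that decodes such blocks and skips the 0x7FFFF entries. It assumes the caller has already located the block data for the IMAGE_DYNAMIC_RELOCATION_GUARD_IMPORT_CONTROL_TRANSFER symbol in the dynamic relocation table; that lookup is not shown. The function names are hypothetical.

```python
import struct
from typing import Iterator, NamedTuple

NO_OPTIMIZATION = 0x7FFFF  # IATIndex value: leave this call site unoptimized


class ImportCallSite(NamedTuple):
    rva: int             # block page RVA + PageRelativeOffset
    indirect_call: bool  # IndirectCall bit
    iat_index: int       # index of the IAT slot the call goes through


def decode_entry(value: int) -> tuple:
    """Split one 32-bit entry into (PageRelativeOffset, IndirectCall, IATIndex)."""
    return value & 0xFFF, bool((value >> 12) & 1), (value >> 13) & 0x7FFFF


def import_call_sites(blocks: bytes) -> Iterator[ImportCallSite]:
    """Walk the page blocks and yield call sites that are candidates for import optimization."""
    pos = 0
    while pos + 8 <= len(blocks):
        page_rva, block_size = struct.unpack_from("<II", blocks, pos)
        if block_size < 8 or pos + block_size > len(blocks):
            break  # malformed or truncated block
        for i in range((block_size - 8) // 4):
            (value,) = struct.unpack_from("<I", blocks, pos + 8 + 4 * i)
            offset, indirect, iat_index = decode_entry(value)
            if iat_index == NO_OPTIMIZATION:
                continue  # the 0x7FFFF marker: do not touch this site
            yield ImportCallSite(page_rva + offset, indirect, iat_index)
        pos += block_size
```

The actual rewriting of each call site is described in the Retpoline article and is not reproduced here.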
For the intended purpose of IPT data analysis, it is not necessary to make any other modifications to the loaded image (e.g. applying relocations).
Having loaded the modules and applied import optimization, one is now in a position to create the dump file with MiniDumpWriteDump. I used a DumpType of “MiniDumpWithCodeSegs | MiniDumpWithoutAuxiliaryState”. Only the code sections are needed for IPT data analysis and, since I use C#/.NET, the MiniDumpWithoutAuxiliaryState flag prevents the CLR auxiliary provider from intervening in the dump process. It is not essential, but to keep the dump file small and simple I used the IncludeThreadCallback and IncludeModuleCallback callback types to include only a single thread (to keep debuggers happy) and only the kernel modules.
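The DumpType value and the callback filtering can be modelled in Python. This sketch covers only the decision logic and is not a binding to DbgHelp. In a real implementation, the logic would run inside the MINIDUMP_CALLBACK_ROUTINE supplied to MiniDumpWriteDump. The numeric constants are the MINIDUMP_TYPE and MINIDUMP_CALLBACK_TYPE values. `make_include_filter` and its parameters are hypothetical, and returning True for all other callback types is a simplification.

```python
from typing import Callable, Iterable, Optional

# MINIDUMP_TYPE flags used for the hand-crafted dump
MiniDumpWithCodeSegs = 0x00002000
MiniDumpWithoutAuxiliaryState = 0x00004000
DUMP_TYPE = MiniDumpWithCodeSegs | MiniDumpWithoutAuxiliaryState

# MINIDUMP_CALLBACK_TYPE values of interest
IncludeThreadCallback = 3
IncludeModuleCallback = 4


def make_include_filter(keep_thread_id: int,
                        kernel_module_bases: Iterable[int]) -> Callable[..., bool]:
    """Return a predicate mirroring the callback's return value:
    True to include the thread/module, False to exclude it."""
    kernel = frozenset(kernel_module_bases)

    def include(callback_type: int,
                thread_id: Optional[int] = None,
                base_of_image: Optional[int] = None) -> bool:
        if callback_type == IncludeThreadCallback:
            return thread_id == keep_thread_id  # a single thread keeps debuggers happy
        if callback_type == IncludeModuleCallback:
            return base_of_image in kernel      # exclude every non-kernel module
        return True  # other callback types: simplified "continue" behaviour

    return include
```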
The resulting dump file still records the kernel modules as being loaded at arbitrary user-mode addresses – the final step is to post-process the dump file to relocate them to the locations recorded in the source ETW data. The design of the MiniDumpReadDumpStream API makes this step easy: the routine returns pointers to the key data structures in the mapped view of the dump file. If the file is mapped in read/write mode, one can simply update the values in those structures. We just need to update the MINIDUMP_MODULE.BaseOfImage values for the kernel modules and the MINIDUMP_MEMORY_DESCRIPTOR.StartOfMemoryRange values for the memory ranges corresponding to sections of the kernel modules.
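The post-processing step can be sketched in Python under a few stated assumptions. It does not call MiniDumpReadDumpStream, which is a Windows DbgHelp API. Instead, it walks the stream directory itself using the documented MINIDUMP_HEADER and MINIDUMP_DIRECTORY layouts, and it assumes a ModuleListStream (type 4) and a MemoryListStream (type 5) are present. It writes through any writable buffer, such as a read/write `mmap`, so the effect matches updating the structures in a mapped view. `relocate_dump` and the `new_bases` mapping (user-mode base → kernel base from the ETW Loader events) are hypothetical names.

```python
import struct
from typing import Dict, List, Optional, Tuple

MINIDUMP_SIGNATURE = 0x504D444D  # "MDMP"
MODULE_LIST_STREAM = 4
MEMORY_LIST_STREAM = 5
MODULE_SIZE = 108     # sizeof(MINIDUMP_MODULE)
DESCRIPTOR_SIZE = 16  # sizeof(MINIDUMP_MEMORY_DESCRIPTOR)


def find_stream(dump, stream_type: int) -> Optional[int]:
    """Return the RVA of the first stream of the given type, or None."""
    signature, _version, count, dir_rva = struct.unpack_from("<IIII", dump, 0)
    if signature != MINIDUMP_SIGNATURE:
        raise ValueError("not a minidump")
    for i in range(count):
        kind, _size, rva = struct.unpack_from("<III", dump, dir_rva + 12 * i)
        if kind == stream_type:
            return rva
    return None


def relocate_dump(dump, new_bases: Dict[int, int]) -> None:
    """In place: move modules, and the memory ranges inside them, from their
    user-mode load addresses to the kernel addresses given in new_bases."""
    moved: List[Tuple[int, int, int]] = []  # (old base, size, new base)

    rva = find_stream(dump, MODULE_LIST_STREAM)
    if rva is None:
        raise ValueError("no module list stream")
    (count,) = struct.unpack_from("<I", dump, rva)
    for i in range(count):
        entry = rva + 4 + i * MODULE_SIZE
        base, size = struct.unpack_from("<QI", dump, entry)  # BaseOfImage, SizeOfImage
        if base in new_bases:
            struct.pack_into("<Q", dump, entry, new_bases[base])
            moved.append((base, size, new_bases[base]))

    rva = find_stream(dump, MEMORY_LIST_STREAM)
    if rva is None:
        return
    (count,) = struct.unpack_from("<I", dump, rva)
    for i in range(count):
        entry = rva + 4 + i * DESCRIPTOR_SIZE
        (start,) = struct.unpack_from("<Q", dump, entry)  # StartOfMemoryRange
        for old, size, new in moved:
            if old <= start < old + size:
                struct.pack_into("<Q", dump, entry, new + (start - old))
                break
```

Usage, with the file mapped read/write: `with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as view: relocate_dump(view, new_bases)`.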
Preliminary testing of the hand-crafted dump files has so far revealed only one problem – some system modules still import from HAL.dll, although (on my current version of Windows 11) this DLL contains no code, only export forwarders to routines in ntoskrnl.exe. My simple “import optimization” code does not handle forwarders, so as a workaround I “special-case” imports from HAL.dll, treating them as direct imports from ntoskrnl.exe.
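The workaround amounts to normalising the imported module name before looking up the export. Here is a minimal sketch, assuming a hypothetical `exports` table keyed by lower-case module name that maps export names to kernel addresses. The routine name `HalExampleRoutine` in the usage is also hypothetical. Like the simple code described above, the sketch does not follow forwarder strings in general.

```python
from typing import Dict, Tuple

# Imports from HAL.dll are treated as direct imports from ntoskrnl.exe,
# since (on the Windows 11 build described above) HAL.dll only forwards.
IMPORT_ALIASES = {"hal.dll": "ntoskrnl.exe"}


def resolve_import(module: str, name: str,
                   exports: Dict[str, Dict[str, int]]) -> Tuple[str, int]:
    """Return (exporting module, address) for an imported routine."""
    key = module.lower()                # module names compare case-insensitively
    key = IMPORT_ALIASES.get(key, key)  # the HAL.dll special case
    try:
        return key, exports[key][name]
    except KeyError:
        raise LookupError(f"{module}!{name} not found (forwarder or missing module?)") from None
```

For example, `resolve_import("HAL.dll", "HalExampleRoutine", exports)` is looked up in the ntoskrnl.exe table.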
A suitably configured ETW trace contains
enough information to obtain Windows modules (if they are not already locally
available), but if the IPT data records paths through third-party modules then
these modules will need to be explicitly requested and copied from the “problem
owner”.
The “art” of interpreting ETW IPT data
As implied in my original posting on this topic, interpreting ETW IPT data is rather an “art” and, as a consequence, a manual process.
The first step that requires intuition or experience is choosing the events to trace. Having potentially used the full range of event providers (manifest, WPP, MOF, etc.) to identify “interesting” behaviour, one then has to identify a “SystemProvider” event that occurs shortly afterwards and will trigger logging of the IPT buffer.
Interpreting IPT data is time-consuming, so it makes little sense to try to interpret every IPT event in an ETW trace; it is better to select the event(s) with the best chance of being useful. Currently, having identified an IPT event of potential interest in an ETW viewing tool, I copy the IPT data as a hexadecimal string and paste it into a file for analysis.
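A small helper for this copy-and-paste workflow could turn the pasted text back into bytes and then locate the PSB (packet stream boundary) pattern, the 16-byte repetition of 02 82 defined in the Intel SDM, where a decoder can safely start. Tolerating an optional 0x prefix and arbitrary whitespace is an assumption about how the viewer formats the string. Searching for PSB, rather than assuming the trace starts at offset 0, is a cautious default. The function names are hypothetical.

```python
from typing import List

PSB = bytes([0x02, 0x82] * 8)  # Intel PT packet stream boundary (16 bytes)


def load_ipt_hex(text: str) -> bytes:
    """Convert pasted hex text (optional 0x prefix, any whitespace) to bytes."""
    compact = "".join(text.split())
    if compact[:2].lower() == "0x":
        compact = compact[2:]
    return bytes.fromhex(compact)


def psb_offsets(data: bytes) -> List[int]:
    """Offsets of every PSB packet; decoding can begin at any of them."""
    offsets, pos = [], data.find(PSB)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(PSB, pos + len(PSB))
    return offsets
```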
If a complete dump is not available, then one has to decide which modules to include in a “hand-made” dump file, and this might involve retrieving files from the Microsoft Symbol Server. The “first pass” technique that I use to identify the modules to include is to perform a “quick” analysis of the IPT data, looking only for addresses in TIP and FUP packets. The resulting list of modules containing those addresses might not be complete, but a subsequent attempt to fully analyse the IPT data will indicate the next module that needs to be added to the list.
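A rough Python sketch of such a “quick” first pass follows. Starting from a PSB, it walks the packet stream and reconstructs the IP carried by each TIP, TIP.PGE, TIP.PGD and FUP packet using the “last IP” compression rules from the Intel SDM. It then maps the resulting addresses to modules. This is not a full decoder: it recognises only the packet types listed in the tables below and stops at the first byte it does not recognise. `quick_scan`, `modules_for` and the (base, size, name) module tuples are hypothetical.

```python
import bisect
from typing import Iterable, List, Set, Tuple

TIP_FAMILY = {0x0D: "TIP", 0x11: "TIP.PGE", 0x01: "TIP.PGD", 0x1D: "FUP"}
IP_PAYLOAD = {0: 0, 1: 2, 2: 4, 3: 6, 4: 6, 6: 8}  # IPBytes -> payload length
FIXED_SINGLE = {0x99: 2, 0x19: 8, 0x59: 2}         # MODE, TSC, MTC
FIXED_EXTENDED = {0x82: 16, 0x23: 2, 0x03: 4, 0xF3: 2, 0xA3: 8,  # PSB, PSBEND, CBR, OVF, long TNT
                  0x43: 8, 0x73: 7, 0xC8: 7, 0x83: 2}             # PIP, TMA, VMCS, TraceStop
MASK64 = (1 << 64) - 1


def update_ip(last: int, ip_bytes: int, payload: int) -> int:
    """Apply the IPBytes compression rules to the last IP."""
    if ip_bytes == 1:
        return (last & ~0xFFFF & MASK64) | payload
    if ip_bytes == 2:
        return (last & ~0xFFFFFFFF & MASK64) | payload
    if ip_bytes == 3:  # 48 bits, sign-extended from bit 47
        return payload | (0xFFFF << 48) if payload & (1 << 47) else payload
    if ip_bytes == 4:
        return (last & ~0xFFFFFFFFFFFF & MASK64) | payload
    return payload     # ip_bytes == 6: full 64-bit IP


def quick_scan(data: bytes, start: int = 0) -> List[Tuple[str, int]]:
    """Return (packet name, IP) for every TIP-family/FUP packet that carries an IP."""
    found, last_ip, pos = [], 0, start
    while pos < len(data):
        b = data[pos]
        if b == 0x02:  # extended opcode prefix
            if pos + 1 >= len(data) or data[pos + 1] not in FIXED_EXTENDED:
                break  # unsupported or truncated extended packet
            if data[pos + 1] == 0x82:
                last_ip = 0  # PSB resets the last-IP state
            pos += FIXED_EXTENDED[data[pos + 1]]
        elif (b & 1) == 0:  # PAD (0x00) or short TNT
            pos += 1
        elif (b & 0x1F) in TIP_FAMILY:
            ip_bytes = b >> 5
            if ip_bytes not in IP_PAYLOAD:
                break  # reserved IPBytes encoding
            size = IP_PAYLOAD[ip_bytes]
            if pos + 1 + size > len(data):
                break  # truncated packet
            if ip_bytes:
                payload = int.from_bytes(data[pos + 1:pos + 1 + size], "little")
                last_ip = update_ip(last_ip, ip_bytes, payload)
                found.append((TIP_FAMILY[b & 0x1F], last_ip))
            pos += 1 + size
        elif b in FIXED_SINGLE:
            pos += FIXED_SINGLE[b]
        elif (b & 3) == 3:  # CYC: continue while the Exp bit is set
            exp = b & 4
            pos += 1
            while exp and pos < len(data):
                exp = data[pos] & 1
                pos += 1
        else:
            break  # unsupported packet: stop rather than guess
    return found


def modules_for(addresses: Iterable[int], modules: List[Tuple[int, int, str]]) -> Set[str]:
    """Names of the modules (base, size, name) that contain the given addresses."""
    modules = sorted(modules)
    bases = [m[0] for m in modules]
    names = set()
    for ip in addresses:
        i = bisect.bisect_right(bases, ip) - 1
        if i >= 0 and ip < modules[i][0] + modules[i][1]:
            names.add(modules[i][2])
    return names
```

Stopping at the first unknown byte keeps the pass conservative. Any module it misses would then surface during the full analysis, as described above.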
On the basis of ETW data alone, one cannot be certain, before starting to analyse the IPT data, whether “import optimization” is in use on a remote system. If it turns out not to be in use, one would need to create a new dump file without import optimization applied.
Great one again, Gary.
I've looked at this particular provider and it's in "ipt.sys".
It seems it will only do IPT CPU core dumps, not the per-process tracing that would probably be more useful.
I mean the way Alex Ionescu's https://github.com/ionescu007/winipt reads IPT data per process, by polling.
There is actually a way to start an IPT recording on a process using native Windows 10 tools, but it's not published.
It would be nice if there were a per-process streaming method.
AFAIK, with that IPT ETW stream, one would have to parse thread switches and whatever else out of the raw CPU core dumps.
Is this how you understand it?
Hello Neighbor,
There is a method in ipt.sys to stream IPT data to a file (it was added after Alex wrote that article).
Regarding the ETW IPT support, management of the IPT registers has been incorporated into the Windows context switch code; an ETW IPT event contains entries for only one thread.
Gary