Saturday, 13 August 2022

Analyzing ETW/WPR Intel Processor Trace data from a remote system


As mentioned in a previous posting, a suitably configured WPR/ETW trace (with Loader and ProcessorTrace events) provides most of the information needed to analyse the IPT event data, but practical considerations make a contemporaneous kernel memory dump (almost) a prerequisite for a successful analysis.

The reasons for preferring a memory dump include load-time modifications to the code (especially “Import Optimization”) and the difficulty of managing disassembly of many separate modules (especially for inter-module calls).

I like to help other people troubleshoot their problems, and sharing ETW traces is often essential to doing so. Sharing trace data sometimes raises security concerns, but sufficiently sophisticated “problem owners” do sometimes share; in most cases, however, sharing a (live) kernel memory dump would probably be considered too difficult or risky.

The approach that I have taken is to create a “dump” file from the information in an ETW data file. The first consideration was how to create a dump containing just selected kernel-mode modules, and the first thing that I tested was whether a kernel-mode module could be loaded in a user-mode process. By “load”, I essentially mean “use LoadLibraryEx”: I knew that it was possible to map any PE module as an “image” using the standard file mapping APIs, but I wanted the ntdll “Ldr” structures to be created so that the “dump” routines would record the kernel module as a loaded module. The documentation for LoadLibraryEx says:

If the specified module is an executable module, static imports are not loaded; instead, the module is loaded as if DONT_RESOLVE_DLL_REFERENCES was specified. See the dwFlags parameter for more information.

In practice, “is an executable module” means “is not a DLL” (IMAGE_FILE_HEADER.Characteristics does not include IMAGE_FILE_DLL). Many kernel-mode modules are not “DLLs”, but some are (fwpkclnt.sys, for example), so I specify DONT_RESOLVE_DLL_REFERENCES explicitly to cover such cases (even though the documentation says “Do not use this value; it is provided only for backward compatibility.”).
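The rule just described can be modelled in a few lines. The following Python sketch (Python is used only for brevity; it is not the C# code used for the real tool) reads IMAGE_FILE_HEADER.Characteristics from a PE image and reports whether, per the LoadLibraryEx documentation quoted above, the loader would try to resolve static imports for a given dwFlags value. The helper names are hypothetical, and the function models the documented behaviour rather than calling the real loader.

```python
import struct

IMAGE_FILE_DLL = 0x2000                   # IMAGE_FILE_HEADER.Characteristics flag
DONT_RESOLVE_DLL_REFERENCES = 0x00000001  # LoadLibraryEx dwFlags value


def is_dll_image(pe: bytes) -> bool:
    """Return True if the PE image's file header has IMAGE_FILE_DLL set."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (file offset of the "PE\0\0" signature) is a DWORD at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER (20 bytes) follows the signature; Characteristics
    # is its last WORD, at offset 18 within the header.
    (characteristics,) = struct.unpack_from("<H", pe, e_lfanew + 4 + 18)
    return bool(characteristics & IMAGE_FILE_DLL)


def loader_resolves_imports(pe: bytes, flags: int) -> bool:
    """Hypothetical model of the documented rule (not a Windows API):
    static imports are resolved only for DLL images loaded without
    DONT_RESOLVE_DLL_REFERENCES."""
    return is_dll_image(pe) and not (flags & DONT_RESOLVE_DLL_REFERENCES)
```

For a driver whose header includes IMAGE_FILE_DLL (such as fwpkclnt.sys), only the explicit flag prevents import resolution, which is why passing it unconditionally is the simplest choice.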

A quick test of MiniDumpWriteDump verified that a usable dump file could be created with the kernel modules in the loaded modules list. The next step is to apply “import optimization” to the loaded modules. The article Mitigating Spectre variant 2 with Retpoline on Windows describes the process; the only additional “discovery” that I made is that some of the IMAGE_IMPORT_CONTROL_TRANSFER_DYNAMIC_RELOCATION entries contain a value of 0x7FFFF in the IATIndex field (the maximum unsigned 19-bit value), indicating that no optimization should be performed.
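The bitfield layout can be illustrated with a small decoder. This sketch assumes the winnt.h declaration of IMAGE_IMPORT_CONTROL_TRANSFER_DYNAMIC_RELOCATION (PageRelativeOffset: 12 bits, IndirectCall: 1 bit, IATIndex: 19 bits) and MSVC's usual allocation of bitfields starting from the least significant bit; the function names are hypothetical.

```python
IAT_INDEX_NO_OPTIMIZATION = 0x7FFFF  # maximum unsigned 19-bit value


def decode_import_control_transfer(entry: int) -> tuple:
    """Split a 32-bit entry into (PageRelativeOffset, IndirectCall, IATIndex)."""
    page_relative_offset = entry & 0xFFF      # bits 0-11
    indirect_call = bool((entry >> 12) & 1)   # bit 12
    iat_index = (entry >> 13) & 0x7FFFF       # bits 13-31
    return page_relative_offset, indirect_call, iat_index


def should_optimize(entry: int) -> bool:
    """Entries whose IATIndex is 0x7FFFF are left unoptimized."""
    return decode_import_control_transfer(entry)[2] != IAT_INDEX_NO_OPTIMIZATION
```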

It is not necessary, for the intended purpose of IPT data analysis, to perform any other modifications to the loaded image (e.g. apply relocations).

Having loaded the modules and applied import optimization, one is in a position to create the dump file with MiniDumpWriteDump. I used a DumpType of “MiniDumpWithCodeSegs | MiniDumpWithoutAuxiliaryState”: only the code sections are needed for IPT data analysis and, since I use C#/.NET, the MiniDumpWithoutAuxiliaryState flag prevents the CLR auxiliary provider from intervening in the dump process. It is not essential, but to keep the dump file small and simple, I used the IncludeThreadCallback and IncludeModuleCallback callback types to include just a single thread (to keep debuggers happy) and just the kernel modules.
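The dump options and the filtering decision can be summarised as follows. The numeric values are the MINIDUMP_TYPE and MINIDUMP_CALLBACK_TYPE constants from dbghelp.h; the filter factory is a hypothetical, platform-neutral model of what a MiniDumpCallback routine would decide for the two include callbacks (in a real callback, returning FALSE excludes the thread or module), not a binding to the real API.

```python
# MINIDUMP_TYPE flags (dbghelp.h)
MiniDumpWithCodeSegs = 0x00002000
MiniDumpWithoutAuxiliaryState = 0x00004000
DUMP_TYPE = MiniDumpWithCodeSegs | MiniDumpWithoutAuxiliaryState

# MINIDUMP_CALLBACK_TYPE values (dbghelp.h)
IncludeThreadCallback = 3
IncludeModuleCallback = 4


def make_include_filter(kept_thread_id, kernel_module_bases):
    """Return a hypothetical include(callback_type, ...) -> bool decision function."""
    bases = frozenset(kernel_module_bases)

    def include(callback_type, thread_id=None, base_of_image=None):
        if callback_type == IncludeThreadCallback:
            return thread_id == kept_thread_id   # keep exactly one thread
        if callback_type == IncludeModuleCallback:
            return base_of_image in bases        # keep only the kernel modules
        return True  # other callback types are not modelled in this sketch

    return include
```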

The resulting dump file still records the kernel modules as being loaded at random user-mode addresses; the final step is to post-process the dump file to relocate them to the locations recorded in the source ETW data. The design of the MiniDumpReadDumpStream API makes this step easy: the routine returns pointers to the key data structures in the mapped view of the dump file, so if the file is mapped in read/write mode, one can simply update the values in place. We just need to update the MINIDUMP_MODULE.BaseOfImage values for the kernel modules and the MINIDUMP_MEMORY_DESCRIPTOR.StartOfMemoryRange values for the memory ranges corresponding to sections of the kernel modules.
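The patching step amounts to shifting each kernel module, and every memory range that lies inside it, by the difference between its user-mode base and the base recorded in the ETW data. In this Python sketch, simplified dataclasses stand in for the MINIDUMP_MODULE and MINIDUMP_MEMORY_DESCRIPTOR entries that MiniDumpReadDumpStream would expose (through the ModuleListStream and MemoryListStream streams) in the read/write mapped view; the class names, the relocate function and the name-to-base mapping are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Module:             # simplified stand-in for MINIDUMP_MODULE
    name: str
    base_of_image: int
    size_of_image: int


@dataclass
class MemoryRange:        # simplified stand-in for MINIDUMP_MEMORY_DESCRIPTOR
    start_of_memory_range: int
    size: int


def relocate(modules, ranges, kernel_bases):
    """Move modules (and ranges inside them) to their ETW-recorded bases.

    kernel_bases maps lower-case module names to the base from the trace.
    """
    shifts = []  # (old start, old end, delta) for each relocated module
    for m in modules:
        new_base = kernel_bases.get(m.name.lower())
        if new_base is None:
            continue
        old_end = m.base_of_image + m.size_of_image
        shifts.append((m.base_of_image, old_end, new_base - m.base_of_image))
        m.base_of_image = new_base
    for r in ranges:
        for start, end, delta in shifts:
            if start <= r.start_of_memory_range < end:
                r.start_of_memory_range += delta
                break
```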

Preliminary testing of the hand-crafted dump files has so far revealed only one problem: some system modules still import from HAL.dll although (on my current version of Windows 11) this DLL contains no code, just export forwarders to routines in ntoskrnl.exe. My simple “import optimization” code does not handle this case, so as a workaround I just “special case” imports from “HAL.dll”, treating them as direct imports from ntoskrnl.exe.
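The workaround amounts to redirecting the import's module name before looking up the target address. A minimal sketch, assuming that each forwarded HAL.dll export has the same name in ntoskrnl.exe and that export tables have already been collected into a dictionary; the names below, including the routine HalExampleRoutine, are hypothetical.

```python
# Modules whose exports are (assumed to be) pure forwarders to ntoskrnl.exe.
FORWARDED_TO_NTOSKRNL = frozenset({"hal.dll"})


def effective_import_module(dll_name: str) -> str:
    """Map an imported module name to the module that actually holds the code."""
    name = dll_name.lower()  # compare module names case-insensitively
    return "ntoskrnl.exe" if name in FORWARDED_TO_NTOSKRNL else name


def resolve_import(dll_name: str, function_name: str, exports: dict) -> int:
    """Look up the address an import should be bound to.

    exports maps lower-case module names to {export name: address}.
    """
    return exports[effective_import_module(dll_name)][function_name]
```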

A suitably configured ETW trace contains enough information to obtain Windows modules (if they are not already locally available), but if the IPT data records paths through third-party modules then these modules will need to be explicitly requested and copied from the “problem owner”.
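For Windows modules, the identifying data in the trace can be turned into a symbol-server request. This sketch assumes that the trace's image events supply each module's file name, TimeDateStamp and image size, and follows the common symbol-store convention for PE files (key = TimeDateStamp as eight upper-case hex digits followed by SizeOfImage in hex); the function name is hypothetical.

```python
MS_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def symbol_server_url(file_name: str, time_date_stamp: int, size_of_image: int) -> str:
    """Build the symbol-store URL for a PE image: <name>/<key>/<name>."""
    key = f"{time_date_stamp:08X}{size_of_image:x}"
    return f"{MS_SYMBOL_SERVER}/{file_name}/{key}/{file_name}"
```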

The “art” of interpreting ETW IPT data

As implied in my original posting on this topic, interpreting ETW IPT data is rather an “art” and, as a side effect, a manual process.

The first step that requires intuition or experience is choosing the events to trace. Having (potentially) used the full range of event providers (manifest, WPP, MOF, etc.) to identify “interesting” behaviour, one then has to identify a “SystemProvider” event that occurs shortly afterwards and will trigger logging of the IPT buffer.

Interpreting IPT data is time consuming, so it makes little sense to try to interpret every IPT event in an ETW trace – it is better to select the event(s) with the best chance of being useful. Currently, having identified an IPT event of potential interest in an ETW viewing tool, I copy the IPT data as a hexadecimal string and paste it into a file for analysis.
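Once the hexadecimal string has been saved to a file, it has to be turned back into bytes, and a decoder must synchronise on a Packet Stream Boundary (PSB) packet, the 16-byte pattern 02 82 repeated eight times, before it can interpret anything else. A minimal Python sketch, assuming the pasted text contains only hex digits and whitespace (no "0x" prefixes or other separators); the function names are hypothetical.

```python
# PSB packet: the byte pair 02 82 repeated eight times (16 bytes).
PSB = bytes([0x02, 0x82]) * 8


def parse_hex_dump(text: str) -> bytes:
    """Convert pasted hexadecimal text (whitespace allowed) into bytes."""
    return bytes.fromhex("".join(text.split()))


def first_psb_offset(data: bytes) -> int:
    """Offset of the first PSB, where decoding can start (-1 if there is none)."""
    return data.find(PSB)
```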

If a complete dump is not available, then one has to decide which modules to include in a “hand-made” dump file, and this might involve retrieving files from the Microsoft Symbol Server. The “first pass” technique that I use to identify the modules to include is a “quick” analysis of the IPT data, just looking for addresses in TIP and FUP packets. The resulting list of modules containing those addresses might not be complete, but a subsequent attempt to fully analyse the IPT data will indicate the next module that needs to be added to the list.
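The core of such a quick pass is reconstructing full addresses from the compressed IPs in TIP-family and FUP packets and mapping each one to a module. The sketch below follows the IP compression rules in the Processor Trace chapter of the Intel SDM (IPBytes in header bits 7:5, little-endian payload, updates relative to the last IP); it assumes that the packets have already been delimited by a packet walker, which is not shown, and that the module list comes from the trace's image events. All names are hypothetical.

```python
import bisect

# Low five header bits identifying packets that carry an IP payload.
IP_PACKET_TYPES = {0x0D: "TIP", 0x11: "TIP.PGE", 0x01: "TIP.PGD", 0x1D: "FUP"}
# IPBytes value (header bits 7:5) -> payload length; 5 and 7 are reserved.
PAYLOAD_LEN = {0: 0, 1: 2, 2: 4, 3: 6, 4: 6, 6: 8}


def is_ip_packet(header: int) -> bool:
    """Only meaningful for a byte known to start a packet."""
    return (header & 0x1F) in IP_PACKET_TYPES


def decode_ip(header: int, payload: bytes, last_ip: int):
    """Return (ip or None, new last_ip) for a TIP/TIP.PGE/TIP.PGD/FUP packet."""
    ip_bytes = header >> 5
    if ip_bytes not in PAYLOAD_LEN or len(payload) != PAYLOAD_LEN[ip_bytes]:
        raise ValueError("reserved IPBytes value or wrong payload length")
    if ip_bytes == 0:
        return None, last_ip                      # IP suppressed; last IP kept
    value = int.from_bytes(payload, "little")
    if ip_bytes == 1:                             # replace bits 15:0
        ip = (last_ip & 0xFFFFFFFFFFFF0000) | value
    elif ip_bytes == 2:                           # replace bits 31:0
        ip = (last_ip & 0xFFFFFFFF00000000) | value
    elif ip_bytes == 3:                           # sign-extend from bit 47
        ip = (value | 0xFFFF000000000000) if value & (1 << 47) else value
    elif ip_bytes == 4:                           # replace bits 47:0
        ip = (last_ip & 0xFFFF000000000000) | value
    else:                                         # IPBytes 6: full 64-bit IP
        ip = value
    return ip, ip


def module_for(ip: int, modules):
    """modules: list of (base, size, name) sorted by base; None if unmapped."""
    i = bisect.bisect_right([base for base, _, _ in modules], ip) - 1
    if i >= 0 and ip < modules[i][0] + modules[i][1]:
        return modules[i][2]
    return None
```

Addresses for which module_for returns None point at the modules that still need to be obtained.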

One cannot be certain, on the basis of ETW data alone, whether “import optimization” is in use on a remote system before starting to analyse the IPT data. If it turns out not to be in use, then a new dump file must be created without import optimization.

2 comments:

  1. Great one again, Gary.

    I've looked at this particular provider, and it's in "ipt.sys".
    It seems it will only do per-CPU-core IPT dumps, not the per-process tracing that would probably be more useful.

    I mean the way that Alex Ionescu's https://github.com/ionescu007/winipt
    does per-process IPT reading by polling.

    There is actually a way to start an IPT recording on a process using native Windows 10 tools, but it's not published.
    It would be nice if there were a per-process streaming method.

    AFAIK, with that IPT ETW stream, one would have to parse thread switching and whatever else out of the raw per-core dumps.
    Is this how you understand it?

    1. Hello Neighbor,
      There is a method in ipt.sys to stream IPT data to a file (it was added after Alex wrote that article).
      Regarding the ETW IPT support, management of the IPT registers has been incorporated in the Windows context switch code; an ETW IPT event contains entries for only one thread.
      Gary
