My interest in IPT under Windows has been piqued more than once but, until now, my judgement of the effort versus benefit of interpreting IPT data tended to a “not worth it” decision: identifying and displaying the individual IPT packets seemed straightforward enough, but interpreting the Taken/Not-taken (TNT) bits would require a disassembler plus the binary code and load addresses of all the executable files that might have been executed. Accounts of IPT tracing and trace analysis found on the Web often mention very large volumes of trace data and very long trace analysis times.
The event that
prompted a more detailed consideration of the effort required to develop a
simple (and therefore slow) tool to analyse very short IPT traces was the
discovery that Windows Performance Recorder (WPR) can be configured to cause “IPT”
events to be generated as an accompaniment to kernel events. The Windows
Performance Recorder Profile (WPRP) schema (https://docs.microsoft.com/en-us/windows-hardware/test/wpt/wprcontrolprofiles-schema)
includes a description of the “HardwareCounter” element which can contain
elements such as “LastBranch” (for Last Branch Recording (LBR)), “Counters”
(for capturing Performance Monitoring Counters (PMC)) and “ProcessorTrace” (for
Intel Processor Tracing (IPT)).
The configuration of
ProcessorTrace is simple: one just has to specify three items:
- The “CodeMode” for the trace (user, kernel or user plus kernel).
- The (maximum) “BufferSize” of the IPT trace data (chosen from 4, 8, 16 or 32 kilobytes).
- A list of the kernel “Events” that cause a corresponding IPT event to be generated. The events can be chosen from the “SystemStackEnumeration” which is the list of event names that can be used in configuring stack traces for the system provider.
One thing that is
missing is an equivalent of the “CustomStack” element for defining custom
events, since not all of the kernel events are included in SystemStackEnumeration.
A sample configuration could look like this:
<HardwareCounter Id="Perf">
  <ProcessorTrace>
    <BufferSize Value="4" />
    <CodeMode Value="Kernel" />
    <Events>
      <Event Value="SystemCallExit" />
    </Events>
  </ProcessorTrace>
</HardwareCounter>
Starting “small” (smallest buffer size and
a single code mode) might be advisable until one has acquired experience in the
analysis and interpretation of the data.
Out-of-the-box, there are no MOF classes
describing most of the performance events; I added the following definitions for
IPT on my system with the “mofcomp” utility:
[dynamic: ToInstance, Guid("{ff1fd2fd-6008-42bb-9e75-00a20051f3be}"), EventVersion(2), DisplayName("IntelProcessorTrace")]
class IPT_V2 : MSNT_SystemTrace
{
};

[dynamic: ToInstance, EventType{32}, EventTypeName{"ProcessorTrace"}]
class IPT_Event : IPT_V2
{
    [WmiDataId(1), read] uint64 EventTimeStamp;
    [WmiDataId(2), read] uint32 Process;
    [WmiDataId(3), read] uint32 Thread;
    [WmiDataId(4), read, format("x")] uint64 IptOption;
    [WmiDataId(5), read] uint32 TraceSize;
    [WmiDataId(6), read] uint32 TracePosition;
};
The actual trace data follows immediately after this header; I could not think of a way to include the variable length array of bytes in the MOF class definition.
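A minimal sketch, in Python, of how such an event payload could be split into the header fields and the trailing trace bytes. It assumes that the payload is laid out exactly as the MOF class suggests (little-endian, no padding, 32 bytes of header); the names `IptEvent` and `parse_ipt_event` are invented for this illustration.

```python
import struct
from typing import NamedTuple

# Assumed layout, taken from the MOF definition above:
# uint64, uint32, uint32, uint64, uint32, uint32 (little-endian, packed).
IPT_HEADER = struct.Struct("<QIIQII")  # 32 bytes


class IptEvent(NamedTuple):
    timestamp: int
    process: int
    thread: int
    ipt_option: int
    trace_size: int
    trace_position: int
    trace: bytes  # the raw IPT packets that follow the header


def parse_ipt_event(payload: bytes) -> IptEvent:
    """Split an event payload into header fields and raw trace data."""
    if len(payload) < IPT_HEADER.size:
        raise ValueError("payload is shorter than the assumed 32-byte header")
    fields = IPT_HEADER.unpack_from(payload, 0)
    # Everything after the header is treated as trace data.
    return IptEvent(*fields, payload[IPT_HEADER.size:])
```

Keeping the trace bytes alongside the header fields sidesteps the problem of describing a variable length array in the MOF class itself.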
The first three members are identical in meaning to those of a kernel stack trace event. A definition of the IptOption value is available as a public type in the ipt.sys driver; the Windows debuggers display it thus:
0:000> dt ipt!_IPT_OPTION
   +0x000 TraceMode        : Pos 0, 4 Bits
   +0x000 TimeMode         : Pos 4, 4 Bits
   +0x000 MTCFreq          : Pos 8, 4 Bits
   +0x000 CycThresh        : Pos 12, 4 Bits
   +0x000 BufferSize       : Pos 16, 4 Bits
   +0x000 TraceSessionMode : Pos 20, 3 Bits
   +0x000 TraceChild       : Pos 23, 1 Bit
   +0x000 TraceCodeMode    : Pos 24, 4 Bits
   +0x000 Reserved2        : Pos 28, 4 Bits
   +0x004 Reserved3        : Uint4B
   +0x000 Value            : Uint8B
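The bit positions in the debugger output are enough to unpack an IptOption value by hand. The following Python sketch does just that; the table is copied from the listing above, while the function name `decode_ipt_option` is made up for this example and the meaning of the individual field values is not interpreted.

```python
from typing import Dict

# (field name, bit position, width in bits), copied from "dt ipt!_IPT_OPTION".
# Reserved3 is the Uint4B at offset +0x004, i.e. bits 32..63 of Value.
IPT_OPTION_FIELDS = [
    ("TraceMode", 0, 4),
    ("TimeMode", 4, 4),
    ("MTCFreq", 8, 4),
    ("CycThresh", 12, 4),
    ("BufferSize", 16, 4),
    ("TraceSessionMode", 20, 3),
    ("TraceChild", 23, 1),
    ("TraceCodeMode", 24, 4),
    ("Reserved2", 28, 4),
    ("Reserved3", 32, 32),
]


def decode_ipt_option(value: int) -> Dict[str, int]:
    """Split the 64-bit IptOption value into its named bit fields."""
    return {
        name: (value >> pos) & ((1 << width) - 1)
        for name, pos, width in IPT_OPTION_FIELDS
    }
```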
The TraceSize is the size of the trace
data; if the size is less than the configured size, then the “entire” trace is
available (all trace data from the last context switch until the triggering
event occurred). If TraceSize is equal to the configured “BufferSize” then the
trace has probably wrapped and “TracePosition” is the point in the (circular)
buffer at which the next packet would have been written; one has to search the
buffer in a circular fashion for a PSB (Packet Stream Boundary) packet,
starting from the TracePosition.
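The circular search can be sketched in Python as follows. The sketch assumes that the caller has already decided whether the buffer wrapped (for instance by comparing TraceSize with the configured size); the PSB pattern is the 16-byte sequence of 0x02 0x82 pairs defined by Intel, and the helper names are hypothetical.

```python
from typing import Optional

# A PSB packet is the byte pair 02 82 repeated eight times (16 bytes).
PSB = bytes([0x02, 0x82] * 8)


def linearize(trace: bytes, trace_position: int, wrapped: bool) -> bytes:
    """Return the trace bytes in the order in which they were written."""
    if not wrapped:
        return trace
    # The oldest data starts at the position where the next packet
    # would have been written; the newest data ends just before it.
    return trace[trace_position:] + trace[:trace_position]


def find_first_psb(trace: bytes, trace_position: int,
                   wrapped: bool) -> Optional[int]:
    """Offset of the first PSB in the linearized buffer, or None."""
    offset = linearize(trace, trace_position, wrapped).find(PSB)
    return None if offset < 0 else offset
```

Rotating the buffer first, rather than searching it in place, means that a PSB which straddles the physical end of the buffer is found without any special handling. Decoding can then start at the returned offset; the bytes before it belong to packets whose beginning has been overwritten.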
By including “Loader” keyword events in a WPR trace (which enables loaded modules to be identified, along with their load addresses), one seems “in good shape” to interpret the IPT trace.
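Mapping an instruction pointer from the trace back to a module is then a simple range lookup. The Python sketch below assumes that the base address, size and name of each module have already been extracted from the Loader events (how that is done is not shown); `ModuleMap` and its method names are hypothetical.

```python
import bisect
from typing import Iterable, Optional, Tuple

Module = Tuple[int, int, str]  # (base address, size in bytes, name)


class ModuleMap:
    """Resolve an address to (module name, offset within the module)."""

    def __init__(self, modules: Iterable[Module]) -> None:
        self._modules = sorted(modules)            # ordered by base address
        self._bases = [m[0] for m in self._modules]

    def lookup(self, ip: int) -> Optional[Tuple[str, int]]:
        # Find the last module whose base address is <= ip.
        i = bisect.bisect_right(self._bases, ip) - 1
        if i < 0:
            return None
        base, size, name = self._modules[i]
        if ip >= base + size:
            return None                            # falls in a gap
        return name, ip - base
```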
A disassembler is needed to interpret the IPT trace and, fortunately, one is readily available: the one used by the Windows debuggers, namely the Disassemble method of the IDebugControl interface. The disassembler is needed to identify relevant instructions (e.g. conditional branches) and instruction lengths. The “Disassemble” method does much more than this, formatting the instruction as a string and performing symbol look-up for memory references, so it is slow, but it does the job and obviates the need to develop a purpose-built replacement.
In a typical trace, code from many executable files may appear and the IDebugClient/IDebugControl interfaces are probably not well suited to simultaneously opening several separate executable files. “Fortunately”, I encountered another problem with this approach and the same “solution” resolved both problems.
This code is taken from the executable file; because it contains an indirect control transfer, a TIP (Target IP) packet would be needed in the IPT trace:
tcpip!TcpDeliverDataToClient+0x119:
call    qword ptr [tcpip!_imp_KeAcquireSpinLockAtDpcLevel (00000001`c02331e8)]
nop     dword ptr [rax+rax]
cmp     r14d,0C000021Bh
However, no TIP packet was present, and it turned out that the code in memory that was actually executed looks like this (a direct control transfer):
tcpip!TcpDeliverDataToClient+0x119:
mov     r10,qword ptr [tcpip!_imp_KeAcquireSpinLockAtDpcLevel (fffff803`590d31e8)]
call    nt!KeAcquireSpinLockAtDpcLevel (fffff803`534a2490)
cmp     r14d,0C000021Bh
Import Optimization (https://techcommunity.microsoft.com/t5/windows-kernel-internals-blog/mitigating-spectre-variant-2-with-retpoline-on-windows/ba-p/295618) had been applied when building the executable and, whilst there is obviously
sufficient metadata in the executable file to recognize and emulate the code
modifications, it would be difficult to integrate this into the simple use of
the disassembler.
The “solution” was to use a “dump” of the
process (or a “live dump” of the kernel) to perform the analysis. This
simplifies many things but also means that a standalone ETW (Event Tracing for
Windows) file is not enough for an analysis with my simple tool (a dump is
needed too).
The “conciseness” of the IPT trace data
means that it is not easy to “check” whether an analysis is proceeding
correctly. One of my many mistakes was in incorrectly handling “Indirect
Transfer Compression for Returns” (the uncompressed cases), but “RET
compression” was a big help in identifying problems: if a RET was compressed,
then the Taken/Not-taken bit should be set and if it is clear then one knows
that something has gone wrong. Another hint is if the “recorded” code path does
not seem plausible; this is not always easy to judge, but I often found that my
tool was analyzing the routine “KeBugCheckEx” – something that had patently not
happened during the trace capture.
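To give an idea of how little there is to “check” in a TNT packet, here is a Python sketch that decodes a short (one byte) TNT packet. It is written from my reading of the packet format in the Intel documentation (bit 0 clear, a leading 1 as stop bit, then up to six TNT bits with the oldest first) and is not taken from the tool described here; the function name is invented.

```python
from typing import List


def decode_short_tnt(byte: int) -> List[bool]:
    """Return the TNT bits of a short TNT packet, oldest branch first.

    True means "taken", False means "not taken".
    """
    # Bit 0 must be clear; 0x00 is a PAD packet and 0x02 introduces an
    # extended opcode, so neither of those is a short TNT packet.
    if not 0 <= byte <= 0xFF or byte & 1 or byte in (0x00, 0x02):
        raise ValueError("not a short TNT packet")
    payload = byte >> 1
    count = payload.bit_length() - 1  # number of bits below the stop bit
    # The bit just below the stop bit is the oldest result.
    return [bool((payload >> i) & 1) for i in range(count - 1, -1, -1)]
```

An interpreter that reaches a RET while consuming these bits (with RET compression in effect) expects the next bit to be True; a False at that point is the kind of early warning mentioned above.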
My “use case” for IPT tracing is as an
additional aid in debugging/troubleshooting tricky/interesting problems. For
this type of tracing to be useful, one needs to identify kernel events that
occur after the code of interest has been executed and whose IPT data might
include the path taken. The limited set of events in SystemStackEnumeration (lacking, for example,
network events) is a hindrance, but the undocumented API to set additional
custom events is relatively easy to deduce. IPT trace data attached to the “CSwitch”
event is often useful; some traces are very short (a context switch from idle)
and are useful for testing the TNT interpreter and some others are useful “backstops”
for data gathering (especially if the context switch is the result of a natural
“break” in execution, such as entering a wait state).
\Device\IPT IOCTL Interface
IPT can be used separately from ETW: the ipt.sys driver makes certain IPT operations available via an IOCTL style interface. This interface is not documented, but the ipt.sys driver is small and “straightforward”, so many of the features of the interface can be deduced. As someone who is almost exclusively interested in short IPT traces, it is a relief that the interface supports some of the IPT filtering mechanisms, most importantly filtering by IP (Instruction Pointer).
The current
(undocumented and probably still evolving) interface allows IPT tracing to be
enabled for a process and IP filtering to be configured per thread. Once
tracing has been enabled for a process (which enables tracing on all threads in
the process), tracing of individual threads can be suspended and resumed and IP
filtering can be applied to individual threads. Threads created after tracing
has started inherit the tracing options set for the process but start without
any IP filtering.
I am often interested
in tracing the path through short sections of code in service processes, where
the thread which will execute the code cannot easily be predicted and might even
be a newly created thread. I wanted to avoid “invading” the process to be
traced (by attaching a “debugger”), but that is the only standard way of being
informed of (and partially controlling) thread creation in a process.
Initially, I thought that this would be simple: just receive the debug events,
apply the IPT IP filter to any newly created threads and then resume the target.
However, applying an IPT IP filter to a thread that has just been created and is paused at the create thread debug event has no effect; it is necessary to arrange for the thread to proceed to the “LdrInitializeThunk” routine before applying the filter.
Most of the IPT
tracing configurable via IOCTL traces to circular buffers; these buffers can be
large and, with judicious filtering, they might not need to wrap. There is one
operation that writes the trace data directly to a file, ensuring a complete
trace; this operation just traces the user mode behaviour of a process and does
not support IP filtering.
Summary
I am often concerned with problems related to networking (for example, a potential minor problem in the Windows Filtering Platform, described in an earlier posting), and the troubleshooting tools that I use most frequently are event tracing (including network packet capture) and user-mode debugging. Kernel debugging is possible but I use it only very occasionally (partly because disturbing the timing of things in the debugger disturbs the whole evolution of the debugging scenario). IPT tracing will hopefully be useful when it can be applied. Often the “transmit” side of communication occurs in a predictable process (and a process for which a “handle” can be obtained) and here process-based tracing can be effectively deployed. However, the “receive” side can occur in any context/process and I hope that combining ETW and IPT will help there. There are also common scenarios where the “transmissions” originate from the “System” process (e.g. SMB traffic) and the IOCTL interface, which uses handles rather than process ids to identify the target, can’t be used there.