There are a number of approaches to “tapping into” the network traffic of a Microsoft Windows (desktop/server) operating system. I would like to share some practical experience of using the various approaches.
NDIS Filter
Using an NDIS (Network Driver Interface Specification) filter driver is probably the most common technique – it is the technique used by Microsoft’s “Network Monitor” and one of the options for packet capture in its successor (Microsoft’s “Message Analyzer”). Typically, Wireshark (perhaps the best known third party sniffer used under Windows) also uses this technique (via use of the WinPcap or Npcap NDIS filter drivers).
An NDIS filter can observe and capture all of the activity at the data link layer (which can be divided into the logical link control (LLC) and medium access control (MAC) sublayers) – making it network (layer) protocol independent; it is the only technique that I shall mention which has this capability. If one wishes to capture MAC frame headers or network protocols other than IPv4/IPv6, then this is the technique to use.
There are at least two problems with the NDIS filter approach:
· Traffic over loopback interfaces cannot be captured (since these “software” network interfaces do not use NDIS).
· Network traffic can be interrupted when starting and stopping a capture session, since the NDIS filter needs to be “bound” into the driver stack. The “binding” process “pauses” traffic through the stack and (very occasionally) this “pause” can continue for an extended period of time – potentially causing network connections to be closed.
On one occasion (or possibly two – it is the first that remains in memory), I started a network trace on a production server whilst logged in via Remote Desktop (RDP) and the RDP connection was immediately broken and could not re-established for several minutes (almost certainly due to a problem draining packets from the driver stack during a pause/bind operation).
Recent versions of Windows include such an NDIS filter driver – the driver/service NdisCap. This driver exposes the captured traffic via the Event Tracing for Windows (ETW) mechanism as the provider “Microsoft-Windows-NDIS-PacketCapture”. A limited filtering capability is also exposed via this ETW provider.
The “Microsoft-Windows-NDIS-PacketCapture” provider is used by Message Analyzer, the “netsh trace” command and the “NetEventPacketCapture” PowerShell cmdlets (in particular, the “Add-NetEventPacketCaptureProvider” cmdlet).
WFP Callouts
The Windows Filtering Platform (WFP) allows developers to “intervene” at several stages in the processing that takes place as a packet flows through the IPv4/IPv6 network stack – including the capability of capturing the network traffic (from the network layer upwards).
MAC frame headers and network protocols other than IPv4/IPv6 are not included in the captured data, but packets sent via loopback can be captured; IPsec traffic in its unencrypted state (i.e. before/after encryption/decryption) can be captured too.
Adding and removing WFP callouts does not “pause” the network stack and is less likely to cause a network interruption than binding/unbinding an NDIS filter driver (reconfiguring WFP is a normal/common activity).
Recent versions of Windows include such a WFP callout driver – the driver/service WfpCapture. This driver exposes the captured traffic via the Event Tracing for Windows (ETW) mechanism as the provider “Microsoft-Pef-WFP-MessageProvider”. A limited filtering capability is also exposed via this ETW provider.
The “Microsoft-Pef-WFP-MessageProvider” provider is used by Message Analyzer and the “NetEventPacketCapture” PowerShell cmdlets (in particular, the “Add-NetEventWFPCaptureProvider” cmdlet).
At the time of writing, the current version of WfpCapture does not pass the Driver Signing Policy enforced by Windows 10, version 1607 and later. Unless one or more of the exception conditions apply (i.e. Secure Boot is disabled or the installed Windows version was upgraded from an earlier release of Windows (rather than being a “clean” install)), then this WFP callout driver cannot be used. This is the most recent message from Microsoft that I could find on this topic:
Yes, we were able to repro with SecureBoot enabled. We are looking at this now and
post a new build when we have this fixed. But I don't have a time
frame.
Paul
Paul E Long Microsoft (MSFT) Thursday,
October 13, 2016 1:08 PM
The changes to the Driver Signing Policy were discussed at a 2016 Filter Plugfest (video available on Channel 9) and Scott Anderson from Microsoft mentioned four exceptions to the policy – the three currently documented exceptions and a “test reg key to allow cross-signed certificates to work”. Peter Viscarola (founder of OSR) later wrote, in response to a discussion of this topic:
I hate to say this, but since you asked: The
registry key information is only available under NDA.
The “upgraded” system exception to the driver signing policy is signalled by a registry value in the Code Integrity (CI) Policy key. The driver signing policy is slowly being tightened; future releases will first only allow exceptions for “boot start” drivers before finally removing all exceptions.
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CI\Policy
BootUpgradedSystem
REG_DWORD 0x1
UpgradedSystem
REG_DWORD 0x1
Raw Sockets
The “functionality” of raw sockets under Windows (i.e. which packets they are capable of receiving, such as outbound packets) has changed over the years. Windows 10 raw sockets can receive all IPv4 packets (both inbound and outbound) including their IPv4 headers and all IPv6 packets – but only from the transport layer upwards (i.e. excluding their IPv6 headers). The receipt of inbound packets is subject to the Windows Defender Firewall rules in force – it is normally necessary to add a rule to grant access.
Since raw sockets are built into the kernel TCP/IP implementation, there is no need for additional kernel-mode code (such as NDIS filter drivers or WFP callout drivers). There are however a number of drawbacks compared to the first two techniques:
· No filtering in kernel-mode is possible – all packets are delivered to the user-mode application (which has performance implications).
· There is no visibility of how many packets are lost/dropped as a result of insufficient buffering.
· The packets are first time-stamped when processed by a user-mode application, which might be some time after they “could have been” time-stamped by filter/callout driver kernel-mode code running in a DPC (Deferred Procedure Call).
· There is no guarantee of the order in which the kernel adds packets to the raw socket. Monitoring the kernel activity with the “Microsoft-Windows-TCPIP” and “Microsoft-Windows-Winsock-AFD” providers indicates that the outbound response to an inbound packet is often copied to the raw socket before the inbound packet.
Using multiple outstanding read requests and I/O Completion Ports reduces the risk of dropping packets but further increases the risk of out-of-order time-stamping of packets (because the I/O completion port thread pool scheduling determines how quickly a time-stamp can be associated with a packet).
If captured data is loaded into Message Analyzer for analysis, the out-of-order time-stamping causes many spurious diagnosis messages. A “premature” packet is flagged with diagnosis messages like:
Lost TCP segments, sequence range
1234 ~ 2345.
This data segment was acknowledged before it arrived, which infers an
out-of-order capturing issue.
The corresponding “delayed” packet is flagged with diagnosis messages like:
Retransmitted, original message is missing.
One always has to be aware that artefacts of the capture process can misrepresent what actually happened “on the wire” (overly aggressive capture filtering being perhaps the biggest problem), but it is nonetheless unfortunate that the value of the automated diagnosis is substantially reduced when using this capture technique.
The biggest problem with raw socket network sniffing is the handling of IPv6 packets. The documentation (accurately) states:
For IPv6 (address family of AF_INET6), an application receives everything after the last IPv6 header in each received datagram regardless of the IPV6_HDRINCL socket option. The application does not receive any IPv6 headers using a raw socket.
The basic IPv6 header (RFC 8200), and therefore the missing information in the received data, looks like this:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| Traffic Class |
Flow Label |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Payload Length | Next Header | Hop
Limit |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
+
+
| |
+ Source
Address +
|
|
+
+
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
+ +
|
|
+ Destination
Address +
|
|
+
+
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The Version field can be inferred since one needs to create separate raw sockets per network interface for IPv4 and IPv6 packets. The Payload Length is implicit in the length of the captured data. The Source Address can be obtained by using socket functions such as recvfrom/WSARecvFrom/WSARecvMsg (which can return the source address via a separate output parameter). Traffic Class, Flow Label and Hop Limit are often not “interesting” in common troubleshooting scenarios involving network sniffing.
The most important missing information is the final Next Header value since this determines the transport protocol and how the captured data should be interpreted. The Internet Assigned Numbers Authority (IANA) documents the registered values for this field; not all of these values are acceptable as the final Next Header value (e.g. HOPOPT and AH) and some make the interpretation/decoding of subsequent data “difficult” (e.g. ESP). The Next Header values that I find most useful to identify are TCP, UDP and ICMPv6 and one can use heuristics to infer which, if any, of these values was probably present.
The basic structure of the UDP, ICMPv6 and TCP headers is shown here (taken directly from the plain text versions of the RFCs):
UDP
+--------+--------+--------+--------+
| Source |
Destination |
|
Port | Port
|
+--------+--------+--------+--------+
| | |
| Length
| Checksum |
+--------+--------+--------+--------+
ICMPv6
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type |
Code | Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
TCP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence
Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
Acknowledgment Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data | |U|A|P|R|S|F| |
|
Offset| Reserved |R|C|S|S|Y|I| Window |
| | |G|K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Urgent Pointer |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The UDP header is the only header that contains a field (Length) that can be directly compared with information that we know about the received packet. All three types of headers include a Checksum field, albeit at different offsets.
The heuristics that I use to infer the Next Header value are:
· If the received data length matches the UDP Length and the UDP Checksum is good, then set Next Header to UDP.
· If the TCP Checksum is good, then set Next Header to TCP.
· If the ICMPv6 Checksum is good, then set Next Header to ICMPv6.
· If the received data length matches the UDP Length, then set Next Header to UDP.
· If the first 4 bits of the received data equals 4 and the IPv4 checksum is good, then set Next Header to IPv4 (IPv4 packet encapsulated in IPv6).
· If the first 4 bits of the received data equals 6 and the IPv6 length is consistent with the length of the received data, then set Next Header to IPv6 (IPv6 packet encapsulated in IPv6).
· If the first byte of the received data equals IPPROTO_UDP (17) and the second byte is zero, then set Next Header to IPv6FragmentHeader.
· Otherwise, set the Next Header to Reserved (255/0xFF). These packets are then easy to spot in trace analysis tools such as Message Analyzer and Wireshark.
If a checksum is good, repeating the checksum process including the checksum value itself in the checksum should deliver a result of 0 or 0xFFFF. In addition to the transport data, the checksum also covers an IPv6 pseudo-header:
IPv6 pseudo-header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
+ +
|
|
+ Source
Address +
|
|
+
+
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+
+
|
|
+ Destination
Address +
| |
+
+
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Upper-Layer Packet Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| zero
| Next Header |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
We know the Upper-Layer Packet Length and the Source Address and we are guessing the Next Header value, but we are still missing the Destination Address. The Destination Address is available if WSARecvMsg is used to receive messages from the raw socket (via the Control field of a WSAMSG struct (WSACMSGHDR cmsg_type = IPV6_PKTINFO)). An alternative approach is to create an initial set of possible addresses by examining various networking tables: the TCP connections table, the destination cache table, the neighbours table and the local addresses of all network interfaces; all received Source Addresses are also merged into the set. Now try to verify the checksum using each of these addresses.
Because a “partial” checksum of the received data and known values from the pseudo-header can be calculated once (and partial checksums for each of the possible Destination Addresses can be cached), verifying the complete checksum just involves adding two values and folding back in any carry – which can be done very quickly.
False matches (of Next Header and Destination Address against the Checksum) are possible, but I have been happy with the results.
PktMon
PktMon is a relatively new packet capture technique. Microsoft has introduced hooks into NDIS.sys to support this type of logging. Some typical stack traces at the point that captured data is passed to the PktMon.sys driver show how and where the hooks are integrated into NDIS.sys:
PktMon!PktMonPacketLogCallback+0x19
ndis!PktMonClientNblLog+0xbd
ndis!PktMonClientNblLogNdis+0x2b
ndis!ndisCallSendHandler+0x3ca4b
ndis!ndisInvokeNextSendHandler+0x10e
ndis!NdisSendNetBufferLists+0x17d
PktMon!PktMonPacketLogCallback+0x19
ndis!PktMonClientNblLog+0xbd
ndis!PktMonClientNblLogNdis+0x2b
ndis!ndisMIndicateNetBufferListsToOpen+0x3e95c
ndis!ndisMTopReceiveNetBufferLists+0x1bd
ndis!ndisCallReceiveHandler+0x61
ndis!ndisInvokeNextReceiveHandler+0x1df
ndis!ndisFilterIndicateReceiveNetBufferLists+0x3be91
ndis!NdisFIndicateReceiveNetBufferLists+0x6e
This technique allows the data to be captured at many points in the NDIS protocol stack (the same packet can be captured and recorded at more than one point in the stack), but simple configuration allows packets to be captured just once.
Enabling and disabling tracing does not involve rebinding the NDIS protocol stack, which is an improvement over the NDIS filter approach to tracing.
This capture technique does not capture “loopback” traffic (for the same reasons that NDIS filters are unable to capture such traffic).
Unlike Microsoft’s NdisCap NDIS filter driver and Microsoft-Windows-NDIS-PacketCapture ETW provider, the ETW provider associated with PktMon (Microsoft-Windows-PktMon) does record the original payload size of packets that are truncated. NdisCap captures large TCP sends with an IP “pseudo” header containing an IP length of zero; since there is no record of the original payload size and the size cannot be deduced from the pseudo IP header then analysis tools (such as Wireshark) are unable to determine whether IP packets are missing from the captured data.
PktMon provides better filtering options than those supported by NdisCap/Microsoft-Windows-NDIS-PacketCapture, but the filters are not set via ETW but rather by IOCTLs to the PktMon driver.
Gary,
ReplyDeleteThanks for taking the time to thoroughly review the three approaches.
Before I discovered your research, I needed a small custom app to grab the packets related to a specific TCP/IP connection. I used raw sockets and WSAIoctl(SIO_RCVALL). It was important for my program logic to confirm the seqno's and ackno's matched expected values as I passively watched the conversation between client and server. I was totally surprised to see that the client packets were being 'received' before the corresponding server packets. The situation was glaringly wrong because the client was an acknowledging an seqno that hadn't yet arrived, if I was to believe the order of the packets coming from the WSARecv call. I can program around this Windows shortcoming, although I wish I didn't need to. But the casualty is that the packet arrival times are now questionable because the kernel is not honoring the true arrival order and definitely caching the packets for an unknowable length of time.
Your article helped my realize this was a known problem with the raw socket approach. Thanks!
Hey Gary,
ReplyDeleteI'm using "Microsoft-Windows-NDIS-PacketCapture" ETW traces via "netsh.exe".
Ran into a problem where the ETW event PID doesn't always match the packet source process. In other words there appears to be some PID association problem within Microsoft's trace code someplace.
I verified the packets are NOT coming from the target process by putting breakpoints in target soccket connect functions, etc. And not talking about UDP DNS packets which can come from a system process for the caching stuff.
I wonder if you have run into this yourself.
I might just have to dig into the NDIS driver and see if I can figure out why. Maybe it's a stack thing.
I suppose one option too is to make my own network filter driver.
Also I have yet to try the PktMon way, maybe it won't have this PID attribution issue.
Hello,
DeleteThe PID on sent packets normally reflects the PID of the sender; the PID on received packets is just the PID of whatever process happens to be running when the packets arrived (in interrupt or DPC context), it is not related to the PID of the socket owner.
Stateful tracking of send and receives, or combining information from other ETW providers (such as Microsoft-Windows-Winsock-AFD or Microsoft-Windows-TCPIP) should make it possible to attribute a received packet to a process.
I don't think that there is any difference between Microsoft-Windows-NDIS-PacketCapture and Microsoft-Windows-PktMon in this respect.
Gary
Thank you so much Gary! Your expertise is inspiring.
Delete