There are a number of approaches to “tapping into” the
network traffic of a Microsoft Windows (desktop/server) operating system. I
would like to share some practical experience of using the various approaches.
NDIS Filter
Using an NDIS (Network Driver Interface Specification)
filter driver is probably the most common technique – it is the technique used
by Microsoft’s “Network Monitor” and one of the options for packet capture in
its successor (Microsoft’s “Message Analyzer”).
Wireshark (perhaps the best-known third-party sniffer used under Windows) typically also uses this technique, via the WinPcap or Npcap capture drivers.
An NDIS filter can observe and capture all of the activity
at the data link layer (which can be divided into the logical link control
(LLC) and medium access control (MAC) sublayers) – making it network (layer)
protocol independent; it is the only technique that I shall mention which has
this capability. If one wishes to capture MAC frame headers or network protocols
other than IPv4/IPv6, then this is the technique to use.
There are at least two problems with the NDIS filter
approach:
· Traffic over loopback interfaces cannot be captured (since these “software” network interfaces do not use NDIS).
· Network traffic can be interrupted when starting and stopping a capture session, since the NDIS filter needs to be “bound” into the driver stack. The “binding” process “pauses” traffic through the stack and (very occasionally) this “pause” can continue for an extended period of time – potentially causing network connections to be closed.
On one occasion (or possibly two – it is the first that
remains in memory), I started a network trace on a production server whilst
logged in via Remote Desktop (RDP) and the RDP connection was immediately broken
and could not be re-established for several minutes (almost certainly due to a
problem draining packets from the driver stack during a pause/bind operation).
Recent versions of Windows include such an NDIS filter
driver – the driver/service NdisCap. This driver
exposes the captured traffic via the Event Tracing for Windows (ETW) mechanism
as the provider “Microsoft-Windows-NDIS-PacketCapture”.
A limited filtering capability is also exposed via this ETW provider.
The “Microsoft-Windows-NDIS-PacketCapture”
provider is used by Message Analyzer, the “netsh trace” command and the “NetEventPacketCapture”
PowerShell cmdlets (in particular, the “Add-NetEventPacketCaptureProvider”
cmdlet).
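The same provider can also be consumed directly from code. The sketch below (C++) starts an ETW real-time session and enables the provider; it is only an outline – the session name is arbitrary, error handling is minimal, and the provider GUID is deliberately left as a placeholder to be filled in from the output of “logman query providers”.

// Sketch: start an ETW real-time session and enable the
// Microsoft-Windows-NDIS-PacketCapture provider. The GUID is left as a
// placeholder - obtain the real value with "logman query providers".
#include <windows.h>
#include <wmistr.h>
#include <evntrace.h>
#include <stdio.h>
#include <stdlib.h>

#pragma comment(lib, "advapi32.lib")

static const GUID NdisCapProviderGuid = { /* fill in the GUID reported for the provider */ };

int main()
{
    const wchar_t sessionName[] = L"MyNdisCapSession";     // arbitrary session name
    ULONG size = sizeof(EVENT_TRACE_PROPERTIES) + sizeof(sessionName);
    EVENT_TRACE_PROPERTIES* props = (EVENT_TRACE_PROPERTIES*)calloc(1, size);
    props->Wnode.BufferSize = size;
    props->Wnode.Flags = WNODE_FLAG_TRACED_GUID;
    props->Wnode.ClientContext = 1;                        // QPC time-stamps
    props->LogFileMode = EVENT_TRACE_REAL_TIME_MODE;       // real-time consumers, no log file
    props->LoggerNameOffset = sizeof(EVENT_TRACE_PROPERTIES);

    TRACEHANDLE session = 0;
    ULONG status = StartTraceW(&session, sessionName, props);
    if (status != ERROR_SUCCESS) { printf("StartTrace failed: %lu\n", status); return 1; }

    // Enable the packet-capture provider for this session (no keyword filtering).
    status = EnableTraceEx2(session, &NdisCapProviderGuid,
                            EVENT_CONTROL_CODE_ENABLE_PROVIDER,
                            TRACE_LEVEL_VERBOSE, 0, 0, 0, NULL);
    printf("EnableTraceEx2: %lu\n", status);

    // ... consume the events (OpenTrace/ProcessTrace), then stop the session:
    ControlTraceW(session, NULL, props, EVENT_TRACE_CONTROL_STOP);
    free(props);
    return 0;
}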
WFP Callouts
The Windows Filtering Platform (WFP) allows developers to
“intervene” at several stages in the processing that takes place as a packet
flows through the IPv4/IPv6 network stack – including the ability to capture the network traffic (from the network layer upwards).
MAC frame headers and network protocols other than IPv4/IPv6
are not included in the captured data, but packets sent via loopback can be
captured; IPsec traffic in its unencrypted state (i.e. before/after
encryption/decryption) can be captured too.
Adding and removing WFP callouts does not “pause” the
network stack and is less likely to cause a network interruption than binding/unbinding
an NDIS filter driver (reconfiguring WFP is a normal/common activity).
Recent versions of Windows include such a WFP callout driver
– the driver/service WfpCapture. This driver exposes
the captured traffic via the Event Tracing for Windows (ETW) mechanism as the
provider “Microsoft-Pef-WFP-MessageProvider”.
A limited filtering capability is also exposed via this ETW provider.
The “Microsoft-Pef-WFP-MessageProvider” provider is used by Message Analyzer and the “NetEventPacketCapture”
PowerShell cmdlets (in particular, the “Add-NetEventWFPCaptureProvider”
cmdlet).
At the time of writing, the current version of WfpCapture does not pass the Driver Signing Policy enforced
by Windows 10, version 1607 and later. Unless one or more of the exception conditions apply (e.g. Secure Boot is disabled, or the installed Windows version was upgraded from an earlier release of Windows rather than being a “clean” install), this WFP callout driver cannot be used. This is the most recent
message from Microsoft that I could find on this topic:
Yes, we were able to repro with SecureBoot enabled. We are looking at this now and
post a new build when we have this fixed. But I don't have a time
frame.
Paul
Paul E Long, Microsoft (MSFT) – Thursday, October 13, 2016, 1:08 PM
The changes
to the Driver Signing Policy were discussed at a 2016 Filter Plugfest (video available on Channel 9) and Scott Anderson
from Microsoft mentioned four exceptions to the policy – the three currently
documented exceptions and a “test reg key to allow
cross-signed certificates to work”. Peter Viscarola
(founder of OSR) later wrote, in response to a discussion of this topic:
I hate to say this, but since you asked: The
registry key information is only available under NDA.
The “upgraded” system exception to the driver signing policy is signalled by a registry value in the Code Integrity (CI) Policy key (shown below). The driver signing policy is slowly being tightened; future releases will first restrict the exceptions to “boot start” drivers and will eventually remove all exceptions.
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CI\Policy
    BootUpgradedSystem    REG_DWORD    0x1
    UpgradedSystem        REG_DWORD    0x1
Raw Sockets
The “functionality” of raw sockets under Windows (i.e. which
packets they are capable of receiving, such as outbound packets) has changed
over the years. Windows 10 raw sockets can receive all IPv4 packets (both inbound and outbound), including their IPv4 headers, and all IPv6 packets – but the latter only from the transport layer upwards (i.e. excluding their IPv6 headers). The
receipt of inbound packets is subject to the Windows Defender Firewall rules in
force – it is normally necessary to add a rule to grant access.
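A minimal sketch of creating such a capture socket is shown below (C++/Winsock); it binds a raw IPv4 socket to one local address and turns on SIO_RCVALL so that both inbound and outbound packets on that interface are delivered. The local address used here is just a placeholder, administrator rights are required, and a separate AF_INET6 socket would be needed per interface for IPv6.

// Minimal sketch: a raw IPv4 socket that receives all packets (with their
// IPv4 headers) seen on one interface. Requires administrator rights.
#include <winsock2.h>
#include <ws2tcpip.h>
#include <mstcpip.h>   // SIO_RCVALL, RCVALL_ON
#include <stdio.h>

#pragma comment(lib, "ws2_32.lib")

int main()
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    SOCKET s = socket(AF_INET, SOCK_RAW, IPPROTO_IP);
    if (s == INVALID_SOCKET) { printf("socket: %d\n", WSAGetLastError()); return 1; }

    // Bind to the local address of the interface to be monitored
    // (192.0.2.10 is a placeholder - use one of the machine's own addresses).
    sockaddr_in local = {};
    local.sin_family = AF_INET;
    InetPtonA(AF_INET, "192.0.2.10", &local.sin_addr);
    if (bind(s, (sockaddr*)&local, sizeof(local)) != 0) { printf("bind: %d\n", WSAGetLastError()); return 1; }

    // Ask for all packets (inbound and outbound) seen on this interface.
    DWORD mode = RCVALL_ON, bytesReturned = 0;
    if (WSAIoctl(s, SIO_RCVALL, &mode, sizeof(mode), NULL, 0, &bytesReturned, NULL, NULL) != 0)
    {
        printf("SIO_RCVALL: %d\n", WSAGetLastError());
        return 1;
    }

    char packet[65536];
    for (;;)
    {
        int len = recv(s, packet, sizeof(packet), 0);
        if (len <= 0) break;
        // packet[0..len-1] is a complete IPv4 datagram, starting with its IPv4 header.
        printf("captured %d bytes\n", len);
    }
    closesocket(s);
    WSACleanup();
    return 0;
}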
Since raw sockets are built into the kernel TCP/IP
implementation, there is no need for additional kernel-mode code (such as NDIS
filter drivers or WFP callout drivers). There are however a number of drawbacks
compared to the first two techniques:
· No filtering in kernel-mode is possible – all packets are delivered to the user-mode application (which has performance implications).
· There is no visibility of how many packets are lost/dropped as a result of insufficient buffering.
· The packets are first time-stamped when processed by a user-mode application, which might be some time after they “could have been” time-stamped by filter/callout driver kernel-mode code running in a DPC (Deferred Procedure Call).
· There is no guarantee of the order in which the kernel adds packets to the raw socket. Monitoring the kernel activity with the “Microsoft-Windows-TCPIP” and “Microsoft-Windows-Winsock-AFD” providers indicates that the outbound response to an inbound packet is often copied to the raw socket before the inbound packet.
Using multiple outstanding read requests and I/O Completion
Ports reduces the risk of dropping packets but further increases the risk of
out-of-order time-stamping of packets (because the I/O completion port thread
pool scheduling determines how quickly a time-stamp can be associated with a packet).
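A rough sketch of that pattern follows (C++), reusing a raw socket created as in the earlier sketch; a handful of overlapped reads are kept outstanding on an I/O completion port and each completion is time-stamped as soon as it is dequeued – which is still later than kernel-mode code could have time-stamped the packet.

// Sketch: several outstanding overlapped reads on a raw socket, completed
// via an I/O completion port. The time-stamp is taken when the completion
// is dequeued, i.e. later than a kernel-mode time-stamp would have been.
#include <winsock2.h>
#include <windows.h>
#include <stdio.h>

#pragma comment(lib, "ws2_32.lib")

struct PacketRead
{
    OVERLAPPED ov;          // embedded OVERLAPPED, recovered with CONTAINING_RECORD below
    WSABUF     wsabuf;
    char       data[65536];
    DWORD      flags;
};

static void PostRead(SOCKET s, PacketRead* r)
{
    ZeroMemory(&r->ov, sizeof(r->ov));
    r->wsabuf.buf = r->data;
    r->wsabuf.len = sizeof(r->data);
    r->flags = 0;
    int rc = WSARecv(s, &r->wsabuf, 1, NULL, &r->flags, &r->ov, NULL);
    if (rc != 0 && WSAGetLastError() != WSA_IO_PENDING)
        printf("WSARecv: %d\n", WSAGetLastError());
}

void CaptureLoop(SOCKET s)
{
    HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
    CreateIoCompletionPort((HANDLE)s, iocp, /*CompletionKey*/ 0, 0);

    // Keep a handful of reads outstanding to reduce the chance of drops
    // (static so that the large buffers do not live on the stack).
    static PacketRead reads[16];
    for (PacketRead& r : reads)
        PostRead(s, &r);

    for (;;)
    {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED* pov = NULL;
        if (!GetQueuedCompletionStatus(iocp, &bytes, &key, &pov, INFINITE) && pov == NULL)
            break;

        FILETIME ts;
        GetSystemTimePreciseAsFileTime(&ts);    // first opportunity to time-stamp the packet

        PacketRead* r = CONTAINING_RECORD(pov, PacketRead, ov);
        if (bytes > 0)
        {
            // r->data[0..bytes-1] holds one captured packet; hand it off together with ts.
        }
        PostRead(s, r);                         // re-arm the read immediately
    }
}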
If captured data is loaded into Message Analyzer
for analysis, the out-of-order time-stamping causes many spurious diagnosis
messages. A “premature” packet is flagged with diagnosis messages like:
Lost TCP segments, sequence range 1234 ~ 2345. This data segment was acknowledged before it arrived, which infers an out-of-order capturing issue.
The corresponding “delayed” packet is flagged with diagnosis messages like:
Retransmitted, original message is missing.
One always has to be aware that artefacts of the capture
process can misrepresent what actually happened “on the wire” (overly
aggressive capture filtering being perhaps the biggest problem), but it is
nonetheless unfortunate that the value of the automated diagnosis is
substantially reduced when using this capture technique.
The biggest problem with raw socket network sniffing is the
handling of IPv6 packets. The documentation (accurately) states:
For IPv6 (address family of
AF_INET6), an application receives everything after the last IPv6 header in
each received datagram regardless of the IPV6_HDRINCL socket option. The
application does not receive any IPv6 headers using a raw socket.
The basic IPv6 header (RFC 8200), and therefore the missing
information in the received data, looks like this:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| Traffic Class |           Flow Label                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Payload Length        |  Next Header  |   Hop Limit   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                         Source Address                        +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                      Destination Address                      +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The Version field can be inferred since one needs to create
separate raw sockets per network interface for IPv4 and IPv6 packets. The
Payload Length is implicit in the length of the captured data. The Source
Address can be obtained by using socket functions such as recvfrom/WSARecvFrom/WSARecvMsg (which can
return the source address via a separate output parameter). Traffic Class, Flow
Label and Hop Limit are often not “interesting” in common troubleshooting
scenarios involving network sniffing.
The most important missing information is the final Next Header
value since this determines the transport protocol and how the captured data
should be interpreted. The Internet Assigned Numbers Authority (IANA) documents
the registered values for this field; not all of these values are acceptable as
the final Next Header value (e.g. HOPOPT and AH) and some make the
interpretation/decoding of subsequent data “difficult” (e.g. ESP). The Next
Header values that I find most useful to identify are TCP, UDP and ICMPv6, and one can use heuristics to infer which, if any, of these values was probably present.
The basic structure of the UDP, ICMPv6 and TCP headers is
shown here (taken directly from the plain text versions of the RFCs):
UDP
+--------+--------+--------+--------+
|     Source      |   Destination   |
|      Port       |      Port       |
+--------+--------+--------+--------+
|                 |                 |
|     Length      |    Checksum     |
+--------+--------+--------+--------+

ICMPv6
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

TCP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The UDP header is the only header that contains a field
(Length) that can be directly compared with information that we know about the
received packet. All three types of headers include a Checksum field, albeit at
different offsets.
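For reference, minimal C++ declarations of the relevant fields of these three headers might look like the following; the type names are my own, not from any Windows header, and all multi-byte fields are in network byte order on the wire.

// Minimal, illustrative declarations of the transport headers shown above.
#include <cstdint>

#pragma pack(push, 1)
struct UdpHeader
{
    uint16_t SourcePort;
    uint16_t DestinationPort;
    uint16_t Length;      // header + data, comparable with the received data length
    uint16_t Checksum;
};

struct Icmp6Header
{
    uint8_t  Type;
    uint8_t  Code;
    uint16_t Checksum;
};

struct TcpHeader
{
    uint16_t SourcePort;
    uint16_t DestinationPort;
    uint32_t SequenceNumber;
    uint32_t AcknowledgmentNumber;
    uint8_t  DataOffsetAndReserved;   // upper 4 bits: header length in 32-bit words
    uint8_t  Flags;                   // URG/ACK/PSH/RST/SYN/FIN
    uint16_t Window;
    uint16_t Checksum;
    uint16_t UrgentPointer;
};
#pragma pack(pop)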
The heuristics that I use to infer the Next Header value are:
· If the received data length matches the UDP Length and the UDP Checksum is good, then set Next Header to UDP.
· If the TCP Checksum is good, then set Next Header to TCP.
· If the ICMPv6 Checksum is good, then set Next Header to ICMPv6.
· If the received data length matches the UDP Length, then set Next Header to UDP.
· If the first 4 bits of the received data equal 4 and the IPv4 checksum is good, then set Next Header to IPv4 (an IPv4 packet encapsulated in IPv6).
· If the first 4 bits of the received data equal 6 and the IPv6 length is consistent with the length of the received data, then set Next Header to IPv6 (an IPv6 packet encapsulated in IPv6).
· If the first byte of the received data equals IPPROTO_UDP (17) and the second byte is zero, then set Next Header to IPv6FragmentHeader.
· Otherwise, set the Next Header to Reserved (255/0xFF). These packets are then easy to spot in trace analysis tools such as Message Analyzer and Wireshark.
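Expressed as code, the cascade might look roughly like the sketch below (C++). The TransportChecksumOk helper – a one’s-complement checksum over the IPv6 pseudo-header and the received data – is only declared here; it is sketched after the pseudo-header discussion further below. None of this is a published API; the function names and length thresholds are mine.

// Sketch of the Next Header heuristics described above. "data"/"length" are the
// bytes returned by the raw socket (i.e. everything after the last IPv6 header);
// src/dst are the addresses used to build the pseudo-header.
#include <cstdint>
#include <winsock2.h>
#include <ws2tcpip.h>

// One's-complement checksum over the pseudo-header and the received data;
// sketched later, after the pseudo-header discussion.
bool TransportChecksumOk(uint8_t nextHeader, const uint8_t* data, uint32_t length,
                         const IN6_ADDR& src, const IN6_ADDR& dst);

// One's-complement checksum over an IPv4 header alone (no pseudo-header).
static bool Ipv4HeaderChecksumOk(const uint8_t* data, uint32_t length)
{
    uint32_t headerLen = (data[0] & 0x0F) * 4u;
    if (headerLen < 20 || headerLen > length)
        return false;
    uint32_t sum = 0;
    for (uint32_t i = 0; i < headerLen; i += 2)
        sum += (uint16_t)((data[i] << 8) | data[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return sum == 0xFFFF;
}

// Returns the inferred final Next Header value, or 255 (Reserved) if unknown.
uint8_t InferNextHeader(const uint8_t* data, uint32_t length,
                        const IN6_ADDR& src, const IN6_ADDR& dst)
{
    uint16_t udpLength = (length >= 8) ? (uint16_t)((data[4] << 8) | data[5]) : 0;

    if (length >= 8 && udpLength == length &&
        TransportChecksumOk(IPPROTO_UDP, data, length, src, dst))
        return IPPROTO_UDP;                                     // 17
    if (length >= 20 && TransportChecksumOk(IPPROTO_TCP, data, length, src, dst))
        return IPPROTO_TCP;                                     // 6
    if (length >= 4 && TransportChecksumOk(IPPROTO_ICMPV6, data, length, src, dst))
        return IPPROTO_ICMPV6;                                  // 58
    if (length >= 8 && udpLength == length)
        return IPPROTO_UDP;                                     // UDP with a bad checksum
    if (length >= 20 && (data[0] >> 4) == 4 && Ipv4HeaderChecksumOk(data, length))
        return IPPROTO_IPV4;                                    // 4: IPv4-in-IPv6
    if (length >= 40 && (data[0] >> 4) == 6 &&
        (uint32_t)((data[4] << 8) | data[5]) + 40u == length)
        return IPPROTO_IPV6;                                    // 41: IPv6-in-IPv6
    if (length >= 8 && data[0] == IPPROTO_UDP && data[1] == 0)
        return IPPROTO_FRAGMENT;                                // 44: Fragment Header
    return 255;                                                 // Reserved - easy to spot
}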
To verify a checksum, one repeats the checksum calculation with the received Checksum field itself included in the data being summed; a good checksum delivers a result of 0xFFFF (or 0, if the final one’s complement is taken). In addition to the transport data, the checksum also covers an IPv6 pseudo-header:
IPv6 pseudo-header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                         Source Address                        +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                      Destination Address                      +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Upper-Layer Packet Length                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      zero                     |  Next Header  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
We know the Upper-Layer Packet Length and the Source Address
and we are guessing the Next Header value, but we are still missing the
Destination Address. The Destination Address is available if WSARecvMsg is used to receive messages from the raw socket (via
the Control field of a WSAMSG struct (WSACMSGHDR cmsg_type = IPV6_PKTINFO)). An alternative approach is to
create an initial set of possible addresses by examining various networking
tables: the TCP connections table, the destination cache table, the neighbours
table and the local addresses of all network interfaces; all received Source
Addresses are also merged into the set. The checksum can then be verified against each of these candidate addresses in turn.
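Obtaining the Destination Address via WSARecvMsg looks roughly like the sketch below (C++): the extension function pointer is fetched once with SIO_GET_EXTENSION_FUNCTION_POINTER, the IPV6_PKTINFO socket option is enabled, and the destination address (and interface index) then arrives in an IN6_PKTINFO control message alongside each packet. The function name and the minimal error handling are mine.

// Sketch: receive from an IPv6 raw socket with WSARecvMsg so that the
// destination address of each packet is available via an IPV6_PKTINFO
// control message.
#include <winsock2.h>
#include <ws2tcpip.h>
#include <mswsock.h>    // WSAID_WSARECVMSG

#pragma comment(lib, "ws2_32.lib")

bool ReceiveWithPktInfo(SOCKET s)
{
    // Ask the stack to attach packet information to each received datagram.
    DWORD on = 1;
    setsockopt(s, IPPROTO_IPV6, IPV6_PKTINFO, (const char*)&on, sizeof(on));

    // WSARecvMsg is an extension function; fetch its pointer once.
    LPFN_WSARECVMSG pWSARecvMsg = NULL;
    GUID guid = WSAID_WSARECVMSG;
    DWORD bytes = 0;
    if (WSAIoctl(s, SIO_GET_EXTENSION_FUNCTION_POINTER, &guid, sizeof(guid),
                 &pWSARecvMsg, sizeof(pWSARecvMsg), &bytes, NULL, NULL) != 0)
        return false;

    char data[65536];
    char control[WSA_CMSG_SPACE(sizeof(IN6_PKTINFO))];
    sockaddr_in6 from = {};

    WSABUF buf = { sizeof(data), data };
    WSAMSG msg = {};
    msg.name = (sockaddr*)&from;               // Source Address
    msg.namelen = sizeof(from);
    msg.lpBuffers = &buf;
    msg.dwBufferCount = 1;
    msg.Control.buf = control;
    msg.Control.len = sizeof(control);

    DWORD received = 0;
    if (pWSARecvMsg(s, &msg, &received, NULL, NULL) != 0)
        return false;

    // Walk the control messages looking for the packet information.
    for (WSACMSGHDR* cmsg = WSA_CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = WSA_CMSG_NXTHDR(&msg, cmsg))
    {
        if (cmsg->cmsg_level == IPPROTO_IPV6 && cmsg->cmsg_type == IPV6_PKTINFO)
        {
            IN6_PKTINFO* info = (IN6_PKTINFO*)WSA_CMSG_DATA(cmsg);
            // info->ipi6_addr is the Destination Address, info->ipi6_ifindex the interface.
            (void)info;
        }
    }
    return true;
}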
Because a “partial” checksum of the received data and known
values from the pseudo-header can be calculated once (and partial checksums for
each of the possible Destination Addresses can be cached), verifying the
complete checksum just involves adding two values and folding back in any carry
– which can be done very quickly.
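A sketch of the checksum verification along those lines is shown below (C++): PartialSum covers the received data plus the pseudo-header fields that do not depend on the Destination Address, the per-address sums can be cached, and completing a verification is then little more than an addition and a fold. This is the TransportChecksumOk helper assumed by the heuristics sketch above; again, the names are mine.

// Sketch: verify a transport checksum over the IPv6 pseudo-header plus the
// received data. The one's-complement sum over everything, including the
// Checksum field itself, must come to 0xFFFF for a good checksum.
#include <cstdint>
#include <winsock2.h>
#include <ws2tcpip.h>

static uint32_t Sum16(const uint8_t* p, uint32_t length, uint32_t sum = 0)
{
    for (uint32_t i = 0; i + 1 < length; i += 2)
        sum += (uint16_t)((p[i] << 8) | p[i + 1]);
    if (length & 1)                       // odd trailing byte is padded with zero
        sum += (uint16_t)(p[length - 1] << 8);
    return sum;
}

static uint16_t Fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

// Partial sum over the received data and the pseudo-header fields that do not
// depend on the Destination Address; computed once per packet.
uint32_t PartialSum(uint8_t nextHeader, const uint8_t* data, uint32_t length,
                    const IN6_ADDR& src)
{
    uint32_t sum = Sum16(data, length);
    sum = Sum16(src.u.Byte, 16, sum);
    sum += (length >> 16) + (length & 0xFFFF);   // Upper-Layer Packet Length (32 bits)
    sum += nextHeader;                           // three zero bytes + Next Header
    return sum;
}

// The Sum16 of each candidate Destination Address can be cached; completing a
// verification is then just one addition and a fold.
bool TransportChecksumOk(uint8_t nextHeader, const uint8_t* data, uint32_t length,
                         const IN6_ADDR& src, const IN6_ADDR& dst)
{
    uint32_t sum = PartialSum(nextHeader, data, length, src) + Sum16(dst.u.Byte, 16);
    return Fold(sum) == 0xFFFF;
}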
False matches (of Next Header and Destination Address
against the Checksum) are possible, but I have been happy with the results.
PktMon
PktMon is a relatively new packet capture
technique. Microsoft has introduced hooks into NDIS.sys to support this type of
logging. Some typical stack traces at the point that captured data is passed to
the PktMon.sys driver show how and where the hooks are integrated into
NDIS.sys:
PktMon!PktMonPacketLogCallback+0x19
ndis!PktMonClientNblLog+0xbd
ndis!PktMonClientNblLogNdis+0x2b
ndis!ndisCallSendHandler+0x3ca4b
ndis!ndisInvokeNextSendHandler+0x10e
ndis!NdisSendNetBufferLists+0x17d

PktMon!PktMonPacketLogCallback+0x19
ndis!PktMonClientNblLog+0xbd
ndis!PktMonClientNblLogNdis+0x2b
ndis!ndisMIndicateNetBufferListsToOpen+0x3e95c
ndis!ndisMTopReceiveNetBufferLists+0x1bd
ndis!ndisCallReceiveHandler+0x61
ndis!ndisInvokeNextReceiveHandler+0x1df
ndis!ndisFilterIndicateReceiveNetBufferLists+0x3be91
ndis!NdisFIndicateReceiveNetBufferLists+0x6e
This technique allows the data to be captured at many points
in the NDIS protocol stack (the same packet can be captured and recorded at
more than one point in the stack), but a simple configuration allows each packet to be captured just once.
Enabling and disabling tracing does not involve rebinding
the NDIS protocol stack, which is an improvement over the NDIS filter approach
to tracing.
This capture technique does not capture “loopback” traffic
(for the same reasons that NDIS filters are unable to capture such traffic).
Unlike Microsoft’s NdisCap NDIS
filter driver and Microsoft-Windows-NDIS-PacketCapture
ETW provider, the ETW provider associated with PktMon
(Microsoft-Windows-PktMon) does record the original
payload size of packets that are truncated. NdisCap
captures large TCP sends with an IP “pseudo” header containing an IP length of
zero; since there is no record of the original payload size and the size cannot be deduced from the pseudo IP header, analysis tools (such as Wireshark) are unable to determine whether IP packets are missing from the captured data.
PktMon provides better filtering
options than those supported by NdisCap/Microsoft-Windows-NDIS-PacketCapture; the filters are set not via ETW but rather via IOCTLs sent to the PktMon driver.