Sunday, 9 January 2022

Windows 11 TCP/IP Congestion Control Improvements

I have written about weaknesses in the Windows 10 TCP/IP CUBIC congestion control mechanism both in this blog (Slow performance of IKEv2 built-in client VPN under Windows) and in a number of threads on the Microsoft Q&A site (threads that typically report slow upload speed under Windows but good speed under other operating systems); I also mentioned some links suggesting that improvements were under development.

The improvements did not surface in any mainstream Windows 10 version of which I am aware; however, Windows 11 does seem to incorporate substantial changes (at least an improved RFC 8985 (The RACK-TLP Loss Detection Algorithm for TCP) implementation with per-segment tracking), albeit with at least one minor bug that still limits performance (the bug is still present in Windows 11 build 22000.376).

The “bug” that I am referring to is a “sanity check” in the routine TcpReceiveSack. The problematic check is that if the SACK SLE (SACK Left Edge) is less than the acknowledgement number (i.e. the block appears to be a D-SACK), then the SACK SRE (SACK Right Edge) must also be less than the acknowledgement number; this appears to be coded incorrectly, with the result that genuine D-SACKs are not recognized as such. Because D-SACKs go unrecognized, tuning of things such as the “reorder window” does not take place.
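
For context, a rough C sketch of how such a check reads when written out (illustrative only: the names are invented and this is my reading of the check described above, not the actual TcpReceiveSack code):

    #include <stdbool.h>
    #include <stdint.h>

    /* Modulo-2^32 sequence-number comparisons, as used throughout TCP. */
    #define SEQ_LT(a, b)   ((int32_t)((a) - (b)) < 0)
    #define SEQ_LEQ(a, b)  ((int32_t)((a) - (b)) <= 0)

    /* Classify one SACK block against the cumulative acknowledgement.
     * Per RFC 2883, a block whose left edge lies below the cumulative ACK
     * reports already-acknowledged (duplicate) data, i.e. a D-SACK; the
     * sanity check then requires that the right edge also does not extend
     * beyond the cumulative ACK, otherwise the block is rejected. */
    static bool SackBlockIsDsack(uint32_t cumAck, uint32_t sle, uint32_t sre)
    {
        if (!SEQ_LT(sle, cumAck))
            return false;               /* ordinary SACK block */

        return SEQ_LEQ(sre, cumAck);    /* genuine D-SACK only if it passes
                                           the right-edge sanity check */
    }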

The initial size of the “reorder window” (RACK.reo_wnd) is RACK.min_RTT/4 and it can grow to a maximum size of SRTT (Smoothed Round Trip Time). For many configurations and degrees of packet reordering, the default size eliminates many unnecessary retransmissions. However, in my test configuration (TCP inside an IKEv2 tunnel between two systems on the same LAN, with substantial reordering introduced by the network adapter driver and a very low min_RTT (less than one millisecond)), the default reorder window size is completely inadequate.
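
To make the sizing concrete, here is a simplified C rendering of the reordering window computation described in RFC 8985 (RACK_update_reo_wnd); the variable names follow the RFC, the structure is invented for illustration, and the D-SACK-driven growth of the multiplier (the adaptation that the bug above prevents from ever triggering) is omitted:

    #include <stdbool.h>
    #include <stdint.h>

    #define DUP_THRESH 3            /* classic duplicate-ACK threshold */

    /* Illustrative per-connection state; field names follow RFC 8985. */
    struct rack_state {
        uint32_t min_rtt_us;        /* RACK.min_RTT (microseconds)      */
        uint32_t srtt_us;           /* smoothed round-trip time         */
        uint32_t reo_wnd_mult;      /* RACK.reo_wnd_mult, starts at 1;
                                       grows only when D-SACKs are seen */
        bool     reordering_seen;   /* RACK.reordering_seen             */
        bool     in_loss_recovery;  /* in fast or RTO recovery          */
        uint32_t segs_sacked;       /* segments SACKed in this window   */
    };

    static uint32_t rack_update_reo_wnd(const struct rack_state *r)
    {
        /* If reordering has never been observed, fall back to the classic
         * three-duplicate-ACK behaviour and use a zero reordering window. */
        if (!r->reordering_seen &&
            (r->in_loss_recovery || r->segs_sacked >= DUP_THRESH))
            return 0;

        /* Otherwise start at min_RTT/4, scaled by the D-SACK-driven
         * multiplier, and never exceed the smoothed RTT. */
        uint32_t reo_wnd = (r->min_rtt_us / 4) * r->reo_wnd_mult;
        return reo_wnd < r->srtt_us ? reo_wnd : r->srtt_us;
    }

With a minimum RTT below one millisecond, the initial window (min_RTT/4 with a multiplier of 1) is under 250 microseconds, and because unrecognized D-SACKs never grow the multiplier, it stays that small.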

A one-byte patch to tcpip.sys can (more or less) “correct” this problem (the patch does not handle sequence number wrap-around), making it possible to compare performance with and without the patch. This patching is risky (it will eventually result in a bug check when the change is detected by PatchGuard), but I judged the value of the performance test results to be worth the inconvenience.

What follows are traces of two 10 MB “discards” (TCP port 9), with and without the patch:


With the patch (above), the 10 MB discard takes about 2.4 seconds and, without the patch (below), it takes about 4.2 seconds.

The green line (the congestion window size) is reduced more without the patch, although there is less “reordering” than in the patched test. The yellow line is the send window size (auto-tuned by the server) and is indeed approximately the minimum window size needed to achieve the theoretical maximum throughput.

The dots on and below the x-axis are points at which “interesting” events occur; from the x-axis downwards:

· Packet reordering
· TcpEarlyRetransmit (Forward ACK (FACK), Recent ACK (RACK))
· TcpLossRecoverySend Fast Retransmit
· TcpLossRecoverySend SACK Retransmit
· TcpDataTransferRetransmit
· TcpTcbExpireTimer
· D-SACK
· TcpSendTrackerDetectLoss
· TcpSendTrackerSackReorderingDetected
· TcpSendTrackerUpdateReoWnd

A graph of Windows 11 trace data gathered by someone with a “real world” setup (a minimum RTT of about 9 milliseconds, SMB file copy to generate the traffic, and no patching) looks like this:



In this case, the initial reordering window (min_RTT/4, so roughly 2.25 milliseconds with a minimum RTT of about 9 milliseconds) was large enough to allow almost all reordering to be detected (and therefore to avoid spurious retransmissions).

The trace where D-SACK recognition is working shows that the congestion window reduction following a retransmission is not “undone” when a D-SACK shows the retransmission to have been spurious; this may mean that Windows 11 will not perform as well as some other operating systems when sending TCP data over a path where some packet reordering is present.
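
For comparison, the kind of “undo” that D-SACK recognition enables in some other TCP stacks looks roughly like this (purely illustrative C: the structure and names are invented, and this is not Windows code or any particular implementation):

    #include <stdint.h>

    /* Illustrative connection state; not any real stack's structures. */
    struct tcp_conn {
        uint32_t cwnd, ssthresh;             /* current values             */
        uint32_t prior_cwnd, prior_ssthresh; /* saved before the reduction */
    };

    /* If a D-SACK proves that a retransmission (and hence the congestion
     * window reduction that accompanied it) was spurious, restore the
     * pre-reduction values. The Windows 11 traces show that this kind of
     * undo does not happen. */
    static void undo_cwnd_on_spurious_retransmit(struct tcp_conn *tp)
    {
        tp->cwnd     = tp->prior_cwnd;
        tp->ssthresh = tp->prior_ssthresh;
    }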

There are two Microsoft-Windows-TCPIP events that indicate whether D-SACKs are being recognized. In the TcpReceiveSack event, there is a DSackCount member that accurately reflects the number of D-SACKs (as observed in the raw packet data) when D-SACK recognition is working. In the TcpSendTrackerUpdateReoWnd event, there are several members that reflect the state of the RACK algorithm: Multiplier (RACK.reo_wnd_mult), Persist (RACK.reo_wnd_persist), Reownd (RACK.reo_wnd), ReorderingSeen (RACK.reordering_seen), DSackSeenOnLastAck, and DSackRound/DSackRoundValid (RACK.dsack_round); when D-SACK recognition is working and D-SACKs are present, this event contains meaningful data.

October 2022 Update: Windows 11 22H2 corrects the D-SACK recognition problem.


5 comments:

  1. What tool do you use to generate these graphs?

    1. The tool is a simple home-made program, using the .NET System.Windows.Forms.DataVisualization.Charting classes for the graphs.

  2. Excellent post again Gary.

    I wonder if it would be a viable option to just replace parts of the Windows 10 (or 11) TCPIP stack for a performance boost.

    Maybe a legit driver setup with an alternative network device.
    Hopefully wouldn't take some hacky setup that relies on disabling PatchGuard etc., but maybe an okay tradeoff still.

  3. Very interesting, thanks. One question: How did you get the data, i.e., the internal TCP states like cwnd and events shown in your graphs?

    1. Hello Joerg/Jörg, the Microsoft-Windows-TCPIP Event Tracing for Windows (ETW) provider makes this information available. There are lots of tools (built-in and third party) that allow ETW to be controlled; logman, pktmon, "netsh trace", several PowerShell cmdlets and perhaps more are the built-in mechanisms. Gary
