Tuesday 18 May 2021

Network Discovery and Name Resolution under Windows 10 in a Home Network (zero-configuration networking)

Network discovery (for example, the “discovery” performed by the “Network” item in the navigation pane on the left of “File Explorer” under Windows 10) in a home network should “just work”, in the sense of discovering and displaying the network devices that are known to be in the home network. However, one often reads in technical support forums that “network discovery” is not working to some extent. Sometimes this results from outdated expectations (for example, that the “net view” command is the full extent of “network discovery”), but sometimes it results from old network equipment that does not support newer discovery mechanisms, or from network equipment that has been configured not to respond to network discovery requests (perhaps for security reasons).

Let’s first consider how “network discovery” works and what can be done to influence its behaviour.

The Microsoft interface IFunctionDiscovery is the entry point for performing network discovery in the same style as File Explorer. The method CreateInstanceCollectionQuery of this interface is called first, with either a “layered category” (e.g. “Layered\Microsoft.Networking.Devices”), which uses a collection of providers appropriate to the layer, or a “provider category” (e.g. “Provider\Microsoft.Networking.WSD”), which uses a specific provider/technology/protocol.

Some of the providers that are relevant to discovering networking devices are:

Provider\Microsoft.Networking.WSD
Provider\Microsoft.Networking.SSDP
Provider\Microsoft.Networking.Netbios

Network discovery can take some time, so the method that executes the discovery normally returns a “pending” status (E_PENDING) and delivers discovery results to its caller asynchronously (as they happen). The main work of discovery is performed in the “Function Discovery Provider Host” (fdPHost) service.

One piece of advice that one often sees on the Internet is to ensure that Windows services used in the discovery process are running and/or configured to run. This is not something that I would recommend. The relevant services (e.g. fdPHost, FDResPub, SSDPSRV) are normally configured as “demand” start; some may also include “trigger” configuration (e.g. FDResPub triggers on specific event values of the Microsoft-Windows-NetworkProfileTriggerProvider ETW provider); some are defined as “dependencies” for other services; some services explicitly start other services. The ability of a service to operate is also often dependent on Windows Firewall rules (that are also actively maintained and changed as system events occur). Manual interference should be a last step, guided by evidence that there is actually a misconfiguration, rather than a first/early troubleshooting step.

The progress of network discovery can be followed using ETW. The providers Microsoft-Windows-FunctionDiscovery, Microsoft-Windows-WFP (to check for firewall packet drops) and Microsoft-Windows-PktMon (or equivalent, to observe the actual network protocol interactions) are often a good combination.

Web Services Dynamic Discovery (WS-Discovery or WSD)

The Microsoft.Networking.WSD provider is the provider most likely to detect computers and file servers on the home network. During the discovery operation, the fdPHost service sends WSD Probe messages to the WSD IPv4 and IPv6 multicast addresses defined by the WSD protocol. If and when fdPHost receives a ProbeMatch message, it sends a Get request to the responder (via TCP) to obtain a Get response. In the case of Windows computers, the responder is the FDResPub (Function Discovery Resource Publication) service.
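
To make the Probe step more concrete, here is a minimal Python sketch that builds a Probe message and multicasts it over IPv4. It is only an illustration, not what fdPHost actually does internally. It assumes the April 2005 WS-Discovery namespaces, UDP port 3702 and the multicast group 239.255.255.250; the function names are my own, and replies will only arrive if the local firewall allows them.

```python
import socket
import uuid
import xml.etree.ElementTree as ET

# Namespaces of the April 2005 WS-Discovery draft (an assumption of this sketch).
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
WSA_NS = "http://schemas.xmlsoap.org/ws/2004/08/addressing"
WSD_NS = "http://schemas.xmlsoap.org/ws/2005/04/discovery"

WSD_PORT = 3702
WSD_IPV4_GROUP = "239.255.255.250"


def build_probe(message_id=None):
    """Return a WS-Discovery Probe (no type or scope filter) as UTF-8 bytes."""
    if message_id is None:
        message_id = "urn:uuid:" + str(uuid.uuid4())
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    to = ET.SubElement(header, f"{{{WSA_NS}}}To")
    to.text = "urn:schemas-xmlsoap-org:ws:2005:04:discovery"
    action = ET.SubElement(header, f"{{{WSA_NS}}}Action")
    action.text = WSD_NS + "/Probe"
    ET.SubElement(header, f"{{{WSA_NS}}}MessageID").text = message_id
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    ET.SubElement(body, f"{{{WSD_NS}}}Probe")
    return ET.tostring(envelope, encoding="utf-8")


def send_probe_ipv4(timeout=3.0):
    """Multicast one Probe; collect replies until none arrives within `timeout`."""
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # TTL 1 keeps the probe on the local link.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.sendto(build_probe(), (WSD_IPV4_GROUP, WSD_PORT))
        try:
            while True:
                data, addr = sock.recvfrom(65535)
                replies.append((addr[0], data))
        except socket.timeout:
            pass
    return replies
```

Calling send_probe_ipv4() returns a list of (sender address, raw ProbeMatch bytes) pairs; the subsequent Get request over TCP is not shown.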

The key information in the Get response is contained within the wsdp:Relationship/wsdp:Host/pub:Computer element. As the [MS-PBSD] document says, if the computer is domain joined then the value will be of the form “<NetBIOS_Computer_Name>/Domain:<NetBIOS_Domain_Name>”, if the computer is in a workgroup then the value will have the form “<NetBIOS_Computer_Name>\Workgroup:<Workgroup_Name>”, otherwise it will have the form “<NetBIOS_Computer_Name>\NotJoined”.
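
The three forms above are easy to tell apart mechanically. The following Python sketch (the names JoinInfo and parse_pub_computer are hypothetical, chosen just for this illustration) splits a pub:Computer value into the computer name, the join state and the domain/workgroup name. It accepts either “/” or “\” as the separator after the computer name, since both appear in the forms quoted above.

```python
import re
from typing import NamedTuple, Optional


class JoinInfo(NamedTuple):
    computer: str
    state: str            # "Domain", "Workgroup" or "NotJoined"
    group: Optional[str]  # domain or workgroup name; None if not joined


# Computer name, then "/" or "\", then the remainder of the value.
_PATTERN = re.compile(r"^(?P<computer>[^/\\]+)[/\\](?P<rest>.+)$")


def parse_pub_computer(value):
    """Parse the text of a pub:Computer element into a JoinInfo tuple."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised pub:Computer value: {value!r}")
    computer, rest = match.group("computer"), match.group("rest")
    if rest == "NotJoined":
        return JoinInfo(computer, "NotJoined", None)
    state, sep, group = rest.partition(":")
    if sep and state in ("Domain", "Workgroup") and group:
        return JoinInfo(computer, state, group)
    raise ValueError(f"unrecognised pub:Computer value: {value!r}")


if __name__ == "__main__":
    # The computer and workgroup names here are made-up examples.
    print(parse_pub_computer("OLDPC\\NotJoined"))
    print(parse_pub_computer("DESKTOP-1\\Workgroup:WORKGROUP"))
```

A value that parses with state “NotJoined” in a network trace is a useful hint when a computer is discovered but not displayed.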

Network Discovery via IFunctionDiscovery finds all of these variants and File Explorer displays all of the results that represent domain joined or workgroup computers, but it does not display computers that report “not joined”. FDResPub uses the NetGetJoinInformation API to obtain workgroup/domain information; it normally obtains the information when the service starts, so if the LanmanWorkstation service (which serves the NetGetJoinInformation request) has not (completely) started when FDResPub calls NetGetJoinInformation, then the published information will state that the computer is “not joined”. 

A workaround for the above problem is to make the FDResPub service depend on the LanmanWorkstation service, so that FDResPub is not started until LanmanWorkstation is running. The problem could be called a “bug” and it has a simple source code fix. FDResPub calls NetGetJoinInformation specifying the name of the local computer as the system for which the information should be retrieved. If NetGetJoinInformation fails with RPC_S_SERVER_UNAVAILABLE and a system name was specified, then a failure code (NERR_WkstaNotStarted) is returned to the caller; but if no system name was specified (a null was passed as parameter, implying the local system), then NetGetJoinInformation uses other local mechanisms to obtain join information and returns a success code to the caller.
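
One way to apply the dependency workaround is with “sc.exe config ... depend= ...”. Because the “depend=” option replaces the whole dependency list, the existing dependencies must be preserved and the new one appended. The Python sketch below illustrates this under a few assumptions: the helper names are my own, the existing dependencies are read from the service's DependOnService registry value, and the resulting command must be run from an elevated prompt on Windows.

```python
import subprocess


def read_dependencies(service):
    """Read the DependOnService value of a service (Windows only)."""
    import winreg  # imported lazily so the module still loads elsewhere
    key_path = rf"SYSTEM\CurrentControlSet\Services\{service}"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        try:
            value, _value_type = winreg.QueryValueEx(key, "DependOnService")
        except FileNotFoundError:
            return []  # the service has no dependencies configured
    return list(value)


def add_dependency_command(service, current_deps, new_dep):
    """Build an sc.exe command that keeps current_deps and adds new_dep."""
    deps = list(current_deps)
    if new_dep.lower() not in (d.lower() for d in deps):
        deps.append(new_dep)
    # sc.exe expects the dependency names separated by "/".
    return ["sc.exe", "config", service, "depend=", "/".join(deps)]


if __name__ == "__main__":
    current = read_dependencies("FDResPub")
    command = add_dependency_command("FDResPub", current, "LanmanWorkstation")
    print("Running:", " ".join(command))
    subprocess.run(command, check=True)
```

It is worth noting the original dependency list before making the change, so that it can be restored if necessary.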

This discovery mechanism should discover all devices (Windows, Apple, Linux, Network Attached Storage (NAS), etc.) that support WS-Discovery, have a WS-Discovery publisher service running and are not blocking WS-Discovery messages via firewall mechanisms.

For Windows systems, the “Network and Sharing Centre, Advanced sharing settings” dialog (on each Windows system in the home network) should be the only thing that needs to be checked to ensure that network discovery is correctly configured.

Simple Service Discovery Protocol (SSDP)

The Microsoft.Networking.SSDP provider “discovers” most of the printers, scanners, displays, etc. in the home network. The SSDPSRV service periodically multicasts SSDP M-SEARCH requests and observes SSDP NOTIFY announcements. When network discovery is started, fdPHost retrieves a list of responses from SSDPSRV via RPC. It then retrieves detailed information about each service by querying the Location URL in the SSDP response. For services hosted on Windows systems (perhaps directly attached printers, music and video libraries, etc.), the upnphost (UPnP Device Host) service is normally the process that is listening at the Location URL.
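
For readers who want to see what these messages look like, here is a small Python sketch that builds an M-SEARCH request and extracts the Location URL from a response. It assumes the standard SSDP multicast group 239.255.255.250 and UDP port 1900; the function names and the default search target “ssdp:all” are choices made for this illustration and are not necessarily what SSDPSRV sends.

```python
SSDP_GROUP = "239.255.255.250"
SSDP_PORT = 1900


def build_msearch(search_target="ssdp:all", mx=2):
    """Return an SSDP M-SEARCH request as bytes, ready to send via UDP."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_GROUP}:{SSDP_PORT}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",            # maximum seconds a responder may delay its reply
        f"ST: {search_target}",
        "",
        "",                      # blank line terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_location(message):
    """Return the LOCATION header of an SSDP response or NOTIFY, or None."""
    text = message.decode("utf-8", errors="replace")
    for line in text.split("\r\n")[1:]:  # skip the start line
        if not line:
            break                        # end of headers
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            return value.strip()
    return None
```

The URL returned by parse_location is the one that would then be fetched (via HTTP) to obtain the device description.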

NetBIOS

The Microsoft.Networking.Netbios provider essentially performs a classic “net view” command, using the WNetOpenEnum/WNetEnumResource/WNetCloseEnum API.

A prerequisite for this discovery mechanism is that NetBIOS over TCP/IP is enabled. By default, the relevant setting is set to “Use NetBIOS setting from the DHCP server. If static IP address is used or the DHCP server does not provide NetBIOS setting, enable NetBIOS over TCP/IP”.

If SMBv1 is installed, then this method should produce the classically expected results. If SMBv1 is not installed/enabled, then this discovery method will only work if the computer has been elected as the “Master Browser” of a workgroup.

If the local computer is not the Master Browser, then the local computer will try to negotiate a connection with the Master Browser. Normally, the newest SMB protocol version available to both parties will be negotiated – typically SMBv3. From a network trace perspective, it seems as though the negotiation has been concluded successfully, but post processing by the client causes the connection to be disconnected.

The stack on the client (local computer) when a disconnection is initiated looks like this:

mrxsmb!SmbCeDisconnectServerConnections+0x2d6:
mrxsmb20!MRxSmb2HandOverSrvCall+0x2054:
mrxsmb!SubRdrClaimSrvCall+0x90:
mrxsmb!SmbCeCompleteSrvCallConstructionPhase2+0x146:
mrxsmb!SmbCeCompleteServerEntryInitialization+0x176:
mrxsmb!SmbCeCompleteNegotiatedConnectionEstablishment+0x155:
mrxsmb!SmbNegotiate_Finalize+0x5b:

Some code in mrxsmb20!MRxSmb2HandOverSrvCall decides that a disconnect is necessary, and a quick look at that routine shows that the condition is ConnectionType == Tdi. Possible values for ConnectionType are Tdi (TDI - Transport Driver Interface), Wsk (Winsock Kernel), Rdma (Remote Direct Memory Access) and VMBUS.

TDI is a deprecated technology and is used by "NetBIOS over TCP/IP" (netbt.sys). It seems as though the client will refuse to use SMBv2/3 in conjunction with "NetBIOS over TCP/IP".

If the local computer is the Master Browser, it has access to the list of servers via local mechanisms and the results are made available to the user of IFunctionDiscovery. Users of IFunctionDiscovery, such as Windows File Explorer, typically recognize that some systems have been discovered by more than one mechanism (perhaps WSD and NetBIOS) and display just a single entry for such systems in their user interface.

Name Resolution

If network discovery fails to discover some resource (for example, a file server), it may still be possible to reference the resource by name (rather than by IP address; IP addresses are typically not permanently assigned but rather leased, so it is difficult to be certain of the IP address in the long term in a home network). Name resolution uses different protocols to network discovery and these may well work, even if discovery has failed.

Windows uses 3 mechanisms to resolve names: multicast DNS (mDNS), Link-Local Multicast Name Resolution (LLMNR) and NetBIOS Name Service (NBNS). Name resolution via all applicable mechanisms is normally started in parallel (i.e. the mechanisms are not tried sequentially, waiting for one method to fail before the next is tried). If NetBIOS over TCP/IP is disabled or the name being queried is not NetBIOS compatible (e.g. it is longer than 15 characters) then the NetBIOS Name Service resolution method is not used.
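
The selection and parallel start of the mechanisms can be sketched in Python as follows. This is only a model of the behaviour described above, not how the Windows resolver is implemented: it checks just the two conditions mentioned (NetBIOS over TCP/IP enabled, name no longer than 15 characters), and the resolver functions passed in are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

NETBIOS_MAX_NAME_LENGTH = 15


def applicable_resolvers(name, netbios_over_tcpip_enabled=True):
    """Return the names of the mechanisms that would be tried for `name`."""
    resolvers = ["mDNS", "LLMNR"]
    if netbios_over_tcpip_enabled and len(name) <= NETBIOS_MAX_NAME_LENGTH:
        resolvers.append("NBNS")
    return resolvers


def resolve_in_parallel(name, resolver_funcs, netbios_over_tcpip_enabled=True):
    """Start every applicable resolver at once and gather all the results.

    `resolver_funcs` maps a mechanism name to a callable taking the name to
    resolve; these callables are stand-ins supplied by the caller.
    """
    wanted = applicable_resolvers(name, netbios_over_tcpip_enabled)
    with ThreadPoolExecutor(max_workers=len(wanted)) as pool:
        # All queries are submitted before any result is awaited.
        futures = {m: pool.submit(resolver_funcs[m], name) for m in wanted}
        return {m: future.result() for m, future in futures.items()}
```

For example, applicable_resolvers("a-rather-long-server-name") omits “NBNS” because the name is longer than 15 characters.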

1 comment:

  1. Home user here (probably not your target audience). But this just helped me to fix an issue on my Windows 11 home network that had been confounding me for weeks! Most of the PC’s could see each other just fine in File Explorer, but one simply refused to show up anywhere, despite trying literally every trick on the Internet and even reinstalling Windows on the “invisible” PC. Eventually, digging into Wireshark data showed that this machine was reporting as “NotJoined”, and Googling that led me here. Set fdResPub to depend on Lanmanworkstation on the offending machine and all is now well. The formerly invisible PC is significantly older than the other W11 boxes - technically doesn’t even meet W11 requirements - so I can only surmise that perhaps Lanmanworkstation was too slow to start before fdResPub queried it. Thanks Gary!
