The Microsoft article “Acquiring high-resolution time stamps” provides a good deal of useful information about high-resolution timing under Windows, and Simon Anciaux provided the source code of a routine that matches the Windows 11 QueryPerformanceCounter routine (see “QueryPerformanceFrequency returning 10mhz bug”).
Nonetheless, there is more that can be said that might provide additional reassurance that one’s understanding of high-resolution timing is accurate.
There is a Windows kernel variable named HalpRegisteredTimers
that points to a linked list of registered timers; the routine that registers
the timers is passed a pointer to a TIMER_INITIALIZATION_BLOCK (whose format is
defined in file nthalext.h). On my PC, 8 timers appear in the list and one can
infer some interesting values from the information in the calls to register
timers.
The KnownType values are members of the enum KNOWN_TIMER_TYPE (also defined in nthalext.h). In case the names are not immediately recognizable, here are some brief descriptions:
TimerProcessor: the processor Time-Stamp Counter (TSC).
TimerART: the processor Always Running Timer (ART), perhaps most commonly encountered in conjunction with “Intel Processor Trace”.
TimerApic: the Advanced Programmable Interrupt Controller (APIC) timer.
TimerAcpi: the Advanced Configuration and Power Interface (ACPI) power management timer.
TimerCmosRtc: the Real Time Clock (RTC).
TimerHpet: the High Precision Event Timer (HPET).
| KnownType | Identifier | CounterBitWidth | CounterFrequency (Hz) | Capabilities |
| --- | --- | --- | --- | --- |
| TimerProcessor | 0 | 64 | 1992000000 | TIMER_PER_PROCESSOR |
| TimerART | 0 | 64 | 24000000 | TIMER_PER_PROCESSOR |
| TimerApic | 0 | 32 | 187500 | TIMER_PER_PROCESSOR |
| TimerAcpi | 0 | 24 | 3579545 | TIMER_COUNTER_READABLE |
| TimerCmosRtc | 0 | 64 | 2048 | TIMER_PSEUDO_PERIODIC_CAPABLE |
| TimerHpet | 0 | 32 | 24000000 | TIMER_COUNTER_READABLE |
| TimerHpet | 1 | 31 | 24000000 | TIMER_PSEUDO_PERIODIC_CAPABLE |
| TimerHpet | 2 | 31 | 24000000 | TIMER_PSEUDO_PERIODIC_CAPABLE |
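One value that can be inferred from the bit widths and frequencies is how long each counter could run before wrapping. The sketch below is only an illustration: it assumes every entry is a free-running counter that uses its full CounterBitWidth, which may not hold for the periodic timers, and the helper name wrap_seconds is made up for the example.

```python
# (KnownType, Identifier, CounterBitWidth, CounterFrequency in Hz),
# copied from the list of registered timers above.
TIMERS = [
    ("TimerProcessor", 0, 64, 1_992_000_000),
    ("TimerART", 0, 64, 24_000_000),
    ("TimerApic", 0, 32, 187_500),
    ("TimerAcpi", 0, 24, 3_579_545),
    ("TimerCmosRtc", 0, 64, 2_048),
    ("TimerHpet", 0, 32, 24_000_000),
    ("TimerHpet", 1, 31, 24_000_000),
    ("TimerHpet", 2, 31, 24_000_000),
]


def wrap_seconds(bit_width, frequency_hz):
    """Seconds until a free-running counter of this width wraps (assumption:
    the counter really is free-running and uses every bit)."""
    return (1 << bit_width) / frequency_hz


for name, ident, bits, hz in TIMERS:
    print(f"{name}[{ident}]: wraps after about {wrap_seconds(bits, hz):.3g} s")
```

Under that assumption the 24-bit ACPI PM timer wraps in under five seconds, so any code that measures against it has to allow for wraparound.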
When registering the
timers on my PC, the CounterFrequency of the TimerProcessor, TimerART and
TimerApic timers is specified as zero; actual values are determined by
comparing these timers to another timer whose nominal frequency is considered
to be reliable. Windows has a preference list of which timer should be used as
a reference and the first choice is TimerAcpi; on my PC, this timer is
available and is used. The timers are compared over a period of one eighth of a
second (125 milliseconds).
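The idea of measuring one counter against a reference can be sketched as follows. This is not the Windows implementation, just the underlying proportion; the function names and the sample readings are hypothetical, and the only values taken from the text are the ACPI PM timer's 24-bit width and 3579545 Hz nominal frequency.

```python
ACPI_PM_HZ = 3_579_545         # nominal ACPI PM timer frequency
ACPI_PM_MASK = (1 << 24) - 1   # the ACPI PM counter is 24 bits wide


def reference_ticks(start, end, mask=ACPI_PM_MASK):
    """Elapsed reference ticks, tolerating a single wrap of the counter."""
    return (end - start) & mask


def estimate_frequency(target_start, target_end, ref_start, ref_end,
                       ref_hz=ACPI_PM_HZ):
    """Estimate the target counter's frequency in Hz from paired readings."""
    ref_delta = reference_ticks(ref_start, ref_end)
    target_delta = target_end - target_start
    # Over the same interval: target_hz / ref_hz == target_delta / ref_delta
    return target_delta * ref_hz // ref_delta


# Hypothetical readings spanning roughly 125 ms (447443 reference ticks);
# the reference counter wraps past 2**24 during the interval.
ref_start = 16_700_000
ref_end = (ref_start + 447_443) & ACPI_PM_MASK
print(estimate_frequency(0, 248_999_850, ref_start, ref_end))
```

The resolution of such a measurement is limited by the reference: one reference tick out of about 447443 is roughly 2 parts per million, which is one reason why repeated runs give slightly different results.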
Section 19.7.3 (“Determining the Processor Base Frequency”) of Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3 contains a table of “Nominal Core Crystal Clock Frequency”; for my PC, the value is 24 MHz. The CPUID instruction with EAX set to 15H (Time Stamp Counter and Nominal Core Crystal Clock Information Leaf) returns 2 as the denominator and 166 as the numerator of the TSC/“core crystal clock” ratio; the nominal frequency of the core crystal clock in Hz is not enumerated on my CPU.
These values allow a nominal counter frequency for TimerProcessor to be calculated: 24000000 × 166 / 2 = 1992000000 Hz. The measured values (1991998774 and 1991998859) are the results of two runs of the measuring process against the ACPI PM timer.
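The arithmetic can be checked with a few lines of Python. The crystal frequency, the CPUID.15H ratio and the two measured values are the ones quoted above; the function names are only illustrative.

```python
from fractions import Fraction

CRYSTAL_HZ = 24_000_000   # from the Intel table; not enumerated by CPUID here
DENOMINATOR = 2           # CPUID.15H:EAX
NUMERATOR = 166           # CPUID.15H:EBX


def nominal_tsc_hz(crystal_hz, numerator, denominator):
    """Nominal TSC frequency: crystal clock scaled by the CPUID.15H ratio."""
    return crystal_hz * numerator // denominator


def deviation_ppm(measured_hz, nominal_hz):
    """Deviation of a measured frequency from nominal, in parts per million."""
    return float(Fraction(measured_hz - nominal_hz, nominal_hz) * 1_000_000)


nominal = nominal_tsc_hz(CRYSTAL_HZ, NUMERATOR, DENOMINATOR)
print(nominal)
for measured in (1_991_998_774, 1_991_998_859):
    print(measured, f"{deviation_ppm(measured, nominal):+.3f} ppm")
```

Both measurements are within one part per million of the nominal value.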
TimerART runs at the “Nominal Core Crystal Clock Frequency” (24000000 Hz); again, the measured values (24000038 and 24000039) are the results of two runs of the measuring process against the ACPI PM timer. There is no instruction to read the ART directly: Windows determines its value by evaluating (__rdtscp() - __rdmsr(IA32_TSC_ADJUST)) * CPUID.15H:EAX[31:0] / CPUID.15H:EBX[31:0]. Even so, its frequency is measured separately.
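The ART expression can be tried out with plain integers. The sketch below only mirrors the arithmetic of the expression quoted above; it cannot read the TSC or the IA32_TSC_ADJUST MSR, so those are passed in as ordinary arguments, and the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def art_from_tsc(tsc, tsc_adjust, cpuid_15h_eax, cpuid_15h_ebx):
    """ART value implied by a TSC reading.

    The subtraction is wrapped to 64 bits, as it would be in registers.
    Python integers are unbounded, so the product cannot overflow; native
    code would need a 128-bit intermediate. The division truncates.
    """
    unadjusted = (tsc - tsc_adjust) & MASK64
    return unadjusted * cpuid_15h_eax // cpuid_15h_ebx


# One second of TSC ticks at the nominal 1992000000 Hz, with EAX = 2, EBX = 166.
print(art_from_tsc(1_992_000_000, 0, 2, 166))  # 24000000
```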
I have assumed that the APIC timer is also driven by a nominal 24 MHz clock with a divider of 128, giving a nominal counter frequency of 24000000 / 128 = 187500 Hz; the other values (187500 and 187497) are the measurement results.
In the simplest case,
the value returned by QueryPerformanceCounter is the result of executing the
following calculations:
_umul128(__rdtscp(out _), HypervisorSharedUserData.Factor, out ulong qpc);
qpc += HypervisorSharedUserData.Bias;
qpc += SharedUserData.QpcBias;
qpc >>= SharedUserData.Qpc.QpcShift;
SharedUserData is
intended as a reference to the KUSER_SHARED_DATA
structure; HypervisorSharedUserData is intended as the reference returned by
the call NtQuerySystemInformation(SystemHypervisorSharedPageInformation, …); HypervisorSharedUserData.Bias
is zero and HypervisorSharedUserData.Factor is the integral result of
evaluating:
CpuHz is the measured TimerProcessor
frequency and the evaluation is performed as a _udiv128 style calculation.
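As a rough illustration of the calculation, the following Python sketch emulates the 64-bit fixed-point arithmetic. The expression used here for Factor, floor(10^7 · 2^64 / CpuHz), is an assumption on my part: it is the form that scales one second of TSC ticks to about 10^7 counts of 100 ns, and it needs a 128-bit dividend as a _udiv128 style calculation would. The function names and the default QpcShift of 0 are likewise placeholders, not values read from a real system.

```python
MASK64 = (1 << 64) - 1


def hypervisor_factor(cpu_hz, qpc_hz=10_000_000):
    """ASSUMED form of Factor: a 128-bit dividend divided by CpuHz."""
    return (qpc_hz << 64) // cpu_hz


def qpc_from_tsc(tsc, factor, hv_bias=0, qpc_bias=0, qpc_shift=0):
    """Mirror the steps listed above, wrapping to 64 bits at each step."""
    qpc = ((tsc * factor) >> 64) & MASK64   # high half of the 128-bit product
    qpc = (qpc + hv_bias) & MASK64          # HypervisorSharedUserData.Bias
    qpc = (qpc + qpc_bias) & MASK64         # SharedUserData.QpcBias
    return qpc >> qpc_shift                 # SharedUserData.Qpc.QpcShift


cpu_hz = 1_991_998_774                      # one of the measured TSC values
factor = hypervisor_factor(cpu_hz)
# One second of TSC ticks should come out close to 10**7 (100 ns units).
print(qpc_from_tsc(cpu_hz, factor))
```

Because Factor is truncated to an integer, the scaled value can fall one count short of the exact quotient.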
Auxiliary Counter routines
Windows has a group of functions with names including the string “AuxiliaryCounter”, such as QueryAuxiliaryCounterFrequency and ConvertPerformanceCounterToAuxiliaryCounter. These functions are related to the timer with the capability TIMER_AUXILIARY, if such a timer is present; on my PC, this is TimerART. The documentation for the group of functions is not very informative, and I could find only one use of the functions on my PC: in file IntcAudioBus.sys (FileDescription: “Intel® Smart Sound Technology (Intel® SST) Bus”).
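To give a feel for the relationship between the two counters, the sketch below rescales an interval measured in performance counter ticks into auxiliary counter ticks. It is not a description of how the Windows functions work internally (they convert absolute counter values, which also involves an offset between the counters); it assumes a 10 MHz performance counter and the 24 MHz TimerART, and the function name is made up.

```python
QPC_HZ = 10_000_000   # assumed performance counter frequency
AUX_HZ = 24_000_000   # TimerART frequency on my PC


def qpc_interval_to_aux(qpc_ticks, qpc_hz=QPC_HZ, aux_hz=AUX_HZ):
    """Length of an interval in auxiliary counter ticks (truncated)."""
    return qpc_ticks * aux_hz // qpc_hz


print(qpc_interval_to_aux(10_000_000))  # one second: 24000000
print(qpc_interval_to_aux(5))           # 500 ns: 12
```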
TSC synchronization
The
Microsoft-Windows-HAL Event Tracing for Windows provider records the synchronization
of the TSC values between the processors. Here is a condensed summary from a
synchronization run:
The processor cycle counter on processor 1 has been probed by processor
0. A counter delta of -199 was detected. The approximate communication delay
between these processors was detected to be 508.
[…]
The processor cycle counter on processor 0 was synchronized against
processor 4 using an adjustment of 94 cycles on attempt 0. This resulted in a
delta of -13 cycles.
The processor cycle counter on processor 1 was synchronized against
processor 0 using an adjustment of 342 cycles on attempt 0. This resulted in a
delta of 68 cycles.
[…]
The processor cycle counter on processor 1 has been probed by processor
0. A counter delta of 10 was detected. The approximate communication delay
between these processors was detected to be 500.
[…]
The processor's cycle counters have been successfully synchronized from
processor 0 within acceptable operating thresholds. The maximum positive delta
detected was 10 and the maximum negative delta was -11. Synchronization
executed for 7773 microseconds.
If the processors’ cycle counters can be synchronized to within “acceptable operating thresholds”, then it should be impossible for a thread, rescheduled on a different processor, to detect a backwards step in the TSC values.
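The reasoning can be made concrete with a small sketch. Both functions are assumptions for illustration: the probe estimate uses a symmetric round-trip model (as in network time protocols), which is not necessarily what the HAL does, and the migration cost passed to the second function is a hypothetical figure.

```python
def probe(t_send, t_remote, t_recv):
    """Estimate (delta, delay) from one round trip.

    t_send and t_recv are read on the probing processor, t_remote on the
    probed one. ASSUMPTION: the remote read happens halfway through.
    """
    delay = t_recv - t_send
    delta = t_remote - (t_send + delay // 2)
    return delta, delay


def backward_step_possible(worst_negative_delta, min_migration_cycles):
    """A thread reads t on one processor and, at least min_migration_cycles
    later, reads t + elapsed + delta on another. The second reading is
    smaller only if elapsed + delta < 0."""
    return min_migration_cycles + worst_negative_delta < 0


print(probe(1000, 1055, 1508))            # (-199, 508)
print(backward_step_possible(-11, 500))   # False
```

With a worst negative delta of -11 cycles, a backwards step would require a thread to be rescheduled on another processor in fewer than 11 cycles.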