From 7b7b92975c335f6f50d68c0a76705f855ec84871 Mon Sep 17 00:00:00 2001
From: "QUIC Dev[bot]"
On Windows, MsQuic leverages manifested ETW for its logging. All dependencies should be included with Windows; nothing additional should be required.
The easiest and quickest way to enable msquic logging is by compiling msquic with the following cmake arguments to direct the logs to standard output:
+cmake -D QUIC_ENABLE_LOGGING=ON -D QUIC_LOGGING_TYPE=stdout ...
+
On Linux, MsQuic leverages LTTng for its logging. Some dependencies, such as babeltrace, lttng, and clog2text_lttng are required. The simplest way to install all dependencies is by running ./scripts/prepare-machine.ps1 -ForTest
, but if you only want to collect the traces on the machine, the minimal dependencies are:
sudo apt-add-repository ppa:lttng/stable-2.13
diff --git a/msquicdocs/docs/TSG.html b/msquicdocs/docs/TSG.html
index 1d8fe8841..53f71f332 100644
--- a/msquicdocs/docs/TSG.html
+++ b/msquicdocs/docs/TSG.html
@@ -99,8 +99,10 @@
Trouble Shooting a Functional Issue
+- MsQuic logging?
- I am getting an error code I don't understand.
- The connection is unexpectedly shutting down.
+- The stream is aborted
- No application (stream) data seems to be flowing.
- Why is this API failing?
- An MsQuic API is hanging.
@@ -109,6 +111,8 @@
- TLS handshake fails in Chrome and Edge for HTTP/3 (including WebTransport) even though HTTP/1.1 and HTTP/2 work.
- I need to get a packet capture.
+Logging
+See [Tracing](./Diagnostics.md#Built-in Tracing)
Understanding Error Codes
Some error codes are MsQuic specific (QUIC_STATUS_*
), and some are simply a passthrough from the platform. You can find the MsQuic specific error codes in the platform specific header (msquic_posix.h, msquic_winkernel.h, or msquic_winuser.h).
From msquic_winuser.h:
@@ -143,6 +147,17 @@
TODO - Add an example event
The error code indicated in this event is completely application defined (type of QUIC_UINT62
). The transport has no understanding of the meaning of this value. It never generates these error codes itself. So, to map these values to some meaning will require the application protocol documentation.
+The stream is aborted.
+Stream abortion is stream terminated abruptly.
+Remote Stream is aborted by my side?
+symptom: Peer complains starting stream is failed due to remote abortion.
+The remote stream maybe aborted locally by
+
+- Local calls the StreamShutdown with abortive flags
+- Local calls the StreamClose
+- Local calls the ConnectionShutdown
+- Connection callback handler returns the value other than
QUIC_STATUS_SUCCESS
for event: QUIC_CONNECTION_EVENT_PEER_STREAM_STARTED
+
Why isn't application data flowing?
Application data is exchanged via Streams and queued by the app via StreamSend. The act of queuing data doesn't mean it will be immediately sent to the peer. There are a number of things that can block or delay the exchange. The QUIC_FLOW_BLOCK_REASON
enum in quic_trace.h contains the full list of reasons that data may be blocked. Below is a short explanation of each:
@@ -284,9 +299,12 @@ Share the generated cab file with SMB developers.
See FlyByWireless.CustomCertificate.Generate() on how to generate such a certificate.
Collecting a Packet Capture
Linux Packet Capture
-
-TODO
-
+# Capture from any interface, all the udp traffic on the host and write it to the msquic.pcap file
+tcpdump -i any udp -w msquic.pcap
+
+# wireshark to view it. (You need load SSLKEYLOGFILE for the session key to decrypt the session)
+wireshark ./msquic.pcap
+
Window Packet Capture
On Windows, Packet Monitor (pktmon
) is the best way to collect a packet capture.
The (optional) first step is usually to find the interface you want to collect the capture on. Do this by first running pktmon list
:
diff --git a/msquicdocs/index.json b/msquicdocs/index.json
index fb8ca2815..a9fb6fa79 100644
--- a/msquicdocs/index.json
+++ b/msquicdocs/index.json
@@ -37,7 +37,7 @@
"docs/Diagnostics.html": {
"href": "docs/Diagnostics.html",
"title": "Diagnosing Issues with MsQuic",
- "keywords": "Diagnosing Issues with MsQuic This document describes various ways to debug and diagnose issues when using MsQuic. Built-in Tracing For debugging issues, logging is generally the best way to diagnose problems. MsQuic has extensive logs in the code to facilitate debugging. The logs can simply be converted to human readable text or they can be processed by various automated tools to help improve diagnostics. Note Currently tools only support Windows ETW. Windows On Windows, MsQuic leverages manifested ETW for its logging. All dependencies should be included with Windows; nothing additional should be required. Linux LTTng On Linux, MsQuic leverages LTTng for its logging. Some dependencies, such as babeltrace, lttng, and clog2text_lttng are required. The simplest way to install all dependencies is by running ./scripts/prepare-machine.ps1 -ForTest, but if you only want to collect the traces on the machine, the minimal dependencies are: sudo apt-add-repository ppa:lttng/stable-2.13 sudo apt-get update sudo apt-get install -y lttng-tools Perf For general tracing, refer Stacks and CPU usage macOS Tracing is currently unsupported on macOS. Trace Collection Windows There are a lot of general tools and ways to collect and interact with ETW. In the sections below are our recommended ways to collect and consume the MsQuic ETW logs. MsQuicEtw.man specifies all the details for each event. For instance: This details everything about the QuicConnOutFlowStats event (see ConnOutFlowStats in the code), including it's level (verbose) and it's keywords (ut:Connection and ut:DataFlow). So this event would be collected if the level is 5 or higher and either the ut:Connection or the ut:DataFlow keywords are configured. The numeric value for the keywords is at the top of the manifest file: netsh.exe netsh.exe (short for Net Shell, but generally pronounced \"net-s-h\") is an inbox Windows command line tool. It has many uses, but one of the most used is around tracing (at least for the networking community). To start collecting a trace with netsh, use the following command: netsh.exe trace start overwrite=yes report=dis correlation=dis traceFile=quic.etl provider={ff15e657-4f26-570e-88ab-0796b258d11c} level=0x5 keywords=0xffffffff There are a lot of parameters being passed and if you want the complete help text explaining them all you can run netsh.exe trace start help. A few of the important parameters are: traceFile - The name/path of the output ETL file that the logs will be saved to. provider - The GUID of the ETW provider that should be collected. In this case, {ff15e657-4f26-570e-88ab-0796b258d11c} is the GUID for the MsQuic provider. level - The level at which to collect traces. 5 indicates the verbose level. 4 is informational. keywords - The set of flags that indicate which type of events should be collected. The command above collects all keywords (0xffffffff) which may be too verbose for some scenarios, such as high throughput testing or large number of parallel connections. For a detailed list of the available keywords, see MsQuicEtw.man and see . Once you are done with collecting all the events, to stop the trace session, use the following command: netsh.exe trace stop This will flush and finalize the output ETL file specified in the netsh trace start command. This ETL file can either be converted to text or used in the automated tools. wpr.exe wpr.exe is another inbox Windows command line tool that can be used to collect ETW traces, but it requires an additional \"profile\" file: MsQuic.wprp. This WPR profile file details several specifically tailored trace configurations. To use it, you must copy MsQuic.wprp to the machine you wish to collect the traces for, you can simply run the following to start trace collection: wpr.exe -start MsQuic.wprp!PROFILE -filemode You must replace PROFILE with the name of the profile you want to run. Some of the useful profiles are: Stacks.Light- Collects CPU callstacks. Stacks.Verbose- Collects CPU callstacks, DPCs and interrupts. Performance.Light - Collects performance related events useful for automated tool processing. Performance.Verbose - Collects Performance.Light plus CPU callstacks. Basic.Light - Collects general, \"low volume\" MsQuic events. Useful for a \"big picture\" understanding, with as few events collected as possible. Basic.Verbose - Collects all MsQuic events. This is very verbose. Scheduling.Verbose - Collects \"low volume\" and scheduling related MsQuic events. Datapath.Light - Collects \"low volume\" and datapath related MsQuic events. Datapath.Verbose - Collects Datapath.Light plus CPU callstacks. Full.Light - Collects all MsQuic events as well as TAEF events. For most, this will be equivalent to Basic.Verbose. Full.Verbose - Collects all MsQuic events, TAEF events and several networking components' events. This is the most verbose possible, and should only be used for the most minimal scenarios. When you're done and ready to collect the logs, you run: wpr.exe -stop ETL_PATH You must replace ETL_PATH with the path of the output ETL file that you want to be written. If you no longer need the logs being collected, you can cancel a trace instead of saving it to file: wpr.exe -cancel Other Ways As already indicated, there are lots of ways to collect ETW traces. Feel free to look at Configuring and Starting an Event Tracing Session for more information. Linux All in one command This script wraps collecting trace then converting to text as well WARN: This wrapper doesn't work with ./scripts/test.ps1 etc. as it is also creating lttng session internally. cd ${MSQUIC_PATH} ./scripts/log_wrapper.sh ${YOUR_COMMAND} # e.g. ./scripts/log_wrapper.sh ./artifacts/bin/linux/x64_Debug_openssl/msquictest --gtest_filter=Basic.* ls msquic_lttng0 # data quic.babel.txt quic.log Step by step command Instead, you can use the following commands: mkdir msquic_lttng lttng create msquic -o=./msquic_lttng lttng enable-event --userspace \"CLOG_*\" lttng add-context --userspace --type=vpid --type=vtid lttng start And to stop log the trace session, you can use the following command: lttng stop msquic Trace Conversion to Text Windows To convert an ETL file to human-readable text, you can use the following netsh.exe command: netsh.exe trace convert quic.etl overwrite=yes If you're using a version of MsQuic that uses an ETW manifest version more recent than the one built into the Windows image, decoding may not provide correct output. To solve this issue, newer versions of Windows (11 and Server 2022) support a manpath parameter that can be used to manually specify a manifest. This will likely always be needed for apps using MsQuic in user mode. netsh.exe trace convert quic.etl manpath=path/to/manifest/MsQuicEtw.man For older versions of Windows that don't support the manpath parameter, you can manually update the installed manifest that is used to decode the file. You can run the following wevtutil.exe commands in an Admin window: wevtutil.exe um path\\to\\MsQuicEtw.man wevtutil.exe im path\\to\\MsQuicEtw.man /rf:path\\to\\msquic.dll /mf:path\\to\\msquic.dll Replace path\\to with the actual paths to the respective files. With the latest manifests installed, now the normal netsh.exe convert command should work. Linux NOTE: msquic.lttng.so must be built to enable lttng logging - see https://lttng.org/docs/v2.13/#doc-liblttng-ust-dl and it must be placed in the same directory as the msquic.so. Building clog2text_lttng: cat > /etc/apt/preferences.d/99microsoft-dotnet.pref < quic.babel.txt ~/.dotnet/tools/clog2text_lttng -i quic.babel.txt -s clog.sidecar -o quic.log --showTimestamp --showCpuInfo Note The clog.sidecar file that was used to build MsQuic must be used. It can be found in the ./src/manifest directory of the repository. Trace Analysis MsQuic supports a custom plugin for Windows Performance Analyzer (WPA) to detailed analysis of ETW traces. See the WPA instructions for more details. Text Analysis Tool When viewing the traces as text, we recommend TextAnalysisTool.NET (Windows only) and we have several filter files we maintain for it (folder). The different filters are meant to quickly highlight and color code important information. You may also open the trace in Windows Performance Analyzer. See the WPA instructions for more details. Note WPA support for LTTng based logs is not yet available but will be supported in the future. Performance Counters To assist investigations into running systems, MsQuic has a number of performance counters that are updated during runtime. These counters are exposed as an array of unsigned 64-bit integers, via a global GetParam parameter. Sample code demonstrating how to query the performance counters: uint64_t Counters[QUIC_PERF_COUNTER_MAX]; uint32_t BufferLength = sizeof(Counters); MsQuic->GetParam( NULL, QUIC_PARAM_GLOBAL_PERF_COUNTERS, &BufferLength, Counters); Each of the counters available is described here: Counter | Description --------|------------ QUIC_PERF_COUNTER_CONN_CREATED | Total connections ever allocated QUIC_PERF_COUNTER_CONN_HANDSHAKE_FAIL | Total connections that failed during handshake QUIC_PERF_COUNTER_CONN_APP_REJECT | Total connections rejected by the application QUIC_PERF_COUNTER_CONN_RESUMED | Total connections resumed QUIC_PERF_COUNTER_CONN_ACTIVE | Connections currently allocated QUIC_PERF_COUNTER_CONN_CONNECTED | Connections currently in the connected state QUIC_PERF_COUNTER_CONN_PROTOCOL_ERRORS | Total connections shutdown with a protocol error QUIC_PERF_COUNTER_CONN_NO_ALPN | Total connection attempts with no matching ALPN QUIC_PERF_COUNTER_STRM_ACTIVE | Current streams allocated QUIC_PERF_COUNTER_PKTS_SUSPECTED_LOST | Total suspected packets lost QUIC_PERF_COUNTER_PKTS_DROPPED | Total packets dropped for any reason QUIC_PERF_COUNTER_PKTS_DECRYPTION_FAIL | Total packets with decryption failures QUIC_PERF_COUNTER_UDP_RECV | Total UDP datagrams received QUIC_PERF_COUNTER_UDP_SEND | Total UDP datagrams sent QUIC_PERF_COUNTER_UDP_RECV_BYTES | Total UDP payload bytes received QUIC_PERF_COUNTER_UDP_SEND_BYTES | Total UDP payload bytes sent QUIC_PERF_COUNTER_UDP_RECV_EVENTS | Total UDP receive events QUIC_PERF_COUNTER_UDP_SEND_CALLS | Total UDP send API calls QUIC_PERF_COUNTER_APP_SEND_BYTES | Total bytes sent by applications QUIC_PERF_COUNTER_APP_RECV_BYTES | Total bytes received by applications QUIC_PERF_COUNTER_CONN_QUEUE_DEPTH | Current connections queued for processing QUIC_PERF_COUNTER_CONN_OPER_QUEUE_DEPTH | Current connection operations queued QUIC_PERF_COUNTER_CONN_OPER_QUEUED | Total connection operations queued ever QUIC_PERF_COUNTER_CONN_OPER_COMPLETED | Total connection operations processed ever QUIC_PERF_COUNTER_WORK_OPER_QUEUE_DEPTH | Current worker operations queued QUIC_PERF_COUNTER_WORK_OPER_QUEUED | Total worker operations queued ever QUIC_PERF_COUNTER_WORK_OPER_COMPLETED | Total worker operations processed ever QUIC_PERF_COUNTER_PATH_VALIDATED | Total path challenges that succeed ever QUIC_PERF_COUNTER_PATH_FAILURE | Total path challenges that fail ever QUIC_PERF_COUNTER_SEND_STATELESS_RESET | Total stateless reset packets sent ever QUIC_PERF_COUNTER_SEND_STATELESS_RETRY | Total stateless retry packets sent ever Windows Performance Monitor On the latest version of Windows, these counters are also exposed via PerfMon.exe under the QUIC Performance Diagnostics category. The values exposed via PerfMon only represent kernel mode usages of MsQuic, and do not include user mode counters. ETW Counters are also captured at the beginning of MsQuic ETW traces, and unlike PerfMon, includes all MsQuic instances running on the system, both user and kernel mode. Detailed Troubleshooting For detailed trouble shooting steps please see the MsQuic Trouble Shooting Guide."
+ "keywords": "Diagnosing Issues with MsQuic This document describes various ways to debug and diagnose issues when using MsQuic. Built-in Tracing For debugging issues, logging is generally the best way to diagnose problems. MsQuic has extensive logs in the code to facilitate debugging. The logs can simply be converted to human readable text or they can be processed by various automated tools to help improve diagnostics. Note Currently tools only support Windows ETW. Windows On Windows, MsQuic leverages manifested ETW for its logging. All dependencies should be included with Windows; nothing additional should be required. Linux stdout The easiest and quickest way to enable msquic logging is by compiling msquic with the following cmake arguments to direct the logs to standard output: cmake -D QUIC_ENABLE_LOGGING=ON -D QUIC_LOGGING_TYPE=stdout ... LTTng On Linux, MsQuic leverages LTTng for its logging. Some dependencies, such as babeltrace, lttng, and clog2text_lttng are required. The simplest way to install all dependencies is by running ./scripts/prepare-machine.ps1 -ForTest, but if you only want to collect the traces on the machine, the minimal dependencies are: sudo apt-add-repository ppa:lttng/stable-2.13 sudo apt-get update sudo apt-get install -y lttng-tools Perf For general tracing, refer Stacks and CPU usage macOS Tracing is currently unsupported on macOS. Trace Collection Windows There are a lot of general tools and ways to collect and interact with ETW. In the sections below are our recommended ways to collect and consume the MsQuic ETW logs. MsQuicEtw.man specifies all the details for each event. For instance: This details everything about the QuicConnOutFlowStats event (see ConnOutFlowStats in the code), including it's level (verbose) and it's keywords (ut:Connection and ut:DataFlow). So this event would be collected if the level is 5 or higher and either the ut:Connection or the ut:DataFlow keywords are configured. The numeric value for the keywords is at the top of the manifest file: netsh.exe netsh.exe (short for Net Shell, but generally pronounced \"net-s-h\") is an inbox Windows command line tool. It has many uses, but one of the most used is around tracing (at least for the networking community). To start collecting a trace with netsh, use the following command: netsh.exe trace start overwrite=yes report=dis correlation=dis traceFile=quic.etl provider={ff15e657-4f26-570e-88ab-0796b258d11c} level=0x5 keywords=0xffffffff There are a lot of parameters being passed and if you want the complete help text explaining them all you can run netsh.exe trace start help. A few of the important parameters are: traceFile - The name/path of the output ETL file that the logs will be saved to. provider - The GUID of the ETW provider that should be collected. In this case, {ff15e657-4f26-570e-88ab-0796b258d11c} is the GUID for the MsQuic provider. level - The level at which to collect traces. 5 indicates the verbose level. 4 is informational. keywords - The set of flags that indicate which type of events should be collected. The command above collects all keywords (0xffffffff) which may be too verbose for some scenarios, such as high throughput testing or large number of parallel connections. For a detailed list of the available keywords, see MsQuicEtw.man and see . Once you are done with collecting all the events, to stop the trace session, use the following command: netsh.exe trace stop This will flush and finalize the output ETL file specified in the netsh trace start command. This ETL file can either be converted to text or used in the automated tools. wpr.exe wpr.exe is another inbox Windows command line tool that can be used to collect ETW traces, but it requires an additional \"profile\" file: MsQuic.wprp. This WPR profile file details several specifically tailored trace configurations. To use it, you must copy MsQuic.wprp to the machine you wish to collect the traces for, you can simply run the following to start trace collection: wpr.exe -start MsQuic.wprp!PROFILE -filemode You must replace PROFILE with the name of the profile you want to run. Some of the useful profiles are: Stacks.Light- Collects CPU callstacks. Stacks.Verbose- Collects CPU callstacks, DPCs and interrupts. Performance.Light - Collects performance related events useful for automated tool processing. Performance.Verbose - Collects Performance.Light plus CPU callstacks. Basic.Light - Collects general, \"low volume\" MsQuic events. Useful for a \"big picture\" understanding, with as few events collected as possible. Basic.Verbose - Collects all MsQuic events. This is very verbose. Scheduling.Verbose - Collects \"low volume\" and scheduling related MsQuic events. Datapath.Light - Collects \"low volume\" and datapath related MsQuic events. Datapath.Verbose - Collects Datapath.Light plus CPU callstacks. Full.Light - Collects all MsQuic events as well as TAEF events. For most, this will be equivalent to Basic.Verbose. Full.Verbose - Collects all MsQuic events, TAEF events and several networking components' events. This is the most verbose possible, and should only be used for the most minimal scenarios. When you're done and ready to collect the logs, you run: wpr.exe -stop ETL_PATH You must replace ETL_PATH with the path of the output ETL file that you want to be written. If you no longer need the logs being collected, you can cancel a trace instead of saving it to file: wpr.exe -cancel Other Ways As already indicated, there are lots of ways to collect ETW traces. Feel free to look at Configuring and Starting an Event Tracing Session for more information. Linux All in one command This script wraps collecting trace then converting to text as well WARN: This wrapper doesn't work with ./scripts/test.ps1 etc. as it is also creating lttng session internally. cd ${MSQUIC_PATH} ./scripts/log_wrapper.sh ${YOUR_COMMAND} # e.g. ./scripts/log_wrapper.sh ./artifacts/bin/linux/x64_Debug_openssl/msquictest --gtest_filter=Basic.* ls msquic_lttng0 # data quic.babel.txt quic.log Step by step command Instead, you can use the following commands: mkdir msquic_lttng lttng create msquic -o=./msquic_lttng lttng enable-event --userspace \"CLOG_*\" lttng add-context --userspace --type=vpid --type=vtid lttng start And to stop log the trace session, you can use the following command: lttng stop msquic Trace Conversion to Text Windows To convert an ETL file to human-readable text, you can use the following netsh.exe command: netsh.exe trace convert quic.etl overwrite=yes If you're using a version of MsQuic that uses an ETW manifest version more recent than the one built into the Windows image, decoding may not provide correct output. To solve this issue, newer versions of Windows (11 and Server 2022) support a manpath parameter that can be used to manually specify a manifest. This will likely always be needed for apps using MsQuic in user mode. netsh.exe trace convert quic.etl manpath=path/to/manifest/MsQuicEtw.man For older versions of Windows that don't support the manpath parameter, you can manually update the installed manifest that is used to decode the file. You can run the following wevtutil.exe commands in an Admin window: wevtutil.exe um path\\to\\MsQuicEtw.man wevtutil.exe im path\\to\\MsQuicEtw.man /rf:path\\to\\msquic.dll /mf:path\\to\\msquic.dll Replace path\\to with the actual paths to the respective files. With the latest manifests installed, now the normal netsh.exe convert command should work. Linux NOTE: msquic.lttng.so must be built to enable lttng logging - see https://lttng.org/docs/v2.13/#doc-liblttng-ust-dl and it must be placed in the same directory as the msquic.so. Building clog2text_lttng: cat > /etc/apt/preferences.d/99microsoft-dotnet.pref < quic.babel.txt ~/.dotnet/tools/clog2text_lttng -i quic.babel.txt -s clog.sidecar -o quic.log --showTimestamp --showCpuInfo Note The clog.sidecar file that was used to build MsQuic must be used. It can be found in the ./src/manifest directory of the repository. Trace Analysis MsQuic supports a custom plugin for Windows Performance Analyzer (WPA) to detailed analysis of ETW traces. See the WPA instructions for more details. Text Analysis Tool When viewing the traces as text, we recommend TextAnalysisTool.NET (Windows only) and we have several filter files we maintain for it (folder). The different filters are meant to quickly highlight and color code important information. You may also open the trace in Windows Performance Analyzer. See the WPA instructions for more details. Note WPA support for LTTng based logs is not yet available but will be supported in the future. Performance Counters To assist investigations into running systems, MsQuic has a number of performance counters that are updated during runtime. These counters are exposed as an array of unsigned 64-bit integers, via a global GetParam parameter. Sample code demonstrating how to query the performance counters: uint64_t Counters[QUIC_PERF_COUNTER_MAX]; uint32_t BufferLength = sizeof(Counters); MsQuic->GetParam( NULL, QUIC_PARAM_GLOBAL_PERF_COUNTERS, &BufferLength, Counters); Each of the counters available is described here: Counter | Description --------|------------ QUIC_PERF_COUNTER_CONN_CREATED | Total connections ever allocated QUIC_PERF_COUNTER_CONN_HANDSHAKE_FAIL | Total connections that failed during handshake QUIC_PERF_COUNTER_CONN_APP_REJECT | Total connections rejected by the application QUIC_PERF_COUNTER_CONN_RESUMED | Total connections resumed QUIC_PERF_COUNTER_CONN_ACTIVE | Connections currently allocated QUIC_PERF_COUNTER_CONN_CONNECTED | Connections currently in the connected state QUIC_PERF_COUNTER_CONN_PROTOCOL_ERRORS | Total connections shutdown with a protocol error QUIC_PERF_COUNTER_CONN_NO_ALPN | Total connection attempts with no matching ALPN QUIC_PERF_COUNTER_STRM_ACTIVE | Current streams allocated QUIC_PERF_COUNTER_PKTS_SUSPECTED_LOST | Total suspected packets lost QUIC_PERF_COUNTER_PKTS_DROPPED | Total packets dropped for any reason QUIC_PERF_COUNTER_PKTS_DECRYPTION_FAIL | Total packets with decryption failures QUIC_PERF_COUNTER_UDP_RECV | Total UDP datagrams received QUIC_PERF_COUNTER_UDP_SEND | Total UDP datagrams sent QUIC_PERF_COUNTER_UDP_RECV_BYTES | Total UDP payload bytes received QUIC_PERF_COUNTER_UDP_SEND_BYTES | Total UDP payload bytes sent QUIC_PERF_COUNTER_UDP_RECV_EVENTS | Total UDP receive events QUIC_PERF_COUNTER_UDP_SEND_CALLS | Total UDP send API calls QUIC_PERF_COUNTER_APP_SEND_BYTES | Total bytes sent by applications QUIC_PERF_COUNTER_APP_RECV_BYTES | Total bytes received by applications QUIC_PERF_COUNTER_CONN_QUEUE_DEPTH | Current connections queued for processing QUIC_PERF_COUNTER_CONN_OPER_QUEUE_DEPTH | Current connection operations queued QUIC_PERF_COUNTER_CONN_OPER_QUEUED | Total connection operations queued ever QUIC_PERF_COUNTER_CONN_OPER_COMPLETED | Total connection operations processed ever QUIC_PERF_COUNTER_WORK_OPER_QUEUE_DEPTH | Current worker operations queued QUIC_PERF_COUNTER_WORK_OPER_QUEUED | Total worker operations queued ever QUIC_PERF_COUNTER_WORK_OPER_COMPLETED | Total worker operations processed ever QUIC_PERF_COUNTER_PATH_VALIDATED | Total path challenges that succeed ever QUIC_PERF_COUNTER_PATH_FAILURE | Total path challenges that fail ever QUIC_PERF_COUNTER_SEND_STATELESS_RESET | Total stateless reset packets sent ever QUIC_PERF_COUNTER_SEND_STATELESS_RETRY | Total stateless retry packets sent ever Windows Performance Monitor On the latest version of Windows, these counters are also exposed via PerfMon.exe under the QUIC Performance Diagnostics category. The values exposed via PerfMon only represent kernel mode usages of MsQuic, and do not include user mode counters. ETW Counters are also captured at the beginning of MsQuic ETW traces, and unlike PerfMon, includes all MsQuic instances running on the system, both user and kernel mode. Detailed Troubleshooting For detailed trouble shooting steps please see the MsQuic Trouble Shooting Guide."
},
"docs/FAQ.html": {
"href": "docs/FAQ.html",
@@ -97,7 +97,7 @@
"docs/TSG.html": {
"href": "docs/TSG.html",
"title": "Trouble Shooting Guide",
- "keywords": "Trouble Shooting Guide This document is meant to be a step-by-step guide for trouble shooting any issues while using MsQuic. What kind of Issue are you having? I am debugging a crash. Something is not functionally working as I expect. Performance is not what I expect it to be. Debugging a Crash TODO Trouble Shooting a Functional Issue I am getting an error code I don't understand. The connection is unexpectedly shutting down. No application (stream) data seems to be flowing. Why is this API failing? An MsQuic API is hanging. I am having problems with SMB over QUIC. No credentials when loading a server certificate from PEM with Schannel. TLS handshake fails in Chrome and Edge for HTTP/3 (including WebTransport) even though HTTP/1.1 and HTTP/2 work. I need to get a packet capture. Understanding Error Codes Some error codes are MsQuic specific (QUIC_STATUS_*), and some are simply a passthrough from the platform. You can find the MsQuic specific error codes in the platform specific header (msquic_posix.h, msquic_winkernel.h, or msquic_winuser.h). From msquic_winuser.h: #define QUIC_STATUS_ADDRESS_IN_USE HRESULT_FROM_WIN32(WSAEADDRINUSE) // 0x80072740 #define QUIC_STATUS_CONNECTION_TIMEOUT ERROR_QUIC_CONNECTION_TIMEOUT // 0x80410006 #define QUIC_STATUS_CONNECTION_IDLE ERROR_QUIC_CONNECTION_IDLE // 0x80410005 #define QUIC_STATUS_UNREACHABLE HRESULT_FROM_WIN32(ERROR_HOST_UNREACHABLE) // 0x800704d0 #define QUIC_STATUS_INTERNAL_ERROR ERROR_QUIC_INTERNAL_ERROR // 0x80410003 For more info, see the Well Known Status Codes. Linux File Handle Limit Too Small In many Linux setups, the default per-process file handle limit is relatively small (~1024). In scenarios where lots of (usually client) connection are opened, a large number of sockets (a type of file handle) are created. Eventually the handle limit is reached and connections start failing (error codes 0x16 or 0xbebc202) because new sockets cannot be created. To fix this, you will need to increase the handle limit. To query the maximum limit you may set: ulimit -Hn To set a new limit (up to the max): ulimit -n newValue Why is the connection shutting down? What does this QUIC_CONNECTION_EVENT_SHUTDOWN_INITIATED_BY_TRANSPORT event mean? What does this QUIC_CONNECTION_EVENT_SHUTDOWN_INITIATED_BY_PEER event mean? Understanding shutdown by Transport. There are two ways for a connection to be shutdown, either by the application layer or by the transport layer (i.e. the QUIC layer). The QUIC_CONNECTION_EVENT_SHUTDOWN_INITIATED_BY_TRANSPORT event occurs when the transport shuts the connection down. Generally, the transport shuts down the connection either when there's some kind of error or if the negotiated idle period has elapsed. [2]6F30.34B0::2021/04/13-09:22:48.297449100 [Microsoft-Quic][conn][0x1CF25AC46B0] Transport Shutdown: 18446744071566327813 (Remote=0) (QS=1) Above is an example event collected during an attempt to connect to a non-existent server. Eventually the connection failed and the transport indicated the event with the appropriate error code. This error code (18446744071566327813) maps to 0xFFFFFFFF80410005, which specifically refers to the QUIC_STATUS (indicated by QS=1) for 0x80410005; which indicates ERROR_QUIC_CONNECTION_IDLE. For more details for understanding error codes see here. Understanding shutdown by Peer. As indicated in Understanding shutdown by Transport, there are two ways for connections to be shutdown. The QUIC_CONNECTION_EVENT_SHUTDOWN_INITIATED_BY_PEER event occurs when the peer application has explicitly shut down the connection. In MsQuic API terms, this would mean the app called ConnectionShutdown. TODO - Add an example event The error code indicated in this event is completely application defined (type of QUIC_UINT62). The transport has no understanding of the meaning of this value. It never generates these error codes itself. So, to map these values to some meaning will require the application protocol documentation. Why isn't application data flowing? Application data is exchanged via Streams and queued by the app via StreamSend. The act of queuing data doesn't mean it will be immediately sent to the peer. There are a number of things that can block or delay the exchange. The QUIC_FLOW_BLOCK_REASON enum in quic_trace.h contains the full list of reasons that data may be blocked. Below is a short explanation of each: Value Meaning QUIC_FLOW_BLOCKED_SCHEDULING 1 The cross-connection scheduling logic has determined that too much work is queued on the connection to be processed all at once. Generally, this means we are CPU-bound. QUIC_FLOW_BLOCKED_PACING 2 Data burst sizes into the network are being limited and periodically sent into the network based on the congestion control's pacing logic. QUIC_FLOW_BLOCKED_AMPLIFICATION_PROT 4 The peer has not proved ownership of their IP address and therefore we are locally limiting the amount of data to send to it. QUIC_FLOW_BLOCKED_CONGESTION_CONTROL 8 Congestion control has determined that the network cannot handle any more data currently. QUIC_FLOW_BLOCKED_CONN_FLOW_CONTROL 16 The connection-wide limit for the amount of data that can be buffered or accepted by the peer at this time has been reached. QUIC_FLOW_BLOCKED_STREAM_ID_FLOW_CONTROL 32 The limit on the number of streams the peer can accept has been reached. QUIC_FLOW_BLOCKED_STREAM_FLOW_CONTROL 64 The limit on the amount of data that can be buffered or accepted by the peer for this stream has been reached. QUIC_FLOW_BLOCKED_APP 128 All data queued by the application on this stream has been sent. No more data is available to send. Internally, MsQuic tracks these flags at all times for every connection and stream. Whenever any of them change, MsQuic logs an event. For example: [0]0004.0F54::2021/05/14-10:30:22.541024000 [Microsoft-Quic][strm][0xC16BA610] Send Blocked Flags: 128 This event indicates that stream C16BA610 only has the QUIC_FLOW_BLOCKED_APP flag, so it is currently blocked because there is no more application data queued to be sent. The QUIC WPA plugin also supports visualizing these blocked states via the QUIC TX Blocked State graph. It allows you to see what flags are blocking the connection as a whole (shown as stream 0) and what is blocking each individual stream, over the lifetime of the whole connection. For instance, in the image above, you can see the stream (1) is blocked most of the time because there is no application data. Beyond that, the connection (\"stream\" 0) alternated between pacing and congestion control as the blocked reasons. Why is this API Failing? The simplest way to determine exactly why a particular API is failing is via tracing. Collect the traces for the repro and convert them to text and open them in your favorite text editor (try TextAnalysisTool!). MsQuic logs every API entry and exit. Depending on the platform and tool used to decode the traces to text, you may either have the number or an enum represented as the API type (see QUIC_TRACE_API_TYPE in quic_trace.h), but all events look something like this: [cpu][process.thread][time][ api] Enter () [cpu][process.thread][time][ api] Exit [optional status code] A TextAnalysisTool filter (api.tat) is also included in ./docs/tat to help quickly find all failed API calls. Example (ListenerStart failing with QUIC_STATUS_INVALID_STATE) In a recent example, we wanted to know why an app occasionally received QUIC_STATUS_INVALID_STATE when it called ListenerStart. We took the following steps to diagnose it. Collected traces for the repro. Converted to text and opened them in TextAnalysisTool. Added a filter for all API enter events for QUIC_TRACE_API_LISTENER_START (10). Looked for the following [ api] Exit event after each enter event on the same [process.thread]. This quickly resulted in the following pair of events. They show the app called ListenerStart for the listener pointer 7f30ac0dcff0 at 09:54:03.528362 in process 2e73 on thread 2e8b (CPU 1). Shortly after, MsQuic returned with status 200000002 (QUIC_STATUS_INVALID_STATE on Posix platforms). [1][2e73.2e8b][09:54:03.528362][ api] Enter 10 (0x7f30ac0dcff0). [1][2e73.2e8b][09:54:03.528913][ api] Exit 200000002 From here, we simply went backwards from the exit event to find any errors; and came up with the full set of important traces: [1][2e73.2e8b][09:54:03.528362][ api] Enter 10 (0x7f30ac0dcff0). [1][2e73.2e8b][09:54:03.528902][bind][0x7f30b80394e0] Listener (0x7f30ac076d90) already registered on ALPN [1][2e73.2e8b][09:54:03.528903][list][0x7f30ac0dcff0] ERROR, \"Register with binding\". [1][2e73.2e8b][09:54:03.528913][ api] Exit 200000002 This clearly shows that listener 7f30ac0dcff0 failed to register with the binding (i.e. UDP socket abstraction) because listener 7f30ac076d90 was already registered for the same ALPN. MsQuic only allows a single listener to be registered for a given ALPN on a local IP address and port. Why is the API hanging or deadlocking? First, a bit of background. The MsQuic API has two types of APIs: Blocking / Synchronous - These APIs run to completion and only return once finished. When running in the Windows kernel, these MUST NOT be called at DISPATCH_LEVEL. They are denoted by the _IRQL_requires_max_(PASSIVE_LEVEL) annotation. For example, ConnectionClose. Nonblocking / Asynchronous - These APIs merely queue work and return immediately. When running in the Windows kernel, these may be called at DISPATCH_LEVEL. They are denoted by the _IRQL_requires_max_(DISPATCH_LEVEL) annotation. For example, StreamSend. Additional documentation on the MsQuic execution model is available here. Now, back to the problem. The app is calling into an MsQuic API and it is hanging and likely deadlocked. This can only happen for synchronous APIs. What do you do next? Generally, this is because the app is breaking one of the following rules: Do not block the MsQuic thread/callback for any length of time. You may acquire a lock/mutex, but you must guarantee very quick execution. Do not grab a lock that you also hold (on a different thread) when calling back into MsQuic. Do not call MsQuic APIs cross-object on MsQuic the thread/callbacks. For instance, if you're in a callback for Connection A, do not call ConnectionClose for Connection B. To verify exactly what is happening, Collect the traces and open then up in a text editor (ideally TextAnalysisTool). The simplest way forward from here is to filter the logs based on the pointer of the object you are calling the API on. For instance, if you are calling ConnectionClose on 0x7fd36c0019c0, then add a filter for 7fd36c0019c0. Here is an example (filtered) log for just such a case: [0][53805.5381b][11:22:52.896762][ api] Enter 13 (0x7fd36c0019c0). [0][53805.53815][11:22:52.896796][conn][0x7fd36c0019c0] Scheduling: 2 [0][53805.53815][11:22:52.896797][conn][0x7fd36c0019c0] Execute: 1 [0][53805.53815][11:22:52.896797][conn][0x7fd36c0019c0] Recv 1 UDP datagrams [0][53805.53815][11:22:52.896825][conn][0x7fd36c0019c0] IN: BytesRecv=2901 [0][53805.53815][11:22:52.896826][conn][0x7fd36c0019c0] Batch Recv 1 UDP datagrams [0][53805.53815][11:22:52.896854][strm][0x7fd378028360] Created, Conn=0x7fd36c0019c0 ID=0 IsLocal=0 [0][53805.53815][11:22:52.896856][conn][0x7fd36c0019c0] Indicating QUIC_CONNECTION_EVENT_PEER_STREAM_STARTED [0x7fd378028360, 0x0] [1][53805.53813][11:22:53.142398][conn][0x7fd36c0019c0] Queuing 1 UDP datagrams [1][53805.53813][11:22:53.392819][conn][0x7fd36c0019c0] Queuing 1 UDP datagrams [1][53805.53813][11:22:53.644259][conn][0x7fd36c0019c0] Queuing 1 UDP datagrams You will notice 3 different threads (seen in [0][53805.X]): 5381b - The app thread that is calling in to close the connection. 53815 - The MsQuic worker thread that drives execution for the connection. 53813 - The MsQuic UDP thread that is processing received packets and queuing them on the connection. As you can see, the last event/log on the MsQuic worker thread was an indication of a QUIC_CONNECTION_EVENT_PEER_STREAM_STARTED event to the app. There are no further events on this thread (easily verified by adding an additional filter for [53805.53815]). So, the app must be blocking this thread. The most likely scenario is that the app is holding a lock while calling ConnectionClose on thread 5381b and then in thread 53815, the app is trying to acquire the same lock. The solution here is that the app must not hold the lock when it calls into the blocking API, if that lock may also be acquired on the MsQuic thread. Trouble Shooting SMB over QUIC issues To troubleshoot any SMB over QUIC issues on windows platforms, the best way is to collect SMB and QUIC traces and sharing it with SMB developers. Following are the steps: Copy msquic/scripts/t.cmd to a local folder. For SMB Client (a.k.a. RDR) WPP traces t.cmd clion // repro and get the relevant error. t.cmd off For SMB Server WPP traces t.cmd srvon // repro and get the relevant error. t.cmd off Share the generated cab file with SMB developers. Convert PEM to PKCS#12 for Schannel When using Schannel, a certificate imported by X509Certificate2.CreateFromPem() in .NET needs to be exported to a byte[] in PKCS#12 (aka PFX) format and re-imported to be used as a server certificate. static X509Certificate2 CreatePkcs12FromPem(string certPem, string keyPem) { using var cert = X509Certificate2.CreateFromPem(certPem, keyPem); return new(cert.Export(X509ContentType.Pkcs12)); } Using a self-signed certificate for HTTP/3 Chromium-based browsers requires the server certificate to be trusted by a default CA for QUIC (e.g. HTTP/3 and WebTransport), even though the same certificate may already be trusted for HTTP/1.1 and HTTP/2. To use a self-signed certificate or a certificate that is not ultimately issued by one of the default CAs, you need to whitelist its SHA-256 hash via the serverCertificateHashes option and follow stricter requirements. See FlyByWireless.CustomCertificate.Generate() on how to generate such a certificate. Collecting a Packet Capture Linux Packet Capture TODO Window Packet Capture On Windows, Packet Monitor (pktmon) is the best way to collect a packet capture. The (optional) first step is usually to find the interface you want to collect the capture on. Do this by first running pktmon list: > pktmon list Network Adapters: Id MAC Address Name -- ----------- ---- 9 40-8D-5C-B5-46-51 Intel(R) Ethernet Connection (2) I219-V #2 104 00-15-5D-D1-5A-30 433db3ea-0acd-457a-9c86-55bb7fa27391 80 00-15-5D-AD-8B-40 433db3ea-0acd-457a-9c86-55bb7fa27391 Note If you don't do this and use it to filter to a specific component, you will get a packet capture at every layer, which will include many duplicates of each packet. Once you find the interface you want, take note of the Id. For instance, in the example above, I want to use the Ethernet adapter, so I need 9. Then, to collect the capture for your scenario (example below uses port 443), run the following: pktmon filter remove pktmon filter add -c 9 -t UDP -p 443 pktmon start --capture --pkt-size 0 pktmon stop pktmon etl2pcap pktmon.etl This produced pktmon.pcapng in your current directory that can then be opened by Wireshark. If you want to be able to decrypt the QUIC packets, you will need to get/export the TLS secrets from your code (todo: add link/instructions). Note If you don't specify the component in the filter step, you can specify it at the etl2pcap step: pktmon etl2pcap pktmon.etl -c 9 and it will produce the same final output pcapng file. Trouble Shooting a Performance Issue Is it a problem with just a single (or very few) connection? Is it a problem multiple (lots) of connections? Why is Performance bad for my Connection? Where is the CPU being spent for my connection? What is limiting throughput for my connection? Why is the network limiting throughput for my connection? Analyzing CPU Usage Important - The following is specific to Windows OS. It's extremely common that everything may be functional, but not just as fast as expected. So the normal next step is then to grab a performance trace for the scenario and then dive into the details to analyze what exactly is happening. When you're explicitly looking for where the CPU is spending its time in the scenario, you will need to collect CPU traces. One way to do this is to use WPR (with Stacks.Light profile). Once you have the ETL, open it in WPA (MsQuic plugin is unnecessary). Then, go to the Graph Explorer, expand Computation, expand CPU Usage (Sampled) and open up Utilization by Process, Thread, Stack. The following is an example of a CPU trace a server in a client upload scenario: The next step is to filter things down to only the important information. First, select the relavent time period: click and drag over the area, right click, select Zoom. Then eliminate any unrelated \"noise\". The easiest way is to select all the Processes that have very little weight; then right click, and select Filter Out Selection. Another important thing to filter is the [Idle] stack under the Idle (0) process. Expact the Thread ID column for Idle (0), right click the [Idle] row and then select Filter Out Selection. You should then be left with only the relavent CPU usage. The current view now shows the CPU usage as a fraction of the total available CPU resources. Generally, the total available resources will include multiple (possibly many) different CPUs. So, it's a lot more helpful to get a per-CPU view. To see this, right click the table header (for instance, on Process), expand More Columns ..., and select CPU. Next, drag the CPU column to the front, before Process. Initially, nothing will most likely show up in the graph view. You will need to manually enable the relavent CPUs. In this particular example, only CPUs 4 and 36 have any significant usage (as seen by Count or Weight columns). In the Series list in the graph view, click the boxes next to each relavent CPU series to enable it. You may also chose to explicitly filter out the CPUs that are irrelevant. Now, you could use this view to dig into the various stacks by expanding the rows under the Stack column, but a better way is to change the view to a Flame Graph. Click the icon at the top to Select chart type and change it to Flame. Additionally, check the Enable box under it to enable filtering. You can then slide the filter slider to change the level of filtering. In this example, after resizing the window some, it looks like this: As you will immediately see, shows up everywhere because symbols haven't been loaded. To load them, you will first need to configure the symbol path. You can do this by clicking on Trace in the top and then Configure Symbol Path. Then add any relavent paths to the list and close the window (hit Ok). Next you can load the symbols; Trace -> Load Symbols. This example has just the msquic.dll and secnetperf.exe symbols loaded. You can then drill down to various various part of the flame, and zoom into them by right clicking on them and selecting Filter To Flame. Also, remember you can change the Flame filter slider at any time as well. For example, the following is what filtering to the msquic.dll!QuicWorkerThread flame. Since this flame was essentially all of CPU 4, whatever is taking the most significant CPU resources here can be blamed as the most significant cost of CPU resources. Practically, this comes down to bcrypt.dll. From experience, since this happens in QuicConnRecvDecryptAndAuthenticate this is the work related to decrypting the packet payloads (over 71% of the CPU!). Finding Throughput Bottlenecks TODO Why is Performance bad across all my Connections? UDP receive offload is not working. The work load isn't spreading evenly across cores. Diagnosing Software UDP Receive Offload Issues Important - The following is specific to Windows OS. Software UDP Receive Offload (URO) is an importance performance feature. To check if URO is working correctly, you can follow this guide and use Full.Verbose profile to collect TCPIP traces. In the converted text file, if you see this event, URO is working. The below event indicates UDP layer saw a UDP packet coalesced from 5 UDP packets each with 1000 byte payload. [Microsoft-Windows-TCPIP]UDP: endpoint 0xFFFFA4033FC652C0: URO SCU received. SegCount = 5, SegSize = 1000, DataLength = 5000. If you are not seeing the above event at all, there are several things that can break URO functionality. Global URO Knob URO can be administratively turned off system-wide from a netsh knob. Check by running netsh int udp show global. If Receive Offload State is displayed as disabled, then URO has been administratively disabled. Incompatible Miniport or Medium Take a look at the IP interface rundown traces. Software RSC/URO applicable must be TRUE for URO to work. If it is FALSE, it means the underlying miniport driver is using NDIS 5 or the interface medium is not compatible (e.g. KDNic). [Microsoft-Windows-TCPIP]Framing: interface rundown: Interface = 8, Luid = 0x6008000000000, Address family = 2(IPV4), Compartment = 1, Isolation mode = 0(None), Isolation ID = 0, DL address = 0x00155D563406, Interface type = 6, Physical medium type = 19(NdisPhysicalMediumOther), SW RSC/URO applicable = 0(FALSE), SW RSC enabled = 0(FALSE), Alias = Ethernet (Kernel Debugger). Incompatible Software Component We also have a rundown trace for URO global disabled mask. The mask must be zero for URO to work. It's common that the mask is 2 (0b000010), which means some incompatible WFP callouts have disabled URO. If you see this on a freshly installed machine, try disabling real-time protection from Microsoft Defender settings. It's also possible the mask is 48 (0b110000), which means some incompatible IPSNPI clients have disabled URO. These (winnat or FSE) can automatically get enabled when WSL or Hyper-V are enabled on a machine. [Microsoft-Windows-TCPIP]TCP software RSC global disabled mask = 0, UDP software URO global disabled mask = 0. Incompatible Hardware Feature (PTP Timestamps) Another reason that UDP packets by design will not be coalesced if they carry different PTP timestamps. PTP timestamp is a feature for accurately synchronizing time supported by some NICs and it should be off by default. You can turn off PTP timestamps in NIC properties. Diagnosing RSS Issues Important - The following is specific to Windows OS. For scenarios with lots of parallel connections, generally the work should be spread across all the available processors. But if there are issues with the work not spreading there are a few things you can do. First off, here's an example where the RPS tests were run to a server that should be able to handle near 1 million requests per second: > secnetperf.exe -test:RPS -target:quic-server -conns:250 -requests:7500 -request:0 -response:4096 -runtime:20000 All Connected! Waiting for idle. Start sending request... Started! Result: 25869 RPS, Min: 944, Max: 888078, 50th: 281889.000000, 90th: 369965.000000, 99th: 490582.000000, 99.9th: 574533.000000, 99.99th: 797810.000000, 99.999th: 884055.000000, 99.9999th: 888078.000000, StdErr: 91.224221 App Main returning status 0 As you can see from the output of secnetperf.exe the resulting 25869 RPS is nowhere near what it should be. The next step is to grab a performance trace to see what is going on. For these type of issues the best way to collect the traces would be to use WPR (with Scheduling.Verbose or Performance.Verbose profiles). Once you have the ETL, open it in WPA using the MsQuic plugin. First thing after opening, let's take a look at the QUIC Worker utilization. In the Graph Explorer, under Computation, expand QUIC Workers and open the one labeled Utilization by Worker. For the example above (server-side trace), here is what the output looks like: You can immediately see that only 2 different workers are being used, with worker 2 being used primarily. In depth details - MsQuic always uses at least two workers on the server side for per connection. The first worker is a global, shared worker that is used to do initial validation of the connection request. Its job is to figure out which app the incoming connection belongs to. Once that's complete, the connection will be handed off to that app (and its worker thread(s)). So, this is why you only see usage of worker 1 at the beginning of the trace. The first usage spikes are from the RPS test initially connecting all its (250) parallel connections. There is a bit of back and forth to do the handshakes for these connections. Then there is an idle period while the test waits for things to die down. Finally, the actual RPS tests commense and that is where you see the solid usage of worker 2. Ideally, RPS tests should generate work that is spread across many different workers. The fact that only 1 worker is being used is definitely the source of the low RPS numbers that were measured and indicated in the tool output above. Since MsQuic picks which workers to use based on how the UDP datagrams are received, the next step is to look into the UDP receive layer. One way to do this is by using the Generic Events table (under System Activity in the Graph Explorer). Open that up, and then filter to just MsQuic (also shows up as ff15e657-4f26-570e-88ab-0796b258d11c some times) Provider Name (Right Click -> Filter To Selection). Next, we're looking for specifically the datapath receive events, which are Id 9218. Filter to just those and we see something like this: Now, what we're really interested in is what CPU these events are coming in on. So, add the CPU column to the left of the yellow bar. It doesn't really show a much different picture, but you can clearly see that all events happen on CPU 0. Now, we can clearly see that all our receive events are happening on the same CPU. This definitely not supposed to happen in an environment where RSS should be spreading all the different incoming UDP flows/tuples to different processors. The next step is to take a look at the RSS configuration on the machine to ensure things are properly configured. Run Get-NetAdapterRss to get an output like this: Name : Slot0A x8 InterfaceDescription : Mellanox ConnectX-3 Pro Ethernet Adapter Enabled : True NumberOfReceiveQueues : 8 Profile : Closest BaseProcessor: [Group:Number] : 0:0 MaxProcessor: [Group:Number] : 1:38 MaxProcessors : 8 RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0 0:2/0 0:4/0 0:6/0 0:8/0 0:10/0 0:12/0 0:14/0 0:16/0 0:18/0 0:20/0 0:22/0 0:24/0 0:26/0 0:28/0 0:30/0 0:32/0 0:34/0 0:36/0 0:38/0 1:0/32767 1:2/32767 1:4/32767 1:6/32767 1:8/32767 1:10/32767 1:12/32767 1:14/32767 1:16/32767 1:18/32767 1:20/32767 1:22/32767 1:24/32767 1:26/32767 1:28/32767 1:30/32767 1:32/32767 1:34/32767 1:36/32767 1:38/32767 IndirectionTable: [Group:Number] : 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 The output above indicates RSS is configured with 8 queues, so there should be spreading of the incoming flows to 8 different CPUs (and then passed to 8 different workers) instead of just the 1 that we are seeing. So, finally, in cases where everything seems to be configured correctly, but things still aren't working, that usually indicates a problem with the network card driver. Make sure the driver is up to date with the latest version available. If that still doesn't fix the problem, you will likely need to contact support from the network card vendor. Analyzing Network Issues TODO Drops on the Receiver Sometimes the issue can actually be the receiver itself, and not the network in between. The problem is that the sender generally cannot distinguish between network drops and receiver drops; even the receiving QUIC layer cannot necessarily identify these drops on its own. On Windows, the OS has a number of performance counters (some seen below) that can be used to analyze packet drops or discards at various layers. TODO - How to use ETW logs to collect even more detailed info on drops by the OS networking stack. TODO - How can you get similar info for Linux? Network Card Discards In some high throughput scenarios, the default number of NIC receive buffers might not be enough to handle spikes in network traffic. When a spike is too large for the NIC to handle, it has to drop the excess packets, resulting in Packet Received Discarded counter increases. When this happens, usually the best way to handle this it to increase the Receive Buffers value in the NIC's Advanced settings (seen below)."
+ "keywords": "Trouble Shooting Guide This document is meant to be a step-by-step guide for trouble shooting any issues while using MsQuic. What kind of Issue are you having? I am debugging a crash. Something is not functionally working as I expect. Performance is not what I expect it to be. Debugging a Crash TODO Trouble Shooting a Functional Issue MsQuic logging? I am getting an error code I don't understand. The connection is unexpectedly shutting down. The stream is aborted No application (stream) data seems to be flowing. Why is this API failing? An MsQuic API is hanging. I am having problems with SMB over QUIC. No credentials when loading a server certificate from PEM with Schannel. TLS handshake fails in Chrome and Edge for HTTP/3 (including WebTransport) even though HTTP/1.1 and HTTP/2 work. I need to get a packet capture. Logging See [Tracing](./Diagnostics.md#Built-in Tracing) Understanding Error Codes Some error codes are MsQuic specific (QUIC_STATUS_*), and some are simply a passthrough from the platform. You can find the MsQuic specific error codes in the platform specific header (msquic_posix.h, msquic_winkernel.h, or msquic_winuser.h). From msquic_winuser.h: #define QUIC_STATUS_ADDRESS_IN_USE HRESULT_FROM_WIN32(WSAEADDRINUSE) // 0x80072740 #define QUIC_STATUS_CONNECTION_TIMEOUT ERROR_QUIC_CONNECTION_TIMEOUT // 0x80410006 #define QUIC_STATUS_CONNECTION_IDLE ERROR_QUIC_CONNECTION_IDLE // 0x80410005 #define QUIC_STATUS_UNREACHABLE HRESULT_FROM_WIN32(ERROR_HOST_UNREACHABLE) // 0x800704d0 #define QUIC_STATUS_INTERNAL_ERROR ERROR_QUIC_INTERNAL_ERROR // 0x80410003 For more info, see the Well Known Status Codes. Linux File Handle Limit Too Small In many Linux setups, the default per-process file handle limit is relatively small (~1024). In scenarios where lots of (usually client) connection are opened, a large number of sockets (a type of file handle) are created. Eventually the handle limit is reached and connections start failing (error codes 0x16 or 0xbebc202) because new sockets cannot be created. To fix this, you will need to increase the handle limit. To query the maximum limit you may set: ulimit -Hn To set a new limit (up to the max): ulimit -n newValue Why is the connection shutting down? What does this QUIC_CONNECTION_EVENT_SHUTDOWN_INITIATED_BY_TRANSPORT event mean? What does this QUIC_CONNECTION_EVENT_SHUTDOWN_INITIATED_BY_PEER event mean? Understanding shutdown by Transport. There are two ways for a connection to be shutdown, either by the application layer or by the transport layer (i.e. the QUIC layer). The QUIC_CONNECTION_EVENT_SHUTDOWN_INITIATED_BY_TRANSPORT event occurs when the transport shuts the connection down. Generally, the transport shuts down the connection either when there's some kind of error or if the negotiated idle period has elapsed. [2]6F30.34B0::2021/04/13-09:22:48.297449100 [Microsoft-Quic][conn][0x1CF25AC46B0] Transport Shutdown: 18446744071566327813 (Remote=0) (QS=1) Above is an example event collected during an attempt to connect to a non-existent server. Eventually the connection failed and the transport indicated the event with the appropriate error code. This error code (18446744071566327813) maps to 0xFFFFFFFF80410005, which specifically refers to the QUIC_STATUS (indicated by QS=1) for 0x80410005; which indicates ERROR_QUIC_CONNECTION_IDLE. For more details for understanding error codes see here. Understanding shutdown by Peer. As indicated in Understanding shutdown by Transport, there are two ways for connections to be shutdown. The QUIC_CONNECTION_EVENT_SHUTDOWN_INITIATED_BY_PEER event occurs when the peer application has explicitly shut down the connection. In MsQuic API terms, this would mean the app called ConnectionShutdown. TODO - Add an example event The error code indicated in this event is completely application defined (type of QUIC_UINT62). The transport has no understanding of the meaning of this value. It never generates these error codes itself. So, to map these values to some meaning will require the application protocol documentation. The stream is aborted. Stream abortion is stream terminated abruptly. Remote Stream is aborted by my side? symptom: Peer complains starting stream is failed due to remote abortion. The remote stream maybe aborted locally by Local calls the StreamShutdown with abortive flags Local calls the StreamClose Local calls the ConnectionShutdown Connection callback handler returns the value other than QUIC_STATUS_SUCCESS for event: QUIC_CONNECTION_EVENT_PEER_STREAM_STARTED Why isn't application data flowing? Application data is exchanged via Streams and queued by the app via StreamSend. The act of queuing data doesn't mean it will be immediately sent to the peer. There are a number of things that can block or delay the exchange. The QUIC_FLOW_BLOCK_REASON enum in quic_trace.h contains the full list of reasons that data may be blocked. Below is a short explanation of each: Value Meaning QUIC_FLOW_BLOCKED_SCHEDULING 1 The cross-connection scheduling logic has determined that too much work is queued on the connection to be processed all at once. Generally, this means we are CPU-bound. QUIC_FLOW_BLOCKED_PACING 2 Data burst sizes into the network are being limited and periodically sent into the network based on the congestion control's pacing logic. QUIC_FLOW_BLOCKED_AMPLIFICATION_PROT 4 The peer has not proved ownership of their IP address and therefore we are locally limiting the amount of data to send to it. QUIC_FLOW_BLOCKED_CONGESTION_CONTROL 8 Congestion control has determined that the network cannot handle any more data currently. QUIC_FLOW_BLOCKED_CONN_FLOW_CONTROL 16 The connection-wide limit for the amount of data that can be buffered or accepted by the peer at this time has been reached. QUIC_FLOW_BLOCKED_STREAM_ID_FLOW_CONTROL 32 The limit on the number of streams the peer can accept has been reached. QUIC_FLOW_BLOCKED_STREAM_FLOW_CONTROL 64 The limit on the amount of data that can be buffered or accepted by the peer for this stream has been reached. QUIC_FLOW_BLOCKED_APP 128 All data queued by the application on this stream has been sent. No more data is available to send. Internally, MsQuic tracks these flags at all times for every connection and stream. Whenever any of them change, MsQuic logs an event. For example: [0]0004.0F54::2021/05/14-10:30:22.541024000 [Microsoft-Quic][strm][0xC16BA610] Send Blocked Flags: 128 This event indicates that stream C16BA610 only has the QUIC_FLOW_BLOCKED_APP flag, so it is currently blocked because there is no more application data queued to be sent. The QUIC WPA plugin also supports visualizing these blocked states via the QUIC TX Blocked State graph. It allows you to see what flags are blocking the connection as a whole (shown as stream 0) and what is blocking each individual stream, over the lifetime of the whole connection. For instance, in the image above, you can see the stream (1) is blocked most of the time because there is no application data. Beyond that, the connection (\"stream\" 0) alternated between pacing and congestion control as the blocked reasons. Why is this API Failing? The simplest way to determine exactly why a particular API is failing is via tracing. Collect the traces for the repro and convert them to text and open them in your favorite text editor (try TextAnalysisTool!). MsQuic logs every API entry and exit. Depending on the platform and tool used to decode the traces to text, you may either have the number or an enum represented as the API type (see QUIC_TRACE_API_TYPE in quic_trace.h), but all events look something like this: [cpu][process.thread][time][ api] Enter () [cpu][process.thread][time][ api] Exit [optional status code] A TextAnalysisTool filter (api.tat) is also included in ./docs/tat to help quickly find all failed API calls. Example (ListenerStart failing with QUIC_STATUS_INVALID_STATE) In a recent example, we wanted to know why an app occasionally received QUIC_STATUS_INVALID_STATE when it called ListenerStart. We took the following steps to diagnose it. Collected traces for the repro. Converted to text and opened them in TextAnalysisTool. Added a filter for all API enter events for QUIC_TRACE_API_LISTENER_START (10). Looked for the following [ api] Exit event after each enter event on the same [process.thread]. This quickly resulted in the following pair of events. They show the app called ListenerStart for the listener pointer 7f30ac0dcff0 at 09:54:03.528362 in process 2e73 on thread 2e8b (CPU 1). Shortly after, MsQuic returned with status 200000002 (QUIC_STATUS_INVALID_STATE on Posix platforms). [1][2e73.2e8b][09:54:03.528362][ api] Enter 10 (0x7f30ac0dcff0). [1][2e73.2e8b][09:54:03.528913][ api] Exit 200000002 From here, we simply went backwards from the exit event to find any errors; and came up with the full set of important traces: [1][2e73.2e8b][09:54:03.528362][ api] Enter 10 (0x7f30ac0dcff0). [1][2e73.2e8b][09:54:03.528902][bind][0x7f30b80394e0] Listener (0x7f30ac076d90) already registered on ALPN [1][2e73.2e8b][09:54:03.528903][list][0x7f30ac0dcff0] ERROR, \"Register with binding\". [1][2e73.2e8b][09:54:03.528913][ api] Exit 200000002 This clearly shows that listener 7f30ac0dcff0 failed to register with the binding (i.e. UDP socket abstraction) because listener 7f30ac076d90 was already registered for the same ALPN. MsQuic only allows a single listener to be registered for a given ALPN on a local IP address and port. Why is the API hanging or deadlocking? First, a bit of background. The MsQuic API has two types of APIs: Blocking / Synchronous - These APIs run to completion and only return once finished. When running in the Windows kernel, these MUST NOT be called at DISPATCH_LEVEL. They are denoted by the _IRQL_requires_max_(PASSIVE_LEVEL) annotation. For example, ConnectionClose. Nonblocking / Asynchronous - These APIs merely queue work and return immediately. When running in the Windows kernel, these may be called at DISPATCH_LEVEL. They are denoted by the _IRQL_requires_max_(DISPATCH_LEVEL) annotation. For example, StreamSend. Additional documentation on the MsQuic execution model is available here. Now, back to the problem. The app is calling into an MsQuic API and it is hanging and likely deadlocked. This can only happen for synchronous APIs. What do you do next? Generally, this is because the app is breaking one of the following rules: Do not block the MsQuic thread/callback for any length of time. You may acquire a lock/mutex, but you must guarantee very quick execution. Do not grab a lock that you also hold (on a different thread) when calling back into MsQuic. Do not call MsQuic APIs cross-object on MsQuic the thread/callbacks. For instance, if you're in a callback for Connection A, do not call ConnectionClose for Connection B. To verify exactly what is happening, Collect the traces and open then up in a text editor (ideally TextAnalysisTool). The simplest way forward from here is to filter the logs based on the pointer of the object you are calling the API on. For instance, if you are calling ConnectionClose on 0x7fd36c0019c0, then add a filter for 7fd36c0019c0. Here is an example (filtered) log for just such a case: [0][53805.5381b][11:22:52.896762][ api] Enter 13 (0x7fd36c0019c0). [0][53805.53815][11:22:52.896796][conn][0x7fd36c0019c0] Scheduling: 2 [0][53805.53815][11:22:52.896797][conn][0x7fd36c0019c0] Execute: 1 [0][53805.53815][11:22:52.896797][conn][0x7fd36c0019c0] Recv 1 UDP datagrams [0][53805.53815][11:22:52.896825][conn][0x7fd36c0019c0] IN: BytesRecv=2901 [0][53805.53815][11:22:52.896826][conn][0x7fd36c0019c0] Batch Recv 1 UDP datagrams [0][53805.53815][11:22:52.896854][strm][0x7fd378028360] Created, Conn=0x7fd36c0019c0 ID=0 IsLocal=0 [0][53805.53815][11:22:52.896856][conn][0x7fd36c0019c0] Indicating QUIC_CONNECTION_EVENT_PEER_STREAM_STARTED [0x7fd378028360, 0x0] [1][53805.53813][11:22:53.142398][conn][0x7fd36c0019c0] Queuing 1 UDP datagrams [1][53805.53813][11:22:53.392819][conn][0x7fd36c0019c0] Queuing 1 UDP datagrams [1][53805.53813][11:22:53.644259][conn][0x7fd36c0019c0] Queuing 1 UDP datagrams You will notice 3 different threads (seen in [0][53805.X]): 5381b - The app thread that is calling in to close the connection. 53815 - The MsQuic worker thread that drives execution for the connection. 53813 - The MsQuic UDP thread that is processing received packets and queuing them on the connection. As you can see, the last event/log on the MsQuic worker thread was an indication of a QUIC_CONNECTION_EVENT_PEER_STREAM_STARTED event to the app. There are no further events on this thread (easily verified by adding an additional filter for [53805.53815]). So, the app must be blocking this thread. The most likely scenario is that the app is holding a lock while calling ConnectionClose on thread 5381b and then in thread 53815, the app is trying to acquire the same lock. The solution here is that the app must not hold the lock when it calls into the blocking API, if that lock may also be acquired on the MsQuic thread. Trouble Shooting SMB over QUIC issues To troubleshoot any SMB over QUIC issues on windows platforms, the best way is to collect SMB and QUIC traces and sharing it with SMB developers. Following are the steps: Copy msquic/scripts/t.cmd to a local folder. For SMB Client (a.k.a. RDR) WPP traces t.cmd clion // repro and get the relevant error. t.cmd off For SMB Server WPP traces t.cmd srvon // repro and get the relevant error. t.cmd off Share the generated cab file with SMB developers. Convert PEM to PKCS#12 for Schannel When using Schannel, a certificate imported by X509Certificate2.CreateFromPem() in .NET needs to be exported to a byte[] in PKCS#12 (aka PFX) format and re-imported to be used as a server certificate. static X509Certificate2 CreatePkcs12FromPem(string certPem, string keyPem) { using var cert = X509Certificate2.CreateFromPem(certPem, keyPem); return new(cert.Export(X509ContentType.Pkcs12)); } Using a self-signed certificate for HTTP/3 Chromium-based browsers requires the server certificate to be trusted by a default CA for QUIC (e.g. HTTP/3 and WebTransport), even though the same certificate may already be trusted for HTTP/1.1 and HTTP/2. To use a self-signed certificate or a certificate that is not ultimately issued by one of the default CAs, you need to whitelist its SHA-256 hash via the serverCertificateHashes option and follow stricter requirements. See FlyByWireless.CustomCertificate.Generate() on how to generate such a certificate. Collecting a Packet Capture Linux Packet Capture # Capture from any interface, all the udp traffic on the host and write it to the msquic.pcap file tcpdump -i any udp -w msquic.pcap # wireshark to view it. (You need load SSLKEYLOGFILE for the session key to decrypt the session) wireshark ./msquic.pcap Window Packet Capture On Windows, Packet Monitor (pktmon) is the best way to collect a packet capture. The (optional) first step is usually to find the interface you want to collect the capture on. Do this by first running pktmon list: > pktmon list Network Adapters: Id MAC Address Name -- ----------- ---- 9 40-8D-5C-B5-46-51 Intel(R) Ethernet Connection (2) I219-V #2 104 00-15-5D-D1-5A-30 433db3ea-0acd-457a-9c86-55bb7fa27391 80 00-15-5D-AD-8B-40 433db3ea-0acd-457a-9c86-55bb7fa27391 Note If you don't do this and use it to filter to a specific component, you will get a packet capture at every layer, which will include many duplicates of each packet. Once you find the interface you want, take note of the Id. For instance, in the example above, I want to use the Ethernet adapter, so I need 9. Then, to collect the capture for your scenario (example below uses port 443), run the following: pktmon filter remove pktmon filter add -c 9 -t UDP -p 443 pktmon start --capture --pkt-size 0 pktmon stop pktmon etl2pcap pktmon.etl This produced pktmon.pcapng in your current directory that can then be opened by Wireshark. If you want to be able to decrypt the QUIC packets, you will need to get/export the TLS secrets from your code (todo: add link/instructions). Note If you don't specify the component in the filter step, you can specify it at the etl2pcap step: pktmon etl2pcap pktmon.etl -c 9 and it will produce the same final output pcapng file. Trouble Shooting a Performance Issue Is it a problem with just a single (or very few) connection? Is it a problem multiple (lots) of connections? Why is Performance bad for my Connection? Where is the CPU being spent for my connection? What is limiting throughput for my connection? Why is the network limiting throughput for my connection? Analyzing CPU Usage Important - The following is specific to Windows OS. It's extremely common that everything may be functional, but not just as fast as expected. So the normal next step is then to grab a performance trace for the scenario and then dive into the details to analyze what exactly is happening. When you're explicitly looking for where the CPU is spending its time in the scenario, you will need to collect CPU traces. One way to do this is to use WPR (with Stacks.Light profile). Once you have the ETL, open it in WPA (MsQuic plugin is unnecessary). Then, go to the Graph Explorer, expand Computation, expand CPU Usage (Sampled) and open up Utilization by Process, Thread, Stack. The following is an example of a CPU trace a server in a client upload scenario: The next step is to filter things down to only the important information. First, select the relavent time period: click and drag over the area, right click, select Zoom. Then eliminate any unrelated \"noise\". The easiest way is to select all the Processes that have very little weight; then right click, and select Filter Out Selection. Another important thing to filter is the [Idle] stack under the Idle (0) process. Expact the Thread ID column for Idle (0), right click the [Idle] row and then select Filter Out Selection. You should then be left with only the relavent CPU usage. The current view now shows the CPU usage as a fraction of the total available CPU resources. Generally, the total available resources will include multiple (possibly many) different CPUs. So, it's a lot more helpful to get a per-CPU view. To see this, right click the table header (for instance, on Process), expand More Columns ..., and select CPU. Next, drag the CPU column to the front, before Process. Initially, nothing will most likely show up in the graph view. You will need to manually enable the relavent CPUs. In this particular example, only CPUs 4 and 36 have any significant usage (as seen by Count or Weight columns). In the Series list in the graph view, click the boxes next to each relavent CPU series to enable it. You may also chose to explicitly filter out the CPUs that are irrelevant. Now, you could use this view to dig into the various stacks by expanding the rows under the Stack column, but a better way is to change the view to a Flame Graph. Click the icon at the top to Select chart type and change it to Flame. Additionally, check the Enable box under it to enable filtering. You can then slide the filter slider to change the level of filtering. In this example, after resizing the window some, it looks like this: As you will immediately see, shows up everywhere because symbols haven't been loaded. To load them, you will first need to configure the symbol path. You can do this by clicking on Trace in the top and then Configure Symbol Path. Then add any relavent paths to the list and close the window (hit Ok). Next you can load the symbols; Trace -> Load Symbols. This example has just the msquic.dll and secnetperf.exe symbols loaded. You can then drill down to various various part of the flame, and zoom into them by right clicking on them and selecting Filter To Flame. Also, remember you can change the Flame filter slider at any time as well. For example, the following is what filtering to the msquic.dll!QuicWorkerThread flame. Since this flame was essentially all of CPU 4, whatever is taking the most significant CPU resources here can be blamed as the most significant cost of CPU resources. Practically, this comes down to bcrypt.dll. From experience, since this happens in QuicConnRecvDecryptAndAuthenticate this is the work related to decrypting the packet payloads (over 71% of the CPU!). Finding Throughput Bottlenecks TODO Why is Performance bad across all my Connections? UDP receive offload is not working. The work load isn't spreading evenly across cores. Diagnosing Software UDP Receive Offload Issues Important - The following is specific to Windows OS. Software UDP Receive Offload (URO) is an importance performance feature. To check if URO is working correctly, you can follow this guide and use Full.Verbose profile to collect TCPIP traces. In the converted text file, if you see this event, URO is working. The below event indicates UDP layer saw a UDP packet coalesced from 5 UDP packets each with 1000 byte payload. [Microsoft-Windows-TCPIP]UDP: endpoint 0xFFFFA4033FC652C0: URO SCU received. SegCount = 5, SegSize = 1000, DataLength = 5000. If you are not seeing the above event at all, there are several things that can break URO functionality. Global URO Knob URO can be administratively turned off system-wide from a netsh knob. Check by running netsh int udp show global. If Receive Offload State is displayed as disabled, then URO has been administratively disabled. Incompatible Miniport or Medium Take a look at the IP interface rundown traces. Software RSC/URO applicable must be TRUE for URO to work. If it is FALSE, it means the underlying miniport driver is using NDIS 5 or the interface medium is not compatible (e.g. KDNic). [Microsoft-Windows-TCPIP]Framing: interface rundown: Interface = 8, Luid = 0x6008000000000, Address family = 2(IPV4), Compartment = 1, Isolation mode = 0(None), Isolation ID = 0, DL address = 0x00155D563406, Interface type = 6, Physical medium type = 19(NdisPhysicalMediumOther), SW RSC/URO applicable = 0(FALSE), SW RSC enabled = 0(FALSE), Alias = Ethernet (Kernel Debugger). Incompatible Software Component We also have a rundown trace for URO global disabled mask. The mask must be zero for URO to work. It's common that the mask is 2 (0b000010), which means some incompatible WFP callouts have disabled URO. If you see this on a freshly installed machine, try disabling real-time protection from Microsoft Defender settings. It's also possible the mask is 48 (0b110000), which means some incompatible IPSNPI clients have disabled URO. These (winnat or FSE) can automatically get enabled when WSL or Hyper-V are enabled on a machine. [Microsoft-Windows-TCPIP]TCP software RSC global disabled mask = 0, UDP software URO global disabled mask = 0. Incompatible Hardware Feature (PTP Timestamps) Another reason that UDP packets by design will not be coalesced if they carry different PTP timestamps. PTP timestamp is a feature for accurately synchronizing time supported by some NICs and it should be off by default. You can turn off PTP timestamps in NIC properties. Diagnosing RSS Issues Important - The following is specific to Windows OS. For scenarios with lots of parallel connections, generally the work should be spread across all the available processors. But if there are issues with the work not spreading there are a few things you can do. First off, here's an example where the RPS tests were run to a server that should be able to handle near 1 million requests per second: > secnetperf.exe -test:RPS -target:quic-server -conns:250 -requests:7500 -request:0 -response:4096 -runtime:20000 All Connected! Waiting for idle. Start sending request... Started! Result: 25869 RPS, Min: 944, Max: 888078, 50th: 281889.000000, 90th: 369965.000000, 99th: 490582.000000, 99.9th: 574533.000000, 99.99th: 797810.000000, 99.999th: 884055.000000, 99.9999th: 888078.000000, StdErr: 91.224221 App Main returning status 0 As you can see from the output of secnetperf.exe the resulting 25869 RPS is nowhere near what it should be. The next step is to grab a performance trace to see what is going on. For these type of issues the best way to collect the traces would be to use WPR (with Scheduling.Verbose or Performance.Verbose profiles). Once you have the ETL, open it in WPA using the MsQuic plugin. First thing after opening, let's take a look at the QUIC Worker utilization. In the Graph Explorer, under Computation, expand QUIC Workers and open the one labeled Utilization by Worker. For the example above (server-side trace), here is what the output looks like: You can immediately see that only 2 different workers are being used, with worker 2 being used primarily. In depth details - MsQuic always uses at least two workers on the server side for per connection. The first worker is a global, shared worker that is used to do initial validation of the connection request. Its job is to figure out which app the incoming connection belongs to. Once that's complete, the connection will be handed off to that app (and its worker thread(s)). So, this is why you only see usage of worker 1 at the beginning of the trace. The first usage spikes are from the RPS test initially connecting all its (250) parallel connections. There is a bit of back and forth to do the handshakes for these connections. Then there is an idle period while the test waits for things to die down. Finally, the actual RPS tests commense and that is where you see the solid usage of worker 2. Ideally, RPS tests should generate work that is spread across many different workers. The fact that only 1 worker is being used is definitely the source of the low RPS numbers that were measured and indicated in the tool output above. Since MsQuic picks which workers to use based on how the UDP datagrams are received, the next step is to look into the UDP receive layer. One way to do this is by using the Generic Events table (under System Activity in the Graph Explorer). Open that up, and then filter to just MsQuic (also shows up as ff15e657-4f26-570e-88ab-0796b258d11c some times) Provider Name (Right Click -> Filter To Selection). Next, we're looking for specifically the datapath receive events, which are Id 9218. Filter to just those and we see something like this: Now, what we're really interested in is what CPU these events are coming in on. So, add the CPU column to the left of the yellow bar. It doesn't really show a much different picture, but you can clearly see that all events happen on CPU 0. Now, we can clearly see that all our receive events are happening on the same CPU. This definitely not supposed to happen in an environment where RSS should be spreading all the different incoming UDP flows/tuples to different processors. The next step is to take a look at the RSS configuration on the machine to ensure things are properly configured. Run Get-NetAdapterRss to get an output like this: Name : Slot0A x8 InterfaceDescription : Mellanox ConnectX-3 Pro Ethernet Adapter Enabled : True NumberOfReceiveQueues : 8 Profile : Closest BaseProcessor: [Group:Number] : 0:0 MaxProcessor: [Group:Number] : 1:38 MaxProcessors : 8 RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0 0:2/0 0:4/0 0:6/0 0:8/0 0:10/0 0:12/0 0:14/0 0:16/0 0:18/0 0:20/0 0:22/0 0:24/0 0:26/0 0:28/0 0:30/0 0:32/0 0:34/0 0:36/0 0:38/0 1:0/32767 1:2/32767 1:4/32767 1:6/32767 1:8/32767 1:10/32767 1:12/32767 1:14/32767 1:16/32767 1:18/32767 1:20/32767 1:22/32767 1:24/32767 1:26/32767 1:28/32767 1:30/32767 1:32/32767 1:34/32767 1:36/32767 1:38/32767 IndirectionTable: [Group:Number] : 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 0:0 0:2 0:4 0:6 0:8 0:10 0:12 0:14 The output above indicates RSS is configured with 8 queues, so there should be spreading of the incoming flows to 8 different CPUs (and then passed to 8 different workers) instead of just the 1 that we are seeing. So, finally, in cases where everything seems to be configured correctly, but things still aren't working, that usually indicates a problem with the network card driver. Make sure the driver is up to date with the latest version available. If that still doesn't fix the problem, you will likely need to contact support from the network card vendor. Analyzing Network Issues TODO Drops on the Receiver Sometimes the issue can actually be the receiver itself, and not the network in between. The problem is that the sender generally cannot distinguish between network drops and receiver drops; even the receiving QUIC layer cannot necessarily identify these drops on its own. On Windows, the OS has a number of performance counters (some seen below) that can be used to analyze packet drops or discards at various layers. TODO - How to use ETW logs to collect even more detailed info on drops by the OS networking stack. TODO - How can you get similar info for Linux? Network Card Discards In some high throughput scenarios, the default number of NIC receive buffers might not be enough to handle spikes in network traffic. When a spike is too large for the NIC to handle, it has to drop the excess packets, resulting in Packet Received Discarded counter increases. When this happens, usually the best way to handle this it to increase the Receive Buffers value in the NIC's Advanced settings (seen below)."
},
"docs/Versions.html": {
"href": "docs/Versions.html",