### Description
Windows - Fully dynamic ETW controlled logging for ORT and QNN logs
The logging support is documented here:
- https://onnxruntime.ai/docs/performance/tune-performance/logging_tracing.html#tracing---windows
- https://onnxruntime.ai/docs/performance/tune-performance/profiling-tools.html#tracelogging-etw-windows-profiling
This change also adds support for logging ORT SessionCreation on ETW CaptureState.
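The CaptureState idea can be sketched as follows. This is a minimal, platform-independent illustration, not ORT's actual implementation: the enum mirrors the control codes ETW passes to a provider's enable callback (`EVENT_CONTROL_CODE_*` in `evntprov.h`), while `Session`, `OnEtwCallback`, and the event-name string are hypothetical stand-ins. The point is that when a consumer requests a state rundown, the callback emits one `SessionCreation_CaptureState` event per already-live session, so a trace started mid-run still records session configuration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Control codes mirroring what ETW passes to a provider's enable callback.
enum class EtwControlCode : uint32_t {
  DisableProvider = 0,
  EnableProvider = 1,
  CaptureState = 2,  // consumer requested a state rundown
};

// Hypothetical stand-in for a registry entry of a live inference session.
struct Session {
  std::string name;
};

// Sketch of the callback logic: on CaptureState, emit one
// SessionCreation_CaptureState event per already-created session.
// Real code would call TraceLoggingWrite(...) instead of returning strings.
std::vector<std::string> OnEtwCallback(EtwControlCode code,
                                       const std::vector<Session>& sessions) {
  std::vector<std::string> emitted;
  if (code == EtwControlCode::CaptureState) {
    for (const auto& s : sessions) {
      emitted.push_back("SessionCreation_CaptureState:" + s.name);
    }
  }
  return emitted;
}
```

A session created before the trace started is therefore still visible to a consumer that requests capture state (as `wpr` does at the start of a recording).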
### Motivation and Context
The previous ETW support only worked if ETW was enabled before the session started. Long-lived AI inference processes commonly need to be traced and debugged, and this change enables logging fully on the fly.
Without this support, a dev would have to kill a process or stop a service in order to get tracing. We had to do this for a recent issue with QNN; it was painful to get the logs, and it ruined the repro.
### Testing
I tested the following cases:
- Leaving a default ORT run
- Enabling ETW prior to start, leaving it running for the entire session + inferences, then stopping
- Starting ORT session + inference, then enabling and stopping ETW
- Starting an ORT session with long-running inferences:
  - wpr -start [ort.wprp](e6228575e4/ort.wprp (L4)) -start [etw_provider.wprp](e6228575e4/onnxruntime/test/platform/windows/logging/etw_provider.wprp)
  - Wait a few seconds
  - wpr -stop ort.etl
  - Inferences are still running
- Verified that ONNXRuntimeLogEvent provider events are present, and that the new SessionCreation_CaptureState event appears under the Microsoft.ML.ONNXRuntime provider
Related: #18882, #19428
### Description
This PR has several combined ORT ETW changes that improve ORT log
diagnosability & performance.
- The existing log and Severity behavior configured by the dev through the ORT API at compile time remains the same
- The PR keeps the existing design, which defines 2 TraceLogging providers (though not both were in use before this PR)
- Keeps great inference (inf) and session load performance even with
dynamic logging enabled (see below)
- On Windows, when ONNXRuntimeTraceLoggingProvider is enabled, ORT dynamically _adds_ a new sink reflecting the severity level provided by ETW (e.g., Critical through Verbose), per the need at runtime
- This allows previous printf-style LOGS() statements, for both CPU and NPU cases, to flow to ETW via a local trace (if enabled)
- This DOES NOT add any new Telemetry which can optionally be sent to
Microsoft.
- Telemetry events are ETW events marked with the Measure keyword that _can_ be sampled if a machine opts in
- Existing Microsoft.ML.ONNXRuntime events have appropriate keywords and
levels added if they were missing
- If Execution Providers (EPs) can provide more detailed insight into
their HW or component, then this PR allows for those to be dynamically
logged instead of just at compile time
- In this PR, the QNN EP for QC NPUs can have basic or detailed
profiling enabled to give some insight into how the NPU is performing
- When the Microsoft.ML.ONNXRuntime ETW provider is enabled with the Profiling keyword and level 5, QC QNN basic profiling info is output to ETW
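The level-to-severity translation for the dynamically added sink can be sketched as below. The ETW level values match `TRACE_LEVEL_*` from the Windows SDK; the `OrtSeverity` names mirror `onnxruntime::logging::Severity`, but `MinSeverityForEtwLevel` is a hypothetical helper illustrating the mapping, not ORT's actual function.

```cpp
#include <cstdint>

// ETW trace levels (TRACE_LEVEL_* in evntrace.h).
enum class EtwLevel : uint8_t {
  Critical = 1,
  Error = 2,
  Warning = 3,
  Information = 4,
  Verbose = 5,
};

// ORT log severities, most to least verbose
// (names mirror onnxruntime::logging::Severity).
enum class OrtSeverity : int { kVERBOSE = 0, kINFO = 1, kWARNING = 2, kERROR = 3, kFATAL = 4 };

// Sketch: translate the level an ETW session asked for into the minimum
// ORT severity the dynamically added sink should forward. A session at
// TRACE_LEVEL_VERBOSE gets everything; TRACE_LEVEL_CRITICAL only fatals.
OrtSeverity MinSeverityForEtwLevel(EtwLevel level) {
  switch (level) {
    case EtwLevel::Verbose:     return OrtSeverity::kVERBOSE;
    case EtwLevel::Information: return OrtSeverity::kINFO;
    case EtwLevel::Warning:     return OrtSeverity::kWARNING;
    case EtwLevel::Error:       return OrtSeverity::kERROR;
    case EtwLevel::Critical:    return OrtSeverity::kFATAL;
  }
  return OrtSeverity::kFATAL;  // unreachable for valid levels
}
```

Because the sink is added (and removed) at runtime in response to the ETW enable callback, no log formatting cost is paid while tracing is off.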
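The keyword/level gating for QNN profiling described above can be sketched as follows. This is an illustrative model only: the keyword bit value, the function, and the enum are hypothetical; only the thresholds (level 5 for basic, max level 255 for detailed) come from this PR's description.

```cpp
#include <cstdint>

// Hypothetical keyword bit for the Microsoft.ML.ONNXRuntime provider;
// the actual value in ORT may differ.
constexpr uint64_t kProfilingKeyword = 0x100;

enum class QnnProfilingLevel { Off, Basic, Detailed };

// Sketch of the gating logic: Profiling keyword + level 5 (Verbose)
// enables QNN basic profiling; a max-level (255) request escalates to
// detailed profiling, at a known NPU perf cost.
QnnProfilingLevel SelectQnnProfiling(bool provider_enabled,
                                     uint64_t keywords,
                                     uint8_t level) {
  if (!provider_enabled || (keywords & kProfilingKeyword) == 0) {
    return QnnProfilingLevel::Off;
  }
  if (level >= 255) return QnnProfilingLevel::Detailed;
  if (level >= 5) return QnnProfilingLevel::Basic;
  return QnnProfilingLevel::Off;
}
```

Placing detailed profiling above level 5 means the common "all keywords, level 5" trace profiles get NPU stats without paying the detailed-profiling slowdown.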
### Motivation and Context
- This makes ORT logging and diagnosability more performant (on Windows) and available in a wider variety of runtime environments.
- The performance difference for inference times was drastic: ~300x+ faster when these logs were output to ETW vs. just stdout (Verbose Severity)
- This style of ETW dynamic tracing is the widely used standard for
Windows components, and even by some 3rd party software since the ETW
API is open and part of the Windows API
- These ETW based logs can be seamlessly combined with other ETW logs
such as an AI component/feature using ORT, OS CPU profiling, scheduling,
and more
- Before this PR, ORT logging was largely printf style, output to a sink (usually stdout) only if compiled with a certain log Severity. When enabled, that logging (to stdout) would vastly slow down inference times. Once compiled for release, the internal ORT logs were not accessible to anyone except the AI model developer in their dev inner loop. That meant logs could not be enabled on a lab machine, or on a production system where runtime behavior or performance might differ for various reasons on a wide variety of HW.
- This change was tested with performance in mind and tested with a
mobilenet small AI model with onnxruntime_perf_test
- CPU: There was no statistical difference in inference times between the baseline (main) and this PR, whether ETW was enabled or not (both ORT providers, all keywords, level 5).
- NPU (QNN on SP9 or Dev Kit 2023 QC SQ3): There was no statistical difference in inference times with this PR whether ETW (both ORT providers, all keywords) was enabled or not at Level 5 (Verbose), even with QNN Basic profiling turned on and outputting NPU stats to ETW.
- As expected and designed, there is a perf slowdown when the max level (255) is enabled, which translates to QNN Detailed profiling. This mirrors the expected NPU slowdown when individual model operations are monitored and recorded, and is similar to the QNN SDK Detailed profiling performance separate from this PR. Detailed profiling is deliberately placed above level 5 (Verbose), since 5 is commonly the max level used in many trace profiles and so won't affect inference performance.
- Other OSes such as Linux & Android are left untouched for now.
- Out of scope for this PR, but TraceLogging is available on Linux via LTTng tracing, so in the future this optional tracing could also be made available on other OSes where a TraceLogging API is available.