onnxruntime/cmake
Tianlei Wu 72186bbb71
[CUDA] Build nhwc ops by default (#22648)
### Description

* Build CUDA NHWC ops by default.
* Deprecate `--enable_cuda_nhwc_ops` in build.py and add the
`--disable_cuda_nhwc_ops` option.

Note that NHWC ops require cuDNN 9.x. If you build with cuDNN 8, NHWC ops
are disabled automatically.

### Motivation and Context

In general, NHWC is faster than NCHW for convolution on NVIDIA GPUs with
Tensor Cores, so this could improve performance for vision models.

This is the first step toward preferring NHWC for CUDA in the 1.21 release. The
next step is to run tests on popular vision models. If it helps on most
models and devices, set `prefer_nhwc=1` as the default CUDA provider option.
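
Until `prefer_nhwc=1` becomes the default, the option can be set explicitly per session. The following is a minimal sketch, not part of this change: the helper name `cuda_providers` and the model path are hypothetical, and it assumes a build where the CUDA NHWC ops are enabled (cuDNN 9.x).

```python
def cuda_providers(prefer_nhwc: bool = True, device_id: int = 0):
    """Build a `providers` list for onnxruntime.InferenceSession.

    Hypothetical helper for illustration. Provider options are passed
    as strings; `prefer_nhwc` is the CUDA provider option named above.
    """
    cuda_options = {
        "device_id": str(device_id),
        "prefer_nhwc": "1" if prefer_nhwc else "0",
    }
    # Keep the CPU provider as a fallback for nodes CUDA does not handle.
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]


providers = cuda_providers(prefer_nhwc=True)

# With a CUDA-enabled onnxruntime installed and a model on disk
# (the path below is a placeholder):
#   import onnxruntime as ort
#   session = ort.InferenceSession("model.onnx", providers=providers)
```

Comparing latency with `prefer_nhwc` on and off for the same model is one way to check whether NHWC helps on a given device.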
2024-11-06 09:54:55 -08:00
..
external Add implementation of WebGPU EP (#22591) 2024-10-29 18:29:40 -07:00
patches Add implementation of WebGPU EP (#22591) 2024-10-29 18:29:40 -07:00
tensorboard Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
CMakeLists.txt [CUDA] Build nhwc ops by default (#22648) 2024-11-06 09:54:55 -08:00
CMakePresets.json Create CMake option `onnxruntime_USE_VCPKG` (#21348) 2024-09-10 16:39:27 -07:00
CMakeSettings.json Fork the WinML APIs into the Microsoft namespace (#3503) 2020-04-17 06:18:54 -07:00
EnableVisualStudioCodeAnalysis.props Fix SDL warnings in CPU EP (#9975) 2021-12-19 20:54:29 -08:00
Info.plist.in Enable build dynamic framework for macOS/iOS (#7343) 2021-04-15 16:47:53 -07:00
Sdl.ruleset Add a Github workflow for Prefast (#15763) 2023-05-03 11:42:51 -07:00
adjust_global_compile_flags.cmake [JS/WebGPU] Support WASM64 (#21836) 2024-10-24 20:21:51 -07:00
arm64x.cmake Dev/mookerem/arm64x update (#20536) 2024-05-07 12:50:38 -07:00
codeconv.runsettings CMake changes (#2961) 2020-02-03 19:33:14 -08:00
deps.txt Remove nsync (#20413) 2024-10-21 15:32:14 -07:00
deps_update_and_upload.py Update google benchmark to 1.8.3. (#19734) 2024-03-01 11:01:58 -08:00
gdk_toolchain.cmake Enable building with a GDK (#11126) 2022-04-07 15:06:31 -07:00
hip_fatbin_insert [MIGraphX EP/ ROCm EP] add gfx1200, gfx1201 to CMAKE_HIP_ARCHITECTURES (#22348) 2024-10-11 17:31:36 -07:00
libonnxruntime.pc.cmake.in cmake: support install target with generated pkg-config file (#7076) 2021-03-22 19:36:31 -07:00
linux_arm32_crosscompile_toolchain.cmake Add a build validation for Linux ARM64 cross-compile (#18200) 2023-11-08 13:03:18 -08:00
linux_arm64_crosscompile_toolchain.cmake Add a build validation for Linux ARM64 cross-compile (#18200) 2023-11-08 13:03:18 -08:00
maccatalyst_prepare_objects_for_prelink.py Support xcframework for mac catalyst builds. (#19534) 2024-03-20 10:55:19 -07:00
nuget_helpers.cmake Update nuget.exe used in WindowsAI nuget packaging so `readme` property is supported. (#22141) 2024-09-19 19:06:47 +10:00
onnxruntime.cmake Refactor the cmake code that is related to delay loading (#22646) 2024-11-04 16:30:50 -08:00
onnxruntime_codegen_tvm.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_common.cmake Enable QNN HTP support for Node (#20576) 2024-05-09 13:11:07 -07:00
onnxruntime_compile_triton_kernel.cmake [CUDA] Add SparseAttention operator for Phi-3-small (#20216) 2024-04-30 09:06:29 -07:00
onnxruntime_config.h.in Get build working on Xcode 16 (#22168) 2024-09-24 08:33:03 -07:00
onnxruntime_csharp.cmake Refactor training build options (#13964) 2023-01-03 13:28:16 -08:00
onnxruntime_flatbuffers.cmake Rework some external targets to ease building with `-DFETCHCONTENT_FULLY_DISCONNECTED=ON` (#15323) 2023-04-03 17:45:12 -07:00
onnxruntime_framework.cmake Adding CUDNN Frontend and use for CUDA NN Convolution (#19470) 2024-08-02 15:16:42 -07:00
onnxruntime_framework.natvis [C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506) 2023-07-14 10:46:52 -07:00
onnxruntime_fuzz_test.cmake [Fuzzer] Add two new ORT libfuzzer (Linux clang support for now) (#22055) 2024-09-12 11:50:34 -07:00
onnxruntime_graph.cmake [Apple framework] Fix minimal build with training enabled. (#19858) 2024-03-12 11:33:30 -07:00
onnxruntime_ios.toolchain.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_java.cmake Remove deprecated "mobile" packages (#20941) 2024-06-07 16:20:32 -05:00
onnxruntime_java_unittests.cmake [Java] Add API for appending QNN EP (#22208) 2024-10-01 10:18:04 -07:00
onnxruntime_kernel_explorer.cmake [ROCm] prefer hip interfaces over roc during hipify (#22394) 2024-10-14 20:34:03 -07:00
onnxruntime_lora.cmake Multi-Lora support (#22046) 2024-09-30 15:59:07 -07:00
onnxruntime_mlas.cmake Remove nsync (#20413) 2024-10-21 15:32:14 -07:00
onnxruntime_nodejs.cmake Initial WebGPU EP checkin (#22318) 2024-10-08 16:10:46 -07:00
onnxruntime_objectivec.cmake Initial WebGPU EP checkin (#22318) 2024-10-08 16:10:46 -07:00
onnxruntime_opschema_lib.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_optimizer.cmake Flash attention recompute (#20603) 2024-05-21 13:38:19 +08:00
onnxruntime_providers.cmake Initial WebGPU EP checkin (#22318) 2024-10-08 16:10:46 -07:00
onnxruntime_providers_acl.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_armnn.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_azure.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_cann.cmake Remove nsync (#20413) 2024-10-21 15:32:14 -07:00
onnxruntime_providers_coreml.cmake Fix Objective-C static analysis warnings. (#20417) 2024-04-24 11:48:29 -07:00
onnxruntime_providers_cpu.cmake Initial WebGPU EP checkin (#22318) 2024-10-08 16:10:46 -07:00
onnxruntime_providers_cuda.cmake Remove nsync (#20413) 2024-10-21 15:32:14 -07:00
onnxruntime_providers_dml.cmake Refactor the cmake code that is related to delay loading (#22646) 2024-11-04 16:30:50 -08:00
onnxruntime_providers_dnnl.cmake Remove nsync (#20413) 2024-10-21 15:32:14 -07:00
onnxruntime_providers_js.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_migraphx.cmake Remove nsync (#20413) 2024-10-21 15:32:14 -07:00
onnxruntime_providers_nnapi.cmake Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723) 2024-03-12 10:55:49 +10:00
onnxruntime_providers_openvino.cmake Ovep develop lnl 1.2 (#22424) 2024-10-14 12:10:01 -07:00
onnxruntime_providers_qnn.cmake Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723) 2024-03-12 10:55:49 +10:00
onnxruntime_providers_rknpu.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_rocm.cmake Remove nsync (#20413) 2024-10-21 15:32:14 -07:00
onnxruntime_providers_tensorrt.cmake Remove nsync (#20413) 2024-10-21 15:32:14 -07:00
onnxruntime_providers_tvm.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_vitisai.cmake [VitisAI] remove wrong error msg, required by Microsoft (#21715) 2024-08-21 21:10:28 -07:00
onnxruntime_providers_vsinpu.cmake Remove nsync (#20413) 2024-10-21 15:32:14 -07:00
onnxruntime_providers_webgpu.cmake Add implementation of WebGPU EP (#22591) 2024-10-29 18:29:40 -07:00
onnxruntime_providers_webnn.cmake Split onnxruntime_providers.cmake to multiple (#17853) 2023-10-09 20:33:44 -07:00
onnxruntime_providers_xnnpack.cmake Make partitioning utils QDQ aware so it does not break up QDQ node units (#19723) 2024-03-12 10:55:49 +10:00
onnxruntime_python.cmake Refactor the cmake code that is related to delay loading (#22646) 2024-11-04 16:30:50 -08:00
onnxruntime_rocm_hipify.cmake [ROCm] redo hipify of version controlled files (#22449) 2024-10-18 12:40:54 -07:00
onnxruntime_session.cmake Multi-Lora support (#22046) 2024-09-30 15:59:07 -07:00
onnxruntime_snpe_provider.cmake Use target name for flatbuffers (#13991) 2022-12-20 11:44:02 -08:00
onnxruntime_training.cmake Multi-Lora support (#22046) 2024-09-30 15:59:07 -07:00
onnxruntime_unittests.cmake Add implementation of WebGPU EP (#22591) 2024-10-29 18:29:40 -07:00
onnxruntime_util.cmake Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
onnxruntime_visionos.toolchain.cmake Support visionos build (#20365) 2024-04-23 18:15:07 -07:00
onnxruntime_webassembly.cmake [JS/WebGPU] Support WASM64 (#21836) 2024-10-24 20:21:51 -07:00
precompiled_header.cmake Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
riscv64.toolchain.cmake Enable RISC-V 64-bit Cross-Compiling Support for ONNX Runtime on Linux (#19238) 2024-01-24 16:27:05 -08:00
set_winapi_family_desktop.h Fix WCOS/Win32 linking bugs (#3126) 2020-03-19 08:52:40 -07:00
target_delayload.cmake Refactor the cmake code that is related to delay loading (#22646) 2024-11-04 16:30:50 -08:00
uwp_stubs.h Run clang-format in CI (#15524) 2023-04-18 09:26:58 -07:00
vcpkg-configuration.json Auto regenerate LORA's fbs files (#22313) 2024-10-04 10:01:19 -07:00
vcpkg.json Create CMake option `onnxruntime_USE_VCPKG` (#21348) 2024-09-10 16:39:27 -07:00
wcos_rules_override.cmake Stop using apiset in OneCore build: use onecoreuap.lib instead of onecoreuap_apiset.lib (#19632) 2024-02-23 22:31:57 -08:00
winml.cmake Change libonnxruntime.so's SONAME: remove the minor and patch version. (#21339) 2024-07-15 14:21:34 -07:00
winml_cppwinrt.cmake Fix Windows Store build (#8753) 2021-08-23 11:19:03 -07:00
winml_sdk_helpers.cmake Merge windowsai (winml layering) into master (#2956) 2020-02-04 17:12:19 -08:00
winml_unittests.cmake Multi-Lora support (#22046) 2024-09-30 15:59:07 -07:00