onnxruntime/cmake
Tianlei Wu 72186bbb71
[CUDA] Build nhwc ops by default (#22648)
### Description

* Build CUDA NHWC ops by default.
* Deprecate `--enable_cuda_nhwc_ops` in build.py and add a `--disable_cuda_nhwc_ops` option (a sketch of the flag handling follows below).

Note that NHWC ops require cuDNN 9.x. If you build with cuDNN 8, NHWC ops
will be disabled automatically.
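A minimal sketch of how the new flags could be wired in build.py, assuming argparse-style flag handling and that the choice is forwarded to CMake as a cache variable; the variable name `onnxruntime_USE_CUDA_NHWC_OPS` is an assumption for illustration, not taken from this PR:

```python
# Sketch only: hypothetical build.py flag handling for CUDA NHWC ops.
# The CMake cache variable name onnxruntime_USE_CUDA_NHWC_OPS is an assumption.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--enable_cuda_nhwc_ops",
    action="store_true",
    help="Deprecated; CUDA NHWC ops are now built by default.",
)
parser.add_argument(
    "--disable_cuda_nhwc_ops",
    action="store_true",
    help="Disable building CUDA NHWC ops.",
)
args = parser.parse_args()

if args.enable_cuda_nhwc_ops:
    print("warning: --enable_cuda_nhwc_ops is deprecated; NHWC ops are built by default.")

# Default is ON; only the new flag turns NHWC ops off. The CMake side would
# still force this OFF when building against cuDNN 8.
nhwc_enabled = not args.disable_cuda_nhwc_ops
cmake_args = ["-Donnxruntime_USE_CUDA_NHWC_OPS=" + ("ON" if nhwc_enabled else "OFF")]
print(cmake_args)
```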

### Motivation and Context

In general, NHWC is faster than NCHW for convolutions on NVIDIA GPUs with
Tensor Cores, so this could improve performance for vision models.

This is the first step toward preferring NHWC for CUDA in the 1.21 release. The
next step is to test popular vision models; if NHWC helps on most models and
devices, `prefer_nhwc=1` will be set as the default CUDA provider option.
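Until that default changes, a hedged example of opting in from the onnxruntime Python API by passing the `prefer_nhwc` CUDA provider option named above; the model path is a placeholder, and this assumes a build with CUDA NHWC ops enabled:

```python
# Sketch: request NHWC-preferring execution via the CUDA provider option.
# Assumes an onnxruntime-gpu build with CUDA NHWC ops and a model at model.onnx.
import onnxruntime as ort

providers = [
    ("CUDAExecutionProvider", {"prefer_nhwc": "1"}),  # option named in this PR's motivation
    "CPUExecutionProvider",  # fallback
]
session = ort.InferenceSession("model.onnx", providers=providers)
print(session.get_providers())
```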
2024-11-06 09:54:55 -08:00
external
patches
tensorboard
CMakeLists.txt
CMakePresets.json
CMakeSettings.json
EnableVisualStudioCodeAnalysis.props
Info.plist.in
Sdl.ruleset
adjust_global_compile_flags.cmake
arm64x.cmake
codeconv.runsettings
deps.txt
deps_update_and_upload.py
gdk_toolchain.cmake
hip_fatbin_insert
libonnxruntime.pc.cmake.in
linux_arm32_crosscompile_toolchain.cmake
linux_arm64_crosscompile_toolchain.cmake
maccatalyst_prepare_objects_for_prelink.py
nuget_helpers.cmake
onnxruntime.cmake
onnxruntime_codegen_tvm.cmake
onnxruntime_common.cmake
onnxruntime_compile_triton_kernel.cmake
onnxruntime_config.h.in
onnxruntime_csharp.cmake
onnxruntime_flatbuffers.cmake
onnxruntime_framework.cmake
onnxruntime_framework.natvis
onnxruntime_fuzz_test.cmake
onnxruntime_graph.cmake
onnxruntime_ios.toolchain.cmake
onnxruntime_java.cmake
onnxruntime_java_unittests.cmake
onnxruntime_kernel_explorer.cmake
onnxruntime_lora.cmake
onnxruntime_mlas.cmake
onnxruntime_nodejs.cmake
onnxruntime_objectivec.cmake
onnxruntime_opschema_lib.cmake
onnxruntime_optimizer.cmake
onnxruntime_providers.cmake
onnxruntime_providers_acl.cmake
onnxruntime_providers_armnn.cmake
onnxruntime_providers_azure.cmake
onnxruntime_providers_cann.cmake
onnxruntime_providers_coreml.cmake
onnxruntime_providers_cpu.cmake
onnxruntime_providers_cuda.cmake
onnxruntime_providers_dml.cmake
onnxruntime_providers_dnnl.cmake
onnxruntime_providers_js.cmake
onnxruntime_providers_migraphx.cmake
onnxruntime_providers_nnapi.cmake
onnxruntime_providers_openvino.cmake
onnxruntime_providers_qnn.cmake
onnxruntime_providers_rknpu.cmake
onnxruntime_providers_rocm.cmake
onnxruntime_providers_tensorrt.cmake
onnxruntime_providers_tvm.cmake
onnxruntime_providers_vitisai.cmake
onnxruntime_providers_vsinpu.cmake
onnxruntime_providers_webgpu.cmake
onnxruntime_providers_webnn.cmake
onnxruntime_providers_xnnpack.cmake
onnxruntime_python.cmake
onnxruntime_rocm_hipify.cmake
onnxruntime_session.cmake
onnxruntime_snpe_provider.cmake
onnxruntime_training.cmake
onnxruntime_unittests.cmake
onnxruntime_util.cmake
onnxruntime_visionos.toolchain.cmake
onnxruntime_webassembly.cmake
precompiled_header.cmake
riscv64.toolchain.cmake
set_winapi_family_desktop.h
target_delayload.cmake
uwp_stubs.h
vcpkg-configuration.json
vcpkg.json
wcos_rules_override.cmake
winml.cmake
winml_cppwinrt.cmake
winml_sdk_helpers.cmake
winml_unittests.cmake