onnxruntime

History

Hector Li 401d16c671 Enable QNN HTP spill fill buffer setting to save RAM usage. (#22853)		2024-12-06 11:36:52 -08:00

### Description
Enable the QNN HTP spill fill buffer setting to save RAM usage. This feature is available after QNN 2.28. The QNN context binary needs to be re-generated.

https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_backend.html#qnn-htp-backend-api

Requirements:
1. Re-generate the ONNX model with the QNN context binary by setting the EP option enable_htp_spill_fill_buffer = 1.
2. Works for a model with multiple context binaries. Two ONNX models with context binaries must be merged manually into one ONNX model.
3. Generating the context binary offline requires a Linux platform, since the QnnSystem lib is not available for the Windows x86_64 platform.

Nothing extra is needed when running model inference.

The generated EPContext node will have a max_size attribute holding the maximum spill fill buffer size for the context binary.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
..
c_cxx	Remove extraneous javascript includes (#17558)	2023-09-14 20:43:24 -07:00
execution_providers/images	Remove docs that have been migrated to https://onnxruntime.ai/docs (#6225)	2021-02-05 18:09:27 -08:00
images	API Documentation (#8948)	2021-09-09 22:04:51 -07:00
python	Cleanup code (#22827)	2024-11-19 14:13:33 -08:00
ABI_Dev_Notes.md	Fix a typo in ABI_Dev_Notes.md (#17832)	2023-10-09 07:51:34 -07:00
Android_testing.md	Removed BUILD.md from master as source now lives in gh-pages (#6709)	2021-02-19 11:34:21 -08:00
C_API_Guidelines.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
Coding_Conventions_and_Standards.md	Update lintrunner requirements (#22185)	2024-09-23 18:27:16 -07:00
ContribOperators.md	Enable QNN HTP spill fill buffer setting to save RAM usage. (#22853)	2024-12-06 11:36:52 -08:00
FAQ.md	[Technical docs] Fixed a couple of old links in `FAQ.md` (#17415)	2023-09-26 13:38:24 -07:00
How_To_Update_ONNX_Dev_Notes.md	fix requirements.txt path (#22946)	2024-12-04 13:08:29 -08:00
Memory_Optimizer.md	Flash attention recompute (#20603)	2024-05-21 13:38:19 +08:00
Model_Test.md	Update docs/Model_Test.md (#11466)	2024-05-15 11:33:11 -07:00
NotesOnThreading.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
ONNX_Runtime_Server_Usage.md	Update docs/ONNX_Runtime_Server_Usage.md (#7818)	2021-05-26 16:17:20 -07:00
ORTModule_Convergence_Notes.md	Fix and enable few ORTModule Unit Tests (#19847)	2024-03-12 10:49:19 +08:00
ORTModule_ModuleWithLoss_Wrapper.md	add steps to write modulewithloss wrapper (#16486)	2023-07-11 09:07:35 +08:00
ORTModule_PythonOp_Notes.md	Add document for PythonOp (#17888)	2023-10-12 08:36:22 +08:00
ORTModule_Training_Guidelines.md	Adds ATen fallback for scaled_dot_product_attention (#21107)	2024-07-22 16:37:04 -07:00
ORT_Format_Update_in_1.13.md	Update ORT format v5 change docs to cover limited backwards compatibility in 1.14. (#14413)	2023-01-25 08:23:12 -08:00
ORT_Use_Triton_Kernel.md	Rename a mispelled filename in the documentation (#21066)	2024-06-17 18:18:41 +02:00
OperatorKernels.md	Implementation of TreeEnsemble ai.onnx.ml==5 (#22333)	2024-11-22 19:48:23 +01:00
PR_Guidelines.md	Add guidelines for writing a good PR. (#3830)	2020-05-05 16:28:21 -07:00
Privacy.md	[C# and Python APIs] Expose knobs to enable/disable platform telemetry collection (#5481)	2020-10-21 10:32:13 -07:00
Reduced_Operator_Kernel_build.md	replace 'master' branch ref to 'main' for onnx repo (#12678)	2022-08-30 13:41:42 -07:00
ReleaseManagement.md	Updated TPN for OpenMPI and cleanup (#3932)	2020-05-14 11:42:44 -07:00
Roadmap.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
Server.md	Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172)	2020-12-18 02:00:42 -08:00
Versioning.md	replace 'master' branch ref to 'main' for onnx repo (#12678)	2022-08-30 13:41:42 -07:00
WinML_principles.md	Replace 'master' branch ref to 'main' in the code (#12547)	2022-08-22 10:48:12 -07:00
cmake_guideline.md	fix some typo in docs (#13212)	2022-10-07 15:58:18 -07:00
onnxruntime_dependencies.dot	Update dependencies graph	2020-04-17 07:38:45 -07:00
onnxruntime_dependencies.png	Update dependencies graph	2020-04-17 07:38:45 -07:00
onnxruntime_extensions.md	Remove the extensions submodule (#17097)	2023-08-14 10:16:33 -07:00