### Description
### Motivation and Context
1. The Python API doc needs to be merged from a fork, but the 1ES self-hosted
pool is limited to a single GitHub repo.
2. ubuntu-latest installs numpy >= 2.0 by default, and the current
Python API doc generation doesn't support it, so I pinned numpy < 2.0.0.
---------
Avoid producing presentKey/presentValue outputs if pastKey/pastValue
don't exist.
### Description
- Adds support for the GatherElements operator to QNN EP.
- Adds GatherElements to QDQ quantizer tool.
### Motivation and Context
Enable more models to run on QNN EP.
### Description
Added code in MatMul4BitsQuantizer to quantize Gather to
GatherBlockQuantized.
Only Gather with constant data is quantized.
Since the quantized data is int4, the quantized model is force-upgraded
to ONNX opset 21.
The implementation relies purely on numpy; if optimization is needed,
C++ kernels can be added later.
Only the default RTN algorithm is supported, since GatherBlockQuantized
requires zero points to have the same type as the quantized data (see the
sketch below).
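As a rough illustration of the numpy-only approach, here is a minimal sketch of block-wise RTN int4 quantization. The function name, block layout, and rounding details are illustrative assumptions, not the tool's actual code:

```python
import numpy as np

def rtn_block_quantize_int4(data: np.ndarray, block_size: int = 32):
    """Hypothetical sketch: asymmetric RTN int4 quantization along the last axis."""
    qmin, qmax = -8, 7  # signed int4 range
    pad = (-data.shape[-1]) % block_size  # pad so the axis divides into blocks
    padded = np.pad(data, [(0, 0)] * (data.ndim - 1) + [(0, pad)])
    blocks = padded.reshape(*padded.shape[:-1], -1, block_size)
    bmin = blocks.min(axis=-1, keepdims=True)
    bmax = blocks.max(axis=-1, keepdims=True)
    scale = np.where(bmax > bmin, (bmax - bmin) / (qmax - qmin), 1.0)
    # Zero points share the int4 type of the quantized data (hence RTN only).
    zero_point = np.clip(np.rint(qmin - bmin / scale), qmin, qmax)
    quant = np.clip(np.rint(blocks / scale) + zero_point, qmin, qmax)
    return quant.astype(np.int8), scale, zero_point.astype(np.int8)
```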
### Motivation and Context
Support quantizing Gather to int4 in the Web scenario.
### Description
Bug fix for the ShapeInferContext GetAttrxxxs APIs: a node attribute may
be empty.
### Motivation and Context
If the attribute value is empty, the expected result through the interface
is empty, but currently it returns a meaningless {0}.
---------
Co-authored-by: mingyue <mingyue@amd.com>
Co-authored-by: Liu Minyue <mingyue@xilinx.com>
### Description
- TensorRT 10.2.0.19 -> 10.3.0.26
### Motivation and Context
### Description
This PR fixes the `AttentionProbsSoftmax` recompilation issue when
executing the phi3 model. With this fix, phi3 performance improves
further.
### Motivation and Context
### Description
Previously, MultiHeadAttention supported a relative position bias of shape
[1, N, S, T] or [B, N, S, T], and DecoderMaskedMultiHeadAttention
supported [1, N, S, T]. This extends the support to allow [1, N, S,
T], [B, N, S, T], [B, 1, S, T] and [1, 1, S, T] for the CUDA and CPU EPs
(see the broadcast sketch after the lists below).
- [x] Rename the input of "relative position bias" to "attention bias"
because it can also be used for other types of bias, like ALiBi
(Attention with Linear Biases) or attention mask.
- [x] Update unfused kernel to support broadcasting 2nd dimension of
attention bias.
- [x] Update efficient attention to support broadcasting 2nd dimension
of attention bias.
- [x] Update operators (MultiHeadAttention,
DecoderMaskedMultiHeadAttention, Attention, PackedAttention,
PackedMultiHeadAttention) to support broadcast attention bias on CUDA
and CPU EPs.
- [x] Update ROCm, DML and WebGPU naming to be consistent. (Note that
those EPs do not support broadcasting attention_bias for now).
- [x] Add attention bias tests for MultiHeadAttention.
- [x] Update operator documents
- [x] Update benchmark script
Other changes:
* Fix some checks in multihead-attention.ts
* Add helper functions to dump tensors given dimensions.
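For reference, a small numpy check of the newly supported attention-bias shapes; the dimensions and variable names are illustrative, and numpy broadcasting here mirrors what the kernels now do over the batch and head dimensions:

```python
import numpy as np

B, N, S, T = 2, 8, 16, 16  # batch, heads, sequence, total sequence
scores = np.random.randn(B, N, S, T).astype(np.float32)
# All four supported attention_bias shapes broadcast onto the scores:
for shape in [(1, N, S, T), (B, N, S, T), (B, 1, S, T), (1, 1, S, T)]:
    bias = np.random.randn(*shape).astype(np.float32)
    assert (scores + bias).shape == (B, N, S, T)
```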
### Description
This PR modifies the run_dynamo_export function to ensure it mirrors the
behavior of run_torchscript_merged_export rather than
run_torchscript_separate_export. Additionally, I made adjustments to the
main function to ensure that run_dynamo is correctly invoked.
### Motivation and Context
The main motivation for this change is to enable successful export of
LLaMA-2 and LLaMA-3 models using the Dynamo exporter to ONNX.
Previously, the exporter saved two copies of the weights, which is
inefficient. The modified approach ensures that only one copy of the
weights is saved, and the model can support both scenarios. These
changes enhance the exporter's compatibility with LLaMA models (and
subsequently other models) and optimize the export process.
### Description
Handle targets in subdirectories for external projects. All targets
now go in a per-project folder under 'External'.
E.g. gmock and gtest are now handled correctly and are under
External/googletest, versus the existing setup where they ended up as
top-level projects.
![image](https://github.com/user-attachments/assets/99ec259c-47cd-44f3-954d-58569c941cc2)
### Motivation and Context
Improve developer experience.
Chrome Canary is helpful for testing new features. With this PR, we can
enable Chrome Canary in unit tests with a command like
`npm test -- op abs.jsonc -b=webgpu -e=chromecanary`.
### Description
Replace `memset(0)` with `std::fill(T{})`. This ensures that all
types are initialized in a portable way.
### Motivation and Context
Some platforms exhibit intermittent failures with NaN results.
Follow-up to: https://github.com/microsoft/onnxruntime/pull/21525
Cc: @ranjitshs
### Description
Exclude cuDNN 9 and CUDA 12 DLLs from the manylinux wheel to reduce the
Python package size.
### Motivation and Context
The 1.20.0 ort-nightly-gpu Python wheels on Linux are suddenly > 800 MB
in size, while the wheels built on the 1.19 release branch are around
220 MB.
The size change is caused by
https://github.com/microsoft/onnxruntime/pull/19470.
### Description
See
454996d496
for the manual changes (excluding auto-generated formatting changes).
### Why
The toolset for the old clang-format is out-of-date, which reduces
development efficiency.
- The NPM package `clang-format` is already in maintenance mode and has
not been updated for two years.
- The VSCode extension for clang-format has not been maintained for a
while, and a recent Node.js security update broke it entirely on
Windows.
No one in the community seems interested in fixing these.
Prettier was chosen as it is the most popular TS/JS formatter.
### How to merge
It's easy to break the build:
- Be careful of any new commits on main not included in this PR.
- Be careful that after this PR is merged, other PRs that have already
passed CI can still merge.
So, make sure there are no new commits on main before merging this one,
and invalidate JS PRs that already passed CI, forcing them to rebase onto
the latest main.
Bug: https://github.com/microsoft/onnxruntime/issues/21386
### Description
Pins the pytorch-lightning package to version 2.3.3, since version >=2.4.0
requires torch > 2.1.0, which is not compatible with cu118.
### Motivation and Context
ORT 1.19 Release Preparation
### Description
This change addresses the case where we multiply two matrices whose
inner dimension is 0.
numpy and Eigen, which is used in our CPU EP implementation,
correctly handle this case and output an [M, N] matrix filled with
zeros, as shown below.
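The numpy behavior that the CPU EP now matches can be seen with a quick check:

```python
import numpy as np

a = np.ones((3, 0), dtype=np.float32)  # [M, K] with K == 0
b = np.ones((0, 4), dtype=np.float32)  # [K, N]
c = a @ b
assert c.shape == (3, 4) and not c.any()  # [M, N], filled with zeros
```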
### Motivation and Context
This is required to support GenAI empty input Lora implementation.
Addresses: https://github.com/microsoft/onnxruntime/issues/21483
### Description
Replace jcenter. It's deprecated and not responding.
### Motivation and Context
Fix CIs
Fix two issues:
(1) The scale should be fp32 instead of fp16.
(2) The Softmax program does not handle normalized dispatch group values, so if the sequence length is over 65535, the result from this program is incorrect.
### Description
### Motivation and Context
We couldn't get enough A100 agent time to finish the jobs since today.
This PR makes the A100 job run only on the main branch to unblock other
PRs, in case capacity doesn't recover in a short time.
### Description
Update DML runtime binary to 1.15.1
### Motivation and Context
### Description
Fix address sanitizer and memory access bugs 1, 4, 5, 7, and 8 found in
security fuzz testing.
### Motivation and Context
We saw some models fail to run due to OOM, which can be fixed by
increasing trt_max_workspace_size.
This PR removes the size limitation by default (allowing up to the
maximum device memory), which is aligned with trtexec.
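For anyone who still wants an explicit cap, the option can be set through the TensorRT EP provider options as before; the model path and the 2 GB value below are just examples:

```python
import onnxruntime as ort

providers = [
    ("TensorrtExecutionProvider", {"trt_max_workspace_size": 2 * 1024**3}),
    "CUDAExecutionProvider",
]
session = ort.InferenceSession("model.onnx", providers=providers)
```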
### Description
Added the eval model buffer as an optional field in Module so that you can
export for inference using the eval model stored as a buffer.
### Motivation and Context
- Resolves #21152
- The previous solution (PR #21422) produced an eval model that was
specific to the EPs used to train, because of unavoidable runtime
optimizations that changed the graph stored with the eval session.
### Description
This PR improves the range calculation for input to Relu/Clip nodes for
the symmetric quantization case.
### Motivation and Context
Currently, the issue we face is that for the common scenario of Conv
followed by Relu in the symmetric quantization config, different scales
could be assigned to the tensors corresponding to the input and output of
Relu.
The downside is that this may introduce noise due to multiple
re-quantizations, and it makes it difficult to fuse Conv-Relu nodes for
hardware accelerators that support fused Conv-Relu.
Instead, it is more efficient to assign the output range of Relu as the
input range of Relu (i.e., the output range of the upstream op) wherever
possible, as sketched below. This adjustment is currently only being done
for the asymmetric quantization case.
For the scenario where the upstream op has multiple consumers, this
assumption could be incorrect, so in that case we do not adjust the
ranges.
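A minimal sketch of the intended adjustment, assuming a helper that computes a symmetric scale from a tensor range (all names here are hypothetical, not the quantizer's actual code):

```python
def symmetric_scale(rmin: float, rmax: float, qmax: int = 127) -> float:
    # Symmetric quantization derives the scale from the max absolute value.
    return max(abs(rmin), abs(rmax)) / qmax

relu_range = (0.0, 6.0)   # computed range of Relu's output
conv_range = relu_range   # propagate it back to Relu's input (the Conv output)
# Both tensors now share one scale, so there is no re-quantization
# between Conv and Relu, and the pair stays fusable.
assert symmetric_scale(*conv_range) == symmetric_scale(*relu_range)
```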
### Description
Fixes validation of per-channel quantization overrides by not
unnecessarily loading the external weights.
### Motivation and Context
The `get_qnn_qdq_config()` explicitly loads models without external data
(i.e., `onnx.load_model(load_external_data=False)`). Afterwards,
`get_qnn_qdq_config()` calls `tensor_proto_to_array()`, which expects
that the external weights are stored in the current working directory.
If the external weights are stored in a different directory, then we get
a crash.
Loading the actual weight values is unnecessary because we only need the
weight shape. This PR removes the unnecessary
`tensor_proto_to_array()` call.
### Description
Added DequantizeLinear operator for JSEP.
### Motivation and Context
### Description
Add a null-pointer check to avoid a crash when running a session that
previously failed to generate a trt_engine.
### Motivation and Context
Reported and verified in
https://github.com/microsoft/onnxruntime/issues/21567
### Description
This fix addresses the issue of handling multiple QLinear nodes as
outputs from the target node in OVEP. Previously, the stripping logic
only supported a single Q node, leading to incorrect stripping of
additional Q nodes.
### Motivation and Context
The OVEP stripping logic was limited to handling a single Q node as an
output from the target node. As a result, additional Q nodes were being
stripped, despite the stripping rules indicating they should be
retained.
With this fix, OVEP can now properly handle multiple Q nodes according
to the specified stripping rules, ensuring that the fate of each Q node
is correctly determined.
---------
Co-authored-by: sfatimar <sahar.fatima@intel.com>
### Description
Previously, when quantizing MatMul to DQ + MatMul using the 4-bit QDQ
tool chain, the opsets of the domains were not changed.
Now, when quantizing MatMul to DQ + MatMul in QDQ format, the onnx domain
is force-upgraded to opset 21.
### Motivation and Context
In QDQ format, DQ with int4 and blocked quantization is used, which
requires DQ with opset >= 21.
When quantizing MatMul to DQ + MatMul, the onnx domain is therefore
force-upgraded to opset 21, as sketched below.
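A sketch of the kind of opset bump this implies; the exact code in the tool may differ, and the model path is illustrative:

```python
import onnx

model = onnx.load_model("qdq_model.onnx")
for opset in model.opset_import:
    if opset.domain in ("", "ai.onnx"):  # the default onnx domain
        opset.version = max(opset.version, 21)  # int4/blocked DQ needs >= 21
```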
### Description
<!-- Describe your changes. -->
Fix wrong per-tensor quantized weight type for matmul.
### Motivation and Context
Fixes the related bug described in
https://github.com/microsoft/onnxruntime/issues/21346
### Description
Add a gather that supports block-quantized input data.
### Motivation and Context
To support the Web inference scenario with quantized vocabulary embeddings.
### Description
This change enhances the existing Pad Fusion to fuse Pad even if a Cast
operator is present between Pad and Conv/MaxPool/AveragePool. It keeps
the Cast as it is.
<pre>
/*
* Before Fusion:
* Pad
* |
* Cast (Optional)
* |
* Conv/MaxPool/AveragePool
*
* After Fusion:
* Cast (Optional)
* |
* Conv/MaxPool/AveragePool
*/
</pre>
### Motivation and Context