### Description
Add the big data file `web/test/data/ops/pad-big.jsonc` to the formatter
ignore list. This file slows down the formatter quite a lot locally.
### Description
<!-- Describe your changes. -->
1. Add OpSchema.
2. VitisAI uses IKernelLookup to check supported ops.
3. VitisAI def_builder adds TypeConstraint-related processing.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: Zhenze Wang <zhenzew@xilinx.com>
### Description
Enable causal attention in the MultiHeadAttention CUDA operator.
All formats (Q_K_V_BSNH_BSNH_BSNH, Q_K_V_BSNH_BNSH_BNSH, Q_KV_BSNH_BSN2H
and QKV_BSN3H) now support causal. Internally, causal attention is
dispatched to the flash attention, efficient attention, or unfused
attention kernel.
### Motivation and Context
Currently, MultiHeadAttention has causal enabled in the CPU EP, but not
in the CUDA EP. This can cause issues in ONNX conversion, where some
models run on CPU but not on CUDA. Enabling causal in CUDA reduces the
difference between the CPU and CUDA support matrices.
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
…ripts (#58)
### Description
<!-- Describe your changes. -->
Removes the heavy-handed disabling of all optimizations for MIGraphX in
the benchmark.py scripts.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
We found that removing all optimizations hurts performance. Let the fine
tuning occur at the script level instead of a blanket NoOPT being
selected.
Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
### Description
<!-- Describe your changes. -->
We see a 2x speedup for phi3 on the integrated Intel GPU with this
optimization.
The optimization mainly stores input A's data in local variables instead
of loading it from global memory each time it is multiplied with B's
data.
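As a rough illustration of the reuse pattern (the real change is in the GPU matmul kernel; the Python function below is only a schematic sketch, not the actual shader code):

```python
def matmul_reuse_a(A, B):
    """Schematic sketch: read each A element once into a local variable and
    reuse it across all columns of B, instead of re-reading it in the
    innermost loop (which stands in for repeated global-memory loads)."""
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for k in range(K):
            a_ik = A[i][k]      # load A once ("local variable")
            row_b = B[k]
            for j in range(N):
                C[i][j] += a_ik * row_b[j]  # reuse the cached A value
    return C
```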
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
Refine `op_types_to_quantize` argument handling in
matmul_4bits_quantizer.py
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
The default `op_types_to_quantize` value `"MatMul"` causes
`tuple(args.op_types_to_quantize)` to become `('M', 'a', 't', 'M', 'u',
'l')`, which is not expected.
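A minimal sketch of the failure mode and one possible fix, assuming the value comes from argparse (the fix shown is illustrative, not necessarily the exact change in matmul_4bits_quantizer.py):

```python
import argparse

# Failure mode: a plain string default is iterated character by character.
assert tuple("MatMul") == ('M', 'a', 't', 'M', 'u', 'l')

# One possible fix: accept multiple values and default to a list, so
# tuple(args.op_types_to_quantize) yields ('MatMul',) instead of characters.
parser = argparse.ArgumentParser()
parser.add_argument("--op_types_to_quantize", nargs="+", default=["MatMul"])
args = parser.parse_args([])
print(tuple(args.op_types_to_quantize))  # ('MatMul',)
```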
### Description
Replace "DML CPU" Allocator with onnxruntime::CpuAllocator
### Motivation and Context
This allocator is ignored by ORTExtensions, which causes CPU memory to
be treated as non-CPU memory and crashes SentencepieceTokenizer.
In general, this allocator does not appear to be used, and its
allocations can be handled just fine by the default allocator.
---------
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
### Description
Address issue #21524.
Enable offset alignment for models saved in the external data format.
The Python data converter fix is here: https://github.com/onnx/onnx/pull/6248
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
Optimize the memory consumption of model_clone, which is a crucial part
of our model preparation.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This is crucial for meeting the requirement for Microsoft's 8.15
release.
---------
Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
Co-authored-by: Chunye Wang <chunywan@amd.com>
### Description
- Adds a dummy bias of all zeros when translating a Conv without an
explicit bias input. This is a workaround for a QNN validation issue
that fails when the optional bias input is not provided.
- Corrects logic for unpacking **non-zero int4** zero-points. The bug
does not impact current models because we only support int4 zero-points
equal to 0 (symmetric quantization), but it would become an issue
if/when QNN supports non-zero int4 zero-points, so it is good to fix now
(a minimal unpacking illustration follows this list).
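A generic sketch of unpacking signed int4 values (two per byte) with sign extension; nibble order and names are illustrative, not the actual QNN EP code:

```python
import numpy as np

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Unpack two signed int4 values per byte (low nibble first) with sign extension."""
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    vals = np.stack([low, high], axis=-1).reshape(-1).astype(np.int8)
    # Sign-extend: nibble values 8..15 represent -8..-1.
    vals[vals > 7] -= 16
    return vals

print(unpack_int4(np.array([0xF0], dtype=np.uint8)))  # [ 0 -1]
```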
### Motivation and Context
Support Conv operators without a bias input on QNN EP with the latest
QNN SDK.
### Description
Do not allow clearing Android logs if the emulator is not running
### Motivation and Context
Previously, if one of the previous steps failed, the "Clearing Android
logs" step would hang until the pipeline timeout.
### Description
<!-- Describe your changes. -->
Remove legacy code and an incorrect message.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Microsoft requires removing the unwanted error message. This is required
for the 8.15 release.
Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
Use debug info to identify the SDPA kernel actually used, and show it in
the output of benchmark_mha.py. This updated benchmark script was used to
get the benchmark results in
https://github.com/microsoft/onnxruntime/pull/21629.
(1) Change the output format of the debug info to lines like SdpaKernel=*.
(2) Add a step to capture stdout from the onnxruntime session, and use a
regular expression to parse SdpaKernel=* from the captured text (see the
sketch below).
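A rough sketch of the capture-and-parse step; the fd-redirection helper and the kernel name are illustrative, only the SdpaKernel=* line format comes from the change above:

```python
import os
import re
import sys
import tempfile

def run_and_capture_stdout(fn):
    """Run fn() while fd 1 is redirected to a temp file, so output printed by
    native code (such as ORT debug info) is captured along with Python prints."""
    sys.stdout.flush()
    saved_fd = os.dup(1)
    with tempfile.TemporaryFile(mode="w+") as tmp:
        os.dup2(tmp.fileno(), 1)
        try:
            fn()
            sys.stdout.flush()
        finally:
            os.dup2(saved_fd, 1)
            os.close(saved_fd)
        tmp.seek(0)
        return tmp.read()

# Parse kernel names from lines like "SdpaKernel=FLASH_ATTENTION" (value illustrative).
text = run_and_capture_stdout(lambda: print("SdpaKernel=FLASH_ATTENTION"))
print(re.findall(r"SdpaKernel=(\w+)", text))  # ['FLASH_ATTENTION']
```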
Other minor changes:
(1) Set different default repeat counts during benchmark: 100 for CPU
and 10000 for CUDA.
(2) Fix PrintTensorByDims used in the console dumper: if dumping is not
enabled, do not dump the tensor.
(3) Update some comments.
### Motivation and Context
Sometimes a fallback is used for the requested sdpa_kernel. This can
confuse users unless the benchmark reports the exact kernel used.
### Description
For some reason, running SparseAttention tests in parallel causes random
failures in the CI pipeline, possibly due to running out of memory when
too many tests run in parallel.
This change runs those tests sequentially.
### Description
Minor changes to resolve some warnings in ORT
### Motivation and Context
Binskim for WindowsAI (which consumes ORT) treats warnings as errors,
and has hit these warnings.
As a security requirement, warnings like "signed/unsigned mismatch" must
be resolved.
### Description
<!-- Describe your changes. -->
Set the exhaustive tuning flag through the MIGraphX API and make this a
session option in Onnxruntime.
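A hedged usage sketch: the provider option key below (`migraphx_exhaustive_tune`) is an assumed name for illustration only; check the MIGraphX EP documentation for the actual key exposed by this change.

```python
import onnxruntime as ort

# Hypothetical option name, for illustration only.
providers = [("MIGraphXExecutionProvider", {"migraphx_exhaustive_tune": "1"})]
sess = ort.InferenceSession("model.onnx", providers=providers)
```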
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Allow users to use MIGraphX exhaustive tuning with Onnxruntime
inference.
This goes hand in hand with save/load support once a model has been
compiled and tuning has been found.
---------
Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
### Description
<!-- Describe your changes. -->
No code changes to the EP, only changes to the scripts which invoke the
MIGraphX EP:
- One case explicitly sets the MIGraphX EP when running gpt2 testing.
- The other ensures we turn off optimizations like TensorRT and allow
MIGraphX to handle graph optimizations.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
MIGraphX has moved away from using rocBLAS, and without this change some
cases used in CI will fail because optimizations will attempt to use
rocBLAS kernels instead of the MIGraphX EP directly.
… to int8 for now
Allow models with biases/full inputs and only check for int8 support in
the EP.
### Description
<!-- Describe your changes. -->
Allows all inputs for MatMulInteger and ConvInteger to be supported for
prequantized models.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fixes issues when using prequantized models that contain weight biases
---------
Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
### Description
Fix `Orttraining Linux Lazy Tensor CI Pipeline`
- Remove unused import of `torch.onnx._internal.exporter`, whose path is
changed in newer torch (pytorch/pytorch#132429).
- Move the import of `register_custom_op_symbolic` from `torch.onnx`
into a local function; the module-level import causes a circular import
when running `import torch.onnx` (at least in the CI environment). A
minimal sketch of the pattern follows this list.
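A minimal sketch of the deferred-import technique; the op name and symbolic function below are illustrative, not the actual ORT files:

```python
def my_symbolic_fn(g, *args):
    # Placeholder symbolic function, for illustration only.
    ...

def register_custom_ops():
    # Defer the import until call time, breaking the circular import that a
    # module-level `from torch.onnx import ...` would trigger during
    # `import torch.onnx`.
    from torch.onnx import register_custom_op_symbolic
    register_custom_op_symbolic("mynamespace::custom_op", my_symbolic_fn, opset_version=9)
```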
### Description
- Fix computation of axis for `QuantizeLinear` inserted after the
sequence `DQ (per-channel) -> Unsqueeze`. Example:
- Original: `DQ (axis = 0) -> Unsqueeze (axes = [0, 1, 2]) -> Op`
- After QDQ fix-up: `DQ (axis = 0) -> Unsqueeze (axes = [0, 1, 2]) -> Q
(axis = 3) -> DQ (axis = 3) -> Op`
- Before this PR, the axis for the inserted Q/DQ ops was not correctly
set to 3 (it was left as 0); see the sketch after this list.
- Fix normalization of negative axis values for `QuantizeLinear`
inserted after the sequence `DQ (per-channel) -> Transpose`.
- Existing code added the wrong rank value to normalize the DQ axis.
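A small sketch of the axis remapping for the Unsqueeze case (an illustrative helper, not the ORT implementation): each unsqueezed axis at or before the mapped position shifts the per-channel axis by one.

```python
def axis_after_unsqueeze(dq_axis: int, unsqueeze_axes: list[int]) -> int:
    """Map a per-channel DQ axis to the corresponding axis after an Unsqueeze.
    Assumes non-negative axes for simplicity."""
    new_axis = dq_axis
    for a in sorted(unsqueeze_axes):
        if a <= new_axis:
            new_axis += 1
    return new_axis

print(axis_after_unsqueeze(0, [0, 1, 2]))  # 3, matching the example above
```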
### Motivation and Context
Fix errors in handling of per-channel DQ in code that fixes QDQ
NodeUnits.
### Description
Compiling onnxruntime with QNN EP on Windows x86_64 results in a
compilation error:
```shell
$ onnxruntime\test\optimizer\qdq_transformer_test.cc(1,1): error C1128: number of sections exceeded object file format limit: compile with /bigobj [...onnxruntime\build\Debug\onnxruntime_test_all.vcxproj]
```
This PR adds the `/bigobj` compilation flag for the
`qdq_transformer_test.cc` file.
### Description
<!-- Describe your changes. -->
### Motivation and Context
1. The Python API doc needs to be merged from a fork, but the 1ES
self-hosted pool is only for one GitHub repo.
2. ubuntu-latest installs numpy above 2.0 by default, and the current
Python API doc generation doesn't support it, so I pin numpy < 2.0.0.
---------
Avoid producing presentKey/presentValue outputs if pastKey/pastValue
don't exist.
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
- Adds support for the GatherElements operator to QNN EP.
- Adds GatherElements to QDQ quantizer tool.
### Motivation and Context
Enable more models to run on QNN EP.
### Description
Added code in MatMul4BitsQuantizer to quantize Gather to
GatherBlockQuantized.
Only Gather nodes with constant data are quantized.
Since the quantized data is int4, the quantized model is force-upgraded
to ONNX opset 21.
The implementation relies purely on numpy; if optimization is needed,
C++ kernels can be added later.
Only the default RTN algorithm is supported, since GatherBlockQuantized
requires zero points to have the same type as the quantized data.
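A rough numpy sketch of block-wise int4 RTN quantization along the last axis; the block size, the symmetric (zero point = 0) choice, and the function name are illustrative assumptions, not the exact quantizer code:

```python
import numpy as np

def rtn_quantize_int4_blockwise(data: np.ndarray, block_size: int = 32):
    """Round-to-nearest int4 quantization per block along the last axis,
    with the zero point fixed at 0 for simplicity."""
    orig_shape = data.shape
    blocks = data.reshape(-1, block_size)            # assumes last dim divisible by block_size
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)      # avoid division by zero
    q = np.clip(np.rint(blocks / scales), -8, 7).astype(np.int8)
    return q.reshape(orig_shape), scales.reshape(orig_shape[:-1] + (-1,))
```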
### Motivation and Context
Support quantizing Gather to int4 in the Web scenario.
### Description
<!-- Describe your changes. -->
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Bug fix for the ShapeInferContext GetAttrxxxs APIs. A node attribute may
be empty.
### Motivation and Context
If the attribute value is empty, the expected result through the
interface is empty, but currently it returns a meaningless {0}.
---------
Co-authored-by: mingyue <mingyue@amd.com>
Co-authored-by: Liu Minyue <mingyue@xilinx.com>
### Description
- TensorRT 10.2.0.19 -> 10.3.0.26
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
This PR fixes the `AttentionProbsSoftmax` recompilation issue when
executing the phi3 model. With this fix, phi3 performance further
improves.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Previously, MultiHeadAttention supported a relative position bias of
shape [1, N, S, T] or [B, N, S, T], and DecoderMaskedMultiHeadAttention
supported [1, N, S, T]. This extends the support to allow [1, N, S, T],
[B, N, S, T], [B, 1, S, T] and [1, 1, S, T] for the CUDA and CPU EPs.
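For reference, a small numpy check that each accepted bias shape broadcasts against the full attention score shape [B, N, S, T] (the dimension values are illustrative):

```python
import numpy as np

B, N, S, T = 2, 4, 3, 5
scores = np.zeros((B, N, S, T), dtype=np.float32)
for shape in [(1, N, S, T), (B, N, S, T), (B, 1, S, T), (1, 1, S, T)]:
    bias = np.ones(shape, dtype=np.float32)
    out = scores + bias   # standard numpy broadcasting over the first two dims
    assert out.shape == (B, N, S, T)
```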
- [x] Rename the input of "relative position bias" to "attention bias"
because it can also be used for other types of bias, like ALiBi
(Attention with Linear Biases) or attention mask.
- [x] Update unfused kernel to support broadcasting 2nd dimension of
attention bias.
- [x] Update efficient attention to support broadcasting 2nd dimension
of attention bias.
- [x] Update operators (MultiHeadAttention,
DecoderMaskedMultiHeadAttention, Attention, PackedAttention,
PackedMultiHeadAttention) to support broadcast attention bias on CUDA
and CPU EPs.
- [x] Update ROCm, DML and WebGPU naming to be consistent. (Note that
those EPs do not support broadcasting attention_bias for now).
- [x] Add attention bias tests for MultiHeadAttention.
- [x] Update operator documents
- [x] Update benchmark script
Other changes:
* Fix some checks in multihead-attention.ts
* Add helper functions to dump tensors given dimensions.
### Description
This PR modifies the run_dynamo_export function to ensure it mirrors the
behavior of run_torchscript_merged_export rather than
run_torchscript_separate_export. Additionally, I made adjustments to the
main function to ensure that run_dynamo is correctly invoked.
### Motivation and Context
The main motivation for this change is to enable successful export of
LLaMA-2 and LLaMA-3 models using the Dynamo exporter to ONNX.
Previously, the exporter was saving two copies of the weights, which is
inefficient. The modified approach ensures that only one copy of the
weights is saved and that the model supports both scenarios. These
changes enhance the compatibility of the exporter with LLaMA models (and
subsequently other models) and optimize the export process.
### Description
<!-- Describe your changes. -->
Handle targets in subdirectories for external projects. All targets now
go in a per-project folder under 'External'.
For example, gmock and gtest now get handled correctly and are placed
under External/googletest, versus the existing setup where they ended up
as top-level projects.
![image](https://github.com/user-attachments/assets/99ec259c-47cd-44f3-954d-58569c941cc2)
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Improve developer experience.