onnxruntime

Граф коммитов

Автор	SHA1	Сообщение	Дата
Hector Li	190588bb64	Enable QNN weight sharing (#21077 ) ### Description Enable QNN weight sharing across graphs in single context Create tool to generate QNN context cache model with weight sharing enabled.	2024-09-04 11:20:33 -07:00
Yueqing Zhang	9031112c8e	[VitisAI] add registered custom op for perf test (#21336 ) ### Description <!-- Describe your changes. --> Register for custom op when testing the performance ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> This is needed for providers to test their implementation	2024-09-04 11:13:35 -07:00
zz002	bf8a8e7e36	[VitisAI] Bug fixes in model_clone (#21950 ) ### Description <!-- Describe your changes. --> VitisAI bug fixes in model clone ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Co-authored-by: Zhenze Wang <zhenzew@xilinx.com>	2024-09-04 10:29:17 -07:00
Edward Chen	cbf3c50d75	Improve stability of Android ReactNative E2E test (#21969 ) - Remove redundant `OnnxruntimeModuleExampleE2ETest CheckOutputComponentExists` test - Attempt to close any Application Not Responding (ANR) dialog prior to running Android test - Add `--take-screenshots failing` option to detox test commands to save screenshots on failure	2024-09-04 08:41:07 -07:00
Chen Feiyue	d4290f6e7f	Update vsinpu ep cross-compiling patch (#21963 ) - Block the bf16 && ummla gemm functions because we cannot support these features yet	2024-09-03 22:54:43 -07:00
Yueqing Zhang	dd2425932d	[VitisAI] Fix model path (#21911 ) ### Description <!-- Describe your changes. --> Change the .data path so it is on the same path as the model path. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> This would fix the issue if a model has .data file, the executable can't read the data if the model is in another directory.	2024-09-03 22:42:01 -07:00
Yulong Wang	decb3852a0	refactor: extract shared util function ComputeBroadcastOutputShape (#21940 ) ### Description This is used in multiple places.	2024-09-03 18:21:36 -07:00
Tianlei Wu	628c0a8f0e	Remove unused find_cudnn_supported_cuda_versions (#21620 ) ### Description The function find_cudnn_supported_cuda_versions is not used anymore. Remove it. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-09-03 14:38:33 -07:00
sfatimar	8dba8e3e24	Memory Optimization for Compilation in OVEP (#21872 ) Calling Split API Calls Read+Model in lieu of unified Compile Model call for export compile flow to ensure memory optimization. Freeing up model proto and serialized string and read model ov ir later to free up memory for the ahead pipeline Optimization during EpCtxt flow All the Graph related operations require all the Node Attributes to be set while dealing with model instances internally with them, in the existing implementation these attributes make a copy when constructing a Graph dynamically during runtime. Propose to use these attributes in place without creating a copy to avoid memory allocation / copy while calling these Graph related functions. Changes to ensure the bug fixes related to openvino version and epctxt file path. Moving Compiler version to C++20 for getting r-value mem optimizations benefit ### Motivation and Context This change is required because memory optimization during Compilation flow is too high. --------- Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com> Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com> Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com> Co-authored-by: ankitm3k <ankit.maheshkar@intel.com> Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>	2024-09-03 13:52:31 -07:00
Yi Zhang	4962252c8f	Enable xnnpack ep works in current windows xnn ci (#21951 ) ### Description The EP wasn't added in session option in onnxruntime_test_all. ### Motivation and Context After this PR onnxruntime_test_all --gtest_filter=\xnnpack\maxpool\* can step into `8c5336449d/onnxruntime/core/providers/xnnpack/nn/max_pool.cc (L209)` --------- Co-authored-by: Yi Zhang <your@email.com>	2024-09-03 10:02:00 -07:00
Chester Liu	5c74539ab7	Fix copying ORT dylib into wheel on macOS (#21931 ) Fix #21223 on macOS --------- Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>	2024-09-03 11:08:25 +08:00
Yulong Wang	257792225f	revert forceinline for MakeString (#21943 ) ### Description revert forceinline for MakeString. This change reverts https://github.com/microsoft/onnxruntime/pull/21893. The forceinline was introduced for performance considerations, however it turns out to have some notable binary size increase, which is a concern for some binary size sensitive platforms like Android. I made a few tests locally and found it is not related to whether or not have used the template struct `if_char_array_make_ptr_t` trick. So I have to revert this back.	2024-09-02 19:01:08 -07:00
Scott McKay	e788b3d30e	Fix C# warnings. (#21913 ) ### Description <!-- Describe your changes. --> Update some testing dependencies. Fix various warnings. Mainly around documentation (existing) and unit test usage (mainly resulting from xunit update). Invalid angle brackets for generics in documentation were changed to use curly braces based on https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/xmldoc/ > To refer to generic identifiers in code reference (cref) elements, you can use either the escape characters (for example, cref="List<T>") or braces (cref="List{T}"). As a special case, the compiler parses the braces as angle brackets to make the documentation comment less cumbersome to the author when referring to generic identifiers. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-09-03 10:08:29 +10:00
Yulong Wang	bad00a3657	Add dependency dawn into deps.txt (#21910 ) ### Description Add dependency dawn into deps.txt. This is a preparation for introducing WebGPU EP.	2024-09-02 04:24:28 -07:00
Kyle	b1ae43cbcb	Add Files Signature Validation after Signed by ESRP (#21949 ) ### Description <!-- Describe your changes. --> Files signature validation after signed by ESRP. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> - Add validation after the ESRP process. - Make sure the targeting pattern/suffix files are signed successfully by ESRP. - If the signature is not Valid, then will fail the following stages.	2024-09-02 17:16:59 +08:00
Yulong Wang	8c5336449d	Stop VSCode appending file associations to settings.json (#21944 ) ### Description If you open onnxruntime source code using VSCode with C/C++ extension, it's keeping adding file associations for C/C++ headers into this settings.json. This is annoying when staging/committing changes. Add a configuration to disable this behavior. see: - https://stackoverflow.com/questions/65220185/how-to-stop-vs-code-to-keep-adding-standard-c-libraries-to-the-file-associatio - https://github.com/microsoft/vscode-cpptools/issues/722#issuecomment-480329005	2024-08-31 19:04:12 -07:00
mingyueliuh	047f32c79d	[VitisAI] Remove shape infer from bridge ort (#21331 ) ### Description Vitis AI EP's custom op are completely self contained within Vitis AI EP implementation (rather than needing to add static functions in provider_bridge). --------- Co-authored-by: liumingyue <mingyue@xilinx.com>	2024-08-31 08:57:23 -07:00
aciddelgado	509cb54d6f	softcap gqa (#21683 ) ### Description Implement softcap for gqa. ### Motivation and Context Fixes certain models like Gemma-2 which need softcap to work so they don't output nan's.	2024-08-30 19:11:04 -07:00
Jing Fang	5dee95fa10	[CUDA] Support CUDA EP blocked quantization in Q/DQ ops. (#21846 ) ### Description 1. Added CUDA EP support for blocked quantization in QuantizeLinear and DequantizeLinear ops. 2. Currently CUDA EP blocked quantization only supports int4/uint4 quantized types and float32/float16 unquantized types. 3. Added CUDA EP support in QDQ selector/action transformer. CUDA EP is only added to DQ + MatMul -> MatMulNBits rule. Other rules' EP support are not changed. ### Motivation and Context ONNX opset 21 introduced blocked quantization for Q/DQ opts. ORT originally only supports CPU EP blocked quantization.	2024-08-30 18:28:00 -07:00
Yi Zhang	60b07623a2	Add a reminder in set-trigger-rules script (#21929 ) ### Description After editing the set-trigger-rules.py, we must run the file. ### Motivation and Context Obviously the script wasn't run because some files's name are incorrect.	2024-08-30 12:18:10 -07:00
Ranjit Ranjan	02e3a430af	[AIX] Python binding enablement and gcc support (#21934 ) ### Description Enabling python binding and gcc support for AIX. ### Motivation and Context Code changes in this PR contains: 1. python binding enablement 2. gcc building support Below are list of files and the description. 1. cmake/CMakeLists.txt [gcc building support] -no-unused-function compiler flag addition for IBMClang 2. cmake/external/eigen.cmake [gcc building support] AIX check for applying the AIX patch 3. cmake/onnxruntime_python.cmake [python binding ] putting NOT AIX check for -Xlinker 4. cmake/onnxruntime_unittests.cmake [gcc building support] Fix for gtest behavior. Check the comment . [python binding ] using -Wl,-brtl for linking onnxruntime_providers_shared in test_execution_provider 5. cmake/patches/eigen/eigen-aix.patch [gcc building support] In AIX gcc, we are hitting __builtin_cpu_supports("mma") which is not supported yet. So patching code for this method . Patched code will check for P10 Processor at run-time and based on that routine will be set. 6. onnxruntime/python/onnxruntime_validation.py [python binding ] Adding AIX check in check_distro_info() 7. onnxruntime/test/providers/cpu/generator/random_test.cc [gcc building support] updating previous check for AIX , along with clang. So in case of gcc, else block will hit. 8. onnxruntime/test/python/onnxruntime_test_python.py [python binding ] powerpc check on platform.processor() 9. setup.py [python binding ] Adding AIX check for list of libs.	2024-08-30 12:17:26 -07:00
Changming Sun	1f879c3282	Disable absl symbolize in Windows Release build (#21923 ) ### Description This change disables Abseil's symbolize functionality in Windows non-debug builds. ### Motivation and Context To solve #21826. Avoid having a dependency on dbghelp.dll.	2024-08-30 12:03:17 -07:00
mindest	bfa4da4f65	Add Linux ROCm CI Pipeline (#21798 ) ### Description * Add new ROCm CI pipeline (`Linux ROCm CI Pipeline`) focusing on inference. * Resolve test errors; disable flaky tests. based on test PR #21614.	2024-08-30 14:50:32 +08:00
dependabot[bot]	924259617d	Bump Sixlabors.ImageSharp from 2.1.8 to 2.1.9 in /csharp/sample/Microsoft.ML.OnnxRuntime.FasterRcnnSample (#21920 ) Bumps [Sixlabors.ImageSharp](https://github.com/SixLabors/ImageSharp) from 2.1.8 to 2.1.9.	2024-08-29 21:58:02 -07:00
dependabot[bot]	4ac1558498	Bump torch from 1.13.1+cpu to 2.2.0 in /tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/torch_eager_cpu (#21919 ) Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1+cpu to 2.2.0.	2024-08-29 21:57:24 -07:00
Wanming Lin	7550fec4aa	Revert "[WebNN EP] Remove NHWC preferred layout" (#21905 ) Reverts microsoft/onnxruntime#21570	2024-08-29 18:01:56 -07:00
aciddelgado	0223e8647b	Fix num splits bug (#21899 ) ### Description Found a bug with num splits where the heuristic isn't being performed properly due to incorrect passing of sequence length to heuristic function. ### Motivation and Context We were experiencing significant performance issues with long sequence length with flash attention due to this misconfiguration.	2024-08-29 15:00:53 -07:00
Jian Chen	fd88474077	Fix a CG issue that require upgrade transformer from 4.36 to 4.38 (#21900 ) ### Description Fix a CG issue that require upgrade transformer from 4.36 to 4.38 ### Motivation and Context See CG [link](https://aiinfra.visualstudio.com/Lotus/_componentGovernance/218239/alert/11474680?typeId=26218094&pipelinesTrackingFilter=0) Also the other [CG item](https://aiinfra.visualstudio.com/Lotus/_componentGovernance/218239/alert/11474678?typeId=26218094&pipelinesTrackingFilter=0) to request update 4.72 to 4.38	2024-08-29 14:53:15 -07:00
Sheil Kumar	867e0401a7	Catch statement causing build failures for flavors with EHsc disabled (#21902 ) ### Description Catch in etw_sink.cc is causing build failures for flavors with EHsc disabled. Remove the catch and set the Failure state as a response the FAILED check. ### Motivation and Context Catch in etw_sink.cc is causing build failures for flavors with EHsc disabled. --------- Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2024-08-29 12:15:39 -07:00
Yulong Wang	32af2ba68f	enhance string util functions (#21893 ) ### Description - make `MakeString` force inline - refactor ORT_FORCEINLINE macro - move to one place to avoid macro redefinition error - ~~add a `StringJoin` utility~~ ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-08-29 10:37:50 -07:00
Xu Xing	01673389b8	[js/webgpu] Enable conv+clip fuse on mobilenetv2-12-f16 (#21234 ) There are failures for some inputs. ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-08-29 08:03:02 -07:00
Yi Zhang	be76e1e1b8	Add dependent stages in nuget packaging pipelines (#21886 ) ### Description Since the stage need to download drop-extra, it should add the dependencies ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-08-29 11:34:10 +08:00
Guenther Schmuelling	4fece0430f	remove duplicate function definition (#21903 )	2024-08-28 16:18:56 -07:00
duanshengliu	7df8776322	Add overflow protection for quantization bias to reduce quantization precision loss (#21645 ) ### Description <!-- Describe your changes. --> When the scale of the bias is too small, the quantized bias may exceed the range of `int32`, leading to significant loss of precision. Therefore, before converting quantized bias to `int32`, it needs to be clipped within the range of `int32` to reduce the loss of quantization precision. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Fix the issue https://github.com/microsoft/onnxruntime/issues/21000	2024-08-28 14:29:17 -07:00
xhcao	3bfb5e4f62	[js/webgpu] support float16 for Clip (#21584 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-08-28 13:19:20 -07:00
Wanming Lin	59114227fd	[WebNN EP] Remove NHWC preferred layout (#21570 ) Currently WebNN CPU backend has supported NCHW layout in Chromium, we can now drop NHWC preferred layout for CPU backend in WebNN EP to simplify the code.	2024-08-28 13:17:34 -07:00
Ye Wang	bf8855ba3c	Support Smooth Softmax in fmha (#21885 ) ### Description <!-- Describe your changes. --> refer to https://github.com/microsoft/onnxruntime/pull/21867 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Your Name <you@example.com>	2024-08-28 09:29:33 -07:00
AlbertGuan9527	ef073fd8f4	Add session and run option workload_type for applications to set efficient mode. (#21781 ) ### Description This PR added session and run option workload_type, this option is the knob for applications to enable/disable the processor performance efficient mode. ### Motivation and Context The efficient mode is co-engineered with processor vendors to allow applications voluntarily being serviced at a more energy efficient performance level. This functionality can be used by long running, latency insensitive application to save the energy consumption.	2024-08-28 08:17:01 -07:00
Jian Chen	e95277484e	Adding $(Build.SourcesDirectory)s to the ignoreDirectories (#21878 )	2024-08-27 19:56:48 -07:00
George Wu	23f3912334	support both qnn x64 and arm64ec stages in py packaging pipeline (#21880 ) both arm64ec and x64 packages are needed. x64 is needed for offline context binary generation and arm64ec is needed for interop with python packages that don't have prebuilt arm64 packages and only have x64.	2024-08-27 15:07:30 -07:00
Yulong Wang	d2a1b7a353	Introduce custom external data loader (#21634 ) ### Description This PR introduces support for custom external data loader. An EP can register a custom external data loader to override the default behavior, making it possible to upload initializers directly to GPU. ### Motivation and Context - In ONNX Runtime Web, WebAssembly uses 32-bit as pointer type (`sizeof(size_t)==4`), which means there is a 4GB hard limit on the maximum memory. As the ONNX models get larger, this becomes a blocker for supporting medium-sized language models. - ORT runs out of memory because the current code always loads data into CPU memory, including the .onnx file (protobuf) and external data file(s). However, if using GPU EP, the big data does not need to be kept on CPU because the only thing that ORT does is to load the data into memory, upload to GPU and then release them. - Some platforms has offered developers way to upload data directly to GPU. For example, webgpu allows uploading from any ArrayBuffer (it can be a side buffer, not count into the 4GB) to GPU directly. This helps to keep the CPU memory usage significantly. ### Design Class `ExternalDataLoader` and `ExternalDataLoaderManager` are introduced. They are similar to `DataTransfer` and `DataTransferManager`. `InferenceSession` owns the manager object, and `SessionState` keeps a reference to it. Added a new method `GetExternalDataLoader` in `IExecutionProvider`. An EP can override the method to register an instance of custom external data loader. The key function in a `ExternalDataLoader` class is method `LoadTensor`: ```c++ // the tensor is pre-created using the TensorProto info of the initializer and the MemoryInfo (from allocation plan). virtual common::Status LoadTensor(const Env& env, const std::filesystem::path& data_file_path, FileOffsetType data_offset, SafeInt<size_t> data_length, Tensor& tensor) const; ``` This function can be registered by EP, going through a few layers and eventually get into `DeserializeTensorProto()` in the finalizing stage of session initialization. In this step, initializer tensors are created. Behavior is changed to first look up for a registered external data loader that can handle the current memory info. If any instance is available, use the loader; otherwise respect the old code path.	2024-08-27 12:18:52 -07:00
Caroline Zhu	b7f09d4c27	Increase timeout for orttraining-linux-gpu pipeline (#21844 ) ### Description Increase timeout to 160 minutes ### Motivation and Context - Recent runs of orttraining-linux-gpu pipeline have been timing out	2024-08-27 11:47:12 -07:00
Jian Chen	7f851f4e61	Removing docker_base_image parameter and variables (#21864 ) ### Description Removing `docker_base_image` parameter and variables. From the Cuda Packaging pipeline. ### Motivation and Context Since the docker image is hard coded in the `onnxruntime/tools/ci_build/github/linux/docker/inference/x86_64/default/cuda12/Dockerfile` and `onnxruntime/tools/ci_build/github/linux/docker/inference/x86_64/default/cuda11/Dockerfile` This parameter and variable is no longer needed.	2024-08-27 10:36:17 -07:00
Ye Wang	1d059b8702	Phi3 MoE cuda kernel (#21819 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Your Name <you@example.com>	2024-08-27 09:21:30 -07:00
Jiajia Qin	252222034f	[js/webgpu] Support Reshape/Shape 21+ on jsep (#21871 ) ### Description <!-- Describe your changes. --> #21618 With this PR, the cross device copying (`MemcpyToHost`) can totally be removed for model `wav2vec2`. And the overall time becomes 48ms from 604ms. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-08-27 09:02:39 -07:00
mcollinswisc	5d54dc1462	Drop QDQ around more nodes (#21376 ) ### Description Extends the Drop QDQ optimization to remove DequantizeLinear and QuantizeLinear nodes from around operators: - Flatten - Expand - Tile - Slice - GatherElements - ReduceMin - ReduceMax ### Motivation and Context To reduce floating-point conversions in quantize inference. Mainly motivated by the Flatten case, since that will show up in graphs exported from PyTorch to ONNX. But to make the change complete, extending to a larger set of ops for which this optimization is valid. https://github.com/microsoft/onnxruntime/issues/21375 --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-08-27 16:54:37 +10:00
Tianlei Wu	6e57576988	Support Smooth Softmax in GroupQueryAttention (#21867 ) ### Description Softmax (formula 1) is like the following: ```math y_{i} = \frac{exp(x_{i})}{\sum_{i} exp(x_{i})} ``` After applying softmax, each element will be in the range of $(0, 1)$, and the elements will add up to 1, so that they can be interpreted as probabilities. However, in language model, softmax has two issues: * When all elements are -inf (for example, a whole row is masked when a query token is padding), the result is not defined since exp(-inf)=0 and divided-by-zero is encountered in the above formula. * Why do we need normalize in a way that each query word are treated as equal important (each row has sum equals to1)? Smooth Softmax (formula 2) is a modified version that introduces a smooth factor like the following: ```math s_{i} = \frac{exp(x_{i})}{1+ \sum_{i} exp(x_{i})} ``` This formula could tackle the above two issues: * It could handle the special case that all elements are -inf: the result $s_{i}$ is 0 for every element in such case. * Sum of all elements $\sum_{i}{s_{i}} = \frac{\sum_{i}{exp(x_{i})}}{1+ \sum_{i} exp(x_{i})}$ is in the range of (0, 1), so that we can train the model to assign different importance to different query words. Since exponential is prone to overflow or underflow, to get stable result, formula 3 can be used: ```math s_{i} = \frac{exp(x_{i} + c)}{exp(c)+ \sum_{i} exp(x_{i} +c)} ``` c can be any value in theory. In practical, choice of constant c shall avoid $exp(c)$ and $exp(x_{i} +c)$ overflow (or underflow) at the same time. A reasonable choice is like formula 4: ```math c=-\max_{i} \{ x_i \} ``` or apply a constraint that c <=0 like the following formula 5: ```math c=-\max(0, \max_{i} \{ x_i \}) ``` The latter one (formula 5) ensures that $s_{i}$ will fallback to formula 2 when all elements are negative. For CPU provider, smooth softmax is implemented in MLAS. CPU implementation uses formula 5. @wangyems implemented the smooth softmax in flash attention for CUDA, which requires Ampere or newer GPU. The implementation of smooth softmax in flash attention uses formula 4. --------- Co-authored-by: Ye Wang	2024-08-26 23:13:15 -07:00
Yulong Wang	99bc45dcbd	[js] add big data file to formatter ignore list (#21767 ) ### Description Add the big data file `web/test/data/ops/pad-big.jsonc` to formatter ignore list. This file slows down the formatter quite a lot at local.	2024-08-26 22:08:26 -07:00
zz002	422e6e6fb0	[VitisAI] add OpSchema, VitisAI use IKernelLookup to check supported ops, VitisAI def_builder adds TypeConstraint related processing (#21688 ) ### Description <!-- Describe your changes. --> 1. add OpSchema 2. VitisAI use IKernelLookup to check supported ops 3. VitisAI def_builder adds TypeConstraint related processing ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Zhenze Wang <zhenzew@xilinx.com>	2024-08-26 21:16:44 -07:00
Satya Kumar Jandhyala	af18824f43	[JS/WebGPU] Add GatherBlockQuantized op support (#21734 ) ### Description Add GatherBlockQuantized operator to JSEP. ### Motivation and Context Gemma model requires this.	2024-08-26 14:46:04 -07:00

1 2 3 4 5 ...

11643 Коммитов Все ветки Поиск

11643 Коммитов

Все ветки