Commit Graph

11643 Commits

Author SHA1 Message Date
Dmitri Smirnov 754dba2674
Change to std::fill (#21759)
### Description
Replace `memset(0)` with `std::fill(T{})`. This ensures that all
types are initialized in a portable way.

### Motivation and Context
Some platforms exhibit intermittent failures with NaN results.
Follow up to: https://github.com/microsoft/onnxruntime/pull/21525

Cc: @ranjitshs
2024-08-15 16:16:54 -07:00
Tianlei Wu b9f3a5d5b6
Exclude cudnn 8 DLLs from manylinux package (#21746)
### Description
It is a follow-up to https://github.com/microsoft/onnxruntime/pull/21738
to exclude cudnn 8 DLLs, since some python packaging pipelines (like the
training package) are still using cudnn 8.9 and cuda 11.8.

### Motivation and Context

The size of the python package for the training pipeline increased a lot
because some DLLs were added to the package:

![image](https://github.com/user-attachments/assets/643a808e-760b-4382-ba55-57d7d722ee9a)
2024-08-15 07:48:42 -07:00
Yi Zhang 8a59b4dc4b
Move Python Training CUDA 12.2 pipeline to another pool. (#21745)
### Motivation and Context
The [Python Training CUDA 12.2
pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary)
has been consistently cancelled by the remote provider since Aug 2nd,
but other workflows using the same pool don't have this issue.
It looks like something odd is going on in Azure DevOps.
The pipeline works on another pool; in fact, that pool's SKU is smaller
than the old one.

### Verification
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary
2024-08-15 17:31:56 +08:00
Tianlei Wu 212bcc9967
Exclude cuDNN 9 and CUDA 12 DLLs from manylinux wheel (#21738)
### Description
Exclude cuDNN 9 and CUDA 12 DLLs from manylinux wheel to reduce python
package size.

### Motivation and Context

The 1.20.0 ort-nightly-gpu python wheels on linux are suddenly > 800 MB
in size. The wheels built on 1.19 release branch have a size of around
220 MB.

The size change is caused by
https://github.com/microsoft/onnxruntime/pull/19470.
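
A hedged sketch of the general approach to such an exclusion (the patterns and helper below are assumptions for illustration, not this PR's actual diff): the wheel-packaging step filters out the large CUDA/cuDNN shared libraries so users pick them up from separate nvidia pip packages or a local install.

```python
# Hypothetical packaging-time filter; the pattern list is an assumption.
import fnmatch

EXCLUDED_PATTERNS = ["libcudnn*.so*", "libcublas*.so*", "libcufft*.so*"]

def keep_in_wheel(filename: str) -> bool:
    """Return False for shared libraries that should not ship in the wheel."""
    return not any(fnmatch.fnmatch(filename, p) for p in EXCLUDED_PATTERNS)

print(keep_in_wheel("libcudnn.so.9"))      # False: excluded from the wheel
print(keep_in_wheel("libonnxruntime.so"))  # True: core library stays
```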
2024-08-15 00:03:10 -07:00
Yulong Wang abdc31de40
[js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728)
### Description

See
454996d496
for the manual changes (auto-generated formatting changes excluded).

### Why

The toolset for the old clang-format is out-of-date, which reduces
development efficiency.

- The NPM package `clang-format` is already in maintenance mode and has
not been updated in 2 years.
- The VSCode extension for clang-format has not been maintained for a
while, and a recent Node.js security update made it stop working
entirely on Windows.

No one in the community seems interested in fixing those.

Prettier was chosen because it is the most popular TS/JS formatter.

### How to merge

It's easy to break the build:
- Be careful of any new commits on main that are not included in this PR.
- Be careful that, after this PR is merged, other PRs that already passed
CI can still merge.

So, make sure there are no new commits before merging this one, and
invalidate js PRs that already passed CI, forcing them to merge against
the latest main.
2024-08-14 16:51:22 -07:00
Satya Kumar Jandhyala 6d8de1f7b8
Upgrade emsdk from 3.1.59 to 3.1.62 (#21421)
### Description
Upgrade emsdk to 3.1.62.



### Motivation and Context
The changes are required to clear wasm64 errors.
2024-08-14 12:38:52 -07:00
Guenther Schmuelling d82f15d0e3
add Gelu opset-20 to webgpu (#21725)
https://github.com/microsoft/onnxruntime/issues/21618
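
For reference, a small Python sketch of the non-approximate Gelu that the ONNX opset-20 operator defines:

```python
import math

def gelu(x: float) -> float:
    # Gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))), the approximate="none" form
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

print(gelu(1.0))  # ~0.8413
```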
2024-08-14 09:45:05 -07:00
Frank Dong a0708a0d96
avoid redundant memory allocation for external initializers (#21682)
### Description
Avoid redundant memory allocation for external initializers. We will use
mmap for external initializers later, so there is no point in allocating
memory in advance and then releasing it.



### Motivation and Context
In the current implementation, we:
1. Allocate memory (with the desired size of the current initializer) for
the initializer first:
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/session_state_utils.cc#L131
2. For an external initializer, point the initializer to the mmapped
object in memory and release the previously allocated tensor:
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/session_state_utils.cc#L89

For large models, we keep allocating and releasing memory for external
initializers, which is unnecessary.

For the phi silica model, this change reduces transient memory usage from
4,566 MB to 2,724 MB. Since the redundant memory is released quickly when
we mmap external initializers, the change has little impact on peak
memory usage.
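
A hedged Python sketch of why the up-front allocation is redundant once mmap is used (the file name, offset, and size are illustrative):

```python
# With mmap, initializer bytes are referenced in place rather than copied
# into a freshly allocated buffer.
import mmap

offset, nbytes = 0, 4096  # assumed location of one initializer in the file
with open("weights.bin", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
tensor_bytes = memoryview(mm)[offset : offset + nbytes]  # zero-copy view
```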
2024-08-13 23:13:49 -07:00
Xu Xing 7172aff1cf
[js/webgpu] Fix max pool shape end with 0 (#21698)
Bug: https://github.com/microsoft/onnxruntime/issues/21386

2024-08-13 20:59:24 -07:00
Prathik Rao e32e3575d8
pin pytorch lightning version for training CI (#21731)
### Description

Pins pytorch-lightning package to version 2.3.3 since version >=2.4.0
requires torch > 2.1.0 which is not compatible with cu118.


### Motivation and Context

ORT 1.19 Release Preparation
2024-08-13 20:04:56 -07:00
Yi Zhang b92908e197
[Fix] Python API doc generation (#21717)
### Motivation and Context
Make Python API doc generation workflow work.

### Verification Run
https://github.com/microsoft/onnxruntime/actions/runs/10364762858
2024-08-14 08:48:29 +08:00
Dmitri Smirnov c2911bbb1c
[CUDA] Special case for K==0 in CUDA MatMul (#21525)
### Description
This change addresses the case where we multiply two matrices whose
inner dimension is 0. numpy, and Eigen (which is used in our CPU EP
implementation), handle this case correctly and output an [M, N] matrix
filled with zeros.

### Motivation and Context
This is required to support GenAI empty input Lora implementation.

Addresses: https://github.com/microsoft/onnxruntime/issues/21483
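
The expected semantics, shown with numpy (this is the behavior the CUDA EP now matches):

```python
import numpy as np

a = np.ones((3, 0), dtype=np.float32)  # M=3, K=0
b = np.ones((0, 4), dtype=np.float32)  # K=0, N=4
c = a @ b                              # inner dimension is 0
print(c.shape)  # (3, 4)
print(c)        # all zeros: an empty sum over K
```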
2024-08-13 11:27:05 -07:00
Scott McKay 6af5394bd7
Replace usage of jcenter in React Native build.gradle files (#21714)
### Description
Replace jcenter. It's deprecated and not responding.


### Motivation and Context
Fix CIs
2024-08-13 11:10:51 -07:00
liqun Fu 3439429717
Fix neural-speed ci failure (#21694)
### Description
Fix
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1461029&view=logs&j=3565c00d-48fa-5c65-7ab9-a05e12e29ed0&t=e43fe03a-689e-5dc5-9ad5-9f116eba3e9d&l=6341




Signed-off-by: Liqun Fu <liqfu@microsoft.com>
2024-08-13 10:48:25 -07:00
xhcao 9c6ee89fa7
[js/webgpu] fix two errors of attention operator (#21687)
Fix two issues:
(1) the scale must be fp32 instead of fp16;
(2) the Softmax program did not handle normalized dispatch group values, so if the sequence length was over 65535, its result was incorrect.
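
A minimal sketch of fix (1), assuming the conventional 1/sqrt(head_size) attention scale (the head_size value is illustrative):

```python
import math

head_size = 64  # example value
# Compute the scale in fp32 even when the model's tensors are fp16, so the
# small constant is not degraded by fp16 rounding.
scale = 1.0 / math.sqrt(float(head_size))
print(scale)  # 0.125
```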
2024-08-13 09:42:34 -07:00
Yi Zhang 6db3d63add
move the A100 stage to main build (#21722)
### Motivation and Context
We couldn't get enough A100 agent time to finish the jobs today.
This PR makes the A100 job run only on the main branch, to unblock other
PRs if the pool doesn't recover in a short time.
2024-08-13 22:48:58 +08:00
George Wu a8462ffb61
enable qnn python arm64ec packaging (#21575)
Create the x64 QNN python package as arm64ec so it can be published
publicly.
2024-08-12 22:43:17 -07:00
Sumit Agarwal c5592fdcef
[DML EP] Update DML to 1.15.1 (#21695)
### Description
Update DML runtime binary to 1.15.1



2024-08-12 14:16:43 -07:00
jingyanwangms 154084efaa
Security Fuzz Test Fixes (#21608)
### Description
Fix address sanitizer and memory-access bugs 1, 4, 5, 7, and 8 found in
security fuzz testing.

2024-08-11 03:28:41 -07:00
Yulong Wang 6ae7e02d34
Web CI: make multi-browser test job optional (#21669)
### Description

This job is a little bit unstable. Make it optional to avoid blocking
other PRs before we revise it.
2024-08-09 23:53:26 -07:00
Chi Lo 2abebb2a47
[TensorRT EP] No workspace size limit to TRT memory pool (#21643)
We saw some models fail to run due to OOM; this could be fixed by
increasing trt_max_workspace_size.
This PR removes the size limit by default (allowing up to the max device
memory), which is aligned with trtexec.
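
With the new default there is no limit; a user who still wants one can set it explicitly through TensorRT EP provider options, e.g. (the model path is illustrative):

```python
import onnxruntime as ort

# trt_max_workspace_size is a documented TensorRT EP option; by default the
# EP now uses up to the max device memory, matching trtexec.
sess = ort.InferenceSession(
    "model.onnx",
    providers=[("TensorrtExecutionProvider", {"trt_max_workspace_size": 2 * 1024**3})],
)
```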
2024-08-09 17:30:51 -07:00
Caroline Zhu eeef0c8aca
Enable exporting for inference when loading from buffer without behavior changes (#21601)
### Description
Added the eval model buffer as an optional field in Module so that you
can export for inference using the eval model stored as a buffer.

### Motivation and Context
- Resolves #21152 
- The previous solution (PR #21422) produced an eval model that was
specific to the EPs used to train, because of unavoidable runtime
optimizations that changed the graph stored with the eval session.
2024-08-09 16:59:50 -07:00
Krishna Bindumadhavan 37be90c9c8
[Quant tool]: Improve symmetric quantization range update for Relu/Clip (#21573)
### Description
This PR improves the range calculation for input to Relu/Clip nodes for
the symmetric quantization case.

### Motivation and Context
Currently, the issue is that for the common scenario of Conv followed by
Relu in a symmetric quantization config, different scales could be
assigned to the tensors corresponding to the input and output of Relu.

The downside is that this may introduce noise due to multiple re-quant
steps, and it makes it difficult to fuse Conv-Relu nodes for hardware
accelerators that support fused Conv-Relu.

Instead, it is more efficient to assign the output range of Relu as the
input range of Relu (the output range of the upstream op) wherever
possible. This adjustment was previously only done for the asymmetric
quantization case.

For the scenario where the upstream op has multiple consumers, this
assumption could be incorrect, so in that case we do not adjust the
ranges. A sketch of the idea follows.
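
A hedged sketch under the assumption of a single-consumer check (names are illustrative, not the quant tool's actual code):

```python
# Reuse the Relu output range for its input tensor when the input has no
# other consumers, so both tensors share one scale and Conv+Relu can fuse
# without re-quantization.
def share_relu_range(ranges, relu_input, relu_output, num_consumers):
    if num_consumers == 1:  # unsafe if the upstream op feeds other nodes too
        ranges[relu_input] = ranges[relu_output]
    return ranges

ranges = {"conv_out": (-6.0, 6.0), "relu_out": (0.0, 5.9)}
share_relu_range(ranges, "conv_out", "relu_out", num_consumers=1)
print(ranges["conv_out"])  # (0.0, 5.9): same range, hence same scale
```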
2024-08-09 14:48:09 -07:00
Adrian Lizarraga 390f0fd8ce
[QNN Quant tool] Fix validation of per-channel overrides for models with external data (#21656)
### Description
Fixes validation of per-channel quantization overrides by not
unnecessarily loading the external weights.

### Motivation and Context
`get_qnn_qdq_config()` explicitly loads models without external data
(i.e., `onnx.load_model(load_external_data=False)`). Afterwards,
`get_qnn_qdq_config()` calls `tensor_proto_to_array()`, which expects the
external weights to be stored in the current working directory. If the
external weights are stored in a different directory, we get a crash.

Loading the actual weight values is unnecessary because we only need the
weight shape. This PR removes the unnecessary `tensor_proto_to_array()`
call.
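
A hedged sketch of the shape-only access that makes loading external data unnecessary (the model path and tensor name are illustrative):

```python
import onnx

# Load the graph structure only; external weight files are never touched.
model = onnx.load_model("model.onnx", load_external_data=False)
inits = {init.name: init for init in model.graph.initializer}
weight_shape = list(inits["conv.weight"].dims)  # shape from the proto itself,
print(weight_shape)                             # no tensor_proto_to_array()
```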
2024-08-09 14:46:52 -07:00
Satya Kumar Jandhyala 51b2044120
[JS/WebGPU] Add Dequantizelinear operator (#21642)
### Description
Added DequantizeLinear operator for JSEP.
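
For reference, the operator's defining computation as a numpy sketch:

```python
import numpy as np

def dequantize_linear(x, scale, zero_point):
    # y = (x - zero_point) * scale, computed in float
    return (x.astype(np.float32) - np.float32(zero_point)) * np.float32(scale)

print(dequantize_linear(np.array([0, 128, 255], dtype=np.uint8), 0.5, 128))
# [-64.    0.   63.5]
```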



2024-08-09 14:44:19 -07:00
Yifan Li 906ae77eea
[TensorRT EP] Add null_ptr check to avoid crash when running session which was failed to generate trt_engine previously (#21621)
### Description
Add a null_ptr check to avoid a crash when running a session that
previously failed to generate a trt_engine.


### Motivation and Context

Reported and verified by
https://github.com/microsoft/onnxruntime/issues/21567
2024-08-09 14:09:22 -07:00
saurabh 88788474b9
fix handling of multiple QuantizeLinear nodes (#21675)
### Description
This fix addresses the issue of handling multiple QLinear nodes as
outputs from the target node in OVEP. Previously, the stripping logic
only supported a single Q node, leading to incorrect stripping of
additional Q nodes.



### Motivation and Context
The OVEP stripping logic was limited to handling a single Q node as an
output from the target node. As a result, additional Q nodes were being
stripped, despite the stripping rules indicating they should be
retained.

With this fix, OVEP can now properly handle multiple Q nodes according
to the specified stripping rules, ensuring that the fate of each Q node
is correctly determined.

---------

Co-authored-by: sfatimar <sahar.fatima@intel.com>
2024-08-09 14:04:05 -07:00
Jing Fang 53a66f4e02
When quantize 4bit mamtul, force upgrade onnx domain opset to 21 (#21693)
### Description
When quantizing MatMul to DQ + MatMul using the 4-bit QDQ tool chain,
the opsets of the domains previously were not changed.
Now, when quantizing MatMul to DQ + MatMul in QDQ format, the onnx domain
is force-upgraded to opset 21.

### Motivation and Context
In QDQ format, DQ with int4 and blocked quantization is used, which
requires DQ with opset >= 21. Therefore, when quantizing MatMul to DQ +
MatMul, the onnx domain is force-upgraded to opset 21; a sketch of the
bump is shown below.
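
A hedged sketch of the bump (the helper name is illustrative, not the tool's actual API):

```python
import onnx

def force_onnx_opset_21(model: onnx.ModelProto) -> None:
    # int4 + blocked DequantizeLinear requires the onnx domain at opset >= 21
    for opset in model.opset_import:
        if opset.domain in ("", "ai.onnx") and opset.version < 21:
            opset.version = 21
```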
2024-08-09 13:50:12 -07:00
duanshengliu c6a73defb8
Fix wrong per-tensor quantized weight type for matmul (#21347)
### Description
Fix wrong per-tensor quantized weight type for matmul.


### Motivation and Context
Fixes the bug described in
https://github.com/microsoft/onnxruntime/issues/21346
2024-08-09 13:36:25 -07:00
Jing Fang f30581ed2c
[CPU EP] Add block quantized Gather contrib op (#21630)
### Description
Add a gather that supports block-quantized input data.


### Motivation and Context
To support the Web inference scenario with quantized vocabulary
embeddings.
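
A hedged numpy sketch of the op's semantics (the contrib op's real packing and signature may differ):

```python
import numpy as np

def gather_block_quantized(qdata, scales, indices, block_size):
    # Dequantize per block along the last axis, then gather rows.
    blocks = qdata.astype(np.float32).reshape(-1, block_size)
    deq = (blocks * scales.reshape(-1, 1)).reshape(qdata.shape)
    return deq[indices]

qdata = np.array([[1, 2, 3, 4]], dtype=np.int8)    # one row, block_size=2
scales = np.array([[0.5, 2.0]], dtype=np.float32)  # one scale per block
print(gather_block_quantized(qdata, scales, np.array([0]), 2))
# [[0.5 1.  6.  8. ]]
```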
2024-08-09 12:15:11 -07:00
Sumit Agarwal 702b2e28e0
Fuse Pad even if Cast is present in-between (#21640)
### Description
This change enhances the existing Pad Fusion to fuse Pad even if a Cast
operator is present between Pad and Conv/MaxPool/AveragePool. It keeps
the Cast as it is.
<pre>
/*
 * Before Fusion:
 *     Pad
 *      |
 *    Cast (Optional)
 *      |
 *   Conv/MaxPool/AveragePool
 * 
 * After Fusion:
 *    Cast (Optional)
 *      |
 *   Conv/MaxPool/AveragePool
 */
</pre>


2024-08-09 06:52:59 -07:00
Yulong Wang e6e4047a77
[js/web] update the build script for webgpu to enable model dump by default (#19707)
### Description
Update the build script for webgpu to enable model dump by default.

Now, when using build_jsep.bat to build debug, the model dump is enabled.
Using
[`optimizedModelFilePath`](https://onnxruntime.ai/docs/api/js/interfaces/InferenceSession.SessionOptions.html#optimizedModelFilePath)
in the session options can dump the optimized model in the browser.

### Motivation and Context
Helps to debug/rule out problems that may be related to the model
optimizer.
2024-08-09 05:55:34 -07:00
Yulong Wang f4ec85259a
[js/web] allow relative path matching (#21657)
### Description

This change allows matching an external data path like `a.data` to
`./a.data`; a sketch of the matching rule is below.
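
A hedged Python sketch of the rule (the actual implementation is in TypeScript):

```python
# Normalize both sides so "./a.data" and "a.data" compare equal.
import posixpath

def external_data_paths_match(requested: str, registered: str) -> bool:
    return posixpath.normpath(requested) == posixpath.normpath(registered)

print(external_data_paths_match("./a.data", "a.data"))  # True
```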


2024-08-09 03:13:40 -07:00
Yulong Wang ae2b4d31ea
update pipeline list for run_CIs_for_external_pr.py (#21665)
2024-08-09 03:08:47 -07:00
Tianlei Wu 9334d4e362
[CUDA] Fix MHA mask (#21655)
### Description
Fix a mask-type check that I introduced in a recent commit. Add tests.
2024-08-09 01:31:00 -07:00
Scott McKay 410ae94e9e
Use zipped xcframework in nuget package (#21663)
### Description
The xcframework now uses symlinks to have the correct structure
according to Apple requirements. Symlinks are not supported by nuget on
Windows.

In order to work around that we can store a zip of the xcframeworks in
the nuget package.

### Motivation and Context
Fix nuget packaging build break
2024-08-09 17:38:18 +10:00
Tianlei Wu a46e49b439
Unblock migraphx and linux GPU training ci pipelines (#21662)
### Description
* Fix the migraphx build error caused by
https://github.com/microsoft/onnxruntime/pull/21598:
add conditional compilation to a code block that depends on ROCm >= 6.2.
Note that the pipeline uses ROCm 6.0.

Unblock the orttraining-linux-gpu-ci-pipeline,
orttraining-ortmodule-distributed, and orttraining-amd-gpu-ci-pipeline
pipelines:
* Disable a model test in the linux GPU training ci pipelines broken by
https://github.com/microsoft/onnxruntime/pull/19470:
sometimes, the cudnn frontend throws an exception that the cudnn graph
does not support a Conv node of the keras_lotus_resnet3D model on V100
GPUs. Note that the same test does not throw an exception in other GPU
pipelines. The failure might be related to the cudnn 8.9 and V100 GPUs
used in the pipeline (Ampere GPUs and cuDNN 9.x do not have the issue).
The actual fix requires fallback logic, which will take time to
implement, so we temporarily disable the test in the training pipelines.
* Force-install torch for cuda 11.8. (The docker image has torch 2.4.0
for cuda 12.1 to build the torch extension, which is not compatible with
cuda 11.8.) Note that this is a temporary workaround; a more elegant fix
is to make sure the right torch version is used in the docker build step,
which might need updates to install_python_deps.sh and the corresponding
requirements.txt.
* Skip test_gradient_correctness_conv1d since it causes a segmentation
fault. The root cause needs more investigation (maybe the cudnn frontend
as well).
* Skip test_aten_attention since it causes an assert failure. The root
cause needs more investigation (maybe the torch version).
* Skip orttraining_ortmodule_distributed_tests.py since it fails because
the compiler for the torch extension does not support c++17. One possible
fix is to set the following compile argument inside setup.py of the
fused_adam extension: extra_compile_args['cxx'] = ['-std=c++17'].
However, due to the urgency of unblocking the pipelines, just disable the
test for now.
* Skip test_softmax_bf16_large. For some reason,
torch.cuda.is_bf16_supported() returns True on V100 with torch 2.3.1, so
the test was run in CI, but V100 does not support bf16 natively.
* Fix a typo of "deterministic".

2024-08-08 19:44:15 -07:00
Yulong Wang 5e66fcc703
[js/web] allow op test to use f16 type for inputs/outputs (#21664)
### Description
Allow op tests to use the f16 type for inputs/outputs.

This PR introduces "@petamoriken/float16" as a Float16Array polyfill but
restricts it to the test runner only.
2024-08-08 09:56:37 -07:00
Scott McKay d616025884
Match changes in gh-pages PR (#21628)
### Description
Update to match #21627 and make the info for Split consistent.

As a Split that doesn't split anything is a no-op, it doesn't seem
meaningful to call that limitation out in the docs.

2024-08-08 10:29:15 +10:00
Xiang Zhang c93b92a43f
fix wrong check for tree ensemble regressor (#21595)
Fix a missing ORT_ENFORCE check that caused a heap buffer overflow
because of out-of-bounds access.
2024-08-07 16:27:18 -07:00
Yi Zhang 621b16f478
Pin transformer and optimum version (#21650)
### Motivation and Context
To fix whisper test failure
2024-08-07 17:47:15 +08:00
duanshengliu b95aa0563f
Improve speed in combining per-channel data (#21563)
### Description
Improve speed in combining `per-channel` data by using a single
`np.concatenate` instead of multiple `np.concatenate` calls within a for
loop; a minimal illustration follows.
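
```python
import numpy as np

channels = [np.random.rand(1, 8) for _ in range(1024)]

# Before: repeated np.concatenate calls copy the growing result each time.
out = np.empty((0, 8))
for c in channels:
    out = np.concatenate([out, c])

# After: one call, a single allocation and copy.
out_fast = np.concatenate(channels)
assert np.array_equal(out, out_fast)
```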

### Motivation and Context

Fix the issue https://github.com/microsoft/onnxruntime/issues/21562

Signed-off-by: duansheng.liu <44742794+duanshengliu@users.noreply.github.com>
2024-08-06 16:23:20 -07:00
Edward Chen 4ad87ca2e1
Fix usability checker CoreML config file path. (#21626)
Fix usability checker CoreML config file path. The files got renamed but one place was still referring to the old name.
2024-08-06 12:42:57 -07:00
Adrian Lizarraga 0acefc7988
[QNN EP] Update QNN SDK to 2.25 (#21623)
### Description
- Update pipelines to use QNN SDK 2.25 by default
- Update the ifdef condition that applies the workaround for the QNN
LayerNorm validation bug to cover QNN SDK 2.25 (as well as 2.24)



### Motivation and Context
Use the latest QNN SDK
2024-08-06 09:08:48 -07:00
Yi Zhang 0d1da41ca8
Fix docker image layer caching to avoid redundant docker building and transient connection exceptions. (#21612)
### Description
Improve the docker commands so that docker image layer caching works.
This makes docker builds faster and more stable.
So far, the A100 pool's system disk is too small to use the docker cache.
We won't use the pipeline cache for docker images, and some legacy code
is removed.

### Motivation and Context
There is often an exception like
```
64.58 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
286.4 curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)
```
because the Onnxruntime pipelines have been sending too many requests to
download Node.js during docker builds. This is the major reason pipelines
are failing now.
In fact, docker image layer caching never worked; we can always see the
scripts still running:
```
#9 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
#9 0.234 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
#9 0.235 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
#9 0.235 /tmp/scripts/install_centos.sh: line 1: !/bin/bash: No such file or directory
#9 0.235 ++ '[' '!' -f /etc/yum.repos.d/microsoft-prod.repo ']'
#9 0.236 +++ tr -dc 0-9.
#9 0.236 +++ cut -d . -f1
#9 0.238 ++ os_major_version=8
....
#9 60.41 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
#9 60.59 + return 0
...
```

This PR improves the docker commands to make image layer caching work,
so CI won't send so many redundant requests to download Node.js.
```
#9 [2/5] ADD scripts /tmp/scripts
#9 CACHED

#10 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
#10 CACHED

#11 [4/5] RUN adduser --uid 1000 onnxruntimedev
#11 CACHED

#12 [5/5] WORKDIR /home/onnxruntimedev
#12 CACHED
```

### Reference
https://docs.docker.com/build/drivers/

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-08-06 21:37:09 +08:00
liqun Fu f6f9657fb6
Fix typos so to call correct vnni functions under vnni condition (#21625)
### Description
Fix 2 typos in the mlas avx 4-bit gemm implementation so that the
correct vnni functions are called under the vnni condition.



### Motivation and Context
needed for 1.19.0 release

Signed-off-by: liqunfu <liqun.fu@microsoft.com>
2024-08-05 20:52:26 -07:00
Yifan Li 1f907a23f0
[EP Perf] Update cmake (#21624)
### Description
Update the script to cmake 3.30 to unblock EP Perf.


2024-08-05 16:41:56 -07:00
Edward Chen a5ce65d87a
Clean up some mobile package related files and their usages. (#21606)
The mobile packages have been removed.
2024-08-05 16:38:20 -07:00
Scott McKay bcc01ac123
Updates to apple packaging (#21611)
### Description
Add the ability to test packaging without rebuilding every time.
Add the ability to comment out some platforms/architectures without
breaking the scripts that assemble the C/Obj-C packages.
Update a couple of commands to preserve symlinks.


### Motivation and Context
Makes debugging packaging issues faster.
Creates a correct package for mac-catalyst and doesn't require setting
symlinks via a bash script.
2024-08-06 08:50:56 +10:00
Prathik Rao 134f47743e
bumps up version in main from 1.19 -> 1.20 (#21588)
Bump up version in main from 1.19.0 to 1.20.0 since the release branch
has been cut.
2024-08-05 15:46:04 -07:00