## Describe your changes
Added two new passes that split a transformers model into multiple
models. The split is done on the transformers layers.
- `CaptureSplitInfo`: Uses the pytorch/hf model to assign split ids to
the transformers layers and saves them as model attributes. The
conversion pass or model builder pass adds the split assignments and
model metadata.
- `SplitModel`: Uses the split_assignments metadata to break the model
into splits.
- If `include_all_nodes` is `True`: Nodes before first split or outside
the splits are assigned to the first split. Nodes after the last split
are assigned to the last split.
- If `include_all_nodes` is `False`: Such nodes are ignored. This means
the embedding, attention mask computation, final norm and lm heads don't
appear in the splits. We expect the user to extract them separately.
`Cache.save_model` now supports saving composite models.
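The `include_all_nodes` behavior described above can be illustrated with a minimal sketch. This is not the `SplitModel` implementation: the function name `assign_nodes_to_splits` and the flat list-of-split-ids input are hypothetical, chosen only to show how unassigned nodes are handled in each mode.

```python
from typing import Dict, List, Optional


def assign_nodes_to_splits(
    node_splits: List[Optional[int]],
    include_all_nodes: bool,
) -> Dict[int, List[int]]:
    """Group node indices by split id (hypothetical helper, for illustration).

    node_splits[i] is the split id of node i in topological order, or None
    when the node does not belong to any transformers layer.
    """
    assigned = [s for s in node_splits if s is not None]
    if not assigned:
        return {}
    first_split, last_split = min(assigned), max(assigned)
    # Index of the last node that carries an explicit split assignment.
    last_assigned_idx = max(i for i, s in enumerate(node_splits) if s is not None)

    splits: Dict[int, List[int]] = {}
    for idx, split_id in enumerate(node_splits):
        if split_id is None:
            if not include_all_nodes:
                continue  # unassigned nodes are left out of every split
            # Nodes after the last split go to the last split;
            # all other unassigned nodes go to the first split.
            split_id = last_split if idx > last_assigned_idx else first_split
        splits.setdefault(split_id, []).append(idx)
    return splits


# Node 0 could be the embedding, node 3 a shared node, node 6 the lm head.
layout = [None, 0, 0, None, 1, 1, None]
with_all = assign_nodes_to_splits(layout, include_all_nodes=True)
layers_only = assign_nodes_to_splits(layout, include_all_nodes=False)
```

With `include_all_nodes=False`, the caller is left to extract the skipped nodes (indices 0, 3 and 6 in this toy layout) separately.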
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Use user provided values as searchable suggestions
Update to conditionally turn user provided input values into Categorical
search suggestions.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Clarify help messages for the command line options.
Fix the hyperlink to the doc in the help message.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Fix doc link.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Refactor cache system.
- Rename cloud cache to shared cache.
- Unify local cache structure and shared cache structure.
- Update cache structure and config.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Rename `export-adapters` to `convert-adapters`
- Support `onnx_adapter` format in `load_weights` and `save_weights`.
- These are exposed in the `generate-adapters` and `convert-adapters`
commands as the `adapter_format` option. The default value is still
`numpy` since the new ort format is only available in the nightly package.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Onnxruntime originally only supported sub-4bit quantization using the
`MatMulNBits` contrib operator. But int4/uint4 are now officially
supported by onnx using the standard QDQ operators and are used by EPs
such as DirectML and QNN.
- This pass converts a `MatMulNBits` node into a sequence of
`DequantizeLinear`->`Transpose`(optional)->`MatMul` nodes. This way,
existing quantization tools don't need to be modified to export in QDQ
format.
- Updated the `quantize` CLI to use QDQ encoding by default. There is an
option to use `MatMulNBits` if the user prefers.
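The `DequantizeLinear`->`Transpose`->`MatMul` sequence can be sketched numerically with NumPy. This is an illustration under stated assumptions, not the pass itself: the helper names are hypothetical, the weight is assumed to be stored as `[N, K]` and quantized block-wise along `K`, and each 4-bit value is kept in its own `uint8` for readability instead of being packed two per byte.

```python
import numpy as np


def quantize_blockwise_uint4(weight, block_size):
    """Quantize a [N, K] float weight to uint4 values with per-block scale and zero point."""
    n, k = weight.shape
    blocks = weight.reshape(n, k // block_size, block_size)
    w_min = blocks.min(axis=-1, keepdims=True)
    w_max = blocks.max(axis=-1, keepdims=True)
    scale = np.maximum((w_max - w_min) / 15.0, 1e-8)  # 4 bits -> 16 levels
    zero_point = np.clip(np.round(-w_min / scale), 0, 15)
    q = np.clip(np.round(blocks / scale + zero_point), 0, 15).astype(np.uint8)
    return q, scale.astype(np.float32), zero_point.astype(np.uint8)


def dequantize_linear(q, scale, zero_point):
    """Same arithmetic as DequantizeLinear: (q - zero_point) * scale, reshaped to [N, K]."""
    deq = (q.astype(np.float32) - zero_point.astype(np.float32)) * scale
    return deq.reshape(deq.shape[0], -1)


def dq_transpose_matmul(x, q, scale, zero_point):
    """DequantizeLinear -> Transpose -> MatMul on a [M, K] activation."""
    weight = dequantize_linear(q, scale, zero_point)  # [N, K]
    return x @ weight.T  # [M, K] @ [K, N] -> [M, N]


# Weights chosen to be exactly representable in uint4 (values -8..7 per block),
# so the QDQ result matches the float matmul without quantization error.
weight = (np.arange(32, dtype=np.float32).reshape(2, 16) % 16) - 8.0
x = np.ones((3, 16), dtype=np.float32)
q, scale, zero_point = quantize_blockwise_uint4(weight, block_size=16)
out = dq_transpose_matmul(x, q, scale, zero_point)
```

For general weights the two results differ by the usual quantization error; the exact match here comes only from the hand-picked values.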
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [x] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
New `MatMulNBitsToQDQ` pass to convert `MatMulNBits` to `DQ->MatMul`.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Document quantize command
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Use separate section for CLI documentation.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- GPTQ/AWQ have `pack_model_for_onnx_conversion` option which modifies
the quantization algorithm to pack the quantized model for onnx
conversion using `MatMulNBits` contrib op. However, the output model is
a pytorch model object which can only be used for onnx export. We cannot
apply any other passes on it. Also, the awq pass doesn't pack the
weights at all.
- Implemented a new functionality where the quantized models are
repacked right before export. This framework is extensible to newer
quantization methods which implement their own QuantLinear classes. The
outputs of the quantization passes are now hf models if the input is an
hf model.
- GPTQ/AWQ also support models with adapters by just quantizing the base
model and leaving the adapters as is in float precision.
**Note:** Support for QDQ export will be added in a follow-up PR.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Cleanup Olive command docs
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Remove the `pack_weights` option from the `ExtractAdapters` pass. It is
not used and requires lora specific information (lora target modules) to
do the packing. Also makes the logic unnecessarily complicated.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Restrict initialization of passes to what's called out in pass_flows
The config can list any number of passes, but only the ones listed in
pass_flows are actually run. Initializing only those passes avoids the
case where a pass cannot be initialized successfully, which shouldn't
matter because it's not going to be run unless it is explicitly called
out in pass_flows.
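A minimal sketch of the filtering idea, assuming the passes are held as a name-to-config dictionary. The function `passes_to_initialize` is hypothetical, and so is the fallback of keeping every pass when no pass_flows are given.

```python
from typing import Dict, List, Optional


def passes_to_initialize(
    passes: Dict[str, dict],
    pass_flows: Optional[List[List[str]]],
) -> Dict[str, dict]:
    """Return only the pass configs that some pass flow refers to."""
    if not pass_flows:
        # Assumption: without pass_flows, every configured pass is used.
        return dict(passes)
    used = {name for flow in pass_flows for name in flow}
    unknown = used - passes.keys()
    if unknown:
        raise ValueError(f"pass_flows references undefined passes: {sorted(unknown)}")
    # Preserve the order in which the passes were configured.
    return {name: cfg for name, cfg in passes.items() if name in used}


configured = {"conversion": {}, "quantization": {}, "perf_tuning": {}}
selected = passes_to_initialize(configured, [["conversion", "quantization"]])
```

Here `perf_tuning` is never initialized, so a missing dependency for it cannot fail the workflow.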
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Update cli doc
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
**Engine**:
- Improve the output folder structure of a workflow run. The current
structure was meant for multiple-ep, multiple-passflow workflows but
that is not the common usage for olive.
- Unnecessary nesting for accelerator spec and pass flows is removed for
single ep, single passflow scenario.
- `output_name` is removed from both pass config and engine config.
- The behavior of `output_name` is arbitrary. User can get the output in
a specific folder by directly providing the `output_dir` like
`parent-dir/specific-dir`.
- `output_name` was allowed for pass config to save intermediate models.
But this can be achieved by providing multiple pass flows like `[[A, B],
[A, B, C]]`. This is cleaner than the former.
- Refer to `Engine.run` for more details on the new output structure.
**CLI**:
- `add_model_options` is made configurable so that only the desired
model type related options are added.
- `save_output_model` uses the new engine output directory structure to
copy the output model into the final output directory.
- `finetune` command separated into `finetune` and `generate-adapter`
commands. These commands can be chained as shown in the llama2 multilora
notebook.
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Add local pytorch model and cli-produced-model support for cli.
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Add additional files support for cloud cache.
- Fix minor bugs.
- A new AML job can't delete existing blobs. One option is to call the
AML Data API to archive the old data. Another option is to use the
current time as the folder name to distinguish different runs. This PR
takes the second option.
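The second option amounts to deriving a unique folder name per run from the clock. A minimal sketch, where the helper name `run_folder` and the `%Y%m%d_%H%M%S` format are assumptions rather than what the PR uses:

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional


def run_folder(base: str, now: Optional[datetime] = None) -> str:
    """Build a per-run folder path under `base` from a timestamp."""
    # UTC keeps names comparable across machines in different time zones.
    now = now or datetime.now(timezone.utc)
    return str(PurePosixPath(base) / now.strftime("%Y%m%d_%H%M%S"))


# A fixed timestamp is passed here only to make the example reproducible.
example = run_folder("cache/outputs", datetime(2024, 1, 2, 3, 4, 5))
```

Two runs started within the same second would still collide with this format, so a real implementation may need a finer resolution or a random suffix.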
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Code quality improvements
* Make each pass's use/requirement of the user_script/script_dir params explicit.
* Add caching for UserModule load calls to avoid repetitive loading of the same scripts.
* Add validation for DataConfig::user_script and DataConfig::type param.
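The caching idea can be sketched with `functools.lru_cache` keyed on the resolved script path. This is not the `UserModule` code: `load_user_module` and `_load_module` are hypothetical names used only for illustration.

```python
import importlib.util
from functools import lru_cache
from pathlib import Path
from types import ModuleType


@lru_cache(maxsize=None)
def _load_module(script_path: str) -> ModuleType:
    """Import a script from its path. Cached, so each path is executed once."""
    spec = importlib.util.spec_from_file_location(Path(script_path).stem, script_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {script_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def load_user_module(user_script: str) -> ModuleType:
    """Resolve the path first so equivalent spellings share one cache entry."""
    return _load_module(str(Path(user_script).resolve()))
```

Repeated calls with the same script return the same module object, so module-level side effects in the user script run only once per process.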
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Remove the `eval_dataset_size` option which was used to split train data
into train and eval. This is a specific use case which is supported
using data configs that use slices of the train data. We want to remove
as much of the data processing logic from the pass as possible to make it generic.
- Data collator is kept for now since the trainer has its own batch size
(which depends on batch size params as well as number of gpus). It is
more generic now.
- olive `Dataset.to_hf_dataset` removed. The logic has been improved and
moved to the pass since it's specific to it.
- Remove `phi` and `open_llama` qlora examples since these are copies of
the examples for `llama`, `phi2` and `phi3`. Keeping multiple copies of
essentially the same workflow is hard to maintain when we make changes.
- Other lora examples updated to remove redundant options.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Remove support for `ORTTrainer`. This feature is not used and hasn't
been validated for newer models or transformers versions.
- Set a minimum version for accelerate and remove the logic to bypass
the qlora multi-gpu bug #1117.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Rename `label_cols` to `label_col` and make its type string. We only use
`label_cols[0]`, which makes the name and list typing confusing.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Update remote workflow and huggingface doc
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Use custom CSS for header tags for better contrast between levels,
especially between h3 and h4, which are used in the cli command auto
docs.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Add --packages to cli to list required packages
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
1. Integrate QuaRot into olive. QuaRot is a new quantization scheme
based on rotations that is able to quantize LLMs end-to-end.
2. Add phi3-quarot example
Release Note: New pass QuaRot to quantize transformer models to improve
performance and reduce memory footprint.
Example:
`w_gptq` needs flash-attn, which only supports Ampere GPUs or newer.
Test on A100:
<img width="1276" alt="image"
src="https://github.com/user-attachments/assets/55d8512f-ebb0-4163-a4ef-0875d37587f9">
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [x] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [x] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- The `pair` dataset type was originally added to replicate the original
qlora implementation's data processing options. However, it is not used
and has limited use cases. Drop support for this dataset type and only
keep the common `corpus` type.
- Since pair is not a concept anymore, `source_max_len` is renamed to
`max_seq_len` to align the parameter name with how it is commonly known
(such as in hf trl, etc).
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [x] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
`pair` dataset type has been removed. `source_max_len` renamed to
`max_seq_len`.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Deprecate dataloader_func and related config parameters
* Update quantization passes to use DataConfig
* Update relevant tests, examples and docs
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Clean up post metric change
* Update documentation
* Remove references to deprecated keys in Metric/user_config
* Move Metric/user_config/data_dir & Metric/user_config/batch_size into
Metric/evaluate_func_args/* as they apply only to custom evaluation
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
* Add cloud model cache doc.
* Fix typo.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Add workflow host to run an Olive workflow on an AzureML system.
- User can submit workflow run to AzureML system.
- The local machine can be safely turned off while the workflow can
still be running on the AML workspace compute.
## TODO:
- User can retrieve the logs and artifacts from the remote VM by
`--retrieve` flag.
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Added `set -e` to the doc build step since `make html` failure doesn't
cause the build to fail automatically.
- Fix underline length
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Added a new pass for Qnn mixed precision overrides.
This pass preprocesses the model for mixed precision quantization by
resolving the constraints that each operator has when it is converted to
a QNN operator. Constraints refer to situations where a certain tensor
cannot be quantized to 16 bits on its own: its neighbouring tensors must
be quantized to 16 bits as well in order to have valid operators.
- Update onnx quantization to let it accept the init_overrides config
from the previous model output. This is enabled only for qnn config
preparation.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Remove the concept of `data_root` entirely. It is not used at all in
any of our examples and we only have one customer who uses a pinned
version of Olive. `data_root` is too complicated to maintain as it is
tied into all parts of Olive, and most passes, etc. ignore it. The cost
of maintaining the feature is not worth it.
- If needed, we can revisit this requirement and maybe create a new type
of resource path that gets resolved based on an environment variable
(like `DATA_ROOT`) or that the workflow/engine resolves at the beginning.
There is no need to propagate the `data_root` throughout olive.
- All cache related functionalities are now abstracted under a new
`olive.engine.cache.OliveCache` class.
- `prepare_non_local_model` is now generalized as
`OliveCache.prepare_resources_for_local`. This method sets up config for
local execution.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Add dockerfile packaging type. This packaging type will generate a
Dockerfile that includes onnxruntime package and output models.
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Separate the bert cuda example test into local and aml tests
- AML test:
- Doesn't need to be run on a GPU pipeline agent.
- Reduced the cases since we only want to test CUDA EP on aml. No need
to test the pass_flows or search strategies. Saves test time since the
aml gpu cluster is small and has longer queue times.
- Updated base image with cuda 11.8. Neither torch nor ort supports
11.6 anymore.
- Updated the conda yamls
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Provide a built-in config for random data with kv cache.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Update OrtPerfTuning pass to accept data only using data_config
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link