Commit graph

261 commits

Author SHA1 Message Date
Jambay Kinley 1e42f0b2b2
SplitModel: Pass to split transformers model (#1436)
## Describe your changes
Added two new passes that split a transformers model into multiple
models. The split is done on the transformers layers.
- `CaptureSplitInfo`: Uses the pytorch/hf model to assign split ids to
the transformers layers and saves them as model attributes. The
conversion pass or model builder pass adds the split assignments and
model metadata.
- `SplitModel`: Uses the split_assignments metadata to break the model
into splits.
- If `include_all_nodes` is `True`: Nodes before first split or outside
the splits are assigned to the first split. Nodes after the last split
are assigned to the last split.
- If `include_all_nodes` is `False`: Such nodes are ignored. This means
the embedding, attention mask computation, final norm and lm heads don't
appear in the splits. We expect the user to extract them separately.
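
The `include_all_nodes` behaviour described above can be sketched as follows. This is a minimal illustration, not the pass's actual implementation: the function name `assign_nodes_to_splits`, the flat list of node names in topological order, and the `layer_split_ids` mapping are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def assign_nodes_to_splits(
    node_names: List[str],
    layer_split_ids: Dict[str, int],
    include_all_nodes: bool,
) -> Dict[str, Optional[int]]:
    """Map every node to a split id (hypothetical helper, not Olive API).

    node_names is assumed to be in topological order; layer_split_ids holds
    the split ids recorded for nodes that belong to transformers layers.
    """
    assigned_positions = [i for i, name in enumerate(node_names) if name in layer_split_ids]
    if not assigned_positions:
        raise ValueError("no node carries a split id")

    first_split = min(layer_split_ids[node_names[i]] for i in assigned_positions)
    last_split = max(layer_split_ids[node_names[i]] for i in assigned_positions)
    last_position = assigned_positions[-1]

    result: Dict[str, Optional[int]] = {}
    for position, name in enumerate(node_names):
        if name in layer_split_ids:
            result[name] = layer_split_ids[name]
        elif not include_all_nodes:
            # Ignored: the node does not appear in any split.
            result[name] = None
        elif position > last_position:
            # After the last split: goes to the last split.
            result[name] = last_split
        else:
            # Before the first split or between splits: goes to the first split.
            result[name] = first_split
    return result
```

With `include_all_nodes=False`, nodes such as the embedding or the lm head map to `None` here, which mirrors the expectation that the user extracts them separately.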

`Cache.save_model` now supports saving composite models.
  
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-10-25 16:19:09 -07:00
shaahji ed1011e4ae
Deprecate "disable_search" from Pass config (#1431) 2024-10-24 02:08:57 -07:00
shaahji b2bcfb852a
Use user provided values as searchable suggestions (#1428)
## Use user provided values as searchable suggestions

Update to conditionally turn user provided input values into Categorical
search suggestions.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-10-21 13:02:37 -07:00
devang-ml acc87632d1
Clean up CLI docs (#1429)
## Describe your changes
Clarify help messages for the command line options.
Fix hyperlink to the doc in the help message.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-10-20 16:51:32 -07:00
Xiaoyu bd6ef9e7a2
fix doc link (#1423)
## Describe your changes

fix doc link

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-10-16 13:04:10 -07:00
Xiaoyu 86a443982b
Refactor cache (#1400)
## Describe your changes

Refactor cache system.

- Rename cloud cache to shared cache.
- Unify local cache structure and shared cache structure.
- Update cache structure and config.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-10-10 10:27:29 -07:00
Jambay Kinley c473a3b734
CLI: `export-adapters`->`convert-adapters`, `onnx_adapter` format support (#1408)
## Describe your changes
- Rename `export-adapters` to `convert-adapters`
- Support `onnx_adapter` format in `load_weights` and `save_weights`. 
- These are exposed in the `generate-adapters` and `convert-adapters`
commands as the `adapter_format` option. The default value is still
`numpy` since the new ort format is only available in the nightly package.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-10-08 11:56:18 -07:00
Jambay Kinley add3f639a6
MatMulNBitsToQDQ: Pass to convert MatMulNBits contrib op to standard QDQ (#1403)
- ONNX Runtime originally only supported sub-4bit quantization using the
`MatMulNBits` contrib operator. But int4/uint4 are now officially
supported by ONNX using the standard QDQ operators and are used by EPs
such as DirectML and QNN.
- This pass converts a MatMulNBits node into a sequence of
`DequantizeLinear`->`Transpose`(optional)->`MatMul` nodes. This way,
existing quantization tools don't need to be modified to export in QDQ
format.
- Updated the `quantize` CLI to use QDQ encoding by default. There is an
option to use `MatMulNBits` if the user prefers.
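
The arithmetic behind the `DequantizeLinear`->`Transpose`->`MatMul` sequence can be sketched in plain Python. This is only an illustration of block-wise dequantization followed by a matrix multiply; the tensor layout (one row per output channel, one scale and zero point per block of `block_size` values) and all function names are assumptions for the example, not the exact operator attribute layout.

```python
from typing import List

Matrix = List[List[float]]


def dequantize_blockwise(
    quantized: List[List[int]],
    scales: Matrix,
    zero_points: List[List[int]],
    block_size: int,
) -> Matrix:
    """Dequantize N rows of K 4-bit values; each block of block_size values
    along K shares one scale and one zero point (assumed layout)."""
    return [
        [
            (value - zero_points[n][k // block_size]) * scales[n][k // block_size]
            for k, value in enumerate(row)
        ]
        for n, row in enumerate(quantized)
    ]


def transpose(matrix: Matrix) -> Matrix:
    return [list(column) for column in zip(*matrix)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)] for row in a]


# Two output channels (N=2), four input features (K=4), blocks of two values.
quantized = [[1, 2, 3, 4], [8, 8, 8, 8]]
scales = [[0.5, 2.0], [1.0, 1.0]]
zero_points = [[0, 2], [8, 8]]

weight = dequantize_blockwise(quantized, scales, zero_points, block_size=2)  # N x K
activations = [[1.0, 1.0, 1.0, 1.0]]  # 1 x K
output = matmul(activations, transpose(weight))  # 1 x N
```

The transpose step is needed in this sketch because the weight is stored as N x K; a layout that already stores K x N would skip it, which is consistent with the `Transpose` node being optional.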

## Describe your changes

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [x] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
   New `MatMulNBitsToQDQ` pass to convert `MatMulNBits` to `DQ->MatMul`.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-10-07 15:32:58 -07:00
Jambay Kinley 854f4e599e
CI: Fix test dependencies (#1409)
## Describe your changes

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-10-07 14:55:58 -07:00
devang-ml ab80d653b1
Document quantize command (#1396)
## Describe your changes
Document quantize command
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-10-02 10:25:43 -07:00
shaahji 5bd14bd64f
Rename OrtPerfTuning to OrtSessionParamsTuning (#1380) 2024-10-01 18:45:25 -07:00
devang-ml fd588a663b
Use separate section for CLI documentation. (#1393)
## Describe your changes
Use separate section for CLI documentation.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-09-30 17:12:42 -07:00
Jambay Kinley 1b47f07d7b
GPTQ/AWQ: Support adapters and onnx export (#1390)
## Describe your changes
- GPTQ/AWQ have `pack_model_for_onnx_conversion` option which modifies
the quantization algorithm to pack the quantized model for onnx
conversion using `MatMulNBits` contrib op. However, the output model is
a pytorch model object which can only be used for onnx export. We cannot
apply any other passes on it. Also, the awq pass doesn't pack the
weights at all.
- Implemented a new functionality where the quantized models are
repacked right before export. This framework is extensible to newer
quantization methods which implement their own QuantLinear classes. The
output of the quantization passes is now an hf model if the input is an
hf model.
- GPTQ/AWQ also support models with adapters by just quantizing the base
model and leaving the adapters as is in float precision.

**Note:** Support for QDQ export will be added in a follow-up PR.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-09-30 17:09:52 -07:00
devang-ml 1865dd8c31
Cleanup Olive command docs (#1373)
## Describe your changes
Cleanup Olive command docs
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-09-30 14:56:17 -07:00
Jambay Kinley 2a2e96784b
ExtractAdapters: remove `pack_weights` option (#1378)
## Describe your changes
Remove the `pack_weights` option from the `ExtractAdapters` pass. It is
not used and requires lora-specific information (lora target modules) to
do the packing. It also makes the logic unnecessarily complicated.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-09-30 13:54:33 -07:00
shaahji 304b81d740
Restrict initialization of passes to what's called out in pass_flows (#1374)
## Restrict initialization of passes to what's called out in pass_flows

Config can list any number of passes, but the ones that are actually run
are the ones listed in pass_flows. This avoids failures when a pass
cannot be initialized successfully, which shouldn't matter because the
pass is not going to be run unless it is explicitly called out in
the pass_flows.
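
The selection logic can be sketched as below. The function name `passes_to_initialize` and the dictionary shape of the config are hypothetical, and the fallback of using every configured pass when no `pass_flows` is given is an assumption of this sketch.

```python
from typing import Dict, List, Optional


def passes_to_initialize(
    passes_config: Dict[str, dict],
    pass_flows: Optional[List[List[str]]],
) -> List[str]:
    """Return the names of the passes that need to be initialized.

    Hypothetical helper: only passes referenced by some flow are returned,
    so a pass that cannot be initialized but is never run causes no failure.
    """
    if not pass_flows:
        # Assumption: without explicit flows, every configured pass is used.
        return list(passes_config)

    used: List[str] = []
    for flow in pass_flows:
        for name in flow:
            if name not in passes_config:
                raise KeyError(f"pass_flows references unknown pass: {name}")
            if name not in used:
                used.append(name)
    return used
```

For a config with passes `convert`, `quantize` and `broken` and a single flow `[["convert", "quantize"]]`, only `convert` and `quantize` would be initialized.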

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-09-26 15:10:28 -07:00
Xiaoyu 1b3671ec63
Update cli doc (#1367)
## Describe your changes

Update cli doc
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-09-24 17:02:53 -07:00
Jambay Kinley 9301aae1ec
Engine: Improve output structure, CLI: Configurable model options, Separate `finetune`+`generate-adapter` (#1361)
## Describe your changes
**Engine**:
- Improve the output folder structure of a workflow run. The current
structure was meant for multiple-ep, multiple-passflow workflows but
that is not the common usage for olive.
- Unnecessary nesting for accelerator spec and pass flows is removed for
single ep, single passflow scenario.
- `output_name` is removed from both pass config and engine config.
- The behavior of `output_name` is arbitrary. User can get the output in
a specific folder by directly providing the `output_dir` like
`parent-dir/specific-dir`.
- `output_name` was allowed for pass config to save intermediate models.
But this can be achieved by providing multiple pass flows like `[[A, B],
[A, B, C]]`. This is cleaner than the former.
- Refer to `Engine.run` for more details on the new output structure.

**CLI**:
- `add_model_options` is made configurable so that only the desired
model type related options are added.
- `save_output_model` uses the new engine output directory structure to
copy the output model into the final output directory.
- `finetune` command separated into `finetune` and `generate-adapter`
commands. These commands can be chained as shown in the llama2 multilora
notebook.

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-09-18 15:42:06 -07:00
Xiaoyu 9fa260403d
Add local pytorch model and cli-produced-model support for cli (#1355)
## Describe your changes

Add local pytorch model and cli-produced-model support for cli.

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-09-17 13:05:48 -07:00
Xiaoyu 7389f81bcd
Fix editor format (#1333)
## Describe your changes

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-08-26 14:22:10 -07:00
Xiaoyu 0a57b9d009
Add additional files support for cloud cache (#1318)
## Describe your changes

- Add additional files support for cloud cache.
- Fix minor bugs.
- A new AML job can't delete existing blobs. One option is to call the
AML Data API to archive the old data. Another option is to use the current
time as the folder name to distinguish different runs. This PR takes the
second option.
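
The timestamp-as-folder-name idea can be sketched as follows. The helper name `run_folder_name`, the UTC clock and the `%Y%m%d_%H%M%S` format are assumptions for illustration; the PR does not state the exact naming scheme.

```python
from datetime import datetime, timezone
from typing import Optional


def run_folder_name(prefix: str, now: Optional[datetime] = None) -> str:
    """Build a per-run folder path so that a new job never has to delete
    or overwrite blobs written by an earlier run (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}/{now.strftime('%Y%m%d_%H%M%S')}"
```

Two runs started at different seconds get different folders, so old data stays untouched instead of being archived through a separate API call.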

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-08-21 10:48:43 -07:00
trajep 7ab3f92a7b
🏪 Support opversion conversion for models with external data (#1322)
## Describe your changes

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-08-21 16:16:08 +08:00
devang-ml 2050ee3256
Reorganize Features section of Olive docs (#1323) 2024-08-19 18:09:58 -07:00
Jambay Kinley 2f592507dc
HfModelHandler: Use `optimum` for automatic `io_config` and `dummy_inputs` (#1317) 2024-08-15 22:44:41 -07:00
shaahji 56f131b7b2
Code quality improvements (#1313)
## Code quality improvements

* Make use/requirement of user_script/script_dir params by pass explicit.
* Add caching for UserModule load calls to avoid repetitive loading of the same scripts.
* Add validation for DataConfig::user_script and DataConfig::type param.
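
Caching script loads so the same file is not executed repeatedly can be sketched with the standard library. This is not the project's `UserModule` code; `load_user_script` and `_load_resolved` are hypothetical names, and keying the cache on the resolved path is an assumption of this sketch.

```python
import importlib.util
from functools import lru_cache
from pathlib import Path
from types import ModuleType


@lru_cache(maxsize=None)
def _load_resolved(resolved_path: str) -> ModuleType:
    """Execute the script once and keep the resulting module in the cache."""
    spec = importlib.util.spec_from_file_location(Path(resolved_path).stem, resolved_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load user script: {resolved_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def load_user_script(path: str) -> ModuleType:
    """Load a user script, reusing the cached module for repeated calls."""
    # Resolve first so that different spellings of one path share a cache entry.
    return _load_resolved(str(Path(path).resolve()))
```

Repeated calls with the same script return the same module object, so any module-level setup in the script runs only once.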

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-08-14 11:13:24 -07:00
devang-ml 3b74878d57
Document LoRA feature (#1316)
## Describe your changes
Document LoRA feature
2024-08-14 10:44:01 -07:00
Jambay Kinley 35f29f12db
LoRA: Remove `eval_dataset_size`, Generalize data collator, Remove redundant examples (#1308)
## Describe your changes
- Remove the `eval_dataset_size` option which was used to split train data
into train and eval. This is a specific use case which is supported
using data configs that use slices of the train data. We want to remove
as much of the data processing logic from the pass as possible to make it
generic.
- Data collator is kept for now since the trainer has its own batch size
(which depends on batch size params as well as number of gpus). It is
more generic now.
- olive `Dataset.to_hf_dataset` removed. The logic has been improved and
moved to the pass since it's specific to it.
- Remove `phi` and `open_llama` qlora examples since these are copies of
the examples for `llama`, `phi2` and `phi3`. Keeping multiple copies of
essentially the same workflow is hard to maintain when we make changes.
- Other lora examples updated to remove redundant options.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-08-13 17:35:21 -07:00
Jambay Kinley 6ef4bca1de
LoRA: Remove `ORTTrainer` support, Minimum `accelerate` version (#1303)
## Describe your changes
- Remove support for `ORTTrainer`. This feature is not used and hasn't
been validated for newer models or transformers version.
- Set a minimum version for accelerate and remove the logic to bypass
qlora multi-gpu bug #1117

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-08-12 20:04:39 -07:00
Jambay Kinley d98ebf0047
Data: `label_cols` -> `label_col` (#1302)
## Describe your changes
Rename `label_cols` to `label_col` and make its type string. We only use
`label_cols[0]`, which makes the name and list typing confusing.
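
For configs written against the old name, the rename can be sketched as a small migration helper. `migrate_label_cols` is a hypothetical function for illustration and is not part of this change; it keeps only the first entry of the old list, matching the observation that only `label_cols[0]` was used.

```python
from typing import Any, Dict


def migrate_label_cols(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with `label_cols` replaced by `label_col`."""
    migrated = dict(config)
    if "label_cols" in migrated:
        cols = migrated.pop("label_cols")
        if isinstance(cols, str):
            label = cols
        elif cols:
            # Only the first entry was ever used.
            label = cols[0]
        else:
            raise ValueError("label_cols is empty")
        # An explicit label_col already present in the config wins.
        migrated.setdefault("label_col", label)
    return migrated
```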

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-08-12 20:02:53 -07:00
Xiaoyu 9e2c817b12
Update remote workflow and huggingface doc (#1293)
## Describe your changes

Update remote workflow and huggingface doc

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-08-12 14:12:26 -07:00
Jambay Kinley c73098eff3
Docs: Custom CSS for header tags (#1285)
## Describe your changes
Use custom CSS for header tags for better contrast between levels,
especially between h3 and h4, which are used in the cli command auto
docs.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-08-07 15:35:14 -07:00
Jambay Kinley ab2b248b74
CLI: `finetune` command (#1277) 2024-08-05 19:21:24 -07:00
Xiaoyu 807fca0baa
Add --packages to cli to list required packages (#1256)
## Describe your changes

Add --packages to cli to list required packages
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-08-02 14:53:11 -07:00
Jambay Kinley ff929df6a3
Make config types case insensitive (#1258) 2024-08-01 22:52:20 -07:00
trajep 63d75f158e
🤖 Quarot (#1268)
## Describe your changes

1. Integrate QuaRot into olive. QuaRot is a new quantization scheme
based on rotations, which is able to quantize LLMs end-to-end.
2. Add phi3-quarot example

Release Note: New pass QuaRot to quantize transformer to improve
performance and reduce memory footprint.

Example:
`w_gptq` needs flash-atten which only supports Ampere GPUs or newer.

Test on A100: [image]


## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [x] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [x] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-08-02 12:17:47 +08:00
Jambay Kinley bfb05cefe8
Text-gen: Remove `pair` dataset_type, `source_max_len` -> `max_seq_len` (#1273)
## Describe your changes
- The `pair` dataset type was originally added to replicate the original
qlora implementation's data processing options. However, it is not used
and has limited use cases. Drop support for this dataset type and only
keep the common `corpus` type.
- Since pair is not a concept anymore, `source_max_len` is renamed to
`max_seq_len` to align the parameter name with how it's commonly known
(such as in hf trl, etc).

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [x] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
`pair` dataset type has been removed. `source_max_len` renamed to
`max_seq_len`.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-08-01 11:15:57 -07:00
devang-ml 1cc9f90860
Fix remote workflow section title (#1271)
Fix remote workflow section title
2024-07-31 09:40:50 +08:00
shaahji e7fe006fad
Deprecate dataloader_func and related config parameters (#1253)
## Deprecate dataloader_func and related config parameters

* Update quantization passes to use DataConfig
* Update relevant tests, examples and docs

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-07-30 01:18:05 -07:00
shaahji 6b93df3a49
Clean up post metric change (#1247)
## Clean up post metric change

* Update documentation
* Remove references to deprecated keys in Metric/user_config
* Move Metric/user_config/data_dir & Metric/user_config/batch_size into
Metric/evaluate_func_args/* as they apply only to custom evaluation

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-07-22 15:25:49 -07:00
Xiaoyu 7c46bab031
Add cloud model cache doc (#1246)
## Describe your changes

* Add cloud model cache doc.
* Fix typo.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-07-22 14:43:48 -07:00
Jambay Kinley 076e792fee
`NestedConfigs`: config fields don't need nesting (#1245) 2024-07-22 07:34:16 -07:00
Xiaoyu 79bcf0bfad
Add workflow host (#1203)
## Describe your changes

Add workflow host to run an Olive workflow on an AzureML system.

- The user can submit a workflow run to an AzureML system.
- The local machine can be safely turned off while the workflow keeps
running on the AML workspace compute.

## TODO:

- The user can retrieve the logs and artifacts from the remote VM with
the `--retrieve` flag.

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-07-18 16:46:51 -07:00
Jambay Kinley aab163c561
Fix doc (#1242)
## Describe your changes
- Added `set -e` to the doc build step, since a `make html` failure
doesn't fail the build automatically.
- Fixed underline length.
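
As a rough illustration of why `set -e` matters here, the sketch below runs two small shell snippets from Python. It is not the actual doc build step: `false` stands in for a failing `make html`, and `run_steps` is a hypothetical helper. It assumes a POSIX `sh` is available.

```python
import subprocess


def run_steps(script: str) -> int:
    """Run a shell snippet with sh and return its exit status."""
    result = subprocess.run(["sh", "-c", script], check=False, capture_output=True)
    return result.returncode


# `false` stands in for a failing `make html` (assumption for illustration).
# Without `set -e`, the script's status is that of its last command (`echo`).
without_flag = run_steps("false; echo 'later step'")

# With `set -e`, the shell exits at the first failing command.
with_flag = run_steps("set -e; false; echo 'later step'")

print(without_flag, with_flag)
```

Without the flag the failure is masked by the later command, which is the behaviour the first bullet describes.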

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-07-17 22:06:22 -07:00
Jambay Kinley d325da074e
`HfModelHandler` separated from `PyTorchModelHandler` (#1239) 2024-07-17 20:31:04 -07:00
trajep d6b1a6061e
🌜 Qnn Mixed precision init_overrides processing (#1234)
## Describe your changes

- Added a new pass for QNN mixed-precision overrides.

This pass preprocesses the model for mixed-precision quantization by
resolving the constraints each operator has when it is converted to a
QNN operator. A constraint arises when a tensor cannot be quantized to
16 bits on its own: its neighbouring tensors must be quantized as well
for the operator to be valid.

- Updated ONNX quantization to accept the init_overrides config from the
previous model output; this is enabled only for QNN config preparation.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-07-18 11:26:55 +08:00
Jambay Kinley 0d580b0fe5
Remove `data_root`, `prepare_resources_for_local` for all configs (#1235)
## Describe your changes
- Remove the concept of `data_root` entirely. It is not used in any of
our examples, and we only have one customer, who uses a pinned version
of Olive. `data_root` is too complicated to maintain: it is tied into
all parts of Olive, yet most passes and other components ignore it. The
feature is not worth its maintenance cost.
- If needed, we can revisit this requirement and maybe create a new type
of resource path that gets resolved based on an environment variable
(like `DATA_ROOT`) or that the workflow/engine resolves at the beginning.
There is no need to propagate `data_root` throughout Olive.
- All cache-related functionality is now abstracted under a new
`olive.engine.cache.OliveCache` class.
- `prepare_non_local_model` is now generalized as
`OliveCache.prepare_resources_for_local`. This method sets up the config
for local execution.
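
A minimal sketch of the environment-variable idea from the second bullet, under stated assumptions: `resolve_data_path` is a hypothetical helper and not part of Olive, and the only thing taken from the text above is the `DATA_ROOT` variable name.

```python
import os
from pathlib import Path


def resolve_data_path(relative_path: str, env_var: str = "DATA_ROOT") -> Path:
    """Resolve a path against the directory named by env_var, if it is set.

    Hypothetical helper for illustration only; not an Olive API.
    """
    path = Path(relative_path)
    root = os.environ.get(env_var)
    # Absolute paths and an unset variable leave the path unchanged.
    if root is None or path.is_absolute():
        return path
    return Path(root) / path


os.environ["DATA_ROOT"] = "/mnt/data"
print(resolve_data_path("glue/train.json"))
```

Resolving once at the point of use keeps the root out of every config and pass, which is the motivation given for dropping `data_root` propagation.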

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-07-15 19:54:40 -07:00
Xiaoyu 97df601aae
Add dockerfile packaging type (#1225)
## Describe your changes

Add dockerfile packaging type. This packaging type will generate a
Dockerfile that includes onnxruntime package and output models.

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-07-13 01:46:10 -07:00
Jambay Kinley 3980c8c61a
CI: Separate Bert CUDA local and aml test (#1232)
## Describe your changes
- Separate the bert cuda example test into local and aml tests
- AML test:
  - doesn't need to run on a GPU pipeline agent.
- Reduced the cases since we only want to test CUDA EP on aml. No need
to test the pass_flows or search strategies. Saves test time since the
aml gpu cluster is small and has longer queue times.
- Updated base image with cuda 11.8. Both torch and ort don't support
11.6 anymore
  - Updated the conda yamls

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-07-12 15:29:21 -07:00
trajep 064e9a60f5
🗝️ Provide built in config for Random data with kv cache (#1214)
## Describe your changes

Provide a built-in config for random data with KV cache.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-07-12 16:29:34 +08:00
shaahji fc4fc903f0
Update OrtPerfTuning pass to accept data only using data_config (#1220)
## Update OrtPerfTuning pass to accept data only using data_config

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-07-11 15:58:38 -07:00