Commit graph

781 commits

Author SHA1 Message Date
Xiaoyu a27150ccb4
Add generate_config_file option to cli (#1568)
## Describe your changes

Add generate_config_file option to cli.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2025-01-23 17:47:25 -08:00
Xiaoyu 0b46917a81
Add options document (#1566)
## Describe your changes

- Add options page.
- Update document for recent changes.
- Rename `API reference` to `Reference`.
- Remove Engine, Metrics, Evaluators, Resource Path API page.
- Remove advanced users page.
- Add subsections to Pass page.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2025-01-23 16:03:46 -08:00
Jambay Kinley fb0d2284f1
OnnxStaticQuantization: Remove `quant_preprocess=False` default for qnn ep (#1565)
## Describe your changes
`quant_preprocess` was originally set to `False` by default because I
thought it might not be compatible with QNN EP. However, it is needed
for most models for the quantization to work properly:
- The quantizer needs a shape inferred model to quantize some tensors.
- Model quality is bad if constants are not made into initializers (see
#1552).

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2025-01-23 16:02:30 -08:00
Zac 28beb3f5c6
Fix: Use correct cache directory for file logging (#1564)
## Describe your changes

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2025-01-22 19:38:09 -08:00
Jambay Kinley ab72d6905e
Offline QuaRot Implementation (#1556)
## Describe your changes
- Previous QuaRot pass is replaced with our own implementation of the
pass which only performs the offline weight rotation.
- The online hadamard rotation parts are not relevant to us since it
involves reimplementing the model architecture or updating them
dynamically to add the input/kv rotation functions. Moreover, these are
not compatible with onnx export.
- This pass does not do any quantization. The rotated output model
should be subsequently quantized using GPTQ and/or QDQ passes.
- All usage of quarot from the examples and cli are removed. New
examples and cli options will be added once E2E validation of Rotate ->
GPTQ -> QDQ workflows is complete.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2025-01-22 15:30:38 -08:00
Jambay Kinley e510074cfe
Add huggingface `ModelWrapper` (#1555)
## Describe your changes
- Implemented a new huggingface `ModelWrapper` that acts as an interface
with huggingface models. It keeps maps for different model types that
allow the user to get model attributes and submodules.
- All code using the previous mappings directly has been updated
accordingly.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2025-01-21 18:04:45 -08:00
Jambay Kinley cdb8693b38
Hf data preprocessing: Truncate dataset if `max_samples` is provided (#1561)
## Describe your changes
- For huggingface data preprocessing (except text-gen which has its own
logic), truncate the data before tokenization if `max_samples` is
provided.
- There is no need to tokenize and process the whole dataset if only a
subset is going to be used. This is useful for large datasets where the
tokenized data might be too large to fit in memory.
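
A minimal sketch of the idea described above, not the actual Olive code: the dataset is cut down to `max_samples` items before the tokenizer runs, so the tokenizer never sees the discarded samples. The function name `truncate_then_tokenize` and the `tokenize` callable are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def truncate_then_tokenize(
    samples: Iterable[T],
    tokenize: Callable[[T], U],
    max_samples: Optional[int] = None,
) -> List[U]:
    """Tokenize at most max_samples items; tokenize everything if it is None."""
    if max_samples is not None:
        # Truncate first so the tokenizer only runs on the subset that is kept.
        samples = islice(samples, max_samples)
    return [tokenize(sample) for sample in samples]
```

With a large dataset and a small `max_samples`, the cost of tokenization then scales with the subset rather than the full dataset.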

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2025-01-21 17:37:44 -08:00
shaahji be1278ca4f
Validate pass config before instantiating the pass (#1553)
## Validate pass config before instantiating the pass

* Use of search point should be limited to engine logic only. Rest of
the Olive implementation should receive a validated configuration to
use.
* Validation is for complete configuration and not merely for a search
point.
* Fixed a few issues related to use of BasePassConfig vs.
FullPassConfig.
* Add local caching to OlivePackageConfig for loaded modules.
* Renamed a few variables in engine logic to be explicit about use of
pass config vs. pass run config.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2025-01-21 14:45:42 -08:00
Jambay Kinley 92f431e812
GraphSurgeries: Add pass to olive_config.json, Resolve output_model_path (#1560) 2025-01-21 11:58:35 -08:00
Xiaoyu 3d6e7317a7
Set onnxoptimizer.optimize as default for peepholeoptimizer (#1550)
## Describe your changes

Set onnxoptimizer.optimize as default for peepholeoptimizer

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2025-01-15 15:31:25 -08:00
shaahji 6c2b14b81c
Consolidate pass configs (#1547)
## Consolidate pass configs

Move AbstractPassConfig into pass_config.py and rename PassConfigBase to
BasePassConfig.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2025-01-14 21:30:59 -08:00
shaahji c1e1365c75
Fix graph vertex ordering during topological ordering (#1546)
## Fix graph vertex ordering during topological ordering

Previous implementation returned vertices in reverse order even when
there are no dependencies in the graph. Iterate the vertices in reverse
order so input order can be retained.
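
The effect can be illustrated with a generic depth-first topological sort. This is a sketch under assumptions, not the Olive implementation: `edges` is a hypothetical mapping from a vertex to the vertices that must come after it, and cycles are not detected. Because the post-order list is reversed at the end, visiting the vertices in reverse input order keeps the input order whenever there are no dependencies.

```python
from typing import Dict, Hashable, List, Sequence


def topological_order(
    vertices: Sequence[Hashable],
    edges: Dict[Hashable, Sequence[Hashable]],
) -> List[Hashable]:
    """Return vertices so that each one precedes the vertices listed in edges[v].

    Assumes the graph is acyclic; no cycle detection is performed.
    """
    visited = set()
    postorder: List[Hashable] = []

    def visit(vertex: Hashable) -> None:
        if vertex in visited:
            return
        visited.add(vertex)
        for successor in reversed(edges.get(vertex, [])):
            visit(successor)
        # A vertex is appended only after everything that depends on it.
        postorder.append(vertex)

    # Iterating in reverse here cancels out the final reversal below,
    # so independent vertices keep their input order.
    for vertex in reversed(vertices):
        visit(vertex)

    postorder.reverse()
    return postorder
```

Iterating `vertices` in forward order instead would return `["c", "b", "a"]` for the input `["a", "b", "c"]` with no edges, which matches the behavior the fix describes.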

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2025-01-13 22:28:50 -08:00
shaahji d27a07852f
Move module class variables from implementation to olive_config.json (#1545)
## Move module class variables from implementation to olive_config.json

This prevents loading the module pre-emptively even when it isn't
intended to be used. Also, after module gets imported, set the variable
back on the class so rest of the implementation doesn't get impacted.

TODO: Pass::run_on_target should be removed but code paths in tests
circumvent loading the olive package config. Follow up change to
implement pass search will fix the issue.
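
A rough sketch of deferred loading with a local cache, under stated assumptions: `MODULE_PATHS` is a hypothetical stand-in for entries read from a JSON config, and standard-library modules are used as placeholders so the example runs anywhere. It only shows the deferred import and the cache, not setting variables back on the class.

```python
import importlib
from functools import lru_cache

# Hypothetical stand-in for entries that would live in a JSON config file.
# Standard-library modules are used so the sketch runs anywhere.
MODULE_PATHS = {
    "JsonCodec": "json",
    "FractionMath": "fractions",
}


@lru_cache(maxsize=None)
def load_module(pass_name: str):
    """Import the module registered for pass_name on first use, then reuse it."""
    try:
        module_path = MODULE_PATHS[pass_name]
    except KeyError:
        raise ValueError(f"Unknown pass: {pass_name}") from None
    # Nothing is imported until a caller actually asks for this pass.
    return importlib.import_module(module_path)
```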

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2025-01-13 22:28:37 -08:00
Xiaoyu dcc0cb9905
Migrate pipeline linux test to docker image (#1529)
## Describe your changes

- Migrate pipeline linux test to docker image.
- Skip failed tests for further investigation. Potential fail reasons:
  - python 3.10
  - onnxruntime 1.20
  - docker in docker


## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2025-01-09 10:56:34 -08:00
Xiaoyu c631ba2b61
Add onnxscript to peephole pass (#1530)
## Describe your changes

- Add onnxscript to peephole pass.
- Remove self-implemented constant folding.


## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-12-20 14:06:06 -08:00
Xiaoyu 0e328a50f4
update images to ubuntu 22.04 & fix integ test (#1532)
## Describe your changes

- Update images to ubuntu 22.04. 
  - GPU image may need to be updated to cuda 12.x.
- Fix integ test to unblock pipeline.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-12-20 12:51:58 -08:00
Ti-Tai Wang f372b89a69
Support `conversion_dtype` in ONNXConversion pass and `dynamic_shapes` on torch.onnx.export(..., dynamo=True) (#1511)
## Describe your changes

(1) Introduce `dynamic_shapes`, which is a crucial parameter to
torch.onnx.export(..., dynamo=True).
(2) Enable `dynamic` boolean control in the config
(3) Actually apply `torch_dtype` to model inputs.

NOTE:
1. Some automation is intentionally not supported by the dynamic_shapes
field due to the complexity of generating `dynamic_shapes`, and
torch.onnx.export(dynamo=True) actually supports converting
`dynamic_axes` to `dynamic_shapes` (limited).
2. To follow JSON rules, `dynamic_shapes` requires users to provide a
list of [dim_name(str), min(int), max(int)]. This information will
later be used to compose `torch.export.Dim(dim_name, min=min, max=max)`
([detail](https://pytorch.org/docs/stable/export.html#expressing-dynamism)).
3. `dynamic_shapes` follows the tree structure of the model inputs. For
example, if the model input is nested tuple, then the `dynamic_shapes`
should be a nested tuple, instead of a dictionary.
4. The `kv_cache` support of `dynamic_shapes` is limited in terms of the
variation of model signatures, implementations, and inputs. Users are
encouraged to provide full kv cache.
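
As a sketch of notes 2 and 3, the helper below walks a JSON-style `dynamic_shapes` structure and replaces each `[dim_name, min, max]` triple with the result of a factory call, keeping the nesting intact. The helper name `build_dims` and the `make_dim` parameter are hypothetical, and this is not the code in the PR; the factory is injected so the example runs without PyTorch installed.

```python
from typing import Any, Callable


def _is_dim_triple(spec: Any) -> bool:
    """True for a [dim_name(str), min(int), max(int)] entry."""
    return (
        isinstance(spec, (list, tuple))
        and len(spec) == 3
        and isinstance(spec[0], str)
        and isinstance(spec[1], int)
        and isinstance(spec[2], int)
    )


def build_dims(spec: Any, make_dim: Callable[..., Any]) -> Any:
    """Recursively replace every triple with make_dim(name, min=..., max=...)."""
    if _is_dim_triple(spec):
        return make_dim(spec[0], min=spec[1], max=spec[2])
    if isinstance(spec, dict):
        return {key: build_dims(value, make_dim) for key, value in spec.items()}
    if isinstance(spec, (list, tuple)):
        # Preserve the container type so the result mirrors the input tree.
        return type(spec)(build_dims(value, make_dim) for value in spec)
    # Leaves such as None (a fully static input) are passed through unchanged.
    return spec
```

With PyTorch available, passing `torch.export.Dim` as `make_dim` would produce the `Dim` objects mentioned in note 2.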

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [x] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-12-18 10:00:42 -08:00
Jambay Kinley 68b1ee72be
CaptureSplitInfo: Separate split for memory intensive module (#1520) 2024-12-16 09:11:37 -08:00
Xiaoyu 72616c42ab
Add constant folding and onnxoptimizer to peephole optimizer (#1516)
## Describe your changes

Add constant folding and remove initializer from input to peephole
optimizer.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-12-12 15:57:07 -08:00
Xiaoyu dc6700880d
Add ReplaceErfWithTanh to GraphSurgeries (#1521)
## Describe your changes

Add ReplaceErfWithTanh to GraphSurgeries

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-12-11 20:03:06 -08:00
Xiaoyu 9191ba61e0
Add OnnxIODataTypeConverter & remove Float32Converter (#1519)
## Describe your changes

Add OnnxIODataTypeConverter & remove Float32Converter

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-12-10 14:45:51 -08:00
Xiaoyu aa21017238
Add ReorderInputs to GraphSurgeries Pass (#1518)
## Describe your changes

Add ReorderInputs to GraphSurgeries Pass

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-12-09 19:25:08 -08:00
Jambay Kinley 4e4a625e49
SplitModel: Keep QDQ nodes together (#1515) 2024-12-06 16:52:51 -08:00
Xiaoyu 9c562e6c97
Add graph surgeries pass (#1507)
## Describe your changes

Add graph surgeries pass.

Add surgeon:
- RenameInputs
- RenameOutputs
- InferShapes
- RemoveShapes
- ReorderInputs
- ZeroOutInput
- RemoveInputs
- ExposeOutputs
- ExposeQuantizedOutput

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-12-06 15:03:03 -08:00
Xiaoyu cf322550b3
Fix shared cache bug (#1513)
## Describe your changes

Fix shared cache bug

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
https://github.com/microsoft/Olive/issues/1509
2024-12-03 11:32:24 -08:00
Xiaoyu b3809d1a8d
Save output model to output_dir (#1430)
## Describe your changes

Save output model to output_dir

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-12-03 11:32:11 -08:00
Xiaoyu 6a74990726
Fix doc link (#1494)
## Describe your changes

Fix doc link
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-11-25 15:25:08 -08:00
Xiaoyu a0822a2a2c
Fix RUFF format (#1495)
## Describe your changes
Fix RUFF format
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-11-22 16:21:37 -08:00
Jambay Kinley 0db2d72b00
CLI: Use getattr on conditional args, Check handler type. Debug log in `get_attr` (#1486) 2024-11-20 13:19:04 -08:00
Jambay Kinley d1499ab07c
HfModelHandler: Call `save_metadata` after model is saved (#1485) 2024-11-20 13:18:52 -08:00
Xiaoyu f47265f377
Fix several bugs & add Bert inc example to pipeline (#1492)
## Describe your changes

Fix Olive bugs & Add Bert inc examples to example pipeline.

- `batch_size` is needed for Inc dataloader. Set default size to 1.
- Custom eval func doesn't have batch_size as input.
- For `QuantizationAwareTraining` pass, `train_data_config` is not
required if user provides `training_loop_func`.
- Latest transformers package will automatically save trained model as
safetensors format. Add `save_safetensors` as false to train argument.
- Some passes may have nested data_config in its config. Update
auto-fill data_config logic to achieve this.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-11-18 14:03:15 -08:00
Jambay Kinley f11061a174
Update main version to 0.8.0.dev0 (#1489)
## Describe your changes
We have already done some `0.7.x` releases, so we are updating the dev
version to `0.8.0.dev0`. This way it's ahead of the official releases
and we can differentiate dev builds from official releases.
Similar to how transformers versions their dev branch.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-11-14 13:02:19 -08:00
anujj 52af325ede
Update TensorRT Model Optimizer documentation and dependencies (#1483)
## Describe your changes

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
2024-11-13 14:26:25 -08:00
Xiaoyu 3e44c97806
Remove static key from quantize cli (#1481)
## Describe your changes

`static` quantization is not supported by cli now. Removing `static` key
from template. Otherwise, cli doc will have this option.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-11-12 12:34:01 -08:00
Justin Chu ddf98d0aaa
Set to use `dynamo=True` only with PyTorch 2.6 temporarily (#1479)
## Describe your changes

For the Olive release, in the ONNX export pass, since when `dynamo=True`
the dynamic_shapes argument is not provided properly, we temporarily
change the check to require torch 2.6 to enable the new `dynamo=True`
logic pass. This way when a user has torch 2.5 Olive will still use the
old `dynamo_export` logic and function without errors.
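
The version gate can be sketched as follows. This is an illustration only: the function names are hypothetical, and the actual check in Olive may be written differently (for example with a version-parsing library rather than a regular expression).

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.5.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def use_new_dynamo_path(torch_version: str) -> bool:
    """True when the installed torch is at least 2.6, per the description above."""
    return parse_major_minor(torch_version) >= (2, 6)
```

In practice the argument would come from `torch.__version__`; plain strings are used here so the sketch runs without PyTorch.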

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link

- https://github.com/microsoft/Olive/issues/1478
2024-11-11 23:55:13 -08:00
Jambay Kinley 61876e211f
AWQ: Patch for mismatched devices in RotaryEmbedding (#1480) 2024-11-11 23:00:55 -08:00
Jambay Kinley b7405973cb
OnnxModelOptimizer: Rename to OnnxPeepholeOptimizer, handle failed shape inference (#1477) 2024-11-11 13:45:36 -08:00
shaahji 1363be501d
Add option to use dynamo exporter for onnx conversion pass (#1476)
## Add option to use dynamo exporter for onnx conversion pass

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-11-11 13:21:43 -08:00
Jambay Kinley 1ff2f40815
MatMulNBitsToQDQ: Support `INT4` elem type, per-axis quantization (#1468)
## Describe your changes
- `use_int4` parameter added to use `INT4` elem type instead of `UINT4`.
Quantized data is converted to int4 by first casting from uint8 to int8
and then subtracting 8.
- We expect INT4 to only be used for symmetric quantized models but it
appears to work fine for asymmetric quantized models so there is no
restriction on it.
- `add_zero_point` parameter to force adding zero points to the DQ node
even though they are all zeros.
- If the number of K-blocks is 1, we assume it to be per-axis
quantization and create the DQ node as such:
  - no `block_size` attribute
  - scales and zero points are 1-D
  - `axis` is opposite that of block-wise quantization
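
The UINT4 to INT4 conversion in the first bullet amounts to shifting each value by 8. The sketch below uses plain Python integers rather than array casts, and the function names are hypothetical; it also shows that dequantization gives the same result when the zero point is shifted by the same amount.

```python
from typing import Iterable, List


def uint4_to_int4(values: Iterable[int]) -> List[int]:
    """Map unsigned 4-bit values (0..15) to signed 4-bit values (-8..7)."""
    converted = []
    for value in values:
        if not 0 <= value <= 15:
            raise ValueError(f"Not a uint4 value: {value}")
        # Equivalent to casting uint8 -> int8 and subtracting 8.
        converted.append(value - 8)
    return converted


def dequantize(values: Iterable[int], scale: float, zero_point: int) -> List[float]:
    """Standard linear dequantization: (q - zero_point) * scale."""
    return [(value - zero_point) * scale for value in values]
```

For a symmetric model, a UINT4 zero point of 8 becomes an INT4 zero point of 0, so both encodings dequantize to the same numbers.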

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-11-11 12:46:52 -08:00
Jambay Kinley 4009defbc9
HfModel: Disable `use_exllama` by default for GPTQ models (#1474)
## Describe your changes
The default value for `use_exllama` in transformers is `True`. However,
exllama model cannot be loaded on cpu (for model export) and doesn't
have a backward pass implemented for finetuning.
Since the main use for gptq quantized model in Olive is for export and
finetuning, we should disable `use_exllama` by default. User can provide
`use_exllama=True` as part of the loading args if they want to enable
exllama for inference, etc.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-11-11 12:46:40 -08:00
devang-ml cad03e2807
auto-opt: `--use_ort_genai` option, fix fp16 transformers optimizer (#1473)
## Describe your changes
- Use ONNX adapter format as the default format
- Provide --use_ort_genai option to generate genai config via auto-opt
CLI
- Correct the fp16 parameter in transformers optimizer config from
`use_fp16` to `float16`

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link

---------

Co-authored-by: Jambay Kinley <jambaykinley@microsoft.com>
2024-11-08 17:37:27 -08:00
Jambay Kinley aa031633f8
Expose accelerator memory and use in `CaptureSplitInfo` (#1467)
## Describe your changes
- Expose `memory` in `AcceleratorSpec` and `AcceleratorConfig`.
- Removed the unused fields in accelerator spec to avoid confusion.
- CaptureSplitInfo pass now uses this field instead of a config
parameter.

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-11-08 16:49:48 -08:00
Jambay Kinley 81037d7d03
`auto-opt`: Add model splitting options, Update no-search rules (#1466)
## Describe your changes
- Add new options to enable model splitting in the `auto-opt` command.
These are only used when search is disabled since model evaluation for
a split model is undefined.

Some other rule changes:
- Add `--use_qdq_encoding` option to make this optional
- Remove pytorch and model splitting passes when input model is onnx
model
- Remove optimizer and matmul4 passes when model builder is used since
the model is already optimized and in the expected precision
- Keep/remove passes that are only needed for specific EPs.
- Onnx conversion is always in fp32 since fp16 conversion doesn't work
for all models. We instead use the transformers optimizer pass to do the
conversion to fp16 afterwards.
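
The pass-removal rules listed above could be sketched roughly as follows. The pass names, the `kind` labels and the `prune_passes` function are all hypothetical stand-ins; this is not the `auto-opt` implementation.

```python
def prune_passes(passes, input_is_onnx=False, use_model_builder=False):
    """Drop candidate passes that the rules above make unnecessary.

    `passes` is a list of (name, kind) pairs; the kinds are illustrative.
    """
    dropped = set()
    if input_is_onnx:
        # An ONNX input needs no pytorch or model splitting passes.
        dropped |= {"pytorch", "split"}
    if use_model_builder:
        # Model builder output is already optimized and in the expected precision.
        dropped |= {"optimizer", "matmul4"}
    return [name for name, kind in passes if kind not in dropped]


# Hypothetical candidate list, in execution order.
CANDIDATES = [
    ("finetune", "pytorch"),
    ("capture_split_info", "split"),
    ("conversion", "conversion"),
    ("optimizer", "optimizer"),
    ("matmul4", "matmul4"),
]

print(prune_passes(CANDIDATES, input_is_onnx=True))
print(prune_passes(CANDIDATES, use_model_builder=True))
```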

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
Fixes #1464
2024-11-07 15:56:56 -08:00
Jambay Kinley 74873a54a8
Model Splitting: Add cost model, Simplify split logic (#1456)
## Describe your changes
- In addition to `num_splits`, `CaptureSplitInfo` pass now supports cost
model. The cost model is a csv containing the cost per component of the
model. Currently, the only cost supported is the number of bytes.
- The split decision is made based on the `max_memory` config parameter.
This config parameter is temporary and will be moved into the
accelerator spec. The required changes are bigger than the scope of this
PR so it will be handled in a follow up PR.
- Added `generate-cost-model` CLI to generate cost models for
huggingface models. Also added some pre-generated cost models for phi
and llama models under `assets/cost_models`.
- `include_all_nodes` option has been removed from `SplitModel` pass. It
always includes all nodes. This also made the logic simpler.
- OnnxDAG:
  - Updated logic to handle unused inputs/outputs for nodes that are left
  as `""`. These are commonly found in contrib operators.
  - Handle unnamed nodes. Node names are optional in the onnx spec.
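
The cost-model-based split decision described above can be illustrated with a small sketch. The CSV column names (`name`, `num_bytes`), the greedy packing strategy and both function names are assumptions made for this example; the actual `CaptureSplitInfo` logic may differ.

```python
import csv
import io


def load_cost_model(csv_text):
    """Parse rows of `name,num_bytes` into an ordered list of (name, cost) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["name"], int(row["num_bytes"])) for row in reader]


def assign_splits(costs, max_memory):
    """Greedily pack consecutive components into splits that fit in max_memory."""
    splits, current, used = [], [], 0
    for name, cost in costs:
        if cost > max_memory:
            raise ValueError(f"{name} needs {cost} bytes, more than max_memory")
        if current and used + cost > max_memory:
            # The current split is full; start a new one.
            splits.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        splits.append(current)
    return splits


# Made-up cost model with byte counts chosen for readability.
EXAMPLE_CSV = "name,num_bytes\nembed,400\nlayer0,300\nlayer1,300\nlm_head,400\n"
print(assign_splits(load_cost_model(EXAMPLE_CSV), max_memory=700))
```

Keeping components in their original order means each split is a contiguous slice of the model, which is the simplest case to reason about.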

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
Fixes:
#1459
2024-11-07 13:00:52 -08:00
Jambay Kinley e6b3e99ce4
SharedCache: Fix incorrect calls (#1465)
## Describe your changes

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
Fixes: #1463
2024-11-07 12:40:38 -08:00
shaahji 3ce5035b28
Issue #1460: Don't run transformer optimizer pass when using MB (#1462)
## Issue #1460: Don't run transformer optimizer pass when using MB

The pass and MB aren't compatible.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
Fixes: #1460
2024-11-07 11:14:57 -08:00
shaahji 0bfc4dfdfa
Fix an order of operation issue causing AttributeError (#1455)
## Fix an order of operation issue causing AttributeError

GitHub issues: #1449, #1454

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-11-06 18:00:36 -08:00
anujj 16ffab8314
Update NV-ModelOPT INT4 quantization (#1441)
1. Update modelopt integration in Olive
2. Add phi3 example
3. Remove the old bert model example

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
2024-11-01 12:24:27 -07:00
Jambay Kinley ad72625b7f
GPTQ: Don't save zero bias with the checkpoint, MB: Config save bugfix (#1445)
## Describe your changes
- GPTQ creates and saves a zero bias for all quantized modules. This is
fixed on main but there hasn't been a release with this fix yet.
- This causes an unnecessary warning about the unused bias while loading
the quantized model.
- Models exported using MB might also have the bias even though it is
just zeros.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
Fixes
#1450
2024-11-01 10:10:47 -07:00
shaahji 07afaae3e9
Update olive_config and auto-opt to use precision, providers and accelerators (#1440)
## Update olive_config and auto-opt to use precision, providers and accelerators

* Implement providers, accelerators and precisions for pass in
  olive_config.json. Each pass module config now also requires listing
  supported providers, accelerators and precisions.
* Update auto-opt cli to use a pre-defined order of passes when search
  is disabled. The list of passes is selected based on the user's choice
  of precision for the output model.
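
As a rough illustration of selecting passes from declared support, here is a sketch under stated assumptions. The pass names, the support table, the `"*"` wildcard and `select_passes` are invented for this example and do not reflect the real contents of olive_config.json.

```python
# Hypothetical per-pass support table: each pass lists the providers,
# accelerators and precisions it supports. "*" means "any".
PASS_SUPPORT = {
    "convert": {"providers": {"*"}, "accelerators": {"*"}, "precisions": {"*"}},
    "optimize": {
        "providers": {"CPUExecutionProvider", "CUDAExecutionProvider"},
        "accelerators": {"cpu", "gpu"},
        "precisions": {"fp32", "fp16"},
    },
    "quantize_int4": {
        "providers": {"CPUExecutionProvider"},
        "accelerators": {"cpu"},
        "precisions": {"int4"},
    },
}

# Pre-defined order used when search is disabled (also hypothetical).
PASS_ORDER = ["convert", "optimize", "quantize_int4"]


def _supports(entry, field, value):
    allowed = entry[field]
    return "*" in allowed or value in allowed


def select_passes(provider, accelerator, precision):
    """Return the ordered passes whose declared support matches the request."""
    wanted = {"providers": provider, "accelerators": accelerator, "precisions": precision}
    return [
        name
        for name in PASS_ORDER
        if all(_supports(PASS_SUPPORT[name], field, value) for field, value in wanted.items())
    ]


print(select_passes("CUDAExecutionProvider", "gpu", "fp16"))
print(select_passes("CPUExecutionProvider", "cpu", "int4"))
```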

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
2024-10-31 11:34:44 -07:00