## Describe your changes
Add generate_config_file option to cli.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Add options page.
- Update document for recent changes.
- Rename `API reference` to `Reference`.
- Remove Engine, Metrics, Evaluators, Resource Path API page.
- Remove advanced users page.
- Add subsections to Pass page.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
`quant_preprocess` was originally set to `False` by default because I
thought it might not be compatible with QNN EP. However, it is needed
for most models for the quantization to work properly:
- The quantizer needs a shape inferred model to quantize some tensors.
- Model quality is bad if constants are not made into initializers (see
#1552).
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Previous QuaRot pass is replaced with our own implementation of the
pass which only performs the offline weight rotation.
- The online hadamard rotation parts are not relevant to us since they
involve reimplementing the model architecture or updating it
dynamically to add the input/kv rotation functions. Moreover, these are
not compatible with onnx export.
- This pass does not do any quantization. The rotated output model
should be subsequently quantized using GPTQ and/or QDQ passes.
- All uses of quarot in the examples and cli are removed. New
examples and cli options will be added once E2E validation of Rotate ->
GPTQ -> QDQ workflows is complete.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Implemented a new huggingface `ModelWrapper` that acts as an interface
with huggingface models. It keeps maps for different model types that
allow the user to get model attributes and submodules.
- All code that used the previous mappings directly has been updated
accordingly.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- For huggingface data preprocessing (except text-gen, which has its own
logic), truncate the data before tokenization if `max_samples` is
provided.
- There is no need to tokenize and process the whole dataset if only a
subset is going to be used. This is useful for large datasets where the
tokenized data might be too large to fit in memory.
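A minimal sketch of the idea, not the actual Olive implementation: the `preprocess` and `whitespace_tokenize` helpers below are hypothetical, and a plain list stands in for a huggingface dataset. It only shows that slicing first means the unused samples are never tokenized.

```python
from typing import Callable, Optional


def preprocess(
    texts: list[str],
    tokenize: Callable[[str], list[str]],
    max_samples: Optional[int] = None,
) -> list[list[str]]:
    """Tokenize `texts`, truncating to `max_samples` first when it is given."""
    if max_samples is not None:
        # Slice before tokenizing so the unused tail is never processed.
        texts = texts[:max_samples]
    return [tokenize(text) for text in texts]


calls = []


def whitespace_tokenize(text: str) -> list[str]:
    """Toy tokenizer that records which samples it was asked to process."""
    calls.append(text)
    return text.split()


dataset = ["a b", "c d e", "f", "g h"]
tokenized = preprocess(dataset, whitespace_tokenize, max_samples=2)
```

Only the first two samples reach the tokenizer here; the rest of the dataset is never touched.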
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Validate pass config before instantiating the pass
* Use of search point should be limited to engine logic only. Rest of
the Olive implementation should receive a validated configuration to
use.
* Validation is for complete configuration and not merely for a search
point.
* Fixed a few issues related to use of BasePassConfig vs.
FullPassConfig.
* Add local caching to OlivePackageConfig for loaded modules.
* Renamed a few variables in engine logic to be explicit about use of
pass config vs. pass run config.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Set onnxoptimizer.optimize as default for peepholeoptimizer
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Consolidate pass configs
Move AbstractPassConfig into pass_config.py and rename PassConfigBase to
BasePassConfig.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Fix graph vertex ordering during topological ordering
Previous implementation returned vertices in reverse order even when
there are no dependencies in the graph. Iterate the vertices in reverse
order so input order can be retained.
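A rough sketch of the behaviour described above, assuming a DFS-based sort; the function name and the `(u, v)` edge representation are illustrative and not taken from the Olive code. A post-order DFS has to be reversed at the end, so iterating the vertices in reverse up front keeps the input order when nothing constrains it.

```python
def topological_order(vertices, edges):
    """Return vertices so that u precedes v for every edge (u, v)."""
    successors = {v: [] for v in vertices}
    for u, v in edges:
        successors[u].append(v)

    visited, in_progress, post_order = set(), set(), []

    def visit(v):
        if v in visited:
            return
        if v in in_progress:
            raise ValueError(f"cycle detected at {v!r}")
        in_progress.add(v)
        for w in reversed(successors[v]):
            visit(w)
        in_progress.discard(v)
        visited.add(v)
        post_order.append(v)

    # Reverse iteration here cancels out the final reversal below,
    # so independent vertices come back in their original order.
    for v in reversed(vertices):
        visit(v)
    return post_order[::-1]


no_deps = topological_order(["a", "b", "c"], [])
with_dep = topological_order(["a", "b", "c"], [("c", "a")])
```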
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Move module class variables from implementation to olive_config.json
This prevents loading the module pre-emptively even when it isn't
intended to be used. Also, after module gets imported, set the variable
back on the class so rest of the implementation doesn't get impacted.
TODO: Pass::run_on_target should be removed but code paths in tests
circumvent loading the olive package config. Follow up change to
implement pass search will fix the issue.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Migrate pipeline linux test to docker image.
- Skip failed tests for further investigation. Potential failure reasons:
  - python 3.10
  - onnxruntime 1.20
  - docker in docker
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Add onnxscript to peephole pass
- Remove self-implemented constant folding.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Update images to ubuntu 22.04.
- GPU image may need to be updated to cuda 12.x.
- Fix integ test to unblock pipeline.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
(1) Introduce `dynamic_shapes`, which is a crucial parameter to
torch.onnx.export(..., dynamo=True).
(2) Enable `dynamic` boolean control in the config
(3) Actually apply `torch_dtype` to model inputs.
NOTE:
1. Some automation is intentionally not supported by the dynamic_shapes
field due to the complexity of generating `dynamic_shapes`, and
torch.onnx.export(dynamo=True) actually supports converting
`dynamic_axes` to `dynamic_shapes` (limited).
2. To follow JSON rules, `dynamic_shapes` requires users to provide a
list of [dim_name(str), min(int), max(int)]. This information will
later be used to compose `torch.export.Dim(dim_name, min=min, max=max)`
([detail](https://pytorch.org/docs/stable/export.html#expressing-dynamism)).
3. `dynamic_shapes` follows the tree structure of the model inputs. For
example, if the model input is nested tuple, then the `dynamic_shapes`
should be a nested tuple, instead of a dictionary.
4. The `kv_cache` support of `dynamic_shapes` is limited in terms of the
variation of model signatures, implementations, and inputs. Users are
encouraged to provide full kv cache.
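To illustrate notes 2 and 3, here is a sketch of how a JSON-style spec could be walked and its `[dim_name, min, max]` leaves turned into dimension objects. This is not the code in this PR: `DimSpec`, `is_dim_triple` and `build_dynamic_shapes` are hypothetical names, and `DimSpec` is only a stand-in so the example runs without torch. In real use one would presumably pass `torch.export.Dim` as `make_dim`. Dictionary keys are left as they are, since how axis keys are normalized is not stated here.

```python
from typing import Any, Callable, NamedTuple


class DimSpec(NamedTuple):
    """Stand-in for torch.export.Dim so the sketch runs without torch."""

    name: str
    min: int
    max: int


def is_dim_triple(value: Any) -> bool:
    """True for a JSON leaf of the form [dim_name, min, max]."""
    return (
        isinstance(value, list)
        and len(value) == 3
        and isinstance(value[0], str)
        and isinstance(value[1], int)
        and isinstance(value[2], int)
    )


def build_dynamic_shapes(spec: Any, make_dim: Callable[..., Any] = DimSpec) -> Any:
    """Mirror the tree structure of `spec`, converting each triple via `make_dim`."""
    if is_dim_triple(spec):
        name, lo, hi = spec
        return make_dim(name, min=lo, max=hi)
    if isinstance(spec, dict):
        return {k: build_dynamic_shapes(v, make_dim) for k, v in spec.items()}
    if isinstance(spec, (list, tuple)):
        # JSON has no tuples, so nested lists become nested tuples.
        return tuple(build_dynamic_shapes(v, make_dim) for v in spec)
    return spec  # e.g. None for an input with no dynamic dimensions


by_name = build_dynamic_shapes({"input_ids": {"0": ["batch", 1, 8]}})
positional = build_dynamic_shapes([{"0": ["batch", 1, 4]}, None])
```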
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [x] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Add constant folding and remove initializer from input to peephole
optimizer.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Add ReplaceErfWithTanh to GraphSurgeries
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Add OnnxIODataTypeConverter & remove Float32Converter
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Add ReorderInputs to GraphSurgeries Pass
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Add graph surgeries pass.
Add surgeon:
- RenameInputs
- RenameOutputs
- InferShapes
- RemoveShapes
- ReorderInputs
- ZeroOutInput
- RemoveInputs
- ExposeOutputs
- ExposeQuantizedOutput
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Fix shared cache bug
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
https://github.com/microsoft/Olive/issues/1509
## Describe your changes
Save output model to output_dir
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Fix doc link
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Fix RUFF format
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
Fix Olive bugs & Add Bert inc examples to example pipeline.
- `batch_size` is needed for Inc dataloader. Set default size to 1.
- Custom eval func doesn't have batch_size as input.
- For `QuantizationAwareTraining` pass, `train_data_config` is not
required if user provides `training_loop_func`.
- Latest transformers package will automatically save trained model as
safetensors format. Add `save_safetensors` as false to train argument.
- Some passes may have nested data_config in its config. Update
auto-fill data_config logic to achieve this.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
We have already done some `0.7.x` releases, so this updates the dev
version to `0.8.0.dev0`. This way it's ahead of the official releases
and we can differentiate dev builds from official releases.
This is similar to how transformers versions its dev branch.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## Describe your changes
`static` quantization is not supported by cli now. Removing `static` key
from template. Otherwise, cli doc will have this option.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
For the Olive release, the ONNX export pass temporarily requires torch
2.6 to enable the new `dynamo=True` logic, because the `dynamic_shapes`
argument is not yet provided properly on that path. This way, when a
user has torch 2.5, Olive will still use the old `dynamo_export` logic
and function without errors.
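A small sketch of what such a version gate could look like. The helper names are hypothetical and the version string is passed in explicitly so the example runs without torch; the real check in the pass may be written differently.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.6.0+cu124' or '2.5.1'."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def use_new_dynamo_path(torch_version: str, dynamo: bool) -> bool:
    """Take the new `dynamo=True` export path only on torch 2.6 or newer."""
    return dynamo and parse_version(torch_version) >= (2, 6)


on_26 = use_new_dynamo_path("2.6.0", dynamo=True)
on_25 = use_new_dynamo_path("2.5.1+cu121", dynamo=True)
```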
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
- https://github.com/microsoft/Olive/issues/1478
## Add option to use dynamo exporter for onnx conversion pass
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- `use_int4` parameter added to use `INT4` elem type instead of `UINT4`.
Quantized data is converted to int4 by first casting from uint8 to int8
and then subtracting 8.
- We expect INT4 to only be used for symmetric quantized models but it
appears to work fine for asymmetric quantized models so there is no
restriction on it.
- `add_zero_point` parameter to force adding zero points to the DQ node
even though they are all zeros.
- If the number of K-blocks is 1, we assume it to be per-axis
quantization and create the DQ node as such:
  - no `block_size` attribute
  - scales and zero points are 1-D
  - `axis` is opposite that of block-wise quantization
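The `use_int4` conversion can be sketched in plain Python as below. This is an illustration rather than the pass code, which works on arrays: the helper names are made up, and shifting the zero point by the same offset of 8 is an assumption made here to show that dequantized values stay the same.

```python
def uint4_to_int4(values):
    """Map unsigned 4-bit values (0..15) to signed 4-bit values (-8..7)."""
    converted = []
    for v in values:
        if not 0 <= v <= 15:
            raise ValueError(f"{v} is not a valid uint4 value")
        converted.append(v - 8)
    return converted


def dequantize(quantized, scale, zero_point):
    """Standard linear dequantization: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quantized]


unsigned = [0, 8, 15]
signed = uint4_to_int4(unsigned)

# Assumption: the zero point moves by the same offset, 8 -> 0.
before = dequantize(unsigned, scale=0.5, zero_point=8)
after = dequantize(signed, scale=0.5, zero_point=0)
```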
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
The default value for `use_exllama` in transformers is `True`. However,
exllama model cannot be loaded on cpu (for model export) and doesn't
have a backward pass implemented for finetuning.
Since the main use for gptq quantized model in Olive is for export and
finetuning, we should disable `use_exllama` by default. User can provide
`use_exllama=True` as part of the loading args if they want to enable
exllama for inference, etc.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Use ONNX adapter format as the default format
- Provide --use_ort_genai option to generate genai config via auto-opt
CLI
- Correct the fp16 parameter in transformers optimizer config from
`use_fp16` to `float16`
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
---------
Co-authored-by: Jambay Kinley <jambaykinley@microsoft.com>
## Describe your changes
- Expose `memory` in `AcceleratorSpec` and `AcceleratorConfig`.
- Removed the unused fields in accelerator spec to avoid confusion.
- CaptureSplitInfo pass now uses this field instead of a config
parameter.
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
## Describe your changes
- Add new options to enable model splitting in `auto-opt` command. Only
used for no-search model since model evaluation for split model is
undefined.
Some other rule changes:
- Add `--use_qdq_encoding` option to make this optional
- Remove pytorch and model splitting passes when input model is onnx
model
- Remove optimizer and matmul4 passes when model builder is used since
the model is already optimized and in the expected precision
- Keep/remove passes that are only needed for specific EPs.
- Onnx conversion is always in fp32 since fp16 conversion doesn't work
for all models. We instead use the transformers optimizer pass to do the
conversion to fp16 afterwards.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
Fixes #1464
## Describe your changes
- In addition to `num_splits`, `CaptureSplitInfo` pass now supports cost
model. The cost model is a csv containing the cost per component of the
model. Currently, the only cost supported is the number of bytes.
- The split decision is made based on the `max_memory` config parameter.
This config parameter is temporary and will be moved into the
accelerator spec. The required changes are bigger than the scope of this
PR so it will be handled in a follow up PR.
- Added `generate-cost-model` CLI to generate cost models for
huggingface models. Also added some pre-generated cost models for phi
and llama models under `assets/cost_models`.
- `include_all_nodes` option has been removed from `SplitModel` pass. It
always includes all nodes. This also made the logic simpler.
- OnnxDAG:
  - Updated logic to handle unused inputs/outputs for nodes that are left
    as `""`. Found commonly in contrib operators.
  - Handle unnamed nodes. Node names are optional in the onnx spec.
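For intuition only, here is one way a byte-based cost model and a memory limit could drive a split decision. Everything in this sketch is an assumption: the csv column names, the greedy in-order strategy and the `assign_splits` helper are hypothetical and may not match what `CaptureSplitInfo` actually does.

```python
import csv
import io

# Hypothetical cost model: one row per component with its size in bytes.
COST_MODEL_CSV = """module,num_bytes
embed,40
layer0,30
layer1,30
head,40
"""


def load_costs(text):
    """Read (module, num_bytes) pairs in file order."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["module"], int(row["num_bytes"])) for row in reader]


def assign_splits(costs, max_memory):
    """Greedily start a new split whenever the next component would not fit."""
    assignments, current_split, used = {}, 0, 0
    for name, num_bytes in costs:
        if used > 0 and used + num_bytes > max_memory:
            current_split += 1
            used = 0
        assignments[name] = current_split
        used += num_bytes
    return assignments


splits = assign_splits(load_costs(COST_MODEL_CSV), max_memory=70)
```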
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
Fixes:
#1459
## Describe your changes
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
Fixes: #1463
## Issue #1460: Don't run transformer optimizer pass when using MB
The pass and MB aren't compatible.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
Fixes: #1460
## Fix an order of operation issue causing AttributeError
GitHub issues: #1449, #1454
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
1. Update modelopt integration in Olive
2. Add phi3 example
3. Remove the old bert model example
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## Describe your changes
- GPTQ creates and saves zero bias for all quantized modules. This is
fixed on main but there hasn't been a release with this fix yet.
- This causes unnecessary warning while loading the quantized model for
unused bias.
- Model exported using MB might also have the bias even though they are
just zeros.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link
Fixes
#1450
## Update olive_config and auto-opt to use precision, providers and accelerators
* Implement providers, accelerators and precisions for pass in
olive_config.json. Each pass module config now also requires listing
supported providers, accelerators and precisions.
* Update auto-opt cli to use a pre-defined order of passes when search
is disabled. The list of passes is selected based on the user's choice
of precision for the output model.
## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.
## (Optional) Issue link