`HfModelHandler` separated from `PyTorchModelHandler` (#1239)

This commit is contained in:
Jambay Kinley 2024-07-17 20:31:04 -07:00 committed by GitHub
Parent d6b1a6061e
Commit d325da074e
No key found matching this signature
GPG key ID: B5690EEEBB952194
159 changed files with 2165 additions and 2823 deletions

View File

@ -10,21 +10,33 @@ Model Configuration
-------------------
.. autoclass:: olive.model.ModelConfig
.. _hf_model:
Hf Model Handler
----------------
.. autoclass:: olive.model.HfModelHandler
.. _distributed_hf_model:
Distributed Hf Model Handler
---------------------------------
.. autoclass:: olive.model.DistributedHfModelHandler
.. _pytorch_model:
PyTorch Model Handler
---------------------
.. autoclass:: olive.model.PyTorchModelHandler
.. _onnx_model:
ONNX Model Handler
------------------
.. autoclass:: olive.model.ONNXModelHandler
.. _composite_onnx_model:
CompositeModel Model Handler
----------------------------
.. autoclass:: olive.model.CompositeModelHandler
.. _distributed_onnx_model:
DistributedOnnxModel Model Handler
Distributed Onnx Model Handler
----------------------------------
.. autoclass:: olive.model.DistributedOnnxModelHandler
@ -34,24 +46,15 @@ OpenVINO Model Handler
----------------------
.. autoclass:: olive.model.OpenVINOModelHandler
.. _pytorch_model:
PyTorch Model Handler
---------------------
.. autoclass:: olive.model.PyTorchModelHandler
DistributedPyTorchModelHandler Model
------------------------------------
.. autoclass:: olive.model.DistributedPyTorchModelHandler
.. _snpe_model:
SNPEHandler Model
SNPE Model Handler
-----------------
.. autoclass:: olive.model.SNPEModelHandler
CompositePyTorchModel Model Handler
-----------------------------------
.. autoclass:: olive.model.CompositePyTorchModelHandler
.. _composite_model:
Composite Model Handler
----------------------------
.. autoclass:: olive.model.CompositeModelHandler
.. _distributed_pytorch_model:

View File

@ -3,142 +3,40 @@
## Introduction
This document outlines the integrations between Olive and Huggingface. Discover how to use Huggingface resources within Olive.
## hf_config
If you want to optimize a Huggingface model, or evaluate a Huggingface model, you will need `hf_config` defined in your `input_model` section. Please refer to [this section](../overview/options.md#input-model-information) for detailed parameters of `hf_config`.
## Input Model
Use the `HfModel` type if you want to optimize or evaluate a Huggingface model. The default `task` is `text-generation-with-past`.
Here is how you can use `hf_config`:
### Model config loading
Olive can automatically retrieve model configurations from Huggingface hub:
- Olive retrieves model [configuration](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoConfig) from transformers for future usage.
- Olive simplifies the process by automatically fetching configurations such as IO config and dummy input required for the `OnnxConversion` pass from [OnnxConfig](https://huggingface.co/docs/transformers/main_classes/onnx#onnx-configurations). This means there's no need for you to manually specify the IO config and dummy input when using the `OnnxConversion` pass.
If you want to use your own `io_config` or `dummy_input`, you can still add them to the model config:
### Huggingface Hub model
Olive can automatically retrieve models from Huggingface hub:
```json
"input_model":{
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_script": "user_script.py",
"io_config": "get_io_config",
"dummy_inputs_func": "get_dummy_inputs",
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"task": "text-generation"
}
"model_path": "meta-llama/Llama-2-7b-hf"
}
}
```
### Model loading
#### Load Huggingface model from Huggingface hub
Olive can automatically retrieve models from Huggingface hub. Here are the examples:
#### PyTorch model
Take `Intel/bert-base-uncased-mrpc` as an example, you can specify task name as `text-classification` to form the `hf_config` as follows:
### Local model
If you have the Huggingface model available locally:
```json
"input_model":{
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
"model_path": "path/to/local/model"
}
}
```
**Note:** You must also have the tokenizer and other necessary files in the same local directory.
#### Optimum model
Optimum model is a special case of PyTorch model. By specifying `OptimumModel` as `type`, the `model_path` should be the model's name. Then add the names of the model components to `model_components`. Olive will retrieve the components from Huggingface hub:
```json
"input_model":{
"type": "OptimumModel",
"config": {
"model_path": "openlm-research/open_llama_3b",
"model_components": ["decoder_model.onnx", "decoder_with_past_model.onnx"],
"hf_config": {
"model_class": "LlamaForCausalLM"
}
}
}
```
### Model loading from local
If you have the Huggingface model prepared in local, add `model_path` to the model config, and specify `model_name` and `task` in `hf_config` so that Olive can automatically fetch the model attributes:
Example:
```json
"input_model":{
"type": "PyTorchModel",
"config": {
"model_path": "path_to_local_model",
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
}
}
```
### Model loading from local with custom components
You can use your own custom components functions for your model. You will need to define the details of your components in your script as functions.
Example:
```json
{
"input_model": {
"type": "PyTorchModel",
"config": {
"model_script": "user_script.py",
"hf_config": {
"model_class": "WhisperForConditionalGeneration",
"model_name": "openai/whisper-medium",
"components": [
{
"name": "encoder_decoder_init",
"io_config": "get_encdec_io_config",
"component_func": "get_encoder_decoder_init",
"dummy_inputs_func": "encoder_decoder_init_dummy_inputs"
},
{
"name": "decoder",
"io_config": "get_dec_io_config",
"component_func": "get_decoder",
"dummy_inputs_func": "decoder_dummy_inputs"
}
]
}
}
},
}
```
#### Script example
```python
# my_script.py
def get_dec_io_config(model: OliveModelHandler):
# return your io dict
...
def get_decoder(model: OliveModelHandler):
# your component implementation
...
def dummy_inputs_func(model: OliveModelHandler):
# return the dummy input for your component
...
```
### Model loading from Azure ML resources
### Azure ML model
Olive supports loading models from your Azure Machine Learning workspace. Find detailed configurations [here](./azureml_integration.md).
Example: [Llama-2-7b](https://ml.azure.com/models/Llama-2-7b/version/13/catalog/registry/azureml-meta) from Azure ML model catalog:
```json
"input_model":{
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": {
"type": "azureml_registry_model",
@ -147,17 +45,38 @@ Example: [Llama-2-7b](https://ml.azure.com/models/Llama-2-7b/version/13/catalog/
"registry_name": "azureml-meta",
"version": "13"
}
},
"model_file_format": "PyTorch.MLflow",
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"task": "text-generation"
}
}
}
```
Please note the model for `Llama-2-7b` in Azure ML model catalog is a mlflow model. So `"model_file_format": "PyTorch.MLflow"` is required here.
### Model config loading
Olive can automatically retrieve model configurations from Huggingface hub:
- Olive retrieves model [configuration](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoConfig) from transformers for future usage.
- Olive simplifies the process by automatically fetching configurations such as IO config and dummy input required for the `OnnxConversion` pass from [OnnxConfig](https://huggingface.co/docs/transformers/main_classes/onnx#onnx-configurations). This means there's no need for you to manually specify the IO config when using the `OnnxConversion` pass.
You can also provide your own IO config which will override the automatically fetched IO config and dummy inputs:
```json
"input_model": {
"type": "HfModel",
"config": {
"model_path": "meta-llama/Llama-2-7b-hf",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int64", "int64", "int64" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
}
}
}
}
```
## Huggingface datasets
Olive supports automatically downloading and applying [Huggingface datasets](https://huggingface.co/datasets) to Passes and Evaluators.
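As a rough sketch (the exact data config schema can vary between Olive versions, and the `HuggingfaceContainer` values shown below, such as the dataset, subset, and column names, are only illustrative), a Huggingface dataset is typically wired into a workflow through a `data_configs` entry:
```json
"data_configs": [
    {
        "name": "glue_mrpc",
        "type": "HuggingfaceContainer",
        "load_dataset_config": { "data_name": "glue", "subset": "mrpc", "split": "validation" },
        "pre_process_data_config": { "input_cols": [ "sentence1", "sentence2" ], "label_cols": [ "label" ] }
    }
]
```
Passes and evaluators can then reference this entry by its `name`.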

View File

@ -8,7 +8,7 @@ It is based on the [LoRA paper](https://arxiv.org/abs/2106.09685).
The output model is the input transformers model along with the fine-tuned LoRA adapters. The adapters can be loaded and/or merged into the original model using the `peft` library from Hugging Face.
This pass only supports Hugging Face transformers PyTorch models. Please refer to [LoRA](lora) for more details about the pass and its config parameters.
This pass only supports HfModels. Please refer to [LoRA](lora) for more details about the pass and its config parameters.
### Example Configuration
```json
@ -33,7 +33,7 @@ the QLoRA [paper](https://arxiv.org/abs/2305.14314) and [code](https://github.co
The output model is the input transformers model along with the quantization config and the fine-tuned LoRA adapters. The adapters can be loaded and/or merged into the original model using the
`peft` library from Hugging Face.
This pass only supports Hugging Face transformers PyTorch models. Please refer to [QLoRA](qlora) for more details about the pass and its config parameters.
This pass only supports HfModels. Please refer to [QLoRA](qlora) for more details about the pass and its config parameters.
**Note:** QLoRA requires a GPU to run.
@ -60,7 +60,7 @@ and [code](https://github.com/yxli2123/LoftQ). More information on LoRA can be f
The `LoftQ` pass initializes the quantized LoRA model using the LoftQ initialization method and then fine-tunes the adapters. The output model has new quantization aware master weights and the fine-tuned LoRA adapters.
This pass only supports Hugging Face transformers PyTorch models. Please refer to [LoftQ](loftq) for more details about the pass and its config parameters.
This pass only supports HfModels. Please refer to [LoftQ](loftq) for more details about the pass and its config parameters.
**Note:** LoftQ requires a GPU to run.
```json
@ -193,7 +193,7 @@ as 2:4 and 4:8 patterns.
Please refer to the original paper linked above for more details on the algorithm and performance results for different models, sparsities and datasets.
This pass only supports Hugging Face transformers PyTorch models. Please refer to [SparseGPT](sparsegpt) for more details on the types of transformers models supported.
This pass only supports HfModels. Please refer to [SparseGPT](sparsegpt) for more details on the types of transformers models supported.
**Note:** TensorRT can accelerate inference on 2:4 sparse models as described in [this blog](https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/).
@ -234,7 +234,7 @@ This pass only supports HuggingFace transformer PyTorch models. Please refer to
applicable. `torch_tensorrt` is an extension to `torch` where TensorRT compiled engines can be used like regular `torch.nn.Module`s. This pass can be used to accelerate inference on transformer models
with sparse weights by taking advantage of the 2:4 structured sparsity pattern supported by TensorRT.
This pass only supports Hugging Face transformers PyTorch models. Please refer to [TorchTRTConversion](torch_trt_conversion) for more details on the types of transformers models supported.
This pass only supports HfModels. Please refer to [TorchTRTConversion](torch_trt_conversion) for more details on the types of transformers models supported.
### Example Configuration
```json

View File

@ -85,36 +85,31 @@ The default value is 3. User can increase if there are network issues and the op
"operation_retry_interval" : 5
},
```
<!-- TODO(anyone): Docs for all model handlers-->
## Input Model Information
`input_model: [Dict]`
User should specify the input model type and configuration using the `input_model` dictionary. It contains the following items:
- `type: [str]` Type of the input model which is case insensitive.. The supported types contain `PyTorchModelHandler`, `ONNXModelHandler`, `OpenVINOModelHandler`,`SNPEModelHandler` and etc. You can
- `type: [str]` Type of the input model, which is case insensitive. The supported types include `HfModelHandler`, `PyTorchModelHandler`, `ONNXModelHandler`, `OpenVINOModelHandler`, `SNPEModelHandler`, etc. You can
find more details in [Olive Models](https://microsoft.github.io/Olive/api/models.html).
- `config: [Dict]` For example, for `PytorchModelHandler`, the input model config dictionary specifies following items:
- `config: [Dict]` For example, for `HfModelHandler`, the input model config dictionary specifies the following items:
- `model_path: [str | Dict]` The model path can be a string or a dictionary. If it is a string, it is either a string name
used by the model loader or the path to the model file/directory. If it is a dictionary, it contains information about the model path.
Please refer to [Configuring Model Path](../tutorials/configure_model_path.md) for the more information of the model path dictionary.
- `model_path: [str | Dict]` The model path can be a string or a dictionary. If it is a string, it is a huggingface hub model id or a local directory. If it is a dictionary, it contains information about the model path. Please refer to [Configuring Model Path](../tutorials/configure_model_path.md) for more information about the model path dictionary.
- `model_loader: [str]` The name of the function provided by the user to load the model. The function should take the model path as
input and return the loaded model.
- `task: [str]` The task of the model. The default task is `text-generation-with-past`, which is equivalent to a causal language model with key-value cache enabled.
- `model_script: [str]` The name of the script provided by the user to assist with model loading.
- `script_dir: [str]` The directory that contains dependencies for the model script.
- `io_config: [Dict[str, Any] | IoConfig | str | Callable]`: The inputs and outputs information of the model. It can be a dictionary, an IoConfig object or a function string under `model_script`. Basically, it contains following items:
- `io_config: [Dict]`: The inputs and outputs information of the model. If not provided, Olive will try to infer the input and output information from the model. The dictionary contains the following items:
- `input_names: [List[str]]` The input names of the model.
- `input_types: [List[str]]` The input types of the model.
- `input_shapes: [List[List[int]]]` The input shapes of the model.
- `output_names: [List[str]]` The output names of the model.
- `dynamic_axes: [Dict[str, Dict[str, str]]]` The dynamic axes of the model. The key is the name of the input or output and the value is a dictionary that contains the dynamic axes of the input or output. The key of the value dictionary is the index of the dynamic axis and the value is the name of the dynamic axis. For example, `{"input": {"0": "batch_size"}, "output": {"0": "batch_size"}}` means the first dimension of the input and output is dynamic and the name of the dynamic axis is `batch_size`.
- `string_to_int_dim_params: List[str]` The list of input names in dynamic axes that need to be converted to int value.
- `kv_cache: Union[bool, Dict[str, str]]` The key value cache configuration.
- `kv_cache: Union[bool, Dict[str, str]]` The key value cache configuration. If not provided, it is assumed to be `True` if the `task` ends with `-with-past`.
- If it is `False`, Olive will not use key value cache.
- If it is `True`, Olive will infer the cache configuration from the input_names/input_shapes and input model based on default `kv_cache`.
- If it is a dictionary, it should contain the key value cache configuration. Here is a default configuration example:
@ -148,35 +143,15 @@ find more details in [Olive Models](https://microsoft.github.io/Olive/api/models
The dynamic axis of the past key value cache. If it is null, Olive will infer the dynamic axis.
- `present_kv_dynamic_axis`: null
The dynamic axis of the present key value cache. If it is null, Olive will infer the dynamic axis.
- <a name="hf_config"></a> `hf_config: [Dict]` Instead of `model_path` or `model_loader`, the model can be specified using a dictionary describing a huggingface
model. This dictionary specifies the following items:
- `model_name: [str]`: This the model name of the huggingface model such as `distilbert-base-uncased` which will be used to load the model with huggingface `from_pretrained` method.
- `task: [str]`: This is the task type for the model such as `text-classification`. The complete list of supported task can be found
at [huggingface-tasks](https://huggingface.co/docs/transformers/v4.28.1/en/main_classes/pipelines#transformers.pipeline.task).
- `feature: [str]`: The ONNX export features. This is only needed for HuggingFace hub model. It is inferred from `task` if not provided. You must provide the feature if you need past key value cache.
For instance, `"causal-lm-with-past"`. You can find more info at [Export to ONNX](https://huggingface.co/docs/transformers/serialization)
- `model_class: [str]`: Instead of the `task`, the class of the model can be provided as well. Such as `DistilBertForSequenceClassification`
- `components: [List[HFComponent]]`: HFComponent list:
- `HFComponent`:
- `name: [str]`: Component name. Olive will generate a model class with this str as attribute name.
- `io_config: [Dict[str, Any] | IoConfig | str | Callable]`: The io_config of this component. If `str`, Olive will load `io_config` from `model_script`.
- `component_func: [str]`: The component function name will be loaded from `model_script`.
- `dummy_inputs_func: [str]`: The dummy input function name will be loaded from `model_script`.
```
For cases where you do not want to use the huggingface model but want to use the huggingface dataset, you can provide `dataset` config only like above.
- `from_pretrained_args: [dict]`: Arguments to pass to the `from_pretrained` method of the model class. Refer to [this documentation](https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.from_pretrained).
- `load_kwargs: [dict]`: Arguments to pass to the `from_pretrained` method of the model class. Refer to [this documentation](https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.from_pretrained).
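For illustration, here is a minimal `HfModel` input model that sets `load_kwargs` (this mirrors the BERT example configs updated in this commit; the model id and task are just examples):
```json
"input_model": {
    "type": "HfModel",
    "config": {
        "model_path": "Intel/bert-base-uncased-mrpc",
        "task": "text-classification",
        "load_kwargs": { "attn_implementation": "eager" }
    }
}
```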
Please find the detailed config options in the following table for each model type:
| Model Type | Description |
|:----------|:-------------|
| [PytorchModelHandler(pytorch_model) | Pytorch model |
| [HfModelHandler](hf_model) | Hf model |
| [PytorchModelHandler](pytorch_model) | Pytorch model |
| [ONNXModelHandler](onnx_model) | ONNX model |
| [OpenVINOModelHandler](openvino_model) | OpenVINO IR model |
| [SNPEModelHandler](snpe_model) | SNPE DLC model |
@ -184,20 +159,9 @@ Please find the detailed config options from following table for each model type
### Example
```json
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_loader": "load_pytorch_origin_model",
"model_script": "user_script.py",
"io_config": {
"input_names": ["input"],
"input_types": ["int32"],
"input_shapes": [[1, 3, 32, 32]],
"output_names": ["output"],
"dynamic_axes": {
"input": {"0": "batch_size"},
"output": {"0": "batch_size"}
}
}
"model_path": "meta-llama/Llama-2-7b-hf"
}
}
```

View File

@ -92,12 +92,10 @@ Here is another quick comparison between Auto Optimizer and manual settings.
{
"input_model":{
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
"model_path": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
},
"systems": {
@ -188,12 +186,10 @@ Here is another quick comparison between Auto Optimizer and manual settings.
{
"input_model":{
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
"model_path": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
},
"systems": {

View File

@ -207,7 +207,7 @@ Convert the transformer dummy data config to the data container.
name="transformers_dummy_data_config",
type="TransformersDummyDataContainer",
load_dataset_config=DataComponentConfig(params={
# model_name can be filled with the model name in input model's hf_config
# model_name can be filled with the model name in input model's model_path
# if you start olive with olive run --config <config_path>
"model_name": "meta-llama/Llama-2-7b-hf"
})

View File

@ -1,12 +1,9 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_class": "ASTForAudioClassification",
"model_name": "MIT/ast-finetuned-speech-commands-v2",
"task": "audio-classification"
},
"model_path": "MIT/ast-finetuned-speech-commands-v2",
"task": "audio-classification",
"io_config": {
"input_names": [ "input_values" ],
"output_names": [ "logits" ],

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "Intel/bert-base-uncased-mrpc", "task": "text-classification" } }
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"systems": {
"local_system": {

View File

@ -1,16 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"model_loader": "load_pytorch_origin_model",
"model_script": "user_script.py",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "token_type_ids" ],
"input_shapes": [ [ 1, 128 ], [ 1, 128 ], [ 1, 128 ] ],
"input_types": [ "int64", "int64", "int64" ],
"output_names": [ "output" ]
}
}
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"data_configs": [
{

View File

@ -1,16 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"model_loader": "load_pytorch_origin_model",
"model_script": "user_script.py",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "token_type_ids" ],
"input_shapes": [ [ 1, 128 ], [ 1, 128 ], [ 1, 128 ] ],
"input_types": [ "int64", "int64", "int64" ],
"output_names": [ "output" ]
}
}
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"data_configs": [
{

View File

@ -1,16 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"model_loader": "load_pytorch_origin_model",
"model_script": "user_script.py",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "token_type_ids" ],
"input_shapes": [ [ 1, 128 ], [ 1, 128 ], [ 1, 128 ] ],
"input_types": [ "int64", "int64", "int64" ],
"output_names": [ "output" ]
}
}
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"data_configs": [
{

View File

@ -1,16 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"model_loader": "load_pytorch_origin_model",
"model_script": "user_script.py",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "token_type_ids" ],
"input_shapes": [ [ 1, 128 ], [ 1, 128 ], [ 1, 128 ] ],
"input_types": [ "int64", "int64", "int64" ],
"output_names": [ "output" ]
}
}
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"data_configs": [
{

View File

@ -1,16 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"model_loader": "load_pytorch_origin_model",
"model_script": "nv_user_script.py",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "token_type_ids" ],
"input_shapes": [ [ 1, 128 ], [ 1, 128 ], [ 1, 128 ] ],
"input_types": [ "int64", "int64", "int64" ],
"output_names": [ "output" ]
}
}
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"data_configs": [
{

View File

@ -1,12 +1,10 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification",
"from_pretrained_args": { "attn_implementation": "eager" }
}
"model_path": "Intel/bert-base-uncased-mrpc",
"task": "text-classification",
"load_kwargs": { "attn_implementation": "eager" }
}
},
"systems": {

View File

@ -5,10 +5,10 @@
"workspace_name": "<place_holder>"
},
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": { "type": "azureml_model", "config": { "name": "bert-hf", "version": "3" } },
"hf_config": { "model_name": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
"task": "text-classification"
}
},
"data_configs": [

View File

@ -1,18 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
},
"io_config": {
"input_names": [ "input_ids", "attention_mask", "token_type_ids" ],
"input_shapes": [ [ 1, 128 ], [ 1, 128 ], [ 1, 128 ] ],
"input_types": [ "int64", "int64", "int64" ],
"output_names": [ "output" ]
}
}
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"data_configs": [
{

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "Intel/bert-base-uncased-mrpc", "task": "text-classification" } }
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"systems": {
"local_system": {

View File

@ -1,11 +1,9 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
"model_path": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
},
"systems": {

View File

@ -45,13 +45,11 @@
"In this notebook, we will use a simple `bert-base-uncased` model as an example:\n",
"\n",
"```json\n",
"\"input_model\":{\n",
" \"type\": \"PyTorchModel\",\n",
"\"input_model\": {\n",
" \"type\": \"HfModel\",\n",
" \"config\": {\n",
" \"hf_config\": {\n",
" \"model_name\": \"Intel/bert-base-uncased-mrpc\",\n",
" \"task\": \"text-classification\"\n",
" }\n",
" \"model_path\": \"Intel/bert-base-uncased-mrpc\",\n",
" \"task\": \"text-classification\"\n",
" }\n",
"}\n",
"```\n",

View File

@ -1,11 +1,9 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
"model_path": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
},
"evaluators": {

View File

@ -4,7 +4,7 @@
# --------------------------------------------------------------------------
import torch
from datasets.utils import logging as datasets_logging # type: ignore[import]
from transformers import AutoTokenizer, BertModel # type: ignore[import]
from transformers import AutoTokenizer
from olive.data.registry import Registry
@ -12,12 +12,6 @@ datasets_logging.disable_progress_bar()
datasets_logging.set_verbosity_error()
def load_pytorch_origin_model(model_path):
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()
return model
@Registry.register_dataloader("nvmo_calibration_dataloader")
def create_calibration_dataloader(dataset, batch_size, calib_size=64, **kwargs):
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

View File

@ -15,7 +15,6 @@ from neural_compressor.data import DefaultDataLoader
from torch.utils.data import Dataset
from transformers import (
AutoConfig,
AutoModelForSequenceClassification,
AutoTokenizer,
EvalPrediction,
Trainer,
@ -34,29 +33,6 @@ datasets_logging.set_verbosity_error()
# pylint: disable=attribute-defined-outside-init, protected-access
# This file is only used by bert_inc_ptq_cpu, bert_qat_customized_train_loop_cpu
# -------------------------------------------------------------------------
# Model Loader
# -------------------------------------------------------------------------
def load_pytorch_origin_model(model_path):
model = AutoModelForSequenceClassification.from_pretrained("Intel/bert-base-uncased-mrpc")
model.eval()
return model
# -------------------------------------------------------------------------
# Dummy Input for ONNX Export
# -------------------------------------------------------------------------
def create_input_tensors(model):
return {
"input_ids": torch.ones(1, 128, dtype=torch.int64),
"attention_mask": torch.ones(1, 128, dtype=torch.int64),
"token_type_ids": torch.ones(1, 128, dtype=torch.int64),
}
# -------------------------------------------------------------------------
# Common Dataset

View File

@ -1,6 +1,6 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": {
"type": "azureml_registry_model",
@ -10,8 +10,7 @@
"version": "9"
}
},
"model_file_format": "PyTorch.MLflow",
"hf_config": { "model_name": "microsoft/deberta-base-mnli", "task": "text-classification" }
"task": "text-classification"
}
},
"data_configs": [

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "tiiuae/falcon-7b", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "tiiuae/falcon-7b", "task": "text-generation" }
},
"systems": {
"local_system": {

View File

@ -1,13 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "EleutherAI/gpt-j-6B",
"task": "text-generation",
"feature": "causal-lm-with-past"
}
}
"type": "HfModel",
"config": { "model_path": "EleutherAI/gpt-j-6B" }
},
"evaluators": {
"common_evaluator": {

View File

@ -2,11 +2,7 @@
"input_model": {
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "EleutherAI/gpt-j-6B",
"task": "text-generation",
"feature": "causal-lm-with-past"
}
"model_path": "EleutherAI/gpt-j-6B"
}
},
"evaluators": {

1
examples/llama2/.gitignore vendored
View File

@ -1,2 +1,3 @@
llama2_cpu*
llama2_gpu*
llama2_model_builder.json

View File

@ -1,22 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"generative": true,
"io_config": {
"input_names": [ "input_ids", "attention_mask" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32" ],
"kv_cache": false
},
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"model_class": "LlamaForCausalLM",
"from_pretrained_args": { "_attn_implementation": "eager" },
"task": "text-generation"
}
}
"type": "HfModel",
"config": { "model_path": "meta-llama/Llama-2-7b-hf", "generative": true }
},
"data_configs": [
{

View File

@ -1,13 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "<model_name_placeholder>",
"model_class": "LlamaForCausalLM",
"task": "text-generation"
}
}
"type": "HfModel",
"config": { "model_path": "<model_name_placeholder>" }
},
"systems": {
"local_system": {

View File

@ -1,23 +1,21 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "meta-llama/Llama-2-7b-hf",
"load_kwargs": {
"attn_implementation": "eager"
},
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int64", "int64", "int64" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"task": "text-generation",
"from_pretrained_args": { "_attn_implementation": "eager" }
}
}
}
},

View File

@ -1,24 +1,21 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "<model_name_placeholder>",
"load_kwargs": {
"attn_implementation": "eager"
},
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": {
"model_name": "<model_name_placeholder>",
"model_class": "LlamaForCausalLM",
"from_pretrained_args": { "_attn_implementation": "eager" },
"task": "text-generation"
}
}
}
},

View File

@ -1,23 +1,18 @@
{
"input_model": {
"type": "PyTorchModel",
"input_model":{
"type": "HfModel",
"config": {
"model_path": "meta-llama/Llama-2-7b-hf",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"model_class": "LlamaForCausalLM",
"task": "text-generation"
}
}
}
},
@ -59,8 +54,6 @@
}
},
"engine": {
"log_severity_level": 0,
"evaluate_input_model": false,
"host": "local_system",
"target": "local_system",
"cache_dir": "cache",

View File

@ -15,5 +15,5 @@ dependencies:
- scipy
- sentencepiece
- torch==2.0.1
- transformers
- transformers>=4.33.2,<= 4.37.2
- git+https://github.com/microsoft/Olive#egg=olive-ai[gpu]

View File

@ -6,20 +6,8 @@
"keyvault_name": "<my_keyvault_name>"
},
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"model_path": {
"type": "azureml_registry_model",
"config": {
@ -28,8 +16,17 @@
"version": "13"
}
},
"model_file_format": "PyTorch.MLflow",
"hf_config": { "model_name": "meta-llama/Llama-2-7b-hf", "task": "text-generation" }
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
}
}
}
},
"systems": {

View File

@ -52,7 +52,7 @@
"In this tutorial, we will use Azure Machine Learning Llama2 curated model. The input model will be automatically downloaded from the [Azure Model catalog](https://ml.azure.com/models/Llama-2-7b/version/13/catalog/registry/azureml-meta):\n",
"```json\n",
"\"input_model\":{\n",
" \"type\": \"PyTorchModel\",\n",
" \"type\": \"HfModel\",\n",
" \"config\": {\n",
" \"model_path\": {\n",
" \"type\": \"azureml_registry_model\",\n",
@ -61,11 +61,6 @@
" \"registry_name\": \"azureml-meta\",\n",
" \"version\": \"13\"\n",
" }\n",
" },\n",
" \"model_file_format\": \"PyTorch.MLflow\",\n",
" \"hf_config\": {\n",
" \"model_name\": \"meta-llama/Llama-2-7b-hf\",\n",
" \"task\": \"text-generation\"\n",
" }\n",
" }\n",
"}\n",

View File

@ -1,25 +1,22 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "meta-llama/Llama-2-7b-hf",
"load_kwargs": {
"attn_implementation": "eager"
},
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32"
],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"model_class": "LlamaForCausalLM",
"from_pretrained_args": { "_attn_implementation": "eager" },
"task": "text-generation"
}
}
}
},

View File

@ -1,24 +1,21 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "meta-llama/Llama-2-7b-hf",
"load_kwargs": {
"attn_implementation": "eager"
},
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"model_class": "LlamaForCausalLM",
"from_pretrained_args": { "_attn_implementation": "eager" },
"task": "text-generation"
}
}
}
},

View File

@ -1,24 +1,21 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "meta-llama/Llama-2-7b-hf",
"load_kwargs": {
"attn_implementation": "eager"
},
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"model_class": "LlamaForCausalLM",
"from_pretrained_args": { "_attn_implementation": "eager" },
"task": "text-generation"
}
}
}
},

View File

@ -3,4 +3,5 @@ onnx>=1.14.0
optimum>=1.17.0
protobuf==3.20.2
torch
transformers>=4.33.2
# transformers optimizer fusions don't match in newer versions
transformers>=4.33.2,<= 4.37.2

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "mistralai/Mistral-7B-v0.1", "model_class": "MistralForCausalLM" } }
"type": "HfModel",
"config": { "model_path": "mistralai/Mistral-7B-v0.1" }
},
"systems": {
"local_system": {

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "mistralai/Mistral-7B-v0.1", "model_class": "MistralForCausalLM" } }
"type": "HfModel",
"config": { "model_path": "mistralai/Mistral-7B-v0.1" }
},
"systems": {
"local_system": {

View File

@ -38,13 +38,9 @@ When you run the example config for other larger models, you may need
1. change the `model_path` to the one you use in `open_llama_config.json` and `user_script.py`.
```json
"input_model":{
"type": "OptimumModel",
"type": "HfModel",
"config": {
"model_path": "openlm-research/open_llama_3b", // to change based on the model you use
"model_components": ["decoder_model.onnx", "decoder_with_past_model.onnx"],
"hf_config": {
"model_class": "LlamaForCausalLM"
}
}
}
```

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "huggyllama/llama-7b", "task": "text-generation" } }
"config": { "model_path": "huggyllama/llama-7b" }
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },
"data_configs": [

View File

@ -5,21 +5,20 @@
"workspace_name": "<workspace_name>"
},
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "openlm-research/open_llama_3b",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": { "model_name": "openlm-research/open_llama_3b", "task": "text-generation" }
}
}
}
},
"systems": {

View File

@ -1,20 +1,19 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "openlm-research/open_llama_3b",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": { "model_name": "openlm-research/open_llama_3b", "task": "text-generation" }
}
}
}
},
"data_configs": [ { "name": "transformer_token_dummy_data", "type": "TransformersTokenDummyDataContainer" } ],

View File

@ -1,20 +1,19 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "openlm-research/open_llama_3b",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": { "model_name": "openlm-research/open_llama_3b", "task": "text-generation" }
}
}
}
},
"evaluators": {

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "openlm-research/open_llama_7b_v2", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "openlm-research/open_llama_7b_v2" }
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },
"data_configs": [

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "openlm-research/open_llama_7b_v2", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "openlm-research/open_llama_7b_v2" }
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },
"data_configs": [

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "openlm-research/open_llama_7b_v2", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "openlm-research/open_llama_7b_v2" }
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },
"data_configs": [

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "openlm-research/open_llama_7b_v2", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "openlm-research/open_llama_7b_v2" }
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },
"data_configs": [

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "openlm-research/open_llama_3b", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "openlm-research/open_llama_3b" }
},
"systems": {
"local_system": {

View File

@ -92,9 +92,7 @@ def eval_accuracy(model: OliveModelHandler, data_dir, batch_size, device, execut
if model.framework == Framework.PYTORCH:
eval_args = LMEvalParser(
model="hf",
model_args=(
f"pretrained={model.model_path or model.hf_config.model_name},tokenizer={model_id},dtype=float32"
),
model_args=f"pretrained={model.model_path},tokenizer={model_id},dtype=float32",
batch_size=batch_size,
tasks="lambada_openai",
device="cpu",

View File

@ -1,11 +1,13 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"io_config": {
"model_path": "facebook/opt_125m",
"task": "text-generation",
"input_names": [ "input_ids", "attention_mask" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
@ -18,11 +20,6 @@
"ort_present_value_name": "present_value_<id>",
"dtype": "float16"
}
},
"hf_config": {
"model_name": "facebook/opt-125m",
"task": "text-generation",
"from_pretrained_args": { "trust_remote_code": true }
}
}
},

View File

@ -1,13 +1,8 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "microsoft/phi-1_5",
"task": "text-generation",
"from_pretrained_args": { "trust_remote_code": true }
}
}
"type": "HfModel",
"config": { "model_path": "microsoft/phi-1_5" }
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },
"data_configs": [

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "microsoft/phi-2", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "microsoft/phi-2" }
},
"systems": {
"local_system": {

View File

@ -1,11 +1,12 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "microsoft/phi-2",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
@ -18,11 +19,6 @@
"ort_present_key_name": "present_key_<id>",
"ort_present_value_name": "present_value_<id>"
}
},
"hf_config": {
"model_name": "microsoft/phi-2",
"task": "text-generation",
"from_pretrained_args": { "trust_remote_code": true }
}
}
},

View File

@ -28,8 +28,7 @@ AML_MODEL_Path = {
"model_path": {
"type": "azureml_registry_model",
"config": {"registry_name": "azureml", "name": "Phi-3-mini-4k-instruct", "version": "7"},
},
"model_file_format": "PyTorch.MLflow",
}
}
@ -49,16 +48,20 @@ def get_args(raw_args):
type=str,
default=None,
choices=["qlora", "lora"],
help="Finetune method before onnxruntime optimization. "
"qlora finetuned model cannot be converted to onnx by model builder.",
help=(
"Finetune method before onnxruntime optimization. "
"qlora finetuned model cannot be converted to onnx by model builder."
),
)
parser.add_argument(
"--precision",
type=str,
default="int4",
choices=["fp32", "fp16", "int4"],
help="Choose from fp32 or int4(default) for cpu target; "
"fp32 or fp16 or int4(default) for gpu target; int4(default) for mobile or web",
help=(
"Choose from fp32 or int4(default) for cpu target; "
"fp32 or fp16 or int4(default) for gpu target; int4(default) for mobile or web"
),
)
parser.add_argument(
"--inference",

View File

@ -1,13 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "microsoft/Phi-3-mini-4k-instruct",
"task": "text-generation",
"from_pretrained_args": { "trust_remote_code": true }
}
}
"input_model":{
"type": "HfModel",
"config": { "model_path": "microsoft/Phi-3-mini-4k-instruct" }
},
"systems": {
"local_system": {

View File

@ -1,11 +1,8 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "togethercomputer/RedPajama-INCITE-Base-3B-v1",
"model_class": "GPTNeoXForCausalLM"
}
"model_path": "togethercomputer/RedPajama-INCITE-Base-3B-v1"
}
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },

View File

@ -3,6 +3,7 @@
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from past_helper import PastKeyValuesHelper
from transformers import AutoConfig, WhisperForConditionalGeneration
from whisper_dataset import WhisperDataset
from whisper_decoder import WhisperDecoder, WhisperDecoderInputs
from whisper_encoder_decoder_init import WhisperEncoderDecoderInit, WhisperEncoderDecoderInitInputs
@ -11,9 +12,8 @@ from olive.data.registry import Registry
from olive.model import PyTorchModelHandler
def get_encoder_decoder_init(olive_model: PyTorchModelHandler):
# model is WhisperForConditionalGeneration
model = olive_model.load_model()
def get_encoder_decoder_init(model_path: str):
model = WhisperForConditionalGeneration.from_pretrained(model_path, attn_implementation="eager")
return WhisperEncoderDecoderInit(
model,
model,
@ -22,9 +22,8 @@ def get_encoder_decoder_init(olive_model: PyTorchModelHandler):
)
def get_decoder(olive_model: PyTorchModelHandler):
# model is WhisperForConditionalGeneration
model = olive_model.load_model()
def get_decoder(model_path: str):
model = WhisperForConditionalGeneration.from_pretrained(model_path, attn_implementation="eager")
return WhisperDecoder(model, model.config)
@ -104,7 +103,7 @@ def get_encdec_io_config(olive_model: PyTorchModelHandler):
def get_dec_io_config(olive_model: PyTorchModelHandler):
# Fix past disappearing bug - duplicate first past entry
# input_list.insert(2, input_list[2])
config = olive_model.get_hf_model_config()
config = AutoConfig.from_pretrained(olive_model.model_path)
past_names = PastKeyValuesHelper.get_past_names(config.decoder_layers, present=False)
present_names = PastKeyValuesHelper.get_past_names(config.decoder_layers, present=True)
present_self_names = present_names[: 2 * config.decoder_layers]
@ -145,7 +144,7 @@ def get_dec_io_config(olive_model: PyTorchModelHandler):
def encoder_decoder_init_dummy_inputs(olive_model: PyTorchModelHandler):
inputs = WhisperEncoderDecoderInitInputs.create_dummy(
olive_model.get_hf_model_config(),
AutoConfig.from_pretrained(olive_model.model_path),
batch_size=2,
encode_sequence_length=3000,
use_decoder_input_ids=True,
@ -157,7 +156,7 @@ def encoder_decoder_init_dummy_inputs(olive_model: PyTorchModelHandler):
def decoder_dummy_inputs(olive_model: PyTorchModelHandler):
inputs = WhisperDecoderInputs.create_dummy(
olive_model.get_hf_model_config(),
AutoConfig.from_pretrained(olive_model.model_path),
batch_size=2,
encode_sequence_length=3000,
past_decode_sequence_length=5,

View File

@ -10,7 +10,7 @@ from urllib import request
from onnxruntime import __version__ as OrtVersion
from packaging import version
from transformers import __version__ as TransformersVersion
from transformers import AutoConfig
SUPPORTED_WORKFLOWS = {
("cpu", "fp32"): ["conversion", "transformers_optimization", "insert_beam_search", "prepost"],
@ -95,7 +95,6 @@ def main(raw_args=None):
# version check
version_1_16 = version.parse(OrtVersion) >= version.parse("1.16.0")
transformers_version_4_36 = version.parse(TransformersVersion) >= version.parse("4.36.0")
# multi-lingual support check
if not version_1_16:
@ -114,10 +113,15 @@ def main(raw_args=None):
template_json = json.load(f)
model_name = args.model_name
# update model name
template_json["input_model"]["config"]["hf_config"]["model_name"] = model_name
if transformers_version_4_36:
template_json["input_model"]["config"]["hf_config"]["from_pretrained_args"] = {"attn_implementation": "eager"}
# update model paths
for model_component in template_json["input_model"]["config"]["model_components"]:
model_component["config"]["model_path"] = model_name
# update model attributes
template_json["input_model"]["config"]["model_attributes"] = model_attributes = AutoConfig.from_pretrained(
model_name
).to_dict()
# remove suppress_tokens since it takes too much space in the config
model_attributes.pop("suppress_tokens", None)
load_dataset_params = template_json["data_configs"][0]["load_dataset_config"]["params"]
load_dataset_params["model_name"] = model_name

View File

@ -60,7 +60,7 @@ def main(raw_args=None):
config = json.load(f)
# get model information
model_name = config["input_model"]["config"]["hf_config"]["model_name"]
model_name = config["input_model"]["config"]["model_components"][0]["config"]["model_path"]
use_audio_decoder = config["passes"]["prepost"]["config"]["tool_command_args"]["use_audio_decoder"]
# check if model is multilingual
multilingual = config["passes"]["insert_beam_search"]["config"].get("use_forced_decoder_ids", False)

View File

@ -1,27 +1,33 @@
{
"input_model": {
"type": "PyTorchModel",
"input_model":{
"type": "CompositeModel",
"config": {
"model_script": "code/user_script.py",
"script_dir": "code",
"hf_config": {
"model_class": "WhisperForConditionalGeneration",
"model_name": "<place_holder>",
"components": [
{
"name": "encoder_decoder_init",
"model_component_names": ["encoder_decoder_init", "decoder"],
"model_components": [
{
"type": "PyTorchModel",
"config" : {
"model_path": "<place_holder>",
"model_script": "code/user_script.py",
"script_dir": "code",
"model_loader": "get_encoder_decoder_init",
"io_config": "get_encdec_io_config",
"component_func": "get_encoder_decoder_init",
"dummy_inputs_func": "encoder_decoder_init_dummy_inputs"
},
{
"name": "decoder",
}
},
{
"type": "PyTorchModel",
"config" : {
"model_path": "<place_holder>",
"model_script": "code/user_script.py",
"script_dir": "code",
"model_loader": "get_decoder",
"io_config": "get_dec_io_config",
"component_func": "get_decoder",
"dummy_inputs_func": "decoder_dummy_inputs"
}
]
}
}
],
"model_attributes": "<place_holder>"
}
},
"systems": {

View File

@ -4,12 +4,14 @@
# --------------------------------------------------------------------------
import json
import logging
import os
import shutil
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import TYPE_CHECKING, Dict, Optional, Union
from olive.common.config_utils import ConfigBase, convert_configs_to_dicts, serialize_to_json, validate_config
from olive.common.constants import DEFAULT_CACHE_DIR, DEFAULT_WORKFLOW_ID
from olive.common.utils import hash_dict, set_nested_dict_value
from olive.resource_path import ResourcePath, create_resource_path, find_all_resources
@ -25,6 +27,19 @@ class CacheSubDirs:
runs: Path
evaluations: Path
resources: Path
mlflow: Path
cloud_cache: Path
@classmethod
def from_cache_dir(cls, cache_dir: Path) -> "CacheSubDirs":
return cls(
models=cache_dir / "models",
runs=cache_dir / "runs",
evaluations=cache_dir / "evaluations",
resources=cache_dir / "resources",
mlflow=cache_dir / "mlflow",
cloud_cache=cache_dir / "cloud_cache",
)
class OliveCache:
@ -36,12 +51,7 @@ class OliveCache:
):
self.cache_dir = Path(cache_dir).resolve()
logger.info("Using cache directory: %s", self.cache_dir)
self.dirs = CacheSubDirs(
models=self.cache_dir / "models",
runs=self.cache_dir / "runs",
evaluations=self.cache_dir / "evaluations",
resources=self.cache_dir / "resources",
)
self.dirs = CacheSubDirs.from_cache_dir(self.cache_dir)
if clean_evaluation_cache and self.dirs.evaluations.exists():
shutil.rmtree(self.dirs.evaluations, ignore_errors=True)
@ -243,7 +253,7 @@ class OliveCache:
with model_jsons[0].open("r") as f:
model_json = serialize_to_json(json.load(f))
if model_json["type"].lower() in ("compositemodel", "compositepytorchmodel"):
if model_json["type"].lower() == "compositemodel":
logger.warning("Saving models of type '%s' is not supported yet.", model_json["type"])
return None
@ -289,3 +299,17 @@ class OliveCache:
json.dump(model_json, f, indent=4)
return model_json
def set_cache_env(self):
"""Set environment variable for the cache directory."""
os.environ["OLIVE_CACHE_DIR"] = str(self.cache_dir)
logger.debug("Set OLIVE_CACHE_DIR: %s", self.cache_dir)
@classmethod
def from_cache_env(cls) -> "OliveCache":
"""Create an OliveCache object from the cache directory environment variable."""
cache_dir = os.environ.get("OLIVE_CACHE_DIR")
if cache_dir is None:
logger.debug("OLIVE_CACHE_DIR environment variable not set. Using default cache directory.")
cache_dir = Path(DEFAULT_CACHE_DIR).resolve() / DEFAULT_WORKFLOW_ID
return cls(cache_dir)
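For context, a minimal sketch of the new cache environment round-trip; the cache directory is a placeholder and the import path for OliveCache is an assumption based on this commit:

from olive.cache import OliveCache  # assumed import path for OliveCache in this commit

cache = OliveCache("./my-olive-cache")  # placeholder cache directory
cache.set_cache_env()  # exports OLIVE_CACHE_DIR for other Olive components

# elsewhere in the same process, the cache can be recovered from the environment
same_cache = OliveCache.from_cache_env()
print(same_cache.dirs.cloud_cache)  # <cache_dir>/cloud_cache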


@ -14,7 +14,13 @@ class OS(str, Enum):
DEFAULT_WORKFLOW_ID = "default_workflow"
DEFAULT_CACHE_DIR = ".olive-cache"
############# Packaging #############
BASE_IMAGE = "mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04"
############# HF #############
DEFAULT_HF_TASK = "text-generation-with-past"


@ -2,9 +2,3 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
class CompositeMixin:
def set_composite_parent(self, cp):
self.composite_parent = cp
def get_composite_parent(self):
return self.composite_parent

olive/common/hf/login.py (new file, 29 lines)

@ -0,0 +1,29 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
import os
logger = logging.getLogger(__name__)
def huggingface_login(token: str):
from huggingface_hub import login
login(token=token)
def aml_runner_hf_login():
hf_login = os.environ.get("HF_LOGIN")
if hf_login:
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
keyvault_name = os.environ.get("KEYVAULT_NAME")
logger.debug("Getting token from keyvault %s", keyvault_name)
credential = DefaultAzureCredential()
secret_client = SecretClient(vault_url=f"https://{keyvault_name}.vault.azure.net/", credential=credential)
token = secret_client.get_secret("hf-token").value
huggingface_login(token)
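A sketch of how the relocated AML login helper is driven purely by environment variables; the key vault name is a placeholder and the token is expected under the secret name "hf-token":

import os

from olive.common.hf.login import aml_runner_hf_login

os.environ["HF_LOGIN"] = "1"  # opt in to Hugging Face login on the runner
os.environ["KEYVAULT_NAME"] = "my-keyvault"  # placeholder key vault name

aml_runner_hf_login()  # no-op when HF_LOGIN is not set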


olive/common/hf/mlflow.py (new file, 50 lines)

@ -0,0 +1,50 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from pathlib import Path
import yaml
def is_mlflow_transformers(model_name_or_path: str) -> bool:
yaml_path = Path(model_name_or_path) / "MLmodel"
if not yaml_path.exists():
return False
with open(yaml_path) as fp:
mlflow_data = yaml.safe_load(fp)
# default flavor is "hftransformersv2" from azureml.evaluate.mlflow>=0.0.8
# "hftransformers" from azureml.evaluate.mlflow<0.0.8
# TODO(trajep): let user specify flavor name if needed
# to support other flavors in mlflow not only hftransformers
flavors = mlflow_data.get("flavors", {})
if not flavors or not any(flavor.startswith("hftransformers") for flavor in flavors):
raise ValueError(
"Invalid MLFlow model format. Please make sure the input model"
" format is same with the result of mlflow.transformers.save_model,"
" or aml_mlflow.hftransformers.save_model from azureml.evaluate.mlflow"
)
return True
def get_pretrained_name_or_path(model_name_or_path: str, name: str) -> str:
if not is_mlflow_transformers(model_name_or_path):
# assumed to be an hf hub id or a local checkpoint
return model_name_or_path
parent_dir = Path(model_name_or_path).resolve()
# assumed to be an mlflow model
pretrained_path = parent_dir / "data" / name
if pretrained_path.exists():
return str(pretrained_path)
# some mlflow models only have model directory
model_dir = parent_dir / "data" / "model"
if model_dir.exists():
return str(model_dir)
raise ValueError("Invalid MLFlow model format.")

olive/common/hf/model_io.py (new file, 120 lines)

@ -0,0 +1,120 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from functools import partial
from itertools import chain
from typing import TYPE_CHECKING, Callable, Dict, Optional
from olive.common.hf.utils import get_feature_from_task, get_model_config, get_tokenizer
from olive.common.utils import get_attr
if TYPE_CHECKING:
from transformers.onnx import OnnxConfig
logger = logging.getLogger(__name__)
# patched version of transformers.onnx.features.supported_features_mapping
# to support additional models in olive
def patched_supported_features_mapping(
*supported_features: str, onnx_config_cls: Optional[str] = None
) -> Dict[str, Callable]:
"""Generate the mapping between supported the features and their corresponding OnnxConfig for a given model.
Args:
*supported_features: The names of the supported features.
onnx_config_cls: The OnnxConfig full name corresponding to the model.
Returns:
The dictionary mapping a feature to an OnnxConfig constructor.
"""
if onnx_config_cls is None:
raise ValueError("A OnnxConfig class must be provided")
from olive.common.hf import onnx_config
config_cls = get_attr(onnx_config, onnx_config_cls)
mapping = {}
for feature in supported_features:
if "-with-past" in feature:
mapping[feature] = partial(config_cls.with_past, task=feature.replace("-with-past", ""))
else:
mapping[feature] = partial(config_cls.from_model_config, task=feature)
return mapping
# TODO(jambayk): switch to optimum backend and make this an optional feature
# remove "feature" entirely from the codebase
def get_onnx_config(model_name: str, task: str, feature: Optional[str] = None, **kwargs) -> "OnnxConfig":
# pylint: disable=protected-access
from transformers.onnx import FeaturesManager
from olive.common.hf.onnx_config import ADDITIONAL_MODEL_TYPES
# patch FeaturesManager._SUPPORTED_MODEL_TYPE to support additional models in olive
for model_type, feature_list in ADDITIONAL_MODEL_TYPES.items():
if model_type in FeaturesManager._SUPPORTED_MODEL_TYPE:
continue
# TODO(trajep): remove the need for unpacking feature_list
features, onnx_config_cls = feature_list
FeaturesManager._SUPPORTED_MODEL_TYPE[model_type] = patched_supported_features_mapping(
*features, onnx_config_cls=onnx_config_cls
)
# if feature is not provided, try to get it from task
# else use "default"
feature = feature or get_feature_from_task(task) or "default"
# don't want to load the model here since all we need is the config
# model loading is expensive computationally and memory-wise for large models
config = get_model_config(model_name, **kwargs)
# recreate the logic for FeaturesManager.check_supported_model_or_raise to get the model_onnx_config
# https://github.com/huggingface/transformers/blob/main/src/transformers/onnx/features.py#L712
model_type = config.model_type.replace("_", "-")
onnx_config = None
try:
model_features = FeaturesManager.get_supported_features_for_model_type(model_type, model_name=model_name)
if feature in model_features:
onnx_config = FeaturesManager.get_config(model_type, feature)(config)
else:
logger.debug(
"%s doesn't support feature %s. Supported features are: %s", model_type, feature, model_features
)
except KeyError:
logger.debug("Model type %s is not supported", model_type)
return onnx_config
def get_model_io_config(model_name: str, task: str, feature: Optional[str] = None, **kwargs):
# just log a debug message if io_config is not found
# this is not a critical error and the caller may not need the io_config
model_config = get_onnx_config(model_name, task, feature, **kwargs)
if not model_config:
return None
inputs = model_config.inputs
outputs = model_config.outputs
if not inputs or not outputs:
# just log a warning and return None, since this is not a critical error
# and following pass may not use the io_config, like OptimumConversion
logger.debug("No inputs or outputs found from hf onnx_config %s. Won't use it to get io config", model_config)
return None
io_config = {}
io_config["input_names"] = list(inputs.keys())
io_config["output_names"] = list(outputs.keys())
io_config["dynamic_axes"] = dict(chain(inputs.items(), outputs.items()))
return io_config
def get_model_dummy_input(model_name: str, task: str, feature: Optional[str] = None, **kwargs):
model_config = get_onnx_config(model_name, task, feature, **kwargs)
if not model_config:
return None
tokenizer = get_tokenizer(model_name)
return model_config.generate_dummy_inputs(tokenizer, framework="pt")
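For illustration, the helpers can be called directly; "gpt2" is only an example of a model type that transformers.onnx supports:

from olive.common.hf.model_io import get_model_dummy_input, get_model_io_config

io_config = get_model_io_config("gpt2", "text-generation-with-past")
if io_config:
    print(io_config["input_names"], io_config["output_names"])

dummy_inputs = get_model_dummy_input("gpt2", "text-generation-with-past")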

Просмотреть файл

olive/common/hf/utils.py (new file, 151 lines)

@ -0,0 +1,151 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from typing import TYPE_CHECKING, Optional, Tuple, Union
from transformers import AutoConfig, AutoModel, AutoTokenizer, GenerationConfig
from olive.common.hf.mappings import FEATURE_TO_PEFT_TASK_TYPE, MODELS_TO_MAX_LENGTH_MAPPING, TASK_TO_FEATURE
from olive.common.hf.mlflow import get_pretrained_name_or_path
if TYPE_CHECKING:
from transformers import PretrainedConfig, PreTrainedModel, PreTrainedTokenizer, PreTrainedTokenizerFast
logger = logging.getLogger(__name__)
def load_model_from_task(task: str, model_name_or_path: str, **kwargs) -> "PreTrainedModel":
"""Load huggingface model from task and model_name_or_path."""
from transformers.pipelines import check_task
task_results = check_task(task.replace("-with-past", ""))
assert isinstance(task_results, tuple)
if len(task_results) == 2:
targeted_task = task_results[0]
elif len(task_results) == 3:
targeted_task = task_results[1]
else:
raise ValueError("unsupported transformers version")
class_tuple = targeted_task["pt"] or (AutoModel,)
model = None
for i, model_class in enumerate(class_tuple):
try:
model = from_pretrained(model_class, model_name_or_path, "model", **kwargs)
logger.debug("Loaded model %s with name_or_path %s", model_class, model_name_or_path)
break
except (OSError, ValueError) as e:
if i == len(class_tuple) - 1:
# len(class_tuple) == 1 covers most common tasks like text-generation, text-classification, etc
# error could be device OOM, device_map: "auto" not supported, etc
# len(class_tuple) > 1: not common - image-segmentation, conversational, etc
# there is no easy way to get tracebacks for earlier failures, so just raise from last
raise
# the ValueError needs to be caught since there can be multiple model_class candidates for a single task.
# if the model_class is not the right one for the task, it raises ValueError and
# the next model_class is tried.
logger.info(
"Failed to load model %s with name_or_path %s.\n kwargs: %s.\n Exception raised: %s",
model_class,
model_name_or_path,
kwargs,
e,
)
# this won't be None since class_tuple is never empty and we only reach here if model loaded successfully
# satisfies linter too
return model
def from_pretrained(cls, model_name_or_path: str, mlflow_dir: str, **kwargs):
"""Call cls.from_pretrained with hf checkpoint or mlflow model.
If the model_name_or_path is an MLFlow model, the corresponding subdirectory is used.
"""
return cls.from_pretrained(get_pretrained_name_or_path(model_name_or_path, mlflow_dir), **kwargs)
def get_model_config(model_name_or_path: str, **kwargs) -> "PretrainedConfig":
"""Get HF Config for the given model_name_or_path."""
return from_pretrained(AutoConfig, model_name_or_path, "config", **kwargs)
def save_model_config(config: Union["PretrainedConfig", "GenerationConfig"], output_dir: str, **kwargs):
"""Save input HF Config to output directory."""
config.save_pretrained(output_dir, **kwargs)
def get_generation_config(model_name_or_path: str, **kwargs) -> Optional["GenerationConfig"]:
"""Get HF model's generation config for the given model_name_or_path. If not found, return None."""
try:
return from_pretrained(GenerationConfig, model_name_or_path, "model", **kwargs)
except OSError:
return None
def get_tokenizer(model_name_or_path: str, **kwargs) -> Union["PreTrainedTokenizer", "PreTrainedTokenizerFast"]:
"""Get HF model's tokenizer."""
return from_pretrained(AutoTokenizer, model_name_or_path, "tokenizer", **kwargs)
def save_tokenizer(
tokenizer: Union["PreTrainedTokenizer", "PreTrainedTokenizerFast"], output_dir: str, **kwargs
) -> Tuple[str]:
"""Save input tokenizer to output directory."""
return tokenizer.save_pretrained(output_dir, **kwargs)
# TODO(jambayk): Remove this once we transition away from using "feature"
def get_feature_from_task(task: str, fail_on_not_found=False) -> str:
"""Get feature from task."""
feature = TASK_TO_FEATURE.get(task.replace("-with-past", ""), None)
not_found_msg = f"There is no feature for task {task}"
if feature is None and fail_on_not_found:
raise ValueError(not_found_msg)
elif feature is None:
logger.warning(not_found_msg)
elif task.endswith("-with-past"):
feature += "-with-past"
return feature
def get_peft_task_type_from_task(task: str, fail_on_not_found=False) -> str:
"""Get peft task type from feature."""
feature = get_feature_from_task(task)
peft_task_type = FEATURE_TO_PEFT_TASK_TYPE.get(feature.replace("-with-past", ""), None) if feature else None
not_found_msg = f"There is no peft task type for task {task}"
if peft_task_type is None and fail_on_not_found:
raise ValueError(not_found_msg)
elif peft_task_type is None:
logger.warning(not_found_msg)
return peft_task_type
def get_model_max_length(model_name_or_path: str, fail_on_not_found=False) -> int:
"""Get max length of the model, extracted from the config."""
model_config = get_model_config(model_name_or_path)
model_type = model_config.model_type
max_length = MODELS_TO_MAX_LENGTH_MAPPING.get(model_type, None)
if isinstance(max_length, int):
return max_length
elif isinstance(max_length, str):
return getattr(model_config, max_length)
else:
logger.debug(
"No max length mapping found in MODELS_TO_MAX_LENGTH_MAPPING for model type %s, trying __default__",
model_type,
)
default_max_length = MODELS_TO_MAX_LENGTH_MAPPING["__default__"]
try:
return getattr(model_config, default_max_length)
except AttributeError:
not_found_msg = f"Could not find max length for model type {model_type}"
if fail_on_not_found:
raise ValueError(not_found_msg) from None
else:
logger.warning(not_found_msg)
return None
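A usage sketch of the relocated helpers ("gpt2" is illustrative):

from olive.common.hf.utils import get_model_max_length, get_tokenizer, load_model_from_task

# "-with-past" is stripped before the transformers task is resolved
model = load_model_from_task("text-generation-with-past", "gpt2", torch_dtype="auto")
tokenizer = get_tokenizer("gpt2")
print(get_model_max_length("gpt2"))  # resolved via MODELS_TO_MAX_LENGTH_MAPPING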


@ -8,7 +8,7 @@ from pathlib import Path
from typing import Optional, Union
def import_module_from_file(module_path: Union[Path, str], module_name: str = None):
def import_module_from_file(module_path: Union[Path, str], module_name: Optional[str] = None):
module_path = Path(module_path).resolve()
if not module_path.exists():
raise ValueError(f"{module_path} doesn't exist")


@ -16,7 +16,7 @@ import subprocess
import tempfile
import time
from pathlib import Path
from typing import Dict, List, Tuple, Union
from typing import Dict, List, Optional, Tuple, Union
from olive.common.constants import OS
@ -171,6 +171,13 @@ def set_nested_dict_value(dictionary: dict, key: Union[str, Tuple, List[str]], n
dictionary[key[-1]] = new_value
def dict_diff(dict1: Optional[dict], dict2: Optional[dict]) -> Optional[dict]:
"""Return all members of dict1 that are not in dict2 or have different values."""
dict1 = dict1 or {}
dict2 = dict2 or {}
return {k: v for k, v in dict1.items() if k not in dict2 or dict2[k] != v} or None
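dict_diff is what the handlers below use to avoid re-serializing inherited attributes; a tiny illustration:

from olive.common.utils import dict_diff

parent = {"vocab_size": 50257, "num_hidden_layers": 12}
child = {"vocab_size": 50257, "num_hidden_layers": 24, "world_size": 2}

# only entries of the first dict that are missing from or differ in the second survive
assert dict_diff(child, parent) == {"num_hidden_layers": 24, "world_size": 2}
assert dict_diff(parent, parent) is None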
def retry_func(func, args=None, kwargs=None, max_tries=3, delay=5, backoff=2, exceptions=None):
"""Retry a function call using an exponential backoff.
@ -288,27 +295,6 @@ def find_submodules(module, submodule_types, full_name=False):
return list(submodules) if submodules else None
def huggingface_login(token: str):
from huggingface_hub import login
login(token=token)
def aml_runner_hf_login():
hf_login = os.environ.get("HF_LOGIN")
if hf_login:
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
keyvault_name = os.environ.get("KEYVAULT_NAME")
logger.debug("Getting token from keyvault %s", keyvault_name)
credential = DefaultAzureCredential()
secret_client = SecretClient(vault_url=f"https://{keyvault_name}.vault.azure.net/", credential=credential)
token = secret_client.get_secret("hf-token").value
huggingface_login(token)
def all_files(path, ignore=None):
"""Find all files in a directory recursively, optionally ignoring some paths.


@ -23,7 +23,6 @@ class ModelFileFormat(str, Enum):
PYTORCH_ENTIRE_MODEL = "PyTorch.EntireModel"
PYTORCH_STATE_DICT = "PyTorch.StateDict"
PYTORCH_TORCH_SCRIPT = "PyTorch.TorchScript"
PYTORCH_MLFLOW_MODEL = "PyTorch.MLflow"
PYTORCH_SLICE_GPT_MODEL = "PyTorch.SliceGPT"
TENSORFLOW_PROTOBUF = "TensorFlow.Protobuf"
TENSORFLOW_SAVED_MODEL = "TensorFlow.SavedModel"


@ -11,6 +11,7 @@ import numpy as np
import torch
from torch.utils.data import Dataset as TorchDataset
from olive.common.hf.utils import get_model_config
from olive.common.utils import find_first_matched_value, resolve_torch_dtype
from olive.constants import Framework
@ -343,9 +344,7 @@ class TransformersDummyDataset(BaseDataset):
# can instead write dummy input functions like 'get_merged_decoder_with_past_dummy_inputs' if needed
# Using Namespace class to access dict items like class attributes
from transformers import AutoConfig
model_attributes = AutoConfig.from_pretrained(model_name, trust_remote_code=trust_remote_code).__dict__
model_attributes = get_model_config(model_name, trust_remote_code=trust_remote_code).to_dict()
world_size = model_attributes.get("world_size", 1)
vocab_size = model_attributes.get("vocab_size", 50256)
input_ids = torch.randint(low=0, high=vocab_size, size=(seq_len,), dtype=torch.int64)
@ -371,7 +370,7 @@ class TransformersDummyDataset(BaseDataset):
Shape of past_key_values is (num_heads, past_seq_len, head_size).
"""
from olive.model.utils.hf_mappings import (
from olive.common.hf.mappings import (
HIDDEN_SIZE_NAMES,
NUM_HEADS_NAMES,
NUM_HIDDEN_LAYER_NAMES,


@ -7,6 +7,7 @@
from copy import deepcopy
from typing import Any, Dict, List, Optional
from olive.common.hf.utils import get_model_config, get_tokenizer
from olive.data.component.dataset import BaseDataset
from olive.data.component.text_generation import (
TextGenDatasetType,
@ -78,10 +79,9 @@ def huggingface_pre_process(
object: Pre-processed data.
"""
from transformers import AutoConfig, AutoTokenizer
def _tokenizer_and_align_labels(examples):
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=trust_remote_code)
tokenizer = get_tokenizer(model_name, trust_remote_code=trust_remote_code)
tokenized_inputs = tokenizer(
*[examples[input_col] for input_col in input_cols],
padding=kwargs.get("padding", True),
@ -100,9 +100,7 @@ def huggingface_pre_process(
# align_labels -> align_labels_with_mapping
# Also to support customized operation arguments from users
if kwargs.pop("align_labels", False):
model_hf_config = AutoConfig.from_pretrained(
model_config_path or model_name, trust_remote_code=trust_remote_code
)
model_hf_config = get_model_config(model_config_path or model_name, trust_remote_code=trust_remote_code)
if model_hf_config and model_hf_config.label2id:
dataset = dataset.align_labels_with_mapping(model_hf_config.label2id, label_cols[0])
@ -118,7 +116,6 @@ def ner_huggingface_preprocess(
dataset, model_name, input_cols, label_cols, max_samples=None, trust_remote_code=None, **kwargs
):
"""Pre-process data for ner task."""
from transformers import AutoTokenizer
def _align_labels_with_tokens(labels, word_ids):
new_labels = []
@ -142,7 +139,7 @@ def ner_huggingface_preprocess(
return new_labels
def _tokenizer_and_align_labels(examples):
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=trust_remote_code)
tokenizer = get_tokenizer(model_name, trust_remote_code=trust_remote_code)
tokenized_inputs = tokenizer(
*[examples[input_col] for input_col in input_cols],
padding=kwargs.get("padding", True),
@ -193,15 +190,13 @@ def text_generation_huggingface_pre_process(
Note: the TextGenCorpusParams and TextGenPairParams subclasses already include the common arguments.
"""
from transformers import AutoTokenizer
all_kwargs = deepcopy(kwargs)
# task is not used in the pre-process function. Will pop it so that the config validation doesn't warn about
# unused kwargs
all_kwargs.pop("task", None)
all_kwargs.update({"max_samples": max_samples, "source_max_len": source_max_len})
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=trust_remote_code)
tokenizer = get_tokenizer(model_name, trust_remote_code=trust_remote_code)
if dataset_type == TextGenDatasetType.CORPUS:
return text_gen_corpus_pre_process(dataset, tokenizer, all_kwargs)
@ -240,12 +235,12 @@ def audio_classification_pre_process(
"""
from datasets import Audio
from transformers import AutoConfig, AutoFeatureExtractor
from transformers import AutoFeatureExtractor
assert len(input_cols) == 1, "Only one input column is supported for audio classification task."
# align labels with model configs
model_config = AutoConfig.from_pretrained(model_name, trust_remote_code=trust_remote_code)
model_config = get_model_config(model_name, trust_remote_code=trust_remote_code)
labels_to_filter = kwargs.get("labels_to_filter", None) or []
dataset = dataset.filter(
lambda x: x not in dataset.features["label"].str2int(labels_to_filter), input_columns=label_cols[0]


@ -193,6 +193,8 @@ class DataConfig(ConfigBase):
if config and config.params:
task_type = config.params.get("task")
if task_type:
task_specific_override = dc_cls.task_type_components_map.get(task_type, {}).get(component_name)
task_specific_override = dc_cls.task_type_components_map.get(
task_type.replace("-with-past", ""), {}
).get(component_name)
if task_specific_override:
default_components_type[component_name] = task_specific_override


@ -8,11 +8,12 @@ import logging
import tempfile
from copy import deepcopy
from pathlib import Path
from typing import Any, Dict
from typing import Any, Dict, Optional
from olive.common.config_utils import ConfigBase
from olive.common.utils import get_credentials, hash_dict
from olive.model.config.model_config import ModelConfig
from olive.resource_path import create_resource_path
logger = logging.getLogger(__name__)
@ -67,14 +68,19 @@ class CloudCacheHelper:
return model_config
def get_hash_key(self, model_config: ModelConfig, pass_search_point: Dict[str, Any], input_model_hash: str):
def get_hash_key(
self, model_config: ModelConfig, pass_search_point: Dict[str, Any], input_model_hash: Optional[str]
):
hf_hub_model_commit_id = None
model_config_copy = deepcopy(model_config)
if input_model_hash is None:
if (
input_model_hash is None
and model_config.type.lower() == "hfmodel"
and create_resource_path(model_config.config["model_path"]).is_string_name()
):
from huggingface_hub import repo_info
if model_config.has_hf_config():
hf_hub_model_commit_id = repo_info(model_config.get_hf_model_name()).sha
hf_hub_model_commit_id = repo_info(model_config.config["model_path"]).sha
else:
model_config_copy.config.pop("model_path", None)
return hash_dict(


@ -12,10 +12,10 @@ from datetime import datetime
from pathlib import Path
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Type, Union
from olive.cache import OliveCache
from olive.common.config_utils import validate_config
from olive.common.constants import DEFAULT_WORKFLOW_ID
from olive.common.constants import DEFAULT_CACHE_DIR, DEFAULT_WORKFLOW_ID
from olive.common.utils import hash_dict
from olive.engine.cache import OliveCache
from olive.engine.cloud_cache_helper import (
CloudCacheHelper,
check_model_cache,
@ -31,7 +31,7 @@ from olive.evaluator.olive_evaluator import OliveEvaluatorConfig
from olive.exception import EXCEPTIONS_TO_RAISE, OlivePassError
from olive.hardware import AcceleratorSpec
from olive.model import ModelConfig
from olive.resource_path import ResourceType, create_resource_path
from olive.resource_path import create_resource_path
from olive.strategy.search_strategy import SearchStrategy, SearchStrategyConfig
from olive.systems.common import SystemType
from olive.systems.system_config import SystemConfig
@ -59,7 +59,7 @@ class Engine:
host: Optional[Union[Dict[str, Any], "SystemConfig"]] = None,
target: Optional[Union[Dict[str, Any], "SystemConfig"]] = None,
evaluator: Optional[Union[Dict[str, Any], "OliveEvaluatorConfig"]] = None,
cache_dir: str = ".olive-cache",
cache_dir: str = DEFAULT_CACHE_DIR,
clean_cache: bool = False,
clean_evaluation_cache: bool = False,
plot_pareto_frontier: bool = False,
@ -111,6 +111,10 @@ class Engine:
def initialize(self):
"""Initialize engine state. This should be done before running the registered passes."""
# set cache dir environment variables
# might be used by other parts of olive to cache data
self.cache.set_cache_env()
# clean pass run cache if requested
# removes all run cache for pass type and all children elements
for pass_config in self.pass_config.values():
@ -473,7 +477,7 @@ class Engine:
output_name=f"{pass_output_name}_model",
overwrite=True,
)
# it is not supported to save compositepytorchmodel/compositemodel again
# it is not supported to save compositemodel again
# so the output_model_json could be None
output_models[pass_output_model_id] = output_model_json
@ -797,19 +801,18 @@ class Engine:
output_model_hash = None
if cloud_cache_config.enable_cloud_cache:
if (
model_config.config.get("model_path")
and create_resource_path(model_config.config.get("model_path")) == ResourceType.StringName
if not (
model_config.type.lower() == "hfmodel"
and create_resource_path(model_config.config.get("model_path")).is_string_name()
):
logger.warning(
"Model path is a str name, should not use cloud model cache. Set enable_cloud_cache=False."
"Only HfModel with huggingface id as model_path is supported by cloud cache. Setting"
" enable_cloud_cache=False."
)
cloud_cache_config.enable_cloud_cache = False
else:
cloud_cache_dir = Path(self.cache_dir) / "cloud_models"
cloud_cache_dir.mkdir(parents=True, exist_ok=True)
self.cloud_cache_helper = CloudCacheHelper(
cloud_cache_dir,
self.cache.dirs.cloud_cache,
cloud_cache_config.account_url,
cloud_cache_config.contaier_name,
cloud_cache_config.input_model_config,


@ -3,7 +3,6 @@
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from olive.model.config import ModelConfig
from olive.model.config.hf_config import HfFromPretrainedArgs
from olive.model.handler import * # noqa: F403
__all__ = ["ModelConfig", "HfFromPretrainedArgs"]
__all__ = ["ModelConfig"]


@ -2,7 +2,7 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from olive.model.config.hf_config import HfComponent, HfConfig
from olive.model.config.hf_config import HfLoadKwargs
from olive.model.config.io_config import (
IoConfig,
complete_kv_cache_with_model_attributes,
@ -12,8 +12,7 @@ from olive.model.config.kv_cache_config import KVCacheConfig
from olive.model.config.model_config import ModelConfig
__all__ = [
"HfComponent",
"HfConfig",
"HfLoadKwargs",
"IoConfig",
"KVCacheConfig",
"ModelConfig",


@ -4,38 +4,19 @@
# --------------------------------------------------------------------------
import logging
from copy import deepcopy
from typing import Any, Callable, Dict, List, Union
from typing import Any, Dict, Union
import torch
import transformers
from olive.common.config_utils import ConfigBase, ConfigWithExtraArgs
from olive.common.config_utils import ConfigWithExtraArgs
from olive.common.pydantic_v1 import Field, validator
from olive.common.utils import resolve_torch_dtype
from olive.model.config.io_config import IoConfig
logger = logging.getLogger(__name__)
class HfComponent(ConfigBase):
"""Used for Hf models which has multiple components, such as whisper.
For example, in the Whisper model example, the component looks like:
{
"name": "encoder_decoder_init",
"io_config": "get_encdec_io_config",
"component_func": "get_encoder_decoder_init",
"dummy_inputs_func": "encoder_decoder_init_dummy_inputs"
}
"""
name: str
io_config: Union[IoConfig, Dict[str, Any], str, Callable]
component_func: Union[str, Callable] = None
dummy_inputs_func: Union[str, Callable]
class HfFromPretrainedArgs(ConfigWithExtraArgs):
class HfLoadKwargs(ConfigWithExtraArgs):
"""Arguments to pass to the `from_pretrained` method of the model class.
Refer to https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.from_pretrained
@ -139,7 +120,7 @@ class HfFromPretrainedArgs(ConfigWithExtraArgs):
)
return v
def get_loading_args(self) -> Dict[str, Any]:
def get_load_kwargs(self) -> Dict[str, Any]:
"""Return all args in a dict with types expected by `from_pretrained`."""
loading_args = {}
# copy args that can be directly copied
@ -215,52 +196,3 @@ class HfFromPretrainedArgs(ConfigWithExtraArgs):
if extras:
logger.warning("Unused kwargs in quantization_config: %s. Ignoring them", extras)
return config
class HfConfig(ConfigBase):
"""The config for HuggingFace models.
For example, the config for the Whisper model looks like:
"model_class": "WhisperForConditionalGeneration",
"model_name": "openai/whisper-tiny.en",
"components": [
{
"name": "encoder_decoder_init",
"io_config": "get_encdec_io_config",
"component_func": "get_encoder_decoder_init",
"dummy_inputs_func": "encoder_decoder_init_dummy_inputs"
},
{
"name": "decoder",
"io_config": "get_dec_io_config",
"component_func": "get_decoder",
"dummy_inputs_func": "decoder_dummy_inputs"
}
]
"""
model_name: str = None
task: str = None
# feature is optional if task is specified and don't need past
# else, provide feature such as "causal-lm-with-past"
feature: str = None
# TODO(xiaoyu): remove model_class and only use task
model_class: str = None
components: List[HfComponent] = None
from_pretrained_args: HfFromPretrainedArgs = None
@validator("model_class", always=True)
def task_or_model_class_required(cls, v, values):
if values["model_name"] and not v and not values.get("task", None):
raise ValueError("Either task or model_class must be specified")
return v
def get_loading_args_from_pretrained(self) -> Dict[str, Any]:
"""Return all args from from_pretrained_args in a dict with types expected by `from_pretrained`."""
return self.from_pretrained_args.get_loading_args() if self.from_pretrained_args else {}
def get_model_type_from_hf_config(hf_config: HfConfig) -> str:
from olive.model.utils.hf_utils import get_hf_model_config
return get_hf_model_config(hf_config.model_name, **hf_config.get_loading_args_from_pretrained()).model_type
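A minimal sketch of the renamed kwargs container; the field names used here (torch_dtype, trust_remote_code) are assumed to be among its fields:

from olive.model.config.hf_config import HfLoadKwargs

load_kwargs = HfLoadKwargs(torch_dtype="float16", trust_remote_code=True)

# get_load_kwargs() replaces the old get_loading_args() and returns plain
# keyword arguments for from_pretrained
print(load_kwargs.get_load_kwargs())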


@ -6,10 +6,10 @@ from copy import deepcopy
from typing import Any, Dict, List, Union
from olive.common.config_utils import ConfigBase
from olive.common.hf.mappings import HIDDEN_SIZE_NAMES, NUM_HEADS_NAMES, NUM_HIDDEN_LAYER_NAMES
from olive.common.pydantic_v1 import validator
from olive.common.utils import find_first_matched_value
from olive.model.config.kv_cache_config import KVCacheConfig
from olive.model.utils.hf_mappings import HIDDEN_SIZE_NAMES, NUM_HEADS_NAMES, NUM_HIDDEN_LAYER_NAMES
class IoConfig(ConfigBase):
@ -122,12 +122,14 @@ def complete_kv_cache_with_model_attributes(kv_cache, model_attributes):
num_hidden_layers = find_first_matched_value(model_attributes, NUM_HIDDEN_LAYER_NAMES)
num_attention_heads = find_first_matched_value(model_attributes, NUM_HEADS_NAMES)
hidden_size = find_first_matched_value(model_attributes, HIDDEN_SIZE_NAMES)
world_size = model_attributes.get("world_size", 1)
kv_cache_obj = None
if isinstance(kv_cache, bool) and kv_cache:
kv_cache_obj = KVCacheConfig(
num_hidden_layers=num_hidden_layers,
num_attention_heads=num_attention_heads,
hidden_size=hidden_size,
world_size=world_size,
)
elif isinstance(kv_cache, dict):
kv_cache_dict = deepcopy(kv_cache)
@ -136,6 +138,7 @@ def complete_kv_cache_with_model_attributes(kv_cache, model_attributes):
"num_hidden_layers": kv_cache.get("num_hidden_layers") or num_hidden_layers,
"num_attention_heads": kv_cache.get("num_attention_heads") or num_attention_heads,
"hidden_size": kv_cache.get("hidden_size") or hidden_size,
"world_size": kv_cache.get("world_size") or world_size,
}
)
kv_cache_obj = KVCacheConfig.parse_obj(kv_cache_dict)
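With the new field, completing a kv_cache from model attributes now also propagates the tensor-parallel world size; a rough sketch with made-up attribute values:

from olive.model.config.io_config import complete_kv_cache_with_model_attributes

model_attributes = {"num_hidden_layers": 32, "num_attention_heads": 32, "hidden_size": 4096, "world_size": 2}
kv_cache = complete_kv_cache_with_model_attributes(True, model_attributes)
print(kv_cache.world_size)  # 2, assuming KVCacheConfig gained a world_size field in this commit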


@ -2,6 +2,7 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from itertools import chain
from typing import Dict, Optional
from olive.common.config_utils import ConfigBase
@ -70,17 +71,14 @@ class KVCacheConfig(ConfigBase):
else:
return [self.ort_present_value_name.replace("<id>", str(i)) for i in range(self.num_hidden_layers)]
def get_ort_past_key_names(self):
return self._get_k_names("inputs")
def _get_kv_names(self, direction="inputs"):
return list(chain.from_iterable(zip(self._get_k_names(direction), self._get_v_names(direction))))
def get_ort_past_value_names(self):
return self._get_v_names("inputs")
def get_ort_past_kv_names(self):
return self._get_kv_names("inputs")
def get_ort_present_key_names(self):
return self._get_k_names("outputs")
def get_ort_present_value_names(self):
return self._get_v_names("outputs")
def get_ort_present_kv_names(self):
return self._get_kv_names("outputs")
def _get_kv_shape(self):
return [
@ -91,24 +89,20 @@ class KVCacheConfig(ConfigBase):
]
def get_input_names_shapes_types(self):
input_names = [*self.get_ort_past_key_names(), *self.get_ort_past_value_names()]
input_names = self.get_ort_past_kv_names()
input_shapes = [self._get_kv_shape()] * 2 * self.num_hidden_layers
input_types = [self.dtype] * 2 * self.num_hidden_layers
return input_names, input_shapes, input_types
def get_output_names(self):
return [*self.get_ort_present_key_names(), *self.get_ort_present_value_names()]
return self.get_ort_present_kv_names()
def get_dynamic_axes(self):
dynamic_axis = {}
for past_name in self.get_ort_past_key_names():
dynamic_axis[past_name] = self.past_kv_dynamic_axis
for past_name in self.get_ort_past_value_names():
for past_name in self.get_ort_past_kv_names():
dynamic_axis[past_name] = self.past_kv_dynamic_axis
for present_name in self.get_ort_present_key_names():
dynamic_axis[present_name] = self.present_kv_dynamic_axis
for present_name in self.get_ort_present_value_names():
for present_name in self.get_ort_present_kv_names():
dynamic_axis[present_name] = self.present_kv_dynamic_axis
return dynamic_axis
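The consolidated helpers interleave key and value names per layer; a sketch with made-up dimensions and the default ONNX name templates assumed:

from olive.model.config.kv_cache_config import KVCacheConfig

kv_cache = KVCacheConfig(num_hidden_layers=2, num_attention_heads=12, hidden_size=768)
# e.g. past_key_values.0.key, past_key_values.0.value, past_key_values.1.key, ...
print(kv_cache.get_ort_past_kv_names())
print(kv_cache.get_output_names())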


@ -9,62 +9,7 @@ from olive.resource_path import create_resource_path
class ModelConfig(ConfigBase):
"""Input model config which will be used to create the model handler.
For example, the config looks like for llama2:
.. code-block:: json
{
"input_model": {
"type": "CompositePyTorchModel",
"config": {
"model_path": "llama_v2",
"generative": False,
"model_components": [
{
"name": "decoder_model",
"type": "PyTorchModel",
"config": {
"model_script": "user_script.py",
"io_config": {
"input_names": ["tokens", "position_ids", "attn_mask", //...],
"output_names": ["logits", "attn_mask_out", //...],
"dynamic_axes": {
"tokens": { "0": "batch_size", "1": "seq_len" },
"position_ids": { "0": "batch_size", "1": "seq_len" },
"attn_mask": { "0": "batch_size", "1": "max_seq_len" },
//...
}
},
"model_loader": "load_decoder_model",
"dummy_inputs_func": "decoder_inputs"
}
},
{
"name": "decoder_with_past_model",
"type": "PyTorchModel",
"config": {
"model_script": "user_script.py",
"io_config": {
"input_names": ["tokens_increment", "position_ids_increment", "attn_mask", //...],
"output_names": ["logits", "attn_mask_out", //...],
"dynamic_axes": {
"tokens_increment": { "0": "batch_size", "1": "seq_len_increment" },
"position_ids_increment": { "0": "batch_size", "1": "seq_len_increment" },
"attn_mask": { "0": "batch_size", "1": "max_seq_len" },
//...
}
},
"model_loader": "load_decoder_with_past_model",
"dummy_inputs_func": "decoder_with_past_inputs"
}
}
]
}
}
}
"""
"""Input model config which will be used to create the model handler."""
type: str = Field(description="The type of the model handler.")
config: dict = Field(description="The config for the model handler. Used to initialize the model handler.")
@ -84,14 +29,6 @@ class ModelConfig(ConfigBase):
resources = self.get_resource_strings()
return {k: create_resource_path(v) for k, v in resources.items()}
def get_hf_model_name(self):
if self.has_hf_config():
return self.config["hf_config"].get("model_name")
return None
def has_hf_config(self):
return self.config.get("hf_config") is not None
def create_model(self):
cls = get_model_handler(self.type)
return cls(**self.config)
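For reference, a sketch of how an HfModel config dict becomes a handler under the slimmed-down ModelConfig (the model id is just an example):

from olive.model import ModelConfig

model_config = ModelConfig.parse_obj(
    {
        "type": "HfModel",
        "config": {"model_path": "microsoft/phi-2", "task": "text-generation-with-past"},
    }
)
handler = model_config.create_model()  # an HfModelHandler instance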


@ -3,10 +3,11 @@
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from olive.model.handler.base import OliveModelHandler
from olive.model.handler.composite import CompositeModelHandler, CompositePyTorchModelHandler
from olive.model.handler.composite import CompositeModelHandler
from olive.model.handler.hf import DistributedHfModelHandler, HfModelHandler
from olive.model.handler.onnx import DistributedOnnxModelHandler, ONNXModelHandler
from olive.model.handler.openvino import OpenVINOModelHandler
from olive.model.handler.pytorch import DistributedPyTorchModelHandler, PyTorchModelHandler
from olive.model.handler.pytorch import PyTorchModelHandler
from olive.model.handler.qnn import QNNModelHandler
from olive.model.handler.snpe import SNPEModelHandler
from olive.model.handler.tensorflow import TensorFlowModelHandler
@ -14,11 +15,11 @@ from olive.model.handler.tensorflow import TensorFlowModelHandler
__all__ = [
"OliveModelHandler",
"CompositeModelHandler",
"CompositePyTorchModelHandler",
"DistributedHfModelHandler",
"DistributedOnnxModelHandler",
"HfModelHandler",
"ONNXModelHandler",
"OpenVINOModelHandler",
"DistributedPyTorchModelHandler",
"PyTorchModelHandler",
"QNNModelHandler",
"SNPEModelHandler",


@ -6,13 +6,13 @@ from olive.common.config_utils import validate_config
from olive.constants import Framework, ModelFileFormat
from olive.hardware.accelerator import Device
from olive.model.config import IoConfig
from olive.model.handler.mixin import CompositeMixin, IoConfigMixin, JsonMixin, ResourceMixin
from olive.model.handler.mixin import IoConfigMixin, JsonMixin, ResourceMixin
from olive.resource_path import OLIVE_RESOURCE_ANNOTATIONS
logger = logging.getLogger(__name__)
class OliveModelHandler(ABC, ResourceMixin, IoConfigMixin, JsonMixin, CompositeMixin):
class OliveModelHandler(ABC, ResourceMixin, IoConfigMixin, JsonMixin):
"""Abstraction for logical "Model", it contains model path and related metadata.
Each technique accepts Model as input, return Model as output.


@ -3,10 +3,10 @@
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from copy import deepcopy
from typing import Any, Dict, List, Optional, Tuple, Union
from olive.common.config_utils import serialize_to_json, validate_config
from olive.common.utils import dict_diff
from olive.constants import Framework, ModelFileFormat
from olive.hardware.accelerator import Device
from olive.model.config.model_config import ModelConfig
@ -39,18 +39,23 @@ class CompositeModelHandler(OliveModelHandler):
model_file_format=ModelFileFormat.COMPOSITE_MODEL,
model_attributes=model_attributes,
)
if isinstance(model_components[0], dict):
self.model_components = [validate_config(m, ModelConfig).create_model() for m in model_components]
else:
assert all(
isinstance(m, OliveModelHandler) for m in model_components
), "All components must be OliveModelHandler"
self.model_components = model_components
self._model_components = [
validate_config(m, ModelConfig).create_model() if isinstance(m, dict) else m for m in model_components
]
assert all(
isinstance(m, OliveModelHandler) for m in self._model_components
), "All components must be OliveModelHandler or dict"
assert len(self.model_components) == len(model_component_names), "Number of components and names must match"
assert len(self._model_components) == len(model_component_names), "Number of components and names must match"
self.model_component_names = model_component_names
for m in self.model_components:
m.set_composite_parent(self)
@property
def model_components(self):
for m in self._model_components:
# the parent attributes should be inherited by the child model
# child attributes take precedence
m.model_attributes = {**(self.model_attributes or {}), **(m.model_attributes or {})}
yield m
def to_json(self, check_object: bool = False):
json_dict = {
@ -58,8 +63,13 @@ class CompositeModelHandler(OliveModelHandler):
"config": {"model_attributes": self.model_attributes, "model_component_names": self.model_component_names},
}
json_dict["config"]["model_components"] = []
for m in self.model_components:
json_dict["config"]["model_components"].append(m.to_json(check_object))
for m in self._model_components:
component_json = m.to_json(check_object)
# only keep attributes that are different from the parent
component_json["config"]["model_attributes"] = dict_diff(
component_json["config"]["model_attributes"], self.model_attributes
)
json_dict["config"]["model_components"].append(component_json)
return serialize_to_json(json_dict, check_object)
@ -85,35 +95,3 @@ class CompositeModelHandler(OliveModelHandler):
**kwargs: Dict[str, Any],
) -> Any:
raise RuntimeError("CompositeModelHandler doesn't have a session of its own")
@model_handler_registry("CompositePyTorchModel")
class CompositePyTorchModelHandler(CompositeModelHandler):
"""The CompositePyTorchModel handler.
Its main responsibility is to create a list of child PyTorch models used to initialize a composite model.
"""
def __init__(self, model_components: List[Dict[str, Any]], **kwargs):
model_names = []
pytorch_models = []
for model_config in model_components:
config_copy = deepcopy(model_config)
assert "name" in config_copy
model_name = config_copy["name"]
del config_copy["name"]
model_names.append(model_name)
pytorch_models.append(validate_config(config_copy, ModelConfig).create_model())
kwargs_inner = {}
kwargs_inner["model_components"] = pytorch_models
kwargs_inner["model_component_names"] = model_names
if "model_attributes" in kwargs:
kwargs_inner["model_attributes"] = kwargs["model_attributes"]
if "model_path" in kwargs:
logger.warning("model_path is not used in CompositePyTorchModelHandler")
super().__init__(**kwargs_inner)
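Whisper-style composites now go through CompositeModelHandler directly; a rough sketch with hypothetical component paths:

from olive.model import CompositeModelHandler

composite = CompositeModelHandler(
    model_components=[
        {"type": "PyTorchModel", "config": {"model_path": "encoder_decoder_init.pt"}},
        {"type": "PyTorchModel", "config": {"model_path": "decoder.pt"}},
    ],
    model_component_names=["encoder_decoder_init", "decoder"],
    model_attributes={"model_type": "whisper"},
)

for name, component in zip(composite.model_component_names, composite.model_components):
    # child components inherit the parent's model_attributes unless they override them
    print(name, component.model_attributes["model_type"])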

olive/model/handler/hf.py (new file, 216 lines)

@ -0,0 +1,216 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from pathlib import Path
from typing import Any, ClassVar, Dict, List, Optional, Tuple, Union
import torch
from olive.common.config_utils import serialize_to_json, validate_config
from olive.common.constants import DEFAULT_HF_TASK
from olive.common.hf.utils import load_model_from_task
from olive.common.utils import dict_diff
from olive.constants import Framework
from olive.hardware.accelerator import Device
from olive.model.config import HfLoadKwargs, IoConfig
from olive.model.config.registry import model_handler_registry
from olive.model.handler.base import OliveModelHandler
from olive.model.handler.mixin import HfMixin, MLFlowTransformersMixin
from olive.model.handler.pytorch import PyTorchModelHandlerBase
from olive.resource_path import OLIVE_RESOURCE_ANNOTATIONS
logger = logging.getLogger(__name__)
@model_handler_registry("HFModel")
class HfModelHandler(PyTorchModelHandlerBase, MLFlowTransformersMixin, HfMixin): # pylint: disable=too-many-ancestors
resource_keys: Tuple[str, ...] = ("model_path", "adapter_path")
json_config_keys: Tuple[str, ...] = ("task", "load_kwargs", "generative")
def __init__(
self,
model_path: OLIVE_RESOURCE_ANNOTATIONS,
task: str = DEFAULT_HF_TASK,
load_kwargs: Union[Dict[str, Any], HfLoadKwargs] = None,
io_config: Union[Dict[str, Any], IoConfig, str] = None,
adapter_path: OLIVE_RESOURCE_ANNOTATIONS = None,
model_attributes: Optional[Dict[str, Any]] = None,
generative: bool = False,
):
super().__init__(
framework=Framework.PYTORCH,
model_file_format=None,
model_path=model_path,
model_attributes=model_attributes,
io_config=io_config,
generative=generative,
)
self.add_resources(locals())
self.task = task
self.load_kwargs = validate_config(load_kwargs, HfLoadKwargs, warn_unused_keys=False) if load_kwargs else None
self.model_attributes = {**self.get_hf_model_config().to_dict(), **(self.model_attributes or {})}
self.model = None
self.dummy_inputs = None
@property
def model_name_or_path(self) -> str:
"""Return the path to valid hf transformers checkpoint.
Call this instead of model_path if you expect a checkpoint path.
"""
return self.get_mlflow_transformers_path() or self.model_path
@property
def adapter_path(self) -> str:
"""Return the path to the peft adapter."""
return self.get_resource("adapter_path")
def load_model(self, rank: int = None) -> torch.nn.Module:
"""Load the model from the model path."""
if self.model is not None:
return self.model
model = load_model_from_task(self.task, self.model_path, **self.get_load_kwargs())
# we only have peft adapters for now
if self.adapter_path:
from peft import PeftModel
model = PeftModel.from_pretrained(model, self.adapter_path)
self.model = model
return model
@property
def io_config(self) -> Dict[str, Any]:
"""Return io config of the model.
Priority: io_config > hf onnx_config
"""
io_config = None
if self._io_config:
# io_config is provided
io_config = self.get_resolved_io_config(
self._io_config, force_kv_cache=self.task.endswith("-with-past"), model_attributes=self.model_attributes
)
else:
logger.debug("Trying hf onnx_config to get io_config")
io_config = self.get_hf_io_config()
if io_config:
logger.debug("Got io_config from hf onnx_config")
return io_config
def get_dummy_inputs(self, filter_hook=None, filter_hook_kwargs=None):
"""Return a dummy input for the model."""
if self.dummy_inputs is not None:
return self.dummy_inputs
# Priority: io_config > hf onnx_config
dummy_inputs = self._get_dummy_inputs_from_io_config(
force_kv_cache=self.task.endswith("-with-past"),
filter_hook=filter_hook,
filter_hook_kwargs=filter_hook_kwargs,
)
if dummy_inputs:
return dummy_inputs
logger.debug("Trying hf onnx_config to get dummy inputs")
dummy_inputs = self.get_hf_dummy_inputs()
if dummy_inputs:
logger.debug("Got dummy inputs from hf onnx_config")
if dummy_inputs is None:
raise ValueError("Unable to get dummy inputs for the model.")
return dummy_inputs
def to_json(self, check_object: bool = False):
config = super().to_json(check_object)
# only keep model_attributes that are not in hf model config
hf_model_config_dict = self.get_hf_model_config().to_dict()
config["config"]["model_attributes"] = dict_diff(self.model_attributes, hf_model_config_dict)
return serialize_to_json(config, check_object)
@model_handler_registry("DistributedHfModel")
class DistributedHfModelHandler(OliveModelHandler):
json_config_keys: Tuple[str, ...] = (
"model_name_pattern",
"num_ranks",
"task",
"load_kwargs",
"io_config",
"generative",
)
DEFAULT_RANKED_MODEL_NAME_FORMAT: ClassVar[str] = "model_{:02d}"
def __init__(
self,
model_path: OLIVE_RESOURCE_ANNOTATIONS,
model_name_pattern: str,
num_ranks: int,
task: str,
load_kwargs: Union[Dict[str, Any], HfLoadKwargs] = None,
io_config: Union[Dict[str, Any], IoConfig] = None,
model_attributes: Optional[Dict[str, Any]] = None,
generative: bool = False,
):
super().__init__(
framework=Framework.PYTORCH,
model_file_format=None,
model_path=model_path,
model_attributes=model_attributes,
io_config=io_config,
generative=generative,
)
self.add_resources(locals())
self.model_name_pattern = model_name_pattern
self.num_ranks = num_ranks
self.task = task
self.load_kwargs = load_kwargs
def ranked_model_name(self, rank: int) -> str:
return self.model_name_pattern.format(rank)
def ranked_model_path(self, rank: int) -> Union[Path, str]:
return Path(self.model_path) / self.ranked_model_name(rank)
def load_model(self, rank: int = None) -> HfModelHandler:
return HfModelHandler(
model_path=self.ranked_model_path(rank),
task=self.task,
load_kwargs=self.load_kwargs,
io_config=self.io_config,
model_attributes=self.model_attributes,
generative=self.generative,
)
def prepare_session(
self,
inference_settings: Optional[Dict[str, Any]] = None,
device: Device = Device.GPU, # pylint: disable=signature-differs
execution_providers: Union[str, List[str]] = None,
rank: Optional[int] = 0,
) -> torch.nn.Module:
return self.load_model(rank).load_model(rank).eval()
def run_session(
self,
session: Any = None,
inputs: Union[Dict[str, Any], List[Any], Tuple[Any, ...]] = None,
**kwargs: Dict[str, Any],
) -> Any:
if isinstance(inputs, dict):
results = session.generate(**inputs, **kwargs) if self.generative else session(**inputs, **kwargs)
else:
results = session.generate(inputs, **kwargs) if self.generative else session(inputs, **kwargs)
return results
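A usage sketch of the new handler ("gpt2" and the load kwargs are illustrative):

from olive.model import HfModelHandler

handler = HfModelHandler(
    model_path="gpt2",                      # hub id or local checkpoint
    task="text-generation-with-past",       # also the default task
    load_kwargs={"torch_dtype": "float32"},
)

model = handler.load_model()        # loaded lazily and cached on the handler
io_config = handler.io_config       # user io_config if given, else hf onnx_config
dummy_inputs = handler.get_dummy_inputs()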


@ -2,24 +2,22 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from olive.model.handler.mixin.composite import CompositeMixin
from olive.model.handler.mixin.dummy_inputs import DummyInputsMixin
from olive.model.handler.mixin.hf_config import HfConfigMixin
from olive.model.handler.mixin.hf import HfMixin
from olive.model.handler.mixin.io_config import IoConfigMixin
from olive.model.handler.mixin.json import JsonMixin
from olive.model.handler.mixin.kv_cache import PytorchKvCacheMixin
from olive.model.handler.mixin.mlflow import MLFlowMixin
from olive.model.handler.mixin.mlflow import MLFlowTransformersMixin
from olive.model.handler.mixin.onnx_ep import OnnxEpValidateMixin
from olive.model.handler.mixin.onnx_graph import OnnxGraphMixin
from olive.model.handler.mixin.resource import ResourceMixin
__all__ = [
"CompositeMixin",
"DummyInputsMixin",
"HfConfigMixin",
"HfMixin",
"IoConfigMixin",
"JsonMixin",
"MLFlowMixin",
"MLFlowTransformersMixin",
"OnnxEpValidateMixin",
"OnnxGraphMixin",
"PytorchKvCacheMixin",


@ -5,7 +5,6 @@
import logging
import olive.data.template as data_config_template
from olive.common.user_module_loader import UserModuleLoader
logger = logging.getLogger(__name__)
@ -16,51 +15,31 @@ class DummyInputsMixin:
the dummy data is used to evaluate the latency if the user doesn't provide data for evaluation.
"""
def _get_dummy_dataloader_from_io_config(self):
dataloader = None
# resolved self.io_config
# won't use self.io_config since we don't want hf_config to be used
resolved_io_config = self.get_user_io_config(self.io_config) or {}
if resolved_io_config.get("input_shapes"):
logger.debug("Using io_config.input_shapes to build dummy dataloader")
dataloader = (
# input_types is optional
data_config_template.dummy_data_config_template(
input_shapes=resolved_io_config["input_shapes"],
input_types=resolved_io_config.get("input_types"),
input_names=resolved_io_config.get("input_names"),
).to_data_container()
def _get_dummy_inputs_from_io_config(self, force_kv_cache: bool = False, filter_hook=None, filter_hook_kwargs=None):
if not self._io_config:
return None
resolved_io_config = (
self.get_resolved_io_config(
self._io_config, force_kv_cache=force_kv_cache, model_attributes=self.model_attributes
)
return dataloader
or {}
)
if not resolved_io_config.get("input_shapes"):
return None
def get_dummy_inputs(self, filter_hook=None, filter_hook_kwargs=None):
"""Return a dummy input for the model."""
if self.dummy_inputs is not None:
return self.dummy_inputs
# Priority: dummy_inputs_func > io_config.input_shapes > hf_config.dataset > onnx_config
dummy_inputs = None
if self.dummy_inputs_func is not None:
logger.debug("Using dummy_inputs_func to get dummy inputs")
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
dummy_inputs = user_module_loader.call_object(self.dummy_inputs_func, self)
# respect user's dummy_inputs_func, no hook
else:
dataloader = self._get_dummy_dataloader_from_io_config()
if dataloader:
dummy_inputs, _ = dataloader.get_first_batch()
elif self.hf_config and not self.hf_config.components and self.hf_config.task:
logger.debug("Trying hf onnx_config to get dummy inputs")
dummy_inputs = self.get_hf_dummy_inputs()
if dummy_inputs is not None:
logger.debug("Got dummy inputs from hf onnx_config")
if filter_hook:
dummy_inputs = filter_hook(dummy_inputs, **(filter_hook_kwargs or {}))
if dummy_inputs is None:
raise ValueError(
"Unable to get dummy inputs. Please provide dummy_inputs_func, io_config.input_shapes,"
" hf_config.dataset, or hf_config."
logger.debug("Using io_config.input_shapes to build dummy inputs")
dummy_inputs = (
data_config_template.dummy_data_config_template(
input_shapes=resolved_io_config["input_shapes"],
input_types=resolved_io_config.get("input_types"),
input_names=resolved_io_config.get("input_names"),
)
.to_data_container()
.get_first_batch()
)[0]
if filter_hook:
dummy_inputs = filter_hook(dummy_inputs, **(filter_hook_kwargs or {}))
return dummy_inputs


@ -0,0 +1,88 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from pathlib import Path
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Union
from olive.common.hf.model_io import get_model_dummy_input, get_model_io_config
from olive.common.hf.utils import (
get_generation_config,
get_model_config,
get_tokenizer,
save_model_config,
save_tokenizer,
)
if TYPE_CHECKING:
from transformers import GenerationConfig, PretrainedConfig, PreTrainedTokenizer, PreTrainedTokenizerFast
logger = logging.getLogger(__name__)
class HfMixin:
"""Provide the following Hugging Face model functionalities."""
def get_load_kwargs(self) -> Dict[str, Any]:
"""Return all args from load_kwargs in a dict with types expected by `from_pretrained`."""
return self.load_kwargs.get_load_kwargs() if self.load_kwargs else {}
def get_hf_model_config(self) -> "PretrainedConfig":
return get_model_config(self.model_path, **self.get_load_kwargs())
def get_hf_generation_config(self) -> "GenerationConfig":
return get_generation_config(self.model_path, **self.get_load_kwargs())
def get_hf_tokenizer(self) -> Union["PreTrainedTokenizer", "PreTrainedTokenizerFast"]:
# don't provide loading args for tokenizer directly since it tries to serialize all kwargs
# TODO(anyone): only provide relevant kwargs, no use case for now to provide kwargs
return get_tokenizer(self.model_path)
def save_metadata(self, output_dir: str, **kwargs) -> List[str]:
"""Save model metadata files to the output directory.
:param output_dir: output directory to save metadata files
:param kwargs: additional keyword arguments to pass to `save_pretrained` method
:return: list of file paths
"""
output_dir = Path(output_dir)
if not output_dir.exists():
output_dir.mkdir(parents=True)
elif not output_dir.is_dir():
raise ValueError("Expecting a directory as input.")
saved_filepaths = []
# save model config
save_model_config(self.get_hf_model_config(), output_dir, **kwargs)
saved_filepaths.append(str(output_dir / "config.json"))
# save model generation config
# non-generative models won't have generation config
generation_config = self.get_hf_generation_config()
if generation_config:
save_model_config(generation_config, output_dir, **kwargs)
saved_filepaths.append(str(output_dir / "generation_config.json"))
# save tokenizer
tokenizer_filepaths = save_tokenizer(self.get_hf_tokenizer(), output_dir, **kwargs)
saved_filepaths.extend([fp for fp in tokenizer_filepaths if Path(fp).exists()])
return saved_filepaths
def get_hf_io_config(self) -> Optional[Dict[str, Any]]:
"""Get Io config for the model."""
return get_model_io_config(self.model_path, self.task, **self.get_load_kwargs())
def get_hf_dummy_inputs(self) -> Optional[Dict[str, Any]]:
"""Get dummy inputs for the model."""
return get_model_dummy_input(
self.model_path,
self.task,
**self.get_load_kwargs(),
)
def get_hf_model_type(self) -> str:
"""Get model type for the model."""
return self.get_hf_model_config().model_type
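For reference, a minimal usage sketch of the new mixin through `HfModelHandler` (the handler it is attached to in this refactor). The constructor arguments are assumed from the `HfModel` input-model config, and the model name and output directory are placeholders:

```python
# Sketch only: assumes HfModelHandler(model_path=..., task=...) mirrors the HfModel config.
from olive.model import HfModelHandler

handler = HfModelHandler(model_path="microsoft/phi-2", task="text-generation")
print(handler.get_hf_model_type())             # model_type from the HF config, e.g. "phi"
io_config = handler.get_hf_io_config()         # input/output names and dynamic axes, or None
saved = handler.save_metadata("metadata_out")  # config.json, generation_config.json, tokenizer files
```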


@@ -1,147 +0,0 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from pathlib import Path
from typing import TYPE_CHECKING, Generator, List, Optional, Tuple
from olive.constants import ModelFileFormat
from olive.model.utils.hf_utils import (
get_hf_model_config,
get_hf_model_dummy_input,
get_hf_model_generation_config,
get_hf_model_io_config,
get_hf_model_tokenizer,
load_hf_model_from_model_class,
load_hf_model_from_task,
save_hf_model_config,
save_hf_model_tokenizer,
)
if TYPE_CHECKING:
from olive.model.handler.pytorch import PyTorchModelHandler
logger = logging.getLogger(__name__)
class HfConfigMixin:
"""Provide the following Hugging Face model functionalities.
* loading huggingface model
* getting huggingface model config
* getting huggingface model io config
* getting huggingface model components like Whisper scenario.
The mixin requires the following attributes to be set.
* model_path
* model_file_format
* model_loader
* model_script
* script_dir
* model_attributes
* hf_config
"""
def get_hf_model_config(self):
if self.hf_config is None:
raise ValueError("HF model_config is not available")
return get_hf_model_config(self.get_model_path_or_name(), **self.hf_config.get_loading_args_from_pretrained())
def get_hf_model_generation_config(self):
if self.hf_config is None:
raise ValueError("HF model_config is not available")
return get_hf_model_generation_config(
self.get_model_path_or_name(), **self.hf_config.get_loading_args_from_pretrained()
)
def get_hf_model_tokenizer(self, **kwargs):
if self.hf_config is None:
raise ValueError("HF model_config is not available")
# don't provide loading args for tokenizer directly since it tries to serialize all kwargs
# TODO(anyone): only provide relevant kwargs, no use case for now to provide kwargs
return get_hf_model_tokenizer(self.get_model_path_or_name(), **kwargs)
def save_metadata_for_token_generation(self, output_dir: str, **kwargs) -> List[str]:
"""Save metadata for token generation.
:param output_dir: output directory to save metadata files
:param kwargs: additional keyword arguments to pass to `save_pretrained` method
:return: list of file paths
"""
if self.hf_config is None:
raise ValueError("HF model_config is not available.")
if not Path(output_dir).is_dir():
raise ValueError("Expecting a directory as input.")
save_hf_model_config(self.get_hf_model_config(), output_dir, **kwargs)
save_hf_model_config(self.get_hf_model_generation_config(), output_dir, **kwargs)
tokenizer_filepaths = save_hf_model_tokenizer(self.get_hf_model_tokenizer(), output_dir, **kwargs)
output_dir = Path(output_dir)
return [
str(output_dir / "config.json"),
str(output_dir / "generation_config.json"),
*[fp for fp in tokenizer_filepaths if Path(fp).exists()],
]
def get_hf_io_config(self):
"""Get Io config for the model."""
if self.hf_config and self.hf_config.task and not self.hf_config.components:
return get_hf_model_io_config(
self.get_model_path_or_name(),
self.hf_config.task,
self.hf_config.feature,
**self.hf_config.get_loading_args_from_pretrained(),
)
else:
return None
def get_hf_components(self, rank: Optional[int] = None) -> Generator[Tuple[str, "PyTorchModelHandler"], None, None]:
if self.hf_config and self.hf_config.components:
for component in self.hf_config.components:
yield component.name, self.get_component_model(component, rank)
def load_hf_model(self, model_path: str = None):
"""Load model from model_path or model_name."""
model_name_or_path = model_path or self.hf_config.model_name
loading_args = self.hf_config.get_loading_args_from_pretrained()
logger.info("Loading Huggingface model from %s", model_name_or_path)
if self.hf_config.task:
model = load_hf_model_from_task(self.hf_config.task, model_name_or_path, **loading_args)
elif self.hf_config.model_class:
model = load_hf_model_from_model_class(self.hf_config.model_class, model_name_or_path, **loading_args)
else:
raise ValueError("Either task or model_class must be specified")
return model
def get_hf_dummy_inputs(self):
"""Get dummy inputs for the model."""
return get_hf_model_dummy_input(
self.get_model_path_or_name(),
self.hf_config.task,
self.hf_config.feature,
**self.hf_config.get_loading_args_from_pretrained(),
)
def is_model_loaded_from_hf_config(self) -> bool:
"""Return True if the model is loaded from hf_config, False otherwise."""
return (
(not self.model_loader)
and (
self.model_file_format
not in (ModelFileFormat.PYTORCH_TORCH_SCRIPT, ModelFileFormat.PYTORCH_MLFLOW_MODEL)
)
and self.hf_config
and (self.hf_config.model_class or self.hf_config.task)
)
def get_model_path_or_name(self):
if self.model_file_format == ModelFileFormat.PYTORCH_MLFLOW_MODEL:
return self.get_mlflow_model_path_or_name(self.get_mlflow_transformers_dir())
else:
return self.model_path or self.hf_config.model_name


@@ -23,9 +23,8 @@ class PytorchKvCacheMixin:
unused_keys = set()
if kv_cache_config and not dummy_inputs.get(past_kv_names):
torch_past_key_values = []
k_inputs = kv_cache_config.get_ort_past_key_names()
v_inputs = kv_cache_config.get_ort_past_value_names()
for k_input, v_input in zip(k_inputs, v_inputs):
kv_inputs = kv_cache_config.get_ort_past_kv_names()
for k_input, v_input in zip(kv_inputs[::2], kv_inputs[1::2]):
if k_input not in dummy_inputs or v_input not in dummy_inputs:
raise ValueError(
f"Cannot find past key-value pair for {k_input} and {v_input} in dummy inputs."
@@ -51,6 +50,7 @@ class PytorchKvCacheMixin:
"""
return (self.merge_kv_cache_hook(dummy_inputs, past_kv_names),)
# TODO(jambayk): consider removing this since we don't use hf dataset for dummy inputs anymore
def past_key_values_input_filter_hook(self, dummy_inputs, past_kv_names: str = "past_key_values"):
if not isinstance(dummy_inputs, dict):
return dummy_inputs
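The first hunk above replaces the separate `get_ort_past_key_names()`/`get_ort_past_value_names()` calls with a single `get_ort_past_kv_names()` and re-pairs the names with stride-2 slicing, which assumes the returned list interleaves key and value names per layer. A self-contained illustration of that pairing (the names are only examples of the assumed ordering):

```python
# Assumed ordering: key and value names interleaved per layer.
kv_inputs = [
    "past_key_values.0.key", "past_key_values.0.value",
    "past_key_values.1.key", "past_key_values.1.value",
]
pairs = list(zip(kv_inputs[::2], kv_inputs[1::2]))
# -> [('past_key_values.0.key', 'past_key_values.0.value'),
#     ('past_key_values.1.key', 'past_key_values.1.value')]
```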


@@ -5,42 +5,43 @@
import logging
from pathlib import Path
from typing import Optional
from olive.common.utils import copy_dir
from olive.constants import ModelFileFormat
from olive.cache import OliveCache
from olive.common.hf.mlflow import get_pretrained_name_or_path, is_mlflow_transformers
from olive.common.utils import hardlink_copy_dir, hash_string
logger = logging.getLogger(__name__)
class MLFlowMixin:
def _get_mlflow_transformers_model_path(self, cache_dir):
# DO NOT use the model.to_json() to get hash_dict, since it will get hf_config from the model
# and the operation to get hf_config will use this function to get model_path, which will
# cause infinite loop
return str(Path(cache_dir) / "olive_tmp" / "transformers")
class MLFlowTransformersMixin:
def get_mlflow_transformers_path(self) -> Optional[str]:
if not is_mlflow_transformers(self.model_path):
return None
def to_mlflow_transformer_model(self, cache_dir):
if self.model_file_format != ModelFileFormat.PYTORCH_MLFLOW_MODEL:
raise ValueError(
"Model file format is not PyTorch MLFlow model, you cannot get MLFlow transformers model path."
)
target_path = self._get_mlflow_transformers_model_path(cache_dir)
if (Path(target_path) / "config.json").exists():
logger.debug("Use cached mlflow-transformers models from %s", target_path)
return target_path
if (Path(self.model_path) / "data" / "model").exists():
copy_dir(Path(self.model_path) / "data" / "model", target_path, dirs_exist_ok=True)
copy_dir(Path(self.model_path) / "data" / "config", target_path, dirs_exist_ok=True)
copy_dir(Path(self.model_path) / "data" / "tokenizer", target_path, dirs_exist_ok=True)
return target_path
return None
model_dir = get_pretrained_name_or_path(self.model_path, "model")
config_dir = get_pretrained_name_or_path(self.model_path, "config")
tokenizer_dir = get_pretrained_name_or_path(self.model_path, "tokenizer")
def get_mlflow_model_path_or_name(self, cache_dir):
# both config.json and model file will be saved under data/model
mlflow_transformer_model_path = self.to_mlflow_transformer_model(cache_dir)
if not mlflow_transformer_model_path:
logger.debug(
"Model path %s does not exist. Use hf_config.model_name instead.", mlflow_transformer_model_path
)
return self.hf_config.model_name
return str(mlflow_transformer_model_path)
# some mlflow models only have model directory
if config_dir == model_dir and tokenizer_dir == model_dir:
return model_dir
# some mlflow models have config and tokenizer directories but model directory also
# contains the same files
model_dir_contents = set(Path(model_dir).iterdir())
if (
set(Path(config_dir).iterdir()) <= model_dir_contents
and set(Path(tokenizer_dir).iterdir()) <= model_dir_contents
):
return model_dir
# have to gather all contents into a single directory
cache = OliveCache.from_cache_env()
mlflow_transformers_path = cache.dirs.mlflow / hash_string(str(Path(self.model_path).resolve()))
if (mlflow_transformers_path / "config.json").exists():
logger.debug("MLFlow model already exists in cache. Reusing it.")
else:
for src_dir in [model_dir, config_dir, tokenizer_dir]:
hardlink_copy_dir(src_dir, mlflow_transformers_path)
return str(mlflow_transformers_path)
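The gathered MLflow transformers directory above is cached under a folder keyed by a hash of the resolved model path. A rough standalone sketch of that idea, using `hashlib` directly instead of Olive's `hash_string`/`OliveCache` (which are not reproduced here):

```python
# Illustrative only: a content-addressed cache folder for a gathered MLflow model.
import hashlib
from pathlib import Path

def mlflow_cache_dir(cache_root: str, model_path: str) -> Path:
    digest = hashlib.sha256(str(Path(model_path).resolve()).encode()).hexdigest()
    return Path(cache_root) / "mlflow" / digest
```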


@@ -3,191 +3,28 @@
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
import os
from copy import deepcopy
from pathlib import Path
from typing import Any, Callable, ClassVar, Dict, List, Optional, Tuple, Union
from typing import Any, Callable, Dict, List, Optional, Tuple, Union
import torch
import yaml
from olive.common.config_utils import serialize_to_json, validate_config
from olive.common.user_module_loader import UserModuleLoader
from olive.constants import Framework, ModelFileFormat
from olive.hardware.accelerator import Device
from olive.model.config import (
HfComponent,
HfConfig,
IoConfig,
complete_kv_cache_with_model_attributes,
extend_io_config_with_kv_cache,
)
from olive.model.config import IoConfig, complete_kv_cache_with_model_attributes, extend_io_config_with_kv_cache
from olive.model.config.registry import model_handler_registry
from olive.model.handler.base import OliveModelHandler
from olive.model.handler.mixin import DummyInputsMixin, HfConfigMixin, MLFlowMixin, PytorchKvCacheMixin
from olive.model.utils.hf_utils import load_hf_model_from_model_class
from olive.model.handler.mixin import DummyInputsMixin, PytorchKvCacheMixin
from olive.resource_path import OLIVE_RESOURCE_ANNOTATIONS, ResourceType, create_resource_path
logger = logging.getLogger(__name__)
@model_handler_registry("PyTorchModel")
class PyTorchModelHandler(
OliveModelHandler, HfConfigMixin, DummyInputsMixin, PytorchKvCacheMixin, MLFlowMixin
class PyTorchModelHandlerBase(
OliveModelHandler, DummyInputsMixin, PytorchKvCacheMixin
): # pylint: disable=too-many-ancestors
"""PyTorch model handler.
Besides the model loading for PyTorch model, the model handler also provides the following functionalities:
* Get the model io configuration either from user provider io_config or from hf_config. The priority is user
provided io_config is higher than hf_config.
* Get the dummy inputs for PyTorch model used to evaluate the latency.
* All kinds of Hf model functionalities by HfConfigMixin.
"""
resource_keys: Tuple[str, ...] = ("model_path", "script_dir", "model_script", "adapter_path")
json_config_keys: Tuple[str, ...] = (
"model_file_format",
"model_loader",
"dummy_inputs_func",
"hf_config",
"mlflow_transformer_model_cache_dir",
"generative",
)
def __init__(
self,
model_path: OLIVE_RESOURCE_ANNOTATIONS = None,
model_file_format: ModelFileFormat = ModelFileFormat.PYTORCH_ENTIRE_MODEL,
model_loader: Union[str, Callable] = None,
model_script: Union[str, Path] = None,
script_dir: Union[str, Path] = None,
io_config: Union[Dict[str, Any], IoConfig, str, Callable] = None,
dummy_inputs_func: Union[str, Callable] = None,
hf_config: Union[Dict[str, Any], HfConfig] = None,
adapter_path: OLIVE_RESOURCE_ANNOTATIONS = None,
model_attributes: Optional[Dict[str, Any]] = None,
mlflow_transformer_model_cache_dir: Optional[str] = None,
generative: bool = False,
):
if not (
isinstance(model_loader, Callable)
or (isinstance(model_loader, str) and model_script)
or model_path
or hf_config
):
raise ValueError(
"model_path is required since model_loader is not callable or model_script is not provided"
)
self.mlflow_transformer_model_cache_dir = mlflow_transformer_model_cache_dir
self.model_loader = model_loader
self.model = None
super().__init__(
framework=Framework.PYTORCH,
model_file_format=model_file_format,
model_path=model_path,
model_attributes=model_attributes,
io_config=io_config,
generative=generative,
)
self.add_resources(locals())
self.hf_config = None
if hf_config:
self.hf_config = validate_config(hf_config, HfConfig)
hf_model_config = self.get_hf_model_config().to_dict()
model_attr = self.model_attributes or {}
hf_model_config.update(model_attr)
self.model_attributes = hf_model_config
# ensure that script_dirs are local folder
script_dir_resource = create_resource_path(self.script_dir)
if script_dir_resource:
assert script_dir_resource.type == ResourceType.LocalFolder, "script_dir must be a local directory."
# ensure that model_script is local file or string name
model_script_resource = create_resource_path(self.model_script)
if model_script_resource:
assert model_script_resource.type in (
ResourceType.LocalFile,
ResourceType.StringName,
), "model_script must be a local file or a string name."
self.dummy_inputs_func = dummy_inputs_func
self.dummy_inputs = None
@property
def script_dir(self) -> str:
return self.get_resource("script_dir")
@property
def model_script(self) -> str:
return self.get_resource("model_script")
@property
def adapter_path(self) -> str:
return self.get_resource("adapter_path")
def get_mlflow_transformers_dir(self):
return self.mlflow_transformer_model_cache_dir or self.model_path
def load_model(self, rank: int = None) -> torch.nn.Module:
if self.model is not None:
return self.model
# Load user module at the beginning since we may need user defined models to load model
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
# Load special path or format model -> load model from hf config -> load normal path model
if self.model_loader is not None:
model = user_module_loader.call_object(self.model_loader, self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_TORCH_SCRIPT:
model = torch.jit.load(self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_MLFLOW_MODEL:
model = self._load_mlflow_model()
elif self.hf_config and (self.hf_config.model_class or self.hf_config.task):
model = self.load_hf_model(self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_ENTIRE_MODEL:
model = torch.load(self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_SLICE_GPT_MODEL:
model = self._load_slicegpt_model()
elif self.model_file_format == ModelFileFormat.PYTORCH_STATE_DICT:
raise ValueError("Please use customized model loader to load state dict of model.")
else:
raise ValueError(f"Unsupported model file format: {self.model_file_format}")
# we only have peft adapters for now
if self.adapter_path:
from peft import PeftModel
model = PeftModel.from_pretrained(model, self.adapter_path)
self.model = model
return model
def get_component_model(self, component: HfComponent, rank: Optional[int] = None) -> "PyTorchModelHandler":
if component.component_func is None:
logger.debug("component_func is not provided, using hf_config to get component")
model_component = self.load_hf_model(self.model_path)
else:
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
model_component = user_module_loader.call_object(component.component_func, self)
# the second default parameter is to fix ruff b023:
# https://docs.astral.sh/ruff/rules/function-uses-loop-variable/
def model_loader(_, model_component=model_component):
return model_component
component_hf_config = deepcopy(self.hf_config).dict()
component_hf_config.pop("components", None)
return PyTorchModelHandler(
model_loader=model_loader,
io_config=component.io_config,
dummy_inputs_func=component.dummy_inputs_func,
model_script=self.model_script,
script_dir=self.script_dir,
hf_config=HfConfig.parse_obj(component_hf_config),
model_attributes=self.model_attributes,
)
"""Base class for PyTorch model handler."""
def prepare_session(
self,
@@ -210,133 +47,64 @@ class PyTorchModelHandler(
results = session.generate(inputs, **kwargs) if self.generative else session(inputs, **kwargs)
return results
def _load_mlflow_model(self):
logger.info("Loading MLFlow model from %s", self.model_path)
mlflow_transformers_path = self.to_mlflow_transformer_model(self.get_mlflow_transformers_dir())
with open(os.path.join(self.model_path, "MLmodel")) as fp:
mlflow_data = yaml.safe_load(fp)
# default flavor is "hftransformersv2" from azureml.evaluate.mlflow>=0.0.8
# "hftransformers" from azureml.evaluate.mlflow<0.0.8
# TODO(trajep): let user specify flavor name if needed
# to support other flavors in mlflow not only hftransformers
hf_pretrained_class = None
flavors = mlflow_data.get("flavors", {})
if not flavors:
raise ValueError(
"Invalid MLFlow model format. Please make sure the input model"
" format is same with the result of mlflow.transformers.save_model,"
" or aml_mlflow.hftransformers.save_model from azureml.evaluate.mlflow"
)
@staticmethod
def get_resolved_io_config(
io_config: Union[Dict[str, Any], IoConfig],
force_kv_cache: bool = False,
model_attributes: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
"""Resolve io_config to a dictionary.
if "hftransformersv2" in flavors:
hf_pretrained_class = flavors["hftransformersv2"].get("hf_pretrained_class", "AutoModel")
elif "hftransformers" in flavors:
hf_pretrained_class = flavors["hftransformers"].get("hf_pretrained_class", "AutoModel")
else:
raise ValueError(
"Unsupported MLFlow model flavor. Currently only support hftransformersv2/hftransformers."
)
loading_args = self.hf_config.get_loading_args_from_pretrained() if self.hf_config else {}
loaded_model = load_hf_model_from_model_class(hf_pretrained_class, mlflow_transformers_path, **loading_args)
loaded_model.eval()
return loaded_model
:param io_config: io_config to resolve.
:param force_kv_cache: whether to enable kv_cache if not already enabled.
"""
io_config_obj = validate_config(io_config, IoConfig)
def _load_slicegpt_model(self):
logger.info("Loading SliceGPT model from %s", self.model_path)
from slicegpt.hf_utils import load_sliced_model
# enable kv_cache
io_config_obj.kv_cache = io_config_obj.kv_cache or force_kv_cache
loaded_model, _ = load_sliced_model(self.hf_config.model_name, self.model_path)
return loaded_model
if io_config_obj.kv_cache:
kv_cache_config = complete_kv_cache_with_model_attributes(io_config_obj.kv_cache, model_attributes or {})
io_config_obj = extend_io_config_with_kv_cache(io_config_obj, kv_cache_config)
return io_config_obj.dict(exclude_none=True)
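Because `get_resolved_io_config` is a static method on the new base class, passes can call it without a handler instance. A hedged sketch of such a call; the io_config fields, the boolean `kv_cache` flag, and the model-attribute keys are assumptions based on the surrounding code, not a verified example:

```python
# Hypothetical call: resolve an io_config and extend it with KV-cache inputs/outputs.
from olive.model.handler.pytorch import PyTorchModelHandlerBase  # assumed import path

io_config = {
    "input_names": ["input_ids", "attention_mask"],
    "input_shapes": [[1, 128], [1, 128]],
    "kv_cache": True,  # assumed to accept a bool that enables the default KV-cache config
}
resolved = PyTorchModelHandlerBase.get_resolved_io_config(
    io_config,
    model_attributes={"num_hidden_layers": 32, "num_attention_heads": 32, "hidden_size": 4096},
)
# resolved is a plain dict that now also lists past_key_values.* names and their dynamic axes
```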
def to_json(self, check_object: bool = False):
config = super().to_json(check_object)
# add _io_config to config to keep what was provided at init
config["config"]["io_config"] = self._io_config
# only keep model_attributes that are not in hf_config
if self.model_attributes and self.hf_config:
hf_config_dict = self.get_hf_model_config().to_dict()
config["config"]["model_attributes"] = {
key: value
for key, value in self.model_attributes.items()
if key not in hf_config_dict or hf_config_dict[key] != value
} or None
return serialize_to_json(config, check_object)
def get_user_io_config(self, io_config: Union[Dict[str, Any], IoConfig, str, Callable]) -> Dict[str, Any]:
"""Resolve io_config to a dictionary.
If io_config is a string name or a callable, it will be called to get io_config.
"""
io_config_obj = None
if isinstance(io_config, dict):
io_config_obj = IoConfig.parse_obj(io_config)
elif isinstance(io_config, IoConfig):
# return a new copy of io_config to avoid modifying the original one
io_config_obj = io_config.copy(deep=True)
elif isinstance(io_config, (str, Callable)):
# io_config is a string name or a callable
logger.debug("Calling %s to get io_config", io_config)
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
io_config = user_module_loader.call_object(io_config, self)
io_config_obj = validate_config(io_config, IoConfig)
# TODO(anyone): infer if to use kv_cache from task config
if io_config_obj.kv_cache:
kv_cache_config = complete_kv_cache_with_model_attributes(io_config_obj.kv_cache, self.model_attributes)
io_config_obj = extend_io_config_with_kv_cache(io_config_obj, kv_cache_config)
return io_config_obj.dict(exclude_none=True)
@model_handler_registry("PyTorchModel")
class PyTorchModelHandler(PyTorchModelHandlerBase): # pylint: disable=too-many-ancestors
"""PyTorch model handler.
@property
def io_config(self) -> Dict[str, Any]:
"""Return io config of the model.
Besides the model loading for PyTorch model, the model handler also provides the following functionalities:
* Get the model io configuration from user-provided io_config.
* Get the dummy inputs for PyTorch model used to evaluate the latency.
"""
Priority: io_config > hf_config (using onnx_config)
"""
io_config = None
if self._io_config:
# io_config is provided
io_config = self.get_user_io_config(self._io_config)
elif self.hf_config and self.hf_config.task and not self.hf_config.components:
# hf_config is provided
logger.debug("Trying hf onnx_config to get io_config")
# For MLFlow model, get io config from model_name instead of model_path
# TODO(xiaoyu): more investigation on the integration between MLFlow and HF
io_config = self.get_hf_io_config()
if io_config:
logger.debug("Got io_config from hf_config")
return io_config
@model_handler_registry("DistributedPyTorchModel")
class DistributedPyTorchModelHandler(OliveModelHandler, HfConfigMixin):
resource_keys: Tuple[str, ...] = ("model_path", "script_dir", "model_script", "adapter_path")
json_config_keys: Tuple[str, ...] = (
"model_name_pattern",
"num_ranks",
"model_loader",
"io_config",
"dummy_inputs_func",
"hf_config",
)
DEFAULT_RANKED_MODEL_NAME_FORMAT: ClassVar[str] = "model_{:02d}"
resource_keys: Tuple[str, ...] = ("model_path", "script_dir", "model_script")
json_config_keys: Tuple[str, ...] = ("model_file_format", "model_loader", "dummy_inputs_func", "generative")
def __init__(
self,
model_path: OLIVE_RESOURCE_ANNOTATIONS,
model_name_pattern: str,
num_ranks: int,
model_path: OLIVE_RESOURCE_ANNOTATIONS = None,
model_file_format: ModelFileFormat = ModelFileFormat.PYTORCH_ENTIRE_MODEL,
model_loader: Union[str, Callable] = None,
model_script: Union[str, Path] = None,
script_dir: Union[str, Path] = None,
io_config: Union[Dict[str, Any], IoConfig, str, Callable] = None,
dummy_inputs_func: Union[str, Callable] = None,
hf_config: Union[Dict[str, Any], HfConfig] = None,
adapter_path: OLIVE_RESOURCE_ANNOTATIONS = None,
model_attributes: Optional[Dict[str, Any]] = None,
generative: bool = False,
):
if not (isinstance(model_loader, Callable) or (isinstance(model_loader, str) and model_script) or model_path):
raise ValueError(
"model_path is required since model_loader is not callable or model_script is not provided"
)
self.model_loader = model_loader
self.model = None
super().__init__(
framework=Framework.PYTORCH,
model_file_format=model_file_format,
@@ -345,14 +113,19 @@ class DistributedPyTorchModelHandler(OliveModelHandler, HfConfigMixin):
io_config=io_config,
generative=generative,
)
self.add_resources(locals())
self.model_name_pattern = model_name_pattern
self.num_ranks = num_ranks
self.model_loader = model_loader
# ensure that script_dir and model_script are local resources
for resource_name, expected_type in [
("script_dir", ResourceType.LocalFolder),
("model_script", ResourceType.LocalFile),
]:
resource = create_resource_path(self.get_resource(resource_name))
if resource:
assert resource.type == expected_type, f"{resource_name} must be a local {expected_type}."
self.dummy_inputs_func = dummy_inputs_func
self.hf_config = validate_config(hf_config, HfConfig) if hf_config else None
self.dummy_inputs = None
@property
def script_dir(self) -> str:
@@ -362,63 +135,71 @@ class DistributedPyTorchModelHandler(OliveModelHandler, HfConfigMixin):
def model_script(self) -> str:
return self.get_resource("model_script")
@property
def adapter_path(self) -> str:
return self.get_resource("adapter_path")
def load_model(self, rank: int = None) -> torch.nn.Module:
if self.model is not None:
return self.model
def ranked_model_name(self, rank: int) -> str:
return self.model_name_pattern.format(rank)
# Load user module at the beginning since we may need user defined models to load model
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
def ranked_model_path(self, rank: int) -> Union[Path, str]:
return Path(self.model_path) / self.ranked_model_name(rank)
def load_model(self, rank: int = None) -> PyTorchModelHandler:
return PyTorchModelHandler(
model_path=self.ranked_model_path(rank),
model_file_format=ModelFileFormat.PYTORCH_ENTIRE_MODEL,
model_loader=self.model_loader,
model_script=self.model_script,
script_dir=self.script_dir,
io_config=self._io_config,
dummy_inputs_func=self.dummy_inputs_func,
hf_config=self.hf_config,
adapter_path=self.adapter_path,
model_attributes=self.model_attributes,
)
def get_component_model(self, component: HfComponent, rank: int = 0) -> PyTorchModelHandler:
# TODO(shaahji): Add support for 'HfComponent.component_func'
hf_config = deepcopy(self.hf_config).dict()
hf_config.pop("components", None)
return PyTorchModelHandler(
model_path=self.ranked_model_path(rank),
model_file_format=ModelFileFormat.PYTORCH_ENTIRE_MODEL,
model_script=self.model_script,
script_dir=self.script_dir,
io_config=component.io_config,
dummy_inputs_func=component.dummy_inputs_func,
hf_config=HfConfig.parse_obj(hf_config),
adapter_path=self.adapter_path,
model_attributes=self.model_attributes,
)
def prepare_session(
self,
inference_settings: Optional[Dict[str, Any]] = None,
device: Device = Device.GPU, # pylint: disable=signature-differs
execution_providers: Union[str, List[str]] = None,
rank: Optional[int] = 0,
) -> torch.nn.Module:
return self.load_model(rank).load_model(rank).eval()
def run_session(
self,
session: Any = None,
inputs: Union[Dict[str, Any], List[Any], Tuple[Any, ...]] = None,
**kwargs: Dict[str, Any],
) -> Any:
if isinstance(inputs, dict):
results = session.generate(**inputs, **kwargs) if self.generative else session(**inputs, **kwargs)
# Load special path or format model -> load normal path model
if self.model_loader is not None:
model = user_module_loader.call_object(self.model_loader, self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_TORCH_SCRIPT:
model = torch.jit.load(self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_ENTIRE_MODEL:
model = torch.load(self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_SLICE_GPT_MODEL:
model = self._load_slicegpt_model()
elif self.model_file_format == ModelFileFormat.PYTORCH_STATE_DICT:
raise ValueError("Please use customized model loader to load state dict of model.")
else:
results = session.generate(inputs, **kwargs) if self.generative else session(inputs, **kwargs)
return results
raise ValueError(f"Unsupported model file format: {self.model_file_format}")
self.model = model
return model
def _load_slicegpt_model(self):
from slicegpt.hf_utils import load_sliced_model
model_name = self.model_attributes.get("model_name")
if not model_name:
raise ValueError("`model_name` model attribute is required to load SliceGPT model.")
logger.info("Loading SliceGPT model with model_name %s from %s", model_name, self.model_path)
loaded_model, _ = load_sliced_model(model_name, self.model_path)
return loaded_model
@property
def io_config(self) -> Dict[str, Any]:
"""Return io config of the model."""
if not self._io_config:
return None
io_config = self._io_config
if isinstance(io_config, (str, Callable)):
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
io_config = user_module_loader.call_object(io_config, self)
return self.get_resolved_io_config(io_config, model_attributes=self.model_attributes)
def get_dummy_inputs(self, filter_hook=None, filter_hook_kwargs=None):
"""Return a dummy input for the model."""
if self.dummy_inputs is not None:
return self.dummy_inputs
# Priority: user provided dummy_inputs_func > io_config
if self.dummy_inputs_func is not None:
logger.debug("Using dummy_inputs_func to get dummy inputs")
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
# respect user's dummy_inputs_func, no hook
return user_module_loader.call_object(self.dummy_inputs_func, self)
dummy_inputs = self._get_dummy_inputs_from_io_config(
filter_hook=filter_hook, filter_hook_kwargs=filter_hook_kwargs
)
if dummy_inputs is None:
raise ValueError("Unable to get dummy inputs for the model.")
return dummy_inputs
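After this split, `PyTorchModelHandler` no longer accepts `hf_config`; custom models come in through `model_loader`/`model_script` (or a plain `model_path`), with `io_config` and `dummy_inputs_func` supplied explicitly. A construction sketch under those assumptions, where `user_script.py` and `load_my_model` are hypothetical user code:

```python
# Sketch only: load_my_model is a user-defined function in user_script.py that
# returns a torch.nn.Module; io_config fields follow the IoConfig usage above.
from olive.model import PyTorchModelHandler

handler = PyTorchModelHandler(
    model_loader="load_my_model",
    model_script="user_script.py",
    io_config={
        "input_names": ["x"],
        "input_shapes": [[1, 3, 224, 224]],
        "output_names": ["y"],
    },
)
model = handler.load_model()        # resolves load_my_model via UserModuleLoader
dummy = handler.get_dummy_inputs()  # built from io_config.input_shapes
```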


@@ -2,20 +2,10 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from olive.model.utils.hf_mappings import (
HIDDEN_SIZE_NAMES,
MODEL_TYPE_MAPPING,
NUM_HEADS_NAMES,
NUM_KEY_VALUE_HEADS_NAMES,
)
from olive.model.utils.onnx_utils import resolve_onnx_path
from olive.model.utils.path_utils import normalize_path_suffix
__all__ = [
"HIDDEN_SIZE_NAMES",
"MODEL_TYPE_MAPPING",
"NUM_HEADS_NAMES",
"NUM_KEY_VALUE_HEADS_NAMES",
"normalize_path_suffix",
"resolve_onnx_path",
]


@@ -1,250 +0,0 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from functools import partial
from itertools import chain
from typing import TYPE_CHECKING, Callable, Dict, Optional, Tuple, Union
import transformers
from transformers import AutoConfig, AutoModel, AutoTokenizer, GenerationConfig
from olive.common.utils import get_attr
from olive.model.utils.hf_mappings import FEATURE_TO_PEFT_TASK_TYPE, MODELS_TO_MAX_LENGTH_MAPPING, TASK_TO_FEATURE
if TYPE_CHECKING:
from transformers import PretrainedConfig, PreTrainedModel, PreTrainedTokenizer, PreTrainedTokenizerFast
from transformers.onnx import OnnxConfig
logger = logging.getLogger(__name__)
def load_hf_model_from_task(task: str, name: str, **kwargs) -> "PreTrainedModel":
"""Load huggingface model from task and name."""
from transformers.pipelines import check_task
task_results = check_task(task)
assert isinstance(task_results, tuple)
if len(task_results) == 2:
targeted_task = task_results[0]
elif len(task_results) == 3:
targeted_task = task_results[1]
else:
raise ValueError("unsupported transformers version")
class_tuple = targeted_task["pt"] or (AutoModel,)
model = None
for i, model_class in enumerate(class_tuple):
try:
model = model_class.from_pretrained(name, **kwargs)
logger.debug("Loaded model %s with name_or_path %s", model_class, name)
break
except (OSError, ValueError) as e:
if i == len(class_tuple) - 1:
# len(class_tuple) == 1 covers most common tasks like text-generation, text-classification, etc
# error could be device OOM, device_map: "auto" not supported, etc
# len(class_tuple) > 1: not common - image-segmentation, conversational, etc
# there is no easy way to get tracebacks for earlier failures, so just raise from last
raise
# the ValueError need to be caught since there will be multiple model_class for single task.
# if the model_class is not the one for the task, it will raise ValueError and
# next model_class will be tried.
logger.info(
"Failed to load model %s with name_or_path %s.\n kwargs: %s.\n Exception raised: %s",
model_class,
name,
kwargs,
e,
)
# this won't be None since class_tuple is never empty and we only reach here if model loaded successfully
# satisfies linter too
return model
def huggingface_model_loader(model_loader: Union[str, Callable]) -> Callable:
if model_loader is None:
model_loader = "AutoModel"
if isinstance(model_loader, str):
try:
model_loader = getattr(transformers, model_loader)
except AttributeError:
raise AttributeError(f"{model_loader} is not found in transformers") from None
elif not isinstance(model_loader, Callable):
raise ValueError("model_loader must be a callable or a string defined in transformers")
return model_loader.from_pretrained
def get_hf_model_config(model_name: str, **kwargs) -> "PretrainedConfig":
"""Get HF Config for the given model name."""
return AutoConfig.from_pretrained(model_name, **kwargs)
def save_hf_model_config(config: Union["PretrainedConfig", "GenerationConfig"], output_dir: str, **kwargs):
"""Save input HF Config to output directory."""
config.save_pretrained(output_dir, **kwargs)
def get_hf_model_generation_config(model_name: str, **kwargs) -> GenerationConfig:
"""Get HF model's generation config for the given model name."""
return GenerationConfig.from_pretrained(model_name, **kwargs)
def get_hf_model_tokenizer(model_name: str, **kwargs) -> Union["PreTrainedTokenizer", "PreTrainedTokenizerFast"]:
"""Get HF model's tokenizer."""
return AutoTokenizer.from_pretrained(model_name, **kwargs)
def save_hf_model_tokenizer(
tokenizer: Union["PreTrainedTokenizer", "PreTrainedTokenizerFast"], output_dir: str, **kwargs
) -> Tuple[str]:
"""Save input tokenizer to output directory."""
return tokenizer.save_pretrained(output_dir, **kwargs)
def load_hf_model_from_model_class(model_class: str, name: str, **kwargs):
"""Load huggingface model from model_loader and name."""
return huggingface_model_loader(model_class)(name, **kwargs)
# patched version of transformers.onnx.features.supported_features_mapping
# to support additional models in olive
def patched_supported_features_mapping(*supported_features: str, onnx_config_cls: str = None) -> Dict[str, Callable]:
"""Generate the mapping between supported the features and their corresponding OnnxConfig for a given model.
Args:
*supported_features: The names of the supported features.
onnx_config_cls: The OnnxConfig full name corresponding to the model.
Returns:
The dictionary mapping a feature to an OnnxConfig constructor.
"""
if onnx_config_cls is None:
raise ValueError("A OnnxConfig class must be provided")
from olive.model.utils import hf_onnx_config
config_cls = get_attr(hf_onnx_config, onnx_config_cls)
mapping = {}
for feature in supported_features:
if "-with-past" in feature:
task = feature.replace("-with-past", "")
mapping[feature] = partial(config_cls.with_past, task=task)
else:
mapping[feature] = partial(config_cls.from_model_config, task=feature)
return mapping
def get_onnx_config(model_name: str, task: str, feature: Optional[str] = None, **kwargs) -> "OnnxConfig":
# pylint: disable=protected-access
from transformers.onnx import FeaturesManager
from olive.model.utils.hf_onnx_config import ADDITIONAL_MODEL_TYPES
# patch FeaturesManager._SUPPORTED_MODEL_TYPE to support additional models in olive
for model_type, feature_list in ADDITIONAL_MODEL_TYPES.items():
if model_type in FeaturesManager._SUPPORTED_MODEL_TYPE:
continue
# TODO(trajep): remove the need for unpacking feature_list
features, onnx_config_cls = feature_list
FeaturesManager._SUPPORTED_MODEL_TYPE[model_type] = patched_supported_features_mapping(
*features, onnx_config_cls=onnx_config_cls
)
# if feature is not provided, try to get it from task
# else use "default"
feature = feature or TASK_TO_FEATURE.get(task, "default")
# don't want to load the model here since all we need is the config
# model loading is expensive computationally and memory-wise for large models
config = get_hf_model_config(model_name, **kwargs)
# recreate the logic for FeaturesManager.check_supported_model_or_raise to get the model_onnx_config
# https://github.com/huggingface/transformers/blob/main/src/transformers/onnx/features.py#L712
model_type = config.model_type.replace("_", "-")
onnx_config = None
try:
model_features = FeaturesManager.get_supported_features_for_model_type(model_type, model_name=model_name)
if feature in model_features:
onnx_config = FeaturesManager.get_config(model_type, feature)(config)
else:
logger.debug(
"%s doesn't support feature %s. Supported features are: %s", model_type, feature, model_features
)
except KeyError:
logger.debug("Model type %s is not supported", model_type)
return onnx_config
def get_hf_model_io_config(model_name: str, task: str, feature: Optional[str] = None, **kwargs):
# just log a debug message if io_config is not found
# this is not a critical error and the caller may not need the io_config
model_config = get_onnx_config(model_name, task, feature, **kwargs)
if not model_config:
return None
inputs = model_config.inputs
outputs = model_config.outputs
if not inputs or not outputs:
# just log a warning and return None, since this is not a critical error
# and following pass may not use the io_config, like OptimumConversion
logger.debug("No inputs or outputs found from hf onnx_config %s. Won't use it to get io config", model_config)
return None
io_config = {}
io_config["input_names"] = list(inputs.keys())
io_config["output_names"] = list(outputs.keys())
io_config["dynamic_axes"] = dict(chain(inputs.items(), outputs.items()))
return io_config
def get_hf_model_dummy_input(model_name: str, task: str, feature: Optional[str] = None, **kwargs):
model_config = get_onnx_config(model_name, task, feature, **kwargs)
if not model_config:
return None
tokenizer = AutoTokenizer.from_pretrained(model_name, **kwargs)
return model_config.generate_dummy_inputs(tokenizer, framework="pt")
def get_peft_task_type_from_task(task: str, fail_on_not_found=False) -> str:
"""Get peft task type from feature."""
feature = TASK_TO_FEATURE.get(task, None)
peft_task_type = FEATURE_TO_PEFT_TASK_TYPE.get(feature, None) if feature else None
not_found_msg = f"There is no peft task type for task {task}"
if peft_task_type is None and fail_on_not_found:
raise ValueError(not_found_msg)
elif peft_task_type is None:
logger.warning(not_found_msg)
return peft_task_type
def get_model_max_length(model_name: str, fail_on_not_found=False) -> int:
"""Get max length of the model, extracted from the config."""
model_config = get_hf_model_config(model_name)
model_type = model_config.model_type
max_length = MODELS_TO_MAX_LENGTH_MAPPING.get(model_type, None)
if isinstance(max_length, int):
return max_length
elif isinstance(max_length, str):
return getattr(model_config, max_length)
else:
logger.debug(
"No max length mapping found in MODELS_TO_MAX_LENGTH_MAPPING for model type %s, trying __default__",
model_type,
)
default_max_length = MODELS_TO_MAX_LENGTH_MAPPING["__default__"]
try:
return getattr(model_config, default_max_length)
except AttributeError:
not_found_msg = f"Could not find max length for model type {model_type}"
if fail_on_not_found:
raise ValueError(not_found_msg) from None
else:
logger.warning(not_found_msg)
return None

Some files were not shown because too many files changed in this diff.