`HfModelHandler` separated from `PyTorchModelHandler` (#1239)

This commit is contained in:
Jambay Kinley 2024-07-17 20:31:04 -07:00 committed by GitHub
Parent d6b1a6061e
Commit d325da074e
No key found matching this signature
GPG key ID: B5690EEEBB952194
159 changed files with 2165 additions and 2823 deletions

View File

@ -10,21 +10,33 @@ Model Configuration
-------------------
.. autoclass:: olive.model.ModelConfig
.. _hf_model:
Hf Model Handler
----------------
.. autoclass:: olive.model.HfModelHandler
.. _distributed_hf_model:
Distributed Hf Model Handler
---------------------------------
.. autoclass:: olive.model.DistributedHfModelHandler
.. _pytorch_model:
PyTorch Model Handler
---------------------
.. autoclass:: olive.model.PyTorchModelHandler
.. _onnx_model:
ONNX Model Handler
------------------
.. autoclass:: olive.model.ONNXModelHandler
.. _composite_onnx_model:
CompositeModel Model Handler
----------------------------
.. autoclass:: olive.model.CompositeModelHandler
.. _distributed_onnx_model:
DistributedOnnxModel Model Handler
Distributed Onnx Model Handler
----------------------------------
.. autoclass:: olive.model.DistributedOnnxModelHandler
@ -34,24 +46,15 @@ OpenVINO Model Handler
----------------------
.. autoclass:: olive.model.OpenVINOModelHandler
.. _pytorch_model:
PyTorch Model Handler
---------------------
.. autoclass:: olive.model.PyTorchModelHandler
DistributedPyTorchModelHandler Model
------------------------------------
.. autoclass:: olive.model.DistributedPyTorchModelHandler
.. _snpe_model:
SNPEHandler Model
SNPE Model Handler
-----------------
.. autoclass:: olive.model.SNPEModelHandler
CompositePyTorchModel Model Handler
-----------------------------------
.. autoclass:: olive.model.CompositePyTorchModelHandler
.. _composite_model:
Composite Model Handler
----------------------------
.. autoclass:: olive.model.CompositeModelHandler
.. _distributed_pytorch_model:

View File

@ -3,142 +3,40 @@
## Introduction
This document outlines the integrations between Olive and Huggingface. Discover how to use Huggingface resources within Olive.
## hf_config
If you want to optimize a Huggingface model, or evaluate a Huggingface model, you will need `hf_config` defined in your `input_model` section. Please refer to [this section](../overview/options.md#input-model-information) for detailed parameters of `hf_config`.
## Input Model
Use the `HfModel` type if you want to optimize or evaluate a Huggingface model. The default `task` is `text-generation-with-past`.
Here is how you can use `hf_config`:
### Model config loading
Olive can automatically retrieve model configurations from Huggingface hub:
- Olive retrieves model [configuration](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoConfig) from transformers for future usage.
- Olive simplifies the process by automatically fetching configurations such as IO config and dummy input required for the `OnnxConversion` pass from [OnnxConfig](https://huggingface.co/docs/transformers/main_classes/onnx#onnx-configurations). This means there's no need for you to manually specify the IO config and dummy input when using the `OnnxConversion` pass.
If you want to use your own `io_config` or `dummy_input`, you can still add them to the model config:
### Huggingface Hub model
Olive can automatically retrieve models from Huggingface hub:
```json
"input_model":{
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_script": "user_script.py",
"io_config": "get_io_config",
"dummy_inputs_func": "get_dummy_inputs",
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"task": "text-generation"
}
"model_path": "meta-llama/Llama-2-7b-hf"
}
}
```
### Model loading
#### Load Huggingface model from Huggingface hub
Olive can automatically retrieve models from Huggingface hub. Here are the examples:
#### PyTorch model
Take `Intel/bert-base-uncased-mrpc` as an example, you can specify task name as `text-classification` to form the `hf_config` as follows:
### Local model
If you have the Huggingface model available locally:
```json
"input_model":{
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
"model_path": "path/to/local/model"
}
}
```
**Note:** You must also have the tokenizer and other necessary files in the same local directory.
#### Optimum model
Optimum model is a special case of PyTorch model. By specifying `OptimumModel` as `type`, the `model_path` should be the model's name. Then add the names of the model components to `model_components`. Olive will retrieve the components from Huggingface hub:
```json
"input_model":{
"type": "OptimumModel",
"config": {
"model_path": "openlm-research/open_llama_3b",
"model_components": ["decoder_model.onnx", "decoder_with_past_model.onnx"],
"hf_config": {
"model_class": "LlamaForCausalLM"
}
}
}
```
### Model loading from local
If you have the Huggingface model prepared in local, add `model_path` to the model config, and specify `model_name` and `task` in `hf_config` so that Olive can automatically fetch the model attributes:
Example:
```json
"input_model":{
"type": "PyTorchModel",
"config": {
"model_path": "path_to_local_model",
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
}
}
```
### Model loading from local with custom components
You can use your own custom components functions for your model. You will need to define the details of your components in your script as functions.
Example:
```json
{
"input_model": {
"type": "PyTorchModel",
"config": {
"model_script": "user_script.py",
"hf_config": {
"model_class": "WhisperForConditionalGeneration",
"model_name": "openai/whisper-medium",
"components": [
{
"name": "encoder_decoder_init",
"io_config": "get_encdec_io_config",
"component_func": "get_encoder_decoder_init",
"dummy_inputs_func": "encoder_decoder_init_dummy_inputs"
},
{
"name": "decoder",
"io_config": "get_dec_io_config",
"component_func": "get_decoder",
"dummy_inputs_func": "decoder_dummy_inputs"
}
]
}
}
},
}
```
#### Script example
```python
# my_script.py
def get_dec_io_config(model: OliveModelHandler):
# return your io dict
...
def get_decoder(model: OliveModelHandler):
# your component implementation
...
def dummy_inputs_func(model: OliveModelHandler):
# return the dummy input for your component
...
```
### Model loading from Azure ML resources
### Azure ML model
Olive supports loading models from your Azure Machine Learning workspace. Find detailed configurations [here](./azureml_integration.md).
Example: [Llama-2-7b](https://ml.azure.com/models/Llama-2-7b/version/13/catalog/registry/azureml-meta) from Azure ML model catalog:
```json
"input_model":{
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": {
"type": "azureml_registry_model",
@ -147,17 +45,38 @@ Example: [Llama-2-7b](https://ml.azure.com/models/Llama-2-7b/version/13/catalog/
"registry_name": "azureml-meta",
"version": "13"
}
},
"model_file_format": "PyTorch.MLflow",
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"task": "text-generation"
}
}
}
```
Please note the model for `Llama-2-7b` in Azure ML model catalog is a mlflow model. So `"model_file_format": "PyTorch.MLflow"` is required here.
### Model config loading
Olive can automatically retrieve model configurations from Huggingface hub:
- Olive retrieves model [configuration](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoConfig) from transformers for future usage.
- Olive simplifies the process by automatically fetching configurations such as IO config and dummy input required for the `OnnxConversion` pass from [OnnxConfig](https://huggingface.co/docs/transformers/main_classes/onnx#onnx-configurations). This means there's no need for you to manually specify the IO config when using the `OnnxConversion` pass.
You can also provide your own IO config which will override the automatically fetched IO config and dummy inputs:
```json
"input_model": {
"type": "HfModel",
"config": {
"model_path": "meta-llama/Llama-2-7b-hf",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int64", "int64", "int64" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
}
}
}
}
```
## Huggingface datasets
Olive supports automatically downloading and applying [Huggingface datasets](https://huggingface.co/datasets) to Passes and Evaluators.
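As a rough sketch (the exact data config schema can vary between Olive versions, and the `HuggingfaceContainer` values shown below, such as the dataset, subset, and column names, are only illustrative), a Huggingface dataset is typically wired into a workflow through a `data_configs` entry:
```json
"data_configs": [
    {
        "name": "glue_mrpc",
        "type": "HuggingfaceContainer",
        "load_dataset_config": { "data_name": "glue", "subset": "mrpc", "split": "validation" },
        "pre_process_data_config": { "input_cols": [ "sentence1", "sentence2" ], "label_cols": [ "label" ] }
    }
]
```
Passes and evaluators can then reference this entry by its `name`.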

View File

@ -8,7 +8,7 @@ It is based on the [LoRA paper](https://arxiv.org/abs/2106.09685).
The output model is the input transformers model along with the fine-tuned LoRA adapters. The adapters can be loaded and/or merged into the original model using the `peft` library from Hugging Face.
This pass only supports Hugging Face transformers PyTorch models. Please refer to [LoRA](lora) for more details about the pass and its config parameters.
This pass only supports HfModels. Please refer to [LoRA](lora) for more details about the pass and its config parameters.
### Example Configuration
```json
@ -33,7 +33,7 @@ the QLoRA [paper](https://arxiv.org/abs/2305.14314) and [code](https://github.co
The output model is the input transformers model along with the quantization config and the fine-tuned LoRA adapters. The adapters can be loaded and/or merged into the original model using the
`peft` library from Hugging Face.
This pass only supports Hugging Face transformers PyTorch models. Please refer to [QLoRA](qlora) for more details about the pass and its config parameters.
This pass only supports HfModels. Please refer to [QLoRA](qlora) for more details about the pass and its config parameters.
**Note:** QLoRA requires a GPU to run.
@ -60,7 +60,7 @@ and [code](https://github.com/yxli2123/LoftQ). More information on LoRA can be f
The `LoftQ` pass initializes the quantized LoRA model using the LoftQ initialization method and then fine-tunes the adapters. The output model has new quantization aware master weights and the fine-tuned LoRA adapters.
This pass only supports Hugging Face transformers PyTorch models. Please refer to [LoftQ](loftq) for more details about the pass and its config parameters.
This pass only supports HfModels. Please refer to [LoftQ](loftq) for more details about the pass and its config parameters.
**Note:** LoftQ requires a GPU to run.
```json
@ -193,7 +193,7 @@ as 2:4 and 4:8 patterns.
Please refer to the original paper linked above for more details on the algorithm and performance results for different models, sparsities and datasets.
This pass only supports Hugging Face transformers PyTorch models. Please refer to [SparseGPT](sparsegpt) for more details on the types of transformers models supported.
This pass only supports HfModels. Please refer to [SparseGPT](sparsegpt) for more details on the types of transformers models supported.
**Note:** TensorRT can accelerate inference on 2:4 sparse models as described in [this blog](https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/).
@ -234,7 +234,7 @@ This pass only supports HuggingFace transformer PyTorch models. Please refer to
applicable. `torch_tensorrt` is an extension to `torch` where TensorRT compiled engines can be used like regular `torch.nn.Module`s. This pass can be used to accelerate inference on transformer models
with sparse weights by taking advantage of the 2:4 structured sparsity pattern supported by TensorRT.
This pass only supports Hugging Face transformers PyTorch models. Please refer to [TorchTRTConversion](torch_trt_conversion) for more details on the types of transformers models supported.
This pass only supports HfModels. Please refer to [TorchTRTConversion](torch_trt_conversion) for more details on the types of transformers models supported.
### Example Configuration
```json

View File

@ -85,36 +85,31 @@ The default value is 3. User can increase if there are network issues and the op
"operation_retry_interval" : 5
},
```
<!-- TODO(anyone): Docs for all model handlers-->
## Input Model Information
`input_model: [Dict]`
User should specify the input model type and configuration using the `input_model` dictionary. It contains the following items:
- `type: [str]` Type of the input model which is case insensitive.. The supported types contain `PyTorchModelHandler`, `ONNXModelHandler`, `OpenVINOModelHandler`,`SNPEModelHandler` and etc. You can
- `type: [str]` Type of the input model, which is case insensitive. The supported types include `HfModelHandler`, `PyTorchModelHandler`, `ONNXModelHandler`, `OpenVINOModelHandler`, `SNPEModelHandler`, etc. You can
find more details in [Olive Models](https://microsoft.github.io/Olive/api/models.html).
- `config: [Dict]` For example, for `PytorchModelHandler`, the input model config dictionary specifies following items:
- `config: [Dict]` For example, for `HfModelHandler`, the input model config dictionary specifies the following items:
- `model_path: [str | Dict]` The model path can be a string or a dictionary. If it is a string, it is either a string name
used by the model loader or the path to the model file/directory. If it is a dictionary, it contains information about the model path.
Please refer to [Configuring Model Path](../tutorials/configure_model_path.md) for the more information of the model path dictionary.
- `model_path: [str | Dict]` The model path can be a string or a dictionary. If it is a string, it is a huggingface hub model id or a local directory. If it is a dictionary, it contains information about the model path. Please refer to [Configuring Model Path](../tutorials/configure_model_path.md) for more information about the model path dictionary.
- `model_loader: [str]` The name of the function provided by the user to load the model. The function should take the model path as
input and return the loaded model.
- `task: [str]` The task of the model. The default task is `text-generation-with-past`, which is equivalent to a causal language model with key-value cache enabled.
- `model_script: [str]` The name of the script provided by the user to assist with model loading.
- `script_dir: [str]` The directory that contains dependencies for the model script.
- `io_config: [Dict[str, Any] | IoConfig | str | Callable]`: The inputs and outputs information of the model. It can be a dictionary, an IoConfig object or a function string under `model_script`. Basically, it contains following items:
- `io_config: [Dict]`: The inputs and outputs information of the model. If not provided, Olive will try to infer the input and output information from the model. The dictionary contains the following items:
- `input_names: [List[str]]` The input names of the model.
- `input_types: [List[str]]` The input types of the model.
- `input_shapes: [List[List[int]]]` The input shapes of the model.
- `output_names: [List[str]]` The output names of the model.
- `dynamic_axes: [Dict[str, Dict[str, str]]]` The dynamic axes of the model. The key is the name of the input or output and the value is a dictionary that contains the dynamic axes of the input or output. The key of the value dictionary is the index of the dynamic axis and the value is the name of the dynamic axis. For example, `{"input": {"0": "batch_size"}, "output": {"0": "batch_size"}}` means the first dimension of the input and output is dynamic and the name of the dynamic axis is `batch_size`.
- `string_to_int_dim_params: List[str]` The list of input names in dynamic axes that need to be converted to int value.
- `kv_cache: Union[bool, Dict[str, str]]` The key value cache configuration.
- `kv_cache: Union[bool, Dict[str, str]]` The key value cache configuration. If not provided, it is assumed to be `True` if the `task` ends with `-with-past`.
- If it is `False`, Olive will not use key value cache.
- If it is `True`, Olive will infer the cache configuration from the input_names/input_shapes and input model based on default `kv_cache`.
- If it is a dictionary, it should contain the key value cache configuration. Here is a default configuration example:
@ -148,35 +143,15 @@ find more details in [Olive Models](https://microsoft.github.io/Olive/api/models
The dynamic axis of the past key value cache. If it is null, Olive will infer the dynamic axis.
- `present_kv_dynamic_axis`: null
The dynamic axis of the present key value cache. If it is null, Olive will infer the dynamic axis.
- <a name="hf_config"></a> `hf_config: [Dict]` Instead of `model_path` or `model_loader`, the model can be specified using a dictionary describing a huggingface
model. This dictionary specifies the following items:
- `model_name: [str]`: This the model name of the huggingface model such as `distilbert-base-uncased` which will be used to load the model with huggingface `from_pretrained` method.
- `task: [str]`: This is the task type for the model such as `text-classification`. The complete list of supported task can be found
at [huggingface-tasks](https://huggingface.co/docs/transformers/v4.28.1/en/main_classes/pipelines#transformers.pipeline.task).
- `feature: [str]`: The ONNX export features. This is only needed for HuggingFace hub model. It is inferred from `task` if not provided. You must provide the feature if you need past key value cache.
For instance, `"causal-lm-with-past"`. You can find more info at [Export to ONNX](https://huggingface.co/docs/transformers/serialization)
- `model_class: [str]`: Instead of the `task`, the class of the model can be provided as well. Such as `DistilBertForSequenceClassification`
- `components: [List[HFComponent]]`: HFComponent list:
- `HFComponent`:
- `name: [str]`: Component name. Olive will generate a model class with this str as attribute name.
- `io_config: [Dict[str, Any] | IoConfig | str | Callable]`: The io_config of this component. If `str`, Olive will load `io_config` from `model_script`.
- `component_func: [str]`: The component function name will be loaded from `model_script`.
- `dummy_inputs_func: [str]`: The dummy input function name will be loaded from `model_script`.
```
For cases where you do not want to use the huggingface model but want to use the huggingface dataset, you can provide `dataset` config only like above.
- `from_pretrained_args: [dict]`: Arguments to pass to the `from_pretrained` method of the model class. Refer to [this documentation](https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.from_pretrained).
- `load_kwargs: [dict]`: Arguments to pass to the `from_pretrained` method of the model class. Refer to [this documentation](https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.from_pretrained).
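For illustration, here is a minimal `HfModel` input model that sets `load_kwargs` (this mirrors the BERT example configs updated in this commit; the model id and task are just examples):
```json
"input_model": {
    "type": "HfModel",
    "config": {
        "model_path": "Intel/bert-base-uncased-mrpc",
        "task": "text-classification",
        "load_kwargs": { "attn_implementation": "eager" }
    }
}
```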
Please find the detailed config options in the following table for each model type:
| Model Type | Description |
|:----------|:-------------|
| [PytorchModelHandler(pytorch_model) | Pytorch model |
| [HfModelHandler](hf_model) | Hf model |
| [PytorchModelHandler](pytorch_model) | Pytorch model |
| [ONNXModelHandler](onnx_model) | ONNX model |
| [OpenVINOModelHandler](openvino_model) | OpenVINO IR model |
| [SNPEModelHandler](snpe_model) | SNPE DLC model |
@ -184,20 +159,9 @@ Please find the detailed config options from following table for each model type
### Example
```json
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_loader": "load_pytorch_origin_model",
"model_script": "user_script.py",
"io_config": {
"input_names": ["input"],
"input_types": ["int32"],
"input_shapes": [[1, 3, 32, 32]],
"output_names": ["output"],
"dynamic_axes": {
"input": {"0": "batch_size"},
"output": {"0": "batch_size"}
}
}
"model_path": "meta-llama/Llama-2-7b-hf"
}
}
```

View File

@ -92,12 +92,10 @@ Here is another quick comparison between Auto Optimizer and manual settings.
{
"input_model":{
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
"model_path": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
},
"systems": {
@ -188,12 +186,10 @@ Here is another quick comparison between Auto Optimizer and manual settings.
{
"input_model":{
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
"model_path": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
},
"systems": {

View File

@ -207,7 +207,7 @@ Convert the transformer dummy data config to the data container.
name="transformers_dummy_data_config",
type="TransformersDummyDataContainer",
load_dataset_config=DataComponentConfig(params={
# model_name can be filled with the model name in input model's hf_config
# model_name can be filled with the model name in input model's model_path
# if you start olive with olive run --config <config_path>
"model_name": "meta-llama/Llama-2-7b-hf"
})

View File

@ -1,12 +1,9 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_class": "ASTForAudioClassification",
"model_name": "MIT/ast-finetuned-speech-commands-v2",
"task": "audio-classification"
},
"model_path": "MIT/ast-finetuned-speech-commands-v2",
"task": "audio-classification",
"io_config": {
"input_names": [ "input_values" ],
"output_names": [ "logits" ],

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "Intel/bert-base-uncased-mrpc", "task": "text-classification" } }
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"systems": {
"local_system": {

View File

@ -1,16 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"model_loader": "load_pytorch_origin_model",
"model_script": "user_script.py",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "token_type_ids" ],
"input_shapes": [ [ 1, 128 ], [ 1, 128 ], [ 1, 128 ] ],
"input_types": [ "int64", "int64", "int64" ],
"output_names": [ "output" ]
}
}
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"data_configs": [
{

View File

@ -1,16 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"model_loader": "load_pytorch_origin_model",
"model_script": "user_script.py",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "token_type_ids" ],
"input_shapes": [ [ 1, 128 ], [ 1, 128 ], [ 1, 128 ] ],
"input_types": [ "int64", "int64", "int64" ],
"output_names": [ "output" ]
}
}
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"data_configs": [
{

View File

@ -1,16 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"model_loader": "load_pytorch_origin_model",
"model_script": "user_script.py",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "token_type_ids" ],
"input_shapes": [ [ 1, 128 ], [ 1, 128 ], [ 1, 128 ] ],
"input_types": [ "int64", "int64", "int64" ],
"output_names": [ "output" ]
}
}
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"data_configs": [
{

View File

@ -1,16 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"model_loader": "load_pytorch_origin_model",
"model_script": "user_script.py",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "token_type_ids" ],
"input_shapes": [ [ 1, 128 ], [ 1, 128 ], [ 1, 128 ] ],
"input_types": [ "int64", "int64", "int64" ],
"output_names": [ "output" ]
}
}
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"data_configs": [
{

View File

@ -1,16 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"model_loader": "load_pytorch_origin_model",
"model_script": "nv_user_script.py",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "token_type_ids" ],
"input_shapes": [ [ 1, 128 ], [ 1, 128 ], [ 1, 128 ] ],
"input_types": [ "int64", "int64", "int64" ],
"output_names": [ "output" ]
}
}
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"data_configs": [
{

View File

@ -1,12 +1,10 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification",
"from_pretrained_args": { "attn_implementation": "eager" }
}
"model_path": "Intel/bert-base-uncased-mrpc",
"task": "text-classification",
"load_kwargs": { "attn_implementation": "eager" }
}
},
"systems": {

View File

@ -5,10 +5,10 @@
"workspace_name": "<place_holder>"
},
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": { "type": "azureml_model", "config": { "name": "bert-hf", "version": "3" } },
"hf_config": { "model_name": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
"task": "text-classification"
}
},
"data_configs": [

View File

@ -1,18 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
},
"io_config": {
"input_names": [ "input_ids", "attention_mask", "token_type_ids" ],
"input_shapes": [ [ 1, 128 ], [ 1, 128 ], [ 1, 128 ] ],
"input_types": [ "int64", "int64", "int64" ],
"output_names": [ "output" ]
}
}
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"data_configs": [
{

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "Intel/bert-base-uncased-mrpc", "task": "text-classification" } }
"type": "HfModel",
"config": { "model_path": "Intel/bert-base-uncased-mrpc", "task": "text-classification" }
},
"systems": {
"local_system": {

View File

@ -1,11 +1,9 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
"model_path": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
},
"systems": {

View File

@ -45,13 +45,11 @@
"In this notebook, we will use a simple `bert-base-uncased` model as an example:\n",
"\n",
"```json\n",
"\"input_model\":{\n",
" \"type\": \"PyTorchModel\",\n",
"\"input_model\": {\n",
" \"type\": \"HfModel\",\n",
" \"config\": {\n",
" \"hf_config\": {\n",
" \"model_name\": \"Intel/bert-base-uncased-mrpc\",\n",
" \"task\": \"text-classification\"\n",
" }\n",
" \"model_path\": \"Intel/bert-base-uncased-mrpc\",\n",
" \"task\": \"text-classification\"\n",
" }\n",
"}\n",
"```\n",

View File

@ -1,11 +1,9 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
"model_path": "Intel/bert-base-uncased-mrpc",
"task": "text-classification"
}
},
"evaluators": {

View File

@ -4,7 +4,7 @@
# --------------------------------------------------------------------------
import torch
from datasets.utils import logging as datasets_logging # type: ignore[import]
from transformers import AutoTokenizer, BertModel # type: ignore[import]
from transformers import AutoTokenizer
from olive.data.registry import Registry
@ -12,12 +12,6 @@ datasets_logging.disable_progress_bar()
datasets_logging.set_verbosity_error()
def load_pytorch_origin_model(model_path):
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()
return model
@Registry.register_dataloader("nvmo_calibration_dataloader")
def create_calibration_dataloader(dataset, batch_size, calib_size=64, **kwargs):
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

View File

@ -15,7 +15,6 @@ from neural_compressor.data import DefaultDataLoader
from torch.utils.data import Dataset
from transformers import (
AutoConfig,
AutoModelForSequenceClassification,
AutoTokenizer,
EvalPrediction,
Trainer,
@ -34,29 +33,6 @@ datasets_logging.set_verbosity_error()
# pylint: disable=attribute-defined-outside-init, protected-access
# This file is only used by bert_inc_ptq_cpu, bert_qat_customized_train_loop_cpu
# -------------------------------------------------------------------------
# Model Loader
# -------------------------------------------------------------------------
def load_pytorch_origin_model(model_path):
model = AutoModelForSequenceClassification.from_pretrained("Intel/bert-base-uncased-mrpc")
model.eval()
return model
# -------------------------------------------------------------------------
# Dummy Input for ONNX Export
# -------------------------------------------------------------------------
def create_input_tensors(model):
return {
"input_ids": torch.ones(1, 128, dtype=torch.int64),
"attention_mask": torch.ones(1, 128, dtype=torch.int64),
"token_type_ids": torch.ones(1, 128, dtype=torch.int64),
}
# -------------------------------------------------------------------------
# Common Dataset

View File

@ -1,6 +1,6 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": {
"type": "azureml_registry_model",
@ -10,8 +10,7 @@
"version": "9"
}
},
"model_file_format": "PyTorch.MLflow",
"hf_config": { "model_name": "microsoft/deberta-base-mnli", "task": "text-classification" }
"task": "text-classification"
}
},
"data_configs": [

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "tiiuae/falcon-7b", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "tiiuae/falcon-7b", "task": "text-generation" }
},
"systems": {
"local_system": {

View File

@ -1,13 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "EleutherAI/gpt-j-6B",
"task": "text-generation",
"feature": "causal-lm-with-past"
}
}
"type": "HfModel",
"config": { "model_path": "EleutherAI/gpt-j-6B" }
},
"evaluators": {
"common_evaluator": {

View File

@ -2,11 +2,7 @@
"input_model": {
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "EleutherAI/gpt-j-6B",
"task": "text-generation",
"feature": "causal-lm-with-past"
}
"model_path": "EleutherAI/gpt-j-6B"
}
},
"evaluators": {

1
examples/llama2/.gitignore vendored
View File

@ -1,2 +1,3 @@
llama2_cpu*
llama2_gpu*
llama2_model_builder.json

View File

@ -1,22 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"generative": true,
"io_config": {
"input_names": [ "input_ids", "attention_mask" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32" ],
"kv_cache": false
},
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"model_class": "LlamaForCausalLM",
"from_pretrained_args": { "_attn_implementation": "eager" },
"task": "text-generation"
}
}
"type": "HfModel",
"config": { "model_path": "meta-llama/Llama-2-7b-hf", "generative": true }
},
"data_configs": [
{

View File

@ -1,13 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "<model_name_placeholder>",
"model_class": "LlamaForCausalLM",
"task": "text-generation"
}
}
"type": "HfModel",
"config": { "model_path": "<model_name_placeholder>" }
},
"systems": {
"local_system": {

View File

@ -1,23 +1,21 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "meta-llama/Llama-2-7b-hf",
"load_kwargs": {
"attn_implementation": "eager"
},
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int64", "int64", "int64" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"task": "text-generation",
"from_pretrained_args": { "_attn_implementation": "eager" }
}
}
}
},

View File

@ -1,24 +1,21 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "<model_name_placeholder>",
"load_kwargs": {
"attn_implementation": "eager"
},
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": {
"model_name": "<model_name_placeholder>",
"model_class": "LlamaForCausalLM",
"from_pretrained_args": { "_attn_implementation": "eager" },
"task": "text-generation"
}
}
}
},

View File

@ -1,23 +1,18 @@
{
"input_model": {
"type": "PyTorchModel",
"input_model":{
"type": "HfModel",
"config": {
"model_path": "meta-llama/Llama-2-7b-hf",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"model_class": "LlamaForCausalLM",
"task": "text-generation"
}
}
}
},
@ -59,8 +54,6 @@
}
},
"engine": {
"log_severity_level": 0,
"evaluate_input_model": false,
"host": "local_system",
"target": "local_system",
"cache_dir": "cache",

View File

@ -15,5 +15,5 @@ dependencies:
- scipy
- sentencepiece
- torch==2.0.1
- transformers
- transformers>=4.33.2,<= 4.37.2
- git+https://github.com/microsoft/Olive#egg=olive-ai[gpu]

View File

@ -6,20 +6,8 @@
"keyvault_name": "<my_keyvault_name>"
},
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"model_path": {
"type": "azureml_registry_model",
"config": {
@ -28,8 +16,17 @@
"version": "13"
}
},
"model_file_format": "PyTorch.MLflow",
"hf_config": { "model_name": "meta-llama/Llama-2-7b-hf", "task": "text-generation" }
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
}
}
}
},
"systems": {

View File

@ -52,7 +52,7 @@
"In this tutorial, we will use Azure Machine Learning Llama2 curated model. The input model will be automatically downloaded from the [Azure Model catalog](https://ml.azure.com/models/Llama-2-7b/version/13/catalog/registry/azureml-meta):\n",
"```json\n",
"\"input_model\":{\n",
" \"type\": \"PyTorchModel\",\n",
" \"type\": \"HfModel\",\n",
" \"config\": {\n",
" \"model_path\": {\n",
" \"type\": \"azureml_registry_model\",\n",
@ -61,11 +61,6 @@
" \"registry_name\": \"azureml-meta\",\n",
" \"version\": \"13\"\n",
" }\n",
" },\n",
" \"model_file_format\": \"PyTorch.MLflow\",\n",
" \"hf_config\": {\n",
" \"model_name\": \"meta-llama/Llama-2-7b-hf\",\n",
" \"task\": \"text-generation\"\n",
" }\n",
" }\n",
"}\n",

View File

@ -1,25 +1,22 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "meta-llama/Llama-2-7b-hf",
"load_kwargs": {
"attn_implementation": "eager"
},
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32"
],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"model_class": "LlamaForCausalLM",
"from_pretrained_args": { "_attn_implementation": "eager" },
"task": "text-generation"
}
}
}
},

View File

@ -1,24 +1,21 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "meta-llama/Llama-2-7b-hf",
"load_kwargs": {
"attn_implementation": "eager"
},
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"model_class": "LlamaForCausalLM",
"from_pretrained_args": { "_attn_implementation": "eager" },
"task": "text-generation"
}
}
}
},

View File

@ -1,24 +1,21 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "meta-llama/Llama-2-7b-hf",
"load_kwargs": {
"attn_implementation": "eager"
},
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": {
"model_name": "meta-llama/Llama-2-7b-hf",
"model_class": "LlamaForCausalLM",
"from_pretrained_args": { "_attn_implementation": "eager" },
"task": "text-generation"
}
}
}
},

View File

@ -3,4 +3,5 @@ onnx>=1.14.0
optimum>=1.17.0
protobuf==3.20.2
torch
transformers>=4.33.2
# transformers optimizer fusions don't match in newer versions
transformers>=4.33.2,<= 4.37.2

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "mistralai/Mistral-7B-v0.1", "model_class": "MistralForCausalLM" } }
"type": "HfModel",
"config": { "model_path": "mistralai/Mistral-7B-v0.1" }
},
"systems": {
"local_system": {

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "mistralai/Mistral-7B-v0.1", "model_class": "MistralForCausalLM" } }
"type": "HfModel",
"config": { "model_path": "mistralai/Mistral-7B-v0.1" }
},
"systems": {
"local_system": {

View File

@ -38,13 +38,9 @@ When you run the example config for other larger models, you may need
1. change the `model_path` to the one you use in `open_llama_config.json` and `user_script.py`.
```json
"input_model":{
"type": "OptimumModel",
"type": "HfModel",
"config": {
"model_path": "openlm-research/open_llama_3b", // to change based on the model you use
"model_components": ["decoder_model.onnx", "decoder_with_past_model.onnx"],
"hf_config": {
"model_class": "LlamaForCausalLM"
}
}
}
```

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "huggyllama/llama-7b", "task": "text-generation" } }
"config": { "model_path": "huggyllama/llama-7b" }
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },
"data_configs": [

View File

@ -5,21 +5,20 @@
"workspace_name": "<workspace_name>"
},
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "openlm-research/open_llama_3b",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": { "model_name": "openlm-research/open_llama_3b", "task": "text-generation" }
}
}
}
},
"systems": {

View File

@ -1,20 +1,19 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "openlm-research/open_llama_3b",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": { "model_name": "openlm-research/open_llama_3b", "task": "text-generation" }
}
}
}
},
"data_configs": [ { "name": "transformer_token_dummy_data", "type": "TransformersTokenDummyDataContainer" } ],

View File

@ -1,20 +1,19 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "openlm-research/open_llama_3b",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
"attention_mask": { "0": "batch_size", "1": "total_sequence_length" },
"position_ids": { "0": "batch_size", "1": "sequence_length" }
},
"kv_cache": true
},
"hf_config": { "model_name": "openlm-research/open_llama_3b", "task": "text-generation" }
}
}
}
},
"evaluators": {

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "openlm-research/open_llama_7b_v2", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "openlm-research/open_llama_7b_v2" }
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },
"data_configs": [

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "openlm-research/open_llama_7b_v2", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "openlm-research/open_llama_7b_v2" }
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },
"data_configs": [

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "openlm-research/open_llama_7b_v2", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "openlm-research/open_llama_7b_v2" }
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },
"data_configs": [

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "openlm-research/open_llama_7b_v2", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "openlm-research/open_llama_7b_v2" }
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },
"data_configs": [

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "openlm-research/open_llama_3b", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "openlm-research/open_llama_3b" }
},
"systems": {
"local_system": {

View File

@ -92,9 +92,7 @@ def eval_accuracy(model: OliveModelHandler, data_dir, batch_size, device, execut
if model.framework == Framework.PYTORCH:
eval_args = LMEvalParser(
model="hf",
model_args=(
f"pretrained={model.model_path or model.hf_config.model_name},tokenizer={model_id},dtype=float32"
),
model_args=f"pretrained={model.model_path},tokenizer={model_id},dtype=float32",
batch_size=batch_size,
tasks="lambada_openai",
device="cpu",

View File

@ -1,11 +1,13 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"io_config": {
"model_path": "facebook/opt_125m",
"task": "text-generation",
"input_names": [ "input_ids", "attention_mask" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
@ -18,11 +20,6 @@
"ort_present_value_name": "present_value_<id>",
"dtype": "float16"
}
},
"hf_config": {
"model_name": "facebook/opt-125m",
"task": "text-generation",
"from_pretrained_args": { "trust_remote_code": true }
}
}
},

View File

@ -1,13 +1,8 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "microsoft/phi-1_5",
"task": "text-generation",
"from_pretrained_args": { "trust_remote_code": true }
}
}
"type": "HfModel",
"config": { "model_path": "microsoft/phi-1_5" }
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },
"data_configs": [

View File

@ -1,7 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": { "hf_config": { "model_name": "microsoft/phi-2", "task": "text-generation" } }
"type": "HfModel",
"config": { "model_path": "microsoft/phi-2" }
},
"systems": {
"local_system": {

View File

@ -1,11 +1,12 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"model_path": "microsoft/phi-2",
"io_config": {
"input_names": [ "input_ids", "attention_mask", "position_ids" ],
"output_names": [ "logits" ],
"input_shapes": [ [ 2, 8 ], [ 2, 40 ], [ 2, 8 ] ],
"input_shapes": [ [ 2, 8 ], [ 2, 8 ], [ 2, 8 ] ],
"input_types": [ "int32", "int32", "int32" ],
"dynamic_axes": {
"input_ids": { "0": "batch_size", "1": "sequence_length" },
@ -18,11 +19,6 @@
"ort_present_key_name": "present_key_<id>",
"ort_present_value_name": "present_value_<id>"
}
},
"hf_config": {
"model_name": "microsoft/phi-2",
"task": "text-generation",
"from_pretrained_args": { "trust_remote_code": true }
}
}
},

View File

@ -28,8 +28,7 @@ AML_MODEL_Path = {
"model_path": {
"type": "azureml_registry_model",
"config": {"registry_name": "azureml", "name": "Phi-3-mini-4k-instruct", "version": "7"},
},
"model_file_format": "PyTorch.MLflow",
}
}
@ -49,16 +48,20 @@ def get_args(raw_args):
type=str,
default=None,
choices=["qlora", "lora"],
help="Finetune method before onnxruntime optimization. "
"qlora finetuned model cannot be converted to onnx by model builder.",
help=(
"Finetune method before onnxruntime optimization. "
"qlora finetuned model cannot be converted to onnx by model builder."
),
)
parser.add_argument(
"--precision",
type=str,
default="int4",
choices=["fp32", "fp16", "int4"],
help="Choose from fp32 or int4(default) for cpu target; "
"fp32 or fp16 or int4(default) for gpu target; int4(default) for mobile or web",
help=(
"Choose from fp32 or int4(default) for cpu target; "
"fp32 or fp16 or int4(default) for gpu target; int4(default) for mobile or web"
),
)
parser.add_argument(
"--inference",

View File

@ -1,13 +1,7 @@
{
"input_model": {
"type": "PyTorchModel",
"config": {
"hf_config": {
"model_name": "microsoft/Phi-3-mini-4k-instruct",
"task": "text-generation",
"from_pretrained_args": { "trust_remote_code": true }
}
}
"input_model":{
"type": "HfModel",
"config": { "model_path": "microsoft/Phi-3-mini-4k-instruct" }
},
"systems": {
"local_system": {

View File

@ -1,11 +1,8 @@
{
"input_model": {
"type": "PyTorchModel",
"type": "HfModel",
"config": {
"hf_config": {
"model_name": "togethercomputer/RedPajama-INCITE-Base-3B-v1",
"model_class": "GPTNeoXForCausalLM"
}
"model_path": "togethercomputer/RedPajama-INCITE-Base-3B-v1"
}
},
"systems": { "local_system": { "type": "LocalSystem", "config": { "accelerators": [ { "device": "gpu" } ] } } },

View File

@ -3,6 +3,7 @@
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from past_helper import PastKeyValuesHelper
from transformers import AutoConfig, WhisperForConditionalGeneration
from whisper_dataset import WhisperDataset
from whisper_decoder import WhisperDecoder, WhisperDecoderInputs
from whisper_encoder_decoder_init import WhisperEncoderDecoderInit, WhisperEncoderDecoderInitInputs
@ -11,9 +12,8 @@ from olive.data.registry import Registry
from olive.model import PyTorchModelHandler
def get_encoder_decoder_init(olive_model: PyTorchModelHandler):
# model is WhisperForConditionalGeneration
model = olive_model.load_model()
def get_encoder_decoder_init(model_path: str):
model = WhisperForConditionalGeneration.from_pretrained(model_path, attn_implementation="eager")
return WhisperEncoderDecoderInit(
model,
model,
@ -22,9 +22,8 @@ def get_encoder_decoder_init(olive_model: PyTorchModelHandler):
)
def get_decoder(olive_model: PyTorchModelHandler):
# model is WhisperForConditionalGeneration
model = olive_model.load_model()
def get_decoder(model_path: str):
model = WhisperForConditionalGeneration.from_pretrained(model_path, attn_implementation="eager")
return WhisperDecoder(model, model.config)
@ -104,7 +103,7 @@ def get_encdec_io_config(olive_model: PyTorchModelHandler):
def get_dec_io_config(olive_model: PyTorchModelHandler):
# Fix past disappearing bug - duplicate first past entry
# input_list.insert(2, input_list[2])
config = olive_model.get_hf_model_config()
config = AutoConfig.from_pretrained(olive_model.model_path)
past_names = PastKeyValuesHelper.get_past_names(config.decoder_layers, present=False)
present_names = PastKeyValuesHelper.get_past_names(config.decoder_layers, present=True)
present_self_names = present_names[: 2 * config.decoder_layers]
@ -145,7 +144,7 @@ def get_dec_io_config(olive_model: PyTorchModelHandler):
def encoder_decoder_init_dummy_inputs(olive_model: PyTorchModelHandler):
inputs = WhisperEncoderDecoderInitInputs.create_dummy(
olive_model.get_hf_model_config(),
AutoConfig.from_pretrained(olive_model.model_path),
batch_size=2,
encode_sequence_length=3000,
use_decoder_input_ids=True,
@ -157,7 +156,7 @@ def encoder_decoder_init_dummy_inputs(olive_model: PyTorchModelHandler):
def decoder_dummy_inputs(olive_model: PyTorchModelHandler):
inputs = WhisperDecoderInputs.create_dummy(
olive_model.get_hf_model_config(),
AutoConfig.from_pretrained(olive_model.model_path),
batch_size=2,
encode_sequence_length=3000,
past_decode_sequence_length=5,

View File

@ -10,7 +10,7 @@ from urllib import request
from onnxruntime import __version__ as OrtVersion
from packaging import version
from transformers import __version__ as TransformersVersion
from transformers import AutoConfig
SUPPORTED_WORKFLOWS = {
("cpu", "fp32"): ["conversion", "transformers_optimization", "insert_beam_search", "prepost"],
@ -95,7 +95,6 @@ def main(raw_args=None):
# version check
version_1_16 = version.parse(OrtVersion) >= version.parse("1.16.0")
transformers_version_4_36 = version.parse(TransformersVersion) >= version.parse("4.36.0")
# multi-lingual support check
if not version_1_16:
@ -114,10 +113,15 @@ def main(raw_args=None):
template_json = json.load(f)
model_name = args.model_name
# update model name
template_json["input_model"]["config"]["hf_config"]["model_name"] = model_name
if transformers_version_4_36:
template_json["input_model"]["config"]["hf_config"]["from_pretrained_args"] = {"attn_implementation": "eager"}
# update model paths
for model_component in template_json["input_model"]["config"]["model_components"]:
model_component["config"]["model_path"] = model_name
# update model attributes
template_json["input_model"]["config"]["model_attributes"] = model_attributes = AutoConfig.from_pretrained(
model_name
).to_dict()
# remove suppress_tokens since it takes too much space in the config
model_attributes.pop("suppress_tokens", None)
load_dataset_params = template_json["data_configs"][0]["load_dataset_config"]["params"]
load_dataset_params["model_name"] = model_name

View File

@ -60,7 +60,7 @@ def main(raw_args=None):
config = json.load(f)
# get model information
model_name = config["input_model"]["config"]["hf_config"]["model_name"]
model_name = config["input_model"]["config"]["model_components"][0]["config"]["model_path"]
use_audio_decoder = config["passes"]["prepost"]["config"]["tool_command_args"]["use_audio_decoder"]
# check if model is multilingual
multilingual = config["passes"]["insert_beam_search"]["config"].get("use_forced_decoder_ids", False)

View File

@ -1,27 +1,33 @@
{
"input_model": {
"type": "PyTorchModel",
"input_model":{
"type": "CompositeModel",
"config": {
"model_script": "code/user_script.py",
"script_dir": "code",
"hf_config": {
"model_class": "WhisperForConditionalGeneration",
"model_name": "<place_holder>",
"components": [
{
"name": "encoder_decoder_init",
"model_component_names": ["encoder_decoder_init", "decoder"],
"model_components": [
{
"type": "PyTorchModel",
"config" : {
"model_path": "<place_holder>",
"model_script": "code/user_script.py",
"script_dir": "code",
"model_loader": "get_encoder_decoder_init",
"io_config": "get_encdec_io_config",
"component_func": "get_encoder_decoder_init",
"dummy_inputs_func": "encoder_decoder_init_dummy_inputs"
},
{
"name": "decoder",
}
},
{
"type": "PyTorchModel",
"config" : {
"model_path": "<place_holder>",
"model_script": "code/user_script.py",
"script_dir": "code",
"model_loader": "get_decoder",
"io_config": "get_dec_io_config",
"component_func": "get_decoder",
"dummy_inputs_func": "decoder_dummy_inputs"
}
]
}
}
],
"model_attributes": "<place_holder>"
}
},
"systems": {

View File

@ -4,12 +4,14 @@
# --------------------------------------------------------------------------
import json
import logging
import os
import shutil
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import TYPE_CHECKING, Dict, Optional, Union
from olive.common.config_utils import ConfigBase, convert_configs_to_dicts, serialize_to_json, validate_config
from olive.common.constants import DEFAULT_CACHE_DIR, DEFAULT_WORKFLOW_ID
from olive.common.utils import hash_dict, set_nested_dict_value
from olive.resource_path import ResourcePath, create_resource_path, find_all_resources
@ -25,6 +27,19 @@ class CacheSubDirs:
runs: Path
evaluations: Path
resources: Path
mlflow: Path
cloud_cache: Path
@classmethod
def from_cache_dir(cls, cache_dir: Path) -> "CacheSubDirs":
return cls(
models=cache_dir / "models",
runs=cache_dir / "runs",
evaluations=cache_dir / "evaluations",
resources=cache_dir / "resources",
mlflow=cache_dir / "mlflow",
cloud_cache=cache_dir / "cloud_cache",
)
class OliveCache:
@ -36,12 +51,7 @@ class OliveCache:
):
self.cache_dir = Path(cache_dir).resolve()
logger.info("Using cache directory: %s", self.cache_dir)
self.dirs = CacheSubDirs(
models=self.cache_dir / "models",
runs=self.cache_dir / "runs",
evaluations=self.cache_dir / "evaluations",
resources=self.cache_dir / "resources",
)
self.dirs = CacheSubDirs.from_cache_dir(self.cache_dir)
if clean_evaluation_cache and self.dirs.evaluations.exists():
shutil.rmtree(self.dirs.evaluations, ignore_errors=True)
@ -243,7 +253,7 @@ class OliveCache:
with model_jsons[0].open("r") as f:
model_json = serialize_to_json(json.load(f))
if model_json["type"].lower() in ("compositemodel", "compositepytorchmodel"):
if model_json["type"].lower() == "compositemodel":
logger.warning("Saving models of type '%s' is not supported yet.", model_json["type"])
return None
@ -289,3 +299,17 @@ class OliveCache:
json.dump(model_json, f, indent=4)
return model_json
def set_cache_env(self):
"""Set environment variable for the cache directory."""
os.environ["OLIVE_CACHE_DIR"] = str(self.cache_dir)
logger.debug("Set OLIVE_CACHE_DIR: %s", self.cache_dir)
@classmethod
def from_cache_env(cls) -> "OliveCache":
"""Create an OliveCache object from the cache directory environment variable."""
cache_dir = os.environ.get("OLIVE_CACHE_DIR")
if cache_dir is None:
logger.debug("OLIVE_CACHE_DIR environment variable not set. Using default cache directory.")
cache_dir = Path(DEFAULT_CACHE_DIR).resolve() / DEFAULT_WORKFLOW_ID
return cls(cache_dir)
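For context, a minimal sketch of the new cache environment round-trip; the cache directory is a placeholder and the import path for OliveCache is an assumption based on this commit:

from olive.cache import OliveCache  # assumed import path for OliveCache in this commit

cache = OliveCache("./my-olive-cache")  # placeholder cache directory
cache.set_cache_env()  # exports OLIVE_CACHE_DIR for other Olive components

# elsewhere in the same process, the cache can be recovered from the environment
same_cache = OliveCache.from_cache_env()
print(same_cache.dirs.cloud_cache)  # <cache_dir>/cloud_cache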


@ -14,7 +14,13 @@ class OS(str, Enum):
DEFAULT_WORKFLOW_ID = "default_workflow"
DEFAULT_CACHE_DIR = ".olive-cache"
############# Packaging #############
BASE_IMAGE = "mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04"
############# HF #############
DEFAULT_HF_TASK = "text-generation-with-past"


@ -2,9 +2,3 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
class CompositeMixin:
def set_composite_parent(self, cp):
self.composite_parent = cp
def get_composite_parent(self):
return self.composite_parent

olive/common/hf/login.py (new file, 29 lines)

@ -0,0 +1,29 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
import os
logger = logging.getLogger(__name__)
def huggingface_login(token: str):
from huggingface_hub import login
login(token=token)
def aml_runner_hf_login():
hf_login = os.environ.get("HF_LOGIN")
if hf_login:
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
keyvault_name = os.environ.get("KEYVAULT_NAME")
logger.debug("Getting token from keyvault %s", keyvault_name)
credential = DefaultAzureCredential()
secret_client = SecretClient(vault_url=f"https://{keyvault_name}.vault.azure.net/", credential=credential)
token = secret_client.get_secret("hf-token").value
huggingface_login(token)
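A sketch of how the relocated AML login helper is driven purely by environment variables; the key vault name is a placeholder and the token is expected under the secret name "hf-token":

import os

from olive.common.hf.login import aml_runner_hf_login

os.environ["HF_LOGIN"] = "1"  # opt in to Hugging Face login on the runner
os.environ["KEYVAULT_NAME"] = "my-keyvault"  # placeholder key vault name

aml_runner_hf_login()  # no-op when HF_LOGIN is not set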


olive/common/hf/mlflow.py (new file, 50 lines)

@ -0,0 +1,50 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from pathlib import Path
import yaml
def is_mlflow_transformers(model_name_or_path: str) -> bool:
yaml_path = Path(model_name_or_path) / "MLmodel"
if not yaml_path.exists():
return False
with open(yaml_path) as fp:
mlflow_data = yaml.safe_load(fp)
# default flavor is "hftransformersv2" from azureml.evaluate.mlflow>=0.0.8
# "hftransformers" from azureml.evaluate.mlflow<0.0.8
# TODO(trajep): let user specify flavor name if needed
# to support other flavors in mlflow not only hftransformers
flavors = mlflow_data.get("flavors", {})
if not flavors or not any(flavor.startswith("hftransformers") for flavor in flavors):
raise ValueError(
"Invalid MLFlow model format. Please make sure the input model"
" format is same with the result of mlflow.transformers.save_model,"
" or aml_mlflow.hftransformers.save_model from azureml.evaluate.mlflow"
)
return True
def get_pretrained_name_or_path(model_name_or_path: str, name: str) -> str:
if not is_mlflow_transformers(model_name_or_path):
# assumed to be an hf hub id or a local checkpoint
return model_name_or_path
parent_dir = Path(model_name_or_path).resolve()
# assumed to be an mlflow model
pretrained_path = parent_dir / "data" / name
if pretrained_path.exists():
return str(pretrained_path)
# some mlflow models only have model directory
model_dir = parent_dir / "data" / "model"
if model_dir.exists():
return str(model_dir)
raise ValueError("Invalid MLFlow model format.")

olive/common/hf/model_io.py (new file, 120 lines)

@ -0,0 +1,120 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from functools import partial
from itertools import chain
from typing import TYPE_CHECKING, Callable, Dict, Optional
from olive.common.hf.utils import get_feature_from_task, get_model_config, get_tokenizer
from olive.common.utils import get_attr
if TYPE_CHECKING:
from transformers.onnx import OnnxConfig
logger = logging.getLogger(__name__)
# patched version of transformers.onnx.features.supported_features_mapping
# to support additional models in olive
def patched_supported_features_mapping(
*supported_features: str, onnx_config_cls: Optional[str] = None
) -> Dict[str, Callable]:
"""Generate the mapping between supported the features and their corresponding OnnxConfig for a given model.
Args:
*supported_features: The names of the supported features.
onnx_config_cls: The OnnxConfig full name corresponding to the model.
Returns:
The dictionary mapping a feature to an OnnxConfig constructor.
"""
if onnx_config_cls is None:
raise ValueError("A OnnxConfig class must be provided")
from olive.common.hf import onnx_config
config_cls = get_attr(onnx_config, onnx_config_cls)
mapping = {}
for feature in supported_features:
if "-with-past" in feature:
mapping[feature] = partial(config_cls.with_past, task=feature.replace("-with-past", ""))
else:
mapping[feature] = partial(config_cls.from_model_config, task=feature)
return mapping
# TODO(jambayk): switch to optimum backend and make this an optional feature
# remove "feature" entirely from the codebase
def get_onnx_config(model_name: str, task: str, feature: Optional[str] = None, **kwargs) -> "OnnxConfig":
# pylint: disable=protected-access
from transformers.onnx import FeaturesManager
from olive.common.hf.onnx_config import ADDITIONAL_MODEL_TYPES
# patch FeaturesManager._SUPPORTED_MODEL_TYPE to support additional models in olive
for model_type, feature_list in ADDITIONAL_MODEL_TYPES.items():
if model_type in FeaturesManager._SUPPORTED_MODEL_TYPE:
continue
# TODO(trajep): remove the need for unpacking feature_list
features, onnx_config_cls = feature_list
FeaturesManager._SUPPORTED_MODEL_TYPE[model_type] = patched_supported_features_mapping(
*features, onnx_config_cls=onnx_config_cls
)
# if feature is not provided, try to get it from task
# else use "default"
feature = feature or get_feature_from_task(task) or "default"
# don't want to load the model here since all we need is the config
# model loading is expensive computationally and memory-wise for large models
config = get_model_config(model_name, **kwargs)
# recreate the logic for FeaturesManager.check_supported_model_or_raise to get the model_onnx_config
# https://github.com/huggingface/transformers/blob/main/src/transformers/onnx/features.py#L712
model_type = config.model_type.replace("_", "-")
onnx_config = None
try:
model_features = FeaturesManager.get_supported_features_for_model_type(model_type, model_name=model_name)
if feature in model_features:
onnx_config = FeaturesManager.get_config(model_type, feature)(config)
else:
logger.debug(
"%s doesn't support feature %s. Supported features are: %s", model_type, feature, model_features
)
except KeyError:
logger.debug("Model type %s is not supported", model_type)
return onnx_config
def get_model_io_config(model_name: str, task: str, feature: Optional[str] = None, **kwargs):
# just log a debug message if io_config is not found
# this is not a critical error and the caller may not need the io_config
model_config = get_onnx_config(model_name, task, feature, **kwargs)
if not model_config:
return None
inputs = model_config.inputs
outputs = model_config.outputs
if not inputs or not outputs:
# just log a warning and return None, since this is not a critical error
# and following pass may not use the io_config, like OptimumConversion
logger.debug("No inputs or outputs found from hf onnx_config %s. Won't use it to get io config", model_config)
return None
io_config = {}
io_config["input_names"] = list(inputs.keys())
io_config["output_names"] = list(outputs.keys())
io_config["dynamic_axes"] = dict(chain(inputs.items(), outputs.items()))
return io_config
def get_model_dummy_input(model_name: str, task: str, feature: Optional[str] = None, **kwargs):
model_config = get_onnx_config(model_name, task, feature, **kwargs)
if not model_config:
return None
tokenizer = get_tokenizer(model_name)
return model_config.generate_dummy_inputs(tokenizer, framework="pt")
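For illustration, the helpers can be called directly; "gpt2" is only an example of a model type that transformers.onnx supports:

from olive.common.hf.model_io import get_model_dummy_input, get_model_io_config

io_config = get_model_io_config("gpt2", "text-generation-with-past")
if io_config:
    print(io_config["input_names"], io_config["output_names"])

dummy_inputs = get_model_dummy_input("gpt2", "text-generation-with-past")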

Просмотреть файл

olive/common/hf/utils.py (new file, 151 lines)

@ -0,0 +1,151 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from typing import TYPE_CHECKING, Optional, Tuple, Union
from transformers import AutoConfig, AutoModel, AutoTokenizer, GenerationConfig
from olive.common.hf.mappings import FEATURE_TO_PEFT_TASK_TYPE, MODELS_TO_MAX_LENGTH_MAPPING, TASK_TO_FEATURE
from olive.common.hf.mlflow import get_pretrained_name_or_path
if TYPE_CHECKING:
from transformers import PretrainedConfig, PreTrainedModel, PreTrainedTokenizer, PreTrainedTokenizerFast
logger = logging.getLogger(__name__)
def load_model_from_task(task: str, model_name_or_path: str, **kwargs) -> "PreTrainedModel":
"""Load huggingface model from task and model_name_or_path."""
from transformers.pipelines import check_task
task_results = check_task(task.replace("-with-past", ""))
assert isinstance(task_results, tuple)
if len(task_results) == 2:
targeted_task = task_results[0]
elif len(task_results) == 3:
targeted_task = task_results[1]
else:
raise ValueError("unsupported transformers version")
class_tuple = targeted_task["pt"] or (AutoModel,)
model = None
for i, model_class in enumerate(class_tuple):
try:
model = from_pretrained(model_class, model_name_or_path, "model", **kwargs)
logger.debug("Loaded model %s with name_or_path %s", model_class, model_name_or_path)
break
except (OSError, ValueError) as e:
if i == len(class_tuple) - 1:
# len(class_tuple) == 1 covers most common tasks like text-generation, text-classification, etc
# error could be device OOM, device_map: "auto" not supported, etc
# len(class_tuple) > 1: not common - image-segmentation, conversational, etc
# there is no easy way to get tracebacks for earlier failures, so just raise from last
raise
# the ValueError needs to be caught since there can be multiple model_class candidates for a single task.
# if the model_class is not the right one for the task, it raises ValueError and
# the next model_class is tried.
logger.info(
"Failed to load model %s with name_or_path %s.\n kwargs: %s.\n Exception raised: %s",
model_class,
model_name_or_path,
kwargs,
e,
)
# this won't be None since class_tuple is never empty and we only reach here if model loaded successfully
# satisfies linter too
return model
def from_pretrained(cls, model_name_or_path: str, mlflow_dir: str, **kwargs):
"""Call cls.from_pretrained with hf checkpoint or mlflow model.
If the model_name_or_path is an MLFlow model, the corresponding subdirectory is used.
"""
return cls.from_pretrained(get_pretrained_name_or_path(model_name_or_path, mlflow_dir), **kwargs)
def get_model_config(model_name_or_path: str, **kwargs) -> "PretrainedConfig":
"""Get HF Config for the given model_name_or_path."""
return from_pretrained(AutoConfig, model_name_or_path, "config", **kwargs)
def save_model_config(config: Union["PretrainedConfig", "GenerationConfig"], output_dir: str, **kwargs):
"""Save input HF Config to output directory."""
config.save_pretrained(output_dir, **kwargs)
def get_generation_config(model_name_or_path: str, **kwargs) -> Optional["GenerationConfig"]:
"""Get HF model's generation config for the given model_name_or_path. If not found, return None."""
try:
return from_pretrained(GenerationConfig, model_name_or_path, "model", **kwargs)
except OSError:
return None
def get_tokenizer(model_name_or_path: str, **kwargs) -> Union["PreTrainedTokenizer", "PreTrainedTokenizerFast"]:
"""Get HF model's tokenizer."""
return from_pretrained(AutoTokenizer, model_name_or_path, "tokenizer", **kwargs)
def save_tokenizer(
tokenizer: Union["PreTrainedTokenizer", "PreTrainedTokenizerFast"], output_dir: str, **kwargs
) -> Tuple[str]:
"""Save input tokenizer to output directory."""
return tokenizer.save_pretrained(output_dir, **kwargs)
# TODO(jambayk): Remove this once we transition away from using "feature"
def get_feature_from_task(task: str, fail_on_not_found=False) -> str:
"""Get feature from task."""
feature = TASK_TO_FEATURE.get(task.replace("-with-past", ""), None)
not_found_msg = f"There is no feature for task {task}"
if feature is None and fail_on_not_found:
raise ValueError(not_found_msg)
elif feature is None:
logger.warning(not_found_msg)
elif task.endswith("-with-past"):
feature += "-with-past"
return feature
def get_peft_task_type_from_task(task: str, fail_on_not_found=False) -> str:
"""Get peft task type from feature."""
feature = get_feature_from_task(task)
peft_task_type = FEATURE_TO_PEFT_TASK_TYPE.get(feature.replace("-with-past", ""), None) if feature else None
not_found_msg = f"There is no peft task type for task {task}"
if peft_task_type is None and fail_on_not_found:
raise ValueError(not_found_msg)
elif peft_task_type is None:
logger.warning(not_found_msg)
return peft_task_type
def get_model_max_length(model_name_or_path: str, fail_on_not_found=False) -> int:
"""Get max length of the model, extracted from the config."""
model_config = get_model_config(model_name_or_path)
model_type = model_config.model_type
max_length = MODELS_TO_MAX_LENGTH_MAPPING.get(model_type, None)
if isinstance(max_length, int):
return max_length
elif isinstance(max_length, str):
return getattr(model_config, max_length)
else:
logger.debug(
"No max length mapping found in MODELS_TO_MAX_LENGTH_MAPPING for model type %s, trying __default__",
model_type,
)
default_max_length = MODELS_TO_MAX_LENGTH_MAPPING["__default__"]
try:
return getattr(model_config, default_max_length)
except AttributeError:
not_found_msg = f"Could not find max length for model type {model_type}"
if fail_on_not_found:
raise ValueError(not_found_msg) from None
else:
logger.warning(not_found_msg)
return None
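A usage sketch of the relocated helpers ("gpt2" is illustrative):

from olive.common.hf.utils import get_model_max_length, get_tokenizer, load_model_from_task

# "-with-past" is stripped before the transformers task is resolved
model = load_model_from_task("text-generation-with-past", "gpt2", torch_dtype="auto")
tokenizer = get_tokenizer("gpt2")
print(get_model_max_length("gpt2"))  # resolved via MODELS_TO_MAX_LENGTH_MAPPING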


@ -8,7 +8,7 @@ from pathlib import Path
from typing import Optional, Union
def import_module_from_file(module_path: Union[Path, str], module_name: str = None):
def import_module_from_file(module_path: Union[Path, str], module_name: Optional[str] = None):
module_path = Path(module_path).resolve()
if not module_path.exists():
raise ValueError(f"{module_path} doesn't exist")


@ -16,7 +16,7 @@ import subprocess
import tempfile
import time
from pathlib import Path
from typing import Dict, List, Tuple, Union
from typing import Dict, List, Optional, Tuple, Union
from olive.common.constants import OS
@ -171,6 +171,13 @@ def set_nested_dict_value(dictionary: dict, key: Union[str, Tuple, List[str]], n
dictionary[key[-1]] = new_value
def dict_diff(dict1: Optional[dict], dict2: Optional[dict]) -> Optional[dict]:
"""Return all members of dict1 that are not in dict2 or have different values."""
dict1 = dict1 or {}
dict2 = dict2 or {}
return {k: v for k, v in dict1.items() if k not in dict2 or dict2[k] != v} or None
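dict_diff is what the handlers below use to avoid re-serializing inherited attributes; a tiny illustration:

from olive.common.utils import dict_diff

parent = {"vocab_size": 50257, "num_hidden_layers": 12}
child = {"vocab_size": 50257, "num_hidden_layers": 24, "world_size": 2}

# only entries of the first dict that are missing from or differ in the second survive
assert dict_diff(child, parent) == {"num_hidden_layers": 24, "world_size": 2}
assert dict_diff(parent, parent) is None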
def retry_func(func, args=None, kwargs=None, max_tries=3, delay=5, backoff=2, exceptions=None):
"""Retry a function call using an exponential backoff.
@ -288,27 +295,6 @@ def find_submodules(module, submodule_types, full_name=False):
return list(submodules) if submodules else None
def huggingface_login(token: str):
from huggingface_hub import login
login(token=token)
def aml_runner_hf_login():
hf_login = os.environ.get("HF_LOGIN")
if hf_login:
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
keyvault_name = os.environ.get("KEYVAULT_NAME")
logger.debug("Getting token from keyvault %s", keyvault_name)
credential = DefaultAzureCredential()
secret_client = SecretClient(vault_url=f"https://{keyvault_name}.vault.azure.net/", credential=credential)
token = secret_client.get_secret("hf-token").value
huggingface_login(token)
def all_files(path, ignore=None):
"""Find all files in a directory recursively, optionally ignoring some paths.


@ -23,7 +23,6 @@ class ModelFileFormat(str, Enum):
PYTORCH_ENTIRE_MODEL = "PyTorch.EntireModel"
PYTORCH_STATE_DICT = "PyTorch.StateDict"
PYTORCH_TORCH_SCRIPT = "PyTorch.TorchScript"
PYTORCH_MLFLOW_MODEL = "PyTorch.MLflow"
PYTORCH_SLICE_GPT_MODEL = "PyTorch.SliceGPT"
TENSORFLOW_PROTOBUF = "TensorFlow.Protobuf"
TENSORFLOW_SAVED_MODEL = "TensorFlow.SavedModel"


@ -11,6 +11,7 @@ import numpy as np
import torch
from torch.utils.data import Dataset as TorchDataset
from olive.common.hf.utils import get_model_config
from olive.common.utils import find_first_matched_value, resolve_torch_dtype
from olive.constants import Framework
@ -343,9 +344,7 @@ class TransformersDummyDataset(BaseDataset):
# can instead write dummy input functions like 'get_merged_decoder_with_past_dummy_inputs' if needed
# Using Namespace class to access dict items like class attributes
from transformers import AutoConfig
model_attributes = AutoConfig.from_pretrained(model_name, trust_remote_code=trust_remote_code).__dict__
model_attributes = get_model_config(model_name, trust_remote_code=trust_remote_code).to_dict()
world_size = model_attributes.get("world_size", 1)
vocab_size = model_attributes.get("vocab_size", 50256)
input_ids = torch.randint(low=0, high=vocab_size, size=(seq_len,), dtype=torch.int64)
@ -371,7 +370,7 @@ class TransformersDummyDataset(BaseDataset):
Shape of past_key_values is (num_heads, past_seq_len, head_size).
"""
from olive.model.utils.hf_mappings import (
from olive.common.hf.mappings import (
HIDDEN_SIZE_NAMES,
NUM_HEADS_NAMES,
NUM_HIDDEN_LAYER_NAMES,


@ -7,6 +7,7 @@
from copy import deepcopy
from typing import Any, Dict, List, Optional
from olive.common.hf.utils import get_model_config, get_tokenizer
from olive.data.component.dataset import BaseDataset
from olive.data.component.text_generation import (
TextGenDatasetType,
@ -78,10 +79,9 @@ def huggingface_pre_process(
object: Pre-processed data.
"""
from transformers import AutoConfig, AutoTokenizer
def _tokenizer_and_align_labels(examples):
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=trust_remote_code)
tokenizer = get_tokenizer(model_name, trust_remote_code=trust_remote_code)
tokenized_inputs = tokenizer(
*[examples[input_col] for input_col in input_cols],
padding=kwargs.get("padding", True),
@ -100,9 +100,7 @@ def huggingface_pre_process(
# align_labels -> align_labels_with_mapping
# Also to support customized operation arguments from users
if kwargs.pop("align_labels", False):
model_hf_config = AutoConfig.from_pretrained(
model_config_path or model_name, trust_remote_code=trust_remote_code
)
model_hf_config = get_model_config(model_config_path or model_name, trust_remote_code=trust_remote_code)
if model_hf_config and model_hf_config.label2id:
dataset = dataset.align_labels_with_mapping(model_hf_config.label2id, label_cols[0])
@ -118,7 +116,6 @@ def ner_huggingface_preprocess(
dataset, model_name, input_cols, label_cols, max_samples=None, trust_remote_code=None, **kwargs
):
"""Pre-process data for ner task."""
from transformers import AutoTokenizer
def _align_labels_with_tokens(labels, word_ids):
new_labels = []
@ -142,7 +139,7 @@ def ner_huggingface_preprocess(
return new_labels
def _tokenizer_and_align_labels(examples):
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=trust_remote_code)
tokenizer = get_tokenizer(model_name, trust_remote_code=trust_remote_code)
tokenized_inputs = tokenizer(
*[examples[input_col] for input_col in input_cols],
padding=kwargs.get("padding", True),
@ -193,15 +190,13 @@ def text_generation_huggingface_pre_process(
Note: the TextGenCorpusParams and TextGenPairParams subclasses already include the common arguments.
"""
from transformers import AutoTokenizer
all_kwargs = deepcopy(kwargs)
# task is not used in the pre-process function. Will pop it so that the config validation doesn't warn about
# unused kwargs
all_kwargs.pop("task", None)
all_kwargs.update({"max_samples": max_samples, "source_max_len": source_max_len})
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=trust_remote_code)
tokenizer = get_tokenizer(model_name, trust_remote_code=trust_remote_code)
if dataset_type == TextGenDatasetType.CORPUS:
return text_gen_corpus_pre_process(dataset, tokenizer, all_kwargs)
@ -240,12 +235,12 @@ def audio_classification_pre_process(
"""
from datasets import Audio
from transformers import AutoConfig, AutoFeatureExtractor
from transformers import AutoFeatureExtractor
assert len(input_cols) == 1, "Only one input column is supported for audio classification task."
# align labels with model configs
model_config = AutoConfig.from_pretrained(model_name, trust_remote_code=trust_remote_code)
model_config = get_model_config(model_name, trust_remote_code=trust_remote_code)
labels_to_filter = kwargs.get("labels_to_filter", None) or []
dataset = dataset.filter(
lambda x: x not in dataset.features["label"].str2int(labels_to_filter), input_columns=label_cols[0]


@ -193,6 +193,8 @@ class DataConfig(ConfigBase):
if config and config.params:
task_type = config.params.get("task")
if task_type:
task_specific_override = dc_cls.task_type_components_map.get(task_type, {}).get(component_name)
task_specific_override = dc_cls.task_type_components_map.get(
task_type.replace("-with-past", ""), {}
).get(component_name)
if task_specific_override:
default_components_type[component_name] = task_specific_override


@ -8,11 +8,12 @@ import logging
import tempfile
from copy import deepcopy
from pathlib import Path
from typing import Any, Dict
from typing import Any, Dict, Optional
from olive.common.config_utils import ConfigBase
from olive.common.utils import get_credentials, hash_dict
from olive.model.config.model_config import ModelConfig
from olive.resource_path import create_resource_path
logger = logging.getLogger(__name__)
@ -67,14 +68,19 @@ class CloudCacheHelper:
return model_config
def get_hash_key(self, model_config: ModelConfig, pass_search_point: Dict[str, Any], input_model_hash: str):
def get_hash_key(
self, model_config: ModelConfig, pass_search_point: Dict[str, Any], input_model_hash: Optional[str]
):
hf_hub_model_commit_id = None
model_config_copy = deepcopy(model_config)
if input_model_hash is None:
if (
input_model_hash is None
and model_config.type.lower() == "hfmodel"
and create_resource_path(model_config.config["model_path"]).is_string_name()
):
from huggingface_hub import repo_info
if model_config.has_hf_config():
hf_hub_model_commit_id = repo_info(model_config.get_hf_model_name()).sha
hf_hub_model_commit_id = repo_info(model_config.config["model_path"]).sha
else:
model_config_copy.config.pop("model_path", None)
return hash_dict(


@ -12,10 +12,10 @@ from datetime import datetime
from pathlib import Path
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Type, Union
from olive.cache import OliveCache
from olive.common.config_utils import validate_config
from olive.common.constants import DEFAULT_WORKFLOW_ID
from olive.common.constants import DEFAULT_CACHE_DIR, DEFAULT_WORKFLOW_ID
from olive.common.utils import hash_dict
from olive.engine.cache import OliveCache
from olive.engine.cloud_cache_helper import (
CloudCacheHelper,
check_model_cache,
@ -31,7 +31,7 @@ from olive.evaluator.olive_evaluator import OliveEvaluatorConfig
from olive.exception import EXCEPTIONS_TO_RAISE, OlivePassError
from olive.hardware import AcceleratorSpec
from olive.model import ModelConfig
from olive.resource_path import ResourceType, create_resource_path
from olive.resource_path import create_resource_path
from olive.strategy.search_strategy import SearchStrategy, SearchStrategyConfig
from olive.systems.common import SystemType
from olive.systems.system_config import SystemConfig
@ -59,7 +59,7 @@ class Engine:
host: Optional[Union[Dict[str, Any], "SystemConfig"]] = None,
target: Optional[Union[Dict[str, Any], "SystemConfig"]] = None,
evaluator: Optional[Union[Dict[str, Any], "OliveEvaluatorConfig"]] = None,
cache_dir: str = ".olive-cache",
cache_dir: str = DEFAULT_CACHE_DIR,
clean_cache: bool = False,
clean_evaluation_cache: bool = False,
plot_pareto_frontier: bool = False,
@ -111,6 +111,10 @@ class Engine:
def initialize(self):
"""Initialize engine state. This should be done before running the registered passes."""
# set cache dir environment variables
# might be used by other parts of olive to cache data
self.cache.set_cache_env()
# clean pass run cache if requested
# removes all run cache for pass type and all children elements
for pass_config in self.pass_config.values():
@ -473,7 +477,7 @@ class Engine:
output_name=f"{pass_output_name}_model",
overwrite=True,
)
# it is not supported to save compositepytorchmodel/compositemodel again
# it is not supported to save compositemodel again
# so the output_model_json could be None
output_models[pass_output_model_id] = output_model_json
@ -797,19 +801,18 @@ class Engine:
output_model_hash = None
if cloud_cache_config.enable_cloud_cache:
if (
model_config.config.get("model_path")
and create_resource_path(model_config.config.get("model_path")) == ResourceType.StringName
if not (
model_config.type.lower() == "hfmodel"
and create_resource_path(model_config.config.get("model_path")).is_string_name()
):
logger.warning(
"Model path is a str name, should not use cloud model cache. Set enable_cloud_cache=False."
"Only HfModel with huggingface id as model_path is supported by cloud cache. Setting"
" enable_cloud_cache=False."
)
cloud_cache_config.enable_cloud_cache = False
else:
cloud_cache_dir = Path(self.cache_dir) / "cloud_models"
cloud_cache_dir.mkdir(parents=True, exist_ok=True)
self.cloud_cache_helper = CloudCacheHelper(
cloud_cache_dir,
self.cache.dirs.cloud_cache,
cloud_cache_config.account_url,
cloud_cache_config.contaier_name,
cloud_cache_config.input_model_config,


@ -3,7 +3,6 @@
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from olive.model.config import ModelConfig
from olive.model.config.hf_config import HfFromPretrainedArgs
from olive.model.handler import * # noqa: F403
__all__ = ["ModelConfig", "HfFromPretrainedArgs"]
__all__ = ["ModelConfig"]


@ -2,7 +2,7 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from olive.model.config.hf_config import HfComponent, HfConfig
from olive.model.config.hf_config import HfLoadKwargs
from olive.model.config.io_config import (
IoConfig,
complete_kv_cache_with_model_attributes,
@ -12,8 +12,7 @@ from olive.model.config.kv_cache_config import KVCacheConfig
from olive.model.config.model_config import ModelConfig
__all__ = [
"HfComponent",
"HfConfig",
"HfLoadKwargs",
"IoConfig",
"KVCacheConfig",
"ModelConfig",


@ -4,38 +4,19 @@
# --------------------------------------------------------------------------
import logging
from copy import deepcopy
from typing import Any, Callable, Dict, List, Union
from typing import Any, Dict, Union
import torch
import transformers
from olive.common.config_utils import ConfigBase, ConfigWithExtraArgs
from olive.common.config_utils import ConfigWithExtraArgs
from olive.common.pydantic_v1 import Field, validator
from olive.common.utils import resolve_torch_dtype
from olive.model.config.io_config import IoConfig
logger = logging.getLogger(__name__)
class HfComponent(ConfigBase):
"""Used for Hf models which has multiple components, such as whisper.
For example, in the Whisper model example, the component looks like:
{
"name": "encoder_decoder_init",
"io_config": "get_encdec_io_config",
"component_func": "get_encoder_decoder_init",
"dummy_inputs_func": "encoder_decoder_init_dummy_inputs"
}
"""
name: str
io_config: Union[IoConfig, Dict[str, Any], str, Callable]
component_func: Union[str, Callable] = None
dummy_inputs_func: Union[str, Callable]
class HfFromPretrainedArgs(ConfigWithExtraArgs):
class HfLoadKwargs(ConfigWithExtraArgs):
"""Arguments to pass to the `from_pretrained` method of the model class.
Refer to https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.from_pretrained
@ -139,7 +120,7 @@ class HfFromPretrainedArgs(ConfigWithExtraArgs):
)
return v
def get_loading_args(self) -> Dict[str, Any]:
def get_load_kwargs(self) -> Dict[str, Any]:
"""Return all args in a dict with types expected by `from_pretrained`."""
loading_args = {}
# copy args that can be directly copied
@ -215,52 +196,3 @@ class HfFromPretrainedArgs(ConfigWithExtraArgs):
if extras:
logger.warning("Unused kwargs in quantization_config: %s. Ignoring them", extras)
return config
class HfConfig(ConfigBase):
"""The config for HuggingFace models.
For example, the config for the Whisper model looks like:
"model_class": "WhisperForConditionalGeneration",
"model_name": "openai/whisper-tiny.en",
"components": [
{
"name": "encoder_decoder_init",
"io_config": "get_encdec_io_config",
"component_func": "get_encoder_decoder_init",
"dummy_inputs_func": "encoder_decoder_init_dummy_inputs"
},
{
"name": "decoder",
"io_config": "get_dec_io_config",
"component_func": "get_decoder",
"dummy_inputs_func": "decoder_dummy_inputs"
}
]
"""
model_name: str = None
task: str = None
# feature is optional if task is specified and don't need past
# else, provide feature such as "causal-lm-with-past"
feature: str = None
# TODO(xiaoyu): remove model_class and only use task
model_class: str = None
components: List[HfComponent] = None
from_pretrained_args: HfFromPretrainedArgs = None
@validator("model_class", always=True)
def task_or_model_class_required(cls, v, values):
if values["model_name"] and not v and not values.get("task", None):
raise ValueError("Either task or model_class must be specified")
return v
def get_loading_args_from_pretrained(self) -> Dict[str, Any]:
"""Return all args from from_pretrained_args in a dict with types expected by `from_pretrained`."""
return self.from_pretrained_args.get_loading_args() if self.from_pretrained_args else {}
def get_model_type_from_hf_config(hf_config: HfConfig) -> str:
from olive.model.utils.hf_utils import get_hf_model_config
return get_hf_model_config(hf_config.model_name, **hf_config.get_loading_args_from_pretrained()).model_type
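A minimal sketch of the renamed kwargs container; the field names used here (torch_dtype, trust_remote_code) are assumed to be among its fields:

from olive.model.config.hf_config import HfLoadKwargs

load_kwargs = HfLoadKwargs(torch_dtype="float16", trust_remote_code=True)

# get_load_kwargs() replaces the old get_loading_args() and returns plain
# keyword arguments for from_pretrained
print(load_kwargs.get_load_kwargs())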


@ -6,10 +6,10 @@ from copy import deepcopy
from typing import Any, Dict, List, Union
from olive.common.config_utils import ConfigBase
from olive.common.hf.mappings import HIDDEN_SIZE_NAMES, NUM_HEADS_NAMES, NUM_HIDDEN_LAYER_NAMES
from olive.common.pydantic_v1 import validator
from olive.common.utils import find_first_matched_value
from olive.model.config.kv_cache_config import KVCacheConfig
from olive.model.utils.hf_mappings import HIDDEN_SIZE_NAMES, NUM_HEADS_NAMES, NUM_HIDDEN_LAYER_NAMES
class IoConfig(ConfigBase):
@ -122,12 +122,14 @@ def complete_kv_cache_with_model_attributes(kv_cache, model_attributes):
num_hidden_layers = find_first_matched_value(model_attributes, NUM_HIDDEN_LAYER_NAMES)
num_attention_heads = find_first_matched_value(model_attributes, NUM_HEADS_NAMES)
hidden_size = find_first_matched_value(model_attributes, HIDDEN_SIZE_NAMES)
world_size = model_attributes.get("world_size", 1)
kv_cache_obj = None
if isinstance(kv_cache, bool) and kv_cache:
kv_cache_obj = KVCacheConfig(
num_hidden_layers=num_hidden_layers,
num_attention_heads=num_attention_heads,
hidden_size=hidden_size,
world_size=world_size,
)
elif isinstance(kv_cache, dict):
kv_cache_dict = deepcopy(kv_cache)
@ -136,6 +138,7 @@ def complete_kv_cache_with_model_attributes(kv_cache, model_attributes):
"num_hidden_layers": kv_cache.get("num_hidden_layers") or num_hidden_layers,
"num_attention_heads": kv_cache.get("num_attention_heads") or num_attention_heads,
"hidden_size": kv_cache.get("hidden_size") or hidden_size,
"world_size": kv_cache.get("world_size") or world_size,
}
)
kv_cache_obj = KVCacheConfig.parse_obj(kv_cache_dict)
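With the new field, completing a kv_cache from model attributes now also propagates the tensor-parallel world size; a rough sketch with made-up attribute values:

from olive.model.config.io_config import complete_kv_cache_with_model_attributes

model_attributes = {"num_hidden_layers": 32, "num_attention_heads": 32, "hidden_size": 4096, "world_size": 2}
kv_cache = complete_kv_cache_with_model_attributes(True, model_attributes)
print(kv_cache.world_size)  # 2, assuming KVCacheConfig gained a world_size field in this commit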


@ -2,6 +2,7 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from itertools import chain
from typing import Dict, Optional
from olive.common.config_utils import ConfigBase
@ -70,17 +71,14 @@ class KVCacheConfig(ConfigBase):
else:
return [self.ort_present_value_name.replace("<id>", str(i)) for i in range(self.num_hidden_layers)]
def get_ort_past_key_names(self):
return self._get_k_names("inputs")
def _get_kv_names(self, direction="inputs"):
return list(chain.from_iterable(zip(self._get_k_names(direction), self._get_v_names(direction))))
def get_ort_past_value_names(self):
return self._get_v_names("inputs")
def get_ort_past_kv_names(self):
return self._get_kv_names("inputs")
def get_ort_present_key_names(self):
return self._get_k_names("outputs")
def get_ort_present_value_names(self):
return self._get_v_names("outputs")
def get_ort_present_kv_names(self):
return self._get_kv_names("outputs")
def _get_kv_shape(self):
return [
@ -91,24 +89,20 @@ class KVCacheConfig(ConfigBase):
]
def get_input_names_shapes_types(self):
input_names = [*self.get_ort_past_key_names(), *self.get_ort_past_value_names()]
input_names = self.get_ort_past_kv_names()
input_shapes = [self._get_kv_shape()] * 2 * self.num_hidden_layers
input_types = [self.dtype] * 2 * self.num_hidden_layers
return input_names, input_shapes, input_types
def get_output_names(self):
return [*self.get_ort_present_key_names(), *self.get_ort_present_value_names()]
return self.get_ort_present_kv_names()
def get_dynamic_axes(self):
dynamic_axis = {}
for past_name in self.get_ort_past_key_names():
dynamic_axis[past_name] = self.past_kv_dynamic_axis
for past_name in self.get_ort_past_value_names():
for past_name in self.get_ort_past_kv_names():
dynamic_axis[past_name] = self.past_kv_dynamic_axis
for present_name in self.get_ort_present_key_names():
dynamic_axis[present_name] = self.present_kv_dynamic_axis
for present_name in self.get_ort_present_value_names():
for present_name in self.get_ort_present_kv_names():
dynamic_axis[present_name] = self.present_kv_dynamic_axis
return dynamic_axis
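The consolidated helpers interleave key and value names per layer; a sketch with made-up dimensions and the default ONNX name templates assumed:

from olive.model.config.kv_cache_config import KVCacheConfig

kv_cache = KVCacheConfig(num_hidden_layers=2, num_attention_heads=12, hidden_size=768)
# e.g. past_key_values.0.key, past_key_values.0.value, past_key_values.1.key, ...
print(kv_cache.get_ort_past_kv_names())
print(kv_cache.get_output_names())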


@ -9,62 +9,7 @@ from olive.resource_path import create_resource_path
class ModelConfig(ConfigBase):
"""Input model config which will be used to create the model handler.
For example, the config looks like for llama2:
.. code-block:: json
{
"input_model": {
"type": "CompositePyTorchModel",
"config": {
"model_path": "llama_v2",
"generative": False,
"model_components": [
{
"name": "decoder_model",
"type": "PyTorchModel",
"config": {
"model_script": "user_script.py",
"io_config": {
"input_names": ["tokens", "position_ids", "attn_mask", //...],
"output_names": ["logits", "attn_mask_out", //...],
"dynamic_axes": {
"tokens": { "0": "batch_size", "1": "seq_len" },
"position_ids": { "0": "batch_size", "1": "seq_len" },
"attn_mask": { "0": "batch_size", "1": "max_seq_len" },
//...
}
},
"model_loader": "load_decoder_model",
"dummy_inputs_func": "decoder_inputs"
}
},
{
"name": "decoder_with_past_model",
"type": "PyTorchModel",
"config": {
"model_script": "user_script.py",
"io_config": {
"input_names": ["tokens_increment", "position_ids_increment", "attn_mask", //...],
"output_names": ["logits", "attn_mask_out", //...],
"dynamic_axes": {
"tokens_increment": { "0": "batch_size", "1": "seq_len_increment" },
"position_ids_increment": { "0": "batch_size", "1": "seq_len_increment" },
"attn_mask": { "0": "batch_size", "1": "max_seq_len" },
//...
}
},
"model_loader": "load_decoder_with_past_model",
"dummy_inputs_func": "decoder_with_past_inputs"
}
}
]
}
}
}
"""
"""Input model config which will be used to create the model handler."""
type: str = Field(description="The type of the model handler.")
config: dict = Field(description="The config for the model handler. Used to initialize the model handler.")
@ -84,14 +29,6 @@ class ModelConfig(ConfigBase):
resources = self.get_resource_strings()
return {k: create_resource_path(v) for k, v in resources.items()}
def get_hf_model_name(self):
if self.has_hf_config():
return self.config["hf_config"].get("model_name")
return None
def has_hf_config(self):
return self.config.get("hf_config") is not None
def create_model(self):
cls = get_model_handler(self.type)
return cls(**self.config)
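For reference, a sketch of how an HfModel config dict becomes a handler under the slimmed-down ModelConfig (the model id is just an example):

from olive.model import ModelConfig

model_config = ModelConfig.parse_obj(
    {
        "type": "HfModel",
        "config": {"model_path": "microsoft/phi-2", "task": "text-generation-with-past"},
    }
)
handler = model_config.create_model()  # an HfModelHandler instance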


@ -3,10 +3,11 @@
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from olive.model.handler.base import OliveModelHandler
from olive.model.handler.composite import CompositeModelHandler, CompositePyTorchModelHandler
from olive.model.handler.composite import CompositeModelHandler
from olive.model.handler.hf import DistributedHfModelHandler, HfModelHandler
from olive.model.handler.onnx import DistributedOnnxModelHandler, ONNXModelHandler
from olive.model.handler.openvino import OpenVINOModelHandler
from olive.model.handler.pytorch import DistributedPyTorchModelHandler, PyTorchModelHandler
from olive.model.handler.pytorch import PyTorchModelHandler
from olive.model.handler.qnn import QNNModelHandler
from olive.model.handler.snpe import SNPEModelHandler
from olive.model.handler.tensorflow import TensorFlowModelHandler
@ -14,11 +15,11 @@ from olive.model.handler.tensorflow import TensorFlowModelHandler
__all__ = [
"OliveModelHandler",
"CompositeModelHandler",
"CompositePyTorchModelHandler",
"DistributedHfModelHandler",
"DistributedOnnxModelHandler",
"HfModelHandler",
"ONNXModelHandler",
"OpenVINOModelHandler",
"DistributedPyTorchModelHandler",
"PyTorchModelHandler",
"QNNModelHandler",
"SNPEModelHandler",


@ -6,13 +6,13 @@ from olive.common.config_utils import validate_config
from olive.constants import Framework, ModelFileFormat
from olive.hardware.accelerator import Device
from olive.model.config import IoConfig
from olive.model.handler.mixin import CompositeMixin, IoConfigMixin, JsonMixin, ResourceMixin
from olive.model.handler.mixin import IoConfigMixin, JsonMixin, ResourceMixin
from olive.resource_path import OLIVE_RESOURCE_ANNOTATIONS
logger = logging.getLogger(__name__)
class OliveModelHandler(ABC, ResourceMixin, IoConfigMixin, JsonMixin, CompositeMixin):
class OliveModelHandler(ABC, ResourceMixin, IoConfigMixin, JsonMixin):
"""Abstraction for logical "Model", it contains model path and related metadata.
Each technique accepts Model as input, return Model as output.


@ -3,10 +3,10 @@
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from copy import deepcopy
from typing import Any, Dict, List, Optional, Tuple, Union
from olive.common.config_utils import serialize_to_json, validate_config
from olive.common.utils import dict_diff
from olive.constants import Framework, ModelFileFormat
from olive.hardware.accelerator import Device
from olive.model.config.model_config import ModelConfig
@ -39,18 +39,23 @@ class CompositeModelHandler(OliveModelHandler):
model_file_format=ModelFileFormat.COMPOSITE_MODEL,
model_attributes=model_attributes,
)
if isinstance(model_components[0], dict):
self.model_components = [validate_config(m, ModelConfig).create_model() for m in model_components]
else:
assert all(
isinstance(m, OliveModelHandler) for m in model_components
), "All components must be OliveModelHandler"
self.model_components = model_components
self._model_components = [
validate_config(m, ModelConfig).create_model() if isinstance(m, dict) else m for m in model_components
]
assert all(
isinstance(m, OliveModelHandler) for m in self._model_components
), "All components must be OliveModelHandler or dict"
assert len(self.model_components) == len(model_component_names), "Number of components and names must match"
assert len(self._model_components) == len(model_component_names), "Number of components and names must match"
self.model_component_names = model_component_names
for m in self.model_components:
m.set_composite_parent(self)
@property
def model_components(self):
for m in self._model_components:
# the parent attributes should be inherited by the child model
# child attributes take precedence
m.model_attributes = {**(self.model_attributes or {}), **(m.model_attributes or {})}
yield m
def to_json(self, check_object: bool = False):
json_dict = {
@ -58,8 +63,13 @@ class CompositeModelHandler(OliveModelHandler):
"config": {"model_attributes": self.model_attributes, "model_component_names": self.model_component_names},
}
json_dict["config"]["model_components"] = []
for m in self.model_components:
json_dict["config"]["model_components"].append(m.to_json(check_object))
for m in self._model_components:
component_json = m.to_json(check_object)
# only keep attributes that are different from the parent
component_json["config"]["model_attributes"] = dict_diff(
component_json["config"]["model_attributes"], self.model_attributes
)
json_dict["config"]["model_components"].append(component_json)
return serialize_to_json(json_dict, check_object)
@ -85,35 +95,3 @@ class CompositeModelHandler(OliveModelHandler):
**kwargs: Dict[str, Any],
) -> Any:
raise RuntimeError("CompositeModelHandler doesn't have a session of its own")
@model_handler_registry("CompositePyTorchModel")
class CompositePyTorchModelHandler(CompositeModelHandler):
"""The CompositePyTorchModel handler.
Its main responsibility is to create a list of child PyTorch models used to initialize a composite model.
"""
def __init__(self, model_components: List[Dict[str, Any]], **kwargs):
model_names = []
pytorch_models = []
for model_config in model_components:
config_copy = deepcopy(model_config)
assert "name" in config_copy
model_name = config_copy["name"]
del config_copy["name"]
model_names.append(model_name)
pytorch_models.append(validate_config(config_copy, ModelConfig).create_model())
kwargs_inner = {}
kwargs_inner["model_components"] = pytorch_models
kwargs_inner["model_component_names"] = model_names
if "model_attributes" in kwargs:
kwargs_inner["model_attributes"] = kwargs["model_attributes"]
if "model_path" in kwargs:
logger.warning("model_path is not used in CompositePyTorchModelHandler")
super().__init__(**kwargs_inner)
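Whisper-style composites now go through CompositeModelHandler directly; a rough sketch with hypothetical component paths:

from olive.model import CompositeModelHandler

composite = CompositeModelHandler(
    model_components=[
        {"type": "PyTorchModel", "config": {"model_path": "encoder_decoder_init.pt"}},
        {"type": "PyTorchModel", "config": {"model_path": "decoder.pt"}},
    ],
    model_component_names=["encoder_decoder_init", "decoder"],
    model_attributes={"model_type": "whisper"},
)

for name, component in zip(composite.model_component_names, composite.model_components):
    # child components inherit the parent's model_attributes unless they override them
    print(name, component.model_attributes["model_type"])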

olive/model/handler/hf.py (new file, 216 lines)

@ -0,0 +1,216 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from pathlib import Path
from typing import Any, ClassVar, Dict, List, Optional, Tuple, Union
import torch
from olive.common.config_utils import serialize_to_json, validate_config
from olive.common.constants import DEFAULT_HF_TASK
from olive.common.hf.utils import load_model_from_task
from olive.common.utils import dict_diff
from olive.constants import Framework
from olive.hardware.accelerator import Device
from olive.model.config import HfLoadKwargs, IoConfig
from olive.model.config.registry import model_handler_registry
from olive.model.handler.base import OliveModelHandler
from olive.model.handler.mixin import HfMixin, MLFlowTransformersMixin
from olive.model.handler.pytorch import PyTorchModelHandlerBase
from olive.resource_path import OLIVE_RESOURCE_ANNOTATIONS
logger = logging.getLogger(__name__)
@model_handler_registry("HFModel")
class HfModelHandler(PyTorchModelHandlerBase, MLFlowTransformersMixin, HfMixin): # pylint: disable=too-many-ancestors
resource_keys: Tuple[str, ...] = ("model_path", "adapter_path")
json_config_keys: Tuple[str, ...] = ("task", "load_kwargs", "generative")
def __init__(
self,
model_path: OLIVE_RESOURCE_ANNOTATIONS,
task: str = DEFAULT_HF_TASK,
load_kwargs: Union[Dict[str, Any], HfLoadKwargs] = None,
io_config: Union[Dict[str, Any], IoConfig, str] = None,
adapter_path: OLIVE_RESOURCE_ANNOTATIONS = None,
model_attributes: Optional[Dict[str, Any]] = None,
generative: bool = False,
):
super().__init__(
framework=Framework.PYTORCH,
model_file_format=None,
model_path=model_path,
model_attributes=model_attributes,
io_config=io_config,
generative=generative,
)
self.add_resources(locals())
self.task = task
self.load_kwargs = validate_config(load_kwargs, HfLoadKwargs, warn_unused_keys=False) if load_kwargs else None
self.model_attributes = {**self.get_hf_model_config().to_dict(), **(self.model_attributes or {})}
self.model = None
self.dummy_inputs = None
@property
def model_name_or_path(self) -> str:
"""Return the path to valid hf transformers checkpoint.
Call this instead of model_path if you expect a checkpoint path.
"""
return self.get_mlflow_transformers_path() or self.model_path
@property
def adapter_path(self) -> str:
"""Return the path to the peft adapter."""
return self.get_resource("adapter_path")
def load_model(self, rank: int = None) -> torch.nn.Module:
"""Load the model from the model path."""
if self.model is not None:
return self.model
model = load_model_from_task(self.task, self.model_path, **self.get_load_kwargs())
# we only have peft adapters for now
if self.adapter_path:
from peft import PeftModel
model = PeftModel.from_pretrained(model, self.adapter_path)
self.model = model
return model
@property
def io_config(self) -> Dict[str, Any]:
"""Return io config of the model.
Priority: io_config > hf onnx_config
"""
io_config = None
if self._io_config:
# io_config is provided
io_config = self.get_resolved_io_config(
self._io_config, force_kv_cache=self.task.endswith("-with-past"), model_attributes=self.model_attributes
)
else:
logger.debug("Trying hf onnx_config to get io_config")
io_config = self.get_hf_io_config()
if io_config:
logger.debug("Got io_config from hf onnx_config")
return io_config
def get_dummy_inputs(self, filter_hook=None, filter_hook_kwargs=None):
"""Return a dummy input for the model."""
if self.dummy_inputs is not None:
return self.dummy_inputs
# Priority: io_config > hf onnx_config
dummy_inputs = self._get_dummy_inputs_from_io_config(
force_kv_cache=self.task.endswith("-with-past"),
filter_hook=filter_hook,
filter_hook_kwargs=filter_hook_kwargs,
)
if dummy_inputs:
return dummy_inputs
logger.debug("Trying hf onnx_config to get dummy inputs")
dummy_inputs = self.get_hf_dummy_inputs()
if dummy_inputs:
logger.debug("Got dummy inputs from hf onnx_config")
if dummy_inputs is None:
raise ValueError("Unable to get dummy inputs for the model.")
return dummy_inputs
def to_json(self, check_object: bool = False):
config = super().to_json(check_object)
# only keep model_attributes that are not in hf model config
hf_model_config_dict = self.get_hf_model_config().to_dict()
config["config"]["model_attributes"] = dict_diff(self.model_attributes, hf_model_config_dict)
return serialize_to_json(config, check_object)
@model_handler_registry("DistributedHfModel")
class DistributedHfModelHandler(OliveModelHandler):
json_config_keys: Tuple[str, ...] = (
"model_name_pattern",
"num_ranks",
"task",
"load_kwargs",
"io_config",
"generative",
)
DEFAULT_RANKED_MODEL_NAME_FORMAT: ClassVar[str] = "model_{:02d}"
def __init__(
self,
model_path: OLIVE_RESOURCE_ANNOTATIONS,
model_name_pattern: str,
num_ranks: int,
task: str,
load_kwargs: Union[Dict[str, Any], HfLoadKwargs] = None,
io_config: Union[Dict[str, Any], IoConfig] = None,
model_attributes: Optional[Dict[str, Any]] = None,
generative: bool = False,
):
super().__init__(
framework=Framework.PYTORCH,
model_file_format=None,
model_path=model_path,
model_attributes=model_attributes,
io_config=io_config,
generative=generative,
)
self.add_resources(locals())
self.model_name_pattern = model_name_pattern
self.num_ranks = num_ranks
self.task = task
self.load_kwargs = load_kwargs
def ranked_model_name(self, rank: int) -> str:
return self.model_name_pattern.format(rank)
def ranked_model_path(self, rank: int) -> Union[Path, str]:
return Path(self.model_path) / self.ranked_model_name(rank)
def load_model(self, rank: int = None) -> HfModelHandler:
return HfModelHandler(
model_path=self.ranked_model_path(rank),
task=self.task,
load_kwargs=self.load_kwargs,
io_config=self.io_config,
model_attributes=self.model_attributes,
generative=self.generative,
)
def prepare_session(
self,
inference_settings: Optional[Dict[str, Any]] = None,
device: Device = Device.GPU, # pylint: disable=signature-differs
execution_providers: Union[str, List[str]] = None,
rank: Optional[int] = 0,
) -> torch.nn.Module:
return self.load_model(rank).load_model(rank).eval()
def run_session(
self,
session: Any = None,
inputs: Union[Dict[str, Any], List[Any], Tuple[Any, ...]] = None,
**kwargs: Dict[str, Any],
) -> Any:
if isinstance(inputs, dict):
results = session.generate(**inputs, **kwargs) if self.generative else session(**inputs, **kwargs)
else:
results = session.generate(inputs, **kwargs) if self.generative else session(inputs, **kwargs)
return results
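A usage sketch of the new handler ("gpt2" and the load kwargs are illustrative):

from olive.model import HfModelHandler

handler = HfModelHandler(
    model_path="gpt2",                      # hub id or local checkpoint
    task="text-generation-with-past",       # also the default task
    load_kwargs={"torch_dtype": "float32"},
)

model = handler.load_model()        # loaded lazily and cached on the handler
io_config = handler.io_config       # user io_config if given, else hf onnx_config
dummy_inputs = handler.get_dummy_inputs()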


@ -2,24 +2,22 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from olive.model.handler.mixin.composite import CompositeMixin
from olive.model.handler.mixin.dummy_inputs import DummyInputsMixin
from olive.model.handler.mixin.hf_config import HfConfigMixin
from olive.model.handler.mixin.hf import HfMixin
from olive.model.handler.mixin.io_config import IoConfigMixin
from olive.model.handler.mixin.json import JsonMixin
from olive.model.handler.mixin.kv_cache import PytorchKvCacheMixin
from olive.model.handler.mixin.mlflow import MLFlowMixin
from olive.model.handler.mixin.mlflow import MLFlowTransformersMixin
from olive.model.handler.mixin.onnx_ep import OnnxEpValidateMixin
from olive.model.handler.mixin.onnx_graph import OnnxGraphMixin
from olive.model.handler.mixin.resource import ResourceMixin
__all__ = [
"CompositeMixin",
"DummyInputsMixin",
"HfConfigMixin",
"HfMixin",
"IoConfigMixin",
"JsonMixin",
"MLFlowMixin",
"MLFlowTransformersMixin",
"OnnxEpValidateMixin",
"OnnxGraphMixin",
"PytorchKvCacheMixin",


@ -5,7 +5,6 @@
import logging
import olive.data.template as data_config_template
from olive.common.user_module_loader import UserModuleLoader
logger = logging.getLogger(__name__)
@ -16,51 +15,31 @@ class DummyInputsMixin:
the dummy data is used to evaluate the latency if the user doesn't provide data for evaluation.
"""
def _get_dummy_dataloader_from_io_config(self):
dataloader = None
# resolved self.io_config
# won't use self.io_config since we don't want hf_config to be used
resolved_io_config = self.get_user_io_config(self.io_config) or {}
if resolved_io_config.get("input_shapes"):
logger.debug("Using io_config.input_shapes to build dummy dataloader")
dataloader = (
# input_types is optional
data_config_template.dummy_data_config_template(
input_shapes=resolved_io_config["input_shapes"],
input_types=resolved_io_config.get("input_types"),
input_names=resolved_io_config.get("input_names"),
).to_data_container()
def _get_dummy_inputs_from_io_config(self, force_kv_cache: bool = False, filter_hook=None, filter_hook_kwargs=None):
if not self._io_config:
return None
resolved_io_config = (
self.get_resolved_io_config(
self._io_config, force_kv_cache=force_kv_cache, model_attributes=self.model_attributes
)
return dataloader
or {}
)
if not resolved_io_config.get("input_shapes"):
return None
def get_dummy_inputs(self, filter_hook=None, filter_hook_kwargs=None):
"""Return a dummy input for the model."""
if self.dummy_inputs is not None:
return self.dummy_inputs
# Priority: dummy_inputs_func > io_config.input_shapes > hf_config.dataset > onnx_config
dummy_inputs = None
if self.dummy_inputs_func is not None:
logger.debug("Using dummy_inputs_func to get dummy inputs")
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
dummy_inputs = user_module_loader.call_object(self.dummy_inputs_func, self)
# respect user's dummy_inputs_func, no hook
else:
dataloader = self._get_dummy_dataloader_from_io_config()
if dataloader:
dummy_inputs, _ = dataloader.get_first_batch()
elif self.hf_config and not self.hf_config.components and self.hf_config.task:
logger.debug("Trying hf onnx_config to get dummy inputs")
dummy_inputs = self.get_hf_dummy_inputs()
if dummy_inputs is not None:
logger.debug("Got dummy inputs from hf onnx_config")
if filter_hook:
dummy_inputs = filter_hook(dummy_inputs, **(filter_hook_kwargs or {}))
if dummy_inputs is None:
raise ValueError(
"Unable to get dummy inputs. Please provide dummy_inputs_func, io_config.input_shapes,"
" hf_config.dataset, or hf_config."
logger.debug("Using io_config.input_shapes to build dummy inputs")
dummy_inputs = (
data_config_template.dummy_data_config_template(
input_shapes=resolved_io_config["input_shapes"],
input_types=resolved_io_config.get("input_types"),
input_names=resolved_io_config.get("input_names"),
)
.to_data_container()
.get_first_batch()
)[0]
if filter_hook:
dummy_inputs = filter_hook(dummy_inputs, **(filter_hook_kwargs or {}))
return dummy_inputs


@ -0,0 +1,88 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from pathlib import Path
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Union
from olive.common.hf.model_io import get_model_dummy_input, get_model_io_config
from olive.common.hf.utils import (
get_generation_config,
get_model_config,
get_tokenizer,
save_model_config,
save_tokenizer,
)
if TYPE_CHECKING:
from transformers import GenerationConfig, PretrainedConfig, PreTrainedTokenizer, PreTrainedTokenizerFast
logger = logging.getLogger(__name__)
class HfMixin:
"""Provide the following Hugging Face model functionalities."""
def get_load_kwargs(self) -> Dict[str, Any]:
"""Return all args from load_kwargs in a dict with types expected by `from_pretrained`."""
return self.load_kwargs.get_load_kwargs() if self.load_kwargs else {}
def get_hf_model_config(self) -> "PretrainedConfig":
return get_model_config(self.model_path, **self.get_load_kwargs())
def get_hf_generation_config(self) -> "GenerationConfig":
return get_generation_config(self.model_path, **self.get_load_kwargs())
def get_hf_tokenizer(self) -> Union["PreTrainedTokenizer", "PreTrainedTokenizerFast"]:
# don't provide loading args for tokenizer directly since it tries to serialize all kwargs
# TODO(anyone): only provide relevant kwargs, no use case for now to provide kwargs
return get_tokenizer(self.model_path)
def save_metadata(self, output_dir: str, **kwargs) -> List[str]:
"""Save model metadata files to the output directory.
:param output_dir: output directory to save metadata files
:param kwargs: additional keyword arguments to pass to `save_pretrained` method
:return: list of file paths
"""
output_dir = Path(output_dir)
if not output_dir.exists():
output_dir.mkdir(parents=True)
elif not output_dir.is_dir():
raise ValueError("Expecting a directory as input.")
saved_filepaths = []
# save model config
save_model_config(self.get_hf_model_config(), output_dir, **kwargs)
saved_filepaths.append(str(output_dir / "config.json"))
# save model generation config
# non-generative models won't have generation config
generation_config = self.get_hf_generation_config()
if generation_config:
save_model_config(generation_config, output_dir, **kwargs)
saved_filepaths.append(str(output_dir / "generation_config.json"))
# save tokenizer
tokenizer_filepaths = save_tokenizer(self.get_hf_tokenizer(), output_dir, **kwargs)
saved_filepaths.extend([fp for fp in tokenizer_filepaths if Path(fp).exists()])
return saved_filepaths
def get_hf_io_config(self) -> Optional[Dict[str, Any]]:
"""Get Io config for the model."""
return get_model_io_config(self.model_path, self.task, **self.get_load_kwargs())
def get_hf_dummy_inputs(self) -> Optional[Dict[str, Any]]:
"""Get dummy inputs for the model."""
return get_model_dummy_input(
self.model_path,
self.task,
**self.get_load_kwargs(),
)
def get_hf_model_type(self) -> str:
"""Get model type for the model."""
return self.get_hf_model_config().model_type
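For reference, a minimal usage sketch of the new mixin through `HfModelHandler` (the handler it is attached to in this refactor). The constructor arguments are assumed from the `HfModel` input-model config, and the model name and output directory are placeholders:

```python
# Sketch only: assumes HfModelHandler(model_path=..., task=...) mirrors the HfModel config.
from olive.model import HfModelHandler

handler = HfModelHandler(model_path="microsoft/phi-2", task="text-generation")
print(handler.get_hf_model_type())             # model_type from the HF config, e.g. "phi"
io_config = handler.get_hf_io_config()         # input/output names and dynamic axes, or None
saved = handler.save_metadata("metadata_out")  # config.json, generation_config.json, tokenizer files
```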


@@ -1,147 +0,0 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from pathlib import Path
from typing import TYPE_CHECKING, Generator, List, Optional, Tuple
from olive.constants import ModelFileFormat
from olive.model.utils.hf_utils import (
get_hf_model_config,
get_hf_model_dummy_input,
get_hf_model_generation_config,
get_hf_model_io_config,
get_hf_model_tokenizer,
load_hf_model_from_model_class,
load_hf_model_from_task,
save_hf_model_config,
save_hf_model_tokenizer,
)
if TYPE_CHECKING:
from olive.model.handler.pytorch import PyTorchModelHandler
logger = logging.getLogger(__name__)
class HfConfigMixin:
"""Provide the following Hugging Face model functionalities.
* loading huggingface model
* getting huggingface model config
* getting huggingface model io config
* getting huggingface model components like Whisper scenario.
The mixin requires the following attributes to be set.
* model_path
* model_file_format
* model_loader
* model_script
* script_dir
* model_attributes
* hf_config
"""
def get_hf_model_config(self):
if self.hf_config is None:
raise ValueError("HF model_config is not available")
return get_hf_model_config(self.get_model_path_or_name(), **self.hf_config.get_loading_args_from_pretrained())
def get_hf_model_generation_config(self):
if self.hf_config is None:
raise ValueError("HF model_config is not available")
return get_hf_model_generation_config(
self.get_model_path_or_name(), **self.hf_config.get_loading_args_from_pretrained()
)
def get_hf_model_tokenizer(self, **kwargs):
if self.hf_config is None:
raise ValueError("HF model_config is not available")
# don't provide loading args for tokenizer directly since it tries to serialize all kwargs
# TODO(anyone): only provide relevant kwargs, no use case for now to provide kwargs
return get_hf_model_tokenizer(self.get_model_path_or_name(), **kwargs)
def save_metadata_for_token_generation(self, output_dir: str, **kwargs) -> List[str]:
"""Save metadata for token generation.
:param output_dir: output directory to save metadata files
:param kwargs: additional keyword arguments to pass to `save_pretrained` method
:return: list of file paths
"""
if self.hf_config is None:
raise ValueError("HF model_config is not available.")
if not Path(output_dir).is_dir():
raise ValueError("Expecting a directory as input.")
save_hf_model_config(self.get_hf_model_config(), output_dir, **kwargs)
save_hf_model_config(self.get_hf_model_generation_config(), output_dir, **kwargs)
tokenizer_filepaths = save_hf_model_tokenizer(self.get_hf_model_tokenizer(), output_dir, **kwargs)
output_dir = Path(output_dir)
return [
str(output_dir / "config.json"),
str(output_dir / "generation_config.json"),
*[fp for fp in tokenizer_filepaths if Path(fp).exists()],
]
def get_hf_io_config(self):
"""Get Io config for the model."""
if self.hf_config and self.hf_config.task and not self.hf_config.components:
return get_hf_model_io_config(
self.get_model_path_or_name(),
self.hf_config.task,
self.hf_config.feature,
**self.hf_config.get_loading_args_from_pretrained(),
)
else:
return None
def get_hf_components(self, rank: Optional[int] = None) -> Generator[Tuple[str, "PyTorchModelHandler"], None, None]:
if self.hf_config and self.hf_config.components:
for component in self.hf_config.components:
yield component.name, self.get_component_model(component, rank)
def load_hf_model(self, model_path: str = None):
"""Load model from model_path or model_name."""
model_name_or_path = model_path or self.hf_config.model_name
loading_args = self.hf_config.get_loading_args_from_pretrained()
logger.info("Loading Huggingface model from %s", model_name_or_path)
if self.hf_config.task:
model = load_hf_model_from_task(self.hf_config.task, model_name_or_path, **loading_args)
elif self.hf_config.model_class:
model = load_hf_model_from_model_class(self.hf_config.model_class, model_name_or_path, **loading_args)
else:
raise ValueError("Either task or model_class must be specified")
return model
def get_hf_dummy_inputs(self):
"""Get dummy inputs for the model."""
return get_hf_model_dummy_input(
self.get_model_path_or_name(),
self.hf_config.task,
self.hf_config.feature,
**self.hf_config.get_loading_args_from_pretrained(),
)
def is_model_loaded_from_hf_config(self) -> bool:
"""Return True if the model is loaded from hf_config, False otherwise."""
return (
(not self.model_loader)
and (
self.model_file_format
not in (ModelFileFormat.PYTORCH_TORCH_SCRIPT, ModelFileFormat.PYTORCH_MLFLOW_MODEL)
)
and self.hf_config
and (self.hf_config.model_class or self.hf_config.task)
)
def get_model_path_or_name(self):
if self.model_file_format == ModelFileFormat.PYTORCH_MLFLOW_MODEL:
return self.get_mlflow_model_path_or_name(self.get_mlflow_transformers_dir())
else:
return self.model_path or self.hf_config.model_name


@@ -23,9 +23,8 @@ class PytorchKvCacheMixin:
unused_keys = set()
if kv_cache_config and not dummy_inputs.get(past_kv_names):
torch_past_key_values = []
k_inputs = kv_cache_config.get_ort_past_key_names()
v_inputs = kv_cache_config.get_ort_past_value_names()
for k_input, v_input in zip(k_inputs, v_inputs):
kv_inputs = kv_cache_config.get_ort_past_kv_names()
for k_input, v_input in zip(kv_inputs[::2], kv_inputs[1::2]):
if k_input not in dummy_inputs or v_input not in dummy_inputs:
raise ValueError(
f"Cannot find past key-value pair for {k_input} and {v_input} in dummy inputs."
@@ -51,6 +50,7 @@ class PytorchKvCacheMixin:
"""
return (self.merge_kv_cache_hook(dummy_inputs, past_kv_names),)
# TODO(jambayk): consider removing this since we don't use hf dataset for dummy inputs anymore
def past_key_values_input_filter_hook(self, dummy_inputs, past_kv_names: str = "past_key_values"):
if not isinstance(dummy_inputs, dict):
return dummy_inputs
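The first hunk above replaces the separate `get_ort_past_key_names()`/`get_ort_past_value_names()` calls with a single `get_ort_past_kv_names()` and re-pairs the names with stride-2 slicing, which assumes the returned list interleaves key and value names per layer. A self-contained illustration of that pairing (the names are only examples of the assumed ordering):

```python
# Assumed ordering: key and value names interleaved per layer.
kv_inputs = [
    "past_key_values.0.key", "past_key_values.0.value",
    "past_key_values.1.key", "past_key_values.1.value",
]
pairs = list(zip(kv_inputs[::2], kv_inputs[1::2]))
# -> [('past_key_values.0.key', 'past_key_values.0.value'),
#     ('past_key_values.1.key', 'past_key_values.1.value')]
```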


@@ -5,42 +5,43 @@
import logging
from pathlib import Path
from typing import Optional
from olive.common.utils import copy_dir
from olive.constants import ModelFileFormat
from olive.cache import OliveCache
from olive.common.hf.mlflow import get_pretrained_name_or_path, is_mlflow_transformers
from olive.common.utils import hardlink_copy_dir, hash_string
logger = logging.getLogger(__name__)
class MLFlowMixin:
def _get_mlflow_transformers_model_path(self, cache_dir):
# DO NOT use the model.to_json() to get hash_dict, since it will get hf_config from the model
# and the operation to get hf_config will use this function to get model_path, which will
# cause infinite loop
return str(Path(cache_dir) / "olive_tmp" / "transformers")
class MLFlowTransformersMixin:
def get_mlflow_transformers_path(self) -> Optional[str]:
if not is_mlflow_transformers(self.model_path):
return None
def to_mlflow_transformer_model(self, cache_dir):
if self.model_file_format != ModelFileFormat.PYTORCH_MLFLOW_MODEL:
raise ValueError(
"Model file format is not PyTorch MLFlow model, you cannot get MLFlow transformers model path."
)
target_path = self._get_mlflow_transformers_model_path(cache_dir)
if (Path(target_path) / "config.json").exists():
logger.debug("Use cached mlflow-transformers models from %s", target_path)
return target_path
if (Path(self.model_path) / "data" / "model").exists():
copy_dir(Path(self.model_path) / "data" / "model", target_path, dirs_exist_ok=True)
copy_dir(Path(self.model_path) / "data" / "config", target_path, dirs_exist_ok=True)
copy_dir(Path(self.model_path) / "data" / "tokenizer", target_path, dirs_exist_ok=True)
return target_path
return None
model_dir = get_pretrained_name_or_path(self.model_path, "model")
config_dir = get_pretrained_name_or_path(self.model_path, "config")
tokenizer_dir = get_pretrained_name_or_path(self.model_path, "tokenizer")
def get_mlflow_model_path_or_name(self, cache_dir):
# both config.json and model file will be saved under data/model
mlflow_transformer_model_path = self.to_mlflow_transformer_model(cache_dir)
if not mlflow_transformer_model_path:
logger.debug(
"Model path %s does not exist. Use hf_config.model_name instead.", mlflow_transformer_model_path
)
return self.hf_config.model_name
return str(mlflow_transformer_model_path)
# some mlflow models only have model directory
if config_dir == model_dir and tokenizer_dir == model_dir:
return model_dir
# some mlflow models have config and tokenizer directories but model directory also
# contains the same files
model_dir_contents = set(Path(model_dir).iterdir())
if (
set(Path(config_dir).iterdir()) <= model_dir_contents
and set(Path(tokenizer_dir).iterdir()) <= model_dir_contents
):
return model_dir
# have to gather all contents into a single directory
cache = OliveCache.from_cache_env()
mlflow_transformers_path = cache.dirs.mlflow / hash_string(str(Path(self.model_path).resolve()))
if (mlflow_transformers_path / "config.json").exists():
logger.debug("MLFlow model already exists in cache. Reusing it.")
else:
for src_dir in [model_dir, config_dir, tokenizer_dir]:
hardlink_copy_dir(src_dir, mlflow_transformers_path)
return str(mlflow_transformers_path)
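The gathered MLflow transformers directory above is cached under a folder keyed by a hash of the resolved model path. A rough standalone sketch of that idea, using `hashlib` directly instead of Olive's `hash_string`/`OliveCache` (which are not reproduced here):

```python
# Illustrative only: a content-addressed cache folder for a gathered MLflow model.
import hashlib
from pathlib import Path

def mlflow_cache_dir(cache_root: str, model_path: str) -> Path:
    digest = hashlib.sha256(str(Path(model_path).resolve()).encode()).hexdigest()
    return Path(cache_root) / "mlflow" / digest
```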


@@ -3,191 +3,28 @@
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
import os
from copy import deepcopy
from pathlib import Path
from typing import Any, Callable, ClassVar, Dict, List, Optional, Tuple, Union
from typing import Any, Callable, Dict, List, Optional, Tuple, Union
import torch
import yaml
from olive.common.config_utils import serialize_to_json, validate_config
from olive.common.user_module_loader import UserModuleLoader
from olive.constants import Framework, ModelFileFormat
from olive.hardware.accelerator import Device
from olive.model.config import (
HfComponent,
HfConfig,
IoConfig,
complete_kv_cache_with_model_attributes,
extend_io_config_with_kv_cache,
)
from olive.model.config import IoConfig, complete_kv_cache_with_model_attributes, extend_io_config_with_kv_cache
from olive.model.config.registry import model_handler_registry
from olive.model.handler.base import OliveModelHandler
from olive.model.handler.mixin import DummyInputsMixin, HfConfigMixin, MLFlowMixin, PytorchKvCacheMixin
from olive.model.utils.hf_utils import load_hf_model_from_model_class
from olive.model.handler.mixin import DummyInputsMixin, PytorchKvCacheMixin
from olive.resource_path import OLIVE_RESOURCE_ANNOTATIONS, ResourceType, create_resource_path
logger = logging.getLogger(__name__)
@model_handler_registry("PyTorchModel")
class PyTorchModelHandler(
OliveModelHandler, HfConfigMixin, DummyInputsMixin, PytorchKvCacheMixin, MLFlowMixin
class PyTorchModelHandlerBase(
OliveModelHandler, DummyInputsMixin, PytorchKvCacheMixin
): # pylint: disable=too-many-ancestors
"""PyTorch model handler.
Besides the model loading for PyTorch model, the model handler also provides the following functionalities:
* Get the model io configuration either from user provider io_config or from hf_config. The priority is user
provided io_config is higher than hf_config.
* Get the dummy inputs for PyTorch model used to evaluate the latency.
* All kinds of Hf model functionalities by HfConfigMixin.
"""
resource_keys: Tuple[str, ...] = ("model_path", "script_dir", "model_script", "adapter_path")
json_config_keys: Tuple[str, ...] = (
"model_file_format",
"model_loader",
"dummy_inputs_func",
"hf_config",
"mlflow_transformer_model_cache_dir",
"generative",
)
def __init__(
self,
model_path: OLIVE_RESOURCE_ANNOTATIONS = None,
model_file_format: ModelFileFormat = ModelFileFormat.PYTORCH_ENTIRE_MODEL,
model_loader: Union[str, Callable] = None,
model_script: Union[str, Path] = None,
script_dir: Union[str, Path] = None,
io_config: Union[Dict[str, Any], IoConfig, str, Callable] = None,
dummy_inputs_func: Union[str, Callable] = None,
hf_config: Union[Dict[str, Any], HfConfig] = None,
adapter_path: OLIVE_RESOURCE_ANNOTATIONS = None,
model_attributes: Optional[Dict[str, Any]] = None,
mlflow_transformer_model_cache_dir: Optional[str] = None,
generative: bool = False,
):
if not (
isinstance(model_loader, Callable)
or (isinstance(model_loader, str) and model_script)
or model_path
or hf_config
):
raise ValueError(
"model_path is required since model_loader is not callable or model_script is not provided"
)
self.mlflow_transformer_model_cache_dir = mlflow_transformer_model_cache_dir
self.model_loader = model_loader
self.model = None
super().__init__(
framework=Framework.PYTORCH,
model_file_format=model_file_format,
model_path=model_path,
model_attributes=model_attributes,
io_config=io_config,
generative=generative,
)
self.add_resources(locals())
self.hf_config = None
if hf_config:
self.hf_config = validate_config(hf_config, HfConfig)
hf_model_config = self.get_hf_model_config().to_dict()
model_attr = self.model_attributes or {}
hf_model_config.update(model_attr)
self.model_attributes = hf_model_config
# ensure that script_dirs are local folder
script_dir_resource = create_resource_path(self.script_dir)
if script_dir_resource:
assert script_dir_resource.type == ResourceType.LocalFolder, "script_dir must be a local directory."
# ensure that model_script is local file or string name
model_script_resource = create_resource_path(self.model_script)
if model_script_resource:
assert model_script_resource.type in (
ResourceType.LocalFile,
ResourceType.StringName,
), "model_script must be a local file or a string name."
self.dummy_inputs_func = dummy_inputs_func
self.dummy_inputs = None
@property
def script_dir(self) -> str:
return self.get_resource("script_dir")
@property
def model_script(self) -> str:
return self.get_resource("model_script")
@property
def adapter_path(self) -> str:
return self.get_resource("adapter_path")
def get_mlflow_transformers_dir(self):
return self.mlflow_transformer_model_cache_dir or self.model_path
def load_model(self, rank: int = None) -> torch.nn.Module:
if self.model is not None:
return self.model
# Load user module at the beginning since we may need user defined models to load model
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
# Load special path or format model -> load model from hf config -> load normal path model
if self.model_loader is not None:
model = user_module_loader.call_object(self.model_loader, self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_TORCH_SCRIPT:
model = torch.jit.load(self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_MLFLOW_MODEL:
model = self._load_mlflow_model()
elif self.hf_config and (self.hf_config.model_class or self.hf_config.task):
model = self.load_hf_model(self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_ENTIRE_MODEL:
model = torch.load(self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_SLICE_GPT_MODEL:
model = self._load_slicegpt_model()
elif self.model_file_format == ModelFileFormat.PYTORCH_STATE_DICT:
raise ValueError("Please use customized model loader to load state dict of model.")
else:
raise ValueError(f"Unsupported model file format: {self.model_file_format}")
# we only have peft adapters for now
if self.adapter_path:
from peft import PeftModel
model = PeftModel.from_pretrained(model, self.adapter_path)
self.model = model
return model
def get_component_model(self, component: HfComponent, rank: Optional[int] = None) -> "PyTorchModelHandler":
if component.component_func is None:
logger.debug("component_func is not provided, using hf_config to get component")
model_component = self.load_hf_model(self.model_path)
else:
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
model_component = user_module_loader.call_object(component.component_func, self)
# the second default parameter is to fix ruff b023:
# https://docs.astral.sh/ruff/rules/function-uses-loop-variable/
def model_loader(_, model_component=model_component):
return model_component
component_hf_config = deepcopy(self.hf_config).dict()
component_hf_config.pop("components", None)
return PyTorchModelHandler(
model_loader=model_loader,
io_config=component.io_config,
dummy_inputs_func=component.dummy_inputs_func,
model_script=self.model_script,
script_dir=self.script_dir,
hf_config=HfConfig.parse_obj(component_hf_config),
model_attributes=self.model_attributes,
)
"""Base class for PyTorch model handler."""
def prepare_session(
self,
@@ -210,133 +47,64 @@ class PyTorchModelHandler(
results = session.generate(inputs, **kwargs) if self.generative else session(inputs, **kwargs)
return results
def _load_mlflow_model(self):
logger.info("Loading MLFlow model from %s", self.model_path)
mlflow_transformers_path = self.to_mlflow_transformer_model(self.get_mlflow_transformers_dir())
with open(os.path.join(self.model_path, "MLmodel")) as fp:
mlflow_data = yaml.safe_load(fp)
# default flavor is "hftransformersv2" from azureml.evaluate.mlflow>=0.0.8
# "hftransformers" from azureml.evaluate.mlflow<0.0.8
# TODO(trajep): let user specify flavor name if needed
# to support other flavors in mlflow not only hftransformers
hf_pretrained_class = None
flavors = mlflow_data.get("flavors", {})
if not flavors:
raise ValueError(
"Invalid MLFlow model format. Please make sure the input model"
" format is same with the result of mlflow.transformers.save_model,"
" or aml_mlflow.hftransformers.save_model from azureml.evaluate.mlflow"
)
@staticmethod
def get_resolved_io_config(
io_config: Union[Dict[str, Any], IoConfig],
force_kv_cache: bool = False,
model_attributes: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
"""Resolve io_config to a dictionary.
if "hftransformersv2" in flavors:
hf_pretrained_class = flavors["hftransformersv2"].get("hf_pretrained_class", "AutoModel")
elif "hftransformers" in flavors:
hf_pretrained_class = flavors["hftransformers"].get("hf_pretrained_class", "AutoModel")
else:
raise ValueError(
"Unsupported MLFlow model flavor. Currently only support hftransformersv2/hftransformers."
)
loading_args = self.hf_config.get_loading_args_from_pretrained() if self.hf_config else {}
loaded_model = load_hf_model_from_model_class(hf_pretrained_class, mlflow_transformers_path, **loading_args)
loaded_model.eval()
return loaded_model
:param io_config: io_config to resolve.
:param force_kv_cache: whether to enable kv_cache if not already enabled.
"""
io_config_obj = validate_config(io_config, IoConfig)
def _load_slicegpt_model(self):
logger.info("Loading SliceGPT model from %s", self.model_path)
from slicegpt.hf_utils import load_sliced_model
# enable kv_cache
io_config_obj.kv_cache = io_config_obj.kv_cache or force_kv_cache
loaded_model, _ = load_sliced_model(self.hf_config.model_name, self.model_path)
return loaded_model
if io_config_obj.kv_cache:
kv_cache_config = complete_kv_cache_with_model_attributes(io_config_obj.kv_cache, model_attributes or {})
io_config_obj = extend_io_config_with_kv_cache(io_config_obj, kv_cache_config)
return io_config_obj.dict(exclude_none=True)
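Because `get_resolved_io_config` is a static method on the new base class, passes can call it without a handler instance. A hedged sketch of such a call; the io_config fields, the boolean `kv_cache` flag, and the model-attribute keys are assumptions based on the surrounding code, not a verified example:

```python
# Hypothetical call: resolve an io_config and extend it with KV-cache inputs/outputs.
from olive.model.handler.pytorch import PyTorchModelHandlerBase  # assumed import path

io_config = {
    "input_names": ["input_ids", "attention_mask"],
    "input_shapes": [[1, 128], [1, 128]],
    "kv_cache": True,  # assumed to accept a bool that enables the default KV-cache config
}
resolved = PyTorchModelHandlerBase.get_resolved_io_config(
    io_config,
    model_attributes={"num_hidden_layers": 32, "num_attention_heads": 32, "hidden_size": 4096},
)
# resolved is a plain dict that now also lists past_key_values.* names and their dynamic axes
```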
def to_json(self, check_object: bool = False):
config = super().to_json(check_object)
# add _io_config to config to keep what was provided at init
config["config"]["io_config"] = self._io_config
# only keep model_attributes that are not in hf_config
if self.model_attributes and self.hf_config:
hf_config_dict = self.get_hf_model_config().to_dict()
config["config"]["model_attributes"] = {
key: value
for key, value in self.model_attributes.items()
if key not in hf_config_dict or hf_config_dict[key] != value
} or None
return serialize_to_json(config, check_object)
def get_user_io_config(self, io_config: Union[Dict[str, Any], IoConfig, str, Callable]) -> Dict[str, Any]:
"""Resolve io_config to a dictionary.
If io_config is a string name or a callable, it will be called to get io_config.
"""
io_config_obj = None
if isinstance(io_config, dict):
io_config_obj = IoConfig.parse_obj(io_config)
elif isinstance(io_config, IoConfig):
# return a new copy of io_config to avoid modifying the original one
io_config_obj = io_config.copy(deep=True)
elif isinstance(io_config, (str, Callable)):
# io_config is a string name or a callable
logger.debug("Calling %s to get io_config", io_config)
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
io_config = user_module_loader.call_object(io_config, self)
io_config_obj = validate_config(io_config, IoConfig)
# TODO(anyone): infer if to use kv_cache from task config
if io_config_obj.kv_cache:
kv_cache_config = complete_kv_cache_with_model_attributes(io_config_obj.kv_cache, self.model_attributes)
io_config_obj = extend_io_config_with_kv_cache(io_config_obj, kv_cache_config)
return io_config_obj.dict(exclude_none=True)
@model_handler_registry("PyTorchModel")
class PyTorchModelHandler(PyTorchModelHandlerBase): # pylint: disable=too-many-ancestors
"""PyTorch model handler.
@property
def io_config(self) -> Dict[str, Any]:
"""Return io config of the model.
Besides the model loading for PyTorch model, the model handler also provides the following functionalities:
* Get the model io configuration from user-provided io_config.
* Get the dummy inputs for PyTorch model used to evaluate the latency.
"""
Priority: io_config > hf_config (using onnx_config)
"""
io_config = None
if self._io_config:
# io_config is provided
io_config = self.get_user_io_config(self._io_config)
elif self.hf_config and self.hf_config.task and not self.hf_config.components:
# hf_config is provided
logger.debug("Trying hf onnx_config to get io_config")
# For MLFlow model, get io config from model_name instead of model_path
# TODO(xiaoyu): more investigation on the integration between MLFlow and HF
io_config = self.get_hf_io_config()
if io_config:
logger.debug("Got io_config from hf_config")
return io_config
@model_handler_registry("DistributedPyTorchModel")
class DistributedPyTorchModelHandler(OliveModelHandler, HfConfigMixin):
resource_keys: Tuple[str, ...] = ("model_path", "script_dir", "model_script", "adapter_path")
json_config_keys: Tuple[str, ...] = (
"model_name_pattern",
"num_ranks",
"model_loader",
"io_config",
"dummy_inputs_func",
"hf_config",
)
DEFAULT_RANKED_MODEL_NAME_FORMAT: ClassVar[str] = "model_{:02d}"
resource_keys: Tuple[str, ...] = ("model_path", "script_dir", "model_script")
json_config_keys: Tuple[str, ...] = ("model_file_format", "model_loader", "dummy_inputs_func", "generative")
def __init__(
self,
model_path: OLIVE_RESOURCE_ANNOTATIONS,
model_name_pattern: str,
num_ranks: int,
model_path: OLIVE_RESOURCE_ANNOTATIONS = None,
model_file_format: ModelFileFormat = ModelFileFormat.PYTORCH_ENTIRE_MODEL,
model_loader: Union[str, Callable] = None,
model_script: Union[str, Path] = None,
script_dir: Union[str, Path] = None,
io_config: Union[Dict[str, Any], IoConfig, str, Callable] = None,
dummy_inputs_func: Union[str, Callable] = None,
hf_config: Union[Dict[str, Any], HfConfig] = None,
adapter_path: OLIVE_RESOURCE_ANNOTATIONS = None,
model_attributes: Optional[Dict[str, Any]] = None,
generative: bool = False,
):
if not (isinstance(model_loader, Callable) or (isinstance(model_loader, str) and model_script) or model_path):
raise ValueError(
"model_path is required since model_loader is not callable or model_script is not provided"
)
self.model_loader = model_loader
self.model = None
super().__init__(
framework=Framework.PYTORCH,
model_file_format=model_file_format,
@@ -345,14 +113,19 @@ class DistributedPyTorchModelHandler(OliveModelHandler, HfConfigMixin):
io_config=io_config,
generative=generative,
)
self.add_resources(locals())
self.model_name_pattern = model_name_pattern
self.num_ranks = num_ranks
self.model_loader = model_loader
# ensure that script_dir and model_script are local resources
for resource_name, expected_type in [
("script_dir", ResourceType.LocalFolder),
("model_script", ResourceType.LocalFile),
]:
resource = create_resource_path(self.get_resource(resource_name))
if resource:
assert resource.type == expected_type, f"{resource_name} must be a local {expected_type}."
self.dummy_inputs_func = dummy_inputs_func
self.hf_config = validate_config(hf_config, HfConfig) if hf_config else None
self.dummy_inputs = None
@property
def script_dir(self) -> str:
@@ -362,63 +135,71 @@ class DistributedPyTorchModelHandler(OliveModelHandler, HfConfigMixin):
def model_script(self) -> str:
return self.get_resource("model_script")
@property
def adapter_path(self) -> str:
return self.get_resource("adapter_path")
def load_model(self, rank: int = None) -> torch.nn.Module:
if self.model is not None:
return self.model
def ranked_model_name(self, rank: int) -> str:
return self.model_name_pattern.format(rank)
# Load user module at the beginning since we may need user defined models to load model
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
def ranked_model_path(self, rank: int) -> Union[Path, str]:
return Path(self.model_path) / self.ranked_model_name(rank)
def load_model(self, rank: int = None) -> PyTorchModelHandler:
return PyTorchModelHandler(
model_path=self.ranked_model_path(rank),
model_file_format=ModelFileFormat.PYTORCH_ENTIRE_MODEL,
model_loader=self.model_loader,
model_script=self.model_script,
script_dir=self.script_dir,
io_config=self._io_config,
dummy_inputs_func=self.dummy_inputs_func,
hf_config=self.hf_config,
adapter_path=self.adapter_path,
model_attributes=self.model_attributes,
)
def get_component_model(self, component: HfComponent, rank: int = 0) -> PyTorchModelHandler:
# TODO(shaahji): Add support for 'HfComponent.component_func'
hf_config = deepcopy(self.hf_config).dict()
hf_config.pop("components", None)
return PyTorchModelHandler(
model_path=self.ranked_model_path(rank),
model_file_format=ModelFileFormat.PYTORCH_ENTIRE_MODEL,
model_script=self.model_script,
script_dir=self.script_dir,
io_config=component.io_config,
dummy_inputs_func=component.dummy_inputs_func,
hf_config=HfConfig.parse_obj(hf_config),
adapter_path=self.adapter_path,
model_attributes=self.model_attributes,
)
def prepare_session(
self,
inference_settings: Optional[Dict[str, Any]] = None,
device: Device = Device.GPU, # pylint: disable=signature-differs
execution_providers: Union[str, List[str]] = None,
rank: Optional[int] = 0,
) -> torch.nn.Module:
return self.load_model(rank).load_model(rank).eval()
def run_session(
self,
session: Any = None,
inputs: Union[Dict[str, Any], List[Any], Tuple[Any, ...]] = None,
**kwargs: Dict[str, Any],
) -> Any:
if isinstance(inputs, dict):
results = session.generate(**inputs, **kwargs) if self.generative else session(**inputs, **kwargs)
# Load special path or format model -> load normal path model
if self.model_loader is not None:
model = user_module_loader.call_object(self.model_loader, self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_TORCH_SCRIPT:
model = torch.jit.load(self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_ENTIRE_MODEL:
model = torch.load(self.model_path)
elif self.model_file_format == ModelFileFormat.PYTORCH_SLICE_GPT_MODEL:
model = self._load_slicegpt_model()
elif self.model_file_format == ModelFileFormat.PYTORCH_STATE_DICT:
raise ValueError("Please use customized model loader to load state dict of model.")
else:
results = session.generate(inputs, **kwargs) if self.generative else session(inputs, **kwargs)
return results
raise ValueError(f"Unsupported model file format: {self.model_file_format}")
self.model = model
return model
def _load_slicegpt_model(self):
from slicegpt.hf_utils import load_sliced_model
model_name = self.model_attributes.get("model_name")
if not model_name:
raise ValueError("`model_name` model attribute is required to load SliceGPT model.")
logger.info("Loading SliceGPT model with model_name %s from %s", model_name, self.model_path)
loaded_model, _ = load_sliced_model(model_name, self.model_path)
return loaded_model
@property
def io_config(self) -> Dict[str, Any]:
"""Return io config of the model."""
if not self._io_config:
return None
io_config = self._io_config
if isinstance(io_config, (str, Callable)):
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
io_config = user_module_loader.call_object(io_config, self)
return self.get_resolved_io_config(io_config, model_attributes=self.model_attributes)
def get_dummy_inputs(self, filter_hook=None, filter_hook_kwargs=None):
"""Return a dummy input for the model."""
if self.dummy_inputs is not None:
return self.dummy_inputs
# Priority: user provided dummy_inputs_func > io_config
if self.dummy_inputs_func is not None:
logger.debug("Using dummy_inputs_func to get dummy inputs")
user_module_loader = UserModuleLoader(self.model_script, self.script_dir)
# respect user's dummy_inputs_func, no hook
return user_module_loader.call_object(self.dummy_inputs_func, self)
dummy_inputs = self._get_dummy_inputs_from_io_config(
filter_hook=filter_hook, filter_hook_kwargs=filter_hook_kwargs
)
if dummy_inputs is None:
raise ValueError("Unable to get dummy inputs for the model.")
return dummy_inputs
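After this split, `PyTorchModelHandler` no longer accepts `hf_config`; custom models come in through `model_loader`/`model_script` (or a plain `model_path`), with `io_config` and `dummy_inputs_func` supplied explicitly. A construction sketch under those assumptions, where `user_script.py` and `load_my_model` are hypothetical user code:

```python
# Sketch only: load_my_model is a user-defined function in user_script.py that
# returns a torch.nn.Module; io_config fields follow the IoConfig usage above.
from olive.model import PyTorchModelHandler

handler = PyTorchModelHandler(
    model_loader="load_my_model",
    model_script="user_script.py",
    io_config={
        "input_names": ["x"],
        "input_shapes": [[1, 3, 224, 224]],
        "output_names": ["y"],
    },
)
model = handler.load_model()        # resolves load_my_model via UserModuleLoader
dummy = handler.get_dummy_inputs()  # built from io_config.input_shapes
```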


@@ -2,20 +2,10 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from olive.model.utils.hf_mappings import (
HIDDEN_SIZE_NAMES,
MODEL_TYPE_MAPPING,
NUM_HEADS_NAMES,
NUM_KEY_VALUE_HEADS_NAMES,
)
from olive.model.utils.onnx_utils import resolve_onnx_path
from olive.model.utils.path_utils import normalize_path_suffix
__all__ = [
"HIDDEN_SIZE_NAMES",
"MODEL_TYPE_MAPPING",
"NUM_HEADS_NAMES",
"NUM_KEY_VALUE_HEADS_NAMES",
"normalize_path_suffix",
"resolve_onnx_path",
]


@@ -1,250 +0,0 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import logging
from functools import partial
from itertools import chain
from typing import TYPE_CHECKING, Callable, Dict, Optional, Tuple, Union
import transformers
from transformers import AutoConfig, AutoModel, AutoTokenizer, GenerationConfig
from olive.common.utils import get_attr
from olive.model.utils.hf_mappings import FEATURE_TO_PEFT_TASK_TYPE, MODELS_TO_MAX_LENGTH_MAPPING, TASK_TO_FEATURE
if TYPE_CHECKING:
from transformers import PretrainedConfig, PreTrainedModel, PreTrainedTokenizer, PreTrainedTokenizerFast
from transformers.onnx import OnnxConfig
logger = logging.getLogger(__name__)
def load_hf_model_from_task(task: str, name: str, **kwargs) -> "PreTrainedModel":
"""Load huggingface model from task and name."""
from transformers.pipelines import check_task
task_results = check_task(task)
assert isinstance(task_results, tuple)
if len(task_results) == 2:
targeted_task = task_results[0]
elif len(task_results) == 3:
targeted_task = task_results[1]
else:
raise ValueError("unsupported transformers version")
class_tuple = targeted_task["pt"] or (AutoModel,)
model = None
for i, model_class in enumerate(class_tuple):
try:
model = model_class.from_pretrained(name, **kwargs)
logger.debug("Loaded model %s with name_or_path %s", model_class, name)
break
except (OSError, ValueError) as e:
if i == len(class_tuple) - 1:
# len(class_tuple) == 1 covers most common tasks like text-generation, text-classification, etc
# error could be device OOM, device_map: "auto" not supported, etc
# len(class_tuple) > 1: not common - image-segmentation, conversational, etc
# there is no easy way to get tracebacks for earlier failures, so just raise from last
raise
# the ValueError need to be caught since there will be multiple model_class for single task.
# if the model_class is not the one for the task, it will raise ValueError and
# next model_class will be tried.
logger.info(
"Failed to load model %s with name_or_path %s.\n kwargs: %s.\n Exception raised: %s",
model_class,
name,
kwargs,
e,
)
# this won't be None since class_tuple is never empty and we only reach here if model loaded successfully
# satisfies linter too
return model
def huggingface_model_loader(model_loader: Union[str, Callable]) -> Callable:
if model_loader is None:
model_loader = "AutoModel"
if isinstance(model_loader, str):
try:
model_loader = getattr(transformers, model_loader)
except AttributeError:
raise AttributeError(f"{model_loader} is not found in transformers") from None
elif not isinstance(model_loader, Callable):
raise ValueError("model_loader must be a callable or a string defined in transformers")
return model_loader.from_pretrained
def get_hf_model_config(model_name: str, **kwargs) -> "PretrainedConfig":
"""Get HF Config for the given model name."""
return AutoConfig.from_pretrained(model_name, **kwargs)
def save_hf_model_config(config: Union["PretrainedConfig", "GenerationConfig"], output_dir: str, **kwargs):
"""Save input HF Config to output directory."""
config.save_pretrained(output_dir, **kwargs)
def get_hf_model_generation_config(model_name: str, **kwargs) -> GenerationConfig:
"""Get HF model's generation config for the given model name."""
return GenerationConfig.from_pretrained(model_name, **kwargs)
def get_hf_model_tokenizer(model_name: str, **kwargs) -> Union["PreTrainedTokenizer", "PreTrainedTokenizerFast"]:
"""Get HF model's tokenizer."""
return AutoTokenizer.from_pretrained(model_name, **kwargs)
def save_hf_model_tokenizer(
tokenizer: Union["PreTrainedTokenizer", "PreTrainedTokenizerFast"], output_dir: str, **kwargs
) -> Tuple[str]:
"""Save input tokenizer to output directory."""
return tokenizer.save_pretrained(output_dir, **kwargs)
def load_hf_model_from_model_class(model_class: str, name: str, **kwargs):
"""Load huggingface model from model_loader and name."""
return huggingface_model_loader(model_class)(name, **kwargs)
# patched version of transformers.onnx.features.supported_features_mapping
# to support additional models in olive
def patched_supported_features_mapping(*supported_features: str, onnx_config_cls: str = None) -> Dict[str, Callable]:
"""Generate the mapping between supported the features and their corresponding OnnxConfig for a given model.
Args:
*supported_features: The names of the supported features.
onnx_config_cls: The OnnxConfig full name corresponding to the model.
Returns:
The dictionary mapping a feature to an OnnxConfig constructor.
"""
if onnx_config_cls is None:
raise ValueError("A OnnxConfig class must be provided")
from olive.model.utils import hf_onnx_config
config_cls = get_attr(hf_onnx_config, onnx_config_cls)
mapping = {}
for feature in supported_features:
if "-with-past" in feature:
task = feature.replace("-with-past", "")
mapping[feature] = partial(config_cls.with_past, task=task)
else:
mapping[feature] = partial(config_cls.from_model_config, task=feature)
return mapping
def get_onnx_config(model_name: str, task: str, feature: Optional[str] = None, **kwargs) -> "OnnxConfig":
# pylint: disable=protected-access
from transformers.onnx import FeaturesManager
from olive.model.utils.hf_onnx_config import ADDITIONAL_MODEL_TYPES
# patch FeaturesManager._SUPPORTED_MODEL_TYPE to support additional models in olive
for model_type, feature_list in ADDITIONAL_MODEL_TYPES.items():
if model_type in FeaturesManager._SUPPORTED_MODEL_TYPE:
continue
# TODO(trajep): remove the need for unpacking feature_list
features, onnx_config_cls = feature_list
FeaturesManager._SUPPORTED_MODEL_TYPE[model_type] = patched_supported_features_mapping(
*features, onnx_config_cls=onnx_config_cls
)
# if feature is not provided, try to get it from task
# else use "default"
feature = feature or TASK_TO_FEATURE.get(task, "default")
# don't want to load the model here since all we need is the config
# model loading is expensive computationally and memory-wise for large models
config = get_hf_model_config(model_name, **kwargs)
# recreate the logic for FeaturesManager.check_supported_model_or_raise to get the model_onnx_config
# https://github.com/huggingface/transformers/blob/main/src/transformers/onnx/features.py#L712
model_type = config.model_type.replace("_", "-")
onnx_config = None
try:
model_features = FeaturesManager.get_supported_features_for_model_type(model_type, model_name=model_name)
if feature in model_features:
onnx_config = FeaturesManager.get_config(model_type, feature)(config)
else:
logger.debug(
"%s doesn't support feature %s. Supported features are: %s", model_type, feature, model_features
)
except KeyError:
logger.debug("Model type %s is not supported", model_type)
return onnx_config
def get_hf_model_io_config(model_name: str, task: str, feature: Optional[str] = None, **kwargs):
# just log a debug message if io_config is not found
# this is not a critical error and the caller may not need the io_config
model_config = get_onnx_config(model_name, task, feature, **kwargs)
if not model_config:
return None
inputs = model_config.inputs
outputs = model_config.outputs
if not inputs or not outputs:
# just log a warning and return None, since this is not a critical error
# and following pass may not use the io_config, like OptimumConversion
logger.debug("No inputs or outputs found from hf onnx_config %s. Won't use it to get io config", model_config)
return None
io_config = {}
io_config["input_names"] = list(inputs.keys())
io_config["output_names"] = list(outputs.keys())
io_config["dynamic_axes"] = dict(chain(inputs.items(), outputs.items()))
return io_config
def get_hf_model_dummy_input(model_name: str, task: str, feature: Optional[str] = None, **kwargs):
model_config = get_onnx_config(model_name, task, feature, **kwargs)
if not model_config:
return None
tokenizer = AutoTokenizer.from_pretrained(model_name, **kwargs)
return model_config.generate_dummy_inputs(tokenizer, framework="pt")
def get_peft_task_type_from_task(task: str, fail_on_not_found=False) -> str:
"""Get peft task type from feature."""
feature = TASK_TO_FEATURE.get(task, None)
peft_task_type = FEATURE_TO_PEFT_TASK_TYPE.get(feature, None) if feature else None
not_found_msg = f"There is no peft task type for task {task}"
if peft_task_type is None and fail_on_not_found:
raise ValueError(not_found_msg)
elif peft_task_type is None:
logger.warning(not_found_msg)
return peft_task_type
def get_model_max_length(model_name: str, fail_on_not_found=False) -> int:
"""Get max length of the model, extracted from the config."""
model_config = get_hf_model_config(model_name)
model_type = model_config.model_type
max_length = MODELS_TO_MAX_LENGTH_MAPPING.get(model_type, None)
if isinstance(max_length, int):
return max_length
elif isinstance(max_length, str):
return getattr(model_config, max_length)
else:
logger.debug(
"No max length mapping found in MODELS_TO_MAX_LENGTH_MAPPING for model type %s, trying __default__",
model_type,
)
default_max_length = MODELS_TO_MAX_LENGTH_MAPPING["__default__"]
try:
return getattr(model_config, default_max_length)
except AttributeError:
not_found_msg = f"Could not find max length for model type {model_type}"
if fail_on_not_found:
raise ValueError(not_found_msg) from None
else:
logger.warning(not_found_msg)
return None

Some files were not shown because too many files changed in this diff.