[Compression] merge nni.contrib.compression with nni.compression (#5573)

Co-authored-by: nishang <nishang@microsoft.com>
This commit is contained in:
J-shang 2023-07-10 10:33:53 +08:00 committed by GitHub
Parent 8dc1a83047
Commit 27a24a12af
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
323 changed files with 950 additions and 22053 deletions

View file

@ -8,23 +8,6 @@ Nonetheless, if you have employed NNI Compression before and want to try the lat
this document will help you understand the noteworthy interface changes in 3.0.
New compression version import path:
.. code-block:: python
# most of the new compression modules, including pruners, quantizers, and distillers, except the new pruning speedup
from nni.contrib.compression.xxx import xxx
# new pruning speedup
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
Old compression version import path:
.. code-block:: python
from nni.compression.pytorch.xxx import xxx
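After this merge, a hedged sketch of the unified import paths (module names taken from the files updated below in this commit):
.. code-block:: python
# pruners, quantizers, distillers, and evaluator utilities now live directly under nni.compression
from nni.compression.pruning import L1NormPruner
from nni.compression.quantization import QATQuantizer
from nni.compression.distillation import DynamicLayerwiseDistiller
from nni.compression.utils import TransformersEvaluator
# pruning speedup moves from nni.compression.pytorch.speedup.v2 to nni.compression.speedup
from nni.compression.speedup import ModelSpeedup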
Compression Target
------------------

View file

@ -20,21 +20,21 @@ NNI introduces the ``Evaluator`` as the carrier of the training and evaluation p
These APIs could be tedious in terms of user experience: users had to switch the corresponding API frequently whenever they wanted to change compression algorithms.
``Evaluator`` is an alternative to the above interfaces; users only need to create the evaluator once, and it can be used with all compressors.
For users of native PyTorch, :class:`TorchEvaluator <nni.contrib.compression.TorchEvaluator>` requires the user to encapsulate the training process as a function and expose the specified interface,
For users of native PyTorch, :class:`TorchEvaluator <nni.compression.TorchEvaluator>` requires the user to encapsulate the training process as a function and expose the specified interface,
which brings some complexity. But don't worry, in most cases this will not change much code.
For users of `PyTorchLightning <https://www.pytorchlightning.ai/>`__, :class:`LightningEvaluator <nni.contrib.compression.LightningEvaluator>` can be created with only a few lines of code based on your original Lightning code.
For users of `PyTorchLightning <https://www.pytorchlightning.ai/>`__, :class:`LightningEvaluator <nni.compression.LightningEvaluator>` can be created with only a few lines of code based on your original Lightning code.
For users of `Transformers Trainer <https://huggingface.co/docs/transformers/main_classes/trainer>`__, :class:`TransformersEvaluator <nni.contrib.compression.TransformersEvaluator>` can be created with only a few lines of code.
For users of `Transformers Trainer <https://huggingface.co/docs/transformers/main_classes/trainer>`__, :class:`TransformersEvaluator <nni.compression.TransformersEvaluator>` can be created with only a few lines of code.
Here we give three examples of how to create an ``Evaluator`` for native PyTorch users, PyTorchLightning users and Huggingface Transformers users.
TorchEvaluator
--------------
:class:`TorchEvaluator <nni.contrib.compression.TorchEvaluator>` is for users who work in a native PyTorch environment (if you are using PyTorchLightning, please refer to `LightningEvaluator`_).
:class:`TorchEvaluator <nni.compression.TorchEvaluator>` is for users who work in a native PyTorch environment (if you are using PyTorchLightning, please refer to `LightningEvaluator`_).
:class:`TorchEvaluator <nni.contrib.compression.TorchEvaluator>` has six initialization parameters ``training_func``, ``optimizers``, ``training_step``, ``lr_schedulers``,
:class:`TorchEvaluator <nni.compression.TorchEvaluator>` has six initialization parameters ``training_func``, ``optimizers``, ``training_step``, ``lr_schedulers``,
``dummy_input``, ``evaluating_func``.
* ``training_func`` is the training loop to train the compressed model.
@ -53,8 +53,8 @@ TorchEvaluator
* ``evaluating_func`` is a callable function to evaluate the compressed model performance. Its input is a compressed model and its output is a metric.
The metric should be a float number or a dict with the key ``default``.
Please refer to :class:`TorchEvaluator <nni.contrib.compression.TorchEvaluator>` for more details.
Here is an example of how to initialize a :class:`TorchEvaluator <nni.contrib.compression.TorchEvaluator>`.
Please refer to :class:`TorchEvaluator <nni.compression.TorchEvaluator>` for more details.
Here is an example of how to initialize a :class:`TorchEvaluator <nni.compression.TorchEvaluator>`.
.. code-block:: python
@ -89,7 +89,7 @@ Here is an example of how to initialize a :class:`TorchEvaluator <nni.contrib.co
evaluator = TorchEvaluator(training_func, optimizer, training_step, lr_scheduler)
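For orientation, a condensed sketch of the pieces that feed into that call; the dataloader, model, and hyper-parameters below are assumptions, so treat it as a shape reference rather than the canonical example:
.. code-block:: python
import nni
import torch
import torch.nn.functional as F
from nni.compression import TorchEvaluator

def training_step(batch, model, *args, **kwargs):
    # compute and return the loss of one batch
    x, y = batch
    return F.cross_entropy(model(x), y)

def training_func(model, optimizers, training_step, lr_schedulers=None,
                  max_steps=None, max_epochs=None, *args, **kwargs):
    # plain PyTorch training loop driven by the arguments the compressor passes in
    model.train()
    steps = 0
    for _ in range(max_epochs or 1):
        for batch in train_dataloader:  # assumed to be defined elsewhere
            optimizers.zero_grad()
            loss = training_step(batch, model)
            loss.backward()
            optimizers.step()
            steps += 1
            if max_steps is not None and steps >= max_steps:
                return

# the optimizer passed to the evaluator should be created through nni.trace
optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
evaluator = TorchEvaluator(training_func, optimizer, training_step)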
.. note::
It is also worth noting that not all the arguments of :class:`TorchEvaluator <nni.contrib.compression.TorchEvaluator>` must be provided.
It is also worth noting that not all the arguments of :class:`TorchEvaluator <nni.compression.TorchEvaluator>` must be provided.
Some compressors only require ``evaluating_func`` as they do not train the model; some compressors only require ``training_func``.
Please refer to each compressor's doc to check the required arguments.
But it is fine to provide more arguments than the compressor needs.
@ -100,7 +100,7 @@ A complete example can be found :githublink:`here <examples/compression/evaluato
LightningEvaluator
------------------
:class:`LightningEvaluator <nni.contrib.compression.LightningEvaluator>` is for users who work with PyTorchLightning.
:class:`LightningEvaluator <nni.compression.LightningEvaluator>` is for users who work with PyTorchLightning.
Only three parts need to be modified compared with the original pytorch-lightning code:
@ -108,8 +108,8 @@ Only three parts users need to modify compared with the original pytorch-lightni
2. Wrap the ``LightningModule`` class with ``nni.trace``.
3. Wrap the ``LightningDataModule`` class with ``nni.trace``.
Please refer to :class:`LightningEvaluator <nni.contrib.compression.LightningEvaluator>` for more details.
Here is an example of how to initialize a :class:`LightningEvaluator <nni.contrib.compression.LightningEvaluator>`.
Please refer to :class:`LightningEvaluator <nni.compression.LightningEvaluator>` for more details.
Here is an example of how to initialize a :class:`LightningEvaluator <nni.compression.LightningEvaluator>`.
.. code-block:: python
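# A minimal, illustrative sketch; ``MyLightningModule`` and ``MyDataModule`` are placeholders
# for your own classes, and the full example in the docs differs in detail.
import nni
import pytorch_lightning as pl
from nni.compression import LightningEvaluator

# 1. inside ``configure_optimizers``, create optimizers/schedulers from nni.trace-wrapped classes
# 2. wrap the LightningModule class with nni.trace (the traced module is later passed to the compressor)
lightning_module = nni.trace(MyLightningModule)()
# 3. wrap the LightningDataModule class with nni.trace
data_module = nni.trace(MyDataModule)()

# the Lightning Trainer is typically also created through nni.trace
trainer = nni.trace(pl.Trainer)(max_epochs=3)
evaluator = LightningEvaluator(trainer, data_module)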
@ -139,7 +139,7 @@ A complete example can be found :githublink:`here <examples/compression/evaluato
TransformersEvaluator
---------------------
:class:`TransformersEvaluator <nni.contrib.compression.TransformersEvaluator>` is for users who work with the Huggingface Transformers Trainer.
:class:`TransformersEvaluator <nni.compression.TransformersEvaluator>` is for users who work with the Huggingface Transformers Trainer.
The only change needed is to wrap the Trainer class with ``nni.trace``.
@ -149,7 +149,7 @@ The only need is using ``nni.trace`` to wrap the Trainer class.
from transformers.trainer import Trainer
trainer = nni.trace(Trainer)(model, training_args, ...)
from nni.contrib.compression.utils import TransformersEvaluator
from nni.compression.utils import TransformersEvaluator
evaluator = TransformersEvaluator(trainer)
Moreover, if you are using a customized optimizer or learning rate scheduler, please wrap their classes with ``nni.trace`` as well, as sketched below.
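For instance, a hedged sketch of what that wrapping can look like (the optimizer and scheduler choices and the ``Trainer`` arguments here are illustrative, not prescriptive):
.. code-block:: python
import nni
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR
from transformers.trainer import Trainer
from nni.compression.utils import TransformersEvaluator

# wrap the optimizer and scheduler classes with nni.trace before instantiating them
optimizer = nni.trace(AdamW)(model.parameters(), lr=2e-5)
lr_scheduler = nni.trace(LinearLR)(optimizer)

trainer = nni.trace(Trainer)(model=model, args=training_args,
                             train_dataset=train_dataset,
                             optimizers=(optimizer, lr_scheduler))
evaluator = TransformersEvaluator(trainer)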
@ -166,9 +166,9 @@ A complete example of using a trainer with DeepSpeed mode under the Transformers
DeepspeedTorchEvaluator
-----------------------
:class:`DeepspeedTorchEvaluator <nni.contrib.compression.DeepspeedTorchEvaluator>` is an evaluator designed specifically for native PyTorch users who are utilizing DeepSpeed.
:class:`DeepspeedTorchEvaluator <nni.compression.DeepspeedTorchEvaluator>` is an evaluator designed specifically for native PyTorch users who are utilizing DeepSpeed.
:class:`DeepspeedTorchEvaluator <nni.contrib.compression.DeepspeedTorchEvaluator>` has eight initialization parameters ``training_func``, ``training_step``, ``deepspeed``, ``optimizer``, ``lr_scheduler``,
:class:`DeepspeedTorchEvaluator <nni.compression.DeepspeedTorchEvaluator>` has eight initialization parameters ``training_func``, ``training_step``, ``deepspeed``, ``optimizer``, ``lr_scheduler``,
``resume_from_checkpoint_args``, ``dummy_input``, ``evaluating_func``.
* ``training_func`` is the training loop to train the compressed model.
@ -189,8 +189,8 @@ DeepspeedTorchEvaluator
* ``evaluating_func`` is a callable function to evaluate the compressed model performance. Its input is a compressed model and its output is a metric.
The metric should be a float number or a dict with the key ``default``.
Please refer to :class:`DeepspeedTorchEvaluator <nni.contrib.compression.DeepspeedTorchEvaluator>` for more details.
Here is an example of how to initialize a :class:`DeepspeedTorchEvaluator <nni.contrib.compression.DeepspeedTorchEvaluator>`.
Please refer to :class:`DeepspeedTorchEvaluator <nni.compression.DeepspeedTorchEvaluator>` for more details.
Here is an example of how to initialize a :class:`DeepspeedTorchEvaluator <nni.compression.DeepspeedTorchEvaluator>`.
.. code-block:: python
@ -236,7 +236,7 @@ Here is an example of how to initialize a :class:`DeepspeedTorchEvaluator <nni.c
evaluator = DeepspeedTorchEvaluator(training_func, training_step, ds_config, lr_scheduler)
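The ``deepspeed`` argument (``ds_config`` above) is typically a standard DeepSpeed configuration, given as a dict or as a path to a JSON config file; a minimal illustrative config might look like this (values are placeholders to tune for your own setup):
.. code-block:: python
from nni.compression import DeepspeedTorchEvaluator

# illustrative DeepSpeed config; keys follow the standard DeepSpeed schema
ds_config = {
    "train_batch_size": 64,
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-3}
    },
    "fp16": {"enabled": True}
}
# training_func, training_step, and lr_scheduler are the ones from the example above
evaluator = DeepspeedTorchEvaluator(training_func, training_step, ds_config, lr_scheduler)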
.. note::
It is also worth noting that not all the arguments of :class:`DeepspeedTorchEvaluator <nni.contrib.compression.DeepspeedTorchEvaluator>` must be provided.
It is also worth noting that not all the arguments of :class:`DeepspeedTorchEvaluator <nni.compression.DeepspeedTorchEvaluator>` must be provided.
Some compressors only require ``evaluating_func`` as they do not train the model; some compressors only require ``training_func``.
Please refer to each compressor's doc to check the required arguments.
But it is fine to provide more arguments than the compressor needs.

View file

@ -120,6 +120,11 @@ linkcheck_ignore = [
# remove after 3.0 release
r'https://nni\.readthedocs\.io/en/v2\.10/compression/overview\.html',
r'https://github.com/google-research/google-research/blob/20736344/tunas/rematlib/mobile_model_v3.py#L453',
r'https://github.com/google-research/google-research/blob/20736344591f774f4b1570af64624ed1e18d2867/tunas/mobile_search_space_v3.py#L728',
r'https://github.com/quark0/darts/blob/f276dd346a09ae3160f8e3aca5c7b193fda1da37/cnn/model_search.py#L135',
r'https://github.com/rwightman/pytorch-image-models/blob/b7cb8d03/timm/models/efficientnet_blocks.py#L134',
]
# Ignore all links located in release.rst

View file

@ -4,9 +4,9 @@ Distiller
DynamicLayerwiseDistiller
-------------------------
.. autoclass:: nni.contrib.compression.distillation.DynamicLayerwiseDistiller
.. autoclass:: nni.compression.distillation.DynamicLayerwiseDistiller
Adaptive1dLayerwiseDistiller
----------------------------
.. autoclass:: nni.contrib.compression.distillation.Adaptive1dLayerwiseDistiller
.. autoclass:: nni.compression.distillation.Adaptive1dLayerwiseDistiller

View file

@ -6,25 +6,25 @@ Evaluator
TorchEvaluator
--------------
.. autoclass:: nni.contrib.compression.TorchEvaluator
.. autoclass:: nni.compression.TorchEvaluator
.. _new-lightning-evaluator:
LightningEvaluator
------------------
.. autoclass:: nni.contrib.compression.LightningEvaluator
.. autoclass:: nni.compression.LightningEvaluator
.. _new-transformers-evaluator:
TransformersEvaluator
---------------------
.. autoclass:: nni.contrib.compression.TransformersEvaluator
.. autoclass:: nni.compression.TransformersEvaluator
.. _new-deepspeed-torch-evaluator:
DeepspeedTorchEvaluator
-----------------------
.. autoclass:: nni.contrib.compression.DeepspeedTorchEvaluator
.. autoclass:: nni.compression.DeepspeedTorchEvaluator

View file

@ -9,42 +9,42 @@ Basic Pruner
Level Pruner
^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.LevelPruner
.. autoclass:: nni.compression.pruning.LevelPruner
.. _new-l1-norm-pruner:
L1 Norm Pruner
^^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.L1NormPruner
.. autoclass:: nni.compression.pruning.L1NormPruner
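For orientation, a hedged usage sketch under the merged namespace, following the updated quick-start tutorial in this commit (the ``config_list`` keys and the ``compress`` return values are taken from that tutorial; check the pruner doc for the exact signature):
.. code-block:: python
from nni.compression.pruning import L1NormPruner

config_list = [{'op_types': ['Linear'], 'sparse_ratio': 0.5}]
pruner = L1NormPruner(model, config_list)
# generate masks for the wrapped layers
_, masks = pruner.compress()
# unwrap before passing the model and masks to ModelSpeedup
pruner.unwrap_model()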
.. _new-l2-norm-pruner:
L2 Norm Pruner
^^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.L2NormPruner
.. autoclass:: nni.compression.pruning.L2NormPruner
.. _new-fpgm-pruner:
FPGM Pruner
^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.FPGMPruner
.. autoclass:: nni.compression.pruning.FPGMPruner
.. _new-slim-pruner:
Slim Pruner
^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.SlimPruner
.. autoclass:: nni.compression.pruning.SlimPruner
.. _new-taylor-pruner:
Taylor FO Weight Pruner
^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.TaylorPruner
.. autoclass:: nni.compression.pruning.TaylorPruner
Scheduled Pruners
-----------------
@ -54,14 +54,14 @@ Scheduled Pruners
Linear Pruner
^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.LinearPruner
.. autoclass:: nni.compression.pruning.LinearPruner
.. _new-agp-pruner:
AGP Pruner
^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.AGPPruner
.. autoclass:: nni.compression.pruning.AGPPruner
Other Pruner
------------
@ -71,4 +71,4 @@ Other Pruner
Movement Pruner
^^^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.MovementPruner
.. autoclass:: nni.compression.pruning.MovementPruner

View file

@ -1,5 +1,5 @@
Pruning Speedup
===============
.. autoclass:: nni.compression.pytorch.speedup.v2.ModelSpeedup
.. autoclass:: nni.compression.speedup.ModelSpeedup
:members:
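For reference, the usage pattern under the new import path, as it appears in the updated quick-start tutorials in this commit (the input shape below is from the MNIST example):
.. code-block:: python
import torch
from nni.compression.speedup import ModelSpeedup

# `model` was unwrapped by the pruner and `masks` comes from the pruner
ModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()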

View file

@ -6,32 +6,32 @@ Quantizer
QAT Quantizer
^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.quantization.QATQuantizer
.. autoclass:: nni.compression.quantization.QATQuantizer
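For orientation, the QAT usage pattern from the updated BERT quantization tutorial in this commit (the ``config_list`` and the traced trainer construction are abridged; see the tutorial for the full version):
.. code-block:: python
from nni.compression.quantization import QATQuantizer
from nni.compression.utils import TransformersEvaluator

evaluator = TransformersEvaluator(traced_trainer)
quantizer = QATQuantizer(model, config_list, evaluator, 1000)
model, calibration_config = quantizer.compress(max_steps=None, max_epochs=1)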
.. _NewDorefaQuantizer:
DoReFa Quantizer
^^^^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.quantization.DoReFaQuantizer
.. autoclass:: nni.compression.quantization.DoReFaQuantizer
.. _NewBNNQuantizer:
BNN Quantizer
^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.quantization.BNNQuantizer
.. autoclass:: nni.compression.quantization.BNNQuantizer
.. _NewLsqQuantizer:
LSQ Quantizer
^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.quantization.LsqQuantizer
.. autoclass:: nni.compression.quantization.LsqQuantizer
.. _NewPtqQuantizer:
PTQ Quantizer
^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.quantization.PtqQuantizer
.. autoclass:: nni.compression.quantization.PtqQuantizer

View file

@ -6,5 +6,5 @@ Compression Utilities
auto_set_denpendency_group_ids
------------------------------
.. autoclass:: nni.contrib.compression.utils.auto_set_denpendency_group_ids
.. autoclass:: nni.compression.utils.auto_set_denpendency_group_ids
:members:

View file

@ -17,7 +17,7 @@
.. only:: html
.. image:: /tutorials/hpo_quickstart_tensorflow/images/thumb/sphx_glr_main_thumb.png
:alt: HPO Quickstart with TensorFlow
:alt:
:ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_main.py`
@ -34,7 +34,7 @@
.. only:: html
.. image:: /tutorials/hpo_quickstart_tensorflow/images/thumb/sphx_glr_model_thumb.png
:alt: Port TensorFlow Quickstart to NNI
:alt:
:ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_model.py`

Binary file not shown.

Before

Width:  |  Height:  |  Size: 18 KiB

After

Width:  |  Height:  |  Size: 35 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 18 KiB

After

Width:  |  Height:  |  Size: 35 KiB

21
docs/source/tutorials/new_pruning_bert_glue.ipynb generated
View file

@ -1,16 +1,5 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -134,7 +123,7 @@
},
"outputs": [],
"source": [
"from nni.contrib.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller\nfrom nni.contrib.compression.utils import TransformersEvaluator"
"from nni.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller\nfrom nni.compression.utils import TransformersEvaluator"
]
},
{
@ -188,7 +177,7 @@
},
"outputs": [],
"source": [
"from nni.contrib.compression.pruning import MovementPruner\nfrom nni.compression.pytorch.speedup.v2 import ModelSpeedup\nfrom nni.compression.pytorch.speedup.v2.external_replacer import TransformersAttentionReplacer\n\n\ndef pruning_attn():\n Path('./output/bert_finetuned/').mkdir(parents=True, exist_ok=True)\n model = build_finetuning_model(task_name, f'./output/bert_finetuned/{task_name}.bin')\n trainer = prepare_traced_trainer(model, task_name)\n evaluator = TransformersEvaluator(trainer)\n\n config_list = [{\n 'op_types': ['Linear'],\n 'op_names_re': ['bert\\.encoder\\.layer\\.[0-9]*\\.attention\\.*'],\n 'sparse_threshold': 0.1,\n 'granularity': [64, 64]\n }]\n\n pruner = MovementPruner(model, config_list, evaluator, warmup_step=9000, cooldown_begin_step=36000, regular_scale=10)\n pruner.compress(None, 4)\n pruner.unwrap_model()\n\n masks = pruner.get_masks()\n Path('./output/pruning/').mkdir(parents=True, exist_ok=True)\n torch.save(masks, './output/pruning/attn_masks.pth')\n torch.save(model, './output/pruning/attn_masked_model.pth')\n\n\nif not skip_exec:\n pruning_attn()"
"from nni.compression.pruning import MovementPruner\nfrom nni.compression.speedup import ModelSpeedup\nfrom nni.compression.utils.external.external_replacer import TransformersAttentionReplacer\n\n\ndef pruning_attn():\n Path('./output/bert_finetuned/').mkdir(parents=True, exist_ok=True)\n model = build_finetuning_model(task_name, f'./output/bert_finetuned/{task_name}.bin')\n trainer = prepare_traced_trainer(model, task_name)\n evaluator = TransformersEvaluator(trainer)\n\n config_list = [{\n 'op_types': ['Linear'],\n 'op_names_re': ['bert\\.encoder\\.layer\\.[0-9]*\\.attention\\.*'],\n 'sparse_threshold': 0.1,\n 'granularity': [64, 64]\n }]\n\n pruner = MovementPruner(model, config_list, evaluator, warmup_step=9000, cooldown_begin_step=36000, regular_scale=10)\n pruner.compress(None, 4)\n pruner.unwrap_model()\n\n masks = pruner.get_masks()\n Path('./output/pruning/').mkdir(parents=True, exist_ok=True)\n torch.save(masks, './output/pruning/attn_masks.pth')\n torch.save(model, './output/pruning/attn_masked_model.pth')\n\n\nif not skip_exec:\n pruning_attn()"
]
},
{
@ -224,7 +213,7 @@
},
"outputs": [],
"source": [
"from nni.contrib.compression.pruning import TaylorPruner, AGPPruner\nfrom transformers.models.bert.modeling_bert import BertLayer\n\n\ndef pruning_ffn():\n model: BertForSequenceClassification = torch.load('./output/pruning/attn_pruned_model.pth')\n teacher_model: BertForSequenceClassification = build_finetuning_model('mnli', f'./output/bert_finetuned/{task_name}.bin')\n # create ffn config list, here simply use a linear function related to the number of retained heads to determine the sparse ratio\n config_list = []\n for name, module in model.named_modules():\n if isinstance(module, BertLayer):\n retained_head_num = module.attention.self.num_attention_heads\n ori_head_num = len(module.attention.pruned_heads) + retained_head_num\n ffn_sparse_ratio = 1 - retained_head_num / ori_head_num / 2\n config_list.append({'op_names': [f'{name}.intermediate.dense'], 'sparse_ratio': ffn_sparse_ratio})\n\n trainer = prepare_traced_trainer(model, task_name)\n teacher_model.eval().to(trainer.args.device)\n # create a distiller for restoring the accuracy\n distiller = dynamic_distiller(model, teacher_model, trainer)\n # fusion compress: TaylorPruner + DynamicLayerwiseDistiller\n taylor_pruner = TaylorPruner.from_compressor(distiller, config_list, 1000)\n # fusion compress: AGPPruner(TaylorPruner) + DynamicLayerwiseDistiller\n agp_pruner = AGPPruner(taylor_pruner, 1000, 36)\n agp_pruner.compress(None, 3)\n agp_pruner.unwrap_model()\n distiller.unwrap_teacher_model()\n\n masks = agp_pruner.get_masks()\n Path('./output/pruning/').mkdir(parents=True, exist_ok=True)\n torch.save(masks, './output/pruning/ffn_masks.pth')\n torch.save(model, './output/pruning/ffn_masked_model.pth')\n\n\nif not skip_exec:\n pruning_ffn()"
"from nni.compression.pruning import TaylorPruner, AGPPruner\nfrom transformers.models.bert.modeling_bert import BertLayer\n\n\ndef pruning_ffn():\n model: BertForSequenceClassification = torch.load('./output/pruning/attn_pruned_model.pth')\n teacher_model: BertForSequenceClassification = build_finetuning_model('mnli', f'./output/bert_finetuned/{task_name}.bin')\n # create ffn config list, here simply use a linear function related to the number of retained heads to determine the sparse ratio\n config_list = []\n for name, module in model.named_modules():\n if isinstance(module, BertLayer):\n retained_head_num = module.attention.self.num_attention_heads\n ori_head_num = len(module.attention.pruned_heads) + retained_head_num\n ffn_sparse_ratio = 1 - retained_head_num / ori_head_num / 2\n config_list.append({'op_names': [f'{name}.intermediate.dense'], 'sparse_ratio': ffn_sparse_ratio})\n\n trainer = prepare_traced_trainer(model, task_name)\n teacher_model.eval().to(trainer.args.device)\n # create a distiller for restoring the accuracy\n distiller = dynamic_distiller(model, teacher_model, trainer)\n # fusion compress: TaylorPruner + DynamicLayerwiseDistiller\n taylor_pruner = TaylorPruner.from_compressor(distiller, config_list, 1000)\n # fusion compress: AGPPruner(TaylorPruner) + DynamicLayerwiseDistiller\n agp_pruner = AGPPruner(taylor_pruner, 1000, 36)\n agp_pruner.compress(None, 3)\n agp_pruner.unwrap_model()\n distiller.unwrap_teacher_model()\n\n masks = agp_pruner.get_masks()\n Path('./output/pruning/').mkdir(parents=True, exist_ok=True)\n torch.save(masks, './output/pruning/ffn_masks.pth')\n torch.save(model, './output/pruning/ffn_masked_model.pth')\n\n\nif not skip_exec:\n pruning_ffn()"
]
},
{
@ -260,7 +249,7 @@
},
"outputs": [],
"source": [
"from nni.contrib.compression.base.setting import PruningSetting\n\noutput_align_setting = {\n '_output_': {\n 'align': {\n 'module_name': None,\n 'target_name': 'weight',\n 'dims': [0],\n },\n 'apply_method': 'mul',\n }\n}\nPruningSetting.register('BertAttention', output_align_setting)\nPruningSetting.register('BertOutput', output_align_setting)"
"from nni.compression.base.setting import PruningSetting\n\noutput_align_setting = {\n '_output_': {\n 'align': {\n 'module_name': None,\n 'target_name': 'weight',\n 'dims': [0],\n },\n 'apply_method': 'mul',\n }\n}\nPruningSetting.register('BertAttention', output_align_setting)\nPruningSetting.register('BertOutput', output_align_setting)"
]
},
{
@ -341,7 +330,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.11"
}
},
"nbformat": 4,

14
docs/source/tutorials/new_pruning_bert_glue.py generated
View file

@ -199,8 +199,8 @@ if not skip_exec:
# The following code creates distillers for distillation.
from nni.contrib.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller
from nni.contrib.compression.utils import TransformersEvaluator
from nni.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller
from nni.compression.utils import TransformersEvaluator
# %%
# Dynamic distillation is suitable for the situation where the distillation states dimension of the student and the teacher match.
@ -312,9 +312,9 @@ def adapt_distillation(student_model: BertForSequenceClassification, teacher_mod
# You could refer to the experiment results to choose an appropriate ``regular_scale`` you like.
from nni.contrib.compression.pruning import MovementPruner
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.pytorch.speedup.v2.external_replacer import TransformersAttentionReplacer
from nni.compression.pruning import MovementPruner
from nni.compression.speedup import ModelSpeedup
from nni.compression.utils.external.external_replacer import TransformersAttentionReplacer
def pruning_attn():
@ -378,7 +378,7 @@ if not skip_exec:
# so we use ``AGPPruner`` to schedule the sparse ratio to achieve better pruning performance.
from nni.contrib.compression.pruning import TaylorPruner, AGPPruner
from nni.compression.pruning import TaylorPruner, AGPPruner
from transformers.models.bert.modeling_bert import BertLayer
@ -444,7 +444,7 @@ if not skip_exec:
# The output masks can be generated and applied after register the setting template for them.
from nni.contrib.compression.base.setting import PruningSetting
from nni.compression.base.setting import PruningSetting
output_align_setting = {
'_output_': {

2
docs/source/tutorials/new_pruning_bert_glue.py.md5 generated
View file

@ -1 +1 @@
3e81f00f13fab8cfc204a0baef7d075e
f9ff31917a7b6ae9f988fcd63d626663

20
docs/source/tutorials/new_pruning_bert_glue.rst generated
View file

@ -10,7 +10,7 @@
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_new_pruning_bert_glue.py>`
:ref:`Go to the end <sphx_glr_download_tutorials_new_pruning_bert_glue.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
@ -300,8 +300,8 @@ The following code creates distillers for distillation.
from nni.contrib.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller
from nni.contrib.compression.utils import TransformersEvaluator
from nni.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller
from nni.compression.utils import TransformersEvaluator
@ -452,9 +452,9 @@ You could refer to the experiment results to choose a appropriate ``regular_scal
from nni.contrib.compression.pruning import MovementPruner
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.pytorch.speedup.v2.external_replacer import TransformersAttentionReplacer
from nni.compression.pruning import MovementPruner
from nni.compression.speedup import ModelSpeedup
from nni.compression.utils.external.external_replacer import TransformersAttentionReplacer
def pruning_attn():
@ -544,7 +544,7 @@ so we use ``AGPPruner`` to schedule the sparse ratio to achieve better pruning p
from nni.contrib.compression.pruning import TaylorPruner, AGPPruner
from nni.compression.pruning import TaylorPruner, AGPPruner
from transformers.models.bert.modeling_bert import BertLayer
@ -636,7 +636,7 @@ The output masks can be generated and applied after register the setting templat
from nni.contrib.compression.base.setting import PruningSetting
from nni.compression.base.setting import PruningSetting
output_align_setting = {
'_output_': {
@ -858,7 +858,7 @@ Results
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 1.990 seconds)
**Total running time of the script:** ( 0 minutes 0.020 seconds)
.. _sphx_glr_download_tutorials_new_pruning_bert_glue.py:
@ -868,6 +868,8 @@ Results
.. container:: sphx-glr-footer sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: new_pruning_bert_glue.py <new_pruning_bert_glue.py>`

Binary data
docs/source/tutorials/new_pruning_bert_glue_codeobj.pickle generated

Binary file not shown.

17
docs/source/tutorials/pruning_quick_start.ipynb generated
View file

@ -1,16 +1,5 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -80,7 +69,7 @@
},
"outputs": [],
"source": [
"from nni.contrib.compression.pruning import L1NormPruner\npruner = L1NormPruner(model, config_list)\n\n# show the wrapped model structure, `PrunerModuleWrapper` have wrapped the layers that configured in the config_list.\nprint(model)"
"from nni.compression.pruning import L1NormPruner\npruner = L1NormPruner(model, config_list)\n\n# show the wrapped model structure, `PrunerModuleWrapper` have wrapped the layers that configured in the config_list.\nprint(model)"
]
},
{
@ -109,7 +98,7 @@
},
"outputs": [],
"source": [
"# need to unwrap the model, if the model is wrapped before speedup\npruner.unwrap_model()\n\n# speedup the model, for more information about speedup, please refer :doc:`pruning_speedup`.\nfrom nni.compression.pytorch.speedup.v2 import ModelSpeedup\n\nModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()"
"# need to unwrap the model, if the model is wrapped before speedup\npruner.unwrap_model()\n\n# speedup the model, for more information about speedup, please refer :doc:`pruning_speedup`.\nfrom nni.compression.speedup import ModelSpeedup\n\nModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()"
]
},
{
@ -165,7 +154,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
"version": "3.10.11"
}
},
"nbformat": 4,

4
docs/source/tutorials/pruning_quick_start.py generated
View file

@ -65,7 +65,7 @@ config_list = [{
# %%
# Pruners usually require `model` and `config_list` as input arguments.
from nni.contrib.compression.pruning import L1NormPruner
from nni.compression.pruning import L1NormPruner
pruner = L1NormPruner(model, config_list)
# show the wrapped model structure, `PrunerModuleWrapper` have wrapped the layers that configured in the config_list.
@ -88,7 +88,7 @@ for name, mask in masks.items():
pruner.unwrap_model()
# speedup the model, for more information about speedup, please refer :doc:`pruning_speedup`.
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.speedup import ModelSpeedup
ModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()

2
docs/source/tutorials/pruning_quick_start.py.md5 generated
View file

@ -1 +1 @@
9feea465b118b0fa5da9379f4bb2d357
026cf2d53a9a109f620494e783ecec0b

20
docs/source/tutorials/pruning_quick_start.rst generated
View file

@ -10,7 +10,7 @@
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_pruning_quick_start.py>`
:ref:`Go to the end <sphx_glr_download_tutorials_pruning_quick_start.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
@ -104,9 +104,9 @@ If you are familiar with defining a model and training in pytorch, you can skip
.. code-block:: none
Average test loss: 0.6140, Accuracy: 7985/10000 (80%)
Average test loss: 0.2676, Accuracy: 9209/10000 (92%)
Average test loss: 0.1946, Accuracy: 9424/10000 (94%)
Average test loss: 0.7821, Accuracy: 7228/10000 (72%)
Average test loss: 0.2444, Accuracy: 9262/10000 (93%)
Average test loss: 0.1760, Accuracy: 9493/10000 (95%)
@ -151,7 +151,7 @@ Pruners usually require `model` and `config_list` as input arguments.
.. code-block:: default
from nni.contrib.compression.pruning import L1NormPruner
from nni.compression.pruning import L1NormPruner
pruner = L1NormPruner(model, config_list)
# show the wrapped model structure, `PrunerModuleWrapper` have wrapped the layers that configured in the config_list.
@ -213,10 +213,10 @@ Pruners usually require `model` and `config_list` as input arguments.
.. code-block:: none
fc2 sparsity : 0.5
fc1 sparsity : 0.5
conv1 sparsity : 0.5
conv2 sparsity : 0.5
fc1 sparsity : 0.5
fc2 sparsity : 0.5
@ -236,7 +236,7 @@ and reaches a higher sparsity ratio because `ModelSpeedup` will propagate the ma
pruner.unwrap_model()
# speedup the model, for more information about speedup, please refer :doc:`pruning_speedup`.
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.speedup import ModelSpeedup
ModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()
@ -326,7 +326,7 @@ Because speedup will replace the masked big layers with dense small ones.
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 20.740 seconds)
**Total running time of the script:** ( 1 minutes 1.145 seconds)
.. _sphx_glr_download_tutorials_pruning_quick_start.py:
@ -336,6 +336,8 @@ Because speedup will replace the masked big layers with dense small ones.
.. container:: sphx-glr-footer sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: pruning_quick_start.py <pruning_quick_start.py>`

Binary data
docs/source/tutorials/pruning_quick_start_codeobj.pickle generated

Binary file not shown.

17
docs/source/tutorials/pruning_speedup.ipynb generated
View file

@ -1,16 +1,5 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -87,7 +76,7 @@
},
"outputs": [],
"source": [
"from nni.compression.pytorch.speedup.v2 import ModelSpeedup\nModelSpeedup(model, torch.rand(10, 1, 28, 28).to(device), masks).speedup_model()\nprint(model)"
"from nni.compression.speedup import ModelSpeedup\nModelSpeedup(model, torch.rand(10, 1, 28, 28).to(device), masks).speedup_model()\nprint(model)"
]
},
{
@ -112,7 +101,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For combining usage of ``Pruner`` masks generation with ``ModelSpeedup``,\nplease refer to :doc:`Pruning Quick Start <pruning_quick_start_mnist>`.\n\nNOTE: The current implementation supports PyTorch 1.3.1 or newer.\n\n## Limitations\n\nFor PyTorch we can only replace modules, if functions in ``forward`` should be replaced,\nour current implementation does not work. One workaround is make the function a PyTorch module.\n\nIf you want to speedup your own model which cannot supported by the current implementation,\nyou need implement the replace function for module replacement, welcome to contribute.\n\n## Speedup Results of Examples\n\n\nThese result are tested on the [legacy pruning framework](https://nni.readthedocs.io/en/v2.6/Compression/pruning.html), new results will coming soon.\n\n### slim pruner example\n\non one V100 GPU,\ninput tensor: ``torch.randn(64, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01197\n - 0.005107\n * - 2\n - 0.02019\n - 0.008769\n * - 4\n - 0.02733\n - 0.014809\n * - 8\n - 0.04310\n - 0.027441\n * - 16\n - 0.07731\n - 0.05008\n * - 32\n - 0.14464\n - 0.10027\n\n### fpgm pruner example\n\non cpu,\ninput tensor: ``torch.randn(64, 1, 28, 28)``\\ ,\ntoo large variance\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01383\n - 0.01839\n * - 2\n - 0.01167\n - 0.003558\n * - 4\n - 0.01636\n - 0.01088\n * - 40\n - 0.14412\n - 0.08268\n * - 40\n - 1.29385\n - 0.14408\n * - 40\n - 0.41035\n - 0.46162\n * - 400\n - 6.29020\n - 5.82143\n\n### l1filter pruner example\n\non one V100 GPU,\ninput tensor: ``torch.randn(64, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01026\n - 0.003677\n * - 2\n - 0.01657\n - 0.008161\n * - 4\n - 0.02458\n - 0.020018\n * - 8\n - 0.03498\n - 0.025504\n * - 16\n - 0.06757\n - 0.047523\n * - 32\n - 0.10487\n - 0.086442\n\n### APoZ pruner example\n\non one V100 GPU,\ninput tensor: ``torch.randn(64, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01389\n - 0.004208\n * - 2\n - 0.01628\n - 0.008310\n * - 4\n - 0.02521\n - 0.014008\n * - 8\n - 0.03386\n - 0.023923\n * - 16\n - 0.06042\n - 0.046183\n * - 32\n - 0.12421\n - 0.087113\n\n### SimulatedAnnealing pruner example\n\nIn this experiment, we use SimulatedAnnealing pruner to prune the resnet18 on the cifar10 dataset.\nWe measure the latencies and accuracies of the pruned model under different sparsity ratios, as shown in the following figure.\nThe latency is measured on one V100 GPU and the input tensor is ``torch.randn(128, 3, 32, 32)``.\n\n<img src=\"file://../../img/SA_latency_accuracy.png\">\n\n"
"For combining usage of ``Pruner`` masks generation with ``ModelSpeedup``,\nplease refer to :doc:`Pruning Quick Start <pruning_quick_start>`.\n\nNOTE: The current implementation supports PyTorch 1.3.1 or newer.\n\n## Limitations\n\nFor PyTorch we can only replace modules, if functions in ``forward`` should be replaced,\nour current implementation does not work. One workaround is make the function a PyTorch module.\n\nIf you want to speedup your own model which cannot supported by the current implementation,\nyou need implement the replace function for module replacement, welcome to contribute.\n\n## Speedup Results of Examples\n\n\nThese result are tested on the [legacy pruning framework](https://nni.readthedocs.io/en/v2.6/Compression/pruning.html), new results will coming soon.\n\n### slim pruner example\n\non one V100 GPU,\ninput tensor: ``torch.randn(64, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01197\n - 0.005107\n * - 2\n - 0.02019\n - 0.008769\n * - 4\n - 0.02733\n - 0.014809\n * - 8\n - 0.04310\n - 0.027441\n * - 16\n - 0.07731\n - 0.05008\n * - 32\n - 0.14464\n - 0.10027\n\n### fpgm pruner example\n\non cpu,\ninput tensor: ``torch.randn(64, 1, 28, 28)``\\ ,\ntoo large variance\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01383\n - 0.01839\n * - 2\n - 0.01167\n - 0.003558\n * - 4\n - 0.01636\n - 0.01088\n * - 40\n - 0.14412\n - 0.08268\n * - 40\n - 1.29385\n - 0.14408\n * - 40\n - 0.41035\n - 0.46162\n * - 400\n - 6.29020\n - 5.82143\n\n### l1filter pruner example\n\non one V100 GPU,\ninput tensor: ``torch.randn(64, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01026\n - 0.003677\n * - 2\n - 0.01657\n - 0.008161\n * - 4\n - 0.02458\n - 0.020018\n * - 8\n - 0.03498\n - 0.025504\n * - 16\n - 0.06757\n - 0.047523\n * - 32\n - 0.10487\n - 0.086442\n\n### APoZ pruner example\n\non one V100 GPU,\ninput tensor: ``torch.randn(64, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01389\n - 0.004208\n * - 2\n - 0.01628\n - 0.008310\n * - 4\n - 0.02521\n - 0.014008\n * - 8\n - 0.03386\n - 0.023923\n * - 16\n - 0.06042\n - 0.046183\n * - 32\n - 0.12421\n - 0.087113\n\n### SimulatedAnnealing pruner example\n\nIn this experiment, we use SimulatedAnnealing pruner to prune the resnet18 on the cifar10 dataset.\nWe measure the latencies and accuracies of the pruned model under different sparsity ratios, as shown in the following figure.\nThe latency is measured on one V100 GPU and the input tensor is ``torch.randn(128, 3, 32, 32)``.\n\n<img src=\"file://../../img/SA_latency_accuracy.png\">\n\n"
]
}
],
@ -132,7 +121,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
"version": "3.10.11"
}
},
"nbformat": 4,

4
docs/source/tutorials/pruning_speedup.py generated
View file

@ -65,7 +65,7 @@ print('Original Model - Elapsed Time : ', time.time() - start)
# %%
# Speedup the model and show the model structure after speedup.
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.speedup import ModelSpeedup
ModelSpeedup(model, torch.rand(10, 1, 28, 28).to(device), masks).speedup_model()
print(model)
@ -77,7 +77,7 @@ print('Speedup Model - Elapsed Time : ', time.time() - start)
# %%
# For combining usage of ``Pruner`` masks generation with ``ModelSpeedup``,
# please refer to :doc:`Pruning Quick Start <pruning_quick_start_mnist>`.
# please refer to :doc:`Pruning Quick Start <pruning_quick_start>`.
#
# NOTE: The current implementation supports PyTorch 1.3.1 or newer.
#

2
docs/source/tutorials/pruning_speedup.py.md5 generated
View file

@ -1 +1 @@
60334840999c86b64ff889ee9909a797
e128a8e53fcc5368f479aa5a40aa2fe1

12
docs/source/tutorials/pruning_speedup.rst generated
View file

@ -10,7 +10,7 @@
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_pruning_speedup.py>`
:ref:`Go to the end <sphx_glr_download_tutorials_pruning_speedup.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
@ -138,7 +138,7 @@ Roughly test the original model inference speed.
.. code-block:: none
Original Model - Elapsed Time : 0.16419386863708496
Original Model - Elapsed Time : 2.3036391735076904
@ -151,7 +151,7 @@ Speedup the model and show the model structure after speedup.
.. code-block:: default
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.speedup import ModelSpeedup
ModelSpeedup(model, torch.rand(10, 1, 28, 28).to(device), masks).speedup_model()
print(model)
@ -200,7 +200,7 @@ Roughly test the model after speedup inference speed.
.. code-block:: none
Speedup Model - Elapsed Time : 0.0038301944732666016
Speedup Model - Elapsed Time : 0.09416508674621582
@ -371,7 +371,7 @@ The latency is measured on one V100 GPU and the input tensor is ``torch.randn(1
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 16.241 seconds)
**Total running time of the script:** ( 0 minutes 10.330 seconds)
.. _sphx_glr_download_tutorials_pruning_speedup.py:
@ -381,6 +381,8 @@ The latency is measured on one V100 GPU and the input tensor is ``torch.randn(1
.. container:: sphx-glr-footer sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: pruning_speedup.py <pruning_speedup.py>`

Binary data
docs/source/tutorials/pruning_speedup_codeobj.pickle generated

Binary file not shown.

15
docs/source/tutorials/quantization_bert_glue.ipynb generated
View file

@ -1,16 +1,5 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -116,7 +105,7 @@
},
"outputs": [],
"source": [
"import nni\nfrom nni.contrib.compression.quantization import QATQuantizer, LsqQuantizer, PtqQuantizer\nfrom nni.contrib.compression.utils import TransformersEvaluator\n\ndef fake_quantize():\n config_list = [{\n 'op_types': ['Linear'],\n 'op_names_re': ['bert.encoder.layer.{}'.format(i) for i in range(12)],\n 'target_names': ['weight', '_output_'],\n 'quant_dtype': 'int8',\n 'quant_scheme': 'affine',\n 'granularity': 'default',\n }]\n\n # create a finetune model\n Path('./output/bert_finetuned/').mkdir(parents=True, exist_ok=True)\n model: torch.nn.Module = build_finetuning_model(f'./output/bert_finetuned/{task_name}.bin', is_quant=False) # type: ignore\n traced_trainer = prepare_traced_trainer(model, is_quant=False)\n evaluator = TransformersEvaluator(traced_trainer)\n if quant_method == 'lsq':\n quantizer = LsqQuantizer(model, config_list, evaluator)\n model, calibration_config = quantizer.compress(max_steps=None, max_epochs=quant_max_epochs)\n elif quant_method == 'qat':\n quantizer = QATQuantizer(model, config_list, evaluator, 1000)\n model, calibration_config = quantizer.compress(max_steps=None, max_epochs=quant_max_epochs)\n elif quant_method == 'ptq':\n quantizer = PtqQuantizer(model, config_list, evaluator)\n model, calibration_config = quantizer.compress(max_steps=1, max_epochs=None)\n else:\n raise ValueError(f\"quantization method {quant_method} is not supported\")\n print(calibration_config)\n # evaluate the performance of the fake quantize model\n quantizer.evaluator.bind_model(model, quantizer._get_param_names_map())\n print(quantizer.evaluator.evaluate())\n\ndef evaluate():\n model = build_finetuning_model(f'./output/bert_finetuned/{task_name}.bin', is_quant=False)\n trainer = prepare_traced_trainer(model, is_quant=False)\n metrics = trainer.evaluate()\n print(f\"Evaluate metrics={metrics}\")\n\n\nfake_quantize()\nevaluate()"
"import nni\nfrom nni.compression.quantization import QATQuantizer, LsqQuantizer, PtqQuantizer\nfrom nni.compression.utils import TransformersEvaluator\n\ndef fake_quantize():\n config_list = [{\n 'op_types': ['Linear'],\n 'op_names_re': ['bert.encoder.layer.{}'.format(i) for i in range(12)],\n 'target_names': ['weight', '_output_'],\n 'quant_dtype': 'int8',\n 'quant_scheme': 'affine',\n 'granularity': 'default',\n }]\n\n # create a finetune model\n Path('./output/bert_finetuned/').mkdir(parents=True, exist_ok=True)\n model: torch.nn.Module = build_finetuning_model(f'./output/bert_finetuned/{task_name}.bin', is_quant=False) # type: ignore\n traced_trainer = prepare_traced_trainer(model, is_quant=False)\n evaluator = TransformersEvaluator(traced_trainer)\n if quant_method == 'lsq':\n quantizer = LsqQuantizer(model, config_list, evaluator)\n model, calibration_config = quantizer.compress(max_steps=None, max_epochs=quant_max_epochs)\n elif quant_method == 'qat':\n quantizer = QATQuantizer(model, config_list, evaluator, 1000)\n model, calibration_config = quantizer.compress(max_steps=None, max_epochs=quant_max_epochs)\n elif quant_method == 'ptq':\n quantizer = PtqQuantizer(model, config_list, evaluator)\n model, calibration_config = quantizer.compress(max_steps=1, max_epochs=None)\n else:\n raise ValueError(f\"quantization method {quant_method} is not supported\")\n print(calibration_config)\n # evaluate the performance of the fake quantize model\n quantizer.evaluator.bind_model(model, quantizer._get_param_names_map())\n print(quantizer.evaluator.evaluate())\n\ndef evaluate():\n model = build_finetuning_model(f'./output/bert_finetuned/{task_name}.bin', is_quant=False)\n trainer = prepare_traced_trainer(model, is_quant=False)\n metrics = trainer.evaluate()\n print(f\"Evaluate metrics={metrics}\")\n\n\nskip_exec = True\nif not skip_exec:\n fake_quantize()\n evaluate()"
]
},
{
@ -143,7 +132,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.11"
}
},
"nbformat": 4,

10
docs/source/tutorials/quantization_bert_glue.py generated
View file

@ -209,8 +209,8 @@ def build_finetuning_model(state_dict_path: str, is_quant=False):
# 6. Call ``quantizer.compress(max_steps, max_epochs)`` to execute the simulated quantization process
import nni
from nni.contrib.compression.quantization import QATQuantizer, LsqQuantizer, PtqQuantizer
from nni.contrib.compression.utils import TransformersEvaluator
from nni.compression.quantization import QATQuantizer, LsqQuantizer, PtqQuantizer
from nni.compression.utils import TransformersEvaluator
def fake_quantize():
config_list = [{
@ -250,8 +250,10 @@ def evaluate():
print(f"Evaluate metrics={metrics}")
fake_quantize()
evaluate()
skip_exec = True
if not skip_exec:
fake_quantize()
evaluate()
# %%

2
docs/source/tutorials/quantization_bert_glue.py.md5 generated
View file

@ -1 +1 @@
ba05e89a27a4d771b22a3de6d5172778
67e335e86718ed077e4997a9f0092ee3

247
docs/source/tutorials/quantization_bert_glue.rst generated

File diff suppressed because one or more lines are too long

Binary data
docs/source/tutorials/quantization_bert_glue_codeobj.pickle generated

Binary file not shown.

15
docs/source/tutorials/quantization_quick_start.ipynb generated
View file

@ -1,16 +1,5 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -123,7 +112,7 @@
},
"outputs": [],
"source": [
"import nni\nfrom nni.contrib.compression.quantization import QATQuantizer\nfrom nni.contrib.compression.utils import TorchEvaluator\n\n\noptimizer = nni.trace(SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)\nevaluator = TorchEvaluator(training_model, optimizer, training_step) # type: ignore\n\nconfig_list = [{\n 'op_names': ['conv1', 'conv2', 'fc1', 'fc2'],\n 'target_names': ['_input_', 'weight', '_output_'],\n 'quant_dtype': 'int8',\n 'quant_scheme': 'affine',\n 'granularity': 'default',\n},{\n 'op_names': ['relu1', 'relu2', 'relu3'],\n 'target_names': ['_output_'],\n 'quant_dtype': 'int8',\n 'quant_scheme': 'affine',\n 'granularity': 'default',\n}]\n\nquantizer = QATQuantizer(model, config_list, evaluator, len(train_dataloader))\nreal_input = next(iter(train_dataloader))[0].to(device)\nquantizer.track_forward(real_input)\n\nstart = time.time()\n_, calibration_config = quantizer.compress(None, max_epochs=5)\nprint(f'pure training 5 epochs: {time.time() - start}s')\n\nprint(calibration_config)\nstart = time.time()\nacc = evaluating_model(model)\nprint(f'quantization evaluating: {time.time() - start}s Acc.: {acc}')"
"import nni\nfrom nni.compression.quantization import QATQuantizer\nfrom nni.compression.utils import TorchEvaluator\n\n\noptimizer = nni.trace(SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)\nevaluator = TorchEvaluator(training_model, optimizer, training_step) # type: ignore\n\nconfig_list = [{\n 'op_names': ['conv1', 'conv2', 'fc1', 'fc2'],\n 'target_names': ['_input_', 'weight', '_output_'],\n 'quant_dtype': 'int8',\n 'quant_scheme': 'affine',\n 'granularity': 'default',\n},{\n 'op_names': ['relu1', 'relu2', 'relu3'],\n 'target_names': ['_output_'],\n 'quant_dtype': 'int8',\n 'quant_scheme': 'affine',\n 'granularity': 'default',\n}]\n\nquantizer = QATQuantizer(model, config_list, evaluator, len(train_dataloader))\nreal_input = next(iter(train_dataloader))[0].to(device)\nquantizer.track_forward(real_input)\n\nstart = time.time()\n_, calibration_config = quantizer.compress(None, max_epochs=5)\nprint(f'pure training 5 epochs: {time.time() - start}s')\n\nprint(calibration_config)\nstart = time.time()\nacc = evaluating_model(model)\nprint(f'quantization evaluating: {time.time() - start}s Acc.: {acc}')"
]
}
],
@ -143,7 +132,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
"version": "3.10.11"
}
},
"nbformat": 4,

4
docs/source/tutorials/quantization_quick_start.py generated
View file

@ -136,8 +136,8 @@ print(f'pure evaluating: {time.time() - start}s Acc.: {acc}')
# Detailed about how to write ``config_list`` please refer :doc:`Config Specification <../compression/config_list>`.
import nni
from nni.contrib.compression.quantization import QATQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import QATQuantizer
from nni.compression.utils import TorchEvaluator
optimizer = nni.trace(SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

View file

@ -1 +1 @@
d3d1074e56626255e3e19ef2a2ff057f
0eda6c780fb06aaecfc2e9c9e804d33a

45
docs/source/tutorials/quantization_quick_start.rst generated
View file

@ -10,7 +10,7 @@
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_quantization_quick_start.py>`
:ref:`Go to the end <sphx_glr_download_tutorials_quantization_quick_start.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
@ -123,6 +123,31 @@ Create training and evaluation dataloader
.. rst-class:: sphx-glr-script-out
.. code-block:: none
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/mnist/MNIST/raw/train-images-idx3-ubyte.gz
0%| | 0/9912422 [00:00<?, ?it/s] 100%|##########| 9912422/9912422 [00:00<00:00, 110174318.21it/s]
Extracting data/mnist/MNIST/raw/train-images-idx3-ubyte.gz to data/mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/mnist/MNIST/raw/train-labels-idx1-ubyte.gz
0%| | 0/28881 [00:00<?, ?it/s] 100%|##########| 28881/28881 [00:00<00:00, 91839040.05it/s]
Extracting data/mnist/MNIST/raw/train-labels-idx1-ubyte.gz to data/mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/mnist/MNIST/raw/t10k-images-idx3-ubyte.gz
0%| | 0/1648877 [00:00<?, ?it/s] 100%|##########| 1648877/1648877 [00:00<00:00, 26703211.30it/s]
Extracting data/mnist/MNIST/raw/t10k-images-idx3-ubyte.gz to data/mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz
0%| | 0/4542 [00:00<?, ?it/s] 100%|##########| 4542/4542 [00:00<00:00, 63081221.09it/s]
Extracting data/mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/mnist/MNIST/raw
@ -217,8 +242,8 @@ Pre-train and evaluate the model on MNIST dataset
Epoch 2 start!
Epoch 3 start!
Epoch 4 start!
pure training 5 epochs: 71.90893840789795s
pure evaluating: 1.6302893161773682s Acc.: 0.9908
pure training 5 epochs: 62.24345350265503s
pure evaluating: 1.5607831478118896s Acc.: 0.9906
@ -237,8 +262,8 @@ Detailed about how to write ``config_list`` please refer :doc:`Config Specificat
import nni
from nni.contrib.compression.quantization import QATQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import QATQuantizer
from nni.compression.utils import TorchEvaluator
optimizer = nni.trace(SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
@ -282,9 +307,9 @@ Detailed about how to write ``config_list`` please refer :doc:`Config Specificat
Epoch 2 start!
Epoch 3 start!
Epoch 4 start!
pure training 5 epochs: 117.75990748405457s
defaultdict(<class 'dict'>, {'fc2': {'weight': {'scale': tensor(0.0020), 'zero_point': tensor(-8.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.2640), 'tracked_min': tensor(-0.2319)}, '_input_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}, '_output_0': {'scale': tensor(0.1541), 'zero_point': tensor(-39.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(25.6346), 'tracked_min': tensor(-13.5170)}}, 'conv1': {'weight': {'scale': tensor(0.0023), 'zero_point': tensor(-12.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.3128), 'tracked_min': tensor(-0.2606)}, '_input_0': {'scale': tensor(0.0128), 'zero_point': tensor(-94.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(2.8215), 'tracked_min': tensor(-0.4242)}, '_output_0': {'scale': tensor(0.0265), 'zero_point': tensor(-5.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(3.4957), 'tracked_min': tensor(-3.2373)}}, 'fc1': {'weight': {'scale': tensor(0.0007), 'zero_point': tensor(3.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.0894), 'tracked_min': tensor(-0.0943)}, '_input_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}, '_output_0': {'scale': tensor(0.0678), 'zero_point': tensor(-8.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(9.1579), 'tracked_min': tensor(-8.0707)}}, 'conv2': {'weight': {'scale': tensor(0.0012), 'zero_point': tensor(-35.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.1927), 'tracked_min': tensor(-0.1097)}, '_input_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(5.9995), 'tracked_min': tensor(0.)}, '_output_0': {'scale': tensor(0.0893), 'zero_point': tensor(2.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(11.1702), 'tracked_min': tensor(-11.5212)}}, 'relu3': {'_output_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}}, 'relu2': {'_output_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}}, 'relu1': {'_output_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(5.9996), 'tracked_min': tensor(0.)}}})
quantization evaluating: 1.6024222373962402s Acc.: 0.9915
pure training 5 epochs: 94.30406522750854s
defaultdict(<class 'dict'>, {'fc1': {'weight': {'scale': tensor(0.0007), 'zero_point': tensor(6.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.0897), 'tracked_min': tensor(-0.0992)}, '_input_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}, '_output_0': {'scale': tensor(0.0648), 'zero_point': tensor(3.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(8.0606), 'tracked_min': tensor(-8.4004)}}, 'fc2': {'weight': {'scale': tensor(0.0018), 'zero_point': tensor(-5.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.2388), 'tracked_min': tensor(-0.2198)}, '_input_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}, '_output_0': {'scale': tensor(0.1514), 'zero_point': tensor(-35.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(24.4862), 'tracked_min': tensor(-13.9780)}}, 'conv1': {'weight': {'scale': tensor(0.0027), 'zero_point': tensor(11.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.3176), 'tracked_min': tensor(-0.3750)}, '_input_0': {'scale': tensor(0.0128), 'zero_point': tensor(-94.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(2.8215), 'tracked_min': tensor(-0.4242)}, '_output_0': {'scale': tensor(0.0261), 'zero_point': tensor(4.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(3.2271), 'tracked_min': tensor(-3.4134)}}, 'conv2': {'weight': {'scale': tensor(0.0011), 'zero_point': tensor(-24.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.1707), 'tracked_min': tensor(-0.1165)}, '_input_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(5.9999), 'tracked_min': tensor(0.)}, '_output_0': {'scale': tensor(0.0900), 'zero_point': tensor(1.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(11.3434), 'tracked_min': tensor(-11.5140)}}, 'relu2': {'_output_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}}, 'relu1': {'_output_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.0000), 'tracked_min': tensor(0.)}}, 'relu3': {'_output_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}}})
quantization evaluating: 1.3835649490356445s Acc.: 0.9912
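
The ``scale`` and ``zero_point`` values in the calibration dumps above are consistent with a standard 8-bit affine mapping over the tracked activation range. The sketch below (plain PyTorch arithmetic, not an NNI API; the quantized range ``[-127, 127]`` is an assumption inferred from the printed numbers) reproduces the ``fc2`` ``_input_0`` entry:

.. code-block:: python

    import torch

    # Tracked activation range printed above for fc2 '_input_0'.
    tracked_min, tracked_max = torch.tensor(0.), torch.tensor(6.)
    qmin, qmax = -127, 127  # assumed signed 8-bit quantized range

    scale = (tracked_max - tracked_min) / (qmax - qmin)   # ~0.0236, matches the dump
    zero_point = qmin - torch.round(tracked_min / scale)  # -127., matches the dump

    # Quantize / dequantize one activation value under the affine scheme.
    x = torch.tensor(3.14)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    x_hat = (q - zero_point) * scale  # ~3.14, the value the quantized model "sees"
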
@ -292,7 +317,7 @@ Detailed about how to write ``config_list`` please refer :doc:`Config Specificat
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 3 minutes 22.673 seconds)
**Total running time of the script:** ( 2 minutes 40.255 seconds)
.. _sphx_glr_download_tutorials_quantization_quick_start.py:
@ -302,6 +327,8 @@ Detailed about how to write ``config_list`` please refer :doc:`Config Specificat
.. container:: sphx-glr-footer sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: quantization_quick_start.py <quantization_quick_start.py>`

Binary data
docs/source/tutorials/quantization_quick_start_codeobj.pickle (generated)

Binary file not shown.

docs/source/tutorials/quantization_speedup.ipynb (generated)

@ -1,16 +1,5 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -33,7 +22,7 @@
},
"outputs": [],
"source": [
"import torch\nimport torchvision\nimport torchvision.transforms as transforms\ndef prepare_data_loaders(data_path, batch_size):\n normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],\n std=[0.229, 0.224, 0.225])\n dataset = torchvision.datasets.ImageNet(\n data_path, split=\"train\",\n transform=transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n normalize,\n ]))\n\n sampler = torch.utils.data.SequentialSampler(dataset)\n data_loader = torch.utils.data.DataLoader(\n dataset, batch_size=batch_size,\n sampler=sampler)\n return data_loader\n\ndata_path = '/data' # replace it with your path of ImageNet dataset\ndata_loader = prepare_data_loaders(data_path, batch_size=128)\ncalib_data = None\nfor image, target in data_loader:\n calib_data = image.numpy()\n break\n\nfrom nni.compression.pytorch.quantization_speedup.calibrator import Calibrator\n# TensorRT processes the calibration data in the batch size of 64\ncalib = Calibrator(calib_data, 'data/calib_cache_file.cache', batch_size=64)"
"import torch\nimport torchvision\nimport torchvision.transforms as transforms\n\n\nskip_exec = True\n\nif not skip_exec:\n\n def prepare_data_loaders(data_path, batch_size):\n normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],\n std=[0.229, 0.224, 0.225])\n dataset = torchvision.datasets.ImageNet(\n data_path, split=\"train\",\n transform=transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n normalize,\n ]))\n\n sampler = torch.utils.data.SequentialSampler(dataset)\n data_loader = torch.utils.data.DataLoader(\n dataset, batch_size=batch_size,\n sampler=sampler)\n return data_loader\n\n data_path = '/data' # replace it with your path of ImageNet dataset\n data_loader = prepare_data_loaders(data_path, batch_size=128)\n calib_data = None\n for image, target in data_loader:\n calib_data = image.numpy()\n break\n\n from nni.compression.quantization_speedup.calibrator import Calibrator\n # TensorRT processes the calibration data in the batch size of 64\n calib = Calibrator(calib_data, 'data/calib_cache_file.cache', batch_size=64)"
]
},
{
@ -51,7 +40,7 @@
},
"outputs": [],
"source": [
"from nni_assets.compression.mobilenetv2 import MobileNetV2\nmodel = MobileNetV2()\n# a checkpoint of MobileNetV2 can be found here\n# https://download.pytorch.org/models/mobilenet_v2-b0353104.pth\nfloat_model_file = 'mobilenet_pretrained_float.pth'\nstate_dict = torch.load(float_model_file)\nmodel.load_state_dict(state_dict)\nmodel.eval()"
"if not skip_exec:\n from nni_assets.compression.mobilenetv2 import MobileNetV2\n model = MobileNetV2()\n # a checkpoint of MobileNetV2 can be found here\n # https://download.pytorch.org/models/mobilenet_v2-b0353104.pth\n float_model_file = 'mobilenet_pretrained_float.pth'\n state_dict = torch.load(float_model_file)\n model.load_state_dict(state_dict)\n model.eval()"
]
},
{
@ -69,7 +58,7 @@
},
"outputs": [],
"source": [
"from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT\n# input shape is used for converting to onnx\nengine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224))\nengine.compress_with_calibrator(calib)"
"if not skip_exec:\n from nni.compression.quantization_speedup import ModelSpeedupTensorRT\n # input shape is used for converting to onnx\n engine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224))\n engine.compress_with_calibrator(calib)"
]
},
{
@ -87,7 +76,7 @@
},
"outputs": [],
"source": [
"from nni_assets.compression.mobilenetv2 import AverageMeter, accuracy\nimport time\ndef test_accelerated_model(engine, data_loader, neval_batches):\n top1 = AverageMeter('Acc@1', ':6.2f')\n top5 = AverageMeter('Acc@5', ':6.2f')\n cnt = 0\n total_time = 0\n for image, target in data_loader:\n start_time = time.time()\n output, time_span = engine.inference(image)\n infer_time = time.time() - start_time\n print('time: ', time_span, infer_time)\n total_time += time_span\n\n start_time = time.time()\n output = output.view(-1, 1000)\n cnt += 1\n acc1, acc5 = accuracy(output, target, topk=(1, 5))\n top1.update(acc1[0], image.size(0))\n top5.update(acc5[0], image.size(0))\n rest_time = time.time() - start_time\n print('rest time: ', rest_time)\n if cnt >= neval_batches:\n break\n print('inference time: ', total_time / neval_batches)\n return top1, top5\n\ndata_loader = prepare_data_loaders(data_path, batch_size=64)\ntop1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)\nprint('Accuracy of mode #1: ', top1, top5)\n\n\"\"\"\n\nMode #2: Using TensorRT as a pure acceleration backend\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn this mode, the post-training quantization within TensorRT is not used, instead, the quantization bit-width and the range of tensor values are fed into TensorRT for speedup (i.e., with `trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS` configured).\n\n\"\"\""
"if not skip_exec:\n from nni_assets.compression.mobilenetv2 import AverageMeter, accuracy\n import time\n\n def test_accelerated_model(engine, data_loader, neval_batches):\n top1 = AverageMeter('Acc@1', ':6.2f')\n top5 = AverageMeter('Acc@5', ':6.2f')\n cnt = 0\n total_time = 0\n for image, target in data_loader:\n start_time = time.time()\n output, time_span = engine.inference(image)\n infer_time = time.time() - start_time\n print('time: ', time_span, infer_time)\n total_time += time_span\n\n start_time = time.time()\n output = output.view(-1, 1000)\n cnt += 1\n acc1, acc5 = accuracy(output, target, topk=(1, 5))\n top1.update(acc1[0], image.size(0))\n top5.update(acc5[0], image.size(0))\n rest_time = time.time() - start_time\n print('rest time: ', rest_time)\n if cnt >= neval_batches:\n break\n print('inference time: ', total_time / neval_batches)\n return top1, top5\n\n data_loader = prepare_data_loaders(data_path, batch_size=64)\n top1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)\n print('Accuracy of mode #1: ', top1, top5)\n\n\"\"\"\n\nMode #2: Using TensorRT as a pure acceleration backend\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn this mode, the post-training quantization within TensorRT is not used, instead, the quantization bit-width and the range of tensor values are fed into TensorRT for speedup (i.e., with `trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS` configured).\n\n\"\"\""
]
},
{
@ -105,7 +94,7 @@
},
"outputs": [],
"source": [
"model = MobileNetV2()\nstate_dict = torch.load(float_model_file)\nmodel.load_state_dict(state_dict)\nmodel.eval()\ndevice = torch.device('cuda')\nmodel.to(device)"
"if not skip_exec:\n model = MobileNetV2()\n state_dict = torch.load(float_model_file)\n model.load_state_dict(state_dict)\n model.eval()\n device = torch.device('cuda')\n model.to(device)"
]
},
{
@ -123,7 +112,7 @@
},
"outputs": [],
"source": [
"from nni_assets.compression.mobilenetv2 import evaluate\nfrom nni.compression.pytorch.utils import TorchEvaluator\ndata_loader = prepare_data_loaders(data_path, batch_size=128)\ndef eval_for_calibration(model):\n evaluate(model, data_loader,\n neval_batches=1, device=device)\n\ndummy_input = torch.Tensor(64, 3, 224, 224).to(device)\npredict_func = TorchEvaluator(predicting_func=eval_for_calibration, dummy_input=dummy_input)"
"if not skip_exec:\n from nni_assets.compression.mobilenetv2 import evaluate\n from nni.compression.utils import TorchEvaluator\n data_loader = prepare_data_loaders(data_path, batch_size=128)\n\n def eval_for_calibration(model):\n evaluate(model, data_loader, neval_batches=1, device=device)\n\n dummy_input = torch.Tensor(64, 3, 224, 224).to(device)\n predict_func = TorchEvaluator(predicting_func=eval_for_calibration, dummy_input=dummy_input)"
]
},
{
@ -141,7 +130,7 @@
},
"outputs": [],
"source": [
"from nni.compression.pytorch.quantization import PtqQuantizer\nconfig_list = [{\n 'quant_types': ['input', 'weight', 'output'],\n 'quant_bits': {'input': 8, 'weight': 8, 'output': 8},\n 'quant_dtype': 'int',\n 'quant_scheme': 'per_tensor_symmetric',\n 'op_types': ['default']\n}]\nquantizer = PtqQuantizer(model, config_list, predict_func, True)\nquantizer.compress()\ncalibration_config = quantizer.export_model()\nprint('quant result config: ', calibration_config)"
"from nni.compression.quantization import PtqQuantizer\nif not skip_exec:\n config_list = [{\n 'quant_types': ['input', 'weight', 'output'],\n 'quant_bits': {'input': 8, 'weight': 8, 'output': 8},\n 'quant_dtype': 'int',\n 'quant_scheme': 'per_tensor_symmetric',\n 'op_types': ['default']\n }]\n quantizer = PtqQuantizer(model, config_list, predict_func, True)\n quantizer.compress()\n calibration_config = quantizer.export_model()\n print('quant result config: ', calibration_config)"
]
},
{
@ -159,7 +148,7 @@
},
"outputs": [],
"source": [
"model = MobileNetV2()\nstate_dict = torch.load(float_model_file)\nmodel.load_state_dict(state_dict)\nmodel.eval()\n\nengine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224), config=calibration_config)\nengine.compress()\ndata_loader = prepare_data_loaders(data_path, batch_size=64)\ntop1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)\nprint('Accuracy of mode #2: ', top1, top5)"
"if not skip_exec:\n model = MobileNetV2()\n state_dict = torch.load(float_model_file)\n model.load_state_dict(state_dict)\n model.eval()\n\n engine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224), config=calibration_config)\n engine.compress()\n data_loader = prepare_data_loaders(data_path, batch_size=64)\n top1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)\n print('Accuracy of mode #2: ', top1, top5)"
]
}
],
@ -179,7 +168,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.10.11"
}
},
"nbformat": 4,

docs/source/tutorials/quantization_speedup.py (generated)

@ -33,85 +33,95 @@ As TensorRT has supported post-training quantization, directly leveraging this f
import torch
import torchvision
import torchvision.transforms as transforms
def prepare_data_loaders(data_path, batch_size):
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
dataset = torchvision.datasets.ImageNet(
data_path, split="train",
transform=transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
normalize,
]))
sampler = torch.utils.data.SequentialSampler(dataset)
data_loader = torch.utils.data.DataLoader(
dataset, batch_size=batch_size,
sampler=sampler)
return data_loader
data_path = '/data' # replace it with your path of ImageNet dataset
data_loader = prepare_data_loaders(data_path, batch_size=128)
calib_data = None
for image, target in data_loader:
calib_data = image.numpy()
break
skip_exec = True
from nni.compression.pytorch.quantization_speedup.calibrator import Calibrator
# TensorRT processes the calibration data in the batch size of 64
calib = Calibrator(calib_data, 'data/calib_cache_file.cache', batch_size=64)
if not skip_exec:
def prepare_data_loaders(data_path, batch_size):
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
dataset = torchvision.datasets.ImageNet(
data_path, split="train",
transform=transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
normalize,
]))
sampler = torch.utils.data.SequentialSampler(dataset)
data_loader = torch.utils.data.DataLoader(
dataset, batch_size=batch_size,
sampler=sampler)
return data_loader
data_path = '/data' # replace it with your path of ImageNet dataset
data_loader = prepare_data_loaders(data_path, batch_size=128)
calib_data = None
for image, target in data_loader:
calib_data = image.numpy()
break
from nni.compression.quantization_speedup.calibrator import Calibrator
# TensorRT processes the calibration data in the batch size of 64
calib = Calibrator(calib_data, 'data/calib_cache_file.cache', batch_size=64)
# %%
# Prepare the float32 model MobileNetV2
from nni_assets.compression.mobilenetv2 import MobileNetV2
model = MobileNetV2()
# a checkpoint of MobileNetV2 can be found here
# https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
float_model_file = 'mobilenet_pretrained_float.pth'
state_dict = torch.load(float_model_file)
model.load_state_dict(state_dict)
model.eval()
if not skip_exec:
from nni_assets.compression.mobilenetv2 import MobileNetV2
model = MobileNetV2()
# a checkpoint of MobileNetV2 can be found here
# https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
float_model_file = 'mobilenet_pretrained_float.pth'
state_dict = torch.load(float_model_file)
model.load_state_dict(state_dict)
model.eval()
# %%
# Speed up the model with TensorRT
from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT
# input shape is used for converting to onnx
engine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224))
engine.compress_with_calibrator(calib)
if not skip_exec:
from nni.compression.quantization_speedup import ModelSpeedupTensorRT
# input shape is used for converting to onnx
engine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224))
engine.compress_with_calibrator(calib)
# %%
# Test the accuracy of the accelerated model
from nni_assets.compression.mobilenetv2 import AverageMeter, accuracy
import time
def test_accelerated_model(engine, data_loader, neval_batches):
top1 = AverageMeter('Acc@1', ':6.2f')
top5 = AverageMeter('Acc@5', ':6.2f')
cnt = 0
total_time = 0
for image, target in data_loader:
start_time = time.time()
output, time_span = engine.inference(image)
infer_time = time.time() - start_time
print('time: ', time_span, infer_time)
total_time += time_span
if not skip_exec:
from nni_assets.compression.mobilenetv2 import AverageMeter, accuracy
import time
start_time = time.time()
output = output.view(-1, 1000)
cnt += 1
acc1, acc5 = accuracy(output, target, topk=(1, 5))
top1.update(acc1[0], image.size(0))
top5.update(acc5[0], image.size(0))
rest_time = time.time() - start_time
print('rest time: ', rest_time)
if cnt >= neval_batches:
break
print('inference time: ', total_time / neval_batches)
return top1, top5
def test_accelerated_model(engine, data_loader, neval_batches):
top1 = AverageMeter('Acc@1', ':6.2f')
top5 = AverageMeter('Acc@5', ':6.2f')
cnt = 0
total_time = 0
for image, target in data_loader:
start_time = time.time()
output, time_span = engine.inference(image)
infer_time = time.time() - start_time
print('time: ', time_span, infer_time)
total_time += time_span
data_loader = prepare_data_loaders(data_path, batch_size=64)
top1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)
print('Accuracy of mode #1: ', top1, top5)
start_time = time.time()
output = output.view(-1, 1000)
cnt += 1
acc1, acc5 = accuracy(output, target, topk=(1, 5))
top1.update(acc1[0], image.size(0))
top5.update(acc5[0], image.size(0))
rest_time = time.time() - start_time
print('rest time: ', rest_time)
if cnt >= neval_batches:
break
print('inference time: ', total_time / neval_batches)
return top1, top5
data_loader = prepare_data_loaders(data_path, batch_size=64)
top1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)
print('Accuracy of mode #1: ', top1, top5)
"""
@ -124,41 +134,44 @@ In this mode, the post-training quantization within TensorRT is not used, instea
# %%
# re-instantiate the MobileNetV2 model
model = MobileNetV2()
state_dict = torch.load(float_model_file)
model.load_state_dict(state_dict)
model.eval()
device = torch.device('cuda')
model.to(device)
if not skip_exec:
model = MobileNetV2()
state_dict = torch.load(float_model_file)
model.load_state_dict(state_dict)
model.eval()
device = torch.device('cuda')
model.to(device)
# %%
# Prepare Evaluator for PtqQuantizer
# PtqQuantizer uses eval_for_calibration to collect calibration data
# in the current setting, it handles 128 samples
from nni_assets.compression.mobilenetv2 import evaluate
from nni.compression.pytorch.utils import TorchEvaluator
data_loader = prepare_data_loaders(data_path, batch_size=128)
def eval_for_calibration(model):
evaluate(model, data_loader,
neval_batches=1, device=device)
if not skip_exec:
from nni_assets.compression.mobilenetv2 import evaluate
from nni.compression.utils import TorchEvaluator
data_loader = prepare_data_loaders(data_path, batch_size=128)
dummy_input = torch.Tensor(64, 3, 224, 224).to(device)
predict_func = TorchEvaluator(predicting_func=eval_for_calibration, dummy_input=dummy_input)
def eval_for_calibration(model):
evaluate(model, data_loader, neval_batches=1, device=device)
dummy_input = torch.Tensor(64, 3, 224, 224).to(device)
predict_func = TorchEvaluator(predicting_func=eval_for_calibration, dummy_input=dummy_input)
# %%
# Use PtqQuantizer to quantize the model
from nni.compression.pytorch.quantization import PtqQuantizer
config_list = [{
'quant_types': ['input', 'weight', 'output'],
'quant_bits': {'input': 8, 'weight': 8, 'output': 8},
'quant_dtype': 'int',
'quant_scheme': 'per_tensor_symmetric',
'op_types': ['default']
}]
quantizer = PtqQuantizer(model, config_list, predict_func, True)
quantizer.compress()
calibration_config = quantizer.export_model()
print('quant result config: ', calibration_config)
from nni.compression.quantization import PtqQuantizer
if not skip_exec:
config_list = [{
'quant_types': ['input', 'weight', 'output'],
'quant_bits': {'input': 8, 'weight': 8, 'output': 8},
'quant_dtype': 'int',
'quant_scheme': 'per_tensor_symmetric',
'op_types': ['default']
}]
quantizer = PtqQuantizer(model, config_list, predict_func, True)
quantizer.compress()
calibration_config = quantizer.export_model()
print('quant result config: ', calibration_config)
# %%
# Speed up the quantized model following the generated calibration_config
@ -166,13 +179,14 @@ print('quant result config: ', calibration_config)
# after applying bn folding. bn folding changes the models structure and weights.
# As TensorRT does bn folding by itself, we should input an original model to it.
# For simplicity, we re-instantiate a new model.
model = MobileNetV2()
state_dict = torch.load(float_model_file)
model.load_state_dict(state_dict)
model.eval()
if not skip_exec:
model = MobileNetV2()
state_dict = torch.load(float_model_file)
model.load_state_dict(state_dict)
model.eval()
engine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224), config=calibration_config)
engine.compress()
data_loader = prepare_data_loaders(data_path, batch_size=64)
top1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)
print('Accuracy of mode #2: ', top1, top5)
engine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224), config=calibration_config)
engine.compress()
data_loader = prepare_data_loaders(data_path, batch_size=64)
top1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)
print('Accuracy of mode #2: ', top1, top5)
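
Apart from the ``nni.compression.pytorch.quantization_speedup`` → ``nni.compression.quantization_speedup`` and ``nni.compression.pytorch.utils`` → ``nni.compression.utils`` renames, the main change in the two tutorial files above is that every executable step is now indented under a ``skip_exec`` guard, presumably so the tutorial can be rendered without an ImageNet copy or a TensorRT installation. A sketch of the pattern (the function name is a hypothetical placeholder, not part of the tutorial):

.. code-block:: python

    skip_exec = True  # flip to False to actually run the tutorial steps

    if not skip_exec:
        # expensive work: ImageNet loading, TensorRT engine build, evaluation
        run_tutorial_steps()  # hypothetical placeholder for the guarded body
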

docs/source/tutorials/quantization_speedup.py.md5 (generated)

@ -1 +1 @@
19e925997289f729983ff4d5ac76c89f
d364a206afed723ad9006f2a7035c00c

docs/source/tutorials/quantization_speedup.rst (generated)

File diff suppressed because one or more lines are too long

Binary data
docs/source/tutorials/quantization_speedup_codeobj.pickle (generated)

Binary file not shown.

docs/source/tutorials/sg_execution_times.rst (generated)

@ -3,12 +3,15 @@
.. _sphx_glr_tutorials_sg_execution_times:
Computation times
=================
**03:22.673** total execution time for **tutorials** files:
**00:03.512** total execution time for **tutorials** files:
+-----------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_quick_start.py` (``quantization_quick_start.py``) | 03:22.673 | 0.0 MB |
| :ref:`sphx_glr_tutorials_quantization_speedup.py` (``quantization_speedup.py``) | 00:03.504 | 0.0 MB |
+-----------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_bert_glue.py` (``quantization_bert_glue.py``) | 00:00.009 | 0.0 MB |
+-----------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_darts.py` (``darts.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------+-----------+--------+
@ -22,7 +25,5 @@ Computation times
+-----------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_pruning_speedup.py` (``pruning_speedup.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_bert_glue.py` (``quantization_bert_glue.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_speedup.py` (``quantization_speedup.py``) | 00:00.000 | 0.0 MB |
| :ref:`sphx_glr_tutorials_quantization_quick_start.py` (``quantization_quick_start.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------+-----------+--------+


@ -74,7 +74,7 @@ class MyModule(pl.LightningModule):
class MyDataModule(pl.LightningDataModule):
pass
from nni.contrib.compression import LightningEvaluator
from nni.compression import LightningEvaluator
pl_trainer = nni.trace(pl.Trainer)(
accelerator='auto',


@ -90,6 +90,6 @@ def training_step(batch: Any, model: torch.nn.Module, *args, **kwargs):
# Init ``TorchEvaluator``
# -----------------------
from nni.contrib.compression import TorchEvaluator
from nni.compression import TorchEvaluator
evaluator = TorchEvaluator(training_func, optimizer, training_step, lr_scheduler)


@ -19,11 +19,11 @@ from examples.compression.models import (
device
)
from nni.contrib.compression import TorchEvaluator
from nni.contrib.compression.distillation import DynamicLayerwiseDistiller
from nni.contrib.compression.pruning import TaylorPruner, AGPPruner
from nni.contrib.compression.utils import auto_set_denpendency_group_ids
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression import TorchEvaluator
from nni.compression.distillation import DynamicLayerwiseDistiller
from nni.compression.pruning import TaylorPruner, AGPPruner
from nni.compression.utils import auto_set_denpendency_group_ids
from nni.compression.speedup import ModelSpeedup
if __name__ == '__main__':


@ -19,13 +19,13 @@ from examples.compression.models import (
device
)
from nni.contrib.compression import TorchEvaluator
from nni.contrib.compression.base.compressor import Quantizer
from nni.contrib.compression.distillation import DynamicLayerwiseDistiller
from nni.contrib.compression.pruning import TaylorPruner, AGPPruner
from nni.contrib.compression.quantization import QATQuantizer
from nni.contrib.compression.utils import auto_set_denpendency_group_ids
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression import TorchEvaluator
from nni.compression.base.compressor import Quantizer
from nni.compression.distillation import DynamicLayerwiseDistiller
from nni.compression.pruning import TaylorPruner, AGPPruner
from nni.compression.quantization import QATQuantizer
from nni.compression.utils import auto_set_denpendency_group_ids
from nni.compression.speedup import ModelSpeedup
if __name__ == '__main__':


@ -13,13 +13,13 @@ from examples.compression.models import (
device
)
from nni.contrib.compression.pruning import (
from nni.compression.pruning import (
L1NormPruner,
L2NormPruner,
FPGMPruner
)
from nni.contrib.compression.utils import auto_set_denpendency_group_ids
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.utils import auto_set_denpendency_group_ids
from nni.compression.speedup import ModelSpeedup
prune_type = 'l1'


@ -13,10 +13,10 @@ from examples.compression.models import (
device
)
from nni.contrib.compression import TorchEvaluator
from nni.contrib.compression.pruning import TaylorPruner, LinearPruner, AGPPruner
from nni.contrib.compression.utils import auto_set_denpendency_group_ids
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression import TorchEvaluator
from nni.compression.pruning import TaylorPruner, LinearPruner, AGPPruner
from nni.compression.utils import auto_set_denpendency_group_ids
from nni.compression.speedup import ModelSpeedup
schedule_type = 'agp'


@ -13,10 +13,10 @@ from examples.compression.models import (
device
)
from nni.contrib.compression import TorchEvaluator
from nni.contrib.compression.pruning import SlimPruner
from nni.contrib.compression.utils import auto_set_denpendency_group_ids
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression import TorchEvaluator
from nni.compression.pruning import SlimPruner
from nni.compression.utils import auto_set_denpendency_group_ids
from nni.compression.speedup import ModelSpeedup
if __name__ == '__main__':


@ -13,10 +13,10 @@ from examples.compression.models import (
device
)
from nni.contrib.compression import TorchEvaluator
from nni.contrib.compression.pruning import TaylorPruner
from nni.contrib.compression.utils import auto_set_denpendency_group_ids
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression import TorchEvaluator
from nni.compression.pruning import TaylorPruner
from nni.compression.utils import auto_set_denpendency_group_ids
from nni.compression.speedup import ModelSpeedup
if __name__ == '__main__':


@ -11,8 +11,8 @@ from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import nni
from nni.contrib.compression.quantization import BNNQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import BNNQuantizer
from nni.compression.utils import TorchEvaluator
from nni.common.types import SCHEDULER


@ -13,8 +13,8 @@ from torch import Tensor
from torchvision import datasets, transforms
import nni
from nni.contrib.compression.quantization import DoReFaQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import DoReFaQuantizer
from nni.compression.utils import TorchEvaluator
from nni.common.types import SCHEDULER


@ -13,8 +13,8 @@ from torch import Tensor
from torchvision import datasets, transforms
import nni
from nni.contrib.compression.quantization import LsqQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import LsqQuantizer
from nni.compression.utils import TorchEvaluator
from nni.common.types import SCHEDULER
torch.manual_seed(0)


@ -12,8 +12,8 @@ from torch import Tensor
from torchvision import datasets, transforms
import nni
from nni.contrib.compression.quantization import PtqQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import PtqQuantizer
from nni.compression.utils import TorchEvaluator
from nni.common.types import SCHEDULER


@ -14,8 +14,8 @@ from torchvision import transforms
from torchvision.datasets import MNIST
import nni
from nni.contrib.compression.quantization import QATQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import QATQuantizer
from nni.compression.utils import TorchEvaluator
from nni.common.types import SCHEDULER
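
The documentation and example diffs above all apply the same mechanical renaming: ``nni.contrib.compression`` becomes ``nni.compression``, and the pruning-speedup entry point moves from ``nni.compression.pytorch.speedup.v2`` to ``nni.compression.speedup``. A condensed before/after sketch, with the import paths taken directly from the hunks above (assumes the post-merge NNI is installed):

.. code-block:: python

    # each old import (commented out) is followed by its post-merge replacement
    # from nni.contrib.compression import TorchEvaluator
    from nni.compression import TorchEvaluator

    # from nni.contrib.compression.pruning import TaylorPruner, AGPPruner
    from nni.compression.pruning import TaylorPruner, AGPPruner

    # from nni.contrib.compression.quantization import QATQuantizer
    from nni.compression.quantization import QATQuantizer

    # from nni.contrib.compression.utils import auto_set_denpendency_group_ids
    from nni.compression.utils import auto_set_denpendency_group_ids

    # from nni.compression.pytorch.speedup.v2 import ModelSpeedup
    from nni.compression.speedup import ModelSpeedup
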

examples/model_compress/.gitignore (vendored)

@ -1,8 +0,0 @@
.pth
.tar.gz
data/
MNIST/
cifar-10-batches-py/
experiment_data/
pruning/models
pruning/pruning_log


@ -1,7 +0,0 @@
# Examples
This folder contains a large number of examples of old versions of compression.
If you find that some examples are invalid, please contact us.
This folder will be deleted around NNI 3.2.
The new version examples is under `examples/compression`.


@ -1,129 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from typing import Callable, Optional, Iterable
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.auto_compress import AbstractAutoCompressionModule
torch.manual_seed(1)
class LeNet(nn.Module):
def __init__(self):
super(LeNet, self).__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout2d(0.25)
self.dropout2 = nn.Dropout2d(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
x = self.conv2(x)
x = F.relu(x)
x = F.max_pool2d(x, 2)
x = self.dropout1(x)
x = torch.flatten(x, 1)
x = self.fc1(x)
x = F.relu(x)
x = self.dropout2(x)
x = self.fc2(x)
output = F.log_softmax(x, dim=1)
return output
_use_cuda = torch.cuda.is_available()
_train_kwargs = {'batch_size': 64}
_test_kwargs = {'batch_size': 1000}
if _use_cuda:
_cuda_kwargs = {'num_workers': 1,
'pin_memory': True,
'shuffle': True}
_train_kwargs.update(_cuda_kwargs)
_test_kwargs.update(_cuda_kwargs)
_transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])
_device = torch.device("cuda" if _use_cuda else "cpu")
_train_loader = None
_test_loader = None
def _train(model, optimizer, criterion, epoch):
global _train_loader
if _train_loader is None:
dataset = datasets.MNIST('./data', train=True, download=True, transform=_transform)
_train_loader = torch.utils.data.DataLoader(dataset, **_train_kwargs)
model.train()
for data, target in _train_loader:
data, target = data.to(_device), target.to(_device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def _test(model):
global _test_loader
if _test_loader is None:
dataset = datasets.MNIST('./data', train=False, transform=_transform)
_test_loader = torch.utils.data.DataLoader(dataset, **_test_kwargs)
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in _test_loader:
data, target = data.to(_device), target.to(_device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(_test_loader.dataset)
acc = 100 * correct / len(_test_loader.dataset)
print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
test_loss, correct, len(_test_loader.dataset), acc))
return acc
_model = LeNet().to(_device)
_model.load_state_dict(torch.load('mnist_pretrain_lenet.pth'))
class AutoCompressionModule(AbstractAutoCompressionModule):
@classmethod
def model(cls) -> nn.Module:
return _model
@classmethod
def evaluator(cls) -> Callable[[nn.Module], float]:
return _test
@classmethod
def optimizer_factory(cls) -> Optional[Callable[[Iterable], optim.Optimizer]]:
def _optimizer_factory(params: Iterable):
return torch.optim.SGD(params, lr=0.01)
return _optimizer_factory
@classmethod
def criterion(cls) -> Optional[Callable]:
return F.nll_loss
@classmethod
def sparsifying_trainer(cls, compress_algorithm_name: str) -> Optional[Callable[[nn.Module, optim.Optimizer, Callable, int], None]]:
return _train
@classmethod
def post_compress_finetuning_trainer(cls, compress_algorithm_name: str) -> Optional[Callable[[nn.Module, optim.Optimizer, Callable, int], None]]:
return _train
@classmethod
def post_compress_finetuning_epochs(cls, compress_algorithm_name: str) -> int:
return 2


@ -1,50 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from pathlib import Path
from nni.compression.pytorch.auto_compress import AutoCompressionExperiment, AutoCompressionSearchSpaceGenerator
from auto_compress_module import AutoCompressionModule
generator = AutoCompressionSearchSpaceGenerator()
generator.add_config('level', [
{
"sparsity": {
"_type": "uniform",
"_value": [0.01, 0.99]
},
'op_types': ['default']
}
])
generator.add_config('l1', [
{
"sparsity": {
"_type": "uniform",
"_value": [0.01, 0.99]
},
'op_types': ['Conv2d']
}
])
generator.add_config('qat', [
{
'quant_types': ['weight', 'output'],
'quant_bits': {
'weight': 8,
'output': 8
},
'op_types': ['Conv2d', 'Linear']
}])
search_space = generator.dumps()
experiment = AutoCompressionExperiment(AutoCompressionModule, 'local')
experiment.config.experiment_name = 'auto compression torch example'
experiment.config.trial_concurrency = 1
experiment.config.max_trial_number = 10
experiment.config.search_space = search_space
experiment.config.trial_code_directory = Path(__file__).parent
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.training_service.use_active_gpu = True
experiment.run(8088)

Binary file not shown.


@ -1,300 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""
NNI example for combined pruning and quantization to compress a model.
In this example, we show the compression process to first prune a model, then quantize the pruned model.
"""
import argparse
import os
import time
import torch
import torch.nn.functional as F
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torchvision import datasets, transforms
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.pruning import L1FilterPruner
from nni.compression.pytorch.quantization import QAT_Quantizer
from models.mnist.naive import NaiveModel
from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT
def get_model_time_cost(model, dummy_input):
model.eval()
n_times = 100
time_list = []
for _ in range(n_times):
torch.cuda.synchronize()
tic = time.time()
_ = model(dummy_input)
torch.cuda.synchronize()
time_list.append(time.time()-tic)
time_list = time_list[10:]
return sum(time_list) / len(time_list)
def train(args, model, device, train_loader, criterion, optimizer, epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx % args.log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
if args.dry_run:
break
def test(args, model, device, criterion, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += criterion(output, target).item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
acc = 100 * correct / len(test_loader.dataset)
print('Test Loss: {:.6f} Accuracy: {}%\n'.format(
test_loss, acc))
return acc
def test_trt(engine, test_loader):
test_loss = 0
correct = 0
time_elasped = 0
for data, target in test_loader:
output, time = engine.inference(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
time_elasped += time
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
print("Inference elapsed_time (whole dataset): {}s".format(time_elasped))
def main(args):
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
os.makedirs(args.experiment_data_dir, exist_ok=True)
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True, transform=transform),
batch_size=64,)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=False, transform=transform),
batch_size=1000)
# Step1. Model Pretraining
model = NaiveModel().to(device)
criterion = torch.nn.NLLLoss()
optimizer = optim.Adadelta(model.parameters(), lr=args.pretrain_lr)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
flops, params, _ = count_flops_params(model, (1, 1, 28, 28), verbose=False)
if args.pretrained_model_dir is None:
args.pretrained_model_dir = os.path.join(args.experiment_data_dir, f'pretrained.pth')
best_acc = 0
for epoch in range(args.pretrain_epochs):
train(args, model, device, train_loader, criterion, optimizer, epoch)
scheduler.step()
acc = test(args, model, device, criterion, test_loader)
if acc > best_acc:
best_acc = acc
state_dict = model.state_dict()
model.load_state_dict(state_dict)
torch.save(state_dict, args.pretrained_model_dir)
print(f'Model saved to {args.pretrained_model_dir}')
else:
state_dict = torch.load(args.pretrained_model_dir)
model.load_state_dict(state_dict)
best_acc = test(args, model, device, criterion, test_loader)
dummy_input = torch.randn([1000, 1, 28, 28]).to(device)
time_cost = get_model_time_cost(model, dummy_input)
# 125.49 M, 0.85M, 93.29, 1.1012
print(f'Pretrained model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}, Time Cost: {time_cost}')
# Step2. Model Pruning
config_list = [{
'sparsity': args.sparsity,
'op_types': ['Conv2d']
}]
kw_args = {}
if args.dependency_aware:
dummy_input = torch.randn([1000, 1, 28, 28]).to(device)
print('Enable the dependency_aware mode')
# note that, not all pruners support the dependency_aware mode
kw_args['dependency_aware'] = True
kw_args['dummy_input'] = dummy_input
pruner = L1FilterPruner(model, config_list, **kw_args)
model = pruner.compress()
pruner.get_pruned_weights()
mask_path = os.path.join(args.experiment_data_dir, 'mask.pth')
model_path = os.path.join(args.experiment_data_dir, 'pruned.pth')
pruner.export_model(model_path=model_path, mask_path=mask_path)
pruner._unwrap_model() # unwrap all modules to normal state
# Step3. Model Speedup
m_speedup = ModelSpeedup(model, dummy_input, mask_path, device)
m_speedup.speedup_model()
print('model after speedup', model)
flops, params, _ = count_flops_params(model, dummy_input, verbose=False)
acc = test(args, model, device, criterion, test_loader)
time_cost = get_model_time_cost(model, dummy_input)
print(f'Pruned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {acc: .2f}, Time Cost: {time_cost}')
# Step4. Model Finetuning
optimizer = optim.Adadelta(model.parameters(), lr=args.pretrain_lr)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
best_acc = 0
for epoch in range(args.finetune_epochs):
train(args, model, device, train_loader, criterion, optimizer, epoch)
scheduler.step()
acc = test(args, model, device, criterion, test_loader)
if acc > best_acc:
best_acc = acc
state_dict = model.state_dict()
model.load_state_dict(state_dict)
save_path = os.path.join(args.experiment_data_dir, f'finetuned.pth')
torch.save(state_dict, save_path)
flops, params, _ = count_flops_params(model, dummy_input, verbose=True)
time_cost = get_model_time_cost(model, dummy_input)
# FLOPs 28.48 M, #Params: 0.18M, Accuracy: 89.03, Time Cost: 1.03
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}, Time Cost: {time_cost}')
print(f'Model saved to {save_path}')
# Step5. Model Quantization via QAT
config_list = [{
'quant_types': ['weight', 'output'],
'quant_bits': {'weight': 8, 'output': 8},
'op_names': ['conv1']
}, {
'quant_types': ['output'],
'quant_bits': {'output':8},
'op_names': ['relu1']
}, {
'quant_types': ['weight', 'output'],
'quant_bits': {'weight': 8, 'output': 8},
'op_names': ['conv2']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['relu2']
}]
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
quantizer = QAT_Quantizer(model, config_list, optimizer, dummy_input)
quantizer.compress()
# Step6. Quantization Aware Training
best_acc = 0
for epoch in range(1):
train(args, model, device, train_loader, criterion, optimizer, epoch)
scheduler.step()
acc = test(args, model, device, criterion, test_loader)
if acc > best_acc:
best_acc = acc
state_dict = model.state_dict()
calibration_path = os.path.join(args.experiment_data_dir, 'calibration.pth')
calibration_config = quantizer.export_model(model_path, calibration_path)
print("calibration_config: ", calibration_config)
# Step7. Model Speedup
batch_size = 32
input_shape = (batch_size, 1, 28, 28)
engine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=32)
engine.compress()
test_trt(engine, test_loader)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model comporession')
# dataset and model
# parser.add_argument('--dataset', type=str, default='mnist',
# help='dataset to use, mnist, cifar10 or imagenet')
# parser.add_argument('--data-dir', type=str, default='./data/',
# help='dataset directory')
parser.add_argument('--pretrained-model-dir', type=str, default=None,
help='path to pretrained model')
parser.add_argument('--pretrain-epochs', type=int, default=10,
help='number of epochs to pretrain the model')
parser.add_argument('--pretrain-lr', type=float, default=1.0,
help='learning rate to pretrain the model')
parser.add_argument('--experiment-data-dir', type=str, default='./experiment_data',
help='For saving output checkpoints')
parser.add_argument('--log-interval', type=int, default=100, metavar='N',
help='how many batches to wait before logging training status')
parser.add_argument('--dry-run', action='store_true', default=False,
help='quickly check a single pass')
# parser.add_argument('--multi-gpu', action='store_true', default=False,
# help='run on mulitple gpus')
# parser.add_argument('--test-only', action='store_true', default=False,
# help='run test only')
# pruner
# parser.add_argument('--pruner', type=str, default='l1filter',
# choices=['level', 'l1filter', 'l2filter', 'slim', 'agp',
# 'fpgm', 'mean_activation', 'apoz', 'admm'],
# help='pruner to use')
parser.add_argument('--sparsity', type=float, default=0.5,
help='target overall target sparsity')
parser.add_argument('--dependency-aware', action='store_true', default=False,
help='toggle dependency-aware mode')
# finetuning
parser.add_argument('--finetune-epochs', type=int, default=5,
help='epochs to fine tune')
# parser.add_argument('--kd', action='store_true', default=False,
# help='quickly check a single pass')
# parser.add_argument('--kd_T', type=float, default=4,
# help='temperature for KD distillation')
# parser.add_argument('--finetune-lr', type=float, default=0.5,
# help='learning rate to finetune the model')
# speedup
# parser.add_argument('--speedup', action='store_true', default=False,
# help='whether to speedup the pruned model')
# parser.add_argument('--nni', action='store_true', default=False,
# help="whether to tune the pruners using NNi tuners")
args = parser.parse_args()
main(args)


@ -1,43 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from pathlib import Path
import torch
from torch.optim import Adam
import nni
from nni.compression.experiment.experiment import CompressionExperiment
from nni.compression.experiment.config import CompressionExperimentConfig, TaylorFOWeightPrunerConfig
from vessel import LeNet, finetuner, evaluator, trainer, criterion, device
model = LeNet().to(device)
# pre-training model
finetuner(model)
optimizer = nni.trace(Adam)(model.parameters())
dummy_input = torch.rand(16, 1, 28, 28).to(device)
# normal experiment setting, no need to set search_space and trial_command
config = CompressionExperimentConfig('local')
config.experiment_name = 'auto compression torch example'
config.trial_concurrency = 1
config.max_trial_number = 10
config.trial_code_directory = Path(__file__).parent
config.tuner.name = 'TPE'
config.tuner.class_args['optimize_mode'] = 'maximize'
# compression experiment specific setting
# single float value means the expected remaining ratio upper limit for flops & params, lower limit for metric
config.compression_setting.flops = 0.2
config.compression_setting.params = 0.5
config.compression_setting.module_types = ['Conv2d', 'Linear']
config.compression_setting.exclude_module_names = ['fc2']
config.compression_setting.pruners = [TaylorFOWeightPrunerConfig()]
experiment = CompressionExperiment(config, model, finetuner, evaluator, dummy_input, trainer, optimizer, criterion, device)
experiment.run(8080)


@ -1,99 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from torchvision import datasets, transforms
import nni
@nni.trace
class LeNet(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout2d(0.25)
self.dropout2 = nn.Dropout2d(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
x = self.conv2(x)
x = F.relu(x)
x = F.max_pool2d(x, 2)
x = self.dropout1(x)
x = torch.flatten(x, 1)
x = self.fc1(x)
x = F.relu(x)
x = self.dropout2(x)
x = self.fc2(x)
output = F.log_softmax(x, dim=1)
return output
_use_cuda = True
device = torch.device("cuda" if _use_cuda else "cpu")
_train_kwargs = {'batch_size': 64}
_test_kwargs = {'batch_size': 1000}
if _use_cuda:
_cuda_kwargs = {'num_workers': 1,
'pin_memory': True,
'shuffle': True}
_train_kwargs.update(_cuda_kwargs)
_test_kwargs.update(_cuda_kwargs)
_transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])
_train_loader = None
_test_loader = None
def trainer(model, optimizer, criterion):
global _train_loader
if _train_loader is None:
dataset = datasets.MNIST('./data', train=True, download=True, transform=_transform)
_train_loader = torch.utils.data.DataLoader(dataset, **_train_kwargs)
model.train()
for data, target in _train_loader:
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def evaluator(model):
global _test_loader
if _test_loader is None:
dataset = datasets.MNIST('./data', train=False, transform=_transform, download=True)
_test_loader = torch.utils.data.DataLoader(dataset, **_test_kwargs)
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in _test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(_test_loader.dataset)
acc = 100 * correct / len(_test_loader.dataset)
print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
test_loss, correct, len(_test_loader.dataset), acc))
return acc
criterion = F.nll_loss
def finetuner(model: nn.Module):
optimizer = Adam(model.parameters())
for i in range(3):
trainer(model, optimizer, criterion)


@ -1,115 +0,0 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
class BasicBlock(nn.Module):
expansion = 1
def __init__(self, in_planes, planes, stride=1):
super(BasicBlock, self).__init__()
self.conv1 = nn.Conv2d(
in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
stride=1, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes)
self.shortcut = nn.Sequential()
if stride != 1 or in_planes != self.expansion*planes:
self.shortcut = nn.Sequential(
nn.Conv2d(in_planes, self.expansion*planes,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(self.expansion*planes)
)
def forward(self, x):
out = F.relu(self.bn1(self.conv1(x)))
out = self.bn2(self.conv2(out))
out += self.shortcut(x)
out = F.relu(out)
return out
class Bottleneck(nn.Module):
expansion = 4
def __init__(self, in_planes, planes, stride=1):
super(Bottleneck, self).__init__()
self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
self.bn1 = nn.BatchNorm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
stride=stride, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes)
self.conv3 = nn.Conv2d(planes, self.expansion *
planes, kernel_size=1, bias=False)
self.bn3 = nn.BatchNorm2d(self.expansion*planes)
self.shortcut = nn.Sequential()
if stride != 1 or in_planes != self.expansion*planes:
self.shortcut = nn.Sequential(
nn.Conv2d(in_planes, self.expansion*planes,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(self.expansion*planes)
)
def forward(self, x):
out = F.relu(self.bn1(self.conv1(x)))
out = F.relu(self.bn2(self.conv2(out)))
out = self.bn3(self.conv3(out))
out += self.shortcut(x)
out = F.relu(out)
return out
class ResNet(nn.Module):
def __init__(self, block, num_blocks, num_classes=10):
super(ResNet, self).__init__()
self.in_planes = 64
# this layer is different from torchvision.resnet18() since this model adopted for Cifar10
self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
self.linear = nn.Linear(512*block.expansion, num_classes)
def _make_layer(self, block, planes, num_blocks, stride):
strides = [stride] + [1]*(num_blocks-1)
layers = []
for stride in strides:
layers.append(block(self.in_planes, planes, stride))
self.in_planes = planes * block.expansion
return nn.Sequential(*layers)
def forward(self, x):
out = F.relu(self.bn1(self.conv1(x)))
out = self.layer1(out)
out = self.layer2(out)
out = self.layer3(out)
out = self.layer4(out)
out = F.avg_pool2d(out, 4)
out = out.view(out.size(0), -1)
out = self.linear(out)
return out
def ResNet18():
return ResNet(BasicBlock, [2, 2, 2, 2])
def ResNet34():
return ResNet(BasicBlock, [3, 4, 6, 3])
def ResNet50():
return ResNet(Bottleneck, [3, 4, 6, 3])
def ResNet101():
return ResNet(Bottleneck, [3, 4, 23, 3])
def ResNet152():
return ResNet(Bottleneck, [3, 8, 36, 3])


@ -1,63 +0,0 @@
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
defaultcfg = {
11: [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512],
13: [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512],
16: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512],
19: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512],
}
class VGG(nn.Module):
def __init__(self, depth=16):
super(VGG, self).__init__()
cfg = defaultcfg[depth]
self.cfg = cfg
self.feature = self.make_layers(cfg, True)
num_classes = 10
self.classifier = nn.Sequential(
nn.Linear(cfg[-1], 512),
nn.BatchNorm1d(512),
nn.ReLU(inplace=True),
nn.Linear(512, num_classes)
)
self._initialize_weights()
def make_layers(self, cfg, batch_norm=False):
layers = []
in_channels = 3
for v in cfg:
if v == 'M':
layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
else:
conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1, bias=False)
if batch_norm:
layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
else:
layers += [conv2d, nn.ReLU(inplace=True)]
in_channels = v
return nn.Sequential(*layers)
def forward(self, x):
x = self.feature(x)
x = nn.AvgPool2d(2)(x)
x = x.view(x.size(0), -1)
y = self.classifier(x)
return y
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(0.5)
m.bias.data.zero_()
elif isinstance(m, nn.Linear):
m.weight.data.normal_(0, 0.01)
m.bias.data.zero_()


@ -1,29 +0,0 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
class LeNet(nn.Module):
def __init__(self):
super(LeNet, self).__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout2d(0.25)
self.dropout2 = nn.Dropout2d(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
x = self.conv2(x)
x = F.relu(x)
x = F.max_pool2d(x, 2)
x = self.dropout1(x)
x = torch.flatten(x, 1)
x = self.fc1(x)
x = F.relu(x)
x = self.dropout2(x)
x = self.fc2(x)
output = F.log_softmax(x, dim=1)
return output


@ -1,27 +0,0 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
from functools import reduce
class NaiveModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)
self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)
self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)
self.fc2 = torch.nn.Linear(500, 10)
self.relu1 = torch.nn.ReLU6()
self.relu2 = torch.nn.ReLU6()
self.relu3 = torch.nn.ReLU6()
self.max_pool1 = torch.nn.MaxPool2d(2, 2)
self.max_pool2 = torch.nn.MaxPool2d(2, 2)
def forward(self, x):
x = self.relu1(self.conv1(x))
x = self.max_pool1(x)
x = self.relu2(self.conv2(x))
x = self.max_pool2(x)
x = x.view(-1, x.size()[1:].numel())
x = self.relu3(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim=1)


@ -1,88 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
#
# This file contains code adapted from AMC (https://github.com/mit-han-lab/amc)
# Copyright (c) 2018 MIT_Han_Lab
# Licensed under the MIT License
# https://github.com/mit-han-lab/amc/blob/master/LICENSE
import torch.nn as nn
import math
def conv_bn(inp, oup, stride):
return nn.Sequential(
nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU(inplace=True)
)
def conv_dw(inp, oup, stride):
return nn.Sequential(
nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
nn.BatchNorm2d(inp),
nn.ReLU(inplace=True),
nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU(inplace=True),
)
class MobileNet(nn.Module):
def __init__(self, n_class, profile='normal'):
super(MobileNet, self).__init__()
# original
if profile == 'normal':
in_planes = 32
cfg = [64, (128, 2), 128, (256, 2), 256, (512, 2), 512, 512, 512, 512, 512, (1024, 2), 1024]
# 0.5 AMC
elif profile == '0.5flops':
in_planes = 24
cfg = [48, (96, 2), 80, (192, 2), 200, (328, 2), 352, 368, 360, 328, 400, (736, 2), 752]
else:
raise NotImplementedError
self.conv1 = conv_bn(3, in_planes, stride=2)
self.features = self._make_layers(in_planes, cfg, conv_dw)
self.classifier = nn.Sequential(
nn.Linear(cfg[-1], n_class),
)
self._initialize_weights()
def forward(self, x):
x = self.conv1(x)
x = self.features(x)
x = x.mean([2, 3]) # global average pooling
x = self.classifier(x)
return x
def _make_layers(self, in_planes, cfg, layer):
layers = []
for x in cfg:
out_planes = x if isinstance(x, int) else x[0]
stride = 1 if isinstance(x, int) else x[1]
layers.append(layer(in_planes, out_planes, stride))
in_planes = out_planes
return nn.Sequential(*layers)
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
elif isinstance(m, nn.Linear):
n = m.weight.size(1)
m.weight.data.normal_(0, 0.01)
m.bias.data.zero_()
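# Hedged illustration (not part of the original file): how _make_layers reads the cfg list
# above -- a plain int keeps stride 1, while a (out_planes, stride) tuple also sets the stride.
if __name__ == '__main__':
    demo_cfg = [64, (128, 2), 128]
    demo_in_planes = 32
    for entry in demo_cfg:
        out_planes = entry if isinstance(entry, int) else entry[0]
        stride = 1 if isinstance(entry, int) else entry[1]
        print(f'conv_dw({demo_in_planes}, {out_planes}, stride={stride})')
        demo_in_planes = out_planes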

View file

@ -1,131 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import torch.nn as nn
import math
def conv_bn(inp, oup, stride):
return nn.Sequential(
nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU6(inplace=True)
)
def conv_1x1_bn(inp, oup):
return nn.Sequential(
nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU6(inplace=True)
)
class InvertedResidual(nn.Module):
def __init__(self, inp, oup, stride, expand_ratio):
super(InvertedResidual, self).__init__()
self.stride = stride
assert stride in [1, 2]
hidden_dim = round(inp * expand_ratio)
self.use_res_connect = self.stride == 1 and inp == oup
if expand_ratio == 1:
self.conv = nn.Sequential(
# dw
nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
nn.BatchNorm2d(hidden_dim),
nn.ReLU6(inplace=True),
# pw-linear
nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
)
else:
self.conv = nn.Sequential(
# pw
nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
nn.BatchNorm2d(hidden_dim),
nn.ReLU6(inplace=True),
# dw
nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
nn.BatchNorm2d(hidden_dim),
nn.ReLU6(inplace=True),
# pw-linear
nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
)
def forward(self, x):
if self.use_res_connect:
return x + self.conv(x)
else:
return self.conv(x)
class MobileNetV2(nn.Module):
def __init__(self, n_class=1000, input_size=224, width_mult=1.):
super(MobileNetV2, self).__init__()
block = InvertedResidual
input_channel = 32
last_channel = 1280
interverted_residual_setting = [
# t, c, n, s
[1, 16, 1, 1],
[6, 24, 2, 2],
[6, 32, 3, 2],
[6, 64, 4, 2],
[6, 96, 3, 1],
[6, 160, 3, 2],
[6, 320, 1, 1],
]
# building first layer
assert input_size % 32 == 0
input_channel = int(input_channel * width_mult)
self.last_channel = int(last_channel * width_mult) if width_mult > 1.0 else last_channel
self.features = [conv_bn(3, input_channel, 2)]
# building inverted residual blocks
for t, c, n, s in interverted_residual_setting:
output_channel = int(c * width_mult)
for i in range(n):
if i == 0:
self.features.append(block(input_channel, output_channel, s, expand_ratio=t))
else:
self.features.append(block(input_channel, output_channel, 1, expand_ratio=t))
input_channel = output_channel
# building last several layers
self.features.append(conv_1x1_bn(input_channel, self.last_channel))
# make it nn.Sequential
self.features = nn.Sequential(*self.features)
# building classifier
self.classifier = nn.Sequential(
nn.Dropout(0.2),
nn.Linear(self.last_channel, n_class),
)
self._initialize_weights()
def forward(self, x):
x = self.features(x)
# this is the same as .mean(3).mean(2), but speedup only supports the mean variant
# whose output has two dimensions, i.e. a single [N, C] node
# (see the equivalence check at the end of this file)
x = x.mean([2, 3])
x = self.classifier(x)
return x
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
elif isinstance(m, nn.Linear):
n = m.weight.size(1)
m.weight.data.normal_(0, 0.01)
m.bias.data.zero_()
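# Hedged check (not part of the original file), referenced from the comment in forward() above:
# mean over dims [2, 3] in one call equals chaining .mean(3).mean(2), but it produces the 2-D
# [N, C] output in a single node, which is the form the speedup tooling supports.
if __name__ == '__main__':
    import torch
    feat = torch.randn(2, 8, 7, 7)
    assert torch.allclose(feat.mean([2, 3]), feat.mean(3).mean(2))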

View file

@ -1,142 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported ActivationAPoZRank and ActivationMeanRank pruning algorithms.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
import nni
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import ActivationAPoZRankPruner, ActivationMeanRankPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pruner', type=str, default='apoz',
choices=['apoz', 'mean'],
help='pruner to use')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'total_sparsity': 0.5,
'op_types': ['Conv2d'],
}]
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
if 'apoz' in args.pruner:
pruner = ActivationAPoZRankPruner(model, config_list, trainer, traced_optimizer, criterion, training_batches=20)
else:
pruner = ActivationMeanRankPruner(model, config_list, trainer, traced_optimizer, criterion, training_batches=20)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]).to(device), masks_file=masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER SPEEDUP ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
g_epoch = 0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')
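# Hedged follow-up sketch (not part of the original example): `masks` returned by
# pruner.compress() above is assumed to map each pruned layer name to a dict holding a binary
# 'weight' mask, so the realized per-layer sparsity can be reported from such a dict like this.
def report_realized_sparsity(masks):
    for name, layer_masks in masks.items():
        weight_mask = layer_masks['weight']
        sparsity = 1.0 - weight_mask.count_nonzero().item() / weight_mask.numel()
        print(f'{name}: realized sparsity {sparsity:.2%}')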

View file

@ -1,142 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported ActivationAPoZRank and ActivationMeanRank pruning algorithms.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
import nni
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import ActivationAPoZRankPruner, ActivationMeanRankPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def train(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluate(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pruner', type=str, default='apoz',
choices=['apoz', 'mean'],
help='pruner to use')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
train(model, optimizer, criterion)
scheduler.step()
acc = evaluate(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'total_sparsity': 0.5,
'op_types': ['Conv2d'],
}]
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
if 'apoz' in args.pruner:
pruner = ActivationAPoZRankPruner(model, config_list, train, traced_optimizer, criterion, training_batches=20)
else:
pruner = ActivationMeanRankPruner(model, config_list, train, traced_optimizer, criterion, training_batches=20)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, torch.rand([10, 3, 32, 32]), masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER SPEEDUP ' + '=' * 50)
evaluate(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
g_epoch = 0
for i in range(args.fine_tune_epochs):
train(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluate(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')

View file

@ -1,138 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported ADMM pruning algorithms.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
import nni
from nni.compression.pytorch.speedup import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import ADMMPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'sparsity': 0.8,
'op_types': ['Conv2d'],
}]
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
pruner = ADMMPruner(model, config_list, trainer, traced_optimizer, criterion, iterations=10, training_epochs=1, granularity='coarse-grained')
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, torch.randn([128, 3, 32, 32]).to(device), masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER PRUNING ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
g_epoch = 0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')
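# Hedged illustration (not part of the original example) of the granularity='coarse-grained'
# option used above: a fine-grained mask zeroes individual weight entries, while a
# coarse-grained mask keeps or drops whole output filters of a Conv2d weight.
def _granularity_demo():
    import torch
    weight = torch.randn(4, 3, 3, 3)                      # [out_channels, in_channels, k, k]
    fine_mask = (torch.rand_like(weight) > 0.8).float()   # element-wise pruning decisions
    filter_scores = weight.abs().sum(dim=[1, 2, 3])       # one importance score per filter
    coarse_mask = torch.zeros_like(weight)
    coarse_mask[filter_scores.topk(2).indices] = 1.0      # keep the two strongest filters whole
    return fine_mask, coarse_mask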

View file

@ -1,98 +0,0 @@
import sys
from tqdm import tqdm
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
from nni.compression.pytorch.pruning import AMCPruner
from nni.compression.pytorch.utils import count_flops_params
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
criterion = torch.nn.CrossEntropyLoss()
def trainer(model, optimizer, criterion, epoch):
model.train()
for data, target in tqdm(iterable=train_loader, desc='Epoch {}'.format(epoch)):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def finetuner(model):
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for data, target in tqdm(iterable=train_loader, desc='Epoch PFs'):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def evaluator(model):
model.eval()
correct = 0
with torch.no_grad():
for data, target in tqdm(iterable=test_loader, desc='Test'):
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
if __name__ == '__main__':
# model = MobileNetV2(n_class=10).to(device)
model = VGG().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[50, 75], gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()
for i in range(100):
trainer(model, optimizer, criterion, i)
pre_best_acc = evaluator(model)
dummy_input = torch.rand(10, 3, 32, 32).to(device)
pre_flops, pre_params, _ = count_flops_params(model, dummy_input)
config_list = [{'op_types': ['Conv2d'], 'total_sparsity': 0.5, 'max_sparsity_per_layer': 0.8}]
# If you just want to keep the final result as the best result, pass evaluator as None;
# otherwise the result with the highest score (given by the evaluator) is kept as the best result.
ddpg_params = {'hidden1': 300, 'hidden2': 300, 'lr_c': 1e-3, 'lr_a': 1e-4, 'warmup': 100, 'discount': 1., 'bsize': 64,
'rmsize': 100, 'window_length': 1, 'tau': 0.01, 'init_delta': 0.5, 'delta_decay': 0.99, 'max_episode_length': 1e9, 'epsilon': 50000}
pruner = AMCPruner(400, model, config_list, dummy_input, evaluator, finetuner=finetuner, ddpg_params=ddpg_params, target='flops')
pruner.compress()
_, model, masks, best_acc, _ = pruner.get_best_result()
flops, params, _ = count_flops_params(model, dummy_input)
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')

View file

@ -1,94 +0,0 @@
import sys
from tqdm import tqdm
import torch
from torchvision import datasets, transforms
import nni
from nni.compression.pytorch.pruning import AutoCompressPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
criterion = torch.nn.CrossEntropyLoss()
epoch = 0
def trainer(model, optimizer, criterion):
global epoch
model.train()
for data, target in tqdm(iterable=train_loader, desc='Total Epoch {}'.format(epoch)):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
epoch = epoch + 1
def finetuner(model):
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
trainer(model, optimizer, criterion)
def evaluator(model):
model.eval()
correct = 0
with torch.no_grad():
for data, target in tqdm(iterable=test_loader, desc='Test'):
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
if __name__ == '__main__':
model = VGG().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
# pre-train the model
for _ in range(10):
trainer(model, optimizer, criterion)
config_list = [{'op_types': ['Conv2d'], 'total_sparsity': 0.8}]
dummy_input = torch.rand(10, 3, 32, 32).to(device)
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
admm_params = {
'trainer': trainer,
'traced_optimizer': traced_optimizer,
'criterion': criterion,
'iterations': 10,
'training_epochs': 1
}
sa_params = {
'evaluator': evaluator
}
pruner = AutoCompressPruner(model, config_list, 10, admm_params, sa_params, keep_intermediate_result=True, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()

View file

@ -1,131 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported fpgm pruning algorithms.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import FPGMPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = FPGMPruner(model, config_list)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]).to(device), masks_file=masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER SPEEDUP ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')
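# Hedged illustration (not part of the original example) of the FPGM criterion used above:
# filters whose flattened weights lie closest to all other filters (i.e. near the geometric
# median of the filter set) are treated as the most redundant and are pruned first.
def _fpgm_demo():
    import torch
    weight = torch.randn(6, 3, 3, 3)           # [out_channels, in_channels, k, k]
    filters = weight.flatten(1)                # one row per output filter
    distances = torch.cdist(filters, filters)  # pairwise L2 distances between filters
    scores = distances.sum(dim=1)              # small total distance -> near the median
    return scores.argsort()                    # pruning order, most redundant first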

View file

@ -1,138 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported iterative pruning algorithms.
In this example, we show the end-to-end iterative pruning process: pre-training -> pruning -> fine-tuning.
'''
import sys
import argparse
from tqdm import tqdm
import torch
from torchvision import datasets, transforms
from nni.compression.pytorch.pruning import (
LinearPruner,
AGPPruner,
LotteryTicketPruner
)
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
criterion = torch.nn.CrossEntropyLoss()
def trainer(model, optimizer, criterion, epoch):
model.train()
for data, target in tqdm(iterable=train_loader, desc='Epoch {}'.format(epoch)):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def finetuner(model):
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for data, target in tqdm(iterable=train_loader, desc='Epoch PFs'):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def evaluator(model):
model.eval()
correct = 0
with torch.no_grad():
for data, target in tqdm(iterable=test_loader, desc='Test'):
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Iterative Example for model compression')
parser.add_argument('--pruner', type=str, default='linear',
choices=['linear', 'agp', 'lottery'],
help='pruner to use')
parser.add_argument('--pretrain-epochs', type=int, default=10,
help='number of epochs to pretrain the model')
parser.add_argument('--total-iteration', type=int, default=10,
help='number of iteration to iteratively prune the model')
parser.add_argument('--pruning-algo', type=str, default='l1',
choices=['level', 'l1', 'l2', 'fpgm', 'slim', 'apoz',
'mean_activation', 'taylorfo', 'admm'],
help='algorithm to evaluate weights to prune')
parser.add_argument('--speedup', type=bool, default=False,
help='Whether to speedup the pruned model')
parser.add_argument('--reset-weight', type=bool, default=True,
help='Whether to reset weight during each iteration')
args = parser.parse_args()
model = VGG().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
# pre-train the model
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion, i)
evaluator(model)
config_list = [{'op_types': ['Conv2d'], 'sparsity': 0.8}]
dummy_input = torch.rand(10, 3, 32, 32).to(device)
# If you just want to keep the final result as the best result, pass evaluator as None;
# otherwise the result with the highest score (given by the evaluator) is kept as the best result.
kw_args = {'pruning_algorithm': args.pruning_algo,
'total_iteration': args.total_iteration,
'evaluator': None,
'finetuner': finetuner}
if args.speedup:
kw_args['speedup'] = args.speedup
kw_args['dummy_input'] = torch.rand(10, 3, 32, 32).to(device)
if args.pruner == 'linear':
iterative_pruner = LinearPruner
elif args.pruner == 'agp':
iterative_pruner = AGPPruner
elif args.pruner == 'lottery':
kw_args['reset_weight'] = args.reset_weight
iterative_pruner = LotteryTicketPruner
pruner = iterative_pruner(model, config_list, **kw_args)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
evaluator(model)

View file

@ -1,130 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported level pruning algorithm.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import LevelPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'sparsity': 0.5,
'op_types': ['default']
}]
pruner = LevelPruner(model, config_list)
_, masks = pruner.compress()
pruner.show_pruned_weights()
# Fine-grained pruning does not need a speedup step
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER PRUNING ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
g_epoch = 0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')
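# Hedged illustration (not part of the original example) of what the level pruner above does to
# a single weight tensor at 50% sparsity: zero out the smallest-magnitude half of the entries,
# leaving the tensor shape untouched (which is why no speedup step is needed).
def _level_demo():
    import torch
    weight = torch.randn(8, 8)
    k = weight.numel() // 2
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).float()
    return weight * mask, mask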

View file

@ -1,128 +0,0 @@
import functools
import time
from tqdm import tqdm
import torch
from torch.optim import Adam
from torch.utils.data import DataLoader
from datasets import load_metric, load_dataset
from transformers import (
BertForSequenceClassification,
BertTokenizerFast,
DataCollatorWithPadding,
set_seed
)
import nni
from nni.compression.pytorch.pruning import MovementPruner
task_to_keys = {
"cola": ("sentence", None),
"mnli": ("premise", "hypothesis"),
"mrpc": ("sentence1", "sentence2"),
"qnli": ("question", "sentence"),
"qqp": ("question1", "question2"),
"rte": ("sentence1", "sentence2"),
"sst2": ("sentence", None),
"stsb": ("sentence1", "sentence2"),
"wnli": ("sentence1", "sentence2"),
}
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
gradient_accumulation_steps = 8
# a placeholder criterion, because the huggingface model output already contains the loss
def criterion(input, target):
return input.loss
def trainer(model, optimizer, criterion, train_dataloader):
model.train()
counter = 0
for batch in (train_dataloader):
counter += 1
batch.to(device)
optimizer.zero_grad()
outputs = model(**batch)
# the pruner may wrap the criterion, e.g. loss = original_loss + norm(weight), so call criterion here to get the loss
loss = criterion(outputs, None)
loss = loss / gradient_accumulation_steps
loss.backward()
if counter % gradient_accumulation_steps == 0 or counter == len(train_dataloader):
optimizer.step()
if counter % 800 == 0:
print('[{}]: {}'.format(time.asctime(time.localtime(time.time())), counter))
if counter % 8000 == 0:
print('Step {}: {}'.format(counter // gradient_accumulation_steps, evaluator(model, metric, is_regression, validate_dataloader)))
def evaluator(model, metric, is_regression, eval_dataloader):
model.eval()
for batch in (eval_dataloader):
batch.to(device)
outputs = model(**batch)
predictions = outputs.logits.argmax(dim=-1) if not is_regression else outputs.logits.squeeze()
metric.add_batch(
predictions=predictions,
references=batch["labels"],
)
return metric.compute()
if __name__ == '__main__':
task_name = 'mnli'
is_regression = False
num_labels = 1 if is_regression else (3 if task_name == 'mnli' else 2)
train_batch_size = 4
eval_batch_size = 4
set_seed(1024)
tokenizer = BertTokenizerFast.from_pretrained('bert-base-cased')
sentence1_key, sentence2_key = task_to_keys[task_name]
# used to preprocess the raw data
def preprocess_function(examples):
# Tokenize the texts
args = (
(examples[sentence1_key],) if sentence2_key is None else (examples[sentence1_key], examples[sentence2_key])
)
result = tokenizer(*args, padding=False, max_length=128, truncation=True)
if "label" in examples:
# In all cases, rename the column to labels because the model will expect that.
result["labels"] = examples["label"]
return result
raw_datasets = load_dataset('glue', task_name, cache_dir='./data')
processed_datasets = raw_datasets.map(preprocess_function, batched=True, remove_columns=raw_datasets["train"].column_names)
train_dataset = processed_datasets['train']
validate_dataset = processed_datasets['validation_matched' if task_name == "mnli" else 'validation']
data_collator = DataCollatorWithPadding(tokenizer)
train_dataloader = DataLoader(train_dataset, shuffle=True, collate_fn=data_collator, batch_size=train_batch_size)
validate_dataloader = DataLoader(validate_dataset, collate_fn=data_collator, batch_size=eval_batch_size)
metric = load_metric("glue", task_name)
model = BertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=num_labels).to(device)
print('Initial: {}'.format(evaluator(model, metric, is_regression, validate_dataloader)))
config_list = [{'op_types': ['Linear'], 'op_partial_names': ['bert.encoder'], 'sparsity': 0.9}]
p_trainer = functools.partial(trainer, train_dataloader=train_dataloader)
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(Adam)(model.parameters(), lr=2e-5)
pruner = MovementPruner(model, config_list, p_trainer, traced_optimizer, criterion, training_epochs=10,
warm_up_step=12272, cool_down_beginning_step=110448)
_, masks = pruner.compress()
pruner.show_pruned_weights()
print('Final: {}'.format(evaluator(model, metric, is_regression, validate_dataloader)))
optimizer = Adam(model.parameters(), lr=2e-5)
trainer(model, optimizer, criterion, train_dataloader)
print('After 1 epoch finetuning: {}'.format(evaluator(model, metric, is_regression, validate_dataloader)))
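# Hedged illustration (not part of the original example) of why the trainer above divides the
# loss by gradient_accumulation_steps: accumulating the scaled backward passes of N micro-batches
# before a single optimizer.step() reproduces the gradient of their mean loss.
def _accumulation_demo():
    import torch
    w = torch.zeros(3, requires_grad=True)
    micro_batches = [torch.ones(3), 2 * torch.ones(3)]
    for x in micro_batches:
        loss = (w * x).sum()
        (loss / len(micro_batches)).backward()
    return w.grad                              # equals the mean of the per-micro-batch gradients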

View file

@ -1,137 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported l1norm and l2norm pruning algorithms.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import L1NormPruner, L2NormPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pruner', type=str, default='l1norm',
choices=['l1norm', 'l2norm'],
help='pruner to use')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
if 'l1' in args.pruner:
pruner = L1NormPruner(model, config_list)
else:
pruner = L2NormPruner(model, config_list)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]).to(device), masks_file=masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER SPEEDUP ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')
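# Hedged illustration (not part of the original example) of the L1-norm criterion used above:
# output filters with the smallest absolute weight sum are treated as least important.
def _l1_norm_demo():
    import torch
    weight = torch.randn(6, 3, 3, 3)
    l1_scores = weight.abs().sum(dim=[1, 2, 3])    # one score per output filter
    return l1_scores.argsort()                     # pruning order, least important first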

View file

@ -1,100 +0,0 @@
import sys
from tqdm import tqdm
import torch
from torchvision import datasets, transforms
from nni.compression.pytorch.pruning import L1NormPruner
from nni.compression.pytorch.pruning.tools import AGPTaskGenerator
from nni.compression.pytorch.pruning.basic_scheduler import PruningScheduler
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
criterion = torch.nn.CrossEntropyLoss()
def trainer(model, optimizer, criterion, epoch):
model.train()
for data, target in tqdm(iterable=train_loader, desc='Epoch {}'.format(epoch)):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def finetuner(model):
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for data, target in tqdm(iterable=train_loader, desc='Epoch PFs'):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def evaluator(model):
model.eval()
correct = 0
with torch.no_grad():
for data, target in tqdm(iterable=test_loader, desc='Test'):
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
if __name__ == '__main__':
model = VGG().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
# pre-train the model
for i in range(5):
trainer(model, optimizer, criterion, i)
# No need to pass model and config_list to the pruner at initialization when using a scheduler.
pruner = L1NormPruner(None, None)
# You can specify log_dir; all intermediate results and the best result will be saved under this folder.
# If you don't want to keep intermediate results, set `keep_intermediate_result=False`.
config_list = [{'op_types': ['Conv2d'], 'sparsity': 0.8}]
task_generator = AGPTaskGenerator(10, model, config_list, log_dir='.', keep_intermediate_result=True)
dummy_input = torch.rand(10, 3, 32, 32).to(device)
# If you just want to keep the final result as the best result, pass evaluator as None;
# otherwise the result with the highest score (given by the evaluator) is kept as the best result.
# scheduler = PruningScheduler(pruner, task_generator, finetuner=finetuner, speedup=True, dummy_input=dummy_input, evaluator=evaluator)
scheduler = PruningScheduler(pruner, task_generator, finetuner=finetuner, speedup=True, dummy_input=dummy_input, evaluator=None, reset_weight=False)
scheduler.compress()
_, model, masks, _, _ = scheduler.get_best_result()
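# Hedged illustration (not part of the original example) of the sparsity schedule an AGP task
# generator follows: sparsity ramps from an initial value s_i to the final target s_f over n
# iterations as s_t = s_f + (s_i - s_f) * (1 - t / n) ** 3 (Zhu & Gupta, "To prune, or not to prune").
def _agp_schedule(s_i=0.0, s_f=0.8, n=10):
    return [s_f + (s_i - s_f) * (1 - t / n) ** 3 for t in range(1, n + 1)]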

View file

@ -1,88 +0,0 @@
import sys
from tqdm import tqdm
import torch
from torchvision import datasets, transforms
from nni.compression.pytorch.pruning import L1NormPruner
from nni.compression.pytorch.speedup import ModelSpeedup
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
criterion = torch.nn.CrossEntropyLoss()
def trainer(model, optimizer, criterion, epoch):
model.train()
for data, target in tqdm(iterable=train_loader, desc='Epoch {}'.format(epoch)):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def evaluator(model):
model.eval()
correct = 0
with torch.no_grad():
for data, target in tqdm(iterable=test_loader, desc='Test'):
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
if __name__ == '__main__':
model = VGG().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
print('\nPre-train the model:')
for i in range(5):
trainer(model, optimizer, criterion, i)
evaluator(model)
config_list = [{'op_types': ['Conv2d'], 'sparsity': 0.8}]
pruner = L1NormPruner(model, config_list)
_, masks = pruner.compress()
print('\nThe accuracy with masks:')
evaluator(model)
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand(10, 3, 32, 32).to(device), masks_file=masks).speedup_model()
print('\nThe accuracy after speedup:')
evaluator(model)
# A new optimizer is needed because the modules in the model are replaced during speedup.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
print('\nFinetune the model after speedup:')
for i in range(5):
trainer(model, optimizer, criterion, i)
evaluator(model)

View file

@ -1,109 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for the simulated annealing pruning algorithm.
In this example, we show the end-to-end iterative pruning process: pre-training -> pruning -> fine-tuning.
'''
import sys
import argparse
from tqdm import tqdm
import torch
from torchvision import datasets, transforms
from nni.compression.pytorch.pruning import SimulatedAnnealingPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
criterion = torch.nn.CrossEntropyLoss()
def trainer(model, optimizer, criterion, epoch):
model.train()
for data, target in tqdm(iterable=train_loader, desc='Epoch {}'.format(epoch)):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def finetuner(model):
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for data, target in tqdm(iterable=train_loader, desc='Epoch PFs'):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def evaluator(model):
model.eval()
correct = 0
with torch.no_grad():
for data, target in tqdm(iterable=test_loader, desc='Test'):
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Iterative Example for model compression')
parser.add_argument('--pretrain-epochs', type=int, default=10,
help='number of epochs to pretrain the model')
parser.add_argument('--pruning-algo', type=str, default='l1',
choices=['level', 'l1', 'l2', 'fpgm', 'slim', 'apoz',
'mean_activation', 'taylorfo', 'admm'],
help='algorithm to evaluate weights to prune')
parser.add_argument('--cool-down-rate', type=float, default=0.9,
help='Cool down rate of the temperature.')
args = parser.parse_args()
model = VGG().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
# pre-train the model
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion, i)
evaluator(model)
config_list = [{'op_types': ['Conv2d'], 'total_sparsity': 0.8}]
# evaluator in 'SimulatedAnnealingPruner' must not be None.
pruner = SimulatedAnnealingPruner(model, config_list, pruning_algorithm=args.pruning_algo,
evaluator=evaluator, cool_down_rate=args.cool_down_rate, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
evaluator(model)

View file

@ -1,136 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for the Slim pruning algorithm.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> speedup -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
import nni
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import SlimPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'total_sparsity': 0.5,
'op_types': ['BatchNorm2d'],
'max_sparsity_per_layer': 0.9
}]
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
pruner = SlimPruner(model, config_list, trainer, traced_optimizer, criterion, training_epochs=1, scale=0.0001, mode='global')
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]).to(device), masks_file=masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER SPEEDUP ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
g_epoch = 0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')

View file

@ -1,36 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""
This is an example of pruning speedup for huggingface transformers.
NNI officially supports speeding up the attention heads of bert, bart, t5 and vit.
For other transformer attention modules, or any other hyper-module, users can customize the behavior by implementing a Replacer.
"""
import torch
from transformers.models.bert.configuration_bert import BertConfig
from transformers.models.bert.modeling_bert import BertForSequenceClassification
from nni.compression.pytorch.pruning import L1NormPruner
from nni.compression.pytorch.speedup import ModelSpeedup
from nni.compression.pytorch.utils.external.atten_replacer import TransformersAttentionReplacer
config = BertConfig()
model = BertForSequenceClassification(config)
config_list = [{
'op_types': ['Linear'],
'op_partial_names': ['bert.encoder.layer.{}.attention.self'.format(i) for i in range(12)],
'sparsity': 0.98
}]
pruner = L1NormPruner(model, config_list)
_, masks = pruner.compress()
pruner._unwrap_model()
replacer = TransformersAttentionReplacer(model)
ModelSpeedup(model, torch.randint(0, 30000, [4, 128]), masks, customized_replacers=[replacer]).speedup_model()
print(model(**{'input_ids': torch.randint(0, 30000, [4, 128])}))
print(model)
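
Under the merged namespace, the speedup and attention-replacer imports used above move as shown below (the new paths appear in the documentation updates later in this diff); the overall attention-head pruning flow stays the same.

.. code-block:: python

    # Before (paths used in the deleted example above):
    #   from nni.compression.pytorch.speedup import ModelSpeedup
    #   from nni.compression.pytorch.utils.external.atten_replacer import TransformersAttentionReplacer
    # After this merge:
    from nni.compression.speedup import ModelSpeedup
    from nni.compression.utils.external.external_replacer import TransformersAttentionReplacer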

View file

@ -1,165 +0,0 @@
from __future__ import annotations
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger
import torch
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader
from torchmetrics.functional import accuracy
from torchvision import datasets, transforms
import nni
from nni.compression.pytorch import LightningEvaluator
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
class SimpleLightningModel(pl.LightningModule):
def __init__(self):
super().__init__()
self.model = VGG()
self.criterion = torch.nn.CrossEntropyLoss()
def forward(self, x):
return self.model(x)
def training_step(self, batch, batch_idx):
x, y = batch
logits = self(x)
loss = self.criterion(logits, y)
self.log("train_loss", loss)
return loss
def evaluate(self, batch, stage=None):
x, y = batch
logits = self(x)
loss = self.criterion(logits, y)
preds = torch.argmax(logits, dim=1)
acc = accuracy(preds, y, 'multiclass', num_classes=10)
if stage:
self.log(f"default", loss, prog_bar=False)
self.log(f"{stage}_loss", loss, prog_bar=True)
self.log(f"{stage}_acc", acc, prog_bar=True)
def validation_step(self, batch, batch_idx):
self.evaluate(batch, "val")
def test_step(self, batch, batch_idx):
self.evaluate(batch, "test")
def configure_optimizers(self):
optimizer = nni.trace(torch.optim.Adam)(
self.parameters(),
lr=0.001
)
scheduler_dict = {
"scheduler": nni.trace(StepLR)(
optimizer,
step_size=1,
gamma=0.5
),
"interval": "epoch",
}
return {"optimizer": optimizer, "lr_scheduler": scheduler_dict}
class ImageNetDataModule(pl.LightningDataModule):
def __init__(self, data_dir: str = "./data"):
super().__init__()
self.data_dir = data_dir
def prepare_data(self):
# download
datasets.CIFAR10(self.data_dir, train=True, download=True)
datasets.CIFAR10(self.data_dir, train=False, download=True)
def setup(self, stage: str | None = None):
if stage == "fit" or stage is None:
self.cifar10_train_data = datasets.CIFAR10(root='data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]))
self.cifar10_val_data = datasets.CIFAR10(root='./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]))
if stage == "test" or stage is None:
self.cifar10_test_data = datasets.CIFAR10(root='./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]))
if stage == "predict" or stage is None:
self.cifar10_predict_data = datasets.CIFAR10(root='./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]))
def train_dataloader(self):
return DataLoader(self.cifar10_train_data, batch_size=128, shuffle=True)
def val_dataloader(self):
return DataLoader(self.cifar10_val_data, batch_size=128, shuffle=False)
def test_dataloader(self):
return DataLoader(self.cifar10_test_data, batch_size=128, shuffle=False)
def predict_dataloader(self):
return DataLoader(self.cifar10_predict_data, batch_size=128, shuffle=False)
# Train the model
pl_trainer = nni.trace(pl.Trainer)(
accelerator='auto',
devices=1,
max_epochs=3,
logger=TensorBoardLogger('./lightning_logs', name="vgg"),
)
pl_data = nni.trace(ImageNetDataModule)(data_dir='./data')
model = SimpleLightningModel()
pl_trainer.fit(model, pl_data)
metric = pl_trainer.test(model, pl_data)
print(f'The trained model accuracy: {metric}')
# create traced optimizer / lr_scheduler (not actually used below: LightningEvaluator only needs the traced Trainer and DataModule)
optimizer = nni.trace(torch.optim.Adam)(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = nni.trace(StepLR)(optimizer, step_size=1, gamma=0.5)
dummy_input = torch.rand(4, 3, 224, 224)
# LightningEvaluator initialization
evaluator = LightningEvaluator(pl_trainer, pl_data)
# apply pruning
from nni.compression.pytorch.pruning import TaylorFOWeightPruner
from nni.compression.pytorch.speedup import ModelSpeedup
pruner = TaylorFOWeightPruner(model, config_list=[{'total_sparsity': 0.5, 'op_types': ['Conv2d']}], evaluator=evaluator, training_steps=100)
_, masks = pruner.compress()
metric = pl_trainer.test(model, pl_data)
print(f'The masked model accuracy: {metric}')
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]), masks_file=masks).speedup_model()
metric = pl_trainer.test(model, pl_data)
print(f'The speedup model accuracy: {metric}')
# finetune the speedup model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = StepLR(optimizer, step_size=1, gamma=0.5)
pl_trainer = pl.Trainer(
accelerator='auto',
devices=1,
max_epochs=3,
logger=TensorBoardLogger('./lightning_logs', name="vgg"),
)
pl_trainer.fit(model, pl_data)
metric = pl_trainer.test(model, pl_data)
print(f'The speedup model after finetuning accuracy: {metric}')
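
Migration note: with this merge the evaluator import becomes ``nni.compression.LightningEvaluator``; a short sketch of the changed lines is given below, assuming the ``pl.Trainer`` and the data module are still created through ``nni.trace`` exactly as in the example above.

.. code-block:: python

    import nni
    import pytorch_lightning as pl
    # Before (as in the deleted example above):
    #   from nni.compression.pytorch import LightningEvaluator
    # After this merge:
    from nni.compression import LightningEvaluator

    # the Trainer and the LightningDataModule must still be wrapped with nni.trace;
    # ImageNetDataModule refers to the datamodule defined in the example above
    pl_trainer = nni.trace(pl.Trainer)(accelerator='auto', devices=1, max_epochs=3)
    pl_data = nni.trace(ImageNetDataModule)(data_dir='./data')
    evaluator = LightningEvaluator(pl_trainer, pl_data)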

View file

@ -1,136 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for the TaylorFOWeight pruning algorithm.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> speedup -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
import nni
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import TaylorFOWeightPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'total_sparsity': 0.5,
'op_types': ['Conv2d'],
}]
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
pruner = TaylorFOWeightPruner(model, config_list, trainer, traced_optimizer, criterion, training_batches=20)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]).to(device), masks_file=masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER SPEEDUP ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
g_epoch = 0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')
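
For migration, note that the import paths used above move as sketched below, and that in the 3.0 pruners the (trainer, traced_optimizer, criterion) arguments are replaced by an evaluator, as the evaluator-based examples elsewhere in this diff show. The ``nni.compression.pruning`` location of ``TaylorFOWeightPruner`` is an assumption by analogy with the documented ``MovementPruner`` path.

.. code-block:: python

    # Before (as in the deleted example above):
    #   from nni.compression.pytorch import ModelSpeedup
    #   from nni.compression.pytorch.pruning import TaylorFOWeightPruner
    # After this merge (pruning path assumed by analogy with the documented MovementPruner path):
    from nni.compression.speedup import ModelSpeedup
    from nni.compression.pruning import TaylorFOWeightPruner

    # In the merged interface the pruner is driven by an evaluator; the call pattern below is
    # an assumption carried over from the evaluator-based examples in this diff:
    # pruner = TaylorFOWeightPruner(model, config_list, evaluator, training_steps=20)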

View file

@ -1,265 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for pruning with DistributedDataParallel (DDP).
In this example, we use the TaylorFOWeight pruner to show the end-to-end DDP pruning process: pre-training -> pruning -> speedup -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import time
import functools
from typing import Callable
from pathlib import Path
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
import nni
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import TaylorFOWeightPruner
from nni.compression.pytorch.utils import TorchEvaluator
from nni.common.types import SCHEDULER
############# Create dataloaders, optimizer, training and evaluation function ############
class Mnist(torch.nn.Module):
def __init__(self):
super().__init__()
self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)
self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)
self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)
self.fc2 = torch.nn.Linear(500, 10)
self.relu1 = torch.nn.ReLU6()
self.relu2 = torch.nn.ReLU6()
self.relu3 = torch.nn.ReLU6()
self.max_pool1 = torch.nn.MaxPool2d(2, 2)
self.max_pool2 = torch.nn.MaxPool2d(2, 2)
def forward(self, x):
x = self.relu1(self.conv1(x))
x = self.max_pool1(x)
x = self.relu2(self.conv2(x))
x = self.max_pool2(x)
x = x.view(x.shape[0], -1)
x = self.relu3(self.fc1(x))
x = self.fc2(x)
return x
def create_dataloaders():
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
# training dataloader
training_dataset = datasets.MNIST('data', train=True, download=True, transform=trans)
training_sampler = torch.utils.data.distributed.DistributedSampler(training_dataset)
training_dataloader = torch.utils.data.DataLoader(training_dataset, \
batch_size=64, sampler=training_sampler)
# validation dataloader
validation_dataset = datasets.MNIST('data', train=False, transform=trans)
validation_sampler = torch.utils.data.distributed.DistributedSampler(validation_dataset)
validation_dataloader = torch.utils.data.DataLoader(validation_dataset, \
batch_size=1000, sampler=validation_sampler)
return training_dataloader, validation_dataloader
def training(
training_dataloader: DataLoader,
validation_dataloader: DataLoader,
model: nn.Module,
optimizer: torch.optim.Optimizer,
criterion: Callable[[torch.Tensor, torch.Tensor], torch.Tensor],
lr_scheduler: SCHEDULER = None,
max_steps: int = None, max_epochs: int = None,
local_rank: int = -1,
save_best_model: bool = False, save_path: str = None,
log_path: str = None,
evaluation_func=None,
):
model.train()
current_step = 0
best_acc = 0.
for current_epoch in range(max_epochs if max_epochs else 2):
for (data, target) in training_dataloader:
data, target = data.cuda(), target.cuda()
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if lr_scheduler:
lr_scheduler.step()
current_step += 1
# evaluation for every 1000 steps
if current_step % 1000 == 0 or current_step % len(training_dataloader) == 0:
acc = evaluation_func(validation_dataloader, model)
with log_path.open('a+') as f:
msg = '[{}] Epoch {}, Step {}: Acc: {} Loss:{}\n'.format(time.asctime(time.localtime(time.time())), \
current_epoch, current_step, acc, loss.item())
f.write(msg)
if save_best_model and best_acc < acc:
assert save_path is not None
if local_rank == 0:
torch.save(model.module.state_dict(), save_path)
best_acc = acc
if max_steps and current_step >= max_steps:
return best_acc
return best_acc
def evaluation(validation_dataloader: DataLoader, model: nn.Module):
training = model.training
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in validation_dataloader:
data, target = data.cuda(), target.cuda()
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(validation_dataloader.dataset)
# average acc in different local_ranks
average_acc = torch.tensor([acc]).cuda()
dist.all_reduce(average_acc, op=dist.ReduceOp.SUM)
world_size = dist.get_world_size()
average_acc = average_acc / world_size
print('Average Accuracy: {}%\n'.format(average_acc.item()))
model.train(training)
return average_acc.item()
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
def retrain_model(
args,
local_rank: int,
model: nn.Module = None,
):
# create a DDP model
if model is None: # pretraining process
model = Mnist().cuda()
log_save_path = "pretraining.log"
model_save_path = "pretraining_best_model.pth"
epochs = args.pretrain_epochs
lr = args.pretraining_lr
else: # finetune process
log_save_path = "finetune.log"
model_save_path = "finetune_best_model.pth"
epochs = args.finetune_epochs
lr = args.finetune_lr
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
# create dataloaders
training_dataloader, validation_dataloader = create_dataloaders()
# create optimizer, lr_scheduler and criterion
optimizer, lr_scheduler = optimizer_scheduler_generator(model, \
_lr=lr, total_epoch=epochs)
criterion = torch.nn.CrossEntropyLoss()
# training and evaluation process
best_acc = training(training_dataloader, validation_dataloader, model, optimizer, criterion, lr_scheduler,\
args.max_steps, epochs, local_rank, save_best_model=True, \
save_path=Path(args.log_dir) / model_save_path, \
log_path=Path(args.log_dir) / log_save_path, \
evaluation_func=evaluation)
# compute params and FLOPs
flops, params, _ = count_flops_params(model, torch.randn([32, 1, 28, 28]).cuda())
return flops, params, best_acc
def pruned_model_process(args, local_rank):
# load the pretrained model
model = Mnist().cuda()
state_dict = torch.load(Path(args.log_dir) / "pretraining_best_model.pth")
model.load_state_dict(state_dict)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
# create dataloaders
training_dataloader, validation_dataloader = create_dataloaders()
# build a config_list
config_list = [{'total_sparsity': 0.7, 'op_types': ['Conv2d']}]
# create an evaluator
taylor_training = functools.partial(
training,
training_dataloader,
validation_dataloader,
local_rank = local_rank,
log_path = Path(args.log_dir) / "taylor_pruning.log",
evaluation_func = evaluation,
)
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
evaluator = TorchEvaluator(taylor_training, traced_optimizer, criterion)
# create a TaylorFO pruner
pruner = TaylorFOWeightPruner(model=model, config_list=config_list,
evaluator=evaluator, training_steps=args.pruner_training_steps)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
#speedup
sub_module = ModelSpeedup(model, dummy_input=torch.rand([32, 1, 28, 28]).cuda(), masks_file=masks).speedup_model()
return sub_module
def main():
parser = argparse.ArgumentParser(description='PyTorch Example for model compression with DDP')
parser.add_argument('--finetune_lr', type=float, default=0.01,
help='the learning rate in the fine-tune process')
parser.add_argument('--pretraining_lr', type=float, default=0.01,
help='the learning rate in the pretraining process')
parser.add_argument('--eps', type=float, default=1e-8,
help='the parameter in the Adam optimizer')
parser.add_argument('--max_steps', type=int, default=None,
help='the max number of training steps')
parser.add_argument('--log_dir', type=str, default='./mnist_infos',
help='the base path for saving files')
parser.add_argument('--pruner_training_steps', type=int, default=1000,
help='the number of training steps in the pruning process')
parser.add_argument('--pretrain_epochs', type=int, default=5,
help='number of epochs to pretrain the model')
parser.add_argument('--finetune_epochs', type=int, default=20,
help='number of epochs to fine-tune the model')
args = parser.parse_args()
Path(args.log_dir).mkdir(parents=True, exist_ok=True)
#init ddp
dist.init_process_group(backend='nccl')
# get local_rank
rank = dist.get_rank()
local_rank = rank % torch.cuda.device_count()
print(f"local_rank:{local_rank}")
torch.cuda.set_device(local_rank)
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
original_flops, original_params, original_best_acc = retrain_model(args, local_rank)
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
model = pruned_model_process(args, local_rank)
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
finetuned_flops, finetuned_params, finetuned_best_acc = retrain_model(args, local_rank, model.cuda())
print(f'Pretrained model FLOPs {original_flops/1e6:.2f} M, #Params: {original_params/1e6:.2f}M, Accuracy: {original_best_acc: .2f}%')
print(f'Finetuned model FLOPs {finetuned_flops/1e6:.2f} M, #Params: {finetuned_params/1e6:.2f}M, Accuracy: {finetuned_best_acc: .2f}%')
if __name__ == '__main__':
main()

View file

@ -1,112 +0,0 @@
from __future__ import annotations
from typing import Callable, Any
import torch
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import nni
from nni.compression.pytorch import TorchEvaluator
from nni.common.types import SCHEDULER
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model: torch.nn.Module = VGG().to(device)
def training_func(model: torch.nn.Module, optimizers: torch.optim.Optimizer,
criterion: Callable[[Any, Any], torch.Tensor],
lr_schedulers: SCHEDULER | None = None, max_steps: int | None = None,
max_epochs: int | None = None, *args, **kwargs):
model.train()
# prepare data
cifar10_train_data = datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]), download=True)
train_dataloader = DataLoader(cifar10_train_data, batch_size=128, shuffle=True)
total_epochs = max_epochs if max_epochs else 3
total_steps = max_steps if max_steps else None
current_steps = 0
# training loop
for _ in range(total_epochs):
for inputs, labels in train_dataloader:
inputs, labels = inputs.to(device), labels.to(device)
optimizers.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizers.step()
current_steps += 1
if total_steps and current_steps == total_steps:
return
if lr_schedulers: lr_schedulers.step()
def evaluating_func(model: torch.nn.Module):
model.eval()
# prepare data
cifar10_val_data = datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]), download=True)
val_dataloader = DataLoader(cifar10_val_data, batch_size=4, shuffle=False)
# testing loop
correct = 0
with torch.no_grad():
for inputs, labels in val_dataloader:
inputs, labels = inputs.to(device), labels.to(device)
logits = model(inputs)
preds = torch.argmax(logits, dim=1)
correct += preds.eq(labels.view_as(preds)).sum().item()
return correct / len(cifar10_val_data)
# Train the model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = StepLR(optimizer, step_size=1, gamma=0.5)
training_func(model, optimizer, criterion, lr_scheduler)
acc = evaluating_func(model)
print(f'The trained model accuracy: {acc}')
# create traced optimizer / lr_scheduler
optimizer = nni.trace(torch.optim.Adam)(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = nni.trace(StepLR)(optimizer, step_size=1, gamma=0.5)
dummy_input = torch.rand(4, 3, 32, 32).to(device)
# TorchEvaluator initialization
evaluator = TorchEvaluator(training_func=training_func, optimizers=optimizer, criterion=criterion,
lr_schedulers=lr_scheduler, dummy_input=dummy_input, evaluating_func=evaluating_func)
# apply pruning
from nni.compression.pytorch.pruning import TaylorFOWeightPruner
from nni.compression.pytorch.speedup import ModelSpeedup
pruner = TaylorFOWeightPruner(model, config_list=[{'total_sparsity': 0.5, 'op_types': ['Conv2d']}], evaluator=evaluator, training_steps=100)
_, masks = pruner.compress()
acc = evaluating_func(model)
print(f'The masked model accuracy: {acc}')
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]).to(device), masks_file=masks).speedup_model()
acc = evaluating_func(model)
print(f'The speedup model accuracy: {acc}')
# finetune the speedup model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = StepLR(optimizer, step_size=1, gamma=0.5)
training_func(model, optimizer, criterion, lr_scheduler)
acc = evaluating_func(model)
print(f'The speedup model after finetuning accuracy: {acc}')
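
The evaluator import also moves with this merge; only the changed import is sketched here, because the 3.0 ``TorchEvaluator`` constructor arguments differ from the 2.x ones used above (see the migration notes earlier in this document).

.. code-block:: python

    # Before (as in the deleted example above):
    #   from nni.compression.pytorch import TorchEvaluator
    # After this merge:
    from nni.compression import TorchEvaluator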

View file

@ -1,57 +0,0 @@
import numpy as np
from datasets import load_dataset, load_metric
from transformers import (
AutoTokenizer,
AutoModelForSequenceClassification,
Trainer,
TrainingArguments
)
import nni
from nni.compression.pytorch import TransformersEvaluator
from nni.compression.pytorch.pruning import TaylorFOWeightPruner
dataset = load_dataset('yelp_review_full')
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
def tokenize_function(examples):
return tokenizer(examples['text'], padding='max_length', truncation=True)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
small_train_dataset = tokenized_datasets['train'].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets['test'].shuffle(seed=42).select(range(1000))
model = AutoModelForSequenceClassification.from_pretrained('bert-base-cased', num_labels=5)
training_args = TrainingArguments(output_dir='test_trainer')
metric = load_metric('accuracy')
def compute_metrics(eval_pred):
logits, labels = eval_pred
predictions = np.argmax(logits, axis=-1)
return metric.compute(predictions=predictions, references=labels)
training_args = TrainingArguments(
output_dir='./log',
evaluation_strategy='epoch',
per_device_train_batch_size=32,
num_train_epochs=3,
max_steps=-1
)
trainer = nni.trace(Trainer)(
model=model,
args=training_args,
train_dataset=small_train_dataset,
eval_dataset=small_eval_dataset,
compute_metrics=compute_metrics
)
evaluator = TransformersEvaluator(trainer)
pruner = TaylorFOWeightPruner(model, [{'op_types': ['Linear'], 'sparsity': 0.5}], evaluator, 20)
_, masks = pruner.compress()
pruner.show_pruned_weights()
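
Migration note: a compact sketch of the same Transformers pruning flow under the merged namespace. The ``nni.compression.utils`` path for ``TransformersEvaluator`` appears in the documentation updates later in this diff; the ``nni.compression.pruning`` path, the unchanged ``TaylorFOWeightPruner`` name and the ``sparse_ratio`` key are assumptions about the 3.0 interface.

.. code-block:: python

    import nni
    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)
    from nni.compression.utils import TransformersEvaluator    # path shown in the doc updates below
    from nni.compression.pruning import TaylorFOWeightPruner   # assumed post-merge path / unchanged name

    dataset = load_dataset('yelp_review_full')
    tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
    tokenized = dataset.map(lambda e: tokenizer(e['text'], padding='max_length', truncation=True), batched=True)
    train_ds = tokenized['train'].shuffle(seed=42).select(range(1000))

    model = AutoModelForSequenceClassification.from_pretrained('bert-base-cased', num_labels=5)
    trainer = nni.trace(Trainer)(
        model=model,
        args=TrainingArguments(output_dir='./log', per_device_train_batch_size=32, num_train_epochs=3),
        train_dataset=train_ds,
    )
    evaluator = TransformersEvaluator(trainer)
    # 'sparse_ratio' is assumed to replace the old 'sparsity' key in the merged config format
    pruner = TaylorFOWeightPruner(model, [{'op_types': ['Linear'], 'sparse_ratio': 0.5}], evaluator, training_steps=20)
    _, masks = pruner.compress()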

View file

@ -1,154 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.quantization import BNNQuantizer
class VGG_Cifar10(nn.Module):
def __init__(self, num_classes=1000):
super(VGG_Cifar10, self).__init__()
self.features = nn.Sequential(
nn.Conv2d(3, 128, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(128, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.BatchNorm2d(128, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(128, 256, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(256, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.BatchNorm2d(256, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(256, 512, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(512, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(512, 512, kernel_size=3, padding=1, bias=False),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.BatchNorm2d(512, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True)
)
self.classifier = nn.Sequential(
nn.Linear(512 * 4 * 4, 1024, bias=False),
nn.BatchNorm1d(1024),
nn.Hardtanh(inplace=True),
nn.Linear(1024, 1024, bias=False),
nn.BatchNorm1d(1024),
nn.Hardtanh(inplace=True),
nn.Linear(1024, num_classes), # do not quantize output
nn.BatchNorm1d(num_classes, affine=False)
)
def forward(self, x):
x = self.features(x)
x = x.view(-1, 512 * 4 * 4)
x = self.classifier(x)
return x
def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.cross_entropy(output, target)
loss.backward()
optimizer.step()
for name, param in model.named_parameters():
if name.endswith('old_weight'):
param.data.clamp_(-1, 1)  # clamp in place; reassigning the loop variable had no effect
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
acc = 100 * correct / len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, acc))
return acc
def adjust_learning_rate(optimizer, epoch):
update_list = [55, 100, 150, 200, 400, 600]
if epoch in update_list:
for param_group in optimizer.param_groups:
param_group['lr'] = param_group['lr'] * 0.1
return
def main():
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=200, shuffle=False)
model = VGG_Cifar10(num_classes=10)
model.to(device)
configure_list = [{
'quant_types': ['weight'],
'quant_bits': 1,
'op_types': ['Conv2d', 'Linear'],
'op_names': ['features.3', 'features.7', 'features.10', 'features.14', 'classifier.0', 'classifier.3']
}, {
'quant_types': ['output'],
'quant_bits': 1,
'op_types': ['Hardtanh'],
'op_names': ['features.6', 'features.9', 'features.13', 'features.16', 'features.20', 'classifier.2', 'classifier.5']
}]
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
quantizer = BNNQuantizer(model, configure_list, optimizer)
model = quantizer.compress()
print('=' * 10 + 'train' + '=' * 10)
best_top1 = 0
for epoch in range(400):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
adjust_learning_rate(optimizer, epoch)
top1 = test(model, device, test_loader)
if top1 > best_top1:
best_top1 = top1
print(best_top1)
if __name__ == '__main__':
main()

View file

@ -1,71 +0,0 @@
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.quantization import DoReFaQuantizer
import sys
sys.path.append('../models')
from mnist.naive import NaiveModel
def train(model, quantizer, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
def main():
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True, transform=trans),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=False, transform=trans),
batch_size=1000, shuffle=True)
model = NaiveModel()
model = model.to(device)
configure_list = [{
'quant_types': ['weight'],
'quant_bits': {
'weight': 8,
}, # you can just use `int` here because all `quant_types` share the same bit length.
'op_types':['Conv2d', 'Linear']
}]
quantizer = DoReFaQuantizer(model, configure_list)
quantizer.compress()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.5)
for epoch in range(10):
print('# Epoch {} #'.format(epoch))
train(model, quantizer, device, train_loader, optimizer)
test(model, device, test_loader)
if __name__ == '__main__':
main()

View file

@ -1,142 +0,0 @@
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.quantization import LsqQuantizer
from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT
class Mnist(torch.nn.Module):
def __init__(self):
super().__init__()
self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)
self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)
self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)
self.fc2 = torch.nn.Linear(500, 10)
self.relu1 = torch.nn.ReLU6()
self.relu2 = torch.nn.ReLU6()
self.relu3 = torch.nn.ReLU6()
self.max_pool1 = torch.nn.MaxPool2d(2, 2)
self.max_pool2 = torch.nn.MaxPool2d(2, 2)
def forward(self, x):
x = self.relu1(self.conv1(x))
x = self.max_pool1(x)
x = self.relu2(self.conv2(x))
x = self.max_pool2(x)
x = x.view(-1, 4 * 4 * 50)
x = self.relu3(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim=1)
def train(model, quantizer, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
def test_trt(engine, test_loader):
test_loss = 0
correct = 0
time_elapsed = 0
for data, target in test_loader:
output, time = engine.inference(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
time_elapsed += time
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
print("Inference elapsed_time (whole dataset): {}s".format(time_elapsed))
def main():
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True, transform=trans),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=False, transform=trans),
batch_size=1000, shuffle=True)
model = Mnist()
configure_list = [{
'quant_types': ['weight', 'input'],
'quant_bits': {'weight': 8, 'input': 8},
'op_names': ['conv1']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8, },
'op_names': ['relu1']
}, {
'quant_types': ['weight', 'input'],
'quant_bits': {'weight': 8, 'input': 8},
'op_names': ['conv2']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['relu2']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['max_pool2']
}
]
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
quantizer = LsqQuantizer(model, configure_list, optimizer)
quantizer.compress()
model.to(device)
for epoch in range(40):
print('# Epoch {} #'.format(epoch))
train(model, quantizer, device, train_loader, optimizer)
test(model, device, test_loader)
model_path = "mnist_model.pth"
calibration_path = "mnist_calibration.pth"
calibration_config = quantizer.export_model(model_path, calibration_path)
test(model, device, test_loader)
print("calibration_config: ", calibration_config)
batch_size = 32
input_shape = (batch_size, 1, 28, 28)
engine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=batch_size)
engine.compress()
test_trt(engine, test_loader)
if __name__ == '__main__':
main()

View file

@ -1,115 +0,0 @@
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.quantization import QAT_Quantizer
from nni.compression.pytorch.quantization.settings import set_quant_scheme_dtype
import sys
sys.path.append('../models')
from mnist.naive import NaiveModel
def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
def main():
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True, transform=trans),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=False, transform=trans),
batch_size=1000, shuffle=True)
# Two things should be kept in mind when setting this configure_list:
# 1. When deploying model on backend, some layers will be fused into one layer. For example, the consecutive
# conv + bn + relu layers will be fused into one big layer. If we want to execute the big layer in quantization
# mode, we should tell the backend the quantization information of the input, output, and the weight tensor of
# the big layer, which correspond to conv's input, conv's weight and relu's output.
# 2. Same tensor should be quantized only once. For example, if a tensor is the output of layer A and the input
# of the layer B, you should configure either {'quant_types': ['output'], 'op_names': ['a']} or
# {'quant_types': ['input'], 'op_names': ['b']} in the configure_list.
configure_list = [{
'quant_types': ['weight', 'input'],
'quant_bits': {'weight': 8, 'input': 8},
'op_names': ['conv1', 'conv2']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8, },
'op_names': ['relu1', 'relu2']
}, {
'quant_types': ['output', 'weight', 'input'],
'quant_bits': {'output': 8, 'weight': 8, 'input': 8},
'op_names': ['fc1', 'fc2'],
}]
# you can also set the quantization dtype and scheme layer-wise through configure_list like:
# configure_list = [{
# 'quant_types': ['weight', 'input'],
# 'quant_bits': {'weight': 8, 'input': 8},
# 'op_names': ['conv1', 'conv2'],
# 'quant_dtype': 'int',
# 'quant_scheme': 'per_channel_symmetric'
# }]
# For now quant_dtype's options are 'int' and 'uint'. And quant_scheme's options are per_tensor_affine,
# per_tensor_symmetric, per_channel_affine and per_channel_symmetric.
set_quant_scheme_dtype('weight', 'per_channel_symmetric', 'int')
set_quant_scheme_dtype('output', 'per_tensor_symmetric', 'int')
set_quant_scheme_dtype('input', 'per_tensor_symmetric', 'int')
model = NaiveModel().to(device)
dummy_input = torch.randn(1, 1, 28, 28).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
# To enable batch normalization folding in the training process, you should
# pass dummy_input to the QAT_Quantizer.
quantizer = QAT_Quantizer(model, configure_list, optimizer, dummy_input=dummy_input)
quantizer.compress()
model.to(device)
for epoch in range(40):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
model_path = "mnist_model.pth"
calibration_path = "mnist_calibration.pth"
onnx_path = "mnist_model.onnx"
input_shape = (1, 1, 28, 28)
device = torch.device("cuda")
calibration_config = quantizer.export_model(model_path, calibration_path, onnx_path, input_shape, device)
print("Generated calibration config is: ", calibration_config)
if __name__ == '__main__':
main()
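
After this merge the quantizer imports move to ``nni.compression.quantization``, as the documentation updates later in this diff show; note that the 3.0 class is spelled ``QATQuantizer`` and, judging from those same doc updates, is driven by an evaluator rather than a wrapped optimizer, so the example above does not translate line by line.

.. code-block:: python

    # Before (as in the deleted example above):
    #   from nni.compression.pytorch.quantization import QAT_Quantizer
    # After this merge (import shown in the doc updates further down in this diff):
    from nni.compression.quantization import QATQuantizer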

View file

@ -152,8 +152,8 @@ def build_finetuning_model(state_dict_path: str, is_quant=False):
import nni
from nni.contrib.compression.quantization import QATQuantizer, LsqQuantizer, PtqQuantizer
from nni.contrib.compression.utils import TransformersEvaluator
from nni.compression.quantization import QATQuantizer, LsqQuantizer, PtqQuantizer
from nni.compression.utils import TransformersEvaluator
def fake_quantize():

View file

@ -1,152 +0,0 @@
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.quantization import QAT_Quantizer
from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT
import sys
sys.path.append('../models')
from mnist.naive import NaiveModel
def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
def test_trt(engine, test_loader):
test_loss = 0
correct = 0
time_elapsed = 0
for data, target in test_loader:
output, time = engine.inference(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
time_elapsed += time
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
print("Inference elapsed_time (whole dataset): {}s".format(time_elapsed))
def post_training_quantization_example(train_loader, test_loader, device):
model = NaiveModel()
config = {
'conv1':{'weight_bits':8, 'output_bits':8},
'conv2':{'weight_bits':32, 'output_bits':32},
'fc1':{'weight_bits':16, 'output_bits':16},
'fc2':{'weight_bits':8, 'output_bits':8}
}
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
model.to(device)
for epoch in range(1):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
batch_size = 32
input_shape = (batch_size, 1, 28, 28)
engine = ModelSpeedupTensorRT(model, input_shape, config=config, calib_data_loader=train_loader, batchsize=batch_size)
engine.compress()
test_trt(engine, test_loader)
def quantization_aware_training_example(train_loader, test_loader, device):
model = NaiveModel()
configure_list = [{
'quant_types': ['input', 'weight'],
'quant_bits': {'input':8, 'weight':8},
'op_names': ['conv1']
}, {
'quant_types': ['output'],
'quant_bits': {'output':8},
'op_names': ['relu1']
}, {
'quant_types': ['input', 'weight'],
'quant_bits': {'input':8, 'weight':8},
'op_names': ['conv2']
}, {
'quant_types': ['output'],
'quant_bits': {'output':8},
'op_names': ['relu2']
}
]
# finetune the model by using QAT
# enable batchnorm folding mode
dummy_input = torch.randn(1, 1, 28, 28)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
quantizer = QAT_Quantizer(model, configure_list, optimizer, dummy_input=dummy_input)
quantizer.compress()
model.to(device)
for epoch in range(1):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
model_path = "mnist_model.pth"
calibration_path = "mnist_calibration.pth"
calibration_config = quantizer.export_model(model_path, calibration_path)
test(model, device, test_loader)
print("calibration_config: ", calibration_config)
batch_size = 32
input_shape = (batch_size, 1, 28, 28)
engine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=batch_size)
engine.compress()
test_trt(engine, test_loader)
def main():
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True, transform=trans),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=False, transform=trans),
batch_size=1000, shuffle=True)
# post-training quantization on TensorRT
post_training_quantization_example(train_loader, test_loader, device)
# combine NNI quantization algorithm QAT with backend framework TensorRT
quantization_aware_training_example(train_loader, test_loader, device)
if __name__ == '__main__':
main()

View file

@ -1,117 +0,0 @@
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.quantization import ObserverQuantizer
import sys
sys.path.append('../models')
from mnist.naive import NaiveModel
def train(model, device, train_loader, optimizer):
model.to(device)
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
def calibration(model, device, test_loader):
model.eval()
with torch.no_grad():
for data, _ in test_loader:
data = data.to(device)
model(data)
def main():
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True, transform=trans),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=False, transform=trans),
batch_size=1000, shuffle=True)
model = NaiveModel()
configure_list = [{
'quant_types': ['weight', 'input'],
'quant_bits': {'weight': 8, 'input': 8},
'op_names': ['conv1'],
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8, },
'op_names': ['relu1'],
}, {
'quant_types': ['weight', 'input'],
'quant_bits': {'weight': 8, 'input': 8},
'op_names': ['conv2'],
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['relu2'],
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['max_pool2'],
}
]
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
# Train the model to get a baseline performance
for epoch in range(5):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
# Construct the ObserverQuantizer. Note that currently ObserverQuantizer only works
# in evaluation mode.
quantizer = ObserverQuantizer(model.eval(), configure_list, optimizer)
# Use the test data set to do calibration, this will not change the model parameters
calibration(model, device, test_loader)
# obtain the quantization information and switch the model to "accuracy verification" mode
quantizer.compress()
# measure the accuracy of the quantized model.
test(model, device, test_loader)
model_path = "mnist_model.pth"
calibration_path = "mnist_calibration.pth"
calibration_config = quantizer.export_model(model_path, calibration_path)
print("calibration_config: ", calibration_config)
# For now the quantization settings of ObserverQuantizer does not match the TensorRT,
# so TensorRT conversion are not supported
# current settings:
# weight : per_tensor_symmetric, qint8
# activation : per_tensor_affine, quint8, reduce_range=True
if __name__ == '__main__':
main()

View file

@ -12,8 +12,8 @@ from torch import Tensor
from torchvision import datasets, transforms
from deepspeed import DeepSpeedEngine
from nni.contrib.compression.quantization import LsqQuantizer
from nni.contrib.compression.utils import DeepspeedTorchEvaluator
from nni.compression.quantization import LsqQuantizer
from nni.compression.utils import DeepspeedTorchEvaluator
from nni.common.types import SCHEDULER

View file

@ -199,8 +199,8 @@ if not skip_exec:
# The following code creates distillers for distillation.
from nni.contrib.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller
from nni.contrib.compression.utils import TransformersEvaluator
from nni.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller
from nni.compression.utils import TransformersEvaluator
# %%
# Dynamic distillation is suitable for situations where the distillation state dimensions of the student and the teacher match.
@ -312,9 +312,9 @@ def adapt_distillation(student_model: BertForSequenceClassification, teacher_mod
# You could refer to the experiment results to choose an appropriate ``regular_scale``.
from nni.contrib.compression.pruning import MovementPruner
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.pytorch.speedup.v2.external_replacer import TransformersAttentionReplacer
from nni.compression.pruning import MovementPruner
from nni.compression.speedup import ModelSpeedup
from nni.compression.utils.external.external_replacer import TransformersAttentionReplacer
def pruning_attn():
@ -378,7 +378,7 @@ if not skip_exec:
# so we use ``AGPPruner`` to schedule the sparse ratio to achieve better pruning performance.
from nni.contrib.compression.pruning import TaylorPruner, AGPPruner
from nni.compression.pruning import TaylorPruner, AGPPruner
from transformers.models.bert.modeling_bert import BertLayer
@ -444,7 +444,7 @@ if not skip_exec:
# The output masks can be generated and applied after registering the setting template for them.
from nni.contrib.compression.base.setting import PruningSetting
from nni.compression.base.setting import PruningSetting
output_align_setting = {
'_output_': {

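Taken together, the hunks in this tutorial converge on one migration pattern: prune with an evaluator-driven pruner from ``nni.compression.pruning``, then materialize the masks with ``nni.compression.speedup.ModelSpeedup``, passing a replacer when Hugging Face attention modules need to be rebuilt. The sketch below only composes the imports shown above and is not the tutorial's exact code; the ``MovementPruner`` arguments, the config keys, the ``compress`` call, and the ``customized_replacers`` keyword are all assumptions.

.. code-block:: python

    # Sketch only: assumed composition of the new import paths for attention pruning.
    from transformers import Trainer
    from nni.compression.pruning import MovementPruner
    from nni.compression.speedup import ModelSpeedup
    from nni.compression.utils import TransformersEvaluator
    from nni.compression.utils.external.external_replacer import TransformersAttentionReplacer

    def prune_attention(model, trainer: Trainer, dummy_input):
        # Assumed new-style config; the real tutorial uses head-level granularity
        # and a movement-pruning threshold instead of a plain sparse ratio.
        config_list = [{
            'op_types': ['Linear'],
            'sparse_ratio': 0.5,
        }]
        evaluator = TransformersEvaluator(trainer)
        pruner = MovementPruner(model, config_list, evaluator,
                                warmup_step=1000, cooldown_begin_step=9000)  # assumed schedule args
        _, attention_masks = pruner.compress(max_steps=None, max_epochs=1)   # assumed signature
        pruner.unwrap_model()

        # Structurally rebuild the pruned attention layers instead of only masking them.
        replacer = TransformersAttentionReplacer(model)  # assumption: the replacer takes the HF model
        ModelSpeedup(model, dummy_input, attention_masks,
                     customized_replacers=[replacer]).speedup_model()
        return model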
View file

@ -65,7 +65,7 @@ config_list = [{
# %%
# Pruners usually require `model` and `config_list` as input arguments.
from nni.contrib.compression.pruning import L1NormPruner
from nni.compression.pruning import L1NormPruner
pruner = L1NormPruner(model, config_list)
# Show the wrapped model structure; `PrunerModuleWrapper` has wrapped the layers configured in the config_list.
@ -88,7 +88,7 @@ for name, mask in masks.items():
pruner.unwrap_model()
# Speed up the model. For more information about speedup, please refer to :doc:`pruning_speedup`.
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.speedup import ModelSpeedup
ModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()
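
For reference, here is how the migrated quick-start reads end to end once only the two import lines above have changed; the config values are illustrative (``sparse_ratio`` and ``exclude_op_names`` follow the new config schema), and ``model``/``device`` are assumed to be the ones defined earlier in the tutorial.

.. code-block:: python

    import torch
    from nni.compression.pruning import L1NormPruner
    from nni.compression.speedup import ModelSpeedup

    # Illustrative config: prune half of each Linear/Conv2d layer, keeping the final classifier dense.
    config_list = [{
        'op_types': ['Linear', 'Conv2d'],
        'exclude_op_names': ['fc3'],
        'sparse_ratio': 0.5,
    }]

    pruner = L1NormPruner(model, config_list)
    # compress() wraps the configured modules and computes their masks.
    _, masks = pruner.compress()
    for name, mask in masks.items():
        print(name, list(mask.keys()))

    # Remove the wrappers before speedup so the pruned graph can be traced cleanly.
    pruner.unwrap_model()
    ModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()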

Some files were not shown because too many files changed in this diff.