[Compression] merge nni.contrib.compression with nni.compression (#5573)

Co-authored-by: nishang <nishang@microsoft.com>
This commit is contained in:
J-shang 2023-07-10 10:33:53 +08:00 committed by GitHub
Parent 8dc1a83047
Commit 27a24a12af
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
323 changed files with 950 additions and 22053 deletions

View file

@ -8,23 +8,6 @@ Nonetheless, if you have employed NNI Compression before and want to try the lat
this document will help you understand the noteworthy interface changes in 3.0.
New compression version import path:
.. code-block:: python
# most of the new compression modules, including pruners, quantizers, and distillers, except the new pruning speedup
from nni.contrib.compression.xxx import xxx
# new pruning speedup
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
Old compression version import path:
.. code-block:: python
from nni.compression.pytorch.xxx import xxx
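After this merge, a hedged sketch of the unified import paths (module names taken from the files updated below in this commit):
.. code-block:: python
# pruners, quantizers, distillers, and evaluator utilities now live directly under nni.compression
from nni.compression.pruning import L1NormPruner
from nni.compression.quantization import QATQuantizer
from nni.compression.distillation import DynamicLayerwiseDistiller
from nni.compression.utils import TransformersEvaluator
# pruning speedup moves from nni.compression.pytorch.speedup.v2 to nni.compression.speedup
from nni.compression.speedup import ModelSpeedup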
Compression Target
------------------

View file

@ -20,21 +20,21 @@ NNI introduces the ``Evaluator`` as the carrier of the training and evaluation p
These APIs could be tedious in terms of user experience: users had to switch the corresponding API frequently whenever they wanted to change compression algorithms.
``Evaluator`` is an alternative to the above interfaces; users only need to create the evaluator once, and it can be used with all compressors.
For users of native PyTorch, :class:`TorchEvaluator <nni.contrib.compression.TorchEvaluator>` requires the user to encapsulate the training process as a function and expose the specified interface,
For users of native PyTorch, :class:`TorchEvaluator <nni.compression.TorchEvaluator>` requires the user to encapsulate the training process as a function and expose the specified interface,
which brings some complexity. But don't worry, in most cases this will not change much code.
For users of `PyTorchLightning <https://www.pytorchlightning.ai/>`__, :class:`LightningEvaluator <nni.contrib.compression.LightningEvaluator>` can be created with only a few lines of code based on your original Lightning code.
For users of `PyTorchLightning <https://www.pytorchlightning.ai/>`__, :class:`LightningEvaluator <nni.compression.LightningEvaluator>` can be created with only a few lines of code based on your original Lightning code.
For users of `Transformers Trainer <https://huggingface.co/docs/transformers/main_classes/trainer>`__, :class:`TransformersEvaluator <nni.contrib.compression.TransformersEvaluator>` can be created with only a few lines of code.
For users of `Transformers Trainer <https://huggingface.co/docs/transformers/main_classes/trainer>`__, :class:`TransformersEvaluator <nni.compression.TransformersEvaluator>` can be created with only a few lines of code.
Here we give three examples of how to create an ``Evaluator`` for native PyTorch users, PyTorchLightning users and Huggingface Transformers users.
TorchEvaluator
--------------
:class:`TorchEvaluator <nni.contrib.compression.TorchEvaluator>` is for users who work in a native PyTorch environment (if you are using PyTorchLightning, please refer to `LightningEvaluator`_).
:class:`TorchEvaluator <nni.compression.TorchEvaluator>` is for users who work in a native PyTorch environment (if you are using PyTorchLightning, please refer to `LightningEvaluator`_).
:class:`TorchEvaluator <nni.contrib.compression.TorchEvaluator>` has six initialization parameters ``training_func``, ``optimizers``, ``training_step``, ``lr_schedulers``,
:class:`TorchEvaluator <nni.compression.TorchEvaluator>` has six initialization parameters ``training_func``, ``optimizers``, ``training_step``, ``lr_schedulers``,
``dummy_input``, ``evaluating_func``.
* ``training_func`` is the training loop to train the compressed model.
@ -53,8 +53,8 @@ TorchEvaluator
* ``evaluating_func`` is a callable function to evaluate the compressed model performance. Its input is a compressed model and its output is a metric.
The metric should be a float number or a dict with the key ``default``.
Please refer to :class:`TorchEvaluator <nni.contrib.compression.TorchEvaluator>` for more details.
Here is an example of how to initialize a :class:`TorchEvaluator <nni.contrib.compression.TorchEvaluator>`.
Please refer to :class:`TorchEvaluator <nni.compression.TorchEvaluator>` for more details.
Here is an example of how to initialize a :class:`TorchEvaluator <nni.compression.TorchEvaluator>`.
.. code-block:: python
@ -89,7 +89,7 @@ Here is an example of how to initialize a :class:`TorchEvaluator <nni.contrib.co
evaluator = TorchEvaluator(training_func, optimizer, training_step, lr_scheduler)
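For orientation, a condensed sketch of the pieces that feed into that call; the dataloader, model, and hyper-parameters below are assumptions, so treat it as a shape reference rather than the canonical example:
.. code-block:: python
import nni
import torch
import torch.nn.functional as F
from nni.compression import TorchEvaluator

def training_step(batch, model, *args, **kwargs):
    # compute and return the loss of one batch
    x, y = batch
    return F.cross_entropy(model(x), y)

def training_func(model, optimizers, training_step, lr_schedulers=None,
                  max_steps=None, max_epochs=None, *args, **kwargs):
    # plain PyTorch training loop driven by the arguments the compressor passes in
    model.train()
    steps = 0
    for _ in range(max_epochs or 1):
        for batch in train_dataloader:  # assumed to be defined elsewhere
            optimizers.zero_grad()
            loss = training_step(batch, model)
            loss.backward()
            optimizers.step()
            steps += 1
            if max_steps is not None and steps >= max_steps:
                return

# the optimizer passed to the evaluator should be created through nni.trace
optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
evaluator = TorchEvaluator(training_func, optimizer, training_step)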
.. note::
It is also worth noting that not all the arguments of :class:`TorchEvaluator <nni.contrib.compression.TorchEvaluator>` must be provided.
It is also worth noting that not all the arguments of :class:`TorchEvaluator <nni.compression.TorchEvaluator>` must be provided.
Some compressors only require ``evaluating_func`` as they do not train the model; some compressors only require ``training_func``.
Please refer to each compressor's doc to check the required arguments.
But it is fine to provide more arguments than the compressor needs.
@ -100,7 +100,7 @@ A complete example can be found :githublink:`here <examples/compression/evaluato
LightningEvaluator
------------------
:class:`LightningEvaluator <nni.contrib.compression.LightningEvaluator>` is for users who work with PyTorchLightning.
:class:`LightningEvaluator <nni.compression.LightningEvaluator>` is for users who work with PyTorchLightning.
Only three parts need to be modified compared with the original pytorch-lightning code:
@ -108,8 +108,8 @@ Only three parts users need to modify compared with the original pytorch-lightni
2. Wrap the ``LightningModule`` class with ``nni.trace``.
3. Wrap the ``LightningDataModule`` class with ``nni.trace``.
Please refer to :class:`LightningEvaluator <nni.contrib.compression.LightningEvaluator>` for more details.
Here is an example of how to initialize a :class:`LightningEvaluator <nni.contrib.compression.LightningEvaluator>`.
Please refer to :class:`LightningEvaluator <nni.compression.LightningEvaluator>` for more details.
Here is an example of how to initialize a :class:`LightningEvaluator <nni.compression.LightningEvaluator>`.
.. code-block:: python
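# A minimal, illustrative sketch; ``MyLightningModule`` and ``MyDataModule`` are placeholders
# for your own classes, and the full example in the docs differs in detail.
import nni
import pytorch_lightning as pl
from nni.compression import LightningEvaluator

# 1. inside ``configure_optimizers``, create optimizers/schedulers from nni.trace-wrapped classes
# 2. wrap the LightningModule class with nni.trace (the traced module is later passed to the compressor)
lightning_module = nni.trace(MyLightningModule)()
# 3. wrap the LightningDataModule class with nni.trace
data_module = nni.trace(MyDataModule)()

# the Lightning Trainer is typically also created through nni.trace
trainer = nni.trace(pl.Trainer)(max_epochs=3)
evaluator = LightningEvaluator(trainer, data_module)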
@ -139,7 +139,7 @@ A complete example can be found :githublink:`here <examples/compression/evaluato
TransformersEvaluator
---------------------
:class:`TransformersEvaluator <nni.contrib.compression.TransformersEvaluator>` is for users who work with the Huggingface Transformers Trainer.
:class:`TransformersEvaluator <nni.compression.TransformersEvaluator>` is for users who work with the Huggingface Transformers Trainer.
The only change needed is to wrap the Trainer class with ``nni.trace``.
@ -149,7 +149,7 @@ The only need is using ``nni.trace`` to wrap the Trainer class.
from transformers.trainer import Trainer
trainer = nni.trace(Trainer)(model, training_args, ...)
from nni.contrib.compression.utils import TransformersEvaluator
from nni.compression.utils import TransformersEvaluator
evaluator = TransformersEvaluator(trainer)
Moreover, if you are using a customized optimizer or learning rate scheduler, please wrap their classes with ``nni.trace`` as well, as sketched below.
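For instance, a hedged sketch of what that wrapping can look like (the optimizer and scheduler choices and the ``Trainer`` arguments here are illustrative, not prescriptive):
.. code-block:: python
import nni
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR
from transformers.trainer import Trainer
from nni.compression.utils import TransformersEvaluator

# wrap the optimizer and scheduler classes with nni.trace before instantiating them
optimizer = nni.trace(AdamW)(model.parameters(), lr=2e-5)
lr_scheduler = nni.trace(LinearLR)(optimizer)

trainer = nni.trace(Trainer)(model=model, args=training_args,
                             train_dataset=train_dataset,
                             optimizers=(optimizer, lr_scheduler))
evaluator = TransformersEvaluator(trainer)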
@ -166,9 +166,9 @@ A complete example of using a trainer with DeepSpeed mode under the Transformers
DeepspeedTorchEvaluator
-----------------------
:class:`DeepspeedTorchEvaluator <nni.contrib.compression.DeepspeedTorchEvaluator>` is an evaluator designed specifically for native PyTorch users who are utilizing DeepSpeed.
:class:`DeepspeedTorchEvaluator <nni.compression.DeepspeedTorchEvaluator>` is an evaluator designed specifically for native PyTorch users who are utilizing DeepSpeed.
:class:`DeepspeedTorchEvaluator <nni.contrib.compression.DeepspeedTorchEvaluator>` has eight initialization parameters ``training_func``, ``training_step``, ``deepspeed``, ``optimizer``, ``lr_scheduler``,
:class:`DeepspeedTorchEvaluator <nni.compression.DeepspeedTorchEvaluator>` has eight initialization parameters ``training_func``, ``training_step``, ``deepspeed``, ``optimizer``, ``lr_scheduler``,
``resume_from_checkpoint_args``, ``dummy_input``, ``evaluating_func``.
* ``training_func`` is the training loop to train the compressed model.
@ -189,8 +189,8 @@ DeepspeedTorchEvaluator
* ``evaluating_func`` is a callable function to evaluate the compressed model performance. Its input is a compressed model and its output is a metric.
The metric should be a float number or a dict with the key ``default``.
Please refer to :class:`DeepspeedTorchEvaluator <nni.contrib.compression.DeepspeedTorchEvaluator>` for more details.
Here is an example of how to initialize a :class:`DeepspeedTorchEvaluator <nni.contrib.compression.DeepspeedTorchEvaluator>`.
Please refer to :class:`DeepspeedTorchEvaluator <nni.compression.DeepspeedTorchEvaluator>` for more details.
Here is an example of how to initialize a :class:`DeepspeedTorchEvaluator <nni.compression.DeepspeedTorchEvaluator>`.
.. code-block:: python
@ -236,7 +236,7 @@ Here is an example of how to initialize a :class:`DeepspeedTorchEvaluator <nni.c
evaluator = DeepspeedTorchEvaluator(training_func, training_step, ds_config, lr_scheduler)
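The ``deepspeed`` argument (``ds_config`` above) is typically a standard DeepSpeed configuration, given as a dict or as a path to a JSON config file; a minimal illustrative config might look like this (values are placeholders to tune for your own setup):
.. code-block:: python
from nni.compression import DeepspeedTorchEvaluator

# illustrative DeepSpeed config; keys follow the standard DeepSpeed schema
ds_config = {
    "train_batch_size": 64,
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-3}
    },
    "fp16": {"enabled": True}
}
# training_func, training_step, and lr_scheduler are the ones from the example above
evaluator = DeepspeedTorchEvaluator(training_func, training_step, ds_config, lr_scheduler)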
.. note::
It is also worth noting that not all the arguments of :class:`DeepspeedTorchEvaluator <nni.contrib.compression.DeepspeedTorchEvaluator>` must be provided.
It is also worth noting that not all the arguments of :class:`DeepspeedTorchEvaluator <nni.compression.DeepspeedTorchEvaluator>` must be provided.
Some compressors only require ``evaluating_func`` as they do not train the model; some compressors only require ``training_func``.
Please refer to each compressor's doc to check the required arguments.
But it is fine to provide more arguments than the compressor needs.

View file

@ -120,6 +120,11 @@ linkcheck_ignore = [
# remove after 3.0 release
r'https://nni\.readthedocs\.io/en/v2\.10/compression/overview\.html',
r'https://github.com/google-research/google-research/blob/20736344/tunas/rematlib/mobile_model_v3.py#L453',
r'https://github.com/google-research/google-research/blob/20736344591f774f4b1570af64624ed1e18d2867/tunas/mobile_search_space_v3.py#L728',
r'https://github.com/quark0/darts/blob/f276dd346a09ae3160f8e3aca5c7b193fda1da37/cnn/model_search.py#L135',
r'https://github.com/rwightman/pytorch-image-models/blob/b7cb8d03/timm/models/efficientnet_blocks.py#L134',
]
# Ignore all links located in release.rst

View file

@ -4,9 +4,9 @@ Distiller
DynamicLayerwiseDistiller
-------------------------
.. autoclass:: nni.contrib.compression.distillation.DynamicLayerwiseDistiller
.. autoclass:: nni.compression.distillation.DynamicLayerwiseDistiller
Adaptive1dLayerwiseDistiller
----------------------------
.. autoclass:: nni.contrib.compression.distillation.Adaptive1dLayerwiseDistiller
.. autoclass:: nni.compression.distillation.Adaptive1dLayerwiseDistiller

View file

@ -6,25 +6,25 @@ Evaluator
TorchEvaluator
--------------
.. autoclass:: nni.contrib.compression.TorchEvaluator
.. autoclass:: nni.compression.TorchEvaluator
.. _new-lightning-evaluator:
LightningEvaluator
------------------
.. autoclass:: nni.contrib.compression.LightningEvaluator
.. autoclass:: nni.compression.LightningEvaluator
.. _new-transformers-evaluator:
TransformersEvaluator
---------------------
.. autoclass:: nni.contrib.compression.TransformersEvaluator
.. autoclass:: nni.compression.TransformersEvaluator
.. _new-deepspeed-torch-evaluator:
DeepspeedTorchEvaluator
-----------------------
.. autoclass:: nni.contrib.compression.DeepspeedTorchEvaluator
.. autoclass:: nni.compression.DeepspeedTorchEvaluator

View file

@ -9,42 +9,42 @@ Basic Pruner
Level Pruner
^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.LevelPruner
.. autoclass:: nni.compression.pruning.LevelPruner
.. _new-l1-norm-pruner:
L1 Norm Pruner
^^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.L1NormPruner
.. autoclass:: nni.compression.pruning.L1NormPruner
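For orientation, a hedged usage sketch under the merged namespace, following the updated quick-start tutorial in this commit (the ``config_list`` keys and the ``compress`` return values are taken from that tutorial; check the pruner doc for the exact signature):
.. code-block:: python
from nni.compression.pruning import L1NormPruner

config_list = [{'op_types': ['Linear'], 'sparse_ratio': 0.5}]
pruner = L1NormPruner(model, config_list)
# generate masks for the wrapped layers
_, masks = pruner.compress()
# unwrap before passing the model and masks to ModelSpeedup
pruner.unwrap_model()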
.. _new-l2-norm-pruner:
L2 Norm Pruner
^^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.L2NormPruner
.. autoclass:: nni.compression.pruning.L2NormPruner
.. _new-fpgm-pruner:
FPGM Pruner
^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.FPGMPruner
.. autoclass:: nni.compression.pruning.FPGMPruner
.. _new-slim-pruner:
Slim Pruner
^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.SlimPruner
.. autoclass:: nni.compression.pruning.SlimPruner
.. _new-taylor-pruner:
Taylor FO Weight Pruner
^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.TaylorPruner
.. autoclass:: nni.compression.pruning.TaylorPruner
Scheduled Pruners
-----------------
@ -54,14 +54,14 @@ Scheduled Pruners
Linear Pruner
^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.LinearPruner
.. autoclass:: nni.compression.pruning.LinearPruner
.. _new-agp-pruner:
AGP Pruner
^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.AGPPruner
.. autoclass:: nni.compression.pruning.AGPPruner
Other Pruner
------------
@ -71,4 +71,4 @@ Other Pruner
Movement Pruner
^^^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.pruning.MovementPruner
.. autoclass:: nni.compression.pruning.MovementPruner

View file

@ -1,5 +1,5 @@
Pruning Speedup
===============
.. autoclass:: nni.compression.pytorch.speedup.v2.ModelSpeedup
.. autoclass:: nni.compression.speedup.ModelSpeedup
:members:
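For reference, the usage pattern under the new import path, as it appears in the updated quick-start tutorials in this commit (the input shape below is from the MNIST example):
.. code-block:: python
import torch
from nni.compression.speedup import ModelSpeedup

# `model` was unwrapped by the pruner and `masks` comes from the pruner
ModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()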

View file

@ -6,32 +6,32 @@ Quantizer
QAT Quantizer
^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.quantization.QATQuantizer
.. autoclass:: nni.compression.quantization.QATQuantizer
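For orientation, the QAT usage pattern from the updated BERT quantization tutorial in this commit (the ``config_list`` and the traced trainer construction are abridged; see the tutorial for the full version):
.. code-block:: python
from nni.compression.quantization import QATQuantizer
from nni.compression.utils import TransformersEvaluator

evaluator = TransformersEvaluator(traced_trainer)
quantizer = QATQuantizer(model, config_list, evaluator, 1000)
model, calibration_config = quantizer.compress(max_steps=None, max_epochs=1)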
.. _NewDorefaQuantizer:
DoReFa Quantizer
^^^^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.quantization.DoReFaQuantizer
.. autoclass:: nni.compression.quantization.DoReFaQuantizer
.. _NewBNNQuantizer:
BNN Quantizer
^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.quantization.BNNQuantizer
.. autoclass:: nni.compression.quantization.BNNQuantizer
.. _NewLsqQuantizer:
LSQ Quantizer
^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.quantization.LsqQuantizer
.. autoclass:: nni.compression.quantization.LsqQuantizer
.. _NewPtqQuantizer:
PTQ Quantizer
^^^^^^^^^^^^^
.. autoclass:: nni.contrib.compression.quantization.PtqQuantizer
.. autoclass:: nni.compression.quantization.PtqQuantizer

View file

@ -6,5 +6,5 @@ Compression Utilities
auto_set_denpendency_group_ids
------------------------------
.. autoclass:: nni.contrib.compression.utils.auto_set_denpendency_group_ids
.. autoclass:: nni.compression.utils.auto_set_denpendency_group_ids
:members:

View file

@ -17,7 +17,7 @@
.. only:: html
.. image:: /tutorials/hpo_quickstart_tensorflow/images/thumb/sphx_glr_main_thumb.png
:alt: HPO Quickstart with TensorFlow
:alt:
:ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_main.py`
@ -34,7 +34,7 @@
.. only:: html
.. image:: /tutorials/hpo_quickstart_tensorflow/images/thumb/sphx_glr_model_thumb.png
:alt: Port TensorFlow Quickstart to NNI
:alt:
:ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_model.py`

Binary file not shown.

Before

Width:  |  Height:  |  Size: 18 KiB

After

Width:  |  Height:  |  Size: 35 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 18 KiB

After

Width:  |  Height:  |  Size: 35 KiB

21
docs/source/tutorials/new_pruning_bert_glue.ipynb generated
View file

@ -1,16 +1,5 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -134,7 +123,7 @@
},
"outputs": [],
"source": [
"from nni.contrib.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller\nfrom nni.contrib.compression.utils import TransformersEvaluator"
"from nni.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller\nfrom nni.compression.utils import TransformersEvaluator"
]
},
{
@ -188,7 +177,7 @@
},
"outputs": [],
"source": [
"from nni.contrib.compression.pruning import MovementPruner\nfrom nni.compression.pytorch.speedup.v2 import ModelSpeedup\nfrom nni.compression.pytorch.speedup.v2.external_replacer import TransformersAttentionReplacer\n\n\ndef pruning_attn():\n Path('./output/bert_finetuned/').mkdir(parents=True, exist_ok=True)\n model = build_finetuning_model(task_name, f'./output/bert_finetuned/{task_name}.bin')\n trainer = prepare_traced_trainer(model, task_name)\n evaluator = TransformersEvaluator(trainer)\n\n config_list = [{\n 'op_types': ['Linear'],\n 'op_names_re': ['bert\\.encoder\\.layer\\.[0-9]*\\.attention\\.*'],\n 'sparse_threshold': 0.1,\n 'granularity': [64, 64]\n }]\n\n pruner = MovementPruner(model, config_list, evaluator, warmup_step=9000, cooldown_begin_step=36000, regular_scale=10)\n pruner.compress(None, 4)\n pruner.unwrap_model()\n\n masks = pruner.get_masks()\n Path('./output/pruning/').mkdir(parents=True, exist_ok=True)\n torch.save(masks, './output/pruning/attn_masks.pth')\n torch.save(model, './output/pruning/attn_masked_model.pth')\n\n\nif not skip_exec:\n pruning_attn()"
"from nni.compression.pruning import MovementPruner\nfrom nni.compression.speedup import ModelSpeedup\nfrom nni.compression.utils.external.external_replacer import TransformersAttentionReplacer\n\n\ndef pruning_attn():\n Path('./output/bert_finetuned/').mkdir(parents=True, exist_ok=True)\n model = build_finetuning_model(task_name, f'./output/bert_finetuned/{task_name}.bin')\n trainer = prepare_traced_trainer(model, task_name)\n evaluator = TransformersEvaluator(trainer)\n\n config_list = [{\n 'op_types': ['Linear'],\n 'op_names_re': ['bert\\.encoder\\.layer\\.[0-9]*\\.attention\\.*'],\n 'sparse_threshold': 0.1,\n 'granularity': [64, 64]\n }]\n\n pruner = MovementPruner(model, config_list, evaluator, warmup_step=9000, cooldown_begin_step=36000, regular_scale=10)\n pruner.compress(None, 4)\n pruner.unwrap_model()\n\n masks = pruner.get_masks()\n Path('./output/pruning/').mkdir(parents=True, exist_ok=True)\n torch.save(masks, './output/pruning/attn_masks.pth')\n torch.save(model, './output/pruning/attn_masked_model.pth')\n\n\nif not skip_exec:\n pruning_attn()"
]
},
{
@ -224,7 +213,7 @@
},
"outputs": [],
"source": [
"from nni.contrib.compression.pruning import TaylorPruner, AGPPruner\nfrom transformers.models.bert.modeling_bert import BertLayer\n\n\ndef pruning_ffn():\n model: BertForSequenceClassification = torch.load('./output/pruning/attn_pruned_model.pth')\n teacher_model: BertForSequenceClassification = build_finetuning_model('mnli', f'./output/bert_finetuned/{task_name}.bin')\n # create ffn config list, here simply use a linear function related to the number of retained heads to determine the sparse ratio\n config_list = []\n for name, module in model.named_modules():\n if isinstance(module, BertLayer):\n retained_head_num = module.attention.self.num_attention_heads\n ori_head_num = len(module.attention.pruned_heads) + retained_head_num\n ffn_sparse_ratio = 1 - retained_head_num / ori_head_num / 2\n config_list.append({'op_names': [f'{name}.intermediate.dense'], 'sparse_ratio': ffn_sparse_ratio})\n\n trainer = prepare_traced_trainer(model, task_name)\n teacher_model.eval().to(trainer.args.device)\n # create a distiller for restoring the accuracy\n distiller = dynamic_distiller(model, teacher_model, trainer)\n # fusion compress: TaylorPruner + DynamicLayerwiseDistiller\n taylor_pruner = TaylorPruner.from_compressor(distiller, config_list, 1000)\n # fusion compress: AGPPruner(TaylorPruner) + DynamicLayerwiseDistiller\n agp_pruner = AGPPruner(taylor_pruner, 1000, 36)\n agp_pruner.compress(None, 3)\n agp_pruner.unwrap_model()\n distiller.unwrap_teacher_model()\n\n masks = agp_pruner.get_masks()\n Path('./output/pruning/').mkdir(parents=True, exist_ok=True)\n torch.save(masks, './output/pruning/ffn_masks.pth')\n torch.save(model, './output/pruning/ffn_masked_model.pth')\n\n\nif not skip_exec:\n pruning_ffn()"
"from nni.compression.pruning import TaylorPruner, AGPPruner\nfrom transformers.models.bert.modeling_bert import BertLayer\n\n\ndef pruning_ffn():\n model: BertForSequenceClassification = torch.load('./output/pruning/attn_pruned_model.pth')\n teacher_model: BertForSequenceClassification = build_finetuning_model('mnli', f'./output/bert_finetuned/{task_name}.bin')\n # create ffn config list, here simply use a linear function related to the number of retained heads to determine the sparse ratio\n config_list = []\n for name, module in model.named_modules():\n if isinstance(module, BertLayer):\n retained_head_num = module.attention.self.num_attention_heads\n ori_head_num = len(module.attention.pruned_heads) + retained_head_num\n ffn_sparse_ratio = 1 - retained_head_num / ori_head_num / 2\n config_list.append({'op_names': [f'{name}.intermediate.dense'], 'sparse_ratio': ffn_sparse_ratio})\n\n trainer = prepare_traced_trainer(model, task_name)\n teacher_model.eval().to(trainer.args.device)\n # create a distiller for restoring the accuracy\n distiller = dynamic_distiller(model, teacher_model, trainer)\n # fusion compress: TaylorPruner + DynamicLayerwiseDistiller\n taylor_pruner = TaylorPruner.from_compressor(distiller, config_list, 1000)\n # fusion compress: AGPPruner(TaylorPruner) + DynamicLayerwiseDistiller\n agp_pruner = AGPPruner(taylor_pruner, 1000, 36)\n agp_pruner.compress(None, 3)\n agp_pruner.unwrap_model()\n distiller.unwrap_teacher_model()\n\n masks = agp_pruner.get_masks()\n Path('./output/pruning/').mkdir(parents=True, exist_ok=True)\n torch.save(masks, './output/pruning/ffn_masks.pth')\n torch.save(model, './output/pruning/ffn_masked_model.pth')\n\n\nif not skip_exec:\n pruning_ffn()"
]
},
{
@ -260,7 +249,7 @@
},
"outputs": [],
"source": [
"from nni.contrib.compression.base.setting import PruningSetting\n\noutput_align_setting = {\n '_output_': {\n 'align': {\n 'module_name': None,\n 'target_name': 'weight',\n 'dims': [0],\n },\n 'apply_method': 'mul',\n }\n}\nPruningSetting.register('BertAttention', output_align_setting)\nPruningSetting.register('BertOutput', output_align_setting)"
"from nni.compression.base.setting import PruningSetting\n\noutput_align_setting = {\n '_output_': {\n 'align': {\n 'module_name': None,\n 'target_name': 'weight',\n 'dims': [0],\n },\n 'apply_method': 'mul',\n }\n}\nPruningSetting.register('BertAttention', output_align_setting)\nPruningSetting.register('BertOutput', output_align_setting)"
]
},
{
@ -341,7 +330,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.11"
}
},
"nbformat": 4,

14
docs/source/tutorials/new_pruning_bert_glue.py generated
View file

@ -199,8 +199,8 @@ if not skip_exec:
# The following code creates distillers for distillation.
from nni.contrib.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller
from nni.contrib.compression.utils import TransformersEvaluator
from nni.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller
from nni.compression.utils import TransformersEvaluator
# %%
# Dynamic distillation is suitable for the situation where the distillation states dimension of the student and the teacher match.
@ -312,9 +312,9 @@ def adapt_distillation(student_model: BertForSequenceClassification, teacher_mod
# You could refer to the experiment results to choose an appropriate ``regular_scale`` you like.
from nni.contrib.compression.pruning import MovementPruner
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.pytorch.speedup.v2.external_replacer import TransformersAttentionReplacer
from nni.compression.pruning import MovementPruner
from nni.compression.speedup import ModelSpeedup
from nni.compression.utils.external.external_replacer import TransformersAttentionReplacer
def pruning_attn():
@ -378,7 +378,7 @@ if not skip_exec:
# so we use ``AGPPruner`` to schedule the sparse ratio to achieve better pruning performance.
from nni.contrib.compression.pruning import TaylorPruner, AGPPruner
from nni.compression.pruning import TaylorPruner, AGPPruner
from transformers.models.bert.modeling_bert import BertLayer
@ -444,7 +444,7 @@ if not skip_exec:
# The output masks can be generated and applied after register the setting template for them.
from nni.contrib.compression.base.setting import PruningSetting
from nni.compression.base.setting import PruningSetting
output_align_setting = {
'_output_': {

2
docs/source/tutorials/new_pruning_bert_glue.py.md5 generated
View file

@ -1 +1 @@
3e81f00f13fab8cfc204a0baef7d075e
f9ff31917a7b6ae9f988fcd63d626663

20
docs/source/tutorials/new_pruning_bert_glue.rst generated
View file

@ -10,7 +10,7 @@
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_new_pruning_bert_glue.py>`
:ref:`Go to the end <sphx_glr_download_tutorials_new_pruning_bert_glue.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
@ -300,8 +300,8 @@ The following code creates distillers for distillation.
from nni.contrib.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller
from nni.contrib.compression.utils import TransformersEvaluator
from nni.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller
from nni.compression.utils import TransformersEvaluator
@ -452,9 +452,9 @@ You could refer to the experiment results to choose a appropriate ``regular_scal
from nni.contrib.compression.pruning import MovementPruner
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.pytorch.speedup.v2.external_replacer import TransformersAttentionReplacer
from nni.compression.pruning import MovementPruner
from nni.compression.speedup import ModelSpeedup
from nni.compression.utils.external.external_replacer import TransformersAttentionReplacer
def pruning_attn():
@ -544,7 +544,7 @@ so we use ``AGPPruner`` to schedule the sparse ratio to achieve better pruning p
from nni.contrib.compression.pruning import TaylorPruner, AGPPruner
from nni.compression.pruning import TaylorPruner, AGPPruner
from transformers.models.bert.modeling_bert import BertLayer
@ -636,7 +636,7 @@ The output masks can be generated and applied after register the setting templat
from nni.contrib.compression.base.setting import PruningSetting
from nni.compression.base.setting import PruningSetting
output_align_setting = {
'_output_': {
@ -858,7 +858,7 @@ Results
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 1.990 seconds)
**Total running time of the script:** ( 0 minutes 0.020 seconds)
.. _sphx_glr_download_tutorials_new_pruning_bert_glue.py:
@ -868,6 +868,8 @@ Results
.. container:: sphx-glr-footer sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: new_pruning_bert_glue.py <new_pruning_bert_glue.py>`

Binary data
docs/source/tutorials/new_pruning_bert_glue_codeobj.pickle generated

Binary file not shown.

17
docs/source/tutorials/pruning_quick_start.ipynb generated
View file

@ -1,16 +1,5 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -80,7 +69,7 @@
},
"outputs": [],
"source": [
"from nni.contrib.compression.pruning import L1NormPruner\npruner = L1NormPruner(model, config_list)\n\n# show the wrapped model structure, `PrunerModuleWrapper` have wrapped the layers that configured in the config_list.\nprint(model)"
"from nni.compression.pruning import L1NormPruner\npruner = L1NormPruner(model, config_list)\n\n# show the wrapped model structure, `PrunerModuleWrapper` have wrapped the layers that configured in the config_list.\nprint(model)"
]
},
{
@ -109,7 +98,7 @@
},
"outputs": [],
"source": [
"# need to unwrap the model, if the model is wrapped before speedup\npruner.unwrap_model()\n\n# speedup the model, for more information about speedup, please refer :doc:`pruning_speedup`.\nfrom nni.compression.pytorch.speedup.v2 import ModelSpeedup\n\nModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()"
"# need to unwrap the model, if the model is wrapped before speedup\npruner.unwrap_model()\n\n# speedup the model, for more information about speedup, please refer :doc:`pruning_speedup`.\nfrom nni.compression.speedup import ModelSpeedup\n\nModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()"
]
},
{
@ -165,7 +154,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
"version": "3.10.11"
}
},
"nbformat": 4,

4
docs/source/tutorials/pruning_quick_start.py generated
View file

@ -65,7 +65,7 @@ config_list = [{
# %%
# Pruners usually require `model` and `config_list` as input arguments.
from nni.contrib.compression.pruning import L1NormPruner
from nni.compression.pruning import L1NormPruner
pruner = L1NormPruner(model, config_list)
# show the wrapped model structure, `PrunerModuleWrapper` have wrapped the layers that configured in the config_list.
@ -88,7 +88,7 @@ for name, mask in masks.items():
pruner.unwrap_model()
# speedup the model, for more information about speedup, please refer :doc:`pruning_speedup`.
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.speedup import ModelSpeedup
ModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()

2
docs/source/tutorials/pruning_quick_start.py.md5 generated
View file

@ -1 +1 @@
9feea465b118b0fa5da9379f4bb2d357
026cf2d53a9a109f620494e783ecec0b

20
docs/source/tutorials/pruning_quick_start.rst generated
View file

@ -10,7 +10,7 @@
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_pruning_quick_start.py>`
:ref:`Go to the end <sphx_glr_download_tutorials_pruning_quick_start.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
@ -104,9 +104,9 @@ If you are familiar with defining a model and training in pytorch, you can skip
.. code-block:: none
Average test loss: 0.6140, Accuracy: 7985/10000 (80%)
Average test loss: 0.2676, Accuracy: 9209/10000 (92%)
Average test loss: 0.1946, Accuracy: 9424/10000 (94%)
Average test loss: 0.7821, Accuracy: 7228/10000 (72%)
Average test loss: 0.2444, Accuracy: 9262/10000 (93%)
Average test loss: 0.1760, Accuracy: 9493/10000 (95%)
@ -151,7 +151,7 @@ Pruners usually require `model` and `config_list` as input arguments.
.. code-block:: default
from nni.contrib.compression.pruning import L1NormPruner
from nni.compression.pruning import L1NormPruner
pruner = L1NormPruner(model, config_list)
# show the wrapped model structure, `PrunerModuleWrapper` have wrapped the layers that configured in the config_list.
@ -213,10 +213,10 @@ Pruners usually require `model` and `config_list` as input arguments.
.. code-block:: none
fc2 sparsity : 0.5
fc1 sparsity : 0.5
conv1 sparsity : 0.5
conv2 sparsity : 0.5
fc1 sparsity : 0.5
fc2 sparsity : 0.5
@ -236,7 +236,7 @@ and reaches a higher sparsity ratio because `ModelSpeedup` will propagate the ma
pruner.unwrap_model()
# speedup the model, for more information about speedup, please refer :doc:`pruning_speedup`.
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.speedup import ModelSpeedup
ModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()
@ -326,7 +326,7 @@ Because speedup will replace the masked big layers with dense small ones.
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 20.740 seconds)
**Total running time of the script:** ( 1 minutes 1.145 seconds)
.. _sphx_glr_download_tutorials_pruning_quick_start.py:
@ -336,6 +336,8 @@ Because speedup will replace the masked big layers with dense small ones.
.. container:: sphx-glr-footer sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: pruning_quick_start.py <pruning_quick_start.py>`

Binary data
docs/source/tutorials/pruning_quick_start_codeobj.pickle generated

Binary file not shown.

17
docs/source/tutorials/pruning_speedup.ipynb generated
View file

@ -1,16 +1,5 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -87,7 +76,7 @@
},
"outputs": [],
"source": [
"from nni.compression.pytorch.speedup.v2 import ModelSpeedup\nModelSpeedup(model, torch.rand(10, 1, 28, 28).to(device), masks).speedup_model()\nprint(model)"
"from nni.compression.speedup import ModelSpeedup\nModelSpeedup(model, torch.rand(10, 1, 28, 28).to(device), masks).speedup_model()\nprint(model)"
]
},
{
@ -112,7 +101,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For combining usage of ``Pruner`` masks generation with ``ModelSpeedup``,\nplease refer to :doc:`Pruning Quick Start <pruning_quick_start_mnist>`.\n\nNOTE: The current implementation supports PyTorch 1.3.1 or newer.\n\n## Limitations\n\nFor PyTorch we can only replace modules, if functions in ``forward`` should be replaced,\nour current implementation does not work. One workaround is make the function a PyTorch module.\n\nIf you want to speedup your own model which cannot supported by the current implementation,\nyou need implement the replace function for module replacement, welcome to contribute.\n\n## Speedup Results of Examples\n\n\nThese result are tested on the [legacy pruning framework](https://nni.readthedocs.io/en/v2.6/Compression/pruning.html), new results will coming soon.\n\n### slim pruner example\n\non one V100 GPU,\ninput tensor: ``torch.randn(64, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01197\n - 0.005107\n * - 2\n - 0.02019\n - 0.008769\n * - 4\n - 0.02733\n - 0.014809\n * - 8\n - 0.04310\n - 0.027441\n * - 16\n - 0.07731\n - 0.05008\n * - 32\n - 0.14464\n - 0.10027\n\n### fpgm pruner example\n\non cpu,\ninput tensor: ``torch.randn(64, 1, 28, 28)``\\ ,\ntoo large variance\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01383\n - 0.01839\n * - 2\n - 0.01167\n - 0.003558\n * - 4\n - 0.01636\n - 0.01088\n * - 40\n - 0.14412\n - 0.08268\n * - 40\n - 1.29385\n - 0.14408\n * - 40\n - 0.41035\n - 0.46162\n * - 400\n - 6.29020\n - 5.82143\n\n### l1filter pruner example\n\non one V100 GPU,\ninput tensor: ``torch.randn(64, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01026\n - 0.003677\n * - 2\n - 0.01657\n - 0.008161\n * - 4\n - 0.02458\n - 0.020018\n * - 8\n - 0.03498\n - 0.025504\n * - 16\n - 0.06757\n - 0.047523\n * - 32\n - 0.10487\n - 0.086442\n\n### APoZ pruner example\n\non one V100 GPU,\ninput tensor: ``torch.randn(64, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01389\n - 0.004208\n * - 2\n - 0.01628\n - 0.008310\n * - 4\n - 0.02521\n - 0.014008\n * - 8\n - 0.03386\n - 0.023923\n * - 16\n - 0.06042\n - 0.046183\n * - 32\n - 0.12421\n - 0.087113\n\n### SimulatedAnnealing pruner example\n\nIn this experiment, we use SimulatedAnnealing pruner to prune the resnet18 on the cifar10 dataset.\nWe measure the latencies and accuracies of the pruned model under different sparsity ratios, as shown in the following figure.\nThe latency is measured on one V100 GPU and the input tensor is ``torch.randn(128, 3, 32, 32)``.\n\n<img src=\"file://../../img/SA_latency_accuracy.png\">\n\n"
"For combining usage of ``Pruner`` masks generation with ``ModelSpeedup``,\nplease refer to :doc:`Pruning Quick Start <pruning_quick_start>`.\n\nNOTE: The current implementation supports PyTorch 1.3.1 or newer.\n\n## Limitations\n\nFor PyTorch we can only replace modules, if functions in ``forward`` should be replaced,\nour current implementation does not work. One workaround is make the function a PyTorch module.\n\nIf you want to speedup your own model which cannot supported by the current implementation,\nyou need implement the replace function for module replacement, welcome to contribute.\n\n## Speedup Results of Examples\n\n\nThese result are tested on the [legacy pruning framework](https://nni.readthedocs.io/en/v2.6/Compression/pruning.html), new results will coming soon.\n\n### slim pruner example\n\non one V100 GPU,\ninput tensor: ``torch.randn(64, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01197\n - 0.005107\n * - 2\n - 0.02019\n - 0.008769\n * - 4\n - 0.02733\n - 0.014809\n * - 8\n - 0.04310\n - 0.027441\n * - 16\n - 0.07731\n - 0.05008\n * - 32\n - 0.14464\n - 0.10027\n\n### fpgm pruner example\n\non cpu,\ninput tensor: ``torch.randn(64, 1, 28, 28)``\\ ,\ntoo large variance\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01383\n - 0.01839\n * - 2\n - 0.01167\n - 0.003558\n * - 4\n - 0.01636\n - 0.01088\n * - 40\n - 0.14412\n - 0.08268\n * - 40\n - 1.29385\n - 0.14408\n * - 40\n - 0.41035\n - 0.46162\n * - 400\n - 6.29020\n - 5.82143\n\n### l1filter pruner example\n\non one V100 GPU,\ninput tensor: ``torch.randn(64, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01026\n - 0.003677\n * - 2\n - 0.01657\n - 0.008161\n * - 4\n - 0.02458\n - 0.020018\n * - 8\n - 0.03498\n - 0.025504\n * - 16\n - 0.06757\n - 0.047523\n * - 32\n - 0.10487\n - 0.086442\n\n### APoZ pruner example\n\non one V100 GPU,\ninput tensor: ``torch.randn(64, 3, 32, 32)``\n\n.. list-table::\n :header-rows: 1\n :widths: auto\n\n * - Times\n - Mask Latency\n - Speedup Latency\n * - 1\n - 0.01389\n - 0.004208\n * - 2\n - 0.01628\n - 0.008310\n * - 4\n - 0.02521\n - 0.014008\n * - 8\n - 0.03386\n - 0.023923\n * - 16\n - 0.06042\n - 0.046183\n * - 32\n - 0.12421\n - 0.087113\n\n### SimulatedAnnealing pruner example\n\nIn this experiment, we use SimulatedAnnealing pruner to prune the resnet18 on the cifar10 dataset.\nWe measure the latencies and accuracies of the pruned model under different sparsity ratios, as shown in the following figure.\nThe latency is measured on one V100 GPU and the input tensor is ``torch.randn(128, 3, 32, 32)``.\n\n<img src=\"file://../../img/SA_latency_accuracy.png\">\n\n"
]
}
],
@ -132,7 +121,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
"version": "3.10.11"
}
},
"nbformat": 4,

4
docs/source/tutorials/pruning_speedup.py generated
View file

@ -65,7 +65,7 @@ print('Original Model - Elapsed Time : ', time.time() - start)
# %%
# Speedup the model and show the model structure after speedup.
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.speedup import ModelSpeedup
ModelSpeedup(model, torch.rand(10, 1, 28, 28).to(device), masks).speedup_model()
print(model)
@ -77,7 +77,7 @@ print('Speedup Model - Elapsed Time : ', time.time() - start)
# %%
# For combining usage of ``Pruner`` masks generation with ``ModelSpeedup``,
# please refer to :doc:`Pruning Quick Start <pruning_quick_start_mnist>`.
# please refer to :doc:`Pruning Quick Start <pruning_quick_start>`.
#
# NOTE: The current implementation supports PyTorch 1.3.1 or newer.
#

2
docs/source/tutorials/pruning_speedup.py.md5 generated
View file

@ -1 +1 @@
60334840999c86b64ff889ee9909a797
e128a8e53fcc5368f479aa5a40aa2fe1

12
docs/source/tutorials/pruning_speedup.rst generated
View file

@ -10,7 +10,7 @@
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_pruning_speedup.py>`
:ref:`Go to the end <sphx_glr_download_tutorials_pruning_speedup.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
@ -138,7 +138,7 @@ Roughly test the original model inference speed.
.. code-block:: none
Original Model - Elapsed Time : 0.16419386863708496
Original Model - Elapsed Time : 2.3036391735076904
@ -151,7 +151,7 @@ Speedup the model and show the model structure after speedup.
.. code-block:: default
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.speedup import ModelSpeedup
ModelSpeedup(model, torch.rand(10, 1, 28, 28).to(device), masks).speedup_model()
print(model)
@ -200,7 +200,7 @@ Roughly test the model after speedup inference speed.
.. code-block:: none
Speedup Model - Elapsed Time : 0.0038301944732666016
Speedup Model - Elapsed Time : 0.09416508674621582
@ -371,7 +371,7 @@ The latency is measured on one V100 GPU and the input tensor is ``torch.randn(1
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 16.241 seconds)
**Total running time of the script:** ( 0 minutes 10.330 seconds)
.. _sphx_glr_download_tutorials_pruning_speedup.py:
@ -381,6 +381,8 @@ The latency is measured on one V100 GPU and the input tensor is ``torch.randn(1
.. container:: sphx-glr-footer sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: pruning_speedup.py <pruning_speedup.py>`

Binary data
docs/source/tutorials/pruning_speedup_codeobj.pickle generated

Binary file not shown.

15
docs/source/tutorials/quantization_bert_glue.ipynb generated
View file

@ -1,16 +1,5 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -116,7 +105,7 @@
},
"outputs": [],
"source": [
"import nni\nfrom nni.contrib.compression.quantization import QATQuantizer, LsqQuantizer, PtqQuantizer\nfrom nni.contrib.compression.utils import TransformersEvaluator\n\ndef fake_quantize():\n config_list = [{\n 'op_types': ['Linear'],\n 'op_names_re': ['bert.encoder.layer.{}'.format(i) for i in range(12)],\n 'target_names': ['weight', '_output_'],\n 'quant_dtype': 'int8',\n 'quant_scheme': 'affine',\n 'granularity': 'default',\n }]\n\n # create a finetune model\n Path('./output/bert_finetuned/').mkdir(parents=True, exist_ok=True)\n model: torch.nn.Module = build_finetuning_model(f'./output/bert_finetuned/{task_name}.bin', is_quant=False) # type: ignore\n traced_trainer = prepare_traced_trainer(model, is_quant=False)\n evaluator = TransformersEvaluator(traced_trainer)\n if quant_method == 'lsq':\n quantizer = LsqQuantizer(model, config_list, evaluator)\n model, calibration_config = quantizer.compress(max_steps=None, max_epochs=quant_max_epochs)\n elif quant_method == 'qat':\n quantizer = QATQuantizer(model, config_list, evaluator, 1000)\n model, calibration_config = quantizer.compress(max_steps=None, max_epochs=quant_max_epochs)\n elif quant_method == 'ptq':\n quantizer = PtqQuantizer(model, config_list, evaluator)\n model, calibration_config = quantizer.compress(max_steps=1, max_epochs=None)\n else:\n raise ValueError(f\"quantization method {quant_method} is not supported\")\n print(calibration_config)\n # evaluate the performance of the fake quantize model\n quantizer.evaluator.bind_model(model, quantizer._get_param_names_map())\n print(quantizer.evaluator.evaluate())\n\ndef evaluate():\n model = build_finetuning_model(f'./output/bert_finetuned/{task_name}.bin', is_quant=False)\n trainer = prepare_traced_trainer(model, is_quant=False)\n metrics = trainer.evaluate()\n print(f\"Evaluate metrics={metrics}\")\n\n\nfake_quantize()\nevaluate()"
"import nni\nfrom nni.compression.quantization import QATQuantizer, LsqQuantizer, PtqQuantizer\nfrom nni.compression.utils import TransformersEvaluator\n\ndef fake_quantize():\n config_list = [{\n 'op_types': ['Linear'],\n 'op_names_re': ['bert.encoder.layer.{}'.format(i) for i in range(12)],\n 'target_names': ['weight', '_output_'],\n 'quant_dtype': 'int8',\n 'quant_scheme': 'affine',\n 'granularity': 'default',\n }]\n\n # create a finetune model\n Path('./output/bert_finetuned/').mkdir(parents=True, exist_ok=True)\n model: torch.nn.Module = build_finetuning_model(f'./output/bert_finetuned/{task_name}.bin', is_quant=False) # type: ignore\n traced_trainer = prepare_traced_trainer(model, is_quant=False)\n evaluator = TransformersEvaluator(traced_trainer)\n if quant_method == 'lsq':\n quantizer = LsqQuantizer(model, config_list, evaluator)\n model, calibration_config = quantizer.compress(max_steps=None, max_epochs=quant_max_epochs)\n elif quant_method == 'qat':\n quantizer = QATQuantizer(model, config_list, evaluator, 1000)\n model, calibration_config = quantizer.compress(max_steps=None, max_epochs=quant_max_epochs)\n elif quant_method == 'ptq':\n quantizer = PtqQuantizer(model, config_list, evaluator)\n model, calibration_config = quantizer.compress(max_steps=1, max_epochs=None)\n else:\n raise ValueError(f\"quantization method {quant_method} is not supported\")\n print(calibration_config)\n # evaluate the performance of the fake quantize model\n quantizer.evaluator.bind_model(model, quantizer._get_param_names_map())\n print(quantizer.evaluator.evaluate())\n\ndef evaluate():\n model = build_finetuning_model(f'./output/bert_finetuned/{task_name}.bin', is_quant=False)\n trainer = prepare_traced_trainer(model, is_quant=False)\n metrics = trainer.evaluate()\n print(f\"Evaluate metrics={metrics}\")\n\n\nskip_exec = True\nif not skip_exec:\n fake_quantize()\n evaluate()"
]
},
{
@ -143,7 +132,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.11"
}
},
"nbformat": 4,

10
docs/source/tutorials/quantization_bert_glue.py generated
View file

@ -209,8 +209,8 @@ def build_finetuning_model(state_dict_path: str, is_quant=False):
# 6. Call ``quantizer.compress(max_steps, max_epochs)`` to execute the simulated quantization process
import nni
from nni.contrib.compression.quantization import QATQuantizer, LsqQuantizer, PtqQuantizer
from nni.contrib.compression.utils import TransformersEvaluator
from nni.compression.quantization import QATQuantizer, LsqQuantizer, PtqQuantizer
from nni.compression.utils import TransformersEvaluator
def fake_quantize():
config_list = [{
@ -250,8 +250,10 @@ def evaluate():
print(f"Evaluate metrics={metrics}")
fake_quantize()
evaluate()
skip_exec = True
if not skip_exec:
fake_quantize()
evaluate()
# %%

2
docs/source/tutorials/quantization_bert_glue.py.md5 generated
View file

@ -1 +1 @@
ba05e89a27a4d771b22a3de6d5172778
67e335e86718ed077e4997a9f0092ee3

247
docs/source/tutorials/quantization_bert_glue.rst generated

File diff suppressed because one or more lines are too long

Binary data
docs/source/tutorials/quantization_bert_glue_codeobj.pickle generated

Binary file not shown.

15
docs/source/tutorials/quantization_quick_start.ipynb generated
View file

@ -1,16 +1,5 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -123,7 +112,7 @@
},
"outputs": [],
"source": [
"import nni\nfrom nni.contrib.compression.quantization import QATQuantizer\nfrom nni.contrib.compression.utils import TorchEvaluator\n\n\noptimizer = nni.trace(SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)\nevaluator = TorchEvaluator(training_model, optimizer, training_step) # type: ignore\n\nconfig_list = [{\n 'op_names': ['conv1', 'conv2', 'fc1', 'fc2'],\n 'target_names': ['_input_', 'weight', '_output_'],\n 'quant_dtype': 'int8',\n 'quant_scheme': 'affine',\n 'granularity': 'default',\n},{\n 'op_names': ['relu1', 'relu2', 'relu3'],\n 'target_names': ['_output_'],\n 'quant_dtype': 'int8',\n 'quant_scheme': 'affine',\n 'granularity': 'default',\n}]\n\nquantizer = QATQuantizer(model, config_list, evaluator, len(train_dataloader))\nreal_input = next(iter(train_dataloader))[0].to(device)\nquantizer.track_forward(real_input)\n\nstart = time.time()\n_, calibration_config = quantizer.compress(None, max_epochs=5)\nprint(f'pure training 5 epochs: {time.time() - start}s')\n\nprint(calibration_config)\nstart = time.time()\nacc = evaluating_model(model)\nprint(f'quantization evaluating: {time.time() - start}s Acc.: {acc}')"
"import nni\nfrom nni.compression.quantization import QATQuantizer\nfrom nni.compression.utils import TorchEvaluator\n\n\noptimizer = nni.trace(SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)\nevaluator = TorchEvaluator(training_model, optimizer, training_step) # type: ignore\n\nconfig_list = [{\n 'op_names': ['conv1', 'conv2', 'fc1', 'fc2'],\n 'target_names': ['_input_', 'weight', '_output_'],\n 'quant_dtype': 'int8',\n 'quant_scheme': 'affine',\n 'granularity': 'default',\n},{\n 'op_names': ['relu1', 'relu2', 'relu3'],\n 'target_names': ['_output_'],\n 'quant_dtype': 'int8',\n 'quant_scheme': 'affine',\n 'granularity': 'default',\n}]\n\nquantizer = QATQuantizer(model, config_list, evaluator, len(train_dataloader))\nreal_input = next(iter(train_dataloader))[0].to(device)\nquantizer.track_forward(real_input)\n\nstart = time.time()\n_, calibration_config = quantizer.compress(None, max_epochs=5)\nprint(f'pure training 5 epochs: {time.time() - start}s')\n\nprint(calibration_config)\nstart = time.time()\nacc = evaluating_model(model)\nprint(f'quantization evaluating: {time.time() - start}s Acc.: {acc}')"
]
}
],
@ -143,7 +132,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
"version": "3.10.11"
}
},
"nbformat": 4,

4
docs/source/tutorials/quantization_quick_start.py generated
View file

@ -136,8 +136,8 @@ print(f'pure evaluating: {time.time() - start}s Acc.: {acc}')
# Detailed about how to write ``config_list`` please refer :doc:`Config Specification <../compression/config_list>`.
import nni
from nni.contrib.compression.quantization import QATQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import QATQuantizer
from nni.compression.utils import TorchEvaluator
optimizer = nni.trace(SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

View file

@ -1 +1 @@
d3d1074e56626255e3e19ef2a2ff057f
0eda6c780fb06aaecfc2e9c9e804d33a

45
docs/source/tutorials/quantization_quick_start.rst generated
View file

@ -10,7 +10,7 @@
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_quantization_quick_start.py>`
:ref:`Go to the end <sphx_glr_download_tutorials_quantization_quick_start.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
@ -123,6 +123,31 @@ Create training and evaluation dataloader
.. rst-class:: sphx-glr-script-out
.. code-block:: none
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/mnist/MNIST/raw/train-images-idx3-ubyte.gz
0%| | 0/9912422 [00:00<?, ?it/s] 100%|##########| 9912422/9912422 [00:00<00:00, 110174318.21it/s]
Extracting data/mnist/MNIST/raw/train-images-idx3-ubyte.gz to data/mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/mnist/MNIST/raw/train-labels-idx1-ubyte.gz
0%| | 0/28881 [00:00<?, ?it/s] 100%|##########| 28881/28881 [00:00<00:00, 91839040.05it/s]
Extracting data/mnist/MNIST/raw/train-labels-idx1-ubyte.gz to data/mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/mnist/MNIST/raw/t10k-images-idx3-ubyte.gz
0%| | 0/1648877 [00:00<?, ?it/s] 100%|##########| 1648877/1648877 [00:00<00:00, 26703211.30it/s]
Extracting data/mnist/MNIST/raw/t10k-images-idx3-ubyte.gz to data/mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz
0%| | 0/4542 [00:00<?, ?it/s] 100%|##########| 4542/4542 [00:00<00:00, 63081221.09it/s]
Extracting data/mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/mnist/MNIST/raw
@ -217,8 +242,8 @@ Pre-train and evaluate the model on MNIST dataset
Epoch 2 start!
Epoch 3 start!
Epoch 4 start!
pure training 5 epochs: 71.90893840789795s
pure evaluating: 1.6302893161773682s Acc.: 0.9908
pure training 5 epochs: 62.24345350265503s
pure evaluating: 1.5607831478118896s Acc.: 0.9906
@ -237,8 +262,8 @@ Detailed about how to write ``config_list`` please refer :doc:`Config Specificat
import nni
from nni.contrib.compression.quantization import QATQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import QATQuantizer
from nni.compression.utils import TorchEvaluator
optimizer = nni.trace(SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
@ -282,9 +307,9 @@ Detailed about how to write ``config_list`` please refer :doc:`Config Specificat
Epoch 2 start!
Epoch 3 start!
Epoch 4 start!
pure training 5 epochs: 117.75990748405457s
defaultdict(<class 'dict'>, {'fc2': {'weight': {'scale': tensor(0.0020), 'zero_point': tensor(-8.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.2640), 'tracked_min': tensor(-0.2319)}, '_input_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}, '_output_0': {'scale': tensor(0.1541), 'zero_point': tensor(-39.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(25.6346), 'tracked_min': tensor(-13.5170)}}, 'conv1': {'weight': {'scale': tensor(0.0023), 'zero_point': tensor(-12.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.3128), 'tracked_min': tensor(-0.2606)}, '_input_0': {'scale': tensor(0.0128), 'zero_point': tensor(-94.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(2.8215), 'tracked_min': tensor(-0.4242)}, '_output_0': {'scale': tensor(0.0265), 'zero_point': tensor(-5.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(3.4957), 'tracked_min': tensor(-3.2373)}}, 'fc1': {'weight': {'scale': tensor(0.0007), 'zero_point': tensor(3.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.0894), 'tracked_min': tensor(-0.0943)}, '_input_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}, '_output_0': {'scale': tensor(0.0678), 'zero_point': tensor(-8.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(9.1579), 'tracked_min': tensor(-8.0707)}}, 'conv2': {'weight': {'scale': tensor(0.0012), 'zero_point': tensor(-35.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.1927), 'tracked_min': tensor(-0.1097)}, '_input_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(5.9995), 'tracked_min': tensor(0.)}, '_output_0': {'scale': tensor(0.0893), 'zero_point': tensor(2.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(11.1702), 'tracked_min': tensor(-11.5212)}}, 'relu3': {'_output_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}}, 'relu2': {'_output_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}}, 'relu1': {'_output_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(5.9996), 'tracked_min': tensor(0.)}}})
quantization evaluating: 1.6024222373962402s Acc.: 0.9915
pure training 5 epochs: 94.30406522750854s
defaultdict(<class 'dict'>, {'fc1': {'weight': {'scale': tensor(0.0007), 'zero_point': tensor(6.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.0897), 'tracked_min': tensor(-0.0992)}, '_input_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}, '_output_0': {'scale': tensor(0.0648), 'zero_point': tensor(3.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(8.0606), 'tracked_min': tensor(-8.4004)}}, 'fc2': {'weight': {'scale': tensor(0.0018), 'zero_point': tensor(-5.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.2388), 'tracked_min': tensor(-0.2198)}, '_input_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}, '_output_0': {'scale': tensor(0.1514), 'zero_point': tensor(-35.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(24.4862), 'tracked_min': tensor(-13.9780)}}, 'conv1': {'weight': {'scale': tensor(0.0027), 'zero_point': tensor(11.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.3176), 'tracked_min': tensor(-0.3750)}, '_input_0': {'scale': tensor(0.0128), 'zero_point': tensor(-94.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(2.8215), 'tracked_min': tensor(-0.4242)}, '_output_0': {'scale': tensor(0.0261), 'zero_point': tensor(4.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(3.2271), 'tracked_min': tensor(-3.4134)}}, 'conv2': {'weight': {'scale': tensor(0.0011), 'zero_point': tensor(-24.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(0.1707), 'tracked_min': tensor(-0.1165)}, '_input_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(5.9999), 'tracked_min': tensor(0.)}, '_output_0': {'scale': tensor(0.0900), 'zero_point': tensor(1.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(11.3434), 'tracked_min': tensor(-11.5140)}}, 'relu2': {'_output_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}}, 'relu1': {'_output_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.0000), 'tracked_min': tensor(0.)}}, 'relu3': {'_output_0': {'scale': tensor(0.0236), 'zero_point': tensor(-127.), 'quant_dtype': 'int8', 'quant_scheme': 'affine', 'quant_bits': 8, 'tracked_max': tensor(6.), 'tracked_min': tensor(0.)}}})
quantization evaluating: 1.3835649490356445s Acc.: 0.9912
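
The ``scale`` and ``zero_point`` values in the calibration dumps above are consistent with a standard 8-bit affine mapping over the tracked activation range. The sketch below (plain PyTorch arithmetic, not an NNI API; the quantized range ``[-127, 127]`` is an assumption inferred from the printed numbers) reproduces the ``fc2`` ``_input_0`` entry:

.. code-block:: python

    import torch

    # Tracked activation range printed above for fc2 '_input_0'.
    tracked_min, tracked_max = torch.tensor(0.), torch.tensor(6.)
    qmin, qmax = -127, 127  # assumed signed 8-bit quantized range

    scale = (tracked_max - tracked_min) / (qmax - qmin)   # ~0.0236, matches the dump
    zero_point = qmin - torch.round(tracked_min / scale)  # -127., matches the dump

    # Quantize / dequantize one activation value under the affine scheme.
    x = torch.tensor(3.14)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    x_hat = (q - zero_point) * scale  # ~3.14, the value the quantized model "sees"
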
@ -292,7 +317,7 @@ Detailed about how to write ``config_list`` please refer :doc:`Config Specificat
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 3 minutes 22.673 seconds)
**Total running time of the script:** ( 2 minutes 40.255 seconds)
.. _sphx_glr_download_tutorials_quantization_quick_start.py:
@ -302,6 +327,8 @@ Detailed about how to write ``config_list`` please refer :doc:`Config Specificat
.. container:: sphx-glr-footer sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: quantization_quick_start.py <quantization_quick_start.py>`

Binary data
docs/source/tutorials/quantization_quick_start_codeobj.pickle (generated)

Binary file not shown.

docs/source/tutorials/quantization_speedup.ipynb (generated)

@ -1,16 +1,5 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -33,7 +22,7 @@
},
"outputs": [],
"source": [
"import torch\nimport torchvision\nimport torchvision.transforms as transforms\ndef prepare_data_loaders(data_path, batch_size):\n normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],\n std=[0.229, 0.224, 0.225])\n dataset = torchvision.datasets.ImageNet(\n data_path, split=\"train\",\n transform=transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n normalize,\n ]))\n\n sampler = torch.utils.data.SequentialSampler(dataset)\n data_loader = torch.utils.data.DataLoader(\n dataset, batch_size=batch_size,\n sampler=sampler)\n return data_loader\n\ndata_path = '/data' # replace it with your path of ImageNet dataset\ndata_loader = prepare_data_loaders(data_path, batch_size=128)\ncalib_data = None\nfor image, target in data_loader:\n calib_data = image.numpy()\n break\n\nfrom nni.compression.pytorch.quantization_speedup.calibrator import Calibrator\n# TensorRT processes the calibration data in the batch size of 64\ncalib = Calibrator(calib_data, 'data/calib_cache_file.cache', batch_size=64)"
"import torch\nimport torchvision\nimport torchvision.transforms as transforms\n\n\nskip_exec = True\n\nif not skip_exec:\n\n def prepare_data_loaders(data_path, batch_size):\n normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],\n std=[0.229, 0.224, 0.225])\n dataset = torchvision.datasets.ImageNet(\n data_path, split=\"train\",\n transform=transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n normalize,\n ]))\n\n sampler = torch.utils.data.SequentialSampler(dataset)\n data_loader = torch.utils.data.DataLoader(\n dataset, batch_size=batch_size,\n sampler=sampler)\n return data_loader\n\n data_path = '/data' # replace it with your path of ImageNet dataset\n data_loader = prepare_data_loaders(data_path, batch_size=128)\n calib_data = None\n for image, target in data_loader:\n calib_data = image.numpy()\n break\n\n from nni.compression.quantization_speedup.calibrator import Calibrator\n # TensorRT processes the calibration data in the batch size of 64\n calib = Calibrator(calib_data, 'data/calib_cache_file.cache', batch_size=64)"
]
},
{
@ -51,7 +40,7 @@
},
"outputs": [],
"source": [
"from nni_assets.compression.mobilenetv2 import MobileNetV2\nmodel = MobileNetV2()\n# a checkpoint of MobileNetV2 can be found here\n# https://download.pytorch.org/models/mobilenet_v2-b0353104.pth\nfloat_model_file = 'mobilenet_pretrained_float.pth'\nstate_dict = torch.load(float_model_file)\nmodel.load_state_dict(state_dict)\nmodel.eval()"
"if not skip_exec:\n from nni_assets.compression.mobilenetv2 import MobileNetV2\n model = MobileNetV2()\n # a checkpoint of MobileNetV2 can be found here\n # https://download.pytorch.org/models/mobilenet_v2-b0353104.pth\n float_model_file = 'mobilenet_pretrained_float.pth'\n state_dict = torch.load(float_model_file)\n model.load_state_dict(state_dict)\n model.eval()"
]
},
{
@ -69,7 +58,7 @@
},
"outputs": [],
"source": [
"from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT\n# input shape is used for converting to onnx\nengine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224))\nengine.compress_with_calibrator(calib)"
"if not skip_exec:\n from nni.compression.quantization_speedup import ModelSpeedupTensorRT\n # input shape is used for converting to onnx\n engine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224))\n engine.compress_with_calibrator(calib)"
]
},
{
@ -87,7 +76,7 @@
},
"outputs": [],
"source": [
"from nni_assets.compression.mobilenetv2 import AverageMeter, accuracy\nimport time\ndef test_accelerated_model(engine, data_loader, neval_batches):\n top1 = AverageMeter('Acc@1', ':6.2f')\n top5 = AverageMeter('Acc@5', ':6.2f')\n cnt = 0\n total_time = 0\n for image, target in data_loader:\n start_time = time.time()\n output, time_span = engine.inference(image)\n infer_time = time.time() - start_time\n print('time: ', time_span, infer_time)\n total_time += time_span\n\n start_time = time.time()\n output = output.view(-1, 1000)\n cnt += 1\n acc1, acc5 = accuracy(output, target, topk=(1, 5))\n top1.update(acc1[0], image.size(0))\n top5.update(acc5[0], image.size(0))\n rest_time = time.time() - start_time\n print('rest time: ', rest_time)\n if cnt >= neval_batches:\n break\n print('inference time: ', total_time / neval_batches)\n return top1, top5\n\ndata_loader = prepare_data_loaders(data_path, batch_size=64)\ntop1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)\nprint('Accuracy of mode #1: ', top1, top5)\n\n\"\"\"\n\nMode #2: Using TensorRT as a pure acceleration backend\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn this mode, the post-training quantization within TensorRT is not used, instead, the quantization bit-width and the range of tensor values are fed into TensorRT for speedup (i.e., with `trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS` configured).\n\n\"\"\""
"if not skip_exec:\n from nni_assets.compression.mobilenetv2 import AverageMeter, accuracy\n import time\n\n def test_accelerated_model(engine, data_loader, neval_batches):\n top1 = AverageMeter('Acc@1', ':6.2f')\n top5 = AverageMeter('Acc@5', ':6.2f')\n cnt = 0\n total_time = 0\n for image, target in data_loader:\n start_time = time.time()\n output, time_span = engine.inference(image)\n infer_time = time.time() - start_time\n print('time: ', time_span, infer_time)\n total_time += time_span\n\n start_time = time.time()\n output = output.view(-1, 1000)\n cnt += 1\n acc1, acc5 = accuracy(output, target, topk=(1, 5))\n top1.update(acc1[0], image.size(0))\n top5.update(acc5[0], image.size(0))\n rest_time = time.time() - start_time\n print('rest time: ', rest_time)\n if cnt >= neval_batches:\n break\n print('inference time: ', total_time / neval_batches)\n return top1, top5\n\n data_loader = prepare_data_loaders(data_path, batch_size=64)\n top1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)\n print('Accuracy of mode #1: ', top1, top5)\n\n\"\"\"\n\nMode #2: Using TensorRT as a pure acceleration backend\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn this mode, the post-training quantization within TensorRT is not used, instead, the quantization bit-width and the range of tensor values are fed into TensorRT for speedup (i.e., with `trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS` configured).\n\n\"\"\""
]
},
{
@ -105,7 +94,7 @@
},
"outputs": [],
"source": [
"model = MobileNetV2()\nstate_dict = torch.load(float_model_file)\nmodel.load_state_dict(state_dict)\nmodel.eval()\ndevice = torch.device('cuda')\nmodel.to(device)"
"if not skip_exec:\n model = MobileNetV2()\n state_dict = torch.load(float_model_file)\n model.load_state_dict(state_dict)\n model.eval()\n device = torch.device('cuda')\n model.to(device)"
]
},
{
@ -123,7 +112,7 @@
},
"outputs": [],
"source": [
"from nni_assets.compression.mobilenetv2 import evaluate\nfrom nni.compression.pytorch.utils import TorchEvaluator\ndata_loader = prepare_data_loaders(data_path, batch_size=128)\ndef eval_for_calibration(model):\n evaluate(model, data_loader,\n neval_batches=1, device=device)\n\ndummy_input = torch.Tensor(64, 3, 224, 224).to(device)\npredict_func = TorchEvaluator(predicting_func=eval_for_calibration, dummy_input=dummy_input)"
"if not skip_exec:\n from nni_assets.compression.mobilenetv2 import evaluate\n from nni.compression.utils import TorchEvaluator\n data_loader = prepare_data_loaders(data_path, batch_size=128)\n\n def eval_for_calibration(model):\n evaluate(model, data_loader, neval_batches=1, device=device)\n\n dummy_input = torch.Tensor(64, 3, 224, 224).to(device)\n predict_func = TorchEvaluator(predicting_func=eval_for_calibration, dummy_input=dummy_input)"
]
},
{
@ -141,7 +130,7 @@
},
"outputs": [],
"source": [
"from nni.compression.pytorch.quantization import PtqQuantizer\nconfig_list = [{\n 'quant_types': ['input', 'weight', 'output'],\n 'quant_bits': {'input': 8, 'weight': 8, 'output': 8},\n 'quant_dtype': 'int',\n 'quant_scheme': 'per_tensor_symmetric',\n 'op_types': ['default']\n}]\nquantizer = PtqQuantizer(model, config_list, predict_func, True)\nquantizer.compress()\ncalibration_config = quantizer.export_model()\nprint('quant result config: ', calibration_config)"
"from nni.compression.quantization import PtqQuantizer\nif not skip_exec:\n config_list = [{\n 'quant_types': ['input', 'weight', 'output'],\n 'quant_bits': {'input': 8, 'weight': 8, 'output': 8},\n 'quant_dtype': 'int',\n 'quant_scheme': 'per_tensor_symmetric',\n 'op_types': ['default']\n }]\n quantizer = PtqQuantizer(model, config_list, predict_func, True)\n quantizer.compress()\n calibration_config = quantizer.export_model()\n print('quant result config: ', calibration_config)"
]
},
{
@ -159,7 +148,7 @@
},
"outputs": [],
"source": [
"model = MobileNetV2()\nstate_dict = torch.load(float_model_file)\nmodel.load_state_dict(state_dict)\nmodel.eval()\n\nengine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224), config=calibration_config)\nengine.compress()\ndata_loader = prepare_data_loaders(data_path, batch_size=64)\ntop1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)\nprint('Accuracy of mode #2: ', top1, top5)"
"if not skip_exec:\n model = MobileNetV2()\n state_dict = torch.load(float_model_file)\n model.load_state_dict(state_dict)\n model.eval()\n\n engine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224), config=calibration_config)\n engine.compress()\n data_loader = prepare_data_loaders(data_path, batch_size=64)\n top1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)\n print('Accuracy of mode #2: ', top1, top5)"
]
}
],
@ -179,7 +168,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.10.11"
}
},
"nbformat": 4,

docs/source/tutorials/quantization_speedup.py (generated)

@ -33,85 +33,95 @@ As TensorRT has supported post-training quantization, directly leveraging this f
import torch
import torchvision
import torchvision.transforms as transforms
def prepare_data_loaders(data_path, batch_size):
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
dataset = torchvision.datasets.ImageNet(
data_path, split="train",
transform=transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
normalize,
]))
sampler = torch.utils.data.SequentialSampler(dataset)
data_loader = torch.utils.data.DataLoader(
dataset, batch_size=batch_size,
sampler=sampler)
return data_loader
data_path = '/data' # replace it with your path of ImageNet dataset
data_loader = prepare_data_loaders(data_path, batch_size=128)
calib_data = None
for image, target in data_loader:
calib_data = image.numpy()
break
skip_exec = True
from nni.compression.pytorch.quantization_speedup.calibrator import Calibrator
# TensorRT processes the calibration data in the batch size of 64
calib = Calibrator(calib_data, 'data/calib_cache_file.cache', batch_size=64)
if not skip_exec:
def prepare_data_loaders(data_path, batch_size):
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
dataset = torchvision.datasets.ImageNet(
data_path, split="train",
transform=transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
normalize,
]))
sampler = torch.utils.data.SequentialSampler(dataset)
data_loader = torch.utils.data.DataLoader(
dataset, batch_size=batch_size,
sampler=sampler)
return data_loader
data_path = '/data' # replace it with your path of ImageNet dataset
data_loader = prepare_data_loaders(data_path, batch_size=128)
calib_data = None
for image, target in data_loader:
calib_data = image.numpy()
break
from nni.compression.quantization_speedup.calibrator import Calibrator
# TensorRT processes the calibration data in the batch size of 64
calib = Calibrator(calib_data, 'data/calib_cache_file.cache', batch_size=64)
# %%
# Prepare the float32 model MobileNetV2
from nni_assets.compression.mobilenetv2 import MobileNetV2
model = MobileNetV2()
# a checkpoint of MobileNetV2 can be found here
# https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
float_model_file = 'mobilenet_pretrained_float.pth'
state_dict = torch.load(float_model_file)
model.load_state_dict(state_dict)
model.eval()
if not skip_exec:
from nni_assets.compression.mobilenetv2 import MobileNetV2
model = MobileNetV2()
# a checkpoint of MobileNetV2 can be found here
# https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
float_model_file = 'mobilenet_pretrained_float.pth'
state_dict = torch.load(float_model_file)
model.load_state_dict(state_dict)
model.eval()
# %%
# Speed up the model with TensorRT
from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT
# input shape is used for converting to onnx
engine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224))
engine.compress_with_calibrator(calib)
if not skip_exec:
from nni.compression.quantization_speedup import ModelSpeedupTensorRT
# input shape is used for converting to onnx
engine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224))
engine.compress_with_calibrator(calib)
# %%
# Test the accuracy of the accelerated model
from nni_assets.compression.mobilenetv2 import AverageMeter, accuracy
import time
def test_accelerated_model(engine, data_loader, neval_batches):
top1 = AverageMeter('Acc@1', ':6.2f')
top5 = AverageMeter('Acc@5', ':6.2f')
cnt = 0
total_time = 0
for image, target in data_loader:
start_time = time.time()
output, time_span = engine.inference(image)
infer_time = time.time() - start_time
print('time: ', time_span, infer_time)
total_time += time_span
if not skip_exec:
from nni_assets.compression.mobilenetv2 import AverageMeter, accuracy
import time
start_time = time.time()
output = output.view(-1, 1000)
cnt += 1
acc1, acc5 = accuracy(output, target, topk=(1, 5))
top1.update(acc1[0], image.size(0))
top5.update(acc5[0], image.size(0))
rest_time = time.time() - start_time
print('rest time: ', rest_time)
if cnt >= neval_batches:
break
print('inference time: ', total_time / neval_batches)
return top1, top5
def test_accelerated_model(engine, data_loader, neval_batches):
top1 = AverageMeter('Acc@1', ':6.2f')
top5 = AverageMeter('Acc@5', ':6.2f')
cnt = 0
total_time = 0
for image, target in data_loader:
start_time = time.time()
output, time_span = engine.inference(image)
infer_time = time.time() - start_time
print('time: ', time_span, infer_time)
total_time += time_span
data_loader = prepare_data_loaders(data_path, batch_size=64)
top1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)
print('Accuracy of mode #1: ', top1, top5)
start_time = time.time()
output = output.view(-1, 1000)
cnt += 1
acc1, acc5 = accuracy(output, target, topk=(1, 5))
top1.update(acc1[0], image.size(0))
top5.update(acc5[0], image.size(0))
rest_time = time.time() - start_time
print('rest time: ', rest_time)
if cnt >= neval_batches:
break
print('inference time: ', total_time / neval_batches)
return top1, top5
data_loader = prepare_data_loaders(data_path, batch_size=64)
top1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)
print('Accuracy of mode #1: ', top1, top5)
"""
@ -124,41 +134,44 @@ In this mode, the post-training quantization within TensorRT is not used, instea
# %%
# re-instantiate the MobileNetV2 model
model = MobileNetV2()
state_dict = torch.load(float_model_file)
model.load_state_dict(state_dict)
model.eval()
device = torch.device('cuda')
model.to(device)
if not skip_exec:
model = MobileNetV2()
state_dict = torch.load(float_model_file)
model.load_state_dict(state_dict)
model.eval()
device = torch.device('cuda')
model.to(device)
# %%
# Prepare Evaluator for PtqQuantizer
# PtqQuantizer uses eval_for_calibration to collect calibration data
# in the current setting, it handles 128 samples
from nni_assets.compression.mobilenetv2 import evaluate
from nni.compression.pytorch.utils import TorchEvaluator
data_loader = prepare_data_loaders(data_path, batch_size=128)
def eval_for_calibration(model):
evaluate(model, data_loader,
neval_batches=1, device=device)
if not skip_exec:
from nni_assets.compression.mobilenetv2 import evaluate
from nni.compression.utils import TorchEvaluator
data_loader = prepare_data_loaders(data_path, batch_size=128)
dummy_input = torch.Tensor(64, 3, 224, 224).to(device)
predict_func = TorchEvaluator(predicting_func=eval_for_calibration, dummy_input=dummy_input)
def eval_for_calibration(model):
evaluate(model, data_loader, neval_batches=1, device=device)
dummy_input = torch.Tensor(64, 3, 224, 224).to(device)
predict_func = TorchEvaluator(predicting_func=eval_for_calibration, dummy_input=dummy_input)
# %%
# Use PtqQuantizer to quantize the model
from nni.compression.pytorch.quantization import PtqQuantizer
config_list = [{
'quant_types': ['input', 'weight', 'output'],
'quant_bits': {'input': 8, 'weight': 8, 'output': 8},
'quant_dtype': 'int',
'quant_scheme': 'per_tensor_symmetric',
'op_types': ['default']
}]
quantizer = PtqQuantizer(model, config_list, predict_func, True)
quantizer.compress()
calibration_config = quantizer.export_model()
print('quant result config: ', calibration_config)
from nni.compression.quantization import PtqQuantizer
if not skip_exec:
config_list = [{
'quant_types': ['input', 'weight', 'output'],
'quant_bits': {'input': 8, 'weight': 8, 'output': 8},
'quant_dtype': 'int',
'quant_scheme': 'per_tensor_symmetric',
'op_types': ['default']
}]
quantizer = PtqQuantizer(model, config_list, predict_func, True)
quantizer.compress()
calibration_config = quantizer.export_model()
print('quant result config: ', calibration_config)
# %%
# Speed up the quantized model following the generated calibration_config
@ -166,13 +179,14 @@ print('quant result config: ', calibration_config)
# after applying bn folding. bn folding changes the models structure and weights.
# As TensorRT does bn folding by itself, we should input an original model to it.
# For simplicity, we re-instantiate a new model.
model = MobileNetV2()
state_dict = torch.load(float_model_file)
model.load_state_dict(state_dict)
model.eval()
if not skip_exec:
model = MobileNetV2()
state_dict = torch.load(float_model_file)
model.load_state_dict(state_dict)
model.eval()
engine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224), config=calibration_config)
engine.compress()
data_loader = prepare_data_loaders(data_path, batch_size=64)
top1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)
print('Accuracy of mode #2: ', top1, top5)
engine = ModelSpeedupTensorRT(model, input_shape=(64, 3, 224, 224), config=calibration_config)
engine.compress()
data_loader = prepare_data_loaders(data_path, batch_size=64)
top1, top5 = test_accelerated_model(engine, data_loader, neval_batches=32)
print('Accuracy of mode #2: ', top1, top5)
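
Apart from the ``nni.compression.pytorch.quantization_speedup`` → ``nni.compression.quantization_speedup`` and ``nni.compression.pytorch.utils`` → ``nni.compression.utils`` renames, the main change in the two tutorial files above is that every executable step is now indented under a ``skip_exec`` guard, presumably so the tutorial can be rendered without an ImageNet copy or a TensorRT installation. A sketch of the pattern (the function name is a hypothetical placeholder, not part of the tutorial):

.. code-block:: python

    skip_exec = True  # flip to False to actually run the tutorial steps

    if not skip_exec:
        # expensive work: ImageNet loading, TensorRT engine build, evaluation
        run_tutorial_steps()  # hypothetical placeholder for the guarded body
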

docs/source/tutorials/quantization_speedup.py.md5 (generated)

@ -1 +1 @@
19e925997289f729983ff4d5ac76c89f
d364a206afed723ad9006f2a7035c00c

docs/source/tutorials/quantization_speedup.rst (generated)

File diff suppressed because one or more lines are too long

Binary data
docs/source/tutorials/quantization_speedup_codeobj.pickle (generated)

Binary file not shown.

docs/source/tutorials/sg_execution_times.rst (generated)

@ -3,12 +3,15 @@
.. _sphx_glr_tutorials_sg_execution_times:
Computation times
=================
**03:22.673** total execution time for **tutorials** files:
**00:03.512** total execution time for **tutorials** files:
+-----------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_quick_start.py` (``quantization_quick_start.py``) | 03:22.673 | 0.0 MB |
| :ref:`sphx_glr_tutorials_quantization_speedup.py` (``quantization_speedup.py``) | 00:03.504 | 0.0 MB |
+-----------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_bert_glue.py` (``quantization_bert_glue.py``) | 00:00.009 | 0.0 MB |
+-----------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_darts.py` (``darts.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------+-----------+--------+
@ -22,7 +25,5 @@ Computation times
+-----------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_pruning_speedup.py` (``pruning_speedup.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_bert_glue.py` (``quantization_bert_glue.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_quantization_speedup.py` (``quantization_speedup.py``) | 00:00.000 | 0.0 MB |
| :ref:`sphx_glr_tutorials_quantization_quick_start.py` (``quantization_quick_start.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------------------+-----------+--------+


@ -74,7 +74,7 @@ class MyModule(pl.LightningModule):
class MyDataModule(pl.LightningDataModule):
pass
from nni.contrib.compression import LightningEvaluator
from nni.compression import LightningEvaluator
pl_trainer = nni.trace(pl.Trainer)(
accelerator='auto',


@ -90,6 +90,6 @@ def training_step(batch: Any, model: torch.nn.Module, *args, **kwargs):
# Init ``TorchEvaluator``
# -----------------------
from nni.contrib.compression import TorchEvaluator
from nni.compression import TorchEvaluator
evaluator = TorchEvaluator(training_func, optimizer, training_step, lr_scheduler)


@ -19,11 +19,11 @@ from examples.compression.models import (
device
)
from nni.contrib.compression import TorchEvaluator
from nni.contrib.compression.distillation import DynamicLayerwiseDistiller
from nni.contrib.compression.pruning import TaylorPruner, AGPPruner
from nni.contrib.compression.utils import auto_set_denpendency_group_ids
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression import TorchEvaluator
from nni.compression.distillation import DynamicLayerwiseDistiller
from nni.compression.pruning import TaylorPruner, AGPPruner
from nni.compression.utils import auto_set_denpendency_group_ids
from nni.compression.speedup import ModelSpeedup
if __name__ == '__main__':


@ -19,13 +19,13 @@ from examples.compression.models import (
device
)
from nni.contrib.compression import TorchEvaluator
from nni.contrib.compression.base.compressor import Quantizer
from nni.contrib.compression.distillation import DynamicLayerwiseDistiller
from nni.contrib.compression.pruning import TaylorPruner, AGPPruner
from nni.contrib.compression.quantization import QATQuantizer
from nni.contrib.compression.utils import auto_set_denpendency_group_ids
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression import TorchEvaluator
from nni.compression.base.compressor import Quantizer
from nni.compression.distillation import DynamicLayerwiseDistiller
from nni.compression.pruning import TaylorPruner, AGPPruner
from nni.compression.quantization import QATQuantizer
from nni.compression.utils import auto_set_denpendency_group_ids
from nni.compression.speedup import ModelSpeedup
if __name__ == '__main__':


@ -13,13 +13,13 @@ from examples.compression.models import (
device
)
from nni.contrib.compression.pruning import (
from nni.compression.pruning import (
L1NormPruner,
L2NormPruner,
FPGMPruner
)
from nni.contrib.compression.utils import auto_set_denpendency_group_ids
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.utils import auto_set_denpendency_group_ids
from nni.compression.speedup import ModelSpeedup
prune_type = 'l1'


@ -13,10 +13,10 @@ from examples.compression.models import (
device
)
from nni.contrib.compression import TorchEvaluator
from nni.contrib.compression.pruning import TaylorPruner, LinearPruner, AGPPruner
from nni.contrib.compression.utils import auto_set_denpendency_group_ids
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression import TorchEvaluator
from nni.compression.pruning import TaylorPruner, LinearPruner, AGPPruner
from nni.compression.utils import auto_set_denpendency_group_ids
from nni.compression.speedup import ModelSpeedup
schedule_type = 'agp'


@ -13,10 +13,10 @@ from examples.compression.models import (
device
)
from nni.contrib.compression import TorchEvaluator
from nni.contrib.compression.pruning import SlimPruner
from nni.contrib.compression.utils import auto_set_denpendency_group_ids
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression import TorchEvaluator
from nni.compression.pruning import SlimPruner
from nni.compression.utils import auto_set_denpendency_group_ids
from nni.compression.speedup import ModelSpeedup
if __name__ == '__main__':


@ -13,10 +13,10 @@ from examples.compression.models import (
device
)
from nni.contrib.compression import TorchEvaluator
from nni.contrib.compression.pruning import TaylorPruner
from nni.contrib.compression.utils import auto_set_denpendency_group_ids
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression import TorchEvaluator
from nni.compression.pruning import TaylorPruner
from nni.compression.utils import auto_set_denpendency_group_ids
from nni.compression.speedup import ModelSpeedup
if __name__ == '__main__':


@ -11,8 +11,8 @@ from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import nni
from nni.contrib.compression.quantization import BNNQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import BNNQuantizer
from nni.compression.utils import TorchEvaluator
from nni.common.types import SCHEDULER


@ -13,8 +13,8 @@ from torch import Tensor
from torchvision import datasets, transforms
import nni
from nni.contrib.compression.quantization import DoReFaQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import DoReFaQuantizer
from nni.compression.utils import TorchEvaluator
from nni.common.types import SCHEDULER


@ -13,8 +13,8 @@ from torch import Tensor
from torchvision import datasets, transforms
import nni
from nni.contrib.compression.quantization import LsqQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import LsqQuantizer
from nni.compression.utils import TorchEvaluator
from nni.common.types import SCHEDULER
torch.manual_seed(0)


@ -12,8 +12,8 @@ from torch import Tensor
from torchvision import datasets, transforms
import nni
from nni.contrib.compression.quantization import PtqQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import PtqQuantizer
from nni.compression.utils import TorchEvaluator
from nni.common.types import SCHEDULER


@ -14,8 +14,8 @@ from torchvision import transforms
from torchvision.datasets import MNIST
import nni
from nni.contrib.compression.quantization import QATQuantizer
from nni.contrib.compression.utils import TorchEvaluator
from nni.compression.quantization import QATQuantizer
from nni.compression.utils import TorchEvaluator
from nni.common.types import SCHEDULER
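
The documentation and example diffs above all apply the same mechanical renaming: ``nni.contrib.compression`` becomes ``nni.compression``, and the pruning-speedup entry point moves from ``nni.compression.pytorch.speedup.v2`` to ``nni.compression.speedup``. A condensed before/after sketch, with the import paths taken directly from the hunks above (assumes the post-merge NNI is installed):

.. code-block:: python

    # each old import (commented out) is followed by its post-merge replacement
    # from nni.contrib.compression import TorchEvaluator
    from nni.compression import TorchEvaluator

    # from nni.contrib.compression.pruning import TaylorPruner, AGPPruner
    from nni.compression.pruning import TaylorPruner, AGPPruner

    # from nni.contrib.compression.quantization import QATQuantizer
    from nni.compression.quantization import QATQuantizer

    # from nni.contrib.compression.utils import auto_set_denpendency_group_ids
    from nni.compression.utils import auto_set_denpendency_group_ids

    # from nni.compression.pytorch.speedup.v2 import ModelSpeedup
    from nni.compression.speedup import ModelSpeedup
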

examples/model_compress/.gitignore (vendored)

@ -1,8 +0,0 @@
.pth
.tar.gz
data/
MNIST/
cifar-10-batches-py/
experiment_data/
pruning/models
pruning/pruning_log


@ -1,7 +0,0 @@
# Examples
This folder contains a large number of examples of old versions of compression.
If you find that some examples are invalid, please contact us.
This folder will be deleted around NNI 3.2.
The new version examples is under `examples/compression`.


@ -1,129 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from typing import Callable, Optional, Iterable
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.auto_compress import AbstractAutoCompressionModule
torch.manual_seed(1)
class LeNet(nn.Module):
def __init__(self):
super(LeNet, self).__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout2d(0.25)
self.dropout2 = nn.Dropout2d(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
x = self.conv2(x)
x = F.relu(x)
x = F.max_pool2d(x, 2)
x = self.dropout1(x)
x = torch.flatten(x, 1)
x = self.fc1(x)
x = F.relu(x)
x = self.dropout2(x)
x = self.fc2(x)
output = F.log_softmax(x, dim=1)
return output
_use_cuda = torch.cuda.is_available()
_train_kwargs = {'batch_size': 64}
_test_kwargs = {'batch_size': 1000}
if _use_cuda:
_cuda_kwargs = {'num_workers': 1,
'pin_memory': True,
'shuffle': True}
_train_kwargs.update(_cuda_kwargs)
_test_kwargs.update(_cuda_kwargs)
_transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])
_device = torch.device("cuda" if _use_cuda else "cpu")
_train_loader = None
_test_loader = None
def _train(model, optimizer, criterion, epoch):
global _train_loader
if _train_loader is None:
dataset = datasets.MNIST('./data', train=True, download=True, transform=_transform)
_train_loader = torch.utils.data.DataLoader(dataset, **_train_kwargs)
model.train()
for data, target in _train_loader:
data, target = data.to(_device), target.to(_device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def _test(model):
global _test_loader
if _test_loader is None:
dataset = datasets.MNIST('./data', train=False, transform=_transform)
_test_loader = torch.utils.data.DataLoader(dataset, **_test_kwargs)
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in _test_loader:
data, target = data.to(_device), target.to(_device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(_test_loader.dataset)
acc = 100 * correct / len(_test_loader.dataset)
print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
test_loss, correct, len(_test_loader.dataset), acc))
return acc
_model = LeNet().to(_device)
_model.load_state_dict(torch.load('mnist_pretrain_lenet.pth'))
class AutoCompressionModule(AbstractAutoCompressionModule):
@classmethod
def model(cls) -> nn.Module:
return _model
@classmethod
def evaluator(cls) -> Callable[[nn.Module], float]:
return _test
@classmethod
def optimizer_factory(cls) -> Optional[Callable[[Iterable], optim.Optimizer]]:
def _optimizer_factory(params: Iterable):
return torch.optim.SGD(params, lr=0.01)
return _optimizer_factory
@classmethod
def criterion(cls) -> Optional[Callable]:
return F.nll_loss
@classmethod
def sparsifying_trainer(cls, compress_algorithm_name: str) -> Optional[Callable[[nn.Module, optim.Optimizer, Callable, int], None]]:
return _train
@classmethod
def post_compress_finetuning_trainer(cls, compress_algorithm_name: str) -> Optional[Callable[[nn.Module, optim.Optimizer, Callable, int], None]]:
return _train
@classmethod
def post_compress_finetuning_epochs(cls, compress_algorithm_name: str) -> int:
return 2


@ -1,50 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from pathlib import Path
from nni.compression.pytorch.auto_compress import AutoCompressionExperiment, AutoCompressionSearchSpaceGenerator
from auto_compress_module import AutoCompressionModule
generator = AutoCompressionSearchSpaceGenerator()
generator.add_config('level', [
{
"sparsity": {
"_type": "uniform",
"_value": [0.01, 0.99]
},
'op_types': ['default']
}
])
generator.add_config('l1', [
{
"sparsity": {
"_type": "uniform",
"_value": [0.01, 0.99]
},
'op_types': ['Conv2d']
}
])
generator.add_config('qat', [
{
'quant_types': ['weight', 'output'],
'quant_bits': {
'weight': 8,
'output': 8
},
'op_types': ['Conv2d', 'Linear']
}])
search_space = generator.dumps()
experiment = AutoCompressionExperiment(AutoCompressionModule, 'local')
experiment.config.experiment_name = 'auto compression torch example'
experiment.config.trial_concurrency = 1
experiment.config.max_trial_number = 10
experiment.config.search_space = search_space
experiment.config.trial_code_directory = Path(__file__).parent
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.training_service.use_active_gpu = True
experiment.run(8088)

Binary file not shown.


@ -1,300 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""
NNI example for combined pruning and quantization to compress a model.
In this example, we show the compression process to first prune a model, then quantize the pruned model.
"""
import argparse
import os
import time
import torch
import torch.nn.functional as F
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torchvision import datasets, transforms
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.pruning import L1FilterPruner
from nni.compression.pytorch.quantization import QAT_Quantizer
from models.mnist.naive import NaiveModel
from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT
def get_model_time_cost(model, dummy_input):
model.eval()
n_times = 100
time_list = []
for _ in range(n_times):
torch.cuda.synchronize()
tic = time.time()
_ = model(dummy_input)
torch.cuda.synchronize()
time_list.append(time.time()-tic)
time_list = time_list[10:]
return sum(time_list) / len(time_list)
def train(args, model, device, train_loader, criterion, optimizer, epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx % args.log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
if args.dry_run:
break
def test(args, model, device, criterion, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += criterion(output, target).item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
acc = 100 * correct / len(test_loader.dataset)
print('Test Loss: {:.6f} Accuracy: {}%\n'.format(
test_loss, acc))
return acc
def test_trt(engine, test_loader):
test_loss = 0
correct = 0
time_elasped = 0
for data, target in test_loader:
output, time = engine.inference(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
time_elasped += time
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
print("Inference elapsed_time (whole dataset): {}s".format(time_elasped))
def main(args):
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
os.makedirs(args.experiment_data_dir, exist_ok=True)
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True, transform=transform),
batch_size=64,)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=False, transform=transform),
batch_size=1000)
# Step1. Model Pretraining
model = NaiveModel().to(device)
criterion = torch.nn.NLLLoss()
optimizer = optim.Adadelta(model.parameters(), lr=args.pretrain_lr)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
flops, params, _ = count_flops_params(model, (1, 1, 28, 28), verbose=False)
if args.pretrained_model_dir is None:
args.pretrained_model_dir = os.path.join(args.experiment_data_dir, f'pretrained.pth')
best_acc = 0
for epoch in range(args.pretrain_epochs):
train(args, model, device, train_loader, criterion, optimizer, epoch)
scheduler.step()
acc = test(args, model, device, criterion, test_loader)
if acc > best_acc:
best_acc = acc
state_dict = model.state_dict()
model.load_state_dict(state_dict)
torch.save(state_dict, args.pretrained_model_dir)
print(f'Model saved to {args.pretrained_model_dir}')
else:
state_dict = torch.load(args.pretrained_model_dir)
model.load_state_dict(state_dict)
best_acc = test(args, model, device, criterion, test_loader)
dummy_input = torch.randn([1000, 1, 28, 28]).to(device)
time_cost = get_model_time_cost(model, dummy_input)
# 125.49 M, 0.85M, 93.29, 1.1012
print(f'Pretrained model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}, Time Cost: {time_cost}')
# Step2. Model Pruning
config_list = [{
'sparsity': args.sparsity,
'op_types': ['Conv2d']
}]
kw_args = {}
if args.dependency_aware:
dummy_input = torch.randn([1000, 1, 28, 28]).to(device)
print('Enable the dependency_aware mode')
# note that, not all pruners support the dependency_aware mode
kw_args['dependency_aware'] = True
kw_args['dummy_input'] = dummy_input
pruner = L1FilterPruner(model, config_list, **kw_args)
model = pruner.compress()
pruner.get_pruned_weights()
mask_path = os.path.join(args.experiment_data_dir, 'mask.pth')
model_path = os.path.join(args.experiment_data_dir, 'pruned.pth')
pruner.export_model(model_path=model_path, mask_path=mask_path)
pruner._unwrap_model() # unwrap all modules to normal state
# Step3. Model Speedup
m_speedup = ModelSpeedup(model, dummy_input, mask_path, device)
m_speedup.speedup_model()
print('model after speedup', model)
flops, params, _ = count_flops_params(model, dummy_input, verbose=False)
acc = test(args, model, device, criterion, test_loader)
time_cost = get_model_time_cost(model, dummy_input)
print(f'Pruned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {acc: .2f}, Time Cost: {time_cost}')
# Step4. Model Finetuning
optimizer = optim.Adadelta(model.parameters(), lr=args.pretrain_lr)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
best_acc = 0
for epoch in range(args.finetune_epochs):
train(args, model, device, train_loader, criterion, optimizer, epoch)
scheduler.step()
acc = test(args, model, device, criterion, test_loader)
if acc > best_acc:
best_acc = acc
state_dict = model.state_dict()
model.load_state_dict(state_dict)
save_path = os.path.join(args.experiment_data_dir, f'finetuned.pth')
torch.save(state_dict, save_path)
flops, params, _ = count_flops_params(model, dummy_input, verbose=True)
time_cost = get_model_time_cost(model, dummy_input)
# FLOPs 28.48 M, #Params: 0.18M, Accuracy: 89.03, Time Cost: 1.03
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}, Time Cost: {time_cost}')
print(f'Model saved to {save_path}')
# Step5. Model Quantization via QAT
config_list = [{
'quant_types': ['weight', 'output'],
'quant_bits': {'weight': 8, 'output': 8},
'op_names': ['conv1']
}, {
'quant_types': ['output'],
'quant_bits': {'output':8},
'op_names': ['relu1']
}, {
'quant_types': ['weight', 'output'],
'quant_bits': {'weight': 8, 'output': 8},
'op_names': ['conv2']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['relu2']
}]
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
quantizer = QAT_Quantizer(model, config_list, optimizer, dummy_input)
quantizer.compress()
# Step6. Quantization Aware Training
best_acc = 0
for epoch in range(1):
train(args, model, device, train_loader, criterion, optimizer, epoch)
scheduler.step()
acc = test(args, model, device, criterion, test_loader)
if acc > best_acc:
best_acc = acc
state_dict = model.state_dict()
calibration_path = os.path.join(args.experiment_data_dir, 'calibration.pth')
calibration_config = quantizer.export_model(model_path, calibration_path)
print("calibration_config: ", calibration_config)
# Step7. Model Speedup
batch_size = 32
input_shape = (batch_size, 1, 28, 28)
engine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=32)
engine.compress()
test_trt(engine, test_loader)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model comporession')
# dataset and model
# parser.add_argument('--dataset', type=str, default='mnist',
# help='dataset to use, mnist, cifar10 or imagenet')
# parser.add_argument('--data-dir', type=str, default='./data/',
# help='dataset directory')
parser.add_argument('--pretrained-model-dir', type=str, default=None,
help='path to pretrained model')
parser.add_argument('--pretrain-epochs', type=int, default=10,
help='number of epochs to pretrain the model')
parser.add_argument('--pretrain-lr', type=float, default=1.0,
help='learning rate to pretrain the model')
parser.add_argument('--experiment-data-dir', type=str, default='./experiment_data',
help='For saving output checkpoints')
parser.add_argument('--log-interval', type=int, default=100, metavar='N',
help='how many batches to wait before logging training status')
parser.add_argument('--dry-run', action='store_true', default=False,
help='quickly check a single pass')
# parser.add_argument('--multi-gpu', action='store_true', default=False,
# help='run on mulitple gpus')
# parser.add_argument('--test-only', action='store_true', default=False,
# help='run test only')
# pruner
# parser.add_argument('--pruner', type=str, default='l1filter',
# choices=['level', 'l1filter', 'l2filter', 'slim', 'agp',
# 'fpgm', 'mean_activation', 'apoz', 'admm'],
# help='pruner to use')
parser.add_argument('--sparsity', type=float, default=0.5,
help='target overall target sparsity')
parser.add_argument('--dependency-aware', action='store_true', default=False,
help='toggle dependency-aware mode')
# finetuning
parser.add_argument('--finetune-epochs', type=int, default=5,
help='epochs to fine tune')
# parser.add_argument('--kd', action='store_true', default=False,
# help='quickly check a single pass')
# parser.add_argument('--kd_T', type=float, default=4,
# help='temperature for KD distillation')
# parser.add_argument('--finetune-lr', type=float, default=0.5,
# help='learning rate to finetune the model')
# speedup
# parser.add_argument('--speedup', action='store_true', default=False,
# help='whether to speedup the pruned model')
# parser.add_argument('--nni', action='store_true', default=False,
# help="whether to tune the pruners using NNi tuners")
args = parser.parse_args()
main(args)


@ -1,43 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from pathlib import Path
import torch
from torch.optim import Adam
import nni
from nni.compression.experiment.experiment import CompressionExperiment
from nni.compression.experiment.config import CompressionExperimentConfig, TaylorFOWeightPrunerConfig
from vessel import LeNet, finetuner, evaluator, trainer, criterion, device
model = LeNet().to(device)
# pre-training model
finetuner(model)
optimizer = nni.trace(Adam)(model.parameters())
dummy_input = torch.rand(16, 1, 28, 28).to(device)
# normal experiment setting, no need to set search_space and trial_command
config = CompressionExperimentConfig('local')
config.experiment_name = 'auto compression torch example'
config.trial_concurrency = 1
config.max_trial_number = 10
config.trial_code_directory = Path(__file__).parent
config.tuner.name = 'TPE'
config.tuner.class_args['optimize_mode'] = 'maximize'
# compression experiment specific setting
# single float value means the expected remaining ratio upper limit for flops & params, lower limit for metric
config.compression_setting.flops = 0.2
config.compression_setting.params = 0.5
config.compression_setting.module_types = ['Conv2d', 'Linear']
config.compression_setting.exclude_module_names = ['fc2']
config.compression_setting.pruners = [TaylorFOWeightPrunerConfig()]
experiment = CompressionExperiment(config, model, finetuner, evaluator, dummy_input, trainer, optimizer, criterion, device)
experiment.run(8080)


@ -1,99 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from torchvision import datasets, transforms
import nni
@nni.trace
class LeNet(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout2d(0.25)
self.dropout2 = nn.Dropout2d(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
x = self.conv2(x)
x = F.relu(x)
x = F.max_pool2d(x, 2)
x = self.dropout1(x)
x = torch.flatten(x, 1)
x = self.fc1(x)
x = F.relu(x)
x = self.dropout2(x)
x = self.fc2(x)
output = F.log_softmax(x, dim=1)
return output
_use_cuda = True
device = torch.device("cuda" if _use_cuda else "cpu")
_train_kwargs = {'batch_size': 64}
_test_kwargs = {'batch_size': 1000}
if _use_cuda:
_cuda_kwargs = {'num_workers': 1,
'pin_memory': True,
'shuffle': True}
_train_kwargs.update(_cuda_kwargs)
_test_kwargs.update(_cuda_kwargs)
_transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])
_train_loader = None
_test_loader = None
def trainer(model, optimizer, criterion):
global _train_loader
if _train_loader is None:
dataset = datasets.MNIST('./data', train=True, download=True, transform=_transform)
_train_loader = torch.utils.data.DataLoader(dataset, **_train_kwargs)
model.train()
for data, target in _train_loader:
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def evaluator(model):
global _test_loader
if _test_loader is None:
dataset = datasets.MNIST('./data', train=False, transform=_transform, download=True)
_test_loader = torch.utils.data.DataLoader(dataset, **_test_kwargs)
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in _test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(_test_loader.dataset)
acc = 100 * correct / len(_test_loader.dataset)
print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
test_loss, correct, len(_test_loader.dataset), acc))
return acc
criterion = F.nll_loss
def finetuner(model: nn.Module):
optimizer = Adam(model.parameters())
for i in range(3):
trainer(model, optimizer, criterion)


@ -1,115 +0,0 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
class BasicBlock(nn.Module):
expansion = 1
def __init__(self, in_planes, planes, stride=1):
super(BasicBlock, self).__init__()
self.conv1 = nn.Conv2d(
in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
stride=1, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes)
self.shortcut = nn.Sequential()
if stride != 1 or in_planes != self.expansion*planes:
self.shortcut = nn.Sequential(
nn.Conv2d(in_planes, self.expansion*planes,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(self.expansion*planes)
)
def forward(self, x):
out = F.relu(self.bn1(self.conv1(x)))
out = self.bn2(self.conv2(out))
out += self.shortcut(x)
out = F.relu(out)
return out
class Bottleneck(nn.Module):
expansion = 4
def __init__(self, in_planes, planes, stride=1):
super(Bottleneck, self).__init__()
self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
self.bn1 = nn.BatchNorm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
stride=stride, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes)
self.conv3 = nn.Conv2d(planes, self.expansion *
planes, kernel_size=1, bias=False)
self.bn3 = nn.BatchNorm2d(self.expansion*planes)
self.shortcut = nn.Sequential()
if stride != 1 or in_planes != self.expansion*planes:
self.shortcut = nn.Sequential(
nn.Conv2d(in_planes, self.expansion*planes,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(self.expansion*planes)
)
def forward(self, x):
out = F.relu(self.bn1(self.conv1(x)))
out = F.relu(self.bn2(self.conv2(out)))
out = self.bn3(self.conv3(out))
out += self.shortcut(x)
out = F.relu(out)
return out
class ResNet(nn.Module):
def __init__(self, block, num_blocks, num_classes=10):
super(ResNet, self).__init__()
self.in_planes = 64
# this layer is different from torchvision.resnet18() since this model adopted for Cifar10
self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
self.linear = nn.Linear(512*block.expansion, num_classes)
def _make_layer(self, block, planes, num_blocks, stride):
strides = [stride] + [1]*(num_blocks-1)
layers = []
for stride in strides:
layers.append(block(self.in_planes, planes, stride))
self.in_planes = planes * block.expansion
return nn.Sequential(*layers)
def forward(self, x):
out = F.relu(self.bn1(self.conv1(x)))
out = self.layer1(out)
out = self.layer2(out)
out = self.layer3(out)
out = self.layer4(out)
out = F.avg_pool2d(out, 4)
out = out.view(out.size(0), -1)
out = self.linear(out)
return out
def ResNet18():
return ResNet(BasicBlock, [2, 2, 2, 2])
def ResNet34():
return ResNet(BasicBlock, [3, 4, 6, 3])
def ResNet50():
return ResNet(Bottleneck, [3, 4, 6, 3])
def ResNet101():
return ResNet(Bottleneck, [3, 4, 23, 3])
def ResNet152():
return ResNet(Bottleneck, [3, 8, 36, 3])


@ -1,63 +0,0 @@
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
defaultcfg = {
11: [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512],
13: [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512],
16: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512],
19: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512],
}
class VGG(nn.Module):
def __init__(self, depth=16):
super(VGG, self).__init__()
cfg = defaultcfg[depth]
self.cfg = cfg
self.feature = self.make_layers(cfg, True)
num_classes = 10
self.classifier = nn.Sequential(
nn.Linear(cfg[-1], 512),
nn.BatchNorm1d(512),
nn.ReLU(inplace=True),
nn.Linear(512, num_classes)
)
self._initialize_weights()
def make_layers(self, cfg, batch_norm=False):
layers = []
in_channels = 3
for v in cfg:
if v == 'M':
layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
else:
conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1, bias=False)
if batch_norm:
layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
else:
layers += [conv2d, nn.ReLU(inplace=True)]
in_channels = v
return nn.Sequential(*layers)
def forward(self, x):
x = self.feature(x)
x = nn.AvgPool2d(2)(x)
x = x.view(x.size(0), -1)
y = self.classifier(x)
return y
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(0.5)
m.bias.data.zero_()
elif isinstance(m, nn.Linear):
m.weight.data.normal_(0, 0.01)
m.bias.data.zero_()


@ -1,29 +0,0 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
class LeNet(nn.Module):
def __init__(self):
super(LeNet, self).__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout2d(0.25)
self.dropout2 = nn.Dropout2d(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
x = self.conv2(x)
x = F.relu(x)
x = F.max_pool2d(x, 2)
x = self.dropout1(x)
x = torch.flatten(x, 1)
x = self.fc1(x)
x = F.relu(x)
x = self.dropout2(x)
x = self.fc2(x)
output = F.log_softmax(x, dim=1)
return output


@ -1,27 +0,0 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
from functools import reduce
class NaiveModel(torch.nn.Module):
def __init__(self):
super().__init__()
self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)
self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)
self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)
self.fc2 = torch.nn.Linear(500, 10)
self.relu1 = torch.nn.ReLU6()
self.relu2 = torch.nn.ReLU6()
self.relu3 = torch.nn.ReLU6()
self.max_pool1 = torch.nn.MaxPool2d(2, 2)
self.max_pool2 = torch.nn.MaxPool2d(2, 2)
def forward(self, x):
x = self.relu1(self.conv1(x))
x = self.max_pool1(x)
x = self.relu2(self.conv2(x))
x = self.max_pool2(x)
x = x.view(-1, x.size()[1:].numel())
x = self.relu3(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim=1)


@ -1,88 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
#
# This file contains code adapted from AMC (https://github.com/mit-han-lab/amc)
# Copyright (c) 2018 MIT_Han_Lab
# Licensed under the MIT License
# https://github.com/mit-han-lab/amc/blob/master/LICENSE
import torch.nn as nn
import math
def conv_bn(inp, oup, stride):
return nn.Sequential(
nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU(inplace=True)
)
def conv_dw(inp, oup, stride):
return nn.Sequential(
nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
nn.BatchNorm2d(inp),
nn.ReLU(inplace=True),
nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU(inplace=True),
)
class MobileNet(nn.Module):
def __init__(self, n_class, profile='normal'):
super(MobileNet, self).__init__()
# original
if profile == 'normal':
in_planes = 32
cfg = [64, (128, 2), 128, (256, 2), 256, (512, 2), 512, 512, 512, 512, 512, (1024, 2), 1024]
# 0.5 AMC
elif profile == '0.5flops':
in_planes = 24
cfg = [48, (96, 2), 80, (192, 2), 200, (328, 2), 352, 368, 360, 328, 400, (736, 2), 752]
else:
raise NotImplementedError
self.conv1 = conv_bn(3, in_planes, stride=2)
self.features = self._make_layers(in_planes, cfg, conv_dw)
self.classifier = nn.Sequential(
nn.Linear(cfg[-1], n_class),
)
self._initialize_weights()
def forward(self, x):
x = self.conv1(x)
x = self.features(x)
x = x.mean([2, 3]) # global average pooling
x = self.classifier(x)
return x
def _make_layers(self, in_planes, cfg, layer):
layers = []
for x in cfg:
out_planes = x if isinstance(x, int) else x[0]
stride = 1 if isinstance(x, int) else x[1]
layers.append(layer(in_planes, out_planes, stride))
in_planes = out_planes
return nn.Sequential(*layers)
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
elif isinstance(m, nn.Linear):
n = m.weight.size(1)
m.weight.data.normal_(0, 0.01)
m.bias.data.zero_()
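# Hedged illustration (not part of the original file): how _make_layers reads the cfg list
# above -- a plain int keeps stride 1, while a (out_planes, stride) tuple also sets the stride.
if __name__ == '__main__':
    demo_cfg = [64, (128, 2), 128]
    demo_in_planes = 32
    for entry in demo_cfg:
        out_planes = entry if isinstance(entry, int) else entry[0]
        stride = 1 if isinstance(entry, int) else entry[1]
        print(f'conv_dw({demo_in_planes}, {out_planes}, stride={stride})')
        demo_in_planes = out_planes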

View file

@ -1,131 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import torch.nn as nn
import math
def conv_bn(inp, oup, stride):
return nn.Sequential(
nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU6(inplace=True)
)
def conv_1x1_bn(inp, oup):
return nn.Sequential(
nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
nn.ReLU6(inplace=True)
)
class InvertedResidual(nn.Module):
def __init__(self, inp, oup, stride, expand_ratio):
super(InvertedResidual, self).__init__()
self.stride = stride
assert stride in [1, 2]
hidden_dim = round(inp * expand_ratio)
self.use_res_connect = self.stride == 1 and inp == oup
if expand_ratio == 1:
self.conv = nn.Sequential(
# dw
nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
nn.BatchNorm2d(hidden_dim),
nn.ReLU6(inplace=True),
# pw-linear
nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
)
else:
self.conv = nn.Sequential(
# pw
nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
nn.BatchNorm2d(hidden_dim),
nn.ReLU6(inplace=True),
# dw
nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
nn.BatchNorm2d(hidden_dim),
nn.ReLU6(inplace=True),
# pw-linear
nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
)
def forward(self, x):
if self.use_res_connect:
return x + self.conv(x)
else:
return self.conv(x)
class MobileNetV2(nn.Module):
def __init__(self, n_class=1000, input_size=224, width_mult=1.):
super(MobileNetV2, self).__init__()
block = InvertedResidual
input_channel = 32
last_channel = 1280
interverted_residual_setting = [
# t, c, n, s
[1, 16, 1, 1],
[6, 24, 2, 2],
[6, 32, 3, 2],
[6, 64, 4, 2],
[6, 96, 3, 1],
[6, 160, 3, 2],
[6, 320, 1, 1],
]
# building first layer
assert input_size % 32 == 0
input_channel = int(input_channel * width_mult)
self.last_channel = int(last_channel * width_mult) if width_mult > 1.0 else last_channel
self.features = [conv_bn(3, input_channel, 2)]
# building inverted residual blocks
for t, c, n, s in interverted_residual_setting:
output_channel = int(c * width_mult)
for i in range(n):
if i == 0:
self.features.append(block(input_channel, output_channel, s, expand_ratio=t))
else:
self.features.append(block(input_channel, output_channel, 1, expand_ratio=t))
input_channel = output_channel
# building last several layers
self.features.append(conv_1x1_bn(input_channel, self.last_channel))
# make it nn.Sequential
self.features = nn.Sequential(*self.features)
# building classifier
self.classifier = nn.Sequential(
nn.Dropout(0.2),
nn.Linear(self.last_channel, n_class),
)
self._initialize_weights()
def forward(self, x):
x = self.features(x)
# this is the same as .mean(3).mean(2), but speedup only supports the mean variant
# whose output has two dimensions, i.e. a single [N, C] node
# (see the equivalence check at the end of this file)
x = x.mean([2, 3])
x = self.classifier(x)
return x
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
elif isinstance(m, nn.Linear):
n = m.weight.size(1)
m.weight.data.normal_(0, 0.01)
m.bias.data.zero_()
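# Hedged check (not part of the original file), referenced from the comment in forward() above:
# mean over dims [2, 3] in one call equals chaining .mean(3).mean(2), but it produces the 2-D
# [N, C] output in a single node, which is the form the speedup tooling supports.
if __name__ == '__main__':
    import torch
    feat = torch.randn(2, 8, 7, 7)
    assert torch.allclose(feat.mean([2, 3]), feat.mean(3).mean(2))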

View file

@ -1,142 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported ActivationAPoZRank and ActivationMeanRank pruning algorithms.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
import nni
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import ActivationAPoZRankPruner, ActivationMeanRankPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pruner', type=str, default='apoz',
choices=['apoz', 'mean'],
help='pruner to use')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'total_sparsity': 0.5,
'op_types': ['Conv2d'],
}]
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
if 'apoz' in args.pruner:
pruner = ActivationAPoZRankPruner(model, config_list, trainer, traced_optimizer, criterion, training_batches=20)
else:
pruner = ActivationMeanRankPruner(model, config_list, trainer, traced_optimizer, criterion, training_batches=20)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]).to(device), masks_file=masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER SPEEDUP ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
g_epoch = 0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')
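# Hedged follow-up sketch (not part of the original example): `masks` returned by
# pruner.compress() above is assumed to map each pruned layer name to a dict holding a binary
# 'weight' mask, so the realized per-layer sparsity can be reported from such a dict like this.
def report_realized_sparsity(masks):
    for name, layer_masks in masks.items():
        weight_mask = layer_masks['weight']
        sparsity = 1.0 - weight_mask.count_nonzero().item() / weight_mask.numel()
        print(f'{name}: realized sparsity {sparsity:.2%}')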

View file

@ -1,142 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported ActivationAPoZRank and ActivationMeanRank pruning algorithms.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
import nni
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import ActivationAPoZRankPruner, ActivationMeanRankPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def train(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluate(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pruner', type=str, default='apoz',
choices=['apoz', 'mean'],
help='pruner to use')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
train(model, optimizer, criterion)
scheduler.step()
acc = evaluate(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'total_sparsity': 0.5,
'op_types': ['Conv2d'],
}]
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
if 'apoz' in args.pruner:
pruner = ActivationAPoZRankPruner(model, config_list, train, traced_optimizer, criterion, training_batches=20)
else:
pruner = ActivationMeanRankPruner(model, config_list, train, traced_optimizer, criterion, training_batches=20)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, torch.rand([10, 3, 32, 32]), masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER SPEEDUP ' + '=' * 50)
evaluate(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
g_epoch = 0
for i in range(args.fine_tune_epochs):
train(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluate(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')

View file

@ -1,138 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported ADMM pruning algorithms.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
import nni
from nni.compression.pytorch.speedup import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import ADMMPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'sparsity': 0.8,
'op_types': ['Conv2d'],
}]
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
pruner = ADMMPruner(model, config_list, trainer, traced_optimizer, criterion, iterations=10, training_epochs=1, granularity='coarse-grained')
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, torch.randn([128, 3, 32, 32]).to(device), masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER PRUNING ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
g_epoch = 0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')
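# Hedged illustration (not part of the original example) of the granularity='coarse-grained'
# option used above: a fine-grained mask zeroes individual weight entries, while a
# coarse-grained mask keeps or drops whole output filters of a Conv2d weight.
def _granularity_demo():
    import torch
    weight = torch.randn(4, 3, 3, 3)                      # [out_channels, in_channels, k, k]
    fine_mask = (torch.rand_like(weight) > 0.8).float()   # element-wise pruning decisions
    filter_scores = weight.abs().sum(dim=[1, 2, 3])       # one importance score per filter
    coarse_mask = torch.zeros_like(weight)
    coarse_mask[filter_scores.topk(2).indices] = 1.0      # keep the two strongest filters whole
    return fine_mask, coarse_mask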

View file

@ -1,98 +0,0 @@
import sys
from tqdm import tqdm
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
from nni.compression.pytorch.pruning import AMCPruner
from nni.compression.pytorch.utils import count_flops_params
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
criterion = torch.nn.CrossEntropyLoss()
def trainer(model, optimizer, criterion, epoch):
model.train()
for data, target in tqdm(iterable=train_loader, desc='Epoch {}'.format(epoch)):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def finetuner(model):
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for data, target in tqdm(iterable=train_loader, desc='Epoch PFs'):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def evaluator(model):
model.eval()
correct = 0
with torch.no_grad():
for data, target in tqdm(iterable=test_loader, desc='Test'):
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
if __name__ == '__main__':
# model = MobileNetV2(n_class=10).to(device)
model = VGG().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[50, 75], gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()
for i in range(100):
trainer(model, optimizer, criterion, i)
pre_best_acc = evaluator(model)
dummy_input = torch.rand(10, 3, 32, 32).to(device)
pre_flops, pre_params, _ = count_flops_params(model, dummy_input)
config_list = [{'op_types': ['Conv2d'], 'total_sparsity': 0.5, 'max_sparsity_per_layer': 0.8}]
# If you just want to keep the final result as the best result, pass evaluator as None;
# otherwise the result with the highest score (given by the evaluator) is kept as the best result.
ddpg_params = {'hidden1': 300, 'hidden2': 300, 'lr_c': 1e-3, 'lr_a': 1e-4, 'warmup': 100, 'discount': 1., 'bsize': 64,
'rmsize': 100, 'window_length': 1, 'tau': 0.01, 'init_delta': 0.5, 'delta_decay': 0.99, 'max_episode_length': 1e9, 'epsilon': 50000}
pruner = AMCPruner(400, model, config_list, dummy_input, evaluator, finetuner=finetuner, ddpg_params=ddpg_params, target='flops')
pruner.compress()
_, model, masks, best_acc, _ = pruner.get_best_result()
flops, params, _ = count_flops_params(model, dummy_input)
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')

View file

@ -1,94 +0,0 @@
import sys
from tqdm import tqdm
import torch
from torchvision import datasets, transforms
import nni
from nni.compression.pytorch.pruning import AutoCompressPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
criterion = torch.nn.CrossEntropyLoss()
epoch = 0
def trainer(model, optimizer, criterion):
global epoch
model.train()
for data, target in tqdm(iterable=train_loader, desc='Total Epoch {}'.format(epoch)):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
epoch = epoch + 1
def finetuner(model):
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
trainer(model, optimizer, criterion)
def evaluator(model):
model.eval()
correct = 0
with torch.no_grad():
for data, target in tqdm(iterable=test_loader, desc='Test'):
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
if __name__ == '__main__':
model = VGG().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
# pre-train the model
for _ in range(10):
trainer(model, optimizer, criterion)
config_list = [{'op_types': ['Conv2d'], 'total_sparsity': 0.8}]
dummy_input = torch.rand(10, 3, 32, 32).to(device)
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
admm_params = {
'trainer': trainer,
'traced_optimizer': traced_optimizer,
'criterion': criterion,
'iterations': 10,
'training_epochs': 1
}
sa_params = {
'evaluator': evaluator
}
pruner = AutoCompressPruner(model, config_list, 10, admm_params, sa_params, keep_intermediate_result=True, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()

View file

@ -1,131 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported fpgm pruning algorithms.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import FPGMPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = FPGMPruner(model, config_list)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]).to(device), masks_file=masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER SPEEDUP ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')
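# Hedged illustration (not part of the original example) of the FPGM criterion used above:
# filters whose flattened weights lie closest to all other filters (i.e. near the geometric
# median of the filter set) are treated as the most redundant and are pruned first.
def _fpgm_demo():
    import torch
    weight = torch.randn(6, 3, 3, 3)           # [out_channels, in_channels, k, k]
    filters = weight.flatten(1)                # one row per output filter
    distances = torch.cdist(filters, filters)  # pairwise L2 distances between filters
    scores = distances.sum(dim=1)              # small total distance -> near the median
    return scores.argsort()                    # pruning order, most redundant first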

View file

@ -1,138 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported iterative pruning algorithms.
In this example, we show the end-to-end iterative pruning process: pre-training -> pruning -> fine-tuning.
'''
import sys
import argparse
from tqdm import tqdm
import torch
from torchvision import datasets, transforms
from nni.compression.pytorch.pruning import (
LinearPruner,
AGPPruner,
LotteryTicketPruner
)
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
criterion = torch.nn.CrossEntropyLoss()
def trainer(model, optimizer, criterion, epoch):
model.train()
for data, target in tqdm(iterable=train_loader, desc='Epoch {}'.format(epoch)):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def finetuner(model):
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for data, target in tqdm(iterable=train_loader, desc='Epoch PFs'):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def evaluator(model):
model.eval()
correct = 0
with torch.no_grad():
for data, target in tqdm(iterable=test_loader, desc='Test'):
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Iterative Example for model compression')
parser.add_argument('--pruner', type=str, default='linear',
choices=['linear', 'agp', 'lottery'],
help='pruner to use')
parser.add_argument('--pretrain-epochs', type=int, default=10,
help='number of epochs to pretrain the model')
parser.add_argument('--total-iteration', type=int, default=10,
help='number of iteration to iteratively prune the model')
parser.add_argument('--pruning-algo', type=str, default='l1',
choices=['level', 'l1', 'l2', 'fpgm', 'slim', 'apoz',
'mean_activation', 'taylorfo', 'admm'],
help='algorithm to evaluate weights to prune')
parser.add_argument('--speedup', type=bool, default=False,
help='Whether to speedup the pruned model')
parser.add_argument('--reset-weight', type=bool, default=True,
help='Whether to reset weight during each iteration')
args = parser.parse_args()
model = VGG().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
# pre-train the model
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion, i)
evaluator(model)
config_list = [{'op_types': ['Conv2d'], 'sparsity': 0.8}]
dummy_input = torch.rand(10, 3, 32, 32).to(device)
# If you just want to keep the final result as the best result, pass evaluator as None;
# otherwise the result with the highest score (given by the evaluator) is kept as the best result.
kw_args = {'pruning_algorithm': args.pruning_algo,
'total_iteration': args.total_iteration,
'evaluator': None,
'finetuner': finetuner}
if args.speedup:
kw_args['speedup'] = args.speedup
kw_args['dummy_input'] = torch.rand(10, 3, 32, 32).to(device)
if args.pruner == 'linear':
iterative_pruner = LinearPruner
elif args.pruner == 'agp':
iterative_pruner = AGPPruner
elif args.pruner == 'lottery':
kw_args['reset_weight'] = args.reset_weight
iterative_pruner = LotteryTicketPruner
pruner = iterative_pruner(model, config_list, **kw_args)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
evaluator(model)

View file

@ -1,130 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported level pruning algorithm.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import LevelPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'sparsity': 0.5,
'op_types': ['default']
}]
pruner = LevelPruner(model, config_list)
_, masks = pruner.compress()
pruner.show_pruned_weights()
# Fine-grained pruning does not need a speedup step
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER PRUNING ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
g_epoch = 0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')
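# Hedged illustration (not part of the original example) of what the level pruner above does to
# a single weight tensor at 50% sparsity: zero out the smallest-magnitude half of the entries,
# leaving the tensor shape untouched (which is why no speedup step is needed).
def _level_demo():
    import torch
    weight = torch.randn(8, 8)
    k = weight.numel() // 2
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).float()
    return weight * mask, mask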

View file

@ -1,128 +0,0 @@
import functools
import time
from tqdm import tqdm
import torch
from torch.optim import Adam
from torch.utils.data import DataLoader
from datasets import load_metric, load_dataset
from transformers import (
BertForSequenceClassification,
BertTokenizerFast,
DataCollatorWithPadding,
set_seed
)
import nni
from nni.compression.pytorch.pruning import MovementPruner
task_to_keys = {
"cola": ("sentence", None),
"mnli": ("premise", "hypothesis"),
"mrpc": ("sentence1", "sentence2"),
"qnli": ("question", "sentence"),
"qqp": ("question1", "question2"),
"rte": ("sentence1", "sentence2"),
"sst2": ("sentence", None),
"stsb": ("sentence1", "sentence2"),
"wnli": ("sentence1", "sentence2"),
}
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
gradient_accumulation_steps = 8
# a placeholder criterion, because the huggingface model output already contains the loss
def criterion(input, target):
return input.loss
def trainer(model, optimizer, criterion, train_dataloader):
model.train()
counter = 0
for batch in (train_dataloader):
counter += 1
batch.to(device)
optimizer.zero_grad()
outputs = model(**batch)
# the pruner may wrap the criterion, e.g. loss = original_loss + norm(weight), so call criterion here to get the loss
loss = criterion(outputs, None)
loss = loss / gradient_accumulation_steps
loss.backward()
if counter % gradient_accumulation_steps == 0 or counter == len(train_dataloader):
optimizer.step()
if counter % 800 == 0:
print('[{}]: {}'.format(time.asctime(time.localtime(time.time())), counter))
if counter % 8000 == 0:
print('Step {}: {}'.format(counter // gradient_accumulation_steps, evaluator(model, metric, is_regression, validate_dataloader)))
def evaluator(model, metric, is_regression, eval_dataloader):
model.eval()
for batch in (eval_dataloader):
batch.to(device)
outputs = model(**batch)
predictions = outputs.logits.argmax(dim=-1) if not is_regression else outputs.logits.squeeze()
metric.add_batch(
predictions=predictions,
references=batch["labels"],
)
return metric.compute()
if __name__ == '__main__':
task_name = 'mnli'
is_regression = False
num_labels = 1 if is_regression else (3 if task_name == 'mnli' else 2)
train_batch_size = 4
eval_batch_size = 4
set_seed(1024)
tokenizer = BertTokenizerFast.from_pretrained('bert-base-cased')
sentence1_key, sentence2_key = task_to_keys[task_name]
# used to preprocess the raw data
def preprocess_function(examples):
# Tokenize the texts
args = (
(examples[sentence1_key],) if sentence2_key is None else (examples[sentence1_key], examples[sentence2_key])
)
result = tokenizer(*args, padding=False, max_length=128, truncation=True)
if "label" in examples:
# In all cases, rename the column to labels because the model will expect that.
result["labels"] = examples["label"]
return result
raw_datasets = load_dataset('glue', task_name, cache_dir='./data')
processed_datasets = raw_datasets.map(preprocess_function, batched=True, remove_columns=raw_datasets["train"].column_names)
train_dataset = processed_datasets['train']
validate_dataset = processed_datasets['validation_matched' if task_name == "mnli" else 'validation']
data_collator = DataCollatorWithPadding(tokenizer)
train_dataloader = DataLoader(train_dataset, shuffle=True, collate_fn=data_collator, batch_size=train_batch_size)
validate_dataloader = DataLoader(validate_dataset, collate_fn=data_collator, batch_size=eval_batch_size)
metric = load_metric("glue", task_name)
model = BertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=num_labels).to(device)
print('Initial: {}'.format(evaluator(model, metric, is_regression, validate_dataloader)))
config_list = [{'op_types': ['Linear'], 'op_partial_names': ['bert.encoder'], 'sparsity': 0.9}]
p_trainer = functools.partial(trainer, train_dataloader=train_dataloader)
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(Adam)(model.parameters(), lr=2e-5)
pruner = MovementPruner(model, config_list, p_trainer, traced_optimizer, criterion, training_epochs=10,
warm_up_step=12272, cool_down_beginning_step=110448)
_, masks = pruner.compress()
pruner.show_pruned_weights()
print('Final: {}'.format(evaluator(model, metric, is_regression, validate_dataloader)))
optimizer = Adam(model.parameters(), lr=2e-5)
trainer(model, optimizer, criterion, train_dataloader)
print('After 1 epoch finetuning: {}'.format(evaluator(model, metric, is_regression, validate_dataloader)))
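# Hedged illustration (not part of the original example) of why the trainer above divides the
# loss by gradient_accumulation_steps: accumulating the scaled backward passes of N micro-batches
# before a single optimizer.step() reproduces the gradient of their mean loss.
def _accumulation_demo():
    import torch
    w = torch.zeros(3, requires_grad=True)
    micro_batches = [torch.ones(3), 2 * torch.ones(3)]
    for x in micro_batches:
        loss = (w * x).sum()
        (loss / len(micro_batches)).backward()
    return w.grad                              # equals the mean of the per-micro-batch gradients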

View file

@ -1,137 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for supported l1norm and l2norm pruning algorithms.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import L1NormPruner, L2NormPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pruner', type=str, default='l1norm',
choices=['l1norm', 'l2norm'],
help='pruner to use')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
if 'l1' in args.pruner:
pruner = L1NormPruner(model, config_list)
else:
pruner = L2NormPruner(model, config_list)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]).to(device), masks_file=masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER SPEEDUP ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')
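# Hedged illustration (not part of the original example) of the L1-norm criterion used above:
# output filters with the smallest absolute weight sum are treated as least important.
def _l1_norm_demo():
    import torch
    weight = torch.randn(6, 3, 3, 3)
    l1_scores = weight.abs().sum(dim=[1, 2, 3])    # one score per output filter
    return l1_scores.argsort()                     # pruning order, least important first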

View file

@ -1,100 +0,0 @@
import sys
from tqdm import tqdm
import torch
from torchvision import datasets, transforms
from nni.compression.pytorch.pruning import L1NormPruner
from nni.compression.pytorch.pruning.tools import AGPTaskGenerator
from nni.compression.pytorch.pruning.basic_scheduler import PruningScheduler
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
criterion = torch.nn.CrossEntropyLoss()
def trainer(model, optimizer, criterion, epoch):
model.train()
for data, target in tqdm(iterable=train_loader, desc='Epoch {}'.format(epoch)):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def finetuner(model):
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for data, target in tqdm(iterable=train_loader, desc='Epoch PFs'):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def evaluator(model):
model.eval()
correct = 0
with torch.no_grad():
for data, target in tqdm(iterable=test_loader, desc='Test'):
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
if __name__ == '__main__':
model = VGG().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
# pre-train the model
for i in range(5):
trainer(model, optimizer, criterion, i)
# No need to pass model and config_list to the pruner at initialization when using a scheduler.
pruner = L1NormPruner(None, None)
# You can specify log_dir; all intermediate results and the best result will be saved under this folder.
# If you don't want to keep intermediate results, set `keep_intermediate_result=False`.
config_list = [{'op_types': ['Conv2d'], 'sparsity': 0.8}]
task_generator = AGPTaskGenerator(10, model, config_list, log_dir='.', keep_intermediate_result=True)
dummy_input = torch.rand(10, 3, 32, 32).to(device)
# If you just want to keep the final result as the best result, pass evaluator as None;
# otherwise the result with the highest score (given by the evaluator) is kept as the best result.
# scheduler = PruningScheduler(pruner, task_generator, finetuner=finetuner, speedup=True, dummy_input=dummy_input, evaluator=evaluator)
scheduler = PruningScheduler(pruner, task_generator, finetuner=finetuner, speedup=True, dummy_input=dummy_input, evaluator=None, reset_weight=False)
scheduler.compress()
_, model, masks, _, _ = scheduler.get_best_result()
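# Hedged illustration (not part of the original example) of the sparsity schedule an AGP task
# generator follows: sparsity ramps from an initial value s_i to the final target s_f over n
# iterations as s_t = s_f + (s_i - s_f) * (1 - t / n) ** 3 (Zhu & Gupta, "To prune, or not to prune").
def _agp_schedule(s_i=0.0, s_f=0.8, n=10):
    return [s_f + (s_i - s_f) * (1 - t / n) ** 3 for t in range(1, n + 1)]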

View file

@ -1,88 +0,0 @@
import sys
from tqdm import tqdm
import torch
from torchvision import datasets, transforms
from nni.compression.pytorch.pruning import L1NormPruner
from nni.compression.pytorch.speedup import ModelSpeedup
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
criterion = torch.nn.CrossEntropyLoss()
def trainer(model, optimizer, criterion, epoch):
model.train()
for data, target in tqdm(iterable=train_loader, desc='Epoch {}'.format(epoch)):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def evaluator(model):
model.eval()
correct = 0
with torch.no_grad():
for data, target in tqdm(iterable=test_loader, desc='Test'):
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
if __name__ == '__main__':
model = VGG().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
print('\nPre-train the model:')
for i in range(5):
trainer(model, optimizer, criterion, i)
evaluator(model)
config_list = [{'op_types': ['Conv2d'], 'sparsity': 0.8}]
pruner = L1NormPruner(model, config_list)
_, masks = pruner.compress()
print('\nThe accuracy with masks:')
evaluator(model)
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand(10, 3, 32, 32).to(device), masks_file=masks).speedup_model()
print('\nThe accuracy after speedup:')
evaluator(model)
# A new optimizer is needed because the modules in the model are replaced during speedup.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
print('\nFinetune the model after speedup:')
for i in range(5):
trainer(model, optimizer, criterion, i)
evaluator(model)

View file

@ -1,109 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for the simulated annealing pruning algorithm.
In this example, we show the end-to-end iterative pruning process: pre-training -> pruning -> fine-tuning.
'''
import sys
import argparse
from tqdm import tqdm
import torch
from torchvision import datasets, transforms
from nni.compression.pytorch.pruning import SimulatedAnnealingPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
criterion = torch.nn.CrossEntropyLoss()
def trainer(model, optimizer, criterion, epoch):
model.train()
for data, target in tqdm(iterable=train_loader, desc='Epoch {}'.format(epoch)):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def finetuner(model):
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for data, target in tqdm(iterable=train_loader, desc='Epoch PFs'):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
def evaluator(model):
model.eval()
correct = 0
with torch.no_grad():
for data, target in tqdm(iterable=test_loader, desc='Test'):
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Iterative Example for model compression')
parser.add_argument('--pretrain-epochs', type=int, default=10,
help='number of epochs to pretrain the model')
parser.add_argument('--pruning-algo', type=str, default='l1',
choices=['level', 'l1', 'l2', 'fpgm', 'slim', 'apoz',
'mean_activation', 'taylorfo', 'admm'],
help='algorithm to evaluate weights to prune')
parser.add_argument('--cool-down-rate', type=float, default=0.9,
help='Cool down rate of the temperature.')
args = parser.parse_args()
model = VGG().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
# pre-train the model
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion, i)
evaluator(model)
config_list = [{'op_types': ['Conv2d'], 'total_sparsity': 0.8}]
# evaluator in 'SimulatedAnnealingPruner' must not be None.
pruner = SimulatedAnnealingPruner(model, config_list, pruning_algorithm=args.pruning_algo,
evaluator=evaluator, cool_down_rate=args.cool_down_rate, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
evaluator(model)

View file

@ -1,136 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for the Slim pruning algorithm.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> speedup -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
import nni
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import SlimPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'total_sparsity': 0.5,
'op_types': ['BatchNorm2d'],
'max_sparsity_per_layer': 0.9
}]
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
pruner = SlimPruner(model, config_list, trainer, traced_optimizer, criterion, training_epochs=1, scale=0.0001, mode='global')
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]).to(device), masks_file=masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER SPEEDUP ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
g_epoch = 0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')

View file

@ -1,36 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""
This is an example of pruning speedup for huggingface transformers.
NNI officially supports speeding up the attention heads of bert, bart, t5 and vit.
For other transformer attention modules, or any other hyper-module, users can customize the behavior by implementing a Replacer.
"""
import torch
from transformers.models.bert.configuration_bert import BertConfig
from transformers.models.bert.modeling_bert import BertForSequenceClassification
from nni.compression.pytorch.pruning import L1NormPruner
from nni.compression.pytorch.speedup import ModelSpeedup
from nni.compression.pytorch.utils.external.atten_replacer import TransformersAttentionReplacer
config = BertConfig()
model = BertForSequenceClassification(config)
config_list = [{
'op_types': ['Linear'],
'op_partial_names': ['bert.encoder.layer.{}.attention.self'.format(i) for i in range(12)],
'sparsity': 0.98
}]
pruner = L1NormPruner(model, config_list)
_, masks = pruner.compress()
pruner._unwrap_model()
replacer = TransformersAttentionReplacer(model)
ModelSpeedup(model, torch.randint(0, 30000, [4, 128]), masks, customized_replacers=[replacer]).speedup_model()
print(model(**{'input_ids': torch.randint(0, 30000, [4, 128])}))
print(model)
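
Under the merged namespace, the speedup and attention-replacer imports used above move as shown below (the new paths appear in the documentation updates later in this diff); the overall attention-head pruning flow stays the same.

.. code-block:: python

    # Before (paths used in the deleted example above):
    #   from nni.compression.pytorch.speedup import ModelSpeedup
    #   from nni.compression.pytorch.utils.external.atten_replacer import TransformersAttentionReplacer
    # After this merge:
    from nni.compression.speedup import ModelSpeedup
    from nni.compression.utils.external.external_replacer import TransformersAttentionReplacer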

View file

@ -1,165 +0,0 @@
from __future__ import annotations
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger
import torch
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader
from torchmetrics.functional import accuracy
from torchvision import datasets, transforms
import nni
from nni.compression.pytorch import LightningEvaluator
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
class SimpleLightningModel(pl.LightningModule):
def __init__(self):
super().__init__()
self.model = VGG()
self.criterion = torch.nn.CrossEntropyLoss()
def forward(self, x):
return self.model(x)
def training_step(self, batch, batch_idx):
x, y = batch
logits = self(x)
loss = self.criterion(logits, y)
self.log("train_loss", loss)
return loss
def evaluate(self, batch, stage=None):
x, y = batch
logits = self(x)
loss = self.criterion(logits, y)
preds = torch.argmax(logits, dim=1)
acc = accuracy(preds, y, 'multiclass', num_classes=10)
if stage:
self.log(f"default", loss, prog_bar=False)
self.log(f"{stage}_loss", loss, prog_bar=True)
self.log(f"{stage}_acc", acc, prog_bar=True)
def validation_step(self, batch, batch_idx):
self.evaluate(batch, "val")
def test_step(self, batch, batch_idx):
self.evaluate(batch, "test")
def configure_optimizers(self):
optimizer = nni.trace(torch.optim.Adam)(
self.parameters(),
lr=0.001
)
scheduler_dict = {
"scheduler": nni.trace(StepLR)(
optimizer,
step_size=1,
gamma=0.5
),
"interval": "epoch",
}
return {"optimizer": optimizer, "lr_scheduler": scheduler_dict}
class ImageNetDataModule(pl.LightningDataModule):
def __init__(self, data_dir: str = "./data"):
super().__init__()
self.data_dir = data_dir
def prepare_data(self):
# download
datasets.CIFAR10(self.data_dir, train=True, download=True)
datasets.CIFAR10(self.data_dir, train=False, download=True)
def setup(self, stage: str | None = None):
if stage == "fit" or stage is None:
self.cifar10_train_data = datasets.CIFAR10(root='data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]))
self.cifar10_val_data = datasets.CIFAR10(root='./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]))
if stage == "test" or stage is None:
self.cifar10_test_data = datasets.CIFAR10(root='./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]))
if stage == "predict" or stage is None:
self.cifar10_predict_data = datasets.CIFAR10(root='./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]))
def train_dataloader(self):
return DataLoader(self.cifar10_train_data, batch_size=128, shuffle=True)
def val_dataloader(self):
return DataLoader(self.cifar10_val_data, batch_size=128, shuffle=False)
def test_dataloader(self):
return DataLoader(self.cifar10_test_data, batch_size=128, shuffle=False)
def predict_dataloader(self):
return DataLoader(self.cifar10_predict_data, batch_size=128, shuffle=False)
# Train the model
pl_trainer = nni.trace(pl.Trainer)(
accelerator='auto',
devices=1,
max_epochs=3,
logger=TensorBoardLogger('./lightning_logs', name="vgg"),
)
pl_data = nni.trace(ImageNetDataModule)(data_dir='./data')
model = SimpleLightningModel()
pl_trainer.fit(model, pl_data)
metric = pl_trainer.test(model, pl_data)
print(f'The trained model accuracy: {metric}')
# create traced optimizer / lr_scheduler (not actually used below: LightningEvaluator only needs the traced Trainer and DataModule)
optimizer = nni.trace(torch.optim.Adam)(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = nni.trace(StepLR)(optimizer, step_size=1, gamma=0.5)
dummy_input = torch.rand(4, 3, 224, 224)
# LightningEvaluator initialization
evaluator = LightningEvaluator(pl_trainer, pl_data)
# apply pruning
from nni.compression.pytorch.pruning import TaylorFOWeightPruner
from nni.compression.pytorch.speedup import ModelSpeedup
pruner = TaylorFOWeightPruner(model, config_list=[{'total_sparsity': 0.5, 'op_types': ['Conv2d']}], evaluator=evaluator, training_steps=100)
_, masks = pruner.compress()
metric = pl_trainer.test(model, pl_data)
print(f'The masked model accuracy: {metric}')
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]), masks_file=masks).speedup_model()
metric = pl_trainer.test(model, pl_data)
print(f'The speedup model accuracy: {metric}')
# finetune the speedup model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = StepLR(optimizer, step_size=1, gamma=0.5)
pl_trainer = pl.Trainer(
accelerator='auto',
devices=1,
max_epochs=3,
logger=TensorBoardLogger('./lightning_logs', name="vgg"),
)
pl_trainer.fit(model, pl_data)
metric = pl_trainer.test(model, pl_data)
print(f'The speedup model after finetuning accuracy: {metric}')
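
Migration note: with this merge the evaluator import becomes ``nni.compression.LightningEvaluator``; a short sketch of the changed lines is given below, assuming the ``pl.Trainer`` and the data module are still created through ``nni.trace`` exactly as in the example above.

.. code-block:: python

    import nni
    import pytorch_lightning as pl
    # Before (as in the deleted example above):
    #   from nni.compression.pytorch import LightningEvaluator
    # After this merge:
    from nni.compression import LightningEvaluator

    # the Trainer and the LightningDataModule must still be wrapped with nni.trace;
    # ImageNetDataModule refers to the datamodule defined in the example above
    pl_trainer = nni.trace(pl.Trainer)(accelerator='auto', devices=1, max_epochs=3)
    pl_data = nni.trace(ImageNetDataModule)(data_dir='./data')
    evaluator = LightningEvaluator(pl_trainer, pl_data)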

View file

@ -1,136 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for the TaylorFOWeight pruning algorithm.
In this example, we show the end-to-end pruning process: pre-training -> pruning -> speedup -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import sys
import torch
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
import nni
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import TaylorFOWeightPruner
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
g_epoch = 0
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=128, shuffle=False)
def trainer(model, optimizer, criterion):
global g_epoch
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if batch_idx and batch_idx % 100 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
g_epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
g_epoch += 1
def evaluator(model):
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(test_loader.dataset)
print('Accuracy: {}%\n'.format(acc))
return acc
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='PyTorch Example for model compression')
parser.add_argument('--pretrain-epochs', type=int, default=20,
help='number of epochs to pretrain the model')
parser.add_argument('--fine-tune-epochs', type=int, default=20,
help='number of epochs to fine tune the model')
args = parser.parse_args()
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
model = VGG().to(device)
optimizer, scheduler = optimizer_scheduler_generator(model, total_epoch=args.pretrain_epochs)
criterion = torch.nn.CrossEntropyLoss()
pre_best_acc = 0.0
best_state_dict = None
for i in range(args.pretrain_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
acc = evaluator(model)
if acc > pre_best_acc:
pre_best_acc = acc
best_state_dict = model.state_dict()
print("Best accuracy: {}".format(pre_best_acc))
model.load_state_dict(best_state_dict)
pre_flops, pre_params, _ = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
g_epoch = 0
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
config_list = [{
'total_sparsity': 0.5,
'op_types': ['Conv2d'],
}]
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
pruner = TaylorFOWeightPruner(model, config_list, trainer, traced_optimizer, criterion, training_batches=20)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]).to(device), masks_file=masks).speedup_model()
print('\n' + '=' * 50 + ' EVALUATE THE MODEL AFTER SPEEDUP ' + '=' * 50)
evaluator(model)
# The optimizer used in the pruner might be patched, so it is recommended to create a new optimizer for the fine-tuning stage.
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
optimizer, scheduler = optimizer_scheduler_generator(model, _lr=0.01, total_epoch=args.fine_tune_epochs)
best_acc = 0.0
g_epoch = 0
for i in range(args.fine_tune_epochs):
trainer(model, optimizer, criterion)
scheduler.step()
best_acc = max(evaluator(model), best_acc)
flops, params, results = count_flops_params(model, torch.randn([128, 3, 32, 32]).to(device))
print(f'Pretrained model FLOPs {pre_flops/1e6:.2f} M, #Params: {pre_params/1e6:.2f}M, Accuracy: {pre_best_acc: .2f}%')
print(f'Finetuned model FLOPs {flops/1e6:.2f} M, #Params: {params/1e6:.2f}M, Accuracy: {best_acc: .2f}%')
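
For migration, note that the import paths used above move as sketched below, and that in the 3.0 pruners the (trainer, traced_optimizer, criterion) arguments are replaced by an evaluator, as the evaluator-based examples elsewhere in this diff show. The ``nni.compression.pruning`` location of ``TaylorFOWeightPruner`` is an assumption by analogy with the documented ``MovementPruner`` path.

.. code-block:: python

    # Before (as in the deleted example above):
    #   from nni.compression.pytorch import ModelSpeedup
    #   from nni.compression.pytorch.pruning import TaylorFOWeightPruner
    # After this merge (pruning path assumed by analogy with the documented MovementPruner path):
    from nni.compression.speedup import ModelSpeedup
    from nni.compression.pruning import TaylorFOWeightPruner

    # In the merged interface the pruner is driven by an evaluator; the call pattern below is
    # an assumption carried over from the evaluator-based examples in this diff:
    # pruner = TaylorFOWeightPruner(model, config_list, evaluator, training_steps=20)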

View file

@ -1,265 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
NNI example for pruning with DistributedDataParallel (DDP).
In this example, we use the TaylorFOWeight pruner to show the end-to-end DDP pruning process: pre-training -> pruning -> speedup -> fine-tuning.
Note that pruners use masks to simulate the real pruning. In order to obtain a real compressed model, model speedup is required.
'''
import argparse
import time
import functools
from typing import Callable
from pathlib import Path
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import MultiStepLR
import nni
from nni.compression.pytorch import ModelSpeedup
from nni.compression.pytorch.utils import count_flops_params
from nni.compression.pytorch.pruning import TaylorFOWeightPruner
from nni.compression.pytorch.utils import TorchEvaluator
from nni.common.types import SCHEDULER
############# Create dataloaders, optimizer, training and evaluation function ############
class Mnist(torch.nn.Module):
def __init__(self):
super().__init__()
self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)
self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)
self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)
self.fc2 = torch.nn.Linear(500, 10)
self.relu1 = torch.nn.ReLU6()
self.relu2 = torch.nn.ReLU6()
self.relu3 = torch.nn.ReLU6()
self.max_pool1 = torch.nn.MaxPool2d(2, 2)
self.max_pool2 = torch.nn.MaxPool2d(2, 2)
def forward(self, x):
x = self.relu1(self.conv1(x))
x = self.max_pool1(x)
x = self.relu2(self.conv2(x))
x = self.max_pool2(x)
x = x.view(x.shape[0], -1)
x = self.relu3(self.fc1(x))
x = self.fc2(x)
return x
def create_dataloaders():
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
# training dataloader
training_dataset = datasets.MNIST('data', train=True, download=True, transform=trans)
training_sampler = torch.utils.data.distributed.DistributedSampler(training_dataset)
training_dataloader = torch.utils.data.DataLoader(training_dataset, \
batch_size=64, sampler=training_sampler)
# validation dataloader
validation_dataset = datasets.MNIST('data', train=False, transform=trans)
validation_sampler = torch.utils.data.distributed.DistributedSampler(validation_dataset)
validation_dataloader = torch.utils.data.DataLoader(validation_dataset, \
batch_size=1000, sampler=validation_sampler)
return training_dataloader, validation_dataloader
def training(
training_dataloader: DataLoader,
validation_dataloader: DataLoader,
model: nn.Module,
optimizer: torch.optim.Optimizer,
criterion: Callable[[torch.Tensor, torch.Tensor], torch.Tensor],
lr_scheduler: SCHEDULER = None,
max_steps: int = None, max_epochs: int = None,
local_rank: int = -1,
save_best_model: bool = False, save_path: str = None,
log_path: str = None,
evaluation_func=None,
):
model.train()
current_step = 0
best_acc = 0.
for current_epoch in range(max_epochs if max_epochs else 2):
for (data, target) in training_dataloader:
data, target = data.cuda(), target.cuda()
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()
if lr_scheduler:
lr_scheduler.step()
current_step += 1
# evaluation for every 1000 steps
if current_step % 1000 == 0 or current_step % len(training_dataloader) == 0:
acc = evaluation_func(validation_dataloader, model)
with log_path.open('a+') as f:
msg = '[{}] Epoch {}, Step {}: Acc: {} Loss:{}\n'.format(time.asctime(time.localtime(time.time())), \
current_epoch, current_step, acc, loss.item())
f.write(msg)
if save_best_model and best_acc < acc:
assert save_path is not None
if local_rank == 0:
torch.save(model.module.state_dict(), save_path)
best_acc = acc
if max_steps and current_step >= max_steps:
return best_acc
return best_acc
def evaluation(validation_dataloader: DataLoader, model: nn.Module):
training = model.training
model.eval()
correct = 0.0
with torch.no_grad():
for data, target in validation_dataloader:
data, target = data.cuda(), target.cuda()
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
acc = 100 * correct / len(validation_dataloader.dataset)
# average acc in different local_ranks
average_acc = torch.tensor([acc]).cuda()
dist.all_reduce(average_acc, op=dist.ReduceOp.SUM)
world_size = dist.get_world_size()
average_acc = average_acc / world_size
print('Average Accuracy: {}%\n'.format(average_acc.item()))
model.train(training)
return average_acc.item()
def optimizer_scheduler_generator(model, _lr=0.1, _momentum=0.9, _weight_decay=5e-4, total_epoch=160):
optimizer = torch.optim.SGD(model.parameters(), lr=_lr, momentum=_momentum, weight_decay=_weight_decay)
scheduler = MultiStepLR(optimizer, milestones=[int(total_epoch * 0.5), int(total_epoch * 0.75)], gamma=0.1)
return optimizer, scheduler
def retrain_model(
args,
local_rank: int,
model: nn.Module = None,
):
# create a DDP model
if model is None: # pretraining process
model = Mnist().cuda()
log_save_path = "pretraining.log"
model_save_path = "pretraining_best_model.pth"
epochs = args.pretrain_epochs
lr = args.pretraining_lr
else: # finetune process
log_save_path = "finetune.log"
model_save_path = "finetune_best_model.pth"
epochs = args.finetune_epochs
lr = args.finetune_lr
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
# create dataloaders
training_dataloader, validation_dataloader = create_dataloaders()
# create optimizer, lr_scheduler and criterion
optimizer, lr_scheduler = optimizer_scheduler_generator(model, \
_lr=lr, total_epoch=epochs)
criterion = torch.nn.CrossEntropyLoss()
# training and evaluation process
best_acc = training(training_dataloader, validation_dataloader, model, optimizer, criterion, lr_scheduler,\
args.max_steps, epochs, local_rank, save_best_model=True, \
save_path=Path(args.log_dir) / model_save_path, \
log_path=Path(args.log_dir) / log_save_path, \
evaluation_func=evaluation)
# compute params and FLOPs
flops, params, _ = count_flops_params(model, torch.randn([32, 1, 28, 28]).cuda())
return flops, params, best_acc
def pruned_model_process(args, local_rank):
# load the pretrained model
model = Mnist().cuda()
state_dict = torch.load(Path(args.log_dir) / "pretraining_best_model.pth")
model.load_state_dict(state_dict)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
# create dataloaders
training_dataloader, validation_dataloader = create_dataloaders()
# build a config_list
config_list = [{'total_sparsity': 0.7, 'op_types': ['Conv2d']}]
# create an evaluator
taylor_training = functools.partial(
training,
training_dataloader,
validation_dataloader,
local_rank = local_rank,
log_path = Path(args.log_dir) / "taylor_pruning.log",
evaluation_func = evaluation,
)
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
evaluator = TorchEvaluator(taylor_training, traced_optimizer, criterion)
# create a TaylorFO pruner
pruner = TaylorFOWeightPruner(model=model, config_list=config_list,
evaluator=evaluator, training_steps=args.pruner_training_steps)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()
#speedup
sub_module = ModelSpeedup(model, dummy_input=torch.rand([32, 1, 28, 28]).cuda(), masks_file=masks).speedup_model()
return sub_module
def main():
parser = argparse.ArgumentParser(description='PyTorch Example for model compression with DDP')
parser.add_argument('--finetune_lr', type=float, default=0.01,
help='the learning rate in the fine-tune process')
parser.add_argument('--pretraining_lr', type=float, default=0.01,
help='the learning rate in the pretraining process')
parser.add_argument('--eps', type=float, default=1e-8,
help='the parameter in the Adam optimizer')
parser.add_argument('--max_steps', type=int, default=None,
help='the max number of training steps')
parser.add_argument('--log_dir', type=str, default='./mnist_infos',
help='the base path for saving files')
parser.add_argument('--pruner_training_steps', type=int, default=1000,
help='the number of training steps in the pruning process')
parser.add_argument('--pretrain_epochs', type=int, default=5,
help='number of epochs to pretrain the model')
parser.add_argument('--finetune_epochs', type=int, default=20,
help='number of epochs to fine-tune the model')
args = parser.parse_args()
Path(args.log_dir).mkdir(parents=True, exist_ok=True)
#init ddp
dist.init_process_group(backend='nccl')
# get local_rank
rank = dist.get_rank()
local_rank = rank % torch.cuda.device_count()
print(f"local_rank:{local_rank}")
torch.cuda.set_device(local_rank)
print('\n' + '=' * 50 + ' START TO TRAIN THE MODEL ' + '=' * 50)
original_flops, original_params, original_best_acc = retrain_model(args, local_rank)
# Start to prune and speedup
print('\n' + '=' * 50 + ' START TO PRUNE THE BEST ACCURACY PRETRAINED MODEL ' + '=' * 50)
model = pruned_model_process(args, local_rank)
print('\n' + '=' * 50 + ' START TO FINE TUNE THE MODEL ' + '=' * 50)
finetuned_flops, finetuned_params, finetuned_best_acc = retrain_model(args, local_rank, model.cuda())
print(f'Pretrained model FLOPs {original_flops/1e6:.2f} M, #Params: {original_params/1e6:.2f}M, Accuracy: {original_best_acc: .2f}%')
print(f'Finetuned model FLOPs {finetuned_flops/1e6:.2f} M, #Params: {finetuned_params/1e6:.2f}M, Accuracy: {finetuned_best_acc: .2f}%')
if __name__ == '__main__':
main()

View file

@ -1,112 +0,0 @@
from __future__ import annotations
from typing import Callable, Any
import torch
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import nni
from nni.compression.pytorch import TorchEvaluator
from nni.common.types import SCHEDULER
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model: torch.nn.Module = VGG().to(device)
def training_func(model: torch.nn.Module, optimizers: torch.optim.Optimizer,
criterion: Callable[[Any, Any], torch.Tensor],
lr_schedulers: SCHEDULER | None = None, max_steps: int | None = None,
max_epochs: int | None = None, *args, **kwargs):
model.train()
# prepare data
cifar10_train_data = datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]), download=True)
train_dataloader = DataLoader(cifar10_train_data, batch_size=128, shuffle=True)
total_epochs = max_epochs if max_epochs else 3
total_steps = max_steps if max_steps else None
current_steps = 0
# training loop
for _ in range(total_epochs):
for inputs, labels in train_dataloader:
inputs, labels = inputs.to(device), labels.to(device)
optimizers.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizers.step()
current_steps += 1
if total_steps and current_steps == total_steps:
return
if lr_schedulers: lr_schedulers.step()
def evaluating_func(model: torch.nn.Module):
model.eval()
# prepare data
cifar10_val_data = datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]), download=True)
val_dataloader = DataLoader(cifar10_val_data, batch_size=4, shuffle=False)
# testing loop
correct = 0
with torch.no_grad():
for inputs, labels in val_dataloader:
inputs, labels = inputs.to(device), labels.to(device)
logits = model(inputs)
preds = torch.argmax(logits, dim=1)
correct += preds.eq(labels.view_as(preds)).sum().item()
return correct / len(cifar10_val_data)
# Train the model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = StepLR(optimizer, step_size=1, gamma=0.5)
training_func(model, optimizer, criterion, lr_scheduler)
acc = evaluating_func(model)
print(f'The trained model accuracy: {acc}')
# create traced optimizer / lr_scheduler
optimizer = nni.trace(torch.optim.Adam)(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = nni.trace(StepLR)(optimizer, step_size=1, gamma=0.5)
dummy_input = torch.rand(4, 3, 32, 32).to(device)
# TorchEvaluator initialization
evaluator = TorchEvaluator(training_func=training_func, optimizers=optimizer, criterion=criterion,
lr_schedulers=lr_scheduler, dummy_input=dummy_input, evaluating_func=evaluating_func)
# apply pruning
from nni.compression.pytorch.pruning import TaylorFOWeightPruner
from nni.compression.pytorch.speedup import ModelSpeedup
pruner = TaylorFOWeightPruner(model, config_list=[{'total_sparsity': 0.5, 'op_types': ['Conv2d']}], evaluator=evaluator, training_steps=100)
_, masks = pruner.compress()
acc = evaluating_func(model)
print(f'The masked model accuracy: {acc}')
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]).to(device), masks_file=masks).speedup_model()
acc = evaluating_func(model)
print(f'The speedup model accuracy: {acc}')
# finetune the speedup model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = StepLR(optimizer, step_size=1, gamma=0.5)
training_func(model, optimizer, criterion, lr_scheduler)
acc = evaluating_func(model)
print(f'The speedup model after finetuning accuracy: {acc}')
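
The evaluator import also moves with this merge; only the changed import is sketched here, because the 3.0 ``TorchEvaluator`` constructor arguments differ from the 2.x ones used above (see the migration notes earlier in this document).

.. code-block:: python

    # Before (as in the deleted example above):
    #   from nni.compression.pytorch import TorchEvaluator
    # After this merge:
    from nni.compression import TorchEvaluator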

View file

@ -1,57 +0,0 @@
import numpy as np
from datasets import load_dataset, load_metric
from transformers import (
AutoTokenizer,
AutoModelForSequenceClassification,
Trainer,
TrainingArguments
)
import nni
from nni.compression.pytorch import TransformersEvaluator
from nni.compression.pytorch.pruning import TaylorFOWeightPruner
dataset = load_dataset('yelp_review_full')
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
def tokenize_function(examples):
return tokenizer(examples['text'], padding='max_length', truncation=True)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
small_train_dataset = tokenized_datasets['train'].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets['test'].shuffle(seed=42).select(range(1000))
model = AutoModelForSequenceClassification.from_pretrained('bert-base-cased', num_labels=5)
training_args = TrainingArguments(output_dir='test_trainer')
metric = load_metric('accuracy')
def compute_metrics(eval_pred):
logits, labels = eval_pred
predictions = np.argmax(logits, axis=-1)
return metric.compute(predictions=predictions, references=labels)
training_args = TrainingArguments(
output_dir='./log',
evaluation_strategy='epoch',
per_device_train_batch_size=32,
num_train_epochs=3,
max_steps=-1
)
trainer = nni.trace(Trainer)(
model=model,
args=training_args,
train_dataset=small_train_dataset,
eval_dataset=small_eval_dataset,
compute_metrics=compute_metrics
)
evaluator = TransformersEvaluator(trainer)
pruner = TaylorFOWeightPruner(model, [{'op_types': ['Linear'], 'sparsity': 0.5}], evaluator, 20)
_, masks = pruner.compress()
pruner.show_pruned_weights()
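
Migration note: a compact sketch of the same Transformers pruning flow under the merged namespace. The ``nni.compression.utils`` path for ``TransformersEvaluator`` appears in the documentation updates later in this diff; the ``nni.compression.pruning`` path, the unchanged ``TaylorFOWeightPruner`` name and the ``sparse_ratio`` key are assumptions about the 3.0 interface.

.. code-block:: python

    import nni
    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)
    from nni.compression.utils import TransformersEvaluator    # path shown in the doc updates below
    from nni.compression.pruning import TaylorFOWeightPruner   # assumed post-merge path / unchanged name

    dataset = load_dataset('yelp_review_full')
    tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
    tokenized = dataset.map(lambda e: tokenizer(e['text'], padding='max_length', truncation=True), batched=True)
    train_ds = tokenized['train'].shuffle(seed=42).select(range(1000))

    model = AutoModelForSequenceClassification.from_pretrained('bert-base-cased', num_labels=5)
    trainer = nni.trace(Trainer)(
        model=model,
        args=TrainingArguments(output_dir='./log', per_device_train_batch_size=32, num_train_epochs=3),
        train_dataset=train_ds,
    )
    evaluator = TransformersEvaluator(trainer)
    # 'sparse_ratio' is assumed to replace the old 'sparsity' key in the merged config format
    pruner = TaylorFOWeightPruner(model, [{'op_types': ['Linear'], 'sparse_ratio': 0.5}], evaluator, training_steps=20)
    _, masks = pruner.compress()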

View file

@ -1,154 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.quantization import BNNQuantizer
class VGG_Cifar10(nn.Module):
def __init__(self, num_classes=1000):
super(VGG_Cifar10, self).__init__()
self.features = nn.Sequential(
nn.Conv2d(3, 128, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(128, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.BatchNorm2d(128, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(128, 256, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(256, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.BatchNorm2d(256, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(256, 512, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(512, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(512, 512, kernel_size=3, padding=1, bias=False),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.BatchNorm2d(512, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True)
)
self.classifier = nn.Sequential(
nn.Linear(512 * 4 * 4, 1024, bias=False),
nn.BatchNorm1d(1024),
nn.Hardtanh(inplace=True),
nn.Linear(1024, 1024, bias=False),
nn.BatchNorm1d(1024),
nn.Hardtanh(inplace=True),
nn.Linear(1024, num_classes), # do not quantize output
nn.BatchNorm1d(num_classes, affine=False)
)
def forward(self, x):
x = self.features(x)
x = x.view(-1, 512 * 4 * 4)
x = self.classifier(x)
return x
def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.cross_entropy(output, target)
loss.backward()
optimizer.step()
for name, param in model.named_parameters():
if name.endswith('old_weight'):
param.data.clamp_(-1, 1)  # clamp in place; reassigning the loop variable had no effect
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
acc = 100 * correct / len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, acc))
return acc
def adjust_learning_rate(optimizer, epoch):
update_list = [55, 100, 150, 200, 400, 600]
if epoch in update_list:
for param_group in optimizer.param_groups:
param_group['lr'] = param_group['lr'] * 0.1
return
def main():
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=200, shuffle=False)
model = VGG_Cifar10(num_classes=10)
model.to(device)
configure_list = [{
'quant_types': ['weight'],
'quant_bits': 1,
'op_types': ['Conv2d', 'Linear'],
'op_names': ['features.3', 'features.7', 'features.10', 'features.14', 'classifier.0', 'classifier.3']
}, {
'quant_types': ['output'],
'quant_bits': 1,
'op_types': ['Hardtanh'],
'op_names': ['features.6', 'features.9', 'features.13', 'features.16', 'features.20', 'classifier.2', 'classifier.5']
}]
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
quantizer = BNNQuantizer(model, configure_list, optimizer)
model = quantizer.compress()
print('=' * 10 + 'train' + '=' * 10)
best_top1 = 0
for epoch in range(400):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
adjust_learning_rate(optimizer, epoch)
top1 = test(model, device, test_loader)
if top1 > best_top1:
best_top1 = top1
print(best_top1)
if __name__ == '__main__':
main()

View file

@ -1,71 +0,0 @@
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.quantization import DoReFaQuantizer
import sys
sys.path.append('../models')
from mnist.naive import NaiveModel
def train(model, quantizer, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
def main():
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True, transform=trans),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=False, transform=trans),
batch_size=1000, shuffle=True)
model = NaiveModel()
model = model.to(device)
configure_list = [{
'quant_types': ['weight'],
'quant_bits': {
'weight': 8,
}, # you can just use `int` here because all `quant_types` share the same bit length.
'op_types':['Conv2d', 'Linear']
}]
quantizer = DoReFaQuantizer(model, configure_list)
quantizer.compress()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.5)
for epoch in range(10):
print('# Epoch {} #'.format(epoch))
train(model, quantizer, device, train_loader, optimizer)
test(model, device, test_loader)
if __name__ == '__main__':
main()

View file

@ -1,142 +0,0 @@
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.quantization import LsqQuantizer
from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT
class Mnist(torch.nn.Module):
def __init__(self):
super().__init__()
self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)
self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)
self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)
self.fc2 = torch.nn.Linear(500, 10)
self.relu1 = torch.nn.ReLU6()
self.relu2 = torch.nn.ReLU6()
self.relu3 = torch.nn.ReLU6()
self.max_pool1 = torch.nn.MaxPool2d(2, 2)
self.max_pool2 = torch.nn.MaxPool2d(2, 2)
def forward(self, x):
x = self.relu1(self.conv1(x))
x = self.max_pool1(x)
x = self.relu2(self.conv2(x))
x = self.max_pool2(x)
x = x.view(-1, 4 * 4 * 50)
x = self.relu3(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim=1)
def train(model, quantizer, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
def test_trt(engine, test_loader):
test_loss = 0
correct = 0
time_elapsed = 0
for data, target in test_loader:
output, time = engine.inference(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
time_elapsed += time
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
print("Inference elapsed_time (whole dataset): {}s".format(time_elapsed))
def main():
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True, transform=trans),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=False, transform=trans),
batch_size=1000, shuffle=True)
model = Mnist()
configure_list = [{
'quant_types': ['weight', 'input'],
'quant_bits': {'weight': 8, 'input': 8},
'op_names': ['conv1']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8, },
'op_names': ['relu1']
}, {
'quant_types': ['weight', 'input'],
'quant_bits': {'weight': 8, 'input': 8},
'op_names': ['conv2']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['relu2']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['max_pool2']
}
]
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
quantizer = LsqQuantizer(model, configure_list, optimizer)
quantizer.compress()
model.to(device)
for epoch in range(40):
print('# Epoch {} #'.format(epoch))
train(model, quantizer, device, train_loader, optimizer)
test(model, device, test_loader)
model_path = "mnist_model.pth"
calibration_path = "mnist_calibration.pth"
calibration_config = quantizer.export_model(model_path, calibration_path)
test(model, device, test_loader)
print("calibration_config: ", calibration_config)
batch_size = 32
input_shape = (batch_size, 1, 28, 28)
engine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=batch_size)
engine.compress()
test_trt(engine, test_loader)
if __name__ == '__main__':
main()

View file

@ -1,115 +0,0 @@
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.quantization import QAT_Quantizer
from nni.compression.pytorch.quantization.settings import set_quant_scheme_dtype
import sys
sys.path.append('../models')
from mnist.naive import NaiveModel
def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
def main():
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True, transform=trans),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=False, transform=trans),
batch_size=1000, shuffle=True)
# Two things should be kept in mind when setting this configure_list:
# 1. When deploying model on backend, some layers will be fused into one layer. For example, the consecutive
# conv + bn + relu layers will be fused into one big layer. If we want to execute the big layer in quantization
# mode, we should tell the backend the quantization information of the input, output, and the weight tensor of
# the big layer, which correspond to conv's input, conv's weight and relu's output.
# 2. Same tensor should be quantized only once. For example, if a tensor is the output of layer A and the input
# of the layer B, you should configure either {'quant_types': ['output'], 'op_names': ['a']} or
# {'quant_types': ['input'], 'op_names': ['b']} in the configure_list.
configure_list = [{
'quant_types': ['weight', 'input'],
'quant_bits': {'weight': 8, 'input': 8},
'op_names': ['conv1', 'conv2']
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8, },
'op_names': ['relu1', 'relu2']
}, {
'quant_types': ['output', 'weight', 'input'],
'quant_bits': {'output': 8, 'weight': 8, 'input': 8},
'op_names': ['fc1', 'fc2'],
}]
# you can also set the quantization dtype and scheme layer-wise through configure_list like:
# configure_list = [{
# 'quant_types': ['weight', 'input'],
# 'quant_bits': {'weight': 8, 'input': 8},
# 'op_names': ['conv1', 'conv2'],
# 'quant_dtype': 'int',
# 'quant_scheme': 'per_channel_symmetric'
# }]
# For now quant_dtype's options are 'int' and 'uint'. And quant_scheme's options are per_tensor_affine,
# per_tensor_symmetric, per_channel_affine and per_channel_symmetric.
set_quant_scheme_dtype('weight', 'per_channel_symmetric', 'int')
set_quant_scheme_dtype('output', 'per_tensor_symmetric', 'int')
set_quant_scheme_dtype('input', 'per_tensor_symmetric', 'int')
model = NaiveModel().to(device)
dummy_input = torch.randn(1, 1, 28, 28).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
# To enable batch normalization folding in the training process, you should
# pass dummy_input to the QAT_Quantizer.
quantizer = QAT_Quantizer(model, configure_list, optimizer, dummy_input=dummy_input)
quantizer.compress()
model.to(device)
for epoch in range(40):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
model_path = "mnist_model.pth"
calibration_path = "mnist_calibration.pth"
onnx_path = "mnist_model.onnx"
input_shape = (1, 1, 28, 28)
device = torch.device("cuda")
calibration_config = quantizer.export_model(model_path, calibration_path, onnx_path, input_shape, device)
print("Generated calibration config is: ", calibration_config)
if __name__ == '__main__':
main()
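
After this merge the quantizer imports move to ``nni.compression.quantization``, as the documentation updates later in this diff show; note that the 3.0 class is spelled ``QATQuantizer`` and, judging from those same doc updates, is driven by an evaluator rather than a wrapped optimizer, so the example above does not translate line by line.

.. code-block:: python

    # Before (as in the deleted example above):
    #   from nni.compression.pytorch.quantization import QAT_Quantizer
    # After this merge (import shown in the doc updates further down in this diff):
    from nni.compression.quantization import QATQuantizer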

View file

@ -152,8 +152,8 @@ def build_finetuning_model(state_dict_path: str, is_quant=False):
import nni
from nni.contrib.compression.quantization import QATQuantizer, LsqQuantizer, PtqQuantizer
from nni.contrib.compression.utils import TransformersEvaluator
from nni.compression.quantization import QATQuantizer, LsqQuantizer, PtqQuantizer
from nni.compression.utils import TransformersEvaluator
def fake_quantize():

View file

@ -1,152 +0,0 @@
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.quantization import QAT_Quantizer
from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT
import sys
sys.path.append('../models')
from mnist.naive import NaiveModel
def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
def test_trt(engine, test_loader):
test_loss = 0
correct = 0
time_elapsed = 0
for data, target in test_loader:
output, time = engine.inference(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
time_elapsed += time
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
print("Inference elapsed_time (whole dataset): {}s".format(time_elapsed))
def post_training_quantization_example(train_loader, test_loader, device):
model = NaiveModel()
config = {
'conv1':{'weight_bits':8, 'output_bits':8},
'conv2':{'weight_bits':32, 'output_bits':32},
'fc1':{'weight_bits':16, 'output_bits':16},
'fc2':{'weight_bits':8, 'output_bits':8}
}
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
model.to(device)
for epoch in range(1):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
batch_size = 32
input_shape = (batch_size, 1, 28, 28)
engine = ModelSpeedupTensorRT(model, input_shape, config=config, calib_data_loader=train_loader, batchsize=batch_size)
engine.compress()
test_trt(engine, test_loader)
def quantization_aware_training_example(train_loader, test_loader, device):
model = NaiveModel()
configure_list = [{
'quant_types': ['input', 'weight'],
'quant_bits': {'input':8, 'weight':8},
'op_names': ['conv1']
}, {
'quant_types': ['output'],
'quant_bits': {'output':8},
'op_names': ['relu1']
}, {
'quant_types': ['input', 'weight'],
'quant_bits': {'input':8, 'weight':8},
'op_names': ['conv2']
}, {
'quant_types': ['output'],
'quant_bits': {'output':8},
'op_names': ['relu2']
}
]
# finetune the model by using QAT
# enable batchnorm folding mode
dummy_input = torch.randn(1, 1, 28, 28)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
quantizer = QAT_Quantizer(model, configure_list, optimizer, dummy_input=dummy_input)
quantizer.compress()
model.to(device)
for epoch in range(1):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
model_path = "mnist_model.pth"
calibration_path = "mnist_calibration.pth"
calibration_config = quantizer.export_model(model_path, calibration_path)
test(model, device, test_loader)
print("calibration_config: ", calibration_config)
batch_size = 32
input_shape = (batch_size, 1, 28, 28)
engine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=batch_size)
engine.compress()
test_trt(engine, test_loader)
def main():
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True, transform=trans),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=False, transform=trans),
batch_size=1000, shuffle=True)
# post-training quantization on TensorRT
post_training_quantization_example(train_loader, test_loader, device)
# combine NNI quantization algorithm QAT with backend framework TensorRT
quantization_aware_training_example(train_loader, test_loader, device)
if __name__ == '__main__':
main()

View file

@ -1,117 +0,0 @@
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.pytorch.quantization import ObserverQuantizer
import sys
sys.path.append('../models')
from mnist.naive import NaiveModel
def train(model, device, train_loader, optimizer):
model.to(device)
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%\n'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
def calibration(model, device, test_loader):
model.eval()
with torch.no_grad():
for data, _ in test_loader:
data = data.to(device)
model(data)
def main():
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True, transform=trans),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=False, transform=trans),
batch_size=1000, shuffle=True)
model = NaiveModel()
configure_list = [{
'quant_types': ['weight', 'input'],
'quant_bits': {'weight': 8, 'input': 8},
'op_names': ['conv1'],
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8, },
'op_names': ['relu1'],
}, {
'quant_types': ['weight', 'input'],
'quant_bits': {'weight': 8, 'input': 8},
'op_names': ['conv2'],
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['relu2'],
}, {
'quant_types': ['output'],
'quant_bits': {'output': 8},
'op_names': ['max_pool2'],
}
]
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
# Train the model to get a baseline performance
for epoch in range(5):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
# Construct the ObserverQuantizer. Note that currently ObserverQuantizer only works
# in evaluation mode.
quantizer = ObserverQuantizer(model.eval(), configure_list, optimizer)
# Use the test data set to do calibration, this will not change the model parameters
calibration(model, device, test_loader)
# obtain the quantization information and switch the model to "accuracy verification" mode
quantizer.compress()
# measure the accuracy of the quantized model.
test(model, device, test_loader)
model_path = "mnist_model.pth"
calibration_path = "mnist_calibration.pth"
calibration_config = quantizer.export_model(model_path, calibration_path)
print("calibration_config: ", calibration_config)
# For now the quantization settings of ObserverQuantizer does not match the TensorRT,
# so TensorRT conversion are not supported
# current settings:
# weight : per_tensor_symmetric, qint8
# activation : per_tensor_affine, quint8, reduce_range=True
if __name__ == '__main__':
main()

View file

@ -12,8 +12,8 @@ from torch import Tensor
from torchvision import datasets, transforms
from deepspeed import DeepSpeedEngine
from nni.contrib.compression.quantization import LsqQuantizer
from nni.contrib.compression.utils import DeepspeedTorchEvaluator
from nni.compression.quantization import LsqQuantizer
from nni.compression.utils import DeepspeedTorchEvaluator
from nni.common.types import SCHEDULER

View file

@ -199,8 +199,8 @@ if not skip_exec:
# The following code creates distillers for distillation.
from nni.contrib.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller
from nni.contrib.compression.utils import TransformersEvaluator
from nni.compression.distillation import DynamicLayerwiseDistiller, Adaptive1dLayerwiseDistiller
from nni.compression.utils import TransformersEvaluator
# %%
# Dynamic distillation is suitable for situations where the distillation state dimensions of the student and the teacher match.
@ -312,9 +312,9 @@ def adapt_distillation(student_model: BertForSequenceClassification, teacher_mod
# You could refer to the experiment results to choose an appropriate ``regular_scale``.
from nni.contrib.compression.pruning import MovementPruner
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.pytorch.speedup.v2.external_replacer import TransformersAttentionReplacer
from nni.compression.pruning import MovementPruner
from nni.compression.speedup import ModelSpeedup
from nni.compression.utils.external.external_replacer import TransformersAttentionReplacer
def pruning_attn():
@ -378,7 +378,7 @@ if not skip_exec:
# so we use ``AGPPruner`` to schedule the sparse ratio to achieve better pruning performance.
from nni.contrib.compression.pruning import TaylorPruner, AGPPruner
from nni.compression.pruning import TaylorPruner, AGPPruner
from transformers.models.bert.modeling_bert import BertLayer
@ -444,7 +444,7 @@ if not skip_exec:
# The output masks can be generated and applied after registering the setting template for them.
from nni.contrib.compression.base.setting import PruningSetting
from nni.compression.base.setting import PruningSetting
output_align_setting = {
'_output_': {

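Taken together, the hunks in this tutorial converge on one migration pattern: prune with an evaluator-driven pruner from ``nni.compression.pruning``, then materialize the masks with ``nni.compression.speedup.ModelSpeedup``, passing a replacer when Hugging Face attention modules need to be rebuilt. The sketch below only composes the imports shown above and is not the tutorial's exact code; the ``MovementPruner`` arguments, the config keys, the ``compress`` call, and the ``customized_replacers`` keyword are all assumptions.

.. code-block:: python

    # Sketch only: assumed composition of the new import paths for attention pruning.
    from transformers import Trainer
    from nni.compression.pruning import MovementPruner
    from nni.compression.speedup import ModelSpeedup
    from nni.compression.utils import TransformersEvaluator
    from nni.compression.utils.external.external_replacer import TransformersAttentionReplacer

    def prune_attention(model, trainer: Trainer, dummy_input):
        # Assumed new-style config; the real tutorial uses head-level granularity
        # and a movement-pruning threshold instead of a plain sparse ratio.
        config_list = [{
            'op_types': ['Linear'],
            'sparse_ratio': 0.5,
        }]
        evaluator = TransformersEvaluator(trainer)
        pruner = MovementPruner(model, config_list, evaluator,
                                warmup_step=1000, cooldown_begin_step=9000)  # assumed schedule args
        _, attention_masks = pruner.compress(max_steps=None, max_epochs=1)   # assumed signature
        pruner.unwrap_model()

        # Structurally rebuild the pruned attention layers instead of only masking them.
        replacer = TransformersAttentionReplacer(model)  # assumption: the replacer takes the HF model
        ModelSpeedup(model, dummy_input, attention_masks,
                     customized_replacers=[replacer]).speedup_model()
        return model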
View file

@ -65,7 +65,7 @@ config_list = [{
# %%
# Pruners usually require `model` and `config_list` as input arguments.
from nni.contrib.compression.pruning import L1NormPruner
from nni.compression.pruning import L1NormPruner
pruner = L1NormPruner(model, config_list)
# Show the wrapped model structure; `PrunerModuleWrapper` has wrapped the layers configured in the config_list.
@ -88,7 +88,7 @@ for name, mask in masks.items():
pruner.unwrap_model()
# Speed up the model. For more information about speedup, please refer to :doc:`pruning_speedup`.
from nni.compression.pytorch.speedup.v2 import ModelSpeedup
from nni.compression.speedup import ModelSpeedup
ModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()
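
For reference, here is how the migrated quick-start reads end to end once only the two import lines above have changed; the config values are illustrative (``sparse_ratio`` and ``exclude_op_names`` follow the new config schema), and ``model``/``device`` are assumed to be the ones defined earlier in the tutorial.

.. code-block:: python

    import torch
    from nni.compression.pruning import L1NormPruner
    from nni.compression.speedup import ModelSpeedup

    # Illustrative config: prune half of each Linear/Conv2d layer, keeping the final classifier dense.
    config_list = [{
        'op_types': ['Linear', 'Conv2d'],
        'exclude_op_names': ['fc3'],
        'sparse_ratio': 0.5,
    }]

    pruner = L1NormPruner(model, config_list)
    # compress() wraps the configured modules and computes their masks.
    _, masks = pruner.compress()
    for name, mask in masks.items():
        print(name, list(mask.keys()))

    # Remove the wrappers before speedup so the pruned graph can be traced cleanly.
    pruner.unwrap_model()
    ModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()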

Some files were not shown because too many files changed in this diff.