Yuge Zhang 2022-04-14 14:55:56 +08:00 committed by GitHub
Parent 4446280d35
Commit e0ce406f88
No key matching this signature was found
GPG key ID: 4AEE18F83AFDEB23
20 changed files: 64 additions and 45 deletions

View file

@@ -160,6 +160,8 @@ If the output shape of the pruned conv layer is not divisible by 1024(for exampl
not_safe = not_safe_to_prune(model, dummy_input)
.. _flops-counter:
Model FLOPs/Parameters Counter
------------------------------
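As a quick illustration of the counter described here, the following is a minimal sketch; it assumes the ``count_flops_params`` utility importable from ``nni.compression.pytorch.utils.counter`` and uses a torchvision ResNet-18 purely as a stand-in model.

.. code-block:: python

   import torch
   from torchvision.models import resnet18
   from nni.compression.pytorch.utils.counter import count_flops_params

   model = resnet18()                           # stand-in model for illustration
   dummy_input = torch.randn(1, 3, 224, 224)

   # Returns total FLOPs, total parameter count, and per-module results.
   flops, params, results = count_flops_params(model, dummy_input)
   print(f'FLOPs: {flops}, Params: {params}')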

View file

@@ -4,7 +4,7 @@ Pruner in NNI
NNI implements the main part of each pruning algorithm as a pruner. All pruners are implemented as closely as possible to what is described in the paper (if one exists).
The following table provides a brief introduction to the pruners implemented in NNI; click the links in the table to view a more detailed introduction and use cases.
There are two kinds of pruners in NNI, please refer to `basic pruner <basic-pruner>`_ and `scheduled pruner <scheduled-pruner>`_ for details.
There are two kinds of pruners in NNI, please refer to :ref:`basic pruner <basic-pruner>` and :ref:`scheduled pruner <scheduled-pruner>` for details.
.. list-table::
:header-rows: 1

View file

@@ -95,6 +95,16 @@ autodoc_inherit_docstrings = False
# Sphinx will warn about all references where the target cannot be found.
nitpicky = False # disabled for now
# A list of regular expressions that match URIs that should not be checked.
linkcheck_ignore = [
r'http://localhost:\d+',
r'.*://.*/#/', # Modern websites that have URLs like xxx.com/#/guide
r'https://github.com/JSong-Jia/Pic/', # Community links can't be found any more
]
# Ignore all links located in release.rst
linkcheck_exclude_documents = ['^release']
# Bibliography files
bibtex_bibfiles = ['refs.bib']

View file

@@ -1,7 +1,7 @@
Examples
========
More examples can be found in our :githublink:`GitHub repository <nni/examples>`.
More examples can be found in our :githublink:`GitHub repository <examples>`.
.. cardlinkitem::
:header: HPO Quickstart with PyTorch

View file

@@ -4,6 +4,8 @@ AdaptDL Training Service
Now NNI supports running experiments on `AdaptDL <https://github.com/petuum/adaptdl>`__, which is a resource-adaptive deep learning training and scheduling framework. With the AdaptDL training service, your trial program will run as an AdaptDL job in a Kubernetes cluster.
AdaptDL aims to make distributed deep learning easy and efficient in dynamic-resource environments such as shared clusters and the cloud.
.. note:: AdaptDL doesn't support :ref:`reuse mode <training-service-reuse>`.
Prerequisite
------------

View file

@@ -65,7 +65,7 @@ If the k8s cluster enforces Authorization, you also need to create a ServiceAcco
Design
------
Please refer the design of `Kubeflow training service <KubeflowMode.rst>`__,
Please refer to the design of the :doc:`Kubeflow training service <kubeflow>`;
the FrameworkController training service pipeline is similar.
Example
@@ -113,7 +113,7 @@ If you use Azure Kubernetes Service, you should set storage config as follows:
experiment.config.training_service.storage.key_vault_name = 'your_vault_name'
experiment.config.training_service.storage.key_vault_key = 'your_secret_name'
If you set `ServiceAccount <https://github.com/microsoft/frameworkcontroller/tree/master/example/run#prerequisite>`__ in your k8s,
If you set `ServiceAccount <https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/>`__ in your k8s,
please set ``serviceAccountName`` in your config:
.. code-block:: python

View file

@@ -8,6 +8,8 @@ Prerequisite
NNI supports :doc:`./local`, :doc:`./remote`, :doc:`./openpai`, :doc:`./aml`, :doc:`./kubeflow`, and :doc:`./frameworkcontroller` for hybrid training service. Before starting an experiment using hybrid training service, users should first set up their chosen (sub) training services (e.g., remote training service) according to each training service's own documentation page.
.. note:: Reuse mode is disabled by default for the local training service, but if you use the local training service in hybrid mode, :ref:`reuse mode <training-service-reuse>` is enabled by default.
Usage
-----
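A minimal sketch of the usage, assuming the Python ``Experiment`` API accepts a list of platform names to form a hybrid training service (that list form, and the placeholder trial settings, are assumptions here):

.. code-block:: python

   from nni.experiment import Experiment

   # Passing several platform names is assumed here to request hybrid mode;
   # each sub training service still needs its own setup as described above
   # (e.g. the remote part additionally needs machine_list, omitted here).
   experiment = Experiment(['local', 'remote'])
   experiment.config.trial_command = 'python train.py'   # placeholder trial script
   experiment.config.trial_code_directory = '.'
   experiment.config.trial_concurrency = 2
   experiment.run(8080)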

View file

@@ -3,6 +3,8 @@ Local Training Service
With the local training service, the whole experiment (e.g., tuning algorithms, trials) runs on a single machine, i.e., the user's dev machine. The generated trials run on this machine following the ``trialConcurrency`` set in the configuration YAML file. If GPUs are used by trials, the local training service will allocate the required number of GPUs to each trial, like a resource scheduler.
.. note:: Currently, :ref:`reuse mode <training-service-reuse>` remains disabled by default in the local training service.
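For example, a minimal local experiment can be launched from Python as sketched below; ``train.py`` and the search space are placeholders, and the trial script is assumed to report results via ``nni.report_final_result``:

.. code-block:: python

   from nni.experiment import Experiment

   experiment = Experiment('local')
   experiment.config.trial_command = 'python train.py'    # placeholder trial script
   experiment.config.trial_code_directory = '.'
   experiment.config.trial_concurrency = 2                 # run two trials at a time
   experiment.config.search_space = {
       'lr': {'_type': 'loguniform', '_value': [1e-4, 1e-1]},
   }
   experiment.config.tuner.name = 'TPE'
   experiment.config.tuner.class_args = {'optimize_mode': 'maximize'}
   experiment.config.max_trial_number = 10
   experiment.run(8080)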
Prerequisite
------------

View file

@@ -100,7 +100,7 @@ Compared with :doc:`local` and :doc:`remote`, OpenPAI training service supports
* - trialMemorySize
- Optional field. Should be in a format like ``2gb``, based on your trial program's memory requirement. If it's not set in the trial configuration, it should be set in the config specified in the ``openpaiConfig`` or ``openpaiConfigFile`` field.
* - dockerImage
- Optional field. In OpenPAI training service, your trial program will be scheduled by OpenPAI to run in `Docker container <https://www.docker.com/>`__. This key is used to specify the Docker image used to create the container in which your trial will run. Upon every NNI release, we build `a docker image <https://hub.docker.com/r/msranni/nni>`__ with :githublink:`this Dockerfile <https://hub.docker.com/r/msranni/nni>`. You can either use this image directly in your config file, or build your own image. If it's not set in trial configuration, it should be set in the config specified in ``openpaiConfig`` or ``openpaiConfigFile`` field.
- Optional field. In OpenPAI training service, your trial program will be scheduled by OpenPAI to run in `Docker container <https://www.docker.com/>`__. This key is used to specify the Docker image used to create the container in which your trial will run. Upon every NNI release, we build `a docker image <https://hub.docker.com/r/msranni/nni>`__ with `this Dockerfile <https://hub.docker.com/r/msranni/nni>`__. You can either use this image directly in your config file, or build your own image. If it's not set in trial configuration, it should be set in the config specified in ``openpaiConfig`` or ``openpaiConfigFile`` field.
* - virtualCluster
- Optional field. Set the virtualCluster of OpenPAI. If omitted, the job will run on the ``default`` virtual cluster.
* - localStorageMountPoint

View file

@@ -9,21 +9,30 @@ NNI has supported many training services listed below. Users can go through each
* - Training Service
- Description
* - Local
* - :doc:`Local <local>`
- The whole experiment runs on your dev machine (i.e., a single local machine)
* - Remote
* - :doc:`Remote <remote>`
- The trials are dispatched to your configured SSH servers
* - OpenPAI
* - :doc:`OpenPAI <openpai>`
- Running trials on OpenPAI, a DNN model training platform based on Kubernetes
* - Kubeflow
* - :doc:`Kubeflow <kubeflow>`
- Running trials with Kubeflow, a DNN model training framework based on Kubernetes
* - AdaptDL
* - :doc:`AdaptDL <adaptdl>`
- Running trials on AdaptDL, an elastic DNN model training platform
* - FrameworkController
* - :doc:`FrameworkController <frameworkcontroller>`
- Running trials with FrameworkController, a DNN model training framework on Kubernetes
* - AML
* - :doc:`AML <aml>`
- Running trials on Azure Machine Learning (AML) cloud service
* - PAI-DLC
* - :doc:`PAI-DLC <paidlc>`
- Running trials on PAI-DLC, a deep learning containers service based on Alibaba ACK
* - Hybrid
- Support jointly using multiple above training services
* - :doc:`Hybrid <hybrid>`
- Supports jointly using multiple of the above training services
.. _training-service-reuse:
Training Service Under Reuse Mode
---------------------------------
Since NNI v2.0, there are two sets of training service implementations in NNI. The new one is called *reuse mode*. When reuse mode is enabled, a cluster, such as a remote machine or a compute instance on AML, will launch a long-running environment, so that NNI can submit trials to these environments iteratively, which saves the time of creating new jobs. For instance, using the OpenPAI training platform under reuse mode can avoid the overhead of pulling docker images, creating containers, and downloading data repeatedly.
.. note:: In reuse mode, users need to make sure each trial can run independently in the same job (e.g., avoid loading checkpoints from previous trials).
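Where reuse mode is configurable, it is exposed through the ``reuseMode`` field of the training service configuration. A minimal sketch, assuming the Python config API exposes this field as ``reuse_mode`` (an assumption here):

.. code-block:: python

   from nni.experiment import Experiment

   experiment = Experiment('remote')
   # reuseMode defaults to True for remote/OpenPAI; set it explicitly to opt out.
   experiment.config.training_service.reuse_mode = True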

View file

@@ -60,7 +60,7 @@ Use ``examples/trials/mnist-pytorch`` as an example. The NNI config YAML file's
Note: You should set ``platform: dlc`` in the NNI config YAML file if you want to start an experiment in dlc mode.
Compared with `LocalMode <LocalMode.rst>`__ training service configuration in dlc mode have these additional keys like ``type/image/jobType/podCount/ecsSpec/region/nasDataSourceId/accessKeyId/accessKeySecret``, for detailed explanation ref to this `link <https://help.aliyun.com/document_detail/203111.html#h2-url-3>`__.
Compared with :doc:`local`, the training service configuration in dlc mode has additional keys such as ``type/image/jobType/podCount/ecsSpec/region/nasDataSourceId/accessKeyId/accessKeySecret``; for a detailed explanation, refer to this `link <https://help.aliyun.com/document_detail/203111.html#h2-url-3>`__.
Also, as dlc mode requires DSW/DLC to mount the same NAS disk to share information, there are two extra keys related to this: ``localStorageMountPoint`` and ``containerStorageMountPoint``.

View file

@@ -13,7 +13,7 @@ Prerequisite
2. Make sure remote machines can be accessed through SSH from the machine which runs the ``nnictl`` command. Both password and key authentication of SSH are supported. For advanced usage, please refer to :ref:`reference-remote-config-label` in the reference for detailed usage.
3. Make sure the NNI version on each machine is consistent. Follow the install guide `here <../Tutorial/QuickStart.rst>`__ to install NNI.
3. Make sure the NNI version on each machine is consistent. Follow the install guide :doc:`here </installation>` to install NNI.
4. Make sure the trial command is compatible with the remote OSes if you want to use remote Linux and Windows machines together. For example, the default Python 3.x executable is called ``python3`` on Linux and ``python`` on Windows.
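Once the prerequisites above are met, pointing an experiment at a remote machine can be sketched as below; the host, user, and key path are placeholders, and the import location of ``RemoteMachineConfig`` (``nni.experiment.config``) is an assumption:

.. code-block:: python

   from nni.experiment import Experiment
   from nni.experiment.config import RemoteMachineConfig

   experiment = Experiment('remote')
   experiment.config.trial_command = 'python3 train.py'   # must be valid on the remote OS
   experiment.config.training_service.machine_list = [
       RemoteMachineConfig(host='192.0.2.10', user='nni', ssh_key_file='~/.ssh/id_rsa'),
   ]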

View file

@@ -2,7 +2,7 @@ HPO Benchmarks
==============
We provide a benchmarking tool to compare the performance of tuners provided by NNI (and users' custom tuners) on different
types of tasks. This tool uses the `automlbenchmark repository <https://github.com/openml/automlbenchmark)>`_ to run different *benchmarks* on the NNI *tuners*.
types of tasks. This tool uses the `automlbenchmark repository <https://github.com/openml/automlbenchmark>`_ to run different *benchmarks* on the NNI *tuners*.
The tool is located in ``examples/trials/benchmarking/automlbenchmark``. This document provides a brief introduction to the tool, its usage, and currently available benchmarks.
Overview and Terminologies

View file

@@ -352,7 +352,7 @@ Detailed usage can be found :doc:`/experiment/training_service/remote`.
* - reuseMode
- ``bool``, optional
- Default: ``True``. Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__.
- Default: ``True``. Enable :ref:`reuse mode <training-service-reuse>`.
RemoteMachineConfig
"""""""""""""""""""
@@ -490,7 +490,7 @@ Detailed usage can be found :doc:`here </experiment/training_service/openpai>`.
* - reuseMode
- ``bool``, optional
- Default: ``True``. Enable `reuse mode <../TrainingService/Overview.rst#training-service-under-reuse-mode>`__.
- Default: ``True``. Enable :ref:`reuse mode <training-service-reuse>`.
* - openpaiConfig
- ``JSON``, optional

View file

@@ -15,7 +15,7 @@ Instructions
#. Run ``git clone https://github.com/ultmaster/EfficientNet-PyTorch`` to clone the `ultmaster modified version <https://github.com/ultmaster/EfficientNet-PyTorch>`__ of the original `EfficientNet-PyTorch <https://github.com/lukemelas/EfficientNet-PyTorch>`__. The modifications adhere to the original `Tensorflow version <https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet>`__ as closely as possible (including EMA, label smoothing, etc.); also added is the part that gets parameters from the tuner and reports intermediate/final results. Clone it into ``EfficientNet-PyTorch``; files like ``main.py`` and ``train_imagenet.sh`` will appear inside, as specified in the configuration files.
#. Run ``nnictl create --config config_local.yml`` (use ``config_pai.yml`` for OpenPAI) to find the best EfficientNet-B1. Adjust the training service (PAI/local/remote) and batch size in the config files according to the environment.
For training on ImageNet, read ``EfficientNet-PyTorch/train_imagenet.sh``. Download ImageNet beforehand and extract it adhering to `PyTorch format <https://pytorch.org/docs/stable/torchvision/datasets.html#imagenet>`__ and then replace ``/mnt/data/imagenet`` in with the location of the ImageNet storage. This file should also be a good example to follow for mounting ImageNet into the container on OpenPAI.
For training on ImageNet, read ``EfficientNet-PyTorch/train_imagenet.sh``. Download ImageNet beforehand and extract it adhering to the `PyTorch format <https://pytorch.org/vision/stable/generated/torchvision.datasets.ImageNet.html>`__, then replace ``/mnt/data/imagenet`` with the location of the ImageNet storage. This file should also be a good example to follow for mounting ImageNet into the container on OpenPAI.
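For reference, "PyTorch format" here means the usual train/val folder layout that torchvision can load directly; a minimal sketch (the path is the same placeholder used in ``train_imagenet.sh``):

.. code-block:: python

   from torchvision import datasets, transforms

   # Expects /mnt/data/imagenet/train/<class_name>/<image>.JPEG after extraction.
   train_set = datasets.ImageFolder(
       '/mnt/data/imagenet/train',
       transform=transforms.Compose([
           transforms.RandomResizedCrop(224),
           transforms.ToTensor(),
       ]),
   )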
Results
-------

View file

@@ -5,17 +5,7 @@ Hyper Parameter Optimization Comparison
Comparison of Hyperparameter Optimization (HPO) algorithms on several problems.
Hyperparameter Optimization algorithms are list below:
* :doc:`Random Search </hpo/tuners>`
* :doc:`Grid Search </hpo/tuners>`
* :doc:`Evolution </hpo/tuners>`
* :doc:`Anneal </hpo/tuners>`
* :doc:`Metis </hpo/tuners>`
* :doc:`TPE </hpo/tuners>`
* :doc:`SMAC </hpo/tuners>`
* :doc:`HyperBand </hpo/tuners>`
* :doc:`BOHB </hpo/tuners>`
Hyperparameter Optimization algorithms are listed in :doc:`/hpo/tuners`.
All algorithms run in the NNI local environment.
@@ -38,7 +28,7 @@ AutoGBDT Example
Problem Description
^^^^^^^^^^^^^^^^^^^
Nonconvex problem on the hyper-parameter search of AutoGBDT example.
A nonconvex problem on the hyper-parameter search of the :githublink:`AutoGBDT example <examples/trials/auto-gbdt>`.
Search Space
^^^^^^^^^^^^

View file

@@ -35,7 +35,7 @@ The experiments are performed with the following pruners/datasets/models:
For the pruners with scheduling, ``L1Filter Pruner`` is used as the base algorithm. That is to say, after the sparsity distribution is decided by the scheduling algorithm, ``L1Filter Pruner`` is used to perform the real pruning.
*
All the pruners listed above are implemented in :githublink:`nni <docs/en_US/Compression/Overview.rst>`.
All the pruners listed above are implemented in :doc:`nni </compression/overview>`.
Experiment Result
-----------------
@@ -88,15 +88,12 @@ Implementation Details
^^^^^^^^^^^^^^^^^^^^^^
*
The experiment results are all collected with the default configuration of the pruners in nni, which means that when we call a pruner class in nni, we don't change any default class arguments.
* The experiment results are all collected with the default configuration of the pruners in nni, which means that when we call a pruner class in nni, we don't change any default class arguments.
*
Both FLOPs and the number of parameters are counted with :githublink:`Model FLOPs/Parameters Counter <docs/en_US/Compression/CompressionUtils.md#model-flopsparameters-counter>` after :githublink:`model speedup <docs/en_US/Compression/ModelSpeedup.rst>`.
* Both FLOPs and the number of parameters are counted with the :ref:`Model FLOPs/Parameters Counter <flops-counter>` after :doc:`model speedup </tutorials/pruning_speedup>`; a short sketch of this order follows the list.
This avoids potential issues of counting them on masked models.
*
The experiment code can be found :githublink:`here <examples/model_compress/pruning/legacy/auto_pruners_torch.py>`.
* The experiment code can be found :githublink:`here <examples/model_compress/pruning/legacy/auto_pruners_torch.py>`.
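The sketch below illustrates that measurement order (prune, apply speedup, then count); it assumes the NNI 2.x ``ModelSpeedup`` and ``count_flops_params`` APIs, a pre-computed mask file ``mask.pth``, and a torchvision ResNet-18 standing in for the pruned model:

.. code-block:: python

   import torch
   from torchvision.models import resnet18
   from nni.compression.pytorch import ModelSpeedup
   from nni.compression.pytorch.utils.counter import count_flops_params

   model = resnet18()                           # stand-in for the pruned model
   dummy_input = torch.randn(1, 3, 224, 224)

   # Physically remove the masked weights before measuring anything.
   ModelSpeedup(model, dummy_input, 'mask.pth').speedup_model()
   # Count FLOPs/parameters on the compacted model, not on the masked one.
   flops, params, _ = count_flops_params(model, dummy_input)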
Experiment Result Rendering
^^^^^^^^^^^^^^^^^^^^^^^^^^^

View file

@@ -41,7 +41,7 @@ How to Open NNI's Web UI on Google Colab
! curl -s http://localhost:4040/api/tunnels # don't change the port number 4040
You will see an url like http://xxxx.ngrok.io after step 4, open this url and you will find NNI's Web UI. Have fun :)
You will see a URL like ``http://xxxx.ngrok.io`` after step 4; open this URL and you will find NNI's Web UI. Have fun :)
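If you prefer to extract the public URL programmatically rather than reading the raw JSON, a small sketch using ngrok's standard local API (the same endpoint as the ``curl`` command above) looks like this:

.. code-block:: python

   import json
   import urllib.request

   # ngrok's local inspection API lists active tunnels and their public URLs.
   with urllib.request.urlopen('http://localhost:4040/api/tunnels') as resp:
       tunnels = json.load(resp)['tunnels']
   print(tunnels[0]['public_url'])   # e.g. http://xxxx.ngrok.io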
Access Web UI with frp
----------------------

View file

@@ -8,7 +8,7 @@ Overview
The performance of RocksDB is highly contingent on its tuning. However, because of the complexity of its underlying technology and the large number of configurable parameters, a good configuration is sometimes hard to obtain. NNI can help to address this issue. NNI supports many kinds of tuning algorithms to search for the best configuration of RocksDB, and supports many kinds of environments such as local machines, remote servers, and the cloud.
This example illustrates how to use NNI to search the best configuration of RocksDB for a ``fillrandom`` benchmark supported by a benchmark tool ``db_bench``\ , which is an official benchmark tool provided by RocksDB itself. Therefore, before running this example, please make sure NNI is installed and `db_bench <https://github.com/facebook/rocksdb/wiki/Benchmarking-tools>`__ is in your ``PATH``. Please refer to `here <../Tutorial/QuickStart.rst>`__ for detailed information about installation and preparing of NNI environment, and `here <https://github.com/facebook/rocksdb/blob/master/INSTALL.md>`__ for compiling RocksDB as well as ``db_bench``.
This example illustrates how to use NNI to search for the best configuration of RocksDB for a ``fillrandom`` benchmark supported by the benchmark tool ``db_bench``, which is an official benchmark tool provided by RocksDB itself. Therefore, before running this example, please make sure NNI is installed and `db_bench <https://github.com/facebook/rocksdb/wiki/Benchmarking-tools>`__ is in your ``PATH``. Please refer to :doc:`here </installation>` for detailed information about installation and preparation of the NNI environment, and `here <https://github.com/facebook/rocksdb/blob/master/INSTALL.md>`__ for compiling RocksDB as well as ``db_bench``.
We also provide a simple script :githublink:`db_bench_installation.sh <examples/trials/systems_auto_tuning/rocksdb-fillrandom/db_bench_installation.sh>` that helps compile and install ``db_bench`` as well as its dependencies on Ubuntu. Installing RocksDB on other systems can follow the same procedure.
@@ -24,7 +24,7 @@ Search Space
For simplicity, this example tunes three parameters, ``write_buffer_size``, ``min_write_buffer_num`` and ``level0_file_num_compaction_trigger``, for randomly writing 16M keys with a key size of 20 bytes and a value size of 100 bytes, based on write operations per second (OPS). ``write_buffer_size`` sets the size of a single memtable. Once a memtable exceeds this size, it is marked immutable and a new one is created. ``min_write_buffer_num`` is the minimum number of memtables to be merged before flushing to storage. Once the number of files in level 0 reaches ``level0_file_num_compaction_trigger``, level 0 to level 1 compaction is triggered.
In this example, the search space is specified by a ``search_space.json`` file as shown below. Detailed explanation of search space could be found `here <../Tutorial/SearchSpaceSpec.rst>`__.
In this example, the search space is specified by a ``search_space.json`` file as shown below. A detailed explanation of the search space can be found :doc:`here </hpo/search_space>`.
.. code-block:: json
@@ -58,7 +58,7 @@ Benchmark code should receive a configuration from NNI manager, and report the c
Config file
^^^^^^^^^^^
One could start a NNI experiment with a config file. A config file for NNI is a ``yaml`` file usually including experiment settings (\ ``trialConcurrency``\ , ``trialGpuNumber``\ , etc.), platform settings (\ ``trainingService``\ ), path settings (\ ``searchSpaceFile``\ , ``trialCodeDirectory``\ , etc.) and tuner settings (\ ``tuner``\ , ``tuner optimize_mode``\ , etc.). Please refer to `here <../Tutorial/QuickStart.rst>`__ for more information.
One could start an NNI experiment with a config file. A config file for NNI is a YAML file that usually includes experiment settings (``trialConcurrency``, ``trialGpuNumber``, etc.), platform settings (``trainingService``), path settings (``searchSpaceFile``, ``trialCodeDirectory``, etc.) and tuner settings (``tuner``, ``tuner optimize_mode``, etc.). Please refer to :doc:`/reference/experiment_config`.
Here is an example of tuning RocksDB with SMAC algorithm:
@@ -68,7 +68,7 @@ Here is an example of tuning RocksDB with TPE algorithm:
:githublink:`code directory <examples/trials/systems_auto_tuning/rocksdb-fillrandom/config_tpe.yml>`
Other tuners can be easily adopted in the same way. Please refer to `here <../Tuner/BuiltinTuner.rst>`__ for more information.
Other tuners can be easily adopted in the same way. Please refer to :doc:`here </hpo/tuners>` for more information.
Finally, we could enter the example folder and start the experiment using the following commands:

View file

@@ -42,6 +42,11 @@ stages:
python tools/chineselink.py check
displayName: Translation up-to-date
- script: |
cd docs
make -e SPHINXOPTS="-W -T -b linkcheck -q --keep-going" html
displayName: External links integrity check
- job: python
pool:
vmImage: ubuntu-latest