doc refactor to master (#648)
* initial commit for document refactor (#533) * enable sphinx * fix docs/readme link * update conf * use mkdocs to build homepage * Update mkdocs.yml use docs/README as main * delete sphinx file * add nav to mkdocs config * fix mkdocs bug * add requirement for online build * clean requirements.txt * add contribution * change requirements file location * delete sphinx from gitignore * mnist examples doc (#566) * update * update * update * update * update * Docs refactor of Tutorial (QuickStart, Tuners, Assessors) (#554) * Refactor of QuickStart * fix some typos * Make new changes based on suggestions * update successful INFO * update Tuners * update Tuners: overview * Update Tuners.md * Add Assessor.md * update * update * mkdocs.yml * Update QuickStart.md * update * update * update * update * update * add diff * modified QuickStart.md and add mnist-without-nni example * update * small change * update * update * update * update * update and refactor the mnist.py * update * update working process * refactor the mnist.py * update * update * update mkdocs.md * add metis tuner * update QuickStart webUI part * update capture * update capture * update description * update picture for assessor * update * modified mkdocs.yml * update Tuners.md * test format * update Tuner.md * update Tuner.md * change display format * fix typo * update * fix typo * fix typo and rename customize_Advisor * fix typos and modify * Dev doc: Add docs for Trials, SearchSpace, Annotation and GridSearch (#569) * add Trials.md * add Trials.md * add Trials.md * add Trials.md * add docs * add docs * add docs * add docs * add docs * docs modification * docs modification * docs modification * docs modification * docs modification * docs modification * docs modification * docs modification * docs modification * docs modification * update Trial.md * update Trial.md * update Trial.md * update Trial.md * update Trial.md * add grid search tuner doc * add grid search tuner doc * Check dev doc (#606) * multiPhase doc * updates * updates * updates * updates * updates * updates * update dev-doc to sphinx (#630) * add trigger (#544) * NNI logging architecture improvement (#539) * Removed unused log code, refactor to rename some class names in nni sdk and trial_tools * Fix the regression bug that local/remote mode doesn't work * [WebUI] Fix issue#517 & issue#459 (#524) * update * [Logging architecture refactor] Remove unused metrics related code in nni trial_tools, support kubeflow mode for logging architecture refactor (#551) * Doc typo and format fixes (#560) * fix incorrect document * fix doc format and typo * fix state transition (#504) * Add enas nni version from contributor (#557) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * update readme * update * fix path * update reference * fix bug in config file * update nni_arch_overview.png * update * update * update * add enas_nni * Code coverage report (#559) * Add UT code coverage report * updates * updates * updates * updates * updates * updates * integration test python code coverage report * Updating Readme to add the Related Projects like PAI, KubeLauncher and MMdnn (#565) * Adding related projects to Readme * Fix remote TrainingService bug, change forEach to "for of" (#564): trial jobs could not be stopped on the remote machine when the experiment was stopped, because
await/async does not work normally inside forEach; see https://codeburst.io/javascript-async-await-with-foreach-b6ba62bbf404. * To install the whole nni in a virtual environment (#538) * support venv * adapt venv * adapt venv * adapt venv * adapt venv * new test * new test * new test * support venv * support venv * support venv * support venv * support venv * support venv * support venv * colorful output for mac * colorful output for mac * permission denied in /tmp * permission denied in /tmp * permission denied in /tmp * remove unused variable * final * remove build python * Make it configurable whether annotation adds the extra line "nni.get_next_parameter()" or not (#526) * fix bug * add docs * add ut * add ut * add to ci * update doc * update doc * update ut * add ut to ci * add ut to ci * add ut to ci * add ut to ci * add ut to ci * add ut to ci * add ut to ci * add ut to ci * test * test * test * test * test * test * test * test * test * test * revert * refactor * refactor * s * merge * fix annotation for extra line * add deprecation warning * fix permission denied (#567) * Add Metis Tuner (#534) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * update readme * update * fix path * update reference * fix bug in config file * update nni_arch_overview.png * update * update * update * add metis tuner code * 1. fix bug about import; 2. update other sdk files * add auto-gbdt-example and remove unused code * add metis_tuner into README * update the README * update README | remove unused variable * fix typo * add sklearn into requirements * Update src/sdk/pynni/nni/metis_tuner/metis_tuner.py: add default value in __init__ Co-Authored-By: xuehui1991 <xuehui@microsoft.com> * Update docs/HowToChooseTuner.md Co-Authored-By: xuehui1991 <xuehui@microsoft.com> * Update docs/HowToChooseTuner.md Co-Authored-By: xuehui1991 <xuehui@microsoft.com> * fix typo | add more comments * Change WARNING to INFO (#574): change the warning level to info when expanding a relative path; add nnictl --version log; update readme.md * Fix some bugs in doc and log (#561) * The learning rate focuses more on validation set accuracy than training set accuracy.
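The QuickStart, mnist-example, and annotation entries above all revolve around one small trial-side API: a trial asks the tuner for a hyperparameter set, reports intermediate metrics, and reports one final metric; annotation can inject the `nni.get_next_parameter()` line automatically. A minimal sketch of such a trial, assuming the NNI Python SDK of this era (`train_one_epoch` and the parameter name `learning_rate` are illustrative stand-ins, not names from these PRs):

```python
import random

import nni


def train_one_epoch(lr):
    # Illustrative stand-in for a real training epoch; returns a fake accuracy.
    return min(1.0, random.random() * 0.2 + lr)


def main():
    # Ask the tuner for the next hyperparameter set; with annotation enabled,
    # this is the line that can be added automatically instead (#526).
    params = nni.get_next_parameter()
    lr = params.get('learning_rate', 0.01)  # 'learning_rate' is a hypothetical key

    accuracy = 0.0
    for _ in range(10):
        accuracy = train_one_epoch(lr)
        nni.report_intermediate_result(accuracy)  # per-epoch metric for the assessor

    nni.report_final_result(accuracy)  # the metric the tuner optimizes


if __name__ == '__main__':
    main()
```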
* Fix a race condition issue in trial_keeper for reading log from pipe (#578) * [WebUI] Fix issue#458 about final result as dict (#563) * Fix comments * fix bug * support frameworkcontroller log (#572) * Dev weight sharing (#568) (#576) * add pycharm project files to .gitignore list * update pylintrc to conform to vscode settings * fix RemoteMachineMode for wrong trainingServicePlatform * simple weight sharing * update gitignore file * change tuner codedir to relative path * add python cache files to gitignore list * move extract scalar reward logic from dispatcher to tuner * update tuner code corresponding to last commit * update doc for receive_trial_result api change * add numpy to package whitelist of pylint * distinguish param value from return reward for tuner.extract_scalar_reward * update pylintrc * add comments to dispatcher.handle_report_metric_data * update install for mac support * fix root mode bug on Makefile * Quick fix bug: nnictl port value error (#245) * fix port bug * Dev exp stop more (#221) * Exp stop refactor (#161) * Update RemoteMachineMode.md (#63) * Remove unused classes for SQuAD QA example. * Remove more unused functions for SQuAD QA example. * Fix default dataset config. * Add Makefile README (#64) * update document (#92) * Edit readme.md * updated a word * Update GetStarted.md * Update GetStarted.md * refactor readme, getstarted and write-your-trial md. * Update README.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Fix nnictl bugs and add new feature (#75) * fix nnictl bug * fix nnictl create bug * add experiment status logic * add more information for nnictl * fix Evolution Tuner bug * refactor code * fix code in updater.py * fix nnictl --help * fix classArgs bug * update check response.status_code logic * remove Buffer warning (#100) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * Add support for debugging mode * fix setup.py (#115) * Add DAG model configuration format for SQuAD example. * Explain config format for SQuAD QA model. * Add more detailed introduction about the evolution algorithm.
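Two threads above share one convention: the WebUI fix for "final result as dict" (#458/#563) and the tuner commits "move extract scalar reward logic from dispatcher to tuner" / "distinguish param value from return reward for tuner.extract_scalar_reward". A trial's final result may be a bare number or a dict of metrics, and the scalar the tuner optimizes is pulled out of the dict, conventionally under a "default" key. A sketch of that extraction, stated as an assumption about the convention rather than the exact SDK code:

```python
def extract_scalar_reward(value, scalar_key='default'):
    # A final result may be a bare number...
    if isinstance(value, (int, float)):
        return value
    # ...or a dict of metrics carrying the optimized scalar
    # (the 'default' key name is the assumed convention here).
    if isinstance(value, dict) and isinstance(value.get(scalar_key), (int, float)):
        return value[scalar_key]
    raise RuntimeError(
        'Final result must be a number or a dict with a numeric "%s" key.' % scalar_key)
```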
* Fix install.sh and add trial log path (#109) * fix nnictl bug * fix nnictl create bug * add experiment status logic * add more information for nnictl * fix Evolution Tuner bug * refactor code * fix code in updater.py * fix nnictl --help * fix classArgs bug * update check response.status_code logic * show trial log path * update document * fix install.sh * set default value for maxTrialNum and maxExecDuration * fix nnictl * Dev smac (#116) * support package install (#91) * fix nnictl bug * support package install * update * update package install logic * Fix package install issue (#95) * fix nnictl bug * fix package install * support SMAC as a tuner on nni (#81) * update doc * update doc * update doc * update hyperopt installation * update doc * update doc * update description in setup.py * update setup.py * modify encoding * encoding * add encoding * remove pymc3 * update doc * update builtin tuner spec * support smac in sdk, fix logging issue * support smac tuner * add optimize_mode * update config in nnictl * add __init__.py * update smac * update import path * update setup.py: remove entry_point * update rest server validation * fix bug in nnictl launcher * support classArgs: optimize_mode * quick fix bug * test travis * add dependency * add dependency * add dependency * add dependency * create smac python package * fix trivial points * optimize import of tuners, modify nnictl accordingly * fix bug: incorrect algorithm_name * trivial refactor * for debug * support virtual * update doc of SMAC * update smac requirements * update requirements * change debug mode * update doc * update doc * refactor based on comments * fix comments * modify example config path to relative path and increase maxTrialNum (#94) * add document * support conda (#90) (#110) * support install from venv and travis CI * support install from venv and travis CI * support install from venv and travis CI * support conda * support conda * modify example config path to relative path and increase maxTrialNum * undo messy commit * undo messy commit * Support pip install as root (#77) * Typo on #58 (#122) * PAI Training Service implementation (#128): 1. implement PAITrainingService; 2. add trial-keeper python module, and modify setup.py to install the module; 3. add PAITrainingService rest server to collect metrics from PAI container.
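The SMAC chain above ("support SMAC as a tuner on nni", "add optimize_mode", "support classArgs: optimize_mode") follows the shape every built-in tuner shares: classArgs from the experiment config arrive in `__init__`, and the dispatcher drives `update_search_space` / `generate_parameters` / `receive_trial_result`. A toy skeleton under that assumed interface; the sampling logic is illustrative, not SMAC:

```python
import random

from nni.tuner import Tuner


class ToyTuner(Tuner):
    """Illustrative tuner skeleton; samples only 'choice' parameters."""

    def __init__(self, optimize_mode='maximize'):
        # classArgs from config.yml (e.g. optimize_mode) land here.
        self.optimize_mode = optimize_mode
        self.search_space = {}

    def update_search_space(self, search_space):
        # Called with the parsed search space JSON.
        self.search_space = search_space

    def generate_parameters(self, parameter_id):
        # Produce the next hyperparameter set for a trial.
        return {name: random.choice(spec['_value'])
                for name, spec in self.search_space.items()
                if spec.get('_type') == 'choice'}

    def receive_trial_result(self, parameter_id, parameters, value):
        # 'value' is the trial's final result; a real tuner updates its model
        # here (see the extract_scalar_reward sketch above for dict results).
        pass
```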
* fix datastore for multiple final result (#129) * Update NNI v0.2 release notes (#132) * Update setup.py Makefile and documents (#130) * update makefile and setup.py * update makefile and setup.py * update document * update document * Update Makefile no travis * update doc * update doc * fix convert from ss to pcs (#133) * Fix bugs about webui (#131) * Fix webui bugs * Fix tslint * webui logpath and document (#135) * Add webui document and logpath as a href * fix tslint * fix comments by Chengmin * Pai training service bug fix and enhancement (#136) * Add NNI installation scripts * Update pai script, update NNI_out_dir * Update NNI dir in nni sdk local.py * Create .nni folder in nni sdk local.py * Add check before creating .nni folder * Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT * Improve annotation (#138) * Minor bugfix * Selectively install through pip (#139) * update setup.py * fix paiTrainingService bugs (#137) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * Add documentation for NNI PAI mode experiment (#141) * Add documentation for NNI PAI mode * Fix typo based on PR comments * Exit with subprocess return code of trial keeper * Remove additional exit code * Fix typo based on PR comments * update doc for smac tuner (#140) * Revert "Selectively install through pip (#139)" due to potential pip install issue (#142): this reverts commit 1d174836d3.
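"Improve annotation" (#138) above, together with the annotation entries earlier in this log, refers to NNI's comment-based syntax: the trial stays runnable as plain Python, and NNI rewrites the annotated lines when the experiment enables annotation. A sketch assuming the annotation syntax of this era; the variable names and choices are illustrative:

```python
import random


def main():
    '''@nni.variable(nni.choice(0.1, 0.01, 0.001), name=learning_rate)'''
    learning_rate = 0.01  # default used when the script runs without NNI

    # Stand-in for real training; produces a fake accuracy.
    accuracy = min(1.0, random.random() * 0.2 + learning_rate)

    '''@nni.report_final_result(accuracy)'''
    print(accuracy)


if __name__ == '__main__':
    main()
```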
* Add exit code of subprocess for trial_keeper * Update README, add link to PAImode doc * Merge branch V0.2 to Master (#143)
* fix bug (#147) * Refactor nnictl and add config_pai.yml (#144) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * add config_pai.yml * refactor nnictl create logic and add colorful print * fix nnictl stop logic * add annotation for config_pai.yml * add document for start experiment * fix config.yml * fix document * Fix trial keeper wrongly exit issue (#152) * Fix trial keeper bug, use actual exitcode to exit rather than 1 * Fix bug of table sort (#145) * Update doc for PAIMode and v0.2 release notes (#153) * Update v0.2 documentation with regard to release notes and PAI training service * Update document to describe NNI docker image * fix antd (#159) * refactor experiment stopping logic * support change concurrency * remove trialJobs.ts * trivial changes * fix bugs * fix bug * support updating maxTrialNum * Modify IT scripts for supporting multiple experiments * Update ci (#175) * modify CI because of refactoring exp stop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * file saving * fix issues from code merge * remove $(INSTALL_PREFIX)/nni/nni_manager before install * fix indent * fix merge issue * socket close * update port * fix merge error * modify ci logic in nnimanager * fix ci * fix bug * change suspended to done * update ci (#229) * update ci * update ci * update ci (#232) * update ci * update ci * update azure-pipelines * update azure-pipelines * update ci (#233) * update ci * update ci * update azure-pipelines * update azure-pipelines * update azure-pipelines * run.py (#238) * Nnupdate ci (#239) * run.py * test ci * Nnupdate ci (#240) * run.py * test ci * test ci * Udci (#241) * run.py * test ci * test ci * test ci * update ci (#242) * run.py * test ci * test ci * test ci * update ci * revert install.sh (#244) * run.py * test ci * test ci * test ci * update ci * revert install.sh * add comments * remove assert * trivial change * trivial change * update Makefile (#246) * update Makefile * update Makefile * quick fix for ci (#248) * add update trialNum and fix bugs (#261) * Add builtin tuner to CI (#247) * update Makefile * update Makefile
* add builtin-tuner test * add builtin-tuner test * refactor ci * update azure.yml * add built-in tuner test * fix bugs * Doc refactor (#258) * doc refactor * image name refactor * Refactor nnictl to support listing stopped experiments. (#256) * Show experiment parameters more beautifully (#262) * fix error on example of RemoteMachineMode (#269) * add pycharm project files to .gitignore list * update pylintrc to conform to vscode settings * fix RemoteMachineMode for wrong trainingServicePlatform * Update docker file to use latest nni release (#263) * fix bug about execDuration and endTime (#270) * modify time interval to 30 seconds * refactor based on Gems's suggestion * for triggering ci * Refactor dockerfile (#264) * refactor Dockerfile * Support nnictl tensorboard (#268) * Sdk update (#272) * Rename get_parameters to get_next_parameter * annotations add get_next_parameter * updates * updates * updates * updates * updates * add experiment log path to experiment profile (#276) * refactor extract reward from dict by tuner * update Makefile for mac support, wait for aka.ms support * refix Makefile for colorful echo * unversion config.yml with machine information * sync graph.py between tuners & trial of ga_squad * sync graph.py between tuners & trial of ga_squad * copy weight-shared ga_squad under weight_sharing folder * mv ga_squad code back to master * simple tuner & trial ready * Fix nnictl multiThread option * weight sharing with async dispatcher simple example ready * update for ga_squad * fix bug * modify multihead attention name * add min_layer_num to Graph * fix bug * update share id calc * fix bug * add save logging * fix ga_squad tuner bug * sync bug fix for ga_squad tuner * fix same hash_id bug * add lock to simple tuner in weight sharing * Add readme to simple weight sharing * update * update * add paper link * update * reformat with autopep8 * add documentation for weight sharing * test for weight sharing * delete irrelevant files * move details of weight sharing into code comments * Dev weight sharing update doc (#577) * add example section * Dev weight sharing update (#579) * update weight sharing tutorial * Dev weight sharing (#581) * fix divide by zero risk * update tuner thread exception handling * fix bug for async test
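The weight-sharing commits above ("update share id calc", "fix same hash_id bug", "add lock to simple tuner in weight sharing") revolve around one pattern: the tuner stamps each trial's parameters with a checkpoint identifier so successor trials can warm-start from a predecessor's saved weights, with a lock guarding the shared bookkeeping across dispatcher threads. A heavily simplified sketch of that pattern; the class and the field names `checkpoint_id` / `restore_from` are illustrative, not the ga_squad example's actual schema:

```python
import threading
import uuid


class SimpleWeightSharingTuner:
    """Illustrative only: hand each trial a checkpoint id, and point
    successors at a finished predecessor's checkpoint."""

    def __init__(self):
        self._lock = threading.Lock()  # dispatcher may call in from multiple threads
        self._finished = []            # checkpoint ids of completed trials

    def generate_parameters(self, parameter_id):
        with self._lock:
            restore_from = self._finished[-1] if self._finished else None
        return {
            'checkpoint_id': uuid.uuid4().hex,  # where this trial saves its weights
            'restore_from': restore_from,       # predecessor to warm-start from, if any
        }

    def receive_trial_result(self, parameter_id, parameters, value):
        with self._lock:
            self._finished.append(parameters['checkpoint_id'])
```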
add builtin-tuner test * add builtin-tuner test * refractor ci * update azure.yml * add built-in tuner test * fix bugs * Doc refactor (#258) * doc refactor * image name refactor * Refactor nnictl to support listing stopped experiments. (#256) Refactor nnictl to support listing stopped experiments. * Show experiment parameters more beautifully (#262) * fix error on example of RemoteMachineMode (#269) * add pycharm project files to .gitignore list * update pylintrc to conform vscode settings * fix RemoteMachineMode for wrong trainingServicePlatform * Update docker file to use latest nni release (#263) * fix bug about execDuration and endTime (#270) * fix bug about execDuration and endTime * modify time interval to 30 seconds * refactor based on Gems's suggestion * for triggering ci * Refactor dockerfile (#264) * refactor Dockerfile * Support nnictl tensorboard (#268) support tensorboard * Sdk update (#272) * Rename get_parameters to get_next_parameter * annotations add get_next_parameter * updates * updates * updates * updates * updates * add experiment log path to experiment profile (#276) * refactor extract reward from dict by tuner * update Makefile for mac support, wait for aka.ms support * refix Makefile for colorful echo * unversion config.yml with machine information * sync graph.py between tuners & trial of ga_squad * sync graph.py between tuners & trial of ga_squad * copy weight shared ga_squad under weight_sharing folder * mv ga_squad code back to master * simple tuner & trial ready * Fix nnictl multiThread option * weight sharing with async dispatcher simple example ready * update for ga_squad * fix bug * modify multihead attention name * add min_layer_num to Graph * fix bug * update share id calc * fix bug * add save logging * fix ga_squad tuner bug * sync bug fix for ga_squad tuner * fix same hash_id bug * add lock to simple tuner in weight sharing * Add readme to simple weight sharing * update * update * add paper link * update * reformat with autopep8 * add documentation for weight sharing * test for weight sharing * delete irrelevant files * move details of weight sharing in to code comments * add example section * update weight sharing tutorial * fix divide by zero risk * update tuner thread exception handling * fix bug for async test * Add frameworkcontroller document (#530) Add frameworkcontroller document. Fix other document small issues. * [WebUI] Show trial log for pai and k8s (#580) * [WebUI] Show trial log for pai and k8s * fix lint * Fix comments * [WebUI] Show trial log for pai and k8s (#580) * [WebUI] Show trial log for pai and k8s * fix lint * Fix comments * add __init__.py to metis_tuner (#588) * [Document] Update webui doc (#587) * Update webui document * update * Update Dockerfile and README (#589) * fix some bugs in doc and log * The learning rate focus more on validation sets accuracy than training sets accuracy. 
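Sdk update (#272) above renames `get_parameters` to `get_next_parameter`, and later commits move scalar-reward extraction from the dispatcher into `tuner.extract_scalar_reward`. A minimal trial sketch of the renamed API (the parameter name `learning_rate` and the training stub are illustrative, not part of the SDK):

```python
import nni

def train_and_evaluate(learning_rate):
    # Stand-in for real trial logic so the sketch is self-contained.
    return 1.0 - learning_rate

if __name__ == '__main__':
    # Renamed in #272: fetch the next hyperparameter set from the tuner.
    params = nni.get_next_parameter()
    accuracy = train_and_evaluate(params.get('learning_rate', 0.01))
    # A plain float is fine; if a dict is reported instead, the tuner
    # extracts the scalar reward from its 'default' key.
    nni.report_final_result(accuracy)
```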
* update Dockerfile and README * Update README.md Merge to branch v0.5 * [WebUI] Fix bug (#591) * fix bug * fix bug of background * update * update * add frameworkcontroller platform * update README in metis and update RuntimeError info (#595) * update README in metis and update RuntimeError * fix typo * add numerical choice check (see the sketch below) * update * update NFS setup tutorial (#597) * Remove unused example (#600) * update README in metis and update RuntimeError * remove smart params * Update release note (#603) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md … * update doc: overview (#555) * doc overview * update overview * modification * update overview * update * update * update * update * Delete mkdocs.yml * cifar10 example doc (#573) * add cifar10 examples * add search space and result * update * update typo * update * update * Update doc: refactor ExperimentConfig.md (#602) * fix doc * add link in doc * update * add nnictl package cmd * update doc index & add api reference (#636) * add Trials.md * Fix remote TrainingService bug, change forEach to "for of" (#564): trial jobs could not be stopped on the remote machine when the experiment was stopped, because await/async does not work as expected inside forEach; see https://codeburst.io/javascript-async-await-with-foreach-b6ba62bbf404.
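The "add numerical choice check" item under #595 above refers to the Metis tuner accepting only numerical values. A hypothetical validation sketch (not NNI's actual code), assuming the documented `_type`/`_value` search-space format:

```python
def check_numerical_choice(search_space):
    # Metis models the search space numerically, so string values
    # in a 'choice' parameter cannot be handled.
    for name, spec in search_space.items():
        if spec['_type'] == 'choice' and not all(
                isinstance(v, (int, float)) for v in spec['_value']):
            raise RuntimeError(
                'Metis tuner supports only numerical choice values, '
                'but parameter %r has %r' % (name, spec['_value']))

# Passes: all choice values are numbers.
check_numerical_choice({'batch_size': {'_type': 'choice', '_value': [16, 32, 64]}})
```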
* add Trials.md * add docs * To install the whole nni in a virtual environment (#538) * support venv * adapt venv * adapt venv * adapt venv * adapt venv * new test * new test * new test * support venv * support venv * support venv * support venv * support venv * support venv * support venv * colorful output for mac * colorful output for mac * permission denied in /tmp * permission denied in /tmp * permission denied in /tmp * remove unused variable * final * remove build python * Make it feasible for annotation whether to add an extra line "nni.get_next_parameter()" or not (#526) (see the sketch below) * fix bug * add docs * add ut * add ut * add to ci * update doc * update doc * update ut * add ut to ci * add ut to ci * add ut to ci * add ut to ci * add ut to ci * add ut to ci * add ut to ci * add ut to ci * test * test * test * test * test * test * test * test * test * test * revert * refactor * refactor * s * merge * fix annotation for extra line * add deprecation warning * fix permission denied (#567) * Add Metis Tuner (#534) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * update readme * update * fix path * update reference * fix bug in config file * update nni_arch_overview.png * update * update * update * add metis tuner code * 1. fix bug about import 2. update other sdk file * add auto-gbdt-example and remove unused code * add metis_tuner into README * update the README * update README | remove unused variable * fix typo * add sklearn into requirements * Update src/sdk/pynni/nni/metis_tuner/metis_tuner.py add default value in __init__ Co-Authored-By: xuehui1991 <xuehui@microsoft.com> * Update docs/HowToChooseTuner.md Co-Authored-By: xuehui1991 <xuehui@microsoft.com> * Update docs/HowToChooseTuner.md Co-Authored-By: xuehui1991 <xuehui@microsoft.com> * fix typo | add more comments * Change WARNING to INFO (#574) change the warning level to info level when expanding a relative path, add nnictl --version log, update readme.md * Fix some bugs in doc and log (#561) * fix some bugs in doc and log * The learning rate focuses more on validation set accuracy than training set accuracy. * Fix a race condition issue in trial_keeper for reading log from pipe (#578) * Fix a race condition issue in trial_keeper for reading log from pipe * [WebUI] Fix issue#458 about final result as dict (#563) * [WebUI] Fix issue#458 about final result as dict * Fix comments * fix bug * support frameworkcontroller log (#572) * Dev weight sharing (#568) (#576) * add pycharm project files to .gitignore list * update pylintrc to conform vscode settings * fix RemoteMachineMode for wrong trainingServicePlatform * simple weight sharing * update gitignore file * change tuner codedir to relative path * add python cache files to gitignore list * move extract scalar reward logic from dispatcher to tuner * update tuner code corresponding to last commit * update doc for receive_trial_result api change * add numpy to package whitelist of pylint * distinguish param value from return reward for tuner.extract_scalar_reward * update pylintrc * add comments to dispatcher.handle_report_metric_data * update install for mac support * fix root mode bug on Makefile * Quick fix bug: nnictl port value error (#245) * fix port bug * Dev exp stop more (#221) * Exp stop refactor (#161)
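#526 above makes it configurable whether NNI annotation injects the extra `nni.get_next_parameter()` line into the trial. For context, a small annotation-style trial sketch; the exact choice values are illustrative, and with annotation disabled the magic strings are inert comments:

```python
# With annotation enabled, NNI rewrites these docstring-style markers
# into real SDK calls; per #526, the extra nni.get_next_parameter()
# line can now be injected or omitted as configured.

'''@nni.variable(nni.choice(0.01, 0.1), name=learning_rate)'''
learning_rate = 0.1

accuracy = 1.0 - learning_rate  # stand-in for real training

'''@nni.report_final_result(accuracy)'''
```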
* fix setup.py (#115) * Add DAG model configuration format for SQuAD example. * Explain config format for SQuAD QA model. * Add more detailed introduction about the evolution algorithm. * Fix install.sh and add trial log path (#109) * fix nnictl bug * fix nnictl create bug * add experiment status logic * add more information for nnictl * fix Evolution Tuner bug * refactor code * fix code in updater.py * fix nnictl --help * fix classArgs bug * update check response.status_code logic * show trial log path * update document * fix install.sh * set default value for maxTrialNum and maxExecDuration * fix nnictl * Dev smac (#116) * support package install (#91) * fix nnictl bug * support package install * update * update package install logic * Fix package install issue (#95) * fix nnictl bug * fix package install * support SMAC as a tuner on nni (#81) * update doc * update doc * update doc * update hyperopt installation * update doc * update doc * update description in setup.py * update setup.py * modify encoding * encoding * add encoding * remove pymc3 * update doc * update builtin tuner spec * support smac in sdk, fix logging issue * support smac tuner * add optimize_mode * update config in nnictl * add __init__.py * update smac * update import path * update setup.py: remove entry_point * update rest server validation * fix bug in nnictl launcher * support classArgs: optimize_mode * quick fix bug * test travis * add dependency * add dependency * add dependency * add dependency * create smac python package * fix trivial points * optimize import of tuners, modify nnictl accordingly * fix bug: incorrect algorithm_name * trivial refactor * for debug * support virtual * update doc of SMAC * update smac requirements * update requirements * change debug mode * update doc * update doc * refactor based on comments * fix comments * modify example config path to relative path and increase maxTrialNum (#94) * modify example config path to relative path and increase maxTrialNum * add document * support conda (#90) (#110) * support install from venv and travis CI * support install from venv and travis CI * support install from venv and travis CI * support conda * support conda * modify example config path to relative path and increase maxTrialNum * undo messy commit * undo messy commit * Support pip install as root (#77) * Typo on #58 (#122) * PAI Training Service implementation (#128) 1. Implement PAITrainingService 2. Add trial-keeper python module, and modify setup.py to install the module 3. Add PAITrainingService rest server to collect metrics from PAI container.
* fix datastore for multiple final result (#129) * Update NNI v0.2 release notes (#132) * Update setup.py Makefile and documents (#130) * update makefile and setup.py * update makefile and setup.py * update document * update document * Update Makefile no travis * update doc * update doc * fix convert from ss to pcs (#133) * Fix bugs about webui (#131) * Fix webui bugs * Fix tslint * webui logpath and document (#135) * Add webui document and logpath as a href * fix tslint * fix comments by Chengmin * Pai training service bug fix and enhancement (#136) * Add NNI installation scripts * Update pai script, update NNI_out_dir * Update NNI dir in nni sdk local.py * Create .nni folder in nni sdk local.py * Add check before creating .nni folder * Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT * Improve annotation (#138) * Improve annotation * Minor bugfix * Selectively install through pip (#139) * update setup.py * fix paiTrainingService bugs (#137) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * Add documentation for NNI PAI mode experiment (#141) * Add documentation for NNI PAI mode * Fix typo based on PR comments * Exit with subprocess return code of trial keeper (see the sketch below) * Remove additional exit code * Fix typo based on PR comments * update doc for smac tuner (#140) * Revert "Selectively install through pip (#139)" due to potential pip install issue (#142) This reverts commit 1d174836d3.
* Add exit code of subprocess for trial_keeper * Update README, add link to PAImode doc * Merge branch V0.2 to Master (#143)
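The trial-keeper items above ("use actual exitcode to exit rather than 1", "Exit with subprocess return code of trial keeper") describe the same fix: propagate the trial subprocess's real exit code instead of a hard-coded value. A simplified sketch of the pattern; the real trial_keeper also handles log collection:

```python
import subprocess
import sys

def run_trial(command):
    # Launch the user's trial command and wait for it to finish.
    process = subprocess.Popen(command, shell=True)
    process.wait()
    # Exit with the trial's actual return code rather than a fixed 1,
    # so the training service can distinguish success from failure.
    sys.exit(process.returncode)

if __name__ == '__main__':
    run_trial('python3 mnist.py')
```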
* Dev weight sharing update doc (#577) * Dev weight sharing update (#579) * Dev weight sharing (#581)
* update Dockerfile and README * Update README.md Merge to branch v0.5 * [WebUI] Fix bug (#591) * fix bug * fix bug of background * update * update * add frameworkcontroller platform * update README in metis and update RuntimeError info (#595) * update README in metis and update RuntimeError * fix typo * add numerical choice check * update * udpate NFS setup tutorial (#597) * Remove unused example (#600) * update README in metis and update RuntimeError * remove smart params * Update release note … * Add sklearn_example.md (#647) * fix doc * add link in doc * update * add nnictl package cmd * add sklearn-example.md * update * update * fix link * add desc in example.rst * revert nnictl before sphinx try * fix mnist.py example * add SQuAD_evolution_examples.md (#620) * add SQuAD_evolution_examples.md * add update * remove yml file * remove mkdoc.yml * update Example.rst * Add GBDT example doc (#654) * add SQuAD_evolution_examples.md * add update * remove yml file * remove mkdoc.yml * update Example.rst * update gbdt_example.md * update * add run command line * update Evlution_SQuAD.md * update link * add gbdt in Example.rst * Update SearchSpaceSpec (#656) * add Trials.md * add Trials.md * add Trials.md * add Trials.md * add docs * add docs * add docs * add docs * add docs * docs modification * docs modification * docs modification * docs modification * docs modification * docs modification * docs modification * docs modification * docs modification * docs modification * update triaL.MD * update triaL.MD * update triaL.MD * update triaL.MD * update triaL.MD * add grid search tuner doc * add grid search tuner doc * update SearchSpaceSpec * update SearchSpaceSpec * update SearchSpaceSpec * update SearchSpaceSpec * update SearchSpaceSpec * update SearchSpaceSpec * update SearchSpaceSpec * update SearchSpaceSpec * update SearchSpaceSpec * fix color for zejun * fix mnist before * fix image * Fix doc format (#658) * add SQuAD_evolution_examples.md * add update * remove yml file * fix format problem * update index * fix typo * fix broken-links of quickstart/tuners/assessors (#662) * update assessor.rst * fix * Dev doc fix4 (#672) * refactor index * fix title format for remote * fix nnictl doc w/ rst * fix assessor pic link * fix type for training service * adjust tutorial * fix typo * Dev doc (#669) * changed image link * deleted link of triallog.png new version do not has this function * Update AdvancedNAS.md * changed image link weight sharing image * update doc (#670) * update doc * Dev doc fix4 (#672) * refactor index * fix title format for remote * fix nnictl doc w/ rst * fix assessor pic link * fix type for training service * adjust tutorial * fix typo * Update customized assessor doc (#671) * Update customized assessor doc * updates * fix typo * update doc (#673)
|
@ -0,0 +1,3 @@
|
|||
_build
|
||||
_static
|
||||
_templates
|
|
@ -19,7 +19,7 @@ tuner:
|
|||
```
|
||||
And let the tuner decide where to save & load weights and feed the paths to trials through `nni.get_next_parameter()`:
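For instance, a trial could read those paths as follows (a minimal sketch; the `load_path` and `save_path` field names are hypothetical and entirely up to the tuner):

```python
import nni

params = nni.get_next_parameter()
load_path = params.get('load_path')   # hypothetical key: weights inherited from an earlier trial
save_path = params.get('save_path')   # hypothetical key: where this trial stores its weights
# restore from load_path if it is set, train, then save the new weights to save_path
```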
|
||||
|
||||
![weight_sharing_design](./img/weight_sharing.png)
|
||||
<img src="https://user-images.githubusercontent.com/23273522/51817667-93ebf080-2306-11e9-8395-b18b322062bc.png" alt="drawing" width="700"/>
|
||||
|
||||
For example, in TensorFlow:
|
||||
```python
|
||||
|
@ -86,4 +86,4 @@ For details, please refer to this [simple weight sharing example](../test/async_
|
|||
[2]: https://arxiv.org/abs/1707.07012
|
||||
[3]: https://arxiv.org/abs/1806.09055
|
||||
[4]: https://arxiv.org/abs/1806.10282
|
||||
[5]: https://arxiv.org/abs/1703.01041
|
|
||||
|
|
|
@ -1,58 +1,70 @@
|
|||
# NNI Annotation
|
||||
|
||||
For a good user experience and to reduce user effort, we need to design a good annotation grammar.
|
||||
|
||||
If users use the NNI system, they only need to:
|
||||
## Overview
|
||||
|
||||
1. Use `nni.get_next_parameter()` to retrieve hyper-parameters from the Tuner. Before using any other annotation, place the following annotation at the beginning of the trial code:
|
||||
'''@nni.get_next_parameter()'''
|
||||
To improve user experience and reduce user effort, we design an annotation grammar. Using NNI annotation, users can adapt their code to NNI just by adding some standalone annotating strings, which does not affect the execution of the original code.
|
||||
|
||||
2. Annotate a variable in code as:
|
||||
Below is an example:
|
||||
|
||||
'''@nni.variable(nni.choice(2,3,5,7),name=self.conv_size)'''
|
||||
```python
|
||||
'''@nni.variable(nni.choice(0.1, 0.01, 0.001), name=learning_rate)'''
|
||||
learning_rate = 0.1
|
||||
```
|
||||
This example means that NNI will choose one of several values (0.1, 0.01, 0.001) to assign to the `learning_rate` variable. Specifically, the first line is an NNI annotation, which is a single string. It is followed by an assignment statement. What NNI does here is to replace the right-hand value of this assignment statement according to the information provided by the annotation line.
|
||||
|
||||
3. Annotate an intermediate result in code as:
|
||||
|
||||
'''@nni.report_intermediate_result(test_acc)'''
|
||||
In this way, users can either run the Python code directly or launch NNI to tune hyper-parameters in this code, without changing any code.
|
||||
|
||||
4. Annotate the final output in code as:
|
||||
## Types of Annotation
|
||||
|
||||
'''@nni.report_final_result(test_acc)'''
|
||||
In NNI, there are mainly four types of annotation:
|
||||
|
||||
5. Annotate `function_choice` in code as:
|
||||
|
||||
'''@nni.function_choice(max_pool(h_conv1, self.pool_size),avg_pool(h_conv1, self.pool_size),name=max_pool)'''
|
||||
### 1. Annotate variables
|
||||
|
||||
In this way, users can easily implement automatic tuning on NNI.
|
||||
`'''@nni.variable(sampling_algo, name)'''`
|
||||
|
||||
For `@nni.variable`, `nni.choice` is one type of search space, and there are 10 types to express your search space, as follows (a short sketch of two of them appears right after the list):
|
||||
`@nni.variable` is used in NNI to annotate a variable.
|
||||
|
||||
1. `@nni.variable(nni.choice(option1,option2,...,optionN),name=variable)`
|
||||
Which means the variable value is one of the options, which should be a list. The elements of options can themselves be stochastic expressions.
|
||||
**Arguments**
|
||||
|
||||
2. `@nni.variable(nni.randint(upper),name=variable)`
|
||||
Which means the variable value is a random integer in the range [0, upper).
|
||||
- **sampling_algo**: Sampling algorithm that specifies a search space. Users should replace it with a built-in NNI sampling function, whose name consists of the `nni.` prefix and a search space type specified in [SearchSpaceSpec](SearchSpaceSpec.md), such as `choice` or `uniform`.
|
||||
- **name**: The name of the variable that the selected value will be assigned to. Note that this argument should be the same as the left-hand value of the following assignment statement.
|
||||
|
||||
3. `@nni.variable(nni.uniform(low, high),name=variable)`
|
||||
Which means the variable value is uniformly distributed between low and high.
|
||||
An example here is:
|
||||
|
||||
4. `@nni.variable(nni.quniform(low, high, q),name=variable)`
|
||||
Which means the variable value is a value like round(uniform(low, high) / q) * q.
|
||||
```python
|
||||
'''@nni.variable(nni.choice(0.1, 0.01, 0.001), name=learning_rate)'''
|
||||
learning_rate = 0.1
|
||||
```
|
||||
|
||||
5. `@nni.variable(nni.loguniform(low, high),name=variable)`
|
||||
Which means the variable value is a value drawn according to exp(uniform(low, high)) so that the logarithm of the return value is uniformly distributed.
|
||||
### 2. Annotate functions
|
||||
|
||||
6. `@nni.variable(nni.qloguniform(low, high, q),name=variable)`
|
||||
Which means the variable value is a value like round(exp(uniform(low, high)) / q) * q.
|
||||
`'''@nni.function_choice(*functions, name)'''`
|
||||
|
||||
7. `@nni.variable(nni.normal(label, mu, sigma),name=variable)`
|
||||
Which means the variable value is a real value that's normally-distributed with mean mu and standard deviation sigma.
|
||||
`@nni.function_choice` is used to choose one from several functions.
|
||||
|
||||
8. `@nni.variable(nni.qnormal(label, mu, sigma, q),name=variable)`
|
||||
Which means the variable value is a value like round(normal(mu, sigma) / q) * q.
|
||||
**Arguments**
|
||||
|
||||
9. `@nni.variable(nni.lognormal(label, mu, sigma),name=variable)`
|
||||
Which means the variable value is a value drawn according to exp(normal(mu, sigma)).
|
||||
- **\*functions**: Several functions to choose from. Note that each should be a complete function call with arguments, such as `max_pool(hidden_layer, pool_size)`.
|
||||
- **name**: The name of the function that will be replaced in the following assignment statement.
|
||||
|
||||
10. `@nni.variable(nni.qlognormal(label, mu, sigma, q),name=variable)`
|
||||
Which means the variable value is a value like round(exp(normal(mu, sigma)) / q) * q.
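For instance, the `uniform` and `randint` types above could be written as follows (the variable names and values are only illustrative):

```python
'''@nni.variable(nni.uniform(0.0001, 0.1), name=learning_rate)'''
learning_rate = 0.01

'''@nni.variable(nni.randint(10), name=num_layers)'''
num_layers = 3
```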
|
||||
An example here is:
|
||||
|
||||
```python
|
||||
"""@nni.function_choice(max_pool(hidden_layer, pool_size), avg_pool(hidden_layer, pool_size), name=max_pool)"""
|
||||
h_pooling = max_pool(hidden_layer, pool_size)
|
||||
```
|
||||
|
||||
### 3. Annotate intermediate result
|
||||
|
||||
`'''@nni.report_intermediate_result(metrics)'''`
|
||||
|
||||
`@nni.report_intermediate_result` is used to report intermediate result, whose usage is the same as `nni.report_intermediate_result` in [Trials.md](Trials.md)
|
||||
|
||||
### 4. Annotate final result
|
||||
|
||||
`'''@nni.report_final_result(metrics)'''`
|
||||
|
||||
`@nni.report_final_result` is used to report the final result of the current trial, whose usage is the same as `nni.report_final_result` in [Trials.md](Trials.md)
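Putting the four kinds of annotation together, a minimal sketch of a fully annotated trial could look like this (the training loop is only a placeholder):

```python
import nni

'''@nni.get_next_parameter()'''

'''@nni.variable(nni.choice(0.1, 0.01, 0.001), name=learning_rate)'''
learning_rate = 0.1

test_acc = 0.0
for epoch in range(10):
    test_acc = min(1.0, test_acc + learning_rate)   # placeholder for real training/evaluation
    '''@nni.report_intermediate_result(test_acc)'''

'''@nni.report_final_result(test_acc)'''
```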
|
||||
|
|
|
@ -0,0 +1,74 @@
|
|||
# Built-in Assessors
|
||||
|
||||
NNI provides state-of-the-art tuning algorithms in our builtin assessors and makes them easy to use. Below is a brief overview of NNI's current builtin Assessors:
|
||||
|
||||
|Assessor|Brief Introduction of Algorithm|
|
||||
|---|---|
|
||||
|**Medianstop**<br>[(Usage)](#MedianStop)|Medianstop is a simple early stopping rule mentioned in the [paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf). It stops a pending trial X at step S if the trial’s best objective value by step S is strictly worse than the median value of the running averages of all completed trials’ objectives reported up to step S.|
|
||||
|[Curvefitting](https://github.com/Microsoft/nni/blob/master/src/sdk/pynni/nni/curvefitting_assessor/README.md)<br>[(Usage)](#Curvefitting)|Curve Fitting Assessor is an LPA (learning, predicting, assessing) algorithm. It stops a pending trial X at step S if the prediction of the final epoch's performance is worse than the best final performance in the trial history. In this algorithm, we use 12 curves to fit the accuracy curve.|
|
||||
|
||||
<br>
|
||||
|
||||
## Usage of Builtin Assessors
|
||||
|
||||
Using builtin assessors provided by the NNI SDK requires declaring **builtinAssessorName** and **classArgs** in the `config.yml` file. In this part, we will introduce the detailed usage, including the suggested scenarios, classArgs requirements, and an example for each assessor.
|
||||
|
||||
Note: Please follow the format when you write your `config.yml` file.
|
||||
|
||||
<a name="MedianStop"></a>
|
||||
|
||||
![](https://placehold.it/15/1589F0/000000?text=+) `Median Stop Assessor`
|
||||
|
||||
> Builtin Assessor Name: **Medianstop**
|
||||
|
||||
**Suggested scenario**
|
||||
|
||||
It is applicable to a wide range of performance curves, and thus can be used in various scenarios to speed up the tuning progress.
|
||||
|
||||
**Requirement of classArg**
|
||||
|
||||
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', assessor will **stop** the trial with smaller expectation. If 'minimize', assessor will **stop** the trial with larger expectation.
|
||||
* **start_step** (*int, optional, default = 0*) - A trial is judged for stopping only after it has reported start_step intermediate results.
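The median stopping rule summarized in the overview table is simple enough to sketch in a few lines (an illustrative sketch only, not NNI's actual implementation):

```python
import statistics

def should_stop(trial_history, completed_histories, maximize=True):
    """Stop trial X at step S if its best objective so far is strictly worse
    than the median of completed trials' running averages up to step S."""
    s = len(trial_history)
    best = max(trial_history) if maximize else min(trial_history)
    # running average of each completed trial over its first s reported results
    running_avgs = [sum(h[:s]) / s for h in completed_histories if len(h) >= s]
    if not running_avgs:
        return False
    median = statistics.median(running_avgs)
    return best < median if maximize else best > median
```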
|
||||
|
||||
**Usage example:**
|
||||
|
||||
```yaml
|
||||
# config.yml
|
||||
assessor:
|
||||
builtinAssessorName: Medianstop
|
||||
classArgs:
|
||||
optimize_mode: maximize
|
||||
start_step: 5
|
||||
```
|
||||
|
||||
<br>
|
||||
|
||||
<a name="Curvefitting"></a>
|
||||
|
||||
![](https://placehold.it/15/1589F0/000000?text=+) `Curve Fitting Assessor`
|
||||
|
||||
> Builtin Assessor Name: **Curvefitting**
|
||||
|
||||
**Suggested scenario**
|
||||
|
||||
It is applicable to a wide range of performance curves, and thus can be used in various scenarios to speed up the tuning progress. Even better, it is able to handle and assess curves with similar performance.
|
||||
|
||||
**Requirement of classArg**
|
||||
|
||||
* **epoch_num** (*int, **required***) - The total number of epochs. We need to know the number of epochs to determine which point to predict.
|
||||
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', assessor will **stop** the trial with smaller expectation. If 'minimize', assessor will **stop** the trial with larger expectation.
|
||||
* **start_step** (*int, optional, default = 6*) - We start to predict, and a trial may be stopped, only after receiving start_step reported intermediate results.
|
||||
* **threshold** (*float, optional, default = 0.95*) - The threshold used to decide whether to early-stop the worse performance curves. For example: if threshold = 0.95, optimize_mode = maximize, and the best performance in the history is 0.9, then we will stop any trial whose predicted value is lower than 0.95 * 0.9 = 0.855.
|
||||
|
||||
**Usage example:**
|
||||
|
||||
```yaml
|
||||
# config.yml
|
||||
assessor:
|
||||
builtinAssessorName: Curvefitting
|
||||
classArgs:
|
||||
epoch_num: 20
|
||||
optimize_mode: maximize
|
||||
start_step: 6
|
||||
threshold: 0.95
|
||||
```
|
|
@ -0,0 +1,313 @@
|
|||
# Built-in Tuners
|
||||
|
||||
NNI provides state-of-the-art tuning algorithms as our builtin tuners and makes them easy to use. Below is a brief summary of NNI's current built-in Tuners:
|
||||
|
||||
|Tuner|Brief Introduction of Algorithm|
|
||||
|---|---|
|
||||
|**TPE**<br>[(Usage)](#TPE)|The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model.|
|
||||
|**Random Search**<br>[(Usage)](#Random)|The paper Random Search for Hyper-Parameter Optimization shows that Random Search might be surprisingly simple and effective. We suggest using Random Search as the baseline when there is no knowledge about the prior distribution of hyper-parameters.|
|
||||
|**Anneal**<br>[(Usage)](#Anneal)|This simple annealing algorithm begins by sampling from the prior, but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on the random search that leverages smoothness in the response surface. The annealing rate is not adaptive.|
|
||||
|**Naive Evolution**<br>[(Usage)](#Evolution)|Naive Evolution comes from Large-Scale Evolution of Image Classifiers. It randomly initializes a population based on the search space. For each generation, it chooses better ones and does some mutation (e.g., changing a hyperparameter, adding/removing one layer) on them to get the next generation. Naive Evolution requires many trials to work, but it is very simple and easy to extend with new features.|
|
||||
|**SMAC**<br>[(Usage)](#SMAC)|SMAC is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO, in order to handle categorical parameters. The SMAC tuner supported by NNI is a wrapper of the SMAC3 GitHub repo.|
|
||||
|**Batch tuner**<br>[(Usage)](#Batch)|Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in search space spec.|
|
||||
|**Grid Search**<br>[(Usage)](#GridSearch)|Grid Search performs an exhaustive search through a manually specified subset of the hyperparameter space defined in the searchspace file. Note that the only acceptable types of search space are choice, quniform, qloguniform. The number q in quniform and qloguniform has a special meaning (different from the spec in the search space spec): it means the number of values that will be sampled evenly from the range between low and high.|
|
||||
|[Hyperband](https://github.com/Microsoft/nni/tree/master/src/sdk/pynni/nni/hyperband_advisor)<br>[(Usage)](#Hyperband)|Hyperband tries to use limited resources to explore as many configurations as possible and find the promising ones to get the final result. The basic idea is to generate many configurations, run them for a small number of STEPs to find the promising ones, then train those promising ones further and select from them.|
|
||||
|[Network Morphism](https://github.com/Microsoft/nni/blob/master/src/sdk/pynni/nni/networkmorphism_tuner/README.md)<br>[(Usage)](#NetworkMorphism)|Network Morphism provides functions to automatically search for architecture of deep learning models. Every child network inherits the knowledge from its parent network and morphs into diverse types of networks, including changes of depth, width, and skip-connection. Next, it estimates the value of a child network using the historic architecture and metric pairs. Then it selects the most promising one to train.|
|
||||
|**Metis Tuner**<br>[(Usage)](#MetisTuner)|Metis offers the following benefits when it comes to tuning parameters: While most tools only predict the optimal configuration, Metis gives you two outputs: (a) current prediction of optimal configuration, and (b) suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter.|
|
||||
|
||||
<br>
|
||||
|
||||
## Usage of Builtin Tuners
|
||||
|
||||
Using a builtin tuner provided by the NNI SDK requires declaring **builtinTunerName** and **classArgs** in the `config.yml` file. In this part, we will introduce the detailed usage, including the suggested scenarios, classArgs requirements, and an example for each tuner.
|
||||
|
||||
Note: Please follow the format when you write your `config.yml` file.
|
||||
|
||||
<a name="TPE"></a>
|
||||
|
||||
![](https://placehold.it/15/1589F0/000000?text=+) `TPE`
|
||||
|
||||
> Builtin Tuner Name: **TPE**
|
||||
|
||||
**Suggested scenario**
|
||||
|
||||
TPE, as a black-box optimization method, can be used in various scenarios and shows good performance in general, especially when you have limited computation resources and can only try a small number of trials. From a large number of experiments, we found that TPE is far better than Random Search.
|
||||
|
||||
**Requirement of classArg**
|
||||
|
||||
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will return the hyperparameter set with larger expectation. If 'minimize', tuner will return the hyperparameter set with smaller expectation.
|
||||
|
||||
**Usage example:**
|
||||
|
||||
```yaml
|
||||
# config.yml
|
||||
tuner:
|
||||
builtinTunerName: TPE
|
||||
classArgs:
|
||||
optimize_mode: maximize
|
||||
```
|
||||
|
||||
<br>
|
||||
|
||||
<a name="Random"></a>
|
||||
|
||||
![](https://placehold.it/15/1589F0/000000?text=+) `Random Search`
|
||||
|
||||
> Builtin Tuner Name: **Random**
|
||||
|
||||
**Suggested scenario**
|
||||
|
||||
Random search is suggested when each trial does not take too long (e.g., each trial can be completed very soon, or early stopped by the assessor quickly) and you have enough computation resources, or when you want to uniformly explore the search space. Random Search can be considered a baseline search algorithm.
|
||||
|
||||
**Requirement of classArg:**
|
||||
|
||||
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will return the hyperparameter set with larger expectation. If 'minimize', tuner will return the hyperparameter set with smaller expectation.
|
||||
|
||||
**Usage example**
|
||||
|
||||
```yaml
|
||||
# config.yml
|
||||
tuner:
|
||||
builtinTunerName: Random
|
||||
classArgs:
|
||||
optimize_mode: maximize
|
||||
```
|
||||
|
||||
<br>
|
||||
|
||||
<a name="Anneal"></a>
|
||||
|
||||
![](https://placehold.it/15/1589F0/000000?text=+) `Anneal`
|
||||
|
||||
> Builtin Tuner Name: **Anneal**
|
||||
|
||||
**Suggested scenario**
|
||||
|
||||
Anneal is suggested when each trial does not take too long and you have enough computation resources (almost the same as Random Search), or when the variables in the search space can be sampled from some prior distribution.
|
||||
|
||||
**Requirement of classArg**
|
||||
|
||||
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will return the hyperparameter set with larger expectation. If 'minimize', tuner will return the hyperparameter set with smaller expectation.
|
||||
|
||||
**Usage example**
|
||||
|
||||
```yaml
|
||||
# config.yml
|
||||
tuner:
|
||||
builtinTunerName: Anneal
|
||||
classArgs:
|
||||
optimize_mode: maximize
|
||||
```
|
||||
|
||||
<br>
|
||||
|
||||
<a name="Evolution"></a>
|
||||
|
||||
![](https://placehold.it/15/1589F0/000000?text=+) `Naive Evolution`
|
||||
|
||||
> Builtin Tuner Name: **Evolution**
|
||||
|
||||
**Suggested scenario**
|
||||
|
||||
Its requirement for computation resources is relatively high. Specifically, it requires a large initial population to avoid falling into a local optimum. If your trial is short or leverages an assessor, this tuner is a good choice. It is even more suggested when your trial code supports weight transfer, that is, when a trial can inherit the converged weights from its parent(s); this can greatly speed up the training progress.
|
||||
|
||||
**Requirement of classArg**
|
||||
|
||||
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will return the hyperparameter set with larger expectation. If 'minimize', tuner will return the hyperparameter set with smaller expectation.
|
||||
|
||||
**Usage example**
|
||||
|
||||
```yaml
|
||||
# config.yml
|
||||
tuner:
|
||||
builtinTunerName: Evolution
|
||||
classArgs:
|
||||
optimize_mode: maximize
|
||||
```
|
||||
|
||||
<br>
|
||||
|
||||
<a name="SMAC"></a>
|
||||
|
||||
![](https://placehold.it/15/1589F0/000000?text=+) `SMAC`
|
||||
|
||||
> Builtin Tuner Name: **SMAC**
|
||||
|
||||
**Suggested scenario**
|
||||
|
||||
Similar to TPE, SMAC is also a black-box tuner which can be tried in various scenarios, and is suggested when computation resource is limited. It is optimized for discrete hyperparameters, thus, suggested when most of your hyperparameters are discrete.
|
||||
|
||||
**Requirement of classArg**
|
||||
|
||||
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will return the hyperparameter set with larger expectation. If 'minimize', tuner will return the hyperparameter set with smaller expectation.
|
||||
|
||||
**Usage example**
|
||||
|
||||
```yaml
|
||||
# config.yml
|
||||
tuner:
|
||||
builtinTunerName: SMAC
|
||||
classArgs:
|
||||
optimize_mode: maximize
|
||||
```
|
||||
|
||||
<br>
|
||||
|
||||
<a name="Batch"></a>
|
||||
|
||||
![](https://placehold.it/15/1589F0/000000?text=+) `Batch Tuner`
|
||||
|
||||
> Builtin Tuner Name: **BatchTuner**
|
||||
|
||||
**Suggested scenario**
|
||||
|
||||
If the configurations you want to try have already been decided, you can list them in the searchspace file (using `choice`) and run them with the batch tuner.
|
||||
|
||||
**Usage example**
|
||||
|
||||
```yaml
|
||||
# config.yml
|
||||
tuner:
|
||||
builtinTunerName: BatchTuner
|
||||
```
|
||||
|
||||
<br>
|
||||
|
||||
Note that the search space that BatchTuner supports looks like:
|
||||
|
||||
```json
|
||||
{
|
||||
"combine_params":
|
||||
{
|
||||
"_type" : "choice",
|
||||
"_value" : [{"optimizer": "Adam", "learning_rate": 0.00001},
|
||||
{"optimizer": "Adam", "learning_rate": 0.0001},
|
||||
{"optimizer": "Adam", "learning_rate": 0.001},
|
||||
{"optimizer": "SGD", "learning_rate": 0.01},
|
||||
{"optimizer": "SGD", "learning_rate": 0.005},
|
||||
{"optimizer": "SGD", "learning_rate": 0.0002}]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The search space file should include the high-level key `combine_params`. The type of the params in the search space must be `choice`, and the `values` must include all the combined-params values.
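On the trial side, each run then receives one of the combinations listed under `_value`. A minimal hedged sketch (the exact nesting of the received object may differ):

```python
import nni

params = nni.get_next_parameter()        # one entry of the "_value" list above
optimizer_name = params["optimizer"]     # e.g. "Adam" or "SGD"
learning_rate = params["learning_rate"]  # e.g. 0.00001
# ... build and train a model with these settings, then report the real metric:
nni.report_final_result(0.9)             # 0.9 is a placeholder value
```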
|
||||
|
||||
<a name="GridSearch"></a>
|
||||
|
||||
![](https://placehold.it/15/1589F0/000000?text=+) `Grid Search`
|
||||
|
||||
> Builtin Tuner Name: **Grid Search**
|
||||
|
||||
**Suggested scenario**
|
||||
|
||||
Note that the only acceptable types of search space are `choice`, `quniform`, `qloguniform`. **The number `q` in `quniform` and `qloguniform` has special meaning (different from the spec in [search space spec](./SearchSpaceSpec.md)). It means the number of values that will be sampled evenly from the range `low` and `high`.**
|
||||
|
||||
It is suggested when the search space is small; in that case it is feasible to exhaustively sweep the whole search space.
|
||||
|
||||
**Usage example**
|
||||
|
||||
```yaml
|
||||
# config.yml
|
||||
tuner:
|
||||
builtinTunerName: GridSearch
|
||||
```
|
||||
|
||||
<br>
|
||||
|
||||
<a name="Hyperband"></a>
|
||||
|
||||
![](https://placehold.it/15/1589F0/000000?text=+) `Hyperband`
|
||||
|
||||
> Builtin Advisor Name: **Hyperband**
|
||||
|
||||
**Suggested scenario**
|
||||
|
||||
It is suggested when you have limited computation resources but a relatively large search space. It performs well in scenarios where the intermediate result (e.g., accuracy) can reflect the quality of the final result to some extent.
|
||||
|
||||
**Requirement of classArg**
|
||||
|
||||
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will return the hyperparameter set with larger expectation. If 'minimize', tuner will return the hyperparameter set with smaller expectation.
|
||||
* **R** (*int, optional, default = 60*) - the maximum STEPS (could be the number of mini-batches or epochs) that can be allocated to a trial. Each trial should use STEPS to control how long it runs.
|
||||
* **eta** (*int, optional, default = 3*) - `(eta-1)/eta` is the proportion of discarded trials. The sketch after this list illustrates the resulting schedule.
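To see how `R` and `eta` interact, here is a sketch of the bracket schedule implied by the Hyperband paper (Li et al.); NNI's advisor has its own implementation, so this is illustrative only:

```python
import math

def hyperband_schedule(R=60, eta=3):
    s_max = int(math.log(R, eta))   # number of brackets minus one
    B = (s_max + 1) * R             # budget per bracket
    for s in range(s_max, -1, -1):
        n = math.ceil((B / R) * eta**s / (s + 1))   # initial number of configurations
        r = R * eta**(-s)                           # initial STEPS per configuration
        for i in range(s + 1):
            n_i = math.floor(n * eta**(-i))
            r_i = r * eta**i
            # after each rung only ~1/eta of the trials survive,
            # i.e. (eta-1)/eta of them are discarded
            print(f"bracket {s}, rung {i}: {n_i} configs x {r_i:g} STEPS")

hyperband_schedule()
```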
|
||||
|
||||
**Usage example**
|
||||
|
||||
```yaml
|
||||
# config.yml
|
||||
advisor:
|
||||
builtinAdvisorName: Hyperband
|
||||
classArgs:
|
||||
optimize_mode: maximize
|
||||
R: 60
|
||||
eta: 3
|
||||
```
|
||||
|
||||
<br>
|
||||
|
||||
<a name="NetworkMorphism"></a>
|
||||
|
||||
![](https://placehold.it/15/1589F0/000000?text=+) `Network Morphism`
|
||||
|
||||
> Builtin Tuner Name: **NetworkMorphism**
|
||||
|
||||
**Installation**
|
||||
|
||||
NetworkMorphism requires [PyTorch](https://pytorch.org/get-started/locally), so users should install it first.
|
||||
|
||||
**Suggested scenario**
|
||||
|
||||
It is suggested when you want to apply deep learning methods to your task (your own dataset) but have no idea how to choose or design a network. You can modify the [example](../examples/trials/network_morphism/cifar10/cifar10_keras.py) to fit your own dataset and your own data augmentation method. You can also change the batch size, learning rate, or optimizer. This tuner is feasible for finding a good network architecture for different tasks. Currently, it only supports the computer vision domain.
|
||||
|
||||
**Requirement of classArg**
|
||||
|
||||
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will return the hyperparameter set with larger expectation. If 'minimize', tuner will return the hyperparameter set with smaller expectation.
|
||||
* **task** (*('cv'), optional, default = 'cv'*) - The domain of the experiment; for now, this tuner only supports the computer vision (cv) domain.
|
||||
* **input_width** (*int, optional, default = 32*) - input image width
|
||||
* **input_channel** (*int, optional, default = 3*) - input image channel
|
||||
* **n_output_node** (*int, optional, default = 10*) - number of classes
|
||||
|
||||
**Usage example**
|
||||
|
||||
```yaml
|
||||
# config.yml
|
||||
tuner:
|
||||
builtinTunerName: NetworkMorphism
|
||||
classArgs:
|
||||
optimize_mode: maximize
|
||||
task: cv
|
||||
input_width: 32
|
||||
input_channel: 3
|
||||
n_output_node: 10
|
||||
```
|
||||
|
||||
<br>
|
||||
|
||||
<a name="MetisTuner"></a>
|
||||
|
||||
![](https://placehold.it/15/1589F0/000000?text=+) `Metis Tuner`
|
||||
|
||||
> Builtin Tuner Name: **MetisTuner**
|
||||
|
||||
Note that the only acceptable types of search space are `choice`, `quniform`, `uniform` and `randint`.
|
||||
|
||||
**Installation**
|
||||
|
||||
Metis Tuner requires [sklearn](https://scikit-learn.org/), so users should install it first, for example with `pip3 install sklearn`.
|
||||
|
||||
**Suggested scenario**
|
||||
|
||||
Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as random search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](../examples/trials/auto-gbdt/search_space_metis.json) of the use of Metis. Users only need to send the final result, such as `accuracy`, to the tuner by calling the NNI SDK.
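For instance, the trial side can be as small as this (a minimal sketch; the metric value is a placeholder):

```python
import nni

params = nni.get_next_parameter()   # hyper-parameters suggested by Metis
accuracy = 0.9                      # placeholder for your real evaluation metric
nni.report_final_result(accuracy)   # the final result is all Metis needs
```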
|
||||
|
||||
**Requirement of classArg**
|
||||
|
||||
* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', tuners will return the hyperparameter set with larger expectation. If 'minimize', tuner will return the hyperparameter set with smaller expectation.
|
||||
|
||||
**Usage example**
|
||||
|
||||
```yaml
|
||||
# config.yml
|
||||
tuner:
|
||||
builtinTunerName: MetisTuner
|
||||
classArgs:
|
||||
optimize_mode: maximize
|
||||
```
|
|
@ -0,0 +1,8 @@
|
|||
###############################
|
||||
Contribution to NNI
|
||||
###############################
|
||||
|
||||
.. toctree::
|
||||
Development Setup<SetupNNIDeveloperEnvironment>
|
||||
Contribution Guide<CONTRIBUTING>
|
||||
Debug HowTo<HowToDebug>
|
|
@ -0,0 +1,62 @@
|
|||
# Customize Assessor
|
||||
|
||||
## Customize Assessor
|
||||
|
||||
NNI also supports building an assessor by yourself to fit your tuning requirements.
|
||||
|
||||
If you want to implement a customized Assessor, there are three things to do:
|
||||
|
||||
1) Inherit from the base Assessor class
|
||||
2) Implement the assess_trial function
|
||||
3) Configure your customized Assessor in the experiment yaml config file
|
||||
|
||||
**1. Inherit from the base Assessor class**
|
||||
|
||||
```python
|
||||
from nni.assessor import Assessor
|
||||
|
||||
class CustomizedAssessor(Assessor):
|
||||
def __init__(self, ...):
|
||||
...
|
||||
```
|
||||
|
||||
**2. Implement the assess_trial function**
|
||||
```python
|
||||
from nni.assessor import Assessor, AssessResult
|
||||
|
||||
class CustomizedAssessor(Assessor):
|
||||
def __init__(self, ...):
|
||||
...
|
||||
|
||||
def assess_trial(self, trial_history):
|
||||
"""
|
||||
Determines whether a trial should be killed. Must override.
|
||||
trial_history: a list of intermediate result objects.
|
||||
Returns AssessResult.Good or AssessResult.Bad.
|
||||
"""
|
||||
# your code goes here.
|
||||
...
|
||||
```
|
||||
|
||||
**3. Configure your customized Assessor in experiment yaml config file**
|
||||
|
||||
NNI needs to locate your customized Assessor class and instantiate the class, so you need to specify the location of the customized Assessor class and pass literal values as parameters to the \_\_init__ constructor.
|
||||
|
||||
```yaml
|
||||
|
||||
assessor:
|
||||
codeDir: /home/abc/myassessor
|
||||
classFileName: my_customized_assessor.py
|
||||
className: CustomizedAssessor
|
||||
# Any parameter need to pass to your Assessor class __init__ constructor
|
||||
# can be specified in this optional classArgs field, for example
|
||||
classArgs:
|
||||
arg1: value1
|
||||
|
||||
```
|
||||
|
||||
Please note in **2**: the object `trial_history` is exactly the object that the Trial sends to the Assessor via the SDK function `report_intermediate_result`.
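For instance, a trial that reports like this (a minimal sketch with a placeholder metric) produces the `trial_history` list your assessor will receive:

```python
import nni

acc = 0.0
for epoch in range(10):
    acc = 0.1 * epoch                    # placeholder per-epoch metric
    nni.report_intermediate_result(acc)  # each call appends one entry to trial_history
nni.report_final_result(acc)
```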
|
||||
|
||||
For more detailed examples, see:
|
||||
> * [medianstop-assessor](../src/sdk/pynni/nni/medianstop_assessor)
|
||||
> * [curvefitting-assessor](../src/sdk/pynni/nni/curvefitting_assessor)
|
|
@ -0,0 +1,118 @@
|
|||
# Customize Tuner
|
||||
|
||||
## Customize Tuner
|
||||
|
||||
NNI provides state-of-the-art tuning algorithms in our builtin tuners. We also support building a tuner by yourself to fit your tuning requirements.
|
||||
|
||||
If you want to implement and use your own tuning algorithm, you can implement a customized Tuner. There are three things to do:
|
||||
|
||||
1) Inherit from the base Tuner class
|
||||
2) Implement the receive_trial_result and generate_parameters functions
|
||||
3) Configure your customized tuner in the experiment yaml config file
|
||||
|
||||
Here is an example:
|
||||
|
||||
**1. Inherit from the base Tuner class**
|
||||
|
||||
```python
|
||||
from nni.tuner import Tuner
|
||||
|
||||
class CustomizedTuner(Tuner):
|
||||
def __init__(self, ...):
|
||||
...
|
||||
```
|
||||
|
||||
**2. Implement the receive_trial_result and generate_parameters functions**
|
||||
|
||||
```python
|
||||
from nni.tuner import Tuner
|
||||
|
||||
class CustomizedTuner(Tuner):
|
||||
def __init__(self, ...):
|
||||
...
|
||||
|
||||
def receive_trial_result(self, parameter_id, parameters, value):
|
||||
'''
|
||||
Record an observation of the objective function.
|
||||
parameter_id: int
|
||||
parameters: object created by 'generate_parameters()'
|
||||
value: final metrics of the trial, including the default metric
|
||||
'''
|
||||
# your code goes here.
|
||||
...
|
||||
|
||||
def generate_parameters(self, parameter_id):
|
||||
'''
|
||||
Returns a set of trial (hyper-)parameters, as a serializable object
|
||||
parameter_id: int
|
||||
'''
|
||||
# your code goes here.
|
||||
return your_parameters
|
||||
...
|
||||
```
|
||||
|
||||
`receive_trial_result` receives `parameter_id, parameters, value` as input. The `value` object the Tuner receives is exactly the same value that the Trial sends.
|
||||
|
||||
The `your_parameters` returned from the `generate_parameters` function will be packaged as a JSON object by the NNI SDK. The NNI SDK will unpack the JSON object, so the Trial will receive the exact same `your_parameters` from the Tuner.
|
||||
|
||||
For example:
|
||||
If you implement `generate_parameters` like this:
|
||||
|
||||
```python
|
||||
|
||||
def generate_parameters(self, parameter_id):
|
||||
'''
|
||||
Returns a set of trial (hyper-)parameters, as a serializable object
|
||||
parameter_id: int
|
||||
'''
|
||||
# your code goes here.
|
||||
return {"dropout": 0.3, "learning_rate": 0.4}
|
||||
|
||||
```
|
||||
|
||||
It means your Tuner will always generate parameters `{"dropout": 0.3, "learning_rate": 0.4}`. Then the Trial will receive `{"dropout": 0.3, "learning_rate": 0.4}` by calling the API `nni.get_next_parameter()`. Once the trial ends with a result (normally some kind of metric), it can send the result to the Tuner by calling the API `nni.report_final_result()`, for example `nni.report_final_result(0.93)`. Then your Tuner's `receive_trial_result` function will receive the result like:
|
||||
|
||||
```python
|
||||
|
||||
parameter_id = 82347
|
||||
parameters = {"dropout": 0.3, "learning_rate": 0.4}
|
||||
value = 0.93
|
||||
|
||||
```
|
||||
|
||||
**Note that** if you want to access a file (e.g., `data.txt`) in the directory of your own tuner, you cannot use `open('data.txt', 'r')`. Instead, you should use the following:
|
||||
|
||||
```python
|
||||
|
||||
import os

_pwd = os.path.dirname(__file__)
|
||||
_fd = open(os.path.join(_pwd, 'data.txt'), 'r')
|
||||
|
||||
```
|
||||
|
||||
This is because your tuner is not executed in its own directory (i.e., `pwd` is not the directory of your own tuner).
|
||||
|
||||
**3. Configure your customized tuner in experiment yaml config file**
|
||||
|
||||
NNI needs to locate your customized tuner class and instantiate the class, so you need to specify the location of the customized tuner class and pass literal values as parameters to the \_\_init__ constructor.
|
||||
|
||||
```yaml
|
||||
|
||||
tuner:
|
||||
codeDir: /home/abc/mytuner
|
||||
classFileName: my_customized_tuner.py
|
||||
className: CustomizedTuner
|
||||
# Any parameter need to pass to your tuner class __init__ constructor
|
||||
# can be specified in this optional classArgs field, for example
|
||||
classArgs:
|
||||
arg1: value1
|
||||
|
||||
```
|
||||
|
||||
For more detailed examples, see:
|
||||
> * [evolution-tuner](../src/sdk/pynni/nni/evolution_tuner)
|
||||
> * [hyperopt-tuner](../src/sdk/pynni/nni/hyperopt_tuner)
|
||||
> * [evolution-based-customized-tuner](../examples/tuners/ga_customer_tuner)
|
||||
|
||||
### Write a more advanced automl algorithm
|
||||
|
||||
The methods above are usually enough to write a general tuner. However, users may also want more capabilities, for example, access to intermediate results and trials' state (e.g., the methods in an assessor), in order to build a more powerful automl algorithm. Therefore, we have another concept called `advisor`, which directly inherits from `MsgDispatcherBase` in [`src/sdk/pynni/nni/msg_dispatcher_base.py`](../src/sdk/pynni/nni/msg_dispatcher_base.py). Please refer to [here](./howto_3_CustomizedAdvisor.md) for how to write a customized advisor.
|
|
@ -0,0 +1,12 @@
|
|||
######################
|
||||
Examples
|
||||
######################
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
MNIST<mnist_examples>
|
||||
Cifar10<cifar10_examples>
|
||||
Scikit-learn<sklearn_examples>
|
||||
EvolutionSQuAD<SQuAD_evolution_examples>
|
||||
GBDT<gbdt_example>
|
|
@ -3,6 +3,12 @@
|
|||
A config file is needed when creating an experiment; the path of the config file is provided to nnictl.
|
||||
The config file is written in YAML format and needs to be written correctly.
|
||||
This document describes the rules for writing the config file and provides some examples and templates.
|
||||
|
||||
- [Template](#Template) (the templates of a config file)
|
||||
- [Configuration spec](#Configuration) (the configuration specification of every attribute in the config file)
|
||||
- [Examples](#Examples) (examples of config files)
|
||||
|
||||
<a name="Template"></a>
|
||||
## Template
|
||||
* __lightweight (without Annotation and Assessor)__
|
||||
|
||||
|
@ -112,8 +118,8 @@ machineList:
|
|||
username:
|
||||
passwd:
|
||||
```
|
||||
|
||||
## Configuration
|
||||
<a name="Configuration"></a>
|
||||
## Configuration spec
|
||||
* __authorName__
|
||||
* Description
|
||||
|
||||
|
@ -131,12 +137,14 @@ machineList:
|
|||
|
||||
__trialConcurrency__ specifies the max number of trial jobs that run simultaneously.
|
||||
|
||||
Note: if trialGpuNum is bigger than the number of free GPUs, and the number of trial jobs running simultaneously cannot reach the trialConcurrency number, some trial jobs will be put into a queue to wait for GPU allocation.
|
|
||||
|
||||
* __maxExecDuration__
|
||||
* Description
|
||||
|
||||
__maxExecDuration__ specifies the max duration of an experiment. The unit of time is {__s__, __m__, __h__, __d__}, which means {_seconds_, _minutes_, _hours_, _days_}.
|
||||
|
||||
Note: The maxExecDuration spec sets the duration of an experiment, not of a trial job. If the experiment reaches the max duration, it will not stop, but it can no longer submit new trial jobs.
|
||||
|
||||
* __maxTrialNum__
|
||||
* Description
|
||||
|
@ -437,7 +445,7 @@ machineList:
|
|||
|
||||
|
||||
|
||||
|
||||
<a name="Examples"></a>
|
||||
## Examples
|
||||
* __local mode__
|
||||
|
||||
|
|
|
@ -1,3 +1,5 @@
|
|||
# FAQ
|
||||
|
||||
This page is for frequently asked questions and answers.
|
||||
|
||||
|
||||
|
|
|
@ -1,5 +1,4 @@
|
|||
**Get Started with NNI**
|
||||
===
|
||||
# Get Started with NNI
|
||||
|
||||
## **Installation**
|
||||
|
||||
|
|
|
@ -0,0 +1,23 @@
|
|||
Grid Search on NNI
|
||||
===
|
||||
|
||||
## 1. Introduction
|
||||
|
||||
Grid Search performs an exhaustive searching through a manually specified subset of the hyperparameter space defined in the searchspace file.
|
||||
|
||||
Note that the only acceptable types of search space are `choice`, `quniform`, and `qloguniform`, since only these three types of search space can be exhausted.
|
||||
|
||||
Moreover, in the GridSearch Tuner, for users' convenience, the definitions of `quniform` and `qloguniform` change: q here specifies the number of values that will be sampled. Details are listed as follows:
|
||||
|
||||
* Type 'quniform' receives three values [low, high, q], where [low, high] specifies a range and 'q' specifies the number of values that will be sampled evenly. Note that q should be at least 2. The values are sampled so that the first sampled value is 'low', and each of the following values is (high-low)/q larger than the value before it.
|
||||
* Type 'qloguniform' behaves like 'quniform' except that it first changes the range to [log(low), log(high)], samples there, and then changes the sampled values back; see the sketch after this list.
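Following the description above literally, a minimal sketch of how the q values could be generated (the authoritative behavior is the tuner's own implementation):

```python
import math

def sample_quniform(low, high, q):
    assert q >= 2, "q should be at least 2"
    step = (high - low) / q
    return [low + i * step for i in range(q)]   # the first sampled value is low

def sample_qloguniform(low, high, q):
    # move to log space, sample evenly, then map the values back
    return [math.exp(v) for v in sample_quniform(math.log(low), math.log(high), q)]

print(sample_quniform(0, 10, 5))      # [0.0, 2.0, 4.0, 6.0, 8.0]
print(sample_qloguniform(1, 100, 3))  # roughly [1.0, 4.6, 21.5]
```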
|
||||
|
||||
## 2. Usage
|
||||
|
||||
Since the Grid Search Tuner exhausts all possible hyper-parameter combinations according to the search space file, and takes no hyper-parameters for the tuner itself, all you need to do is specify the tuner name in your experiment's yaml config file:
|
||||
|
||||
```yaml
|
||||
tuner:
|
||||
builtinTunerName: GridSearch
|
||||
```
|
||||
|
|
@ -1,31 +1,23 @@
|
|||
**Installation of NNI**
|
||||
===
|
||||
# Installation of NNI
|
||||
|
||||
Currently we only support installation on Linux & Mac.
|
||||
|
||||
## **Installation**
|
||||
* __Dependencies__
|
||||
|
||||
```bash
|
||||
python >= 3.5
|
||||
git
|
||||
wget
|
||||
```
|
||||
|
||||
Python pip should also be correctly installed. You could use `python3 -m pip -v` to check the pip version.
|
||||
|
||||
* __Install NNI through pip__
|
||||
|
||||
Prerequisite: `python >= 3.5`
|
||||
```bash
|
||||
python3 -m pip install --user --upgrade nni
|
||||
python3 -m pip install --upgrade nni
|
||||
```
|
||||
|
||||
* __Install NNI through source code__
|
||||
|
||||
Prerequisite: `python >= 3.5`, `git`, `wget`
|
||||
```bash
|
||||
git clone -b v0.5 https://github.com/Microsoft/nni.git
|
||||
git clone -b v0.5.1 https://github.com/Microsoft/nni.git
|
||||
cd nni
|
||||
source install.sh
|
||||
./install.sh
|
||||
```
|
||||
|
||||
* __Install NNI in docker image__
|
||||
|
@ -66,5 +58,7 @@ Below are the minimum system requirements for NNI on macOS. Due to potential pro
|
|||
* [Define search space](SearchSpaceSpec.md)
|
||||
* [Config an experiment](ExperimentConfig.md)
|
||||
* [How to run an experiment on local (with multiple GPUs)?](tutorial_1_CR_exp_local_api.md)
|
||||
* [How to run an experiment on multiple machines?](tutorial_2_RemoteMachineMode.md)
|
||||
* [How to run an experiment on multiple machines?](RemoteMachineMode.md)
|
||||
* [How to run an experiment on OpenPAI?](PAIMode.md)
|
||||
* [How to run an experiment on Kubernetes through Kubeflow?](KubeflowMode.md)
|
||||
* [How to run an experiment on Kubernetes through FrameworkController?](FrameworkControllerMode.md)
|
|
@ -0,0 +1,19 @@
|
|||
# Minimal makefile for Sphinx documentation
|
||||
#
|
||||
|
||||
# You can set these variables from the command line.
|
||||
SPHINXOPTS =
|
||||
SPHINXBUILD = sphinx-build
|
||||
SOURCEDIR = .
|
||||
BUILDDIR = _build
|
||||
|
||||
# Put it first so that "make" without argument is like "make help".
|
||||
help:
|
||||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
|
||||
|
||||
.PHONY: help Makefile
|
||||
|
||||
# Catch-all target: route all unknown targets to Sphinx using the new
|
||||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
|
||||
%: Makefile
|
||||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
|
|
@ -0,0 +1,630 @@
|
|||
.. role:: raw-html-m2r(raw)
|
||||
:format: html
|
||||
|
||||
|
||||
nnictl
|
||||
======
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
**nnictl** is a command line tool, which can be used to control experiments, such as start/stop/resume an experiment, start/stop NNIBoard, etc.
|
||||
|
||||
Commands
|
||||
--------
|
||||
|
||||
nnictl supports the following commands:
|
||||
|
||||
|
||||
* `nnictl create <#create>`_
|
||||
* `nnictl resume <#resume>`_
|
||||
* `nnictl stop <#stop>`_
|
||||
* `nnictl update <#update>`_
|
||||
* `nnictl trial <#trial>`_
|
||||
* `nnictl top <#top>`_
|
||||
* `nnictl experiment <#experiment>`_
|
||||
* `nnictl config <#config>`_
|
||||
* `nnictl log <#log>`_
|
||||
* `nnictl webui <#webui>`_
|
||||
* `nnictl tensorboard <#tensorboard>`_
|
||||
* `nnictl package <#package>`_
|
||||
|
||||
Manage an experiment
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
:raw-html-m2r:`<a name="create"></a>`
|
||||
|
||||
|
||||
*
|
||||
**nnictl create**
|
||||
|
||||
|
||||
*
|
||||
Description
|
||||
|
||||
You can use this command to create a new experiment, using the configuration specified in the config file.
|
||||
After this command is successfully done, the context will be set to this experiment,
|
||||
which means the following commands you issue are associated with this experiment,
|
||||
unless you explicitly change the context (not supported yet).
|
||||
|
||||
|
||||
*
|
||||
Usage
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
nnictl create [OPTIONS]
|
||||
|
||||
|
||||
*
|
||||
Options:
|
||||
|
||||
+-------------------+-----------+-----------+-------------------------------------+
|
||||
| Name, shorthand | Required | Default | Description |
|
||||
+===================+===========+===========+=====================================+
|
||||
| --config, -c | True | |yaml configure file of the experiment|
|
||||
+-------------------+-----------+-----------+-------------------------------------+
|
||||
| --port, -p | False | |the port of restful server |
|
||||
+-------------------+-----------+-----------+-------------------------------------+
|
||||
:raw-html-m2r:`<a name="resume"></a>`
|
||||
|
||||
*
|
||||
**nnictl resume**
|
||||
|
||||
|
||||
*
|
||||
Description
|
||||
|
||||
You can use this command to resume a stopped experiment.
|
||||
|
||||
*
|
||||
Usage
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
nnictl resume [OPTIONS]
|
||||
|
||||
*
|
||||
Options:
|
||||
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
| Name, shorthand | Required | Default | Description |
|
||||
+===================+===========+===========+===============================================+
|
||||
| id | False | |The id of the experiment you want to resume |
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
| --port, -p | False | |Rest port of the experiment you want to resume |
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
|
||||
|
||||
:raw-html-m2r:`<a name="stop"></a>`
|
||||
|
||||
|
||||
*
|
||||
**nnictl stop**
|
||||
|
||||
|
||||
*
|
||||
Description
|
||||
|
||||
You can use this command to stop a running experiment or multiple experiments.
|
||||
|
||||
*
|
||||
Usage
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
nnictl stop [id]
|
||||
|
||||
*
|
||||
Detail
|
||||
|
||||
1. If an id is specified and it matches a running experiment, nnictl will stop the corresponding experiment; otherwise it will print an error message.
|
||||
2. If no id is specified and an experiment is running, nnictl will stop that experiment; otherwise it will print an error message.
|
||||
3. If the id ends with *, nnictl will stop all experiments whose ids match the pattern.
|
||||
4. If the id does not exist but matches the prefix of one experiment id, nnictl will stop the matched experiment.
|
||||
5. If the id does not exist but matches the prefix of multiple experiment ids, nnictl will print the id information.
|
||||
6. Users can use 'nnictl stop all' to stop all experiments.
|
||||
|
||||
:raw-html-m2r:`<a name="update"></a>`
|
||||
|
||||
*
|
||||
**nnictl update**
|
||||
|
||||
|
||||
*
|
||||
**nnictl update searchspace**
|
||||
|
||||
|
||||
*
|
||||
Description
|
||||
|
||||
You can use this command to update an experiment's search space.
|
||||
|
||||
*
|
||||
Usage
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
nnictl update searchspace [OPTIONS]
|
||||
|
||||
*
|
||||
Options:
|
||||
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
| Name, shorthand | Required | Default | Description |
|
||||
+===================+===========+===========+===============================================+
|
||||
| id | False | |ID of the experiment you want to set |
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
| --filename, -f | True | |the file storing your new search space |
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
|
||||
|
||||
*
|
||||
**nnictl update concurrency**
|
||||
|
||||
|
||||
*
|
||||
Description
|
||||
|
||||
You can use this command to update an experiment's concurrency.
|
||||
|
||||
*
|
||||
Usage
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
nnictl update concurrency [OPTIONS]
|
||||
|
||||
|
||||
*
|
||||
Options:
|
||||
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
| Name, shorthand | Required | Default | Description |
|
||||
+===================+===========+===========+===============================================+
|
||||
| id | False | |ID of the experiment you want to set |
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
| --value, -v | True | |the number of allowed concurrent trials |
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
|
||||
*
|
||||
**nnictl update duration**
|
||||
|
||||
|
||||
*
|
||||
Description
|
||||
|
||||
You can use this command to update an experiment's max duration.
|
||||
|
||||
*
|
||||
Usage
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
nnictl update duration [OPTIONS]
|
||||
|
||||
*
|
||||
Options:
|
||||
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
| Name, shorthand | Required | Default | Description |
|
||||
+===================+===========+===========+===============================================+
|
||||
| id | False | |ID of the experiment you want to set |
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
| --value, -v | True | |the experiment duration will be NUMBER seconds.|
|
||||
| | | |SUFFIX may be 's' for seconds (the default), |
|
||||
| | | |'m' for minutes, 'h' for hours or 'd' for days.|
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
|
||||
|
||||
*
|
||||
**nnictl update trialnum**
|
||||
|
||||
|
||||
*
|
||||
Description
|
||||
|
||||
You can use this command to update an experiment's max trial number (maxTrialNum).
|
||||
|
||||
*
|
||||
Usage
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
nnictl update trialnum [OPTIONS]
|
||||
|
||||
*
|
||||
Options:
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
| Name, shorthand | Required | Default | Description |
|
||||
+===================+===========+===========+===============================================+
|
||||
| id | False | |ID of the experiment you want to set |
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
| --value, -v | True | |the new number of maxtrialnum you want to set |
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
|
||||
:raw-html-m2r:`<a name="trial"></a>`
|
||||
|
||||
|
||||
*
|
||||
**nnictl trial**
|
||||
|
||||
|
||||
*
|
||||
**nnictl trial ls**
|
||||
|
||||
|
||||
*
|
||||
Description
|
||||
|
||||
You can use this command to show trial jobs' information.
|
||||
|
||||
*
|
||||
Usage
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
nnictl trial ls
|
||||
|
||||
*
|
||||
Options:
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
| Name, shorthand | Required | Default | Description |
|
||||
+===================+===========+===========+===============================================+
|
||||
| id | False | |ID of the experiment you want to set |
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
|
||||
*
|
||||
**nnictl trial kill**
|
||||
|
||||
|
||||
*
|
||||
Description
|
||||
|
||||
You can use this command to kill a trial job.
|
||||
|
||||
|
||||
*
|
||||
Usage
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
nnictl trial kill [OPTIONS]
|
||||
|
||||
*
|
||||
Options:
|
||||
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
| Name, shorthand | Required | Default | Description |
|
||||
+===================+===========+===========+===============================================+
|
||||
| id | False | |ID of the experiment you want to set |
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
|
||||
| --trialid, -t | True | |ID of the trial you want to kill. |
|
||||
+-------------------+-----------+-----------+-----------------------------------------------+
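
    For example, to kill a trial whose id is GPInz (a hypothetical id; use one reported by ``nnictl trial ls``):

    .. code-block:: bash

       nnictl trial kill --trialid GPInz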

:raw-html-m2r:`<a name="top"></a>`

*
  **nnictl top**

  *
    Description

    Monitor all running experiments.

  *
    Usage

    .. code-block:: bash

       nnictl top

  *
    Options:

    +-------------------+-----------+-----------+-----------------------------------------------+
    | Name, shorthand   | Required  | Default   | Description                                   |
    +===================+===========+===========+===============================================+
    | id                | False     |           | ID of the experiment you want to set          |
    +-------------------+-----------+-----------+-----------------------------------------------+
    | --time, -t        | False     |           | the interval to update the experiment status; |
    |                   |           |           | the unit is seconds, and the default value    |
    |                   |           |           | is 3 seconds.                                 |
    +-------------------+-----------+-----------+-----------------------------------------------+
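
    For example, to refresh the status view every 10 seconds instead of the default 3:

    .. code-block:: bash

       nnictl top --time 10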

:raw-html-m2r:`<a name="experiment"></a>`

Manage experiment information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*
  **nnictl experiment show**

  *
    Description

    Show the information of an experiment.

  *
    Usage

    .. code-block:: bash

       nnictl experiment show

  *
    Options:

    +-------------------+-----------+-----------+-----------------------------------------------+
    | Name, shorthand   | Required  | Default   | Description                                   |
    +===================+===========+===========+===============================================+
    | id                | False     |           | ID of the experiment you want to set          |
    +-------------------+-----------+-----------+-----------------------------------------------+

*
  **nnictl experiment status**

  *
    Description

    Show the status of an experiment.

  *
    Usage

    .. code-block:: bash

       nnictl experiment status

  *
    Options:

    +-------------------+-----------+-----------+-----------------------------------------------+
    | Name, shorthand   | Required  | Default   | Description                                   |
    +===================+===========+===========+===============================================+
    | id                | False     |           | ID of the experiment you want to check        |
    +-------------------+-----------+-----------+-----------------------------------------------+

*
  **nnictl experiment list**

  *
    Description

    Show the information of all the (running) experiments.

  *
    Usage

    .. code-block:: bash

       nnictl experiment list

  *
    Options:

    +-------------------+-----------+-----------+-----------------------------------------------+
    | Name, shorthand   | Required  | Default   | Description                                   |
    +===================+===========+===========+===============================================+
    | all               | False     | False     | show all experiments, including stopped ones  |
    +-------------------+-----------+-----------+-----------------------------------------------+

:raw-html-m2r:`<a name="config"></a>`

*
  **nnictl config show**

  *
    Description

    Display the current context information.

  *
    Usage

    .. code-block:: bash

       nnictl config show

:raw-html-m2r:`<a name="log"></a>`

Manage log
^^^^^^^^^^

*
  **nnictl log stdout**

  *
    Description

    Show the stdout log content.

  *
    Usage

    .. code-block:: bash

       nnictl log stdout [options]

  *
    Options:

    +-------------------+-----------+-----------+-----------------------------------------------+
    | Name, shorthand   | Required  | Default   | Description                                   |
    +===================+===========+===========+===============================================+
    | id                | False     |           | ID of the experiment you want to set          |
    +-------------------+-----------+-----------+-----------------------------------------------+
    | --head, -h        | False     |           | show head lines of stdout                     |
    +-------------------+-----------+-----------+-----------------------------------------------+
    | --tail, -t        | False     |           | show tail lines of stdout                     |
    +-------------------+-----------+-----------+-----------------------------------------------+
    | --path, -p        | False     |           | show the path of the stdout file              |
    +-------------------+-----------+-----------+-----------------------------------------------+
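
    For example, to print only the most recent stdout lines (this sketch assumes ``--tail`` accepts a line count):

    .. code-block:: bash

       nnictl log stdout --tail 20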

*
  **nnictl log stderr**

  *
    Description

    Show the stderr log content.

  *
    Usage

    .. code-block:: bash

       nnictl log stderr [options]

  *
    Options:

    +-------------------+-----------+-----------+-----------------------------------------------+
    | Name, shorthand   | Required  | Default   | Description                                   |
    +===================+===========+===========+===============================================+
    | id                | False     |           | ID of the experiment you want to set          |
    +-------------------+-----------+-----------+-----------------------------------------------+
    | --head, -h        | False     |           | show head lines of stderr                     |
    +-------------------+-----------+-----------+-----------------------------------------------+
    | --tail, -t        | False     |           | show tail lines of stderr                     |
    +-------------------+-----------+-----------+-----------------------------------------------+
    | --path, -p        | False     |           | show the path of the stderr file              |
    +-------------------+-----------+-----------+-----------------------------------------------+

*
  **nnictl log trial**

  *
    Description

    Show the trial log path.

  *
    Usage

    .. code-block:: bash

       nnictl log trial [options]

  *
    Options:

    +-------------------+-----------+-----------+-----------------------------------------------+
    | Name, shorthand   | Required  | Default   | Description                                   |
    +===================+===========+===========+===============================================+
    | id                | False     |           | the id of the trial                           |
    +-------------------+-----------+-----------+-----------------------------------------------+

:raw-html-m2r:`<a name="webui"></a>`

Manage webui
^^^^^^^^^^^^

*
  **nnictl webui url**

  *
    Description

    Show the urls of the experiment.

  *
    Usage

    .. code-block:: bash

       nnictl webui url

  *
    Options:

    +-------------------+-----------+-----------+-----------------------------------------------+
    | Name, shorthand   | Required  | Default   | Description                                   |
    +===================+===========+===========+===============================================+
    | id                | False     |           | ID of the experiment you want to set          |
    +-------------------+-----------+-----------+-----------------------------------------------+

:raw-html-m2r:`<a name="tensorboard"></a>`

Manage tensorboard
^^^^^^^^^^^^^^^^^^

*
  **nnictl tensorboard start**

  *
    Description

    Start the tensorboard process.

  *
    Usage

    .. code-block:: bash

       nnictl tensorboard start

  *
    Options:

    +-------------------+-----------+-----------+-----------------------------------------------+
    | Name, shorthand   | Required  | Default   | Description                                   |
    +===================+===========+===========+===============================================+
    | id                | False     |           | ID of the experiment you want to set          |
    +-------------------+-----------+-----------+-----------------------------------------------+
    | --trialid         | False     |           | ID of the trial                               |
    +-------------------+-----------+-----------+-----------------------------------------------+
    | --port            | False     | 6006      | the port of the tensorboard process           |
    +-------------------+-----------+-----------+-----------------------------------------------+

  *
    Detail

    #. nnictl supports the tensorboard function on local and remote platforms for the moment; other platforms will be supported later.
    #. If you want to use tensorboard, you need to write your tensorboard log data under the path given by the environment variable [NNI_OUTPUT_DIR].
    #. In local mode, nnictl will set --logdir=[NNI_OUTPUT_DIR] directly and start a tensorboard process.
    #. In remote mode, nnictl will first create an ssh client to copy the log data from the remote machine to a local temp directory, and then start a tensorboard process on your local machine. Note that nnictl only copies the log data once when you run the command; if you want to see later tensorboard results, you should execute the nnictl tensorboard command again.
    #. If there is only one trial job, you don't need to set trialid. If there are multiple trial jobs running, you should set the trialid, or you could use [nnictl tensorboard start --trialid all] to map --logdir to all trial log paths (see the example below).
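
    For example, with multiple trials running, the following starts one tensorboard process over all trial log paths on a non-default port:

    .. code-block:: bash

       nnictl tensorboard start --trialid all --port 6007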

*
  **nnictl tensorboard stop**

  *
    Description

    Stop all tensorboard processes.

  *
    Usage

    .. code-block:: bash

       nnictl tensorboard stop

  *
    Options:

    +-------------------+-----------+-----------+-----------------------------------------------+
    | Name, shorthand   | Required  | Default   | Description                                   |
    +===================+===========+===========+===============================================+
    | id                | False     |           | ID of the experiment                          |
    +-------------------+-----------+-----------+-----------------------------------------------+

:raw-html-m2r:`<a name="package"></a>`

Manage package
^^^^^^^^^^^^^^

*
  **nnictl package install**

  *
    Description

    Install the packages needed in NNI experiments.

  *
    Usage

    .. code-block:: bash

       nnictl package install [OPTIONS]

  *
    Options:

    +-------------------+-----------+-----------+-----------------------------------------------+
    | Name, shorthand   | Required  | Default   | Description                                   |
    +===================+===========+===========+===============================================+
    | --name            | True      |           | the name of the package to be installed       |
    +-------------------+-----------+-----------+-----------------------------------------------+
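
    For example, to install the dependencies of the SMAC tuner (assuming SMAC is among the packages listed by ``nnictl package show``):

    .. code-block:: bash

       nnictl package install --name SMAC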

*
  **nnictl package show**

  *
    Description

    List the packages supported.

  *
    Usage

    .. code-block:: bash

       nnictl package show

@ -8,43 +8,42 @@ __nnictl__ is a command line tool, which can be used to control experiments, suc

## Commands

nnictl supports the following commands:

- [nnictl create](#create)
- [nnictl resume](#resume)
- [nnictl stop](#stop)
- [nnictl update](#update)
- [nnictl trial](#trial)
- [nnictl top](#top)
- [nnictl experiment](#experiment)
- [nnictl config](#config)
- [nnictl log](#log)
- [nnictl webui](#webui)
- [nnictl tensorboard](#tensorboard)
- [nnictl package](#package)

```bash
nnictl create
nnictl stop
nnictl update
nnictl resume
nnictl trial
nnictl experiment
nnictl config
nnictl log
nnictl webui
nnictl tensorboard
nnictl top
nnictl --version
```

### Manage an experiment

<a name="create"></a>
* __nnictl create__

  * Description

    You can use this command to create a new experiment, using the configuration specified in the config file.
    After this command is successfully done, the context will be set to this experiment,
    which means the following commands you issue are associated with this experiment,
    unless you explicitly change the context (not supported yet).

  * Usage

    ```bash
    nnictl create [OPTIONS]
    ```

    Options:

    | Name, shorthand | Required|Default | Description |
    | ------ | ------ | ------ |------ |
    | --config, -c| True| |YAML configure file of the experiment|
    | --port, -p | False| |the port of the restful server|
    | --debug, -d | False| |set log level to debug|

<a name="resume"></a>
* __nnictl resume__

  * Description

@ -63,8 +62,10 @@ nnictl --version
    | ------ | ------ | ------ |------ |
    | id| False| |The id of the experiment you want to resume|
    | --port, -p| False| |Rest port of the experiment you want to resume|
    | --debug, -d | False| |Set log level to debug|

<a name="stop"></a>
* __nnictl stop__

  * Description

@ -77,92 +78,82 @@ nnictl --version
    ```

  * Detail

    1. If there is an id specified, and the id matches the running experiment, nnictl will stop the corresponding experiment; otherwise it will print an error message.

    2. If there is no id specified and there is an experiment running, stop the running experiment; otherwise print an error message.

    3. If the id ends with *, nnictl will stop all experiments whose ids match the regular expression.

    4. If the id does not exist but matches the prefix of an experiment id, nnictl will stop the matched experiment.

    5. If the id does not exist but matches the prefixes of multiple experiment ids, nnictl will give id information.

    6. Users could use 'nnictl stop all' to stop all experiments.

<a name="update"></a>
* __nnictl update__

  * __nnictl update searchspace__

    * Description

      You can use this command to update an experiment's search space.

    * Usage

      ```bash
      nnictl update searchspace [OPTIONS]
      ```

      Options:

      | Name, shorthand | Required|Default | Description |
      | ------ | ------ | ------ |------ |
      | id| False| |ID of the experiment you want to set|
      | --filename, -f| True| |the file storing your new search space|

  * __nnictl update concurrency__

    * Description

      You can use this command to update an experiment's concurrency.

    * Usage

      ```bash
      nnictl update concurrency [OPTIONS]
      ```

      Options:

      | Name, shorthand | Required|Default | Description |
      | ------ | ------ | ------ |------ |
      | id| False| |ID of the experiment you want to set|
      | --value, -v| True| |the number of allowed concurrent trials|

  * __nnictl update duration__

    * Description

      You can use this command to update an experiment's duration.

    * Usage

      ```bash
      nnictl update duration [OPTIONS]
      ```

      Options:

      | Name, shorthand | Required|Default | Description |
      | ------ | ------ | ------ |------ |
      | id| False| |ID of the experiment you want to set|
      | --value, -v| True| |the experiment duration will be NUMBER seconds. SUFFIX may be 's' for seconds (the default), 'm' for minutes, 'h' for hours or 'd' for days.|

  * __nnictl update trialnum__

    * Description

      You can use this command to update an experiment's maxtrialnum.

    * Usage

      ```bash
      nnictl update trialnum [OPTIONS]
      ```

      Options:

      | Name, shorthand | Required|Default | Description |
      | ------ | ------ | ------ |------ |
      | id| False| |ID of the experiment you want to set|
      | --value, -v| True| |the new number of maxtrialnum you want to set|

<a name="trial"></a>
* __nnictl trial__

  * __nnictl trial ls__

@ -184,41 +175,39 @@ nnictl --version
      | id| False| |ID of the experiment you want to set|

  * __nnictl trial kill__

    * Description

      You can use this command to kill a trial job.

    * Usage

      ```bash
      nnictl trial kill [OPTIONS]
      ```

      Options:

      | Name, shorthand | Required|Default | Description |
      | ------ | ------ | ------ |------ |
      | id| False| |ID of the experiment you want to set|
      | --trialid, -t| True| |ID of the trial you want to kill.|

<a name="top"></a>
* __nnictl top__

  * Description

    Monitor all running experiments.

  * Usage

    ```bash
    nnictl top
    ```

    Options:

    | Name, shorthand | Required|Default | Description |
    | ------ | ------ | ------ |------ |
    | id| False| |ID of the experiment you want to set|
    | --time, -t| False| |the interval to update the experiment status; the unit is seconds, and the default value is 3 seconds.|

<a name="experiment"></a>
### Manage experiment information

* __nnictl experiment show__

@ -268,23 +257,18 @@ nnictl --version
    nnictl experiment list
    ```

    Options:

    | Name, shorthand | Required|Default | Description |
    | ------ | ------ | ------ |------ |
    | all| False| False|show all experiments, including stopped experiments.|

<a name="config"></a>
* __nnictl config show__

  * Description

    Display the current context information.

  * Usage

    ```bash
    nnictl config show
    ```

<a name="log"></a>
### Manage log

* __nnictl log stdout__

@ -335,36 +319,12 @@ nnictl --version

  * Usage

    ```bash
    nnictl log trial [options]
    ```

    Options:

    | Name, shorthand | Required|Default | Description |
    | ------ | ------ | ------ |------ |
    | id| False| |the id of the trial|

<a name="webui"></a>
### Manage webui

* __nnictl webui url__

  * Description

    Show the urls of the experiment.

  * Usage

    ```bash
    nnictl webui url
    ```

    Options:

    | Name, shorthand | Required|Default | Description |
    | ------ | ------ | ------ |------ |
    | id| False| |ID of the experiment you want to set|

<a name="tensorboard"></a>
### Manage tensorboard

* __nnictl tensorboard start__

@ -396,32 +356,42 @@ nnictl --version
    5. If there is only one trial job, you don't need to set trialid. If there are multiple trial jobs running, you should set the trialid, or you could use [nnictl tensorboard start --trialid all] to map --logdir to all trial log paths.

* __nnictl tensorboard stop__

  * Description

    Stop all tensorboard processes.

  * Usage

    ```bash
    nnictl tensorboard stop
    ```

    Options:

    | Name, shorthand | Required|Default | Description |
    | ------ | ------ | ------ |------ |
    | id| False| |ID of the experiment you want to set|

<a name="package"></a>
### Manage package

* __nnictl package install__

  * Description

    Install the packages needed in NNI experiments.

  * Usage

    ```bash
    nnictl package install [OPTIONS]
    ```

    Options:

    | Name, shorthand | Required|Default | Description |
    | ------ | ------ | ------ |------ |
    | --name| True| |the name of the package to be installed|

* __nnictl package show__

  * Description

    List the packages supported.

  * Usage

    ```bash
    nnictl package show
    ```

### Check nni version

* __nnictl --version__

  * Description

    Describe the current version of nni installed.

  * Usage

    ```bash
    nnictl --version
    ```

@ -1,49 +1,61 @@
# Overview

NNI (Neural Network Intelligence) is a toolkit to help users design and tune machine learning models (e.g., hyperparameters), neural network architectures, or complex systems' parameters in an efficient and automatic way. NNI has several appealing properties: ease of use, scalability, flexibility, and efficiency.

* **Easy-to-use**: NNI can be easily installed through python pip. Only several lines need to be added to your code in order to use NNI's power. You can use both the command line tool and the WebUI to work with your experiments.
* **Scalability**: Tuning hyperparameters or the neural architecture often demands a large amount of computation resources, while NNI is designed to fully leverage different computation resources, such as remote machines and training platforms (e.g., PAI, Kubernetes). Thousands of trials can run in parallel, depending on the capacity of your configured training platforms.
* **Flexibility**: Besides rich built-in algorithms, NNI allows users to customize various hyperparameter tuning algorithms, neural architecture search algorithms, early stopping algorithms, etc. Users can also extend NNI with more training platforms, such as virtual machines or a kubernetes service in the cloud. Moreover, NNI can connect to external environments to tune special applications/models on them.
* **Efficiency**: We are intensively working on more efficient model tuning at both the system level and the algorithm level, for example, leveraging early feedback to speed up the tuning procedure.

The figure below shows the high-level architecture of NNI.

<p align="center">
<img src="https://user-images.githubusercontent.com/23273522/51816536-ed055580-2301-11e9-8ad8-605a79ee1b9a.png" alt="drawing" width="700"/>
</p>

## Key Concepts

* *Experiment*: An experiment is one task of, for example, finding out the best hyperparameters of a model or finding out the best neural network architecture. It consists of trials and AutoML algorithms.

* *Search Space*: The feasible region for tuning the model, for example, the value range of each hyperparameter.

* *Configuration*: A configuration is an instance from the search space, that is, each hyperparameter has a specific value.

* *Trial*: A trial is an individual attempt at applying a new configuration (e.g., a set of hyperparameter values, a specific neural architecture). Trial code should be able to run with the provided configuration.

* *Tuner*: A tuner is an AutoML algorithm, which generates a new configuration for the next try. A new trial will run with this configuration.

* *Assessor*: An assessor analyzes a trial's intermediate results (e.g., periodically evaluated accuracy on the test dataset) to tell whether this trial can be early stopped or not.

* *Training Platform*: Where trials are executed. Depending on your experiment's configuration, it could be your local machine, remote servers, or a large-scale training platform (e.g., PAI, Kubernetes).

Basically, an experiment runs as follows: the Tuner receives the search space and generates configurations. These configurations are submitted to training platforms, such as the local machine, remote machines, or training clusters. Their performances are reported back to the Tuner. Then, new configurations are generated and submitted.

For each experiment, the user only needs to define a search space and update a few lines of code, and then leverage NNI built-in Tuners/Assessors and training platforms to search for the best hyperparameters and/or neural architecture. There are basically 3 steps:

>Step 1: [Define search space](SearchSpaceSpec.md)

>Step 2: [Update model codes](Trials.md)

>Step 3: [Define Experiment](ExperimentConfig.md)

<p align="center">
<img src="https://user-images.githubusercontent.com/23273522/51816627-5d13db80-2302-11e9-8f3e-627e260203d5.jpg" alt="drawing"/>
</p>

After the user submits the experiment through the command line tool [nnictl](../tools/README.md), a daemon process (NNI manager) takes care of the search process. The NNI manager continuously gets search settings generated by the tuning algorithm, then asks the training service component to dispatch and run trial jobs in a targeted training environment (e.g., local machine, remote servers, or the cloud). The results of the trial jobs, such as model accuracy, are sent back to the tuning algorithm for generating more promising search settings. The NNI manager stops the search process after it finds the best models.

## Architecture Overview
<p align="center">
<img src="./img/nni_arch_overview.png" alt="drawing"/>
</p>

Users can use nnictl and/or a visualized Web UI, NNIBoard, to monitor and debug a given experiment.

NNI provides a set of examples in the package to get you familiar with the above process.

For more details about how to run an experiment, please refer to [Get Started](QuickStart.md).

## Learn More
* [Get started](QuickStart.md)
* [How to adapt your trial code on NNI?](Trials.md)
* [What are tuners supported by NNI?](Builtin_Tuner.md)
* [How to customize your own tuner?](Customize_Tuner.md)
* [What are assessors supported by NNI?](Builtin_Assessors.md)
* [How to customize your own assessor?](Customize_Assessor.md)
* [How to run an experiment on local?](tutorial_1_CR_exp_local_api.md)
* [How to run an experiment on multiple machines?](tutorial_2_RemoteMachineMode.md)
* [How to run an experiment on OpenPAI?](PAIMode.md)
* [Examples](mnist_examples.md)

[How to do trouble shooting when using NNI?]: <> ()

@ -0,0 +1,232 @@
# QuickStart

## Installation

We currently support Linux and macOS: Ubuntu 16.04 or higher and macOS 10.14.1 are tested and supported. Simply run the following `pip install` in an environment that has `python >= 3.5`.

```bash
python3 -m pip install --upgrade nni
```

Note:

* `--user` can be added if you want to install NNI in your home directory, which does not require any special privileges.
* If there is any error like `Segmentation fault`, please refer to [FAQ](FAQ.md)
* For the `system requirements` of NNI, please refer to [Install NNI](Installation.md)

## "Hello World" example on MNIST

NNI is a toolkit to help users run automated machine learning experiments. It can automatically run the cyclic process of getting hyperparameters, running trials, testing results, and tuning hyperparameters. Here, we show how to use NNI to help you find the optimal hyperparameters.

Here is an example script to train a CNN on the MNIST dataset **without NNI**:

```python
def run_trial(params):
    # Input data
    mnist = input_data.read_data_sets(params['data_dir'], one_hot=True)
    # Build MNIST network
    mnist_network = MnistNetwork(channel_1_num=params['channel_1_num'], channel_2_num=params['channel_2_num'], conv_size=params['conv_size'], hidden_size=params['hidden_size'], pool_size=params['pool_size'], learning_rate=params['learning_rate'])
    mnist_network.build_network()

    test_acc = 0.0
    with tf.Session() as sess:
        # Train MNIST network
        mnist_network.train(sess, mnist)
        # Evaluate MNIST network
        test_acc = mnist_network.evaluate(mnist)

if __name__ == '__main__':
    params = {'data_dir': '/tmp/tensorflow/mnist/input_data', 'dropout_rate': 0.5, 'channel_1_num': 32, 'channel_2_num': 64, 'conv_size': 5, 'pool_size': 2, 'hidden_size': 1024, 'learning_rate': 1e-4, 'batch_num': 2000, 'batch_size': 32}
    run_trial(params)
```

Note: If you want to see the full implementation, please refer to [examples/trials/mnist/mnist_before.py](../examples/trials/mnist/mnist_before.py)

The above code can only try one set of parameters at a time; if we want to tune the learning rate, we need to manually modify the hyperparameter and start the trial again and again.

NNI was born to help users with such tuning jobs. The NNI working process is presented below:

```
input: search space, trial code, config file
output: one optimal hyperparameter configuration

1: For t = 0, 1, 2, ..., maxTrialNum,
2:     hyperparameter = choose a set of parameters from the search space
3:     final result = run_trial_and_evaluate(hyperparameter)
4:     report the final result to NNI
5:     If the upper time limit is reached,
6:         Stop the experiment
7: return the hyperparameter value with the best final result
```

If you want to use NNI to automatically train your model and find the optimal hyper-parameters, you need to make three changes to your code:

**Three things required to do when using NNI**

**Step 1**: Provide a `Search Space` file in JSON, which includes the `name` and the `distribution` (discrete valued or continuous valued) of all the hyperparameters you need to search.

```diff
- params = {'data_dir': '/tmp/tensorflow/mnist/input_data', 'dropout_rate': 0.5, 'channel_1_num': 32, 'channel_2_num': 64,
-           'conv_size': 5, 'pool_size': 2, 'hidden_size': 1024, 'learning_rate': 1e-4, 'batch_num': 2000, 'batch_size': 32}
+ {
+     "dropout_rate":{"_type":"uniform","_value":[0.5, 0.9]},
+     "conv_size":{"_type":"choice","_value":[2,3,5,7]},
+     "hidden_size":{"_type":"choice","_value":[124, 512, 1024]},
+     "batch_size": {"_type":"choice", "_value": [1, 4, 8, 16, 32]},
+     "learning_rate":{"_type":"choice","_value":[0.0001, 0.001, 0.01, 0.1]}
+ }
```

*Implemented code directory: [search_space.json](../examples/trials/mnist/search_space.json)*

**Step 2**: Modify your `Trial` file to get the hyperparameter set from NNI and report the final result to NNI.

```diff
+ import nni

  def run_trial(params):
      mnist = input_data.read_data_sets(params['data_dir'], one_hot=True)

      mnist_network = MnistNetwork(channel_1_num=params['channel_1_num'], channel_2_num=params['channel_2_num'], conv_size=params['conv_size'], hidden_size=params['hidden_size'], pool_size=params['pool_size'], learning_rate=params['learning_rate'])
      mnist_network.build_network()

      with tf.Session() as sess:
          mnist_network.train(sess, mnist)
          test_acc = mnist_network.evaluate(mnist)
+         nni.report_final_result(test_acc)

  if __name__ == '__main__':
-     params = {'data_dir': '/tmp/tensorflow/mnist/input_data', 'dropout_rate': 0.5, 'channel_1_num': 32, 'channel_2_num': 64,
-               'conv_size': 5, 'pool_size': 2, 'hidden_size': 1024, 'learning_rate': 1e-4, 'batch_num': 2000, 'batch_size': 32}
+     params = nni.get_next_parameter()
      run_trial(params)
```

*Implemented code directory: [mnist.py](../examples/trials/mnist/mnist.py)*

**Step 3**: Define a `config` file in YAML, which declares the `path` to the search space and trial files. It also gives `other information` such as the tuning algorithm, the max trial number, and the max duration arguments.

```yaml
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
trainingServicePlatform: local
# The path to Search Space
searchSpacePath: search_space.json
useAnnotation: false
tuner:
  builtinTunerName: TPE
# The path and the running command of trial
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
```

*Implemented code directory: [config.yml](../examples/trials/mnist/config.yml)*

All the code above is already prepared and stored in [examples/trials/mnist/](../examples/trials/mnist).

When these things are done, **run the config.yml file from your command line to start the experiment**.

```bash
nnictl create --config nni/examples/trials/mnist/config.yml
```

Note: **nnictl** is a command line tool, which can be used to control experiments, such as start/stop/resume an experiment, start/stop NNIBoard, etc. Click [here](NNICTLDOC.md) for more usage of `nnictl`

Wait for the message `INFO: Successfully started experiment!` in the command line. This message indicates that your experiment has been successfully started. And this is what we expect to get:

```
INFO: Starting restful server...
INFO: Successfully started Restful server!
INFO: Setting local config...
INFO: Successfully set local config!
INFO: Starting experiment...
INFO: Successfully started experiment!
-----------------------------------------------------------------------
The experiment id is egchD4qy
The Web UI urls are: [Your IP]:8080
-----------------------------------------------------------------------

You can use these commands to get more information about the experiment
-----------------------------------------------------------------------
         commands                       description
1. nnictl experiment show        show the information of experiments
2. nnictl trial ls               list all of trial jobs
3. nnictl top                    monitor the status of running experiments
4. nnictl log stderr             show stderr log content
5. nnictl log stdout             show stdout log content
6. nnictl stop                   stop an experiment
7. nnictl trial kill             kill a trial job by id
8. nnictl --help                 get help information about nnictl
-----------------------------------------------------------------------
```

If you prepare the `trial`, `search space` and `config` according to the above steps and successfully create an NNI job, NNI will automatically tune the optimal hyper-parameters and run a different hyper-parameter set for each trial according to the requirements you set. You can clearly see its progress on the NNI WebUI.

## WebUI

After you start your experiment in NNI successfully, you can find a message in the command-line interface that tells you the `Web UI url`, like this:

```
The Web UI urls are: [Your IP]:8080
```

Open the `Web UI url` (here, `[Your IP]:8080`) in your browser to view detailed information about the experiment and all the submitted trial jobs, as shown below.

### View summary page

Click the tab "Overview".

Information about this experiment will be shown in the WebUI, including the experiment trial profile and search space message. NNI also supports downloading this information and the parameters through the **Download** button. You can download the experiment results at any point while it is running, or at the end of the execution.

![](./img/QuickStart1.png)

The top 10 trials will be listed on the Overview page; you can browse all the trials on the "Trials Detail" page.

![](./img/QuickStart2.png)

### View trials detail page

Click the tab "Default Metric" to see the point graph of all trials. Hover over a point to see its specific default metric and search space message.

![](./img/QuickStart3.png)

Click the tab "Hyper Parameter" to see the parallel graph.

* You can select the percentage to see the top trials.
* Choose two axes to swap their positions.

![](./img/QuickStart4.png)

Click the tab "Trial Duration" to see the bar graph.

![](./img/QuickStart5.png)

Below is the status of all trials. Specifically:

* Trial detail: the trial's id, duration, start time, end time, status, accuracy, and search space file.
* If you run an experiment on OpenPAI, you can also see the hdfsLogPath.
* Kill: you can kill a job whose status is running.
* Support searching for a specific trial.

![](./img/QuickStart6.png)

* Intermediate Result Graph

![](./img/QuickStart7.png)

## Related Topics

* [Try different Tuners](Builtin_Tuner.md)
* [Try different Assessors](Builtin_Assessors.md)
* [How to use command line tool nnictl](NNICTLDOC.md)
* [How to write a trial](Trials.md)
* [How to run an experiment on local (with multiple GPUs)?](tutorial_1_CR_exp_local_api.md)
* [How to run an experiment on multiple machines?](RemoteMachineMode.md)
* [How to run an experiment on OpenPAI?](PAIMode.md)
* [How to run an experiment on Kubernetes through Kubeflow?](KubeflowMode.md)
* [How to run an experiment on Kubernetes through FrameworkController?](FrameworkControllerMode.md)

@ -0,0 +1,133 @@
# Overview

<p align="center">
<img src="https://raw.githubusercontent.com/Microsoft/nni/master/docs/img/nni_logo.png" alt="drawing" width="300"/>
</p>

-----------

[![](https://img.shields.io/badge/license-MIT-yellow.svg)](https://github.com/Microsoft/nni/blob/master/LICENSE)
[![](https://msrasrg.visualstudio.com/NNIOpenSource/_apis/build/status/Microsoft.nni)](https://msrasrg.visualstudio.com/NNIOpenSource/_build/latest?definitionId=6)
[![](https://img.shields.io/github/issues-raw/Microsoft/nni.svg)](https://github.com/Microsoft/nni/issues?q=is%3Aissue+is%3Aopen)
[![](https://img.shields.io/github/issues/Microsoft/nni/bug.svg)](https://github.com/Microsoft/nni/issues?q=is%3Aissue+is%3Aopen+label%3Abug)
[![](https://img.shields.io/github/issues-pr-raw/Microsoft/nni.svg)](https://github.com/Microsoft/nni/pulls?q=is%3Apr+is%3Aopen)
[![](https://img.shields.io/github/release/Microsoft/nni.svg)](https://github.com/Microsoft/nni/releases)
[![](https://badges.gitter.im/Microsoft/nni.svg)](https://gitter.im/Microsoft/nni?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

NNI (Neural Network Intelligence) is a toolkit to help users run automated machine learning (AutoML) experiments.
The tool dispatches and runs trial jobs generated by tuning algorithms to search for the best neural architecture and/or hyper-parameters in different environments like local machines, remote servers and the cloud.

![](img/nni_arch_overview.png)

## **Who should consider using NNI**
* Those who want to try different AutoML algorithms in their training code (model) on their local machine.
* Those who want to run AutoML trial jobs in different environments to speed up the search (e.g. remote servers and the cloud).
* Researchers and data scientists who want to implement their own AutoML algorithms and compare them with other algorithms.
* ML platform owners who want to support AutoML in their platform.

## **Install & Verify**

**Install through pip**
* We currently support Linux and macOS: Ubuntu 16.04 or higher, along with macOS 10.14.1, are tested and supported. Simply run the following `pip install` in an environment that has `python >= 3.5`.
```bash
python3 -m pip install --user --upgrade nni
```
* Note:
  * If you are in a docker container (as root), please remove `--user` from the installation command.
  * If there is any error like `Segmentation fault`, please refer to [FAQ](FAQ.md)

**Install through source code**
* We support Linux (Ubuntu 16.04 or higher) and macOS (10.14.1) in our current stage.
* Run the following commands in an environment that has `python >= 3.5`, `git` and `wget`.
```bash
git clone -b v0.4.1 https://github.com/Microsoft/nni.git
cd nni
source install.sh
```

For the system requirements of NNI, please refer to [Install NNI](Installation.md)

**Verify install**

The following example is an experiment built on TensorFlow. Make sure you have **TensorFlow installed** before running it.
* Download the examples by cloning the source code.
```bash
git clone -b v0.4.1 https://github.com/Microsoft/nni.git
```
* Run the mnist example.
```bash
nnictl create --config nni/examples/trials/mnist/config.yml
```

* Wait for the message `INFO: Successfully started experiment!` in the command line. This message indicates that your experiment has been successfully started. You can explore the experiment using the `Web UI url`.
```
INFO: Starting restful server...
INFO: Successfully started Restful server!
INFO: Setting local config...
INFO: Successfully set local config!
INFO: Starting experiment...
INFO: Successfully started experiment!
-----------------------------------------------------------------------
The experiment id is egchD4qy
The Web UI urls are: http://223.255.255.1:8080   http://127.0.0.1:8080
-----------------------------------------------------------------------

You can use these commands to get more information about the experiment
-----------------------------------------------------------------------
         commands                       description
1. nnictl experiment show        show the information of experiments
2. nnictl trial ls               list all of trial jobs
3. nnictl log stderr             show stderr log content
4. nnictl log stdout             show stdout log content
5. nnictl stop                   stop an experiment
6. nnictl trial kill             kill a trial job by id
7. nnictl --help                 get help information about nnictl
-----------------------------------------------------------------------
```

* Open the `Web UI url` in your browser; you can view detailed information about the experiment and all the submitted trial jobs as shown below. [Here](WebUI.md) are more Web UI pages.

<table style="border: none">
<th><img src="https://raw.githubusercontent.com/Microsoft/nni/dev-doc/docs/img/webui_overview_page.png" alt="drawing" width="395"/></th>
<th><img src="https://raw.githubusercontent.com/Microsoft/nni/dev-doc/docs/img/webui_trialdetail_page.png" alt="drawing" width="410"/></th>
</table>

## **Documentation**
* [NNI overview](Overview.md)
* [Quick start](GetStarted.md)

## **How to**
* [Install NNI](Installation.md)
* [Use command line tool nnictl](NNICTLDOC.md)
* [Use NNIBoard](WebUI.md)
* [How to define a search space](SearchSpaceSpec.md)
* [How to define a trial](howto_1_WriteTrial.md)
* [Config an experiment](ExperimentConfig.md)
* [How to use annotation](howto_1_WriteTrial.md#nni-python-annotation)

## **Tutorials**
* [Run an experiment on local (with multiple GPUs)?](tutorial_1_CR_exp_local_api.md)
* [Run an experiment on multiple machines?](tutorial_2_RemoteMachineMode.md)
* [Run an experiment on OpenPAI?](PAIMode.md)
* [Run an experiment on Kubeflow?](KubeflowMode.md)
* [Try different tuners and assessors](tutorial_3_tryTunersAndAssessors.md)
* [Implement a customized tuner](howto_2_CustomizedTuner.md)
* [Implement a customized assessor](../examples/assessors/README.md)
* [Use the Genetic Algorithm to find good model architectures for the Reading Comprehension task](../examples/trials/ga_squad/README.md)

## **Contribute**
This project welcomes contributions and suggestions; we use [GitHub issues](https://github.com/Microsoft/nni/issues) to track requests and bugs.

Issues with the **good first issue** label are simple and easy-to-start ones that we recommend new contributors start with.

To set up the environment for NNI development, refer to the instructions: [Set up NNI developer environment](SetupNNIDeveloperEnvironment.md)

Before you start coding, review and get familiar with the NNI code contribution guideline: [Contributing](CONTRIBUTING.md)

We are working on the instructions for [How to Debug](HowToDebug.md); you are also welcome to contribute questions or suggestions in this area.

## **License**
The entire codebase is under the [MIT license](https://github.com/Microsoft/nni/blob/master/LICENSE)

@ -1,39 +1,41 @@
|
|||
# Release 0.5.0 - 01/14/2019
|
||||
## Major Features
|
||||
### New tuner and assessor supports
|
||||
# ChangeLog
|
||||
|
||||
## Release 0.5.0 - 01/14/2019
|
||||
### Major Features
|
||||
#### New tuner and assessor supports
|
||||
* Support [Metis tuner](./HowToChooseTuner.md#MetisTuner) as a new NNI tuner. Metis algorithm has been proofed to be well performed for **online** hyper-parameter tuning.
|
||||
* Support [ENAS customized tuner](https://github.com/countif/enas_nni), a tuner contributed by github community user, is an algorithm for neural network search, it could learn neural network architecture via reinforcement learning and serve a better performance than NAS.
|
||||
* Support [Curve fitting assessor](./HowToChooseTuner.md#Curvefitting) for early stop policy using learning curve extrapolation.
|
||||
* Advanced Support of [Weight Sharing](./AdvancedNAS.md): Enable weight sharing for NAS tuners, currently through NFS.
|
||||
|
||||
|
||||
### Training Service Enhancement
|
||||
#### Training Service Enhancement
|
||||
* [FrameworkController Training service](./FrameworkControllerMode.md): Support run experiments using frameworkcontroller on kubernetes
|
||||
* FrameworkController is a Controller on kubernetes that is general enough to run (distributed) jobs with various machine learning frameworks, such as tensorflow, pytorch, MXNet.
|
||||
* NNI provides unified and simple specification for job definition.
|
||||
* MNIST example for how to use FrameworkController.
|
||||
|
||||
### User Experience improvements
|
||||
#### User Experience improvements
|
||||
* A better trial logging support for NNI experiments in PAI, Kubeflow and FrameworkController mode:
|
||||
* An improved logging architecture to send stdout/stderr of trials to NNI manager via Http post. NNI manager will store trial's stdout/stderr messages in local log file.
|
||||
* Show the link for trial log file on WebUI.
|
||||
* Support to show final result's all key-value pairs.
|
||||
|
||||
# Release 0.4.1 - 12/14/2018
|
||||
## Major Features
|
||||
### New tuner supports
|
||||
## Release 0.4.1 - 12/14/2018
|
||||
### Major Features
|
||||
#### New tuner supports
|
||||
* Support [network morphism](./HowToChooseTuner.md#NetworkMorphism) as a new tuner
|
||||
|
||||
### Training Service improvements
|
||||
#### Training Service improvements
|
||||
* Migrate [Kubeflow training service](https://github.com/Microsoft/nni/blob/master/docs/KubeflowMode.md)'s dependency from kubectl CLI to [Kubernetes API](https://kubernetes.io/docs/concepts/overview/kubernetes-api/) client
|
||||
* [Pytorch-operator](https://github.com/kubeflow/pytorch-operator) support for Kubeflow training service
|
||||
* Improvement on local code files uploading to OpenPAI HDFS
|
||||
* Fixed OpenPAI integration WebUI bug: WebUI doesn't show latest trial job status, which is caused by OpenPAI token expiration
|
||||
|
||||
### NNICTL improvements
|
||||
#### NNICTL improvements
|
||||
* Show version information both in nnictl and WebUI. You can run **nnictl -v** to show your current installed NNI version
|
||||
|
||||
### WebUI improvements
|
||||
#### WebUI improvements
|
||||
* Enable modify concurrency number during experiment
|
||||
* Add feedback link to NNI github 'create issue' page
|
||||
* Enable customize top 10 trials regarding to metric numbers (largest or smallest)
|
||||
|
@ -41,14 +43,14 @@
|
|||
* Enable automatic scaling of axes for metric number
|
||||
* Update annotation to support displaying real choice in searchspace
|
||||
|
||||
## New examples
|
||||
### New examples
|
||||
* [FashionMnist](https://github.com/Microsoft/nni/tree/master/examples/trials/network_morphism), work together with network morphism tuner
|
||||
* [Distributed MNIST example](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-distributed-pytorch) written in PyTorch
|
||||
|
||||
|
||||
# Release 0.4 - 12/6/2018
|
||||
## Release 0.4 - 12/6/2018
|
||||
|
||||
## Major Features
|
||||
### Major Features
|
||||
* [Kubeflow Training service](./KubeflowMode.md)
|
||||
* Support tf-operator
|
||||
* [Distributed trial example](../examples/trials/mnist-distributed/dist_mnist.py) on Kubeflow
|
||||
|
@ -63,7 +65,7 @@
|
|||
* Support search a specific trial by trial number
|
||||
* Show trial's hdfsLogPath
|
||||
* Download experiment parameters
|
||||
## Others
|
||||
### Others
|
||||
* Asynchronous dispatcher
|
||||
* Docker file update, add pytorch library
|
||||
* Refactor 'nnictl stop' process, send SIGTERM to nni manager process, rather than calling stop Rest API.
|
||||
|
@ -73,8 +75,8 @@
|
|||
* Don’t print useless ‘metrics is empty’ log int PAI job’s stdout. Only print useful message once new metrics are recorded, to reduce confusion when user checks PAI trial’s output for debugging purpose
|
||||
* Add timestamp at the beginning of each log entry in trial keeper.
|
||||
|
||||
## Release 0.3.0 - 11/2/2018

### NNICTL new features and updates

* Support running multiple experiments simultaneously.

  Before v0.3, NNI only supported running a single experiment at a time. After this release, users are able to run multiple experiments simultaneously. Each experiment requires a unique port; the first experiment uses the default port as in previous versions. You can specify a unique port for the remaining experiments as below:
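  For instance — assuming the `--port` option of `nnictl create` documented in the NNICTL spec — a second experiment can be started on its own port like this:

  ```bash
  nnictl create --port 8081 --config ~/nni/examples/trials/mnist/config.yml
  ```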
@ -83,7 +85,7 @@

* Support updating the max trial number.

  Use ```nnictl update --help``` to learn more, or refer to [NNICTL Spec](https://github.com/Microsoft/nni/blob/master/docs/NNICTLDOC.md) for the full usage of NNICTL.
### API new features and updates

* <span style="color:red">**breaking change**</span>: nni.get_parameters() is refactored to nni.get_next_parameter. All examples from prior releases cannot run on v0.3; please clone the nni repo to get the new examples. If you have applied NNI to your own code, please update the API accordingly.

* New API **nni.get_sequence_id()**.
@ -96,23 +98,23 @@

  * float
  * A python dict containing a 'default' key, whose value should be of type int or float. The dict can contain any other key-value pairs.

### New tuner support

* **Batch Tuner**, which iterates over all parameter combinations; it can be used to submit batch trial jobs.
### New examples

* An NNI Docker image for public use:

  ```docker pull msranni/nni:latest```

* New trial example: [NNI Sklearn Example](https://github.com/Microsoft/nni/tree/master/examples/trials/sklearn)
* New competition example: [Kaggle Competition TGS Salt Example](https://github.com/Microsoft/nni/tree/master/examples/trials/kaggle-tgs-salt)
### Others

* UI refactoring; refer to the [WebUI doc](WebUI.md) for how to work with the new UI.
* Continuous Integration: NNI has switched to Azure Pipelines
* [Known Issues in release 0.3.0](https://github.com/Microsoft/nni/labels/nni030knownissues).
## Release 0.2.0 - 9/29/2018

### Major Features

* Support [OpenPAI](https://github.com/Microsoft/pai) (aka pai) Training Service (see [here](./PAIMode.md) for instructions on how to submit an NNI job in pai mode)
  * Support training services on pai mode. NNI trials will be scheduled to run on the OpenPAI cluster
  * NNI trial's output (including logs and model file) will be copied to OpenPAI HDFS for further debugging and checking

@ -123,14 +125,14 @@
* Update ga squad example and related documentation
* WebUI UX small enhancements and bug fixes

### Known Issues

[Known Issues in release 0.2.0](https://github.com/Microsoft/nni/labels/nni020knownissues).
## Release 0.1.0 - 9/10/2018 (initial release)

Initial release of Neural Network Intelligence (NNI).

### Major Features

* Installation and Deployment
  * Support pip install and source code install
  * Support training services on local mode (including multi-GPU mode) as well as multi-machine mode
@ -147,5 +149,5 @@ Initial release of Neural Network Intelligence (NNI).

* Others
  * Support simple GPU job scheduling

### Known Issues

[Known Issues in release 0.1.0](https://github.com/Microsoft/nni/labels/nni010knownissues).
@ -0,0 +1,11 @@
References
==================

.. toctree::
   :maxdepth: 3

   Command Line <NNICTL>
   Python API <sdk_reference>
   Annotation <AnnotationSpec>
   Configuration <ExperimentConfig>
   Search Space <SearchSpaceSpec>
@ -1,6 +1,4 @@

# Run an Experiment on Multiple Machines

NNI supports running an experiment on multiple machines through the SSH channel, called `remote` mode. NNI assumes that you have access to those machines and have already set up the environment for running deep learning training code.
@ -16,14 +14,6 @@ e.g. Three machines and you login in with account `bob` (Note: the account is no

Install NNI on each of your machines following the install guide [here](GetStarted.md).

For remote machines that are used only to run trials but not nnictl, you can just install the python SDK:

* __Install python SDK through pip__

  ```bash
  python3 -m pip install --user --upgrade nni-sdk
  ```

## Run an experiment

Install NNI on another machine that has network access to the three machines above, or just use any machine above to run the nnictl command line tool.
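The experiment configuration then points at those machines through a `machineList` section. Below is a minimal sketch with placeholder values (field names as in the experiment configuration spec; adjust the IPs and credentials to your own setup):

```yaml
machineList:
  - ip: 10.1.1.1
    username: bob
    passwd: bob123
  - ip: 10.1.1.2
    username: bob
    passwd: bob123
```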
@ -0,0 +1,308 @@

# Automatic Model Architecture Search for Reading Comprehension

This example shows how to use a genetic algorithm to find good model architectures for reading comprehension.

## 1. Search Space

Since attention and recurrent neural networks (RNN) have been proven effective in reading comprehension, we define the search space as follows:

1. IDENTITY (effectively means keep training).
2. INSERT-RNN-LAYER (inserts an LSTM; comparing the performance of GRU and LSTM in our experiment, we decided to use LSTM here.)
3. REMOVE-RNN-LAYER
4. INSERT-ATTENTION-LAYER (inserts an attention layer.)
5. REMOVE-ATTENTION-LAYER
6. ADD-SKIP (identity between random layers).
7. REMOVE-SKIP (removes a random skip).

![](../examples/trials/ga_squad/ga_squad.png)
### New version

We also have another version with lower time cost and better performance; it will be released soon.
## 2. How to run this example locally

### 2.1 Use the download script to fetch the data

Execute the following commands to download the needed files using the download script:

```bash
chmod +x ./download.sh
./download.sh
```

Or download manually:

1. Download "dev-v1.1.json" and "train-v1.1.json" from https://rajpurkar.github.io/SQuAD-explorer/

   ```bash
   wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
   wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
   ```

2. Download "glove.840B.300d.txt" from https://nlp.stanford.edu/projects/glove/

   ```bash
   wget http://nlp.stanford.edu/data/glove.840B.300d.zip
   unzip glove.840B.300d.zip
   ```
### 2.2 Update the configuration

Modify `nni/examples/trials/ga_squad/config.yml`; here is the default configuration:

```yaml
authorName: default
experimentName: example_ga_squad
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 1
#choice: local, remote
trainingServicePlatform: local
#choice: true, false
useAnnotation: false
tuner:
  codeDir: ~/nni/examples/tuners/ga_customer_tuner
  classFileName: customer_tuner.py
  className: CustomerTuner
  classArgs:
    optimize_mode: maximize
trial:
  command: python3 trial.py
  codeDir: ~/nni/examples/trials/ga_squad
  gpuNum: 0
```

In the "trial" part, if you want to use a GPU to perform the architecture search, change `gpuNum` from `0` to `1`. You need to increase `maxTrialNum` and `maxExecDuration` according to how long you want to wait for the search result.
### 2.3 Submit the job

```bash
nnictl create --config ~/nni/examples/trials/ga_squad/config.yml
```
## 3. Run this example on OpenPAI

Due to the upload size limitation, we only upload the source code; the data download and training are completed on OpenPAI. This experiment requires sufficient memory (`memoryMB >= 32G`), and the training may last several hours.
### 3.1 Update the configuration

Modify `nni/examples/trials/ga_squad/config_pai.yml`; here is the default configuration:

```yaml
authorName: default
experimentName: example_ga_squad
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: pai
#choice: true, false
useAnnotation: false
#Your nni_manager ip
nniManagerIp: 10.10.10.10
tuner:
  codeDir: ../../tuners/ga_customer_tuner
  classFileName: customer_tuner.py
  className: CustomerTuner
  classArgs:
    optimize_mode: maximize
trial:
  command: chmod +x ./download.sh && ./download.sh && python3 trial.py
  codeDir: .
  gpuNum: 0
  cpuNum: 1
  memoryMB: 32869
  #The docker image to run nni job on pai
  image: msranni/nni:latest
  #The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
  dataDir: hdfs://10.10.10.10:9000/username/nni
  #The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
  outputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
  #The username to login pai
  userName: username
  #The password to login pai
  passWord: password
  #The host of restful server of pai
  host: 10.10.10.10
```

Please change the default values to your personal account and machine information, including `nniManagerIp`, `dataDir`, `outputDir`, `userName`, `passWord` and `host`.

In the "trial" part, if you want to use a GPU to perform the architecture search, change `gpuNum` from `0` to `1`. You need to increase `maxTrialNum` and `maxExecDuration` according to how long you want to wait for the search result.

`trialConcurrency` is the number of trials running concurrently, which equals the number of GPUs you want to use if you set `gpuNum` to 1.
### 3.2 Submit the job

```bash
nnictl create --config ~/nni/examples/trials/ga_squad/config_pai.yml
```
## 4. Technical details about the trial

### 4.1 How does it work

The evolution-algorithm-based architecture search for question answering has two different parts, just like the other examples: the trial and the tuner.
### 4.2 The trial

The trial has many different files, functions and classes. Here we only give most of those files a brief introduction:

* `attention.py` contains an implementation of the attention mechanism in TensorFlow.
* `data.py` contains functions for data preprocessing.
* `evaluate.py` contains the evaluation script.
* `graph.py` contains the definition of the computation graph.
* `rnn.py` contains an implementation of GRU in TensorFlow.
* `train_model.py` is a wrapper for the whole question answering model.

Among those files, `trial.py` and `graph_to_tf.py` are special.

`graph_to_tf.py` has a function named `graph_to_network`; here is its skeleton code:
```python
def graph_to_network(input1,
                     input2,
                     input1_lengths,
                     input2_lengths,
                     graph,
                     dropout_rate,
                     is_training,
                     num_heads=1,
                     rnn_units=256):
    topology = graph.is_topology()
    layers = dict()
    layers_sequence_lengths = dict()
    num_units = input1.get_shape().as_list()[-1]
    layers[0] = input1 * tf.sqrt(tf.cast(num_units, tf.float32)) + \
        positional_encoding(input1, scale=False, zero_pad=False)
    layers[1] = input2 * tf.sqrt(tf.cast(num_units, tf.float32))
    layers[0] = dropout(layers[0], dropout_rate, is_training)
    layers[1] = dropout(layers[1], dropout_rate, is_training)
    layers_sequence_lengths[0] = input1_lengths
    layers_sequence_lengths[1] = input2_lengths
    for _, topo_i in enumerate(topology):
        if topo_i == '|':
            continue
        if graph.layers[topo_i].graph_type == LayerType.input.value:
            pass  # ......
        elif graph.layers[topo_i].graph_type == LayerType.attention.value:
            pass  # ......
        # More layers to handle
```
As we can see, this function is actually a compiler that converts the internal model DAG configuration (which will be introduced in the "Model configuration format" section), `graph`, into a TensorFlow computation graph.

```python
topology = graph.is_topology()
```

performs topological sorting on the internal graph representation, and the loop

```python
for _, topo_i in enumerate(topology):
```

performs the actual conversion that maps each layer to a part of the TensorFlow computation graph.
### 4.3 The tuner

The tuner is much simpler than the trial. They actually share the same `graph.py`. Besides, the tuner has a `customer_tuner.py`, whose most important class is `CustomerTuner`:
```python
class CustomerTuner(Tuner):
    # ......

    def generate_parameters(self, parameter_id):
        """Returns a set of trial graph config, as a serializable object.
        parameter_id : int
        """
        if len(self.population) <= 0:
            logger.debug("the length of the population is lower than zero.")
            raise Exception('The population is empty')
        pos = -1
        for i in range(len(self.population)):
            if self.population[i].result is None:
                pos = i
                break
        if pos != -1:
            indiv = copy.deepcopy(self.population[pos])
            self.population.pop(pos)
            temp = json.loads(graph_dumps(indiv.config))
        else:
            random.shuffle(self.population)
            if self.population[0].result > self.population[1].result:
                self.population[0] = self.population[1]
            indiv = copy.deepcopy(self.population[0])
            self.population.pop(1)
            indiv.mutation()
            graph = indiv.config
            temp = json.loads(graph_dumps(graph))

    # ......
```
As we can see, the overloaded method `generate_parameters` implements a fairly naive mutation algorithm. The lines

```python
if self.population[0].result > self.population[1].result:
    self.population[0] = self.population[1]
indiv = copy.deepcopy(self.population[0])
```

control the mutation process. The tuner always takes two random individuals from the population, keeping and mutating only the one with the better result, as sketched below.
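The following standalone sketch restates that tournament step outside the tuner class (a hypothetical helper, not part of the NNI SDK; it assumes individuals expose a numeric `result` and a `mutation()` method as in the code above):

```python
import copy
import random

def select_and_mutate(population):
    # pick two random individuals, drop the worse one,
    # and return a mutated copy of the better one
    random.shuffle(population)
    a, b = population[0], population[1]
    winner, loser = (a, b) if a.result >= b.result else (b, a)
    population.remove(loser)
    indiv = copy.deepcopy(winner)
    indiv.mutation()
    return indiv
```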
### 4.4 Model configuration format

Here is an example of the model configuration, which is passed from the tuner to the trial in the architecture search procedure.
```json
{
    "max_layer_num": 50,
    "layers": [
        {
            "input_size": 0,
            "type": 3,
            "output_size": 1,
            "input": [],
            "size": "x",
            "output": [4, 5],
            "is_delete": false
        },
        {
            "input_size": 0,
            "type": 3,
            "output_size": 1,
            "input": [],
            "size": "y",
            "output": [4, 5],
            "is_delete": false
        },
        {
            "input_size": 1,
            "type": 4,
            "output_size": 0,
            "input": [6],
            "size": "x",
            "output": [],
            "is_delete": false
        },
        {
            "input_size": 1,
            "type": 4,
            "output_size": 0,
            "input": [5],
            "size": "y",
            "output": [],
            "is_delete": false
        },
        {"Comment": "More layers will be here for actual graphs."}
    ]
}
```
Every model configuration has a "layers" section, which is a JSON list of layer definitions. The definition of each layer is also a JSON object, where:

* `type` is the type of the layer. 0, 1, 2, 3, 4 correspond to attention, self-attention, RNN, input and output layers, respectively.
* `size` is the length of the output. "x" and "y" correspond to document length and question length, respectively.
* `input_size` is the number of inputs the layer has.
* `input` is the indices of layers taken as input of this layer.
* `output` is the indices of layers that use this layer's output as their input.
* `is_delete` means whether the layer is still available.
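As a quick illustration of how these fields relate, the sketch below (a hypothetical helper, not part of the example's code) walks a complete configuration and checks that the `input`/`output` indices cross-reference each other consistently. Note that the truncated sample above would need its placeholder layers filled in before this check could pass:

```python
import json

def check_layers(config_json):
    """Sanity-check the cross-references in a model configuration."""
    layers = json.loads(config_json)["layers"]
    for i, layer in enumerate(layers):
        if layer.get("is_delete"):
            continue
        # `input_size` should match the number of input indices
        assert len(layer["input"]) == layer["input_size"]
        # every downstream layer should list this layer among its inputs
        for j in layer["output"]:
            assert i in layers[j]["input"]
```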
@ -1,8 +1,12 @@

# Search Space

## Overview

In NNI, the tuner samples parameters/architectures according to the search space, which is defined as a JSON file.

To define a search space, users should define the name of the variable, the type of sampling strategy, and its parameters.

* An example of a search space definition follows:
```python
{
    "dropout_rate": {"_type": "uniform", "_value": [0.1, 0.5]},
    ...
}
```

Take the first line as an example: ```dropout_rate``` is defined as a variable whose prior distribution is a uniform distribution over the range from ```0.1``` to ```0.5```.
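In other words, each time the tuner asks for a configuration, a concrete value is drawn from that distribution. A minimal sketch of what such a draw looks like (illustrative only, not NNI's actual sampler):

```python
import random

def sample_uniform(low, high):
    # e.g. dropout_rate: one draw from the uniform prior over [0.1, 0.5]
    return random.uniform(low, high)

print(sample_uniform(0.1, 0.5))
```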
## Types

All types of sampling strategies and their parameters are listed here:

* {"_type":"choice","_value":options}
  * This means the variable's value is one of the options, which should be a list. The elements of options can themselves be [nested] stochastic expressions. In this case, the stochastic choices that only appear in some of the options become conditional parameters.

@ -67,8 +72,24 @@ The candidate type and value for variable is here:

* Suitable for a discrete variable with respect to which the objective is smooth and gets smoother with the size of the variable, which is bounded from one side.
Note that SMAC only supports a subset of the types above: `choice`, `randint`, `uniform`, `loguniform`, and `quniform(q=1)`. In the current version, SMAC does not support cascaded search spaces (i.e., conditional variables in SMAC).
## Search Space Types Supported by Each Tuner

| | choice | randint | uniform | quniform | loguniform | qloguniform | normal | qnormal | lognormal | qlognormal |
|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| TPE Tuner | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Random Search Tuner | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Anneal Tuner | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Evolution Tuner | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| SMAC Tuner | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | |
| Batch Tuner | ✓ | | | | | | | | | |
| Grid Search Tuner | ✓ | | | ✓ | | ✓ | | | | |
| Hyperband Advisor | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Metis Tuner | ✓ | ✓ | ✓ | ✓ | | | | | | |

Note that in the Grid Search Tuner, for users' convenience, the definitions of `quniform` and `qloguniform` change: there, q specifies the number of values that will be sampled. Details about them are listed as follows:
* Type 'quniform' receives three values [low, high, q], where [low, high] specifies a range and 'q' specifies the number of values that will be sampled evenly. Note that q should be at least 2. The first sampled value is 'low', and each of the following values is (high-low)/q larger than the value before it, as illustrated in the sketch after this list.
* Type 'qloguniform' behaves like 'quniform' except that it first changes the range to [log(low), log(high)], samples, and then changes the sampled value back.
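A small sketch of that sampling rule (illustrative only, not the tuner's actual code):

```python
import math

def grid_quniform(low, high, q):
    # q evenly spaced values starting at `low`, step (high - low) / q
    step = (high - low) / q
    return [low + i * step for i in range(q)]

def grid_qloguniform(low, high, q):
    # sample evenly in log space, then map back
    return [math.exp(v) for v in grid_quniform(math.log(low), math.log(high), q)]

print(grid_quniform(0.0, 1.0, 4))   # [0.0, 0.25, 0.5, 0.75]
```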
Note that the Metis Tuner only supports numerical `choice` now.
@ -0,0 +1,135 @@

# Write a Trial Run on NNI

A **Trial** in NNI is an individual attempt at applying a configuration (e.g., a set of hyper-parameters) to a model.

To define an NNI trial, you need to first define the set of parameters (i.e., the search space) and then update the model. NNI provides two approaches for you to define a trial: [NNI API](#nni-api) and [NNI Python annotation](#nni-annotation). You can also refer to [here](#more-examples) for more trial examples.

<a name="nni-api"></a>
## NNI API

### Step 1 - Prepare a SearchSpace parameters file

An example is shown below:
```json
{
    "dropout_rate": {"_type": "uniform", "_value": [0.1, 0.5]},
    "conv_size": {"_type": "choice", "_value": [2, 3, 5, 7]},
    "hidden_size": {"_type": "choice", "_value": [124, 512, 1024]},
    "learning_rate": {"_type": "uniform", "_value": [0.0001, 0.1]}
}
```
Refer to [SearchSpaceSpec.md](./SearchSpaceSpec.md) to learn more about search spaces. The tuner will generate configurations from this search space; that is, it chooses a value for each hyperparameter from its range.
### Step 2 - Update model codes

- Import NNI

  Include `import nni` in your trial code to use NNI APIs.

- Get the configuration from the Tuner

  ```python
  RECEIVED_PARAMS = nni.get_next_parameter()
  ```

  `RECEIVED_PARAMS` is an object, for example:
  `{"conv_size": 2, "hidden_size": 124, "learning_rate": 0.0307, "dropout_rate": 0.2029}`.

- Report metric data periodically (optional)

  ```python
  nni.report_intermediate_result(metrics)
  ```

  `metrics` can be any python object. If users use an NNI built-in tuner/assessor, `metrics` can only have two formats: 1) a number, e.g., float or int; 2) a dict object that has a key named `default` whose value is a number. These `metrics` are reported to the [assessor](Builtin_Assessors.md). Usually, `metrics` is a periodically evaluated loss or accuracy.

- Report the performance of the configuration

  ```python
  nni.report_final_result(metrics)
  ```

  `metrics` can also be any python object. If users use an NNI built-in tuner/assessor, `metrics` follows the same format rule as in `report_intermediate_result`; the number indicates the model's performance, for example, the model's accuracy or loss. These `metrics` are reported to the [tuner](Builtin-Tuner.md).
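Putting these pieces together, a minimal trial skeleton might look like this (a sketch only; the training loop and metric values are placeholders):

```python
import nni

if __name__ == '__main__':
    params = nni.get_next_parameter()   # e.g. {"learning_rate": 0.0307, ...}
    val_acc = 0.0
    for epoch in range(10):
        # ... train one epoch using the received hyper-parameters ...
        val_acc = 0.1 * (epoch + 1)     # placeholder for a real validation metric
        nni.report_intermediate_result(val_acc)
    nni.report_final_result(val_acc)
```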
### Step 3 - Enable NNI API

To enable NNI API mode, you need to set useAnnotation to *false* and provide the path of the SearchSpace file (the one you defined in step 1):

```yaml
useAnnotation: false
searchSpacePath: /path/to/your/search_space.json
```

You can refer to [here](ExperimentConfig.md) for more information about how to set up experiment configurations.

Please refer to [here]() for more APIs (e.g., `nni.get_sequence_id()`) provided by NNI.
<a name="nni-annotation"></a>
|
||||
## NNI Python Annotation
|
||||
|
||||
An alternative to writing a trial is to use NNI's syntax for python. Simple as any annotation, NNI annotation is working like comments in your codes. You don't have to make structure or any other big changes to your existing codes. With a few lines of NNI annotation, you will be able to:
|
||||
|
||||
* annotate the variables you want to tune
|
||||
* specify in which range you want to tune the variables
|
||||
* annotate which variable you want to report as intermediate result to `assessor`
|
||||
* annotate which variable you want to report as the final result (e.g. model accuracy) to `tuner`.
|
||||
|
||||
Again, take MNIST as an example, it only requires 2 steps to write a trial with NNI Annotation.
|
||||
|
||||
### Step 1 - Update codes with annotations
|
||||
|
||||
The following is a tensorflow code snippet for NNI Annotation, where the highlighted four lines are annotations that help you to:
|
||||
1. tune batch\_size and dropout\_rate
|
||||
2. report test\_acc every 100 steps
|
||||
3. at last report test\_acc as final result.
|
||||
|
||||
What noteworthy is: as these newly added codes are annotations, it does not actually change your previous codes logic, you can still run your code as usual in environments without NNI installed.
|
||||
|
||||
```diff
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
+   """@nni.variable(nni.choice(50, 250, 500), name=batch_size)"""
    batch_size = 128
    for i in range(10000):
        batch = mnist.train.next_batch(batch_size)
+       """@nni.variable(nni.choice(1, 5), name=dropout_rate)"""
        dropout_rate = 0.5
        mnist_network.train_step.run(feed_dict={mnist_network.images: batch[0],
                                                mnist_network.labels: batch[1],
                                                mnist_network.keep_prob: dropout_rate})
        if i % 100 == 0:
            test_acc = mnist_network.accuracy.eval(
                feed_dict={mnist_network.images: mnist.test.images,
                           mnist_network.labels: mnist.test.labels,
                           mnist_network.keep_prob: 1.0})
+           """@nni.report_intermediate_result(test_acc)"""

    test_acc = mnist_network.accuracy.eval(
        feed_dict={mnist_network.images: mnist.test.images,
                   mnist_network.labels: mnist.test.labels,
                   mnist_network.keep_prob: 1.0})
+   """@nni.report_final_result(test_acc)"""
```
**NOTE**:
- `@nni.variable` takes effect on the line that follows it, which must be an assignment statement whose left-hand value is specified by the keyword `name` in `@nni.variable`.
- `@nni.report_intermediate_result`/`@nni.report_final_result` sends the data to the assessor/tuner at that line.

For more information about annotation syntax and its usage, please refer to the [Annotation README](../tools/nni_annotation/README.md).
### Step 2 - Enable NNI Annotation

In the YAML configuration file, you need to set *useAnnotation* to true to enable NNI annotation:

```yaml
useAnnotation: true
```

<a name="more-examples"></a>
## More Trial Examples

* [MNIST examples](mnist_examples.md)
* [Finding out the best optimizer for Cifar10 classification](cifar10_examples.md)
* [How to tune Scikit-learn on NNI](sklearn_examples.md)
* [Automatic Model Architecture Search for Reading Comprehension](SQuAD_evolution_examples.md)
* [Tuning GBDT on NNI](gbdt_example.md)
@ -0,0 +1,12 @@

######################
Tutorials
######################

.. toctree::
   Installation
   Write Trial<Trials>
   Tuners<tuners>
   Assessors<assessors>
   WebUI
   Training Platform<training_services>
   advanced
@ -45,7 +45,6 @@ Click the tab "Trials Detail" to see the status of all the trials. Specifically:

![](./img/webui-img/detail-pai.png)

![](./img/webui-img/trialog.png)

* Kill: you can kill a job whose status is running.
* Support searching for a specific trial.
@ -0,0 +1,6 @@

Advanced Features
=====================

.. toctree::
   MultiPhase<multiPhase>
   AdvancedNAS
@ -0,0 +1,17 @@

Assessors
==============

In order to save computing resources, NNI supports an early-stopping policy and provides the **Assessor** to do this job.

The Assessor receives the intermediate results from a Trial and decides, via a specific algorithm, whether the Trial should be killed. Once the Trial meets the early-stopping conditions (which means the Assessor is pessimistic about the final result), the Assessor kills the trial, and the trial's status becomes `"EARLY_STOPPED"`.

Here is an experimental result on MNIST using the 'Curvefitting' Assessor in 'maximize' mode. You can see that the Assessor successfully **early stopped** many trials with bad hyperparameters in advance. By using an Assessor, you may get better hyperparameters with the same computing resources.

*Implemented code directory: config_assessor.yml <https://github.com/Microsoft/nni/blob/master/examples/trials/mnist/config_assessor.yml>*

.. image:: ./img/Assessor.png

Like Tuners, users can either use built-in Assessors or customize an Assessor on their own. Please refer to the following tutorials for details:

.. toctree::
   Builtin Assessors<Builtin_Assessors>
   Customized Assessors<Customize_Assessor>
@ -0,0 +1,84 @@

# CIFAR-10 examples

## Overview

[CIFAR-10][3] classification is a common benchmark problem in machine learning. The CIFAR-10 dataset is a collection of images. It is one of the most widely used datasets for machine learning research and contains 60,000 32x32 color images in 10 different classes. Thus, we use CIFAR-10 classification as an example to introduce NNI usage.

### **Goals**

As we all know, the choice of model optimizer directly affects the performance of the final metrics. The goal of this tutorial is to **tune a better-performing optimizer** to train a relatively small convolutional neural network (CNN) for recognizing images.

In this example, we have selected the following common deep learning optimizers:

> "SGD", "Adadelta", "Adagrad", "Adam", "Adamax"
### **Experiment**

#### Preparations

This example requires PyTorch. The PyTorch install package should be chosen based on your python version and cuda version.

For example, in an environment with python==3.5 and cuda==8.0, use the following commands to install [pytorch][2]:

```bash
python3 -m pip install http://download.pytorch.org/whl/cu80/torch-0.4.1-cp35-cp35m-linux_x86_64.whl
python3 -m pip install torchvision
```
#### CIFAR-10 with NNI

**Search Space**

As stated in the goals, we aim to find the best `optimizer` for training CIFAR-10 classification. When using different optimizers, we also need to adjust the `learning rate` and `network structure` accordingly, so we chose these three parameters as hyperparameters and wrote the following search space:

```json
{
    "lr": {"_type": "choice", "_value": [0.1, 0.01, 0.001, 0.0001]},
    "optimizer": {"_type": "choice", "_value": ["SGD", "Adadelta", "Adagrad", "Adam", "Adamax"]},
    "model": {"_type": "choice", "_value": ["vgg", "resnet18", "googlenet", "densenet121", "mobilenet", "dpn92", "senet18"]}
}
```

*Implemented code directory: [search_space.json][8]*
**Trial**

This is the code for the CNN training of each hyperparameter set; pay particular attention to the following NNI-specific points:

* Use `nni.get_next_parameter()` to get the next training hyperparameter set.
* Use `nni.report_intermediate_result(acc)` to report the intermediate result after finishing each epoch.
* Use `nni.report_final_result(acc)` to report the final result before the trial ends.

*Implemented code directory: [main.py][9]*
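To illustrate how the tuned `optimizer` choice can be wired into the training code, here is a small sketch (a hypothetical helper, not the exact code in main.py) that maps the received string onto a `torch.optim` class:

```python
import torch.optim as optim

def build_optimizer(name, model, lr):
    # map the tuned "optimizer" choice onto the corresponding torch.optim class
    optimizers = {
        "SGD": optim.SGD,
        "Adadelta": optim.Adadelta,
        "Adagrad": optim.Adagrad,
        "Adam": optim.Adam,
        "Adamax": optim.Adamax,
    }
    return optimizers[name](model.parameters(), lr=lr)
```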
You can also use your previous code directly; refer to [How to define a trial][5] for how to modify it.

**Config**

Here is the example of running this experiment locally (with multiple GPUs):

code directory: [examples/trials/cifar10_pytorch/config.yml][6]

Here is the example of running this experiment on OpenPAI:

code directory: [examples/trials/cifar10_pytorch/config_pai.yml][7]

*The complete examples we have implemented: [examples/trials/cifar10_pytorch/][1]*

#### Launch the experiment

We are ready for the experiment. Let's now **run the config.yml file from your command line to start the experiment**.

```bash
nnictl create --config nni/examples/trials/cifar10_pytorch/config.yml
```
[1]: https://github.com/Microsoft/nni/tree/master/examples/trials/cifar10_pytorch
[2]: https://pytorch.org/
[3]: https://www.cs.toronto.edu/~kriz/cifar.html
[4]: https://github.com/Microsoft/nni/tree/master/examples/trials/cifar10_pytorch
[5]: https://github.com/Microsoft/nni/blob/master/docs/howto_1_WriteTrial.md
[6]: https://github.com/Microsoft/nni/blob/master/examples/trials/cifar10_pytorch/config.yml
[7]: https://github.com/Microsoft/nni/blob/master/examples/trials/cifar10_pytorch/config_pai.yml
[8]: https://github.com/Microsoft/nni/blob/master/examples/trials/cifar10_pytorch/search_space.json
[9]: https://github.com/Microsoft/nni/blob/master/examples/trials/cifar10_pytorch/main.py
@ -0,0 +1,187 @@

# -*- coding: utf-8 -*-
#
# Configuration file for the Sphinx documentation builder.
#
# This file does only contain a selection of the most common options. For a
# full list see the documentation:
# http://www.sphinx-doc.org/en/master/config

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))

import recommonmark
from recommonmark.parser import CommonMarkParser

# -- Project information ---------------------------------------------------

project = 'Neural Network Intelligence'
copyright = '2019, Microsoft'
author = 'Microsoft'

# The short X.Y version
version = ''
# The full version, including alpha/beta/rc tags
release = 'v0.5'

# -- General configuration ---------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.mathjax',
    'sphinx_markdown_tables',
    'sphinxarg.ext',
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_parsers = {
    '.md': CommonMarkParser
}

source_suffix = ['.rst', '.md']

# The master toctree document.
master_doc = 'index'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = None


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
html_theme_options = {
    'logo_only': True,
}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# The default sidebars (for documents that don't match any pattern) are
# defined by theme itself. Builtin themes are using these templates by
# default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
# 'searchbox.html']``.
#
# html_sidebars = {}

html_logo = './img/nni_logo_dark.png'

# -- Options for HTMLHelp output ---------------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = 'NeuralNetworkIntelligencedoc'


# -- Options for LaTeX output ------------------------------------------------

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    #
    # 'papersize': 'letterpaper',

    # The font size ('10pt', '11pt' or '12pt').
    #
    # 'pointsize': '10pt',

    # Additional stuff for the LaTeX preamble.
    #
    # 'preamble': '',

    # Latex figure (float) alignment
    #
    # 'figure_align': 'htbp',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
#  author, documentclass [howto, manual, or own class]).
latex_documents = [
    (master_doc, 'NeuralNetworkIntelligence.tex', 'Neural Network Intelligence Documentation',
     'Microsoft', 'manual'),
]


# -- Options for manual page output ------------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
    (master_doc, 'neuralnetworkintelligence', 'Neural Network Intelligence Documentation',
     [author], 1)
]


# -- Options for Texinfo output ----------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
#  dir menu entry, description, category)
texinfo_documents = [
    (master_doc, 'NeuralNetworkIntelligence', 'Neural Network Intelligence Documentation',
     author, 'NeuralNetworkIntelligence', 'One line description of project.',
     'Miscellaneous'),
]


# -- Options for Epub output -------------------------------------------------

# Bibliographic Dublin Core info.
epub_title = project

# The unique identifier of the text. This can be a ISBN number
# or the project homepage.
#
# epub_identifier = ''

# A unique identification for the text.
#
# epub_uid = ''

# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']


# -- Extension configuration -------------------------------------------------
@ -0,0 +1,185 @@

# GBDT in nni

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion as other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

Gradient boosting decision trees have many popular implementations, such as [lightgbm](https://github.com/Microsoft/LightGBM), [xgboost](https://github.com/dmlc/xgboost), and [catboost](https://github.com/catboost/catboost). GBDT is a great tool for solving traditional machine learning problems. Since GBDT is a robust algorithm, it can be used in many domains. The better the hyper-parameters for GBDT, the better the performance you can achieve.

NNI is a great platform for tuning hyper-parameters; you can try various built-in search algorithms in nni and run multiple trials concurrently.

## 1. Search Space in GBDT

There are many hyper-parameters in GBDT, but which parameters affect the performance or speed? Based on some practical experience, here are some suggestions (taking lightgbm as an example):

> * For better accuracy
>   * `learning_rate`. The range of `learning_rate` could be [0.001, 0.9].
>   * `num_leaves`. `num_leaves` is related to `max_depth`; you don't have to tune both of them.
>   * `bagging_freq`. `bagging_freq` could be [1, 2, 4, 8, 10].
>   * `num_iterations`. May be larger if underfitting.

> * For speed-up
>   * `bagging_fraction`. The range of `bagging_fraction` could be [0.7, 1.0].
>   * `feature_fraction`. The range of `feature_fraction` could be [0.6, 1.0].
>   * `max_bin`.

> * To avoid overfitting
>   * `min_data_in_leaf`. This depends on your dataset.
>   * `min_sum_hessian_in_leaf`. This depends on your dataset.
>   * `lambda_l1` and `lambda_l2`.
>   * `min_gain_to_split`.
>   * `num_leaves`.

Reference links:
[lightgbm](https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html) and
[autoxgboost](https://github.com/ja-thomas/autoxgboost/blob/master/poster_2018.pdf)
## 2. Task description

Now we come back to our example "auto-gbdt", which runs with lightgbm and nni. The data includes [train data](https://github.com/Microsoft/nni/blob/master/examples/trials/auto-gbdt/data/regression.train) and [test data](https://github.com/Microsoft/nni/blob/master/examples/trials/auto-gbdt/data/regression.test).
Given the features and labels in the train data, we train a GBDT regression model and use it to predict.
## 3. How to run in nni

### 3.1 Prepare your trial code

You need to prepare basic code like the following:

```python
import lightgbm as lgb
from sklearn.metrics import mean_squared_error

...

def get_default_parameters():
    ...
    return params


def load_data(train_path='./data/regression.train', test_path='./data/regression.test'):
    '''
    Load or create dataset
    '''
    ...

    return lgb_train, lgb_eval, X_test, y_test


def run(lgb_train, lgb_eval, params, X_test, y_test):
    # train
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=20,
                    valid_sets=lgb_eval,
                    early_stopping_rounds=5)
    # predict
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)

    # eval
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
    print('The rmse of prediction is:', rmse)


if __name__ == '__main__':
    lgb_train, lgb_eval, X_test, y_test = load_data()

    PARAMS = get_default_parameters()
    # train
    run(lgb_train, lgb_eval, PARAMS, X_test, y_test)
```
### 3.2 Prepare your search space

If you would like to tune `num_leaves`, `learning_rate`, `bagging_fraction` and `bagging_freq`, you can write a [search_space.json](https://github.com/Microsoft/nni/blob/master/examples/trials/auto-gbdt/search_space.json) as follows:

```json
{
    "num_leaves": {"_type": "choice", "_value": [31, 28, 24, 20]},
    "learning_rate": {"_type": "choice", "_value": [0.01, 0.05, 0.1, 0.2]},
    "bagging_fraction": {"_type": "uniform", "_value": [0.7, 1.0]},
    "bagging_freq": {"_type": "choice", "_value": [1, 2, 4, 8, 10]}
}
```

For more supported variable types, see [here](https://github.com/Microsoft/nni/blob/master/docs/SearchSpaceSpec.md).
### 3.3 Add the SDK of nni into your code

```diff
+import nni
...

def get_default_parameters():
    ...
    return params


def load_data(train_path='./data/regression.train', test_path='./data/regression.test'):
    '''
    Load or create dataset
    '''
    ...

    return lgb_train, lgb_eval, X_test, y_test


def run(lgb_train, lgb_eval, params, X_test, y_test):
    # train
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=20,
                    valid_sets=lgb_eval,
                    early_stopping_rounds=5)
    # predict
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)

    # eval
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
    print('The rmse of prediction is:', rmse)
+   nni.report_final_result(rmse)

if __name__ == '__main__':
    lgb_train, lgb_eval, X_test, y_test = load_data()
+   RECEIVED_PARAMS = nni.get_next_parameter()
    PARAMS = get_default_parameters()
+   PARAMS.update(RECEIVED_PARAMS)

    # train
    run(lgb_train, lgb_eval, PARAMS, X_test, y_test)
```
### 3.4 Write a config file and run it

In the config file, you can set some settings, including:

* Experiment settings: `trialConcurrency`, `maxExecDuration`, `maxTrialNum`, `trial gpuNum`, etc.
* Platform settings: `trainingServicePlatform`, etc.
* Path settings: `searchSpacePath`, `trial codeDir`, etc.
* Algorithm settings: the selected `tuner` algorithm, `tuner optimize_mode`, etc.

An example config.yml follows:

```yml
authorName: default
experimentName: example_auto-gbdt
trialConcurrency: 1
maxExecDuration: 10h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: minimize
trial:
  command: python3 main.py
  codeDir: .
  gpuNum: 0
```

Run this experiment with the following command:

```bash
nnictl create --config ./config.yml
```
@ -1,5 +1,4 @@

# Write a Trial Run on NNI

A **Trial** in NNI is an individual attempt at applying a set of parameters on a model.
@ -0,0 +1,21 @@

#########################################
Neural Network Intelligence Documentation
#########################################

********
Contents
********

.. toctree::
   :caption: Table of Contents
   :maxdepth: 2
   :glob:

   Overview
   GetStarted<QuickStart>
   Tutorials
   Examples
   Reference
   FAQ
   Contribution
   Changelog<RELEASE>
@ -0,0 +1,68 @@

# MNIST examples

A CNN MNIST classifier for deep learning is similar to `hello world` for programming languages. Thus, we use MNIST as an example to introduce different features of NNI. The examples are listed below:

- [MNIST with NNI API](#mnist)
- [MNIST with NNI annotation](#mnist-annotation)
- [MNIST in keras](#mnist-keras)
- [MNIST -- tuning with batch tuner](#mnist-batch)
- [MNIST -- tuning with hyperband](#mnist-hyperband)
- [MNIST -- tuning within a nested search space](#mnist-nested)
- [distributed MNIST (tensorflow) using kubeflow](#mnist-kubeflow-tf)
- [distributed MNIST (pytorch) using kubeflow](#mnist-kubeflow-pytorch)
<a name="mnist"></a>
|
||||
**MNIST with NNI API**
|
||||
|
||||
This is a simple network which has two convolutional layers, two pooling layers and a fully connected layer. We tune hyperparameters, such as dropout rate, convolution size, hidden size, etc. It can be tuned with most NNI built-in tuners, such as TPE, SMAC, Random. We also provide an exmaple yaml file which enables assessor.
|
||||
|
||||
`code directory: examples/trials/mnist/`
|
||||
|
||||
<a name="mnist-annotation"></a>
|
||||
**MNIST with NNI annotation**
|
||||
|
||||
This example is similar to the example above, the only difference is that this example uses NNI annotation to specify search space and report results, while the example above uses NNI apis to receive configuration and report results.
|
||||
|
||||
`code directory: examples/trials/mnist-annotation/`
|
||||
|
||||
<a name="mnist-keras"></a>
|
||||
**MNIST in keras**
|
||||
|
||||
This example is implemented in keras. It is also a network for MNIST dataset, with two convolution layers, one pooling layer, and two fully connected layers.
|
||||
|
||||
`code directory: examples/trials/mnist-keras/`
|
||||
|
||||
<a name="mnist-batch"></a>
|
||||
**MNIST -- tuning with batch tuner**
|
||||
|
||||
This example is to show how to use batch tuner. Users simply list all the configurations they want to try in the search space file. NNI will try all of them.
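Note that the exact key names here are an assumption for illustration (check the example's own search space file); the idea is that a batch-tuner search space enumerates complete configurations under a single `choice`:

```json
{
    "combine_params": {
        "_type": "choice",
        "_value": [
            {"optimizer": "Adam", "learning_rate": 0.001},
            {"optimizer": "Adam", "learning_rate": 0.0001},
            {"optimizer": "SGD", "learning_rate": 0.01}
        ]
    }
}
```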
`code directory: examples/trials/mnist-batch-tune-keras/`

<a name="mnist-hyperband"></a>
**MNIST -- tuning with hyperband**

This example shows how to use hyperband to tune the model. There is one more key, `STEPS`, in the received configuration for trials to control how long they can run (e.g., the number of iterations).

`code directory: examples/trials/mnist-hyperband/`

<a name="mnist-nested"></a>
**MNIST -- tuning within a nested search space**

This example shows that NNI also supports nested search spaces. The search space file is an example of how to define a nested search space.

`code directory: examples/trials/mnist-cascading-search-space/`

<a name="mnist-kubeflow-tf"></a>
**distributed MNIST (tensorflow) using kubeflow**

This example shows how to run distributed training on kubeflow through NNI. Users simply provide the distributed training code and a configuration file which specifies the kubeflow mode, for example, the command to run ps, the command to run worker, and how many resources they consume. This example is implemented in tensorflow and thus uses the kubeflow tensorflow operator.

`code directory: examples/trials/mnist-distributed/`

<a name="mnist-kubeflow-pytorch"></a>
**distributed MNIST (pytorch) using kubeflow**

Similar to the previous example; the difference is that this example is implemented in pytorch and thus uses the kubeflow pytorch operator.

`code directory: examples/trials/mnist-distributed-pytorch/`
@ -0,0 +1,7 @@

sphinx==1.8.3
sphinx-argparse==0.2.5
sphinx-markdown-tables==0.0.9
sphinx-rtd-theme==0.4.2
sphinxcontrib-websupport==1.1.0
recommonmark==0.5.0
nni==0.5
@ -0,0 +1,51 @@

###########################
Python API Reference
###########################

API for trial code
------------------------
.. autofunction:: nni.get_next_parameter
.. autofunction:: nni.get_current_parameter
.. autofunction:: nni.report_intermediate_result
.. autofunction:: nni.report_final_result
.. autofunction:: nni.get_sequence_id


API for tuners
------------------------
.. autoclass:: nni.tuner.Tuner
   :members:

.. autoclass:: nni.hyperopt_tuner.hyperopt_tuner.HyperoptTuner
   :members:

.. autoclass:: nni.batch_tuner.batch_tuner.BatchTuner
   :members:

.. autoclass:: nni.evolution_tuner.evolution_tuner.EvolutionTuner
   :members:

.. autoclass:: nni.gridsearch_tuner.gridsearch_tuner.GridSearchTuner
   :members:

.. autoclass:: nni.networkmorphism_tuner.networkmorphism_tuner.NetworkMorphismTuner
   :members:

.. autoclass:: nni.smac_tuner.smac_tuner.SMACTuner
   :members:

API for assessors
------------------------
.. autoclass:: nni.assessor.Assessor
   :members:

.. autoclass:: nni.curvefitting_assessor.curvefitting_assessor.CurvefittingAssessor
   :members:

.. autoclass:: nni.medianstop_assessor.medianstop_assessor.MedianstopAssessor
   :members:


API for Advisors
------------------------
.. autoclass:: nni.hyperband_advisor.hyperband_advisor.Hyperband
@ -0,0 +1,69 @@

# Scikit-learn in NNI

[Scikit-learn](https://github.com/scikit-learn/scikit-learn) is a popular machine learning tool for data mining and data analysis. It supports many kinds of machine learning models like LinearRegression, LogisticRegression, DecisionTree, SVM etc. Making the use of scikit-learn more efficient is a valuable topic.
NNI supports many kinds of tuning algorithms to search for the best models and/or hyper-parameters for scikit-learn, and it supports many kinds of environments like local machine, remote servers and cloud.

## 1. How to run the example

To start using NNI, you should install the nni package and use the command line tool `nnictl` to start an experiment. For more information about installation and preparing the environment, please refer to [GetStarted](GetStarted.md).
After you have installed NNI, you can enter the corresponding folder and start the experiment using the following command:

```bash
nnictl create --config ./config.yml
```

## 2. Description of the example

### 2.1 classification

This example uses the digits dataset, which is made up of 1797 8x8 images, where each image is a hand-written digit. The goal is to classify these images into 10 classes.
In this example, we use SVC as the model and choose some parameters of this model, including `"C", "keral", "degree", "gamma" and "coef0"`. For more information about these parameters, please refer to [the sklearn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).

### 2.2 regression

This example uses the Boston Housing Dataset, which consists of the prices of houses in various places in Boston, along with information such as crime (CRIM), areas of non-retail business in the town (INDUS), the age of people who own the house (AGE), etc., used to predict the house prices of Boston.
In this example, we tune different kinds of regression models, including `"LinearRegression", "SVR", "KNeighborsRegressor", "DecisionTreeRegressor"`, and some parameters like `"svr_kernel", "knr_weights"`. You can get more details about these models from [here](https://scikit-learn.org/stable/supervised_learning.html#supervised-learning).
## 3. How to write sklearn code using nni

It is easy to use NNI in your sklearn code; there are only a few steps.

* __step 1__

Prepare a search_space.json to store your search space.

For example, if you want to choose among different models, you may try:

```
{
    "model_name":{"_type":"choice","_value":["LinearRegression", "SVR", "KNeighborsRegressor", "DecisionTreeRegressor"]}
}
```

If you want to choose different models and parameters, you could put them together in a single search_space.json file:

```
{
    "model_name":{"_type":"choice","_value":["LinearRegression", "SVR", "KNeighborsRegressor", "DecisionTreeRegressor"]},
    "svr_kernel": {"_type":"choice","_value":["linear", "poly", "rbf"]},
    "knr_weights": {"_type":"choice","_value":["uniform", "distance"]}
}
```

Then you could read these values as a dict in your python code, as shown in the sketch below and in step 2.
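
For the regression search space above, the sampled `model_name` then has to be mapped to an actual estimator. A minimal sketch (this helper and the exact mapping are illustrative, not part of the example code):

```
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

def get_model(params):
    # map the sampled 'model_name' to an estimator instance,
    # applying the model-specific parameters where they exist
    name = params['model_name']
    if name == 'LinearRegression':
        return LinearRegression()
    if name == 'SVR':
        return SVR(kernel=params['svr_kernel'])
    if name == 'KNeighborsRegressor':
        return KNeighborsRegressor(weights=params['knr_weights'])
    return DecisionTreeRegressor()
```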

* __step 2__

At the beginning of your python code, you should `import nni` to make sure the package works normally.

First, use the `nni.get_next_parameter()` function to get the parameters assigned by NNI, then use these parameters to update your code.

For example, if you define your search_space.json in the following format:

```
{
    "C": {"_type":"uniform","_value":[0.1, 1]},
    "kernel": {"_type":"choice","_value":["linear", "rbf", "poly", "sigmoid"]},
    "degree": {"_type":"choice","_value":[1, 2, 3, 4]},
    "gamma": {"_type":"uniform","_value":[0.01, 0.1]},
    "coef0": {"_type":"uniform","_value":[0.01, 0.1]}
}
```

you may get a parameter dict like this:

```
params = {
    'C': 1.0,
    'kernel': 'linear',
    'degree': 3,
    'gamma': 0.01,
    'coef0': 0.01
}
```

Then you could use these variables to write your scikit-learn code.
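
A minimal sketch of this step (the default values below are illustrative assumptions, overridden by whatever NNI hands out):

```
import nni
from sklearn.svm import SVC

# illustrative defaults; parameters received from NNI override them
params = {'C': 1.0, 'kernel': 'linear', 'degree': 3, 'gamma': 0.01, 'coef0': 0.01}
params.update(nni.get_next_parameter())

# build the model from the (possibly tuned) parameters
model = SVC(C=params['C'], kernel=params['kernel'], degree=params['degree'],
            gamma=params['gamma'], coef0=params['coef0'])
```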

* __step 3__

After you have finished your training, you will have a score for the model, such as precision, recall or MSE. NNI needs your score for its tuning algorithms to generate the next group of parameters, so please report the score back to NNI and start the next trial job.

You just need to call `nni.report_final_result(score)` to communicate with NNI after your scikit-learn code finishes. Or, if you produce multiple scores during training, you could also report them back to NNI using `nni.report_intermediate_result(score)`. Note that reporting intermediate results is optional, but you must report a final result.
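Putting the three steps together, an end-to-end sketch for the classification example (the digits data and the train/test split here are illustrative assumptions, not prescribed by the example):

```
import nni
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# step 2: hyper-parameters for this trial
params = {'C': 1.0, 'kernel': 'linear', 'degree': 3, 'gamma': 0.01, 'coef0': 0.01}
params.update(nni.get_next_parameter())

X_train, X_test, y_train, y_test = train_test_split(
    *load_digits(return_X_y=True), test_size=0.25, random_state=99)

model = SVC(**params)
model.fit(X_train, y_train)

# step 3: report the final accuracy so the tuner can generate the next trial
score = model.score(X_test, y_test)
nni.report_final_result(score)
```
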
@ -0,0 +1,9 @@
Introduction to NNI Training Services
=====================================

.. toctree::
    Local<tutorial_1_CR_exp_local_api>
    Remote<RemoteMachineMode>
    PAI<PAIMode>
    Kubeflow<KubeflowMode>
    FrameworkController Mode<FrameworkControllerMode>

@ -0,0 +1,19 @@
#################
Tuners
#################

Overview
-----------------

NNI provides an easy way to adopt parameter tuning algorithms, which we call **Tuners**.

A Tuner receives the result from a `Trial` as a metric to evaluate the performance of a specific parameter/architecture configuration, and sends the next hyper-parameter or architecture configuration to the Trial.

In NNI, we support two approaches to set the tuner: the first is to directly use a builtin tuner provided by the nni sdk, the second is to customize a tuner file by yourself. We also have Advisors, which combine the functionality of Tuner & Assessor.
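
As a rough sketch of the customized approach (the subclass name and the random-choice logic below are illustrative only, and the method names assume the ``nni.tuner.Tuner`` interface described in the customization tutorial):

.. code-block:: python

    import random
    from nni.tuner import Tuner

    class RandomChoiceTuner(Tuner):
        def update_search_space(self, search_space):
            # receives the content of search_space.json
            self.search_space = search_space

        def generate_parameters(self, parameter_id):
            # returns one configuration for the next Trial to evaluate;
            # assumes every search-space entry is of _type "choice"
            return {name: random.choice(spec['_value'])
                    for name, spec in self.search_space.items()}

        def receive_trial_result(self, parameter_id, parameters, value):
            # 'value' is the final metric reported by the Trial
            pass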

For details, please refer to the following tutorials:

.. toctree::
    Builtin Tuners<Builtin_Tuner>
    Customized Tuners<Customize_Tuner>
    Customized Advisor<Customize_Advisor>

@ -2,7 +2,7 @@ authorName: default
 experimentName: example_mnist
 trialConcurrency: 1
 maxExecDuration: 1h
-maxTrialNum: 20
+maxTrialNum: 50
 #choice: local, remote
 trainingServicePlatform: local
 searchSpacePath: search_space.json
@ -17,10 +17,12 @@ tuner:
   optimize_mode: maximize
 assessor:
   #choice: Medianstop, Curvefitting
-  builtinAssessorName: Medianstop
+  builtinAssessorName: Curvefitting
   classArgs:
     #choice: maximize, minimize
     optimize_mode: maximize
+    epoch_num: 20
+    threshold: 0.9
 trial:
   command: python3 mnist.py
   codeDir: .

@ -0,0 +1,223 @@
"""A deep MNIST classifier using convolutional layers."""

import argparse
import logging
import math
import tempfile
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data

import nni

FLAGS = None

logger = logging.getLogger('mnist_AutoML')


class MnistNetwork(object):
    '''
    MnistNetwork is for initializing and building basic network for mnist.
    '''

    def __init__(self,
                 channel_1_num,
                 channel_2_num,
                 conv_size,
                 hidden_size,
                 pool_size,
                 learning_rate,
                 x_dim=784,
                 y_dim=10):
        self.channel_1_num = channel_1_num
        self.channel_2_num = channel_2_num
        self.conv_size = conv_size
        self.hidden_size = hidden_size
        self.pool_size = pool_size
        self.learning_rate = learning_rate
        self.x_dim = x_dim
        self.y_dim = y_dim

        self.images = tf.placeholder(
            tf.float32, [None, self.x_dim], name='input_x')
        self.labels = tf.placeholder(
            tf.float32, [None, self.y_dim], name='input_y')
        self.keep_prob = tf.placeholder(tf.float32, name='keep_prob')

        self.train_step = None
        self.accuracy = None

    def build_network(self):
        '''
        Building network for mnist
        '''

        # Reshape to use within a convolutional neural net.
        # Last dimension is for "features" - there is only one here, since images are
        # grayscale -- it would be 3 for an RGB image, 4 for RGBA, etc.
        with tf.name_scope('reshape'):
            try:
                input_dim = int(math.sqrt(self.x_dim))
            except:
                print(
                    'input dim cannot be sqrt and reshape. input dim: ' + str(self.x_dim))
                logger.debug(
                    'input dim cannot be sqrt and reshape. input dim: %s', str(self.x_dim))
                raise
            x_image = tf.reshape(self.images, [-1, input_dim, input_dim, 1])

        # First convolutional layer - maps one grayscale image to 32 feature maps.
        with tf.name_scope('conv1'):
            w_conv1 = weight_variable(
                [self.conv_size, self.conv_size, 1, self.channel_1_num])
            b_conv1 = bias_variable([self.channel_1_num])
            h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)

        # Pooling layer - downsamples by 2X.
        with tf.name_scope('pool1'):
            h_pool1 = max_pool(h_conv1, self.pool_size)

        # Second convolutional layer -- maps 32 feature maps to 64.
        with tf.name_scope('conv2'):
            w_conv2 = weight_variable([self.conv_size, self.conv_size,
                                       self.channel_1_num, self.channel_2_num])
            b_conv2 = bias_variable([self.channel_2_num])
            h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)

        # Second pooling layer.
        with tf.name_scope('pool2'):
            h_pool2 = max_pool(h_conv2, self.pool_size)

        # Fully connected layer 1 -- after 2 rounds of downsampling, our 28x28 image
        # is down to 7x7x64 feature maps -- maps this to 1024 features.
        last_dim = int(input_dim / (self.pool_size * self.pool_size))
        with tf.name_scope('fc1'):
            w_fc1 = weight_variable(
                [last_dim * last_dim * self.channel_2_num, self.hidden_size])
            b_fc1 = bias_variable([self.hidden_size])

            h_pool2_flat = tf.reshape(
                h_pool2, [-1, last_dim * last_dim * self.channel_2_num])
            h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)

        # Dropout - controls the complexity of the model, prevents co-adaptation of features.
        with tf.name_scope('dropout'):
            h_fc1_drop = tf.nn.dropout(h_fc1, self.keep_prob)

        # Map the 1024 features to 10 classes, one for each digit
        with tf.name_scope('fc2'):
            w_fc2 = weight_variable([self.hidden_size, self.y_dim])
            b_fc2 = bias_variable([self.y_dim])
            y_conv = tf.matmul(h_fc1_drop, w_fc2) + b_fc2

        with tf.name_scope('loss'):
            cross_entropy = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=self.labels, logits=y_conv))
        with tf.name_scope('adam_optimizer'):
            self.train_step = tf.train.AdamOptimizer(
                self.learning_rate).minimize(cross_entropy)

        with tf.name_scope('accuracy'):
            correct_prediction = tf.equal(
                tf.argmax(y_conv, 1), tf.argmax(self.labels, 1))
            self.accuracy = tf.reduce_mean(
                tf.cast(correct_prediction, tf.float32))


def conv2d(x_input, w_matrix):
    """conv2d returns a 2d convolution layer with full stride."""
    return tf.nn.conv2d(x_input, w_matrix, strides=[1, 1, 1, 1], padding='SAME')


def max_pool(x_input, pool_size):
    """max_pool downsamples a feature map by 2X."""
    return tf.nn.max_pool(x_input, ksize=[1, pool_size, pool_size, 1],
                          strides=[1, pool_size, pool_size, 1], padding='SAME')


def weight_variable(shape):
    """weight_variable generates a weight variable of a given shape."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    """bias_variable generates a bias variable of a given shape."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def main(params):
    '''
    Main function, build mnist network, run and send result to NNI.
    '''
    # Import data
    mnist = input_data.read_data_sets(params['data_dir'], one_hot=True)
    print('Mnist download data done.')
    logger.debug('Mnist download data done.')

    # Create the model
    # Build the graph for the deep net
    mnist_network = MnistNetwork(channel_1_num=params['channel_1_num'],
                                 channel_2_num=params['channel_2_num'],
                                 conv_size=params['conv_size'],
                                 hidden_size=params['hidden_size'],
                                 pool_size=params['pool_size'],
                                 learning_rate=params['learning_rate'])
    mnist_network.build_network()
    logger.debug('Mnist build network done.')

    # Write log
    graph_location = tempfile.mkdtemp()
    logger.debug('Saving graph to: %s', graph_location)
    train_writer = tf.summary.FileWriter(graph_location)
    train_writer.add_graph(tf.get_default_graph())

    test_acc = 0.0
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(params['batch_num']):
            batch = mnist.train.next_batch(params['batch_size'])
            mnist_network.train_step.run(feed_dict={mnist_network.images: batch[0],
                                                    mnist_network.labels: batch[1],
                                                    mnist_network.keep_prob: 1 - params['dropout_rate']}
                                         )

            if i % 100 == 0:
                test_acc = mnist_network.accuracy.eval(
                    feed_dict={mnist_network.images: mnist.test.images,
                               mnist_network.labels: mnist.test.labels,
                               mnist_network.keep_prob: 1.0})

                # report intermediate accuracy back to the NNI assessor/tuner
                nni.report_intermediate_result(test_acc)
                logger.debug('test accuracy %g', test_acc)
                logger.debug('Pipe send intermediate result done.')

        test_acc = mnist_network.accuracy.eval(
            feed_dict={mnist_network.images: mnist.test.images,
                       mnist_network.labels: mnist.test.labels,
                       mnist_network.keep_prob: 1.0})

        # report the final accuracy back to NNI
        nni.report_final_result(test_acc)
        logger.debug('Final result is %g', test_acc)
        logger.debug('Send final result done.')


def get_params():
    ''' Get parameters from command line '''
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_dir", type=str, default='/tmp/tensorflow/mnist/input_data', help="data directory")
    parser.add_argument("--dropout_rate", type=float, default=0.5, help="dropout rate")
    parser.add_argument("--channel_1_num", type=int, default=32)
    parser.add_argument("--channel_2_num", type=int, default=64)
    parser.add_argument("--conv_size", type=int, default=5)
    parser.add_argument("--pool_size", type=int, default=2)
    parser.add_argument("--hidden_size", type=int, default=1024)
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    parser.add_argument("--batch_num", type=int, default=2000)
    parser.add_argument("--batch_size", type=int, default=32)

    args, _ = parser.parse_known_args()
    return args


if __name__ == '__main__':
    try:
        # merge command-line defaults with the hyper-parameters proposed by the NNI tuner
        tuner_params = nni.get_next_parameter()
        logger.debug(tuner_params)
        params = vars(get_params())
        params.update(tuner_params)
        main(params)
    except Exception as exception:
        logger.exception(exception)
        raise