Experiment config reference

To create a new nni experiment, prepare a config file on your local machine and provide the path of this file to nnictl. The config file is written in YAML format and must be valid. This document describes the rules for writing the config file and provides templates and examples.

Template

  • Lightweight (without Annotation and Assessor)
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote
trainingServicePlatform: 
searchSpacePath: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
trial:
  command: 
  codeDir: 
  gpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
  • Use Assessor
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote
trainingServicePlatform: 
searchSpacePath: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
assessor:
  #choice: Medianstop
  builtinAssessorName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
trial:
  command: 
  codeDir: 
  gpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 
  • Use Annotation
authorName: 
experimentName: 
trialConcurrency: 
maxExecDuration: 
maxTrialNum: 
#choice: local, remote
trainingServicePlatform: 
#choice: true, false
useAnnotation: 
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
assessor:
  #choice: Medianstop
  builtinAssessorName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
  gpuNum: 
trial:
  command: 
  codeDir: 
  gpuNum: 
#machineList can be empty if the platform is local
machineList:
  - ip: 
    port: 
    username: 
    passwd: 

Configuration

  • authorName

    • Description

      authorName is the name of the author who creates the experiment. TBD: add default value

  • experimentName

    • Description

      experimentName is the name of the experiment you created.
      TBD: add default value

  • trialConcurrency

    • Description

      trialConcurrency specifies the maximum number of trial jobs that run simultaneously.

      Note: if the trial gpuNum you set requires more GPUs than are free on your machine, so that the number of trial jobs running simultaneously cannot reach trialConcurrency, some trial jobs will be put into a queue to wait for GPU allocation.
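
      As a rough worked example of the note above, assume a machine with 4 free GPUs and assume that GPUs are not shared between trials (both assumptions are made only for illustration). With the fragment below, each trial asks for 2 GPUs, so at most 4 / 2 = 2 trials could run at once even though trialConcurrency is 4; the remaining trials would wait in the queue.

      ```yaml
      # Hypothetical values, chosen only to illustrate the queueing behaviour.
      trialConcurrency: 4   # upper bound on simultaneously running trials
      trial:
        command: python3 mnist.py
        codeDir: /nni/mnist
        gpuNum: 2           # GPUs requested by each trial
      ```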
      
  • maxExecDuration

    • Description

      maxExecDuration specifies the maximum duration of an experiment. The unit of time is one of {s, m, h, d}, which stand for {seconds, minutes, hours, days}.
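
      A few illustrative values follow. The form "a number followed by a unit letter" is inferred from the examples later in this document (for instance 1h); the other values are made up for illustration.

      ```yaml
      # Use exactly one of these lines; a YAML mapping may contain the key only once.
      maxExecDuration: 1h     # one hour, as in the examples in this document
      # maxExecDuration: 30m  # thirty minutes
      # maxExecDuration: 2d   # two days
      ```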

  • maxTrialNum

    • Description

      maxTrialNum specifies the maximum number of trial jobs created by nni, including succeeded and failed jobs.

  • trainingServicePlatform

    • Description

      trainingServicePlatform specifies the platform on which to run the experiment; the choices are {local, remote}.

      • local mode means you run the experiment on your local Linux machine.

      • remote mode means you submit trial jobs to remote Linux machines. If you set the platform to remote, you must also fill in the machineList field.

  • searchSpacePath

    • Description

      searchSpacePath specifies the path of the search space file you want to use, which should be a valid path on your local Linux machine.

      Note: if you set useAnnotation to true, you should remove the searchSpacePath field or leave it empty.
      
  • useAnnotation

    • Description

      useAnnotation specifies whether annotation is used to analyze your code and generate the search space.

      Note: if you set useAnnotation to true, you should not set searchSpacePath.
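
      The notes on searchSpacePath and useAnnotation describe two mutually exclusive setups. A sketch of both, written as two separate YAML documents and reusing the placeholder path from the examples in this document:

      ```yaml
      # Option 1: the search space comes from a file, annotation is off.
      searchSpacePath: /nni/search_space.json
      useAnnotation: false
      ---
      # Option 2: the search space is generated from annotations,
      # so searchSpacePath is left out.
      useAnnotation: true
      ```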
      
  • tuner

    • Description

      tuner specifies the tuner algorithm used to run an experiment. There are two ways to set the tuner. One way is to use a tuner provided by nni sdk, in which case you only need to set builtinTunerName and classArgs. The other way is to use your own tuner file, in which case you need to set codeDir, classFileName, className and classArgs.

    • builtinTunerName and classArgs

      • builtinTunerName

        builtinTunerName specifies the name of the system tuner you want to use. nni sdk provides four tuners: {TPE, Random, Anneal, Evolution}.

      • classArgs

        classArgs specifies the arguments of the tuner algorithm.

    • codeDir, classFileName, className and classArgs

      • codeDir

        codeDir specifies the directory of tuner code.

      • classFileName

        classFileName specifies the name of tuner file.

      • className

        className specifies the name of tuner class.

      • classArgs

        classArgs specifies the arguments of tuner algorithm.

    • gpuNum

      gpuNum specifies the number of GPUs used to run the tuner process. The value of this field should be a non-negative number; the examples in this document use 0.

      Note: you can specify only one way to set the tuner. For example, you can set {builtinTunerName, classArgs} or {codeDir, classFileName, className, classArgs}, but not both.
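
      The two ways of configuring a tuner can be put side by side as follows. This is a sketch only: the path and class names are placeholders taken from the examples in this document, and the two variants are shown as separate YAML documents because only one of them may appear in a real config file.

      ```yaml
      # Variant 1: a tuner provided by nni sdk.
      tuner:
        builtinTunerName: TPE
        classArgs:
          optimize_mode: maximize
        gpuNum: 0
      ---
      # Variant 2: your own tuner class.
      tuner:
        codeDir: /nni/tuner
        classFileName: mytuner.py
        className: MyTuner
        classArgs:
          optimize_mode: maximize
        gpuNum: 0
      ```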
      
  • assessor

    • Description

      assessor specifies the assessor algorithm used to run an experiment. There are two ways to set the assessor. One way is to use an assessor provided by nni sdk, in which case you only need to set builtinAssessorName and classArgs. The other way is to use your own assessor file, in which case you need to set codeDir, classFileName, className and classArgs.

    • builtinAssessorName and classArgs

      • builtinAssessorName

        builtinAssessorName specifies the name of the system assessor you want to use. As the templates above indicate, the available choice is Medianstop.

      • classArgs

        classArgs specifies the arguments of the assessor algorithm.

    • codeDir, classFileName, className and classArgs

      • codeDir

        codeDir specifies the directory of the assessor code.

      • classFileName

        classFileName specifies the name of the assessor file.

      • className

        className specifies the name of the assessor class.

      • classArgs

        classArgs specifies the arguments of the assessor algorithm.

    • gpuNum

      gpuNum specifies the number of GPUs used to run the assessor process. The value of this field should be a non-negative number; the examples in this document use 0.

      Note: you can specify only one way to set the assessor. For example, you can set {builtinAssessorName, classArgs} or {codeDir, classFileName, className, classArgs}, but not both. If you do not want to use an assessor, leave assessor empty or remove it from your config file. The default value of gpuNum is 0.
      
  • trial

    • command

      command specifies the command to run the trial process.

    • codeDir

      codeDir specifies the directory of your own trial file.

    • gpuNum

      gpuNum specifies the number of GPUs used to run your trial process. Default value is 0.

  • machineList

    machineList must be set if you set trainingServicePlatform=remote; otherwise it can be empty.

    • ip

      ip is the IP address of your remote machine.

    • port

      port is the SSH port used to connect to the machine.

      Note: if you leave port empty, the default value 22 is used.
      
    • username

      username is the account used to log in to the remote machine.

    • passwd

      passwd specifies the password of your account.

    • sshKeyPath

      If you want to use an SSH key to log in to the remote machine, set sshKeyPath in the config file. sshKeyPath is the path of the SSH key file, which should be valid.

      Note: if you set passwd and sshKeyPath simultaneously, nni will try passwd.
      
    • passphrase

      passphrase is used to protect the SSH key; it can be empty if the key has no passphrase.
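
    As an illustration of the machineList fields, the sketch below lists one machine that logs in with a password and one that logs in with an SSH key. The addresses, account names and paths are placeholders. port is left out for the second machine on the assumption that omitting the field is treated the same as leaving it empty, in which case the default of 22 would apply.

    ```yaml
    machineList:
      # Password login.
      - ip: 10.10.10.10
        port: 22
        username: test
        passwd: test
      # SSH key login; port is omitted (assumed to fall back to 22).
      - ip: 10.10.10.12
        username: test
        sshKeyPath: /nni/sshkey
        passphrase: qwert
    ```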

Examples

  • local mode

    If you want to run your trial jobs on your local machine and use annotation to generate the search space, you can use the following config:

authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0

If you want to use an assessor, you can add the assessor configuration to your file.

authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
assessor:
  #choice: Medianstop
  builtinAssessorName: Medianstop
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0

Or you can specify your own tuner and assessor files as follows:

authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  codeDir: /nni/tuner
  classFileName: mytuner.py
  className: MyTuner
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
assessor:
  codeDir: /nni/assessor
  classFileName: myassessor.py
  className: MyAssessor
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
  • remote mode

If you want to run trial jobs on remote machines, you can specify the remote machine information in the following format:

authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: remote
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.10.10.10
    port: 22
    username: test
    passwd: test
  - ip: 10.10.10.11
    port: 22
    username: test
    passwd: test
  - ip: 10.10.10.12
    port: 22
    username: test
    sshKeyPath: /nni/sshkey
    passphrase: qwert