Merge v0.2 features into branch master to release a new version (#297)

* refine readme

* feat: refine data push/pull (#138)

* feat: refine data push/pull

* test: add cli provision testing

* fix: style fix

* fix: add necessary comments

* fix: from code review

* add fallback function in weather download (#112)

* fix deployment issue in multi envs

* fix typo

* fix issue that ~/.maro does not exist in build

* skip deploy when build

* update for comments

* temporarily disable weather info

* replace ecr with cim in setup.py

* replace ecr in manifest

* remove weather check when read data

* fix station id issue

* fix format

* add TODO in comments

* add noaa weather source

* fix weather reset and weather comment

* add comment for weather data url

* some format update

* add fallback function in weather download

* update comment

* update for comments

* update comment

* add period

* fix for pylint

* update for pylint check

* added example docs (#136)

* added example docs

* added citibike greedy example doc

* modified citibike doc

* fixed PR comments

* fixed more PR comments

* fixed small formatting issue

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* switch the key and value of handler_dict in decorator (#144)

* switch the key and value of handler_dict in decorator

* add dist decorator UT and fixed multithreading conflict in maro test suite

* pr comments update.

* resolved comments about decorator UT

* rename handler_fun in dist decorator

* change self.attr into class_name.attr

* update UT tests comments

* V0.1 annotation (#147)

* refine the annotation of simulator core

* remove reward from env(be)

* format refined

* white spaces test

* left-padding spaces refined

* format modified

* update the left-padding spaces of docstrings

* code format updated

* update according to comments

* update according to PR comments

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Event payload details for env.summary (#156)

* key_list of events added for env.summary

* code refined according to lint

* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments

* code format refined

* try trigger the git tests

* update github workflow

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* V0.2 online lp for citi bike (#159)

* key_list of events added for env.summary

* code refined according to lint

* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments

* code format refined

* try trigger the git tests

* update github workflow

* online LP example added for citi bike

* infeasible solution

* infeasible solution fixed: call snapshot before any env.step()

* experiment results of toy topos added

* experiment results of toy topos added

* experiment result update: better than naive baseline

* PuLP version added

* greedy experiment results update

* citibike result update

* modified according to PR comments

* update experiment results and forecasting comparison

* citi bike lp README updated

* README updated

* modified according to PR comments

* update according to PR comments

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* update according to flake8

* V0.2 Logical operator overloading for EarlyStoppingChecker (#178)

* 1. added logical operator overloading for early stopping checker; 2. added mean value checker

* fixed PR comments

* removed learner.exit() in single_process_launcher

* added another early stopping checker in example

* fixed PR comments and lint issues

* lint issue fix

* fixed lint issues

* fixed a bug

* fixed a bug

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 skip connection (#176)

* replaced IdentityLayers with nn.Identity

* 1. added skip connection option in FC_net; 2. generalized learning model

* added skip_connection option in config

* removed type casting in fc_net

* fixed lint formatting issues

* refined docstring

* added multi-head functionality to LearningModel

* refined learning model docstring

* added head_key param in learningModel forward

* fixed PR comments

* added top layer logic and is_top option in fc_net

* fixed a bug

* fixed a bug

* reverted some changes in learning model

* reverted some changes in learning model

* added members to learning model to fix the mode issue

* fixed a bug

* fixed mode setting issue in learning model

* removed learner.exit() in single_process_launcher

* fixed PR comments

* fixed rl/__init__

* fixed issues in example

* fixed a bug

* fixed a bug

* fixed lint formatting issues

* moved reward type casting to exp shaper

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* fixed a bug in learner's test() (#193)

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 double dqn (#188)

* added dueling action value model

* renamed params in dueling_action_value_model

* renamed shared_features to features

* replaced IdentityLayers with nn.Identity

* 1. added skip connection option in FC_net; 2. generalized learning model

* added skip_connection option in config

* removed type casting in fc_net

* fixed lint formatting issues

* refined docstring

* mv dueling_action_value_model and fixed some bugs

* added multi-head functionality to LearningModel

* refined learning model docstring

* added head_key param in learningModel forward

* added double DQN and dueling features to DQN

* fixed a bug

* added DuelingQModelHead enum

* fixed a bug

* removed unwanted file

* fixed PR comments

* added top layer logic and is_top option in fc_net

* fixed a bug

* fixed a bug

* reverted some changes in learning model

* reverted some changes in learning model

* added members to learning model to fix the mode issue

* fixed a bug

* fixed mode setting issue in learning model

* fixed PR comments

* revised cim example according to DQN changes

* renamed eval_model to q_value_model in cim example

* more fixes

* fixed a bug

* fixed a bug

* added doc per PR comments

* removed learner.exit() in single_process_launcher

* removed learner.exit() in single_process_launcher

* fixed PR comments

* fixed rl/__init__

* fixed issues in example

* fixed a bug

* fixed a bug

* fixed lint formatting issues

* double DQN feature

* fixed a bug

* fixed a bug

* fixed PR comments

* fixed lint issue

* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm

* added load_models in simple_learner

* minor docstring edits

* minor docstring edits

* set is_double to true in DQN config

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>

* V0.2 feature predefined image (#183)

* feat: support predefined image provision

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* fix: erroneous script invocation after using relative import

* fix: missing init.py

* fixed a bug in learner's test()

* feat: add distributed_config for dqn example

* test: update test for grass

* test: update test for k8s

* feat: add prompts for steps

* fix: change relative imports to absolute imports

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>

* V0.2 feature proxy rejoin (#158)

* update dist decorator

* replace proxy.get_peers by proxy.peers

* update proxy rejoin (draft, not runnable for proxy rejoin)

* fix bugs in proxy

* add message cache, and redesign rejoin parameter

* feat: add checkpoint with test

* update proxy.rejoin

* fixed rejoin bug, rename func

* add test example(temp)

* feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents.

* capitalize env variable name

* rm json.dumps; change retries to 10; temp add warning level for rejoin

* fix: unable to load FaultToleranceAgent, missing params

* fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent

* feat: add node_id to node_details

* fix: add a new dependency for tests

* style: meet linting requirements

* style: remaining linting problems

* lint fixed; rm temp test folder.

* fixed lint f-string without placeholder

* fix: add a flag for "remove_container", refine restart logic and Redis keys naming

* proxy rejoin update.

* variable rename.

* fixed lint issues

* fixed lint issues

* add exit codes for different errors

* feat: add special errors handler

* add max rejoin times

* remove unused import

* add rejoin UT; resolve rejoin comments

* lint fixed

* fixed UT import problem

* rm MessageCache in proxy

* fix: refine key naming

* update proxy rejoin; add topic for broadcast

* feat: support predefined image provision

* update UT for communication

* add docstring for rejoin

* fixed isort and zmq driver import

* fixed isort and UT test

* fix isort issue

* proxy rejoin update (comments v2)

* fixed isort error

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* feat: add exists method for checkpoint

* fix: erroneous script invocation after using relative import

* fix: missing init.py

* fixed a bug in learner's test()

* add driver close and socket SUB disconnect for rejoin

* feat: add distributed_config for dqn example

* test: update test for grass

* test: update test for k8s

* feat: add prompts for steps

* fix: change relative imports to absolute imports

* fixed comments and update logger level

* mv driver into proxy.__init__ as a temp fix for the issue

* Update docstring and comments

* style: fix code review problems

* fix code format

Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 feature cli windows (#203)

* fix: change local mkdir to os.makedirs

* fix: add utf8 encoding for logger

* fix: add powershell.exe prefix to subprocess functions

* feat: add debug_green

* fix: use fsutil to create fixed-size files in Windows

* fix: use universal_newlines=True to handle encoding problem in different operating systems

* fix: use temp file to do copy when the operating system is not Linux

* fix: linting error

* fix: use fsutil in test_k8s.py

* feat: dynamic init ABS_PATH in GlobalParams

* fix: use -Command to execute Powershell command

* fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode

* fix: problems in code review

* EventBuffer refine (#197)

* merge uniform event changes back

* 1st step: move executing events into stack for better removal performance

* flush event pool

* typo

* add option for env to enable event pool

* refine stack functions

* fix comment issues, add typings

* lint fixing

* lint fix

* add missing fix

* linting

* lint

* use linked list instead of original event list and execute stack

* add missing file

* linting, and fixes

* add missing file

* linting fix

* fixing comments

* add missing file

* rename event_list to event_linked_list

* correct import path

* change enable_event_pool to disable_finished_events

* add missing file

* V0.2 merge master (#214)

* fix the visualization of docs/key_components/distributed_toolkit

* add examples into isort ignore

* refine import path for examples (#195)

* refine import path for examples

* refine indents

* fixed formatting issues

* update code style

* add editorconfig-checker, add editorconfig path into lint, change super-linter version

* change path for code saving in cim.gnn

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>

* fix issue that sometimes there is a conflict between distutils and setuptools (#208)

* fix issue that cython and setuptools conflict

* follow the accepted temp workaround

* update comment: it should be a conflict between setuptools and distutils

* fixed bugs related to proxy interface changes

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>

* typo fix

* Bug fix: event buffer issue that prevented Actions from being passed into the business engine (#215)

* bug fix

* clear the reference after extracting sub-events, update UT to cover this issue

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* fix flake8 style problem

* V0.2 feature refine mode namings (#212)

* feat: refine cli exception

* feat: refine mode namings

* EventBuffer refine (#197)

* merge uniform event changes back

* 1st step: move executing events into stack for better removal performance

* flush event pool

* typo

* add option for env to enable event pool

* refine stack functions

* fix comment issues, add typings

* lint fixing

* lint fix

* add missing fix

* linting

* lint

* use linked list instead of original event list and execute stack

* add missing file

* linting, and fixes

* add missing file

* linting fix

* fixing comments

* add missing file

* rename event_list to event_linked_list

* correct import path

* change enable_event_pool to disable_finished_events

* add missing file

* fixed bugs in dist rl

* feat: rename files

* tests: set longer graceful wait time

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* fix: rm redundant variables

* fix: refine error message

Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 vis new (#210)

Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>

* V0.2 local host process (#221)

* Update local process (not ready)

* update cli process mode

* add setup/clear/template for maro process

* fix process stop

* add logger and rename parameters

* add logger for setup/clear

* fixed closing of a non-existent pid when given a pid list.

* Fixed comments and renamed setup/clear to create/delete

* update ProcessInternalError

* V0.2 grass on premises (#220)

* feat: refine cli exception
* commit on v0.2_grass_on_premises

Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 vm scheduling scenario (#189)

* Initialize

* Data center scenario init

* Code style modification

* V0.2 event buffer subevents expand (#180)

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* unfold sub-events, insert after parent

* remove event category, use different class instead, add helper functions to gen decision and action event

* add a method to support adding an immediate event to a cascade event with tick validation

* fix ut issue

* add action as the 1st sub-event to ensure the execution order

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Data center scenario update

* Code style update

* Data scenario business engine update

* Isort update

* Fix lint code check

* Fix based on PR comments.

* Update based on PR comments.

* Add decision payload

* Add config file

* Update utilization series logic

* Update based on PR comment

* Update based on PR

* Update

* Update

* Add the ValidPm class

* Update docstring and naming

* Add energy consumption

* Lint code fixed

* Refining postpone function

* Lint style update

* Init data pipeline

* Update based on PR comment

* Add data pipeline download

* Lint style update

* Code style fix

* Temp update

* Data pipeline update

* Add aria2p download function

* Update based on PR comment

* Update based on PR comment

* Update based on PR comment

* Update naming of variables

* Rename topology

* Renaming

* Fix valid pm list

* Pylint fix

* Update comment

* Update docstring and comment

* Fix init import

* Update tick issue

* fix merge problem

* update style

* V0.2 datacenter data pipeline (#199)

* Data pipeline update

* Data pipeline update

* Lint update

* Update pipeline

* Add vmid mapping

* Update lint style

* Add VM data analytics

* Update notebook

* Add binary converter

* Modify vmtable yaml

* Update binary meta file

* Add cpu reader

* random example added for data center

* Fix bugs

* Fix pylint

* Add launcher

* Fix pylint

* best fit policy added

* Add reset

* Add config

* Add config

* Modify action object

* Modify config

* Fix naming

* Modify config

* Add snapshot list

* Fix a spelling typo

* Update based on PR comments.

* Rename scenario to vm scheduling

* Rename scenario

* Update print messages

* Lint fix

* Lint fix

* Rename scenario

* Modify the calculation of cpu utilization

* Add comment

* Modify data pipeline path

* Fix typo

* Modify naming

* Add unittest

* Add comment

* Unify naming

* Fix data path typo

* Update comments

* Update snapshot features

* Add take snapshot

* Add summary keys

* Update cpu reader

* Update naming

* Add unit test

* Rename snapshot node

* Add processed data pipeline

* Modify config

* Add comment

* Lint style fix

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Add package used in vm_scheduling

* add aria2p to test requirement

* best fit example: update the usage of snapshot

* Add aria2p to test requirement

* Remove finish event

* Fix unittest

* Add test dataset

* Update based on PR comment

* Refine cpu reader and unittest

* Lint update

* Refine based on PR comment

* Add agent index

* Add node mapping

* Refine based on PR comments

* Renaming postpone_step

* Renaming and refine based on PR comments

* Rename config

* Update

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* Resolve none action problem (#224)

* V0.2 vm_scheduling notebook (#223)

* Initialize

* Data center scenario init

* Code style modification

* V0.2 event buffer subevents expand (#180)

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* unfold sub-events, insert after parent

* remove event category, use different class instead, add helper functions to gen decision and action event

* add a method to support adding an immediate event to a cascade event with tick validation

* fix ut issue

* add action as the 1st sub-event to ensure the execution order

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Data center scenario update

* Code style update

* Data scenario business engine update

* Isort update

* Fix lint code check

* Fix based on PR comments.

* Update based on PR comments.

* Add decision payload

* Add config file

* Update utilization series logic

* Update based on PR comment

* Update based on PR

* Update

* Update

* Add the ValidPm class

* Update docstring and naming

* Add energy consumption

* Lint code fixed

* Refining postpone function

* Lint style update

* Init data pipeline

* Update based on PR comment

* Add data pipeline download

* Lint style update

* Code style fix

* Temp update

* Data pipeline update

* Add aria2p download function

* Update based on PR comment

* Update based on PR comment

* Update based on PR comment

* Update naming of variables

* Rename topology

* Renaming

* Fix valid pm list

* Pylint fix

* Update comment

* Update docstring and comment

* Fix init import

* Update tick issue

* fix merge problem

* update style

* V0.2 datacenter data pipeline (#199)

* Data pipeline update

* Data pipeline update

* Lint update

* Update pipeline

* Add vmid mapping

* Update lint style

* Add VM data analytics

* Update notebook

* Add binary converter

* Modify vmtable yaml

* Update binary meta file

* Add cpu reader

* random example added for data center

* Fix bugs

* Fix pylint

* Add launcher

* Fix pylint

* best fit policy added

* Add reset

* Add config

* Add config

* Modify action object

* Modify config

* Fix naming

* Modify config

* Add snapshot list

* Fix a spelling typo

* Update based on PR comments.

* Rename scenario to vm scheduling

* Rename scenario

* Update print messages

* Lint fix

* Lint fix

* Rename scenario

* Modify the calculation of cpu utilization

* Add comment

* Modify data pipeline path

* Fix typo

* Modify naming

* Add unittest

* Add comment

* Unify naming

* Fix data path typo

* Update comments

* Update snapshot features

* Add take snapshot

* Add summary keys

* Update cpu reader

* Update naming

* Add unit test

* Rename snapshot node

* Add processed data pipeline

* Modify config

* Add comment

* Lint style fix

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Add package used in vm_scheduling

* add aria2p to test requirement

* best fit example: update the usage of snapshot

* Add aria2p to test requirement

* Remove finish event

* Fix unittest

* Add test dataset

* Update based on PR comment

* Refine cpu reader and unittest

* Lint update

* Refine based on PR comment

* Add agent index

* Add node mapping

* Init vm scheduling notebook

* Add notebook

* Refine based on PR comments

* Renaming postpone_step

* Renaming and refine based on PR comments

* Rename config

* Update based on the v0.2_datacenter

* Update notebook

* Update

* update filepath

* notebook updated

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* Update process mode docs and fix on-premises mode (#226)

* V0.2 Add github workflow integration (#222)

* test: add github workflow integration

* fix: split procedures && bug fixed

* test: add training only restriction

* fix: add 'approved' restriction

* fix: change default ssh port to 22

* style: in one line

* feat: add timeout for Subprocess.run

* test: change default node_size to Standard_D2s_v3

* style: refine style

* fix: add ssh_port param to on-premises mode

* fix: add missing init.py

* V0.2 explorer (#198)

* overhauled exploration abstraction

* fixed a bug

* fixed a bug

* fixed a bug

* added exploration related methods to abs_agent

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* separated learning with exploration schedule and without

* small fixes

* moved explorer logic to actor side

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* removed unwanted param from simple agent manager

* added noise explorer

* fixed formatting

* removed unnecessary comma

* fixed PR comments

* removed unwanted exception and imports

* fixed a bug

* fixed PR comments

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issue

* fixed a bug

* fixed lint issue

* fixed naming

* combined exploration param generation and early stopping in scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* moved logger inside scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* removed epsilon parameter from choose_action

* fixed some PR comments

* fixed some PR comments

* bug fix

* bug fix

* bug fix

* removed explorer abstraction from agent

* refined dqn example

* fixed lint issues

* simplified scheduler

* removed early stopping from CIM dqn example

* removed early stopping from cim example config

* renamed early_stopping_callback to early_stopping_checker

* removed action_dim from noise explorer classes and added some shape checks

* modified NoiseExplorer's __call__ logic to batch processing

* made NoiseExplorer's __call__ return type np array

* renamed update to set_parameters in explorer

* fixed old naming in test_grass

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 embedded optim (#191)

* added dueling action value model

* renamed params in dueling_action_value_model

* renamed shared_features to features

* replaced IdentityLayers with nn.Identity

* 1. added skip connection option in FC_net; 2. generalized learning model

* added skip_connection option in config

* removed type casting in fc_net

* fixed lint formatting issues

* refined docstring

* mv dueling_action_value_model and fixed some bugs

* added multi-head functionality to LearningModel

* refined learning model docstring

* added head_key param in learningModel forward

* added double DQN and dueling features to DQN

* fixed a bug

* added DuelingQModelHead enum

* fixed a bug

* removed unwanted file

* fixed PR comments

* added top layer logic and is_top option in fc_net

* fixed a bug

* fixed a bug

* reverted some changes in learning model

* reverted some changes in learning model

* added members to learning model to fix the mode issue

* fixed a bug

* fixed mode setting issue in learning model

* fixed PR comments

* revised cim example according to DQN changes

* renamed eval_model to q_value_model in cim example

* more fixes

* fixed a bug

* fixed a bug

* added doc per PR comments

* removed learner.exit() in single_process_launcher

* removed learner.exit() in single_process_launcher

* fixed PR comments

* fixed rl/__init__

* fixed issues in example

* fixed a bug

* fixed a bug

* fixed lint formatting issues

* double DQN feature

* fixed a bug

* fixed a bug

* fixed PR comments

* fixed lint issue

* embedded optimizer into SingleHeadLearningModel

* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm

* added load_models in simple_learner

* minor docstring edits

* minor docstring edits

* minor docstring edits

* mv optimizer options inside LearningModel

* modified example accordingly

* fixed a bug

* fixed a bug

* fixed a bug

* added dueling DQN feature

* revised and refined docstrings

* fixed a bug

* fixed lint issues

* added load/dump functions to LearningModel

* fixed a bug

* fixed a bug

* fixed lint issues

* refined DQN docstrings

* removed load/dump functions from DQN

* added task validator

* fixed decorator use

* fixed a typo

* fixed a bug

* fixed lint issues

* changed LearningModel's step() to take a single loss

* revised learning model design

* revised example

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* added decorator utils to algorithm

* fixed a bug

* renamed core_model to model

* fixed a bug

* 1. fixed lint formatting issues; 2. refined learning model docstrings

* rm trailing whitespaces

* added decorator for choose_action

* fixed a bug

* fixed a bug

* fixed version-related issues

* renamed add_zeroth_dim decorator to expand_dim

* overhauled exploration abstraction

* fixed a bug

* fixed a bug

* fixed a bug

* added exploration related methods to abs_agent

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* separated learning with exploration schedule and without

* small fixes

* moved explorer logic to actor side

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* removed unwanted param from simple agent manager

* small fixes

* added shared_module property to LearningModel

* added shared_module property to LearningModel

* revised __getstate__ for LearningModel

* fixed a bug

* added soft_update function to learningModel

* fixed a bug

* revised learningModel

* rm __getstate__ and __setstate__ from LearningModel

* added noise explorer

* fixed formatting

* removed unnecessary comma

* removed unnecessary comma

* fixed PR comments

* removed unwanted exception and imports

* removed unwanted exception and imports

* fixed a bug

* fixed PR comments

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issue

* fixed a bug

* fixed lint issue

* fixed naming

* combined exploration param generation and early stopping in scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* moved logger inside scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* removed epsilon parameter from choose_action

* removed epsilon parameter from choose_action

* changed agent manager's train parameter to experience_by_agent

* fixed some PR comments

* renamed zero_grad to zero_gradients in LearningModule

* fixed some PR comments

* bug fix

* bug fix

* bug fix

* removed explorer abstraction from agent

* added DEVICE env variable as first choice for torch device

* refined dqn example

* fixed lint issues

* removed unwanted import in cim example

* updated cim-dqn notebook

* simplified scheduler

* edited notebook according to merged scheduler changes

* refined dimension check for learning module manager and removed num_actions from DQNConfig

* bug fix for cim example

* added notebook output

* removed early stopping from CIM dqn example

* removed early stopping from cim example config

* moved decorator logic inside algorithms

* renamed early_stopping_callback to early_stopping_checker

* removed action_dim from noise explorer classes and added some shape checks

* modified NoiseExplorer's __call__ logic to batch processing

* made NoiseExplorer's __call__ return type np array

* renamed update to set_parameters in explorer

* fixed old naming in test_grass

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 VM scheduling docs (#228)

* Initialize

* Data center scenario init

* Code style modification

* V0.2 event buffer subevents expand (#180)

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* unfold sub-events, insert after parent

* remove event category, use different class instead, add helper functions to gen decision and action event

* add a method to support adding an immediate event to a cascade event with tick validation

* fix ut issue

* add action as the 1st sub-event to ensure the execution order

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Data center scenario update

* Code style update

* Data scenario business engine update

* Isort update

* Fix lint code check

* Fix based on PR comments.

* Update based on PR comments.

* Add decision payload

* Add config file

* Update utilization series logic

* Update based on PR comment

* Update based on PR

* Update

* Update

* Add the ValidPm class

* Update docstring and naming

* Add energy consumption

* Lint code fixed

* Refining postpone function

* Lint style update

* Init data pipeline

* Update based on PR comment

* Add data pipeline download

* Lint style update

* Code style fix

* Temp update

* Data pipeline update

* Add aria2p download function

* Update based on PR comment

* Update based on PR comment

* Update based on PR comment

* Update naming of variables

* Rename topology

* Renaming

* Fix valid pm list

* Pylint fix

* Update comment

* Update docstring and comment

* Fix init import

* Update tick issue

* fix merge problem

* update style

* V0.2 datacenter data pipeline (#199)

* Data pipeline update

* Data pipeline update

* Lint update

* Update pipeline

* Add vmid mapping

* Update lint style

* Add VM data analytics

* Update notebook

* Add binary converter

* Modify vmtable yaml

* Update binary meta file

* Add cpu reader

* random example added for data center

* Fix bugs

* Fix pylint

* Add launcher

* Fix pylint

* best fit policy added

* Add reset

* Add config

* Add config

* Modify action object

* Modify config

* Fix naming

* Modify config

* Add snapshot list

* Fix a spelling typo

* Update based on PR comments.

* Rename scenario to vm scheduling

* Rename scenario

* Update print messages

* Lint fix

* Lint fix

* Rename scenario

* Modify the calculation of cpu utilization

* Add comment

* Modify data pipeline path

* Fix typo

* Modify naming

* Add unittest

* Add comment

* Unify naming

* Fix data path typo

* Update comments

* Update snapshot features

* Add take snapshot

* Add summary keys

* Update cpu reader

* Update naming

* Add unit test

* Rename snapshot node

* Add processed data pipeline

* Modify config

* Add comment

* Lint style fix

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Add package used in vm_scheduling

* add aria2p to test requirement

* best fit example: update the usage of snapshot

* Add aria2p to test requirement

* Remove finish event

* Fix unittest

* Add test dataset

* Update based on PR comment

* vm doc init

* Update docs

* Update docs

* Update docs

* Update docs

* Remove old notebook

* Update docs

* Update docs

* Add figure

* Update docs

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* v0.2 VM Scheduling docs refinement (#231)

* Fix typo

* Refining vm scheduling docs

* V0.2 store refinement (#234)

* updated docs and images for rl toolkit

* 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Fix bug (#237)

vm scenario: fix the event type bug of the postpone event

* V0.2 rl toolkit doc (#235)

* updated docs and images for rl toolkit

* updated cim example doc

* updated cim exmaple docs

* updated cim example rst

* updated rl_toolkit and cim example docs

* replaced q_module with q_net in example rst

* refined doc

* refined doc

* updated figures

* updated figures

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Merge V0.2 vis into V0.2 (#233)

* Implemented snapshot dump and conversion to CSV.

* Let BE support params when dumping snapshots.

* Refactor dump code to core.py

* Implemented decision event dump.

* replace is not '' with !=''

* Fixed issues that code review mentioned.

* removed path from hello.py

* Changed import sort.

* Fix import sorting in citi_bike/business_engine

* visualization 0.1

* Updated lint configurations.

* Fixed formatting error that caused lint errors.

* render html title function

* Try to fix lint errors.

* flake-8 style fix

* remove space around 18,35

* dump_csv_converter.py re-formatting.

* files re-formatting.

* style fixed

* tab delete

* white space fix

* white space fix-2

* vis redundant function delete

* refine

* re-formatting after merged upstream.

* Updated import section.

* Updated import section.

* pr refine

* isort fix

* white space

* lint error

* \n error

* test continuation

* indent

* continuation of indent

* indent 0.3

* comment update

* comment update 0.2

* f-string update

* f-string 0.2

* lint 0.3

* lint 0.4

* lint 0.4

* lint 0.5

* lint 0.6

* docstring update

* data version deploy update

* condition update

* add whitespace

* V0.2 vis dump feature enhancement. (#190)

* Dump: added manifest file.
* Code format updated by flake8
* Changed manifest file format for easy reading.

* deploy info update; docs update

* weird white space

* Update dashboard_visualization.md

* new endline?

* delete dependency

* delete irrelevant file

* change scenario to enum, divide file path into a separate class

* doc refine

* doc update

* params type

* data structure update

* doc&enum, formula refine

* refine

* add ut, refine doc

* style refine

* isort

* strong type fix

* os._exit delete

* revert datalib

* import new line

* change test case

* change file name & doc

* change deploy path

* delete params

* revert file

* delete duplicate file

* delete single process

* update naming

* manually change import order

* delete blank

* edit error

* requirement txt

* style fix & refine

* comments&docstring refine

* add parameter name

* test & dump

* comments update

* Added manifest file. (#201)

Only a few changes needed to meet the requirements of the manifest file format.

* comments fix

* delete toolkit change

* doc update

* citi bike update

* deploy path

* datalib update

* revert datalib

* revert

* maro file format

* comments update

* doc update

* update param name

* doc update

* new link

* image update

* V0.2 visualization-0.1 (#181)

* visualization 0.1

* render html title function

* flake-8 style fix

* style fixed

* tab delete

* white space fix

* white space fix-2

* vis redundant function delete

* refine

* pr refine

* isort fix

* white space

* lint error

* \n error

* test continuation

* indent

* continuation of indent

* indent 0.3

* comment update

* comment update 0.2

* f-string update

* f-string 0.2

* lint 0.3

* lint 0.4

* lint 0.4

* lint 0.5

* lint 0.6

* docstring update

* data version deploy update

* condition update

* add whitespace

* deploy info update; docs update

* weird white space

* Update dashboard_visualization.md

* new endline?

* delete dependency

* delete irrelevant file

* change scenario to enum, divide file path into a separate class

* fix the visualization of docs/key_components/distributed_toolkit

* doc refine

* doc update

* params type

* add examples into isort ignore

* data structure update

* doc&enum, formula refine

* refine

* add ut, refine doc

* style refine

* isort

* strong type fix

* os._exit delete

* revert datalib

* import new line

* change test case

* change file name & doc

* change deploy path

* delete params

* revert file

* delete duplicate file

* delete single process

* update naming

* manually change import order

* delete blank

* edit error

* requirement txt

* style fix & refine

* comments&docstring refine

* add parameter name

* test & dump

* comments update

* comments fix

* delete toolkit change

* doc update

* citi bike update

* deploy path

* datalib update

* revert datalib

* revert

* maro file format

* comments update

* doc update

* update param name

* doc update

* new link

* image update

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>

* image change

* add reset snapshot

* delete dump

* add new line

* add next steps

* import change

* relative import

* add init file

* import change

* change utils file

* change cliexception to clierror

* dashboard test

* change result

* change assertion

* move not

* unit test change

* core change

* unit test delete name_mapping_file

* update cim business engine

* doc update

* change relative path

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* change import sequence

* comments update

* doc add pic

* add dependency

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* Update dashboard_visualization.rst

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* delete white space

* doc update

* doc update

* update doc

* update doc

* update doc

Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* V0.2 docs process mode (#230)

* Update process mode docs and fix on-premises mode

* Update orchestration docs

* Update process mode docs, add JOB_NAME as env variable

* fixed bugs

* fixed isort issue

* update docs index

Co-authored-by: kaiqli <v-kaiqli@microsoft.com>

* V0.2 learning model refinement (#236)

* moved optimizer options to LearningModel

* typo fix

* fixed lint issues

* updated notebook

* misc edits

* 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook

* renamed single_host_cim_learner to cim_learner in notebook

* updated notebook output

* typo fix

* removed dimension check in absence of shared stack

* fixed a typo

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Update vm docs (#241)

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* V0.2 info update (#240)

* update readme

* update version

* refine reademe format

* add vis gif

* add citation

* update citation

* update badge

Co-authored-by: Arthur Jiang <sjian@microsoft.com>

* Fix typo (#242)

* Fix typo

* fix typo

* fix

* syntax fix (#253)

* syntax fix

* syntax fix

* syntax fix

* rm unwanted import

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 vm oversubscription (#246)

* Remove topology

* Update pipeline

* Update pipeline

* Update pipeline

* Modify metafile

* Add two attributes of VM

* Update pipeline

* Add vm category

* Add todo

* Add oversub config

* Add oversubscription feature

* Lint fix

* Update based on PR comment.

* Update pipeline

* Update pipeline

* Update config.

* Update based on PR comment

* Update

* Add pm sku feature

* Add sku setting

* Add sku feature

* Lint fix

* Lint style

* Update sku, overloading

* Lint fix

* Lint style

* Fix bug

* Modify config

* Remove sku and replace it with pm type

* Add and refactor vm category

* Comment out config

* Unify the enum format

* Fix lint style

* Fix import order

* Update based on PR comment

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* V0.2 vm scheduling decision event (#257)

* Fix data preparation bug

* Add frame index

* V0.2 PG, K-step and lambda return utils (#155)

* fixed a bug

* fixed lint issues

* added load/dump functions to LearningModel

* fixed a bug

* fixed a bug

* fixed lint issues

* merged with v0.2_embedded_optims

* refined DQN docstrings

* removed load/dump functions from DQN

* added task validator

* fixed decorator use

* fixed a typo

* fixed a bug

* revised

* fixed lint issues

* changed LearningModel's step() to take a single loss

* revised learning model design

* revised example

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* added decorator utils to algorithm

* fixed a bug

* renamed core_model to model

* fixed a bug

* 1. fixed lint formatting issues; 2. refined learning model docstrings

* rm trailing whitespaces

* added decorator for choose_action

* fixed a bug

* fixed a bug

* fixed version-related issues

* renamed add_zeroth_dim decorator to expand_dim

* overhauled exploration abstraction

* fixed a bug

* fixed a bug

* fixed a bug

* added exploration related methods to abs_agent

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* separated learning with exploration schedule and without

* small fixes

* moved explorer logic to actor side

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* removed unwanted param from simple agent manager

* small fixes

* revised code based on revised abstractions

* fixed some bugs

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* added shared_module property to LearningModel

* added shared_module property to LearningModel

* fixed a bug with k-step return in AC

* fixed a bug

* fixed a bug

* merged pg, ac and ppo examples

* fixed a bug

* fixed a bug

* fixed naming for ppo

* renamed some variables in PPO

* added ActionWithLogProbability return type for PO-type algorithms

* fixed a bug

* fixed a bug

* fixed lint issues

* revised __getstate__ for LearningModel

* fixed a bug

* added soft_update function to learningModel

* fixed a bug

* revised learningModel

* rm __getstate__ and __setstate__ from LearningModel

* added noise explorer

* formatting

* fixed formatting

* removed unnecessary comma

* removed unnecessary comma

* removed unnecessary comma

* fixed PR comments

* removed unwanted exception and imports

* removed unwanted exception and imports

* fixed a bug

* fixed PR comments

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issue

* fixed a bug

* fixed lint issue

* fixed naming

* combined exploration param generation and early stopping in scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* moved logger inside scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* removed epsilon parameter from choose_action

* removed epsilon parameter from choose_action

* changed agent manager's train parameter to experience_by_agent

* fixed some PR comments

* renamed zero_grad to zero_gradients in LearningModule

* fixed some PR comments

* bug fix

* bug fix

* bug fix

* removed explorer abstraction from agent

* added DEVICE env variable as first choice for torch device

* refined dqn example

* fixed lint issues

* removed unwanted import in cim example

* updated cim-dqn notebook

* simplified scheduler

* edited notebook according to merged scheduler changes

* refined dimension check for learning module manager and removed num_actions from DQNConfig

* bug fix for cim example

* added notebook output

* updated cim PO example code according to changes in maro/rl

* removed early stopping from CIM dqn example

* combined ac and ppo and simplified example code and config

* removed early stopping from cim example config

* moved decorator logic inside algorithms

* renamed early_stopping_callback to early_stopping_checker

* put PG and AC under PolicyOptimization class and refined examples accordingly

* fixed lint issues

* removed action_dim from noise explorer classes and added some shape checks

* modified NoiseExplorer's __call__ logic to batch processing

* made NoiseExplorer's __call__ return type np array

* renamed update to set_parameters in explorer

* fixed old naming in test_grass

* moved optimizer options to LearningModel

* typo fix

* fixed lint issues

* updated notebook

* updated cim example for policy optimization

* typo fix

* typo fix

* typo fix

* typo fix

* misc edits

* minor edits to rl_toolkit.rst

* checked out docs from master

* fixed typo in k-step shaper

* fixed lint issues

* bug fix in store

* lint issue fix

* changed default max_ep to 100 for policy_optimization algos

* vis doc update to master (#244)

* refine readme

* feat: refine data push/pull (#138)

* feat: refine data push/pull

* test: add cli provision testing

* fix: style fix

* fix: add necessary comments

* fix: from code review

* add fallback function in weather download (#112)

* fix deployment issue in multi envs

* fix typo

* fix issue that ~/.maro does not exist in build

* skip deploy when build

* update for comments

* temporarily disable weather info

* replace ecr with cim in setup.py

* replace ecr in manifest

* remove weather check when read data

* fix station id issue

* fix format

* add TODO in comments

* add noaa weather source

* fix weather reset and weather comment

* add comment for weather data url

* some format update

* add fallback function in weather download

* update comment

* update for comments

* update comment

* add period

* fix for pylint

* update for pylint check

* added example docs (#136)

* added example docs

* added citibike greedy example doc

* modified citibike doc

* fixed PR comments

* fixed more PR comments

* fixed small formatting issue

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* switch the key and value of handler_dict in decorator (#144)

* switch the key and value of handler_dict in decorator

* add dist decorator UT and fixed multithreading conflict in maro test suite

* pr comments update.

* resolved comments about decorator UT

* rename handler_fun in dist decorator

* change self.attr into class_name.attr

* update UT tests comments

* V0.1 annotation (#147)

* refine the annotation of simulator core

* remove reward from env(be)

* format refined

* white spaces test

* left-padding spaces refined

* format modified

* update the left-padding spaces of docstrings

* code format updated

* update according to comments

* update according to PR comments

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Event payload details for env.summary (#156)

* key_list of events added for env.summary

* code refined according to lint

* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments

* code format refined

* try trigger the git tests

* update github workflow

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Implemented snapshot dump and conversion to CSV.

* Let BE support params when dumping snapshots.

* Refactor dump code to core.py

* Implemented decision event dump.

* V0.2 online lp for citi bike (#159)

* key_list of events added for env.summary

* code refined according to lint

* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments

* code format refined

* try trigger the git tests

* update github workflow

* online LP example added for citi bike

* infeasible solution

* infeasible solution fixed: call snapshot before any env.step()

* experiment results of toy topos added

* experiment results of toy topos added

* experiment result update: better than naive baseline

* PuLP version added

* greedy experiment results update

* citibike result update

* modified according to PR comments

* update experiment results and forecasting comparison

* citi bike lp README updated

* README updated

* modified according to PR comments

* update according to PR comments

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving
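
A minimal sketch of the state_dict-based model saving this commit describes, assuming a standard PyTorch module; the network below is illustrative, not MARO's actual DQN model:

```python
import torch
import torch.nn as nn

# Hypothetical Q-network; any nn.Module is saved the same way.
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))

# Save only the parameter tensors (state_dict) rather than pickling the module object.
torch.save(q_net.state_dict(), "dqn_q_net.pt")

# Restore into a freshly constructed model with the same architecture.
restored = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
restored.load_state_dict(torch.load("dqn_q_net.pt"))
restored.eval()
```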

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* replace is not '' with !=''
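A quick illustration of why this change matters: `is` tests object identity, so comparing against a string literal with `is not` is unreliable (and CPython 3.8+ emits a SyntaxWarning for it), while `!=` compares by value. The function name below is illustrative:

```python
def name_ok(name: str) -> bool:
    # Wrong: "name is not ''" compares object identity; whether two equal strings
    # are the same object is an interpreter detail.
    # Right: compare by value.
    return name != ""

print(name_ok("maro"))  # True
print(name_ok(""))      # False
```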

* Fixed issues that code review mentioned.

* removed path from hello.py

* Changed import sort.

* Fix  import sorting in citi_bike/business_engine

* visualization 0.1

* Updated lint configurations.

* Fixed formatting error that caused lint errors.

* render html title function

* Try to fix lint errors.

* flake-8 style fix

* remove space around 18,35

* dump_csv_converter.py re-formatting.

* files re-formatting.

* style fixed

* tab delete

* white space fix

* white space fix-2

* vis redundant function delete

* refine

* update according to flake8

* re-formatting after merged upstream.

* Updated import section.

* Updated import section.

* V0.2 Logical operator overloading for EarlyStoppingChecker (#178)

* 1. added logical operator overloading for early stopping checker; 2. added mean value checker
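A minimal sketch of the operator-overloading pattern referred to here, not MARO's actual checker classes: overriding __and__/__or__ lets two stopping criteria be combined into a single callable.

```python
class Checker:
    """Wraps a predicate over a metric history; composable with & and |."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, history):
        return self.fn(history)

    def __and__(self, other):
        return Checker(lambda history: self(history) and other(history))

    def __or__(self, other):
        return Checker(lambda history: self(history) or other(history))


# Stop only when the latest value exceeds 0.95 AND the mean of the last five exceeds 0.9.
last_above = Checker(lambda h: h[-1] > 0.95)
mean_above = Checker(lambda h: sum(h[-5:]) / len(h[-5:]) > 0.9)
combined = last_above & mean_above
print(combined([0.80, 0.92, 0.93, 0.94, 0.96]))  # True
```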

* fixed PR comments

* removed learner.exit() in single_process_launcher

* added another early stopping checker in example

* fixed PR comments and lint issues

* lint issue fix

* fixed lint issues

* fixed a bug

* fixed a bug

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 skip connection (#176)

* replaced IdentityLayers with nn.Identity
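nn.Identity is PyTorch's built-in pass-through module, so a hand-rolled identity layer can be dropped; a tiny check of the equivalence (the IdentityLayer below stands in for the replaced class):

```python
import torch
import torch.nn as nn

class IdentityLayer(nn.Module):  # the kind of hand-rolled pass-through layer being replaced
    def forward(self, x):
        return x

x = torch.randn(4, 16)
assert torch.equal(nn.Identity()(x), IdentityLayer()(x))
```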

* 1. added skip connection option in FC_net; 2. generalized learning model

* added skip_connection option in config

* removed type casting in fc_net

* fixed lint formatting issues

* refined docstring

* added multi-head functionality to LearningModel

* refined learning model docstring

* added head_key param in learningModel forward

* fixed PR comments

* added top layer logic and is_top option in fc_net

* fixed a bug

* fixed a bug

* reverted some changes in learning model

* reverted some changes in learning model

* added members to learning model to fix the mode issue

* fixed a bug

* fixed mode setting issue in learning model

* removed learner.exit() in single_process_launcher

* fixed PR comments

* fixed rl/__init__

* fixed issues in example

* fixed a bug

* fixed a bug

* fixed lint formatting issues

* moved reward type casting to exp shaper

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* pr refine

* isort fix

* white space

* lint error

* \n error

* test continuation

* indent

* continuation of indent

* indent 0.3

* comment update

* comment update 0.2

* f-string update

* f-string 0.2

* lint 0.3

* lint 0.4

* lint 0.4

* lint 0.5

* lint 0.6

* docstring update

* data version deploy update

* condition update

* add whitespace

* V0.2 vis dump feature enhancement. (#190)

* Dumps added manifest file.
* Code updated format by flake8
* Changed manifest file format for easy reading.

* deploy info update; docs update

* weird white space

* Update dashboard_visualization.md

* new endline?

* delete dependency

* delete irrelevant file

* change scenario to enum, divide file path into a separated class

* fixed a bug in learner's test() (#193)

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 double dqn (#188)

* added dueling action value model

* renamed params in dueling_action_value_model

* renamed shared_features to features

* replaced IdentityLayers with nn.Identity

* 1. added skip connection option in FC_net; 2. generalized learning model

* added skip_connection option in config

* removed type casting in fc_net

* fixed lint formatting issues

* refined docstring

* mv dueling_action_value_model and fixed some bugs

* added multi-head functionality to LearningModel

* refined learning model docstring

* added head_key param in learningModel forward

* added double DQN and dueling features to DQN

* fixed a bug

* added DuelingQModelHead enum

* fixed a bug

* removed unwanted file

* fixed PR comments

* added top layer logic and is_top option in fc_net

* fixed a bug

* fixed a bug

* reverted some changes in learning model

* reverted some changes in learning model

* added members to learning model to fix the mode issue

* fixed a bug

* fixed mode setting issue in learning model

* fixed PR comments

* revised cim example according to DQN changes

* renamed eval_model to q_value_model in cim example

* more fixes

* fixed a bug

* fixed a bug

* added doc per PR comments

* removed learner.exit() in single_process_launcher

* removed learner.exit() in single_process_launcher

* fixed PR comments

* fixed rl/__init__

* fixed issues in example

* fixed a bug

* fixed a bug

* fixed lint formatting issues

* double DQN feature

* fixed a bug

* fixed a bug

* fixed PR comments

* fixed lint issue

* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm

* added load_models in simple_learner

* minor docstring edits

* minor docstring edits

* set is_double to true in DQN config

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>

* V0.2 feature predefined image (#183)

* feat: support predefined image provision

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* fix: error scripts invocation after using relative import

* fix: missing init.py

* fixed a bug in learner's test()

* feat: add distributed_config for dqn example

* test: update test for grass

* test: update test for k8s

* feat: add promptings for steps

* fix: change relative imports to absolute imports

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>

* doc refine

* doc update

* params type

* data structure update

* doc&enum, formula refine

* refine

* add ut, refine doc

* style refine

* isort

* strong type fix

* os._exit delete

* revert datalib

* import new line

* change test case

* change file name & doc

* change deploy path

* delete params

* revert file

* delete duplicate file

* delete single process

* update naming

* manually change import order

* delete blank

* edit error

* requirement txt

* style fix & refine

* comments&docstring refine

* add parameter name

* test & dump

* comments update

* V0.2 feature proxy rejoin (#158)

* update dist decorator

* replace proxy.get_peers by proxy.peers

* update proxy rejoin (draft, not runable for proxy rejoin)

* fix bugs in proxy

* add message cache, and redesign rejoin parameter

* feat: add checkpoint with test

* update proxy.rejoin

* fixed rejoin bug, rename func

* add test example(temp)

* feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents.

* capitalize env variable name

* rm json.dumps; change retries to 10; temp add warning level for rejoin

* fix: unable to load FaultToleranceAgent, missing params

* fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent

* feat: add node_id to node_details

* fix: add a new dependency for tests

* style: meet linting requirements

* style: remaining linting problems

* lint fixed; rm temp test folder.

* fixed lint f-string without placeholder

* fix: add a flag for "remove_container", refine restart logic and Redis keys naming

* proxy rejoin update.

* variable rename.

* fixed lint issues

* fixed lint issues

* add exit code for different error

* feat: add special errors handler

* add max rejoin times

* remove unused import

* add rejoin UT; resolve rejoin comments

* lint fixed

* fixed UT import problem

* rm MessageCache in proxy

* fix: refine key naming

* update proxy rejoin; add topic for broadcast

* feat: support predefined image provision

* update UT for communication

* add docstring for rejoin

* fixed isort and zmq driver import

* fixed isort and UT test

* fix isort issue

* proxy rejoin update (comments v2)

* fixed isort error

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* feat: add exists method for checkpoint

* fix: error scripts invocation after using relative import

* fix: missing init.py

* fixed a bug in learner's test()

* add driver close and socket SUB disconnect for rejoin

* feat: add distributed_config for dqn example

* test: update test for grass

* test: update test for k8s

* feat: add promptings for steps

* fix: change relative imports to absolute imports

* fixed comments and update logger level

* mv driver into proxy.__init__ as a temporary fix for the issue

* Update docstring and comments

* style: fix code reviews problems

* fix code format

Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 feature cli windows (#203)

* fix: change local mkdir to os.makedirs

* fix: add utf8 encoding for logger

* fix: add powershell.exe prefix to subprocess functions

* feat: add debug_green

* fix: use fsutil to create fix-size files in Windows

* fix: use universal_newlines=True to handle encoding problem in different operating systems
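
The two fixes above lean on standard tools; a minimal sketch under that assumption (not the MARO CLI wrappers themselves):

```python
import platform
import subprocess

# universal_newlines=True makes run() return decoded text instead of bytes,
# which sidesteps per-OS decoding differences when parsing command output.
completed = subprocess.run(
    "echo hello", shell=True, capture_output=True, universal_newlines=True
)
print(completed.stdout.strip())

# On Windows, fsutil can allocate a fixed-size file (1 MiB here) without writing data;
# it may require an elevated prompt.
if platform.system() == "Windows":
    subprocess.run("fsutil file createnew placeholder.bin 1048576", shell=True, check=True)
```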

* fix: use temp file to do copy when the operating system is not Linux

* fix: linting error

* fix: use fsutil in test_k8s.py

* feat: dynamic init ABS_PATH in GlobalParams

* fix: use -Command to execute Powershell command

* fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode

* fix: problems in code review

* EventBuffer refine (#197)

* merge uniform event changes back

* 1st step: move executing events into stack for better removing performance

* flush event pool

* typo

* add option for env to enable event pool

* refine stack functions

* fix comment issues, add typings

* lint fixing

* lint fix

* add missing fix

* linting

* lint

* use a linked list instead of the original event list and execute stack
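
A minimal sketch of why a linked list helps here (illustrative, not the EventBuffer implementation): a finished event can be unlinked in O(1) instead of being removed from the middle of a Python list.

```python
class EventNode:
    __slots__ = ("payload", "prev", "next")

    def __init__(self, payload):
        self.payload = payload
        self.prev = None
        self.next = None


def unlink(node: EventNode) -> None:
    """Drop a finished event in O(1) by re-wiring its neighbours."""
    if node.prev is not None:
        node.prev.next = node.next
    if node.next is not None:
        node.next.prev = node.prev
    node.prev = node.next = None
```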

* add missing file

* linting, and fixes

* add missing file

* linting fix

* fixing comments

* add missing file

* rename event_list to event_linked_list

* correct import path

* change enable_event_pool to disable_finished_events

* add missing file

* V0.2 merge master (#214)

* fix the visualization of docs/key_components/distributed_toolkit

* add examples into isort ignore

* refine import path for examples (#195)

* refine import path for examples

* refine indents

* fixed formatting issues

* update code style

* add editorconfig-checker, add editorconfig path into lint, change super-linter version

* change path for code saving in cim.gnn

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>

* fix issue that sometimes there is conflict between distutils and setuptools  (#208)

* fix issue that cython and setuptools conflict

* follow the accepted temp workaround

* update comment, it should be conflict between setuptools and distutils

* fixed bugs related to proxy interface changes

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>

* typo fix

* Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215)

* bug fix

* clear the reference after extract sub events, update ut to cover this issue

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* fix flake8 style problem

* V0.2 feature refine mode namings (#212)

* feat: refine cli exception

* feat: refine mode namings

* EventBuffer refine (#197)

* merge uniform event changes back

* 1st step: move executing events into stack for better removing performance

* flush event pool

* typo

* add option for env to enable event pool

* refine stack functions

* fix comment issues, add typings

* lint fixing

* lint fix

* add missing fix

* linting

* lint

* use a linked list instead of the original event list and execute stack

* add missing file

* linting, and fixes

* add missing file

* linting fix

* fixing comments

* add missing file

* rename event_list to event_linked_list

* correct import path

* change enable_event_pool to disable_finished_events

* add missing file

* fixed bugs in dist rl

* feat: rename files

* tests: set a longer graceful wait time

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* fix: rm redundant variables

* fix: refine error message

Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 vis new (#210)

Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>

* V0.2 local host process (#221)

* Update local process (not ready)

* update cli process mode

* add setup/clear/template for maro process

* fix process stop

* add logger and rename parameters

* add logger for setup/clear

* fixed closing a non-existent pid when given a pid list.

* Fixed comments and rename setup/clear with create/delete

* update ProcessInternalError

* comments fix

* delete toolkit change

* doc update

* citi bike update

* deploy path

* datalib update

* revert datalib

* revert

* maro file format

* comments update

* doc update

* V0.2 grass on premises (#220)

* feat: refine cli exception
* commit on v0.2_grass_on_premises

Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 vm scheduling scenario (#189)

* Initialize

* Data center scenario init

* Code style modification

* V0.2 event buffer subevents expand (#180)

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* unfold sub-events, insert after parent

* remove event category, use different class instead, add helper functions to gen decision and action event

* add a method to support add immediate event to cascade event with tick validation

* fix ut issue

* add action as 1st sub event to ensure the executing order

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Data center scenario update

* Code style update

* Data scenario business engine update

* Isort update

* Fix lint code check

* Fix based on PR comments.

* Update based on PR comments.

* Add decision payload

* Add config file

* Update utilization series logic

* Update based on PR comment

* Update based on PR

* Update

* Update

* Add the ValidPm class

* Update docs string and naming

* Add energy consumption

* Lint code fixed

* Refining postpone function

* Lint style update

* Init data pipeline

* Update based on PR comment

* Add data pipeline download

* Lint style update

* Code style fix

* Temp update

* Data pipeline update

* Add aria2p download function

* Update based on PR comment

* Update based on PR comment

* Update based on PR comment

* Update naming of variables

* Rename topology

* Renaming

* Fix valid pm list

* Pylint fix

* Update comment

* Update docstring and comment

* Fix init import

* Update tick issue

* fix merge problem

* update style

* V0.2 datacenter data pipeline (#199)

* Data pipeline update

* Data pipeline update

* Lint update

* Update pipeline

* Add vmid mapping

* Update lint style

* Add VM data analytics

* Update notebook

* Add binary converter

* Modify vmtable yaml

* Update binary meta file

* Add cpu reader

* random example added for data center

* Fix bugs

* Fix pylint

* Add launcher

* Fix pylint

* best fit policy added

* Add reset

* Add config

* Add config

* Modify action object

* Modify config

* Fix naming

* Modify config

* Add snapshot list

* Modify a spelling typo

* Update based on PR comments.

* Rename scenario to vm scheduling

* Rename scenario

* Update print messages

* Lint fix

* Lint fix

* Rename scenario

* Modify the calculation of cpu utilization

* Add comment

* Modify data pipeline path

* Fix typo

* Modify naming

* Add unittest

* Add comment

* Unify naming

* Fix data path typo

* Update comments

* Update snapshot features

* Add take snapshot

* Add summary keys

* Update cpu reader

* Update naming

* Add unit test

* Rename snapshot node

* Add processed data pipeline

* Modify config

* Add comment

* Lint style fix

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Add package used in vm_scheduling

* add aria2p to test requirement

* best fit example: update the usage of snapshot

* Add aria2p to test requirement

* Remove finish event

* Fix unittest

* Add test dataset

* Update based on PR comment

* Refine cpu reader and unittest

* Lint update

* Refine based on PR comment

* Add agent index

* Add node mapping

* Refine based on PR comments

* Renaming postpone_step

* Renaming and refine based on PR comments

* Rename config

* Update

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* Resolve none action problem (#224)

* V0.2 vm_scheduling notebook (#223)

* Initialize

* Data center scenario init

* Code style modification

* V0.2 event buffer subevents expand (#180)

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* unfold sub-events, insert after parent

* remove event category, use different class instead, add helper functions to gen decision and action event

* add a method to support add immediate event to cascade event with tick validation

* fix ut issue

* add action as 1st sub event to ensure the executing order

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Data center scenario update

* Code style update

* Data scenario business engine update

* Isort update

* Fix lint code check

* Fix based on PR comments.

* Update based on PR comments.

* Add decision payload

* Add config file

* Update utilization series logic

* Update based on PR comment

* Update based on PR

* Update

* Update

* Add the ValidPm class

* Update docs string and naming

* Add energy consumption

* Lint code fixed

* Refining postpone function

* Lint style update

* Init data pipeline

* Update based on PR comment

* Add data pipeline download

* Lint style update

* Code style fix

* Temp update

* Data pipeline update

* Add aria2p download function

* Update based on PR comment

* Update based on PR comment

* Update based on PR comment

* Update naming of variables

* Rename topology

* Renaming

* Fix valid pm list

* Pylint fix

* Update comment

* Update docstring and comment

* Fix init import

* Update tick issue

* fix merge problem

* update style

* V0.2 datacenter data pipeline (#199)

* Data pipeline update

* Data pipeline update

* Lint update

* Update pipeline

* Add vmid mapping

* Update lint style

* Add VM data analytics

* Update notebook

* Add binary converter

* Modify vmtable yaml

* Update binary meta file

* Add cpu reader

* random example added for data center

* Fix bugs

* Fix pylint

* Add launcher

* Fix pylint

* best fit policy added

* Add reset

* Add config

* Add config

* Modify action object

* Modify config

* Fix naming

* Modify config

* Add snapshot list

* Modify a spelling typo

* Update based on PR comments.

* Rename scenario to vm scheduling

* Rename scenario

* Update print messages

* Lint fix

* Lint fix

* Rename scenario

* Modify the calculation of cpu utilization

* Add comment

* Modify data pipeline path

* Fix typo

* Modify naming

* Add unittest

* Add comment

* Unify naming

* Fix data path typo

* Update comments

* Update snapshot features

* Add take snapshot

* Add summary keys

* Update cpu reader

* Update naming

* Add unit test

* Rename snapshot node

* Add processed data pipeline

* Modify config

* Add comment

* Lint style fix

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Add package used in vm_scheduling

* add aria2p to test requirement

* best fit example: update the usage of snapshot

* Add aria2p to test requirement

* Remove finish event

* Fix unittest

* Add test dataset

* Update based on PR comment

* Refine cpu reader and unittest

* Lint update

* Refine based on PR comment

* Add agent index

* Add node mapping

* Init vm scheduling notebook

* Add notebook

* Refine based on PR comments

* Renaming postpone_step

* Renaming and refine based on PR comments

* Rename config

* Update based on the v0.2_datacenter

* Update notebook

* Update

* update filepath

* notebook updated

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* Update process mode docs and fixed on premises (#226)

* V0.2 Add github workflow integration (#222)

* test: add github workflow integration

* fix: split procedures && bug fixed

* test: add training only restriction

* fix: add 'approved' restriction

* fix: change default ssh port to 22

* style: in one line

* feat: add timeout for Subprocess.run

* test: change default node_size to Standard_D2s_v3

* style: refine style

* fix: add ssh_port param to on-premises mode

* fix: add missing init.py

* update param name

* V0.2 explorer (#198)

* overhauled exploration abstraction

* fixed a bug

* fixed a bug

* fixed a bug

* added exploration related methods to abs_agent

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* separated learning with and without an exploration schedule

* small fixes

* moved explorer logic to actor side

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* removed unwanted param from simple agent manager

* added noise explorer

* fixed formatting

* removed unnecessary comma

* fixed PR comments

* removed unwanted exception and imports

* fixed a bug

* fixed PR comments

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issue

* fixed a bug

* fixed lint issue

* fixed naming

* combined exploration param generation and early stopping in scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* moved logger inside scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* removed epsilon parameter from choose_action

* fixed some PR comments

* fixed some PR comments

* bug fix

* bug fix

* bug fix

* removed explorer abstraction from agent

* refined dqn example

* fixed lint issues

* simplified scheduler

* removed early stopping from CIM dqn example

* removed early stopping from cim example config

* renamed early_stopping_callback to early_stopping_checker

* removed action_dim from noise explorer classes and added some shape checks

* modified NoiseExplorer's __call__ logic to batch processing

* made NoiseExplorer's __call__ return type np array
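
A minimal sketch of a batched, np.ndarray-returning explorer call as described in these two commits; the class name and noise type are illustrative, not MARO's actual NoiseExplorer:

```python
import numpy as np

class GaussianNoiseExplorer:
    def __init__(self, stddev: float = 0.1):
        self.stddev = stddev

    def __call__(self, actions) -> np.ndarray:
        # Accept a whole batch (batch_size x action_dim) and return a noisy np.ndarray.
        actions = np.asarray(actions, dtype=np.float32)
        return actions + np.random.normal(scale=self.stddev, size=actions.shape)

print(GaussianNoiseExplorer()(np.zeros((4, 2))).shape)  # (4, 2)
```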

* renamed update to set_parameters in explorer

* fixed old naming in test_grass

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 embedded optim (#191)

* added dueling action value model

* renamed params in dueling_action_value_model

* renamed shared_features to features

* replaced IdentityLayers with nn.Identity

* 1. added skip connection option in FC_net; 2. generalized learning model

* added skip_connection option in config

* removed type casting in fc_net

* fixed lint formatting issues

* refined docstring

* mv dueling_action_value_model and fixed some bugs

* added multi-head functionality to LearningModel

* refined learning model docstring

* added head_key param in learningModel forward

* added double DQN and dueling features to DQN

* fixed a bug

* added DuelingQModelHead enum

* fixed a bug

* removed unwanted file

* fixed PR comments

* added top layer logic and is_top option in fc_net

* fixed a bug

* fixed a bug

* reverted some changes in learning model

* reverted some changes in learning model

* added members to learning model to fix the mode issue

* fixed a bug

* fixed mode setting issue in learning model

* fixed PR comments

* revised cim example according to DQN changes

* renamed eval_model to q_value_model in cim example

* more fixes

* fixed a bug

* fixed a bug

* added doc per PR comments

* removed learner.exit() in single_process_launcher

* removed learner.exit() in single_process_launcher

* fixed PR comments

* fixed rl/__init__

* fixed issues in example

* fixed a bug

* fixed a bug

* fixed lint formatting issues

* double DQN feature

* fixed a bug

* fixed a bug

* fixed PR comments

* fixed lint issue

* embedded optimizer into SingleHeadLearningModel

* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm

* added load_models in simple_learner

* minor docstring edits

* minor docstring edits

* minor docstring edits

* mv optimizer options inside LearningMode

* modified example accordingly

* fixed a bug

* fixed a bug

* fixed a bug

* added dueling DQN feature

* revised and refined docstrings

* fixed a bug

* fixed lint issues

* added load/dump functions to LearningModel

* fixed a bug

* fixed a bug

* fixed lint issues

* refined DQN docstrings

* removed load/dump functions from DQN

* added task validator

* fixed decorator use

* fixed a typo

* fixed a bug

* fixed lint issues

* changed LearningModel's step() to take a single loss

* revised learning model design

* revised example

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* added decorator utils to algorithm

* fixed a bug

* renamed core_model to model

* fixed a bug

* 1. fixed lint formatting issues; 2. refined learning model docstrings

* rm trailing whitespaces

* added decorator for choose_action

* fixed a bug

* fixed a bug

* fixed version-related issues

* renamed add_zeroth_dim decorator to expand_dim

* overhauled exploration abstraction

* fixed a bug

* fixed a bug

* fixed a bug

* added exploration related methods to abs_agent

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* separated learning with and without an exploration schedule

* small fixes

* moved explorer logic to actor side

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* removed unwanted param from simple agent manager

* small fixes

* added shared_module property to LearningModel

* added shared_module property to LearningModel

* revised __getstate__ for LearningModel

* fixed a bug

* added soft_update function to learningModel
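
"Soft update" conventionally means Polyak averaging of target-network parameters toward the online network's; a minimal sketch under that assumption (not the exact MARO signature):

```python
import torch.nn as nn

def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.01) -> None:
    """Move each target parameter a fraction tau toward the corresponding source parameter."""
    for t_param, s_param in zip(target.parameters(), source.parameters()):
        t_param.data.copy_(tau * s_param.data + (1.0 - tau) * t_param.data)
```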

* fixed a bug

* revised learningModel

* rm __getstate__ and __setstate__ from LearningModel

* added noise explorer

* fixed formatting

* removed unnecessary comma

* removed unnecessary comma

* fixed PR comments

* removed unwanted exception and imports

* removed unwanted exception and imports

* fixed a bug

* fixed PR comments

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issue

* fixed a bug

* fixed lint issue

* fixed naming

* combined exploration param generation and early stopping in scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* moved logger inside scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* removed epsilon parameter from choose_action

* removed epsilon parameter from choose_action

* changed agent manager's train parameter to experience_by_agent

* fixed some PR comments

* renamed zero_grad to zero_gradients in LearningModule

* fixed some PR comments

* bug fix

* bug fix

* bug fix

* removed explorer abstraction from agent

* added DEVICE env variable as first choice for torch device
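
A minimal sketch of preferring a DEVICE environment variable and falling back to CUDA availability; the variable name follows the commit, the rest is illustrative:

```python
import os
import torch

device = torch.device(os.environ.get("DEVICE", "cuda" if torch.cuda.is_available() else "cpu"))
model = torch.nn.Linear(4, 2).to(device)
```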

* refined dqn example

* fixed lint issues

* removed unwanted import in cim example

* updated cim-dqn notebook

* simplified scheduler

* edited notebook according to merged scheduler changes

* refined dimension check for learning module manager and removed num_actions from DQNConfig

* bug fix for cim example

* added notebook output

* removed early stopping from CIM dqn example

* removed early stopping from cim example config

* moved decorator logic inside algorithms

* renamed early_stopping_callback to early_stopping_checker

* removed action_dim from noise explorer classes and added some shape checks

* modified NoiseExplorer's __call__ logic to batch processing

* made NoiseExplorer's __call__ return type np array

* renamed update to set_parameters in explorer

* fixed old naming in test_grass

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 VM scheduling docs (#228)

* Initialize

* Data center scenario init

* Code style modification

* V0.2 event buffer subevents expand (#180)

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* unfold sub-events, insert after parent

* remove event category, use different class instead, add helper functions to gen decision and action event

* add a method to support add immediate event to cascade event with tick validation

* fix ut issue

* add action as 1st sub event to ensure the executing order

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Data center scenario update

* Code style update

* Data scenario business engine update

* Isort update

* Fix lint code check

* Fix based on PR comments.

* Update based on PR comments.

* Add decision payload

* Add config file

* Update utilization series logic

* Update based on PR comment

* Update based on PR

* Update

* Update

* Add the ValidPm class

* Update docs string and naming

* Add energy consumption

* Lint code fixed

* Refining postpone function

* Lint style update

* Init data pipeline

* Update based on PR comment

* Add data pipeline download

* Lint style update

* Code style fix

* Temp update

* Data pipeline update

* Add aria2p download function

* Update based on PR comment

* Update based on PR comment

* Update based on PR comment

* Update naming of variables

* Rename topology

* Renaming

* Fix valid pm list

* Pylint fix

* Update comment

* Update docstring and comment

* Fix init import

* Update tick issue

* fix merge problem

* update style

* V0.2 datacenter data pipeline (#199)

* Data pipeline update

* Data pipeline update

* Lint update

* Update pipeline

* Add vmid mapping

* Update lint style

* Add VM data analytics

* Update notebook

* Add binary converter

* Modify vmtable yaml

* Update binary meta file

* Add cpu reader

* random example added for data center

* Fix bugs

* Fix pylint

* Add launcher

* Fix pylint

* best fit policy added

* Add reset

* Add config

* Add config

* Modify action object

* Modify config

* Fix naming

* Modify config

* Add snapshot list

* Modify a spelling typo

* Update based on PR comments.

* Rename scenario to vm scheduling

* Rename scenario

* Update print messages

* Lint fix

* Lint fix

* Rename scenario

* Modify the calculation of cpu utilization

* Add comment

* Modify data pipeline path

* Fix typo

* Modify naming

* Add unittest

* Add comment

* Unify naming

* Fix data path typo

* Update comments

* Update snapshot features

* Add take snapshot

* Add summary keys

* Update cpu reader

* Update naming

* Add unit test

* Rename snapshot node

* Add processed data pipeline

* Modify config

* Add comment

* Lint style fix

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Add package used in vm_scheduling

* add aria2p to test requirement

* best fit example: update the usage of snapshot

* Add aria2p to test requirement

* Remove finish event

* Fix unittest

* Add test dataset

* Update based on PR comment

* vm doc init

* Update docs

* Update docs

* Update docs

* Update docs

* Remove old notebook

* Update docs

* Update docs

* Add figure

* Update docs

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* doc update

* new link

* image update

* v0.2 VM Scheduling docs refinement (#231)

* Fix typo

* Refining vm scheduling docs

* image change

* V0.2 store refinement (#234)

* updated docs and images for rl toolkit

* 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Fix bug (#237)

vm scenario: fix the event type bug of the postpone event

* V0.2 rl toolkit doc (#235)

* updated docs and images for rl toolkit

* updated cim example doc

* updated cim example docs

* updated cim example rst

* updated rl_toolkit and cim example docs

* replaced q_module with q_net in example rst

* refined doc

* refined doc

* updated figures

* updated figures

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Merge V0.2 vis into V0.2 (#233)

* Implemented dump snapshots and convert to CSV.

* Let BE supports params when dump snapshot.

* Refactor dump code to core.py

* Implemented decision event dump.

* replace is not '' with !=''

* Fixed issues that code review mentioned.

* removed path from hello.py

* Changed import sort.

* Fix  import sorting in citi_bike/business_engine

* visualization 0.1

* Updated lint configurations.

* Fixed formatting error that caused lint errors.

* render html title function

* Try to fix lint errors.

* flake-8 style fix

* remove space around 18,35

* dump_csv_converter.py re-formatting.

* files re-formatting.

* style fixed

* tab delete

* white space fix

* white space fix-2

* vis redundant function delete

* refine

* re-formatting after merged upstream.

* Updated import section.

* Updated import section.

* pr refine

* isort fix

* white space

* lint error

* \n error

* test continuation

* indent

* continuation of indent

* indent 0.3

* comment update

* comment update 0.2

* f-string update

* f-string 0.2

* lint 0.3

* lint 0.4

* lint 0.4

* lint 0.5

* lint 0.6

* docstring update

* data version deploy update

* condition update

* add whitespace

* V0.2 vis dump feature enhancement. (#190)

* Dumps added manifest file.
* Code updated format by flake8
* Changed manifest file format for easy reading.

* deploy info update; docs update

* weird white space

* Update dashboard_visualization.md

* new endline?

* delete dependency

* delete irrelevant file

* change scenario to enum, divide file path into a separated class

* doc refine

* doc update

* params type

* data structure update

* doc&enum, formula refine

* refine

* add ut, refine doc

* style refine

* isort

* strong type fix

* os._exit delete

* revert datalib

* import new line

* change test case

* change file name & doc

* change deploy path

* delete params

* revert file

* delete duplicate file

* delete single process

* update naming

* manually change import order

* delete blank

* edit error

* requirement txt

* style fix & refine

* comments&docstring refine

* add parameter name

* test & dump

* comments update

* Added manifest file. (#201)

Only a few changes that need to meet requirements of manifest file format.

* comments fix

* delete toolkit change

* doc update

* citi bike update

* deploy path

* datalib update

* revert datalib

* revert

* maro file format

* comments update

* doc update

* update param name

* doc update

* new link

* image update

* V0.2 visualization-0.1 (#181)

* visualization 0.1

* render html title function

* flake-8 style fix

* style fixed

* tab delete

* white space fix

* white space fix-2

* vis redundant function delete

* refine

* pr refine

* isort fix

* white space

* lint error

* \n error

* test continuation

* indent

* continuation of indent

* indent 0.3

* comment update

* comment update 0.2

* f-string update

* f-string 0.2

* lint 0.3

* lint 0.4

* lint 0.4

* lint 0.5

* lint 0.6

* docstring update

* data version deploy update

* condition update

* add whitespace

* deploy info update; docs update

* weird white space

* Update dashboard_visualization.md

* new endline?

* delete dependency

* delete irrelevant file

* change scenario to enum, divide file path into a separated class

* fix the visualization of docs/key_components/distributed_toolkit

* doc refine

* doc update

* params type

* add examples into isort ignore

* data structure update

* doc&enum, formula refine

* refine

* add ut, refine doc

* style refine

* isort

* strong type fix

* os._exit delete

* revert datalib

* import new line

* change test case

* change file name & doc

* change deploy path

* delete params

* revert file

* delete duplicate file

* delete single process

* update naming

* manually change import order

* delete blank

* edit error

* requirement txt

* style fix & refine

* comments&docstring refine

* add parameter name

* test & dump

* comments update

* comments fix

* delete toolkit change

* doc update

* citi bike update

* deploy path

* datalib update

* revert datalib

* revert

* maro file format

* comments update

* doc update

* update param name

* doc update

* new link

* image update

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>

* image change

* add reset snapshot

* delete dump

* add new line

* add next steps

* import change

* relative import

* add init file

* import change

* change utils file

* change cliexception to clierror

* dashboard test

* change result

* change assertion

* move not

* unit test change

* core change

* unit test delete name_mapping_file

* update cim business engine

* doc update

* change relative path

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* change import sequence

* comments update

* doc add pic

* add dependency

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* Update dashboard_visualization.rst

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* delete white space

* doc update

* doc update

* update doc

* update doc

* update doc

Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* V0.2 docs process mode (#230)

* Update process mode docs and fixed on premises

* Update orchestration docs

* Update process mode docs add JOB_NAME as env variable

* fixed bugs

* fixed isort issue

* update docs index

Co-authored-by: kaiqli <v-kaiqli@microsoft.com>

* V0.2 learning model refinement (#236)

* moved optimizer options to LearningModel

* typo fix

* fixed lint issues

* updated notebook

* misc edits

* 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook

* renamed single_host_cim_learner to cim_learner in notebook

* updated notebook output

* typo fix

* removed dimension check in absence of shared stack

* fixed a typo

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Update vm docs (#241)

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* V0.2 info update (#240)

* update readme

* update version

* refine readme format

* add vis gif

* add citation

* update citation

* update badge

Co-authored-by: Arthur Jiang <sjian@microsoft.com>

* Fix typo (#242)

* Fix typo

* fix typo

* fix

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

Co-authored-by: Arthur Jiang <sjian@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
Co-authored-by: Romic Huang <romic.kid@gmail.com>
Co-authored-by: zhanyu wang <pocket_2001@163.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com>
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>

* bug fix related to np array divide (#245)

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
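
The commit above only says the fix relates to NumPy array division; as an illustration of the usual pitfall in that area (zero denominators), and not necessarily this exact fix, a guarded elementwise division looks like:

```python
import numpy as np

num = np.array([1.0, 2.0, 3.0])
den = np.array([2.0, 0.0, 4.0])

# Only divide where the denominator is non-zero; other positions keep the `out` value.
safe = np.divide(num, den, out=np.zeros_like(num), where=den != 0)
print(safe)  # [0.5  0.   0.75]
```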

* Master.simple bike (#250)

* notebook for simple bike repositioning added

* add simple rule-based algorithms

* unify input

* add policy based on statistics

* update be for simple bike scenario to fit latest event buffer changes (#247)

* change rendered graph

* figures updated

* change notebook

* matplot updated

* figures updated

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: wesley <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>

* simple bike repositioning article: formula updated

* checked out docs/source from v0.2

* aligned with v0.2

* rm unwanted import

* added references in policy_optimization.py

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com>
Co-authored-by: Arthur Jiang <sjian@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
Co-authored-by: Romic Huang <romic.kid@gmail.com>
Co-authored-by: zhanyu wang <pocket_2001@163.com>
Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com>
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>

* V0.2 backend dynamic node support (#172)

* update lint workflow

* fix workflow issue

* Update lint.yml

* Create tox.ini

* Update lint.yml

* Update lint.yml

* Update tox.ini

* Update lint.yml

* Delete tox.ini from root folder, move it to .github/linters

* Update CONTRIBUTING.md

* add more comments

* update lint conf to ignore cli banner issue

* change extension implementation from c to cpp

* update script to gen cpp files

* backend base interface redefine

* interface revamp for np backend

* 1st step for revamp

* bug fix

* draft

* implementation of attribute

* implementation of backend

* remove  backend switching

* draft raw backend wrapper

* correct function parameter type

* 1st runnable version

* bug fix for types

* ut passed

* change CRLF to LF

* fix get_node_info interface

* add raw test in frame ut

* return np.array for all query result

* use ticks from backend

* set init value

* snapshot ut passed

* support setting the default backend by environment variable

* env ut with different backend

* fix take snapshot index bug

* test under both backends

* ignore generated cpp file

* fix lint issues

* more lint fix

* use ordered map to store ticks to keep the order

* remove test code

* refine dup code

* refine code to avoid too much if/else

* handle and raise exception for attr getter

* change the way to handle cpp exception, use cython runtimeerror instead

* add missing function, and fix bug in np impl

* fix lint issue

* specify c++11 flag for compilers

* use normal field assignment instead of an initializer list, as Linux gcc complains about it

* add np ignore macro

* try to refine token pasting operator to avoid error on linux

* more pasting operator issue fix

* remove un-used options

* update workflow files to fit new backend

* 1st version of dynamic backend structure

* setup ut for cpp using lest

* bitset complete

* attributestore and ut

* arrange

* copy_to

* current frame

* ut for frame

* bug fix and ut correct

* fix issue that value not correct after arrange

* fix bug in test case

* frame update

* change the way to add nodes, support add node from middle

* frame in backend

* snapshotlist code complete

* add size method for snapshotlist, add ut template

* make sure snapshot max size not be 0

* add max size

* fix query parameters

* fix attribute store extend error

* add function to retrieve attribute from snapshotlist

* return nan for invalid index

* add function to check if nan for float attribute only

* fix bug where _last_tick was not updated for the snapshot list, which caused taking a snapshot for the same tick to crash

* add functions to expose internal state under debug mode, make it easy to do unit test

* fix issue that caused the overlap logic to be skipped

* ut passed for all implemented functions

* remove query in ut, as it not completed yet

* refine querying interfaces, use 2 functions for 1 query

* snapshot query,

* use pointer instead of weak_ptr

* backend impl

* set default parameters value

* query bug fix,

* bug fix: new_attr should return attr id not node id

* use macro to create attribute getters

* add reset support

* change the way to reset, avoid allocation time

* test reset for attributestore

* use Bitset instead of vector<bool> to make it easy to reset

* refine backend interfaces to make them compatible with the old one

* correct querying interface, cython compile passed

* bug fix: get_ticks not set correct index

* correct cpp backend binding, add type for frame

* correct ut for snapshot

* bug fix: query cause crash after snapshot reset

* fix env test

* bug fix: is_nan should check data type first

* fix cim ut issues with raw backend

* fix citibike ut issues for raw backend

* add interfaces to support dynamic nodes, not tested

* bug fix: access cpp object without cdef

* bug fix: missing impl for dynamic methods

* ut for append nodes

* return node number dynamically

* remove unused parameters for snapshot

* remove unused code

* allow get attribute for deleted node

* ut for delete and resume node

* function to set attribute slot

* bug fix: set attribute will cause crash

* bug fix: removing appended nodes on reset caused an exception

* bug fix: frame.backend_type return incorrect name

* backends performance comparison

* correct internal type

* correct warnings

* missing ;

* formating

* fix lint issue

* simplify the way to copy the mapping

* add dump interfaces

* frame dump

* ignore if dump path is not exist

* bug fix: use max slots instead of current slots for padding in snapshot querying

* use max slot number in history instead of current for padding

* dump for snapshot

* close file at the end

* refine snapshot dump function

* fix lint issue

* avoid too much allocate operation

* use pointer instead of reference for further changes

* avoid copying the map twice

* add comments for missing functions

* performance optimize

* use emplace instead push

* use emplace instead  push

* remove cpp files

* add missing license

* ignore .vs folder

* add lest license for cpp unittest

* Delete CMakeLists.txt

* add error msg for exception, make it easy to identify error at python side

* remove old codes

* replace with new code

* change IDENTIFIER to NODE_TYPE and ATTR_TYPE

* build pass

* fix attr type not correct bug

* remove unused comment

* make frame ut pass

* correct the max snapshots checking

* fix test case

* add missing file

* correct performance test

* refine attribute code

* refine bitset code

* update FrameBase doc about switch backend

* correct the exception name

* refine frame code

* refine node code

* refine snapshot list code

* add is_const and is_list when adding attribute

* support query const attribute without tick exist

* add operations for list attribute

* remove cache as we have list attribute

* add remove and insert for list attribute

* add for-loop support for list attribute

* fix bug that not update list attribute slot number after operations

* test for dynamic features

* frame dump

* dump for snapshot list

* fix issue on gcc compiler

* add missing file

* fix lint issues

* refine the exception, more comments

* fix lint issue

* fix lint issue

* use simulate enum instead of str

* Use new type instead old in tests

* use mapping instead of if-else

* remove generated code

* use mapping to reduce too much if-else

* add default attribute type int if not provided or invalid provided

* remove generated code

* update workflow with code gen

* more frame test

* add missing files

* test: cover maro.simulator.utils.common

* update test with new scenario

* comments

* tests

* update doc

* fix lint and comments

* CRLF to LF

* fix lint issue

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* V0.2 vm oversub docs (#256)

* Remove topology

* Update pipeline

* Update pipeline

* Update pipeline

* Modify metafile

* Add two attributes of VM

* Update pipeline

* Add vm category

* Add todo

* Add oversub config

* Add oversubscription feature

* Lint fix

* Update based on PR comment.

* Update pipeline

* Update pipeline

* Update config.

* Update based on PR comment

* Update

* Add pm sku feature

* Add sku setting

* Add sku feature

* Lint fix

* Lint style

* Update sku, overloading

* Lint fix

* Lint style

* Fix bug

* Modify config

* Remove sku and replace it with pm type

* Add and refactor vm category

* Comment out config

* Unify the enum format

* Fix lint style

* Fix import order

* Update based on PR comment

* Update overload to the VM docs

* Update docs

* Update vm docs

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* V0.2 ddpg (#252)

* fixed issues in example

* fixed a bug

* fixed a bug

* fixed lint formatting issues

* double DQN feature

* fixed a bug

* fixed a bug

* fixed PR comments

* fixed lint issue

* embedded optimizer into SingleHeadLearningModel

* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm

* added load_models in simple_learner

* minor docstring edits

* minor docstring edits

* minor docstring edits

* move optimizer options inside LearningModel

* modified example accordingly

* fixed a bug

* fixed a bug

* fixed a bug

* added dueling DQN feature

* revised and refined docstrings

* fixed a bug

* fixed lint issues

* added load/dump functions to LearningModel

* fixed a bug

* fixed a bug

* fixed lint issues

* refined DQN docstrings

* removed load/dump functions from DQN

* added task validator

* fixed decorator use

* fixed a typo

* fixed a bug

* fixed lint issues

* changed LearningModel's step() to take a single loss

* revised learning model design

* revised example

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* added decorator utils to algorithm

* fixed a bug

* renamed core_model to model

* fixed a bug

* 1. fixed lint formatting issues; 2. refined learning model docstrings

* rm trailing whitespaces

* added decorator for choose_action

* fixed a bug

* fixed a bug

* fixed version-related issues

* renamed add_zeroth_dim decorator to expand_dim

* overhauled exploration abstraction

* fixed a bug

* fixed a bug

* fixed a bug

* added exploration related methods to abs_agent

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* separated learning with exploration schedule and without

* small fixes

* moved explorer logic to actor side

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* removed unwanted param from simple agent manager

* small fixes

* added shared_module property to LearningModel

* added shared_module property to LearningModel

* some revision to DDPG

* revised __getstate__ for LearningModel

* fixed a bug

* added soft_update function to learningModel

* fixed a bug

* revised learningModel

* rm __getstate__ and __setstate__ from LearningModel

* fixed some issues with DDPG code

* added noise explorer

* formatting

* fixed formatting

* removed unnecessary comma

* removed unnecessary comma

* fixed PR comments

* removed unwanted exception and imports

* removed unwanted exception and imports

* removed unwanted exception and imports

* fixed a bug

* fixed PR comments

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issue

* fixed a bug

* fixed lint issue

* fixed naming

* combined exploration param generation and early stopping in scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* moved logger inside scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* removed epsilon parameter from choose_action

* removed epsilon parameter from choose_action

* changed agent manager's train parameter to experience_by_agent

* fixed some PR comments

* renamed zero_grad to zero_gradients in LearningModule

* fixed some PR comments

* bug fix

* bug fix

* bug fix

* removed explorer abstraction from agent

* added DEVICE env variable as first choice for torch device

* refined dqn example

* fixed lint issues

* removed unwanted import in cim example

* updated cim-dqn notebook

* simplified scheduler

* edited notebook according to merged scheduler changes

* refined dimension check for learning module manager and removed num_actions from DQNConfig

* bug fix for cim example

* added notebook output

* removed early stopping from CIM dqn example

* fixed naming issues

* removed early stopping from cim example config

* moved decorator logic inside algorithms

* renamed early_stopping_callback to early_stopping_checker

* tmp commit

* tmp commit

* removed action_dim from noise explorer classes and added some shape checks

* modified NoiseExplorer's __call__ logic to batch processing

* made NoiseExplorer's __call__ return type np array

* renamed update to set_parameters in explorer

* fixed old naming in test_grass

* moved optimizer options to LearningModel

* typo fix

* fixed lint issues

* updated notebook

* fixed learning model naming

* fixed conflicts

* updated ddpg example

* misc edits

* 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook

* renamed single_host_cim_learner to cim_learner in notebook

* updated notebook output

* typo fix

* added ddpg example for cim

* fixed some bugs

* removed dimension check in absence of shared stack

* fixed a typo

* bug fixes

* bug fixes

* aligned with v0.2

* aligned with v0.2

* fixed lint issues

* added reference in ddpg.py

* fixed lint issues

* fixed lint issues

* fixed lint issues

* removed ddpg example

* checked out files from origin/v0.2 before merging

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 cli refactoring (#227)

* test: add github workflow integration

* fix: split procedures && bug fixed

* test: add training only restriction

* fix: add 'approved' restriction

* fix: change default ssh port to 22

* style: in one line

* feat: add timeout for Subprocess.run

* test: change default node_size to Standard_D2s_v3

* style: refine style

* fix: add ssh_port param to on-premises mode

* fix: add missing init.py

* refactor: extract reusable methods to GrassExecutor

* feat: refine validation.py and add docstrings

* fix: add remote prefix to ssh function

* style: refine logging output

* fix: extract param 'vm_name'

* fix: linting errors

* feat: add NodeStatus and ContainerStatus at executors

* feat: use master_node_size as the size of build_node_image_vm

* fix: refine comments

* feat: add "state" key for node_details

* fix: linting errors

* fix: deployment error when ssh_port is the default port

* refactor: extract utils/*.py in scripts

* style: single quote to double quote

* refactor: refine folder structure of scripts

* fix: linting errors

* fix: add executable to fix error initialization

* refactor: use SubProcess to execute commands in scripts

* refactor: refine script namings

* refactor: extract utils/*.py and systemd/*.service in agents

* feat: refine Exception structure, add SubProcess class in agents

* feat: use psutil to get resource details, move resource details initialization to agents

* fix: linting errors

* feat: use docker sdk in node_agent

* feat: extract RedisExecutor in agents

* test: remove image when tearing down

* feat: add LoadImageAgent

* feat: move node status update to agents

* refactor: move utils folder to upper level in scripts

* feat: add node_api_server, refine agents folder structure

* fix: linting errors

* refactor: refine folder structure in grass/lib

* refactor: build DeploymentValidator class

* refactor: create DetailsReader, DetailsWriter, delete sync mode

* refactor: rename DockerManager to DockerController

* refactor: rename RedisManager to RedisController

* refactor: rename AzureExecutor to AzureController

* refactor: create NameCreator

* refactor: create PathConvertor

* refactor: rename checkers to details_validity_wrapper

* refactor: rename lock to operation_lock_wrapper

* refactor: create FileSynchronizer

* refactor: create redis instance in RedisController

* feat: add master_api_server, move job related scripts to api_server

* refactor: move node related scripts to api_server

* fix: use "DELETE" instead of "DEL" as http method

* refactor: use mapping names instead of namings like "sths_details"

* feat: move master related scripts to api_server

* feat: move containers related scripts to api_server

* fix: add graceful wait for remote_start_master_services

* feat: move image_files related scripts to api_server

* fix: improper test in the training stage

* refactor: use local variable "URL_PREFIX" directly, add 's' in node_api_client

* refactor: refine namings in services

* feat: move clean related scripts to api_server

* refactor: delete "public_key" field

* feat: build MasterApiClient

* refactor: delete sync_mkdir

* feat: refine locks in node_details

* feat: build DockerController for grass/utils

* refactor: rename Extractor to Controller

* feat: move schedule related components to api_server

* fix: incorrect allocation when starting batch jobs

* fix: missing field "containers" in job_details

* feat: add delete_job in master_api_server

* feat: add logger in agents

* fix: no "resources" field when scale up node at the very beginning

* feat: use Process back instead of Thread in node_agent

* feat: add 'v1' prefix to api_servers' urls

* refactor: move lib/aks under lib/clouds

* refactor: move lib/k8s_configs to lib/configs, move aks related configs to clouds/aks, delete volume mount in redis

* feat: extract K8sExecutor

* fix: add one more searching layer of package_data at maro.cli.k8s

* refactor: move lib/configs/nvidia to lib/clouds/aks, make create() as a staticmethod at k8s mode

* refactor: move id init to standardize_create_deployment in grass/azure mode

* fix: use GlobalParams instead of hard-coded data

* feat: build K8sDetailsReader, K8sDetailsWriter

* feat: use k8s sdk to replace subprocess call

* refactor: delete redundant vars

* refactor: move more methods to K8sExecutor

* test: use legal naming in tests/cli/k8s

* refactor: refine logging messages

* refactor: make create() as a staticmethod at grass/azure mode, refine logging messages

* feat: build ArmTemplateParameterBuilder in K8sAzureExecutor

* refactor: remove redundant params

* refactor: rename /clouds to /modes

* refactor: refine structures and logging messages in GrassExecutor

* feat: add 'PENDING' to NodeStatus

* feat: refine build_job_details for create schedule in grass/azure

* feat: refine build_job_details for create schedule in k8s/aks

* feat: use node_join schema in grass/azure

* refactor: replace /.maro with /.maro-shared, replace admin_username with node_username, remove redundant snippets in /grass/lib/scripts

* refactor: add 'ssh', 'api_server' into master_details and node_details

* refactor: move master runtime params initialization into api_server

* refactor: refine namings

* feat: reconstruct grass/on-premises with new schema

* refactor: delete field 'user' in grass_azure_create

* refactor: rename 'blueprints_v1' to 'blueprints'

* refactor: move some GlobalPaths to subfolders

* refactor: replace 'connection' field with 'master' or 'node'

* refactor: move start_service scripts to init_master.py

* refactor: rename grass/master/release to grass/master/delete_master

* refactor: load local_details in node services, refine script namings

* refactor: move invocations of start_node and stop node to api server

* fix: add missing imports

* refactor: rename SubProcess to Subprocess

* refactor: delete field 'user' in k8s_aks_create

* refactor: refine folder structures in /.maro/clusters/cluster

* refactor: move /logs to /clusters/{cluster_name}

* refactor: refine filenames

* fix: export DEBIAN_FRONTEND=noninteractive to reduce irrelevant warnings

* refactor: refine code structures, delete redundant code

* refactor: change /{cluster_name}/details.yml to /{cluster_name}/cluster_details.yml

* feat: add rsa+aes data encryption on dev-master communication

* fix: change MasterApiClient to RedisController in node-related services and scripts

* refactor: remove all "{cluster_name}" in redis keys

* refactor: extract init_master and create_user to GrassExecutor

* test: refine tests in grass/azure and k8s/aks

* refactor: refine ArmTemplateParameterBuilder

* feat: change the order of installation in init_build_node_image_vm.py

* fix: add user/admin_id to grass_on_premises_create.yml

* fix: change outdated container names

* feat: add standardize_join_cluster_deployment in grass/on-premises

* feat: add init_node_runtime_env in join_cluster.py

* refactor: refine code structure in join_cluster.py

* test: add TestGrassOnPremises

* refactor: refine ARM templates

* fix: linting errors

* fix: test requirements error

* fix: arm linting errors

* refactor: late import in grass, k8s

* style: refine load_parser_grass

* style: refine load_parser_k8s

* docs: update orchestrations

* fix: fix get_job_logs

* docs: add docs for GrassAzureExecutor, GrassExecutor

* docs: add docs for GrassOnPremisesExecutor

* docs: add docs for /grass/scripts

* docs: add docs for /grass/services

* docs: add docs for /grass/utils

* docs: add docs for k8s

* try paramiko of another version

* rollback paramiko package version

Co-authored-by: Wesley <Wenlei.Shi@microsoft.com>

* Refine joint decision sequential action mode (#219)

* refine the logic about joint decision sequential action mode to match current event buffer implementation

* fix lint issue

* fix lint issue

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* V0.2 merge algorithm into agent (#259)

* merged algorithm with agent

* bug fixes

* fix

* bug fixes

* fixed lint issues and renamed models to model

* removed exp pool type spec in AbsAgent

* fixed lint issues

* dqn exp pool bug fix

* minor issues

* updated notebooks and examples according to rl toolkit changes

* updated images

* moved exp pool init inside DQN

* renamed column_based_store to simple_store

* fixed lint issues

* fixed lint issues

* lint issue fix

* lint issue fix

* fixed bugs in test_store

* typo fix

* minor edits

* lint issue fix

* 1. removed state_shaper, action_shaper and exp_shaper abstractions; 2. used torch Categorical for sampling actions; 3. removed input_dim and output_dim properties from LearningModel

* updated notebook

* removed simple agent manager

* fixed lint issues

* fixed lint issues

* bug fix

* refined LearningModel

* updated cim example doc

* lint issue fix

* small refinements

* replaced ActionInfo with torch Categorical's log_prob for policy_optimization algorithms

* lint issue fix

* formatting

* 1. moved early stopping logic inside scheduler; 2. added scheduler options for optimizers in learning-model

* minor formatting fixes

* refinement

* rm unwanted import

* add List typing in scheduler

* lint issue fix

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wesley <Wenlei.Shi@microsoft.com>

* V0.2 gnn refactoring (#274)

* merged algorithm with agent

* bug fixes

* fix

* bug fixes

* fixed lint issues and renamed models to model

* removed exp pool type spec in AbsAgent

* fixed lint issues

* dqn exp pool bug fix

* minor issues

* updated notebooks and examples according to rl toolkit changes

* updated images

* moved exp pool init inside DQN

* renamed column_based_store to simple_store

* fixed lint issues

* fixed lint issues

* lint issue fix

* lint issue fix

* fixed bugs in test_store

* typo fix

* minor edits

* lint issue fix

* 1. removed state_shaper, action_shaper and exp_shaper abstractions; 2. used torch Categorical for sampling actions; 3. removed input_dim and output_dim properties from LearningModel

* updated notebook

* removed simple agent manager

* fixed lint issues

* fixed lint issues

* bug fix

* refined LearningModel

* updated cim example doc

* lint issue fix

* small refinements

* replaced ActionInfo with torch Categorical's log_prob for policy_optimization algorithms

* refactored gnn example and added single-process script

* removed obsolete files from gnn

* lint issue fix

* formatting

* 1. moved early stopping logic inside scheduler; 2. added scheduler options for optimizers in learning-model

* minor formatting fixes

* refinement

* rm unwanted import

* add List typing in scheduler

* lint issue fix

* removed redundant parameters for GNNBasedACModel

* restored duration to 1120

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wesley <Wenlei.Shi@microsoft.com>

* Add vector env support (#266)

* 1st version

* make vector env importable under module root

* allow outside control of which environment to push, so we do not need to control the tick for each environment

* remove comment

* lint fixing

* add test for vector env, correct the batch number

* lint fixing

* reduce parameters

* Update vector env UT to test raw backend support

* correct comments on hello

* fix review comments, cim actiontype wip

* add a compatible way to handle ActionType for cim scenario

* lint fix

* correct the action type to handle previous action

* add doc string for wrappers

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* v0.2 Rule-based algorithms for VM Scheduling (#255)

* rule_based_algorithm

* revise_the_code_by_aiming_hao

* revise_the_code_by_aiming_hao

* use the np.argmin

* Update best_fit.py

fix the "np not defined"

* refine the code

* fix the error

* refine the code

* fix the error

* fix the error

* refine the code

* remove the history

* refine the code

* update first_fit

* Refine the code

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com>

* delete duplicated rule based algorithms for VM scheduling

* Add slot filter functions for node attribute  (#273)

* add where filter for general usage

* test for general filter

* simpler comparison for attribute

* filter on raw

* fix array fetch bug

* ut for base comparison

* lint fix

* remove unused variables

* update ignore

* Fix coding style (#284)

* V0.2 vm region support (#258)

* Region init

* Add region, zone, cluster

* Fix bug

* Add update parent id

* Update PM config

* Update number

* Fix import order

* Fix bug

* Modify config

* Add cluster attribute

* Refine naming

* Fix bug

* Modify 336k config

* Update region

* Update config

* Update pm config

* pylint

* Add comment

* Update based on PR comment

* Modify config and zone class

* Add unit test

* Update region part

* Update pylint

* Modify unit test

* Refactor region structure

* Add comment and fix style

* Fix machine num bugs

* Modify config

* Fix style

* Fix bugs and add empty machine attributes

* Add update upper level metrics

* Update config

* Fix lint style

* Modify doc strings

* Fix amount counter

* Update unit test

* fix lint style

* Update the ids init

* Init total and empty machine num

* Update lint style

* Fix snapshot attributes initial state

* Update config

* add topologies for over-subscription and multi-cluster to be compatible with the previous topologies

* Add simulation result

* Move readme

* Add overload results

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* V0.2 rule based algorithm readme (#282)

* Add README.md and refine the bin_packing algorithm

* refine round_robin and bin_packing

* Update README.md

* Refine the code and README.md

* Refine the bin_packing and round_robin

* Refine the code

Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com>

* Feature: Add a cli command to support create new project. (#279)

* maro project new

* remove maro project run

* add get_metrics to template

* add license

* more comments

* lint issue fix

* linting issue fix

* fix linting issue

* linting issue fix

* remove unused code gen

* include template files

* fix incorrect comment

* include topologies for vm_scheduling scenario

* rename to PositiveNumberValidator

* refine command line comment

* refine topology command comment

* add a simple doc for new command

* fix incorrect value for dummy frame

* correct issues in docs

* more comments on set_state

* doc issue

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* CLI visualization support and maro grass local mode (#277)

* test: add github workflow integration

* fix: split procedures && bug fixed

* test: add training only restriction

* fix: add 'approved' restriction

* fix: change default ssh port to 22

* style: in one line

* feat: add timeout for Subprocess.run

* test: change default node_size to Standard_D2s_v3

* style: refine style

* fix: add ssh_port param to on-premises mode

* fix: add missing init.py

* refactor: extract reusable methods to GrassExecutor

* feat: refine validation.py and add docstrings

* fix: add remote prefix to ssh function

* style: refine logging output

* fix: extract param 'vm_name'

* fix: linting errors

* feat: add NodeStatus and ContainerStatus at executors

* feat: use master_node_size as the size of build_node_image_vm

* fix: refine comments

* feat: add "state" key for node_details

* fix: linting errors

* fix: deployment error when ssh_port is the default port

* refactor: extract utils/*.py in scripts

* style: single quote to double quote

* refactor: refine folder structure of scripts

* fix: linting errors

* fix: add executable to fix error initialization

* refactor: use SubProcess to execute commands in scripts

* refactor: refine script namings

* refactor: extract utils/*.py and systemd/*.service in agents

* feat: refine Exception structure, add SubProcess class in agents

* feat: use psutil to get resource details, move resource details initialization to agents

* fix: linting errors

* feat: use docker sdk in node_agent

* feat: extract RedisExecutor in agents

* test: remove image when tearing down

* feat: add LoadImageAgent

* feat: move node status update to agents

* refactor: move utils folder to upper level in scripts

* feat: add node_api_server, refine agents folder structure

* fix: linting errors

* refactor: refine folder structure in grass/lib

* refactor: build DeploymentValidator class

* refactor: create DetailsReader, DetailsWriter, delete sync mode

* refactor: rename DockerManager to DockerController

* refactor: rename RedisManager to RedisController

* refactor: rename AzureExecutor to AzureController

* refactor: create NameCreator

* refactor: create PathConvertor

* refactor: rename checkers to details_validity_wrapper

* refactor: rename lock to operation_lock_wrapper

* refactor: create FileSynchronizer

* refactor: create redis instance in RedisController

* feat: add master_api_server, move job related scripts to api_server

* refactor: move node related scripts to api_server

* fix: use "DELETE" instead of "DEL" as http method

* refactor: use mapping names instead of namings like "sths_details"

* feat: move master related scripts to api_server

* feat: move containers related scripts to api_server

* fix: add graceful wait for remote_start_master_services

* feat: move image_files related scripts to api_server

* fix: improper test in the training stage

* refactor: use local variable "URL_PREFIX" directly, add 's' in node_api_client

* refactor: refine namings in services

* feat: move clean related scripts to api_server

* refactor: delete "public_key" field

* feat: build MasterApiClient

* refactor: delete sync_mkdir

* feat: refine locks in node_details

* feat: build DockerController for grass/utils

* refactor: rename Extractor to Controller

* feat: move schedule related components to api_server

* fix: incorrect allocation when starting batch jobs

* fix: missing field "containers" in job_details

* feat: add delete_job in master_api_server

* feat: add logger in agents

* fix: no "resources" field when scale up node at the very beginning

* feat: use Process back instead of Thread in node_agent

* feat: add 'v1' prefix to api_servers' urls

* refactor: move lib/aks under lib/clouds

* refactor: move lib/k8s_configs to lib/configs, move aks related configs to clouds/aks, delete volume mount in redis

* feat: extract K8sExecutor

* fix: add one more searching layer of package_data at maro.cli.k8s

* refactor: move lib/configs/nvidia to lib/clouds/aks, make create() as a staticmethod at k8s mode

* refactor: move id init to standardize_create_deployment in grass/azure mode

* fix: use GlobalParams instead of hard-coded data

* feat: build K8sDetailsReader, K8sDetailsWriter

* feat: use k8s sdk to replace subprocess call

* refactor: delete redundant vars

* refactor: move more methods to K8sExecutor

* test: use legal naming in tests/cli/k8s

* refactor: refine logging messages

* refactor: make create() as a staticmethod at grass/azure mode, refine logging messages

* feat: build ArmTemplateParameterBuilder in K8sAzureExecutor

* refactor: remove redundant params

* refactor: rename /clouds to /modes

* refactor: refine structures and logging messages in GrassExecutor

* feat: add 'PENDING' to NodeStatus

* feat: refine build_job_details for create schedule in grass/azure

* feat: refine build_job_details for create schedule in k8s/aks

* add grass local mode (non-pass)

* feat: use node_join schema in grass/azure

* refactor: replace /.maro with /.maro-shared, replace admin_username with node_username, remove redundant snippets in /grass/lib/scripts

* refactor: add 'ssh', 'api_server' into master_details and node_details

* refactor: move master runtime params initialization into api_server

* refactor: refine namings

* feat: reconstruct grass/on-premises with new schema

* refactor: delete field 'user' in grass_azure_create

* refactor: rename 'blueprints_v1' to 'blueprints'

* refactor: move some GlobalPaths to subfolders

* Update grass local mode, run pass

* refactor: replace 'connection' field with 'master' or 'node'

* refactor: move start_service scripts to init_master.py

* refactor: rename grass/master/release to grass/master/delete_master

* refactor: load local_details in node services, refine script namings

* refactor: move invocations of start_node and stop node to api server

* fix: add missing imports

* refactor: rename SubProcess to Subprocess

* refactor: delete field 'user' in k8s_aks_create

* add resource class

* refactor: refine folder structures in /.maro/clusters/cluster

* refactor: move /logs to /clusters/{cluster_name}

* refactor: refine filenames

* fix: export DEBIAN_FRONTEND=noninteractive to reduce irrelevant warnings

* refactor: refine code structures, delete redundant code

* refactor: change /{cluster_name}/details.yml to /{cluster_name}/cluster_details.yml

* feat: add rsa+aes data encryption on dev-master communication

* fix: change MasterApiClient to RedisController in node-related services and scripts

* refactor: remove all "{cluster_name}" in redis keys

* refactor: extract init_master and create_user to GrassExecutor

* test: refine tests in grass/azure and k8s/aks

* refactor: refine ArmTemplateParameterBuilder

* add cli visible agent

* feat: change the order of installation in init_build_node_image_vm.py

* fix: add user/admin_id to grass_on_premises_create.yml

* fix: change outdated container names

* feat: add standardize_join_cluster_deployment in grass/on-premises

* feat: add init_node_runtime_env in join_cluster.py

* refactor: refine code structure in join_cluster.py

* test: add TestGrassOnPremises

* refactor: refine ARM templates

* fix: linting errors

* fix: test requirements error

* fix: arm linting errors

* refactor: late import in grass, k8s

* style: refine load_parser_grass

* style: refine load_parser_k8s

* add jobstate and resource usage support

* add local visible test

* docs: update orchestrations

* fix: fix get_job_logs

* docs: add docs for GrassAzureExecutor, GrassExecutor

* docs: add docs for GrassOnPremisesExecutor

* docs: add docs for /grass/scripts

* docs: add docs for /grass/services

* docs: add docs for /grass/utils

* docs: add docs for k8s

* grass mode visible pass

* grass local mode run pass

* fixed pylint

* Update resource, rm GPUtil depend

* Update CLI local mode visible

* grass local mode pass

* add redis clear and pylint fixed

* rm job status in grass azure mode

* fix bug

* fixed merge issue

* fixed lint

* update by pr comments

* fixed isort issue

* fixed stop bug

* fixed local agent and cmp issue

* fixed pending job cannot be killed

* add mount in Grass local mode

* add resource check interval in redis

Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>

* Add Env-Geographic visualization tool, CIM hello as example (#291)

* streamit with questdb

* script to import current dump data, except attention file, use influxdb line protocol for batch sending.

* refine the interface to flatten dictionary

* add messagetype.file to upload file later

* correct tag name

* correct the way to initialize streamit, make it possible to use it anywhere after start

* add data collecting in cim business engine

* streamit client refactoring

* fix import issue

* update cim hello world, with a commented code to enable vis data streaming

* fix metric replace bug

* refactor the type checking code

* V0.2 remove props from be (#269)

* Fix bug

* fix bug

* Master vm doc - data preparation (#285)

* Update vm docs

* Update docs

* Update data preparation docs

* Update

* Update docs

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* maro geo vis

* add new line

* doc update

* lint refine

* lint update

* lint update

* lint update

* lint update

* lint update

* code revert

* add declare

* code revert

* add new line

* add comment

* delete surplus

* delete core

* lint update

* lint update

* lint update

* lint update

* specify version

* lint update

* specify docker version

* import sort

* backend revert

* Delete test.py

* format refact

* doc update

* import orders

* change import orders

* change import orders

* add version of http-server

* add specified port

* delete print

* lint update

* lint update

* lint update

* update doc

* dependency update

* update business engine

* business engine

* business engine update

Co-authored-by: chaosyu <chaos.you@gmail.com>
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Kuan Wei Yu <v-kyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* Maro Geographic Tool Doc Update (#294)

* streamit with questdb

* script to import current dump data, except attention file, use influxdb line protocol for batch sending.

* refine the interface to flatten dictionary

* add messagetype.file to upload file later

* correct tag name

* correct the way to initialize streamit, make it possible to use it anywhere after start

* add data collecting in cim business engine

* streamit client refactoring

* fix import issue

* update cim hello world, with a commented code to enable vis data streaming

* fix metric replace bug

* refactor the type checking code

* maro geo vis

* add new line

* doc update

* lint refine

* lint update

* lint update

* lint update

* lint update

* lint update

* code revert

* add declare

* code revert

* add new line

* add comment

* delete surplus

* delete core

* lint update

* lint update

* lint update

* lint update

* specify version

* lint update

* specify docker version

* import sort

* backend revert

* Delete test.py

* format refact

* doc update

* import orders

* change import orders

* change import orders

* add version of http-server

* add specified port

* delete print

* lint update

* lint update

* lint update

* update doc

* dependency update

* update business engine

* business engine

* business engine update

* doc update

* delete irrelevant file

Co-authored-by: chaosyu <chaos.you@gmail.com>

* Maro geo vis Data Update (#295)

* streamit with questdb

* script to import current dump data, except attention file, use influxdb line protocol for batch sending.

* refine the interface to flatten dictionary

* add messagetype.file to upload file later

* correct tag name

* correct the way to initialize streamit, make it possible to use it anywhere after start

* add data collecting in cim business engine

* streamit client refactoring

* fix import issue

* update cim hello world, with a commented code to enable vis data streaming

* fix metric replace bug

* refactor the type checking code

* maro geo vis

* add new line

* doc update

* lint refine

* lint update

* lint update

* lint update

* lint update

* lint update

* code revert

* add declare

* code revert

* add new line

* add comment

* delete surplus

* delete core

* lint update

* lint update

* lint update

* lint update

* specify version

* lint update

* specify docker version

* import sort

* backend revert

* Delete test.py

* format refact

* doc update

* import orders

* change import orders

* change import orders

* add version of http-server

* add specified port

* delete print

* lint update

* lint update

* lint update

* update doc

* dependency update

* update business engine

* business engine

* business engine update

* doc update

* delete irrelevant file

* update data

Co-authored-by: chaosyu <chaos.you@gmail.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* V0.2_refactored_distributed_framework (#206)

* added some more logs for dist RL

* bug fix

* fixed a typo

* bug fix

* refined logs

* set session_id to None for exit message

* add setup/clear/template for maro process

* changed to internal logger for actor and learner

* removed redundant component name from internal logs

* fix process stop

* add logger and rename parameters

* add logger for setup/clear

* fixed closing a non-existent pid when given a pid list.

* Fixed comments and rename setup/clear with create/delete

* fixed typos

* update ProcessInternalError

* removed explorer abstraction from agent

* added DEVICE env variable as first choice for torch device

* refined dqn example

* fixed lint issues

* removed unwanted import in cim example

* updated cim-dqn notebook

* simplified scheduler

* edited notebook according to merged scheduler changes

* refined dimension check for learning module manager and removed num_actions from DQNConfig

* bug fix for cim example

* added notebook output

* removed early stopping from CIM dqn example

* removed early stopping from cim example config

* updated notebook

* 1. removed external loggers from cim example; 2. fixed batch inference bugs

* removed actor_trainer mode and refactored

* moved decorator logic inside algorithms

* renamed early_stopping_callback to early_stopping_checker

* fixed conflicts

* fixed typos

* removed stale imports

* fixed stale naming

* removed dist_topologies folder

* refined session id logic

* bug fix

* refactored

* distributed RL refinement

* refined

* small bug fix

* fixed lint issues

* fixed lint issues

* removed unwanted file

* fixed a typo

* gnn refactoring in progress

* merged algorithm with agent

* bug fixes

* fix

* bug fixes

* fixed lint issues and renamed models to model

* removed unwanted files

* fixed merge conflicts

* removed exp pool type spec in AbsAgent

* fixed lint issues

* changed to a single gnn agent

* dqn exp pool bug fix

* minor issues

* removed GNNAgentManager

* updated notebooks and examples according to rl toolkit changes

* updated images

* moved exp pool init inside DQN

* renamed column_based_store to simple_store

* more gnn refactoring

* fixed lint issues

* fixed lint issues

* lint issue fix

* lint issue fix

* fixed bugs in test_store

* typo fix

* minor edits

* lint issue fix

* finished single process gnn

* fixed bugs

* 1. removed state_shaper, action_shaper and exp_shaper abstractions; 2. used torch Categorical for sampling actions; 3. removed input_dim and output_dim properties from LearningModel

* updated notebook

* removed simple agent manager

* fixed lint issues

* fixed lint issues

* bug fix

* bug fixes

* refined LearningModel

* modified gnn example based on latest rl toolkit changes

* updated cim example doc

* lint issue fix

* small refinements

* refactored GNN example

* replaced ActionInfo with torch Categorical's log_prob for policy_optimization algorithms

* refactored gnn example and added single-process script

* removed obsolete files from gnn

* lint issue fix

* formatting

* checked out gnn files from origin/v0.2

* refactored distributed rl toolkit

* finished distributed rl refactoring and updated dqn example and notebook

* merged request_rollout with collect

* some refinement

* refactored examples

* distributed rl revamping complete

* bug and formatting fixes

* bug fixes

* hid proxy instantiation inside dist components

* small refinement

* refined distributed RL and updated docs

* updated docs and notebook

* rm unwanted imports

* added missing files

* rm unwanted files

* lint issue fix

* bug fix

* example doc update

* rm agent_manager.svg

* updated images

* updated image file name in doc

* revamped cim example code structure

* added missing file

* restored default training config for dqn and ac-gnn

* added default loss function for actor-critic

* rm unwanted import

* updated README for cim/ac

* removed log_p param for PolicyGradient train()

* added READMEs for CIM

* renamed ac-gnn to ac_gnn

* updated README for CIM and added set_seeds to multi-process dqn

* init

* remove unit, make it same as logic

* init by sku, world sku

* init by sku, world sku

* remove debug code

* correct snapshot number issue

* rename logic to unit, make it meaningful

* add facility base

* refine naming

* refine the code, more comment to make it easy to read

* add supplier facility, logic not tested yet

* fix bug in facility initialize, add consumerunit not completed

* refactoring the facilities in world config

* add consumer for warehouse facility

* add upstream topology, and save it state

* add mapping from id to data model index

* logic without reward of consumer

* bug fix

* seller unit

* use tcod for path finding

* retailer facility

* bug fix, show seller demands in example

* add an interactive and renderable env wrapper for later debugging

* move font to subfolder with license to make it clearer

* add more details for node mapping

* dispatch action by unit id

* merge the frame changes to support data model inherit

* add action for consumer, so that we can push the requirement

* add unit id and facility in state for unit, add storage id for manufacture unit to simplify the state retrieving

* show manufacture related debug info step by step

* add bom info for debug

* add x,y to facility, bug fix

* fix bugs in transport and distribution unit, correct the path finding issue

* show vehicle movement in screen

* remove completed todo

* fix vehicle location issue, make all units and data model class from configs

* show more states

* fix slot number bug for dynamic backend

* rename suppliers to manufactures

* add missing file

* remove code config, use yml instead

* add 2 different step modes

* update changes

* rename manufacture

* add action for manufacture unit

* more attribute for states

* add balance sheet

* rename attribute to unify the feature name

* reverted experimental changes in dqn learner

* updated notebook

* rm supply chain code

* lint issue fix

* lint issue fix

* added missing file

* added general rollout workflow and trajectory class

* refactored

* more refactoring

* checked out backend from v0.2

* checked out setup.py from v0.2

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>
Co-authored-by: chaosyu <chaos.you@gmail.com>

* Add the price model (#286)

* Add the price model

* fix the error

* Refine the energy consumption

* Fix the error

* Delete business_engine_20210225104622.py

* Delete

* Delete the history file

* Delete common_20210205152100.py

* Delete common_20210302150646.py

* Refine the code

* Refine the code

* Refine the code

* Delete history files

* Fix the error

* Fix the error

* Fix the error

* Fix the error

* Fix the error

* Fix the error

* refine the code

* Refine the code

* Delete the history file

* Fix the error

* Fix the error

* Fix the error

* Refine the code

* fix the error

* fix the error

* fix the error

* Refine the code

* Add toy files

* Refine the code

* Refine the code

* Add file

* Refine the code

Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* add vm_scheduling meta into package data

* Maro Dashboard Vis Doc Update (#298)

* streamit with questdb

* script to import current dump data, except attention file, use influxdb line protocol for batch sending.

* refine the interface to flatten dictionary

* add messagetype.file to upload file later

* correct tag name

* correct the way to initialize streamit, make it possible to use it anywhere after start

* add data collecting in cim business engine

* streamit client refactoring

* fix import issue

* update cim hello world, with a commented code to enable vis data streaming

* fix metric replace bug

* refactor the type checking code

* maro geo vis

* add new line

* doc update

* lint refine

* lint update

* lint update

* lint update

* lint update

* lint update

* code revert

* add declare

* code revert

* add new line

* add comment

* delete surplus

* delete core

* lint update

* lint update

* lint update

* lint update

* specify version

* lint update

* specify docker version

* import sort

* backend revert

* Delete test.py

* format refact

* doc update

* import orders

* change import orders

* change import orders

* add version of http-server

* add specified port

* delete print

* lint update

* lint update

* lint update

* update doc

* dependency update

* update business engine

* business engine

* business engine update

* doc update

* delete irrelevant file

* update data

* doc update

Co-authored-by: chaosyu <chaos.you@gmail.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* fixed internal logger duplicated output (#299)

* fixed internal logger duplicated output

* delete unused import

* fixed isort

Co-authored-by: Arthur Jiang <sjian@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
Co-authored-by: Romic Huang <romic.kid@gmail.com>
Co-authored-by: zhanyu wang <pocket_2001@163.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com>
Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>
Co-authored-by: Kuan Wei Yu <v-kyu@microsoft.com>
Co-authored-by: MicrosoftHam <77261932+MicrosoftHam@users.noreply.github.com>
Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com>
This commit is contained in:
Jinyu-W, 2021-03-22 14:53:27 +08:00, committed by GitHub
Parent: 9b1aca95b1
Commit: cee5277692
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
481 changed files: 22311 additions and 13429 deletions

.github/workflows/test_with_cli.yml

@@ -63,4 +63,4 @@ jobs:
 test_with_cli: True
 training_only: True
 run: |
-python -m unittest tests/cli/grass/test_grass.py
+python -m unittest -f tests/cli/grass/test_grass_azure.py

.gitignore

@@ -6,6 +6,7 @@
 *.c
 *.cpp
 *.DS_Store
+.pytest_cache/
 .idea/
 .vscode/
 .vs/

@@ -3,3 +3,5 @@
 prune examples
 include maro/simulator/scenarios/cim/topologies/*/*.yml
 include maro/simulator/scenarios/citi_bike/topologies/*/*.yml
+include maro/simulator/scenarios/vm_scheduling/topologies/*/*.yml
+include maro/cli/project_generator/templates/*.jinja

@@ -161,7 +161,7 @@ env = Env(scenario="cim",
 options={"enable-dump-snapshot": "./dump_data"})
 # Inspect environment with the dump data
-maro inspector env --source ./dump_data
+maro inspector dashboard --source_path ./dump_data/snapshot_dump_folder
 ```
 ### Show Cases

maro.rl API reference doc, summary of changes:

- Agent section: added automodule entries for maro.rl.agent.dqn, maro.rl.agent.ddpg and maro.rl.agent.policy_optimization; added an "Agent Manager" heading above maro.rl.agent.abs_agent_manager.
- Replaced the "Algorithms" section (maro.rl.algorithms.torch.abs_algorithm, maro.rl.algorithms.torch.dqn) and the "Models" section with a single "Model" section: maro.rl.model.learning_model (automodule maro.rl.model.torch.learning_model, previously maro.rl.models.torch.learning_model).
- Explorer entries moved from maro.rl.explorer.* to maro.rl.exploration.*: abs_explorer, epsilon_greedy_explorer (formerly simple_explorer), plus a new noise_explorer entry.
- Added a "Scheduler" section: maro.rl.scheduling.scheduler and maro.rl.scheduling.simple_parameter_scheduler.
- Removed the shaping entries action_shaper, experience_shaper, k_step_experience_shaper and state_shaper (abs_shaper kept).
- Storage: maro.rl.storage.column_based_store and maro.rl.storage.utils replaced by maro.rl.storage.simple_store.

@@ -37,14 +37,16 @@ author = "MARO Team"
 # extensions coming with Sphinx (named "sphinx.ext.*") or your custom
 # ones.
-extensions = ["recommonmark",
-"sphinx.ext.autodoc",
-"sphinx.ext.coverage",
-"sphinx.ext.napoleon",
-"sphinx.ext.viewcode",
-"sphinx_markdown_tables",
-"sphinx_copybutton",
-]
+extensions = [
+"recommonmark",
+"sphinx.ext.autodoc",
+"sphinx.ext.coverage",
+"sphinx.ext.napoleon",
+"sphinx.ext.viewcode",
+"sphinx_markdown_tables",
+"sphinx_copybutton",
+"sphinx.ext.autosectionlabel",
+]
 napoleon_google_docstring = True
 napoleon_use_param = False
Просмотреть файл

@ -1,298 +1,167 @@
Multi Agent DQN for CIM
================================================
This example demonstrates how to use MARO's reinforcement learning (RL) toolkit to solve the
`CIM <https://maro.readthedocs.io/en/latest/scenarios/container_inventory_management.html>`_ problem. It is formalized as a multi-agent reinforcement learning problem, where each port acts as a decision
agent. The agents take actions independently, e.g., loading containers to vessels or discharging containers from vessels.
This example demonstrates how to use MARO's reinforcement learning (RL) toolkit to solve the container
inventory management (CIM) problem. It is formalized as a multi-agent reinforcement learning problem,
where each port acts as a decision agent. When a vessel arrives at a port, these agents must take actions
by transferring a certain amount of containers to / from the vessel. The objective is for the agents to
learn policies that minimize the overall container shortage.
State Shaper
------------
Trajectory
----------
`State shaper <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#shapers>`_ converts the environment
observation to the model input state which includes temporal and spatial information. For this scenario, the model input
state includes:
The ``CIMTrajectoryForDQN`` class inherits from ``Trajectory`` and implements methods to be used as callbacks
in the roll-out loop. In this example,
* ``get_state`` converts environment observations to state vectors that encode temporal and spatial information.
The temporal information includes relevant port and vessel information, such as shortage and remaining space,
over the past k days (here k = 7). The spatial information includes features of the downstream ports.
* ``get_action`` converts agents' output (an integer that maps to a percentage of containers to be loaded
to or unloaded from the vessel) to action objects that can be executed by the environment.
* ``get_offline_reward`` computes the reward of a given action as a linear combination of fulfillment and
shortage within a future time frame.
* ``on_finish`` processes a complete trajectory into data that can be used directly by the learning agents.
- Temporal information, including the past week's information of ports and vessels, such as shortage on port and
remaining space on vessel.
- Spatial information, including related downstream port features.
.. code-block:: python
PORT_ATTRIBUTES = ["empty", "full", "on_shipper", "on_consignee", "booking", "shortage", "fulfillment"]
VESSEL_ATTRIBUTES = ["empty", "full", "remaining_space"]
class CIMTrajectoryForDQN(Trajectory):
def __init__(
self, env, *, port_attributes, vessel_attributes, action_space, look_back, max_ports_downstream,
reward_time_window, fulfillment_factor, shortage_factor, time_decay,
finite_vessel_space=True, has_early_discharge=True
):
super().__init__(env)
self.port_attributes = port_attributes
self.vessel_attributes = vessel_attributes
self.action_space = action_space
self.look_back = look_back
self.max_ports_downstream = max_ports_downstream
self.reward_time_window = reward_time_window
self.fulfillment_factor = fulfillment_factor
self.shortage_factor = shortage_factor
self.time_decay = time_decay
self.finite_vessel_space = finite_vessel_space
self.has_early_discharge = has_early_discharge
class CIMStateShaper(StateShaper):
...
def __call__(self, decision_event, snapshot_list):
tick, port_idx, vessel_idx = decision_event.tick, decision_event.port_idx, decision_event.vessel_idx
ticks = [tick - rt for rt in range(self._look_back - 1)]
future_port_idx_list = snapshot_list["vessels"][tick: vessel_idx: 'future_stop_list'].astype('int')
port_features = snapshot_list["ports"][ticks: [port_idx] + list(future_port_idx_list): PORT_ATTRIBUTES]
vessel_features = snapshot_list["vessels"][tick: vessel_idx: VESSEL_ATTRIBUTES]
state = np.concatenate((port_features, vessel_features))
return str(port_idx), state
def get_state(self, event):
vessel_snapshots, port_snapshots = self.env.snapshot_list["vessels"], self.env.snapshot_list["ports"]
tick, port_idx, vessel_idx = event.tick, event.port_idx, event.vessel_idx
ticks = [tick - rt for rt in range(self.look_back - 1)]
future_port_idx_list = vessel_snapshots[tick: vessel_idx: 'future_stop_list'].astype('int')
port_features = port_snapshots[ticks: [port_idx] + list(future_port_idx_list): self.port_attributes]
vessel_features = vessel_snapshots[tick: vessel_idx: self.vessel_attributes]
return {port_idx: np.concatenate((port_features, vessel_features))}
def get_action(self, action_by_agent, event):
    vessel_snapshots = self.env.snapshot_list["vessels"]
    action_info = list(action_by_agent.values())[0]
    model_action = action_info[0] if isinstance(action_info, tuple) else action_info
    scope, tick, port, vessel = event.action_scope, event.tick, event.port_idx, event.vessel_idx
    zero_action_idx = len(self.action_space) // 2  # index corresponding to value zero.
    vessel_space = vessel_snapshots[tick:vessel:self.vessel_attributes][2] if self.finite_vessel_space else float("inf")
    early_discharge = vessel_snapshots[tick:vessel:"early_discharge"][0] if self.has_early_discharge else 0
    percent = abs(self.action_space[model_action])

    if model_action < zero_action_idx:
        action_type = ActionType.LOAD
        actual_action = min(round(percent * scope.load), vessel_space)
    elif model_action > zero_action_idx:
        action_type = ActionType.DISCHARGE
        plan_action = percent * (scope.discharge + early_discharge) - early_discharge
        actual_action = round(plan_action) if plan_action > 0 else round(percent * scope.discharge)
    else:
        actual_action, action_type = 0, None

    return {port: Action(vessel, port, actual_action, action_type)}

Action Shaper
-------------

`Action shaper <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#shapers>`_ is used to convert an
agent's model output to an environment executable action. For this specific scenario, the action space consists of
integers from -10 to 10, with -10 indicating loading 100% of the containers in the current port inventory onto the
vessel and 10 indicating discharging 100% of the containers on the vessel to the port.

.. code-block:: python

    class CIMActionShaper(ActionShaper):
        ...
        def __call__(self, model_action, decision_event, snapshot_list):
            scope = decision_event.action_scope
            tick = decision_event.tick
            port_idx = decision_event.port_idx
            vessel_idx = decision_event.vessel_idx

            port_empty = snapshot_list["ports"][tick: port_idx: ["empty", "full", "on_shipper", "on_consignee"]][0]
            vessel_remaining_space = snapshot_list["vessels"][tick: vessel_idx: ["empty", "full", "remaining_space"]][2]
            early_discharge = snapshot_list["vessels"][tick: vessel_idx: "early_discharge"][0]
            assert 0 <= model_action < len(self._action_space)

            if model_action < self._zero_action_index:
                actual_action = max(round(self._action_space[model_action] * port_empty), -vessel_remaining_space)
            elif model_action > self._zero_action_index:
                plan_action = self._action_space[model_action] * (scope.discharge + early_discharge) - early_discharge
                actual_action = (
                    round(plan_action) if plan_action > 0
                    else round(self._action_space[model_action] * scope.discharge)
                )
            else:
                actual_action = 0

            return Action(vessel_idx, port_idx, actual_action)
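To make this mapping concrete, the following stand-alone sketch (reusing the 21-point action space that is built later
in this example with ``np.linspace(-1.0, 1.0, NUM_ACTIONS)``) shows how a model action index translates into a
load/discharge percentage; the loop and the printed values are illustrative only.

.. code-block:: python

    import numpy as np

    NUM_ACTIONS = 21
    action_space = np.linspace(-1.0, 1.0, NUM_ACTIONS)
    zero_action_idx = NUM_ACTIONS // 2  # index 10 maps to 0.0, i.e. "do nothing"

    for model_action in (0, 5, 10, 15, 20):
        value = action_space[model_action]
        direction = "load" if value < 0 else "discharge" if value > 0 else "no-op"
        print(f"index {model_action:2d} -> {direction} {abs(value):.0%}")

    # index  0 -> load 100%
    # index  5 -> load 50%
    # index 10 -> no-op 0%
    # index 15 -> discharge 50%
    # index 20 -> discharge 100%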
Experience Shaper
-----------------
`Experience shaper <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#shapers>`_ is used to convert
an episode trajectory to trainable experiences for RL agents. For this specific scenario, the reward is a linear
combination of fulfillment and shortage in a limited time window.

.. code-block:: python

    class TruncatedExperienceShaper(ExperienceShaper):
        ...
        def __call__(self, trajectory, snapshot_list):
            experiences_by_agent = {}
            for i in range(len(trajectory) - 1):
                transition = trajectory[i]
                agent_id = transition["agent_id"]
                if agent_id not in experiences_by_agent:
                    experiences_by_agent[agent_id] = defaultdict(list)
                experiences = experiences_by_agent[agent_id]
                experiences["state"].append(transition["state"])
                experiences["action"].append(transition["action"])
                experiences["reward"].append(self._compute_reward(transition["event"], snapshot_list))
                experiences["next_state"].append(trajectory[i + 1]["state"])

            return experiences_by_agent

    def get_offline_reward(self, event):
        port_snapshots = self.env.snapshot_list["ports"]
        start_tick = event.tick + 1
        ticks = list(range(start_tick, start_tick + self.reward_time_window))
        future_fulfillment = port_snapshots[ticks::"fulfillment"]
        future_shortage = port_snapshots[ticks::"shortage"]
        decay_list = [
            self.time_decay ** i for i in range(self.reward_time_window)
            for _ in range(future_fulfillment.shape[0] // self.reward_time_window)
        ]
        tot_fulfillment = np.dot(future_fulfillment, decay_list)
        tot_shortage = np.dot(future_shortage, decay_list)
        return np.float32(self.fulfillment_factor * tot_fulfillment - self.shortage_factor * tot_shortage)

    def on_env_feedback(self, event, state_by_agent, action_by_agent, reward):
        self.trajectory["event"].append(event)
        self.trajectory["state"].append(state_by_agent)
        self.trajectory["action"].append(action_by_agent)

    def on_finish(self):
        exp_by_agent = defaultdict(lambda: defaultdict(list))
        for i in range(len(self.trajectory["state"]) - 1):
            agent_id = list(self.trajectory["state"][i].keys())[0]
            exp = exp_by_agent[agent_id]
            exp["S"].append(self.trajectory["state"][i][agent_id])
            exp["A"].append(self.trajectory["action"][i][agent_id])
            exp["R"].append(self.get_offline_reward(self.trajectory["event"][i]))
            exp["S_"].append(list(self.trajectory["state"][i + 1].values())[0])

        return dict(exp_by_agent)
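For concreteness, the following minimal sketch reproduces the discounted-sum arithmetic used in ``get_offline_reward``
with hypothetical toy numbers (two ports and a three-tick reward window); it is independent of the snapshot API and
only illustrates how the decay factors and the fulfillment/shortage weights combine.

.. code-block:: python

    import numpy as np

    # Toy settings: 2 ports, a 3-tick reward window, and the same decay/weighting scheme as above.
    num_ports, reward_time_window, time_decay = 2, 3, 0.97
    fulfillment_factor, shortage_factor = 1.0, 1.0

    # Flattened tick-major arrays, mimicking what the snapshot query returns for ticks t+1, t+2, t+3.
    future_fulfillment = np.array([5, 3, 4, 2, 6, 1], dtype=float)
    future_shortage = np.array([0, 1, 2, 0, 1, 3], dtype=float)

    decay_list = [time_decay ** i for i in range(reward_time_window) for _ in range(num_ports)]
    reward = (
        fulfillment_factor * np.dot(future_fulfillment, decay_list)
        - shortage_factor * np.dot(future_shortage, decay_list)
    )
    print(round(float(reward), 4))  # 13.7027 = 1.0*(8-1) + 0.97*(6-2) + 0.97**2*(7-4)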
Agent
-----
`Agent <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#agent>`_ is a combination of (RL)
algorithm, experience pool, and a set of parameters that governs the training loop. For this scenario, the agent is the
abstraction of a port. We use the out-of-the-box DQN as the underlying learning algorithm, together with a
TD-error-based sampling mechanism for drawing training batches from the experience pool.
.. code-block:: python
NUM_ACTIONS = 21
class DQNAgent(AbsAgent):
...
def train(self):
if len(self._experience_pool) < self._min_experiences_to_train:
return
for _ in range(self._num_batches):
indexes, sample = self._experience_pool.sample_by_key("loss", self._batch_size)
state = np.asarray(sample["state"])
action = np.asarray(sample["action"])
reward = np.asarray(sample["reward"])
next_state = np.asarray(sample["next_state"])
loss = self._algorithm.train(state, action, reward, next_state)
self._experience_pool.update(indexes, {"loss": loss})
def create_dqn_agents(agent_id_list):
agent_dict = {}
for agent_id in agent_id_list:
q_net = NNStack(
"q_value",
FullyConnectedBlock(
input_dim=state_shaper.dim,
hidden_dims=[256, 128, 64],
output_dim=NUM_ACTIONS,
activation=nn.LeakyReLU,
is_head=True,
batch_norm_enabled=True,
softmax_enabled=False,
skip_connection_enabled=False,
dropout_p=.0)
)
algorithm = DQN(
model=LearningModel(
q_net, optimizer_options=OptimizerOptions(cls=RMSprop, params={"lr": 0.05})
),
config=DQNConfig(
reward_decay=.0,
target_update_frequency=5,
tau=0.1,
is_double=True,
per_sample_td_error_enabled=True,
loss_cls=nn.SmoothL1Loss
)
)
experience_pool = ColumnBasedStore(**config.experience_pool)
agent_dict[agent_id] = DQNAgent(
agent_id, algorithm, experience_pool,
min_experiences_to_train=1024, num_batches=10, batch_size=128
)
return agent_dict
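The ``sample_by_key`` call in ``DQNAgent.train`` draws a batch with probability proportional to each record's stored
``loss`` value, which is what makes the sampling TD-error based. The snippet below is a rough, dictionary-backed sketch
of that behaviour (a simplification for illustration, not the actual ``ColumnBasedStore`` implementation), sampling
with replacement for simplicity.

.. code-block:: python

    import random

    def sample_by_key(store: dict, key: str, size: int):
        """Sample `size` row indexes with probability proportional to store[key]; return (indexes, batch)."""
        weights = store[key]
        indexes = random.choices(range(len(weights)), weights=weights, k=size)
        batch = {column: [values[i] for i in indexes] for column, values in store.items()}
        return indexes, batch

    # A hypothetical store with three transitions; the third has the largest "loss" (TD error),
    # so it is the most likely to appear in the sampled batch.
    store = {
        "state": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
        "action": [3, 12, 18],
        "reward": [1.0, -0.5, 2.0],
        "next_state": [[0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
        "loss": [0.1, 0.2, 5.0],
    }
    indexes, batch = sample_by_key(store, "loss", size=2)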
Agent Manager
-------------
The complexities of the environment can be isolated from the learning algorithm by using an
`Agent manager <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#agent-manager>`_
to manage individual agents. We define a function to create the agents and an agent manager class
that implements the ``train`` method where the newly obtained experiences are stored in the agents'
experience pools before training, in accordance with the DQN algorithm.
.. code-block:: python
class DQNAgentManager(SimpleAgentManager):
def train(self, experiences_by_agent, performance=None):
self._assert_train_mode()
# store experiences for each agent
for agent_id, exp in experiences_by_agent.items():
exp.update({"loss": [1e8] * len(list(exp.values())[0])})
self.agent_dict[agent_id].store_experiences(exp)
for agent in self.agent_dict.values():
agent.train()
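For reference, the ``experiences_by_agent`` argument consumed by ``train`` is keyed by agent (port) ID, with one
column-style list per field, mirroring the output of the experience shaper above. A hypothetical two-transition
example for a single agent:

.. code-block:: python

    import numpy as np

    # Hypothetical experiences for agent (port) "0"; the "loss" column is added inside
    # DQNAgentManager.train before the experiences are stored in the agent's pool.
    experiences_by_agent = {
        "0": {
            "state": [np.zeros(8), np.ones(8)],
            "action": [12, 8],
            "reward": [1.5, -0.3],
            "next_state": [np.ones(8), np.zeros(8)],
        }
    }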
Main Loop with Actor and Learner (Single Process)
-------------------------------------------------
This single-process workflow, in which a learning policy interacts with a MARO environment, consists of:
- Initializing an environment with specific scenario and topology parameters.
- Defining scenario-specific components, e.g. shapers.
- Creating agents and an agent manager.
- Creating an `actor <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#learner-and-actor>`_ and a
`learner <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#learner-and-actor>`_ to start the
training process in which the agent manager interacts with the environment for collecting experiences and updating
policies.
.. code-block:: python
env = Env("cim", "toy.4p_ssdd_l0.0", durations=1120)
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
state_shaper = CIMStateShaper(look_back=7, max_ports_downstream=2)
action_shaper = CIMActionShaper(action_space=list(np.linspace(-1.0, 1.0, NUM_ACTIONS)))
experience_shaper = TruncatedExperienceShaper(
time_window=100, fulfillment_factor=1.0, shortage_factor=1.0, time_decay_factor=0.97
)
agent_manager = DQNAgentManager(
name="cim_learner",
mode=AgentManagerMode.TRAIN_INFERENCE,
agent_dict=create_dqn_agents(agent_id_list),
state_shaper=state_shaper,
action_shaper=action_shaper,
experience_shaper=experience_shaper
)
scheduler = TwoPhaseLinearParameterScheduler(
max_episode=100,
parameter_names=["epsilon"],
split_ep=50,
start_values=0.4,
mid_values=0.32,
end_values=.0
)
actor = SimpleActor(env, agent_manager)
learner = SimpleLearner(agent_manager, actor, scheduler)
learner.learn()
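With the settings above, the scheduler anneals ``epsilon`` from 0.4 down to 0.32 over the first 50 episodes and then
down to 0.0 over the remaining ones. The toolkit class handles this internally; the stand-alone sketch below is not the
toolkit's implementation (and ignores how the final episode is treated), it only illustrates the two-phase linear
interpolation.

.. code-block:: python

    def two_phase_linear(ep, max_episode=100, split_ep=50, start=0.4, mid=0.32, end=0.0):
        """Piecewise-linear value for (0-based) episode `ep`, mirroring the schedule configured above."""
        if ep < split_ep:
            return start + (mid - start) * ep / split_ep
        return mid + (end - mid) * (ep - split_ep) / (max_episode - split_ep)

    print([round(two_phase_linear(ep), 3) for ep in (0, 25, 50, 75, 99)])
    # [0.4, 0.36, 0.32, 0.16, 0.006]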
Main Loop with Actor and Learner (Distributed/Multi-process)
--------------------------------------------------------------
We demonstrate a single-learner, multi-actor topology in which the learner drives the program by telling remote actors
to perform roll-out tasks and uses the results they send back to improve the policies. The workflow usually involves
launching a learner process and an actor process separately. Because training occurs on the learner side and inference
occurs on the actor side, appropriate agent managers need to be created on both sides.
On the actor side, the agent manager must be equipped with all shapers as well as an explorer. Thus, the code for
creating an environment and an agent manager on the actor side is similar to that of the single-process version,
except that the agent manager mode must be set to AgentManagerMode.INFERENCE. As in the single-process version, the
environment and the agent manager are wrapped in a SimpleActor instance. To make the actor a distributed worker, we
further wrap it in an ActorWorker instance. Finally, we launch the worker and it starts to listen to roll-out requests
from the learner. The following code snippet shows the creation of an actor worker with a simple (local) actor wrapped
inside.
.. code-block:: python
env = Env("cim", "toy.4p_ssdd_l0.0", durations=1120)
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
agent_manager = DQNAgentManager(
name="cim_learner",
mode=AgentManagerMode.INFERENCE,
agent_dict=create_dqn_agents(agent_id_list),
state_shaper=state_shaper,
action_shaper=action_shaper,
experience_shaper=experience_shaper
)
proxy_params = {
    "group_name": "distributed_cim",
    "expected_peers": {"learner": 1},
    "redis_address": ("localhost", 6379),
    "max_retries": 15
}

actor_worker = ActorWorker(
    local_actor=SimpleActor(env=env, agent_manager=agent_manager),
    proxy_params=proxy_params
)
actor_worker.launch()
On the learner side, an agent manager in AgentManagerMode.TRAIN mode is required, but it does not need any shapers.
Instead of creating an actor, we create an actor proxy and wrap it inside the learner. This proxy
serves as the communication interface for the learner and is responsible for sending roll-out requests to remote actor
processes and receiving the results. Calling the train method executes the usual training loop, except that the actual
roll-out is performed remotely. The code snippet below shows the creation of a learner with an actor proxy wrapped
inside that communicates with 3 actors.
.. code-block:: python

    agent_config = {
        "model": ...,
        "optimization": ...,
        "hyper_params": ...
    }

    def get_dqn_agent():
        q_model = SimpleMultiHeadModel(
            FullyConnectedBlock(**agent_config["model"]), optim_option=agent_config["optimization"]
        )
        return DQN(q_model, DQNConfig(**agent_config["hyper_params"]))
Training
--------
The distributed training consists of one learner process and multiple actor processes. The learner optimizes
the policy by collecting roll-out data from the actors to train the underlying agents.
The actor process must create a roll-out executor for performing the requested roll-outs, which means that the
environment simulator and shapers should be created there. In this example, inference is performed on the
actor's side, so a set of DQN agents must be created in order to load the models (and exploration parameters)
from the learner.
.. code-block:: python
def cim_dqn_actor():
env = Env(**training_config["env"])
agent = MultiAgentWrapper({name: get_dqn_agent() for name in env.agent_idx_list})
actor = Actor(env, agent, CIMTrajectoryForDQN, trajectory_kwargs=common_config)
actor.as_worker(training_config["group"])
agent_manager = DQNAgentManager(
name="cim_learner",
mode=AgentManagerMode.TRAIN,
agent_dict=create_dqn_agents(agent_id_list),
state_shaper=state_shaper,
action_shaper=action_shaper,
experience_shaper=experience_shaper
)
proxy_params = {
"group_name": "distributed_cim",
"expected_peers": {"actor": 3},
"redis_address": ("localhost", 6379),
"max_retries": 15
}
actor = ActorProxy(proxy_params=proxy_params, experience_collecting_func=concat_experiences_by_agent)
scheduler = TwoPhaseLinearParameterScheduler(
max_episode=100,
parameter_names=["epsilon"],
split_ep=50,
start_values=0.4,
mid_values=0.32,
end_values=.0
)
learner = SimpleLearner(agent_manager, actor, scheduler)
learner.learn()
The learner side requires a concrete learner class that inherits from ``AbsLearner`` and implements the ``run``
method, which contains the main training loop. Here the implementation is similar to the single-threaded version
except that the ``collect`` method is used to obtain roll-out data from the actors (since the roll-out executors
are located on the actors' side). The agents created here are where training occurs and hence always contain the
latest policies.
.. code-block:: python
def cim_dqn_learner():
env = Env(**training_config["env"])
agent = MultiAgentWrapper({name: get_dqn_agent() for name in env.agent_idx_list})
scheduler = TwoPhaseLinearParameterScheduler(training_config["max_episode"], **training_config["exploration"])
actor = ActorProxy(
training_config["group"], training_config["num_actors"],
update_trigger=training_config["learner_update_trigger"]
)
learner = OffPolicyLearner(actor, scheduler, agent, **training_config["training"])
learner.run()
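In practice the ``cim_dqn_learner`` and ``cim_dqn_actor`` entry points above run as separate processes (one learner
plus ``num_actors`` actors) that rendezvous through the Redis-backed proxy group. A minimal local launcher, sketched
here with Python's ``multiprocessing`` rather than MARO's own job orchestration, could look like:

.. code-block:: python

    import multiprocessing as mp

    def launch_cim_dqn(num_actors: int = 3):
        # One learner process plus `num_actors` actor processes, all joining the same proxy group.
        jobs = [mp.Process(target=cim_dqn_learner)]
        jobs.extend(mp.Process(target=cim_dqn_actor) for _ in range(num_actors))
        for job in jobs:
            job.start()
        for job in jobs:
            job.join()

    if __name__ == "__main__":
        launch_cim_dqn()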
.. note::


@ -81,9 +81,9 @@ Contents
installation/pip_install.rst
installation/playground.rst
installation/grass_cluster_provisioning_on_azure.rst
installation/k8s_cluster_provisioning_on_azure.rst
installation/grass_cluster_provisioning_on_premises.rst
installation/grass_azure_cluster_provisioning.rst
installation/grass_on_premises_cluster_provisioning.rst
installation/k8s_aks_cluster_provisioning.rst
installation/multi_processes_localhost_provisioning.rst
.. toctree::
@ -93,6 +93,7 @@ Contents
scenarios/container_inventory_management.rst
scenarios/citi_bike.rst
scenarios/vm_scheduling.rst
scenarios/command_line.rst
.. toctree::
:maxdepth: 2
@ -114,6 +115,7 @@ Contents
key_components/communication.rst
key_components/orchestration.rst
key_components/dashboard_visualization.rst
key_components/geographic_visualization.rst
.. toctree::
:maxdepth: 2


@ -0,0 +1,240 @@
.. _grass-azure-cluster-provisioning:
Grass Cluster Provisioning on Azure
===================================
With the following guide, you can build up a MARO cluster in
:ref:`grass/azure <grass>`
mode on Azure and run your training job in a distributed environment.
Prerequisites
-------------
* `Install the Azure CLI and login <https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest>`_
* `Install docker <https://docs.docker.com/engine/install/>`_ and
`Configure docker to make sure it can be managed as a non-root user <https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user>`_
Cluster Management
------------------
* Create a cluster with a :ref:`deployment <#grass-azure-create>`
.. code-block:: sh
# Create a grass cluster with a grass-create deployment
maro grass create ./grass-azure-create.yml
* Scale the cluster
Check `VM Size <https://docs.microsoft.com/en-us/azure/virtual-machines/sizes>`_ to see more node specifications.
.. code-block:: sh
# Scale nodes with 'Standard_D4s_v3' specification to 2
maro grass node scale myGrassCluster Standard_D4s_v3 2
# Scale nodes with 'Standard_D2s_v3' specification to 0
maro grass node scale myGrassCluster Standard_D2s_v3 0
* Delete the cluster
.. code-block:: sh
# Delete a grass cluster
maro grass delete myGrassCluster
* Start/Stop nodes to save costs
.. code-block:: sh
# Start 2 nodes with 'Standard_D4s_v3' specification
maro grass node start myGrassCluster Standard_D4s_v3 2
# Stop 2 nodes with 'Standard_D4s_v3' specification
maro grass node stop myGrassCluster Standard_D4s_v3 2
* Get statuses of the cluster
.. code-block:: sh
# Get master status
maro grass status myGrassCluster master
# Get nodes status
maro grass status myGrassCluster nodes
# Get containers status
maro grass status myGrassCluster containers
* Clean up the cluster
Delete all running jobs, schedules, and containers in the cluster.
.. code-block:: sh
maro grass clean myGrassCluster
.. _grass-azure-cluster-provisioning/run-job:
Run Job
-------
* Push your training image from local machine
.. code-block:: sh
# Push image 'myImage' to the cluster,
# 'myImage' is a docker image loaded on the machine that executes this command
maro grass image push myGrassCluster --image-name myImage
* Push your training data
.. code-block:: sh
# Push dqn folder under './myTrainingData/' to a relative path '/myTrainingData' in the cluster
# You can then assign your mapping location in the start-job-deployment
maro grass data push myGrassCluster ./myTrainingData/dqn /myTrainingData
* Start a training job with a :ref:`start-job-deployment <grass-start-job>`
.. code-block:: sh
# Start a training job with a start-job deployment
maro grass job start myGrassCluster ./grass-start-job.yml
* Or, schedule batch jobs with a :ref:`start-schedule-deployment <grass-start-schedule>`
These jobs will share the same specification of components.
A best practice for using this command is to
push your training configs all at once with "``maro grass data push``",
get the jobName from the environment variables in the containers,
and then use the specific training config based on the jobName.
.. code-block:: sh
# Start a training schedule with a start-schedule deployment
maro grass schedule start myGrassCluster ./grass-start-schedule.yml
* Get the logs of the job
.. code-block:: sh
# Get the logs of the job
maro grass job logs myGrassCluster myJob1
* List the current status of the job
.. code-block:: sh
# List the current status of the job
maro grass job list myGrassCluster
* Stop a training job
.. code-block:: sh
# Stop a training job
maro grass job stop myJob1
Sample Deployments
------------------
grass-azure-create
^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
mode: grass/azure
name: myGrassCluster
cloud:
resource_group: myResourceGroup
subscription: mySubscription
location: eastus
default_username: admin
default_public_key: "{ssh public key}"
user:
admin_id: admin
master:
node_size: Standard_D2s_v3
grass-start-job
^^^^^^^^^^^^^^^
You can replace {project root} with a valid Linux path, e.g. /home/admin.
The data you push will then be mounted into this folder.
.. code-block:: yaml
mode: grass
name: myJob1
allocation:
mode: single-metric-balanced
metric: cpu
components:
actor:
command: "python {project root}/myTrainingData/dqn/job1/start_actor.py"
image: myImage
mount:
target: "{project root}"
num: 5
resources:
cpu: 1
gpu: 0
memory: 1024m
learner:
command: "python {project root}/myTrainingData/dqn/job1/start_learner.py"
image: myImage
mount:
target: "{project root}"
num: 1
resources:
cpu: 2
gpu: 0
memory: 2048m
grass-start-schedule
^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
mode: grass
name: mySchedule1
allocation:
mode: single-metric-balanced
metric: cpu
job_names:
- myJob2
- myJob3
- myJob4
- myJob5
components:
actor:
command: "python {project root}/myTrainingData/dqn/schedule1/actor.py"
image: myImage
mount:
target: "{project root}"
num: 5
resources:
cpu: 1
gpu: 0
memory: 1024m
learner:
command: "python {project root}/myTrainingData/dqn/schedule1/learner.py"
image: myImage
mount:
target: "{project root}"
num: 1
resources:
cpu: 2
gpu: 0
memory: 2048m


@ -1,202 +0,0 @@
Grass Cluster Provisioning on Azure
===================================
With the following guide, you can build up a MARO cluster in
`grass mode <../distributed_training/orchestration_with_grass.html#orchestration-with-grass>`_
on Azure and run your training job in a distributed environment.
Prerequisites
-------------
* `Install the Azure CLI and login <https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest>`_
* `Install docker <https://docs.docker.com/engine/install/>`_ and
`Configure docker to make sure it can be managed as a non-root user <https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user>`_
Cluster Management
------------------
* Create a cluster with a `deployment <#grass-azure-create>`_
.. code-block:: sh
# Create a grass cluster with a grass-create deployment
maro grass create ./grass-azure-create.yml
* Scale the cluster
.. code-block:: sh
# Scale nodes with 'Standard_D4s_v3' specification to 2
maro grass node scale my_grass_cluster Standard_D4s_v3 2
Check `VM Size <https://docs.microsoft.com/en-us/azure/virtual-machines/sizes>`_
to see more node specifications.
* Delete the cluster
.. code-block:: sh
# Delete a grass cluster
maro grass delete my_grass_cluster
* Start/stop nodes to save costs
.. code-block:: sh
# Start 2 nodes with 'Standard_D4s_v3' specification
maro grass node start my_grass_cluster Standard_D4s_v3 2
# Stop 2 nodes with 'Standard_D4s_v3' specification
maro grass node stop my_grass_cluster Standard_D4s_v3 2
Run Job
-------
* Push your training image
.. code-block:: sh
# Push image 'my_image' to the cluster
maro grass image push my_grass_cluster --image-name my_image
* Push your training data
.. code-block:: sh
# Push data under './my_training_data' to a relative path '/my_training_data' in the cluster
# You can then assign your mapping location in the start-job deployment
maro grass data push my_grass_cluster ./my_training_data/* /my_training_data
* Start a training job with a `deployment <#grass-start-job>`_
.. code-block:: sh
# Start a training job with a start-job deployment
maro grass job start my_grass_cluster ./grass-start-job.yml
* Or, schedule batch jobs with a `deployment <#grass-start-schedule>`_
.. code-block:: sh
# Start a training schedule with a start-schedule deployment
maro grass schedule start my_grass_cluster ./grass-start-schedule.yml
* Get the logs of the job
.. code-block:: sh
# Get the logs of the job
maro grass job logs my_grass_cluster my_job_1
* List the current status of the job
.. code-block:: sh
# List the current status of the job
maro grass job list my_grass_cluster
* Stop a training job
.. code-block:: sh
# Stop a training job
maro grass job stop my_job_1
Sample Deployments
------------------
grass-azure-create
^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
mode: grass
name: my_grass_cluster
cloud:
infra: azure
location: eastus
resource_group: my_grass_resource_group
subscription: my_subscription
user:
admin_public_key: "{ssh public key with 'ssh-rsa' prefix}"
admin_username: admin
master:
node_size: Standard_D2s_v3
grass-start-job
^^^^^^^^^^^^^^^
.. code-block:: yaml
mode: grass
name: my_job_1
allocation:
mode: single-metric-balanced
metric: cpu
components:
actor:
command: "bash {project root}/my_training_data/job_1/actor.sh"
image: my_image
mount:
target: “{project root}”
num: 5
resources:
cpu: 2
gpu: 0
memory: 2048m
learner:
command: "bash {project root}/my_training_data/job_1/learner.sh"
image: my_image
mount:
target: "{project root}"
num: 1
resources:
cpu: 2
gpu: 0
memory: 2048m
grass-start-schedule
^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
mode: grass
name: my_schedule_1
allocation:
mode: single-metric-balanced
metric: cpu
job_names:
- my_job_2
- my_job_3
- my_job_4
- my_job_5
components:
actor:
command: "bash {project root}/my_training_data/job_1/actor.sh"
image: my_image
mount:
target: “{project root}”
num: 5
resources:
cpu: 2
gpu: 0
memory: 2048m
learner:
command: "bash {project root}/my_training_data/job_1/learner.sh"
image: my_image
mount:
target: "{project root}"
num: 1
resources:
cpu: 2
gpu: 0
memory: 2048m


@ -1,206 +0,0 @@
Grass Cluster Provisioning in On-Premises Environment
=====================================================
With the following guide, you can build up a MARO cluster in
`grass mode <../distributed_training/orchestration_with_grass.html#orchestration-with-grass>`_
in local private network and run your training job in On-Premises distributed environment.
Prerequisites
-------------
* Linux with Python 3.6+
* `Install Powershell <https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell?view=powershell-7.1>`_ if you are using Windows Server
Cluster Management
------------------
* Create a cluster with a `deployment <#grass-cluster-create>`_
.. code-block:: sh
# Create a grass cluster with a grass-create deployment
maro grass create ./grass-azure-create.yml
* Let a node join a specified cluster
.. code-block:: sh
# Let a worker node join into specified cluster
maro grass node join ./node-join.yml
* Let a node leave a specified cluster
.. code-block:: sh
# Let a worker node leave a specified cluster
maro grass node leave {cluster_name} {node_name}
* Delete the cluster
.. code-block:: sh
# Delete a grass cluster
maro grass delete my_grass_cluster
Run Job
-------
* Push your training image
.. code-block:: sh
# Push image 'my_image' to the cluster
maro grass image push my_grass_cluster --image-name my_image
* Push your training data
.. code-block:: sh
# Push data under './my_training_data' to a relative path '/my_training_data' in the cluster
# You can then assign your mapping location in the start-job deployment
maro grass data push my_grass_cluster ./my_training_data/* /my_training_data
* Start a training job with a `deployment <#grass-start-job>`_
.. code-block:: sh
# Start a training job with a start-job deployment
maro grass job start my_grass_cluster ./grass-start-job.yml
* Or, schedule batch jobs with a `deployment <#grass-start-schedule>`_
.. code-block:: sh
# Start a training schedule with a start-schedule deployment
maro grass schedule start my_grass_cluster ./grass-start-schedule.yml
* Get the logs of the job
.. code-block:: sh
# Get the logs of the job
maro grass job logs my_grass_cluster my_job_1
* List the current status of the job
.. code-block:: sh
# List the current status of the job
maro grass job list my_grass_cluster
* Stop a training job
.. code-block:: sh
# Stop a training job
maro grass job stop my_job_1
Sample Deployments
------------------
grass-cluster-create
^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
mode: grass/on-premises
name: cluster_name
user:
admin_public_key: "{ssh public key with 'ssh-rsa' prefix}"
admin_username: admin
grass-node-join
^^^^^^^^^^^^^^^
.. code-block:: yaml
mode: "grass/on-premises"
name: ""
cluster: ""
public_ip_address: ""
hostname: ""
system: "linux"
resources:
cpu: 1
memory: 1024
gpu: 0
grass-start-job
^^^^^^^^^^^^^^^
.. code-block:: yaml
mode: grass
name: my_job_1
allocation:
mode: single-metric-balanced
metric: cpu
components:
actor:
command: "bash {project root}/my_training_data/job_1/actor.sh"
image: my_image
mount:
target: “{project root}”
num: 5
resources:
cpu: 2
gpu: 0
memory: 2048m
learner:
command: "bash {project root}/my_training_data/job_1/learner.sh"
image: my_image
mount:
target: "{project root}"
num: 1
resources:
cpu: 2
gpu: 0
memory: 2048m
grass-start-schedule
^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
mode: grass
name: my_schedule_1
allocation:
mode: single-metric-balanced
metric: cpu
job_names:
- my_job_2
- my_job_3
- my_job_4
- my_job_5
components:
actor:
command: "bash {project root}/my_training_data/job_1/actor.sh"
image: my_image
mount:
target: “{project root}”
num: 5
resources:
cpu: 2
gpu: 0
memory: 2048m
learner:
command: "bash {project root}/my_training_data/job_1/learner.sh"
image: my_image
mount:
target: "{project root}"
num: 1
resources:
cpu: 2
gpu: 0
memory: 2048m


@ -0,0 +1,98 @@
.. _grass-on-premises-cluster-provisioning:
Grass Cluster Provisioning in On-Premises Environment
=====================================================
With the following guide, you can build up a MARO cluster in
:ref:`grass/on-premises <grass>`
mode in a local private network and run your training job in an on-premises distributed environment.
Prerequisites
-------------
* Linux with Python 3.6+
* `Install Powershell <https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell?view=powershell-7.1>`_ if you are using Windows Server
Cluster Management
------------------
* Create a cluster with a :ref:`deployment <grass-on-premises-create>`
.. code-block:: sh
# Create a grass cluster with a grass-create deployment
maro grass create ./grass-azure-create.yml
* Let a node join a specified cluster
.. code-block:: sh
# Let a worker node join a specified cluster
maro grass node join ./node-join.yml
* Let a node leave a specified cluster
.. code-block:: sh
# Let a worker node leave a specified cluster
maro grass node leave {cluster_name} {node_name}
* Delete the cluster
.. code-block:: sh
# Delete a grass cluster
maro grass delete my_grass_cluster
Run Job
-------
See :ref:`Run Job in grass/azure <grass-azure-cluster-provisioning/run-job>` for reference.
Sample Deployments
------------------
grass-on-premises-create
^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
mode: grass/on-premises
name: clusterName
user:
admin_id: admin
master:
username: root
hostname: maroMaster
public_ip_address: 137.128.0.1
private_ip_address: 10.0.0.4
grass-on-premises-join-cluster
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
mode: grass/on-premises
master:
private_ip_address: 10.0.0.4
node:
hostname: maroNode1
username: root
public_ip_address: 137.128.0.2
private_ip_address: 10.0.0.5
resources:
cpu: all
memory: 2048m
gpu: 0
config:
install_node_runtime: true
install_node_gpu_support: false


@ -1,8 +1,10 @@
.. _k8s-aks-cluster-provisioning:
K8S Cluster Provisioning on Azure
=================================
With the following guide, you can build up a MARO cluster in
`k8s mode <../distributed_training/orchestration_with_k8s.html#orchestration-with-k8s>`_
:ref:`k8s/aks <k8s>`
on Azure and run your training job in a distributed environment.
Prerequisites
@ -36,7 +38,7 @@ Prerequisites
Cluster Management
------------------
* Create a cluster with a `deployment <#k8s-azure-create>`_
* Create a cluster with a :ref:`deployment <k8s-aks-create>`
.. code-block:: sh
@ -47,18 +49,20 @@ Cluster Management
.. code-block:: sh
# Scale nodes with 'Standard_D4s_v3' specification to 2
maro k8s node scale my_k8s_cluster Standard_D4s_v3 2
Check `VM Size <https://docs.microsoft.com/en-us/azure/virtual-machines/sizes>`_ to see more node specifications.
Check `VM Size <https://docs.microsoft.com/en-us/azure/virtual-machines/sizes>`_
to see more node specifications.
# Scale nodes with 'Standard_D4s_v3' specification to 2
maro k8s node scale myK8sCluster Standard_D4s_v3 2
# Scale nodes with 'Standard_D2s_v3' specification to 0
maro k8s node scale myK8sCluster Standard_D2s_v3 0
* Delete the cluster
.. code-block:: sh
# Delete a k8s cluster
maro k8s delete my_k8s_cluster
maro k8s delete myK8sCluster
Run Job
-------
@ -67,72 +71,69 @@ Run Job
.. code-block:: sh
# Push image 'my_image' to the cluster
maro k8s image push my_k8s_cluster --image-name my_image
# Push image 'myImage' to the cluster
maro k8s image push myK8sCluster --image-name myImage
* Push your training data
.. code-block:: sh
# Push data under './my_training_data' to a relative path '/my_training_data' in the cluster
# You can then assign your mapping location in the start-job deployment
maro k8s data push my_k8s_cluster ./my_training_data/* /my_training_data
# Push dqn folder under './myTrainingData/' to a relative path '/myTrainingData' in the cluster
# You can then assign your mapping location in the start-job-deployment
maro k8s data push myGrassCluster ./myTrainingData/dqn /myTrainingData
* Start a training job with a `deployment <#k8s-start-job>`_
* Start a training job with a :ref:`deployment <k8s-start-job>`
.. code-block:: sh
# Start a training job with a start-job deployment
maro k8s job start my_k8s_cluster ./k8s-start-job.yml
# Start a training job with a start-job-deployment
maro k8s job start myK8sCluster ./k8s-start-job.yml
* Or, schedule batch jobs with a `deployment <#k8s-start-schedule>`_
* Or, schedule batch jobs with a :ref:`deployment <k8s-start-schedule>`
.. code-block:: sh
# Start a training schedule with a start-schedule deployment
maro k8s schedule start my_k8s123_cluster ./k8s-start-schedule.yml
# Start a training schedule with a start-schedule-deployment
maro k8s schedule start myK8sCluster ./k8s-start-schedule.yml
* Get the logs of the job
.. code-block:: sh
# Logs will be exported to current directory
maro k8s job logs my_k8s_cluster my_job_1
maro k8s job logs myK8sCluster myJob1
* List the current status of the job
.. code-block:: sh
# List current status of jobs
maro k8s job list my_k8s_cluster my_job_1
maro k8s job list myK8sCluster myJob1
* Stop a training job
.. code-block:: sh
# Stop a training job
maro k8s job stop my_k8s_cluster my_job_1
maro k8s job stop myK8sCluster myJob1
Sample Deployments
------------------
k8s-azure-create
^^^^^^^^^^^^^^^^
k8s-aks-create
^^^^^^^^^^^^^^
.. code-block:: yaml
mode: k8s
name: my_k8s_cluster
mode: k8s/aks
name: myK8sCluster
cloud:
infra: azure
subscription: mySubscription
resource_group: myResourceGroup
location: eastus
resource_group: my_k8s_resource_group
subscription: my_subscription
user:
admin_public_key: "{ssh public key with 'ssh-rsa' prefix}"
admin_username: admin
default_public_key: "{ssh public key}"
default_username: admin
master:
node_size: Standard_D2s_v3
@ -142,63 +143,63 @@ k8s-start-job
.. code-block:: yaml
mode: k8s
name: my_job_1
mode: k8s/aks
name: myJob1
components:
actor:
command: ["bash", "{project root}/my_training_data/actor.sh"]
image: my_image
command: ["python", "{project root}/myTrainingData/dqn/start_actor.py"]
image: myImage
mount:
target: "{project root}"
num: 5
resources:
cpu: 2
gpu: 0
memory: 2048m
memory: 2048M
learner:
command: ["bash", "{project root}/my_training_data/learner.sh"]
image: my_image
command: ["python", "{project root}/myTrainingData/dqn/start_learner.py"]
image: myImage
mount:
target: "{project root}"
num: 1
resources:
cpu: 2
gpu: 0
memory: 2048m
memory: 2048M
k8s-start-schedule
^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
mode: k8s
name: my_schedule_1
mode: k8s/aks
name: mySchedule1
job_names:
- my_job_2
- my_job_3
- my_job_4
- my_job_5
- myJob2
- myJob3
- myJob4
- myJob5
components:
actor:
command: ["bash", "{project root}/my_training_data/actor.sh"]
image: my_image
command: ["python", "{project root}/myTrainingData/dqn/start_actor.py"]
image: myImage
mount:
target: "{project root}"
num: 5
resources:
cpu: 2
gpu: 0
memory: 2048m
memory: 2048M
learner:
command: ["bash", "{project root}/my_training_data/learner.sh"]
image: my_image
command: ["python", "{project root}/myTrainingData/dqn/start_learner.py"]
image: myImage
mount:
target: "{project root}"
num: 1
resources:
cpu: 2
gpu: 0
memory: 2048m
memory: 2048M


@ -71,7 +71,7 @@ To start this visualization tool, user need to input command following the forma
.. code-block:: sh
maro inspector env --source {source\_folder\_path} --force {true/false}
maro inspector dashboard --source_path {source\_folder\_path} --force {true/false}
----
@ -79,7 +79,7 @@ e.g.
.. code-block:: sh
maro inspector env --source_path .\maro\dumper_files --force false
maro inspector dashboard --source_path .\maro\dumper_files --force false
----


@ -0,0 +1,235 @@
Geographic Visualization
========================
We can use Env-geographic for both finished and running experiments.
For finished experiments, local mode lets users view the experimental data
in order to support subsequent decisions. If a running experiment is selected,
real-time mode is launched by default; it is used to view real-time experimental
data and judge the effectiveness of the model. You can also freely switch to
local mode for the finished epochs while in real-time mode.
Dependency
----------
Env-geographic's startup depends on Docker.
Therefore, users need to install Docker on the machine and ensure that it can run normally.
Users could get Docker through `Docker installation <https://docs.docker.com/get-docker/>`_.
How to Use?
-----------
Env-geographic has 3 parts: front-end, back-end and database. Users need 2 steps
to start this tool:
1. Start the database and choose an experiment to be displayed.
2. Start the front-end and back-end service with specified experiment name.
Start database
~~~~~~~~~~~~~~
First, users need to start the local database with the following command:
.. code-block:: sh
maro inspector geo --start database
----
After the command is executed successfully, users
could view the local data at localhost:9000 by default.
If the default port is occupied, users could obtain the access port of each container
through the following command:
.. code-block:: sh
docker container ls
----
Users could view all experiment information with the following SQL statement:
.. code-block:: SQL
SELECT * FROM maro.experiments
----
Data is stored locally at the folder maro/maro/streamit/server/data.
Choose an existing experiment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To view the visualization of experimental data, users need to
specify the name of the experiment. Users could either choose an existing
experiment or start a new one.
Users could select a name from the local database.
.. image:: ../images/visualization/geographic/database_exp.png
:alt: database_exp
Create a new experiment
^^^^^^^^^^^^^^^^^^^^^^^
Currently, users need to start the experiment manually to obtain
the data required by the service.
To send data to the database, there are 2 compulsory steps:
1. Set the environment variable to enable data transmission.
2. Import the relevant package and modify the environment initialization code to send data.
Users need to set the value of the environment variable
"MARO_STREAMIT_ENABLED" to "true". To specify the experiment name,
set the environment variable "MARO_STREAMIT_EXPERIMENT_NAME". If this value is not
set, a unique experiment name will be generated automatically. Users
could check the experiment name through the database. Note that when
selecting a topology, users must select a topology with specific geographic
information. Experimental data obtained from topology files without
geographic information cannot be used in the Env-geographic tool.
Users could set the environment variables as in the following example:
.. code-block:: python
os.environ["MARO_STREAMIT_ENABLED"] = "true"
os.environ["MARO_STREAMIT_EXPERIMENT_NAME"] = "my_maro_experiment"
----
To send the experimental data by episode while the experiment is running, users need to import the
package **streamit** with the following code before environment initialization:
.. code-block:: python
# Import package streamit
from maro.streamit import streamit
# Initialize environment and send basic information of experiment to database.
env = Env(scenario="cim", topology="global_trade.22p_l0.1",
start_tick=0, durations=100)
for ep in range(EPISODE_NUMBER):
# Send experimental data to database by episode.
streamit.episode(ep)
----
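As a fuller (hypothetical) sketch built only from the calls shown above, the snippet below wraps a plain roll-out loop
with ``streamit.episode`` so that each episode's data is tagged before it is sent; the do-nothing policy is a
placeholder, not part of the tool.

.. code-block:: python

    from maro.simulator import Env
    from maro.streamit import streamit

    env = Env(scenario="cim", topology="global_trade.22p_l0.1", start_tick=0, durations=100)

    for ep in range(5):
        streamit.episode(ep)   # tag the data generated during this episode
        env.reset()
        metrics, decision_event, is_done = env.step(None)
        while not is_done:
            # Placeholder policy: take no action at every decision point.
            metrics, decision_event, is_done = env.step(None)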
For the complete reference, please view the file maro/examples/hello_world/cim/hello.py.
After starting the experiment, users need to query its name in the local database to make sure
the experimental data was sent successfully.
Start service
~~~~~~~~~~~~~
To start the front-end and back-end services, users need to specify the experiment name.
Users could specify the port by adding the parameter "front_end_port", as in the following
command:
.. code-block:: sh
maro inspector geo --start service --experiment_name YOUR_EXPERIMENT_NAME --front_end_port 8080
----
The program will automatically determine whether to use real-time mode
or local mode according to the data status of the current experiment.
Feature List
------------
For the convenience of users, the Env-geographic tool implements several features
so that users can freely view experimental data.
Real-time mode and local mode
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Local mode
^^^^^^^^^^
In this mode, users could comprehend the experimental data through the geographic
information and the charts on both sides. By clicking the play button in the lower
left corner of the page, users could view the dynamic changes of the data in the
selected time window. By hovering over geographic items and charts, more detailed
information is displayed.
.. image:: ../images/visualization/geographic/local_mode.gif
:alt: local_mode
The chart on the right side of the page shows the changes in the data over
a period of time from the perspectives of overall, port, and vessel.
.. image:: ../images/visualization/geographic/local_mode_right_chart.gif
:alt: local_mode_right_chart
The chart on the left side of the page shows the ranking of the carrying
capacity of each port and the change in carrying capacity between ports
in the entire time window.
.. image:: ../images/visualization/geographic/local_mode_left_chart.gif
:alt: local_mode_left_chart
Real-time mode
^^^^^^^^^^^^^^
Real-time mode offers much the same features as local mode.
What distinguishes real-time mode is the data: the automatic playback
speed of the progress bar on the front-end page closely follows the speed at which
the experimental data is produced, so users cannot freely select the time window in
this mode.
Besides, users could switch modes by clicking. If users choose to view the
local data while in real-time mode, the experimental data generated so far is
displayed.
.. image:: ../images/visualization/geographic/real_time_mode.gif
:alt: real_time_mode
Geographic data display
~~~~~~~~~~~~~~~~~~~~~~~
In the map on the page, users can view the specific status of different resource
holders at various times. Users can further inspect a specific area by zooming the map.
The three port statuses, Surplus, Deficit and Balance, represent the quantitative
relationship between the empty container volume and the received order volume of the
corresponding port at that time.
.. image:: ../images/visualization/geographic/geographic_data_display.gif
:alt: geographic_data_display
Data chart display
~~~~~~~~~~~~~~~~~~
The ranking table on the right side of the page shows the throughput of routes and
ports over a period of time, while the heat-map shows the throughput between ports
over a period of time. Users can hover over specific elements to view the data.
The chart on the left shows the order volume and empty container information of each
port and each vessel. Users can view the data of different resource holders by switching options.
In addition, users can zoom the chart to display information more clearly.
.. image:: ../images/visualization/geographic/data_chart_display.gif
:alt: data_chart_display
Time window selection
~~~~~~~~~~~~~~~~~~~~~
This feature is only available in local mode. Users can slide to select the left starting
point of the time window and view the specific data at different times.
In addition, users can freely choose the end of the time window. During playback, the tool
loops within the time window selected by the user.
.. image:: ../images/visualization/geographic/time_window_selection.gif
:alt: time_window_selection


@ -1,4 +1,3 @@
Distributed Orchestration
=========================
@ -7,20 +6,20 @@ on cloud computing service like `Azure <https://azure.microsoft.com/en-us/>`_.
These CLI commands can also be used to schedule the training jobs with the
specified resource requirements. In MARO, all training job related components
are dockerized for easy deployment and resource allocation. It provides a unified
abstraction/interface for different orchestration framework
(e.g. `Grass <#id3>`_\ , `Kubernetes <#id4>`_\ ).
abstraction/interface for different orchestration frameworks
(e.g. :ref:`Grass`, :ref:`K8s`).
.. image:: ../images/distributed/orch_overview.svg
:target: ../images/distributed/orch_overview.svg
:alt: Orchestration Overview
:width: 600
:width: 650
Process
-------
The process mode is part of the `MARO CLI`, which uses multi-processes to start the
training jobs in the localhost environment. To align with `Grass <#id3>`_ and `Kubernetes
<#id4>`_, the process mode also uses Redis for job management. The process mode tries
training jobs in the localhost environment. To align with :ref:`Grass` and :ref:`K8s`,
the process mode also uses Redis for job management. The process mode tries
to simulate the operation of the real distributed cluster in localhost so that users can smoothly
deploy their code to the distributed cluster. Meanwhile, through the training in the process mode,
it is a cheaper way to find bugs that would happen during real distributed training.
@ -44,59 +43,118 @@ to get how to use it.
.. image:: ../images/distributed/orch_process.svg
:target: ../images/distributed/orch_process.svg
:alt: Orchestration Process Mode on Local
:width: 300
:width: 250
.. _grass:
Grass
-----
Grass is a self-designed, development purpose orchestration framework. It can be
Grass is an orchestration framework developed by the MARO team. It can be
confidently applied to small/middle-size clusters (< 200 nodes). The design goal
of Grass is to speed up the distributed algorithm prototype development.
of Grass is to speed up the development of distributed algorithm prototypes.
It has the following advantages:
* Fast deployment in a small cluster.
* Fine-grained resource management.
* Lightweight, no other dependencies are required.
* Lightweight, no complex dependencies required.
In the Grass mode:
* All VMs will be deployed in the same virtual network for a faster, more stable
connection and larger bandwidth. Please note that the maximum number of VMs is
limited by the `available dedicated IP addresses <https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-faq#what-address-ranges-can-i-use-in-my-vnets>`_.
* It is a centralized topology, the master node will host Redis service for peer
discovering, Fluentd service for log collecting, SMB service for file sharing.
* On each VM, the probe (worker) agent is used to track the computing resources
and detect abnormal events.
Check `Grass Cluster Provisioning on Azure <../installation/grass_cluster_provisioning_on_azure.html>`_
Check :ref:`Grass Cluster Provisioning on Azure <grass-azure-cluster-provisioning>` and
:ref:`Grass Cluster Provisioning in On-Premises Environment <grass-on-premises-cluster-provisioning>`
to get how to use it.
Modes
^^^^^
We currently have two modes in Grass, and you can choose whichever you want to create a Grass cluster.
**grass/azure**
* Create a Grass cluster with Azure.
* With a valid Azure subscription, you can create a cluster with one command from ground zero.
* You can easily scale up/down nodes as needed,
and start/stop nodes to save costs without messing up the current environment.
* Please note that the maximum number of VMs in grass/azure is limited by the
`available dedicated IP addresses <https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-faq#what-address-ranges-can-i-use-in-my-vnets>`_.
**grass/on-premises**
* Create a Grass cluster with machines on hand.
* You can join a machine to the cluster if the machine is in the same private network as the Master.
Components
^^^^^^^^^^
Here's the diagram of a Grass cluster with all the components tied together.
.. image:: ../images/distributed/orch_grass.svg
:target: ../images/distributed/orch_grass.svg
:alt: Orchestration Grass Mode in Azure
:width: 600
:width: 650
Kubernetes
----------
|
Master Components
* redis: A centralized DB for runtime data storage.
* fluentd: A centralized data collector for log collecting.
* samba-server: For file sharing within the whole cluster.
* master-agent: A daemon service for status monitoring and job scheduling.
* master-api-server: A RESTFul server for cluster management.
The MARO CLI can access this server to control the cluster and get cluster information over an encrypted session.
Node Components
* samba-client: For file sharing.
* node-agent: A daemon service for tracking the computing resources and container statuses of the node.
* node-api-server: An internal RESTFul server for node management.
Communications
^^^^^^^^^^^^^^
Outer Environment to the Master
* The communication from the outer environment to the Master is encrypted.
* Grass will use the following paths in the OuterEnv-Master communications:
* SSH tunnel: For file transfer and script execution.
* HTTP connection: For connection with master-api-server, use RSA+AES hybrid encryption.
Communications within the Cluster
* The communication within the cluster is not encrypted.
* Therefore, it is the user's responsibility to make sure all Nodes are connected within a private network and
restrict external connections in the cluster.
.. _k8s:
K8s
---
MARO also supports Kubernetes (k8s) as an orchestration option.
With this widely used framework, you can easily build up your training cluster
With this widely adopted framework, you can easily build up your MARO Cluster
with hundreds or even thousands of nodes. It has the following advantages:
* Higher durability.
* Better scalability.
In the Kubernetes mode:
* The dockerized job component runs in Kubernetes pod, and each pod only hosts
one component.
* All Kubernetes pods are registered into the same virtual network using
`Container Network Interface(CNI) <https://github.com/containernetworking/cni>`_.
Check `K8S Cluster Provisioning on Azure <../installation/k8s_cluster_provisioning_on_azure.html>`_
to get how to use it.
We currently support the k8s/aks mode in Kubernetes, and it has the following features:
.. image:: ../images/distributed/orch_k8s.svg
:target: ../images/distributed/orch_k8s.svg
:alt: Orchestration K8S Mode in Azure
:width: 600
:width: 650
|
* The dockerized job component runs in Kubernetes Pod, and each Pod only hosts one component.
* All Kubernetes Pods are registered into the same virtual network using
`Container Network Interface(CNI) <https://github.com/containernetworking/cni>`_.
* Azure File Service is used for file sharing in all Pods.
* Azure Container Registry is included for image management.
Check :ref:`K8S Cluster Provisioning on Azure <k8s-aks-cluster-provisioning>`
to see how to use it.


@ -2,112 +2,22 @@
RL Toolkit
==========
MARO provides a full-stack abstraction for reinforcement learning (RL), which
empowers users to easily apply predefined and customized components to different
scenarios in a scalable way. The main abstractions include
`Learner, Actor <#learner-and-actor>`_\ , `Agent Manager <#agent-manager>`_\ ,
`Agent <#agent>`_\ , `Algorithm <#algorithm>`_\ ,
`State Shaper, Action Shaper, Experience Shaper <#shapers>`_\ , etc.
MARO provides a full-stack abstraction for reinforcement learning (RL), which enables users to
apply predefined and customized components to various scenarios. The main abstractions include
fundamental components such as `Agent <#agent>`_ and `Shaper <#shaper>`_\ , and training routine
controllers such as `Actor <#actor>`_ and `Learner <#learner>`_.
Learner and Actor
-----------------
.. image:: ../images/rl/overview.svg
:target: ../images/rl/overview.svg
:alt: RL Overview
* **Learner** is the abstraction of the learnable policy. It is responsible for
learning a qualified policy to improve the business optimized object.
.. code-block:: python
# Train function of learner.
def learn(self):
for exploration_params in self._scheduler:
performance, exp_by_agent = self._actor.roll_out(
self._agent_manager.dump_models(),
exploration_params=exploration_params
)
self._scheduler.record_performance(performance)
self._agent_manager.train(exp_by_agent)
* **Actor** is the abstraction of experience collection. It is responsible for
interacting with the environment and collecting experiences. The experiences
collected during interaction will be used for the training of the learners.
.. code-block:: python
# Rollout function of actor.
def roll_out(self, model_dict=None, exploration_params=None, return_details: bool = True):
self._env.reset()
# load models
if model_dict is not None:
self._agents.load_models(model_dict)
# load exploration parameters:
if exploration_params is not None:
self._agents.set_exploration_params(exploration_params)
metrics, decision_event, is_done = self._env.step(None)
while not is_done:
action = self._agents.choose_action(decision_event, self._env.snapshot_list)
metrics, decision_event, is_done = self._env.step(action)
self._agents.on_env_feedback(metrics)
details = self._agents.post_process(self._env.snapshot_list) if return_details else None
return self._env.metrics, details
Scheduler
---------
A ``Scheduler`` is the driver of an episodic learning process. The learner uses the scheduler to repeat the
rollout-training cycle for a set number of episodes. For algorithms that require explicit exploration (e.g.,
DQN and DDPG), there are two types of schedules that a learner may follow:
* Static schedule, where the exploration parameters are generated using a pre-defined function of episode
number. See ``LinearParameterScheduler`` and ``TwoPhaseLinearParameterScheduler`` provided in the toolkit
for example.
* Dynamic schedule, where the exploration parameters for the next episode are determined based on the performance
history. Such a mechanism is possible in our abstraction because the scheduler provides a ``record_performance``
interface that allows it to keep track of roll-out performances.
Optionally, an early stopping checker may be registered if one wishes to terminate training when certain performance
requirements are satisfied, possibly before reaching the prescribed number of episodes.
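As an illustration of the dynamic option, the sketch below is a purely hypothetical schedule (not one of the toolkit's
built-in schedulers) that uses the recorded roll-out performance to decay ``epsilon`` faster whenever performance
reaches a new best:

.. code-block:: python

    class PerformanceAwareEpsilonSchedule:
        """Hypothetical dynamic schedule: decay epsilon faster while performance keeps improving."""
        def __init__(self, start: float = 0.4, decay: float = 0.95, fast_decay: float = 0.85):
            self.epsilon = start
            self.decay, self.fast_decay = decay, fast_decay
            self._best_performance = float("-inf")

        def record_performance(self, performance: float):
            # Choose the decay rate for the next episode based on the roll-out performance history.
            rate = self.fast_decay if performance > self._best_performance else self.decay
            self.epsilon *= rate
            self._best_performance = max(self._best_performance, performance)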
Agent Manager
-------------
The agent manager provides a unified interactive interface with the environment
for RL agent(s). From the actor's perspective, it isolates the complex dependencies
of the various homogeneous/heterogeneous agents, so that the whole agent manager
will behave just like a single agent. Furthermore, to well serve the distributed algorithm
(scalable), the agent manager provides two kinds of working modes, which can be applied in
different distributed components, such as inference mode in actor, training mode in learner.
.. image:: ../images/rl/agent_manager.svg
:target: ../images/rl/agent_manager.svg
:alt: Agent Manager
:width: 750
* In **inference mode**\ , the agent manager is responsible to access and shape
the environment state for the related agent, convert the model action to an
executable environment action, and finally generate experiences from the
interaction trajectory.
* In **training mode**\ , the agent manager will optimize the underlying model of
the related agent(s), based on the experiences collected in inference mode.
Agent
-----
An agent is a combination of (RL) algorithm, experience pool, and a set of
non-algorithm-specific parameters (algorithm-specific parameters are managed by
the algorithm module). Non-algorithm-specific parameters are used to manage
experience storage, sampling strategies, and training strategies. Since all kinds
of scenario-specific stuff will be handled by the agent manager, the agent is
scenario agnostic.
The Agent is the kernel abstraction of the RL formulation for a real-world problem.
Our abstraction decouples agent and its underlying model so that an agent can exist
as an RL paradigm independent of the inner workings of the models it uses to generate
actions or estimate values. For example, the actor-critic algorithm does not need to
concern itself with the structures and optimizing schemes of the actor and critic models.
This decoupling is achieved by the Core Model abstraction described below.
.. image:: ../images/rl/agent.svg
:target: ../images/rl/agent.svg
:alt: Agent
.. code-block:: python
class AbsAgent(ABC):
    def __init__(self, model: AbsCoreModel, config, experience_pool=None):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = model.to(self.device)
        self.config = config
        self._experience_pool = experience_pool
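As a rough, purely illustrative sketch (the names ``MyValueAgent``, ``sample``, ``min_experiences_to_train``, ``batch_size`` and ``_loss_func`` are hypothetical placeholders, not the toolkit's API), a value-based agent built on this base class might use these attributes as follows:
.. code-block:: python
class MyValueAgent(AbsAgent):
    def choose_action(self, state):
        # Query the underlying core model for action values and act greedily.
        return self.model(state, training=False).argmax(dim=1).data
    def train(self):
        # Sample from the experience pool and take one gradient step; the sampling
        # call, config fields and loss function below are placeholders.
        if len(self._experience_pool) < self.config["min_experiences_to_train"]:
            return
        batch = self._experience_pool.sample(self.config["batch_size"])
        self.model.learn(self._loss_func(batch))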
Core Model
----------
MARO provides an abstraction for the underlying models used by agents to form policies and estimate values.
The abstraction consists of ``AbsBlock`` and ``AbsCoreModel``, both of which subclass torch's nn.Module.
The ``AbsBlock`` represents the smallest structural unit of an NN-based model. For instance, the ``FullyConnectedBlock``
provided in the toolkit is a stack of fully connected layers with features like batch normalization,
drop-out and skip connection. The ``AbsCoreModel`` is a collection of network components with
embedded optimizers and serves as an agent's "brain" by providing a unified interface to it, regardless
of how many individual models it requires and how complex the model architecture might be.
.. image:: ../images/rl/learning_model.svg
:target: ../images/rl/learning_model.svg
:alt: Algorithm
:width: 650
As an example, the initialization of the actor-critic algorithm may look like this:
.. code-block:: python
actor_stack = FullyConnectedBlock(...)
critic_stack = FullyConnectedBlock(...)
model = SimpleMultiHeadModel(
    {"actor": actor_stack, "critic": critic_stack},
    optim_option={
        "actor": OptimOption(optim_cls=Adam, optim_params={"lr": 0.001}),
        "critic": OptimOption(optim_cls=RMSprop, optim_params={"lr": 0.0001})
    }
)
agent = ActorCritic(model, config)
Choosing an action is simply:
.. code-block:: python
model(state, task_name="actor", training=False)
And performing one gradient step is simply:
.. code-block:: python
model.learn(critic_loss + actor_loss)
Explorer
--------
MARO provides an abstraction for exploration in RL. Some RL algorithms such as DQN and DDPG require
explicit exploration governed by a set of parameters. The ``AbsExplorer`` class is designed to cater
to these needs. Simple exploration schemes, such as ``EpsilonGreedyExplorer`` for discrete action space
and ``UniformNoiseExplorer`` and ``GaussianNoiseExplorer`` for continuous action space, are provided in
the toolkit.
As an example, the exploration for DQN may be carried out with the aid of an ``EpsilonGreedyExplorer``:
.. code-block:: python
explorer = EpsilonGreedyExplorer(num_actions=10)
greedy_action = learning_model(state, training=False).argmax(dim=1).data
exploration_action = explorer(greedy_action)
Tools for Training
------------------------------
.. image:: ../images/rl/learner_actor.svg
:target: ../images/rl/learner_actor.svg
:alt: RL Overview
The RL toolkit provides tools that make local and distributed training easy (a condensed local-training example follows this list):
* Learner, the central controller of the learning process, which consists of collecting simulation data from
remote actors and training the agents with them. The training data collection can be done in local or
distributed fashion by loading an ``Actor`` or ``ActorProxy`` instance, respectively.
* Actor, which implements the ``roll_out`` method where the agent interacts with the environment for one
episode. It consists of an environment instance and an agent (a single agent or multiple agents wrapped by
``MultiAgentWrapper``). The class provides the ``as_worker()`` method, which turns it into an event loop where roll-outs
are performed on demand from the learner. In distributed RL, there are typically many actor processes running
simultaneously to parallelize training data collection.
* Actor proxy, which also implements the ``roll_out`` method with the same signature, but manages a set of remote
actors for parallel data collection.
* Trajectory, which is primarily responsible for translating between scenario-specific information and model
input / output. It implements the following methods which are used as callbacks in the actor's roll-out loop:
* ``get_state``, which converts observations of an environment into model input. For example, the observation
may be represented by a multi-level data structure, which gets encoded by a state shaper to a one-dimensional
vector as input to a neural network. The state shaper usually goes hand in hand with the underlying policy
or value models.
* ``get_action``, which provides model output with necessary context so that it can be executed by the
environment simulator.
* ``get_reward``, which computes a reward for a given action.
* ``on_env_feedback``, which defines things to do upon getting feedback from the environment.
* ``on_finish``, which defines things to do upon completion of a roll-out episode.
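Putting these pieces together, a local (single-process) training job looks roughly like the condensed sketch below, which mirrors the CIM actor-critic example included later in this change (``get_ac_agent`` and ``CIMTrajectoryForAC`` are defined in ``examples/cim/ac/main.py``):
.. code-block:: python
from maro.rl import Actor, MultiAgentWrapper, OnPolicyLearner
from maro.simulator import Env
from examples.cim.ac.main import CIMTrajectoryForAC, get_ac_agent
from examples.cim.common import common_config
env = Env(scenario="cim", topology="toy.4p_ssdd_l0.0", durations=1120)
agent = MultiAgentWrapper({name: get_ac_agent() for name in env.agent_idx_list})
actor = Actor(env, agent, CIMTrajectoryForAC, trajectory_kwargs=common_config)  # local roll-outs
learner = OnPolicyLearner(actor, 50)  # number of episodes
learner.run()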


@ -0,0 +1,88 @@
Command support for scenarios
=================================
After installation, MARO provides a command that generates a project skeleton for the user,
making it much easier to use or customize a scenario.
.. code-block:: sh
maro project new
This command launches a step-by-step wizard that creates a new project under the current folder.
It currently supports two modes.
1. Use built-in scenarios
-------------------------
To use a built-in scenario, answer the first option "Use built-in scenario?" with "yes" or "y" (the default is "yes").
You can then select a built-in scenario and topology with auto-completion.
.. code-block:: sh
Use built-in scenario?yes
Scenario name:cim
Use built-in topology (configuration)?yes
Topology name to use:global_trade.22p_l0.0
Durations to emulate:1024
Number of episodes to emulate:500
{'durations': 1024,
'scenario': 'cim',
'topology': 'global_trade.22p_l0.0',
'total_episodes': 500,
'use_builtin_scenario': True,
'use_builtin_topology': True}
Is this OK?yes
If these settings are correct, the command creates a runner.py script, which you can run with:
.. code-block:: sh
python runner.py
This script contains minimal code that interacts with the environment without taking any action; you can then extend it as you wish (a rough sketch of such a script follows below).
You can also create your own topology (configuration) by answering "no" to the option "Use built-in topology (configuration)?".
The wizard will then ask for the name of the new topology and copy the content of the built-in one into your working folder (topologies/your_topology_name/config.yml).
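The generated ``runner.py`` is roughly equivalent to the sketch below (the exact generated content may differ); it simply steps through the environment without supplying any action, using the settings chosen in the wizard:
.. code-block:: python
from maro.simulator import Env
# Settings chosen in the wizard above.
env = Env(scenario="cim", topology="global_trade.22p_l0.0", durations=1024)
for episode in range(500):
    _, _, is_done = env.step(None)
    while not is_done:
        _, _, is_done = env.step(None)  # no action supplied
    env.reset()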
2. Customized scenario
-------------------------------
This mode generates a template for a customized scenario so that you do not have to write it from scratch.
To enable it, answer "no" to the option "Use built-in scenario?", then provide your scenario name (the default is the current folder name).
.. code-block:: sh
Use built-in scenario?no
New scenario name:my_test
New topology name:my_test
Durations to emulate:1000
Number of episodes to emulate:100
{'durations': 1000,
'scenario': 'my_test',
'topology': 'my_test',
'total_episodes': 100,
'use_builtin_scenario': False,
'use_builtin_topology': False}
Is this OK?yes
This will generate a file layout like the following:
.. code-block:: sh
-- runner.py
-- scenario
   -- business_engine.py
   -- common.py
   -- events.py
   -- frame_builder.py
-- topologies
   -- my_test
      -- config.yml
The script "runner.py" is the entry of this project, it will interactive with your scenario without action.
Then you can fill "scenario/business_engine.py" with your own logic.

2
examples/__init__.py Normal file

@ -0,0 +1,2 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

11
examples/cim/README.md Normal file

@ -0,0 +1,11 @@
# Container Inventory Management
Container inventory management (CIM) is a scenario where reinforcement learning (RL) can potentially prove useful. Three algorithms are used to learn the multi-agent policy in given environments. Each algorithm has a ``config`` folder which contains ``agent_config.py`` and ``training_config.py``. The former contains parameters for the underlying models and algorithm specific hyper-parameters. The latter contains parameters for the environment and the main training loop. The file ``common.py`` contains parameters and utility functions shared by some or all of these algorithms.
In the ``ac`` folder, the policy is trained using the Actor-Critic algorithm in single-threaded fashion. The example can be run by simply executing ``python3 main.py``. Logs will be saved in a file named ``cim-ac.CURRENT_TIME_STAMP.log`` under the ``ac/logs`` folder, where ``CURRENT_TIME_STAMP`` is the time at which the script was executed.
In the ``dqn`` folder, the policy is trained using the DQN algorithm in multi-process / distributed mode. This example can be run in three ways.
* ``python3 main.py`` or ``python3 main.py -w 0`` runs the example in multi-process mode, in which a main process spawns one learner process and a number of actor processes as specified in ``config/training_config.py``.
* ``python3 main.py -w 1`` launches the learner process only. This is for distributed training and expects a number of actor processes (as specified in ``config/training_config.py``) running on some other node(s).
* ``python3 main.py -w 2`` launches the actor process only. This is for distributed training and expects a learner process running on some other node.
Logs will be saved in a file named ``GROUP_NAME.log`` under the ``{ac_gnn, dqn}/logs`` folder, where ``GROUP_NAME`` is specified in the "group" field in ``config/training_config.py``.

2
examples/cim/__init__.py Normal file

@ -0,0 +1,2 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.


@ -0,0 +1,7 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from .agent_config import agent_config
from .training_config import training_config
__all__ = ["agent_config", "training_config"]


@ -0,0 +1,52 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from torch import nn
from torch.optim import Adam, RMSprop
from maro.rl import OptimOption
from examples.cim.common import common_config
input_dim = (
(common_config["look_back"] + 1) *
(common_config["max_ports_downstream"] + 1) *
len(common_config["port_attributes"]) +
len(common_config["vessel_attributes"])
)
agent_config = {
"model": {
"actor": {
"input_dim": input_dim,
"output_dim": len(common_config["action_space"]),
"hidden_dims": [256, 128, 64],
"activation": nn.Tanh,
"softmax": True,
"batch_norm": False,
"head": True
},
"critic": {
"input_dim": input_dim,
"output_dim": 1,
"hidden_dims": [256, 128, 64],
"activation": nn.LeakyReLU,
"softmax": False,
"batch_norm": True,
"head": True
}
},
"optimization": {
"actor": OptimOption(optim_cls=Adam, optim_params={"lr": 0.001}),
"critic": OptimOption(optim_cls=RMSprop, optim_params={"lr": 0.001})
},
"hyper_params": {
"reward_discount": .0,
"critic_loss_func": nn.SmoothL1Loss(),
"train_iters": 10,
"actor_loss_coefficient": 0.1,
"k": 1,
"lam": 0.0
# "clip_ratio": 0.8
}
}


@ -0,0 +1,11 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
training_config = {
"env": {
"scenario": "cim",
"topology": "toy.4p_ssdd_l0.0",
"durations": 1120,
},
"max_episode": 50
}

59
examples/cim/ac/main.py Normal file

@ -0,0 +1,59 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from collections import defaultdict, deque
from os import makedirs, system
from os.path import dirname, join, realpath
import numpy as np
from torch import nn
from torch.optim import Adam, RMSprop
from maro.rl import (
Actor, ActorCritic, ActorCriticConfig, FullyConnectedBlock, MultiAgentWrapper, SimpleMultiHeadModel,
Scheduler, OnPolicyLearner
)
from maro.simulator import Env
from maro.utils import Logger, set_seeds
from examples.cim.ac.config import agent_config, training_config
from examples.cim.common import CIMTrajectory, common_config
def get_ac_agent():
actor_net = FullyConnectedBlock(**agent_config["model"]["actor"])
critic_net = FullyConnectedBlock(**agent_config["model"]["critic"])
ac_model = SimpleMultiHeadModel(
{"actor": actor_net, "critic": critic_net}, optim_option=agent_config["optimization"],
)
return ActorCritic(ac_model, ActorCriticConfig(**agent_config["hyper_params"]))
class CIMTrajectoryForAC(CIMTrajectory):
def on_finish(self):
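# Group the recorded trajectory into per-agent training arrays of
# states, actions, log-probabilities and offline rewards.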
training_data = {}
for event, state, action in zip(self.trajectory["event"], self.trajectory["state"], self.trajectory["action"]):
agent_id = list(state.keys())[0]
data = training_data.setdefault(agent_id, {"args": [[] for _ in range(4)]})
data["args"][0].append(state[agent_id]) # state
data["args"][1].append(action[agent_id][0]) # action
data["args"][2].append(action[agent_id][1]) # log_p
data["args"][3].append(self.get_offline_reward(event)) # reward
for agent_id in training_data:
training_data[agent_id]["args"] = [
np.asarray(vals, dtype=np.float32 if i == 3 else None)
for i, vals in enumerate(training_data[agent_id]["args"])
]
return training_data
# Single-threaded launcher
if __name__ == "__main__":
set_seeds(1024) # for reproducibility
env = Env(**training_config["env"])
agent = MultiAgentWrapper({name: get_ac_agent() for name in env.agent_idx_list})
actor = Actor(env, agent, CIMTrajectoryForAC, trajectory_kwargs=common_config) # local actor
learner = OnPolicyLearner(actor, training_config["max_episode"])
learner.run()

99
examples/cim/common.py Normal file

@ -0,0 +1,99 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from collections import defaultdict
import numpy as np
from maro.rl import Trajectory
from maro.simulator.scenarios.cim.common import Action, ActionType
common_config = {
"port_attributes": ["empty", "full", "on_shipper", "on_consignee", "booking", "shortage", "fulfillment"],
"vessel_attributes": ["empty", "full", "remaining_space"],
"action_space": list(np.linspace(-1.0, 1.0, 21)),
# Parameters for computing states
"look_back": 7,
"max_ports_downstream": 2,
# Parameters for computing actions
"finite_vessel_space": True,
"has_early_discharge": True,
# Parameters for computing rewards
"reward_time_window": 99,
"fulfillment_factor": 1.0,
"shortage_factor": 1.0,
"time_decay": 0.97
}
class CIMTrajectory(Trajectory):
def __init__(
self, env, *, port_attributes, vessel_attributes, action_space, look_back, max_ports_downstream,
reward_time_window, fulfillment_factor, shortage_factor, time_decay,
finite_vessel_space=True, has_early_discharge=True
):
super().__init__(env)
self.port_attributes = port_attributes
self.vessel_attributes = vessel_attributes
self.action_space = action_space
self.look_back = look_back
self.max_ports_downstream = max_ports_downstream
self.reward_time_window = reward_time_window
self.fulfillment_factor = fulfillment_factor
self.shortage_factor = shortage_factor
self.time_decay = time_decay
self.finite_vessel_space = finite_vessel_space
self.has_early_discharge = has_early_discharge
def get_state(self, event):
vessel_snapshots, port_snapshots = self.env.snapshot_list["vessels"], self.env.snapshot_list["ports"]
tick, port_idx, vessel_idx = event.tick, event.port_idx, event.vessel_idx
ticks = [tick - rt for rt in range(self.look_back - 1)]
future_port_idx_list = vessel_snapshots[tick: vessel_idx: 'future_stop_list'].astype('int')
port_features = port_snapshots[ticks: [port_idx] + list(future_port_idx_list): self.port_attributes]
vessel_features = vessel_snapshots[tick: vessel_idx: self.vessel_attributes]
return {port_idx: np.concatenate((port_features, vessel_features))}
def get_action(self, action_by_agent, event):
vessel_snapshots = self.env.snapshot_list["vessels"]
action_info = list(action_by_agent.values())[0]
model_action = action_info[0] if isinstance(action_info, tuple) else action_info
scope, tick, port, vessel = event.action_scope, event.tick, event.port_idx, event.vessel_idx
zero_action_idx = len(self.action_space) / 2 # index corresponding to value zero.
vessel_space = vessel_snapshots[tick:vessel:self.vessel_attributes][2] if self.finite_vessel_space else float("inf")
early_discharge = vessel_snapshots[tick:vessel:"early_discharge"][0] if self.has_early_discharge else 0
percent = abs(self.action_space[model_action])
if model_action < zero_action_idx:
action_type = ActionType.LOAD
actual_action = min(round(percent * scope.load), vessel_space)
elif model_action > zero_action_idx:
action_type = ActionType.DISCHARGE
plan_action = percent * (scope.discharge + early_discharge) - early_discharge
actual_action = round(plan_action) if plan_action > 0 else round(percent * scope.discharge)
else:
actual_action, action_type = 0, None
return {port: Action(vessel, port, actual_action, action_type)}
def get_offline_reward(self, event):
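# Offline reward: time-decayed sum, over the next `reward_time_window` ticks, of
# fulfillment_factor * fulfillment - shortage_factor * shortage.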
port_snapshots = self.env.snapshot_list["ports"]
start_tick = event.tick + 1
ticks = list(range(start_tick, start_tick + self.reward_time_window))
future_fulfillment = port_snapshots[ticks::"fulfillment"]
future_shortage = port_snapshots[ticks::"shortage"]
decay_list = [
self.time_decay ** i for i in range(self.reward_time_window)
for _ in range(future_fulfillment.shape[0] // self.reward_time_window)
]
tot_fulfillment = np.dot(future_fulfillment, decay_list)
tot_shortage = np.dot(future_shortage, decay_list)
return np.float32(self.fulfillment_factor * tot_fulfillment - self.shortage_factor * tot_shortage)
def on_env_feedback(self, event, state_by_agent, action_by_agent, reward):
self.trajectory["event"].append(event)
self.trajectory["state"].append(state_by_agent)
self.trajectory["action"].append(action_by_agent)


@ -1,24 +0,0 @@
# Overview
The CIM problem is one of the quintessential use cases of MARO. The example can
be run with a set of scenario configurations that can be found under
maro/simulator/scenarios/cim. General experimental parameters (e.g., type of
topology, type of algorithm to use, number of training episodes) can be configured
through config.yml. Each RL formulation has a dedicated folder, e.g., dqn, and
all algorithm-specific parameters can be configured through
the config.py file in that folder.
## Single-host Single-process Mode
To run the CIM example using the DQN algorithm under single-host mode, go to
examples/cim/dqn and run single_process_launcher.py. You may play around with
the configuration if you want to try out different settings.
## Distributed Mode
The examples/cim/dqn/components folder contains dist_learner.py and dist_actor.py
for distributed training. For debugging purposes, we provide a script that
simulates distributed mode using multi-processing. Simply go to examples/cim/dqn
and execute python3 multi_process_launcher.py \[GROUP_NAME\] \[NUM_ACTORS\], where
GROUP_NAME is the identifier for the current run and NUM_ACTORS is the number of actor
processes to launch.


@ -0,0 +1,2 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.


@ -1,14 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from .action_shaper import CIMActionShaper
from .agent_manager import DQNAgentManager, create_dqn_agents
from .experience_shaper import TruncatedExperienceShaper
from .state_shaper import CIMStateShaper
__all__ = [
"CIMActionShaper",
"DQNAgentManager", "create_dqn_agents",
"TruncatedExperienceShaper",
"CIMStateShaper"
]


@ -1,36 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from maro.rl import ActionShaper
from maro.simulator.scenarios.cim.common import Action
class CIMActionShaper(ActionShaper):
def __init__(self, action_space):
super().__init__()
self._action_space = action_space
self._zero_action_index = action_space.index(0)
def __call__(self, model_action, decision_event, snapshot_list):
scope = decision_event.action_scope
tick = decision_event.tick
port_idx = decision_event.port_idx
vessel_idx = decision_event.vessel_idx
port_empty = snapshot_list["ports"][tick: port_idx: ["empty", "full", "on_shipper", "on_consignee"]][0]
vessel_remaining_space = snapshot_list["vessels"][tick: vessel_idx: ["empty", "full", "remaining_space"]][2]
early_discharge = snapshot_list["vessels"][tick:vessel_idx: "early_discharge"][0]
assert 0 <= model_action < len(self._action_space)
if model_action < self._zero_action_index:
actual_action = max(round(self._action_space[model_action] * port_empty), -vessel_remaining_space)
elif model_action > self._zero_action_index:
plan_action = self._action_space[model_action] * (scope.discharge + early_discharge) - early_discharge
actual_action = (
round(plan_action) if plan_action > 0
else round(self._action_space[model_action] * scope.discharge)
)
else:
actual_action = 0
return Action(vessel_idx, port_idx, actual_action)


@ -1,60 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import os
import pickle
import numpy as np
from maro.rl import AbsAgent, ColumnBasedStore
class DQNAgent(AbsAgent):
"""Implementation of AbsAgent for the DQN algorithm.
Args:
name (str): Agent's name.
algorithm (AbsAlgorithm): A concrete algorithm instance that inherits from AbstractAlgorithm.
experience_pool (AbsStore): It is used to store experiences processed by the experience shaper, which will be
used by some value-based algorithms, such as DQN.
min_experiences_to_train: minimum number of experiences required for training.
num_batches: number of batches to train the DQN model on per call to ``train``.
batch_size: mini-batch size.
"""
def __init__(
self,
name: str,
algorithm,
experience_pool: ColumnBasedStore,
min_experiences_to_train,
num_batches,
batch_size
):
super().__init__(name, algorithm, experience_pool=experience_pool)
self._min_experiences_to_train = min_experiences_to_train
self._num_batches = num_batches
self._batch_size = batch_size
def train(self):
"""Implementation of the training loop for DQN.
Experiences are sampled using their TD errors as weights. After training, the new TD errors are updated
in the experience pool.
"""
if len(self._experience_pool) < self._min_experiences_to_train:
return
for _ in range(self._num_batches):
indexes, sample = self._experience_pool.sample_by_key("loss", self._batch_size)
state = np.asarray(sample["state"])
action = np.asarray(sample["action"])
reward = np.asarray(sample["reward"])
next_state = np.asarray(sample["next_state"])
loss = self._algorithm.train(state, action, reward, next_state)
self._experience_pool.update(indexes, {"loss": loss})
def dump_experience_pool(self, dir_path: str):
"""Dump the experience pool to disk."""
os.makedirs(dir_path, exist_ok=True)
with open(os.path.join(dir_path, self._name), "wb") as fp:
pickle.dump(self._experience_pool, fp)


@ -1,57 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import torch.nn as nn
from torch.optim import RMSprop
from maro.rl import (
ColumnBasedStore, DQN, DQNConfig, FullyConnectedBlock, LearningModel, NNStack, OptimizerOptions,
SimpleAgentManager
)
from maro.utils import set_seeds
from .agent import DQNAgent
def create_dqn_agents(agent_id_list, config):
num_actions = config.algorithm.num_actions
set_seeds(config.seed)
agent_dict = {}
for agent_id in agent_id_list:
q_net = NNStack(
"q_value",
FullyConnectedBlock(
input_dim=config.algorithm.input_dim,
output_dim=num_actions,
activation=nn.LeakyReLU,
is_head=True,
**config.algorithm.model
)
)
learning_model = LearningModel(
q_net,
optimizer_options=OptimizerOptions(cls=RMSprop, params=config.algorithm.optimizer)
)
algorithm = DQN(
learning_model,
DQNConfig(**config.algorithm.hyper_params, loss_cls=nn.SmoothL1Loss)
)
agent_dict[agent_id] = DQNAgent(
agent_id, algorithm, ColumnBasedStore(**config.experience_pool),
**config.training_loop_parameters
)
return agent_dict
class DQNAgentManager(SimpleAgentManager):
def train(self, experiences_by_agent, performance=None):
self._assert_train_mode()
# store experiences for each agent
for agent_id, exp in experiences_by_agent.items():
exp.update({"loss": [1e8] * len(list(exp.values())[0])})
self.agent_dict[agent_id].store_experiences(exp)
for agent in self.agent_dict.values():
agent.train()


@ -1,20 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""
This file is used to load the configuration and convert it into a dotted dictionary.
"""
import io
import os
import yaml
CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../config.yml")
with io.open(CONFIG_PATH, "r") as in_file:
config = yaml.safe_load(in_file)
DISTRIBUTED_CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../distributed_config.yml")
with io.open(DISTRIBUTED_CONFIG_PATH, "r") as in_file:
distributed_config = yaml.safe_load(in_file)


@ -1,52 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from collections import defaultdict
import numpy as np
from maro.rl import ExperienceShaper
class TruncatedExperienceShaper(ExperienceShaper):
def __init__(
self, *, time_window: int, time_decay_factor: float, fulfillment_factor: float, shortage_factor: float
):
super().__init__(reward_func=None)
self._time_window = time_window
self._time_decay_factor = time_decay_factor
self._fulfillment_factor = fulfillment_factor
self._shortage_factor = shortage_factor
def __call__(self, trajectory, snapshot_list):
experiences_by_agent = {}
for i in range(len(trajectory) - 1):
transition = trajectory[i]
agent_id = transition["agent_id"]
if agent_id not in experiences_by_agent:
experiences_by_agent[agent_id] = defaultdict(list)
experiences = experiences_by_agent[agent_id]
experiences["state"].append(transition["state"])
experiences["action"].append(transition["action"])
experiences["reward"].append(self._compute_reward(transition["event"], snapshot_list))
experiences["next_state"].append(trajectory[i + 1]["state"])
return experiences_by_agent
def _compute_reward(self, decision_event, snapshot_list):
start_tick = decision_event.tick + 1
end_tick = decision_event.tick + self._time_window
ticks = list(range(start_tick, end_tick))
# calculate tc reward
future_fulfillment = snapshot_list["ports"][ticks::"fulfillment"]
future_shortage = snapshot_list["ports"][ticks::"shortage"]
decay_list = [
self._time_decay_factor ** i for i in range(end_tick - start_tick)
for _ in range(future_fulfillment.shape[0] // (end_tick - start_tick))
]
tot_fulfillment = np.dot(future_fulfillment, decay_list)
tot_shortage = np.dot(future_shortage, decay_list)
return np.float32(self._fulfillment_factor * tot_fulfillment - self._shortage_factor * tot_shortage)


@ -1,30 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import numpy as np
from maro.rl import StateShaper
PORT_ATTRIBUTES = ["empty", "full", "on_shipper", "on_consignee", "booking", "shortage", "fulfillment"]
VESSEL_ATTRIBUTES = ["empty", "full", "remaining_space"]
class CIMStateShaper(StateShaper):
def __init__(self, *, look_back, max_ports_downstream):
super().__init__()
self._look_back = look_back
self._max_ports_downstream = max_ports_downstream
self._dim = (look_back + 1) * (max_ports_downstream + 1) * len(PORT_ATTRIBUTES) + len(VESSEL_ATTRIBUTES)
def __call__(self, decision_event, snapshot_list):
tick, port_idx, vessel_idx = decision_event.tick, decision_event.port_idx, decision_event.vessel_idx
ticks = [tick - rt for rt in range(self._look_back - 1)]
future_port_idx_list = snapshot_list["vessels"][tick: vessel_idx: 'future_stop_list'].astype('int')
port_features = snapshot_list["ports"][ticks: [port_idx] + list(future_port_idx_list): PORT_ATTRIBUTES]
vessel_features = snapshot_list["vessels"][tick: vessel_idx: VESSEL_ATTRIBUTES]
state = np.concatenate((port_features, vessel_features))
return str(port_idx), state
@property
def dim(self):
return self._dim


@ -1,48 +0,0 @@
env:
scenario: "cim"
topology: "toy.4p_ssdd_l0.0"
durations: 1120
state_shaping:
look_back: 7
max_ports_downstream: 2
experience_shaping:
time_window: 100
fulfillment_factor: 1.0
shortage_factor: 1.0
time_decay_factor: 0.97
main_loop:
max_episode: 500
exploration:
parameter_names:
- "epsilon"
split_ep: 250
start_values: 0.4
mid_values: 0.32
end_values: 0.0
agents:
algorithm:
num_actions: 21
model:
hidden_dims:
- 256
- 128
- 64
softmax_enabled: false
batch_norm_enabled: true
skip_connection_enabled: false
dropout_p: 0.0
optimizer:
lr: 0.05
hyper_params:
reward_discount: .0
target_update_frequency: 5
tau: 0.1
is_double: true
per_sample_td_error_enabled: true
experience_pool:
capacity: -1
training_loop_parameters:
min_experiences_to_train: 1024
num_batches: 10
batch_size: 128
seed: 32 # for reproducibility


@ -0,0 +1,7 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from .agent_config import agent_config
from .training_config import training_config
__all__ = ["agent_config", "training_config"]


@ -0,0 +1,38 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from torch import nn
from torch.optim import RMSprop
from maro.rl import DQN, DQNConfig, FullyConnectedBlock, OptimOption, PolicyGradient, SimpleMultiHeadModel
from examples.cim.common import common_config
input_dim = (
(common_config["look_back"] + 1) *
(common_config["max_ports_downstream"] + 1) *
len(common_config["port_attributes"]) +
len(common_config["vessel_attributes"])
)
agent_config = {
"model": {
"input_dim": input_dim,
"output_dim": len(common_config["action_space"]), # number of possible actions
"hidden_dims": [256, 128, 64],
"activation": nn.LeakyReLU,
"softmax": False,
"batch_norm": True,
"skip_connection": False,
"head": True,
"dropout_p": 0.0
},
"optimization": OptimOption(optim_cls=RMSprop, optim_params={"lr": 0.05}),
"hyper_params": {
"reward_discount": .0,
"loss_cls": nn.SmoothL1Loss,
"target_update_freq": 5,
"tau": 0.1,
"double": False
}
}


@ -0,0 +1,29 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
training_config = {
"env": {
"scenario": "cim",
"topology": "toy.4p_ssdd_l0.0",
"durations": 1120,
},
"max_episode": 100,
"exploration": {
"parameter_names": ["epsilon"],
"split": 0.5,
"start": 0.4,
"mid": 0.32,
"end": 0.0
},
"training": {
"min_experiences_to_train": 1024,
"train_iter": 10,
"batch_size": 128,
"prioritized_sampling_by_loss": True
},
"group": "cim-dqn",
"learner_update_trigger": 2,
"num_actors": 2,
"num_trainers": 4,
"trainer_id": 0
}


@ -1,49 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import os
import numpy as np
from maro.rl import ActorWorker, AgentManagerMode, SimpleActor
from maro.simulator import Env
from maro.utils import convert_dottable
from components import CIMActionShaper, CIMStateShaper, DQNAgentManager, TruncatedExperienceShaper, create_dqn_agents
def launch(config, distributed_config):
config = convert_dottable(config)
distributed_config = convert_dottable(distributed_config)
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
state_shaper = CIMStateShaper(**config.env.state_shaping)
action_shaper = CIMActionShaper(action_space=list(np.linspace(-1.0, 1.0, config.agents.algorithm.num_actions)))
experience_shaper = TruncatedExperienceShaper(**config.env.experience_shaping)
config["agents"]["algorithm"]["input_dim"] = state_shaper.dim
agent_manager = DQNAgentManager(
name="cim_actor",
mode=AgentManagerMode.INFERENCE,
agent_dict=create_dqn_agents(agent_id_list, config.agents),
state_shaper=state_shaper,
action_shaper=action_shaper,
experience_shaper=experience_shaper
)
proxy_params = {
"group_name": os.environ["GROUP"] if "GROUP" in os.environ else distributed_config.group,
"expected_peers": {"learner": 1},
"redis_address": (distributed_config.redis.hostname, distributed_config.redis.port),
"max_retries": 15
}
actor_worker = ActorWorker(
local_actor=SimpleActor(env=env, agent_manager=agent_manager),
proxy_params=proxy_params
)
actor_worker.launch()
if __name__ == "__main__":
from components.config import config, distributed_config
launch(config=config, distributed_config=distributed_config)


@ -1,51 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import os
from maro.rl import (
ActorProxy, AgentManagerMode, SimpleLearner, TwoPhaseLinearParameterScheduler, concat_experiences_by_agent
)
from maro.simulator import Env
from maro.utils import Logger, convert_dottable
from components import CIMStateShaper, DQNAgentManager, create_dqn_agents
def launch(config, distributed_config):
config = convert_dottable(config)
distributed_config = convert_dottable(distributed_config)
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
config["agents"]["algorithm"]["input_dim"] = CIMStateShaper(**config.env.state_shaping).dim
agent_manager = DQNAgentManager(
name="cim_learner",
mode=AgentManagerMode.TRAIN,
agent_dict=create_dqn_agents(agent_id_list, config.agents)
)
proxy_params = {
"group_name": os.environ["GROUP"] if "GROUP" in os.environ else distributed_config.group,
"expected_peers": {
"actor": int(os.environ["NUM_ACTORS"] if "NUM_ACTORS" in os.environ else distributed_config.num_actors)
},
"redis_address": (distributed_config.redis.hostname, distributed_config.redis.port),
"max_retries": 15
}
learner = SimpleLearner(
agent_manager=agent_manager,
actor=ActorProxy(proxy_params=proxy_params, experience_collecting_func=concat_experiences_by_agent),
scheduler=TwoPhaseLinearParameterScheduler(config.main_loop.max_episode, **config.main_loop.exploration),
logger=Logger("cim_learner", auto_timestamp=False)
)
learner.learn()
learner.test()
learner.dump_models(os.path.join(os.getcwd(), "models"))
learner.exit()
if __name__ == "__main__":
from components.config import config, distributed_config
launch(config=config, distributed_config=distributed_config)


@ -1,6 +0,0 @@
redis:
hostname: "localhost"
port: 6379
group: test_group
num_actors: 1
num_learners: 1

87
examples/cim/dqn/main.py Normal file

@ -0,0 +1,87 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import argparse
import time
from collections import defaultdict
from multiprocessing import Process
from os import makedirs
from os.path import dirname, join, realpath
from maro.rl import (
Actor, ActorProxy, DQN, DQNConfig, FullyConnectedBlock, MultiAgentWrapper, OffPolicyLearner,
SimpleMultiHeadModel, TwoPhaseLinearParameterScheduler
)
from maro.simulator import Env
from maro.utils import Logger, set_seeds
from examples.cim.common import CIMTrajectory, common_config
from examples.cim.dqn.config import agent_config, training_config
def get_dqn_agent():
q_model = SimpleMultiHeadModel(
FullyConnectedBlock(**agent_config["model"]), optim_option=agent_config["optimization"]
)
return DQN(q_model, DQNConfig(**agent_config["hyper_params"]))
class CIMTrajectoryForDQN(CIMTrajectory):
def on_finish(self):
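# Assemble per-agent (state, action, reward, next-state) experiences from the recorded trajectory.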
exp_by_agent = defaultdict(lambda: defaultdict(list))
for i in range(len(self.trajectory["state"]) - 1):
agent_id = list(self.trajectory["state"][i].keys())[0]
exp = exp_by_agent[agent_id]
exp["S"].append(self.trajectory["state"][i][agent_id])
exp["A"].append(self.trajectory["action"][i][agent_id])
exp["R"].append(self.get_offline_reward(self.trajectory["event"][i]))
exp["S_"].append(list(self.trajectory["state"][i + 1].values())[0])
return dict(exp_by_agent)
def cim_dqn_learner():
env = Env(**training_config["env"])
agent = MultiAgentWrapper({name: get_dqn_agent() for name in env.agent_idx_list})
scheduler = TwoPhaseLinearParameterScheduler(training_config["max_episode"], **training_config["exploration"])
actor = ActorProxy(
training_config["group"], training_config["num_actors"],
update_trigger=training_config["learner_update_trigger"]
)
learner = OffPolicyLearner(actor, scheduler, agent, **training_config["training"])
learner.run()
def cim_dqn_actor():
env = Env(**training_config["env"])
agent = MultiAgentWrapper({name: get_dqn_agent() for name in env.agent_idx_list})
actor = Actor(env, agent, CIMTrajectoryForDQN, trajectory_kwargs=common_config)
actor.as_worker(training_config["group"])
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"-w", "--whoami", type=int, choices=[0, 1, 2], default=0,
help="Identity of this process: 0 - multi-process mode, 1 - learner, 2 - actor"
)
args = parser.parse_args()
if args.whoami == 0:
actor_processes = [Process(target=cim_dqn_actor) for _ in range(training_config["num_actors"])]
learner_process = Process(target=cim_dqn_learner)
for i, actor_process in enumerate(actor_processes):
set_seeds(i) # this is to ensure that the actors explore differently.
actor_process.start()
learner_process.start()
for actor_process in actor_processes:
actor_process.join()
learner_process.join()
elif args.whoami == 1:
cim_dqn_learner()
elif args.whoami == 2:
cim_dqn_actor()


@ -1,25 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""
This script is used to debug distributed algorithm in single host multi-process mode.
"""
import argparse
import os
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("group_name", help="group name")
parser.add_argument("num_actors", type=int, help="number of actors")
args = parser.parse_args()
learner_path = f"{os.path.split(os.path.realpath(__file__))[0]}/dist_learner.py &"
actor_path = f"{os.path.split(os.path.realpath(__file__))[0]}/dist_actor.py &"
# Launch the learner process
os.system(f"GROUP={args.group_name} NUM_ACTORS={args.num_actors} python " + learner_path)
# Launch the actor processes
for _ in range(args.num_actors):
os.system(f"GROUP={args.group_name} python " + actor_path)


@ -1,53 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import os
import numpy as np
from maro.rl import AgentManagerMode, SimpleActor, SimpleLearner, TwoPhaseLinearParameterScheduler
from maro.simulator import Env
from maro.utils import LogFormat, Logger, convert_dottable
from components import CIMActionShaper, CIMStateShaper, DQNAgentManager, TruncatedExperienceShaper, create_dqn_agents
def launch(config):
config = convert_dottable(config)
# Step 1: Initialize a CIM environment for using a toy dataset.
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
action_space = list(np.linspace(-1.0, 1.0, config.agents.algorithm.num_actions))
# Step 2: Create state, action and experience shapers. We also need to create an explorer here due to the
# greedy nature of the DQN algorithm.
state_shaper = CIMStateShaper(**config.env.state_shaping)
action_shaper = CIMActionShaper(action_space=action_space)
experience_shaper = TruncatedExperienceShaper(**config.env.experience_shaping)
# Step 3: Create agents and an agent manager.
config["agents"]["algorithm"]["input_dim"] = state_shaper.dim
agent_manager = DQNAgentManager(
name="cim_learner",
mode=AgentManagerMode.TRAIN_INFERENCE,
agent_dict=create_dqn_agents(agent_id_list, config.agents),
state_shaper=state_shaper,
action_shaper=action_shaper,
experience_shaper=experience_shaper
)
# Step 4: Create an actor and a learner to start the training process.
scheduler = TwoPhaseLinearParameterScheduler(config.main_loop.max_episode, **config.main_loop.exploration)
actor = SimpleActor(env, agent_manager)
learner = SimpleLearner(
agent_manager, actor, scheduler,
logger=Logger("cim_learner", format_=LogFormat.simple, auto_timestamp=False)
)
learner.learn()
learner.test()
learner.dump_models(os.path.join(os.getcwd(), "models"))
if __name__ == "__main__":
from components.config import config
launch(config)


@ -1,13 +0,0 @@
from .actor import ParallelActor
from .agent_manager import SimpleAgentManger
from .learner import GNNLearner
from .state_shaper import GNNStateShaper
from .utils import decision_cnt_analysis, load_config, return_scaler, save_code, save_config
__all__ = [
"ParallelActor",
"SimpleAgentManger",
"GNNLearner",
"GNNStateShaper",
"decision_cnt_analysis", "load_config", "return_scaler", "save_code", "save_config"
]


@ -1,37 +0,0 @@
from maro.rl import ActionShaper
class DiscreteActionShaper(ActionShaper):
"""The shaping class to transform the action in [-1, 1] to actual repositioning function."""
def __init__(self, action_dim):
super().__init__()
self._action_dim = action_dim
self._zero_action = self._action_dim // 2
def __call__(self, decision_event, model_action):
"""Shaping the action in [-1,1] range to the actual repositioning function.
This function maps integer model action within the range of [-A, A] to actual action. We define negative actual
action as discharge resource from vessel to port and positive action as upload from port to vessel, so the
upper bound and lower bound of actual action are the resource in dynamic and static node respectively.
Args:
decision_event (Event): The decision event from the environment.
model_action (int): Output action, range A means the half of the agent output dim.
"""
env_action = 0
model_action -= self._zero_action
action_scope = decision_event.action_scope
if model_action < 0:
# Discharge resource from dynamic node.
env_action = round(int(model_action) * 1.0 / self._zero_action * action_scope.load)
elif model_action == 0:
env_action = 0
else:
# Load resource to dynamic node.
env_action = round(int(model_action) * 1.0 / self._zero_action * action_scope.discharge)
env_action = int(env_action)
return env_action


@ -1,370 +0,0 @@
import ctypes
import multiprocessing
import os
import pickle
import time
from collections import OrderedDict
from multiprocessing import Pipe, Process
import numpy as np
import torch
from maro.rl import AbsActor
from maro.simulator import Env
from maro.simulator.scenarios.cim.common import Action
from .action_shaper import DiscreteActionShaper
from .experience_shaper import ExperienceShaper
from .shared_structure import SharedStructure
from .state_shaper import GNNStateShaper
from .utils import fix_seed, gnn_union
def organize_exp_list(experience_collections: dict, idx_mapping: dict):
"""The function assemble the experience from multiple processes into a dictionary.
Args:
experience_collections (dict): It stores the experience in all agents. The structure is the same as what is
defined in the SharedStructure in the ParallelActor except additional key for experience length. For
example:
{
"len": numpy.array,
"s": {
"v": numpy.array,
"p": numpy.array,
}
"a": numpy.array,
"R": numpy.array,
"s_": {
"v": numpy.array,
"p": numpy.array,
}
}
Note that the experience from different agents are stored in the same batch in a sequential way. For
example, if agent x starts at b_x in batch index and the experience is l_x length long, the range [b_x,
l_x) in the batch is the experience of agent x.
idx_mapping (dict): The key is the name of each agent and the value is the starting index, e.g., b_x, of the
storage space where the experience of the agent is stored.
"""
result = {}
tmpi = 0
for code, idx in idx_mapping.items():
exp_len = experience_collections["len"][0][tmpi]
s = organize_obs(experience_collections["s"], idx, exp_len)
s_ = organize_obs(experience_collections["s_"], idx, exp_len)
R = experience_collections["R"][idx: idx + exp_len]
R = R.reshape(-1, *R.shape[2:])
a = experience_collections["a"][idx: idx + exp_len]
a = a.reshape(-1, *a.shape[2:])
result[code] = {
"R": R,
"a": a,
"s": s,
"s_": s_,
"len": a.shape[0]
}
tmpi += 1
return result
def organize_obs(obs, idx, exp_len):
"""Helper function to transform the observation from multiple processes to a unified dictionary."""
tick_buffer, _, para_cnt, v_cnt, v_dim = obs["v"].shape
_, _, _, p_cnt, p_dim = obs["p"].shape
batch = exp_len * para_cnt
# v: tick_buffer, seq_len, parallel_cnt, v_cnt, v_dim --> (tick_buffer, cnt, v_cnt, v_dim)
v = obs["v"][:, idx: idx + exp_len]
v = v.reshape(tick_buffer, batch, v_cnt, v_dim)
p = obs["p"][:, idx: idx + exp_len]
p = p.reshape(tick_buffer, batch, p_cnt, p_dim)
# vo: seq_len * parallel_cnt * v_cnt * p_cnt* --> cnt * v_cnt * p_cnt*
vo = obs["vo"][idx: idx + exp_len]
vo = vo.reshape(batch, v_cnt, vo.shape[-1])
po = obs["po"][idx: idx + exp_len]
po = po.reshape(batch, p_cnt, po.shape[-1])
vedge = obs["vedge"][idx: idx + exp_len]
vedge = vedge.reshape(batch, v_cnt, vedge.shape[-2], vedge.shape[-1])
pedge = obs["pedge"][idx: idx + exp_len]
pedge = pedge.reshape(batch, p_cnt, pedge.shape[-2], pedge.shape[-1])
ppedge = obs["ppedge"][idx: idx + exp_len]
ppedge = ppedge.reshape(batch, p_cnt, ppedge.shape[-2], ppedge.shape[-1])
# mask: (seq_len, parallel_cnt, tick_buffer)
mask = obs["mask"][idx: idx + exp_len].reshape(batch, tick_buffer)
return {"v": v, "p": p, "vo": vo, "po": po, "pedge": pedge, "vedge": vedge, "ppedge": ppedge, "mask": mask}
def single_player_worker(index, config, exp_idx_mapping, pipe, action_io, exp_output):
"""The A2C worker function to collect experience.
Args:
index (int): The process index counted from 0.
config (dict): It is a dottable dictionary that stores the configuration of the simulation, state_shaper and
postprocessing shaper.
exp_idx_mapping (dict): The key is agent code and the value is the starting index where the experience is stored
in the experience batch.
pipe (Pipe): The pipe instance for communication with the main process.
action_io (SharedStructure): The shared memory to hold the state information that the main process uses to
generate an action.
exp_output (SharedStructure): The shared memory to transfer the experience list to the main process.
"""
if index == 0:
simulation_log_path = os.path.join(config.log.path, f"cim_gnn_{index}")
if not os.path.exists(simulation_log_path):
os.makedirs(simulation_log_path)
opts = {"enable-dump-snapshot": simulation_log_path}
env = Env(**config.env.param, options=opts)
else:
env = Env(**config.env.param)
fix_seed(env, config.env.seed)
static_code_list, dynamic_code_list = list(env.summary["node_mapping"]["ports"].values()), \
list(env.summary["node_mapping"]["vessels"].values())
# Create gnn_state_shaper without consuming any resources.
gnn_state_shaper = GNNStateShaper(
static_code_list, dynamic_code_list, config.env.param.durations, config.model.feature,
tick_buffer=config.model.tick_buffer, max_value=env.configs["total_containers"])
gnn_state_shaper.compute_static_graph_structure(env)
action_io_np = action_io.structuralize()
action_shaper = DiscreteActionShaper(config.model.action_dim)
exp_shaper = ExperienceShaper(
static_code_list, dynamic_code_list, config.env.param.durations, gnn_state_shaper,
scale_factor=config.env.return_scaler, time_slot=config.training.td_steps,
discount_factor=config.training.gamma, idx=index, shared_storage=exp_output.structuralize(),
exp_idx_mapping=exp_idx_mapping)
i = 0
while pipe.recv() == "reset":
r, decision_event, is_done = env.step(None)
j = 0
logs = []
while not is_done:
model_input = gnn_state_shaper(decision_event, env.snapshot_list)
action_io_np["v"][:, index] = model_input["v"]
action_io_np["p"][:, index] = model_input["p"]
action_io_np["vo"][index] = model_input["vo"]
action_io_np["po"][index] = model_input["po"]
action_io_np["vedge"][index] = model_input["vedge"]
action_io_np["pedge"][index] = model_input["pedge"]
action_io_np["ppedge"][index] = model_input["ppedge"]
action_io_np["mask"][index] = model_input["mask"]
action_io_np["pid"][index] = decision_event.port_idx
action_io_np["vid"][index] = decision_event.vessel_idx
pipe.send("features")
model_action = pipe.recv()
env_action = action_shaper(decision_event, model_action)
exp_shaper.record(decision_event=decision_event, model_action=model_action, model_input=model_input)
logs.append([
index, decision_event.tick, decision_event.port_idx, decision_event.vessel_idx, model_action,
env_action, decision_event.action_scope.load, decision_event.action_scope.discharge])
action = Action(decision_event.vessel_idx, decision_event.port_idx, env_action)
r, decision_event, is_done = env.step(action)
j += 1
action_io_np["sh"][index] = compute_shortage(env.snapshot_list, config.env.param.durations, static_code_list)
i += 1
pipe.send("done")
gnn_state_shaper.end_ep_callback(env.snapshot_list)
# Organize and synchronize exp to shared memory.
exp_shaper(env.snapshot_list)
exp_shaper.reset()
logs = np.array(logs, dtype=np.float)
pipe.send(logs)
env.reset()
def compute_shortage(snapshot_list, max_tick, static_code_list):
"""Helper function to compute the shortage after a episode end."""
return np.sum(snapshot_list["ports"][max_tick - 1: static_code_list: "acc_shortage"])
class ParallelActor(AbsActor):
def __init__(self, config, demo_env, gnn_state_shaper, agent_manager, logger):
"""A2C rollout class.
This implements the synchronized A2C structure. Multiple processes are created to simulate and collect
experience where only CPU is needed and whenever an action is required, they notify the main process and the
main process will do the batch action inference with GPU.
Args:
config (dict): The configuration to run the simulation.
demo_env (maro.simulator.Env): To get configuration information such as the amount of vessels and ports as
well as the topology of the environment, the example environment is needed.
gnn_state_shaper (AbsShaper): The state shaper instance to extract graph information from the state of
the environment.
agent_manager (AbsAgentManger): The agent manager instance to do the action inference in batch.
logger: The logger instance to log information during the rollout.
"""
super().__init__(demo_env, agent_manager)
multiprocessing.set_start_method("spawn", True)
self._logger = logger
self.config = config
self._static_node_mapping = demo_env.summary["node_mapping"]["ports"]
self._dynamic_node_mapping = demo_env.summary["node_mapping"]["vessels"]
self._gnn_state_shaper = gnn_state_shaper
self.device = torch.device(config.training.device)
self.parallel_cnt = config.training.parallel_cnt
self.log_header = [f"sh_{i}" for i in range(self.parallel_cnt)]
tick_buffer = config.model.tick_buffer
v_dim, vedge_dim, v_cnt = self._gnn_state_shaper.get_input_dim("v"), \
self._gnn_state_shaper.get_input_dim("vedge"), len(self._dynamic_node_mapping)
p_dim, pedge_dim, p_cnt = self._gnn_state_shaper.get_input_dim("p"), \
self._gnn_state_shaper.get_input_dim("pedge"), len(self._static_node_mapping)
self.pipes = [Pipe() for i in range(self.parallel_cnt)]
action_io_structure = {
"p": ((tick_buffer, self.parallel_cnt, p_cnt, p_dim), ctypes.c_float),
"v": ((tick_buffer, self.parallel_cnt, v_cnt, v_dim), ctypes.c_float),
"po": ((self.parallel_cnt, p_cnt, v_cnt), ctypes.c_long),
"vo": ((self.parallel_cnt, v_cnt, p_cnt), ctypes.c_long),
"vedge": ((self.parallel_cnt, v_cnt, p_cnt, vedge_dim), ctypes.c_float),
"pedge": ((self.parallel_cnt, p_cnt, v_cnt, vedge_dim), ctypes.c_float),
"ppedge": ((self.parallel_cnt, p_cnt, p_cnt, pedge_dim), ctypes.c_float),
"mask": ((self.parallel_cnt, tick_buffer), ctypes.c_bool),
"sh": ((self.parallel_cnt, ), ctypes.c_long),
"pid": ((self.parallel_cnt, ), ctypes.c_long),
"vid": ((self.parallel_cnt, ), ctypes.c_long)
}
self.action_io = SharedStructure(action_io_structure)
self.action_io_np = self.action_io.structuralize()
tot_exp_len = sum(config.env.exp_per_ep.values())
exp_output_structure = {
"s": {
"v": ((tick_buffer, tot_exp_len, self.parallel_cnt, v_cnt, v_dim), ctypes.c_float),
"p": ((tick_buffer, tot_exp_len, self.parallel_cnt, p_cnt, p_dim), ctypes.c_float),
"vo": ((tot_exp_len, self.parallel_cnt, v_cnt, p_cnt), ctypes.c_long),
"po": ((tot_exp_len, self.parallel_cnt, p_cnt, v_cnt), ctypes.c_long),
"vedge": ((tot_exp_len, self.parallel_cnt, v_cnt, p_cnt, vedge_dim), ctypes.c_float),
"pedge": ((tot_exp_len, self.parallel_cnt, p_cnt, v_cnt, vedge_dim), ctypes.c_float),
"ppedge": ((tot_exp_len, self.parallel_cnt, p_cnt, p_cnt, pedge_dim), ctypes.c_float),
"mask": ((tot_exp_len, self.parallel_cnt, tick_buffer), ctypes.c_bool)
},
"s_": {
"v": ((tick_buffer, tot_exp_len, self.parallel_cnt, v_cnt, v_dim), ctypes.c_float),
"p": ((tick_buffer, tot_exp_len, self.parallel_cnt, p_cnt, p_dim), ctypes.c_float),
"vo": ((tot_exp_len, self.parallel_cnt, v_cnt, p_cnt), ctypes.c_long),
"po": ((tot_exp_len, self.parallel_cnt, p_cnt, v_cnt), ctypes.c_long),
"vedge": ((tot_exp_len, self.parallel_cnt, v_cnt, p_cnt, vedge_dim), ctypes.c_float),
"pedge": ((tot_exp_len, self.parallel_cnt, p_cnt, v_cnt, vedge_dim), ctypes.c_float),
"ppedge": ((tot_exp_len, self.parallel_cnt, p_cnt, p_cnt, pedge_dim), ctypes.c_float),
"mask": ((tot_exp_len, self.parallel_cnt, tick_buffer), ctypes.c_bool)
},
"a": ((tot_exp_len, self.parallel_cnt), ctypes.c_long),
"len": ((self.parallel_cnt, len(config.env.exp_per_ep)), ctypes.c_long),
"R": ((tot_exp_len, self.parallel_cnt, p_cnt), ctypes.c_float),
}
self.exp_output = SharedStructure(exp_output_structure)
self.exp_output_np = self.exp_output.structuralize()
self._logger.info("allocate complete")
self.exp_idx_mapping = OrderedDict()
acc_c = 0
for key, c in config.env.exp_per_ep.items():
self.exp_idx_mapping[key] = acc_c
acc_c += c
self.workers = [
Process(
target=single_player_worker,
args=(i, config, self.exp_idx_mapping, self.pipes[i][1], self.action_io, self.exp_output)
) for i in range(self.parallel_cnt)
]
for w in self.workers:
w.start()
self._logger.info("all thread started")
self._roll_out_time = 0
self._transfer_time = 0
self._roll_out_cnt = 0
def roll_out(self):
"""Rollout using current policy in the AgentManager.
Returns:
result (dict): The key is the agent code, the value is the experience list stored in numpy.array.
"""
# Compute the time used for state preparation in the child process.
t_state = 0
# Compute the time used for action inference.
t_action = 0
for p in self.pipes:
p[0].send("reset")
self._roll_out_cnt += 1
step_i = 0
tick = time.time()
while True:
signals = [p[0].recv() for p in self.pipes]
if signals[0] == "done":
break
step_i += 1
t = time.time()
graph = gnn_union(
self.action_io_np["p"], self.action_io_np["po"], self.action_io_np["pedge"],
self.action_io_np["v"], self.action_io_np["vo"], self.action_io_np["vedge"],
self._gnn_state_shaper.p2p_static_graph, self.action_io_np["ppedge"],
self.action_io_np["mask"], self.device
)
t_state += time.time() - t
assert(np.min(self.action_io_np["pid"]) == np.max(self.action_io_np["pid"]))
assert(np.min(self.action_io_np["vid"]) == np.max(self.action_io_np["vid"]))
t = time.time()
actions = self._inference_agents.choose_action(
agent_id=(self.action_io_np["pid"][0], self.action_io_np["vid"][0]), state=graph
)
t_action += time.time() - t
for i, p in enumerate(self.pipes):
p[0].send(actions[i])
self._roll_out_time += time.time() - tick
tick = time.time()
self._logger.info("receiving exp")
logs = [p[0].recv() for p in self.pipes]
self._logger.info(f"Mean of shortage: {np.mean(self.action_io_np['sh'])}")
self._transfer_time += time.time() - tick
self._logger.debug(dict(zip(self.log_header, self.action_io_np["sh"])))
with open(os.path.join(self.config.log.path, f"logs_{self._roll_out_cnt}"), "wb") as fp:
pickle.dump(logs, fp)
self._logger.info("organize exp_dict")
result = organize_exp_list(self.exp_output_np, self.exp_idx_mapping)
if self.config.log.exp.enable and self._roll_out_cnt % self.config.log.exp.freq == 0:
with open(os.path.join(self.config.log.path, f"exp_{self._roll_out_cnt}"), "wb") as fp:
pickle.dump(result, fp)
self._logger.debug(f"play time: {int(self._roll_out_time)}")
self._logger.debug(f"transfer time: {int(self._trainsfer_time)}")
return result
def exit(self):
"""Terminate the child processes."""
for p in self.pipes:
p[0].send("close")


@ -1,180 +0,0 @@
import os
import torch
from torch import nn
from torch.distributions import Categorical
from torch.nn.utils import clip_grad
from maro.rl import AbsAlgorithm
from .utils import gnn_union
class ActorCritic(AbsAlgorithm):
"""Actor-Critic algorithm in CIM problem.
The vanilla ac algorithm.
Args:
model (nn.Module): A actor-critic module outputing both the policy network and the value network.
device (torch.device): A PyTorch device instance where the module is computed on.
p2p_adj (numpy.array): The static port-to-port adjencency matrix.
td_steps (int): The value "n" in the n-step TD algorithm.
gamma (float): The time decay.
learning_rate (float): The learning rate for the module.
entropy_factor (float): The weight of the policy"s entropy to boost exploration.
"""
def __init__(
self, model: nn.Module, device: torch.device, p2p_adj=None, td_steps=100, gamma=0.97, learning_rate=0.0003,
entropy_factor=0.1):
self._gamma = gamma
self._td_steps = td_steps
self._value_discount = gamma ** td_steps
self._entropy_factor = entropy_factor
self._device = device
self._tot_batches = 0
self._p2p_adj = p2p_adj
super().__init__(
model_dict={"a&c": model}, optimizer_opt={"a&c": (torch.optim.Adam, {"lr": learning_rate})},
loss_func_dict={}, hyper_params=None)
def choose_action(self, state: dict, p_idx: int, v_idx: int):
"""Get action from the AC model.
Args:
state (dict): A dictionary containing the input to the module. For example:
{
"v": v,
"p": p,
"pe": {
"edge": pedge,
"adj": padj,
"mask": pmask,
},
"ve": {
"edge": vedge,
"adj": vadj,
"mask": vmask,
},
"ppe": {
"edge": ppedge,
"adj": p2p_adj,
"mask": p2p_mask,
},
"mask": seq_mask,
}
p_idx (int): The identity of the port doing the action.
v_idx (int): The identity of the vessel doing the action.
Returns:
model_action (numpy.int64): The action returned from the module.
"""
with torch.no_grad():
prob, _ = self._model_dict["a&c"](state, a=True, p_idx=p_idx, v_idx=v_idx)
distribution = Categorical(prob)
model_action = distribution.sample().cpu().numpy()
return model_action
def train(self, batch, p_idx, v_idx):
"""Model training.
Args:
batch (dict): The dictionary of a batch of experience. For example:
{
"s": the dictionary of state,
"a": model actions in numpy array,
"R": the n-step accumulated reward,
"s"": the dictionary of the next state,
}
p_idx (int): The identity of the port doing the action.
v_idx (int): The identity of the vessel doing the action.
Returns:
a_loss (float): action loss.
c_loss (float): critic loss.
e_loss (float): entropy loss.
tot_norm (float): the L2 norm of the gradient.
"""
self._tot_batches += 1
item_a_loss, item_c_loss, item_e_loss = 0, 0, 0
obs_batch = batch["s"]
action_batch = batch["a"]
return_batch = batch["R"]
next_obs_batch = batch["s_"]
obs_batch = gnn_union(
obs_batch["p"], obs_batch["po"], obs_batch["pedge"], obs_batch["v"], obs_batch["vo"], obs_batch["vedge"],
self._p2p_adj, obs_batch["ppedge"], obs_batch["mask"], self._device)
action_batch = torch.from_numpy(action_batch).long().to(self._device)
return_batch = torch.from_numpy(return_batch).float().to(self._device)
next_obs_batch = gnn_union(
next_obs_batch["p"], next_obs_batch["po"], next_obs_batch["pedge"], next_obs_batch["v"],
next_obs_batch["vo"], next_obs_batch["vedge"], self._p2p_adj, next_obs_batch["ppedge"],
next_obs_batch["mask"], self._device)
# Train actor network.
self._optimizer["a&c"].zero_grad()
# Every port has a value.
# values.shape: (batch, p_cnt)
probs, values = self._model_dict["a&c"](obs_batch, a=True, p_idx=p_idx, v_idx=v_idx, c=True)
distribution = Categorical(probs)
log_prob = distribution.log_prob(action_batch)
entropy_loss = distribution.entropy()
_, values_ = self._model_dict["a&c"](next_obs_batch, c=True)
advantage = return_batch + self._value_discount * values_.detach() - values
if self._entropy_factor != 0:
# actor_loss = actor_loss* torch.log(entropy_loss + np.e)
advantage[:, p_idx] += self._entropy_factor * entropy_loss.detach()
actor_loss = - (log_prob * torch.sum(advantage, axis=-1).detach()).mean()
item_a_loss = actor_loss.item()
item_e_loss = entropy_loss.mean().item()
# Train critic network.
critic_loss = torch.sum(advantage.pow(2), axis=1).mean()
item_c_loss = critic_loss.item()
# torch.nn.utils.clip_grad_norm_(self._critic_model.parameters(),0.5)
tot_loss = 0.1 * actor_loss + critic_loss
tot_loss.backward()
tot_norm = clip_grad.clip_grad_norm_(self._model_dict["a&c"].parameters(), 1)
self._optimizer["a&c"].step()
return item_a_loss, item_c_loss, item_e_loss, float(tot_norm)
def set_weights(self, weights):
self._model_dict["a&c"].load_state_dict(weights)
def get_weights(self):
return self._model_dict["a&c"].state_dict()
def _get_save_idx(self, fp_str):
return int(fp_str.split(".")[0].split("_")[0])
def save_model(self, pth, id):
if not os.path.exists(pth):
os.makedirs(pth)
pth = os.path.join(pth, f"{id}_ac.pkl")
torch.save(self._model_dict["a&c"].state_dict(), pth)
def _set_gnn_weights(self, weights):
for key in weights:
if key in self._model_dict["a&c"].state_dict().keys():
self._model_dict["a&c"].state_dict()[key].copy_(weights[key])
def load_model(self, folder_pth, idx=-1):
if idx == -1:
fps = os.listdir(folder_pth)
fps = [f for f in fps if "ac" in f]
fps.sort(key=self._get_save_idx)
ac_pth = fps[-1]
else:
ac_pth = f"{idx}_ac.pkl"
pth = os.path.join(folder_pth, ac_pth)
with open(pth, "rb") as fp:
weights = torch.load(fp, map_location=self._device)
self._set_gnn_weights(weights)
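The advantage computed in train() above follows the standard n-step TD form: target = R_n + gamma**n * V(s_{t+n}) and advantage = target - V(s_t). Below is a small self-contained numeric sketch of that target; the values and the exact discount-index convention are assumptions for illustration only.

import numpy as np

gamma, n = 0.97, 3
rewards = np.array([1.0, 0.5, 2.0])                 # assumed r_{t+1} ... r_{t+n}
R_n = sum(gamma ** i * r for i, r in enumerate(rewards))
v_next, v_now = 10.0, 9.0                           # assumed critic estimates V(s_{t+n}), V(s_t)
advantage = R_n + gamma ** n * v_next - v_now       # mirrors return_batch + value_discount * values_ - values
print(R_n, advantage)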


@ -1,41 +0,0 @@
from collections import defaultdict
import numpy as np
from maro.rl import AbsAgent
from maro.utils import DummyLogger
from .numpy_store import Shuffler
class TrainableAgent(AbsAgent):
def __init__(self, name, algorithm, experience_pool, logger=DummyLogger()):
self._logger = logger
super().__init__(name, algorithm, experience_pool)
def train(self, training_config):
loss_dict = defaultdict(list)
for j in range(training_config.shuffle_time):
shuffler = Shuffler(self._experience_pool, batch_size=training_config.batch_size)
while shuffler.has_next():
batch = shuffler.next()
actor_loss, critic_loss, entropy_loss, tot_loss = self._algorithm.train(
batch, self._name[0], self._name[1])
loss_dict["actor"].append(actor_loss)
loss_dict["critic"].append(critic_loss)
loss_dict["entropy"].append(entropy_loss)
loss_dict["tot"].append(tot_loss)
a_loss = np.mean(loss_dict["actor"])
c_loss = np.mean(loss_dict["critic"])
e_loss = np.mean(loss_dict["entropy"])
tot_loss = np.mean(loss_dict["tot"])
self._logger.debug(
f"code: {str(self._name)} \t actor: {float(a_loss)} \t critic: {float(c_loss)} \t entropy: {float(e_loss)} \
\t tot: {float(tot_loss)}")
self._experience_pool.clear()
return loss_dict
def choose_action(self, model_state):
return self._algorithm.choose_action(model_state, self._name[0], self._name[1])


@ -1,119 +0,0 @@
from copy import copy
import numpy as np
import torch
from maro.rl import AbsAgentManager, AgentMode
from maro.utils import DummyLogger
from .actor_critic import ActorCritic
from .agent import TrainableAgent
from .numpy_store import NumpyStore
from .simple_gnn import SharedAC
from .state_shaper import GNNStateShaper
class SimpleAgentManger(AbsAgentManager):
def __init__(
self, name, agent_id_list, port_code_list, vessel_code_list, demo_env, state_shaper: GNNStateShaper,
logger=DummyLogger()):
super().__init__(
name, AgentMode.TRAIN, agent_id_list, state_shaper=state_shaper, action_shaper=None,
experience_shaper=None, explorer=None)
self.port_code_list = copy(port_code_list)
self.vessel_code_list = copy(vessel_code_list)
self.demo_env = demo_env
self._logger = logger
def assemble(self, config):
v_dim, vedge_dim = self._state_shaper.get_input_dim("v"), self._state_shaper.get_input_dim("vedge")
p_dim, pedge_dim = self._state_shaper.get_input_dim("p"), self._state_shaper.get_input_dim("pedge")
self.device = torch.device(config.training.device)
self._logger.info(config.training.device)
ac_model = SharedAC(
p_dim, pedge_dim, v_dim, vedge_dim, config.model.tick_buffer, config.model.action_dim).to(self.device)
value_dict = {
("s", "v"):
(
(config.model.tick_buffer, len(self.vessel_code_list), self._state_shaper.get_input_dim("v")),
np.float32, False),
("s", "p"):
(
(config.model.tick_buffer, len(self.port_code_list), self._state_shaper.get_input_dim("p")),
np.float32, False),
("s", "vo"): ((len(self.vessel_code_list), len(self.port_code_list)), np.int64, True),
("s", "po"): ((len(self.port_code_list), len(self.vessel_code_list)), np.int64, True),
("s", "vedge"):
(
(len(self.vessel_code_list), len(self.port_code_list), self._state_shaper.get_input_dim("vedge")),
np.float32, True),
("s", "pedge"):
(
(len(self.port_code_list), len(self.vessel_code_list), self._state_shaper.get_input_dim("vedge")),
np.float32, True),
("s", "ppedge"):
(
(len(self.port_code_list), len(self.port_code_list), self._state_shaper.get_input_dim("pedge")),
np.float32, True),
("s", "mask"): ((config.model.tick_buffer, ), np.bool, True),
("s_", "v"):
(
(config.model.tick_buffer, len(self.vessel_code_list), self._state_shaper.get_input_dim("v")),
np.float32, False),
("s_", "p"):
(
(config.model.tick_buffer, len(self.port_code_list), self._state_shaper.get_input_dim("p")),
np.float32, False),
("s_", "vo"): ((len(self.vessel_code_list), len(self.port_code_list)), np.int64, True),
("s_", "po"):
(
(len(self.port_code_list), len(self.vessel_code_list)), np.int64, True),
("s_", "vedge"):
(
(len(self.vessel_code_list), len(self.port_code_list), self._state_shaper.get_input_dim("vedge")),
np.float32, True),
("s_", "pedge"):
(
(len(self.port_code_list), len(self.vessel_code_list), self._state_shaper.get_input_dim("vedge")),
np.float32, True),
("s_", "ppedge"):
(
(len(self.port_code_list), len(self.port_code_list), self._state_shaper.get_input_dim("pedge")),
np.float32, True),
("s_", "mask"): ((config.model.tick_buffer, ), np.bool, True),
# To identify one dimension variable.
("R",): ((len(self.port_code_list), ), np.float32, True),
("a",): (tuple(), np.int64, True),
}
self._algorithm = ActorCritic(
ac_model, self.device, td_steps=config.training.td_steps, p2p_adj=self._state_shaper.p2p_static_graph,
gamma=config.training.gamma, learning_rate=config.training.learning_rate)
for agent_id, cnt in config.env.exp_per_ep.items():
experience_pool = NumpyStore(value_dict, config.training.parallel_cnt * config.training.train_freq * cnt)
self._agent_dict[agent_id] = TrainableAgent(agent_id, self._algorithm, experience_pool, self._logger)
def choose_action(self, agent_id, state):
return self._agent_dict[agent_id].choose_action(state)
def load_models_from_files(self, model_pth):
self._algorithm.load_model(model_pth)
def train(self, training_config):
for agent in self._agent_dict.values():
agent.train(training_config)
def store_experiences(self, experiences):
for code, exp_list in experiences.items():
self._agent_dict[code].store_experiences(exp_list)
def save_model(self, pth, id):
self._algorithm.save_model(pth, id)
def load_model(self, pth):
self._algorithm.load_model(pth)


@ -1,111 +0,0 @@
from collections import defaultdict
import numpy as np
class ExperienceShaper:
def __init__(
self, static_list, dynamic_list, max_tick, gnn_state_shaper, scale_factor=0.0001, time_slot=100,
discount_factor=0.97, idx=-1, shared_storage=None, exp_idx_mapping=None):
self._static_list = list(static_list)
self._dynamic_list = list(dynamic_list)
self._time_slot = time_slot
self._discount_factor = discount_factor
self._discount_vector = np.logspace(1, self._time_slot, self._time_slot, base=discount_factor)
self._max_tick = max_tick
self._tick_range = list(range(self._max_tick))
self._len_return = self._max_tick - self._time_slot
self._gnn_state_shaper = gnn_state_shaper
self._fulfillment_list, self._shortage_list, self._experience_dict = None, None, None
self._experience_dict = defaultdict(list)
self._init_state()
self._idx = idx
self._exp_idx_mapping = exp_idx_mapping
self._shared_storage = shared_storage
self._scale_factor = scale_factor
def _init_state(self):
self._fulfillment_list, self._shortage_list = np.zeros(self._max_tick + 1), np.zeros(self._max_tick + 1)
self._experience_dict = defaultdict(list)
self._last_tick = 0
def record(self, decision_event, model_action, model_input):
# Only the experience that has a next state within the given time slot is valuable.
if decision_event.tick + self._time_slot < self._max_tick:
self._experience_dict[decision_event.port_idx, decision_event.vessel_idx].append({
"tick": decision_event.tick,
"s": model_input,
"a": model_action,
})
def _compute_delta(self, arr):
delta = np.array(arr)
delta[1:] -= arr[:-1]
return delta
def _batch_obs_to_numpy(self, obs):
v = np.stack([o["v"] for o in obs], axis=0)
p = np.stack([o["p"] for o in obs], axis=0)
vo = np.stack([o["vo"] for o in obs], axis=0)
po = np.stack([o["po"] for o in obs], axis=0)
return {"p": p, "v": v, "vo": vo, "po": po}
def __call__(self, snapshot_list):
if self._shared_storage is None:
return
shortage = snapshot_list["ports"][self._tick_range:self._static_list:"shortage"].reshape(self._max_tick, -1)
fulfillment = snapshot_list["ports"][self._tick_range:self._static_list:"fulfillment"] \
.reshape(self._max_tick, -1)
delta = fulfillment - shortage
R = np.empty((self._len_return, len(self._static_list)), dtype=np.float)
for i in range(0, self._len_return, 1):
R[i] = np.dot(self._discount_vector, delta[i + 1: i + self._time_slot + 1])
for (agent_idx, vessel_idx), exp_list in self._experience_dict.items():
for exp in exp_list:
tick = exp["tick"]
exp["s_"] = self._gnn_state_shaper(tick=tick + self._time_slot)
exp["R"] = self._scale_factor * R[tick]
tmpi = 0
for (agent_idx, vessel_idx), idx_base in self._exp_idx_mapping.items():
exp_list = self._experience_dict[(agent_idx, vessel_idx)]
exp_len = len(exp_list)
# Here, we assume that exp_idx_mapping order is not changed.
self._shared_storage["len"][self._idx, tmpi] = exp_len
self._shared_storage["s"]["v"][:, idx_base:idx_base + exp_len, self._idx] = \
np.stack([e["s"]["v"] for e in exp_list], axis=1)
self._shared_storage["s"]["p"][:, idx_base:idx_base + exp_len, self._idx] = \
np.stack([e["s"]["p"] for e in exp_list], axis=1)
self._shared_storage["s"]["vo"][idx_base:idx_base + exp_len, self._idx] = \
np.stack([e["s"]["vo"] for e in exp_list], axis=0)
self._shared_storage["s"]["po"][idx_base:idx_base + exp_len, self._idx] = \
np.stack([e["s"]["po"] for e in exp_list], axis=0)
self._shared_storage["s"]["vedge"][idx_base:idx_base + exp_len, self._idx] = \
np.stack([e["s"]["vedge"] for e in exp_list], axis=0)
self._shared_storage["s"]["pedge"][idx_base:idx_base + exp_len, self._idx] = \
np.stack([e["s"]["pedge"] for e in exp_list], axis=0)
self._shared_storage["s_"]["v"][:, idx_base:idx_base + exp_len, self._idx] = \
np.stack([e["s_"]["v"] for e in exp_list], axis=1)
self._shared_storage["s_"]["p"][:, idx_base:idx_base + exp_len, self._idx] = \
np.stack([e["s_"]["p"] for e in exp_list], axis=1)
self._shared_storage["s_"]["vo"][idx_base:idx_base + exp_len, self._idx] = \
np.stack([e["s_"]["vo"] for e in exp_list], axis=0)
self._shared_storage["s_"]["po"][idx_base:idx_base + exp_len, self._idx] = \
np.stack([e["s_"]["po"] for e in exp_list], axis=0)
self._shared_storage["s_"]["vedge"][idx_base:idx_base + exp_len, self._idx] = \
np.stack([e["s_"]["vedge"] for e in exp_list], axis=0)
self._shared_storage["s_"]["pedge"][idx_base:idx_base + exp_len, self._idx] = \
np.stack([e["s_"]["pedge"] for e in exp_list], axis=0)
self._shared_storage["a"][idx_base: idx_base + exp_len, self._idx] = \
np.array([exp["a"] for exp in exp_list], dtype=np.int64)
self._shared_storage["R"][idx_base: idx_base + exp_len, self._idx] = \
np.vstack([exp["R"] for exp in exp_list])
tmpi += 1
def reset(self):
del self._experience_dict
self._init_state()
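The reward written into "R" above is a discounted sum of per-port (fulfillment - shortage) deltas over the next time_slot ticks, built with the same np.logspace construction as __call__. A minimal sketch for a single port, with all values assumed:

import numpy as np

gamma, time_slot = 0.97, 4
discount_vector = np.logspace(1, time_slot, time_slot, base=gamma)   # [gamma, gamma**2, ..., gamma**time_slot]
delta = np.array([0.0, 3.0, -1.0, 2.0, 5.0, 1.0])                    # assumed per-tick fulfillment - shortage
i = 0
R_i = np.dot(discount_vector, delta[i + 1: i + time_slot + 1])       # R[i] in ExperienceShaper.__call__
print(R_i)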


@ -1,52 +0,0 @@
import os
import time
from maro.rl import AbsLearner
from maro.utils import DummyLogger
from .actor import ParallelActor
from .agent_manager import SimpleAgentManger
class GNNLearner(AbsLearner):
"""Learner class for the training pipeline and the specialized logging in GNN solution for CIM problem.
Args:
actor (AbsActor): The actor instance to collect experience.
trainable_agents (AbsAgentManager): The agent manager for training RL models.
logger (Logger): The logger to save/print the message.
"""
def __init__(self, actor: ParallelActor, trainable_agents: SimpleAgentManger, logger=DummyLogger()):
super().__init__()
self._actor = actor
self._trainable_agents = trainable_agents
self._logger = logger
def learn(self, training_config, log_pth=None):
rollout_time = 0
training_time = 0
for i in range(training_config.rollout_cnt):
self._logger.info(f"rollout {i + 1}")
tick = time.time()
exp_dict = self._actor.roll_out()
rollout_time += time.time() - tick
self._logger.info("start putting exps")
self._trainable_agents.store_experiences(exp_dict)
if training_config.enable and i % training_config.train_freq == training_config.train_freq - 1:
self._logger.info("training start")
tick = time.time()
self._trainable_agents.train(training_config)
training_time += time.time() - tick
if log_pth is not None and (i + 1) % training_config.model_save_freq == 0:
self._trainable_agents.save_model(os.path.join(log_pth, "models"), i + 1)
self._logger.debug(f"total rollout_time: {int(rollout_time)}")
self._logger.debug(f"train_time: {int(training_time)}")
def test(self):
pass


@ -1,186 +0,0 @@
from typing import Sequence
import numpy as np
from maro.rl import AbsStore
def get_item(data_dict, key_tuple):
"""Helper function to get the value in a hierarchical dictionary given the key path.
Args:
data_dict (dict): The data structure. For example:
{
"a": {
"b": 1,
"c": {
"d": 2,
}
}
}
key_tuple (tuple): The key path to the target field. For example, given the data_dict above, the key_tuple
("a", "c", "d") should return 2.
"""
for key in key_tuple:
data_dict = data_dict[key]
return data_dict
def set_item(data_dict, key_tuple, data):
"""The setter function corresponding to the get_item function."""
for i, key in enumerate(key_tuple):
if key not in data_dict:
data_dict[key] = {}
if i == len(key_tuple) - 1:
data_dict[key] = data
else:
data_dict = data_dict[key]
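# Illustrative round trip for the two helpers above (example values, not from the original file):
#   d = {}
#   set_item(d, ("a", "c", "d"), 2)    # d becomes {"a": {"c": {"d": 2}}}
#   get_item(d, ("a", "c", "d"))       # -> 2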
class NumpyStore(AbsStore):
def __init__(self, domain_type_dict, capacity):
"""
Args:
domain_type_dict (dict): The dictionary describing the name, structure and type of each field in the
experience. Each field in the experience is a key-value pair in the following structure:
(field_name): (size_of_an_instance, data_type, batch_first)
For example:
("s"): ((32, 64), np.float32, True)
The field can be a hierarchical dictionary by identifying the full path to the root.
For example:
{
("s", "p"): ((32, 64), np.float32, True)
("s", "v"): ((48, ), np.float32, False),
}
Then the batch of experience returned by self.get(indexes) is:
{
"s":
{
"p": numpy.array with size (batch, 32, 64),
"v": numpy.array with size (32, batch, 48),
}
}
Note that for the field ("s", "v"), the batch is in the 2nd dimension because the batch_first attribute
is False.
capacity (int): The maximum stored experience in the store.
"""
super().__init__()
self.domain_type_dict = dict(domain_type_dict)
self.store = {
key: np.zeros(
shape=(capacity, *shape) if batch_first else (shape[0], capacity, *shape[1:]), dtype=data_type)
for key, (shape, data_type, batch_first) in domain_type_dict.items()}
self.batch_first_store = {key: batch_first for key, (_, _, batch_first) in domain_type_dict.items()}
self.cnt = 0
self.capacity = capacity
def put(self, exp_dict: dict):
"""Insert a batch of experience into the store.
If the store reaches the maximum capacity, this function will replace the experience in the store randomly.
Args:
exp_dict (dict): The dictionary of a batch of experience. For example:
{
"s":
{
"p": numpy.array with size (batch, 32, 64),
"v": numpy.array with size (32, batch, 48),
}
}
The structure should be consistent with the structure defined in the __init__ function.
Returns:
indexes (numpy.array): The list of the indexes each experience in the batch is located in.
"""
dlen = exp_dict["len"]
append_end = min(max(self.capacity - self.cnt, 0), dlen)
idxs = np.zeros(dlen, dtype=np.int)
if append_end != 0:
for key in self.domain_type_dict.keys():
data = get_item(exp_dict, key)
if self.batch_first_store[key]:
self.store[key][self.cnt: self.cnt + append_end] = data[0: append_end]
else:
self.store[key][:, self.cnt: self.cnt + append_end] = data[:, 0: append_end]
idxs[: append_end] = np.arange(self.cnt, self.cnt + append_end)
if append_end < dlen:
replace_idx = self._get_replace_idx(dlen - append_end)
for key in self.domain_type_dict.keys():
data = get_item(exp_dict, key)
if self.batch_first_store[key]:
self.store[key][replace_idx] = data[append_end: dlen]
else:
self.store[key][:, replace_idx] = data[:, append_end: dlen]
idxs[append_end: dlen] = replace_idx
self.cnt += dlen
return idxs
def _get_replace_idx(self, cnt):
return np.random.randint(low=0, high=self.capacity, size=cnt)
def get(self, indexes: np.array):
"""Get the experience indexed in the indexes list from the store.
Args:
indexes (np.array): A numpy array containing the indexes of a batch experience.
Returns:
data_dict (dict): the structure same as that defined in the __init__ function.
"""
data_dict = {}
for key in self.domain_type_dict.keys():
if self.batch_first_store[key]:
set_item(data_dict, key, self.store[key][indexes])
else:
set_item(data_dict, key, self.store[key][:, indexes])
return data_dict
def __len__(self):
return min(self.capacity, self.cnt)
def update(self, indexes: Sequence, contents: Sequence):
raise NotImplementedError("NumpyStore does not support modifying the experience!")
def sample(self, size, weights: Sequence, replace: bool = True):
raise NotImplementedError("NumpyStore does not support sampling. Please use outer sampler to fetch samples!")
def clear(self):
"""Remove all the experience in the store."""
self.cnt = 0
class Shuffler:
def __init__(self, store: NumpyStore, batch_size: int):
"""The helper class for fast batch sampling.
Args:
store (NumpyStore): The data source for sampling.
batch_size (int): The size of a batch.
"""
self._store = store
self._shuffled_seq = np.arange(0, len(store))
np.random.shuffle(self._shuffled_seq)
self._start = 0
self._batch_size = batch_size
def next(self):
"""Uniformly sampling out a batch in the store."""
if self._start >= len(self._store):
return None
end = min(self._start + self._batch_size, len(self._store))
rst = self._store.get(self._shuffled_seq[self._start: end])
self._start += self._batch_size
return rst
def has_next(self):
"""Check if any experience is not visited."""
return self._start < len(self._store)
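A minimal usage sketch of NumpyStore and Shuffler together; the field names, shapes and values below are assumed for illustration and are not taken from the original config.

import numpy as np

store = NumpyStore(
    {
        ("s", "p"): ((3,), np.float32, True),   # batch-first field of per-experience shape (3,)
        ("a",): (tuple(), np.int64, True),      # scalar action per experience
    },
    capacity=8,
)
store.put({
    "len": 4,
    "s": {"p": np.random.rand(4, 3).astype(np.float32)},
    "a": np.arange(4, dtype=np.int64),
})
shuffler = Shuffler(store, batch_size=2)
while shuffler.has_next():
    batch = shuffler.next()   # e.g. {"s": {"p": array of shape (2, 3)}, "a": array of shape (2,)}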


@ -1,46 +0,0 @@
import multiprocessing
import numpy as np
def init_shared_memory(data_structure):
"""Initialize the data structure of the shared memory.
Args:
data_structure: The dictionary that describes the data structure. For example,
{
"a": (shape, type),
"b": {
"b1": (shape, type),
}
}
"""
if isinstance(data_structure, tuple):
mult = 1
for i in data_structure[0]:
mult *= i
return multiprocessing.Array(data_structure[1], mult, lock=False)
else:
shared_data = {}
for k, v in data_structure.items():
shared_data[k] = init_shared_memory(v)
return shared_data
def shared_data2numpy(shared_data, structure_info):
if not isinstance(shared_data, dict):
return np.frombuffer(shared_data, dtype=structure_info[1]).reshape(structure_info[0])
else:
numpy_dict = {}
for k, v in shared_data.items():
numpy_dict[k] = shared_data2numpy(v, structure_info[k])
return numpy_dict
class SharedStructure:
def __init__(self, data_structure):
self.data_structure = data_structure
self.shared = init_shared_memory(data_structure)
def structuralize(self):
return shared_data2numpy(self.shared, self.data_structure)
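A minimal usage sketch of SharedStructure (structure and values assumed): the ctypes buffers are allocated once, and every process that calls structuralize() gets numpy views backed by the same shared memory.

import ctypes
import numpy as np

structure = {"sh": ((4,), ctypes.c_long), "nested": {"x": ((2, 3), ctypes.c_float)}}
shared = SharedStructure(structure)
views = shared.structuralize()
views["sh"][:] = np.arange(4)        # visible to any child process that structuralizes `shared`
views["nested"]["x"][0, 1] = 1.5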


@ -1,335 +0,0 @@
import math
import torch
import torch.nn as nn
from torch import Tensor
from torch.nn import TransformerEncoder, TransformerEncoderLayer
from torch.nn import functional as F
from torch.nn.modules.activation import MultiheadAttention
from torch.nn.modules.dropout import Dropout
from torch.nn.modules.normalization import LayerNorm
class PositionalEncoder(nn.Module):
"""
The positional encoding used in the Transformer to inject sequential information.
The code is based on the PyTorch tutorial at
https://pytorch.org/tutorials/beginner/transformer_tutorial.html?highlight=positionalencoding
"""
def __init__(self, d_model, max_seq_len=80):
super().__init__()
self.d_model = d_model
self.times = 4 * math.sqrt(self.d_model)
# Create constant "pe" matrix with values dependant on pos and i.
self.pe = torch.zeros(max_seq_len, d_model)
for pos in range(max_seq_len):
for i in range(0, d_model, 2):
self.pe[pos, i] = math.sin(pos / (10000 ** ((2 * i) / d_model)))
self.pe[pos, i + 1] = math.cos(pos / (10000 ** ((2 * (i + 1)) / d_model)))
self.pe = self.pe.unsqueeze(1) / self.d_model
def forward(self, x):
# Make embeddings relatively larger.
addon = self.pe[: x.shape[0], :, : x.shape[2]].to(x.device)
return x + addon
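# Illustrative shapes for the encoder above (sizes assumed): the input is sequence-first and the
# constant table is broadcast over the batch dimension.
#   enc = PositionalEncoder(d_model=16, max_seq_len=20)
#   enc(torch.zeros(20, 32, 16)).shape   # -> torch.Size([20, 32, 16])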
class SimpleGATLayer(nn.Module):
"""The enhanced graph attention layer for heterogenenous neighborhood.
It first utilizes pre-layers for both the source and destination node to map their features into the same hidden
size. If the edge also has features, they are concatenated with those of the corresponding source node before being
fed to the pre-layers. Then the graph attention(https://arxiv.org/abs/1710.10903) is done to aggregate information
from the source nodes to the destination nodes. The residual connection and layer normalization are also used to
enhance the performance, which is similar to the Transformer(https://arxiv.org/abs/1706.03762).
Args:
src_dim (int): The feature dimension of the source nodes.
dest_dim (int): The feature dimension of the destination nodes.
edge_dim (int): The feature dimension of the edges. If the edges have no feature, it should be set 0.
hidden_size (int): The hidden size both the destination and source is mapped into.
nhead (int): The number of head in the multi-head attention.
position_encoding (bool): the neighbor source nodes is aggregated in order(True) or orderless(False).
"""
def __init__(self, src_dim, dest_dim, edge_dim, hidden_size, nhead=4, position_encoding=True):
super().__init__()
self.src_dim = src_dim
self.dest_dim = dest_dim
self.edge_dim = edge_dim
self.hidden_size = hidden_size
self.nhead = nhead
src_layers = []
src_layers.append(nn.Linear(src_dim + edge_dim, hidden_size))
src_layers.append(GeLU())
self.src_pre_layer = nn.Sequential(*src_layers)
dest_layers = []
dest_layers.append(nn.Linear(dest_dim, hidden_size))
dest_layers.append(GeLU())
self.dest_pre_layer = nn.Sequential(*dest_layers)
self.att = MultiheadAttention(embed_dim=hidden_size, num_heads=nhead)
self.att_dropout = Dropout(0.1)
self.att_norm = LayerNorm(hidden_size)
self.zero_padding_template = torch.zeros((1, src_dim), dtype=torch.float)
def forward(self, src: Tensor, dest: Tensor, adj: Tensor, mask: Tensor, edges: Tensor = None):
"""Information aggregation from the source nodes to the destination nodes.
Args:
src (Tensor): The source nodes in a batch of graph.
dest (Tensor): The destination nodes in a batch of graph.
adj (Tensor): The adjacency list stored in a 2D matrix in the batch-second format. The first dimension is
the maximum number of neighbors the destinations have. As the neighbor counts vary from one
destination to another, the shorter sequences are padded with 0.
mask (Tensor): The mask identifies if a position in the adj is padded. Note that it is stored in the
batch-first format.
Returns:
destination_emb: The embedding of the destinations after the GAT layer.
Shape:
src: (batch, src_cnt, src_dim)
dest: (batch, dest_cnt, dest_dim)
adj: (src_neighbor_cnt, batch*dest_cnt)
mask: (batch*dest_cnt)*src_neighbor_cnt
edges: (batch*dest_cnt, src_neighbor_cnt, edge_dim)
destination_emb: (batch, dest_cnt, hidden_size)
"""
assert(self.src_dim == src.shape[-1])
assert(self.dest_dim == dest.shape[-1])
batch, s_cnt, src_dim = src.shape
batch, d_cnt, dest_dim = dest.shape
src_neighbor_cnt = adj.shape[0]
src_embedding = src.reshape(-1, src_dim)
src_embedding = torch.cat((self.zero_padding_template.to(src_embedding.device), src_embedding))
flat_adj = adj.reshape(-1)
src_embedding = src_embedding[flat_adj].reshape(src_neighbor_cnt, -1, src_dim)
if edges is not None:
src_embedding = torch.cat((src_embedding, edges), axis=2)
src_input = self.src_pre_layer(
src_embedding.reshape(-1, src_dim + self.edge_dim)). \
reshape(*src_embedding.shape[:2], self.hidden_size)
dest_input = self.dest_pre_layer(dest.reshape(-1, dest_dim)).reshape(1, batch * d_cnt, self.hidden_size)
dest_emb, _ = self.att(dest_input, src_input, src_input, key_padding_mask=mask)
dest_emb = dest_emb + self.att_dropout(dest_emb)
dest_emb = self.att_norm(dest_emb)
return dest_emb.reshape(batch, d_cnt, self.hidden_size)
class SimpleTransformer(nn.Module):
"""Graph attention network with multiple graph in the CIM scenario.
This module aggregates information in the port-to-port graph, port-to-vessel graph and vessel-to-port graph. The
aggregation in the two graph are done separatedly and then the port features are concatenated as the final result.
Args:
p_dim (int): The feature dimension of the ports.
v_dim (int): The feature dimension of the vessels.
edge_dim (dict): The key is the edge name and the value is the corresponding feature dimension.
output_size (int): The hidden size in graph attention.
layer_num (int): The number of graph attention layers in each graph.
"""
def __init__(self, p_dim, v_dim, edge_dim: dict, output_size, layer_num=2):
super().__init__()
self.hidden_size = output_size
self.layer_num = layer_num
pl, vl, ppl = [], [], []
for i in range(layer_num):
if i == 0:
pl.append(SimpleGATLayer(v_dim, p_dim, edge_dim["v"], self.hidden_size, nhead=4))
vl.append(SimpleGATLayer(p_dim, v_dim, edge_dim["v"], self.hidden_size, nhead=4))
# p2p links.
ppl.append(
SimpleGATLayer(
p_dim, p_dim, edge_dim["p"], self.hidden_size, nhead=4, position_encoding=False)
)
else:
pl.append(SimpleGATLayer(self.hidden_size, self.hidden_size, 0, self.hidden_size, nhead=4))
if i != layer_num - 1:
# p2v conv is not necessary at the last layer, for we only use port features.
vl.append(SimpleGATLayer(self.hidden_size, self.hidden_size, 0, self.hidden_size, nhead=4))
ppl.append(SimpleGATLayer(
self.hidden_size, self.hidden_size, 0, self.hidden_size, nhead=4, position_encoding=False))
self.p_layers = nn.ModuleList(pl)
self.v_layers = nn.ModuleList(vl)
self.pp_layers = nn.ModuleList(ppl)
def forward(self, p, pe, v, ve, ppe):
"""Do the multi-channel graph attention.
Args:
p (Tensor): The port feature.
pe (Tensor): The vessel-port edge feature.
v (Tensor): The vessel feature.
ve (Tensor): The port-vessel edge feature.
ppe (Tensor): The port-port edge feature.
"""
# p.shape: (batch*p_cnt, p_dim)
pp = p
pre_p, pre_v, pre_pp = p, v, pp
for i in range(self.layer_num):
# Only feed edge info in the first layer.
p = self.p_layers[i](pre_v, pre_p, adj=pe["adj"], edges=pe["edge"] if i == 0 else None, mask=pe["mask"])
if i != self.layer_num - 1:
v = self.v_layers[i](
pre_p, pre_v, adj=ve["adj"], edges=ve["edge"] if i == 0 else None, mask=ve["mask"])
pp = self.pp_layers[i](
pre_pp, pre_pp, adj=ppe["adj"], edges=ppe["edge"] if i == 0 else None, mask=ppe["mask"])
pre_p, pre_v, pre_pp = p, v, pp
p = torch.cat((p, pp), axis=2)
return p, v
class GeLU(nn.Module):
"""Simple gelu wrapper as a independent module."""
def __init__(self):
super().__init__()
def forward(self, input):
return F.gelu(input)
class Header(nn.Module):
def __init__(self, input_size, hidden_size, output_size, net_type="res"):
super().__init__()
self.net_type = net_type
if net_type == "res":
self.fc_0 = nn.Linear(input_size, hidden_size)
self.act_0 = GeLU()
# self.do_0 = Dropout(dropout)
self.fc_1 = nn.Linear(hidden_size, input_size)
self.act_1 = GeLU()
self.fc_2 = nn.Linear(input_size, output_size)
elif net_type == "2layer":
self.fc_0 = nn.Linear(input_size, hidden_size)
self.act_0 = GeLU()
# self.do_0 = Dropout(dropout)
self.fc_1 = nn.Linear(hidden_size, hidden_size // 2)
self.act_1 = GeLU()
self.fc_2 = nn.Linear(hidden_size // 2, output_size)
elif net_type == "1layer":
self.fc_0 = nn.Linear(input_size, hidden_size)
self.act_0 = GeLU()
self.fc_1 = nn.Linear(hidden_size, output_size)
def forward(self, x):
if self.net_type == "res":
x1 = self.act_0(self.fc_0(x))
x1 = self.act_1(self.fc_1(x1) + x)
return self.fc_2(x1)
elif self.net_type == "2layer":
x = self.act_0(self.fc_0(x))
x = self.act_1(self.fc_1(x))
x = self.fc_2(x)  # fc_2, not fc_1 again: fc_1 has already mapped hidden_size to hidden_size // 2 above
return x
else:
x = self.fc_1(self.act_0(self.fc_0(x)))
return x
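# Illustrative shapes for the three head types above (example sizes assumed):
#   head = Header(input_size=64, hidden_size=32, output_size=21, net_type="res")
#   head(torch.randn(8, 64)).shape   # -> torch.Size([8, 21]); "2layer" and "1layer" give the same output shape.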
class SharedAC(nn.Module):
"""The actor-critic module shared with multiple agents.
This module maps the input graph of the observation to the policy and value space. It first extracts the temporal
information separately for each node with a small transformer block and then extracts the spatial information with
a multi-graph/channel graph attention. Finally, the extracted feature embedding is fed to a actor header as well
as a critic layer, which are the two MLPs with residual connections.
"""
def __init__(
self, input_dim_p, edge_dim_p, input_dim_v, edge_dim_v, tick_buffer, action_dim, a=True, c=True,
scale=4, ac_head="res"):
super().__init__()
assert(a or c)
self.a, self.c = a, c
self.input_dim_v = input_dim_v
self.input_dim_p = input_dim_p
self.tick_buffer = tick_buffer
self.pre_dim_v, self.pre_dim_p = 8 * scale, 16 * scale
self.p_pre_layer = nn.Sequential(
nn.Linear(input_dim_p, self.pre_dim_p), GeLU(), PositionalEncoder(
d_model=self.pre_dim_p, max_seq_len=tick_buffer))
self.v_pre_layer = nn.Sequential(
nn.Linear(input_dim_v, self.pre_dim_v), GeLU(), PositionalEncoder(
d_model=self.pre_dim_v, max_seq_len=tick_buffer))
p_encoder_layer = TransformerEncoderLayer(
d_model=self.pre_dim_p, nhead=4, activation="gelu", dim_feedforward=self.pre_dim_p * 4)
v_encoder_layer = TransformerEncoderLayer(
d_model=self.pre_dim_v, nhead=2, activation="gelu", dim_feedforward=self.pre_dim_v * 4)
# Alternative initialization: define the normalization.
# self.trans_layer_p = TransformerEncoder(p_encoder_layer, num_layers=3, norm=Norm(self.pre_dim_p))
# self.trans_layer_v = TransformerEncoder(v_encoder_layer, num_layers=3, norm=Norm(self.pre_dim_v))
self.trans_layer_p = TransformerEncoder(p_encoder_layer, num_layers=3)
self.trans_layer_v = TransformerEncoder(v_encoder_layer, num_layers=3)
self.gnn_output_size = 32 * scale
self.trans_gat = SimpleTransformer(
p_dim=self.pre_dim_p,
v_dim=self.pre_dim_v,
output_size=self.gnn_output_size // 2,
edge_dim={"p": edge_dim_p, "v": edge_dim_v},
layer_num=2
)
if a:
self.policy_hidden_size = 16 * scale
self.a_input = 3 * self.gnn_output_size // 2
self.actor = nn.Sequential(
Header(self.a_input, self.policy_hidden_size, action_dim, ac_head), nn.Softmax(dim=-1))
if c:
self.value_hidden_size = 16 * scale
self.c_input = self.gnn_output_size
self.critic = Header(self.c_input, self.value_hidden_size, 1, ac_head)
def forward(self, state, a=False, p_idx=None, v_idx=None, c=False):
assert((a and p_idx is not None and v_idx is not None) or c)
feature_p, feature_v = state["p"], state["v"]
tb, bsize, p_cnt, _ = feature_p.shape
v_cnt = feature_v.shape[2]
assert(tb == self.tick_buffer)
# Before: feature_p.shape: (tick_buffer, batch_size, p_cnt, p_dim)
# After: feature_p.shape: (tick_buffer, batch_size*p_cnt, p_dim)
feature_p = self.p_pre_layer(feature_p.reshape(feature_p.shape[0], -1, feature_p.shape[-1]))
# state["mask"]: (batch_size, tick_buffer)
# mask_p: (batch_size, p_cnt, tick_buffer)
mask_p = state["mask"].repeat(1, p_cnt).reshape(-1, self.tick_buffer)
feature_p = self.trans_layer_p(feature_p, src_key_padding_mask=mask_p)
feature_v = self.v_pre_layer(feature_v.reshape(feature_v.shape[0], -1, feature_v.shape[-1]))
mask_v = state["mask"].repeat(1, v_cnt).reshape(-1, self.tick_buffer)
feature_v = self.trans_layer_v(feature_v, src_key_padding_mask=mask_v)
feature_p = feature_p[0].reshape(bsize, p_cnt, self.pre_dim_p)
feature_v = feature_v[0].reshape(bsize, v_cnt, self.pre_dim_v)
emb_p, emb_v = self.trans_gat(feature_p, state["pe"], feature_v, state["ve"], state["ppe"])
a_rtn, c_rtn = None, None
if a and self.a:
ap = emb_p.reshape(bsize, p_cnt, self.gnn_output_size)
ap = ap[:, p_idx, :]
av = emb_v.reshape(bsize, v_cnt, self.gnn_output_size // 2)
av = av[:, v_idx, :]
emb_a = torch.cat((ap, av), axis=1)
a_rtn = self.actor(emb_a)
if c and self.c:
c_rtn = self.critic(emb_p).reshape(bsize, p_cnt)
return a_rtn, c_rtn


@ -1,235 +0,0 @@
import numpy as np
from maro.rl.shaping.state_shaper import StateShaper
from .utils import compute_v2p_degree_matrix
class GNNStateShaper(StateShaper):
"""State shaper to extract graph information.
Args:
port_code_list (list): The list of the port codes in the CIM topology.
vessel_code_list (list): The list of the vessel codes in the CIM topology.
max_tick (int): The duration of the simulation.
feature_config (dict): The dottable dict that stores the configuration of the observation feature.
max_value (int): The norm scale. All the features are simply divided by this number.
tick_buffer (int): The length of the temporal window, i.e. the number of past ticks kept in the state.
only_demo (bool): Whether the shaper instance is used only for shape demonstration (True) or for runtime
shaping (False).
"""
def __init__(
self, port_code_list, vessel_code_list, max_tick, feature_config, max_value=100000, tick_buffer=20,
only_demo=False):
# Collect and encode all ports.
self.port_code_list = list(port_code_list)
self.port_cnt = len(self.port_code_list)
self.port_code_inv_dict = {code: i for i, code in enumerate(self.port_code_list)}
# Collect and encode all vessels.
self.vessel_code_list = list(vessel_code_list)
self.vessel_cnt = len(self.vessel_code_list)
self.vessel_code_inv_dict = {code: i for i, code in enumerate(self.vessel_code_list)}
# Collect and encode ports and vessels together.
self.node_code_inv_dict_p = {i: i for i in self.port_code_list}
self.node_code_inv_dict_v = {i: i + self.port_cnt for i in self.vessel_code_list}
self.node_cnt = self.port_cnt + self.vessel_cnt
one_hot_coding = np.identity(self.node_cnt)
self.port_one_hot_coding = np.expand_dims(one_hot_coding[:self.port_cnt], axis=0)
self.vessel_one_hot_coding = np.expand_dims(one_hot_coding[self.port_cnt:], axis=0)
self.last_tick = -1
self.port_features = [
"empty", "full", "capacity", "on_shipper", "on_consignee", "booking", "acc_booking", "shortage",
"acc_shortage", "fulfillment", "acc_fulfillment"]
self.vessel_features = ["empty", "full", "capacity", "remaining_space"]
self._max_tick = max_tick
self._tick_buffer = tick_buffer
# Sentinel value used to mark that a vessel will never arrive at a port.
self.max_arrival_time = 99999999
self.vedge_dim = 2
self.pedge_dim = 1
self._only_demo = only_demo
self._feature_config = feature_config
self._normalize = True
self._norm_scale = 2.0 / max_value
if not only_demo:
self._state_dict = {
# Last "tick" is used for embedding, all zero and never be modified.
"v": np.zeros((self._max_tick + 1, self.vessel_cnt, self.get_input_dim("v"))),
"p": np.zeros((self._max_tick + 1, self.port_cnt, self.get_input_dim("p"))),
"vo": np.zeros((self._max_tick + 1, self.vessel_cnt, self.port_cnt), dtype=np.int),
"po": np.zeros((self._max_tick + 1, self.port_cnt, self.vessel_cnt), dtype=np.int),
"vedge": np.zeros((self._max_tick + 1, self.vessel_cnt, self.port_cnt, self.get_input_dim("vedge"))),
"pedge": np.zeros((self._max_tick + 1, self.port_cnt, self.vessel_cnt, self.get_input_dim("vedge"))),
"ppedge": np.zeros((self._max_tick + 1, self.port_cnt, self.port_cnt, self.get_input_dim("pedge"))),
}
# Fixed order: in the order of degree.
def compute_static_graph_structure(self, env):
v2p_adj_matrix = compute_v2p_degree_matrix(env)
p2p_adj_matrix = np.dot(v2p_adj_matrix.T, v2p_adj_matrix)
p2p_adj_matrix[p2p_adj_matrix == 0] = self.max_arrival_time
np.fill_diagonal(p2p_adj_matrix, self.max_arrival_time)
self._p2p_embedding = self.sort(p2p_adj_matrix)
v2p_adj_matrix = -v2p_adj_matrix
v2p_adj_matrix[v2p_adj_matrix == 0] = self.max_arrival_time
self._fixed_v_order = self.sort(v2p_adj_matrix)
self._fixed_p_order = self.sort(v2p_adj_matrix.T)
@property
def p2p_static_graph(self):
return self._p2p_embedding
def sort(self, arrival_time, attr=None):
"""
Given the arrival time matrix, this function sorts each row and returns the index matrix in the order of
arrival time.
"""
n, m = arrival_time.shape
if self._feature_config.attention_order == "ramdom":
arrival_time = arrival_time + np.random.randint(self._max_tick, size=arrival_time.shape)
at_index = np.argsort(arrival_time, axis=1)
if attr is not None:
idx_tmp = np.repeat(at_index, attr.shape[-1]).reshape(*at_index.shape, attr.shape[-1])
attr = np.take_along_axis(attr, idx_tmp, axis=1)
mask = np.sort(arrival_time, axis=1) >= self.max_arrival_time
at_index += 1
at_index[mask] = 0
if attr is None:
return at_index
else:
return at_index, attr
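# Illustrative result of sort() above (values assumed), with max_arrival_time = 99999999:
#   sort(np.array([[5, 99999999, 2]]))  ->  array([[3, 1, 0]])
# Neighbors are listed by increasing arrival time; indices are shifted by +1 so that 0 can mark
# padded positions (neighbors that never arrive).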
def end_ep_callback(self, snapshot_list):
if self._only_demo:
return
tick_range = np.arange(start=self.last_tick, stop=self._max_tick)
self._sync_raw_features(snapshot_list, list(tick_range))
self.last_tick = -1
def _sync_raw_features(self, snapshot_list, tick_range, static_code=None, dynamic_code=None):
"""This function update the state_dict from snapshot_list in the given tick_range."""
if len(tick_range) == 0:
# This occurs when two actions happen at the same tick.
return
# One dim features.
port_naive_feature = snapshot_list["ports"][tick_range: self.port_code_list: self.port_features] \
.reshape(len(tick_range), self.port_cnt, -1)
# Number of laden containers from source to destination.
full_on_port = snapshot_list["matrices"][tick_range::"full_on_ports"].reshape(
len(tick_range), self.port_cnt, self.port_cnt)
# Normalize features to a small range.
port_state_mat = self.normalize(port_naive_feature)
if self._feature_config.onehot_identity:
# Add onehot vector to identify port and vessel.
port_onehot = np.repeat(self.port_one_hot_coding, len(tick_range), axis=0)
if static_code is not None and dynamic_code is not None:
# Identify the decision vessel at the decision port.
port_onehot[-1, self.port_code_inv_dict[static_code], self.node_code_inv_dict_v[dynamic_code]] = -1
port_state_mat = np.concatenate([port_state_mat, port_onehot], axis=2)
self._state_dict["p"][tick_range] = port_state_mat
vessel_naive_feature = snapshot_list["vessels"][tick_range:self.vessel_code_list: self.vessel_features] \
.reshape(len(tick_range), self.vessel_cnt, -1)
full_on_vessel = snapshot_list["matrices"][tick_range::"full_on_vessels"].reshape(
len(tick_range), self.vessel_cnt, self.port_cnt)
vessel_state_mat = self.normalize(vessel_naive_feature)
if self._feature_config.onehot_identity:
vessel_state_mat = np.concatenate(
[vessel_state_mat, np.repeat(self.vessel_one_hot_coding, len(tick_range), axis=0)], axis=2)
self._state_dict["v"][tick_range] = vessel_state_mat
# last_arrival_time.shape: vessel_cnt * port_cnt
# -1 means one vessel never stops at the port
vessel_arrival_time = snapshot_list["matrices"][tick_range[-1]:: "vessel_plans"].reshape(
self.vessel_cnt, self.port_cnt)
# Use an infinite time to mark vessels that never arrive at the port.
last_arrival_time = vessel_arrival_time + 1
last_arrival_time[last_arrival_time == 0] = self.max_arrival_time
if static_code is not None and dynamic_code is not None:
# To differentiate the vessel acting at the port from other vessels that have already acted or are waiting to act.
last_arrival_time[self.vessel_code_inv_dict[dynamic_code], self.port_code_inv_dict[static_code]] = 0
# Here, we assume that the order of arrival times stays the same between two actions/events.
vedge_raw = self.normalize(np.stack((full_on_vessel[-1], last_arrival_time), axis=-1))
vo, vedge = self.sort(last_arrival_time, attr=vedge_raw)
po, pedge = self.sort(last_arrival_time.T, attr=vedge_raw.transpose((1, 0, 2)))
self._state_dict["vo"][tick_range] = np.expand_dims(vo, axis=0)
self._state_dict["vedge"][tick_range] = np.expand_dims(vedge, axis=0)
self._state_dict["po"][tick_range] = np.expand_dims(po, axis=0)
self._state_dict["pedge"][tick_range] = np.expand_dims(pedge, axis=0)
self._state_dict["ppedge"][tick_range] = self.normalize(full_on_port[-1]).reshape(1, *full_on_port[-1].shape, 1)
def __call__(self, action_info=None, snapshot_list=None, tick=None):
if self._only_demo:
return
assert((action_info is not None and snapshot_list is not None) or tick is not None)
if action_info is not None and snapshot_list is not None:
# Update the state dict.
static_code = action_info.port_idx
dynamic_code = action_info.vessel_idx
if self.last_tick == action_info.tick:
tick_range = [action_info.tick]
else:
tick_range = list(range(self.last_tick + 1, action_info.tick + 1, 1))
self.last_tick = action_info.tick
self._sync_raw_features(snapshot_list, tick_range, static_code, dynamic_code)
tick = action_info.tick
# state_tick_range is in reverse (descending) order.
state_tick_range = np.arange(tick, max(-1, tick - self._tick_buffer), -1)
v = np.zeros((self._tick_buffer, self.vessel_cnt, self.get_input_dim("v")))
v[:len(state_tick_range)] = self._state_dict["v"][state_tick_range]
p = np.zeros((self._tick_buffer, self.port_cnt, self.get_input_dim("p")))
p[:len(state_tick_range)] = self._state_dict["p"][state_tick_range]
# True means padding.
mask = np.ones(self._tick_buffer, dtype=np.bool)
mask[:len(state_tick_range)] = False
ret = {
"tick": state_tick_range,
"v": v,
"p": p,
"vo": self._state_dict["vo"][tick],
"po": self._state_dict["po"][tick],
"vedge": self._state_dict["vedge"][tick],
"pedge": self._state_dict["pedge"][tick],
"ppedge": self._state_dict["ppedge"][tick],
"mask": mask,
"len": len(state_tick_range),
}
return ret
def normalize(self, feature):
if not self._normalize:
return feature
return feature * self._norm_scale
def get_input_dim(self, agent_code):
if agent_code in self.port_code_inv_dict or agent_code == "p":
return len(self.port_features) + (self.node_cnt if self._feature_config.onehot_identity else 0)
elif agent_code in self.vessel_code_inv_dict or agent_code == "v":
return len(self.vessel_features) + (self.node_cnt if self._feature_config.onehot_identity else 0)
elif agent_code == "vedge":
# v-p edge: (arrival_time, laden to destination)
return 2
elif agent_code == "pedge":
# p-p edge: (laden to destination, )
return 1
else:
raise ValueError("agent not exist!")


@ -1,266 +0,0 @@
import ast
import io
import os
import random
import shutil
import sys
from collections import OrderedDict, defaultdict
import numpy as np
import torch
import yaml
from maro.simulator import Env
from maro.simulator.scenarios.cim.common import Action
from maro.utils import clone, convert_dottable
def compute_v2p_degree_matrix(env):
"""This function compute the adjacent matrix."""
topo_config = env.configs
static_dict = env.summary["node_mapping"]["ports"]
dynamic_dict = env.summary["node_mapping"]["vessels"]
adj_matrix = np.zeros((len(dynamic_dict), len(static_dict)), dtype=np.int)
for v, vinfo in topo_config["vessels"].items():
route_name = vinfo["route"]["route_name"]
route = topo_config["routes"][route_name]
vid = dynamic_dict[v]
for p in route:
adj_matrix[vid][static_dict[p["port_name"]]] += 1
return adj_matrix
def from_numpy(device, *np_values):
return [torch.from_numpy(v).to(device) for v in np_values]
def gnn_union(p, po, pedge, v, vo, vedge, p2p, ppedge, seq_mask, device):
"""Union multiple graph in CIM.
Args:
v: Numpy array of shape (seq_len, batch, v_cnt, v_dim).
vo: Numpy array of shape (batch, v_cnt, p_cnt).
vedge: Numpy array of shape (batch, v_cnt, p_cnt, e_dim).
Returns:
result (dict): The dictionary that describes the graph.
"""
seq_len, batch, v_cnt, v_dim = v.shape
_, _, p_cnt, p_dim = p.shape
p, po, pedge, v, vo, vedge, p2p, ppedge, seq_mask = from_numpy(
device, p, po, pedge, v, vo, vedge, p2p, ppedge, seq_mask)
batch_range = torch.arange(batch, dtype=torch.long).to(device)
# vadj.shape: (batch*v_cnt, p_cnt*)
vadj, vedge = flatten_embedding(vo, batch_range, vedge)
# vmask.shape: (batch*v_cnt, p_cnt*)
vmask = vadj == 0
# vadj.shape: (p_cnt*, batch*v_cnt)
vadj = vadj.transpose(0, 1)
# vedge.shape: (p_cnt*, batch*v_cnt, e_dim)
vedge = vedge.transpose(0, 1)
padj, pedge = flatten_embedding(po, batch_range, pedge)
pmask = padj == 0
padj = padj.transpose(0, 1)
pedge = pedge.transpose(0, 1)
p2p_adj = p2p.repeat(batch, 1, 1)
# p2p_adj.shape: (batch*p_cnt, p_cnt*)
p2p_adj, ppedge = flatten_embedding(p2p_adj, batch_range, ppedge)
# p2p_mask.shape: (batch*p_cnt, p_cnt*)
p2p_mask = p2p_adj == 0
# p2p_adj.shape: (p_cnt*, batch*p_cnt)
p2p_adj = p2p_adj.transpose(0, 1)
ppedge = ppedge.transpose(0, 1)
return {
"v": v,
"p": p,
"pe": {
"edge": pedge,
"adj": padj,
"mask": pmask,
},
"ve": {
"edge": vedge,
"adj": vadj,
"mask": vmask,
},
"ppe": {
"edge": ppedge,
"adj": p2p_adj,
"mask": p2p_mask,
},
"mask": seq_mask,
}
def flatten_embedding(embedding, batch_range, edge=None):
if len(embedding.shape) == 3:
batch, x_cnt, y_cnt = embedding.shape
addon = (batch_range * y_cnt).view(batch, 1, 1)
else:
seq_len, batch, x_cnt, y_cnt = embedding.shape
addon = (batch_range * y_cnt).view(seq_len, batch, 1, 1)
embedding_mask = embedding == 0
embedding += addon
embedding[embedding_mask] = 0
ret = embedding.reshape(-1, embedding.shape[-1])
col_mask = ret.sum(dim=0) != 0
ret = ret[:, col_mask]
if edge is None:
return ret
else:
edge = edge.reshape(-1, *edge.shape[2:])[:, col_mask, :]
return ret, edge
def log2json(file_path):
"""load the log file as a json list."""
with open(file_path, "r") as fp:
lines = fp.read().splitlines()
json_list = "[" + ",".join(lines) + "]"
return ast.literal_eval(json_list)
def decision_cnt_analysis(env, pv=False, buffer_size=8):
if not pv:
decision_cnt = [buffer_size] * len(env.node_name_mapping["static"])
r, pa, is_done = env.step(None)
while not is_done:
decision_cnt[pa.port_idx] += 1
action = Action(pa.vessel_idx, pa.port_idx, 0)
r, pa, is_done = env.step(action)
else:
decision_cnt = OrderedDict()
r, pa, is_done = env.step(None)
while not is_done:
if (pa.port_idx, pa.vessel_idx) not in decision_cnt:
decision_cnt[pa.port_idx, pa.vessel_idx] = buffer_size
else:
decision_cnt[pa.port_idx, pa.vessel_idx] += 1
action = Action(pa.vessel_idx, pa.port_idx, 0)
r, pa, is_done = env.step(action)
env.reset()
return decision_cnt
def random_shortage(env, tick, action_dim=21):
_, pa, is_done = env.step(None)
node_cnt = len(env.summary["node_mapping"]["ports"])
while not is_done:
"""
load, discharge = pa.action_scope.load, pa.action_scope.discharge
action_idx = np.random.randint(action_dim) - zero_idx
if action_idx < 0:
actual_action = int(1.0*action_idx/zero_idx*load)
else:
actual_action = int(1.0*action_idx/zero_idx*discharge)
"""
action = Action(pa.vessel_idx, pa.port_idx, 0)
r, pa, is_done = env.step(action)
shs = env.snapshot_list["ports"][tick - 1:list(range(node_cnt)):"acc_shortage"]
fus = env.snapshot_list["ports"][tick - 1:list(range(node_cnt)):"acc_fulfillment"]
env.reset()
return fus - shs, np.sum(shs + fus)
def return_scaler(env, tick, gamma, action_dim=21):
R, tot_amount = random_shortage(env, tick, action_dim)
Rs_mean = np.mean(R) / tick / (1 - gamma)
return abs(1.0 / Rs_mean), tot_amount
def load_config(config_pth):
with io.open(config_pth, "r") as in_file:
raw_config = yaml.safe_load(in_file)
config = convert_dottable(raw_config)
if config.env.seed < 0:
config.env.seed = random.randint(0, 99999)
regularize_config(config)
return config
def save_config(config, config_pth):
with open(config_pth, "w") as fp:
config = dottable2dict(config)
config["env"]["exp_per_ep"] = [f"{k[0]}, {k[1]}, {d}" for k, d in config["env"]["exp_per_ep"].items()]
yaml.safe_dump(config, fp)
def dottable2dict(config):
if isinstance(config, float):
return str(config)
if not isinstance(config, dict):
return clone(config)
rt = {}
for k, v in config.items():
rt[k] = dottable2dict(v)
return rt
def save_code(folder, save_pth):
save_path = os.path.join(save_pth, "code")
code_pth = os.path.join(os.getcwd(), folder)
shutil.copytree(code_pth, save_path)
def fix_seed(env, seed):
env.set_seed(seed)
np.random.seed(seed)
random.seed(seed)
def zero_play(**args):
env = Env(**args)
_, pa, is_done = env.step(None)
while not is_done:
action = Action(pa.vessel_idx, pa.port_idx, 0)
r, pa, is_done = env.step(action)
return env.snapshot_list
def regularize_config(config):
def parse_value(v):
try:
return int(v)
except ValueError:
try:
return float(v)
except ValueError:
if v == "false" or v == "False":
return False
elif v == "true" or v == "True":
return True
else:
return v
def set_attr(config, attrs, value):
if len(attrs) == 1:
config[attrs[0]] = value
else:
set_attr(config[attrs[0]], attrs[1:], value)
all_args = sys.argv[1:]
for i in range(len(all_args) // 2):
name = all_args[i * 2]
attrs = name[2:].split(".")
value = parse_value(all_args[i * 2 + 1])
set_attr(config, attrs, value)
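# Illustrative command-line override handled by regularize_config above (script name and values assumed):
#   python launcher.py --env.seed 7 --training.batch_size 32
# Each "--a.b.c value" pair is parsed into int/float/bool where possible and written to config.a.b.c.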
def analysis_speed(env):
speed_dict = defaultdict(int)
eq_speed = 0
for ves in env.configs["vessels"].values():
speed_dict[ves["sailing"]["speed"]] += 1
for sp, cnt in speed_dict.items():
eq_speed += 1.0 * cnt / sp
eq_speed = 1.0 / eq_speed
return speed_dict, eq_speed


@ -1,36 +0,0 @@
env:
seed: 10
param:
durations: 1120
scenario: "cim"
topology: "global_trade.22p_l0.8"
# topology: "toy.4p_ssdd_l0.0"
training:
enable: True
parallel_cnt: 1
device: "cpu"
batch_size: 16
shuffle_time: 1
rollout_cnt: 500
train_freq: 1
model_save_freq: 1
gamma: 0.99
learning_rate: 0.00005
td_steps: 100
entropy_loss_enable: True
model:
path: "./"
tick_buffer: 20
hidden_size: 32
graph_output_dim: 32
action_dim: 21
feature:
# temporal or random, if temporal, the edges in the graph are listed in the order of event time, else in a
# random order.
attention_order: temporal
onehot_identity: False
log:
path: "./"
exp:
enable: false
freq: 10


@ -1,70 +0,0 @@
import datetime
import os
from maro.simulator import Env
from maro.utils import Logger
from components import (
GNNLearner, GNNStateShaper, ParallelActor, SimpleAgentManger,
decision_cnt_analysis, load_config, return_scaler, save_code, save_config
)
if __name__ == "__main__":
real_path = os.path.split(os.path.realpath(__file__))[0]
config_path = os.path.join(real_path, "config.yml")
config = load_config(config_path)
# Generate log path.
date_str = datetime.datetime.now().strftime("%Y%m%d")
time_str = datetime.datetime.now().strftime("%H%M%S.%f")
subfolder_name = f"{config.env.param.topology}_{time_str}"
# Log path.
config.log.path = os.path.join(config.log.path, date_str, subfolder_name)
if not os.path.exists(config.log.path):
os.makedirs(config.log.path)
simulation_logger = Logger(tag="simulation", dump_folder=config.log.path, dump_mode="w", auto_timestamp=False)
# Create a demo environment to retrieve environment information.
simulation_logger.info("Approximating the experience quantity of each agent...")
demo_env = Env(**config.env.param)
config.env.exp_per_ep = decision_cnt_analysis(demo_env, pv=True, buffer_size=8)
simulation_logger.info(config.env.exp_per_ep)
# Add some buffer to prevent overlapping.
config.env.return_scaler, tot_order_amount = return_scaler(
demo_env, tick=config.env.param.durations, gamma=config.training.gamma)
simulation_logger.info(f"Return value will be scaled down by the factor {config.env.return_scaler}")
save_config(config, os.path.join(config.log.path, "config.yml"))
save_code("examples/cim/gnn", config.log.path)
port_mapping = demo_env.summary["node_mapping"]["ports"]
vessel_mapping = demo_env.summary["node_mapping"]["vessels"]
# Create a mock gnn_state_shaper.
static_code_list, dynamic_code_list = list(port_mapping.values()), list(vessel_mapping.values())
gnn_state_shaper = GNNStateShaper(
static_code_list, dynamic_code_list, config.env.param.durations, config.model.feature,
tick_buffer=config.model.tick_buffer, only_demo=True, max_value=demo_env.configs["total_containers"])
gnn_state_shaper.compute_static_graph_structure(demo_env)
# Create and assemble agent_manager.
agent_id_list = list(config.env.exp_per_ep.keys())
training_logger = Logger(tag="training", dump_folder=config.log.path, dump_mode="w", auto_timestamp=False)
agent_manager = SimpleAgentManger(
"CIM-GNN-manager", agent_id_list, static_code_list, dynamic_code_list, demo_env, gnn_state_shaper,
training_logger)
agent_manager.assemble(config)
# Create the rollout actor to collect experience.
actor = ParallelActor(config, demo_env, gnn_state_shaper, agent_manager, logger=simulation_logger)
# Learner function for training and testing.
learner = GNNLearner(actor, agent_manager, logger=simulation_logger)
learner.learn(config.training)
# Cancel all the child process used for rollout.
actor.exit()


@ -1,22 +0,0 @@
# Overview
The CIM problem is one of the quintessential use cases of MARO. The example can
be run with a set of scenario configurations that can be found under
maro/simulator/scenarios/cim. General experimental parameters (e.g., type of
topology, type of algorithm to use, number of training episodes) can be configured
through config.yml. Each RL formulation has a dedicated folder, e.g., dqn, and
all algorithm-specific parameters can be configured through
the config.py file in that folder.
## Single-host Single-process Mode
To run the CIM example using the DQN algorithm under single-host mode, go to
examples/cim/dqn and run single_process_launcher.py. You may play around with
the configuration if you want to try out different settings.
## Distributed Mode
The examples/cim/dqn/components folder contains dist_learner.py and dist_actor.py
for distributed training. For debugging purposes, we provide a script that
simulates distributed mode using multi-processing. Simply go to examples/cim/dqn
and run multi_process_launcher.py to start the learner and actor processes.
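For reference, here is a minimal, illustrative sketch of driving the multi-process debug launch programmatically; the `debug_launch` wrapper, the group name and the actor count below are hypothetical, while the two positional arguments mirror those accepted by multi_process_launcher.py shown later in this diff.
import subprocess
def debug_launch(group_name: str = "cim_dqn_debug", num_actors: int = 2) -> None:
    # Spawn the launcher script, which in turn starts one learner process and
    # `num_actors` actor processes in the background.
    subprocess.run(
        ["python", "multi_process_launcher.py", group_name, str(num_actors)],
        cwd="examples/cim/dqn",
        check=True,
    )
if __name__ == "__main__":
    debug_launch()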


@ -1,14 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from .action_shaper import CIMActionShaper
from .agent_manager import POAgentManager, create_po_agents
from .experience_shaper import TruncatedExperienceShaper
from .state_shaper import CIMStateShaper
__all__ = [
"CIMActionShaper",
"POAgentManager", "create_po_agents",
"TruncatedExperienceShaper",
"CIMStateShaper"
]


@ -1,33 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from maro.rl import ActionShaper
from maro.simulator.scenarios.cim.common import Action
class CIMActionShaper(ActionShaper):
def __init__(self, action_space):
super().__init__()
self._action_space = action_space
self._zero_action_index = action_space.index(0)
def __call__(self, model_action, decision_event, snapshot_list):
scope = decision_event.action_scope
tick = decision_event.tick
port_idx = decision_event.port_idx
vessel_idx = decision_event.vessel_idx
port_empty = snapshot_list["ports"][tick: port_idx: ["empty", "full", "on_shipper", "on_consignee"]][0]
vessel_remaining_space = snapshot_list["vessels"][tick: vessel_idx: ["empty", "full", "remaining_space"]][2]
early_discharge = snapshot_list["vessels"][tick:vessel_idx: "early_discharge"][0]
assert 0 <= model_action < len(self._action_space)
if model_action < self._zero_action_index:
actual_action = max(round(self._action_space[model_action] * port_empty), -vessel_remaining_space)
elif model_action > self._zero_action_index:
plan_action = self._action_space[model_action] * (scope.discharge + early_discharge) - early_discharge
actual_action = round(plan_action) if plan_action > 0 else round(self._action_space[model_action] * scope.discharge)
else:
actual_action = 0
return Action(vessel_idx, port_idx, actual_action)


@ -1,83 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import numpy as np
import torch.nn as nn
from torch.optim import Adam, RMSprop
from maro.rl import (
AbsAgent, ActorCritic, ActorCriticConfig, FullyConnectedBlock, LearningModel, NNStack,
OptimizerOptions, PolicyGradient, PolicyOptimizationConfig, SimpleAgentManager
)
from maro.utils import set_seeds
class POAgent(AbsAgent):
def train(self, states: np.ndarray, actions: np.ndarray, log_action_prob: np.ndarray, rewards: np.ndarray):
self._algorithm.train(states, actions, log_action_prob, rewards)
def create_po_agents(agent_id_list, config):
input_dim, num_actions = config.input_dim, config.num_actions
set_seeds(config.seed)
agent_dict = {}
for agent_id in agent_id_list:
actor_net = NNStack(
"actor",
FullyConnectedBlock(
input_dim=input_dim,
output_dim=num_actions,
activation=nn.Tanh,
is_head=True,
**config.actor_model
)
)
if config.type == "actor_critic":
critic_net = NNStack(
"critic",
FullyConnectedBlock(
input_dim=config.input_dim,
output_dim=1,
activation=nn.LeakyReLU,
is_head=True,
**config.critic_model
)
)
hyper_params = config.actor_critic_hyper_parameters
hyper_params.update({"reward_discount": config.reward_discount})
learning_model = LearningModel(
actor_net, critic_net,
optimizer_options={
"actor": OptimizerOptions(cls=Adam, params=config.actor_optimizer),
"critic": OptimizerOptions(cls=RMSprop, params=config.critic_optimizer)
}
)
algorithm = ActorCritic(
learning_model, ActorCriticConfig(critic_loss_func=nn.SmoothL1Loss(), **hyper_params)
)
else:
learning_model = LearningModel(
actor_net,
optimizer_options=OptimizerOptions(cls=Adam, params=config.actor_optimizer)
)
algorithm = PolicyGradient(learning_model, PolicyOptimizationConfig(config.reward_discount))
agent_dict[agent_id] = POAgent(name=agent_id, algorithm=algorithm)
return agent_dict
class POAgentManager(SimpleAgentManager):
def train(self, experiences_by_agent: dict):
for agent_id, exp in experiences_by_agent.items():
if not isinstance(exp, list):
exp = [exp]
for trajectory in exp:
self.agent_dict[agent_id].train(
trajectory["state"],
trajectory["action"],
trajectory["log_action_probability"],
trajectory["reward"]
)


@ -1,19 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""
This file is used to load the configuration and convert it into a dotted dictionary.
"""
import io
import os
import yaml
CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../config.yml")
with io.open(CONFIG_PATH, "r") as in_file:
config = yaml.safe_load(in_file)
DISTRIBUTED_CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../distributed_config.yml")
with io.open(DISTRIBUTED_CONFIG_PATH, "r") as in_file:
distributed_config = yaml.safe_load(in_file)


@ -1,51 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from collections import defaultdict
import numpy as np
from maro.rl import ExperienceShaper
class TruncatedExperienceShaper(ExperienceShaper):
def __init__(self, *, time_window: int, time_decay_factor: float, fulfillment_factor: float,
shortage_factor: float):
super().__init__(reward_func=None)
self._time_window = time_window
self._time_decay_factor = time_decay_factor
self._fulfillment_factor = fulfillment_factor
self._shortage_factor = shortage_factor
def __call__(self, trajectory, snapshot_list):
agent_ids = np.asarray(trajectory.get_by_key("agent_id"))
states = np.asarray(trajectory.get_by_key("state"))
actions = np.asarray(trajectory.get_by_key("action"))
log_action_probabilities = np.asarray(trajectory.get_by_key("log_action_probability"))
rewards = np.fromiter(
map(self._compute_reward, trajectory.get_by_key("event"), [snapshot_list] * len(trajectory)),
dtype=np.float32
)
return {agent_id: {
"state": states[agent_ids == agent_id],
"action": actions[agent_ids == agent_id],
"log_action_probability": log_action_probabilities[agent_ids == agent_id],
"reward": rewards[agent_ids == agent_id],
}
for agent_id in set(agent_ids)}
def _compute_reward(self, decision_event, snapshot_list):
start_tick = decision_event.tick + 1
end_tick = decision_event.tick + self._time_window
ticks = list(range(start_tick, end_tick))
# Calculate the time-decayed reward over the time window.
future_fulfillment = snapshot_list["ports"][ticks::"fulfillment"]
future_shortage = snapshot_list["ports"][ticks::"shortage"]
decay_list = [self._time_decay_factor ** i for i in range(end_tick - start_tick)
for _ in range(future_fulfillment.shape[0]//(end_tick-start_tick))]
tot_fulfillment = np.dot(future_fulfillment, decay_list)
tot_shortage = np.dot(future_shortage, decay_list)
return np.float(self._fulfillment_factor * tot_fulfillment - self._shortage_factor * tot_shortage)
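To make the decay construction above concrete, here is a small self-contained sketch with made-up numbers (two ticks, two ports); it mirrors how decay_list repeats each decay factor once per port before the dot products.
import numpy as np
time_decay_factor, fulfillment_factor, shortage_factor = 0.97, 1.0, 1.0
# Flattened per-tick, per-port values in the same layout as the snapshot slices:
# the first two entries are tick 0 (port 0, port 1), the last two are tick 1.
future_fulfillment = np.array([5.0, 3.0, 4.0, 2.0])
future_shortage = np.array([1.0, 0.0, 2.0, 1.0])
num_ticks, num_ports = 2, 2
decay_list = [time_decay_factor ** i for i in range(num_ticks) for _ in range(num_ports)]
reward = fulfillment_factor * np.dot(future_fulfillment, decay_list) \
    - shortage_factor * np.dot(future_shortage, decay_list)
print(reward)  # (5 + 3) + 0.97 * (4 + 2) - ((1 + 0) + 0.97 * (2 + 1)) = 9.91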


@ -1,30 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import numpy as np
from maro.rl import StateShaper
PORT_ATTRIBUTES = ["empty", "full", "on_shipper", "on_consignee", "booking", "shortage", "fulfillment"]
VESSEL_ATTRIBUTES = ["empty", "full", "remaining_space"]
class CIMStateShaper(StateShaper):
def __init__(self, *, look_back, max_ports_downstream):
super().__init__()
self._look_back = look_back
self._max_ports_downstream = max_ports_downstream
self._dim = (look_back + 1) * (max_ports_downstream + 1) * len(PORT_ATTRIBUTES) + len(VESSEL_ATTRIBUTES)
def __call__(self, decision_event, snapshot_list):
tick, port_idx, vessel_idx = decision_event.tick, decision_event.port_idx, decision_event.vessel_idx
ticks = [tick - rt for rt in range(self._look_back - 1)]
future_port_idx_list = snapshot_list["vessels"][tick: vessel_idx: 'future_stop_list'].astype('int')
port_features = snapshot_list["ports"][ticks: [port_idx] + list(future_port_idx_list): PORT_ATTRIBUTES]
vessel_features = snapshot_list["vessels"][tick: vessel_idx: VESSEL_ATTRIBUTES]
state = np.concatenate((port_features, vessel_features))
return str(port_idx), state
@property
def dim(self):
return self._dim
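As a worked example of the dimension formula above, with the values used in the config that follows in this diff (look_back: 7, max_ports_downstream: 2), the flattened state vector has 171 entries.
look_back, max_ports_downstream = 7, 2
num_port_attrs, num_vessel_attrs = len(PORT_ATTRIBUTES), len(VESSEL_ATTRIBUTES)  # 7 and 3
dim = (look_back + 1) * (max_ports_downstream + 1) * num_port_attrs + num_vessel_attrs
print(dim)  # (7 + 1) * (2 + 1) * 7 + 3 == 171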


@ -1,50 +0,0 @@
env:
scenario: "cim"
topology: "toy.4p_ssdd_l0.0"
durations: 1120
state_shaping:
look_back: 7
max_ports_downstream: 2
experience_shaping:
time_window: 100
fulfillment_factor: 1.0
shortage_factor: 1.0
time_decay_factor: 0.97
main_loop:
max_episode: 100
early_stopping:
warmup_ep: 20
last_k: 5
perf_threshold: 0.95 # minimum performance (fulfillment ratio) required to trigger early stopping
perf_stability_threshold: 0.1 # stability is measured by the maximum of abs(perf_(i+1) - perf_i) / perf_i
# over the last k episodes (where perf is short for performance). This measure must
# be below the threshold to trigger early stopping.
agents:
seed: 1024 # for reproducibility
type: "actor_critic" # "actor_critic" or "policy_gradient"
num_actions: 21
actor_model:
hidden_dims:
- 256
- 128
- 64
softmax_enabled: true
batch_norm_enabled: false
actor_optimizer:
lr: 0.001
critic_model:
hidden_dims:
- 256
- 128
- 64
softmax_enabled: false
batch_norm_enabled: true
critic_optimizer:
lr: 0.001
reward_discount: .0
actor_critic_hyper_parameters:
train_iters: 10
actor_loss_coefficient: 0.1
k: 1
lam: 0.0
# clip_ratio: 0.8


@ -1,46 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import os
import numpy as np
from maro.simulator import Env
from maro.rl import AgentManagerMode, SimpleActor, ActorWorker
from maro.utils import convert_dottable
from components import CIMActionShaper, CIMStateShaper, POAgentManager, TruncatedExperienceShaper, create_po_agents
def launch(config):
config = convert_dottable(config)
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
state_shaper = CIMStateShaper(**config.env.state_shaping)
action_shaper = CIMActionShaper(action_space=list(np.linspace(-1.0, 1.0, config.agents.num_actions)))
experience_shaper = TruncatedExperienceShaper(**config.env.experience_shaping)
config["agents"]["input_dim"] = state_shaper.dim
agent_manager = POAgentManager(
name="cim_actor",
mode=AgentManagerMode.INFERENCE,
agent_dict=create_po_agents(agent_id_list, config.agents),
state_shaper=state_shaper,
action_shaper=action_shaper,
experience_shaper=experience_shaper,
)
proxy_params = {
"group_name": os.environ["GROUP"],
"expected_peers": {"learner": 1},
"redis_address": ("localhost", 6379)
}
actor_worker = ActorWorker(
local_actor=SimpleActor(env=env, agent_manager=agent_manager),
proxy_params=proxy_params
)
actor_worker.launch()
if __name__ == "__main__":
from components.config import config
launch(config)


@ -1,46 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import os
from maro.rl import ActorProxy, AgentManagerMode, Scheduler, SimpleLearner, merge_experiences_with_trajectory_boundaries
from maro.simulator import Env
from maro.utils import Logger, convert_dottable
from components import CIMStateShaper, POAgentManager, create_po_agents
def launch(config):
config = convert_dottable(config)
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
config["agents"]["input_dim"] = CIMStateShaper(**config.env.state_shaping).dim
agent_manager = POAgentManager(
name="cim_learner",
mode=AgentManagerMode.TRAIN,
agent_dict=create_po_agents(agent_id_list, config.agents)
)
proxy_params = {
"group_name": os.environ["GROUP"],
"expected_peers": {"actor": int(os.environ["NUM_ACTORS"])},
"redis_address": ("localhost", 6379)
}
learner = SimpleLearner(
agent_manager=agent_manager,
actor=ActorProxy(
proxy_params=proxy_params, experience_collecting_func=merge_experiences_with_trajectory_boundaries
),
scheduler=Scheduler(config.main_loop.max_episode),
logger=Logger("cim_learner", auto_timestamp=False)
)
learner.learn()
learner.test()
learner.dump_models(os.path.join(os.getcwd(), "models"))
learner.exit()
if __name__ == "__main__":
from components.config import config
launch(config)


@ -1,6 +0,0 @@
redis:
hostname: "localhost"
port: 6379
group: test_group
num_actors: 1
num_learners: 1


@ -1,26 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""
This script is used to debug the distributed algorithm in single-host multi-process mode.
"""
import argparse
import os
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("group_name", help="group name")
parser.add_argument("num_actors", type=int, help="number of actors")
args = parser.parse_args()
learner_path = f"{os.path.split(os.path.realpath(__file__))[0]}/dist_learner.py &"
actor_path = f"{os.path.split(os.path.realpath(__file__))[0]}/dist_actor.py &"
# Launch the learner process
os.system(f"GROUP={args.group_name} NUM_ACTORS={args.num_actors} python " + learner_path)
# Launch the actor processes
for _ in range(args.num_actors):
os.system(f"GROUP={args.group_name} python " + actor_path)


@ -1,91 +0,0 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import os
from statistics import mean
import numpy as np
from maro.simulator import Env
from maro.rl import AgentManagerMode, Scheduler, SimpleActor, SimpleLearner
from maro.utils import LogFormat, Logger, convert_dottable
from components import CIMActionShaper, CIMStateShaper, POAgentManager, TruncatedExperienceShaper, create_po_agents
class EarlyStoppingChecker:
"""Callable class that checks the performance history to determine early stopping.
Args:
warmup_ep (int): Episode from which early stopping checking is initiated.
last_k (int): Number of latest performance records to check for early stopping.
perf_threshold (float): The mean of the ``last_k`` performance metric values must be above this value to
trigger early stopping.
perf_stability_threshold (float): The maximum one-step change over the ``last_k`` performance metrics must be
below this value to trigger early stopping.
"""
def __init__(self, warmup_ep: int, last_k: int, perf_threshold: float, perf_stability_threshold: float):
self._warmup_ep = warmup_ep
self._last_k = last_k
self._perf_threshold = perf_threshold
self._perf_stability_threshold = perf_stability_threshold
def get_metric(record):
return 1 - record["container_shortage"] / record["order_requirements"]
self._metric_func = get_metric
def __call__(self, perf_history) -> bool:
if len(perf_history) < max(self._last_k, self._warmup_ep):
return False
metric_series = list(map(self._metric_func, perf_history[-self._last_k:]))
max_delta = max(
abs(metric_series[i] - metric_series[i - 1]) / metric_series[i - 1] for i in range(1, self._last_k)
)
print(f"mean_metric: {mean(metric_series)}, max_delta: {max_delta}")
return mean(metric_series) > self._perf_threshold and max_delta < self._perf_stability_threshold
def launch(config):
# First determine the input dimension and add it to the config.
config = convert_dottable(config)
# Step 1: initialize a CIM environment for using a toy dataset.
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
# Step 2: create state, action and experience shapers.
state_shaper = CIMStateShaper(**config.env.state_shaping)
action_shaper = CIMActionShaper(action_space=list(np.linspace(-1.0, 1.0, config.agents.num_actions)))
experience_shaper = TruncatedExperienceShaper(**config.env.experience_shaping)
# Step 3: create an agent manager.
config["agents"]["input_dim"] = state_shaper.dim
agent_manager = POAgentManager(
name="cim_learner",
mode=AgentManagerMode.TRAIN_INFERENCE,
agent_dict=create_po_agents(agent_id_list, config.agents),
state_shaper=state_shaper,
action_shaper=action_shaper,
experience_shaper=experience_shaper,
)
# Step 4: Create an actor and a learner to start the training process.
scheduler = Scheduler(
config.main_loop.max_episode,
early_stopping_checker=EarlyStoppingChecker(**config.main_loop.early_stopping)
)
actor = SimpleActor(env, agent_manager)
learner = SimpleLearner(
agent_manager, actor, scheduler,
logger=Logger("cim_learner", format_=LogFormat.simple, auto_timestamp=False)
)
learner.learn()
learner.test()
learner.dump_models(os.path.join(os.getcwd(), "models"))
if __name__ == "__main__":
from components.config import config
launch(config)
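A small, self-contained sketch of how the early stopping check above behaves, using the thresholds from the config shown earlier in this diff; the performance records are made up for illustration.
checker = EarlyStoppingChecker(
    warmup_ep=20, last_k=5, perf_threshold=0.95, perf_stability_threshold=0.1
)
# Each record mimics the metrics consumed by get_metric():
# metric = 1 - container_shortage / order_requirements.
stable_history = [{"container_shortage": 2, "order_requirements": 100}] * 25
print(checker(stable_history))       # True: mean metric 0.98 > 0.95, max delta 0.0 < 0.1
print(checker(stable_history[:10]))  # False: fewer episodes than warmup_ep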


@ -1,50 +1,69 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
# Enable realtime data streaming with following statements.
# import os
# os.environ["MARO_STREAMIT_ENABLED"] = "true"
# os.environ["MARO_STREAMIT_EXPERIMENT_NAME"] = "test_317"
from maro.simulator import Env
from maro.simulator.scenarios.cim.common import Action
from maro.simulator.scenarios.cim.common import Action, ActionType
from maro.streamit import streamit
start_tick = 0
durations = 100 # 100 days
if __name__ == "__main__":
start_tick = 0
durations = 100 # 100 days
opts = dict()
"""
The enable-dump-snapshot option tells the business engine to dump snapshot data before each reset.
If the value is left as an empty string, the dump goes to the current folder.
To obtain the dump data, uncomment the line below and specify a destination folder.
"""
# opts['enable-dump-snapshot'] = ''
opts = dict()
with streamit:
"""
The enable-dump-snapshot option tells the business engine to dump snapshot data before each reset.
If the value is left as an empty string, the dump goes to the current folder.
To obtain the dump data, uncomment the line below and specify a destination folder.
"""
# opts['enable-dump-snapshot'] = ''
# Initialize an environment with a specific scenario, related topology.
env = Env(scenario="cim", topology="toy.5p_ssddd_l0.0",
start_tick=start_tick, durations=durations, options=opts)
# Initialize an environment with a specific scenario, related topology.
env = Env(scenario="cim", topology="global_trade.22p_l0.1",
start_tick=start_tick, durations=durations, options=opts)
# Query environment summary, which includes business instances, intra-instance attributes, etc.
print(env.summary)
# Query environment summary, which includes business instances, intra-instance attributes, etc.
print(env.summary)
for ep in range(2):
# Tell streamit we are in a new episode.
streamit.episode(ep)
for ep in range(2):
# Gym-like step function
metrics, decision_event, is_done = env.step(None)
# Gym-like step function.
metrics, decision_event, is_done = env.step(None)
while not is_done:
past_week_ticks = [x for x in range(
decision_event.tick - 7, decision_event.tick)]
decision_port_idx = decision_event.port_idx
intr_port_infos = ["booking", "empty", "shortage"]
while not is_done:
past_week_ticks = [x for x in range(
max(decision_event.tick - 7, 0), decision_event.tick)]
decision_port_idx = decision_event.port_idx
intr_port_infos = ["booking", "empty", "shortage"]
# Query the decision port's booking, empty container inventory, and shortage information over the past week
past_week_info = env.snapshot_list["ports"][past_week_ticks:
decision_port_idx:
intr_port_infos]
# Query the decision port's booking, empty container inventory, and shortage information over the past week
past_week_info = env.snapshot_list["ports"][past_week_ticks:
decision_port_idx:
intr_port_infos]
dummy_action = Action(decision_event.vessel_idx,
decision_event.port_idx, 0)
dummy_action = Action(
decision_event.vessel_idx,
decision_event.port_idx,
0,
ActionType.LOAD
)
# Drive environment with dummy action (no repositioning)
metrics, decision_event, is_done = env.step(dummy_action)
# Drive environment with dummy action (no repositioning)
metrics, decision_event, is_done = env.step(dummy_action)
# Query the environment's business metrics at the end of an episode;
# they constitute your optimization objective (usually multi-objective).
print(f"ep: {ep}, environment metrics: {env.metrics}")
env.reset()
# Query the environment's business metrics at the end of an episode;
# they constitute your optimization objective (usually multi-objective).
print(f"ep: {ep}, environment metrics: {env.metrics}")
env.reset()


@ -18,16 +18,16 @@ def worker(group_name):
component_type="worker",
expected_peers={"master": 1})
counter = 0
print(f"{proxy.component_name}'s counter is {counter}.")
print(f"{proxy.name}'s counter is {counter}.")
# Receive messages from the proxy (non-recurring).
for msg in proxy.receive(is_continuous=False):
print(f"{proxy.component_name} receive message from {msg.source}.")
print(f"{proxy.name} receive message from {msg.source}.")
if msg.tag == "INC":
counter += 1
print(f"{proxy.component_name} receive INC request, {proxy.component_name}'s count is {counter}.")
proxy.reply(received_message=msg, tag="done")
print(f"{proxy.name} receive INC request, {proxy.name}'s count is {counter}.")
proxy.reply(message=msg, tag="done")
def master(group_name: str, worker_num: int, is_immediate: bool = False):
@ -55,17 +55,18 @@ def master(group_name: str, worker_num: int, is_immediate: bool = False):
session_type=SessionType.NOTIFICATION
)
# Do some tasks with higher priority here.
replied_msgs = proxy.receive_by_id(session_ids)
replied_msgs = proxy.receive_by_id(session_ids, timeout=-1)
else:
replied_msgs = proxy.broadcast(
component_type="worker",
tag="INC",
session_type=SessionType.NOTIFICATION
session_type=SessionType.NOTIFICATION,
timeout=-1
)
for msg in replied_msgs:
print(
f"{proxy.component_name} get receive notification from {msg.source} with "
f"{proxy.name} get receive notification from {msg.source} with "
f"message session stage {msg.session_stage}."
)


@ -22,11 +22,11 @@ def summation_worker(group_name):
# Receive messages from the proxy (non-recurring).
for msg in proxy.receive(is_continuous=False):
print(f"{proxy.component_name} receive message from {msg.source}. the payload is {msg.payload}.")
print(f"{proxy.name} receive message from {msg.source}. the payload is {msg.payload}.")
if msg.tag == "job":
replied_payload = sum(msg.payload)
proxy.reply(received_message=msg, tag="sum", payload=replied_payload)
proxy.reply(message=msg, tag="sum", payload=replied_payload)
def multiplication_worker(group_name):
@ -42,11 +42,11 @@ def multiplication_worker(group_name):
# Receive messages from the proxy (non-recurring).
for msg in proxy.receive(is_continuous=False):
print(f"{proxy.component_name} receive message from {msg.source}. the payload is {msg.payload}.")
print(f"{proxy.name} receive message from {msg.source}. the payload is {msg.payload}.")
if msg.tag == "job":
replied_payload = np.prod(msg.payload)
proxy.reply(received_message=msg, tag="multiply", payload=replied_payload)
proxy.reply(message=msg, tag="multiply", payload=replied_payload)
def master(group_name: str, sum_worker_number: int, multiply_worker_number: int, is_immediate: bool = False):
@ -88,19 +88,20 @@ def master(group_name: str, sum_worker_number: int, multiply_worker_number: int,
session_type=SessionType.TASK,
destination_payload_list=destination_payload_list)
# Do some tasks with higher priority here.
replied_msgs = proxy.receive_by_id(session_ids)
replied_msgs = proxy.receive_by_id(session_ids, timeout=-1)
else:
replied_msgs = proxy.scatter(tag="job",
session_type=SessionType.TASK,
destination_payload_list=destination_payload_list)
destination_payload_list=destination_payload_list,
timeout=-1)
sum_result, multiply_result = 0, 1
for msg in replied_msgs:
if msg.tag == "sum":
print(f"{proxy.component_name} receive message from {msg.source} with the sum result {msg.payload}.")
print(f"{proxy.name} receive message from {msg.source} with the sum result {msg.payload}.")
sum_result += msg.payload
elif msg.tag == "multiply":
print(f"{proxy.component_name} receive message from {msg.source} with the multiply result {msg.payload}.")
print(f"{proxy.name} receive message from {msg.source} with the multiply result {msg.payload}.")
multiply_result *= msg.payload
# Check the correctness of the task results.


@ -22,11 +22,11 @@ def worker(group_name):
# Receive messages from the proxy (non-recurring).
for msg in proxy.receive(is_continuous=False):
print(f"{proxy.component_name} receive message from {msg.source}. the payload is {msg.payload}.")
print(f"{proxy.name} receive message from {msg.source}. the payload is {msg.payload}.")
if msg.tag == "sum":
replied_payload = sum(msg.payload)
proxy.reply(received_message=msg, tag="sum", payload=replied_payload)
proxy.reply(message=msg, tag="sum", payload=replied_payload)
def master(group_name: str, is_immediate: bool = False):
@ -49,19 +49,19 @@ def master(group_name: str, is_immediate: bool = False):
for peer in proxy.peers_name["worker"]:
message = SessionMessage(tag="sum",
source=proxy.component_name,
source=proxy.name,
destination=peer,
payload=random_integer_list,
session_type=SessionType.TASK)
if is_immediate:
session_id = proxy.isend(message)
# Do some tasks with higher priority here.
replied_msgs = proxy.receive_by_id(session_id)
replied_msgs = proxy.receive_by_id(session_id, timeout=-1)
else:
replied_msgs = proxy.send(message)
replied_msgs = proxy.send(message, timeout=-1)
for msg in replied_msgs:
print(f"{proxy.component_name} receive {msg.source}, replied payload is {msg.payload}.")
print(f"{proxy.name} receive {msg.source}, replied payload is {msg.payload}.")
if __name__ == "__main__":


@ -0,0 +1,46 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from maro.simulator.scenarios.cim.common import Action, DecisionEvent
from maro.vector_env import VectorEnv
if __name__ == "__main__":
with VectorEnv(batch_num=4, scenario="cim", topology="toy.5p_ssddd_l0.0", durations=100) as env:
for ep in range(2):
print("current episode:", ep)
metrics, decision_event, is_done = (None, None, False)
while not is_done:
action = None
# Usage:
# 1. Only push the specified (1st in this example) environment forward, leaving the others behind
# if decision_event:
# env0_dec: DecisionEvent = decision_event[0]
# # 1.1 After the 1st environment is done, the others will be pushed forward.
# if env0_dec:
# ss0 = env.snapshot_list["vessels"][env0_dec.tick:env0_dec.vessel_idx:"remaining_space"]
# action = {0: Action(env0_dec.vessel_idx, env0_dec.port_idx, -env0_dec.action_scope.load)}
# 2. Only pass an action to the 1st environment (give None to the other environments),
# but keep pushing all environments forward until the end
if decision_event:
env0_dec: DecisionEvent = decision_event[0]
if env0_dec:
ss0 = env.snapshot_list["vessels"][env0_dec.tick:env0_dec.vessel_idx:"remaining_space"]
action = [None] * env.batch_number
# With a list of actions, all environments are pushed to the next step
action[0] = Action(env0_dec.vessel_idx, env0_dec.port_idx, -env0_dec.action_scope.load)
metrics, decision_event, is_done = env.step(action)
print("Final tick for each environment:", env.tick)
print("Final frame index for each environment:", env.frame_index)
env.reset()


@ -0,0 +1,12 @@
# Simulation Results
The table below shows the simulation results of the current topologies based on the `Best Fit` algorithm.
In the oversubscription topologies, the oversubscription rate is `115%`.
| Topology | PM Setting | Time Spent(s) | Total VM Requests | Successful Allocation | Energy Consumption | Total Oversubscriptions | Total Overload PMs |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| 10k | 100 PMs, 32 Cores, 128 GB | 104.98 | 10,000 | 10,000 | 2,399,610 | 0 | 0 |
| 10k.oversubscription | 100 PMs, 32 Cores, 128 GB | 101.00 | 10,000 | 10,000 | 2,386,371 | 279,331 | 0 |
| 336k | 880 PMs, 16 Cores, 112 GB | 7,896.37 | 335,985 | 109,249 | 26,425,878 | 0 | 0 |
| 336k.oversubscription | 880 PMs, 16 Cores, 112 GB | 7,903.33 | 335,985 | 115,008 | 27,440,946 | 3,868,475 | 0 |


@ -1,7 +0,0 @@
env:
scenario: vm_scheduling
topology: azure.2019.10k
start_tick: 0
durations: 8638
resolution: 1
seed: 88


@ -1,74 +0,0 @@
import io
import os
import random
import timeit
import yaml
from maro.simulator import Env
from maro.simulator.scenarios.vm_scheduling import AllocateAction, DecisionPayload, PostponeAction
from maro.utils import convert_dottable
CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "config.yml")
with io.open(CONFIG_PATH, "r") as in_file:
raw_config = yaml.safe_load(in_file)
config = convert_dottable(raw_config)
if __name__ == "__main__":
start_time = timeit.default_timer()
env = Env(
scenario=config.env.scenario,
topology=config.env.topology,
start_tick=config.env.start_tick,
durations=config.env.durations,
snapshot_resolution=config.env.resolution
)
if config.env.seed is not None:
env.set_seed(config.env.seed)
random.seed(config.env.seed)
metrics: object = None
decision_event: DecisionPayload = None
is_done: bool = False
action: AllocateAction = None
metrics, decision_event, is_done = env.step(None)
while not is_done:
valid_pm_num: int = len(decision_event.valid_pms)
if valid_pm_num <= 0:
# No valid PM now, postpone.
action: PostponeAction = PostponeAction(
vm_id=decision_event.vm_id,
postpone_step=1
)
else:
# Get the capacity and allocated cores from snapshot.
valid_pm_info = env.snapshot_list["pms"][
env.frame_index:decision_event.valid_pms:["cpu_cores_capacity", "cpu_cores_allocated"]
].reshape(-1, 2)
# Calculate the remaining CPU cores of each valid PM.
cpu_cores_remaining = valid_pm_info[:, 0] - valid_pm_info[:, 1]
# Choose the PM with the fewest remaining CPU cores (the closest fit).
chosen_idx = 0
minimum_remaining_cpu_cores = cpu_cores_remaining[0]
for i, remaining in enumerate(cpu_cores_remaining):
if remaining < minimum_remaining_cpu_cores:
chosen_idx = i
minimum_remaining_cpu_cores = remaining
# Take an action to allocate the VM on the closest-fit PM.
action: AllocateAction = AllocateAction(
vm_id=decision_event.vm_id,
pm_id=decision_event.valid_pms[chosen_idx]
)
metrics, decision_event, is_done = env.step(action)
end_time = timeit.default_timer()
print(
f"[Best fit] Topology: {config.env.topology}. Total ticks: {config.env.durations}."
f" Start tick: {config.env.start_tick}."
)
print(f"[Timer] {end_time - start_time:.2f} seconds to finish the simulation.")
print(metrics)


@ -1,7 +0,0 @@
env:
scenario: vm_scheduling
topology: azure.2019.10k
start_tick: 0
durations: 8638
resolution: 1
seed: 666
