Merge v0.2 features into the master branch to release a new version (#297)
* refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. 
used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. 
generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. 
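The double DQN feature from #188 (with `is_double` enabled in the DQN config) comes down to letting the online network pick the greedy next action while the target network evaluates it. A minimal sketch of that target computation in PyTorch; the function and argument names (`q_net`, `target_q_net`) are illustrative assumptions, not MARO's actual API:

```python
import torch

def dqn_target(reward, next_state, done, q_net, target_q_net, gamma=0.99, is_double=True):
    """Compute TD targets for (double) DQN; names and signature are illustrative only."""
    with torch.no_grad():
        if is_double:
            # Double DQN: the online network selects the greedy next action...
            next_action = q_net(next_state).argmax(dim=1, keepdim=True)
            # ...and the target network evaluates that action.
            next_q = target_q_net(next_state).gather(1, next_action).squeeze(1)
        else:
            # Vanilla DQN: the target network both selects and evaluates.
            next_q = target_q_net(next_state).max(dim=1).values
        return reward + gamma * (1.0 - done) * next_q
```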
* fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang 
<Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. 
added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
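The "best fit policy" entries under the data pipeline work (#199) refer to a simple placement rule for the VM scheduling scenario: among the physical machines that can still host the request, pick the one with the least remaining capacity. A hedged sketch with hypothetical field names (the real scenario tracks more resources than CPU):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PhysicalMachine:  # hypothetical record; the actual PM node carries more attributes
    id: int
    cpu_capacity: int
    cpu_allocated: int

    @property
    def remaining_cpu(self) -> int:
        return self.cpu_capacity - self.cpu_allocated

def best_fit(valid_pms: List[PhysicalMachine], vm_cpu_requirement: int) -> Optional[int]:
    """Return the id of the tightest-fitting PM, or None to postpone the request."""
    candidates = [pm for pm in valid_pms if pm.remaining_cpu >= vm_cpu_requirement]
    if not candidates:
        return None
    return min(candidates, key=lambda pm: pm.remaining_cpu).id
```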
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. 
renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
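The event buffer changes in #180 ("unfold sub-events, insert after parent", "add action as 1st sub event to ensure the executing order") describe cascade events whose sub-events execute immediately after their parent. A simplified sketch of that idea; class names and methods here are assumptions, not the actual maro.event_buffer implementation:

```python
from collections import deque

class CascadeEvent:
    """An event that can carry immediate sub-events (illustrative only)."""
    def __init__(self, tick, payload=None):
        self.tick = tick
        self.payload = payload
        self.immediate_events = []

    def add_immediate_event(self, event, is_head=False):
        """Attach a sub-event; the tick must match the parent (tick validation)."""
        if event.tick != self.tick:
            return False
        if is_head:
            # e.g. the Action event goes first so it executes before other sub-events.
            self.immediate_events.insert(0, event)
        else:
            self.immediate_events.append(event)
        return True

class EventBuffer:
    """Executes events in order, unfolding each event's sub-events right after it."""
    def __init__(self):
        self._pending = deque()

    def insert_event(self, event):
        self._pending.append(event)

    def execute(self):
        executed = []
        while self._pending:
            event = self._pending.popleft()
            executed.append(event)
            # Unfold sub-events and insert them after the parent, preserving their order.
            self._pending.extendleft(reversed(event.immediate_events))
        return executed
```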
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requirement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node mapping * Init vm scheduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ 
return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. 
refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. 
used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
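The "added soft_update function to learningModel" entries above refer to the usual Polyak averaging of target-network parameters toward the online network. A small sketch of that operation in PyTorch; the real LearningModel method may differ in name and signature:

```python
import torch.nn as nn

def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.05) -> None:
    """Polyak update: target <- tau * source + (1 - tau) * target, parameter by parameter."""
    for target_param, source_param in zip(target.parameters(), source.parameters()):
        # Update the stored tensors directly so autograd history is not affected.
        target_param.data.mul_(1.0 - tau).add_(tau * source_param.data)
```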
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? 
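"Implemented dump snapshots and convert to CSV" (#233) amounts to flattening per-tick, per-node attribute values into a table. A generic sketch of such a dump, independent of MARO's actual snapshot-list API and file layout; the data structure assumed here is a simplification:

```python
import csv
from typing import Dict, List, Sequence

def dump_snapshots_to_csv(
    snapshots: Sequence[Dict[str, List[float]]],  # one {attribute: per-node values} dict per tick
    attributes: List[str],
    path: str,
) -> None:
    """Write one CSV row per (tick, node index) with the requested attributes."""
    with open(path, "w", newline="") as fp:
        writer = csv.writer(fp)
        writer.writerow(["tick", "node_index", *attributes])
        for tick, frame in enumerate(snapshots):
            for node_index in range(len(frame[attributes[0]])):
                writer.writerow([tick, node_index, *(frame[attr][node_index] for attr in attributes)])
```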
* delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexception to clierror * dashboard test * change result * change assertion * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update 
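The "change scenario to enum, divide file path into a separated class" entry in the visualization work suggests a small refactor along the lines sketched below; both names (`GlobalScenarios`, `DumpPaths`) are hypothetical and chosen only to illustrate the shape of the change:

```python
from enum import Enum
from pathlib import Path

class GlobalScenarios(Enum):  # hypothetical enum of supported scenarios
    CIM = "cim"
    CITI_BIKE = "citi_bike"
    VM_SCHEDULING = "vm_scheduling"

class DumpPaths:
    """Keeps all dump-file locations for one visualization run in a single place."""
    def __init__(self, root: str, scenario: GlobalScenarios):
        self._base = Path(root) / scenario.value

    @property
    def snapshot_csv(self) -> Path:
        return self._base / "snapshots.csv"

    @property
    def manifest(self) -> Path:
        return self._base / "manifest.yml"
```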
* doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * syntax fix (#253) * syntax fix * syntax fix * syntax fix * rm unwanted import Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm oversubscription (#246) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 vm scheduling decision event (#257) * Fix data preparation bug * Add frame index * V0.2 PG, K-step and lambda return utils (#155) * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * merged with v0.2_embedded_optims * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * revised * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. 
refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * revised code based on revised abstractions * fixed some bugs * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added shared_module property to LearningModel * added shared_module property to LearningModel * fixed a bug with k-step return in AC * fixed a bug * fixed a bug * merged pg, ac and ppo examples * fixed a bug * fixed a bug * fixed naming for ppo * renamed some variables in PPO * added ActionWithLogProbability return type for PO-type algorithms * fixed a bug * fixed a bug * fixed lint issues * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * updated cim PO example code according to changes in maro/rl * removed early stopping from CIM dqn example * combined ac and ppo and simplified example code and config * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * put PG and AC under PolicyOptimization class and refined examples accordingly * fixed lint issues * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to 
LearningModel * typo fix * fixed lint issues * updated notebook * updated cim example for policy optimization * typo fix * typo fix * typo fix * typo fix * misc edits * minor edits to rl_toolkit.rst * checked out docs from master * fixed typo in k-step shaper * fixed lint issues * bug fix in store * lint issue fix * changed default max_ep to 100 for policy_optimization algos * vis doc update to master (#244) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. 
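The "K-step and lambda return utils" from #155 compute bootstrapped multi-step returns for the policy-optimization algorithms. A hedged sketch of a k-step return helper; the real shaper's interface and edge-case handling may differ, and the trajectory here is assumed to end at the last reward, so no bootstrap is applied past it:

```python
from typing import List

def k_step_returns(rewards: List[float], values: List[float], k: int, gamma: float = 0.99) -> List[float]:
    """G_t = sum_{i<k} gamma^i * r_{t+i} + gamma^k * V(s_{t+k}), truncated at the trajectory end."""
    num_steps = len(rewards)
    returns = []
    for t in range(num_steps):
        horizon = min(k, num_steps - t)
        g = sum((gamma ** i) * rewards[t + i] for i in range(horizon))
        if t + horizon < num_steps:
            # Bootstrap from the value estimate of the state k steps ahead.
            g += (gamma ** horizon) * values[t + horizon]
        returns.append(g)
    return returns
```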
* V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. 
added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. 
generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. 
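The proxy rejoin work in #158 above (rejoin parameters, a message cache, checkpointing, a FaultToleranceAgent) is, at its core, a bounded reconnect loop around the communication driver. A rough sketch of that pattern with a hypothetical `connect` hook and illustrative parameter names, not MARO's real Proxy API:

```python
import time

class RejoinError(Exception):
    """Raised when a peer cannot rejoin within the allowed number of attempts."""

def rejoin(connect, max_rejoin_times=10, retry_interval=5.0):
    """Try to re-establish a connection up to `max_rejoin_times` times.

    `connect` is any callable returning True on success; in MARO the
    equivalent logic lives inside the Proxy, alongside a message cache that
    replays messages missed while the peer was away.
    """
    for attempt in range(1, max_rejoin_times + 1):
        if connect():
            return attempt
        time.sleep(retry_interval)
    raise RejoinError(f"failed to rejoin after {max_rejoin_times} attempts")

# Toy usage: the third attempt succeeds.
attempts = iter([False, False, True])
print(rejoin(lambda: next(attempts), retry_interval=0.0))
```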
* capitalize env variable name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint: f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different errors * feat: add special error handlers * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver into proxy.__init__ as a temp fix for the issue. 
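Looking back at the double DQN and dueling additions tracked in #188 a few entries above: the double-DQN part reduces to one change in how bootstrap targets are computed, with the online network choosing the greedy next action and the target network evaluating it. A minimal PyTorch sketch of that target computation (network and variable names are illustrative, not MARO's DQN classes):

```python
import torch
import torch.nn as nn

def double_dqn_targets(q_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Compute double-DQN bootstrap targets.

    The online q_net selects the greedy next action; the target_net evaluates
    it, which reduces the overestimation bias of vanilla DQN.
    """
    with torch.no_grad():
        greedy_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, greedy_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q

# Toy usage: two tiny value heads over a 4-dim state with 3 actions.
q_net = nn.Linear(4, 3)
target_net = nn.Linear(4, 3)
targets = double_dqn_targets(
    q_net, target_net,
    rewards=torch.tensor([1.0, 0.0]),
    next_states=torch.randn(2, 4),
    dones=torch.tensor([0.0, 1.0]),
)
```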
* Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct 
import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. 
renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
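Among the VM scheduling scenario commits above ("Add the ValidPm class", "best fit policy added"), the best-fit rule is simple enough to show in a few lines: among the PMs that can still hold the VM, pick the one with the least remaining capacity. A toy sketch with made-up data structures, not the scenario's actual classes:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PM:
    id: int
    cpu_capacity: int
    cpu_allocated: int = 0

    @property
    def remaining_cpu(self) -> int:
        return self.cpu_capacity - self.cpu_allocated

def best_fit(valid_pms: List[PM], vm_cpu: int) -> Optional[PM]:
    """Return the valid PM that fits the VM with the least leftover CPU."""
    candidates = [pm for pm in valid_pms if pm.remaining_cpu >= vm_cpu]
    if not candidates:
        return None  # a scheduler would typically postpone the request here
    return min(candidates, key=lambda pm: pm.remaining_cpu)

# Toy usage: the 8-core request lands on the PM with 10 free cores, not 32.
pms = [PM(0, 32), PM(1, 16, cpu_allocated=6), PM(2, 16, cpu_allocated=12)]
chosen = best_fit(pms, vm_cpu=8)
```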
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requirement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node mapping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) 
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * update param name * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made 
NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. 
refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. 
used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
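The exploration overhaul in #198 above (a NoiseExplorer whose `__call__` does batch processing and returns a numpy array, with `set_parameters` replacing `update`) can be illustrated with a bare-bones Gaussian-noise wrapper. The class below is only an illustration of that interface, not MARO's explorer classes:

```python
import numpy as np

class GaussianNoiseExplorer:
    """Adds Gaussian noise to a batch of continuous actions and clips them."""

    def __init__(self, noise_scale: float = 0.1, lower: float = -1.0, upper: float = 1.0):
        self.noise_scale = noise_scale
        self.lower, self.upper = lower, upper

    def set_parameters(self, noise_scale: float):
        # An exploration schedule (e.g., from a scheduler) would update the scale here.
        self.noise_scale = noise_scale

    def __call__(self, actions) -> np.ndarray:
        # Batch processing: `actions` has shape (batch_size, action_dim).
        actions = np.asarray(actions, dtype=float)
        noisy = actions + np.random.normal(scale=self.noise_scale, size=actions.shape)
        return np.clip(noisy, self.lower, self.upper)

# Toy usage on a batch of two 3-dimensional actions.
explorer = GaussianNoiseExplorer(noise_scale=0.05)
noisy_actions = explorer(np.zeros((2, 3)))
```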
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * doc update * new link * image update * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * image change * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? 
* delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update 
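For the manifest-file entries noted above (#190 and #201: a manifest written alongside the dumped snapshot files, later reformatted for easier reading), the idea is just a small index next to the dump output. A hypothetical example with illustrative field names and file format, not the dumper's actual schema:

```python
import json

def write_dump_manifest(path, scenario, dump_files):
    """Write a small JSON index describing a snapshot dump folder."""
    manifest = {
        "scenario": scenario,
        "files": sorted(dump_files),  # e.g. one CSV per node type
    }
    with open(path, "w") as fp:
        json.dump(manifest, fp, indent=2)  # indented for easy reading

write_dump_manifest(
    "manifest.json",
    scenario="cim",
    dump_files=["ports.csv", "vessels.csv", "decision_events.csv"],
)
```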
* doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * bug fix related to np array divide (#245) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Master.simple bike (#250) * notebook for simple bike repositioning added * add simple rule-based algorithms * unify input * add policy based on statistics * update be for simple bike scenario to fit latest event buffer changes (#247) * change rendered graph * figures updated * change notebook * matplot updated * figures updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: wesley <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * simple bike repositioning article: formula updated * checked out docs/source from v0.2 * aligned with v0.2 * rm unwanted import * added references in policy_optimization.py * fixed lint issues 
Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com> Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 backend dynamic node support (#172) * update lint workflow * fix workflow issue * Update lint.yml * Create tox.ini * Update lint.yml * Update lint.yml * Update tox.ini * Update lint.yml * Delete tox.ini from root folder, move it to .github/linters * Update CONTRIBUTING.md * add more comments * update lint conf to ignore cli banner issue * change extension implementation from c to cpp * update script to gen cpp files * backend base interface redefine * interface revamp for np backend * 1st step for revamp * bug fix * draft * implementation of attribute * implementation of backend * remove backend switching * draft raw backend wrapper * correct function parameter type * 1st runable version * bug fix for types * ut passed * change CRLF to LF * fix get_node_info interface * add raw test in frame ut * return np.array for all query result * use ticks from backend * set init value * snapshot ut passed * support set default backend by environemnt variable * env ut with different backend * fix take snapshot index bug * test under both backends * ignore generated cpp file * fix lint isues * more lint fix * use ordered map to store ticks to keep the order * remove test code * refine dup code * refine code to avoid too much if/else * handle and raise exception for attr getter * change the way to handle cpp exception, use cython runtimeerror instead * add missing function, and fix bug in np impl * fix lint issue * specify c++11 flag for compilers * use normal field assignment instead initializer list, as linux gcc will complain it * add np ignore macro * try to refine token pasting operator to avoid error on linux * more pasting operator issue fix * remove un-used options * update workflow files to fit new backend * 1st version of dynamic backend structure * setup ut for cpp using lest * bitset complete * attributestore and ut * arrange * copy_to * current frame * ut for frame * bug fix and ut correct * fix issue that value not correct after arrange * fix bug in test case * frame update * change the way to add nodes, support add node from middle * frame in backend * snapshotlist code complete * add size method for snapshotlist, add ut template * make sure snapshot max size not be 0 * add max size * fix query parameters * fix attribute store extend error * add function to retrieve attribute from snapshotlist * return nan for invalid index * add function to check if nan for float attribute only * fix bug that not update _last_tick for snapshot list, that cause take snapshot for same tick crash * add functions to expose internal state under debug mode, make it easy to do unit test * fix issue that cause 
overlap logic skiped * ut passed for all implemented functions * remove query in ut, as it not completed yet * refine querying interfaces, use 2 functions for 1 querying * snapshot query, * use pointer instead weak_ptr * backend impl * set default parameters value * query bug fix, * bug fix: new_attr should return attr id not node id * use macro to create attribute getters * add reset support * change the way to reset, avoid allocation time * test reset for attributestore * use Bitset instead vector<bool> to make it easy to reset * refine backend interfaces to make it compact with old one * correct quering interface, cython compile passed * bug fix: get_ticks not set correct index * correct cpp backend binding, add type for frame * correct ut for snapshot * bug fix: query cause crash after snapshot reset * fix env test * bug fix: is_nan should check data type first * fix cim ut issues with raw backend * fix citibike ut issues for raw backend * add interfaces to support dynamic nodes, not tested * bug fix: access cpp object without cdef * bug fix: missing impl for dynamic methods * ut for append nodes * return node number dynamiclly * remove unused parameters for snapshot * remove unused code * allow get attribute for deleted node * ut for delete and resume node * function to set attribute slot * bug fix: set attribute will cause crash * bug fix: remove append node when reset cause exception * bug fix: frame.backend_type return incorrect name * backends performance comparison * correct internal type * correct warnings * missing ; * formating * fix lint issue * simple the way to copy mapping * add dump interfaces * frame dump * ignore if dump path is not exist * bug fix: use max slots instead of current slots for padding in snapshot querying * use max slot number in history instead of current for padding * dump for snapshot * close file at the end * refine snapshot dump function * fix lint issue * avoid too much allocate operation * use pointer instead reference for furthure changes * avoid 2 times map copy * add comments for missing functions * performance optimize * use emplace instead push * use emplace instead push * remove cpp files * add missing lisence * ignore .vs folder * add lest lisence for cpp unittest * Delete CMakeLists.txt * add error msg for exception, make it easy to identify error at python side * remove old codes * replace with new code * change IDENTIER to NODE_TYPE and ATTR_TYPE * build pass * fix attr type not correct bug * reomve unused comment * make frame ut pass * correct the max snapshots checking * fix test case * add missing file * correct performance test * refine attribute code * refine bitset code * update FrameBase doc about switch backend * correct the exception name * refine frame code * refine node code * refine snapshot list code * add is_const and is_list when adding attribute * support query const attribute without tick exist * add operations for list attribute * remove cache as we have list attribute * add remove and insert for list attribute * add for-loop support for list attribute * fix bug that not update list attribute slot number after operations * test for dynamic features * frame dump * dump for snapshot list * fix issue on gcc compiler * add missing file * fix lint issues * refine the exception, more comments * fix lint issue * fix lint issue * use simulate enum instead of str * Use new type instead old in tests * using mapping instead if-else * remove generated code * use mapping to reduce too much if-else * add default attribute type int if 
not provided or invalid provided * remove generated code * update workflow with code gen * more frame test * add missing files * test: cover maro.simulator.utils.common * update test with new scenario * comments * tests * update doc * fix lint and comments * CRLF to LF * fix lint issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 vm oversub docs (#256) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment * Update overload to the VM docs * Update docs * Update vm docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 ddpg (#252) * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. 
refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * some revision to DDPG * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * fixed some issues with DDPG code * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * fixed naming issues * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * tmp commit * tmp commit * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * fixed learning model naming * fixed conflicts * updated ddpg example * misc edits * 1. renamed CIMAgent to DQNAgent; 2. 
moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * added ddpg example for cim * fixed some bugs * removed dimension check in absence of shared stack * fixed a typo * bug fixes * bug fixes * aligned with v0.2 * aligned with v0.2 * fixed lint issues * added reference in ddpg.py * fixed lint issues * fixed lint issues * fixed lint issues * removed ddpg example * checked out files from origin/v0.2 before merging Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 cli refactoring (#227) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * refactor: extract reusable methods to GrassExecutor * feat: refine validation.py and add docstrings * fix: add remote prefix to ssh function * style: refine logging output * fix: extract param 'vm_name' * fix: linting errors * feat: add NodeStatus and ContainerStatus at executors * feat: use master_node_size as the size of build_node_image_vm * fix: refine comments * feat: add "state" key for node_details * fix: linting errors * fix: deployment error when ssh_port is the default port * refactor: extract utils/*.py in scripts * style: single quote to double quote * refactor: refine folder structure of scripts * fix: linting errors * fix: add executable to fix error initialization * refactor: use SubProcess to execute commands in scripts * refactor: refine script namings * refactor: extract utils/*.py and systemd/*.service in agents * feat: refine Exception structure, add SubProcess class in agents * feat: use psutil to get resource details, move resource details initialization to agents * fix: linting errors * feat: use docker sdk in node_agent * feat: extract RedisExecutor in agents * test: remove image when tearing down * feat: add LoadImageAgent * feat: move node status update to agents * refactor: move utils folder to upper level in scripts * feat: add node_api_server, refine agents folder structure * fix: linting errors * refactor: refine folder structure in grass/lib * refactor: build DeploymentValidator class * refactor: create DetailsReader, DetailsWriter, delete sync mode * refactor: rename DockerManager to DockerController * refactor: rename RedisManager to RedisController * refactor: rename AzureExecutor to AzureController * refactor: create NameCreator * refactor: create PathConvertor * refactor: rename checkers to details_validity_wrapper * refactor: rename lock to operation_lock_wrapper * refactor: create FileSynchronizer * refactor: create redis instance in RedisController * feat: add master_api_server, move job related scripts to api_server * refactor: move node related scripts to api_server * fix: use "DELETE" instead of "DEL" as http method * refactor: use mapping names instead of namings like "sths_details" * feat: move master related scripts to api_server * feat: move containers related scripts to api_server * fix: add gracefully wait for remote_start_master_services * feat: move image_files related scripts to api_server * fix: improper test in the training stage * refactor: use local variable "URL_PREFIX" directly, add 's' in node_api_client * refactor: refine namings in services * feat: move clean 
related scripts to api_server * refactor: delete "public_key" field * feat: build MasterApiClient * refactor: delete sync_mkdir * feat: refine locks in node_details * feat: build DockerController for grass/utils * refactor: rename Extractor to Controller * feat: move schedule related components to api_server * fix: incorrect allocation when starting batch jobs * fix: missing field "containers" in job_details * feat: add delete_job in master_api_server * feat: add logger in agents * fix: no "resources" field when scale up node at the very beginning * feat: use Process back instead of Thread in node_agent * feat: add 'v1' prefix to api_servers' urls * refactor: move lib/aks under lib/clouds * refactor: move lib/k8s_configs to lib/configs, move aks related configs to clouds/aks, delete volumn mount in redis * feat: extract K8sExecutor * fix: add one more searching layer of pakcage_data at maro.cli.k8s * refactor: move lib/configs/nvidia to lib/clouds/aks, make create() as a staticmethod at k8s mode * refactor: move id init to standardize_create_deployment in grass/azure mode * fix: use GlobalParams instead of hard-coded data * feat: build K8sDetailsReader, K8sDetailsWriter * feat: use k8s sdk to replace subprocess call * refactor: delete redundant vars * refactor: move more methods to K8sExecutor * test: use legal naming in tests/cli/k8s * refactor: refine logging messages * refactor: make create() as a staticmethod at grass/azure mode, refine logging messages * feat: build ArmTemplateParameterBuilder in K8sAzureExecutor * refactor: remove redundant params * refactor: rename /clouds to /modes * refactor: refine structures and logging messages in GrassExecutor * feat: add 'PENDING' to NodeStatus * feat: refine build_job_details for create schedule in grass/azure * feat: refine build_job_details for create schedule in k8s/aks * feat: use node_join schema in grass/azure * refactor: replace /.maro with /.maro-shared, replace admin_username with node_username, remove redundant snippets in /grass/lib/scirpts * refactor: add 'ssh', 'api_server' into master_details and node_details * refactor: move master runtine params initialization into api_server * refactor: refine namings * feat: reconstruct grass/on-premises with new schema * refactor: delete field 'user' in grass_azure_create * refactor: rename 'blueprints_v1' to 'blueprints' * refactor: move some GlobalPaths to subfolders * refactor: replace 'connection' field with 'master' or 'node' * refactor: move start_service scripts to init_master.py * refactor: rename grass/master/release to grass/master/delete_master * refactor: load local_details in node services, refine script namings * refactor: move invocations of start_node and stop node to api server * fix: add missing imports * refactor: rename SubProcess to Subprocess * refactor: delete field 'user' in k8s_aks_create * refactor: refine folder structures in /.maro/clusters/cluster * refactor: move /logs to /clusters/{cluster_name} * refactor: refine filenames * fix: export DEBIAN_FRONTEND=noninteractive to reduce irrelevant warnings * refactor: refine code structures, delete redundant code * refactor: change /{cluster_name}/details.yml to /{cluster_name}/cluster_details.yml * feat: add rsa+aes data encryption on dev-master communication * fix: change MasterApiClient to RedisController in node-related services and scripts * refactor: remove all "{cluster_name}" in redis keys * refactor: extract init_master and create_user to GrassExecutor * test: refine tests in grass/azure and k8s/aks * 
refactor: refine ArmTemplateParameterBuilder * feat: change the order of installation in init_build_node_image_vm.py * fix: add user/admin_id to grass_on_premises_create.yml * fix: change outdated container names * feat: add standardize_join_cluster_deployment in grass/on-premises * feat: add init_node_runtime_env in join_cluster.py * refactor: refine code structure in join_cluster.py * test: add TestGrassOnPremises * refactor: refine ARM templates * fix: linting errors * fix: test requirements error * fix: arm linting errors * refactor: late import in grass, k8s * style: refine load_parser_grass * style: refine load_parser_k8s * docs: update orchestrations * fix: fix get_job_logs * docs: add docs for GrassAzureExecutor, GrassExecutor * docs: add docs for GrassOnPremisesExecutor * docs: add docs for /grass/scripts * docs: add docs for /grass/services * docs: add docs for /grass/utils * docs: add docs for k8s * try paramiko of another version * rollback paramiko package version Co-authored-by: Wesley <Wenlei.Shi@microsoft.com> * Refine joint decision sequential action mode (#219) * refine the logic about jont decision sequential action mode to match current event buffer implementation * fix lint issue * fix lint issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 merge algorithm into agent (#259) * merged algorithm with agent * bug fixes * fix * bug fixes * fixed lint issues and renamed models to model * removed exp pool type spec in AbsAgent * fixed lint issues * dqn exp pool bug fix * minor issues * updated notebooks and examples according to rl toolkit changes * updated images * moved exp pool init inside DQN * renamed column_based_store to simple_store * fixed lint issues * fixed lint issues * lint issue fix * lint issue fix * fixed bugs in test_store * typo fix * minor edits * lint issue fix * 1. removed state_shaper, action_shaper and exp_shaper abstractions; 2. used torch Categorical for sampling actions; 3. removed input_dim and output_dim properties from LearningModel * updated notebook * removed simple agent manager * fixed lint issues * fixed lint issues * bug fix * refined LearningModel * updated cim example doc * lint issue fix * small refinements * replaced ActionInfo with torch Categorical's log_prob for policy_optimization algorithms * lint issue fix * formatting * 1. moved early stopping logic inside scheduler; 2. added scheduler options for optimizers in learning-model * minor formatting fixes * refinement * rm unwanted import * add List typing in schedular * lint issue fix Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wesley <Wenlei.Shi@microsoft.com> * V0.2 gnn refactoring (#274) * merged algorithm with agent * bug fixes * fix * bug fixes * fixed lint issues and renamed models to model * removed exp pool type spec in AbsAgent * fixed lint issues * dqn exp pool bug fix * minor issues * updated notebooks and examples according to rl toolkit changes * updated images * moved exp pool init inside DQN * renamed column_based_store to simple_store * fixed lint issues * fixed lint issues * lint issue fix * lint issue fix * fixed bugs in test_store * typo fix * minor edits * lint issue fix * 1. removed state_shaper, action_shaper and exp_shaper abstractions; 2. used torch Categorical for sampling actions; 3. 
removed input_dim and output_dim properties from LearningModel * updated notebook * removed simple agent manager * fixed lint issues * fixed lint issues * bug fix * refined LearningModel * updated cim example doc * lint issue fix * small refinements * replaced ActionInfo with torch Categorical's log_prob for policy_optimization algorithms * refactored gnn example and added single-process script * removed obsolete files from gnn * lint issue fix * formatting * 1. moved early stopping logic inside scheduler; 2. added scheduler options for optimizers in learning-model * minor formatting fixes * refinement * rm unwanted import * add List typing in schedular * lint issue fix * removed redundant parameters for GNNBasedACModel * restored duration to 1120 Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wesley <Wenlei.Shi@microsoft.com> * Add vector env support (#266) * 1st version * make vectorenv can import under module root * allow outside control which environment to push, so we do not need to control the tick for each environments * remove comment * lint fixing * add test for vector env, correct the batch number * lint fixing * reduce parameters * Update vector env ut to test if support raw backend * correct comments on hello * fix review comments, cim actiontype wip * add a compatiable way to handle ActionType for cim scenario * lint fix * correct the action type to handle previous action * add doc string for wrappers Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 Rule-based algorithms for VM Scheduling (#255) * rule_based_algorithm * revise_the_code_by_aiming_hao * revise_the_code_by_aiming_hao * use the np.argmin * Update best_fit.py fix the "np not defined" * refine the code * fix the error * refine the code * fix the error * fix the error * refine the code * remove the history * refine the code * update first_fit * Refine the code Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com> * delete duplicated rule based algorithms for VM scheduling * Add slot filter functions for node attribute (#273) * add where filter for general usage * test for general filter * simpler comparison for attribute * filter on raw * fix array fetch bug * ut for base comparison * lint fix * remove unused variables * update ignore * Fix coding style (#284) * V0.2 vm region support (#258) * Region init * Add region, zone, cluster * Fix bug * Add update parent id * Update PM config * Update number * Fix import order * Fix bug * Modify config * Add cluster attribute * Refine naming * Fix bug * Modify 336k config * Update region * Update config * Update pm config * pylint * Add comment * Update based on PR comment * Modify config and zone class * Add unit test * Update region part * Update pylint * Modify unit test * Refactor region structure * Add comment and fix style * Fix machine num bugs * Modify config * Fix style * Fix bugs and add empty machine attributes * Add update upper level metrics * Update config * Fix lint style * Modify doc strings * Fix amount counter * Update unit test * fix lint style * Update the ids init * Init total and empty machine num * Update lint style * Fix snapshot attributes initial state * Update config * add topologies for over-subscription and multi-cluster to be compatible with the previous topologies * Add simulation result * Move readme * Add overload results Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang 
<Wang.Jinyu@microsoft.com> * V0.2 rule based algorithm readme (#282) * Add README.md and refine the bin_packing algorithm * refine round_robin and bin_packing * Update README.md * Refine the code and README.md * Refine the bin_packing and round_robin * Refine the code Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com> * Feature: Add a cli command to support create new project. (#279) * maro project new * remove maro project run * add get_metrics to template * add license * more comments * lint issue fix * linting issue fix * fix linting issue * linting issue fix * remove unused code gen * include template files * fix incorrect comment * include topologies for vm_scheduling scenario * rename to PositiveNumberValidator * refine command line comment * refine topology command comment * add a simple doc for new command * fix incorrect value for dummy frame * correct issues in docs * more comments on set_state * doc issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * CLI visualization support and maro grass local mode (#277) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * refactor: extract reusable methods to GrassExecutor * feat: refine validation.py and add docstrings * fix: add remote prefix to ssh function * style: refine logging output * fix: extract param 'vm_name' * fix: linting errors * feat: add NodeStatus and ContainerStatus at executors * feat: use master_node_size as the size of build_node_image_vm * fix: refine comments * feat: add "state" key for node_details * fix: linting errors * fix: deployment error when ssh_port is the default port * refactor: extract utils/*.py in scripts * style: single quote to double quote * refactor: refine folder structure of scripts * fix: linting errors * fix: add executable to fix error initialization * refactor: use SubProcess to execute commands in scripts * refactor: refine script namings * refactor: extract utils/*.py and systemd/*.service in agents * feat: refine Exception structure, add SubProcess class in agents * feat: use psutil to get resource details, move resource details initialization to agents * fix: linting errors * feat: use docker sdk in node_agent * feat: extract RedisExecutor in agents * test: remove image when tearing down * feat: add LoadImageAgent * feat: move node status update to agents * refactor: move utils folder to upper level in scripts * feat: add node_api_server, refine agents folder structure * fix: linting errors * refactor: refine folder structure in grass/lib * refactor: build DeploymentValidator class * refactor: create DetailsReader, DetailsWriter, delete sync mode * refactor: rename DockerManager to DockerController * refactor: rename RedisManager to RedisController * refactor: rename AzureExecutor to AzureController * refactor: create NameCreator * refactor: create PathConvertor * refactor: rename checkers to details_validity_wrapper * refactor: rename lock to operation_lock_wrapper * refactor: create FileSynchronizer * refactor: create redis instance in RedisController * feat: add master_api_server, move job related scripts to api_server * refactor: move node related scripts to api_server * fix: use "DELETE" instead of "DEL" 
as http method * refactor: use mapping names instead of namings like "sths_details" * feat: move master related scripts to api_server * feat: move containers related scripts to api_server * fix: add gracefully wait for remote_start_master_services * feat: move image_files related scripts to api_server * fix: improper test in the training stage * refactor: use local variable "URL_PREFIX" directly, add 's' in node_api_client * refactor: refine namings in services * feat: move clean related scripts to api_server * refactor: delete "public_key" field * feat: build MasterApiClient * refactor: delete sync_mkdir * feat: refine locks in node_details * feat: build DockerController for grass/utils * refactor: rename Extractor to Controller * feat: move schedule related components to api_server * fix: incorrect allocation when starting batch jobs * fix: missing field "containers" in job_details * feat: add delete_job in master_api_server * feat: add logger in agents * fix: no "resources" field when scale up node at the very beginning * feat: use Process back instead of Thread in node_agent * feat: add 'v1' prefix to api_servers' urls * refactor: move lib/aks under lib/clouds * refactor: move lib/k8s_configs to lib/configs, move aks related configs to clouds/aks, delete volumn mount in redis * feat: extract K8sExecutor * fix: add one more searching layer of pakcage_data at maro.cli.k8s * refactor: move lib/configs/nvidia to lib/clouds/aks, make create() as a staticmethod at k8s mode * refactor: move id init to standardize_create_deployment in grass/azure mode * fix: use GlobalParams instead of hard-coded data * feat: build K8sDetailsReader, K8sDetailsWriter * feat: use k8s sdk to replace subprocess call * refactor: delete redundant vars * refactor: move more methods to K8sExecutor * test: use legal naming in tests/cli/k8s * refactor: refine logging messages * refactor: make create() as a staticmethod at grass/azure mode, refine logging messages * feat: build ArmTemplateParameterBuilder in K8sAzureExecutor * refactor: remove redundant params * refactor: rename /clouds to /modes * refactor: refine structures and logging messages in GrassExecutor * feat: add 'PENDING' to NodeStatus * feat: refine build_job_details for create schedule in grass/azure * feat: refine build_job_details for create schedule in k8s/aks * add grass local mode (non-pass) * feat: use node_join schema in grass/azure * refactor: replace /.maro with /.maro-shared, replace admin_username with node_username, remove redundant snippets in /grass/lib/scirpts * refactor: add 'ssh', 'api_server' into master_details and node_details * refactor: move master runtine params initialization into api_server * refactor: refine namings * feat: reconstruct grass/on-premises with new schema * refactor: delete field 'user' in grass_azure_create * refactor: rename 'blueprints_v1' to 'blueprints' * refactor: move some GlobalPaths to subfolders * Update grass local mode, run pass * refactor: replace 'connection' field with 'master' or 'node' * refactor: move start_service scripts to init_master.py * refactor: rename grass/master/release to grass/master/delete_master * refactor: load local_details in node services, refine script namings * refactor: move invocations of start_node and stop node to api server * fix: add missing imports * refactor: rename SubProcess to Subprocess * refactor: delete field 'user' in k8s_aks_create * add resource class * refactor: refine folder structures in /.maro/clusters/cluster * refactor: move /logs to 
/clusters/{cluster_name} * refactor: refine filenames * fix: export DEBIAN_FRONTEND=noninteractive to reduce irrelevant warnings * refactor: refine code structures, delete redundant code * refactor: change /{cluster_name}/details.yml to /{cluster_name}/cluster_details.yml * feat: add rsa+aes data encryption on dev-master communication * fix: change MasterApiClient to RedisController in node-related services and scripts * refactor: remove all "{cluster_name}" in redis keys * refactor: extract init_master and create_user to GrassExecutor * test: refine tests in grass/azure and k8s/aks * refactor: refine ArmTemplateParameterBuilder * add cli visible agent * feat: change the order of installation in init_build_node_image_vm.py * fix: add user/admin_id to grass_on_premises_create.yml * fix: change outdated container names * feat: add standardize_join_cluster_deployment in grass/on-premises * feat: add init_node_runtime_env in join_cluster.py * refactor: refine code structure in join_cluster.py * test: add TestGrassOnPremises * refactor: refine ARM templates * fix: linting errors * fix: test requirements error * fix: arm linting errors * refactor: late import in grass, k8s * style: refine load_parser_grass * style: refine load_parser_k8s * add jobstate and resource usage support * add local visible test * docs: update orchestrations * fix: fix get_job_logs * docs: add docs for GrassAzureExecutor, GrassExecutor * docs: add docs for GrassOnPremisesExecutor * docs: add docs for /grass/scripts * docs: add docs for /grass/services * docs: add docs for /grass/utils * docs: add docs for k8s * grass mode visible pass * grass local mode run pass * fixed pylint * Update resource, rm GPUtil depend * Update CLI local mode visible * grass local mode pass * add redis clear and pylint fixed * rm job status in grass azure mode * fix bug * fixed merge issue * fixed lin * update by pr comments * fixed isort issue * fixed stop bug * fixed local agent and cmp issue * fixed pending job cannot killed * add mount in Grass local mode * add resource check interval in redis Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * Add Env-Geographic visualization tool, CIM hello as example (#291) * streamit with questdb * script to import current dump data, except attention file, use influxdb line protocol for batch sending. 
* refine the interface to flatten dictionary * add messagetype.file to upload file later * correct tag name * correct the way to initial streamit, make it possible to use it any where after start * add data collecting in cim business engine * streamit client refactoring * fix import issue * update cim hello world, with a commented code to enable vis data streaming * fix metric replace bug * refactor the type checking code * V0.2 remove props from be (#269) * Fix bug * fix bu * Master vm doc - data preparation (#285) * Update vm docs * Update docs * Update data preparation docs * Update * Update docs Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * maro geo vis * add new line * doc update * lint refine * lint update * lint updata * lint update * lint update * lint update * code revert * add declare * code revert * add new line * add comment * delete surplus * delete core * lint update * lint update * lint update * lint update * specify version * lint update * specify docker version * import sort * backend revert * Delete test.py * format refact * doc update * import orders * change import orders * change import orders * add version of http-server * add specified port * delete print * lint update * lint update * lint update * update doc * dependecy update * update business engine * business engine * business engine update Co-authored-by: chaosyu <chaos.you@gmail.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Kuan Wei Yu <v-kyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Maro Geographic Tool Doc Update (#294) * streamit with questdb * script to import current dump data, except attention file, use influxdb line protocol for batch sending. * refine the interface to flatten dictionary * add messagetype.file to upload file later * correct tag name * correct the way to initial streamit, make it possible to use it any where after start * add data collecting in cim business engine * streamit client refactoring * fix import issue * update cim hello world, with a commented code to enable vis data streaming * fix metric replace bug * refactor the type checking code * maro geo vis * add new line * doc update * lint refine * lint update * lint updata * lint update * lint update * lint update * code revert * add declare * code revert * add new line * add comment * delete surplus * delete core * lint update * lint update * lint update * lint update * specify version * lint update * specify docker version * import sort * backend revert * Delete test.py * format refact * doc update * import orders * change import orders * change import orders * add version of http-server * add specified port * delete print * lint update * lint update * lint update * update doc * dependecy update * update business engine * business engine * business engine update * doc update * delete irelevant file Co-authored-by: chaosyu <chaos.you@gmail.com> * Maro geo vis Data Update (#295) * streamit with questdb * script to import current dump data, except attention file, use influxdb line protocol for batch sending. 
* refine the interface to flatten dictionary * add messagetype.file to upload file later * correct tag name * correct the way to initial streamit, make it possible to use it any where after start * add data collecting in cim business engine * streamit client refactoring * fix import issue * update cim hello world, with a commented code to enable vis data streaming * fix metric replace bug * refactor the type checking code * maro geo vis * add new line * doc update * lint refine * lint update * lint updata * lint update * lint update * lint update * code revert * add declare * code revert * add new line * add comment * delete surplus * delete core * lint update * lint update * lint update * lint update * specify version * lint update * specify docker version * import sort * backend revert * Delete test.py * format refact * doc update * import orders * change import orders * change import orders * add version of http-server * add specified port * delete print * lint update * lint update * lint update * update doc * dependecy update * update business engine * business engine * business engine update * doc update * delete irelevant file * update data Co-authored-by: chaosyu <chaos.you@gmail.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2_refactored_distributed_framework (#206) * added some more logs for dist RL * bug fix * fixed a typo * bug fix * refined logs * set session_id to None for exit message * add setup/clear/template for maro process * changed to internal logger for actor and learner * removed redundant component name from internal logs * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * fixed typos * update ProcessInternalError * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * updated notebook * 1. removed external loggers from cim example; 2. 
fixed batch inference bugs * removed actor_trainer mode and refactored * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * fixed conflicts * fixed typos * removed stale imports * fixed stale naming * removed dist_topologies folder * refined session id logic * bug fix * refactored * distributed RL refinement * refined * small bug fix * fixed lint issues * fixed lint issues * removed unwanted file * fixed a typo * gnn refactoring in progress * merged algorithm with agent * bug fixes * fix * bug fixes * fixed lint issues and renamed models to model * removed unwanted files * fixed merge conflicts * removed exp pool type spec in AbsAgent * fixed lint issues * changed to a single gnn agent * dqn exp pool bug fix * minor issues * removed GNNAgentManager * updated notebooks and examples according to rl toolkit changes * updated images * moved exp pool init inside DQN * renamed column_based_store to simple_store * mroe gnn refactoring * fixed lint issues * fixed lint issues * lint issue fix * lint issue fix * fixed bugs in test_store * typo fix * minor edits * lint issue fix * finished single process gnn * fixed bugs * 1. removed state_shaper, action_shaper and exp_shaper abstractions; 2. used torch Categorical for sampling actions; 3. removed input_dim and output_dim properties from LearningModel * updated notebook * removed simple agent manager * fixed lint issues * fixed lint issues * bug fix * bug fixes * refined LearningModel * modified gnn example based on latest rl toolkit changes * updated cim example doc * lint issue fix * small refinements * refactored GNN example * replaced ActionInfo with torch Categorical's log_prob for policy_optimization algorithms * refactored gnn example and added single-process script * removed obsolete files from gnn * lint issue fix * formatting * checked out gnn files from origin/v0.2 * refactored distributed rl toolkit * finished distributed rl refactoring and updated dqn example and notebook * merged request_rollout with collect * some refinement * refactored examples * distributed rl revamping complete * bug and formatting fixes * bug fixes * hid proxy instantiation inside dist components * small refinement * refined distributed RL and updated docs * updated docs and notebook * rm unwanted imports * added missing files * rm unwanted files * lint issue fix * bug fix * example doc update * rm agent_manager.svg * updated images * updated image file name in doc * revamped cim example code structure * added missing file * restored default training config for dqn and ac-gnn * added default loss function for actor-critic * rm unwanted import * updated README for cim/ac * removed log_p param for PolicyGradient train() * added READMEs for CIM * renamed ac-gnn to ac_gnn * updated README for CIM and added set_seeds to multi-process dqn * init * remove unit, make it same as logic * init by sku, world sku * init by sku, world sku * remove debug code * correct snapshot number issue * rename logic to unit, make it meaningful * add facility base * refine naming * refine the code, more comment to make it easy to read * add supplier facility, logic not tested yet * fix bug in facility initialize, add consumerunit not completed * refactoring the facilities in world config * add consumer for warehouse facility * add upstream topology, and save it state * add mapping from id to data model index * logic without reward of consumer * bug fix * seller unit * use tcod for path finding * retailer facility * bug fix, show seller 
demands in example * add a interactive and renderable env wrapper to later debugging * move font to subfolder with lisence to make it more clearly * add more details for node mapping * dispatch action by unit id * merge the frame changes to support data model inherit * add action for consumer, so that we can push the requirement * add unit id and facility in state for unit, add storage id for manufacture unit to simple the state retrieving * show manufacture related debug info step by step * add bom info for debug * add x,y to facility, bug fix * fix bugs in transport and distribution unit, correct the path finding issue * show vehicle movement in screen * remove completed todo * fix vehicle location issue, make all units and data model class from configs * show more states * fix slot number bug for dynamic backend * rename suppliers to manufactures * add missing file * remove code config, use yml instead * add 2 different step modes * update changes * rename manufacture * add action for manufacture unit * more attribute for states * add balance sheet * rename attribute to unify the feature name * reverted experimental changes in dqn learner * updated notebook * rm supply chain code * lint issue fix * lint issue fix * added missing file * added general rollout workflow and trajectory class * refactored * more refactoring * checked out backend from v0.2 * checked out setup.py from v0.2 Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> Co-authored-by: chaosyu <chaos.you@gmail.com> * Add the price model (#286) * Add the price model * fix the error * Refine the energy consumption * Fix the error * Delete business_engine_20210225104622.py * Delete * Delete the history file * Delete common_20210205152100.py * Delete common_20210302150646.py * Refine the code * Refine the code * Refine the code * Delete history files * Fix the error * Fix the error * Fix the error * Fix the error * Fix the error * Fix the error * refine the code * Refine the code * Delete the history file * Fix the error * Fix the error * Fix the error * Refine the code * fix the error * fix the error * fix the error * Refine the code * Add toy files * Refine the code * Refine the code * Add file * Refine the code Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * add vm_scheduling meta into package data * Maro Dashboard Vis Doc Update (#298) * streamit with questdb * script to import current dump data, except attention file, use influxdb line protocol for batch sending. 
* refine the interface to flatten dictionary * add messagetype.file to upload file later * correct tag name * correct the way to initial streamit, make it possible to use it any where after start * add data collecting in cim business engine * streamit client refactoring * fix import issue * update cim hello world, with a commented code to enable vis data streaming * fix metric replace bug * refactor the type checking code * maro geo vis * add new line * doc update * lint refine * lint update * lint updata * lint update * lint update * lint update * code revert * add declare * code revert * add new line * add comment * delete surplus * delete core * lint update * lint update * lint update * lint update * specify version * lint update * specify docker version * import sort * backend revert * Delete test.py * format refact * doc update * import orders * change import orders * change import orders * add version of http-server * add specified port * delete print * lint update * lint update * lint update * update doc * dependecy update * update business engine * business engine * business engine update * doc update * delete irelevant file * update data * doc update Co-authored-by: chaosyu <chaos.you@gmail.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fixed internal logger dumplicated output (#299) * fixed internal logger dumplicated output * delete unused import * fixed isort Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> Co-authored-by: Kuan Wei Yu <v-kyu@microsoft.com> Co-authored-by: MicrosoftHam <77261932+MicrosoftHam@users.noreply.github.com> Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com>
@@ -63,4 +63,4 @@ jobs:
        test_with_cli: True
        training_only: True
      run: |
        python -m unittest tests/cli/grass/test_grass.py
        python -m unittest -f tests/cli/grass/test_grass_azure.py

@@ -6,6 +6,7 @@
*.c
*.cpp
*.DS_Store
.pytest_cache/
.idea/
.vscode/
.vs/

@@ -3,3 +3,5 @@ prune examples

include maro/simulator/scenarios/cim/topologies/*/*.yml
include maro/simulator/scenarios/citi_bike/topologies/*/*.yml
include maro/simulator/scenarios/vm_scheduling/topologies/*/*.yml
include maro/cli/project_generator/templates/*.jinja

@@ -161,7 +161,7 @@ env = Env(scenario="cim",
          options={"enable-dump-snapshot": "./dump_data"})

# Inspect environment with the dump data
maro inspector env --source ./dump_data
maro inspector dashboard --source_path ./dump_data/snapshot_dump_folder
```

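For context, the change above switches the README to the new dashboard-based inspector. A minimal, illustrative sketch of producing the dump data it consumes (the topology name and episode length below are placeholder values, not part of this diff):

```python
from maro.simulator import Env

# Create a CIM environment that dumps snapshot data to ./dump_data.
env = Env(
    scenario="cim", topology="toy.4p_ssdd_l0.0", start_tick=0, durations=100,
    options={"enable-dump-snapshot": "./dump_data"}
)

# Run one episode with default (None) actions so the dump files get written.
metrics, decision_event, done = env.step(None)
while not done:
    metrics, decision_event, done = env.step(None)

# The dump can then be explored with:
#   maro inspector dashboard --source_path ./dump_data/snapshot_dump_folder
```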
### Show Cases

@@ -9,6 +9,34 @@ maro.rl.agent.abs\_agent
    :undoc-members:
    :show-inheritance:

maro.rl.agent.dqn
--------------------------------------------------------------------------------

.. automodule:: maro.rl.agent.dqn
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.agent.ddpg
--------------------------------------------------------------------------------

.. automodule:: maro.rl.agent.ddpg
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.agent.policy\_optimization
--------------------------------------------------------------------------------

.. automodule:: maro.rl.agent.policy_optimization
    :members:
    :undoc-members:
    :show-inheritance:


Agent Manager
================================================================================

maro.rl.agent.abs\_agent\_manager
--------------------------------------------------------------------------------

@@ -18,33 +46,13 @@ maro.rl.agent.abs\_agent\_manager
    :show-inheritance:


Algorithms
Model
================================================================================

maro.rl.algorithms.torch.abs\_algorithm
maro.rl.model.learning\_model
--------------------------------------------------------------------------------

.. automodule:: maro.rl.algorithms.torch.abs_algorithm
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.algorithms.torch.dqn
--------------------------------------------------------------------------------

.. automodule:: maro.rl.algorithms.torch.dqn
    :members:
    :undoc-members:
    :show-inheritance:


Models
================================================================================

maro.rl.models.torch.learning\_model
--------------------------------------------------------------------------------

.. automodule:: maro.rl.models.torch.learning_model
.. automodule:: maro.rl.model.torch.learning_model
    :members:
    :undoc-members:
    :show-inheritance:

@@ -53,18 +61,46 @@ maro.rl.models.torch.learning\_model
Explorer
================================================================================

maro.rl.explorer.abs\_explorer
maro.rl.exploration.abs\_explorer
--------------------------------------------------------------------------------

.. automodule:: maro.rl.explorer.abs_explorer
.. automodule:: maro.rl.exploration.abs_explorer
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.explorer.simple\_explorer
maro.rl.exploration.epsilon\_greedy\_explorer
--------------------------------------------------------------------------------

.. automodule:: maro.rl.explorer.simple_explorer
.. automodule:: maro.rl.exploration.epsilon_greedy_explorer
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.exploration.noise\_explorer
--------------------------------------------------------------------------------

.. automodule:: maro.rl.exploration.noise_explorer
    :members:
    :undoc-members:
    :show-inheritance:


Scheduler
================================================================================

maro.rl.scheduling.scheduler
--------------------------------------------------------------------------------

.. automodule:: maro.rl.scheduling.scheduler
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.scheduling.simple\_parameter\_scheduler
--------------------------------------------------------------------------------

.. automodule:: maro.rl.scheduling.simple_parameter_scheduler
    :members:
    :undoc-members:
    :show-inheritance:

@@ -81,38 +117,6 @@ maro.rl.shaping.abs\_shaper
    :undoc-members:
    :show-inheritance:

maro.rl.shaping.action\_shaper
--------------------------------------------------------------------------------

.. automodule:: maro.rl.shaping.action_shaper
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.shaping.experience\_shaper
--------------------------------------------------------------------------------

.. automodule:: maro.rl.shaping.experience_shaper
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.shaping.k\_step\_experience\_shaper
--------------------------------------------------------------------------------

.. automodule:: maro.rl.shaping.k_step_experience_shaper
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.shaping.state\_shaper
--------------------------------------------------------------------------------

.. automodule:: maro.rl.shaping.state_shaper
    :members:
    :undoc-members:
    :show-inheritance:


Storage
================================================================================

@@ -125,18 +129,10 @@ maro.rl.storage.abs\_store
    :undoc-members:
    :show-inheritance:

maro.rl.storage.column\_based\_store
maro.rl.storage.simple\_store
--------------------------------------------------------------------------------

.. automodule:: maro.rl.storage.column_based_store
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.storage.utils
--------------------------------------------------------------------------------

.. automodule:: maro.rl.storage.utils
.. automodule:: maro.rl.storage.simple_store
    :members:
    :undoc-members:
    :show-inheritance:

@@ -37,14 +37,16 @@ author = "MARO Team"
# extensions coming with Sphinx (named "sphinx.ext.*") or your custom
# ones.

extensions = ["recommonmark",
              "sphinx.ext.autodoc",
              "sphinx.ext.coverage",
              "sphinx.ext.napoleon",
              "sphinx.ext.viewcode",
              "sphinx_markdown_tables",
              "sphinx_copybutton",
              ]
extensions = [
    "recommonmark",
    "sphinx.ext.autodoc",
    "sphinx.ext.coverage",
    "sphinx.ext.napoleon",
    "sphinx.ext.viewcode",
    "sphinx_markdown_tables",
    "sphinx_copybutton",
    "sphinx.ext.autosectionlabel",
]

napoleon_google_docstring = True
napoleon_use_param = False

@@ -1,298 +1,167 @@
Multi Agent DQN for CIM
================================================

This example demonstrates how to use MARO's reinforcement learning (RL) toolkit to solve the
`CIM <https://maro.readthedocs.io/en/latest/scenarios/container_inventory_management.html>`_ problem. It is formalized as a multi-agent reinforcement learning problem, where each port acts as a decision
agent. The agents take actions independently, e.g., loading containers to vessels or discharging containers from vessels.
This example demonstrates how to use MARO's reinforcement learning (RL) toolkit to solve the container
inventory management (CIM) problem. It is formalized as a multi-agent reinforcement learning problem,
where each port acts as a decision agent. When a vessel arrives at a port, these agents must take actions
by transferring a certain amount of containers to / from the vessel. The objective is for the agents to
learn policies that minimize the overall container shortage.

State Shaper
------------
Trajectory
----------

`State shaper <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#shapers>`_ converts the environment
observation to the model input state which includes temporal and spatial information. For this scenario, the model input
state includes:
The ``CIMTrajectoryForDQN`` class inherits from ``Trajectory`` and implements methods to be used as callbacks
in the roll-out loop. In this example,
* ``get_state`` converts environment observations to state vectors that encode temporal and spatial information.
  The temporal information includes relevant port and vessel information, such as shortage and remaining space,
  over the past k days (here k = 7). The spatial information includes features of the downstream ports.
* ``get_action`` converts agents' output (an integer that maps to a percentage of containers to be loaded
  to or unloaded from the vessel) to action objects that can be executed by the environment.
* ``get_offline_reward`` computes the reward of a given action as a linear combination of fulfillment and
  shortage within a future time frame.
* ``on_finish`` processes a complete trajectory into data that can be used directly by the learning agents.

- Temporal information, including the past week's information of ports and vessels, such as shortage on port and
  remaining space on vessel.
- Spatial information, including related downstream port features.

.. code-block:: python
PORT_ATTRIBUTES = ["empty", "full", "on_shipper", "on_consignee", "booking", "shortage", "fulfillment"]
VESSEL_ATTRIBUTES = ["empty", "full", "remaining_space"]
class CIMTrajectoryForDQN(Trajectory):
    def __init__(
        self, env, *, port_attributes, vessel_attributes, action_space, look_back, max_ports_downstream,
        reward_time_window, fulfillment_factor, shortage_factor, time_decay,
        finite_vessel_space=True, has_early_discharge=True
    ):
        super().__init__(env)
        self.port_attributes = port_attributes
        self.vessel_attributes = vessel_attributes
        self.action_space = action_space
        self.look_back = look_back
        self.max_ports_downstream = max_ports_downstream
        self.reward_time_window = reward_time_window
        self.fulfillment_factor = fulfillment_factor
        self.shortage_factor = shortage_factor
        self.time_decay = time_decay
        self.finite_vessel_space = finite_vessel_space
        self.has_early_discharge = has_early_discharge

class CIMStateShaper(StateShaper):
    ...
    def __call__(self, decision_event, snapshot_list):
        tick, port_idx, vessel_idx = decision_event.tick, decision_event.port_idx, decision_event.vessel_idx
        ticks = [tick - rt for rt in range(self._look_back - 1)]
        future_port_idx_list = snapshot_list["vessels"][tick: vessel_idx: 'future_stop_list'].astype('int')
        port_features = snapshot_list["ports"][ticks: [port_idx] + list(future_port_idx_list): PORT_ATTRIBUTES]
        vessel_features = snapshot_list["vessels"][tick: vessel_idx: VESSEL_ATTRIBUTES]
        state = np.concatenate((port_features, vessel_features))
        return str(port_idx), state
    def get_state(self, event):
        vessel_snapshots, port_snapshots = self.env.snapshot_list["vessels"], self.env.snapshot_list["ports"]
        tick, port_idx, vessel_idx = event.tick, event.port_idx, event.vessel_idx
        ticks = [tick - rt for rt in range(self.look_back - 1)]
        future_port_idx_list = vessel_snapshots[tick: vessel_idx: 'future_stop_list'].astype('int')
        port_features = port_snapshots[ticks: [port_idx] + list(future_port_idx_list): self.port_attributes]
        vessel_features = vessel_snapshots[tick: vessel_idx: self.vessel_attributes]
        return {port_idx: np.concatenate((port_features, vessel_features))}

    def get_action(self, action_by_agent, event):
        vessel_snapshots = self.env.snapshot_list["vessels"]
        action_info = list(action_by_agent.values())[0]
        model_action = action_info[0] if isinstance(action_info, tuple) else action_info
        scope, tick, port, vessel = event.action_scope, event.tick, event.port_idx, event.vessel_idx
        zero_action_idx = len(self.action_space) / 2  # index corresponding to value zero.
        vessel_space = vessel_snapshots[tick:vessel:self.vessel_attributes][2] if self.finite_vessel_space else float("inf")
        early_discharge = vessel_snapshots[tick:vessel:"early_discharge"][0] if self.has_early_discharge else 0
        percent = abs(self.action_space[model_action])
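The slicing used in ``get_state`` follows MARO's snapshot-list convention, ``snapshot_list[node_type][ticks : node_indices : attributes]``, which returns a flattened NumPy array. A small illustrative query (the attribute names are the ones already used in this example; ``env`` is an existing CIM environment):

.. code-block:: python

    # Empty-container count of port 0 at tick 10; the slice returns a flat numpy array.
    empty_at_port_0 = env.snapshot_list["ports"][10: 0: "empty"][0]

    # Multiple ticks and attributes come back flattened, tick by tick.
    recent_features = env.snapshot_list["ports"][[8, 9, 10]: 0: ["shortage", "fulfillment"]]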
Action Shaper
-------------

`Action shaper <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#shapers>`_ is used to convert an
agent's model output to an environment executable action. For this specific scenario, the action space consists of
integers from -10 to 10, with -10 indicating loading 100% of the containers in the current inventory to the vessel and
10 indicating discharging 100% of the containers on the vessel to the port.

.. code-block:: python

class CIMActionShaper(ActionShaper):
    ...
    def __call__(self, model_action, decision_event, snapshot_list):
        scope = decision_event.action_scope
        tick = decision_event.tick
        port_idx = decision_event.port_idx
        vessel_idx = decision_event.vessel_idx

        port_empty = snapshot_list["ports"][tick: port_idx: ["empty", "full", "on_shipper", "on_consignee"]][0]
        vessel_remaining_space = snapshot_list["vessels"][tick: vessel_idx: ["empty", "full", "remaining_space"]][2]
        early_discharge = snapshot_list["vessels"][tick:vessel_idx: "early_discharge"][0]
        assert 0 <= model_action < len(self._action_space)

        if model_action < self._zero_action_index:
            actual_action = max(round(self._action_space[model_action] * port_empty), -vessel_remaining_space)
        elif model_action > self._zero_action_index:
            plan_action = self._action_space[model_action] * (scope.discharge + early_discharge) - early_discharge
            actual_action = (
                round(plan_action) if plan_action > 0
                else round(self._action_space[model_action] * scope.discharge)
            )
        if model_action < zero_action_idx:
            action_type = ActionType.LOAD
            actual_action = min(round(percent * scope.load), vessel_space)
        elif model_action > zero_action_idx:
            action_type = ActionType.DISCHARGE
            plan_action = percent * (scope.discharge + early_discharge) - early_discharge
            actual_action = round(plan_action) if plan_action > 0 else round(percent * scope.discharge)
        else:
            actual_action = 0
            actual_action, action_type = 0, None

        return Action(vessel_idx, port_idx, actual_action)
        return {port: Action(vessel, port, actual_action, action_type)}
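To make the action mapping concrete, here is a small worked example. The 21-point action space below is an assumption consistent with the ``NUM_ACTIONS = 21`` used later in this doc, not code taken from the scenario:

.. code-block:: python

    import numpy as np

    # Hypothetical action space: -1.0 (load 100%) ... 0.0 (no-op) ... 1.0 (discharge 100%).
    action_space = list(np.linspace(-1.0, 1.0, 21))
    zero_action_idx = len(action_space) // 2           # index 10 maps to "do nothing"

    model_action = 3                                   # below the zero index -> a LOAD action
    percent = abs(action_space[model_action])          # |-0.7| = 0.7, i.e. load 70%

    # With 200 containers available at the port and 120 free slots on the vessel,
    # the executable amount is capped by the remaining vessel space.
    actual_action = min(round(percent * 200), 120)     # min(140, 120) = 120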
Experience Shaper
-----------------
    def get_offline_reward(self, event):
        port_snapshots = self.env.snapshot_list["ports"]
        start_tick = event.tick + 1
        ticks = list(range(start_tick, start_tick + self.reward_time_window))

`Experience shaper <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#shapers>`_ is used to convert
an episode trajectory to trainable experiences for RL agents. For this specific scenario, the reward is a linear
combination of fulfillment and shortage in a limited time window.
        future_fulfillment = port_snapshots[ticks::"fulfillment"]
        future_shortage = port_snapshots[ticks::"shortage"]
        decay_list = [
            self.time_decay ** i for i in range(self.reward_time_window)
            for _ in range(future_fulfillment.shape[0] // self.reward_time_window)
        ]

.. code-block:: python
class TruncatedExperienceShaper(ExperienceShaper):
    ...
    def __call__(self, trajectory, snapshot_list):
        experiences_by_agent = {}
        for i in range(len(trajectory) - 1):
            transition = trajectory[i]
            agent_id = transition["agent_id"]
            if agent_id not in experiences_by_agent:
                experiences_by_agent[agent_id] = defaultdict(list)
            experiences = experiences_by_agent[agent_id]
            experiences["state"].append(transition["state"])
            experiences["action"].append(transition["action"])
            experiences["reward"].append(self._compute_reward(transition["event"], snapshot_list))
            experiences["next_state"].append(trajectory[i + 1]["state"])
        tot_fulfillment = np.dot(future_fulfillment, decay_list)
        tot_shortage = np.dot(future_shortage, decay_list)

        return np.float32(self.fulfillment_factor * tot_fulfillment - self.shortage_factor * tot_shortage)

    def on_env_feedback(self, event, state_by_agent, action_by_agent, reward):
        self.trajectory["event"].append(event)
        self.trajectory["state"].append(state_by_agent)
        self.trajectory["action"].append(action_by_agent)

    def on_finish(self):
        exp_by_agent = defaultdict(lambda: defaultdict(list))
        for i in range(len(self.trajectory["state"]) - 1):
            agent_id = list(self.trajectory["state"][i].keys())[0]
            exp = exp_by_agent[agent_id]
            exp["S"].append(self.trajectory["state"][i][agent_id])
            exp["A"].append(self.trajectory["action"][i][agent_id])
            exp["R"].append(self.get_offline_reward(self.trajectory["event"][i]))
            exp["S_"].append(list(self.trajectory["state"][i + 1].values())[0])

        return dict(exp_by_agent)

        return experiences_by_agent
|
||||
-----
|
||||
|
||||
`Agent <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#agent>`_ is a combination of (RL)
|
||||
algorithm, experience pool, and a set of parameters that governs the training loop. For this scenario, the agent is the
|
||||
abstraction of a port. We choose DQN as our underlying learning algorithm with a TD-error-based sampling mechanism.
|
||||
The out-of-the-box DQN is used as our agent.
|
||||
|
||||
.. code-block:: python
|
||||
NUM_ACTIONS = 21
|
||||
class DQNAgent(AbsAgent):
|
||||
...
|
||||
def train(self):
|
||||
if len(self._experience_pool) < self._min_experiences_to_train:
|
||||
return
|
||||
|
||||
for _ in range(self._num_batches):
|
||||
indexes, sample = self._experience_pool.sample_by_key("loss", self._batch_size)
|
||||
state = np.asarray(sample["state"])
|
||||
action = np.asarray(sample["action"])
|
||||
reward = np.asarray(sample["reward"])
|
||||
next_state = np.asarray(sample["next_state"])
|
||||
loss = self._algorithm.train(state, action, reward, next_state)
|
||||
self._experience_pool.update(indexes, {"loss": loss})
|
||||
|
||||
def create_dqn_agents(agent_id_list):
|
||||
agent_dict = {}
|
||||
for agent_id in agent_id_list:
|
||||
q_net = NNStack(
|
||||
"q_value",
|
||||
FullyConnectedBlock(
|
||||
input_dim=state_shaper.dim,
|
||||
hidden_dims=[256, 128, 64],
|
||||
output_dim=NUM_ACTIONS,
|
||||
activation=nn.LeakyReLU,
|
||||
is_head=True,
|
||||
batch_norm_enabled=True,
|
||||
softmax_enabled=False,
|
||||
skip_connection_enabled=False,
|
||||
dropout_p=.0)
|
||||
)
|
||||
|
||||
algorithm = DQN(
|
||||
model=LearningModel(
|
||||
q_net, optimizer_options=OptimizerOptions(cls=RMSprop, params={"lr": 0.05})
|
||||
),
|
||||
config=DQNConfig(
|
||||
reward_decay=.0,
|
||||
target_update_frequency=5,
|
||||
tau=0.1,
|
||||
is_double=True,
|
||||
per_sample_td_error_enabled=True,
|
||||
loss_cls=nn.SmoothL1Loss
|
||||
)
|
||||
)
|
||||
|
||||
experience_pool = ColumnBasedStore(**config.experience_pool)
|
||||
agent_dict[agent_id] = DQNAgent(
|
||||
agent_id, algorithm, ColumnBasedStore(),
|
||||
min_experiences_to_train=1024, num_batches=10, batch_size=128
|
||||
)
|
||||
|
||||
return agent_dict
|
||||
|
||||
Agent Manager
|
||||
-------------
|
||||
|
||||
The complexities of the environment can be isolated from the learning algorithm by using an
|
||||
`Agent manager <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#agent-manager>`_
|
||||
to manage individual agents. We define a function to create the agents and an agent manager class
|
||||
that implements the ``train`` method where the newly obtained experiences are stored in the agents'
|
||||
experience pools before training, in accordance with the DQN algorithm.
|
||||
|
||||
.. code-block:: python
|
||||
class DQNAgentManager(SimpleAgentManager):
|
||||
def train(self, experiences_by_agent, performance=None):
|
||||
self._assert_train_mode()
|
||||
|
||||
# store experiences for each agent
|
||||
for agent_id, exp in experiences_by_agent.items():
|
||||
exp.update({"loss": [1e8] * len(list(exp.values())[0])})
|
||||
self.agent_dict[agent_id].store_experiences(exp)
|
||||
|
||||
for agent in self.agent_dict.values():
|
||||
agent.train()
|
||||
|
||||
Main Loop with Actor and Learner (Single Process)
|
||||
-------------------------------------------------
|
||||
|
||||
This single-process workflow, in which a learning policy interacts with a MARO environment, consists of:
|
||||
- Initializing an environment with specific scenario and topology parameters.
|
||||
- Defining scenario-specific components, e.g. shapers.
|
||||
- Creating agents and an agent manager.
|
||||
- Creating an `actor <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#learner-and-actor>`_ and a
|
||||
`learner <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#learner-and-actor>`_ to start the
|
||||
training process in which the agent manager interacts with the environment for collecting experiences and updating
|
||||
policies.
|
||||
|
||||
.. code-block:: python
|
||||
env = Env("cim", "toy.4p_ssdd_l0.0", durations=1120)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
state_shaper = CIMStateShaper(look_back=7, max_ports_downstream=2)
|
||||
action_shaper = CIMActionShaper(action_space=list(np.linspace(-1.0, 1.0, NUM_ACTIONS)))
|
||||
experience_shaper = TruncatedExperienceShaper(
|
||||
time_window=100, fulfillment_factor=1.0, shortage_factor=1.0, time_decay_factor=0.97
|
||||
)
|
||||
agent_manager = DQNAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.TRAIN_INFERENCE,
|
||||
agent_dict=create_dqn_agents(agent_id_list),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper
|
||||
)
|
||||
|
||||
scheduler = TwoPhaseLinearParameterScheduler(
|
||||
max_episode=100,
|
||||
parameter_names=["epsilon"],
|
||||
split_ep=50,
|
||||
start_values=0.4,
|
||||
mid_values=0.32,
|
||||
end_values=.0
|
||||
)
|
||||
|
||||
actor = SimpleActor(env, agent_manager)
|
||||
learner = SimpleLearner(agent_manager, actor, scheduler)
|
||||
learner.learn()
|
||||
|
||||
|
||||
Main Loop with Actor and Learner (Distributed/Multi-process)
|
||||
--------------------------------------------------------------
|
||||
|
||||
We demonstrate a single-learner and multi-actor topology where the learner drives the program by telling remote actors
|
||||
to perform roll-out tasks and using the results they send back to improve the policies. The workflow usually involves
|
||||
launching a learner process and an actor process separately. Because training occurs on the learner side and inference
|
||||
occurs on the actor side, we need to create appropriate agent managers on both sides.
|
||||
|
||||
On the actor side, the agent manager must be equipped with all shapers as well as an explorer. Thus, the code for
|
||||
creating an environment and an agent manager on the actor side is similar to that for the single-host version,
|
||||
except that the agent manager mode must be set to AgentManagerMode.INFERENCE. As in the single-process version, the environment
|
||||
and the agent manager are wrapped in a SimpleActor instance. To make the actor a distributed worker, we need to further
|
||||
wrap it in an ActorWorker instance. Finally, we launch the worker and it starts to listen to roll-out requests from the
|
||||
learner. The following code snippet shows the creation of an actor worker with a simple (local) actor wrapped inside.
|
||||
|
||||
.. code-block:: python
|
||||
env = Env("cim", "toy.4p_ssdd_l0.0", durations=1120)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
agent_manager = DQNAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.INFERENCE,
|
||||
agent_dict=create_dqn_agents(agent_id_list),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper
|
||||
)
|
||||
proxy_params = {
|
||||
"group_name": "distributed_cim",
|
||||
"expected_peers": {"learner": 1},
|
||||
"redis_address": ("localhost", 6379),
|
||||
"max_retries": 15
|
||||
.. code-block:: python
|
||||
agent_config = {
|
||||
"model": ...,
|
||||
"optimization": ...,
|
||||
"hyper_params": ...
|
||||
}
|
||||
actor_worker = ActorWorker(
|
||||
local_actor=SimpleActor(env=env, agent_manager=agent_manager),
|
||||
proxy_params=proxy_params
|
||||
)
|
||||
actor_worker.launch()
|
||||
|
||||
On the learner side, an agent manager in AgentManagerMode.TRAIN mode is required. However, it is not necessary to create shapers for an
|
||||
agent manager in AgentManagerMode.TRAIN mode. Instead of creating an actor, we create an actor proxy and wrap it inside the learner. This proxy
|
||||
serves as the communication interface for the learner and is responsible for sending roll-out requests to remote actor
|
||||
processes and receiving results. Calling the train method executes the usual training loop except that the actual
|
||||
roll-out is performed remotely. The code snippet below shows the creation of a learner with an actor proxy wrapped
|
||||
inside that communicates with 3 actors.
|
||||
def get_dqn_agent():
|
||||
q_model = SimpleMultiHeadModel(
|
||||
FullyConnectedBlock(**agent_config["model"]), optim_option=agent_config["optimization"]
|
||||
)
|
||||
return DQN(q_model, DQNConfig(**agent_config["hyper_params"]))
|
||||
|
||||
|
||||
Training
|
||||
--------
|
||||
|
||||
The distributed training consists of one learner process and multiple actor processes. The learner optimizes
|
||||
the policy by collecting roll-out data from the actors to train the underlying agents.
|
||||
|
||||
The actor process must create a roll-out executor for performing the requested roll-outs, which means that
|
||||
the environment simulator and shapers should be created here. In this example, inference is performed on the
|
||||
actor's side, so a set of DQN agents must be created in order to load the models (and exploration parameters)
|
||||
from the learner.
|
||||
|
||||
.. code-block:: python
|
||||
def cim_dqn_actor():
|
||||
env = Env(**training_config["env"])
|
||||
agent = MultiAgentWrapper({name: get_dqn_agent() for name in env.agent_idx_list})
|
||||
actor = Actor(env, agent, CIMTrajectoryForDQN, trajectory_kwargs=common_config)
|
||||
actor.as_worker(training_config["group"])
|
||||
|
||||
agent_manager = DQNAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.TRAIN,
|
||||
agent_dict=create_dqn_agents(agent_id_list),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper
|
||||
)
|
||||
proxy_params = {
|
||||
"group_name": "distributed_cim",
|
||||
"expected_peers": {"actor": 3},
|
||||
"redis_address": ("localhost", 6379),
|
||||
"max_retries": 15
|
||||
}
|
||||
actor = ActorProxy(proxy_params=proxy_params, experience_collecting_func=concat_experiences_by_agent)
|
||||
scheduler = TwoPhaseLinearParameterScheduler(
|
||||
max_episode=100,
|
||||
parameter_names=["epsilon"],
|
||||
split_ep=50,
|
||||
start_values=0.4,
|
||||
mid_values=0.32,
|
||||
end_values=.0
|
||||
)
|
||||
learner = SimpleLearner(agent_manager, actor, scheduler)
|
||||
learner.learn()
|
||||
The learner's side requires a concrete learner class that inherits from ``AbsLearner`` and implements the ``run``
|
||||
method which contains the main training loop. Here the implementation is similar to the single-threaded version
|
||||
except that the ``collect`` method is used to obtain roll-out data from the actors (since the roll-out executors
|
||||
are located on the actors' side). The agents created here are where training occurs and hence always contain the
|
||||
latest policies.
|
||||
|
||||
.. code-block:: python
|
||||
def cim_dqn_learner():
|
||||
env = Env(**training_config["env"])
|
||||
agent = MultiAgentWrapper({name: get_dqn_agent() for name in env.agent_idx_list})
|
||||
scheduler = TwoPhaseLinearParameterScheduler(training_config["max_episode"], **training_config["exploration"])
|
||||
actor = ActorProxy(
|
||||
training_config["group"], training_config["num_actors"],
|
||||
update_trigger=training_config["learner_update_trigger"]
|
||||
)
|
||||
learner = OffPolicyLearner(actor, scheduler, agent, **training_config["training"])
|
||||
learner.run()
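To run this distributed example, the learner and actor entry points are started as separate processes. The following is a minimal local sketch; the script names and the number of actors are assumptions, and in practice these processes are usually launched through the MARO orchestration tooling described elsewhere in the docs:

.. code-block:: python

    import subprocess

    NUM_ACTORS = 3  # should match the number of actor peers the learner expects

    # Hypothetical entry scripts wrapping cim_dqn_learner() and cim_dqn_actor().
    learner = subprocess.Popen(["python", "learner_main.py"])
    actors = [subprocess.Popen(["python", "actor_main.py"]) for _ in range(NUM_ACTORS)]

    # Wait for all processes to finish.
    for proc in [learner, *actors]:
        proc.wait()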
|
||||
|
||||
.. note::
|
||||
|
||||
|
|
|
@ -81,9 +81,9 @@ Contents
|
|||
|
||||
installation/pip_install.rst
|
||||
installation/playground.rst
|
||||
installation/grass_cluster_provisioning_on_azure.rst
|
||||
installation/k8s_cluster_provisioning_on_azure.rst
|
||||
installation/grass_cluster_provisioning_on_premises.rst
|
||||
installation/grass_azure_cluster_provisioning.rst
|
||||
installation/grass_on_premises_cluster_provisioning.rst
|
||||
installation/k8s_aks_cluster_provisioning.rst
|
||||
installation/multi_processes_localhost_provisioning.rst
|
||||
|
||||
.. toctree::
|
||||
|
@ -93,6 +93,7 @@ Contents
|
|||
scenarios/container_inventory_management.rst
|
||||
scenarios/citi_bike.rst
|
||||
scenarios/vm_scheduling.rst
|
||||
scenarios/command_line.rst
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
@ -114,6 +115,7 @@ Contents
|
|||
key_components/communication.rst
|
||||
key_components/orchestration.rst
|
||||
key_components/dashboard_visualization.rst
|
||||
key_components/geographic_visualization.rst
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
|
|
@ -0,0 +1,240 @@
|
|||
.. _grass-azure-cluster-provisioning:
|
||||
|
||||
Grass Cluster Provisioning on Azure
|
||||
===================================
|
||||
|
||||
With the following guide, you can build up a MARO cluster in
|
||||
:ref:`grass/azure <grass>`
|
||||
mode on Azure and run your training job in a distributed environment.
|
||||
|
||||
Prerequisites
|
||||
-------------
|
||||
|
||||
* `Install the Azure CLI and login <https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest>`_
|
||||
* `Install docker <https://docs.docker.com/engine/install/>`_ and
|
||||
`Configure docker to make sure it can be managed as a non-root user <https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user>`_
|
||||
|
||||
Cluster Management
|
||||
------------------
|
||||
|
||||
* Create a cluster with a :ref:`deployment <#grass-azure-create>`
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Create a grass cluster with a grass-create deployment
|
||||
maro grass create ./grass-azure-create.yml
|
||||
|
||||
* Scale the cluster
|
||||
|
||||
Check `VM Size <https://docs.microsoft.com/en-us/azure/virtual-machines/sizes>`_ to see more node specifications.
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Scale nodes with 'Standard_D4s_v3' specification to 2
|
||||
maro grass node scale myGrassCluster Standard_D4s_v3 2
|
||||
|
||||
# Scale nodes with 'Standard_D2s_v3' specification to 0
|
||||
maro grass node scale myGrassCluster Standard_D2s_v3 0
|
||||
|
||||
* Delete the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Delete a grass cluster
|
||||
maro grass delete myGrassCluster
|
||||
|
||||
* Start/Stop nodes to save costs
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start 2 nodes with 'Standard_D4s_v3' specification
|
||||
maro grass node start myGrassCluster Standard_D4s_v3 2
|
||||
|
||||
# Stop 2 nodes with 'Standard_D4s_v3' specification
|
||||
maro grass node stop myGrassCluster Standard_D4s_v3 2
|
||||
|
||||
* Get statuses of the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Get master status
|
||||
maro grass status myGrassCluster master
|
||||
|
||||
# Get nodes status
|
||||
maro grass status myGrassCluster nodes
|
||||
|
||||
# Get containers status
|
||||
maro grass status myGrassCluster containers
|
||||
|
||||
* Clean up the cluster
|
||||
|
||||
Delete all running jobs, schedules, containers in the cluster.
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
maro grass clean myGrassCluster
|
||||
|
||||
.. _grass-azure-cluster-provisioning/run-job:
|
||||
|
||||
Run Job
|
||||
-------
|
||||
|
||||
* Push your training image from local machine
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push image 'myImage' to the cluster,
|
||||
# 'myImage' is a docker image loaded on the machine where this command is executed
|
||||
maro grass image push myGrassCluster --image-name myImage
|
||||
|
||||
* Push your training data
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push dqn folder under './myTrainingData/' to a relative path '/myTrainingData' in the cluster
|
||||
# You can then assign your mapping location in the start-job-deployment
|
||||
maro grass data push myGrassCluster ./myTrainingData/dqn /myTrainingData
|
||||
|
||||
* Start a training job with a :ref:`start-job-deployment <grass-start-job>`
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training job with a start-job deployment
|
||||
maro grass job start myGrassCluster ./grass-start-job.yml
|
||||
|
||||
* Or, schedule batch jobs with a :ref:`start-schedule-deployment <grass-start-schedule>`
|
||||
|
||||
These jobs will share the same component specification.
|
||||
|
||||
A best practice for this command: push all of your training configs at once with "``maro grass data push``",
read the job name from the environment variables inside each container,
and then load the training config that corresponds to that job name (see the sketch after the command below).
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training schedule with a start-schedule deployment
|
||||
maro grass schedule start myGrassCluster ./grass-start-schedule.yml
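For example, a component inside the container might pick its training config based on the job name exposed through the environment. This is a hypothetical sketch; the exact variable name and config layout depend on your setup:

.. code-block:: python

    import os

    # Hypothetical: read the job name injected into the container environment.
    job_name = os.environ.get("JOB_NAME", "myJob2")
    # Resolve the config pushed earlier with 'maro grass data push'.
    config_path = f"/myTrainingData/dqn/{job_name}/config.yml"
    print(f"Loading training config from {config_path}")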
|
||||
|
||||
* Get the logs of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Get the logs of the job
|
||||
maro grass job logs myGrassCluster myJob1
|
||||
|
||||
* List the current status of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# List the current status of the job
|
||||
maro grass job list myGrassCluster
|
||||
|
||||
* Stop a training job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Stop a training job
|
||||
maro grass job stop myJob1
|
||||
|
||||
Sample Deployments
|
||||
------------------
|
||||
|
||||
grass-azure-create
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass/azure
|
||||
name: myGrassCluster
|
||||
|
||||
cloud:
|
||||
resource_group: myResourceGroup
|
||||
subscription: mySubscription
|
||||
location: eastus
|
||||
default_username: admin
|
||||
default_public_key: "{ssh public key}"
|
||||
|
||||
user:
|
||||
admin_id: admin
|
||||
|
||||
master:
|
||||
node_size: Standard_D2s_v3
|
||||
|
||||
grass-start-job
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
You can replace {project root} with a valid Linux path, e.g. /home/admin.
|
||||
|
||||
The data you push will then be mounted into this folder.
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: myJob1
|
||||
|
||||
allocation:
|
||||
mode: single-metric-balanced
|
||||
metric: cpu
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: "python {project root}/myTrainingData/dqn/job1/start_actor.py"
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 1
|
||||
gpu: 0
|
||||
memory: 1024m
|
||||
learner:
|
||||
command: "python {project root}/myTrainingData/dqn/job1/start_learner.py"
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
|
||||
grass-start-schedule
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: mySchedule1
|
||||
|
||||
allocation:
|
||||
mode: single-metric-balanced
|
||||
metric: cpu
|
||||
|
||||
job_names:
|
||||
- myJob2
|
||||
- myJob3
|
||||
- myJob4
|
||||
- myJob5
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: "python {project root}/myTrainingData/dqn/schedule1/actor.py"
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 1
|
||||
gpu: 0
|
||||
memory: 1024m
|
||||
learner:
|
||||
command: "bash {project root}/myTrainingData/dqn/schedule1/learner.py"
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
|
@ -1,202 +0,0 @@
|
|||
|
||||
Grass Cluster Provisioning on Azure
|
||||
===================================
|
||||
|
||||
With the following guide, you can build up a MARO cluster in
|
||||
`grass mode <../distributed_training/orchestration_with_grass.html#orchestration-with-grass>`_
|
||||
on Azure and run your training job in a distributed environment.
|
||||
|
||||
Prerequisites
|
||||
-------------
|
||||
|
||||
* `Install the Azure CLI and login <https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest>`_
|
||||
* `Install docker <https://docs.docker.com/engine/install/>`_ and
|
||||
`Configure docker to make sure it can be managed as a non-root user <https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user>`_
|
||||
|
||||
Cluster Management
|
||||
------------------
|
||||
|
||||
* Create a cluster with a `deployment <#grass-azure-create>`_
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Create a grass cluster with a grass-create deployment
|
||||
maro grass create ./grass-azure-create.yml
|
||||
|
||||
* Scale the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Scale nodes with 'Standard_D4s_v3' specification to 2
|
||||
maro grass node scale my_grass_cluster Standard_D4s_v3 2
|
||||
|
||||
Check `VM Size <https://docs.microsoft.com/en-us/azure/virtual-machines/sizes>`_
|
||||
to see more node specifications.
|
||||
|
||||
* Delete the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Delete a grass cluster
|
||||
maro grass delete my_grass_cluster
|
||||
|
||||
* Start/stop nodes to save costs
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start 2 nodes with 'Standard_D4s_v3' specification
|
||||
maro grass node start my_grass_cluster Standard_D4s_v3 2
|
||||
|
||||
# Stop 2 nodes with 'Standard_D4s_v3' specification
|
||||
maro grass node stop my_grass_cluster Standard_D4s_v3 2
|
||||
|
||||
Run Job
|
||||
-------
|
||||
|
||||
* Push your training image
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push image 'my_image' to the cluster
|
||||
maro grass image push my_grass_cluster --image-name my_image
|
||||
|
||||
* Push your training data
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push data under './my_training_data' to a relative path '/my_training_data' in the cluster
|
||||
# You can then assign your mapping location in the start-job deployment
|
||||
maro grass data push my_grass_cluster ./my_training_data/* /my_training_data
|
||||
|
||||
* Start a training job with a `deployment <#grass-start-job>`_
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training job with a start-job deployment
|
||||
maro grass job start my_grass_cluster ./grass-start-job.yml
|
||||
|
||||
* Or, schedule batch jobs with a `deployment <#grass-start-schedule>`_
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training schedule with a start-schedule deployment
|
||||
maro grass schedule start my_grass_cluster ./grass-start-schedule.yml
|
||||
|
||||
* Get the logs of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Get the logs of the job
|
||||
maro grass job logs my_grass_cluster my_job_1
|
||||
|
||||
* List the current status of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# List the current status of the job
|
||||
maro grass job list my_grass_cluster
|
||||
|
||||
* Stop a training job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Stop a training job
|
||||
maro grass job stop my_job_1
|
||||
|
||||
Sample Deployments
|
||||
------------------
|
||||
|
||||
grass-azure-create
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: my_grass_cluster
|
||||
|
||||
cloud:
|
||||
infra: azure
|
||||
location: eastus
|
||||
resource_group: my_grass_resource_group
|
||||
subscription: my_subscription
|
||||
|
||||
user:
|
||||
admin_public_key: "{ssh public key with 'ssh-rsa' prefix}"
|
||||
admin_username: admin
|
||||
|
||||
master:
|
||||
node_size: Standard_D2s_v3
|
||||
|
||||
grass-start-job
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: my_job_1
|
||||
|
||||
allocation:
|
||||
mode: single-metric-balanced
|
||||
metric: cpu
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: "bash {project root}/my_training_data/job_1/actor.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: “{project root}”
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
learner:
|
||||
command: "bash {project root}/my_training_data/job_1/learner.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
|
||||
grass-start-schedule
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: my_schedule_1
|
||||
|
||||
allocation:
|
||||
mode: single-metric-balanced
|
||||
metric: cpu
|
||||
|
||||
job_names:
|
||||
- my_job_2
|
||||
- my_job_3
|
||||
- my_job_4
|
||||
- my_job_5
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: "bash {project root}/my_training_data/job_1/actor.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: “{project root}”
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
learner:
|
||||
command: "bash {project root}/my_training_data/job_1/learner.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
|
@ -1,206 +0,0 @@
|
|||
|
||||
Grass Cluster Provisioning in On-Premises Environment
|
||||
=====================================================
|
||||
|
||||
With the following guide, you can build up a MARO cluster in
|
||||
`grass mode <../distributed_training/orchestration_with_grass.html#orchestration-with-grass>`_
|
||||
in local private network and run your training job in On-Premises distributed environment.
|
||||
|
||||
Prerequisites
|
||||
-------------
|
||||
|
||||
* Linux with Python 3.6+
|
||||
* `Install Powershell <https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell?view=powershell-7.1>`_ if you are using Windows Server
|
||||
|
||||
Cluster Management
|
||||
------------------
|
||||
|
||||
* Create a cluster with a `deployment <#grass-cluster-create>`_
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Create a grass cluster with a grass-create deployment
|
||||
maro grass create ./grass-azure-create.yml
|
||||
|
||||
* Let a node join a specified cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Let a worker node join into specified cluster
|
||||
maro grass node join ./node-join.yml
|
||||
|
||||
* Let a node leave a specified cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Let a worker node leave a specified cluster
|
||||
maro grass node leave {cluster_name} {node_name}
|
||||
|
||||
|
||||
* Delete the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Delete a grass cluster
|
||||
maro grass delete my_grass_cluster
|
||||
|
||||
|
||||
Run Job
|
||||
-------
|
||||
|
||||
* Push your training image
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push image 'my_image' to the cluster
|
||||
maro grass image push my_grass_cluster --image-name my_image
|
||||
|
||||
* Push your training data
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push data under './my_training_data' to a relative path '/my_training_data' in the cluster
|
||||
# You can then assign your mapping location in the start-job deployment
|
||||
maro grass data push my_grass_cluster ./my_training_data/* /my_training_data
|
||||
|
||||
* Start a training job with a `deployment <#grass-start-job>`_
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training job with a start-job deployment
|
||||
maro grass job start my_grass_cluster ./grass-start-job.yml
|
||||
|
||||
* Or, schedule batch jobs with a `deployment <#grass-start-schedule>`_
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training schedule with a start-schedule deployment
|
||||
maro grass schedule start my_grass_cluster ./grass-start-schedule.yml
|
||||
|
||||
* Get the logs of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Get the logs of the job
|
||||
maro grass job logs my_grass_cluster my_job_1
|
||||
|
||||
* List the current status of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# List the current status of the job
|
||||
maro grass job list my_grass_cluster
|
||||
|
||||
* Stop a training job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Stop a training job
|
||||
maro grass job stop my_job_1
|
||||
|
||||
Sample Deployments
|
||||
------------------
|
||||
|
||||
grass-cluster-create
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass/on-premises
|
||||
name: cluster_name
|
||||
|
||||
user:
|
||||
admin_public_key: "{ssh public key with 'ssh-rsa' prefix}"
|
||||
admin_username: admin
|
||||
|
||||
|
||||
grass-node-join
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: "grass/on-premises"
|
||||
name: ""
|
||||
cluster: ""
|
||||
public_ip_address: ""
|
||||
hostname: ""
|
||||
system: "linux"
|
||||
resources:
|
||||
cpu: 1
|
||||
memory: 1024
|
||||
gpu: 0
|
||||
|
||||
|
||||
grass-start-job
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: my_job_1
|
||||
|
||||
allocation:
|
||||
mode: single-metric-balanced
|
||||
metric: cpu
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: "bash {project root}/my_training_data/job_1/actor.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: “{project root}”
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
learner:
|
||||
command: "bash {project root}/my_training_data/job_1/learner.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
|
||||
grass-start-schedule
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: my_schedule_1
|
||||
|
||||
allocation:
|
||||
mode: single-metric-balanced
|
||||
metric: cpu
|
||||
|
||||
job_names:
|
||||
- my_job_2
|
||||
- my_job_3
|
||||
- my_job_4
|
||||
- my_job_5
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: "bash {project root}/my_training_data/job_1/actor.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: “{project root}”
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
learner:
|
||||
command: "bash {project root}/my_training_data/job_1/learner.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
|
@ -0,0 +1,98 @@
|
|||
.. _grass-on-premises-cluster-provisioning:
|
||||
|
||||
Grass Cluster Provisioning in On-Premises Environment
|
||||
=====================================================
|
||||
|
||||
With the following guide, you can build up a MARO cluster in
|
||||
:ref:`grass/on-premises <grass>`
|
||||
in a local private network and run your training job in an on-premises distributed environment.
|
||||
|
||||
Prerequisites
|
||||
-------------
|
||||
|
||||
* Linux with Python 3.6+
|
||||
* `Install Powershell <https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell?view=powershell-7.1>`_ if you are using Windows Server
|
||||
|
||||
Cluster Management
|
||||
------------------
|
||||
|
||||
* Create a cluster with a :ref:`deployment <grass-on-premises-create>`
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Create a grass cluster with a grass-create deployment
|
||||
maro grass create ./grass-azure-create.yml
|
||||
|
||||
* Let a node join a specified cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Let a worker node join into specified cluster
|
||||
maro grass node join ./node-join.yml
|
||||
|
||||
* Let a node leave a specified cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Let a worker node leave a specified cluster
|
||||
maro grass node leave {cluster_name} {node_name}
|
||||
|
||||
|
||||
* Delete the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Delete a grass cluster
|
||||
maro grass delete my_grass_cluster
|
||||
|
||||
|
||||
Run Job
|
||||
-------
|
||||
|
||||
See :ref:`Run Job in grass/azure <grass-azure-cluster-provisioning/run-job>` for reference.
|
||||
|
||||
|
||||
Sample Deployments
|
||||
------------------
|
||||
|
||||
grass-on-premises-create
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass/on-premises
|
||||
name: clusterName
|
||||
|
||||
user:
|
||||
admin_id: admin
|
||||
|
||||
master:
|
||||
username: root
|
||||
hostname: maroMaster
|
||||
public_ip_address: 137.128.0.1
|
||||
private_ip_address: 10.0.0.4
|
||||
|
||||
|
||||
grass-on-premises-join-cluster
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass/on-premises
|
||||
|
||||
master:
|
||||
private_ip_address: 10.0.0.4
|
||||
|
||||
node:
|
||||
hostname: maroNode1
|
||||
username: root
|
||||
public_ip_address: 137.128.0.2
|
||||
private_ip_address: 10.0.0.5
|
||||
resources:
|
||||
cpu: all
|
||||
memory: 2048m
|
||||
gpu: 0
|
||||
|
||||
config:
|
||||
install_node_runtime: true
|
||||
install_node_gpu_support: false
|
|
@ -1,8 +1,10 @@
|
|||
.. _k8s-aks-cluster-provisioning:
|
||||
|
||||
K8S Cluster Provisioning on Azure
|
||||
=================================
|
||||
|
||||
With the following guide, you can build up a MARO cluster in
|
||||
`k8s mode <../distributed_training/orchestration_with_k8s.html#orchestration-with-k8s>`_
|
||||
:ref:`k8s/aks <k8s>`
|
||||
on Azure and run your training job in a distributed environment.
|
||||
|
||||
Prerequisites
|
||||
|
@ -36,7 +38,7 @@ Prerequisites
|
|||
Cluster Management
|
||||
------------------
|
||||
|
||||
* Create a cluster with a `deployment <#k8s-azure-create>`_
|
||||
* Create a cluster with a :ref:`deployment <k8s-aks-create>`
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
|
@ -47,18 +49,20 @@ Cluster Management
|
|||
|
||||
.. code-block:: sh
|
||||
|
||||
# Scale nodes with 'Standard_D4s_v3' specification to 2
|
||||
maro k8s node scale my_k8s_cluster Standard_D4s_v3 2
|
||||
Check `VM Size <https://docs.microsoft.com/en-us/azure/virtual-machines/sizes>`_ to see more node specifications.
|
||||
|
||||
Check `VM Size <https://docs.microsoft.com/en-us/azure/virtual-machines/sizes>`_
|
||||
to see more node specifications.
|
||||
# Scale nodes with 'Standard_D4s_v3' specification to 2
|
||||
maro k8s node scale myK8sCluster Standard_D4s_v3 2
|
||||
|
||||
# Scale nodes with 'Standard_D2s_v3' specification to 0
|
||||
maro k8s node scale myK8sCluster Standard_D2s_v3 0
|
||||
|
||||
* Delete the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Delete a k8s cluster
|
||||
maro k8s delete my_k8s_cluster
|
||||
maro k8s delete myK8sCluster
|
||||
|
||||
Run Job
|
||||
-------
|
||||
|
@ -67,72 +71,69 @@ Run Job
|
|||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push image 'my_image' to the cluster
|
||||
maro k8s image push my_k8s_cluster --image-name my_image
|
||||
# Push image 'myImage' to the cluster
|
||||
maro k8s image push myK8sCluster --image-name myImage
|
||||
|
||||
* Push your training data
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push data under './my_training_data' to a relative path '/my_training_data' in the cluster
|
||||
# You can then assign your mapping location in the start-job deployment
|
||||
maro k8s data push my_k8s_cluster ./my_training_data/* /my_training_data
|
||||
# Push dqn folder under './myTrainingData/' to a relative path '/myTrainingData' in the cluster
|
||||
# You can then assign your mapping location in the start-job-deployment
|
||||
maro k8s data push myK8sCluster ./myTrainingData/dqn /myTrainingData
|
||||
|
||||
* Start a training job with a `deployment <#k8s-start-job>`_
|
||||
* Start a training job with a :ref:`deployment <k8s-start-job>`
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training job with a start-job deployment
|
||||
maro k8s job start my_k8s_cluster ./k8s-start-job.yml
|
||||
# Start a training job with a start-job-deployment
|
||||
maro k8s job start myK8sCluster ./k8s-start-job.yml
|
||||
|
||||
* Or, schedule batch jobs with a `deployment <#k8s-start-schedule>`_
|
||||
* Or, schedule batch jobs with a :ref:`deployment <k8s-start-schedule>`
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training schedule with a start-schedule deployment
|
||||
maro k8s schedule start my_k8s123_cluster ./k8s-start-schedule.yml
|
||||
# Start a training schedule with a start-schedule-deployment
|
||||
maro k8s schedule start myK8sCluster ./k8s-start-schedule.yml
|
||||
|
||||
* Get the logs of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Logs will be exported to current directory
|
||||
maro k8s job logs my_k8s_cluster my_job_1
|
||||
maro k8s job logs myK8sCluster myJob1
|
||||
|
||||
* List the current status of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# List current status of jobs
|
||||
maro k8s job list my_k8s_cluster my_job_1
|
||||
maro k8s job list myK8sCluster myJob1
|
||||
|
||||
* Stop a training job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Stop a training job
|
||||
maro k8s job stop my_k8s_cluster my_job_1
|
||||
maro k8s job stop myK8sCluster myJob1
|
||||
|
||||
Sample Deployments
|
||||
------------------
|
||||
|
||||
k8s-azure-create
|
||||
^^^^^^^^^^^^^^^^
|
||||
k8s-aks-create
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: k8s
|
||||
name: my_k8s_cluster
|
||||
mode: k8s/aks
|
||||
name: myK8sCluster
|
||||
|
||||
cloud:
|
||||
infra: azure
|
||||
subscription: mySubscription
|
||||
resource_group: myResourceGroup
|
||||
location: eastus
|
||||
resource_group: my_k8s_resource_group
|
||||
subscription: my_subscription
|
||||
|
||||
user:
|
||||
admin_public_key: "{ssh public key with 'ssh-rsa' prefix}"
|
||||
admin_username: admin
|
||||
default_public_key: "{ssh public key}"
|
||||
default_username: admin
|
||||
|
||||
master:
|
||||
node_size: Standard_D2s_v3
|
||||
|
@ -142,63 +143,63 @@ k8s-start-job
|
|||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: k8s
|
||||
name: my_job_1
|
||||
mode: k8s/aks
|
||||
name: myJob1
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: ["bash", "{project root}/my_training_data/actor.sh"]
|
||||
image: my_image
|
||||
command: ["python", "{project root}/myTrainingData/dqn/start_actor.py"]
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
memory: 2048M
|
||||
learner:
|
||||
command: ["bash", "{project root}/my_training_data/learner.sh"]
|
||||
image: my_image
|
||||
command: ["python", "{project root}/myTrainingData/dqn/start_learner.py"]
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
memory: 2048M
|
||||
|
||||
k8s-start-schedule
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: k8s
|
||||
name: my_schedule_1
|
||||
mode: k8s/aks
|
||||
name: mySchedule1
|
||||
|
||||
job_names:
|
||||
- my_job_2
|
||||
- my_job_3
|
||||
- my_job_4
|
||||
- my_job_5
|
||||
- myJob2
|
||||
- myJob3
|
||||
- myJob4
|
||||
- myJob5
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: ["bash", "{project root}/my_training_data/actor.sh"]
|
||||
image: my_image
|
||||
command: ["python", "{project root}/myTrainingData/dqn/start_actor.py"]
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
memory: 2048M
|
||||
learner:
|
||||
command: ["bash", "{project root}/my_training_data/learner.sh"]
|
||||
image: my_image
|
||||
command: ["python", "{project root}/myTrainingData/dqn/start_learner.py"]
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
memory: 2048M
|
|
@ -71,7 +71,7 @@ To start this visualization tool, user need to input command following the forma
|
|||
|
||||
.. code-block:: sh
|
||||
|
||||
maro inspector env --source {source\_folder\_path} --force {true/false}
|
||||
maro inspector dashboard --source_path {source\_folder\_path} --force {true/false}
|
||||
|
||||
----
|
||||
|
||||
|
@ -79,7 +79,7 @@ e.g.
|
|||
|
||||
.. code-block:: sh
|
||||
|
||||
maro inspector env --source_path .\maro\dumper_files --force false
|
||||
maro inspector dashboard --source_path .\maro\dumper_files --force false
|
||||
|
||||
----
|
||||
|
||||
|
|
|
@ -0,0 +1,235 @@
|
|||
Geographic Visualization
|
||||
========================
|
||||
|
||||
Env-geographic can be used for both finished and running experiments.
For finished experiments, local mode lets users inspect the experimental data
to support subsequent decisions. If a running experiment is selected,
real-time mode is launched by default; it is used to view experimental data as it is
produced and to judge the effectiveness of the model. You can also switch to
local mode for any finished epoch while in real-time mode.
|
||||
|
||||
|
||||
Dependency
|
||||
----------
|
||||
|
||||
Env-geographic depends on Docker to start.
Therefore, users need to install Docker on the machine and make sure it runs normally.
Docker can be obtained from `Docker installation <https://docs.docker.com/get-docker/>`_.
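Before starting the tool, it may help to confirm that Docker is installed and the daemon is reachable. A small sketch (plain Docker commands driven from Python, nothing MARO-specific):

.. code-block:: python

    import shutil
    import subprocess

    # Check that the docker CLI is on PATH.
    assert shutil.which("docker") is not None, "Docker is not installed or not on PATH."
    # 'docker info' fails if the Docker daemon is not running or not reachable.
    subprocess.run(["docker", "info"], check=True, stdout=subprocess.DEVNULL)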
|
||||
|
||||
|
||||
How to Use?
|
||||
-----------
|
||||
|
||||
Env-geographic has 3 parts: front-end, back-end and database. Users need 2 steps
|
||||
to start this tool:
|
||||
|
||||
1. Start the database and choose an experiment to be displayed.
|
||||
2. Start the front-end and back-end services with the specified experiment name.
|
||||
|
||||
|
||||
Start database
|
||||
~~~~~~~~~~~~~~
|
||||
First, users need to start the local database with the following command:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
maro inspector geo --start database
|
||||
|
||||
----
|
||||
|
||||
After the command executes successfully, users can view the local data at
localhost:9000 by default. If the default port is occupied, users can obtain
the access port of each container with the following command:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
docker container ls
|
||||
|
||||
----
|
||||
|
||||
Users can view all experiment information with a SQL statement:
|
||||
|
||||
.. code-block:: SQL
|
||||
|
||||
SELECT * FROM maro.experiments
|
||||
|
||||
----
|
||||
|
||||
Data is stored locally in the folder maro/maro/streamit/server/data.
|
||||
|
||||
|
||||
Choose an existing experiment
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
To view the visualization of experimental data, users need to specify the
experiment name. Users can either choose an existing experiment or start a new one.
|
||||
|
||||
Users can select a name from the local database.
|
||||
|
||||
.. image:: ../images/visualization/geographic/database_exp.png
|
||||
:alt: database_exp
|
||||
|
||||
|
||||
Create a new experiment
|
||||
^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Currently, users need to manually start the experiment to obtain
|
||||
the data required by the service.
|
||||
|
||||
To send data to the database, there are 2 compulsory steps:
|
||||
|
||||
1. Set the environment variable to enable data transmission.
2. Import the relevant package and modify the environment initialization code to send data.
|
||||
|
||||
Users need to set the environment variable
"MARO_STREAMIT_ENABLED" to "true". To specify the experiment name, set the
environment variable "MARO_STREAMIT_EXPERIMENT_NAME"; if this value is not set,
a unique experiment name is generated automatically and can be looked up in the
database. Note that when selecting a topology, users must select one with
specific geographic information. Experimental data obtained from topology files
without geographic information cannot be used in the Env-geographic tool.
|
||||
|
||||
Users can set the environment variables as in the following example:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
os.environ["MARO_STREAMIT_ENABLED"] = "true"
|
||||
|
||||
os.environ["MARO_STREAMIT_EXPERIMENT_NAME"] = "my_maro_experiment"
|
||||
|
||||
----
|
||||
|
||||
To send the experimental data episode by episode while the experiment is running, users need to import the
**streamit** package with the following code before environment initialization:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Import package streamit
|
||||
from maro.streamit import streamit
|
||||
# Initialize environment and send basic information of experiment to database.
|
||||
env = Env(scenario="cim", topology="global_trade.22p_l0.1",
|
||||
start_tick=0, durations=100)
|
||||
|
||||
for ep in range(EPISODE_NUMBER):
|
||||
# Send experimental data to database by episode.
|
||||
streamit.episode(ep)
|
||||
|
||||
----
|
||||
|
||||
For a complete reference, please see the file maro/examples/hello_world/cim/hello.py.
|
||||
|
||||
After starting the experiment, users need to query its name in the local database to make sure
the experimental data has been sent successfully.
|
||||
|
||||
|
||||
Start service
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
To start the front-end and back-end services, users need to specify the experiment name.
The front-end port can be specified with the parameter "front_end_port", as in the following
command:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
maro inspector geo --start service --experiment_name YOUR_EXPERIMENT_NAME --front_end_port 8080
|
||||
|
||||
----
|
||||
|
||||
The program will automatically determine whether to use real-time mode
|
||||
or local mode according to the data status of the current experiment.
|
||||
|
||||
Feature List
|
||||
------------
|
||||
|
||||
For convenience, the Env-geographic tool implements several features
so that users can freely explore the experimental data.
|
||||
|
||||
|
||||
Real-time mode and local mode
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Local mode
|
||||
^^^^^^^^^^
|
||||
|
||||
In this mode, users can explore the experimental data through the geographic
view and the charts on both sides. By clicking the play button in the lower
left corner of the page, users can view how the data changes within the
selected time window. Hovering over geographic items and charts displays more
detailed information.
|
||||
|
||||
|
||||
.. image:: ../images/visualization/geographic/local_mode.gif
|
||||
:alt: local_mode
|
||||
|
||||
|
||||
The chart on the right side of the page shows the changes in the data over
|
||||
a period of time from the perspectives of overall, port, and vessel.
|
||||
|
||||
.. image:: ../images/visualization/geographic/local_mode_right_chart.gif
|
||||
:alt: local_mode_right_chart
|
||||
|
||||
The chart on the left side of the page shows the ranking of the carrying
|
||||
capacity of each port and the change in carrying capacity between ports
|
||||
in the entire time window.
|
||||
|
||||
.. image:: ../images/visualization/geographic/local_mode_left_chart.gif
|
||||
:alt: local_mode_left_chart
|
||||
|
||||
Real-time mode
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
Real-time mode offers largely the same features as local mode; the difference
lies in the data. The automatic playback speed of the progress bar on the
front-end page closely follows the speed at which experimental data arrives,
so users cannot select the time window freely in this mode.
|
||||
|
||||
In addition, users can switch modes by clicking. If users choose to view
local data while in real-time mode, the experimental data generated so far
is displayed.
|
||||
|
||||
.. image:: ../images/visualization/geographic/real_time_mode.gif
|
||||
:alt: real_time_mode
|
||||
|
||||
Geographic data display
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
In the map on the page, users can view the status of different resource
holders at various times, and can zoom the map to focus on a specific area.
The three port statuses, Surplus, Deficit and Balance, represent the
quantitative relationship between the empty container volume and the received
order volume of the corresponding port at that time.
|
||||
|
||||
.. image:: ../images/visualization/geographic/geographic_data_display.gif
|
||||
:alt: geographic_data_display
|
||||
|
||||
Data chart display
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
The ranking table on the right side of the page shows the throughput of routes and
ports over a period of time, while the heat map shows the throughput between ports
over the same period. Users can hover over specific elements to view detailed data.
|
||||
|
||||
The chart on the left shows the order volume and empty container information of each
port and each vessel. Users can view the data of different resource holders by switching options.
|
||||
|
||||
In addition, users can zoom the chart to display information more clearly.
|
||||
|
||||
.. image:: ../images/visualization/geographic/data_chart_display.gif
|
||||
:alt: data_chart_display
|
||||
|
||||
Time window selection
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This feature is only available in local mode. Users can slide to choose the left
end of the time window as the starting point and view the data at different
times.
|
||||
|
||||
In addition, users can freely choose the end of the time window. During playback,
the tool loops within the selected time window.
|
||||
|
||||
.. image:: ../images/visualization/geographic/time_window_selection.gif
|
||||
:alt: time_window_selection
|
|
@ -1,4 +1,3 @@
|
|||
|
||||
Distributed Orchestration
|
||||
=========================
|
||||
|
||||
|
@ -7,20 +6,20 @@ on cloud computing service like `Azure <https://azure.microsoft.com/en-us/>`_.
|
|||
These CLI commands can also be used to schedule the training jobs with the
|
||||
specified resource requirements. In MARO, all training job related components
|
||||
are dockerized for easy deployment and resource allocation. It provides a unified
|
||||
abstraction/interface for different orchestration framework
|
||||
(e.g. `Grass <#id3>`_\ , `Kubernetes <#id4>`_\ ).
|
||||
abstraction/interface for different orchestration frameworks
(e.g. :ref:`Grass`, :ref:`K8s`).
|
||||
|
||||
.. image:: ../images/distributed/orch_overview.svg
|
||||
:target: ../images/distributed/orch_overview.svg
|
||||
:alt: Orchestration Overview
|
||||
:width: 600
|
||||
:width: 650
|
||||
|
||||
Process
|
||||
-------
|
||||
|
||||
The process mode is part of the `MARO CLI`, which uses multi-processes to start the
|
||||
training jobs in the localhost environment. To align with `Grass <#id3>`_ and `Kubernetes
|
||||
<#id4>`_, the process mode also uses Redis for job management. The process mode tries
|
||||
training jobs in the localhost environment. To align with :ref:`Grass` and :ref:`K8s`,
|
||||
the process mode also uses Redis for job management. The process mode tries
|
||||
to simulate the operation of the real distributed cluster in localhost so that users can smoothly
|
||||
deploy their code to the distributed cluster. Meanwhile, training in the process mode is a cheaper way
to find bugs that would occur during real distributed training.
|
||||
|
@ -44,59 +43,118 @@ to get how to use it.
|
|||
.. image:: ../images/distributed/orch_process.svg
|
||||
:target: ../images/distributed/orch_process.svg
|
||||
:alt: Orchestration Process Mode on Local
|
||||
:width: 300
|
||||
:width: 250
|
||||
|
||||
.. _grass:
|
||||
|
||||
Grass
|
||||
-----
|
||||
|
||||
Grass is a self-designed, development purpose orchestration framework. It can be
|
||||
Grass is an orchestration framework developed by the MARO team. It can be
|
||||
confidently applied to small/middle size cluster (< 200 nodes). The design goal
|
||||
of Grass is to speed up the distributed algorithm prototype development.
|
||||
of Grass is to speed up the development of distributed algorithm prototypes.
|
||||
It has the following advantages:
|
||||
|
||||
* Fast deployment in a small cluster.
|
||||
* Fine-grained resource management.
|
||||
* Lightweight, no other dependencies are required.
|
||||
* Lightweight, no complex dependencies required.
|
||||
|
||||
In the Grass mode:
|
||||
|
||||
* All VMs will be deployed in the same virtual network for a faster, more stable
|
||||
connection and larger bandwidth. Please note that the maximum number of VMs is
|
||||
limited by the `available dedicated IP addresses <https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-faq#what-address-ranges-can-i-use-in-my-vnets>`_.
|
||||
* It is a centralized topology, the master node will host Redis service for peer
|
||||
discovering, Fluentd service for log collecting, SMB service for file sharing.
|
||||
* On each VM, the probe (worker) agent is used to track the computing resources
|
||||
and detect abnormal events.
|
||||
|
||||
Check `Grass Cluster Provisioning on Azure <../installation/grass_cluster_provisioning_on_azure.html>`_
|
||||
Check :ref:`Grass Cluster Provisioning on Azure <grass-azure-cluster-provisioning>` and
|
||||
:ref:`Grass Cluster Provisioning in On-Premises Environment <grass-on-premises-cluster-provisioning>`
|
||||
to get how to use it.
|
||||
|
||||
Modes
|
||||
^^^^^
|
||||
|
||||
We currently have two modes in Grass, and you can choose whichever you want to create a Grass cluster.
|
||||
|
||||
**grass/azure**
|
||||
|
||||
* Create a Grass cluster with Azure.
|
||||
* With a valid Azure subscription, you can create a cluster with one command from ground zero.
|
||||
* You can easily scale up/down nodes as needed,
|
||||
and start/stop nodes to save costs without messing up the current environment.
|
||||
* Please note that the maximum number of VMs in grass/azure is limited by the
|
||||
`available dedicated IP addresses <https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-faq#what-address-ranges-can-i-use-in-my-vnets>`_.
|
||||
|
||||
**grass/on-premises**
|
||||
|
||||
* Create a Grass cluster with machines on hand.
|
||||
* You can join a machine to the cluster if the machine is in the same private network as the Master.
|
||||
|
||||
|
||||
Components
|
||||
^^^^^^^^^^
|
||||
Here's the diagram of a Grass cluster with all the components tied together.
|
||||
|
||||
.. image:: ../images/distributed/orch_grass.svg
|
||||
:target: ../images/distributed/orch_grass.svg
|
||||
:alt: Orchestration Grass Mode in Azure
|
||||
:width: 600
|
||||
:width: 650
|
||||
|
||||
Kubernetes
|
||||
----------
|
||||
|
|
||||
|
||||
Master Components
|
||||
|
||||
* redis: A centralized DB for runtime data storage.
|
||||
* fluentd: A centralized data collector for log collecting.
|
||||
* samba-server: For file sharing within the whole cluster.
|
||||
* master-agent: A daemon service for status monitoring and job scheduling.
|
||||
* master-api-server: A RESTFul server for cluster management.
|
||||
The MARO CLI can access this server to control the cluster and get cluster information over an encrypted session.
|
||||
|
||||
Node Components
|
||||
|
||||
* samba-client: For file sharing.
|
||||
* node-agent: A daemon service for tracking the computing resources and container statuses of the node.
|
||||
* node-api-server: An internal RESTFul server for node management.
|
||||
|
||||
|
||||
Communications
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
Outer Environment to the Master
|
||||
|
||||
* Communications from the outer environment to the Master are encrypted.
|
||||
* Grass will use the following paths in the OuterEnv-Master communications:
|
||||
|
||||
* SSH tunnel: For file transfer and script execution.
|
||||
* HTTP connection: For connection with master-api-server, use RSA+AES hybrid encryption.
|
||||
|
||||
Communications within the Cluster
|
||||
|
||||
* Communications within the cluster are not encrypted.
* Therefore, users are responsible for making sure all Nodes are connected within a private network and
  that external connections to the cluster are restricted.
|
||||
|
||||
|
||||
.. _k8s:

K8s
---

MARO also supports Kubernetes (k8s) as an orchestration option.
With this widely adopted framework, you can easily build up your MARO Cluster
with hundreds or thousands of nodes. It has the following advantages:

* Higher durability.
* Better scalability.

We currently support the k8s/aks mode in Kubernetes, and it has the following features:

.. image:: ../images/distributed/orch_k8s.svg
   :target: ../images/distributed/orch_k8s.svg
   :alt: Orchestration K8S Mode in Azure
   :width: 650

* The dockerized job component runs in a Kubernetes Pod, and each Pod only hosts one component.
* All Kubernetes Pods are registered into the same virtual network using
  `Container Network Interface(CNI) <https://github.com/containernetworking/cni>`_.
* Azure File Service is used for file sharing in all Pods.
* Azure Container Registry is included for image management.

Check :ref:`K8S Cluster Provisioning on Azure <k8s-aks-cluster-provisioning>`
to see how to use it.

@ -2,112 +2,22 @@
|
|||
RL Toolkit
|
||||
==========
|
||||
|
||||
MARO provides a full-stack abstraction for reinforcement learning (RL), which enables users to
apply predefined and customized components to various scenarios. The main abstractions include
fundamental components such as `Agent <#agent>`_ and `Shaper <#shaper>`_\ , and training routine
controllers such as `Actor <#actor>`_ and `Learner <#learner>`_.
|
||||
|
||||
Learner and Actor
|
||||
-----------------
|
||||
|
||||
.. image:: ../images/rl/overview.svg
|
||||
:target: ../images/rl/overview.svg
|
||||
:alt: RL Overview
|
||||
|
||||
* **Learner** is the abstraction of the learnable policy. It is responsible for
  learning a qualified policy to improve the business-optimized objective.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Train function of learner.
|
||||
def learn(self):
|
||||
for exploration_params in self._scheduler:
|
||||
performance, exp_by_agent = self._actor.roll_out(
|
||||
self._agent_manager.dump_models(),
|
||||
exploration_params=exploration_params
|
||||
)
|
||||
self._scheduler.record_performance(performance)
|
||||
self._agent_manager.train(exp_by_agent)
|
||||
|
||||
* **Actor** is the abstraction of experience collection. It is responsible for
|
||||
interacting with the environment and collecting experiences. The experiences
|
||||
collected during interaction will be used for the training of the learners.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Rollout function of actor.
|
||||
def roll_out(self, model_dict=None, exploration_params=None, return_details: bool = True):
|
||||
self._env.reset()
|
||||
|
||||
# load models
|
||||
if model_dict is not None:
|
||||
self._agents.load_models(model_dict)
|
||||
|
||||
# load exploration parameters:
|
||||
if exploration_params is not None:
|
||||
self._agents.set_exploration_params(exploration_params)
|
||||
|
||||
metrics, decision_event, is_done = self._env.step(None)
|
||||
while not is_done:
|
||||
action = self._agents.choose_action(decision_event, self._env.snapshot_list)
|
||||
metrics, decision_event, is_done = self._env.step(action)
|
||||
self._agents.on_env_feedback(metrics)
|
||||
|
||||
details = self._agents.post_process(self._env.snapshot_list) if return_details else None
|
||||
|
||||
return self._env.metrics, details
|
||||
|
||||
|
||||
Scheduler
---------

A ``Scheduler`` is the driver of an episodic learning process. The learner uses the scheduler to repeat the
rollout-training cycle for a set number of episodes. For algorithms that require explicit exploration (e.g.,
DQN and DDPG), there are two types of schedules that a learner may follow:

* Static schedule, where the exploration parameters are generated using a pre-defined function of the episode
  number. See ``LinearParameterScheduler`` and ``TwoPhaseLinearParameterScheduler`` provided in the toolkit
  for examples.
* Dynamic schedule, where the exploration parameters for the next episode are determined based on the performance
  history. Such a mechanism is possible in our abstraction because the scheduler provides a ``record_performance``
  interface that allows it to keep track of roll-out performances.

Optionally, an early stopping checker may be registered if one wishes to terminate training when certain performance
requirements are satisfied, possibly before reaching the prescribed number of episodes.
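
As an illustration of the dynamic case, the snippet below is a minimal, self-contained sketch of a scheduler-like
object (not a class shipped with the toolkit): it is iterable and exposes ``record_performance``, exactly as the
learner snippet above expects. The epsilon-decay rule and the assumption of a scalar performance measure are purely
illustrative.

.. code-block:: python

   class AdaptiveEpsilonSchedule:
       """Illustrative dynamic schedule: decays epsilon faster when performance stalls."""
       def __init__(self, max_episode: int, start_eps: float = 0.4):
           self._max_episode = max_episode
           self._episode = 0
           self._eps = start_eps
           self._history = []  # roll-out performances recorded by the learner

       def __iter__(self):
           return self

       def __next__(self):
           if self._episode >= self._max_episode:
               raise StopIteration
           self._episode += 1
           # Decay epsilon more aggressively if the last roll-out did not improve
           # (assumes the recorded performance is a scalar measure).
           if len(self._history) >= 2 and self._history[-1] <= self._history[-2]:
               self._eps *= 0.9
           else:
               self._eps *= 0.98
           return {"epsilon": self._eps}

       def record_performance(self, performance):
           self._history.append(performance)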
|
||||
|
||||
Agent Manager
|
||||
-------------
|
||||
|
||||
The agent manager provides a unified interactive interface with the environment
for RL agent(s). From the actor's perspective, it isolates the complex dependencies
of the various homogeneous/heterogeneous agents, so that the whole agent manager
behaves just like a single agent. Furthermore, to support scalable distributed algorithms,
the agent manager provides two working modes, which can be applied in different
distributed components, such as the inference mode in the actor and the training mode in the learner.
|
||||
|
||||
.. image:: ../images/rl/agent_manager.svg
|
||||
:target: ../images/rl/agent_manager.svg
|
||||
:alt: Agent Manager
|
||||
:width: 750
|
||||
|
||||
* In **inference mode**\ , the agent manager is responsible for accessing and shaping
  the environment state for the related agent, converting the model action to an
  executable environment action, and finally generating experiences from the
  interaction trajectory.
* In **training mode**\ , the agent manager will optimize the underlying model of
  the related agent(s), based on the experiences collected in inference mode (see the sketch below).
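
For a concrete picture, here is a brief sketch of how the two modes are selected when constructing agent managers
for the actor side and the learner side. It mirrors the CIM DQN distributed example elsewhere in this change set;
the concrete manager class and agent factory are passed in as parameters and are assumptions borrowed from that
example rather than part of the core abstraction.

.. code-block:: python

   from maro.rl import AgentManagerMode


   def make_agent_managers(manager_cls, create_agents, agent_id_list, agent_config,
                            state_shaper, action_shaper, experience_shaper):
       # Actor side: INFERENCE mode, with shapers attached so the manager can shape
       # states, convert model actions and generate experiences.
       inference_manager = manager_cls(
           name="cim_actor",
           mode=AgentManagerMode.INFERENCE,
           agent_dict=create_agents(agent_id_list, agent_config),
           state_shaper=state_shaper,
           action_shaper=action_shaper,
           experience_shaper=experience_shaper
       )
       # Learner side: TRAIN mode, which only needs the agents to be optimized.
       train_manager = manager_cls(
           name="cim_learner",
           mode=AgentManagerMode.TRAIN,
           agent_dict=create_agents(agent_id_list, agent_config)
       )
       return inference_manager, train_manager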
|
||||
|
||||
Agent
|
||||
-----
|
||||
|
||||
The Agent is the kernel abstraction of the RL formulation for a real-world problem.
Our abstraction decouples an agent and its underlying model so that the agent can exist
as an RL paradigm independent of the inner workings of the models it uses to generate
actions or estimate values. For example, the actor-critic algorithm does not need to
concern itself with the structures and optimizing schemes of the actor and critic models.
This decoupling is achieved by the Core Model abstraction described below.
|
||||
|
||||
|
||||
.. image:: ../images/rl/agent.svg
|
||||
:target: ../images/rl/agent.svg
|
||||
|
@ -116,96 +26,96 @@ scenario agnostic.
|
|||
.. code-block:: python
|
||||
|
||||
class AbsAgent(ABC):
    def __init__(self, model: AbsCoreModel, config, experience_pool=None):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = model.to(self.device)
        self.config = config
        self._experience_pool = experience_pool
|
||||
|
||||
|
||||
Algorithm
|
||||
---------
|
||||
|
||||
The algorithm is the kernel abstraction of the RL formulation for a real-world problem. Our abstraction
|
||||
decouples algorithm and model so that an algorithm can exist as an RL paradigm independent of the inner
|
||||
workings of the models it uses to generate actions or estimate values. For example, the actor-critic
|
||||
algorithm does not need to concern itself with the structures and optimizing schemes of the actor and
|
||||
critic models. This decoupling is achieved by the ``LearningModel`` abstraction described below.
|
||||
|
||||
|
||||
.. image:: ../images/rl/algorithm.svg
|
||||
:target: ../images/rl/algorithm.svg
|
||||
:alt: Algorithm
|
||||
:width: 650
|
||||
|
||||
* ``choose_action`` is used to make a decision based on a provided model state.
|
||||
* ``train`` is used to trigger training and the policy update from external.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
class AbsAlgorithm(ABC):
|
||||
def __init__(self, model: LearningModel, config):
|
||||
self._device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
||||
self._model = model.to(self._device)
|
||||
self._config = config
|
||||
|
||||
|
||||
Core Model
----------

MARO provides an abstraction for the underlying models used by agents to form policies and estimate values.
The abstraction consists of ``AbsBlock`` and ``AbsCoreModel``, both of which subclass torch's nn.Module.
The ``AbsBlock`` represents the smallest structural unit of an NN-based model. For instance, the ``FullyConnectedBlock``
provided in the toolkit is a stack of fully connected layers with features like batch normalization,
drop-out and skip connection. The ``AbsCoreModel`` is a collection of network components with
embedded optimizers and serves as an agent's "brain" by providing a unified interface to it, regardless of
how many individual models it requires and how complex the model architecture might be.
|
||||
|
||||
.. image:: ../images/rl/learning_model.svg
|
||||
:target: ../images/rl/learning_model.svg
|
||||
:alt: Algorithm
|
||||
:width: 650
|
||||
|
||||
As an example, the initialization of the actor-critic algorithm may look like this:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
actor_stack = FullyConnectedBlock(...)
critic_stack = FullyConnectedBlock(...)
model = SimpleMultiHeadModel(
    {"actor": actor_stack, "critic": critic_stack},
    optim_option={
        "actor": OptimOption(optim_cls=Adam, optim_params={"lr": 0.001}),
        "critic": OptimOption(optim_cls=RMSprop, optim_params={"lr": 0.0001})
    }
)
agent = ActorCritic(model, config)
|
||||
|
||||
Choosing an action is simply:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
model(state, task_name="actor", training=False)
|
||||
|
||||
And performing one gradient step is simply:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
model.learn(critic_loss + actor_loss)
|
||||
|
||||
|
||||
Explorer
--------

MARO provides an abstraction for exploration in RL. Some RL algorithms such as DQN and DDPG require
explicit exploration governed by a set of parameters. The ``AbsExplorer`` class is designed to cater
to these needs. Simple exploration schemes, such as ``EpsilonGreedyExplorer`` for discrete action space
and ``UniformNoiseExplorer`` and ``GaussianNoiseExplorer`` for continuous action space, are provided in
the toolkit.
|
||||
|
||||
As an example, the exploration for DQN may be carried out with the aid of an ``EpsilonGreedyExplorer``:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
explorer = EpsilonGreedyExplorer(num_actions=10)
greedy_action = learning_model(state, training=False).argmax(dim=1).data
exploration_action = explorer(greedy_action)
|
||||
|
||||
|
||||
Tools for Training
|
||||
------------------------------
|
||||
|
||||
.. image:: ../images/rl/learner_actor.svg
|
||||
:target: ../images/rl/learner_actor.svg
|
||||
:alt: RL Overview
|
||||
|
||||
The RL toolkit provides tools that make local and distributed training easy; a wiring sketch follows the list below:
|
||||
* Learner, the central controller of the learning process, which consists of collecting simulation data from
|
||||
remote actors and training the agents with them. The training data collection can be done in local or
|
||||
distributed fashion by loading an ``Actor`` or ``ActorProxy`` instance, respectively.
|
||||
* Actor, which implements the ``roll_out`` method where the agent interacts with the environment for one
|
||||
episode. It consists of an environment instance and an agent (a single agent or multiple agents wrapped by
|
||||
``MultiAgentWrapper``). The class provides the ``as_worker()`` method, which turns it into an event loop where roll-outs
|
||||
are performed on the learner's demand. In distributed RL, there are typically many actor processes running
|
||||
simultaneously to parallelize training data collection.
|
||||
* Actor proxy, which also implements the ``roll_out`` method with the same signature, but manages a set of remote
|
||||
actors for parallel data collection.
|
||||
* Trajectory, which is primarily responsible for translating between scenario-specific information and model
|
||||
input / output. It implements the following methods which are used as callbacks in the actor's roll-out loop:
|
||||
* ``get_state``, which converts observations of an environment into model input. For example, the observation
|
||||
may be represented by a multi-level data structure, which gets encoded by a state shaper to a one-dimensional
|
||||
vector as input to a neural network. The state shaper usually goes hand in hand with the underlying policy
|
||||
or value models.
|
||||
* ``get_action``, which provides model output with necessary context so that it can be executed by the
|
||||
environment simulator.
|
||||
* ``get_reward``, which computes a reward for a given action.
|
||||
* ``on_env_feedback``, which defines things to do upon getting feedback from the environment.
|
||||
* ``on_finish``, which defines things to do upon completion of a roll-out episode.
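
For concreteness, here is a minimal sketch of how these pieces are wired together for distributed training. It
mirrors the CIM DQN example included in this change set; the ``training_config`` fields, the agent factory
``get_agent`` and the trajectory class passed in below stand in for that example's concrete objects and are
assumptions, not part of the toolkit itself.

.. code-block:: python

   from maro.rl import (
       Actor, ActorProxy, MultiAgentWrapper, OffPolicyLearner, TwoPhaseLinearParameterScheduler
   )
   from maro.simulator import Env


   def run_learner(training_config, get_agent):
       # Central controller: collects roll-out data from remote actors and trains the agents.
       env = Env(**training_config["env"])
       agent = MultiAgentWrapper({name: get_agent() for name in env.agent_idx_list})
       scheduler = TwoPhaseLinearParameterScheduler(training_config["max_episode"], **training_config["exploration"])
       # An ActorProxy exposes the same roll_out interface as a local Actor,
       # but fans the request out to a set of remote actor workers.
       actor = ActorProxy(
           training_config["group"], training_config["num_actors"],
           update_trigger=training_config["learner_update_trigger"]
       )
       OffPolicyLearner(actor, scheduler, agent, **training_config["training"]).run()


   def run_actor(training_config, get_agent, trajectory_cls, trajectory_kwargs):
       # Roll-out worker: as_worker() turns the actor into an event loop that
       # performs roll-outs on the learner's demand.
       env = Env(**training_config["env"])
       agent = MultiAgentWrapper({name: get_agent() for name in env.agent_idx_list})
       Actor(env, agent, trajectory_cls, trajectory_kwargs=trajectory_kwargs).as_worker(training_config["group"])

For purely local training, the ``ActorProxy`` is simply replaced by a local ``Actor`` instance passed to the learner.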
|
||||
|
|
|
@ -0,0 +1,88 @@
|
|||
Command support for scenarios
|
||||
=================================
|
||||
|
||||
After installation, MARO provides a command that generates a project for the user,
making it much easier to use or customize a scenario.

.. code-block:: sh

   maro project new

This command will show a step-by-step wizard to create a new project under the current folder.
Currently it supports two modes.
|
||||
|
||||
|
||||
1. Use built-in scenarios
|
||||
-------------------------
|
||||
|
||||
To use built-in scenarios, answer the first option "Use built-in scenario?" with "yes" or "y"; the default is "yes".
Then you can select a built-in scenario and topology with auto-completion.
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
Use built-in scenario?yes
|
||||
Scenario name:cim
|
||||
Use built-in topology (configuration)?yes
|
||||
Topology name to use:global_trade.22p_l0.0
|
||||
Durations to emulate:1024
|
||||
Number of episodes to emulate:500
|
||||
{'durations': 1024,
|
||||
'scenario': 'cim',
|
||||
'topology': 'global_trade.22p_l0.0',
|
||||
'total_episodes': 500,
|
||||
'use_builtin_scenario': True,
|
||||
'use_builtin_topology': True}
|
||||
|
||||
Is this OK?yes
|
||||
|
||||
If these settings are correct, this command will create a runner.py script, which you can run with:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
python runner.py
|
||||
|
||||
This script contains minimal code to interact with the environment without taking any action; you can then extend it as you wish.
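
The generated file is not reproduced here, but a minimal no-action interaction loop of the kind it contains looks
roughly like the sketch below; the scenario, topology, durations and episode count mirror the wizard answers above
and are otherwise placeholders.

.. code-block:: python

   from maro.simulator import Env

   # Values mirror the wizard answers shown above.
   env = Env(scenario="cim", topology="global_trade.22p_l0.0", durations=1024)

   for episode in range(500):
       metrics, decision_event, is_done = env.step(None)  # take no action
       while not is_done:
           metrics, decision_event, is_done = env.step(None)
       print(f"episode {episode}: {metrics}")
       env.reset()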
|
||||
|
||||
You can also create your own topology (configuration) if you answer "no" to the option "Use built-in topology (configuration)?".
It will ask you for the name of the new topology, then copy the content of the built-in one into your working folder (topologies/your_topology_name/config.yml).
|
||||
|
||||
|
||||
2. Customized scenario
|
||||
-------------------------------
|
||||
|
||||
This mode generates a template of a customized scenario for you, instead of writing it from scratch.
To enable it, answer "no" to the option "Use built-in scenario", then provide your scenario name; the default is the current folder name.
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
Use built-in scenario?no
|
||||
New scenario name:my_test
|
||||
New topology name:my_test
|
||||
Durations to emulate:1000
|
||||
Number of episodes to emulate:100
|
||||
{'durations': 1000,
|
||||
'scenario': 'my_test',
|
||||
'topology': 'my_test',
|
||||
'total_episodes': 100,
|
||||
'use_builtin_scenario': False,
|
||||
'use_builtin_topology': False}
|
||||
|
||||
Is this OK?yes
|
||||
|
||||
This will generate the following files:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
-- runner.py
|
||||
-- scenario
|
||||
-- business_engine.py
|
||||
-- common.py
|
||||
-- events.py
|
||||
-- frame_builder.py
|
||||
-- topologies
|
||||
-- my_test
|
||||
-- config.yml
|
||||
|
||||
The script "runner.py" is the entry of this project, it will interactive with your scenario without action.
|
||||
Then you can fill "scenario/business_engine.py" with your own logic.
|
|
@ -0,0 +1,2 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
|
@ -0,0 +1,11 @@
|
|||
# Container Inventory Management
|
||||
|
||||
Container inventory management (CIM) is a scenario where reinforcement learning (RL) can potentially prove useful. Three algorithms are used to learn the multi-agent policy in given environments. Each algorithm has a ``config`` folder which contains ``agent_config.py`` and ``training_config.py``. The former contains parameters for the underlying models and algorithm specific hyper-parameters. The latter contains parameters for the environment and the main training loop. The file ``common.py`` contains parameters and utility functions shared by some or all of these algorithms.
|
||||
|
||||
In the ``ac`` folder, the policy is trained using the Actor-Critic algorithm in a single-threaded fashion. The example can be run by simply executing ``python3 main.py``. Logs will be saved in a file named ``cim-ac.CURRENT_TIME_STAMP.log`` under the ``ac/logs`` folder, where ``CURRENT_TIME_STAMP`` is the time of executing the script.
|
||||
|
||||
In the ``dqn`` folder, the policy is trained using the DQN algorithm in multi-process / distributed mode. This example can be run in three ways.
|
||||
* ``python3 main.py`` or ``python3 main.py -w 0`` runs the example in multi-process mode, in which a main process spawns one learner process and a number of actor processes as specified in ``config/training_config.py``.
|
||||
* ``python3 main.py -w 1`` launches the learner process only. This is for distributed training and expects a number of actor processes (as specified in ``config/training_config.py``) running on some other node(s).
|
||||
* ``python3 main.py -w 2`` launches the actor process only. This is for distributed training and expects a learner process running on some other node.
|
||||
Logs will be saved in a file named ``GROUP_NAME.log`` under the ``{ac_gnn, dqn}/logs`` folder, where ``GROUP_NAME`` is specified in the "group" field in ``config/training_config.py``.
|
|
@ -0,0 +1,2 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
|
@ -0,0 +1,7 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from .agent_config import agent_config
|
||||
from .training_config import training_config
|
||||
|
||||
__all__ = ["agent_config", "training_config"]
|
|
@ -0,0 +1,52 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from torch import nn
|
||||
from torch.optim import Adam, RMSprop
|
||||
|
||||
from maro.rl import OptimOption
|
||||
|
||||
from examples.cim.common import common_config
|
||||
|
||||
input_dim = (
|
||||
(common_config["look_back"] + 1) *
|
||||
(common_config["max_ports_downstream"] + 1) *
|
||||
len(common_config["port_attributes"]) +
|
||||
len(common_config["vessel_attributes"])
|
||||
)
|
||||
|
||||
agent_config = {
|
||||
"model": {
|
||||
"actor": {
|
||||
"input_dim": input_dim,
|
||||
"output_dim": len(common_config["action_space"]),
|
||||
"hidden_dims": [256, 128, 64],
|
||||
"activation": nn.Tanh,
|
||||
"softmax": True,
|
||||
"batch_norm": False,
|
||||
"head": True
|
||||
},
|
||||
"critic": {
|
||||
"input_dim": input_dim,
|
||||
"output_dim": 1,
|
||||
"hidden_dims": [256, 128, 64],
|
||||
"activation": nn.LeakyReLU,
|
||||
"softmax": False,
|
||||
"batch_norm": True,
|
||||
"head": True
|
||||
}
|
||||
},
|
||||
"optimization": {
|
||||
"actor": OptimOption(optim_cls=Adam, optim_params={"lr": 0.001}),
|
||||
"critic": OptimOption(optim_cls=RMSprop, optim_params={"lr": 0.001})
|
||||
},
|
||||
"hyper_params": {
|
||||
"reward_discount": .0,
|
||||
"critic_loss_func": nn.SmoothL1Loss(),
|
||||
"train_iters": 10,
|
||||
"actor_loss_coefficient": 0.1,
|
||||
"k": 1,
|
||||
"lam": 0.0
|
||||
# "clip_ratio": 0.8
|
||||
}
|
||||
}
|
|
@ -0,0 +1,11 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
training_config = {
|
||||
"env": {
|
||||
"scenario": "cim",
|
||||
"topology": "toy.4p_ssdd_l0.0",
|
||||
"durations": 1120,
|
||||
},
|
||||
"max_episode": 50
|
||||
}
|
|
@ -0,0 +1,59 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from collections import defaultdict, deque
|
||||
from os import makedirs, system
|
||||
from os.path import dirname, join, realpath
|
||||
|
||||
import numpy as np
|
||||
from torch import nn
|
||||
from torch.optim import Adam, RMSprop
|
||||
|
||||
from maro.rl import (
|
||||
Actor, ActorCritic, ActorCriticConfig, FullyConnectedBlock, MultiAgentWrapper, SimpleMultiHeadModel,
|
||||
Scheduler, OnPolicyLearner
|
||||
)
|
||||
from maro.simulator import Env
|
||||
from maro.utils import Logger, set_seeds
|
||||
|
||||
from examples.cim.ac.config import agent_config, training_config
|
||||
from examples.cim.common import CIMTrajectory, common_config
|
||||
|
||||
|
||||
def get_ac_agent():
|
||||
actor_net = FullyConnectedBlock(**agent_config["model"]["actor"])
|
||||
critic_net = FullyConnectedBlock(**agent_config["model"]["critic"])
|
||||
ac_model = SimpleMultiHeadModel(
|
||||
{"actor": actor_net, "critic": critic_net}, optim_option=agent_config["optimization"],
|
||||
)
|
||||
return ActorCritic(ac_model, ActorCriticConfig(**agent_config["hyper_params"]))
|
||||
|
||||
|
||||
class CIMTrajectoryForAC(CIMTrajectory):
|
||||
def on_finish(self):
|
||||
training_data = {}
|
||||
for event, state, action in zip(self.trajectory["event"], self.trajectory["state"], self.trajectory["action"]):
|
||||
agent_id = list(state.keys())[0]
|
||||
data = training_data.setdefault(agent_id, {"args": [[] for _ in range(4)]})
|
||||
data["args"][0].append(state[agent_id]) # state
|
||||
data["args"][1].append(action[agent_id][0]) # action
|
||||
data["args"][2].append(action[agent_id][1]) # log_p
|
||||
data["args"][3].append(self.get_offline_reward(event)) # reward
|
||||
|
||||
for agent_id in training_data:
|
||||
training_data[agent_id]["args"] = [
|
||||
np.asarray(vals, dtype=np.float32 if i == 3 else None)
|
||||
for i, vals in enumerate(training_data[agent_id]["args"])
|
||||
]
|
||||
|
||||
return training_data
|
||||
|
||||
|
||||
# Single-threaded launcher
|
||||
if __name__ == "__main__":
|
||||
set_seeds(1024) # for reproducibility
|
||||
env = Env(**training_config["env"])
|
||||
agent = MultiAgentWrapper({name: get_ac_agent() for name in env.agent_idx_list})
|
||||
actor = Actor(env, agent, CIMTrajectoryForAC, trajectory_kwargs=common_config) # local actor
|
||||
learner = OnPolicyLearner(actor, training_config["max_episode"])
|
||||
learner.run()
|
|
@ -0,0 +1,99 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from collections import defaultdict
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import Trajectory
|
||||
from maro.simulator.scenarios.cim.common import Action, ActionType
|
||||
|
||||
common_config = {
|
||||
"port_attributes": ["empty", "full", "on_shipper", "on_consignee", "booking", "shortage", "fulfillment"],
|
||||
"vessel_attributes": ["empty", "full", "remaining_space"],
|
||||
"action_space": list(np.linspace(-1.0, 1.0, 21)),
|
||||
# Parameters for computing states
|
||||
"look_back": 7,
|
||||
"max_ports_downstream": 2,
|
||||
# Parameters for computing actions
|
||||
"finite_vessel_space": True,
|
||||
"has_early_discharge": True,
|
||||
# Parameters for computing rewards
|
||||
"reward_time_window": 99,
|
||||
"fulfillment_factor": 1.0,
|
||||
"shortage_factor": 1.0,
|
||||
"time_decay": 0.97
|
||||
}
|
||||
|
||||
|
||||
class CIMTrajectory(Trajectory):
|
||||
def __init__(
|
||||
self, env, *, port_attributes, vessel_attributes, action_space, look_back, max_ports_downstream,
|
||||
reward_time_window, fulfillment_factor, shortage_factor, time_decay,
|
||||
finite_vessel_space=True, has_early_discharge=True
|
||||
):
|
||||
super().__init__(env)
|
||||
self.port_attributes = port_attributes
|
||||
self.vessel_attributes = vessel_attributes
|
||||
self.action_space = action_space
|
||||
self.look_back = look_back
|
||||
self.max_ports_downstream = max_ports_downstream
|
||||
self.reward_time_window = reward_time_window
|
||||
self.fulfillment_factor = fulfillment_factor
|
||||
self.shortage_factor = shortage_factor
|
||||
self.time_decay = time_decay
|
||||
self.finite_vessel_space = finite_vessel_space
|
||||
self.has_early_discharge = has_early_discharge
|
||||
|
||||
def get_state(self, event):
|
||||
vessel_snapshots, port_snapshots = self.env.snapshot_list["vessels"], self.env.snapshot_list["ports"]
|
||||
tick, port_idx, vessel_idx = event.tick, event.port_idx, event.vessel_idx
|
||||
ticks = [tick - rt for rt in range(self.look_back - 1)]
|
||||
future_port_idx_list = vessel_snapshots[tick: vessel_idx: 'future_stop_list'].astype('int')
|
||||
port_features = port_snapshots[ticks: [port_idx] + list(future_port_idx_list): self.port_attributes]
|
||||
vessel_features = vessel_snapshots[tick: vessel_idx: self.vessel_attributes]
|
||||
return {port_idx: np.concatenate((port_features, vessel_features))}
|
||||
|
||||
def get_action(self, action_by_agent, event):
|
||||
vessel_snapshots = self.env.snapshot_list["vessels"]
|
||||
action_info = list(action_by_agent.values())[0]
|
||||
model_action = action_info[0] if isinstance(action_info, tuple) else action_info
|
||||
scope, tick, port, vessel = event.action_scope, event.tick, event.port_idx, event.vessel_idx
|
||||
zero_action_idx = len(self.action_space) / 2 # index corresponding to value zero.
|
||||
vessel_space = vessel_snapshots[tick:vessel:self.vessel_attributes][2] if self.finite_vessel_space else float("inf")
|
||||
early_discharge = vessel_snapshots[tick:vessel:"early_discharge"][0] if self.has_early_discharge else 0
|
||||
percent = abs(self.action_space[model_action])
|
||||
|
||||
if model_action < zero_action_idx:
|
||||
action_type = ActionType.LOAD
|
||||
actual_action = min(round(percent * scope.load), vessel_space)
|
||||
elif model_action > zero_action_idx:
|
||||
action_type = ActionType.DISCHARGE
|
||||
plan_action = percent * (scope.discharge + early_discharge) - early_discharge
|
||||
actual_action = round(plan_action) if plan_action > 0 else round(percent * scope.discharge)
|
||||
else:
|
||||
actual_action, action_type = 0, None
|
||||
|
||||
return {port: Action(vessel, port, actual_action, action_type)}
|
||||
|
||||
def get_offline_reward(self, event):
|
||||
port_snapshots = self.env.snapshot_list["ports"]
|
||||
start_tick = event.tick + 1
|
||||
ticks = list(range(start_tick, start_tick + self.reward_time_window))
|
||||
|
||||
future_fulfillment = port_snapshots[ticks::"fulfillment"]
|
||||
future_shortage = port_snapshots[ticks::"shortage"]
|
||||
decay_list = [
|
||||
self.time_decay ** i for i in range(self.reward_time_window)
|
||||
for _ in range(future_fulfillment.shape[0] // self.reward_time_window)
|
||||
]
|
||||
|
||||
tot_fulfillment = np.dot(future_fulfillment, decay_list)
|
||||
tot_shortage = np.dot(future_shortage, decay_list)
|
||||
|
||||
return np.float32(self.fulfillment_factor * tot_fulfillment - self.shortage_factor * tot_shortage)
|
||||
|
||||
def on_env_feedback(self, event, state_by_agent, action_by_agent, reward):
|
||||
self.trajectory["event"].append(event)
|
||||
self.trajectory["state"].append(state_by_agent)
|
||||
self.trajectory["action"].append(action_by_agent)
|
|
@ -1,24 +0,0 @@
|
|||
# Overview
|
||||
|
||||
The CIM problem is one of the quintessential use cases of MARO. The example can
|
||||
be run with a set of scenario configurations that can be found under
|
||||
maro/simulator/scenarios/cim. General experimental parameters (e.g., type of
|
||||
topology, type of algorithm to use, number of training episodes) can be configured
|
||||
through config.yml. Each RL formulation has a dedicated folder, e.g., dqn, and
|
||||
all algorithm-specific parameters can be configured through
|
||||
the config.py file in that folder.
|
||||
|
||||
## Single-host Single-process Mode
|
||||
|
||||
To run the CIM example using the DQN algorithm under single-host mode, go to
|
||||
examples/cim/dqn and run single_process_launcher.py. You may play around with
|
||||
the configuration if you want to try out different settings.
|
||||
|
||||
## Distributed Mode
|
||||
|
||||
The examples/cim/dqn/components folder contains dist_learner.py and dist_actor.py
|
||||
for distributed training. For debugging purposes, we provide a script that
|
||||
simulates distributed mode using multi-processing. Simply go to examples/cim/dqn
|
||||
and execute python3 multi_process_launcher.py \[GROUP_NAME\] \[NUM_ACTORS\], where
|
||||
GROUP_NAME is the identifier for the current run and NUM_ACTORS is the number of actor
|
||||
processes to launch.
|
|
@ -0,0 +1,2 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
|
@ -1,14 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from .action_shaper import CIMActionShaper
|
||||
from .agent_manager import DQNAgentManager, create_dqn_agents
|
||||
from .experience_shaper import TruncatedExperienceShaper
|
||||
from .state_shaper import CIMStateShaper
|
||||
|
||||
__all__ = [
|
||||
"CIMActionShaper",
|
||||
"DQNAgentManager", "create_dqn_agents",
|
||||
"TruncatedExperienceShaper",
|
||||
"CIMStateShaper"
|
||||
]
|
|
@ -1,36 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from maro.rl import ActionShaper
|
||||
from maro.simulator.scenarios.cim.common import Action
|
||||
|
||||
|
||||
class CIMActionShaper(ActionShaper):
|
||||
def __init__(self, action_space):
|
||||
super().__init__()
|
||||
self._action_space = action_space
|
||||
self._zero_action_index = action_space.index(0)
|
||||
|
||||
def __call__(self, model_action, decision_event, snapshot_list):
|
||||
scope = decision_event.action_scope
|
||||
tick = decision_event.tick
|
||||
port_idx = decision_event.port_idx
|
||||
vessel_idx = decision_event.vessel_idx
|
||||
|
||||
port_empty = snapshot_list["ports"][tick: port_idx: ["empty", "full", "on_shipper", "on_consignee"]][0]
|
||||
vessel_remaining_space = snapshot_list["vessels"][tick: vessel_idx: ["empty", "full", "remaining_space"]][2]
|
||||
early_discharge = snapshot_list["vessels"][tick:vessel_idx: "early_discharge"][0]
|
||||
assert 0 <= model_action < len(self._action_space)
|
||||
|
||||
if model_action < self._zero_action_index:
|
||||
actual_action = max(round(self._action_space[model_action] * port_empty), -vessel_remaining_space)
|
||||
elif model_action > self._zero_action_index:
|
||||
plan_action = self._action_space[model_action] * (scope.discharge + early_discharge) - early_discharge
|
||||
actual_action = (
|
||||
round(plan_action) if plan_action > 0
|
||||
else round(self._action_space[model_action] * scope.discharge)
|
||||
)
|
||||
else:
|
||||
actual_action = 0
|
||||
|
||||
return Action(vessel_idx, port_idx, actual_action)
|
|
@ -1,60 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
import pickle
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import AbsAgent, ColumnBasedStore
|
||||
|
||||
|
||||
class DQNAgent(AbsAgent):
|
||||
"""Implementation of AbsAgent for the DQN algorithm.
|
||||
|
||||
Args:
|
||||
name (str): Agent's name.
|
||||
algorithm (AbsAlgorithm): A concrete algorithm instance that inherits from AbstractAlgorithm.
|
||||
experience_pool (AbsStore): It is used to store experiences processed by the experience shaper, which will be
|
||||
used by some value-based algorithms, such as DQN.
|
||||
min_experiences_to_train: minimum number of experiences required for training.
|
||||
num_batches: number of batches to train the DQN model on per call to ``train``.
|
||||
batch_size: mini-batch size.
|
||||
"""
|
||||
def __init__(
|
||||
self,
|
||||
name: str,
|
||||
algorithm,
|
||||
experience_pool: ColumnBasedStore,
|
||||
min_experiences_to_train,
|
||||
num_batches,
|
||||
batch_size
|
||||
):
|
||||
super().__init__(name, algorithm, experience_pool=experience_pool)
|
||||
self._min_experiences_to_train = min_experiences_to_train
|
||||
self._num_batches = num_batches
|
||||
self._batch_size = batch_size
|
||||
|
||||
def train(self):
|
||||
"""Implementation of the training loop for DQN.
|
||||
|
||||
Experiences are sampled using their TD errors as weights. After training, the new TD errors are updated
|
||||
in the experience pool.
|
||||
"""
|
||||
if len(self._experience_pool) < self._min_experiences_to_train:
|
||||
return
|
||||
|
||||
for _ in range(self._num_batches):
|
||||
indexes, sample = self._experience_pool.sample_by_key("loss", self._batch_size)
|
||||
state = np.asarray(sample["state"])
|
||||
action = np.asarray(sample["action"])
|
||||
reward = np.asarray(sample["reward"])
|
||||
next_state = np.asarray(sample["next_state"])
|
||||
loss = self._algorithm.train(state, action, reward, next_state)
|
||||
self._experience_pool.update(indexes, {"loss": loss})
|
||||
|
||||
def dump_experience_pool(self, dir_path: str):
|
||||
"""Dump the experience pool to disk."""
|
||||
os.makedirs(dir_path, exist_ok=True)
|
||||
with open(os.path.join(dir_path, self._name), "wb") as fp:
|
||||
pickle.dump(self._experience_pool, fp)
|
|
@ -1,57 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import torch.nn as nn
|
||||
from torch.optim import RMSprop
|
||||
|
||||
from maro.rl import (
|
||||
ColumnBasedStore, DQN, DQNConfig, FullyConnectedBlock, LearningModel, NNStack, OptimizerOptions,
|
||||
SimpleAgentManager
|
||||
)
|
||||
from maro.utils import set_seeds
|
||||
|
||||
from .agent import DQNAgent
|
||||
|
||||
|
||||
def create_dqn_agents(agent_id_list, config):
|
||||
num_actions = config.algorithm.num_actions
|
||||
set_seeds(config.seed)
|
||||
agent_dict = {}
|
||||
for agent_id in agent_id_list:
|
||||
q_net = NNStack(
|
||||
"q_value",
|
||||
FullyConnectedBlock(
|
||||
input_dim=config.algorithm.input_dim,
|
||||
output_dim=num_actions,
|
||||
activation=nn.LeakyReLU,
|
||||
is_head=True,
|
||||
**config.algorithm.model
|
||||
)
|
||||
)
|
||||
learning_model = LearningModel(
|
||||
q_net,
|
||||
optimizer_options=OptimizerOptions(cls=RMSprop, params=config.algorithm.optimizer)
|
||||
)
|
||||
algorithm = DQN(
|
||||
learning_model,
|
||||
DQNConfig(**config.algorithm.hyper_params, loss_cls=nn.SmoothL1Loss)
|
||||
)
|
||||
agent_dict[agent_id] = DQNAgent(
|
||||
agent_id, algorithm, ColumnBasedStore(**config.experience_pool),
|
||||
**config.training_loop_parameters
|
||||
)
|
||||
|
||||
return agent_dict
|
||||
|
||||
|
||||
class DQNAgentManager(SimpleAgentManager):
|
||||
def train(self, experiences_by_agent, performance=None):
|
||||
self._assert_train_mode()
|
||||
|
||||
# store experiences for each agent
|
||||
for agent_id, exp in experiences_by_agent.items():
|
||||
exp.update({"loss": [1e8] * len(list(exp.values())[0])})
|
||||
self.agent_dict[agent_id].store_experiences(exp)
|
||||
|
||||
for agent in self.agent_dict.values():
|
||||
agent.train()
|
|
@ -1,20 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
"""
|
||||
This file is used to load the configuration and convert it into a dotted dictionary.
|
||||
"""
|
||||
|
||||
import io
|
||||
import os
|
||||
|
||||
import yaml
|
||||
|
||||
|
||||
CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../config.yml")
|
||||
with io.open(CONFIG_PATH, "r") as in_file:
|
||||
config = yaml.safe_load(in_file)
|
||||
|
||||
DISTRIBUTED_CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../distributed_config.yml")
|
||||
with io.open(DISTRIBUTED_CONFIG_PATH, "r") as in_file:
|
||||
distributed_config = yaml.safe_load(in_file)
|
|
@ -1,52 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from collections import defaultdict
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import ExperienceShaper
|
||||
|
||||
|
||||
class TruncatedExperienceShaper(ExperienceShaper):
|
||||
def __init__(
|
||||
self, *, time_window: int, time_decay_factor: float, fulfillment_factor: float, shortage_factor: float
|
||||
):
|
||||
super().__init__(reward_func=None)
|
||||
self._time_window = time_window
|
||||
self._time_decay_factor = time_decay_factor
|
||||
self._fulfillment_factor = fulfillment_factor
|
||||
self._shortage_factor = shortage_factor
|
||||
|
||||
def __call__(self, trajectory, snapshot_list):
|
||||
experiences_by_agent = {}
|
||||
for i in range(len(trajectory) - 1):
|
||||
transition = trajectory[i]
|
||||
agent_id = transition["agent_id"]
|
||||
if agent_id not in experiences_by_agent:
|
||||
experiences_by_agent[agent_id] = defaultdict(list)
|
||||
experiences = experiences_by_agent[agent_id]
|
||||
experiences["state"].append(transition["state"])
|
||||
experiences["action"].append(transition["action"])
|
||||
experiences["reward"].append(self._compute_reward(transition["event"], snapshot_list))
|
||||
experiences["next_state"].append(trajectory[i + 1]["state"])
|
||||
|
||||
return experiences_by_agent
|
||||
|
||||
def _compute_reward(self, decision_event, snapshot_list):
|
||||
start_tick = decision_event.tick + 1
|
||||
end_tick = decision_event.tick + self._time_window
|
||||
ticks = list(range(start_tick, end_tick))
|
||||
|
||||
# calculate tc reward
|
||||
future_fulfillment = snapshot_list["ports"][ticks::"fulfillment"]
|
||||
future_shortage = snapshot_list["ports"][ticks::"shortage"]
|
||||
decay_list = [
|
||||
self._time_decay_factor ** i for i in range(end_tick - start_tick)
|
||||
for _ in range(future_fulfillment.shape[0] // (end_tick - start_tick))
|
||||
]
|
||||
|
||||
tot_fulfillment = np.dot(future_fulfillment, decay_list)
|
||||
tot_shortage = np.dot(future_shortage, decay_list)
|
||||
|
||||
return np.float32(self._fulfillment_factor * tot_fulfillment - self._shortage_factor * tot_shortage)
|
|
@ -1,30 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import StateShaper
|
||||
|
||||
PORT_ATTRIBUTES = ["empty", "full", "on_shipper", "on_consignee", "booking", "shortage", "fulfillment"]
|
||||
VESSEL_ATTRIBUTES = ["empty", "full", "remaining_space"]
|
||||
|
||||
|
||||
class CIMStateShaper(StateShaper):
|
||||
def __init__(self, *, look_back, max_ports_downstream):
|
||||
super().__init__()
|
||||
self._look_back = look_back
|
||||
self._max_ports_downstream = max_ports_downstream
|
||||
self._dim = (look_back + 1) * (max_ports_downstream + 1) * len(PORT_ATTRIBUTES) + len(VESSEL_ATTRIBUTES)
|
||||
|
||||
def __call__(self, decision_event, snapshot_list):
|
||||
tick, port_idx, vessel_idx = decision_event.tick, decision_event.port_idx, decision_event.vessel_idx
|
||||
ticks = [tick - rt for rt in range(self._look_back - 1)]
|
||||
future_port_idx_list = snapshot_list["vessels"][tick: vessel_idx: 'future_stop_list'].astype('int')
|
||||
port_features = snapshot_list["ports"][ticks: [port_idx] + list(future_port_idx_list): PORT_ATTRIBUTES]
|
||||
vessel_features = snapshot_list["vessels"][tick: vessel_idx: VESSEL_ATTRIBUTES]
|
||||
state = np.concatenate((port_features, vessel_features))
|
||||
return str(port_idx), state
|
||||
|
||||
@property
|
||||
def dim(self):
|
||||
return self._dim
|
|
@ -1,48 +0,0 @@
|
|||
env:
|
||||
scenario: "cim"
|
||||
topology: "toy.4p_ssdd_l0.0"
|
||||
durations: 1120
|
||||
state_shaping:
|
||||
look_back: 7
|
||||
max_ports_downstream: 2
|
||||
experience_shaping:
|
||||
time_window: 100
|
||||
fulfillment_factor: 1.0
|
||||
shortage_factor: 1.0
|
||||
time_decay_factor: 0.97
|
||||
main_loop:
|
||||
max_episode: 500
|
||||
exploration:
|
||||
parameter_names:
|
||||
- "epsilon"
|
||||
split_ep: 250
|
||||
start_values: 0.4
|
||||
mid_values: 0.32
|
||||
end_values: 0.0
|
||||
agents:
|
||||
algorithm:
|
||||
num_actions: 21
|
||||
model:
|
||||
hidden_dims:
|
||||
- 256
|
||||
- 128
|
||||
- 64
|
||||
softmax_enabled: false
|
||||
batch_norm_enabled: true
|
||||
skip_connection_enabled: false
|
||||
dropout_p: 0.0
|
||||
optimizer:
|
||||
lr: 0.05
|
||||
hyper_params:
|
||||
reward_discount: .0
|
||||
target_update_frequency: 5
|
||||
tau: 0.1
|
||||
is_double: true
|
||||
per_sample_td_error_enabled: true
|
||||
experience_pool:
|
||||
capacity: -1
|
||||
training_loop_parameters:
|
||||
min_experiences_to_train: 1024
|
||||
num_batches: 10
|
||||
batch_size: 128
|
||||
seed: 32 # for reproducibility
|
|
@ -0,0 +1,7 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from .agent_config import agent_config
|
||||
from .training_config import training_config
|
||||
|
||||
__all__ = ["agent_config", "training_config"]
|
|
@ -0,0 +1,38 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from torch import nn
|
||||
from torch.optim import RMSprop
|
||||
|
||||
from maro.rl import DQN, DQNConfig, FullyConnectedBlock, OptimOption, PolicyGradient, SimpleMultiHeadModel
|
||||
|
||||
from examples.cim.common import common_config
|
||||
|
||||
input_dim = (
|
||||
(common_config["look_back"] + 1) *
|
||||
(common_config["max_ports_downstream"] + 1) *
|
||||
len(common_config["port_attributes"]) +
|
||||
len(common_config["vessel_attributes"])
|
||||
)
|
||||
|
||||
agent_config = {
|
||||
"model": {
|
||||
"input_dim": input_dim,
|
||||
"output_dim": len(common_config["action_space"]), # number of possible actions
|
||||
"hidden_dims": [256, 128, 64],
|
||||
"activation": nn.LeakyReLU,
|
||||
"softmax": False,
|
||||
"batch_norm": True,
|
||||
"skip_connection": False,
|
||||
"head": True,
|
||||
"dropout_p": 0.0
|
||||
},
|
||||
"optimization": OptimOption(optim_cls=RMSprop, optim_params={"lr": 0.05}),
|
||||
"hyper_params": {
|
||||
"reward_discount": .0,
|
||||
"loss_cls": nn.SmoothL1Loss,
|
||||
"target_update_freq": 5,
|
||||
"tau": 0.1,
|
||||
"double": False
|
||||
}
|
||||
}
|
|
@ -0,0 +1,29 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
training_config = {
|
||||
"env": {
|
||||
"scenario": "cim",
|
||||
"topology": "toy.4p_ssdd_l0.0",
|
||||
"durations": 1120,
|
||||
},
|
||||
"max_episode": 100,
|
||||
"exploration": {
|
||||
"parameter_names": ["epsilon"],
|
||||
"split": 0.5,
|
||||
"start": 0.4,
|
||||
"mid": 0.32,
|
||||
"end": 0.0
|
||||
},
|
||||
"training": {
|
||||
"min_experiences_to_train": 1024,
|
||||
"train_iter": 10,
|
||||
"batch_size": 128,
|
||||
"prioritized_sampling_by_loss": True
|
||||
},
|
||||
"group": "cim-dqn",
|
||||
"learner_update_trigger": 2,
|
||||
"num_actors": 2,
|
||||
"num_trainers": 4,
|
||||
"trainer_id": 0
|
||||
}
|
|
@ -1,49 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
from maro.rl import ActorWorker, AgentManagerMode, SimpleActor
|
||||
from maro.simulator import Env
|
||||
from maro.utils import convert_dottable
|
||||
|
||||
from components import CIMActionShaper, CIMStateShaper, DQNAgentManager, TruncatedExperienceShaper, create_dqn_agents
|
||||
|
||||
|
||||
def launch(config, distributed_config):
|
||||
config = convert_dottable(config)
|
||||
distributed_config = convert_dottable(distributed_config)
|
||||
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
state_shaper = CIMStateShaper(**config.env.state_shaping)
|
||||
action_shaper = CIMActionShaper(action_space=list(np.linspace(-1.0, 1.0, config.agents.algorithm.num_actions)))
|
||||
experience_shaper = TruncatedExperienceShaper(**config.env.experience_shaping)
|
||||
|
||||
config["agents"]["algorithm"]["input_dim"] = state_shaper.dim
|
||||
agent_manager = DQNAgentManager(
|
||||
name="cim_actor",
|
||||
mode=AgentManagerMode.INFERENCE,
|
||||
agent_dict=create_dqn_agents(agent_id_list, config.agents),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper
|
||||
)
|
||||
proxy_params = {
|
||||
"group_name": os.environ["GROUP"] if "GROUP" in os.environ else distributed_config.group,
|
||||
"expected_peers": {"learner": 1},
|
||||
"redis_address": (distributed_config.redis.hostname, distributed_config.redis.port),
|
||||
"max_retries": 15
|
||||
}
|
||||
actor_worker = ActorWorker(
|
||||
local_actor=SimpleActor(env=env, agent_manager=agent_manager),
|
||||
proxy_params=proxy_params
|
||||
)
|
||||
actor_worker.launch()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from components.config import config, distributed_config
|
||||
launch(config=config, distributed_config=distributed_config)
|
|
@ -1,51 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
|
||||
from maro.rl import (
|
||||
ActorProxy, AgentManagerMode, SimpleLearner, TwoPhaseLinearParameterScheduler, concat_experiences_by_agent
|
||||
)
|
||||
from maro.simulator import Env
|
||||
from maro.utils import Logger, convert_dottable
|
||||
|
||||
from components import CIMStateShaper, DQNAgentManager, create_dqn_agents
|
||||
|
||||
|
||||
def launch(config, distributed_config):
|
||||
config = convert_dottable(config)
|
||||
distributed_config = convert_dottable(distributed_config)
|
||||
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
|
||||
config["agents"]["algorithm"]["input_dim"] = CIMStateShaper(**config.env.state_shaping).dim
|
||||
agent_manager = DQNAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.TRAIN,
|
||||
agent_dict=create_dqn_agents(agent_id_list, config.agents)
|
||||
)
|
||||
|
||||
proxy_params = {
|
||||
"group_name": os.environ["GROUP"] if "GROUP" in os.environ else distributed_config.group,
|
||||
"expected_peers": {
|
||||
"actor": int(os.environ["NUM_ACTORS"] if "NUM_ACTORS" in os.environ else distributed_config.num_actors)
|
||||
},
|
||||
"redis_address": (distributed_config.redis.hostname, distributed_config.redis.port),
|
||||
"max_retries": 15
|
||||
}
|
||||
|
||||
learner = SimpleLearner(
|
||||
agent_manager=agent_manager,
|
||||
actor=ActorProxy(proxy_params=proxy_params, experience_collecting_func=concat_experiences_by_agent),
|
||||
scheduler=TwoPhaseLinearParameterScheduler(config.main_loop.max_episode, **config.main_loop.exploration),
|
||||
logger=Logger("cim_learner", auto_timestamp=False)
|
||||
)
|
||||
learner.learn()
|
||||
learner.test()
|
||||
learner.dump_models(os.path.join(os.getcwd(), "models"))
|
||||
learner.exit()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from components.config import config, distributed_config
|
||||
launch(config=config, distributed_config=distributed_config)
|
|
@ -1,6 +0,0 @@
|
|||
redis:
|
||||
hostname: "localhost"
|
||||
port: 6379
|
||||
group: test_group
|
||||
num_actors: 1
|
||||
num_learners: 1
|
|
@ -0,0 +1,87 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import argparse
|
||||
import time
|
||||
from collections import defaultdict
|
||||
from multiprocessing import Process
|
||||
from os import makedirs
|
||||
from os.path import dirname, join, realpath
|
||||
|
||||
from maro.rl import (
|
||||
Actor, ActorProxy, DQN, DQNConfig, FullyConnectedBlock, MultiAgentWrapper, OffPolicyLearner,
|
||||
SimpleMultiHeadModel, TwoPhaseLinearParameterScheduler
|
||||
)
|
||||
from maro.simulator import Env
|
||||
from maro.utils import Logger, set_seeds
|
||||
|
||||
from examples.cim.common import CIMTrajectory, common_config
|
||||
from examples.cim.dqn.config import agent_config, training_config
|
||||
|
||||
|
||||
def get_dqn_agent():
|
||||
q_model = SimpleMultiHeadModel(
|
||||
FullyConnectedBlock(**agent_config["model"]), optim_option=agent_config["optimization"]
|
||||
)
|
||||
return DQN(q_model, DQNConfig(**agent_config["hyper_params"]))
|
||||
|
||||
|
||||
class CIMTrajectoryForDQN(CIMTrajectory):
|
||||
def on_finish(self):
|
||||
exp_by_agent = defaultdict(lambda: defaultdict(list))
|
||||
for i in range(len(self.trajectory["state"]) - 1):
|
||||
agent_id = list(self.trajectory["state"][i].keys())[0]
|
||||
exp = exp_by_agent[agent_id]
|
||||
exp["S"].append(self.trajectory["state"][i][agent_id])
|
||||
exp["A"].append(self.trajectory["action"][i][agent_id])
|
||||
exp["R"].append(self.get_offline_reward(self.trajectory["event"][i]))
|
||||
exp["S_"].append(list(self.trajectory["state"][i + 1].values())[0])
|
||||
|
||||
return dict(exp_by_agent)
|
||||
|
||||
|
||||
def cim_dqn_learner():
|
||||
env = Env(**training_config["env"])
|
||||
agent = MultiAgentWrapper({name: get_dqn_agent() for name in env.agent_idx_list})
|
||||
scheduler = TwoPhaseLinearParameterScheduler(training_config["max_episode"], **training_config["exploration"])
|
||||
actor = ActorProxy(
|
||||
training_config["group"], training_config["num_actors"],
|
||||
update_trigger=training_config["learner_update_trigger"]
|
||||
)
|
||||
learner = OffPolicyLearner(actor, scheduler, agent, **training_config["training"])
|
||||
learner.run()
|
||||
|
||||
|
||||
def cim_dqn_actor():
|
||||
env = Env(**training_config["env"])
|
||||
agent = MultiAgentWrapper({name: get_dqn_agent() for name in env.agent_idx_list})
|
||||
actor = Actor(env, agent, CIMTrajectoryForDQN, trajectory_kwargs=common_config)
|
||||
actor.as_worker(training_config["group"])
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument(
|
||||
"-w", "--whoami", type=int, choices=[0, 1, 2], default=0,
|
||||
help="Identity of this process: 0 - multi-process mode, 1 - learner, 2 - actor"
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
if args.whoami == 0:
|
||||
actor_processes = [Process(target=cim_dqn_actor) for _ in range(training_config["num_actors"])]
|
||||
learner_process = Process(target=cim_dqn_learner)
|
||||
|
||||
for i, actor_process in enumerate(actor_processes):
|
||||
set_seeds(i) # this is to ensure that the actors explore differently.
|
||||
actor_process.start()
|
||||
|
||||
learner_process.start()
|
||||
|
||||
for actor_process in actor_processes:
|
||||
actor_process.join()
|
||||
|
||||
learner_process.join()
|
||||
elif args.whoami == 1:
|
||||
cim_dqn_learner()
|
||||
elif args.whoami == 2:
|
||||
cim_dqn_actor()
|
|
@ -1,25 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
"""
|
||||
This script is used to debug distributed algorithm in single host multi-process mode.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import os
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("group_name", help="group name")
|
||||
parser.add_argument("num_actors", type=int, help="number of actors")
|
||||
args = parser.parse_args()
|
||||
learner_path = f"{os.path.split(os.path.realpath(__file__))[0]}/dist_learner.py &"
|
||||
actor_path = f"{os.path.split(os.path.realpath(__file__))[0]}/dist_actor.py &"
|
||||
|
||||
# Launch the learner process
|
||||
os.system(f"GROUP={args.group_name} NUM_ACTORS={args.num_actors} python " + learner_path)
|
||||
|
||||
# Launch the actor processes
|
||||
for _ in range(args.num_actors):
|
||||
os.system(f"GROUP={args.group_name} python " + actor_path)
|
|
@ -1,53 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import AgentManagerMode, SimpleActor, SimpleLearner, TwoPhaseLinearParameterScheduler
|
||||
from maro.simulator import Env
|
||||
from maro.utils import LogFormat, Logger, convert_dottable
|
||||
|
||||
from components import CIMActionShaper, CIMStateShaper, DQNAgentManager, TruncatedExperienceShaper, create_dqn_agents
|
||||
|
||||
|
||||
def launch(config):
|
||||
config = convert_dottable(config)
|
||||
# Step 1: Initialize a CIM environment for using a toy dataset.
|
||||
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
action_space = list(np.linspace(-1.0, 1.0, config.agents.algorithm.num_actions))
|
||||
|
||||
# Step 2: Create state, action and experience shapers. We also need to create an explorer here due to the
|
||||
# greedy nature of the DQN algorithm.
|
||||
state_shaper = CIMStateShaper(**config.env.state_shaping)
|
||||
action_shaper = CIMActionShaper(action_space=action_space)
|
||||
experience_shaper = TruncatedExperienceShaper(**config.env.experience_shaping)
|
||||
|
||||
# Step 3: Create agents and an agent manager.
|
||||
config["agents"]["algorithm"]["input_dim"] = state_shaper.dim
|
||||
agent_manager = DQNAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.TRAIN_INFERENCE,
|
||||
agent_dict=create_dqn_agents(agent_id_list, config.agents),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper
|
||||
)
|
||||
|
||||
# Step 4: Create an actor and a learner to start the training process.
|
||||
scheduler = TwoPhaseLinearParameterScheduler(config.main_loop.max_episode, **config.main_loop.exploration)
|
||||
actor = SimpleActor(env, agent_manager)
|
||||
learner = SimpleLearner(
|
||||
agent_manager, actor, scheduler,
|
||||
logger=Logger("cim_learner", format_=LogFormat.simple, auto_timestamp=False)
|
||||
)
|
||||
learner.learn()
|
||||
learner.test()
|
||||
learner.dump_models(os.path.join(os.getcwd(), "models"))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from components.config import config
|
||||
launch(config)
@@ -1,13 +0,0 @@
from .actor import ParallelActor
|
||||
from .agent_manager import SimpleAgentManger
|
||||
from .learner import GNNLearner
|
||||
from .state_shaper import GNNStateShaper
|
||||
from .utils import decision_cnt_analysis, load_config, return_scaler, save_code, save_config
|
||||
|
||||
__all__ = [
|
||||
"ParallelActor",
|
||||
"SimpleAgentManger",
|
||||
"GNNLearner",
|
||||
"GNNStateShaper",
|
||||
"decision_cnt_analysis", "load_config", "return_scaler", "save_code", "save_config"
|
||||
]
@@ -1,37 +0,0 @@
from maro.rl import ActionShaper
|
||||
|
||||
|
||||
class DiscreteActionShaper(ActionShaper):
|
||||
"""The shaping class to transform the action in [-1, 1] to actual repositioning function."""
|
||||
def __init__(self, action_dim):
|
||||
super().__init__()
|
||||
self._action_dim = action_dim
|
||||
self._zero_action = self._action_dim // 2
|
||||
|
||||
def __call__(self, decision_event, model_action):
|
||||
"""Shaping the action in [-1,1] range to the actual repositioning function.
|
||||
|
||||
This function maps integer model action within the range of [-A, A] to actual action. We define negative actual
|
||||
action as discharge resource from vessel to port and positive action as upload from port to vessel, so the
|
||||
upper bound and lower bound of actual action are the resource in dynamic and static node respectively.
|
||||
|
||||
Args:
|
||||
decision_event (Event): The decision event from the environment.
|
||||
model_action (int): The model's output action, where A is half of the agent's output dimension.
|
||||
"""
|
||||
env_action = 0
|
||||
model_action -= self._zero_action
|
||||
|
||||
action_scope = decision_event.action_scope
|
||||
|
||||
if model_action < 0:
|
||||
# Discharge resource from dynamic node.
|
||||
env_action = round(int(model_action) * 1.0 / self._zero_action * action_scope.load)
|
||||
elif model_action == 0:
|
||||
env_action = 0
|
||||
else:
|
||||
# Load resource to dynamic node.
|
||||
env_action = round(int(model_action) * 1.0 / self._zero_action * action_scope.discharge)
|
||||
env_action = int(env_action)
|
||||
|
||||
return env_action
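A quick, illustrative use of the shaper with a stand-in decision event (the real event comes from the CIM simulator):
# Illustrative only: a stub decision event standing in for the real CIM decision event.
from collections import namedtuple

_Scope = namedtuple("Scope", ["load", "discharge"])
_Event = namedtuple("Event", ["action_scope"])

shaper = DiscreteActionShaper(action_dim=21)  # the zero action sits at index 10
stub_event = _Event(action_scope=_Scope(load=500, discharge=300))
print(shaper(stub_event, 0))    # -500: discharge everything from the vessel
print(shaper(stub_event, 10))   # 0: no repositioning
print(shaper(stub_event, 20))   # 300: load as much as possible onto the vessel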
@@ -1,370 +0,0 @@
import ctypes
|
||||
import multiprocessing
|
||||
import os
|
||||
import pickle
|
||||
import time
|
||||
from collections import OrderedDict
|
||||
from multiprocessing import Pipe, Process
|
||||
|
||||
import numpy as np
|
||||
import torch
|
||||
|
||||
from maro.rl import AbsActor
|
||||
from maro.simulator import Env
|
||||
from maro.simulator.scenarios.cim.common import Action
|
||||
|
||||
from .action_shaper import DiscreteActionShaper
|
||||
from .experience_shaper import ExperienceShaper
|
||||
from .shared_structure import SharedStructure
|
||||
from .state_shaper import GNNStateShaper
|
||||
from .utils import fix_seed, gnn_union
|
||||
|
||||
|
||||
def organize_exp_list(experience_collections: dict, idx_mapping: dict):
|
||||
"""The function assemble the experience from multiple processes into a dictionary.
|
||||
|
||||
Args:
|
||||
experience_collections (dict): It stores the experience of all agents. The structure is the same as that
defined in the SharedStructure of the ParallelActor, except for an additional key for the experience length.
For example:
|
||||
|
||||
{
|
||||
"len": numpy.array,
|
||||
"s": {
|
||||
"v": numpy.array,
|
||||
"p": numpy.array,
|
||||
}
|
||||
"a": numpy.array,
|
||||
"R": numpy.array,
|
||||
"s_": {
|
||||
"v": numpy.array,
|
||||
"p": numpy.array,
|
||||
}
|
||||
}
|
||||
|
||||
Note that the experiences from different agents are stored sequentially in the same batch. For
example, if agent x starts at batch index b_x and its experience is l_x items long, the range
[b_x, b_x + l_x) of the batch is the experience of agent x.
|
||||
|
||||
idx_mapping (dict): The key is the name of each agent and the value is the starting index, e.g., b_x, of the
|
||||
storage space where the experience of the agent is stored.
|
||||
"""
|
||||
result = {}
|
||||
tmpi = 0
|
||||
for code, idx in idx_mapping.items():
|
||||
exp_len = experience_collections["len"][0][tmpi]
|
||||
|
||||
s = organize_obs(experience_collections["s"], idx, exp_len)
|
||||
s_ = organize_obs(experience_collections["s_"], idx, exp_len)
|
||||
R = experience_collections["R"][idx: idx + exp_len]
|
||||
R = R.reshape(-1, *R.shape[2:])
|
||||
a = experience_collections["a"][idx: idx + exp_len]
|
||||
a = a.reshape(-1, *a.shape[2:])
|
||||
|
||||
result[code] = {
|
||||
"R": R,
|
||||
"a": a,
|
||||
"s": s,
|
||||
"s_": s_,
|
||||
"len": a.shape[0]
|
||||
}
|
||||
tmpi += 1
|
||||
return result
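A minimal runnable sketch of the assembly above with made-up shapes (2 agents, 1 parallel environment), just to show how the per-agent slices are pulled out of the shared batch; all names and sizes here are illustrative assumptions:
import numpy as np

tick_buffer, seq_len, para, v_cnt, p_cnt, v_dim, p_dim, e_dim = 4, 5, 1, 2, 3, 6, 7, 1

def toy_obs():
    # Same layout as the SharedStructure observation fields, filled with zeros.
    return {
        "v": np.zeros((tick_buffer, seq_len, para, v_cnt, v_dim)),
        "p": np.zeros((tick_buffer, seq_len, para, p_cnt, p_dim)),
        "vo": np.zeros((seq_len, para, v_cnt, p_cnt), dtype=np.int64),
        "po": np.zeros((seq_len, para, p_cnt, v_cnt), dtype=np.int64),
        "vedge": np.zeros((seq_len, para, v_cnt, p_cnt, e_dim)),
        "pedge": np.zeros((seq_len, para, p_cnt, v_cnt, e_dim)),
        "ppedge": np.zeros((seq_len, para, p_cnt, p_cnt, e_dim)),
        "mask": np.zeros((seq_len, para, tick_buffer), dtype=bool),
    }

toy_exp = {
    "len": np.array([[3, 2]]),          # agent (0, 0) contributed 3 samples, agent (1, 0) contributed 2
    "s": toy_obs(), "s_": toy_obs(),
    "a": np.zeros((seq_len, para), dtype=np.int64),
    "R": np.zeros((seq_len, para, p_cnt)),
}
out = organize_exp_list(toy_exp, {(0, 0): 0, (1, 0): 3})
print(out[(0, 0)]["len"], out[(1, 0)]["a"].shape)   # 3 (2,)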
|
||||
|
||||
|
||||
def organize_obs(obs, idx, exp_len):
|
||||
"""Helper function to transform the observation from multiple processes to a unified dictionary."""
|
||||
tick_buffer, _, para_cnt, v_cnt, v_dim = obs["v"].shape
|
||||
_, _, _, p_cnt, p_dim = obs["p"].shape
|
||||
batch = exp_len * para_cnt
|
||||
# v: tick_buffer, seq_len, parallel_cnt, v_cnt, v_dim --> (tick_buffer, cnt, v_cnt, v_dim)
|
||||
v = obs["v"][:, idx: idx + exp_len]
|
||||
v = v.reshape(tick_buffer, batch, v_cnt, v_dim)
|
||||
p = obs["p"][:, idx: idx + exp_len]
|
||||
p = p.reshape(tick_buffer, batch, p_cnt, p_dim)
|
||||
# vo: seq_len * parallel_cnt * v_cnt * p_cnt* --> cnt * v_cnt * p_cnt*
|
||||
vo = obs["vo"][idx: idx + exp_len]
|
||||
vo = vo.reshape(batch, v_cnt, vo.shape[-1])
|
||||
po = obs["po"][idx: idx + exp_len]
|
||||
po = po.reshape(batch, p_cnt, po.shape[-1])
|
||||
vedge = obs["vedge"][idx: idx + exp_len]
|
||||
vedge = vedge.reshape(batch, v_cnt, vedge.shape[-2], vedge.shape[-1])
|
||||
pedge = obs["pedge"][idx: idx + exp_len]
|
||||
pedge = pedge.reshape(batch, p_cnt, pedge.shape[-2], pedge.shape[-1])
|
||||
ppedge = obs["ppedge"][idx: idx + exp_len]
|
||||
ppedge = ppedge.reshape(batch, p_cnt, ppedge.shape[-2], ppedge.shape[-1])
|
||||
|
||||
# mask: (seq_len, parallel_cnt, tick_buffer)
|
||||
mask = obs["mask"][idx: idx + exp_len].reshape(batch, tick_buffer)
|
||||
|
||||
return {"v": v, "p": p, "vo": vo, "po": po, "pedge": pedge, "vedge": vedge, "ppedge": ppedge, "mask": mask}
|
||||
|
||||
|
||||
def single_player_worker(index, config, exp_idx_mapping, pipe, action_io, exp_output):
|
||||
"""The A2C worker function to collect experience.
|
||||
|
||||
Args:
|
||||
index (int): The process index counted from 0.
|
||||
config (dict): It is a dottable dictionary that stores the configuration of the simulation, state_shaper and
|
||||
postprocessing shaper.
|
||||
exp_idx_mapping (dict): The key is agent code and the value is the starting index where the experience is stored
|
||||
in the experience batch.
|
||||
pipe (Pipe): The pipe instance for communication with the main process.
|
||||
action_io (SharedStructure): The shared memory to hold the state information that the main process uses to
|
||||
generate an action.
|
||||
exp_output (SharedStructure): The shared memory to transfer the experience list to the main process.
|
||||
"""
|
||||
if index == 0:
|
||||
simulation_log_path = os.path.join(config.log.path, f"cim_gnn_{index}")
|
||||
if not os.path.exists(simulation_log_path):
|
||||
os.makedirs(simulation_log_path)
|
||||
opts = {"enable-dump-snapshot": simulation_log_path}
|
||||
env = Env(**config.env.param, options=opts)
|
||||
else:
|
||||
env = Env(**config.env.param)
|
||||
fix_seed(env, config.env.seed)
|
||||
static_code_list, dynamic_code_list = list(env.summary["node_mapping"]["ports"].values()), \
|
||||
list(env.summary["node_mapping"]["vessels"].values())
|
||||
# Create gnn_state_shaper without consuming any resources.
|
||||
|
||||
gnn_state_shaper = GNNStateShaper(
|
||||
static_code_list, dynamic_code_list, config.env.param.durations, config.model.feature,
|
||||
tick_buffer=config.model.tick_buffer, max_value=env.configs["total_containers"])
|
||||
gnn_state_shaper.compute_static_graph_structure(env)
|
||||
|
||||
action_io_np = action_io.structuralize()
|
||||
|
||||
action_shaper = DiscreteActionShaper(config.model.action_dim)
|
||||
exp_shaper = ExperienceShaper(
|
||||
static_code_list, dynamic_code_list, config.env.param.durations, gnn_state_shaper,
|
||||
scale_factor=config.env.return_scaler, time_slot=config.training.td_steps,
|
||||
discount_factor=config.training.gamma, idx=index, shared_storage=exp_output.structuralize(),
|
||||
exp_idx_mapping=exp_idx_mapping)
|
||||
|
||||
i = 0
|
||||
while pipe.recv() == "reset":
|
||||
r, decision_event, is_done = env.step(None)
|
||||
|
||||
j = 0
|
||||
logs = []
|
||||
while not is_done:
|
||||
model_input = gnn_state_shaper(decision_event, env.snapshot_list)
|
||||
action_io_np["v"][:, index] = model_input["v"]
|
||||
action_io_np["p"][:, index] = model_input["p"]
|
||||
action_io_np["vo"][index] = model_input["vo"]
|
||||
action_io_np["po"][index] = model_input["po"]
|
||||
action_io_np["vedge"][index] = model_input["vedge"]
|
||||
action_io_np["pedge"][index] = model_input["pedge"]
|
||||
action_io_np["ppedge"][index] = model_input["ppedge"]
|
||||
action_io_np["mask"][index] = model_input["mask"]
|
||||
action_io_np["pid"][index] = decision_event.port_idx
|
||||
action_io_np["vid"][index] = decision_event.vessel_idx
|
||||
pipe.send("features")
|
||||
model_action = pipe.recv()
|
||||
env_action = action_shaper(decision_event, model_action)
|
||||
exp_shaper.record(decision_event=decision_event, model_action=model_action, model_input=model_input)
|
||||
logs.append([
|
||||
index, decision_event.tick, decision_event.port_idx, decision_event.vessel_idx, model_action,
|
||||
env_action, decision_event.action_scope.load, decision_event.action_scope.discharge])
|
||||
action = Action(decision_event.vessel_idx, decision_event.port_idx, env_action)
|
||||
r, decision_event, is_done = env.step(action)
|
||||
j += 1
|
||||
action_io_np["sh"][index] = compute_shortage(env.snapshot_list, config.env.param.durations, static_code_list)
|
||||
i += 1
|
||||
pipe.send("done")
|
||||
gnn_state_shaper.end_ep_callback(env.snapshot_list)
|
||||
# Organize and synchronize exp to shared memory.
|
||||
exp_shaper(env.snapshot_list)
|
||||
exp_shaper.reset()
|
||||
logs = np.array(logs, dtype=np.float)
|
||||
pipe.send(logs)
|
||||
env.reset()
|
||||
|
||||
|
||||
def compute_shortage(snapshot_list, max_tick, static_code_list):
|
||||
"""Helper function to compute the shortage after a episode end."""
|
||||
return np.sum(snapshot_list["ports"][max_tick - 1: static_code_list: "acc_shortage"])
|
||||
|
||||
|
||||
class ParallelActor(AbsActor):
|
||||
def __init__(self, config, demo_env, gnn_state_shaper, agent_manager, logger):
|
||||
"""A2C rollout class.
|
||||
|
||||
This implements a synchronized A2C structure. Multiple processes are created to run the simulation and collect
experience (CPU only); whenever an action is required, they notify the main process, which performs the batch
action inference on the GPU.
|
||||
|
||||
Args:
|
||||
config (dict): The configuration to run the simulation.
|
||||
demo_env (maro.simulator.Env): An example environment used to obtain configuration information such as the
number of vessels and ports as well as the topology of the environment.
|
||||
gnn_state_shaper (AbsShaper): The state shaper instance to extract graph information from the state of
|
||||
the environment.
|
||||
agent_manager (AbsAgentManager): The agent manager instance that performs batch action inference.
|
||||
logger: The logger instance to log information during the rollout.
|
||||
|
||||
"""
|
||||
super().__init__(demo_env, agent_manager)
|
||||
multiprocessing.set_start_method("spawn", True)
|
||||
self._logger = logger
|
||||
self.config = config
|
||||
|
||||
self._static_node_mapping = demo_env.summary["node_mapping"]["ports"]
|
||||
self._dynamic_node_mapping = demo_env.summary["node_mapping"]["vessels"]
|
||||
self._gnn_state_shaper = gnn_state_shaper
|
||||
self.device = torch.device(config.training.device)
|
||||
|
||||
self.parallel_cnt = config.training.parallel_cnt
|
||||
self.log_header = [f"sh_{i}" for i in range(self.parallel_cnt)]
|
||||
|
||||
tick_buffer = config.model.tick_buffer
|
||||
|
||||
v_dim, vedge_dim, v_cnt = self._gnn_state_shaper.get_input_dim("v"), \
|
||||
self._gnn_state_shaper.get_input_dim("vedge"), len(self._dynamic_node_mapping)
|
||||
p_dim, pedge_dim, p_cnt = self._gnn_state_shaper.get_input_dim("p"), \
|
||||
self._gnn_state_shaper.get_input_dim("pedge"), len(self._static_node_mapping)
|
||||
|
||||
self.pipes = [Pipe() for i in range(self.parallel_cnt)]
|
||||
|
||||
action_io_structure = {
|
||||
"p": ((tick_buffer, self.parallel_cnt, p_cnt, p_dim), ctypes.c_float),
|
||||
"v": ((tick_buffer, self.parallel_cnt, v_cnt, v_dim), ctypes.c_float),
|
||||
"po": ((self.parallel_cnt, p_cnt, v_cnt), ctypes.c_long),
|
||||
"vo": ((self.parallel_cnt, v_cnt, p_cnt), ctypes.c_long),
|
||||
"vedge": ((self.parallel_cnt, v_cnt, p_cnt, vedge_dim), ctypes.c_float),
|
||||
"pedge": ((self.parallel_cnt, p_cnt, v_cnt, vedge_dim), ctypes.c_float),
|
||||
"ppedge": ((self.parallel_cnt, p_cnt, p_cnt, pedge_dim), ctypes.c_float),
|
||||
"mask": ((self.parallel_cnt, tick_buffer), ctypes.c_bool),
|
||||
"sh": ((self.parallel_cnt, ), ctypes.c_long),
|
||||
"pid": ((self.parallel_cnt, ), ctypes.c_long),
|
||||
"vid": ((self.parallel_cnt, ), ctypes.c_long)
|
||||
}
|
||||
self.action_io = SharedStructure(action_io_structure)
|
||||
self.action_io_np = self.action_io.structuralize()
|
||||
|
||||
tot_exp_len = sum(config.env.exp_per_ep.values())
|
||||
|
||||
exp_output_structure = {
|
||||
"s": {
|
||||
"v": ((tick_buffer, tot_exp_len, self.parallel_cnt, v_cnt, v_dim), ctypes.c_float),
|
||||
"p": ((tick_buffer, tot_exp_len, self.parallel_cnt, p_cnt, p_dim), ctypes.c_float),
|
||||
"vo": ((tot_exp_len, self.parallel_cnt, v_cnt, p_cnt), ctypes.c_long),
|
||||
"po": ((tot_exp_len, self.parallel_cnt, p_cnt, v_cnt), ctypes.c_long),
|
||||
"vedge": ((tot_exp_len, self.parallel_cnt, v_cnt, p_cnt, vedge_dim), ctypes.c_float),
|
||||
"pedge": ((tot_exp_len, self.parallel_cnt, p_cnt, v_cnt, vedge_dim), ctypes.c_float),
|
||||
"ppedge": ((tot_exp_len, self.parallel_cnt, p_cnt, p_cnt, pedge_dim), ctypes.c_float),
|
||||
"mask": ((tot_exp_len, self.parallel_cnt, tick_buffer), ctypes.c_bool)
|
||||
},
|
||||
"s_": {
|
||||
"v": ((tick_buffer, tot_exp_len, self.parallel_cnt, v_cnt, v_dim), ctypes.c_float),
|
||||
"p": ((tick_buffer, tot_exp_len, self.parallel_cnt, p_cnt, p_dim), ctypes.c_float),
|
||||
"vo": ((tot_exp_len, self.parallel_cnt, v_cnt, p_cnt), ctypes.c_long),
|
||||
"po": ((tot_exp_len, self.parallel_cnt, p_cnt, v_cnt), ctypes.c_long),
|
||||
"vedge": ((tot_exp_len, self.parallel_cnt, v_cnt, p_cnt, vedge_dim), ctypes.c_float),
|
||||
"pedge": ((tot_exp_len, self.parallel_cnt, p_cnt, v_cnt, vedge_dim), ctypes.c_float),
|
||||
"ppedge": ((tot_exp_len, self.parallel_cnt, p_cnt, p_cnt, pedge_dim), ctypes.c_float),
|
||||
"mask": ((tot_exp_len, self.parallel_cnt, tick_buffer), ctypes.c_bool)
|
||||
},
|
||||
"a": ((tot_exp_len, self.parallel_cnt), ctypes.c_long),
|
||||
"len": ((self.parallel_cnt, len(config.env.exp_per_ep)), ctypes.c_long),
|
||||
"R": ((tot_exp_len, self.parallel_cnt, p_cnt), ctypes.c_float),
|
||||
}
|
||||
self.exp_output = SharedStructure(exp_output_structure)
|
||||
self.exp_output_np = self.exp_output.structuralize()
|
||||
|
||||
self._logger.info("allocate complete")
|
||||
|
||||
self.exp_idx_mapping = OrderedDict()
|
||||
acc_c = 0
|
||||
for key, c in config.env.exp_per_ep.items():
|
||||
self.exp_idx_mapping[key] = acc_c
|
||||
acc_c += c
|
||||
|
||||
self.workers = [
|
||||
Process(
|
||||
target=single_player_worker,
|
||||
args=(i, config, self.exp_idx_mapping, self.pipes[i][1], self.action_io, self.exp_output)
|
||||
) for i in range(self.parallel_cnt)
|
||||
]
|
||||
for w in self.workers:
|
||||
w.start()
|
||||
|
||||
self._logger.info("all thread started")
|
||||
|
||||
self._roll_out_time = 0
|
||||
self._transfer_time = 0
|
||||
self._roll_out_cnt = 0
|
||||
|
||||
def roll_out(self):
|
||||
"""Rollout using current policy in the AgentManager.
|
||||
|
||||
Returns:
|
||||
result (dict): The key is the agent code, the value is the experience list stored in numpy.array.
|
||||
"""
|
||||
# Compute the time used for state preparation in the child process.
|
||||
t_state = 0
|
||||
# Compute the time used for action inference.
|
||||
t_action = 0
|
||||
|
||||
for p in self.pipes:
|
||||
p[0].send("reset")
|
||||
self._roll_out_cnt += 1
|
||||
|
||||
step_i = 0
|
||||
tick = time.time()
|
||||
while True:
|
||||
signals = [p[0].recv() for p in self.pipes]
|
||||
if signals[0] == "done":
|
||||
break
|
||||
|
||||
step_i += 1
|
||||
|
||||
t = time.time()
|
||||
graph = gnn_union(
|
||||
self.action_io_np["p"], self.action_io_np["po"], self.action_io_np["pedge"],
|
||||
self.action_io_np["v"], self.action_io_np["vo"], self.action_io_np["vedge"],
|
||||
self._gnn_state_shaper.p2p_static_graph, self.action_io_np["ppedge"],
|
||||
self.action_io_np["mask"], self.device
|
||||
)
|
||||
t_state += time.time() - t
|
||||
|
||||
assert(np.min(self.action_io_np["pid"]) == np.max(self.action_io_np["pid"]))
|
||||
assert(np.min(self.action_io_np["vid"]) == np.max(self.action_io_np["vid"]))
|
||||
|
||||
t = time.time()
|
||||
actions = self._inference_agents.choose_action(
|
||||
agent_id=(self.action_io_np["pid"][0], self.action_io_np["vid"][0]), state=graph
|
||||
)
|
||||
t_action += time.time() - t
|
||||
|
||||
for i, p in enumerate(self.pipes):
|
||||
p[0].send(actions[i])
|
||||
|
||||
self._roll_out_time += time.time() - tick
|
||||
tick = time.time()
|
||||
self._logger.info("receiving exp")
|
||||
logs = [p[0].recv() for p in self.pipes]
|
||||
|
||||
self._logger.info(f"Mean of shortage: {np.mean(self.action_io_np['sh'])}")
|
||||
self._transfer_time += time.time() - tick
|
||||
|
||||
self._logger.debug(dict(zip(self.log_header, self.action_io_np["sh"])))
|
||||
|
||||
with open(os.path.join(self.config.log.path, f"logs_{self._roll_out_cnt}"), "wb") as fp:
|
||||
pickle.dump(logs, fp)
|
||||
|
||||
self._logger.info("organize exp_dict")
|
||||
result = organize_exp_list(self.exp_output_np, self.exp_idx_mapping)
|
||||
|
||||
if self.config.log.exp.enable and self._roll_out_cnt % self.config.log.exp.freq == 0:
|
||||
with open(os.path.join(self.config.log.path, f"exp_{self._roll_out_cnt}"), "wb") as fp:
|
||||
pickle.dump(result, fp)
|
||||
|
||||
self._logger.debug(f"play time: {int(self._roll_out_time)}")
|
||||
self._logger.debug(f"transfer time: {int(self._trainsfer_time)}")
|
||||
return result
|
||||
|
||||
def exit(self):
|
||||
"""Terminate the child processes."""
|
||||
for p in self.pipes:
|
||||
p[0].send("close")
@@ -1,180 +0,0 @@
import os
|
||||
|
||||
import torch
|
||||
from torch import nn
|
||||
from torch.distributions import Categorical
|
||||
from torch.nn.utils import clip_grad
|
||||
|
||||
from maro.rl import AbsAlgorithm
|
||||
|
||||
from .utils import gnn_union
|
||||
|
||||
|
||||
class ActorCritic(AbsAlgorithm):
|
||||
"""Actor-Critic algorithm in CIM problem.
|
||||
|
||||
The vanilla ac algorithm.
|
||||
|
||||
Args:
|
||||
model (nn.Module): An actor-critic module that outputs both the policy and the value estimates.
device (torch.device): A PyTorch device instance on which the module is computed.
p2p_adj (numpy.array): The static port-to-port adjacency matrix.
td_steps (int): The value "n" in the n-step TD algorithm.
gamma (float): The reward discount factor.
learning_rate (float): The learning rate for the module.
entropy_factor (float): The weight of the policy's entropy used to boost exploration.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self, model: nn.Module, device: torch.device, p2p_adj=None, td_steps=100, gamma=0.97, learning_rate=0.0003,
|
||||
entropy_factor=0.1):
|
||||
self._gamma = gamma
|
||||
self._td_steps = td_steps
|
||||
self._value_discount = gamma ** td_steps  # discount applied to the bootstrapped value after n = td_steps steps
|
||||
self._entropy_factor = entropy_factor
|
||||
self._device = device
|
||||
self._tot_batchs = 0
|
||||
self._p2p_adj = p2p_adj
|
||||
super().__init__(
|
||||
model_dict={"a&c": model}, optimizer_opt={"a&c": (torch.optim.Adam, {"lr": learning_rate})},
|
||||
loss_func_dict={}, hyper_params=None)
|
||||
|
||||
def choose_action(self, state: dict, p_idx: int, v_idx: int):
|
||||
"""Get action from the AC model.
|
||||
|
||||
Args:
|
||||
state (dict): A dictionary containing the input to the module. For example:
|
||||
{
|
||||
"v": v,
|
||||
"p": p,
|
||||
"pe": {
|
||||
"edge": pedge,
|
||||
"adj": padj,
|
||||
"mask": pmask,
|
||||
},
|
||||
"ve": {
|
||||
"edge": vedge,
|
||||
"adj": vadj,
|
||||
"mask": vmask,
|
||||
},
|
||||
"ppe": {
|
||||
"edge": ppedge,
|
||||
"adj": p2p_adj,
|
||||
"mask": p2p_mask,
|
||||
},
|
||||
"mask": seq_mask,
|
||||
}
|
||||
p_idx (int): The identity of the port doing the action.
|
||||
v_idx (int): The identity of the vessel doing the action.
|
||||
|
||||
Returns:
|
||||
model_action (numpy.int64): The action returned from the module.
|
||||
"""
|
||||
with torch.no_grad():
|
||||
prob, _ = self._model_dict["a&c"](state, a=True, p_idx=p_idx, v_idx=v_idx)
|
||||
distribution = Categorical(prob)
|
||||
model_action = distribution.sample().cpu().numpy()
|
||||
return model_action
|
||||
|
||||
def train(self, batch, p_idx, v_idx):
|
||||
"""Model training.
|
||||
|
||||
Args:
|
||||
batch (dict): The dictionary of a batch of experience. For example:
|
||||
{
|
||||
"s": the dictionary of state,
|
||||
"a": model actions in numpy array,
|
||||
"R": the n-step accumulated reward,
|
||||
"s"": the dictionary of the next state,
|
||||
}
|
||||
p_idx (int): The identity of the port doing the action.
|
||||
v_idx (int): The identity of the vessel doing the action.
|
||||
|
||||
Returns:
|
||||
a_loss (float): action loss.
|
||||
c_loss (float): critic loss.
|
||||
e_loss (float): entropy loss.
|
||||
tot_norm (float): the L2 norm of the gradient.
|
||||
|
||||
"""
|
||||
self._tot_batchs += 1
|
||||
item_a_loss, item_c_loss, item_e_loss = 0, 0, 0
|
||||
obs_batch = batch["s"]
|
||||
action_batch = batch["a"]
|
||||
return_batch = batch["R"]
|
||||
next_obs_batch = batch["s_"]
|
||||
|
||||
obs_batch = gnn_union(
|
||||
obs_batch["p"], obs_batch["po"], obs_batch["pedge"], obs_batch["v"], obs_batch["vo"], obs_batch["vedge"],
|
||||
self._p2p_adj, obs_batch["ppedge"], obs_batch["mask"], self._device)
|
||||
action_batch = torch.from_numpy(action_batch).long().to(self._device)
|
||||
return_batch = torch.from_numpy(return_batch).float().to(self._device)
|
||||
next_obs_batch = gnn_union(
|
||||
next_obs_batch["p"], next_obs_batch["po"], next_obs_batch["pedge"], next_obs_batch["v"],
|
||||
next_obs_batch["vo"], next_obs_batch["vedge"], self._p2p_adj, next_obs_batch["ppedge"],
|
||||
next_obs_batch["mask"], self._device)
|
||||
|
||||
# Train actor network.
|
||||
self._optimizer["a&c"].zero_grad()
|
||||
|
||||
# Every port has a value.
|
||||
# values.shape: (batch, p_cnt)
|
||||
probs, values = self._model_dict["a&c"](obs_batch, a=True, p_idx=p_idx, v_idx=v_idx, c=True)
|
||||
distribution = Categorical(probs)
|
||||
log_prob = distribution.log_prob(action_batch)
|
||||
entropy_loss = distribution.entropy()
|
||||
|
||||
_, values_ = self._model_dict["a&c"](next_obs_batch, c=True)
|
||||
advantage = return_batch + self._value_discount * values_.detach() - values
|
||||
|
||||
if self._entropy_factor != 0:
|
||||
# actor_loss = actor_loss* torch.log(entropy_loss + np.e)
|
||||
advantage[:, p_idx] += self._entropy_factor * entropy_loss.detach()
|
||||
|
||||
actor_loss = - (log_prob * torch.sum(advantage, axis=-1).detach()).mean()
|
||||
|
||||
item_a_loss = actor_loss.item()
|
||||
item_e_loss = entropy_loss.mean().item()
|
||||
|
||||
# Train critic network.
|
||||
critic_loss = torch.sum(advantage.pow(2), axis=1).mean()
|
||||
item_c_loss = critic_loss.item()
|
||||
# torch.nn.utils.clip_grad_norm_(self._critic_model.parameters(),0.5)
|
||||
tot_loss = 0.1 * actor_loss + critic_loss
|
||||
tot_loss.backward()
|
||||
tot_norm = clip_grad.clip_grad_norm_(self._model_dict["a&c"].parameters(), 1)
|
||||
self._optimizer["a&c"].step()
|
||||
return item_a_loss, item_c_loss, item_e_loss, float(tot_norm)
|
||||
|
||||
def set_weights(self, weights):
|
||||
self._model_dict["a&c"].load_state_dict(weights)
|
||||
|
||||
def get_weights(self):
|
||||
return self._model_dict["a&c"].state_dict()
|
||||
|
||||
def _get_save_idx(self, fp_str):
|
||||
return int(fp_str.split(".")[0].split("_")[0])
|
||||
|
||||
def save_model(self, pth, id):
|
||||
if not os.path.exists(pth):
|
||||
os.makedirs(pth)
|
||||
pth = os.path.join(pth, f"{id}_ac.pkl")
|
||||
torch.save(self._model_dict["a&c"].state_dict(), pth)
|
||||
|
||||
def _set_gnn_weights(self, weights):
|
||||
for key in weights:
|
||||
if key in self._model_dict["a&c"].state_dict().keys():
|
||||
self._model_dict["a&c"].state_dict()[key].copy_(weights[key])
|
||||
|
||||
def load_model(self, folder_pth, idx=-1):
|
||||
if idx == -1:
|
||||
fps = os.listdir(folder_pth)
|
||||
fps = [f for f in fps if "ac" in f]
|
||||
fps.sort(key=self._get_save_idx)
|
||||
ac_pth = fps[-1]
|
||||
else:
|
||||
ac_pth = f"{idx}_ac.pkl"
|
||||
pth = os.path.join(folder_pth, ac_pth)
|
||||
with open(pth, "rb") as fp:
|
||||
weights = torch.load(fp, map_location=self._device)
|
||||
self._set_gnn_weights(weights)
@@ -1,41 +0,0 @@
from collections import defaultdict
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import AbsAgent
|
||||
from maro.utils import DummyLogger
|
||||
|
||||
from .numpy_store import Shuffler
|
||||
|
||||
|
||||
class TrainableAgent(AbsAgent):
|
||||
def __init__(self, name, algorithm, experience_pool, logger=DummyLogger()):
|
||||
self._logger = logger
|
||||
super().__init__(name, algorithm, experience_pool)
|
||||
|
||||
def train(self, training_config):
|
||||
loss_dict = defaultdict(list)
|
||||
for j in range(training_config.shuffle_time):
|
||||
shuffler = Shuffler(self._experience_pool, batch_size=training_config.batch_size)
|
||||
while shuffler.has_next():
|
||||
batch = shuffler.next()
|
||||
actor_loss, critic_loss, entropy_loss, tot_loss = self._algorithm.train(
|
||||
batch, self._name[0], self._name[1])
|
||||
loss_dict["actor"].append(actor_loss)
|
||||
loss_dict["critic"].append(critic_loss)
|
||||
loss_dict["entropy"].append(entropy_loss)
|
||||
loss_dict["tot"].append(tot_loss)
|
||||
|
||||
a_loss = np.mean(loss_dict["actor"])
|
||||
c_loss = np.mean(loss_dict["critic"])
|
||||
e_loss = np.mean(loss_dict["entropy"])
|
||||
tot_loss = np.mean(loss_dict["tot"])
|
||||
self._logger.debug(
|
||||
f"code: {str(self._name)} \t actor: {float(a_loss)} \t critic: {float(c_loss)} \t entropy: {float(e_loss)} \
|
||||
\t tot: {float(tot_loss)}")
|
||||
|
||||
self._experience_pool.clear()
|
||||
return loss_dict
|
||||
|
||||
def choose_action(self, model_state):
|
||||
return self._algorithm.choose_action(model_state, self._name[0], self._name[1])
@@ -1,119 +0,0 @@
from copy import copy
|
||||
|
||||
import numpy as np
|
||||
import torch
|
||||
|
||||
from maro.rl import AbsAgentManager, AgentMode
|
||||
from maro.utils import DummyLogger
|
||||
|
||||
from .actor_critic import ActorCritic
|
||||
from .agent import TrainableAgent
|
||||
from .numpy_store import NumpyStore
|
||||
from .simple_gnn import SharedAC
|
||||
from .state_shaper import GNNStateShaper
|
||||
|
||||
|
||||
class SimpleAgentManger(AbsAgentManager):
|
||||
def __init__(
|
||||
self, name, agent_id_list, port_code_list, vessel_code_list, demo_env, state_shaper: GNNStateShaper,
|
||||
logger=DummyLogger()):
|
||||
super().__init__(
|
||||
name, AgentMode.TRAIN, agent_id_list, state_shaper=state_shaper, action_shaper=None,
|
||||
experience_shaper=None, explorer=None)
|
||||
self.port_code_list = copy(port_code_list)
|
||||
self.vessel_code_list = copy(vessel_code_list)
|
||||
self.demo_env = demo_env
|
||||
self._logger = logger
|
||||
|
||||
def assemble(self, config):
|
||||
v_dim, vedge_dim = self._state_shaper.get_input_dim("v"), self._state_shaper.get_input_dim("vedge")
|
||||
p_dim, pedge_dim = self._state_shaper.get_input_dim("p"), self._state_shaper.get_input_dim("pedge")
|
||||
|
||||
self.device = torch.device(config.training.device)
|
||||
self._logger.info(config.training.device)
|
||||
ac_model = SharedAC(
|
||||
p_dim, pedge_dim, v_dim, vedge_dim, config.model.tick_buffer, config.model.action_dim).to(self.device)
|
||||
|
||||
value_dict = {
|
||||
("s", "v"):
|
||||
(
|
||||
(config.model.tick_buffer, len(self.vessel_code_list), self._state_shaper.get_input_dim("v")),
|
||||
np.float32, False),
|
||||
("s", "p"):
|
||||
(
|
||||
(config.model.tick_buffer, len(self.port_code_list), self._state_shaper.get_input_dim("p")),
|
||||
np.float32, False),
|
||||
("s", "vo"): ((len(self.vessel_code_list), len(self.port_code_list)), np.int64, True),
|
||||
("s", "po"): ((len(self.port_code_list), len(self.vessel_code_list)), np.int64, True),
|
||||
("s", "vedge"):
|
||||
(
|
||||
(len(self.vessel_code_list), len(self.port_code_list), self._state_shaper.get_input_dim("vedge")),
|
||||
np.float32, True),
|
||||
("s", "pedge"):
|
||||
(
|
||||
(len(self.port_code_list), len(self.vessel_code_list), self._state_shaper.get_input_dim("vedge")),
|
||||
np.float32, True),
|
||||
("s", "ppedge"):
|
||||
(
|
||||
(len(self.port_code_list), len(self.port_code_list), self._state_shaper.get_input_dim("pedge")),
|
||||
np.float32, True),
|
||||
("s", "mask"): ((config.model.tick_buffer, ), np.bool, True),
|
||||
|
||||
("s_", "v"):
|
||||
(
|
||||
(config.model.tick_buffer, len(self.vessel_code_list), self._state_shaper.get_input_dim("v")),
|
||||
np.float32, False),
|
||||
("s_", "p"):
|
||||
(
|
||||
(config.model.tick_buffer, len(self.port_code_list), self._state_shaper.get_input_dim("p")),
|
||||
np.float32, False),
|
||||
("s_", "vo"): ((len(self.vessel_code_list), len(self.port_code_list)), np.int64, True),
|
||||
("s_", "po"):
|
||||
(
|
||||
(len(self.port_code_list), len(self.vessel_code_list)), np.int64, True),
|
||||
("s_", "vedge"):
|
||||
(
|
||||
(len(self.vessel_code_list), len(self.port_code_list), self._state_shaper.get_input_dim("vedge")),
|
||||
np.float32, True),
|
||||
("s_", "pedge"):
|
||||
(
|
||||
(len(self.port_code_list), len(self.vessel_code_list), self._state_shaper.get_input_dim("vedge")),
|
||||
np.float32, True),
|
||||
("s_", "ppedge"):
|
||||
(
|
||||
(len(self.port_code_list), len(self.port_code_list), self._state_shaper.get_input_dim("pedge")),
|
||||
np.float32, True),
|
||||
("s_", "mask"): ((config.model.tick_buffer, ), np.bool, True),
|
||||
|
||||
# To identify one dimension variable.
|
||||
("R",): ((len(self.port_code_list), ), np.float32, True),
|
||||
("a",): (tuple(), np.int64, True),
|
||||
}
|
||||
|
||||
self._algorithm = ActorCritic(
|
||||
ac_model, self.device, td_steps=config.training.td_steps, p2p_adj=self._state_shaper.p2p_static_graph,
|
||||
gamma=config.training.gamma, learning_rate=config.training.learning_rate)
|
||||
|
||||
for agent_id, cnt in config.env.exp_per_ep.items():
|
||||
experience_pool = NumpyStore(value_dict, config.training.parallel_cnt * config.training.train_freq * cnt)
|
||||
self._agent_dict[agent_id] = TrainableAgent(agent_id, self._algorithm, experience_pool, self._logger)
|
||||
|
||||
def choose_action(self, agent_id, state):
|
||||
return self._agent_dict[agent_id].choose_action(state)
|
||||
|
||||
def load_models_from_files(self, model_pth):
|
||||
self._algorithm.load_model(model_pth)
|
||||
|
||||
def train(self, training_config):
|
||||
for agent in self._agent_dict.values():
|
||||
agent.train(training_config)
|
||||
|
||||
def store_experiences(self, experiences):
|
||||
for code, exp_list in experiences.items():
|
||||
self._agent_dict[code].store_experiences(exp_list)
|
||||
|
||||
def save_model(self, pth, id):
|
||||
self._algorithm.save_model(pth, id)
|
||||
|
||||
def load_model(self, pth):
|
||||
self._algorithm.load_model(pth)
@@ -1,111 +0,0 @@
from collections import defaultdict
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
class ExperienceShaper:
|
||||
def __init__(
|
||||
self, static_list, dynamic_list, max_tick, gnn_state_shaper, scale_factor=0.0001, time_slot=100,
|
||||
discount_factor=0.97, idx=-1, shared_storage=None, exp_idx_mapping=None):
|
||||
self._static_list = list(static_list)
|
||||
self._dynamic_list = list(dynamic_list)
|
||||
self._time_slot = time_slot
|
||||
self._discount_factor = discount_factor
|
||||
self._discount_vector = np.logspace(1, self._time_slot, self._time_slot, base=discount_factor)
|
||||
self._max_tick = max_tick
|
||||
self._tick_range = list(range(self._max_tick))
|
||||
self._len_return = self._max_tick - self._time_slot
|
||||
self._gnn_state_shaper = gnn_state_shaper
|
||||
self._fulfillment_list, self._shortage_list, self._experience_dict = None, None, None
|
||||
self._experience_dict = defaultdict(list)
|
||||
self._init_state()
|
||||
self._idx = idx
|
||||
self._exp_idx_mapping = exp_idx_mapping
|
||||
self._shared_storage = shared_storage
|
||||
self._scale_factor = scale_factor
|
||||
|
||||
def _init_state(self):
|
||||
self._fulfillment_list, self._shortage_list = np.zeros(self._max_tick + 1), np.zeros(self._max_tick + 1)
|
||||
self._experience_dict = defaultdict(list)
|
||||
self._last_tick = 0
|
||||
|
||||
def record(self, decision_event, model_action, model_input):
|
||||
# Only experiences that have a next state within the given time slot are valuable.
|
||||
if decision_event.tick + self._time_slot < self._max_tick:
|
||||
self._experience_dict[decision_event.port_idx, decision_event.vessel_idx].append({
|
||||
"tick": decision_event.tick,
|
||||
"s": model_input,
|
||||
"a": model_action,
|
||||
})
|
||||
|
||||
def _compute_delta(self, arr):
|
||||
delta = np.array(arr)
|
||||
delta[1:] -= arr[:-1]
|
||||
return delta
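For instance (illustrative), this helper turns a running total into per-tick increments:
# _compute_delta(np.array([1, 3, 6])) -> array([1, 2, 3])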
|
||||
|
||||
def _batch_obs_to_numpy(self, obs):
|
||||
v = np.stack([o["v"] for o in obs], axis=0)
|
||||
p = np.stack([o["p"] for o in obs], axis=0)
|
||||
vo = np.stack([o["vo"] for o in obs], axis=0)
|
||||
po = np.stack([o["po"] for o in obs], axis=0)
|
||||
return {"p": p, "v": v, "vo": vo, "po": po}
|
||||
|
||||
def __call__(self, snapshot_list):
|
||||
if self._shared_storage is None:
|
||||
return
|
||||
|
||||
shortage = snapshot_list["ports"][self._tick_range:self._static_list:"shortage"].reshape(self._max_tick, -1)
|
||||
fulfillment = snapshot_list["ports"][self._tick_range:self._static_list:"fulfillment"] \
|
||||
.reshape(self._max_tick, -1)
|
||||
delta = fulfillment - shortage
|
||||
R = np.empty((self._len_return, len(self._static_list)), dtype=np.float)
|
||||
for i in range(0, self._len_return, 1):
|
||||
R[i] = np.dot(self._discount_vector, delta[i + 1: i + self._time_slot + 1])
|
||||
|
||||
for (agent_idx, vessel_idx), exp_list in self._experience_dict.items():
|
||||
for exp in exp_list:
|
||||
tick = exp["tick"]
|
||||
exp["s_"] = self._gnn_state_shaper(tick=tick + self._time_slot)
|
||||
exp["R"] = self._scale_factor * R[tick]
|
||||
|
||||
tmpi = 0
|
||||
for (agent_idx, vessel_idx), idx_base in self._exp_idx_mapping.items():
|
||||
exp_list = self._experience_dict[(agent_idx, vessel_idx)]
|
||||
exp_len = len(exp_list)
|
||||
# Here, we assume that exp_idx_mapping order is not changed.
|
||||
self._shared_storage["len"][self._idx, tmpi] = exp_len
|
||||
self._shared_storage["s"]["v"][:, idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s"]["v"] for e in exp_list], axis=1)
|
||||
self._shared_storage["s"]["p"][:, idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s"]["p"] for e in exp_list], axis=1)
|
||||
self._shared_storage["s"]["vo"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s"]["vo"] for e in exp_list], axis=0)
|
||||
self._shared_storage["s"]["po"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s"]["po"] for e in exp_list], axis=0)
|
||||
self._shared_storage["s"]["vedge"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s"]["vedge"] for e in exp_list], axis=0)
|
||||
self._shared_storage["s"]["pedge"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s"]["pedge"] for e in exp_list], axis=0)
|
||||
|
||||
self._shared_storage["s_"]["v"][:, idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s_"]["v"] for e in exp_list], axis=1)
|
||||
self._shared_storage["s_"]["p"][:, idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s_"]["p"] for e in exp_list], axis=1)
|
||||
self._shared_storage["s_"]["vo"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s_"]["vo"] for e in exp_list], axis=0)
|
||||
self._shared_storage["s_"]["po"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s_"]["po"] for e in exp_list], axis=0)
|
||||
self._shared_storage["s_"]["vedge"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s_"]["vedge"] for e in exp_list], axis=0)
|
||||
self._shared_storage["s_"]["pedge"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s_"]["pedge"] for e in exp_list], axis=0)
|
||||
|
||||
self._shared_storage["a"][idx_base: idx_base + exp_len, self._idx] = \
|
||||
np.array([exp["a"] for exp in exp_list], dtype=np.int64)
|
||||
self._shared_storage["R"][idx_base: idx_base + exp_len, self._idx] = \
|
||||
np.vstack([exp["R"] for exp in exp_list])
|
||||
tmpi += 1
|
||||
|
||||
def reset(self):
|
||||
del self._experience_dict
|
||||
self._init_state()
@@ -1,52 +0,0 @@
import os
|
||||
import time
|
||||
|
||||
from maro.rl import AbsLearner
|
||||
from maro.utils import DummyLogger
|
||||
|
||||
from .actor import ParallelActor
|
||||
from .agent_manager import SimpleAgentManger
|
||||
|
||||
|
||||
class GNNLearner(AbsLearner):
|
||||
"""Learner class for the training pipeline and the specialized logging in GNN solution for CIM problem.
|
||||
|
||||
Args:
|
||||
actor (AbsActor): The actor instance to collect experience.
|
||||
trainable_agents (AbsAgentManager): The agent manager for training RL models.
|
||||
logger (Logger): The logger to save/print the message.
|
||||
"""
|
||||
|
||||
def __init__(self, actor: ParallelActor, trainable_agents: SimpleAgentManger, logger=DummyLogger()):
|
||||
super().__init__()
|
||||
self._actor = actor
|
||||
self._trainable_agents = trainable_agents
|
||||
self._logger = logger
|
||||
|
||||
def train(self, training_config, log_pth=None):
|
||||
rollout_time = 0
|
||||
training_time = 0
|
||||
for i in range(training_config.rollout_cnt):
|
||||
self._logger.info(f"rollout {i + 1}")
|
||||
tick = time.time()
|
||||
exp_dict = self._actor.roll_out()
|
||||
|
||||
rollout_time += time.time() - tick
|
||||
|
||||
self._logger.info("start putting exps")
|
||||
self._trainable_agents.store_experiences(exp_dict)
|
||||
|
||||
if training_config.enable and i % training_config.train_freq == training_config.train_freq - 1:
|
||||
self._logger.info("training start")
|
||||
tick = time.time()
|
||||
self._trainable_agents.train(training_config)
|
||||
training_time += time.time() - tick
|
||||
|
||||
if log_pth is not None and (i + 1) % training_config.model_save_freq == 0:
|
||||
self._trainable_agents.save_model(os.path.join(log_pth, "models"), i + 1)
|
||||
|
||||
self._logger.debug(f"total rollout_time: {int(rollout_time)}")
|
||||
self._logger.debug(f"train_time: {int(training_time)}")
|
||||
|
||||
def test(self):
|
||||
pass
@@ -1,186 +0,0 @@
from typing import Sequence
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import AbsStore
|
||||
|
||||
|
||||
def get_item(data_dict, key_tuple):
|
||||
"""Helper function to get the value in a hierarchical dictionary given the key path.
|
||||
|
||||
Args:
|
||||
data_dict (dict): The data structure. For example:
|
||||
{
|
||||
"a": {
|
||||
"b": 1,
|
||||
"c": {
|
||||
"d": 2,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
key_tuple (tuple): The key path to the target field. For example, given the data_dict above, the key_tuple
|
||||
("a", "c", "d") should return 2.
|
||||
"""
|
||||
for key in key_tuple:
|
||||
data_dict = data_dict[key]
|
||||
return data_dict
|
||||
|
||||
|
||||
def set_item(data_dict, key_tuple, data):
|
||||
"""The setter function corresponding to the get_item function."""
|
||||
for i, key in enumerate(key_tuple):
|
||||
if key not in data_dict:
|
||||
data_dict[key] = {}
|
||||
if i == len(key_tuple) - 1:
|
||||
data_dict[key] = data
|
||||
else:
|
||||
data_dict = data_dict[key]
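A tiny illustration of the two helpers on a hypothetical nested dictionary:
# Illustrative usage of get_item / set_item.
d = {}
set_item(d, ("s", "p"), 1)
set_item(d, ("s", "v"), 2)
assert d == {"s": {"p": 1, "v": 2}}
assert get_item(d, ("s", "v")) == 2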
|
||||
|
||||
|
||||
class NumpyStore(AbsStore):
|
||||
def __init__(self, domain_type_dict, capacity):
|
||||
"""
|
||||
Args:
|
||||
domain_type_dict (dict): The dictionary describing the name, structure and type of each field in the
experience. Each field in the experience is a key-value pair with the following structure:
(field_name): (size_of_an_instance, data_type, batch_first)
|
||||
|
||||
For example:
|
||||
("s"): ((32, 64), np.float32, True)
|
||||
|
||||
A field can be nested in a hierarchical dictionary by specifying the full key path from the root.
|
||||
|
||||
For example:
|
||||
{
|
||||
("s", "p"): ((32, 64), np.float32, True)
|
||||
("s", "v"): ((48, ), np.float32, False),
|
||||
}
|
||||
Then the batch of experience returned by self.get(indexes) is:
|
||||
{
|
||||
"s":
|
||||
{
|
||||
"p": numpy.array with size (batch, 32, 64),
|
||||
"v": numpy.array with size (32, batch, 48),
|
||||
}
|
||||
}
|
||||
Note that for the field ("s", "v"), the batch is in the 2nd dimension because the batch_first attribute
|
||||
is False.
|
||||
|
||||
capacity (int): The maximum number of experiences that the store can hold.
|
||||
"""
|
||||
super().__init__()
|
||||
self.domain_type_dict = dict(domain_type_dict)
|
||||
self.store = {
|
||||
key: np.zeros(
|
||||
shape=(capacity, *shape) if batch_first else (shape[0], capacity, *shape[1:]), dtype=data_type)
|
||||
for key, (shape, data_type, batch_first) in domain_type_dict.items()}
|
||||
self.batch_first_store = {key: batch_first for key, (_, _, batch_first) in domain_type_dict.items()}
|
||||
|
||||
self.cnt = 0
|
||||
self.capacity = capacity
|
||||
|
||||
def put(self, exp_dict: dict):
|
||||
"""Insert a batch of experience into the store.
|
||||
|
||||
If the store reaches the maximum capacity, this function will replace the experience in the store randomly.
|
||||
|
||||
Args:
|
||||
exp_dict (dict): The dictionary of a batch of experience. For example:
|
||||
|
||||
{
|
||||
"s":
|
||||
{
|
||||
"p": numpy.array with size (batch, 32, 64),
|
||||
"v": numpy.array with size (32, batch, 48),
|
||||
}
|
||||
}
|
||||
|
||||
The structure should be consistent with the structure defined in the __init__ function.
|
||||
|
||||
Returns:
|
||||
indexes (numpy.array): The indexes at which each experience in the batch has been stored.
|
||||
"""
|
||||
dlen = exp_dict["len"]
|
||||
append_end = min(max(self.capacity - self.cnt, 0), dlen)
|
||||
idxs = np.zeros(dlen, dtype=int)
|
||||
if append_end != 0:
|
||||
for key in self.domain_type_dict.keys():
|
||||
data = get_item(exp_dict, key)
|
||||
if self.batch_first_store[key]:
|
||||
self.store[key][self.cnt: self.cnt + append_end] = data[0: append_end]
|
||||
else:
|
||||
self.store[key][:, self.cnt: self.cnt + append_end] = data[:, 0: append_end]
|
||||
idxs[: append_end] = np.arange(self.cnt, self.cnt + append_end)
|
||||
if append_end < dlen:
|
||||
replace_idx = self._get_replace_idx(dlen - append_end)
|
||||
for key in self.domain_type_dict.keys():
|
||||
data = get_item(exp_dict, key)
|
||||
if self.batch_first_store[key]:
|
||||
self.store[key][replace_idx] = data[append_end: dlen]
|
||||
else:
|
||||
self.store[key][:, replace_idx] = data[:, append_end: dlen]
|
||||
idxs[append_end: dlen] = replace_idx
|
||||
self.cnt += dlen
|
||||
return idxs
|
||||
|
||||
def _get_replace_idx(self, cnt):
|
||||
return np.random.randint(low=0, high=self.capacity, size=cnt)
|
||||
|
||||
def get(self, indexes: np.array):
|
||||
"""Get the experience indexed in the indexes list from the store.
|
||||
|
||||
Args:
|
||||
indexes (np.array): A numpy array containing the indexes of a batch experience.
|
||||
|
||||
Returns:
|
||||
data_dict (dict): A dictionary with the same structure as that defined in the __init__ function.
|
||||
"""
|
||||
data_dict = {}
|
||||
for key in self.domain_type_dict.keys():
|
||||
if self.batch_first_store[key]:
|
||||
set_item(data_dict, key, self.store[key][indexes])
|
||||
else:
|
||||
set_item(data_dict, key, self.store[key][:, indexes])
|
||||
return data_dict
|
||||
|
||||
def __len__(self):
|
||||
return min(self.capacity, self.cnt)
|
||||
|
||||
def update(self, indexes: Sequence, contents: Sequence):
|
||||
raise NotImplementedError("NumpyStore does not support modifying the experience!")
|
||||
|
||||
def sample(self, size, weights: Sequence, replace: bool = True):
|
||||
raise NotImplementedError("NumpyStore does not support sampling. Please use outer sampler to fetch samples!")
|
||||
|
||||
def clear(self):
|
||||
"""Remove all the experience in the store."""
|
||||
self.cnt = 0
|
||||
|
||||
|
||||
class Shuffler:
|
||||
def __init__(self, store: NumpyStore, batch_size: int):
|
||||
"""The helper class for fast batch sampling.
|
||||
|
||||
Args:
|
||||
store (NumpyStore): The data source for sampling.
|
||||
batch_size (int): The size of a batch.
|
||||
"""
|
||||
self._store = store
|
||||
self._shuffled_seq = np.arange(0, len(store))
|
||||
np.random.shuffle(self._shuffled_seq)
|
||||
self._start = 0
|
||||
self._batch_size = batch_size
|
||||
|
||||
def next(self):
|
||||
"""Uniformly sampling out a batch in the store."""
|
||||
if self._start >= len(self._store):
|
||||
return None
|
||||
end = min(self._start + self._batch_size, len(self._store))
|
||||
rst = self._store.get(self._shuffled_seq[self._start: end])
|
||||
self._start += self._batch_size
|
||||
return rst
|
||||
|
||||
def has_next(self):
|
||||
"""Check if any experience is not visited."""
|
||||
return self._start < len(self._store)
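A minimal end-to-end sketch of the store and shuffler with a made-up field spec (sizes and field names are arbitrary assumptions):
import numpy as np

# Two fields: a nested ("s", "p") block and a scalar action, both batch-first.
spec = {("s", "p"): ((2, 3), np.float32, True), ("a",): (tuple(), np.int64, True)}
store = NumpyStore(spec, capacity=8)
store.put({
    "len": 5,
    "s": {"p": np.ones((5, 2, 3), dtype=np.float32)},
    "a": np.arange(5),
})
shuffler = Shuffler(store, batch_size=2)
while shuffler.has_next():
    batch = shuffler.next()
    print(batch["a"].shape)   # (2,), (2,), then (1,)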
@@ -1,46 +0,0 @@
import multiprocessing
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
def init_shared_memory(data_structure):
|
||||
"""Initialize the data structure of the shared memory.
|
||||
|
||||
Args:
|
||||
data_structure: The dictionary that describes the data structure. For example,
|
||||
{
|
||||
"a": (shape, type),
|
||||
"b": {
|
||||
"b1": (shape, type),
|
||||
}
|
||||
}
|
||||
"""
|
||||
if isinstance(data_structure, tuple):
|
||||
mult = 1
|
||||
for i in data_structure[0]:
|
||||
mult *= i
|
||||
return multiprocessing.Array(data_structure[1], mult, lock=False)
|
||||
else:
|
||||
shared_data = {}
|
||||
for k, v in data_structure.items():
|
||||
shared_data[k] = init_shared_memory(v)
|
||||
return shared_data
|
||||
|
||||
|
||||
def shared_data2numpy(shared_data, structure_info):
|
||||
if not isinstance(shared_data, dict):
|
||||
return np.frombuffer(shared_data, dtype=structure_info[1]).reshape(structure_info[0])
|
||||
else:
|
||||
numpy_dict = {}
|
||||
for k, v in shared_data.items():
|
||||
numpy_dict[k] = shared_data2numpy(v, structure_info[k])
|
||||
return numpy_dict
|
||||
|
||||
|
||||
class SharedStructure:
|
||||
def __init__(self, data_structure):
|
||||
self.data_structure = data_structure
|
||||
self.shared = init_shared_memory(data_structure)
|
||||
|
||||
def structuralize(self):
|
||||
return shared_data2numpy(self.shared, self.data_structure)
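An illustrative use of the shared structure with a tiny nested spec mirroring the (shape, ctype) convention above:
import ctypes
import numpy as np

spec = {"a": ((2, 3), ctypes.c_float), "b": {"b1": ((4,), ctypes.c_long)}}
shared = SharedStructure(spec)
views = shared.structuralize()   # numpy views backed by the shared memory blocks
views["a"][:] = 1.0
views["b"]["b1"][0] = 7
print(views["a"].shape, views["b"]["b1"])   # (2, 3) [7 0 0 0]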
@@ -1,335 +0,0 @@
import math
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
from torch import Tensor
|
||||
from torch.nn import TransformerEncoder, TransformerEncoderLayer
|
||||
from torch.nn import functional as F
|
||||
from torch.nn.modules.activation import MultiheadAttention
|
||||
from torch.nn.modules.dropout import Dropout
|
||||
from torch.nn.modules.normalization import LayerNorm
|
||||
|
||||
|
||||
class PositionalEncoder(nn.Module):
|
||||
"""
|
||||
The positional encoding used in the Transformer to capture sequential information.

The code is based on the PyTorch tutorial:
https://pytorch.org/tutorials/beginner/transformer_tutorial.html?highlight=positionalencoding
|
||||
"""
|
||||
|
||||
def __init__(self, d_model, max_seq_len=80):
|
||||
super().__init__()
|
||||
self.d_model = d_model
|
||||
self.times = 4 * math.sqrt(self.d_model)
|
||||
|
||||
# Create constant "pe" matrix with values dependant on pos and i.
|
||||
self.pe = torch.zeros(max_seq_len, d_model)
|
||||
for pos in range(max_seq_len):
|
||||
for i in range(0, d_model, 2):
|
||||
self.pe[pos, i] = math.sin(pos / (10000 ** ((2 * i) / d_model)))
|
||||
self.pe[pos, i + 1] = math.cos(pos / (10000 ** ((2 * (i + 1)) / d_model)))
|
||||
|
||||
self.pe = self.pe.unsqueeze(1) / self.d_model
|
||||
|
||||
def forward(self, x):
|
||||
# Make embeddings relatively larger.
|
||||
addon = self.pe[: x.shape[0], :, : x.shape[2]].to(x.device)
|
||||
return x + addon
|
||||
|
||||
|
||||
class SimpleGATLayer(nn.Module):
|
||||
"""The enhanced graph attention layer for heterogenenous neighborhood.
|
||||
|
||||
It first utilizes pre-layers for both the source and destination node to map their features into the same hidden
|
||||
size. If the edge also has features, they are concatenated with those of the corresponding source node before being
|
||||
fed to the pre-layers. Then the graph attention(https://arxiv.org/abs/1710.10903) is done to aggregate information
|
||||
from the source nodes to the destination nodes. The residual connection and layer normalization are also used to
|
||||
enhance the performance, which is similar to the Transformer(https://arxiv.org/abs/1706.03762).
|
||||
|
||||
Args:
|
||||
src_dim (int): The feature dimension of the source nodes.
|
||||
dest_dim (int): The feature dimension of the destination nodes.
|
||||
edge_dim (int): The feature dimension of the edges. If the edges have no feature, it should be set 0.
|
||||
hidden_size (int): The hidden size both the destination and source is mapped into.
|
||||
nhead (int): The number of head in the multi-head attention.
|
||||
position_encoding (bool): the neighbor source nodes is aggregated in order(True) or orderless(False).
|
||||
"""
|
||||
|
||||
def __init__(self, src_dim, dest_dim, edge_dim, hidden_size, nhead=4, position_encoding=True):
|
||||
super().__init__()
|
||||
self.src_dim = src_dim
|
||||
self.dest_dim = dest_dim
|
||||
self.edge_dim = edge_dim
|
||||
self.hidden_size = hidden_size
|
||||
self.nhead = nhead
|
||||
src_layers = []
|
||||
src_layers.append(nn.Linear(src_dim + edge_dim, hidden_size))
|
||||
src_layers.append(GeLU())
|
||||
self.src_pre_layer = nn.Sequential(*src_layers)
|
||||
|
||||
dest_layers = []
|
||||
dest_layers.append(nn.Linear(dest_dim, hidden_size))
|
||||
dest_layers.append(GeLU())
|
||||
self.dest_pre_layer = nn.Sequential(*dest_layers)
|
||||
|
||||
self.att = MultiheadAttention(embed_dim=hidden_size, num_heads=nhead)
|
||||
self.att_dropout = Dropout(0.1)
|
||||
self.att_norm = LayerNorm(hidden_size)
|
||||
|
||||
self.zero_padding_template = torch.zeros((1, src_dim), dtype=torch.float)
|
||||
|
||||
def forward(self, src: Tensor, dest: Tensor, adj: Tensor, mask: Tensor, edges: Tensor = None):
|
||||
"""Information aggregation from the source nodes to the destination nodes.
|
||||
|
||||
Args:
|
||||
src (Tensor): The source nodes in a batch of graph.
|
||||
dest (Tensor): The destination nodes in a batch of graph.
|
||||
adj (Tensor): The adjacency list stored in a 2D matrix in the batch-second format. The first dimension is
|
||||
the maximum amount of the neighbors the destinations have. As the neighbor quantities vary from one
|
||||
destination to another, the short sequences are padded with 0.
|
||||
mask (Tensor): The mask identifies if a position in the adj is padded. Note that it is stored in the
|
||||
batch-first format.
|
||||
|
||||
Returns:
|
||||
destination_emb: The embedding of the destinations after the GAT layer.
|
||||
|
||||
Shape:
|
||||
src: (batch, src_cnt, src_dim)
|
||||
dest: (batch, dest_cnt, dest_dim)
|
||||
adj: (src_neighbor_cnt, batch*dest_cnt)
|
||||
mask: (batch*dest_cnt)*src_neighbor_cnt
|
||||
edges: (batch*dest_cnt, src_neighbor_cnt, edge_dim)
|
||||
destination_emb: (batch, dest_cnt, hidden_size)
|
||||
|
||||
"""
|
||||
assert(self.src_dim == src.shape[-1])
|
||||
assert(self.dest_dim == dest.shape[-1])
|
||||
batch, s_cnt, src_dim = src.shape
|
||||
batch, d_cnt, dest_dim = dest.shape
|
||||
src_neighbor_cnt = adj.shape[0]
|
||||
|
||||
src_embedding = src.reshape(-1, src_dim)
|
||||
src_embedding = torch.cat((self.zero_padding_template.to(src_embedding.device), src_embedding))
|
||||
|
||||
flat_adj = adj.reshape(-1)
|
||||
src_embedding = src_embedding[flat_adj].reshape(src_neighbor_cnt, -1, src_dim)
|
||||
if edges is not None:
|
||||
src_embedding = torch.cat((src_embedding, edges), axis=2)
|
||||
|
||||
src_input = self.src_pre_layer(
|
||||
src_embedding.reshape(-1, src_dim + self.edge_dim)). \
|
||||
reshape(*src_embedding.shape[:2], self.hidden_size)
|
||||
dest_input = self.dest_pre_layer(dest.reshape(-1, dest_dim)).reshape(1, batch * d_cnt, self.hidden_size)
|
||||
dest_emb, _ = self.att(dest_input, src_input, src_input, key_padding_mask=mask)
|
||||
|
||||
dest_emb = dest_emb + self.att_dropout(dest_emb)
|
||||
dest_emb = self.att_norm(dest_emb)
|
||||
return dest_emb.reshape(batch, d_cnt, self.hidden_size)
|
||||
|
||||
|
||||
class SimpleTransformer(nn.Module):
|
||||
"""Graph attention network with multiple graph in the CIM scenario.
|
||||
|
||||
This module aggregates information in the port-to-port graph, port-to-vessel graph and vessel-to-port graph. The
|
||||
aggregation in the two graph are done separatedly and then the port features are concatenated as the final result.
|
||||
|
||||
Args:
|
||||
p_dim (int): The feature dimension of the ports.
|
||||
v_dim (int): The feature dimension of the vessels.
|
||||
edge_dim (dict): The key is the edge name and the value is the corresponding feature dimension.
|
||||
output_size (int): The hidden size in graph attention.
|
||||
layer_num (int): The number of graph attention layers in each graph.
|
||||
"""
|
||||
|
||||
def __init__(self, p_dim, v_dim, edge_dim: dict, output_size, layer_num=2):
|
||||
super().__init__()
|
||||
self.hidden_size = output_size
|
||||
self.layer_num = layer_num
|
||||
|
||||
pl, vl, ppl = [], [], []
|
||||
for i in range(layer_num):
|
||||
if i == 0:
|
||||
pl.append(SimpleGATLayer(v_dim, p_dim, edge_dim["v"], self.hidden_size, nhead=4))
|
||||
vl.append(SimpleGATLayer(p_dim, v_dim, edge_dim["v"], self.hidden_size, nhead=4))
|
||||
# p2p links.
|
||||
ppl.append(
|
||||
SimpleGATLayer(
|
||||
p_dim, p_dim, edge_dim["p"], self.hidden_size, nhead=4, position_encoding=False)
|
||||
)
|
||||
else:
|
||||
pl.append(SimpleGATLayer(self.hidden_size, self.hidden_size, 0, self.hidden_size, nhead=4))
|
||||
if i != layer_num - 1:
|
||||
# The p2v convolution is not necessary at the last layer, since only the port features are used.
|
||||
vl.append(SimpleGATLayer(self.hidden_size, self.hidden_size, 0, self.hidden_size, nhead=4))
|
||||
ppl.append(SimpleGATLayer(
|
||||
self.hidden_size, self.hidden_size, 0, self.hidden_size, nhead=4, position_encoding=False))
|
||||
self.p_layers = nn.ModuleList(pl)
|
||||
self.v_layers = nn.ModuleList(vl)
|
||||
self.pp_layers = nn.ModuleList(ppl)
|
||||
|
||||
def forward(self, p, pe, v, ve, ppe):
|
||||
"""Do the multi-channel graph attention.
|
||||
|
||||
Args:
|
||||
p (Tensor): The port feature.
|
||||
pe (Tensor): The vessel-port edge feature.
|
||||
v (Tensor): The vessel feature.
|
||||
ve (Tensor): The port-vessel edge feature.
|
||||
ppe (Tensor): The port-port edge feature.
|
||||
"""
|
||||
# p.shape: (batch*p_cnt, p_dim)
|
||||
pp = p
|
||||
pre_p, pre_v, pre_pp = p, v, pp
|
||||
for i in range(self.layer_num):
|
||||
# Only feed edge info in the first layer.
|
||||
p = self.p_layers[i](pre_v, pre_p, adj=pe["adj"], edges=pe["edge"] if i == 0 else None, mask=pe["mask"])
|
||||
if i != self.layer_num - 1:
|
||||
v = self.v_layers[i](
|
||||
pre_p, pre_v, adj=ve["adj"], edges=ve["edge"] if i == 0 else None, mask=ve["mask"])
|
||||
pp = self.pp_layers[i](
|
||||
pre_pp, pre_pp, adj=ppe["adj"], edges=ppe["edge"] if i == 0 else None, mask=ppe["mask"])
|
||||
pre_p, pre_v, pre_pp = p, v, pp
|
||||
p = torch.cat((p, pp), axis=2)
|
||||
return p, v
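# Note (illustrative, based on how SharedAC below calls this module):
#   p: (batch, p_cnt, p_dim), v: (batch, v_cnt, v_dim);
#   pe, ve and ppe are dicts with "adj", "edge" and "mask" entries, as produced by gnn_union in utils.py;
#   the returned p has shape (batch, p_cnt, 2 * output_size) and v has shape (batch, v_cnt, output_size).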
|
||||
|
||||
|
||||
class GeLU(nn.Module):
|
||||
"""Simple gelu wrapper as a independent module."""
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
|
||||
def forward(self, input):
|
||||
return F.gelu(input)
|
||||
|
||||
|
||||
class Header(nn.Module):
|
||||
def __init__(self, input_size, hidden_size, output_size, net_type="res"):
|
||||
super().__init__()
|
||||
self.net_type = net_type
|
||||
if net_type == "res":
|
||||
self.fc_0 = nn.Linear(input_size, hidden_size)
|
||||
self.act_0 = GeLU()
|
||||
# self.do_0 = Dropout(dropout)
|
||||
self.fc_1 = nn.Linear(hidden_size, input_size)
|
||||
self.act_1 = GeLU()
|
||||
self.fc_2 = nn.Linear(input_size, output_size)
|
||||
elif net_type == "2layer":
|
||||
self.fc_0 = nn.Linear(input_size, hidden_size)
|
||||
self.act_0 = GeLU()
|
||||
# self.do_0 = Dropout(dropout)
|
||||
self.fc_1 = nn.Linear(hidden_size, hidden_size // 2)
|
||||
self.act_1 = GeLU()
|
||||
self.fc_2 = nn.Linear(hidden_size // 2, output_size)
|
||||
elif net_type == "1layer":
|
||||
self.fc_0 = nn.Linear(input_size, hidden_size)
|
||||
self.act_0 = GeLU()
|
||||
self.fc_1 = nn.Linear(hidden_size, output_size)
|
||||
|
||||
def forward(self, x):
|
||||
if self.net_type == "res":
|
||||
x1 = self.act_0(self.fc_0(x))
|
||||
x1 = self.act_1(self.fc_1(x1) + x)
|
||||
return self.fc_2(x1)
|
||||
elif self.net_type == "2layer":
|
||||
x = self.act_0(self.fc_0(x))
|
||||
x = self.act_1(self.fc_1(x))
|
||||
x = self.fc_2(x)
|
||||
return x
|
||||
else:
|
||||
x = self.fc_1(self.act_0(self.fc_0(x)))
|
||||
return x
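# A quick sanity sketch of the three head variants (sizes here are illustrative only):
#   head = Header(input_size=48, hidden_size=64, output_size=21, net_type="res")
#   head(torch.rand(8, 48)).shape                           # torch.Size([8, 21])
#   Header(48, 64, 21, "2layer")(torch.rand(8, 48)).shape   # torch.Size([8, 21])
#   Header(48, 64, 21, "1layer")(torch.rand(8, 48)).shape   # torch.Size([8, 21])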
|
||||
|
||||
|
||||
class SharedAC(nn.Module):
|
||||
"""The actor-critic module shared with multiple agents.
|
||||
|
||||
This module maps the input observation graph to the policy and value spaces. It first extracts the temporal
information separately for each node with a small transformer block and then extracts the spatial information with
a multi-graph/channel graph attention. Finally, the extracted feature embedding is fed to an actor header as well
as a critic header, which are two MLPs with residual connections.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self, input_dim_p, edge_dim_p, input_dim_v, edge_dim_v, tick_buffer, action_dim, a=True, c=True,
|
||||
scale=4, ac_head="res"):
|
||||
super().__init__()
|
||||
assert(a or c)
|
||||
self.a, self.c = a, c
|
||||
self.input_dim_v = input_dim_v
|
||||
self.input_dim_p = input_dim_p
|
||||
self.tick_buffer = tick_buffer
|
||||
|
||||
self.pre_dim_v, self.pre_dim_p = 8 * scale, 16 * scale
|
||||
self.p_pre_layer = nn.Sequential(
|
||||
nn.Linear(input_dim_p, self.pre_dim_p), GeLU(), PositionalEncoder(
|
||||
d_model=self.pre_dim_p, max_seq_len=tick_buffer))
|
||||
self.v_pre_layer = nn.Sequential(
|
||||
nn.Linear(input_dim_v, self.pre_dim_v), GeLU(), PositionalEncoder(
|
||||
d_model=self.pre_dim_v, max_seq_len=tick_buffer))
|
||||
p_encoder_layer = TransformerEncoderLayer(
|
||||
d_model=self.pre_dim_p, nhead=4, activation="gelu", dim_feedforward=self.pre_dim_p * 4)
|
||||
v_encoder_layer = TransformerEncoderLayer(
|
||||
d_model=self.pre_dim_v, nhead=2, activation="gelu", dim_feedforward=self.pre_dim_v * 4)
|
||||
|
||||
# Alternative initialization: define the normalization.
|
||||
# self.trans_layer_p = TransformerEncoder(p_encoder_layer, num_layers=3, norm=Norm(self.pre_dim_p))
|
||||
# self.trans_layer_v = TransformerEncoder(v_encoder_layer, num_layers=3, norm=Norm(self.pre_dim_v))
|
||||
self.trans_layer_p = TransformerEncoder(p_encoder_layer, num_layers=3)
|
||||
self.trans_layer_v = TransformerEncoder(v_encoder_layer, num_layers=3)
|
||||
|
||||
self.gnn_output_size = 32 * scale
|
||||
self.trans_gat = SimpleTransformer(
|
||||
p_dim=self.pre_dim_p,
|
||||
v_dim=self.pre_dim_v,
|
||||
output_size=self.gnn_output_size // 2,
|
||||
edge_dim={"p": edge_dim_p, "v": edge_dim_v},
|
||||
layer_num=2
|
||||
)
|
||||
|
||||
if a:
|
||||
self.policy_hidden_size = 16 * scale
|
||||
self.a_input = 3 * self.gnn_output_size // 2
|
||||
self.actor = nn.Sequential(
|
||||
Header(self.a_input, self.policy_hidden_size, action_dim, ac_head), nn.Softmax(dim=-1))
|
||||
if c:
|
||||
self.value_hidden_size = 16 * scale
|
||||
self.c_input = self.gnn_output_size
|
||||
self.critic = Header(self.c_input, self.value_hidden_size, 1, ac_head)
|
||||
|
||||
def forward(self, state, a=False, p_idx=None, v_idx=None, c=False):
|
||||
assert((a and p_idx is not None and v_idx is not None) or c)
|
||||
feature_p, feature_v = state["p"], state["v"]
|
||||
|
||||
tb, bsize, p_cnt, _ = feature_p.shape
|
||||
v_cnt = feature_v.shape[2]
|
||||
assert(tb == self.tick_buffer)
|
||||
|
||||
# Before: feature_p.shape: (tick_buffer, batch_size, p_cnt, p_dim)
|
||||
# After: feature_p.shape: (tick_buffer, batch_size*p_cnt, p_dim)
|
||||
feature_p = self.p_pre_layer(feature_p.reshape(feature_p.shape[0], -1, feature_p.shape[-1]))
|
||||
# state["mask"]: (batch_size, tick_buffer)
|
||||
# mask_p: (batch_size * p_cnt, tick_buffer)
|
||||
mask_p = state["mask"].repeat(1, p_cnt).reshape(-1, self.tick_buffer)
|
||||
feature_p = self.trans_layer_p(feature_p, src_key_padding_mask=mask_p)
|
||||
|
||||
feature_v = self.v_pre_layer(feature_v.reshape(feature_v.shape[0], -1, feature_v.shape[-1]))
|
||||
mask_v = state["mask"].repeat(1, v_cnt).reshape(-1, self.tick_buffer)
|
||||
feature_v = self.trans_layer_v(feature_v, src_key_padding_mask=mask_v)
|
||||
|
||||
feature_p = feature_p[0].reshape(bsize, p_cnt, self.pre_dim_p)
|
||||
feature_v = feature_v[0].reshape(bsize, v_cnt, self.pre_dim_v)
|
||||
|
||||
emb_p, emb_v = self.trans_gat(feature_p, state["pe"], feature_v, state["ve"], state["ppe"])
|
||||
|
||||
a_rtn, c_rtn = None, None
|
||||
if a and self.a:
|
||||
ap = emb_p.reshape(bsize, p_cnt, self.gnn_output_size)
|
||||
ap = ap[:, p_idx, :]
|
||||
av = emb_v.reshape(bsize, v_cnt, self.gnn_output_size // 2)
|
||||
av = av[:, v_idx, :]
|
||||
emb_a = torch.cat((ap, av), axis=1)
|
||||
a_rtn = self.actor(emb_a)
|
||||
if c and self.c:
|
||||
c_rtn = self.critic(emb_p).reshape(bsize, p_cnt)
|
||||
return a_rtn, c_rtn
|
|
@ -1,235 +0,0 @@
|
|||
import numpy as np
|
||||
|
||||
from maro.rl.shaping.state_shaper import StateShaper
|
||||
|
||||
from .utils import compute_v2p_degree_matrix
|
||||
|
||||
|
||||
class GNNStateShaper(StateShaper):
|
||||
"""State shaper to extract graph information.
|
||||
|
||||
Args:
|
||||
port_code_list (list): The list of the port codes in the CIM topology.
vessel_code_list (list): The list of the vessel codes in the CIM topology.
max_tick (int): The duration of the simulation.
feature_config (dict): The dottable dict that stores the configuration of the observation features.
max_value (int): The normalization scale. All the features are simply divided by this number.
tick_buffer (int): The value n in n-step TD.
only_demo (bool): Defines whether the shaper instance is used only for shape demonstration (True) or for
runtime shaping (False).
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self, port_code_list, vessel_code_list, max_tick, feature_config, max_value=100000, tick_buffer=20,
|
||||
only_demo=False):
|
||||
# Collect and encode all ports.
|
||||
self.port_code_list = list(port_code_list)
|
||||
self.port_cnt = len(self.port_code_list)
|
||||
self.port_code_inv_dict = {code: i for i, code in enumerate(self.port_code_list)}
|
||||
|
||||
# Collect and encode all vessels.
|
||||
self.vessel_code_list = list(vessel_code_list)
|
||||
self.vessel_cnt = len(self.vessel_code_list)
|
||||
self.vessel_code_inv_dict = {code: i for i, code in enumerate(self.vessel_code_list)}
|
||||
|
||||
# Collect and encode ports and vessels together.
|
||||
self.node_code_inv_dict_p = {i: i for i in self.port_code_list}
|
||||
self.node_code_inv_dict_v = {i: i + self.port_cnt for i in self.vessel_code_list}
|
||||
self.node_cnt = self.port_cnt + self.vessel_cnt
|
||||
|
||||
one_hot_coding = np.identity(self.node_cnt)
|
||||
self.port_one_hot_coding = np.expand_dims(one_hot_coding[:self.port_cnt], axis=0)
|
||||
self.vessel_one_hot_coding = np.expand_dims(one_hot_coding[self.port_cnt:], axis=0)
|
||||
self.last_tick = -1
|
||||
|
||||
self.port_features = [
|
||||
"empty", "full", "capacity", "on_shipper", "on_consignee", "booking", "acc_booking", "shortage",
|
||||
"acc_shortage", "fulfillment", "acc_fulfillment"]
|
||||
self.vessel_features = ["empty", "full", "capacity", "remaining_space"]
|
||||
|
||||
self._max_tick = max_tick
|
||||
self._tick_buffer = tick_buffer
|
||||
# Sentinel value marking that a vessel never arrives at a port.
|
||||
self.max_arrival_time = 99999999
|
||||
|
||||
self.vedge_dim = 2
|
||||
self.pedge_dim = 1
|
||||
|
||||
self._only_demo = only_demo
|
||||
self._feature_config = feature_config
|
||||
self._normalize = True
|
||||
self._norm_scale = 2.0 / max_value
|
||||
if not only_demo:
|
||||
self._state_dict = {
|
||||
# Last "tick" is used for embedding, all zero and never be modified.
|
||||
"v": np.zeros((self._max_tick + 1, self.vessel_cnt, self.get_input_dim("v"))),
|
||||
"p": np.zeros((self._max_tick + 1, self.port_cnt, self.get_input_dim("p"))),
|
||||
"vo": np.zeros((self._max_tick + 1, self.vessel_cnt, self.port_cnt), dtype=np.int),
|
||||
"po": np.zeros((self._max_tick + 1, self.port_cnt, self.vessel_cnt), dtype=np.int),
|
||||
"vedge": np.zeros((self._max_tick + 1, self.vessel_cnt, self.port_cnt, self.get_input_dim("vedge"))),
|
||||
"pedge": np.zeros((self._max_tick + 1, self.port_cnt, self.vessel_cnt, self.get_input_dim("vedge"))),
|
||||
"ppedge": np.zeros((self._max_tick + 1, self.port_cnt, self.port_cnt, self.get_input_dim("pedge"))),
|
||||
}
|
||||
|
||||
# Fixed order: in the order of degree.
|
||||
|
||||
def compute_static_graph_structure(self, env):
|
||||
v2p_adj_matrix = compute_v2p_degree_matrix(env)
|
||||
p2p_adj_matrix = np.dot(v2p_adj_matrix.T, v2p_adj_matrix)
|
||||
p2p_adj_matrix[p2p_adj_matrix == 0] = self.max_arrival_time
|
||||
np.fill_diagonal(p2p_adj_matrix, self.max_arrival_time)
|
||||
self._p2p_embedding = self.sort(p2p_adj_matrix)
|
||||
|
||||
v2p_adj_matrix = -v2p_adj_matrix
|
||||
v2p_adj_matrix[v2p_adj_matrix == 0] = self.max_arrival_time
|
||||
self._fixed_v_order = self.sort(v2p_adj_matrix)
|
||||
self._fixed_p_order = self.sort(v2p_adj_matrix.T)
|
||||
|
||||
@property
|
||||
def p2p_static_graph(self):
|
||||
return self._p2p_embedding
|
||||
|
||||
def sort(self, arrival_time, attr=None):
|
||||
"""
|
||||
Given the arrival time matrix, this function sorts each row and returns the index matrix in the order of
arrival time.
|
||||
"""
|
||||
n, m = arrival_time.shape
|
||||
if self._feature_config.attention_order == "ramdom":
|
||||
arrival_time = arrival_time + np.random.randint(self._max_tick, size=arrival_time.shape)
|
||||
at_index = np.argsort(arrival_time, axis=1)
|
||||
if attr is not None:
|
||||
idx_tmp = np.repeat(at_index, attr.shape[-1]).reshape(*at_index.shape, attr.shape[-1])
|
||||
attr = np.take_along_axis(attr, idx_tmp, axis=1)
|
||||
mask = np.sort(arrival_time, axis=1) >= self.max_arrival_time
|
||||
at_index += 1
|
||||
at_index[mask] = 0
|
||||
if attr is None:
|
||||
return at_index
|
||||
else:
|
||||
return at_index, attr
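# Illustrative example (assuming the temporal attention order, with max_arrival_time padding):
#   arrival_time = [[3, 99999999, 1],
#                   [99999999, 2, 99999999]]
# argsort per row gives [[2, 0, 1], [1, 0, 2]]; indices are then shifted by 1 so that 0 can be
# reserved for padding, and positions whose sorted arrival time is >= max_arrival_time are
# zeroed out, yielding at_index = [[3, 1, 0], [2, 0, 0]].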
|
||||
|
||||
def end_ep_callback(self, snapshot_list):
|
||||
if self._only_demo:
|
||||
return
|
||||
tick_range = np.arange(start=self.last_tick, stop=self._max_tick)
|
||||
self._sync_raw_features(snapshot_list, list(tick_range))
|
||||
self.last_tick = -1
|
||||
|
||||
def _sync_raw_features(self, snapshot_list, tick_range, static_code=None, dynamic_code=None):
|
||||
"""This function update the state_dict from snapshot_list in the given tick_range."""
|
||||
if len(tick_range) == 0:
|
||||
# This occurs when two actions happen at the same tick.
|
||||
return
|
||||
|
||||
# One dim features.
|
||||
port_naive_feature = snapshot_list["ports"][tick_range: self.port_code_list: self.port_features] \
|
||||
.reshape(len(tick_range), self.port_cnt, -1)
|
||||
# Number of laden containers from source to destination.
|
||||
full_on_port = snapshot_list["matrices"][tick_range::"full_on_ports"].reshape(
|
||||
len(tick_range), self.port_cnt, self.port_cnt)
|
||||
# Normalize features to a small range.
|
||||
port_state_mat = self.normalize(port_naive_feature)
|
||||
|
||||
if self._feature_config.onehot_identity:
|
||||
# Add onehot vector to identify port and vessel.
|
||||
port_onehot = np.repeat(self.port_one_hot_coding, len(tick_range), axis=0)
|
||||
if static_code is not None and dynamic_code is not None:
|
||||
# Identify the decision vessel at the decision port.
|
||||
port_onehot[-1, self.port_code_inv_dict[static_code], self.node_code_inv_dict_v[dynamic_code]] = -1
|
||||
port_state_mat = np.concatenate([port_state_mat, port_onehot], axis=2)
|
||||
self._state_dict["p"][tick_range] = port_state_mat
|
||||
|
||||
vessel_naive_feature = snapshot_list["vessels"][tick_range:self.vessel_code_list: self.vessel_features] \
|
||||
.reshape(len(tick_range), self.vessel_cnt, -1)
|
||||
full_on_vessel = snapshot_list["matrices"][tick_range::"full_on_vessels"].reshape(
|
||||
len(tick_range), self.vessel_cnt, self.port_cnt)
|
||||
|
||||
vessel_state_mat = self.normalize(vessel_naive_feature)
|
||||
if self._feature_config.onehot_identity:
|
||||
vessel_state_mat = np.concatenate(
|
||||
[vessel_state_mat, np.repeat(self.vessel_one_hot_coding, len(tick_range), axis=0)], axis=2)
|
||||
self._state_dict["v"][tick_range] = vessel_state_mat
|
||||
|
||||
# last_arrival_time.shape: vessel_cnt * port_cnt
|
||||
# -1 means one vessel never stops at the port
|
||||
vessel_arrival_time = snapshot_list["matrices"][tick_range[-1]:: "vessel_plans"].reshape(
|
||||
self.vessel_cnt, self.port_cnt)
|
||||
# Use an effectively infinite time to mark vessels that never arrive at the port.
|
||||
last_arrival_time = vessel_arrival_time + 1
|
||||
last_arrival_time[last_arrival_time == 0] = self.max_arrival_time
|
||||
if static_code is not None and dynamic_code is not None:
|
||||
# Differentiate the vessel acting on the port from other vessels that have taken or are waiting to take actions.
|
||||
last_arrival_time[self.vessel_code_inv_dict[dynamic_code], self.port_code_inv_dict[static_code]] = 0
|
||||
|
||||
# Here, we assume that the order of arrival times stays the same between two actions/events.
|
||||
vedge_raw = self.normalize(np.stack((full_on_vessel[-1], last_arrival_time), axis=-1))
|
||||
vo, vedge = self.sort(last_arrival_time, attr=vedge_raw)
|
||||
po, pedge = self.sort(last_arrival_time.T, attr=vedge_raw.transpose((1, 0, 2)))
|
||||
self._state_dict["vo"][tick_range] = np.expand_dims(vo, axis=0)
|
||||
self._state_dict["vedge"][tick_range] = np.expand_dims(vedge, axis=0)
|
||||
self._state_dict["po"][tick_range] = np.expand_dims(po, axis=0)
|
||||
self._state_dict["pedge"][tick_range] = np.expand_dims(pedge, axis=0)
|
||||
self._state_dict["ppedge"][tick_range] = self.normalize(full_on_port[-1]).reshape(1, *full_on_port[-1].shape, 1)
|
||||
|
||||
def __call__(self, action_info=None, snapshot_list=None, tick=None):
|
||||
if self._only_demo:
|
||||
return
|
||||
assert((action_info is not None and snapshot_list is not None) or tick is not None)
|
||||
|
||||
if action_info is not None and snapshot_list is not None:
|
||||
# Update the state dict.
|
||||
static_code = action_info.port_idx
|
||||
dynamic_code = action_info.vessel_idx
|
||||
if self.last_tick == action_info.tick:
|
||||
tick_range = [action_info.tick]
|
||||
else:
|
||||
tick_range = list(range(self.last_tick + 1, action_info.tick + 1, 1))
|
||||
|
||||
self.last_tick = action_info.tick
|
||||
self._sync_raw_features(snapshot_list, tick_range, static_code, dynamic_code)
|
||||
tick = action_info.tick
|
||||
|
||||
# state_tick_range is in reverse (most recent first) order.
|
||||
state_tick_range = np.arange(tick, max(-1, tick - self._tick_buffer), -1)
|
||||
v = np.zeros((self._tick_buffer, self.vessel_cnt, self.get_input_dim("v")))
|
||||
v[:len(state_tick_range)] = self._state_dict["v"][state_tick_range]
|
||||
p = np.zeros((self._tick_buffer, self.port_cnt, self.get_input_dim("p")))
|
||||
p[:len(state_tick_range)] = self._state_dict["p"][state_tick_range]
|
||||
|
||||
# True means padding.
|
||||
mask = np.ones(self._tick_buffer, dtype=np.bool)
|
||||
mask[:len(state_tick_range)] = False
|
||||
ret = {
|
||||
"tick": state_tick_range,
|
||||
"v": v,
|
||||
"p": p,
|
||||
"vo": self._state_dict["vo"][tick],
|
||||
"po": self._state_dict["po"][tick],
|
||||
"vedge": self._state_dict["vedge"][tick],
|
||||
"pedge": self._state_dict["pedge"][tick],
|
||||
"ppedge": self._state_dict["ppedge"][tick],
|
||||
"mask": mask,
|
||||
"len": len(state_tick_range),
|
||||
}
|
||||
|
||||
return ret
|
||||
|
||||
def normalize(self, feature):
|
||||
if not self._normalize:
|
||||
return feature
|
||||
return feature * self._norm_scale
|
||||
|
||||
def get_input_dim(self, agent_code):
|
||||
if agent_code in self.port_code_inv_dict or agent_code == "p":
|
||||
return len(self.port_features) + (self.node_cnt if self._feature_config.onehot_identity else 0)
|
||||
elif agent_code in self.vessel_code_inv_dict or agent_code == "v":
|
||||
return len(self.vessel_features) + (self.node_cnt if self._feature_config.onehot_identity else 0)
|
||||
elif agent_code == "vedge":
|
||||
# v-p edge: (arrival_time, laden to destination)
|
||||
return 2
|
||||
elif agent_code == "pedge":
|
||||
# p-p edge: (laden to destination, )
|
||||
return 1
|
||||
else:
|
||||
raise ValueError("agent not exist!")
|
|
@ -1,266 +0,0 @@
|
|||
import ast
|
||||
import io
|
||||
import os
|
||||
import random
|
||||
import shutil
|
||||
import sys
|
||||
from collections import OrderedDict, defaultdict
|
||||
|
||||
import numpy as np
|
||||
import torch
|
||||
import yaml
|
||||
|
||||
from maro.simulator import Env
|
||||
from maro.simulator.scenarios.cim.common import Action
|
||||
from maro.utils import clone, convert_dottable
|
||||
|
||||
|
||||
def compute_v2p_degree_matrix(env):
|
||||
"""This function compute the adjacent matrix."""
|
||||
topo_config = env.configs
|
||||
static_dict = env.summary["node_mapping"]["ports"]
|
||||
dynamic_dict = env.summary["node_mapping"]["vessels"]
|
||||
adj_matrix = np.zeros((len(dynamic_dict), len(static_dict)), dtype=np.int)
|
||||
for v, vinfo in topo_config["vessels"].items():
|
||||
route_name = vinfo["route"]["route_name"]
|
||||
route = topo_config["routes"][route_name]
|
||||
vid = dynamic_dict[v]
|
||||
for p in route:
|
||||
adj_matrix[vid][static_dict[p["port_name"]]] += 1
|
||||
|
||||
return adj_matrix
|
||||
|
||||
|
||||
def from_numpy(device, *np_values):
|
||||
return [torch.from_numpy(v).to(device) for v in np_values]
|
||||
|
||||
|
||||
def gnn_union(p, po, pedge, v, vo, vedge, p2p, ppedge, seq_mask, device):
|
||||
"""Union multiple graph in CIM.
|
||||
|
||||
Args:
|
||||
v: Numpy array of shape (seq_len, batch, v_cnt, v_dim).
|
||||
vo: Numpy array of shape (batch, v_cnt, p_cnt).
|
||||
vedge: Numpy array of shape (batch, v_cnt, p_cnt, e_dim).
|
||||
Returns:
|
||||
result (dict): The dictionary that describes the graph.
|
||||
"""
|
||||
seq_len, batch, v_cnt, v_dim = v.shape
|
||||
_, _, p_cnt, p_dim = p.shape
|
||||
|
||||
p, po, pedge, v, vo, vedge, p2p, ppedge, seq_mask = from_numpy(
|
||||
device, p, po, pedge, v, vo, vedge, p2p, ppedge, seq_mask)
|
||||
|
||||
batch_range = torch.arange(batch, dtype=torch.long).to(device)
|
||||
# vadj.shape: (batch*v_cnt, p_cnt*)
|
||||
vadj, vedge = flatten_embedding(vo, batch_range, vedge)
|
||||
# vmask.shape: (batch*v_cnt, p_cnt*)
|
||||
vmask = vadj == 0
|
||||
# vadj.shape: (p_cnt*, batch*v_cnt)
|
||||
vadj = vadj.transpose(0, 1)
|
||||
# vedge.shape: (p_cnt*, batch*v_cnt, e_dim)
|
||||
vedge = vedge.transpose(0, 1)
|
||||
|
||||
padj, pedge = flatten_embedding(po, batch_range, pedge)
|
||||
pmask = padj == 0
|
||||
padj = padj.transpose(0, 1)
|
||||
pedge = pedge.transpose(0, 1)
|
||||
|
||||
p2p_adj = p2p.repeat(batch, 1, 1)
|
||||
# p2p_adj.shape: (batch*p_cnt, p_cnt*)
|
||||
p2p_adj, ppedge = flatten_embedding(p2p_adj, batch_range, ppedge)
|
||||
# p2p_mask.shape: (batch*p_cnt, p_cnt*)
|
||||
p2p_mask = p2p_adj == 0
|
||||
# p2p_adj.shape: (p_cnt*, batch*p_cnt)
|
||||
p2p_adj = p2p_adj.transpose(0, 1)
|
||||
ppedge = ppedge.transpose(0, 1)
|
||||
|
||||
return {
|
||||
"v": v,
|
||||
"p": p,
|
||||
"pe": {
|
||||
"edge": pedge,
|
||||
"adj": padj,
|
||||
"mask": pmask,
|
||||
},
|
||||
"ve": {
|
||||
"edge": vedge,
|
||||
"adj": vadj,
|
||||
"mask": vmask,
|
||||
},
|
||||
"ppe": {
|
||||
"edge": ppedge,
|
||||
"adj": p2p_adj,
|
||||
"mask": p2p_mask,
|
||||
},
|
||||
"mask": seq_mask,
|
||||
}
|
||||
|
||||
|
||||
def flatten_embedding(embedding, batch_range, edge=None):
|
||||
if len(embedding.shape) == 3:
|
||||
batch, x_cnt, y_cnt = embedding.shape
|
||||
addon = (batch_range * y_cnt).view(batch, 1, 1)
|
||||
else:
|
||||
seq_len, batch, x_cnt, y_cnt = embedding.shape
|
||||
addon = (batch_range * y_cnt).view(seq_len, batch, 1, 1)
|
||||
|
||||
embedding_mask = embedding == 0
|
||||
embedding += addon
|
||||
embedding[embedding_mask] = 0
|
||||
ret = embedding.reshape(-1, embedding.shape[-1])
|
||||
col_mask = ret.sum(dim=0) != 0
|
||||
ret = ret[:, col_mask]
|
||||
if edge is None:
|
||||
return ret
|
||||
else:
|
||||
edge = edge.reshape(-1, *edge.shape[2:])[:, col_mask, :]
|
||||
return ret, edge
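# A minimal, self-contained sanity check for flatten_embedding (shapes and values below are
# made up for illustration; it only runs when this module is executed directly).
if __name__ == "__main__":
    # Two batches, two destinations each, up to three neighbors; 0 denotes "no neighbor".
    _vo = torch.tensor([[[1, 2, 0], [3, 0, 0]],
                        [[2, 1, 0], [1, 3, 0]]])
    _flat = flatten_embedding(_vo, torch.arange(2, dtype=torch.long))
    # Indices of the second batch are shifted by y_cnt=3, padding stays 0 and the all-zero
    # third column is dropped: tensor([[1, 2], [3, 0], [5, 4], [4, 6]]).
    print(_flat)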
|
||||
|
||||
|
||||
def log2json(file_path):
|
||||
"""load the log file as a json list."""
|
||||
with open(file_path, "r") as fp:
|
||||
lines = fp.read().splitlines()
|
||||
json_list = "[" + ",".join(lines) + "]"
|
||||
return ast.literal_eval(json_list)
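# Hypothetical usage (file name and contents are made up): if "sim.log" contains one
# dict-literal per line, e.g.
#   {"ep": 0, "shortage": 123}
#   {"ep": 1, "shortage": 98}
# then log2json("sim.log") returns [{"ep": 0, "shortage": 123}, {"ep": 1, "shortage": 98}].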
|
||||
|
||||
|
||||
def decision_cnt_analysis(env, pv=False, buffer_size=8):
|
||||
if not pv:
|
||||
decision_cnt = [buffer_size] * len(env.node_name_mapping["static"])
|
||||
r, pa, is_done = env.step(None)
|
||||
while not is_done:
|
||||
decision_cnt[pa.port_idx] += 1
|
||||
action = Action(pa.vessel_idx, pa.port_idx, 0)
|
||||
r, pa, is_done = env.step(action)
|
||||
else:
|
||||
decision_cnt = OrderedDict()
|
||||
r, pa, is_done = env.step(None)
|
||||
while not is_done:
|
||||
if (pa.port_idx, pa.vessel_idx) not in decision_cnt:
|
||||
decision_cnt[pa.port_idx, pa.vessel_idx] = buffer_size
|
||||
else:
|
||||
decision_cnt[pa.port_idx, pa.vessel_idx] += 1
|
||||
action = Action(pa.vessel_idx, pa.port_idx, 0)
|
||||
r, pa, is_done = env.step(action)
|
||||
env.reset()
|
||||
return decision_cnt
|
||||
|
||||
|
||||
def random_shortage(env, tick, action_dim=21):
|
||||
_, pa, is_done = env.step(None)
|
||||
node_cnt = len(env.summary["node_mapping"]["ports"])
|
||||
while not is_done:
|
||||
"""
|
||||
load, discharge = pa.action_scope.load, pa.action_scope.discharge
|
||||
action_idx = np.random.randint(action_dim) - zero_idx
|
||||
if action_idx < 0:
|
||||
actual_action = int(1.0*action_idx/zero_idx*load)
|
||||
else:
|
||||
actual_action = int(1.0*action_idx/zero_idx*discharge)
|
||||
"""
|
||||
action = Action(pa.vessel_idx, pa.port_idx, 0)
|
||||
r, pa, is_done = env.step(action)
|
||||
|
||||
shs = env.snapshot_list["ports"][tick - 1:list(range(node_cnt)):"acc_shortage"]
|
||||
fus = env.snapshot_list["ports"][tick - 1:list(range(node_cnt)):"acc_fulfillment"]
|
||||
env.reset()
|
||||
return fus - shs, np.sum(shs + fus)
|
||||
|
||||
|
||||
def return_scaler(env, tick, gamma, action_dim=21):
|
||||
R, tot_amount = random_shortage(env, tick, action_dim)
|
||||
Rs_mean = np.mean(R) / tick / (1 - gamma)
|
||||
return abs(1.0 / Rs_mean), tot_amount
|
||||
|
||||
|
||||
def load_config(config_pth):
|
||||
with io.open(config_pth, "r") as in_file:
|
||||
raw_config = yaml.safe_load(in_file)
|
||||
config = convert_dottable(raw_config)
|
||||
|
||||
if config.env.seed < 0:
|
||||
config.env.seed = random.randint(0, 99999)
|
||||
|
||||
regularize_config(config)
|
||||
return config
|
||||
|
||||
|
||||
def save_config(config, config_pth):
|
||||
with open(config_pth, "w") as fp:
|
||||
config = dottable2dict(config)
|
||||
config["env"]["exp_per_ep"] = [f"{k[0]}, {k[1]}, {d}" for k, d in config["env"]["exp_per_ep"].items()]
|
||||
yaml.safe_dump(config, fp)
|
||||
|
||||
|
||||
def dottable2dict(config):
|
||||
if isinstance(config, float):
|
||||
return str(config)
|
||||
if not isinstance(config, dict):
|
||||
return clone(config)
|
||||
rt = {}
|
||||
for k, v in config.items():
|
||||
rt[k] = dottable2dict(v)
|
||||
return rt
|
||||
|
||||
|
||||
def save_code(folder, save_pth):
|
||||
save_path = os.path.join(save_pth, "code")
|
||||
code_pth = os.path.join(os.getcwd(), folder)
|
||||
shutil.copytree(code_pth, save_path)
|
||||
|
||||
|
||||
def fix_seed(env, seed):
|
||||
env.set_seed(seed)
|
||||
np.random.seed(seed)
|
||||
random.seed(seed)
|
||||
|
||||
|
||||
def zero_play(**args):
|
||||
env = Env(**args)
|
||||
_, pa, is_done = env.step(None)
|
||||
while not is_done:
|
||||
action = Action(pa.vessel_idx, pa.port_idx, 0)
|
||||
r, pa, is_done = env.step(action)
|
||||
return env.snapshot_list
|
||||
|
||||
|
||||
def regularize_config(config):
|
||||
def parse_value(v):
|
||||
try:
|
||||
return int(v)
|
||||
except ValueError:
|
||||
try:
|
||||
return float(v)
|
||||
except ValueError:
|
||||
if v == "false" or v == "False":
|
||||
return False
|
||||
elif v == "true" or v == "True":
|
||||
return True
|
||||
else:
|
||||
return v
|
||||
|
||||
def set_attr(config, attrs, value):
|
||||
if len(attrs) == 1:
|
||||
config[attrs[0]] = value
|
||||
else:
|
||||
set_attr(config[attrs[0]], attrs[1:], value)
|
||||
|
||||
all_args = sys.argv[1:]
|
||||
for i in range(len(all_args) // 2):
|
||||
name = all_args[i * 2]
|
||||
attrs = name[2:].split(".")
|
||||
value = parse_value(all_args[i * 2 + 1])
|
||||
set_attr(config, attrs, value)
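# Example (hypothetical command line): launching with
#   python train.py --training.rollout_cnt 100 --env.seed 42
# overrides config.training.rollout_cnt and config.env.seed through set_attr above; values
# are parsed as int/float/bool where possible, otherwise kept as strings.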
|
||||
|
||||
|
||||
def analysis_speed(env):
|
||||
speed_dict = defaultdict(int)
|
||||
eq_speed = 0
|
||||
for ves in env.configs["vessels"].values():
|
||||
speed_dict[ves["sailing"]["speed"]] += 1
|
||||
for sp, cnt in speed_dict.items():
|
||||
eq_speed += 1.0 * cnt / sp
|
||||
eq_speed = 1.0 / eq_speed
|
||||
return speed_dict, eq_speed
|
|
@ -1,36 +0,0 @@
|
|||
env:
|
||||
seed: 10
|
||||
param:
|
||||
durations: 1120
|
||||
scenario: "cim"
|
||||
topology: "global_trade.22p_l0.8"
|
||||
# topology: "toy.4p_ssdd_l0.0"
|
||||
training:
|
||||
enable: True
|
||||
parallel_cnt: 1
|
||||
device: "cpu"
|
||||
batch_size: 16
|
||||
shuffle_time: 1
|
||||
rollout_cnt: 500
|
||||
train_freq: 1
|
||||
model_save_freq: 1
|
||||
gamma: 0.99
|
||||
learning_rate: 0.00005
|
||||
td_steps: 100
|
||||
entropy_loss_enable: True
|
||||
model:
|
||||
path: "./"
|
||||
tick_buffer: 20
|
||||
hidden_size: 32
|
||||
graph_output_dim: 32
|
||||
action_dim: 21
|
||||
feature:
|
||||
# "temporal" or "random": if "temporal", the edges in the graph are listed in the order of event time,
# otherwise in a random order.
|
||||
attention_order: temporal
|
||||
onehot_identity: False
|
||||
log:
|
||||
path: "./"
|
||||
exp:
|
||||
enable: false
|
||||
freq: 10
|
|
@ -1,70 +0,0 @@
|
|||
import datetime
|
||||
import os
|
||||
|
||||
from maro.simulator import Env
|
||||
from maro.utils import Logger
|
||||
|
||||
from components import (
|
||||
GNNLearner, GNNStateShaper, ParallelActor, SimpleAgentManger,
|
||||
decision_cnt_analysis, load_config, return_scaler, save_code, save_config
|
||||
)
|
||||
|
||||
if __name__ == "__main__":
|
||||
real_path = os.path.split(os.path.realpath(__file__))[0]
|
||||
|
||||
config_path = os.path.join(real_path, "config.yml")
|
||||
config = load_config(config_path)
|
||||
|
||||
# Generate log path.
|
||||
date_str = datetime.datetime.now().strftime("%Y%m%d")
|
||||
time_str = datetime.datetime.now().strftime("%H%M%S.%f")
|
||||
subfolder_name = f"{config.env.param.topology}_{time_str}"
|
||||
|
||||
# Log path.
|
||||
config.log.path = os.path.join(config.log.path, date_str, subfolder_name)
|
||||
if not os.path.exists(config.log.path):
|
||||
os.makedirs(config.log.path)
|
||||
|
||||
simulation_logger = Logger(tag="simulation", dump_folder=config.log.path, dump_mode="w", auto_timestamp=False)
|
||||
|
||||
# Create a demo environment to retrieve environment information.
|
||||
simulation_logger.info("Approximating the experience quantity of each agent...")
|
||||
demo_env = Env(**config.env.param)
|
||||
config.env.exp_per_ep = decision_cnt_analysis(demo_env, pv=True, buffer_size=8)
|
||||
simulation_logger.info(config.env.exp_per_ep)
|
||||
|
||||
# Add some buffer to prevent overlapping.
|
||||
config.env.return_scaler, tot_order_amount = return_scaler(
|
||||
demo_env, tick=config.env.param.durations, gamma=config.training.gamma)
|
||||
simulation_logger.info(f"Return value will be scaled down by the factor {config.env.return_scaler}")
|
||||
|
||||
save_config(config, os.path.join(config.log.path, "config.yml"))
|
||||
save_code("examples/cim/gnn", config.log.path)
|
||||
|
||||
port_mapping = demo_env.summary["node_mapping"]["ports"]
|
||||
vessel_mapping = demo_env.summary["node_mapping"]["vessels"]
|
||||
|
||||
# Create a mock gnn_state_shaper.
|
||||
static_code_list, dynamic_code_list = list(port_mapping.values()), list(vessel_mapping.values())
|
||||
gnn_state_shaper = GNNStateShaper(
|
||||
static_code_list, dynamic_code_list, config.env.param.durations, config.model.feature,
|
||||
tick_buffer=config.model.tick_buffer, only_demo=True, max_value=demo_env.configs["total_containers"])
|
||||
gnn_state_shaper.compute_static_graph_structure(demo_env)
|
||||
|
||||
# Create and assemble agent_manager.
|
||||
agent_id_list = list(config.env.exp_per_ep.keys())
|
||||
training_logger = Logger(tag="training", dump_folder=config.log.path, dump_mode="w", auto_timestamp=False)
|
||||
agent_manager = SimpleAgentManger(
|
||||
"CIM-GNN-manager", agent_id_list, static_code_list, dynamic_code_list, demo_env, gnn_state_shaper,
|
||||
training_logger)
|
||||
agent_manager.assemble(config)
|
||||
|
||||
# Create the rollout actor to collect experience.
|
||||
actor = ParallelActor(config, demo_env, gnn_state_shaper, agent_manager, logger=simulation_logger)
|
||||
|
||||
# Learner function for training and testing.
|
||||
learner = GNNLearner(actor, agent_manager, logger=simulation_logger)
|
||||
learner.learn(config.training)
|
||||
|
||||
# Cancel all the child process used for rollout.
|
||||
actor.exit()
|
|
@ -1,22 +0,0 @@
|
|||
# Overview
|
||||
|
||||
The CIM problem is one of the quintessential use cases of MARO. The example can
|
||||
be run with a set of scenario configurations that can be found under
|
||||
maro/simulator/scenarios/cim. General experimental parameters (e.g., type of
|
||||
topology, type of algorithm to use, number of training episodes) can be configured
|
||||
through config.yml. Each RL formulation has a dedicated folder, e.g., dqn, and
|
||||
all algorithm-specific parameters can be configured through
|
||||
the config.py file in that folder.
|
||||
|
||||
## Single-host Single-process Mode
|
||||
|
||||
To run the CIM example using the DQN algorithm under single-host mode, go to
|
||||
examples/cim/dqn and run single_process_launcher.py. You may play around with
|
||||
the configuration if you want to try out different settings.
|
||||
|
||||
## Distributed Mode
|
||||
|
||||
The examples/cim/dqn/components folder contains dist_learner.py and dist_actor.py
|
||||
for distributed training. For debugging purposes, we provide a script that
|
||||
simulates distributed mode using multi-processing. Simply go to examples/cim/dqn
|
||||
and run multi_process_launcher.py to start the learner and actor processes.
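For example, the following command (the group name and actor count are illustrative) starts
one learner and two actor processes:

    python multi_process_launcher.py test_group 2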
|
|
@ -1,14 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from .action_shaper import CIMActionShaper
|
||||
from .agent_manager import POAgentManager, create_po_agents
|
||||
from .experience_shaper import TruncatedExperienceShaper
|
||||
from .state_shaper import CIMStateShaper
|
||||
|
||||
__all__ = [
|
||||
"CIMActionShaper",
|
||||
"POAgentManager", "create_po_agents",
|
||||
"TruncatedExperienceShaper",
|
||||
"CIMStateShaper"
|
||||
]
|
|
@ -1,33 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from maro.rl import ActionShaper
|
||||
from maro.simulator.scenarios.cim.common import Action
|
||||
|
||||
|
||||
class CIMActionShaper(ActionShaper):
|
||||
def __init__(self, action_space):
|
||||
super().__init__()
|
||||
self._action_space = action_space
|
||||
self._zero_action_index = action_space.index(0)
|
||||
|
||||
def __call__(self, model_action, decision_event, snapshot_list):
|
||||
scope = decision_event.action_scope
|
||||
tick = decision_event.tick
|
||||
port_idx = decision_event.port_idx
|
||||
vessel_idx = decision_event.vessel_idx
|
||||
|
||||
port_empty = snapshot_list["ports"][tick: port_idx: ["empty", "full", "on_shipper", "on_consignee"]][0]
|
||||
vessel_remaining_space = snapshot_list["vessels"][tick: vessel_idx: ["empty", "full", "remaining_space"]][2]
|
||||
early_discharge = snapshot_list["vessels"][tick:vessel_idx: "early_discharge"][0]
|
||||
assert 0 <= model_action < len(self._action_space)
|
||||
|
||||
if model_action < self._zero_action_index:
|
||||
actual_action = max(round(self._action_space[model_action] * port_empty), -vessel_remaining_space)
|
||||
elif model_action > self._zero_action_index:
|
||||
plan_action = self._action_space[model_action] * (scope.discharge + early_discharge) - early_discharge
|
||||
actual_action = round(plan_action) if plan_action > 0 else round(self._action_space[model_action] * scope.discharge)
|
||||
else:
|
||||
actual_action = 0
|
||||
|
||||
return Action(vessel_idx, port_idx, actual_action)
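# Illustrative mapping (with action_space = np.linspace(-1.0, 1.0, 21), so the zero index is 10):
#   model_action < 10  -> load:      round(action_space[a] * port_empty), bounded by -vessel_remaining_space
#   model_action > 10  -> discharge: computed from scope.discharge and early_discharge as above
#   model_action == 10 -> no repositioning (actual_action = 0)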
|
|
@ -1,83 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import numpy as np
|
||||
import torch.nn as nn
|
||||
from torch.optim import Adam, RMSprop
|
||||
|
||||
from maro.rl import (
|
||||
AbsAgent, ActorCritic, ActorCriticConfig, FullyConnectedBlock, LearningModel, NNStack,
|
||||
OptimizerOptions, PolicyGradient, PolicyOptimizationConfig, SimpleAgentManager
|
||||
)
|
||||
from maro.utils import set_seeds
|
||||
|
||||
|
||||
class POAgent(AbsAgent):
|
||||
def train(self, states: np.ndarray, actions: np.ndarray, log_action_prob: np.ndarray, rewards: np.ndarray):
|
||||
self._algorithm.train(states, actions, log_action_prob, rewards)
|
||||
|
||||
|
||||
def create_po_agents(agent_id_list, config):
|
||||
input_dim, num_actions = config.input_dim, config.num_actions
|
||||
set_seeds(config.seed)
|
||||
agent_dict = {}
|
||||
for agent_id in agent_id_list:
|
||||
actor_net = NNStack(
|
||||
"actor",
|
||||
FullyConnectedBlock(
|
||||
input_dim=input_dim,
|
||||
output_dim=num_actions,
|
||||
activation=nn.Tanh,
|
||||
is_head=True,
|
||||
**config.actor_model
|
||||
)
|
||||
)
|
||||
|
||||
if config.type == "actor_critic":
|
||||
critic_net = NNStack(
|
||||
"critic",
|
||||
FullyConnectedBlock(
|
||||
input_dim=config.input_dim,
|
||||
output_dim=1,
|
||||
activation=nn.LeakyReLU,
|
||||
is_head=True,
|
||||
**config.critic_model
|
||||
)
|
||||
)
|
||||
|
||||
hyper_params = config.actor_critic_hyper_parameters
|
||||
hyper_params.update({"reward_discount": config.reward_discount})
|
||||
learning_model = LearningModel(
|
||||
actor_net, critic_net,
|
||||
optimizer_options={
|
||||
"actor": OptimizerOptions(cls=Adam, params=config.actor_optimizer),
|
||||
"critic": OptimizerOptions(cls=RMSprop, params=config.critic_optimizer)
|
||||
}
|
||||
)
|
||||
algorithm = ActorCritic(
|
||||
learning_model, ActorCriticConfig(critic_loss_func=nn.SmoothL1Loss(), **hyper_params)
|
||||
)
|
||||
else:
|
||||
learning_model = LearningModel(
|
||||
actor_net,
|
||||
optimizer_options=OptimizerOptions(cls=Adam, params=config.actor_optimizer)
|
||||
)
|
||||
algorithm = PolicyGradient(learning_model, PolicyOptimizationConfig(config.reward_discount))
|
||||
|
||||
agent_dict[agent_id] = POAgent(name=agent_id, algorithm=algorithm)
|
||||
|
||||
return agent_dict
|
||||
|
||||
|
||||
class POAgentManager(SimpleAgentManager):
|
||||
def train(self, experiences_by_agent: dict):
|
||||
for agent_id, exp in experiences_by_agent.items():
|
||||
if not isinstance(exp, list):
|
||||
exp = [exp]
|
||||
for trajectory in exp:
|
||||
self.agent_dict[agent_id].train(
|
||||
trajectory["state"],
|
||||
trajectory["action"],
|
||||
trajectory["log_action_probability"],
|
||||
trajectory["reward"]
|
||||
)
|
|
@ -1,19 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
"""
|
||||
This file is used to load the configuration and convert it into a dotted dictionary.
|
||||
"""
|
||||
|
||||
import io
|
||||
import os
|
||||
import yaml
|
||||
|
||||
|
||||
CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../config.yml")
|
||||
with io.open(CONFIG_PATH, "r") as in_file:
|
||||
config = yaml.safe_load(in_file)
|
||||
|
||||
DISTRIBUTED_CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../distributed_config.yml")
|
||||
with io.open(DISTRIBUTED_CONFIG_PATH, "r") as in_file:
|
||||
distributed_config = yaml.safe_load(in_file)
|
|
@ -1,51 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from collections import defaultdict
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import ExperienceShaper
|
||||
|
||||
|
||||
class TruncatedExperienceShaper(ExperienceShaper):
|
||||
def __init__(self, *, time_window: int, time_decay_factor: float, fulfillment_factor: float,
|
||||
shortage_factor: float):
|
||||
super().__init__(reward_func=None)
|
||||
self._time_window = time_window
|
||||
self._time_decay_factor = time_decay_factor
|
||||
self._fulfillment_factor = fulfillment_factor
|
||||
self._shortage_factor = shortage_factor
|
||||
|
||||
def __call__(self, trajectory, snapshot_list):
|
||||
agent_ids = np.asarray(trajectory.get_by_key("agent_id"))
|
||||
states = np.asarray(trajectory.get_by_key("state"))
|
||||
actions = np.asarray(trajectory.get_by_key("action"))
|
||||
log_action_probabilities = np.asarray(trajectory.get_by_key("log_action_probability"))
|
||||
rewards = np.fromiter(
|
||||
map(self._compute_reward, trajectory.get_by_key("event"), [snapshot_list] * len(trajectory)),
|
||||
dtype=np.float32
|
||||
)
|
||||
return {agent_id: {
|
||||
"state": states[agent_ids == agent_id],
|
||||
"action": actions[agent_ids == agent_id],
|
||||
"log_action_probability": log_action_probabilities[agent_ids == agent_id],
|
||||
"reward": rewards[agent_ids == agent_id],
|
||||
}
|
||||
for agent_id in set(agent_ids)}
|
||||
|
||||
def _compute_reward(self, decision_event, snapshot_list):
|
||||
start_tick = decision_event.tick + 1
|
||||
end_tick = decision_event.tick + self._time_window
|
||||
ticks = list(range(start_tick, end_tick))
|
||||
|
||||
# Calculate the truncated, time-decayed reward.
|
||||
future_fulfillment = snapshot_list["ports"][ticks::"fulfillment"]
|
||||
future_shortage = snapshot_list["ports"][ticks::"shortage"]
|
||||
decay_list = [self._time_decay_factor ** i for i in range(end_tick - start_tick)
|
||||
for _ in range(future_fulfillment.shape[0]//(end_tick-start_tick))]
|
||||
|
||||
tot_fulfillment = np.dot(future_fulfillment, decay_list)
|
||||
tot_shortage = np.dot(future_shortage, decay_list)
|
||||
|
||||
return np.float(self._fulfillment_factor * tot_fulfillment - self._shortage_factor * tot_shortage)
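# In formula form: for a decision at tick t with time window W and decay factor d, the reward is
#   fulfillment_factor * sum_{i=0}^{W-2} d^i * fulfillment(t+1+i)
#       - shortage_factor * sum_{i=0}^{W-2} d^i * shortage(t+1+i),
# where the per-tick fulfillment/shortage values are summed over all ports (decay_list repeats
# each d^i once per port so that the dot products above realize these sums).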
|
|
@ -1,30 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import StateShaper
|
||||
|
||||
PORT_ATTRIBUTES = ["empty", "full", "on_shipper", "on_consignee", "booking", "shortage", "fulfillment"]
|
||||
VESSEL_ATTRIBUTES = ["empty", "full", "remaining_space"]
|
||||
|
||||
|
||||
class CIMStateShaper(StateShaper):
|
||||
def __init__(self, *, look_back, max_ports_downstream):
|
||||
super().__init__()
|
||||
self._look_back = look_back
|
||||
self._max_ports_downstream = max_ports_downstream
|
||||
self._dim = (look_back + 1) * (max_ports_downstream + 1) * len(PORT_ATTRIBUTES) + len(VESSEL_ATTRIBUTES)
|
||||
|
||||
def __call__(self, decision_event, snapshot_list):
|
||||
tick, port_idx, vessel_idx = decision_event.tick, decision_event.port_idx, decision_event.vessel_idx
|
||||
ticks = [tick - rt for rt in range(self._look_back - 1)]
|
||||
future_port_idx_list = snapshot_list["vessels"][tick: vessel_idx: 'future_stop_list'].astype('int')
|
||||
port_features = snapshot_list["ports"][ticks: [port_idx] + list(future_port_idx_list): PORT_ATTRIBUTES]
|
||||
vessel_features = snapshot_list["vessels"][tick: vessel_idx: VESSEL_ATTRIBUTES]
|
||||
state = np.concatenate((port_features, vessel_features))
|
||||
return str(port_idx), state
|
||||
|
||||
@property
|
||||
def dim(self):
|
||||
return self._dim
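# With the config shipped in config.yml (look_back=7, max_ports_downstream=2), the state dimension is
#   (7 + 1) * (2 + 1) * len(PORT_ATTRIBUTES) + len(VESSEL_ATTRIBUTES) = 8 * 3 * 7 + 3 = 171.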
|
|
@ -1,50 +0,0 @@
|
|||
env:
|
||||
scenario: "cim"
|
||||
topology: "toy.4p_ssdd_l0.0"
|
||||
durations: 1120
|
||||
state_shaping:
|
||||
look_back: 7
|
||||
max_ports_downstream: 2
|
||||
experience_shaping:
|
||||
time_window: 100
|
||||
fulfillment_factor: 1.0
|
||||
shortage_factor: 1.0
|
||||
time_decay_factor: 0.97
|
||||
main_loop:
|
||||
max_episode: 100
|
||||
early_stopping:
|
||||
warmup_ep: 20
|
||||
last_k: 5
|
||||
perf_threshold: 0.95 # minimum performance (fulfillment ratio) required to trigger early stopping
|
||||
perf_stability_threshold: 0.1 # stability is measured by the maximum of abs(perf_(i+1) - perf_i) / perf_i
|
||||
# over the last k episodes (where perf is short for performance). This value must
|
||||
# be below this threshold to trigger early stopping
|
||||
agents:
|
||||
seed: 1024 # for reproducibility
|
||||
type: "actor_critic" # "actor_critic" or "policy_gradient"
|
||||
num_actions: 21
|
||||
actor_model:
|
||||
hidden_dims:
|
||||
- 256
|
||||
- 128
|
||||
- 64
|
||||
softmax_enabled: true
|
||||
batch_norm_enabled: false
|
||||
actor_optimizer:
|
||||
lr: 0.001
|
||||
critic_model:
|
||||
hidden_dims:
|
||||
- 256
|
||||
- 128
|
||||
- 64
|
||||
softmax_enabled: false
|
||||
batch_norm_enabled: true
|
||||
critic_optimizer:
|
||||
lr: 0.001
|
||||
reward_discount: .0
|
||||
actor_critic_hyper_parameters:
|
||||
train_iters: 10
|
||||
actor_loss_coefficient: 0.1
|
||||
k: 1
|
||||
lam: 0.0
|
||||
# clip_ratio: 0.8
|
|
@ -1,46 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.simulator import Env
|
||||
from maro.rl import AgentManagerMode, SimpleActor, ActorWorker
|
||||
from maro.utils import convert_dottable
|
||||
|
||||
from components import CIMActionShaper, CIMStateShaper, POAgentManager, TruncatedExperienceShaper, create_po_agents
|
||||
|
||||
|
||||
def launch(config):
|
||||
config = convert_dottable(config)
|
||||
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
state_shaper = CIMStateShaper(**config.env.state_shaping)
|
||||
action_shaper = CIMActionShaper(action_space=list(np.linspace(-1.0, 1.0, config.agents.num_actions)))
|
||||
experience_shaper = TruncatedExperienceShaper(**config.env.experience_shaping)
|
||||
|
||||
config["agents"]["input_dim"] = state_shaper.dim
|
||||
agent_manager = POAgentManager(
|
||||
name="cim_actor",
|
||||
mode=AgentManagerMode.INFERENCE,
|
||||
agent_dict=create_po_agents(agent_id_list, config.agents),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper,
|
||||
)
|
||||
proxy_params = {
|
||||
"group_name": os.environ["GROUP"],
|
||||
"expected_peers": {"learner": 1},
|
||||
"redis_address": ("localhost", 6379)
|
||||
}
|
||||
actor_worker = ActorWorker(
|
||||
local_actor=SimpleActor(env=env, agent_manager=agent_manager),
|
||||
proxy_params=proxy_params
|
||||
)
|
||||
actor_worker.launch()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from components.config import config
|
||||
launch(config)
|
|
@ -1,46 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
|
||||
from maro.rl import ActorProxy, AgentManagerMode, Scheduler, SimpleLearner, merge_experiences_with_trajectory_boundaries
|
||||
from maro.simulator import Env
|
||||
from maro.utils import Logger, convert_dottable
|
||||
|
||||
from components import CIMStateShaper, POAgentManager, create_po_agents
|
||||
|
||||
|
||||
def launch(config):
|
||||
config = convert_dottable(config)
|
||||
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
config["agents"]["input_dim"] = CIMStateShaper(**config.env.state_shaping).dim
|
||||
agent_manager = POAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.TRAIN,
|
||||
agent_dict=create_po_agents(agent_id_list, config.agents)
|
||||
)
|
||||
|
||||
proxy_params = {
|
||||
"group_name": os.environ["GROUP"],
|
||||
"expected_peers": {"actor": int(os.environ["NUM_ACTORS"])},
|
||||
"redis_address": ("localhost", 6379)
|
||||
}
|
||||
|
||||
learner = SimpleLearner(
|
||||
agent_manager=agent_manager,
|
||||
actor=ActorProxy(
|
||||
proxy_params=proxy_params, experience_collecting_func=merge_experiences_with_trajectory_boundaries
|
||||
),
|
||||
scheduler=Scheduler(config.main_loop.max_episode),
|
||||
logger=Logger("cim_learner", auto_timestamp=False)
|
||||
)
|
||||
learner.learn()
|
||||
learner.test()
|
||||
learner.dump_models(os.path.join(os.getcwd(), "models"))
|
||||
learner.exit()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from components.config import config
|
||||
launch(config)
|
|
@ -1,6 +0,0 @@
|
|||
redis:
|
||||
hostname: "localhost"
|
||||
port: 6379
|
||||
group: test_group
|
||||
num_actors: 1
|
||||
num_learners: 1
|
|
@ -1,26 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
"""
|
||||
This script is used to debug distributed algorithm in single host multi-process mode.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import os
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("group_name", help="group name")
|
||||
parser.add_argument("num_actors", type=int, help="number of actors")
|
||||
args = parser.parse_args()
|
||||
|
||||
learner_path = f"{os.path.split(os.path.realpath(__file__))[0]}/dist_learner.py &"
|
||||
actor_path = f"{os.path.split(os.path.realpath(__file__))[0]}/dist_actor.py &"
|
||||
|
||||
# Launch the learner process
|
||||
os.system(f"GROUP={args.group_name} NUM_ACTORS={args.num_actors} python " + learner_path)
|
||||
|
||||
# Launch the actor processes
|
||||
for _ in range(args.num_actors):
|
||||
os.system(f"GROUP={args.group_name} python " + actor_path)
|
|
@ -1,91 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
from statistics import mean
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.simulator import Env
|
||||
from maro.rl import AgentManagerMode, Scheduler, SimpleActor, SimpleLearner
|
||||
from maro.utils import LogFormat, Logger, convert_dottable
|
||||
|
||||
from components import CIMActionShaper, CIMStateShaper, POAgentManager, TruncatedExperienceShaper, create_po_agents
|
||||
|
||||
|
||||
class EarlyStoppingChecker:
|
||||
"""Callable class that checks the performance history to determine early stopping.
|
||||
|
||||
Args:
|
||||
warmup_ep (int): Episode from which early stopping checking is initiated.
|
||||
last_k (int): Number of latest performance records to check for early stopping.
|
||||
perf_threshold (float): The mean of the ``last_k`` performance metric values must be above this value to
|
||||
trigger early stopping.
|
||||
perf_stability_threshold (float): The maximum one-step change over the ``last_k`` performance metrics must be
|
||||
below this value to trigger early stopping.
|
||||
"""
|
||||
def __init__(self, warmup_ep: int, last_k: int, perf_threshold: float, perf_stability_threshold: float):
|
||||
self._warmup_ep = warmup_ep
|
||||
self._last_k = last_k
|
||||
self._perf_threshold = perf_threshold
|
||||
self._perf_stability_threshold = perf_stability_threshold
|
||||
|
||||
def get_metric(record):
|
||||
return 1 - record["container_shortage"] / record["order_requirements"]
|
||||
self._metric_func = get_metric
|
||||
|
||||
def __call__(self, perf_history) -> bool:
|
||||
if len(perf_history) < max(self._last_k, self._warmup_ep):
|
||||
return False
|
||||
|
||||
metric_series = list(map(self._metric_func, perf_history[-self._last_k:]))
|
||||
max_delta = max(
|
||||
abs(metric_series[i] - metric_series[i - 1]) / metric_series[i - 1] for i in range(1, self._last_k)
|
||||
)
|
||||
print(f"mean_metric: {mean(metric_series)}, max_delta: {max_delta}")
|
||||
return mean(metric_series) > self._perf_threshold and max_delta < self._perf_stability_threshold
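# Worked example (numbers are made up): with warmup_ep=3, last_k=3, perf_threshold=0.9 and
# perf_stability_threshold=0.05, a history whose last three records give fulfillment ratios
# 0.95, 0.96 and 0.95 yields a mean of about 0.953 and a maximum one-step relative change of
# about 0.011, so the checker returns True and the scheduler can stop training early.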
|
||||
|
||||
|
||||
def launch(config):
|
||||
# First determine the input dimension and add it to the config.
|
||||
config = convert_dottable(config)
|
||||
|
||||
# Step 1: initialize a CIM environment for using a toy dataset.
|
||||
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
|
||||
# Step 2: create state, action and experience shapers. We also need to create an explorer here due to the
|
||||
# greedy nature of the DQN algorithm.
|
||||
state_shaper = CIMStateShaper(**config.env.state_shaping)
|
||||
action_shaper = CIMActionShaper(action_space=list(np.linspace(-1.0, 1.0, config.agents.num_actions)))
|
||||
experience_shaper = TruncatedExperienceShaper(**config.env.experience_shaping)
|
||||
|
||||
# Step 3: create an agent manager.
|
||||
config["agents"]["input_dim"] = state_shaper.dim
|
||||
agent_manager = POAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.TRAIN_INFERENCE,
|
||||
agent_dict=create_po_agents(agent_id_list, config.agents),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper,
|
||||
)
|
||||
|
||||
# Step 4: Create an actor and a learner to start the training process.
|
||||
scheduler = Scheduler(
|
||||
config.main_loop.max_episode,
|
||||
early_stopping_checker=EarlyStoppingChecker(**config.main_loop.early_stopping)
|
||||
)
|
||||
actor = SimpleActor(env, agent_manager)
|
||||
learner = SimpleLearner(
|
||||
agent_manager, actor, scheduler,
|
||||
logger=Logger("cim_learner", format_=LogFormat.simple, auto_timestamp=False)
|
||||
)
|
||||
learner.learn()
|
||||
learner.test()
|
||||
learner.dump_models(os.path.join(os.getcwd(), "models"))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from components.config import config
|
||||
launch(config)
|
|
@ -1,50 +1,69 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
|
||||
# Enable realtime data streaming with following statements.
|
||||
|
||||
# import os
|
||||
|
||||
# os.environ["MARO_STREAMIT_ENABLED"] = "true"
|
||||
# os.environ["MARO_STREAMIT_EXPERIMENT_NAME"] = "test_317"
|
||||
|
||||
|
||||
from maro.simulator import Env
|
||||
from maro.simulator.scenarios.cim.common import Action
|
||||
from maro.simulator.scenarios.cim.common import Action, ActionType
|
||||
from maro.streamit import streamit
|
||||
|
||||
start_tick = 0
|
||||
durations = 100 # 100 days
|
||||
if __name__ == "__main__":
|
||||
start_tick = 0
|
||||
durations = 100 # 100 days
|
-opts = dict()
-"""
-enable-dump-snapshot parameter means business_engine needs dump snapshot data before reset.
-If you leave value to empty string, it will dump to current folder.
-For getting dump data, please uncomment below line and specify dump destination folder.
-"""
-# opts['enable-dump-snapshot'] = ''
+opts = dict()
+with streamit:
+    """
+    enable-dump-snapshot parameter means business_engine needs dump snapshot data before reset.
+    If you leave value to empty string, it will dump to current folder.
+    For getting dump data, please uncomment below line and specify dump destination folder.
+    """
+    # opts['enable-dump-snapshot'] = ''
 
-# Initialize an environment with a specific scenario, related topology.
-env = Env(scenario="cim", topology="toy.5p_ssddd_l0.0",
-          start_tick=start_tick, durations=durations, options=opts)
+    # Initialize an environment with a specific scenario, related topology.
+    env = Env(scenario="cim", topology="global_trade.22p_l0.1",
+              start_tick=start_tick, durations=durations, options=opts)
 
-# Query environment summary, which includes business instances, intra-instance attributes, etc.
-print(env.summary)
+    # Query environment summary, which includes business instances, intra-instance attributes, etc.
+    print(env.summary)
 
-for ep in range(2):
-    # Gym-like step function
-    metrics, decision_event, is_done = env.step(None)
+    for ep in range(2):
+        # Tell streamit we are in a new episode.
+        streamit.episode(ep)
+
+        # Gym-like step function.
+        metrics, decision_event, is_done = env.step(None)
 
-    while not is_done:
-        past_week_ticks = [x for x in range(
-            decision_event.tick - 7, decision_event.tick)]
-        decision_port_idx = decision_event.port_idx
-        intr_port_infos = ["booking", "empty", "shortage"]
+        while not is_done:
+            past_week_ticks = [x for x in range(
+                max(decision_event.tick - 7, 0), decision_event.tick)]
+            decision_port_idx = decision_event.port_idx
+            intr_port_infos = ["booking", "empty", "shortage"]
 
-        # Query the decision port booking, empty container inventory, shortage information in the past week
-        past_week_info = env.snapshot_list["ports"][past_week_ticks:
-                                                    decision_port_idx:
-                                                    intr_port_infos]
+            # Query the decision port booking, empty container inventory, shortage information in the past week
+            past_week_info = env.snapshot_list["ports"][past_week_ticks:
+                                                        decision_port_idx:
+                                                        intr_port_infos]
 
-        dummy_action = Action(decision_event.vessel_idx,
-                              decision_event.port_idx, 0)
+            dummy_action = Action(
+                decision_event.vessel_idx,
+                decision_event.port_idx,
+                0,
+                ActionType.LOAD
+            )
 
-        # Drive environment with dummy action (no repositioning)
-        metrics, decision_event, is_done = env.step(dummy_action)
+            # Drive environment with dummy action (no repositioning)
+            metrics, decision_event, is_done = env.step(dummy_action)
 
-    # Query environment business metrics at the end of an episode,
-    # it is your optimized object (usually includes multi-target).
-    print(f"ep: {ep}, environment metrics: {env.metrics}")
-    env.reset()
+        # Query environment business metrics at the end of an episode,
+        # it is your optimized object (usually includes multi-target).
+        print(f"ep: {ep}, environment metrics: {env.metrics}")
+
+        env.reset()
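For readers following the CIM hunk above, here is a minimal sketch (not part of the diff) of how the three-axis snapshot slice can be consumed. The variable names are reused from the example; the flat tick-major, attribute-minor layout of the returned array is an assumption.

import numpy as np

# past_week_info is the flat array returned above by
# env.snapshot_list["ports"][past_week_ticks:decision_port_idx:intr_port_infos].
num_ticks = len(past_week_ticks)     # up to 7 ticks, fewer at the very start of an episode
num_attrs = len(intr_port_infos)     # "booking", "empty", "shortage"

# Assumed layout: one row per queried tick, one column per queried attribute.
port_history = np.asarray(past_week_info).reshape(num_ticks, num_attrs)
avg_shortage = port_history[:, intr_port_infos.index("shortage")].mean()
print(f"Average shortage at port {decision_port_idx} over the past week: {avg_shortage:.2f}")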
@@ -18,16 +18,16 @@ def worker(group_name):
                   component_type="worker",
                   expected_peers={"master": 1})
     counter = 0
-    print(f"{proxy.component_name}'s counter is {counter}.")
+    print(f"{proxy.name}'s counter is {counter}.")
 
     # Nonrecurring receive the message from the proxy.
     for msg in proxy.receive(is_continuous=False):
-        print(f"{proxy.component_name} receive message from {msg.source}.")
+        print(f"{proxy.name} receive message from {msg.source}.")
 
         if msg.tag == "INC":
             counter += 1
-            print(f"{proxy.component_name} receive INC request, {proxy.component_name}'s count is {counter}.")
-            proxy.reply(received_message=msg, tag="done")
+            print(f"{proxy.name} receive INC request, {proxy.name}'s count is {counter}.")
+            proxy.reply(message=msg, tag="done")
 
 
 def master(group_name: str, worker_num: int, is_immediate: bool = False):
@@ -55,17 +55,18 @@ def master(group_name: str, worker_num: int, is_immediate: bool = False):
             session_type=SessionType.NOTIFICATION
         )
         # Do some tasks with higher priority here.
-        replied_msgs = proxy.receive_by_id(session_ids)
+        replied_msgs = proxy.receive_by_id(session_ids, timeout=-1)
     else:
         replied_msgs = proxy.broadcast(
             component_type="worker",
             tag="INC",
-            session_type=SessionType.NOTIFICATION
+            session_type=SessionType.NOTIFICATION,
+            timeout=-1
         )
 
     for msg in replied_msgs:
         print(
-            f"{proxy.component_name} get receive notification from {msg.source} with "
+            f"{proxy.name} get receive notification from {msg.source} with "
             f"message session stage {msg.session_stage}."
         )
 
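Taken together, the two hunks above amount to the master-side flow sketched below. It is a minimal, self-contained sketch assembled from the calls shown in the diff; the Proxy constructor arguments (mirroring the worker side) and the import path are assumptions rather than part of this change.

from maro.communication import Proxy, SessionType


def run_master(group_name: str, worker_num: int):
    # Constructor mirrors the worker side shown above (assumption).
    proxy = Proxy(group_name=group_name,
                  component_type="master",
                  expected_peers={"worker": worker_num})

    # Broadcast an INC notification to every worker and block (timeout=-1) for the replies.
    replied_msgs = proxy.broadcast(
        component_type="worker",
        tag="INC",
        session_type=SessionType.NOTIFICATION,
        timeout=-1
    )

    for msg in replied_msgs:
        print(f"{proxy.name} got a reply from {msg.source} at session stage {msg.session_stage}.")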
@@ -22,11 +22,11 @@ def summation_worker(group_name):
 
     # Nonrecurring receive the message from the proxy.
     for msg in proxy.receive(is_continuous=False):
-        print(f"{proxy.component_name} receive message from {msg.source}. the payload is {msg.payload}.")
+        print(f"{proxy.name} receive message from {msg.source}. the payload is {msg.payload}.")
 
         if msg.tag == "job":
             replied_payload = sum(msg.payload)
-            proxy.reply(received_message=msg, tag="sum", payload=replied_payload)
+            proxy.reply(message=msg, tag="sum", payload=replied_payload)
 
 
 def multiplication_worker(group_name):
@@ -42,11 +42,11 @@ def multiplication_worker(group_name):
 
     # Nonrecurring receive the message from the proxy.
     for msg in proxy.receive(is_continuous=False):
-        print(f"{proxy.component_name} receive message from {msg.source}. the payload is {msg.payload}.")
+        print(f"{proxy.name} receive message from {msg.source}. the payload is {msg.payload}.")
 
         if msg.tag == "job":
             replied_payload = np.prod(msg.payload)
-            proxy.reply(received_message=msg, tag="multiply", payload=replied_payload)
+            proxy.reply(message=msg, tag="multiply", payload=replied_payload)
 
 
 def master(group_name: str, sum_worker_number: int, multiply_worker_number: int, is_immediate: bool = False):
@@ -88,19 +88,20 @@ def master(group_name: str, sum_worker_number: int, multiply_worker_number: int, is_immediate: bool = False):
                                      session_type=SessionType.TASK,
                                      destination_payload_list=destination_payload_list)
         # Do some tasks with higher priority here.
-        replied_msgs = proxy.receive_by_id(session_ids)
+        replied_msgs = proxy.receive_by_id(session_ids, timeout=-1)
     else:
         replied_msgs = proxy.scatter(tag="job",
                                      session_type=SessionType.TASK,
-                                     destination_payload_list=destination_payload_list)
+                                     destination_payload_list=destination_payload_list,
+                                     timeout=-1)
 
     sum_result, multiply_result = 0, 1
     for msg in replied_msgs:
         if msg.tag == "sum":
-            print(f"{proxy.component_name} receive message from {msg.source} with the sum result {msg.payload}.")
+            print(f"{proxy.name} receive message from {msg.source} with the sum result {msg.payload}.")
             sum_result += msg.payload
         elif msg.tag == "multiply":
-            print(f"{proxy.component_name} receive message from {msg.source} with the multiply result {msg.payload}.")
+            print(f"{proxy.name} receive message from {msg.source} with the multiply result {msg.payload}.")
             multiply_result *= msg.payload
 
     # Check task result correction.
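The scatter hunks above assume a destination_payload_list has already been built. A sketch of one way to assemble it, assuming each entry is a (destination, payload) pair and reusing the proxy.peers_name lookup that appears in the send/receive example below; the even split of the payload is illustrative only.

# Illustrative payload: integers to be summed / multiplied by the workers.
random_integer_list = list(range(1, 101))

destination_payload_list = []
peers = proxy.peers_name["worker"]
chunk_size = len(random_integer_list) // len(peers)

for i, peer in enumerate(peers):
    # Each worker gets one even slice of the payload (an illustrative split, not from the diff).
    destination_payload_list.append(
        (peer, random_integer_list[i * chunk_size:(i + 1) * chunk_size])
    )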
@@ -22,11 +22,11 @@ def worker(group_name):
 
     # Nonrecurring receive the message from the proxy.
    for msg in proxy.receive(is_continuous=False):
-        print(f"{proxy.component_name} receive message from {msg.source}. the payload is {msg.payload}.")
+        print(f"{proxy.name} receive message from {msg.source}. the payload is {msg.payload}.")
 
         if msg.tag == "sum":
             replied_payload = sum(msg.payload)
-            proxy.reply(received_message=msg, tag="sum", payload=replied_payload)
+            proxy.reply(message=msg, tag="sum", payload=replied_payload)
 
 
 def master(group_name: str, is_immediate: bool = False):
@@ -49,19 +49,19 @@ def master(group_name: str, is_immediate: bool = False):
 
     for peer in proxy.peers_name["worker"]:
         message = SessionMessage(tag="sum",
-                                 source=proxy.component_name,
+                                 source=proxy.name,
                                  destination=peer,
                                  payload=random_integer_list,
                                  session_type=SessionType.TASK)
         if is_immediate:
             session_id = proxy.isend(message)
             # Do some tasks with higher priority here.
-            replied_msgs = proxy.receive_by_id(session_id)
+            replied_msgs = proxy.receive_by_id(session_id, timeout=-1)
         else:
-            replied_msgs = proxy.send(message)
+            replied_msgs = proxy.send(message, timeout=-1)
 
         for msg in replied_msgs:
-            print(f"{proxy.component_name} receive {msg.source}, replied payload is {msg.payload}.")
+            print(f"{proxy.name} receive {msg.source}, replied payload is {msg.payload}.")
 
 
 if __name__ == "__main__":
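A hedged sketch of how this send/receive example might be launched locally. The multiprocessing wiring, the group name, and the worker count are illustrative only (they are not part of the diff) and must match whatever expected_peers the master's Proxy declares.

import multiprocessing as mp

if __name__ == "__main__":
    group = "send_receive_example"   # hypothetical group name

    # Start the worker processes first so the master can discover its peers.
    workers = [mp.Process(target=worker, args=(group,)) for _ in range(2)]
    for p in workers:
        p.start()

    # Signature per the hunk above: master(group_name, is_immediate=False).
    master(group)

    for p in workers:
        p.join()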
@@ -0,0 +1,46 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+from maro.simulator.scenarios.cim.common import Action, DecisionEvent
+from maro.vector_env import VectorEnv
+
+
+if __name__ == "__main__":
+    with VectorEnv(batch_num=4, scenario="cim", topology="toy.5p_ssddd_l0.0", durations=100) as env:
+        for ep in range(2):
+            print("current episode:", ep)
+
+            metrics, decision_event, is_done = (None, None, False)
+
+            while not is_done:
+                action = None
+
+                # Usage:
+                # 1. Only push speicified (1st for this example) environment, leave others behind
+                # if decision_event:
+                #     env0_dec: DecisionEvent = decision_event[0]
+
+                #     # 1.1 After 1st environment is done, then others will push forward.
+                #     if env0_dec:
+                #         ss0 = env.snapshot_list["vessels"][env0_dec.tick:env0_dec.vessel_idx:"remaining_space"]
+                #         action = {0: Action(env0_dec.vessel_idx, env0_dec.port_idx, -env0_dec.action_scope.load)}
+
+                # 2. Only pass action to 1st environment (give None to other environments),
+                # but keep pushing all the environment, until the end
+                if decision_event:
+                    env0_dec: DecisionEvent = decision_event[0]
+
+                    if env0_dec:
+                        ss0 = env.snapshot_list["vessels"][env0_dec.tick:env0_dec.vessel_idx:"remaining_space"]
+
+                        action = [None] * env.batch_number
+
+                        # with a list of action, will push all environment to next step
+                        action[0] = Action(env0_dec.vessel_idx, env0_dec.port_idx, -env0_dec.action_scope.load)
+
+                metrics, decision_event, is_done = env.step(action)
+
+            print("Final tick for each environment:", env.tick)
+            print("Final frame index for each environment:", env.frame_index)
+
+            env.reset()
@@ -0,0 +1,12 @@
+# Simulation Results
+
+The table below shows the simulation results of the current topologies based on the `Best Fit` algorithm.
+
+In the oversubscription topologies, the oversubscription rate is `115%`.
+
+| Topology | PM Setting | Time Spent (s) | Total VM Requests | Successful Allocations | Energy Consumption | Total Oversubscriptions | Total Overload PMs |
+|:----:|-----|:--------:|:---:|:-------:|:----:|:---:|:---:|
+| 10k | 100 PMs, 32 Cores, 128 GB | 104.98 | 10,000 | 10,000 | 2,399,610 | 0 | 0 |
+| 10k.oversubscription | 100 PMs, 32 Cores, 128 GB | 101.00 | 10,000 | 10,000 | 2,386,371 | 279,331 | 0 |
+| 336k | 880 PMs, 16 Cores, 112 GB | 7,896.37 | 335,985 | 109,249 | 26,425,878 | 0 | 0 |
+| 336k.oversubscription | 880 PMs, 16 Cores, 112 GB | 7,903.33 | 335,985 | 115,008 | 27,440,946 | 3,868,475 | 0 |
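For reference, the `Best Fit` rule behind these numbers picks, among the valid PMs, the one with the fewest remaining CPU cores. A one-line equivalent of the selection loop in the (removed) launcher further below, reusing its valid_pm_info array:

import numpy as np

# valid_pm_info[:, 0] = cpu_cores_capacity, valid_pm_info[:, 1] = cpu_cores_allocated
chosen_idx = int(np.argmin(valid_pm_info[:, 0] - valid_pm_info[:, 1]))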
@@ -1,7 +0,0 @@
-env:
-  scenario: vm_scheduling
-  topology: azure.2019.10k
-  start_tick: 0
-  durations: 8638
-  resolution: 1
-  seed: 88
@@ -1,74 +0,0 @@
-import io
-import os
-import random
-import timeit
-
-import yaml
-
-from maro.simulator import Env
-from maro.simulator.scenarios.vm_scheduling import AllocateAction, DecisionPayload, PostponeAction
-from maro.utils import convert_dottable
-
-CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "config.yml")
-with io.open(CONFIG_PATH, "r") as in_file:
-    raw_config = yaml.safe_load(in_file)
-    config = convert_dottable(raw_config)
-
-
-if __name__ == "__main__":
-    start_time = timeit.default_timer()
-
-    env = Env(
-        scenario=config.env.scenario,
-        topology=config.env.topology,
-        start_tick=config.env.start_tick,
-        durations=config.env.durations,
-        snapshot_resolution=config.env.resolution
-    )
-
-    if config.env.seed is not None:
-        env.set_seed(config.env.seed)
-        random.seed(config.env.seed)
-
-    metrics: object = None
-    decision_event: DecisionPayload = None
-    is_done: bool = False
-    action: AllocateAction = None
-    metrics, decision_event, is_done = env.step(None)
-
-    while not is_done:
-        valid_pm_num: int = len(decision_event.valid_pms)
-        if valid_pm_num <= 0:
-            # No valid PM now, postpone.
-            action: PostponeAction = PostponeAction(
-                vm_id=decision_event.vm_id,
-                postpone_step=1
-            )
-        else:
-            # Get the capacity and allocated cores from snapshot.
-            valid_pm_info = env.snapshot_list["pms"][
-                env.frame_index:decision_event.valid_pms:["cpu_cores_capacity", "cpu_cores_allocated"]
-            ].reshape(-1, 2)
-            # Calculate to get the remaining cpu cores.
-            cpu_cores_remaining = valid_pm_info[:, 0] - valid_pm_info[:, 1]
-            # Choose the one with the closet remaining CPU.
-            chosen_idx = 0
-            minimum_remaining_cpu_cores = cpu_cores_remaining[0]
-            for i, remaining in enumerate(cpu_cores_remaining):
-                if remaining < minimum_remaining_cpu_cores:
-                    chosen_idx = i
-                    minimum_remaining_cpu_cores = remaining
-            # Take action to allocate on the closet pm.
-            action: AllocateAction = AllocateAction(
-                vm_id=decision_event.vm_id,
-                pm_id=decision_event.valid_pms[chosen_idx]
-            )
-        metrics, decision_event, is_done = env.step(action)
-
-    end_time = timeit.default_timer()
-    print(
-        f"[Best fit] Topology: {config.env.topology}. Total ticks: {config.env.durations}."
-        f" Start tick: {config.env.start_tick}."
-    )
-    print(f"[Timer] {end_time - start_time:.2f} seconds to finish the simulation.")
-    print(metrics)
@@ -1,7 +0,0 @@
-env:
-  scenario: vm_scheduling
-  topology: azure.2019.10k
-  start_tick: 0
-  durations: 8638
-  resolution: 1
-  seed: 666