Merge v0.2 features into the master branch to release a new version (#297)
* refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. 
used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. 
generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. 
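The double DQN feature from #188 (with `is_double` enabled in the DQN config) comes down to letting the online network pick the greedy next action while the target network evaluates it. A minimal sketch of that target computation in PyTorch; the function and argument names (`q_net`, `target_q_net`) are illustrative assumptions, not MARO's actual API:

```python
import torch

def dqn_target(reward, next_state, done, q_net, target_q_net, gamma=0.99, is_double=True):
    """Compute TD targets for (double) DQN; names and signature are illustrative only."""
    with torch.no_grad():
        if is_double:
            # Double DQN: the online network selects the greedy next action...
            next_action = q_net(next_state).argmax(dim=1, keepdim=True)
            # ...and the target network evaluates that action.
            next_q = target_q_net(next_state).gather(1, next_action).squeeze(1)
        else:
            # Vanilla DQN: the target network both selects and evaluates.
            next_q = target_q_net(next_state).max(dim=1).values
        return reward + gamma * (1.0 - done) * next_q
```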
* fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang 
<Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. 
added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
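The "best fit policy" entries under the data pipeline work (#199) refer to a simple placement rule for the VM scheduling scenario: among the physical machines that can still host the request, pick the one with the least remaining capacity. A hedged sketch with hypothetical field names (the real scenario tracks more resources than CPU):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PhysicalMachine:  # hypothetical record; the actual PM node carries more attributes
    id: int
    cpu_capacity: int
    cpu_allocated: int

    @property
    def remaining_cpu(self) -> int:
        return self.cpu_capacity - self.cpu_allocated

def best_fit(valid_pms: List[PhysicalMachine], vm_cpu_requirement: int) -> Optional[int]:
    """Return the id of the tightest-fitting PM, or None to postpone the request."""
    candidates = [pm for pm in valid_pms if pm.remaining_cpu >= vm_cpu_requirement]
    if not candidates:
        return None
    return min(candidates, key=lambda pm: pm.remaining_cpu).id
```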
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. 
renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
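The event buffer changes in #180 ("unfold sub-events, insert after parent", "add action as 1st sub event to ensure the executing order") describe cascade events whose sub-events execute immediately after their parent. A simplified sketch of that idea; class names and methods here are assumptions, not the actual maro.event_buffer implementation:

```python
from collections import deque

class CascadeEvent:
    """An event that can carry immediate sub-events (illustrative only)."""
    def __init__(self, tick, payload=None):
        self.tick = tick
        self.payload = payload
        self.immediate_events = []

    def add_immediate_event(self, event, is_head=False):
        """Attach a sub-event; the tick must match the parent (tick validation)."""
        if event.tick != self.tick:
            return False
        if is_head:
            # e.g. the Action event goes first so it executes before other sub-events.
            self.immediate_events.insert(0, event)
        else:
            self.immediate_events.append(event)
        return True

class EventBuffer:
    """Executes events in order, unfolding each event's sub-events right after it."""
    def __init__(self):
        self._pending = deque()

    def insert_event(self, event):
        self._pending.append(event)

    def execute(self):
        executed = []
        while self._pending:
            event = self._pending.popleft()
            executed.append(event)
            # Unfold sub-events and insert them after the parent, preserving their order.
            self._pending.extendleft(reversed(event.immediate_events))
        return executed
```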
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requirement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node mapping * Init vm scheduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ 
return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. 
refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. 
used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
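The "added soft_update function to learningModel" entries above refer to the usual Polyak averaging of target-network parameters toward the online network. A small sketch of that operation in PyTorch; the real LearningModel method may differ in name and signature:

```python
import torch.nn as nn

def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.05) -> None:
    """Polyak update: target <- tau * source + (1 - tau) * target, parameter by parameter."""
    for target_param, source_param in zip(target.parameters(), source.parameters()):
        # Update the stored tensors directly so autograd history is not affected.
        target_param.data.mul_(1.0 - tau).add_(tau * source_param.data)
```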
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? 
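"Implemented dump snapshots and convert to CSV" (#233) amounts to flattening per-tick, per-node attribute values into a table. A generic sketch of such a dump, independent of MARO's actual snapshot-list API and file layout; the data structure assumed here is a simplification:

```python
import csv
from typing import Dict, List, Sequence

def dump_snapshots_to_csv(
    snapshots: Sequence[Dict[str, List[float]]],  # one {attribute: per-node values} dict per tick
    attributes: List[str],
    path: str,
) -> None:
    """Write one CSV row per (tick, node index) with the requested attributes."""
    with open(path, "w", newline="") as fp:
        writer = csv.writer(fp)
        writer.writerow(["tick", "node_index", *attributes])
        for tick, frame in enumerate(snapshots):
            for node_index in range(len(frame[attributes[0]])):
                writer.writerow([tick, node_index, *(frame[attr][node_index] for attr in attributes)])
```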
* delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexception to clierror * dashboard test * change result * change assertion * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update 
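The "change scenario to enum, divide file path into a separated class" entry in the visualization work suggests a small refactor along the lines sketched below; both names (`GlobalScenarios`, `DumpPaths`) are hypothetical and chosen only to illustrate the shape of the change:

```python
from enum import Enum
from pathlib import Path

class GlobalScenarios(Enum):  # hypothetical enum of supported scenarios
    CIM = "cim"
    CITI_BIKE = "citi_bike"
    VM_SCHEDULING = "vm_scheduling"

class DumpPaths:
    """Keeps all dump-file locations for one visualization run in a single place."""
    def __init__(self, root: str, scenario: GlobalScenarios):
        self._base = Path(root) / scenario.value

    @property
    def snapshot_csv(self) -> Path:
        return self._base / "snapshots.csv"

    @property
    def manifest(self) -> Path:
        return self._base / "manifest.yml"
```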
* doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * syntax fix (#253) * syntax fix * syntax fix * syntax fix * rm unwanted import Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm oversubscription (#246) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 vm scheduling decision event (#257) * Fix data preparation bug * Add frame index * V0.2 PG, K-step and lambda return utils (#155) * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * merged with v0.2_embedded_optims * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * revised * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. 
refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * revised code based on revised abstractions * fixed some bugs * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added shared_module property to LearningModel * added shared_module property to LearningModel * fixed a bug with k-step return in AC * fixed a bug * fixed a bug * merged pg, ac and ppo examples * fixed a bug * fixed a bug * fixed naming for ppo * renamed some variables in PPO * added ActionWithLogProbability return type for PO-type algorithms * fixed a bug * fixed a bug * fixed lint issues * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * updated cim PO example code according to changes in maro/rl * removed early stopping from CIM dqn example * combined ac and ppo and simplified example code and config * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * put PG and AC under PolicyOptimization class and refined examples accordingly * fixed lint issues * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to 
LearningModel * typo fix * fixed lint issues * updated notebook * updated cim example for policy optimization * typo fix * typo fix * typo fix * typo fix * misc edits * minor edits to rl_toolkit.rst * checked out docs from master * fixed typo in k-step shaper * fixed lint issues * bug fix in store * lint issue fix * changed default max_ep to 100 for policy_optimization algos * vis doc update to master (#244) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. 
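The "K-step and lambda return utils" from #155 compute bootstrapped multi-step returns for the policy-optimization algorithms. A hedged sketch of a k-step return helper; the real shaper's interface and edge-case handling may differ, and the trajectory here is assumed to end at the last reward, so no bootstrap is applied past it:

```python
from typing import List

def k_step_returns(rewards: List[float], values: List[float], k: int, gamma: float = 0.99) -> List[float]:
    """G_t = sum_{i<k} gamma^i * r_{t+i} + gamma^k * V(s_{t+k}), truncated at the trajectory end."""
    num_steps = len(rewards)
    returns = []
    for t in range(num_steps):
        horizon = min(k, num_steps - t)
        g = sum((gamma ** i) * rewards[t + i] for i in range(horizon))
        if t + horizon < num_steps:
            # Bootstrap from the value estimate of the state k steps ahead.
            g += (gamma ** horizon) * values[t + horizon]
        returns.append(g)
    return returns
```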
* V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. 
added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. 
generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. 
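The proxy rejoin work in #158 above (rejoin parameters, a message cache, checkpointing, a FaultToleranceAgent) is, at its core, a bounded reconnect loop around the communication driver. A rough sketch of that pattern with a hypothetical `connect` hook and illustrative parameter names, not MARO's real Proxy API:

```python
import time

class RejoinError(Exception):
    """Raised when a peer cannot rejoin within the allowed number of attempts."""

def rejoin(connect, max_rejoin_times=10, retry_interval=5.0):
    """Try to re-establish a connection up to `max_rejoin_times` times.

    `connect` is any callable returning True on success; in MARO the
    equivalent logic lives inside the Proxy, alongside a message cache that
    replays messages missed while the peer was away.
    """
    for attempt in range(1, max_rejoin_times + 1):
        if connect():
            return attempt
        time.sleep(retry_interval)
    raise RejoinError(f"failed to rejoin after {max_rejoin_times} attempts")

# Toy usage: the third attempt succeeds.
attempts = iter([False, False, True])
print(rejoin(lambda: next(attempts), retry_interval=0.0))
```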
* capitalize env variable name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint: f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different errors * feat: add special error handlers * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver into proxy.__init__ as a temp fix for the issue. 
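Looking back at the double DQN and dueling additions tracked in #188 a few entries above: the double-DQN part reduces to one change in how bootstrap targets are computed, with the online network choosing the greedy next action and the target network evaluating it. A minimal PyTorch sketch of that target computation (network and variable names are illustrative, not MARO's DQN classes):

```python
import torch
import torch.nn as nn

def double_dqn_targets(q_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Compute double-DQN bootstrap targets.

    The online q_net selects the greedy next action; the target_net evaluates
    it, which reduces the overestimation bias of vanilla DQN.
    """
    with torch.no_grad():
        greedy_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, greedy_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q

# Toy usage: two tiny value heads over a 4-dim state with 3 actions.
q_net = nn.Linear(4, 3)
target_net = nn.Linear(4, 3)
targets = double_dqn_targets(
    q_net, target_net,
    rewards=torch.tensor([1.0, 0.0]),
    next_states=torch.randn(2, 4),
    dones=torch.tensor([0.0, 1.0]),
)
```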
* Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct 
import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. 
renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
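Among the VM scheduling scenario commits above ("Add the ValidPm class", "best fit policy added"), the best-fit rule is simple enough to show in a few lines: among the PMs that can still hold the VM, pick the one with the least remaining capacity. A toy sketch with made-up data structures, not the scenario's actual classes:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PM:
    id: int
    cpu_capacity: int
    cpu_allocated: int = 0

    @property
    def remaining_cpu(self) -> int:
        return self.cpu_capacity - self.cpu_allocated

def best_fit(valid_pms: List[PM], vm_cpu: int) -> Optional[PM]:
    """Return the valid PM that fits the VM with the least leftover CPU."""
    candidates = [pm for pm in valid_pms if pm.remaining_cpu >= vm_cpu]
    if not candidates:
        return None  # a scheduler would typically postpone the request here
    return min(candidates, key=lambda pm: pm.remaining_cpu)

# Toy usage: the 8-core request lands on the PM with 10 free cores, not 32.
pms = [PM(0, 32), PM(1, 16, cpu_allocated=6), PM(2, 16, cpu_allocated=12)]
chosen = best_fit(pms, vm_cpu=8)
```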
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requirement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node mapping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) 
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * update param name * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made 
NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. 
refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. 
used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
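The exploration overhaul in #198 above (a NoiseExplorer whose `__call__` does batch processing and returns a numpy array, with `set_parameters` replacing `update`) can be illustrated with a bare-bones Gaussian-noise wrapper. The class below is only an illustration of that interface, not MARO's explorer classes:

```python
import numpy as np

class GaussianNoiseExplorer:
    """Adds Gaussian noise to a batch of continuous actions and clips them."""

    def __init__(self, noise_scale: float = 0.1, lower: float = -1.0, upper: float = 1.0):
        self.noise_scale = noise_scale
        self.lower, self.upper = lower, upper

    def set_parameters(self, noise_scale: float):
        # An exploration schedule (e.g., from a scheduler) would update the scale here.
        self.noise_scale = noise_scale

    def __call__(self, actions) -> np.ndarray:
        # Batch processing: `actions` has shape (batch_size, action_dim).
        actions = np.asarray(actions, dtype=float)
        noisy = actions + np.random.normal(scale=self.noise_scale, size=actions.shape)
        return np.clip(noisy, self.lower, self.upper)

# Toy usage on a batch of two 3-dimensional actions.
explorer = GaussianNoiseExplorer(noise_scale=0.05)
noisy_actions = explorer(np.zeros((2, 3)))
```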
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * doc update * new link * image update * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * image change * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? 
* delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update 
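For the manifest-file entries noted above (#190 and #201: a manifest written alongside the dumped snapshot files, later reformatted for easier reading), the idea is just a small index next to the dump output. A hypothetical example with illustrative field names and file format, not the dumper's actual schema:

```python
import json

def write_dump_manifest(path, scenario, dump_files):
    """Write a small JSON index describing a snapshot dump folder."""
    manifest = {
        "scenario": scenario,
        "files": sorted(dump_files),  # e.g. one CSV per node type
    }
    with open(path, "w") as fp:
        json.dump(manifest, fp, indent=2)  # indented for easy reading

write_dump_manifest(
    "manifest.json",
    scenario="cim",
    dump_files=["ports.csv", "vessels.csv", "decision_events.csv"],
)
```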
* doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * bug fix related to np array divide (#245) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Master.simple bike (#250) * notebook for simple bike repositioning added * add simple rule-based algorithms * unify input * add policy based on statistics * update be for simple bike scenario to fit latest event buffer changes (#247) * change rendered graph * figures updated * change notebook * matplot updated * figures updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: wesley <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * simple bike repositioning article: formula updated * checked out docs/source from v0.2 * aligned with v0.2 * rm unwanted import * added references in policy_optimization.py * fixed lint issues 
Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com> Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 backend dynamic node support (#172) * update lint workflow * fix workflow issue * Update lint.yml * Create tox.ini * Update lint.yml * Update lint.yml * Update tox.ini * Update lint.yml * Delete tox.ini from root folder, move it to .github/linters * Update CONTRIBUTING.md * add more comments * update lint conf to ignore cli banner issue * change extension implementation from c to cpp * update script to gen cpp files * backend base interface redefine * interface revamp for np backend * 1st step for revamp * bug fix * draft * implementation of attribute * implementation of backend * remove backend switching * draft raw backend wrapper * correct function parameter type * 1st runable version * bug fix for types * ut passed * change CRLF to LF * fix get_node_info interface * add raw test in frame ut * return np.array for all query result * use ticks from backend * set init value * snapshot ut passed * support set default backend by environemnt variable * env ut with different backend * fix take snapshot index bug * test under both backends * ignore generated cpp file * fix lint isues * more lint fix * use ordered map to store ticks to keep the order * remove test code * refine dup code * refine code to avoid too much if/else * handle and raise exception for attr getter * change the way to handle cpp exception, use cython runtimeerror instead * add missing function, and fix bug in np impl * fix lint issue * specify c++11 flag for compilers * use normal field assignment instead initializer list, as linux gcc will complain it * add np ignore macro * try to refine token pasting operator to avoid error on linux * more pasting operator issue fix * remove un-used options * update workflow files to fit new backend * 1st version of dynamic backend structure * setup ut for cpp using lest * bitset complete * attributestore and ut * arrange * copy_to * current frame * ut for frame * bug fix and ut correct * fix issue that value not correct after arrange * fix bug in test case * frame update * change the way to add nodes, support add node from middle * frame in backend * snapshotlist code complete * add size method for snapshotlist, add ut template * make sure snapshot max size not be 0 * add max size * fix query parameters * fix attribute store extend error * add function to retrieve attribute from snapshotlist * return nan for invalid index * add function to check if nan for float attribute only * fix bug that not update _last_tick for snapshot list, that cause take snapshot for same tick crash * add functions to expose internal state under debug mode, make it easy to do unit test * fix issue that cause 
overlap logic skiped * ut passed for all implemented functions * remove query in ut, as it not completed yet * refine querying interfaces, use 2 functions for 1 querying * snapshot query, * use pointer instead weak_ptr * backend impl * set default parameters value * query bug fix, * bug fix: new_attr should return attr id not node id * use macro to create attribute getters * add reset support * change the way to reset, avoid allocation time * test reset for attributestore * use Bitset instead vector<bool> to make it easy to reset * refine backend interfaces to make it compact with old one * correct quering interface, cython compile passed * bug fix: get_ticks not set correct index * correct cpp backend binding, add type for frame * correct ut for snapshot * bug fix: query cause crash after snapshot reset * fix env test * bug fix: is_nan should check data type first * fix cim ut issues with raw backend * fix citibike ut issues for raw backend * add interfaces to support dynamic nodes, not tested * bug fix: access cpp object without cdef * bug fix: missing impl for dynamic methods * ut for append nodes * return node number dynamiclly * remove unused parameters for snapshot * remove unused code * allow get attribute for deleted node * ut for delete and resume node * function to set attribute slot * bug fix: set attribute will cause crash * bug fix: remove append node when reset cause exception * bug fix: frame.backend_type return incorrect name * backends performance comparison * correct internal type * correct warnings * missing ; * formating * fix lint issue * simple the way to copy mapping * add dump interfaces * frame dump * ignore if dump path is not exist * bug fix: use max slots instead of current slots for padding in snapshot querying * use max slot number in history instead of current for padding * dump for snapshot * close file at the end * refine snapshot dump function * fix lint issue * avoid too much allocate operation * use pointer instead reference for furthure changes * avoid 2 times map copy * add comments for missing functions * performance optimize * use emplace instead push * use emplace instead push * remove cpp files * add missing lisence * ignore .vs folder * add lest lisence for cpp unittest * Delete CMakeLists.txt * add error msg for exception, make it easy to identify error at python side * remove old codes * replace with new code * change IDENTIER to NODE_TYPE and ATTR_TYPE * build pass * fix attr type not correct bug * reomve unused comment * make frame ut pass * correct the max snapshots checking * fix test case * add missing file * correct performance test * refine attribute code * refine bitset code * update FrameBase doc about switch backend * correct the exception name * refine frame code * refine node code * refine snapshot list code * add is_const and is_list when adding attribute * support query const attribute without tick exist * add operations for list attribute * remove cache as we have list attribute * add remove and insert for list attribute * add for-loop support for list attribute * fix bug that not update list attribute slot number after operations * test for dynamic features * frame dump * dump for snapshot list * fix issue on gcc compiler * add missing file * fix lint issues * refine the exception, more comments * fix lint issue * fix lint issue * use simulate enum instead of str * Use new type instead old in tests * using mapping instead if-else * remove generated code * use mapping to reduce too much if-else * add default attribute type int if 
not provided or invalid provided * remove generated code * update workflow with code gen * more frame test * add missing files * test: cover maro.simulator.utils.common * update test with new scenario * comments * tests * update doc * fix lint and comments * CRLF to LF * fix lint issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 vm oversub docs (#256) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment * Update overload to the VM docs * Update docs * Update vm docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 ddpg (#252) * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. 
refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * some revision to DDPG * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * fixed some issues with DDPG code * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * fixed naming issues * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * tmp commit * tmp commit * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * fixed learning model naming * fixed conflicts * updated ddpg example * misc edits * 1. renamed CIMAgent to DQNAgent; 2. 
moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * added ddpg example for cim * fixed some bugs * removed dimension check in absence of shared stack * fixed a typo * bug fixes * bug fixes * aligned with v0.2 * aligned with v0.2 * fixed lint issues * added reference in ddpg.py * fixed lint issues * fixed lint issues * fixed lint issues * removed ddpg example * checked out files from origin/v0.2 before merging Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 cli refactoring (#227) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * refactor: extract reusable methods to GrassExecutor * feat: refine validation.py and add docstrings * fix: add remote prefix to ssh function * style: refine logging output * fix: extract param 'vm_name' * fix: linting errors * feat: add NodeStatus and ContainerStatus at executors * feat: use master_node_size as the size of build_node_image_vm * fix: refine comments * feat: add "state" key for node_details * fix: linting errors * fix: deployment error when ssh_port is the default port * refactor: extract utils/*.py in scripts * style: single quote to double quote * refactor: refine folder structure of scripts * fix: linting errors * fix: add executable to fix error initialization * refactor: use SubProcess to execute commands in scripts * refactor: refine script namings * refactor: extract utils/*.py and systemd/*.service in agents * feat: refine Exception structure, add SubProcess class in agents * feat: use psutil to get resource details, move resource details initialization to agents * fix: linting errors * feat: use docker sdk in node_agent * feat: extract RedisExecutor in agents * test: remove image when tearing down * feat: add LoadImageAgent * feat: move node status update to agents * refactor: move utils folder to upper level in scripts * feat: add node_api_server, refine agents folder structure * fix: linting errors * refactor: refine folder structure in grass/lib * refactor: build DeploymentValidator class * refactor: create DetailsReader, DetailsWriter, delete sync mode * refactor: rename DockerManager to DockerController * refactor: rename RedisManager to RedisController * refactor: rename AzureExecutor to AzureController * refactor: create NameCreator * refactor: create PathConvertor * refactor: rename checkers to details_validity_wrapper * refactor: rename lock to operation_lock_wrapper * refactor: create FileSynchronizer * refactor: create redis instance in RedisController * feat: add master_api_server, move job related scripts to api_server * refactor: move node related scripts to api_server * fix: use "DELETE" instead of "DEL" as http method * refactor: use mapping names instead of namings like "sths_details" * feat: move master related scripts to api_server * feat: move containers related scripts to api_server * fix: add gracefully wait for remote_start_master_services * feat: move image_files related scripts to api_server * fix: improper test in the training stage * refactor: use local variable "URL_PREFIX" directly, add 's' in node_api_client * refactor: refine namings in services * feat: move clean 
related scripts to api_server * refactor: delete "public_key" field * feat: build MasterApiClient * refactor: delete sync_mkdir * feat: refine locks in node_details * feat: build DockerController for grass/utils * refactor: rename Extractor to Controller * feat: move schedule related components to api_server * fix: incorrect allocation when starting batch jobs * fix: missing field "containers" in job_details * feat: add delete_job in master_api_server * feat: add logger in agents * fix: no "resources" field when scale up node at the very beginning * feat: use Process back instead of Thread in node_agent * feat: add 'v1' prefix to api_servers' urls * refactor: move lib/aks under lib/clouds * refactor: move lib/k8s_configs to lib/configs, move aks related configs to clouds/aks, delete volumn mount in redis * feat: extract K8sExecutor * fix: add one more searching layer of pakcage_data at maro.cli.k8s * refactor: move lib/configs/nvidia to lib/clouds/aks, make create() as a staticmethod at k8s mode * refactor: move id init to standardize_create_deployment in grass/azure mode * fix: use GlobalParams instead of hard-coded data * feat: build K8sDetailsReader, K8sDetailsWriter * feat: use k8s sdk to replace subprocess call * refactor: delete redundant vars * refactor: move more methods to K8sExecutor * test: use legal naming in tests/cli/k8s * refactor: refine logging messages * refactor: make create() as a staticmethod at grass/azure mode, refine logging messages * feat: build ArmTemplateParameterBuilder in K8sAzureExecutor * refactor: remove redundant params * refactor: rename /clouds to /modes * refactor: refine structures and logging messages in GrassExecutor * feat: add 'PENDING' to NodeStatus * feat: refine build_job_details for create schedule in grass/azure * feat: refine build_job_details for create schedule in k8s/aks * feat: use node_join schema in grass/azure * refactor: replace /.maro with /.maro-shared, replace admin_username with node_username, remove redundant snippets in /grass/lib/scirpts * refactor: add 'ssh', 'api_server' into master_details and node_details * refactor: move master runtine params initialization into api_server * refactor: refine namings * feat: reconstruct grass/on-premises with new schema * refactor: delete field 'user' in grass_azure_create * refactor: rename 'blueprints_v1' to 'blueprints' * refactor: move some GlobalPaths to subfolders * refactor: replace 'connection' field with 'master' or 'node' * refactor: move start_service scripts to init_master.py * refactor: rename grass/master/release to grass/master/delete_master * refactor: load local_details in node services, refine script namings * refactor: move invocations of start_node and stop node to api server * fix: add missing imports * refactor: rename SubProcess to Subprocess * refactor: delete field 'user' in k8s_aks_create * refactor: refine folder structures in /.maro/clusters/cluster * refactor: move /logs to /clusters/{cluster_name} * refactor: refine filenames * fix: export DEBIAN_FRONTEND=noninteractive to reduce irrelevant warnings * refactor: refine code structures, delete redundant code * refactor: change /{cluster_name}/details.yml to /{cluster_name}/cluster_details.yml * feat: add rsa+aes data encryption on dev-master communication * fix: change MasterApiClient to RedisController in node-related services and scripts * refactor: remove all "{cluster_name}" in redis keys * refactor: extract init_master and create_user to GrassExecutor * test: refine tests in grass/azure and k8s/aks * 
refactor: refine ArmTemplateParameterBuilder * feat: change the order of installation in init_build_node_image_vm.py * fix: add user/admin_id to grass_on_premises_create.yml * fix: change outdated container names * feat: add standardize_join_cluster_deployment in grass/on-premises * feat: add init_node_runtime_env in join_cluster.py * refactor: refine code structure in join_cluster.py * test: add TestGrassOnPremises * refactor: refine ARM templates * fix: linting errors * fix: test requirements error * fix: arm linting errors * refactor: late import in grass, k8s * style: refine load_parser_grass * style: refine load_parser_k8s * docs: update orchestrations * fix: fix get_job_logs * docs: add docs for GrassAzureExecutor, GrassExecutor * docs: add docs for GrassOnPremisesExecutor * docs: add docs for /grass/scripts * docs: add docs for /grass/services * docs: add docs for /grass/utils * docs: add docs for k8s * try paramiko of another version * rollback paramiko package version Co-authored-by: Wesley <Wenlei.Shi@microsoft.com> * Refine joint decision sequential action mode (#219) * refine the logic about jont decision sequential action mode to match current event buffer implementation * fix lint issue * fix lint issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 merge algorithm into agent (#259) * merged algorithm with agent * bug fixes * fix * bug fixes * fixed lint issues and renamed models to model * removed exp pool type spec in AbsAgent * fixed lint issues * dqn exp pool bug fix * minor issues * updated notebooks and examples according to rl toolkit changes * updated images * moved exp pool init inside DQN * renamed column_based_store to simple_store * fixed lint issues * fixed lint issues * lint issue fix * lint issue fix * fixed bugs in test_store * typo fix * minor edits * lint issue fix * 1. removed state_shaper, action_shaper and exp_shaper abstractions; 2. used torch Categorical for sampling actions; 3. removed input_dim and output_dim properties from LearningModel * updated notebook * removed simple agent manager * fixed lint issues * fixed lint issues * bug fix * refined LearningModel * updated cim example doc * lint issue fix * small refinements * replaced ActionInfo with torch Categorical's log_prob for policy_optimization algorithms * lint issue fix * formatting * 1. moved early stopping logic inside scheduler; 2. added scheduler options for optimizers in learning-model * minor formatting fixes * refinement * rm unwanted import * add List typing in schedular * lint issue fix Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wesley <Wenlei.Shi@microsoft.com> * V0.2 gnn refactoring (#274) * merged algorithm with agent * bug fixes * fix * bug fixes * fixed lint issues and renamed models to model * removed exp pool type spec in AbsAgent * fixed lint issues * dqn exp pool bug fix * minor issues * updated notebooks and examples according to rl toolkit changes * updated images * moved exp pool init inside DQN * renamed column_based_store to simple_store * fixed lint issues * fixed lint issues * lint issue fix * lint issue fix * fixed bugs in test_store * typo fix * minor edits * lint issue fix * 1. removed state_shaper, action_shaper and exp_shaper abstractions; 2. used torch Categorical for sampling actions; 3. 
removed input_dim and output_dim properties from LearningModel * updated notebook * removed simple agent manager * fixed lint issues * fixed lint issues * bug fix * refined LearningModel * updated cim example doc * lint issue fix * small refinements * replaced ActionInfo with torch Categorical's log_prob for policy_optimization algorithms * refactored gnn example and added single-process script * removed obsolete files from gnn * lint issue fix * formatting * 1. moved early stopping logic inside scheduler; 2. added scheduler options for optimizers in learning-model * minor formatting fixes * refinement * rm unwanted import * add List typing in schedular * lint issue fix * removed redundant parameters for GNNBasedACModel * restored duration to 1120 Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wesley <Wenlei.Shi@microsoft.com> * Add vector env support (#266) * 1st version * make vectorenv can import under module root * allow outside control which environment to push, so we do not need to control the tick for each environments * remove comment * lint fixing * add test for vector env, correct the batch number * lint fixing * reduce parameters * Update vector env ut to test if support raw backend * correct comments on hello * fix review comments, cim actiontype wip * add a compatiable way to handle ActionType for cim scenario * lint fix * correct the action type to handle previous action * add doc string for wrappers Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 Rule-based algorithms for VM Scheduling (#255) * rule_based_algorithm * revise_the_code_by_aiming_hao * revise_the_code_by_aiming_hao * use the np.argmin * Update best_fit.py fix the "np not defined" * refine the code * fix the error * refine the code * fix the error * fix the error * refine the code * remove the history * refine the code * update first_fit * Refine the code Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com> * delete duplicated rule based algorithms for VM scheduling * Add slot filter functions for node attribute (#273) * add where filter for general usage * test for general filter * simpler comparison for attribute * filter on raw * fix array fetch bug * ut for base comparison * lint fix * remove unused variables * update ignore * Fix coding style (#284) * V0.2 vm region support (#258) * Region init * Add region, zone, cluster * Fix bug * Add update parent id * Update PM config * Update number * Fix import order * Fix bug * Modify config * Add cluster attribute * Refine naming * Fix bug * Modify 336k config * Update region * Update config * Update pm config * pylint * Add comment * Update based on PR comment * Modify config and zone class * Add unit test * Update region part * Update pylint * Modify unit test * Refactor region structure * Add comment and fix style * Fix machine num bugs * Modify config * Fix style * Fix bugs and add empty machine attributes * Add update upper level metrics * Update config * Fix lint style * Modify doc strings * Fix amount counter * Update unit test * fix lint style * Update the ids init * Init total and empty machine num * Update lint style * Fix snapshot attributes initial state * Update config * add topologies for over-subscription and multi-cluster to be compatible with the previous topologies * Add simulation result * Move readme * Add overload results Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang 
<Wang.Jinyu@microsoft.com> * V0.2 rule based algorithm readme (#282) * Add README.md and refine the bin_packing algorithm * refine round_robin and bin_packing * Update README.md * Refine the code and README.md * Refine the bin_packing and round_robin * Refine the code Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com> * Feature: Add a cli command to support create new project. (#279) * maro project new * remove maro project run * add get_metrics to template * add license * more comments * lint issue fix * linting issue fix * fix linting issue * linting issue fix * remove unused code gen * include template files * fix incorrect comment * include topologies for vm_scheduling scenario * rename to PositiveNumberValidator * refine command line comment * refine topology command comment * add a simple doc for new command * fix incorrect value for dummy frame * correct issues in docs * more comments on set_state * doc issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * CLI visualization support and maro grass local mode (#277) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * refactor: extract reusable methods to GrassExecutor * feat: refine validation.py and add docstrings * fix: add remote prefix to ssh function * style: refine logging output * fix: extract param 'vm_name' * fix: linting errors * feat: add NodeStatus and ContainerStatus at executors * feat: use master_node_size as the size of build_node_image_vm * fix: refine comments * feat: add "state" key for node_details * fix: linting errors * fix: deployment error when ssh_port is the default port * refactor: extract utils/*.py in scripts * style: single quote to double quote * refactor: refine folder structure of scripts * fix: linting errors * fix: add executable to fix error initialization * refactor: use SubProcess to execute commands in scripts * refactor: refine script namings * refactor: extract utils/*.py and systemd/*.service in agents * feat: refine Exception structure, add SubProcess class in agents * feat: use psutil to get resource details, move resource details initialization to agents * fix: linting errors * feat: use docker sdk in node_agent * feat: extract RedisExecutor in agents * test: remove image when tearing down * feat: add LoadImageAgent * feat: move node status update to agents * refactor: move utils folder to upper level in scripts * feat: add node_api_server, refine agents folder structure * fix: linting errors * refactor: refine folder structure in grass/lib * refactor: build DeploymentValidator class * refactor: create DetailsReader, DetailsWriter, delete sync mode * refactor: rename DockerManager to DockerController * refactor: rename RedisManager to RedisController * refactor: rename AzureExecutor to AzureController * refactor: create NameCreator * refactor: create PathConvertor * refactor: rename checkers to details_validity_wrapper * refactor: rename lock to operation_lock_wrapper * refactor: create FileSynchronizer * refactor: create redis instance in RedisController * feat: add master_api_server, move job related scripts to api_server * refactor: move node related scripts to api_server * fix: use "DELETE" instead of "DEL" 
as http method * refactor: use mapping names instead of namings like "sths_details" * feat: move master related scripts to api_server * feat: move containers related scripts to api_server * fix: add gracefully wait for remote_start_master_services * feat: move image_files related scripts to api_server * fix: improper test in the training stage * refactor: use local variable "URL_PREFIX" directly, add 's' in node_api_client * refactor: refine namings in services * feat: move clean related scripts to api_server * refactor: delete "public_key" field * feat: build MasterApiClient * refactor: delete sync_mkdir * feat: refine locks in node_details * feat: build DockerController for grass/utils * refactor: rename Extractor to Controller * feat: move schedule related components to api_server * fix: incorrect allocation when starting batch jobs * fix: missing field "containers" in job_details * feat: add delete_job in master_api_server * feat: add logger in agents * fix: no "resources" field when scale up node at the very beginning * feat: use Process back instead of Thread in node_agent * feat: add 'v1' prefix to api_servers' urls * refactor: move lib/aks under lib/clouds * refactor: move lib/k8s_configs to lib/configs, move aks related configs to clouds/aks, delete volumn mount in redis * feat: extract K8sExecutor * fix: add one more searching layer of pakcage_data at maro.cli.k8s * refactor: move lib/configs/nvidia to lib/clouds/aks, make create() as a staticmethod at k8s mode * refactor: move id init to standardize_create_deployment in grass/azure mode * fix: use GlobalParams instead of hard-coded data * feat: build K8sDetailsReader, K8sDetailsWriter * feat: use k8s sdk to replace subprocess call * refactor: delete redundant vars * refactor: move more methods to K8sExecutor * test: use legal naming in tests/cli/k8s * refactor: refine logging messages * refactor: make create() as a staticmethod at grass/azure mode, refine logging messages * feat: build ArmTemplateParameterBuilder in K8sAzureExecutor * refactor: remove redundant params * refactor: rename /clouds to /modes * refactor: refine structures and logging messages in GrassExecutor * feat: add 'PENDING' to NodeStatus * feat: refine build_job_details for create schedule in grass/azure * feat: refine build_job_details for create schedule in k8s/aks * add grass local mode (non-pass) * feat: use node_join schema in grass/azure * refactor: replace /.maro with /.maro-shared, replace admin_username with node_username, remove redundant snippets in /grass/lib/scirpts * refactor: add 'ssh', 'api_server' into master_details and node_details * refactor: move master runtine params initialization into api_server * refactor: refine namings * feat: reconstruct grass/on-premises with new schema * refactor: delete field 'user' in grass_azure_create * refactor: rename 'blueprints_v1' to 'blueprints' * refactor: move some GlobalPaths to subfolders * Update grass local mode, run pass * refactor: replace 'connection' field with 'master' or 'node' * refactor: move start_service scripts to init_master.py * refactor: rename grass/master/release to grass/master/delete_master * refactor: load local_details in node services, refine script namings * refactor: move invocations of start_node and stop node to api server * fix: add missing imports * refactor: rename SubProcess to Subprocess * refactor: delete field 'user' in k8s_aks_create * add resource class * refactor: refine folder structures in /.maro/clusters/cluster * refactor: move /logs to 
/clusters/{cluster_name} * refactor: refine filenames * fix: export DEBIAN_FRONTEND=noninteractive to reduce irrelevant warnings * refactor: refine code structures, delete redundant code * refactor: change /{cluster_name}/details.yml to /{cluster_name}/cluster_details.yml * feat: add rsa+aes data encryption on dev-master communication * fix: change MasterApiClient to RedisController in node-related services and scripts * refactor: remove all "{cluster_name}" in redis keys * refactor: extract init_master and create_user to GrassExecutor * test: refine tests in grass/azure and k8s/aks * refactor: refine ArmTemplateParameterBuilder * add cli visible agent * feat: change the order of installation in init_build_node_image_vm.py * fix: add user/admin_id to grass_on_premises_create.yml * fix: change outdated container names * feat: add standardize_join_cluster_deployment in grass/on-premises * feat: add init_node_runtime_env in join_cluster.py * refactor: refine code structure in join_cluster.py * test: add TestGrassOnPremises * refactor: refine ARM templates * fix: linting errors * fix: test requirements error * fix: arm linting errors * refactor: late import in grass, k8s * style: refine load_parser_grass * style: refine load_parser_k8s * add jobstate and resource usage support * add local visible test * docs: update orchestrations * fix: fix get_job_logs * docs: add docs for GrassAzureExecutor, GrassExecutor * docs: add docs for GrassOnPremisesExecutor * docs: add docs for /grass/scripts * docs: add docs for /grass/services * docs: add docs for /grass/utils * docs: add docs for k8s * grass mode visible pass * grass local mode run pass * fixed pylint * Update resource, rm GPUtil depend * Update CLI local mode visible * grass local mode pass * add redis clear and pylint fixed * rm job status in grass azure mode * fix bug * fixed merge issue * fixed lin * update by pr comments * fixed isort issue * fixed stop bug * fixed local agent and cmp issue * fixed pending job cannot killed * add mount in Grass local mode * add resource check interval in redis Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * Add Env-Geographic visualization tool, CIM hello as example (#291) * streamit with questdb * script to import current dump data, except attention file, use influxdb line protocol for batch sending. 
* refine the interface to flatten dictionary * add messagetype.file to upload file later * correct tag name * correct the way to initial streamit, make it possible to use it any where after start * add data collecting in cim business engine * streamit client refactoring * fix import issue * update cim hello world, with a commented code to enable vis data streaming * fix metric replace bug * refactor the type checking code * V0.2 remove props from be (#269) * Fix bug * fix bu * Master vm doc - data preparation (#285) * Update vm docs * Update docs * Update data preparation docs * Update * Update docs Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * maro geo vis * add new line * doc update * lint refine * lint update * lint updata * lint update * lint update * lint update * code revert * add declare * code revert * add new line * add comment * delete surplus * delete core * lint update * lint update * lint update * lint update * specify version * lint update * specify docker version * import sort * backend revert * Delete test.py * format refact * doc update * import orders * change import orders * change import orders * add version of http-server * add specified port * delete print * lint update * lint update * lint update * update doc * dependecy update * update business engine * business engine * business engine update Co-authored-by: chaosyu <chaos.you@gmail.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Kuan Wei Yu <v-kyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Maro Geographic Tool Doc Update (#294) * streamit with questdb * script to import current dump data, except attention file, use influxdb line protocol for batch sending. * refine the interface to flatten dictionary * add messagetype.file to upload file later * correct tag name * correct the way to initial streamit, make it possible to use it any where after start * add data collecting in cim business engine * streamit client refactoring * fix import issue * update cim hello world, with a commented code to enable vis data streaming * fix metric replace bug * refactor the type checking code * maro geo vis * add new line * doc update * lint refine * lint update * lint updata * lint update * lint update * lint update * code revert * add declare * code revert * add new line * add comment * delete surplus * delete core * lint update * lint update * lint update * lint update * specify version * lint update * specify docker version * import sort * backend revert * Delete test.py * format refact * doc update * import orders * change import orders * change import orders * add version of http-server * add specified port * delete print * lint update * lint update * lint update * update doc * dependecy update * update business engine * business engine * business engine update * doc update * delete irelevant file Co-authored-by: chaosyu <chaos.you@gmail.com> * Maro geo vis Data Update (#295) * streamit with questdb * script to import current dump data, except attention file, use influxdb line protocol for batch sending. 
* refine the interface to flatten dictionary * add messagetype.file to upload file later * correct tag name * correct the way to initial streamit, make it possible to use it any where after start * add data collecting in cim business engine * streamit client refactoring * fix import issue * update cim hello world, with a commented code to enable vis data streaming * fix metric replace bug * refactor the type checking code * maro geo vis * add new line * doc update * lint refine * lint update * lint updata * lint update * lint update * lint update * code revert * add declare * code revert * add new line * add comment * delete surplus * delete core * lint update * lint update * lint update * lint update * specify version * lint update * specify docker version * import sort * backend revert * Delete test.py * format refact * doc update * import orders * change import orders * change import orders * add version of http-server * add specified port * delete print * lint update * lint update * lint update * update doc * dependecy update * update business engine * business engine * business engine update * doc update * delete irelevant file * update data Co-authored-by: chaosyu <chaos.you@gmail.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2_refactored_distributed_framework (#206) * added some more logs for dist RL * bug fix * fixed a typo * bug fix * refined logs * set session_id to None for exit message * add setup/clear/template for maro process * changed to internal logger for actor and learner * removed redundant component name from internal logs * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * fixed typos * update ProcessInternalError * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * updated notebook * 1. removed external loggers from cim example; 2. 
fixed batch inference bugs * removed actor_trainer mode and refactored * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * fixed conflicts * fixed typos * removed stale imports * fixed stale naming * removed dist_topologies folder * refined session id logic * bug fix * refactored * distributed RL refinement * refined * small bug fix * fixed lint issues * fixed lint issues * removed unwanted file * fixed a typo * gnn refactoring in progress * merged algorithm with agent * bug fixes * fix * bug fixes * fixed lint issues and renamed models to model * removed unwanted files * fixed merge conflicts * removed exp pool type spec in AbsAgent * fixed lint issues * changed to a single gnn agent * dqn exp pool bug fix * minor issues * removed GNNAgentManager * updated notebooks and examples according to rl toolkit changes * updated images * moved exp pool init inside DQN * renamed column_based_store to simple_store * mroe gnn refactoring * fixed lint issues * fixed lint issues * lint issue fix * lint issue fix * fixed bugs in test_store * typo fix * minor edits * lint issue fix * finished single process gnn * fixed bugs * 1. removed state_shaper, action_shaper and exp_shaper abstractions; 2. used torch Categorical for sampling actions; 3. removed input_dim and output_dim properties from LearningModel * updated notebook * removed simple agent manager * fixed lint issues * fixed lint issues * bug fix * bug fixes * refined LearningModel * modified gnn example based on latest rl toolkit changes * updated cim example doc * lint issue fix * small refinements * refactored GNN example * replaced ActionInfo with torch Categorical's log_prob for policy_optimization algorithms * refactored gnn example and added single-process script * removed obsolete files from gnn * lint issue fix * formatting * checked out gnn files from origin/v0.2 * refactored distributed rl toolkit * finished distributed rl refactoring and updated dqn example and notebook * merged request_rollout with collect * some refinement * refactored examples * distributed rl revamping complete * bug and formatting fixes * bug fixes * hid proxy instantiation inside dist components * small refinement * refined distributed RL and updated docs * updated docs and notebook * rm unwanted imports * added missing files * rm unwanted files * lint issue fix * bug fix * example doc update * rm agent_manager.svg * updated images * updated image file name in doc * revamped cim example code structure * added missing file * restored default training config for dqn and ac-gnn * added default loss function for actor-critic * rm unwanted import * updated README for cim/ac * removed log_p param for PolicyGradient train() * added READMEs for CIM * renamed ac-gnn to ac_gnn * updated README for CIM and added set_seeds to multi-process dqn * init * remove unit, make it same as logic * init by sku, world sku * init by sku, world sku * remove debug code * correct snapshot number issue * rename logic to unit, make it meaningful * add facility base * refine naming * refine the code, more comment to make it easy to read * add supplier facility, logic not tested yet * fix bug in facility initialize, add consumerunit not completed * refactoring the facilities in world config * add consumer for warehouse facility * add upstream topology, and save it state * add mapping from id to data model index * logic without reward of consumer * bug fix * seller unit * use tcod for path finding * retailer facility * bug fix, show seller 
demands in example * add a interactive and renderable env wrapper to later debugging * move font to subfolder with lisence to make it more clearly * add more details for node mapping * dispatch action by unit id * merge the frame changes to support data model inherit * add action for consumer, so that we can push the requirement * add unit id and facility in state for unit, add storage id for manufacture unit to simple the state retrieving * show manufacture related debug info step by step * add bom info for debug * add x,y to facility, bug fix * fix bugs in transport and distribution unit, correct the path finding issue * show vehicle movement in screen * remove completed todo * fix vehicle location issue, make all units and data model class from configs * show more states * fix slot number bug for dynamic backend * rename suppliers to manufactures * add missing file * remove code config, use yml instead * add 2 different step modes * update changes * rename manufacture * add action for manufacture unit * more attribute for states * add balance sheet * rename attribute to unify the feature name * reverted experimental changes in dqn learner * updated notebook * rm supply chain code * lint issue fix * lint issue fix * added missing file * added general rollout workflow and trajectory class * refactored * more refactoring * checked out backend from v0.2 * checked out setup.py from v0.2 Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> Co-authored-by: chaosyu <chaos.you@gmail.com> * Add the price model (#286) * Add the price model * fix the error * Refine the energy consumption * Fix the error * Delete business_engine_20210225104622.py * Delete * Delete the history file * Delete common_20210205152100.py * Delete common_20210302150646.py * Refine the code * Refine the code * Refine the code * Delete history files * Fix the error * Fix the error * Fix the error * Fix the error * Fix the error * Fix the error * refine the code * Refine the code * Delete the history file * Fix the error * Fix the error * Fix the error * Refine the code * fix the error * fix the error * fix the error * Refine the code * Add toy files * Refine the code * Refine the code * Add file * Refine the code Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * add vm_scheduling meta into package data * Maro Dashboard Vis Doc Update (#298) * streamit with questdb * script to import current dump data, except attention file, use influxdb line protocol for batch sending. 
* refine the interface to flatten dictionary * add messagetype.file to upload file later * correct tag name * correct the way to initial streamit, make it possible to use it any where after start * add data collecting in cim business engine * streamit client refactoring * fix import issue * update cim hello world, with a commented code to enable vis data streaming * fix metric replace bug * refactor the type checking code * maro geo vis * add new line * doc update * lint refine * lint update * lint updata * lint update * lint update * lint update * code revert * add declare * code revert * add new line * add comment * delete surplus * delete core * lint update * lint update * lint update * lint update * specify version * lint update * specify docker version * import sort * backend revert * Delete test.py * format refact * doc update * import orders * change import orders * change import orders * add version of http-server * add specified port * delete print * lint update * lint update * lint update * update doc * dependecy update * update business engine * business engine * business engine update * doc update * delete irelevant file * update data * doc update Co-authored-by: chaosyu <chaos.you@gmail.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fixed internal logger dumplicated output (#299) * fixed internal logger dumplicated output * delete unused import * fixed isort Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> Co-authored-by: Kuan Wei Yu <v-kyu@microsoft.com> Co-authored-by: MicrosoftHam <77261932+MicrosoftHam@users.noreply.github.com> Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com>
@@ -63,4 +63,4 @@ jobs:
        test_with_cli: True
        training_only: True
      run: |
        python -m unittest tests/cli/grass/test_grass.py
        python -m unittest -f tests/cli/grass/test_grass_azure.py

@@ -6,6 +6,7 @@
*.c
*.cpp
*.DS_Store
.pytest_cache/
.idea/
.vscode/
.vs/

@@ -3,3 +3,5 @@ prune examples

include maro/simulator/scenarios/cim/topologies/*/*.yml
include maro/simulator/scenarios/citi_bike/topologies/*/*.yml
include maro/simulator/scenarios/vm_scheduling/topologies/*/*.yml
include maro/cli/project_generator/templates/*.jinja

@@ -161,7 +161,7 @@ env = Env(scenario="cim",
          options={"enable-dump-snapshot": "./dump_data"})

# Inspect environment with the dump data
maro inspector env --source ./dump_data
maro inspector dashboard --source_path ./dump_data/snapshot_dump_folder
```

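For context, the change above switches the README to the new dashboard-based inspector. A minimal, illustrative sketch of producing the dump data it consumes (the topology name and episode length below are placeholder values, not part of this diff):

```python
from maro.simulator import Env

# Create a CIM environment that dumps snapshot data to ./dump_data.
env = Env(
    scenario="cim", topology="toy.4p_ssdd_l0.0", start_tick=0, durations=100,
    options={"enable-dump-snapshot": "./dump_data"}
)

# Run one episode with default (None) actions so the dump files get written.
metrics, decision_event, done = env.step(None)
while not done:
    metrics, decision_event, done = env.step(None)

# The dump can then be explored with:
#   maro inspector dashboard --source_path ./dump_data/snapshot_dump_folder
```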
### Show Cases

@@ -9,6 +9,34 @@ maro.rl.agent.abs\_agent
    :undoc-members:
    :show-inheritance:

maro.rl.agent.dqn
--------------------------------------------------------------------------------

.. automodule:: maro.rl.agent.dqn
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.agent.ddpg
--------------------------------------------------------------------------------

.. automodule:: maro.rl.agent.ddpg
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.agent.policy\_optimization
--------------------------------------------------------------------------------

.. automodule:: maro.rl.agent.policy_optimization
    :members:
    :undoc-members:
    :show-inheritance:


Agent Manager
================================================================================

maro.rl.agent.abs\_agent\_manager
--------------------------------------------------------------------------------

@@ -18,33 +46,13 @@ maro.rl.agent.abs\_agent\_manager
    :show-inheritance:


Algorithms
Model
================================================================================

maro.rl.algorithms.torch.abs\_algorithm
maro.rl.model.learning\_model
--------------------------------------------------------------------------------

.. automodule:: maro.rl.algorithms.torch.abs_algorithm
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.algorithms.torch.dqn
--------------------------------------------------------------------------------

.. automodule:: maro.rl.algorithms.torch.dqn
    :members:
    :undoc-members:
    :show-inheritance:


Models
================================================================================

maro.rl.models.torch.learning\_model
--------------------------------------------------------------------------------

.. automodule:: maro.rl.models.torch.learning_model
.. automodule:: maro.rl.model.torch.learning_model
    :members:
    :undoc-members:
    :show-inheritance:

@@ -53,18 +61,46 @@ maro.rl.models.torch.learning\_model
Explorer
================================================================================

maro.rl.explorer.abs\_explorer
maro.rl.exploration.abs\_explorer
--------------------------------------------------------------------------------

.. automodule:: maro.rl.explorer.abs_explorer
.. automodule:: maro.rl.exploration.abs_explorer
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.explorer.simple\_explorer
maro.rl.exploration.epsilon\_greedy\_explorer
--------------------------------------------------------------------------------

.. automodule:: maro.rl.explorer.simple_explorer
.. automodule:: maro.rl.exploration.epsilon_greedy_explorer
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.exploration.noise\_explorer
--------------------------------------------------------------------------------

.. automodule:: maro.rl.exploration.noise_explorer
    :members:
    :undoc-members:
    :show-inheritance:


Scheduler
================================================================================

maro.rl.scheduling.scheduler
--------------------------------------------------------------------------------

.. automodule:: maro.rl.scheduling.scheduler
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.scheduling.simple\_parameter\_scheduler
--------------------------------------------------------------------------------

.. automodule:: maro.rl.scheduling.simple_parameter_scheduler
    :members:
    :undoc-members:
    :show-inheritance:

@@ -81,38 +117,6 @@ maro.rl.shaping.abs\_shaper
    :undoc-members:
    :show-inheritance:

maro.rl.shaping.action\_shaper
--------------------------------------------------------------------------------

.. automodule:: maro.rl.shaping.action_shaper
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.shaping.experience\_shaper
--------------------------------------------------------------------------------

.. automodule:: maro.rl.shaping.experience_shaper
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.shaping.k\_step\_experience\_shaper
--------------------------------------------------------------------------------

.. automodule:: maro.rl.shaping.k_step_experience_shaper
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.shaping.state\_shaper
--------------------------------------------------------------------------------

.. automodule:: maro.rl.shaping.state_shaper
    :members:
    :undoc-members:
    :show-inheritance:


Storage
================================================================================

@@ -125,18 +129,10 @@ maro.rl.storage.abs\_store
    :undoc-members:
    :show-inheritance:

maro.rl.storage.column\_based\_store
maro.rl.storage.simple\_store
--------------------------------------------------------------------------------

.. automodule:: maro.rl.storage.column_based_store
    :members:
    :undoc-members:
    :show-inheritance:

maro.rl.storage.utils
--------------------------------------------------------------------------------

.. automodule:: maro.rl.storage.utils
.. automodule:: maro.rl.storage.simple_store
    :members:
    :undoc-members:
    :show-inheritance:

@@ -37,14 +37,16 @@ author = "MARO Team"
# extensions coming with Sphinx (named "sphinx.ext.*") or your custom
# ones.

extensions = ["recommonmark",
              "sphinx.ext.autodoc",
              "sphinx.ext.coverage",
              "sphinx.ext.napoleon",
              "sphinx.ext.viewcode",
              "sphinx_markdown_tables",
              "sphinx_copybutton",
              ]
extensions = [
    "recommonmark",
    "sphinx.ext.autodoc",
    "sphinx.ext.coverage",
    "sphinx.ext.napoleon",
    "sphinx.ext.viewcode",
    "sphinx_markdown_tables",
    "sphinx_copybutton",
    "sphinx.ext.autosectionlabel",
]

napoleon_google_docstring = True
napoleon_use_param = False

@@ -1,298 +1,167 @@
Multi Agent DQN for CIM
================================================

This example demonstrates how to use MARO's reinforcement learning (RL) toolkit to solve the
`CIM <https://maro.readthedocs.io/en/latest/scenarios/container_inventory_management.html>`_ problem. It is formalized as a multi-agent reinforcement learning problem, where each port acts as a decision
agent. The agents take actions independently, e.g., loading containers to vessels or discharging containers from vessels.
This example demonstrates how to use MARO's reinforcement learning (RL) toolkit to solve the container
inventory management (CIM) problem. It is formalized as a multi-agent reinforcement learning problem,
where each port acts as a decision agent. When a vessel arrives at a port, these agents must take actions
by transferring a certain amount of containers to / from the vessel. The objective is for the agents to
learn policies that minimize the overall container shortage.

State Shaper
------------
Trajectory
----------

`State shaper <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#shapers>`_ converts the environment
observation to the model input state which includes temporal and spatial information. For this scenario, the model input
state includes:
The ``CIMTrajectoryForDQN`` class inherits from ``Trajectory`` and implements methods to be used as callbacks
in the roll-out loop. In this example,
* ``get_state`` converts environment observations to state vectors that encode temporal and spatial information.
  The temporal information includes relevant port and vessel information, such as shortage and remaining space,
  over the past k days (here k = 7). The spatial information includes features of the downstream ports.
* ``get_action`` converts agents' output (an integer that maps to a percentage of containers to be loaded
  to or unloaded from the vessel) to action objects that can be executed by the environment.
* ``get_offline_reward`` computes the reward of a given action as a linear combination of fulfillment and
  shortage within a future time frame.
* ``on_finish`` processes a complete trajectory into data that can be used directly by the learning agents.

- Temporal information, including the past week's information of ports and vessels, such as shortage on port and
  remaining space on vessel.
- Spatial information, including related downstream port features.

.. code-block:: python
PORT_ATTRIBUTES = ["empty", "full", "on_shipper", "on_consignee", "booking", "shortage", "fulfillment"]
VESSEL_ATTRIBUTES = ["empty", "full", "remaining_space"]
class CIMTrajectoryForDQN(Trajectory):
    def __init__(
        self, env, *, port_attributes, vessel_attributes, action_space, look_back, max_ports_downstream,
        reward_time_window, fulfillment_factor, shortage_factor, time_decay,
        finite_vessel_space=True, has_early_discharge=True
    ):
        super().__init__(env)
        self.port_attributes = port_attributes
        self.vessel_attributes = vessel_attributes
        self.action_space = action_space
        self.look_back = look_back
        self.max_ports_downstream = max_ports_downstream
        self.reward_time_window = reward_time_window
        self.fulfillment_factor = fulfillment_factor
        self.shortage_factor = shortage_factor
        self.time_decay = time_decay
        self.finite_vessel_space = finite_vessel_space
        self.has_early_discharge = has_early_discharge

class CIMStateShaper(StateShaper):
    ...
    def __call__(self, decision_event, snapshot_list):
        tick, port_idx, vessel_idx = decision_event.tick, decision_event.port_idx, decision_event.vessel_idx
        ticks = [tick - rt for rt in range(self._look_back - 1)]
        future_port_idx_list = snapshot_list["vessels"][tick: vessel_idx: 'future_stop_list'].astype('int')
        port_features = snapshot_list["ports"][ticks: [port_idx] + list(future_port_idx_list): PORT_ATTRIBUTES]
        vessel_features = snapshot_list["vessels"][tick: vessel_idx: VESSEL_ATTRIBUTES]
        state = np.concatenate((port_features, vessel_features))
        return str(port_idx), state
    def get_state(self, event):
        vessel_snapshots, port_snapshots = self.env.snapshot_list["vessels"], self.env.snapshot_list["ports"]
        tick, port_idx, vessel_idx = event.tick, event.port_idx, event.vessel_idx
        ticks = [tick - rt for rt in range(self.look_back - 1)]
        future_port_idx_list = vessel_snapshots[tick: vessel_idx: 'future_stop_list'].astype('int')
        port_features = port_snapshots[ticks: [port_idx] + list(future_port_idx_list): self.port_attributes]
        vessel_features = vessel_snapshots[tick: vessel_idx: self.vessel_attributes]
        return {port_idx: np.concatenate((port_features, vessel_features))}

    def get_action(self, action_by_agent, event):
        vessel_snapshots = self.env.snapshot_list["vessels"]
        action_info = list(action_by_agent.values())[0]
        model_action = action_info[0] if isinstance(action_info, tuple) else action_info
        scope, tick, port, vessel = event.action_scope, event.tick, event.port_idx, event.vessel_idx
        zero_action_idx = len(self.action_space) / 2  # index corresponding to value zero.
        vessel_space = vessel_snapshots[tick:vessel:self.vessel_attributes][2] if self.finite_vessel_space else float("inf")
        early_discharge = vessel_snapshots[tick:vessel:"early_discharge"][0] if self.has_early_discharge else 0
        percent = abs(self.action_space[model_action])
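The slicing used in ``get_state`` follows MARO's snapshot-list convention, ``snapshot_list[node_type][ticks : node_indices : attributes]``, which returns a flattened NumPy array. A small illustrative query (the attribute names are the ones already used in this example; ``env`` is an existing CIM environment):

.. code-block:: python

    # Empty-container count of port 0 at tick 10; the slice returns a flat numpy array.
    empty_at_port_0 = env.snapshot_list["ports"][10: 0: "empty"][0]

    # Multiple ticks and attributes come back flattened, tick by tick.
    recent_features = env.snapshot_list["ports"][[8, 9, 10]: 0: ["shortage", "fulfillment"]]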
Action Shaper
-------------

`Action shaper <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#shapers>`_ is used to convert an
agent's model output to an environment executable action. For this specific scenario, the action space consists of
integers from -10 to 10, with -10 indicating loading 100% of the containers in the current inventory to the vessel and
10 indicating discharging 100% of the containers on the vessel to the port.

.. code-block:: python

class CIMActionShaper(ActionShaper):
    ...
    def __call__(self, model_action, decision_event, snapshot_list):
        scope = decision_event.action_scope
        tick = decision_event.tick
        port_idx = decision_event.port_idx
        vessel_idx = decision_event.vessel_idx

        port_empty = snapshot_list["ports"][tick: port_idx: ["empty", "full", "on_shipper", "on_consignee"]][0]
        vessel_remaining_space = snapshot_list["vessels"][tick: vessel_idx: ["empty", "full", "remaining_space"]][2]
        early_discharge = snapshot_list["vessels"][tick:vessel_idx: "early_discharge"][0]
        assert 0 <= model_action < len(self._action_space)

        if model_action < self._zero_action_index:
            actual_action = max(round(self._action_space[model_action] * port_empty), -vessel_remaining_space)
        elif model_action > self._zero_action_index:
            plan_action = self._action_space[model_action] * (scope.discharge + early_discharge) - early_discharge
            actual_action = (
                round(plan_action) if plan_action > 0
                else round(self._action_space[model_action] * scope.discharge)
            )
        if model_action < zero_action_idx:
            action_type = ActionType.LOAD
            actual_action = min(round(percent * scope.load), vessel_space)
        elif model_action > zero_action_idx:
            action_type = ActionType.DISCHARGE
            plan_action = percent * (scope.discharge + early_discharge) - early_discharge
            actual_action = round(plan_action) if plan_action > 0 else round(percent * scope.discharge)
        else:
            actual_action = 0
            actual_action, action_type = 0, None

        return Action(vessel_idx, port_idx, actual_action)
        return {port: Action(vessel, port, actual_action, action_type)}
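To make the action mapping concrete, here is a small worked example. The 21-point action space below is an assumption consistent with the ``NUM_ACTIONS = 21`` used later in this doc, not code taken from the scenario:

.. code-block:: python

    import numpy as np

    # Hypothetical action space: -1.0 (load 100%) ... 0.0 (no-op) ... 1.0 (discharge 100%).
    action_space = list(np.linspace(-1.0, 1.0, 21))
    zero_action_idx = len(action_space) // 2           # index 10 maps to "do nothing"

    model_action = 3                                   # below the zero index -> a LOAD action
    percent = abs(action_space[model_action])          # |-0.7| = 0.7, i.e. load 70%

    # With 200 containers available at the port and 120 free slots on the vessel,
    # the executable amount is capped by the remaining vessel space.
    actual_action = min(round(percent * 200), 120)     # min(140, 120) = 120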
Experience Shaper
-----------------
    def get_offline_reward(self, event):
        port_snapshots = self.env.snapshot_list["ports"]
        start_tick = event.tick + 1
        ticks = list(range(start_tick, start_tick + self.reward_time_window))

`Experience shaper <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#shapers>`_ is used to convert
an episode trajectory to trainable experiences for RL agents. For this specific scenario, the reward is a linear
combination of fulfillment and shortage in a limited time window.
        future_fulfillment = port_snapshots[ticks::"fulfillment"]
        future_shortage = port_snapshots[ticks::"shortage"]
        decay_list = [
            self.time_decay ** i for i in range(self.reward_time_window)
            for _ in range(future_fulfillment.shape[0] // self.reward_time_window)
        ]

.. code-block:: python
class TruncatedExperienceShaper(ExperienceShaper):
    ...
    def __call__(self, trajectory, snapshot_list):
        experiences_by_agent = {}
        for i in range(len(trajectory) - 1):
            transition = trajectory[i]
            agent_id = transition["agent_id"]
            if agent_id not in experiences_by_agent:
                experiences_by_agent[agent_id] = defaultdict(list)
            experiences = experiences_by_agent[agent_id]
            experiences["state"].append(transition["state"])
            experiences["action"].append(transition["action"])
            experiences["reward"].append(self._compute_reward(transition["event"], snapshot_list))
            experiences["next_state"].append(trajectory[i + 1]["state"])
        tot_fulfillment = np.dot(future_fulfillment, decay_list)
        tot_shortage = np.dot(future_shortage, decay_list)

        return np.float32(self.fulfillment_factor * tot_fulfillment - self.shortage_factor * tot_shortage)

    def on_env_feedback(self, event, state_by_agent, action_by_agent, reward):
        self.trajectory["event"].append(event)
        self.trajectory["state"].append(state_by_agent)
        self.trajectory["action"].append(action_by_agent)

    def on_finish(self):
        exp_by_agent = defaultdict(lambda: defaultdict(list))
        for i in range(len(self.trajectory["state"]) - 1):
            agent_id = list(self.trajectory["state"][i].keys())[0]
            exp = exp_by_agent[agent_id]
            exp["S"].append(self.trajectory["state"][i][agent_id])
            exp["A"].append(self.trajectory["action"][i][agent_id])
            exp["R"].append(self.get_offline_reward(self.trajectory["event"][i]))
            exp["S_"].append(list(self.trajectory["state"][i + 1].values())[0])

        return dict(exp_by_agent)

        return experiences_by_agent
|
||||
-----
|
||||
|
||||
`Agent <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#agent>`_ is a combination of (RL)
|
||||
algorithm, experience pool, and a set of parameters that governs the training loop. For this scenario, the agent is the
|
||||
abstraction of a port. We choose DQN as our underlying learning algorithm with a TD-error-based sampling mechanism.
|
||||
The out-of-the-box DQN is used as our agent.
|
||||
|
||||
.. code-block:: python
|
||||
NUM_ACTIONS = 21
|
||||
class DQNAgent(AbsAgent):
|
||||
...
|
||||
def train(self):
|
||||
if len(self._experience_pool) < self._min_experiences_to_train:
|
||||
return
|
||||
|
||||
for _ in range(self._num_batches):
|
||||
indexes, sample = self._experience_pool.sample_by_key("loss", self._batch_size)
|
||||
state = np.asarray(sample["state"])
|
||||
action = np.asarray(sample["action"])
|
||||
reward = np.asarray(sample["reward"])
|
||||
next_state = np.asarray(sample["next_state"])
|
||||
loss = self._algorithm.train(state, action, reward, next_state)
|
||||
self._experience_pool.update(indexes, {"loss": loss})
|
||||
|
||||
def create_dqn_agents(agent_id_list):
|
||||
agent_dict = {}
|
||||
for agent_id in agent_id_list:
|
||||
q_net = NNStack(
|
||||
"q_value",
|
||||
FullyConnectedBlock(
|
||||
input_dim=state_shaper.dim,
|
||||
hidden_dims=[256, 128, 64],
|
||||
output_dim=NUM_ACTIONS,
|
||||
activation=nn.LeakyReLU,
|
||||
is_head=True,
|
||||
batch_norm_enabled=True,
|
||||
softmax_enabled=False,
|
||||
skip_connection_enabled=False,
|
||||
dropout_p=.0)
|
||||
)
|
||||
|
||||
algorithm = DQN(
|
||||
model=LearningModel(
|
||||
q_net, optimizer_options=OptimizerOptions(cls=RMSprop, params={"lr": 0.05})
|
||||
),
|
||||
config=DQNConfig(
|
||||
reward_decay=.0,
|
||||
target_update_frequency=5,
|
||||
tau=0.1,
|
||||
is_double=True,
|
||||
per_sample_td_error_enabled=True,
|
||||
loss_cls=nn.SmoothL1Loss
|
||||
)
|
||||
)
|
||||
|
||||
experience_pool = ColumnBasedStore(**config.experience_pool)
|
||||
agent_dict[agent_id] = DQNAgent(
|
||||
agent_id, algorithm, ColumnBasedStore(),
|
||||
min_experiences_to_train=1024, num_batches=10, batch_size=128
|
||||
)
|
||||
|
||||
return agent_dict
|
||||
|
||||
Agent Manager
|
||||
-------------
|
||||
|
||||
The complexities of the environment can be isolated from the learning algorithm by using an
|
||||
`Agent manager <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#agent-manager>`_
|
||||
to manage individual agents. We define a function to create the agents and an agent manager class
|
||||
that implements the ``train`` method where the newly obtained experiences are stored in the agents'
|
||||
experience pools before training, in accordance with the DQN algorithm.
|
||||
|
||||
.. code-block:: python
|
||||
class DQNAgentManager(SimpleAgentManager):
|
||||
def train(self, experiences_by_agent, performance=None):
|
||||
self._assert_train_mode()
|
||||
|
||||
# store experiences for each agent
|
||||
for agent_id, exp in experiences_by_agent.items():
|
||||
exp.update({"loss": [1e8] * len(list(exp.values())[0])})
|
||||
self.agent_dict[agent_id].store_experiences(exp)
|
||||
|
||||
for agent in self.agent_dict.values():
|
||||
agent.train()
|
||||
|
||||
Main Loop with Actor and Learner (Single Process)
|
||||
-------------------------------------------------
|
||||
|
||||
This single-process workflow, in which a learning policy interacts with a MARO environment, consists of:
|
||||
- Initializing an environment with specific scenario and topology parameters.
|
||||
- Defining scenario-specific components, e.g. shapers.
|
||||
- Creating agents and an agent manager.
|
||||
- Creating an `actor <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#learner-and-actor>`_ and a
|
||||
`learner <https://maro.readthedocs.io/en/latest/key_components/rl_toolkit.html#learner-and-actor>`_ to start the
|
||||
training process in which the agent manager interacts with the environment for collecting experiences and updating
|
||||
policies.
|
||||
|
||||
.. code-block:: python
|
||||
env = Env("cim", "toy.4p_ssdd_l0.0", durations=1120)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
state_shaper = CIMStateShaper(look_back=7, max_ports_downstream=2)
|
||||
action_shaper = CIMActionShaper(action_space=list(np.linspace(-1.0, 1.0, NUM_ACTIONS)))
|
||||
experience_shaper = TruncatedExperienceShaper(
|
||||
time_window=100, fulfillment_factor=1.0, shortage_factor=1.0, time_decay_factor=0.97
|
||||
)
|
||||
agent_manager = DQNAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.TRAIN_INFERENCE,
|
||||
agent_dict=create_dqn_agents(agent_id_list),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper
|
||||
)
|
||||
|
||||
scheduler = TwoPhaseLinearParameterScheduler(
|
||||
max_episode=100,
|
||||
parameter_names=["epsilon"],
|
||||
split_ep=50,
|
||||
start_values=0.4,
|
||||
mid_values=0.32,
|
||||
end_values=.0
|
||||
)
|
||||
|
||||
actor = SimpleActor(env, agent_manager)
|
||||
learner = SimpleLearner(agent_manager, actor, scheduler)
|
||||
learner.learn()
|
||||
|
||||
|
||||
Main Loop with Actor and Learner (Distributed/Multi-process)
|
||||
--------------------------------------------------------------
|
||||
|
||||
We demonstrate a single-learner and multi-actor topology where the learner drives the program by telling remote actors
|
||||
to perform roll-out tasks and using the results they send back to improve the policies. The workflow usually involves
|
||||
launching a learner process and an actor process separately. Because training occurs on the learner side and inference
|
||||
occurs on the actor side, we need to create appropriate agent managers on both sides.
|
||||
|
||||
On the actor side, the agent manager must be equipped with all shapers as well as an explorer. Thus, the code for
|
||||
creating an environment and an agent manager on the actor side is similar to that for the single-host version,
|
||||
except that the agent manager mode must be set to AgentManagerMode.INFERENCE. As in the single-process version, the environment
|
||||
and the agent manager are wrapped in a SimpleActor instance. To make the actor a distributed worker, we need to further
|
||||
wrap it in an ActorWorker instance. Finally, we launch the worker and it starts to listen to roll-out requests from the
|
||||
learner. The following code snippet shows the creation of an actor worker with a simple (local) actor wrapped inside.
|
||||
|
||||
.. code-block:: python
|
||||
env = Env("cim", "toy.4p_ssdd_l0.0", durations=1120)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
agent_manager = DQNAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.INFERENCE,
|
||||
agent_dict=create_dqn_agents(agent_id_list),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper
|
||||
)
|
||||
proxy_params = {
|
||||
"group_name": "distributed_cim",
|
||||
"expected_peers": {"learner": 1},
|
||||
"redis_address": ("localhost", 6379),
|
||||
"max_retries": 15
|
||||
.. code-block:: python
|
||||
agent_config = {
|
||||
"model": ...,
|
||||
"optimization": ...,
|
||||
"hyper_params": ...
|
||||
}
|
||||
actor_worker = ActorWorker(
|
||||
local_actor=SimpleActor(env=env, agent_manager=agent_manager),
|
||||
proxy_params=proxy_params
|
||||
)
|
||||
actor_worker.launch()
|
||||
|
||||
On the learner side, an agent manager in AgentManagerMode.TRAIN mode is required. However, it is not necessary to create shapers for an
|
||||
agent manager in AgentManagerMode.TRAIN mode. Instead of creating an actor, we create an actor proxy and wrap it inside the learner. This proxy
|
||||
serves as the communication interface for the learner and is responsible for sending roll-out requests to remote actor
|
||||
processes and receiving results. Calling the train method executes the usual training loop except that the actual
|
||||
roll-out is performed remotely. The code snippet below shows the creation of a learner with an actor proxy wrapped
|
||||
inside that communicates with 3 actors.
|
||||
def get_dqn_agent():
|
||||
q_model = SimpleMultiHeadModel(
|
||||
FullyConnectedBlock(**agent_config["model"]), optim_option=agent_config["optimization"]
|
||||
)
|
||||
return DQN(q_model, DQNConfig(**agent_config["hyper_params"]))
|
||||
|
||||
|
||||
Training
|
||||
--------
|
||||
|
||||
The distributed training consists of one learner process and multiple actor processes. The learner optimizes
|
||||
the policy by collecting roll-out data from the actors to train the underlying agents.
|
||||
|
||||
The actor process must create a roll-out executor for performing the requested roll-outs, which means that
|
||||
the environment simulator and shapers should be created here. In this example, inference is performed on the
|
||||
actor's side, so a set of DQN agents must be created in order to load the models (and exploration parameters)
|
||||
from the learner.
|
||||
|
||||
.. code-block:: python
|
||||
def cim_dqn_actor():
|
||||
env = Env(**training_config["env"])
|
||||
agent = MultiAgentWrapper({name: get_dqn_agent() for name in env.agent_idx_list})
|
||||
actor = Actor(env, agent, CIMTrajectoryForDQN, trajectory_kwargs=common_config)
|
||||
actor.as_worker(training_config["group"])
|
||||
|
||||
agent_manager = DQNAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.TRAIN,
|
||||
agent_dict=create_dqn_agents(agent_id_list),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper
|
||||
)
|
||||
proxy_params = {
|
||||
"group_name": "distributed_cim",
|
||||
"expected_peers": {"actor": 3},
|
||||
"redis_address": ("localhost", 6379),
|
||||
"max_retries": 15
|
||||
}
|
||||
actor = ActorProxy(proxy_params=proxy_params, experience_collecting_func=concat_experiences_by_agent)
|
||||
scheduler = TwoPhaseLinearParameterScheduler(
|
||||
max_episode=100,
|
||||
parameter_names=["epsilon"],
|
||||
split_ep=50,
|
||||
start_values=0.4,
|
||||
mid_values=0.32,
|
||||
end_values=.0
|
||||
)
|
||||
learner = SimpleLearner(agent_manager, actor, scheduler)
|
||||
learner.learn()
|
||||
The learner's side requires a concrete learner class that inherits from ``AbsLearner`` and implements the ``run``
|
||||
method which contains the main training loop. Here the implementation is similar to the single-threaded version
|
||||
except that the ``collect`` method is used to obtain roll-out data from the actors (since the roll-out executors
|
||||
are located on the actors' side). The agents created here are where training occurs and hence always contain the
|
||||
latest policies.
|
||||
|
||||
.. code-block:: python
|
||||
def cim_dqn_learner():
|
||||
env = Env(**training_config["env"])
|
||||
agent = MultiAgentWrapper({name: get_dqn_agent() for name in env.agent_idx_list})
|
||||
scheduler = TwoPhaseLinearParameterScheduler(training_config["max_episode"], **training_config["exploration"])
|
||||
actor = ActorProxy(
|
||||
training_config["group"], training_config["num_actors"],
|
||||
update_trigger=training_config["learner_update_trigger"]
|
||||
)
|
||||
learner = OffPolicyLearner(actor, scheduler, agent, **training_config["training"])
|
||||
learner.run()
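To run this distributed example, the learner and actor entry points are started as separate processes. The following is a minimal local sketch; the script names and the number of actors are assumptions, and in practice these processes are usually launched through the MARO orchestration tooling described elsewhere in the docs:

.. code-block:: python

    import subprocess

    NUM_ACTORS = 3  # should match the number of actor peers the learner expects

    # Hypothetical entry scripts wrapping cim_dqn_learner() and cim_dqn_actor().
    learner = subprocess.Popen(["python", "learner_main.py"])
    actors = [subprocess.Popen(["python", "actor_main.py"]) for _ in range(NUM_ACTORS)]

    # Wait for all processes to finish.
    for proc in [learner, *actors]:
        proc.wait()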
|
||||
|
||||
.. note::
|
||||
|
||||
|
|
|
@ -81,9 +81,9 @@ Contents
|
|||
|
||||
installation/pip_install.rst
|
||||
installation/playground.rst
|
||||
installation/grass_cluster_provisioning_on_azure.rst
|
||||
installation/k8s_cluster_provisioning_on_azure.rst
|
||||
installation/grass_cluster_provisioning_on_premises.rst
|
||||
installation/grass_azure_cluster_provisioning.rst
|
||||
installation/grass_on_premises_cluster_provisioning.rst
|
||||
installation/k8s_aks_cluster_provisioning.rst
|
||||
installation/multi_processes_localhost_provisioning.rst
|
||||
|
||||
.. toctree::
|
||||
|
@ -93,6 +93,7 @@ Contents
|
|||
scenarios/container_inventory_management.rst
|
||||
scenarios/citi_bike.rst
|
||||
scenarios/vm_scheduling.rst
|
||||
scenarios/command_line.rst
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
@ -114,6 +115,7 @@ Contents
|
|||
key_components/communication.rst
|
||||
key_components/orchestration.rst
|
||||
key_components/dashboard_visualization.rst
|
||||
key_components/geographic_visualization.rst
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
|
|
@ -0,0 +1,240 @@
|
|||
.. _grass-azure-cluster-provisioning:
|
||||
|
||||
Grass Cluster Provisioning on Azure
|
||||
===================================
|
||||
|
||||
With the following guide, you can build up a MARO cluster in
|
||||
:ref:`grass/azure <grass>`
|
||||
mode on Azure and run your training job in a distributed environment.
|
||||
|
||||
Prerequisites
|
||||
-------------
|
||||
|
||||
* `Install the Azure CLI and login <https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest>`_
|
||||
* `Install docker <https://docs.docker.com/engine/install/>`_ and
|
||||
`Configure docker to make sure it can be managed as a non-root user <https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user>`_
|
||||
|
||||
Cluster Management
|
||||
------------------
|
||||
|
||||
* Create a cluster with a :ref:`deployment <#grass-azure-create>`
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Create a grass cluster with a grass-create deployment
|
||||
maro grass create ./grass-azure-create.yml
|
||||
|
||||
* Scale the cluster
|
||||
|
||||
Check `VM Size <https://docs.microsoft.com/en-us/azure/virtual-machines/sizes>`_ to see more node specifications.
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Scale nodes with 'Standard_D4s_v3' specification to 2
|
||||
maro grass node scale myGrassCluster Standard_D4s_v3 2
|
||||
|
||||
# Scale nodes with 'Standard_D2s_v3' specification to 0
|
||||
maro grass node scale myGrassCluster Standard_D2s_v3 0
|
||||
|
||||
* Delete the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Delete a grass cluster
|
||||
maro grass delete myGrassCluster
|
||||
|
||||
* Start/Stop nodes to save costs
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start 2 nodes with 'Standard_D4s_v3' specification
|
||||
maro grass node start myGrassCluster Standard_D4s_v3 2
|
||||
|
||||
# Stop 2 nodes with 'Standard_D4s_v3' specification
|
||||
maro grass node stop myGrassCluster Standard_D4s_v3 2
|
||||
|
||||
* Get statuses of the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Get master status
|
||||
maro grass status myGrassCluster master
|
||||
|
||||
# Get nodes status
|
||||
maro grass status myGrassCluster nodes
|
||||
|
||||
# Get containers status
|
||||
maro grass status myGrassCluster containers
|
||||
|
||||
* Clean up the cluster
|
||||
|
||||
Delete all running jobs, schedules, containers in the cluster.
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
maro grass clean myGrassCluster
|
||||
|
||||
.. _grass-azure-cluster-provisioning/run-job:
|
||||
|
||||
Run Job
|
||||
-------
|
||||
|
||||
* Push your training image from local machine
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push image 'myImage' to the cluster,
|
||||
# 'myImage' is a docker image loaded on the machine where this command is executed
|
||||
maro grass image push myGrassCluster --image-name myImage
|
||||
|
||||
* Push your training data
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push dqn folder under './myTrainingData/' to a relative path '/myTrainingData' in the cluster
|
||||
# You can then assign your mapping location in the start-job-deployment
|
||||
maro grass data push myGrassCluster ./myTrainingData/dqn /myTrainingData
|
||||
|
||||
* Start a training job with a :ref:`start-job-deployment <grass-start-job>`
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training job with a start-job deployment
|
||||
maro grass job start myGrassCluster ./grass-start-job.yml
|
||||
|
||||
* Or, schedule batch jobs with a :ref:`start-schedule-deployment <grass-start-schedule>`
|
||||
|
||||
These jobs will share the same component specification.
|
||||
|
||||
A best practice for this command: push all of your training configs at once with "``maro grass data push``",
read the job name from the environment variables inside each container,
and then load the training config that corresponds to that job name (see the sketch after the command below).
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training schedule with a start-schedule deployment
|
||||
maro grass schedule start myGrassCluster ./grass-start-schedule.yml
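For example, a component inside the container might pick its training config based on the job name exposed through the environment. This is a hypothetical sketch; the exact variable name and config layout depend on your setup:

.. code-block:: python

    import os

    # Hypothetical: read the job name injected into the container environment.
    job_name = os.environ.get("JOB_NAME", "myJob2")
    # Resolve the config pushed earlier with 'maro grass data push'.
    config_path = f"/myTrainingData/dqn/{job_name}/config.yml"
    print(f"Loading training config from {config_path}")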
|
||||
|
||||
* Get the logs of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Get the logs of the job
|
||||
maro grass job logs myGrassCluster myJob1
|
||||
|
||||
* List the current status of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# List the current status of the job
|
||||
maro grass job list myGrassCluster
|
||||
|
||||
* Stop a training job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Stop a training job
|
||||
maro grass job stop myJob1
|
||||
|
||||
Sample Deployments
|
||||
------------------
|
||||
|
||||
grass-azure-create
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass/azure
|
||||
name: myGrassCluster
|
||||
|
||||
cloud:
|
||||
resource_group: myResourceGroup
|
||||
subscription: mySubscription
|
||||
location: eastus
|
||||
default_username: admin
|
||||
default_public_key: "{ssh public key}"
|
||||
|
||||
user:
|
||||
admin_id: admin
|
||||
|
||||
master:
|
||||
node_size: Standard_D2s_v3
|
||||
|
||||
grass-start-job
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
You can replace {project root} with a valid Linux path, e.g. /home/admin.
|
||||
|
||||
The data you push will then be mounted into this folder.
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: myJob1
|
||||
|
||||
allocation:
|
||||
mode: single-metric-balanced
|
||||
metric: cpu
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: "python {project root}/myTrainingData/dqn/job1/start_actor.py"
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 1
|
||||
gpu: 0
|
||||
memory: 1024m
|
||||
learner:
|
||||
command: "python {project root}/myTrainingData/dqn/job1/start_learner.py"
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
|
||||
grass-start-schedule
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: mySchedule1
|
||||
|
||||
allocation:
|
||||
mode: single-metric-balanced
|
||||
metric: cpu
|
||||
|
||||
job_names:
|
||||
- myJob2
|
||||
- myJob3
|
||||
- myJob4
|
||||
- myJob5
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: "python {project root}/myTrainingData/dqn/schedule1/actor.py"
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 1
|
||||
gpu: 0
|
||||
memory: 1024m
|
||||
learner:
|
||||
command: "bash {project root}/myTrainingData/dqn/schedule1/learner.py"
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
|
@ -1,202 +0,0 @@
|
|||
|
||||
Grass Cluster Provisioning on Azure
|
||||
===================================
|
||||
|
||||
With the following guide, you can build up a MARO cluster in
|
||||
`grass mode <../distributed_training/orchestration_with_grass.html#orchestration-with-grass>`_
|
||||
on Azure and run your training job in a distributed environment.
|
||||
|
||||
Prerequisites
|
||||
-------------
|
||||
|
||||
* `Install the Azure CLI and login <https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest>`_
|
||||
* `Install docker <https://docs.docker.com/engine/install/>`_ and
|
||||
`Configure docker to make sure it can be managed as a non-root user <https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user>`_
|
||||
|
||||
Cluster Management
|
||||
------------------
|
||||
|
||||
* Create a cluster with a `deployment <#grass-azure-create>`_
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Create a grass cluster with a grass-create deployment
|
||||
maro grass create ./grass-azure-create.yml
|
||||
|
||||
* Scale the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Scale nodes with 'Standard_D4s_v3' specification to 2
|
||||
maro grass node scale my_grass_cluster Standard_D4s_v3 2
|
||||
|
||||
Check `VM Size <https://docs.microsoft.com/en-us/azure/virtual-machines/sizes>`_
|
||||
to see more node specifications.
|
||||
|
||||
* Delete the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Delete a grass cluster
|
||||
maro grass delete my_grass_cluster
|
||||
|
||||
* Start/stop nodes to save costs
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start 2 nodes with 'Standard_D4s_v3' specification
|
||||
maro grass node start my_grass_cluster Standard_D4s_v3 2
|
||||
|
||||
# Stop 2 nodes with 'Standard_D4s_v3' specification
|
||||
maro grass node stop my_grass_cluster Standard_D4s_v3 2
|
||||
|
||||
Run Job
|
||||
-------
|
||||
|
||||
* Push your training image
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push image 'my_image' to the cluster
|
||||
maro grass image push my_grass_cluster --image-name my_image
|
||||
|
||||
* Push your training data
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push data under './my_training_data' to a relative path '/my_training_data' in the cluster
|
||||
# You can then assign your mapping location in the start-job deployment
|
||||
maro grass data push my_grass_cluster ./my_training_data/* /my_training_data
|
||||
|
||||
* Start a training job with a `deployment <#grass-start-job>`_
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training job with a start-job deployment
|
||||
maro grass job start my_grass_cluster ./grass-start-job.yml
|
||||
|
||||
* Or, schedule batch jobs with a `deployment <#grass-start-schedule>`_
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training schedule with a start-schedule deployment
|
||||
maro grass schedule start my_grass_cluster ./grass-start-schedule.yml
|
||||
|
||||
* Get the logs of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Get the logs of the job
|
||||
maro grass job logs my_grass_cluster my_job_1
|
||||
|
||||
* List the current status of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# List the current status of the job
|
||||
maro grass job list my_grass_cluster
|
||||
|
||||
* Stop a training job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Stop a training job
|
||||
maro grass job stop my_job_1
|
||||
|
||||
Sample Deployments
|
||||
------------------
|
||||
|
||||
grass-azure-create
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: my_grass_cluster
|
||||
|
||||
cloud:
|
||||
infra: azure
|
||||
location: eastus
|
||||
resource_group: my_grass_resource_group
|
||||
subscription: my_subscription
|
||||
|
||||
user:
|
||||
admin_public_key: "{ssh public key with 'ssh-rsa' prefix}"
|
||||
admin_username: admin
|
||||
|
||||
master:
|
||||
node_size: Standard_D2s_v3
|
||||
|
||||
grass-start-job
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: my_job_1
|
||||
|
||||
allocation:
|
||||
mode: single-metric-balanced
|
||||
metric: cpu
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: "bash {project root}/my_training_data/job_1/actor.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: “{project root}”
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
learner:
|
||||
command: "bash {project root}/my_training_data/job_1/learner.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
|
||||
grass-start-schedule
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: my_schedule_1
|
||||
|
||||
allocation:
|
||||
mode: single-metric-balanced
|
||||
metric: cpu
|
||||
|
||||
job_names:
|
||||
- my_job_2
|
||||
- my_job_3
|
||||
- my_job_4
|
||||
- my_job_5
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: "bash {project root}/my_training_data/job_1/actor.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: “{project root}”
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
learner:
|
||||
command: "bash {project root}/my_training_data/job_1/learner.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
|
@ -1,206 +0,0 @@
|
|||
|
||||
Grass Cluster Provisioning in On-Premises Environment
|
||||
=====================================================
|
||||
|
||||
With the following guide, you can build up a MARO cluster in
|
||||
`grass mode <../distributed_training/orchestration_with_grass.html#orchestration-with-grass>`_
|
||||
in local private network and run your training job in On-Premises distributed environment.
|
||||
|
||||
Prerequisites
|
||||
-------------
|
||||
|
||||
* Linux with Python 3.6+
|
||||
* `Install Powershell <https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell?view=powershell-7.1>`_ if you are using Windows Server
|
||||
|
||||
Cluster Management
|
||||
------------------
|
||||
|
||||
* Create a cluster with a `deployment <#grass-cluster-create>`_
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Create a grass cluster with a grass-create deployment
|
||||
maro grass create ./grass-azure-create.yml
|
||||
|
||||
* Let a node join a specified cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Let a worker node join into specified cluster
|
||||
maro grass node join ./node-join.yml
|
||||
|
||||
* Let a node leave a specified cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Let a worker node leave a specified cluster
|
||||
maro grass node leave {cluster_name} {node_name}
|
||||
|
||||
|
||||
* Delete the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Delete a grass cluster
|
||||
maro grass delete my_grass_cluster
|
||||
|
||||
|
||||
Run Job
|
||||
-------
|
||||
|
||||
* Push your training image
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push image 'my_image' to the cluster
|
||||
maro grass image push my_grass_cluster --image-name my_image
|
||||
|
||||
* Push your training data
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push data under './my_training_data' to a relative path '/my_training_data' in the cluster
|
||||
# You can then assign your mapping location in the start-job deployment
|
||||
maro grass data push my_grass_cluster ./my_training_data/* /my_training_data
|
||||
|
||||
* Start a training job with a `deployment <#grass-start-job>`_
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training job with a start-job deployment
|
||||
maro grass job start my_grass_cluster ./grass-start-job.yml
|
||||
|
||||
* Or, schedule batch jobs with a `deployment <#grass-start-schedule>`_
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training schedule with a start-schedule deployment
|
||||
maro grass schedule start my_grass_cluster ./grass-start-schedule.yml
|
||||
|
||||
* Get the logs of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Get the logs of the job
|
||||
maro grass job logs my_grass_cluster my_job_1
|
||||
|
||||
* List the current status of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# List the current status of the job
|
||||
maro grass job list my_grass_cluster
|
||||
|
||||
* Stop a training job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Stop a training job
|
||||
maro grass job stop my_job_1
|
||||
|
||||
Sample Deployments
|
||||
------------------
|
||||
|
||||
grass-cluster-create
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass/on-premises
|
||||
name: cluster_name
|
||||
|
||||
user:
|
||||
admin_public_key: "{ssh public key with 'ssh-rsa' prefix}"
|
||||
admin_username: admin
|
||||
|
||||
|
||||
grass-node-join
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: "grass/on-premises"
|
||||
name: ""
|
||||
cluster: ""
|
||||
public_ip_address: ""
|
||||
hostname: ""
|
||||
system: "linux"
|
||||
resources:
|
||||
cpu: 1
|
||||
memory: 1024
|
||||
gpu: 0
|
||||
|
||||
|
||||
grass-start-job
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: my_job_1
|
||||
|
||||
allocation:
|
||||
mode: single-metric-balanced
|
||||
metric: cpu
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: "bash {project root}/my_training_data/job_1/actor.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: “{project root}”
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
learner:
|
||||
command: "bash {project root}/my_training_data/job_1/learner.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
|
||||
grass-start-schedule
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass
|
||||
name: my_schedule_1
|
||||
|
||||
allocation:
|
||||
mode: single-metric-balanced
|
||||
metric: cpu
|
||||
|
||||
job_names:
|
||||
- my_job_2
|
||||
- my_job_3
|
||||
- my_job_4
|
||||
- my_job_5
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: "bash {project root}/my_training_data/job_1/actor.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: “{project root}”
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
learner:
|
||||
command: "bash {project root}/my_training_data/job_1/learner.sh"
|
||||
image: my_image
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
|
@ -0,0 +1,98 @@
|
|||
.. _grass-on-premises-cluster-provisioning:
|
||||
|
||||
Grass Cluster Provisioning in On-Premises Environment
|
||||
=====================================================
|
||||
|
||||
With the following guide, you can build up a MARO cluster in
|
||||
:ref:`grass/on-premises <grass>`
|
||||
in a local private network and run your training job in an on-premises distributed environment.
|
||||
|
||||
Prerequisites
|
||||
-------------
|
||||
|
||||
* Linux with Python 3.6+
|
||||
* `Install Powershell <https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell?view=powershell-7.1>`_ if you are using Windows Server
|
||||
|
||||
Cluster Management
|
||||
------------------
|
||||
|
||||
* Create a cluster with a :ref:`deployment <grass-on-premises-create>`
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Create a grass cluster with a grass-create deployment
|
||||
maro grass create ./grass-azure-create.yml
|
||||
|
||||
* Let a node join a specified cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Let a worker node join into specified cluster
|
||||
maro grass node join ./node-join.yml
|
||||
|
||||
* Let a node leave a specified cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Let a worker node leave a specified cluster
|
||||
maro grass node leave {cluster_name} {node_name}
|
||||
|
||||
|
||||
* Delete the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Delete a grass cluster
|
||||
maro grass delete my_grass_cluster
|
||||
|
||||
|
||||
Run Job
|
||||
-------
|
||||
|
||||
See :ref:`Run Job in grass/azure <grass-azure-cluster-provisioning/run-job>` for reference.
|
||||
|
||||
|
||||
Sample Deployments
|
||||
------------------
|
||||
|
||||
grass-on-premises-create
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass/on-premises
|
||||
name: clusterName
|
||||
|
||||
user:
|
||||
admin_id: admin
|
||||
|
||||
master:
|
||||
username: root
|
||||
hostname: maroMaster
|
||||
public_ip_address: 137.128.0.1
|
||||
private_ip_address: 10.0.0.4
|
||||
|
||||
|
||||
grass-on-premises-join-cluster
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: grass/on-premises
|
||||
|
||||
master:
|
||||
private_ip_address: 10.0.0.4
|
||||
|
||||
node:
|
||||
hostname: maroNode1
|
||||
username: root
|
||||
public_ip_address: 137.128.0.2
|
||||
private_ip_address: 10.0.0.5
|
||||
resources:
|
||||
cpu: all
|
||||
memory: 2048m
|
||||
gpu: 0
|
||||
|
||||
config:
|
||||
install_node_runtime: true
|
||||
install_node_gpu_support: false
|
|
@ -1,8 +1,10 @@
|
|||
.. _k8s-aks-cluster-provisioning:
|
||||
|
||||
K8S Cluster Provisioning on Azure
|
||||
=================================
|
||||
|
||||
With the following guide, you can build up a MARO cluster in
|
||||
`k8s mode <../distributed_training/orchestration_with_k8s.html#orchestration-with-k8s>`_
|
||||
:ref:`k8s/aks <k8s>`
|
||||
on Azure and run your training job in a distributed environment.
|
||||
|
||||
Prerequisites
|
||||
|
@ -36,7 +38,7 @@ Prerequisites
|
|||
Cluster Management
|
||||
------------------
|
||||
|
||||
* Create a cluster with a `deployment <#k8s-azure-create>`_
|
||||
* Create a cluster with a :ref:`deployment <k8s-aks-create>`
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
|
@ -47,18 +49,20 @@ Cluster Management
|
|||
|
||||
.. code-block:: sh
|
||||
|
||||
# Scale nodes with 'Standard_D4s_v3' specification to 2
|
||||
maro k8s node scale my_k8s_cluster Standard_D4s_v3 2
|
||||
Check `VM Size <https://docs.microsoft.com/en-us/azure/virtual-machines/sizes>`_ to see more node specifications.
|
||||
|
||||
Check `VM Size <https://docs.microsoft.com/en-us/azure/virtual-machines/sizes>`_
|
||||
to see more node specifications.
|
||||
# Scale nodes with 'Standard_D4s_v3' specification to 2
|
||||
maro k8s node scale myK8sCluster Standard_D4s_v3 2
|
||||
|
||||
# Scale nodes with 'Standard_D2s_v3' specification to 0
|
||||
maro k8s node scale myK8sCluster Standard_D2s_v3 0
|
||||
|
||||
* Delete the cluster
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Delete a k8s cluster
|
||||
maro k8s delete my_k8s_cluster
|
||||
maro k8s delete myK8sCluster
|
||||
|
||||
Run Job
|
||||
-------
|
||||
|
@ -67,72 +71,69 @@ Run Job
|
|||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push image 'my_image' to the cluster
|
||||
maro k8s image push my_k8s_cluster --image-name my_image
|
||||
# Push image 'myImage' to the cluster
|
||||
maro k8s image push myK8sCluster --image-name myImage
|
||||
|
||||
* Push your training data
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Push data under './my_training_data' to a relative path '/my_training_data' in the cluster
|
||||
# You can then assign your mapping location in the start-job deployment
|
||||
maro k8s data push my_k8s_cluster ./my_training_data/* /my_training_data
|
||||
# Push dqn folder under './myTrainingData/' to a relative path '/myTrainingData' in the cluster
|
||||
# You can then assign your mapping location in the start-job-deployment
|
||||
maro k8s data push myK8sCluster ./myTrainingData/dqn /myTrainingData
|
||||
|
||||
* Start a training job with a `deployment <#k8s-start-job>`_
|
||||
* Start a training job with a :ref:`deployment <k8s-start-job>`
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training job with a start-job deployment
|
||||
maro k8s job start my_k8s_cluster ./k8s-start-job.yml
|
||||
# Start a training job with a start-job-deployment
|
||||
maro k8s job start myK8sCluster ./k8s-start-job.yml
|
||||
|
||||
* Or, schedule batch jobs with a `deployment <#k8s-start-schedule>`_
|
||||
* Or, schedule batch jobs with a :ref:`deployment <k8s-start-schedule>`
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Start a training schedule with a start-schedule deployment
|
||||
maro k8s schedule start my_k8s123_cluster ./k8s-start-schedule.yml
|
||||
# Start a training schedule with a start-schedule-deployment
|
||||
maro k8s schedule start myK8sCluster ./k8s-start-schedule.yml
|
||||
|
||||
* Get the logs of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Logs will be exported to current directory
|
||||
maro k8s job logs my_k8s_cluster my_job_1
|
||||
maro k8s job logs myK8sCluster myJob1
|
||||
|
||||
* List the current status of the job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# List current status of jobs
|
||||
maro k8s job list my_k8s_cluster my_job_1
|
||||
maro k8s job list myK8sCluster myJob1
|
||||
|
||||
* Stop a training job
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
# Stop a training job
|
||||
maro k8s job stop my_k8s_cluster my_job_1
|
||||
maro k8s job stop myK8sCluster myJob1
|
||||
|
||||
Sample Deployments
|
||||
------------------
|
||||
|
||||
k8s-azure-create
|
||||
^^^^^^^^^^^^^^^^
|
||||
k8s-aks-create
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: k8s
|
||||
name: my_k8s_cluster
|
||||
mode: k8s/aks
|
||||
name: myK8sCluster
|
||||
|
||||
cloud:
|
||||
infra: azure
|
||||
subscription: mySubscription
|
||||
resource_group: myResourceGroup
|
||||
location: eastus
|
||||
resource_group: my_k8s_resource_group
|
||||
subscription: my_subscription
|
||||
|
||||
user:
|
||||
admin_public_key: "{ssh public key with 'ssh-rsa' prefix}"
|
||||
admin_username: admin
|
||||
default_public_key: "{ssh public key}"
|
||||
default_username: admin
|
||||
|
||||
master:
|
||||
node_size: Standard_D2s_v3
|
||||
|
@ -142,63 +143,63 @@ k8s-start-job
|
|||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: k8s
|
||||
name: my_job_1
|
||||
mode: k8s/aks
|
||||
name: myJob1
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: ["bash", "{project root}/my_training_data/actor.sh"]
|
||||
image: my_image
|
||||
command: ["python", "{project root}/myTrainingData/dqn/start_actor.py"]
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
memory: 2048M
|
||||
learner:
|
||||
command: ["bash", "{project root}/my_training_data/learner.sh"]
|
||||
image: my_image
|
||||
command: ["python", "{project root}/myTrainingData/dqn/start_learner.py"]
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
memory: 2048M
|
||||
|
||||
k8s-start-schedule
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mode: k8s
|
||||
name: my_schedule_1
|
||||
mode: k8s/aks
|
||||
name: mySchedule1
|
||||
|
||||
job_names:
|
||||
- my_job_2
|
||||
- my_job_3
|
||||
- my_job_4
|
||||
- my_job_5
|
||||
- myJob2
|
||||
- myJob3
|
||||
- myJob4
|
||||
- myJob5
|
||||
|
||||
components:
|
||||
actor:
|
||||
command: ["bash", "{project root}/my_training_data/actor.sh"]
|
||||
image: my_image
|
||||
command: ["python", "{project root}/myTrainingData/dqn/start_actor.py"]
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 5
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
memory: 2048M
|
||||
learner:
|
||||
command: ["bash", "{project root}/my_training_data/learner.sh"]
|
||||
image: my_image
|
||||
command: ["python", "{project root}/myTrainingData/dqn/start_learner.py"]
|
||||
image: myImage
|
||||
mount:
|
||||
target: "{project root}"
|
||||
num: 1
|
||||
resources:
|
||||
cpu: 2
|
||||
gpu: 0
|
||||
memory: 2048m
|
||||
memory: 2048M
|
|
@ -71,7 +71,7 @@ To start this visualization tool, user need to input command following the forma
|
|||
|
||||
.. code-block:: sh
|
||||
|
||||
maro inspector env --source {source\_folder\_path} --force {true/false}
|
||||
maro inspector dashboard --source_path {source\_folder\_path} --force {true/false}
|
||||
|
||||
----
|
||||
|
||||
|
@ -79,7 +79,7 @@ e.g.
|
|||
|
||||
.. code-block:: sh
|
||||
|
||||
maro inspector env --source_path .\maro\dumper_files --force false
|
||||
maro inspector dashboard --source_path .\maro\dumper_files --force false
|
||||
|
||||
----
|
||||
|
||||
|
|
|
@ -0,0 +1,235 @@
|
|||
Geographic Visualization
|
||||
========================
|
||||
|
||||
Env-geographic can be used for both finished and running experiments.
For finished experiments, local mode lets users inspect the experimental data
to support subsequent decisions. If a running experiment is selected,
real-time mode is launched by default; it is used to view experimental data as it is
produced and to judge the effectiveness of the model. You can also switch to
local mode for any finished epoch while in real-time mode.
|
||||
|
||||
|
||||
Dependency
|
||||
----------
|
||||
|
||||
Env-geographic depends on Docker to start.
Therefore, users need to install Docker on the machine and make sure it runs normally.
Docker can be obtained from `Docker installation <https://docs.docker.com/get-docker/>`_.
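Before starting the tool, it may help to confirm that Docker is installed and the daemon is reachable. A small sketch (plain Docker commands driven from Python, nothing MARO-specific):

.. code-block:: python

    import shutil
    import subprocess

    # Check that the docker CLI is on PATH.
    assert shutil.which("docker") is not None, "Docker is not installed or not on PATH."
    # 'docker info' fails if the Docker daemon is not running or not reachable.
    subprocess.run(["docker", "info"], check=True, stdout=subprocess.DEVNULL)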
|
||||
|
||||
|
||||
How to Use?
|
||||
-----------
|
||||
|
||||
Env-geographic has 3 parts: front-end, back-end and database. Users need 2 steps
|
||||
to start this tool:
|
||||
|
||||
1. Start the database and choose an experiment to be displayed.
|
||||
2. Start the front-end and back-end services with the specified experiment name.
|
||||
|
||||
|
||||
Start database
|
||||
~~~~~~~~~~~~~~
|
||||
First, users need to start the local database with the following command:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
maro inspector geo --start database
|
||||
|
||||
----
|
||||
|
||||
After the command executes successfully, users can view the local data at
localhost:9000 by default. If the default port is occupied, users can obtain
the access port of each container with the following command:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
docker container ls
|
||||
|
||||
----
|
||||
|
||||
Users can view all experiment information with a SQL statement:
|
||||
|
||||
.. code-block:: SQL
|
||||
|
||||
SELECT * FROM maro.experiments
|
||||
|
||||
----
|
||||
|
||||
Data is stored locally in the folder maro/maro/streamit/server/data.
|
||||
|
||||
|
||||
Choose an existing experiment
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
To view the visualization of experimental data, users need to specify the
experiment name. Users can either choose an existing experiment or start a new one.
|
||||
|
||||
Users can select a name from the local database.
|
||||
|
||||
.. image:: ../images/visualization/geographic/database_exp.png
|
||||
:alt: database_exp
|
||||
|
||||
|
||||
Create a new experiment
|
||||
^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Currently, users need to manually start the experiment to obtain
|
||||
the data required by the service.
|
||||
|
||||
To send data to the database, there are 2 compulsory steps:
|
||||
|
||||
1. Set the environment variable to enable data transmission.
2. Import the relevant package and modify the environment initialization code to send data.
|
||||
|
||||
Users need to set the environment variable
"MARO_STREAMIT_ENABLED" to "true". To specify the experiment name, set the
environment variable "MARO_STREAMIT_EXPERIMENT_NAME"; if this value is not set,
a unique experiment name is generated automatically and can be looked up in the
database. Note that when selecting a topology, users must select one with
specific geographic information. Experimental data obtained from topology files
without geographic information cannot be used in the Env-geographic tool.
|
||||
|
||||
Users can set the environment variables as in the following example:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
os.environ["MARO_STREAMIT_ENABLED"] = "true"
|
||||
|
||||
os.environ["MARO_STREAMIT_EXPERIMENT_NAME"] = "my_maro_experiment"
|
||||
|
||||
----
|
||||
|
||||
To send the experimental data episode by episode while the experiment is running, users need to import the
**streamit** package with the following code before environment initialization:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Import package streamit
|
||||
from maro.streamit import streamit
|
||||
# Initialize environment and send basic information of experiment to database.
|
||||
env = Env(scenario="cim", topology="global_trade.22p_l0.1",
|
||||
start_tick=0, durations=100)
|
||||
|
||||
for ep in range(EPISODE_NUMBER):
|
||||
# Send experimental data to database by episode.
|
||||
streamit.episode(ep)
|
||||
|
||||
----
|
||||
|
||||
For a complete reference, please see the file maro/examples/hello_world/cim/hello.py.
|
||||
|
||||
After starting the experiment, users need to query its name in the local database to make sure
the experimental data has been sent successfully.
|
||||
|
||||
|
||||
Start service
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
To start the front-end and back-end services, users need to specify the experiment name.
The front-end port can be specified with the parameter "front_end_port", as in the following
command:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
maro inspector geo --start service --experiment_name YOUR_EXPERIMENT_NAME --front_end_port 8080
|
||||
|
||||
----
|
||||
|
||||
The program will automatically determine whether to use real-time mode
|
||||
or local mode according to the data status of the current experiment.
|
||||
|
||||
Feature List
|
||||
------------
|
||||
|
||||
For convenience, the Env-geographic tool implements several features
so that users can freely explore the experimental data.
|
||||
|
||||
|
||||
Real-time mode and local mode
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Local mode
|
||||
^^^^^^^^^^
|
||||
|
||||
In this mode, users can explore the experimental data through the geographic
view and the charts on both sides. By clicking the play button in the lower
left corner of the page, users can view how the data changes within the
selected time window. Hovering over geographic items and charts displays more
detailed information.
|
||||
|
||||
|
||||
.. image:: ../images/visualization/geographic/local_mode.gif
|
||||
:alt: local_mode
|
||||
|
||||
|
||||
The chart on the right side of the page shows the changes in the data over
|
||||
a period of time from the perspectives of overall, port, and vessel.
|
||||
|
||||
.. image:: ../images/visualization/geographic/local_mode_right_chart.gif
|
||||
:alt: local_mode_right_chart
|
||||
|
||||
The chart on the left side of the page shows the ranking of the carrying
|
||||
capacity of each port and the change in carrying capacity between ports
|
||||
in the entire time window.
|
||||
|
||||
.. image:: ../images/visualization/geographic/local_mode_left_chart.gif
|
||||
:alt: local_mode_left_chart
|
||||
|
||||
Real-time mode
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
Real-time mode offers largely the same features as local mode; the difference
lies in the data. The automatic playback speed of the progress bar on the
front-end page closely follows the speed at which experimental data arrives,
so users cannot select the time window freely in this mode.
|
||||
|
||||
In addition, users can switch modes by clicking. If users choose to view
local data while in real-time mode, the experimental data generated so far
is displayed.
|
||||
|
||||
.. image:: ../images/visualization/geographic/real_time_mode.gif
|
||||
:alt: real_time_mode
|
||||
|
||||
Geographic data display
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
In the map on the page, users can view the status of different resource
holders at various times, and can zoom the map to focus on a specific area.
The three port statuses, Surplus, Deficit and Balance, represent the
quantitative relationship between the empty container volume and the received
order volume of the corresponding port at that time.
|
||||
|
||||
.. image:: ../images/visualization/geographic/geographic_data_display.gif
|
||||
:alt: geographic_data_display
|
||||
|
||||
Data chart display
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
The ranking table on the right side of the page shows the throughput of routes and
ports over a period of time, while the heat map shows the throughput between ports
over the same period. Users can hover over specific elements to view detailed data.
|
||||
|
||||
The chart on the left shows the order volume and empty container information of each
port and each vessel. Users can view the data of different resource holders by switching options.
|
||||
|
||||
In addition, users can zoom the chart to display information more clearly.
|
||||
|
||||
.. image:: ../images/visualization/geographic/data_chart_display.gif
|
||||
:alt: data_chart_display
|
||||
|
||||
Time window selection
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This feature is only available in local mode. Users can slide to choose the left
end of the time window as the starting point and view the data at different
times.
|
||||
|
||||
In addition, users can freely choose the end of the time window. During playback,
the tool loops within the selected time window.
|
||||
|
||||
.. image:: ../images/visualization/geographic/time_window_selection.gif
|
||||
:alt: time_window_selection
|
|
@ -1,4 +1,3 @@
|
|||
|
||||
Distributed Orchestration
|
||||
=========================
|
||||
|
||||
|
@ -7,20 +6,20 @@ on cloud computing service like `Azure <https://azure.microsoft.com/en-us/>`_.
|
|||
These CLI commands can also be used to schedule the training jobs with the
|
||||
specified resource requirements. In MARO, all training job related components
|
||||
are dockerized for easy deployment and resource allocation. It provides a unified
|
||||
abstraction/interface for different orchestration framework
|
||||
(e.g. `Grass <#id3>`_\ , `Kubernetes <#id4>`_\ ).
|
||||
abstraction/interface for different orchestration frameworks
(e.g. :ref:`Grass`, :ref:`K8s`).
|
||||
|
||||
.. image:: ../images/distributed/orch_overview.svg
|
||||
:target: ../images/distributed/orch_overview.svg
|
||||
:alt: Orchestration Overview
|
||||
:width: 600
|
||||
:width: 650
|
||||
|
||||
Process
|
||||
-------
|
||||
|
||||
The process mode is part of the `MARO CLI`, which uses multi-processes to start the
|
||||
training jobs in the localhost environment. To align with `Grass <#id3>`_ and `Kubernetes
|
||||
<#id4>`_, the process mode also uses Redis for job management. The process mode tries
|
||||
training jobs in the localhost environment. To align with :ref:`Grass` and :ref:`K8s`,
|
||||
the process mode also uses Redis for job management. The process mode tries
|
||||
to simulate the operation of the real distributed cluster in localhost so that users can smoothly
|
||||
deploy their code to the distributed cluster. Meanwhile, training in the process mode is a cheaper way
to find bugs that would occur during real distributed training.
|
||||
|
@ -44,59 +43,118 @@ to get how to use it.
|
|||
.. image:: ../images/distributed/orch_process.svg
|
||||
:target: ../images/distributed/orch_process.svg
|
||||
:alt: Orchestration Process Mode on Local
|
||||
:width: 300
|
||||
:width: 250
|
||||
|
||||
.. _grass:
|
||||
|
||||
Grass
|
||||
-----
|
||||
|
||||
Grass is a self-designed, development purpose orchestration framework. It can be
|
||||
Grass is an orchestration framework developed by the MARO team. It can be
|
||||
confidently applied to small/middle size cluster (< 200 nodes). The design goal
|
||||
of Grass is to speed up the distributed algorithm prototype development.
|
||||
of Grass is to speed up the development of distributed algorithm prototypes.
|
||||
It has the following advantages:
|
||||
|
||||
* Fast deployment in a small cluster.
|
||||
* Fine-grained resource management.
|
||||
* Lightweight, no other dependencies are required.
|
||||
* Lightweight, no complex dependencies required.
|
||||
|
||||
In the Grass mode:
|
||||
|
||||
* All VMs will be deployed in the same virtual network for a faster, more stable
|
||||
connection and larger bandwidth. Please note that the maximum number of VMs is
|
||||
limited by the `available dedicated IP addresses <https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-faq#what-address-ranges-can-i-use-in-my-vnets>`_.
|
||||
* It is a centralized topology, the master node will host Redis service for peer
|
||||
discovering, Fluentd service for log collecting, SMB service for file sharing.
|
||||
* On each VM, the probe (worker) agent is used to track the computing resources
|
||||
and detect abnormal events.
|
||||
|
||||
Check `Grass Cluster Provisioning on Azure <../installation/grass_cluster_provisioning_on_azure.html>`_
|
||||
Check :ref:`Grass Cluster Provisioning on Azure <grass-azure-cluster-provisioning>` and
|
||||
:ref:`Grass Cluster Provisioning in On-Premises Environment <grass-on-premises-cluster-provisioning>`
|
||||
to get how to use it.
|
||||
|
||||
Modes
|
||||
^^^^^
|
||||
|
||||
We currently have two modes in Grass, and you can choose whichever you want to create a Grass cluster.
|
||||
|
||||
**grass/azure**
|
||||
|
||||
* Create a Grass cluster with Azure.
|
||||
* With a valid Azure subscription, you can create a cluster with one command from ground zero.
|
||||
* You can easily scale up/down nodes as needed,
|
||||
and start/stop nodes to save costs without messing up the current environment.
|
||||
* Please note that the maximum number of VMs in grass/azure is limited by the
|
||||
`available dedicated IP addresses <https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-faq#what-address-ranges-can-i-use-in-my-vnets>`_.
|
||||
|
||||
**grass/on-premises**
|
||||
|
||||
* Create a Grass cluster with machines on hand.
|
||||
* You can join a machine to the cluster if the machine is in the same private network as the Master.
|
||||
|
||||
|
||||
Components
|
||||
^^^^^^^^^^
|
||||
Here's the diagram of a Grass cluster with all the components tied together.
|
||||
|
||||
.. image:: ../images/distributed/orch_grass.svg
|
||||
:target: ../images/distributed/orch_grass.svg
|
||||
:alt: Orchestration Grass Mode in Azure
|
||||
:width: 600
|
||||
:width: 650
|
||||
|
||||
Kubernetes
|
||||
----------
|
||||
|
|
||||
|
||||
Master Components
|
||||
|
||||
* redis: A centralized DB for runtime data storage.
|
||||
* fluentd: A centralized data collector for log collecting.
|
||||
* samba-server: For file sharing within the whole cluster.
|
||||
* master-agent: A daemon service for status monitoring and job scheduling.
|
||||
* master-api-server: A RESTFul server for cluster management.
|
||||
The MARO CLI can access this server to control the cluster and get cluster information over an encrypted session.
|
||||
|
||||
Node Components
|
||||
|
||||
* samba-client: For file sharing.
|
||||
* node-agent: A daemon service for tracking the computing resources and container statuses of the node.
|
||||
* node-api-server: An internal RESTFul server for node management.
|
||||
|
||||
|
||||
Communications
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
Outer Environment to the Master
|
||||
|
||||
* Communications from the outer environment to the Master are encrypted.
|
||||
* Grass will use the following paths in the OuterEnv-Master communications:
|
||||
|
||||
* SSH tunnel: For file transfer and script execution.
|
||||
* HTTP connection: For connection with master-api-server, use RSA+AES hybrid encryption.
|
||||
|
||||
Communications within the Cluster
|
||||
|
||||
* Communications within the cluster are not encrypted.
* Therefore, users are responsible for making sure all Nodes are connected within a private network and
  that external connections to the cluster are restricted.
|
||||
|
||||
|
||||
.. _k8s:

K8s
---

MARO also supports Kubernetes (k8s) as an orchestration option.
With this widely adopted framework, you can easily build up your MARO Cluster
with hundreds or thousands of nodes. It has the following advantages:

* Higher durability.
* Better scalability.

We currently support the k8s/aks mode in Kubernetes, and it has the following features:

.. image:: ../images/distributed/orch_k8s.svg
   :target: ../images/distributed/orch_k8s.svg
   :alt: Orchestration K8S Mode in Azure
   :width: 650

* The dockerized job component runs in a Kubernetes Pod, and each Pod only hosts one component.
* All Kubernetes Pods are registered into the same virtual network using
  `Container Network Interface(CNI) <https://github.com/containernetworking/cni>`_.
* Azure File Service is used for file sharing in all Pods.
* Azure Container Registry is included for image management.

Check :ref:`K8S Cluster Provisioning on Azure <k8s-aks-cluster-provisioning>`
to see how to use it.

@ -2,112 +2,22 @@
|
|||
RL Toolkit
|
||||
==========
|
||||
|
||||
MARO provides a full-stack abstraction for reinforcement learning (RL), which enables users to
apply predefined and customized components to various scenarios. The main abstractions include
fundamental components such as `Agent <#agent>`_ and `Shaper <#shaper>`_\ , and training routine
controllers such as `Actor <#actor>`_ and `Learner <#learner>`_.
|
||||
|
||||
Learner and Actor
|
||||
-----------------
|
||||
|
||||
.. image:: ../images/rl/overview.svg
|
||||
:target: ../images/rl/overview.svg
|
||||
:alt: RL Overview
|
||||
|
||||
* **Learner** is the abstraction of the learnable policy. It is responsible for
  learning a qualified policy to improve the business-optimized objective.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Train function of learner.
|
||||
def learn(self):
|
||||
for exploration_params in self._scheduler:
|
||||
performance, exp_by_agent = self._actor.roll_out(
|
||||
self._agent_manager.dump_models(),
|
||||
exploration_params=exploration_params
|
||||
)
|
||||
self._scheduler.record_performance(performance)
|
||||
self._agent_manager.train(exp_by_agent)
|
||||
|
||||
* **Actor** is the abstraction of experience collection. It is responsible for
|
||||
interacting with the environment and collecting experiences. The experiences
|
||||
collected during interaction will be used for the training of the learners.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Rollout function of actor.
|
||||
def roll_out(self, model_dict=None, exploration_params=None, return_details: bool = True):
|
||||
self._env.reset()
|
||||
|
||||
# load models
|
||||
if model_dict is not None:
|
||||
self._agents.load_models(model_dict)
|
||||
|
||||
# load exploration parameters:
|
||||
if exploration_params is not None:
|
||||
self._agents.set_exploration_params(exploration_params)
|
||||
|
||||
metrics, decision_event, is_done = self._env.step(None)
|
||||
while not is_done:
|
||||
action = self._agents.choose_action(decision_event, self._env.snapshot_list)
|
||||
metrics, decision_event, is_done = self._env.step(action)
|
||||
self._agents.on_env_feedback(metrics)
|
||||
|
||||
details = self._agents.post_process(self._env.snapshot_list) if return_details else None
|
||||
|
||||
return self._env.metrics, details
|
||||
|
||||
|
||||
Scheduler
---------

A ``Scheduler`` is the driver of an episodic learning process. The learner uses the scheduler to repeat the
rollout-training cycle for a set number of episodes. For algorithms that require explicit exploration (e.g.,
DQN and DDPG), there are two types of schedules that a learner may follow:

* Static schedule, where the exploration parameters are generated using a pre-defined function of the episode
  number. See ``LinearParameterScheduler`` and ``TwoPhaseLinearParameterScheduler`` provided in the toolkit
  for examples.
* Dynamic schedule, where the exploration parameters for the next episode are determined based on the performance
  history. Such a mechanism is possible in our abstraction because the scheduler provides a ``record_performance``
  interface that allows it to keep track of roll-out performances.

Optionally, an early stopping checker may be registered if one wishes to terminate training when certain performance
requirements are satisfied, possibly before reaching the prescribed number of episodes.
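
As an illustration of the dynamic case, the snippet below is a minimal, self-contained sketch of a scheduler-like
object (not a class shipped with the toolkit): it is iterable and exposes ``record_performance``, exactly as the
learner snippet above expects. The epsilon-decay rule and the assumption of a scalar performance measure are purely
illustrative.

.. code-block:: python

   class AdaptiveEpsilonSchedule:
       """Illustrative dynamic schedule: decays epsilon faster when performance stalls."""
       def __init__(self, max_episode: int, start_eps: float = 0.4):
           self._max_episode = max_episode
           self._episode = 0
           self._eps = start_eps
           self._history = []  # roll-out performances recorded by the learner

       def __iter__(self):
           return self

       def __next__(self):
           if self._episode >= self._max_episode:
               raise StopIteration
           self._episode += 1
           # Decay epsilon more aggressively if the last roll-out did not improve
           # (assumes the recorded performance is a scalar measure).
           if len(self._history) >= 2 and self._history[-1] <= self._history[-2]:
               self._eps *= 0.9
           else:
               self._eps *= 0.98
           return {"epsilon": self._eps}

       def record_performance(self, performance):
           self._history.append(performance)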
|
||||
|
||||
Agent Manager
|
||||
-------------
|
||||
|
||||
The agent manager provides a unified interactive interface with the environment
for RL agent(s). From the actor's perspective, it isolates the complex dependencies
of the various homogeneous/heterogeneous agents, so that the whole agent manager
behaves just like a single agent. Furthermore, to support scalable distributed algorithms,
the agent manager provides two working modes, which can be applied in different
distributed components, such as the inference mode in the actor and the training mode in the learner.
|
||||
|
||||
.. image:: ../images/rl/agent_manager.svg
|
||||
:target: ../images/rl/agent_manager.svg
|
||||
:alt: Agent Manager
|
||||
:width: 750
|
||||
|
||||
* In **inference mode**\ , the agent manager is responsible for accessing and shaping
  the environment state for the related agent, converting the model action to an
  executable environment action, and finally generating experiences from the
  interaction trajectory.
* In **training mode**\ , the agent manager will optimize the underlying model of
  the related agent(s), based on the experiences collected in inference mode (see the sketch below).
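
For a concrete picture, here is a brief sketch of how the two modes are selected when constructing agent managers
for the actor side and the learner side. It mirrors the CIM DQN distributed example elsewhere in this change set;
the concrete manager class and agent factory are passed in as parameters and are assumptions borrowed from that
example rather than part of the core abstraction.

.. code-block:: python

   from maro.rl import AgentManagerMode


   def make_agent_managers(manager_cls, create_agents, agent_id_list, agent_config,
                            state_shaper, action_shaper, experience_shaper):
       # Actor side: INFERENCE mode, with shapers attached so the manager can shape
       # states, convert model actions and generate experiences.
       inference_manager = manager_cls(
           name="cim_actor",
           mode=AgentManagerMode.INFERENCE,
           agent_dict=create_agents(agent_id_list, agent_config),
           state_shaper=state_shaper,
           action_shaper=action_shaper,
           experience_shaper=experience_shaper
       )
       # Learner side: TRAIN mode, which only needs the agents to be optimized.
       train_manager = manager_cls(
           name="cim_learner",
           mode=AgentManagerMode.TRAIN,
           agent_dict=create_agents(agent_id_list, agent_config)
       )
       return inference_manager, train_manager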
|
||||
|
||||
Agent
|
||||
-----
|
||||
|
||||
The Agent is the kernel abstraction of the RL formulation for a real-world problem.
Our abstraction decouples an agent and its underlying model so that the agent can exist
as an RL paradigm independent of the inner workings of the models it uses to generate
actions or estimate values. For example, the actor-critic algorithm does not need to
concern itself with the structures and optimizing schemes of the actor and critic models.
This decoupling is achieved by the Core Model abstraction described below.
|
||||
|
||||
|
||||
.. image:: ../images/rl/agent.svg
|
||||
:target: ../images/rl/agent.svg
|
||||
|
@ -116,96 +26,96 @@ scenario agnostic.
|
|||
.. code-block:: python
|
||||
|
||||
class AbsAgent(ABC):
    def __init__(self, model: AbsCoreModel, config, experience_pool=None):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = model.to(self.device)
        self.config = config
        self._experience_pool = experience_pool
|
||||
|
||||
|
||||
Algorithm
|
||||
---------
|
||||
|
||||
The algorithm is the kernel abstraction of the RL formulation for a real-world problem. Our abstraction
|
||||
decouples algorithm and model so that an algorithm can exist as an RL paradigm independent of the inner
|
||||
workings of the models it uses to generate actions or estimate values. For example, the actor-critic
|
||||
algorithm does not need to concern itself with the structures and optimizing schemes of the actor and
|
||||
critic models. This decoupling is achieved by the ``LearningModel`` abstraction described below.
|
||||
|
||||
|
||||
.. image:: ../images/rl/algorithm.svg
|
||||
:target: ../images/rl/algorithm.svg
|
||||
:alt: Algorithm
|
||||
:width: 650
|
||||
|
||||
* ``choose_action`` is used to make a decision based on a provided model state.
|
||||
* ``train`` is used to trigger training and the policy update from external.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
class AbsAlgorithm(ABC):
|
||||
def __init__(self, model: LearningModel, config):
|
||||
self._device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
||||
self._model = model.to(self._device)
|
||||
self._config = config
|
||||
|
||||
|
||||
Core Model
----------

MARO provides an abstraction for the underlying models used by agents to form policies and estimate values.
The abstraction consists of ``AbsBlock`` and ``AbsCoreModel``, both of which subclass torch's nn.Module.
The ``AbsBlock`` represents the smallest structural unit of an NN-based model. For instance, the ``FullyConnectedBlock``
provided in the toolkit is a stack of fully connected layers with features like batch normalization,
drop-out and skip connection. The ``AbsCoreModel`` is a collection of network components with
embedded optimizers and serves as an agent's "brain" by providing a unified interface to it, regardless of
how many individual models it requires and how complex the model architecture might be.
|
||||
|
||||
.. image:: ../images/rl/learning_model.svg
|
||||
:target: ../images/rl/learning_model.svg
|
||||
:alt: Algorithm
|
||||
:width: 650
|
||||
|
||||
As an example, the initialization of the actor-critic algorithm may look like this:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
actor_stack = FullyConnectedBlock(...)
critic_stack = FullyConnectedBlock(...)
model = SimpleMultiHeadModel(
    {"actor": actor_stack, "critic": critic_stack},
    optim_option={
        "actor": OptimOption(optim_cls=Adam, optim_params={"lr": 0.001}),
        "critic": OptimOption(optim_cls=RMSprop, optim_params={"lr": 0.0001})
    }
)
agent = ActorCritic(model, config)
|
||||
|
||||
Choosing an action is simply:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
model(state, task_name="actor", training=False)
|
||||
|
||||
And performing one gradient step is simply:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
model.learn(critic_loss + actor_loss)
|
||||
|
||||
|
||||
Explorer
--------

MARO provides an abstraction for exploration in RL. Some RL algorithms such as DQN and DDPG require
explicit exploration governed by a set of parameters. The ``AbsExplorer`` class is designed to cater
to these needs. Simple exploration schemes, such as ``EpsilonGreedyExplorer`` for discrete action space
and ``UniformNoiseExplorer`` and ``GaussianNoiseExplorer`` for continuous action space, are provided in
the toolkit.
|
||||
|
||||
As an example, the exploration for DQN may be carried out with the aid of an ``EpsilonGreedyExplorer``:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
explorer = EpsilonGreedyExplorer(num_actions=10)
greedy_action = learning_model(state, training=False).argmax(dim=1).data
exploration_action = explorer(greedy_action)
|
||||
|
||||
|
||||
Tools for Training
|
||||
------------------------------
|
||||
|
||||
.. image:: ../images/rl/learner_actor.svg
|
||||
:target: ../images/rl/learner_actor.svg
|
||||
:alt: RL Overview
|
||||
|
||||
The RL toolkit provides tools that make local and distributed training easy; a wiring sketch follows the list below:
|
||||
* Learner, the central controller of the learning process, which consists of collecting simulation data from
|
||||
remote actors and training the agents with them. The training data collection can be done in local or
|
||||
distributed fashion by loading an ``Actor`` or ``ActorProxy`` instance, respectively.
|
||||
* Actor, which implements the ``roll_out`` method where the agent interacts with the environment for one
|
||||
episode. It consists of an environment instance and an agent (a single agent or multiple agents wrapped by
|
||||
``MultiAgentWrapper``). The class provides the ``as_worker()`` method, which turns it into an event loop where roll-outs
|
||||
are performed on the learner's demand. In distributed RL, there are typically many actor processes running
|
||||
simultaneously to parallelize training data collection.
|
||||
* Actor proxy, which also implements the ``roll_out`` method with the same signature, but manages a set of remote
|
||||
actors for parallel data collection.
|
||||
* Trajectory, which is primarily responsible for translating between scenario-specific information and model
|
||||
input / output. It implements the following methods which are used as callbacks in the actor's roll-out loop:
|
||||
* ``get_state``, which converts observations of an environment into model input. For example, the observation
|
||||
may be represented by a multi-level data structure, which gets encoded by a state shaper to a one-dimensional
|
||||
vector as input to a neural network. The state shaper usually goes hand in hand with the underlying policy
|
||||
or value models.
|
||||
* ``get_action``, which provides model output with necessary context so that it can be executed by the
|
||||
environment simulator.
|
||||
* ``get_reward``, which computes a reward for a given action.
|
||||
* ``on_env_feedback``, which defines things to do upon getting feedback from the environment.
|
||||
* ``on_finish``, which defines things to do upon completion of a roll-out episode.
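
For concreteness, here is a minimal sketch of how these pieces are wired together for distributed training. It
mirrors the CIM DQN example included in this change set; the ``training_config`` fields, the agent factory
``get_agent`` and the trajectory class passed in below stand in for that example's concrete objects and are
assumptions, not part of the toolkit itself.

.. code-block:: python

   from maro.rl import (
       Actor, ActorProxy, MultiAgentWrapper, OffPolicyLearner, TwoPhaseLinearParameterScheduler
   )
   from maro.simulator import Env


   def run_learner(training_config, get_agent):
       # Central controller: collects roll-out data from remote actors and trains the agents.
       env = Env(**training_config["env"])
       agent = MultiAgentWrapper({name: get_agent() for name in env.agent_idx_list})
       scheduler = TwoPhaseLinearParameterScheduler(training_config["max_episode"], **training_config["exploration"])
       # An ActorProxy exposes the same roll_out interface as a local Actor,
       # but fans the request out to a set of remote actor workers.
       actor = ActorProxy(
           training_config["group"], training_config["num_actors"],
           update_trigger=training_config["learner_update_trigger"]
       )
       OffPolicyLearner(actor, scheduler, agent, **training_config["training"]).run()


   def run_actor(training_config, get_agent, trajectory_cls, trajectory_kwargs):
       # Roll-out worker: as_worker() turns the actor into an event loop that
       # performs roll-outs on the learner's demand.
       env = Env(**training_config["env"])
       agent = MultiAgentWrapper({name: get_agent() for name in env.agent_idx_list})
       Actor(env, agent, trajectory_cls, trajectory_kwargs=trajectory_kwargs).as_worker(training_config["group"])

For purely local training, the ``ActorProxy`` is simply replaced by a local ``Actor`` instance passed to the learner.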
|
||||
|
|
|
@ -0,0 +1,88 @@
|
|||
Command support for scenarios
|
||||
=================================
|
||||
|
||||
After installation, MARO provides a command that generates a project for the user,
making it much easier to use or customize a scenario.

.. code-block:: sh

   maro project new

This command will show a step-by-step wizard to create a new project under the current folder.
Currently it supports two modes.
|
||||
|
||||
|
||||
1. Use built-in scenarios
|
||||
-------------------------
|
||||
|
||||
To use built-in scenarios, answer the first option "Use built-in scenario?" with "yes" or "y"; the default is "yes".
Then you can select a built-in scenario and topology with auto-completion.
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
Use built-in scenario?yes
|
||||
Scenario name:cim
|
||||
Use built-in topology (configuration)?yes
|
||||
Topology name to use:global_trade.22p_l0.0
|
||||
Durations to emulate:1024
|
||||
Number of episodes to emulate:500
|
||||
{'durations': 1024,
|
||||
'scenario': 'cim',
|
||||
'topology': 'global_trade.22p_l0.0',
|
||||
'total_episodes': 500,
|
||||
'use_builtin_scenario': True,
|
||||
'use_builtin_topology': True}
|
||||
|
||||
Is this OK?yes
|
||||
|
||||
If these settings are correct, this command will create a runner.py script, which you can run with:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
python runner.py
|
||||
|
||||
This script contains minimal code to interact with the environment without taking any action; you can then extend it as you wish.
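
The generated file is not reproduced here, but a minimal no-action interaction loop of the kind it contains looks
roughly like the sketch below; the scenario, topology, durations and episode count mirror the wizard answers above
and are otherwise placeholders.

.. code-block:: python

   from maro.simulator import Env

   # Values mirror the wizard answers shown above.
   env = Env(scenario="cim", topology="global_trade.22p_l0.0", durations=1024)

   for episode in range(500):
       metrics, decision_event, is_done = env.step(None)  # take no action
       while not is_done:
           metrics, decision_event, is_done = env.step(None)
       print(f"episode {episode}: {metrics}")
       env.reset()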
|
||||
|
||||
You can also create your own topology (configuration) if you answer "no" to the option "Use built-in topology (configuration)?".
It will ask you for the name of the new topology, then copy the content of the built-in one into your working folder (topologies/your_topology_name/config.yml).
|
||||
|
||||
|
||||
2. Customized scenario
|
||||
-------------------------------
|
||||
|
||||
This mode generates a template of a customized scenario for you, instead of writing it from scratch.
To enable it, answer "no" to the option "Use built-in scenario", then provide your scenario name; the default is the current folder name.
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
Use built-in scenario?no
|
||||
New scenario name:my_test
|
||||
New topology name:my_test
|
||||
Durations to emulate:1000
|
||||
Number of episodes to emulate:100
|
||||
{'durations': 1000,
|
||||
'scenario': 'my_test',
|
||||
'topology': 'my_test',
|
||||
'total_episodes': 100,
|
||||
'use_builtin_scenario': False,
|
||||
'use_builtin_topology': False}
|
||||
|
||||
Is this OK?yes
|
||||
|
||||
This will generate the following files:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
-- runner.py
|
||||
-- scenario
|
||||
-- business_engine.py
|
||||
-- common.py
|
||||
-- events.py
|
||||
-- frame_builder.py
|
||||
-- topologies
|
||||
-- my_test
|
||||
-- config.yml
|
||||
|
||||
The script "runner.py" is the entry of this project, it will interactive with your scenario without action.
|
||||
Then you can fill "scenario/business_engine.py" with your own logic.
|
|
@ -0,0 +1,2 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
|
@ -0,0 +1,11 @@
|
|||
# Container Inventory Management
|
||||
|
||||
Container inventory management (CIM) is a scenario where reinforcement learning (RL) can potentially prove useful. Three algorithms are used to learn the multi-agent policy in given environments. Each algorithm has a ``config`` folder which contains ``agent_config.py`` and ``training_config.py``. The former contains parameters for the underlying models and algorithm specific hyper-parameters. The latter contains parameters for the environment and the main training loop. The file ``common.py`` contains parameters and utility functions shared by some or all of these algorithms.
|
||||
|
||||
In the ``ac`` folder, the policy is trained using the Actor-Critic algorithm in a single-threaded fashion. The example can be run by simply executing ``python3 main.py``. Logs will be saved in a file named ``cim-ac.CURRENT_TIME_STAMP.log`` under the ``ac/logs`` folder, where ``CURRENT_TIME_STAMP`` is the time of executing the script.
|
||||
|
||||
In the ``dqn`` folder, the policy is trained using the DQN algorithm in multi-process / distributed mode. This example can be run in three ways.
|
||||
* ``python3 main.py`` or ``python3 main.py -w 0`` runs the example in multi-process mode, in which a main process spawns one learner process and a number of actor processes as specified in ``config/training_config.py``.
|
||||
* ``python3 main.py -w 1`` launches the learner process only. This is for distributed training and expects a number of actor processes (as specified in ``config/training_config.py``) running on some other node(s).
|
||||
* ``python3 main.py -w 2`` launches the actor process only. This is for distributed training and expects a learner process running on some other node.
|
||||
Logs will be saved in a file named ``GROUP_NAME.log`` under the ``{ac_gnn, dqn}/logs`` folder, where ``GROUP_NAME`` is specified in the "group" field in ``config/training_config.py``.
|
|
@ -0,0 +1,2 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
|
@ -0,0 +1,7 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from .agent_config import agent_config
|
||||
from .training_config import training_config
|
||||
|
||||
__all__ = ["agent_config", "training_config"]
|
|
@ -0,0 +1,52 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from torch import nn
|
||||
from torch.optim import Adam, RMSprop
|
||||
|
||||
from maro.rl import OptimOption
|
||||
|
||||
from examples.cim.common import common_config
|
||||
|
||||
input_dim = (
|
||||
(common_config["look_back"] + 1) *
|
||||
(common_config["max_ports_downstream"] + 1) *
|
||||
len(common_config["port_attributes"]) +
|
||||
len(common_config["vessel_attributes"])
|
||||
)
|
||||
|
||||
agent_config = {
|
||||
"model": {
|
||||
"actor": {
|
||||
"input_dim": input_dim,
|
||||
"output_dim": len(common_config["action_space"]),
|
||||
"hidden_dims": [256, 128, 64],
|
||||
"activation": nn.Tanh,
|
||||
"softmax": True,
|
||||
"batch_norm": False,
|
||||
"head": True
|
||||
},
|
||||
"critic": {
|
||||
"input_dim": input_dim,
|
||||
"output_dim": 1,
|
||||
"hidden_dims": [256, 128, 64],
|
||||
"activation": nn.LeakyReLU,
|
||||
"softmax": False,
|
||||
"batch_norm": True,
|
||||
"head": True
|
||||
}
|
||||
},
|
||||
"optimization": {
|
||||
"actor": OptimOption(optim_cls=Adam, optim_params={"lr": 0.001}),
|
||||
"critic": OptimOption(optim_cls=RMSprop, optim_params={"lr": 0.001})
|
||||
},
|
||||
"hyper_params": {
|
||||
"reward_discount": .0,
|
||||
"critic_loss_func": nn.SmoothL1Loss(),
|
||||
"train_iters": 10,
|
||||
"actor_loss_coefficient": 0.1,
|
||||
"k": 1,
|
||||
"lam": 0.0
|
||||
# "clip_ratio": 0.8
|
||||
}
|
||||
}
|
|
@ -0,0 +1,11 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
training_config = {
|
||||
"env": {
|
||||
"scenario": "cim",
|
||||
"topology": "toy.4p_ssdd_l0.0",
|
||||
"durations": 1120,
|
||||
},
|
||||
"max_episode": 50
|
||||
}
|
|
@ -0,0 +1,59 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from collections import defaultdict, deque
|
||||
from os import makedirs, system
|
||||
from os.path import dirname, join, realpath
|
||||
|
||||
import numpy as np
|
||||
from torch import nn
|
||||
from torch.optim import Adam, RMSprop
|
||||
|
||||
from maro.rl import (
|
||||
Actor, ActorCritic, ActorCriticConfig, FullyConnectedBlock, MultiAgentWrapper, SimpleMultiHeadModel,
|
||||
Scheduler, OnPolicyLearner
|
||||
)
|
||||
from maro.simulator import Env
|
||||
from maro.utils import Logger, set_seeds
|
||||
|
||||
from examples.cim.ac.config import agent_config, training_config
|
||||
from examples.cim.common import CIMTrajectory, common_config
|
||||
|
||||
|
||||
def get_ac_agent():
|
||||
actor_net = FullyConnectedBlock(**agent_config["model"]["actor"])
|
||||
critic_net = FullyConnectedBlock(**agent_config["model"]["critic"])
|
||||
ac_model = SimpleMultiHeadModel(
|
||||
{"actor": actor_net, "critic": critic_net}, optim_option=agent_config["optimization"],
|
||||
)
|
||||
return ActorCritic(ac_model, ActorCriticConfig(**agent_config["hyper_params"]))
|
||||
|
||||
|
||||
class CIMTrajectoryForAC(CIMTrajectory):
|
||||
def on_finish(self):
|
||||
training_data = {}
|
||||
for event, state, action in zip(self.trajectory["event"], self.trajectory["state"], self.trajectory["action"]):
|
||||
agent_id = list(state.keys())[0]
|
||||
data = training_data.setdefault(agent_id, {"args": [[] for _ in range(4)]})
|
||||
data["args"][0].append(state[agent_id]) # state
|
||||
data["args"][1].append(action[agent_id][0]) # action
|
||||
data["args"][2].append(action[agent_id][1]) # log_p
|
||||
data["args"][3].append(self.get_offline_reward(event)) # reward
|
||||
|
||||
for agent_id in training_data:
|
||||
training_data[agent_id]["args"] = [
|
||||
np.asarray(vals, dtype=np.float32 if i == 3 else None)
|
||||
for i, vals in enumerate(training_data[agent_id]["args"])
|
||||
]
|
||||
|
||||
return training_data
|
||||
|
||||
|
||||
# Single-threaded launcher
|
||||
if __name__ == "__main__":
|
||||
set_seeds(1024) # for reproducibility
|
||||
env = Env(**training_config["env"])
|
||||
agent = MultiAgentWrapper({name: get_ac_agent() for name in env.agent_idx_list})
|
||||
actor = Actor(env, agent, CIMTrajectoryForAC, trajectory_kwargs=common_config) # local actor
|
||||
learner = OnPolicyLearner(actor, training_config["max_episode"])
|
||||
learner.run()
|
|
@ -0,0 +1,99 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from collections import defaultdict
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import Trajectory
|
||||
from maro.simulator.scenarios.cim.common import Action, ActionType
|
||||
|
||||
common_config = {
|
||||
"port_attributes": ["empty", "full", "on_shipper", "on_consignee", "booking", "shortage", "fulfillment"],
|
||||
"vessel_attributes": ["empty", "full", "remaining_space"],
|
||||
"action_space": list(np.linspace(-1.0, 1.0, 21)),
|
||||
# Parameters for computing states
|
||||
"look_back": 7,
|
||||
"max_ports_downstream": 2,
|
||||
# Parameters for computing actions
|
||||
"finite_vessel_space": True,
|
||||
"has_early_discharge": True,
|
||||
# Parameters for computing rewards
|
||||
"reward_time_window": 99,
|
||||
"fulfillment_factor": 1.0,
|
||||
"shortage_factor": 1.0,
|
||||
"time_decay": 0.97
|
||||
}
|
||||
|
||||
|
||||
class CIMTrajectory(Trajectory):
|
||||
def __init__(
|
||||
self, env, *, port_attributes, vessel_attributes, action_space, look_back, max_ports_downstream,
|
||||
reward_time_window, fulfillment_factor, shortage_factor, time_decay,
|
||||
finite_vessel_space=True, has_early_discharge=True
|
||||
):
|
||||
super().__init__(env)
|
||||
self.port_attributes = port_attributes
|
||||
self.vessel_attributes = vessel_attributes
|
||||
self.action_space = action_space
|
||||
self.look_back = look_back
|
||||
self.max_ports_downstream = max_ports_downstream
|
||||
self.reward_time_window = reward_time_window
|
||||
self.fulfillment_factor = fulfillment_factor
|
||||
self.shortage_factor = shortage_factor
|
||||
self.time_decay = time_decay
|
||||
self.finite_vessel_space = finite_vessel_space
|
||||
self.has_early_discharge = has_early_discharge
|
||||
|
||||
def get_state(self, event):
|
||||
vessel_snapshots, port_snapshots = self.env.snapshot_list["vessels"], self.env.snapshot_list["ports"]
|
||||
tick, port_idx, vessel_idx = event.tick, event.port_idx, event.vessel_idx
|
||||
ticks = [tick - rt for rt in range(self.look_back - 1)]
|
||||
future_port_idx_list = vessel_snapshots[tick: vessel_idx: 'future_stop_list'].astype('int')
|
||||
port_features = port_snapshots[ticks: [port_idx] + list(future_port_idx_list): self.port_attributes]
|
||||
vessel_features = vessel_snapshots[tick: vessel_idx: self.vessel_attributes]
|
||||
return {port_idx: np.concatenate((port_features, vessel_features))}
|
||||
|
||||
def get_action(self, action_by_agent, event):
|
||||
vessel_snapshots = self.env.snapshot_list["vessels"]
|
||||
action_info = list(action_by_agent.values())[0]
|
||||
model_action = action_info[0] if isinstance(action_info, tuple) else action_info
|
||||
scope, tick, port, vessel = event.action_scope, event.tick, event.port_idx, event.vessel_idx
|
||||
zero_action_idx = len(self.action_space) / 2 # index corresponding to value zero.
|
||||
vessel_space = vessel_snapshots[tick:vessel:self.vessel_attributes][2] if self.finite_vessel_space else float("inf")
|
||||
early_discharge = vessel_snapshots[tick:vessel:"early_discharge"][0] if self.has_early_discharge else 0
|
||||
percent = abs(self.action_space[model_action])
|
||||
|
||||
if model_action < zero_action_idx:
|
||||
action_type = ActionType.LOAD
|
||||
actual_action = min(round(percent * scope.load), vessel_space)
|
||||
elif model_action > zero_action_idx:
|
||||
action_type = ActionType.DISCHARGE
|
||||
plan_action = percent * (scope.discharge + early_discharge) - early_discharge
|
||||
actual_action = round(plan_action) if plan_action > 0 else round(percent * scope.discharge)
|
||||
else:
|
||||
actual_action, action_type = 0, None
|
||||
|
||||
return {port: Action(vessel, port, actual_action, action_type)}
|
||||
|
||||
def get_offline_reward(self, event):
|
||||
port_snapshots = self.env.snapshot_list["ports"]
|
||||
start_tick = event.tick + 1
|
||||
ticks = list(range(start_tick, start_tick + self.reward_time_window))
|
||||
|
||||
future_fulfillment = port_snapshots[ticks::"fulfillment"]
|
||||
future_shortage = port_snapshots[ticks::"shortage"]
|
||||
decay_list = [
|
||||
self.time_decay ** i for i in range(self.reward_time_window)
|
||||
for _ in range(future_fulfillment.shape[0] // self.reward_time_window)
|
||||
]
|
||||
|
||||
tot_fulfillment = np.dot(future_fulfillment, decay_list)
|
||||
tot_shortage = np.dot(future_shortage, decay_list)
|
||||
|
||||
return np.float32(self.fulfillment_factor * tot_fulfillment - self.shortage_factor * tot_shortage)
|
||||
|
||||
def on_env_feedback(self, event, state_by_agent, action_by_agent, reward):
|
||||
self.trajectory["event"].append(event)
|
||||
self.trajectory["state"].append(state_by_agent)
|
||||
self.trajectory["action"].append(action_by_agent)
|
|
@ -1,24 +0,0 @@
|
|||
# Overview
|
||||
|
||||
The CIM problem is one of the quintessential use cases of MARO. The example can
|
||||
be run with a set of scenario configurations that can be found under
|
||||
maro/simulator/scenarios/cim. General experimental parameters (e.g., type of
|
||||
topology, type of algorithm to use, number of training episodes) can be configured
|
||||
through config.yml. Each RL formulation has a dedicated folder, e.g., dqn, and
|
||||
all algorithm-specific parameters can be configured through
|
||||
the config.py file in that folder.
|
||||
|
||||
## Single-host Single-process Mode
|
||||
|
||||
To run the CIM example using the DQN algorithm under single-host mode, go to
|
||||
examples/cim/dqn and run single_process_launcher.py. You may play around with
|
||||
the configuration if you want to try out different settings.
|
||||
|
||||
## Distributed Mode
|
||||
|
||||
The examples/cim/dqn/components folder contains dist_learner.py and dist_actor.py
|
||||
for distributed training. For debugging purposes, we provide a script that
|
||||
simulates distributed mode using multi-processing. Simply go to examples/cim/dqn
|
||||
and execute python3 multi_process_launcher.py \[GROUP_NAME\] \[NUM_ACTORS\], where
|
||||
GROUP_NAME is the identifier for the current run and NUM_ACTORS is the number of actor
|
||||
processes to launch.
|
|
@ -0,0 +1,2 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
|
@ -1,14 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from .action_shaper import CIMActionShaper
|
||||
from .agent_manager import DQNAgentManager, create_dqn_agents
|
||||
from .experience_shaper import TruncatedExperienceShaper
|
||||
from .state_shaper import CIMStateShaper
|
||||
|
||||
__all__ = [
|
||||
"CIMActionShaper",
|
||||
"DQNAgentManager", "create_dqn_agents",
|
||||
"TruncatedExperienceShaper",
|
||||
"CIMStateShaper"
|
||||
]
|
|
@ -1,36 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from maro.rl import ActionShaper
|
||||
from maro.simulator.scenarios.cim.common import Action
|
||||
|
||||
|
||||
class CIMActionShaper(ActionShaper):
|
||||
def __init__(self, action_space):
|
||||
super().__init__()
|
||||
self._action_space = action_space
|
||||
self._zero_action_index = action_space.index(0)
|
||||
|
||||
def __call__(self, model_action, decision_event, snapshot_list):
|
||||
scope = decision_event.action_scope
|
||||
tick = decision_event.tick
|
||||
port_idx = decision_event.port_idx
|
||||
vessel_idx = decision_event.vessel_idx
|
||||
|
||||
port_empty = snapshot_list["ports"][tick: port_idx: ["empty", "full", "on_shipper", "on_consignee"]][0]
|
||||
vessel_remaining_space = snapshot_list["vessels"][tick: vessel_idx: ["empty", "full", "remaining_space"]][2]
|
||||
early_discharge = snapshot_list["vessels"][tick:vessel_idx: "early_discharge"][0]
|
||||
assert 0 <= model_action < len(self._action_space)
|
||||
|
||||
if model_action < self._zero_action_index:
|
||||
actual_action = max(round(self._action_space[model_action] * port_empty), -vessel_remaining_space)
|
||||
elif model_action > self._zero_action_index:
|
||||
plan_action = self._action_space[model_action] * (scope.discharge + early_discharge) - early_discharge
|
||||
actual_action = (
|
||||
round(plan_action) if plan_action > 0
|
||||
else round(self._action_space[model_action] * scope.discharge)
|
||||
)
|
||||
else:
|
||||
actual_action = 0
|
||||
|
||||
return Action(vessel_idx, port_idx, actual_action)
|
|
@ -1,60 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
import pickle
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import AbsAgent, ColumnBasedStore
|
||||
|
||||
|
||||
class DQNAgent(AbsAgent):
|
||||
"""Implementation of AbsAgent for the DQN algorithm.
|
||||
|
||||
Args:
|
||||
name (str): Agent's name.
|
||||
algorithm (AbsAlgorithm): A concrete algorithm instance that inherits from AbstractAlgorithm.
|
||||
experience_pool (AbsStore): It is used to store experiences processed by the experience shaper, which will be
|
||||
used by some value-based algorithms, such as DQN.
|
||||
min_experiences_to_train: minimum number of experiences required for training.
|
||||
num_batches: number of batches to train the DQN model on per call to ``train``.
|
||||
batch_size: mini-batch size.
|
||||
"""
|
||||
def __init__(
|
||||
self,
|
||||
name: str,
|
||||
algorithm,
|
||||
experience_pool: ColumnBasedStore,
|
||||
min_experiences_to_train,
|
||||
num_batches,
|
||||
batch_size
|
||||
):
|
||||
super().__init__(name, algorithm, experience_pool=experience_pool)
|
||||
self._min_experiences_to_train = min_experiences_to_train
|
||||
self._num_batches = num_batches
|
||||
self._batch_size = batch_size
|
||||
|
||||
def train(self):
|
||||
"""Implementation of the training loop for DQN.
|
||||
|
||||
Experiences are sampled using their TD errors as weights. After training, the new TD errors are updated
|
||||
in the experience pool.
|
||||
"""
|
||||
if len(self._experience_pool) < self._min_experiences_to_train:
|
||||
return
|
||||
|
||||
for _ in range(self._num_batches):
|
||||
indexes, sample = self._experience_pool.sample_by_key("loss", self._batch_size)
|
||||
state = np.asarray(sample["state"])
|
||||
action = np.asarray(sample["action"])
|
||||
reward = np.asarray(sample["reward"])
|
||||
next_state = np.asarray(sample["next_state"])
|
||||
loss = self._algorithm.train(state, action, reward, next_state)
|
||||
self._experience_pool.update(indexes, {"loss": loss})
|
||||
|
||||
def dump_experience_pool(self, dir_path: str):
|
||||
"""Dump the experience pool to disk."""
|
||||
os.makedirs(dir_path, exist_ok=True)
|
||||
with open(os.path.join(dir_path, self._name), "wb") as fp:
|
||||
pickle.dump(self._experience_pool, fp)
|
|
@ -1,57 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import torch.nn as nn
|
||||
from torch.optim import RMSprop
|
||||
|
||||
from maro.rl import (
|
||||
ColumnBasedStore, DQN, DQNConfig, FullyConnectedBlock, LearningModel, NNStack, OptimizerOptions,
|
||||
SimpleAgentManager
|
||||
)
|
||||
from maro.utils import set_seeds
|
||||
|
||||
from .agent import DQNAgent
|
||||
|
||||
|
||||
def create_dqn_agents(agent_id_list, config):
|
||||
num_actions = config.algorithm.num_actions
|
||||
set_seeds(config.seed)
|
||||
agent_dict = {}
|
||||
for agent_id in agent_id_list:
|
||||
q_net = NNStack(
|
||||
"q_value",
|
||||
FullyConnectedBlock(
|
||||
input_dim=config.algorithm.input_dim,
|
||||
output_dim=num_actions,
|
||||
activation=nn.LeakyReLU,
|
||||
is_head=True,
|
||||
**config.algorithm.model
|
||||
)
|
||||
)
|
||||
learning_model = LearningModel(
|
||||
q_net,
|
||||
optimizer_options=OptimizerOptions(cls=RMSprop, params=config.algorithm.optimizer)
|
||||
)
|
||||
algorithm = DQN(
|
||||
learning_model,
|
||||
DQNConfig(**config.algorithm.hyper_params, loss_cls=nn.SmoothL1Loss)
|
||||
)
|
||||
agent_dict[agent_id] = DQNAgent(
|
||||
agent_id, algorithm, ColumnBasedStore(**config.experience_pool),
|
||||
**config.training_loop_parameters
|
||||
)
|
||||
|
||||
return agent_dict
|
||||
|
||||
|
||||
class DQNAgentManager(SimpleAgentManager):
|
||||
def train(self, experiences_by_agent, performance=None):
|
||||
self._assert_train_mode()
|
||||
|
||||
# store experiences for each agent
|
||||
for agent_id, exp in experiences_by_agent.items():
|
||||
exp.update({"loss": [1e8] * len(list(exp.values())[0])})
|
||||
self.agent_dict[agent_id].store_experiences(exp)
|
||||
|
||||
for agent in self.agent_dict.values():
|
||||
agent.train()
|
|
@ -1,20 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
"""
|
||||
This file is used to load the configuration and convert it into a dotted dictionary.
|
||||
"""
|
||||
|
||||
import io
|
||||
import os
|
||||
|
||||
import yaml
|
||||
|
||||
|
||||
CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../config.yml")
|
||||
with io.open(CONFIG_PATH, "r") as in_file:
|
||||
config = yaml.safe_load(in_file)
|
||||
|
||||
DISTRIBUTED_CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../distributed_config.yml")
|
||||
with io.open(DISTRIBUTED_CONFIG_PATH, "r") as in_file:
|
||||
distributed_config = yaml.safe_load(in_file)
|
|
@ -1,52 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from collections import defaultdict
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import ExperienceShaper
|
||||
|
||||
|
||||
class TruncatedExperienceShaper(ExperienceShaper):
|
||||
def __init__(
|
||||
self, *, time_window: int, time_decay_factor: float, fulfillment_factor: float, shortage_factor: float
|
||||
):
|
||||
super().__init__(reward_func=None)
|
||||
self._time_window = time_window
|
||||
self._time_decay_factor = time_decay_factor
|
||||
self._fulfillment_factor = fulfillment_factor
|
||||
self._shortage_factor = shortage_factor
|
||||
|
||||
def __call__(self, trajectory, snapshot_list):
|
||||
experiences_by_agent = {}
|
||||
for i in range(len(trajectory) - 1):
|
||||
transition = trajectory[i]
|
||||
agent_id = transition["agent_id"]
|
||||
if agent_id not in experiences_by_agent:
|
||||
experiences_by_agent[agent_id] = defaultdict(list)
|
||||
experiences = experiences_by_agent[agent_id]
|
||||
experiences["state"].append(transition["state"])
|
||||
experiences["action"].append(transition["action"])
|
||||
experiences["reward"].append(self._compute_reward(transition["event"], snapshot_list))
|
||||
experiences["next_state"].append(trajectory[i + 1]["state"])
|
||||
|
||||
return experiences_by_agent
|
||||
|
||||
def _compute_reward(self, decision_event, snapshot_list):
|
||||
start_tick = decision_event.tick + 1
|
||||
end_tick = decision_event.tick + self._time_window
|
||||
ticks = list(range(start_tick, end_tick))
|
||||
|
||||
# calculate tc reward
|
||||
future_fulfillment = snapshot_list["ports"][ticks::"fulfillment"]
|
||||
future_shortage = snapshot_list["ports"][ticks::"shortage"]
|
||||
decay_list = [
|
||||
self._time_decay_factor ** i for i in range(end_tick - start_tick)
|
||||
for _ in range(future_fulfillment.shape[0] // (end_tick - start_tick))
|
||||
]
|
||||
|
||||
tot_fulfillment = np.dot(future_fulfillment, decay_list)
|
||||
tot_shortage = np.dot(future_shortage, decay_list)
|
||||
|
||||
return np.float32(self._fulfillment_factor * tot_fulfillment - self._shortage_factor * tot_shortage)
|
|
@ -1,30 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import StateShaper
|
||||
|
||||
PORT_ATTRIBUTES = ["empty", "full", "on_shipper", "on_consignee", "booking", "shortage", "fulfillment"]
|
||||
VESSEL_ATTRIBUTES = ["empty", "full", "remaining_space"]
|
||||
|
||||
|
||||
class CIMStateShaper(StateShaper):
|
||||
def __init__(self, *, look_back, max_ports_downstream):
|
||||
super().__init__()
|
||||
self._look_back = look_back
|
||||
self._max_ports_downstream = max_ports_downstream
|
||||
self._dim = (look_back + 1) * (max_ports_downstream + 1) * len(PORT_ATTRIBUTES) + len(VESSEL_ATTRIBUTES)
|
||||
|
||||
def __call__(self, decision_event, snapshot_list):
|
||||
tick, port_idx, vessel_idx = decision_event.tick, decision_event.port_idx, decision_event.vessel_idx
|
||||
ticks = [tick - rt for rt in range(self._look_back - 1)]
|
||||
future_port_idx_list = snapshot_list["vessels"][tick: vessel_idx: 'future_stop_list'].astype('int')
|
||||
port_features = snapshot_list["ports"][ticks: [port_idx] + list(future_port_idx_list): PORT_ATTRIBUTES]
|
||||
vessel_features = snapshot_list["vessels"][tick: vessel_idx: VESSEL_ATTRIBUTES]
|
||||
state = np.concatenate((port_features, vessel_features))
|
||||
return str(port_idx), state
|
||||
|
||||
@property
|
||||
def dim(self):
|
||||
return self._dim
|
|
@ -1,48 +0,0 @@
|
|||
env:
|
||||
scenario: "cim"
|
||||
topology: "toy.4p_ssdd_l0.0"
|
||||
durations: 1120
|
||||
state_shaping:
|
||||
look_back: 7
|
||||
max_ports_downstream: 2
|
||||
experience_shaping:
|
||||
time_window: 100
|
||||
fulfillment_factor: 1.0
|
||||
shortage_factor: 1.0
|
||||
time_decay_factor: 0.97
|
||||
main_loop:
|
||||
max_episode: 500
|
||||
exploration:
|
||||
parameter_names:
|
||||
- "epsilon"
|
||||
split_ep: 250
|
||||
start_values: 0.4
|
||||
mid_values: 0.32
|
||||
end_values: 0.0
|
||||
agents:
|
||||
algorithm:
|
||||
num_actions: 21
|
||||
model:
|
||||
hidden_dims:
|
||||
- 256
|
||||
- 128
|
||||
- 64
|
||||
softmax_enabled: false
|
||||
batch_norm_enabled: true
|
||||
skip_connection_enabled: false
|
||||
dropout_p: 0.0
|
||||
optimizer:
|
||||
lr: 0.05
|
||||
hyper_params:
|
||||
reward_discount: .0
|
||||
target_update_frequency: 5
|
||||
tau: 0.1
|
||||
is_double: true
|
||||
per_sample_td_error_enabled: true
|
||||
experience_pool:
|
||||
capacity: -1
|
||||
training_loop_parameters:
|
||||
min_experiences_to_train: 1024
|
||||
num_batches: 10
|
||||
batch_size: 128
|
||||
seed: 32 # for reproducibility
|
|
@ -0,0 +1,7 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from .agent_config import agent_config
|
||||
from .training_config import training_config
|
||||
|
||||
__all__ = ["agent_config", "training_config"]
|
|
@ -0,0 +1,38 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from torch import nn
|
||||
from torch.optim import RMSprop
|
||||
|
||||
from maro.rl import DQN, DQNConfig, FullyConnectedBlock, OptimOption, PolicyGradient, SimpleMultiHeadModel
|
||||
|
||||
from examples.cim.common import common_config
|
||||
|
||||
input_dim = (
|
||||
(common_config["look_back"] + 1) *
|
||||
(common_config["max_ports_downstream"] + 1) *
|
||||
len(common_config["port_attributes"]) +
|
||||
len(common_config["vessel_attributes"])
|
||||
)
|
||||
|
||||
agent_config = {
|
||||
"model": {
|
||||
"input_dim": input_dim,
|
||||
"output_dim": len(common_config["action_space"]), # number of possible actions
|
||||
"hidden_dims": [256, 128, 64],
|
||||
"activation": nn.LeakyReLU,
|
||||
"softmax": False,
|
||||
"batch_norm": True,
|
||||
"skip_connection": False,
|
||||
"head": True,
|
||||
"dropout_p": 0.0
|
||||
},
|
||||
"optimization": OptimOption(optim_cls=RMSprop, optim_params={"lr": 0.05}),
|
||||
"hyper_params": {
|
||||
"reward_discount": .0,
|
||||
"loss_cls": nn.SmoothL1Loss,
|
||||
"target_update_freq": 5,
|
||||
"tau": 0.1,
|
||||
"double": False
|
||||
}
|
||||
}
|
|
@ -0,0 +1,29 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
training_config = {
|
||||
"env": {
|
||||
"scenario": "cim",
|
||||
"topology": "toy.4p_ssdd_l0.0",
|
||||
"durations": 1120,
|
||||
},
|
||||
"max_episode": 100,
|
||||
"exploration": {
|
||||
"parameter_names": ["epsilon"],
|
||||
"split": 0.5,
|
||||
"start": 0.4,
|
||||
"mid": 0.32,
|
||||
"end": 0.0
|
||||
},
|
||||
"training": {
|
||||
"min_experiences_to_train": 1024,
|
||||
"train_iter": 10,
|
||||
"batch_size": 128,
|
||||
"prioritized_sampling_by_loss": True
|
||||
},
|
||||
"group": "cim-dqn",
|
||||
"learner_update_trigger": 2,
|
||||
"num_actors": 2,
|
||||
"num_trainers": 4,
|
||||
"trainer_id": 0
|
||||
}
|
|
@ -1,49 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
from maro.rl import ActorWorker, AgentManagerMode, SimpleActor
|
||||
from maro.simulator import Env
|
||||
from maro.utils import convert_dottable
|
||||
|
||||
from components import CIMActionShaper, CIMStateShaper, DQNAgentManager, TruncatedExperienceShaper, create_dqn_agents
|
||||
|
||||
|
||||
def launch(config, distributed_config):
|
||||
config = convert_dottable(config)
|
||||
distributed_config = convert_dottable(distributed_config)
|
||||
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
state_shaper = CIMStateShaper(**config.env.state_shaping)
|
||||
action_shaper = CIMActionShaper(action_space=list(np.linspace(-1.0, 1.0, config.agents.algorithm.num_actions)))
|
||||
experience_shaper = TruncatedExperienceShaper(**config.env.experience_shaping)
|
||||
|
||||
config["agents"]["algorithm"]["input_dim"] = state_shaper.dim
|
||||
agent_manager = DQNAgentManager(
|
||||
name="cim_actor",
|
||||
mode=AgentManagerMode.INFERENCE,
|
||||
agent_dict=create_dqn_agents(agent_id_list, config.agents),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper
|
||||
)
|
||||
proxy_params = {
|
||||
"group_name": os.environ["GROUP"] if "GROUP" in os.environ else distributed_config.group,
|
||||
"expected_peers": {"learner": 1},
|
||||
"redis_address": (distributed_config.redis.hostname, distributed_config.redis.port),
|
||||
"max_retries": 15
|
||||
}
|
||||
actor_worker = ActorWorker(
|
||||
local_actor=SimpleActor(env=env, agent_manager=agent_manager),
|
||||
proxy_params=proxy_params
|
||||
)
|
||||
actor_worker.launch()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from components.config import config, distributed_config
|
||||
launch(config=config, distributed_config=distributed_config)
|
|
@ -1,51 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
|
||||
from maro.rl import (
|
||||
ActorProxy, AgentManagerMode, SimpleLearner, TwoPhaseLinearParameterScheduler, concat_experiences_by_agent
|
||||
)
|
||||
from maro.simulator import Env
|
||||
from maro.utils import Logger, convert_dottable
|
||||
|
||||
from components import CIMStateShaper, DQNAgentManager, create_dqn_agents
|
||||
|
||||
|
||||
def launch(config, distributed_config):
|
||||
config = convert_dottable(config)
|
||||
distributed_config = convert_dottable(distributed_config)
|
||||
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
|
||||
config["agents"]["algorithm"]["input_dim"] = CIMStateShaper(**config.env.state_shaping).dim
|
||||
agent_manager = DQNAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.TRAIN,
|
||||
agent_dict=create_dqn_agents(agent_id_list, config.agents)
|
||||
)
|
||||
|
||||
proxy_params = {
|
||||
"group_name": os.environ["GROUP"] if "GROUP" in os.environ else distributed_config.group,
|
||||
"expected_peers": {
|
||||
"actor": int(os.environ["NUM_ACTORS"] if "NUM_ACTORS" in os.environ else distributed_config.num_actors)
|
||||
},
|
||||
"redis_address": (distributed_config.redis.hostname, distributed_config.redis.port),
|
||||
"max_retries": 15
|
||||
}
|
||||
|
||||
learner = SimpleLearner(
|
||||
agent_manager=agent_manager,
|
||||
actor=ActorProxy(proxy_params=proxy_params, experience_collecting_func=concat_experiences_by_agent),
|
||||
scheduler=TwoPhaseLinearParameterScheduler(config.main_loop.max_episode, **config.main_loop.exploration),
|
||||
logger=Logger("cim_learner", auto_timestamp=False)
|
||||
)
|
||||
learner.learn()
|
||||
learner.test()
|
||||
learner.dump_models(os.path.join(os.getcwd(), "models"))
|
||||
learner.exit()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from components.config import config, distributed_config
|
||||
launch(config=config, distributed_config=distributed_config)
|
|
@ -1,6 +0,0 @@
|
|||
redis:
|
||||
hostname: "localhost"
|
||||
port: 6379
|
||||
group: test_group
|
||||
num_actors: 1
|
||||
num_learners: 1
|
|
@ -0,0 +1,87 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import argparse
|
||||
import time
|
||||
from collections import defaultdict
|
||||
from multiprocessing import Process
|
||||
from os import makedirs
|
||||
from os.path import dirname, join, realpath
|
||||
|
||||
from maro.rl import (
|
||||
Actor, ActorProxy, DQN, DQNConfig, FullyConnectedBlock, MultiAgentWrapper, OffPolicyLearner,
|
||||
SimpleMultiHeadModel, TwoPhaseLinearParameterScheduler
|
||||
)
|
||||
from maro.simulator import Env
|
||||
from maro.utils import Logger, set_seeds
|
||||
|
||||
from examples.cim.common import CIMTrajectory, common_config
|
||||
from examples.cim.dqn.config import agent_config, training_config
|
||||
|
||||
|
||||
def get_dqn_agent():
|
||||
q_model = SimpleMultiHeadModel(
|
||||
FullyConnectedBlock(**agent_config["model"]), optim_option=agent_config["optimization"]
|
||||
)
|
||||
return DQN(q_model, DQNConfig(**agent_config["hyper_params"]))
|
||||
|
||||
|
||||
class CIMTrajectoryForDQN(CIMTrajectory):
|
||||
def on_finish(self):
|
||||
exp_by_agent = defaultdict(lambda: defaultdict(list))
|
||||
for i in range(len(self.trajectory["state"]) - 1):
|
||||
agent_id = list(self.trajectory["state"][i].keys())[0]
|
||||
exp = exp_by_agent[agent_id]
|
||||
exp["S"].append(self.trajectory["state"][i][agent_id])
|
||||
exp["A"].append(self.trajectory["action"][i][agent_id])
|
||||
exp["R"].append(self.get_offline_reward(self.trajectory["event"][i]))
|
||||
exp["S_"].append(list(self.trajectory["state"][i + 1].values())[0])
|
||||
|
||||
return dict(exp_by_agent)
|
||||
|
||||
|
||||
def cim_dqn_learner():
|
||||
env = Env(**training_config["env"])
|
||||
agent = MultiAgentWrapper({name: get_dqn_agent() for name in env.agent_idx_list})
|
||||
scheduler = TwoPhaseLinearParameterScheduler(training_config["max_episode"], **training_config["exploration"])
|
||||
actor = ActorProxy(
|
||||
training_config["group"], training_config["num_actors"],
|
||||
update_trigger=training_config["learner_update_trigger"]
|
||||
)
|
||||
learner = OffPolicyLearner(actor, scheduler, agent, **training_config["training"])
|
||||
learner.run()
|
||||
|
||||
|
||||
def cim_dqn_actor():
|
||||
env = Env(**training_config["env"])
|
||||
agent = MultiAgentWrapper({name: get_dqn_agent() for name in env.agent_idx_list})
|
||||
actor = Actor(env, agent, CIMTrajectoryForDQN, trajectory_kwargs=common_config)
|
||||
actor.as_worker(training_config["group"])
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument(
|
||||
"-w", "--whoami", type=int, choices=[0, 1, 2], default=0,
|
||||
help="Identity of this process: 0 - multi-process mode, 1 - learner, 2 - actor"
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
if args.whoami == 0:
|
||||
actor_processes = [Process(target=cim_dqn_actor) for _ in range(training_config["num_actors"])]
|
||||
learner_process = Process(target=cim_dqn_learner)
|
||||
|
||||
for i, actor_process in enumerate(actor_processes):
|
||||
set_seeds(i) # this is to ensure that the actors explore differently.
|
||||
actor_process.start()
|
||||
|
||||
learner_process.start()
|
||||
|
||||
for actor_process in actor_processes:
|
||||
actor_process.join()
|
||||
|
||||
learner_process.join()
|
||||
elif args.whoami == 1:
|
||||
cim_dqn_learner()
|
||||
elif args.whoami == 2:
|
||||
cim_dqn_actor()
|
|
@ -1,25 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
"""
|
||||
This script is used to debug distributed algorithm in single host multi-process mode.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import os
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("group_name", help="group name")
|
||||
parser.add_argument("num_actors", type=int, help="number of actors")
|
||||
args = parser.parse_args()
|
||||
learner_path = f"{os.path.split(os.path.realpath(__file__))[0]}/dist_learner.py &"
|
||||
actor_path = f"{os.path.split(os.path.realpath(__file__))[0]}/dist_actor.py &"
|
||||
|
||||
# Launch the learner process
|
||||
os.system(f"GROUP={args.group_name} NUM_ACTORS={args.num_actors} python " + learner_path)
|
||||
|
||||
# Launch the actor processes
|
||||
for _ in range(args.num_actors):
|
||||
os.system(f"GROUP={args.group_name} python " + actor_path)
|
|
@ -1,53 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import AgentManagerMode, SimpleActor, SimpleLearner, TwoPhaseLinearParameterScheduler
|
||||
from maro.simulator import Env
|
||||
from maro.utils import LogFormat, Logger, convert_dottable
|
||||
|
||||
from components import CIMActionShaper, CIMStateShaper, DQNAgentManager, TruncatedExperienceShaper, create_dqn_agents
|
||||
|
||||
|
||||
def launch(config):
|
||||
config = convert_dottable(config)
|
||||
# Step 1: Initialize a CIM environment for using a toy dataset.
|
||||
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
action_space = list(np.linspace(-1.0, 1.0, config.agents.algorithm.num_actions))
|
||||
|
||||
# Step 2: Create state, action and experience shapers. We also need to create an explorer here due to the
|
||||
# greedy nature of the DQN algorithm.
|
||||
state_shaper = CIMStateShaper(**config.env.state_shaping)
|
||||
action_shaper = CIMActionShaper(action_space=action_space)
|
||||
experience_shaper = TruncatedExperienceShaper(**config.env.experience_shaping)
|
||||
|
||||
# Step 3: Create agents and an agent manager.
|
||||
config["agents"]["algorithm"]["input_dim"] = state_shaper.dim
|
||||
agent_manager = DQNAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.TRAIN_INFERENCE,
|
||||
agent_dict=create_dqn_agents(agent_id_list, config.agents),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper
|
||||
)
|
||||
|
||||
# Step 4: Create an actor and a learner to start the training process.
|
||||
scheduler = TwoPhaseLinearParameterScheduler(config.main_loop.max_episode, **config.main_loop.exploration)
|
||||
actor = SimpleActor(env, agent_manager)
|
||||
learner = SimpleLearner(
|
||||
agent_manager, actor, scheduler,
|
||||
logger=Logger("cim_learner", format_=LogFormat.simple, auto_timestamp=False)
|
||||
)
|
||||
learner.learn()
|
||||
learner.test()
|
||||
learner.dump_models(os.path.join(os.getcwd(), "models"))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from components.config import config
|
||||
launch(config)
@@ -1,13 +0,0 @@
from .actor import ParallelActor
|
||||
from .agent_manager import SimpleAgentManger
|
||||
from .learner import GNNLearner
|
||||
from .state_shaper import GNNStateShaper
|
||||
from .utils import decision_cnt_analysis, load_config, return_scaler, save_code, save_config
|
||||
|
||||
__all__ = [
|
||||
"ParallelActor",
|
||||
"SimpleAgentManger",
|
||||
"GNNLearner",
|
||||
"GNNStateShaper",
|
||||
"decision_cnt_analysis", "load_config", "return_scaler", "save_code", "save_config"
|
||||
]
@@ -1,37 +0,0 @@
from maro.rl import ActionShaper
|
||||
|
||||
|
||||
class DiscreteActionShaper(ActionShaper):
|
||||
"""The shaping class to transform the action in [-1, 1] to actual repositioning function."""
|
||||
def __init__(self, action_dim):
|
||||
super().__init__()
|
||||
self._action_dim = action_dim
|
||||
self._zero_action = self._action_dim // 2
|
||||
|
||||
def __call__(self, decision_event, model_action):
|
||||
"""Shaping the action in [-1,1] range to the actual repositioning function.
|
||||
|
||||
This function maps integer model action within the range of [-A, A] to actual action. We define negative actual
|
||||
action as discharge resource from vessel to port and positive action as upload from port to vessel, so the
|
||||
upper bound and lower bound of actual action are the resource in dynamic and static node respectively.
|
||||
|
||||
Args:
|
||||
decision_event (Event): The decision event from the environment.
|
||||
model_action (int): The model's output action, where A is half of the agent's output dimension.
|
||||
"""
|
||||
env_action = 0
|
||||
model_action -= self._zero_action
|
||||
|
||||
action_scope = decision_event.action_scope
|
||||
|
||||
if model_action < 0:
|
||||
# Discharge resource from dynamic node.
|
||||
env_action = round(int(model_action) * 1.0 / self._zero_action * action_scope.load)
|
||||
elif model_action == 0:
|
||||
env_action = 0
|
||||
else:
|
||||
# Load resource to dynamic node.
|
||||
env_action = round(int(model_action) * 1.0 / self._zero_action * action_scope.discharge)
|
||||
env_action = int(env_action)
|
||||
|
||||
return env_action
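A quick, illustrative use of the shaper with a stand-in decision event (the real event comes from the CIM simulator):
# Illustrative only: a stub decision event standing in for the real CIM decision event.
from collections import namedtuple

_Scope = namedtuple("Scope", ["load", "discharge"])
_Event = namedtuple("Event", ["action_scope"])

shaper = DiscreteActionShaper(action_dim=21)  # the zero action sits at index 10
stub_event = _Event(action_scope=_Scope(load=500, discharge=300))
print(shaper(stub_event, 0))    # -500: discharge everything from the vessel
print(shaper(stub_event, 10))   # 0: no repositioning
print(shaper(stub_event, 20))   # 300: load as much as possible onto the vessel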
@@ -1,370 +0,0 @@
import ctypes
|
||||
import multiprocessing
|
||||
import os
|
||||
import pickle
|
||||
import time
|
||||
from collections import OrderedDict
|
||||
from multiprocessing import Pipe, Process
|
||||
|
||||
import numpy as np
|
||||
import torch
|
||||
|
||||
from maro.rl import AbsActor
|
||||
from maro.simulator import Env
|
||||
from maro.simulator.scenarios.cim.common import Action
|
||||
|
||||
from .action_shaper import DiscreteActionShaper
|
||||
from .experience_shaper import ExperienceShaper
|
||||
from .shared_structure import SharedStructure
|
||||
from .state_shaper import GNNStateShaper
|
||||
from .utils import fix_seed, gnn_union
|
||||
|
||||
|
||||
def organize_exp_list(experience_collections: dict, idx_mapping: dict):
|
||||
"""The function assemble the experience from multiple processes into a dictionary.
|
||||
|
||||
Args:
|
||||
experience_collections (dict): It stores the experience of all agents. The structure is the same as that
defined in the SharedStructure of the ParallelActor, except for an additional key for the experience length.
For example:
|
||||
|
||||
{
|
||||
"len": numpy.array,
|
||||
"s": {
|
||||
"v": numpy.array,
|
||||
"p": numpy.array,
|
||||
}
|
||||
"a": numpy.array,
|
||||
"R": numpy.array,
|
||||
"s_": {
|
||||
"v": numpy.array,
|
||||
"p": numpy.array,
|
||||
}
|
||||
}
|
||||
|
||||
Note that the experiences from different agents are stored sequentially in the same batch. For
example, if agent x starts at batch index b_x and its experience is l_x items long, the range
[b_x, b_x + l_x) of the batch is the experience of agent x.
|
||||
|
||||
idx_mapping (dict): The key is the name of each agent and the value is the starting index, e.g., b_x, of the
|
||||
storage space where the experience of the agent is stored.
|
||||
"""
|
||||
result = {}
|
||||
tmpi = 0
|
||||
for code, idx in idx_mapping.items():
|
||||
exp_len = experience_collections["len"][0][tmpi]
|
||||
|
||||
s = organize_obs(experience_collections["s"], idx, exp_len)
|
||||
s_ = organize_obs(experience_collections["s_"], idx, exp_len)
|
||||
R = experience_collections["R"][idx: idx + exp_len]
|
||||
R = R.reshape(-1, *R.shape[2:])
|
||||
a = experience_collections["a"][idx: idx + exp_len]
|
||||
a = a.reshape(-1, *a.shape[2:])
|
||||
|
||||
result[code] = {
|
||||
"R": R,
|
||||
"a": a,
|
||||
"s": s,
|
||||
"s_": s_,
|
||||
"len": a.shape[0]
|
||||
}
|
||||
tmpi += 1
|
||||
return result
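A minimal runnable sketch of the assembly above with made-up shapes (2 agents, 1 parallel environment), just to show how the per-agent slices are pulled out of the shared batch; all names and sizes here are illustrative assumptions:
import numpy as np

tick_buffer, seq_len, para, v_cnt, p_cnt, v_dim, p_dim, e_dim = 4, 5, 1, 2, 3, 6, 7, 1

def toy_obs():
    # Same layout as the SharedStructure observation fields, filled with zeros.
    return {
        "v": np.zeros((tick_buffer, seq_len, para, v_cnt, v_dim)),
        "p": np.zeros((tick_buffer, seq_len, para, p_cnt, p_dim)),
        "vo": np.zeros((seq_len, para, v_cnt, p_cnt), dtype=np.int64),
        "po": np.zeros((seq_len, para, p_cnt, v_cnt), dtype=np.int64),
        "vedge": np.zeros((seq_len, para, v_cnt, p_cnt, e_dim)),
        "pedge": np.zeros((seq_len, para, p_cnt, v_cnt, e_dim)),
        "ppedge": np.zeros((seq_len, para, p_cnt, p_cnt, e_dim)),
        "mask": np.zeros((seq_len, para, tick_buffer), dtype=bool),
    }

toy_exp = {
    "len": np.array([[3, 2]]),          # agent (0, 0) contributed 3 samples, agent (1, 0) contributed 2
    "s": toy_obs(), "s_": toy_obs(),
    "a": np.zeros((seq_len, para), dtype=np.int64),
    "R": np.zeros((seq_len, para, p_cnt)),
}
out = organize_exp_list(toy_exp, {(0, 0): 0, (1, 0): 3})
print(out[(0, 0)]["len"], out[(1, 0)]["a"].shape)   # 3 (2,)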
|
||||
|
||||
|
||||
def organize_obs(obs, idx, exp_len):
|
||||
"""Helper function to transform the observation from multiple processes to a unified dictionary."""
|
||||
tick_buffer, _, para_cnt, v_cnt, v_dim = obs["v"].shape
|
||||
_, _, _, p_cnt, p_dim = obs["p"].shape
|
||||
batch = exp_len * para_cnt
|
||||
# v: tick_buffer, seq_len, parallel_cnt, v_cnt, v_dim --> (tick_buffer, cnt, v_cnt, v_dim)
|
||||
v = obs["v"][:, idx: idx + exp_len]
|
||||
v = v.reshape(tick_buffer, batch, v_cnt, v_dim)
|
||||
p = obs["p"][:, idx: idx + exp_len]
|
||||
p = p.reshape(tick_buffer, batch, p_cnt, p_dim)
|
||||
# vo: seq_len * parallel_cnt * v_cnt * p_cnt* --> cnt * v_cnt * p_cnt*
|
||||
vo = obs["vo"][idx: idx + exp_len]
|
||||
vo = vo.reshape(batch, v_cnt, vo.shape[-1])
|
||||
po = obs["po"][idx: idx + exp_len]
|
||||
po = po.reshape(batch, p_cnt, po.shape[-1])
|
||||
vedge = obs["vedge"][idx: idx + exp_len]
|
||||
vedge = vedge.reshape(batch, v_cnt, vedge.shape[-2], vedge.shape[-1])
|
||||
pedge = obs["pedge"][idx: idx + exp_len]
|
||||
pedge = pedge.reshape(batch, p_cnt, pedge.shape[-2], pedge.shape[-1])
|
||||
ppedge = obs["ppedge"][idx: idx + exp_len]
|
||||
ppedge = ppedge.reshape(batch, p_cnt, ppedge.shape[-2], ppedge.shape[-1])
|
||||
|
||||
# mask: (seq_len, parallel_cnt, tick_buffer)
|
||||
mask = obs["mask"][idx: idx + exp_len].reshape(batch, tick_buffer)
|
||||
|
||||
return {"v": v, "p": p, "vo": vo, "po": po, "pedge": pedge, "vedge": vedge, "ppedge": ppedge, "mask": mask}
|
||||
|
||||
|
||||
def single_player_worker(index, config, exp_idx_mapping, pipe, action_io, exp_output):
|
||||
"""The A2C worker function to collect experience.
|
||||
|
||||
Args:
|
||||
index (int): The process index counted from 0.
|
||||
config (dict): It is a dottable dictionary that stores the configuration of the simulation, state_shaper and
|
||||
postprocessing shaper.
|
||||
exp_idx_mapping (dict): The key is agent code and the value is the starting index where the experience is stored
|
||||
in the experience batch.
|
||||
pipe (Pipe): The pipe instance for communication with the main process.
|
||||
action_io (SharedStructure): The shared memory to hold the state information that the main process uses to
|
||||
generate an action.
|
||||
exp_output (SharedStructure): The shared memory to transfer the experience list to the main process.
|
||||
"""
|
||||
if index == 0:
|
||||
simulation_log_path = os.path.join(config.log.path, f"cim_gnn_{index}")
|
||||
if not os.path.exists(simulation_log_path):
|
||||
os.makedirs(simulation_log_path)
|
||||
opts = {"enable-dump-snapshot": simulation_log_path}
|
||||
env = Env(**config.env.param, options=opts)
|
||||
else:
|
||||
env = Env(**config.env.param)
|
||||
fix_seed(env, config.env.seed)
|
||||
static_code_list, dynamic_code_list = list(env.summary["node_mapping"]["ports"].values()), \
|
||||
list(env.summary["node_mapping"]["vessels"].values())
|
||||
# Create gnn_state_shaper without consuming any resources.
|
||||
|
||||
gnn_state_shaper = GNNStateShaper(
|
||||
static_code_list, dynamic_code_list, config.env.param.durations, config.model.feature,
|
||||
tick_buffer=config.model.tick_buffer, max_value=env.configs["total_containers"])
|
||||
gnn_state_shaper.compute_static_graph_structure(env)
|
||||
|
||||
action_io_np = action_io.structuralize()
|
||||
|
||||
action_shaper = DiscreteActionShaper(config.model.action_dim)
|
||||
exp_shaper = ExperienceShaper(
|
||||
static_code_list, dynamic_code_list, config.env.param.durations, gnn_state_shaper,
|
||||
scale_factor=config.env.return_scaler, time_slot=config.training.td_steps,
|
||||
discount_factor=config.training.gamma, idx=index, shared_storage=exp_output.structuralize(),
|
||||
exp_idx_mapping=exp_idx_mapping)
|
||||
|
||||
i = 0
|
||||
while pipe.recv() == "reset":
|
||||
r, decision_event, is_done = env.step(None)
|
||||
|
||||
j = 0
|
||||
logs = []
|
||||
while not is_done:
|
||||
model_input = gnn_state_shaper(decision_event, env.snapshot_list)
|
||||
action_io_np["v"][:, index] = model_input["v"]
|
||||
action_io_np["p"][:, index] = model_input["p"]
|
||||
action_io_np["vo"][index] = model_input["vo"]
|
||||
action_io_np["po"][index] = model_input["po"]
|
||||
action_io_np["vedge"][index] = model_input["vedge"]
|
||||
action_io_np["pedge"][index] = model_input["pedge"]
|
||||
action_io_np["ppedge"][index] = model_input["ppedge"]
|
||||
action_io_np["mask"][index] = model_input["mask"]
|
||||
action_io_np["pid"][index] = decision_event.port_idx
|
||||
action_io_np["vid"][index] = decision_event.vessel_idx
|
||||
pipe.send("features")
|
||||
model_action = pipe.recv()
|
||||
env_action = action_shaper(decision_event, model_action)
|
||||
exp_shaper.record(decision_event=decision_event, model_action=model_action, model_input=model_input)
|
||||
logs.append([
|
||||
index, decision_event.tick, decision_event.port_idx, decision_event.vessel_idx, model_action,
|
||||
env_action, decision_event.action_scope.load, decision_event.action_scope.discharge])
|
||||
action = Action(decision_event.vessel_idx, decision_event.port_idx, env_action)
|
||||
r, decision_event, is_done = env.step(action)
|
||||
j += 1
|
||||
action_io_np["sh"][index] = compute_shortage(env.snapshot_list, config.env.param.durations, static_code_list)
|
||||
i += 1
|
||||
pipe.send("done")
|
||||
gnn_state_shaper.end_ep_callback(env.snapshot_list)
|
||||
# Organize and synchronize exp to shared memory.
|
||||
exp_shaper(env.snapshot_list)
|
||||
exp_shaper.reset()
|
||||
logs = np.array(logs, dtype=np.float)
|
||||
pipe.send(logs)
|
||||
env.reset()
|
||||
|
||||
|
||||
def compute_shortage(snapshot_list, max_tick, static_code_list):
|
||||
"""Helper function to compute the shortage after a episode end."""
|
||||
return np.sum(snapshot_list["ports"][max_tick - 1: static_code_list: "acc_shortage"])
|
||||
|
||||
|
||||
class ParallelActor(AbsActor):
|
||||
def __init__(self, config, demo_env, gnn_state_shaper, agent_manager, logger):
|
||||
"""A2C rollout class.
|
||||
|
||||
This implements a synchronized A2C structure. Multiple processes are created to run the simulation and collect
experience (CPU only); whenever an action is required, they notify the main process, which performs the batch
action inference on the GPU.
|
||||
|
||||
Args:
|
||||
config (dict): The configuration to run the simulation.
|
||||
demo_env (maro.simulator.Env): An example environment used to obtain configuration information such as the
number of vessels and ports as well as the topology of the environment.
|
||||
gnn_state_shaper (AbsShaper): The state shaper instance to extract graph information from the state of
|
||||
the environment.
|
||||
agent_manager (AbsAgentManager): The agent manager instance that performs batch action inference.
|
||||
logger: The logger instance to log information during the rollout.
|
||||
|
||||
"""
|
||||
super().__init__(demo_env, agent_manager)
|
||||
multiprocessing.set_start_method("spawn", True)
|
||||
self._logger = logger
|
||||
self.config = config
|
||||
|
||||
self._static_node_mapping = demo_env.summary["node_mapping"]["ports"]
|
||||
self._dynamic_node_mapping = demo_env.summary["node_mapping"]["vessels"]
|
||||
self._gnn_state_shaper = gnn_state_shaper
|
||||
self.device = torch.device(config.training.device)
|
||||
|
||||
self.parallel_cnt = config.training.parallel_cnt
|
||||
self.log_header = [f"sh_{i}" for i in range(self.parallel_cnt)]
|
||||
|
||||
tick_buffer = config.model.tick_buffer
|
||||
|
||||
v_dim, vedge_dim, v_cnt = self._gnn_state_shaper.get_input_dim("v"), \
|
||||
self._gnn_state_shaper.get_input_dim("vedge"), len(self._dynamic_node_mapping)
|
||||
p_dim, pedge_dim, p_cnt = self._gnn_state_shaper.get_input_dim("p"), \
|
||||
self._gnn_state_shaper.get_input_dim("pedge"), len(self._static_node_mapping)
|
||||
|
||||
self.pipes = [Pipe() for i in range(self.parallel_cnt)]
|
||||
|
||||
action_io_structure = {
|
||||
"p": ((tick_buffer, self.parallel_cnt, p_cnt, p_dim), ctypes.c_float),
|
||||
"v": ((tick_buffer, self.parallel_cnt, v_cnt, v_dim), ctypes.c_float),
|
||||
"po": ((self.parallel_cnt, p_cnt, v_cnt), ctypes.c_long),
|
||||
"vo": ((self.parallel_cnt, v_cnt, p_cnt), ctypes.c_long),
|
||||
"vedge": ((self.parallel_cnt, v_cnt, p_cnt, vedge_dim), ctypes.c_float),
|
||||
"pedge": ((self.parallel_cnt, p_cnt, v_cnt, vedge_dim), ctypes.c_float),
|
||||
"ppedge": ((self.parallel_cnt, p_cnt, p_cnt, pedge_dim), ctypes.c_float),
|
||||
"mask": ((self.parallel_cnt, tick_buffer), ctypes.c_bool),
|
||||
"sh": ((self.parallel_cnt, ), ctypes.c_long),
|
||||
"pid": ((self.parallel_cnt, ), ctypes.c_long),
|
||||
"vid": ((self.parallel_cnt, ), ctypes.c_long)
|
||||
}
|
||||
self.action_io = SharedStructure(action_io_structure)
|
||||
self.action_io_np = self.action_io.structuralize()
|
||||
|
||||
tot_exp_len = sum(config.env.exp_per_ep.values())
|
||||
|
||||
exp_output_structure = {
|
||||
"s": {
|
||||
"v": ((tick_buffer, tot_exp_len, self.parallel_cnt, v_cnt, v_dim), ctypes.c_float),
|
||||
"p": ((tick_buffer, tot_exp_len, self.parallel_cnt, p_cnt, p_dim), ctypes.c_float),
|
||||
"vo": ((tot_exp_len, self.parallel_cnt, v_cnt, p_cnt), ctypes.c_long),
|
||||
"po": ((tot_exp_len, self.parallel_cnt, p_cnt, v_cnt), ctypes.c_long),
|
||||
"vedge": ((tot_exp_len, self.parallel_cnt, v_cnt, p_cnt, vedge_dim), ctypes.c_float),
|
||||
"pedge": ((tot_exp_len, self.parallel_cnt, p_cnt, v_cnt, vedge_dim), ctypes.c_float),
|
||||
"ppedge": ((tot_exp_len, self.parallel_cnt, p_cnt, p_cnt, pedge_dim), ctypes.c_float),
|
||||
"mask": ((tot_exp_len, self.parallel_cnt, tick_buffer), ctypes.c_bool)
|
||||
},
|
||||
"s_": {
|
||||
"v": ((tick_buffer, tot_exp_len, self.parallel_cnt, v_cnt, v_dim), ctypes.c_float),
|
||||
"p": ((tick_buffer, tot_exp_len, self.parallel_cnt, p_cnt, p_dim), ctypes.c_float),
|
||||
"vo": ((tot_exp_len, self.parallel_cnt, v_cnt, p_cnt), ctypes.c_long),
|
||||
"po": ((tot_exp_len, self.parallel_cnt, p_cnt, v_cnt), ctypes.c_long),
|
||||
"vedge": ((tot_exp_len, self.parallel_cnt, v_cnt, p_cnt, vedge_dim), ctypes.c_float),
|
||||
"pedge": ((tot_exp_len, self.parallel_cnt, p_cnt, v_cnt, vedge_dim), ctypes.c_float),
|
||||
"ppedge": ((tot_exp_len, self.parallel_cnt, p_cnt, p_cnt, pedge_dim), ctypes.c_float),
|
||||
"mask": ((tot_exp_len, self.parallel_cnt, tick_buffer), ctypes.c_bool)
|
||||
},
|
||||
"a": ((tot_exp_len, self.parallel_cnt), ctypes.c_long),
|
||||
"len": ((self.parallel_cnt, len(config.env.exp_per_ep)), ctypes.c_long),
|
||||
"R": ((tot_exp_len, self.parallel_cnt, p_cnt), ctypes.c_float),
|
||||
}
|
||||
self.exp_output = SharedStructure(exp_output_structure)
|
||||
self.exp_output_np = self.exp_output.structuralize()
|
||||
|
||||
self._logger.info("allocate complete")
|
||||
|
||||
self.exp_idx_mapping = OrderedDict()
|
||||
acc_c = 0
|
||||
for key, c in config.env.exp_per_ep.items():
|
||||
self.exp_idx_mapping[key] = acc_c
|
||||
acc_c += c
|
||||
|
||||
self.workers = [
|
||||
Process(
|
||||
target=single_player_worker,
|
||||
args=(i, config, self.exp_idx_mapping, self.pipes[i][1], self.action_io, self.exp_output)
|
||||
) for i in range(self.parallel_cnt)
|
||||
]
|
||||
for w in self.workers:
|
||||
w.start()
|
||||
|
||||
self._logger.info("all thread started")
|
||||
|
||||
self._roll_out_time = 0
|
||||
self._transfer_time = 0
|
||||
self._roll_out_cnt = 0
|
||||
|
||||
def roll_out(self):
|
||||
"""Rollout using current policy in the AgentManager.
|
||||
|
||||
Returns:
|
||||
result (dict): The key is the agent code, the value is the experience list stored in numpy.array.
|
||||
"""
|
||||
# Compute the time used for state preparation in the child process.
|
||||
t_state = 0
|
||||
# Compute the time used for action inference.
|
||||
t_action = 0
|
||||
|
||||
for p in self.pipes:
|
||||
p[0].send("reset")
|
||||
self._roll_out_cnt += 1
|
||||
|
||||
step_i = 0
|
||||
tick = time.time()
|
||||
while True:
|
||||
signals = [p[0].recv() for p in self.pipes]
|
||||
if signals[0] == "done":
|
||||
break
|
||||
|
||||
step_i += 1
|
||||
|
||||
t = time.time()
|
||||
graph = gnn_union(
|
||||
self.action_io_np["p"], self.action_io_np["po"], self.action_io_np["pedge"],
|
||||
self.action_io_np["v"], self.action_io_np["vo"], self.action_io_np["vedge"],
|
||||
self._gnn_state_shaper.p2p_static_graph, self.action_io_np["ppedge"],
|
||||
self.action_io_np["mask"], self.device
|
||||
)
|
||||
t_state += time.time() - t
|
||||
|
||||
assert(np.min(self.action_io_np["pid"]) == np.max(self.action_io_np["pid"]))
|
||||
assert(np.min(self.action_io_np["vid"]) == np.max(self.action_io_np["vid"]))
|
||||
|
||||
t = time.time()
|
||||
actions = self._inference_agents.choose_action(
|
||||
agent_id=(self.action_io_np["pid"][0], self.action_io_np["vid"][0]), state=graph
|
||||
)
|
||||
t_action += time.time() - t
|
||||
|
||||
for i, p in enumerate(self.pipes):
|
||||
p[0].send(actions[i])
|
||||
|
||||
self._roll_out_time += time.time() - tick
|
||||
tick = time.time()
|
||||
self._logger.info("receiving exp")
|
||||
logs = [p[0].recv() for p in self.pipes]
|
||||
|
||||
self._logger.info(f"Mean of shortage: {np.mean(self.action_io_np['sh'])}")
|
||||
self._transfer_time += time.time() - tick
|
||||
|
||||
self._logger.debug(dict(zip(self.log_header, self.action_io_np["sh"])))
|
||||
|
||||
with open(os.path.join(self.config.log.path, f"logs_{self._roll_out_cnt}"), "wb") as fp:
|
||||
pickle.dump(logs, fp)
|
||||
|
||||
self._logger.info("organize exp_dict")
|
||||
result = organize_exp_list(self.exp_output_np, self.exp_idx_mapping)
|
||||
|
||||
if self.config.log.exp.enable and self._roll_out_cnt % self.config.log.exp.freq == 0:
|
||||
with open(os.path.join(self.config.log.path, f"exp_{self._roll_out_cnt}"), "wb") as fp:
|
||||
pickle.dump(result, fp)
|
||||
|
||||
self._logger.debug(f"play time: {int(self._roll_out_time)}")
|
||||
self._logger.debug(f"transfer time: {int(self._trainsfer_time)}")
|
||||
return result
|
||||
|
||||
def exit(self):
|
||||
"""Terminate the child processes."""
|
||||
for p in self.pipes:
|
||||
p[0].send("close")
@@ -1,180 +0,0 @@
import os
|
||||
|
||||
import torch
|
||||
from torch import nn
|
||||
from torch.distributions import Categorical
|
||||
from torch.nn.utils import clip_grad
|
||||
|
||||
from maro.rl import AbsAlgorithm
|
||||
|
||||
from .utils import gnn_union
|
||||
|
||||
|
||||
class ActorCritic(AbsAlgorithm):
|
||||
"""Actor-Critic algorithm in CIM problem.
|
||||
|
||||
The vanilla ac algorithm.
|
||||
|
||||
Args:
|
||||
model (nn.Module): An actor-critic module that outputs both the policy and the value estimates.
device (torch.device): A PyTorch device instance on which the module is computed.
p2p_adj (numpy.array): The static port-to-port adjacency matrix.
td_steps (int): The value "n" in the n-step TD algorithm.
gamma (float): The reward discount factor.
learning_rate (float): The learning rate for the module.
entropy_factor (float): The weight of the policy's entropy used to boost exploration.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self, model: nn.Module, device: torch.device, p2p_adj=None, td_steps=100, gamma=0.97, learning_rate=0.0003,
|
||||
entropy_factor=0.1):
|
||||
self._gamma = gamma
|
||||
self._td_steps = td_steps
|
||||
self._value_discount = gamma ** td_steps  # discount applied to the bootstrapped value after n = td_steps steps
|
||||
self._entropy_factor = entropy_factor
|
||||
self._device = device
|
||||
self._tot_batchs = 0
|
||||
self._p2p_adj = p2p_adj
|
||||
super().__init__(
|
||||
model_dict={"a&c": model}, optimizer_opt={"a&c": (torch.optim.Adam, {"lr": learning_rate})},
|
||||
loss_func_dict={}, hyper_params=None)
|
||||
|
||||
def choose_action(self, state: dict, p_idx: int, v_idx: int):
|
||||
"""Get action from the AC model.
|
||||
|
||||
Args:
|
||||
state (dict): A dictionary containing the input to the module. For example:
|
||||
{
|
||||
"v": v,
|
||||
"p": p,
|
||||
"pe": {
|
||||
"edge": pedge,
|
||||
"adj": padj,
|
||||
"mask": pmask,
|
||||
},
|
||||
"ve": {
|
||||
"edge": vedge,
|
||||
"adj": vadj,
|
||||
"mask": vmask,
|
||||
},
|
||||
"ppe": {
|
||||
"edge": ppedge,
|
||||
"adj": p2p_adj,
|
||||
"mask": p2p_mask,
|
||||
},
|
||||
"mask": seq_mask,
|
||||
}
|
||||
p_idx (int): The identity of the port doing the action.
|
||||
v_idx (int): The identity of the vessel doing the action.
|
||||
|
||||
Returns:
|
||||
model_action (numpy.int64): The action returned from the module.
|
||||
"""
|
||||
with torch.no_grad():
|
||||
prob, _ = self._model_dict["a&c"](state, a=True, p_idx=p_idx, v_idx=v_idx)
|
||||
distribution = Categorical(prob)
|
||||
model_action = distribution.sample().cpu().numpy()
|
||||
return model_action
|
||||
|
||||
def train(self, batch, p_idx, v_idx):
|
||||
"""Model training.
|
||||
|
||||
Args:
|
||||
batch (dict): The dictionary of a batch of experience. For example:
|
||||
{
|
||||
"s": the dictionary of state,
|
||||
"a": model actions in numpy array,
|
||||
"R": the n-step accumulated reward,
|
||||
"s"": the dictionary of the next state,
|
||||
}
|
||||
p_idx (int): The identity of the port doing the action.
|
||||
v_idx (int): The identity of the vessel doing the action.
|
||||
|
||||
Returns:
|
||||
a_loss (float): action loss.
|
||||
c_loss (float): critic loss.
|
||||
e_loss (float): entropy loss.
|
||||
tot_norm (float): the L2 norm of the gradient.
|
||||
|
||||
"""
|
||||
self._tot_batchs += 1
|
||||
item_a_loss, item_c_loss, item_e_loss = 0, 0, 0
|
||||
obs_batch = batch["s"]
|
||||
action_batch = batch["a"]
|
||||
return_batch = batch["R"]
|
||||
next_obs_batch = batch["s_"]
|
||||
|
||||
obs_batch = gnn_union(
|
||||
obs_batch["p"], obs_batch["po"], obs_batch["pedge"], obs_batch["v"], obs_batch["vo"], obs_batch["vedge"],
|
||||
self._p2p_adj, obs_batch["ppedge"], obs_batch["mask"], self._device)
|
||||
action_batch = torch.from_numpy(action_batch).long().to(self._device)
|
||||
return_batch = torch.from_numpy(return_batch).float().to(self._device)
|
||||
next_obs_batch = gnn_union(
|
||||
next_obs_batch["p"], next_obs_batch["po"], next_obs_batch["pedge"], next_obs_batch["v"],
|
||||
next_obs_batch["vo"], next_obs_batch["vedge"], self._p2p_adj, next_obs_batch["ppedge"],
|
||||
next_obs_batch["mask"], self._device)
|
||||
|
||||
# Train actor network.
|
||||
self._optimizer["a&c"].zero_grad()
|
||||
|
||||
# Every port has a value.
|
||||
# values.shape: (batch, p_cnt)
|
||||
probs, values = self._model_dict["a&c"](obs_batch, a=True, p_idx=p_idx, v_idx=v_idx, c=True)
|
||||
distribution = Categorical(probs)
|
||||
log_prob = distribution.log_prob(action_batch)
|
||||
entropy_loss = distribution.entropy()
|
||||
|
||||
_, values_ = self._model_dict["a&c"](next_obs_batch, c=True)
|
||||
advantage = return_batch + self._value_discount * values_.detach() - values
|
||||
|
||||
if self._entropy_factor != 0:
|
||||
# actor_loss = actor_loss* torch.log(entropy_loss + np.e)
|
||||
advantage[:, p_idx] += self._entropy_factor * entropy_loss.detach()
|
||||
|
||||
actor_loss = - (log_prob * torch.sum(advantage, axis=-1).detach()).mean()
|
||||
|
||||
item_a_loss = actor_loss.item()
|
||||
item_e_loss = entropy_loss.mean().item()
|
||||
|
||||
# Train critic network.
|
||||
critic_loss = torch.sum(advantage.pow(2), axis=1).mean()
|
||||
item_c_loss = critic_loss.item()
|
||||
# torch.nn.utils.clip_grad_norm_(self._critic_model.parameters(),0.5)
|
||||
tot_loss = 0.1 * actor_loss + critic_loss
|
||||
tot_loss.backward()
|
||||
tot_norm = clip_grad.clip_grad_norm_(self._model_dict["a&c"].parameters(), 1)
|
||||
self._optimizer["a&c"].step()
|
||||
return item_a_loss, item_c_loss, item_e_loss, float(tot_norm)
|
||||
|
||||
def set_weights(self, weights):
|
||||
self._model_dict["a&c"].load_state_dict(weights)
|
||||
|
||||
def get_weights(self):
|
||||
return self._model_dict["a&c"].state_dict()
|
||||
|
||||
def _get_save_idx(self, fp_str):
|
||||
return int(fp_str.split(".")[0].split("_")[0])
|
||||
|
||||
def save_model(self, pth, id):
|
||||
if not os.path.exists(pth):
|
||||
os.makedirs(pth)
|
||||
pth = os.path.join(pth, f"{id}_ac.pkl")
|
||||
torch.save(self._model_dict["a&c"].state_dict(), pth)
|
||||
|
||||
def _set_gnn_weights(self, weights):
|
||||
for key in weights:
|
||||
if key in self._model_dict["a&c"].state_dict().keys():
|
||||
self._model_dict["a&c"].state_dict()[key].copy_(weights[key])
|
||||
|
||||
def load_model(self, folder_pth, idx=-1):
|
||||
if idx == -1:
|
||||
fps = os.listdir(folder_pth)
|
||||
fps = [f for f in fps if "ac" in f]
|
||||
fps.sort(key=self._get_save_idx)
|
||||
ac_pth = fps[-1]
|
||||
else:
|
||||
ac_pth = f"{idx}_ac.pkl"
|
||||
pth = os.path.join(folder_pth, ac_pth)
|
||||
with open(pth, "rb") as fp:
|
||||
weights = torch.load(fp, map_location=self._device)
|
||||
self._set_gnn_weights(weights)
@@ -1,41 +0,0 @@
from collections import defaultdict
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import AbsAgent
|
||||
from maro.utils import DummyLogger
|
||||
|
||||
from .numpy_store import Shuffler
|
||||
|
||||
|
||||
class TrainableAgent(AbsAgent):
|
||||
def __init__(self, name, algorithm, experience_pool, logger=DummyLogger()):
|
||||
self._logger = logger
|
||||
super().__init__(name, algorithm, experience_pool)
|
||||
|
||||
def train(self, training_config):
|
||||
loss_dict = defaultdict(list)
|
||||
for j in range(training_config.shuffle_time):
|
||||
shuffler = Shuffler(self._experience_pool, batch_size=training_config.batch_size)
|
||||
while shuffler.has_next():
|
||||
batch = shuffler.next()
|
||||
actor_loss, critic_loss, entropy_loss, tot_loss = self._algorithm.train(
|
||||
batch, self._name[0], self._name[1])
|
||||
loss_dict["actor"].append(actor_loss)
|
||||
loss_dict["critic"].append(critic_loss)
|
||||
loss_dict["entropy"].append(entropy_loss)
|
||||
loss_dict["tot"].append(tot_loss)
|
||||
|
||||
a_loss = np.mean(loss_dict["actor"])
|
||||
c_loss = np.mean(loss_dict["critic"])
|
||||
e_loss = np.mean(loss_dict["entropy"])
|
||||
tot_loss = np.mean(loss_dict["tot"])
|
||||
self._logger.debug(
|
||||
f"code: {str(self._name)} \t actor: {float(a_loss)} \t critic: {float(c_loss)} \t entropy: {float(e_loss)} \
|
||||
\t tot: {float(tot_loss)}")
|
||||
|
||||
self._experience_pool.clear()
|
||||
return loss_dict
|
||||
|
||||
def choose_action(self, model_state):
|
||||
return self._algorithm.choose_action(model_state, self._name[0], self._name[1])
@@ -1,119 +0,0 @@
from copy import copy
|
||||
|
||||
import numpy as np
|
||||
import torch
|
||||
|
||||
from maro.rl import AbsAgentManager, AgentMode
|
||||
from maro.utils import DummyLogger
|
||||
|
||||
from .actor_critic import ActorCritic
|
||||
from .agent import TrainableAgent
|
||||
from .numpy_store import NumpyStore
|
||||
from .simple_gnn import SharedAC
|
||||
from .state_shaper import GNNStateShaper
|
||||
|
||||
|
||||
class SimpleAgentManger(AbsAgentManager):
|
||||
def __init__(
|
||||
self, name, agent_id_list, port_code_list, vessel_code_list, demo_env, state_shaper: GNNStateShaper,
|
||||
logger=DummyLogger()):
|
||||
super().__init__(
|
||||
name, AgentMode.TRAIN, agent_id_list, state_shaper=state_shaper, action_shaper=None,
|
||||
experience_shaper=None, explorer=None)
|
||||
self.port_code_list = copy(port_code_list)
|
||||
self.vessel_code_list = copy(vessel_code_list)
|
||||
self.demo_env = demo_env
|
||||
self._logger = logger
|
||||
|
||||
def assemble(self, config):
|
||||
v_dim, vedge_dim = self._state_shaper.get_input_dim("v"), self._state_shaper.get_input_dim("vedge")
|
||||
p_dim, pedge_dim = self._state_shaper.get_input_dim("p"), self._state_shaper.get_input_dim("pedge")
|
||||
|
||||
self.device = torch.device(config.training.device)
|
||||
self._logger.info(config.training.device)
|
||||
ac_model = SharedAC(
|
||||
p_dim, pedge_dim, v_dim, vedge_dim, config.model.tick_buffer, config.model.action_dim).to(self.device)
|
||||
|
||||
value_dict = {
|
||||
("s", "v"):
|
||||
(
|
||||
(config.model.tick_buffer, len(self.vessel_code_list), self._state_shaper.get_input_dim("v")),
|
||||
np.float32, False),
|
||||
("s", "p"):
|
||||
(
|
||||
(config.model.tick_buffer, len(self.port_code_list), self._state_shaper.get_input_dim("p")),
|
||||
np.float32, False),
|
||||
("s", "vo"): ((len(self.vessel_code_list), len(self.port_code_list)), np.int64, True),
|
||||
("s", "po"): ((len(self.port_code_list), len(self.vessel_code_list)), np.int64, True),
|
||||
("s", "vedge"):
|
||||
(
|
||||
(len(self.vessel_code_list), len(self.port_code_list), self._state_shaper.get_input_dim("vedge")),
|
||||
np.float32, True),
|
||||
("s", "pedge"):
|
||||
(
|
||||
(len(self.port_code_list), len(self.vessel_code_list), self._state_shaper.get_input_dim("vedge")),
|
||||
np.float32, True),
|
||||
("s", "ppedge"):
|
||||
(
|
||||
(len(self.port_code_list), len(self.port_code_list), self._state_shaper.get_input_dim("pedge")),
|
||||
np.float32, True),
|
||||
("s", "mask"): ((config.model.tick_buffer, ), np.bool, True),
|
||||
|
||||
("s_", "v"):
|
||||
(
|
||||
(config.model.tick_buffer, len(self.vessel_code_list), self._state_shaper.get_input_dim("v")),
|
||||
np.float32, False),
|
||||
("s_", "p"):
|
||||
(
|
||||
(config.model.tick_buffer, len(self.port_code_list), self._state_shaper.get_input_dim("p")),
|
||||
np.float32, False),
|
||||
("s_", "vo"): ((len(self.vessel_code_list), len(self.port_code_list)), np.int64, True),
|
||||
("s_", "po"):
|
||||
(
|
||||
(len(self.port_code_list), len(self.vessel_code_list)), np.int64, True),
|
||||
("s_", "vedge"):
|
||||
(
|
||||
(len(self.vessel_code_list), len(self.port_code_list), self._state_shaper.get_input_dim("vedge")),
|
||||
np.float32, True),
|
||||
("s_", "pedge"):
|
||||
(
|
||||
(len(self.port_code_list), len(self.vessel_code_list), self._state_shaper.get_input_dim("vedge")),
|
||||
np.float32, True),
|
||||
("s_", "ppedge"):
|
||||
(
|
||||
(len(self.port_code_list), len(self.port_code_list), self._state_shaper.get_input_dim("pedge")),
|
||||
np.float32, True),
|
||||
("s_", "mask"): ((config.model.tick_buffer, ), np.bool, True),
|
||||
|
||||
# To identify one dimension variable.
|
||||
("R",): ((len(self.port_code_list), ), np.float32, True),
|
||||
("a",): (tuple(), np.int64, True),
|
||||
}
|
||||
|
||||
self._algorithm = ActorCritic(
|
||||
ac_model, self.device, td_steps=config.training.td_steps, p2p_adj=self._state_shaper.p2p_static_graph,
|
||||
gamma=config.training.gamma, learning_rate=config.training.learning_rate)
|
||||
|
||||
for agent_id, cnt in config.env.exp_per_ep.items():
|
||||
experience_pool = NumpyStore(value_dict, config.training.parallel_cnt * config.training.train_freq * cnt)
|
||||
self._agent_dict[agent_id] = TrainableAgent(agent_id, self._algorithm, experience_pool, self._logger)
|
||||
|
||||
def choose_action(self, agent_id, state):
|
||||
return self._agent_dict[agent_id].choose_action(state)
|
||||
|
||||
def load_models_from_files(self, model_pth):
|
||||
self._algorithm.load_model(model_pth)
|
||||
|
||||
def train(self, training_config):
|
||||
for agent in self._agent_dict.values():
|
||||
agent.train(training_config)
|
||||
|
||||
def store_experiences(self, experiences):
|
||||
for code, exp_list in experiences.items():
|
||||
self._agent_dict[code].store_experiences(exp_list)
|
||||
|
||||
def save_model(self, pth, id):
|
||||
self._algorithm.save_model(pth, id)
|
||||
|
||||
def load_model(self, pth):
|
||||
self._algorithm.load_model(pth)
@@ -1,111 +0,0 @@
from collections import defaultdict
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
class ExperienceShaper:
|
||||
def __init__(
|
||||
self, static_list, dynamic_list, max_tick, gnn_state_shaper, scale_factor=0.0001, time_slot=100,
|
||||
discount_factor=0.97, idx=-1, shared_storage=None, exp_idx_mapping=None):
|
||||
self._static_list = list(static_list)
|
||||
self._dynamic_list = list(dynamic_list)
|
||||
self._time_slot = time_slot
|
||||
self._discount_factor = discount_factor
|
||||
self._discount_vector = np.logspace(1, self._time_slot, self._time_slot, base=discount_factor)
|
||||
self._max_tick = max_tick
|
||||
self._tick_range = list(range(self._max_tick))
|
||||
self._len_return = self._max_tick - self._time_slot
|
||||
self._gnn_state_shaper = gnn_state_shaper
|
||||
self._fulfillment_list, self._shortage_list, self._experience_dict = None, None, None
|
||||
self._experience_dict = defaultdict(list)
|
||||
self._init_state()
|
||||
self._idx = idx
|
||||
self._exp_idx_mapping = exp_idx_mapping
|
||||
self._shared_storage = shared_storage
|
||||
self._scale_factor = scale_factor
|
||||
|
||||
def _init_state(self):
|
||||
self._fulfillment_list, self._shortage_list = np.zeros(self._max_tick + 1), np.zeros(self._max_tick + 1)
|
||||
self._experience_dict = defaultdict(list)
|
||||
self._last_tick = 0
|
||||
|
||||
def record(self, decision_event, model_action, model_input):
|
||||
# Only experiences that have a next state within the given time slot are valuable.
|
||||
if decision_event.tick + self._time_slot < self._max_tick:
|
||||
self._experience_dict[decision_event.port_idx, decision_event.vessel_idx].append({
|
||||
"tick": decision_event.tick,
|
||||
"s": model_input,
|
||||
"a": model_action,
|
||||
})
|
||||
|
||||
def _compute_delta(self, arr):
|
||||
delta = np.array(arr)
|
||||
delta[1:] -= arr[:-1]
|
||||
return delta
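For instance (illustrative), this helper turns a running total into per-tick increments:
# _compute_delta(np.array([1, 3, 6])) -> array([1, 2, 3])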
|
||||
|
||||
def _batch_obs_to_numpy(self, obs):
|
||||
v = np.stack([o["v"] for o in obs], axis=0)
|
||||
p = np.stack([o["p"] for o in obs], axis=0)
|
||||
vo = np.stack([o["vo"] for o in obs], axis=0)
|
||||
po = np.stack([o["po"] for o in obs], axis=0)
|
||||
return {"p": p, "v": v, "vo": vo, "po": po}
|
||||
|
||||
def __call__(self, snapshot_list):
|
||||
if self._shared_storage is None:
|
||||
return
|
||||
|
||||
shortage = snapshot_list["ports"][self._tick_range:self._static_list:"shortage"].reshape(self._max_tick, -1)
|
||||
fulfillment = snapshot_list["ports"][self._tick_range:self._static_list:"fulfillment"] \
|
||||
.reshape(self._max_tick, -1)
|
||||
delta = fulfillment - shortage
|
||||
R = np.empty((self._len_return, len(self._static_list)), dtype=np.float)
|
||||
for i in range(0, self._len_return, 1):
|
||||
R[i] = np.dot(self._discount_vector, delta[i + 1: i + self._time_slot + 1])
|
||||
|
||||
for (agent_idx, vessel_idx), exp_list in self._experience_dict.items():
|
||||
for exp in exp_list:
|
||||
tick = exp["tick"]
|
||||
exp["s_"] = self._gnn_state_shaper(tick=tick + self._time_slot)
|
||||
exp["R"] = self._scale_factor * R[tick]
|
||||
|
||||
tmpi = 0
|
||||
for (agent_idx, vessel_idx), idx_base in self._exp_idx_mapping.items():
|
||||
exp_list = self._experience_dict[(agent_idx, vessel_idx)]
|
||||
exp_len = len(exp_list)
|
||||
# Here, we assume that exp_idx_mapping order is not changed.
|
||||
self._shared_storage["len"][self._idx, tmpi] = exp_len
|
||||
self._shared_storage["s"]["v"][:, idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s"]["v"] for e in exp_list], axis=1)
|
||||
self._shared_storage["s"]["p"][:, idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s"]["p"] for e in exp_list], axis=1)
|
||||
self._shared_storage["s"]["vo"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s"]["vo"] for e in exp_list], axis=0)
|
||||
self._shared_storage["s"]["po"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s"]["po"] for e in exp_list], axis=0)
|
||||
self._shared_storage["s"]["vedge"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s"]["vedge"] for e in exp_list], axis=0)
|
||||
self._shared_storage["s"]["pedge"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s"]["pedge"] for e in exp_list], axis=0)
|
||||
|
||||
self._shared_storage["s_"]["v"][:, idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s_"]["v"] for e in exp_list], axis=1)
|
||||
self._shared_storage["s_"]["p"][:, idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s_"]["p"] for e in exp_list], axis=1)
|
||||
self._shared_storage["s_"]["vo"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s_"]["vo"] for e in exp_list], axis=0)
|
||||
self._shared_storage["s_"]["po"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s_"]["po"] for e in exp_list], axis=0)
|
||||
self._shared_storage["s_"]["vedge"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s_"]["vedge"] for e in exp_list], axis=0)
|
||||
self._shared_storage["s_"]["pedge"][idx_base:idx_base + exp_len, self._idx] = \
|
||||
np.stack([e["s_"]["pedge"] for e in exp_list], axis=0)
|
||||
|
||||
self._shared_storage["a"][idx_base: idx_base + exp_len, self._idx] = \
|
||||
np.array([exp["a"] for exp in exp_list], dtype=np.int64)
|
||||
self._shared_storage["R"][idx_base: idx_base + exp_len, self._idx] = \
|
||||
np.vstack([exp["R"] for exp in exp_list])
|
||||
tmpi += 1
|
||||
|
||||
def reset(self):
|
||||
del self._experience_dict
|
||||
self._init_state()
@@ -1,52 +0,0 @@
import os
|
||||
import time
|
||||
|
||||
from maro.rl import AbsLearner
|
||||
from maro.utils import DummyLogger
|
||||
|
||||
from .actor import ParallelActor
|
||||
from .agent_manager import SimpleAgentManger
|
||||
|
||||
|
||||
class GNNLearner(AbsLearner):
|
||||
"""Learner class for the training pipeline and the specialized logging in GNN solution for CIM problem.
|
||||
|
||||
Args:
|
||||
actor (AbsActor): The actor instance to collect experience.
|
||||
trainable_agents (AbsAgentManager): The agent manager for training RL models.
|
||||
logger (Logger): The logger to save/print the message.
|
||||
"""
|
||||
|
||||
def __init__(self, actor: ParallelActor, trainable_agents: SimpleAgentManger, logger=DummyLogger()):
|
||||
super().__init__()
|
||||
self._actor = actor
|
||||
self._trainable_agents = trainable_agents
|
||||
self._logger = logger
|
||||
|
||||
def train(self, training_config, log_pth=None):
|
||||
rollout_time = 0
|
||||
training_time = 0
|
||||
for i in range(training_config.rollout_cnt):
|
||||
self._logger.info(f"rollout {i + 1}")
|
||||
tick = time.time()
|
||||
exp_dict = self._actor.roll_out()
|
||||
|
||||
rollout_time += time.time() - tick
|
||||
|
||||
self._logger.info("start putting exps")
|
||||
self._trainable_agents.store_experiences(exp_dict)
|
||||
|
||||
if training_config.enable and i % training_config.train_freq == training_config.train_freq - 1:
|
||||
self._logger.info("training start")
|
||||
tick = time.time()
|
||||
self._trainable_agents.train(training_config)
|
||||
training_time += time.time() - tick
|
||||
|
||||
if log_pth is not None and (i + 1) % training_config.model_save_freq == 0:
|
||||
self._trainable_agents.save_model(os.path.join(log_pth, "models"), i + 1)
|
||||
|
||||
self._logger.debug(f"total rollout_time: {int(rollout_time)}")
|
||||
self._logger.debug(f"train_time: {int(training_time)}")
|
||||
|
||||
def test(self):
|
||||
pass
@@ -1,186 +0,0 @@
from typing import Sequence
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import AbsStore
|
||||
|
||||
|
||||
def get_item(data_dict, key_tuple):
|
||||
"""Helper function to get the value in a hierarchical dictionary given the key path.
|
||||
|
||||
Args:
|
||||
data_dict (dict): The data structure. For example:
|
||||
{
|
||||
"a": {
|
||||
"b": 1,
|
||||
"c": {
|
||||
"d": 2,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
key_tuple (tuple): The key path to the target field. For example, given the data_dict above, the key_tuple
|
||||
("a", "c", "d") should return 2.
|
||||
"""
|
||||
for key in key_tuple:
|
||||
data_dict = data_dict[key]
|
||||
return data_dict
|
||||
|
||||
|
||||
def set_item(data_dict, key_tuple, data):
|
||||
"""The setter function corresponding to the get_item function."""
|
||||
for i, key in enumerate(key_tuple):
|
||||
if key not in data_dict:
|
||||
data_dict[key] = {}
|
||||
if i == len(key_tuple) - 1:
|
||||
data_dict[key] = data
|
||||
else:
|
||||
data_dict = data_dict[key]
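A tiny illustration of the two helpers on a hypothetical nested dictionary:
# Illustrative usage of get_item / set_item.
d = {}
set_item(d, ("s", "p"), 1)
set_item(d, ("s", "v"), 2)
assert d == {"s": {"p": 1, "v": 2}}
assert get_item(d, ("s", "v")) == 2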
|
||||
|
||||
|
||||
class NumpyStore(AbsStore):
|
||||
def __init__(self, domain_type_dict, capacity):
|
||||
"""
|
||||
Args:
|
||||
domain_type_dict (dict): The dictionary describing the name, structure and type of each field in the
experience. Each field in the experience is a key-value pair with the following structure:
(field_name): (size_of_an_instance, data_type, batch_first)
|
||||
|
||||
For example:
|
||||
("s"): ((32, 64), np.float32, True)
|
||||
|
||||
A field can be nested in a hierarchical dictionary by specifying the full key path from the root.
|
||||
|
||||
For example:
|
||||
{
|
||||
("s", "p"): ((32, 64), np.float32, True)
|
||||
("s", "v"): ((48, ), np.float32, False),
|
||||
}
|
||||
Then the batch of experience returned by self.get(indexes) is:
|
||||
{
|
||||
"s":
|
||||
{
|
||||
"p": numpy.array with size (batch, 32, 64),
|
||||
"v": numpy.array with size (32, batch, 48),
|
||||
}
|
||||
}
|
||||
Note that for the field ("s", "v"), the batch is in the 2nd dimension because the batch_first attribute
|
||||
is False.
|
||||
|
||||
capacity (int): The maximum number of experiences that the store can hold.
|
||||
"""
|
||||
super().__init__()
|
||||
self.domain_type_dict = dict(domain_type_dict)
|
||||
self.store = {
|
||||
key: np.zeros(
|
||||
shape=(capacity, *shape) if batch_first else (shape[0], capacity, *shape[1:]), dtype=data_type)
|
||||
for key, (shape, data_type, batch_first) in domain_type_dict.items()}
|
||||
self.batch_first_store = {key: batch_first for key, (_, _, batch_first) in domain_type_dict.items()}
|
||||
|
||||
self.cnt = 0
|
||||
self.capacity = capacity
|
||||
|
||||
def put(self, exp_dict: dict):
|
||||
"""Insert a batch of experience into the store.
|
||||
|
||||
If the store reaches the maximum capacity, this function will replace the experience in the store randomly.
|
||||
|
||||
Args:
|
||||
exp_dict (dict): The dictionary of a batch of experience. For example:
|
||||
|
||||
{
|
||||
"s":
|
||||
{
|
||||
"p": numpy.array with size (batch, 32, 64),
|
||||
"v": numpy.array with size (32, batch, 48),
|
||||
}
|
||||
}
|
||||
|
||||
The structure should be consistent with the structure defined in the __init__ function.
|
||||
|
||||
Returns:
|
||||
indexes (numpy.array): The indexes at which each experience in the batch has been stored.
|
||||
"""
|
||||
dlen = exp_dict["len"]
|
||||
append_end = min(max(self.capacity - self.cnt, 0), dlen)
|
||||
idxs = np.zeros(dlen, dtype=int)
|
||||
if append_end != 0:
|
||||
for key in self.domain_type_dict.keys():
|
||||
data = get_item(exp_dict, key)
|
||||
if self.batch_first_store[key]:
|
||||
self.store[key][self.cnt: self.cnt + append_end] = data[0: append_end]
|
||||
else:
|
||||
self.store[key][:, self.cnt: self.cnt + append_end] = data[:, 0: append_end]
|
||||
idxs[: append_end] = np.arange(self.cnt, self.cnt + append_end)
|
||||
if append_end < dlen:
|
||||
replace_idx = self._get_replace_idx(dlen - append_end)
|
||||
for key in self.domain_type_dict.keys():
|
||||
data = get_item(exp_dict, key)
|
||||
if self.batch_first_store[key]:
|
||||
self.store[key][replace_idx] = data[append_end: dlen]
|
||||
else:
|
||||
self.store[key][:, replace_idx] = data[:, append_end: dlen]
|
||||
idxs[append_end: dlen] = replace_idx
|
||||
self.cnt += dlen
|
||||
return idxs
|
||||
|
||||
def _get_replace_idx(self, cnt):
|
||||
return np.random.randint(low=0, high=self.capacity, size=cnt)
|
||||
|
||||
def get(self, indexes: np.array):
|
||||
"""Get the experience indexed in the indexes list from the store.
|
||||
|
||||
Args:
|
||||
indexes (np.array): A numpy array containing the indexes of a batch experience.
|
||||
|
||||
Returns:
|
||||
data_dict (dict): A dictionary with the same structure as that defined in the __init__ function.
|
||||
"""
|
||||
data_dict = {}
|
||||
for key in self.domain_type_dict.keys():
|
||||
if self.batch_first_store[key]:
|
||||
set_item(data_dict, key, self.store[key][indexes])
|
||||
else:
|
||||
set_item(data_dict, key, self.store[key][:, indexes])
|
||||
return data_dict
|
||||
|
||||
def __len__(self):
|
||||
return min(self.capacity, self.cnt)
|
||||
|
||||
def update(self, indexes: Sequence, contents: Sequence):
|
||||
raise NotImplementedError("NumpyStore does not support modifying the experience!")
|
||||
|
||||
def sample(self, size, weights: Sequence, replace: bool = True):
|
||||
raise NotImplementedError("NumpyStore does not support sampling. Please use outer sampler to fetch samples!")
|
||||
|
||||
def clear(self):
|
||||
"""Remove all the experience in the store."""
|
||||
self.cnt = 0
|
||||
|
||||
|
||||
class Shuffler:
|
||||
def __init__(self, store: NumpyStore, batch_size: int):
|
||||
"""The helper class for fast batch sampling.
|
||||
|
||||
Args:
|
||||
store (NumpyStore): The data source for sampling.
|
||||
batch_size (int): The size of a batch.
|
||||
"""
|
||||
self._store = store
|
||||
self._shuffled_seq = np.arange(0, len(store))
|
||||
np.random.shuffle(self._shuffled_seq)
|
||||
self._start = 0
|
||||
self._batch_size = batch_size
|
||||
|
||||
def next(self):
|
||||
"""Uniformly sampling out a batch in the store."""
|
||||
if self._start >= len(self._store):
|
||||
return None
|
||||
end = min(self._start + self._batch_size, len(self._store))
|
||||
rst = self._store.get(self._shuffled_seq[self._start: end])
|
||||
self._start += self._batch_size
|
||||
return rst
|
||||
|
||||
def has_next(self):
|
||||
"""Check if any experience is not visited."""
|
||||
return self._start < len(self._store)
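A minimal end-to-end sketch of the store and shuffler with a made-up field spec (sizes and field names are arbitrary assumptions):
import numpy as np

# Two fields: a nested ("s", "p") block and a scalar action, both batch-first.
spec = {("s", "p"): ((2, 3), np.float32, True), ("a",): (tuple(), np.int64, True)}
store = NumpyStore(spec, capacity=8)
store.put({
    "len": 5,
    "s": {"p": np.ones((5, 2, 3), dtype=np.float32)},
    "a": np.arange(5),
})
shuffler = Shuffler(store, batch_size=2)
while shuffler.has_next():
    batch = shuffler.next()
    print(batch["a"].shape)   # (2,), (2,), then (1,)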
@@ -1,46 +0,0 @@
import multiprocessing
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
def init_shared_memory(data_structure):
|
||||
"""Initialize the data structure of the shared memory.
|
||||
|
||||
Args:
|
||||
data_structure: The dictionary that describes the data structure. For example,
|
||||
{
|
||||
"a": (shape, type),
|
||||
"b": {
|
||||
"b1": (shape, type),
|
||||
}
|
||||
}
|
||||
"""
|
||||
if isinstance(data_structure, tuple):
|
||||
mult = 1
|
||||
for i in data_structure[0]:
|
||||
mult *= i
|
||||
return multiprocessing.Array(data_structure[1], mult, lock=False)
|
||||
else:
|
||||
shared_data = {}
|
||||
for k, v in data_structure.items():
|
||||
shared_data[k] = init_shared_memory(v)
|
||||
return shared_data
|
||||
|
||||
|
||||
def shared_data2numpy(shared_data, structure_info):
|
||||
if not isinstance(shared_data, dict):
|
||||
return np.frombuffer(shared_data, dtype=structure_info[1]).reshape(structure_info[0])
|
||||
else:
|
||||
numpy_dict = {}
|
||||
for k, v in shared_data.items():
|
||||
numpy_dict[k] = shared_data2numpy(v, structure_info[k])
|
||||
return numpy_dict
|
||||
|
||||
|
||||
class SharedStructure:
|
||||
def __init__(self, data_structure):
|
||||
self.data_structure = data_structure
|
||||
self.shared = init_shared_memory(data_structure)
|
||||
|
||||
def structuralize(self):
|
||||
return shared_data2numpy(self.shared, self.data_structure)
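An illustrative use of the shared structure with a tiny nested spec mirroring the (shape, ctype) convention above:
import ctypes
import numpy as np

spec = {"a": ((2, 3), ctypes.c_float), "b": {"b1": ((4,), ctypes.c_long)}}
shared = SharedStructure(spec)
views = shared.structuralize()   # numpy views backed by the shared memory blocks
views["a"][:] = 1.0
views["b"]["b1"][0] = 7
print(views["a"].shape, views["b"]["b1"])   # (2, 3) [7 0 0 0]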
@@ -1,335 +0,0 @@
import math
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
from torch import Tensor
|
||||
from torch.nn import TransformerEncoder, TransformerEncoderLayer
|
||||
from torch.nn import functional as F
|
||||
from torch.nn.modules.activation import MultiheadAttention
|
||||
from torch.nn.modules.dropout import Dropout
|
||||
from torch.nn.modules.normalization import LayerNorm
|
||||
|
||||
|
||||
class PositionalEncoder(nn.Module):
|
||||
"""
|
||||
The positional encoding used in the Transformer to capture sequential information.

The code is based on the PyTorch tutorial:
https://pytorch.org/tutorials/beginner/transformer_tutorial.html?highlight=positionalencoding
|
||||
"""
|
||||
|
||||
def __init__(self, d_model, max_seq_len=80):
|
||||
super().__init__()
|
||||
self.d_model = d_model
|
||||
self.times = 4 * math.sqrt(self.d_model)
|
||||
|
||||
# Create constant "pe" matrix with values dependant on pos and i.
|
||||
self.pe = torch.zeros(max_seq_len, d_model)
|
||||
for pos in range(max_seq_len):
|
||||
for i in range(0, d_model, 2):
|
||||
self.pe[pos, i] = math.sin(pos / (10000 ** ((2 * i) / d_model)))
|
||||
self.pe[pos, i + 1] = math.cos(pos / (10000 ** ((2 * (i + 1)) / d_model)))
|
||||
|
||||
self.pe = self.pe.unsqueeze(1) / self.d_model
|
||||
|
||||
def forward(self, x):
|
||||
# Make embeddings relatively larger.
|
||||
addon = self.pe[: x.shape[0], :, : x.shape[2]].to(x.device)
|
||||
return x + addon
|
||||
|
||||
|
||||
class SimpleGATLayer(nn.Module):
|
||||
"""The enhanced graph attention layer for heterogenenous neighborhood.
|
||||
|
||||
It first utilizes pre-layers for both the source and destination node to map their features into the same hidden
|
||||
size. If the edge also has features, they are concatenated with those of the corresponding source node before being
|
||||
fed to the pre-layers. Then the graph attention(https://arxiv.org/abs/1710.10903) is done to aggregate information
|
||||
from the source nodes to the destination nodes. The residual connection and layer normalization are also used to
|
||||
enhance the performance, which is similar to the Transformer(https://arxiv.org/abs/1706.03762).
|
||||
|
||||
Args:
|
||||
src_dim (int): The feature dimension of the source nodes.
|
||||
dest_dim (int): The feature dimension of the destination nodes.
|
||||
edge_dim (int): The feature dimension of the edges. If the edges have no feature, it should be set 0.
|
||||
hidden_size (int): The hidden size both the destination and source is mapped into.
|
||||
nhead (int): The number of head in the multi-head attention.
|
||||
position_encoding (bool): the neighbor source nodes is aggregated in order(True) or orderless(False).
|
||||
"""
|
||||
|
||||
def __init__(self, src_dim, dest_dim, edge_dim, hidden_size, nhead=4, position_encoding=True):
|
||||
super().__init__()
|
||||
self.src_dim = src_dim
|
||||
self.dest_dim = dest_dim
|
||||
self.edge_dim = edge_dim
|
||||
self.hidden_size = hidden_size
|
||||
self.nhead = nhead
|
||||
src_layers = []
|
||||
src_layers.append(nn.Linear(src_dim + edge_dim, hidden_size))
|
||||
src_layers.append(GeLU())
|
||||
self.src_pre_layer = nn.Sequential(*src_layers)
|
||||
|
||||
dest_layers = []
|
||||
dest_layers.append(nn.Linear(dest_dim, hidden_size))
|
||||
dest_layers.append(GeLU())
|
||||
self.dest_pre_layer = nn.Sequential(*dest_layers)
|
||||
|
||||
self.att = MultiheadAttention(embed_dim=hidden_size, num_heads=nhead)
|
||||
self.att_dropout = Dropout(0.1)
|
||||
self.att_norm = LayerNorm(hidden_size)
|
||||
|
||||
self.zero_padding_template = torch.zeros((1, src_dim), dtype=torch.float)
|
||||
|
||||
def forward(self, src: Tensor, dest: Tensor, adj: Tensor, mask: Tensor, edges: Tensor = None):
|
||||
"""Information aggregation from the source nodes to the destination nodes.
|
||||
|
||||
Args:
|
||||
src (Tensor): The source nodes in a batch of graph.
|
||||
dest (Tensor): The destination nodes in a batch of graph.
|
||||
adj (Tensor): The adjacency list stored in a 2D matrix in the batch-second format. The first dimension is
|
||||
the maximum amount of the neighbors the destinations have. As the neighbor quantities vary from one
|
||||
destination to another, the short sequences are padded with 0.
|
||||
mask (Tensor): The mask identifies if a position in the adj is padded. Note that it is stored in the
|
||||
batch-first format.
|
||||
|
||||
Returns:
|
||||
destination_emb: The embedding of the destinations after the GAT layer.
|
||||
|
||||
Shape:
|
||||
src: (batch, src_cnt, src_dim)
|
||||
dest: (batch, dest_cnt, dest_dim)
|
||||
adj: (src_neighbor_cnt, batch*dest_cnt)
|
||||
mask: (batch*dest_cnt)*src_neighbor_cnt
|
||||
edges: (batch*dest_cnt, src_neighbor_cnt, edge_dim)
|
||||
destination_emb: (batch, dest_cnt, hidden_size)
|
||||
|
||||
"""
|
||||
assert(self.src_dim == src.shape[-1])
|
||||
assert(self.dest_dim == dest.shape[-1])
|
||||
batch, s_cnt, src_dim = src.shape
|
||||
batch, d_cnt, dest_dim = dest.shape
|
||||
src_neighbor_cnt = adj.shape[0]
|
||||
|
||||
src_embedding = src.reshape(-1, src_dim)
|
||||
src_embedding = torch.cat((self.zero_padding_template.to(src_embedding.device), src_embedding))
|
||||
|
||||
flat_adj = adj.reshape(-1)
|
||||
src_embedding = src_embedding[flat_adj].reshape(src_neighbor_cnt, -1, src_dim)
|
||||
if edges is not None:
|
||||
src_embedding = torch.cat((src_embedding, edges), axis=2)
|
||||
|
||||
src_input = self.src_pre_layer(
|
||||
src_embedding.reshape(-1, src_dim + self.edge_dim)). \
|
||||
reshape(*src_embedding.shape[:2], self.hidden_size)
|
||||
dest_input = self.dest_pre_layer(dest.reshape(-1, dest_dim)).reshape(1, batch * d_cnt, self.hidden_size)
|
||||
dest_emb, _ = self.att(dest_input, src_input, src_input, key_padding_mask=mask)
|
||||
|
||||
dest_emb = dest_emb + self.att_dropout(dest_emb)
|
||||
dest_emb = self.att_norm(dest_emb)
|
||||
return dest_emb.reshape(batch, d_cnt, self.hidden_size)
|
||||
|
||||
|
||||
class SimpleTransformer(nn.Module):
|
||||
"""Graph attention network with multiple graph in the CIM scenario.
|
||||
|
||||
This module aggregates information in the port-to-port graph, port-to-vessel graph and vessel-to-port graph. The
|
||||
aggregation in the two graph are done separatedly and then the port features are concatenated as the final result.
|
||||
|
||||
Args:
|
||||
p_dim (int): The feature dimension of the ports.
|
||||
v_dim (int): The feature dimension of the vessels.
|
||||
edge_dim (dict): The key is the edge name and the value is the corresponding feature dimension.
|
||||
output_size (int): The hidden size in graph attention.
|
||||
layer_num (int): The number of graph attention layers in each graph.
|
||||
"""
|
||||
|
||||
def __init__(self, p_dim, v_dim, edge_dim: dict, output_size, layer_num=2):
|
||||
super().__init__()
|
||||
self.hidden_size = output_size
|
||||
self.layer_num = layer_num
|
||||
|
||||
pl, vl, ppl = [], [], []
|
||||
for i in range(layer_num):
|
||||
if i == 0:
|
||||
pl.append(SimpleGATLayer(v_dim, p_dim, edge_dim["v"], self.hidden_size, nhead=4))
|
||||
vl.append(SimpleGATLayer(p_dim, v_dim, edge_dim["v"], self.hidden_size, nhead=4))
|
||||
# p2p links.
|
||||
ppl.append(
|
||||
SimpleGATLayer(
|
||||
p_dim, p_dim, edge_dim["p"], self.hidden_size, nhead=4, position_encoding=False)
|
||||
)
|
||||
else:
|
||||
pl.append(SimpleGATLayer(self.hidden_size, self.hidden_size, 0, self.hidden_size, nhead=4))
|
||||
if i != layer_num - 1:
|
||||
# The p2v convolution is not necessary at the last layer, since only the port features are used.
|
||||
vl.append(SimpleGATLayer(self.hidden_size, self.hidden_size, 0, self.hidden_size, nhead=4))
|
||||
ppl.append(SimpleGATLayer(
|
||||
self.hidden_size, self.hidden_size, 0, self.hidden_size, nhead=4, position_encoding=False))
|
||||
self.p_layers = nn.ModuleList(pl)
|
||||
self.v_layers = nn.ModuleList(vl)
|
||||
self.pp_layers = nn.ModuleList(ppl)
|
||||
|
||||
def forward(self, p, pe, v, ve, ppe):
|
||||
"""Do the multi-channel graph attention.
|
||||
|
||||
Args:
|
||||
p (Tensor): The port feature.
|
||||
pe (Tensor): The vessel-port edge feature.
|
||||
v (Tensor): The vessel feature.
|
||||
ve (Tensor): The port-vessel edge feature.
|
||||
ppe (Tensor): The port-port edge feature.
|
||||
"""
|
||||
# p.shape: (batch*p_cnt, p_dim)
|
||||
pp = p
|
||||
pre_p, pre_v, pre_pp = p, v, pp
|
||||
for i in range(self.layer_num):
|
||||
# Only feed edge info in the first layer.
|
||||
p = self.p_layers[i](pre_v, pre_p, adj=pe["adj"], edges=pe["edge"] if i == 0 else None, mask=pe["mask"])
|
||||
if i != self.layer_num - 1:
|
||||
v = self.v_layers[i](
|
||||
pre_p, pre_v, adj=ve["adj"], edges=ve["edge"] if i == 0 else None, mask=ve["mask"])
|
||||
pp = self.pp_layers[i](
|
||||
pre_pp, pre_pp, adj=ppe["adj"], edges=ppe["edge"] if i == 0 else None, mask=ppe["mask"])
|
||||
pre_p, pre_v, pre_pp = p, v, pp
|
||||
p = torch.cat((p, pp), axis=2)
|
||||
return p, v
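# Note (illustrative, based on how SharedAC below calls this module):
#   p: (batch, p_cnt, p_dim), v: (batch, v_cnt, v_dim);
#   pe, ve and ppe are dicts with "adj", "edge" and "mask" entries, as produced by gnn_union in utils.py;
#   the returned p has shape (batch, p_cnt, 2 * output_size) and v has shape (batch, v_cnt, output_size).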
|
||||
|
||||
|
||||
class GeLU(nn.Module):
|
||||
"""Simple gelu wrapper as a independent module."""
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
|
||||
def forward(self, input):
|
||||
return F.gelu(input)
|
||||
|
||||
|
||||
class Header(nn.Module):
|
||||
def __init__(self, input_size, hidden_size, output_size, net_type="res"):
|
||||
super().__init__()
|
||||
self.net_type = net_type
|
||||
if net_type == "res":
|
||||
self.fc_0 = nn.Linear(input_size, hidden_size)
|
||||
self.act_0 = GeLU()
|
||||
# self.do_0 = Dropout(dropout)
|
||||
self.fc_1 = nn.Linear(hidden_size, input_size)
|
||||
self.act_1 = GeLU()
|
||||
self.fc_2 = nn.Linear(input_size, output_size)
|
||||
elif net_type == "2layer":
|
||||
self.fc_0 = nn.Linear(input_size, hidden_size)
|
||||
self.act_0 = GeLU()
|
||||
# self.do_0 = Dropout(dropout)
|
||||
self.fc_1 = nn.Linear(hidden_size, hidden_size // 2)
|
||||
self.act_1 = GeLU()
|
||||
self.fc_2 = nn.Linear(hidden_size // 2, output_size)
|
||||
elif net_type == "1layer":
|
||||
self.fc_0 = nn.Linear(input_size, hidden_size)
|
||||
self.act_0 = GeLU()
|
||||
self.fc_1 = nn.Linear(hidden_size, output_size)
|
||||
|
||||
def forward(self, x):
|
||||
if self.net_type == "res":
|
||||
x1 = self.act_0(self.fc_0(x))
|
||||
x1 = self.act_1(self.fc_1(x1) + x)
|
||||
return self.fc_2(x1)
|
||||
elif self.net_type == "2layer":
|
||||
x = self.act_0(self.fc_0(x))
|
||||
x = self.act_1(self.fc_1(x))
|
||||
x = self.fc_2(x)
|
||||
return x
|
||||
else:
|
||||
x = self.fc_1(self.act_0(self.fc_0(x)))
|
||||
return x
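# A quick sanity sketch of the three head variants (sizes here are illustrative only):
#   head = Header(input_size=48, hidden_size=64, output_size=21, net_type="res")
#   head(torch.rand(8, 48)).shape                           # torch.Size([8, 21])
#   Header(48, 64, 21, "2layer")(torch.rand(8, 48)).shape   # torch.Size([8, 21])
#   Header(48, 64, 21, "1layer")(torch.rand(8, 48)).shape   # torch.Size([8, 21])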
|
||||
|
||||
|
||||
class SharedAC(nn.Module):
|
||||
"""The actor-critic module shared with multiple agents.
|
||||
|
||||
This module maps the input observation graph to the policy and value spaces. It first extracts the temporal
information separately for each node with a small transformer block and then extracts the spatial information with
a multi-graph/channel graph attention. Finally, the extracted feature embedding is fed to an actor header as well
as a critic header, which are two MLPs with residual connections.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self, input_dim_p, edge_dim_p, input_dim_v, edge_dim_v, tick_buffer, action_dim, a=True, c=True,
|
||||
scale=4, ac_head="res"):
|
||||
super().__init__()
|
||||
assert(a or c)
|
||||
self.a, self.c = a, c
|
||||
self.input_dim_v = input_dim_v
|
||||
self.input_dim_p = input_dim_p
|
||||
self.tick_buffer = tick_buffer
|
||||
|
||||
self.pre_dim_v, self.pre_dim_p = 8 * scale, 16 * scale
|
||||
self.p_pre_layer = nn.Sequential(
|
||||
nn.Linear(input_dim_p, self.pre_dim_p), GeLU(), PositionalEncoder(
|
||||
d_model=self.pre_dim_p, max_seq_len=tick_buffer))
|
||||
self.v_pre_layer = nn.Sequential(
|
||||
nn.Linear(input_dim_v, self.pre_dim_v), GeLU(), PositionalEncoder(
|
||||
d_model=self.pre_dim_v, max_seq_len=tick_buffer))
|
||||
p_encoder_layer = TransformerEncoderLayer(
|
||||
d_model=self.pre_dim_p, nhead=4, activation="gelu", dim_feedforward=self.pre_dim_p * 4)
|
||||
v_encoder_layer = TransformerEncoderLayer(
|
||||
d_model=self.pre_dim_v, nhead=2, activation="gelu", dim_feedforward=self.pre_dim_v * 4)
|
||||
|
||||
# Alternative initialization: define the normalization.
|
||||
# self.trans_layer_p = TransformerEncoder(p_encoder_layer, num_layers=3, norm=Norm(self.pre_dim_p))
|
||||
# self.trans_layer_v = TransformerEncoder(v_encoder_layer, num_layers=3, norm=Norm(self.pre_dim_v))
|
||||
self.trans_layer_p = TransformerEncoder(p_encoder_layer, num_layers=3)
|
||||
self.trans_layer_v = TransformerEncoder(v_encoder_layer, num_layers=3)
|
||||
|
||||
self.gnn_output_size = 32 * scale
|
||||
self.trans_gat = SimpleTransformer(
|
||||
p_dim=self.pre_dim_p,
|
||||
v_dim=self.pre_dim_v,
|
||||
output_size=self.gnn_output_size // 2,
|
||||
edge_dim={"p": edge_dim_p, "v": edge_dim_v},
|
||||
layer_num=2
|
||||
)
|
||||
|
||||
if a:
|
||||
self.policy_hidden_size = 16 * scale
|
||||
self.a_input = 3 * self.gnn_output_size // 2
|
||||
self.actor = nn.Sequential(
|
||||
Header(self.a_input, self.policy_hidden_size, action_dim, ac_head), nn.Softmax(dim=-1))
|
||||
if c:
|
||||
self.value_hidden_size = 16 * scale
|
||||
self.c_input = self.gnn_output_size
|
||||
self.critic = Header(self.c_input, self.value_hidden_size, 1, ac_head)
|
||||
|
||||
def forward(self, state, a=False, p_idx=None, v_idx=None, c=False):
|
||||
assert((a and p_idx is not None and v_idx is not None) or c)
|
||||
feature_p, feature_v = state["p"], state["v"]
|
||||
|
||||
tb, bsize, p_cnt, _ = feature_p.shape
|
||||
v_cnt = feature_v.shape[2]
|
||||
assert(tb == self.tick_buffer)
|
||||
|
||||
# Before: feature_p.shape: (tick_buffer, batch_size, p_cnt, p_dim)
|
||||
# After: feature_p.shape: (tick_buffer, batch_size*p_cnt, p_dim)
|
||||
feature_p = self.p_pre_layer(feature_p.reshape(feature_p.shape[0], -1, feature_p.shape[-1]))
|
||||
# state["mask"]: (batch_size, tick_buffer)
|
||||
# mask_p: (batch_size * p_cnt, tick_buffer)
|
||||
mask_p = state["mask"].repeat(1, p_cnt).reshape(-1, self.tick_buffer)
|
||||
feature_p = self.trans_layer_p(feature_p, src_key_padding_mask=mask_p)
|
||||
|
||||
feature_v = self.v_pre_layer(feature_v.reshape(feature_v.shape[0], -1, feature_v.shape[-1]))
|
||||
mask_v = state["mask"].repeat(1, v_cnt).reshape(-1, self.tick_buffer)
|
||||
feature_v = self.trans_layer_v(feature_v, src_key_padding_mask=mask_v)
|
||||
|
||||
feature_p = feature_p[0].reshape(bsize, p_cnt, self.pre_dim_p)
|
||||
feature_v = feature_v[0].reshape(bsize, v_cnt, self.pre_dim_v)
|
||||
|
||||
emb_p, emb_v = self.trans_gat(feature_p, state["pe"], feature_v, state["ve"], state["ppe"])
|
||||
|
||||
a_rtn, c_rtn = None, None
|
||||
if a and self.a:
|
||||
ap = emb_p.reshape(bsize, p_cnt, self.gnn_output_size)
|
||||
ap = ap[:, p_idx, :]
|
||||
av = emb_v.reshape(bsize, v_cnt, self.gnn_output_size // 2)
|
||||
av = av[:, v_idx, :]
|
||||
emb_a = torch.cat((ap, av), axis=1)
|
||||
a_rtn = self.actor(emb_a)
|
||||
if c and self.c:
|
||||
c_rtn = self.critic(emb_p).reshape(bsize, p_cnt)
|
||||
return a_rtn, c_rtn
|
|
@ -1,235 +0,0 @@
|
|||
import numpy as np
|
||||
|
||||
from maro.rl.shaping.state_shaper import StateShaper
|
||||
|
||||
from .utils import compute_v2p_degree_matrix
|
||||
|
||||
|
||||
class GNNStateShaper(StateShaper):
|
||||
"""State shaper to extract graph information.
|
||||
|
||||
Args:
|
||||
port_code_list (list): The list of the port codes in the CIM topology.
vessel_code_list (list): The list of the vessel codes in the CIM topology.
max_tick (int): The duration of the simulation.
feature_config (dict): The dottable dict that stores the configuration of the observation features.
max_value (int): The normalization scale. All the features are simply divided by this number.
tick_buffer (int): The value n in n-step TD.
only_demo (bool): Defines whether the shaper instance is used only for shape demonstration (True) or for
runtime shaping (False).
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self, port_code_list, vessel_code_list, max_tick, feature_config, max_value=100000, tick_buffer=20,
|
||||
only_demo=False):
|
||||
# Collect and encode all ports.
|
||||
self.port_code_list = list(port_code_list)
|
||||
self.port_cnt = len(self.port_code_list)
|
||||
self.port_code_inv_dict = {code: i for i, code in enumerate(self.port_code_list)}
|
||||
|
||||
# Collect and encode all vessels.
|
||||
self.vessel_code_list = list(vessel_code_list)
|
||||
self.vessel_cnt = len(self.vessel_code_list)
|
||||
self.vessel_code_inv_dict = {code: i for i, code in enumerate(self.vessel_code_list)}
|
||||
|
||||
# Collect and encode ports and vessels together.
|
||||
self.node_code_inv_dict_p = {i: i for i in self.port_code_list}
|
||||
self.node_code_inv_dict_v = {i: i + self.port_cnt for i in self.vessel_code_list}
|
||||
self.node_cnt = self.port_cnt + self.vessel_cnt
|
||||
|
||||
one_hot_coding = np.identity(self.node_cnt)
|
||||
self.port_one_hot_coding = np.expand_dims(one_hot_coding[:self.port_cnt], axis=0)
|
||||
self.vessel_one_hot_coding = np.expand_dims(one_hot_coding[self.port_cnt:], axis=0)
|
||||
self.last_tick = -1
|
||||
|
||||
self.port_features = [
|
||||
"empty", "full", "capacity", "on_shipper", "on_consignee", "booking", "acc_booking", "shortage",
|
||||
"acc_shortage", "fulfillment", "acc_fulfillment"]
|
||||
self.vessel_features = ["empty", "full", "capacity", "remaining_space"]
|
||||
|
||||
self._max_tick = max_tick
|
||||
self._tick_buffer = tick_buffer
|
||||
# Sentinel value marking that a vessel never arrives at a port.
|
||||
self.max_arrival_time = 99999999
|
||||
|
||||
self.vedge_dim = 2
|
||||
self.pedge_dim = 1
|
||||
|
||||
self._only_demo = only_demo
|
||||
self._feature_config = feature_config
|
||||
self._normalize = True
|
||||
self._norm_scale = 2.0 / max_value
|
||||
if not only_demo:
|
||||
self._state_dict = {
|
||||
# Last "tick" is used for embedding, all zero and never be modified.
|
||||
"v": np.zeros((self._max_tick + 1, self.vessel_cnt, self.get_input_dim("v"))),
|
||||
"p": np.zeros((self._max_tick + 1, self.port_cnt, self.get_input_dim("p"))),
|
||||
"vo": np.zeros((self._max_tick + 1, self.vessel_cnt, self.port_cnt), dtype=np.int),
|
||||
"po": np.zeros((self._max_tick + 1, self.port_cnt, self.vessel_cnt), dtype=np.int),
|
||||
"vedge": np.zeros((self._max_tick + 1, self.vessel_cnt, self.port_cnt, self.get_input_dim("vedge"))),
|
||||
"pedge": np.zeros((self._max_tick + 1, self.port_cnt, self.vessel_cnt, self.get_input_dim("vedge"))),
|
||||
"ppedge": np.zeros((self._max_tick + 1, self.port_cnt, self.port_cnt, self.get_input_dim("pedge"))),
|
||||
}
|
||||
|
||||
# Fixed order: in the order of degree.
|
||||
|
||||
def compute_static_graph_structure(self, env):
|
||||
v2p_adj_matrix = compute_v2p_degree_matrix(env)
|
||||
p2p_adj_matrix = np.dot(v2p_adj_matrix.T, v2p_adj_matrix)
|
||||
p2p_adj_matrix[p2p_adj_matrix == 0] = self.max_arrival_time
|
||||
np.fill_diagonal(p2p_adj_matrix, self.max_arrival_time)
|
||||
self._p2p_embedding = self.sort(p2p_adj_matrix)
|
||||
|
||||
v2p_adj_matrix = -v2p_adj_matrix
|
||||
v2p_adj_matrix[v2p_adj_matrix == 0] = self.max_arrival_time
|
||||
self._fixed_v_order = self.sort(v2p_adj_matrix)
|
||||
self._fixed_p_order = self.sort(v2p_adj_matrix.T)
|
||||
|
||||
@property
|
||||
def p2p_static_graph(self):
|
||||
return self._p2p_embedding
|
||||
|
||||
def sort(self, arrival_time, attr=None):
|
||||
"""
|
||||
Given the arrival time matrix, this function sorts each row and returns the index matrix in the order of
arrival time.
|
||||
"""
|
||||
n, m = arrival_time.shape
|
||||
if self._feature_config.attention_order == "ramdom":
|
||||
arrival_time = arrival_time + np.random.randint(self._max_tick, size=arrival_time.shape)
|
||||
at_index = np.argsort(arrival_time, axis=1)
|
||||
if attr is not None:
|
||||
idx_tmp = np.repeat(at_index, attr.shape[-1]).reshape(*at_index.shape, attr.shape[-1])
|
||||
attr = np.take_along_axis(attr, idx_tmp, axis=1)
|
||||
mask = np.sort(arrival_time, axis=1) >= self.max_arrival_time
|
||||
at_index += 1
|
||||
at_index[mask] = 0
|
||||
if attr is None:
|
||||
return at_index
|
||||
else:
|
||||
return at_index, attr
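# Illustrative example (assuming the temporal attention order, with max_arrival_time padding):
#   arrival_time = [[3, 99999999, 1],
#                   [99999999, 2, 99999999]]
# argsort per row gives [[2, 0, 1], [1, 0, 2]]; indices are then shifted by 1 so that 0 can be
# reserved for padding, and positions whose sorted arrival time is >= max_arrival_time are
# zeroed out, yielding at_index = [[3, 1, 0], [2, 0, 0]].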
|
||||
|
||||
def end_ep_callback(self, snapshot_list):
|
||||
if self._only_demo:
|
||||
return
|
||||
tick_range = np.arange(start=self.last_tick, stop=self._max_tick)
|
||||
self._sync_raw_features(snapshot_list, list(tick_range))
|
||||
self.last_tick = -1
|
||||
|
||||
def _sync_raw_features(self, snapshot_list, tick_range, static_code=None, dynamic_code=None):
|
||||
"""This function update the state_dict from snapshot_list in the given tick_range."""
|
||||
if len(tick_range) == 0:
|
||||
# This occurs when two actions happen at the same tick.
|
||||
return
|
||||
|
||||
# One dim features.
|
||||
port_naive_feature = snapshot_list["ports"][tick_range: self.port_code_list: self.port_features] \
|
||||
.reshape(len(tick_range), self.port_cnt, -1)
|
||||
# Number of laden containers from source to destination.
|
||||
full_on_port = snapshot_list["matrices"][tick_range::"full_on_ports"].reshape(
|
||||
len(tick_range), self.port_cnt, self.port_cnt)
|
||||
# Normalize features to a small range.
|
||||
port_state_mat = self.normalize(port_naive_feature)
|
||||
|
||||
if self._feature_config.onehot_identity:
|
||||
# Add onehot vector to identify port and vessel.
|
||||
port_onehot = np.repeat(self.port_one_hot_coding, len(tick_range), axis=0)
|
||||
if static_code is not None and dynamic_code is not None:
|
||||
# Identify the decision vessel at the decision port.
|
||||
port_onehot[-1, self.port_code_inv_dict[static_code], self.node_code_inv_dict_v[dynamic_code]] = -1
|
||||
port_state_mat = np.concatenate([port_state_mat, port_onehot], axis=2)
|
||||
self._state_dict["p"][tick_range] = port_state_mat
|
||||
|
||||
vessel_naive_feature = snapshot_list["vessels"][tick_range:self.vessel_code_list: self.vessel_features] \
|
||||
.reshape(len(tick_range), self.vessel_cnt, -1)
|
||||
full_on_vessel = snapshot_list["matrices"][tick_range::"full_on_vessels"].reshape(
|
||||
len(tick_range), self.vessel_cnt, self.port_cnt)
|
||||
|
||||
vessel_state_mat = self.normalize(vessel_naive_feature)
|
||||
if self._feature_config.onehot_identity:
|
||||
vessel_state_mat = np.concatenate(
|
||||
[vessel_state_mat, np.repeat(self.vessel_one_hot_coding, len(tick_range), axis=0)], axis=2)
|
||||
self._state_dict["v"][tick_range] = vessel_state_mat
|
||||
|
||||
# last_arrival_time.shape: vessel_cnt * port_cnt
|
||||
# -1 means one vessel never stops at the port
|
||||
vessel_arrival_time = snapshot_list["matrices"][tick_range[-1]:: "vessel_plans"].reshape(
|
||||
self.vessel_cnt, self.port_cnt)
|
||||
# Use an effectively infinite time to mark vessels that never arrive at the port.
|
||||
last_arrival_time = vessel_arrival_time + 1
|
||||
last_arrival_time[last_arrival_time == 0] = self.max_arrival_time
|
||||
if static_code is not None and dynamic_code is not None:
|
||||
# Differentiate the vessel acting on the port from other vessels that have taken or are waiting to take actions.
|
||||
last_arrival_time[self.vessel_code_inv_dict[dynamic_code], self.port_code_inv_dict[static_code]] = 0
|
||||
|
||||
# Here, we assume that the order of arrival times stays the same between two actions/events.
|
||||
vedge_raw = self.normalize(np.stack((full_on_vessel[-1], last_arrival_time), axis=-1))
|
||||
vo, vedge = self.sort(last_arrival_time, attr=vedge_raw)
|
||||
po, pedge = self.sort(last_arrival_time.T, attr=vedge_raw.transpose((1, 0, 2)))
|
||||
self._state_dict["vo"][tick_range] = np.expand_dims(vo, axis=0)
|
||||
self._state_dict["vedge"][tick_range] = np.expand_dims(vedge, axis=0)
|
||||
self._state_dict["po"][tick_range] = np.expand_dims(po, axis=0)
|
||||
self._state_dict["pedge"][tick_range] = np.expand_dims(pedge, axis=0)
|
||||
self._state_dict["ppedge"][tick_range] = self.normalize(full_on_port[-1]).reshape(1, *full_on_port[-1].shape, 1)
|
||||
|
||||
def __call__(self, action_info=None, snapshot_list=None, tick=None):
|
||||
if self._only_demo:
|
||||
return
|
||||
assert((action_info is not None and snapshot_list is not None) or tick is not None)
|
||||
|
||||
if action_info is not None and snapshot_list is not None:
|
||||
# Update the state dict.
|
||||
static_code = action_info.port_idx
|
||||
dynamic_code = action_info.vessel_idx
|
||||
if self.last_tick == action_info.tick:
|
||||
tick_range = [action_info.tick]
|
||||
else:
|
||||
tick_range = list(range(self.last_tick + 1, action_info.tick + 1, 1))
|
||||
|
||||
self.last_tick = action_info.tick
|
||||
self._sync_raw_features(snapshot_list, tick_range, static_code, dynamic_code)
|
||||
tick = action_info.tick
|
||||
|
||||
# state_tick_range is in reverse (most recent first) order.
|
||||
state_tick_range = np.arange(tick, max(-1, tick - self._tick_buffer), -1)
|
||||
v = np.zeros((self._tick_buffer, self.vessel_cnt, self.get_input_dim("v")))
|
||||
v[:len(state_tick_range)] = self._state_dict["v"][state_tick_range]
|
||||
p = np.zeros((self._tick_buffer, self.port_cnt, self.get_input_dim("p")))
|
||||
p[:len(state_tick_range)] = self._state_dict["p"][state_tick_range]
|
||||
|
||||
# True means padding.
|
||||
mask = np.ones(self._tick_buffer, dtype=np.bool)
|
||||
mask[:len(state_tick_range)] = False
|
||||
ret = {
|
||||
"tick": state_tick_range,
|
||||
"v": v,
|
||||
"p": p,
|
||||
"vo": self._state_dict["vo"][tick],
|
||||
"po": self._state_dict["po"][tick],
|
||||
"vedge": self._state_dict["vedge"][tick],
|
||||
"pedge": self._state_dict["pedge"][tick],
|
||||
"ppedge": self._state_dict["ppedge"][tick],
|
||||
"mask": mask,
|
||||
"len": len(state_tick_range),
|
||||
}
|
||||
|
||||
return ret
|
||||
|
||||
def normalize(self, feature):
|
||||
if not self._normalize:
|
||||
return feature
|
||||
return feature * self._norm_scale
|
||||
|
||||
def get_input_dim(self, agent_code):
|
||||
if agent_code in self.port_code_inv_dict or agent_code == "p":
|
||||
return len(self.port_features) + (self.node_cnt if self._feature_config.onehot_identity else 0)
|
||||
elif agent_code in self.vessel_code_inv_dict or agent_code == "v":
|
||||
return len(self.vessel_features) + (self.node_cnt if self._feature_config.onehot_identity else 0)
|
||||
elif agent_code == "vedge":
|
||||
# v-p edge: (arrival_time, laden to destination)
|
||||
return 2
|
||||
elif agent_code == "pedge":
|
||||
# p-p edge: (laden to destination, )
|
||||
return 1
|
||||
else:
|
||||
raise ValueError("agent not exist!")
|
|
@ -1,266 +0,0 @@
|
|||
import ast
|
||||
import io
|
||||
import os
|
||||
import random
|
||||
import shutil
|
||||
import sys
|
||||
from collections import OrderedDict, defaultdict
|
||||
|
||||
import numpy as np
|
||||
import torch
|
||||
import yaml
|
||||
|
||||
from maro.simulator import Env
|
||||
from maro.simulator.scenarios.cim.common import Action
|
||||
from maro.utils import clone, convert_dottable
|
||||
|
||||
|
||||
def compute_v2p_degree_matrix(env):
|
||||
"""This function compute the adjacent matrix."""
|
||||
topo_config = env.configs
|
||||
static_dict = env.summary["node_mapping"]["ports"]
|
||||
dynamic_dict = env.summary["node_mapping"]["vessels"]
|
||||
adj_matrix = np.zeros((len(dynamic_dict), len(static_dict)), dtype=np.int)
|
||||
for v, vinfo in topo_config["vessels"].items():
|
||||
route_name = vinfo["route"]["route_name"]
|
||||
route = topo_config["routes"][route_name]
|
||||
vid = dynamic_dict[v]
|
||||
for p in route:
|
||||
adj_matrix[vid][static_dict[p["port_name"]]] += 1
|
||||
|
||||
return adj_matrix
|
||||
|
||||
|
||||
def from_numpy(device, *np_values):
|
||||
return [torch.from_numpy(v).to(device) for v in np_values]
|
||||
|
||||
|
||||
def gnn_union(p, po, pedge, v, vo, vedge, p2p, ppedge, seq_mask, device):
|
||||
"""Union multiple graph in CIM.
|
||||
|
||||
Args:
|
||||
v: Numpy array of shape (seq_len, batch, v_cnt, v_dim).
|
||||
vo: Numpy array of shape (batch, v_cnt, p_cnt).
|
||||
vedge: Numpy array of shape (batch, v_cnt, p_cnt, e_dim).
|
||||
Returns:
|
||||
result (dict): The dictionary that describes the graph.
|
||||
"""
|
||||
seq_len, batch, v_cnt, v_dim = v.shape
|
||||
_, _, p_cnt, p_dim = p.shape
|
||||
|
||||
p, po, pedge, v, vo, vedge, p2p, ppedge, seq_mask = from_numpy(
|
||||
device, p, po, pedge, v, vo, vedge, p2p, ppedge, seq_mask)
|
||||
|
||||
batch_range = torch.arange(batch, dtype=torch.long).to(device)
|
||||
# vadj.shape: (batch*v_cnt, p_cnt*)
|
||||
vadj, vedge = flatten_embedding(vo, batch_range, vedge)
|
||||
# vmask.shape: (batch*v_cnt, p_cnt*)
|
||||
vmask = vadj == 0
|
||||
# vadj.shape: (p_cnt*, batch*v_cnt)
|
||||
vadj = vadj.transpose(0, 1)
|
||||
# vedge.shape: (p_cnt*, batch*v_cnt, e_dim)
|
||||
vedge = vedge.transpose(0, 1)
|
||||
|
||||
padj, pedge = flatten_embedding(po, batch_range, pedge)
|
||||
pmask = padj == 0
|
||||
padj = padj.transpose(0, 1)
|
||||
pedge = pedge.transpose(0, 1)
|
||||
|
||||
p2p_adj = p2p.repeat(batch, 1, 1)
|
||||
# p2p_adj.shape: (batch*p_cnt, p_cnt*)
|
||||
p2p_adj, ppedge = flatten_embedding(p2p_adj, batch_range, ppedge)
|
||||
# p2p_mask.shape: (batch*p_cnt, p_cnt*)
|
||||
p2p_mask = p2p_adj == 0
|
||||
# p2p_adj.shape: (p_cnt*, batch*p_cnt)
|
||||
p2p_adj = p2p_adj.transpose(0, 1)
|
||||
ppedge = ppedge.transpose(0, 1)
|
||||
|
||||
return {
|
||||
"v": v,
|
||||
"p": p,
|
||||
"pe": {
|
||||
"edge": pedge,
|
||||
"adj": padj,
|
||||
"mask": pmask,
|
||||
},
|
||||
"ve": {
|
||||
"edge": vedge,
|
||||
"adj": vadj,
|
||||
"mask": vmask,
|
||||
},
|
||||
"ppe": {
|
||||
"edge": ppedge,
|
||||
"adj": p2p_adj,
|
||||
"mask": p2p_mask,
|
||||
},
|
||||
"mask": seq_mask,
|
||||
}
|
||||
|
||||
|
||||
def flatten_embedding(embedding, batch_range, edge=None):
|
||||
if len(embedding.shape) == 3:
|
||||
batch, x_cnt, y_cnt = embedding.shape
|
||||
addon = (batch_range * y_cnt).view(batch, 1, 1)
|
||||
else:
|
||||
seq_len, batch, x_cnt, y_cnt = embedding.shape
|
||||
addon = (batch_range * y_cnt).view(seq_len, batch, 1, 1)
|
||||
|
||||
embedding_mask = embedding == 0
|
||||
embedding += addon
|
||||
embedding[embedding_mask] = 0
|
||||
ret = embedding.reshape(-1, embedding.shape[-1])
|
||||
col_mask = ret.sum(dim=0) != 0
|
||||
ret = ret[:, col_mask]
|
||||
if edge is None:
|
||||
return ret
|
||||
else:
|
||||
edge = edge.reshape(-1, *edge.shape[2:])[:, col_mask, :]
|
||||
return ret, edge
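# A minimal, self-contained sanity check for flatten_embedding (shapes and values below are
# made up for illustration; it only runs when this module is executed directly).
if __name__ == "__main__":
    # Two batches, two destinations each, up to three neighbors; 0 denotes "no neighbor".
    _vo = torch.tensor([[[1, 2, 0], [3, 0, 0]],
                        [[2, 1, 0], [1, 3, 0]]])
    _flat = flatten_embedding(_vo, torch.arange(2, dtype=torch.long))
    # Indices of the second batch are shifted by y_cnt=3, padding stays 0 and the all-zero
    # third column is dropped: tensor([[1, 2], [3, 0], [5, 4], [4, 6]]).
    print(_flat)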
|
||||
|
||||
|
||||
def log2json(file_path):
|
||||
"""load the log file as a json list."""
|
||||
with open(file_path, "r") as fp:
|
||||
lines = fp.read().splitlines()
|
||||
json_list = "[" + ",".join(lines) + "]"
|
||||
return ast.literal_eval(json_list)
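# Hypothetical usage (file name and contents are made up): if "sim.log" contains one
# dict-literal per line, e.g.
#   {"ep": 0, "shortage": 123}
#   {"ep": 1, "shortage": 98}
# then log2json("sim.log") returns [{"ep": 0, "shortage": 123}, {"ep": 1, "shortage": 98}].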
|
||||
|
||||
|
||||
def decision_cnt_analysis(env, pv=False, buffer_size=8):
|
||||
if not pv:
|
||||
decision_cnt = [buffer_size] * len(env.node_name_mapping["static"])
|
||||
r, pa, is_done = env.step(None)
|
||||
while not is_done:
|
||||
decision_cnt[pa.port_idx] += 1
|
||||
action = Action(pa.vessel_idx, pa.port_idx, 0)
|
||||
r, pa, is_done = env.step(action)
|
||||
else:
|
||||
decision_cnt = OrderedDict()
|
||||
r, pa, is_done = env.step(None)
|
||||
while not is_done:
|
||||
if (pa.port_idx, pa.vessel_idx) not in decision_cnt:
|
||||
decision_cnt[pa.port_idx, pa.vessel_idx] = buffer_size
|
||||
else:
|
||||
decision_cnt[pa.port_idx, pa.vessel_idx] += 1
|
||||
action = Action(pa.vessel_idx, pa.port_idx, 0)
|
||||
r, pa, is_done = env.step(action)
|
||||
env.reset()
|
||||
return decision_cnt
|
||||
|
||||
|
||||
def random_shortage(env, tick, action_dim=21):
|
||||
_, pa, is_done = env.step(None)
|
||||
node_cnt = len(env.summary["node_mapping"]["ports"])
|
||||
while not is_done:
|
||||
"""
|
||||
load, discharge = pa.action_scope.load, pa.action_scope.discharge
|
||||
action_idx = np.random.randint(action_dim) - zero_idx
|
||||
if action_idx < 0:
|
||||
actual_action = int(1.0*action_idx/zero_idx*load)
|
||||
else:
|
||||
actual_action = int(1.0*action_idx/zero_idx*discharge)
|
||||
"""
|
||||
action = Action(pa.vessel_idx, pa.port_idx, 0)
|
||||
r, pa, is_done = env.step(action)
|
||||
|
||||
shs = env.snapshot_list["ports"][tick - 1:list(range(node_cnt)):"acc_shortage"]
|
||||
fus = env.snapshot_list["ports"][tick - 1:list(range(node_cnt)):"acc_fulfillment"]
|
||||
env.reset()
|
||||
return fus - shs, np.sum(shs + fus)
|
||||
|
||||
|
||||
def return_scaler(env, tick, gamma, action_dim=21):
|
||||
R, tot_amount = random_shortage(env, tick, action_dim)
|
||||
Rs_mean = np.mean(R) / tick / (1 - gamma)
|
||||
return abs(1.0 / Rs_mean), tot_amount
|
||||
|
||||
|
||||
def load_config(config_pth):
|
||||
with io.open(config_pth, "r") as in_file:
|
||||
raw_config = yaml.safe_load(in_file)
|
||||
config = convert_dottable(raw_config)
|
||||
|
||||
if config.env.seed < 0:
|
||||
config.env.seed = random.randint(0, 99999)
|
||||
|
||||
regularize_config(config)
|
||||
return config
|
||||
|
||||
|
||||
def save_config(config, config_pth):
|
||||
with open(config_pth, "w") as fp:
|
||||
config = dottable2dict(config)
|
||||
config["env"]["exp_per_ep"] = [f"{k[0]}, {k[1]}, {d}" for k, d in config["env"]["exp_per_ep"].items()]
|
||||
yaml.safe_dump(config, fp)
|
||||
|
||||
|
||||
def dottable2dict(config):
|
||||
if isinstance(config, float):
|
||||
return str(config)
|
||||
if not isinstance(config, dict):
|
||||
return clone(config)
|
||||
rt = {}
|
||||
for k, v in config.items():
|
||||
rt[k] = dottable2dict(v)
|
||||
return rt
|
||||
|
||||
|
||||
def save_code(folder, save_pth):
|
||||
save_path = os.path.join(save_pth, "code")
|
||||
code_pth = os.path.join(os.getcwd(), folder)
|
||||
shutil.copytree(code_pth, save_path)
|
||||
|
||||
|
||||
def fix_seed(env, seed):
|
||||
env.set_seed(seed)
|
||||
np.random.seed(seed)
|
||||
random.seed(seed)
|
||||
|
||||
|
||||
def zero_play(**args):
|
||||
env = Env(**args)
|
||||
_, pa, is_done = env.step(None)
|
||||
while not is_done:
|
||||
action = Action(pa.vessel_idx, pa.port_idx, 0)
|
||||
r, pa, is_done = env.step(action)
|
||||
return env.snapshot_list
|
||||
|
||||
|
||||
def regularize_config(config):
|
||||
def parse_value(v):
|
||||
try:
|
||||
return int(v)
|
||||
except ValueError:
|
||||
try:
|
||||
return float(v)
|
||||
except ValueError:
|
||||
if v == "false" or v == "False":
|
||||
return False
|
||||
elif v == "true" or v == "True":
|
||||
return True
|
||||
else:
|
||||
return v
|
||||
|
||||
def set_attr(config, attrs, value):
|
||||
if len(attrs) == 1:
|
||||
config[attrs[0]] = value
|
||||
else:
|
||||
set_attr(config[attrs[0]], attrs[1:], value)
|
||||
|
||||
all_args = sys.argv[1:]
|
||||
for i in range(len(all_args) // 2):
|
||||
name = all_args[i * 2]
|
||||
attrs = name[2:].split(".")
|
||||
value = parse_value(all_args[i * 2 + 1])
|
||||
set_attr(config, attrs, value)
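# Example (hypothetical command line): launching with
#   python train.py --training.rollout_cnt 100 --env.seed 42
# overrides config.training.rollout_cnt and config.env.seed through set_attr above; values
# are parsed as int/float/bool where possible, otherwise kept as strings.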
|
||||
|
||||
|
||||
def analysis_speed(env):
|
||||
speed_dict = defaultdict(int)
|
||||
eq_speed = 0
|
||||
for ves in env.configs["vessels"].values():
|
||||
speed_dict[ves["sailing"]["speed"]] += 1
|
||||
for sp, cnt in speed_dict.items():
|
||||
eq_speed += 1.0 * cnt / sp
|
||||
eq_speed = 1.0 / eq_speed
|
||||
return speed_dict, eq_speed
|
|
@ -1,36 +0,0 @@
|
|||
env:
|
||||
seed: 10
|
||||
param:
|
||||
durations: 1120
|
||||
scenario: "cim"
|
||||
topology: "global_trade.22p_l0.8"
|
||||
# topology: "toy.4p_ssdd_l0.0"
|
||||
training:
|
||||
enable: True
|
||||
parallel_cnt: 1
|
||||
device: "cpu"
|
||||
batch_size: 16
|
||||
shuffle_time: 1
|
||||
rollout_cnt: 500
|
||||
train_freq: 1
|
||||
model_save_freq: 1
|
||||
gamma: 0.99
|
||||
learning_rate: 0.00005
|
||||
td_steps: 100
|
||||
entropy_loss_enable: True
|
||||
model:
|
||||
path: "./"
|
||||
tick_buffer: 20
|
||||
hidden_size: 32
|
||||
graph_output_dim: 32
|
||||
action_dim: 21
|
||||
feature:
|
||||
# "temporal" or "random": if "temporal", the edges in the graph are listed in the order of event time,
# otherwise in a random order.
|
||||
attention_order: temporal
|
||||
onehot_identity: False
|
||||
log:
|
||||
path: "./"
|
||||
exp:
|
||||
enable: false
|
||||
freq: 10
|
|
@ -1,70 +0,0 @@
|
|||
import datetime
|
||||
import os
|
||||
|
||||
from maro.simulator import Env
|
||||
from maro.utils import Logger
|
||||
|
||||
from components import (
|
||||
GNNLearner, GNNStateShaper, ParallelActor, SimpleAgentManger,
|
||||
decision_cnt_analysis, load_config, return_scaler, save_code, save_config
|
||||
)
|
||||
|
||||
if __name__ == "__main__":
|
||||
real_path = os.path.split(os.path.realpath(__file__))[0]
|
||||
|
||||
config_path = os.path.join(real_path, "config.yml")
|
||||
config = load_config(config_path)
|
||||
|
||||
# Generate log path.
|
||||
date_str = datetime.datetime.now().strftime("%Y%m%d")
|
||||
time_str = datetime.datetime.now().strftime("%H%M%S.%f")
|
||||
subfolder_name = f"{config.env.param.topology}_{time_str}"
|
||||
|
||||
# Log path.
|
||||
config.log.path = os.path.join(config.log.path, date_str, subfolder_name)
|
||||
if not os.path.exists(config.log.path):
|
||||
os.makedirs(config.log.path)
|
||||
|
||||
simulation_logger = Logger(tag="simulation", dump_folder=config.log.path, dump_mode="w", auto_timestamp=False)
|
||||
|
||||
# Create a demo environment to retrieve environment information.
|
||||
simulation_logger.info("Approximating the experience quantity of each agent...")
|
||||
demo_env = Env(**config.env.param)
|
||||
config.env.exp_per_ep = decision_cnt_analysis(demo_env, pv=True, buffer_size=8)
|
||||
simulation_logger.info(config.env.exp_per_ep)
|
||||
|
||||
# Add some buffer to prevent overlapping.
|
||||
config.env.return_scaler, tot_order_amount = return_scaler(
|
||||
demo_env, tick=config.env.param.durations, gamma=config.training.gamma)
|
||||
simulation_logger.info(f"Return value will be scaled down by the factor {config.env.return_scaler}")
|
||||
|
||||
save_config(config, os.path.join(config.log.path, "config.yml"))
|
||||
save_code("examples/cim/gnn", config.log.path)
|
||||
|
||||
port_mapping = demo_env.summary["node_mapping"]["ports"]
|
||||
vessel_mapping = demo_env.summary["node_mapping"]["vessels"]
|
||||
|
||||
# Create a mock gnn_state_shaper.
|
||||
static_code_list, dynamic_code_list = list(port_mapping.values()), list(vessel_mapping.values())
|
||||
gnn_state_shaper = GNNStateShaper(
|
||||
static_code_list, dynamic_code_list, config.env.param.durations, config.model.feature,
|
||||
tick_buffer=config.model.tick_buffer, only_demo=True, max_value=demo_env.configs["total_containers"])
|
||||
gnn_state_shaper.compute_static_graph_structure(demo_env)
|
||||
|
||||
# Create and assemble agent_manager.
|
||||
agent_id_list = list(config.env.exp_per_ep.keys())
|
||||
training_logger = Logger(tag="training", dump_folder=config.log.path, dump_mode="w", auto_timestamp=False)
|
||||
agent_manager = SimpleAgentManger(
|
||||
"CIM-GNN-manager", agent_id_list, static_code_list, dynamic_code_list, demo_env, gnn_state_shaper,
|
||||
training_logger)
|
||||
agent_manager.assemble(config)
|
||||
|
||||
# Create the rollout actor to collect experience.
|
||||
actor = ParallelActor(config, demo_env, gnn_state_shaper, agent_manager, logger=simulation_logger)
|
||||
|
||||
# Learner function for training and testing.
|
||||
learner = GNNLearner(actor, agent_manager, logger=simulation_logger)
|
||||
learner.learn(config.training)
|
||||
|
||||
# Cancel all the child process used for rollout.
|
||||
actor.exit()
|
|
@ -1,22 +0,0 @@
|
|||
# Overview
|
||||
|
||||
The CIM problem is one of the quintessential use cases of MARO. The example can
|
||||
be run with a set of scenario configurations that can be found under
|
||||
maro/simulator/scenarios/cim. General experimental parameters (e.g., type of
|
||||
topology, type of algorithm to use, number of training episodes) can be configured
|
||||
through config.yml. Each RL formulation has a dedicated folder, e.g., dqn, and
|
||||
all algorithm-specific parameters can be configured through
|
||||
the config.py file in that folder.
|
||||
|
||||
## Single-host Single-process Mode
|
||||
|
||||
To run the CIM example using the DQN algorithm under single-host mode, go to
|
||||
examples/cim/dqn and run single_process_launcher.py. You may play around with
|
||||
the configuration if you want to try out different settings.
|
||||
|
||||
## Distributed Mode
|
||||
|
||||
The examples/cim/dqn/components folder contains dist_learner.py and dist_actor.py
|
||||
for distributed training. For debugging purposes, we provide a script that
|
||||
simulates distributed mode using multi-processing. Simply go to examples/cim/dqn
|
||||
and run multi_process_launcher.py to start the learner and actor processes.
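For example, the following command (the group name and actor count are illustrative) starts
one learner and two actor processes:

    python multi_process_launcher.py test_group 2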
|
|
@ -1,14 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from .action_shaper import CIMActionShaper
|
||||
from .agent_manager import POAgentManager, create_po_agents
|
||||
from .experience_shaper import TruncatedExperienceShaper
|
||||
from .state_shaper import CIMStateShaper
|
||||
|
||||
__all__ = [
|
||||
"CIMActionShaper",
|
||||
"POAgentManager", "create_po_agents",
|
||||
"TruncatedExperienceShaper",
|
||||
"CIMStateShaper"
|
||||
]
|
|
@ -1,33 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from maro.rl import ActionShaper
|
||||
from maro.simulator.scenarios.cim.common import Action
|
||||
|
||||
|
||||
class CIMActionShaper(ActionShaper):
|
||||
def __init__(self, action_space):
|
||||
super().__init__()
|
||||
self._action_space = action_space
|
||||
self._zero_action_index = action_space.index(0)
|
||||
|
||||
def __call__(self, model_action, decision_event, snapshot_list):
|
||||
scope = decision_event.action_scope
|
||||
tick = decision_event.tick
|
||||
port_idx = decision_event.port_idx
|
||||
vessel_idx = decision_event.vessel_idx
|
||||
|
||||
port_empty = snapshot_list["ports"][tick: port_idx: ["empty", "full", "on_shipper", "on_consignee"]][0]
|
||||
vessel_remaining_space = snapshot_list["vessels"][tick: vessel_idx: ["empty", "full", "remaining_space"]][2]
|
||||
early_discharge = snapshot_list["vessels"][tick:vessel_idx: "early_discharge"][0]
|
||||
assert 0 <= model_action < len(self._action_space)
|
||||
|
||||
if model_action < self._zero_action_index:
|
||||
actual_action = max(round(self._action_space[model_action] * port_empty), -vessel_remaining_space)
|
||||
elif model_action > self._zero_action_index:
|
||||
plan_action = self._action_space[model_action] * (scope.discharge + early_discharge) - early_discharge
|
||||
actual_action = round(plan_action) if plan_action > 0 else round(self._action_space[model_action] * scope.discharge)
|
||||
else:
|
||||
actual_action = 0
|
||||
|
||||
return Action(vessel_idx, port_idx, actual_action)
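# Illustrative mapping (with action_space = np.linspace(-1.0, 1.0, 21), so the zero index is 10):
#   model_action < 10  -> load:      round(action_space[a] * port_empty), bounded by -vessel_remaining_space
#   model_action > 10  -> discharge: computed from scope.discharge and early_discharge as above
#   model_action == 10 -> no repositioning (actual_action = 0)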
|
|
@ -1,83 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import numpy as np
|
||||
import torch.nn as nn
|
||||
from torch.optim import Adam, RMSprop
|
||||
|
||||
from maro.rl import (
|
||||
AbsAgent, ActorCritic, ActorCriticConfig, FullyConnectedBlock, LearningModel, NNStack,
|
||||
OptimizerOptions, PolicyGradient, PolicyOptimizationConfig, SimpleAgentManager
|
||||
)
|
||||
from maro.utils import set_seeds
|
||||
|
||||
|
||||
class POAgent(AbsAgent):
|
||||
def train(self, states: np.ndarray, actions: np.ndarray, log_action_prob: np.ndarray, rewards: np.ndarray):
|
||||
self._algorithm.train(states, actions, log_action_prob, rewards)
|
||||
|
||||
|
||||
def create_po_agents(agent_id_list, config):
|
||||
input_dim, num_actions = config.input_dim, config.num_actions
|
||||
set_seeds(config.seed)
|
||||
agent_dict = {}
|
||||
for agent_id in agent_id_list:
|
||||
actor_net = NNStack(
|
||||
"actor",
|
||||
FullyConnectedBlock(
|
||||
input_dim=input_dim,
|
||||
output_dim=num_actions,
|
||||
activation=nn.Tanh,
|
||||
is_head=True,
|
||||
**config.actor_model
|
||||
)
|
||||
)
|
||||
|
||||
if config.type == "actor_critic":
|
||||
critic_net = NNStack(
|
||||
"critic",
|
||||
FullyConnectedBlock(
|
||||
input_dim=config.input_dim,
|
||||
output_dim=1,
|
||||
activation=nn.LeakyReLU,
|
||||
is_head=True,
|
||||
**config.critic_model
|
||||
)
|
||||
)
|
||||
|
||||
hyper_params = config.actor_critic_hyper_parameters
|
||||
hyper_params.update({"reward_discount": config.reward_discount})
|
||||
learning_model = LearningModel(
|
||||
actor_net, critic_net,
|
||||
optimizer_options={
|
||||
"actor": OptimizerOptions(cls=Adam, params=config.actor_optimizer),
|
||||
"critic": OptimizerOptions(cls=RMSprop, params=config.critic_optimizer)
|
||||
}
|
||||
)
|
||||
algorithm = ActorCritic(
|
||||
learning_model, ActorCriticConfig(critic_loss_func=nn.SmoothL1Loss(), **hyper_params)
|
||||
)
|
||||
else:
|
||||
learning_model = LearningModel(
|
||||
actor_net,
|
||||
optimizer_options=OptimizerOptions(cls=Adam, params=config.actor_optimizer)
|
||||
)
|
||||
algorithm = PolicyGradient(learning_model, PolicyOptimizationConfig(config.reward_discount))
|
||||
|
||||
agent_dict[agent_id] = POAgent(name=agent_id, algorithm=algorithm)
|
||||
|
||||
return agent_dict
|
||||
|
||||
|
||||
class POAgentManager(SimpleAgentManager):
|
||||
def train(self, experiences_by_agent: dict):
|
||||
for agent_id, exp in experiences_by_agent.items():
|
||||
if not isinstance(exp, list):
|
||||
exp = [exp]
|
||||
for trajectory in exp:
|
||||
self.agent_dict[agent_id].train(
|
||||
trajectory["state"],
|
||||
trajectory["action"],
|
||||
trajectory["log_action_probability"],
|
||||
trajectory["reward"]
|
||||
)
|
|
@ -1,19 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
"""
|
||||
This file is used to load the configuration and convert it into a dotted dictionary.
|
||||
"""
|
||||
|
||||
import io
|
||||
import os
|
||||
import yaml
|
||||
|
||||
|
||||
CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../config.yml")
|
||||
with io.open(CONFIG_PATH, "r") as in_file:
|
||||
config = yaml.safe_load(in_file)
|
||||
|
||||
DISTRIBUTED_CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "../distributed_config.yml")
|
||||
with io.open(DISTRIBUTED_CONFIG_PATH, "r") as in_file:
|
||||
distributed_config = yaml.safe_load(in_file)
|
|
@ -1,51 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
from collections import defaultdict
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import ExperienceShaper
|
||||
|
||||
|
||||
class TruncatedExperienceShaper(ExperienceShaper):
|
||||
def __init__(self, *, time_window: int, time_decay_factor: float, fulfillment_factor: float,
|
||||
shortage_factor: float):
|
||||
super().__init__(reward_func=None)
|
||||
self._time_window = time_window
|
||||
self._time_decay_factor = time_decay_factor
|
||||
self._fulfillment_factor = fulfillment_factor
|
||||
self._shortage_factor = shortage_factor
|
||||
|
||||
def __call__(self, trajectory, snapshot_list):
|
||||
agent_ids = np.asarray(trajectory.get_by_key("agent_id"))
|
||||
states = np.asarray(trajectory.get_by_key("state"))
|
||||
actions = np.asarray(trajectory.get_by_key("action"))
|
||||
log_action_probabilities = np.asarray(trajectory.get_by_key("log_action_probability"))
|
||||
rewards = np.fromiter(
|
||||
map(self._compute_reward, trajectory.get_by_key("event"), [snapshot_list] * len(trajectory)),
|
||||
dtype=np.float32
|
||||
)
|
||||
return {agent_id: {
|
||||
"state": states[agent_ids == agent_id],
|
||||
"action": actions[agent_ids == agent_id],
|
||||
"log_action_probability": log_action_probabilities[agent_ids == agent_id],
|
||||
"reward": rewards[agent_ids == agent_id],
|
||||
}
|
||||
for agent_id in set(agent_ids)}
|
||||
|
||||
def _compute_reward(self, decision_event, snapshot_list):
|
||||
start_tick = decision_event.tick + 1
|
||||
end_tick = decision_event.tick + self._time_window
|
||||
ticks = list(range(start_tick, end_tick))
|
||||
|
||||
# Calculate the truncated, time-decayed reward.
|
||||
future_fulfillment = snapshot_list["ports"][ticks::"fulfillment"]
|
||||
future_shortage = snapshot_list["ports"][ticks::"shortage"]
|
||||
decay_list = [self._time_decay_factor ** i for i in range(end_tick - start_tick)
|
||||
for _ in range(future_fulfillment.shape[0]//(end_tick-start_tick))]
|
||||
|
||||
tot_fulfillment = np.dot(future_fulfillment, decay_list)
|
||||
tot_shortage = np.dot(future_shortage, decay_list)
|
||||
|
||||
return np.float(self._fulfillment_factor * tot_fulfillment - self._shortage_factor * tot_shortage)
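# In formula form: for a decision at tick t with time window W and decay factor d, the reward is
#   fulfillment_factor * sum_{i=0}^{W-2} d^i * fulfillment(t+1+i)
#       - shortage_factor * sum_{i=0}^{W-2} d^i * shortage(t+1+i),
# where the per-tick fulfillment/shortage values are summed over all ports (decay_list repeats
# each d^i once per port so that the dot products above realize these sums).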
|
|
@ -1,30 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.rl import StateShaper
|
||||
|
||||
PORT_ATTRIBUTES = ["empty", "full", "on_shipper", "on_consignee", "booking", "shortage", "fulfillment"]
|
||||
VESSEL_ATTRIBUTES = ["empty", "full", "remaining_space"]
|
||||
|
||||
|
||||
class CIMStateShaper(StateShaper):
|
||||
def __init__(self, *, look_back, max_ports_downstream):
|
||||
super().__init__()
|
||||
self._look_back = look_back
|
||||
self._max_ports_downstream = max_ports_downstream
|
||||
self._dim = (look_back + 1) * (max_ports_downstream + 1) * len(PORT_ATTRIBUTES) + len(VESSEL_ATTRIBUTES)
|
||||
|
||||
def __call__(self, decision_event, snapshot_list):
|
||||
tick, port_idx, vessel_idx = decision_event.tick, decision_event.port_idx, decision_event.vessel_idx
|
||||
ticks = [tick - rt for rt in range(self._look_back - 1)]
|
||||
future_port_idx_list = snapshot_list["vessels"][tick: vessel_idx: 'future_stop_list'].astype('int')
|
||||
port_features = snapshot_list["ports"][ticks: [port_idx] + list(future_port_idx_list): PORT_ATTRIBUTES]
|
||||
vessel_features = snapshot_list["vessels"][tick: vessel_idx: VESSEL_ATTRIBUTES]
|
||||
state = np.concatenate((port_features, vessel_features))
|
||||
return str(port_idx), state
|
||||
|
||||
@property
|
||||
def dim(self):
|
||||
return self._dim
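# With the config shipped in config.yml (look_back=7, max_ports_downstream=2), the state dimension is
#   (7 + 1) * (2 + 1) * len(PORT_ATTRIBUTES) + len(VESSEL_ATTRIBUTES) = 8 * 3 * 7 + 3 = 171.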
|
|
@ -1,50 +0,0 @@
|
|||
env:
|
||||
scenario: "cim"
|
||||
topology: "toy.4p_ssdd_l0.0"
|
||||
durations: 1120
|
||||
state_shaping:
|
||||
look_back: 7
|
||||
max_ports_downstream: 2
|
||||
experience_shaping:
|
||||
time_window: 100
|
||||
fulfillment_factor: 1.0
|
||||
shortage_factor: 1.0
|
||||
time_decay_factor: 0.97
|
||||
main_loop:
|
||||
max_episode: 100
|
||||
early_stopping:
|
||||
warmup_ep: 20
|
||||
last_k: 5
|
||||
perf_threshold: 0.95 # minimum performance (fulfillment ratio) required to trigger early stopping
|
||||
perf_stability_threshold: 0.1 # stability is measured by the maximum of abs(perf_(i+1) - perf_i) / perf_i
|
||||
# over the last k episodes (where perf is short for performance). This value must
|
||||
# be below this threshold to trigger early stopping
|
||||
agents:
|
||||
seed: 1024 # for reproducibility
|
||||
type: "actor_critic" # "actor_critic" or "policy_gradient"
|
||||
num_actions: 21
|
||||
actor_model:
|
||||
hidden_dims:
|
||||
- 256
|
||||
- 128
|
||||
- 64
|
||||
softmax_enabled: true
|
||||
batch_norm_enabled: false
|
||||
actor_optimizer:
|
||||
lr: 0.001
|
||||
critic_model:
|
||||
hidden_dims:
|
||||
- 256
|
||||
- 128
|
||||
- 64
|
||||
softmax_enabled: false
|
||||
batch_norm_enabled: true
|
||||
critic_optimizer:
|
||||
lr: 0.001
|
||||
reward_discount: .0
|
||||
actor_critic_hyper_parameters:
|
||||
train_iters: 10
|
||||
actor_loss_coefficient: 0.1
|
||||
k: 1
|
||||
lam: 0.0
|
||||
# clip_ratio: 0.8
|
|
@ -1,46 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.simulator import Env
|
||||
from maro.rl import AgentManagerMode, SimpleActor, ActorWorker
|
||||
from maro.utils import convert_dottable
|
||||
|
||||
from components import CIMActionShaper, CIMStateShaper, POAgentManager, TruncatedExperienceShaper, create_po_agents
|
||||
|
||||
|
||||
def launch(config):
|
||||
config = convert_dottable(config)
|
||||
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
state_shaper = CIMStateShaper(**config.env.state_shaping)
|
||||
action_shaper = CIMActionShaper(action_space=list(np.linspace(-1.0, 1.0, config.agents.num_actions)))
|
||||
experience_shaper = TruncatedExperienceShaper(**config.env.experience_shaping)
|
||||
|
||||
config["agents"]["input_dim"] = state_shaper.dim
|
||||
agent_manager = POAgentManager(
|
||||
name="cim_actor",
|
||||
mode=AgentManagerMode.INFERENCE,
|
||||
agent_dict=create_po_agents(agent_id_list, config.agents),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper,
|
||||
)
|
||||
proxy_params = {
|
||||
"group_name": os.environ["GROUP"],
|
||||
"expected_peers": {"learner": 1},
|
||||
"redis_address": ("localhost", 6379)
|
||||
}
|
||||
actor_worker = ActorWorker(
|
||||
local_actor=SimpleActor(env=env, agent_manager=agent_manager),
|
||||
proxy_params=proxy_params
|
||||
)
|
||||
actor_worker.launch()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from components.config import config
|
||||
launch(config)
|
|
@ -1,46 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
|
||||
from maro.rl import ActorProxy, AgentManagerMode, Scheduler, SimpleLearner, merge_experiences_with_trajectory_boundaries
|
||||
from maro.simulator import Env
|
||||
from maro.utils import Logger, convert_dottable
|
||||
|
||||
from components import CIMStateShaper, POAgentManager, create_po_agents
|
||||
|
||||
|
||||
def launch(config):
|
||||
config = convert_dottable(config)
|
||||
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
config["agents"]["input_dim"] = CIMStateShaper(**config.env.state_shaping).dim
|
||||
agent_manager = POAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.TRAIN,
|
||||
agent_dict=create_po_agents(agent_id_list, config.agents)
|
||||
)
|
||||
|
||||
proxy_params = {
|
||||
"group_name": os.environ["GROUP"],
|
||||
"expected_peers": {"actor": int(os.environ["NUM_ACTORS"])},
|
||||
"redis_address": ("localhost", 6379)
|
||||
}
|
||||
|
||||
learner = SimpleLearner(
|
||||
agent_manager=agent_manager,
|
||||
actor=ActorProxy(
|
||||
proxy_params=proxy_params, experience_collecting_func=merge_experiences_with_trajectory_boundaries
|
||||
),
|
||||
scheduler=Scheduler(config.main_loop.max_episode),
|
||||
logger=Logger("cim_learner", auto_timestamp=False)
|
||||
)
|
||||
learner.learn()
|
||||
learner.test()
|
||||
learner.dump_models(os.path.join(os.getcwd(), "models"))
|
||||
learner.exit()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from components.config import config
|
||||
launch(config)
|
|
@ -1,6 +0,0 @@
|
|||
redis:
|
||||
hostname: "localhost"
|
||||
port: 6379
|
||||
group: test_group
|
||||
num_actors: 1
|
||||
num_learners: 1
|
|
@ -1,26 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
"""
|
||||
This script is used to debug distributed algorithm in single host multi-process mode.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import os
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("group_name", help="group name")
|
||||
parser.add_argument("num_actors", type=int, help="number of actors")
|
||||
args = parser.parse_args()
|
||||
|
||||
learner_path = f"{os.path.split(os.path.realpath(__file__))[0]}/dist_learner.py &"
|
||||
actor_path = f"{os.path.split(os.path.realpath(__file__))[0]}/dist_actor.py &"
|
||||
|
||||
# Launch the learner process
|
||||
os.system(f"GROUP={args.group_name} NUM_ACTORS={args.num_actors} python " + learner_path)
|
||||
|
||||
# Launch the actor processes
|
||||
for _ in range(args.num_actors):
|
||||
os.system(f"GROUP={args.group_name} python " + actor_path)
|
|
@ -1,91 +0,0 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
import os
|
||||
from statistics import mean
|
||||
|
||||
import numpy as np
|
||||
|
||||
from maro.simulator import Env
|
||||
from maro.rl import AgentManagerMode, Scheduler, SimpleActor, SimpleLearner
|
||||
from maro.utils import LogFormat, Logger, convert_dottable
|
||||
|
||||
from components import CIMActionShaper, CIMStateShaper, POAgentManager, TruncatedExperienceShaper, create_po_agents
|
||||
|
||||
|
||||
class EarlyStoppingChecker:
|
||||
"""Callable class that checks the performance history to determine early stopping.
|
||||
|
||||
Args:
|
||||
warmup_ep (int): Episode from which early stopping checking is initiated.
|
||||
last_k (int): Number of latest performance records to check for early stopping.
|
||||
perf_threshold (float): The mean of the ``last_k`` performance metric values must be above this value to
|
||||
trigger early stopping.
|
||||
perf_stability_threshold (float): The maximum one-step change over the ``last_k`` performance metrics must be
|
||||
below this value to trigger early stopping.
|
||||
"""
|
||||
def __init__(self, warmup_ep: int, last_k: int, perf_threshold: float, perf_stability_threshold: float):
|
||||
self._warmup_ep = warmup_ep
|
||||
self._last_k = last_k
|
||||
self._perf_threshold = perf_threshold
|
||||
self._perf_stability_threshold = perf_stability_threshold
|
||||
|
||||
def get_metric(record):
|
||||
return 1 - record["container_shortage"] / record["order_requirements"]
|
||||
self._metric_func = get_metric
|
||||
|
||||
def __call__(self, perf_history) -> bool:
|
||||
if len(perf_history) < max(self._last_k, self._warmup_ep):
|
||||
return False
|
||||
|
||||
metric_series = list(map(self._metric_func, perf_history[-self._last_k:]))
|
||||
max_delta = max(
|
||||
abs(metric_series[i] - metric_series[i - 1]) / metric_series[i - 1] for i in range(1, self._last_k)
|
||||
)
|
||||
print(f"mean_metric: {mean(metric_series)}, max_delta: {max_delta}")
|
||||
return mean(metric_series) > self._perf_threshold and max_delta < self._perf_stability_threshold
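# Worked example (numbers are made up): with warmup_ep=3, last_k=3, perf_threshold=0.9 and
# perf_stability_threshold=0.05, a history whose last three records give fulfillment ratios
# 0.95, 0.96 and 0.95 yields a mean of about 0.953 and a maximum one-step relative change of
# about 0.011, so the checker returns True and the scheduler can stop training early.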
|
||||
|
||||
|
||||
def launch(config):
|
||||
# First determine the input dimension and add it to the config.
|
||||
config = convert_dottable(config)
|
||||
|
||||
# Step 1: initialize a CIM environment for using a toy dataset.
|
||||
env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
|
||||
agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
|
||||
|
||||
# Step 2: create state, action and experience shapers. We also need to create an explorer here due to the
|
||||
# greedy nature of the DQN algorithm.
|
||||
state_shaper = CIMStateShaper(**config.env.state_shaping)
|
||||
action_shaper = CIMActionShaper(action_space=list(np.linspace(-1.0, 1.0, config.agents.num_actions)))
|
||||
experience_shaper = TruncatedExperienceShaper(**config.env.experience_shaping)
|
||||
|
||||
# Step 3: create an agent manager.
|
||||
config["agents"]["input_dim"] = state_shaper.dim
|
||||
agent_manager = POAgentManager(
|
||||
name="cim_learner",
|
||||
mode=AgentManagerMode.TRAIN_INFERENCE,
|
||||
agent_dict=create_po_agents(agent_id_list, config.agents),
|
||||
state_shaper=state_shaper,
|
||||
action_shaper=action_shaper,
|
||||
experience_shaper=experience_shaper,
|
||||
)
|
||||
|
||||
# Step 4: Create an actor and a learner to start the training process.
|
||||
scheduler = Scheduler(
|
||||
config.main_loop.max_episode,
|
||||
early_stopping_checker=EarlyStoppingChecker(**config.main_loop.early_stopping)
|
||||
)
|
||||
actor = SimpleActor(env, agent_manager)
|
||||
learner = SimpleLearner(
|
||||
agent_manager, actor, scheduler,
|
||||
logger=Logger("cim_learner", format_=LogFormat.simple, auto_timestamp=False)
|
||||
)
|
||||
learner.learn()
|
||||
learner.test()
|
||||
learner.dump_models(os.path.join(os.getcwd(), "models"))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from components.config import config
|
||||
launch(config)
|
|
@ -1,50 +1,69 @@
|
|||
# Copyright (c) Microsoft Corporation.
|
||||
# Licensed under the MIT license.
|
||||
|
||||
|
||||
# Enable realtime data streaming with following statements.
|
||||
|
||||
# import os
|
||||
|
||||
# os.environ["MARO_STREAMIT_ENABLED"] = "true"
|
||||
# os.environ["MARO_STREAMIT_EXPERIMENT_NAME"] = "test_317"
|
||||
|
||||
|
||||
from maro.simulator import Env
|
||||
from maro.simulator.scenarios.cim.common import Action
|
||||
from maro.simulator.scenarios.cim.common import Action, ActionType
|
||||
from maro.streamit import streamit
|
||||
|
||||
start_tick = 0
|
||||
durations = 100 # 100 days
|
||||
if __name__ == "__main__":
|
||||
start_tick = 0
|
||||
durations = 100 # 100 days
|
-opts = dict()
-"""
-enable-dump-snapshot parameter means business_engine needs dump snapshot data before reset.
-If you leave value to empty string, it will dump to current folder.
-For getting dump data, please uncomment below line and specify dump destination folder.
-"""
-# opts['enable-dump-snapshot'] = ''
+opts = dict()
+with streamit:
+    """
+    enable-dump-snapshot parameter means business_engine needs dump snapshot data before reset.
+    If you leave value to empty string, it will dump to current folder.
+    For getting dump data, please uncomment below line and specify dump destination folder.
+    """
+    # opts['enable-dump-snapshot'] = ''
 
-# Initialize an environment with a specific scenario, related topology.
-env = Env(scenario="cim", topology="toy.5p_ssddd_l0.0",
-          start_tick=start_tick, durations=durations, options=opts)
+    # Initialize an environment with a specific scenario, related topology.
+    env = Env(scenario="cim", topology="global_trade.22p_l0.1",
+              start_tick=start_tick, durations=durations, options=opts)
 
-# Query environment summary, which includes business instances, intra-instance attributes, etc.
-print(env.summary)
+    # Query environment summary, which includes business instances, intra-instance attributes, etc.
+    print(env.summary)
 
-for ep in range(2):
-    # Gym-like step function
-    metrics, decision_event, is_done = env.step(None)
+    for ep in range(2):
+        # Tell streamit we are in a new episode.
+        streamit.episode(ep)
+
+        # Gym-like step function.
+        metrics, decision_event, is_done = env.step(None)
 
-    while not is_done:
-        past_week_ticks = [x for x in range(
-            decision_event.tick - 7, decision_event.tick)]
-        decision_port_idx = decision_event.port_idx
-        intr_port_infos = ["booking", "empty", "shortage"]
+        while not is_done:
+            past_week_ticks = [x for x in range(
+                max(decision_event.tick - 7, 0), decision_event.tick)]
+            decision_port_idx = decision_event.port_idx
+            intr_port_infos = ["booking", "empty", "shortage"]
 
-        # Query the decision port booking, empty container inventory, shortage information in the past week
-        past_week_info = env.snapshot_list["ports"][past_week_ticks:
-                                                    decision_port_idx:
-                                                    intr_port_infos]
+            # Query the decision port booking, empty container inventory, shortage information in the past week
+            past_week_info = env.snapshot_list["ports"][past_week_ticks:
+                                                        decision_port_idx:
+                                                        intr_port_infos]
 
-        dummy_action = Action(decision_event.vessel_idx,
-                              decision_event.port_idx, 0)
+            dummy_action = Action(
+                decision_event.vessel_idx,
+                decision_event.port_idx,
+                0,
+                ActionType.LOAD
+            )
 
-        # Drive environment with dummy action (no repositioning)
-        metrics, decision_event, is_done = env.step(dummy_action)
+            # Drive environment with dummy action (no repositioning)
+            metrics, decision_event, is_done = env.step(dummy_action)
 
-    # Query environment business metrics at the end of an episode,
-    # it is your optimized object (usually includes multi-target).
-    print(f"ep: {ep}, environment metrics: {env.metrics}")
-    env.reset()
+        # Query environment business metrics at the end of an episode,
+        # it is your optimized object (usually includes multi-target).
+        print(f"ep: {ep}, environment metrics: {env.metrics}")
+
+        env.reset()
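For readers following the CIM hunk above, here is a minimal sketch (not part of the diff) of how the three-axis snapshot slice can be consumed. The variable names are reused from the example; the flat tick-major, attribute-minor layout of the returned array is an assumption.

import numpy as np

# past_week_info is the flat array returned above by
# env.snapshot_list["ports"][past_week_ticks:decision_port_idx:intr_port_infos].
num_ticks = len(past_week_ticks)     # up to 7 ticks, fewer at the very start of an episode
num_attrs = len(intr_port_infos)     # "booking", "empty", "shortage"

# Assumed layout: one row per queried tick, one column per queried attribute.
port_history = np.asarray(past_week_info).reshape(num_ticks, num_attrs)
avg_shortage = port_history[:, intr_port_infos.index("shortage")].mean()
print(f"Average shortage at port {decision_port_idx} over the past week: {avg_shortage:.2f}")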
@@ -18,16 +18,16 @@ def worker(group_name):
                   component_type="worker",
                   expected_peers={"master": 1})
     counter = 0
-    print(f"{proxy.component_name}'s counter is {counter}.")
+    print(f"{proxy.name}'s counter is {counter}.")
 
     # Nonrecurring receive the message from the proxy.
     for msg in proxy.receive(is_continuous=False):
-        print(f"{proxy.component_name} receive message from {msg.source}.")
+        print(f"{proxy.name} receive message from {msg.source}.")
 
         if msg.tag == "INC":
             counter += 1
-            print(f"{proxy.component_name} receive INC request, {proxy.component_name}'s count is {counter}.")
-            proxy.reply(received_message=msg, tag="done")
+            print(f"{proxy.name} receive INC request, {proxy.name}'s count is {counter}.")
+            proxy.reply(message=msg, tag="done")
 
 
 def master(group_name: str, worker_num: int, is_immediate: bool = False):
@@ -55,17 +55,18 @@ def master(group_name: str, worker_num: int, is_immediate: bool = False):
             session_type=SessionType.NOTIFICATION
         )
         # Do some tasks with higher priority here.
-        replied_msgs = proxy.receive_by_id(session_ids)
+        replied_msgs = proxy.receive_by_id(session_ids, timeout=-1)
     else:
         replied_msgs = proxy.broadcast(
             component_type="worker",
             tag="INC",
-            session_type=SessionType.NOTIFICATION
+            session_type=SessionType.NOTIFICATION,
+            timeout=-1
         )
 
     for msg in replied_msgs:
         print(
-            f"{proxy.component_name} get receive notification from {msg.source} with "
+            f"{proxy.name} get receive notification from {msg.source} with "
             f"message session stage {msg.session_stage}."
         )
 
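Taken together, the two hunks above amount to the master-side flow sketched below. It is a minimal, self-contained sketch assembled from the calls shown in the diff; the Proxy constructor arguments (mirroring the worker side) and the import path are assumptions rather than part of this change.

from maro.communication import Proxy, SessionType


def run_master(group_name: str, worker_num: int):
    # Constructor mirrors the worker side shown above (assumption).
    proxy = Proxy(group_name=group_name,
                  component_type="master",
                  expected_peers={"worker": worker_num})

    # Broadcast an INC notification to every worker and block (timeout=-1) for the replies.
    replied_msgs = proxy.broadcast(
        component_type="worker",
        tag="INC",
        session_type=SessionType.NOTIFICATION,
        timeout=-1
    )

    for msg in replied_msgs:
        print(f"{proxy.name} got a reply from {msg.source} at session stage {msg.session_stage}.")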
@@ -22,11 +22,11 @@ def summation_worker(group_name):
 
     # Nonrecurring receive the message from the proxy.
     for msg in proxy.receive(is_continuous=False):
-        print(f"{proxy.component_name} receive message from {msg.source}. the payload is {msg.payload}.")
+        print(f"{proxy.name} receive message from {msg.source}. the payload is {msg.payload}.")
 
         if msg.tag == "job":
             replied_payload = sum(msg.payload)
-            proxy.reply(received_message=msg, tag="sum", payload=replied_payload)
+            proxy.reply(message=msg, tag="sum", payload=replied_payload)
 
 
 def multiplication_worker(group_name):
@@ -42,11 +42,11 @@ def multiplication_worker(group_name):
 
     # Nonrecurring receive the message from the proxy.
     for msg in proxy.receive(is_continuous=False):
-        print(f"{proxy.component_name} receive message from {msg.source}. the payload is {msg.payload}.")
+        print(f"{proxy.name} receive message from {msg.source}. the payload is {msg.payload}.")
 
         if msg.tag == "job":
             replied_payload = np.prod(msg.payload)
-            proxy.reply(received_message=msg, tag="multiply", payload=replied_payload)
+            proxy.reply(message=msg, tag="multiply", payload=replied_payload)
 
 
 def master(group_name: str, sum_worker_number: int, multiply_worker_number: int, is_immediate: bool = False):
@@ -88,19 +88,20 @@ def master(group_name: str, sum_worker_number: int, multiply_worker_number: int, is_immediate: bool = False):
                                      session_type=SessionType.TASK,
                                      destination_payload_list=destination_payload_list)
         # Do some tasks with higher priority here.
-        replied_msgs = proxy.receive_by_id(session_ids)
+        replied_msgs = proxy.receive_by_id(session_ids, timeout=-1)
     else:
         replied_msgs = proxy.scatter(tag="job",
                                      session_type=SessionType.TASK,
-                                     destination_payload_list=destination_payload_list)
+                                     destination_payload_list=destination_payload_list,
+                                     timeout=-1)
 
     sum_result, multiply_result = 0, 1
     for msg in replied_msgs:
         if msg.tag == "sum":
-            print(f"{proxy.component_name} receive message from {msg.source} with the sum result {msg.payload}.")
+            print(f"{proxy.name} receive message from {msg.source} with the sum result {msg.payload}.")
             sum_result += msg.payload
         elif msg.tag == "multiply":
-            print(f"{proxy.component_name} receive message from {msg.source} with the multiply result {msg.payload}.")
+            print(f"{proxy.name} receive message from {msg.source} with the multiply result {msg.payload}.")
             multiply_result *= msg.payload
 
     # Check task result correction.
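The scatter hunks above assume a destination_payload_list has already been built. A sketch of one way to assemble it, assuming each entry is a (destination, payload) pair and reusing the proxy.peers_name lookup that appears in the send/receive example below; the even split of the payload is illustrative only.

# Illustrative payload: integers to be summed / multiplied by the workers.
random_integer_list = list(range(1, 101))

destination_payload_list = []
peers = proxy.peers_name["worker"]
chunk_size = len(random_integer_list) // len(peers)

for i, peer in enumerate(peers):
    # Each worker gets one even slice of the payload (an illustrative split, not from the diff).
    destination_payload_list.append(
        (peer, random_integer_list[i * chunk_size:(i + 1) * chunk_size])
    )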
@@ -22,11 +22,11 @@ def worker(group_name):
 
     # Nonrecurring receive the message from the proxy.
    for msg in proxy.receive(is_continuous=False):
-        print(f"{proxy.component_name} receive message from {msg.source}. the payload is {msg.payload}.")
+        print(f"{proxy.name} receive message from {msg.source}. the payload is {msg.payload}.")
 
         if msg.tag == "sum":
             replied_payload = sum(msg.payload)
-            proxy.reply(received_message=msg, tag="sum", payload=replied_payload)
+            proxy.reply(message=msg, tag="sum", payload=replied_payload)
 
 
 def master(group_name: str, is_immediate: bool = False):
@@ -49,19 +49,19 @@ def master(group_name: str, is_immediate: bool = False):
 
     for peer in proxy.peers_name["worker"]:
         message = SessionMessage(tag="sum",
-                                 source=proxy.component_name,
+                                 source=proxy.name,
                                  destination=peer,
                                  payload=random_integer_list,
                                  session_type=SessionType.TASK)
         if is_immediate:
             session_id = proxy.isend(message)
             # Do some tasks with higher priority here.
-            replied_msgs = proxy.receive_by_id(session_id)
+            replied_msgs = proxy.receive_by_id(session_id, timeout=-1)
         else:
-            replied_msgs = proxy.send(message)
+            replied_msgs = proxy.send(message, timeout=-1)
 
         for msg in replied_msgs:
-            print(f"{proxy.component_name} receive {msg.source}, replied payload is {msg.payload}.")
+            print(f"{proxy.name} receive {msg.source}, replied payload is {msg.payload}.")
 
 
 if __name__ == "__main__":
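A hedged sketch of how this send/receive example might be launched locally. The multiprocessing wiring, the group name, and the worker count are illustrative only (they are not part of the diff) and must match whatever expected_peers the master's Proxy declares.

import multiprocessing as mp

if __name__ == "__main__":
    group = "send_receive_example"   # hypothetical group name

    # Start the worker processes first so the master can discover its peers.
    workers = [mp.Process(target=worker, args=(group,)) for _ in range(2)]
    for p in workers:
        p.start()

    # Signature per the hunk above: master(group_name, is_immediate=False).
    master(group)

    for p in workers:
        p.join()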
@@ -0,0 +1,46 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+from maro.simulator.scenarios.cim.common import Action, DecisionEvent
+from maro.vector_env import VectorEnv
+
+
+if __name__ == "__main__":
+    with VectorEnv(batch_num=4, scenario="cim", topology="toy.5p_ssddd_l0.0", durations=100) as env:
+        for ep in range(2):
+            print("current episode:", ep)
+
+            metrics, decision_event, is_done = (None, None, False)
+
+            while not is_done:
+                action = None
+
+                # Usage:
+                # 1. Only push speicified (1st for this example) environment, leave others behind
+                # if decision_event:
+                #     env0_dec: DecisionEvent = decision_event[0]
+
+                #     # 1.1 After 1st environment is done, then others will push forward.
+                #     if env0_dec:
+                #         ss0 = env.snapshot_list["vessels"][env0_dec.tick:env0_dec.vessel_idx:"remaining_space"]
+                #         action = {0: Action(env0_dec.vessel_idx, env0_dec.port_idx, -env0_dec.action_scope.load)}
+
+                # 2. Only pass action to 1st environment (give None to other environments),
+                # but keep pushing all the environment, until the end
+                if decision_event:
+                    env0_dec: DecisionEvent = decision_event[0]
+
+                    if env0_dec:
+                        ss0 = env.snapshot_list["vessels"][env0_dec.tick:env0_dec.vessel_idx:"remaining_space"]
+
+                        action = [None] * env.batch_number
+
+                        # with a list of action, will push all environment to next step
+                        action[0] = Action(env0_dec.vessel_idx, env0_dec.port_idx, -env0_dec.action_scope.load)
+
+                metrics, decision_event, is_done = env.step(action)
+
+            print("Final tick for each environment:", env.tick)
+            print("Final frame index for each environment:", env.frame_index)
+
+            env.reset()
@@ -0,0 +1,12 @@
+# Simulation Results
+
+The table below shows the simulation results of the current topologies based on the `Best Fit` algorithm.
+
+In the oversubscription topologies, the oversubscription rate is `115%`.
+
+| Topology | PM Setting | Time Spent (s) | Total VM Requests | Successful Allocations | Energy Consumption | Total Oversubscriptions | Total Overload PMs |
+|:----:|-----|:--------:|:---:|:-------:|:----:|:---:|:---:|
+| 10k | 100 PMs, 32 Cores, 128 GB | 104.98 | 10,000 | 10,000 | 2,399,610 | 0 | 0 |
+| 10k.oversubscription | 100 PMs, 32 Cores, 128 GB | 101.00 | 10,000 | 10,000 | 2,386,371 | 279,331 | 0 |
+| 336k | 880 PMs, 16 Cores, 112 GB | 7,896.37 | 335,985 | 109,249 | 26,425,878 | 0 | 0 |
+| 336k.oversubscription | 880 PMs, 16 Cores, 112 GB | 7,903.33 | 335,985 | 115,008 | 27,440,946 | 3,868,475 | 0 |
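For reference, the `Best Fit` rule behind these numbers picks, among the valid PMs, the one with the fewest remaining CPU cores. A one-line equivalent of the selection loop in the (removed) launcher further below, reusing its valid_pm_info array:

import numpy as np

# valid_pm_info[:, 0] = cpu_cores_capacity, valid_pm_info[:, 1] = cpu_cores_allocated
chosen_idx = int(np.argmin(valid_pm_info[:, 0] - valid_pm_info[:, 1]))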
@@ -1,7 +0,0 @@
-env:
-  scenario: vm_scheduling
-  topology: azure.2019.10k
-  start_tick: 0
-  durations: 8638
-  resolution: 1
-  seed: 88
@@ -1,74 +0,0 @@
-import io
-import os
-import random
-import timeit
-
-import yaml
-
-from maro.simulator import Env
-from maro.simulator.scenarios.vm_scheduling import AllocateAction, DecisionPayload, PostponeAction
-from maro.utils import convert_dottable
-
-CONFIG_PATH = os.path.join(os.path.split(os.path.realpath(__file__))[0], "config.yml")
-with io.open(CONFIG_PATH, "r") as in_file:
-    raw_config = yaml.safe_load(in_file)
-    config = convert_dottable(raw_config)
-
-
-if __name__ == "__main__":
-    start_time = timeit.default_timer()
-
-    env = Env(
-        scenario=config.env.scenario,
-        topology=config.env.topology,
-        start_tick=config.env.start_tick,
-        durations=config.env.durations,
-        snapshot_resolution=config.env.resolution
-    )
-
-    if config.env.seed is not None:
-        env.set_seed(config.env.seed)
-        random.seed(config.env.seed)
-
-    metrics: object = None
-    decision_event: DecisionPayload = None
-    is_done: bool = False
-    action: AllocateAction = None
-    metrics, decision_event, is_done = env.step(None)
-
-    while not is_done:
-        valid_pm_num: int = len(decision_event.valid_pms)
-        if valid_pm_num <= 0:
-            # No valid PM now, postpone.
-            action: PostponeAction = PostponeAction(
-                vm_id=decision_event.vm_id,
-                postpone_step=1
-            )
-        else:
-            # Get the capacity and allocated cores from snapshot.
-            valid_pm_info = env.snapshot_list["pms"][
-                env.frame_index:decision_event.valid_pms:["cpu_cores_capacity", "cpu_cores_allocated"]
-            ].reshape(-1, 2)
-            # Calculate to get the remaining cpu cores.
-            cpu_cores_remaining = valid_pm_info[:, 0] - valid_pm_info[:, 1]
-            # Choose the one with the closet remaining CPU.
-            chosen_idx = 0
-            minimum_remaining_cpu_cores = cpu_cores_remaining[0]
-            for i, remaining in enumerate(cpu_cores_remaining):
-                if remaining < minimum_remaining_cpu_cores:
-                    chosen_idx = i
-                    minimum_remaining_cpu_cores = remaining
-            # Take action to allocate on the closet pm.
-            action: AllocateAction = AllocateAction(
-                vm_id=decision_event.vm_id,
-                pm_id=decision_event.valid_pms[chosen_idx]
-            )
-        metrics, decision_event, is_done = env.step(action)
-
-    end_time = timeit.default_timer()
-    print(
-        f"[Best fit] Topology: {config.env.topology}. Total ticks: {config.env.durations}."
-        f" Start tick: {config.env.start_tick}."
-    )
-    print(f"[Timer] {end_time - start_time:.2f} seconds to finish the simulation.")
-    print(metrics)
@@ -1,7 +0,0 @@
-env:
-  scenario: vm_scheduling
-  topology: azure.2019.10k
-  start_tick: 0
-  durations: 8638
-  resolution: 1
-  seed: 666