maro/citi_bike at d94899777e45505bd1cebfcb68f9bb345990dda7 - maro

Jinyu-W fa092f35b1 V0.2 update (#262) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update.
* resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a 
bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. 
added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. 
generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. 
removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. 
* fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. 
* Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> 
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) 
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. 
added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. 
* Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. 
used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. 
* Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration 
abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. 
generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. 
removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted 
exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * 
fixed formatting issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM example * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. 
renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modify vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix 
naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requirement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. 
changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim example docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? 
* delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? 
* delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc 
update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner to cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine readme format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * syntax fix (#253) * syntax fix * syntax fix * syntax fix * rm unwanted import Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm oversubscription (#246) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR 
comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sku and replaced it by pm type * Add and refactor vm category * Comment out config * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 vm scheduling decision event (#257) * Fix data preparation bug * Add frame index * V0.2 PG, K-step and lambda return utils (#155) * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * merged with v0.2_embedded_optims * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * revised * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. 
refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * revised code based on revised abstractions * fixed some bugs * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added shared_module property to LearningModel * added shared_module property to LearningModel * fixed a bug with k-step return in AC * fixed a bug * fixed a bug * merged pg, ac and ppo examples * fixed a bug * fixed a bug * fixed naming for ppo * renamed some variables in PPO * added ActionWithLogProbability return type for PO-type algorithms * fixed a bug * fixed a bug * fixed lint issues * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved 
logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * updated cim PO example code according to changes in maro/rl * removed early stopping from CIM dqn example * combined ac and ppo and simplified example code and config * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * put PG and AC under PolicyOptimization class and refined examples accordingly * fixed lint issues * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * updated cim example for policy optimization * typo fix * typo fix * typo fix * typo fix * misc edits * minor edits to rl_toolkit.rst * checked out docs from master * fixed typo in k-step shaper * fixed lint issues * bug fix in store * lint issue fix * changed default max_ep to 100 for policy_optimization algos * vis doc update to master (#244) * refine readme * feat: refine 
data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. 
* resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modified * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. 
* V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formatting issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. 
added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM example * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. 
added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? 
* delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. 
removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. 
* capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. 
* Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> 
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) 
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. 
added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. 
* Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. 
used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. 
* Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. 
* Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * update param name * V0.2 explorer (#198) * 
overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. 
generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. 
removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted 
exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * 
fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. 
renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix 
naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * doc update * new link * image update * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * image change * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. 
changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? 
2021-01-25 18:23:41 +08:00
..
test_bike_scenario.py	V0.2 update (#262)	2021-01-25 18:23:41 +08:00

* refine readme

* feat: refine data push/pull (#138)

* feat: refine data push/pull

* test: add cli provision testing

* fix: style fix

* fix: add necessary comments

* fix: from code review

* add fall back function in weather download (#112)

* fix deployment issue in multi envs

* fix typo

* fix issue where ~/.maro does not exist during build

* skip deploy when build

* update for comments

* temporarily disable weather info

* replace ecr with cim in setup.py

* replace ecr in manifest

* remove weather check when read data

* fix station id issue

* fix format

* add TODO in comments

* add noaa weather source

* fix weather reset and weather comment

* add comment for weather data url

* some format update

* add fall back function in weather download

* update comment

* update for comments

* update comment

* add period

* fix for pylint

* update for pylint check

* added example docs (#136)

* added example docs

* added citibike greedy example doc

* modified citibike doc

* fixed PR comments

* fixed more PR comments

* fixed small formatting issue

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* switch the key and value of handler_dict in decorator (#144)

* switch the key and value of handler_dict in decorator

* add dist decorator UT and fixed multithreading conflict in maro test suite

* pr comments update.

* resolved comments about decorator UT

* rename handler_fun in dist decorator

* change self.attr into class_name.attr

* update UT tests comments

* V0.1 annotation (#147)

* refine the annotation of simulator core

* remove reward from env(be)

* format refined

* white spaces test

* left-padding spaces refined

* format modified

* update the left-padding spaces of docstrings

* code format updated

* update according to comments

* update according to PR comments

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Event payload details for env.summary (#156)

* key_list of events added for env.summary

* code refined according to lint

* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments

* code format refined

* try trigger the git tests

* update github workflow

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* V0.2 online lp for citi bike (#159)

* key_list of events added for env.summary

* code refined according to lint

* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments

* code format refined

* try trigger the git tests

* update github workflow

* online LP example added for citi bike

* infeasible solution

* infeasible solution fixed: call snapshot before any env.step()

* experiment results of toy topos added

* experiment results of toy topos added

* experiment result update: better than naive baseline

* PuLP version added

* greedy experiment results update

* citibike result update

* modified according to PR comments

* update experiment results and forecasting comparison

* citi bike lp README updated

* README updated

* modified according to PR comments

* update according to PR comments

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1. fixed some PR comments; 2. added early_stopping_checker; 3. revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* update according to flake8

* V0.2 Logical operator overloading for EarlyStoppingChecker (#178)

* 1. added logical operator overloading for early stopping checker; 2. added mean value checker

* fixed PR comments

* removed learner.exit() in single_process_launcher

* added another early stopping checker in example

* fixed PR comments and lint issues

* lint issue fix

* fixed lint issues

* fixed a bug

* fixed a bug

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 skip connection (#176)

* replaced IdentityLayers with nn.Identity

* 1. added skip connection option in FC_net; 2. generalized learning model

* added skip_connection option in config

* removed type casting in fc_net

* fixed lint formatting issues

* refined docstring

* added multi-head functionality to LearningModel

* refined learning model docstring

* added head_key param in learningModel forward

* fixed PR comments

* added top layer logic and is_top option in fc_net

* fixed a bug

* fixed a bug

* reverted some changes in learning model

* reverted some changes in learning model

* added members to learning model to fix the mode issue

* fixed a bug

* fixed mode setting issue in learning model

* removed learner.exit() in single_process_launcher

* fixed PR comments

* fixed rl/__init__

* fixed issues in example

* fixed a bug

* fixed a bug

* fixed lint formatting issues

* moved reward type casting to exp shaper

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* fixed a bug in learner's test() (#193)

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 double dqn (#188)

* added dueling action value model

* renamed params in dueling_action_value_model

* renamed shared_features to features

* replaced IdentityLayers with nn.Identity

* 1. added skip connection option in FC_net; 2. generalized learning model

* added skip_connection option in config

* removed type casting in fc_net

* fixed lint formatting issues

* refined docstring

* moved dueling_action_value_model and fixed some bugs

* added multi-head functionality to LearningModel

* refined learning model docstring

* added head_key param in learningModel forward

* added double DQN and dueling features to DQN

* fixed a bug

* added DuelingQModelHead enum

* fixed a bug

* removed unwanted file

* fixed PR comments

* added top layer logic and is_top option in fc_net

* fixed a bug

* fixed a bug

* reverted some changes in learning model

* reverted some changes in learning model

* added members to learning model to fix the mode issue

* fixed a bug

* fixed mode setting issue in learning model

* fixed PR comments

* revised cim example according to DQN changes

* renamed eval_model to q_value_model in cim example

* more fixes

* fixed a bug

* fixed a bug

* added doc per PR comments

* removed learner.exit() in single_process_launcher

* removed learner.exit() in single_process_launcher

* fixed PR comments

* fixed rl/__init__

* fixed issues in example

* fixed a bug

* fixed a bug

* fixed lint formatting issues

* double DQN feature

* fixed a bug

* fixed a bug

* fixed PR comments

* fixed lint issue

* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm

* added load_models in simple_learner

* minor docstring edits

* minor docstring edits

* set is_double to true in DQN config

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>

* V0.2 feature predefined image (#183)

* feat: support predefined image provision

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* fix: error scripts invocation after using relative import

* fix: missing init.py

* fixed a bug in learner's test()

* feat: add distributed_config for dqn example

* test: update test for grass

* test: update test for k8s

* feat: add promptings for steps

* fix: change relative imports to absolute imports

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>

* V0.2 feature proxy rejoin (#158)

* update dist decorator

* replace proxy.get_peers by proxy.peers

* update proxy rejoin (draft, not runnable for proxy rejoin)

* fix bugs in proxy

* add message cache, and redesign rejoin parameter

* feat: add checkpoint with test

* update proxy.rejoin

* fixed rejoin bug, rename func

* add test example(temp)

* feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents.

* capitalize env variable names

* rm json.dumps; change retries to 10; temp add warning level for rejoin

* fix: unable to load FaultToleranceAgent, missing params

* fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent

* feat: add node_id to node_details

* fix: add a new dependency for tests

* style: meet linting requirements

* style: remaining linting problems

* lint fixed; rm temp test folder.

* fixed lint f-string without placeholder

* fix: add a flag for "remove_container", refine restart logic and Redis keys naming

* proxy rejoin update.

* variable rename.

* fixed lint issues

* fixed lint issues

* add exit code for different error

* feat: add special errors handler

* add max rejoin times

* remove unused import

* add rejoin UT; resolve rejoin comments

* lint fixed

* fixed UT import problem

* rm MessageCache in proxy

* fix: refine key naming

* update proxy rejoin; add topic for broadcast

* feat: support predefined image provision

* update UT for communication

* add docstring for rejoin

* fixed isort and zmq driver import

* fixed isort and UT test

* fix isort issue

* proxy rejoin update (comments v2)

* fixed isort error

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* feat: add exists method for checkpoint

* fix: error scripts invocation after using relative import

* fix: missing init.py

* fixed a bug in learner's test()

* add driver close and socket SUB disconnect for rejoin

* feat: add distributed_config for dqn example

* test: update test for grass

* test: update test for k8s

* feat: add promptings for steps

* fix: change relative imports to absolute imports

* fixed comments and update logger level

* moved driver into proxy.__init__ as a temporary fix for the issue

* Update docstring and comments

* style: fix code reviews problems

* fix code format

Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 feature cli windows (#203)

* fix: change local mkdir to os.makedirs

* fix: add utf8 encoding for logger

* fix: add powershell.exe prefix to subprocess functions

* feat: add debug_green

* fix: use fsutil to create fix-size files in Windows

* fix: use universal_newlines=True to handle encoding problem in different operating systems

* fix: use temp file to do copy when the operating system is not Linux

* fix: linting error

* fix: use fsutil in test_k8s.py

* feat: dynamic init ABS_PATH in GlobalParams

* fix: use -Command to execute Powershell command

* fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode

* fix: problems in code review

* EventBuffer refine (#197)

* merge uniform event changes back

* 1st step: move executing events into stack for better removing performance

* flush event pool

* typo

* add option for env to enable event pool

* refine stack functions

* fix comment issues, add typings

* lint fixing

* lint fix

* add missing fix

* linting

* lint

* use linked list instead of the original event list and execute stack

* add missing file

* linting, and fixes

* add missing file

* linting fix

* fixing comments

* add missing file

* rename event_list to event_linked_list

* correct import path

* change enable_event_pool to disable_finished_events

* add missing file

* V0.2 merge master (#214)

* fix the visualization of docs/key_components/distributed_toolkit

* add examples into isort ignore

* refine import path for examples (#195)

* refine import path for examples

* refine indents

* fixed formatting issues

* update code style

* add editorconfig-checker, add editorconfig path into lint, change super-linter version

* change path for code saving in cim.gnn

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>

* fix occasional conflict between distutils and setuptools (#208)

* fix issue that cython and setuptools conflict

* follow the accepted temp workaround

* update comment, it should be conflict between setuptools and distutils

* fixed bugs related to proxy interface changes

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>

* typo fix

* Bug fix: event buffer issue that prevented Actions from being passed into the business engine (#215)

* bug fix

* clear the reference after extracting sub events, update ut to cover this issue

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* fix flake8 style problem

* V0.2 feature refine mode namings (#212)

* feat: refine cli exception

* feat: refine mode namings

* EventBuffer refine (#197)

* merge uniform event changes back

* 1st step: move executing events into stack for better removing performance

* flush event pool

* typo

* add option for env to enable event pool

* refine stack functions

* fix comment issues, add typings

* lint fixing

* lint fix

* add missing fix

* linting

* lint

* use linked list instead of the original event list and execute stack

* add missing file

* linting, and fixes

* add missing file

* linting fix

* fixing comments

* add missing file

* rename event_list to event_linked_list

* correct import path

* change enable_event_pool to disable_finished_events

* add missing file

* fixed bugs in dist rl

* feat: rename files

* tests: set longer gracefully wait time

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* fix: rm redundant variables

* fix: refine error message

Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 vis new (#210)

Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>

* V0.2 local host process (#221)

* Update local process (not ready)

* update cli process mode

* add setup/clear/template for maro process

* fix process stop

* add logger and rename parameters

* add logger for setup/clear

* fixed closing a nonexistent pid when given a pid list.

* Fixed comments and rename setup/clear with create/delete

* update ProcessInternalError

* V0.2 grass on premises (#220)

* feat: refine cli exception
* commit on v0.2_grass_on_premises

Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 vm scheduling scenario (#189)

* Initialize

* Data center scenario init

* Code style modification

* V0.2 event buffer subevents expand (#180)

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1. fixed some PR comments; 2. added early_stopping_checker; 3. revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* unfold sub-events, insert after parent

* remove event category, use different class instead, add helper functions to gen decision and action event

* add a method to support adding an immediate event to a cascade event with tick validation

* fix ut issue

* add action as 1st sub event to ensure the executing order

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Data center scenario update

* Code style update

* Data scenario business engine update

* Isort update

* Fix lint code check

* Fix based on PR comments.

* Update based on PR comments.

* Add decision payload

* Add config file

* Update utilization series logic

* Update based on PR comment

* Update based on PR

* Update

* Update

* Add the ValidPm class

* Update docs string and naming

* Add energy consumption

* Lint code fixed

* Refining postpone function

* Lint style update

* Init data pipeline

* Update based on PR comment

* Add data pipeline download

* Lint style update

* Code style fix

* Temp update

* Data pipeline update

* Add aria2p download function

* Update based on PR comment

* Update based on PR comment

* Update based on PR comment

* Update naming of variables

* Rename topology

* Renaming

* Fix valid pm list

* Pylint fix

* Update comment

* Update docstring and comment

* Fix init import

* Update tick issue

* fix merge problem

* update style

* V0.2 datacenter data pipeline (#199)

* Data pipeline update

* Data pipeline update

* Lint update

* Update pipeline

* Add vmid mapping

* Update lint style

* Add VM data analytics

* Update notebook

* Add binary converter

* Modify vmtable yaml

* Update binary meta file

* Add cpu reader

* random example added for data center

* Fix bugs

* Fix pylint

* Add launcher

* Fix pylint

* best fit policy added

* Add reset

* Add config

* Add config

* Modify action object

* Modify config

* Fix naming

* Modify config

* Add snapshot list

* Modify a spelling typo

* Update based on PR comments.

* Rename scenario to vm scheduling

* Rename scenario

* Update print messages

* Lint fix

* Lint fix

* Rename scenario

* Modify the calculation of cpu utilization

* Add comment

* Modify data pipeline path

* Fix typo

* Modify naming

* Add unittest

* Add comment

* Unify naming

* Fix data path typo

* Update comments

* Update snapshot features

* Add take snapshot

* Add summary keys

* Update cpu reader

* Update naming

* Add unit test

* Rename snapshot node

* Add processed data pipeline

* Modify config

* Add comment

* Lint style fix

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Add package used in vm_scheduling

* add aria2p to test requirement

* best fit example: update the usage of snapshot

* Add aria2p to test requirement

* Remove finish event

* Fix unittest

* Add test dataset

* Update based on PR comment

* Refine cpu reader and unittest

* Lint update

* Refine based on PR comment

* Add agent index

* Add node mapping

* Refine based on PR comments

* Renaming postpone_step

* Renaming and refine based on PR comments

* Rename config

* Update

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* Resolve none action problem (#224)

* V0.2 vm_scheduling notebook (#223)

* Initialize

* Data center scenario init

* Code style modification

* V0.2 event buffer subevents expand (#180)

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* unfold sub-events, insert after parent

* remove event category, use different class instead, add helper functions to gen decision and action event

* add a method to support add immediate event to cascade event with tick validation

* fix ut issue

* add action as 1st sub event to ensure the executing order

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Data center scenario update

* Code style update

* Data scenario business engine update

* Isort update

* Fix lint code check

* Fix based on PR comments.

* Update based on PR comments.

* Add decision payload

* Add config file

* Update utilization series logic

* Update based on PR comment

* Update based on PR

* Update

* Update

* Add the ValidPm class

* Update docs string and naming

* Add energy consumption

* Lint code fixed

* Refining postpone function

* Lint style update

* Init data pipeline

* Update based on PR comment

* Add data pipeline download

* Lint style update

* Code style fix

* Temp update

* Data pipeline update

* Add aria2p download function

* Update based on PR comment

* Update based on PR comment

* Update based on PR comment

* Update naming of variables

* Rename topology

* Renaming

* Fix valid pm list

* Pylint fix

* Update comment

* Update docstring and comment

* Fix init import

* Update tick issue

* fix merge problem

* update style

* V0.2 datacenter data pipeline (#199)

* Data pipeline update

* Data pipeline update

* Lint update

* Update pipeline

* Add vmid mapping

* Update lint style

* Add VM data analytics

* Update notebook

* Add binary converter

* Modify vmtable yaml

* Update binary meta file

* Add cpu reader

* random example added for data center

* Fix bugs

* Fix pylint

* Add launcher

* Fix pylint

* best fit policy added

* Add reset

* Add config

* Add config

* Modify action object

* Modify config

* Fix naming

* Modify config

* Add snapshot list

* Modify a spelling typo

* Update based on PR comments.

* Rename scenario to vm scheduling

* Rename scenario

* Update print messages

* Lint fix

* Lint fix

* Rename scenario

* Modify the calculation of cpu utilization

* Add comment

* Modify data pipeline path

* Fix typo

* Modify naming

* Add unittest

* Add comment

* Unify naming

* Fix data path typo

* Update comments

* Update snapshot features

* Add take snapshot

* Add summary keys

* Update cpu reader

* Update naming

* Add unit test

* Rename snapshot node

* Add processed data pipeline

* Modify config

* Add comment

* Lint style fix

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Add package used in vm_scheduling

* add aria2p to test requirement

* best fit example: update the usage of snapshot

* Add aria2p to test requirement

* Remove finish event

* Fix unittest

* Add test dataset

* Update based on PR comment

* Refine cpu reader and unittest

* Lint update

* Refine based on PR comment

* Add agent index

* Add node mapping

* Init vm scheduling notebook

* Add notebook

* Refine based on PR comments

* Renaming postpone_step

* Renaming and refine based on PR comments

* Rename config

* Update based on the v0.2_datacenter

* Update notebook

* Update

* update filepath

* notebook updated

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* Update process mode docs and fixed on premises (#226)

* V0.2 Add github workflow integration (#222)

* test: add github workflow integration

* fix: split procedures && bug fixed

* test: add training only restriction

* fix: add 'approved' restriction

* fix: change default ssh port to 22

* style: in one line

* feat: add timeout for Subprocess.run

* test: change default node_size to Standard_D2s_v3

* style: refine style

* fix: add ssh_port param to on-premises mode

* fix: add missing init.py

* V0.2 explorer (#198)

* overhauled exploration abstraction

* fixed a bug

* fixed a bug

* fixed a bug

* added exploration related methods to abs_agent

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* separated learning with exploration schedule and without

* small fixes

* moved explorer logic to actor side

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* removed unwanted param from simple agent manager

* added noise explorer

* fixed formatting

* removed unnecessary comma

* fixed PR comments

* removed unwanted exception and imports

* fixed a bug

* fixed PR comments

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issue

* fixed a bug

* fixed lint issue

* fixed naming

* combined exploration param generation and early stopping in scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* moved logger inside scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* removed epsilon parameter from choose_action

* fixed some PR comments

* fixed some PR comments

* bug fix

* bug fix

* bug fix

* removed explorer abstraction from agent

* refined dqn example

* fixed lint issues

* simplified scheduler

* removed early stopping from CIM dqn example

* removed early stopping from cim example config

* renamed early_stopping_callback to early_stopping_checker

* removed action_dim from noise explorer classes and added some shape checks

* modified NoiseExplorer's __call__ logic to batch processing

* made NoiseExplorer's __call__ return type np array

* renamed update to set_parameters in explorer

* fixed old naming in test_grass

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 embedded optim (#191)

* added dueling action value model

* renamed params in dueling_action_value_model

* renamed shared_features to features

* replaced IdentityLayers with nn.Identity

* 1. added skip connection option in FC_net; 2. generalized learning model

* added skip_connection option in config

* removed type casting in fc_net

* fixed lint formatting issues

* refined docstring

* mv dueling_actiovalue_model and fixed some bugs

* added multi-head functionality to LearningModel

* refined learning model docstring

* added head_key param in learningModel forward

* added double DQN and dueling features to DQN

* fixed a bug

* added DuelingQModelHead enum

* fixed a bug

* removed unwanted file

* fixed PR comments

* added top layer logic and is_top option in fc_net

* fixed a bug

* fixed a bug

* reverted some changes in learning model

* reverted some changes in learning model

* added members to learning model to fix the mode issue

* fixed a bug

* fixed mode setting issue in learning model

* fixed PR comments

* revised cim example according to DQN changes

* renamed eval_model to q_value_model in cim example

* more fixes

* fixed a bug

* fixed a bug

* added doc per PR comments

* removed learner.exit() in single_process_launcher

* removed learner.exit() in single_process_launcher

* fixed PR comments

* fixed rl/__init__

* fixed issues in example

* fixed a bug

* fixed a bug

* fixed lint formatting issues

* double DQN feature

* fixed a bug

* fixed a bug

* fixed PR comments

* fixed lint issue

* embedded optimizer into SingleHeadLearningModel

* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm

* added load_models in simple_learner

* minor docstring edits

* minor docstring edits

* minor docstring edits

* mv optimizer options inside LearningModel

* modified example accordingly

* fixed a bug

* fixed a bug

* fixed a bug

* added dueling DQN feature

* revised and refined docstrings

* fixed a bug

* fixed lint issues

* added load/dump functions to LearningModel

* fixed a bug

* fixed a bug

* fixed lint issues

* refined DQN docstrings

* removed load/dump functions from DQN

* added task validator

* fixed decorator use

* fixed a typo

* fixed a bug

* fixed lint issues

* changed LearningModel's step() to take a single loss

* revised learning model design

* revised example

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* added decorator utils to algorithm

* fixed a bug

* renamed core_model to model

* fixed a bug

* 1. fixed lint formatting issues; 2. refined learning model docstrings

* rm trailing whitespaces

* added decorator for choose_action

* fixed a bug

* fixed a bug

* fixed version-related issues

* renamed add_zeroth_dim decorator to expand_dim

* overhauled exploration abstraction

* fixed a bug

* fixed a bug

* fixed a bug

* added exploration related methods to abs_agent

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* separated learning with exploration schedule and without

* small fixes

* moved explorer logic to actor side

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* removed unwanted param from simple agent manager

* small fixes

* added shared_module property to LearningModel

* added shared_module property to LearningModel

* revised __getstate__ for LearningModel

* fixed a bug

* added soft_update function to learningModel

* fixed a bug

* revised learningModel

* rm __getstate__ and __setstate__ from LearningModel

* added noise explorer

* fixed formatting

* removed unnecessary comma

* removed unnecessary comma

* fixed PR comments

* removed unwanted exception and imports

* removed unwanted exception and imports

* fixed a bug

* fixed PR comments

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issue

* fixed a bug

* fixed lint issue

* fixed naming

* combined exploration param generation and early stopping in scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* moved logger inside scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* removed epsilon parameter from choose_action

* removed epsilon parameter from choose_action

* changed agent manager's train parameter to experience_by_agent

* fixed some PR comments

* renamed zero_grad to zero_gradients in LearningModule

* fixed some PR comments

* bug fix

* bug fix

* bug fix

* removed explorer abstraction from agent

* added DEVICE env variable as first choice for torch device

* refined dqn example

* fixed lint issues

* removed unwanted import in cim example

* updated cim-dqn notebook

* simplified scheduler

* edited notebook according to merged scheduler changes

* refined dimension check for learning module manager and removed num_actions from DQNConfig

* bug fix for cim example

* added notebook output

* removed early stopping from CIM dqn example

* removed early stopping from cim example config

* moved decorator logic inside algorithms

* renamed early_stopping_callback to early_stopping_checker

* removed action_dim from noise explorer classes and added some shape checks

* modified NoiseExplorer's __call__ logic to batch processing

* made NoiseExplorer's __call__ return type np array

* renamed update to set_parameters in explorer

* fixed old naming in test_grass

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 VM scheduling docs (#228)

* Initialize

* Data center scenario init

* Code style modification

* V0.2 event buffer subevents expand (#180)

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* unfold sub-events, insert after parent

* remove event category, use different class instead, add helper functions to gen decision and action event

* add a method to support add immediate event to cascade event with tick validation

* fix ut issue

* add action as 1st sub event to ensure the executing order

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Data center scenario update

* Code style update

* Data scenario business engine update

* Isort update

* Fix lint code check

* Fix based on PR comments.

* Update based on PR comments.

* Add decision payload

* Add config file

* Update utilization series logic

* Update based on PR comment

* Update based on PR

* Update

* Update

* Add the ValidPm class

* Update docs string and naming

* Add energy consumption

* Lint code fixed

* Refining postpone function

* Lint style update

* Init data pipeline

* Update based on PR comment

* Add data pipeline download

* Lint style update

* Code style fix

* Temp update

* Data pipeline update

* Add aria2p download function

* Update based on PR comment

* Update based on PR comment

* Update based on PR comment

* Update naming of variables

* Rename topology

* Renaming

* Fix valid pm list

* Pylint fix

* Update comment

* Update docstring and comment

* Fix init import

* Update tick issue

* fix merge problem

* update style

* V0.2 datacenter data pipeline (#199)

* Data pipeline update

* Data pipeline update

* Lint update

* Update pipeline

* Add vmid mapping

* Update lint style

* Add VM data analytics

* Update notebook

* Add binary converter

* Modify vmtable yaml

* Update binary meta file

* Add cpu reader

* random example added for data center

* Fix bugs

* Fix pylint

* Add launcher

* Fix pylint

* best fit policy added

* Add reset

* Add config

* Add config

* Modify action object

* Modify config

* Fix naming

* Modify config

* Add snapshot list

* Modify a spelling typo

* Update based on PR comments.

* Rename scenario to vm scheduling

* Rename scenario

* Update print messages

* Lint fix

* Lint fix

* Rename scenario

* Modify the calculation of cpu utilization

* Add comment

* Modify data pipeline path

* Fix typo

* Modify naming

* Add unittest

* Add comment

* Unify naming

* Fix data path typo

* Update comments

* Update snapshot features

* Add take snapshot

* Add summary keys

* Update cpu reader

* Update naming

* Add unit test

* Rename snapshot node

* Add processed data pipeline

* Modify config

* Add comment

* Lint style fix

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Add package used in vm_scheduling

* add aria2p to test requirement

* best fit example: update the usage of snapshot

* Add aria2p to test requirement

* Remove finish event

* Fix unittest

* Add test dataset

* Update based on PR comment

* vm doc init

* Update docs

* Update docs

* Update docs

* Update docs

* Remove old notebook

* Update docs

* Update docs

* Add figure

* Update docs

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* v0.2 VM Scheduling docs refinement (#231)

* Fix typo

* Refining vm scheduling docs

* V0.2 store refinement (#234)

* updated docs and images for rl toolkit

* 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Fix bug (#237)

vm scenario: fix the event type bug of the postpone event

* V0.2 rl toolkit doc (#235)

* updated docs and images for rl toolkit

* updated cim example doc

* updated cim example docs

* updated cim example rst

* updated rl_toolkit and cim example docs

* replaced q_module with q_net in example rst

* refined doc

* refined doc

* updated figures

* updated figures

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Merge V0.2 vis into V0.2 (#233)

* Implemented dump snapshots and convert to CSV.

* Let BE support params when dumping snapshots.

* Refactor dump code to core.py

* Implemented decision event dump.

* replace is not '' with !=''

* Fixed issues that code review mentioned.

* removed path from hello.py

* Changed import sort.

* Fix import sorting in citi_bike/business_engine

* visualization 0.1

* Updated lint configurations.

* Fixed formatting error that caused lint errors.

* render html title function

* Try to fix lint errors.

* flake-8 style fix

* remove space around 18,35

* dump_csv_converter.py re-formatting.

* files re-formatting.

* style fixed

* tab delete

* white space fix

* white space fix-2

* vis redundant function delete

* refine

* re-formatting after merged upstream.

* Updated import section.

* Updated import section.

* pr refine

* isort fix

* white space

* lint error

* \n error

* test continuation

* indent

* continuation of indent

* indent 0.3

* comment update

* comment update 0.2

* f-string update

* f-string 0.2

* lint 0.3

* lint 0.4

* lint 0.4

* lint 0.5

* lint 0.6

* docstring update

* data version deploy update

* condition update

* add whitespace

* V0.2 vis dump feature enhancement. (#190)

* Dumps added manifest file.
* Code updated format by flake8
* Changed manifest file format for easy reading.

* deploy info update; docs update

* weird white space

* Update dashboard_visualization.md

* new endline?

* delete dependency

* delete irrelevant file

* change scenario to enum, divide file path into a separated class

* doc refine

* doc update

* params type

* data structure update

* doc&enum, formula refine

* refine

* add ut, refine doc

* style refine

* isort

* strong type fix

* os._exit delete

* revert datalib

* import new line

* change test case

* change file name & doc

* change deploy path

* delete params

* revert file

* delete duplicate file

* delete single process

* update naming

* manually change import order

* delete blank

* edit error

* requirement txt

* style fix & refine

* comments&docstring refine

* add parameter name

* test & dump

* comments update

* Added manifest file. (#201)

Only a few changes that need to meet requirements of manifest file format.

* comments fix

* delete toolkit change

* doc update

* citi bike update

* deploy path

* datalib update

* revert datalib

* revert

* maro file format

* comments update

* doc update

* update param name

* doc update

* new link

* image update

* V0.2 visualization-0.1 (#181)

* visualization 0.1

* render html title function

* flake-8 style fix

* style fixed

* tab delete

* white space fix

* white space fix-2

* vis redundant function delete

* refine

* pr refine

* isort fix

* white space

* lint error

* \n error

* test continuation

* indent

* continuation of indent

* indent 0.3

* comment update

* comment update 0.2

* f-string update

* f-string 0.2

* lint 0.3

* lint 0.4

* lint 0.4

* lint 0.5

* lint 0.6

* docstring update

* data version deploy update

* condition update

* add whitespace

* deploy info update; docs update

* weird white space

* Update dashboard_visualization.md

* new endline?

* delete dependency

* delete irrelevant file

* change scenario to enum, divide file path into a separated class

* fix the visualization of docs/key_components/distributed_toolkit

* doc refine

* doc update

* params type

* add examples into isort ignore

* data structure update

* doc&enum, formula refine

* refine

* add ut, refine doc

* style refine

* isort

* strong type fix

* os._exit delete

* revert datalib

* import new line

* change test case

* change file name & doc

* change deploy path

* delete params

* revert file

* delete duplicate file

* delete single process

* update naming

* manually change import order

* delete blank

* edit error

* requirement txt

* style fix & refine

* comments&docstring refine

* add parameter name

* test & dump

* comments update

* comments fix

* delete toolkit change

* doc update

* citi bike update

* deploy path

* datalib update

* revert datalib

* revert

* maro file format

* comments update

* doc update

* update param name

* doc update

* new link

* image update

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>

* image change

* add reset snapshot

* delete dump

* add new line

* add next steps

* import change

* relative import

* add init file

* import change

* change utils file

* change cliexpcetion to clierror

* dashboard test

* change result

* change assertion

* move not

* unit test change

* core change

* unit test delete name_mapping_file

* update cim business engine

* doc update

* change relative path

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* change import sequence

* comments update

* doc add pic

* add dependency

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* Update dashboard_visualization.rst

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* delete white space

* doc update

* doc update

* update doc

* update doc

* update doc

Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* V0.2 docs process mode (#230)

* Update process mode docs and fixed on premises

* Update orchestration docs

* Update process mode docs add JOB_NAME as env variable

* fixed bugs

* fixed isort issue

* update docs index

Co-authored-by: kaiqli <v-kaiqli@microsoft.com>

* V0.2 learning model refinement (#236)

* moved optimizer options to LearningModel

* typo fix

* fixed lint issues

* updated notebook

* misc edits

* 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook

* renamed single_host_cim_learner to cim_learner in notebook

* updated notebook output

* typo fix

* removed dimension check in absence of shared stack

* fixed a typo

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Update vm docs (#241)

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* V0.2 info update (#240)

* update readme

* update version

* refine readme format

* add vis gif

* add citation

* update citation

* update badge

Co-authored-by: Arthur Jiang <sjian@microsoft.com>

* Fix typo (#242)

* Fix typo

* fix typo

* fix

* syntax fix (#253)

* syntax fix

* syntax fix

* syntax fix

* rm unwanted import

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 vm oversubscription (#246)

* Remove topology

* Update pipeline

* Update pipeline

* Update pipeline

* Modify metafile

* Add two attributes of VM

* Update pipeline

* Add vm category

* Add todo

* Add oversub config

* Add oversubscription feature

* Lint fix

* Update based on PR comment.

* Update pipeline

* Update pipeline

* Update config.

* Update based on PR comment

* Update

* Add pm sku feature

* Add sku setting

* Add sku feature

* Lint fix

* Lint style

* Update sku, overloading

* Lint fix

* Lint style

* Fix bug

* Modify config

* Remove sku and replace it with pm type

* Add and refactor vm category

* Comment out config

* Unify the enum format

* Fix lint style

* Fix import order

* Update based on PR comment

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* V0.2 vm scheduling decision event (#257)

* Fix data preparation bug

* Add frame index

* V0.2 PG, K-step and lambda return utils (#155)

* fixed a bug

* fixed lint issues

* added load/dump functions to LearningModel

* fixed a bug

* fixed a bug

* fixed lint issues

* merged with v0.2_embedded_optims

* refined DQN docstrings

* removed load/dump functions from DQN

* added task validator

* fixed decorator use

* fixed a typo

* fixed a bug

* revised

* fixed lint issues

* changed LearningModel's step() to take a single loss

* revised learning model design

* revised example

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* added decorator utils to algorithm

* fixed a bug

* renamed core_model to model

* fixed a bug

* 1. fixed lint formatting issues; 2. refined learning model docstrings

* rm trailing whitespaces

* added decorator for choose_action

* fixed a bug

* fixed a bug

* fixed version-related issues

* renamed add_zeroth_dim decorator to expand_dim

* overhauled exploration abstraction

* fixed a bug

* fixed a bug

* fixed a bug

* added exploration related methods to abs_agent

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* separated learning with exploration schedule and without

* small fixes

* moved explorer logic to actor side

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* removed unwanted param from simple agent manager

* small fixes

* revised code based on revised abstractions

* fixed some bugs

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* added shared_module property to LearningModel

* added shared_module property to LearningModel

* fixed a bug with k-step return in AC

* fixed a bug

* fixed a bug

* merged pg, ac and ppo examples

* fixed a bug

* fixed a bug

* fixed naming for ppo

* renamed some variables in PPO

* added ActionWithLogProbability return type for PO-type algorithms

* fixed a bug

* fixed a bug

* fixed lint issues

* revised __getstate__ for LearningModel

* fixed a bug

* added soft_update function to learningModel

* fixed a bug

* revised learningModel

* rm __getstate__ and __setstate__ from LearningModel

* added noise explorer

* formatting

* fixed formatting

* removed unnecessary comma

* removed unnecessary comma

* removed unnecessary comma

* fixed PR comments

* removed unwanted exception and imports

* removed unwanted exception and imports

* fixed a bug

* fixed PR comments

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issue

* fixed a bug

* fixed lint issue

* fixed naming

* combined exploration param generation and early stopping in scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* moved logger inside scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* removed epsilon parameter from choose_action

* removed epsilon parameter from choose_action

* changed agent manager's train parameter to experience_by_agent

* fixed some PR comments

* renamed zero_grad to zero_gradients in LearningModule

* fixed some PR comments

* bug fix

* bug fix

* bug fix

* removed explorer abstraction from agent

* added DEVICE env variable as first choice for torch device

* refined dqn example

* fixed lint issues

* removed unwanted import in cim example

* updated cim-dqn notebook

* simplified scheduler

* edited notebook according to merged scheduler changes

* refined dimension check for learning module manager and removed num_actions from DQNConfig

* bug fix for cim example

* added notebook output

* updated cim PO example code according to changes in maro/rl

* removed early stopping from CIM dqn example

* combined ac and ppo and simplified example code and config

* removed early stopping from cim example config

* moved decorator logic inside algorithms

* renamed early_stopping_callback to early_stopping_checker

* put PG and AC under PolicyOptimization class and refined examples accordingly

* fixed lint issues

* removed action_dim from noise explorer classes and added some shape checks

* modified NoiseExplorer's __call__ logic to batch processing

* made NoiseExplorer's __call__ return type np array

* renamed update to set_parameters in explorer

* fixed old naming in test_grass

* moved optimizer options to LearningModel

* typo fix

* fixed lint issues

* updated notebook

* updated cim example for policy optimization

* typo fix

* typo fix

* typo fix

* typo fix

* misc edits

* minor edits to rl_toolkit.rst

* checked out docs from master

* fixed typo in k-step shaper

* fixed lint issues

* bug fix in store

* lint issue fix

* changed default max_ep to 100 for policy_optimization algos

* vis doc update to master (#244)

* refine readme

* feat: refine data push/pull (#138)

* feat: refine data push/pull

* test: add cli provision testing

* fix: style fix

* fix: add necessary comments

* fix: from code review

* add fall back function in weather download (#112)

* fix deployment issue in multi envs

* fix typo

* fix ~/.maro not exist issue in build

* skip deploy when build

* update for comments

* temporarily disable weather info

* replace ecr with cim in setup.py

* replace ecr in manifest

* remove weather check when read data

* fix station id issue

* fix format

* add TODO in comments

* add noaa weather source

* fix weather reset and weather comment

* add comment for weather data url

* some format update

* add fall back function in weather download

* update comment

* update for comments

* update comment

* add period

* fix for pylint

* update for pylint check

* added example docs (#136)

* added example docs

* added citibike greedy example doc

* modified citibike doc

* fixed PR comments

* fixed more PR comments

* fixed small formatting issue

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* switch the key and value of handler_dict in decorator (#144)

* switch the key and value of handler_dict in decorator

* add dist decorator UT and fixed multithreading conflict in maro test suite

* pr comments update.

* resolved comments about decorator UT

* rename handler_fun in dist decorator

* change self.attr into class_name.attr

* update UT tests comments

* V0.1 annotation (#147)

* refine the annotation of simulator core

* remove reward from env(be)

* format refined

* white spaces test

* left-padding spaces refined

* format modified

* update the left-padding spaces of docstrings

* code format updated

* update according to comments

* update according to PR comments

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Event payload details for env.summary (#156)

* key_list of events added for env.summary

* code refined according to lint

* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments

* code format refined

* try trigger the git tests

* update github workflow

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Implemented dump snapshots and convert to CSV.

* Let BE support params when dumping snapshots.

* Refactor dump code to core.py

* Implemented decision event dump.

* V0.2 online lp for citi bike (#159)

* key_list of events added for env.summary

* code refined according to lint

* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments

* code format refined

* try trigger the git tests

* update github workflow

* online LP example added for citi bike

* infeasible solution

* infeasible solution fixed: call snapshot before any env.step()

* experiment results of toy topos added

* experiment results of toy topos added

* experiment result update: better than naive baseline

* PuLP version added

* greedy experiment results update

* citibike result update

* modified according to PR comments

* update experiment results and forecasting comparison

* citi bike lp README updated

* README updated

* modified according to PR comments

* update according to PR comments

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* replace is not '' with !=''

* Fixed issues that code review mentioned.

* removed path from hello.py

* Changed import sort.

* Fix import sorting in citi_bike/business_engine

* visualization 0.1

* Updated lint configurations.

* Fixed formatting error that caused lint errors.

* render html title function

* Try to fix lint errors.

* flake-8 style fix

* remove space around 18,35

* dump_csv_converter.py re-formatting.

* files re-formatting.

* style fixed

* tab delete

* white space fix

* white space fix-2

* vis redundant function delete

* refine

* update according to flake8

* re-formatting after merged upstream.

* Updated import section.

* Updated import section.

* V0.2 Logical operator overloading for EarlyStoppingChecker (#178)

* 1. added logical operator overloading for early stopping checker; 2. added mean value checker

* fixed PR comments

* removed learner.exit() in single_process_launcher

* added another early stopping checker in example

* fixed PR comments and lint issues

* lint issue fix

* fixed lint issues

* fixed a bug

* fixed a bug

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 skip connection (#176)

* replaced IdentityLayers with nn.Identity

* 1. added skip connection option in FC_net; 2. generalized learning model

* added skip_connection option in config

* removed type casting in fc_net

* fixed lint formatting issues

* refined docstring

* added multi-head functionality to LearningModel

* refined learning model docstring

* added head_key param in LearningModel forward

* fixed PR comments

* added top layer logic and is_top option in fc_net

* fixed a bug

* fixed a bug

* reverted some changes in learning model

* reverted some changes in learning model

* added members to learning model to fix the mode issue

* fixed a bug

* fixed mode setting issue in learning model

* removed learner.exit() in single_process_launcher

* fixed PR comments

* fixed rl/__init__

* fixed issues in example

* fixed a bug

* fixed a bug

* fixed lint formatting issues

* moved reward type casting to exp shaper

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* pr refine

* isort fix

* white space

* lint error

* \n error

* test continuation

* indent

* continuation of indent

* indent 0.3

* comment update

* comment update 0.2

* f-string update

* f-string 0.2

* lint 0.3

* lint 0.4

* lint 0.4

* lint 0.5

* lint 0.6

* docstring update

* data version deploy update

* condition update

* add whitespace

* V0.2 vis dump feature enhancement. (#190)

* Dumps added manifest file.
* Code updated format by flake8
* Changed manifest file format for easy reading.

* deploy info update; docs update

* weird white space

* Update dashboard_visualization.md

* new endline?

* delete dependency

* delete irrelevant file

* change scenario to enum, divide file path into a separated class

* fixed a bug in learner's test() (#193)

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 double dqn (#188)

* added dueling action value model

* renamed params in dueling_action_value_model

* renamed shared_features to features

* replaced IdentityLayers with nn.Identity

* 1. added skip connection option in FC_net; 2. generalized learning model

* added skip_connection option in config

* removed type casting in fc_net

* fixed lint formatting issues

* refined docstring

* mv dueling_action_value_model and fixed some bugs

* added multi-head functionality to LearningModel

* refined learning model docstring

* added head_key param in LearningModel forward

* added double DQN and dueling features to DQN

* fixed a bug

* added DuelingQModelHead enum

* fixed a bug

* removed unwanted file

* fixed PR comments

* added top layer logic and is_top option in fc_net

* fixed a bug

* fixed a bug

* reverted some changes in learning model

* reverted some changes in learning model

* added members to learning model to fix the mode issue

* fixed a bug

* fixed mode setting issue in learning model

* fixed PR comments

* revised cim example according to DQN changes

* renamed eval_model to q_value_model in cim example

* more fixes

* fixed a bug

* fixed a bug

* added doc per PR comments

* removed learner.exit() in single_process_launcher

* removed learner.exit() in single_process_launcher

* fixed PR comments

* fixed rl/__init__

* fixed issues in example

* fixed a bug

* fixed a bug

* fixed lint formatting issues

* double DQN feature

* fixed a bug

* fixed a bug

* fixed PR comments

* fixed lint issue

* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm

* added load_models in simple_learner

* minor docstring edits

* minor docstring edits

* set is_double to true in DQN config

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>

* V0.2 feature predefined image (#183)

* feat: support predefined image provision

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* fix: error scripts invocation after using relative import

* fix: missing init.py

* fixed a bug in learner's test()

* feat: add distributed_config for dqn example

* test: update test for grass

* test: update test for k8s

* feat: add promptings for steps

* fix: change relative imports to absolute imports

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>

* doc refine

* doc update

* params type

* data structure update

* doc&enum, formula refine

* refine

* add ut, refine doc

* style refine

* isort

* strong type fix

* os._exit delete

* revert datalib

* import new line

* change test case

* change file name & doc

* change deploy path

* delete params

* revert file

* delete duplicate file

* delete single process

* update naming

* manually change import order

* delete blank

* edit error

* requirement txt

* style fix & refine

* comments&docstring refine

* add parameter name

* test & dump

* comments update

* V0.2 feature proxy rejoin (#158)

* update dist decorator

* replace proxy.get_peers by proxy.peers

* update proxy rejoin (draft, not runnable for proxy rejoin)

* fix bugs in proxy

* add message cache, and redesign rejoin parameter

* feat: add checkpoint with test

* update proxy.rejoin

* fixed rejoin bug, rename func

* add test example(temp)

* feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents.

* capital env vari name

* rm json.dumps; change retries to 10; temp add warning level for rejoin

* fix: unable to load FaultToleranceAgent, missing params

* fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent

* feat: add node_id to node_details

* fix: add a new dependency for tests

* style: meet linting requirements

* style: remaining linting problems

* lint fixed; rm temp test folder.

* fixed lint f-string without placeholder

* fix: add a flag for "remove_container", refine restart logic and Redis keys naming

* proxy rejoin update.

* variable rename.

* fixed lint issues

* fixed lint issues

* add exit code for different error

* feat: add special errors handler

* add max rejoin times

* remove unused import

* add rejoin UT; resolve rejoin comments

* lint fixed

* fixed UT import problem

* rm MessageCache in proxy

* fix: refine key naming

* update proxy rejoin; add topic for broadcast

* feat: support predefined image provision

* update UT for communication

* add docstring for rejoin

* fixed isort and zmq driver import

* fixed isort and UT test

* fix isort issue

* proxy rejoin update (comments v2)

* fixed isort error

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* feat: add exists method for checkpoint

* fix: error scripts invocation after using relative import

* fix: missing init.py

* fixed a bug in learner's test()

* add driver close and socket SUB disconnect for rejoin

* feat: add distributed_config for dqn example

* test: update test for grass

* test: update test for k8s

* feat: add promptings for steps

* fix: change relative imports to absolute imports

* fixed comments and update logger level

* mv driver in proxy.__init__ for issue temp fixed.

* Update docstring and comments

* style: fix code reviews problems

* fix code format

Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 feature cli windows (#203)

* fix: change local mkdir to os.makedirs

* fix: add utf8 encoding for logger

* fix: add powershell.exe prefix to subprocess functions

* feat: add debug_green

* fix: use fsutil to create fixed-size files in Windows

* fix: use universal_newlines=True to handle encoding problem in different operating systems

* fix: use temp file to do copy when the operating system is not Linux

* fix: linting error

* fix: use fsutil in test_k8s.py

* feat: dynamic init ABS_PATH in GlobalParams

* fix: use -Command to execute Powershell command

* fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode

* fix: problems in code review

* EventBuffer refine (#197)

* merge uniform event changes back

* 1st step: move executing events into stack for better removing performance

* flush event pool

* typo

* add option for env to enable event pool

* refine stack functions

* fix comment issues, add typings

* lint fixing

* lint fix

* add missing fix

* linting

* lint

* use linked list instead of original event list and execute stack

* add missing file

* linting, and fixes

* add missing file

* linting fix

* fixing comments

* add missing file

* rename event_list to event_linked_list

* correct import path

* change enable_event_pool to disable_finished_events

* add missing file

* V0.2 merge master (#214)

* fix the visualization of docs/key_components/distributed_toolkit

* add examples into isort ignore

* refine import path for examples (#195)

* refine import path for examples

* refine indents

* fixed formatting issues

* update code style

* add editorconfig-checker, add editorconfig path into lint, change super-linter version

* change path for code saving in cim.gnn

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>

* fix occasional conflict between distutils and setuptools (#208)

* fix issue that cython and setuptools conflict

* follow the accepted temp workaround

* update comment, it should be conflict between setuptools and distutils

* fixed bugs related to proxy interface changes

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>

* typo fix

* Bug fix: event buffer issue that prevented Actions from being passed into business engine (#215)

* bug fix

* clear the reference after extract sub events, update ut to cover this issue

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* fix flake8 style problem

* V0.2 feature refine mode namings (#212)

* feat: refine cli exception

* feat: refine mode namings

* EventBuffer refine (#197)

* merge uniform event changes back

* 1st step: move executing events into stack for better removing performance

* flush event pool

* typo

* add option for env to enable event pool

* refine stack functions

* fix comment issues, add typings

* lint fixing

* lint fix

* add missing fix

* linting

* lint

* use linked list instead of original event list and execute stack

* add missing file

* linting, and fixes

* add missing file

* linting fix

* fixing comments

* add missing file

* rename event_list to event_linked_list

* correct import path

* change enable_event_pool to disable_finished_events

* add missing file

* fixed bugs in dist rl

* feat: rename files

* tests: set longer gracefully wait time

* style: fix linting errors

* style: fix linting errors

* style: fix linting errors

* fix: rm redundant variables

* fix: refine error message

Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 vis new (#210)

Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>

* V0.2 local host process (#221)

* Update local process (not ready)

* update cli process mode

* add setup/clear/template for maro process

* fix process stop

* add logger and rename parameters

* add logger for setup/clear

* fixed closing of nonexistent pids when given a pid list.

* Fixed comments and rename setup/clear with create/delete

* update ProcessInternalError

* comments fix

* delete toolkit change

* doc update

* citi bike update

* deploy path

* datalib update

* revert datalib

* revert

* maro file format

* comments update

* doc update

* V0.2 grass on premises (#220)

* feat: refine cli exception
* commit on v0.2_grass_on_premises

Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 vm scheduling scenario (#189)

* Initialize

* Data center scenario init

* Code style modification

* V0.2 event buffer subevents expand (#180)

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* unfold sub-events, insert after parent

* remove event category, use different class instead, add helper functions to gen decision and action event

* add a method to support add immediate event to cascade event with tick validation

* fix ut issue

* add action as 1st sub event to ensure the executing order

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Data center scenario update

* Code style update

* Data scenario business engine update

* Isort update

* Fix lint code check

* Fix based on PR comments.

* Update based on PR comments.

* Add decision payload

* Add config file

* Update utilization series logic

* Update based on PR comment

* Update based on PR

* Update

* Update

* Add the ValidPm class

* Update docs string and naming

* Add energy consumption

* Lint code fixed

* Refining postpone function

* Lint style update

* Init data pipeline

* Update based on PR comment

* Add data pipeline download

* Lint style update

* Code style fix

* Temp update

* Data pipeline update

* Add aria2p download function

* Update based on PR comment

* Update based on PR comment

* Update based on PR comment

* Update naming of variables

* Rename topology

* Renaming

* Fix valid pm list

* Pylint fix

* Update comment

* Update docstring and comment

* Fix init import

* Update tick issue

* fix merge problem

* update style

* V0.2 datacenter data pipeline (#199)

* Data pipeline update

* Data pipeline update

* Lint update

* Update pipeline

* Add vmid mapping

* Update lint style

* Add VM data analytics

* Update notebook

* Add binary converter

* Modify vmtable yaml

* Update binary meta file

* Add cpu reader

* random example added for data center

* Fix bugs

* Fix pylint

* Add launcher

* Fix pylint

* best fit policy added

* Add reset

* Add config

* Add config

* Modify action object

* Modify config

* Fix naming

* Modify config

* Add snapshot list

* Modify a spelling typo

* Update based on PR comments.

* Rename scenario to vm scheduling

* Rename scenario

* Update print messages

* Lint fix

* Lint fix

* Rename scenario

* Modify the calculation of cpu utilization

* Add comment

* Modify data pipeline path

* Fix typo

* Modify naming

* Add unittest

* Add comment

* Unify naming

* Fix data path typo

* Update comments

* Update snapshot features

* Add take snapshot

* Add summary keys

* Update cpu reader

* Update naming

* Add unit test

* Rename snapshot node

* Add processed data pipeline

* Modify config

* Add comment

* Lint style fix

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Add package used in vm_scheduling

* add aria2p to test requirement

* best fit example: update the usage of snapshot

* Add aria2p to test requirement

* Remove finish event

* Fix unittest

* Add test dataset

* Update based on PR comment

* Refine cpu reader and unittest

* Lint update

* Refine based on PR comment

* Add agent index

* Add node mapping

* Refine based on PR comments

* Renaming postpone_step

* Renaming and refine based on PR comments

* Rename config

* Update

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* Resolve none action problem (#224)

* V0.2 vm_scheduling notebook (#223)

* Initialize

* Data center scenario init

* Code style modification

* V0.2 event buffer subevents expand (#180)

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* unfold sub-events, insert after parent

* remove event category, use different class instead, add helper functions to gen decision and action event

* add a method to support add immediate event to cascade event with tick validation

* fix ut issue

* add action as 1st sub event to ensure the executing order

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Data center scenario update

* Code style update

* Data scenario business engine update

* Isort update

* Fix lint code check

* Fix based on PR comments.

* Update based on PR comments.

* Add decision payload

* Add config file

* Update utilization series logic

* Update based on PR comment

* Update based on PR

* Update

* Update

* Add the ValidPm class

* Update docs string and naming

* Add energy consumption

* Lint code fixed

* Refining postpone function

* Lint style update

* Init data pipeline

* Update based on PR comment

* Add data pipeline download

* Lint style update

* Code style fix

* Temp update

* Data pipeline update

* Add aria2p download function

* Update based on PR comment

* Update based on PR comment

* Update based on PR comment

* Update naming of variables

* Rename topology

* Renaming

* Fix valid pm list

* Pylint fix

* Update comment

* Update docstring and comment

* Fix init import

* Update tick issue

* fix merge problem

* update style

* V0.2 datacenter data pipeline (#199)

* Data pipeline update

* Data pipeline update

* Lint update

* Update pipeline

* Add vmid mapping

* Update lint style

* Add VM data analytics

* Update notebook

* Add binary converter

* Modify vmtable yaml

* Update binary meta file

* Add cpu reader

* random example added for data center

* Fix bugs

* Fix pylint

* Add launcher

* Fix pylint

* best fit policy added

* Add reset

* Add config

* Add config

* Modify action object

* Modify config

* Fix naming

* Modify config

* Add snapshot list

* Modify a spelling typo

* Update based on PR comments.

* Rename scenario to vm scheduling

* Rename scenario

* Update print messages

* Lint fix

* Lint fix

* Rename scenario

* Modify the calculation of cpu utilization

* Add comment

* Modify data pipeline path

* Fix typo

* Modify naming

* Add unittest

* Add comment

* Unify naming

* Fix data path typo

* Update comments

* Update snapshot features

* Add take snapshot

* Add summary keys

* Update cpu reader

* Update naming

* Add unit test

* Rename snapshot node

* Add processed data pipeline

* Modify config

* Add comment

* Lint style fix

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Add package used in vm_scheduling

* add aria2p to test requirement

* best fit example: update the usage of snapshot

* Add aria2p to test requirement

* Remove finish event

* Fix unittest

* Add test dataset

* Update based on PR comment

* Refine cpu reader and unittest

* Lint update

* Refine based on PR comment

* Add agent index

* Add node mapping

* Init vm scheduling notebook

* Add notebook

* Refine based on PR comments

* Renaming postpone_step

* Renaming and refine based on PR comments

* Rename config

* Update based on the v0.2_datacenter

* Update notebook

* Update

* update filepath

* notebook updated

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* Update process mode docs and fixed on premises (#226)

* V0.2 Add github workflow integration (#222)

* test: add github workflow integration

* fix: split procedures && bug fixed

* test: add training only restriction

* fix: add 'approved' restriction

* fix: change default ssh port to 22

* style: in one line

* feat: add timeout for Subprocess.run

* test: change default node_size to Standard_D2s_v3

* style: refine style

* fix: add ssh_port param to on-premises mode

* fix: add missing init.py

* update param name

* V0.2 explorer (#198)

* overhauled exploration abstraction

* fixed a bug

* fixed a bug

* fixed a bug

* added exploration related methods to abs_agent

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* separated learning with exploration schedule and without

* small fixes

* moved explorer logic to actor side

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* removed unwanted param from simple agent manager

* added noise explorer

* fixed formatting

* removed unnecessary comma

* fixed PR comments

* removed unwanted exception and imports

* fixed a bug

* fixed PR comments

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issue

* fixed a bug

* fixed lint issue

* fixed naming

* combined exploration param generation and early stopping in scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* moved logger inside scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* removed epsilon parameter from choose_action

* fixed some PR comments

* fixed some PR comments

* bug fix

* bug fix

* bug fix

* removed explorer abstraction from agent

* refined dqn example

* fixed lint issues

* simplified scheduler

* removed early stopping from CIM dqn example

* removed early stopping from cim example config

* renamed early_stopping_callback to early_stopping_checker

* removed action_dim from noise explorer classes and added some shape checks

* modified NoiseExplorer's __call__ logic to batch processing

* made NoiseExplorer's __call__ return type np array

* renamed update to set_parameters in explorer

* fixed old naming in test_grass

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 embedded optim (#191)

* added dueling action value model

* renamed params in dueling_action_value_model

* renamed shared_features to features

* replaced IdentityLayers with nn.Identity

* 1. added skip connection option in FC_net; 2. generalized learning model

* added skip_connection option in config

* removed type casting in fc_net

* fixed lint formatting issues

* refined docstring

* mv dueling_action_value_model and fixed some bugs

* added multi-head functionality to LearningModel

* refined learning model docstring

* added head_key param in learningModel forward

* added double DQN and dueling features to DQN

* fixed a bug

* added DuelingQModelHead enum

* fixed a bug

* removed unwanted file

* fixed PR comments

* added top layer logic and is_top option in fc_net

* fixed a bug

* fixed a bug

* reverted some changes in learning model

* reverted some changes in learning model

* added members to learning model to fix the mode issue

* fixed a bug

* fixed mode setting issue in learning model

* fixed PR comments

* revised cim example according to DQN changes

* renamed eval_model to q_value_model in cim example

* more fixes

* fixed a bug

* fixed a bug

* added doc per PR comments

* removed learner.exit() in single_process_launcher

* removed learner.exit() in single_process_launcher

* fixed PR comments

* fixed rl/__init__

* fixed issues in example

* fixed a bug

* fixed a bug

* fixed lint formatting issues

* double DQN feature

* fixed a bug

* fixed a bug

* fixed PR comments

* fixed lint issue

* embedded optimizer into SingleHeadLearningModel

* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm

* added load_models in simple_learner

* minor docstring edits

* minor docstring edits

* minor docstring edits

* mv optimizer options inside LearningModel

* modified example accordingly

* fixed a bug

* fixed a bug

* fixed a bug

* added dueling DQN feature

* revised and refined docstrings

* fixed a bug

* fixed lint issues

* added load/dump functions to LearningModel

* fixed a bug

* fixed a bug

* fixed lint issues

* refined DQN docstrings

* removed load/dump functions from DQN

* added task validator

* fixed decorator use

* fixed a typo

* fixed a bug

* fixed lint issues

* changed LearningModel's step() to take a single loss

* revised learning model design

* revised example

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* added decorator utils to algorithm

* fixed a bug

* renamed core_model to model

* fixed a bug

* 1. fixed lint formatting issues; 2. refined learning model docstrings

* rm trailing whitespaces

* added decorator for choose_action

* fixed a bug

* fixed a bug

* fixed version-related issues

* renamed add_zeroth_dim decorator to expand_dim

* overhauled exploration abstraction

* fixed a bug

* fixed a bug

* fixed a bug

* added exploration related methods to abs_agent

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* separated learning with exploration schedule and without

* small fixes

* moved explorer logic to actor side

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* removed unwanted param from simple agent manager

* small fixes

* added shared_module property to LearningModel

* added shared_module property to LearningModel

* revised __getstate__ for LearningModel

* fixed a bug

* added soft_update function to learningModel

* fixed a bug

* revised learningModel

* rm __getstate__ and __setstate__ from LearningModel

* added noise explorer

* fixed formatting

* removed unnecessary comma

* removed unnecessary comma

* fixed PR comments

* removed unwanted exception and imports

* removed unwanted exception and imports

* fixed a bug

* fixed PR comments

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issue

* fixed a bug

* fixed lint issue

* fixed naming

* combined exploration param generation and early stopping in scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* moved logger inside scheduler

* fixed a bug

* fixed a bug

* fixed a bug

* fixed lint issues

* fixed lint issue

* removed epsilon parameter from choose_action

* removed epsilon parameter from choose_action

* changed agent manager's train parameter to experience_by_agent

* fixed some PR comments

* renamed zero_grad to zero_gradients in LearningModule

* fixed some PR comments

* bug fix

* bug fix

* bug fix

* removed explorer abstraction from agent

* added DEVICE env variable as first choice for torch device

* refined dqn example

* fixed lint issues

* removed unwanted import in cim example

* updated cim-dqn notebook

* simplified scheduler

* edited notebook according to merged scheduler changes

* refined dimension check for learning module manager and removed num_actions from DQNConfig

* bug fix for cim example

* added notebook output

* removed early stopping from CIM dqn example

* removed early stopping from cim example config

* moved decorator logic inside algorithms

* renamed early_stopping_callback to early_stopping_checker

* removed action_dim from noise explorer classes and added some shape checks

* modified NoiseExplorer's __call__ logic to batch processing

* made NoiseExplorer's __call__ return type np array

* renamed update to set_parameters in explorer

* fixed old naming in test_grass

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* V0.2 VM scheduling docs (#228)

* Initialize

* Data center scenario init

* Code style modification

* V0.2 event buffer subevents expand (#180)

* V0.2 rl toolkit refinement (#165)

* refined rl abstractions

* fixed formatting issues

* checked out error-code related code from v0.2_pg

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* fixed a bug

* renamed save_models to dump_models

* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving

* renamed dump_experience_store to dump_experience_pool

* fixed a bug in the dump_experience_pool method

* fixed some PR comments

* fixed more PR comments

* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class

* fixed cim example according to rl toolkit changes

* fixed some more PR comments

* rewrote multi_process_launcher to eliminate the distributed section in config

* 1. fixed a typo; 2. added logging before early stopping

* fixed a bug

* fixed a bug

* fixed a bug

* added early stopping feature to CIM example

* fixed a typo

* fixed some issues with early stopping

* changed early stopping metric func

* fixed a bug

* fixed a bug

* added early stopping to dist mode cim

* added experience collecting func

* edited notebook according to changes in CIM example

* fixed bugs in nb

* fixed lint formatting issues

* fixed a typo

* fixed some PR comments

* fixed more PR comments

* revised docs

* removed nb output

* fixed a bug in simple_learner

* fixed a typo in nb

* fixed a bug

* fixed a bug

* fixed a bug

* removed unused import

* fixed a bug

* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing

* fixed some doc issues

* added output to nb

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* unfold sub-events, insert after parent

* remove event category, use different class instead, add helper functions to gen decision and action event

* add a method to support add immediate event to cascade event with tick validation

* fix ut issue

* add action as 1st sub event to ensure the executing order

Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Data center scenario update

* Code style update

* Data scenario business engine update

* Isort update

* Fix lint code check

* Fix based on PR comments.

* Update based on PR comments.

* Add decision payload

* Add config file

* Update utilization series logic

* Update based on PR comment

* Update based on PR

* Update

* Update

* Add the ValidPm class

* Update docs string and naming

* Add energy consumption

* Lint code fixed

* Refining postpone function

* Lint style update

* Init data pipeline

* Update based on PR comment

* Add data pipeline download

* Lint style update

* Code style fix

* Temp update

* Data pipeline update

* Add aria2p download function

* Update based on PR comment

* Update based on PR comment

* Update based on PR comment

* Update naming of variables

* Rename topology

* Renaming

* Fix valid pm list

* Pylint fix

* Update comment

* Update docstring and comment

* Fix init import

* Update tick issue

* fix merge problem

* update style

* V0.2 datacenter data pipeline (#199)

* Data pipeline update

* Data pipeline update

* Lint update

* Update pipeline

* Add vmid mapping

* Update lint style

* Add VM data analytics

* Update notebook

* Add binary converter

* Modify vmtable yaml

* Update binary meta file

* Add cpu reader

* random example added for data center

* Fix bugs

* Fix pylint

* Add launcher

* Fix pylint

* best fit policy added

* Add reset

* Add config

* Add config

* Modify action object

* Modify config

* Fix naming

* Modify config

* Add snapshot list

* Modify a spelling typo

* Update based on PR comments.

* Rename scenario to vm scheduling

* Rename scenario

* Update print messages

* Lint fix

* Lint fix

* Rename scenario

* Modify the calculation of cpu utilization

* Add comment

* Modify data pipeline path

* Fix typo

* Modify naming

* Add unittest

* Add comment

* Unify naming

* Fix data path typo

* Update comments

* Update snapshot features

* Add take snapshot

* Add summary keys

* Update cpu reader

* Update naming

* Add unit test

* Rename snapshot node

* Add processed data pipeline

* Modify config

* Add comment

* Lint style fix

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>

* Add package used in vm_scheduling

* add aria2p to test requirement

* best fit example: update the usage of snapshot

* Add aria2p to test requirement

* Remove finish event

* Fix unittest

* Add test dataset

* Update based on PR comment

* vm doc init

* Update docs

* Update docs

* Update docs

* Update docs

* Remove old notebook

* Update docs

* Update docs

* Add figure

* Update docs

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* doc update

* new link

* image update

* v0.2 VM Scheduling docs refinement (#231)

* Fix typo

* Refining vm scheduling docs

* image change

* V0.2 store refinement (#234)

* updated docs and images for rl toolkit

* 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Fix bug (#237)

vm scenario: fix the event type bug of the postpone event

* V0.2 rl toolkit doc (#235)

* updated docs and images for rl toolkit

* updated cim example doc

* updated cim example docs

* updated cim example rst

* updated rl_toolkit and cim example docs

* replaced q_module with q_net in example rst

* refined doc

* refined doc

* updated figures

* updated figures

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Merge V0.2 vis into V0.2 (#233)

* Implemented dump snapshots and convert to CSV.

* Let BE support params when dumping snapshots.

* Refactor dump code to core.py

* Implemented decision event dump.

* replace is not '' with !=''

* Fixed issues that code review mentioned.

* removed path from hello.py

* Changed import sort.

* Fix import sorting in citi_bike/business_engine

* visualization 0.1

* Updated lint configurations.

* Fixed formatting error that caused lint errors.

* render html title function

* Try to fix lint errors.

* flake-8 style fix

* remove space around 18,35

* dump_csv_converter.py re-formatting.

* files re-formatting.

* style fixed

* tab delete

* white space fix

* white space fix-2

* vis redundant function delete

* refine

* re-formatting after merged upstream.

* Updated import section.

* Updated import section.

* pr refine

* isort fix

* white space

* lint error

* \n error

* test continuation

* indent

* continuation of indent

* indent 0.3

* comment update

* comment update 0.2

* f-string update

* f-string 0.2

* lint 0.3

* lint 0.4

* lint 0.4

* lint 0.5

* lint 0.6

* docstring update

* data version deploy update

* condition update

* add whitespace

* V0.2 vis dump feature enhancement. (#190)

* Dumps added manifest file.
* Code updated format by flake8
* Changed manifest file format for easy reading.

* deploy info update; docs update

* weird white space

* Update dashboard_visualization.md

* new endline?

* delete dependency

* delete irrelevant file

* change scenario to enum, divide file path into a separated class

* doc refine

* doc update

* params type

* data structure update

* doc&enum, formula refine

* refine

* add ut, refine doc

* style refine

* isort

* strong type fix

* os._exit delete

* revert datalib

* import new line

* change test case

* change file name & doc

* change deploy path

* delete params

* revert file

* delete duplicate file

* delete single process

* update naming

* manually change import order

* delete blank

* edit error

* requirement txt

* style fix & refine

* comments&docstring refine

* add parameter name

* test & dump

* comments update

* Added manifest file. (#201)

Only a few changes that need to meet requirements of manifest file format.

* comments fix

* delete toolkit change

* doc update

* citi bike update

* deploy path

* datalib update

* revert datalib

* revert

* maro file format

* comments update

* doc update

* update param name

* doc update

* new link

* image update

* V0.2 visualization-0.1 (#181)

* visualization 0.1

* render html title function

* flake-8 style fix

* style fixed

* tab delete

* white space fix

* white space fix-2

* vis redundant function delete

* refine

* pr refine

* isort fix

* white space

* lint error

* \n error

* test continuation

* indent

* continuation of indent

* indent 0.3

* comment update

* comment update 0.2

* f-string update

* f-string 0.2

* lint 0.3

* lint 0.4

* lint 0.4

* lint 0.5

* lint 0.6

* docstring update

* data version deploy update

* condition update

* add whitespace

* deploy info update; docs update

* weird white space

* Update dashboard_visualization.md

* new endline?

* delete dependency

* delete irrelevant file

* change scenario to enum, divide file path into a separated class

* fix the visualization of docs/key_components/distributed_toolkit

* doc refine

* doc update

* params type

* add examples into isort ignore

* data structure update

* doc&enum, formula refine

* refine

* add ut, refine doc

* style refine

* isort

* strong type fix

* os._exit delete

* revert datalib

* import new line

* change test case

* change file name & doc

* change deploy path

* delete params

* revert file

* delete duplicate file

* delete single process

* update naming

* manually change import order

* delete blank

* edit error

* requirement txt

* style fix & refine

* comments&docstring refine

* add parameter name

* test & dump

* comments update

* comments fix

* delete toolkit change

* doc update

* citi bike update

* deploy path

* datalib update

* revert datalib

* revert

* maro file format

* comments update

* doc update

* update param name

* doc update

* new link

* image update

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>

* image change

* add reset snapshot

* delete dump

* add new line

* add next steps

* import change

* relative import

* add init file

* import change

* change utils file

* change cliexception to clierror

* dashboard test

* change result

* change assertion

* move not

* unit test change

* core change

* unit test delete name_mapping_file

* update cim business engine

* doc update

* change relative path

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* change import sequence

* comments update

* doc add pic

* add dependency

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* Update dashboard_visualization.rst

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* delete white space

* doc update

* doc update

* update doc

* update doc

* update doc

Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* V0.2 docs process mode (#230)

* Update process mode docs and fixed on premises

* Update orchestration docs

* Update process mode docs add JOB_NAME as env variable

* fixed bugs

* fixed isort issue

* update docs index

Co-authored-by: kaiqli <v-kaiqli@microsoft.com>

* V0.2 learning model refinement (#236)

* moved optimizer options to LearningModel

* typo fix

* fixed lint issues

* updated notebook

* misc edits

* 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook

* renamed single_host_cim_learner to cim_learner in notebook

* updated notebook output

* typo fix

* removed dimension check in absence of shared stack

* fixed a typo

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Update vm docs (#241)

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* V0.2 info update (#240)

* update readme

* update version

* refine readme format

* add vis gif

* add citation

* update citation

* update badge

Co-authored-by: Arthur Jiang <sjian@microsoft.com>

* Fix typo (#242)

* Fix typo

* fix typo

* fix

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

* doc update

Co-authored-by: Arthur Jiang <sjian@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
Co-authored-by: Romic Huang <romic.kid@gmail.com>
Co-authored-by: zhanyu wang <pocket_2001@163.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com>
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>

* bug fix related to np array divide (#245)

Co-authored-by: ysqyang <v-yangqi@microsoft.com>

* Master.simple bike (#250)

* notebook for simple bike repositioning added

* add simple rule-based algorithms

* unify input

* add policy based on statistics

* update be for simple bike scenario to fit latest event buffer changes (#247)

* change rendered graph

* figures updated

* change notebook

* matplot updated

* figures updated

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: wesley <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>

* simple bike repositioning article: formula updated

* checked out docs/source from v0.2

* aligned with v0.2

* rm unwanted import

* added references in policy_optimization.py

* fixed lint issues

Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com>
Co-authored-by: Arthur Jiang <sjian@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
Co-authored-by: Romic Huang <romic.kid@gmail.com>
Co-authored-by: zhanyu wang <pocket_2001@163.com>
Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com>
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>

* V0.2 backend dynamic node support (#172)

* update lint workflow

* fix workflow issue

* Update lint.yml

* Create tox.ini

* Update lint.yml

* Update lint.yml

* Update tox.ini

* Update lint.yml

* Delete tox.ini from root folder, move it to .github/linters

* Update CONTRIBUTING.md

* add more comments

* update lint conf to ignore cli banner issue

* change extension implementation from c to cpp

* update script to gen cpp files

* backend base interface redefine

* interface revamp for np backend

* 1st step for revamp

* bug fix

* draft

* implementation of attribute

* implementation of backend

* remove backend switching

* draft raw backend wrapper

* correct function parameter type

* 1st runnable version

* bug fix for types

* ut passed

* change CRLF to LF

* fix get_node_info interface

* add raw test in frame ut

* return np.array for all query result

* use ticks from backend

* set init value

* snapshot ut passed

* support set default backend by environment variable

* env ut with different backend

* fix take snapshot index bug

* test under both backends

* ignore generated cpp file

* fix lint issues

* more lint fix

* use ordered map to store ticks to keep the order

* remove test code

* refine dup code

* refine code to avoid too much if/else

* handle and raise exception for attr getter

* change the way to handle cpp exception, use cython runtimeerror instead

* add missing function, and fix bug in np impl

* fix lint issue

* specify c++11 flag for compilers

* use normal field assignment instead initializer list, as linux gcc will complain it

* add np ignore macro

* try to refine token pasting operator to avoid error on linux

* more pasting operator issue fix

* remove un-used options

* update workflow files to fit new backend

* 1st version of dynamic backend structure

* setup ut for cpp using lest

* bitset complete

* attributestore and ut

* arrange

* copy_to

* current frame

* ut for frame

* bug fix and ut correct

* fix issue that value not correct after arrange

* fix bug in test case

* frame update

* change the way to add nodes, support add node from middle

* frame in backend

* snapshotlist code complete

* add size method for snapshotlist, add ut template

* make sure snapshot max size not be 0

* add max size

* fix query parameters

* fix attribute store extend error

* add function to retrieve attribute from snapshotlist

* return nan for invalid index

* add function to check if nan for float attribute only

* fix bug that not update _last_tick for snapshot list, that cause take snapshot for same tick crash

* add functions to expose internal state under debug mode, make it easy to do unit test

* fix issue that cause overlap logic skipped

* ut passed for all implemented functions

* remove query in ut, as it not completed yet

* refine querying interfaces, use 2 functions for 1 querying

* snapshot query,

* use pointer instead weak_ptr

* backend impl

* set default parameters value

* query bug fix,

* bug fix: new_attr should return attr id not node id

* use macro to create attribute getters

* add reset support

* change the way to reset, avoid allocation time

* test reset for attributestore

* use Bitset instead vector<bool> to make it easy to reset

* refine backend interfaces to make it compact with old one

* correct querying interface, cython compile passed

* bug fix: get_ticks not set correct index

* correct cpp backend binding, add type for frame

* correct ut for snapshot

* bug fix: query cause crash after snapshot reset

* fix env test

* bug fix: is_nan should check data type first

* fix cim ut issues with raw backend

* fix citibike ut issues for raw backend

* add interfaces to support dynamic nodes, not tested

* bug fix: access cpp object without cdef

* bug fix: missing impl for dynamic methods

* ut for append nodes

* return node number dynamically

* remove unused parameters for snapshot

* remove unused code

* allow get attribute for deleted node

* ut for delete and resume node

* function to set attribute slot

* bug fix: set attribute will cause crash

* bug fix: remove append node when reset cause exception

* bug fix: frame.backend_type return incorrect name

* backends performance comparison

* correct internal type

* correct warnings

* missing ;

* formatting

* fix lint issue

* simplify the way to copy mapping

* add dump interfaces

* frame dump

* ignore if dump path is not exist

* bug fix: use max slots instead of current slots for padding in snapshot querying

* use max slot number in history instead of current for padding

* dump for snapshot

* close file at the end

* refine snapshot dump function

* fix lint issue

* avoid too much allocate operation

* use pointer instead reference for further changes

* avoid 2 times map copy

* add comments for missing functions

* performance optimize

* use emplace instead push

* use emplace instead push

* remove cpp files

* add missing license

* ignore .vs folder

* add lest license for cpp unittest

* Delete CMakeLists.txt

* add error msg for exception, make it easy to identify error at python side

* remove old codes

* replace with new code

* change IDENTIFIER to NODE_TYPE and ATTR_TYPE

* build pass

* fix attr type not correct bug

* remove unused comment

* make frame ut pass

* correct the max snapshots checking

* fix test case

* add missing file

* correct performance test

* refine attribute code

* refine bitset code

* update FrameBase doc about switch backend

* correct the exception name

* refine frame code

* refine node code

* refine snapshot list code

* add is_const and is_list when adding attribute

* support query const attribute without tick exist

* add operations for list attribute

* remove cache as we have list attribute

* add remove and insert for list attribute

* add for-loop support for list attribute

* fix bug that not update list attribute slot number after operations

* test for dynamic features

* frame dump

* dump for snapshot list

* fix issue on gcc compiler

* add missing file

* fix lint issues

* refine the exception, more comments

* fix lint issue

* fix lint issue

* use simulate enum instead of str

* Use new type instead old in tests

* using mapping instead if-else

* remove generated code

* use mapping to reduce too much if-else

* add default attribute type int if not provided or invalid provided

* remove generated code

* update workflow with code gen

* more frame test

* add missing files

* test: cover maro.simulator.utils.common

* update test with new scenario

* comments

* tests

* update doc

* fix lint and comments

* CRLF to LF

* fix lint issue

Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

* V0.2 vm oversub docs (#256)

* Remove topology

* Update pipeline

* Update pipeline

* Update pipeline

* Modify metafile

* Add two attributes of VM

* Update pipeline

* Add vm category

* Add todo

* Add oversub config

* Add oversubscription feature

* Lint fix

* Update based on PR comment.

* Update pipeline

* Update pipeline

* Update config.

* Update based on PR comment

* Update

* Add pm sku feature

* Add sku setting

* Add sku feature

* Lint fix

* Lint style

* Update sku, overloading

* Lint fix

* Lint style

* Fix bug

* Modify config

* Remove sku and replace it with pm type

* Add and refactor vm category

* Comment out config

* Unify the enum format

* Fix lint style

* Fix import order

* Update based on PR comment

* Update overload to the VM docs

* Update docs

* Update vm docs

Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>

Co-authored-by: Arthur Jiang <sjian@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
Co-authored-by: Romic Huang <romic.kid@gmail.com>
Co-authored-by: zhanyu wang <pocket_2001@163.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com>
Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>
Co-authored-by: Kuan Wei Yu <v-kyu@microsoft.com>

2021-01-25 18:23:41 +08:00

test_bike_scenario.py

V0.2 update (#262)

2021-01-25 18:23:41 +08:00