maro/tests/utils.py

68 строки
1.6 KiB
Python
Исходник Постоянная ссылка Обычный вид История

2020-09-18 06:02:37 +03:00
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from maro.event_buffer import EventBuffer, EventState
V0.1 improve code quality with isort (#186) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.1 issues in CLI (#157) * fix: add docs for AzCopy * style: refine code style * fix: change stdout format to previous version * test: add dqn tests in grass/k8s mode && add "maro k8s job list" * docs: add scripts for installing AzCopy * v0.1 update citi bike doc and data version (#160) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * update data pipeline docs and data version * update for comments * update for comments * update for lint * update for lint * update for lint * V0.1revert data version to 0.1 (#164) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * update data pipeline docs and data version * update for comments * update for comments * update for lint * update for lint * update for lint * revert data version to 0.1 * initalize gnn network for CIM problem (#60) * initalize gnn netowrk for ECR problem * fix import path error and remove useless function * add the annotation for gnn and state shaper * rename to cim * modify topology 22p * polish coding style in cim.gnn * run pass after polishing * change the name of simplegat to simple_transformer * fix typo and suggestions * fix format * fix code bugs due to refactoring mistake * refine the code format * refine again * refine the format * refine again * refine again * fix syntax bug * refine again * remove useless comments * remove unnecessary blank line * rename pending_event to decision_event * remove some code faced to the future. * change topology * fix bugs in learner: no training logic executed * revert the topology changes Co-authored-by: wenshi-113 <wenshi@microsoft.com> * V0.1 store unpicklable bug fix (#166) * fixed unpicklable store bug * fixed a bug * fixed a bug * fixed a bug * fixed lint formatting * fixed a lint formatting issue * fixed a bug * renamed dump_experience_store to dump_experience_pool * fixed a PR comment Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.1 confusing component naming fix (#167) * confusing component naming fix * fixed a bug * fixed a bug * fixed lint formatting issues * fixed a PR comment * replaced 'actor_worker' with 'actor' * fixed a minor formatting issue * fixed a minor formatting issue * fixed a minor formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * version updated to 0.1.2a0 * modified according to flake8 * update according to lint checker * update import format according to lint checker * update according to lint checker * isort config added * add isort config file path in lint * update import according to lint isort * update import according to lint isort * skip file added for isort linter * filter files option added for isort * V0.1 reformat imports with isort (#184) * init auto-isort * re-format maro init * redis added to isort known third party * update according to lint isort * update according to lint isort * update according to lint * update import path for communication module * update import path for cli scripts module * update according to lint * filter file option added for isort in CONTRIBUTING * update import path for cli utils and communication * update import path for tests * remove filter file option in CONTRIBUTING Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * update according to isort Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: WesleyStone <4548088+wesley-stone@users.noreply.github.com> Co-authored-by: wenshi-113 <wenshi@microsoft.com>
2020-11-05 12:53:04 +03:00
from maro.simulator.scenarios import AbsBusinessEngine
V0.2 update (#262) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * syntax fix (#253) * syntax fix * syntax fix * syntax fix * rm unwanted import Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm oversubscription (#246) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 vm scheduling decision event (#257) * Fix data preparation bug * Add frame index * V0.2 PG, K-step and lambda return utils (#155) * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * merged with v0.2_embedded_optims * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * revised * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * revised code based on revised abstractions * fixed some bugs * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added shared_module property to LearningModel * added shared_module property to LearningModel * fixed a bug with k-step return in AC * fixed a bug * fixed a bug * merged pg, ac and ppo examples * fixed a bug * fixed a bug * fixed naming for ppo * renamed some variables in PPO * added ActionWithLogProbability return type for PO-type algorithms * fixed a bug * fixed a bug * fixed lint issues * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * formatting * fixed formatting * removed unnecessary comma * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * updated cim PO example code according to changes in maro/rl * removed early stopping from CIM dqn example * combined ac and ppo and simplified example code and config * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * put PG and AC under PolicyOptimization class and refined examples accordingly * fixed lint issues * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * updated cim example for policy optimization * typo fix * typo fix * typo fix * typo fix * misc edits * minor edits to rl_toolkit.rst * checked out docs from master * fixed typo in k-step shaper * fixed lint issues * bug fix in store * lint issue fix * changed default max_ep to 100 for policy_optimization algos * vis doc update to master (#244) * refine readme * feat: refine data push/pull (#138) * feat: refine data push/pull * test: add cli provision testing * fix: style fix * fix: add necessary comments * fix: from code review * add fall back function in weather download (#112) * fix deployment issue in multi envs * fix typo * fix ~/.maro not exist issue in build * skip deploy when build * update for comments * temporarily disable weather info * replace ecr with cim in setup.py * replace ecr in manifest * remove weather check when read data * fix station id issue * fix format * add TODO in comments * add noaa weather source * fix weather reset and weather comment * add comment for weather data url * some format update * add fall back function in weather download * update comment * update for comments * update comment * add period * fix for pylint * update for pylint check * added example docs (#136) * added example docs * added citibike greedy example doc * modified citibike doc * fixed PR comments * fixed more PR comments * fixed small formatting issue Co-authored-by: ysqyang <v-yangqi@microsoft.com> * switch the key and value of handler_dict in decorator (#144) * switch the key and value of handler_dict in decorator * add dist decorator UT and fixed multithreading conflict in maro test suite * pr comments update. * resolved comments about decorator UT * rename handler_fun in dist decorator * change self.attr into class_name.attr * update UT tests comments * V0.1 annotation (#147) * refine the annotation of simulator core * remove reward from env(be) * format refined * white spaces test * left-padding spaces refined * format modifed * update the left-padding spaces of docstrings * code format updated * update according to comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * update according to flake8 * re-formatting after merged upstream. * Updated import section. * Updated import section. * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * update param name * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * doc update * new link * image update * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * image change * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * bug fix related to np array divide (#245) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Master.simple bike (#250) * notebook for simple bike repositioning added * add simple rule-based algorithms * unify input * add policy based on statistics * update be for simple bike scenario to fit latest event buffer changes (#247) * change rendered graph * figures updated * change notebook * matplot updated * figures updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: wesley <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * simple bike repositioning article: formula updated * checked out docs/source from v0.2 * aligned with v0.2 * rm unwanted import * added references in policy_optimization.py * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com> Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 backend dynamic node support (#172) * update lint workflow * fix workflow issue * Update lint.yml * Create tox.ini * Update lint.yml * Update lint.yml * Update tox.ini * Update lint.yml * Delete tox.ini from root folder, move it to .github/linters * Update CONTRIBUTING.md * add more comments * update lint conf to ignore cli banner issue * change extension implementation from c to cpp * update script to gen cpp files * backend base interface redefine * interface revamp for np backend * 1st step for revamp * bug fix * draft * implementation of attribute * implementation of backend * remove backend switching * draft raw backend wrapper * correct function parameter type * 1st runable version * bug fix for types * ut passed * change CRLF to LF * fix get_node_info interface * add raw test in frame ut * return np.array for all query result * use ticks from backend * set init value * snapshot ut passed * support set default backend by environemnt variable * env ut with different backend * fix take snapshot index bug * test under both backends * ignore generated cpp file * fix lint isues * more lint fix * use ordered map to store ticks to keep the order * remove test code * refine dup code * refine code to avoid too much if/else * handle and raise exception for attr getter * change the way to handle cpp exception, use cython runtimeerror instead * add missing function, and fix bug in np impl * fix lint issue * specify c++11 flag for compilers * use normal field assignment instead initializer list, as linux gcc will complain it * add np ignore macro * try to refine token pasting operator to avoid error on linux * more pasting operator issue fix * remove un-used options * update workflow files to fit new backend * 1st version of dynamic backend structure * setup ut for cpp using lest * bitset complete * attributestore and ut * arrange * copy_to * current frame * ut for frame * bug fix and ut correct * fix issue that value not correct after arrange * fix bug in test case * frame update * change the way to add nodes, support add node from middle * frame in backend * snapshotlist code complete * add size method for snapshotlist, add ut template * make sure snapshot max size not be 0 * add max size * fix query parameters * fix attribute store extend error * add function to retrieve attribute from snapshotlist * return nan for invalid index * add function to check if nan for float attribute only * fix bug that not update _last_tick for snapshot list, that cause take snapshot for same tick crash * add functions to expose internal state under debug mode, make it easy to do unit test * fix issue that cause overlap logic skiped * ut passed for all implemented functions * remove query in ut, as it not completed yet * refine querying interfaces, use 2 functions for 1 querying * snapshot query, * use pointer instead weak_ptr * backend impl * set default parameters value * query bug fix, * bug fix: new_attr should return attr id not node id * use macro to create attribute getters * add reset support * change the way to reset, avoid allocation time * test reset for attributestore * use Bitset instead vector<bool> to make it easy to reset * refine backend interfaces to make it compact with old one * correct quering interface, cython compile passed * bug fix: get_ticks not set correct index * correct cpp backend binding, add type for frame * correct ut for snapshot * bug fix: query cause crash after snapshot reset * fix env test * bug fix: is_nan should check data type first * fix cim ut issues with raw backend * fix citibike ut issues for raw backend * add interfaces to support dynamic nodes, not tested * bug fix: access cpp object without cdef * bug fix: missing impl for dynamic methods * ut for append nodes * return node number dynamiclly * remove unused parameters for snapshot * remove unused code * allow get attribute for deleted node * ut for delete and resume node * function to set attribute slot * bug fix: set attribute will cause crash * bug fix: remove append node when reset cause exception * bug fix: frame.backend_type return incorrect name * backends performance comparison * correct internal type * correct warnings * missing ; * formating * fix lint issue * simple the way to copy mapping * add dump interfaces * frame dump * ignore if dump path is not exist * bug fix: use max slots instead of current slots for padding in snapshot querying * use max slot number in history instead of current for padding * dump for snapshot * close file at the end * refine snapshot dump function * fix lint issue * avoid too much allocate operation * use pointer instead reference for furthure changes * avoid 2 times map copy * add comments for missing functions * performance optimize * use emplace instead push * use emplace instead push * remove cpp files * add missing lisence * ignore .vs folder * add lest lisence for cpp unittest * Delete CMakeLists.txt * add error msg for exception, make it easy to identify error at python side * remove old codes * replace with new code * change IDENTIER to NODE_TYPE and ATTR_TYPE * build pass * fix attr type not correct bug * reomve unused comment * make frame ut pass * correct the max snapshots checking * fix test case * add missing file * correct performance test * refine attribute code * refine bitset code * update FrameBase doc about switch backend * correct the exception name * refine frame code * refine node code * refine snapshot list code * add is_const and is_list when adding attribute * support query const attribute without tick exist * add operations for list attribute * remove cache as we have list attribute * add remove and insert for list attribute * add for-loop support for list attribute * fix bug that not update list attribute slot number after operations * test for dynamic features * frame dump * dump for snapshot list * fix issue on gcc compiler * add missing file * fix lint issues * refine the exception, more comments * fix lint issue * fix lint issue * use simulate enum instead of str * Use new type instead old in tests * using mapping instead if-else * remove generated code * use mapping to reduce too much if-else * add default attribute type int if not provided or invalid provided * remove generated code * update workflow with code gen * more frame test * add missing files * test: cover maro.simulator.utils.common * update test with new scenario * comments * tests * update doc * fix lint and comments * CRLF to LF * fix lint issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 vm oversub docs (#256) * Remove topology * Update pipeline * Update pipeline * Update pipeline * Modify metafile * Add two attributes of VM * Update pipeline * Add vm category * Add todo * Add oversub config * Add oversubscription feature * Lint fix * Update based on PR comment. * Update pipeline * Update pipeline * Update config. * Update based on PR comment * Update * Add pm sku feature * Add sku setting * Add sku feature * Lint fix * Lint style * Update sku, overloading * Lint fix * Lint style * Fix bug * Modify config * Remove sky and replaced it by pm stype * Add and refactor vm category * Comment out cofig * Unify the enum format * Fix lint style * Fix import order * Update based on PR comment * Update overload to the VM docs * Update docs * Update vm docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: zhanyu wang <pocket_2001@163.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com> Co-authored-by: Kuan Wei Yu <v-kyu@microsoft.com>
2021-01-25 13:23:41 +03:00
backends_to_test = ["static", "dynamic"]
2020-09-18 06:02:37 +03:00
def next_step(eb: EventBuffer, be: AbsBusinessEngine, tick: int):
if tick > 0:
# lets post process last tick first before start a new tick
is_done = be.post_step(tick - 1)
if is_done:
return True
be.step(tick)
pending_events = eb.execute(tick)
if len(pending_events) != 0:
for evt in pending_events:
evt.state = EventState.FINISHED
eb.execute(tick)
be.frame.take_snapshot(tick)
return False
V0.2 (#239) * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com>
2021-01-04 14:49:05 +03:00
2020-09-18 06:02:37 +03:00
def be_run_to_end(eb, be):
is_done = False
tick = 0
while not is_done:
is_done = next_step(eb, be, tick)
V0.2 (#239) * Event payload details for env.summary (#156) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * V0.2 online lp for citi bike (#159) * key_list of events added for env.summary * code refined according to lint * 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments * code format refined * try trigger the git tests * update github workflow * online LP example added for citi bike * infeasible solution * infeasible solution fixed: call snapshot before any env.step() * experiment results of toy topos added * experiment results of toy topos added * experiment result update: better than naive baseline * PuLP version added * greedy experiment results update * citibike result update * modified according to PR comments * update experiment results and forecasting comparison * citi bike lp README updated * README updated * modified according to PR comments * update according to PR comments Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * update according to flake8 * V0.2 Logical operator overloading for EarlyStoppingChecker (#178) * 1. added logical operator overloading for early stopping checker; 2. added mean value checker * fixed PR comments * removed learner.exit() in single_process_launcher * added another early stopping checker in example * fixed PR comments and lint issues * lint issue fix * fixed lint issues * fixed a bug * fixed a bug Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 skip connection (#176) * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * moved reward type casting to exp shaper Co-authored-by: ysqyang <v-yangqi@microsoft.com> * fixed a bug in learner's test() (#193) Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 double dqn (#188) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * set is_double to true in DQN config Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature predefined image (#183) * feat: support predefined image provision * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com> * V0.2 feature proxy rejoin (#158) * update dist decorator * replace proxy.get_peers by proxy.peers * update proxy rejoin (draft, not runable for proxy rejoin) * fix bugs in proxy * add message cache, and redesign rejoin parameter * feat: add checkpoint with test * update proxy.rejoin * fixed rejoin bug, rename func * add test example(temp) * feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents. * capital env vari name * rm json.dumps; change retries to 10; temp add warning level for rejoin * fix: unable to load FaultToleranceAgent, missing params * fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent * feat: add node_id to node_details * fix: add a new dependency for tests * style: meet linting requirements * style: remaining linting problems * lint fixed; rm temp test folder. * fixed lint f-string without placeholder * fix: add a flag for "remove_container", refine restart logic and Redis keys naming * proxy rejoin update. * variable rename. * fixed lint issues * fixed lint issues * add exit code for different error * feat: add special errors handler * add max rejoin times * remove unused import * add rejoin UT; resolve rejoin comments * lint fixed * fixed UT import problem * rm MessageCache in proxy * fix: refine key naming * update proxy rejoin; add topic for broadcast * feat: support predefined image provision * update UT for communication * add docstring for rejoin * fixed isort and zmq driver import * fixed isort and UT test * fix isort issue * proxy rejoin update (comments v2) * fixed isort error * style: fix linting errors * style: fix linting errors * style: fix linting errors * style: fix linting errors * feat: add exists method for checkpoint * fix: error scripts invocation after using relative import * fix: missing init.py * fixed a bug in learner's test() * add driver close and socket SUB disconnect for rejoin * feat: add distributed_config for dqn example * test: update test for grass * test: update test for k8s * feat: add promptings for steps * fix: change relative imports to absolute imports * fixed comments and update logger level * mv driver in proxy.__init__ for issue temp fixed. * Update docstring and comments * style: fix code reviews problems * fix code format Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 feature cli windows (#203) * fix: change local mkdir to os.makedirs * fix: add utf8 encoding for logger * fix: add powershell.exe prefix to subprocess functions * feat: add debug_green * fix: use fsutil to create fix-size files in Windows * fix: use universal_newlines=True to handle encoding problem in different operating systems * fix: use temp file to do copy when the operating system is not Linux * fix: linting error * fix: use fsutil in test_k8s.py * feat: dynamic init ABS_PATH in GlobalParams * fix: use -Command to execute Powershell command * fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode * fix: problems in code review * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * V0.2 merge master (#214) * fix the visualization of docs/key_components/distributed_toolkit * add examples into isort ignore * refine import path for examples (#195) * refine import path for examples * refine indents * fixed formatting issues * update code style * add editorconfig-checker, add editorconfig path into lint, change super-linter version * change path for code saving in cim.gnn Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> * fix issue that sometimes there is conflict between distutils and setuptools (#208) * fix issue that cython and setuptools conflict * follow the accepted temp workaround * update comment, it should be conflict between setuptools and distutils * fixed bugs related to proxy interface changes Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * typo fix * Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215) * bug fix * clear the reference after extract sub events, update ut to cover this issue Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * fix flake8 style problem * V0.2 feature refine mode namings (#212) * feat: refine cli exception * feat: refine mode namings * EventBuffer refine (#197) * merge uniform event changes back * 1st step: move executing events into stack for better removing performance * flush event pool * typo * add option for env to enable event pool * refine stack functions * fix comment issues, add typings * lint fixing * lint fix * add missing fix * linting * lint * use linked list instead original event list and execute stack * add missing file * linting, and fixes * add missing file * linting fix * fixing comments * add missing file * rename event_list to event_linked_list * correct import path * change enable_event_pool to disable_finished_events * add missing file * fixed bugs in dist rl * feat: rename files * tests: set longer gracefully wait time * style: fix linting errors * style: fix linting errors * style: fix linting errors * fix: rm redundant variables * fix: refine error message Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vis new (#210) Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> * V0.2 local host process (#221) * Update local process (not ready) * update cli process mode * add setup/clear/template for maro process * fix process stop * add logger and rename parameters * add logger for setup/clear * fixed close not exist pid when given pid list. * Fixed comments and rename setup/clear with create/delete * update ProcessInternalError * V0.2 grass on premises (#220) * feat: refine cli exception * commit on v0.2_grass_on_premises Co-authored-by: Lyuchun Huang <romic.kid@gmail.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 vm scheduling scenario (#189) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Resolve none action problem (#224) * V0.2 vm_scheduling notebook (#223) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * Refine cpu reader and unittest * Lint update * Refine based on PR comment * Add agent index * Add node maping * Init vm shceduling notebook * Add notebook * Refine based on PR comments * Renaming postpone_step * Renaming and refine based on PR comments * Rename config * Update based on the v0.2_datacenter * Update notebook * Update * update filepath * notebook updated Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * Update process mode docs and fixed on premises (#226) * V0.2 Add github workflow integration (#222) * test: add github workflow integration * fix: split procedures && bug fixed * test: add training only restriction * fix: add 'approved' restriction * fix: change default ssh port to 22 * style: in one line * feat: add timeout for Subprocess.run * test: change default node_size to Standard_D2s_v3 * style: refine style * fix: add ssh_port param to on-premises mode * fix: add missing init.py * V0.2 explorer (#198) * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * added noise explorer * fixed formatting * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * removed epsilon parameter from choose_action * fixed some PR comments * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * refined dqn example * fixed lint issues * simplified scheduler * removed early stopping from CIM dqn example * removed early stopping from cim example config * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 embedded optim (#191) * added dueling action value model * renamed params in dueling_action_value_model * renamed shared_features to features * replaced IdentityLayers with nn.Identity * 1. added skip connection option in FC_net; 2. generalized learning model * added skip_connection option in config * removed type casting in fc_net * fixed lint formatting issues * refined docstring * mv dueling_actiovalue_model and fixed some bugs * added multi-head functionality to LearningModel * refined learning model docstring * added head_key param in learningModel forward * added double DQN and dueling features to DQN * fixed a bug * added DuelingQModelHead enum * fixed a bug * removed unwanted file * fixed PR comments * added top layer logic and is_top option in fc_net * fixed a bug * fixed a bug * reverted some changes in learning model * reverted some changes in learning model * added members to learning model to fix the mode issue * fixed a bug * fixed mode setting issue in learning model * fixed PR comments * revised cim example according to DQN changes * renamed eval_model to q_value_model in cim example * more fixes * fixed a bug * fixed a bug * added doc per PR comments * removed learner.exit() in single_process_launcher * removed learner.exit() in single_process_launcher * fixed PR comments * fixed rl/__init__ * fixed issues in example * fixed a bug * fixed a bug * fixed lint formatting issues * double DQN feature * fixed a bug * fixed a bug * fixed PR comments * fixed lint issue * embedded optimizer into SingleHeadLearningModel * 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm * added load_models in simple_learner * minor docstring edits * minor docstring edits * minor docstring edits * mv optimizer options inside LearningMode * modified example accordingly * fixed a bug * fixed a bug * fixed a bug * added dueling DQN feature * revised and refined docstrings * fixed a bug * fixed lint issues * added load/dump functions to LearningModel * fixed a bug * fixed a bug * fixed lint issues * refined DQN docstrings * removed load/dump functions from DQN * added task validator * fixed decorator use * fixed a typo * fixed a bug * fixed lint issues * changed LearningModel's step() to take a single loss * revised learning model design * revised example * fixed a bug * fixed a bug * fixed a bug * fixed a bug * added decorator utils to algorithm * fixed a bug * renamed core_model to model * fixed a bug * 1. fixed lint formatting issues; 2. refined learning model docstrings * rm trailing whitespaces * added decorator for choose_action * fixed a bug * fixed a bug * fixed version-related issues * renamed add_zeroth_dim decorator to expand_dim * overhauled exploration abstraction * fixed a bug * fixed a bug * fixed a bug * added exploration related methods to abs_agent * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * separated learning with exploration schedule and without * small fixes * moved explorer logic to actor side * fixed a bug * fixed a bug * fixed a bug * fixed a bug * removed unwanted param from simple agent manager * small fixes * added shared_module property to LearningModel * added shared_module property to LearningModel * revised __getstate__ for LearningModel * fixed a bug * added soft_update function to learningModel * fixed a bug * revised learningModel * rm __getstate__ and __setstate__ from LearningModel * added noise explorer * fixed formatting * removed unnecessary comma * removed unnecessary comma * fixed PR comments * removed unwanted exception and imports * removed unwanted exception and imports * fixed a bug * fixed PR comments * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issue * fixed a bug * fixed lint issue * fixed naming * combined exploration param generation and early stopping in scheduler * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * moved logger inside scheduler * fixed a bug * fixed a bug * fixed a bug * fixed lint issues * fixed lint issue * removed epsilon parameter from choose_action * removed epsilon parameter from choose_action * changed agent manager's train parameter to experience_by_agent * fixed some PR comments * renamed zero_grad to zero_gradients in LearningModule * fixed some PR comments * bug fix * bug fix * bug fix * removed explorer abstraction from agent * added DEVICE env variable as first choice for torch device * refined dqn example * fixed lint issues * removed unwanted import in cim example * updated cim-dqn notebook * simplified scheduler * edited notebook according to merged scheduler changes * refined dimension check for learning module manager and removed num_actions from DQNConfig * bug fix for cim example * added notebook output * removed early stopping from CIM dqn example * removed early stopping from cim example config * moved decorator logic inside algorithms * renamed early_stopping_callback to early_stopping_checker * removed action_dim from noise explorer classes and added some shape checks * modified NoiseExplorer's __call__ logic to batch processing * made NoiseExplorer's __call__ return type np array * renamed update to set_parameters in explorer * fixed old naming in test_grass Co-authored-by: ysqyang <v-yangqi@microsoft.com> * V0.2 VM scheduling docs (#228) * Initialize * Data center scenario init * Code style modification * V0.2 event buffer subevents expand (#180) * V0.2 rl toolkit refinement (#165) * refined rl abstractions * fixed formattin issues * checked out error-code related code from v0.2_pg * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * fixed a bug * renamed save_models to dump_models * 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving * renamed dump_experience_store to dump_experience_pool * fixed a bug in the dump_experience_pool method * fixed some PR comments * fixed more PR comments * 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class * fixed cim example according to rl toolkit changes * fixed some more PR comments * rewrote multi_process_launcher to eliminate the distributed section in config * 1. fixed a typo; 2. added logging before early stopping * fixed a bug * fixed a bug * fixed a bug * added early stopping feature to CIM exmaple * fixed a typo * fixed some issues with early stopping * changed early stopping metric func * fixed a bug * fixed a bug * added early stopping to dist mode cim * added experience collecting func * edited notebook according to changes in CIM example * fixed bugs in nb * fixed lint formatting issues * fixed a typo * fixed some PR comments * fixed more PR comments * revised docs * removed nb output * fixed a bug in simple_learner * fixed a typo in nb * fixed a bug * fixed a bug * fixed a bug * removed unused import * fixed a bug * 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing * fixed some doc issues * added output to nb Co-authored-by: ysqyang <v-yangqi@microsoft.com> * unfold sub-events, insert after parent * remove event category, use different class instead, add helper functions to gen decision and action event * add a method to support add immediate event to cascade event with tick validation * fix ut issue * add action as 1st sub event to ensure the executing order Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Data center scenario update * Code style update * Data scenario business engine update * Isort update * Fix lint code check * Fix based on PR comments. * Update based on PR comments. * Add decision payload * Add config file * Update utilization series logic * Update based on PR comment * Update based on PR * Update * Update * Add the ValidPm class * Update docs string and naming * Add energy consumption * Lint code fixed * Refining postpone function * Lint style update * Init data pipeline * Update based on PR comment * Add data pipeline download * Lint style update * Code style fix * Temp update * Data pipeline update * Add aria2p download function * Update based on PR comment * Update based on PR comment * Update based on PR comment * Update naming of variables * Rename topology * Renaming * Fix valid pm list * Pylint fix * Update comment * Update docstring and comment * Fix init import * Update tick issue * fix merge problem * update style * V0.2 datacenter data pipeline (#199) * Data pipeline update * Data pipeline update * Lint update * Update pipeline * Add vmid mapping * Update lint style * Add VM data analytics * Update notebook * Add binary converter * Modift vmtable yaml * Update binary meta file * Add cpu reader * random example added for data center * Fix bugs * Fix pylint * Add launcher * Fix pylint * best fit policy added * Add reset * Add config * Add config * Modify action object * Modify config * Fix naming * Modify config * Add snapshot list * Modify a spelling typo * Update based on PR comments. * Rename scenario to vm scheduling * Rename scenario * Update print messages * Lint fix * Lint fix * Rename scenario * Modify the calculation of cpu utilization * Add comment * Modify data pipeline path * Fix typo * Modify naming * Add unittest * Add comment * Unify naming * Fix data path typo * Update comments * Update snapshot features * Add take snapshot * Add summary keys * Update cpu reader * Update naming * Add unit test * Rename snapshot node * Add processed data pipeline * Modify config * Add comment * Lint style fix Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> * Add package used in vm_scheduling * add aria2p to test requirement * best fit example: update the usage of snapshot * Add aria2p to test requriement * Remove finish event * Fix unittest * Add test dataset * Update based on PR comment * vm doc init * Update docs * Update docs * Update docs * Update docs * Remove old notebook * Update docs * Update docs * Add figure * Update docs Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * v0.2 VM Scheduling docs refinement (#231) * Fix typo * Refining vm scheduling docs * V0.2 store refinement (#234) * updated docs and images for rl toolkit * 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Fix bug (#237) vm scenario: fix the event type bug of the postpone event * V0.2 rl toolkit doc (#235) * updated docs and images for rl toolkit * updated cim example doc * updated cim exmaple docs * updated cim example rst * updated rl_toolkit and cim example docs * replaced q_module with q_net in example rst * refined doc * refined doc * updated figures * updated figures Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Merge V0.2 vis into V0.2 (#233) * Implemented dump snapshots and convert to CSV. * Let BE supports params when dump snapshot. * Refactor dump code to core.py * Implemented decision event dump. * replace is not '' with !='' * Fixed issues that code review mentioned. * removed path from hello.py * Changed import sort. * Fix import sorting in citi_bike/business_engine * visualization 0.1 * Updated lint configurations. * Fixed formatting error that caused lint errors. * render html title function * Try to fix lint errors. * flake-8 style fix * remove space around 18,35 * dump_csv_converter.py re-formatting. * files re-formatting. * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * re-formatting after merged upstream. * Updated import section. * Updated import section. * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * V0.2 vis dump feature enhancement. (#190) * Dumps added manifest file. * Code updated format by flake8 * Changed manifest file format for easy reading. * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * doc refine * doc update * params type * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * Added manifest file. (#201) Only a few changes that need to meet requirements of manifest file format. * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update * V0.2 visualization-0.1 (#181) * visualization 0.1 * render html title function * flake-8 style fix * style fixed * tab delete * white space fix * white space fix-2 * vis redundant function delete * refine * pr refine * isort fix * white space * lint error * \n error * test continuation * indent * continuation of indent * indent 0.3 * comment update * comment update 0.2 * f-string update * f-string 0.2 * lint 0.3 * lint 0.4 * lint 0.4 * lint 0.5 * lint 0.6 * docstring update * data version deploy update * condition update * add whitespace * deploy info update; docs update * weird white space * Update dashboard_visualization.md * new endline? * delete dependency * delete irrelevant file * change scenario to enum, divide file path into a separated class * fix the visualization of docs/key_components/distributed_toolkit * doc refine * doc update * params type * add examples into isort ignore * data structure update * doc&enum, formula refine * refine * add ut, refine doc * style refine * isort * strong type fix * os._exit delete * revert datalib * import new line * change test case * change file name & doc * change deploy path * delete params * revert file * delete duplicate file * delete single process * update naming * manually change import order * delete blank * edit error * requirement txt * style fix & refine * comments&docstring refine * add parameter name * test & dump * comments update * comments fix * delete toolkit change * doc update * citi bike update * deploy path * datalib update * revert datalib * revert * maro file format * comments update * doc update * update param name * doc update * new link * image update Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> * image change * add reset snapshot * delete dump * add new line * add next steps * import change * relative import * add init file * import change * change utils file * change cliexpcetion to clierror * dashboard test * change result * change assertation * move not * unit test change * core change * unit test delete name_mapping_file * update cim business engine * doc update * change relative path * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * duc update * duc update * duc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * change import sequence * comments update * doc add pic * add dependency * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * Update dashboard_visualization.rst * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * doc update * delete white space * doc update * doc update * update doc * update doc * update doc Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 docs process mode (#230) * Update process mode docs and fixed on premises * Update orchestration docs * Update process mode docs add JOB_NAME as env variable * fixed bugs * fixed isort issue * update docs index Co-authored-by: kaiqli <v-kaiqli@microsoft.com> * V0.2 learning model refinement (#236) * moved optimizer options to LearningModel * typo fix * fixed lint issues * updated notebook * misc edits * 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook * renamed single_host_cim_learner ot cim_learner in notebook * updated notebook output * typo fix * removed dimension check in absence of shared stack * fixed a typo * fixed lint issues Co-authored-by: ysqyang <v-yangqi@microsoft.com> * Update vm docs (#241) Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> * V0.2 info update (#240) * update readme * update version * refine reademe format * add vis gif * add citation * update citation * update badge Co-authored-by: Arthur Jiang <sjian@microsoft.com> * Fix typo (#242) * Fix typo * fix typo * fix Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com> Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com> Co-authored-by: Jinyu Wang <jinywan@microsoft.com> Co-authored-by: ysqyang <ysqyang@gmail.com> Co-authored-by: ysqyang <v-yangqi@microsoft.com> Co-authored-by: Arthur Jiang <sjian@microsoft.com> Co-authored-by: Romic Huang <romic.kid@gmail.com> Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com> Co-authored-by: Chaos Yu <chaos.you@gmail.com> Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com> Co-authored-by: Michael Li <mic_lee2000@hotmail.com> Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com> Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com> Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com> Co-authored-by: kaiqli <v-kaiqli@microsoft.com>
2021-01-04 14:49:05 +03:00
tick += 1
def compare_list(list1: list, list2: list) -> bool:
return len(list1) == len(list2) and all(val1 == val2 for val1, val2 in zip(list1, list2))
def compare_dictionary(dict1: dict, dict2: dict) -> bool:
keys1 = sorted(list(dict1.keys()))
keys2 = sorted(list(dict2.keys()))
if not compare_list(keys1, keys2):
return False
for key in keys1:
value1 = dict1[key]
value2 = dict2[key]
if type(value1) != type(value2):
return False
if type(value1) == dict:
if not compare_dictionary(value1, value2):
return False
elif type(value1) == list:
if not compare_list(value1, value2):
return False
else:
if value1 != value2:
return False
return True