V0.2 (#239)
* Event payload details for env.summary (#156)
* key_list of events added for env.summary
* code refined according to lint
* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments
* code format refined
* try to trigger the git tests
* update github workflow
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* V0.2 online lp for citi bike (#159)
* key_list of events added for env.summary
* code refined according to lint
* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments
* code format refined
* try to trigger the git tests
* update github workflow
* online LP example added for citi bike
* infeasible solution
* infeasible solution fixed: call snapshot before any env.step()
* experiment results of toy topos added
* experiment results of toy topos added
* experiment result update: better than naive baseline
* PuLP version added
* greedy experiment results update
* citibike result update
* modified according to PR comments
* update experiment results and forecasting comparison
* citi bike lp README updated
* README updated
* modified according to PR comments
* update according to PR comments
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
* V0.2 rl toolkit refinement (#165)
* refined rl abstractions
* fixed formatting issues
* checked out error-code related code from v0.2_pg
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* renamed save_models to dump_models
* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving
* renamed dump_experience_store to dump_experience_pool
* fixed a bug in the dump_experience_pool method
* fixed some PR comments
* fixed more PR comments
* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class
* fixed cim example according to rl toolkit changes
* fixed some more PR comments
* rewrote multi_process_launcher to eliminate the distributed section in config
* 1. fixed a typo; 2. added logging before early stopping
* fixed a bug
* fixed a bug
* fixed a bug
* added early stopping feature to CIM example
* fixed a typo
* fixed some issues with early stopping
* changed early stopping metric func
* fixed a bug
* fixed a bug
* added early stopping to dist mode cim
* added experience collecting func
* edited notebook according to changes in CIM example
* fixed bugs in nb
* fixed lint formatting issues
* fixed a typo
* fixed some PR comments
* fixed more PR comments
* revised docs
* removed nb output
* fixed a bug in simple_learner
* fixed a typo in nb
* fixed a bug
* fixed a bug
* fixed a bug
* removed unused import
* fixed a bug
* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing
* fixed some doc issues
* added output to nb
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* update according to flake8
* V0.2 Logical operator overloading for EarlyStoppingChecker (#178)
* 1. added logical operator overloading for early stopping checker; 2. added mean value checker
* fixed PR comments
* removed learner.exit() in single_process_launcher
* added another early stopping checker in example
* fixed PR comments and lint issues
* lint issue fix
* fixed lint issues
* fixed a bug
* fixed a bug
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 skip connection (#176)
* replaced IdentityLayers with nn.Identity
* 1. added skip connection option in FC_net; 2. generalized learning model
* added skip_connection option in config
* removed type casting in fc_net
* fixed lint formatting issues
* refined docstring
* added multi-head functionality to LearningModel
* refined learning model docstring
* added head_key param in learningModel forward
* fixed PR comments
* added top layer logic and is_top option in fc_net
* fixed a bug
* fixed a bug
* reverted some changes in learning model
* reverted some changes in learning model
* added members to learning model to fix the mode issue
* fixed a bug
* fixed mode setting issue in learning model
* removed learner.exit() in single_process_launcher
* fixed PR comments
* fixed rl/__init__
* fixed issues in example
* fixed a bug
* fixed a bug
* fixed lint formatting issues
* moved reward type casting to exp shaper
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* fixed a bug in learner's test() (#193)
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 double dqn (#188)
* added dueling action value model
* renamed params in dueling_action_value_model
* renamed shared_features to features
* replaced IdentityLayers with nn.Identity
* 1. added skip connection option in FC_net; 2. generalized learning model
* added skip_connection option in config
* removed type casting in fc_net
* fixed lint formatting issues
* refined docstring
* moved dueling_action_value_model and fixed some bugs
* added multi-head functionality to LearningModel
* refined learning model docstring
* added head_key param in learningModel forward
* added double DQN and dueling features to DQN
* fixed a bug
* added DuelingQModelHead enum
* fixed a bug
* removed unwanted file
* fixed PR comments
* added top layer logic and is_top option in fc_net
* fixed a bug
* fixed a bug
* reverted some changes in learning model
* reverted some changes in learning model
* added members to learning model to fix the mode issue
* fixed a bug
* fixed mode setting issue in learning model
* fixed PR comments
* revised cim example according to DQN changes
* renamed eval_model to q_value_model in cim example
* more fixes
* fixed a bug
* fixed a bug
* added doc per PR comments
* removed learner.exit() in single_process_launcher
* removed learner.exit() in single_process_launcher
* fixed PR comments
* fixed rl/__init__
* fixed issues in example
* fixed a bug
* fixed a bug
* fixed lint formatting issues
* double DQN feature
* fixed a bug
* fixed a bug
* fixed PR comments
* fixed lint issue
* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm
* added load_models in simple_learner
* minor docstring edits
* minor docstring edits
* set is_double to true in DQN config
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
* V0.2 feature predefined image (#183)
* feat: support predefined image provision
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* fix: error scripts invocation after using relative import
* fix: missing init.py
* fixed a bug in learner's test()
* feat: add distributed_config for dqn example
* test: update test for grass
* test: update test for k8s
* feat: add promptings for steps
* fix: change relative imports to absolute imports
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
* V0.2 feature proxy rejoin (#158)
* update dist decorator
* replace proxy.get_peers by proxy.peers
* update proxy rejoin (draft, not runnable for proxy rejoin)
* fix bugs in proxy
* add message cache, and redesign rejoin parameter
* feat: add checkpoint with test
* update proxy.rejoin
* fixed rejoin bug, rename func
* add test example(temp)
* feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents.
* capitalize env variable names
* rm json.dumps; change retries to 10; temp add warning level for rejoin
* fix: unable to load FaultToleranceAgent, missing params
* fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent
* feat: add node_id to node_details
* fix: add a new dependency for tests
* style: meet linting requirements
* style: remaining linting problems
* lint fixed; rm temp test folder.
* fixed lint f-string without placeholder
* fix: add a flag for "remove_container", refine restart logic and Redis keys naming
* proxy rejoin update.
* variable rename.
* fixed lint issues
* fixed lint issues
* add exit code for different error
* feat: add special errors handler
* add max rejoin times
* remove unused import
* add rejoin UT; resolve rejoin comments
* lint fixed
* fixed UT import problem
* rm MessageCache in proxy
* fix: refine key naming
* update proxy rejoin; add topic for broadcast
* feat: support predefined image provision
* update UT for communication
* add docstring for rejoin
* fixed isort and zmq driver import
* fixed isort and UT test
* fix isort issue
* proxy rejoin update (comments v2)
* fixed isort error
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* feat: add exists method for checkpoint
* fix: error scripts invocation after using relative import
* fix: missing init.py
* fixed a bug in learner's test()
* add driver close and socket SUB disconnect for rejoin
* feat: add distributed_config for dqn example
* test: update test for grass
* test: update test for k8s
* feat: add promptings for steps
* fix: change relative imports to absolute imports
* fixed comments and update logger level
* mv driver in proxy.__init__ for issue temp fixed.
* Update docstring and comments
* style: fix code reviews problems
* fix code format
Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 feature cli windows (#203)
* fix: change local mkdir to os.makedirs
* fix: add utf8 encoding for logger
* fix: add powershell.exe prefix to subprocess functions
* feat: add debug_green
* fix: use fsutil to create fix-size files in Windows
* fix: use universal_newlines=True to handle encoding problem in different operating systems
* fix: use temp file to do copy when the operating system is not Linux
* fix: linting error
* fix: use fsutil in test_k8s.py
* feat: dynamic init ABS_PATH in GlobalParams
* fix: use -Command to execute Powershell command
* fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode
* fix: problems in code review
* EventBuffer refine (#197)
* merge uniform event changes back
* 1st step: move executing events into stack for better removing performance
* flush event pool
* typo
* add option for env to enable event pool
* refine stack functions
* fix comment issues, add typings
* lint fixing
* lint fix
* add missing fix
* linting
* lint
* use linked list instead of the original event list and execute stack
* add missing file
* linting, and fixes
* add missing file
* linting fix
* fixing comments
* add missing file
* rename event_list to event_linked_list
* correct import path
* change enable_event_pool to disable_finished_events
* add missing file
* V0.2 merge master (#214)
* fix the visualization of docs/key_components/distributed_toolkit
* add examples into isort ignore
* refine import path for examples (#195)
* refine import path for examples
* refine indents
* fixed formatting issues
* update code style
* add editorconfig-checker, add editorconfig path into lint, change super-linter version
* change path for code saving in cim.gnn
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
* fix issue that sometimes there is conflict between distutils and setuptools (#208)
* fix issue that cython and setuptools conflict
* follow the accepted temp workaround
* update comment, it should be conflict between setuptools and distutils
* fixed bugs related to proxy interface changes
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
* typo fix
* Bug fix: event buffer issue that prevented Actions from being passed into the business engine (#215)
* bug fix
* clear the reference after extract sub events, update ut to cover this issue
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* fix flake8 style problem
* V0.2 feature refine mode namings (#212)
* feat: refine cli exception
* feat: refine mode namings
* EventBuffer refine (#197)
* merge uniform event changes back
* 1st step: move executing events into stack for better removing performance
* flush event pool
* typo
* add option for env to enable event pool
* refine stack functions
* fix comment issues, add typings
* lint fixing
* lint fix
* add missing fix
* linting
* lint
* use linked list instead of the original event list and execute stack
* add missing file
* linting, and fixes
* add missing file
* linting fix
* fixing comments
* add missing file
* rename event_list to event_linked_list
* correct import path
* change enable_event_pool to disable_finished_events
* add missing file
* fixed bugs in dist rl
* feat: rename files
* tests: set longer gracefully wait time
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* fix: rm redundant variables
* fix: refine error message
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 vis new (#210)
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
* V0.2 local host process (#221)
* Update local process (not ready)
* update cli process mode
* add setup/clear/template for maro process
* fix process stop
* add logger and rename parameters
* add logger for setup/clear
* fixed closing of nonexistent PIDs when a PID list is given
* Fixed comments and rename setup/clear with create/delete
* update ProcessInternalError
* V0.2 grass on premises (#220)
* feat: refine cli exception
* commit on v0.2_grass_on_premises
Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 vm scheduling scenario (#189)
* Initialize
* Data center scenario init
* Code style modification
* V0.2 event buffer subevents expand (#180)
* V0.2 rl toolkit refinement (#165)
* refined rl abstractions
* fixed formatting issues
* checked out error-code related code from v0.2_pg
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* renamed save_models to dump_models
* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving
* renamed dump_experience_store to dump_experience_pool
* fixed a bug in the dump_experience_pool method
* fixed some PR comments
* fixed more PR comments
* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class
* fixed cim example according to rl toolkit changes
* fixed some more PR comments
* rewrote multi_process_launcher to eliminate the distributed section in config
* 1. fixed a typo; 2. added logging before early stopping
* fixed a bug
* fixed a bug
* fixed a bug
* added early stopping feature to CIM example
* fixed a typo
* fixed some issues with early stopping
* changed early stopping metric func
* fixed a bug
* fixed a bug
* added early stopping to dist mode cim
* added experience collecting func
* edited notebook according to changes in CIM example
* fixed bugs in nb
* fixed lint formatting issues
* fixed a typo
* fixed some PR comments
* fixed more PR comments
* revised docs
* removed nb output
* fixed a bug in simple_learner
* fixed a typo in nb
* fixed a bug
* fixed a bug
* fixed a bug
* removed unused import
* fixed a bug
* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing
* fixed some doc issues
* added output to nb
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* unfold sub-events, insert after parent
* remove event category, use different class instead, add helper functions to gen decision and action event
* add a method to support add immediate event to cascade event with tick validation
* fix ut issue
* add action as 1st sub event to ensure the executing order
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Data center scenario update
* Code style update
* Data scenario business engine update
* Isort update
* Fix lint code check
* Fix based on PR comments.
* Update based on PR comments.
* Add decision payload
* Add config file
* Update utilization series logic
* Update based on PR comment
* Update based on PR
* Update
* Update
* Add the ValidPm class
* Update docs string and naming
* Add energy consumption
* Lint code fixed
* Refining postpone function
* Lint style update
* Init data pipeline
* Update based on PR comment
* Add data pipeline download
* Lint style update
* Code style fix
* Temp update
* Data pipeline update
* Add aria2p download function
* Update based on PR comment
* Update based on PR comment
* Update based on PR comment
* Update naming of variables
* Rename topology
* Renaming
* Fix valid pm list
* Pylint fix
* Update comment
* Update docstring and comment
* Fix init import
* Update tick issue
* fix merge problem
* update style
* V0.2 datacenter data pipeline (#199)
* Data pipeline update
* Data pipeline update
* Lint update
* Update pipeline
* Add vmid mapping
* Update lint style
* Add VM data analytics
* Update notebook
* Add binary converter
* Modify vmtable yaml
* Update binary meta file
* Add cpu reader
* random example added for data center
* Fix bugs
* Fix pylint
* Add launcher
* Fix pylint
* best fit policy added
* Add reset
* Add config
* Add config
* Modify action object
* Modify config
* Fix naming
* Modify config
* Add snapshot list
* Modify a spelling typo
* Update based on PR comments.
* Rename scenario to vm scheduling
* Rename scenario
* Update print messages
* Lint fix
* Lint fix
* Rename scenario
* Modify the calculation of cpu utilization
* Add comment
* Modify data pipeline path
* Fix typo
* Modify naming
* Add unittest
* Add comment
* Unify naming
* Fix data path typo
* Update comments
* Update snapshot features
* Add take snapshot
* Add summary keys
* Update cpu reader
* Update naming
* Add unit test
* Rename snapshot node
* Add processed data pipeline
* Modify config
* Add comment
* Lint style fix
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Add package used in vm_scheduling
* add aria2p to test requirement
* best fit example: update the usage of snapshot
* Add aria2p to test requirement
* Remove finish event
* Fix unittest
* Add test dataset
* Update based on PR comment
* Refine cpu reader and unittest
* Lint update
* Refine based on PR comment
* Add agent index
* Add node mapping
* Refine based on PR comments
* Renaming postpone_step
* Renaming and refine based on PR comments
* Rename config
* Update
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* Resolve none action problem (#224)
* V0.2 vm_scheduling notebook (#223)
* Initialize
* Data center scenario init
* Code style modification
* V0.2 event buffer subevents expand (#180)
* V0.2 rl toolkit refinement (#165)
* refined rl abstractions
* fixed formatting issues
* checked out error-code related code from v0.2_pg
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* renamed save_models to dump_models
* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving
* renamed dump_experience_store to dump_experience_pool
* fixed a bug in the dump_experience_pool method
* fixed some PR comments
* fixed more PR comments
* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class
* fixed cim example according to rl toolkit changes
* fixed some more PR comments
* rewrote multi_process_launcher to eliminate the distributed section in config
* 1. fixed a typo; 2. added logging before early stopping
* fixed a bug
* fixed a bug
* fixed a bug
* added early stopping feature to CIM example
* fixed a typo
* fixed some issues with early stopping
* changed early stopping metric func
* fixed a bug
* fixed a bug
* added early stopping to dist mode cim
* added experience collecting func
* edited notebook according to changes in CIM example
* fixed bugs in nb
* fixed lint formatting issues
* fixed a typo
* fixed some PR comments
* fixed more PR comments
* revised docs
* removed nb output
* fixed a bug in simple_learner
* fixed a typo in nb
* fixed a bug
* fixed a bug
* fixed a bug
* removed unused import
* fixed a bug
* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing
* fixed some doc issues
* added output to nb
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* unfold sub-events, insert after parent
* remove event category, use different class instead, add helper functions to gen decision and action event
* add a method to support add immediate event to cascade event with tick validation
* fix ut issue
* add action as 1st sub event to ensure the executing order
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Data center scenario update
* Code style update
* Data scenario business engine update
* Isort update
* Fix lint code check
* Fix based on PR comments.
* Update based on PR comments.
* Add decision payload
* Add config file
* Update utilization series logic
* Update based on PR comment
* Update based on PR
* Update
* Update
* Add the ValidPm class
* Update docs string and naming
* Add energy consumption
* Lint code fixed
* Refining postpone function
* Lint style update
* Init data pipeline
* Update based on PR comment
* Add data pipeline download
* Lint style update
* Code style fix
* Temp update
* Data pipeline update
* Add aria2p download function
* Update based on PR comment
* Update based on PR comment
* Update based on PR comment
* Update naming of variables
* Rename topology
* Renaming
* Fix valid pm list
* Pylint fix
* Update comment
* Update docstring and comment
* Fix init import
* Update tick issue
* fix merge problem
* update style
* V0.2 datacenter data pipeline (#199)
* Data pipeline update
* Data pipeline update
* Lint update
* Update pipeline
* Add vmid mapping
* Update lint style
* Add VM data analytics
* Update notebook
* Add binary converter
* Modify vmtable yaml
* Update binary meta file
* Add cpu reader
* random example added for data center
* Fix bugs
* Fix pylint
* Add launcher
* Fix pylint
* best fit policy added
* Add reset
* Add config
* Add config
* Modify action object
* Modify config
* Fix naming
* Modify config
* Add snapshot list
* Modify a spelling typo
* Update based on PR comments.
* Rename scenario to vm scheduling
* Rename scenario
* Update print messages
* Lint fix
* Lint fix
* Rename scenario
* Modify the calculation of cpu utilization
* Add comment
* Modify data pipeline path
* Fix typo
* Modify naming
* Add unittest
* Add comment
* Unify naming
* Fix data path typo
* Update comments
* Update snapshot features
* Add take snapshot
* Add summary keys
* Update cpu reader
* Update naming
* Add unit test
* Rename snapshot node
* Add processed data pipeline
* Modify config
* Add comment
* Lint style fix
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Add package used in vm_scheduling
* add aria2p to test requirement
* best fit example: update the usage of snapshot
* Add aria2p to test requirement
* Remove finish event
* Fix unittest
* Add test dataset
* Update based on PR comment
* Refine cpu reader and unittest
* Lint update
* Refine based on PR comment
* Add agent index
* Add node mapping
* Init vm scheduling notebook
* Add notebook
* Refine based on PR comments
* Renaming postpone_step
* Renaming and refine based on PR comments
* Rename config
* Update based on the v0.2_datacenter
* Update notebook
* Update
* update filepath
* notebook updated
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* Update process mode docs and fixed on premises (#226)
* V0.2 Add github workflow integration (#222)
* test: add github workflow integration
* fix: split procedures && bug fixed
* test: add training only restriction
* fix: add 'approved' restriction
* fix: change default ssh port to 22
* style: in one line
* feat: add timeout for Subprocess.run
* test: change default node_size to Standard_D2s_v3
* style: refine style
* fix: add ssh_port param to on-premises mode
* fix: add missing init.py
* V0.2 explorer (#198)
* overhauled exploration abstraction
* fixed a bug
* fixed a bug
* fixed a bug
* added exploration related methods to abs_agent
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* separated learning with exploration schedule and without
* small fixes
* moved explorer logic to actor side
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* removed unwanted param from simple agent manager
* added noise explorer
* fixed formatting
* removed unnecessary comma
* fixed PR comments
* removed unwanted exception and imports
* fixed a bug
* fixed PR comments
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issue
* fixed a bug
* fixed lint issue
* fixed naming
* combined exploration param generation and early stopping in scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* fixed lint issue
* moved logger inside scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* removed epsilon parameter from choose_action
* fixed some PR comments
* fixed some PR comments
* bug fix
* bug fix
* bug fix
* removed explorer abstraction from agent
* refined dqn example
* fixed lint issues
* simplified scheduler
* removed early stopping from CIM dqn example
* removed early stopping from cim example config
* renamed early_stopping_callback to early_stopping_checker
* removed action_dim from noise explorer classes and added some shape checks
* modified NoiseExplorer's __call__ logic to batch processing
* made NoiseExplorer's __call__ return type np array
* renamed update to set_parameters in explorer
* fixed old naming in test_grass
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 embedded optim (#191)
* added dueling action value model
* renamed params in dueling_action_value_model
* renamed shared_features to features
* replaced IdentityLayers with nn.Identity
* 1. added skip connection option in FC_net; 2. generalized learning model
* added skip_connection option in config
* removed type casting in fc_net
* fixed lint formatting issues
* refined docstring
* moved dueling_action_value_model and fixed some bugs
* added multi-head functionality to LearningModel
* refined learning model docstring
* added head_key param in learningModel forward
* added double DQN and dueling features to DQN
* fixed a bug
* added DuelingQModelHead enum
* fixed a bug
* removed unwanted file
* fixed PR comments
* added top layer logic and is_top option in fc_net
* fixed a bug
* fixed a bug
* reverted some changes in learning model
* reverted some changes in learning model
* added members to learning model to fix the mode issue
* fixed a bug
* fixed mode setting issue in learning model
* fixed PR comments
* revised cim example according to DQN changes
* renamed eval_model to q_value_model in cim example
* more fixes
* fixed a bug
* fixed a bug
* added doc per PR comments
* removed learner.exit() in single_process_launcher
* removed learner.exit() in single_process_launcher
* fixed PR comments
* fixed rl/__init__
* fixed issues in example
* fixed a bug
* fixed a bug
* fixed lint formatting issues
* double DQN feature
* fixed a bug
* fixed a bug
* fixed PR comments
* fixed lint issue
* embedded optimizer into SingleHeadLearningModel
* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm
* added load_models in simple_learner
* minor docstring edits
* minor docstring edits
* minor docstring edits
* moved optimizer options inside LearningModel
* modified example accordingly
* fixed a bug
* fixed a bug
* fixed a bug
* added dueling DQN feature
* revised and refined docstrings
* fixed a bug
* fixed lint issues
* added load/dump functions to LearningModel
* fixed a bug
* fixed a bug
* fixed lint issues
* refined DQN docstrings
* removed load/dump functions from DQN
* added task validator
* fixed decorator use
* fixed a typo
* fixed a bug
* fixed lint issues
* changed LearningModel's step() to take a single loss
* revised learning model design
* revised example
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* added decorator utils to algorithm
* fixed a bug
* renamed core_model to model
* fixed a bug
* 1. fixed lint formatting issues; 2. refined learning model docstrings
* rm trailing whitespaces
* added decorator for choose_action
* fixed a bug
* fixed a bug
* fixed version-related issues
* renamed add_zeroth_dim decorator to expand_dim
* overhauled exploration abstraction
* fixed a bug
* fixed a bug
* fixed a bug
* added exploration related methods to abs_agent
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* separated learning with exploration schedule and without
* small fixes
* moved explorer logic to actor side
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* removed unwanted param from simple agent manager
* small fixes
* added shared_module property to LearningModel
* added shared_module property to LearningModel
* revised __getstate__ for LearningModel
* fixed a bug
* added soft_update function to learningModel
* fixed a bug
* revised learningModel
* rm __getstate__ and __setstate__ from LearningModel
* added noise explorer
* fixed formatting
* removed unnecessary comma
* removed unnecessary comma
* fixed PR comments
* removed unwanted exception and imports
* removed unwanted exception and imports
* fixed a bug
* fixed PR comments
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issue
* fixed a bug
* fixed lint issue
* fixed naming
* combined exploration param generation and early stopping in scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* fixed lint issue
* moved logger inside scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* fixed lint issue
* removed epsilon parameter from choose_action
* removed epsilon parameter from choose_action
* changed agent manager's train parameter to experience_by_agent
* fixed some PR comments
* renamed zero_grad to zero_gradients in LearningModule
* fixed some PR comments
* bug fix
* bug fix
* bug fix
* removed explorer abstraction from agent
* added DEVICE env variable as first choice for torch device
* refined dqn example
* fixed lint issues
* removed unwanted import in cim example
* updated cim-dqn notebook
* simplified scheduler
* edited notebook according to merged scheduler changes
* refined dimension check for learning module manager and removed num_actions from DQNConfig
* bug fix for cim example
* added notebook output
* removed early stopping from CIM dqn example
* removed early stopping from cim example config
* moved decorator logic inside algorithms
* renamed early_stopping_callback to early_stopping_checker
* removed action_dim from noise explorer classes and added some shape checks
* modified NoiseExplorer's __call__ logic to batch processing
* made NoiseExplorer's __call__ return type np array
* renamed update to set_parameters in explorer
* fixed old naming in test_grass
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 VM scheduling docs (#228)
* Initialize
* Data center scenario init
* Code style modification
* V0.2 event buffer subevents expand (#180)
* V0.2 rl toolkit refinement (#165)
* refined rl abstractions
* fixed formatting issues
* checked out error-code related code from v0.2_pg
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* renamed save_models to dump_models
* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving
* renamed dump_experience_store to dump_experience_pool
* fixed a bug in the dump_experience_pool method
* fixed some PR comments
* fixed more PR comments
* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class
* fixed cim example according to rl toolkit changes
* fixed some more PR comments
* rewrote multi_process_launcher to eliminate the distributed section in config
* 1. fixed a typo; 2. added logging before early stopping
* fixed a bug
* fixed a bug
* fixed a bug
* added early stopping feature to CIM example
* fixed a typo
* fixed some issues with early stopping
* changed early stopping metric func
* fixed a bug
* fixed a bug
* added early stopping to dist mode cim
* added experience collecting func
* edited notebook according to changes in CIM example
* fixed bugs in nb
* fixed lint formatting issues
* fixed a typo
* fixed some PR comments
* fixed more PR comments
* revised docs
* removed nb output
* fixed a bug in simple_learner
* fixed a typo in nb
* fixed a bug
* fixed a bug
* fixed a bug
* removed unused import
* fixed a bug
* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing
* fixed some doc issues
* added output to nb
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* unfold sub-events, insert after parent
* remove event category, use different class instead, add helper functions to gen decision and action event
* add a method to support adding an immediate event to a cascade event with tick validation
* fix ut issue
* add action as 1st sub event to ensure the executing order
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Data center scenario update
* Code style update
* Data scenario business engine update
* Isort update
* Fix lint code check
* Fix based on PR comments.
* Update based on PR comments.
* Add decision payload
* Add config file
* Update utilization series logic
* Update based on PR comment
* Update based on PR
* Update
* Update
* Add the ValidPm class
* Update docs string and naming
* Add energy consumption
* Lint code fixed
* Refining postpone function
* Lint style update
* Init data pipeline
* Update based on PR comment
* Add data pipeline download
* Lint style update
* Code style fix
* Temp update
* Data pipeline update
* Add aria2p download function
* Update based on PR comment
* Update based on PR comment
* Update based on PR comment
* Update naming of variables
* Rename topology
* Renaming
* Fix valid pm list
* Pylint fix
* Update comment
* Update docstring and comment
* Fix init import
* Update tick issue
* fix merge problem
* update style
* V0.2 datacenter data pipeline (#199)
* Data pipeline update
* Data pipeline update
* Lint update
* Update pipeline
* Add vmid mapping
* Update lint style
* Add VM data analytics
* Update notebook
* Add binary converter
* Modify vmtable yaml
* Update binary meta file
* Add cpu reader
* random example added for data center
* Fix bugs
* Fix pylint
* Add launcher
* Fix pylint
* best fit policy added
* Add reset
* Add config
* Add config
* Modify action object
* Modify config
* Fix naming
* Modify config
* Add snapshot list
* Modify a spelling typo
* Update based on PR comments.
* Rename scenario to vm scheduling
* Rename scenario
* Update print messages
* Lint fix
* Lint fix
* Rename scenario
* Modify the calculation of cpu utilization
* Add comment
* Modify data pipeline path
* Fix typo
* Modify naming
* Add unittest
* Add comment
* Unify naming
* Fix data path typo
* Update comments
* Update snapshot features
* Add take snapshot
* Add summary keys
* Update cpu reader
* Update naming
* Add unit test
* Rename snapshot node
* Add processed data pipeline
* Modify config
* Add comment
* Lint style fix
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Add package used in vm_scheduling
* add aria2p to test requirement
* best fit example: update the usage of snapshot
* Add aria2p to test requirement
* Remove finish event
* Fix unittest
* Add test dataset
* Update based on PR comment
* vm doc init
* Update docs
* Update docs
* Update docs
* Update docs
* Remove old notebook
* Update docs
* Update docs
* Add figure
* Update docs
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* v0.2 VM Scheduling docs refinement (#231)
* Fix typo
* Refining vm scheduling docs
* V0.2 store refinement (#234)
* updated docs and images for rl toolkit
* 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store
* fixed lint issues
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Fix bug (#237)
vm scenario: fix the event type bug of the postpone event
* V0.2 rl toolkit doc (#235)
* updated docs and images for rl toolkit
* updated cim example doc
* updated cim example docs
* updated cim example rst
* updated rl_toolkit and cim example docs
* replaced q_module with q_net in example rst
* refined doc
* refined doc
* updated figures
* updated figures
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Merge V0.2 vis into V0.2 (#233)
* Implemented dump snapshots and convert to CSV.
* Let BE support params when dumping snapshots.
* Refactor dump code to core.py
* Implemented decision event dump.
* replace is not '' with !=''
* Fixed issues that code review mentioned.
* removed path from hello.py
* Changed import sort.
* Fix import sorting in citi_bike/business_engine
* visualization 0.1
* Updated lint configurations.
* Fixed formatting error that caused lint errors.
* render html title function
* Try to fix lint errors.
* flake-8 style fix
* remove space around 18,35
* dump_csv_converter.py re-formatting.
* files re-formatting.
* style fixed
* tab delete
* white space fix
* white space fix-2
* vis redundant function delete
* refine
* re-formatting after merged upstream.
* Updated import section.
* Updated import section.
* pr refine
* isort fix
* white space
* lint error
* \n error
* test continuation
* indent
* continuation of indent
* indent 0.3
* comment update
* comment update 0.2
* f-string update
* f-string 0.2
* lint 0.3
* lint 0.4
* lint 0.4
* lint 0.5
* lint 0.6
* docstring update
* data version deploy update
* condition update
* add whitespace
* V0.2 vis dump feature enhancement. (#190)
* Dumps added manifest file.
* Code updated format by flake8
* Changed manifest file format for easy reading.
* deploy info update; docs update
* weird white space
* Update dashboard_visualization.md
* new endline?
* delete dependency
* delete irrelevant file
* change scenario to enum, divide file path into a separated class
* doc refine
* doc update
* params type
* data structure update
* doc&enum, formula refine
* refine
* add ut, refine doc
* style refine
* isort
* strong type fix
* os._exit delete
* revert datalib
* import new line
* change test case
* change file name & doc
* change deploy path
* delete params
* revert file
* delete duplicate file
* delete single process
* update naming
* manually change import order
* delete blank
* edit error
* requirement txt
* style fix & refine
* comments&docstring refine
* add parameter name
* test & dump
* comments update
* Added manifest file. (#201)
Only a few changes that need to meet requirements of manifest file format.
* comments fix
* delete toolkit change
* doc update
* citi bike update
* deploy path
* datalib update
* revert datalib
* revert
* maro file format
* comments update
* doc update
* update param name
* doc update
* new link
* image update
* V0.2 visualization-0.1 (#181)
* visualization 0.1
* render html title function
* flake-8 style fix
* style fixed
* tab delete
* white space fix
* white space fix-2
* vis redundant function delete
* refine
* pr refine
* isort fix
* white space
* lint error
* \n error
* test continuation
* indent
* continuation of indent
* indent 0.3
* comment update
* comment update 0.2
* f-string update
* f-string 0.2
* lint 0.3
* lint 0.4
* lint 0.4
* lint 0.5
* lint 0.6
* docstring update
* data version deploy update
* condition update
* add whitespace
* deploy info update; docs update
* weird white space
* Update dashboard_visualization.md
* new endline?
* delete dependency
* delete irrelevant file
* change scenario to enum, divide file path into a separated class
* fix the visualization of docs/key_components/distributed_toolkit
* doc refine
* doc update
* params type
* add examples into isort ignore
* data structure update
* doc&enum, formula refine
* refine
* add ut, refine doc
* style refine
* isort
* strong type fix
* os._exit delete
* revert datalib
* import new line
* change test case
* change file name & doc
* change deploy path
* delete params
* revert file
* delete duplicate file
* delete single process
* update naming
* manually change import order
* delete blank
* edit error
* requirement txt
* style fix & refine
* comments&docstring refine
* add parameter name
* test & dump
* comments update
* comments fix
* delete toolkit change
* doc update
* citi bike update
* deploy path
* datalib update
* revert datalib
* revert
* maro file format
* comments update
* doc update
* update param name
* doc update
* new link
* image update
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
* image change
* add reset snapshot
* delete dump
* add new line
* add next steps
* import change
* relative import
* add init file
* import change
* change utils file
* change cliexception to clierror
* dashboard test
* change result
* change assertion
* move not
* unit test change
* core change
* unit test delete name_mapping_file
* update cim business engine
* doc update
* change relative path
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* change import sequence
* comments update
* doc add pic
* add dependency
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* Update dashboard_visualization.rst
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* delete white space
* doc update
* doc update
* update doc
* update doc
* update doc
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* V0.2 docs process mode (#230)
* Update process mode docs and fixed on premises
* Update orchestration docs
* Update process mode docs add JOB_NAME as env variable
* fixed bugs
* fixed isort issue
* update docs index
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>
* V0.2 learning model refinement (#236)
* moved optimizer options to LearningModel
* typo fix
* fixed lint issues
* updated notebook
* misc edits
* 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook
* renamed single_host_cim_learner to cim_learner in notebook
* updated notebook output
* typo fix
* removed dimension check in absence of shared stack
* fixed a typo
* fixed lint issues
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Update vm docs (#241)
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* V0.2 info update (#240)
* update readme
* update version
* refine readme format
* add vis gif
* add citation
* update citation
* update badge
Co-authored-by: Arthur Jiang <sjian@microsoft.com>
* Fix typo (#242)
* Fix typo
* fix typo
* fix
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <sjian@microsoft.com>
Co-authored-by: Romic Huang <romic.kid@gmail.com>
Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com>
Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>