* Add README.md and refine the bin_packing algorithm
* refine round_robin and bin_packing
* Update README.md
* Refine the code and README.md
* Refine the bin_packing and round_robin
* Refine the code
Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com>
* add where filter for general usage
* test for general filter
* simpler comparison for attribute
* filter on raw
* fix array fetch bug
* ut for base comparison
* lint fix
* remove unused variables
* update ignore
* rule_based_algorithm
* revise_the_code_by_aiming_hao
* revise_the_code_by_aiming_hao
* use the np.argmin
* Update best_fit.py
fix the "np not defined"
* refine the code
* fix the error
* refine the code
* fix the error
* fix the error
* refine the code
* remove the history
* refine the code
* update first_fit
* Refine the code
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: aiming hao <37905948+hamcoder@users.noreply.github.com>
* 1st version
* make vectorenv importable under module root
* allow outside code to control which environment to push, so we do not need to control the tick for each environment
* remove comment
* lint fixing
* add test for vector env, correct the batch number
* lint fixing
* reduce parameters
* Update vector env ut to test whether the raw backend is supported
* correct comments on hello
* fix review comments, cim actiontype wip
* add a compatible way to handle ActionType for cim scenario
* lint fix
* correct the action type to handle previous action
* add doc string for wrappers
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* test: add github workflow integration
* fix: split procedures && bug fixed
* test: add training only restriction
* fix: add 'approved' restriction
* fix: change default ssh port to 22
* style: in one line
* feat: add timeout for Subprocess.run
* test: change default node_size to Standard_D2s_v3
* style: refine style
* fix: add ssh_port param to on-premises mode
* fix: add missing init.py
* refactor: extract reusable methods to GrassExecutor
* feat: refine validation.py and add docstrings
* fix: add remote prefix to ssh function
* style: refine logging output
* fix: extract param 'vm_name'
* fix: linting errors
* feat: add NodeStatus and ContainerStatus at executors
* feat: use master_node_size as the size of build_node_image_vm
* fix: refine comments
* feat: add "state" key for node_details
* fix: linting errors
* fix: deployment error when ssh_port is the default port
* refactor: extract utils/*.py in scripts
* style: single quote to double quote
* refactor: refine folder structure of scripts
* fix: linting errors
* fix: add executable to fix error initialization
* refactor: use SubProcess to execute commands in scripts
* refactor: refine script namings
* refactor: extract utils/*.py and systemd/*.service in agents
* feat: refine Exception structure, add SubProcess class in agents
* feat: use psutil to get resource details, move resource details initialization to agents
* fix: linting errors
* feat: use docker sdk in node_agent
* feat: extract RedisExecutor in agents
* test: remove image when tearing down
* feat: add LoadImageAgent
* feat: move node status update to agents
* refactor: move utils folder to upper level in scripts
* feat: add node_api_server, refine agents folder structure
* fix: linting errors
* refactor: refine folder structure in grass/lib
* refactor: build DeploymentValidator class
* refactor: create DetailsReader, DetailsWriter, delete sync mode
* refactor: rename DockerManager to DockerController
* refactor: rename RedisManager to RedisController
* refactor: rename AzureExecutor to AzureController
* refactor: create NameCreator
* refactor: create PathConvertor
* refactor: rename checkers to details_validity_wrapper
* refactor: rename lock to operation_lock_wrapper
* refactor: create FileSynchronizer
* refactor: create redis instance in RedisController
* feat: add master_api_server, move job related scripts to api_server
* refactor: move node related scripts to api_server
* fix: use "DELETE" instead of "DEL" as http method
* refactor: use mapping names instead of namings like "sths_details"
* feat: move master related scripts to api_server
* feat: move containers related scripts to api_server
* fix: add graceful wait for remote_start_master_services
* feat: move image_files related scripts to api_server
* fix: improper test in the training stage
* refactor: use local variable "URL_PREFIX" directly, add 's' in node_api_client
* refactor: refine namings in services
* feat: move clean related scripts to api_server
* refactor: delete "public_key" field
* feat: build MasterApiClient
* refactor: delete sync_mkdir
* feat: refine locks in node_details
* feat: build DockerController for grass/utils
* refactor: rename Extractor to Controller
* feat: move schedule related components to api_server
* fix: incorrect allocation when starting batch jobs
* fix: missing field "containers" in job_details
* feat: add delete_job in master_api_server
* feat: add logger in agents
* fix: no "resources" field when scale up node at the very beginning
* feat: switch back to Process instead of Thread in node_agent
* feat: add 'v1' prefix to api_servers' urls
* refactor: move lib/aks under lib/clouds
* refactor: move lib/k8s_configs to lib/configs, move aks related configs to clouds/aks, delete volume mount in redis
* feat: extract K8sExecutor
* fix: add one more searching layer of package_data at maro.cli.k8s
* refactor: move lib/configs/nvidia to lib/clouds/aks, make create() as a staticmethod at k8s mode
* refactor: move id init to standardize_create_deployment in grass/azure mode
* fix: use GlobalParams instead of hard-coded data
* feat: build K8sDetailsReader, K8sDetailsWriter
* feat: use k8s sdk to replace subprocess call
* refactor: delete redundant vars
* refactor: move more methods to K8sExecutor
* test: use legal naming in tests/cli/k8s
* refactor: refine logging messages
* refactor: make create() as a staticmethod at grass/azure mode, refine logging messages
* feat: build ArmTemplateParameterBuilder in K8sAzureExecutor
* refactor: remove redundant params
* refactor: rename /clouds to /modes
* refactor: refine structures and logging messages in GrassExecutor
* feat: add 'PENDING' to NodeStatus
* feat: refine build_job_details for create schedule in grass/azure
* feat: refine build_job_details for create schedule in k8s/aks
* feat: use node_join schema in grass/azure
* refactor: replace /.maro with /.maro-shared, replace admin_username with node_username, remove redundant snippets in /grass/lib/scripts
* refactor: add 'ssh', 'api_server' into master_details and node_details
* refactor: move master runtime params initialization into api_server
* refactor: refine namings
* feat: reconstruct grass/on-premises with new schema
* refactor: delete field 'user' in grass_azure_create
* refactor: rename 'blueprints_v1' to 'blueprints'
* refactor: move some GlobalPaths to subfolders
* refactor: replace 'connection' field with 'master' or 'node'
* refactor: move start_service scripts to init_master.py
* refactor: rename grass/master/release to grass/master/delete_master
* refactor: load local_details in node services, refine script namings
* refactor: move invocations of start_node and stop node to api server
* fix: add missing imports
* refactor: rename SubProcess to Subprocess
* refactor: delete field 'user' in k8s_aks_create
* refactor: refine folder structures in /.maro/clusters/cluster
* refactor: move /logs to /clusters/{cluster_name}
* refactor: refine filenames
* fix: export DEBIAN_FRONTEND=noninteractive to reduce irrelevant warnings
* refactor: refine code structures, delete redundant code
* refactor: change /{cluster_name}/details.yml to /{cluster_name}/cluster_details.yml
* feat: add rsa+aes data encryption on dev-master communication
* fix: change MasterApiClient to RedisController in node-related services and scripts
* refactor: remove all "{cluster_name}" in redis keys
* refactor: extract init_master and create_user to GrassExecutor
* test: refine tests in grass/azure and k8s/aks
* refactor: refine ArmTemplateParameterBuilder
* feat: change the order of installation in init_build_node_image_vm.py
* fix: add user/admin_id to grass_on_premises_create.yml
* fix: change outdated container names
* feat: add standardize_join_cluster_deployment in grass/on-premises
* feat: add init_node_runtime_env in join_cluster.py
* refactor: refine code structure in join_cluster.py
* test: add TestGrassOnPremises
* refactor: refine ARM templates
* fix: linting errors
* fix: test requirements error
* fix: arm linting errors
* refactor: late import in grass, k8s
* style: refine load_parser_grass
* style: refine load_parser_k8s
* docs: update orchestrations
* fix: fix get_job_logs
* docs: add docs for GrassAzureExecutor, GrassExecutor
* docs: add docs for GrassOnPremisesExecutor
* docs: add docs for /grass/scripts
* docs: add docs for /grass/services
* docs: add docs for /grass/utils
* docs: add docs for k8s
* try paramiko of another version
* rollback paramiko package version
Co-authored-by: Wesley <Wenlei.Shi@microsoft.com>
* fixed issues in example
* fixed a bug
* fixed a bug
* fixed lint formatting issues
* double DQN feature
* fixed a bug
* fixed a bug
* fixed PR comments
* fixed lint issue
* embedded optimizer into SingleHeadLearningModel
* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm
* added load_models in simple_learner
* minor docstring edits
* minor docstring edits
* minor docstring edits
* mv optimizer options inside LearningModel
* modified example accordingly
* fixed a bug
* fixed a bug
* fixed a bug
* added dueling DQN feature
* revised and refined docstrings
* fixed a bug
* fixed lint issues
* added load/dump functions to LearningModel
* fixed a bug
* fixed a bug
* fixed lint issues
* refined DQN docstrings
* removed load/dump functions from DQN
* added task validator
* fixed decorator use
* fixed a typo
* fixed a bug
* fixed lint issues
* changed LearningModel's step() to take a single loss
* revised learning model design
* revised example
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* added decorator utils to algorithm
* fixed a bug
* renamed core_model to model
* fixed a bug
* 1. fixed lint formatting issues; 2. refined learning model docstrings
* rm trailing whitespaces
* added decorator for choose_action
* fixed a bug
* fixed a bug
* fixed version-related issues
* renamed add_zeroth_dim decorator to expand_dim
* overhauled exploration abstraction
* fixed a bug
* fixed a bug
* fixed a bug
* added exploration related methods to abs_agent
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* separated learning with exploration schedule and without
* small fixes
* moved explorer logic to actor side
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* removed unwanted param from simple agent manager
* small fixes
* added shared_module property to LearningModel
* added shared_module property to LearningModel
* some revision to DDPG
* revised __getstate__ for LearningModel
* fixed a bug
* added soft_update function to learningModel
* fixed a bug
* revised learningModel
* rm __getstate__ and __setstate__ from LearningModel
* fixed some issues with DDPG code
* added noise explorer
* formatting
* fixed formatting
* removed unnecessary comma
* removed unnecessary comma
* fixed PR comments
* removed unwanted exception and imports
* removed unwanted exception and imports
* removed unwanted exception and imports
* fixed a bug
* fixed PR comments
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issue
* fixed a bug
* fixed lint issue
* fixed naming
* combined exploration param generation and early stopping in scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* fixed lint issue
* moved logger inside scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* fixed lint issue
* removed epsilon parameter from choose_action
* removed epsilon parameter from choose_action
* changed agent manager's train parameter to experience_by_agent
* fixed some PR comments
* renamed zero_grad to zero_gradients in LearningModule
* fixed some PR comments
* bug fix
* bug fix
* bug fix
* removed explorer abstraction from agent
* added DEVICE env variable as first choice for torch device
* refined dqn example
* fixed lint issues
* removed unwanted import in cim example
* updated cim-dqn notebook
* simplified scheduler
* edited notebook according to merged scheduler changes
* refined dimension check for learning module manager and removed num_actions from DQNConfig
* bug fix for cim example
* added notebook output
* removed early stopping from CIM dqn example
* fixed naming issues
* removed early stopping from cim example config
* moved decorator logic inside algorithms
* renamed early_stopping_callback to early_stopping_checker
* tmp commit
* tmp commit
* removed action_dim from noise explorer classes and added some shape checks
* modified NoiseExplorer's __call__ logic to batch processing
* made NoiseExplorer's __call__ return type np array
* renamed update to set_parameters in explorer
* fixed old naming in test_grass
* moved optimizer options to LearningModel
* typo fix
* fixed lint issues
* updated notebook
* fixed learning model naming
* fixed conflicts
* updated ddpg example
* misc edits
* 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook
* renamed single_host_cim_learner to cim_learner in notebook
* updated notebook output
* typo fix
* added ddpg example for cim
* fixed some bugs
* removed dimension check in absence of shared stack
* fixed a typo
* bug fixes
* bug fixes
* aligned with v0.2
* aligned with v0.2
* fixed lint issues
* added reference in ddpg.py
* fixed lint issues
* fixed lint issues
* fixed lint issues
* removed ddpg example
* checked out files from origin/v0.2 before merging
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* refine readme
* feat: refine data push/pull (#138)
* feat: refine data push/pull
* test: add cli provision testing
* fix: style fix
* fix: add necessary comments
* fix: from code review
* add fall back function in weather download (#112)
* fix deployment issue in multi envs
* fix typo
* fix ~/.maro not exist issue in build
* skip deploy when build
* update for comments
* temporarily disable weather info
* replace ecr with cim in setup.py
* replace ecr in manifest
* remove weather check when read data
* fix station id issue
* fix format
* add TODO in comments
* add noaa weather source
* fix weather reset and weather comment
* add comment for weather data url
* some format update
* add fall back function in weather download
* update comment
* update for comments
* update comment
* add period
* fix for pylint
* update for pylint check
* added example docs (#136)
* added example docs
* added citibike greedy example doc
* modified citibike doc
* fixed PR comments
* fixed more PR comments
* fixed small formatting issue
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* switch the key and value of handler_dict in decorator (#144)
* switch the key and value of handler_dict in decorator
* add dist decorator UT and fixed multithreading conflict in maro test suite
* pr comments update.
* resolved comments about decorator UT
* rename handler_fun in dist decorator
* change self.attr into class_name.attr
* update UT tests comments
* V0.1 annotation (#147)
* refine the annotation of simulator core
* remove reward from env(be)
* format refined
* white spaces test
* left-padding spaces refined
* format modified
* update the left-padding spaces of docstrings
* code format updated
* update according to comments
* update according to PR comments
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Event payload details for env.summary (#156)
* key_list of events added for env.summary
* code refined according to lint
* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments
* code format refined
* try to trigger the git tests
* update github workflow
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* V0.2 online lp for citi bike (#159)
* key_list of events added for env.summary
* code refined according to lint
* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments
* code format refined
* try to trigger the git tests
* update github workflow
* online LP example added for citi bike
* infeasible solution
* infeasible solution fixed: call snapshot before any env.step()
* experiment results of toy topos added
* experiment results of toy topos added
* experiment result update: better than naive baseline
* PuLP version added
* greedy experiment results update
* citibike result update
* modified according to PR comments
* update experiment results and forecasting comparison
* citi bike lp README updated
* README updated
* modified according to PR comments
* update according to PR comments
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
* V0.2 rl toolkit refinement (#165)
* refined rl abstractions
* fixed formatting issues
* checked out error-code related code from v0.2_pg
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* renamed save_models to dump_models
* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving
* renamed dump_experience_store to dump_experience_pool
* fixed a bug in the dump_experience_pool method
* fixed some PR comments
* fixed more PR comments
* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class
* fixed cim example according to rl toolkit changes
* fixed some more PR comments
* rewrote multi_process_launcher to eliminate the distributed section in config
* 1. fixed a typo; 2. added logging before early stopping
* fixed a bug
* fixed a bug
* fixed a bug
* added early stopping feature to CIM example
* fixed a typo
* fixed some issues with early stopping
* changed early stopping metric func
* fixed a bug
* fixed a bug
* added early stopping to dist mode cim
* added experience collecting func
* edited notebook according to changes in CIM example
* fixed bugs in nb
* fixed lint formatting issues
* fixed a typo
* fixed some PR comments
* fixed more PR comments
* revised docs
* removed nb output
* fixed a bug in simple_learner
* fixed a typo in nb
* fixed a bug
* fixed a bug
* fixed a bug
* removed unused import
* fixed a bug
* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing
* fixed some doc issues
* added output to nb
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* update according to flake8
* V0.2 Logical operator overloading for EarlyStoppingChecker (#178)
* 1. added logical operator overloading for early stopping checker; 2. added mean value checker
* fixed PR comments
* removed learner.exit() in single_process_launcher
* added another early stopping checker in example
* fixed PR comments and lint issues
* lint issue fix
* fixed lint issues
* fixed a bug
* fixed a bug
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 skip connection (#176)
* replaced IdentityLayers with nn.Identity
* 1. added skip connection option in FC_net; 2. generalized learning model
* added skip_connection option in config
* removed type casting in fc_net
* fixed lint formatting issues
* refined docstring
* added multi-head functionality to LearningModel
* refined learning model docstring
* added head_key param in learningModel forward
* fixed PR comments
* added top layer logic and is_top option in fc_net
* fixed a bug
* fixed a bug
* reverted some changes in learning model
* reverted some changes in learning model
* added members to learning model to fix the mode issue
* fixed a bug
* fixed mode setting issue in learning model
* removed learner.exit() in single_process_launcher
* fixed PR comments
* fixed rl/__init__
* fixed issues in example
* fixed a bug
* fixed a bug
* fixed lint formatting issues
* moved reward type casting to exp shaper
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* fixed a bug in learner's test() (#193)
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 double dqn (#188)
* added dueling action value model
* renamed params in dueling_action_value_model
* renamed shared_features to features
* replaced IdentityLayers with nn.Identity
* 1. added skip connection option in FC_net; 2. generalized learning model
* added skip_connection option in config
* removed type casting in fc_net
* fixed lint formatting issues
* refined docstring
* mv dueling_action_value_model and fixed some bugs
* added multi-head functionality to LearningModel
* refined learning model docstring
* added head_key param in learningModel forward
* added double DQN and dueling features to DQN
* fixed a bug
* added DuelingQModelHead enum
* fixed a bug
* removed unwanted file
* fixed PR comments
* added top layer logic and is_top option in fc_net
* fixed a bug
* fixed a bug
* reverted some changes in learning model
* reverted some changes in learning model
* added members to learning model to fix the mode issue
* fixed a bug
* fixed mode setting issue in learning model
* fixed PR comments
* revised cim example according to DQN changes
* renamed eval_model to q_value_model in cim example
* more fixes
* fixed a bug
* fixed a bug
* added doc per PR comments
* removed learner.exit() in single_process_launcher
* removed learner.exit() in single_process_launcher
* fixed PR comments
* fixed rl/__init__
* fixed issues in example
* fixed a bug
* fixed a bug
* fixed lint formatting issues
* double DQN feature
* fixed a bug
* fixed a bug
* fixed PR comments
* fixed lint issue
* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm
* added load_models in simple_learner
* minor docstring edits
* minor docstring edits
* set is_double to true in DQN config
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
* V0.2 feature predefined image (#183)
* feat: support predefined image provision
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* fix: error scripts invocation after using relative import
* fix: missing init.py
* fixed a bug in learner's test()
* feat: add distributed_config for dqn example
* test: update test for grass
* test: update test for k8s
* feat: add promptings for steps
* fix: change relative imports to absolute imports
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
* V0.2 feature proxy rejoin (#158)
* update dist decorator
* replace proxy.get_peers by proxy.peers
* update proxy rejoin (draft, not runnable for proxy rejoin)
* fix bugs in proxy
* add message cache, and redesign rejoin parameter
* feat: add checkpoint with test
* update proxy.rejoin
* fixed rejoin bug, rename func
* add test example(temp)
* feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents.
* capitalize env variable name
* rm json.dumps; change retries to 10; temp add warning level for rejoin
* fix: unable to load FaultToleranceAgent, missing params
* fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent
* feat: add node_id to node_details
* fix: add a new dependency for tests
* style: meet linting requirements
* style: remaining linting problems
* lint fixed; rm temp test folder.
* fixed lint f-string without placeholder
* fix: add a flag for "remove_container", refine restart logic and Redis keys naming
* proxy rejoin update.
* variable rename.
* fixed lint issues
* fixed lint issues
* add exit code for different error
* feat: add special errors handler
* add max rejoin times
* remove unused import
* add rejoin UT; resolve rejoin comments
* lint fixed
* fixed UT import problem
* rm MessageCache in proxy
* fix: refine key naming
* update proxy rejoin; add topic for broadcast
* feat: support predefined image provision
* update UT for communication
* add docstring for rejoin
* fixed isort and zmq driver import
* fixed isort and UT test
* fix isort issue
* proxy rejoin update (comments v2)
* fixed isort error
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* feat: add exists method for checkpoint
* fix: error scripts invocation after using relative import
* fix: missing init.py
* fixed a bug in learner's test()
* add driver close and socket SUB disconnect for rejoin
* feat: add distributed_config for dqn example
* test: update test for grass
* test: update test for k8s
* feat: add promptings for steps
* fix: change relative imports to absolute imports
* fixed comments and update logger level
* mv driver in proxy.__init__ for issue temp fixed.
* Update docstring and comments
* style: fix code reviews problems
* fix code format
Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 feature cli windows (#203)
* fix: change local mkdir to os.makedirs
* fix: add utf8 encoding for logger
* fix: add powershell.exe prefix to subprocess functions
* feat: add debug_green
* fix: use fsutil to create fix-size files in Windows
* fix: use universal_newlines=True to handle encoding problem in different operating systems
* fix: use temp file to do copy when the operating system is not Linux
* fix: linting error
* fix: use fsutil in test_k8s.py
* feat: dynamic init ABS_PATH in GlobalParams
* fix: use -Command to execute Powershell command
* fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode
* fix: problems in code review
* EventBuffer refine (#197)
* merge uniform event changes back
* 1st step: move executing events into stack for better removing performance
* flush event pool
* typo
* add option for env to enable event pool
* refine stack functions
* fix comment issues, add typings
* lint fixing
* lint fix
* add missing fix
* linting
* lint
* use linked list instead of the original event list and execute stack
* add missing file
* linting, and fixes
* add missing file
* linting fix
* fixing comments
* add missing file
* rename event_list to event_linked_list
* correct import path
* change enable_event_pool to disable_finished_events
* add missing file
* V0.2 merge master (#214)
* fix the visualization of docs/key_components/distributed_toolkit
* add examples into isort ignore
* refine import path for examples (#195)
* refine import path for examples
* refine indents
* fixed formatting issues
* update code style
* add editorconfig-checker, add editorconfig path into lint, change super-linter version
* change path for code saving in cim.gnn
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
* fix issue that sometimes there is conflict between distutils and setuptools (#208)
* fix issue that cython and setuptools conflict
* follow the accepted temp workaround
* update comment, it should be conflict between setuptools and distutils
* fixed bugs related to proxy interface changes
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
* typo fix
* Bug fix: event buffer issue that prevents Actions from being passed into the business engine (#215)
* bug fix
* clear the reference after extract sub events, update ut to cover this issue
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* fix flake8 style problem
* V0.2 feature refine mode namings (#212)
* feat: refine cli exception
* feat: refine mode namings
* EventBuffer refine (#197)
* merge uniform event changes back
* 1st step: move executing events into stack for better removing performance
* flush event pool
* typo
* add option for env to enable event pool
* refine stack functions
* fix comment issues, add typings
* lint fixing
* lint fix
* add missing fix
* linting
* lint
* use linked list instead of the original event list and execute stack
* add missing file
* linting, and fixes
* add missing file
* linting fix
* fixing comments
* add missing file
* rename event_list to event_linked_list
* correct import path
* change enable_event_pool to disable_finished_events
* add missing file
* fixed bugs in dist rl
* feat: rename files
* tests: set longer graceful wait time
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* fix: rm redundant variables
* fix: refine error message
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 vis new (#210)
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
* V0.2 local host process (#221)
* Update local process (not ready)
* update cli process mode
* add setup/clear/template for maro process
* fix process stop
* add logger and rename parameters
* add logger for setup/clear
* fixed closing of a non-existent pid when given a pid list.
* Fixed comments and rename setup/clear with create/delete
* update ProcessInternalError
* V0.2 grass on premises (#220)
* feat: refine cli exception
* commit on v0.2_grass_on_premises
Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 vm scheduling scenario (#189)
* Initialize
* Data center scenario init
* Code style modification
* V0.2 event buffer subevents expand (#180)
* V0.2 rl toolkit refinement (#165)
* refined rl abstractions
* fixed formatting issues
* checked out error-code related code from v0.2_pg
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* renamed save_models to dump_models
* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving
* renamed dump_experience_store to dump_experience_pool
* fixed a bug in the dump_experience_pool method
* fixed some PR comments
* fixed more PR comments
* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class
* fixed cim example according to rl toolkit changes
* fixed some more PR comments
* rewrote multi_process_launcher to eliminate the distributed section in config
* 1. fixed a typo; 2. added logging before early stopping
* fixed a bug
* fixed a bug
* fixed a bug
* added early stopping feature to CIM example
* fixed a typo
* fixed some issues with early stopping
* changed early stopping metric func
* fixed a bug
* fixed a bug
* added early stopping to dist mode cim
* added experience collecting func
* edited notebook according to changes in CIM example
* fixed bugs in nb
* fixed lint formatting issues
* fixed a typo
* fixed some PR comments
* fixed more PR comments
* revised docs
* removed nb output
* fixed a bug in simple_learner
* fixed a typo in nb
* fixed a bug
* fixed a bug
* fixed a bug
* removed unused import
* fixed a bug
* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing
* fixed some doc issues
* added output to nb
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* unfold sub-events, insert after parent
* remove event category, use different class instead, add helper functions to gen decision and action event
* add a method to support add immediate event to cascade event with tick validation
* fix ut issue
* add action as 1st sub event to ensure the executing order
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Data center scenario update
* Code style update
* Data scenario business engine update
* Isort update
* Fix lint code check
* Fix based on PR comments.
* Update based on PR comments.
* Add decision payload
* Add config file
* Update utilization series logic
* Update based on PR comment
* Update based on PR
* Update
* Update
* Add the ValidPm class
* Update docs string and naming
* Add energy consumption
* Lint code fixed
* Refining postpone function
* Lint style update
* Init data pipeline
* Update based on PR comment
* Add data pipeline download
* Lint style update
* Code style fix
* Temp update
* Data pipeline update
* Add aria2p download function
* Update based on PR comment
* Update based on PR comment
* Update based on PR comment
* Update naming of variables
* Rename topology
* Renaming
* Fix valid pm list
* Pylint fix
* Update comment
* Update docstring and comment
* Fix init import
* Update tick issue
* fix merge problem
* update style
* V0.2 datacenter data pipeline (#199)
* Data pipeline update
* Data pipeline update
* Lint update
* Update pipeline
* Add vmid mapping
* Update lint style
* Add VM data analytics
* Update notebook
* Add binary converter
* Modify vmtable yaml
* Update binary meta file
* Add cpu reader
* random example added for data center
* Fix bugs
* Fix pylint
* Add launcher
* Fix pylint
* best fit policy added
* Add reset
* Add config
* Add config
* Modify action object
* Modify config
* Fix naming
* Modify config
* Add snapshot list
* Modify a spelling typo
* Update based on PR comments.
* Rename scenario to vm scheduling
* Rename scenario
* Update print messages
* Lint fix
* Lint fix
* Rename scenario
* Modify the calculation of cpu utilization
* Add comment
* Modify data pipeline path
* Fix typo
* Modify naming
* Add unittest
* Add comment
* Unify naming
* Fix data path typo
* Update comments
* Update snapshot features
* Add take snapshot
* Add summary keys
* Update cpu reader
* Update naming
* Add unit test
* Rename snapshot node
* Add processed data pipeline
* Modify config
* Add comment
* Lint style fix
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Add package used in vm_scheduling
* add aria2p to test requirement
* best fit example: update the usage of snapshot
* Add aria2p to test requirement
* Remove finish event
* Fix unittest
* Add test dataset
* Update based on PR comment
* Refine cpu reader and unittest
* Lint update
* Refine based on PR comment
* Add agent index
* Add node mapping
* Refine based on PR comments
* Renaming postpone_step
* Renaming and refine based on PR comments
* Rename config
* Update
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* Resolve none action problem (#224)
* V0.2 vm_scheduling notebook (#223)
* Initialize
* Data center scenario init
* Code style modification
* V0.2 event buffer subevents expand (#180)
* V0.2 rl toolkit refinement (#165)
* refined rl abstractions
* fixed formatting issues
* checked out error-code related code from v0.2_pg
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* renamed save_models to dump_models
* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving
* renamed dump_experience_store to dump_experience_pool
* fixed a bug in the dump_experience_pool method
* fixed some PR comments
* fixed more PR comments
* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class
* fixed cim example according to rl toolkit changes
* fixed some more PR comments
* rewrote multi_process_launcher to eliminate the distributed section in config
* 1. fixed a typo; 2. added logging before early stopping
* fixed a bug
* fixed a bug
* fixed a bug
* added early stopping feature to CIM example
* fixed a typo
* fixed some issues with early stopping
* changed early stopping metric func
* fixed a bug
* fixed a bug
* added early stopping to dist mode cim
* added experience collecting func
* edited notebook according to changes in CIM example
* fixed bugs in nb
* fixed lint formatting issues
* fixed a typo
* fixed some PR comments
* fixed more PR comments
* revised docs
* removed nb output
* fixed a bug in simple_learner
* fixed a typo in nb
* fixed a bug
* fixed a bug
* fixed a bug
* removed unused import
* fixed a bug
* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing
* fixed some doc issues
* added output to nb
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* unfold sub-events, insert after parent
* remove event category, use different class instead, add helper functions to gen decision and action event
* add a method to support add immediate event to cascade event with tick validation
* fix ut issue
* add action as 1st sub event to ensure the executing order
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Data center scenario update
* Code style update
* Data scenario business engine update
* Isort update
* Fix lint code check
* Fix based on PR comments.
* Update based on PR comments.
* Add decision payload
* Add config file
* Update utilization series logic
* Update based on PR comment
* Update based on PR
* Update
* Update
* Add the ValidPm class
* Update docs string and naming
* Add energy consumption
* Lint code fixed
* Refining postpone function
* Lint style update
* Init data pipeline
* Update based on PR comment
* Add data pipeline download
* Lint style update
* Code style fix
* Temp update
* Data pipeline update
* Add aria2p download function
* Update based on PR comment
* Update based on PR comment
* Update based on PR comment
* Update naming of variables
* Rename topology
* Renaming
* Fix valid pm list
* Pylint fix
* Update comment
* Update docstring and comment
* Fix init import
* Update tick issue
* fix merge problem
* update style
* V0.2 datacenter data pipeline (#199)
* Data pipeline update
* Data pipeline update
* Lint update
* Update pipeline
* Add vmid mapping
* Update lint style
* Add VM data analytics
* Update notebook
* Add binary converter
* Modify vmtable yaml
* Update binary meta file
* Add cpu reader
* random example added for data center
* Fix bugs
* Fix pylint
* Add launcher
* Fix pylint
* best fit policy added
* Add reset
* Add config
* Add config
* Modify action object
* Modify config
* Fix naming
* Modify config
* Add snapshot list
* Modify a spelling typo
* Update based on PR comments.
* Rename scenario to vm scheduling
* Rename scenario
* Update print messages
* Lint fix
* Lint fix
* Rename scenario
* Modify the calculation of cpu utilization
* Add comment
* Modify data pipeline path
* Fix typo
* Modify naming
* Add unittest
* Add comment
* Unify naming
* Fix data path typo
* Update comments
* Update snapshot features
* Add take snapshot
* Add summary keys
* Update cpu reader
* Update naming
* Add unit test
* Rename snapshot node
* Add processed data pipeline
* Modify config
* Add comment
* Lint style fix
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Add package used in vm_scheduling
* add aria2p to test requirement
* best fit example: update the usage of snapshot
* Add aria2p to test requirement
* Remove finish event
* Fix unittest
* Add test dataset
* Update based on PR comment
* Refine cpu reader and unittest
* Lint update
* Refine based on PR comment
* Add agent index
* Add node mapping
* Init vm scheduling notebook
* Add notebook
* Refine based on PR comments
* Renaming postpone_step
* Renaming and refine based on PR comments
* Rename config
* Update based on the v0.2_datacenter
* Update notebook
* Update
* update filepath
* notebook updated
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* Update process mode docs and fixed on premises (#226)
* V0.2 Add github workflow integration (#222)
* test: add github workflow integration
* fix: split procedures && bug fixed
* test: add training only restriction
* fix: add 'approved' restriction
* fix: change default ssh port to 22
* style: in one line
* feat: add timeout for Subprocess.run
* test: change default node_size to Standard_D2s_v3
* style: refine style
* fix: add ssh_port param to on-premises mode
* fix: add missing init.py
* V0.2 explorer (#198)
* overhauled exploration abstraction
* fixed a bug
* fixed a bug
* fixed a bug
* added exploration related methods to abs_agent
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* separated learning with exploration schedule and without
* small fixes
* moved explorer logic to actor side
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* removed unwanted param from simple agent manager
* added noise explorer
* fixed formatting
* removed unnecessary comma
* fixed PR comments
* removed unwanted exception and imports
* fixed a bug
* fixed PR comments
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issue
* fixed a bug
* fixed lint issue
* fixed naming
* combined exploration param generation and early stopping in scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* fixed lint issue
* moved logger inside scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* removed epsilon parameter from choose_action
* fixed some PR comments
* fixed some PR comments
* bug fix
* bug fix
* bug fix
* removed explorer abstraction from agent
* refined dqn example
* fixed lint issues
* simplified scheduler
* removed early stopping from CIM dqn example
* removed early stopping from cim example config
* renamed early_stopping_callback to early_stopping_checker
* removed action_dim from noise explorer classes and added some shape checks
* modified NoiseExplorer's __call__ logic to batch processing
* made NoiseExplorer's __call__ return type np array
* renamed update to set_parameters in explorer
* fixed old naming in test_grass
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 embedded optim (#191)
* added dueling action value model
* renamed params in dueling_action_value_model
* renamed shared_features to features
* replaced IdentityLayers with nn.Identity
* 1. added skip connection option in FC_net; 2. generalized learning model
* added skip_connection option in config
* removed type casting in fc_net
* fixed lint formatting issues
* refined docstring
* mv dueling_action_value_model and fixed some bugs
* added multi-head functionality to LearningModel
* refined learning model docstring
* added head_key param in learningModel forward
* added double DQN and dueling features to DQN
* fixed a bug
* added DuelingQModelHead enum
* fixed a bug
* removed unwanted file
* fixed PR comments
* added top layer logic and is_top option in fc_net
* fixed a bug
* fixed a bug
* reverted some changes in learning model
* reverted some changes in learning model
* added members to learning model to fix the mode issue
* fixed a bug
* fixed mode setting issue in learning model
* fixed PR comments
* revised cim example according to DQN changes
* renamed eval_model to q_value_model in cim example
* more fixes
* fixed a bug
* fixed a bug
* added doc per PR comments
* removed learner.exit() in single_process_launcher
* removed learner.exit() in single_process_launcher
* fixed PR comments
* fixed rl/__init__
* fixed issues in example
* fixed a bug
* fixed a bug
* fixed lint formatting issues
* double DQN feature
* fixed a bug
* fixed a bug
* fixed PR comments
* fixed lint issue
* embedded optimizer into SingleHeadLearningModel
* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm
* added load_models in simple_learner
* minor docstring edits
* minor docstring edits
* minor docstring edits
* mv optimizer options inside LearningModel
* modified example accordingly
* fixed a bug
* fixed a bug
* fixed a bug
* added dueling DQN feature
* revised and refined docstrings
* fixed a bug
* fixed lint issues
* added load/dump functions to LearningModel
* fixed a bug
* fixed a bug
* fixed lint issues
* refined DQN docstrings
* removed load/dump functions from DQN
* added task validator
* fixed decorator use
* fixed a typo
* fixed a bug
* fixed lint issues
* changed LearningModel's step() to take a single loss
* revised learning model design
* revised example
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* added decorator utils to algorithm
* fixed a bug
* renamed core_model to model
* fixed a bug
* 1. fixed lint formatting issues; 2. refined learning model docstrings
* rm trailing whitespaces
* added decorator for choose_action
* fixed a bug
* fixed a bug
* fixed version-related issues
* renamed add_zeroth_dim decorator to expand_dim
* overhauled exploration abstraction
* fixed a bug
* fixed a bug
* fixed a bug
* added exploration related methods to abs_agent
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* separated learning with exploration schedule and without
* small fixes
* moved explorer logic to actor side
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* removed unwanted param from simple agent manager
* small fixes
* added shared_module property to LearningModel
* added shared_module property to LearningModel
* revised __getstate__ for LearningModel
* fixed a bug
* added soft_update function to learningModel
* fixed a bug
* revised learningModel
* rm __getstate__ and __setstate__ from LearningModel
* added noise explorer
* fixed formatting
* removed unnecessary comma
* removed unnecessary comma
* fixed PR comments
* removed unwanted exception and imports
* removed unwanted exception and imports
* fixed a bug
* fixed PR comments
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issue
* fixed a bug
* fixed lint issue
* fixed naming
* combined exploration param generation and early stopping in scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* fixed lint issue
* moved logger inside scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* fixed lint issue
* removed epsilon parameter from choose_action
* removed epsilon parameter from choose_action
* changed agent manager's train parameter to experience_by_agent
* fixed some PR comments
* renamed zero_grad to zero_gradients in LearningModule
* fixed some PR comments
* bug fix
* bug fix
* bug fix
* removed explorer abstraction from agent
* added DEVICE env variable as first choice for torch device
* refined dqn example
* fixed lint issues
* removed unwanted import in cim example
* updated cim-dqn notebook
* simplified scheduler
* edited notebook according to merged scheduler changes
* refined dimension check for learning module manager and removed num_actions from DQNConfig
* bug fix for cim example
* added notebook output
* removed early stopping from CIM dqn example
* removed early stopping from cim example config
* moved decorator logic inside algorithms
* renamed early_stopping_callback to early_stopping_checker
* removed action_dim from noise explorer classes and added some shape checks
* modified NoiseExplorer's __call__ logic to batch processing
* made NoiseExplorer's __call__ return type np array
* renamed update to set_parameters in explorer
* fixed old naming in test_grass
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 VM scheduling docs (#228)
* Initialize
* Data center scenario init
* Code style modification
* V0.2 event buffer subevents expand (#180)
* V0.2 rl toolkit refinement (#165)
* refined rl abstractions
* fixed formatting issues
* checked out error-code related code from v0.2_pg
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* renamed save_models to dump_models
* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving
* renamed dump_experience_store to dump_experience_pool
* fixed a bug in the dump_experience_pool method
* fixed some PR comments
* fixed more PR comments
* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class
* fixed cim example according to rl toolkit changes
* fixed some more PR comments
* rewrote multi_process_launcher to eliminate the distributed section in config
* 1. fixed a typo; 2. added logging before early stopping
* fixed a bug
* fixed a bug
* fixed a bug
* added early stopping feature to CIM example
* fixed a typo
* fixed some issues with early stopping
* changed early stopping metric func
* fixed a bug
* fixed a bug
* added early stopping to dist mode cim
* added experience collecting func
* edited notebook according to changes in CIM example
* fixed bugs in nb
* fixed lint formatting issues
* fixed a typo
* fixed some PR comments
* fixed more PR comments
* revised docs
* removed nb output
* fixed a bug in simple_learner
* fixed a typo in nb
* fixed a bug
* fixed a bug
* fixed a bug
* removed unused import
* fixed a bug
* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing
* fixed some doc issues
* added output to nb
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* unfold sub-events, insert after parent
* remove event category, use different class instead, add helper functions to gen decision and action event
* add a method to support add immediate event to cascade event with tick validation
* fix ut issue
* add action as 1st sub event to ensure the executing order
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Data center scenario update
* Code style update
* Data scenario business engine update
* Isort update
* Fix lint code check
* Fix based on PR comments.
* Update based on PR comments.
* Add decision payload
* Add config file
* Update utilization series logic
* Update based on PR comment
* Update based on PR
* Update
* Update
* Add the ValidPm class
* Update docs string and naming
* Add energy consumption
* Lint code fixed
* Refining postpone function
* Lint style update
* Init data pipeline
* Update based on PR comment
* Add data pipeline download
* Lint style update
* Code style fix
* Temp update
* Data pipeline update
* Add aria2p download function
* Update based on PR comment
* Update based on PR comment
* Update based on PR comment
* Update naming of variables
* Rename topology
* Renaming
* Fix valid pm list
* Pylint fix
* Update comment
* Update docstring and comment
* Fix init import
* Update tick issue
* fix merge problem
* update style
* V0.2 datacenter data pipeline (#199)
* Data pipeline update
* Data pipeline update
* Lint update
* Update pipeline
* Add vmid mapping
* Update lint style
* Add VM data analytics
* Update notebook
* Add binary converter
* Modify vmtable yaml
* Update binary meta file
* Add cpu reader
* random example added for data center
* Fix bugs
* Fix pylint
* Add launcher
* Fix pylint
* best fit policy added
* Add reset
* Add config
* Add config
* Modify action object
* Modify config
* Fix naming
* Modify config
* Add snapshot list
* Modify a spelling typo
* Update based on PR comments.
* Rename scenario to vm scheduling
* Rename scenario
* Update print messages
* Lint fix
* Lint fix
* Rename scenario
* Modify the calculation of cpu utilization
* Add comment
* Modify data pipeline path
* Fix typo
* Modify naming
* Add unittest
* Add comment
* Unify naming
* Fix data path typo
* Update comments
* Update snapshot features
* Add take snapshot
* Add summary keys
* Update cpu reader
* Update naming
* Add unit test
* Rename snapshot node
* Add processed data pipeline
* Modify config
* Add comment
* Lint style fix
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Add package used in vm_scheduling
* add aria2p to test requirement
* best fit example: update the usage of snapshot
* Add aria2p to test requirement
* Remove finish event
* Fix unittest
* Add test dataset
* Update based on PR comment
* vm doc init
* Update docs
* Update docs
* Update docs
* Update docs
* Remove old notebook
* Update docs
* Update docs
* Add figure
* Update docs
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* v0.2 VM Scheduling docs refinement (#231)
* Fix typo
* Refining vm scheduling docs
* V0.2 store refinement (#234)
* updated docs and images for rl toolkit
* 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store
* fixed lint issues
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Fix bug (#237)
vm scenario: fix the event type bug of the postpone event
* V0.2 rl toolkit doc (#235)
* updated docs and images for rl toolkit
* updated cim example doc
* updated cim example docs
* updated cim example rst
* updated rl_toolkit and cim example docs
* replaced q_module with q_net in example rst
* refined doc
* refined doc
* updated figures
* updated figures
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Merge V0.2 vis into V0.2 (#233)
* Implemented dump snapshots and convert to CSV.
* Let BE support params when dumping snapshots.
* Refactor dump code to core.py
* Implemented decision event dump.
* replace is not '' with !=''
* Fixed issues that code review mentioned.
* removed path from hello.py
* Changed import sort.
* Fix import sorting in citi_bike/business_engine
* visualization 0.1
* Updated lint configurations.
* Fixed formatting error that caused lint errors.
* render html title function
* Try to fix lint errors.
* flake-8 style fix
* remove space around 18,35
* dump_csv_converter.py re-formatting.
* files re-formatting.
* style fixed
* tab delete
* white space fix
* white space fix-2
* vis redundant function delete
* refine
* re-formatting after merged upstream.
* Updated import section.
* Updated import section.
* pr refine
* isort fix
* white space
* lint error
* \n error
* test continuation
* indent
* continuation of indent
* indent 0.3
* comment update
* comment update 0.2
* f-string update
* f-string 0.2
* lint 0.3
* lint 0.4
* lint 0.4
* lint 0.5
* lint 0.6
* docstring update
* data version deploy update
* condition update
* add whitespace
* V0.2 vis dump feature enhancement. (#190)
* Dumps added manifest file.
* Code updated format by flake8
* Changed manifest file format for easy reading.
* deploy info update; docs update
* weird white space
* Update dashboard_visualization.md
* new endline?
* delete dependency
* delete irrelevant file
* change scenario to enum, divide file path into a separated class
* doc refine
* doc update
* params type
* data structure update
* doc&enum, formula refine
* refine
* add ut, refine doc
* style refine
* isort
* strong type fix
* os._exit delete
* revert datalib
* import new line
* change test case
* change file name & doc
* change deploy path
* delete params
* revert file
* delete duplicate file
* delete single process
* update naming
* manually change import order
* delete blank
* edit error
* requirement txt
* style fix & refine
* comments&docstring refine
* add parameter name
* test & dump
* comments update
* Added manifest file. (#201)
Only a few changes that need to meet requirements of manifest file format.
* comments fix
* delete toolkit change
* doc update
* citi bike update
* deploy path
* datalib update
* revert datalib
* revert
* maro file format
* comments update
* doc update
* update param name
* doc update
* new link
* image update
* V0.2 visualization-0.1 (#181)
* visualization 0.1
* render html title function
* flake-8 style fix
* style fixed
* tab delete
* white space fix
* white space fix-2
* vis redundant function delete
* refine
* pr refine
* isort fix
* white space
* lint error
* \n error
* test continuation
* indent
* continuation of indent
* indent 0.3
* comment update
* comment update 0.2
* f-string update
* f-string 0.2
* lint 0.3
* lint 0.4
* lint 0.4
* lint 0.5
* lint 0.6
* docstring update
* data version deploy update
* condition update
* add whitespace
* deploy info update; docs update
* weird white space
* Update dashboard_visualization.md
* new endline?
* delete dependency
* delete irrelevant file
* change scenario to enum, divide file path into a separated class
* fix the visualization of docs/key_components/distributed_toolkit
* doc refine
* doc update
* params type
* add examples into isort ignore
* data structure update
* doc&enum, formula refine
* refine
* add ut, refine doc
* style refine
* isort
* strong type fix
* os._exit delete
* revert datalib
* import new line
* change test case
* change file name & doc
* change deploy path
* delete params
* revert file
* delete duplicate file
* delete single process
* update naming
* manually change import order
* delete blank
* edit error
* requirement txt
* style fix & refine
* comments&docstring refine
* add parameter name
* test & dump
* comments update
* comments fix
* delete toolkit change
* doc update
* citi bike update
* deploy path
* datalib update
* revert datalib
* revert
* maro file format
* comments update
* doc update
* update param name
* doc update
* new link
* image update
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
* image change
* add reset snapshot
* delete dump
* add new line
* add next steps
* import change
* relative import
* add init file
* import change
* change utils file
* change cliexception to clierror
* dashboard test
* change result
* change assertion
* move not
* unit test change
* core change
* unit test delete name_mapping_file
* update cim business engine
* doc update
* change relative path
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* change import sequence
* comments update
* doc add pic
* add dependency
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* Update dashboard_visualization.rst
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* delete white space
* doc update
* doc update
* update doc
* update doc
* update doc
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* V0.2 docs process mode (#230)
* Update process mode docs and fixed on premises
* Update orchestration docs
* Update process mode docs add JOB_NAME as env variable
* fixed bugs
* fixed isort issue
* update docs index
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>
* V0.2 learning model refinement (#236)
* moved optimizer options to LearningModel
* typo fix
* fixed lint issues
* updated notebook
* misc edits
* 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook
* renamed single_host_cim_learner to cim_learner in notebook
* updated notebook output
* typo fix
* removed dimension check in absence of shared stack
* fixed a typo
* fixed lint issues
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Update vm docs (#241)
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* V0.2 info update (#240)
* update readme
* update version
* refine readme format
* add vis gif
* add citation
* update citation
* update badge
Co-authored-by: Arthur Jiang <sjian@microsoft.com>
* Fix typo (#242)
* Fix typo
* fix typo
* fix
* syntax fix (#253)
* syntax fix
* syntax fix
* syntax fix
* rm unwanted import
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 vm oversubscription (#246)
* Remove topology
* Update pipeline
* Update pipeline
* Update pipeline
* Modify metafile
* Add two attributes of VM
* Update pipeline
* Add vm category
* Add todo
* Add oversub config
* Add oversubscription feature
* Lint fix
* Update based on PR comment.
* Update pipeline
* Update pipeline
* Update config.
* Update based on PR comment
* Update
* Add pm sku feature
* Add sku setting
* Add sku feature
* Lint fix
* Lint style
* Update sku, overloading
* Lint fix
* Lint style
* Fix bug
* Modify config
* Remove sku and replace it with pm type
* Add and refactor vm category
* Comment out config
* Unify the enum format
* Fix lint style
* Fix import order
* Update based on PR comment
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* V0.2 vm scheduling decision event (#257)
* Fix data preparation bug
* Add frame index
* V0.2 PG, K-step and lambda return utils (#155)
* fixed a bug
* fixed lint issues
* added load/dump functions to LearningModel
* fixed a bug
* fixed a bug
* fixed lint issues
* merged with v0.2_embedded_optims
* refined DQN docstrings
* removed load/dump functions from DQN
* added task validator
* fixed decorator use
* fixed a typo
* fixed a bug
* revised
* fixed lint issues
* changed LearningModel's step() to take a single loss
* revised learning model design
* revised example
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* added decorator utils to algorithm
* fixed a bug
* renamed core_model to model
* fixed a bug
* 1. fixed lint formatting issues; 2. refined learning model docstrings
* rm trailing whitespaces
* added decorator for choose_action
* fixed a bug
* fixed a bug
* fixed version-related issues
* renamed add_zeroth_dim decorator to expand_dim
* overhauled exploration abstraction
* fixed a bug
* fixed a bug
* fixed a bug
* added exploration related methods to abs_agent
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* separated learning with exploration schedule and without
* small fixes
* moved explorer logic to actor side
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* removed unwanted param from simple agent manager
* small fixes
* revised code based on revised abstractions
* fixed some bugs
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* added shared_module property to LearningModel
* added shared_module property to LearningModel
* fixed a bug with k-step return in AC
* fixed a bug
* fixed a bug
* merged pg, ac and ppo examples
* fixed a bug
* fixed a bug
* fixed naming for ppo
* renamed some variables in PPO
* added ActionWithLogProbability return type for PO-type algorithms
* fixed a bug
* fixed a bug
* fixed lint issues
* revised __getstate__ for LearningModel
* fixed a bug
* added soft_update function to learningModel
* fixed a bug
* revised learningModel
* rm __getstate__ and __setstate__ from LearningModel
* added noise explorer
* formatting
* fixed formatting
* removed unnecessary comma
* removed unnecessary comma
* removed unnecessary comma
* fixed PR comments
* removed unwanted exception and imports
* removed unwanted exception and imports
* fixed a bug
* fixed PR comments
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issue
* fixed a bug
* fixed lint issue
* fixed naming
* combined exploration param generation and early stopping in scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* fixed lint issue
* moved logger inside scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* fixed lint issue
* removed epsilon parameter from choose_action
* removed epsilon parameter from choose_action
* changed agent manager's train parameter to experience_by_agent
* fixed some PR comments
* renamed zero_grad to zero_gradients in LearningModule
* fixed some PR comments
* bug fix
* bug fix
* bug fix
* removed explorer abstraction from agent
* added DEVICE env variable as first choice for torch device
* refined dqn example
* fixed lint issues
* removed unwanted import in cim example
* updated cim-dqn notebook
* simplified scheduler
* edited notebook according to merged scheduler changes
* refined dimension check for learning module manager and removed num_actions from DQNConfig
* bug fix for cim example
* added notebook output
* updated cim PO example code according to changes in maro/rl
* removed early stopping from CIM dqn example
* combined ac and ppo and simplified example code and config
* removed early stopping from cim example config
* moved decorator logic inside algorithms
* renamed early_stopping_callback to early_stopping_checker
* put PG and AC under PolicyOptimization class and refined examples accordingly
* fixed lint issues
* removed action_dim from noise explorer classes and added some shape checks
* modified NoiseExplorer's __call__ logic to batch processing
* made NoiseExplorer's __call__ return type np array
* renamed update to set_parameters in explorer
* fixed old naming in test_grass
* moved optimizer options to LearningModel
* typo fix
* fixed lint issues
* updated notebook
* updated cim example for policy optimization
* typo fix
* typo fix
* typo fix
* typo fix
* misc edits
* minor edits to rl_toolkit.rst
* checked out docs from master
* fixed typo in k-step shaper
* fixed lint issues
* bug fix in store
* lint issue fix
* changed default max_ep to 100 for policy_optimization algos
* vis doc update to master (#244)
* refine readme
* feat: refine data push/pull (#138)
* feat: refine data push/pull
* test: add cli provision testing
* fix: style fix
* fix: add necessary comments
* fix: from code review
* add fall back function in weather download (#112)
* fix deployment issue in multi envs
* fix typo
* fix ~/.maro not exist issue in build
* skip deploy when build
* update for comments
* temporarily disable weather info
* replace ecr with cim in setup.py
* replace ecr in manifest
* remove weather check when read data
* fix station id issue
* fix format
* add TODO in comments
* add noaa weather source
* fix weather reset and weather comment
* add comment for weather data url
* some format update
* add fall back function in weather download
* update comment
* update for comments
* update comment
* add period
* fix for pylint
* update for pylint check
* added example docs (#136)
* added example docs
* added citibike greedy example doc
* modified citibike doc
* fixed PR comments
* fixed more PR comments
* fixed small formatting issue
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* switch the key and value of handler_dict in decorator (#144)
* switch the key and value of handler_dict in decorator
* add dist decorator UT and fixed multithreading conflict in maro test suite
* pr comments update.
* resolved comments about decorator UT
* rename handler_fun in dist decorator
* change self.attr into class_name.attr
* update UT tests comments
* V0.1 annotation (#147)
* refine the annotation of simulator core
* remove reward from env(be)
* format refined
* white spaces test
* left-padding spaces refined
* format modified
* update the left-padding spaces of docstrings
* code format updated
* update according to comments
* update according to PR comments
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Event payload details for env.summary (#156)
* key_list of events added for env.summary
* code refined according to lint
* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments
* code format refined
* try trigger the git tests
* update github workflow
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Implemented dump snapshots and convert to CSV.
* Let BE support params when dumping snapshot.
* Refactor dump code to core.py
* Implemented decision event dump.
* V0.2 online lp for citi bike (#159)
* key_list of events added for env.summary
* code refined according to lint
* 2 kinds of Payload added for CIM scenario; citi bike summary refined according to comments
* code format refined
* try trigger the git tests
* update github workflow
* online LP example added for citi bike
* infeasible solution
* infeasible solution fixed: call snapshot before any env.step()
* experiment results of toy topos added
* experiment results of toy topos added
* experiment result update: better than naive baseline
* PuLP version added
* greedy experiment results update
* citibike result update
* modified according to PR comments
* update experiment results and forecasting comparison
* citi bike lp README updated
* README updated
* modified according to PR comments
* update according to PR comments
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
* V0.2 rl toolkit refinement (#165)
* refined rl abstractions
* fixed formatting issues
* checked out error-code related code from v0.2_pg
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* renamed save_models to dump_models
* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving
* renamed dump_experience_store to dump_experience_pool
* fixed a bug in the dump_experience_pool method
* fixed some PR comments
* fixed more PR comments
* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class
* fixed cim example according to rl toolkit changes
* fixed some more PR comments
* rewrote multi_process_launcher to eliminate the distributed section in config
* 1. fixed a typo; 2. added logging before early stopping
* fixed a bug
* fixed a bug
* fixed a bug
* added early stopping feature to CIM example
* fixed a typo
* fixed some issues with early stopping
* changed early stopping metric func
* fixed a bug
* fixed a bug
* added early stopping to dist mode cim
* added experience collecting func
* edited notebook according to changes in CIM example
* fixed bugs in nb
* fixed lint formatting issues
* fixed a typo
* fixed some PR comments
* fixed more PR comments
* revised docs
* removed nb output
* fixed a bug in simple_learner
* fixed a typo in nb
* fixed a bug
* fixed a bug
* fixed a bug
* removed unused import
* fixed a bug
* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing
* fixed some doc issues
* added output to nb
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* replace is not '' with !=''
* Fixed issues that code review mentioned.
* removed path from hello.py
* Changed import sort.
* Fix import sorting in citi_bike/business_engine
* visualization 0.1
* Updated lint configurations.
* Fixed formatting error that caused lint errors.
* render html title function
* Try to fix lint errors.
* flake-8 style fix
* remove space around 18,35
* dump_csv_converter.py re-formatting.
* files re-formatting.
* style fixed
* tab delete
* white space fix
* white space fix-2
* vis redundant function delete
* refine
* update according to flake8
* re-formatting after merged upstream.
* Updated import section.
* Updated import section.
* V0.2 Logical operator overloading for EarlyStoppingChecker (#178)
* 1. added logical operator overloading for early stopping checker; 2. added mean value checker
* fixed PR comments
* removed learner.exit() in single_process_launcher
* added another early stopping checker in example
* fixed PR comments and lint issues
* lint issue fix
* fixed lint issues
* fixed a bug
* fixed a bug
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 skip connection (#176)
* replaced IdentityLayers with nn.Identity
* 1. added skip connection option in FC_net; 2. generalized learning model
* added skip_connection option in config
* removed type casting in fc_net
* fixed lint formatting issues
* refined docstring
* added multi-head functionality to LearningModel
* refined learning model docstring
* added head_key param in learningModel forward
* fixed PR comments
* added top layer logic and is_top option in fc_net
* fixed a bug
* fixed a bug
* reverted some changes in learning model
* reverted some changes in learning model
* added members to learning model to fix the mode issue
* fixed a bug
* fixed mode setting issue in learning model
* removed learner.exit() in single_process_launcher
* fixed PR comments
* fixed rl/__init__
* fixed issues in example
* fixed a bug
* fixed a bug
* fixed lint formatting issues
* moved reward type casting to exp shaper
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* pr refine
* isort fix
* white space
* lint error
* \n error
* test continuation
* indent
* continuation of indent
* indent 0.3
* comment update
* comment update 0.2
* f-string update
* f-string 0.2
* lint 0.3
* lint 0.4
* lint 0.4
* lint 0.5
* lint 0.6
* docstring update
* data version deploy update
* condition update
* add whitespace
* V0.2 vis dump feature enhancement. (#190)
* Dumps added manifest file.
* Code updated format by flake8
* Changed manifest file format for easy reading.
* deploy info update; docs update
* weird white space
* Update dashboard_visualization.md
* new endline?
* delete dependency
* delete irrelevant file
* change scenario to enum, divide file path into a separated class
* fixed a bug in learner's test() (#193)
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 double dqn (#188)
* added dueling action value model
* renamed params in dueling_action_value_model
* renamed shared_features to features
* replaced IdentityLayers with nn.Identity
* 1. added skip connection option in FC_net; 2. generalized learning model
* added skip_connection option in config
* removed type casting in fc_net
* fixed lint formatting issues
* refined docstring
* mv dueling_action_value_model and fixed some bugs
* added multi-head functionality to LearningModel
* refined learning model docstring
* added head_key param in learningModel forward
* added double DQN and dueling features to DQN
* fixed a bug
* added DuelingQModelHead enum
* fixed a bug
* removed unwanted file
* fixed PR comments
* added top layer logic and is_top option in fc_net
* fixed a bug
* fixed a bug
* reverted some changes in learning model
* reverted some changes in learning model
* added members to learning model to fix the mode issue
* fixed a bug
* fixed mode setting issue in learning model
* fixed PR comments
* revised cim example according to DQN changes
* renamed eval_model to q_value_model in cim example
* more fixes
* fixed a bug
* fixed a bug
* added doc per PR comments
* removed learner.exit() in single_process_launcher
* removed learner.exit() in single_process_launcher
* fixed PR comments
* fixed rl/__init__
* fixed issues in example
* fixed a bug
* fixed a bug
* fixed lint formatting issues
* double DQN feature
* fixed a bug
* fixed a bug
* fixed PR comments
* fixed lint issue
* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm
* added load_models in simple_learner
* minor docstring edits
* minor docstring edits
* set is_double to true in DQN config
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
* V0.2 feature predefined image (#183)
* feat: support predefined image provision
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* fix: error scripts invocation after using relative import
* fix: missing init.py
* fixed a bug in learner's test()
* feat: add distributed_config for dqn example
* test: update test for grass
* test: update test for k8s
* feat: add promptings for steps
* fix: change relative imports to absolute imports
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
* doc refine
* doc update
* params type
* data structure update
* doc&enum, formula refine
* refine
* add ut, refine doc
* style refine
* isort
* strong type fix
* os._exit delete
* revert datalib
* import new line
* change test case
* change file name & doc
* change deploy path
* delete params
* revert file
* delete duplicate file
* delete single process
* update naming
* manually change import order
* delete blank
* edit error
* requirement txt
* style fix & refine
* comments&docstring refine
* add parameter name
* test & dump
* comments update
* V0.2 feature proxy rejoin (#158)
* update dist decorator
* replace proxy.get_peers by proxy.peers
* update proxy rejoin (draft, not runnable for proxy rejoin)
* fix bugs in proxy
* add message cache, and redesign rejoin parameter
* feat: add checkpoint with test
* update proxy.rejoin
* fixed rejoin bug, rename func
* add test example(temp)
* feat: add FaultToleranceAgent, refine other MasterAgents and NodeAgents.
* capital env vari name
* rm json.dumps; change retries to 10; temp add warning level for rejoin
* fix: unable to load FaultToleranceAgent, missing params
* fix: delete mapping in StopJob if FaultTolerance is activated, add exception handler for FaultToleranceAgent
* feat: add node_id to node_details
* fix: add a new dependency for tests
* style: meet linting requirements
* style: remaining linting problems
* lint fixed; rm temp test folder.
* fixed lint f-string without placeholder
* fix: add a flag for "remove_container", refine restart logic and Redis keys naming
* proxy rejoin update.
* variable rename.
* fixed lint issues
* fixed lint issues
* add exit code for different error
* feat: add special errors handler
* add max rejoin times
* remove unused import
* add rejoin UT; resolve rejoin comments
* lint fixed
* fixed UT import problem
* rm MessageCache in proxy
* fix: refine key naming
* update proxy rejoin; add topic for broadcast
* feat: support predefined image provision
* update UT for communication
* add docstring for rejoin
* fixed isort and zmq driver import
* fixed isort and UT test
* fix isort issue
* proxy rejoin update (comments v2)
* fixed isort error
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* feat: add exists method for checkpoint
* fix: error scripts invocation after using relative import
* fix: missing init.py
* fixed a bug in learner's test()
* add driver close and socket SUB disconnect for rejoin
* feat: add distributed_config for dqn example
* test: update test for grass
* test: update test for k8s
* feat: add promptings for steps
* fix: change relative imports to absolute imports
* fixed comments and update logger level
* mv driver in proxy.__init__ for issue temp fixed.
* Update docstring and comments
* style: fix code reviews problems
* fix code format
Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 feature cli windows (#203)
* fix: change local mkdir to os.makedirs
* fix: add utf8 encoding for logger
* fix: add powershell.exe prefix to subprocess functions
* feat: add debug_green
* fix: use fsutil to create fix-size files in Windows
* fix: use universal_newlines=True to handle encoding problem in different operating systems
* fix: use temp file to do copy when the operating system is not Linux
* fix: linting error
* fix: use fsutil in test_k8s.py
* feat: dynamic init ABS_PATH in GlobalParams
* fix: use -Command to execute Powershell command
* fix: refine code style in k8s_azure_executor.py, add Windows support for k8s mode
* fix: problems in code review
* EventBuffer refine (#197)
* merge uniform event changes back
* 1st step: move executing events into stack for better removing performance
* flush event pool
* typo
* add option for env to enable event pool
* refine stack functions
* fix comment issues, add typings
* lint fixing
* lint fix
* add missing fix
* linting
* lint
* use linked list instead of original event list and execute stack
* add missing file
* linting, and fixes
* add missing file
* linting fix
* fixing comments
* add missing file
* rename event_list to event_linked_list
* correct import path
* change enable_event_pool to disable_finished_events
* add missing file
* V0.2 merge master (#214)
* fix the visualization of docs/key_components/distributed_toolkit
* add examples into isort ignore
* refine import path for examples (#195)
* refine import path for examples
* refine indents
* fixed formatting issues
* update code style
* add editorconfig-checker, add editorconfig path into lint, change super-linter version
* change path for code saving in cim.gnn
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
* fix issue that sometimes there is conflict between distutils and setuptools (#208)
* fix issue that cython and setuptools conflict
* follow the accepted temp workaround
* update comment, it should be conflict between setuptools and distutils
* fixed bugs related to proxy interface changes
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
* typo fix
* Bug fix: event buffer issue that cause Actions cannot be passed into business engine (#215)
* bug fix
* clear the reference after extract sub events, update ut to cover this issue
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* fix flake8 style problem
* V0.2 feature refine mode namings (#212)
* feat: refine cli exception
* feat: refine mode namings
* EventBuffer refine (#197)
* merge uniform event changes back
* 1st step: move executing events into stack for better removing performance
* flush event pool
* typo
* add option for env to enable event pool
* refine stack functions
* fix comment issues, add typings
* lint fixing
* lint fix
* add missing fix
* linting
* lint
* use linked list instead of original event list and execute stack
* add missing file
* linting, and fixes
* add missing file
* linting fix
* fixing comments
* add missing file
* rename event_list to event_linked_list
* correct import path
* change enable_event_pool to disable_finished_events
* add missing file
* fixed bugs in dist rl
* feat: rename files
* tests: set longer gracefully wait time
* style: fix linting errors
* style: fix linting errors
* style: fix linting errors
* fix: rm redundant variables
* fix: refine error message
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 vis new (#210)
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
* V0.2 local host process (#221)
* Update local process (not ready)
* update cli process mode
* add setup/clear/template for maro process
* fix process stop
* add logger and rename parameters
* add logger for setup/clear
* fixed closing a nonexistent pid when given a pid list.
* Fixed comments and rename setup/clear with create/delete
* update ProcessInternalError
* comments fix
* delete toolkit change
* doc update
* citi bike update
* deploy path
* datalib update
* revert datalib
* revert
* maro file format
* comments update
* doc update
* V0.2 grass on premises (#220)
* feat: refine cli exception
* commit on v0.2_grass_on_premises
Co-authored-by: Lyuchun Huang <romic.kid@gmail.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 vm scheduling scenario (#189)
* Initialize
* Data center scenario init
* Code style modification
* V0.2 event buffer subevents expand (#180)
* V0.2 rl toolkit refinement (#165)
* refined rl abstractions
* fixed formatting issues
* checked out error-code related code from v0.2_pg
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* renamed save_models to dump_models
* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving
* renamed dump_experience_store to dump_experience_pool
* fixed a bug in the dump_experience_pool method
* fixed some PR comments
* fixed more PR comments
* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class
* fixed cim example according to rl toolkit changes
* fixed some more PR comments
* rewrote multi_process_launcher to eliminate the distributed section in config
* 1. fixed a typo; 2. added logging before early stopping
* fixed a bug
* fixed a bug
* fixed a bug
* added early stopping feature to CIM example
* fixed a typo
* fixed some issues with early stopping
* changed early stopping metric func
* fixed a bug
* fixed a bug
* added early stopping to dist mode cim
* added experience collecting func
* edited notebook according to changes in CIM example
* fixed bugs in nb
* fixed lint formatting issues
* fixed a typo
* fixed some PR comments
* fixed more PR comments
* revised docs
* removed nb output
* fixed a bug in simple_learner
* fixed a typo in nb
* fixed a bug
* fixed a bug
* fixed a bug
* removed unused import
* fixed a bug
* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing
* fixed some doc issues
* added output to nb
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* unfold sub-events, insert after parent
* remove event category, use different class instead, add helper functions to gen decision and action event
* add a method to support add immediate event to cascade event with tick validation
* fix ut issue
* add action as 1st sub event to ensure the executing order
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Data center scenario update
* Code style update
* Data scenario business engine update
* Isort update
* Fix lint code check
* Fix based on PR comments.
* Update based on PR comments.
* Add decision payload
* Add config file
* Update utilization series logic
* Update based on PR comment
* Update based on PR
* Update
* Update
* Add the ValidPm class
* Update docs string and naming
* Add energy consumption
* Lint code fixed
* Refining postpone function
* Lint style update
* Init data pipeline
* Update based on PR comment
* Add data pipeline download
* Lint style update
* Code style fix
* Temp update
* Data pipeline update
* Add aria2p download function
* Update based on PR comment
* Update based on PR comment
* Update based on PR comment
* Update naming of variables
* Rename topology
* Renaming
* Fix valid pm list
* Pylint fix
* Update comment
* Update docstring and comment
* Fix init import
* Update tick issue
* fix merge problem
* update style
* V0.2 datacenter data pipeline (#199)
* Data pipeline update
* Data pipeline update
* Lint update
* Update pipeline
* Add vmid mapping
* Update lint style
* Add VM data analytics
* Update notebook
* Add binary converter
* Modify vmtable yaml
* Update binary meta file
* Add cpu reader
* random example added for data center
* Fix bugs
* Fix pylint
* Add launcher
* Fix pylint
* best fit policy added
* Add reset
* Add config
* Add config
* Modify action object
* Modify config
* Fix naming
* Modify config
* Add snapshot list
* Modify a spelling typo
* Update based on PR comments.
* Rename scenario to vm scheduling
* Rename scenario
* Update print messages
* Lint fix
* Lint fix
* Rename scenario
* Modify the calculation of cpu utilization
* Add comment
* Modify data pipeline path
* Fix typo
* Modify naming
* Add unittest
* Add comment
* Unify naming
* Fix data path typo
* Update comments
* Update snapshot features
* Add take snapshot
* Add summary keys
* Update cpu reader
* Update naming
* Add unit test
* Rename snapshot node
* Add processed data pipeline
* Modify config
* Add comment
* Lint style fix
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Add package used in vm_scheduling
* add aria2p to test requirement
* best fit example: update the usage of snapshot
* Add aria2p to test requirement
* Remove finish event
* Fix unittest
* Add test dataset
* Update based on PR comment
* Refine cpu reader and unittest
* Lint update
* Refine based on PR comment
* Add agent index
* Add node mapping
* Refine based on PR comments
* Renaming postpone_step
* Renaming and refine based on PR comments
* Rename config
* Update
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* Resolve none action problem (#224)
* V0.2 vm_scheduling notebook (#223)
* Initialize
* Data center scenario init
* Code style modification
* V0.2 event buffer subevents expand (#180)
* V0.2 rl toolkit refinement (#165)
* refined rl abstractions
* fixed formatting issues
* checked out error-code related code from v0.2_pg
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* renamed save_models to dump_models
* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving
* renamed dump_experience_store to dump_experience_pool
* fixed a bug in the dump_experience_pool method
* fixed some PR comments
* fixed more PR comments
* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class
* fixed cim example according to rl toolkit changes
* fixed some more PR comments
* rewrote multi_process_launcher to eliminate the distributed section in config
* 1. fixed a typo; 2. added logging before early stopping
* fixed a bug
* fixed a bug
* fixed a bug
* added early stopping feature to CIM example
* fixed a typo
* fixed some issues with early stopping
* changed early stopping metric func
* fixed a bug
* fixed a bug
* added early stopping to dist mode cim
* added experience collecting func
* edited notebook according to changes in CIM example
* fixed bugs in nb
* fixed lint formatting issues
* fixed a typo
* fixed some PR comments
* fixed more PR comments
* revised docs
* removed nb output
* fixed a bug in simple_learner
* fixed a typo in nb
* fixed a bug
* fixed a bug
* fixed a bug
* removed unused import
* fixed a bug
* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing
* fixed some doc issues
* added output to nb
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* unfold sub-events, insert after parent
* remove event category, use different class instead, add helper functions to gen decision and action event
* add a method to support add immediate event to cascade event with tick validation
* fix ut issue
* add action as 1st sub event to ensure the executing order
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Data center scenario update
* Code style update
* Data scenario business engine update
* Isort update
* Fix lint code check
* Fix based on PR comments.
* Update based on PR comments.
* Add decision payload
* Add config file
* Update utilization series logic
* Update based on PR comment
* Update based on PR
* Update
* Update
* Add the ValidPm class
* Update docs string and naming
* Add energy consumption
* Lint code fixed
* Refining postpone function
* Lint style update
* Init data pipeline
* Update based on PR comment
* Add data pipeline download
* Lint style update
* Code style fix
* Temp update
* Data pipeline update
* Add aria2p download function
* Update based on PR comment
* Update based on PR comment
* Update based on PR comment
* Update naming of variables
* Rename topology
* Renaming
* Fix valid pm list
* Pylint fix
* Update comment
* Update docstring and comment
* Fix init import
* Update tick issue
* fix merge problem
* update style
* V0.2 datacenter data pipeline (#199)
* Data pipeline update
* Data pipeline update
* Lint update
* Update pipeline
* Add vmid mapping
* Update lint style
* Add VM data analytics
* Update notebook
* Add binary converter
* Modify vmtable yaml
* Update binary meta file
* Add cpu reader
* random example added for data center
* Fix bugs
* Fix pylint
* Add launcher
* Fix pylint
* best fit policy added
* Add reset
* Add config
* Add config
* Modify action object
* Modify config
* Fix naming
* Modify config
* Add snapshot list
* Modify a spelling typo
* Update based on PR comments.
* Rename scenario to vm scheduling
* Rename scenario
* Update print messages
* Lint fix
* Lint fix
* Rename scenario
* Modify the calculation of cpu utilization
* Add comment
* Modify data pipeline path
* Fix typo
* Modify naming
* Add unittest
* Add comment
* Unify naming
* Fix data path typo
* Update comments
* Update snapshot features
* Add take snapshot
* Add summary keys
* Update cpu reader
* Update naming
* Add unit test
* Rename snapshot node
* Add processed data pipeline
* Modify config
* Add comment
* Lint style fix
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Add package used in vm_scheduling
* add aria2p to test requirement
* best fit example: update the usage of snapshot
* Add aria2p to test requirement
* Remove finish event
* Fix unittest
* Add test dataset
* Update based on PR comment
* Refine cpu reader and unittest
* Lint update
* Refine based on PR comment
* Add agent index
* Add node mapping
* Init vm scheduling notebook
* Add notebook
* Refine based on PR comments
* Renaming postpone_step
* Renaming and refine based on PR comments
* Rename config
* Update based on the v0.2_datacenter
* Update notebook
* Update
* update filepath
* notebook updated
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* Update process mode docs and fixed on premises (#226)
* V0.2 Add github workflow integration (#222)
* test: add github workflow integration
* fix: split procedures && bug fixed
* test: add training only restriction
* fix: add 'approved' restriction
* fix: change default ssh port to 22
* style: in one line
* feat: add timeout for Subprocess.run
* test: change default node_size to Standard_D2s_v3
* style: refine style
* fix: add ssh_port param to on-premises mode
* fix: add missing init.py
* update param name
* V0.2 explorer (#198)
* overhauled exploration abstraction
* fixed a bug
* fixed a bug
* fixed a bug
* added exploration related methods to abs_agent
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* separated learning with exploration schedule and without
* small fixes
* moved explorer logic to actor side
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* removed unwanted param from simple agent manager
* added noise explorer
* fixed formatting
* removed unnecessary comma
* fixed PR comments
* removed unwanted exception and imports
* fixed a bug
* fixed PR comments
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issue
* fixed a bug
* fixed lint issue
* fixed naming
* combined exploration param generation and early stopping in scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* fixed lint issue
* moved logger inside scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* removed epsilon parameter from choose_action
* fixed some PR comments
* fixed some PR comments
* bug fix
* bug fix
* bug fix
* removed explorer abstraction from agent
* refined dqn example
* fixed lint issues
* simplified scheduler
* removed early stopping from CIM dqn example
* removed early stopping from cim example config
* renamed early_stopping_callback to early_stopping_checker
* removed action_dim from noise explorer classes and added some shape checks
* modified NoiseExplorer's __call__ logic to batch processing
* made NoiseExplorer's __call__ return type np array
* renamed update to set_parameters in explorer
* fixed old naming in test_grass
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 embedded optim (#191)
* added dueling action value model
* renamed params in dueling_action_value_model
* renamed shared_features to features
* replaced IdentityLayers with nn.Identity
* 1. added skip connection option in FC_net; 2. generalized learning model
* added skip_connection option in config
* removed type casting in fc_net
* fixed lint formatting issues
* refined docstring
* mv dueling_action_value_model and fixed some bugs
* added multi-head functionality to LearningModel
* refined learning model docstring
* added head_key param in learningModel forward
* added double DQN and dueling features to DQN
* fixed a bug
* added DuelingQModelHead enum
* fixed a bug
* removed unwanted file
* fixed PR comments
* added top layer logic and is_top option in fc_net
* fixed a bug
* fixed a bug
* reverted some changes in learning model
* reverted some changes in learning model
* added members to learning model to fix the mode issue
* fixed a bug
* fixed mode setting issue in learning model
* fixed PR comments
* revised cim example according to DQN changes
* renamed eval_model to q_value_model in cim example
* more fixes
* fixed a bug
* fixed a bug
* added doc per PR comments
* removed learner.exit() in single_process_launcher
* removed learner.exit() in single_process_launcher
* fixed PR comments
* fixed rl/__init__
* fixed issues in example
* fixed a bug
* fixed a bug
* fixed lint formatting issues
* double DQN feature
* fixed a bug
* fixed a bug
* fixed PR comments
* fixed lint issue
* embedded optimizer into SingleHeadLearningModel
* 1. fixed PR comments related to load/dump; 2. removed abstract load/dump methods from AbsAlgorithm
* added load_models in simple_learner
* minor docstring edits
* minor docstring edits
* minor docstring edits
* mv optimizer options inside LearningModel
* modified example accordingly
* fixed a bug
* fixed a bug
* fixed a bug
* added dueling DQN feature
* revised and refined docstrings
* fixed a bug
* fixed lint issues
* added load/dump functions to LearningModel
* fixed a bug
* fixed a bug
* fixed lint issues
* refined DQN docstrings
* removed load/dump functions from DQN
* added task validator
* fixed decorator use
* fixed a typo
* fixed a bug
* fixed lint issues
* changed LearningModel's step() to take a single loss
* revised learning model design
* revised example
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* added decorator utils to algorithm
* fixed a bug
* renamed core_model to model
* fixed a bug
* 1. fixed lint formatting issues; 2. refined learning model docstrings
* rm trailing whitespaces
* added decorator for choose_action
* fixed a bug
* fixed a bug
* fixed version-related issues
* renamed add_zeroth_dim decorator to expand_dim
* overhauled exploration abstraction
* fixed a bug
* fixed a bug
* fixed a bug
* added exploration related methods to abs_agent
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* separated learning with exploration schedule and without
* small fixes
* moved explorer logic to actor side
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* removed unwanted param from simple agent manager
* small fixes
* added shared_module property to LearningModel
* added shared_module property to LearningModel
* revised __getstate__ for LearningModel
* fixed a bug
* added soft_update function to learningModel
* fixed a bug
* revised learningModel
* rm __getstate__ and __setstate__ from LearningModel
* added noise explorer
* fixed formatting
* removed unnecessary comma
* removed unnecessary comma
* fixed PR comments
* removed unwanted exception and imports
* removed unwanted exception and imports
* fixed a bug
* fixed PR comments
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issue
* fixed a bug
* fixed lint issue
* fixed naming
* combined exploration param generation and early stopping in scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* fixed lint issue
* moved logger inside scheduler
* fixed a bug
* fixed a bug
* fixed a bug
* fixed lint issues
* fixed lint issue
* removed epsilon parameter from choose_action
* removed epsilon parameter from choose_action
* changed agent manager's train parameter to experience_by_agent
* fixed some PR comments
* renamed zero_grad to zero_gradients in LearningModule
* fixed some PR comments
* bug fix
* bug fix
* bug fix
* removed explorer abstraction from agent
* added DEVICE env variable as first choice for torch device
* refined dqn example
* fixed lint issues
* removed unwanted import in cim example
* updated cim-dqn notebook
* simplified scheduler
* edited notebook according to merged scheduler changes
* refined dimension check for learning module manager and removed num_actions from DQNConfig
* bug fix for cim example
* added notebook output
* removed early stopping from CIM dqn example
* removed early stopping from cim example config
* moved decorator logic inside algorithms
* renamed early_stopping_callback to early_stopping_checker
* removed action_dim from noise explorer classes and added some shape checks
* modified NoiseExplorer's __call__ logic to batch processing
* made NoiseExplorer's __call__ return type np array
* renamed update to set_parameters in explorer
* fixed old naming in test_grass
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* V0.2 VM scheduling docs (#228)
* Initialize
* Data center scenario init
* Code style modification
* V0.2 event buffer subevents expand (#180)
* V0.2 rl toolkit refinement (#165)
* refined rl abstractions
* fixed formatting issues
* checked out error-code related code from v0.2_pg
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* fixed a bug
* renamed save_models to dump_models
* 1. set default batch_norm_enabled to True; 2. used state_dict in dqn model saving
* renamed dump_experience_store to dump_experience_pool
* fixed a bug in the dump_experience_pool method
* fixed some PR comments
* fixed more PR comments
* 1.fixed some PR comments; 2.added early_stopping_checker; 3.revised explorer class
* fixed cim example according to rl toolkit changes
* fixed some more PR comments
* rewrote multi_process_launcher to eliminate the distributed section in config
* 1. fixed a typo; 2. added logging before early stopping
* fixed a bug
* fixed a bug
* fixed a bug
* added early stopping feature to CIM example
* fixed a typo
* fixed some issues with early stopping
* changed early stopping metric func
* fixed a bug
* fixed a bug
* added early stopping to dist mode cim
* added experience collecting func
* edited notebook according to changes in CIM example
* fixed bugs in nb
* fixed lint formatting issues
* fixed a typo
* fixed some PR comments
* fixed more PR comments
* revised docs
* removed nb output
* fixed a bug in simple_learner
* fixed a typo in nb
* fixed a bug
* fixed a bug
* fixed a bug
* removed unused import
* fixed a bug
* 1. changed early stopping default config; 2. renamed param in early stopping checker and added typing
* fixed some doc issues
* added output to nb
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* unfold sub-events, insert after parent
* remove event category, use different class instead, add helper functions to gen decision and action event
* add a method to support add immediate event to cascade event with tick validation
* fix ut issue
* add action as 1st sub event to ensure the executing order
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Data center scenario update
* Code style update
* Data scenario business engine update
* Isort update
* Fix lint code check
* Fix based on PR comments.
* Update based on PR comments.
* Add decision payload
* Add config file
* Update utilization series logic
* Update based on PR comment
* Update based on PR
* Update
* Update
* Add the ValidPm class
* Update docs string and naming
* Add energy consumption
* Lint code fixed
* Refining postpone function
* Lint style update
* Init data pipeline
* Update based on PR comment
* Add data pipeline download
* Lint style update
* Code style fix
* Temp update
* Data pipeline update
* Add aria2p download function
* Update based on PR comment
* Update based on PR comment
* Update based on PR comment
* Update naming of variables
* Rename topology
* Renaming
* Fix valid pm list
* Pylint fix
* Update comment
* Update docstring and comment
* Fix init import
* Update tick issue
* fix merge problem
* update style
* V0.2 datacenter data pipeline (#199)
* Data pipeline update
* Data pipeline update
* Lint update
* Update pipeline
* Add vmid mapping
* Update lint style
* Add VM data analytics
* Update notebook
* Add binary converter
* Modify vmtable yaml
* Update binary meta file
* Add cpu reader
* random example added for data center
* Fix bugs
* Fix pylint
* Add launcher
* Fix pylint
* best fit policy added
* Add reset
* Add config
* Add config
* Modify action object
* Modify config
* Fix naming
* Modify config
* Add snapshot list
* Modify a spelling typo
* Update based on PR comments.
* Rename scenario to vm scheduling
* Rename scenario
* Update print messages
* Lint fix
* Lint fix
* Rename scenario
* Modify the calculation of cpu utilization
* Add comment
* Modify data pipeline path
* Fix typo
* Modify naming
* Add unittest
* Add comment
* Unify naming
* Fix data path typo
* Update comments
* Update snapshot features
* Add take snapshot
* Add summary keys
* Update cpu reader
* Update naming
* Add unit test
* Rename snapshot node
* Add processed data pipeline
* Modify config
* Add comment
* Lint style fix
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Add package used in vm_scheduling
* add aria2p to test requirement
* best fit example: update the usage of snapshot
* Add aria2p to test requirement
* Remove finish event
* Fix unittest
* Add test dataset
* Update based on PR comment
* vm doc init
* Update docs
* Update docs
* Update docs
* Update docs
* Remove old notebook
* Update docs
* Update docs
* Add figure
* Update docs
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* doc update
* new link
* image update
* v0.2 VM Scheduling docs refinement (#231)
* Fix typo
* Refining vm scheduling docs
* image change
* V0.2 store refinement (#234)
* updated docs and images for rl toolkit
* 1. fixed import formats for maro/rl; 2. changed decorators to hypers in store
* fixed lint issues
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Fix bug (#237)
vm scenario: fix the event type bug of the postpone event
* V0.2 rl toolkit doc (#235)
* updated docs and images for rl toolkit
* updated cim example doc
* updated cim example docs
* updated cim example rst
* updated rl_toolkit and cim example docs
* replaced q_module with q_net in example rst
* refined doc
* refined doc
* updated figures
* updated figures
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Merge V0.2 vis into V0.2 (#233)
* Implemented dump snapshots and convert to CSV.
* Let BE support params when dumping snapshots.
* Refactor dump code to core.py
* Implemented decision event dump.
* replace is not '' with !=''
* Fixed issues that code review mentioned.
* removed path from hello.py
* Changed import sort.
* Fix import sorting in citi_bike/business_engine
* visualization 0.1
* Updated lint configurations.
* Fixed formatting error that caused lint errors.
* render html title function
* Try to fix lint errors.
* flake-8 style fix
* remove space around 18,35
* dump_csv_converter.py re-formatting.
* files re-formatting.
* style fixed
* tab delete
* white space fix
* white space fix-2
* vis redundant function delete
* refine
* re-formatting after merged upstream.
* Updated import section.
* Updated import section.
* pr refine
* isort fix
* white space
* lint error
* \n error
* test continuation
* indent
* continuation of indent
* indent 0.3
* comment update
* comment update 0.2
* f-string update
* f-string 0.2
* lint 0.3
* lint 0.4
* lint 0.4
* lint 0.5
* lint 0.6
* docstring update
* data version deploy update
* condition update
* add whitespace
* V0.2 vis dump feature enhancement. (#190)
* Dumps added manifest file.
* Code updated format by flake8
* Changed manifest file format for easy reading.
* deploy info update; docs update
* weird white space
* Update dashboard_visualization.md
* new endline?
* delete dependency
* delete irrelevant file
* change scenario to enum, divide file path into a separated class
* doc refine
* doc update
* params type
* data structure update
* doc&enum, formula refine
* refine
* add ut, refine doc
* style refine
* isort
* strong type fix
* os._exit delete
* revert datalib
* import new line
* change test case
* change file name & doc
* change deploy path
* delete params
* revert file
* delete duplicate file
* delete single process
* update naming
* manually change import order
* delete blank
* edit error
* requirement txt
* style fix & refine
* comments&docstring refine
* add parameter name
* test & dump
* comments update
* Added manifest file. (#201)
Only a few changes that need to meet requirements of manifest file format.
* comments fix
* delete toolkit change
* doc update
* citi bike update
* deploy path
* datalib update
* revert datalib
* revert
* maro file format
* comments update
* doc update
* update param name
* doc update
* new link
* image update
* V0.2 visualization-0.1 (#181)
* visualization 0.1
* render html title function
* flake-8 style fix
* style fixed
* tab delete
* white space fix
* white space fix-2
* vis redundant function delete
* refine
* pr refine
* isort fix
* white space
* lint error
* \n error
* test continuation
* indent
* continuation of indent
* indent 0.3
* comment update
* comment update 0.2
* f-string update
* f-string 0.2
* lint 0.3
* lint 0.4
* lint 0.4
* lint 0.5
* lint 0.6
* docstring update
* data version deploy update
* condition update
* add whitespace
* deploy info update; docs update
* weird white space
* Update dashboard_visualization.md
* new endline?
* delete dependency
* delete irrelevant file
* change scenario to enum, divide file path into a separated class
* fix the visualization of docs/key_components/distributed_toolkit
* doc refine
* doc update
* params type
* add examples into isort ignore
* data structure update
* doc&enum, formula refine
* refine
* add ut, refine doc
* style refine
* isort
* strong type fix
* os._exit delete
* revert datalib
* import new line
* change test case
* change file name & doc
* change deploy path
* delete params
* revert file
* delete duplicate file
* delete single process
* update naming
* manually change import order
* delete blank
* edit error
* requirement txt
* style fix & refine
* comments&docstring refine
* add parameter name
* test & dump
* comments update
* comments fix
* delete toolkit change
* doc update
* citi bike update
* deploy path
* datalib update
* revert datalib
* revert
* maro file format
* comments update
* doc update
* update param name
* doc update
* new link
* image update
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
* image change
* add reset snapshot
* delete dump
* add new line
* add next steps
* import change
* relative import
* add init file
* import change
* change utils file
* change cliexception to clierror
* dashboard test
* change result
* change assertion
* move not
* unit test change
* core change
* unit test delete name_mapping_file
* update cim business engine
* doc update
* change relative path
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* change import sequence
* comments update
* doc add pic
* add dependency
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* Update dashboard_visualization.rst
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* delete white space
* doc update
* doc update
* update doc
* update doc
* update doc
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* V0.2 docs process mode (#230)
* Update process mode docs and fixed on premises
* Update orchestration docs
* Update process mode docs add JOB_NAME as env variable
* fixed bugs
* fixed isort issue
* update docs index
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>
* V0.2 learning model refinement (#236)
* moved optimizer options to LearningModel
* typo fix
* fixed lint issues
* updated notebook
* misc edits
* 1. renamed CIMAgent to DQNAgent; 2. moved create_dqn_agents to Agent section in notebook
* renamed single_host_cim_learner to cim_learner in notebook
* updated notebook output
* typo fix
* removed dimension check in absence of shared stack
* fixed a typo
* fixed lint issues
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Update vm docs (#241)
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* V0.2 info update (#240)
* update readme
* update version
* refine readme format
* add vis gif
* add citation
* update citation
* update badge
Co-authored-by: Arthur Jiang <sjian@microsoft.com>
* Fix typo (#242)
* Fix typo
* fix typo
* fix
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
* doc update
Co-authored-by: Arthur Jiang <sjian@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
Co-authored-by: Romic Huang <romic.kid@gmail.com>
Co-authored-by: zhanyu wang <pocket_2001@163.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com>
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>
* bug fix related to np array divide (#245)
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Master.simple bike (#250)
* notebook for simple bike repositioning added
* add simple rule-based algorithms
* unify input
* add policy based on statistics
* update be for simple bike scenario to fit latest event buffer changes (#247)
* change rendered graph
* figures updated
* change notebook
* matplot updated
* figures updated
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: wesley <Wenlei.Shi@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
* simple bike repositioning article: formula updated
* checked out docs/source from v0.2
* aligned with v0.2
* rm unwanted import
* added references in policy_optimization.py
* fixed lint issues
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com>
Co-authored-by: Arthur Jiang <sjian@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
Co-authored-by: Romic Huang <romic.kid@gmail.com>
Co-authored-by: zhanyu wang <pocket_2001@163.com>
Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com>
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>
* V0.2 backend dynamic node support (#172)
* update lint workflow
* fix workflow issue
* Update lint.yml
* Create tox.ini
* Update lint.yml
* Update lint.yml
* Update tox.ini
* Update lint.yml
* Delete tox.ini from root folder, move it to .github/linters
* Update CONTRIBUTING.md
* add more comments
* update lint conf to ignore cli banner issue
* change extension implementation from c to cpp
* update script to gen cpp files
* backend base interface redefine
* interface revamp for np backend
* 1st step for revamp
* bug fix
* draft
* implementation of attribute
* implementation of backend
* remove backend switching
* draft raw backend wrapper
* correct function parameter type
* 1st runnable version
* bug fix for types
* ut passed
* change CRLF to LF
* fix get_node_info interface
* add raw test in frame ut
* return np.array for all query result
* use ticks from backend
* set init value
* snapshot ut passed
* support setting default backend by environment variable
* env ut with different backend
* fix take snapshot index bug
* test under both backends
* ignore generated cpp file
* fix lint issues
* more lint fix
* use ordered map to store ticks to keep the order
* remove test code
* refine dup code
* refine code to avoid too much if/else
* handle and raise exception for attr getter
* change the way to handle cpp exception, use cython runtimeerror instead
* add missing function, and fix bug in np impl
* fix lint issue
* specify c++11 flag for compilers
* use normal field assignment instead of initializer list, as gcc on Linux complains about it
* add np ignore macro
* try to refine token pasting operator to avoid error on linux
* more pasting operator issue fix
* remove un-used options
* update workflow files to fit new backend
* 1st version of dynamic backend structure
* setup ut for cpp using lest
* bitset complete
* attributestore and ut
* arrange
* copy_to
* current frame
* ut for frame
* bug fix and ut correct
* fix issue that value not correct after arrange
* fix bug in test case
* frame update
* change the way to add nodes, support add node from middle
* frame in backend
* snapshotlist code complete
* add size method for snapshotlist, add ut template
* make sure snapshot max size not be 0
* add max size
* fix query parameters
* fix attribute store extend error
* add function to retrieve attribute from snapshotlist
* return nan for invalid index
* add function to check if nan for float attribute only
* fix bug where _last_tick was not updated for snapshot list, which caused a crash when taking a snapshot for the same tick
* add functions to expose internal state under debug mode, make it easy to do unit test
* fix issue that caused overlap logic to be skipped
* ut passed for all implemented functions
* remove query in ut, as it not completed yet
* refine querying interfaces, use 2 functions for 1 querying
* snapshot query,
* use pointer instead weak_ptr
* backend impl
* set default parameters value
* query bug fix,
* bug fix: new_attr should return attr id not node id
* use macro to create attribute getters
* add reset support
* change the way to reset, avoid allocation time
* test reset for attributestore
* use Bitset instead vector<bool> to make it easy to reset
* refine backend interfaces to make them compatible with the old ones
* correct querying interface, cython compile passed
* bug fix: get_ticks not set correct index
* correct cpp backend binding, add type for frame
* correct ut for snapshot
* bug fix: query cause crash after snapshot reset
* fix env test
* bug fix: is_nan should check data type first
* fix cim ut issues with raw backend
* fix citibike ut issues for raw backend
* add interfaces to support dynamic nodes, not tested
* bug fix: access cpp object without cdef
* bug fix: missing impl for dynamic methods
* ut for append nodes
* return node number dynamically
* remove unused parameters for snapshot
* remove unused code
* allow get attribute for deleted node
* ut for delete and resume node
* function to set attribute slot
* bug fix: set attribute will cause crash
* bug fix: remove append node when reset cause exception
* bug fix: frame.backend_type return incorrect name
* backends performance comparison
* correct internal type
* correct warnings
* missing ;
* formatting
* fix lint issue
* simplify the way to copy mapping
* add dump interfaces
* frame dump
* ignore if dump path does not exist
* bug fix: use max slots instead of current slots for padding in snapshot querying
* use max slot number in history instead of current for padding
* dump for snapshot
* close file at the end
* refine snapshot dump function
* fix lint issue
* avoid too much allocate operation
* use pointer instead of reference for further changes
* avoid 2 times map copy
* add comments for missing functions
* performance optimize
* use emplace instead push
* use emplace instead push
* remove cpp files
* add missing license
* ignore .vs folder
* add lest license for cpp unittest
* Delete CMakeLists.txt
* add error msg for exception, make it easy to identify error at python side
* remove old codes
* replace with new code
* change IDENTIER to NODE_TYPE and ATTR_TYPE
* build pass
* fix attr type not correct bug
* remove unused comment
* make frame ut pass
* correct the max snapshots checking
* fix test case
* add missing file
* correct performance test
* refine attribute code
* refine bitset code
* update FrameBase doc about switch backend
* correct the exception name
* refine frame code
* refine node code
* refine snapshot list code
* add is_const and is_list when adding attribute
* support query const attribute without tick exist
* add operations for list attribute
* remove cache as we have list attribute
* add remove and insert for list attribute
* add for-loop support for list attribute
* fix bug that not update list attribute slot number after operations
* test for dynamic features
* frame dump
* dump for snapshot list
* fix issue on gcc compiler
* add missing file
* fix lint issues
* refine the exception, more comments
* fix lint issue
* fix lint issue
* use simulate enum instead of str
* Use new type instead old in tests
* using mapping instead if-else
* remove generated code
* use mapping to reduce too much if-else
* add default attribute type int if not provided or invalid provided
* remove generated code
* update workflow with code gen
* more frame test
* add missing files
* test: cover maro.simulator.utils.common
* update test with new scenario
* comments
* tests
* update doc
* fix lint and comments
* CRLF to LF
* fix lint issue
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* V0.2 vm oversub docs (#256)
* Remove topology
* Update pipeline
* Update pipeline
* Update pipeline
* Modify metafile
* Add two attributes of VM
* Update pipeline
* Add vm category
* Add todo
* Add oversub config
* Add oversubscription feature
* Lint fix
* Update based on PR comment.
* Update pipeline
* Update pipeline
* Update config.
* Update based on PR comment
* Update
* Add pm sku feature
* Add sku setting
* Add sku feature
* Lint fix
* Lint style
* Update sku, overloading
* Lint fix
* Lint style
* Fix bug
* Modify config
* Remove sku and replace it with pm type
* Add and refactor vm category
* Comment out config
* Unify the enum format
* Fix lint style
* Fix import order
* Update based on PR comment
* Update overload to the VM docs
* Update docs
* Update vm docs
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Arthur Jiang <sjian@microsoft.com>
Co-authored-by: Arthur Jiang <ArthurSJiang@gmail.com>
Co-authored-by: Romic Huang <romic.kid@gmail.com>
Co-authored-by: zhanyu wang <pocket_2001@163.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: kaiqli <59279714+kaiqli@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: Wenlei Shi <Wenlei.Shi@microsoft.com>
Co-authored-by: Michael Li <mic_lee2000@hotmail.com>
Co-authored-by: kyu-kuanwei <72911362+kyu-kuanwei@users.noreply.github.com>
Co-authored-by: Meroy Chen <39452768+Meroy9819@users.noreply.github.com>
Co-authored-by: Miaoran Chen (Wicresoft) <v-miaorc@microsoft.com>
Co-authored-by: kaiqli <v-kaiqli@microsoft.com>
Co-authored-by: Kuan Wei Yu <v-kyu@microsoft.com>
* update lint workflow
* fix workflow issue
* Update lint.yml
* Create tox.ini
* Update lint.yml
* Update lint.yml
* Update tox.ini
* Update lint.yml
* Delete tox.ini from root folder, move it to .github/linters
* Update CONTRIBUTING.md
* add more comments
* update lint conf to ignore cli banner issue
* change extension implementation from c to cpp
* update script to gen cpp files
* backend base interface redefine
* interface revamp for np backend
* 1st step for revamp
* bug fix
* draft
* implementation of attribute
* implementation of backend
* remove backend switching
* draft raw backend wrapper
* correct function parameter type
* 1st runnable version
* bug fix for types
* ut passed
* change CRLF to LF
* fix get_node_info interface
* add raw test in frame ut
* return np.array for all query result
* use ticks from backend
* set init value
* snapshot ut passed
* support setting default backend by environment variable
* env ut with different backend
* fix take snapshot index bug
* test under both backends
* ignore generated cpp file
* fix lint issues
* more lint fix
* use ordered map to store ticks to keep the order
* remove test code
* refine dup code
* refine code to avoid too much if/else
* handle and raise exception for attr getter
* change the way to handle cpp exception, use cython runtimeerror instead
* add missing function, and fix bug in np impl
* fix lint issue
* specify c++11 flag for compilers
* use normal field assignment instead of initializer list, as gcc on Linux complains about it
* add np ignore macro
* try to refine token pasting operator to avoid error on linux
* more pasting operator issue fix
* remove un-used options
* update workflow files to fit new backend
* 1st version of dynamic backend structure
* setup ut for cpp using lest
* bitset complete
* attributestore and ut
* arrange
* copy_to
* current frame
* ut for frame
* bug fix and ut correct
* fix issue that value not correct after arrange
* fix bug in test case
* frame update
* change the way to add nodes, support add node from middle
* frame in backend
* snapshotlist code complete
* add size method for snapshotlist, add ut template
* make sure snapshot max size not be 0
* add max size
* fix query parameters
* fix attribute store extend error
* add function to retrieve attribute from snapshotlist
* return nan for invalid index
* add function to check if nan for float attribute only
* fix bug where _last_tick was not updated for snapshot list, which caused a crash when taking a snapshot for the same tick
* add functions to expose internal state under debug mode, make it easy to do unit test
* fix issue that caused overlap logic to be skipped
* ut passed for all implemented functions
* remove query in ut, as it is not completed yet
* refine querying interfaces, use 2 functions for 1 querying
* snapshot query
* use pointer instead weak_ptr
* backend impl
* set default parameters value
* query bug fix
* bug fix: new_attr should return attr id not node id
* use macro to create attribute getters
* add reset support
* change the way to reset, avoid allocation time
* test reset for attributestore
* use Bitset instead vector<bool> to make it easy to reset
* refine backend interfaces to make them compatible with the old one
* correct querying interface, cython compile passed
* bug fix: get_ticks not set correct index
* correct cpp backend binding, add type for frame
* correct ut for snapshot
* bug fix: query causes crash after snapshot reset
* fix env test
* bug fix: is_nan should check data type first
* fix cim ut issues with raw backend
* fix citibike ut issues for raw backend
* add interfaces to support dynamic nodes, not tested
* bug fix: access cpp object without cdef
* bug fix: missing impl for dynamic methods
* ut for append nodes
* return node number dynamically
* remove unused parameters for snapshot
* remove unused code
* allow get attribute for deleted node
* ut for delete and resume node
* function to set attribute slot
* bug fix: set attribute will cause crash
* bug fix: remove append node when reset cause exception
* bug fix: frame.backend_type return incorrect name
* backends performance comparison
* correct internal type
* correct warnings
* missing ;
* formatting
* fix lint issue
* simplify the way to copy mapping
* add dump interfaces
* frame dump
* ignore if dump path does not exist
* bug fix: use max slots instead of current slots for padding in snapshot querying
* use max slot number in history instead of current for padding
* dump for snapshot
* close file at the end
* refine snapshot dump function
* fix lint issue
* avoid too much allocate operation
* use pointer instead of reference for further changes
* avoid 2 times map copy
* add comments for missing functions
* performance optimize
* use emplace instead push
* use emplace instead push
* remove cpp files
* add missing license
* ignore .vs folder
* add lest license for cpp unittest
* Delete CMakeLists.txt
* add error msg for exception, make it easy to identify error at python side
* remove old codes
* replace with new code
* change IDENTIER to NODE_TYPE and ATTR_TYPE
* build pass
* fix attr type not correct bug
* remove unused comment
* make frame ut pass
* correct the max snapshots checking
* fix test case
* add missing file
* correct performance test
* refine attribute code
* refine bitset code
* update FrameBase doc about switch backend
* correct the exception name
* refine frame code
* refine node code
* refine snapshot list code
* add is_const and is_list when adding attribute
* support query const attribute without tick exist
* add operations for list attribute
* remove cache as we have list attribute
* add remove and insert for list attribute
* add for-loop support for list attribute
* fix bug where list attribute slot number is not updated after operations
* test for dynamic features
* frame dump
* dump for snapshot list
* fix issue on gcc compiler
* add missing file
* fix lint issues
* refine the exception, more comments
* fix lint issue
* fix lint issue
* use simulate enum instead of str
* Use new type instead old in tests
* using mapping instead if-else
* remove generated code
* use mapping to reduce too much if-else
* add default attribute type int if not provided or invalid provided
* remove generated code
* update workflow with code gen
* more frame test
* add missing files
* test: cover maro.simulator.utils.common
* update test with new scenario
* comments
* tests
* update doc
* fix lint and comments
* CRLF to LF
* fix lint issue
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>