* updated images and refined doc
* updated images
* updated CIM-AC example
* refined proxy retry logic
* call policy update only for AbsCorePolicy
* add limitation of AbsCorePolicy in Actor.collect()
* refined actor to return only experiences for policies that received new experiences
* fix MsgKey issue in rollout_manager
* fix typo in learner
* call exit function for parallel rollout manager
* update supply chain example distributed training scripts
* 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prototype
* reformat render
* fix supply chain business engine action type problem
* reset supply chain example render figsize from 4 to 3
* Add render to all modes of supply chain example
* fix or policy typos
* 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes
* refined parallel policy manager
* updated rl/__init__.py
* fixed lint issues and CIM local learner bugs
* deleted unwanted supply_chain test files
* revised default config for cim-dqn
* removed test_store.py as it is no longer needed
* 1. changed Actor class to rollout_worker function; 2. renamed algorithm to algorithms
* updated figures
* removed unwanted import
* refactored CIM-DQN example
* added MultiProcessRolloutManager and MultiProcessTrainingManager
* updated doc
* lint issue fix
* lint issue fix
* fixed import formatting
* [Feature] Prioritized Experience Replay (#355)
* added prioritized experience replay
* deleted unwanted supply_chain test files
* fixed import order
* import fix
* fixed lint issues
* fixed import formatting
* added note in docstring that rank-based PER has yet to be implemented
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* rm AbsDecisionGenerator
* small fixes
* bug fix
* reorganized training folder structure
* fixed lint issues
* fixed lint issues
* policy manager refined
* lint fix
* restructured CIM-dqn sync code
* added policy version index and used it as a measure of experience staleness
* lint issue fix
* lint issue fix
* switched log_dir and proxy_kwargs order
* cim example refinement
* eval schedule sorted only when it's a list
* update sc env wrapper
* added docker scripts for cim-dqn
* refactored example folder structure and added workflow templates
* fixed lint issues
* fixed lint issues
* fixed template bugs
* removed unused imports
* refactoring sc in progress
* simplified cim meta
* fixed build.sh path bug
* template refinement
* deleted obsolete svgs
* updated learner logs
* minor edits
* refactored templates for easy merge with async PR
* added component names for rollout manager and policy manager
* fixed incorrect position to add last episode to eval schedule
* added max_lag option in templates
* formatting edit in docker_compose_yml script
* moved local learner and early stopper outside sync_tools
* refactored rl toolkit folder structure
* moved env_wrapper and agent_wrapper inside rl/learner
* refined scripts
* fixed typo in script
* changes needed for running sc
* removed unwanted imports
* config change for testing sc scenario
* changes for perf testing
* Asynchronous Training (#364)
* remote inference code draft
* changed actor to rollout_worker and updated init files
* removed unwanted import
* updated inits
* more async code
* added async scripts
* added async training code & scripts for CIM-dqn
* changed async to async_tools to avoid conflict with python keyword
* reverted unwanted change to dockerfile
* added doc for policy server
* addressed PR comments and fixed a bug in docker_compose_yml.py
* fixed lint issue
* resolved PR comment
* resolved merge conflicts
* added async templates
* added proxy.close() for actor and policy_server
* fixed incorrect position to add last episode to eval schedule
* reverted unwanted changes
* added missing async files
* rm unwanted echo in kill.sh
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* renamed sync to synchronous and async to asynchronous to avoid conflict with keyword
* added missing policy version increment in LocalPolicyManager
* refined rollout manager recv logic
* removed a debugging print
* added sleep in distributed launcher to avoid hanging
* updated api doc and rl toolkit doc
* refined dynamic imports using importlib
* 1. moved policy update triggers to policy manager; 2. added version control in policy manager
* fixed a few bugs and updated cim RL example
* fixed a few more bugs
* added agent wrapper instantiation to workflows
* removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet
* fixed incorrect get_ac_policy signature for CIM
* moved exploration inside core policy
* added state to exploration call to support context-dependent exploration
* separated non_rl_policy_index and rl_policy_index in workflows
* modified sc example code according to workflow changes
* added replay_agent_ids parameter to get_env_func for RL examples
* fixed a few bugs
* added maro/simulator/scenarios/supply_chain as bind mount
* added post-step, post-collect, post-eval and post-update callbacks
* fixed lint issues
* fixed lint issues
* moved instantiation of policy manager inside simple learner
* fixed env_wrapper get_reward signature
* minor edits
* removed get_experience kwargs from env_wrapper
* 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows
* added rollout exp distribution option in RL examples
* removed unwanted files
* 1. made logger internal in learner; 2. removed logger creation in abs classes
* checked out supply chain test files from v0.2_sc
* 1. added missing model.eval() to choose_action; 2. added entropy features to AC
* fixed a bug in ac entropy
* abbreviated coefficient to coeff
* removed -dqn from job name in rl example config
* added tmp patch to dev.df
* renamed image name for running rl examples
* added get_loss interface for core policies
* added policy manager in rl_toolkit.rst
* 1. env_wrapper bug fix; 2. policy manager update logic refinement
* refactored policy and algorithms
* policy interface redesigned
* refined policy interfaces
* fixed typo
* fixed bugs in refactored policy interface
* fixed some bugs
* refactoring in progress
* policy interface and policy manager redesigned
* 1. fixed bugs in ac and pg; 2. fixed bugs in rl workflow scripts
* fixed bug in distributed policy manager
* fixed lint issues
* fixed lint issues
* added scipy in setup
* 1. trimmed rollout manager code; 2. added option to docker scripts
* updated api doc for policy manager
* 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script
* 1. simplified rl example structure; 2. fixed lint issues
* further rl toolkit code simplifications
* more numpy-based optimization in RL toolkit
* moved replay buffer inside policy
* bug fixes
* numpy optimization and associated refactoring
* extracted shaping logic out of env_sampler
* fixed bug in CIM shaping and lint issues
* preliminary implementation of parallel batch inference
* fixed bug in ddpg transition recording
* put get_state, get_env_actions, get_reward back in EnvSampler
* simplified exploration and core model interfaces
* bug fixes and doc update
* added improve() interface for RLPolicy for single-thread support
* fixed simple policy manager bug
* updated doc, rst, notebook
* updated notebook
* fixed lint issues
* fixed entropy bugs in ac.py
* reverted to simple policy manager as default
* 1. unified single-thread and distributed mode in learning_loop.py; 2. updated api doc for algorithms and rst for rl toolkit
* fixed lint issues and updated rl toolkit images
* removed obsolete images
* added back agent2policy for general workflow use
* V0.2 rl refinement dist (#377)
* Support `slice` operation in ExperienceSet
* Support naive distributed policy training by proxy
* Dynamically allocate trainers according to number of experience
* code check
* code check
* code check
* Fix a bug in distributed training with no gradient
* Code check
* Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy
* 1. call allocate_trainer() at the start of update(); 2. refine according to code review
* Code check
* Refine code with new interface
* Update docs of PolicyManager and ExperienceSet
* Add images for rl_toolkit docs
* Update diagram of PolicyManager
* Refine with new interface
* Extract allocation strategy into `allocation_strategy.py`
* add `distributed_learn()` in policies for data-parallel training
* Update doc of RL_toolkit
* Add gradient workers for data-parallel
* Refine code and update docs
* Lint check
* Refine by comments
* Rename `trainer` to `worker`
* Rename `distributed_learn` to `learn_with_data_parallel`
* Refine allocator and remove redundant code in policy_manager
* remove arguments in allocate_by_policy and so on
* added checkpointing for simple and multi-process policy managers
* 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager
* added missing set_state and get_state for CIM policies
* removed blank line
* updated RL workflow README
* Integrate `data_parallel` arguments into `worker_allocator` (#402)
* 1. simplified workflow config; 2. added comments to CIM shaping
* lint issue fix
* 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading
* 1. moved post_step callback inside env sampler; 2. updated README for rl workflows
* refined README for CIM
* VM scheduling with RL (#375)
* added part of vm scheduling RL code
* refined vm env_wrapper code style
* added DQN
* added get_experiences func for ac in vm scheduling
* added post_step callback to env wrapper
* moved Aiming's tracking and plotting logic into callbacks
* added eval env wrapper
* renamed AC config variable name for VM
* vm scheduling RL code finished
* updated README
* fixed various bugs and hard coding for vm_scheduling
* uncommented callbacks for VM scheduling
* Minor revision for better code style
* added part of vm scheduling RL code
* refined vm env_wrapper code style
* vm scheduling RL code finished
* added config.py for vm scheduling
* vm example refactoring
* fixed bugs in vm_scheduling
* removed unwanted files from cim dir
* reverted to simple policy manager as default
* added part of vm scheduling RL code
* refined vm env_wrapper code style
* vm scheduling RL code finished
* added config.py for vm scheduling
* resolved rebase conflicts
* fixed bugs in vm_scheduling
* added get_state and set_state to vm_scheduling policy models
* updated README for vm_scheduling with RL
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
* SC refinement (#397)
* Refine test scripts & pending_order_daily logic
* Refactor code for better code style: complete type hint, correct typos, remove unused items.
* Polish test_supply_chain.py
* update import format
* Modify vehicle steps logic & remove outdated test case
* Optimize imports
* Optimize imports
* Lint error
* Lint error
* Lint error
* Add SupplyChainAction
* Lint error
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* refined workflow scripts
* fixed bug in ParallelAgentWrapper
* 1. fixed lint issues; 2. refined main script in workflows
* lint issue fix
* restored default config for rl example
* Update rollout.py
* refined env var processing in policy manager workflow
* added hasattr check in agent wrapper
* updated docker_compose_yml.py
* Minor refinement
* Minor PR. Prepare to merge latest master branch into v0.3 branch. (#412)
* Prepare to merge master_mirror
* Lint error
* Minor
* Merge latest master into v0.3 (#426)
* update docker hub init (#367)
* update docker hub init
* replace personal account with maro-team
* update hello files for CIM
* update docker repository name
* update docker file name
* fix bugs in notebook, rectify docs
* fix doc build issue
* remove docs from playground; fix citibike lp example Event issue
* update the example for vector env
* update vector env example
* update README due to PR comments
* add link to playground above MARO installation in README
* fix some typos
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* update package version
* update README for package description
* update image links for pypi package description
* change the input topology schema for CIM real data mode (#372)
* change the input topology schema for CIM real data mode
* remove unused importing
* update test config file correspondingly
* add Exception for env test
* add cost factors to cim data dump
* update CimDataCollection field name
* update field name of data collection related code
* update package version
* adjust interface to reflect actual signature (#374)
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
* update dataclasses requirement to setup
* fix: fixing spelling and grammar
* fix: fix spelling typos in code comments and data_model.rst
* Fix Geo vis IP address & SQL logic bugs. (#383)
Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).
* Fix the "Wrong future stop tick predictions" bug (#386)
* Propose my new solution
Refine to the pre-process version
* Optimize import
* Fix reset random seed bug (#387)
* update the reset interface of Env and BE
* Try to fix reset routes generation seed issue
* Refine random related logics.
* Minor refinement
* Test check
* Minor
* Remove unused functions so far
* Minor
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
* update package version
* Add _init_vessel_plans in business_engine.reset (#388)
* update package version
* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Refine `event_buffer/` module (#389)
* Core & Business Engine code refinement (#392)
* First version
* Optimize imports
* Add typehint
* Lint check
* Lint check
* add higher python version (#398)
* add higher python version
* update pytorch version
* update torchvision version
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* CIM scenario refinement (#400)
* Cim scenario refinement (#394)
* CIM refinement
* Fix lint error
* Fix lint error
* Cim test coverage (#395)
* Enrich tests
* Refactor CimDataGenerator
* Refactor CIM parsers
* Minor refinement
* Fix lint error
* Fix lint error
* Fix lint error
* Minor refactor
* Type
* Add two test file folders. Make a slight change to CIM BE.
* Lint error
* Lint error
* Remove unnecessary public interfaces of CIM BE
* Cim disable auto action type detection (#399)
* Haven't been tested
* Modify document
* Add ActionType checking
* Minor
* Lint error
* Action quantity should be a positive number
* Modify related docs & notebooks
* Minor
* Change test file name. Prepare to merge into master.
* .
* Minor test patch
* Add `clear()` function to class `SimRandom` (#401)
* Add SimRandom.clear()
* Minor
* Remove commented codes
* Lint error
* update package version
* Minor
* Remove docs/source/examples/multi_agent_dqn_cim.rst
* Update .gitignore
* Update .gitignore
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
* Change `Env.set_seed()` logic (#456)
* Change Env.set_seed() logic
* Redesign CIM reset logic; fix lint issues;
* Lint
* Seed type assertion
* Remove all SC related files (#473)
* RL Toolkit V3 (#471)
* added daemon=True for multi-process rollout, policy manager and inference
* removed obsolete files
* [REDO][PR#406]V0.2 rl refinement taskq (#408)
* Add a usable task_queue
* Rename some variables
* 1. Add ; 2. Integrate related files; 3. Remove
* merge `data_parallel` and `num_grad_workers` into `data_parallelism`
* Fix bugs in docker_compose_yml.py and Simple/Multi-process mode.
* Move `grad_worker` into maro/rl/workflows
* 1. Merge data_parallel and num_workers into data_parallelism in config; 2. Assign recently used workers where possible in task_queue.
* Refine code and update docs of `TaskQueue`
* Support priority for tasks in `task_queue`
* Update diagram of policy manager and task queue.
* Add configurable `single_task_limit` and correct docstring about `data_parallelism`
* Fix lint errors in `supply chain`
* RL policy redesign (V2) (#405)
* Draft v2.0 for V2
* Polish models with more comments
* Polish policies with more comments
* Lint
* Lint
* Add developer doc for models.
* Add developer doc for policies.
* Remove policy manager V2 since it is not used and out-of-date
* Lint
* Lint
* refined messy workflow code
* merged 'scenario_dir' and 'scenario' in rl config
* 1. refined env_sampler and agent_wrapper code; 2. added docstrings for env_sampler methods
* 1. temporarily renamed RLPolicy from policy_v2 to RLPolicyV2; 2. merged env_sampler and env_sampler_v2
* merged cim and cim_v2
* lint issue fix
* refined logging logic
* lint issue fix
* reversed unwanted changes
* ReplayMemory & IndexScheduler
* MultiReplayMemory
* get_actions_with_logps
* EnvSampler on the road
* EnvSampler
* Minor
* LearnerManager
* Use batch to transfer data & add SHAPE_CHECK_FLAG
* Rename learner to trainer
* Add property for policy._is_exploring
* CIM test scenario for V3. Manual test passed. Next step: run it, make it work.
* env_sampler.py could run
* env_sampler refine on the way
* First runnable version done
* AC could run, but the result is bad. Need to check the logic
* Refine abstract method & shape check error info.
* Docs
* Very detailed compare. Try again.
* AC done
* DQN check done
* Minor
* DDPG, not tested
* Minors
* A rough draft of MAAC
* Cannot use CIM as the multi-agent scenario.
* Minor
* MAAC refinement on the way
* Remove ActionWithAux
* Refine batch & memory
* MAAC example works
* Reproducibility fix. Policy shared between env_sampler and trainer_manager.
* Detail refinement
* Simplify the user-configured workflow
* Minor
* Refine example codes
* Minor polish
* Migrate rollout_manager to V3
* Error on the way
* Redesign torch.device management
* Rl v3 maddpg (#418)
* Add MADDPG trainer
* Fit independent critics and shared critic modes.
* Add a new property: num_policies
* Lint
* Fix a bug in `sum(rewards)`
* Rename `MADDPG` to `DiscreteMADDPG` and fix type hint.
* Rename maddpg in examples.
* Preparation for data parallel (#420)
* Preparation for data parallel
* Minor refinement & lint fix
* Lint
* Lint
* rename atomic_get_batch_grad to get_batch_grad
* Fix an unexpected commit
* distributed maddpg
* Add critic worker
* Minor
* Data parallel related minor changes
* Refine code structure for trainers & add more doc strings
* Revert an unwanted change
* Use TrainWorker to do the actual calculations.
* Some minor redesign of the worker's abstraction
* Add set/get_policy_state_dict back
* Refine set/get_policy_state_dict
* Polish policy trainers
move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg
* Rl v3 data parallel grad worker (#432)
* Fit new `trainer_worker` in `grad_worker` and `task_queue`.
* Add batch dispatch
* Add `tensor_dict` for task submit interface
* Move `_remote_learn` to `AbsTrainWorker`.
* Complement docstring for task queue and trainer.
* Rename train worker to train ops; add placeholder for abstract methods;
* Lint
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
* [DRAFT] distributed training pipeline based on RL Toolkit V3 (#450)
* Preparation for data parallel
* Minor refinement & lint fix
* Lint
* Lint
* rename atomic_get_batch_grad to get_batch_grad
* Fix an unexpected commit
* distributed maddpg
* Add critic worker
* Minor
* Data parallel related minor changes
* Refine code structure for trainers & add more doc strings
* Revert an unwanted change
* Use TrainWorker to do the actual calculations.
* Some minor redesign of the worker's abstraction
* Add set/get_policy_state_dict back
* Refine set/get_policy_state_dict
* Polish policy trainers
move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg
* Rl v3 data parallel grad worker (#432)
* Fit new `trainer_worker` in `grad_worker` and `task_queue`.
* Add batch dispatch
* Add `tensor_dict` for task submit interface
* Move `_remote_learn` to `AbsTrainWorker`.
* Complement docstring for task queue and trainer.
* distributed training pipeline draft
* added temporary test files for review purposes
* Several code style refinements (#451)
* Polish rl_v3/utils/
* Polish rl_v3/distributed/
* Polish rl_v3/policy_trainer/abs_trainer.py
* fixed merge conflicts
* unified sync and async interfaces
* refactored rl_v3; refinement in progress
* Finish the runnable pipeline under new design
* Remove outdated files; refine class names; optimize imports;
* Lint
* Minor maddpg related refinement
* Lint
Co-authored-by: Default <huo53926@126.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Minor bug fix
* Coroutine-related bug fix ("get_policy_state") (#452)
* fixed rebase conflicts
* renamed get_policy_func_dict to policy_creator
* deleted unwanted folder
* removed unwanted changes
* resolved PR452 comments
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Quick fix
* Redesign experience recording logic (#453)
* Two unimportant fixes
* Temp draft. Prepare to WFH
* Done
* Lint
* Lint
* Calculating advantages / returns (#454)
* V1.0
* Complete DDPG
* Rl v3 hanging issue fix (#455)
* fixed rebase conflicts
* renamed get_policy_func_dict to policy_creator
* unified worker interfaces
* recovered some files
* dist training + cli code move
* fixed bugs
* added retry logic to client
* 1. refactored CIM with various algos; 2. lint
* lint
* added type hint
* removed some logs
* lint
* Make main.py more IDE friendly
* Lint
* Final test & format. Ready to merge.
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
* Rl v3 parallel rollout (#457)
* fixed rebase conflicts
* renamed get_policy_func_dict to policy_creator
* unified worker interfaces
* recovered some files
* dist training + cli code move
* fixed bugs
* added retry logic to client
* 1. refactored CIM with various algos; 2. lint
* lint
* added type hint
* removed some logs
* lint
* Make main.py more IDE friendly
* Lint
* load balancing dispatcher
* added parallel rollout
* lint
* Tracker variable type issue; rename to env_sampler_creator;
* Rl v3 parallel rollout follow ups (#458)
* AbsWorker & AbsDispatcher
* Pass env idx to AbsTrainer.record() method, and let the trainer decide how to record experiences sampled from different worlds.
* Fix policy_creator reuse bug
* Format code
* Merge AbsTrainerManager & SimpleTrainerManager
* AC test passed
* Lint
* Remove AbsTrainer.build() method. Put all initialization operations into __init__
* Redesign AC preprocess batches logic
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
* MADDPG performance bug fix (#459)
* Fix MARL (MADDPG) terminal recording bug; some other minor refinements;
* Restore Trainer.build() method
* Calculate latest action in the get_actor_grad method in MADDPG.
* Share critic bug fix
* Rl v3 example update (#461)
* updated vm_scheduling example and cim notebook
* fixed bugs in vm_scheduling
* added local train method
* bug fix
* modified async client logic to fix hidden issue
* reverted to default config
* fixed PR comments and some bugs
* removed hardcode
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* Done (#462)
* Rl v3 load save (#463)
* added load/save feature
* fixed some bugs
* reverted unwanted changes
* lint
* fixed PR comments
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* RL Toolkit data parallelism revamp & config utils (#464)
* added load/save feature
* fixed some bugs
* reverted unwanted changes
* lint
* fixed PR comments
* 1. fixed data parallelism issue; 2. added config validator; 3. refactored cli local
* 1. fixed rollout exit issue; 2. refined config
* removed config file from example
* fixed lint issues
* fixed lint issues
* added main.py under examples/rl
* fixed lint issues
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* RL doc string (#465)
* First rough draft
* Minors
* Reformat
* Lint
* Resolve PR comments
* Rl type specific env getter (#466)
* 1. type-sensitive env variable getter; 2. updated READMEs for examples
* fixed bugs
* fixed bugs
* bug fixes
* lint
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* Example bug fix
* Optimize parser.py
* Resolve PR comments
* Rl config doc (#467)
* 1. type-sensitive env variable getter; 2. updated READMEs for examples
* added detailed doc
* lint
* wording refined
* resolved some PR comments
* resolved more PR comments
* typo fix
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* RL online doc (#469)
* Model, policy, trainer
* RL workflows and env sampler doc in RST (#468)
* First rough draft
* Minors
* Reformat
* Lint
* Resolve PR comments
* 1. type-sensitive env variable getter; 2. updated READMEs for examples
* Rl type specific env getter (#466)
* 1. type-sensitive env variable getter; 2. updated READMEs for examples
* fixed bugs
* fixed bugs
* bug fixes
* lint
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* Example bug fix
* Optimize parser.py
* Resolve PR comments
* added detailed doc
* lint
* wording refined
* resolved some PR comments
* rewriting rl toolkit rst
* resolved more PR comments
* typo fix
* updated rst
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: Default <huo53926@126.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* Finish docs/source/key_components/rl_toolkit.rst
* API doc
* RL online doc image fix (#470)
* resolved some PR comments
* fix
* fixed PR comments
* added numfig=True setting in conf.py for sphinx
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Resolve PR comments
* Add example github link
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* Rl v3 pr comment resolution (#474)
* added load/save feature
* 1. resolved pr comments; 2. reverted maro/cli/k8s
* fixed some bugs
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* RL renaming v2 (#476)
* Change all Logger in RL to LoggerV2
* TrainerManager => TrainingManager
* Add Trainer suffix to all algorithms
* Finish docs
* Update interface names
* Minor fix
* Cherry pick latest RL (#498)
* Cherry pick
* Remove SC related files
* Cherry pick RL changes from `sc_refinement` (latest commit: `2a4869`) (#509)
* Cherry pick RL changes from sc_refinement (2a4869)
* Limit time display precision
* RL incremental refactor (#501)
* Refactor rollout logic. Allow multiple sampling in one epoch, so that we can generate more data for training.
AC & PPO for continuous action policy; refine AC & PPO logic.
Cherry pick RL changes from GYM-DDPG
Cherry pick RL changes from GYM-SAC
Minor error in doc string
* Add min_n_sample in template and parser
* Resolve PR comments. Fix a minor issue in SAC.
* RL component bundle (#513)
* CIM passed
* Update workers
* Refine annotations
* VM passed
* Code formatting.
* Minor import loop issue
* Pass batch in PPO again
* Remove Scenario
* Complete docs
* Minor
* Remove segment
* Optimize logic in RLComponentBundle
* Resolve PR comments
* Move 'post' methods from RLComponentBundle to EnvSampler
* Add method to get mapping of available tick to frame index (#415)
* add method to get mapping of available tick to frame index
* fix lint issue
* fix naming issue
* Cherry pick from sc_refinement (#527)
* Cherry pick from sc_refinement
* Refine `terminal` / `next_agent_state` logic (#531)
* Optimize RL toolkit
* Fix bug in terminal/next_state generation
* Rewrite terminal/next_state logic again
* Minor renaming
* Minor bug fix
* Resolve PR comments
* Merge master into v0.3 (#536)
* update docker hub init (#367)
* update docker hub init
* replace personal account with maro-team
* update hello files for CIM
* update docker repository name
* update docker file name
* fix bugs in notebook, rectify docs
* fix doc build issue
* remove docs from playground; fix citibike lp example Event issue
* update the example for vector env
* update vector env example
* update README due to PR comments
* add link to playground above MARO installation in README
* fix some typos
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* update package version
* update README for package description
* update image links for pypi package description
* change the input topology schema for CIM real data mode (#372)
* change the input topology schema for CIM real data mode
* remove unused importing
* update test config file correspondingly
* add Exception for env test
* add cost factors to cim data dump
* update CimDataCollection field name
* update field name of data collection related code
* update package version
* adjust interface to reflect actual signature (#374)
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
* update dataclasses requirement to setup
* fix: fixing spelling and grammar
* fix: fix spelling typos in code comments and data_model.rst
* Fix Geo vis IP address & SQL logic bugs. (#383)
Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).
* Fix the "Wrong future stop tick predictions" bug (#386)
* Propose my new solution
Refine to the pre-process version
* Optimize import
* Fix reset random seed bug (#387)
* update the reset interface of Env and BE
* Try to fix reset routes generation seed issue
* Refine random related logics.
* Minor refinement
* Test check
* Minor
* Remove unused functions so far
* Minor
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
* update package version
* Add _init_vessel_plans in business_engine.reset (#388)
* update package version
* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Refine `event_buffer/` module (#389)
* Core & Business Engine code refinement (#392)
* First version
* Optimize imports
* Add typehint
* Lint check
* Lint check
* add higher python version (#398)
* add higher python version
* update pytorch version
* update torchvision version
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* CIM scenario refinement (#400)
* Cim scenario refinement (#394)
* CIM refinement
* Fix lint error
* Fix lint error
* Cim test coverage (#395)
* Enrich tests
* Refactor CimDataGenerator
* Refactor CIM parsers
* Minor refinement
* Fix lint error
* Fix lint error
* Fix lint error
* Minor refactor
* Type
* Add two test file folders. Make a slight change to CIM BE.
* Lint error
* Lint error
* Remove unnecessary public interfaces of CIM BE
* Cim disable auto action type detection (#399)
* Haven't been tested
* Modify document
* Add ActionType checking
* Minor
* Lint error
* Action quantity should be a positive number
* Modify related docs & notebooks
* Minor
* Change test file name. Prepare to merge into master.
* .
* Minor test patch
* Add `clear()` function to class `SimRandom` (#401)
* Add SimRandom.clear()
* Minor
* Remove commented codes
* Lint error
* update package version
* add branch v0.3 to github workflow
* update github test workflow
* Update requirements.dev.txt (#444)
Added version numbers for the dependencies and resolved some conflicts that occurred when installing. Adding these version numbers tells you the exact versions.
* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)
Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3)
---
updated-dependencies:
- dependency-name: ipython
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add & sort requirements.dev.txt
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Merge master into v0.3 (#545)
* update docker hub init (#367)
* update docker hub init
* replace personal account with maro-team
* update hello files for CIM
* update docker repository name
* update docker file name
* fix bugs in notebook, rectify docs
* fix doc build issue
* remove docs from playground; fix citibike lp example Event issue
* update the example for vector env
* update vector env example
* update README due to PR comments
* add link to playground above MARO installation in README
* fix some typos
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* update package version
* update README for package description
* update image links for pypi package description
* update image links for pypi package description
* change the input topology schema for CIM real data mode (#372)
* change the input topology schema for CIM real data mode
* remove unused importing
* update test config file correspondingly
* add Exception for env test
* add cost factors to cim data dump
* update CimDataCollection field name
* update field name of data collection related code
* update package version
* adjust interface to reflect actual signature (#374)
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
* update dataclasses requirement to setup
* fix: fixing spelling and grammar
* fix: fix typos in code comments and data_model.rst
* Fix Geo vis IP address & SQL logic bugs. (#383)
Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).
* Fix the "Wrong future stop tick predictions" bug (#386)
* Propose my new solution
Refine to the pre-process version
.
* Optimize import
* Fix reset random seed bug (#387)
* update the reset interface of Env and BE
* Try to fix reset routes generation seed issue
* Refine random related logics.
* Minor refinement
* Test check
* Minor
* Remove unused functions so far
* Minor
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
* update package version
* Add _init_vessel_plans in business_engine.reset (#388)
* update package version
* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Refine `event_buffer/` module (#389)
* Core & Business Engine code refinement (#392)
* First version
* Optimize imports
* Add typehint
* Lint check
* Lint check
* add higher python version (#398)
* add higher python version
* update pytorch version
* update torchvision version
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* CIM scenario refinement (#400)
* Cim scenario refinement (#394)
* CIM refinement
* Fix lint error
* Fix lint error
* Cim test coverage (#395)
* Enrich tests
* Refactor CimDataGenerator
* Refactor CIM parsers
* Minor refinement
* Fix lint error
* Fix lint error
* Fix lint error
* Minor refactor
* Type
* Add two test file folders. Make a slight change to CIM BE.
* Lint error
* Lint error
* Remove unnecessary public interfaces of CIM BE
* Cim disable auto action type detection (#399)
* Haven't been tested
* Modify document
* Add ActionType checking
* Minor
* Lint error
* Action quantity should be a positive number
* Modify related docs & notebooks
* Minor
* Change test file name. Prepare to merge into master.
* .
* Minor test patch
* Add `clear()` function to class `SimRandom` (#401)
* Add SimRandom.clear()
* Minor
* Remove commented codes
* Lint error
* update package version
* add branch v0.3 to github workflow
* update github test workflow
* Update requirements.dev.txt (#444)
Added version numbers for the dependencies and resolved some conflicts that occurred when installing. Adding these version numbers tells you the exact versions.
* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)
Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3)
---
updated-dependencies:
- dependency-name: ipython
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* update github workflow config
* MARO v0.3: a new design of RL Toolkit, CLI refactorization, and corresponding updates. (#539)
* refined proxy coding style
* updated images and refined doc
* updated images
* updated CIM-AC example
* refined proxy retry logic
* call policy update only for AbsCorePolicy
* add limitation of AbsCorePolicy in Actor.collect()
* refined actor to return only experiences for policies that received new experiences
* fix MsgKey issue in rollout_manager
* fix typo in learner
* call exit function for parallel rollout manager
* update supply chain example distributed training scripts
* 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prototype
* reformat render
* fix supply chain business engine action type problem
* reset supply chain example render figsize from 4 to 3
* Add render to all modes of supply chain example
* fix or policy typos
* 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes
* refined parallel policy manager
* updated rl/__init__.py
* fixed lint issues and CIM local learner bugs
* deleted unwanted supply_chain test files
* revised default config for cim-dqn
* removed test_store.py as it is no longer needed
* 1. changed Actor class to rollout_worker function; 2. renamed algorithm to algorithms
* updated figures
* removed unwanted import
* refactored CIM-DQN example
* added MultiProcessRolloutManager and MultiProcessTrainingManager
* updated doc
* lint issue fix
* lint issue fix
* fixed import formatting
* [Feature] Prioritized Experience Replay (#355)
* added prioritized experience replay
* deleted unwanted supply_chain test files
* fixed import order
* import fix
* fixed lint issues
* fixed import formatting
* added note in docstring that rank-based PER has yet to be implemented
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* rm AbsDecisionGenerator
* small fixes
* bug fix
* reorganized training folder structure
* fixed lint issues
* fixed lint issues
* policy manager refined
* lint fix
* restructured CIM-dqn sync code
* added policy version index and used it as a measure of experience staleness
* lint issue fix
* lint issue fix
* switched log_dir and proxy_kwargs order
* cim example refinement
* eval schedule sorted only when it's a list
* eval schedule sorted only when it's a list
* update sc env wrapper
* added docker scripts for cim-dqn
* refactored example folder structure and added workflow templates
* fixed lint issues
* fixed lint issues
* fixed template bugs
* removed unused imports
* refactoring sc in progress
* simplified cim meta
* fixed build.sh path bug
* template refinement
* deleted obsolete svgs
* updated learner logs
* minor edits
* refactored templates for easy merge with async PR
* added component names for rollout manager and policy manager
* fixed incorrect position to add last episode to eval schedule
* added max_lag option in templates
* formatting edit in docker_compose_yml script
* moved local learner and early stopper outside sync_tools
* refactored rl toolkit folder structure
* refactored rl toolkit folder structure
* moved env_wrapper and agent_wrapper inside rl/learner
* refined scripts
* fixed typo in script
* changes needed for running sc
* removed unwanted imports
* config change for testing sc scenario
* changes for perf testing
* Asynchronous Training (#364)
* remote inference code draft
* changed actor to rollout_worker and updated init files
* removed unwanted import
* updated inits
* more async code
* added async scripts
* added async training code & scripts for CIM-dqn
* changed async to async_tools to avoid conflict with python keyword
* reverted unwanted change to dockerfile
* added doc for policy server
* addressed PR comments and fixed a bug in docker_compose_yml.py
* fixed lint issue
* resolved PR comment
* resolved merge conflicts
* added async templates
* added proxy.close() for actor and policy_server
* fixed incorrect position to add last episode to eval schedule
* reverted unwanted changes
* added missing async files
* rm unwanted echo in kill.sh
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* renamed sync to synchronous and async to asynchronous to avoid conflict with keyword
* added missing policy version increment in LocalPolicyManager
* refined rollout manager recv logic
* removed a debugging print
* added sleep in distributed launcher to avoid hanging
* updated api doc and rl toolkit doc
* refined dynamic imports using importlib
* 1. moved policy update triggers to policy manager; 2. added version control in policy manager
* fixed a few bugs and updated cim RL example
* fixed a few more bugs
* added agent wrapper instantiation to workflows
* added agent wrapper instantiation to workflows
* removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet
* fixed incorrect get_ac_policy signature for CIM
* moved exploration inside core policy
* added state to exploration call to support context-dependent exploration
* separated non_rl_policy_index and rl_policy_index in workflows
* modified sc example code according to workflow changes
* modified sc example code according to workflow changes
* added replay_agent_ids parameter to get_env_func for RL examples
* fixed a few bugs
* added maro/simulator/scenarios/supply_chain as bind mount
* added post-step, post-collect, post-eval and post-update callbacks
* fixed lint issues
* fixed lint issues
* moved instantiation of policy manager inside simple learner
* fixed env_wrapper get_reward signature
* minor edits
* removed get_experience kwargs from env_wrapper
* 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows
* added rollout exp distribution option in RL examples
* removed unwanted files
* 1. made logger internal in learner; 2. removed logger creation in abs classes
* checked out supply chain test files from v0.2_sc
* 1. added missing model.eval() to choose_action; 2. added entropy features to AC
* fixed a bug in ac entropy
* abbreviated coefficient to coeff
* removed -dqn from job name in rl example config
* added tmp patch to dev.df
* renamed image name for running rl examples
* added get_loss interface for core policies
* added policy manager in rl_toolkit.rst
* 1. env_wrapper bug fix; 2. policy manager update logic refinement
* refactored policy and algorithms
* policy interface redesigned
* refined policy interfaces
* fixed typo
* fixed bugs in refactored policy interface
* fixed some bugs
* refactoring in progress
* policy interface and policy manager redesigned
* 1. fixed bugs in ac and pg; 2. fixed bugs in rl workflow scripts
* fixed bug in distributed policy manager
* fixed lint issues
* fixed lint issues
* added scipy in setup
* 1. trimmed rollout manager code; 2. added option to docker scripts
* updated api doc for policy manager
* 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script
* 1. simplified rl example structure; 2. fixed lint issues
* further rl toolkit code simplifications
* more numpy-based optimization in RL toolkit
* moved replay buffer inside policy
* bug fixes
* numpy optimization and associated refactoring
* extracted shaping logic out of env_sampler
* fixed bug in CIM shaping and lint issues
* preliminary implementation of parallel batch inference
* fixed bug in ddpg transition recording
* put get_state, get_env_actions, get_reward back in EnvSampler
* simplified exploration and core model interfaces
* bug fixes and doc update
* added improve() interface for RLPolicy for single-thread support
* fixed simple policy manager bug
* updated doc, rst, notebook
* updated notebook
* fixed lint issues
* fixed entropy bugs in ac.py
* reverted to simple policy manager as default
* 1. unified single-thread and distributed mode in learning_loop.py; 2. updated api doc for algorithms and rst for rl toolkit
* fixed lint issues and updated rl toolkit images
* removed obsolete images
* added back agent2policy for general workflow use
* V0.2 rl refinement dist (#377)
* Support `slice` operation in ExperienceSet
* Support naive distributed policy training by proxy
* Dynamically allocate trainers according to number of experience
* code check
* code check
* code check
* Fix a bug in distributed training with no gradient
* Code check
* Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy
* 1. call allocate_trainer() at the start of update(); 2. refine according to code review
* Code check
* Refine code with new interface
* Update docs of PolicyManager and ExperienceSet
* Add images for rl_toolkit docs
* Update diagram of PolicyManager
* Refine with new interface
* Extract allocation strategy into `allocation_strategy.py`
* add `distributed_learn()` in policies for data-parallel training
* Update doc of RL_toolkit
* Add gradient workers for data-parallel
* Refine code and update docs
* Lint check
* Refine by comments
* Rename `trainer` to `worker`
* Rename `distributed_learn` to `learn_with_data_parallel`
* Refine allocator and remove redundant code in policy_manager
* remove arguments in allocate_by_policy and so on
* added checkpointing for simple and multi-process policy managers
* 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager
* added missing set_state and get_state for CIM policies
* removed blank line
* updated RL workflow README
* Integrate `data_parallel` arguments into `worker_allocator` (#402)
* 1. simplified workflow config; 2. added comments to CIM shaping
* lint issue fix
* 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading
* 1. moved post_step callback inside env sampler; 2. updated README for rl workflows
* refined README for CIM
* VM scheduling with RL (#375)
* added part of vm scheduling RL code
* refined vm env_wrapper code style
* added DQN
* added get_experiences func for ac in vm scheduling
* added post_step callback to env wrapper
* moved Aiming's tracking and plotting logic into callbacks
* added eval env wrapper
* renamed AC config variable name for VM
* vm scheduling RL code finished
* updated README
* fixed various bugs and hard coding for vm_scheduling
* uncommented callbacks for VM scheduling
* Minor revision for better code style
* added part of vm scheduling RL code
* refined vm env_wrapper code style
* vm scheduling RL code finished
* added config.py for vm scheduling
* vm example refactoring
* fixed bugs in vm_scheduling
* removed unwanted files from cim dir
* reverted to simple policy manager as default
* added part of vm scheduling RL code
* refined vm env_wrapper code style
* vm scheduling RL code finished
* added config.py for vm scheduling
* resolved rebase conflicts
* fixed bugs in vm_scheduling
* added get_state and set_state to vm_scheduling policy models
* updated README for vm_scheduling with RL
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
* SC refinement (#397)
* Refine test scripts & pending_order_daily logic
* Refactor code for better code style: complete type hint, correct typos, remove unused items.
* Polish test_supply_chain.py
* update import format
* Modify vehicle steps logic & remove outdated test case
* Optimize imports
* Optimize imports
* Lint error
* Lint error
* Lint error
* Add SupplyChainAction
* Lint error
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* refined workflow scripts
* fixed bug in ParallelAgentWrapper
* 1. fixed lint issues; 2. refined main script in workflows
* lint issue fix
* restored default config for rl example
* Update rollout.py
* refined env var processing in policy manager workflow
* added hasattr check in agent wrapper
* updated docker_compose_yml.py
* Minor refinement
* Minor PR. Prepare to merge latest master branch into v0.3 branch. (#412)
* Prepare to merge master_mirror
* Lint error
* Minor
* Merge latest master into v0.3 (#426)
* update docker hub init (#367)
* update docker hub init
* replace personal account with maro-team
* update hello files for CIM
* update docker repository name
* update docker file name
* fix bugs in notebook, rectify docs
* fix doc build issue
* remove docs from playground; fix citibike lp example Event issue
* update the example for vector env
* update vector env example
* update README due to PR comments
* add link to playground above MARO installation in README
* fix some typos
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* update package version
* update README for package description
* update image links for pypi package description
* update image links for pypi package description
* change the input topology schema for CIM real data mode (#372)
* change the input topology schema for CIM real data mode
* remove unused importing
* update test config file correspondingly
* add Exception for env test
* add cost factors to cim data dump
* update CimDataCollection field name
* update field name of data collection related code
* update package version
* adjust interface to reflect actual signature (#374)
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
* update dataclasses requirement to setup
* fix: fixing spelling and grammar
* fix: fix typos in code comments and data_model.rst
* Fix Geo vis IP address & SQL logic bugs. (#383)
Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).
* Fix the "Wrong future stop tick predictions" bug (#386)
* Propose my new solution
Refine to the pre-process version
.
* Optimize import
* Fix reset random seed bug (#387)
* update the reset interface of Env and BE
* Try to fix reset routes generation seed issue
* Refine random related logics.
* Minor refinement
* Test check
* Minor
* Remove unused functions so far
* Minor
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
* update package version
* Add _init_vessel_plans in business_engine.reset (#388)
* update package version
* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Refine `event_buffer/` module (#389)
* Core & Business Engine code refinement (#392)
* First version
* Optimize imports
* Add typehint
* Lint check
* Lint check
* add higher python version (#398)
* add higher python version
* update pytorch version
* update torchvision version
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* CIM scenario refinement (#400)
* Cim scenario refinement (#394)
* CIM refinement
* Fix lint error
* Fix lint error
* Cim test coverage (#395)
* Enrich tests
* Refactor CimDataGenerator
* Refactor CIM parsers
* Minor refinement
* Fix lint error
* Fix lint error
* Fix lint error
* Minor refactor
* Type
* Add two test file folders. Make a slight change to CIM BE.
* Lint error
* Lint error
* Remove unnecessary public interfaces of CIM BE
* Cim disable auto action type detection (#399)
* Haven't been tested
* Modify document
* Add ActionType checking
* Minor
* Lint error
* Action quantity should be a positive number
* Modify related docs & notebooks
* Minor
* Change test file name. Prepare to merge into master.
* .
* Minor test patch
* Add `clear()` function to class `SimRandom` (#401)
* Add SimRandom.clear()
* Minor
* Remove commented codes
* Lint error
* update package version
* Minor
* Remove docs/source/examples/multi_agent_dqn_cim.rst
* Update .gitignore
* Update .gitignore
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
* Change `Env.set_seed()` logic (#456)
* Change Env.set_seed() logic
* Redesign CIM reset logic; fix lint issues;
* Lint
* Seed type assertion
* Remove all SC related files (#473)
* RL Toolkit V3 (#471)
* added daemon=True for multi-process rollout, policy manager and inference
* removed obsolete files
* [REDO][PR#406]V0.2 rl refinement taskq (#408)
* Add a usable task_queue
* Rename some variables
* 1. Add ; 2. Integrate related files; 3. Remove
* merge `data_parallel` and `num_grad_workers` into `data_parallelism`
* Fix bugs in docker_compose_yml.py and Simple/Multi-process mode.
* Move `grad_worker` into marl/rl/workflows
* 1. Merge data_parallel and num_workers into data_parallelism in config; 2. Assign recently used workers where possible in task_queue.
* Refine code and update docs of `TaskQueue`
* Support priority for tasks in `task_queue`
* Update diagram of policy manager and task queue.
* Add configurable `single_task_limit` and correct docstring about `data_parallelism`
* Fix lint errors in `supply chain`
* RL policy redesign (V2) (#405)
* Draft v2.0 for V2
* Polish models with more comments
* Polish policies with more comments
* Lint
* Lint
* Add developer doc for models.
* Add developer doc for policies.
* Remove policy manager V2 since it is not used and out-of-date
* Lint
* Lint
* refined messy workflow code
* merged 'scenario_dir' and 'scenario' in rl config
* 1. refined env_sampler and agent_wrapper code; 2. added docstrings for env_sampler methods
* 1. temporarily renamed RLPolicy from policy_v2 to RLPolicyV2; 2. merged env_sampler and env_sampler_v2
* merged cim and cim_v2
* lint issue fix
* refined logging logic
* lint issue fix
* reversed unwanted changes
* .
.
.
.
ReplayMemory & IndexScheduler
ReplayMemory & IndexScheduler
.
MultiReplayMemory
get_actions_with_logps
EnvSampler on the road
EnvSampler
Minor
* LearnerManager
* Use batch to transfer data & add SHAPE_CHECK_FLAG
* Rename learner to trainer
* Add property for policy._is_exploring
* CIM test scenario for V3. Manual test passed. Next step: run it, make it works.
* env_sampler.py could run
* env_sampler refine on the way
* First runnable version done
* AC could run, but the result is bad. Need to check the logic
* Refine abstract method & shape check error info.
* Docs
* Very detailed compare. Try again.
* AC done
* DQN check done
* Minor
* DDPG, not tested
* Minors
* A rough draft of MAAC
* Cannot use CIM as the multi-agent scenario.
* Minor
* MAAC refinement on the way
* Remove ActionWithAux
* Refine batch & memory
* MAAC example works
* Reproducibility fix. Policy share between env_sampler and trainer_manager.
* Detail refinement
* Simplify the user-configured workflow
* Minor
* Refine example codes
* Minor polish
* Migrate rollout_manager to V3
* Error on the way
* Redesign torch.device management
* Rl v3 maddpg (#418)
* Add MADDPG trainer
* Fit independent critics and shared critic modes.
* Add a new property: num_policies
* Lint
* Fix a bug in `sum(rewards)`
* Rename `MADDPG` to `DiscreteMADDPG` and fix type hint.
* Rename maddpg in examples.
* Preparation for data parallel (#420)
* Preparation for data parallel
* Minor refinement & lint fix
* Lint
* Lint
* rename atomic_get_batch_grad to get_batch_grad
* Fix an unexpected commit
* distributed maddpg
* Add critic worker
* Minor
* Data parallel related minor changes
* Refine code structure for trainers & add more doc strings
* Revert an unwanted change
* Use TrainWorker to do the actual calculations.
* Some minor redesign of the worker's abstraction
* Add set/get_policy_state_dict back
* Refine set/get_policy_state_dict
* Polish policy trainers
move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg
* Rl v3 data parallel grad worker (#432)
* Fit new `trainer_worker` in `grad_worker` and `task_queue`.
* Add batch dispatch
* Add `tensor_dict` for task submit interface
* Move `_remote_learn` to `AbsTrainWorker`.
* Complement docstring for task queue and trainer.
* Rename train worker to train ops; add placeholder for abstract methods;
* Lint
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
* [DRAFT] distributed training pipeline based on RL Toolkit V3 (#450)
* Preparation for data parallel
* Minor refinement & lint fix
* Lint
* Lint
* rename atomic_get_batch_grad to get_batch_grad
* Fix an unexpected commit
* distributed maddpg
* Add critic worker
* Minor
* Data parallel related minor changes
* Refine code structure for trainers & add more doc strings
* Revert an unwanted change
* Use TrainWorker to do the actual calculations.
* Some minor redesign of the worker's abstraction
* Add set/get_policy_state_dict back
* Refine set/get_policy_state_dict
* Polish policy trainers
move train_batch_size to abs trainer
delete _train_step_impl()
remove _record_impl
remove unused methods
a minor bug fix in maddpg
* Rl v3 data parallel grad worker (#432)
* Fit new `trainer_worker` in `grad_worker` and `task_queue`.
* Add batch dispatch
* Add `tensor_dict` for task submit interface
* Move `_remote_learn` to `AbsTrainWorker`.
* Complement docstring for task queue and trainer.
* distributed training pipeline draft
* added temporary test files for review purposes
* Several code style refinements (#451)
* Polish rl_v3/utils/
* Polish rl_v3/distributed/
* Polish rl_v3/policy_trainer/abs_trainer.py
* fixed merge conflicts
* unified sync and async interfaces
* refactored rl_v3; refinement in progress
* Finish the runnable pipeline under new design
* Remove outdated files; refine class names; optimize imports;
* Lint
* Minor maddpg related refinement
* Lint
Co-authored-by: Default <huo53926@126.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Minor bug fix
* Coroutine-related bug fix ("get_policy_state") (#452)
* fixed rebase conflicts
* renamed get_policy_func_dict to policy_creator
* deleted unwanted folder
* removed unwanted changes
* resolved PR452 comments
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Quick fix
* Redesign experience recording logic (#453)
* Two unimportant fixes
* Temp draft. Prepare to WFH
* Done
* Lint
* Lint
* Calculating advantages / returns (#454)
* V1.0
* Complete DDPG
* Rl v3 hanging issue fix (#455)
* fixed rebase conflicts
* renamed get_policy_func_dict to policy_creator
* unified worker interfaces
* recovered some files
* dist training + cli code move
* fixed bugs
* added retry logic to client
* 1. refactored CIM with various algos; 2. lint
* lint
* added type hint
* removed some logs
* lint
* Make main.py more IDE friendly
* Make main.py more IDE friendly
* Lint
* Final test & format. Ready to merge.
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
* Rl v3 parallel rollout (#457)
* fixed rebase conflicts
* renamed get_policy_func_dict to policy_creator
* unified worker interfaces
* recovered some files
* dist training + cli code move
* fixed bugs
* added retry logic to client
* 1. refactored CIM with various algos; 2. lint
* lint
* added type hint
* removed some logs
* lint
* Make main.py more IDE friendly
* Make main.py more IDE friendly
* Lint
* load balancing dispatcher
* added parallel rollout
* lint
* Tracker variable type issue; rename to env_sampler_creator;
* Rl v3 parallel rollout follow ups (#458)
* AbsWorker & AbsDispatcher
* Pass env idx to AbsTrainer.record() method, and let the trainer decide how to record experiences sampled from different worlds.
* Fix policy_creator reuse bug
* Format code
* Merge AbsTrainerManager & SimpleTrainerManager
* AC test passed
* Lint
* Remove AbsTrainer.build() method. Put all initialization operations into __init__
* Redesign AC preprocess batches logic
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
* MADDPG performance bug fix (#459)
* Fix MARL (MADDPG) terminal recording bug; some other minor refinements;
* Restore Trainer.build() method
* Calculate latest action in the get_actor_grad method in MADDPG.
* Share critic bug fix
* Rl v3 example update (#461)
* updated vm_scheduling example and cim notebook
* fixed bugs in vm_scheduling
* added local train method
* bug fix
* modified async client logic to fix hidden issue
* reverted to default config
* fixed PR comments and some bugs
* removed hardcode
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* Done (#462)
* Rl v3 load save (#463)
* added load/save feature
* fixed some bugs
* reverted unwanted changes
* lint
* fixed PR comments
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* RL Toolkit data parallelism revamp & config utils (#464)
* added load/save feature
* fixed some bugs
* reverted unwanted changes
* lint
* fixed PR comments
* 1. fixed data parallelism issue; 2. added config validator; 3. refactored cli local
* 1. fixed rollout exit issue; 2. refined config
* removed config file from example
* fixed lint issues
* fixed lint issues
* added main.py under examples/rl
* fixed lint issues
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* RL doc string (#465)
* First rough draft
* Minors
* Reformat
* Lint
* Resolve PR comments
* Rl type specific env getter (#466)
* 1. type-sensitive env variable getter; 2. updated READMEs for examples
* fixed bugs
* fixed bugs
* bug fixes
* lint
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* Example bug fix
* Optimize parser.py
* Resolve PR comments
* Rl config doc (#467)
* 1. type-sensitive env variable getter; 2. updated READMEs for examples
* added detailed doc
* lint
* wording refined
* resolved some PR comments
* resolved more PR comments
* typo fix
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* RL online doc (#469)
* Model, policy, trainer
* RL workflows and env sampler doc in RST (#468)
* First rough draft
* Minors
* Reformat
* Lint
* Resolve PR comments
* 1. type-sensitive env variable getter; 2. updated READMEs for examples
* Rl type specific env getter (#466)
* 1. type-sensitive env variable getter; 2. updated READMEs for examples
* fixed bugs
* fixed bugs
* bug fixes
* lint
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* Example bug fix
* Optimize parser.py
* Resolve PR comments
* added detailed doc
* lint
* wording refined
* resolved some PR comments
* rewriting rl toolkit rst
* resolved more PR comments
* typo fix
* updated rst
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: Default <huo53926@126.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* Finish docs/source/key_components/rl_toolkit.rst
* API doc
* RL online doc image fix (#470)
* resolved some PR comments
* fix
* fixed PR comments
* added numfig=True setting in conf.py for sphinx
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* Resolve PR comments
* Add example github link
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
* Rl v3 pr comment resolution (#474)
* added load/save feature
* 1. resolved pr comments; 2. reverted maro/cli/k8s
* fixed some bugs
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
* RL renaming v2 (#476)
* Change all Logger in RL to LoggerV2
* TrainerManager => TrainingManager
* Add Trainer suffix to all algorithms
* Finish docs
* Update interface names
* Minor fix
* Cherry pick latest RL (#498)
* Cherry pick
* Remove SC related files
* Cherry pick RL changes from `sc_refinement` (latest commit: `2a4869`) (#509)
* Cherry pick RL changes from sc_refinement (2a4869)
* Limit time display precision
* RL incremental refactor (#501)
* Refactor rollout logic. Allow multiple sampling in one epoch, so that we can generate more data for training.
AC & PPO for continuous action policy; refine AC & PPO logic.
Cherry pick RL changes from GYM-DDPG
Cherry pick RL changes from GYM-SAC
Minor error in doc string
* Add min_n_sample in template and parser
* Resolve PR comments. Fix a minor issue in SAC.
* RL component bundle (#513)
* CIM passed
* Update workers
* Refine annotations
* VM passed
* Code formatting.
* Minor import loop issue
* Pass batch in PPO again
* Remove Scenario
* Complete docs
* Minor
* Remove segment
* Optimize logic in RLComponentBundle
* Resolve PR comments
* Move 'post' methods from RLComponentBundle to EnvSampler
* Add method to get mapping of available tick to frame index (#415)
* add method to get mapping of available tick to frame index
* fix lint issue
* fix naming issue
* Cherry pick from sc_refinement (#527)
* Cherry pick from sc_refinement
* Cherry pick from sc_refinement
* Refine `terminal` / `next_agent_state` logic (#531)
* Optimize RL toolkit
* Fix bug in terminal/next_state generation
* Rewrite terminal/next_state logic again
* Minor renaming
* Minor bug fix
* Resolve PR comments
* Merge master into v0.3 (#536)
* update docker hub init (#367)
* update docker hub init
* replace personal account with maro-team
* update hello files for CIM
* update docker repository name
* update docker file name
* fix bugs in notebook, rectify docs
* fix doc build issue
* remove docs from playground; fix citibike lp example Event issue
* update the example for vector env
* update vector env example
* update README due to PR comments
* add link to playground above MARO installation in README
* fix some typos
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* update package version
* update README for package description
* update image links for pypi package description
* update image links for pypi package description
* change the input topology schema for CIM real data mode (#372)
* change the input topology schema for CIM real data mode
* remove unused imports
* update test config file correspondingly
* add Exception for env test
* add cost factors to cim data dump
* update CimDataCollection field name
* update field name of data collection related code
* update package version
* adjust interface to reflect actual signature (#374)
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
* update dataclasses requirement to setup
* fix: fixing spelling and grammar
* fix: fix spelling typos in code comments and data_model.rst
* Fix Geo vis IP address & SQL logic bugs. (#383)
Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)).
* Fix the "Wrong future stop tick predictions" bug (#386)
* Propose my new solution
Refine to the pre-process version
.
* Optimize import
* Fix reset random seed bug (#387)
* update the reset interface of Env and BE
* Try to fix reset routes generation seed issue
* Refine random-related logic.
* Minor refinement
* Test check
* Minor
* Remove unused functions so far
* Minor
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
* update package version
* Add _init_vessel_plans in business_engine.reset (#388)
* update package version
* change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391)
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* Refine `event_buffer/` module (#389)
* Core & Business Engine code refinement (#392)
* First version
* Optimize imports
* Add typehint
* Lint check
* Lint check
* add higher python version (#398)
* add higher python version
* update pytorch version
* update torchvision version
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
* CIM scenario refinement (#400)
* Cim scenario refinement (#394)
* CIM refinement
* Fix lint error
* Fix lint error
* Cim test coverage (#395)
* Enrich tests
* Refactor CimDataGenerator
* Refactor CIM parsers
* Minor refinement
* Fix lint error
* Fix lint error
* Fix lint error
* Minor refactor
* Type
* Add two test file folders. Make a slight change to CIM BE.
* Lint error
* Lint error
* Remove unnecessary public interfaces of CIM BE
* Cim disable auto action type detection (#399)
* Haven't been tested
* Modify document
* Add ActionType checking
* Minor
* Lint error
* Action quantity should be a positive number
* Modify related docs & notebooks
* Minor
* Change test file name. Prepare to merge into master.
* .
* Minor test patch
* Add `clear()` function to class `SimRandom` (#401)
* Add SimRandom.clear()
* Minor
* Remove commented code
* Lint error
* update package version
* add branch v0.3 to github workflow
* update github test workflow
* Update requirements.dev.txt (#444)
Added versions for the dependencies and resolved some conflicts that occur when installing. Adding these version numbers tells you the exact.
* Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460)
Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3)
---
updated-dependencies:
- dependency-name: ipython
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add & sort requirements.dev.txt
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Remove random_config.py
* Remove test_trajectory_utils.py
* Pass tests
* Update rl docs
* Remove python 3.6 in test
* Update docs
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: Wang.Jinyu <jinywan@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: GQ.Chen <675865907@qq.com>
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Logger bug hotfix (#543)
* Rename param
* Rename param
* Quick fix in env_data_process
* frame data precision issue fix (#544)
* fix frame precision issue
* add .xmake to .gitignore
* update frame precision loss warning message
* add assert to frame precision checking
* typo fix
* add TODO for future Long data type issue fix
* Minor cleaning
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Jinyu Wang <jinywan@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jinyu Wang <jinyu@RL4Inv.l1ea1prscrcu1p4sa0eapum5vc.bx.internal.cloudapp.net>
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: GQ.Chen <675865907@qq.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
* Update requirements. (#552)
* Fix several encoding issues; update requirements.
* Test & minor
* Remove torch in requirements.build.txt
* Polish
* Update README
* Resolve PR comments
* Keep working
* Keep working
* Update test requirements
* Done (#554)
* Update requirements in example and notebook (#553)
* Update requirements in example and notebook
* Remove autopep8
* Add jupyterlab packages back
Co-authored-by: Jinyu-W <53509467+Jinyu-W@users.noreply.github.com>
* Refine decision event logic (#559)
* Add DecisionEventPayload
* Change decision payload name
* Refine action logic
* Add doc for env.step
* Restore pre-commit config
* Resolve PR comments
* Refactor decision event & action
* Pre-commit
* Resolve PR comments
* Refine rl component bundle (#549)
* Config files
* Done
* Minor bugfix
* Add autoflake
* Update isort exclude; add pre-commit to requirements
* Check only isort
* Minor
* Format
* Test passed
* Run pre-commit
* Minor bugfix in rl_component_bundle
* Pass mypy
* Fix a bug in RL notebook
* A minor bug fix
* Add upper bound for numpy version in test
Co-authored-by: ysqyang <v-yangqi@microsoft.com>
Co-authored-by: ysqyang <ysqyang@gmail.com>
Co-authored-by: yaqiu <v-yaqiu@microsoft.com>
Co-authored-by: Jinyu Wang <Wang.Jinyu@microsoft.com>
Co-authored-by: Huoran Li <huoranli@microsoft.com>
Co-authored-by: GQ.Chen <675865907@qq.com>
Co-authored-by: Jeremy Reynolds <jeremy.reynolds@microsoft.com>
Co-authored-by: Jeremy Reynolds <jeremr@microsoft.com>
Co-authored-by: slowy07 <slowy.arfy@gmail.com>
Co-authored-by: GQ.Chen <v-guanchen@microsoft.com>
Co-authored-by: Huoran Li <huo53926@126.com>
Co-authored-by: Chaos Yu <chaos.you@gmail.com>
Co-authored-by: solosilence <abhishekkr23rs@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jinyu Wang <jinyu@RL4Inv.l1ea1prscrcu1p4sa0eapum5vc.bx.internal.cloudapp.net>