merge with v0.1

This commit is contained in:
Parent: d8c98deea4
Commit: ce435b61fe

README.md

@@ -15,55 +15,73 @@ MARO has complete support for data processing, simulator building, RL algorithms

| `examples` | Showcase of MARO. |
| `notebooks` | MARO quick-start notebooks. |

### Prerequisites
## Prerequisites

- [Python == 3.6/3.7](https://www.python.org/downloads/)
- C++ Compiler
  - Linux or Mac OS X: `gcc`
  - Windows: [Build Tools for Visual Studio 2017](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=15)

### Install MARO from PyPI
## Install MARO from PyPI

```sh
pip install maro
```
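
A quick smoke test after installing (a minimal check; it uses only the import path that the examples below rely on):

```sh
python -c "from maro.simulator import Env; print('MARO is importable')"
```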

### Install MARO from Source
## Install MARO from Source ([editable mode](https://pip.pypa.io/en/stable/reference/pip_install/#editable-installs))

```sh
# If your environment is not clean, create a virtual environment first
python -m venv maro_venv
source maro_venv/bin/activate

# Install MARO from source, if you don't need the full CLI feature set
pip install -r ./maro/requirements.build.txt

# Compile cython files
bash scripts/compile_cython.sh
pip install -e .

# Or with script
bash scripts/build_maro.sh
```

- Prerequisites
  - C++ Compiler
    - Linux or Mac OS X: `gcc`
    - Windows: [Build Tools for Visual Studio 2017](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=15)
- Enable Virtual Environment
  - Mac OS / Linux

    ```sh
    # If your environment is not clean, create a virtual environment first.
    python -m venv maro_venv
    source ./maro_venv/bin/activate
    ```

  - Windows

    ```ps
    # If your environment is not clean, create a virtual environment first.
    python -m venv maro_venv
    .\maro_venv\Scripts\activate
    ```

- Install MARO
  - Mac OS / Linux

    ```sh
    # Install MARO from source.
    bash scripts/install_maro.sh
    ```

  - Windows

    ```ps
    # Install MARO from source.
    .\scripts\install_maro.bat
    ```

### Quick example
## Quick example

```python
from maro.simulator import Env

env = Env(scenario="ecr", topology="toy.5p_ssddd_l0.0", start_tick=0, durations=100)

_, decision_event, is_done = env.step(None)
metrics, decision_event, is_done = env.step(None)

while not is_done:
    reward, decision_event, is_done = env.step(None)
    metrics, decision_event, is_done = env.step(None)

tot_shortage = env.snapshot_list["ports"][::"shortage"].sum()
print(f"total shortage: {tot_shortage}")
print(f"environment metrics: {env.metrics}")
```

### Run playground
## Run playground

```sh
# Build playground image

@@ -75,7 +93,7 @@ docker build -f ./docker_files/cpu.play.df . -t maro/playground:cpu

docker run -p 40009:40009 -p 40010:40010 -p 40011:40011 maro/playground:cpu
```

### Contributing
## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us

@@ -20,9 +20,9 @@ RUN jupyter contrib nbextension install --system

RUN jt -t onedork -fs 95 -altp -tfs 11 -nfs 115 -cellw 88% -T

# Install maro
COPY --from=ext_build /maro_build/dist/maro-0.0.1a0-cp36-cp36m-manylinux2010_x86_64.whl ./maro-0.0.1a0-cp36-cp36m-manylinux2010_x86_64.whl
RUN pip install maro-0.0.1a0-cp36-cp36m-manylinux2010_x86_64.whl
RUN rm maro-0.0.1a0-cp36-cp36m-manylinux2010_x86_64.whl
COPY --from=ext_build /maro_build/dist/maro-0.1.1a0-cp36-cp36m-manylinux2010_x86_64.whl ./maro-0.1.1a0-cp36-cp36m-manylinux2010_x86_64.whl
RUN pip install maro-0.1.1a0-cp36-cp36m-manylinux2010_x86_64.whl
RUN rm maro-0.1.1a0-cp36-cp36m-manylinux2010_x86_64.whl

# Install redis
RUN wget http://download.redis.io/releases/redis-6.0.6.tar.gz; tar xzf redis-6.0.6.tar.gz; cd redis-6.0.6; make

@@ -1,11 +1,14 @@

# MARO Documentation

## Pre-install

## Generate API docs

```sh
pip install -U -r requirements.docs.txt
```

## Build docs

```sh
# For linux, darwin
make html

@@ -15,10 +18,31 @@ make html

```

## Generate API docs

```sh
sphinx-apidoc -f -o ./source/apidoc ../maro/
```

## Local host

```sh
python -m http.server -d ./_build/html 8000 -b 0.0.0.0
```

## Auto-build/Auto-refresh

### Prerequisites

- [Watchdog](https://pypi.org/project/watchdog/)
- [Browser-sync](https://www.browsersync.io/)

```sh
# Watch file changes, auto-build
watchmedo shell-command --patterns="*.rst;*.md;*.py;*.png;*.ico;*.svg" --ignore-pattern="_build/*" --recursive --command="APIDOC_GEN=False make html"
# Watch file changes, auto-refresh
browser-sync start --server --startPath ./_build/html --port 8000 --files "**/*"
```

## Local host

```sh
python -m http.server -d ./_build/html 8000 -b 0.0.0.0
```

@@ -32,47 +32,45 @@ Quick Start

from maro.simulator import Env
from maro.simulator.scenarios.ecr.common import Action

start_tick = 0
durations = 100  # 100 days

# Initialize an environment with a specific scenario, related topology.
env = Env(scenario="ecr", topology="5p_ssddd_l0.0",
          start_tick=start_tick, durations=durations)
# In ECR, 1 tick means 1 day, so durations=100 means a length of 100 days.
env = Env(scenario="ecr", topology="toy.5p_ssddd_l0.0", start_tick=0, durations=100)

# Query the environment summary, which includes business instances, intra-instance attributes, etc.
print(env.summary)

for ep in range(2):
    # Gym-like step function.
    metrics, decision_event, is_done = env.step(None)

    while not is_done:
        past_week_ticks = [x for x in range(
            decision_event.tick - 7, decision_event.tick)]
        past_week_ticks = [
            x for x in range(decision_event.tick - 7, decision_event.tick)
        ]
        decision_port_idx = decision_event.port_idx
        intr_port_infos = ["booking", "empty", "shortage"]

        # Query the decision port's booking, empty container inventory, and shortage information in the past week.
        past_week_info = env.snapshot_list["ports"][past_week_ticks:
                                                    decision_port_idx:
                                                    intr_port_infos]
        # Query the snapshot list of the environment to get the information of
        # the booking, empty container inventory, and shortage of the decision port in the past week.
        past_week_info = env.snapshot_list["ports"][
            past_week_ticks : decision_port_idx : intr_port_infos
        ]

        dummy_action = Action(decision_event.vessel_idx,
                              decision_event.port_idx, 0)
        dummy_action = Action(
            vessel_idx=decision_event.vessel_idx,
            port_idx=decision_event.port_idx,
            quantity=0
        )

        # Drive the environment with a dummy action (no repositioning).
        metrics, decision_event, is_done = env.step(dummy_action)

    # Query the environment business metrics at the end of an episode;
    # this is usually your optimization objective (often multi-target).
    print(f"ep: {ep}, environment metrics: {env.get_metrics()}")
    print(f"ep: {ep}, environment metrics: {env.metrics}")
    env.reset()

Contents
====================

.. toctree::
   :maxdepth: 2
   :caption: Installation

   installation/pip_install.md
   installation/playground.md
   installation/grass_cluster_provisioning_on_azure.md

@@ -85,14 +83,6 @@ Contents

   scenarios/ecr.md
   scenarios/citi_bike.md

.. toctree::
   :maxdepth: 2
   :caption: Examples

   examples/hello_world.md
   examples/ecr_single_host.md
   examples/ecr_distributed.md

.. toctree::
   :maxdepth: 2
   :caption: Key Components

@@ -106,13 +96,6 @@ Contents

   key_components/communication.md
   key_components/orchestration.md

.. toctree::
   :maxdepth: 2
   :caption: Experiments

   experiments/ecr.md
   experiments/citi_bike.md

.. toctree::
   :maxdepth: 2
   :caption: API Documents

@@ -7,7 +7,8 @@ on Azure and run your training job in a distributed environment.

## Prerequisites

- [Install the Azure CLI and login](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest)
- [Install docker](https://docs.docker.com/engine/install/) and [Configure docker to make sure it can be managed as a non-root user](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user)
- [Install docker](https://docs.docker.com/engine/install/) and
  [Configure docker to make sure it can be managed as a non-root user](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user)

## Cluster Management

@@ -8,7 +8,8 @@ on Azure and run your training job in a distributed environment.

- [Install the Azure CLI and login](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest)
- [Install and set up kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
- [Install docker](https://docs.docker.com/engine/install/) and [Configure docker to make sure it can be managed as a non-root user](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user)
- [Install docker](https://docs.docker.com/engine/install/) and
  [Configure docker to make sure it can be managed as a non-root user](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user)

## Cluster Management

@@ -6,27 +6,43 @@

pip install maro
```

## Install from Source
## Install MARO from Source ([Editable Mode](https://pip.pypa.io/en/stable/reference/pip_install/#editable-installs))

### Prerequisites

- [Python >= 3.6, < 3.8](https://www.python.org/downloads/)
- C++ Compiler
  - Linux or Mac OS X: `gcc`
  - Windows: [Build Tools for Visual Studio 2017](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=15)

```sh
# If your environment is not clean, create a virtual environment first
python -m venv maro_venv
source maro_venv/bin/activate

# Install MARO from source, if you don't need the full CLI feature set
pip install -r ./maro/requirements.build.txt

# Compile cython files
bash scripts/compile_cython.sh
pip install -e .

# Or with script
bash scripts/build_maro.sh
```

- Prerequisites
  - [Python >= 3.6, < 3.8](https://www.python.org/downloads/)
  - C++ Compiler
    - Linux or Mac OS X: `gcc`
    - Windows: [Build Tools for Visual Studio 2017](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=15)
- Enable Virtual Environment
  - Mac OS / Linux

    ```sh
    # If your environment is not clean, create a virtual environment first.
    python -m venv maro_venv
    source ./maro_venv/bin/activate
    ```

  - Windows

    ```powershell
    # If your environment is not clean, create a virtual environment first.
    python -m venv maro_venv
    .\maro_venv\Scripts\activate
    ```

- Install MARO
  - Mac OS / Linux

    ```sh
    # Install MARO from source.
    bash scripts/install_maro.sh
    ```

  - Windows

    ```powershell
    # Install MARO from source.
    .\scripts\install_maro.bat
    ```

@@ -36,4 +36,4 @@ docker run -p 40009:40009 -p 40010:40010 -p 40011:40011 maro/playground:cpu

| `examples` | Showcases of predefined scenarios. |
| `notebooks` | Quick-start tutorial. |

*(Those not mentioned in the table can be ignored.)*

@@ -42,4 +42,4 @@ Generally, the business time series data is read from the historical log or

generated by a data generation model. Currently, for topologies in the Citi Bike
scenario, data processing is needed before starting the simulation. You can find
a brief introduction to the data processing command in
[Data Processing](../scenarios/citi_bike.html#data-processing).
[Data Processing](../scenarios/citi_bike.html#data-preparation).

@@ -56,39 +56,42 @@ workflow and code snippet.

from maro.simulator import Env
from maro.simulator.scenarios.ecr.common import Action

start_tick = 0
durations = 100  # 100 days

# Initialize an environment with a specific scenario, related topology.
env = Env(scenario="ecr", topology="5p_ssddd_l0.0",
          start_tick=start_tick, durations=durations)
# In ECR, 1 tick means 1 day, so durations=100 means a length of 100 days.
env = Env(scenario="ecr", topology="toy.5p_ssddd_l0.0", start_tick=0, durations=100)

# Query the environment summary, which includes business instances, intra-instance attributes, etc.
print(env.summary)

for ep in range(2):
    # Gym-like step function.
    metrics, decision_event, is_done = env.step(None)

    while not is_done:
        past_week_ticks = [x for x in range(
            decision_event.tick - 7, decision_event.tick)]
        past_week_ticks = [
            x for x in range(decision_event.tick - 7, decision_event.tick)
        ]
        decision_port_idx = decision_event.port_idx
        intr_port_infos = ["booking", "empty", "shortage"]

        # Query the decision port's booking, empty container inventory, and shortage information in the past week.
        past_week_info = env.snapshot_list["ports"][past_week_ticks:
                                                    decision_port_idx:
                                                    intr_port_infos]
        # Query the snapshot list of the environment to get the information of
        # the booking, empty container inventory, and shortage of the decision port in the past week.
        past_week_info = env.snapshot_list["ports"][
            past_week_ticks : decision_port_idx : intr_port_infos
        ]

        dummy_action = Action(decision_event.vessel_idx,
                              decision_event.port_idx, 0)
        dummy_action = Action(
            vessel_idx=decision_event.vessel_idx,
            port_idx=decision_event.port_idx,
            quantity=0
        )

        # Drive the environment with a dummy action (no repositioning).
        metrics, decision_event, is_done = env.step(dummy_action)

    # Query the environment business metrics at the end of an episode;
    # this is usually your optimization objective (often multi-target).
    print(f"ep: {ep}, environment metrics: {env.get_metrics()}")
    print(f"ep: {ep}, environment metrics: {env.metrics}")
    env.reset()
```

@@ -1,4 +1,4 @@

# Citi Bike (Bike Repositioning)
# Bike Repositioning (Citi Bike)

The Citi Bike scenario simulates the bike repositioning problem triggered by
one-way bike trips, based on the public trip data from

@@ -144,70 +144,20 @@ topologies, the definition of the bike flow and the trigger mechanism of

repositioning actions are the same as those in the toy topologies. We provide
this series of topologies to better simulate the actual Citi Bike scenario.

### Naive Baseline
## Quick Start

Below is the performance of *no repositioning* and *random repositioning* in
different topologies. The performance metric used here is the *fulfillment ratio*.

| Topology  | No Repositioning | Random Repositioning |
| :-------: | :--------------: | :------------------: |
| toy.3s_4t |                  |                      |
| toy.4s_4t |                  |                      |
| toy.5s_6t |                  |                      |

| Topology  | No Repositioning | Random Repositioning |
| :-------: | :--------------: | :------------------: |
| ny.201801 |                  |                      |
| ny.201802 |                  |                      |
| ny.201803 |                  |                      |
| ny.201804 |                  |                      |
| ny.201805 |                  |                      |
| ny.201806 |                  |                      |
| ny.201807 |                  |                      |
| ny.201808 |                  |                      |
| ny.201809 |                  |                      |
| ny.201810 |                  |                      |
| ny.201811 |                  |                      |
| ny.201812 |                  |                      |

| Topology  | No Repositioning | Random Repositioning |
| :-------: | :--------------: | :------------------: |
| ny.201901 |                  |                      |
| ny.201902 |                  |                      |
| ny.201903 |                  |                      |
| ny.201904 |                  |                      |
| ny.201905 |                  |                      |
| ny.201906 |                  |                      |
| ny.201907 |                  |                      |
| ny.201908 |                  |                      |
| ny.201909 |                  |                      |
| ny.201910 |                  |                      |
| ny.201911 |                  |                      |
| ny.201912 |                  |                      |

| Topology  | No Repositioning | Random Repositioning |
| :-------: | :--------------: | :------------------: |
| ny.202001 |                  |                      |
| ny.202002 |                  |                      |
| ny.202003 |                  |                      |
| ny.202004 |                  |                      |
| ny.202005 |                  |                      |
| ny.202006 |                  |                      |

<!-- ## Quick Start

### Data Processing
### Data Preparation

To start the simulation of the Citi Bike scenario, users need to first generate the
related data. Below are the introduction to the related commands:
related data. Below is the introduction to the related commands:

#### Environment List Command

The data environment list command is used to list the environments that need the
The data environment `list` command is used to list the environments that need the
data files generated before the simulation.

```console
user@maro:~/MARO$ maro env data list
```sh
maro env data list

scenario: citi_bike, topology: ny.201801
scenario: citi_bike, topology: ny.201802

@@ -221,9 +171,9 @@ scenario: citi_bike, topology: ny.201806

#### Generate Command

The data generate command is used to automatically download and build the
specified predefined scenario and topology data files for the simulation.
Currently, there are three arguments for the data generate command:
The data `generate` command is used to automatically download and build the specified
predefined scenario and topology data files for the simulation. Currently, there
are three arguments for the data `generate` command:

- `-s`: required, used to specify the predefined scenario. Valid scenarios are
listed in the result of the [environment list command](#environment-list-command).

@@ -232,8 +182,8 @@ listed in the result of [environment list command](#environment-list-command).

- `-f`: optional; if set, force the re-download and re-generation of the data files,
overwriting the already existing ones.

```console
user@maro:~/MARO$ maro env data generate -s citi_bike -t ny.201802
```sh
maro env data generate -s citi_bike -t ny.201802

The data files for citi_bike-ny201802 will then be downloaded and deployed to ~/.maro/data/citibike/_build/ny201802.
```

@@ -254,9 +204,9 @@ For the example above, the directory structure should be like:

#### Convert Command

The data convert command is used to convert the CSV data files to binary data
The data `convert` command is used to convert the CSV data files to the binary data
files that the simulator needs. Currently, there are three arguments for the data
convert command:
`convert` command:

- `--meta`: required, used to specify the path of the meta file. The source
columns to be converted and the data type of each column should be

@@ -266,101 +216,255 @@ If multiple source CSV data files are needed, you can list all the full paths of

the source files in a specific file and use a `@` symbol to specify it.
- `--output`: required, used to specify the path of the target binary file.

```console
user@maro:~/MARO$ maro data convert --meta ~/.maro/data/citibike/meta/trips.yml --file ~/.maro/data/citibike/source/_clean/ny201801/trip.csv --output ~/.maro/data/citibike/_build/ny201801/trip.bin
```sh
maro data convert --meta ~/.maro/data/citibike/meta/trips.yml --file ~/.maro/data/citibike/source/_clean/ny201801/trip.csv --output ~/.maro/data/citibike/_build/ny201801/trip.bin
```
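
For the multiple-source-file case described above, the `@` prefix points `--file` at a file
that lists the sources instead of a single CSV. A sketch (the list file name here is hypothetical):

```sh
# trip_files.txt (hypothetical) contains one source CSV path per line.
maro data convert --meta ~/.maro/data/citibike/meta/trips.yml --file @trip_files.txt --output ~/.maro/data/citibike/_build/ny201801/trip.bin
```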

### DecisionEvent

Once the environment needs the agent's response to promote the simulation, it will throw a **DecisionEvent**. In the citi_bike scenario, the information of each DecisionEvent is listed as below:

- **station_idx**: the id of the station/agent that needs to respond to the environment
- **tick**: the corresponding tick
- **frame_index**: the corresponding frame index, that is, the index of the corresponding snapshot in the snapshot list
- **type**: the decision type of this decision event. In the citi_bike scenario, there are 2 types:
  - **Supply**: there are too many bikes in the corresponding station, so it is better to reposition some of them to other stations.
  - **Demand**: there are not enough bikes in the corresponding station, so it is better to reposition bikes from other stations.
- **action_scope**: a dictionary of valid action items.
  - The key of the item indicates the station/agent id;
  - The meaning of the value differs for different decision types:
    - If the decision type is Supply, the value of the station itself means its bike inventory at that moment, while the value of other target stations means the number of their empty docks;
    - If the decision type is Demand, the value of the station itself means the number of its empty docks, while the value of other target stations means their bike inventory.

### Environment Interface

Before starting interaction with the environment, we need to know the definitions
of `DecisionEvent` and `Action` in the Citi Bike scenario first. Besides, you can query
the environment [snapshot list](../key_components/data_model.html#advanced-features)
to get more detailed information for decision making.

#### DecisionEvent

Once the environment needs the agent's response to reposition bikes, it will
throw a `DecisionEvent`. In the Citi Bike scenario, the information of each
`DecisionEvent` is listed as below:

- **station_idx** (int): The id of the station/agent that needs to respond to the
  environment.
- **tick** (int): The corresponding tick.
- **frame_index** (int): The corresponding frame index, that is, the index of the
  corresponding snapshot in the environment snapshot list.
- **type** (DecisionType): The decision type of this decision event. In the Citi Bike
  scenario, there are 2 types:
  - `Supply` indicates there are too many bikes in the corresponding station, so
    it is better to reposition some of them to other stations.
  - `Demand` indicates there are not enough bikes in the corresponding station, so
    it is better to reposition bikes from other stations.
- **action_scope** (dict): A dictionary that maintains the information for
  calculating the valid action scope (see the sketch after this list):
  - The keys of the items indicate the station/agent ids.
  - The meaning of the value differs for different decision types:
    - If the decision type is `Supply`, the value of the station itself means its
      bike inventory at that moment, while the value of other target stations means
      the number of their empty docks.
    - If the decision type is `Demand`, the value of the station itself means the
      number of its empty docks, while the value of other target stations means
      their bike inventory.
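
A minimal sketch of turning these fields into an action bound, distilled from the full
example below (`decision_event` is assumed to be a `Supply`-type event):

```python
# Supply: the station's own entry is its bike inventory; every other
# entry is that target station's number of empty docks.
inventory = decision_event.action_scope[decision_event.station_idx]
targets = {
    k: v for k, v in decision_event.action_scope.items()
    if k != decision_event.station_idx
}
target_idx, empty_docks = max(targets.items(), key=lambda kv: kv[1])
# A feasible reposition quantity can exceed neither bound.
max_quantity = min(inventory, empty_docks)
```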

### Action
#### Action

Once we get a `DecisionEvent` from the environment, we should respond with an
`Action`. A valid `Action` could be:

- `None`, which means do nothing.
- A valid `Action` instance, including:
  - **from_station_idx** (int): The id of the source station of the bike
    transportation.
  - **to_station_idx** (int): The id of the destination station of the bike
    transportation.
  - **number** (int): The quantity of the bike transportation.

### Example

Here we show a simple example of interaction with the environment in random
mode; we hope this helps you learn how to use the environment interfaces:

```python
from maro.simulator import Env
from maro.simulator.scenarios.citi_bike.common import Action, DecisionEvent, DecisionType

import random

# Initialize an Env for the citi_bike scenario
env = Env(scenario="citi_bike", topology="ny201912", start_tick=0, durations=1440, snapshot_resolution=30)

is_done: bool = False
reward: int = None

# Start the env with a None Action
reward, decision_event, is_done = env.step(None)

while not is_done:
    if decision_event.type == DecisionType.Supply:
        # The value of the station itself means the bike inventory if Supply
        self_bike_inventory = decision_event.action_scope[decision_event.station_idx]
        # The value of other stations means the quantity of empty docks if Supply
        target_idx_dock_tuple_list = [
            (k, v) for k, v in decision_event.action_scope.items() if k != decision_event.station_idx
        ]
        # Randomly choose a target station weighted by the quantity of empty docks
        target_idx, target_dock = random.choices(
            target_idx_dock_tuple_list,
            weights=[item[1] for item in target_idx_dock_tuple_list]
        )[0]
        # Generate the corresponding random Action
        action = Action(
            from_station_idx=decision_event.station_idx,
            to_station_idx=target_idx,
            number=random.randint(0, min(self_bike_inventory, target_dock))
        )

    elif decision_event.type == DecisionType.Demand:
        # The value of the station itself means the quantity of empty docks if Demand
        self_available_dock = decision_event.action_scope[decision_event.station_idx]
        # The value of other stations means their bike inventory if Demand
        target_idx_inventory_tuple_list = [
            (k, v) for k, v in decision_event.action_scope.items() if k != decision_event.station_idx
        ]
        # Randomly choose a target station weighted by the bike inventory
        target_idx, target_inventory = random.choices(
            target_idx_inventory_tuple_list,
            weights=[item[1] for item in target_idx_inventory_tuple_list]
        )[0]
        # Generate the corresponding random Action
        action = Action(
            from_station_idx=target_idx,
            to_station_idx=decision_event.station_idx,
            number=random.randint(0, min(self_available_dock, target_inventory))
        )

    else:
        action = None

    # Randomly sample some records to show in the output TODO
    # if random.random() > 0.95:
    #     print(f"*************\n{decision_event}\n{action}")
    # Respond to the environment with the generated Action
    reward, decision_event, is_done = env.step(action)
``` -->

```python
from maro.simulator import Env
from maro.simulator.scenarios.citi_bike.common import Action, DecisionEvent, DecisionType

import random

# Initialize an environment of the Citi Bike scenario, with a specific topology.
# In Citi Bike, 1 tick means 1 minute, so durations=1440 here indicates a length of 1 day.
# In Citi Bike, one snapshot is maintained every snapshot_resolution ticks;
# snapshot_resolution=30 here indicates 1 snapshot per 30 minutes.
env = Env(scenario="citi_bike", topology="toy.3s_4t", start_tick=0, durations=1440, snapshot_resolution=30)

# Query the environment summary; the business instances and intra-instance attributes
# will be listed in the output for your reference.
print(env.summary)

metrics: object = None
decision_event: DecisionEvent = None
is_done: bool = False
action: Action = None

num_episode = 2
for ep in range(num_episode):
    # Gym-like step function.
    metrics, decision_event, is_done = env.step(None)

    while not is_done:
        past_2hour_frames = [
            x for x in range(decision_event.frame_index - 4, decision_event.frame_index)
        ]
        decision_station_idx = decision_event.station_idx
        intr_station_infos = ["trip_requirement", "bikes", "shortage"]

        # Query the snapshot list of this environment to get the information of
        # the trip requirements, bikes, and shortage of the decision station in the past 2 hours.
        past_2hour_info = env.snapshot_list["stations"][
            past_2hour_frames : decision_station_idx : intr_station_infos
        ]

        if decision_event.type == DecisionType.Supply:
            # Supply: the value of the station itself means the bike inventory.
            self_bike_inventory = decision_event.action_scope[decision_event.station_idx]
            # Supply: the value of other stations means the quantity of empty docks.
            target_idx_dock_tuple_list = [
                (k, v) for k, v in decision_event.action_scope.items() if k != decision_event.station_idx
            ]
            # Randomly choose a target station weighted by the quantity of empty docks.
            target_idx, target_dock = random.choices(
                target_idx_dock_tuple_list,
                weights=[item[1] for item in target_idx_dock_tuple_list],
                k=1
            )[0]
            # Generate the corresponding random Action.
            action = Action(
                from_station_idx=decision_event.station_idx,
                to_station_idx=target_idx,
                number=random.randint(0, min(self_bike_inventory, target_dock))
            )

        elif decision_event.type == DecisionType.Demand:
            # Demand: the value of the station itself means the quantity of empty docks.
            self_available_dock = decision_event.action_scope[decision_event.station_idx]
            # Demand: the value of other stations means their bike inventory.
            target_idx_inventory_tuple_list = [
                (k, v) for k, v in decision_event.action_scope.items() if k != decision_event.station_idx
            ]
            # Randomly choose a target station weighted by the bike inventory.
            target_idx, target_inventory = random.choices(
                target_idx_inventory_tuple_list,
                weights=[item[1] for item in target_idx_inventory_tuple_list],
                k=1
            )[0]
            # Generate the corresponding random Action.
            action = Action(
                from_station_idx=target_idx,
                to_station_idx=decision_event.station_idx,
                number=random.randint(0, min(self_available_dock, target_inventory))
            )

        else:
            action = None

        # Drive the environment with the random action.
        metrics, decision_event, is_done = env.step(action)

    # Query the environment business metrics at the end of each episode;
    # this is usually the users' optimization objective (often multi-target).
    print(f"ep: {ep}, environment metrics: {env.metrics}")
    env.reset()
```

Jump to [this notebook](https://github.com/microsoft/maro/blob/master/notebooks/bike_repositioning/interact_with_simulator.ipynb)
for a quick experience.

<!--
### Naive Baseline

Below are the final environment metrics of the methods *no repositioning* and
*random repositioning* in different topologies. For each experiment, we set up
the environment and test for a duration of 1 week.

#### No Repositioning

| Topology  | Total Requirement | Resource Shortage | Repositioning Cost |
| :-------: | :---------------: | :---------------: | :----------------: |
| toy.3s_4t | +/-               | +/-               | +/-                |
| toy.4s_4t | +/-               | +/-               | +/-                |
| toy.5s_6t | +/-               | +/-               | +/-                |

| Topology  | Total Requirement | Resource Shortage | Repositioning Cost |
| :-------: | :---------------: | :---------------: | :----------------: |
| ny.201801 | +/-               | +/-               | +/-                |
| ny.201802 | +/-               | +/-               | +/-                |
| ny.201803 | +/-               | +/-               | +/-                |
| ny.201804 | +/-               | +/-               | +/-                |
| ny.201805 | +/-               | +/-               | +/-                |
| ny.201806 | +/-               | +/-               | +/-                |
| ny.201807 | +/-               | +/-               | +/-                |
| ny.201808 | +/-               | +/-               | +/-                |
| ny.201809 | +/-               | +/-               | +/-                |
| ny.201810 | +/-               | +/-               | +/-                |
| ny.201811 | +/-               | +/-               | +/-                |
| ny.201812 | +/-               | +/-               | +/-                |

| Topology  | Total Requirement | Resource Shortage | Repositioning Cost |
| :-------: | :---------------: | :---------------: | :----------------: |
| ny.201901 | +/-               | +/-               | +/-                |
| ny.201902 | +/-               | +/-               | +/-                |
| ny.201903 | +/-               | +/-               | +/-                |
| ny.201904 | +/-               | +/-               | +/-                |
| ny.201905 | +/-               | +/-               | +/-                |
| ny.201906 | +/-               | +/-               | +/-                |
| ny.201907 | +/-               | +/-               | +/-                |
| ny.201908 | +/-               | +/-               | +/-                |
| ny.201909 | +/-               | +/-               | +/-                |
| ny.201910 | +/-               | +/-               | +/-                |
| ny.201911 | +/-               | +/-               | +/-                |
| ny.201912 | +/-               | +/-               | +/-                |

| Topology  | Total Requirement | Resource Shortage | Repositioning Cost |
| :-------: | :---------------: | :---------------: | :----------------: |
| ny.202001 | +/-               | +/-               | +/-                |
| ny.202002 | +/-               | +/-               | +/-                |
| ny.202003 | +/-               | +/-               | +/-                |
| ny.202004 | +/-               | +/-               | +/-                |
| ny.202005 | +/-               | +/-               | +/-                |
| ny.202006 | +/-               | +/-               | +/-                |

#### Random Repositioning

| Topology  | Total Requirement | Resource Shortage | Repositioning Cost |
| :-------: | :---------------: | :---------------: | :----------------: |
| toy.3s_4t | +/-               | +/-               | +/-                |
| toy.4s_4t | +/-               | +/-               | +/-                |
| toy.5s_6t | +/-               | +/-               | +/-                |

| Topology  | Total Requirement | Resource Shortage | Repositioning Cost |
| :-------: | :---------------: | :---------------: | :----------------: |
| ny.201801 | +/-               | +/-               | +/-                |
| ny.201802 | +/-               | +/-               | +/-                |
| ny.201803 | +/-               | +/-               | +/-                |
| ny.201804 | +/-               | +/-               | +/-                |
| ny.201805 | +/-               | +/-               | +/-                |
| ny.201806 | +/-               | +/-               | +/-                |
| ny.201807 | +/-               | +/-               | +/-                |
| ny.201808 | +/-               | +/-               | +/-                |
| ny.201809 | +/-               | +/-               | +/-                |
| ny.201810 | +/-               | +/-               | +/-                |
| ny.201811 | +/-               | +/-               | +/-                |
| ny.201812 | +/-               | +/-               | +/-                |

| Topology  | Total Requirement | Resource Shortage | Repositioning Cost |
| :-------: | :---------------: | :---------------: | :----------------: |
| ny.201901 | +/-               | +/-               | +/-                |
| ny.201902 | +/-               | +/-               | +/-                |
| ny.201903 | +/-               | +/-               | +/-                |
| ny.201904 | +/-               | +/-               | +/-                |
| ny.201905 | +/-               | +/-               | +/-                |
| ny.201906 | +/-               | +/-               | +/-                |
| ny.201907 | +/-               | +/-               | +/-                |
| ny.201908 | +/-               | +/-               | +/-                |
| ny.201909 | +/-               | +/-               | +/-                |
| ny.201910 | +/-               | +/-               | +/-                |
| ny.201911 | +/-               | +/-               | +/-                |
| ny.201912 | +/-               | +/-               | +/-                |

| Topology  | Total Requirement | Resource Shortage | Repositioning Cost |
| :-------: | :---------------: | :---------------: | :----------------: |
| ny.202001 | +/-               | +/-               | +/-                |
| ny.202002 | +/-               | +/-               | +/-                |
| ny.202003 | +/-               | +/-               | +/-                |
| ny.202004 | +/-               | +/-               | +/-                |
| ny.202005 | +/-               | +/-               | +/-                |
| ny.202006 | +/-               | +/-               | +/-                |
-->

@@ -130,128 +130,223 @@ manually.

*(To make it clearer, the figure above only shows the service routes among ports.)*

### Naive Baseline

Below is the performance of *no repositioning* and *random repositioning* in
different topologies. The performance metric used here is the *fulfillment ratio*.

| Topology         | No Repositioning | Random Repositioning |
| :--------------: | :--------------: | :------------------: |
| toy.4p_ssdd_l0.0 | 11.16 +/- 0.00   | 36.76 +/- 3.19       |
| toy.4p_ssdd_l0.1 | 11.16 +/- 0.00   | 78.42 +/- 5.67       |
| toy.4p_ssdd_l0.2 | 11.16 +/- 0.00   | 71.37 +/- 12.51      |
| toy.4p_ssdd_l0.3 | 11.16 +/- 0.00   | 67.37 +/- 11.60      |
| toy.4p_ssdd_l0.4 | 11.16 +/- 0.03   | 68.99 +/- 12.11      |
| toy.4p_ssdd_l0.5 | 11.16 +/- 0.03   | 67.07 +/- 12.81      |
| toy.4p_ssdd_l0.6 | 11.16 +/- 0.03   | 67.90 +/- 4.44       |
| toy.4p_ssdd_l0.7 | 11.16 +/- 0.03   | 62.13 +/- 5.82       |
| toy.4p_ssdd_l0.8 | 11.17 +/- 0.03   | 64.05 +/- 5.16       |

| Topology          | No Repositioning | Random Repositioning |
| :---------------: | :--------------: | :------------------: |
| toy.5p_ssddd_l0.0 | 22.32 +/- 0.00   | 57.15 +/- 4.38       |
| toy.5p_ssddd_l0.1 | 22.32 +/- 0.00   | 64.15 +/- 3.91       |
| toy.5p_ssddd_l0.2 | 22.32 +/- 0.00   | 64.14 +/- 3.54       |
| toy.5p_ssddd_l0.3 | 22.33 +/- 0.00   | 64.37 +/- 3.87       |
| toy.5p_ssddd_l0.4 | 22.32 +/- 0.06   | 63.53 +/- 3.93       |
| toy.5p_ssddd_l0.5 | 22.32 +/- 0.06   | 63.93 +/- 3.72       |
| toy.5p_ssddd_l0.6 | 22.32 +/- 0.06   | 54.60 +/- 5.40       |
| toy.5p_ssddd_l0.7 | 22.32 +/- 0.06   | 45.00 +/- 6.05       |
| toy.5p_ssddd_l0.8 | 22.34 +/- 0.06   | 46.32 +/- 4.96       |

| Topology           | No Repositioning | Random Repositioning |
| :----------------: | :--------------: | :------------------: |
| toy.6p_sssbdd_l0.0 | 34.15 +/- 0.00   | 44.69 +/- 6.84       |
| toy.6p_sssbdd_l0.1 | 34.15 +/- 0.00   | 59.35 +/- 5.11       |
| toy.6p_sssbdd_l0.2 | 34.15 +/- 0.00   | 59.35 +/- 4.97       |
| toy.6p_sssbdd_l0.3 | 34.16 +/- 0.00   | 56.69 +/- 4.45       |
| toy.6p_sssbdd_l0.4 | 34.14 +/- 0.09   | 56.72 +/- 4.37       |
| toy.6p_sssbdd_l0.5 | 34.14 +/- 0.09   | 56.13 +/- 4.34       |
| toy.6p_sssbdd_l0.6 | 34.14 +/- 0.09   | 56.76 +/- 1.52       |
| toy.6p_sssbdd_l0.7 | 34.14 +/- 0.09   | 55.86 +/- 2.70       |
| toy.6p_sssbdd_l0.8 | 34.18 +/- 0.09   | 55.36 +/- 2.11       |

| Topology              | No Repositioning | Random Repositioning |
| :-------------------: | :--------------: | :------------------: |
| global_trade.22p_l0.0 | 68.57 +/- 0.00   | 59.27 +/- 1.56       |
| global_trade.22p_l0.1 | 66.64 +/- 0.00   | 64.56 +/- 0.70       |
| global_trade.22p_l0.2 | 66.55 +/- 0.00   | 64.73 +/- 0.57       |
| global_trade.22p_l0.3 | 65.24 +/- 0.00   | 63.31 +/- 0.68       |
| global_trade.22p_l0.4 | 65.22 +/- 0.15   | 63.46 +/- 0.76       |
| global_trade.22p_l0.5 | 64.90 +/- 0.15   | 63.10 +/- 0.79       |
| global_trade.22p_l0.6 | 63.74 +/- 0.49   | 60.98 +/- 0.50       |
| global_trade.22p_l0.7 | 60.14 +/- 0.47   | 56.38 +/- 0.75       |
| global_trade.22p_l0.8 | 60.17 +/- 0.45   | 56.45 +/- 0.67       |

<!-- ## Quick Start
## Quick Start

### Data Preparation

To start a simulation in the ECR scenario, no extra data processing is needed. You
can just specify the scenario and the topology when initializing an environment and
enjoy your exploration in this scenario.
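
For instance, initialization is a single call (the same one used in the example below):

```python
from maro.simulator import Env

env = Env(scenario="ecr", topology="toy.5p_ssddd_l0.0", start_tick=0, durations=100)
```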

### Environment Interface

Before starting interaction with the environment, we need to know the definitions
of `DecisionEvent` and `Action` in the ECR scenario first. Besides, you can query the
environment [snapshot list](../key_components/data_model.html#advanced-features)
to get more detailed information for decision making.

#### DecisionEvent

Once the environment needs the agent's response to promote the simulation, it will
throw a `DecisionEvent`. In the ECR scenario, the information of each
`DecisionEvent` is listed as below:

- **tick**: (int) the corresponding tick
- **port_idx**: (int) the id of the port/agent that needs to respond to the environment
- **vessel_idx**: (int) the id of the vessel/operation object of the port/agent.
- **snapshot_list**: (int) **Snapshots of the environment to input into the decision model** TODO: confirm the meaning
- **action_scope**: **Load and discharge scope for the agent to generate a decision**
- **early_discharge**: **Early discharge number of the corresponding vessel**

- **tick** (int): The corresponding tick.
- **port_idx** (int): The id of the port/agent that needs to respond to the
  environment.
- **vessel_idx** (int): The id of the vessel/operation object of the port/agent.
- **action_scope** (ActionScope): ActionScope has two attributes:
  - `load` indicates the maximum quantity that can be loaded from the port to the
    vessel.
  - `discharge` indicates the maximum quantity that can be discharged from the
    vessel to the port.
- **early_discharge** (int): When the available capacity in the vessel is not
  enough to load the ladens, some of the empty containers in the vessel will be
  early discharged to free up space. The quantity of empty containers that have
  been early discharged due to the laden loading is recorded in this field.
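
A small sketch of how `action_scope` bounds a response, matching the sign convention
defined under `Action` below (negative = load onto the vessel, positive = discharge to the port):

```python
# Any valid repositioning quantity for this event lies in
# the closed range [-action_scope.load, action_scope.discharge].
scope = decision_event.action_scope
min_quantity, max_quantity = -scope.load, scope.discharge
```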

#### Action

Once we get a `DecisionEvent` from the environment, we should respond with an
`Action`. A valid `Action` could be:

- None, which means do nothing.
- A valid Action instance, including:
  - **vessel_idx**: (int) the id of the vessel/operation object of the port/agent.
  - **port_idx**: (int) the id of the port/agent that takes this action.
  - **quantity**: (int) the sign of this value denotes different meanings:
    - positive quantity means unloading empty containers from vessel to port.
    - negative quantity means loading empty containers from port to vessel.

- `None`, which means do nothing.
- A valid `Action` instance, including:
  - **vessel_idx** (int): The id of the vessel/operation object of the port/agent.
  - **port_idx** (int): The id of the port/agent that takes this action.
  - **quantity** (int): The sign of this value denotes different meanings:
    - Positive quantity means discharging empty containers from vessel to port.
    - Negative quantity means loading empty containers from port to vessel.

### Example (Random Action)
### Example

Here we show a simple example of interaction with the environment in random
mode; we hope this helps you learn how to use the environment interfaces:

```python
from maro.simulator import Env
from maro.simulator.scenarios.ecr.common import Action

start_tick = 0
durations = 100  # 100 days

# Initialize an environment with a specific scenario, related topology.
env = Env(scenario="ecr", topology="5p_ssddd_l0.0",
          start_tick=start_tick, durations=durations)

# Query environment summary, which includes business instances, intra-instance attributes, etc.
print(env.summary)

for ep in range(2):
    # Gym-like step function
    metrics, decision_event, is_done = env.step(None)

    while not is_done:
        past_week_ticks = [x for x in range(
            decision_event.tick - 7, decision_event.tick)]
        decision_port_idx = decision_event.port_idx
        intr_port_infos = ["booking", "empty", "shortage"]

        # Query the decision port's booking, empty container inventory, and shortage information in the past week
        past_week_info = env.snapshot_list["ports"][past_week_ticks:
                                                    decision_port_idx:
                                                    intr_port_infos]

        dummy_action = Action(decision_event.vessel_idx,
                              decision_event.port_idx, 0)

        # Drive environment with dummy action (no repositioning)
        metrics, decision_event, is_done = env.step(dummy_action)

    # Query environment business metrics at the end of an episode; this is usually your optimization objective (often multi-target).
    print(f"ep: {ep}, environment metrics: {env.get_metrics()}")
    env.reset()
```

Detail link -->

```python
from maro.simulator import Env
from maro.simulator.scenarios.ecr.common import Action, DecisionEvent

import random

# Initialize an environment of the ECR scenario, with a specific topology.
# In ECR, 1 tick means 1 day, so durations=100 here indicates a length of 100 days.
env = Env(scenario="ecr", topology="toy.5p_ssddd_l0.0", start_tick=0, durations=100)

# Query the environment summary; the business instances and intra-instance attributes
# will be listed in the output for your reference.
print(env.summary)

metrics: object = None
decision_event: DecisionEvent = None
is_done: bool = False
action: Action = None

num_episode = 2
for ep in range(num_episode):
    # Gym-like step function.
    metrics, decision_event, is_done = env.step(None)

    while not is_done:
        past_week_ticks = [
            x for x in range(decision_event.tick - 7, decision_event.tick)
        ]
        decision_port_idx = decision_event.port_idx
        intr_port_infos = ["booking", "empty", "shortage"]

        # Query the snapshot list of this environment to get the information of
        # the booking, empty, and shortage of the decision port in the past week.
        past_week_info = env.snapshot_list["ports"][
            past_week_ticks : decision_port_idx : intr_port_infos
        ]

        # Generate a random Action according to the action_scope in the DecisionEvent.
        random_quantity = random.randint(
            -decision_event.action_scope.load,
            decision_event.action_scope.discharge
        )
        action = Action(
            vessel_idx=decision_event.vessel_idx,
            port_idx=decision_event.port_idx,
            quantity=random_quantity
        )

        # Drive the environment with the random action.
        metrics, decision_event, is_done = env.step(action)

    # Query the environment business metrics at the end of each episode;
    # this is usually the users' optimization objective in the ECR scenario (often multi-target).
    print(f"ep: {ep}, environment metrics: {env.metrics}")
    env.reset()
```

Jump to [this notebook](https://github.com/microsoft/maro/blob/master/notebooks/empty_container_repositioning/interact_with_simulator.ipynb)
for a quick experience.

<!--
### Naive Baseline

Below are the final environment metrics of the methods *no repositioning* and
*random repositioning* in different topologies. For each experiment, we set up
the environment and test for a duration of 1120 ticks (days).

#### No Repositioning

| Topology         | Total Requirement | Resource Shortage | Repositioning Cost |
| :--------------: | :---------------: | :---------------: | :----------------: |
| toy.4p_ssdd_l0.0 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.1 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.2 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.3 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.4 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.5 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.6 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.7 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.8 | +/-               | +/-               | +/-                |

| Topology          | Total Requirement | Resource Shortage | Repositioning Cost |
| :---------------: | :---------------: | :---------------: | :----------------: |
| toy.5p_ssddd_l0.0 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.1 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.2 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.3 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.4 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.5 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.6 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.7 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.8 | +/-               | +/-               | +/-                |

| Topology           | Total Requirement | Resource Shortage | Repositioning Cost |
| :----------------: | :---------------: | :---------------: | :----------------: |
| toy.6p_sssbdd_l0.0 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.1 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.2 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.3 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.4 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.5 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.6 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.7 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.8 | +/-               | +/-               | +/-                |

| Topology              | Total Requirement | Resource Shortage | Repositioning Cost |
| :-------------------: | :---------------: | :---------------: | :----------------: |
| global_trade.22p_l0.0 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.1 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.2 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.3 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.4 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.5 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.6 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.7 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.8 | +/-               | +/-               | +/-                |

#### Random Repositioning

| Topology         | Total Requirement | Resource Shortage | Repositioning Cost |
| :--------------: | :---------------: | :---------------: | :----------------: |
| toy.4p_ssdd_l0.0 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.1 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.2 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.3 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.4 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.5 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.6 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.7 | +/-               | +/-               | +/-                |
| toy.4p_ssdd_l0.8 | +/-               | +/-               | +/-                |

| Topology          | Total Requirement | Resource Shortage | Repositioning Cost |
| :---------------: | :---------------: | :---------------: | :----------------: |
| toy.5p_ssddd_l0.0 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.1 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.2 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.3 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.4 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.5 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.6 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.7 | +/-               | +/-               | +/-                |
| toy.5p_ssddd_l0.8 | +/-               | +/-               | +/-                |

| Topology           | Total Requirement | Resource Shortage | Repositioning Cost |
| :----------------: | :---------------: | :---------------: | :----------------: |
| toy.6p_sssbdd_l0.0 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.1 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.2 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.3 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.4 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.5 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.6 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.7 | +/-               | +/-               | +/-                |
| toy.6p_sssbdd_l0.8 | +/-               | +/-               | +/-                |

| Topology              | Total Requirement | Resource Shortage | Repositioning Cost |
| :-------------------: | :---------------: | :---------------: | :----------------: |
| global_trade.22p_l0.0 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.1 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.2 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.3 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.4 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.5 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.6 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.7 | +/-               | +/-               | +/-                |
| global_trade.22p_l0.8 | +/-               | +/-               | +/-                |
-->

@@ -0,0 +1,10 @@

env:
  scenario: "citi_bike"
  topology: "ny.201801"
  start_tick: 0
  durations: 2880
  resolution: 10
  seed: 128
agent:
  supply_top_k: 1
  demand_top_k: 1

@@ -0,0 +1,77 @@

import heapq
import io
import random

import yaml

from maro.simulator import Env
from maro.simulator.scenarios.citi_bike.common import Action, DecisionEvent, DecisionType
from maro.utils import convert_dottable


with io.open("config.yml", "r") as in_file:
    raw_config = yaml.safe_load(in_file)
    config = convert_dottable(raw_config)


class GreedyAgent:
    def __init__(self, supply_top_k: int = 1, demand_top_k: int = 1):
        """
        Agent that executes a greedy policy. If the event type is supply, send as many bikes as possible
        to one of the demand_top_k stations with the most empty docks. If the event type is demand,
        request as many bikes as possible from one of the supply_top_k stations with the most bikes.

        Args:
            supply_top_k (int): Number of top supply candidates to choose from.
            demand_top_k (int): Number of top demand candidates to choose from.
        """
        self._supply_top_k = supply_top_k
        self._demand_top_k = demand_top_k

    def choose_action(self, decision_event: DecisionEvent):
        if decision_event.type == DecisionType.Supply:
            # Find the k target stations with the most empty docks, randomly choose one of them and
            # send as many bikes to it as the action scope allows.
            # (Pushing onto a min-heap and popping when it exceeds k keeps the k largest entries.)
            top_k_demands = []
            for demand_candidate, available_docks in decision_event.action_scope.items():
                if demand_candidate == decision_event.station_idx:
                    continue

                heapq.heappush(top_k_demands, (available_docks, demand_candidate))
                if len(top_k_demands) > self._demand_top_k:
                    heapq.heappop(top_k_demands)

            max_reposition, target_station_idx = random.choice(top_k_demands)
            action = Action(decision_event.station_idx, target_station_idx, max_reposition)
        else:
            # Find the k source stations with the most bikes, randomly choose one of them and
            # request as many bikes from it as the action scope allows.
            top_k_supplies = []
            for supply_candidate, available_bikes in decision_event.action_scope.items():
                if supply_candidate == decision_event.station_idx:
                    continue

                heapq.heappush(top_k_supplies, (available_bikes, supply_candidate))
                if len(top_k_supplies) > self._supply_top_k:
                    heapq.heappop(top_k_supplies)

            max_reposition, source_idx = random.choice(top_k_supplies)
            action = Action(source_idx, decision_event.station_idx, max_reposition)

        return action


if __name__ == "__main__":
    env = Env(scenario=config.env.scenario, topology=config.env.topology, start_tick=config.env.start_tick,
              durations=config.env.durations, snapshot_resolution=config.env.resolution)

    if config.env.seed is not None:
        env.set_seed(config.env.seed)

    agent = GreedyAgent(config.agent.supply_top_k, config.agent.demand_top_k)
    metrics, decision_event, done = env.step(None)
    while not done:
        metrics, decision_event, done = env.step(agent.choose_action(decision_event))

    print(f"Greedy agent policy performance: {env.metrics}")

    env.reset()
|
|
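The loops above keep the top-k candidates with a bounded min-heap: each push is followed by a pop once the heap grows past k, so only the k largest (count, station) pairs survive. A standalone sketch of the idiom, with made-up station data:

```python
# Minimal sketch of the bounded min-heap top-k idiom used by GreedyAgent.
# The station data here is made up for illustration.
import heapq

action_scope = {0: 5, 1: 12, 2: 7, 3: 9}  # station_idx -> available docks
k = 2

top_k = []
for station, docks in action_scope.items():
    heapq.heappush(top_k, (docks, station))
    if len(top_k) > k:
        heapq.heappop(top_k)  # evict the smallest, keeping the k largest

print(sorted(top_k, reverse=True))  # [(12, 1), (9, 3)]
```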
@@ -0,0 +1,18 @@
This file contains instructions on how to use MARO on the Empty Container Repositioning (ECR) scenario.

### Overview
The ECR problem is one of the quintessential use cases of MARO. The example can be run with a set of scenario
configurations that can be found under maro/simulator/scenarios/ecr. General experimental parameters (e.g., type of
topology, type of algorithm to use, number of training episodes) can be configured through config.yml. Each RL
formulation has a dedicated folder, e.g., dqn, and all algorithm-specific parameters can be configured through
the config.py file in that folder.

### Single-host Single-process Mode
To run the ECR example using the DQN algorithm under single-host mode, go to examples/ecr/dqn and run
single_process_launcher.py. You may play around with the configuration if you want to try out different
settings.

### Distributed Mode
The examples/ecr/dqn/components folder contains dist_learner.py and dist_actor.py for distributed
training. For debugging purposes, we provide a script that simulates distributed mode using multi-processing.
Simply go to examples/ecr/dqn and run multi_process_launcher.py to start the learner and actor processes.
@@ -0,0 +1,33 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from maro.rl import ActionShaper
from maro.simulator.scenarios.ecr.common import Action


class ECRActionShaper(ActionShaper):
    def __init__(self, action_space):
        super().__init__()
        self._action_space = action_space
        self._zero_action_index = action_space.index(0)

    def __call__(self, model_action, decision_event, snapshot_list):
        scope = decision_event.action_scope
        tick = decision_event.tick
        port_idx = decision_event.port_idx
        vessel_idx = decision_event.vessel_idx

        port_empty = snapshot_list["ports"][tick: port_idx: ["empty", "full", "on_shipper", "on_consignee"]][0]
        vessel_remaining_space = snapshot_list["vessels"][tick: vessel_idx: ["empty", "full", "remaining_space"]][2]
        early_discharge = snapshot_list["vessels"][tick: vessel_idx: "early_discharge"][0]
        assert 0 <= model_action < len(self._action_space)

        if model_action < self._zero_action_index:
            actual_action = max(round(self._action_space[model_action] * port_empty), -vessel_remaining_space)
        elif model_action > self._zero_action_index:
            plan_action = self._action_space[model_action] * (scope.discharge + early_discharge) - early_discharge
            actual_action = round(plan_action) if plan_action > 0 else round(self._action_space[model_action] * scope.discharge)
        else:
            actual_action = 0

        return Action(vessel_idx, port_idx, actual_action)
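The shaper maps a discrete action index into a symmetric ratio space over [-1, 1]: indices below the zero-action index become load actions scaled by the port's empty containers, indices above it become discharge actions. A small sketch of the index-to-ratio mapping, with num_actions taken from the example config:

```python
# Sketch of the discrete-index -> repositioning-ratio mapping assumed by
# ECRActionShaper; num_actions follows the example config (21 actions).
import numpy as np

num_actions = 21
action_space = list(np.linspace(-1.0, 1.0, num_actions))
zero_action_index = action_space.index(0)  # index 10 is the no-op action

for model_action in (0, 10, 15, 20):
    ratio = action_space[model_action]
    kind = "load" if model_action < zero_action_index else \
           "no-op" if model_action == zero_action_index else "discharge"
    print(model_action, round(ratio, 2), kind)
```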
@@ -0,0 +1,28 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import numpy as np

from maro.rl import AbsAgent, ColumnBasedStore


class ECRAgent(AbsAgent):
    def __init__(self, name, algorithm, experience_pool: ColumnBasedStore, min_experiences_to_train,
                 num_batches, batch_size):
        super().__init__(name, algorithm, experience_pool)
        self._min_experiences_to_train = min_experiences_to_train
        self._num_batches = num_batches
        self._batch_size = batch_size

    def train(self):
        if len(self._experience_pool) < self._min_experiences_to_train:
            return

        for _ in range(self._num_batches):
            indexes, sample = self._experience_pool.sample_by_key("loss", self._batch_size)
            state = np.asarray(sample["state"])
            action = np.asarray(sample["action"])
            reward = np.asarray(sample["reward"])
            next_state = np.asarray(sample["next_state"])
            loss = self._algorithm.train(state, action, reward, next_state)
            self._experience_pool.update(indexes, {"loss": loss})
@@ -0,0 +1,45 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import io
import yaml

from torch.nn.functional import smooth_l1_loss
from torch.optim import RMSprop

from maro.rl import AbsAgentManager, LearningModel, MLPDecisionLayers, DQN, DQNHyperParams, ColumnBasedStore
from maro.utils import convert_dottable, set_seeds
from .agent import ECRAgent


with io.open("config.yml", "r") as in_file:
    raw_config = yaml.safe_load(in_file)
    config = convert_dottable(raw_config)
    config = config.agents


class DQNAgentManager(AbsAgentManager):
    def _assemble(self, agent_dict):
        set_seeds(config.seed)
        num_actions = config.algorithm.num_actions
        for agent_id in self._agent_id_list:
            eval_model = LearningModel(decision_layers=MLPDecisionLayers(name=f'{agent_id}.policy',
                                                                         input_dim=self._state_shaper.dim,
                                                                         output_dim=num_actions,
                                                                         **config.algorithm.model)
                                       )

            algorithm = DQN(model_dict={"eval": eval_model},
                            optimizer_opt=(RMSprop, config.algorithm.optimizer),
                            loss_func_dict={"eval": smooth_l1_loss},
                            hyper_params=DQNHyperParams(**config.algorithm.hyper_parameters,
                                                        num_actions=num_actions))

            experience_pool = ColumnBasedStore(**config.experience_pool)
            agent_dict[agent_id] = ECRAgent(name=agent_id, algorithm=algorithm, experience_pool=experience_pool,
                                            **config.training_loop_parameters)

    def store_experiences(self, experiences):
        for agent_id, exp in experiences.items():
            exp.update({"loss": [1e8] * len(exp[next(iter(exp))])})
            self._agent_dict[agent_id].store_experiences(exp)
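Note that store_experiences seeds every new experience with a sentinel "loss" of 1e8, so that sample_by_key("loss", ...) in ECRAgent.train preferentially draws experiences that have never been trained on; once a batch has been trained, its entries are overwritten with the actual per-sample TD loss. A toy illustration of this priority effect (plain Python, not the ColumnBasedStore API):

```python
# Toy illustration (not the ColumnBasedStore API) of why new experiences are
# seeded with loss = 1e8: sampling by the largest "loss" values picks
# never-trained experiences first, then falls back to high-TD-error ones.
pool = [
    {"id": "old-1", "loss": 0.3},
    {"id": "old-2", "loss": 1.7},
    {"id": "new-1", "loss": 1e8},   # sentinel: never trained on
    {"id": "new-2", "loss": 1e8},
]

batch = sorted(pool, key=lambda e: e["loss"], reverse=True)[:3]
print([e["id"] for e in batch])  # ['new-1', 'new-2', 'old-2']
```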
@@ -0,0 +1,50 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import io
import yaml

import numpy as np

from maro.simulator import Env
from maro.rl import AgentMode, SimpleActor, ActorWorker, KStepExperienceShaper, TwoPhaseLinearExplorer
from maro.utils import convert_dottable
from examples.ecr.dqn.components.state_shaper import ECRStateShaper
from examples.ecr.dqn.components.action_shaper import ECRActionShaper
from examples.ecr.dqn.components.experience_shaper import TruncatedExperienceShaper
from examples.ecr.dqn.components.agent_manager import DQNAgentManager


with io.open("config.yml", "r") as in_file:
    raw_config = yaml.safe_load(in_file)
    config = convert_dottable(raw_config)

if __name__ == "__main__":
    env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
    agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
    state_shaper = ECRStateShaper(**config.state_shaping)
    action_shaper = ECRActionShaper(action_space=list(np.linspace(-1.0, 1.0, config.agents.algorithm.num_actions)))
    if config.experience_shaping.type == "truncated":
        experience_shaper = TruncatedExperienceShaper(**config.experience_shaping.truncated)
    else:
        experience_shaper = KStepExperienceShaper(reward_func=lambda mt: mt["perf"], **config.experience_shaping.k_step)

    exploration_config = {"epsilon_range_dict": {"_all_": config.exploration.epsilon_range},
                          "split_point_dict": {"_all_": config.exploration.split_point},
                          "with_cache": config.exploration.with_cache
                          }
    explorer = TwoPhaseLinearExplorer(agent_id_list, config.general.total_training_episodes, **exploration_config)
    agent_manager = DQNAgentManager(name="ecr_remote_actor",
                                    agent_id_list=agent_id_list,
                                    mode=AgentMode.INFERENCE,
                                    state_shaper=state_shaper,
                                    action_shaper=action_shaper,
                                    experience_shaper=experience_shaper,
                                    explorer=explorer)
    proxy_params = {"group_name": config.distributed.group_name,
                    "expected_peers": config.distributed.actor.peer,
                    "redis_address": (config.distributed.redis.host_name, config.distributed.redis.port)
                    }
    actor_worker = ActorWorker(local_actor=SimpleActor(env=env, inference_agents=agent_manager),
                               proxy_params=proxy_params)
    actor_worker.launch()
@@ -0,0 +1,41 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import os
import io
import yaml

from maro.simulator import Env
from maro.rl import ActorProxy, SimpleLearner, AgentMode, TwoPhaseLinearExplorer
from examples.ecr.dqn.components.state_shaper import ECRStateShaper
from maro.utils import Logger, convert_dottable
from examples.ecr.dqn.components.agent_manager import DQNAgentManager


with io.open("config.yml", "r") as in_file:
    raw_config = yaml.safe_load(in_file)
    config = convert_dottable(raw_config)


if __name__ == "__main__":
    env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
    agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
    state_shaper = ECRStateShaper(**config.state_shaping)
    exploration_config = {"epsilon_range_dict": {"_all_": config.exploration.epsilon_range},
                          "split_point_dict": {"_all_": config.exploration.split_point},
                          "with_cache": config.exploration.with_cache
                          }
    explorer = TwoPhaseLinearExplorer(agent_id_list, config.general.total_training_episodes, **exploration_config)
    agent_manager = DQNAgentManager(name="ecr_remote_learner", agent_id_list=agent_id_list, mode=AgentMode.TRAIN,
                                    state_shaper=state_shaper, explorer=explorer)

    proxy_params = {"group_name": config.distributed.group_name,
                    "expected_peers": config.distributed.learner.peer,
                    "redis_address": (config.distributed.redis.host_name, config.distributed.redis.port)
                    }
    learner = SimpleLearner(trainable_agents=agent_manager,
                            actor=ActorProxy(proxy_params=proxy_params),
                            logger=Logger("distributed_ecr_learner", auto_timestamp=False))
    learner.train(total_episodes=config.general.total_training_episodes)
    learner.test()
    learner.dump_models(os.path.join(os.getcwd(), "models"))
@@ -0,0 +1,49 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from collections import defaultdict

import numpy as np

from maro.rl import ExperienceShaper


class TruncatedExperienceShaper(ExperienceShaper):
    def __init__(self, *, time_window: int, time_decay_factor: float, fulfillment_factor: float,
                 shortage_factor: float):
        super().__init__(reward_func=None)
        self._time_window = time_window
        self._time_decay_factor = time_decay_factor
        self._fulfillment_factor = fulfillment_factor
        self._shortage_factor = shortage_factor

    def __call__(self, trajectory, snapshot_list):
        experiences_by_agent = {}
        for i in range(len(trajectory) - 1):
            transition = trajectory[i]
            agent_id = transition["agent_id"]
            if agent_id not in experiences_by_agent:
                experiences_by_agent[agent_id] = defaultdict(list)
            experiences = experiences_by_agent[agent_id]
            experiences["state"].append(transition["state"])
            experiences["action"].append(transition["action"])
            experiences["reward"].append(self._compute_reward(transition["event"], snapshot_list))
            experiences["next_state"].append(trajectory[i+1]["state"])

        return experiences_by_agent

    def _compute_reward(self, decision_event, snapshot_list):
        start_tick = decision_event.tick + 1
        end_tick = decision_event.tick + self._time_window
        ticks = list(range(start_tick, end_tick))

        # calculate the time-decayed reward
        future_fulfillment = snapshot_list["ports"][ticks::"fulfillment"]
        future_shortage = snapshot_list["ports"][ticks::"shortage"]
        decay_list = [self._time_decay_factor ** i for i in range(end_tick - start_tick)
                      for _ in range(future_fulfillment.shape[0] // (end_tick - start_tick))]

        tot_fulfillment = np.dot(future_fulfillment, decay_list)
        tot_shortage = np.dot(future_shortage, decay_list)

        return np.float(self._fulfillment_factor * tot_fulfillment - self._shortage_factor * tot_shortage)
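The reward is a dot product between the future fulfillment/shortage readings and a per-tick decay vector; in the actual code each decay value is repeated once per port to match the flattened snapshot array. A simplified one-port numeric sketch with made-up values:

```python
# Simplified numeric sketch (made-up values, one port) of the time-decayed
# reward: f_factor * dot(fulfillment, decay) - s_factor * dot(shortage, decay).
import numpy as np

time_decay_factor, fulfillment_factor, shortage_factor = 0.97, 1.0, 1.0
future_fulfillment = np.array([10.0, 8.0, 6.0])  # three future ticks
future_shortage = np.array([1.0, 2.0, 4.0])

decay = np.array([time_decay_factor ** i for i in range(3)])  # [1.0, 0.97, 0.9409]
reward = fulfillment_factor * np.dot(future_fulfillment, decay) \
    - shortage_factor * np.dot(future_shortage, decay)
print(round(reward, 4))  # 23.4054 - 6.7036 = 16.7018
```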
@@ -0,0 +1,28 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import numpy as np
from maro.rl import StateShaper


class ECRStateShaper(StateShaper):
    def __init__(self, *, look_back, max_ports_downstream, port_attributes, vessel_attributes):
        super().__init__()
        self._look_back = look_back
        self._max_ports_downstream = max_ports_downstream
        self._port_attributes = port_attributes
        self._vessel_attributes = vessel_attributes
        self._dim = (look_back + 1) * (max_ports_downstream + 1) * len(port_attributes) + len(vessel_attributes)

    def __call__(self, decision_event, snapshot_list):
        tick, port_idx, vessel_idx = decision_event.tick, decision_event.port_idx, decision_event.vessel_idx
        ticks = [tick - rt for rt in range(self._look_back - 1)]
        future_port_idx_list = snapshot_list["vessels"][tick: vessel_idx: 'future_stop_list'].astype('int')
        port_features = snapshot_list["ports"][ticks: [port_idx] + list(future_port_idx_list): self._port_attributes]
        vessel_features = snapshot_list["vessels"][tick: vessel_idx: self._vessel_attributes]
        state = np.concatenate((port_features, vessel_features))
        return str(port_idx), state

    @property
    def dim(self):
        return self._dim
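As a concreteness check (not part of the commit), plugging in the default values from the example config below gives the state dimension the DQN network is built against:

```python
# Quick sanity check of the state dimension under the example config values.
look_back = 7
max_ports_downstream = 2
num_port_attributes = 7    # empty, full, on_shipper, on_consignee, booking, shortage, fulfillment
num_vessel_attributes = 3  # empty, full, remaining_space

dim = (look_back + 1) * (max_ports_downstream + 1) * num_port_attributes + num_vessel_attributes
print(dim)  # 8 * 3 * 7 + 3 = 171
```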
@@ -0,0 +1,66 @@
env:
  scenario: "ecr"
  topology: "toy.4p_ssdd_l0.0"
  durations: 1120
general:
  total_training_episodes: 500  # max episode
state_shaping:
  look_back: 7
  max_ports_downstream: 2
  port_attributes:
    - "empty"
    - "full"
    - "on_shipper"
    - "on_consignee"
    - "booking"
    - "shortage"
    - "fulfillment"
  vessel_attributes:
    - "empty"
    - "full"
    - "remaining_space"
experience_shaping:
  type: "truncated"
  k_step:
    reward_decay: 0.9
    steps: 5
  truncated:
    time_window: 100
    fulfillment_factor: 1.0
    shortage_factor: 1.0
    time_decay_factor: 0.97
exploration:
  epsilon_range: [0.0, 0.4]
  split_point: [0.5, 0.8]
  with_cache: true
agents:
  algorithm:
    num_actions: 21
    model:
      hidden_dims:
        - 256
        - 128
        - 64
      dropout_p: 0.0
    optimizer:
      lr: 0.05
    hyper_parameters:
      reward_decay: .0
      num_training_rounds_per_target_replacement: 5
      tau: 0.1
  experience_pool:
    capacity: -1
  training_loop_parameters:
    min_experiences_to_train: 1024
    num_batches: 10  # number of times the algorithm's train() method is called
    batch_size: 128
  seed: 1024  # for reproducibility
distributed:
  group_name: "dqn_distributed_test"
  actor:
    peer: {"actor": 1}
  learner:
    peer: {"actor_worker": 1}
  redis:
    host_name: "localhost"
    port: 6379
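The launchers read this file with yaml.safe_load and wrap it in convert_dottable, so nested keys are reachable by attribute access; a minimal sketch of the pattern they all share:

```python
# Minimal sketch of how the launchers consume this config; convert_dottable
# (from maro.utils) exposes nested YAML keys as attributes.
import io
import yaml

from maro.utils import convert_dottable

with io.open("config.yml", "r") as in_file:
    config = convert_dottable(yaml.safe_load(in_file))

print(config.env.topology)                  # "toy.4p_ssdd_l0.0"
print(config.agents.algorithm.num_actions)  # 21
```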
@@ -0,0 +1,20 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""
This script is used to debug a distributed algorithm in single-host multi-process mode.
"""

import os

ACTOR_NUM = 1  # must be the same as in config
LEARNER_NUM = 1

learner_path = "components/dist_learner.py &"
actor_path = "components/dist_actor.py &"

for l_num in range(LEARNER_NUM):
    os.system("python " + learner_path)

for a_num in range(ACTOR_NUM):
    os.system("python " + actor_path)
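os.system with a trailing "&" backgrounds each process through the shell. An alternative sketch (not what the commit uses) with subprocess.Popen achieves the same without relying on shell job control and keeps handles to the child processes:

```python
# Alternative sketch using subprocess.Popen instead of `os.system(... &)`;
# not part of the commit, just a more controllable way to spawn the processes.
import subprocess

procs = [subprocess.Popen(["python", path])
         for path in ("components/dist_learner.py", "components/dist_actor.py")]
for p in procs:
    p.wait()  # optionally block until the learner and actor exit
```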
@@ -0,0 +1,52 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import os
import io
import yaml

import numpy as np

from maro.simulator import Env
from maro.rl import SimpleLearner, SimpleActor, AgentMode, KStepExperienceShaper, TwoPhaseLinearExplorer
from maro.utils import Logger, convert_dottable
from examples.ecr.dqn.components.state_shaper import ECRStateShaper
from examples.ecr.dqn.components.action_shaper import ECRActionShaper
from examples.ecr.dqn.components.experience_shaper import TruncatedExperienceShaper
from examples.ecr.dqn.components.agent_manager import DQNAgentManager


with io.open("config.yml", "r") as in_file:
    raw_config = yaml.safe_load(in_file)
    config = convert_dottable(raw_config)


if __name__ == "__main__":
    env = Env(config.env.scenario, config.env.topology, durations=config.env.durations)
    agent_id_list = [str(agent_id) for agent_id in env.agent_idx_list]
    state_shaper = ECRStateShaper(**config.state_shaping)
    action_shaper = ECRActionShaper(action_space=list(np.linspace(-1.0, 1.0, config.agents.algorithm.num_actions)))
    if config.experience_shaping.type == "truncated":
        experience_shaper = TruncatedExperienceShaper(**config.experience_shaping.truncated)
    else:
        experience_shaper = KStepExperienceShaper(reward_func=lambda mt: mt["perf"], **config.experience_shaping.k_step)

    exploration_config = {"epsilon_range_dict": {"_all_": config.exploration.epsilon_range},
                          "split_point_dict": {"_all_": config.exploration.split_point},
                          "with_cache": config.exploration.with_cache
                          }
    explorer = TwoPhaseLinearExplorer(agent_id_list, config.general.total_training_episodes, **exploration_config)
    agent_manager = DQNAgentManager(name="ecr_learner",
                                    mode=AgentMode.TRAIN_INFERENCE,
                                    agent_id_list=agent_id_list,
                                    state_shaper=state_shaper,
                                    action_shaper=action_shaper,
                                    experience_shaper=experience_shaper,
                                    explorer=explorer)
    learner = SimpleLearner(trainable_agents=agent_manager,
                            actor=SimpleActor(env=env, inference_agents=agent_manager),
                            logger=Logger("single_host_ecr_learner", auto_timestamp=False))

    learner.train(total_episodes=config.general.total_training_episodes)
    learner.test()
    learner.dump_models(os.path.join(os.getcwd(), "models"))
@@ -378,6 +378,8 @@ cdef class NPSnapshotList(SnapshotListAbc):
        """Reset snapshot list"""
        self._cur_index = 0
        self._tick2index_dict.clear()
        self._index2tick_dict.clear()
        self._history_dict.clear()

        cdef str node_name
        cdef AttrInfo attr_info
@@ -1,3 +1,6 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import os
import shutil
import tempfile
@@ -317,7 +317,7 @@ class WeatherPipeline(DataPipeline):
        return WeatherPipeline.WeatherEnum.SUNNY.value

    def _parse_date(self, row: dict):
        dstr = row["Date"] if row["Date"] != "" else None
        dstr = row.get("Date", None)

        return dstr

@@ -327,9 +327,9 @@ class WeatherPipeline(DataPipeline):
        wh = self._weather(row=row)
        temp_str = row["Avg Temp"]

        temp = round(float(temp_str), 2) if temp_str != "" else self._last_day_temp
        temp = round(float(temp_str), 2) if temp_str != "" and temp_str is not None else self._last_day_temp

        self.last_day_temp = temp
        self._last_day_temp = temp

        return {"date": date, "weather": wh, "temp": temp} if date is not None else None
@@ -4,10 +4,9 @@
import json
import numpy as np
import os
import pycurl
import requests
import shutil
import sys
import urllib.request
import uuid

from maro.cli.utils.params import GlobalPaths

@@ -43,13 +42,10 @@ def download_file(source: str, destination: str):
    os.makedirs(tmpdir, exist_ok=True)
    os.makedirs(os.path.dirname(destination), exist_ok=True)

    source_data = urllib.request.urlopen(source)
    res_data = source_data.read()
    with open(temp_file_name, "wb") as f:
        curl = pycurl.Curl()
        curl.setopt(pycurl.URL, source)
        curl.setopt(pycurl.WRITEDATA, f)
        curl.setopt(pycurl.FOLLOWLOCATION, True)
        curl.perform()
        curl.close()
        f.write(res_data)

    if os.path.exists(destination):
        os.remove(destination)
@@ -143,7 +143,7 @@ class ContainerTrackingAgent(multiprocessing.Process):
                encoding='utf8')
            nvidia_smi_str = completed_process.stdout
            node_details['resources']['actual_gpu_usage'] = f"{float(nvidia_smi_str)}%"
        except:
        except Exception:
            pass

    @staticmethod
@@ -21,7 +21,8 @@ def copy_files_to_node(local_path: str, remote_dir: str, admin_username: str, node_ip_address: str):
        admin_username (str)
        node_ip_address (str)
    """
    copy_scripts = f"rsync -e 'ssh -o StrictHostKeyChecking=no' -az {local_path} {admin_username}@{node_ip_address}:{remote_dir}"
    copy_scripts = f"rsync -e 'ssh -o StrictHostKeyChecking=no' " \
                   f"-az {local_path} {admin_username}@{node_ip_address}:{remote_dir}"
    _ = SubProcess.run(copy_scripts)

@@ -34,7 +35,8 @@ def copy_files_from_node(local_dir: str, remote_path: str, admin_username: str, node_ip_address: str):
        admin_username (str)
        node_ip_address (str)
    """
    copy_scripts = f"rsync -e 'ssh -o StrictHostKeyChecking=no' -az {admin_username}@{node_ip_address}:{remote_path} {local_dir}"
    copy_scripts = f"rsync -e 'ssh -o StrictHostKeyChecking=no' " \
                   f"-az {admin_username}@{node_ip_address}:{remote_path} {local_dir}"
    _ = SubProcess.run(copy_scripts)
@@ -1,3 +1,7 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.


import hashlib
@@ -435,7 +435,10 @@ class K8sAzureExecutor:
        sas = self._check_and_get_account_sas()

        # Push data
        copy_command = f'azcopy copy "{local_path}" "https://{cluster_id}st.file.core.windows.net/{cluster_id}-fs{remote_dir}?{sas}" --recursive=True'
        copy_command = f'azcopy copy ' \
                       f'"{local_path}" ' \
                       f'"https://{cluster_id}st.file.core.windows.net/{cluster_id}-fs{remote_dir}?{sas}" ' \
                       f'--recursive=True'
        _ = SubProcess.run(copy_command)

    def pull_data(self, local_dir: str, remote_path: str):

@@ -447,7 +450,10 @@ class K8sAzureExecutor:
        sas = self._check_and_get_account_sas()

        # Pull data
        copy_command = f'azcopy copy "https://{cluster_id}st.file.core.windows.net/{cluster_id}-fs{remote_path}?{sas}" "{local_dir}" --recursive=True'
        copy_command = f'azcopy copy ' \
                       f'"https://{cluster_id}st.file.core.windows.net/{cluster_id}-fs{remote_path}?{sas}" ' \
                       f'"{local_dir}" ' \
                       f'--recursive=True'
        _ = SubProcess.run(copy_command)

    def remove_data(self, remote_path: str):

@@ -461,7 +467,9 @@ class K8sAzureExecutor:
        sas = self._check_and_get_account_sas()

        # Remove data
        copy_command = f'azcopy remove "https://{cluster_id}st.file.core.windows.net/{cluster_id}-fs{remote_path}?{sas}" --recursive=True'
        copy_command = f'azcopy remove ' \
                       f'"https://{cluster_id}st.file.core.windows.net/{cluster_id}-fs{remote_path}?{sas}" ' \
                       f'--recursive=True'
        _ = SubProcess.run(copy_command)

    def _check_and_get_account_sas(self):

@@ -523,12 +531,14 @@ class K8sAzureExecutor:
            yaml.safe_dump(k8s_job_config, fw)

        # Apply k8s config
        command = f"kubectl apply -f {GlobalPaths.MARO_CLUSTERS}/{self.cluster_name}/jobs/{job_name}/k8s_configs/jobs.yml"
        command = f"kubectl apply -f " \
                  f"{GlobalPaths.MARO_CLUSTERS}/{self.cluster_name}/jobs/{job_name}/k8s_configs/jobs.yml"
        _ = SubProcess.run(command)

    def stop_job(self, job_name: str):
        # Stop job
        command = f"kubectl delete -f {GlobalPaths.MARO_CLUSTERS}/{self.cluster_name}/jobs/{job_name}/k8s_configs/jobs.yml"
        command = f"kubectl delete -f " \
                  f"{GlobalPaths.MARO_CLUSTERS}/{self.cluster_name}/jobs/{job_name}/k8s_configs/jobs.yml"
        _ = SubProcess.run(command)

    @staticmethod
@@ -43,7 +43,10 @@ def main():
    # maro env
    parser_env = subparsers.add_parser(
        'env',
        help='Get all environment-related information, such as the supported scenarios, topologies. And it is also responsible to generate data to the specific environment, which has external data dependency.',
        help=('Get all environment-related information, '
              'such as the supported scenarios, topologies. '
              'And it is also responsible to generate data to the specific environment, '
              'which has external data dependency.'),
        parents=[global_parser]
    )
    parser_env.set_defaults(func=_help_func(parser=parser_env))

@@ -777,7 +780,9 @@ def load_parser_data(prev_parser: ArgumentParser, global_parser: ArgumentParser)
        type=int,
        default=None,
        required=False,
        help="Specified start timestamp (in UTC) for binary file, then this timestamp will be considered as tick=0 for binary reader, this can be used to adjust the reader pipeline.")
        help=("Specified start timestamp (in UTC) for binary file, "
              "then this timestamp will be considered as tick=0 for binary reader, "
              "this can be used to adjust the reader pipeline."))

    build_cmd_parser.set_defaults(func=convert)
@@ -1,3 +1,7 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.


from functools import wraps

from maro.cli.utils.details import load_cluster_details
@@ -1,3 +1,7 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.


MARO_GRASS_CREATE = """
Examples:
    Create a cluster in grass mode with a deployment
@@ -1,3 +1,7 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.


import logging


class GlobalParams:
@@ -1,60 +1,55 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from maro.rl.actor.abstract_actor import AbstractActor, RolloutMode
from maro.rl.actor.abs_actor import AbsActor
from maro.rl.actor.simple_actor import SimpleActor
from maro.rl.learner.abstract_learner import AbstractLearner
from maro.rl.learner.abs_learner import AbsLearner
from maro.rl.learner.simple_learner import SimpleLearner
from maro.rl.agent.agent import Agent, AgentParameters
from maro.rl.agent.agent_manager import AgentManager, AgentMode
from maro.rl.algorithms.torch.algorithm import Algorithm
from maro.rl.agent.abs_agent import AbsAgent
from maro.rl.agent.abs_agent_manager import AbsAgentManager, AgentMode
from maro.rl.algorithms.torch.abs_algorithm import AbsAlgorithm
from maro.rl.algorithms.torch.dqn import DQN, DQNHyperParams
from maro.rl.models.torch.mlp_representation import MLPRepresentation
from maro.rl.models.torch.decision_layers import MLPDecisionLayers
from maro.rl.models.torch.learning_model import LearningModel
from maro.rl.storage.abstract_store import AbstractStore
from maro.rl.storage.unbounded_store import UnboundedStore
from maro.rl.storage.fixed_size_store import FixedSizeStore, OverwriteType
from maro.rl.shaping.abstract_state_shaper import AbstractStateShaper
from maro.rl.shaping.abstract_action_shaper import AbstractActionShaper
from maro.rl.shaping.abstract_reward_shaper import AbstractRewardShaper
from maro.rl.shaping.k_step_reward_shaper import KStepRewardShaper
from maro.rl.explorer.abstract_explorer import AbstractExplorer
from maro.rl.storage.abs_store import AbsStore
from maro.rl.storage.column_based_store import ColumnBasedStore
from maro.rl.storage.utils import OverwriteType
from maro.rl.shaping.abs_shaper import AbsShaper
from maro.rl.shaping.state_shaper import StateShaper
from maro.rl.shaping.action_shaper import ActionShaper
from maro.rl.shaping.experience_shaper import ExperienceShaper
from maro.rl.shaping.k_step_experience_shaper import KStepExperienceShaper
from maro.rl.explorer.abs_explorer import AbsExplorer
from maro.rl.explorer.simple_explorer import LinearExplorer, TwoPhaseLinearExplorer
from maro.rl.dist_topologies.multi_actor_single_learner_sync import ActorProxy, ActorWorker
from maro.rl.common import ExperienceKey, ExperienceInfoKey, TransitionInfoKey
from maro.rl.dist_topologies.single_learner_multi_actor_sync_mode import ActorProxy, ActorWorker


__all__ = [
    "AbstractActor",
    "RolloutMode",
    "AbsActor",
    "SimpleActor",
    "AbstractLearner",
    "AbsLearner",
    "SimpleLearner",
    "Agent",
    "AgentParameters",
    "AgentManager",
    "AbsAgent",
    "AbsAgentManager",
    "AgentMode",
    "Algorithm",
    "AbsAlgorithm",
    "DQN",
    "DQNHyperParams",
    "MLPRepresentation",
    "MLPDecisionLayers",
    "LearningModel",
    "AbstractStore",
    "UnboundedStore",
    "FixedSizeStore",
    "AbsStore",
    "ColumnBasedStore",
    "OverwriteType",
    "AbstractStateShaper",
    "AbstractActionShaper",
    "AbstractRewardShaper",
    "KStepRewardShaper",
    "AbstractExplorer",
    "AbsShaper",
    "StateShaper",
    "ActionShaper",
    "ExperienceShaper",
    "KStepExperienceShaper",
    "AbsExplorer",
    "LinearExplorer",
    "TwoPhaseLinearExplorer",
    "ActorProxy",
    "ActorWorker",
    "ExperienceKey",
    "ExperienceInfoKey",
    "TransitionInfoKey"
    "ActorWorker"
]
@@ -0,0 +1,44 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from abc import ABC, abstractmethod
from typing import Union

from maro.rl.agent.abs_agent_manager import AbsAgentManager
from maro.simulator import Env


class AbsActor(ABC):
    def __init__(self, env: Env, inference_agents: Union[dict, AbsAgentManager]):
        """
        Actor contains env and agents, and it is responsible for collecting experience from the interaction
        with the environment.

        Args:
            env (Env): an Env instance.
            inference_agents (dict or AbsAgentManager): a dict of agents or an AgentManager instance that
                manages all agents.
        """
        self._env = env
        self._inference_agents = inference_agents

    @abstractmethod
    def roll_out(self, model_dict: dict = None, epsilon_dict: dict = None, done: bool = None,
                 return_details: bool = True):
        """
        Performs a single episode of roll-out to collect experiences and performance data from the environment.

        Args:
            model_dict (dict): if not None, the agents will load the models from model_dict and use these models
                to perform roll-out.
            epsilon_dict (dict): exploration rate by agent.
            done (bool): if True, the current call is the last call, i.e., no more roll-outs will be performed.
                This flag is used to signal remote actor workers to exit.
            return_details (bool): if True, return episode details (e.g., experiences) as well as performance
                metrics provided by the env.
        """
        return NotImplementedError

    @property
    def inference_agents(self):
        return self._inference_agents
@@ -1,47 +1,50 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from typing import Union

from maro.simulator import Env
from .abstract_actor import AbstractActor, RolloutMode
from maro.rl.agent.agent_manager import AgentManager
from .abs_actor import AbsActor
from maro.rl.agent.abs_agent_manager import AbsAgentManager
from maro.simulator import Env


class SimpleActor(AbstractActor):
class SimpleActor(AbsActor):
    """
    A simple actor class that implements typical roll-out logic
    A simple actor class that implements simple roll-out logic
    """
    def __init__(self, env: Union[dict, Env], inference_agents: AgentManager):
        assert isinstance(inference_agents, AgentManager), \
            "SimpleActor only accepts type AgentManager for parameter inference_agents"
        super().__int__(env, inference_agents)
    def __init__(self, env: Env, inference_agents: AbsAgentManager):
        super().__init__(env, inference_agents)

    def roll_out(self, mode, models=None, epsilon_dict=None, seed: int = None):
        if mode == RolloutMode.EXIT:
    def roll_out(self, model_dict: dict = None, epsilon_dict: dict = None, done: bool = False,
                 return_details: bool = True):
        """
        The main interface provided by the Actor class, in which the agents perform a single episode of roll-out
        to collect experiences and performance data from the environment.

        Args:
            model_dict (dict): if not None, the agents will load the models from model_dict and use these models
                to perform roll-out.
            epsilon_dict (dict): exploration rate by agent.
            done (bool): if True, the current call is the last call, i.e., no more roll-outs will be performed.
                This flag is used to signal remote actor workers to exit.
            return_details (bool): if True, return experiences as well as performance metrics provided by the env.
        """
        if done:
            return None, None

        env = self._env if isinstance(self._env, Env) else self._env[mode]
        if seed is not None:
            env.set_seed(seed)

        self._env.reset()
        # assign epsilons
        if epsilon_dict is not None:
            self._inference_agents.explorer.epsilon = epsilon_dict

        # load models
        if models is not None:
            self._inference_agents.load_models(models)
        if model_dict is not None:
            self._inference_agents.load_models(model_dict)

        metrics, decision_event, is_done = env.step(None)
        metrics, decision_event, is_done = self._env.step(None)
        while not is_done:
            action = self._inference_agents.choose_action(decision_event, env.snapshot_list)
            metrics, decision_event, is_done = env.step(action)
            action = self._inference_agents.choose_action(decision_event, self._env.snapshot_list)
            metrics, decision_event, is_done = self._env.step(action)
            self._inference_agents.on_env_feedback(metrics)

        exp_by_agent = self._inference_agents.post_process(env.snapshot_list) if mode == RolloutMode.TRAIN else None
        performance = env.metrics
        env.reset()
        details = self._inference_agents.post_process(self._env.snapshot_list) if return_details else None

        return {'local': performance}, exp_by_agent
        return self._env.metrics, details
@@ -0,0 +1,83 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from abc import ABC, abstractmethod
import os
import pickle

import torch

from maro.rl.algorithms.torch.abs_algorithm import AbsAlgorithm


class AbsAgent(ABC):
    def __init__(self,
                 name: str,
                 algorithm: AbsAlgorithm,
                 experience_pool
                 ):
        """
        RL agent class. It is a sandbox for the RL algorithm: scenario-specific details are kept out of it so that
        development can focus on the algorithm abstraction. Environment observations and decision events are
        converted to a uniform format before being passed in, and the output is converted to an
        environment-executable format before being returned to the environment. The agent's key responsibility is
        optimizing its policy based on interaction with the environment.

        Args:
            name (str): agent's name.
            algorithm: a concrete algorithm instance that inherits from AbsAlgorithm. This is the centerpiece
                of the Agent class and is responsible for the most important tasks of an agent: choosing
                actions and optimizing models.
            experience_pool: a data store that stores experiences generated by the experience shaper.
        """
        self._name = name
        self._algorithm = algorithm
        self._experience_pool = experience_pool

    @property
    def algorithm(self):
        return self._algorithm

    @property
    def experience_pool(self):
        return self._experience_pool

    def choose_action(self, model_state, epsilon: float = .0):
        """
        Chooses an action using the underlying algorithm based on a preprocessed env state.

        Args:
            model_state: state vector as accepted by the underlying algorithm.
            epsilon (float): exploration rate.

        Returns:
            Action given by the underlying policy model.
        """
        return self._algorithm.choose_action(model_state, epsilon)

    @abstractmethod
    def train(self):
        """
        Runs a specified number of training steps, with each step consisting of sampling a batch from the experience
        pool and running the underlying algorithm's train() method.
        """
        return NotImplementedError

    def store_experiences(self, experiences):
        self._experience_pool.put(experiences)

    def load_model_dict(self, model_dict: dict):
        self._algorithm.model_dict = model_dict

    def load_model_dict_from_file(self, file_path):
        model_dict = torch.load(file_path)
        for model_key, state_dict in model_dict.items():
            self._algorithm.model_dict[model_key].load_state_dict(state_dict)

    def dump_model_dict(self, dir_path: str):
        torch.save({model_key: model.state_dict() for model_key, model in self._algorithm.model_dict.items()},
                   os.path.join(dir_path, self._name))

    def dump_experience_store(self, dir_path: str):
        # Pickle needs a binary write handle; the original `open(...)` defaulted to read-only text mode.
        with open(os.path.join(dir_path, self._name), "wb") as fp:
            pickle.dump(self._experience_pool, fp)
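dump_model_dict and load_model_dict_from_file round-trip the per-model state dicts through torch.save and torch.load, one file per agent. A minimal sketch of the convention with a hypothetical one-layer model:

```python
# Minimal round-trip sketch (hypothetical tiny model) of the dump/load
# convention used by AbsAgent: one file per agent, keyed by model name.
import torch
import torch.nn as nn

model_dict = {"eval": nn.Linear(4, 2)}
torch.save({key: m.state_dict() for key, m in model_dict.items()}, "agent_0")

loaded = torch.load("agent_0")
for key, state_dict in loaded.items():
    model_dict[key].load_state_dict(state_dict)
```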
@@ -0,0 +1,162 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from abc import ABC, abstractmethod
from enum import Enum
import os

from maro.rl.shaping.state_shaper import StateShaper
from maro.rl.shaping.action_shaper import ActionShaper
from maro.rl.shaping.experience_shaper import ExperienceShaper
from maro.rl.explorer.abs_explorer import AbsExplorer
from maro.utils.exception.rl_toolkit_exception import UnsupportedAgentModeError, MissingShaperError, WrongAgentModeError


class AgentMode(Enum):
    TRAIN = "train"
    INFERENCE = "inference"
    TRAIN_INFERENCE = "train_inference"


class AbsAgentManager(ABC):
    def __init__(self,
                 name: str,
                 mode: AgentMode,
                 agent_id_list: [str],
                 state_shaper: StateShaper = None,
                 action_shaper: ActionShaper = None,
                 experience_shaper: ExperienceShaper = None,
                 explorer: AbsExplorer = None):
        """
        Manages all agents.

        Args:
            name (str): name of the agent manager.
            mode (AgentMode): an AgentMode enum member that specifies the role of the agent manager. Some
                attributes may be None under certain modes.
            agent_id_list (list): list of agent identifiers.
            experience_shaper: responsible for processing data in the replay buffer at the end of an episode,
                e.g., adjusting rewards and computing target states.
            state_shaper: responsible for extracting information from a decision event and the snapshot list for
                the event to form a state vector as accepted by the underlying algorithm.
            action_shaper: responsible for converting the output of an agent's action to an EnvAction object that
                can be executed by the environment. Cannot be None under INFERENCE and TRAIN_INFERENCE modes.
            explorer: responsible for storing and updating exploration rates.
        """
        self._name = name
        if mode not in AgentMode:
            raise UnsupportedAgentModeError(msg='mode must be "train", "inference" or "train_inference"')
        self._mode = mode

        if mode in {AgentMode.INFERENCE, AgentMode.TRAIN_INFERENCE}:
            if state_shaper is None:
                raise MissingShaperError(msg=f"state shaper cannot be None under mode {self._mode}")
            if action_shaper is None:
                raise MissingShaperError(msg=f"action_shaper cannot be None under mode {self._mode}")
            if experience_shaper is None:
                raise MissingShaperError(msg=f"experience_shaper cannot be None under mode {self._mode}")

        self._state_shaper = state_shaper
        self._action_shaper = action_shaper
        self._experience_shaper = experience_shaper
        self._explorer = explorer

        self._agent_id_list = agent_id_list
        self._trajectory = []

        self._agent_dict = {}
        self._assemble(self._agent_dict)

    def __getitem__(self, agent_id):
        return self._agent_dict[agent_id]

    def _assemble(self, agent_dict):
        """
        Abstract method to populate the _agent_dict attribute.
        """
        return NotImplemented

    def choose_action(self, decision_event, snapshot_list):
        self._assert_inference_mode()
        agent_id, model_state = self._state_shaper(decision_event, snapshot_list)
        model_action = self._agent_dict[agent_id].choose_action(
            model_state, self._explorer.epsilon[agent_id] if self._explorer else None)
        self._trajectory.append({"state": model_state,
                                 "action": model_action,
                                 "reward": None,
                                 "agent_id": agent_id,
                                 "event": decision_event})
        return self._action_shaper(model_action, decision_event, snapshot_list)

    def on_env_feedback(self, metrics):
        self._trajectory[-1]["metrics"] = metrics

    def post_process(self, snapshot_list):
        """
        Called at the end of an episode, this function processes data from the latest episode, including reward
        adjustments and next-state computations, and returns experiences for individual agents.

        Args:
            snapshot_list: the snapshot list from the env at the end of an episode.
        """
        experiences = self._experience_shaper(self._trajectory, snapshot_list)
        self._trajectory.clear()
        self._state_shaper.reset()
        self._action_shaper.reset()
        self._experience_shaper.reset()
        return experiences

    @abstractmethod
    def store_experiences(self, experiences):
        return NotImplementedError

    def update_epsilon(self, performance):
        """
        Updates the exploration rates for each agent.

        Args:
            performance: performance from the latest episode.
        """
        if self._explorer:
            self._explorer.update(performance)

    def train(self):
        self._assert_train_mode()
        for agent in self._agent_dict.values():
            agent.train()

    def load_models(self, agent_model_dict):
        for agent_id, model_dict in agent_model_dict.items():
            self._agent_dict[agent_id].load_model_dict(model_dict)

    def load_models_from_files(self, file_path_dict):
        for agent_id, file_path in file_path_dict.items():
            self._agent_dict[agent_id].load_model_dict_from_file(file_path)

    def dump_models(self, dir_path: str):
        os.makedirs(dir_path, exist_ok=True)
        for agent in self._agent_dict.values():
            agent.dump_model_dict(dir_path)

    def get_models(self):
        return {agent_id: agent.algorithm.model_dict for agent_id, agent in self._agent_dict.items()}

    @property
    def name(self):
        return self._name

    @property
    def agents(self):
        return self._agent_dict

    @property
    def explorer(self):
        return self._explorer

    def _assert_train_mode(self):
        if self._mode != AgentMode.TRAIN and self._mode != AgentMode.TRAIN_INFERENCE:
            raise WrongAgentModeError(msg=f"this method is unavailable under mode {self._mode}")

    def _assert_inference_mode(self):
        if self._mode != AgentMode.INFERENCE and self._mode != AgentMode.TRAIN_INFERENCE:
            raise WrongAgentModeError(msg=f"this method is unavailable under mode {self._mode}")
@@ -0,0 +1,63 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from abc import ABC, abstractmethod
import itertools
from typing import Union


class AbsAlgorithm(ABC):
    def __init__(self, model_dict: dict, optimizer_opt: Union[dict, tuple], loss_func_dict: dict, hyper_params: object):
        """
        Abstraction of an RL algorithm, which provides a uniform policy interface, such as choose_action and
        train_on_batch. We also provide some predefined RL algorithms based on it, such as DQN and A2C. Users can
        inherit from it to customize their own algorithms.

        Args:
            model_dict (dict): underlying models for the algorithm (e.g., for A2C,
                model_dict = {"actor": ..., "critic": ...})
            optimizer_opt (tuple or dict): tuple or dict of tuples of (optimizer_class, optimizer_params) associated
                with the models in model_dict. If it is a tuple, the optimizer to be instantiated applies to all
                trainable parameters from model_dict. If it is a dict, the optimizer will be applied to the related
                model with the same key.
            loss_func_dict (dict): loss function types associated with the models in model_dict.
            hyper_params (object): algorithm-specific hyper-parameter set.
        """
        self._loss_func_dict = loss_func_dict
        self._hyper_params = hyper_params
        self._model_dict = model_dict
        self._register_optimizers(optimizer_opt)

    def _register_optimizers(self, optimizer_opt):
        if isinstance(optimizer_opt, tuple):
            # If a single optimizer_opt tuple is provided, a single optimizer will be created to jointly
            # optimize all model parameters involved in the algorithm.
            optim_cls, optim_params = optimizer_opt
            model_params = [model.parameters() for model in self._model_dict.values()]
            self._optimizer = optim_cls(itertools.chain(*model_params), **optim_params)
        else:
            self._optimizer = {}
            for model_key, model in self._model_dict.items():
                # No gradient required
                if model_key not in optimizer_opt or optimizer_opt[model_key] is None:
                    self._model_dict[model_key].eval()
                    self._optimizer[model_key] = None
                else:
                    optim_cls, optim_params = optimizer_opt[model_key]
                    self._optimizer[model_key] = optim_cls(model.parameters(), **optim_params)

    @property
    def model_dict(self):
        return self._model_dict

    @model_dict.setter
    def model_dict(self, model_dict):
        self._model_dict = model_dict

    @abstractmethod
    def train(self, *args, **kwargs):
        return NotImplementedError

    @abstractmethod
    def choose_action(self, state, epsilon: float = None):
        return NotImplementedError
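optimizer_opt is therefore either one (class, kwargs) tuple shared by every model or a per-model dict; a hedged sketch of both forms, with hypothetical model keys:

```python
# Sketch of the two accepted optimizer_opt forms; "actor"/"critic"/"target"
# are hypothetical model keys, not names mandated by the framework.
from torch.optim import Adam, RMSprop

# One optimizer jointly covering every model in model_dict:
optimizer_opt = (RMSprop, {"lr": 0.05})

# Or one optimizer per model key; a None entry freezes that model (eval mode):
optimizer_opt = {
    "actor": (Adam, {"lr": 1e-3}),
    "critic": (Adam, {"lr": 1e-2}),
    "target": None,  # no gradients for this model
}
```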
@@ -1,32 +1,50 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

import random
from copy import deepcopy
from typing import Union

import numpy as np
import torch

from maro.rl.algorithms.torch.algorithm import Algorithm
from maro.rl.common import ExperienceKey, ExperienceInfoKey
from maro.rl.algorithms.torch.abs_algorithm import AbsAlgorithm
from maro.utils import clone


class DQNHyperParams:
    def __init__(self, num_actions: int, replace_target_frequency: int, tau: float):
    __slots__ = ["num_actions", "reward_decay", "num_training_rounds_per_target_replacement", "tau"]

    def __init__(self, num_actions: int, reward_decay: float, num_training_rounds_per_target_replacement: int,
                 tau: float = 1.0):
        """
        DQN hyper-parameters.

        Args:
            num_actions (int): number of possible actions.
            reward_decay (float): reward decay as defined in standard RL terminology.
            num_training_rounds_per_target_replacement (int): number of training rounds between target model
                replacements.
            tau (float): soft update coefficient, e.g., target_model = tau * eval_model + (1-tau) * target_model.
        """
        self.num_actions = num_actions
        self.replace_target_frequency = replace_target_frequency
        self.reward_decay = reward_decay
        self.num_training_rounds_per_target_replacement = num_training_rounds_per_target_replacement
        self.tau = tau


class DQN(Algorithm):
    def __init__(self, model_dict, optimizer_opt, loss_func_dict, hyper_params: DQNHyperParams):
class DQN(AbsAlgorithm):
    def __init__(self, model_dict: dict, optimizer_opt: Union[dict, tuple], loss_func_dict: dict,
                 hyper_params: DQNHyperParams):
        """
        DQN algorithm. The model_dict must contain the key "eval". Optionally a model corresponding to
        the key "target" can be provided. If the key "target" is absent or model_dict["target"] is None,
        the target model will be a deep copy of the provided eval model.
        """
        if model_dict.get("target", None) is None:
            model_dict['target'] = deepcopy(model_dict['eval'])
            model_dict["target"] = clone(model_dict["eval"])
        super().__init__(model_dict, optimizer_opt, loss_func_dict, hyper_params)
        self._train_cnt = 0
        self._device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    def choose_action(self, state, epsilon: float = None):
        if epsilon is None or random.random() > epsilon:
    def choose_action(self, state: np.ndarray, epsilon: float = None):
        if epsilon is None or np.random.rand() > epsilon:
            state = torch.from_numpy(state).unsqueeze(0)
            self._model_dict["eval"].eval()
            with torch.no_grad():

@@ -34,27 +52,32 @@ class DQN(Algorithm):
                best_action_idx = q_values.argmax(dim=1).item()
            return best_action_idx

        return random.choice(range(self._hyper_params.num_actions))
        return np.random.choice(self._hyper_params.num_actions)

    def train_on_batch(self, batch):
        state = torch.from_numpy(batch[ExperienceKey.STATE])
        action = torch.from_numpy(batch[ExperienceKey.ACTION])
        reward = torch.from_numpy(batch[ExperienceKey.REWARD]).squeeze(1)
        next_state = torch.from_numpy(batch[ExperienceKey.NEXT_STATE])
        discount = torch.from_numpy(batch[ExperienceInfoKey.DISCOUNT])
        q_value = self._model_dict["eval"](state).gather(1, action).squeeze(1)
        target = (reward + discount * self._model_dict["target"](next_state).max(dim=1)[0]).detach()
        loss = self._loss_func_dict["eval"](q_value, target)
    def _prepare_batch(self, raw_batch):
        return {key: torch.from_numpy(np.asarray(lst)).to(self._device) for key, lst in raw_batch.items()}

    def train(self, state: np.ndarray, action: np.ndarray, reward: np.ndarray, next_state: np.ndarray):
        state = torch.from_numpy(state).to(self._device)
        action = torch.from_numpy(action).to(self._device)
        reward = torch.from_numpy(reward).to(self._device)
        next_state = torch.from_numpy(next_state).to(self._device)
        if len(action.shape) == 1:
            action = action.unsqueeze(1)
        current_q_values = self._model_dict["eval"](state).gather(1, action).squeeze(1)
        next_q_values = self._model_dict["target"](next_state).max(dim=1)[0]
        target_q_values = (reward + self._hyper_params.reward_decay * next_q_values).detach()
        loss = self._loss_func_dict["eval"](current_q_values, target_q_values)
        self._model_dict["eval"].train()
        self._optimizer.zero_grad()
        loss.backward()
        self._optimizer.step()
        self._train_cnt += 1
        if self._train_cnt % self._hyper_params.replace_target_frequency == 0:
        if self._train_cnt % self._hyper_params.num_training_rounds_per_target_replacement == 0:
            self._update_target_model()

        return np.abs((q_value - target).detach().numpy())
        return np.abs((current_q_values - target_q_values).detach().numpy())

    def _update_target_model(self):
        for evl, target in zip(self._model_dict["eval"].parameters(), self._model_dict["target"].parameters()):
            target.data = self._hyper_params.tau * evl.data + (1 - self._hyper_params.tau) * target.data
        for eval_params, target_params in zip(self._model_dict["eval"].parameters(), self._model_dict["target"].parameters()):
            target_params.data = self._hyper_params.tau * eval_params.data + (1 - self._hyper_params.tau) * target_params.data
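With tau = 0.1, as in the example config, each target replacement moves the target network only 10% of the way toward the eval network. A one-parameter numeric sketch of the update rule target = tau * eval + (1 - tau) * target:

```python
# Numeric sketch of the soft target update with the example config's tau = 0.1.
tau = 0.1
eval_param, target_param = 1.0, 0.0  # single made-up parameter value

for step in range(3):
    target_param = tau * eval_param + (1 - tau) * target_param
    print(round(target_param, 3))  # 0.1, 0.19, 0.271 -> slowly tracks eval
```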
@@ -2,9 +2,10 @@ from enum import Enum


class PayloadKey(Enum):
    RolloutMode = "rollout_mode"
    MODEL = "model"
    EPSILON = "epsilon"
    PERFORMANCE = "performance"
    EXPERIENCE = "experience"
    SEED = "seed"
    DONE = "done"
    RETURN_DETAILS = "return_details"
@ -0,0 +1,81 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from enum import Enum
from collections import defaultdict
import sys

from maro.communication import Proxy, SessionType
from maro.communication.registry_table import RegisterTable
from maro.rl.dist_topologies.common import PayloadKey
from maro.rl.actor.abs_actor import AbsActor


class MessageTag(Enum):
    ROLLOUT = "rollout"
    UPDATE = "update"


class ActorProxy(object):
    def __init__(self, proxy_params):
        self._proxy = Proxy(component_type="actor", **proxy_params)

    def roll_out(self, model_dict: dict = None, epsilon_dict: dict = None, done: bool = False,
                 return_details: bool = True):
        if done:
            self._proxy.ibroadcast(tag=MessageTag.ROLLOUT,
                                   session_type=SessionType.NOTIFICATION,
                                   payload={PayloadKey.DONE: True})
            return None, None
        else:
            performance, exp_by_agent = {}, {}
            payloads = [(peer, {PayloadKey.MODEL: model_dict,
                                PayloadKey.EPSILON: epsilon_dict,
                                PayloadKey.RETURN_DETAILS: return_details})
                        for peer in self._proxy.peers["actor_worker"]]
            # TODO: double-check when ack is enabled
            replies = self._proxy.scatter(tag=MessageTag.ROLLOUT, session_type=SessionType.TASK,
                                          destination_payload_list=payloads)
            for msg in replies:
                performance[msg.source] = msg.payload[PayloadKey.PERFORMANCE]
                if msg.payload[PayloadKey.EXPERIENCE] is not None:
                    for agent_id, exp_set in msg.payload[PayloadKey.EXPERIENCE].items():
                        if agent_id not in exp_by_agent:
                            exp_by_agent[agent_id] = defaultdict(list)
                        for k, v in exp_set.items():
                            exp_by_agent[agent_id][k].extend(v)

            return performance, exp_by_agent


class ActorWorker(object):
    def __init__(self, local_actor: AbsActor, proxy_params):
        self._local_actor = local_actor
        self._proxy = Proxy(component_type="actor_worker", **proxy_params)
        self._registry_table = RegisterTable(self._proxy.get_peers)
        self._registry_table.register_event_handler("actor:rollout:1", self.on_rollout_request)

    def on_rollout_request(self, message):
        data = message.payload
        if data.get(PayloadKey.DONE, False):
            sys.exit(0)

        performance, experiences = self._local_actor.roll_out(model_dict=data[PayloadKey.MODEL],
                                                              epsilon_dict=data[PayloadKey.EPSILON],
                                                              return_details=data[PayloadKey.RETURN_DETAILS])

        self._proxy.reply(received_message=message,
                          tag=MessageTag.UPDATE,
                          payload={PayloadKey.PERFORMANCE: performance,
                                   PayloadKey.EXPERIENCE: experiences})

    def launch(self):
        """
        This launches an ActorWorker instance.
        """
        for msg in self._proxy.receive():
            self._registry_table.push(msg)
            triggered_events = self._registry_table.get()
            for handler_fn, cached_messages in triggered_events:
                handler_fn(cached_messages)
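A hypothetical sketch of wiring the two ends above together; the `proxy_params` shape and the `make_local_actor()` factory are illustrative assumptions, not taken from this commit — consult `maro.communication.Proxy` for the actual expected parameters:

```python
# Assumed shape; the real Proxy parameters are defined in maro.communication.
proxy_params = {"group_name": "dqn_rollout"}

# On each worker process: serve roll-out requests until a DONE notification arrives.
# make_local_actor() is a hypothetical factory returning an AbsActor subclass.
worker = ActorWorker(local_actor=make_local_actor(), proxy_params=proxy_params)
worker.launch()

# On the learner side: fan one request out to all workers and merge their replies.
models, epsilons = None, None  # placeholders; normally produced by an AgentManager
actor_proxy = ActorProxy(proxy_params=proxy_params)
performance, exp_by_agent = actor_proxy.roll_out(model_dict=models, epsilon_dict=epsilons)
```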
@ -0,0 +1,51 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from abc import ABC, abstractmethod


class AbsExplorer(ABC):
    def __init__(self, agent_id_list: list, total_episodes: int, epsilon_range_dict: dict, with_cache: bool = True):
        """
        Args:
            agent_id_list (list): list of agent IDs.
            total_episodes (int): total number of episodes in the training phase.
            epsilon_range_dict (dict): a dictionary containing tuples of lower and upper bounds for the generated
                exploration rate for each agent. If the dictionary contains "_all_" as a key, the corresponding
                value will be shared amongst all agents.
            with_cache (bool): if True, incoming performances will be cached.
        """
        self._total_episodes = total_episodes
        self._epsilon_range_dict = epsilon_range_dict
        self._performance_cache = [] if with_cache else None
        if "_all_" in self._epsilon_range_dict:
            self._current_epsilon = {agent_id: self._epsilon_range_dict["_all_"][1] for agent_id in agent_id_list}
        else:
            self._current_epsilon = {agent_id: self._epsilon_range_dict.get(agent_id, (.0, .0))[1]
                                     for agent_id in agent_id_list}

    # TODO: performance: summary -> total perf (current version), details -> per-agent perf
    @abstractmethod
    def update(self, performance=None):
        """
        Updates the current epsilon.

        Args:
            performance: performance from the latest episode.
        """
        raise NotImplementedError

    @property
    def epsilon_range_dict(self):
        return self._epsilon_range_dict

    @property
    def epsilon(self):
        return self._current_epsilon

    @epsilon.setter
    def epsilon(self, epsilon_dict: dict):
        self._current_epsilon = epsilon_dict

    def epsilon_range_by_id(self, agent_id):
        return self._epsilon_range_dict[agent_id]
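A small illustration of the `epsilon_range_dict` convention described in the docstring above (agent IDs made up):

```python
# "_all_" shares one (lower, upper) pair across every agent; each agent starts
# exploring at the upper bound. With a per-agent dict, agents missing from it
# fall back to (0.0, 0.0), i.e. no exploration.
shared_ranges = {"_all_": (0.0, 1.0)}       # every agent starts at epsilon = 1.0
per_agent_ranges = {"agent_0": (0.0, 0.8)}  # agent_0 starts at 0.8; others at 0.0
```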
@ -1,10 +1,10 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

-from .abstract_explorer import AbstractExplorer
+from .abs_explorer import AbsExplorer


-class LinearExplorer(AbstractExplorer):
+class LinearExplorer(AbsExplorer):
    def __init__(self, agent_id_list, total_episodes, epsilon_range_dict, with_cache=True):
        super().__init__(agent_id_list, total_episodes, epsilon_range_dict, with_cache=with_cache)
        self._step_dict = {}

@ -17,7 +17,7 @@ class LinearExplorer(AbstractExplorer):
        self._current_epsilon[agent_id] = max(.0, self._current_epsilon[agent_id] - self._step_dict[agent_id])


-class TwoPhaseLinearExplorer(AbstractExplorer):
+class TwoPhaseLinearExplorer(AbsExplorer):
    """
    An exploration scheme that consists of two linear schedules separated by a split point.
    """
@ -0,0 +1,22 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from abc import ABC, abstractmethod


class AbsLearner(ABC):
    def __init__(self):
        pass

    @abstractmethod
    def train(self, total_episodes):
        """
        Main loop for collecting experiences and performance from the actor and using them to optimize models.

        Args:
            total_episodes (int): number of episodes for the main training loop.
        """
        pass

    @abstractmethod
    def test(self):
        pass
@ -1,55 +1,64 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

-from maro.rl.actor.abstract_actor import RolloutMode
-from .abstract_learner import AbstractLearner
-from maro.rl.agent.agent_manager import AgentManager
+from .abs_learner import AbsLearner
+from maro.rl.agent.abs_agent_manager import AbsAgentManager
from maro.rl.actor.simple_actor import SimpleActor
from maro.utils import DummyLogger


-class SimpleLearner(AbstractLearner):
+class SimpleLearner(AbsLearner):
    """
    A learner class that executes simple roll-out-and-train cycles.
    It is used to control the policy learning process...
    """

-   def __init__(self, trainable_agents: AgentManager, actor, logger=DummyLogger(), seed: int = None):
+   def __init__(self, trainable_agents: AbsAgentManager, actor, logger=DummyLogger()):
        """
-       seed (int): initial random seed value for the underlying simulator. If None, no manual seed setting \n
-           is performed.
+       seed (int): initial random seed value for the underlying simulator. If None, no manual seed setting is
+           performed.
        Args:
-           trainable_agents (dict or AgentManager): an AgentManager instance that manages all agents
-           actor (Actor of ActorProxy): an Actor or VectorActorProxy instance.
-           logger: used for logging important events
-           seed (int): initial random seed value for the underlying simulator. If None, no seed fixing is done \n
-               for the underlying simulator.
+           trainable_agents (AbsAgentManager): an AgentManager instance that manages all agents.
+           actor (Actor or ActorProxy): an Actor or VectorActorProxy instance.
+           logger: used for logging important events.
        """
-       assert isinstance(trainable_agents, AgentManager), \
-           "SimpleLearner only accepts AgentManager for parameter trainable_agents"
-       super().__init__(trainable_agents=trainable_agents, actor=actor, logger=logger)
-       self._seed = seed
+       super().__init__()
+       self._trainable_agents = trainable_agents
+       self._actor = actor
+       self._logger = logger

    def train(self, total_episodes):
        """
        Main loop for collecting experiences and performance from the actor and using them to optimize models.
        Args:
            total_episodes (int): number of episodes for the main training loop.
        """
        for current_ep in range(1, total_episodes+1):
-           models = None if self._is_shared_agent_instance() else self._trainable_agents.get_models()
-           performance, exp_by_agent = self._actor.roll_out(mode=RolloutMode.TRAIN,
-                                                            models=models,
-                                                            epsilon_dict=self._trainable_agents.explorer.epsilon,
-                                                            seed=self._seed)
-           if self._seed is not None:
-               self._seed += len(performance)
-           for actor_id, perf in performance.items():
-               self._logger.info(f"ep {current_ep} - performance: {perf}, source: {actor_id}, "
-                                 f"epsilons: {self._trainable_agents.explorer.epsilon}")
+           model_dict = None if self._is_shared_agent_instance() else self._trainable_agents.get_models()
+           epsilon_dict = self._trainable_agents.explorer.epsilon if self._trainable_agents.explorer else None
+           performance, exp_by_agent = self._actor.roll_out(model_dict=model_dict, epsilon_dict=epsilon_dict)
+           if isinstance(performance, dict):
+               for actor_id, perf in performance.items():
+                   self._logger.info(f"ep {current_ep} - performance: {perf}, source: {actor_id}, epsilons: {epsilon_dict}")
+           else:
+               self._logger.info(f"ep {current_ep} - performance: {performance}, epsilons: {epsilon_dict}")

            self._trainable_agents.store_experiences(exp_by_agent)
            self._trainable_agents.train()
            self._trainable_agents.update_epsilon(performance)

    def test(self):
-       performance, _ = self._actor.roll_out(mode=RolloutMode.TEST, models=self._trainable_agents.get_models())
+       """
+       This tells the actor to perform one episode of roll-out for model testing purposes.
+       """
+       performance, _ = self._actor.roll_out(model_dict=self._trainable_agents.get_models(), return_details=False)
        for actor_id, perf in performance.items():
            self._logger.info(f"test performance from {actor_id}: {perf}")
-       self._actor.roll_out(mode=RolloutMode.EXIT)
+       self._actor.roll_out(done=True)

    def dump_models(self, dir_path: str):
        self._trainable_agents.dump_models(dir_path)

    def _is_shared_agent_instance(self):
        """
        If true, the set of agents performing inference in actor is the same as self._trainable_agents.
        """
        return isinstance(self._actor, SimpleActor) and id(self._actor.inference_agents) == id(self._trainable_agents)
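A hypothetical end-to-end sketch of driving the learner above; the factory functions and the `SimpleActor` signature are placeholders, not taken from this commit:

```python
# Hypothetical setup: make_env() and make_agent_manager() stand in for
# scenario-specific construction of an Env and an AbsAgentManager subclass.
agents = make_agent_manager()
actor = SimpleActor(env=make_env(), inference_agents=agents)  # assumed signature

learner = SimpleLearner(trainable_agents=agents, actor=actor)
learner.train(total_episodes=100)
learner.test()
learner.dump_models("checkpoints/")
```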
@ -33,11 +33,10 @@ class MLPDecisionLayers(nn.Module):
            self._head = nn.Linear(self._input_dim, self._output_dim)
        else:
            self._head = nn.Linear(hidden_dims[-1], self._output_dim)
-       self._device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-       self._net = nn.Sequential(*self._layers, self._head).to(self._device)
+       self._net = nn.Sequential(*self._layers, self._head)

    def forward(self, x):
-       return self._net(x.to(self._device)).double()
+       return self._net(x).double()

    @property
    def input_dim(self):

@ -34,11 +34,10 @@ class MLPRepresentation(nn.Module):
            self._head = nn.Linear(self._input_dim, self._output_dim)
        else:
            self._head = nn.Linear(hidden_dims[-1], self._output_dim)
-       self._device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-       self._net = nn.Sequential(*self._layers, self._head).to(self._device)
+       self._net = nn.Sequential(*self._layers, self._head)

    def forward(self, x):
-       return self._net(x.to(self._device)).double()
+       return self._net(x).double()

    @property
    def input_dim(self):
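These two hunks remove hard-coded device selection from the network modules, which implies device placement now happens at the call site. A minimal sketch under that assumption (the constructor arguments are illustrative, not the module's actual signature):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical constructor arguments; the real signature lives elsewhere in this file.
model = MLPRepresentation(input_dim=32, hidden_dims=[64, 32], output_dim=10).to(device)

batch = torch.randn(8, 32)     # dummy input batch
out = model(batch.to(device))  # inputs are now moved by the caller, not inside forward()
```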
@ -0,0 +1,13 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from abc import ABC, abstractmethod


class AbsShaper(ABC):
    def __init__(self, *args, **kwargs):
        pass

    @abstractmethod
    def __call__(self, *args, **kwargs):
        pass
@ -0,0 +1,18 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from abc import ABC, abstractmethod
from .abs_shaper import AbsShaper


class ActionShaper(AbsShaper):
    """
    An action shaper is used to convert an agent's model output to an Action object which can be executed by the
    environment.
    """
    @abstractmethod
    def __call__(self, model_action, decision_event, snapshot_list):
        pass

    def reset(self):
        pass
@ -0,0 +1,33 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from abc import ABC, abstractmethod
from typing import Callable, Iterable, Union
from .abs_shaper import AbsShaper


class ExperienceShaper(AbsShaper):
    """
    An experience shaper is used to record transitions during a roll-out episode and perform necessary
    post-processing at the end of the episode. The post-processing logic is encapsulated in the abstract
    __call__() method and needs to be implemented for each scenario. In particular, it is necessary to specify
    how to determine the reward for an action given the business metrics associated with the corresponding
    transition.
    """
    def __init__(self, reward_func: Union[Callable, None], *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._reward_func = reward_func

    @abstractmethod
    def __call__(self, trajectory, snapshot_list) -> Iterable:
        """
        Converts transitions along a trajectory to experiences.

        Args:
            trajectory: a list of transitions recorded during the episode.
            snapshot_list: snapshot list stored in the env at the end of an episode.
        Returns:
            Experiences that can be used by the algorithm.
        """
        pass

    def reset(self):
        pass
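As a concrete illustration of the contract above, a minimal single-step shaper sketch (not part of this commit); it assumes each transition in `trajectory` is a dict carrying "state", "action" and "metrics" keys, as in the k-step shaper that follows:

```python
# One experience per transition, with the reward computed directly from that
# transition's business metrics via the user-supplied reward function.
class SingleStepExperienceShaper(ExperienceShaper):
    def __call__(self, trajectory, snapshot_list):
        return {
            "state": [t["state"] for t in trajectory],
            "action": [t["action"] for t in trajectory],
            "reward": [self._reward_func(t["metrics"]) for t in trajectory],
        }
```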
@ -0,0 +1,61 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from collections import defaultdict, deque
from enum import Enum
from typing import Callable

from .experience_shaper import ExperienceShaper


class KStepExperienceKeys(Enum):
    STATE = "state"
    ACTION = "action"
    REWARD = "reward"
    RETURN = "return"
    NEXT_STATE = "next_state"
    NEXT_ACTION = "next_action"
    DISCOUNT = "discount"


class KStepExperienceShaper(ExperienceShaper):
    def __init__(self, reward_func: Callable, reward_decay: float, steps: int, is_per_agent: bool = True):
        """
        An experience shaper that generates K-step returns as well as the full return for each transition
        along a trajectory.

        Args:
            reward_func: a function used to compute immediate rewards from metrics given by the env.
            reward_decay: decay factor used to evaluate multi-step returns.
            steps: number of time steps used in computing returns.
            is_per_agent: if True, the generated experiences will be bucketed by agent ID.
        """
        super().__init__(reward_func)
        self._reward_decay = reward_decay
        self._steps = steps
        self._is_per_agent = is_per_agent

    def __call__(self, trajectory, snapshot_list):
        experiences = defaultdict(lambda: defaultdict(deque)) if self._is_per_agent else defaultdict(deque)
        reward_list = deque()
        full_return = partial_return = 0
        for i in range(len(trajectory) - 2, -1, -1):
            transition = trajectory[i]
            next_transition = trajectory[min(len(trajectory) - 1, i + self._steps)]
            reward_list.appendleft(self._reward_func(trajectory[i]["metrics"]))
            # compute the full return
            full_return = full_return * self._reward_decay + reward_list[0]
            # compute the partial return
            partial_return = partial_return * self._reward_decay + reward_list[0]
            if len(reward_list) > self._steps:
                # drop the reward that has fallen out of the k-step window
                partial_return -= reward_list.pop() * self._reward_decay ** self._steps
            agent_exp = experiences[transition["agent_id"]] if self._is_per_agent else experiences
            agent_exp[KStepExperienceKeys.STATE.value].appendleft(transition["state"])
            agent_exp[KStepExperienceKeys.ACTION.value].appendleft(transition["action"])
            agent_exp[KStepExperienceKeys.REWARD.value].appendleft(partial_return)
            agent_exp[KStepExperienceKeys.RETURN.value].appendleft(full_return)
            agent_exp[KStepExperienceKeys.NEXT_STATE.value].appendleft(next_transition["state"])
            agent_exp[KStepExperienceKeys.NEXT_ACTION.value].appendleft(next_transition["action"])
            agent_exp[KStepExperienceKeys.DISCOUNT.value].appendleft(self._reward_decay ** (min(self._steps, len(trajectory) - 1 - i)))

        return experiences
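A worked illustration of the backward bookkeeping above (not from the commit): with immediate rewards `[1, 2, 3]`, decay `g = 0.9` and `steps = 2`, the first transition's k-step partial return and full return come out as follows:

```python
g = 0.9                    # reward_decay
rewards = [1.0, 2.0, 3.0]  # immediate rewards for transitions 0, 1, 2

partial_return_0 = rewards[0] + g * rewards[1]                   # 2-step window: 2.8
full_return_0 = rewards[0] + g * (rewards[1] + g * rewards[2])   # full return: 5.23
```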
@ -0,0 +1,18 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from abc import abstractmethod
from .abs_shaper import AbsShaper


class StateShaper(AbsShaper):
    """
    A state shaper is used to convert a decision event and snapshot list to a state vector as input to value or
    policy models by extracting relevant temporal and spatial information.
    """
    @abstractmethod
    def __call__(self, decision_event, snapshot_list):
        pass

    def reset(self):
        pass
@ -0,0 +1,78 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from abc import ABC, abstractmethod
from typing import Sequence


class AbsStore(ABC):
    def __init__(self):
        pass

    @abstractmethod
    def get(self, indexes: Sequence):
        """
        Get contents.

        Args:
            indexes: a sequence of indexes where store contents are to be retrieved.
        Returns:
            Retrieved contents.
        """
        pass

    @abstractmethod
    def put(self, contents: Sequence, index_sampler: Sequence):
        """
        Put new contents.

        Args:
            contents: Item object list.
            index_sampler: optional custom sampler used to obtain indexes for overwriting.
        Returns:
            The newly appended item indexes.
        """
        pass

    @abstractmethod
    def update(self, indexes: Sequence, contents: Sequence):
        """
        Update selected contents.

        Args:
            indexes: Item index list.
            contents: Item list, which has the same length as indexes.
        Returns:
            The updated item index list.
        """
        pass

    def filter(self, filters):
        """
        Multi-filter method. Each filter in the chain receives the output of the previous one as its input.

        Args:
            filters (Iterable[Callable]): Filter list, each item is a lambda function whose input is a
                (index, object) tuple, e.g. [lambda x: x['a'] == 1 and x['b'] == 1].
        Returns:
            list: Filtered indexes or contents, e.g. [1, 2, 3] or ['a', 'b', 'c'].
        """
        pass

    @abstractmethod
    def sample(self, size, weights: Sequence, replace: bool = True):
        """
        Obtain a random sample from the experience pool.

        Args:
            size (int): sample sizes for each round of sampling in the chain. If this is a single integer, it is
                used as the sample size for all samplers in the chain.
            weights (Sequence): a sequence of sampling weights.
            replace (bool): if True, sampling is performed with replacement. Default is True.
        Returns:
            Tuple: Sampled indexes and contents, e.g. [1, 2, 3] and ['a', 'b', 'c'].
        """
        pass
@ -0,0 +1,194 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

import numpy as np

from .abs_store import AbsStore
from .utils import check_uniformity, get_update_indexes, normalize, OverwriteType
from maro.utils import clone


class ColumnBasedStore(AbsStore):
    def __init__(self, capacity: int = -1, overwrite_type: OverwriteType = None):
        """
        A ColumnBasedStore instance that uses a Python list as its internal storage data structure and supports
        unlimited and limited storage.

        Args:
            capacity: if -1, the store is of unlimited capacity. Default is -1.
            overwrite_type (OverwriteType): If storage capacity is bounded, this specifies how existing entries
                are overwritten.
        """
        super().__init__()
        self._capacity = capacity
        self._store = defaultdict(lambda: [] if self._capacity < 0 else [None] * self._capacity)
        self._size = 0
        self._overwrite_type = overwrite_type
        self._iter_index = 0

    def __len__(self):
        return self._size

    def __iter__(self):
        return self

    def __next__(self):
        if self._iter_index >= self._size:
            self._iter_index = 0
            raise StopIteration
        index = self._iter_index
        self._iter_index += 1
        return {k: lst[index] for k, lst in self._store.items()}

    def __getitem__(self, index: int):
        return {k: lst[index] for k, lst in self._store.items()}

    @property
    def capacity(self):
        return self._capacity

    @property
    def overwrite_type(self):
        return self._overwrite_type

    def get(self, indexes: List[int]) -> dict:
        return {k: [self._store[k][i] for i in indexes] for k in self._store}

    @check_uniformity(arg_num=1)
    def put(self, contents: dict, overwrite_indexes: Sequence = None) -> List[int]:
        if len(self._store) > 0 and contents.keys() != self._store.keys():
            raise ValueError(f"expected keys {list(self._store.keys())}, got {list(contents.keys())}")
        added_size = len(contents[next(iter(contents))])
        if self._capacity < 0:
            for key, lst in contents.items():
                self._store[key].extend(lst)
            self._size += added_size
            return list(range(self._size - added_size, self._size))
        else:
            write_indexes = get_update_indexes(self._size, added_size, self._capacity, self._overwrite_type,
                                               overwrite_indexes=overwrite_indexes)
            self.update(write_indexes, contents)
            self._size = min(self._capacity, self._size + added_size)
            return write_indexes

    @check_uniformity(arg_num=2)
    def update(self, indexes: Sequence, contents: dict) -> Sequence:
        """
        Update selected contents.

        Args:
            indexes: Item index list.
            contents: contents to write to the internal store at the given positions.
        Returns:
            The updated item indexes.
        """
        for key, value_list in contents.items():
            assert len(indexes) == len(value_list), f"expected updates at {len(indexes)} indexes, got {len(value_list)}"
            for index, value in zip(indexes, value_list):
                self._store[key][index] = value

        return indexes

    def apply_multi_filters(self, filters: Sequence[Callable]):
        """Multi-filter method. Each filter in the chain receives the output of the previous one as its input.

        Args:
            filters (Sequence[Callable]): Filter list, each item is a function that maps an item to a boolean,
                e.g. [lambda d: d['a'] == 1 and d['b'] == 1].
        Returns:
            Filtered indexes and corresponding objects.
        """
        indexes = range(self._size)
        for f in filters:
            indexes = [i for i in indexes if f(self[i])]

        return indexes, self.get(indexes)

    def apply_multi_samplers(self, samplers: Sequence[Tuple[Callable, int]], replace: bool = True) -> Tuple:
        """Multi-sampler method. Each sampler in the chain receives the output of the previous one as its input.

        Args:
            samplers (Sequence[Tuple[Callable, int]]): Sampler list, where each sampler is a tuple whose first
                item is a function mapping an item to a sampling weight and whose second item is the sample size,
                e.g. [(lambda o: o['a'], 3)].
            replace: If True, sampling will be performed with replacement.
        Returns:
            Sampled indexes and corresponding objects.
        """
        indexes = range(self._size)
        for weight_fn, sample_size in samplers:
            weights = np.asarray([weight_fn(self[i]) for i in indexes])
            indexes = np.random.choice(indexes, size=sample_size, replace=replace, p=weights / np.sum(weights))

        return indexes, self.get(indexes)

    @normalize
    def sample(self, size, weights: Sequence = None, replace: bool = True):
        """
        Obtain a random sample from the experience pool.

        Args:
            size (int): sample sizes for each round of sampling in the chain. If this is a single integer, it is
                used as the sample size for all samplers in the chain.
            weights (Sequence): a sequence of sampling weights.
            replace (bool): if True, sampling is performed with replacement. Default is True.
        Returns:
            Sampled indexes and the corresponding objects.
        """
        indexes = np.random.choice(self._size, size=size, replace=replace, p=weights)
        return indexes, self.get(indexes)

    def sample_by_key(self, key, size, replace: bool = True):
        """
        Obtain a random sample from the store using one of the columns as sampling weights.

        Args:
            key: the column whose values are to be used as sampling weights.
            size: sample size.
            replace: If True, sampling is performed with replacement.
        Returns:
            Sampled indexes and the corresponding objects.
        """
        weights = np.asarray(self._store[key][:self._size] if self._size < self._capacity else self._store[key])
        indexes = np.random.choice(self._size, size=size, replace=replace, p=weights / np.sum(weights))
        return indexes, self.get(indexes)

    def sample_by_keys(self, keys: Sequence, sizes: Sequence, replace: bool = True):
        """
        Obtain a random sample from the store by chained sampling using multiple columns as sampling weights.

        Args:
            keys: the columns whose values are to be used as sampling weights.
            sizes: sample sizes for the successive sampling rounds.
            replace: If True, sampling is performed with replacement.
        Returns:
            Sampled indexes and the corresponding objects.
        """
        if len(keys) != len(sizes):
            raise ValueError(f"expected sizes of length {len(keys)}, got {len(sizes)}")

        indexes = range(self._size)
        for key, size in zip(keys, sizes):
            weights = np.asarray([self._store[key][i] for i in indexes])
            indexes = np.random.choice(indexes, size=size, replace=replace, p=weights / np.sum(weights))

        return indexes, self.get(indexes)

    def dumps(self):
        return clone(self._store)

    def get_by_key(self, key):
        return self._store[key]

    def clear(self):
        del self._store
        self._store = defaultdict(lambda: [] if self._capacity < 0 else [None] * self._capacity)
        self._size = 0
        self._iter_index = 0
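A hypothetical usage sketch of the store above; column names and values are made up:

```python
store = ColumnBasedStore(capacity=4, overwrite_type=OverwriteType.ROLLING)

# All columns in a put() call must have the same length (enforced by check_uniformity).
store.put({"reward": [1.0, 0.5, 2.0], "td_error": [0.3, 0.1, 0.6]})

# Prioritized-style sampling: one column's values act as sampling weights.
idx, picked = store.sample_by_key("td_error", size=2)

# Overwrite the sampled rows in place, e.g. to reset their priorities.
store.update(idx, {"td_error": [0.0] * len(idx)})
```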
@ -0,0 +1,60 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from enum import Enum
from functools import wraps

import numpy as np


def check_uniformity(arg_num):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            contents = args[arg_num]
            length = len(contents[next(iter(contents))])
            if any(len(lst) != length for lst in contents.values()):
                raise ValueError("all sequences in contents should have the same length")
            return func(*args, **kwargs)
        return wrapper
    return decorator


def normalize(func):
    @wraps(func)
    def wrapper(self, size, weights=None, replace=True):
        if weights is not None and not isinstance(weights, np.ndarray):
            weights = np.asarray(weights)
        if weights is not None:
            weights = weights / np.sum(weights)

        return func(self, size, weights, replace)

    return wrapper


class OverwriteType(Enum):
    ROLLING = "rolling"
    RANDOM = "random"


def get_update_indexes(size, added_size, capacity, overwrite_type, overwrite_indexes=None):
    if added_size > capacity:
        raise ValueError("size of added items should not exceed the store capacity.")

    num_overwrites = size + added_size - capacity
    if num_overwrites < 0:
        return list(range(size, size + added_size))

    if overwrite_indexes is not None:
        write_indexes = list(range(size, capacity)) + list(overwrite_indexes)
    else:
        # follow the overwrite rule set at init
        if overwrite_type == OverwriteType.ROLLING:
            # using the negative index convention for convenience
            start_index = size - capacity
            write_indexes = list(range(start_index, start_index + added_size))
        else:
            random_indexes = np.random.choice(size, size=num_overwrites, replace=False)
            write_indexes = list(range(size, capacity)) + list(random_indexes)

    return write_indexes
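An illustration of the rolling overwrite rule above, with values chosen by hand:

```python
# A store of capacity 5 currently holding 4 items receives 3 new ones.
# One tail slot is still free (negative index -1 under the convention noted
# above); the rolling rule then wraps around to the oldest entries at 0 and 1.
assert get_update_indexes(4, 3, 5, OverwriteType.ROLLING) == [-1, 0, 1]
```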
@ -135,7 +135,10 @@ class CitibikeBusinessEngine(AbsBusinessEngine):
        return tick + 1 == self._max_tick

    def get_node_mapping(self) -> dict:
-       return {}
+       node_mapping = {}
+       for station in self._stations:
+           node_mapping[station.index] = station.id
+       return node_mapping

    def reset(self):
        """Reset after episode"""

@ -154,6 +157,8 @@ class CitibikeBusinessEngine(AbsBusinessEngine):
            station.reset()

        self._matrices_node.reset()

+       self._decision_strategy.reset()
+
    def get_agent_idx_list(self) -> List[int]:
        return [station.index for station in self._stations]
@ -446,12 +451,13 @@ class CitibikeBusinessEngine(AbsBusinessEngine):
            self._event_buffer.insert_event(transfer_evt)

    def _build_temp_data(self):
        """Build temporary data for the predefined environment."""
        logger.warning_yellow(f"Binary data files for scenario: citi_bike topology: {self._topology} not found.")
        citi_bike_process = CitiBikeProcess(is_temp=True)
        if self._topology in citi_bike_process.topologies:
            pid = str(os.getpid())
            logger.warning_yellow(
-               f"Generating temp binary data file for scenario: citi_bike topology: {self._topology} pid: {pid}. If you want to keep the data, please use MARO CLI command 'maro data generate -s citi_bike -t {self._topology}' to generate the binary data files first.")
+               f"Generating temp binary data file for scenario: citi_bike topology: {self._topology} pid: {pid}. If you want to keep the data, please use MARO CLI command 'maro env data generate -s citi_bike -t {self._topology}' to generate the binary data files first.")
            self._citi_bike_data_pipeline = citi_bike_process.topologies[self._topology]
            self._citi_bike_data_pipeline.download()
            self._citi_bike_data_pipeline.clean()
@ -53,6 +53,9 @@ class DecisionEvent:
    def __repr__(self):
        return f"decision event {self.__getstate__()}"

+   def __str__(self):
+       return f'DecisionEvent(tick={self.tick}, station_idx={self.station_idx}, type={self.type}, action_scope={self.action_scope})'
+

class Action:
    def __init__(self, from_station_idx: int, to_station_idx: int, number: int):
@ -30,6 +30,9 @@ class DistanceFilter:

        return result

+   def reset(self):
+       pass
+
class RequirementsFilter:
    def __init__(self, conf: dict):
        self._output_num = conf["num"]

@ -42,27 +45,47 @@ class RequirementsFilter:

        return {neighbor_scope[i][0]: neighbor_scope[i][1] for i in range(output_num)}

+   def reset(self):
+       pass
+
class TripsWindowFilter:
    def __init__(self, conf: dict, snapshot_list):
        self._output_num = conf["num"]
        self._windows = conf["windows"]
        self._snapshot_list = snapshot_list

+       self._window_states_cache = {}
+
    def filter(self, station_idx: int, decision_type: DecisionType, source: Dict[int, int]) -> Dict[int, int]:
        output_num = min(self._output_num, len(source))

        avaiable_frame_indices = self._snapshot_list.get_frame_index_list()

-       avaiable_frame_indices = avaiable_frame_indices[-output_num:]
+       # max windows we can get
+       available_windows = min(self._windows, len(avaiable_frame_indices))

-       trip_states = self._snapshot_list["stations"][avaiable_frame_indices::"trip_requirement"]
-       trip_states = trip_states.reshape(-1, len(self._snapshot_list["stations"]))
+       # get the frame index list for the latest N windows
+       avaiable_frame_indices = avaiable_frame_indices[-available_windows:]

        source_trips = {}

-       for neighbor_idx, _ in source.items():
-           source_trips[neighbor_idx] = trip_states[:, neighbor_idx].sum()
+       for i, frame_index in enumerate(avaiable_frame_indices):
+           if i == available_windows - 1 or frame_index not in self._window_states_cache:
+               # overwrite the latest window (it may still change) and cache missing ones
+               trip_state = self._snapshot_list["stations"][frame_index::"trip_requirement"]
+
+               self._window_states_cache[frame_index] = trip_state
+
+           trip_state = self._window_states_cache[frame_index]
+
+           for neighbor_idx, _ in source.items():
+               trip_num = trip_state[neighbor_idx]
+
+               if neighbor_idx not in source_trips:
+                   source_trips[neighbor_idx] = trip_num
+               else:
+                   source_trips[neighbor_idx] += trip_num

        is_sort_reverse = False

        if decision_type == DecisionType.Demand:

@ -75,8 +98,12 @@ class TripsWindowFilter:
        for neighbor_idx, _ in sorted_neighbors[0: output_num]:
            result[neighbor_idx] = source[neighbor_idx]

        return result

+   def reset(self):
+       self._window_states_cache.clear()
+

class BikeDecisionStrategy:
    """Helper to provide decision related logic"""

@ -219,6 +246,11 @@ class BikeDecisionStrategy:
            if bike_number == 0:
                break

+   def reset(self):
+       """Reset internal states"""
+       for filter_instance in self._filters:
+           filter_instance.reset()
+
    def _construct_action_scope_filters(self, conf: dict):
        for filter_conf in conf["filters"]:
            filter_type = filter_conf["type"]
@ -1,5 +1,5 @@
# Copyright (c) Microsoft Corporation.
-# Licensed under the MIT license.s
+# Licensed under the MIT license.

import sys
import warnings
@ -33,4 +33,9 @@ ERROR_CODE = {
    3001: "Command Error",
    3002: "Parsing Error",
    3003: "Deployment Error",
+
+   # 4000-4999: Error codes for RL toolkit
+   4001: "Unsupported Agent Mode Error",
+   4002: "Missing Shaper Error",
+   4003: "Wrong Agent Mode Error"
}
@ -0,0 +1,28 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from maro.utils.exception import MAROException


class UnsupportedAgentModeError(MAROException):
    """
    Unsupported agent mode error.
    """
    def __init__(self, msg: str = None):
        super().__init__(4001, msg)


class MissingShaperError(MAROException):
    """
    Missing shaper error.
    """
    def __init__(self, msg: str = None):
        super().__init__(4002, msg)


class WrongAgentModeError(MAROException):
    """
    Wrong agent mode error.
    """
    def __init__(self, msg: str = None):
        super().__init__(4003, msg)
@ -95,8 +95,8 @@ class Logger:

    Args:
        tag (str): Log tag for stream and file output.
-       format_ (LogFormat): Predefined formatter, the default value is LogFormat.full. \n
-           i.e. LogFormat.full: full time | host | user | pid | tag | level | msg \n
+       format_ (LogFormat): Predefined formatter, the default value is LogFormat.full.
+           i.e. LogFormat.full: full time | host | user | pid | tag | level | msg
            LogFormat.simple: simple time | tag | level | msg
        dump_folder (str): Log dumped folder, the default value is the current folder. The dumped log level is
            logging.DEBUG. The full path of the dumped log file is `dump_folder/tag.log`.
@ -2,6 +2,7 @@
# Licensed under the MIT license.


+import configparser
from glob import glob
import io
import numpy as np

@ -11,7 +12,6 @@ import random
import shutil
import time
import warnings
-import yaml

from maro import __data_version__
from maro.utils.exception.cli_exception import CommandError

@ -61,7 +61,7 @@ def set_seeds(seed):
    random.seed(seed)


-version_file_path = os.path.join(os.path.expanduser("~/.maro"), "version.yml")
+version_file_path = os.path.join(os.path.expanduser("~/.maro"), "version.ini")

project_root = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
@ -85,12 +85,12 @@ def deploy(hide_info=True):
        for target_dir, source_dir in target_source_pairs:
            shutil.copytree(source_dir, target_dir)
        # deploy success
-       version_info = {
-           "data_version": __data_version__,
-           "deploy_time": time.time()
-       }
+       version_info = configparser.ConfigParser()
+       version_info["MARO_DATA"] = {}
+       version_info["MARO_DATA"]["version"] = __data_version__
+       version_info["MARO_DATA"]["deploy_time"] = str(int(time.time()))
        with io.open(version_file_path, "w") as version_file:
-           yaml.dump(version_info, version_file)
+           version_info.write(version_file)
        info_list.append("Data files for MARO deployed.")
    except Exception as e:
        error_list.append(f"An issue occurred while deploying meta files for MARO. {e} Please run 'maro meta deploy' to deploy the data files.")
@ -111,10 +111,12 @@ def check_deployment_status():
    ret = False
    if os.path.exists(version_file_path):
        with io.open(version_file_path, "r") as version_file:
-           version_info = yaml.safe_load(version_file)
-           if "deploy_time" in version_info \
-               and "data_version" in version_info \
-               and version_info["data_version"] == __data_version__:
+           version_info = configparser.ConfigParser()
+           version_info.read_file(version_file)
+           if "MARO_DATA" in version_info \
+               and "deploy_time" in version_info["MARO_DATA"] \
+               and "version" in version_info["MARO_DATA"] \
+               and version_info["MARO_DATA"]["version"] == __data_version__:
                ret = True
    return ret
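For orientation, a sketch of reading the INI-format version file the two hunks above switch to; the field values in the comment are illustrative:

```python
import configparser
import os

# The file written by deploy() has the shape:
#   [MARO_DATA]
#   version = <__data_version__>
#   deploy_time = <integer epoch seconds>
version_info = configparser.ConfigParser()
version_info.read(os.path.join(os.path.expanduser("~/.maro"), "version.ini"))
is_deployed = "MARO_DATA" in version_info and "version" in version_info["MARO_DATA"]
```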
@ -4,13 +4,8 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "# Quick Start"
-  ]
- },
- {
-  "cell_type": "markdown",
-  "metadata": {},
-  "source": [
+   "# Quick Start\n",
+   "\n",
+   "Below is a simple demo of interaction with the environment."
   ]
  },

@ -18,44 +13,54 @@
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
-  "outputs": [],
+  "outputs": [
+   {
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "10:54:35 | WARNING | \u001b[33mBinary data files for scenario: citi_bike topology: toy.3s_4t not found.\u001b[0m\n",
+     "10:54:35 | WARNING | \u001b[33mGenerating temp binary data file for scenario: citi_bike topology: toy.3s_4t pid: 77526. If you want to keep the data, please use MARO CLI command 'maro env data generate -s citi_bike -t toy.3s_4t' to generate the binary data files first.\u001b[0m\n",
+     "10:54:35 | INFO | \u001b[32mGenerating trip data for topology toy.3s_4t .\u001b[0m\n",
+     "10:54:36 | INFO | \u001b[32mCleaning weather data\u001b[0m\n",
+     "10:54:36 | INFO | \u001b[32mBuilding binary data from /home/Jinyu/.maro/data/citi_bike/.source/.clean/toy.3s_4t/9c6a7687fb9d42b7/trips.csv to /home/Jinyu/.maro/data/citi_bike/.build/toy.3s_4t/9c6a7687fb9d42b7/trips.bin\u001b[0m\n",
+     "10:54:44 | INFO | \u001b[32mBuilding binary data from /home/Jinyu/.maro/data/citi_bike/.source/.clean/toy.3s_4t/40c6de7db2cc44f1/weather.csv to /home/Jinyu/.maro/data/citi_bike/.build/toy.3s_4t/40c6de7db2cc44f1/KNYC_daily.bin\u001b[0m\n",
+     "{'perf': 0.4517766497461929, 'total_trips': 2167, 'total_shortage': 1188}\n"
+    ]
+   }
+  ],
   "source": [
    "from maro.simulator import Env\n",
    "from maro.simulator.scenarios.citi_bike.common import Action, DecisionEvent\n",
    "\n",
-   "env = Env(scenario=\"citi_bike\", topology=\"ny.201912\", start_tick=0, durations=1440, snapshot_resolution=30)\n",
+   "env = Env(scenario=\"citi_bike\", topology=\"toy.3s_4t\", start_tick=0, durations=1440, snapshot_resolution=30)\n",
    "\n",
-   "is_done: bool = False\n",
-   "reward: int = None\n",
+   "metrics: object = None\n",
    "decision_event: DecisionEvent = None\n",
+   "is_done: bool = False\n",
    "\n",
    "while not is_done:\n",
    "    action: Action = None\n",
-   "    reward, decision_event, is_done = env.step(action)"
-  ]
- },
- {
-  "cell_type": "markdown",
-  "metadata": {},
-  "source": [
-   "# Environment of the bike repositioning"
+   "    metrics, decision_event, is_done = env.step(action)\n",
+   "\n",
+   "print(metrics)"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "# Environment of the bike repositioning\n",
+   "\n",
    "To initialize an environment, you need to specify the values of several parameters:\n",
-   "- **scenario**: The target scenario of this Env. \"citi_bike\" denotes for the bike repositioning.\n",
-   "- **topology**: The target topology of this Env.\n",
-   "  + There are some predefined topologies in MARO, that you can directly use it as in the demo.\n",
-   "  + Also, you can define your own topologies following the guidance in the [doc](docs/customization/new_topology.rst).\n",
+   "- **scenario**: The target scenario of this Env.\n",
+   "  - `citi_bike` denotes the bike repositioning scenario.\n",
+   "- **topology**: The target topology of this Env. As shown below, you can get the predefined topology list by calling `get_topologies(scenario='citi_bike')`\n",
    "- **start_tick**: The start tick of this Env, 1 tick corresponds to 1 minute in citi_bike.\n",
-   "  + In the demo above, *start_tick=0* indicates a simulation start from the beginning of the given topology.\n",
+   "  - In the demo above, `start_tick=0` indicates a simulation start from the beginning of the given topology.\n",
    "- **durations**: The duration of this Env, in the unit of tick/minute.\n",
-   "  + In the demo above, *durations=1440* indicates a simulation length of 1 day (24h * 60min/h).\n",
+   "  - In the demo above, `durations=1440` indicates a simulation length of 1 day (24h * 60min/h).\n",
    "- **snapshot_resolution**: The time granularity of maintaining the snapshots of the environments, in the unit of tick/minute.\n",
-   "  + In the demo above, *snapshot_resolution=30* indicates that a snapshot will be created and saved every 30 minutes during the simulation.\n",
+   "  - In the demo above, `snapshot_resolution=30` indicates that a snapshot will be created and saved every 30 minutes during the simulation.\n",
    "\n",
    "You can get all available scenarios and topologies by calling:"
   ]
@ -66,57 +71,64 @@
   "metadata": {},
   "outputs": [
    {
-    "data": {
-     "text/plain": [
-      "[{'scenario': 'citi_bike', 'topology': 'ny201912'},\n",
-      " {'scenario': 'citi_bike', 'topology': 'train'},\n",
-      " {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.0'},\n",
-      " {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.3'},\n",
-      " {'scenario': 'ecr', 'topology': '5p_ssddd_l0.6'},\n",
-      " {'scenario': 'ecr', 'topology': '5p_ssddd_l0.5'},\n",
-      " {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.4'},\n",
-      " {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.5'},\n",
-      " {'scenario': 'ecr', 'topology': '5p_ssddd_l0.2'},\n",
-      " {'scenario': 'ecr', 'topology': '22p_global_trade_l0.1'},\n",
-      " {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.6'},\n",
-      " {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.8'},\n",
-      " {'scenario': 'ecr', 'topology': '22p_global_trade_l0.3'},\n",
-      " {'scenario': 'ecr', 'topology': '22p_global_trade_l0.4'},\n",
-      " {'scenario': 'ecr', 'topology': '5p_ssddd_l0.8'},\n",
-      " {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.2'},\n",
-      " {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.7'},\n",
-      " {'scenario': 'ecr', 'topology': '22p_global_trade_l0.2'},\n",
-      " {'scenario': 'ecr', 'topology': '5p_ssddd_l0.3'},\n",
-      " {'scenario': 'ecr', 'topology': '4p_ssdd_l0.7'},\n",
-      " {'scenario': 'ecr', 'topology': '4p_ssdd_l0.6'},\n",
-      " {'scenario': 'ecr', 'topology': '4p_ssdd_l0.5'},\n",
-      " {'scenario': 'ecr', 'topology': '22p_global_trade_l0.0'},\n",
-      " {'scenario': 'ecr', 'topology': '5p_ssddd_l0.1'},\n",
-      " {'scenario': 'ecr', 'topology': '5p_ssddd_l0.0'},\n",
-      " {'scenario': 'ecr', 'topology': '4p_ssdd_l0.1'},\n",
-      " {'scenario': 'ecr', 'topology': '4p_ssdd_l0.4'},\n",
-      " {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.1'},\n",
-      " {'scenario': 'ecr', 'topology': '4p_ssdd_l0.3'},\n",
-      " {'scenario': 'ecr', 'topology': '5p_ssddd_l0.4'},\n",
-      " {'scenario': 'ecr', 'topology': '4p_ssdd_l0.8'},\n",
-      " {'scenario': 'ecr', 'topology': '22p_global_trade_l0.6'},\n",
-      " {'scenario': 'ecr', 'topology': '22p_global_trade_l0.7'},\n",
-      " {'scenario': 'ecr', 'topology': '5p_ssddd_l0.7'},\n",
-      " {'scenario': 'ecr', 'topology': '22p_global_trade_l0.5'},\n",
-      " {'scenario': 'ecr', 'topology': '22p_global_trade_l0.8'},\n",
-      " {'scenario': 'ecr', 'topology': '4p_ssdd_l0.2'},\n",
-      " {'scenario': 'ecr', 'topology': '4p_ssdd_l0.0'}]"
-     ]
-    },
-    "execution_count": 2,
-    "metadata": {},
-    "output_type": "execute_result"
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "'The available scenarios in MARO:'\n",
+     "['citi_bike', 'ecr']\n",
+     "\n",
+     "'The predefined topologies in Citi Bike:'\n",
+     "['ny.201912',\n",
+     " 'ny.201808',\n",
+     " 'ny.201907',\n",
+     " 'ny.202005',\n",
+     " 'ny.201812',\n",
+     " 'ny.201804',\n",
+     " 'toy.3s_4t',\n",
+     " 'toy.4s_4t',\n",
+     " 'ny.201908',\n",
+     " 'ny.201910',\n",
+     " 'train',\n",
+     " 'ny.201909',\n",
+     " 'ny.202002',\n",
+     " 'ny.201811',\n",
+     " 'ny.201906',\n",
+     " 'ny.201802',\n",
+     " 'ny.201803',\n",
+     " 'ny.201905',\n",
+     " 'ny.202003',\n",
+     " 'ny.201805',\n",
+     " 'ny.201809',\n",
+     " 'toy.5s_6t',\n",
+     " 'ny.201801',\n",
+     " 'ny.201904',\n",
+     " 'ny.201902',\n",
+     " 'ny.201901',\n",
+     " 'ny.201911',\n",
+     " 'ny.201903',\n",
+     " 'ny.202001',\n",
+     " 'ny.202004',\n",
+     " 'ny.201806',\n",
+     " 'ny.201807',\n",
+     " 'ny.202006',\n",
+     " 'ny.201810']\n"
+    ]
    }
   ],
   "source": [
-   "from maro.simulator.utils import get_available_envs\n",
+   "from maro.simulator.utils import get_scenarios, get_topologies\n",
    "from pprint import pprint\n",
+   "from typing import List\n",
    "\n",
-   "get_available_envs() # TODO: specify the scenario"
+   "scenarios: List[str] = get_scenarios()\n",
+   "topologies: List[str] = get_topologies(scenario='citi_bike')\n",
+   "\n",
+   "pprint(f'The available scenarios in MARO:')\n",
+   "pprint(scenarios)\n",
+   "\n",
+   "print()\n",
+   "pprint(f'The predefined topologies in Citi Bike:') # TODO: update the ordered output\n",
+   "pprint(topologies)"
   ]
  },
  {
@ -135,35 +147,51 @@
    "name": "stdout",
    "output_type": "stream",
    "text": [
+    "10:54:44 | WARNING | \u001b[33mBinary data files for scenario: citi_bike topology: toy.3s_4t not found.\u001b[0m\n",
+    "10:54:44 | WARNING | \u001b[33mGenerating temp binary data file for scenario: citi_bike topology: toy.3s_4t pid: 77526. If you want to keep the data, please use MARO CLI command 'maro env data generate -s citi_bike -t toy.3s_4t' to generate the binary data files first.\u001b[0m\n",
+    "10:54:44 | INFO | \u001b[32mGenerating trip data for topology toy.3s_4t .\u001b[0m\n",
+    "10:54:45 | INFO | \u001b[32mCleaning weather data\u001b[0m\n",
+    "10:54:45 | INFO | \u001b[32mBuilding binary data from /home/Jinyu/.maro/data/citi_bike/.source/.clean/toy.3s_4t/ef14844303414e4c/trips.csv to /home/Jinyu/.maro/data/citi_bike/.build/toy.3s_4t/ef14844303414e4c/trips.bin\u001b[0m\n",
+    "10:54:53 | INFO | \u001b[32mBuilding binary data from /home/Jinyu/.maro/data/citi_bike/.source/.clean/toy.3s_4t/900b86b777124bb6/weather.csv to /home/Jinyu/.maro/data/citi_bike/.build/toy.3s_4t/900b86b777124bb6/KNYC_daily.bin\u001b[0m\n",
    "The current tick: 0.\n",
    "The current frame index: 0.\n",
-    "There are 528 agents in this Env.\n",
+    "There are 3 agents in this Env.\n",
    "There will be 48 snapshots in total.\n",
    "\n",
    "Env Summary:\n",
-    "{'node_detail': {'matrices': {'attributes': {...}, 'number': 1},\n",
-    " 'stations': {'attributes': {...}, 'number': 528}},\n",
+    "{'node_detail': {'matrices': {'attributes': {'trips_adj': {'slots': 9,\n",
+    " 'type': 'i'}},\n",
+    " 'number': 1},\n",
+    " 'stations': {'attributes': {'bikes': {'slots': 1, 'type': 'i'},\n",
+    " 'capacity': {'slots': 1,\n",
+    " 'type': 'i'},\n",
+    " 'extra_cost': {'slots': 1,\n",
+    " 'type': 'i'},\n",
+    " 'failed_return': {'slots': 1,\n",
+    " 'type': 'i'},\n",
+    " 'fulfillment': {'slots': 1,\n",
+    " 'type': 'i'},\n",
+    " 'holiday': {'slots': 1,\n",
+    " 'type': 'i2'},\n",
+    " 'min_bikes': {'slots': 1,\n",
+    " 'type': 'i'},\n",
+    " 'shortage': {'slots': 1,\n",
+    " 'type': 'i'},\n",
+    " 'temperature': {'slots': 1,\n",
+    " 'type': 'i2'},\n",
+    " 'transfer_cost': {'slots': 1,\n",
+    " 'type': 'i'},\n",
+    " 'trip_requirement': {'slots': 1,\n",
+    " 'type': 'i'},\n",
+    " 'weather': {'slots': 1,\n",
+    " 'type': 'i2'},\n",
+    " 'weekday': {'slots': 1,\n",
+    " 'type': 'i2'}},\n",
+    " 'number': 3}},\n",
+    " 'node_mapping': {}}\n",
    "\n",
-    "Env Summary - matrices:\n",
-    "{'attributes': {'distance_adj': {'slots': 278784, 'type': 'f'},\n",
-    " 'trips_adj': {'slots': 278784, 'type': 'i'}},\n",
-    " 'number': 1}\n",
-    "\n",
-    "Env Summary - stations:\n",
-    "{'attributes': {'bikes': {'slots': 1, 'type': 'i'},\n",
-    " 'capacity': {'slots': 1, 'type': 'i'},\n",
-    " 'extra_cost': {'slots': 1, 'type': 'i'},\n",
-    " 'failed_return': {'slots': 1, 'type': 'i'},\n",
-    " 'fulfillment': {'slots': 1, 'type': 'i'},\n",
-    " 'holiday': {'slots': 1, 'type': 'i2'},\n",
-    " 'shortage': {'slots': 1, 'type': 'i'},\n",
-    " 'temperature': {'slots': 1, 'type': 'i2'},\n",
-    " 'transfer_cost': {'slots': 1, 'type': 'i'},\n",
-    " 'trip_requirement': {'slots': 1, 'type': 'i'},\n",
-    " 'weather': {'slots': 1, 'type': 'i2'},\n",
-    " 'weekday': {'slots': 1, 'type': 'i2'}},\n",
-    " 'number': 528}\n"
+    "Env Metrics:\n",
+    "{'perf': 1, 'total_trips': 0, 'total_shortage': 0}\n"
    ]
   }
  ],
@ -175,7 +203,7 @@
    "\n",
    "\n",
    "# Initialize an Env for citi_bike scenario\n",
-   "env = Env(scenario=\"citi_bike\", topology=\"ny201912\", start_tick=0, durations=1440, snapshot_resolution=30)\n",
+   "env = Env(scenario=\"citi_bike\", topology=\"toy.3s_4t\", start_tick=0, durations=1440, snapshot_resolution=30)\n",
    "\n",
    "# The current tick\n",
    "tick: int = env.tick\n",
@ -197,82 +225,105 @@
|
|||
"# The summary info of the environment\n",
|
||||
"summary: dict = env.summary\n",
|
||||
"print(f\"\\nEnv Summary:\")\n",
|
||||
"pprint(summary, depth=3)\n",
|
||||
"pprint(summary)\n",
|
||||
"\n",
|
||||
"print(f\"\\nEnv Summary - matrices:\")\n",
|
||||
"pprint(summary['node_detail']['matrices'])\n",
|
||||
"# The metrics of the environment\n",
|
||||
"metrics: dict = env.metrics\n",
|
||||
"print(f\"\\nEnv Metrics:\") # TODO: update the output with node mapping\n",
|
||||
"pprint(metrics)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Interaction with the environment\n",
|
||||
"\n",
|
||||
"Before starting interaction with the environment, we need to know **DecisionEvent** and **Action** first.\n",
|
||||
"\n",
|
||||
"print(f\"\\nEnv Summary - stations:\")\n",
|
||||
"pprint(summary['node_detail']['stations'])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Interaction with the environment"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Before starting interaction with the environment, we need to know **DecisionEvent** and **Action** first."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## DecisionEvent\n",
|
||||
"\n",
|
||||
"Once the environment need the agent's response to promote the simulation, it will throw an **DecisionEvent**. In the scenario of citi_bike, the information of each DecisionEvent is listed as below:\n",
|
||||
"- **station_idx**: the id of the station/agent that needs to respond to the environment\n",
|
||||
"- **tick**: the corresponding tick\n",
|
||||
"- **frame_index**: the corresponding frame index, that is the index of the corresponding snapshot in the snapshot list\n",
|
||||
"- **type**: the decision type of this decision event. In citi_bike scenario, there are 2 types:\n",
|
||||
" + **Supply**: There is too many bikes in the corresponding station, it's better to reposition some of them to other stations.\n",
|
||||
" + **Demand**: There is no enough bikes in the corresponding station, it's better to reposition bikes from other stations\n",
|
||||
"- **action_scope**: a dictionary of valid action items.\n",
|
||||
" + The key of the item indicates the station/agent id;\n",
|
||||
" + The meaning of the value differs for different decision type:\n",
|
||||
" * If the decision type is Supply, the value of the station itself means its bike inventory at that moment, while the value of other target stations means the number of their empty docks;\n",
|
||||
" * If the decision type is Demand, the value of the station itself means the number of its empty docks, while the value of other target stations means their bike inventory."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Once the environment need the agent's response to promote the simulation, it will throw an **DecisionEvent**. In the scenario of citi_bike, the information of each `DecisionEvent` is listed as below:\n",
|
||||
"- **station_idx**: (int) The id of the station/agent that needs to respond to the environment;\n",
|
||||
"- **tick**: (int) The corresponding tick;\n",
|
||||
"- **frame_index**: (int) The corresponding frame index, that is the index of the corresponding snapshot in the snapshot list;\n",
|
||||
"- **type**: (DecisionType) The decision type of this decision event. In citi_bike scenario, there are two types:\n",
|
||||
" - `Supply` indicates there is too many bikes in the corresponding station, so it is better to reposition some of them to other stations.\n",
|
||||
" - `Demand` indicates there is no enough bikes in the corresponding station, so it is better to reposition bikes from other stations\n",
|
||||
"- **action_scope**: (Dict) A dictionary that maintains the information for calculating the valid action scope:\n",
|
||||
" - The key of the item indicates the station/agent id;\n",
|
||||
" - The meaning of the value differs for different decision type:\n",
|
||||
" - If the decision type is `Supply`, the value of the station itself means its bike inventory at that moment, while the value of other target stations means the number of their empty docks;\n",
|
||||
" - If the decision type is `Demand`, the value of the station itself means the number of its empty docks, while the value of other target stations means their bike inventory.\n",
|
||||
"\n",
|
||||
"## Action\n",
|
||||
"\n",
|
||||
"Once we get a **DecisionEvent** from the envirionment, we should respond with an **Action**. Valid Action could be:\n",
|
||||
"- None, which means do nothing.\n",
|
||||
"- A valid Action instance, including:\n",
|
||||
" + **from_station_idx**: int, the id of the source station of the bike transportation\n",
|
||||
" + **to_station_idx**: int, the id of the destination station of the bike transportation\n",
|
||||
" + **number**: int, the quantity of the bike transportation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Generate random actions based on the DecisionEvent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The demo code in the Quick Start part has shown an interaction mode that doing nothing(responding with None action). Here we read the detailed information about the DecisionEvent and generate random actions based on it."
|
||||
"Once we get a `DecisionEvent` from the envirionment, we should respond with an `Action`. Valid `Action` could be:\n",
|
||||
"- `None`, which means do nothing.\n",
|
||||
"- A valid `Action` instance, including:\n",
|
||||
" - **from_station_idx**: (int) The id of the source station of the bike transportation\n",
|
||||
" - **to_station_idx**: (int) The id of the destination station of the bike transportation\n",
|
||||
" - **number**: (int) The quantity of the bike transportation\n",
|
||||
"\n",
|
||||
"## Generate random actions based on the DecisionEvent\n",
|
||||
"\n",
|
||||
"The demo code in the Quick Start part has shown an interaction mode that doing nothing(responding with `None` action). Here we read the detailed information about the `DecisionEvent` and generate random `Action` based on it."
|
||||
]
|
||||
},
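The random strategy in the next cell is one way to consume these fields. For contrast, here is a minimal greedy sketch. It is not part of the notebook: it assumes only the `DecisionEvent` fields (`station_idx`, `type`, `action_scope`) and the `Action` constructor documented above.

```python
# A hypothetical greedy responder (a sketch, not the notebook's method):
# always transfer as many bikes as possible to/from the best counterpart station.
from maro.simulator.scenarios.citi_bike.common import Action, DecisionType

def greedy_action(decision_event):
    scope = decision_event.action_scope
    # For the other stations, the scope value is their empty docks (Supply)
    # or their bike inventory (Demand), as documented above.
    others = {k: v for k, v in scope.items() if k != decision_event.station_idx}
    target_idx = max(others, key=others.get)
    number = min(scope[decision_event.station_idx], others[target_idx])
    if decision_event.type == DecisionType.Supply:
        return Action(from_station_idx=decision_event.station_idx,
                      to_station_idx=target_idx, number=number)
    if decision_event.type == DecisionType.Demand:
        return Action(from_station_idx=target_idx,
                      to_station_idx=decision_event.station_idx, number=number)
    return None  # Do nothing for any other decision type.
```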
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"10:54:53 | WARNING | \u001b[33mBinary data files for scenario: citi_bike topology: toy.3s_4t not found.\u001b[0m\n",
|
||||
"10:54:53 | WARNING | \u001b[33mGenerating temp binary data file for scenario: citi_bike topology: toy.3s_4t pid: 77526. If you want to keep the data, please use MARO CLI command 'maro env data generate -s citi_bike -t toy.3s_4t' to generate the binary data files first.\u001b[0m\n",
|
||||
"10:54:53 | INFO | \u001b[32mGenerating trip data for topology toy.3s_4t .\u001b[0m\n",
|
||||
"10:54:55 | INFO | \u001b[32mCleaning weather data\u001b[0m\n",
|
||||
"10:54:55 | INFO | \u001b[32mBuilding binary data from /home/Jinyu/.maro/data/citi_bike/.source/.clean/toy.3s_4t/03d1f989818548b3/trips.csv to /home/Jinyu/.maro/data/citi_bike/.build/toy.3s_4t/03d1f989818548b3/trips.bin\u001b[0m\n",
|
||||
"10:55:03 | INFO | \u001b[32mBuilding binary data from /home/Jinyu/.maro/data/citi_bike/.source/.clean/toy.3s_4t/12e2ead45d4c4ec7/weather.csv to /home/Jinyu/.maro/data/citi_bike/.build/toy.3s_4t/12e2ead45d4c4ec7/KNYC_daily.bin\u001b[0m\n",
|
||||
"*************\n",
|
||||
"decision event {'station_idx': 0, 'tick': 19, 'frame_index': 0, 'type': <DecisionType.Demand: 'demand'>, 'action_scope': {2: 5, 1: 8, 0: 28}}\n",
|
||||
"<maro.simulator.scenarios.citi_bike.common.Action object at 0x7fe5a02e15d0>\n",
|
||||
"*************\n",
|
||||
"decision event {'station_idx': 1, 'tick': 99, 'frame_index': 3, 'type': <DecisionType.Demand: 'demand'>, 'action_scope': {0: 0, 2: 0, 1: 29}}\n",
|
||||
"<maro.simulator.scenarios.citi_bike.common.Action object at 0x7fe5a02f9350>\n",
|
||||
"*************\n",
|
||||
"decision event {'station_idx': 1, 'tick': 139, 'frame_index': 4, 'type': <DecisionType.Demand: 'demand'>, 'action_scope': {0: 0, 2: 0, 1: 30}}\n",
|
||||
"<maro.simulator.scenarios.citi_bike.common.Action object at 0x7fe5a02e6e90>\n",
|
||||
"*************\n",
|
||||
"decision event {'station_idx': 2, 'tick': 339, 'frame_index': 11, 'type': <DecisionType.Demand: 'demand'>, 'action_scope': {1: 1, 0: 0, 2: 30}}\n",
|
||||
"<maro.simulator.scenarios.citi_bike.common.Action object at 0x7fe5a02a7e90>\n",
|
||||
"*************\n",
|
||||
"decision event {'station_idx': 0, 'tick': 359, 'frame_index': 11, 'type': <DecisionType.Demand: 'demand'>, 'action_scope': {2: 1, 1: 1, 0: 30}}\n",
|
||||
"<maro.simulator.scenarios.citi_bike.common.Action object at 0x7fe5a0293fd0>\n",
|
||||
"*************\n",
|
||||
"decision event {'station_idx': 1, 'tick': 679, 'frame_index': 22, 'type': <DecisionType.Demand: 'demand'>, 'action_scope': {2: 0, 0: 1, 1: 30}}\n",
|
||||
"<maro.simulator.scenarios.citi_bike.common.Action object at 0x7fe5a0308e50>\n",
|
||||
"*************\n",
|
||||
"decision event {'station_idx': 0, 'tick': 759, 'frame_index': 25, 'type': <DecisionType.Demand: 'demand'>, 'action_scope': {2: 2, 1: 4, 0: 30}}\n",
|
||||
"<maro.simulator.scenarios.citi_bike.common.Action object at 0x7fe59e24b450>\n",
|
||||
"*************\n",
|
||||
"decision event {'station_idx': 0, 'tick': 779, 'frame_index': 25, 'type': <DecisionType.Demand: 'demand'>, 'action_scope': {2: 0, 1: 0, 0: 30}}\n",
|
||||
"<maro.simulator.scenarios.citi_bike.common.Action object at 0x7fe59e251a10>\n",
|
||||
"*************\n",
|
||||
"decision event {'station_idx': 1, 'tick': 919, 'frame_index': 30, 'type': <DecisionType.Demand: 'demand'>, 'action_scope': {2: 0, 0: 0, 1: 30}}\n",
|
||||
"<maro.simulator.scenarios.citi_bike.common.Action object at 0x7fe5a0308e10>\n",
|
||||
"*************\n",
|
||||
"decision event {'station_idx': 0, 'tick': 1199, 'frame_index': 39, 'type': <DecisionType.Demand: 'demand'>, 'action_scope': {2: 0, 1: 3, 0: 30}}\n",
|
||||
"<maro.simulator.scenarios.citi_bike.common.Action object at 0x7fe59e43efd0>\n",
|
||||
"*************\n",
|
||||
"decision event {'station_idx': 2, 'tick': 1319, 'frame_index': 43, 'type': <DecisionType.Demand: 'demand'>, 'action_scope': {1: 0, 0: 0, 2: 30}}\n",
|
||||
"<maro.simulator.scenarios.citi_bike.common.Action object at 0x7fe59e500450>\n",
|
||||
"*************\n",
|
||||
"decision event {'station_idx': 0, 'tick': 1339, 'frame_index': 44, 'type': <DecisionType.Demand: 'demand'>, 'action_scope': {2: 1, 1: 1, 0: 30}}\n",
|
||||
"<maro.simulator.scenarios.citi_bike.common.Action object at 0x7fe59e500090>\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from maro.simulator import Env\n",
|
||||
"from maro.simulator.scenarios.citi_bike.common import Action, DecisionEvent, DecisionType\n",
|
||||
|
@ -280,30 +331,30 @@
|
|||
"import random\n",
|
||||
"\n",
|
||||
"# Initialize an Env for citi_bike scenario\n",
|
||||
"env = Env(scenario=\"citi_bike\", topology=\"ny201912\", start_tick=0, durations=1440, snapshot_resolution=30)\n",
|
||||
"env = Env(scenario=\"citi_bike\", topology=\"toy.3s_4t\", start_tick=0, durations=1440, snapshot_resolution=30)\n",
|
||||
"\n",
|
||||
"is_done: bool = False\n",
|
||||
"reward: int = None\n",
|
||||
"metrics: object = None\n",
|
||||
"decision_event: DecisionEvent = None\n",
|
||||
"is_done: bool = False\n",
|
||||
"action: Action = None\n",
|
||||
"\n",
|
||||
"# Start the env with a None Action\n",
|
||||
"reward, decision_event, is_done = env.step(action)\n",
|
||||
"metrics, decision_event, is_done = env.step(action)\n",
|
||||
"\n",
|
||||
"while not is_done:\n",
|
||||
" if decision_event.type == DecisionType.Supply:\n",
|
||||
" # the value of the station itself means the bike inventory if Supply\n",
|
||||
" # Supply: the value of the station itself means the bike inventory\n",
|
||||
" self_bike_inventory = decision_event.action_scope[decision_event.station_idx]\n",
|
||||
" # the value of other stations means the quantity of empty docks if Supply\n",
|
||||
" # Supply: the value of other stations means the quantity of empty docks\n",
|
||||
" target_idx_dock_tuple_list = [\n",
|
||||
" (k, v) for k, v in decision_event.action_scope.items() if k != decision_event.station_idx\n",
|
||||
" ]\n",
|
||||
" # random choose a target station weighted by the quantity of empty docks\n",
|
||||
" # Randomly choose a target station weighted by the quantity of empty docks\n",
|
||||
" target_idx, target_dock = random.choices(\n",
|
||||
" target_idx_dock_tuple_list,\n",
|
||||
" weights=[item[1] for item in target_idx_dock_tuple_list]\n",
|
||||
" )[0]\n",
|
||||
" # generate the corresponding random Action\n",
|
||||
" # Generate the corresponding random Action\n",
|
||||
" action = Action(\n",
|
||||
" from_station_idx=decision_event.station_idx,\n",
|
||||
" to_station_idx=target_idx,\n",
|
||||
|
@ -311,18 +362,18 @@
|
|||
" )\n",
|
||||
"\n",
|
||||
" elif decision_event.type == DecisionType.Demand:\n",
|
||||
" # the value of the station itself means the quantity of empty docks if Demand\n",
|
||||
" # Demand: the value of the station itself means the quantity of empty docks\n",
|
||||
" self_available_dock = decision_event.action_scope[decision_event.station_idx]\n",
|
||||
" # the value of other stations means their bike inventory if Demand\n",
|
||||
" # Demand: the value of other stations means their bike inventory\n",
|
||||
" target_idx_inventory_tuple_list = [\n",
|
||||
" (k, v) for k, v in decision_event.action_scope.items() if k != decision_event.station_idx\n",
|
||||
" ]\n",
|
||||
" # random choose a target station weighted by the bike inventory\n",
|
||||
" # Randomly choose a target station weighted by the bike inventory\n",
|
||||
" target_idx, target_inventory = random.choices(\n",
|
||||
" target_idx_inventory_tuple_list,\n",
|
||||
" weights=[item[1] for item in target_idx_inventory_tuple_list]\n",
|
||||
" )[0]\n",
|
||||
" # generate the corresponding random Action\n",
|
||||
" # Generate the corresponding random Action\n",
|
||||
" action = Action(\n",
|
||||
" from_station_idx=target_idx,\n",
|
||||
" to_station_idx=decision_event.station_idx,\n",
|
||||
|
@ -332,61 +383,64 @@
|
|||
" else:\n",
|
||||
" action = None\n",
|
||||
" \n",
|
||||
" # Random sampling some records to show in the output TODO\n",
|
||||
"# if random.random() > 0.95:\n",
|
||||
"# print(\"*************\\n{decision_event}\\n{action}\")\n",
|
||||
" # Randomly sample some records to show in the output\n",
|
||||
" if random.random() > 0.95:\n",
|
||||
" print(f\"*************\\n{decision_event}\\n{action}\") # TODO: update the output\n",
|
||||
" \n",
|
||||
" # Respond the environment with the generated Action\n",
|
||||
" reward, decision_event, is_done = env.step(action)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Get the environment observation"
|
||||
" metric, decision_event, is_done = env.step(action)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Get the environment observation\n",
|
||||
"\n",
|
||||
"You can also implement other strategies or build models to take action. At this time, real-time information and historical records of the environment are very important for making good decisions. In this case, the the environment snapshot list is exactly what you need.\n",
|
||||
"\n",
|
||||
"The information in the snapshot list is indexed by 3 dimensions:\n",
|
||||
"- A frame index or a frame index list. (int or list of int) Empty indicates for all time slides till now\n",
|
||||
"- A station id (list). (int of list of int) Empty indicates for all stations/agents\n",
|
||||
"- An Attribute name (list). (str of list of str) You can get all available attributes in env.summary as shown before.\n",
|
||||
"- A frame index (list). (int / List[int]) Empty indicates for all time slides till now\n",
|
||||
"- A station id (list). (int / List[int]) Empty indicates for all stations/agents\n",
|
||||
"- An Attribute name (list). (str / List[str]) You can get all available attributes in `env.summary` as shown before.\n",
|
||||
"\n",
|
||||
"The return value from the snapshot list is a numpy.ndarray with shape **(frame * attribute * station, )**.\n",
|
||||
"The return value from the snapshot list is a numpy.ndarray with shape **(num_frame * num_station * num_attribute, )**.\n",
|
||||
"\n",
|
||||
"More detailed introduction to the snapshot list is [here](). # TODO: add hyper-link"
|
||||
]
|
||||
},
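Before the full demo below, a minimal sketch of the 3-dimensional slicing syntax. It assumes the `env` above has already been stepped past the first frame and that `bikes` and `shortage` appear among the station attributes in `env.summary`:

```python
# A minimal slicing sketch (frame list : station id : attribute list).
frames = [env.frame_index - 1, env.frame_index]  # two recent frame indexes
station_idx = 0                                  # a single station id
attributes = ["bikes", "shortage"]

flat = env.snapshot_list["stations"][frames:station_idx:attributes]
# flat is a one-dimensional numpy.ndarray; with a single station it can be
# viewed as (num_frame, num_attribute). NOTE: the axis order here is an
# assumption based on the shape stated above -- verify against your output.
info = flat.reshape(len(frames), len(attributes))
```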
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"10:55:03 | WARNING | \u001b[33mBinary data files for scenario: citi_bike topology: toy.3s_4t not found.\u001b[0m\n",
|
||||
"10:55:03 | WARNING | \u001b[33mGenerating temp binary data file for scenario: citi_bike topology: toy.3s_4t pid: 77526. If you want to keep the data, please use MARO CLI command 'maro env data generate -s citi_bike -t toy.3s_4t' to generate the binary data files first.\u001b[0m\n",
|
||||
"10:55:03 | INFO | \u001b[32mGenerating trip data for topology toy.3s_4t .\u001b[0m\n",
|
||||
"10:55:04 | INFO | \u001b[32mCleaning weather data\u001b[0m\n",
|
||||
"10:55:04 | INFO | \u001b[32mBuilding binary data from /home/Jinyu/.maro/data/citi_bike/.source/.clean/toy.3s_4t/14444479ac7e4a05/trips.csv to /home/Jinyu/.maro/data/citi_bike/.build/toy.3s_4t/14444479ac7e4a05/trips.bin\u001b[0m\n",
|
||||
"10:55:12 | INFO | \u001b[32mBuilding binary data from /home/Jinyu/.maro/data/citi_bike/.source/.clean/toy.3s_4t/5cbe492813084a6e/weather.csv to /home/Jinyu/.maro/data/citi_bike/.build/toy.3s_4t/5cbe492813084a6e/KNYC_daily.bin\u001b[0m\n",
|
||||
"{'matrices': {'attributes': {...}, 'number': 1},\n",
|
||||
" 'stations': {'attributes': {...}, 'number': 3}}\n",
|
||||
"\n",
|
||||
"Env Summary - matrices:\n",
|
||||
"{'attributes': {'bikes': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'capacity': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'extra_cost': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'failed_return': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'fulfillment': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'holiday': {'slots': 1, 'type': 'i2'},\n",
|
||||
" 'min_bikes': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'shortage': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'temperature': {'slots': 1, 'type': 'i2'},\n",
|
||||
" 'transfer_cost': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'trip_requirement': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'weather': {'slots': 1, 'type': 'i2'},\n",
|
||||
" 'weekday': {'slots': 1, 'type': 'i2'}},\n",
|
||||
" 'number': 528}\n"
|
||||
" 'number': 3}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
@ -396,36 +450,32 @@
|
|||
"\n",
|
||||
"\n",
|
||||
"# Initialize an Env for citi_bike scenario\n",
|
||||
"env = Env(scenario=\"citi_bike\", topology=\"ny201912\", start_tick=0, durations=1440, snapshot_resolution=30)\n",
|
||||
"env = Env(scenario=\"citi_bike\", topology=\"toy.3s_4t\", start_tick=0, durations=1440, snapshot_resolution=30)\n",
|
||||
"\n",
|
||||
"# The summary info of the environment\n",
|
||||
"print(f\"\\nEnv Summary - matrices:\")\n",
|
||||
"# To get the attribute list that can be accessed in snapshot_list\n",
|
||||
"pprint(env.summary['node_detail'], depth=2)\n",
|
||||
"print()\n",
|
||||
"# The attribute list of stations\n",
|
||||
"pprint(env.summary['node_detail']['stations'])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<class 'numpy.ndarray'> (10,)\n",
|
||||
"Trip requirements for station 5 with time going by: [1. 0. 0. 1. 4. 2. 1. 4. 2. 0.]\n",
|
||||
"\n",
|
||||
"<class 'numpy.ndarray'> (10560,)\n",
|
||||
"<class 'numpy.ndarray'> (10, 2, 528)\n",
|
||||
"Station 1: [12. 13. 13. 13. 13. 13. 14. 14. 12. 12.]\n",
|
||||
"Station 3: [19. 18. 21. 20. 18. 17. 16. 16. 15. 16.]\n",
|
||||
"Station 5: [11. 12. 12. 12. 9. 9. 8. 4. 7. 7.]\n",
|
||||
"Station 7: [8. 8. 8. 7. 8. 7. 7. 7. 7. 7.]\n",
|
||||
"Station 9: [15. 15. 15. 16. 17. 18. 22. 21. 23. 23.]\n",
|
||||
"Station 11: [13. 13. 14. 13. 16. 17. 20. 17. 17. 16.]\n",
|
||||
"Station 13: [13. 13. 13. 14. 14. 14. 12. 9. 11. 12.]\n",
|
||||
"Station 17: [27. 28. 28. 27. 29. 28. 31. 32. 32. 33.]\n",
|
||||
"Station 19: [18. 19. 19. 19. 20. 20. 20. 20. 20. 21.]\n"
|
||||
"10:55:12 | WARNING | \u001b[33mBinary data files for scenario: citi_bike topology: toy.3s_4t not found.\u001b[0m\n",
|
||||
"10:55:12 | WARNING | \u001b[33mGenerating temp binary data file for scenario: citi_bike topology: toy.3s_4t pid: 77526. If you want to keep the data, please use MARO CLI command 'maro env data generate -s citi_bike -t toy.3s_4t' to generate the binary data files first.\u001b[0m\n",
|
||||
"10:55:12 | INFO | \u001b[32mGenerating trip data for topology toy.3s_4t .\u001b[0m\n",
|
||||
"10:55:13 | INFO | \u001b[32mCleaning weather data\u001b[0m\n",
|
||||
"10:55:13 | INFO | \u001b[32mBuilding binary data from /home/Jinyu/.maro/data/citi_bike/.source/.clean/toy.3s_4t/0387209299f74786/trips.csv to /home/Jinyu/.maro/data/citi_bike/.build/toy.3s_4t/0387209299f74786/trips.bin\u001b[0m\n",
|
||||
"10:55:21 | INFO | \u001b[32mBuilding binary data from /home/Jinyu/.maro/data/citi_bike/.source/.clean/toy.3s_4t/d01e92f2501f48f2/weather.csv to /home/Jinyu/.maro/data/citi_bike/.build/toy.3s_4t/d01e92f2501f48f2/KNYC_daily.bin\u001b[0m\n",
|
||||
"array([12., 0., 12., 17., 0., 17., 15., 0., 15., 12., 0., 12.],\n",
|
||||
" dtype=float32)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
@ -436,36 +486,33 @@
|
|||
"from typing import List\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Initialize an Env for citi_bike scenario, from 07:00 to 12:00\n",
|
||||
"env = Env(scenario=\"citi_bike\", topology=\"ny201912\", start_tick=420, durations=300, snapshot_resolution=30)\n",
|
||||
"# Initialize an Env for citi_bike scenario\n",
|
||||
"env = Env(scenario=\"citi_bike\", topology=\"toy.3s_4t\", start_tick=0, durations=1440, snapshot_resolution=30)\n",
|
||||
"\n",
|
||||
"# Start the environment with None action\n",
|
||||
"_, decision_event, is_done = env.step(None)\n",
|
||||
"\n",
|
||||
"# Run the environment to the end\n",
|
||||
"_, _, is_done = env.step(None)\n",
|
||||
"while not is_done:\n",
|
||||
" _, _, is_done = env.step(None)\n",
|
||||
" # Case of access snapshot after a certain number of frames\n",
|
||||
" if env.frame_index >= 24:\n",
|
||||
" # The frame list of past 2 hours\n",
|
||||
" past_2hour_frames = [x for x in range(env.frame_index - 4, env.frame_index)]\n",
|
||||
" decision_station_idx = decision_event.station_idx\n",
|
||||
" intr_station_infos = [\"trip_requirement\", \"bikes\", \"shortage\"]\n",
|
||||
"\n",
|
||||
"# Get trip requirement from snapshot list by directly using station id and attribute name\n",
|
||||
"station_id = 5\n",
|
||||
"trip_info = env.snapshot_list[\"stations\"][:station_id:\"trip_requirement\"]\n",
|
||||
"print(type(trip_info), trip_info.shape)\n",
|
||||
"print(f\"Trip requirements for station {station_id} with time going by: {trip_info}\\n\")\n",
|
||||
" # Query the snapshot list of this environment to get the information of\n",
|
||||
" # the trip requirements, bikes, shortage of the decision station in the past 2 days\n",
|
||||
" past_2hour_info = env.snapshot_list[\"stations\"][\n",
|
||||
" past_2hour_frames : decision_station_idx : intr_station_infos\n",
|
||||
" ]\n",
|
||||
" pprint(past_2hour_info)\n",
|
||||
" \n",
|
||||
" # This demo code is used to show how to access the information in snapshot,\n",
|
||||
" # so we terminate the env here for clear output\n",
|
||||
" break\n",
|
||||
"\n",
|
||||
"# Get capacity and bikes from snapshot list simultaneously by using attribute list\n",
|
||||
"attribute_list = [\"capacity\", \"bikes\"]\n",
|
||||
"info = env.snapshot_list[\"stations\"][::attribute_list]\n",
|
||||
"print(type(info), info.shape)\n",
|
||||
"\n",
|
||||
"# Reshape the info of capacity and bikes into a user-friendly shape\n",
|
||||
"num_attributes = len(attribute_list)\n",
|
||||
"num_frame = env.frame_index + 1\n",
|
||||
"num_stations = len(env.agent_idx_list)\n",
|
||||
"info = info.reshape(num_frame, num_attributes, num_stations)\n",
|
||||
"print(type(info), info.shape)\n",
|
||||
"\n",
|
||||
"# Pring and show the change of bikes in some stations:\n",
|
||||
"bikes_idx = 1\n",
|
||||
"for station_id in [1, 3, 5, 7, 9, 11, 13, 17, 19]:\n",
|
||||
" print(f\"Station {station_id}: {info[:, bikes_idx, station_id]}\")"
|
||||
" # Drive the environment with None action\n",
|
||||
" _, decision_event, is_done = env.step(None)"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
@ -490,4 +537,4 @@
|
|||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
}
|
||||
|
|
|
@ -4,117 +4,125 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Quick Start"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below is a simple demo of interaction with the environment."
|
||||
"# Quick Start\n",
|
||||
"\n",
|
||||
"Below is a sample demo of interaction with the environment."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'perf': 0.5, 'total_shortage': 100000, 'total_cost': 0}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from maro.simulator import Env\n",
|
||||
"from maro.simulator.scenarios.ecr.common import Action, DecisionEvent\n",
|
||||
"\n",
|
||||
"env = Env(scenario=\"ecr\", topology=\"toy.5p_ssddd_l0.0\", start_tick=0, durations=100)\n",
|
||||
"\n",
|
||||
"is_done: bool = False\n",
|
||||
"reward: int = None\n",
|
||||
"metrics: object = None\n",
|
||||
"decision_event: DecisionEvent = None\n",
|
||||
"is_done: bool = False\n",
|
||||
"\n",
|
||||
"while not is_done:\n",
|
||||
" action: Action = None\n",
|
||||
" reward, decision_event, is_done = env.step(action)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Environment of Empty Container Repositioning (ECR)"
|
||||
" metrics, decision_event, is_done = env.step(action)\n",
|
||||
"\n",
|
||||
"print(metrics)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Environment of ECR\n",
|
||||
"\n",
|
||||
"To initialize an environment, you need to specify the values of several parameters:\n",
|
||||
"- **scenario**: The target scenario of this Env. \"ecr\" denotes for the Empty Container Repositioning.\n",
|
||||
"- **topology**: The target topology of this Env.\n",
|
||||
" + There are some predefined topologies in MARO, that you can directly use it as in the demo.\n",
|
||||
" + Also, you can define your own topologies following the guidance in the [doc](docs/customization/new_topology.rst).\n",
|
||||
"- **start_tick**: The start tick of this Env, **1 tick corresponds to 1 minute in ecr.** (TODO: to confirm)\n",
|
||||
" + In the demo above, *start_tick=0* indicates a simulation start from the beginning of the given topology.\n",
|
||||
"- **durations**: The duration of thie Env, **in the unit of tick/minute**.(TODO: to confirm)\n",
|
||||
" + In the demo above, *durations=1440* indicates a simulation length of 1 day (24h * 60min/h).\n",
|
||||
"- **scenario**: The target scenario of this Env.\n",
|
||||
" - `ecr` denotes for the Empty Container Repositioning (ECR).\n",
|
||||
"- **topology**: The target topology of this Env. As shown below, you can get the predefined topology list by calling `get_topologies(scenario='ecr')`.\n",
|
||||
"- **start_tick**: The start tick of this Env, 1 tick corresponds to 1 day in ecr.\n",
|
||||
" - In the demo above, `start_tick=0` indicates a simulation start from the beginning of the given topology.\n",
|
||||
"- **durations**: The duration of thie Env, in the unit of tick/day.\n",
|
||||
" - In the demo above, `durations=100` indicates a simulation length of 100 days.\n",
|
||||
"\n",
|
||||
"You can get all available scenarios and topologies by calling:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'scenario': 'citi_bike', 'topology': 'ny201912'},\n",
|
||||
" {'scenario': 'citi_bike', 'topology': 'train'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.0'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.3'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '5p_ssddd_l0.6'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '5p_ssddd_l0.5'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.4'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.5'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '5p_ssddd_l0.2'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '22p_global_trade_l0.1'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.6'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.8'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '22p_global_trade_l0.3'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '22p_global_trade_l0.4'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '5p_ssddd_l0.8'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.2'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.7'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '22p_global_trade_l0.2'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '5p_ssddd_l0.3'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '4p_ssdd_l0.7'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '4p_ssdd_l0.6'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '4p_ssdd_l0.5'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '22p_global_trade_l0.0'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '5p_ssddd_l0.1'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '5p_ssddd_l0.0'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '4p_ssdd_l0.1'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '4p_ssdd_l0.4'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '6p_sssbdd_l0.1'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '4p_ssdd_l0.3'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '5p_ssddd_l0.4'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '4p_ssdd_l0.8'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '22p_global_trade_l0.6'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '22p_global_trade_l0.7'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '5p_ssddd_l0.7'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '22p_global_trade_l0.5'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '22p_global_trade_l0.8'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '4p_ssdd_l0.2'},\n",
|
||||
" {'scenario': 'ecr', 'topology': '4p_ssdd_l0.0'}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"'The available scenarios in MARO:'\n",
|
||||
"['citi_bike', 'ecr']\n",
|
||||
"\n",
|
||||
"'The predefined topologies in ECR:'\n",
|
||||
"['toy.6p_sssbdd_l0.3',\n",
|
||||
" 'toy.4p_ssdd_l0.2',\n",
|
||||
" 'global_trade.22p_l0.7',\n",
|
||||
" 'toy.5p_ssddd_l0.1',\n",
|
||||
" 'toy.6p_sssbdd_l0.1',\n",
|
||||
" 'toy.5p_ssddd_l0.8',\n",
|
||||
" 'toy.6p_sssbdd_l0.8',\n",
|
||||
" 'toy.6p_sssbdd_l0.2',\n",
|
||||
" 'toy.6p_sssbdd_l0.5',\n",
|
||||
" 'toy.6p_sssbdd_l0.0',\n",
|
||||
" 'toy.6p_sssbdd_l0.7',\n",
|
||||
" 'toy.5p_ssddd_l0.3',\n",
|
||||
" 'toy.4p_ssdd_l0.3',\n",
|
||||
" 'toy.5p_ssddd_l0.5',\n",
|
||||
" 'toy.4p_ssdd_l0.8',\n",
|
||||
" 'toy.4p_ssdd_l0.1',\n",
|
||||
" 'toy.6p_sssbdd_l0.4',\n",
|
||||
" 'toy.5p_ssddd_l0.7',\n",
|
||||
" 'global_trade.22p_l0.3',\n",
|
||||
" 'global_trade.22p_l0.8',\n",
|
||||
" 'global_trade.22p_l0.2',\n",
|
||||
" 'toy.4p_ssdd_l0.4',\n",
|
||||
" 'toy.6p_sssbdd_l0.6',\n",
|
||||
" 'toy.4p_ssdd_l0.7',\n",
|
||||
" 'toy.5p_ssddd_l0.4',\n",
|
||||
" 'global_trade.22p_l0.1',\n",
|
||||
" 'global_trade.22p_l0.0',\n",
|
||||
" 'global_trade.22p_l0.6',\n",
|
||||
" 'toy.4p_ssdd_l0.5',\n",
|
||||
" 'toy.5p_ssddd_l0.2',\n",
|
||||
" 'toy.4p_ssdd_l0.0',\n",
|
||||
" 'global_trade.22p_l0.5',\n",
|
||||
" 'toy.5p_ssddd_l0.0',\n",
|
||||
" 'toy.5p_ssddd_l0.6',\n",
|
||||
" 'global_trade.22p_l0.4',\n",
|
||||
" 'toy.4p_ssdd_l0.6']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from maro.simulator.utils import get_available_envs\n",
|
||||
"from maro.simulator.utils import get_scenarios, get_topologies\n",
|
||||
"from pprint import pprint\n",
|
||||
"from typing import List\n",
|
||||
"\n",
|
||||
"get_available_envs() # TODO: specify the scenario"
|
||||
"scenarios: List[str] = get_scenarios()\n",
|
||||
"topologies: List[str] = get_topologies(scenario='ecr')\n",
|
||||
"\n",
|
||||
"pprint(f'The available scenarios in MARO:')\n",
|
||||
"pprint(scenarios)\n",
|
||||
"\n",
|
||||
"print()\n",
|
||||
"pprint(f'The predefined topologies in ECR:') # TODO: update the ordered output\n",
|
||||
"pprint(topologies)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@ -126,7 +134,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
|
@ -139,18 +147,70 @@
|
|||
"There will be 100 snapshots in total.\n",
|
||||
"\n",
|
||||
"Env Summary:\n",
|
||||
"{'node_detail': {},\n",
|
||||
" 'node_mapping': {'ports': {0: 'demand_port_001',\n",
|
||||
" 1: 'demand_port_002',\n",
|
||||
" 2: 'supply_port_001',\n",
|
||||
" 3: 'supply_port_002',\n",
|
||||
" 4: 'transfer_port_001'},\n",
|
||||
" 'vessels': {0: 'rt1_vessel_001',\n",
|
||||
" 1: 'rt1_vessel_002',\n",
|
||||
" 2: 'rt1_vessel_003',\n",
|
||||
" 3: 'rt2_vessel_001',\n",
|
||||
" 4: 'rt2_vessel_002',\n",
|
||||
" 5: 'rt2_vessel_003'}}}\n"
|
||||
"{'node_detail': {'matrices': {'attributes': {'full_on_ports': {'slots': 25,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'full_on_vessels': {'slots': 30,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'vessel_plans': {'slots': 30,\n",
|
||||
" 'type': 'i'}},\n",
|
||||
" 'number': 1},\n",
|
||||
" 'ports': {'attributes': {'acc_booking': {'slots': 1,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'acc_fulfillment': {'slots': 1,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'acc_shortage': {'slots': 1,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'booking': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'capacity': {'slots': 1, 'type': 'f'},\n",
|
||||
" 'empty': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'fulfillment': {'slots': 1,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'full': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'on_consignee': {'slots': 1,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'on_shipper': {'slots': 1,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'shortage': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'transfer_cost': {'slots': 1,\n",
|
||||
" 'type': 'f'}},\n",
|
||||
" 'number': 5},\n",
|
||||
" 'vessels': {'attributes': {'capacity': {'slots': 1,\n",
|
||||
" 'type': 'f'},\n",
|
||||
" 'early_discharge': {'slots': 1,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'empty': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'full': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'future_stop_list': {'slots': 3,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'future_stop_tick_list': {'slots': 3,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'last_loc_idx': {'slots': 1,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'next_loc_idx': {'slots': 1,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'past_stop_list': {'slots': 4,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'past_stop_tick_list': {'slots': 4,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'remaining_space': {'slots': 1,\n",
|
||||
" 'type': 'i'},\n",
|
||||
" 'route_idx': {'slots': 1,\n",
|
||||
" 'type': 'i'}},\n",
|
||||
" 'number': 6}},\n",
|
||||
" 'node_mapping': {'ports': {'demand_port_001': 0,\n",
|
||||
" 'demand_port_002': 1,\n",
|
||||
" 'supply_port_001': 2,\n",
|
||||
" 'supply_port_002': 3,\n",
|
||||
" 'transfer_port_001': 4},\n",
|
||||
" 'vessels': {'rt1_vessel_001': 0,\n",
|
||||
" 'rt1_vessel_002': 1,\n",
|
||||
" 'rt1_vessel_003': 2,\n",
|
||||
" 'rt2_vessel_001': 3,\n",
|
||||
" 'rt2_vessel_002': 4,\n",
|
||||
" 'rt2_vessel_003': 5}}}\n",
|
||||
"\n",
|
||||
"Env Metrics:\n",
|
||||
"{'perf': 1, 'total_shortage': 0, 'total_cost': 0}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
@ -162,7 +222,7 @@
|
|||
"\n",
|
||||
"\n",
|
||||
"# Initialize an Env for ECR scenario\n",
|
||||
"env = Env(scenario=\"ecr\", topology=\"5p_ssddd_l0.0\", start_tick=0, durations=100)\n",
|
||||
"env = Env(scenario=\"ecr\", topology=\"toy.5p_ssddd_l0.0\", start_tick=0, durations=100)\n",
|
||||
"\n",
|
||||
"# The current tick\n",
|
||||
"tick: int = env.tick\n",
|
||||
|
@ -184,72 +244,53 @@
|
|||
"# The summary info of the environment\n",
|
||||
"summary: dict = env.summary\n",
|
||||
"print(f\"\\nEnv Summary:\")\n",
|
||||
"pprint(summary, depth=3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Interaction with the environment"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Before starting interaction with the environment, we need to know DecisionEvent and Action first."
|
||||
"pprint(summary)\n",
|
||||
"\n",
|
||||
"# The metrics of the environment\n",
|
||||
"metrics: dict = env.metrics\n",
|
||||
"print(f\"\\nEnv Metrics:\")\n",
|
||||
"pprint(metrics)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Interaction with the environment\n",
|
||||
"\n",
|
||||
"Before starting interaction with the environment, we need to know `DecisionEvent` and `Action` first.\n",
|
||||
"\n",
|
||||
"## DecisionEvent\n",
|
||||
"\n",
|
||||
"Once the environment need the agent's response to promote the simulation, it will throw an **DecisionEvent**. In the scenario of ECR, the information of each DecisionEvent is listed as below:\n",
|
||||
"- **tick**: (int) the corresponding tick\n",
|
||||
"- **port_idx**: (int) the id of the port/agent that needs to respond to the environment\n",
|
||||
"- **vessel_idx**: (int) the id of the vessel/operation object of the port/agnet.\n",
|
||||
"- **snapshot_list**: (int) **Snapshots of the environment to input into the decision model** TODO: confirm the meaning\n",
|
||||
"- **action_scope**: **Load and discharge scope for agent to generate decision**\n",
|
||||
"- **early_discharge**: **Early discharge number of corresponding vessel**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Once the environment need the agent's response to promote the simulation, it will throw an `DecisionEvent`. In the scenario of ECR, the information of each `DecisionEvent` is listed as below:\n",
|
||||
"- **tick**: (int) The corresponding tick;\n",
|
||||
"- **port_idx**: (int) The id of the port/agent that needs to respond to the environment;\n",
|
||||
"- **vessel_idx**: (int) The id of the vessel/operation object of the port/agnet;\n",
|
||||
"- **action_scope**: (ActionScope) ActionScope has two attributes:\n",
|
||||
" - `load` indicates the maximum quantity that can be loaded from the port the vessel;\n",
|
||||
" - `discharge` indicates the maximum quantity that can be discharged from the vessel to the port;\n",
|
||||
"- **early_discharge**: (int) When the available capacity in the vessel is not enough to load the ladens, some of the empty containers in the vessel will be early discharged to free the space. The quantity of empty containers that have been early discharged due to the laden loading is recorded in this field.\n",
|
||||
"\n",
|
||||
"## Action\n",
|
||||
"\n",
|
||||
"Once we get a DecisionEvent from the envirionment, we should respond with an Action. Valid Action could be:\n",
|
||||
"Once we get a `DecisionEvent` from the envirionment, we should respond with an `Action`. Valid `Action` could be:\n",
|
||||
"\n",
|
||||
"- None, which means do nothing.\n",
|
||||
"- A valid Action instance, including:\n",
|
||||
" + **vessel_idx**: (int) the id of the vessel/operation object of the port/agent.\n",
|
||||
" + **port_idx**: (int) the id of the port/agent that take this action.\n",
|
||||
" + **quantity**: (int) the sign of this value denotes different meanings:\n",
|
||||
" * positive quantity means unloading empty containers from vessel to port.\n",
|
||||
" * negative quantity means loading empty containers from port to vessel."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Generate random actions based on the DecisionEvent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The demo code in the Quick Start part has shown an interaction mode that doing nothing(responding with None action). Here we read the detailed information about the DecisionEvent and generate random actions based on it."
|
||||
"- `None`, which means do nothing.\n",
|
||||
"- A valid `Action` instance, including:\n",
|
||||
" - **vessel_idx**: (int) The id of the vessel/operation object of the port/agent;\n",
|
||||
" - **port_idx**: (int) The id of the port/agent that take this action;\n",
|
||||
" - **quantity**: (int) The sign of this value denotes different meanings:\n",
|
||||
" - Positive quantity means unloading empty containers from vessel to port;\n",
|
||||
" - Negative quantity means loading empty containers from port to vessel.\n",
|
||||
"\n",
|
||||
"## Generate random actions based on the DecisionEvent\n",
|
||||
"\n",
|
||||
"The demo code in the Quick Start part has shown an interaction mode that doing nothing(responding with `None` action). Here we read the detailed information from the `DecisionEvent` and generate random `Action` based on it."
|
||||
]
|
||||
},
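Besides random sampling, these fields are enough for simple deterministic policies. The sketch below is hypothetical, not part of the notebook; it assumes only the `action_scope.load` / `action_scope.discharge` attributes and the `Action` constructor documented above.

```python
# A minimal deterministic sketch: discharge as much as the scope allows,
# otherwise load everything the scope allows.
from maro.simulator.scenarios.ecr.common import Action

def simple_action(decision_event):
    scope = decision_event.action_scope
    # Positive quantity = unload empties to the port; negative = load onto the vessel.
    quantity = scope.discharge if scope.discharge > 0 else -scope.load
    return Action(
        vessel_idx=decision_event.vessel_idx,
        port_idx=decision_event.port_idx,
        quantity=quantity
    )
```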
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
|
@ -257,23 +298,14 @@
|
|||
"output_type": "stream",
|
||||
"text": [
|
||||
"*************\n",
|
||||
"DecisionEvent(tick=7, port_idx=3, vessel_idx=1, action_scope=ActionScope {load: 20000, discharge: 0 })\n",
|
||||
"Action {quantity: -6886, port: 3, vessel: 1 }\n",
|
||||
"DecisionEvent(tick=7, port_idx=4, vessel_idx=2, action_scope=ActionScope {load: 12000, discharge: 0 })\n",
|
||||
"Action {quantity: -3730, port: 4, vessel: 2 }\n",
|
||||
"*************\n",
|
||||
"DecisionEvent(tick=14, port_idx=3, vessel_idx=0, action_scope=ActionScope {load: 13114, discharge: 2073 })\n",
|
||||
"Action {quantity: -6744, port: 3, vessel: 0 }\n",
|
||||
"DecisionEvent(tick=56, port_idx=0, vessel_idx=5, action_scope=ActionScope {load: 692, discharge: 2988 })\n",
|
||||
"Action {quantity: 1532, port: 0, vessel: 5 }\n",
|
||||
"*************\n",
|
||||
"DecisionEvent(tick=21, port_idx=0, vessel_idx=4, action_scope=ActionScope {load: 389, discharge: 5977 })\n",
|
||||
"Action {quantity: 1936, port: 0, vessel: 4 }\n",
|
||||
"*************\n",
|
||||
"DecisionEvent(tick=42, port_idx=3, vessel_idx=2, action_scope=ActionScope {load: 29092, discharge: 6041 })\n",
|
||||
"Action {quantity: 5316, port: 3, vessel: 2 }\n",
|
||||
"*************\n",
|
||||
"DecisionEvent(tick=77, port_idx=3, vessel_idx=0, action_scope=ActionScope {load: 26402, discharge: 12393 })\n",
|
||||
"Action {quantity: -14075, port: 3, vessel: 0 }\n",
|
||||
"*************\n",
|
||||
"DecisionEvent(tick=91, port_idx=0, vessel_idx=3, action_scope=ActionScope {load: 0, discharge: 6462 })\n",
|
||||
"Action {quantity: 6194, port: 0, vessel: 3 }\n"
|
||||
"DecisionEvent(tick=77, port_idx=1, vessel_idx=3, action_scope=ActionScope {load: 0, discharge: 538 })\n",
|
||||
"Action {quantity: 218, port: 1, vessel: 3 }\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
@ -283,71 +315,81 @@
|
|||
"\n",
|
||||
"import random\n",
|
||||
"\n",
|
||||
"env = Env(scenario=\"ecr\", topology=\"5p_ssddd_l0.0\", start_tick=0, durations=100)\n",
|
||||
"# Initialize an Env for ecr scenario\n",
|
||||
"env = Env(scenario=\"ecr\", topology=\"toy.5p_ssddd_l0.0\", start_tick=0, durations=100)\n",
|
||||
"\n",
|
||||
"is_done: bool = False\n",
|
||||
"reward: int = None\n",
|
||||
"metrics: object = None\n",
|
||||
"decision_event: DecisionEvent = None\n",
|
||||
"is_done: bool = False\n",
|
||||
"action: Action = None\n",
|
||||
"\n",
|
||||
"reward, decision_event, is_done = env.step(None)\n",
|
||||
"# Start the env with a None Action\n",
|
||||
"metrics, decision_event, is_done = env.step(None)\n",
|
||||
"\n",
|
||||
"while not is_done:\n",
|
||||
" # Generate a random Action according to the action_scope in DecisionEvent\n",
|
||||
" random_quantity = random.randint(\n",
|
||||
" -decision_event.action_scope.load,\n",
|
||||
" decision_event.action_scope.discharge\n",
|
||||
" )\n",
|
||||
" action = Action(\n",
|
||||
" vessel_idx=decision_event.vessel_idx,\n",
|
||||
" port_idx=decision_event.port_idx,\n",
|
||||
" quantity=random.randint(-decision_event.action_scope.load, decision_event.action_scope.discharge)\n",
|
||||
" quantity=random_quantity\n",
|
||||
" )\n",
|
||||
" # random sampling some records to show in the output\n",
|
||||
" \n",
|
||||
" # Randomly sample some records to show in the output\n",
|
||||
" if random.random() > 0.95:\n",
|
||||
" print(f\"*************\\n{decision_event}\\n{action}\")\n",
|
||||
" reward, decision_event, is_done = env.step(action) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Get the environment observation"
|
||||
" \n",
|
||||
" # Respond the environment with the generated Action\n",
|
||||
" metrics, decision_event, is_done = env.step(action)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Get the environment observation\n",
|
||||
"\n",
|
||||
"You can also implement other strategies or build models to take action. At this time, real-time information and historical records of the environment are very important for making good decisions. In this case, the the environment snapshot list is exactly what you need.\n",
|
||||
"\n",
|
||||
"The information in the snapshot list is indexed by 3 dimensions:\n",
|
||||
"- A frame index or a frame index list. (int or list of int) Empty indicates for all time slides till now\n",
|
||||
"- A station id (list). (int of list of int) Empty indicates for all ports/agents\n",
|
||||
"- An Attribute name (list). (str of list of str) You can get all available attributes in env.summary as shown before.\n",
|
||||
"- A tick index (list). (int / List[int]) Empty indicates for all time slides till now;\n",
|
||||
"- A port id (list). (int / List[int]) Empty indicates for all ports/agents;\n",
|
||||
"- An attribute name (list). (str / List[str]) You can get all available attributes in `env.summary` as shown before.\n",
|
||||
"\n",
|
||||
"The return value from the snapshot list is a numpy.ndarray with shape **(frame * attribute * station, )**.\n",
|
||||
"The return value from the snapshot list is a numpy.ndarray with shape **(num_tick * num_port * num_attribute, )**.\n",
|
||||
"\n",
|
||||
"More detailed introduction to the snapshot list is [here](). # TODO: add hyper-link"
|
||||
"More detailed introduction to the snapshot list is [here](docs/_build/html/key_components/data_model.html#advanced-features). # TODO: add hyper-link"
|
||||
]
|
||||
},
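Before the full demo below, a minimal sketch of the 3-dimensional slicing for ECR. It assumes the `env` above has already been stepped past tick 7 and that `booking` and `shortage` appear among the port attributes in `env.summary`:

```python
# A minimal slicing sketch (tick list : port id : attribute list).
ticks = [x for x in range(env.tick - 7, env.tick)]  # the past week
port_idx = 0                                        # a single port id
attributes = ["booking", "shortage"]

flat = env.snapshot_list["ports"][ticks:port_idx:attributes]
# flat is a one-dimensional numpy.ndarray; see the shape note above for
# how it flattens across ticks and attributes.
```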
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'node_detail': {},\n",
|
||||
" 'node_mapping': {'ports': {0: 'demand_port_001',\n",
|
||||
" 1: 'demand_port_002',\n",
|
||||
" 2: 'supply_port_001',\n",
|
||||
" 3: 'supply_port_002',\n",
|
||||
" 4: 'transfer_port_001'},\n",
|
||||
" 'vessels': {0: 'rt1_vessel_001',\n",
|
||||
" 1: 'rt1_vessel_002',\n",
|
||||
" 2: 'rt1_vessel_003',\n",
|
||||
" 3: 'rt2_vessel_001',\n",
|
||||
" 4: 'rt2_vessel_002',\n",
|
||||
" 5: 'rt2_vessel_003'}}}\n"
|
||||
"{'matrices': {'attributes': {...}, 'number': 1},\n",
|
||||
" 'ports': {'attributes': {...}, 'number': 5},\n",
|
||||
" 'vessels': {'attributes': {...}, 'number': 6}}\n",
|
||||
"\n",
|
||||
"{'attributes': {'acc_booking': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'acc_fulfillment': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'acc_shortage': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'booking': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'capacity': {'slots': 1, 'type': 'f'},\n",
|
||||
" 'empty': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'fulfillment': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'full': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'on_consignee': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'on_shipper': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'shortage': {'slots': 1, 'type': 'i'},\n",
|
||||
" 'transfer_cost': {'slots': 1, 'type': 'f'}},\n",
|
||||
" 'number': 5}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
@ -355,64 +397,27 @@
|
|||
"from maro.simulator import Env\n",
|
||||
"from pprint import pprint\n",
|
||||
"\n",
|
||||
"env = Env(scenario=\"ecr\", topology=\"5p_ssddd_l0.0\", start_tick=0, durations=100)\n",
|
||||
"env = Env(scenario=\"ecr\", topology=\"toy.5p_ssddd_l0.0\", start_tick=0, durations=100)\n",
|
||||
"\n",
|
||||
"pprint(env.summary)"
|
||||
"# To get the attribute list that can be accessed in snapshot_list\n",
|
||||
"pprint(env.summary['node_detail'], depth=2)\n",
|
||||
"print()\n",
|
||||
"# The attribute list of ports\n",
|
||||
"pprint(env.summary['node_detail']['ports'])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<class 'numpy.ndarray'> (5,)\n",
|
||||
"Port 0 capacity: 100000.0\n",
|
||||
"Port 1 capacity: 100000.0\n",
|
||||
"Port 2 capacity: 100000.0\n",
|
||||
"Port 3 capacity: 100000.0\n",
|
||||
"Port 4 capacity: 100000.0\n",
|
||||
"\n",
|
||||
"<class 'numpy.ndarray'> (1000,)\n",
|
||||
"<class 'numpy.ndarray'> (100, 2, 5)\n",
|
||||
"Port 0: [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 500. 500.\n",
|
||||
" 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500.\n",
|
||||
" 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500.\n",
|
||||
" 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500.\n",
|
||||
" 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500.\n",
|
||||
" 500. 500.]\n",
|
||||
"Port 1: [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 500. 500.\n",
|
||||
" 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500.\n",
|
||||
" 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500.\n",
|
||||
" 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500.\n",
|
||||
" 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500. 500.\n",
|
||||
" 500. 500.]\n",
|
||||
"Port 2: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0.]\n",
|
||||
"Port 3: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0.]\n",
|
||||
"Port 4: [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
|
||||
" 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000.\n",
|
||||
" 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000.\n",
|
||||
" 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000. 1000.\n",
|
||||
" 1000. 1000. 1000. 1000.]\n"
|
||||
"array([1000., 0., 1000., 1000., 0., 1000., 1000., 0., 1000.,\n",
|
||||
" 1000., 0., 1000., 1000., 0., 1000., 1000., 0., 1000.,\n",
|
||||
" 1000., 0., 1000.], dtype=float32)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
@ -424,36 +429,34 @@
|
|||
"\n",
|
||||
"\n",
|
||||
"# Initialize an Env for ECR scenario\n",
|
||||
"env = Env(scenario=\"ecr\", topology=\"5p_ssddd_l0.0\", start_tick=0, durations=100)\n",
|
||||
"env = Env(scenario=\"ecr\", topology=\"toy.5p_ssddd_l0.0\", start_tick=0, durations=100)\n",
|
||||
"\n",
|
||||
"# Start the environment with None action\n",
|
||||
"_, decision_event, is_done = env.step(None)\n",
|
||||
"\n",
|
||||
"# Run the environment to the end\n",
|
||||
"_, _, is_done = env.step(None)\n",
|
||||
"while not is_done:\n",
|
||||
" _, _, is_done = env.step(None)\n",
|
||||
" # Case of access snapshot after a certain number of ticks\n",
|
||||
" if env.tick >= 80:\n",
|
||||
" # The tick list of past 1 week\n",
|
||||
" past_week_ticks = [x for x in range(env.tick - 7, env.tick)]\n",
|
||||
" # The port index of the current decision_event\n",
|
||||
" decision_port_idx = decision_event.port_idx\n",
|
||||
" # The attribute list to access \n",
|
||||
" intr_port_infos = [\"booking\", \"empty\", \"shortage\"]\n",
|
||||
"\n",
|
||||
"# Get the capacity info for each ports by directly using initial frame index and attribute name\n",
|
||||
"capacity = env.snapshot_list[\"ports\"][0::\"capacity\"]\n",
|
||||
"print(type(capacity), capacity.shape)\n",
|
||||
"for i in range(len(env.agent_idx_list)):\n",
|
||||
" print(f\"Port {i} capacity: {capacity[i]}\")\n",
|
||||
"print()\n",
|
||||
" \n",
|
||||
"# Get fulfillment and shortage info simultaneously by using attribute list\n",
|
||||
"attribute_list = [\"fulfillment\", \"shortage\"]\n",
|
||||
"info = env.snapshot_list[\"ports\"][::attribute_list]\n",
|
||||
"print(type(info), info.shape)\n",
|
||||
" # Query the snapshot list of this environment to get the information of\n",
|
||||
" # the booking, empty, shortage of the decision port in the past week\n",
|
||||
" past_week_info = env.snapshot_list[\"ports\"][\n",
|
||||
" past_week_ticks : decision_port_idx : intr_port_infos\n",
|
||||
" ]\n",
|
||||
" pprint(past_week_info)\n",
|
||||
" \n",
|
||||
" # This demo code is used to show how to access the information in snapshot,\n",
|
||||
" # so we terminate the env here for clear output\n",
|
||||
" break\n",
|
||||
"\n",
|
||||
"# Reshape the info of fulfillment and shortage into a user-friendly shape\n",
|
||||
"num_attributes = len(attribute_list)\n",
|
||||
"num_frame = env.frame_index + 1\n",
|
||||
"num_ports = len(env.agent_idx_list)\n",
|
||||
"info = info.reshape(num_frame, num_attributes, num_ports)\n",
|
||||
"print(type(info), info.shape)\n",
|
||||
"\n",
|
||||
"# Pring and show the change of shortage in each port:\n",
|
||||
"shortage_idx = 1\n",
|
||||
"for port_id in env.agent_idx_list:\n",
|
||||
" print(f\"Port {port_id}: {info[:, shortage_idx, port_id]}\")"
|
||||
" # Drive the environment with None action\n",
|
||||
" _, decision_event, is_done = env.step(None)"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
@ -478,4 +481,4 @@
|
|||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
}
|
||||
|
|
|
@ -1,15 +1,15 @@
|
|||
jupyter==1.0.0
|
||||
jupyter-client==5.3.4
|
||||
jupyter-console==6.0.0
|
||||
jupyter-contrib-core==0.3.3
|
||||
jupyter-contrib-nbextensions==0.5.1
|
||||
jupyter-core==4.6.1
|
||||
jupyter-highlight-selected-word==0.2.0
|
||||
jupyter-latex-envs==1.4.6
|
||||
jupyter-nbextensions-configurator==0.4.1
|
||||
jupyterlab==1.2.3
|
||||
jupyterlab-server==1.0.6
|
||||
jupyterthemes==0.20.0
|
||||
jupyter-client
|
||||
jupyter-console
|
||||
jupyter-contrib-core
|
||||
jupyter-contrib-nbextensions
|
||||
jupyter-core
|
||||
jupyter-highlight-selected-word
|
||||
jupyter-latex-envs
|
||||
jupyter-nbextensions-configurator
|
||||
jupyterlab
|
||||
jupyterlab-server
|
||||
jupyterthemes
|
||||
isort==4.3.21
|
||||
autopep8==1.4.4
|
||||
isort==4.3.21
|
||||
|
|
|
@ -1,2 +1,2 @@
|
|||
[build-system]
|
||||
requires = ["setuptools", "wheel", "numpy == 1.19.1"]
|
||||
requires = ["setuptools", "wheel", "numpy == 1.19.1"]
|
||||
|
|
|
@ -1,8 +1,11 @@
|
|||
#!/bin/bash
|
||||
|
||||
# script to build maro locally on linux/mac, usually for development
|
||||
|
||||
cd "$(dirname $(readlink -f $0))/.."
|
||||
if [[ "$OSTYPE" == "linux-gnu"* ]]; then
|
||||
cd "$(dirname $(readlink -f $0))/.."
|
||||
elif [[ "$OSTYPE" == "darwin"* ]]; then
|
||||
cd "$(cd "$(dirname "$0")"; pwd -P)/.."
|
||||
fi
|
||||
|
||||
# compile cython files first
|
||||
bash ./scripts/compile_cython.sh
|
||||
|
|
|
@ -2,7 +2,11 @@
|
|||
|
||||
# script to build docker for playground image on linux/mac, this require the source code of maro
|
||||
|
||||
cd "$(dirname $(readlink -f $0))/.."
|
||||
if [[ "$OSTYPE" == "linux-gnu"* ]]; then
|
||||
cd "$(dirname $(readlink -f $0))/.."
|
||||
elif [[ "$OSTYPE" == "darwin"* ]]; then
|
||||
cd "$(cd "$(dirname "$0")"; pwd -P)/.."
|
||||
fi
|
||||
|
||||
bash ./scripts/compile_cython.sh
|
||||
|
||||
|
|
|
@ -2,7 +2,11 @@
|
|||
|
||||
# script to create source package on linux
|
||||
|
||||
cd "$(dirname $(readlink -f $0))/.."
|
||||
if [[ "$OSTYPE" == "linux-gnu"* ]]; then
|
||||
cd "$(dirname $(readlink -f $0))/.."
|
||||
elif [[ "$OSTYPE" == "darwin"* ]]; then
|
||||
cd "$(cd "$(dirname "$0")"; pwd -P)/.."
|
||||
fi
|
||||
|
||||
bash ./scripts/compile_cython.sh
|
||||
|
||||
|
|
|
@ -1,6 +1,10 @@
|
|||
|
||||
|
||||
cd "$(dirname $(readlink -f $0))/.."
|
||||
if [[ "$OSTYPE" == "linux-gnu"* ]]; then
|
||||
cd "$(dirname $(readlink -f $0))/.."
|
||||
elif [[ "$OSTYPE" == "darwin"* ]]; then
|
||||
cd "$(cd "$(dirname "$0")"; pwd -P)/.."
|
||||
fi
|
||||
|
||||
pip install -r ./maro/requirements.build.txt
|
||||
|
||||
|
|
|
@ -0,0 +1,17 @@
|
|||
@ECHO OFF
|
||||
|
||||
rem Script to install MARO in editable mode on Windows,
|
||||
rem usually for development.
|
||||
|
||||
chdir "%~dp0.."
|
||||
|
||||
rem Install dependencies.
|
||||
pip install -r .\maro\requirements.build.txt
|
||||
|
||||
rem Compile cython files.
|
||||
call .\scripts\compile_cython.bat
|
||||
|
||||
call .\scripts\install_torch.bat
|
||||
|
||||
rem Install MARO in editable mode.
|
||||
pip install -e .
|
|
@ -0,0 +1,19 @@
|
|||
#!/bin/bash
|
||||
|
||||
# Script to install maro in editable mode on linux/darwin,
|
||||
# usually for development.
|
||||
|
||||
if [[ "$OSTYPE" == "linux-gnu"* ]]; then
|
||||
cd "$(dirname $(readlink -f $0))/.."
|
||||
elif [[ "$OSTYPE" == "darwin"* ]]; then
|
||||
cd "$(cd "$(dirname "$0")"; pwd -P)/.."
|
||||
fi
|
||||
|
||||
# Install dependencies.
|
||||
pip install -r ./maro/requirements.build.txt
|
||||
|
||||
# Compile cython files.
|
||||
bash scripts/compile_cython.sh
|
||||
|
||||
# Install MARO in editable mode.
|
||||
pip install -e .
|
|
@ -2,8 +2,8 @@
|
|||
|
||||
# script to start playground environment within docker container
|
||||
|
||||
./redis-6.0.6/src/redis-server -p 6379 &
|
||||
redis-commander -p 40009 &
|
||||
./redis-6.0.6/src/redis-server --port 6379 &
|
||||
redis-commander --port 40009 &
|
||||
|
||||
# Python 3.6
|
||||
cd ./docs/_build/html; python -m http.server 40010 -b 0.0.0.0 &
|
||||
|
|
|
@ -8,7 +8,7 @@ call scripts/build_maro.bat
|
|||
|
||||
rem install requirements
|
||||
|
||||
pip install -r ./tests/requirements.txt
|
||||
pip install -r ./tests/requirements.test.txt
|
||||
|
||||
rem show coverage
|
||||
|
||||
|
|
|
@ -1,6 +1,10 @@
|
|||
#!/bin/bash
|
||||
|
||||
cd "$(dirname $(readlink -f $0))/.."
|
||||
if [[ "$OSTYPE" == "linux-gnu"* ]]; then
|
||||
cd "$(dirname $(readlink -f $0))/.."
|
||||
elif [[ "$OSTYPE" == "darwin"* ]]; then
|
||||
cd "$(cd "$(dirname "$0")"; pwd -P)/.."
|
||||
fi
|
||||
|
||||
bash ./scripts/build_maro.sh
|
||||
|
||||
|
@ -9,7 +13,7 @@ bash ./scripts/build_maro.sh
|
|||
export PYTHONPATH="."
|
||||
|
||||
# install requirements
|
||||
pip install -r ./tests/requirements.txt
|
||||
pip install -r ./tests/requirements.test.txt
|
||||
|
||||
coverage run --rcfile=./tests/.coveragerc
|
||||
|
||||
|
|
1
setup.py
1
setup.py
|
@ -109,7 +109,6 @@ setup(
|
|||
"azure-storage-common==2.1.0",
|
||||
"geopy==2.0.0",
|
||||
"pandas==0.25.3",
|
||||
"pycurl==7.43.0.5",
|
||||
"PyYAML==5.3.1"
|
||||
],
|
||||
entry_points={
|
||||
|
|
|
@@ -0,0 +1,42 @@

decision:
  extra_cost_mode: source # how to assign extra cost; available values: source, target, target_neighbors

  resolution: 1 # frequency to check if a cell needs an action

  # random factors to set bike transfer time
  effective_time_mean: 20
  effective_time_std: 10

  # these two watermarks affect whether a decision should be generated
  supply_water_mark_ratio: 0.8
  demand_water_mark_ratio: 0.001

  # ratio of action
  action_scope:
    low: 0.05 # min ratio of available bikes to keep for the current cell, to supply to neighbors
    high: 1 # max ratio of available bikes neighbors can provide to the current cell
  filters: # filters used to pick destinations
    - type: "distance" # sort by distance, from nearest to farthest
      num: 20 # number of outputs

reward: # reward options
  fulfillment_factor: 0.4
  shortage_factor: 0.3
  transfer_cost_factor: 0.3

# timezone of the data
# NOTE: we need this to fit local time, as binary data converts timestamps into UTC
# names: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
time_zone: "America/New_York"

# path to the trip data binary file
trip_data: "tests/data/citi_bike/case_1/trips.bin"

# path to the weather data
weather_data: "tests/data/citi_bike/weathers.bin"

# path to the csv file used to init stations, which also holds the station id -> index mapping
stations_init_data: "tests/data/citi_bike/case_1/stations.csv"

# path to the distance adjacency matrix
distance_adj_data: "tests/data/citi_bike/case_1/distance_adj.csv"
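The two watermark ratios are what gate decision generation: a station at or above the supply mark has bikes to spare, one at or below the demand mark is effectively starved, and anything in between raises no decision event. A hedged sketch of that logic (names are illustrative, not MARO's API):

```python
# Illustrative watermark check for one station (hypothetical helper).
from typing import Optional


def decision_kind(bikes: int, capacity: int,
                  supply_mark: float = 0.8,
                  demand_mark: float = 0.001) -> Optional[str]:
    ratio = bikes / capacity
    if ratio >= supply_mark:
        return "supply"  # spare bikes: can supply neighbors
    if ratio <= demand_mark:
        return "demand"  # nearly empty: ask neighbors for bikes
    return None          # inside the band: no decision event


print(decision_kind(9, 10))  # supply
print(decision_kind(0, 10))  # demand
print(decision_kind(5, 10))  # None
```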
@@ -0,0 +1,3 @@

0,1
0,10.12
10.12,0
@@ -0,0 +1,3 @@

station_index,capacity,init,station_id
0,10,5,111
1,20,10,222
@@ -0,0 +1,5 @@

start_time,duration,start_station_index,end_station_index
2019-01-01 00:00:00,5,0,1
2019-01-01 00:01:00,5,0,1
2019-01-01 00:01:00,5,1,0
2019-01-01 00:05:00,5,0,1
@@ -0,0 +1,42 @@

decision:
  extra_cost_mode: source # how to assign extra cost; available values: source, target, target_neighbors

  resolution: 1 # frequency to check if a cell needs an action

  # random factors to set bike transfer time
  effective_time_mean: 20
  effective_time_std: 10

  # these two watermarks affect whether a decision should be generated
  supply_water_mark_ratio: 0.8
  demand_water_mark_ratio: 0.001

  # ratio of action
  action_scope:
    low: 0.05 # min ratio of available bikes to keep for the current cell, to supply to neighbors
    high: 1 # max ratio of available bikes neighbors can provide to the current cell
  filters: # filters used to pick destinations
    - type: "distance" # sort by distance, from nearest to farthest
      num: 20 # number of outputs

reward: # reward options
  fulfillment_factor: 0.4
  shortage_factor: 0.3
  transfer_cost_factor: 0.3

# timezone of the data
# NOTE: we need this to fit local time, as binary data converts timestamps into UTC
# names: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
time_zone: "America/New_York"

# path to the trip data binary file
trip_data: "tests/data/citi_bike/case_2/trips.bin"

# path to the weather data
weather_data: "tests/data/citi_bike/weathers.bin"

# path to the csv file used to init stations, which also holds the station id -> index mapping
stations_init_data: "tests/data/citi_bike/case_2/stations.csv"

# path to the distance adjacency matrix
distance_adj_data: "tests/data/citi_bike/case_2/distance_adj.csv"
@@ -0,0 +1,3 @@

0,1
0,10.12
10.12,0
@@ -0,0 +1,3 @@

station_index,capacity,init,station_id
0,10,5,111
1,20,10,222
@@ -0,0 +1,10 @@

start_time,duration,start_station_index,end_station_index
2019-01-01 00:00:00,5,0,1
2019-01-01 00:01:00,5,0,1
2019-01-01 00:01:01,5,0,1
2019-01-01 00:01:02,5,0,1
2019-01-01 00:01:03,5,0,1
2019-01-01 00:01:04,5,0,1
2019-01-01 00:01:05,5,0,1
2019-01-01 00:01:09,5,1,0
2019-01-01 00:05:00,5,0,1
@@ -0,0 +1,28 @@

events:
  RequireBike: # type name
    display_name: "require_bike" # can be empty; defaults to the type name (key)
  ReturnBike:
    display_name: "return_bike"
  RebalanceBike:
    display_name: "rebalance_bike"
  DeliverBike:
    display_name: "deliver_bike"

  "_default": "RequireBike" # default event type when the event column has no value; e.g., in the citibike scenario every row is a trip requirement, so no event column needs to be specified
entity:
  timestamp:
    column: 'start_time'
    dtype: 'i8'
    tzone: "America/New_York"
  durations:
    column: 'duration'
    dtype: 'i'
  src_station:
    column: 'start_station_index'
    dtype: 'i'
  dest_station:
    column: 'end_station_index'
    dtype: 'i'
    slot: 1
  "_event": "type"
@@ -0,0 +1,9 @@

date,weather,temp
1/1/2019 0:00:00,0,30.5
1/2/2019 0:00:00,3,32.0
1/3/2019 0:00:00,1,30.0
1/4/2019 0:00:00,0,30.0
1/5/2019 0:00:00,1,34.5
1/6/2019 0:00:00,3,32.5
1/7/2019 0:00:00,3,32.5
1/8/2019 0:00:00,0,30.5
@@ -0,0 +1,12 @@

entity:
  timestamp:
    column: 'date'
    dtype: 'i8'
    tzone: "America/New_York"
  weather:
    column: 'weather'
    dtype: 'i'
  temp:
    column: 'temp'
    dtype: 'f'
@@ -0,0 +1,18 @@

entity:
  timestamp:
    column: 'start_time'
    dtype: 'i8'
    # used to specify the time zone of the source file; the converter will convert it into UTC
    # (the default is UTC if not specified):
    # names: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
    tzone: "America/New_York"
  durations:
    column: 'duration'
    dtype: 'i'
  src_station:
    column: 'start_station_index'
    dtype: 'i'
  dest_station:
    column: 'end_station_index'
    dtype: 'i'
    slot: 1
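Since the converter normalizes timestamps to UTC, `tzone` tells it how to interpret the naive local times in the source CSV. The conversion amounts to the following (a sketch using pytz, fitting the Python 3.6/3.7 target of this codebase; not the converter's actual code):

```python
# Interpret a naive CSV timestamp as America/New_York local time and
# store it as a UTC epoch, the kind of value an 'i8' timestamp field holds.
from datetime import datetime

import pytz

local_tz = pytz.timezone("America/New_York")
naive = datetime(2019, 1, 1, 0, 0, 0)   # as read from the CSV
localized = local_tz.localize(naive)    # attach the source time zone
epoch_utc = int(localized.timestamp())  # seconds since the epoch, in UTC
print(epoch_utc)  # 1546318800, i.e. 2019-01-01 05:00:00 UTC
```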
@@ -0,0 +1,26 @@

events:
  RequireBike: # type name
    display_name: "require_bike" # can be empty; defaults to the type name (key)
  ReturnBike:
    display_name: "return_bike"
  RebalanceBike:
    display_name: "rebalance_bike"
  DeliverBike:
    display_name: "deliver_bike"

  "_default": "RequireBike" # default event type when the event column has no value; e.g., in the citibike scenario every row is a trip requirement, so no event column needs to be specified
entity:
  timestamp:
    column: 'start_time'
    dtype: 'i8'
    tzone: "America/New_York"
  durations:
    column: 'duration'
    dtype: 'i'
  src_station:
    column: 'start_station_index'
    dtype: 'i'
  dest_station:
    column: 'end_station_index'
    dtype: 'i'
    slot: 1
@@ -0,0 +1,11 @@

entity:
  durations:
    column: 'duration'
    dtype: 'i'
  src_station:
    column: 'start_station_index'
    dtype: 'i'
  dest_station:
    column: 'end_station_index'
    dtype: 'i'
    slot: 1
@@ -0,0 +1,5 @@

start_time,duration,start_station_index,end_station_index
2019-01-01 00:00:00,5,0,1
2019-01-01 00:01:00,5,0,1
2019-01-01 00:01:00,5,1,0
2019-01-01 00:05:00,5,0,1
@@ -0,0 +1,7 @@

This case is used to test whether the vessel arrives and leaves at the specified tick.

Check points:

. vessel.location
. tick
. vessel.state
@@ -0,0 +1 @@

tick, source, target, order_number
Some files were not shown because too many files changed in this diff.