* Dropped support for python 3.6

* Pinning python 3.9.9 for tests due to typing issues with 3.9.10

* Testing new bokken image.

* Testing new bokken image.

* Updated yamato standalone build test.

* Updated yamato standalone build test.

* Updated standalone build test.

* Updated yamato configs to use mla bokken vm.

* Bug fixes for yamato yml files.

* Fixed com.unity.ml-agents-test.yml

* Bumped min python version to 3.7.2

* pettingzoo api prototype

* add example

* update file names

* support multiple behavior names

* fix multi behavior action index

* add install in colab

* add setup

* update colab

* fix __init__

* clone single branch

* import tags only

* import in init

* catch import error

* update colab

* move colab and add readme

* handle agent dying

* add tests

* update doc

* add info

* add action mask

* fix action mask

* update action masks in colab

* change default env

* set version

* fix hybrid action

* fix colab for hybrid actions

* add note on auto reset

* Updated colab name.

* Update README.md

* Following petting_zoo registry API (#5557)

* init petting_zoo registry

* cherrypick Custom trainer editor analytics (#5511)

* cherrypick "Update dotnet-format to address breaking changes introduced by upstream changes (#5528)"

* Update colab to match pettingZoo import api

* ToRevert: pull exp-petting-registry branch

* Add init file to tests

* Install pettingzoo-unity requirements for pytest

* update pytest command

* Add docstrings and comments

* update coverage to pettingzoo folder

* unset log level

* update env string

* Two small bugfixes (#5589)

1. Add the missing `_cumulative_rewards` property
2. Update `agent_selection` to not error out when an agent finishes an episode.

* Updated gym to 0.21.0 and petting zoo to 1.13.1, fixed bugs with AEC wrapper for gym and PZ updates. API tests are passing.

* Some refactoring.

* Finished initial implementation of parallel. Tests not passing.

* Finished parallel API implementation and refactor. All PZ tests passing.

* Cleanup.

* Refactoring.

* Pinning numpy version.

* add metadata and behavior_specs initialization

* addressing behaviour_spec issues

* Bumped PZ version to 1.14.0. Fixed failing tests.

* Refactored gym-unity and petting-zoo into ml-agents-envs

* Added TODO to pydoc-config.yaml

* Refactored gym and pz to be under a subpackage in mlagents_env package

* Refactored ml-agents-envs docs.

* Minor update to PZ API doc.

* Updated mlagents_envs docs and colab.

* Updated pytest gh workflow to remove ref to gym and pz.

* Refactored to remove some test coupling between trainers and envs.

* Updated installation doc.

* Update ml-agents-envs/README.md

Co-authored-by: Andrew Cohen <andrew.cohen@unity3d.com>

* Updated failing yamato jobs.

* Updated CHANGELOG.

* Updated Migration guide.

* Doc updates based on CR.

* Updated github workflow for colab tests.

* Updated github workflow for colab tests.

* Updated github workflow for colab tests.

* Fixed yamato import error.

Co-authored-by: Ruo-Ping Dong <ruoping.dong@unity3d.com>
Co-authored-by: Miguel Alonso Jr <miguelalonsojr>
Co-authored-by: jmercado1985 <75792879+jmercado1985@users.noreply.github.com>
Co-authored-by: Maryam Honari <honari.m94@gmail.com>
Co-authored-by: Henry Peteet <henry.peteet@unity3d.com>
Co-authored-by: mahon94 <maryam.honari@unity3d.com>
Co-authored-by: Andrew Cohen <andrew.cohen@unity3d.com>
Committed by Miguel Alonso Jr on 2022-02-02 19:32:23 -05:00 via GitHub Enterprise
Parent: b4cbaa6840
Commit: 28303adf6c
62 changed files: 2707 additions and 186 deletions

.github/workflows/publish_pypi.yaml

@ -16,7 +16,7 @@ jobs:
runs-on: [self-hosted, Linux, X64]
strategy:
matrix:
package-path: [ml-agents, ml-agents-envs, gym-unity]
package-path: [ml-agents, ml-agents-envs]
steps:
- uses: actions/checkout@main

.github/workflows/pytest.yml

@ -5,7 +5,6 @@ on:
paths: # This action will only run if the PR modifies a file in one of these directories
- 'ml-agents/**'
- 'ml-agents-envs/**'
- 'gym-unity/**'
- 'test_constraints*.txt'
- 'test_requirements.txt'
- '.github/workflows/pytest.yml'
@ -47,7 +46,7 @@ jobs:
# # This path is specific to Ubuntu
# path: ~/.cache/pip
# # Look to see if there is a cache hit for the corresponding requirements file
# key: ${{ runner.os }}-pip-${{ hashFiles('ml-agents/setup.py', 'ml-agents-envs/setup.py', 'gym-unity/setup.py', 'test_requirements.txt', matrix.pip_constraints) }}
# key: ${{ runner.os }}-pip-${{ hashFiles('ml-agents/setup.py', 'ml-agents-envs/setup.py', 'test_requirements.txt', matrix.pip_constraints) }}
# restore-keys: |
# ${{ runner.os }}-pip-
# ${{ runner.os }}-
@ -60,14 +59,13 @@ jobs:
python -m pip install --progress-bar=off -e ./ml-agents-envs -c ${{ matrix.pip_constraints }}
python -m pip install --progress-bar=off -e ./ml-agents -c ${{ matrix.pip_constraints }}
python -m pip install --progress-bar=off -r test_requirements.txt -c ${{ matrix.pip_constraints }}
python -m pip install --progress-bar=off -e ./gym-unity -c ${{ matrix.pip_constraints }}
python -m pip install --progress-bar=off -e ./ml-agents-plugin-examples -c ${{ matrix.pip_constraints }}
- name: Save python dependencies
run: |
pip freeze > pip_versions-${{ matrix.python-version }}.txt
cat pip_versions-${{ matrix.python-version }}.txt
- name: Run pytest
run: pytest --cov=ml-agents --cov=ml-agents-envs --cov=gym-unity --cov-report html --junitxml=junit/test-results-${{ matrix.python-version }}.xml -p no:warnings -v
run: pytest --cov=ml-agents --cov=ml-agents-envs --cov-report=html --junitxml=junit/test-results-${{ matrix.python-version }}.xml -p no:warnings -v
- name: Upload pytest test results
uses: actions/upload-artifact@v2
with:


@ -22,10 +22,6 @@ repos:
# Exclude protobuf files and don't follow them when imported
exclude: ".*_pb2.py"
args: [--ignore-missing-imports, --disallow-incomplete-defs]
- id: mypy
name: mypy-gym-unity
files: "gym-unity/.*"
args: [--ignore-missing-imports, --disallow-incomplete-defs]
- repo: https://gitlab.com/pycqa/flake8
rev: 3.8.1


@ -30,7 +30,6 @@ test_gym_interface_{{ editor.version }}:
pull_request.changes.any match "Project/**" OR
pull_request.changes.any match "ml-agents/tests/yamato/**" OR
pull_request.changes.any match "ml-agents-envs/**" OR
pull_request.changes.any match "gym-unity/**" OR
pull_request.changes.any match ".yamato/gym-interface-test.yml") AND
NOT pull_request.changes.all match "**/*.md"
{% endif %}


@ -38,8 +38,8 @@ developer communities.
- Train using multiple concurrent Unity environment instances
- Utilizes the [Unity Inference Engine](docs/Unity-Inference-Engine.md) to
provide native cross-platform support
- Unity environment [control from Python](docs/Python-API.md)
- Wrap Unity learning environments as a [gym](gym-unity/README.md)
- Unity environment [control from Python](docs/Python-LLAPI.md)
- Wrap Unity learning environments as a [gym](docs/Python-Gym-API.md)
See our [ML-Agents Overview](docs/ML-Agents-Overview.md) page for detailed
descriptions of all these features.


@ -9,15 +9,17 @@ and this project adheres to
## [Unreleased]
### Major Changes
#### com.unity.ml-agents / com.unity.ml-agents.extensions (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
- The minimum supported Python version for ml-agents-envs was changed to 3.7.2 (#4)
#### ml-agents / ml-agents-envs
- The minimum supported Python version for ml-agents-envs was changed to 3.7.2 (#5)
- Added support for the PettingZoo multi-agent API (#6)
- Refactored `gym-unity` into the `ml-agents-envs` package (#6)
### Minor Changes
#### com.unity.ml-agents / com.unity.ml-agents.extensions (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
#### ml-agents / ml-agents-envs
### Bug Fixes
#### com.unity.ml-agents / com.unity.ml-agents.extensions (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
#### ml-agents / ml-agents-envs
## [2.2.1-exp.1] - 2022-01-14
### Major Changes


@ -18,8 +18,6 @@ The ML-Agents Toolkit contains several components:
a Unity scene. It is a foundational layer that facilitates data messaging
between Unity scene and the Python machine learning algorithms.
Consequently, `mlagents` depends on `mlagents_envs`.
- [`gym_unity`](../gym-unity/) provides a Python-wrapper for your Unity scene
that supports the OpenAI Gym interface.
- Unity [Project](../Project/) that contains several
[example environments](Learning-Environment-Examples.md) that highlight the
various features of the toolkit to help you get started.


@ -62,7 +62,7 @@ can interact with it.
## Interacting with the Environment
If you want to use the [Python API](Python-API.md) to interact with your
If you want to use the [Python API](Python-LLAPI.md) to interact with your
executable, you can pass the name of the executable with the argument
'file_name' of the `UnityEnvironment`. For instance:


@ -5,4 +5,3 @@ See the package-specific Limitations pages:
- [`com.unity.mlagents` Unity package](../com.unity.ml-agents/Documentation~/com.unity.ml-agents.md#known-limitations)
- [`mlagents` Python package](../ml-agents/README.md#limitations)
- [`mlagents_envs` Python package](../ml-agents-envs/README.md#limitations)
- [`gym_unity` Python package](../gym-unity/README.md#limitations)


@ -167,7 +167,7 @@ The ML-Agents Toolkit contains five high-level components:
process to communicate with and control the Academy during training. However,
it can be used for other purposes as well. For example, you could use the API
to use Unity as the simulation engine for your own machine learning
algorithms. See [Python API](Python-API.md) for more information.
algorithms. See [Python API](Python-LLAPI.md) for more information.
- **External Communicator** - which connects the Learning Environment with the
Python Low-Level API. It lives within the Learning Environment.
- **Python Trainers** which contains all the machine learning algorithms that
@ -179,9 +179,15 @@ The ML-Agents Toolkit contains five high-level components:
- **Gym Wrapper** (not pictured). A common way in which machine learning
researchers interact with simulation environments is via a wrapper provided by
OpenAI called [gym](https://github.com/openai/gym). We provide a gym wrapper
in a dedicated `gym-unity` Python package and
[instructions](../gym-unity/README.md) for using it with existing machine
in the `ml-agents-envs` package and
[instructions](Python-Gym-API.md) for using it with existing machine
learning algorithms which utilize gym.
- **PettingZoo Wrapper** (not pictured). PettingZoo is a Python API for
interacting with multi-agent simulation environments that provides a
gym-like interface. We provide a PettingZoo wrapper for Unity ML-Agents
environments in the `ml-agents-envs` package and
[instructions](Python-PettingZoo-API.md) for using it with machine learning
algorithms.
<p align="center">
<img src="images/learning_environment_basic.png"
@ -286,10 +292,10 @@ In the previous mode, the Agents were used for training to generate a PyTorch
model that the Agents can later use. However, any user of the ML-Agents Toolkit
can leverage their own algorithms for training. In this case, the behaviors of
all the Agents in the scene will be controlled within Python. You can even turn
your environment into a [gym.](../gym-unity/README.md)
your environment into a [gym.](Python-Gym-API.md)
We do not currently have a tutorial highlighting this mode, but you can learn
more about the Python API [here](Python-API.md).
more about the Python API [here](Python-LLAPI.md).
## Flexible Training Scenarios


@ -1,6 +1,25 @@
# Upgrading
# Migrating
<!---
TODO: update ml-agents-env package version before release
--->
## Migrating to the ml-agents-envs 0.29.0.dev0 package
- Python 3.7 is now the minimum version of python supported due to [python3.6 EOL](https://endoflife.date/python).
Please update your python installation to 3.7.2 or higher. Note: Due to an issue with the typing system, the maximum
version of python supported is python 3.9.9.
- The `gym-unity` package has been refactored into the `ml-agents-envs` package. Please update your imports accordingly.
- Example:
- Before
```python
from gym_unity.unity_gym_env import UnityToGymWrapper
```
- After:
```python
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
```
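Only the import path changes; constructing and using the wrapper is the same as before. A minimal sketch, assuming a hypothetical local build path:
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper

# Hypothetical path to a built Unity environment; replace with your own build.
unity_env = UnityEnvironment(file_name="./UnityBuild")
env = UnityToGymWrapper(unity_env, uint8_visual=True)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```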
## Migrating the package to version 2.0
- The official version of Unity that ML-Agents supports is now 2020.3 LTS. If you run
into issues, please consider deleting your project's Library folder and reopening your
@ -260,9 +279,9 @@ vector observations to be used simultaneously.
- The `play_against_current_self_ratio` self-play trainer hyperparameter has
been renamed to `play_against_latest_model_ratio`
- Removed the multi-agent gym option from the gym wrapper. For multi-agent
scenarios, use the [Low Level Python API](Python-API.md).
scenarios, use the [Low Level Python API](Python-LLAPI.md).
- The low level Python API has changed. You can look at the document
[Low Level Python API documentation](Python-API.md) for more information. If
[Low Level Python API documentation](Python-LLAPI.md) for more information. If
you use `mlagents-learn` for training, this should be a transparent change.
- The obsolete `Agent` methods `GiveModel`, `Done`, `InitializeAgent`,
`AgentAction` and `AgentReset` have been removed.
@ -487,7 +506,7 @@ vector observations to be used simultaneously.
### Important changes
- The low level Python API has changed. You can look at the document
[Low Level Python API documentation](Python-API.md) for more information. This
[Low Level Python API documentation](Python-LLAPI.md) for more information. This
should only affect you if you're writing a custom trainer; if you use
`mlagents-learn` for training, this should be a transparent change.
- `reset()` on the Low-Level Python API no longer takes a `train_mode`
@ -497,7 +516,7 @@ vector observations to be used simultaneously.
`UnityEnvironment` no longer has a `reset_parameters` field. To modify float
properties in the environment, you must use a `FloatPropertiesChannel`. For
more information, refer to the
[Low Level Python API documentation](Python-API.md)
[Low Level Python API documentation](Python-LLAPI.md)
- `CustomResetParameters` are now removed.
- The Academy no longer has a `Training Configuration` nor
`Inference Configuration` field in the inspector. To modify the configuration


@ -0,0 +1,161 @@
# Table of Contents
* [mlagents\_envs.envs.unity\_gym\_env](#mlagents_envs.envs.unity_gym_env)
* [UnityGymException](#mlagents_envs.envs.unity_gym_env.UnityGymException)
* [UnityToGymWrapper](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper)
* [\_\_init\_\_](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.__init__)
* [reset](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.reset)
* [step](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.step)
* [render](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.render)
* [close](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.close)
* [seed](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.seed)
* [ActionFlattener](#mlagents_envs.envs.unity_gym_env.ActionFlattener)
* [\_\_init\_\_](#mlagents_envs.envs.unity_gym_env.ActionFlattener.__init__)
* [lookup\_action](#mlagents_envs.envs.unity_gym_env.ActionFlattener.lookup_action)
<a name="mlagents_envs.envs.unity_gym_env"></a>
# mlagents\_envs.envs.unity\_gym\_env
<a name="mlagents_envs.envs.unity_gym_env.UnityGymException"></a>
## UnityGymException Objects
```python
class UnityGymException(error.Error)
```
Any error related to the gym wrapper of ml-agents.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper"></a>
## UnityToGymWrapper Objects
```python
class UnityToGymWrapper(gym.Env)
```
Provides Gym wrapper for Unity Learning Environments.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.__init__"></a>
#### \_\_init\_\_
```python
| __init__(unity_env: BaseEnv, uint8_visual: bool = False, flatten_branched: bool = False, allow_multiple_obs: bool = False, action_space_seed: Optional[int] = None)
```
Environment initialization
**Arguments**:
- `unity_env`: The Unity BaseEnv to be wrapped in the gym. Will be closed when the UnityToGymWrapper closes.
- `uint8_visual`: Return visual observations as uint8 (0-255) matrices instead of float (0.0-1.0).
- `flatten_branched`: If True, turn branched discrete action spaces into a Discrete space rather than
MultiDiscrete.
- `allow_multiple_obs`: If True, return a list of np.ndarrays as observations with the first elements
containing the visual observations and the last element containing the array of vector observations.
If False, returns a single np.ndarray containing either only a single visual observation or the array of
vector observations.
- `action_space_seed`: If non-None, will be used to set the random seed on created gym.Space instances.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.reset"></a>
#### reset
```python
| reset() -> Union[List[np.ndarray], np.ndarray]
```
Resets the state of the environment and returns an initial observation.
Returns: observation (object/list): the initial observation of the
space.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.step"></a>
#### step
```python
| step(action: List[Any]) -> GymStepResult
```
Run one timestep of the environment's dynamics. When end of
episode is reached, you are responsible for calling `reset()`
to reset this environment's state.
Accepts an action and returns a tuple (observation, reward, done, info).
**Arguments**:
- `action` _object/list_ - an action provided by the environment
**Returns**:
- `observation` _object/list_ - agent's observation of the current environment
reward (float/list) : amount of reward returned after previous action
- `done` _boolean/list_ - whether the episode has ended.
- `info` _dict_ - contains auxiliary diagnostic information.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.render"></a>
#### render
```python
| render(mode="rgb_array")
```
Return the latest visual observations.
Note that it will not render a new frame of the environment.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.close"></a>
#### close
```python
| close() -> None
```
Override _close in your subclass to perform any necessary cleanup.
Environments will automatically close() themselves when
garbage collected or when the program exits.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.seed"></a>
#### seed
```python
| seed(seed: Any = None) -> None
```
Sets the seed for this env's random number generator(s).
Currently not implemented.
<a name="mlagents_envs.envs.unity_gym_env.ActionFlattener"></a>
## ActionFlattener Objects
```python
class ActionFlattener()
```
Flattens branched discrete action spaces into single-branch discrete action spaces.
<a name="mlagents_envs.envs.unity_gym_env.ActionFlattener.__init__"></a>
#### \_\_init\_\_
```python
| __init__(branched_action_space)
```
Initialize the flattener.
**Arguments**:
- `branched_action_space`: A List containing the sizes of each branch of the action
space, e.g. [2,3,3] for three branches with size 2, 3, and 3 respectively.
<a name="mlagents_envs.envs.unity_gym_env.ActionFlattener.lookup_action"></a>
#### lookup\_action
```python
| lookup_action(action)
```
Convert a scalar discrete action into a unique set of branched actions.
**Arguments**:
- `action`: A scalar value representing one of the discrete actions.
**Returns**:
The List containing the branched actions.
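A short usage sketch tying the pieces above together (the build path is hypothetical; `flatten_branched=True` flattens the branched discrete space into a single `Discrete` space, which is what `ActionFlattener` implements):
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs.unity_gym_env import ActionFlattener, UnityToGymWrapper

# ActionFlattener on its own: map a scalar action index back to branched actions.
flattener = ActionFlattener([2, 3, 3])   # three branches of size 2, 3 and 3
branched = flattener.lookup_action(7)    # the unique branched action for index 7

# Hypothetical path to a built Unity environment; replace with your own build.
unity_env = UnityEnvironment(file_name="./UnityBuild")
env = UnityToGymWrapper(unity_env, flatten_branched=True, action_space_seed=42)
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```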

gym-unity/README.md → docs/Python-Gym-API.md (executable file → normal file)

@ -11,17 +11,9 @@ Unity environment via Python.
## Installation
The gym wrapper can be installed using:
The gym wrapper is part of the `mlagents_envs` package. Please refer to the
[mlagents_envs installation instructions](../ml-agents-envs/README.md).
```sh
pip3 install gym_unity
```
or by running the following from the `/gym-unity` directory of the repository:
```sh
pip3 install -e .
```
## Using the Gym Wrapper
@ -29,7 +21,7 @@ The gym interface is available from `gym_unity.envs`. To launch an environment
from the root of the project repository use:
```python
from gym_unity.envs import UnityToGymWrapper
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
env = UnityToGymWrapper(unity_env, uint8_visual, flatten_branched, allow_multiple_obs)
```
@ -107,35 +99,37 @@ from baselines import deepq
from baselines import logger
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
def main():
unity_env = UnityEnvironment(<path-to-environment>)
env = UnityToGymWrapper(unity_env, uint8_visual=True)
logger.configure('./logs') # Change to log in a different directory
act = deepq.learn(
env,
"cnn", # For visual inputs
lr=2.5e-4,
total_timesteps=1000000,
buffer_size=50000,
exploration_fraction=0.05,
exploration_final_eps=0.1,
print_freq=20,
train_freq=5,
learning_starts=20000,
target_network_update_freq=50,
gamma=0.99,
prioritized_replay=False,
checkpoint_freq=1000,
checkpoint_path='./logs', # Change to save model in a different directory
dueling=True
)
print("Saving model to unity_model.pkl")
act.save("unity_model.pkl")
unity_env = UnityEnvironment( < path - to - environment >)
env = UnityToGymWrapper(unity_env, uint8_visual=True)
logger.configure('./logs') # Change to log in a different directory
act = deepq.learn(
env,
"cnn", # For visual inputs
lr=2.5e-4,
total_timesteps=1000000,
buffer_size=50000,
exploration_fraction=0.05,
exploration_final_eps=0.1,
print_freq=20,
train_freq=5,
learning_starts=20000,
target_network_update_freq=50,
gamma=0.99,
prioritized_replay=False,
checkpoint_freq=1000,
checkpoint_path='./logs', # Change to save model in a different directory
dueling=True
)
print("Saving model to unity_model.pkl")
act.save("unity_model.pkl")
if __name__ == '__main__':
main()
main()
```
To start the training process, run the following from the directory containing
@ -163,7 +157,7 @@ method using the PPO2 baseline:
```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper
from mlagents_envs.envs import UnityToGymWrapper
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.bench import Monitor
@ -173,38 +167,44 @@ import baselines.ppo2.ppo2 as ppo2
import os
try:
from mpi4py import MPI
from mpi4py import MPI
except ImportError:
MPI = None
MPI = None
def make_unity_env(env_directory, num_env, visual, start_index=0):
"""
Create a wrapped, monitored Unity environment.
"""
def make_env(rank, use_visual=True): # pylint: disable=C0111
def _thunk():
unity_env = UnityEnvironment(env_directory, base_port=5000 + rank)
env = UnityToGymWrapper(unity_env, uint8_visual=True)
env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
return env
return _thunk
if visual:
return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
else:
rank = MPI.COMM_WORLD.Get_rank() if MPI else 0
return DummyVecEnv([make_env(rank, use_visual=False)])
"""
Create a wrapped, monitored Unity environment.
"""
def make_env(rank, use_visual=True): # pylint: disable=C0111
def _thunk():
unity_env = UnityEnvironment(env_directory, base_port=5000 + rank)
env = UnityToGymWrapper(unity_env, uint8_visual=True)
env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
return env
return _thunk
if visual:
return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
else:
rank = MPI.COMM_WORLD.Get_rank() if MPI else 0
return DummyVecEnv([make_env(rank, use_visual=False)])
def main():
env = make_unity_env(<path-to-environment>, 4, True)
ppo2.learn(
network="mlp",
env=env,
total_timesteps=100000,
lr=1e-3,
)
env = make_unity_env( < path - to - environment >, 4, True)
ppo2.learn(
network="mlp",
env=env,
total_timesteps=100000,
lr=1e-3,
)
if __name__ == '__main__':
main()
main()
```
## Run Google Dopamine Algorithms
@ -236,7 +236,7 @@ instantiated, just as in the Baselines example. At the top of the file, insert
```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper
from mlagents_envs.envs import UnityToGymWrapper
```
to import the Gym Wrapper. Navigate to the `create_atari_environment` method in


@ -6,7 +6,7 @@ an entry point to train (`mlagents-learn`) which allows you to train agents in
Unity Environments using our implementations of reinforcement learning or
imitation learning. This document describes how to use the `mlagents_envs` API.
For information on using `mlagents-learn`, see [here](Training-ML-Agents.md).
For Python Low Level API documentation, see [here](Python-API-Documentation.md).
For Python Low Level API documentation, see [here](Python-LLAPI-Documentation.md).
The Python Low Level API can be used to interact directly with your Unity
learning environment. As such, it can serve as the basis for developing and


@ -0,0 +1,246 @@
# Table of Contents
* [mlagents\_envs.envs.pettingzoo\_env\_factory](#mlagents_envs.envs.pettingzoo_env_factory)
* [PettingZooEnvFactory](#mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory)
* [env](#mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory.env)
* [mlagents\_envs.envs.unity\_aec\_env](#mlagents_envs.envs.unity_aec_env)
* [UnityAECEnv](#mlagents_envs.envs.unity_aec_env.UnityAECEnv)
* [\_\_init\_\_](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.__init__)
* [step](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.step)
* [observe](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.observe)
* [last](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.last)
* [mlagents\_envs.envs.unity\_parallel\_env](#mlagents_envs.envs.unity_parallel_env)
* [UnityParallelEnv](#mlagents_envs.envs.unity_parallel_env.UnityParallelEnv)
* [\_\_init\_\_](#mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.__init__)
* [reset](#mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.reset)
* [mlagents\_envs.envs.unity\_pettingzoo\_base\_env](#mlagents_envs.envs.unity_pettingzoo_base_env)
* [UnityPettingzooBaseEnv](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv)
* [observation\_spaces](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_spaces)
* [observation\_space](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_space)
* [action\_spaces](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_spaces)
* [action\_space](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_space)
* [side\_channel](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.side_channel)
* [reset](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.reset)
* [seed](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.seed)
* [render](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.render)
* [close](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.close)
<a name="mlagents_envs.envs.pettingzoo_env_factory"></a>
# mlagents\_envs.envs.pettingzoo\_env\_factory
<a name="mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory"></a>
## PettingZooEnvFactory Objects
```python
class PettingZooEnvFactory()
```
<a name="mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory.env"></a>
#### env
```python
| env(seed: Optional[int] = None, **kwargs: Union[List, int, bool, None]) -> UnityAECEnv
```
Creates the environment with env_id from unity's default_registry and wraps it in a UnityToPettingZooWrapper
**Arguments**:
- `seed`: The seed for the action spaces of the agents.
- `kwargs`: Any argument accepted by `UnityEnvironment`class except file_name
<a name="mlagents_envs.envs.unity_aec_env"></a>
# mlagents\_envs.envs.unity\_aec\_env
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv"></a>
## UnityAECEnv Objects
```python
class UnityAECEnv(UnityPettingzooBaseEnv, AECEnv)
```
Unity AEC (PettingZoo) environment wrapper.
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv.__init__"></a>
#### \_\_init\_\_
```python
| __init__(env: BaseEnv, seed: Optional[int] = None)
```
Initializes a Unity AEC environment wrapper.
**Arguments**:
- `env`: The UnityEnvironment that is being wrapped.
- `seed`: The seed for the action spaces of the agents.
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv.step"></a>
#### step
```python
| step(action: Any) -> None
```
Sets the action of the active agent and get the observation, reward, done
and info of the next agent.
**Arguments**:
- `action`: The action for the active agent
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv.observe"></a>
#### observe
```python
| observe(agent_id)
```
Returns the observation an agent currently can make. `last()` calls this function.
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv.last"></a>
#### last
```python
| last(observe=True)
```
returns observation, cumulative reward, done, info for the current agent (specified by self.agent_selection)
<a name="mlagents_envs.envs.unity_parallel_env"></a>
# mlagents\_envs.envs.unity\_parallel\_env
<a name="mlagents_envs.envs.unity_parallel_env.UnityParallelEnv"></a>
## UnityParallelEnv Objects
```python
class UnityParallelEnv(UnityPettingzooBaseEnv, ParallelEnv)
```
Unity Parallel (PettingZoo) environment wrapper.
<a name="mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.__init__"></a>
#### \_\_init\_\_
```python
| __init__(env: BaseEnv, seed: Optional[int] = None)
```
Initializes a Unity Parallel environment wrapper.
**Arguments**:
- `env`: The UnityEnvironment that is being wrapped.
- `seed`: The seed for the action spaces of the agents.
<a name="mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.reset"></a>
#### reset
```python
| reset() -> Dict[str, Any]
```
Resets the environment.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env"></a>
# mlagents\_envs.envs.unity\_pettingzoo\_base\_env
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv"></a>
## UnityPettingzooBaseEnv Objects
```python
class UnityPettingzooBaseEnv()
```
Unity Petting Zoo base environment.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_spaces"></a>
#### observation\_spaces
```python
| @property
| observation_spaces() -> Dict[str, spaces.Space]
```
Return the observation spaces of all the agents.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_space"></a>
#### observation\_space
```python
| observation_space(agent: str) -> Optional[spaces.Space]
```
The observation space of the current agent.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_spaces"></a>
#### action\_spaces
```python
| @property
| action_spaces() -> Dict[str, spaces.Space]
```
Return the action spaces of all the agents.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_space"></a>
#### action\_space
```python
| action_space(agent: str) -> Optional[spaces.Space]
```
The action space of the current agent.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.side_channel"></a>
#### side\_channel
```python
| @property
| side_channel() -> Dict[str, Any]
```
The side channels of the environment. You can access the side channels
of an environment with `env.side_channel[<name-of-channel>]`.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.reset"></a>
#### reset
```python
| reset()
```
Resets the environment.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.seed"></a>
#### seed
```python
| seed(seed=None)
```
Reseeds the environment (making the resulting environment deterministic).
`reset()` must be called after `seed()`, and before `step()`.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.render"></a>
#### render
```python
| render(mode="human")
```
NOT SUPPORTED.
Displays a rendered frame from the environment, if supported.
Alternate render modes in the default environments are `'rgb_array'`
which returns a numpy array and is supported by all environments outside of classic,
and `'ansi'` which returns the strings printed (specific to classic environments).
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.close"></a>
#### close
```python
| close() -> None
```
Close the environment.
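Putting the classes above together, a minimal AEC loop looks roughly like this (a sketch; `StrikersVsGoalie` is one of the factories registered from the default registry):
```python
from mlagents_envs.envs import StrikersVsGoalie  # a PettingZooEnvFactory instance

env = StrikersVsGoalie.env(seed=1)   # kwargs other than file_name are forwarded to UnityEnvironment
env.reset()
for agent in env.agent_iter(env.num_agents * 10):
    observation, reward, done, info = env.last()
    # PettingZoo expects a None action once an agent is done.
    action = None if done else env.action_spaces[agent].sample()
    env.step(action)
env.close()
```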


@ -0,0 +1,54 @@
# Unity ML-Agents PettingZoo Wrapper
With the increasing interest in multi-agent training with a gym-like API, we provide a
PettingZoo Wrapper around the [Petting Zoo API](https://www.pettingzoo.ml/). Our wrapper
provides interfaces on top of our `UnityEnvironment` class, which is the default way of
interfacing with a Unity environment via Python.
## Installation and Examples
[[Colab] PettingZoo Wrapper Example](https://colab.research.google.com/github/Unity-Technologies/ml-agents/blob/develop-python-api-ga/ml-agents-envs/colabs/Colab_PettingZoo.ipynb)
This colab notebook demonstrates the example usage of the wrapper, including installation,
basic usages, and an example with our
[Striker vs Goalie environment](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#strikers-vs-goalie)
which is a multi-agent environment with multiple different behavior names.
## API interface
This wrapper is compatible with PettingZoo API. Please check out
[PettingZoo API page](https://www.pettingzoo.ml/api) for more details.
Here's an example of interacting with wrapped environment:
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs import UnityToPettingZooWrapper
unity_env = UnityEnvironment("StrikersVsGoalie")
env = UnityToPettingZooWrapper(unity_env)
env.reset()
for agent in env.agent_iter():
observation, reward, done, info = env.last()
action = policy(observation, agent)
env.step(action)
```
## Notes
- There is support for both [AEC](https://www.pettingzoo.ml/api#interacting-with-environments)
and [Parallel](https://www.pettingzoo.ml/api#parallel-api) PettingZoo APIs.
- The AEC wrapper is compatible with the PettingZoo (PZ) API interface but works in a slightly
different way under the hood. For the AEC API, instead of stepping the environment on every `env.step(action)`,
the PZ wrapper stores the action and only performs an environment step when all the
agents requesting actions in the current step have been assigned an action. This is for
performance, considering that the communication between Unity and python is more efficient
when data are sent in batches.
- Since the actions for the AEC wrapper are stored without applying them to the environment until
all the actions are queued, some components of the API might behave in unexpected ways. For example, a call
to `env.reward` should return the instantaneous reward for that particular step, but the true
reward would only be available when an actual environment step is performed. It's recommended that
you follow the API definition for training (access rewards from `env.last()` instead of
`env.reward`) and the underlying mechanism shouldn't affect training results.
- The environment will automatically reset when it is done, so `env.agent_iter(max_step)` will
keep going on until the specified max step is reached (default: `2**63`). There is no need to
call `env.reset()` except for the very beginning of instantiating an environment.


@ -51,10 +51,10 @@
## API Docs
- [API Reference](API-Reference.md)
- [Python API Documentation](Python-API-Documentation.md)
- [How to use the Python API](Python-API.md)
- [Python API Documentation](Python-LLAPI-Documentation.md)
- [How to use the Python API](Python-LLAPI.md)
- [How to use the Unity Environment Registry](Unity-Environment-Registry.md)
- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](../gym-unity/README.md)
- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](Python-Gym-API.md)
## Translations


@ -1,6 +1,6 @@
# Unity Environment Registry [Experimental]
The Unity Environment Registry is a database of pre-built Unity environments that can be easily used without having to install the Unity Editor. It is a great way to get started with our [UnityEnvironment API](Python-API.md).
The Unity Environment Registry is a database of pre-built Unity environments that can be easily used without having to install the Unity Editor. It is a great way to get started with our [UnityEnvironment API](Python-LLAPI.md).
## Loading an Environment from the Registry
@ -14,7 +14,7 @@ for name in environment_names:
print(name)
```
The `make()` method on a registry value will return a `UnityEnvironment` ready to be used. All arguments passed to the make method will be passed to the constructor of the `UnityEnvironment` as well. Refer to the documentation on the [Python-API](Python-API.md) for more information about the arguments of the `UnityEnvironment` constructor. For example, the following code will create the environment under the identifier `"my-env"`, reset it, perform a few steps and finally close it:
The `make()` method on a registry value will return a `UnityEnvironment` ready to be used. All arguments passed to the make method will be passed to the constructor of the `UnityEnvironment` as well. Refer to the documentation on the [Python-API](Python-LLAPI.md) for more information about the arguments of the `UnityEnvironment` constructor. For example, the following code will create the environment under the identifier `"my-env"`, reset it, perform a few steps and finally close it:
```python
from mlagents_envs.registry import default_registry


@ -18,8 +18,7 @@ from dependencies of other projects. This has a few advantages:
with the different version.
## Python Version Requirement (Required)
This guide has been tested with Python 3.7 through Python 3.8. Newer versions might not
This guide has been tested with Python 3.7.2 through Python 3.9.9. Newer versions might not
have support for the dependent libraries, so are not recommended.
## Installing Pip (Required)
@ -64,8 +63,7 @@ then python3-distutils needs to be installed. Install python3-distutils using
environment using the same `activate` command listed above)
Note:
- Verify that you are using Python 3.7. Launch a command prompt
using `cmd` and execute `python --version` to verify the version.
- Verify that you are using a Python version between 3.7.2 and 3.9.9. Launch a
command prompt using `cmd` and execute `python --version` to verify the version.
- Python3 installation may require admin privileges on Windows.
- This guide is for Windows 10 using a 64-bit architecture only.


@ -51,9 +51,9 @@
## API Docs
- [API Reference](API-Reference.md)
- [How to use the Python API](Python-API.md)
- [How to use the Python API](Python-LLAPI.md)
- [How to use the Unity Environment Registry](Unity-Environment-Registry.md)
- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](../gym-unity/README.md)
- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](Python-Gym-API.md)
## Translations
@ -78,4 +78,4 @@ to keep them up just in case they are helpful to you.
- [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
- [Using the Video Recorder](https://github.com/Unity-Technologies/video-recorder)
-->
-->


@ -25,7 +25,7 @@ ML-Agents Academy 类按如下方式编排 agent 模拟循环:
要创建训练环境,请扩展 Academy 和 Agent 类以实现上述方法。`Agent.CollectObservations()` 和 `Agent.AgentAction()` 函数必须实现;而其他方法是可选的,即是否需要实现它们取决于您的具体情况。
**注意:**在这里用到的 Python API 也可用于其他目的。例如,借助于该 API您可以将 Unity 用作您自己的机器学习算法的模拟引擎。请参阅 [Python API](/docs/Python-API.md) 以了解更多信息。
**注意:**在这里用到的 Python API 也可用于其他目的。例如,借助于该 API您可以将 Unity 用作您自己的机器学习算法的模拟引擎。请参阅 [Python API](/docs/Python-LLAPI.md) 以了解更多信息。
## 组织 Unity 场景


@ -252,7 +252,7 @@ Internal Brain 中,以便为连接到该 Brain 的所有 Agent 生成
的 Brain 类型都会设置为 External并且场景中所有 Agent 的行为
都将在 Python 中接受控制。
我们目前没有教程介绍这种模式,但您可以在[这里](/docs/Python-API.md)
我们目前没有教程介绍这种模式,但您可以在[这里](/docs/Python-LLAPI.md)
了解有关 Python API 的更多信息。
### Curriculum Learning课程学习


@ -39,6 +39,6 @@
## API 文档
* [API 参考](/docs/API-Reference.md)
* [如何使用 Python API](/docs/Python-API.md)
* [如何使用 Python API](/docs/Python-LLAPI.md)
**注:** 有翻译版的文档会在右上角标注*号。
**注:** 有翻译版的文档会在右上角标注*号。


@ -1,5 +0,0 @@
# Version of the library that will be used to upload to pypi
__version__ = "0.29.0.dev0"
# Git tag that will be checked to determine whether to trigger upload to pypi
__release_tag__ = None


@ -1,43 +0,0 @@
#!/usr/bin/env python
import os
import sys
from setuptools import setup, find_packages
from setuptools.command.install import install
import gym_unity
VERSION = gym_unity.__version__
EXPECTED_TAG = gym_unity.__release_tag__
class VerifyVersionCommand(install):
"""
Custom command to verify that the git tag is the expected one for the release.
Originally based on https://circleci.com/blog/continuously-deploying-python-packages-to-pypi-with-circleci/
This differs slightly because our tags and versions are different.
"""
description = "verify that the git tag matches our version"
def run(self):
tag = os.getenv("GITHUB_REF", "NO GITHUB TAG!").replace("refs/tags/", "")
if tag != EXPECTED_TAG:
info = "Git tag: {} does not match the expected tag of this app: {}".format(
tag, EXPECTED_TAG
)
sys.exit(info)
setup(
name="gym_unity",
version=VERSION,
description="Unity Machine Learning Agents Gym Interface",
license="Apache License 2.0",
author="Unity Technologies",
author_email="ML-Agents@unity3d.com",
url="https://github.com/Unity-Technologies/ml-agents",
packages=find_packages(),
install_requires=["gym==0.21.0", f"mlagents_envs=={VERSION}"],
cmdclass={"verify": VerifyVersionCommand},
)


@ -2,9 +2,13 @@
The `mlagents_envs` Python package is part of the
[ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents).
`mlagents_envs` provides a Python API that allows direct interaction with the
Unity game engine. It is used by the trainer implementation in `mlagents` as
well as the `gym-unity` package to perform reinforcement learning within Unity.
`mlagents_envs` provides three Python APIs that allow direct interaction with the
Unity game engine:
- A single agent API (Gym API)
- A gym-like multi-agent API (PettingZoo API)
- A low-level API (LLAPI)
The LLAPI is used by the trainer implementation in `mlagents`.
`mlagents_envs` can be used independently of `mlagents` for Python
communication.
@ -13,13 +17,17 @@ communication.
Install the `mlagents_envs` package with:
```sh
python -m pip install mlagents_envs==0.28.0
python -m pip install mlagents_envs==0.29.0
```
## Usage & More Information
See the [Python API Guide](../docs/Python-API.md) for more information on how to
use the API to interact with a Unity environment.
See
- [Gym API Guide](../docs/Python-Gym-API.md)
- [PettingZoo API Guide](../docs/Python-PettingZoo-API.md)
- [Python API Guide](../docs/Python-LLAPI.md)
for more information on how to use the API to interact with a Unity environment.
For more information on the ML-Agents Toolkit and how to instrument a Unity
scene with the ML-Agents SDK, check out the main


@ -0,0 +1,318 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ML-Agents PettingZoo Wrapper"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#@title Install Rendering Dependencies { display-mode: \"form\" }\n",
"#@markdown (You only need to run this code when using Colab's hosted runtime)\n",
"\n",
"import os\n",
"from IPython.display import HTML, display\n",
"\n",
"def progress(value, max=100):\n",
" return HTML(\"\"\"\n",
" <progress\n",
" value='{value}'\n",
" max='{max}',\n",
" style='width: 100%'\n",
" >\n",
" {value}\n",
" </progress>\n",
" \"\"\".format(value=value, max=max))\n",
"\n",
"pro_bar = display(progress(0, 100), display_id=True)\n",
"\n",
"try:\n",
" import google.colab\n",
" INSTALL_XVFB = True\n",
"except ImportError:\n",
" INSTALL_XVFB = 'COLAB_ALWAYS_INSTALL_XVFB' in os.environ\n",
"\n",
"if INSTALL_XVFB:\n",
" with open('frame-buffer', 'w') as writefile:\n",
" writefile.write(\"\"\"#taken from https://gist.github.com/jterrace/2911875\n",
"XVFB=/usr/bin/Xvfb\n",
"XVFBARGS=\":1 -screen 0 1024x768x24 -ac +extension GLX +render -noreset\"\n",
"PIDFILE=./frame-buffer.pid\n",
"case \"$1\" in\n",
" start)\n",
" echo -n \"Starting virtual X frame buffer: Xvfb\"\n",
" /sbin/start-stop-daemon --start --quiet --pidfile $PIDFILE --make-pidfile --background --exec $XVFB -- $XVFBARGS\n",
" echo \".\"\n",
" ;;\n",
" stop)\n",
" echo -n \"Stopping virtual X frame buffer: Xvfb\"\n",
" /sbin/start-stop-daemon --stop --quiet --pidfile $PIDFILE\n",
" rm $PIDFILE\n",
" echo \".\"\n",
" ;;\n",
" restart)\n",
" $0 stop\n",
" $0 start\n",
" ;;\n",
" *)\n",
" echo \"Usage: /etc/init.d/xvfb {start|stop|restart}\"\n",
" exit 1\n",
"esac\n",
"exit 0\n",
" \"\"\")\n",
" pro_bar.update(progress(5, 100))\n",
" !apt-get install daemon >/dev/null 2>&1\n",
" pro_bar.update(progress(10, 100))\n",
" !apt-get install wget >/dev/null 2>&1\n",
" pro_bar.update(progress(20, 100))\n",
" !wget http://security.ubuntu.com/ubuntu/pool/main/libx/libxfont/libxfont1_1.5.1-1ubuntu0.16.04.4_amd64.deb >/dev/null 2>&1\n",
" pro_bar.update(progress(30, 100))\n",
" !wget --output-document xvfb.deb http://security.ubuntu.com/ubuntu/pool/universe/x/xorg-server/xvfb_1.18.4-0ubuntu0.12_amd64.deb >/dev/null 2>&1\n",
" pro_bar.update(progress(40, 100))\n",
" !dpkg -i libxfont1_1.5.1-1ubuntu0.16.04.4_amd64.deb >/dev/null 2>&1\n",
" pro_bar.update(progress(50, 100))\n",
" !dpkg -i xvfb.deb >/dev/null 2>&1\n",
" pro_bar.update(progress(70, 100))\n",
" !rm libxfont1_1.5.1-1ubuntu0.16.04.4_amd64.deb\n",
" pro_bar.update(progress(80, 100))\n",
" !rm xvfb.deb\n",
" pro_bar.update(progress(90, 100))\n",
" !bash frame-buffer start\n",
" os.environ[\"DISPLAY\"] = \":1\"\n",
"pro_bar.update(progress(100, 100))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installing ml-agents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" import mlagents\n",
" print(\"ml-agents already installed\")\n",
"except ImportError:\n",
" !git clone -b main --single-branch https://github.com/Unity-Technologies/ml-agents.git\n",
" !python -m pip install -q ./ml-agents/ml-agents-envs\n",
" !python -m pip install -q ./ml-agents/ml-agents\n",
" print(\"Installed ml-agents\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the Environment"
]
},
{
"cell_type": "markdown",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"List of available environments:\n",
"* Basic\n",
"* ThreeDBall\n",
"* ThreeDBallHard\n",
"* GridWorld\n",
"* Hallway\n",
"* VisualHallway\n",
"* CrawlerDynamicTarget\n",
"* CrawlerStaticTarget\n",
"* Bouncer\n",
"* SoccerTwos\n",
"* PushBlock\n",
"* VisualPushBlock\n",
"* WallJump\n",
"* Tennis\n",
"* Reacher\n",
"* Pyramids\n",
"* VisualPyramids\n",
"* Walker\n",
"* FoodCollector\n",
"* VisualFoodCollector\n",
"* StrikersVsGoalie\n",
"* WormStaticTarget\n",
"* WormDynamicTarget"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start Environment with PettingZoo Wrapper"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YSf-WhxbqtLw"
},
"outputs": [],
"source": [
"# -----------------\n",
"# This code is used to close an env that might not have been closed before\n",
"try:\n",
" env.close()\n",
"except:\n",
" pass\n",
"# -----------------\n",
"\n",
"import numpy as np\n",
"from mlagents_envs.envs import StrikersVsGoalie # import unity environment\n",
"env = StrikersVsGoalie.env()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stepping the environment\n",
"\n",
"Example of interacting with the environment in basic RL loop. It follows the same interface as described in [PettingZoo API page](https://www.pettingzoo.ml/api)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dhtl0mpeqxYi"
},
"outputs": [],
"source": [
"num_cycles = 10\n",
"\n",
"env.reset()\n",
"for agent in env.agent_iter(env.num_agents * num_cycles):\n",
" prev_observe, reward, done, info = env.last()\n",
" if isinstance(prev_observe, dict) and 'action_mask' in prev_observe:\n",
" action_mask = prev_observe['action_mask']\n",
" if done:\n",
" action = None\n",
" else:\n",
" action = env.action_spaces[agent].sample() # randomly choose an action for example\n",
" env.step(action)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional Environment API\n",
"\n",
"All the API described in the `Additional Environment API` section in the [PettingZoo API page](https://www.pettingzoo.ml/api) are all supported. A few examples are shown below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# `agents`: a list of the names of all current agents\n",
"print(\"Agent names:\", env.agents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# `agent_selection`: the currently agent that an action can be taken for.\n",
"print(\"Current agent:\", env.agent_selection)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# `observation_spaces`: a dict of the observation spaces of every agent, keyed by name.\n",
"print(\"Observation space of current agent:\", env.observation_spaces[env.agent_selection])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# `action_spaces`: a dict of the observation spaces of every agent, keyed by name.\n",
"print(\"Action space of current agent:\", env.action_spaces[env.agent_selection])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Close the Environment to free the port it is using"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "a7KatdThq7OV"
},
"outputs": [],
"source": [
"env.close()"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "Colab-UnityEnvironment-1-Run.ipynb",
"private_outputs": true,
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@ -0,0 +1,15 @@
from mlagents_envs.registry import default_registry
from mlagents_envs.envs.pettingzoo_env_factory import logger, PettingZooEnvFactory
# Register each environment in default_registry as a PettingZooEnv
for key in default_registry:
env_name = key
if key[0].isdigit():
env_name = key.replace("3", "Three")
if not env_name.isidentifier():
logger.warning(
f"Environment id {env_name} can not be registered since it is"
f"not a valid identifier name."
)
continue
locals()[env_name] = PettingZooEnvFactory(key)
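The net effect of this loop (a sketch; the exact names depend on the contents of `default_registry`) is that every valid registry entry becomes an importable factory, with identifiers that start with a digit renamed, e.g. `3DBall` is exposed as `ThreeDBall`:
```python
# Each valid registry key becomes an attribute of mlagents_envs.envs.
from mlagents_envs.envs import ThreeDBall  # registry key "3DBall", renamed by the loop above

env = ThreeDBall.env()  # PettingZooEnvFactory.env() -> UnityAECEnv
env.reset()
env.close()
```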


@ -0,0 +1,76 @@
from urllib.parse import urlparse, parse_qs
def _behavior_to_agent_id(behavior_name: str, unique_id: int) -> str:
return f"{behavior_name}?agent_id={unique_id}"
def _agent_id_to_behavior(agent_id: str) -> str:
return agent_id.split("?agent_id=")[0]
def _unwrap_batch_steps(batch_steps, behavior_name):
decision_batch, termination_batch = batch_steps
decision_id = [
_behavior_to_agent_id(behavior_name, i) for i in decision_batch.agent_id
]
termination_id = [
_behavior_to_agent_id(behavior_name, i) for i in termination_batch.agent_id
]
agents = decision_id + termination_id
obs = {
agent_id: [batch_obs[i] for batch_obs in termination_batch.obs]
for i, agent_id in enumerate(termination_id)
}
if decision_batch.action_mask is not None:
obs.update(
{
agent_id: {
"observation": [batch_obs[i] for batch_obs in decision_batch.obs],
"action_mask": [mask[i] for mask in decision_batch.action_mask],
}
for i, agent_id in enumerate(decision_id)
}
)
else:
obs.update(
{
agent_id: [batch_obs[i] for batch_obs in decision_batch.obs]
for i, agent_id in enumerate(decision_id)
}
)
obs = {k: v if len(v) > 1 else v[0] for k, v in obs.items()}
dones = {agent_id: True for agent_id in termination_id}
dones.update({agent_id: False for agent_id in decision_id})
rewards = {
agent_id: termination_batch.reward[i]
for i, agent_id in enumerate(termination_id)
}
rewards.update(
{agent_id: decision_batch.reward[i] for i, agent_id in enumerate(decision_id)}
)
cumulative_rewards = {k: v for k, v in rewards.items()}
infos = {}
for i, agent_id in enumerate(decision_id):
infos[agent_id] = {}
infos[agent_id]["behavior_name"] = behavior_name
infos[agent_id]["group_id"] = decision_batch.group_id[i]
infos[agent_id]["group_reward"] = decision_batch.group_reward[i]
for i, agent_id in enumerate(termination_id):
infos[agent_id] = {}
infos[agent_id]["behavior_name"] = behavior_name
infos[agent_id]["group_id"] = termination_batch.group_id[i]
infos[agent_id]["group_reward"] = termination_batch.group_reward[i]
infos[agent_id]["interrupted"] = termination_batch.interrupted[i]
id_map = {agent_id: i for i, agent_id in enumerate(decision_id)}
return agents, obs, dones, rewards, cumulative_rewards, infos, id_map
def _parse_behavior(full_behavior):
parsed = urlparse(full_behavior)
name = parsed.path
ids = parse_qs(parsed.query)
team_id: int = 0
if "team" in ids:
team_id = int(ids["team"][0])
return name, team_id
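For orientation, the values these private helpers produce for a typical ML-Agents behavior name (the results follow directly from the string handling above):
```python
# Behavior name plus per-agent index -> PettingZoo agent id, and back again.
agent_id = _behavior_to_agent_id("Striker?team=0", 2)   # "Striker?team=0?agent_id=2"
behavior = _agent_id_to_behavior(agent_id)              # "Striker?team=0"

# Split a full behavior name into its base name and team id.
name, team_id = _parse_behavior("Striker?team=0")       # ("Striker", 0)
```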


@ -0,0 +1,50 @@
from typing import Optional, Union, List
from mlagents_envs import logging_util
from mlagents_envs.exception import UnityWorkerInUseException
from mlagents_envs.registry import default_registry
from mlagents_envs.side_channel.engine_configuration_channel import (
EngineConfigurationChannel,
)
from mlagents_envs.side_channel.environment_parameters_channel import (
EnvironmentParametersChannel,
)
from mlagents_envs.side_channel.stats_side_channel import StatsSideChannel
from mlagents_envs.envs.unity_aec_env import UnityAECEnv
logger = logging_util.get_logger(__name__)
class PettingZooEnvFactory:
def __init__(self, env_id: str) -> None:
self.env_id = env_id
def env(
self, seed: Optional[int] = None, **kwargs: Union[List, int, bool, None]
) -> UnityAECEnv:
"""
Creates the environment with env_id from Unity's default_registry and wraps it in a UnityAECEnv.
:param seed: The seed for the action spaces of the agents.
:param kwargs: Any argument accepted by the `UnityEnvironment` class except file_name.
"""
# If no side_channels are specified, add the following defaults
if "side_channels" not in kwargs:
kwargs["side_channels"] = [
EngineConfigurationChannel(),
EnvironmentParametersChannel(),
StatsSideChannel(),
]
_env = None
# If no base port argument is provided, try ports starting at 6000 until one is free
if "base_port" not in kwargs:
port = 6000
while _env is None:
try:
kwargs["base_port"] = port
_env = default_registry[self.env_id].make(**kwargs)
except UnityWorkerInUseException:
port += 1
pass
else:
_env = default_registry[self.env_id].make(**kwargs)
return UnityAECEnv(_env, seed)
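A hedged usage sketch for the factory: every keyword argument except file_name is forwarded to the registry entry's make(), and the default side channels added above remain reachable through the wrapper (the registry id and time_scale value are illustrative):

from mlagents_envs.envs.pettingzoo_env_factory import PettingZooEnvFactory

factory = PettingZooEnvFactory("StrikersVsGoalie")  # assumed registry id
env = factory.env(seed=1, no_graphics=True)  # kwargs are forwarded to the Unity environment
# EngineConfigurationChannel is one of the defaults added when side_channels is unset.
env.side_channel["EngineConfigurationChannel"].set_configuration_parameters(time_scale=20.0)
env.reset()
env.close()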


@ -0,0 +1,72 @@
from typing import Any, Optional
from gym import error
from mlagents_envs.base_env import BaseEnv
from pettingzoo import AECEnv
from mlagents_envs.envs.unity_pettingzoo_base_env import UnityPettingzooBaseEnv
class UnityAECEnv(UnityPettingzooBaseEnv, AECEnv):
"""
Unity AEC (PettingZoo) environment wrapper.
"""
def __init__(self, env: BaseEnv, seed: Optional[int] = None):
"""
Initializes a Unity AEC environment wrapper.
:param env: The UnityEnvironment that is being wrapped.
:param seed: The seed for the action spaces of the agents.
"""
super().__init__(env, seed)
def step(self, action: Any) -> None:
"""
Sets the action of the active agent and gets the observation, reward, done
and info of the next agent.
:param action: The action for the active agent
"""
self._assert_loaded()
if len(self._live_agents) <= 0:
raise error.Error(
"You must reset the environment before you can perform a step"
)
# Process action
current_agent = self._agents[self._agent_index]
self._process_action(current_agent, action)
self._agent_index += 1
# Reset reward
for k in self._rewards.keys():
self._rewards[k] = 0
if self._agent_index >= len(self._agents) and self.num_agents > 0:
# The index is too high, time to set the action for the agents we have
self._step()
self._live_agents.sort() # unnecessary, only for passing API test
def observe(self, agent_id):
"""
Returns the observation an agent currently can make. `last()` calls this function.
"""
return (
self._observations[agent_id],
self._cumm_rewards[agent_id],
self._dones[agent_id],
self._infos[agent_id],
)
def last(self, observe=True):
"""
Returns the observation, cumulative reward, done, and info for the current agent (specified by self.agent_selection).
"""
obs, reward, done, info = self.observe(self._agents[self._agent_index])
return obs if observe else None, reward, done, info
@property
def agent_selection(self):
if not self._live_agents:
# If we had an agent finish then return that agent even though it isn't alive.
return self._agents[0]
return self._agents[self._agent_index]
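For context, the AEC interaction loop this wrapper targets looks roughly as follows; agent_iter() is inherited from pettingzoo's AECEnv, and passing a None action for a finished agent removes it from the live set (registry id assumed, as before):

from mlagents_envs.registry import default_registry
from mlagents_envs.envs.unity_aec_env import UnityAECEnv

unity_env = default_registry["StrikersVsGoalie"].make(no_graphics=True)  # assumed id
env = UnityAECEnv(unity_env, seed=1)
env.reset()
for agent in env.agent_iter(max_iter=200):
    obs, reward, done, info = env.last()
    # Finished agents must receive a None action.
    action = None if done else env.action_spaces[agent].sample()
    env.step(action)
env.close()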


@ -19,8 +19,6 @@ class UnityGymException(error.Error):
logger = logging_util.get_logger(__name__)
logging_util.set_log_level(logging_util.INFO)
GymStepResult = Tuple[np.ndarray, float, bool, Dict]
@ -58,7 +56,7 @@ class UnityToGymWrapper(gym.Env):
self.visual_obs = None
# Save the step result from the last time all Agents requested decisions.
- self._previous_decision_step: DecisionSteps = None
+ self._previous_decision_step: Optional[DecisionSteps] = None
self._flattener = None
# Hidden flag used by Atari environments to determine if the game is over
self.game_over = False
@ -355,7 +353,7 @@ class ActionFlattener:
def lookup_action(self, action):
"""
Convert a scalar discrete action into a unique set of branched actions.
- :param: action: A scalar value representing one of the discrete actions.
- :return: The List containing the branched actions.
+ :param action: A scalar value representing one of the discrete actions.
+ :returns: The List containing the branched actions.
"""
return self.action_lookup[action]


@ -0,0 +1,53 @@
from typing import Optional, Dict, Any, Tuple
from gym import error
from mlagents_envs.base_env import BaseEnv
from pettingzoo import ParallelEnv
from mlagents_envs.envs.unity_pettingzoo_base_env import UnityPettingzooBaseEnv
class UnityParallelEnv(UnityPettingzooBaseEnv, ParallelEnv):
"""
Unity Parallel (PettingZoo) environment wrapper.
"""
def __init__(self, env: BaseEnv, seed: Optional[int] = None):
"""
Initializes a Unity Parallel environment wrapper.
:param env: The UnityEnvironment that is being wrapped.
:param seed: The seed for the action spaces of the agents.
"""
super().__init__(env, seed)
def reset(self) -> Dict[str, Any]:
"""
Resets the environment.
"""
super().reset()
return self._observations
def step(self, actions: Dict[str, Any]) -> Tuple:
self._assert_loaded()
if len(self._live_agents) <= 0 and actions:
raise error.Error(
"You must reset the environment before you can perform a step."
)
# Process actions
for current_agent, action in actions.items():
self._process_action(current_agent, action)
# Reset reward
for k in self._rewards.keys():
self._rewards[k] = 0
# Step environment
self._step()
# Agent cleanup and sorting
self._cleanup_agents()
self._live_agents.sort() # unnecessary, only for passing API test
return self._observations, self._rewards, self._dones, self._infos
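The corresponding parallel loop, again only a sketch under the same assumptions; step() returns the (observations, rewards, dones, infos) dictionaries assembled above:

from mlagents_envs.registry import default_registry
from mlagents_envs.envs.unity_parallel_env import UnityParallelEnv

unity_env = default_registry["StrikersVsGoalie"].make(no_graphics=True)  # assumed id
env = UnityParallelEnv(unity_env, seed=1)
observations = env.reset()
for _ in range(200):
    if not env.agents:
        break
    actions = {agent: env.action_spaces[agent].sample() for agent in env.agents}
    observations, rewards, dones, infos = env.step(actions)
env.close()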


@ -0,0 +1,317 @@
import atexit
from typing import Optional, List, Set, Dict, Any, Tuple
import numpy as np
from gym import error, spaces
from mlagents_envs.base_env import BaseEnv, ActionTuple
from mlagents_envs.envs.env_helpers import _agent_id_to_behavior, _unwrap_batch_steps
class UnityPettingzooBaseEnv:
"""
Unity Petting Zoo base environment.
"""
def __init__(
self, env: BaseEnv, seed: Optional[int] = None, metadata: Optional[dict] = None
):
super().__init__()
atexit.register(self.close)
self._env = env
self.metadata = metadata
self._assert_loaded()
self._agent_index = 0
self._seed = seed
self._side_channel_dict = {
type(v).__name__: v
for v in self._env._side_channel_manager._side_channels_dict.values() # type: ignore
}
self._live_agents: List[str] = [] # agent id for agents alive
self._agents: List[str] = [] # all agent id in current step
self._possible_agents: Set[str] = set()  # all agents that have ever appeared
self._agent_id_to_index: Dict[str, int] = {} # agent_id: index in decision step
self._observations: Dict[str, np.ndarray] = {} # agent_id: obs
self._dones: Dict[str, bool] = {} # agent_id: done
self._rewards: Dict[str, float] = {} # agent_id: reward
self._cumm_rewards: Dict[str, float] = {}  # agent_id: cumulative reward
self._infos: Dict[str, Dict] = {} # agent_id: info
self._action_spaces: Dict[str, spaces.Space] = {} # behavior_name: action_space
self._observation_spaces: Dict[
str, spaces.Space
] = {} # behavior_name: obs_space
self._current_action: Dict[str, ActionTuple] = {} # behavior_name: ActionTuple
# Take a single step so that the brain information will be sent over
if not self._env.behavior_specs:
self._env.step()
for behavior_name in self._env.behavior_specs.keys():
_, _, _ = self._batch_update(behavior_name)
self._update_observation_spaces()
self._update_action_spaces()
def _assert_loaded(self) -> None:
if self._env is None:
raise error.Error("No environment loaded")
@property
def observation_spaces(self) -> Dict[str, spaces.Space]:
"""
Return the observation spaces of all the agents.
"""
return {
agent_id: self._observation_spaces[_agent_id_to_behavior(agent_id)]
for agent_id in self._possible_agents
}
def observation_space(self, agent: str) -> Optional[spaces.Space]:
"""
The observation space of the current agent.
"""
behavior_name = _agent_id_to_behavior(agent)
return self._observation_spaces[behavior_name]
def _update_observation_spaces(self) -> None:
self._assert_loaded()
for behavior_name in self._env.behavior_specs.keys():
if behavior_name not in self._observation_spaces:
obs_spec = self._env.behavior_specs[behavior_name].observation_specs
obs_spaces = tuple(
spaces.Box(
low=-np.float32(np.inf),
high=np.float32(np.inf),
shape=spec.shape,
dtype=np.float32,
)
for spec in obs_spec
)
if len(obs_spaces) == 1:
self._observation_spaces[behavior_name] = obs_spaces[0]
else:
self._observation_spaces[behavior_name] = spaces.Tuple(obs_spaces)
@property
def action_spaces(self) -> Dict[str, spaces.Space]:
"""
Return the action spaces of all the agents.
"""
return {
agent_id: self._action_spaces[_agent_id_to_behavior(agent_id)]
for agent_id in self._possible_agents
}
def action_space(self, agent: str) -> Optional[spaces.Space]:
"""
The action space of the current agent.
"""
behavior_name = _agent_id_to_behavior(agent)
return self._action_spaces[behavior_name]
def _update_action_spaces(self) -> None:
self._assert_loaded()
for behavior_name in self._env.behavior_specs.keys():
if behavior_name not in self._action_spaces:
act_spec = self._env.behavior_specs[behavior_name].action_spec
if (
act_spec.continuous_size == 0
and len(act_spec.discrete_branches) == 0
):
raise error.Error("No actions found")
if act_spec.discrete_size == 1:
d_space = spaces.Discrete(act_spec.discrete_branches[0])
if self._seed is not None:
d_space.seed(self._seed)
if act_spec.continuous_size == 0:
self._action_spaces[behavior_name] = d_space
continue
if act_spec.discrete_size > 0:
d_space = spaces.MultiDiscrete(act_spec.discrete_branches)
if self._seed is not None:
d_space.seed(self._seed)
if act_spec.continuous_size == 0:
self._action_spaces[behavior_name] = d_space
continue
if act_spec.continuous_size > 0:
c_space = spaces.Box(
-1, 1, (act_spec.continuous_size,), dtype=np.float32  # continuous actions need a float dtype
)
if self._seed is not None:
c_space.seed(self._seed)
if len(act_spec.discrete_branches) == 0:
self._action_spaces[behavior_name] = c_space
continue
self._action_spaces[behavior_name] = spaces.Tuple((c_space, d_space))
def _process_action(self, current_agent, action):
current_action_space = self.action_space(current_agent)
# Convert actions
if action is not None:
if isinstance(action, Tuple):
action = tuple(np.array(a) for a in action)
else:
action = self._action_to_np(current_action_space, action)
if not current_action_space.contains(action): # type: ignore
raise error.Error(
f"Invalid action, got {action} but was expecting action from {self.action_space}"
)
if isinstance(current_action_space, spaces.Tuple):
action = ActionTuple(action[0], action[1])
elif isinstance(current_action_space, spaces.MultiDiscrete):
action = ActionTuple(None, action)
elif isinstance(current_action_space, spaces.Discrete):
action = ActionTuple(None, np.array(action).reshape(1, 1))
else:
action = ActionTuple(action, None)
if not self._dones[current_agent]:
current_behavior = _agent_id_to_behavior(current_agent)
current_index = self._agent_id_to_index[current_agent]
if action.continuous is not None:
self._current_action[current_behavior].continuous[
current_index
] = action.continuous[0]
if action.discrete is not None:
self._current_action[current_behavior].discrete[
current_index
] = action.discrete[0]
else:
self._live_agents.remove(current_agent)
del self._observations[current_agent]
del self._dones[current_agent]
del self._rewards[current_agent]
del self._cumm_rewards[current_agent]
del self._infos[current_agent]
def _step(self):
for behavior_name, actions in self._current_action.items():
self._env.set_actions(behavior_name, actions)
self._env.step()
self._reset_states()
for behavior_name in self._env.behavior_specs.keys():
dones, rewards, cumulative_rewards = self._batch_update(behavior_name)
self._dones.update(dones)
self._rewards.update(rewards)
self._cumm_rewards.update(cumulative_rewards)
self._agent_index = 0
def _cleanup_agents(self):
for current_agent, done in self.dones.items():
if done:
self._live_agents.remove(current_agent)
@property
def side_channel(self) -> Dict[str, Any]:
"""
The side channels of the environment. You can access the side channels
of an environment with `env.side_channel[<name-of-channel>]`.
"""
self._assert_loaded()
return self._side_channel_dict
@staticmethod
def _action_to_np(current_action_space, action):
return np.array(action, dtype=current_action_space.dtype)
def _create_empty_actions(self, behavior_name, num_agents):
a_spec = self._env.behavior_specs[behavior_name].action_spec
return ActionTuple(
np.zeros((num_agents, a_spec.continuous_size), dtype=np.float32),
np.zeros((num_agents, len(a_spec.discrete_branches)), dtype=np.int32),
)
@property
def _cumulative_rewards(self):
return self._cumm_rewards
def _reset_states(self):
self._live_agents = []
self._agents = []
self._observations = {}
self._dones = {}
self._rewards = {}
self._cumm_rewards = {}
self._infos = {}
self._agent_id_to_index = {}
def reset(self):
"""
Resets the environment.
"""
self._assert_loaded()
self._agent_index = 0
self._reset_states()
self._possible_agents = set()
self._env.reset()
for behavior_name in self._env.behavior_specs.keys():
_, _, _ = self._batch_update(behavior_name)
self._live_agents.sort() # unnecessary, only for passing API test
self._dones = {agent: False for agent in self._agents}
self._rewards = {agent: 0 for agent in self._agents}
self._cumm_rewards = {agent: 0 for agent in self._agents}
def _batch_update(self, behavior_name):
current_batch = self._env.get_steps(behavior_name)
self._current_action[behavior_name] = self._create_empty_actions(
behavior_name, len(current_batch[0])
)
agents, obs, dones, rewards, cumulative_rewards, infos, id_map = _unwrap_batch_steps(
current_batch, behavior_name
)
self._live_agents += agents
self._agents += agents
self._observations.update(obs)
self._infos.update(infos)
self._agent_id_to_index.update(id_map)
self._possible_agents.update(agents)
return dones, rewards, cumulative_rewards
def seed(self, seed=None):
"""
Reseeds the environment (making the resulting environment deterministic).
`reset()` must be called after `seed()`, and before `step()`.
"""
self._seed = seed
def render(self, mode="human"):
"""
NOT SUPPORTED.
Displays a rendered frame from the environment, if supported.
Alternate render modes in the default environments are `'rgb_array'`
which returns a numpy array and is supported by all environments outside of classic,
and `'ansi'` which returns the strings printed (specific to classic environments).
"""
pass
@property
def dones(self):
return dict(self._dones)
@property
def agents(self):
return sorted(self._live_agents)
@property
def rewards(self):
return dict(self._rewards)
@property
def infos(self):
return dict(self._infos)
@property
def possible_agents(self):
return sorted(self._possible_agents)
def close(self) -> None:
"""
Close the environment.
"""
if self._env is not None:
self._env.close()
self._env = None # type: ignore
def __del__(self) -> None:
self.close()
def state(self):
pass
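To make the branching in _update_action_spaces and _process_action easier to follow, the implied spec-to-space mapping is sketched below with made-up ActionSpec sizes; this is illustrative, not an exhaustive statement of the API:

from gym import spaces
import numpy as np

# Mapping implied by _update_action_spaces (sizes are made up):
#   continuous_size=0, discrete_branches=(3,)   -> spaces.Discrete(3)
#   continuous_size=0, discrete_branches=(2, 3) -> spaces.MultiDiscrete([2, 3])
#   continuous_size=4, discrete_branches=()     -> spaces.Box(-1, 1, (4,), np.float32)
#   continuous_size=4, discrete_branches=(2, 3) -> spaces.Tuple((Box, MultiDiscrete))
# _process_action then repacks a sampled action into an ActionTuple, e.g. for the hybrid case:
hybrid = spaces.Tuple(
    (spaces.Box(-1, 1, (4,), dtype=np.float32), spaces.MultiDiscrete([2, 3]))
)
continuous_part, discrete_part = hybrid.sample()  # maps to ActionTuple(continuous, discrete)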


@ -2,7 +2,18 @@
folder: docs
modules:
- name: mlagents_envs
- file_name: Python-API-Documentation.md
+ file_name: Python-Gym-API-Documentation.md
submodules:
- envs.unity_gym_env
- name: mlagents_envs
file_name: Python-PettingZoo-API-Documentation.md
submodules:
- envs.pettingzoo_env_factory
- envs.unity_aec_env
- envs.unity_parallel_env
- envs.unity_pettingzoo_base_env
- name: mlagents_envs
file_name: Python-LLAPI-Documentation.md
submodules:
- base_env
- environment


@ -43,7 +43,9 @@ setup(
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
],
- packages=find_packages(exclude=["*.tests", "*.tests.*", "tests.*", "tests"]),
+ packages=find_packages(
+ exclude=["*.tests", "*.tests.*", "tests.*", "tests", "colabs", "*.ipynb"]
+ ),
zip_safe=False,
install_requires=[
"cloudpickle",
@ -52,6 +54,9 @@ setup(
"Pillow>=4.2.1",
"protobuf>=3.6",
"pyyaml>=3.1.0",
"gym==0.21.0",
"pettingzoo==1.14.0",
"numpy==1.21.2",
],
python_requires=">=3.7.2",
cmdclass={"verify": VerifyVersionCommand},


@ -0,0 +1,111 @@
from typing import List, Tuple
from mlagents_envs.base_env import ObservationSpec, DimensionProperty, ObservationType
import pytest
import copy
import os
from mlagents.trainers.settings import (
POCASettings,
TrainerSettings,
PPOSettings,
SACSettings,
GAILSettings,
CuriositySettings,
RewardSignalSettings,
NetworkSettings,
TrainerType,
RewardSignalType,
ScheduleType,
)
CONTINUOUS_DEMO_PATH = os.path.dirname(os.path.abspath(__file__)) + "/test.demo"
DISCRETE_DEMO_PATH = os.path.dirname(os.path.abspath(__file__)) + "/testdcvis.demo"
_PPO_CONFIG = TrainerSettings(
trainer_type=TrainerType.PPO,
hyperparameters=PPOSettings(
learning_rate=5.0e-3,
learning_rate_schedule=ScheduleType.CONSTANT,
batch_size=16,
buffer_size=64,
),
network_settings=NetworkSettings(num_layers=1, hidden_units=32),
summary_freq=500,
max_steps=3000,
threaded=False,
)
_SAC_CONFIG = TrainerSettings(
trainer_type=TrainerType.SAC,
hyperparameters=SACSettings(
learning_rate=5.0e-3,
learning_rate_schedule=ScheduleType.CONSTANT,
batch_size=8,
buffer_init_steps=100,
buffer_size=5000,
tau=0.01,
init_entcoef=0.01,
),
network_settings=NetworkSettings(num_layers=1, hidden_units=16),
summary_freq=100,
max_steps=1000,
threaded=False,
)
_POCA_CONFIG = TrainerSettings(
trainer_type=TrainerType.POCA,
hyperparameters=POCASettings(
learning_rate=5.0e-3,
learning_rate_schedule=ScheduleType.CONSTANT,
batch_size=16,
buffer_size=64,
),
network_settings=NetworkSettings(num_layers=1, hidden_units=32),
summary_freq=500,
max_steps=3000,
threaded=False,
)
def ppo_dummy_config():
return copy.deepcopy(_PPO_CONFIG)
def sac_dummy_config():
return copy.deepcopy(_SAC_CONFIG)
def poca_dummy_config():
return copy.deepcopy(_POCA_CONFIG)
@pytest.fixture
def gail_dummy_config():
return {RewardSignalType.GAIL: GAILSettings(demo_path=CONTINUOUS_DEMO_PATH)}
@pytest.fixture
def curiosity_dummy_config():
return {RewardSignalType.CURIOSITY: CuriositySettings()}
@pytest.fixture
def extrinsic_dummy_config():
return {RewardSignalType.EXTRINSIC: RewardSignalSettings()}
def create_observation_specs_with_shapes(
shapes: List[Tuple[int, ...]]
) -> List[ObservationSpec]:
obs_specs: List[ObservationSpec] = []
for i, shape in enumerate(shapes):
dim_prop = (DimensionProperty.UNSPECIFIED,) * len(shape)
if len(shape) == 2:
dim_prop = (DimensionProperty.VARIABLE_SIZE, DimensionProperty.NONE)
spec = ObservationSpec(
name=f"observation {i} with shape {shape}",
shape=shape,
dimension_property=dim_prop,
observation_type=ObservationType.DEFAULT,
)
obs_specs.append(spec)
return obs_specs


@ -0,0 +1,510 @@
"""
Copied from ml-agents/mlagents/trainers/tests/simple_test_envs.py
Modified the env so that it doesn't automatically reset and respawn agents, in order to pass the
pettingzoo API tests, since the current PZ API test doesn't allow spawning new agents.
"""
import random
from typing import Dict, List, Any, Tuple
import numpy as np
from mlagents_envs.base_env import (
ActionSpec,
ObservationSpec,
ObservationType,
ActionTuple,
BaseEnv,
BehaviorSpec,
DecisionSteps,
TerminalSteps,
BehaviorMapping,
)
from mlagents_envs.side_channel.side_channel_manager import SideChannelManager
from dummy_config import create_observation_specs_with_shapes
OBS_SIZE = 1
VIS_OBS_SIZE = (20, 20, 3)
VAR_LEN_SIZE = (10, 5)
STEP_SIZE = 0.2
TIME_PENALTY = 0.01
MIN_STEPS = int(1.0 / STEP_SIZE) + 1
SUCCESS_REWARD = 1.0 + MIN_STEPS * TIME_PENALTY
def clamp(x, min_val, max_val):
return max(min_val, min(x, max_val))
class SimpleEnvironment(BaseEnv):
"""
Very simple "game" - the agent has a position on [-1, 1], gets a reward of 1 if it reaches 1, and a reward of -1 if
it reaches -1. The position is incremented by the action amount (clamped to [-step_size, step_size]).
"""
def __init__(
self,
brain_names,
step_size=STEP_SIZE,
num_visual=0,
num_vector=1,
num_var_len=0,
vis_obs_size=VIS_OBS_SIZE,
vec_obs_size=OBS_SIZE,
var_len_obs_size=VAR_LEN_SIZE,
action_sizes=(1, 0),
goal_indices=None,
):
super().__init__()
self.num_visual = num_visual
self.num_vector = num_vector
self.num_var_len = num_var_len
self.vis_obs_size = vis_obs_size
self.vec_obs_size = vec_obs_size
self.var_len_obs_size = var_len_obs_size
self.goal_indices = goal_indices
continuous_action_size, discrete_action_size = action_sizes
discrete_tuple = tuple(2 for _ in range(discrete_action_size))
action_spec = ActionSpec(continuous_action_size, discrete_tuple)
self.total_action_size = (
continuous_action_size + discrete_action_size
) # to set the goals/positions
self.action_spec = action_spec
self.behavior_spec = BehaviorSpec(self._make_observation_specs(), action_spec)
self.action_spec = action_spec
self.names = brain_names
self.positions: Dict[str, List[float]] = {}
self.step_count: Dict[str, float] = {}
self._side_channel_manager = SideChannelManager([])
# Concatenate the arguments for a consistent random seed
seed = (
brain_names,
step_size,
num_visual,
num_vector,
num_var_len,
vis_obs_size,
vec_obs_size,
var_len_obs_size,
action_sizes,
)
self.random = random.Random(str(seed))
self.goal: Dict[str, int] = {}
self.action = {}
self.rewards: Dict[str, float] = {}
self.final_rewards: Dict[str, List[float]] = {}
self.step_result: Dict[str, Tuple[DecisionSteps, TerminalSteps]] = {}
self.agent_id: Dict[str, int] = {}
self.step_size = step_size # defines the difficulty of the test
# Allow this env to be used as a UnityEnvironment during tests
self.academy_capabilities = None
for name in self.names:
self.agent_id[name] = 0
self.goal[name] = self.random.choice([-1, 1])
self.rewards[name] = 0
self.final_rewards[name] = []
self._reset_agent(name)
self.action[name] = None
self.step_result[name] = None
def _make_observation_specs(self) -> List[ObservationSpec]:
obs_shape: List[Any] = []
for _ in range(self.num_vector):
obs_shape.append((self.vec_obs_size,))
for _ in range(self.num_visual):
obs_shape.append(self.vis_obs_size)
for _ in range(self.num_var_len):
obs_shape.append(self.var_len_obs_size)
obs_spec = create_observation_specs_with_shapes(obs_shape)
if self.goal_indices is not None:
for i in range(len(obs_spec)):
if i in self.goal_indices:
obs_spec[i] = ObservationSpec(
shape=obs_spec[i].shape,
dimension_property=obs_spec[i].dimension_property,
observation_type=ObservationType.GOAL_SIGNAL,
name=obs_spec[i].name,
)
return obs_spec
def _make_obs(self, value: float) -> List[np.ndarray]:
obs = []
for _ in range(self.num_vector):
obs.append(np.ones((1, self.vec_obs_size), dtype=np.float32) * value)
for _ in range(self.num_visual):
obs.append(np.ones((1,) + self.vis_obs_size, dtype=np.float32) * value)
for _ in range(self.num_var_len):
obs.append(np.ones((1,) + self.var_len_obs_size, dtype=np.float32) * value)
return obs
@property
def behavior_specs(self):
behavior_dict = {}
for n in self.names:
behavior_dict[n] = self.behavior_spec
return BehaviorMapping(behavior_dict)
def set_action_for_agent(self, behavior_name, agent_id, action):
pass
def set_actions(self, behavior_name, action):
self.action[behavior_name] = action
def get_steps(self, behavior_name):
return self.step_result[behavior_name]
def _take_action(self, name: str) -> bool:
deltas = []
_act = self.action[name]
if self.action_spec.continuous_size > 0 and _act.continuous is not None:
for _cont in _act.continuous[0]:
deltas.append(_cont)
if self.action_spec.discrete_size > 0 and _act.discrete is not None:
for _disc in _act.discrete[0]:
deltas.append(1 if _disc else -1)
for i, _delta in enumerate(deltas):
_delta = clamp(_delta, -self.step_size, self.step_size)
self.positions[name][i] += _delta
self.positions[name][i] = clamp(self.positions[name][i], -1, 1)
self.step_count[name] += 1
# All positions must reach either -1.0 or 1.0 to be done
done = all(pos >= 1.0 or pos <= -1.0 for pos in self.positions[name])
return done
def _generate_mask(self):
action_mask = None
if self.action_spec.discrete_size > 0:
# LL-Python API will return an empty dim if there is only 1 agent.
ndmask = np.array(
2 * self.action_spec.discrete_size * [False], dtype=np.bool
)
ndmask = np.expand_dims(ndmask, axis=0)
action_mask = [ndmask]
return action_mask
def _compute_reward(self, name: str, done: bool) -> float:
if done:
reward = 0.0
for _pos in self.positions[name]:
reward += (SUCCESS_REWARD * _pos * self.goal[name]) / len(
self.positions[name]
)
else:
reward = -TIME_PENALTY
return reward
def _reset_agent(self, name):
self.goal[name] = self.random.choice([-1, 1])
self.positions[name] = [0.0 for _ in range(self.total_action_size)]
self.step_count[name] = 0
self.rewards[name] = 0
self.agent_id[name] = self.agent_id[name] + 1
def _make_batched_step(
self, name: str, done: bool, reward: float, group_reward: float
) -> Tuple[DecisionSteps, TerminalSteps]:
m_vector_obs = self._make_obs(self.goal[name])
m_reward = np.array([reward], dtype=np.float32)
m_agent_id = np.array([self.agent_id[name]], dtype=np.int32)
m_group_id = np.array([0], dtype=np.int32)
m_group_reward = np.array([group_reward], dtype=np.float32)
action_mask = self._generate_mask()
decision_step = DecisionSteps(
m_vector_obs, m_reward, m_agent_id, action_mask, m_group_id, m_group_reward
)
terminal_step = TerminalSteps.empty(self.behavior_spec)
if done:
self.final_rewards[name].append(self.rewards[name])
# self._reset_agent(name)
# new_vector_obs = self._make_obs(self.goal[name])
# (
# new_reward,
# new_done,
# new_agent_id,
# new_action_mask,
# new_group_id,
# new_group_reward,
# ) = self._construct_reset_step(name)
# decision_step = DecisionSteps(
# new_vector_obs,
# new_reward,
# new_agent_id,
# new_action_mask,
# new_group_id,
# new_group_reward,
# )
decision_step = DecisionSteps([], [], [], [], [], [])
terminal_step = TerminalSteps(
m_vector_obs,
m_reward,
np.array([False], dtype=bool),
m_agent_id,
m_group_id,
m_group_reward,
)
return (decision_step, terminal_step)
def _construct_reset_step(
self, name: str
) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
new_reward = np.array([0.0], dtype=np.float32)
new_done = np.array([False], dtype=np.bool)
new_agent_id = np.array([self.agent_id[name]], dtype=np.int32)
new_action_mask = self._generate_mask()
new_group_id = np.array([0], dtype=np.int32)
new_group_reward = np.array([0.0], dtype=np.float32)
return (
new_reward,
new_done,
new_agent_id,
new_action_mask,
new_group_id,
new_group_reward,
)
def step(self) -> None:
assert all(action is not None for action in self.action.values())
for name in self.names:
done = self._take_action(name)
reward = self._compute_reward(name, done)
self.rewards[name] += reward
self.step_result[name] = self._make_batched_step(name, done, reward, 0.0)
def reset(self) -> None: # type: ignore
for name in self.names:
self._reset_agent(name)
self.step_result[name] = self._make_batched_step(name, False, 0.0, 0.0)
@property
def reset_parameters(self) -> Dict[str, str]:
return {}
def close(self):
pass
class MultiAgentEnvironment(BaseEnv):
"""
The MultiAgentEnvironment maintains a list of SimpleEnvironment, one for each agent.
When sending DecisionSteps and TerminalSteps to the trainers, it first batches the
decision steps from the individual environments. When setting actions, it indexes the
batched ActionTuple to obtain the ActionTuple for individual agents
"""
def __init__(
self,
brain_names,
step_size=STEP_SIZE,
num_visual=0,
num_vector=1,
num_var_len=0,
vis_obs_size=VIS_OBS_SIZE,
vec_obs_size=OBS_SIZE,
var_len_obs_size=VAR_LEN_SIZE,
action_sizes=(1, 0),
num_agents=2,
goal_indices=None,
):
super().__init__()
self.envs = {}
self.dones = {}
self.just_died = set()
self.names = brain_names
self.final_rewards: Dict[str, List[float]] = {}
for name in brain_names:
self.final_rewards[name] = []
for i in range(num_agents):
name_and_num = name + str(i)
self.envs[name_and_num] = SimpleEnvironment(
[name],
step_size,
num_visual,
num_vector,
num_var_len,
vis_obs_size,
vec_obs_size,
var_len_obs_size,
action_sizes,
goal_indices,
)
self.dones[name_and_num] = False
self.envs[name_and_num].reset()
# All envs have the same behavior spec, so just get the last one.
self.behavior_spec = self.envs[name_and_num].behavior_spec
self.action_spec = self.envs[name_and_num].action_spec
self.num_agents = num_agents
self._side_channel_manager = SideChannelManager([])
@property
def all_done(self):
return all(self.dones.values())
@property
def behavior_specs(self):
behavior_dict = {}
for n in self.names:
behavior_dict[n] = self.behavior_spec
return BehaviorMapping(behavior_dict)
def set_action_for_agent(self, behavior_name, agent_id, action):
pass
def set_actions(self, behavior_name, action):
# The ActionTuple contains the actions for all n_agents. This
# slices the ActionTuple into an action tuple for each environment
# and sets it. The index j is used to ignore agents that have already
# reached done.
j = 0
for i in range(self.num_agents):
_act = ActionTuple()
name_and_num = behavior_name + str(i)
env = self.envs[name_and_num]
if not self.dones[name_and_num]:
if self.action_spec.continuous_size > 0:
_act.add_continuous(action.continuous[j : j + 1])
if self.action_spec.discrete_size > 0:
_disc_list = [action.discrete[j, :]]
_act.add_discrete(np.array(_disc_list))
j += 1
env.action[behavior_name] = _act
def get_steps(self, behavior_name):
# This gets the individual DecisionSteps and TerminalSteps
# from the envs and merges them into a batch to be sent
# to the AgentProcessor.
dec_vec_obs = []
dec_reward = []
dec_group_reward = []
dec_agent_id = []
dec_group_id = []
ter_vec_obs = []
ter_reward = []
ter_group_reward = []
ter_agent_id = []
ter_group_id = []
interrupted = []
action_mask = None
terminal_step = TerminalSteps.empty(self.behavior_spec)
decision_step = None
for i in range(self.num_agents):
name_and_num = behavior_name + str(i)
env = self.envs[name_and_num]
_dec, _term = env.step_result[behavior_name]
if not self.dones[name_and_num]:
dec_agent_id.append(i)
dec_group_id.append(1)
if len(dec_vec_obs) > 0:
for j, obs in enumerate(_dec.obs):
dec_vec_obs[j] = np.concatenate((dec_vec_obs[j], obs), axis=0)
else:
for obs in _dec.obs:
dec_vec_obs.append(obs)
dec_reward.append(_dec.reward[0])
dec_group_reward.append(_dec.group_reward[0])
if _dec.action_mask is not None:
if action_mask is None:
action_mask = []
if len(action_mask) > 0:
action_mask[0] = np.concatenate(
(action_mask[0], _dec.action_mask[0]), axis=0
)
else:
action_mask.append(_dec.action_mask[0])
if len(_term.reward) > 0 and name_and_num in self.just_died:
ter_agent_id.append(i)
ter_group_id.append(1)
if len(ter_vec_obs) > 0:
for j, obs in enumerate(_term.obs):
ter_vec_obs[j] = np.concatenate((ter_vec_obs[j], obs), axis=0)
else:
for obs in _term.obs:
ter_vec_obs.append(obs)
ter_reward.append(_term.reward[0])
ter_group_reward.append(_term.group_reward[0])
interrupted.append(False)
self.just_died.remove(name_and_num)
decision_step = DecisionSteps(
dec_vec_obs,
dec_reward,
dec_agent_id,
action_mask,
dec_group_id,
dec_group_reward,
)
terminal_step = TerminalSteps(
ter_vec_obs,
ter_reward,
interrupted,
ter_agent_id,
ter_group_id,
ter_group_reward,
)
if self.all_done:
decision_step = DecisionSteps([], [], [], [], [], [])
return (decision_step, terminal_step)
def step(self) -> None:
# Steps all environments; auto-reset when all agents are done is disabled here for the PZ API tests.
for name in self.names:
for i in range(self.num_agents):
name_and_num = name + str(i)
# Does not step the env if done
if not self.dones[name_and_num]:
env = self.envs[name_and_num]
# Reproducing part of env step to intercept Dones
assert all(action is not None for action in env.action.values())
done = env._take_action(name)
reward = env._compute_reward(name, done)
self.dones[name_and_num] = done
if done:
self.just_died.add(name_and_num)
if self.all_done:
env.step_result[name] = env._make_batched_step(
name, done, 0.0, reward
)
self.final_rewards[name].append(reward)
# self.reset()
elif done:
# This agent has finished but others are still running.
# This gives a reward of the time penalty if this agent
# is successful and the negative env reward if it fails.
ceil_reward = min(-TIME_PENALTY, reward)
env.step_result[name] = env._make_batched_step(
name, done, ceil_reward, 0.0
)
self.final_rewards[name].append(reward)
else:
env.step_result[name] = env._make_batched_step(
name, done, reward, 0.0
)
def reset(self) -> None: # type: ignore
for name in self.names:
for i in range(self.num_agents):
name_and_num = name + str(i)
self.dones[name_and_num] = False
self.dones = {}
self.just_died = set()
self.final_rewards = {}
for name in self.names:
self.final_rewards[name] = []
for i in range(self.num_agents):
name_and_num = name + str(i)
self.dones[name_and_num] = False
self.envs[name_and_num].reset()
@property
def reset_parameters(self) -> Dict[str, str]:
return {}
def close(self):
pass


@ -3,7 +3,8 @@ import pytest
import numpy as np
from gym import spaces
- from gym_unity.envs import UnityToGymWrapper
+ from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
from mlagents_envs.base_env import (
BehaviorSpec,
ActionSpec,
@ -11,7 +12,7 @@ from mlagents_envs.base_env import (
TerminalSteps,
BehaviorMapping,
)
- from mlagents.trainers.tests.dummy_config import create_observation_specs_with_shapes
+ from dummy_config import create_observation_specs_with_shapes
def test_gym_wrapper():


@ -0,0 +1,32 @@
from mlagents_envs.envs.unity_aec_env import UnityAECEnv
from mlagents_envs.envs.unity_parallel_env import UnityParallelEnv
from simple_test_envs import SimpleEnvironment, MultiAgentEnvironment
from pettingzoo.test import api_test, parallel_api_test
NUM_TEST_CYCLES = 100
def test_single_agent_aec():
unity_env = SimpleEnvironment(["test_single"])
env = UnityAECEnv(unity_env)
api_test(env, num_cycles=NUM_TEST_CYCLES, verbose_progress=False)
def test_multi_agent_aec():
unity_env = MultiAgentEnvironment(["test_multi_1", "test_multi_2"], num_agents=2)
env = UnityAECEnv(unity_env)
api_test(env, num_cycles=NUM_TEST_CYCLES, verbose_progress=False)
def test_single_agent_parallel():
unity_env = SimpleEnvironment(["test_single"])
env = UnityParallelEnv(unity_env)
parallel_api_test(env, num_cycles=NUM_TEST_CYCLES)
def test_multi_agent_parallel():
unity_env = MultiAgentEnvironment(
["test_multi_1", "test_multi_2", "test_multi_3"], num_agents=3
)
env = UnityParallelEnv(unity_env)
parallel_api_test(env, num_cycles=NUM_TEST_CYCLES)


@ -0,0 +1,503 @@
import io
import numpy as np
import pytest
from typing import List, Tuple, Any
from mlagents_envs.communicator_objects.agent_info_pb2 import AgentInfoProto
from mlagents_envs.communicator_objects.observation_pb2 import (
ObservationProto,
NONE,
PNG,
)
from mlagents_envs.communicator_objects.brain_parameters_pb2 import BrainParametersProto
from mlagents_envs.communicator_objects.agent_info_action_pair_pb2 import (
AgentInfoActionPairProto,
)
from mlagents_envs.communicator_objects.agent_action_pb2 import AgentActionProto
from mlagents_envs.base_env import (
BehaviorSpec,
ActionSpec,
DecisionSteps,
TerminalSteps,
)
from mlagents_envs.exception import UnityObservationException
from mlagents_envs.rpc_utils import (
behavior_spec_from_proto,
process_pixels,
_process_maybe_compressed_observation,
_process_rank_one_or_two_observation,
steps_from_proto,
)
from PIL import Image
from dummy_config import create_observation_specs_with_shapes
def generate_list_agent_proto(
n_agent: int,
shape: List[Tuple[int]],
infinite_rewards: bool = False,
nan_observations: bool = False,
) -> List[AgentInfoProto]:
result = []
for agent_index in range(n_agent):
ap = AgentInfoProto()
ap.reward = float("inf") if infinite_rewards else agent_index
ap.done = agent_index % 2 == 0
ap.max_step_reached = agent_index % 4 == 0
ap.id = agent_index
ap.action_mask.extend([True, False] * 5)
obs_proto_list = []
for obs_index in range(len(shape)):
obs_proto = ObservationProto()
obs_proto.shape.extend(list(shape[obs_index]))
obs_proto.compression_type = NONE
obs_proto.float_data.data.extend(
([float("nan")] if nan_observations else [0.1])
* np.prod(shape[obs_index])
)
obs_proto_list.append(obs_proto)
ap.observations.extend(obs_proto_list)
result.append(ap)
return result
def generate_compressed_data(in_array: np.ndarray) -> bytes:
image_arr = (in_array * 255).astype(np.uint8)
bytes_out = bytes()
num_channels = in_array.shape[2]
num_images = (num_channels + 2) // 3
# Split the input image into batches of 3 channels.
for i in range(num_images):
sub_image = image_arr[..., 3 * i : 3 * i + 3]
if (i == num_images - 1) and (num_channels % 3) != 0:
# Pad zeros
zero_shape = list(in_array.shape)
zero_shape[2] = 3 - (num_channels % 3)
z = np.zeros(zero_shape, dtype=np.uint8)
sub_image = np.concatenate([sub_image, z], axis=2)
im = Image.fromarray(sub_image, "RGB")
byteIO = io.BytesIO()
im.save(byteIO, format="PNG")
bytes_out += byteIO.getvalue()
return bytes_out
# test helper function for old C# API (no compressed channel mapping)
def generate_compressed_proto_obs(
in_array: np.ndarray, grayscale: bool = False
) -> ObservationProto:
obs_proto = ObservationProto()
obs_proto.compressed_data = generate_compressed_data(in_array)
obs_proto.compression_type = PNG
if grayscale:
# grayscale flag is only used for old API without mapping
expected_shape = [in_array.shape[0], in_array.shape[1], 1]
obs_proto.shape.extend(expected_shape)
else:
obs_proto.shape.extend(in_array.shape)
return obs_proto
# test helper function for new C# API (with compressed channel mapping)
def generate_compressed_proto_obs_with_mapping(
in_array: np.ndarray, mapping: List[int]
) -> ObservationProto:
obs_proto = ObservationProto()
obs_proto.compressed_data = generate_compressed_data(in_array)
obs_proto.compression_type = PNG
if mapping is not None:
obs_proto.compressed_channel_mapping.extend(mapping)
expected_shape = [
in_array.shape[0],
in_array.shape[1],
len({m for m in mapping if m >= 0}),
]
obs_proto.shape.extend(expected_shape)
else:
obs_proto.shape.extend(in_array.shape)
return obs_proto
def generate_uncompressed_proto_obs(in_array: np.ndarray) -> ObservationProto:
obs_proto = ObservationProto()
obs_proto.float_data.data.extend(in_array.flatten().tolist())
obs_proto.compression_type = NONE
obs_proto.shape.extend(in_array.shape)
return obs_proto
def proto_from_steps(
decision_steps: DecisionSteps, terminal_steps: TerminalSteps
) -> List[AgentInfoProto]:
agent_info_protos: List[AgentInfoProto] = []
# Take care of the DecisionSteps first
for agent_id in decision_steps.agent_id:
agent_id_index = decision_steps.agent_id_to_index[agent_id]
reward = decision_steps.reward[agent_id_index]
done = False
max_step_reached = False
agent_mask: Any = None
if decision_steps.action_mask is not None:
agent_mask = []
for _branch in decision_steps.action_mask:
agent_mask = np.concatenate(
(agent_mask, _branch[agent_id_index, :]), axis=0
)
agent_mask = agent_mask.astype(np.bool).tolist()
observations: List[ObservationProto] = []
for all_observations_of_type in decision_steps.obs:
observation = all_observations_of_type[agent_id_index]
if len(observation.shape) == 3:
observations.append(generate_uncompressed_proto_obs(observation))
else:
observations.append(
ObservationProto(
float_data=ObservationProto.FloatData(data=observation),
shape=[len(observation)],
compression_type=NONE,
)
)
agent_info_proto = AgentInfoProto(
reward=reward,
done=done,
id=agent_id,
max_step_reached=bool(max_step_reached),
action_mask=agent_mask,
observations=observations,
)
agent_info_protos.append(agent_info_proto)
# Take care of the TerminalSteps second
for agent_id in terminal_steps.agent_id:
agent_id_index = terminal_steps.agent_id_to_index[agent_id]
reward = terminal_steps.reward[agent_id_index]
done = True
max_step_reached = terminal_steps.interrupted[agent_id_index]
final_observations: List[ObservationProto] = []
for all_observations_of_type in terminal_steps.obs:
observation = all_observations_of_type[agent_id_index]
if len(observation.shape) == 3:
final_observations.append(generate_uncompressed_proto_obs(observation))
else:
final_observations.append(
ObservationProto(
float_data=ObservationProto.FloatData(data=observation),
shape=[len(observation)],
compression_type=NONE,
)
)
agent_info_proto = AgentInfoProto(
reward=reward,
done=done,
id=agent_id,
max_step_reached=bool(max_step_reached),
action_mask=None,
observations=final_observations,
)
agent_info_protos.append(agent_info_proto)
return agent_info_protos
# The arguments here are the DecisionSteps, TerminalSteps and continuous/discrete actions for a single agent name
def proto_from_steps_and_action(
decision_steps: DecisionSteps,
terminal_steps: TerminalSteps,
continuous_actions: np.ndarray,
discrete_actions: np.ndarray,
) -> List[AgentInfoActionPairProto]:
agent_info_protos = proto_from_steps(decision_steps, terminal_steps)
agent_action_protos = []
num_agents = (
len(continuous_actions)
if continuous_actions is not None
else len(discrete_actions)
)
for i in range(num_agents):
proto = AgentActionProto()
if continuous_actions is not None:
proto.continuous_actions.extend(continuous_actions[i])
proto.vector_actions_deprecated.extend(continuous_actions[i])
if discrete_actions is not None:
proto.discrete_actions.extend(discrete_actions[i])
proto.vector_actions_deprecated.extend(discrete_actions[i])
agent_action_protos.append(proto)
agent_info_action_pair_protos = [
AgentInfoActionPairProto(agent_info=agent_info_proto, action_info=action_proto)
for agent_info_proto, action_proto in zip(
agent_info_protos, agent_action_protos
)
]
return agent_info_action_pair_protos
def test_process_pixels():
in_array = np.random.rand(128, 64, 3)
byte_arr = generate_compressed_data(in_array)
out_array = process_pixels(byte_arr, 3)
assert out_array.shape == (128, 64, 3)
assert np.sum(in_array - out_array) / np.prod(in_array.shape) < 0.01
assert np.allclose(in_array, out_array, atol=0.01)
def test_process_pixels_multi_png():
height = 128
width = 64
num_channels = 7
in_array = np.random.rand(height, width, num_channels)
byte_arr = generate_compressed_data(in_array)
out_array = process_pixels(byte_arr, num_channels)
assert out_array.shape == (height, width, num_channels)
assert np.sum(in_array - out_array) / np.prod(in_array.shape) < 0.01
assert np.allclose(in_array, out_array, atol=0.01)
def test_process_pixels_gray():
in_array = np.random.rand(128, 64, 3)
byte_arr = generate_compressed_data(in_array)
out_array = process_pixels(byte_arr, 1)
assert out_array.shape == (128, 64, 1)
assert np.mean(in_array.mean(axis=2, keepdims=True) - out_array) < 0.01
assert np.allclose(in_array.mean(axis=2, keepdims=True), out_array, atol=0.01)
def test_vector_observation():
n_agents = 10
shapes = [(3,), (4,)]
obs_specs = create_observation_specs_with_shapes(shapes)
list_proto = generate_list_agent_proto(n_agents, shapes)
for obs_index, shape in enumerate(shapes):
arr = _process_rank_one_or_two_observation(
obs_index, obs_specs[obs_index], list_proto
)
assert list(arr.shape) == ([n_agents] + list(shape))
assert np.allclose(arr, 0.1, atol=0.01)
def test_process_visual_observation():
shape = (128, 64, 3)
in_array_1 = np.random.rand(*shape)
proto_obs_1 = generate_compressed_proto_obs(in_array_1)
in_array_2 = np.random.rand(*shape)
in_array_2_mapping = [0, 1, 2]
proto_obs_2 = generate_compressed_proto_obs_with_mapping(
in_array_2, in_array_2_mapping
)
ap1 = AgentInfoProto()
ap1.observations.extend([proto_obs_1])
ap2 = AgentInfoProto()
ap2.observations.extend([proto_obs_2])
ap_list = [ap1, ap2]
obs_spec = create_observation_specs_with_shapes([shape])[0]
arr = _process_maybe_compressed_observation(0, obs_spec, ap_list)
assert list(arr.shape) == [2, 128, 64, 3]
assert np.allclose(arr[0, :, :, :], in_array_1, atol=0.01)
assert np.allclose(arr[1, :, :, :], in_array_2, atol=0.01)
def test_process_visual_observation_grayscale():
in_array_1 = np.random.rand(128, 64, 3)
proto_obs_1 = generate_compressed_proto_obs(in_array_1, grayscale=True)
expected_out_array_1 = np.mean(in_array_1, axis=2, keepdims=True)
in_array_2 = np.random.rand(128, 64, 3)
in_array_2_mapping = [0, 0, 0]
proto_obs_2 = generate_compressed_proto_obs_with_mapping(
in_array_2, in_array_2_mapping
)
expected_out_array_2 = np.mean(in_array_2, axis=2, keepdims=True)
ap1 = AgentInfoProto()
ap1.observations.extend([proto_obs_1])
ap2 = AgentInfoProto()
ap2.observations.extend([proto_obs_2])
ap_list = [ap1, ap2]
shape = (128, 64, 1)
obs_spec = create_observation_specs_with_shapes([shape])[0]
arr = _process_maybe_compressed_observation(0, obs_spec, ap_list)
assert list(arr.shape) == [2, 128, 64, 1]
assert np.allclose(arr[0, :, :, :], expected_out_array_1, atol=0.01)
assert np.allclose(arr[1, :, :, :], expected_out_array_2, atol=0.01)
def test_process_visual_observation_padded_channels():
in_array_1 = np.random.rand(128, 64, 12)
in_array_1_mapping = [0, 1, 2, 3, -1, -1, 4, 5, 6, 7, -1, -1]
proto_obs_1 = generate_compressed_proto_obs_with_mapping(
in_array_1, in_array_1_mapping
)
expected_out_array_1 = np.take(in_array_1, [0, 1, 2, 3, 6, 7, 8, 9], axis=2)
ap1 = AgentInfoProto()
ap1.observations.extend([proto_obs_1])
ap_list = [ap1]
shape = (128, 64, 8)
obs_spec = create_observation_specs_with_shapes([shape])[0]
arr = _process_maybe_compressed_observation(0, obs_spec, ap_list)
assert list(arr.shape) == [1, 128, 64, 8]
assert np.allclose(arr[0, :, :, :], expected_out_array_1, atol=0.01)
def test_process_visual_observation_bad_shape():
in_array_1 = np.random.rand(128, 64, 3)
proto_obs_1 = generate_compressed_proto_obs(in_array_1)
ap1 = AgentInfoProto()
ap1.observations.extend([proto_obs_1])
ap_list = [ap1]
shape = (128, 42, 3)
obs_spec = create_observation_specs_with_shapes([shape])[0]
with pytest.raises(UnityObservationException):
_process_maybe_compressed_observation(0, obs_spec, ap_list)
def test_batched_step_result_from_proto():
n_agents = 10
shapes = [(3,), (4,)]
spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(3)
)
ap_list = generate_list_agent_proto(n_agents, shapes)
decision_steps, terminal_steps = steps_from_proto(ap_list, spec)
for agent_id in range(n_agents):
if agent_id in decision_steps:
# we set the reward equal to the agent id in generate_list_agent_proto
assert decision_steps[agent_id].reward == agent_id
elif agent_id in terminal_steps:
assert terminal_steps[agent_id].reward == agent_id
else:
raise Exception("Missing agent from the steps")
# We sort the AgentId since they are split between DecisionSteps and TerminalSteps
combined_agent_id = list(decision_steps.agent_id) + list(terminal_steps.agent_id)
combined_agent_id.sort()
assert combined_agent_id == list(range(n_agents))
for agent_id in range(n_agents):
assert (agent_id in terminal_steps) == (agent_id % 2 == 0)
if agent_id in terminal_steps:
assert terminal_steps[agent_id].interrupted == (agent_id % 4 == 0)
assert decision_steps.obs[0].shape[1] == shapes[0][0]
assert decision_steps.obs[1].shape[1] == shapes[1][0]
assert terminal_steps.obs[0].shape[1] == shapes[0][0]
assert terminal_steps.obs[1].shape[1] == shapes[1][0]
def test_mismatch_observations_raise_in_step_result_from_proto():
n_agents = 10
shapes = [(3,), (4,)]
spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(3)
)
ap_list = generate_list_agent_proto(n_agents, shapes)
# Hack an observation to be larger, we should get an exception
ap_list[0].observations[0].shape[0] += 1
ap_list[0].observations[0].float_data.data.append(0.42)
with pytest.raises(UnityObservationException):
steps_from_proto(ap_list, spec)
def test_action_masking_discrete():
n_agents = 10
shapes = [(3,), (4,)]
behavior_spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_discrete((7, 3))
)
ap_list = generate_list_agent_proto(n_agents, shapes)
decision_steps, terminal_steps = steps_from_proto(ap_list, behavior_spec)
masks = decision_steps.action_mask
assert isinstance(masks, list)
assert len(masks) == 2
assert masks[0].shape == (n_agents / 2, 7) # half agents are done
assert masks[1].shape == (n_agents / 2, 3) # half agents are done
assert masks[0][0, 0]
assert not masks[1][0, 0]
assert masks[1][0, 1]
def test_action_masking_discrete_1():
n_agents = 10
shapes = [(3,), (4,)]
behavior_spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_discrete((10,))
)
ap_list = generate_list_agent_proto(n_agents, shapes)
decision_steps, terminal_steps = steps_from_proto(ap_list, behavior_spec)
masks = decision_steps.action_mask
assert isinstance(masks, list)
assert len(masks) == 1
assert masks[0].shape == (n_agents / 2, 10)
assert masks[0][0, 0]
def test_action_masking_discrete_2():
n_agents = 10
shapes = [(3,), (4,)]
behavior_spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes),
ActionSpec.create_discrete((2, 2, 6)),
)
ap_list = generate_list_agent_proto(n_agents, shapes)
decision_steps, terminal_steps = steps_from_proto(ap_list, behavior_spec)
masks = decision_steps.action_mask
assert isinstance(masks, list)
assert len(masks) == 3
assert masks[0].shape == (n_agents / 2, 2)
assert masks[1].shape == (n_agents / 2, 2)
assert masks[2].shape == (n_agents / 2, 6)
assert masks[0][0, 0]
def test_action_masking_continuous():
n_agents = 10
shapes = [(3,), (4,)]
behavior_spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(10)
)
ap_list = generate_list_agent_proto(n_agents, shapes)
decision_steps, terminal_steps = steps_from_proto(ap_list, behavior_spec)
masks = decision_steps.action_mask
assert masks is None
def test_agent_behavior_spec_from_proto():
agent_proto = generate_list_agent_proto(1, [(3,), (4,)])[0]
bp = BrainParametersProto()
bp.vector_action_size_deprecated.extend([5, 4])
bp.vector_action_space_type_deprecated = 0
behavior_spec = behavior_spec_from_proto(bp, agent_proto)
assert behavior_spec.action_spec.is_discrete()
assert not behavior_spec.action_spec.is_continuous()
assert [spec.shape for spec in behavior_spec.observation_specs] == [(3,), (4,)]
assert behavior_spec.action_spec.discrete_branches == (5, 4)
assert behavior_spec.action_spec.discrete_size == 2
bp = BrainParametersProto()
bp.vector_action_size_deprecated.extend([6])
bp.vector_action_space_type_deprecated = 1
behavior_spec = behavior_spec_from_proto(bp, agent_proto)
assert not behavior_spec.action_spec.is_discrete()
assert behavior_spec.action_spec.is_continuous()
assert behavior_spec.action_spec.continuous_size == 6
def test_batched_step_result_from_proto_raises_on_infinite():
n_agents = 10
shapes = [(3,), (4,)]
behavior_spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(3)
)
ap_list = generate_list_agent_proto(n_agents, shapes, infinite_rewards=True)
with pytest.raises(RuntimeError):
steps_from_proto(ap_list, behavior_spec)
def test_batched_step_result_from_proto_raises_on_nan():
n_agents = 10
shapes = [(3,), (4,)]
behavior_spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(3)
)
ap_list = generate_list_agent_proto(n_agents, shapes, nan_observations=True)
with pytest.raises(RuntimeError):
steps_from_proto(ap_list, behavior_spec)


@ -7,7 +7,7 @@ from mlagents_envs.base_env import (
ActionSpec,
BehaviorSpec,
)
- from mlagents.trainers.tests.dummy_config import create_observation_specs_with_shapes
+ from dummy_config import create_observation_specs_with_shapes
def test_decision_steps():


@ -4,7 +4,7 @@ The `mlagents` Python package is part of the
[ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents). `mlagents`
provides a set of reinforcement and imitation learning algorithms designed to be
used with Unity environments. The algorithms interface with the Python API
- provided by the `mlagents_envs` package. See [here](../docs/Python-API.md) for
+ provided by the `mlagents_envs` package. See [here](../docs/Python-LLAPI.md) for
more information on `mlagents_envs`.
The algorithms can be accessed using the: `mlagents-learn` access point. See


@ -0,0 +1,16 @@
{
"param_1": {
"lesson_num": 2
},
"param_2": {
"lesson_num": 0
},
"param_3": {
"lesson_num": 0
},
"metadata": {
"stats_format_version": "0.3.0",
"mlagents_version": "0.29.0.dev0",
"torch_version": "1.8.1"
}
}


@ -13,7 +13,7 @@ from mlagents_envs.base_env import (
TerminalSteps,
BehaviorMapping,
)
- from mlagents_envs.tests.test_rpc_utils import proto_from_steps_and_action
+ from .test_rpc_utils import proto_from_steps_and_action
from mlagents_envs.communicator_objects.agent_info_action_pair_pb2 import (
AgentInfoActionPairProto,
)


@ -1,7 +1,7 @@
import argparse
from mlagents_envs.environment import UnityEnvironment
- from gym_unity.envs import UnityToGymWrapper
+ from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
def test_run_environment(env_name):


@ -136,13 +136,12 @@ def init_venv(
# install from pypi
pip_commands += [
f"mlagents=={mlagents_python_version}",
f"gym-unity=={mlagents_python_version}",
# TODO build these and publish to internal pypi
"tf2onnx==1.6.1",
]
else:
# Local install
pip_commands += ["-e ./ml-agents-envs", "-e ./ml-agents", "-e ./gym-unity"]
pip_commands += ["-e ./ml-agents-envs", "-e ./ml-agents"]
if extra_packages:
pip_commands += extra_packages


@ -40,7 +40,7 @@ def validate_packages(root_dir):
def main():
- for root_dir in ["ml-agents", "ml-agents-envs", "gym-unity"]:
+ for root_dir in ["ml-agents", "ml-agents-envs"]:
validate_packages(root_dir)


@ -22,7 +22,9 @@ MATCH_ANY = re.compile(r"(?s).*")
# To allow everything in the file (effectively skipping it), use MATCH_ANY for the value
ALLOW_LIST = {
# Previous release table
"README.md": re.compile(r"\*\*(Verified Package ([0-9]\.?)*|Release [0-9]+)\*\*"),
"docs/Python-PettingZoo-API.md": re.compile(
r"\*\*(Verified Package ([0-9]\.?)*|Release [0-9]+)\*\*"
),
"docs/Versioning.md": MATCH_ANY,
"com.unity.ml-agents/CHANGELOG.md": MATCH_ANY,
"utils/make_readme_table.py": MATCH_ANY,


@ -8,11 +8,7 @@ import argparse
VERSION_LINE_START = "__version__ = "
- DIRECTORIES = [
- "ml-agents/mlagents/trainers",
- "ml-agents-envs/mlagents_envs",
- "gym-unity/gym_unity",
- ]
+ DIRECTORIES = ["ml-agents/mlagents/trainers", "ml-agents-envs/mlagents_envs"]
MLAGENTS_PACKAGE_JSON_PATH = "com.unity.ml-agents/package.json"
MLAGENTS_EXTENSIONS_PACKAGE_JSON_PATH = "com.unity.ml-agents.extensions/package.json"