Develop python api ga (#6)
* Dropped support for python 3.6
* Pinning python 3.9.9 for tests due to typing issues with 3.9.10
* Testing new bokken image.
* Updated yamato standalone build test.
* Updated standalone build test.
* Updated yamato configs to use mla bokken vm.
* Bug fixes for yamato yml files.
* Fixed com.unity.ml-agents-test.yml
* Bumped min python version to 3.7.2
* pettingzoo api prototype
* add example
* update file names
* support multiple behavior names
* fix multi behavior action index
* add install in colab
* add setup
* update colab
* fix __init__
* clone single branch
* import tags only
* import in init
* catch import error
* update colab
* move colab and add readme
* handle agent dying
* add tests
* update doc
* add info
* add action mask
* fix action mask
* update action masks in colab
* change default env
* set version
* fix hybrid action
* fix colab for hybrid actions
* add note on auto reset
* Updated colab name.
* Update README.md
* Following petting_zoo registry API (#5557)
* init petting_zoo registry
* cherrypick Custom trainer editor analytics (#5511)
* cherrypick "Update dotnet-format to address breaking changes introduced by upstream changes (#5528)"
* Update colab to match pettingZoo import api
* ToRevert: pull exp-petting-registry branch
* Add init file to tests
* Install pettingzoo-unity requirements for pytest
* update pytest command
* Add docstrings and comments
* update coverage to pettingzoo folder
* unset log level
* update env string
* Two small bugfixes (#5589): add the missing `_cumulative_rewards` property, and update `agent_selection` to not error out when an agent finishes an episode.
* Updated gym to 0.21.0 and petting zoo to 1.13.1; fixed bugs with the AEC wrapper for the gym and PZ updates. API tests are passing.
* Some refactoring.
* Finished initial implementation of parallel. Tests not passing.
* Finished parallel API implementation and refactor. All PZ tests passing.
* Cleanup.
* Refactoring.
* Pinning numpy version.
* add metadata and behavior_specs initialization
* addressing behaviour_spec issues
* Bumped PZ version to 1.14.0. Fixed failing tests.
* Refactored gym-unity and petting-zoo into ml-agents-envs
* Added TODO to pydoc-config.yaml
* Refactored gym and pz to be under a subpackage in mlagents_env package
* Refactored ml-agents-envs docs.
* Minor update to PZ API doc.
* Updated mlagents_envs docs and colab.
* Updated pytest gh workflow to remove ref to gym and pz.
* Refactored to remove some test coupling between trainers and envs.
* Updated installation doc.
* Update ml-agents-envs/README.md
* Updated failing yamato jobs.
* Updated CHANGELOG.
* Updated Migration guide.
* Doc updates based on CR.
* Updated github workflow for colab tests.
* Fixed yamato import error.

Co-authored-by: Ruo-Ping Dong <ruoping.dong@unity3d.com>
Co-authored-by: Miguel Alonso Jr <miguelalonsojr>
Co-authored-by: jmercado1985 <75792879+jmercado1985@users.noreply.github.com>
Co-authored-by: Maryam Honari <honari.m94@gmail.com>
Co-authored-by: Henry Peteet <henry.peteet@unity3d.com>
Co-authored-by: mahon94 <maryam.honari@unity3d.com>
Co-authored-by: Andrew Cohen <andrew.cohen@unity3d.com>
Parent: b4cbaa6840
Commit: 28303adf6c
@@ -16,7 +16,7 @@ jobs:
runs-on: [self-hosted, Linux, X64]
strategy:
matrix:
package-path: [ml-agents, ml-agents-envs, gym-unity]
package-path: [ml-agents, ml-agents-envs]

steps:
- uses: actions/checkout@main

@@ -5,7 +5,6 @@ on:
paths: # This action will only run if the PR modifies a file in one of these directories
- 'ml-agents/**'
- 'ml-agents-envs/**'
- 'gym-unity/**'
- 'test_constraints*.txt'
- 'test_requirements.txt'
- '.github/workflows/pytest.yml'

@@ -47,7 +46,7 @@ jobs:
# # This path is specific to Ubuntu
# path: ~/.cache/pip
# # Look to see if there is a cache hit for the corresponding requirements file
# key: ${{ runner.os }}-pip-${{ hashFiles('ml-agents/setup.py', 'ml-agents-envs/setup.py', 'gym-unity/setup.py', 'test_requirements.txt', matrix.pip_constraints) }}
# key: ${{ runner.os }}-pip-${{ hashFiles('ml-agents/setup.py', 'ml-agents-envs/setup.py', 'test_requirements.txt', matrix.pip_constraints) }}
# restore-keys: |
# ${{ runner.os }}-pip-
# ${{ runner.os }}-

@@ -60,14 +59,13 @@ jobs:
python -m pip install --progress-bar=off -e ./ml-agents-envs -c ${{ matrix.pip_constraints }}
python -m pip install --progress-bar=off -e ./ml-agents -c ${{ matrix.pip_constraints }}
python -m pip install --progress-bar=off -r test_requirements.txt -c ${{ matrix.pip_constraints }}
python -m pip install --progress-bar=off -e ./gym-unity -c ${{ matrix.pip_constraints }}
python -m pip install --progress-bar=off -e ./ml-agents-plugin-examples -c ${{ matrix.pip_constraints }}
- name: Save python dependencies
run: |
pip freeze > pip_versions-${{ matrix.python-version }}.txt
cat pip_versions-${{ matrix.python-version }}.txt
- name: Run pytest
run: pytest --cov=ml-agents --cov=ml-agents-envs --cov=gym-unity --cov-report html --junitxml=junit/test-results-${{ matrix.python-version }}.xml -p no:warnings -v
run: pytest --cov=ml-agents --cov=ml-agents-envs --cov-report=html --junitxml=junit/test-results-${{ matrix.python-version }}.xml -p no:warnings -v
- name: Upload pytest test results
uses: actions/upload-artifact@v2
with:
@@ -22,10 +22,6 @@ repos:
# Exclude protobuf files and don't follow them when imported
exclude: ".*_pb2.py"
args: [--ignore-missing-imports, --disallow-incomplete-defs]
- id: mypy
name: mypy-gym-unity
files: "gym-unity/.*"
args: [--ignore-missing-imports, --disallow-incomplete-defs]

- repo: https://gitlab.com/pycqa/flake8
rev: 3.8.1
@@ -30,7 +30,6 @@ test_gym_interface_{{ editor.version }}:
pull_request.changes.any match "Project/**" OR
pull_request.changes.any match "ml-agents/tests/yamato/**" OR
pull_request.changes.any match "ml-agents-envs/**" OR
pull_request.changes.any match "gym-unity/**" OR
pull_request.changes.any match ".yamato/gym-interface-test.yml") AND
NOT pull_request.changes.all match "**/*.md"
{% endif %}
@@ -38,8 +38,8 @@ developer communities.
- Train using multiple concurrent Unity environment instances
- Utilizes the [Unity Inference Engine](docs/Unity-Inference-Engine.md) to
provide native cross-platform support
- Unity environment [control from Python](docs/Python-API.md)
- Wrap Unity learning environments as a [gym](gym-unity/README.md)
- Unity environment [control from Python](docs/Python-LLAPI.md)
- Wrap Unity learning environments as a [gym](docs/Python-Gym-API.md)

See our [ML-Agents Overview](docs/ML-Agents-Overview.md) page for detailed
descriptions of all these features.
@@ -9,15 +9,17 @@ and this project adheres to
## [Unreleased]
### Major Changes
#### com.unity.ml-agents / com.unity.ml-agents.extensions (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
- The minimum supported Python version for ml-agents-envs was changed to 3.7.2 (#4)
#### ml-agents / ml-agents-envs
- The minimum supported Python version for ml-agents-envs was changed to 3.7.2 (#5)
- Added support for the PettingZoo multi-agent API (#6)
- Refactored `gym-unity` into the `ml-agents-envs` package (#6)

### Minor Changes
#### com.unity.ml-agents / com.unity.ml-agents.extensions (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
#### ml-agents / ml-agents-envs
### Bug Fixes
#### com.unity.ml-agents / com.unity.ml-agents.extensions (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
#### ml-agents / ml-agents-envs

## [2.2.1-exp.1] - 2022-01-14
### Major Changes
@@ -18,8 +18,6 @@ The ML-Agents Toolkit contains several components:
a Unity scene. It is a foundational layer that facilitates data messaging
between Unity scene and the Python machine learning algorithms.
Consequently, `mlagents` depends on `mlagents_envs`.
- [`gym_unity`](../gym-unity/) provides a Python-wrapper for your Unity scene
that supports the OpenAI Gym interface.
- Unity [Project](../Project/) that contains several
[example environments](Learning-Environment-Examples.md) that highlight the
various features of the toolkit to help you get started.
@@ -62,7 +62,7 @@ can interact with it.

## Interacting with the Environment

If you want to use the [Python API](Python-API.md) to interact with your
If you want to use the [Python API](Python-LLAPI.md) to interact with your
executable, you can pass the name of the executable with the argument
'file_name' of the `UnityEnvironment`. For instance:

@@ -5,4 +5,3 @@ See the package-specific Limitations pages:
- [`com.unity.mlagents` Unity package](../com.unity.ml-agents/Documentation~/com.unity.ml-agents.md#known-limitations)
- [`mlagents` Python package](../ml-agents/README.md#limitations)
- [`mlagents_envs` Python package](../ml-agents-envs/README.md#limitations)
- [`gym_unity` Python package](../gym-unity/README.md#limitations)
@@ -167,7 +167,7 @@ The ML-Agents Toolkit contains five high-level components:
process to communicate with and control the Academy during training. However,
it can be used for other purposes as well. For example, you could use the API
to use Unity as the simulation engine for your own machine learning
algorithms. See [Python API](Python-API.md) for more information.
algorithms. See [Python API](Python-LLAPI.md) for more information.
- **External Communicator** - which connects the Learning Environment with the
Python Low-Level API. It lives within the Learning Environment.
- **Python Trainers** which contains all the machine learning algorithms that
@@ -179,9 +179,15 @@ The ML-Agents Toolkit contains five high-level components:
- **Gym Wrapper** (not pictured). A common way in which machine learning
researchers interact with simulation environments is via a wrapper provided by
OpenAI called [gym](https://github.com/openai/gym). We provide a gym wrapper
in a dedicated `gym-unity` Python package and
[instructions](../gym-unity/README.md) for using it with existing machine
in the `ml-agents-envs` package and
[instructions](Python-Gym-API.md) for using it with existing machine
learning algorithms which utilize gym.
- **PettingZoo Wrapper** (not pictured). PettingZoo is a Python API for
interacting with multi-agent simulation environments that provides a
gym-like interface. We provide a PettingZoo wrapper for Unity ML-Agents
environments in the `ml-agents-envs` package and
[instructions](Python-PettingZoo-API.md) for using it with machine learning
algorithms.

<p align="center">
<img src="images/learning_environment_basic.png"
@@ -286,10 +292,10 @@ In the previous mode, the Agents were used for training to generate a PyTorch
model that the Agents can later use. However, any user of the ML-Agents Toolkit
can leverage their own algorithms for training. In this case, the behaviors of
all the Agents in the scene will be controlled within Python. You can even turn
your environment into a [gym.](../gym-unity/README.md)
your environment into a [gym.](Python-Gym-API.md)

We do not currently have a tutorial highlighting this mode, but you can learn
more about the Python API [here](Python-API.md).
more about the Python API [here](Python-LLAPI.md).

## Flexible Training Scenarios

@@ -1,6 +1,25 @@
# Upgrading

# Migrating
<!---
TODO: update ml-agents-env package version before release
--->
## Migrating to the ml-agents-envs 0.29.0.dev0 package
- Python 3.7 is now the minimum version of python supported due to [python3.6 EOL](https://endoflife.date/python).
Please update your python installation to 3.7.2 or higher. Note: Due to an issue with the typing system, the maximum
version of python supported is python 3.9.9.
- The `gym-unity` package has been refactored into the `ml-agents-envs` package. Please update your imports accordingly.
  - Example:
    - Before
      ```python
      from gym_unity.unity_gym_env import UnityToGymWrapper
      ```
    - After:
      ```python
      from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
      ```
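- The PettingZoo wrapper added alongside this refactor lives under the same subpackage. A minimal
  import sketch (shown for orientation only, using the path that appears elsewhere in this change):
  ```python
  # Sketch: PettingZoo wrapper import under the refactored ml-agents-envs package.
  from mlagents_envs.envs import UnityToPettingZooWrapper
  ```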

## Migrating the package to version 2.0
- The official version of Unity that ML-Agents supports is now 2020.3 LTS. If you run
into issues, please consider deleting your project's Library folder and reopening your
@@ -260,9 +279,9 @@ vector observations to be used simultaneously.
- The `play_against_current_self_ratio` self-play trainer hyperparameter has
been renamed to `play_against_latest_model_ratio`
- Removed the multi-agent gym option from the gym wrapper. For multi-agent
scenarios, use the [Low Level Python API](Python-API.md).
scenarios, use the [Low Level Python API](Python-LLAPI.md).
- The low level Python API has changed. You can look at the document
[Low Level Python API documentation](Python-API.md) for more information. If
[Low Level Python API documentation](Python-LLAPI.md) for more information. If
you use `mlagents-learn` for training, this should be a transparent change.
- The obsolete `Agent` methods `GiveModel`, `Done`, `InitializeAgent`,
`AgentAction` and `AgentReset` have been removed.
@@ -487,7 +506,7 @@ vector observations to be used simultaneously.
### Important changes

- The low level Python API has changed. You can look at the document
[Low Level Python API documentation](Python-API.md) for more information. This
[Low Level Python API documentation](Python-LLAPI.md) for more information. This
should only affect you if you're writing a custom trainer; if you use
`mlagents-learn` for training, this should be a transparent change.
- `reset()` on the Low-Level Python API no longer takes a `train_mode`
@@ -497,7 +516,7 @@ vector observations to be used simultaneously.
`UnityEnvironment` no longer has a `reset_parameters` field. To modify float
properties in the environment, you must use a `FloatPropertiesChannel`. For
more information, refer to the
[Low Level Python API documentation](Python-API.md)
[Low Level Python API documentation](Python-LLAPI.md)
- `CustomResetParameters` are now removed.
- The Academy no longer has a `Training Configuration` nor
`Inference Configuration` field in the inspector. To modify the configuration
@ -0,0 +1,161 @@
|
|||
# Table of Contents
|
||||
|
||||
* [mlagents\_envs.envs.unity\_gym\_env](#mlagents_envs.envs.unity_gym_env)
|
||||
* [UnityGymException](#mlagents_envs.envs.unity_gym_env.UnityGymException)
|
||||
* [UnityToGymWrapper](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper)
|
||||
* [\_\_init\_\_](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.__init__)
|
||||
* [reset](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.reset)
|
||||
* [step](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.step)
|
||||
* [render](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.render)
|
||||
* [close](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.close)
|
||||
* [seed](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.seed)
|
||||
* [ActionFlattener](#mlagents_envs.envs.unity_gym_env.ActionFlattener)
|
||||
* [\_\_init\_\_](#mlagents_envs.envs.unity_gym_env.ActionFlattener.__init__)
|
||||
* [lookup\_action](#mlagents_envs.envs.unity_gym_env.ActionFlattener.lookup_action)
|
||||
|
||||
<a name="mlagents_envs.envs.unity_gym_env"></a>
|
||||
# mlagents\_envs.envs.unity\_gym\_env
|
||||
|
||||
<a name="mlagents_envs.envs.unity_gym_env.UnityGymException"></a>
|
||||
## UnityGymException Objects
|
||||
|
||||
```python
|
||||
class UnityGymException(error.Error)
|
||||
```
|
||||
|
||||
Any error related to the gym wrapper of ml-agents.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper"></a>
|
||||
## UnityToGymWrapper Objects
|
||||
|
||||
```python
|
||||
class UnityToGymWrapper(gym.Env)
|
||||
```
|
||||
|
||||
Provides Gym wrapper for Unity Learning Environments.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.__init__"></a>
|
||||
#### \_\_init\_\_
|
||||
|
||||
```python
|
||||
| __init__(unity_env: BaseEnv, uint8_visual: bool = False, flatten_branched: bool = False, allow_multiple_obs: bool = False, action_space_seed: Optional[int] = None)
|
||||
```
|
||||
|
||||
Environment initialization
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `unity_env`: The Unity BaseEnv to be wrapped in the gym. Will be closed when the UnityToGymWrapper closes.
|
||||
- `uint8_visual`: Return visual observations as uint8 (0-255) matrices instead of float (0.0-1.0).
|
||||
- `flatten_branched`: If True, turn branched discrete action spaces into a Discrete space rather than
|
||||
MultiDiscrete.
|
||||
- `allow_multiple_obs`: If True, return a list of np.ndarrays as observations with the first elements
|
||||
containing the visual observations and the last element containing the array of vector observations.
|
||||
If False, returns a single np.ndarray containing either only a single visual observation or the array of
|
||||
vector observations.
|
||||
- `action_space_seed`: If non-None, will be used to set the random seed on created gym.Space instances.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.reset"></a>
|
||||
#### reset
|
||||
|
||||
```python
|
||||
| reset() -> Union[List[np.ndarray], np.ndarray]
|
||||
```
|
||||
|
||||
Resets the state of the environment and returns an initial observation.
|
||||
Returns: observation (object/list): the initial observation of the
|
||||
space.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.step"></a>
|
||||
#### step
|
||||
|
||||
```python
|
||||
| step(action: List[Any]) -> GymStepResult
|
||||
```
|
||||
|
||||
Run one timestep of the environment's dynamics. When end of
|
||||
episode is reached, you are responsible for calling `reset()`
|
||||
to reset this environment's state.
|
||||
Accepts an action and returns a tuple (observation, reward, done, info).
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `action` _object/list_ - an action provided by the environment
|
||||
|
||||
**Returns**:
|
||||
|
||||
- `observation` _object/list_ - agent's observation of the current environment
|
||||
reward (float/list) : amount of reward returned after previous action
|
||||
- `done` _boolean/list_ - whether the episode has ended.
|
||||
- `info` _dict_ - contains auxiliary diagnostic information.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.render"></a>
|
||||
#### render
|
||||
|
||||
```python
|
||||
| render(mode="rgb_array")
|
||||
```
|
||||
|
||||
Return the latest visual observations.
|
||||
Note that it will not render a new frame of the environment.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.close"></a>
|
||||
#### close
|
||||
|
||||
```python
|
||||
| close() -> None
|
||||
```
|
||||
|
||||
Override _close in your subclass to perform any necessary cleanup.
|
||||
Environments will automatically close() themselves when
|
||||
garbage collected or when the program exits.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.seed"></a>
|
||||
#### seed
|
||||
|
||||
```python
|
||||
| seed(seed: Any = None) -> None
|
||||
```
|
||||
|
||||
Sets the seed for this env's random number generator(s).
|
||||
Currently not implemented.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_gym_env.ActionFlattener"></a>
|
||||
## ActionFlattener Objects
|
||||
|
||||
```python
|
||||
class ActionFlattener()
|
||||
```
|
||||
|
||||
Flattens branched discrete action spaces into single-branch discrete action spaces.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_gym_env.ActionFlattener.__init__"></a>
|
||||
#### \_\_init\_\_
|
||||
|
||||
```python
|
||||
| __init__(branched_action_space)
|
||||
```
|
||||
|
||||
Initialize the flattener.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `branched_action_space`: A List containing the sizes of each branch of the action
|
||||
space, e.g. [2,3,3] for three branches with size 2, 3, and 3 respectively.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_gym_env.ActionFlattener.lookup_action"></a>
|
||||
#### lookup\_action
|
||||
|
||||
```python
|
||||
| lookup_action(action)
|
||||
```
|
||||
|
||||
Convert a scalar discrete action into a unique set of branched actions.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `action`: A scalar value representing one of the discrete actions.
|
||||
|
||||
**Returns**:
|
||||
|
||||
The List containing the branched actions.
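For orientation (an added sketch, not part of the generated docs), a branched space of sizes
`[2, 3, 3]` flattens to `2 * 3 * 3 = 18` scalar actions, and `lookup_action` maps a scalar back
to one branch combination:

```python
# Sketch: flatten a [2, 3, 3] branched discrete space and look a scalar action back up.
from mlagents_envs.envs.unity_gym_env import ActionFlattener

flattener = ActionFlattener([2, 3, 3])
print(flattener.lookup_action(0))   # one unique [branch0, branch1, branch2] combination
print(flattener.lookup_action(17))  # a different combination; 18 combinations in total
```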
|
|
@@ -11,17 +11,9 @@ Unity environment via Python.

## Installation

The gym wrapper can be installed using:
The gym wrapper is part of the `mlagents_envs` package. Please refer to the
[mlagents_envs installation instructions](../ml-agents-envs/README.md).

```sh
pip3 install gym_unity
```

or by running the following from the `/gym-unity` directory of the repository:

```sh
pip3 install -e .
```

## Using the Gym Wrapper

@@ -29,7 +21,7 @@ The gym interface is available from `gym_unity.envs`. To launch an environment
from the root of the project repository use:

```python
from gym_unity.envs import UnityToGymWrapper
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper

env = UnityToGymWrapper(unity_env, uint8_visual, flatten_branched, allow_multiple_obs)
```

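Once wrapped, the environment follows the usual gym loop; a short sketch (added here for
orientation, continuing from the `env` created above):

```python
# Sketch: standard gym-style interaction with the wrapped environment.
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # stand-in for your policy
    obs, reward, done, info = env.step(action)
env.close()
```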
@ -107,35 +99,37 @@ from baselines import deepq
|
|||
from baselines import logger
|
||||
|
||||
from mlagents_envs.environment import UnityEnvironment
|
||||
from gym_unity.envs import UnityToGymWrapper
|
||||
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
|
||||
|
||||
|
||||
def main():
|
||||
unity_env = UnityEnvironment(<path-to-environment>)
|
||||
env = UnityToGymWrapper(unity_env, uint8_visual=True)
|
||||
logger.configure('./logs') # Change to log in a different directory
|
||||
act = deepq.learn(
|
||||
env,
|
||||
"cnn", # For visual inputs
|
||||
lr=2.5e-4,
|
||||
total_timesteps=1000000,
|
||||
buffer_size=50000,
|
||||
exploration_fraction=0.05,
|
||||
exploration_final_eps=0.1,
|
||||
print_freq=20,
|
||||
train_freq=5,
|
||||
learning_starts=20000,
|
||||
target_network_update_freq=50,
|
||||
gamma=0.99,
|
||||
prioritized_replay=False,
|
||||
checkpoint_freq=1000,
|
||||
checkpoint_path='./logs', # Change to save model in a different directory
|
||||
dueling=True
|
||||
)
|
||||
print("Saving model to unity_model.pkl")
|
||||
act.save("unity_model.pkl")
|
||||
unity_env = UnityEnvironment(<path-to-environment>)
|
||||
env = UnityToGymWrapper(unity_env, uint8_visual=True)
|
||||
logger.configure('./logs') # Change to log in a different directory
|
||||
act = deepq.learn(
|
||||
env,
|
||||
"cnn", # For visual inputs
|
||||
lr=2.5e-4,
|
||||
total_timesteps=1000000,
|
||||
buffer_size=50000,
|
||||
exploration_fraction=0.05,
|
||||
exploration_final_eps=0.1,
|
||||
print_freq=20,
|
||||
train_freq=5,
|
||||
learning_starts=20000,
|
||||
target_network_update_freq=50,
|
||||
gamma=0.99,
|
||||
prioritized_replay=False,
|
||||
checkpoint_freq=1000,
|
||||
checkpoint_path='./logs', # Change to save model in a different directory
|
||||
dueling=True
|
||||
)
|
||||
print("Saving model to unity_model.pkl")
|
||||
act.save("unity_model.pkl")
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
main()
|
||||
```
|
||||
|
||||
To start the training process, run the following from the directory containing
|
||||
|
@@ -163,7 +157,7 @@ method using the PPO2 baseline:

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper
from mlagents_envs.envs import UnityToGymWrapper
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.bench import Monitor
@ -173,38 +167,44 @@ import baselines.ppo2.ppo2 as ppo2
|
|||
import os
|
||||
|
||||
try:
|
||||
from mpi4py import MPI
|
||||
from mpi4py import MPI
|
||||
except ImportError:
|
||||
MPI = None
|
||||
MPI = None
|
||||
|
||||
|
||||
def make_unity_env(env_directory, num_env, visual, start_index=0):
|
||||
"""
|
||||
Create a wrapped, monitored Unity environment.
|
||||
"""
|
||||
def make_env(rank, use_visual=True): # pylint: disable=C0111
|
||||
def _thunk():
|
||||
unity_env = UnityEnvironment(env_directory, base_port=5000 + rank)
|
||||
env = UnityToGymWrapper(unity_env, uint8_visual=True)
|
||||
env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
|
||||
return env
|
||||
return _thunk
|
||||
if visual:
|
||||
return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
|
||||
else:
|
||||
rank = MPI.COMM_WORLD.Get_rank() if MPI else 0
|
||||
return DummyVecEnv([make_env(rank, use_visual=False)])
|
||||
"""
|
||||
Create a wrapped, monitored Unity environment.
|
||||
"""
|
||||
|
||||
def make_env(rank, use_visual=True): # pylint: disable=C0111
|
||||
def _thunk():
|
||||
unity_env = UnityEnvironment(env_directory, base_port=5000 + rank)
|
||||
env = UnityToGymWrapper(unity_env, uint8_visual=True)
|
||||
env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
|
||||
return env
|
||||
|
||||
return _thunk
|
||||
|
||||
if visual:
|
||||
return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
|
||||
else:
|
||||
rank = MPI.COMM_WORLD.Get_rank() if MPI else 0
|
||||
return DummyVecEnv([make_env(rank, use_visual=False)])
|
||||
|
||||
|
||||
def main():
|
||||
env = make_unity_env(<path-to-environment>, 4, True)
|
||||
ppo2.learn(
|
||||
network="mlp",
|
||||
env=env,
|
||||
total_timesteps=100000,
|
||||
lr=1e-3,
|
||||
)
|
||||
env = make_unity_env(<path-to-environment>, 4, True)
|
||||
ppo2.learn(
|
||||
network="mlp",
|
||||
env=env,
|
||||
total_timesteps=100000,
|
||||
lr=1e-3,
|
||||
)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
main()
|
||||
```
|
||||
|
||||
## Run Google Dopamine Algorithms
|
||||
|
@@ -236,7 +236,7 @@ instantiated, just as in the Baselines example. At the top of the file, insert

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper
from mlagents_envs.envs import UnityToGymWrapper
```

to import the Gym Wrapper. Navigate to the `create_atari_environment` method in
@@ -6,7 +6,7 @@ an entry point to train (`mlagents-learn`) which allows you to train agents in
Unity Environments using our implementations of reinforcement learning or
imitation learning. This document describes how to use the `mlagents_envs` API.
For information on using `mlagents-learn`, see [here](Training-ML-Agents.md).
For Python Low Level API documentation, see [here](Python-API-Documentation.md).
For Python Low Level API documentation, see [here](Python-LLAPI-Documentation.md).

The Python Low Level API can be used to interact directly with your Unity
learning environment. As such, it can serve as the basis for developing and
@ -0,0 +1,246 @@
|
|||
# Table of Contents
|
||||
|
||||
* [mlagents\_envs.envs.pettingzoo\_env\_factory](#mlagents_envs.envs.pettingzoo_env_factory)
|
||||
* [PettingZooEnvFactory](#mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory)
|
||||
* [env](#mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory.env)
|
||||
* [mlagents\_envs.envs.unity\_aec\_env](#mlagents_envs.envs.unity_aec_env)
|
||||
* [UnityAECEnv](#mlagents_envs.envs.unity_aec_env.UnityAECEnv)
|
||||
* [\_\_init\_\_](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.__init__)
|
||||
* [step](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.step)
|
||||
* [observe](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.observe)
|
||||
* [last](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.last)
|
||||
* [mlagents\_envs.envs.unity\_parallel\_env](#mlagents_envs.envs.unity_parallel_env)
|
||||
* [UnityParallelEnv](#mlagents_envs.envs.unity_parallel_env.UnityParallelEnv)
|
||||
* [\_\_init\_\_](#mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.__init__)
|
||||
* [reset](#mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.reset)
|
||||
* [mlagents\_envs.envs.unity\_pettingzoo\_base\_env](#mlagents_envs.envs.unity_pettingzoo_base_env)
|
||||
* [UnityPettingzooBaseEnv](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv)
|
||||
* [observation\_spaces](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_spaces)
|
||||
* [observation\_space](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_space)
|
||||
* [action\_spaces](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_spaces)
|
||||
* [action\_space](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_space)
|
||||
* [side\_channel](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.side_channel)
|
||||
* [reset](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.reset)
|
||||
* [seed](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.seed)
|
||||
* [render](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.render)
|
||||
* [close](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.close)
|
||||
|
||||
<a name="mlagents_envs.envs.pettingzoo_env_factory"></a>
|
||||
# mlagents\_envs.envs.pettingzoo\_env\_factory
|
||||
|
||||
<a name="mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory"></a>
|
||||
## PettingZooEnvFactory Objects
|
||||
|
||||
```python
|
||||
class PettingZooEnvFactory()
|
||||
```
|
||||
|
||||
<a name="mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory.env"></a>
|
||||
#### env
|
||||
|
||||
```python
|
||||
| env(seed: Optional[int] = None, **kwargs: Union[List, int, bool, None]) -> UnityAECEnv
|
||||
```
|
||||
|
||||
Creates the environment with env_id from Unity's default_registry and wraps it in a UnityToPettingZooWrapper.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `seed`: The seed for the action spaces of the agents.
|
||||
- `kwargs`: Any argument accepted by `UnityEnvironment`class except file_name
|
||||
|
||||
<a name="mlagents_envs.envs.unity_aec_env"></a>
|
||||
# mlagents\_envs.envs.unity\_aec\_env
|
||||
|
||||
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv"></a>
|
||||
## UnityAECEnv Objects
|
||||
|
||||
```python
|
||||
class UnityAECEnv(UnityPettingzooBaseEnv, AECEnv)
|
||||
```
|
||||
|
||||
Unity AEC (PettingZoo) environment wrapper.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv.__init__"></a>
|
||||
#### \_\_init\_\_
|
||||
|
||||
```python
|
||||
| __init__(env: BaseEnv, seed: Optional[int] = None)
|
||||
```
|
||||
|
||||
Initializes a Unity AEC environment wrapper.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `env`: The UnityEnvironment that is being wrapped.
|
||||
- `seed`: The seed for the action spaces of the agents.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv.step"></a>
|
||||
#### step
|
||||
|
||||
```python
|
||||
| step(action: Any) -> None
|
||||
```
|
||||
|
||||
Sets the action of the active agent and gets the observation, reward, done
|
||||
and info of the next agent.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `action`: The action for the active agent
|
||||
|
||||
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv.observe"></a>
|
||||
#### observe
|
||||
|
||||
```python
|
||||
| observe(agent_id)
|
||||
```
|
||||
|
||||
Returns the observation an agent currently can make. `last()` calls this function.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv.last"></a>
|
||||
#### last
|
||||
|
||||
```python
|
||||
| last(observe=True)
|
||||
```
|
||||
|
||||
Returns the observation, cumulative reward, done and info for the current agent (specified by `self.agent_selection`).
|
||||
|
||||
<a name="mlagents_envs.envs.unity_parallel_env"></a>
|
||||
# mlagents\_envs.envs.unity\_parallel\_env
|
||||
|
||||
<a name="mlagents_envs.envs.unity_parallel_env.UnityParallelEnv"></a>
|
||||
## UnityParallelEnv Objects
|
||||
|
||||
```python
|
||||
class UnityParallelEnv(UnityPettingzooBaseEnv, ParallelEnv)
|
||||
```
|
||||
|
||||
Unity Parallel (PettingZoo) environment wrapper.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.__init__"></a>
|
||||
#### \_\_init\_\_
|
||||
|
||||
```python
|
||||
| __init__(env: BaseEnv, seed: Optional[int] = None)
|
||||
```
|
||||
|
||||
Initializes a Unity Parallel environment wrapper.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
- `env`: The UnityEnvironment that is being wrapped.
|
||||
- `seed`: The seed for the action spaces of the agents.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.reset"></a>
|
||||
#### reset
|
||||
|
||||
```python
|
||||
| reset() -> Dict[str, Any]
|
||||
```
|
||||
|
||||
Resets the environment.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_pettingzoo_base_env"></a>
|
||||
# mlagents\_envs.envs.unity\_pettingzoo\_base\_env
|
||||
|
||||
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv"></a>
|
||||
## UnityPettingzooBaseEnv Objects
|
||||
|
||||
```python
|
||||
class UnityPettingzooBaseEnv()
|
||||
```
|
||||
|
||||
Unity Petting Zoo base environment.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_spaces"></a>
|
||||
#### observation\_spaces
|
||||
|
||||
```python
|
||||
| @property
|
||||
| observation_spaces() -> Dict[str, spaces.Space]
|
||||
```
|
||||
|
||||
Return the observation spaces of all the agents.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_space"></a>
|
||||
#### observation\_space
|
||||
|
||||
```python
|
||||
| observation_space(agent: str) -> Optional[spaces.Space]
|
||||
```
|
||||
|
||||
The observation space of the current agent.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_spaces"></a>
|
||||
#### action\_spaces
|
||||
|
||||
```python
|
||||
| @property
|
||||
| action_spaces() -> Dict[str, spaces.Space]
|
||||
```
|
||||
|
||||
Return the action spaces of all the agents.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_space"></a>
|
||||
#### action\_space
|
||||
|
||||
```python
|
||||
| action_space(agent: str) -> Optional[spaces.Space]
|
||||
```
|
||||
|
||||
The action space of the current agent.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.side_channel"></a>
|
||||
#### side\_channel
|
||||
|
||||
```python
|
||||
| @property
|
||||
| side_channel() -> Dict[str, Any]
|
||||
```
|
||||
|
||||
The side channels of the environment. You can access the side channels
|
||||
of an environment with `env.side_channel[<name-of-channel>]`.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.reset"></a>
|
||||
#### reset
|
||||
|
||||
```python
|
||||
| reset()
|
||||
```
|
||||
|
||||
Resets the environment.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.seed"></a>
|
||||
#### seed
|
||||
|
||||
```python
|
||||
| seed(seed=None)
|
||||
```
|
||||
|
||||
Reseeds the environment (making the resulting environment deterministic).
|
||||
`reset()` must be called after `seed()`, and before `step()`.
|
||||
|
||||
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.render"></a>
|
||||
#### render
|
||||
|
||||
```python
|
||||
| render(mode="human")
|
||||
```
|
||||
|
||||
NOT SUPPORTED.
|
||||
|
||||
Displays a rendered frame from the environment, if supported.
|
||||
Alternate render modes in the default environments are `'rgb_array'`
|
||||
which returns a numpy array and is supported by all environments outside of classic,
|
||||
and `'ansi'` which returns the strings printed (specific to classic environments).
|
||||
|
||||
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.close"></a>
|
||||
#### close
|
||||
|
||||
```python
|
||||
| close() -> None
|
||||
```
|
||||
|
||||
Close the environment.
|
|
@@ -0,0 +1,54 @@
# Unity ML-Agents PettingZoo Wrapper

With the increasing interest in multi-agent training with a gym-like API, we provide a
wrapper implementing the [PettingZoo API](https://www.pettingzoo.ml/). Our wrapper
provides interfaces on top of our `UnityEnvironment` class, which is the default way of
interfacing with a Unity environment via Python.

## Installation and Examples

[[Colab] PettingZoo Wrapper Example](https://colab.research.google.com/github/Unity-Technologies/ml-agents/blob/develop-python-api-ga/ml-agents-envs/colabs/Colab_PettingZoo.ipynb)

This colab notebook demonstrates example usage of the wrapper, including installation,
basic usage, and an example with our
[Strikers vs Goalie environment](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#strikers-vs-goalie),
a multi-agent environment with multiple different behavior names.

## API interface

This wrapper is compatible with the PettingZoo API. Please check out the
[PettingZoo API page](https://www.pettingzoo.ml/api) for more details.
Here's an example of interacting with a wrapped environment:

```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs import UnityToPettingZooWrapper

unity_env = UnityEnvironment("StrikersVsGoalie")
env = UnityToPettingZooWrapper(unity_env)
env.reset()
for agent in env.agent_iter():
    observation, reward, done, info = env.last()
    action = policy(observation, agent)
    env.step(action)
```

## Notes
- There is support for both [AEC](https://www.pettingzoo.ml/api#interacting-with-environments)
and [Parallel](https://www.pettingzoo.ml/api#parallel-api) PettingZoo APIs (a Parallel
sketch follows this list).
- The AEC wrapper is compatible with the PettingZoo (PZ) API interface but works in a slightly
different way under the hood. For the AEC API, instead of stepping the environment on every `env.step(action)` call,
the PZ wrapper stores the action and only performs an environment step once all the
agents requesting actions in the current step have been assigned one. This is for
performance, since the communication between Unity and Python is more efficient
when data is sent in batches.
- Since the actions for the AEC wrapper are stored without applying them to the environment until
all the actions are queued, some components of the API might behave in unexpected ways. For example, a call
to `env.reward` should return the instantaneous reward for that particular step, but the true
reward only becomes available when an actual environment step is performed. It's recommended that
you follow the API definition for training (access rewards from `env.last()` instead of
`env.reward`), in which case the underlying mechanism shouldn't affect training results.
- The environment will automatically reset when it is done, so `env.agent_iter(max_step)` will
keep going until the specified max step is reached (default: `2**63`). There is no need to
call `env.reset()` except at the very beginning, right after instantiating an environment.
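A minimal sketch of the Parallel API (added for illustration; it mirrors the AEC example above,
and the bounded loop and random actions are placeholders):

```python
# Sketch: Parallel PettingZoo API -- all live agents act simultaneously each step.
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs.unity_parallel_env import UnityParallelEnv

unity_env = UnityEnvironment("StrikersVsGoalie")
env = UnityParallelEnv(unity_env)
observations = env.reset()
for _ in range(100):  # bounded loop for illustration
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, dones, infos = env.step(actions)
env.close()
```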
@@ -51,10 +51,10 @@
## API Docs

- [API Reference](API-Reference.md)
- [Python API Documentation](Python-API-Documentation.md)
- [How to use the Python API](Python-API.md)
- [Python API Documentation](Python-LLAPI-Documentation.md)
- [How to use the Python API](Python-LLAPI.md)
- [How to use the Unity Environment Registry](Unity-Environment-Registry.md)
- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](../gym-unity/README.md)
- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](Python-Gym-API.md)

## Translations

@@ -1,6 +1,6 @@
# Unity Environment Registry [Experimental]

The Unity Environment Registry is a database of pre-built Unity environments that can be easily used without having to install the Unity Editor. It is a great way to get started with our [UnityEnvironment API](Python-API.md).
The Unity Environment Registry is a database of pre-built Unity environments that can be easily used without having to install the Unity Editor. It is a great way to get started with our [UnityEnvironment API](Python-LLAPI.md).

## Loading an Environment from the Registry

@@ -14,7 +14,7 @@ for name in environment_names:
print(name)
```

The `make()` method on a registry value will return a `UnityEnvironment` ready to be used. All arguments passed to the make method will be passed to the constructor of the `UnityEnvironment` as well. Refer to the documentation on the [Python-API](Python-API.md) for more information about the arguments of the `UnityEnvironment` constructor. For example, the following code will create the environment under the identifier `"my-env"`, reset it, perform a few steps and finally close it:
The `make()` method on a registry value will return a `UnityEnvironment` ready to be used. All arguments passed to the make method will be passed to the constructor of the `UnityEnvironment` as well. Refer to the documentation on the [Python-API](Python-LLAPI.md) for more information about the arguments of the `UnityEnvironment` constructor. For example, the following code will create the environment under the identifier `"my-env"`, reset it, perform a few steps and finally close it:

```python
from mlagents_envs.registry import default_registry
@@ -18,8 +18,7 @@ from dependencies of other projects. This has a few advantages:
with the different version.

## Python Version Requirement (Required)

This guide has been tested with Python 3.7 through Python 3.8. Newer versions might not
This guide has been tested with Python 3.7.2 through Python 3.9.9. Newer versions might not
have support for the dependent libraries, so are not recommended.

## Installing Pip (Required)
@@ -64,8 +63,7 @@ then python3-distutils needs to be installed. Install python3-distutils using
environment using the same `activate` command listed above)

Note:

- Verify that you are using Python 3.7. Launch a command prompt
using `cmd` and execute `python --version` to verify the version.
- Verify that you are using a Python version between 3.7.2 and 3.9.9. Launch a
command prompt using `cmd` and execute `python --version` to verify the version.
- Python3 installation may require admin privileges on Windows.
- This guide is for Windows 10 using a 64-bit architecture only.
@@ -51,9 +51,9 @@
## API Docs

- [API Reference](API-Reference.md)
- [How to use the Python API](Python-API.md)
- [How to use the Python API](Python-LLAPI.md)
- [How to use the Unity Environment Registry](Unity-Environment-Registry.md)
- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](../gym-unity/README.md)
- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](Python-Gym-API.md)

## Translations

@@ -78,4 +78,4 @@ to keep them up just in case they are helpful to you.
- [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
- [Using the Video Recorder](https://github.com/Unity-Technologies/video-recorder)

-->
-->
@ -25,7 +25,7 @@ ML-Agents Academy 类按如下方式编排 agent 模拟循环:
|
|||
|
||||
要创建训练环境,请扩展 Academy 和 Agent 类以实现上述方法。`Agent.CollectObservations()` 和 `Agent.AgentAction()` 函数必须实现;而其他方法是可选的,即是否需要实现它们取决于您的具体情况。
|
||||
|
||||
**注意:**在这里用到的 Python API 也可用于其他目的。例如,借助于该 API,您可以将 Unity 用作您自己的机器学习算法的模拟引擎。请参阅 [Python API](/docs/Python-API.md) 以了解更多信息。
|
||||
**注意:**在这里用到的 Python API 也可用于其他目的。例如,借助于该 API,您可以将 Unity 用作您自己的机器学习算法的模拟引擎。请参阅 [Python API](/docs/Python-LLAPI.md) 以了解更多信息。
|
||||
|
||||
## 组织 Unity 场景
|
||||
|
||||
|
|
|
@ -252,7 +252,7 @@ Internal Brain 中,以便为连接到该 Brain 的所有 Agent 生成
|
|||
的 Brain 类型都会设置为 External,并且场景中所有 Agent 的行为
|
||||
都将在 Python 中接受控制。
|
||||
|
||||
我们目前没有教程介绍这种模式,但您可以在[这里](/docs/Python-API.md)
|
||||
我们目前没有教程介绍这种模式,但您可以在[这里](/docs/Python-LLAPI.md)
|
||||
了解有关 Python API 的更多信息。
|
||||
|
||||
### Curriculum Learning(课程学习)
|
||||
|
|
|
@ -39,6 +39,6 @@
|
|||
|
||||
## API 文档
|
||||
* [API 参考](/docs/API-Reference.md)
|
||||
* [如何使用 Python API](/docs/Python-API.md)
|
||||
* [如何使用 Python API](/docs/Python-LLAPI.md)
|
||||
|
||||
**注:** 有翻译版的文档会在右上角标注*号。
|
||||
**注:** 有翻译版的文档会在右上角标注*号。
|
||||
|
|
|
@ -1,5 +0,0 @@
|
|||
# Version of the library that will be used to upload to pypi
|
||||
__version__ = "0.29.0.dev0"
|
||||
|
||||
# Git tag that will be checked to determine whether to trigger upload to pypi
|
||||
__release_tag__ = None
|
|
@ -1,43 +0,0 @@
|
|||
#!/usr/bin/env python
|
||||
|
||||
import os
|
||||
import sys
|
||||
from setuptools import setup, find_packages
|
||||
from setuptools.command.install import install
|
||||
import gym_unity
|
||||
|
||||
VERSION = gym_unity.__version__
|
||||
EXPECTED_TAG = gym_unity.__release_tag__
|
||||
|
||||
|
||||
class VerifyVersionCommand(install):
|
||||
"""
|
||||
Custom command to verify that the git tag is the expected one for the release.
|
||||
Originally based on https://circleci.com/blog/continuously-deploying-python-packages-to-pypi-with-circleci/
|
||||
This differs slightly because our tags and versions are different.
|
||||
"""
|
||||
|
||||
description = "verify that the git tag matches our version"
|
||||
|
||||
def run(self):
|
||||
tag = os.getenv("GITHUB_REF", "NO GITHUB TAG!").replace("refs/tags/", "")
|
||||
|
||||
if tag != EXPECTED_TAG:
|
||||
info = "Git tag: {} does not match the expected tag of this app: {}".format(
|
||||
tag, EXPECTED_TAG
|
||||
)
|
||||
sys.exit(info)
|
||||
|
||||
|
||||
setup(
|
||||
name="gym_unity",
|
||||
version=VERSION,
|
||||
description="Unity Machine Learning Agents Gym Interface",
|
||||
license="Apache License 2.0",
|
||||
author="Unity Technologies",
|
||||
author_email="ML-Agents@unity3d.com",
|
||||
url="https://github.com/Unity-Technologies/ml-agents",
|
||||
packages=find_packages(),
|
||||
install_requires=["gym==0.21.0", f"mlagents_envs=={VERSION}"],
|
||||
cmdclass={"verify": VerifyVersionCommand},
|
||||
)
|
|
@@ -2,9 +2,13 @@

The `mlagents_envs` Python package is part of the
[ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents).
`mlagents_envs` provides a Python API that allows direct interaction with the
Unity game engine. It is used by the trainer implementation in `mlagents` as
well as the `gym-unity` package to perform reinforcement learning within Unity.
`mlagents_envs` provides three Python APIs that allow direct interaction with the
Unity game engine:
- A single-agent API (Gym API)
- A gym-like multi-agent API (PettingZoo API)
- A low-level API (LLAPI)

The LLAPI is used by the trainer implementation in `mlagents`.
`mlagents_envs` can be used independently of `mlagents` for Python
communication.
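For orientation, a minimal LLAPI sketch (an added illustration under assumed usage; the guides
linked below are the authoritative reference, and the executable path is a placeholder):

```python
# Sketch: connect to a Unity build with the low-level API.
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="<path-to-environment>")
env.reset()
behavior_name = list(env.behavior_specs)[0]            # first registered behavior
decision_steps, terminal_steps = env.get_steps(behavior_name)
env.close()
```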
@@ -13,13 +17,17 @@ communication.
Install the `mlagents_envs` package with:

```sh
python -m pip install mlagents_envs==0.28.0
python -m pip install mlagents_envs==0.29.0
```

## Usage & More Information

See the [Python API Guide](../docs/Python-API.md) for more information on how to
use the API to interact with a Unity environment.
See
- [Gym API Guide](../docs/Python-Gym-API.md)
- [PettingZoo API Guide](../docs/Python-PettingZoo-API.md)
- [Python API Guide](../docs/Python-LLAPI.md)

for more information on how to use the API to interact with a Unity environment.

For more information on the ML-Agents Toolkit and how to instrument a Unity
scene with the ML-Agents SDK, check out the main
@ -0,0 +1,318 @@
|
|||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# ML-Agents PettingZoo Wrapper"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#@title Install Rendering Dependencies { display-mode: \"form\" }\n",
|
||||
"#@markdown (You only need to run this code when using Colab's hosted runtime)\n",
|
||||
"\n",
|
||||
"import os\n",
|
||||
"from IPython.display import HTML, display\n",
|
||||
"\n",
|
||||
"def progress(value, max=100):\n",
|
||||
" return HTML(\"\"\"\n",
|
||||
" <progress\n",
|
||||
" value='{value}'\n",
|
||||
" max='{max}',\n",
|
||||
" style='width: 100%'\n",
|
||||
" >\n",
|
||||
" {value}\n",
|
||||
" </progress>\n",
|
||||
" \"\"\".format(value=value, max=max))\n",
|
||||
"\n",
|
||||
"pro_bar = display(progress(0, 100), display_id=True)\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" import google.colab\n",
|
||||
" INSTALL_XVFB = True\n",
|
||||
"except ImportError:\n",
|
||||
" INSTALL_XVFB = 'COLAB_ALWAYS_INSTALL_XVFB' in os.environ\n",
|
||||
"\n",
|
||||
"if INSTALL_XVFB:\n",
|
||||
" with open('frame-buffer', 'w') as writefile:\n",
|
||||
" writefile.write(\"\"\"#taken from https://gist.github.com/jterrace/2911875\n",
|
||||
"XVFB=/usr/bin/Xvfb\n",
|
||||
"XVFBARGS=\":1 -screen 0 1024x768x24 -ac +extension GLX +render -noreset\"\n",
|
||||
"PIDFILE=./frame-buffer.pid\n",
|
||||
"case \"$1\" in\n",
|
||||
" start)\n",
|
||||
" echo -n \"Starting virtual X frame buffer: Xvfb\"\n",
|
||||
" /sbin/start-stop-daemon --start --quiet --pidfile $PIDFILE --make-pidfile --background --exec $XVFB -- $XVFBARGS\n",
|
||||
" echo \".\"\n",
|
||||
" ;;\n",
|
||||
" stop)\n",
|
||||
" echo -n \"Stopping virtual X frame buffer: Xvfb\"\n",
|
||||
" /sbin/start-stop-daemon --stop --quiet --pidfile $PIDFILE\n",
|
||||
" rm $PIDFILE\n",
|
||||
" echo \".\"\n",
|
||||
" ;;\n",
|
||||
" restart)\n",
|
||||
" $0 stop\n",
|
||||
" $0 start\n",
|
||||
" ;;\n",
|
||||
" *)\n",
|
||||
" echo \"Usage: /etc/init.d/xvfb {start|stop|restart}\"\n",
|
||||
" exit 1\n",
|
||||
"esac\n",
|
||||
"exit 0\n",
|
||||
" \"\"\")\n",
|
||||
" pro_bar.update(progress(5, 100))\n",
|
||||
" !apt-get install daemon >/dev/null 2>&1\n",
|
||||
" pro_bar.update(progress(10, 100))\n",
|
||||
" !apt-get install wget >/dev/null 2>&1\n",
|
||||
" pro_bar.update(progress(20, 100))\n",
|
||||
" !wget http://security.ubuntu.com/ubuntu/pool/main/libx/libxfont/libxfont1_1.5.1-1ubuntu0.16.04.4_amd64.deb >/dev/null 2>&1\n",
|
||||
" pro_bar.update(progress(30, 100))\n",
|
||||
" !wget --output-document xvfb.deb http://security.ubuntu.com/ubuntu/pool/universe/x/xorg-server/xvfb_1.18.4-0ubuntu0.12_amd64.deb >/dev/null 2>&1\n",
|
||||
" pro_bar.update(progress(40, 100))\n",
|
||||
" !dpkg -i libxfont1_1.5.1-1ubuntu0.16.04.4_amd64.deb >/dev/null 2>&1\n",
|
||||
" pro_bar.update(progress(50, 100))\n",
|
||||
" !dpkg -i xvfb.deb >/dev/null 2>&1\n",
|
||||
" pro_bar.update(progress(70, 100))\n",
|
||||
" !rm libxfont1_1.5.1-1ubuntu0.16.04.4_amd64.deb\n",
|
||||
" pro_bar.update(progress(80, 100))\n",
|
||||
" !rm xvfb.deb\n",
|
||||
" pro_bar.update(progress(90, 100))\n",
|
||||
" !bash frame-buffer start\n",
|
||||
" os.environ[\"DISPLAY\"] = \":1\"\n",
|
||||
"pro_bar.update(progress(100, 100))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Installing ml-agents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"try:\n",
|
||||
" import mlagents\n",
|
||||
" print(\"ml-agents already installed\")\n",
|
||||
"except ImportError:\n",
|
||||
" !git clone -b main --single-branch https://github.com/Unity-Technologies/ml-agents.git\n",
|
||||
" !python -m pip install -q ./ml-agents/ml-agents-envs\n",
|
||||
" !python -m pip install -q ./ml-agents/ml-agents\n",
|
||||
" print(\"Installed ml-agents\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Run the Environment"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"jp-MarkdownHeadingCollapsed": true,
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"List of available environments:\n",
|
||||
"* Basic\n",
|
||||
"* ThreeDBall\n",
|
||||
"* ThreeDBallHard\n",
|
||||
"* GridWorld\n",
|
||||
"* Hallway\n",
|
||||
"* VisualHallway\n",
|
||||
"* CrawlerDynamicTarget\n",
|
||||
"* CrawlerStaticTarget\n",
|
||||
"* Bouncer\n",
|
||||
"* SoccerTwos\n",
|
||||
"* PushBlock\n",
|
||||
"* VisualPushBlock\n",
|
||||
"* WallJump\n",
|
||||
"* Tennis\n",
|
||||
"* Reacher\n",
|
||||
"* Pyramids\n",
|
||||
"* VisualPyramids\n",
|
||||
"* Walker\n",
|
||||
"* FoodCollector\n",
|
||||
"* VisualFoodCollector\n",
|
||||
"* StrikersVsGoalie\n",
|
||||
"* WormStaticTarget\n",
|
||||
"* WormDynamicTarget"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Start Environment with PettingZoo Wrapper"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "YSf-WhxbqtLw"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# -----------------\n",
|
||||
"# This code is used to close an env that might not have been closed before\n",
|
||||
"try:\n",
|
||||
" env.close()\n",
|
||||
"except:\n",
|
||||
" pass\n",
|
||||
"# -----------------\n",
|
||||
"\n",
|
||||
"import numpy as np\n",
|
||||
"from mlagents_envs.envs import StrikersVsGoalie # import unity environment\n",
|
||||
"env = StrikersVsGoalie.env()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Stepping the environment\n",
|
||||
"\n",
|
||||
"Example of interacting with the environment in basic RL loop. It follows the same interface as described in [PettingZoo API page](https://www.pettingzoo.ml/api)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "dhtl0mpeqxYi"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"num_cycles = 10\n",
|
||||
"\n",
|
||||
"env.reset()\n",
|
||||
"for agent in env.agent_iter(env.num_agents * num_cycles):\n",
|
||||
" prev_observe, reward, done, info = env.last()\n",
|
||||
" if isinstance(prev_observe, dict) and 'action_mask' in prev_observe:\n",
|
||||
" action_mask = prev_observe['action_mask']\n",
|
||||
" if done:\n",
|
||||
" action = None\n",
|
||||
" else:\n",
|
||||
" action = env.action_spaces[agent].sample() # randomly choose an action for example\n",
|
||||
" env.step(action)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Additional Environment API\n",
|
||||
"\n",
|
||||
"All the API described in the `Additional Environment API` section in the [PettingZoo API page](https://www.pettingzoo.ml/api) are all supported. A few examples are shown below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# `agents`: a list of the names of all current agents\n",
|
||||
"print(\"Agent names:\", env.agents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# `agent_selection`: the currently agent that an action can be taken for.\n",
|
||||
"print(\"Current agent:\", env.agent_selection)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# `observation_spaces`: a dict of the observation spaces of every agent, keyed by name.\n",
|
||||
"print(\"Observation space of current agent:\", env.observation_spaces[env.agent_selection])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# `action_spaces`: a dict of the observation spaces of every agent, keyed by name.\n",
|
||||
"print(\"Action space of current agent:\", env.action_spaces[env.agent_selection])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Close the Environment to free the port it is using"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "a7KatdThq7OV"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"env.close()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"collapsed_sections": [],
|
||||
"name": "Colab-UnityEnvironment-1-Run.ipynb",
|
||||
"private_outputs": true,
|
||||
"provenance": [],
|
||||
"toc_visible": true
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
@ -0,0 +1,15 @@
|
|||
from mlagents_envs.registry import default_registry
|
||||
from mlagents_envs.envs.pettingzoo_env_factory import logger, PettingZooEnvFactory
|
||||
|
||||
# Register each environment in default_registry as a PettingZooEnv
|
||||
for key in default_registry:
|
||||
env_name = key
|
||||
if key[0].isdigit():
|
||||
env_name = key.replace("3", "Three")
|
||||
if not env_name.isidentifier():
|
||||
logger.warning(
|
||||
f"Environment id {env_name} can not be registered since it is"
|
||||
f"not a valid identifier name."
|
||||
)
|
||||
continue
|
||||
locals()[env_name] = PettingZooEnvFactory(key)
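
# Usage sketch (hedged): after the loop above runs, every registry entry is
# exposed as a module-level factory, so one of the default registry ids (e.g.
# StrikersVsGoalie, as used in the colab) can be imported and instantiated:
#
#   from mlagents_envs.envs import StrikersVsGoalie
#   env = StrikersVsGoalie.env()
#   env.reset()
#   env.close()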
@ -0,0 +1,76 @@
|
|||
from urllib.parse import urlparse, parse_qs
|
||||
|
||||
|
||||
def _behavior_to_agent_id(behavior_name: str, unique_id: int) -> str:
|
||||
return f"{behavior_name}?agent_id={unique_id}"
|
||||
|
||||
|
||||
def _agent_id_to_behavior(agent_id: str) -> str:
|
||||
return agent_id.split("?agent_id=")[0]
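
# Example (sketch): agent ids embed the behavior name, so the two helpers above
# are inverses for any behavior name reported by ML-Agents (typically
# "<name>?team=<id>") and any integer id:
#
#   >>> _behavior_to_agent_id("StrikersVsGoalie?team=0", 3)
#   'StrikersVsGoalie?team=0?agent_id=3'
#   >>> _agent_id_to_behavior("StrikersVsGoalie?team=0?agent_id=3")
#   'StrikersVsGoalie?team=0'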
|
||||
|
||||
|
||||
def _unwrap_batch_steps(batch_steps, behavior_name):
|
||||
decision_batch, termination_batch = batch_steps
|
||||
decision_id = [
|
||||
_behavior_to_agent_id(behavior_name, i) for i in decision_batch.agent_id
|
||||
]
|
||||
termination_id = [
|
||||
_behavior_to_agent_id(behavior_name, i) for i in termination_batch.agent_id
|
||||
]
|
||||
agents = decision_id + termination_id
|
||||
obs = {
|
||||
agent_id: [batch_obs[i] for batch_obs in termination_batch.obs]
|
||||
for i, agent_id in enumerate(termination_id)
|
||||
}
|
||||
if decision_batch.action_mask is not None:
|
||||
obs.update(
|
||||
{
|
||||
agent_id: {
|
||||
"observation": [batch_obs[i] for batch_obs in decision_batch.obs],
|
||||
"action_mask": [mask[i] for mask in decision_batch.action_mask],
|
||||
}
|
||||
for i, agent_id in enumerate(decision_id)
|
||||
}
|
||||
)
|
||||
else:
|
||||
obs.update(
|
||||
{
|
||||
agent_id: [batch_obs[i] for batch_obs in decision_batch.obs]
|
||||
for i, agent_id in enumerate(decision_id)
|
||||
}
|
||||
)
|
||||
obs = {k: v if len(v) > 1 else v[0] for k, v in obs.items()}
|
||||
dones = {agent_id: True for agent_id in termination_id}
|
||||
dones.update({agent_id: False for agent_id in decision_id})
|
||||
rewards = {
|
||||
agent_id: termination_batch.reward[i]
|
||||
for i, agent_id in enumerate(termination_id)
|
||||
}
|
||||
rewards.update(
|
||||
{agent_id: decision_batch.reward[i] for i, agent_id in enumerate(decision_id)}
|
||||
)
|
||||
cumulative_rewards = {k: v for k, v in rewards.items()}
|
||||
infos = {}
|
||||
for i, agent_id in enumerate(decision_id):
|
||||
infos[agent_id] = {}
|
||||
infos[agent_id]["behavior_name"] = behavior_name
|
||||
infos[agent_id]["group_id"] = decision_batch.group_id[i]
|
||||
infos[agent_id]["group_reward"] = decision_batch.group_reward[i]
|
||||
for i, agent_id in enumerate(termination_id):
|
||||
infos[agent_id] = {}
|
||||
infos[agent_id]["behavior_name"] = behavior_name
|
||||
infos[agent_id]["group_id"] = termination_batch.group_id[i]
|
||||
infos[agent_id]["group_reward"] = termination_batch.group_reward[i]
|
||||
infos[agent_id]["interrupted"] = termination_batch.interrupted[i]
|
||||
id_map = {agent_id: i for i, agent_id in enumerate(decision_id)}
|
||||
return agents, obs, dones, rewards, cumulative_rewards, infos, id_map
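
# Shape of the returned tuple (sketch): `agents` is the list of agent-id strings
# for this batch, `obs`/`dones`/`rewards`/`cumulative_rewards`/`infos` are dicts
# keyed by those strings, and `id_map` maps a (non-terminated) agent id back to
# its index in the decision batch.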
|
||||
|
||||
|
||||
def _parse_behavior(full_behavior):
|
||||
parsed = urlparse(full_behavior)
|
||||
name = parsed.path
|
||||
ids = parse_qs(parsed.query)
|
||||
team_id: int = 0
|
||||
if "team" in ids:
|
||||
team_id = int(ids["team"][0])
|
||||
return name, team_id
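
# Example (sketch): ML-Agents reports behaviors as "<name>?team=<id>", so
#
#   >>> _parse_behavior("StrikersVsGoalie?team=1")
#   ('StrikersVsGoalie', 1)
#   >>> _parse_behavior("Basic")
#   ('Basic', 0)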
|
|
@ -0,0 +1,50 @@
|
|||
from typing import Optional, Union, List
|
||||
|
||||
from mlagents_envs import logging_util
|
||||
from mlagents_envs.exception import UnityWorkerInUseException
|
||||
from mlagents_envs.registry import default_registry
|
||||
from mlagents_envs.side_channel.engine_configuration_channel import (
|
||||
EngineConfigurationChannel,
|
||||
)
|
||||
from mlagents_envs.side_channel.environment_parameters_channel import (
|
||||
EnvironmentParametersChannel,
|
||||
)
|
||||
from mlagents_envs.side_channel.stats_side_channel import StatsSideChannel
|
||||
from mlagents_envs.envs.unity_aec_env import UnityAECEnv
|
||||
|
||||
logger = logging_util.get_logger(__name__)
|
||||
|
||||
|
||||
class PettingZooEnvFactory:
|
||||
def __init__(self, env_id: str) -> None:
|
||||
self.env_id = env_id
|
||||
|
||||
def env(
|
||||
self, seed: Optional[int] = None, **kwargs: Union[List, int, bool, None]
|
||||
) -> UnityAECEnv:
|
||||
"""
|
||||
Creates the environment with env_id from Unity's default_registry and wraps it in a UnityAECEnv (the PettingZoo AEC wrapper)
|
||||
:param seed: The seed for the action spaces of the agents.
|
||||
:param kwargs: Any argument accepted by the `UnityEnvironment` class except `file_name`
|
||||
"""
|
||||
# If no side_channels are specified, add the following defaults
|
||||
if "side_channels" not in kwargs:
|
||||
kwargs["side_channels"] = [
|
||||
EngineConfigurationChannel(),
|
||||
EnvironmentParametersChannel(),
|
||||
StatsSideChannel(),
|
||||
]
|
||||
_env = None
|
||||
# If no base port argument is provided, try ports starting at 6000 until one is free
|
||||
if "base_port" not in kwargs:
|
||||
port = 6000
|
||||
while _env is None:
|
||||
try:
|
||||
kwargs["base_port"] = port
|
||||
_env = default_registry[self.env_id].make(**kwargs)
|
||||
except UnityWorkerInUseException:
|
||||
port += 1
|
||||
pass
|
||||
else:
|
||||
_env = default_registry[self.env_id].make(**kwargs)
|
||||
return UnityAECEnv(_env, seed)
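
# Usage sketch (hedged): `no_graphics` is a standard UnityEnvironment argument
# and is simply forwarded through **kwargs; any other UnityEnvironment argument
# except file_name works the same way.
#
#   factory = PettingZooEnvFactory("StrikersVsGoalie")
#   env = factory.env(seed=42, no_graphics=True)
#   env.reset()
#   env.close()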
|
|
@ -0,0 +1,72 @@
|
|||
from typing import Any, Optional
|
||||
from gym import error
|
||||
from mlagents_envs.base_env import BaseEnv
|
||||
from pettingzoo import AECEnv
|
||||
|
||||
from mlagents_envs.envs.unity_pettingzoo_base_env import UnityPettingzooBaseEnv
|
||||
|
||||
|
||||
class UnityAECEnv(UnityPettingzooBaseEnv, AECEnv):
|
||||
"""
|
||||
Unity AEC (PettingZoo) environment wrapper.
|
||||
"""
|
||||
|
||||
def __init__(self, env: BaseEnv, seed: Optional[int] = None):
|
||||
"""
|
||||
Initializes a Unity AEC environment wrapper.
|
||||
|
||||
:param env: The UnityEnvironment that is being wrapped.
|
||||
:param seed: The seed for the action spaces of the agents.
|
||||
"""
|
||||
super().__init__(env, seed)
|
||||
|
||||
def step(self, action: Any) -> None:
|
||||
"""
|
||||
Sets the action of the active agent and gets the observation, reward, done
|
||||
and info of the next agent.
|
||||
:param action: The action for the active agent
|
||||
"""
|
||||
self._assert_loaded()
|
||||
if len(self._live_agents) <= 0:
|
||||
raise error.Error(
|
||||
"You must reset the environment before you can perform a step"
|
||||
)
|
||||
|
||||
# Process action
|
||||
current_agent = self._agents[self._agent_index]
|
||||
self._process_action(current_agent, action)
|
||||
|
||||
self._agent_index += 1
|
||||
# Reset reward
|
||||
for k in self._rewards.keys():
|
||||
self._rewards[k] = 0
|
||||
|
||||
if self._agent_index >= len(self._agents) and self.num_agents > 0:
|
||||
# The index is too high, time to set the action for the agents we have
|
||||
self._step()
|
||||
self._live_agents.sort() # unnecessary, only for passing API test
|
||||
|
||||
def observe(self, agent_id):
|
||||
"""
|
||||
Returns the observation an agent currently can make. `last()` calls this function.
|
||||
"""
|
||||
return (
|
||||
self._observations[agent_id],
|
||||
self._cumm_rewards[agent_id],
|
||||
self._dones[agent_id],
|
||||
self._infos[agent_id],
|
||||
)
|
||||
|
||||
def last(self, observe=True):
|
||||
"""
|
||||
Returns the observation, cumulative reward, done, and info for the current agent (specified by self.agent_selection)
|
||||
"""
|
||||
obs, reward, done, info = self.observe(self._agents[self._agent_index])
|
||||
return obs if observe else None, reward, done, info
|
||||
|
||||
@property
|
||||
def agent_selection(self):
|
||||
if not self._live_agents:
|
||||
# If we had an agent finish then return that agent even though it isn't alive.
|
||||
return self._agents[0]
|
||||
return self._agents[self._agent_index]
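
# Minimal AEC loop sketch (mirrors the colab above); `env` is a UnityAECEnv:
#
#   env.reset()
#   for agent in env.agent_iter(env.num_agents * 10):
#       obs, reward, done, info = env.last()
#       action = None if done else env.action_spaces[agent].sample()
#       env.step(action)
#   env.close()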
|
|
@ -19,8 +19,6 @@ class UnityGymException(error.Error):
|
|||
|
||||
|
||||
logger = logging_util.get_logger(__name__)
|
||||
logging_util.set_log_level(logging_util.INFO)
|
||||
|
||||
GymStepResult = Tuple[np.ndarray, float, bool, Dict]
|
||||
|
||||
|
||||
|
@ -58,7 +56,7 @@ class UnityToGymWrapper(gym.Env):
|
|||
self.visual_obs = None
|
||||
|
||||
# Save the step result from the last time all Agents requested decisions.
|
||||
self._previous_decision_step: DecisionSteps = None
|
||||
self._previous_decision_step: Optional[DecisionSteps] = None
|
||||
self._flattener = None
|
||||
# Hidden flag used by Atari environments to determine if the game is over
|
||||
self.game_over = False
|
||||
|
@ -355,7 +353,7 @@ class ActionFlattener:
|
|||
def lookup_action(self, action):
|
||||
"""
|
||||
Convert a scalar discrete action into a unique set of branched actions.
|
||||
:param: action: A scalar value representing one of the discrete actions.
|
||||
:return: The List containing the branched actions.
|
||||
:param action: A scalar value representing one of the discrete actions.
|
||||
:returns: The List containing the branched actions.
|
||||
"""
|
||||
return self.action_lookup[action]
|
|
@ -0,0 +1,53 @@
|
|||
from typing import Optional, Dict, Any, Tuple
|
||||
from gym import error
|
||||
from mlagents_envs.base_env import BaseEnv
|
||||
from pettingzoo import ParallelEnv
|
||||
|
||||
from mlagents_envs.envs.unity_pettingzoo_base_env import UnityPettingzooBaseEnv
|
||||
|
||||
|
||||
class UnityParallelEnv(UnityPettingzooBaseEnv, ParallelEnv):
|
||||
"""
|
||||
Unity Parallel (PettingZoo) environment wrapper.
|
||||
"""
|
||||
|
||||
def __init__(self, env: BaseEnv, seed: Optional[int] = None):
|
||||
"""
|
||||
Initializes a Unity Parallel environment wrapper.
|
||||
|
||||
:param env: The UnityEnvironment that is being wrapped.
|
||||
:param seed: The seed for the action spaces of the agents.
|
||||
"""
|
||||
super().__init__(env, seed)
|
||||
|
||||
def reset(self) -> Dict[str, Any]:
|
||||
"""
|
||||
Resets the environment.
|
||||
"""
|
||||
super().reset()
|
||||
|
||||
return self._observations
|
||||
|
||||
def step(self, actions: Dict[str, Any]) -> Tuple:
|
||||
self._assert_loaded()
|
||||
if len(self._live_agents) <= 0 and actions:
|
||||
raise error.Error(
|
||||
"You must reset the environment before you can perform a step."
|
||||
)
|
||||
|
||||
# Process actions
|
||||
for current_agent, action in actions.items():
|
||||
self._process_action(current_agent, action)
|
||||
|
||||
# Reset reward
|
||||
for k in self._rewards.keys():
|
||||
self._rewards[k] = 0
|
||||
|
||||
# Step environment
|
||||
self._step()
|
||||
|
||||
# Agent cleanup and sorting
|
||||
self._cleanup_agents()
|
||||
self._live_agents.sort() # unnecessary, only for passing API test
|
||||
|
||||
return self._observations, self._rewards, self._dones, self._infos
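
# Minimal parallel-API sketch (hedged); `unity_env` is any loaded BaseEnv, e.g.
# one created from the default registry:
#
#   env = UnityParallelEnv(unity_env)
#   observations = env.reset()
#   while env.agents:
#       actions = {agent: env.action_space(agent).sample() for agent in env.agents}
#       observations, rewards, dones, infos = env.step(actions)
#   env.close()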
|
|
@ -0,0 +1,317 @@
|
|||
import atexit
|
||||
from typing import Optional, List, Set, Dict, Any, Tuple
|
||||
import numpy as np
|
||||
from gym import error, spaces
|
||||
from mlagents_envs.base_env import BaseEnv, ActionTuple
|
||||
from mlagents_envs.envs.env_helpers import _agent_id_to_behavior, _unwrap_batch_steps
|
||||
|
||||
|
||||
class UnityPettingzooBaseEnv:
|
||||
"""
|
||||
Unity Petting Zoo base environment.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self, env: BaseEnv, seed: Optional[int] = None, metadata: Optional[dict] = None
|
||||
):
|
||||
super().__init__()
|
||||
atexit.register(self.close)
|
||||
self._env = env
|
||||
self.metadata = metadata
|
||||
self._assert_loaded()
|
||||
|
||||
self._agent_index = 0
|
||||
self._seed = seed
|
||||
self._side_channel_dict = {
|
||||
type(v).__name__: v
|
||||
for v in self._env._side_channel_manager._side_channels_dict.values() # type: ignore
|
||||
}
|
||||
|
||||
self._live_agents: List[str] = [] # agent id for agents alive
|
||||
self._agents: List[str] = [] # all agent id in current step
|
||||
self._possible_agents: Set[str] = set()  # all agents that have ever appeared
|
||||
self._agent_id_to_index: Dict[str, int] = {} # agent_id: index in decision step
|
||||
self._observations: Dict[str, np.ndarray] = {} # agent_id: obs
|
||||
self._dones: Dict[str, bool] = {} # agent_id: done
|
||||
self._rewards: Dict[str, float] = {} # agent_id: reward
|
||||
self._cumm_rewards: Dict[str, float] = {} # agent_id: reward
|
||||
self._infos: Dict[str, Dict] = {} # agent_id: info
|
||||
self._action_spaces: Dict[str, spaces.Space] = {} # behavior_name: action_space
|
||||
self._observation_spaces: Dict[
|
||||
str, spaces.Space
|
||||
] = {} # behavior_name: obs_space
|
||||
self._current_action: Dict[str, ActionTuple] = {} # behavior_name: ActionTuple
|
||||
# Take a single step so that the brain information will be sent over
|
||||
if not self._env.behavior_specs:
|
||||
self._env.step()
|
||||
for behavior_name in self._env.behavior_specs.keys():
|
||||
_, _, _ = self._batch_update(behavior_name)
|
||||
self._update_observation_spaces()
|
||||
self._update_action_spaces()
|
||||
|
||||
def _assert_loaded(self) -> None:
|
||||
if self._env is None:
|
||||
raise error.Error("No environment loaded")
|
||||
|
||||
@property
|
||||
def observation_spaces(self) -> Dict[str, spaces.Space]:
|
||||
"""
|
||||
Return the observation spaces of all the agents.
|
||||
"""
|
||||
return {
|
||||
agent_id: self._observation_spaces[_agent_id_to_behavior(agent_id)]
|
||||
for agent_id in self._possible_agents
|
||||
}
|
||||
|
||||
def observation_space(self, agent: str) -> Optional[spaces.Space]:
|
||||
"""
|
||||
The observation space of the given agent.
|
||||
"""
|
||||
behavior_name = _agent_id_to_behavior(agent)
|
||||
return self._observation_spaces[behavior_name]
|
||||
|
||||
def _update_observation_spaces(self) -> None:
|
||||
self._assert_loaded()
|
||||
for behavior_name in self._env.behavior_specs.keys():
|
||||
if behavior_name not in self._observation_spaces:
|
||||
obs_spec = self._env.behavior_specs[behavior_name].observation_specs
|
||||
obs_spaces = tuple(
|
||||
spaces.Box(
|
||||
low=-np.float32(np.inf),
|
||||
high=np.float32(np.inf),
|
||||
shape=spec.shape,
|
||||
dtype=np.float32,
|
||||
)
|
||||
for spec in obs_spec
|
||||
)
|
||||
if len(obs_spaces) == 1:
|
||||
self._observation_spaces[behavior_name] = obs_spaces[0]
|
||||
else:
|
||||
self._observation_spaces[behavior_name] = spaces.Tuple(obs_spaces)
|
||||
|
||||
@property
|
||||
def action_spaces(self) -> Dict[str, spaces.Space]:
|
||||
"""
|
||||
Return the action spaces of all the agents.
|
||||
"""
|
||||
return {
|
||||
agent_id: self._action_spaces[_agent_id_to_behavior(agent_id)]
|
||||
for agent_id in self._possible_agents
|
||||
}
|
||||
|
||||
def action_space(self, agent: str) -> Optional[spaces.Space]:
|
||||
"""
|
||||
The action space of the given agent.
|
||||
"""
|
||||
behavior_name = _agent_id_to_behavior(agent)
|
||||
return self._action_spaces[behavior_name]
|
||||
|
||||
def _update_action_spaces(self) -> None:
|
||||
self._assert_loaded()
|
||||
for behavior_name in self._env.behavior_specs.keys():
|
||||
if behavior_name not in self._action_spaces:
|
||||
act_spec = self._env.behavior_specs[behavior_name].action_spec
|
||||
if (
|
||||
act_spec.continuous_size == 0
|
||||
and len(act_spec.discrete_branches) == 0
|
||||
):
|
||||
raise error.Error("No actions found")
|
||||
if act_spec.discrete_size == 1:
|
||||
d_space = spaces.Discrete(act_spec.discrete_branches[0])
|
||||
if self._seed is not None:
|
||||
d_space.seed(self._seed)
|
||||
if act_spec.continuous_size == 0:
|
||||
self._action_spaces[behavior_name] = d_space
|
||||
continue
|
||||
if act_spec.discrete_size > 0:
|
||||
d_space = spaces.MultiDiscrete(act_spec.discrete_branches)
|
||||
if self._seed is not None:
|
||||
d_space.seed(self._seed)
|
||||
if act_spec.continuous_size == 0:
|
||||
self._action_spaces[behavior_name] = d_space
|
||||
continue
|
||||
if act_spec.continuous_size > 0:
|
||||
c_space = spaces.Box(
|
||||
-1, 1, (act_spec.continuous_size,), dtype=np.int32
|
||||
)
|
||||
if self._seed is not None:
|
||||
c_space.seed(self._seed)
|
||||
if len(act_spec.discrete_branches) == 0:
|
||||
self._action_spaces[behavior_name] = c_space
|
||||
continue
|
||||
self._action_spaces[behavior_name] = spaces.Tuple((c_space, d_space))
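
# Resulting mapping, as a sketch (one entry per behavior):
#   continuous only            -> spaces.Box(-1, 1, (continuous_size,))
#   single discrete branch     -> spaces.Discrete(branch_size)
#   multiple discrete branches -> spaces.MultiDiscrete(branches)
#   hybrid (both)              -> spaces.Tuple((Box, Discrete or MultiDiscrete))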
|
||||
|
||||
def _process_action(self, current_agent, action):
|
||||
current_action_space = self.action_space(current_agent)
|
||||
# Convert actions
|
||||
if action is not None:
|
||||
if isinstance(action, Tuple):
|
||||
action = tuple(np.array(a) for a in action)
|
||||
else:
|
||||
action = self._action_to_np(current_action_space, action)
|
||||
if not current_action_space.contains(action): # type: ignore
|
||||
raise error.Error(
|
||||
f"Invalid action, got {action} but was expecting action from {self.action_space}"
|
||||
)
|
||||
if isinstance(current_action_space, spaces.Tuple):
|
||||
action = ActionTuple(action[0], action[1])
|
||||
elif isinstance(current_action_space, spaces.MultiDiscrete):
|
||||
action = ActionTuple(None, action)
|
||||
elif isinstance(current_action_space, spaces.Discrete):
|
||||
action = ActionTuple(None, np.array(action).reshape(1, 1))
|
||||
else:
|
||||
action = ActionTuple(action, None)
|
||||
|
||||
if not self._dones[current_agent]:
|
||||
current_behavior = _agent_id_to_behavior(current_agent)
|
||||
current_index = self._agent_id_to_index[current_agent]
|
||||
if action.continuous is not None:
|
||||
self._current_action[current_behavior].continuous[
|
||||
current_index
|
||||
] = action.continuous[0]
|
||||
if action.discrete is not None:
|
||||
self._current_action[current_behavior].discrete[
|
||||
current_index
|
||||
] = action.discrete[0]
|
||||
else:
|
||||
self._live_agents.remove(current_agent)
|
||||
del self._observations[current_agent]
|
||||
del self._dones[current_agent]
|
||||
del self._rewards[current_agent]
|
||||
del self._cumm_rewards[current_agent]
|
||||
del self._infos[current_agent]
|
||||
|
||||
def _step(self):
|
||||
for behavior_name, actions in self._current_action.items():
|
||||
self._env.set_actions(behavior_name, actions)
|
||||
self._env.step()
|
||||
self._reset_states()
|
||||
for behavior_name in self._env.behavior_specs.keys():
|
||||
dones, rewards, cumulative_rewards = self._batch_update(behavior_name)
|
||||
self._dones.update(dones)
|
||||
self._rewards.update(rewards)
|
||||
self._cumm_rewards.update(cumulative_rewards)
|
||||
self._agent_index = 0
|
||||
|
||||
def _cleanup_agents(self):
|
||||
for current_agent, done in self.dones.items():
|
||||
if done:
|
||||
self._live_agents.remove(current_agent)
|
||||
|
||||
@property
|
||||
def side_channel(self) -> Dict[str, Any]:
|
||||
"""
|
||||
The side channels of the environment. You can access the side channels
|
||||
of an environment with `env.side_channel[<name-of-channel>]`.
|
||||
"""
|
||||
self._assert_loaded()
|
||||
return self._side_channel_dict
|
||||
|
||||
@staticmethod
|
||||
def _action_to_np(current_action_space, action):
|
||||
return np.array(action, dtype=current_action_space.dtype)
|
||||
|
||||
def _create_empty_actions(self, behavior_name, num_agents):
|
||||
a_spec = self._env.behavior_specs[behavior_name].action_spec
|
||||
return ActionTuple(
|
||||
np.zeros((num_agents, a_spec.continuous_size), dtype=np.float32),
|
||||
np.zeros((num_agents, len(a_spec.discrete_branches)), dtype=np.int32),
|
||||
)
|
||||
|
||||
@property
|
||||
def _cumulative_rewards(self):
|
||||
return self._cumm_rewards
|
||||
|
||||
def _reset_states(self):
|
||||
self._live_agents = []
|
||||
self._agents = []
|
||||
self._observations = {}
|
||||
self._dones = {}
|
||||
self._rewards = {}
|
||||
self._cumm_rewards = {}
|
||||
self._infos = {}
|
||||
self._agent_id_to_index = {}
|
||||
|
||||
def reset(self):
|
||||
"""
|
||||
Resets the environment.
|
||||
"""
|
||||
self._assert_loaded()
|
||||
self._agent_index = 0
|
||||
self._reset_states()
|
||||
self._possible_agents = set()
|
||||
self._env.reset()
|
||||
for behavior_name in self._env.behavior_specs.keys():
|
||||
_, _, _ = self._batch_update(behavior_name)
|
||||
self._live_agents.sort() # unnecessary, only for passing API test
|
||||
self._dones = {agent: False for agent in self._agents}
|
||||
self._rewards = {agent: 0 for agent in self._agents}
|
||||
self._cumm_rewards = {agent: 0 for agent in self._agents}
|
||||
|
||||
def _batch_update(self, behavior_name):
|
||||
current_batch = self._env.get_steps(behavior_name)
|
||||
self._current_action[behavior_name] = self._create_empty_actions(
|
||||
behavior_name, len(current_batch[0])
|
||||
)
|
||||
agents, obs, dones, rewards, cumulative_rewards, infos, id_map = _unwrap_batch_steps(
|
||||
current_batch, behavior_name
|
||||
)
|
||||
self._live_agents += agents
|
||||
self._agents += agents
|
||||
self._observations.update(obs)
|
||||
self._infos.update(infos)
|
||||
self._agent_id_to_index.update(id_map)
|
||||
self._possible_agents.update(agents)
|
||||
return dones, rewards, cumulative_rewards
|
||||
|
||||
def seed(self, seed=None):
|
||||
"""
|
||||
Reseeds the environment (making the resulting environment deterministic).
|
||||
`reset()` must be called after `seed()`, and before `step()`.
|
||||
"""
|
||||
self._seed = seed
|
||||
|
||||
def render(self, mode="human"):
|
||||
"""
|
||||
NOT SUPPORTED.
|
||||
|
||||
Displays a rendered frame from the environment, if supported.
|
||||
Alternate render modes in the default environments are `'rgb_array'`
|
||||
which returns a numpy array and is supported by all environments outside of classic,
|
||||
and `'ansi'` which returns the strings printed (specific to classic environments).
|
||||
"""
|
||||
pass
|
||||
|
||||
@property
|
||||
def dones(self):
|
||||
return dict(self._dones)
|
||||
|
||||
@property
|
||||
def agents(self):
|
||||
return sorted(self._live_agents)
|
||||
|
||||
@property
|
||||
def rewards(self):
|
||||
return dict(self._rewards)
|
||||
|
||||
@property
|
||||
def infos(self):
|
||||
return dict(self._infos)
|
||||
|
||||
@property
|
||||
def possible_agents(self):
|
||||
return sorted(self._possible_agents)
|
||||
|
||||
def close(self) -> None:
|
||||
"""
|
||||
Close the environment.
|
||||
"""
|
||||
if self._env is not None:
|
||||
self._env.close()
|
||||
self._env = None # type: ignore
|
||||
|
||||
def __del__(self) -> None:
|
||||
self.close()
|
||||
|
||||
def state(self):
|
||||
pass
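
# Inspection sketch (hedged) on a constructed wrapper; the side_channel key is
# the class name of a channel that was actually passed to the environment:
#
#   env = UnityPettingzooBaseEnv(unity_env)
#   print(env.possible_agents)        # sorted agent ids seen so far
#   print(env.observation_spaces)     # dict keyed by agent id
#   engine = env.side_channel["EngineConfigurationChannel"]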
|
|
@ -2,7 +2,18 @@
|
|||
folder: docs
|
||||
modules:
|
||||
- name: mlagents_envs
|
||||
file_name: Python-API-Documentation.md
|
||||
file_name: Python-Gym-API-Documentation.md
|
||||
submodules:
|
||||
- envs.unity_gym_env
|
||||
- name: mlagents_envs
|
||||
file_name: Python-PettingZoo-API-Documentation.md
|
||||
submodules:
|
||||
- envs.pettingzoo_env_factory
|
||||
- envs.unity_aec_env
|
||||
- envs.unity_parallel_env
|
||||
- envs.unity_pettingzoo_base_env
|
||||
- name: mlagents_envs
|
||||
file_name: Python-LLAPI-Documentation.md
|
||||
submodules:
|
||||
- base_env
|
||||
- environment
|
||||
@ -43,7 +43,9 @@ setup(
|
|||
"Programming Language :: Python :: 3.7",
|
||||
"Programming Language :: Python :: 3.8",
|
||||
],
|
||||
packages=find_packages(exclude=["*.tests", "*.tests.*", "tests.*", "tests"]),
|
||||
packages=find_packages(
|
||||
exclude=["*.tests", "*.tests.*", "tests.*", "tests", "colabs", "*.ipynb"]
|
||||
),
|
||||
zip_safe=False,
|
||||
install_requires=[
|
||||
"cloudpickle",
|
||||
|
@ -52,6 +54,9 @@ setup(
|
|||
"Pillow>=4.2.1",
|
||||
"protobuf>=3.6",
|
||||
"pyyaml>=3.1.0",
|
||||
"gym==0.21.0",
|
||||
"pettingzoo==1.14.0",
|
||||
"numpy==1.21.2",
|
||||
],
|
||||
python_requires=">=3.7.2",
|
||||
cmdclass={"verify": VerifyVersionCommand},
|
||||
@ -0,0 +1,111 @@
|
|||
from typing import List, Tuple
|
||||
from mlagents_envs.base_env import ObservationSpec, DimensionProperty, ObservationType
|
||||
import pytest
|
||||
import copy
|
||||
import os
|
||||
from mlagents.trainers.settings import (
|
||||
POCASettings,
|
||||
TrainerSettings,
|
||||
PPOSettings,
|
||||
SACSettings,
|
||||
GAILSettings,
|
||||
CuriositySettings,
|
||||
RewardSignalSettings,
|
||||
NetworkSettings,
|
||||
TrainerType,
|
||||
RewardSignalType,
|
||||
ScheduleType,
|
||||
)
|
||||
|
||||
CONTINUOUS_DEMO_PATH = os.path.dirname(os.path.abspath(__file__)) + "/test.demo"
|
||||
DISCRETE_DEMO_PATH = os.path.dirname(os.path.abspath(__file__)) + "/testdcvis.demo"
|
||||
|
||||
_PPO_CONFIG = TrainerSettings(
|
||||
trainer_type=TrainerType.PPO,
|
||||
hyperparameters=PPOSettings(
|
||||
learning_rate=5.0e-3,
|
||||
learning_rate_schedule=ScheduleType.CONSTANT,
|
||||
batch_size=16,
|
||||
buffer_size=64,
|
||||
),
|
||||
network_settings=NetworkSettings(num_layers=1, hidden_units=32),
|
||||
summary_freq=500,
|
||||
max_steps=3000,
|
||||
threaded=False,
|
||||
)
|
||||
|
||||
_SAC_CONFIG = TrainerSettings(
|
||||
trainer_type=TrainerType.SAC,
|
||||
hyperparameters=SACSettings(
|
||||
learning_rate=5.0e-3,
|
||||
learning_rate_schedule=ScheduleType.CONSTANT,
|
||||
batch_size=8,
|
||||
buffer_init_steps=100,
|
||||
buffer_size=5000,
|
||||
tau=0.01,
|
||||
init_entcoef=0.01,
|
||||
),
|
||||
network_settings=NetworkSettings(num_layers=1, hidden_units=16),
|
||||
summary_freq=100,
|
||||
max_steps=1000,
|
||||
threaded=False,
|
||||
)
|
||||
|
||||
_POCA_CONFIG = TrainerSettings(
|
||||
trainer_type=TrainerType.POCA,
|
||||
hyperparameters=POCASettings(
|
||||
learning_rate=5.0e-3,
|
||||
learning_rate_schedule=ScheduleType.CONSTANT,
|
||||
batch_size=16,
|
||||
buffer_size=64,
|
||||
),
|
||||
network_settings=NetworkSettings(num_layers=1, hidden_units=32),
|
||||
summary_freq=500,
|
||||
max_steps=3000,
|
||||
threaded=False,
|
||||
)
|
||||
|
||||
|
||||
def ppo_dummy_config():
|
||||
return copy.deepcopy(_PPO_CONFIG)
|
||||
|
||||
|
||||
def sac_dummy_config():
|
||||
return copy.deepcopy(_SAC_CONFIG)
|
||||
|
||||
|
||||
def poca_dummy_config():
|
||||
return copy.deepcopy(_POCA_CONFIG)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def gail_dummy_config():
|
||||
return {RewardSignalType.GAIL: GAILSettings(demo_path=CONTINUOUS_DEMO_PATH)}
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def curiosity_dummy_config():
|
||||
return {RewardSignalType.CURIOSITY: CuriositySettings()}
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def extrinsic_dummy_config():
|
||||
return {RewardSignalType.EXTRINSIC: RewardSignalSettings()}
|
||||
|
||||
|
||||
def create_observation_specs_with_shapes(
|
||||
shapes: List[Tuple[int, ...]]
|
||||
) -> List[ObservationSpec]:
|
||||
obs_specs: List[ObservationSpec] = []
|
||||
for i, shape in enumerate(shapes):
|
||||
dim_prop = (DimensionProperty.UNSPECIFIED,) * len(shape)
|
||||
if len(shape) == 2:
|
||||
dim_prop = (DimensionProperty.VARIABLE_SIZE, DimensionProperty.NONE)
|
||||
spec = ObservationSpec(
|
||||
name=f"observation {i} with shape {shape}",
|
||||
shape=shape,
|
||||
dimension_property=dim_prop,
|
||||
observation_type=ObservationType.DEFAULT,
|
||||
)
|
||||
obs_specs.append(spec)
|
||||
return obs_specs
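
# Example (sketch): a rank-1 vector observation plus a rank-2 variable-length one
#
#   specs = create_observation_specs_with_shapes([(8,), (10, 5)])
#   assert specs[0].shape == (8,)
#   assert specs[1].dimension_property == (
#       DimensionProperty.VARIABLE_SIZE,
#       DimensionProperty.NONE,
#   )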
|
|
@ -0,0 +1,510 @@
|
|||
"""
|
||||
Copied from ml-agents/mlagents/trainers/tests/simple_test_envs.py
|
||||
|
||||
Modified so that the env doesn't automatically reset and respawn agents, in order to pass the
PettingZoo API tests, since the current PZ API test doesn't allow spawning new agents.
|
||||
"""
|
||||
|
||||
import random
|
||||
from typing import Dict, List, Any, Tuple
|
||||
import numpy as np
|
||||
|
||||
from mlagents_envs.base_env import (
|
||||
ActionSpec,
|
||||
ObservationSpec,
|
||||
ObservationType,
|
||||
ActionTuple,
|
||||
BaseEnv,
|
||||
BehaviorSpec,
|
||||
DecisionSteps,
|
||||
TerminalSteps,
|
||||
BehaviorMapping,
|
||||
)
|
||||
from mlagents_envs.side_channel.side_channel_manager import SideChannelManager
|
||||
from dummy_config import create_observation_specs_with_shapes
|
||||
|
||||
OBS_SIZE = 1
|
||||
VIS_OBS_SIZE = (20, 20, 3)
|
||||
VAR_LEN_SIZE = (10, 5)
|
||||
STEP_SIZE = 0.2
|
||||
|
||||
TIME_PENALTY = 0.01
|
||||
MIN_STEPS = int(1.0 / STEP_SIZE) + 1
|
||||
SUCCESS_REWARD = 1.0 + MIN_STEPS * TIME_PENALTY
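# With the defaults above: MIN_STEPS = int(1.0 / 0.2) + 1 = 6 and
# SUCCESS_REWARD = 1.0 + 6 * 0.01 = 1.06.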
|
||||
|
||||
|
||||
def clamp(x, min_val, max_val):
|
||||
return max(min_val, min(x, max_val))
|
||||
|
||||
|
||||
class SimpleEnvironment(BaseEnv):
|
||||
"""
|
||||
Very simple "game" - the agent has a position on [-1, 1], gets a reward of 1 if it reaches 1, and a reward of -1 if
|
||||
it reaches -1. The position is incremented by the action amount (clamped to [-step_size, step_size]).
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
brain_names,
|
||||
step_size=STEP_SIZE,
|
||||
num_visual=0,
|
||||
num_vector=1,
|
||||
num_var_len=0,
|
||||
vis_obs_size=VIS_OBS_SIZE,
|
||||
vec_obs_size=OBS_SIZE,
|
||||
var_len_obs_size=VAR_LEN_SIZE,
|
||||
action_sizes=(1, 0),
|
||||
goal_indices=None,
|
||||
):
|
||||
super().__init__()
|
||||
self.num_visual = num_visual
|
||||
self.num_vector = num_vector
|
||||
self.num_var_len = num_var_len
|
||||
self.vis_obs_size = vis_obs_size
|
||||
self.vec_obs_size = vec_obs_size
|
||||
self.var_len_obs_size = var_len_obs_size
|
||||
self.goal_indices = goal_indices
|
||||
continuous_action_size, discrete_action_size = action_sizes
|
||||
discrete_tuple = tuple(2 for _ in range(discrete_action_size))
|
||||
action_spec = ActionSpec(continuous_action_size, discrete_tuple)
|
||||
self.total_action_size = (
|
||||
continuous_action_size + discrete_action_size
|
||||
) # to set the goals/positions
|
||||
self.action_spec = action_spec
|
||||
self.behavior_spec = BehaviorSpec(self._make_observation_specs(), action_spec)
|
||||
self.action_spec = action_spec
|
||||
self.names = brain_names
|
||||
self.positions: Dict[str, List[float]] = {}
|
||||
self.step_count: Dict[str, float] = {}
|
||||
self._side_channel_manager = SideChannelManager([])
|
||||
|
||||
# Concatenate the arguments for a consistent random seed
|
||||
seed = (
|
||||
brain_names,
|
||||
step_size,
|
||||
num_visual,
|
||||
num_vector,
|
||||
num_var_len,
|
||||
vis_obs_size,
|
||||
vec_obs_size,
|
||||
var_len_obs_size,
|
||||
action_sizes,
|
||||
)
|
||||
self.random = random.Random(str(seed))
|
||||
|
||||
self.goal: Dict[str, int] = {}
|
||||
self.action = {}
|
||||
self.rewards: Dict[str, float] = {}
|
||||
self.final_rewards: Dict[str, List[float]] = {}
|
||||
self.step_result: Dict[str, Tuple[DecisionSteps, TerminalSteps]] = {}
|
||||
self.agent_id: Dict[str, int] = {}
|
||||
self.step_size = step_size # defines the difficulty of the test
|
||||
# Allows this to be used as a UnityEnvironment during tests
|
||||
self.academy_capabilities = None
|
||||
|
||||
for name in self.names:
|
||||
self.agent_id[name] = 0
|
||||
self.goal[name] = self.random.choice([-1, 1])
|
||||
self.rewards[name] = 0
|
||||
self.final_rewards[name] = []
|
||||
self._reset_agent(name)
|
||||
self.action[name] = None
|
||||
self.step_result[name] = None
|
||||
|
||||
def _make_observation_specs(self) -> List[ObservationSpec]:
|
||||
obs_shape: List[Any] = []
|
||||
for _ in range(self.num_vector):
|
||||
obs_shape.append((self.vec_obs_size,))
|
||||
for _ in range(self.num_visual):
|
||||
obs_shape.append(self.vis_obs_size)
|
||||
for _ in range(self.num_var_len):
|
||||
obs_shape.append(self.var_len_obs_size)
|
||||
obs_spec = create_observation_specs_with_shapes(obs_shape)
|
||||
if self.goal_indices is not None:
|
||||
for i in range(len(obs_spec)):
|
||||
if i in self.goal_indices:
|
||||
obs_spec[i] = ObservationSpec(
|
||||
shape=obs_spec[i].shape,
|
||||
dimension_property=obs_spec[i].dimension_property,
|
||||
observation_type=ObservationType.GOAL_SIGNAL,
|
||||
name=obs_spec[i].name,
|
||||
)
|
||||
return obs_spec
|
||||
|
||||
def _make_obs(self, value: float) -> List[np.ndarray]:
|
||||
obs = []
|
||||
for _ in range(self.num_vector):
|
||||
obs.append(np.ones((1, self.vec_obs_size), dtype=np.float32) * value)
|
||||
for _ in range(self.num_visual):
|
||||
obs.append(np.ones((1,) + self.vis_obs_size, dtype=np.float32) * value)
|
||||
for _ in range(self.num_var_len):
|
||||
obs.append(np.ones((1,) + self.var_len_obs_size, dtype=np.float32) * value)
|
||||
return obs
|
||||
|
||||
@property
|
||||
def behavior_specs(self):
|
||||
behavior_dict = {}
|
||||
for n in self.names:
|
||||
behavior_dict[n] = self.behavior_spec
|
||||
return BehaviorMapping(behavior_dict)
|
||||
|
||||
def set_action_for_agent(self, behavior_name, agent_id, action):
|
||||
pass
|
||||
|
||||
def set_actions(self, behavior_name, action):
|
||||
self.action[behavior_name] = action
|
||||
|
||||
def get_steps(self, behavior_name):
|
||||
return self.step_result[behavior_name]
|
||||
|
||||
def _take_action(self, name: str) -> bool:
|
||||
deltas = []
|
||||
_act = self.action[name]
|
||||
if self.action_spec.continuous_size > 0 and not _act:
|
||||
for _cont in _act.continuous[0]:
|
||||
deltas.append(_cont)
|
||||
if self.action_spec.discrete_size > 0 and not _act:
|
||||
for _disc in _act.discrete[0]:
|
||||
deltas.append(1 if _disc else -1)
|
||||
for i, _delta in enumerate(deltas):
|
||||
_delta = clamp(_delta, -self.step_size, self.step_size)
|
||||
self.positions[name][i] += _delta
|
||||
self.positions[name][i] = clamp(self.positions[name][i], -1, 1)
|
||||
self.step_count[name] += 1
|
||||
# Every position must reach +1.0 or -1.0 to be done
|
||||
done = all(pos >= 1.0 or pos <= -1.0 for pos in self.positions[name])
|
||||
return done
|
||||
|
||||
def _generate_mask(self):
|
||||
action_mask = None
|
||||
if self.action_spec.discrete_size > 0:
|
||||
# LL-Python API will return an empty dim if there is only 1 agent.
|
||||
ndmask = np.array(
|
||||
2 * self.action_spec.discrete_size * [False], dtype=np.bool
|
||||
)
|
||||
ndmask = np.expand_dims(ndmask, axis=0)
|
||||
action_mask = [ndmask]
|
||||
return action_mask
|
||||
|
||||
def _compute_reward(self, name: str, done: bool) -> float:
|
||||
if done:
|
||||
reward = 0.0
|
||||
for _pos in self.positions[name]:
|
||||
reward += (SUCCESS_REWARD * _pos * self.goal[name]) / len(
|
||||
self.positions[name]
|
||||
)
|
||||
else:
|
||||
reward = -TIME_PENALTY
|
||||
return reward
|
||||
|
||||
def _reset_agent(self, name):
|
||||
self.goal[name] = self.random.choice([-1, 1])
|
||||
self.positions[name] = [0.0 for _ in range(self.total_action_size)]
|
||||
self.step_count[name] = 0
|
||||
self.rewards[name] = 0
|
||||
self.agent_id[name] = self.agent_id[name] + 1
|
||||
|
||||
def _make_batched_step(
|
||||
self, name: str, done: bool, reward: float, group_reward: float
|
||||
) -> Tuple[DecisionSteps, TerminalSteps]:
|
||||
m_vector_obs = self._make_obs(self.goal[name])
|
||||
m_reward = np.array([reward], dtype=np.float32)
|
||||
m_agent_id = np.array([self.agent_id[name]], dtype=np.int32)
|
||||
m_group_id = np.array([0], dtype=np.int32)
|
||||
m_group_reward = np.array([group_reward], dtype=np.float32)
|
||||
action_mask = self._generate_mask()
|
||||
decision_step = DecisionSteps(
|
||||
m_vector_obs, m_reward, m_agent_id, action_mask, m_group_id, m_group_reward
|
||||
)
|
||||
terminal_step = TerminalSteps.empty(self.behavior_spec)
|
||||
if done:
|
||||
self.final_rewards[name].append(self.rewards[name])
|
||||
# self._reset_agent(name)
|
||||
# new_vector_obs = self._make_obs(self.goal[name])
|
||||
# (
|
||||
# new_reward,
|
||||
# new_done,
|
||||
# new_agent_id,
|
||||
# new_action_mask,
|
||||
# new_group_id,
|
||||
# new_group_reward,
|
||||
# ) = self._construct_reset_step(name)
|
||||
|
||||
# decision_step = DecisionSteps(
|
||||
# new_vector_obs,
|
||||
# new_reward,
|
||||
# new_agent_id,
|
||||
# new_action_mask,
|
||||
# new_group_id,
|
||||
# new_group_reward,
|
||||
# )
|
||||
decision_step = DecisionSteps([], [], [], [], [], [])
|
||||
terminal_step = TerminalSteps(
|
||||
m_vector_obs,
|
||||
m_reward,
|
||||
np.array([False], dtype=bool),
|
||||
m_agent_id,
|
||||
m_group_id,
|
||||
m_group_reward,
|
||||
)
|
||||
return (decision_step, terminal_step)
|
||||
|
||||
def _construct_reset_step(
|
||||
self, name: str
|
||||
) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
|
||||
new_reward = np.array([0.0], dtype=np.float32)
|
||||
new_done = np.array([False], dtype=np.bool)
|
||||
new_agent_id = np.array([self.agent_id[name]], dtype=np.int32)
|
||||
new_action_mask = self._generate_mask()
|
||||
new_group_id = np.array([0], dtype=np.int32)
|
||||
new_group_reward = np.array([0.0], dtype=np.float32)
|
||||
return (
|
||||
new_reward,
|
||||
new_done,
|
||||
new_agent_id,
|
||||
new_action_mask,
|
||||
new_group_id,
|
||||
new_group_reward,
|
||||
)
|
||||
|
||||
def step(self) -> None:
|
||||
assert all(action is not None for action in self.action.values())
|
||||
for name in self.names:
|
||||
|
||||
done = self._take_action(name)
|
||||
reward = self._compute_reward(name, done)
|
||||
self.rewards[name] += reward
|
||||
self.step_result[name] = self._make_batched_step(name, done, reward, 0.0)
|
||||
|
||||
def reset(self) -> None: # type: ignore
|
||||
for name in self.names:
|
||||
self._reset_agent(name)
|
||||
self.step_result[name] = self._make_batched_step(name, False, 0.0, 0.0)
|
||||
|
||||
@property
|
||||
def reset_parameters(self) -> Dict[str, str]:
|
||||
return {}
|
||||
|
||||
def close(self):
|
||||
pass
|
||||
|
||||
|
||||
class MultiAgentEnvironment(BaseEnv):
|
||||
"""
|
||||
The MultiAgentEnvironment maintains a list of SimpleEnvironment, one for each agent.
|
||||
When sending DecisionSteps and TerminalSteps to the trainers, it first batches the
|
||||
decision steps from the individual environments. When setting actions, it indexes the
|
||||
batched ActionTuple to obtain the ActionTuple for individual agents.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
brain_names,
|
||||
step_size=STEP_SIZE,
|
||||
num_visual=0,
|
||||
num_vector=1,
|
||||
num_var_len=0,
|
||||
vis_obs_size=VIS_OBS_SIZE,
|
||||
vec_obs_size=OBS_SIZE,
|
||||
var_len_obs_size=VAR_LEN_SIZE,
|
||||
action_sizes=(1, 0),
|
||||
num_agents=2,
|
||||
goal_indices=None,
|
||||
):
|
||||
super().__init__()
|
||||
self.envs = {}
|
||||
self.dones = {}
|
||||
self.just_died = set()
|
||||
self.names = brain_names
|
||||
self.final_rewards: Dict[str, List[float]] = {}
|
||||
for name in brain_names:
|
||||
self.final_rewards[name] = []
|
||||
for i in range(num_agents):
|
||||
name_and_num = name + str(i)
|
||||
self.envs[name_and_num] = SimpleEnvironment(
|
||||
[name],
|
||||
step_size,
|
||||
num_visual,
|
||||
num_vector,
|
||||
num_var_len,
|
||||
vis_obs_size,
|
||||
vec_obs_size,
|
||||
var_len_obs_size,
|
||||
action_sizes,
|
||||
goal_indices,
|
||||
)
|
||||
self.dones[name_and_num] = False
|
||||
self.envs[name_and_num].reset()
|
||||
# All envs have the same behavior spec, so just get the last one.
|
||||
self.behavior_spec = self.envs[name_and_num].behavior_spec
|
||||
self.action_spec = self.envs[name_and_num].action_spec
|
||||
self.num_agents = num_agents
|
||||
self._side_channel_manager = SideChannelManager([])
|
||||
|
||||
@property
|
||||
def all_done(self):
|
||||
return all(self.dones.values())
|
||||
|
||||
@property
|
||||
def behavior_specs(self):
|
||||
behavior_dict = {}
|
||||
for n in self.names:
|
||||
behavior_dict[n] = self.behavior_spec
|
||||
return BehaviorMapping(behavior_dict)
|
||||
|
||||
def set_action_for_agent(self, behavior_name, agent_id, action):
|
||||
pass
|
||||
|
||||
def set_actions(self, behavior_name, action):
|
||||
# The ActionTuple contains the actions for all n_agents. This
|
||||
# slices the ActionTuple into an action tuple for each environment
|
||||
# and sets it. The index j is used to ignore agents that have already
|
||||
# reached done.
|
||||
j = 0
|
||||
for i in range(self.num_agents):
|
||||
_act = ActionTuple()
|
||||
name_and_num = behavior_name + str(i)
|
||||
env = self.envs[name_and_num]
|
||||
if not self.dones[name_and_num]:
|
||||
if self.action_spec.continuous_size > 0:
|
||||
_act.add_continuous(action.continuous[j : j + 1])
|
||||
if self.action_spec.discrete_size > 0:
|
||||
_disc_list = [action.discrete[j, :]]
|
||||
_act.add_discrete(np.array(_disc_list))
|
||||
j += 1
|
||||
env.action[behavior_name] = _act
|
||||
|
||||
def get_steps(self, behavior_name):
|
||||
# This gets the individual DecisionSteps and TerminalSteps
|
||||
# from the envs and merges them into a batch to be sent
|
||||
# to the AgentProcessor.
|
||||
dec_vec_obs = []
|
||||
dec_reward = []
|
||||
dec_group_reward = []
|
||||
dec_agent_id = []
|
||||
dec_group_id = []
|
||||
ter_vec_obs = []
|
||||
ter_reward = []
|
||||
ter_group_reward = []
|
||||
ter_agent_id = []
|
||||
ter_group_id = []
|
||||
interrupted = []
|
||||
|
||||
action_mask = None
|
||||
terminal_step = TerminalSteps.empty(self.behavior_spec)
|
||||
decision_step = None
|
||||
for i in range(self.num_agents):
|
||||
name_and_num = behavior_name + str(i)
|
||||
env = self.envs[name_and_num]
|
||||
_dec, _term = env.step_result[behavior_name]
|
||||
if not self.dones[name_and_num]:
|
||||
dec_agent_id.append(i)
|
||||
dec_group_id.append(1)
|
||||
if len(dec_vec_obs) > 0:
|
||||
for j, obs in enumerate(_dec.obs):
|
||||
dec_vec_obs[j] = np.concatenate((dec_vec_obs[j], obs), axis=0)
|
||||
else:
|
||||
for obs in _dec.obs:
|
||||
dec_vec_obs.append(obs)
|
||||
dec_reward.append(_dec.reward[0])
|
||||
dec_group_reward.append(_dec.group_reward[0])
|
||||
if _dec.action_mask is not None:
|
||||
if action_mask is None:
|
||||
action_mask = []
|
||||
if len(action_mask) > 0:
|
||||
action_mask[0] = np.concatenate(
|
||||
(action_mask[0], _dec.action_mask[0]), axis=0
|
||||
)
|
||||
else:
|
||||
action_mask.append(_dec.action_mask[0])
|
||||
if len(_term.reward) > 0 and name_and_num in self.just_died:
|
||||
ter_agent_id.append(i)
|
||||
ter_group_id.append(1)
|
||||
if len(ter_vec_obs) > 0:
|
||||
for j, obs in enumerate(_term.obs):
|
||||
ter_vec_obs[j] = np.concatenate((ter_vec_obs[j], obs), axis=0)
|
||||
else:
|
||||
for obs in _term.obs:
|
||||
ter_vec_obs.append(obs)
|
||||
ter_reward.append(_term.reward[0])
|
||||
ter_group_reward.append(_term.group_reward[0])
|
||||
interrupted.append(False)
|
||||
self.just_died.remove(name_and_num)
|
||||
decision_step = DecisionSteps(
|
||||
dec_vec_obs,
|
||||
dec_reward,
|
||||
dec_agent_id,
|
||||
action_mask,
|
||||
dec_group_id,
|
||||
dec_group_reward,
|
||||
)
|
||||
terminal_step = TerminalSteps(
|
||||
ter_vec_obs,
|
||||
ter_reward,
|
||||
interrupted,
|
||||
ter_agent_id,
|
||||
ter_group_id,
|
||||
ter_group_reward,
|
||||
)
|
||||
if self.all_done:
|
||||
decision_step = DecisionSteps([], [], [], [], [], [])
|
||||
return (decision_step, terminal_step)
|
||||
|
||||
def step(self) -> None:
|
||||
# Steps all environments and calls reset if all agents are done.
|
||||
for name in self.names:
|
||||
for i in range(self.num_agents):
|
||||
name_and_num = name + str(i)
|
||||
# Does not step the env if done
|
||||
if not self.dones[name_and_num]:
|
||||
env = self.envs[name_and_num]
|
||||
# Reproducing part of env step to intercept Dones
|
||||
assert all(action is not None for action in env.action.values())
|
||||
done = env._take_action(name)
|
||||
reward = env._compute_reward(name, done)
|
||||
self.dones[name_and_num] = done
|
||||
if done:
|
||||
self.just_died.add(name_and_num)
|
||||
if self.all_done:
|
||||
env.step_result[name] = env._make_batched_step(
|
||||
name, done, 0.0, reward
|
||||
)
|
||||
self.final_rewards[name].append(reward)
|
||||
# self.reset()
|
||||
elif done:
|
||||
# This agent has finished but others are still running.
|
||||
# This gives a reward of the time penalty if this agent
|
||||
# is successful and the negative env reward if it fails.
|
||||
ceil_reward = min(-TIME_PENALTY, reward)
|
||||
env.step_result[name] = env._make_batched_step(
|
||||
name, done, ceil_reward, 0.0
|
||||
)
|
||||
self.final_rewards[name].append(reward)
|
||||
|
||||
else:
|
||||
env.step_result[name] = env._make_batched_step(
|
||||
name, done, reward, 0.0
|
||||
)
|
||||
|
||||
def reset(self) -> None: # type: ignore
|
||||
for name in self.names:
|
||||
for i in range(self.num_agents):
|
||||
name_and_num = name + str(i)
|
||||
self.dones[name_and_num] = False
|
||||
|
||||
self.dones = {}
|
||||
self.just_died = set()
|
||||
self.final_rewards = {}
|
||||
for name in self.names:
|
||||
self.final_rewards[name] = []
|
||||
for i in range(self.num_agents):
|
||||
name_and_num = name + str(i)
|
||||
self.dones[name_and_num] = False
|
||||
self.envs[name_and_num].reset()
|
||||
|
||||
@property
|
||||
def reset_parameters(self) -> Dict[str, str]:
|
||||
return {}
|
||||
|
||||
def close(self):
|
||||
pass
|
|
@ -3,7 +3,8 @@ import pytest
|
|||
import numpy as np
|
||||
|
||||
from gym import spaces
|
||||
from gym_unity.envs import UnityToGymWrapper
|
||||
|
||||
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
|
||||
from mlagents_envs.base_env import (
|
||||
BehaviorSpec,
|
||||
ActionSpec,
|
||||
|
@ -11,7 +12,7 @@ from mlagents_envs.base_env import (
|
|||
TerminalSteps,
|
||||
BehaviorMapping,
|
||||
)
|
||||
from mlagents.trainers.tests.dummy_config import create_observation_specs_with_shapes
|
||||
from dummy_config import create_observation_specs_with_shapes
|
||||
|
||||
|
||||
def test_gym_wrapper():
|
|
@ -0,0 +1,32 @@
|
|||
from mlagents_envs.envs.unity_aec_env import UnityAECEnv
|
||||
from mlagents_envs.envs.unity_parallel_env import UnityParallelEnv
|
||||
from simple_test_envs import SimpleEnvironment, MultiAgentEnvironment
|
||||
from pettingzoo.test import api_test, parallel_api_test
|
||||
|
||||
NUM_TEST_CYCLES = 100
|
||||
|
||||
|
||||
def test_single_agent_aec():
|
||||
unity_env = SimpleEnvironment(["test_single"])
|
||||
env = UnityAECEnv(unity_env)
|
||||
api_test(env, num_cycles=NUM_TEST_CYCLES, verbose_progress=False)
|
||||
|
||||
|
||||
def test_multi_agent_aec():
|
||||
unity_env = MultiAgentEnvironment(["test_multi_1", "test_multi_2"], num_agents=2)
|
||||
env = UnityAECEnv(unity_env)
|
||||
api_test(env, num_cycles=NUM_TEST_CYCLES, verbose_progress=False)
|
||||
|
||||
|
||||
def test_single_agent_parallel():
|
||||
unity_env = SimpleEnvironment(["test_single"])
|
||||
env = UnityParallelEnv(unity_env)
|
||||
parallel_api_test(env, num_cycles=NUM_TEST_CYCLES)
|
||||
|
||||
|
||||
def test_multi_agent_parallel():
|
||||
unity_env = MultiAgentEnvironment(
|
||||
["test_multi_1", "test_multi_2", "test_multi_3"], num_agents=3
|
||||
)
|
||||
env = UnityParallelEnv(unity_env)
|
||||
parallel_api_test(env, num_cycles=NUM_TEST_CYCLES)
|
|
@ -0,0 +1,503 @@
|
|||
import io
|
||||
import numpy as np
|
||||
import pytest
|
||||
from typing import List, Tuple, Any
|
||||
|
||||
from mlagents_envs.communicator_objects.agent_info_pb2 import AgentInfoProto
|
||||
from mlagents_envs.communicator_objects.observation_pb2 import (
|
||||
ObservationProto,
|
||||
NONE,
|
||||
PNG,
|
||||
)
|
||||
from mlagents_envs.communicator_objects.brain_parameters_pb2 import BrainParametersProto
|
||||
from mlagents_envs.communicator_objects.agent_info_action_pair_pb2 import (
|
||||
AgentInfoActionPairProto,
|
||||
)
|
||||
from mlagents_envs.communicator_objects.agent_action_pb2 import AgentActionProto
|
||||
from mlagents_envs.base_env import (
|
||||
BehaviorSpec,
|
||||
ActionSpec,
|
||||
DecisionSteps,
|
||||
TerminalSteps,
|
||||
)
|
||||
from mlagents_envs.exception import UnityObservationException
|
||||
from mlagents_envs.rpc_utils import (
|
||||
behavior_spec_from_proto,
|
||||
process_pixels,
|
||||
_process_maybe_compressed_observation,
|
||||
_process_rank_one_or_two_observation,
|
||||
steps_from_proto,
|
||||
)
|
||||
from PIL import Image
|
||||
from dummy_config import create_observation_specs_with_shapes
|
||||
|
||||
|
||||
def generate_list_agent_proto(
|
||||
n_agent: int,
|
||||
shape: List[Tuple[int]],
|
||||
infinite_rewards: bool = False,
|
||||
nan_observations: bool = False,
|
||||
) -> List[AgentInfoProto]:
|
||||
result = []
|
||||
for agent_index in range(n_agent):
|
||||
ap = AgentInfoProto()
|
||||
ap.reward = float("inf") if infinite_rewards else agent_index
|
||||
ap.done = agent_index % 2 == 0
|
||||
ap.max_step_reached = agent_index % 4 == 0
|
||||
ap.id = agent_index
|
||||
ap.action_mask.extend([True, False] * 5)
|
||||
obs_proto_list = []
|
||||
for obs_index in range(len(shape)):
|
||||
obs_proto = ObservationProto()
|
||||
obs_proto.shape.extend(list(shape[obs_index]))
|
||||
obs_proto.compression_type = NONE
|
||||
obs_proto.float_data.data.extend(
|
||||
([float("nan")] if nan_observations else [0.1])
|
||||
* np.prod(shape[obs_index])
|
||||
)
|
||||
obs_proto_list.append(obs_proto)
|
||||
ap.observations.extend(obs_proto_list)
|
||||
result.append(ap)
|
||||
return result
|
||||
|
||||
|
||||
def generate_compressed_data(in_array: np.ndarray) -> bytes:
    image_arr = (in_array * 255).astype(np.uint8)
    bytes_out = bytes()

    num_channels = in_array.shape[2]
    num_images = (num_channels + 2) // 3
    # Split the input image into batches of 3 channels.
    for i in range(num_images):
        sub_image = image_arr[..., 3 * i : 3 * i + 3]
        if (i == num_images - 1) and (num_channels % 3) != 0:
            # Pad zeros
            zero_shape = list(in_array.shape)
            zero_shape[2] = 3 - (num_channels % 3)
            z = np.zeros(zero_shape, dtype=np.uint8)
            sub_image = np.concatenate([sub_image, z], axis=2)
        im = Image.fromarray(sub_image, "RGB")
        byteIO = io.BytesIO()
        im.save(byteIO, format="PNG")
        bytes_out += byteIO.getvalue()
    return bytes_out


# test helper function for old C# API (no compressed channel mapping)
def generate_compressed_proto_obs(
    in_array: np.ndarray, grayscale: bool = False
) -> ObservationProto:
    obs_proto = ObservationProto()
    obs_proto.compressed_data = generate_compressed_data(in_array)
    obs_proto.compression_type = PNG
    if grayscale:
        # grayscale flag is only used for old API without mapping
        expected_shape = [in_array.shape[0], in_array.shape[1], 1]
        obs_proto.shape.extend(expected_shape)
    else:
        obs_proto.shape.extend(in_array.shape)
    return obs_proto


# test helper function for new C# API (with compressed channel mapping)
def generate_compressed_proto_obs_with_mapping(
    in_array: np.ndarray, mapping: List[int]
) -> ObservationProto:
    obs_proto = ObservationProto()
    obs_proto.compressed_data = generate_compressed_data(in_array)
    obs_proto.compression_type = PNG
    if mapping is not None:
        obs_proto.compressed_channel_mapping.extend(mapping)
        expected_shape = [
            in_array.shape[0],
            in_array.shape[1],
            len({m for m in mapping if m >= 0}),
        ]
        obs_proto.shape.extend(expected_shape)
    else:
        obs_proto.shape.extend(in_array.shape)
    return obs_proto


def generate_uncompressed_proto_obs(in_array: np.ndarray) -> ObservationProto:
    obs_proto = ObservationProto()
    obs_proto.float_data.data.extend(in_array.flatten().tolist())
    obs_proto.compression_type = NONE
    obs_proto.shape.extend(in_array.shape)
    return obs_proto


def proto_from_steps(
    decision_steps: DecisionSteps, terminal_steps: TerminalSteps
) -> List[AgentInfoProto]:
    agent_info_protos: List[AgentInfoProto] = []
    # Take care of the DecisionSteps first
    for agent_id in decision_steps.agent_id:
        agent_id_index = decision_steps.agent_id_to_index[agent_id]
        reward = decision_steps.reward[agent_id_index]
        done = False
        max_step_reached = False
        agent_mask: Any = None
        if decision_steps.action_mask is not None:
            agent_mask = []
            for _branch in decision_steps.action_mask:
                agent_mask = np.concatenate(
                    (agent_mask, _branch[agent_id_index, :]), axis=0
                )
            agent_mask = agent_mask.astype(np.bool).tolist()
        observations: List[ObservationProto] = []
        for all_observations_of_type in decision_steps.obs:
            observation = all_observations_of_type[agent_id_index]
            if len(observation.shape) == 3:
                observations.append(generate_uncompressed_proto_obs(observation))
            else:
                observations.append(
                    ObservationProto(
                        float_data=ObservationProto.FloatData(data=observation),
                        shape=[len(observation)],
                        compression_type=NONE,
                    )
                )
        agent_info_proto = AgentInfoProto(
            reward=reward,
            done=done,
            id=agent_id,
            max_step_reached=bool(max_step_reached),
            action_mask=agent_mask,
            observations=observations,
        )
        agent_info_protos.append(agent_info_proto)
    # Take care of the TerminalSteps second
    for agent_id in terminal_steps.agent_id:
        agent_id_index = terminal_steps.agent_id_to_index[agent_id]
        reward = terminal_steps.reward[agent_id_index]
        done = True
        max_step_reached = terminal_steps.interrupted[agent_id_index]

        final_observations: List[ObservationProto] = []
        for all_observations_of_type in terminal_steps.obs:
            observation = all_observations_of_type[agent_id_index]
            if len(observation.shape) == 3:
                final_observations.append(generate_uncompressed_proto_obs(observation))
            else:
                final_observations.append(
                    ObservationProto(
                        float_data=ObservationProto.FloatData(data=observation),
                        shape=[len(observation)],
                        compression_type=NONE,
                    )
                )
        agent_info_proto = AgentInfoProto(
            reward=reward,
            done=done,
            id=agent_id,
            max_step_reached=bool(max_step_reached),
            action_mask=None,
            observations=final_observations,
        )
        agent_info_protos.append(agent_info_proto)

    return agent_info_protos


# The arguments here are the DecisionSteps, TerminalSteps and continuous/discrete actions for a single agent name
def proto_from_steps_and_action(
    decision_steps: DecisionSteps,
    terminal_steps: TerminalSteps,
    continuous_actions: np.ndarray,
    discrete_actions: np.ndarray,
) -> List[AgentInfoActionPairProto]:
    agent_info_protos = proto_from_steps(decision_steps, terminal_steps)
    agent_action_protos = []
    num_agents = (
        len(continuous_actions)
        if continuous_actions is not None
        else len(discrete_actions)
    )
    for i in range(num_agents):
        proto = AgentActionProto()
        if continuous_actions is not None:
            proto.continuous_actions.extend(continuous_actions[i])
            proto.vector_actions_deprecated.extend(continuous_actions[i])
        if discrete_actions is not None:
            proto.discrete_actions.extend(discrete_actions[i])
            proto.vector_actions_deprecated.extend(discrete_actions[i])
        agent_action_protos.append(proto)
    agent_info_action_pair_protos = [
        AgentInfoActionPairProto(agent_info=agent_info_proto, action_info=action_proto)
        for agent_info_proto, action_proto in zip(
            agent_info_protos, agent_action_protos
        )
    ]
    return agent_info_action_pair_protos


def test_process_pixels():
    in_array = np.random.rand(128, 64, 3)
    byte_arr = generate_compressed_data(in_array)
    out_array = process_pixels(byte_arr, 3)
    assert out_array.shape == (128, 64, 3)
    assert np.sum(in_array - out_array) / np.prod(in_array.shape) < 0.01
    assert np.allclose(in_array, out_array, atol=0.01)


def test_process_pixels_multi_png():
    height = 128
    width = 64
    num_channels = 7
    in_array = np.random.rand(height, width, num_channels)
    byte_arr = generate_compressed_data(in_array)
    out_array = process_pixels(byte_arr, num_channels)
    assert out_array.shape == (height, width, num_channels)
    assert np.sum(in_array - out_array) / np.prod(in_array.shape) < 0.01
    assert np.allclose(in_array, out_array, atol=0.01)


def test_process_pixels_gray():
    in_array = np.random.rand(128, 64, 3)
    byte_arr = generate_compressed_data(in_array)
    out_array = process_pixels(byte_arr, 1)
    assert out_array.shape == (128, 64, 1)
    assert np.mean(in_array.mean(axis=2, keepdims=True) - out_array) < 0.01
    assert np.allclose(in_array.mean(axis=2, keepdims=True), out_array, atol=0.01)


def test_vector_observation():
    n_agents = 10
    shapes = [(3,), (4,)]
    obs_specs = create_observation_specs_with_shapes(shapes)
    list_proto = generate_list_agent_proto(n_agents, shapes)
    for obs_index, shape in enumerate(shapes):
        arr = _process_rank_one_or_two_observation(
            obs_index, obs_specs[obs_index], list_proto
        )
        assert list(arr.shape) == ([n_agents] + list(shape))
        assert np.allclose(arr, 0.1, atol=0.01)


def test_process_visual_observation():
    shape = (128, 64, 3)
    in_array_1 = np.random.rand(*shape)
    proto_obs_1 = generate_compressed_proto_obs(in_array_1)
    in_array_2 = np.random.rand(*shape)
    in_array_2_mapping = [0, 1, 2]
    proto_obs_2 = generate_compressed_proto_obs_with_mapping(
        in_array_2, in_array_2_mapping
    )

    ap1 = AgentInfoProto()
    ap1.observations.extend([proto_obs_1])
    ap2 = AgentInfoProto()
    ap2.observations.extend([proto_obs_2])
    ap_list = [ap1, ap2]
    obs_spec = create_observation_specs_with_shapes([shape])[0]
    arr = _process_maybe_compressed_observation(0, obs_spec, ap_list)
    assert list(arr.shape) == [2, 128, 64, 3]
    assert np.allclose(arr[0, :, :, :], in_array_1, atol=0.01)
    assert np.allclose(arr[1, :, :, :], in_array_2, atol=0.01)


def test_process_visual_observation_grayscale():
    in_array_1 = np.random.rand(128, 64, 3)
    proto_obs_1 = generate_compressed_proto_obs(in_array_1, grayscale=True)
    expected_out_array_1 = np.mean(in_array_1, axis=2, keepdims=True)
    in_array_2 = np.random.rand(128, 64, 3)
    in_array_2_mapping = [0, 0, 0]
    proto_obs_2 = generate_compressed_proto_obs_with_mapping(
        in_array_2, in_array_2_mapping
    )
    expected_out_array_2 = np.mean(in_array_2, axis=2, keepdims=True)

    ap1 = AgentInfoProto()
    ap1.observations.extend([proto_obs_1])
    ap2 = AgentInfoProto()
    ap2.observations.extend([proto_obs_2])
    ap_list = [ap1, ap2]
    shape = (128, 64, 1)
    obs_spec = create_observation_specs_with_shapes([shape])[0]
    arr = _process_maybe_compressed_observation(0, obs_spec, ap_list)
    assert list(arr.shape) == [2, 128, 64, 1]
    assert np.allclose(arr[0, :, :, :], expected_out_array_1, atol=0.01)
    assert np.allclose(arr[1, :, :, :], expected_out_array_2, atol=0.01)


def test_process_visual_observation_padded_channels():
    in_array_1 = np.random.rand(128, 64, 12)
    in_array_1_mapping = [0, 1, 2, 3, -1, -1, 4, 5, 6, 7, -1, -1]
    proto_obs_1 = generate_compressed_proto_obs_with_mapping(
        in_array_1, in_array_1_mapping
    )
    expected_out_array_1 = np.take(in_array_1, [0, 1, 2, 3, 6, 7, 8, 9], axis=2)

    ap1 = AgentInfoProto()
    ap1.observations.extend([proto_obs_1])
    ap_list = [ap1]
    shape = (128, 64, 8)
    obs_spec = create_observation_specs_with_shapes([shape])[0]

    arr = _process_maybe_compressed_observation(0, obs_spec, ap_list)
    assert list(arr.shape) == [1, 128, 64, 8]
    assert np.allclose(arr[0, :, :, :], expected_out_array_1, atol=0.01)


def test_process_visual_observation_bad_shape():
    in_array_1 = np.random.rand(128, 64, 3)
    proto_obs_1 = generate_compressed_proto_obs(in_array_1)
    ap1 = AgentInfoProto()
    ap1.observations.extend([proto_obs_1])
    ap_list = [ap1]

    shape = (128, 42, 3)
    obs_spec = create_observation_specs_with_shapes([shape])[0]

    with pytest.raises(UnityObservationException):
        _process_maybe_compressed_observation(0, obs_spec, ap_list)


def test_batched_step_result_from_proto():
    n_agents = 10
    shapes = [(3,), (4,)]
    spec = BehaviorSpec(
        create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(3)
    )
    ap_list = generate_list_agent_proto(n_agents, shapes)
    decision_steps, terminal_steps = steps_from_proto(ap_list, spec)
    for agent_id in range(n_agents):
        if agent_id in decision_steps:
            # we set the reward equal to the agent id in generate_list_agent_proto
            assert decision_steps[agent_id].reward == agent_id
        elif agent_id in terminal_steps:
            assert terminal_steps[agent_id].reward == agent_id
        else:
            raise Exception("Missing agent from the steps")
    # We sort the AgentId since they are split between DecisionSteps and TerminalSteps
    combined_agent_id = list(decision_steps.agent_id) + list(terminal_steps.agent_id)
    combined_agent_id.sort()
    assert combined_agent_id == list(range(n_agents))
    for agent_id in range(n_agents):
        assert (agent_id in terminal_steps) == (agent_id % 2 == 0)
        if agent_id in terminal_steps:
            assert terminal_steps[agent_id].interrupted == (agent_id % 4 == 0)
    assert decision_steps.obs[0].shape[1] == shapes[0][0]
    assert decision_steps.obs[1].shape[1] == shapes[1][0]
    assert terminal_steps.obs[0].shape[1] == shapes[0][0]
    assert terminal_steps.obs[1].shape[1] == shapes[1][0]


def test_mismatch_observations_raise_in_step_result_from_proto():
    n_agents = 10
    shapes = [(3,), (4,)]
    spec = BehaviorSpec(
        create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(3)
    )
    ap_list = generate_list_agent_proto(n_agents, shapes)
    # Hack an observation to be larger, we should get an exception
    ap_list[0].observations[0].shape[0] += 1
    ap_list[0].observations[0].float_data.data.append(0.42)
    with pytest.raises(UnityObservationException):
        steps_from_proto(ap_list, spec)


def test_action_masking_discrete():
    n_agents = 10
    shapes = [(3,), (4,)]
    behavior_spec = BehaviorSpec(
        create_observation_specs_with_shapes(shapes), ActionSpec.create_discrete((7, 3))
    )
    ap_list = generate_list_agent_proto(n_agents, shapes)
    decision_steps, terminal_steps = steps_from_proto(ap_list, behavior_spec)
    masks = decision_steps.action_mask
    assert isinstance(masks, list)
    assert len(masks) == 2
    assert masks[0].shape == (n_agents / 2, 7)  # half agents are done
    assert masks[1].shape == (n_agents / 2, 3)  # half agents are done
    assert masks[0][0, 0]
    assert not masks[1][0, 0]
    assert masks[1][0, 1]


def test_action_masking_discrete_1():
    n_agents = 10
    shapes = [(3,), (4,)]
    behavior_spec = BehaviorSpec(
        create_observation_specs_with_shapes(shapes), ActionSpec.create_discrete((10,))
    )
    ap_list = generate_list_agent_proto(n_agents, shapes)
    decision_steps, terminal_steps = steps_from_proto(ap_list, behavior_spec)
    masks = decision_steps.action_mask
    assert isinstance(masks, list)
    assert len(masks) == 1
    assert masks[0].shape == (n_agents / 2, 10)
    assert masks[0][0, 0]


def test_action_masking_discrete_2():
    n_agents = 10
    shapes = [(3,), (4,)]
    behavior_spec = BehaviorSpec(
        create_observation_specs_with_shapes(shapes),
        ActionSpec.create_discrete((2, 2, 6)),
    )
    ap_list = generate_list_agent_proto(n_agents, shapes)
    decision_steps, terminal_steps = steps_from_proto(ap_list, behavior_spec)
    masks = decision_steps.action_mask
    assert isinstance(masks, list)
    assert len(masks) == 3
    assert masks[0].shape == (n_agents / 2, 2)
    assert masks[1].shape == (n_agents / 2, 2)
    assert masks[2].shape == (n_agents / 2, 6)
    assert masks[0][0, 0]


def test_action_masking_continuous():
    n_agents = 10
    shapes = [(3,), (4,)]
    behavior_spec = BehaviorSpec(
        create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(10)
    )
    ap_list = generate_list_agent_proto(n_agents, shapes)
    decision_steps, terminal_steps = steps_from_proto(ap_list, behavior_spec)
    masks = decision_steps.action_mask
    assert masks is None


def test_agent_behavior_spec_from_proto():
    agent_proto = generate_list_agent_proto(1, [(3,), (4,)])[0]
    bp = BrainParametersProto()
    bp.vector_action_size_deprecated.extend([5, 4])
    bp.vector_action_space_type_deprecated = 0
    behavior_spec = behavior_spec_from_proto(bp, agent_proto)
    assert behavior_spec.action_spec.is_discrete()
    assert not behavior_spec.action_spec.is_continuous()
    assert [spec.shape for spec in behavior_spec.observation_specs] == [(3,), (4,)]
    assert behavior_spec.action_spec.discrete_branches == (5, 4)
    assert behavior_spec.action_spec.discrete_size == 2
    bp = BrainParametersProto()
    bp.vector_action_size_deprecated.extend([6])
    bp.vector_action_space_type_deprecated = 1
    behavior_spec = behavior_spec_from_proto(bp, agent_proto)
    assert not behavior_spec.action_spec.is_discrete()
    assert behavior_spec.action_spec.is_continuous()
    assert behavior_spec.action_spec.continuous_size == 6


def test_batched_step_result_from_proto_raises_on_infinite():
    n_agents = 10
    shapes = [(3,), (4,)]
    behavior_spec = BehaviorSpec(
        create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(3)
    )
    ap_list = generate_list_agent_proto(n_agents, shapes, infinite_rewards=True)
    with pytest.raises(RuntimeError):
        steps_from_proto(ap_list, behavior_spec)


def test_batched_step_result_from_proto_raises_on_nan():
    n_agents = 10
    shapes = [(3,), (4,)]
    behavior_spec = BehaviorSpec(
        create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(3)
    )
    ap_list = generate_list_agent_proto(n_agents, shapes, nan_observations=True)
    with pytest.raises(RuntimeError):
        steps_from_proto(ap_list, behavior_spec)

@@ -7,7 +7,7 @@ from mlagents_envs.base_env import (
     ActionSpec,
     BehaviorSpec,
 )
-from mlagents.trainers.tests.dummy_config import create_observation_specs_with_shapes
+from dummy_config import create_observation_specs_with_shapes


 def test_decision_steps():

@@ -4,7 +4,7 @@ The `mlagents` Python package is part of the
 [ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents). `mlagents`
 provides a set of reinforcement and imitation learning algorithms designed to be
 used with Unity environments. The algorithms interface with the Python API
-provided by the `mlagents_envs` package. See [here](../docs/Python-API.md) for
+provided by the `mlagents_envs` package. See [here](../docs/Python-LLAPI.md) for
 more information on `mlagents_envs`.

 The algorithms can be accessed using the: `mlagents-learn` access point. See

@@ -0,0 +1,16 @@
+{
+  "param_1": {
+    "lesson_num": 2
+  },
+  "param_2": {
+    "lesson_num": 0
+  },
+  "param_3": {
+    "lesson_num": 0
+  },
+  "metadata": {
+    "stats_format_version": "0.3.0",
+    "mlagents_version": "0.29.0.dev0",
+    "torch_version": "1.8.1"
+  }
+}

@@ -13,7 +13,7 @@ from mlagents_envs.base_env import (
     TerminalSteps,
     BehaviorMapping,
 )
-from mlagents_envs.tests.test_rpc_utils import proto_from_steps_and_action
+from .test_rpc_utils import proto_from_steps_and_action
 from mlagents_envs.communicator_objects.agent_info_action_pair_pb2 import (
     AgentInfoActionPairProto,
 )

@@ -1,7 +1,7 @@
 import argparse

 from mlagents_envs.environment import UnityEnvironment
-from gym_unity.envs import UnityToGymWrapper
+from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper


 def test_run_environment(env_name):

@@ -136,13 +136,12 @@ def init_venv(
         # install from pypi
         pip_commands += [
             f"mlagents=={mlagents_python_version}",
-            f"gym-unity=={mlagents_python_version}",
             # TODO build these and publish to internal pypi
             "tf2onnx==1.6.1",
         ]
     else:
         # Local install
-        pip_commands += ["-e ./ml-agents-envs", "-e ./ml-agents", "-e ./gym-unity"]
+        pip_commands += ["-e ./ml-agents-envs", "-e ./ml-agents"]
     if extra_packages:
         pip_commands += extra_packages

@@ -40,7 +40,7 @@ def validate_packages(root_dir):


 def main():
-    for root_dir in ["ml-agents", "ml-agents-envs", "gym-unity"]:
+    for root_dir in ["ml-agents", "ml-agents-envs"]:
         validate_packages(root_dir)


@@ -22,7 +22,9 @@ MATCH_ANY = re.compile(r"(?s).*")
 # To allow everything in the file (effectively skipping it), use MATCH_ANY for the value
 ALLOW_LIST = {
     # Previous release table
     "README.md": re.compile(r"\*\*(Verified Package ([0-9]\.?)*|Release [0-9]+)\*\*"),
+    "docs/Python-PettingZoo-API.md": re.compile(
+        r"\*\*(Verified Package ([0-9]\.?)*|Release [0-9]+)\*\*"
+    ),
     "docs/Versioning.md": MATCH_ANY,
     "com.unity.ml-agents/CHANGELOG.md": MATCH_ANY,
     "utils/make_readme_table.py": MATCH_ANY,

@@ -8,11 +8,7 @@ import argparse

 VERSION_LINE_START = "__version__ = "

-DIRECTORIES = [
-    "ml-agents/mlagents/trainers",
-    "ml-agents-envs/mlagents_envs",
-    "gym-unity/gym_unity",
-]
+DIRECTORIES = ["ml-agents/mlagents/trainers", "ml-agents-envs/mlagents_envs"]

 MLAGENTS_PACKAGE_JSON_PATH = "com.unity.ml-agents/package.json"
 MLAGENTS_EXTENSIONS_PACKAGE_JSON_PATH = "com.unity.ml-agents.extensions/package.json"