* Dropped support for python 3.6

* Pinning python 3.9.9 for tests due to typing issues with 3.9.10

* Testing new bokken image.

* Testing new bokken image.

* Updated yamato standalone build test.

* Updated yamato standalone build test.

* Updated standalone build test.

* Updated yamato configs to use mla bokken vm.

* Bug fixes for yamato yml files.

* Fixed com.unity.ml-agents-test.yml

* Bumped min python version to 3.7.2

* pettingzoo api prototype

* add example

* update file names

* support multiple behavior names

* fix multi behavior action index

* add install in colab

* add setup

* update colab

* fix __init__

* clone single branch

* import tags only

* import in init

* catch import error

* update colab

* move colab and add readme

* handle agent dying

* add tests

* update doc

* add info

* add action mask

* fix action mask

* update action masks in colab

* change default env

* set version

* fix hybrid action

* fix colab for hybrid actions

* add note on auto reset

* Updated colab name.

* Update README.md

* Following petting_zoo registry API (#5557)

* init petting_zoo registry

* cherrypick Custom trainer editor analytics (#5511)

* cherrypick "Update dotnet-format to address breaking changes introduced by upstream changes (#5528)"

* Update colab to match pettingZoo import api

* ToRevert: pull exp-petting-registry branch

* Add init file to tests

* Install pettingzoo-unity requirements for pytest

* update pytest command

* Add docstrings and comments

* update coverage to pettingzoo folder

* unset log level

* update env string

* Two small bugfixes (#5589)

1. Add the missing `_cumulative_rewards` property
2. Update `agent_selection` to not error out when an agent finishes an episode.

* Updated gym to 0.21.0 and petting zoo to 1.13.1, fixed bugs with AEC wrapper for gym and PZ updates. API tests are passing.

* Some refactoring.

* Finished initial implementation of parallel. Tests not passing.

* Finished parallel API implementation and refactor. All PZ tests passing.

* Cleanup.

* Refactoring.

* Pinning numpy version.

* add metadata and behavior_specs initialization

* addressing behaviour_spec issues

* Bumped PZ version to 1.14.0. Fixed failing tests.

* Refactored gym-unity and petting-zoo into ml-agents-envs

* Added TODO to pydoc-config.yaml

* Refactored gym and pz to be under a subpackage in mlagents_env package

* Refactored ml-agents-envs docs.

* Minor update to PZ API doc.

* Updated mlagents_envs docs and colab.

* Updated pytest gh workflow to remove ref to gym and pz.

* Refactored to remove some test coupling between trainers and envs.

* Updated installation doc.

* Update ml-agents-envs/README.md

Co-authored-by: Andrew Cohen <andrew.cohen@unity3d.com>

* Updated failing yamato jobs.

* Updated CHANGELOG.

* Updated Migration guide.

* Doc updates based on CR.

* Updated github workflow for colab tests.

* Updated github workflow for colab tests.

* Updated github workflow for colab tests.

* Fixed yamato import error.

Co-authored-by: Ruo-Ping Dong <ruoping.dong@unity3d.com>
Co-authored-by: Miguel Alonso Jr <miguelalonsojr>
Co-authored-by: jmercado1985 <75792879+jmercado1985@users.noreply.github.com>
Co-authored-by: Maryam Honari <honari.m94@gmail.com>
Co-authored-by: Henry Peteet <henry.peteet@unity3d.com>
Co-authored-by: mahon94 <maryam.honari@unity3d.com>
Co-authored-by: Andrew Cohen <andrew.cohen@unity3d.com>
Committed by Miguel Alonso Jr on 2022-02-02 19:32:23 -05:00 via GitHub Enterprise
Parent: b4cbaa6840
Commit: 28303adf6c
62 changed files: 2707 additions and 186 deletions

.github/workflows/publish_pypi.yaml

@ -16,7 +16,7 @@ jobs:
runs-on: [self-hosted, Linux, X64]
strategy:
matrix:
package-path: [ml-agents, ml-agents-envs, gym-unity]
package-path: [ml-agents, ml-agents-envs]
steps:
- uses: actions/checkout@main

.github/workflows/pytest.yml

@ -5,7 +5,6 @@ on:
paths: # This action will only run if the PR modifies a file in one of these directories
- 'ml-agents/**'
- 'ml-agents-envs/**'
- 'gym-unity/**'
- 'test_constraints*.txt'
- 'test_requirements.txt'
- '.github/workflows/pytest.yml'
@ -47,7 +46,7 @@ jobs:
# # This path is specific to Ubuntu
# path: ~/.cache/pip
# # Look to see if there is a cache hit for the corresponding requirements file
# key: ${{ runner.os }}-pip-${{ hashFiles('ml-agents/setup.py', 'ml-agents-envs/setup.py', 'gym-unity/setup.py', 'test_requirements.txt', matrix.pip_constraints) }}
# key: ${{ runner.os }}-pip-${{ hashFiles('ml-agents/setup.py', 'ml-agents-envs/setup.py', 'test_requirements.txt', matrix.pip_constraints) }}
# restore-keys: |
# ${{ runner.os }}-pip-
# ${{ runner.os }}-
@ -60,14 +59,13 @@ jobs:
python -m pip install --progress-bar=off -e ./ml-agents-envs -c ${{ matrix.pip_constraints }}
python -m pip install --progress-bar=off -e ./ml-agents -c ${{ matrix.pip_constraints }}
python -m pip install --progress-bar=off -r test_requirements.txt -c ${{ matrix.pip_constraints }}
python -m pip install --progress-bar=off -e ./gym-unity -c ${{ matrix.pip_constraints }}
python -m pip install --progress-bar=off -e ./ml-agents-plugin-examples -c ${{ matrix.pip_constraints }}
- name: Save python dependencies
run: |
pip freeze > pip_versions-${{ matrix.python-version }}.txt
cat pip_versions-${{ matrix.python-version }}.txt
- name: Run pytest
run: pytest --cov=ml-agents --cov=ml-agents-envs --cov=gym-unity --cov-report html --junitxml=junit/test-results-${{ matrix.python-version }}.xml -p no:warnings -v
run: pytest --cov=ml-agents --cov=ml-agents-envs --cov-report=html --junitxml=junit/test-results-${{ matrix.python-version }}.xml -p no:warnings -v
- name: Upload pytest test results
uses: actions/upload-artifact@v2
with:


@ -22,10 +22,6 @@ repos:
# Exclude protobuf files and don't follow them when imported
exclude: ".*_pb2.py"
args: [--ignore-missing-imports, --disallow-incomplete-defs]
- id: mypy
name: mypy-gym-unity
files: "gym-unity/.*"
args: [--ignore-missing-imports, --disallow-incomplete-defs]
- repo: https://gitlab.com/pycqa/flake8
rev: 3.8.1


@ -30,7 +30,6 @@ test_gym_interface_{{ editor.version }}:
pull_request.changes.any match "Project/**" OR
pull_request.changes.any match "ml-agents/tests/yamato/**" OR
pull_request.changes.any match "ml-agents-envs/**" OR
pull_request.changes.any match "gym-unity/**" OR
pull_request.changes.any match ".yamato/gym-interface-test.yml") AND
NOT pull_request.changes.all match "**/*.md"
{% endif %}


@ -38,8 +38,8 @@ developer communities.
- Train using multiple concurrent Unity environment instances
- Utilizes the [Unity Inference Engine](docs/Unity-Inference-Engine.md) to
provide native cross-platform support
- Unity environment [control from Python](docs/Python-API.md)
- Wrap Unity learning environments as a [gym](gym-unity/README.md)
- Unity environment [control from Python](docs/Python-LLAPI.md)
- Wrap Unity learning environments as a [gym](docs/Python-Gym-API.md)
See our [ML-Agents Overview](docs/ML-Agents-Overview.md) page for detailed
descriptions of all these features.


@ -9,15 +9,17 @@ and this project adheres to
## [Unreleased]
### Major Changes
#### com.unity.ml-agents / com.unity.ml-agents.extensions (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
- The minimum supported Python version for ml-agents-envs was changed to 3.7.2 (#4)
#### ml-agents / ml-agents-envs
- The minimum supported Python version for ml-agents-envs was changed to 3.7.2 (#5)
- Added support for the PettingZoo multi-agent API (#6)
- Refactored `gym-unity` into the `ml-agents-envs` package (#6)
### Minor Changes
#### com.unity.ml-agents / com.unity.ml-agents.extensions (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
#### ml-agents / ml-agents-envs
### Bug Fixes
#### com.unity.ml-agents / com.unity.ml-agents.extensions (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
#### ml-agents / ml-agents-envs
## [2.2.1-exp.1] - 2022-01-14
### Major Changes


@ -18,8 +18,6 @@ The ML-Agents Toolkit contains several components:
a Unity scene. It is a foundational layer that facilitates data messaging
between Unity scene and the Python machine learning algorithms.
Consequently, `mlagents` depends on `mlagents_envs`.
- [`gym_unity`](../gym-unity/) provides a Python-wrapper for your Unity scene
that supports the OpenAI Gym interface.
- Unity [Project](../Project/) that contains several
[example environments](Learning-Environment-Examples.md) that highlight the
various features of the toolkit to help you get started.


@ -62,7 +62,7 @@ can interact with it.
## Interacting with the Environment
If you want to use the [Python API](Python-API.md) to interact with your
If you want to use the [Python API](Python-LLAPI.md) to interact with your
executable, you can pass the name of the executable with the argument
'file_name' of the `UnityEnvironment`. For instance:


@ -5,4 +5,3 @@ See the package-specific Limitations pages:
- [`com.unity.mlagents` Unity package](../com.unity.ml-agents/Documentation~/com.unity.ml-agents.md#known-limitations)
- [`mlagents` Python package](../ml-agents/README.md#limitations)
- [`mlagents_envs` Python package](../ml-agents-envs/README.md#limitations)
- [`gym_unity` Python package](../gym-unity/README.md#limitations)


@ -167,7 +167,7 @@ The ML-Agents Toolkit contains five high-level components:
process to communicate with and control the Academy during training. However,
it can be used for other purposes as well. For example, you could use the API
to use Unity as the simulation engine for your own machine learning
algorithms. See [Python API](Python-API.md) for more information.
algorithms. See [Python API](Python-LLAPI.md) for more information.
- **External Communicator** - which connects the Learning Environment with the
Python Low-Level API. It lives within the Learning Environment.
- **Python Trainers** which contains all the machine learning algorithms that
@ -179,9 +179,15 @@ The ML-Agents Toolkit contains five high-level components:
- **Gym Wrapper** (not pictured). A common way in which machine learning
researchers interact with simulation environments is via a wrapper provided by
OpenAI called [gym](https://github.com/openai/gym). We provide a gym wrapper
in a dedicated `gym-unity` Python package and
[instructions](../gym-unity/README.md) for using it with existing machine
in the `ml-agents-envs` package and
[instructions](Python-Gym-API.md) for using it with existing machine
learning algorithms which utilize gym.
- **PettingZoo Wrapper** (not pictured). PettingZoo is a Python API for
interacting with multi-agent simulation environments that provides a
gym-like interface. We provide a PettingZoo wrapper for Unity ML-Agents
environments in the `ml-agents-envs` package and
[instructions](Python-PettingZoo-API.md) for using it with machine learning
algorithms.
<p align="center">
<img src="images/learning_environment_basic.png"
@ -286,10 +292,10 @@ In the previous mode, the Agents were used for training to generate a PyTorch
model that the Agents can later use. However, any user of the ML-Agents Toolkit
can leverage their own algorithms for training. In this case, the behaviors of
all the Agents in the scene will be controlled within Python. You can even turn
your environment into a [gym.](../gym-unity/README.md)
your environment into a [gym.](Python-Gym-API.md)
We do not currently have a tutorial highlighting this mode, but you can learn
more about the Python API [here](Python-API.md).
more about the Python API [here](Python-LLAPI.md).
## Flexible Training Scenarios


@ -1,6 +1,25 @@
# Upgrading
# Migrating
<!---
TODO: update ml-agents-env package version before release
--->
## Migrating to the ml-agents-envs 0.29.0.dev0 package
- Python 3.7 is now the minimum version of python supported due to [python3.6 EOL](https://endoflife.date/python).
Please update your python installation to 3.7.2 or higher. Note: Due to an issue with the typing system, the maximum
version of python supported is python 3.9.9.
- The `gym-unity` package has been refactored into the `ml-agents-envs` package. Please update your imports accordingly.
- Example:
- Before
```python
from gym_unity.unity_gym_env import UnityToGymWrapper
```
- After:
```python
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
```
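Only the import path changes; constructing and using the wrapper is the same as before. A minimal sketch, assuming a hypothetical local build path:
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper

# Hypothetical path to a built Unity environment; replace with your own build.
unity_env = UnityEnvironment(file_name="./UnityBuild")
env = UnityToGymWrapper(unity_env, uint8_visual=True)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```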
## Migrating the package to version 2.0
- The official version of Unity that ML-Agents supports is now 2020.3 LTS. If you run
into issues, please consider deleting your project's Library folder and reopening your
@ -260,9 +279,9 @@ vector observations to be used simultaneously.
- The `play_against_current_self_ratio` self-play trainer hyperparameter has
been renamed to `play_against_latest_model_ratio`
- Removed the multi-agent gym option from the gym wrapper. For multi-agent
scenarios, use the [Low Level Python API](Python-API.md).
scenarios, use the [Low Level Python API](Python-LLAPI.md).
- The low level Python API has changed. You can look at the document
[Low Level Python API documentation](Python-API.md) for more information. If
[Low Level Python API documentation](Python-LLAPI.md) for more information. If
you use `mlagents-learn` for training, this should be a transparent change.
- The obsolete `Agent` methods `GiveModel`, `Done`, `InitializeAgent`,
`AgentAction` and `AgentReset` have been removed.
@ -487,7 +506,7 @@ vector observations to be used simultaneously.
### Important changes
- The low level Python API has changed. You can look at the document
[Low Level Python API documentation](Python-API.md) for more information. This
[Low Level Python API documentation](Python-LLAPI.md) for more information. This
should only affect you if you're writing a custom trainer; if you use
`mlagents-learn` for training, this should be a transparent change.
- `reset()` on the Low-Level Python API no longer takes a `train_mode`
@ -497,7 +516,7 @@ vector observations to be used simultaneously.
`UnityEnvironment` no longer has a `reset_parameters` field. To modify float
properties in the environment, you must use a `FloatPropertiesChannel`. For
more information, refer to the
[Low Level Python API documentation](Python-API.md)
[Low Level Python API documentation](Python-LLAPI.md)
- `CustomResetParameters` are now removed.
- The Academy no longer has a `Training Configuration` nor
`Inference Configuration` field in the inspector. To modify the configuration


@ -0,0 +1,161 @@
# Table of Contents
* [mlagents\_envs.envs.unity\_gym\_env](#mlagents_envs.envs.unity_gym_env)
* [UnityGymException](#mlagents_envs.envs.unity_gym_env.UnityGymException)
* [UnityToGymWrapper](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper)
* [\_\_init\_\_](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.__init__)
* [reset](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.reset)
* [step](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.step)
* [render](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.render)
* [close](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.close)
* [seed](#mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.seed)
* [ActionFlattener](#mlagents_envs.envs.unity_gym_env.ActionFlattener)
* [\_\_init\_\_](#mlagents_envs.envs.unity_gym_env.ActionFlattener.__init__)
* [lookup\_action](#mlagents_envs.envs.unity_gym_env.ActionFlattener.lookup_action)
<a name="mlagents_envs.envs.unity_gym_env"></a>
# mlagents\_envs.envs.unity\_gym\_env
<a name="mlagents_envs.envs.unity_gym_env.UnityGymException"></a>
## UnityGymException Objects
```python
class UnityGymException(error.Error)
```
Any error related to the gym wrapper of ml-agents.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper"></a>
## UnityToGymWrapper Objects
```python
class UnityToGymWrapper(gym.Env)
```
Provides Gym wrapper for Unity Learning Environments.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.__init__"></a>
#### \_\_init\_\_
```python
| __init__(unity_env: BaseEnv, uint8_visual: bool = False, flatten_branched: bool = False, allow_multiple_obs: bool = False, action_space_seed: Optional[int] = None)
```
Environment initialization
**Arguments**:
- `unity_env`: The Unity BaseEnv to be wrapped in the gym. Will be closed when the UnityToGymWrapper closes.
- `uint8_visual`: Return visual observations as uint8 (0-255) matrices instead of float (0.0-1.0).
- `flatten_branched`: If True, turn branched discrete action spaces into a Discrete space rather than
MultiDiscrete.
- `allow_multiple_obs`: If True, return a list of np.ndarrays as observations with the first elements
containing the visual observations and the last element containing the array of vector observations.
If False, returns a single np.ndarray containing either only a single visual observation or the array of
vector observations.
- `action_space_seed`: If non-None, will be used to set the random seed on created gym.Space instances.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.reset"></a>
#### reset
```python
| reset() -> Union[List[np.ndarray], np.ndarray]
```
Resets the state of the environment and returns an initial observation.
Returns: observation (object/list): the initial observation of the
space.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.step"></a>
#### step
```python
| step(action: List[Any]) -> GymStepResult
```
Run one timestep of the environment's dynamics. When end of
episode is reached, you are responsible for calling `reset()`
to reset this environment's state.
Accepts an action and returns a tuple (observation, reward, done, info).
**Arguments**:
- `action` _object/list_ - an action provided by the environment
**Returns**:
- `observation` _object/list_ - agent's observation of the current environment
reward (float/list) : amount of reward returned after previous action
- `done` _boolean/list_ - whether the episode has ended.
- `info` _dict_ - contains auxiliary diagnostic information.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.render"></a>
#### render
```python
| render(mode="rgb_array")
```
Return the latest visual observations.
Note that it will not render a new frame of the environment.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.close"></a>
#### close
```python
| close() -> None
```
Override _close in your subclass to perform any necessary cleanup.
Environments will automatically close() themselves when
garbage collected or when the program exits.
<a name="mlagents_envs.envs.unity_gym_env.UnityToGymWrapper.seed"></a>
#### seed
```python
| seed(seed: Any = None) -> None
```
Sets the seed for this env's random number generator(s).
Currently not implemented.
<a name="mlagents_envs.envs.unity_gym_env.ActionFlattener"></a>
## ActionFlattener Objects
```python
class ActionFlattener()
```
Flattens branched discrete action spaces into single-branch discrete action spaces.
<a name="mlagents_envs.envs.unity_gym_env.ActionFlattener.__init__"></a>
#### \_\_init\_\_
```python
| __init__(branched_action_space)
```
Initialize the flattener.
**Arguments**:
- `branched_action_space`: A List containing the sizes of each branch of the action
space, e.g. [2,3,3] for three branches with size 2, 3, and 3 respectively.
<a name="mlagents_envs.envs.unity_gym_env.ActionFlattener.lookup_action"></a>
#### lookup\_action
```python
| lookup_action(action)
```
Convert a scalar discrete action into a unique set of branched actions.
**Arguments**:
- `action`: A scalar value representing one of the discrete actions.
**Returns**:
The List containing the branched actions.
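A short usage sketch tying the pieces above together (the build path is hypothetical; `flatten_branched=True` flattens the branched discrete space into a single `Discrete` space, which is what `ActionFlattener` implements):
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs.unity_gym_env import ActionFlattener, UnityToGymWrapper

# ActionFlattener on its own: map a scalar action index back to branched actions.
flattener = ActionFlattener([2, 3, 3])   # three branches of size 2, 3 and 3
branched = flattener.lookup_action(7)    # the unique branched action for index 7

# Hypothetical path to a built Unity environment; replace with your own build.
unity_env = UnityEnvironment(file_name="./UnityBuild")
env = UnityToGymWrapper(unity_env, flatten_branched=True, action_space_seed=42)
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```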

gym-unity/README.md → docs/Python-Gym-API.md (executable file → normal file)

@ -11,17 +11,9 @@ Unity environment via Python.
## Installation
The gym wrapper can be installed using:
The gym wrapper is part of the `mlagents_envs` package. Please refer to the
[mlagents_envs installation instructions](../ml-agents-envs/README.md).
```sh
pip3 install gym_unity
```
or by running the following from the `/gym-unity` directory of the repository:
```sh
pip3 install -e .
```
## Using the Gym Wrapper
@ -29,7 +21,7 @@ The gym interface is available from `gym_unity.envs`. To launch an environment
from the root of the project repository use:
```python
from gym_unity.envs import UnityToGymWrapper
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
env = UnityToGymWrapper(unity_env, uint8_visual, flatten_branched, allow_multiple_obs)
```
@ -107,35 +99,37 @@ from baselines import deepq
from baselines import logger
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
def main():
unity_env = UnityEnvironment(<path-to-environment>)
env = UnityToGymWrapper(unity_env, uint8_visual=True)
logger.configure('./logs') # Change to log in a different directory
act = deepq.learn(
env,
"cnn", # For visual inputs
lr=2.5e-4,
total_timesteps=1000000,
buffer_size=50000,
exploration_fraction=0.05,
exploration_final_eps=0.1,
print_freq=20,
train_freq=5,
learning_starts=20000,
target_network_update_freq=50,
gamma=0.99,
prioritized_replay=False,
checkpoint_freq=1000,
checkpoint_path='./logs', # Change to save model in a different directory
dueling=True
)
print("Saving model to unity_model.pkl")
act.save("unity_model.pkl")
unity_env = UnityEnvironment( < path - to - environment >)
env = UnityToGymWrapper(unity_env, uint8_visual=True)
logger.configure('./logs') # Change to log in a different directory
act = deepq.learn(
env,
"cnn", # For visual inputs
lr=2.5e-4,
total_timesteps=1000000,
buffer_size=50000,
exploration_fraction=0.05,
exploration_final_eps=0.1,
print_freq=20,
train_freq=5,
learning_starts=20000,
target_network_update_freq=50,
gamma=0.99,
prioritized_replay=False,
checkpoint_freq=1000,
checkpoint_path='./logs', # Change to save model in a different directory
dueling=True
)
print("Saving model to unity_model.pkl")
act.save("unity_model.pkl")
if __name__ == '__main__':
main()
main()
```
To start the training process, run the following from the directory containing
@ -163,7 +157,7 @@ method using the PPO2 baseline:
```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper
from mlagents_envs.envs import UnityToGymWrapper
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.bench import Monitor
@ -173,38 +167,44 @@ import baselines.ppo2.ppo2 as ppo2
import os
try:
from mpi4py import MPI
from mpi4py import MPI
except ImportError:
MPI = None
MPI = None
def make_unity_env(env_directory, num_env, visual, start_index=0):
"""
Create a wrapped, monitored Unity environment.
"""
def make_env(rank, use_visual=True): # pylint: disable=C0111
def _thunk():
unity_env = UnityEnvironment(env_directory, base_port=5000 + rank)
env = UnityToGymWrapper(unity_env, uint8_visual=True)
env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
return env
return _thunk
if visual:
return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
else:
rank = MPI.COMM_WORLD.Get_rank() if MPI else 0
return DummyVecEnv([make_env(rank, use_visual=False)])
"""
Create a wrapped, monitored Unity environment.
"""
def make_env(rank, use_visual=True): # pylint: disable=C0111
def _thunk():
unity_env = UnityEnvironment(env_directory, base_port=5000 + rank)
env = UnityToGymWrapper(unity_env, uint8_visual=True)
env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
return env
return _thunk
if visual:
return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
else:
rank = MPI.COMM_WORLD.Get_rank() if MPI else 0
return DummyVecEnv([make_env(rank, use_visual=False)])
def main():
env = make_unity_env(<path-to-environment>, 4, True)
ppo2.learn(
network="mlp",
env=env,
total_timesteps=100000,
lr=1e-3,
)
env = make_unity_env( < path - to - environment >, 4, True)
ppo2.learn(
network="mlp",
env=env,
total_timesteps=100000,
lr=1e-3,
)
if __name__ == '__main__':
main()
main()
```
## Run Google Dopamine Algorithms
@ -236,7 +236,7 @@ instantiated, just as in the Baselines example. At the top of the file, insert
```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper
from mlagents_envs.envs import UnityToGymWrapper
```
to import the Gym Wrapper. Navigate to the `create_atari_environment` method in


@ -6,7 +6,7 @@ an entry point to train (`mlagents-learn`) which allows you to train agents in
Unity Environments using our implementations of reinforcement learning or
imitation learning. This document describes how to use the `mlagents_envs` API.
For information on using `mlagents-learn`, see [here](Training-ML-Agents.md).
For Python Low Level API documentation, see [here](Python-API-Documentation.md).
For Python Low Level API documentation, see [here](Python-LLAPI-Documentation.md).
The Python Low Level API can be used to interact directly with your Unity
learning environment. As such, it can serve as the basis for developing and


@ -0,0 +1,246 @@
# Table of Contents
* [mlagents\_envs.envs.pettingzoo\_env\_factory](#mlagents_envs.envs.pettingzoo_env_factory)
* [PettingZooEnvFactory](#mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory)
* [env](#mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory.env)
* [mlagents\_envs.envs.unity\_aec\_env](#mlagents_envs.envs.unity_aec_env)
* [UnityAECEnv](#mlagents_envs.envs.unity_aec_env.UnityAECEnv)
* [\_\_init\_\_](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.__init__)
* [step](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.step)
* [observe](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.observe)
* [last](#mlagents_envs.envs.unity_aec_env.UnityAECEnv.last)
* [mlagents\_envs.envs.unity\_parallel\_env](#mlagents_envs.envs.unity_parallel_env)
* [UnityParallelEnv](#mlagents_envs.envs.unity_parallel_env.UnityParallelEnv)
* [\_\_init\_\_](#mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.__init__)
* [reset](#mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.reset)
* [mlagents\_envs.envs.unity\_pettingzoo\_base\_env](#mlagents_envs.envs.unity_pettingzoo_base_env)
* [UnityPettingzooBaseEnv](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv)
* [observation\_spaces](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_spaces)
* [observation\_space](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_space)
* [action\_spaces](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_spaces)
* [action\_space](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_space)
* [side\_channel](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.side_channel)
* [reset](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.reset)
* [seed](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.seed)
* [render](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.render)
* [close](#mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.close)
<a name="mlagents_envs.envs.pettingzoo_env_factory"></a>
# mlagents\_envs.envs.pettingzoo\_env\_factory
<a name="mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory"></a>
## PettingZooEnvFactory Objects
```python
class PettingZooEnvFactory()
```
<a name="mlagents_envs.envs.pettingzoo_env_factory.PettingZooEnvFactory.env"></a>
#### env
```python
| env(seed: Optional[int] = None, **kwargs: Union[List, int, bool, None]) -> UnityAECEnv
```
Creates the environment with env_id from unity's default_registry and wraps it in a UnityToPettingZooWrapper
**Arguments**:
- `seed`: The seed for the action spaces of the agents.
- `kwargs`: Any argument accepted by `UnityEnvironment`class except file_name
<a name="mlagents_envs.envs.unity_aec_env"></a>
# mlagents\_envs.envs.unity\_aec\_env
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv"></a>
## UnityAECEnv Objects
```python
class UnityAECEnv(UnityPettingzooBaseEnv, AECEnv)
```
Unity AEC (PettingZoo) environment wrapper.
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv.__init__"></a>
#### \_\_init\_\_
```python
| __init__(env: BaseEnv, seed: Optional[int] = None)
```
Initializes a Unity AEC environment wrapper.
**Arguments**:
- `env`: The UnityEnvironment that is being wrapped.
- `seed`: The seed for the action spaces of the agents.
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv.step"></a>
#### step
```python
| step(action: Any) -> None
```
Sets the action of the active agent and get the observation, reward, done
and info of the next agent.
**Arguments**:
- `action`: The action for the active agent
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv.observe"></a>
#### observe
```python
| observe(agent_id)
```
Returns the observation an agent currently can make. `last()` calls this function.
<a name="mlagents_envs.envs.unity_aec_env.UnityAECEnv.last"></a>
#### last
```python
| last(observe=True)
```
returns observation, cumulative reward, done, info for the current agent (specified by self.agent_selection)
<a name="mlagents_envs.envs.unity_parallel_env"></a>
# mlagents\_envs.envs.unity\_parallel\_env
<a name="mlagents_envs.envs.unity_parallel_env.UnityParallelEnv"></a>
## UnityParallelEnv Objects
```python
class UnityParallelEnv(UnityPettingzooBaseEnv, ParallelEnv)
```
Unity Parallel (PettingZoo) environment wrapper.
<a name="mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.__init__"></a>
#### \_\_init\_\_
```python
| __init__(env: BaseEnv, seed: Optional[int] = None)
```
Initializes a Unity Parallel environment wrapper.
**Arguments**:
- `env`: The UnityEnvironment that is being wrapped.
- `seed`: The seed for the action spaces of the agents.
<a name="mlagents_envs.envs.unity_parallel_env.UnityParallelEnv.reset"></a>
#### reset
```python
| reset() -> Dict[str, Any]
```
Resets the environment.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env"></a>
# mlagents\_envs.envs.unity\_pettingzoo\_base\_env
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv"></a>
## UnityPettingzooBaseEnv Objects
```python
class UnityPettingzooBaseEnv()
```
Unity Petting Zoo base environment.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_spaces"></a>
#### observation\_spaces
```python
| @property
| observation_spaces() -> Dict[str, spaces.Space]
```
Return the observation spaces of all the agents.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.observation_space"></a>
#### observation\_space
```python
| observation_space(agent: str) -> Optional[spaces.Space]
```
The observation space of the current agent.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_spaces"></a>
#### action\_spaces
```python
| @property
| action_spaces() -> Dict[str, spaces.Space]
```
Return the action spaces of all the agents.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.action_space"></a>
#### action\_space
```python
| action_space(agent: str) -> Optional[spaces.Space]
```
The action space of the current agent.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.side_channel"></a>
#### side\_channel
```python
| @property
| side_channel() -> Dict[str, Any]
```
The side channels of the environment. You can access the side channels
of an environment with `env.side_channel[<name-of-channel>]`.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.reset"></a>
#### reset
```python
| reset()
```
Resets the environment.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.seed"></a>
#### seed
```python
| seed(seed=None)
```
Reseeds the environment (making the resulting environment deterministic).
`reset()` must be called after `seed()`, and before `step()`.
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.render"></a>
#### render
```python
| render(mode="human")
```
NOT SUPPORTED.
Displays a rendered frame from the environment, if supported.
Alternate render modes in the default environments are `'rgb_array'`
which returns a numpy array and is supported by all environments outside of classic,
and `'ansi'` which returns the strings printed (specific to classic environments).
<a name="mlagents_envs.envs.unity_pettingzoo_base_env.UnityPettingzooBaseEnv.close"></a>
#### close
```python
| close() -> None
```
Close the environment.
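Putting the classes above together, a minimal AEC loop looks roughly like this (a sketch; `StrikersVsGoalie` is one of the factories registered from the default registry):
```python
from mlagents_envs.envs import StrikersVsGoalie  # a PettingZooEnvFactory instance

env = StrikersVsGoalie.env(seed=1)   # kwargs other than file_name are forwarded to UnityEnvironment
env.reset()
for agent in env.agent_iter(env.num_agents * 10):
    observation, reward, done, info = env.last()
    # PettingZoo expects a None action once an agent is done.
    action = None if done else env.action_spaces[agent].sample()
    env.step(action)
env.close()
```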


@ -0,0 +1,54 @@
# Unity ML-Agents PettingZoo Wrapper
With the increasing interest in multi-agent training with a gym-like API, we provide a
PettingZoo Wrapper around the [Petting Zoo API](https://www.pettingzoo.ml/). Our wrapper
provides interfaces on top of our `UnityEnvironment` class, which is the default way of
interfacing with a Unity environment via Python.
## Installation and Examples
[[Colab] PettingZoo Wrapper Example](https://colab.research.google.com/github/Unity-Technologies/ml-agents/blob/develop-python-api-ga/ml-agents-envs/colabs/Colab_PettingZoo.ipynb)
This colab notebook demonstrates the example usage of the wrapper, including installation,
basic usages, and an example with our
[Striker vs Goalie environment](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#strikers-vs-goalie)
which is a multi-agent environment with multiple different behavior names.
## API interface
This wrapper is compatible with PettingZoo API. Please check out
[PettingZoo API page](https://www.pettingzoo.ml/api) for more details.
Here's an example of interacting with wrapped environment:
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs import UnityToPettingZooWrapper
unity_env = UnityEnvironment("StrikersVsGoalie")
env = UnityToPettingZooWrapper(unity_env)
env.reset()
for agent in env.agent_iter():
observation, reward, done, info = env.last()
action = policy(observation, agent)
env.step(action)
```
## Notes
- There is support for both [AEC](https://www.pettingzoo.ml/api#interacting-with-environments)
and [Parallel](https://www.pettingzoo.ml/api#parallel-api) PettingZoo APIs.
- The AEC wrapper is compatible with the PettingZoo (PZ) API interface but works in a slightly
different way under the hood. For the AEC API, instead of stepping the environment on every `env.step(action)`,
the PZ wrapper stores the action and only performs an environment step when all the
agents requesting actions in the current step have been assigned an action. This is for
performance, considering that the communication between Unity and python is more efficient
when data are sent in batches.
- Since the actions for the AEC wrapper are stored without applying them to the environment until
all the actions are queued, some components of the API might behave in unexpected ways. For example, a call
to `env.reward` should return the instantaneous reward for that particular step, but the true
reward would only be available when an actual environment step is performed. It's recommended that
you follow the API definition for training (access rewards from `env.last()` instead of
`env.reward`) and the underlying mechanism shouldn't affect training results.
- The environment will automatically reset when it is done, so `env.agent_iter(max_step)` will
keep going on until the specified max step is reached (default: `2**63`). There is no need to
call `env.reset()` except for the very beginning of instantiating an environment.


@ -51,10 +51,10 @@
## API Docs
- [API Reference](API-Reference.md)
- [Python API Documentation](Python-API-Documentation.md)
- [How to use the Python API](Python-API.md)
- [Python API Documentation](Python-LLAPI-Documentation.md)
- [How to use the Python API](Python-LLAPI.md)
- [How to use the Unity Environment Registry](Unity-Environment-Registry.md)
- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](../gym-unity/README.md)
- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](Python-Gym-API.md)
## Translations


@ -1,6 +1,6 @@
# Unity Environment Registry [Experimental]
The Unity Environment Registry is a database of pre-built Unity environments that can be easily used without having to install the Unity Editor. It is a great way to get started with our [UnityEnvironment API](Python-API.md).
The Unity Environment Registry is a database of pre-built Unity environments that can be easily used without having to install the Unity Editor. It is a great way to get started with our [UnityEnvironment API](Python-LLAPI.md).
## Loading an Environment from the Registry
@ -14,7 +14,7 @@ for name in environment_names:
print(name)
```
The `make()` method on a registry value will return a `UnityEnvironment` ready to be used. All arguments passed to the make method will be passed to the constructor of the `UnityEnvironment` as well. Refer to the documentation on the [Python-API](Python-API.md) for more information about the arguments of the `UnityEnvironment` constructor. For example, the following code will create the environment under the identifier `"my-env"`, reset it, perform a few steps and finally close it:
The `make()` method on a registry value will return a `UnityEnvironment` ready to be used. All arguments passed to the make method will be passed to the constructor of the `UnityEnvironment` as well. Refer to the documentation on the [Python-API](Python-LLAPI.md) for more information about the arguments of the `UnityEnvironment` constructor. For example, the following code will create the environment under the identifier `"my-env"`, reset it, perform a few steps and finally close it:
```python
from mlagents_envs.registry import default_registry


@ -18,8 +18,7 @@ from dependencies of other projects. This has a few advantages:
with the different version.
## Python Version Requirement (Required)
This guide has been tested with Python 3.7 through Python 3.8. Newer versions might not
This guide has been tested with Python 3.7.2 through Python 3.9.9. Newer versions might not
have support for the dependent libraries, so are not recommended.
## Installing Pip (Required)
@ -64,8 +63,7 @@ then python3-distutils needs to be installed. Install python3-distutils using
environment using the same `activate` command listed above)
Note:
- Verify that you are using Python 3.7. Launch a command prompt
using `cmd` and execute `python --version` to verify the version.
- Verify that you are using a Python version between 3.7.2 and 3.9.9. Launch a
command prompt using `cmd` and execute `python --version` to verify the version.
- Python3 installation may require admin privileges on Windows.
- This guide is for Windows 10 using a 64-bit architecture only.


@ -51,9 +51,9 @@
## API Docs
- [API Reference](API-Reference.md)
- [How to use the Python API](Python-API.md)
- [How to use the Python API](Python-LLAPI.md)
- [How to use the Unity Environment Registry](Unity-Environment-Registry.md)
- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](../gym-unity/README.md)
- [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration)](Python-Gym-API.md)
## Translations
@ -78,4 +78,4 @@ to keep them up just in case they are helpful to you.
- [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
- [Using the Video Recorder](https://github.com/Unity-Technologies/video-recorder)
-->
-->


@ -25,7 +25,7 @@ ML-Agents Academy 类按如下方式编排 agent 模拟循环:
要创建训练环境,请扩展 Academy 和 Agent 类以实现上述方法。`Agent.CollectObservations()` 和 `Agent.AgentAction()` 函数必须实现;而其他方法是可选的,即是否需要实现它们取决于您的具体情况。
**注意:**在这里用到的 Python API 也可用于其他目的。例如,借助于该 API您可以将 Unity 用作您自己的机器学习算法的模拟引擎。请参阅 [Python API](/docs/Python-API.md) 以了解更多信息。
**注意:**在这里用到的 Python API 也可用于其他目的。例如,借助于该 API您可以将 Unity 用作您自己的机器学习算法的模拟引擎。请参阅 [Python API](/docs/Python-LLAPI.md) 以了解更多信息。
## 组织 Unity 场景


@ -252,7 +252,7 @@ Internal Brain 中,以便为连接到该 Brain 的所有 Agent 生成
的 Brain 类型都会设置为 External并且场景中所有 Agent 的行为
都将在 Python 中接受控制。
我们目前没有教程介绍这种模式,但您可以在[这里](/docs/Python-API.md)
我们目前没有教程介绍这种模式,但您可以在[这里](/docs/Python-LLAPI.md)
了解有关 Python API 的更多信息。
### Curriculum Learning课程学习


@ -39,6 +39,6 @@
## API 文档
* [API 参考](/docs/API-Reference.md)
* [如何使用 Python API](/docs/Python-API.md)
* [如何使用 Python API](/docs/Python-LLAPI.md)
**注:** 有翻译版的文档会在右上角标注*号。
**注:** 有翻译版的文档会在右上角标注*号。


@ -1,5 +0,0 @@
# Version of the library that will be used to upload to pypi
__version__ = "0.29.0.dev0"
# Git tag that will be checked to determine whether to trigger upload to pypi
__release_tag__ = None


@ -1,43 +0,0 @@
#!/usr/bin/env python
import os
import sys
from setuptools import setup, find_packages
from setuptools.command.install import install
import gym_unity
VERSION = gym_unity.__version__
EXPECTED_TAG = gym_unity.__release_tag__
class VerifyVersionCommand(install):
"""
Custom command to verify that the git tag is the expected one for the release.
Originally based on https://circleci.com/blog/continuously-deploying-python-packages-to-pypi-with-circleci/
This differs slightly because our tags and versions are different.
"""
description = "verify that the git tag matches our version"
def run(self):
tag = os.getenv("GITHUB_REF", "NO GITHUB TAG!").replace("refs/tags/", "")
if tag != EXPECTED_TAG:
info = "Git tag: {} does not match the expected tag of this app: {}".format(
tag, EXPECTED_TAG
)
sys.exit(info)
setup(
name="gym_unity",
version=VERSION,
description="Unity Machine Learning Agents Gym Interface",
license="Apache License 2.0",
author="Unity Technologies",
author_email="ML-Agents@unity3d.com",
url="https://github.com/Unity-Technologies/ml-agents",
packages=find_packages(),
install_requires=["gym==0.21.0", f"mlagents_envs=={VERSION}"],
cmdclass={"verify": VerifyVersionCommand},
)


@ -2,9 +2,13 @@
The `mlagents_envs` Python package is part of the
[ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents).
`mlagents_envs` provides a Python API that allows direct interaction with the
Unity game engine. It is used by the trainer implementation in `mlagents` as
well as the `gym-unity` package to perform reinforcement learning within Unity.
`mlagents_envs` provides three Python APIs that allow direct interaction with the
Unity game engine:
- A single agent API (Gym API)
- A gym-like multi-agent API (PettingZoo API)
- A low-level API (LLAPI)
The LLAPI is used by the trainer implementation in `mlagents`.
`mlagents_envs` can be used independently of `mlagents` for Python
communication.
@ -13,13 +17,17 @@ communication.
Install the `mlagents_envs` package with:
```sh
python -m pip install mlagents_envs==0.28.0
python -m pip install mlagents_envs==0.29.0
```
## Usage & More Information
See the [Python API Guide](../docs/Python-API.md) for more information on how to
use the API to interact with a Unity environment.
See
- [Gym API Guide](../docs/Python-Gym-API.md)
- [PettingZoo API Guide](../docs/Python-PettingZoo-API.md)
- [Python API Guide](../docs/Python-LLAPI.md)
for more information on how to use the API to interact with a Unity environment.
For more information on the ML-Agents Toolkit and how to instrument a Unity
scene with the ML-Agents SDK, check out the main


@ -0,0 +1,318 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ML-Agents PettingZoo Wrapper"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#@title Install Rendering Dependencies { display-mode: \"form\" }\n",
"#@markdown (You only need to run this code when using Colab's hosted runtime)\n",
"\n",
"import os\n",
"from IPython.display import HTML, display\n",
"\n",
"def progress(value, max=100):\n",
" return HTML(\"\"\"\n",
" <progress\n",
" value='{value}'\n",
" max='{max}',\n",
" style='width: 100%'\n",
" >\n",
" {value}\n",
" </progress>\n",
" \"\"\".format(value=value, max=max))\n",
"\n",
"pro_bar = display(progress(0, 100), display_id=True)\n",
"\n",
"try:\n",
" import google.colab\n",
" INSTALL_XVFB = True\n",
"except ImportError:\n",
" INSTALL_XVFB = 'COLAB_ALWAYS_INSTALL_XVFB' in os.environ\n",
"\n",
"if INSTALL_XVFB:\n",
" with open('frame-buffer', 'w') as writefile:\n",
" writefile.write(\"\"\"#taken from https://gist.github.com/jterrace/2911875\n",
"XVFB=/usr/bin/Xvfb\n",
"XVFBARGS=\":1 -screen 0 1024x768x24 -ac +extension GLX +render -noreset\"\n",
"PIDFILE=./frame-buffer.pid\n",
"case \"$1\" in\n",
" start)\n",
" echo -n \"Starting virtual X frame buffer: Xvfb\"\n",
" /sbin/start-stop-daemon --start --quiet --pidfile $PIDFILE --make-pidfile --background --exec $XVFB -- $XVFBARGS\n",
" echo \".\"\n",
" ;;\n",
" stop)\n",
" echo -n \"Stopping virtual X frame buffer: Xvfb\"\n",
" /sbin/start-stop-daemon --stop --quiet --pidfile $PIDFILE\n",
" rm $PIDFILE\n",
" echo \".\"\n",
" ;;\n",
" restart)\n",
" $0 stop\n",
" $0 start\n",
" ;;\n",
" *)\n",
" echo \"Usage: /etc/init.d/xvfb {start|stop|restart}\"\n",
" exit 1\n",
"esac\n",
"exit 0\n",
" \"\"\")\n",
" pro_bar.update(progress(5, 100))\n",
" !apt-get install daemon >/dev/null 2>&1\n",
" pro_bar.update(progress(10, 100))\n",
" !apt-get install wget >/dev/null 2>&1\n",
" pro_bar.update(progress(20, 100))\n",
" !wget http://security.ubuntu.com/ubuntu/pool/main/libx/libxfont/libxfont1_1.5.1-1ubuntu0.16.04.4_amd64.deb >/dev/null 2>&1\n",
" pro_bar.update(progress(30, 100))\n",
" !wget --output-document xvfb.deb http://security.ubuntu.com/ubuntu/pool/universe/x/xorg-server/xvfb_1.18.4-0ubuntu0.12_amd64.deb >/dev/null 2>&1\n",
" pro_bar.update(progress(40, 100))\n",
" !dpkg -i libxfont1_1.5.1-1ubuntu0.16.04.4_amd64.deb >/dev/null 2>&1\n",
" pro_bar.update(progress(50, 100))\n",
" !dpkg -i xvfb.deb >/dev/null 2>&1\n",
" pro_bar.update(progress(70, 100))\n",
" !rm libxfont1_1.5.1-1ubuntu0.16.04.4_amd64.deb\n",
" pro_bar.update(progress(80, 100))\n",
" !rm xvfb.deb\n",
" pro_bar.update(progress(90, 100))\n",
" !bash frame-buffer start\n",
" os.environ[\"DISPLAY\"] = \":1\"\n",
"pro_bar.update(progress(100, 100))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installing ml-agents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" import mlagents\n",
" print(\"ml-agents already installed\")\n",
"except ImportError:\n",
" !git clone -b main --single-branch https://github.com/Unity-Technologies/ml-agents.git\n",
" !python -m pip install -q ./ml-agents/ml-agents-envs\n",
" !python -m pip install -q ./ml-agents/ml-agents\n",
" print(\"Installed ml-agents\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the Environment"
]
},
{
"cell_type": "markdown",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"List of available environments:\n",
"* Basic\n",
"* ThreeDBall\n",
"* ThreeDBallHard\n",
"* GridWorld\n",
"* Hallway\n",
"* VisualHallway\n",
"* CrawlerDynamicTarget\n",
"* CrawlerStaticTarget\n",
"* Bouncer\n",
"* SoccerTwos\n",
"* PushBlock\n",
"* VisualPushBlock\n",
"* WallJump\n",
"* Tennis\n",
"* Reacher\n",
"* Pyramids\n",
"* VisualPyramids\n",
"* Walker\n",
"* FoodCollector\n",
"* VisualFoodCollector\n",
"* StrikersVsGoalie\n",
"* WormStaticTarget\n",
"* WormDynamicTarget"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start Environment with PettingZoo Wrapper"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YSf-WhxbqtLw"
},
"outputs": [],
"source": [
"# -----------------\n",
"# This code is used to close an env that might not have been closed before\n",
"try:\n",
" env.close()\n",
"except:\n",
" pass\n",
"# -----------------\n",
"\n",
"import numpy as np\n",
"from mlagents_envs.envs import StrikersVsGoalie # import unity environment\n",
"env = StrikersVsGoalie.env()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stepping the environment\n",
"\n",
"Example of interacting with the environment in basic RL loop. It follows the same interface as described in [PettingZoo API page](https://www.pettingzoo.ml/api)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dhtl0mpeqxYi"
},
"outputs": [],
"source": [
"num_cycles = 10\n",
"\n",
"env.reset()\n",
"for agent in env.agent_iter(env.num_agents * num_cycles):\n",
" prev_observe, reward, done, info = env.last()\n",
" if isinstance(prev_observe, dict) and 'action_mask' in prev_observe:\n",
" action_mask = prev_observe['action_mask']\n",
" if done:\n",
" action = None\n",
" else:\n",
" action = env.action_spaces[agent].sample() # randomly choose an action for example\n",
" env.step(action)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional Environment API\n",
"\n",
"All the API described in the `Additional Environment API` section in the [PettingZoo API page](https://www.pettingzoo.ml/api) are all supported. A few examples are shown below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# `agents`: a list of the names of all current agents\n",
"print(\"Agent names:\", env.agents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# `agent_selection`: the currently agent that an action can be taken for.\n",
"print(\"Current agent:\", env.agent_selection)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# `observation_spaces`: a dict of the observation spaces of every agent, keyed by name.\n",
"print(\"Observation space of current agent:\", env.observation_spaces[env.agent_selection])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# `action_spaces`: a dict of the observation spaces of every agent, keyed by name.\n",
"print(\"Action space of current agent:\", env.action_spaces[env.agent_selection])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Close the Environment to free the port it is using"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "a7KatdThq7OV"
},
"outputs": [],
"source": [
"env.close()"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "Colab-UnityEnvironment-1-Run.ipynb",
"private_outputs": true,
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@ -0,0 +1,15 @@
from mlagents_envs.registry import default_registry
from mlagents_envs.envs.pettingzoo_env_factory import logger, PettingZooEnvFactory
# Register each environment in default_registry as a PettingZooEnv
for key in default_registry:
env_name = key
if key[0].isdigit():
env_name = key.replace("3", "Three")
if not env_name.isidentifier():
logger.warning(
f"Environment id {env_name} can not be registered since it is"
f"not a valid identifier name."
)
continue
locals()[env_name] = PettingZooEnvFactory(key)
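The net effect of this loop (a sketch; the exact names depend on the contents of `default_registry`) is that every valid registry entry becomes an importable factory, with identifiers that start with a digit renamed, e.g. `3DBall` is exposed as `ThreeDBall`:
```python
# Each valid registry key becomes an attribute of mlagents_envs.envs.
from mlagents_envs.envs import ThreeDBall  # registry key "3DBall", renamed by the loop above

env = ThreeDBall.env()  # PettingZooEnvFactory.env() -> UnityAECEnv
env.reset()
env.close()
```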


@ -0,0 +1,76 @@
from urllib.parse import urlparse, parse_qs
def _behavior_to_agent_id(behavior_name: str, unique_id: int) -> str:
return f"{behavior_name}?agent_id={unique_id}"
def _agent_id_to_behavior(agent_id: str) -> str:
return agent_id.split("?agent_id=")[0]
def _unwrap_batch_steps(batch_steps, behavior_name):
decision_batch, termination_batch = batch_steps
decision_id = [
_behavior_to_agent_id(behavior_name, i) for i in decision_batch.agent_id
]
termination_id = [
_behavior_to_agent_id(behavior_name, i) for i in termination_batch.agent_id
]
agents = decision_id + termination_id
obs = {
agent_id: [batch_obs[i] for batch_obs in termination_batch.obs]
for i, agent_id in enumerate(termination_id)
}
if decision_batch.action_mask is not None:
obs.update(
{
agent_id: {
"observation": [batch_obs[i] for batch_obs in decision_batch.obs],
"action_mask": [mask[i] for mask in decision_batch.action_mask],
}
for i, agent_id in enumerate(decision_id)
}
)
else:
obs.update(
{
agent_id: [batch_obs[i] for batch_obs in decision_batch.obs]
for i, agent_id in enumerate(decision_id)
}
)
obs = {k: v if len(v) > 1 else v[0] for k, v in obs.items()}
dones = {agent_id: True for agent_id in termination_id}
dones.update({agent_id: False for agent_id in decision_id})
rewards = {
agent_id: termination_batch.reward[i]
for i, agent_id in enumerate(termination_id)
}
rewards.update(
{agent_id: decision_batch.reward[i] for i, agent_id in enumerate(decision_id)}
)
cumulative_rewards = {k: v for k, v in rewards.items()}
infos = {}
for i, agent_id in enumerate(decision_id):
infos[agent_id] = {}
infos[agent_id]["behavior_name"] = behavior_name
infos[agent_id]["group_id"] = decision_batch.group_id[i]
infos[agent_id]["group_reward"] = decision_batch.group_reward[i]
for i, agent_id in enumerate(termination_id):
infos[agent_id] = {}
infos[agent_id]["behavior_name"] = behavior_name
infos[agent_id]["group_id"] = termination_batch.group_id[i]
infos[agent_id]["group_reward"] = termination_batch.group_reward[i]
infos[agent_id]["interrupted"] = termination_batch.interrupted[i]
id_map = {agent_id: i for i, agent_id in enumerate(decision_id)}
return agents, obs, dones, rewards, cumulative_rewards, infos, id_map
def _parse_behavior(full_behavior):
parsed = urlparse(full_behavior)
name = parsed.path
ids = parse_qs(parsed.query)
team_id: int = 0
if "team" in ids:
team_id = int(ids["team"][0])
return name, team_id
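For orientation, the values these private helpers produce for a typical ML-Agents behavior name (the results follow directly from the string handling above):
```python
# Behavior name plus per-agent index -> PettingZoo agent id, and back again.
agent_id = _behavior_to_agent_id("Striker?team=0", 2)   # "Striker?team=0?agent_id=2"
behavior = _agent_id_to_behavior(agent_id)              # "Striker?team=0"

# Split a full behavior name into its base name and team id.
name, team_id = _parse_behavior("Striker?team=0")       # ("Striker", 0)
```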


@ -0,0 +1,50 @@
from typing import Optional, Union, List
from mlagents_envs import logging_util
from mlagents_envs.exception import UnityWorkerInUseException
from mlagents_envs.registry import default_registry
from mlagents_envs.side_channel.engine_configuration_channel import (
EngineConfigurationChannel,
)
from mlagents_envs.side_channel.environment_parameters_channel import (
EnvironmentParametersChannel,
)
from mlagents_envs.side_channel.stats_side_channel import StatsSideChannel
from mlagents_envs.envs.unity_aec_env import UnityAECEnv
logger = logging_util.get_logger(__name__)
class PettingZooEnvFactory:
def __init__(self, env_id: str) -> None:
self.env_id = env_id
def env(
self, seed: Optional[int] = None, **kwargs: Union[List, int, bool, None]
) -> UnityAECEnv:
"""
Creates the environment with env_id from Unity's default_registry and wraps it in a UnityAECEnv.
:param seed: The seed for the action spaces of the agents.
:param kwargs: Any argument accepted by the `UnityEnvironment` class except file_name.
"""
# If no side_channels are specified, add the following defaults
if "side_channels" not in kwargs:
kwargs["side_channels"] = [
EngineConfigurationChannel(),
EnvironmentParametersChannel(),
StatsSideChannel(),
]
_env = None
# If no base port argument is provided, try ports starting at 6000 until one is free
if "base_port" not in kwargs:
port = 6000
while _env is None:
try:
kwargs["base_port"] = port
_env = default_registry[self.env_id].make(**kwargs)
except UnityWorkerInUseException:
port += 1
pass
else:
_env = default_registry[self.env_id].make(**kwargs)
return UnityAECEnv(_env, seed)
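A hedged usage sketch for the factory: every keyword argument except file_name is forwarded to the registry entry's make(), and the default side channels added above remain reachable through the wrapper (the registry id and time_scale value are illustrative):

from mlagents_envs.envs.pettingzoo_env_factory import PettingZooEnvFactory

factory = PettingZooEnvFactory("StrikersVsGoalie")  # assumed registry id
env = factory.env(seed=1, no_graphics=True)  # kwargs are forwarded to the Unity environment
# EngineConfigurationChannel is one of the defaults added when side_channels is unset.
env.side_channel["EngineConfigurationChannel"].set_configuration_parameters(time_scale=20.0)
env.reset()
env.close()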


@ -0,0 +1,72 @@
from typing import Any, Optional
from gym import error
from mlagents_envs.base_env import BaseEnv
from pettingzoo import AECEnv
from mlagents_envs.envs.unity_pettingzoo_base_env import UnityPettingzooBaseEnv
class UnityAECEnv(UnityPettingzooBaseEnv, AECEnv):
"""
Unity AEC (PettingZoo) environment wrapper.
"""
def __init__(self, env: BaseEnv, seed: Optional[int] = None):
"""
Initializes a Unity AEC environment wrapper.
:param env: The UnityEnvironment that is being wrapped.
:param seed: The seed for the action spaces of the agents.
"""
super().__init__(env, seed)
def step(self, action: Any) -> None:
"""
Sets the action of the active agent and gets the observation, reward, done
and info of the next agent.
:param action: The action for the active agent
"""
self._assert_loaded()
if len(self._live_agents) <= 0:
raise error.Error(
"You must reset the environment before you can perform a step"
)
# Process action
current_agent = self._agents[self._agent_index]
self._process_action(current_agent, action)
self._agent_index += 1
# Reset reward
for k in self._rewards.keys():
self._rewards[k] = 0
if self._agent_index >= len(self._agents) and self.num_agents > 0:
# The index is too high, time to set the action for the agents we have
self._step()
self._live_agents.sort() # unnecessary, only for passing API test
def observe(self, agent_id):
"""
Returns the observation an agent currently can make. `last()` calls this function.
"""
return (
self._observations[agent_id],
self._cumm_rewards[agent_id],
self._dones[agent_id],
self._infos[agent_id],
)
def last(self, observe=True):
"""
Returns the observation, cumulative reward, done, and info for the current agent (specified by self.agent_selection).
"""
obs, reward, done, info = self.observe(self._agents[self._agent_index])
return obs if observe else None, reward, done, info
@property
def agent_selection(self):
if not self._live_agents:
# If we had an agent finish then return that agent even though it isn't alive.
return self._agents[0]
return self._agents[self._agent_index]
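For context, the AEC interaction loop this wrapper targets looks roughly as follows; agent_iter() is inherited from pettingzoo's AECEnv, and passing a None action for a finished agent removes it from the live set (registry id assumed, as before):

from mlagents_envs.registry import default_registry
from mlagents_envs.envs.unity_aec_env import UnityAECEnv

unity_env = default_registry["StrikersVsGoalie"].make(no_graphics=True)  # assumed id
env = UnityAECEnv(unity_env, seed=1)
env.reset()
for agent in env.agent_iter(max_iter=200):
    obs, reward, done, info = env.last()
    # Finished agents must receive a None action.
    action = None if done else env.action_spaces[agent].sample()
    env.step(action)
env.close()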


@ -19,8 +19,6 @@ class UnityGymException(error.Error):
logger = logging_util.get_logger(__name__)
logging_util.set_log_level(logging_util.INFO)
GymStepResult = Tuple[np.ndarray, float, bool, Dict]
@ -58,7 +56,7 @@ class UnityToGymWrapper(gym.Env):
self.visual_obs = None
# Save the step result from the last time all Agents requested decisions.
- self._previous_decision_step: DecisionSteps = None
+ self._previous_decision_step: Optional[DecisionSteps] = None
self._flattener = None
# Hidden flag used by Atari environments to determine if the game is over
self.game_over = False
@ -355,7 +353,7 @@ class ActionFlattener:
def lookup_action(self, action):
"""
Convert a scalar discrete action into a unique set of branched actions.
- :param: action: A scalar value representing one of the discrete actions.
- :return: The List containing the branched actions.
+ :param action: A scalar value representing one of the discrete actions.
+ :returns: The List containing the branched actions.
"""
return self.action_lookup[action]


@ -0,0 +1,53 @@
from typing import Optional, Dict, Any, Tuple
from gym import error
from mlagents_envs.base_env import BaseEnv
from pettingzoo import ParallelEnv
from mlagents_envs.envs.unity_pettingzoo_base_env import UnityPettingzooBaseEnv
class UnityParallelEnv(UnityPettingzooBaseEnv, ParallelEnv):
"""
Unity Parallel (PettingZoo) environment wrapper.
"""
def __init__(self, env: BaseEnv, seed: Optional[int] = None):
"""
Initializes a Unity Parallel environment wrapper.
:param env: The UnityEnvironment that is being wrapped.
:param seed: The seed for the action spaces of the agents.
"""
super().__init__(env, seed)
def reset(self) -> Dict[str, Any]:
"""
Resets the environment.
"""
super().reset()
return self._observations
def step(self, actions: Dict[str, Any]) -> Tuple:
self._assert_loaded()
if len(self._live_agents) <= 0 and actions:
raise error.Error(
"You must reset the environment before you can perform a step."
)
# Process actions
for current_agent, action in actions.items():
self._process_action(current_agent, action)
# Reset reward
for k in self._rewards.keys():
self._rewards[k] = 0
# Step environment
self._step()
# Agent cleanup and sorting
self._cleanup_agents()
self._live_agents.sort() # unnecessary, only for passing API test
return self._observations, self._rewards, self._dones, self._infos
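The corresponding parallel loop, again only a sketch under the same assumptions; step() returns the (observations, rewards, dones, infos) dictionaries assembled above:

from mlagents_envs.registry import default_registry
from mlagents_envs.envs.unity_parallel_env import UnityParallelEnv

unity_env = default_registry["StrikersVsGoalie"].make(no_graphics=True)  # assumed id
env = UnityParallelEnv(unity_env, seed=1)
observations = env.reset()
for _ in range(200):
    if not env.agents:
        break
    actions = {agent: env.action_spaces[agent].sample() for agent in env.agents}
    observations, rewards, dones, infos = env.step(actions)
env.close()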


@ -0,0 +1,317 @@
import atexit
from typing import Optional, List, Set, Dict, Any, Tuple
import numpy as np
from gym import error, spaces
from mlagents_envs.base_env import BaseEnv, ActionTuple
from mlagents_envs.envs.env_helpers import _agent_id_to_behavior, _unwrap_batch_steps
class UnityPettingzooBaseEnv:
"""
Unity Petting Zoo base environment.
"""
def __init__(
self, env: BaseEnv, seed: Optional[int] = None, metadata: Optional[dict] = None
):
super().__init__()
atexit.register(self.close)
self._env = env
self.metadata = metadata
self._assert_loaded()
self._agent_index = 0
self._seed = seed
self._side_channel_dict = {
type(v).__name__: v
for v in self._env._side_channel_manager._side_channels_dict.values() # type: ignore
}
self._live_agents: List[str] = [] # agent id for agents alive
self._agents: List[str] = [] # all agent id in current step
self._possible_agents: Set[str] = set()  # all agents that have ever appeared
self._agent_id_to_index: Dict[str, int] = {} # agent_id: index in decision step
self._observations: Dict[str, np.ndarray] = {} # agent_id: obs
self._dones: Dict[str, bool] = {} # agent_id: done
self._rewards: Dict[str, float] = {} # agent_id: reward
self._cumm_rewards: Dict[str, float] = {}  # agent_id: cumulative reward
self._infos: Dict[str, Dict] = {} # agent_id: info
self._action_spaces: Dict[str, spaces.Space] = {} # behavior_name: action_space
self._observation_spaces: Dict[
str, spaces.Space
] = {} # behavior_name: obs_space
self._current_action: Dict[str, ActionTuple] = {} # behavior_name: ActionTuple
# Take a single step so that the brain information will be sent over
if not self._env.behavior_specs:
self._env.step()
for behavior_name in self._env.behavior_specs.keys():
_, _, _ = self._batch_update(behavior_name)
self._update_observation_spaces()
self._update_action_spaces()
def _assert_loaded(self) -> None:
if self._env is None:
raise error.Error("No environment loaded")
@property
def observation_spaces(self) -> Dict[str, spaces.Space]:
"""
Return the observation spaces of all the agents.
"""
return {
agent_id: self._observation_spaces[_agent_id_to_behavior(agent_id)]
for agent_id in self._possible_agents
}
def observation_space(self, agent: str) -> Optional[spaces.Space]:
"""
The observation space of the current agent.
"""
behavior_name = _agent_id_to_behavior(agent)
return self._observation_spaces[behavior_name]
def _update_observation_spaces(self) -> None:
self._assert_loaded()
for behavior_name in self._env.behavior_specs.keys():
if behavior_name not in self._observation_spaces:
obs_spec = self._env.behavior_specs[behavior_name].observation_specs
obs_spaces = tuple(
spaces.Box(
low=-np.float32(np.inf),
high=np.float32(np.inf),
shape=spec.shape,
dtype=np.float32,
)
for spec in obs_spec
)
if len(obs_spaces) == 1:
self._observation_spaces[behavior_name] = obs_spaces[0]
else:
self._observation_spaces[behavior_name] = spaces.Tuple(obs_spaces)
@property
def action_spaces(self) -> Dict[str, spaces.Space]:
"""
Return the action spaces of all the agents.
"""
return {
agent_id: self._action_spaces[_agent_id_to_behavior(agent_id)]
for agent_id in self._possible_agents
}
def action_space(self, agent: str) -> Optional[spaces.Space]:
"""
The action space of the current agent.
"""
behavior_name = _agent_id_to_behavior(agent)
return self._action_spaces[behavior_name]
def _update_action_spaces(self) -> None:
self._assert_loaded()
for behavior_name in self._env.behavior_specs.keys():
if behavior_name not in self._action_spaces:
act_spec = self._env.behavior_specs[behavior_name].action_spec
if (
act_spec.continuous_size == 0
and len(act_spec.discrete_branches) == 0
):
raise error.Error("No actions found")
if act_spec.discrete_size == 1:
d_space = spaces.Discrete(act_spec.discrete_branches[0])
if self._seed is not None:
d_space.seed(self._seed)
if act_spec.continuous_size == 0:
self._action_spaces[behavior_name] = d_space
continue
if act_spec.discrete_size > 0:
d_space = spaces.MultiDiscrete(act_spec.discrete_branches)
if self._seed is not None:
d_space.seed(self._seed)
if act_spec.continuous_size == 0:
self._action_spaces[behavior_name] = d_space
continue
if act_spec.continuous_size > 0:
c_space = spaces.Box(
-1, 1, (act_spec.continuous_size,), dtype=np.float32  # continuous actions need a float dtype
)
if self._seed is not None:
c_space.seed(self._seed)
if len(act_spec.discrete_branches) == 0:
self._action_spaces[behavior_name] = c_space
continue
self._action_spaces[behavior_name] = spaces.Tuple((c_space, d_space))
def _process_action(self, current_agent, action):
current_action_space = self.action_space(current_agent)
# Convert actions
if action is not None:
if isinstance(action, Tuple):
action = tuple(np.array(a) for a in action)
else:
action = self._action_to_np(current_action_space, action)
if not current_action_space.contains(action): # type: ignore
raise error.Error(
f"Invalid action, got {action} but was expecting action from {self.action_space}"
)
if isinstance(current_action_space, spaces.Tuple):
action = ActionTuple(action[0], action[1])
elif isinstance(current_action_space, spaces.MultiDiscrete):
action = ActionTuple(None, action)
elif isinstance(current_action_space, spaces.Discrete):
action = ActionTuple(None, np.array(action).reshape(1, 1))
else:
action = ActionTuple(action, None)
if not self._dones[current_agent]:
current_behavior = _agent_id_to_behavior(current_agent)
current_index = self._agent_id_to_index[current_agent]
if action.continuous is not None:
self._current_action[current_behavior].continuous[
current_index
] = action.continuous[0]
if action.discrete is not None:
self._current_action[current_behavior].discrete[
current_index
] = action.discrete[0]
else:
self._live_agents.remove(current_agent)
del self._observations[current_agent]
del self._dones[current_agent]
del self._rewards[current_agent]
del self._cumm_rewards[current_agent]
del self._infos[current_agent]
def _step(self):
for behavior_name, actions in self._current_action.items():
self._env.set_actions(behavior_name, actions)
self._env.step()
self._reset_states()
for behavior_name in self._env.behavior_specs.keys():
dones, rewards, cumulative_rewards = self._batch_update(behavior_name)
self._dones.update(dones)
self._rewards.update(rewards)
self._cumm_rewards.update(cumulative_rewards)
self._agent_index = 0
def _cleanup_agents(self):
for current_agent, done in self.dones.items():
if done:
self._live_agents.remove(current_agent)
@property
def side_channel(self) -> Dict[str, Any]:
"""
The side channels of the environment. You can access the side channels
of an environment with `env.side_channel[<name-of-channel>]`.
"""
self._assert_loaded()
return self._side_channel_dict
@staticmethod
def _action_to_np(current_action_space, action):
return np.array(action, dtype=current_action_space.dtype)
def _create_empty_actions(self, behavior_name, num_agents):
a_spec = self._env.behavior_specs[behavior_name].action_spec
return ActionTuple(
np.zeros((num_agents, a_spec.continuous_size), dtype=np.float32),
np.zeros((num_agents, len(a_spec.discrete_branches)), dtype=np.int32),
)
@property
def _cumulative_rewards(self):
return self._cumm_rewards
def _reset_states(self):
self._live_agents = []
self._agents = []
self._observations = {}
self._dones = {}
self._rewards = {}
self._cumm_rewards = {}
self._infos = {}
self._agent_id_to_index = {}
def reset(self):
"""
Resets the environment.
"""
self._assert_loaded()
self._agent_index = 0
self._reset_states()
self._possible_agents = set()
self._env.reset()
for behavior_name in self._env.behavior_specs.keys():
_, _, _ = self._batch_update(behavior_name)
self._live_agents.sort() # unnecessary, only for passing API test
self._dones = {agent: False for agent in self._agents}
self._rewards = {agent: 0 for agent in self._agents}
self._cumm_rewards = {agent: 0 for agent in self._agents}
def _batch_update(self, behavior_name):
current_batch = self._env.get_steps(behavior_name)
self._current_action[behavior_name] = self._create_empty_actions(
behavior_name, len(current_batch[0])
)
agents, obs, dones, rewards, cumulative_rewards, infos, id_map = _unwrap_batch_steps(
current_batch, behavior_name
)
self._live_agents += agents
self._agents += agents
self._observations.update(obs)
self._infos.update(infos)
self._agent_id_to_index.update(id_map)
self._possible_agents.update(agents)
return dones, rewards, cumulative_rewards
def seed(self, seed=None):
"""
Reseeds the environment (making the resulting environment deterministic).
`reset()` must be called after `seed()`, and before `step()`.
"""
self._seed = seed
def render(self, mode="human"):
"""
NOT SUPPORTED.
Displays a rendered frame from the environment, if supported.
Alternate render modes in the default environments are `'rgb_array'`
which returns a numpy array and is supported by all environments outside of classic,
and `'ansi'` which returns the strings printed (specific to classic environments).
"""
pass
@property
def dones(self):
return dict(self._dones)
@property
def agents(self):
return sorted(self._live_agents)
@property
def rewards(self):
return dict(self._rewards)
@property
def infos(self):
return dict(self._infos)
@property
def possible_agents(self):
return sorted(self._possible_agents)
def close(self) -> None:
"""
Close the environment.
"""
if self._env is not None:
self._env.close()
self._env = None # type: ignore
def __del__(self) -> None:
self.close()
def state(self):
pass
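To make the branching in _update_action_spaces and _process_action easier to follow, the implied spec-to-space mapping is sketched below with made-up ActionSpec sizes; this is illustrative, not an exhaustive statement of the API:

from gym import spaces
import numpy as np

# Mapping implied by _update_action_spaces (sizes are made up):
#   continuous_size=0, discrete_branches=(3,)   -> spaces.Discrete(3)
#   continuous_size=0, discrete_branches=(2, 3) -> spaces.MultiDiscrete([2, 3])
#   continuous_size=4, discrete_branches=()     -> spaces.Box(-1, 1, (4,), np.float32)
#   continuous_size=4, discrete_branches=(2, 3) -> spaces.Tuple((Box, MultiDiscrete))
# _process_action then repacks a sampled action into an ActionTuple, e.g. for the hybrid case:
hybrid = spaces.Tuple(
    (spaces.Box(-1, 1, (4,), dtype=np.float32), spaces.MultiDiscrete([2, 3]))
)
continuous_part, discrete_part = hybrid.sample()  # maps to ActionTuple(continuous, discrete)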


@ -2,7 +2,18 @@
folder: docs
modules:
- name: mlagents_envs
- file_name: Python-API-Documentation.md
+ file_name: Python-Gym-API-Documentation.md
submodules:
- envs.unity_gym_env
- name: mlagents_envs
file_name: Python-PettingZoo-API-Documentation.md
submodules:
- envs.pettingzoo_env_factory
- envs.unity_aec_env
- envs.unity_parallel_env
- envs.unity_pettingzoo_base_env
- name: mlagents_envs
file_name: Python-LLAPI-Documentation.md
submodules:
- base_env
- environment


@ -43,7 +43,9 @@ setup(
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
],
- packages=find_packages(exclude=["*.tests", "*.tests.*", "tests.*", "tests"]),
+ packages=find_packages(
+ exclude=["*.tests", "*.tests.*", "tests.*", "tests", "colabs", "*.ipynb"]
+ ),
zip_safe=False,
install_requires=[
"cloudpickle",
@ -52,6 +54,9 @@ setup(
"Pillow>=4.2.1",
"protobuf>=3.6",
"pyyaml>=3.1.0",
"gym==0.21.0",
"pettingzoo==1.14.0",
"numpy==1.21.2",
],
python_requires=">=3.7.2",
cmdclass={"verify": VerifyVersionCommand},


@ -0,0 +1,111 @@
from typing import List, Tuple
from mlagents_envs.base_env import ObservationSpec, DimensionProperty, ObservationType
import pytest
import copy
import os
from mlagents.trainers.settings import (
POCASettings,
TrainerSettings,
PPOSettings,
SACSettings,
GAILSettings,
CuriositySettings,
RewardSignalSettings,
NetworkSettings,
TrainerType,
RewardSignalType,
ScheduleType,
)
CONTINUOUS_DEMO_PATH = os.path.dirname(os.path.abspath(__file__)) + "/test.demo"
DISCRETE_DEMO_PATH = os.path.dirname(os.path.abspath(__file__)) + "/testdcvis.demo"
_PPO_CONFIG = TrainerSettings(
trainer_type=TrainerType.PPO,
hyperparameters=PPOSettings(
learning_rate=5.0e-3,
learning_rate_schedule=ScheduleType.CONSTANT,
batch_size=16,
buffer_size=64,
),
network_settings=NetworkSettings(num_layers=1, hidden_units=32),
summary_freq=500,
max_steps=3000,
threaded=False,
)
_SAC_CONFIG = TrainerSettings(
trainer_type=TrainerType.SAC,
hyperparameters=SACSettings(
learning_rate=5.0e-3,
learning_rate_schedule=ScheduleType.CONSTANT,
batch_size=8,
buffer_init_steps=100,
buffer_size=5000,
tau=0.01,
init_entcoef=0.01,
),
network_settings=NetworkSettings(num_layers=1, hidden_units=16),
summary_freq=100,
max_steps=1000,
threaded=False,
)
_POCA_CONFIG = TrainerSettings(
trainer_type=TrainerType.POCA,
hyperparameters=POCASettings(
learning_rate=5.0e-3,
learning_rate_schedule=ScheduleType.CONSTANT,
batch_size=16,
buffer_size=64,
),
network_settings=NetworkSettings(num_layers=1, hidden_units=32),
summary_freq=500,
max_steps=3000,
threaded=False,
)
def ppo_dummy_config():
return copy.deepcopy(_PPO_CONFIG)
def sac_dummy_config():
return copy.deepcopy(_SAC_CONFIG)
def poca_dummy_config():
return copy.deepcopy(_POCA_CONFIG)
@pytest.fixture
def gail_dummy_config():
return {RewardSignalType.GAIL: GAILSettings(demo_path=CONTINUOUS_DEMO_PATH)}
@pytest.fixture
def curiosity_dummy_config():
return {RewardSignalType.CURIOSITY: CuriositySettings()}
@pytest.fixture
def extrinsic_dummy_config():
return {RewardSignalType.EXTRINSIC: RewardSignalSettings()}
def create_observation_specs_with_shapes(
shapes: List[Tuple[int, ...]]
) -> List[ObservationSpec]:
obs_specs: List[ObservationSpec] = []
for i, shape in enumerate(shapes):
dim_prop = (DimensionProperty.UNSPECIFIED,) * len(shape)
if len(shape) == 2:
dim_prop = (DimensionProperty.VARIABLE_SIZE, DimensionProperty.NONE)
spec = ObservationSpec(
name=f"observation {i} with shape {shape}",
shape=shape,
dimension_property=dim_prop,
observation_type=ObservationType.DEFAULT,
)
obs_specs.append(spec)
return obs_specs


@ -0,0 +1,510 @@
"""
Copied from ml-agents/mlagents/trainers/tests/simple_test_envs.py
Modified the env so that it doesn't automatically reset and respawn agents, in order to pass the
pettingzoo API tests, since the current PZ API test doesn't allow spawning new agents.
"""
import random
from typing import Dict, List, Any, Tuple
import numpy as np
from mlagents_envs.base_env import (
ActionSpec,
ObservationSpec,
ObservationType,
ActionTuple,
BaseEnv,
BehaviorSpec,
DecisionSteps,
TerminalSteps,
BehaviorMapping,
)
from mlagents_envs.side_channel.side_channel_manager import SideChannelManager
from dummy_config import create_observation_specs_with_shapes
OBS_SIZE = 1
VIS_OBS_SIZE = (20, 20, 3)
VAR_LEN_SIZE = (10, 5)
STEP_SIZE = 0.2
TIME_PENALTY = 0.01
MIN_STEPS = int(1.0 / STEP_SIZE) + 1
SUCCESS_REWARD = 1.0 + MIN_STEPS * TIME_PENALTY
def clamp(x, min_val, max_val):
return max(min_val, min(x, max_val))
class SimpleEnvironment(BaseEnv):
"""
Very simple "game" - the agent has a position on [-1, 1], gets a reward of 1 if it reaches 1, and a reward of -1 if
it reaches -1. The position is incremented by the action amount (clamped to [-step_size, step_size]).
"""
def __init__(
self,
brain_names,
step_size=STEP_SIZE,
num_visual=0,
num_vector=1,
num_var_len=0,
vis_obs_size=VIS_OBS_SIZE,
vec_obs_size=OBS_SIZE,
var_len_obs_size=VAR_LEN_SIZE,
action_sizes=(1, 0),
goal_indices=None,
):
super().__init__()
self.num_visual = num_visual
self.num_vector = num_vector
self.num_var_len = num_var_len
self.vis_obs_size = vis_obs_size
self.vec_obs_size = vec_obs_size
self.var_len_obs_size = var_len_obs_size
self.goal_indices = goal_indices
continuous_action_size, discrete_action_size = action_sizes
discrete_tuple = tuple(2 for _ in range(discrete_action_size))
action_spec = ActionSpec(continuous_action_size, discrete_tuple)
self.total_action_size = (
continuous_action_size + discrete_action_size
) # to set the goals/positions
self.action_spec = action_spec
self.behavior_spec = BehaviorSpec(self._make_observation_specs(), action_spec)
self.action_spec = action_spec
self.names = brain_names
self.positions: Dict[str, List[float]] = {}
self.step_count: Dict[str, float] = {}
self._side_channel_manager = SideChannelManager([])
# Concatenate the arguments for a consistent random seed
seed = (
brain_names,
step_size,
num_visual,
num_vector,
num_var_len,
vis_obs_size,
vec_obs_size,
var_len_obs_size,
action_sizes,
)
self.random = random.Random(str(seed))
self.goal: Dict[str, int] = {}
self.action = {}
self.rewards: Dict[str, float] = {}
self.final_rewards: Dict[str, List[float]] = {}
self.step_result: Dict[str, Tuple[DecisionSteps, TerminalSteps]] = {}
self.agent_id: Dict[str, int] = {}
self.step_size = step_size # defines the difficulty of the test
# Allow this env to be used as a UnityEnvironment during tests
self.academy_capabilities = None
for name in self.names:
self.agent_id[name] = 0
self.goal[name] = self.random.choice([-1, 1])
self.rewards[name] = 0
self.final_rewards[name] = []
self._reset_agent(name)
self.action[name] = None
self.step_result[name] = None
def _make_observation_specs(self) -> List[ObservationSpec]:
obs_shape: List[Any] = []
for _ in range(self.num_vector):
obs_shape.append((self.vec_obs_size,))
for _ in range(self.num_visual):
obs_shape.append(self.vis_obs_size)
for _ in range(self.num_var_len):
obs_shape.append(self.var_len_obs_size)
obs_spec = create_observation_specs_with_shapes(obs_shape)
if self.goal_indices is not None:
for i in range(len(obs_spec)):
if i in self.goal_indices:
obs_spec[i] = ObservationSpec(
shape=obs_spec[i].shape,
dimension_property=obs_spec[i].dimension_property,
observation_type=ObservationType.GOAL_SIGNAL,
name=obs_spec[i].name,
)
return obs_spec
def _make_obs(self, value: float) -> List[np.ndarray]:
obs = []
for _ in range(self.num_vector):
obs.append(np.ones((1, self.vec_obs_size), dtype=np.float32) * value)
for _ in range(self.num_visual):
obs.append(np.ones((1,) + self.vis_obs_size, dtype=np.float32) * value)
for _ in range(self.num_var_len):
obs.append(np.ones((1,) + self.var_len_obs_size, dtype=np.float32) * value)
return obs
@property
def behavior_specs(self):
behavior_dict = {}
for n in self.names:
behavior_dict[n] = self.behavior_spec
return BehaviorMapping(behavior_dict)
def set_action_for_agent(self, behavior_name, agent_id, action):
pass
def set_actions(self, behavior_name, action):
self.action[behavior_name] = action
def get_steps(self, behavior_name):
return self.step_result[behavior_name]
def _take_action(self, name: str) -> bool:
deltas = []
_act = self.action[name]
if self.action_spec.continuous_size > 0 and _act.continuous is not None:
for _cont in _act.continuous[0]:
deltas.append(_cont)
if self.action_spec.discrete_size > 0 and _act.discrete is not None:
for _disc in _act.discrete[0]:
deltas.append(1 if _disc else -1)
for i, _delta in enumerate(deltas):
_delta = clamp(_delta, -self.step_size, self.step_size)
self.positions[name][i] += _delta
self.positions[name][i] = clamp(self.positions[name][i], -1, 1)
self.step_count[name] += 1
# All positions must reach either -1.0 or 1.0 to be done
done = all(pos >= 1.0 or pos <= -1.0 for pos in self.positions[name])
return done
def _generate_mask(self):
action_mask = None
if self.action_spec.discrete_size > 0:
# LL-Python API will return an empty dim if there is only 1 agent.
ndmask = np.array(
2 * self.action_spec.discrete_size * [False], dtype=np.bool
)
ndmask = np.expand_dims(ndmask, axis=0)
action_mask = [ndmask]
return action_mask
def _compute_reward(self, name: str, done: bool) -> float:
if done:
reward = 0.0
for _pos in self.positions[name]:
reward += (SUCCESS_REWARD * _pos * self.goal[name]) / len(
self.positions[name]
)
else:
reward = -TIME_PENALTY
return reward
def _reset_agent(self, name):
self.goal[name] = self.random.choice([-1, 1])
self.positions[name] = [0.0 for _ in range(self.total_action_size)]
self.step_count[name] = 0
self.rewards[name] = 0
self.agent_id[name] = self.agent_id[name] + 1
def _make_batched_step(
self, name: str, done: bool, reward: float, group_reward: float
) -> Tuple[DecisionSteps, TerminalSteps]:
m_vector_obs = self._make_obs(self.goal[name])
m_reward = np.array([reward], dtype=np.float32)
m_agent_id = np.array([self.agent_id[name]], dtype=np.int32)
m_group_id = np.array([0], dtype=np.int32)
m_group_reward = np.array([group_reward], dtype=np.float32)
action_mask = self._generate_mask()
decision_step = DecisionSteps(
m_vector_obs, m_reward, m_agent_id, action_mask, m_group_id, m_group_reward
)
terminal_step = TerminalSteps.empty(self.behavior_spec)
if done:
self.final_rewards[name].append(self.rewards[name])
# self._reset_agent(name)
# new_vector_obs = self._make_obs(self.goal[name])
# (
# new_reward,
# new_done,
# new_agent_id,
# new_action_mask,
# new_group_id,
# new_group_reward,
# ) = self._construct_reset_step(name)
# decision_step = DecisionSteps(
# new_vector_obs,
# new_reward,
# new_agent_id,
# new_action_mask,
# new_group_id,
# new_group_reward,
# )
decision_step = DecisionSteps([], [], [], [], [], [])
terminal_step = TerminalSteps(
m_vector_obs,
m_reward,
np.array([False], dtype=bool),
m_agent_id,
m_group_id,
m_group_reward,
)
return (decision_step, terminal_step)
def _construct_reset_step(
self, name: str
) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
new_reward = np.array([0.0], dtype=np.float32)
new_done = np.array([False], dtype=np.bool)
new_agent_id = np.array([self.agent_id[name]], dtype=np.int32)
new_action_mask = self._generate_mask()
new_group_id = np.array([0], dtype=np.int32)
new_group_reward = np.array([0.0], dtype=np.float32)
return (
new_reward,
new_done,
new_agent_id,
new_action_mask,
new_group_id,
new_group_reward,
)
def step(self) -> None:
assert all(action is not None for action in self.action.values())
for name in self.names:
done = self._take_action(name)
reward = self._compute_reward(name, done)
self.rewards[name] += reward
self.step_result[name] = self._make_batched_step(name, done, reward, 0.0)
def reset(self) -> None: # type: ignore
for name in self.names:
self._reset_agent(name)
self.step_result[name] = self._make_batched_step(name, False, 0.0, 0.0)
@property
def reset_parameters(self) -> Dict[str, str]:
return {}
def close(self):
pass
class MultiAgentEnvironment(BaseEnv):
"""
The MultiAgentEnvironment maintains a list of SimpleEnvironment, one for each agent.
When sending DecisionSteps and TerminalSteps to the trainers, it first batches the
decision steps from the individual environments. When setting actions, it indexes the
batched ActionTuple to obtain the ActionTuple for individual agents
"""
def __init__(
self,
brain_names,
step_size=STEP_SIZE,
num_visual=0,
num_vector=1,
num_var_len=0,
vis_obs_size=VIS_OBS_SIZE,
vec_obs_size=OBS_SIZE,
var_len_obs_size=VAR_LEN_SIZE,
action_sizes=(1, 0),
num_agents=2,
goal_indices=None,
):
super().__init__()
self.envs = {}
self.dones = {}
self.just_died = set()
self.names = brain_names
self.final_rewards: Dict[str, List[float]] = {}
for name in brain_names:
self.final_rewards[name] = []
for i in range(num_agents):
name_and_num = name + str(i)
self.envs[name_and_num] = SimpleEnvironment(
[name],
step_size,
num_visual,
num_vector,
num_var_len,
vis_obs_size,
vec_obs_size,
var_len_obs_size,
action_sizes,
goal_indices,
)
self.dones[name_and_num] = False
self.envs[name_and_num].reset()
# All envs have the same behavior spec, so just get the last one.
self.behavior_spec = self.envs[name_and_num].behavior_spec
self.action_spec = self.envs[name_and_num].action_spec
self.num_agents = num_agents
self._side_channel_manager = SideChannelManager([])
@property
def all_done(self):
return all(self.dones.values())
@property
def behavior_specs(self):
behavior_dict = {}
for n in self.names:
behavior_dict[n] = self.behavior_spec
return BehaviorMapping(behavior_dict)
def set_action_for_agent(self, behavior_name, agent_id, action):
pass
def set_actions(self, behavior_name, action):
# The ActionTuple contains the actions for all n_agents. This
# slices the ActionTuple into an action tuple for each environment
# and sets it. The index j is used to ignore agents that have already
# reached done.
j = 0
for i in range(self.num_agents):
_act = ActionTuple()
name_and_num = behavior_name + str(i)
env = self.envs[name_and_num]
if not self.dones[name_and_num]:
if self.action_spec.continuous_size > 0:
_act.add_continuous(action.continuous[j : j + 1])
if self.action_spec.discrete_size > 0:
_disc_list = [action.discrete[j, :]]
_act.add_discrete(np.array(_disc_list))
j += 1
env.action[behavior_name] = _act
def get_steps(self, behavior_name):
# This gets the individual DecisionSteps and TerminalSteps
# from the envs and merges them into a batch to be sent
# to the AgentProcessor.
dec_vec_obs = []
dec_reward = []
dec_group_reward = []
dec_agent_id = []
dec_group_id = []
ter_vec_obs = []
ter_reward = []
ter_group_reward = []
ter_agent_id = []
ter_group_id = []
interrupted = []
action_mask = None
terminal_step = TerminalSteps.empty(self.behavior_spec)
decision_step = None
for i in range(self.num_agents):
name_and_num = behavior_name + str(i)
env = self.envs[name_and_num]
_dec, _term = env.step_result[behavior_name]
if not self.dones[name_and_num]:
dec_agent_id.append(i)
dec_group_id.append(1)
if len(dec_vec_obs) > 0:
for j, obs in enumerate(_dec.obs):
dec_vec_obs[j] = np.concatenate((dec_vec_obs[j], obs), axis=0)
else:
for obs in _dec.obs:
dec_vec_obs.append(obs)
dec_reward.append(_dec.reward[0])
dec_group_reward.append(_dec.group_reward[0])
if _dec.action_mask is not None:
if action_mask is None:
action_mask = []
if len(action_mask) > 0:
action_mask[0] = np.concatenate(
(action_mask[0], _dec.action_mask[0]), axis=0
)
else:
action_mask.append(_dec.action_mask[0])
if len(_term.reward) > 0 and name_and_num in self.just_died:
ter_agent_id.append(i)
ter_group_id.append(1)
if len(ter_vec_obs) > 0:
for j, obs in enumerate(_term.obs):
ter_vec_obs[j] = np.concatenate((ter_vec_obs[j], obs), axis=0)
else:
for obs in _term.obs:
ter_vec_obs.append(obs)
ter_reward.append(_term.reward[0])
ter_group_reward.append(_term.group_reward[0])
interrupted.append(False)
self.just_died.remove(name_and_num)
decision_step = DecisionSteps(
dec_vec_obs,
dec_reward,
dec_agent_id,
action_mask,
dec_group_id,
dec_group_reward,
)
terminal_step = TerminalSteps(
ter_vec_obs,
ter_reward,
interrupted,
ter_agent_id,
ter_group_id,
ter_group_reward,
)
if self.all_done:
decision_step = DecisionSteps([], [], [], [], [], [])
return (decision_step, terminal_step)
def step(self) -> None:
# Steps all environments; auto-reset when all agents are done is disabled here for the PZ API tests.
for name in self.names:
for i in range(self.num_agents):
name_and_num = name + str(i)
# Does not step the env if done
if not self.dones[name_and_num]:
env = self.envs[name_and_num]
# Reproducing part of env step to intercept Dones
assert all(action is not None for action in env.action.values())
done = env._take_action(name)
reward = env._compute_reward(name, done)
self.dones[name_and_num] = done
if done:
self.just_died.add(name_and_num)
if self.all_done:
env.step_result[name] = env._make_batched_step(
name, done, 0.0, reward
)
self.final_rewards[name].append(reward)
# self.reset()
elif done:
# This agent has finished but others are still running.
# This gives a reward of the time penalty if this agent
# is successful and the negative env reward if it fails.
ceil_reward = min(-TIME_PENALTY, reward)
env.step_result[name] = env._make_batched_step(
name, done, ceil_reward, 0.0
)
self.final_rewards[name].append(reward)
else:
env.step_result[name] = env._make_batched_step(
name, done, reward, 0.0
)
def reset(self) -> None: # type: ignore
for name in self.names:
for i in range(self.num_agents):
name_and_num = name + str(i)
self.dones[name_and_num] = False
self.dones = {}
self.just_died = set()
self.final_rewards = {}
for name in self.names:
self.final_rewards[name] = []
for i in range(self.num_agents):
name_and_num = name + str(i)
self.dones[name_and_num] = False
self.envs[name_and_num].reset()
@property
def reset_parameters(self) -> Dict[str, str]:
return {}
def close(self):
pass


@ -3,7 +3,8 @@ import pytest
import numpy as np
from gym import spaces
- from gym_unity.envs import UnityToGymWrapper
+ from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
from mlagents_envs.base_env import (
BehaviorSpec,
ActionSpec,
@ -11,7 +12,7 @@ from mlagents_envs.base_env import (
TerminalSteps,
BehaviorMapping,
)
- from mlagents.trainers.tests.dummy_config import create_observation_specs_with_shapes
+ from dummy_config import create_observation_specs_with_shapes
def test_gym_wrapper():


@ -0,0 +1,32 @@
from mlagents_envs.envs.unity_aec_env import UnityAECEnv
from mlagents_envs.envs.unity_parallel_env import UnityParallelEnv
from simple_test_envs import SimpleEnvironment, MultiAgentEnvironment
from pettingzoo.test import api_test, parallel_api_test
NUM_TEST_CYCLES = 100
def test_single_agent_aec():
unity_env = SimpleEnvironment(["test_single"])
env = UnityAECEnv(unity_env)
api_test(env, num_cycles=NUM_TEST_CYCLES, verbose_progress=False)
def test_multi_agent_aec():
unity_env = MultiAgentEnvironment(["test_multi_1", "test_multi_2"], num_agents=2)
env = UnityAECEnv(unity_env)
api_test(env, num_cycles=NUM_TEST_CYCLES, verbose_progress=False)
def test_single_agent_parallel():
unity_env = SimpleEnvironment(["test_single"])
env = UnityParallelEnv(unity_env)
parallel_api_test(env, num_cycles=NUM_TEST_CYCLES)
def test_multi_agent_parallel():
unity_env = MultiAgentEnvironment(
["test_multi_1", "test_multi_2", "test_multi_3"], num_agents=3
)
env = UnityParallelEnv(unity_env)
parallel_api_test(env, num_cycles=NUM_TEST_CYCLES)


@ -0,0 +1,503 @@
import io
import numpy as np
import pytest
from typing import List, Tuple, Any
from mlagents_envs.communicator_objects.agent_info_pb2 import AgentInfoProto
from mlagents_envs.communicator_objects.observation_pb2 import (
ObservationProto,
NONE,
PNG,
)
from mlagents_envs.communicator_objects.brain_parameters_pb2 import BrainParametersProto
from mlagents_envs.communicator_objects.agent_info_action_pair_pb2 import (
AgentInfoActionPairProto,
)
from mlagents_envs.communicator_objects.agent_action_pb2 import AgentActionProto
from mlagents_envs.base_env import (
BehaviorSpec,
ActionSpec,
DecisionSteps,
TerminalSteps,
)
from mlagents_envs.exception import UnityObservationException
from mlagents_envs.rpc_utils import (
behavior_spec_from_proto,
process_pixels,
_process_maybe_compressed_observation,
_process_rank_one_or_two_observation,
steps_from_proto,
)
from PIL import Image
from dummy_config import create_observation_specs_with_shapes
def generate_list_agent_proto(
n_agent: int,
shape: List[Tuple[int]],
infinite_rewards: bool = False,
nan_observations: bool = False,
) -> List[AgentInfoProto]:
result = []
for agent_index in range(n_agent):
ap = AgentInfoProto()
ap.reward = float("inf") if infinite_rewards else agent_index
ap.done = agent_index % 2 == 0
ap.max_step_reached = agent_index % 4 == 0
ap.id = agent_index
ap.action_mask.extend([True, False] * 5)
obs_proto_list = []
for obs_index in range(len(shape)):
obs_proto = ObservationProto()
obs_proto.shape.extend(list(shape[obs_index]))
obs_proto.compression_type = NONE
obs_proto.float_data.data.extend(
([float("nan")] if nan_observations else [0.1])
* np.prod(shape[obs_index])
)
obs_proto_list.append(obs_proto)
ap.observations.extend(obs_proto_list)
result.append(ap)
return result
def generate_compressed_data(in_array: np.ndarray) -> bytes:
image_arr = (in_array * 255).astype(np.uint8)
bytes_out = bytes()
num_channels = in_array.shape[2]
num_images = (num_channels + 2) // 3
# Split the input image into batches of 3 channels.
for i in range(num_images):
sub_image = image_arr[..., 3 * i : 3 * i + 3]
if (i == num_images - 1) and (num_channels % 3) != 0:
# Pad zeros
zero_shape = list(in_array.shape)
zero_shape[2] = 3 - (num_channels % 3)
z = np.zeros(zero_shape, dtype=np.uint8)
sub_image = np.concatenate([sub_image, z], axis=2)
im = Image.fromarray(sub_image, "RGB")
byteIO = io.BytesIO()
im.save(byteIO, format="PNG")
bytes_out += byteIO.getvalue()
return bytes_out
# test helper function for old C# API (no compressed channel mapping)
def generate_compressed_proto_obs(
in_array: np.ndarray, grayscale: bool = False
) -> ObservationProto:
obs_proto = ObservationProto()
obs_proto.compressed_data = generate_compressed_data(in_array)
obs_proto.compression_type = PNG
if grayscale:
# grayscale flag is only used for old API without mapping
expected_shape = [in_array.shape[0], in_array.shape[1], 1]
obs_proto.shape.extend(expected_shape)
else:
obs_proto.shape.extend(in_array.shape)
return obs_proto
# test helper function for new C# API (with compressed channel mapping)
def generate_compressed_proto_obs_with_mapping(
in_array: np.ndarray, mapping: List[int]
) -> ObservationProto:
obs_proto = ObservationProto()
obs_proto.compressed_data = generate_compressed_data(in_array)
obs_proto.compression_type = PNG
if mapping is not None:
obs_proto.compressed_channel_mapping.extend(mapping)
expected_shape = [
in_array.shape[0],
in_array.shape[1],
len({m for m in mapping if m >= 0}),
]
obs_proto.shape.extend(expected_shape)
else:
obs_proto.shape.extend(in_array.shape)
return obs_proto
def generate_uncompressed_proto_obs(in_array: np.ndarray) -> ObservationProto:
obs_proto = ObservationProto()
obs_proto.float_data.data.extend(in_array.flatten().tolist())
obs_proto.compression_type = NONE
obs_proto.shape.extend(in_array.shape)
return obs_proto
def proto_from_steps(
decision_steps: DecisionSteps, terminal_steps: TerminalSteps
) -> List[AgentInfoProto]:
agent_info_protos: List[AgentInfoProto] = []
# Take care of the DecisionSteps first
for agent_id in decision_steps.agent_id:
agent_id_index = decision_steps.agent_id_to_index[agent_id]
reward = decision_steps.reward[agent_id_index]
done = False
max_step_reached = False
agent_mask: Any = None
if decision_steps.action_mask is not None:
agent_mask = []
for _branch in decision_steps.action_mask:
agent_mask = np.concatenate(
(agent_mask, _branch[agent_id_index, :]), axis=0
)
agent_mask = agent_mask.astype(np.bool).tolist()
observations: List[ObservationProto] = []
for all_observations_of_type in decision_steps.obs:
observation = all_observations_of_type[agent_id_index]
if len(observation.shape) == 3:
observations.append(generate_uncompressed_proto_obs(observation))
else:
observations.append(
ObservationProto(
float_data=ObservationProto.FloatData(data=observation),
shape=[len(observation)],
compression_type=NONE,
)
)
agent_info_proto = AgentInfoProto(
reward=reward,
done=done,
id=agent_id,
max_step_reached=bool(max_step_reached),
action_mask=agent_mask,
observations=observations,
)
agent_info_protos.append(agent_info_proto)
# Take care of the TerminalSteps second
for agent_id in terminal_steps.agent_id:
agent_id_index = terminal_steps.agent_id_to_index[agent_id]
reward = terminal_steps.reward[agent_id_index]
done = True
max_step_reached = terminal_steps.interrupted[agent_id_index]
final_observations: List[ObservationProto] = []
for all_observations_of_type in terminal_steps.obs:
observation = all_observations_of_type[agent_id_index]
if len(observation.shape) == 3:
final_observations.append(generate_uncompressed_proto_obs(observation))
else:
final_observations.append(
ObservationProto(
float_data=ObservationProto.FloatData(data=observation),
shape=[len(observation)],
compression_type=NONE,
)
)
agent_info_proto = AgentInfoProto(
reward=reward,
done=done,
id=agent_id,
max_step_reached=bool(max_step_reached),
action_mask=None,
observations=final_observations,
)
agent_info_protos.append(agent_info_proto)
return agent_info_protos
# The arguments here are the DecisionSteps, TerminalSteps and continuous/discrete actions for a single agent name
def proto_from_steps_and_action(
decision_steps: DecisionSteps,
terminal_steps: TerminalSteps,
continuous_actions: np.ndarray,
discrete_actions: np.ndarray,
) -> List[AgentInfoActionPairProto]:
agent_info_protos = proto_from_steps(decision_steps, terminal_steps)
agent_action_protos = []
num_agents = (
len(continuous_actions)
if continuous_actions is not None
else len(discrete_actions)
)
for i in range(num_agents):
proto = AgentActionProto()
if continuous_actions is not None:
proto.continuous_actions.extend(continuous_actions[i])
proto.vector_actions_deprecated.extend(continuous_actions[i])
if discrete_actions is not None:
proto.discrete_actions.extend(discrete_actions[i])
proto.vector_actions_deprecated.extend(discrete_actions[i])
agent_action_protos.append(proto)
agent_info_action_pair_protos = [
AgentInfoActionPairProto(agent_info=agent_info_proto, action_info=action_proto)
for agent_info_proto, action_proto in zip(
agent_info_protos, agent_action_protos
)
]
return agent_info_action_pair_protos
def test_process_pixels():
in_array = np.random.rand(128, 64, 3)
byte_arr = generate_compressed_data(in_array)
out_array = process_pixels(byte_arr, 3)
assert out_array.shape == (128, 64, 3)
assert np.sum(in_array - out_array) / np.prod(in_array.shape) < 0.01
assert np.allclose(in_array, out_array, atol=0.01)
def test_process_pixels_multi_png():
height = 128
width = 64
num_channels = 7
in_array = np.random.rand(height, width, num_channels)
byte_arr = generate_compressed_data(in_array)
out_array = process_pixels(byte_arr, num_channels)
assert out_array.shape == (height, width, num_channels)
assert np.sum(in_array - out_array) / np.prod(in_array.shape) < 0.01
assert np.allclose(in_array, out_array, atol=0.01)
def test_process_pixels_gray():
in_array = np.random.rand(128, 64, 3)
byte_arr = generate_compressed_data(in_array)
out_array = process_pixels(byte_arr, 1)
assert out_array.shape == (128, 64, 1)
assert np.mean(in_array.mean(axis=2, keepdims=True) - out_array) < 0.01
assert np.allclose(in_array.mean(axis=2, keepdims=True), out_array, atol=0.01)
def test_vector_observation():
n_agents = 10
shapes = [(3,), (4,)]
obs_specs = create_observation_specs_with_shapes(shapes)
list_proto = generate_list_agent_proto(n_agents, shapes)
for obs_index, shape in enumerate(shapes):
arr = _process_rank_one_or_two_observation(
obs_index, obs_specs[obs_index], list_proto
)
assert list(arr.shape) == ([n_agents] + list(shape))
assert np.allclose(arr, 0.1, atol=0.01)
def test_process_visual_observation():
shape = (128, 64, 3)
in_array_1 = np.random.rand(*shape)
proto_obs_1 = generate_compressed_proto_obs(in_array_1)
in_array_2 = np.random.rand(*shape)
in_array_2_mapping = [0, 1, 2]
proto_obs_2 = generate_compressed_proto_obs_with_mapping(
in_array_2, in_array_2_mapping
)
ap1 = AgentInfoProto()
ap1.observations.extend([proto_obs_1])
ap2 = AgentInfoProto()
ap2.observations.extend([proto_obs_2])
ap_list = [ap1, ap2]
obs_spec = create_observation_specs_with_shapes([shape])[0]
arr = _process_maybe_compressed_observation(0, obs_spec, ap_list)
assert list(arr.shape) == [2, 128, 64, 3]
assert np.allclose(arr[0, :, :, :], in_array_1, atol=0.01)
assert np.allclose(arr[1, :, :, :], in_array_2, atol=0.01)
def test_process_visual_observation_grayscale():
in_array_1 = np.random.rand(128, 64, 3)
proto_obs_1 = generate_compressed_proto_obs(in_array_1, grayscale=True)
expected_out_array_1 = np.mean(in_array_1, axis=2, keepdims=True)
in_array_2 = np.random.rand(128, 64, 3)
in_array_2_mapping = [0, 0, 0]
proto_obs_2 = generate_compressed_proto_obs_with_mapping(
in_array_2, in_array_2_mapping
)
expected_out_array_2 = np.mean(in_array_2, axis=2, keepdims=True)
ap1 = AgentInfoProto()
ap1.observations.extend([proto_obs_1])
ap2 = AgentInfoProto()
ap2.observations.extend([proto_obs_2])
ap_list = [ap1, ap2]
shape = (128, 64, 1)
obs_spec = create_observation_specs_with_shapes([shape])[0]
arr = _process_maybe_compressed_observation(0, obs_spec, ap_list)
assert list(arr.shape) == [2, 128, 64, 1]
assert np.allclose(arr[0, :, :, :], expected_out_array_1, atol=0.01)
assert np.allclose(arr[1, :, :, :], expected_out_array_2, atol=0.01)
def test_process_visual_observation_padded_channels():
in_array_1 = np.random.rand(128, 64, 12)
in_array_1_mapping = [0, 1, 2, 3, -1, -1, 4, 5, 6, 7, -1, -1]
proto_obs_1 = generate_compressed_proto_obs_with_mapping(
in_array_1, in_array_1_mapping
)
expected_out_array_1 = np.take(in_array_1, [0, 1, 2, 3, 6, 7, 8, 9], axis=2)
ap1 = AgentInfoProto()
ap1.observations.extend([proto_obs_1])
ap_list = [ap1]
shape = (128, 64, 8)
obs_spec = create_observation_specs_with_shapes([shape])[0]
arr = _process_maybe_compressed_observation(0, obs_spec, ap_list)
assert list(arr.shape) == [1, 128, 64, 8]
assert np.allclose(arr[0, :, :, :], expected_out_array_1, atol=0.01)
def test_process_visual_observation_bad_shape():
in_array_1 = np.random.rand(128, 64, 3)
proto_obs_1 = generate_compressed_proto_obs(in_array_1)
ap1 = AgentInfoProto()
ap1.observations.extend([proto_obs_1])
ap_list = [ap1]
shape = (128, 42, 3)
obs_spec = create_observation_specs_with_shapes([shape])[0]
with pytest.raises(UnityObservationException):
_process_maybe_compressed_observation(0, obs_spec, ap_list)
def test_batched_step_result_from_proto():
n_agents = 10
shapes = [(3,), (4,)]
spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(3)
)
ap_list = generate_list_agent_proto(n_agents, shapes)
decision_steps, terminal_steps = steps_from_proto(ap_list, spec)
for agent_id in range(n_agents):
if agent_id in decision_steps:
# we set the reward equal to the agent id in generate_list_agent_proto
assert decision_steps[agent_id].reward == agent_id
elif agent_id in terminal_steps:
assert terminal_steps[agent_id].reward == agent_id
else:
raise Exception("Missing agent from the steps")
# We sort the AgentId since they are split between DecisionSteps and TerminalSteps
combined_agent_id = list(decision_steps.agent_id) + list(terminal_steps.agent_id)
combined_agent_id.sort()
assert combined_agent_id == list(range(n_agents))
for agent_id in range(n_agents):
assert (agent_id in terminal_steps) == (agent_id % 2 == 0)
if agent_id in terminal_steps:
assert terminal_steps[agent_id].interrupted == (agent_id % 4 == 0)
assert decision_steps.obs[0].shape[1] == shapes[0][0]
assert decision_steps.obs[1].shape[1] == shapes[1][0]
assert terminal_steps.obs[0].shape[1] == shapes[0][0]
assert terminal_steps.obs[1].shape[1] == shapes[1][0]
def test_mismatch_observations_raise_in_step_result_from_proto():
n_agents = 10
shapes = [(3,), (4,)]
spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(3)
)
ap_list = generate_list_agent_proto(n_agents, shapes)
# Hack an observation to be larger, we should get an exception
ap_list[0].observations[0].shape[0] += 1
ap_list[0].observations[0].float_data.data.append(0.42)
with pytest.raises(UnityObservationException):
steps_from_proto(ap_list, spec)
def test_action_masking_discrete():
n_agents = 10
shapes = [(3,), (4,)]
behavior_spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_discrete((7, 3))
)
ap_list = generate_list_agent_proto(n_agents, shapes)
decision_steps, terminal_steps = steps_from_proto(ap_list, behavior_spec)
masks = decision_steps.action_mask
assert isinstance(masks, list)
assert len(masks) == 2
assert masks[0].shape == (n_agents / 2, 7) # half agents are done
assert masks[1].shape == (n_agents / 2, 3) # half agents are done
assert masks[0][0, 0]
assert not masks[1][0, 0]
assert masks[1][0, 1]
def test_action_masking_discrete_1():
n_agents = 10
shapes = [(3,), (4,)]
behavior_spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_discrete((10,))
)
ap_list = generate_list_agent_proto(n_agents, shapes)
decision_steps, terminal_steps = steps_from_proto(ap_list, behavior_spec)
masks = decision_steps.action_mask
assert isinstance(masks, list)
assert len(masks) == 1
assert masks[0].shape == (n_agents / 2, 10)
assert masks[0][0, 0]
def test_action_masking_discrete_2():
n_agents = 10
shapes = [(3,), (4,)]
behavior_spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes),
ActionSpec.create_discrete((2, 2, 6)),
)
ap_list = generate_list_agent_proto(n_agents, shapes)
decision_steps, terminal_steps = steps_from_proto(ap_list, behavior_spec)
masks = decision_steps.action_mask
assert isinstance(masks, list)
assert len(masks) == 3
assert masks[0].shape == (n_agents / 2, 2)
assert masks[1].shape == (n_agents / 2, 2)
assert masks[2].shape == (n_agents / 2, 6)
assert masks[0][0, 0]
def test_action_masking_continuous():
n_agents = 10
shapes = [(3,), (4,)]
behavior_spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(10)
)
ap_list = generate_list_agent_proto(n_agents, shapes)
decision_steps, terminal_steps = steps_from_proto(ap_list, behavior_spec)
masks = decision_steps.action_mask
assert masks is None
def test_agent_behavior_spec_from_proto():
agent_proto = generate_list_agent_proto(1, [(3,), (4,)])[0]
bp = BrainParametersProto()
bp.vector_action_size_deprecated.extend([5, 4])
bp.vector_action_space_type_deprecated = 0
behavior_spec = behavior_spec_from_proto(bp, agent_proto)
assert behavior_spec.action_spec.is_discrete()
assert not behavior_spec.action_spec.is_continuous()
assert [spec.shape for spec in behavior_spec.observation_specs] == [(3,), (4,)]
assert behavior_spec.action_spec.discrete_branches == (5, 4)
assert behavior_spec.action_spec.discrete_size == 2
bp = BrainParametersProto()
bp.vector_action_size_deprecated.extend([6])
bp.vector_action_space_type_deprecated = 1
behavior_spec = behavior_spec_from_proto(bp, agent_proto)
assert not behavior_spec.action_spec.is_discrete()
assert behavior_spec.action_spec.is_continuous()
assert behavior_spec.action_spec.continuous_size == 6
def test_batched_step_result_from_proto_raises_on_infinite():
n_agents = 10
shapes = [(3,), (4,)]
behavior_spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(3)
)
ap_list = generate_list_agent_proto(n_agents, shapes, infinite_rewards=True)
with pytest.raises(RuntimeError):
steps_from_proto(ap_list, behavior_spec)
def test_batched_step_result_from_proto_raises_on_nan():
n_agents = 10
shapes = [(3,), (4,)]
behavior_spec = BehaviorSpec(
create_observation_specs_with_shapes(shapes), ActionSpec.create_continuous(3)
)
ap_list = generate_list_agent_proto(n_agents, shapes, nan_observations=True)
with pytest.raises(RuntimeError):
steps_from_proto(ap_list, behavior_spec)


@ -7,7 +7,7 @@ from mlagents_envs.base_env import (
ActionSpec,
BehaviorSpec,
)
- from mlagents.trainers.tests.dummy_config import create_observation_specs_with_shapes
+ from dummy_config import create_observation_specs_with_shapes
def test_decision_steps():


@ -4,7 +4,7 @@ The `mlagents` Python package is part of the
[ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents). `mlagents`
provides a set of reinforcement and imitation learning algorithms designed to be
used with Unity environments. The algorithms interface with the Python API
- provided by the `mlagents_envs` package. See [here](../docs/Python-API.md) for
+ provided by the `mlagents_envs` package. See [here](../docs/Python-LLAPI.md) for
more information on `mlagents_envs`.
The algorithms can be accessed using the: `mlagents-learn` access point. See


@ -0,0 +1,16 @@
{
"param_1": {
"lesson_num": 2
},
"param_2": {
"lesson_num": 0
},
"param_3": {
"lesson_num": 0
},
"metadata": {
"stats_format_version": "0.3.0",
"mlagents_version": "0.29.0.dev0",
"torch_version": "1.8.1"
}
}


@ -13,7 +13,7 @@ from mlagents_envs.base_env import (
TerminalSteps,
BehaviorMapping,
)
- from mlagents_envs.tests.test_rpc_utils import proto_from_steps_and_action
+ from .test_rpc_utils import proto_from_steps_and_action
from mlagents_envs.communicator_objects.agent_info_action_pair_pb2 import (
AgentInfoActionPairProto,
)


@ -1,7 +1,7 @@
import argparse
from mlagents_envs.environment import UnityEnvironment
- from gym_unity.envs import UnityToGymWrapper
+ from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
def test_run_environment(env_name):


@ -136,13 +136,12 @@ def init_venv(
# install from pypi
pip_commands += [
f"mlagents=={mlagents_python_version}",
f"gym-unity=={mlagents_python_version}",
# TODO build these and publish to internal pypi
"tf2onnx==1.6.1",
]
else:
# Local install
pip_commands += ["-e ./ml-agents-envs", "-e ./ml-agents", "-e ./gym-unity"]
pip_commands += ["-e ./ml-agents-envs", "-e ./ml-agents"]
if extra_packages:
pip_commands += extra_packages


@ -40,7 +40,7 @@ def validate_packages(root_dir):
def main():
- for root_dir in ["ml-agents", "ml-agents-envs", "gym-unity"]:
+ for root_dir in ["ml-agents", "ml-agents-envs"]:
validate_packages(root_dir)


@ -22,7 +22,9 @@ MATCH_ANY = re.compile(r"(?s).*")
# To allow everything in the file (effectively skipping it), use MATCH_ANY for the value
ALLOW_LIST = {
# Previous release table
"README.md": re.compile(r"\*\*(Verified Package ([0-9]\.?)*|Release [0-9]+)\*\*"),
"docs/Python-PettingZoo-API.md": re.compile(
r"\*\*(Verified Package ([0-9]\.?)*|Release [0-9]+)\*\*"
),
"docs/Versioning.md": MATCH_ANY,
"com.unity.ml-agents/CHANGELOG.md": MATCH_ANY,
"utils/make_readme_table.py": MATCH_ANY,


@ -8,11 +8,7 @@ import argparse
VERSION_LINE_START = "__version__ = "
- DIRECTORIES = [
- "ml-agents/mlagents/trainers",
- "ml-agents-envs/mlagents_envs",
- "gym-unity/gym_unity",
- ]
+ DIRECTORIES = ["ml-agents/mlagents/trainers", "ml-agents-envs/mlagents_envs"]
MLAGENTS_PACKAGE_JSON_PATH = "com.unity.ml-agents/package.json"
MLAGENTS_EXTENSIONS_PACKAGE_JSON_PATH = "com.unity.ml-agents.extensions/package.json"