Code accompanying the paper "Better Exploration with Optimistic Actor Critic" (NeurIPS 2019)

Optimistic Actor Critic

This repository contains the code accompanying the NeurIPS 2019 paper 'Better Exploration with Optimistic Actor Critic'.

If you are reading the code to understand how Optimistic Actor Critic works, have a look at the file optimistic_exploration.py, which encapsulates the logic of optimistic exploration. The remaining files in the repository implement a generic version of Soft Actor Critic.
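For orientation, here is a minimal sketch of the exploration step as described in the paper. It is an illustration, not the code in optimistic_exploration.py. It assumes two critic estimates combined into an upper bound (mean plus beta_UB times half their absolute difference), a Gaussian policy with diagonal covariance, and a gradient of the upper bound with respect to the action that the caller supplies. All function names are hypothetical.

```python
import math


def upper_bound(q1, q2, beta_ub):
    """Optimistic Q estimate from two critic outputs (assumed form)."""
    mu_q = (q1 + q2) / 2.0          # mean of the two critics
    sigma_q = abs(q1 - q2) / 2.0    # spread used as an uncertainty proxy
    return mu_q + beta_ub * sigma_q


def exploration_mean(mu_t, var_t, grad_q_ub, delta):
    """Shift the target policy mean along Sigma * grad.

    mu_t:      mean of the target policy (list of floats)
    var_t:     diagonal of the target policy covariance
    grad_q_ub: gradient of the upper bound w.r.t. the action at mu_t
    delta:     KL budget between exploration and target policy
    """
    sigma_grad = [v * g for v, g in zip(var_t, grad_q_ub)]
    norm = math.sqrt(sum(g * sg for g, sg in zip(grad_q_ub, sigma_grad)))
    if norm == 0.0:
        # No preferred direction: fall back to the target mean.
        return list(mu_t)
    scale = math.sqrt(2.0 * delta) / norm
    return [m + scale * sg for m, sg in zip(mu_t, sigma_grad)]


def kl_same_cov(mu_e, mu_t, var_t):
    """KL between two Gaussians that share a diagonal covariance."""
    return 0.5 * sum((e - t) ** 2 / v for e, t, v in zip(mu_e, mu_t, var_t))


mu_e = exploration_mean([0.0, 0.0], [1.0, 4.0], [1.0, 0.5], delta=0.5)
print(mu_e, kl_same_cov(mu_e, [0.0, 0.0], [1.0, 4.0]))
```

With this scaling the shifted mean sits exactly on the KL boundary, so delta controls how far the exploration policy can move from the target policy.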

Reproducing Results

The bash script reproduce.sh will run Soft Actor Critic and Optimistic Actor Critic on the environment Humanoid-v2, each with 5 seeds. It is recommended you execute this script on a machine with sufficient resources.

After the script finishes, to plot the learning curve, you can run

python -m plotting.plot_against_baseline

which should produce the graph below. Optimistic Actor Critic takes ~6 million timesteps to obtain an average episode return of 8000, while Soft Actor Critic requires 10 million steps. This represents a ~40% improvement in sample efficiency.
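The ~40% figure follows from the two step counts quoted above. A small check, using a hypothetical helper name and the rounded numbers from the text:

```python
def sample_efficiency_gain(baseline_steps, method_steps):
    """Fraction of environment steps saved relative to the baseline."""
    return (baseline_steps - method_steps) / baseline_steps


# Rounded step counts quoted in the text: SAC ~10M, OAC ~6M.
gain = sample_efficiency_gain(10e6, 6e6)
print(f"{gain:.0%}")
```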

Note that the results in the paper were produced by modifying the TensorFlow code provided in the softlearning repo.

Running Experiments

The repository supports automatic checkpoint saving and restoring. This is useful if you run experiments on preemptible cloud compute.
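The general save-and-resume pattern can be sketched as follows. This is not the repository's checkpointing code: the function names, the pickle format, and the contents of the state dictionary are assumptions made for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write the state atomically so an interruption mid-write
    cannot leave a corrupted checkpoint behind."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename over the old checkpoint


def load_checkpoint(path, default):
    """Return the saved state, or the default if no checkpoint exists."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def train(path, total_epochs):
    """Resume from the last saved epoch and checkpoint after each one."""
    state = load_checkpoint(path, {"epoch": 0})
    for epoch in range(state["epoch"], total_epochs):
        # ... one epoch of training would go here ...
        state["epoch"] = epoch + 1
        save_checkpoint(state, path)
    return state
```

If the process is killed and restarted with the same path, train picks up at the first epoch that was not yet checkpointed.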

For software dependencies, see the environment folder. You can build the Dockerfile, create a conda environment with environment.yml, or create a pip environment with environments.txt.

To create the conda environment, cd into the environment folder and run:

python install_mujoco.py
conda env create -f environment.yml

To run Soft Actor Critic on Humanoid with seed 0 as a baseline to compare against Optimistic Actor Critic, run

python main.py --seed=0 --domain=humanoid

To run Optimistic Actor Critic on Humanoid with seed 0, run

python main.py --seed=0 --domain=humanoid --beta_UB=4.66 --delta=23.53
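To compare both methods over several seeds, the two commands above can be generated in a loop. The sketch below only builds and prints the command lines. The helper name and the choice of seeds 0 to 4 are assumptions, and it is not a description of what reproduce.sh does internally.

```python
def build_commands(seeds, domain="humanoid", beta_ub=4.66, delta=23.53):
    """Return one SAC and one OAC command (as argument lists) per seed."""
    commands = []
    for seed in seeds:
        base = ["python", "main.py", f"--seed={seed}", f"--domain={domain}"]
        # Soft Actor Critic baseline: no optimistic exploration flags.
        commands.append(base)
        # Optimistic Actor Critic: same run plus beta_UB and delta.
        commands.append(base + [f"--beta_UB={beta_ub}", f"--delta={delta}"])
    return commands


# Assumed seed set; adjust to match your experiment.
for cmd in build_commands(range(5)):
    print(" ".join(cmd))
```

Each argument list can be passed to subprocess.run as-is if you want to launch the runs from Python instead of a shell.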

Hyper-parameter Selection

Note that we were able to remove a hyper-parameter relative to the code used for the paper (the k_LB hyper-parameter). The result in the graph above was obtained without k_LB.

Acknowledgement

This repository is based on rlkit.

Citation

If you use the codebase, please cite the paper:

@misc{oac,
    title={Better Exploration with Optimistic Actor-Critic},
    author={Kamil Ciosek and Quan Vuong and Robert Loftin and Katja Hofmann},
    year={2019},
    eprint={1910.12807},
    archivePrefix={arXiv},
    primaryClass={stat.ML}
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.