FQF (Fully parameterized Quantile Function for distributional reinforcement learning) is a general reinforcement learning framework for Atari games. It learns to play the games automatically by predicting the return distribution in the form of a fully parameterized quantile function.

Fully parameterized Quantile Function (FQF)

TensorFlow implementation of the paper

Fully Parameterized Quantile Function for Distributional Reinforcement Learning

Derek Yang, Li Zhao, Zichuan Lin, Tao Qin, Jiang Bian, Tie-Yan Liu

If you use this code in your research, please cite:

@inproceedings{yang2019fully,
  title={Fully Parameterized Quantile Function for Distributional Reinforcement Learning},
  author={Yang, Derek and Zhao, Li and Lin, Zichuan and Qin, Tao and Bian, Jiang and Liu, Tie-Yan},
  booktitle={Advances in Neural Information Processing Systems},
  pages={6190--6199},
  year={2019}
}

Requirements

python==3.6
tensorflow
gym
absl-py
atari-py
gin-config
opencv-python

Installation on Ubuntu

sudo apt-get update && sudo apt-get install cmake zlib1g-dev
pip install absl-py atari-py gin-config==0.1.4 gym opencv-python tensorflow-gpu==1.12.0
cd FQF
pip install -e .

Experiments

Our experiments and hyperparameter search can be run as follows:

cd FQF/dopamine/discrete_domains
bash run-fqf.sh

Bug Fix

It is recommended to use the L2 loss on the gradient for the probability proposal network, or to clip the largest proposed probability to 0.98. The reason is as follows: in a quantile function, as the probability approaches 1, the quantile value goes to infinity (or to a very large number). A very large quantile value is reasonable for a probability such as 0.9999999, but because a neural network has limited approximation ability, the quantile values it predicts for other probabilities rise quickly as well, leading to a performance drop.
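The clipping option can be illustrated with a minimal sketch in plain Python. This is not the code in this repository: the function names `propose_fractions` and `exponential_quantile`, the example logits, and the choice of an exponential distribution as a stand-in for the return distribution are all assumptions made for illustration. The sketch shows how proposed probabilities can be capped at 0.98, and how quickly a quantile function with an unbounded tail grows beyond that point.

```python
import math


def propose_fractions(logits, max_tau=0.98):
    """Hypothetical helper: map raw logits to increasing probabilities,
    then cap the largest ones at max_tau."""
    # Softmax turns the logits into positive interval widths that sum to 1.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    widths = [e / total for e in exps]

    # A running sum gives non-decreasing probabilities; the last is about 1.
    taus, running = [], 0.0
    for w in widths:
        running += w
        taus.append(running)

    # Clip so that no proposed probability sits in the extreme upper tail.
    return [min(t, max_tau) for t in taus]


def exponential_quantile(p):
    """Quantile function of a unit-rate exponential distribution (assumed
    stand-in for a return distribution); unbounded as p approaches 1."""
    return -math.log(1.0 - p)


taus = propose_fractions([0.1, 2.0, -1.0, 0.5])
print("clipped probabilities:", taus)

# The quantile value at the clipped probability stays moderate, while the
# value at a probability very close to 1 is several times larger.
for p in (0.5, 0.98, 0.9999999):
    print(p, exponential_quantile(p))
```

With the cap in place, the network is never asked to fit the quantile value at a probability arbitrarily close to 1, which is the regime the paragraph above identifies as harmful.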

Acknowledgement

Our code is built on Dopamine.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.