FQF (Fully parameterized Quantile Function for distributional reinforcement learning) is a general reinforcement learning framework for Atari games. It learns to play them automatically by predicting the return distribution in the form of a fully parameterized quantile function.

Fully parameterized Quantile Function (FQF)

TensorFlow implementation of the paper

Fully Parameterized Quantile Function for Distributional Reinforcement Learning

Derek Yang, Li Zhao, Zichuan Lin, Tao Qin, Jiang Bian, Tie-Yan Liu

If you use this code in your research, please cite

@inproceedings{yang2019fully,
  title={Fully Parameterized Quantile Function for Distributional Reinforcement Learning},
  author={Yang, Derek and Zhao, Li and Lin, Zichuan and Qin, Tao and Bian, Jiang and Liu, Tie-Yan},
  booktitle={Advances in Neural Information Processing Systems},
  pages={6190--6199},
  year={2019}
}

Requirements

  • python==3.6
  • tensorflow
  • gym
  • absl-py
  • atari-py
  • gin-config
  • opencv-python

Installation on Ubuntu

sudo apt-get update && sudo apt-get install cmake zlib1g-dev
pip install absl-py atari-py gin-config==0.1.4 gym opencv-python tensorflow-gpu==1.12.0
cd FQF
pip install -e .

Experiments

  • Our experiments and hyper-parameter search can be run as follows:
cd FQF/dopamine/discrete_domains
bash run-fqf.sh

Bug Fixes

  • It is recommended to either apply an L2 loss on the gradient for the probability proposal network or clip the largest proposed probability to 0.98. The reason is as follows: in a quantile function, as the probability approaches 1, the quantile value goes to infinity (or a very large number). A very large quantile value is reasonable for a probability such as 0.9999999, but because a neural network has limited approximation ability, the quantile values for other probabilities also rise quickly, which causes a performance drop.
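
The clipping option can be illustrated with a minimal sketch in plain Python. It is not the code used in this repository: the helper name `propose_fractions`, the `max_tau` parameter, and the softmax-plus-cumulative-sum construction of the proposed probabilities are assumptions made for illustration only.

```python
import math
from itertools import accumulate


def propose_fractions(logits, max_tau=0.98):
    """Hypothetical helper: turn logits into increasing probabilities capped at max_tau."""
    # Numerically stable softmax over the logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The running sum gives increasing probabilities; the last one equals 1
    # up to rounding.
    taus = list(accumulate(probs))
    # Cap every probability so the network is never asked for the quantile
    # value at probabilities close to 1 (assumed cap of 0.98, as recommended above).
    return [min(t, max_tau) for t in taus]


# Four equal logits: the last probability would be 1 without the cap.
taus = propose_fractions([0.0, 0.0, 0.0, 0.0])
print(taus)
```

Only the largest entries are affected by the cap, so the ordering of the proposed probabilities is preserved.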

Acknowledgement

  • Our code is implemented based on dopamine.

Code of Conduct