Commit Graph

1 Commit

Author SHA1 Message Date
Maryam Honari df96d5c835 Develop custom trainers (#73)
* Make create_policy more generic (#54)

* add on/off policy classes and inherit from them

* trainers as plugins (entry-point sketch at the end of this log)

* remove swap files

* clean up registration debug

* clean up all pre-commit

* a2c plugin passes pre-commit

* move gae to trainer utils (GAE/lambda-return sketch after this block)

* move lambda return to trainer util

* add validator for num_epoch (sketch after this block)

* add types for settings/type methods

* move create policy into highest level api

* move update_reward_signal into optimizer

* move get_policy into Trainer

* remove get settings type

* dummy_config settings

* move all stats from actor into dict, enables arbitrary actor data

* remove shared_critic flag, cleanups

* refactor create_policy

* remove sample_actions, evaluate_actions, update_norm from policy

* remove comments

* fix return type get stat

* update poca create_policy

* clean up policy init

* remove conftest

* add shared_critic to settings

* fix test_networks

* fix test_policy

* fix test network

* fix some ppo/sac tests

* add back conftest.py

* improve specification of trainer type

* add defaults for trainer_type/hyperparam

* fix test_saver

* fix reward providers

* add settings check utility for tests

* fix some settings tests

* add trainer types to run_experiment

* type check for arbitrary actor data

* cherrypick rename ml-agents/trainers/torch to torch_entities (#55)

* make all trainer types and settings visible at module level

* remove settings from run_experiment console script

* fix test_settings and upgrade config scripts

* remove need for trainer_type argument up to TrainerFactory

* fix ghost trainer behavior id in policy queue

* fix torch shadow in tests

* update trainers, rl trainers tests

* update tests to match the refactors

* fixing behavior name in ghost trainer

* update ml-agents-envs test configs

* separating the plugin package changes

* bring get_policy back for the sake of the ghost trainer

* add return types and remove unused returns

* remove duplicate methods in poca (_update_policy, add_policy)

Co-authored-by: mahon94 <maryam.honari@unity3d.com>
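
The "move gae to trainer utils" and "move lambda return to trainer util" items above factor standard advantage estimation out of individual trainers into shared utilities. A minimal sketch of what such helpers compute, assuming illustrative names and signatures rather than the actual ml-agents API:

```python
import numpy as np

def discount_rewards(r: np.ndarray, gamma: float = 0.99, value_next: float = 0.0) -> np.ndarray:
    """Discounted sum of rewards, bootstrapped from value_next."""
    discounted = np.zeros_like(r)
    running = value_next
    for t in reversed(range(len(r))):
        running = r[t] + gamma * running
        discounted[t] = running
    return discounted

def get_gae(rewards, value_estimates, value_next=0.0, gamma=0.99, lambd=0.95):
    """Generalized Advantage Estimation over one trajectory."""
    values = np.append(value_estimates, value_next)
    deltas = rewards + gamma * values[1:] - values[:-1]
    return discount_rewards(deltas, gamma * lambd)

def lambda_return(rewards, value_estimates, value_next=0.0, gamma=0.99, lambd=0.95):
    """TD(lambda) return: the GAE advantage plus the value baseline."""
    return get_gae(rewards, value_estimates, value_next, gamma, lambd) + value_estimates
```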
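Similarly, "add validator for num_epoch" adds a settings-level sanity check. The ml-agents settings classes are attrs-based, so a validator along these lines would fit; the class name and default below are assumptions for illustration:

```python
import attr

@attr.s(auto_attribs=True)
class OnPolicyHyperparamSettings:
    num_epoch: int = attr.ib(default=3)  # illustrative default

    @num_epoch.validator
    def _check_num_epoch(self, attribute, value):
        # Reject non-positive epoch counts before training starts.
        if value < 1:
            raise ValueError(f"num_epoch must be a positive integer, got {value}")
```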

* Online/offline custom trainer examples with plugin system (#52)

* add on/off policy classes and inherit from them

* trainers as plugins

* a2c trains

* remove swap files

* clean up registration debug

* clean up all pre-commit

* a2c plugin passes pre-commit

* move gae to trainer utils

* move lambda return to trainer util

* add validator for num_epoch

* add types for settings/type methods

* move create policy into highest level api

* move update_reward_signal into optimizer

* move get_policy into Trainer

* remove get settings type

* dummy_config settings

* move all stats from actor into dict, enables arbitrary actor data

* remove shared_critic flag, cleanups

* refactor create_policy

* remove sample_actions, evaluate_actions, update_norm from policy

* remove comments

* fix return type get stat

* update poca create_policy

* clean up policy init

* remove conftest

* add shared_critic to settings

* fix test_networks

* fix test_policy

* fix test network

* fix some ppo/sac tests

* add back conftest.py

* improve specification of trainer type

* add defaults for trainer_type/hyperparam

* fix test_saver

* fix reward providers

* add settings check utility for tests

* fix some settings tests

* add trainer types to run_experiment

* type check for arbitrary actor data

* cherrypick rename ml-agents/trainers/torch to torch_entities (#55)

* make all trainer types and settings visible at module level

* remove settings from run_experiment console script

* fix test_settings and upgrade config scripts

* remove need for trainer_type argument up to TrainerFactory

* fix ghost trainer behavior id in policy queue

* fix torch shadow in tests

* update trainers, rl trainers tests

* update tests to match the refactors

* fixing behavior name in ghost trainer

* update ml-agents-envs test configs

* fix precommit

* separating the plugin package changes

* bring get_policy back for the sake of the ghost trainer

* add return types and remove unused returns

* remove duplicate methods in poca (_update_policy, add_policy)

* add a2c trainer back

* Add DQN cleaned up trainer/optimizer

* nit naming

* fix logprob/entropy types in torch_policy.py

* clean up DQN/SAC

* add docs for custom trainers, TODO: reference tutorial

* add docs for custom trainers, TODO: reference tutorial

* add clipping to loss function

* set old importlib-metadata version

* bump pre-commit hook env to 3.8.x

* use smooth l1 loss (loss sketch after this block)

Co-authored-by: mahon94 <maryam.honari@unity3d.com>
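
The "use smooth l1 loss" and "add clipping to loss function" items apply to the example DQN trainer added here. A hedged PyTorch sketch of that update step; the function, batch layout, and hyperparameters are illustrative assumptions, and gradient-norm clipping is one plausible reading of "clipping":

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99, max_grad_norm=10.0):
    """One TD update. `batch` holds obs, actions (B, 1), rewards, dones, next_obs."""
    q_values = q_net(batch["obs"]).gather(1, batch["actions"].long())
    with torch.no_grad():
        next_q = target_net(batch["next_obs"]).max(dim=1, keepdim=True).values
        target = batch["rewards"] + gamma * (1.0 - batch["dones"]) * next_q
    # Smooth L1 (Huber) damps outlier TD errors compared to plain MSE.
    loss = F.smooth_l1_loss(q_values, target)
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(q_net.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```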

* add tutorial for validation

* fix formatting errors

* clean up

* minor changes

Co-authored-by: Andrew Cohen <andrew.cohen@unity3d.com>
Co-authored-by: zhuo <zhuo@unity3d.com>
2022-10-20 16:06:58 -04:00
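
For context on the plugin system referenced throughout this log: custom trainers are discovered through setuptools entry points, which is likely why "set old importlib-metadata version" appears above (newer importlib-metadata releases changed the entry_points() interface). A hedged sketch of a plugin package's setup.py; the group name and helper follow the example trainer plugin and may differ from the final API:

```python
from setuptools import setup

setup(
    name="mlagents_custom_trainer_plugin",  # hypothetical package name
    version="0.0.1",
    entry_points={
        # ml-agents scans this entry-point group at startup to register
        # additional trainer types alongside the built-in ppo/sac/poca.
        "mlagents.trainer_type": [
            "a2c=mlagents_custom_trainer_plugin.a2c.a2c_trainer:get_type_and_setting"
        ]
    },
)
```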