* Make create_policy more generic (#54)
* add on/off policy classes and inherit from
* trainers as plugins
* remove swap files
* clean up registration debug
* clean up all pre-commit
* a2c plugin pass precommit
* move gae to trainer utils
* move lambda return to trainer util
* add validator for num_epoch
* add types for settings/type methods
* move create policy into highest level api
* move update_reward_signal into optimizer
* move get_policy into Trainer
* remove get settings type
* dummy_config settings
* move all stats from actor into dict, enables arbitrary actor data
* remove shared_critic flag, cleanups
* refactor create_policy
* remove sample_actions, evaluate_actions, update_norm from policy
* remove comments
* fix return type get stat
* update poca create_policy
* clean up policy init
* remove conftest
* add shared_critic to settings
* fix test_networks
* fix test_policy
* fix test network
* fix some ppo/sac tests
* add back conftest.py
* improve specification of trainer type
* add defaults for trainer_type/hyperparam
* fix test_saver
* fix reward providers
* add settings check utility for tests
* fix some settings tests
* add trainer types to run_experiment
* type check for arbitrary actor data
* cherrypick rename ml-agents/trainers/torch to torch_entities (#55)
* make all trainer types and settings visible at module level
* remove settings from run_experiment console script
* fix test_settings and upgrade config scripts
* remove need for trainer_type argument up to TrainerFactory
* fix ghost trainer behavior id in policy queue
* fix torch shadow in tests
* update trainers, rl trainers tests
* update tests to match the refactors
* fix behavior name in ghost trainer
* update ml-agents-envs test configs
* separating the plugin package changes
* bring get_policy back for sake of ghost trainer
* add return types and remove unused returns
* remove duplicate methods in poca (_update_policy, add_policy)
Co-authored-by: mahon94 <maryam.honari@unity3d.com>
* Online/offline custom trainer examples with plugin system (#52)
* add on/off policy classes and inherit from
* trainers as plugins
* a2c trains
* remove swap files
* clean up registration debug
* clean up all pre-commit
* a2c plugin pass precommit
* move gae to trainer utils
* move lambda return to trainer util
* add validator for num_epoch
* add types for settings/type methods
* move create policy into highest level api
* move update_reward_signal into optimizer
* move get_policy into Trainer
* remove get settings type
* dummy_config settings
* move all stats from actor into dict, enables arbitrary actor data
* remove shared_critic flag, cleanups
* refactor create_policy
* remove sample_actions, evaluate_actions, update_norm from policy
* remove comments
* fix return type get stat
* update poca create_policy
* clean up policy init
* remove conftest
* add shared_critic to settings
* fix test_networks
* fix test_policy
* fix test network
* fix some ppo/sac tests
* add back conftest.py
* improve specification of trainer type
* add defaults for trainer_type/hyperparam
* fix test_saver
* fix reward providers
* add settings check utility for tests
* fix some settings tests
* add trainer types to run_experiment
* type check for arbitrary actor data
* cherrypick rename ml-agents/trainers/torch to torch_entities (#55)
* make all trainer types and settings visible at module level
* remove settings from run_experiment console script
* fix test_settings and upgrade config scripts
* remove need for trainer_type argument up to TrainerFactory
* fix ghost trainer behavior id in policy queue
* fix torch shadow in tests
* update trainers, rl trainers tests
* update tests to match the refactors
* fix behavior name in ghost trainer
* update ml-agents-envs test configs
* fix precommit
* separating the plugin package changes
* bring get_policy back for sake of ghost trainer
* add return types and remove unused returns
* remove duplicate methods in poca (_update_policy, add_policy)
* add a2c trainer back
* Add DQN cleaned up trainer/optimizer
* nit naming
* fix logprob/entropy types in torch_policy.py
* clean up DQN/SAC
* add docs for custom trainers, TODO: reference tutorial
* add clipping to loss function
* set old importlib-metadata version
* bump pre-commit hook env to 3.8.x
* use smooth l1 loss
Co-authored-by: mahon94 <maryam.honari@unity3d.com>
* add tutorial for validation
* fix formatting errors
* clean up
* minor changes
Co-authored-by: Andrew Cohen <andrew.cohen@unity3d.com>
Co-authored-by: zhuo <zhuo@unity3d.com>