Build simulators from data for use in reinforcement learning and the Bonsai platform for machine teaching.

Data-driven model creation for simulators to train brains on Bonsai

Tooling that simplifies the creation and use of data-driven simulators, built with supervised learning, for training brains with Project Bonsai. It ingests data as csv and generates simulation models that can then be used directly to train a reinforcement learning agent.

🚩 Disclaimer: This is not an official Microsoft product. This application is considered an experimental addition to Microsoft Project Bonsai's software toolchain. Its primary goal is to reduce the barriers of entry to using Project Bonsai's core machine teaching. Pull requests for fixes and small enhancements are welcome, but we expect this to be replaced by out-of-the-box features of Project Bonsai in the near future.

Dependencies

conda env update -f environment.yml
conda activate datadriven

Main steps to follow

Step 1. Obtain historical or surrogate sim data in csv format.

  • include header names
  • each row should be a single slice in time
  • ensure the data ranges cover the region you expect reinforcement learning to explore
  • smooth noisy data and remove outliers
  • remove NaN or N/A values

Refer to later sections for help with checking data quality before using this tool.
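
For instance, a minimal pandas sketch of that kind of cleanup (the file names, the column Vm, and the clipping bounds are illustrative assumptions, not part of this tool):

import pandas as pd

# Load the raw CSV (hypothetical file name)
df = pd.read_csv("./csv_data/raw_data.csv")

# Remove rows containing NaN or N/A values
df = df.dropna()

# Drop outliers, e.g. keep an action column within its physical range (illustrative bounds)
df = df[df["Vm"].between(-10, 10)]

# Optionally smooth a noisy column with a rolling mean
df["Vm"] = df["Vm"].rolling(window=5, min_periods=1).mean()

df.to_csv("./csv_data/example_data.csv", index=False)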

Step 2. Change the config_model.yml file in the config/ folder

Enter the csv file name. Enter the timelag, i.e. the number of rows or iterations that define a state transition in the data. A timelag of 1 uses every row of data; it is appropriate when a change made to the system/process shows its result in the very next sample measurement.

Define the names of the features that are inputs to the simulator model you will create. The names should match the headers of the csv file you provide, and each value should be either state or action. Define the output_name entries to match the headers in your csv.

Define the model type as gb, poly, nn, or lstm. Depending on the model you choose, alter its hyperparameters in this config file as well.

# Define csv file path to train a simulator with
DATA:
  path: ./csv_data/example_data.csv
  timelag: 1
# Define the inputs and outputs of datadriven simulator
IO:
  feature_name:
    theta: state
    alpha: state
    theta_dot: state
    alpha_dot: state
    Vm: action
  output_name:
    - theta
    - alpha
    - theta_dot
    - alpha_dot
# Select the model type gb, poly, nn, or lstm
MODEL:
  type: gb
# Polynomial Regression hyperparameters
POLY:
  degree: 1
# Gradient Boost hyperparameters
GB:
  n_estimators: 100
  lr: 0.1
  max_depth: 3
# MLP Neural Network hyperparameters
NN:
  epochs: 100
  batch_size: 512
  activation: linear
  n_layer: 5
  n_neuron: 12
  lr: 0.00001
  decay: 0.0000003
  dropout: 0.5
# LSTM Neural Network hyperparameters
LSTM:
  epochs: 100
  batch_size: 512
  activation: linear
  num_hidden_layer: 5
  n_neuron: 12
  lr: 0.00001
  decay: 0.0000003
  dropout: 0.5
  markovian_order: 2
  num_lstm_units: 1
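
To make the timelag setting concrete: the training pairs are effectively each row's states and action matched with the states found timelag rows later. A hypothetical pandas sketch of that pairing (not the tool's actual code; the names follow the example config above):

import pandas as pd

timelag = 1
state_cols = ["theta", "alpha", "theta_dot", "alpha_dot"]

df = pd.read_csv("./csv_data/example_data.csv")

# Inputs: current states plus action; targets: the states `timelag` rows later
X = df[state_cols + ["Vm"]].iloc[:-timelag]
y = df[state_cols].shift(-timelag).iloc[:-timelag]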

Step 3. Run the tool

python datamodeler.py

The tool will ingest your csv file as input and create a simulator model of the type you selected. The resulting model will be placed in models/.

Step 4. Use the model directly

An adaptor class is available for building custom integrations in the following way. Step 5 already does this for you, but it helps to understand what is happening underneath. Initialize the class with the model type, which is one of 'gb', 'poly', 'nn', or 'lstm'.

Optionally specify a noise_percentage to add noise to the states of the simulator; leaving it at zero adds no noise. Training a brain can benefit from adding noise to the states of an approximated simulator to promote robustness.

Define the action_space_dim and the state_space_dim. The markovian_order is only needed for an LSTM, where it sets the sequence length of the features.

from predictor import ModelPredictor

predictor = ModelPredictor(
    modeltype="gb",
    noise_percentage=0,
    state_space_dim=4,
    action_space_dim=1,
    markovian_order=0
)

Calculate the next state as a function of the current state and current action. IMPORTANT: the input state and action are arrays. You need to convert a brain action, which arrives as a dictionary, to an array before feeding it into the predictor class.

next_state = predictor.predict(action=action, state=state)
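
For example, a brain action arriving as a dictionary could be converted like this (the key Vm follows the example config above and is illustrative):

import numpy as np

# Brain actions arrive as dictionaries; the array order must match the config
brain_action = {"Vm": 0.83}
action = np.array([brain_action["Vm"]])

next_state = predictor.predict(action=action, state=state)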

A caveat with data-driven simulators is that the approximations cannot be trusted when the feature inputs fall outside the range the model was trained on; extrapolation may give erroneous results. You can optionally check whether this is happening with the warn_limitation() functionality, which prints a warning such as the one below.

features = np.concatenate([state, action])
predictor.warn_limitation(features)

Sim should not be necessarily trusted since predicting with the feature Vm outside of range it was trained on, i.e. extrapolating.

Step 5. Train with Bonsai

Create a brain and write Inkling with type definitions that match what the simulator can provide, which you defined in config_model.yml. Run the train_bonsai_main.py file to register your newly created simulator. The integration is already done! Then connect the simulator to your brain.

Be sure to specify noise_percentage in your Inkling's scenario. Training a brain can benefit from adding noise to the states of an approximated simulator to promote robustness.

The episode_start in train_bonsai_main.py expects the initial conditions of the states defined in config_model.yml to match the scenario dictionary passed in. If you want to pass in other variables that are not modeled by the datadrivenmodel tool (except for noise_percentage), you'll likely have to modify train_bonsai_main.py.

lesson `Start Inverted` {
    scenario {
        theta: number<-1.4 .. 1.4>,
        alpha: number<-0.05 .. 0.05>,  # reset inverted
        theta_dot: number<-0.05 .. 0.05>,
        alpha_dot: number<-0.05 .. 0.05>,
        noise_percentage: 5,
    }
}

Ensure the SimConfig in Inkling matches the names of the headers in the config_model.yml to allow train_bonsai_main.py to work.
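
An illustrative SimConfig for the example config above might look like the following (the field names mirror the csv headers and the noise_percentage from the scenario):

type SimConfig {
    theta: number,
    alpha: number,
    theta_dot: number,
    alpha_dot: number,
    noise_percentage: number,
}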

python train_bonsai_main.py --workspace <workspace-id> --accesskey <accesskey>

Optional Flags

Use pickle instead of csv as data input

Name your dataset as x_set.pickle and y_set.pickle.

python datamodeler.py --pickle <foldername>

For example, one might use a <foldername> of env_data.
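
A minimal sketch of producing those files, assuming x_set and y_set are numpy arrays of features and targets (the shapes below follow the example config's four states and one action, and the random contents are purely illustrative):

import pickle
import numpy as np

# Illustrative arrays: features are states + actions, targets are next states
x_set = np.random.rand(1000, 5)
y_set = np.random.rand(1000, 4)

with open("env_data/x_set.pickle", "wb") as f:
    pickle.dump(x_set, f)
with open("env_data/y_set.pickle", "wb") as f:
    pickle.dump(y_set, f)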

Hyperparameter tuning

Gradient Boost should not require much tuning at all. Polynomial Regression may benefit from changing the degree. Neural networks, however, may require significant hyperparameter tuning. Use the following flag to randomly search over the ranges specified in the config_model.yml file.

python datamodeler.py --tune-rs=True

LSTM

After creating an LSTM model for your sim, you can use the predictor class in the same way as the other models. The predictor class initializes a sequence and will continue to stack a history of state transitions and pop off the oldest information. In order to maintain a sequence of valid distributions when starting a sim using the LSTM model, the predictor class takes a single timestep of initial conditions and will automatically step through the sim using the mean value of each of the actions captured from the data in model_limits.yml.

from predictor import ModelPredictor
import numpy as np

predictor = ModelPredictor(
    modeltype="lstm",
    noise_percentage=0,
    state_space_dim=4,
    action_space_dim=1,
    markovian_order=3
)

config = {
    'theta': 0.01,
    'alpha': 0.02,
    'theta_dot': 0.04,
    'alpha_dot': 0.05,
    'noise_percentage': 5,
}

predictor.reset_state(config)

for i in range(1):
    next_state = predictor.predict(
        action=np.array([0.83076303]),
        state=np.array(
            [0.6157155, 0.19910571, 0.15492792, 0.09268583]
        )
    )
    print('next_state: ', next_state)

print('history state: ', predictor.state)
print('history action: ', predictor.action_history)

Running the LSTM snippet produces the following history of states and actions. Note that reset_state(config) auto-populates realistic trajectories for the sequence using the mean action, so the 0th iteration of the simulation does not necessarily start exactly where the user specified in the config. Continuing to call predict() steps through the sim and automatically maintains the history of state transitions for you, matching the sequence of information the LSTM requires.

next_state:  [ 0.41453919  0.07664483 -0.13645072  0.81335021]
history state:  deque([0.6157155, 0.19910571, 0.15492792, 0.09268583, 0.338321704451164, 0.018040116405597596, -0.5707406858943783, 0.3023940018967715, 0.01, 0.02, 0.04, 0.05], maxlen=12)
history action:  deque([0.83076303, -0.004065768182411825, -0.004065768182411825], maxlen=3)

Build Simulator Package

az acr build --image <IMAGE_NAME>:<IMAGE_VERSION> --file Dockerfile --registry <ACR_REGISTRY> .

Data Evaluation

Use the release version of the jupyter notebook to assist you with qualifying your data for creation of a simulator using supervised learning. The notebook is split into the three parts listed below and uses the nbgrader package; click the Validate button to determine whether all tests have passed. You are responsible for loading the data, running cells to see if you pass the tests, and manipulating the data in Python if the notebook finds things like NaNs and outliers. It will ask for the desired operating limits of the model you wish to create and compare them against what is available in your provided datasets. Assuming you pass the tests for data relevance, your data will be exported to a single csv named approved_data.csv, which is ready to be ingested by the datadrivenmodel tool.

  • Data Relevance
  • Sparsity
  • Data Distribution Confidence

jupyter notebook release/presales_evaluation/presales_evaluation.ipynb

Once you have successfully qualified your data using the Validate button, it is recommended to export the notebook as a PDF to share the results without requiring access to the data.

Contribute Code

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Telemetry

The software may collect information about you and your use of the software and send it to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may turn off the telemetry as described in the repository. There are also some features in the software that may enable you and Microsoft to collect data from users of your applications. If you use these features, you must comply with applicable law, including providing appropriate notices to users of your applications together with a copy of Microsoft's privacy statement. Our privacy statement is located at https://go.microsoft.com/fwlink/?LinkID=824704. You can learn more about data collection and use in the help documentation and our privacy statement. Your use of the software operates as your consent to these practices.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.