Simulated Data Generator

The Jupyter Notebook Simulated_Data_Generator.ipynb in this folder can generate a synthetic dataset and simulate an experiment to get DSJson logs.

Overview

To generate the dataset and logs, prepare a config file (described below) and run the notebook Simulated_Data_Generator.ipynb end to end. The notebook has two parts:

  1. Generate a Simulated Dataset

    • Use the config file to generate a dataset with the specified contexts, actions, and rewards. We refer to this as the ground truth file.
  2. Transform to DSJson and Train a VW Model

    • At each iteration, randomly sample a batch from the ground truth file.
    • Get actions according to the latest model predictions.
    • Get rewards from the ground truth file.
    • Send the batch to VW for training and update the model.
    • Save the logs for each batch separately. These logs will be used for the Context Explorer.
    • This whole process simulates an experiment in which VW learns a policy that maximizes reward on the ground truth data; a minimal sketch of the loop is shown below.
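
A minimal sketch of this loop in Python, assuming a pandas ground truth table and hypothetical choose_actions / train_and_update placeholders (the actual VW training and DSJson conversion live in Simulated_Data_Generator.ipynb and vw_offline_utilities.py):

import pandas as pd

# Toy ground truth: one reward per sampled (context, action) row.
ground_truth = pd.DataFrame({
    "CallType": ["1_1", "GVC", "1_1", "GVC"],
    "Action":   [1, 2, 1, 2],
    "Reward":   [0.1, 0.3, 0.2, 0.0],
})

def choose_actions(model, batch):
    # Placeholder policy: always pick the default action.
    return [1] * len(batch)

def train_and_update(model, batch):
    # Placeholder: the notebook converts the batch to DSJson and trains VW here.
    return model

model = None
batch_size, iterations = 2, 3
for i in range(iterations):
    # Sample a batch from the ground truth (with replacement).
    batch = ground_truth.sample(n=batch_size, replace=True, random_state=i).copy()
    # Choose actions with the latest model, look up their rewards, then train.
    batch["ChosenAction"] = choose_actions(model, batch)
    model = train_and_update(model, batch)
    # Each batch is also written out as a DSJson log for the Context Explorer.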

Config File

The key input to the notebook is the config file config_data_generator.json. Here is an example config file, followed by details on each field:

{
    "dataset_name": "Test",
    "output_folder": "E:\\data\\20190729_context_explorer\\simulated_data",
    "reward_range": [-1, 1],
    "reward_dense_range": [0, 0.3],
    "actions": [1, 2, 3, 4, 5, 6, 7, 8],
    "contexts": {
        "CallType": ["1_1", "GVC"],
        "MediaType": ["Audio", "Video"],
        "NetworkType": ["wifi", "wired"]
    },
    "context_action_size": 1000,
    "increase_winning_margin": 0.02,
    "center": true,
    "p_value": 0.001,
    "random_state": 3,
    "model_parameters": {
        "batch_size_initial": 5000,
        "batch_size":5000,
        "iterations": 30,
        "default_action_index": 0,
        "add_control_group": false
    },
    "vw_commands":{
        "exploration_policy": "--epsilon 0.3",
        "cb_type": "ips",
        "interactions": "--interactions iFFF",
        "learning_rate": 0.001,
        "other_commands": "--power_t 0"
    }
}
  • dataset_name [str]: Name of the dataset

  • output_folder [str]: Path where the dataset will be saved. Note that the DSJson logs will be saved to output_folder\logs.

  • reward_range [list]: Lower and upper bounds of the rewards

  • reward_dense_range [list]: The range into which most reward values should fall

  • actions [list]: List of all possible actions

  • contexts [dict]: A dictionary of contexts and their unique values. For example "Color": ["red", "blue"]

  • context_action_size [int]: Number of samples generated for each context-action pair

  • p_value [float]: (optional) p-value threshold for t-test. Default 0.001

  • increase_winning_margin [float]: (optional) Add this value to the winning action's rewards to widen the winning margin. The higher the value, the easier the optimization problem. Default 0

  • center [bool]: (optional) Center data by removing the mean reward. Default True

  • random_state [int]: (optional) random seed. Default 1

  • model_parameters [dict]:

    • batch_size_initial [int]: Sample size for the first iteration
    • batch_size [int]: Sample size for the following iterations
    • iterations [int]: Number of iterations
    • default_action_index [int]: (optional) Index of the default action in the "actions" list. Default 0 (the first action in the list)
    • add_control_group [bool]: (optional) If true, add a control group whose data is not used to train the policy. Default False
  • vw_commands [dict]:

    • exploration_policy [str]: (optional) Default "--epsilon 0.3"
    • cb_type [str]: (optional) Default "ips"
    • interactions [str]: (optional) Default "--interactions iFFF"
    • learning_rate [float]: (optional) Default 0.001
    • other_commands [str]: (optional) Default ""
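
As an illustration only, the vw_commands fields can be assembled into a single VW command line. The sketch below assumes a --cb_explore_adf setup; the exact flags the notebook passes may differ:

import json

# Load the config and join the vw_commands fields into one command string.
with open("config_data_generator.json") as f:
    config = json.load(f)

vw = config.get("vw_commands", {})
cmd = " ".join(filter(None, [
    "--cb_explore_adf",
    vw.get("exploration_policy", "--epsilon 0.3"),
    "--cb_type " + vw.get("cb_type", "ips"),
    vw.get("interactions", "--interactions iFFF"),
    "-l " + str(vw.get("learning_rate", 0.001)),
    vw.get("other_commands", ""),
]))
print(cmd)
# e.g. --cb_explore_adf --epsilon 0.3 --cb_type ips --interactions iFFF -l 0.001 --power_t 0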