Simulated Data Generator
The Jupyter Notebook Simulated_Data_Generator.ipynb in this folder generates a synthetic dataset and simulates an experiment that produces DSJson logs.
To generate the dataset and logs, prepare a config file (described later) and then run the notebook Simulated_Data_Generator.ipynb end to end. The notebook has two parts:
Generate a Simulated Dataset
- Use the config file to generate a dataset with the specified contexts, actions, and rewards. We refer to this as the ground truth file.
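To make the role of the config fields concrete, here is a minimal sketch of how a ground truth file of this shape could be built. It is not the notebook's algorithm: drawing each action's mean reward uniformly from reward_dense_range, the Gaussian noise scale of 0.05, and the order of centering and clipping are all assumptions made for illustration, and `generate_ground_truth` is a hypothetical name.

```python
import itertools
import random


def generate_ground_truth(contexts, actions, context_action_size,
                          reward_range=(-1.0, 1.0), reward_dense_range=(0.0, 0.3),
                          increase_winning_margin=0.0, center=True, random_state=1):
    """Hypothetical sketch: one block of rows per (context combination, action) pair."""
    rng = random.Random(random_state)
    names = list(contexts)
    rows = []
    # Every combination of context values, e.g. ("1_1", "Audio", "wifi").
    for values in itertools.product(*(contexts[n] for n in names)):
        context = dict(zip(names, values))
        # Assumption: each action's mean reward is drawn from the dense range.
        means = {a: rng.uniform(*reward_dense_range) for a in actions}
        winner = max(means, key=means.get)
        for action in actions:
            # Boost the best action so it wins by a clearer margin.
            mean = means[action] + (increase_winning_margin if action == winner else 0.0)
            for _ in range(context_action_size):
                reward = rng.gauss(mean, 0.05)  # assumed noise scale
                rows.append({**context, "action": action, "reward": reward})
    if center:
        # Remove the overall mean reward.
        overall = sum(r["reward"] for r in rows) / len(rows)
        for r in rows:
            r["reward"] -= overall
    # Keep every reward inside the configured boundaries.
    lo, hi = reward_range
    for r in rows:
        r["reward"] = min(max(r["reward"], lo), hi)
    return rows
```

Because context_action_size counts samples per context-action pair, the example config below (three contexts with two values each, eight actions, context_action_size 1000) would give this sketch 2 × 2 × 2 × 8 × 1000 = 64,000 rows.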
Transform to DSJson and Train a VW Model
- At each iteration, randomly sample a batch from the ground truth file.
- Choose actions according to the latest model's predictions.
- Get rewards from the ground truth file.
- Send the batch to VW for training and update the model.
- Save the logs for each batch separately. These logs will be used for the Context Explorer.
- This whole process simulates an experiment in which VW learns a policy that maximizes reward on the ground truth data.
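The loop above can be sketched in plain Python. This is a toy stand-in, not the notebook's code: a tabular epsilon-greedy learner (mean observed reward per context-action pair) replaces VW, sampling is done with replacement, and the event dictionaries are a simplified placeholder rather than the DSJson schema. The function name `simulate` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def simulate(ground_truth, actions, context_keys, iterations, batch_size,
             epsilon=0.3, seed=1):
    """Toy replay of the batch loop; a tabular epsilon-greedy learner stands in for VW."""
    rng = random.Random(seed)

    # Index ground-truth rewards by (context, action) so any chosen action can be scored.
    rewards_by_pair = defaultdict(list)
    for row in ground_truth:
        ctx = tuple(row[k] for k in context_keys)
        rewards_by_pair[(ctx, row["action"])].append(row["reward"])

    totals = defaultdict(float)  # running reward sum per (context, action)
    counts = defaultdict(int)    # number of logged events per (context, action)

    def estimate(ctx, action):
        # Mean observed reward; pairs never tried yet score 0.
        n = counts[(ctx, action)]
        return totals[(ctx, action)] / n if n else 0.0

    batches = []
    for _ in range(iterations):
        # 1. Randomly sample a batch (with replacement) from the ground truth.
        sample = rng.choices(ground_truth, k=batch_size)
        log = []
        for row in sample:
            ctx = tuple(row[k] for k in context_keys)
            # 2. Pick an action with the current model: epsilon is spread uniformly,
            #    the remaining 1 - epsilon goes to the greedy action.
            greedy = max(actions, key=lambda a: estimate(ctx, a))
            probs = [epsilon / len(actions) + (1 - epsilon if a == greedy else 0.0)
                     for a in actions]
            action = rng.choices(actions, weights=probs, k=1)[0]
            # 3. Look up the reward for the chosen action in the ground truth.
            reward = rng.choice(rewards_by_pair[(ctx, action)])
            log.append({"context": dict(zip(context_keys, ctx)), "action": action,
                        "probability": probs[actions.index(action)], "reward": reward})
        # 4. Train on the whole batch before the next iteration.
        for event in log:
            key = (tuple(event["context"][k] for k in context_keys), event["action"])
            totals[key] += event["reward"]
            counts[key] += 1
        # 5. Keep each batch's log separately.
        batches.append(log)
    return batches
```

With epsilon 0.3 and three actions, the greedy action is logged with probability 0.8 and each other action with 0.1, the same split that epsilon-greedy exploration assigns. Logging that probability with every event is what later makes off-policy estimates such as IPS possible.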
Config File
The key input to the notebook is the config file config_data_generator.json. Here is an example of the config file and some details:
{
    "dataset_name": "Test",
    "output_folder": "E:\\data\\20190729_context_explorer\\simulated_data",
    "reward_range": [-1, 1],
    "reward_dense_range": [0, 0.3],
    "actions": [1, 2, 3, 4, 5, 6, 7, 8],
    "contexts": {
        "CallType": ["1_1", "GVC"],
        "MediaType": ["Audio", "Video"],
        "NetworkType": ["wifi", "wired"]
    },
    "context_action_size": 1000,
    "increase_winning_margin": 0.02,
    "center": true,
    "p_value": 0.001,
    "random_state": 3,
    "model_parameters": {
        "batch_size_initial": 5000,
        "iterations": 30,
        "default_action_index": 0,
        "add_control_group": false
    },
    "vw_commands": {
        "exploration_policy": "--epsilon 0.3",
        "cb_type": "ips",
        "interactions": "--interactions iFFF",
        "learning_rate": 0.001,
        "other_commands": "--power_t 0"
    }
}
dataset_name [str]: Name of the dataset
output_folder [str]: Path where the dataset will be saved. Note that the DSJson logs will be saved to output_folder\logs.
reward_range [list]: Lower and upper bounds of the reward
reward_dense_range [list]: The range into which most reward values should fall
actions [list]: List of all possible actions
contexts [dict]: A dictionary mapping each context to its unique values. For example: "Color": ["red", "blue"]
context_action_size [int]: Number of samples for each context-action pair
p_value [float]: (optional) p-value threshold for the t-test. Default 0.001
increase_winning_margin [float]: (optional) Add this value to the winning action’s rewards to increase the winning margin. The higher the value, the easier the optimization problem. Default 0
center [bool]: (optional) Center data by removing the mean reward. Default True
random_state [int]: (optional) Random seed. Default 1
model_parameters [dict]:
- batch_size_initial [int]: Sample size for the first iteration
- batch_size [int]: Sample size for the following iterations
- iterations [int]: Number of iterations
- default_action_index [int]: (optional) Index of the default action in the “actions” list. Default 0 (the first action from the list)
- add_control_group [bool]: (optional) Whether to create a proper control group, whose data is not used to train the policy. Default False
vw_commands [dict]:
- exploration_policy [str]: (optional) Default "--epsilon 0.3"
- cb_type [str]: (optional) Default "ips"
- interactions [str]: (optional) Default "--interactions iFFF"
- learning_rate [float]: (optional) Default 0.001
- other_commands [str]: (optional) Default ""
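A minimal sketch of reading the config and applying the defaults listed above. Several points are assumptions: vw_commands is treated as a top-level key (as in the field list), only top-level keys not marked (optional) are checked as required, and the VW argument string, in particular the leading `--cb_explore_adf --dsjson`, is a guess at how the notebook might combine these pieces rather than its actual invocation. The names `load_config` and `vw_arguments` are hypothetical.

```python
import json

# Defaults for the optional keys, as documented in the field list.
DEFAULTS = {"p_value": 0.001, "increase_winning_margin": 0, "center": True, "random_state": 1}
MODEL_DEFAULTS = {"default_action_index": 0, "add_control_group": False}
VW_DEFAULTS = {
    "exploration_policy": "--epsilon 0.3",
    "cb_type": "ips",
    "interactions": "--interactions iFFF",
    "learning_rate": 0.001,
    "other_commands": "",
}
# Top-level keys not marked (optional) in the field list.
REQUIRED = ["dataset_name", "output_folder", "reward_range", "reward_dense_range",
            "actions", "contexts", "context_action_size", "model_parameters"]


def load_config(text):
    """Parse the JSON config and fill in documented defaults."""
    raw = json.loads(text)
    missing = [k for k in REQUIRED if k not in raw]
    if missing:
        raise KeyError(f"missing required keys: {missing}")
    cfg = {**DEFAULTS, **raw}
    cfg["model_parameters"] = {**MODEL_DEFAULTS, **raw["model_parameters"]}
    cfg["vw_commands"] = {**VW_DEFAULTS, **raw.get("vw_commands", {})}
    return cfg


def vw_arguments(vw):
    # Hypothetical assembly: the leading "--cb_explore_adf --dsjson" is an assumption.
    parts = ["--cb_explore_adf", "--dsjson", vw["exploration_policy"],
             "--cb_type", vw["cb_type"], vw["interactions"],
             "-l", str(vw["learning_rate"]), vw["other_commands"]]
    # Skip empty pieces such as the default other_commands "".
    return " ".join(p for p in parts if p)
```

For example, `with open("config_data_generator.json") as f: cfg = load_config(f.read())` followed by `vw_arguments(cfg["vw_commands"])` would, for the example config, yield `--cb_explore_adf --dsjson --epsilon 0.3 --cb_type ips --interactions iFFF -l 0.001 --power_t 0`.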