* Add context explorer

* Data Generator: Add README; Expose add_control_group

* Context explorer add IPS fix, improve visualization, add examples
Yaran Fan 2020-02-12 19:09:28 -08:00 committed by GitHub
Parent d23720653e
Commit e46f69cfd4
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
30 changed files with 1998 additions and 0 deletions

70
ContextExplorer/README.md Normal file

@ -0,0 +1,70 @@
# Context Explorer
Context Explorer generates detailed context reports by reading DSJson logs. You can find an example in the <i>sample_output</i> folder (download the folder to view the HTML report). Here's a screenshot of the report:
<img src="sample_output/ce_report_screenshot.PNG" width="1000">
## What is Context Explorer?
For contextual bandit problems, model performance and impact may vary across contexts. For some experiments, it is critical to understand how the model works in each context.
Context Explorer tackles this problem in an A/B test fashion. The control group always takes the default action; the treatment group always applies the policy learned by the model. We then compare the performance of the two groups by context to help evaluate how the policy works.
The output is a single HTML file with 4 tabs:
* **Experiment Summary**: Experiment name and time range
* **Overall Trend**: Overall trend of counts, rewards and cumulative rewards
* **Context Summary**: Top sensitive contexts and their counts, rewards, exploit action
* **Context Trend**: Top sensitive contexts and their trend of counts, rewards and cumulative rewards
In the report, some contexts may be excluded due to sample size constraints and sensitivity limits. You can find information on all contexts in an xlsx log file saved along with the report. A link is provided at the bottom of the _Context Summary_ tab.
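The control-versus-treatment comparison described above can be sketched as follows. This is a minimal illustration, not Context Explorer's implementation: the event tuples, the `compare_by_context` function and the `lift` field are hypothetical names, and the sketch uses plain averages without any importance weighting.

```python
from collections import defaultdict

# Hypothetical, already-parsed log events: (context, group, reward).
events = [
    ("wifi", "control", 0.10), ("wifi", "control", 0.20),
    ("wifi", "treatment", 0.30), ("wifi", "treatment", 0.40),
    ("wired", "control", 0.50), ("wired", "control", 0.30),
    ("wired", "treatment", 0.20), ("wired", "treatment", 0.40),
]


def compare_by_context(events):
    """Return the mean reward of each group and the treatment lift per context."""
    rewards = defaultdict(lambda: defaultdict(list))
    for context, group, reward in events:
        rewards[context][group].append(reward)

    report = {}
    for context, groups in rewards.items():
        control = sum(groups["control"]) / len(groups["control"])
        treatment = sum(groups["treatment"]) / len(groups["treatment"])
        report[context] = {
            "control": control,
            "treatment": treatment,
            "lift": treatment - control,  # positive: the policy beats the default
        }
    return report


report = compare_by_context(events)
for context, stats in report.items():
    print(context, stats)
```

A context where the lift is far from zero is one where the learned policy behaves very differently from the default action, which is the kind of context the report is meant to surface.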
## Who can use Context Explorer? How?
In general, there are two use cases when you already have some logs from an experiment:
* When there is no proper control group:
    * Step 1. Prepare a <i>config_file.json</i> (details later). Specify a <i>"default_action_index"</i> in the config file. It will be used as the control group.
    * Step 2. Generate the Context Explorer report by `python run_context_explorer.py "PATH/TO/config_file.json"`
* When there is a proper control group:
    * Step 1. Prepare a <i>config_file.json</i> (details later). Specify a <i>"control_identifier"</i> in the config file.
    * Step 2. Generate the Context Explorer report by `python run_context_explorer.py "PATH/TO/config_file.json"`
If you do not have an existing experiment but want to play with Context Explorer, we provide a simulated data generator in this repo. Follow the steps below to generate simulated data and use Context Explorer:
* Step 1. Generate a simulated dataset and logs by following instructions in the <i>Simulated_Data_Generator</i> folder
* Step 2. Prepare a config file
* Step 3. Generate the Context Explorer report by `python run_context_explorer.py "PATH/TO/config_file.json"`
## Context Explorer config file
The config file controls how you analyze your logs. Here is an example of the config file:
{
"exps": {
"NewTest": {
"start_date": "2020-02-12",
"end_date": "2020-03-02",
"context_feature_namespace": ["Features"],
"action_label_key": ["id"],
"default_action_index": 0,
"control_identifier": {
"_group": "control"
},
"data_folder": "E:\\data\\20190729_context_explorer\\simulated_data\\logs"
}
},
"output_folder": "\\\\avdsp-share\\InputFiles\\p_learning\\Monitoring",
"show_top_sensitive_contexts": 20,
"min_daily_sample": 20
}
* **exps** [dict]:
    * **exp_name** [key]
        * **start_date** [str]: _(optional)_ The earliest date to be considered for the analysis. Defaults to the earliest date in the dataset
        * **end_date** [str]: _(optional)_ The last date to be considered for the analysis. Defaults to the latest date in the dataset
        * **context_feature_namespace** [list]: Can be a list of strings (e.g. ["Features"]) or a list of lists (e.g. [["FromUrl", "dayOfWeek"], ["FromUrl", "networkType"]])
        * **action_label_key** [list]: The key within “_multi” in the DSJson logs. Its values will be used as action labels
        * **default_action_index** [int]: _(optional)_ The action that will be treated as the control group. Default 0
        * **control_identifier** [dict]: _(optional)_ Key-value pairs from the DSJson logs that can identify the control group
        * **data_folder** [str]: The path where the target DSJson logs are located
* **output_folder** [str]: The path where the reports will be exported to
* **show_top_sensitive_contexts** [int]: _(optional)_ Only show the most sensitive contexts in the context analysis. Default 20
* **min_daily_sample** [int]: _(optional)_ Only show contexts with an average daily count that is at least min_daily_sample for both control and treatment group. Default 200
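As an illustration of the required and optional keys listed above, here is a small sketch that parses a config and fills in the documented defaults. The `load_config` helper is hypothetical and is not part of `run_context_explorer.py`; the default values are the ones stated in the list above.

```python
import json

# Keys the list above does not mark as optional.
REQUIRED_EXP_KEYS = ("context_feature_namespace", "action_label_key", "data_folder")


def load_config(text):
    """Parse a config string, check required keys and apply documented defaults."""
    config = json.loads(text)
    if "output_folder" not in config:
        raise KeyError("output_folder is required")
    config.setdefault("show_top_sensitive_contexts", 20)
    config.setdefault("min_daily_sample", 200)
    for exp_name, exp in config["exps"].items():
        for key in REQUIRED_EXP_KEYS:
            if key not in exp:
                raise KeyError("{0} is required for experiment {1}".format(key, exp_name))
        exp.setdefault("default_action_index", 0)
    return config


example = """
{
    "exps": {
        "NewTest": {
            "context_feature_namespace": ["Features"],
            "action_label_key": ["id"],
            "data_folder": "logs"
        }
    },
    "output_folder": "reports"
}
"""

config = load_config(example)
print(config["min_daily_sample"], config["exps"]["NewTest"]["default_action_index"])
```

The optional `start_date` and `end_date` are left untouched here because their defaults depend on the dates found in the logs.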


@ -0,0 +1,79 @@
# Simulated Data Generator
The Jupyter Notebook <i>Simulated_Data_Generator.ipynb</i> in this folder can generate a synthetic dataset and simulate an experiment to get DSJson logs.
## Overview
To generate the dataset and logs, prepare a config file (described later) and then run the Notebook <i>Simulated_Data_Generator.ipynb</i> end to end.
There are two parts in the notebook:
1. Generate a Simulated Dataset
    * Use the config file to generate a dataset with the specified context, action and rewards. We refer to this as the ground truth file.
2. Transform to DSJson and Train a VW Model
    * At each iteration, randomly sample a batch from the ground truth file.
    * Get actions according to the latest model predictions.
    * Get rewards from the ground truth file.
    * Send the batch to VW for training and update the model.
    * Save the logs for each batch separately. These logs will be used for the Context Explorer.
    * This whole process simulates an experiment in which VW learns a policy to maximize reward for the ground truth data.
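The loop in part 2 can be sketched without VW. In the sketch below a simple epsilon-greedy learner with running averages stands in for the VW model, so it only illustrates the sample, act, reward, update, log cycle; the contexts, actions and `TRUE_MEAN` table are made-up values.

```python
import random

random.seed(0)

ACTIONS = [0, 1, 2]
CONTEXTS = ["wifi", "wired"]
EPSILON = 0.3  # share of decisions spent on random exploration

# Hypothetical ground truth: mean reward of every (context, action) pair.
TRUE_MEAN = {
    ("wifi", 0): 0.1, ("wifi", 1): 0.3, ("wifi", 2): 0.2,
    ("wired", 0): 0.3, ("wired", 1): 0.1, ("wired", 2): 0.2,
}

totals = {key: 0.0 for key in TRUE_MEAN}
counts = {key: 0 for key in TRUE_MEAN}


def estimate(context, action):
    """Current average reward observed for this context and action."""
    n = counts[(context, action)]
    return totals[(context, action)] / n if n else 0.0


def choose(context):
    """Explore with probability EPSILON, otherwise exploit the best estimate."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: estimate(context, a))


logs = []
for iteration in range(30):
    # Sample a batch, act with the latest model, read rewards from the ground truth.
    batch = []
    for _ in range(200):
        context = random.choice(CONTEXTS)
        action = choose(context)
        reward = random.gauss(TRUE_MEAN[(context, action)], 0.05)
        batch.append((context, action, reward))
    # Update the model once per batch, then keep the batch as its own log.
    for context, action, reward in batch:
        totals[(context, action)] += reward
        counts[(context, action)] += 1
    logs.append(batch)

learned = {c: max(ACTIONS, key=lambda a: estimate(c, a)) for c in CONTEXTS}
print(learned)
```

Updating only between batches mirrors the notebook's structure, where every action in a batch is chosen with the model from the previous iteration.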
## Config File
The key input to the notebook is the config file <i>config_data_generator.json</i>. Here is an example of the config file and some details:
{
"dataset_name": "Test",
"output_folder": "E:\\data\\20190729_context_explorer\\simulated_data",
"reward_range": [-1, 1],
"reward_dense_range": [0, 0.3],
"actions": [1, 2, 3, 4, 5, 6, 7, 8],
"contexts": {
"CallType": ["1_1", "GVC"],
"MediaType": ["Audio", "Video"],
"NetworkType": ["wifi", "wired"]
},
"context_action_size": 1000,
"increase_winning_margin": 0.02,
"center": true,
"p_value": 0.001,
"random_state": 3,
"model_parameters": {
"batch_size_initial": 5000,
"batch_size":5000,
"iterations": 30,
"default_action_index": 0,
"add_control_group": false
},
"vw_commands":{
"exploration_policy": "--epsilon 0.3",
"cb_type": "ips",
"interactions": "--interactions iFFF",
"learning_rate": 0.001,
"other_commands": "--power_t 0"
}
}
* **dataset_name** [str]: Name of the dataset
* **output_folder** [str]: Path where the dataset will be saved. Note that the DSJson logs will be saved to **output_folder\logs**.
* **reward_range** [list]: The reward boundaries
* **reward_dense_range** [list]: The range within which most reward values should fall
* **actions** [list]: List of all possible actions
* **contexts** [dict]: A dictionary of contexts and their unique values. For example `"Color": ["red", "blue"]`
* **context_action_size** [int]: Number of samples for each context*action pair
* **p_value** [float]: _(optional)_ p-value threshold for t-test. Default 0.001
* **increase_winning_margin** [float]: _(optional)_ Add this value to the winning actions rewards to increase the winning margin. The higher the value, the easier the optimization problem. Default 0
* **center** [bool]: _(optional)_ Center data by removing the mean reward. Default True
* **random_state** [int]: _(optional)_ random seed. Default 1
* **model_parameters** [dict]:
    * **batch_size_initial** [int]: Sample size for the first iteration
    * **batch_size** [int]: Sample size for the following iterations
    * **iterations** [int]: Number of iterations
    * **default_action_index** [int]: _(optional)_ Index of the default action in the _“actions”_ list. Default 0 (the first action from the list)
    * **add_control_group** [bool]: _(optional)_ To create a proper control group, whose data will not be used to train the policy. Default False
* **vw_commands** [dict]:
    * **exploration_policy** [str]: _(optional)_ Default "--epsilon 0.3"
    * **cb_type** [str]: _(optional)_ Default "ips"
    * **interactions** [str]: _(optional)_ Default "--interactions iFFF"
    * **learning_rate** [float]: _(optional)_ Default 0.001
    * **other_commands** [str]: _(optional)_ Default ""
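To make the effect of **reward_dense_range** and **actions** concrete, the sketch below reproduces the arithmetic that `update_params` and `generate_data` in this commit apply to them. The `reward_parameters` helper is a hypothetical name, and `random.gauss` stands in for the `np.random.normal` call used by the generator.

```python
import random


def reward_parameters(reward_dense_range, actions):
    """Standard deviation and bounds for the mean reward of each context-action pair."""
    low, high = reward_dense_range
    sd = (high - low) / (len(actions) + 1)
    mean_low = low + 2 * sd  # lower bound for the sampled mean
    mean_high = high         # upper bound for the sampled mean
    return sd, mean_low, mean_high


sd, mean_low, mean_high = reward_parameters([0, 0.3], [1, 2, 3, 4, 5, 6, 7, 8])

random.seed(3)
# One context-action pair: draw its mean, then draw context_action_size rewards.
mu = random.uniform(mean_low, mean_high)
rewards = [random.gauss(mu, sd) for _ in range(1000)]

print(round(sd, 4), round(mean_low, 4), mean_high)
```

With the example config, the eight actions share a standard deviation of 0.3 / 9, so a wider dense range or fewer actions spreads the rewards further apart.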


@ -0,0 +1,364 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Simulated Data Generator"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Prerequisite\n",
"* Install VowpalWabbit(VW) by following [this instruction](https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Building)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Imports\n",
"import json\n",
"import os\n",
"import subprocess\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from collections import OrderedDict\n",
"from tqdm import tqdm\n",
"from vw_offline_utilities import *\n",
"from IPython.display import Markdown, display\n",
"\n",
"import warnings\n",
"warnings.filterwarnings('ignore')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1. Generate a Simulated Dataset\n",
"\n",
"Based on the config file, we will generate a simulated dataset and save the file for further use."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Config File\n",
"config_file = r'config_data_generator.json'\n",
"configs = json.load(open(config_file, 'r'))\n",
"configs = update_params(configs)\n",
"np.random.seed(configs['random_state'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Generate data\n",
"df, context_action_stats = generate_data(**configs)\n",
"\n",
"# Increase the leading gap of the best action\n",
"context_actions = summarize_dataset(df, configs, show_results=False)\n",
"df = increase_lead(df, context_actions, add_value=configs['increase_winning_margin'])\n",
"\n",
"# Finalizing\n",
"if configs['center']:\n",
" df['reward'] = df['reward'] - df['reward'].mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Summarize data\n",
"display(df.groupby(list(configs['contexts'].keys())+['action']).agg({'action': 'count', 'reward': 'mean'}).unstack(-1))\n",
"context_actions = summarize_dataset(df, configs, show_results=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Output data\n",
"df = df.reset_index().sample(frac=1, random_state=configs['random_state'])\n",
"df.to_csv(configs['df_file'], index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2. Transform to DSLogs and Train a VW Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.1 Data Overview\n",
"\n",
"In this section, we list the contexts, actions and winning actions for each unique context."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib notebook"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Column names - context, reward and action columns\n",
"context_cols = list(configs['contexts'].keys())\n",
"action_col = 'action'\n",
"reward_col = 'reward'\n",
"df_cols = context_cols + [action_col, reward_col]\n",
"idx_cols = context_cols + [action_col]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Reshape data for the analysis\n",
"df.dropna(inplace=True)\n",
"for c in idx_cols:\n",
" df[c] = df[c].astype(str)\n",
"df = df.sort_values(idx_cols).set_index(idx_cols)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get the space of context and action\n",
"contexts = configs['contexts']\n",
"actions = [str(x) for x in configs['actions']]\n",
"action_mapping = {i: a for i, a in enumerate(actions)}\n",
"display(Markdown('**Contexts**:'), dict(contexts))\n",
"display(Markdown('**Actions**:'), actions)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Summary\n",
"df_summary = df.reset_index().groupby(context_cols+[action_col])[reward_col].mean().unstack(-1)\n",
"df_summary.style.apply(lambda x: highlight_optimal(x, is_minimization=False), axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 VW Command Lines\n",
"We will specify the training parameters and commands for VowpalWabbit (VW)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# VW Parameters\n",
"tc = configs['model_parameters']\n",
"vwc = configs['vw_commands']\n",
"\n",
"# VW Commands\n",
"cmd_train_initial = 'vw --dsjson {0} --cb_explore_adf {1} --cb_type {2} {3} -l {4} -f {5} {6}'.format(\n",
" configs['batch_dsjson_path'], vwc['exploration_policy'], vwc['cb_type'], vwc['interactions'], vwc['learning_rate'], configs['model_file'], vwc['other_commands'])\n",
"cmd_train_continued = 'vw -i {4} --dsjson {0} --cb_explore_adf {1} --cb_type {2} -l {3} -f {4} {5}'.format(\n",
" configs['batch_dsjson_path'], vwc['exploration_policy'], vwc['cb_type'], vwc['learning_rate'], configs['model_file'], vwc['other_commands'])\n",
"cmd_pred_unique_context = 'vw -t -i {0} --dsjson {1} -p {2} -l {3} {4}'.format(\n",
" configs['model_file'], configs['context_dsjson_path'], configs['context_pred_path'], vwc['learning_rate'], vwc['other_commands'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3 Transform Data for VW Modeling\n",
"\n",
"VW requires a special data format, DSJson, as input. We will transform our tabular data to this format. For details, please visit this [example](https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Conditional-Contextual-Bandit#example-2)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Unique context\n",
"df_contexts = get_unique_context(df_summary, action_col, reward_col, is_minimization=False)\n",
"df_contexts_json = transform_dsjson(df_contexts, context_cols, reward_col, action_col, actions, is_minimization=False)\n",
"export_dsjson(df_contexts_json, configs['context_dsjson_path'])\n",
"\n",
"# DSLog preview\n",
"display(Markdown('**DSLog Preview**'))\n",
"display(eval(df_contexts_json['output_json'][0]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.4 Train a Model with VW \n",
"\n",
"We will train a Contextual Bandit model with VW in this section. We can monitor the accuracy of exploit actions in the meantime. The training logs will be saved in a \\logs subfolder."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Prep plot\n",
"df_batch_accuracy = [np.nan]*tc['iterations']\n",
"fig, ax = init_plot(tc['iterations'])\n",
"\n",
"# Training\n",
"trajectory = pd.DataFrame()\n",
"for i in tqdm(range(tc['iterations'])):\n",
"    # Select data\n",
"    df_batch, control_identifier = select_data(i, df, df_contexts, configs, action_mapping, context_cols, action_col, reward_col)\n",
"    trajectory = trajectory.append(df_batch)\n",
"    # Export to dsjson format\n",
"    df_batch_json = transform_dsjson(df_batch, context_cols, reward_col, action_col, actions, is_minimization=False, other_values=control_identifier)\n",
"    export_dsjson(df_batch_json, configs['batch_dsjson_path'])\n",
"    # Plot\n",
"    df_batch_compare = pd.merge(\n",
"        df_batch.loc[df_batch['action_prob']==df_batch['action_prob'].max(), idx_cols], df_contexts, \n",
"        how='left', left_on=context_cols, right_on=context_cols, suffixes=['_pred', '_opt'])\n",
"    df_batch_accuracy[i] = (df_batch_compare['action_pred']==df_batch_compare['action_opt']).mean()\n",
"    plt_dynamic(fig, ax, df_batch_accuracy)\n",
"    # Update model\n",
"    if i == 0:\n",
"        job = subprocess.Popen(cmd_train_initial)\n",
"        job.wait()\n",
"    else:\n",
"        job = subprocess.Popen(cmd_train_continued)\n",
"        job.wait() \n",
"    # Predict with new model\n",
"    job = subprocess.Popen(cmd_pred_unique_context)\n",
"    job.wait() \n",
"    # Keep all inputs by renaming them\n",
"    new_name = configs['batch_dsjson_path'].replace('.json', '{0}.json'.format(i))\n",
"    if os.path.exists(new_name):\n",
"        os.remove(new_name)\n",
"    os.rename(configs['batch_dsjson_path'], new_name)\n",
"    # Create control group\n",
"    if tc['add_control_group']:\n",
"        create_control_logs(i, df, new_name, configs, actions, context_cols, action_col, reward_col)\n",
"print('Training logs are saved in {0}'.format(os.path.dirname(configs['batch_dsjson_path'])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.5 Predictions and Regret\n",
"\n",
"We can compare the model predictions with the optimal (ground truth) to validate that the model is taking the best actions.\n",
"\n",
"We will also look at the average regret (distance from the optimal) over the training session."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compare the final prediction with the optimal value\n",
"pred_context = load_pred_context(configs['pred_file'], df_contexts, context_cols, action_mapping)\n",
"df_compare = pd.merge(df_contexts, pred_context, left_on=context_cols, right_on=context_cols, how='left')\n",
"df_compare.rename(columns={action_col: 'optimal_action'}, inplace=True)\n",
"df_compare = df_compare[context_cols + ['optimal_action', 'exploit_action']].astype(str)\n",
"df_compare.style.apply(lambda x: highlight_suboptimal(x, df_compare['optimal_action'], ['exploit_action']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The table above shows that the model predictions match optimal actions in all contexts. \n",
"\n",
"Next, we'll look at the regret by context. Regret is defined as the distance between the optimal reward and the reward from the chosen action, so once the optimal action is learned, exploit decisions incur no regret. In this particular example we used _epsilon 0.3_, which means that we always randomly explore for 30% of the population, so the regret will never reach 0 but should stay at a low level."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Regret by iteration\n",
"regret = get_regrets(trajectory, df_contexts, context_cols, reward_col, vwc['exploration_policy'], is_minimization=False)\n",
"\n",
"# Plot Regret by context\n",
"groups = context_cols + ['exploration', 'n_iteration']\n",
"plot_regrets(regret, groups, rolling_window=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@ -0,0 +1,31 @@
{
"dataset_name": "Test",
"output_folder": "E:\\data\\20190729_context_explorer\\simulated_data",
"reward_range": [-1, 1],
"reward_dense_range": [0, 0.3],
"actions": [1, 2, 3, 4, 5, 6, 7, 8],
"contexts": {
"CallType": ["1_1", "GVC"],
"MediaType": ["Audio", "Video"],
"NetworkType": ["wifi", "wired"]
},
"context_action_size": 1000,
"p_value": 0.001,
"increase_winning_margin": 0.02,
"center": true,
"random_state": 3,
"model_parameters": {
"batch_size_initial": 5000,
"batch_size":5000,
"iterations": 30,
"default_action_index": 0,
"add_control_group": false
},
"vw_commands":{
"exploration_policy": "--epsilon 0.3",
"cb_type": "ips",
"interactions": "--interactions iFFF",
"learning_rate": 0.001,
"other_commands": "--power_t 0"
}
}


@ -0,0 +1,387 @@
import csv
import datetime
import itertools
import json
import os
import random
import uuid
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
from scipy.stats import norm, ttest_ind


def update_params(params):
    # Data parameters
    params['random_state'] = params.get('random_state', 1)
    params['reward_diff'] = params['reward_dense_range'][1]-params['reward_dense_range'][0]
    params['sd'] = params['reward_diff']/(len(params['actions'])+1)
    params['reward_lower_bound'] = params['reward_dense_range'][0] + 2*params['sd']
    params['reward_upper_bound'] = params['reward_dense_range'][1]
    params['contexts_unique'] = list(itertools.product(*params['contexts'].values()))
    params['p_value'] = params.get('p_value', 0.001)
    params['increase_winning_margin'] = params.get('increase_winning_margin', 0)
    params['center'] = params.get('center', True)
    # Model parameters
    params['model_parameters']['default_action_index'] = params['model_parameters'].get('default_action_index', 0)
    params['model_parameters']['add_control_group'] = params['model_parameters'].get('add_control_group', False)
    # VW Commands
    params['vw_commands']['exploration_policy'] = params['vw_commands'].get('exploration_policy', '--epsilon 0.3')
    params['vw_commands']['cb_type'] = params['vw_commands'].get('cb_type', 'ips')
    params['vw_commands']['interactions'] = params['vw_commands'].get('interactions', '--interactions iFFF')
    params['vw_commands']['learning_rate'] = params['vw_commands'].get('learning_rate', 0.001)
    params['vw_commands']['other_commands'] = params['vw_commands'].get('other_commands', '')
    # Files
    log_path = os.path.join(params['output_folder'], 'logs')
    if not os.path.exists(log_path):
        os.makedirs(log_path)
    params['df_file'] = os.path.join(params['output_folder'], '{0}_simulated_dataset.csv'.format(params['dataset_name']))
    params['context_dsjson_path'] = os.path.join(params['output_folder'], '{0}_unique_context.json'.format(params['dataset_name']))
    params['context_pred_path'] = os.path.join(params['output_folder'], '{0}_unique_context_pred.txt'.format(params['dataset_name']))
    params['batch_dsjson_path'] = os.path.join(log_path, '{0}_batch_input.json'.format(params['dataset_name']))
    params['model_file'] = os.path.join(params['output_folder'], '{0}_model.model'.format(params['dataset_name']))
    params['pred_file'] = os.path.join(params['output_folder'], '{0}_unique_context_pred.txt'.format(params['dataset_name']))
    return params


def fit_distribution(ss, reward_range):
    xt = plt.xticks()[0]
    xmin, xmax = reward_range[0], reward_range[1]
    lnspc = np.linspace(xmin, xmax, len(ss))
    m, sd = norm.fit(ss) # get mean and standard deviation
    pdf_g = norm.pdf(lnspc, m, sd) # now get theoretical values in our interval
    return lnspc, pdf_g


def summarize_context_action(reward_mean, ttest_df, p_value):
    reward_mean_sorted = reward_mean['avg reward'].sort_values(ascending=False)
    action_sorted = list(reward_mean_sorted.index)
    action_best = []
    for i, a in enumerate(action_sorted):
        action_best.append(a)
        try:
            a_1 = action_sorted[i+1]
        except IndexError:
            continue
        if ttest_df.loc[a, a_1]<p_value:
            break
    return reward_mean_sorted, action_best


def generate_data(**kargs):
    random.seed(kargs['random_state'])
    df = pd.DataFrame()
    context_action_stats = {}
    for c in kargs['contexts_unique']:
        context_action_stats[c] = {}
        for a in kargs['actions']:
            mu = random.uniform(kargs['reward_lower_bound'], kargs['reward_upper_bound'])
            n = kargs['context_action_size']
            tmp_rand = np.random.normal(mu, kargs['sd'], n)
            context_action_stats[c][a] = [mu, kargs['sd'], n]
            tmp_data = pd.DataFrame(tmp_rand, columns=['reward'],
                                    index=pd.MultiIndex.from_tuples([c]*n, names=kargs['contexts'].keys()))
            tmp_data.insert(0, 'action', a)
            df = df.append(tmp_data)
    df['ConfigId'] = 'P-E-TEST-'+ df['action'].astype('str')
    return df, context_action_stats


def summarize_vw(input_file, config_json, context_actions):
    vw_summary = pd.read_excel(input_file,
                               dtype={'bandit_sig': bool,
                                      'default_action': 'str',
                                      'optimal_action': 'str',
                                      'bandit_action': 'str'})
    vw_summary.columns = [x.replace('context.', '') for x in vw_summary.columns]
    vw_summary.rename(columns={'optimal_action': 'ttest_action'}, inplace=True)
    vw_summary.set_index(config_json["FEATURE_COLUMNS"], inplace=True)
    idx = vw_summary.columns.get_loc("ttest_action")
    vw_summary.insert(idx, 'GT_best_action', np.nan)
    for c, a in context_actions.items():
        vw_summary.loc[c, 'GT_best_action'] = str(a['action_best'])
    return vw_summary


def calculate_correct_rate(s, col_ground_truth):
    tmp = s.reset_index()
    pred_col = [x for x in s.columns if x!=col_ground_truth]
    GT = [[str(c) for c in eval(x)] for x in s[col_ground_truth]]
    preds = s[pred_col].values
    tmp['correct_rate'] = [int(preds[i] in GT[i]) for i in range(len(preds))]
    correct_rate = pd.DataFrame(tmp.groupby(s.index.names)['correct_rate'].mean())
    return correct_rate


def summarize_dataset(df, params, show_results=True):
    actions = params['actions']
    context_actions = {}
    for c in params['contexts_unique']:
        context_actions[c] = {}
        # Context DF
        df_c = df.loc[c].copy()
        reward_mean = df_c.groupby('action').mean()
        reward_mean.columns = ['avg reward']
        # Prep TTEST
        ttest_df = pd.DataFrame(np.nan, columns=actions, index=actions)
        # Prep Plot
        if show_results:
            fig = plt.figure(figsize=(15, 4))
        # Loop through actions
        for a1 in actions:
            s_a1 = df_c.loc[df_c['action']==a1, 'reward']
            # T-test
            for a2 in actions:
                if a1!=a2 and np.isnan(ttest_df.loc[a1, a2]):
                    s_a2 = df_c.loc[df_c['action']==a2, 'reward']
                    p = ttest_ind(s_a1, s_a2)[1]
                    ttest_df.loc[a1, a2] = p
                    ttest_df.loc[a2, a1] = p
            # Plot
            if show_results:
                plt.subplot(121)
                pltx, plty = fit_distribution(s_a1, params['reward_range'])
                plt.plot(pltx, plty, label=a1)
        # Show distribution results
        if show_results:
            plt.legend(title='Actions')
            plt.title('Context {0}'.format(c))
            plt.subplot(122)
        # Best actions
        context_actions[c]['action_rewards'], context_actions[c]['action_best'] = summarize_context_action(reward_mean, ttest_df, params['p_value'])
        if show_results:
            ttest_df = ttest_df.append(reward_mean.transpose())
            best_actions = []
            for a in ttest_df.columns:
                if a in context_actions[c]['action_best']:
                    best_actions.append('Best')
                else:
                    best_actions.append('')
            best_actions_df = pd.DataFrame([best_actions], index=['Best Actions'], columns=ttest_df.columns)
            # Show table
            ttest_colors = list(np.where(ttest_df.values<params['p_value'], 'powderblue', 'white'))
            ttest_colors.append(['white']*ttest_df.shape[1])
            ttest_df = ttest_df.round(4)
            ttest_df = ttest_df.append(best_actions_df)
            plt.table(cellText=ttest_df.values,
                      rowLabels=ttest_df.index, colLabels=ttest_df.columns,
                      cellColours=ttest_colors,
                      colWidths = [1/(len(actions)+1)]*len(actions),
                      cellLoc = 'center', rowLoc = 'center', loc='center')
            plt.axis('off')
            plt.title('Best Action(s) and p-values')
            plt.show()
    return context_actions


def increase_lead(df, context_actions, add_value=0.1):
    for k, v in context_actions.items():
        targets = df.index.isin([k]) & (df['action'].isin(v['action_best']))
        df.loc[targets, 'reward'] = df.loc[targets, 'reward'] + add_value
    return df


def binary_reward(df, context_actions):
    df['reward'] = 0
    for k, v in context_actions.items():
        targets = df.index.isin([k]) & (df['action'].isin(v['action_best']))
        df.loc[targets, 'reward'] = 1
    return df


def highlight_suboptimal(s, best, columns):
    if s.name not in columns:
        return ['']*len(s)
    is_subopt = []
    for i in range(len(s)):
        tmp_subopt = False
        if s[i] != best[i]:
            tmp_subopt = True
        is_subopt.append(tmp_subopt)
    return ['background-color: yellow' if v else '' for v in is_subopt]


def highlight_optimal(s, is_minimization):
    if is_minimization:
        is_opt = s == s.min()
    else:
        is_opt = s == s.max()
    return ['background-color: lightgreen' if v else '' for v in is_opt]


def transform_dsjson(df, context_cols, reward_col, action_col, actions, is_minimization, other_values=None):
    action_list = [x+1 for x in range(len(actions))]
    df_json = pd.DataFrame(index=df.index)
    df_json['left_brace'] = '{'
    df_json['label_cost'] = '"_label_cost":'
    df_json['cost'] = df[reward_col] if is_minimization else -1*df[reward_col]
    df_json['label_probability'] = ',"_label_probability":'
    df_json['probability'] = 0
    df_json['label_Action'] = ',"_label_Action":'
    df_json['action'] = df[action_col].map({a: i+1 for i, a in enumerate(actions)})
    df_json['labelIndex'] = ',"_labelIndex":'
    df_json['aindex'] = df[action_col].map({a: i for i, a in enumerate(actions)})
    eventid = df_json.apply(lambda x : uuid.uuid4().hex, axis=1)
    df_json['o'] = ',"o":[{"EventId":"EventId_' + eventid + '","v":' + df_json['cost'].astype(str) + '}]'
    df_json['Timestamp'] = ',"Timestamp":'
    try:
        ni = int(df['n_iteration'][0])
    except KeyError:
        ni = 0
    df_json['time'] = '"' + (datetime.datetime.utcnow() + datetime.timedelta(days=ni)).isoformat() + 'Z"'
    df_json['VE'] = ',"Version":"1","EventId":"EventId_' + eventid + '",'
    df_json['a'] = '"a":' + df_json['aindex'].apply(lambda x: swap_selection(x, action_list))
    add_context(df, df_json, context_cols)
    context_multi = [{'id': {str(x): 1}} for x in range(len(action_list))]
    df_json['multi'] = '"_multi":' + json.dumps(context_multi) + ' },'
    if 'prob_list' not in df.columns:
        p = round(1.0/len(actions), 4)
        df['prob_list'] = [[p]*len(actions)]*df.shape[0]
    df_json['prob_list'] = df['prob_list']
    p_swarpped = df_json.apply(lambda x: swap_selection(x['aindex'], x['prob_list']), axis=1)
    df_json.drop(columns=['prob_list'], inplace=True)
    df_json['p'] = '"p":' + p_swarpped + ','
    df_json['probability'] = p_swarpped.apply(lambda x: eval(x)[0])
    df_json['m'] = '"m": "v1"'
    if other_values is not None:
        df_json['other_values'] = ',' + df[other_values]
    df_json['right_brace'] = '}'
    df_json['output_json'] = df_json.astype(str).sum(axis=1).str.replace(' ', '')
    output_json = df_json['output_json'].to_frame()
    return output_json


def add_context(df, df_json, context_cols):
    df_json['c'] = ',"c": {"Features":['
    for i, x in enumerate(context_cols):
        if df[x].dtype.name.startswith(('float', 'int')):
            df_json['c'] = df_json['c'] + '{"' + x + '": ' + df[x].astype(str) + '}'
        else:
            df_json['c'] = df_json['c'] + '{"' + x + '": "' + df[x].astype(str) + '"}'
        if i!=len(context_cols)-1:
            df_json['c'] = df_json['c'] + ','
        else:
            df_json['c'] = df_json['c'] + '],'


def swap_selection(select_idx, full_list):
    swapped = full_list.copy()
    swapped[select_idx] = full_list[0]
    swapped[0] = full_list[select_idx]
    swapped = str(swapped)
    return swapped


def export_dsjson(output_json, batch_file_path):
    output_json.to_csv(batch_file_path, index=False, header=False, sep='\t', quoting=csv.QUOTE_NONE, escapechar=' ')


def load_pred_context(pred_file, df_contexts, context_cols, action_mapping):
    preds = []
    with open(pred_file, 'r') as f:
        for l in f.readlines():
            if l!='\n':
                preds.append(eval('{' + l + '}'))
    df_contexts_pred = df_contexts[context_cols].copy()
    df_contexts_pred['prob']= preds
    df_contexts_pred['prob_list'] = df_contexts_pred['prob'].apply(lambda x: [x[k] for k in sorted(list(x.keys()))])
    df_contexts_pred['exploit_action'] = df_contexts_pred['prob'].apply(lambda x: [k for k, v in x.items() if v==max(x.values())][0])
    df_contexts_pred['exploit_action'] = df_contexts_pred['exploit_action'].map(action_mapping).astype(str)
    return df_contexts_pred


def choose_action(df_batch, pred_context, action_col, action_mapping, balance_default=False, default_action_index=0):
    context_cols = list(df_batch.columns)
    df_batch = pd.merge(df_batch, pred_context, left_on=context_cols, right_on=context_cols, how='left')
    df_batch['action_idx'] = df_batch['prob'].apply(lambda x: np.random.choice(list(x.keys()), 1, p=list(x.values())/np.sum(list(x.values())))[0])
    df_batch['action_prob'] = df_batch.apply(lambda x: x['prob'][x['action_idx']], axis=1)
    df_batch[action_col] = df_batch['action_idx'].map(action_mapping).astype(str)
    df_batch.set_index(context_cols + [action_col], inplace=True)
    return df_batch


def get_reward(df_batch, df, reward_col):
    rewards = []
    for idx in df_batch.index:
        try:
            rewards.append(np.random.choice(df.loc[idx, reward_col].values))
        except KeyError:
            rewards.append(np.nan)
    df_batch[reward_col] = rewards
    df_batch.dropna(inplace=True)
    return df_batch
def get_unique_context(df_summary, action_col, reward_col, is_minimization):
context_cols = list(df_summary.index.names)
df_contexts = df_summary.copy()
df_contexts.columns.name = ''
if is_minimization:
df_contexts[action_col] = df_contexts.idxmin(axis=1)
df_contexts[reward_col] = df_contexts.min(axis=1)
else:
df_contexts[action_col] = df_contexts.idxmax(axis=1)
df_contexts[reward_col] = df_contexts.max(axis=1)
df_contexts = df_contexts.reset_index()[context_cols + [action_col, reward_col]]
return df_contexts
def select_data(i, df, df_contexts, configs, action_mapping, context_cols, action_col, reward_col):
if i==0:
df_batch = df.sample(configs['model_parameters']['batch_size_initial']).copy().reset_index()
df_batch['action_prob'] = round(1/len(configs['actions']), 4)
df_batch['prob_list'] = [[df_batch['action_prob'][0]]*len(configs['actions'])]*df_batch.shape[0]
else:
pred_context = load_pred_context(configs['pred_file'], df_contexts, context_cols, action_mapping)
df_batch = df.sample(configs['model_parameters']['batch_size']).copy().reset_index()[context_cols]
df_batch = choose_action(df_batch, pred_context, action_col, action_mapping)
df_batch = get_reward(df_batch, df, reward_col)
df_batch = df_batch.reset_index()
df_batch = df_batch[context_cols + [action_col, reward_col, 'action_prob', 'prob_list']]
df_batch['n_iteration'] = i
if configs['model_parameters']['add_control_group']:
df_batch['control_identifier'] = '"_group": "treatment"'
control_identifier = 'control_identifier'
else:
control_identifier = None
return df_batch, control_identifier
def get_regrets(trajectory, df_contexts, context_cols, reward_col, exploration_policy, is_minimization):
regret = pd.merge(trajectory, df_contexts, left_on=context_cols, right_on=context_cols, how='left', suffixes=['', '_optimal'])
multiplier = -1.0 if is_minimization else 1.0
regret['regret'] = multiplier * (regret[reward_col+'_optimal'] - regret[reward_col])
regret['exploration'] = exploration_policy
return regret
def plot_regrets(regret, groups, cumulate=False, rolling_window=10):
regret_avg = regret.groupby(groups)['regret'].mean().reset_index(-1)
fig = plt.figure(figsize=(8,4))
ax1 = fig.add_subplot(111)
plot_contexts = regret_avg.index.unique().values
for c in plot_contexts:
plot_data = regret_avg.loc[c, ['n_iteration', 'regret']].set_index('n_iteration')
if cumulate:
plot_data = plot_data.cumsum()
else:
plot_data = plot_data.rolling(rolling_window, min_periods=1).mean()
plot_data.plot(label=c, ax=ax1)
plt.title('Average Regret by Iteration')
plt.legend(plot_contexts, loc="upper right", framealpha=0.2, fontsize='small')
plt.xlabel('Number of Iterations')
plt.ylabel('Average Regret by Iteration (rolling window = {0})'.format(rolling_window))
plt.show()
def init_plot(iterations):
fig, ax = plt.subplots(1, 1, figsize=(8, 3))
plt.title('Accuracy by Iteration')
ax.set_xlabel('Iteration')
ax.set_ylabel('Accuracy')
ax.set_xlim(0, iterations)
ax.set_ylim(0, 1.03)
return fig, ax
def plt_dynamic(fig, ax, y):
ax.plot(y, c='c')
fig.canvas.draw()
def add_control_identifier(df_batch):
df_batch['control_identifier'] = '"_group": "treatment"'
return df_batch
def create_control_logs(i, df, new_name, configs, actions, context_cols, action_col, reward_col):
new_name_control = new_name.replace('.json', '_control.json')
if i==0:
df_batch = df.sample(configs['model_parameters']['batch_size_initial']).copy().reset_index()
else:
df_batch = df.sample(configs['model_parameters']['batch_size']).copy().reset_index()[context_cols]
df_batch[action_col] = actions[configs['model_parameters']['default_action_index']]
df_batch['action_prob'] = 1/len(actions)
df_batch['prob_list'] = [[1/len(actions)]*len(actions)]*df_batch.shape[0]
df_batch.set_index(context_cols+[action_col], inplace=True)
df_batch = get_reward(df_batch, df, reward_col)
df_batch = df_batch.reset_index()[context_cols + [action_col, reward_col, 'action_prob', 'prob_list']]
df_batch['control_identifier'] = '"_group": "control"'
df_batch['n_iteration'] = i
df_batch_json = transform_dsjson(df_batch, context_cols, reward_col, action_col, actions, is_minimization=False, other_values='control_identifier')
export_dsjson(df_batch_json, new_name_control)
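
`load_pred_context` above parses each line of the prediction file by wrapping it in braces and passing it to `eval`. The sketch below shows the same parsing done with `ast.literal_eval`, which accepts only Python literals. It assumes each non-empty line looks like `0:0.1,1:0.9` (action index, colon, probability); that format is inferred from how the parsed value is used, not stated in the source, and `parse_pred_line` is a hypothetical helper name.

```python
import ast

def parse_pred_line(line):
    """Parse one prediction line such as '0:0.1,1:0.9' into a dict.

    Hypothetical helper; the line format is an assumption.
    """
    # literal_eval only evaluates literals, so function calls in the file are rejected
    return ast.literal_eval('{' + line.strip() + '}')

# Example input lines (made up for illustration); blank lines are skipped
lines = ['0:0.1,1:0.9\n', '\n', '0:0.5,1:0.5\n']
preds = [parse_pred_line(l) for l in lines if l.strip()]
```

With this approach a line that contains anything other than literals raises an exception instead of being executed.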


@ -0,0 +1,14 @@
{
"exps": {
"NewTest": {
"context_feature_namespace": ["Features"],
"action_label_key": ["id"],
"default_action_index": 0,
"data_folder": "E:\\data\\20190729_context_explorer\\simulated_data\\logs"
}
},
"output_folder": "\\\\avdsp-share\\InputFiles\\p_learning\\Monitoring",
"show_top_sensitive_contexts": 20,
"min_daily_sample": 20
}
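
The example config above sets every optional key explicitly. As a rough illustration of what happens when keys are omitted, the sketch below applies the same fallback values that `ContextExplorer.__init__` and `complete_config` use (20, 200, 0 and `None`). The function name `load_config_with_defaults` and the shortened example config are made up for this illustration and are not part of the repository.

```python
import json

def load_config_with_defaults(text):
    """Parse a config string and fill in the optional keys (hypothetical helper)."""
    configs = json.loads(text)
    # Report-level defaults, mirroring ContextExplorer.__init__
    configs.setdefault('show_top_sensitive_contexts', 20)
    configs.setdefault('min_daily_sample', 200)
    # Per-experiment defaults, mirroring complete_config
    for exp_config in configs['exps'].values():
        exp_config.setdefault('action_label_key', None)
        exp_config.setdefault('context_feature_namespace', None)
        exp_config.setdefault('default_action_index', 0)
    return configs

# Shortened config for illustration only
example = '{"exps": {"NewTest": {"data_folder": "logs"}}, "output_folder": "out"}'
configs = load_config_with_defaults(example)
```

Note that `output_folder`, `exps` and each experiment's `data_folder` have no fallback in the source, so they remain required.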


@ -0,0 +1,505 @@
import datetime
import dateutil
import itertools
import json
import operator
import os
import warnings
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import MultipleLocator
import multiprocessing
import numpy as np
import pandas as pd
from functools import reduce
from collections import defaultdict
from scipy import stats
class ContextExplorer():
'''
Provide context-specific analysis and reports for online Contextual Bandit AB Tests.
'''
def __init__(self, config_file, p_threshold=0.001, ci_std_mean=False):
'''
config_file [str]: path to config file
p_threshold [float]: p-value threshold to determine if a t-test result is statistically significant
ci_std_mean [bool]: if True, the confidence interval is 1.96 * the standard error of the mean; if False, 1.96 * the standard deviation of the raw data
'''
configs = json.load(open(config_file))
config_exps = configs['exps']
self.output_folder = configs['output_folder']
self.top_n = configs.get('show_top_sensitive_contexts', 20)
self.min_sample = configs.get('min_daily_sample', 200)
self.name_cols()
self.prep_path()
self.config_exps = self.complete_config_dates(config_exps)
self.config_exps = self.complete_config(self.config_exps)
self.p_threshold = p_threshold
self.ci_std_mean = ci_std_mean
self.font_name = 'Arial'
self.font_family = 'sans-serif'
def name_cols(self):
self.reward_col = 'Reward'
self.reward_avg_col = 'Reward_Average'
self.reward_var_col = 'Reward_Variance'
self.ci_upper_col = 'Reward_Upper_CI'
self.ci_lower_col = 'Reward_Lower_CI'
self.cost_col = 'Cost'
self.count_col = 'Count'
self.context_col = 'Contexts'
self.action_col = 'Action'
self.control_col = 'IsControl'
self.exploit_col = 'IsExploitAction'
self.prob_col = 'Probability'
self.lasttime_col = 'LastTimestamp'
self.exp_col = 'Exp'
self.date_col = 'Date'
self.date_format = '%Y-%m-%d'
def prep_path(self):
if not os.path.exists(self.output_folder):
os.makedirs(self.output_folder)
def complete_config_dates(self, config_exps):
for exp in config_exps.keys():
if 'start_date' not in config_exps[exp]:
config_exps[exp]['start_date'] = datetime.date.min.strftime(self.date_format)
if 'end_date' not in config_exps[exp]:
config_exps[exp]['end_date'] = (datetime.date.max - datetime.timedelta(days=1)).strftime(self.date_format)
return config_exps
def complete_config(self, config_exps):
for exp in config_exps.keys():
config_exps[exp]['action_label_key'] = config_exps[exp].get('action_label_key', None)
config_exps[exp]['context_feature_namespace'] = config_exps[exp].get('context_feature_namespace', None)
config_exps[exp]['default_action_index'] = config_exps[exp].get('default_action_index', 0)
config_exps[exp]['sample_match'] = config_exps[exp].get('sample_match', None)
return config_exps
def get_dsjson_files(self, config):
dsjson_files = []
data_folder = config['data_folder']
for path, subdirs, files in os.walk(data_folder):
for fname in files:
if fname.endswith('.json'):
dsjson_files.append(os.path.join(path, fname))
return dsjson_files
def check_key_info(self, fjson):
key_info = ['_label_cost', '_label_probability', '_label_Action', '_labelIndex', 'Timestamp', 'a', 'c', 'p']
check_result = all([x in fjson.keys() for x in key_info])
return check_result
def check_time(self, fjson, config):
start_time = datetime.datetime.strptime(config['start_date'], self.date_format)
end_time = datetime.datetime.strptime(config['end_date'], self.date_format) + datetime.timedelta(days=1)
timestamp = dateutil.parser.parse(fjson['Timestamp'], ignoretz=True)
if start_time <= timestamp < end_time:
return timestamp
else:
return False
def control_logic(self, dsjson, config):
if 'control_identifier' in config:
logic = []
for k, v in config['control_identifier'].items():
logic.append(dsjson[k] == v)
logic_bool = all(logic)
else:
logic_bool = dsjson['_labelIndex'] == config['default_action_index']
return logic_bool
def parse_others(self, dsjson, config, timestamp):
data = {}
data[self.cost_col] = dsjson['_label_cost']
data[self.reward_col] = -1 * data[self.cost_col]
data[self.control_col] = self.control_logic(dsjson, config)
data[self.action_col] = reduce(operator.getitem, config['action_label_key'], dsjson['c']['_multi'][dsjson['_labelIndex']])
data[self.exploit_col] = dsjson['p'][0] == np.max(dsjson['p'])
data[self.prob_col] = dsjson['p'][0]
data[self.lasttime_col] = timestamp
return data
def parse_context(self, dsjson, config):
context_data = {}
for ns in config['context_feature_namespace']:
if isinstance(ns, str):
for f in dsjson['c'][ns]:
context_data.update(f)
elif isinstance(ns, list):
f = [x for x in dsjson['c'][ns[0]] if ns[1] in x][0]
context_data.update(f)
else:
raise ValueError('context_feature_namespace must be a list of strings or a list of lists')
return context_data
def parse_dsjson(self, file, config):
data_context = []
data_others = []
with open(file, 'r') as f:
nline = 0
for l in f.readlines():
nline = nline + 1
try:
fjson = json.loads(l)
except json.JSONDecodeError:
warnings.warn('Skipping a record with invalid JSON format in file {0} line {1}'.format(file, nline))
continue
# Check key information
if self.check_key_info(fjson):
# Check time
timestamp = self.check_time(fjson, config)
if timestamp:
# Parse data
data_context.append(self.parse_context(fjson, config))
data_others.append(self.parse_others(fjson, config, timestamp))
return data_context, data_others
def process_dsjson(self, dsjson_files, config):
# Process files
args = [a for a in itertools.product(dsjson_files, [config])]
p = multiprocessing.Pool()
results = p.starmap_async(self.parse_dsjson, args)
p.close()
p.join()
# Split context and others
data_context = []
data_others = []
results_list = results.get()
for i in range(len(results_list)):
data_context.extend(results_list[i][0])
data_others.extend(results_list[i][1])
return data_context, data_others
def ci(self, x, ci_multiple=1.96):
return ci_multiple*np.std(x)/np.sqrt(len(x))
def read_df(self, econfig):
# Read and parse files
dsjson_files = self.get_dsjson_files(econfig)
data_context, data_others = self.process_dsjson(dsjson_files, econfig)
df_context = pd.DataFrame(data_context)
df_others = pd.DataFrame(data_others)
df = pd.concat([df_context, df_others], axis=1)
# If no real control group is included in the experiment (i.e. 'control_identifier' is not defined), records that took the default action serve as control and are also duplicated as treatment.
if 'control_identifier' not in econfig:
df_control = df.loc[df[self.control_col]==True].copy()
df_control[self.control_col] = False
df = pd.concat([df, df_control], ignore_index=False)
df.loc[df[self.control_col]==True, self.exploit_col] = False
return df, list(df_context.columns)
def process_data(self, exp, df, features, config):
# Aggregated data
info_exp = defaultdict(dict)
info_exp['s_context_action']['df'], info_exp['s_context_action']['features'] = self.agg_data(exp, df, config, features, by_context=True, by_action=True)
info_exp['s_context']['df'], info_exp['s_context']['features'] = self.agg_data(exp, df, config, features, by_context=True, by_action=False)
info_exp['s_all']['df'], info_exp['s_all']['features'] = self.agg_data(exp, df, config, features, by_context=False, by_action=False)
# Dates
dates_in_df = pd.to_datetime(info_exp['s_all']['df'][self.date_col].unique())
last_date = max(dates_in_df).strftime(self.date_format)
info_exp['time_range'] = [dates_in_df.min().strftime(self.date_format), dates_in_df.max().strftime(self.date_format), last_date]
return info_exp
def format_agg_df(self, df_agg):
df_agg.columns = [self.reward_avg_col, self.reward_var_col, 'ci', self.count_col, self.lasttime_col]
df_agg[self.ci_upper_col] = df_agg[self.reward_avg_col] + df_agg['ci']
df_agg[self.ci_lower_col] = df_agg[self.reward_avg_col] - df_agg['ci']
df_agg.reset_index(inplace=True)
return df_agg
def agg_data(self, exp, df, config, features, by_context, by_action):
df_exp = df.copy()
df_exp[self.exp_col] = exp
df_exp[self.count_col] = 1
df_exp[self.date_col] = pd.to_datetime(df_exp[self.lasttime_col]).dt.date.astype(str)
df_exp[self.control_col] = df_exp[self.control_col].map({True: 'Control', False: 'Treatment'})
str_cols = [self.action_col, self.exploit_col] + features
for c in str_cols:
df_exp[c] = df_exp[c].astype(str)
if by_action == False:
df_exp[self.action_col] = 'All'
df_exp[self.exploit_col] = 'All'
if by_context == False:
df_exp[self.context_col] = 'All'
features = [self.context_col]
agg_group = [self.exp_col] + features + [self.control_col, self.action_col, self.exploit_col, self.date_col]
aggs = {self.reward_col: ['mean', 'var', self.ci], self.count_col: 'sum', self.lasttime_col: 'max'}
df_agg = df_exp.groupby(agg_group).agg(aggs)
if ('control_identifier' not in config) & (by_action == False):
df_agg = self.update_ips(df_exp, df_agg, features)
df_agg = self.format_agg_df(df_agg)
return df_agg, features
def update_ips(self, df_exp, df_agg, features):
# Split control and treatment
df_agg_control = df_agg.xs('Control', level=self.control_col, drop_level=False).reset_index(self.control_col).copy()
df_agg_treatment = df_agg.xs('Treatment', level=self.control_col, drop_level=False).copy()
# Compute IPS and update
agg_group_ips = [self.exp_col] + features + [self.action_col, self.exploit_col, self.date_col]
agg_group = [self.exp_col] + features + [self.control_col, self.action_col, self.exploit_col, self.date_col]
df_ips_control = df_exp.groupby(agg_group_ips).apply(self.ips_control).to_frame()
df_ips_control.columns = [[self.reward_col], ['mean']]
df_agg_control.update(df_ips_control)
df_agg_control = df_agg_control.reset_index().set_index(agg_group)
df_agg = pd.concat([df_agg_control, df_agg_treatment])
return df_agg
def ips_control(self, df_group):
N = df_group.loc[df_group[self.control_col]=='Treatment', self.count_col].sum()
df_control = df_group.loc[df_group[self.control_col]=='Control']
ips = (df_control[self.reward_col]/df_control[self.prob_col]).sum()/N
return ips
def add_cum_cols(self, df_wide):
for g in self.groups:
df_wide['Cum_'+self.count_col, g] = df_wide[self.count_col, g].cumsum()
df_wide['Cum_'+self.reward_avg_col, g] = 1.0 * df_wide['mu_n', g].cumsum() / df_wide['Cum_'+self.count_col, g]
df_wide['Cum_'+self.reward_var_col, g] = 1.0 * (df_wide['s2_n_1', g].cumsum() + df_wide['mu2_n', g].cumsum() - df_wide['Cum_'+self.count_col, g] * (df_wide['Cum_'+self.reward_avg_col, g]**2)) / (df_wide[self.count_col, g] - 1).cumsum()
if self.ci_std_mean:
ci95 = 1.96*((df_wide['Cum_'+self.reward_var_col, g]/df_wide['Cum_'+self.count_col, g])**0.5)
else:
ci95 = 1.96*((df_wide['Cum_'+self.reward_var_col, g])**0.5)
df_wide['Cum_'+self.ci_upper_col, g] = df_wide['Cum_'+self.reward_avg_col, g] + ci95
df_wide['Cum_'+self.ci_lower_col, g] = df_wide['Cum_'+self.reward_avg_col, g] - ci95
return df_wide
def generate_report(self):
self.set_plot_style()
html_template = self.set_html_template()
exp_data = {}
for exp, econfig in self.config_exps.items():
print('='*50)
print('>>> Reading data for {0}...'.format(exp))
df, features = self.read_df(econfig)
print('>>> Generating Report')
info_exp = self.process_data(exp, df, features, econfig)
info_exp, pic_names = self.summarize_exp(exp, info_exp)
info_exp['log_path'] = self.log_all(exp, info_exp['time_range'], info_exp)
html_exp = self.edit_html(exp, info_exp, html_template, pic_names)
html_outpath = self.export_html(exp, info_exp['time_range'], html_exp)
print('>>> Report saved to {0}'.format(html_outpath))
exp_data[exp] = info_exp
return exp_data
def summarize_exp(self, exp, info_exp):
tmp_pic_folder = self.prep_pic(exp)
pic_names = []
for s in ['s_context', 's_all']:
df_wide = self.reshape_data(info_exp[s])
info_exp[s]['table_summary'] = self.generate_summary_table(df_wide, info_exp, s)
tmp_pic_names = self.plot_trends(df_wide, info_exp, s, tmp_pic_folder)
pic_names = pic_names + tmp_pic_names
return info_exp, pic_names
def set_plot_style(self):
plt.style.use('ggplot')
matplotlib.rcParams['font.sans-serif'] = self.font_name
matplotlib.rcParams['font.family'] = self.font_family
def set_html_template(self):
with open('report_template.html', 'r') as h:
html_template = h.readlines()
html_template = ''.join(html_template)
html_template = html_template.replace('TBD_FONT_NAME', self.font_name)
html_template = html_template.replace('TBD_FONT_FAMILY', self.font_family)
return html_template
def prep_pic(self, exp):
tmp_pic_folder = os.path.join(self.output_folder, r'{0}/pic'.format(exp))
if not os.path.exists(tmp_pic_folder):
os.makedirs(tmp_pic_folder)
return tmp_pic_folder
def reshape_data(self, info):
df_wide = info['df'].copy()
self.groups = sorted(df_wide[self.control_col].unique())
df_wide['mu_n'] = df_wide[self.reward_avg_col]*df_wide[self.count_col]
df_wide['mu2_n'] = ((df_wide[self.reward_avg_col])**2)*df_wide[self.count_col]
df_wide['s2_n_1'] = df_wide[self.reward_var_col]*(df_wide[self.count_col]-1)
group_cols = [self.exp_col] + info['features']
groupby_cols = group_cols + [self.date_col, self.control_col]
df_wide = df_wide.groupby(groupby_cols).mean().unstack(-1).reset_index(-1)
df_wide = df_wide.groupby(group_cols).apply(lambda x: self.add_cum_cols(x))
keep_cols = [x for x in df_wide.columns.levels[0] if any([c in x for c in [self.date_col, self.count_col, self.reward_avg_col, self.reward_var_col, self.ci_lower_col, self.ci_upper_col]])]
df_wide = df_wide[keep_cols]
return df_wide
def generate_summary_table(self, df_wide, info_exp, s):
last_date = info_exp['time_range'][2]
df_last_wide = df_wide.loc[df_wide[self.date_col]==last_date].copy()
# Add delta and t-test results
for pre in ['', 'Cum_']:
df_last_wide[pre+self.reward_avg_col, 'Delta'] = df_last_wide[pre+self.reward_avg_col, 'Treatment'] - df_last_wide[pre+self.reward_avg_col, 'Control']
mean_diff = np.abs(df_last_wide[pre+self.reward_avg_col, 'Delta'])
s1 = df_last_wide[pre+self.reward_var_col, 'Control'] / df_last_wide[pre+self.count_col, 'Control']
s2 = df_last_wide[pre+self.reward_var_col, 'Treatment'] / df_last_wide[pre+self.count_col, 'Treatment']
sample_std = np.sqrt(s1 + s2)
degree_freedom = df_last_wide[pre+self.count_col].sum(axis=1)-2
df_last_wide[pre+self.reward_avg_col, 'p-value'] = np.round(stats.t.sf(mean_diff/sample_std, degree_freedom)*2, 6)
df_last_wide[pre+self.reward_avg_col, 'sig'] = np.where(df_last_wide[pre+self.reward_avg_col, 'p-value']<self.p_threshold, '*', '')
context_summary = df_last_wide[[self.count_col, self.reward_avg_col, 'Cum_'+self.count_col, 'Cum_'+self.reward_avg_col]]
# Get last exploit action
if s=='s_context':
context_summary = self.add_exploit_action(context_summary, info_exp)
summary_all = self.split_by_results(context_summary, df_wide)
return summary_all
else:
return context_summary
def split_by_results(self, context_summary, df_wide):
summary_all = context_summary.copy()
# Filter: sample size
check_n_small = (summary_all['Cum_'+self.count_col]/df_wide[self.date_col].nunique()<self.min_sample).any(axis=1)
summary_all['Result_Type'] = np.where(check_n_small, 'Excluded: Sample size too small', '')
# Filter: sensitivity
check_not_sig = (summary_all.iloc[:, summary_all.columns.get_level_values(1)=='sig']!='*').all(axis=1)
summary_all['Result_Type'] = np.where(check_not_sig & (summary_all['Result_Type']==''), 'Excluded: No significant movement', summary_all['Result_Type'])
# Filter: top n
tmp = summary_all.loc[(~check_n_small)&(~check_not_sig)].copy()
tmp['abs_cum_delta'] = abs(tmp['Cum_'+self.reward_avg_col]['Delta'])
tmp = tmp.sort_values('abs_cum_delta', ascending=False).drop(columns=['abs_cum_delta'], level=0)
top_pos = tmp.loc[tmp['Cum_'+self.reward_avg_col]['Delta']>0].head(self.top_n)
top_neg = tmp.loc[tmp['Cum_'+self.reward_avg_col]['Delta']<0].head(self.top_n)
summary_all.loc[top_pos.index.values, 'Result_Type'] = 'Included: Top {0} positive'.format(self.top_n)
summary_all.loc[top_neg.index.values, 'Result_Type'] = 'Included: Top {0} negative'.format(self.top_n)
summary_all['Result_Type'] = summary_all['Result_Type'].replace('', 'Excluded: Not top sensitive')
# Finalize
summary_all.sort_values('Result_Type', inplace=True)
return summary_all
def add_exploit_action(self, context_summary, info_exp):
last_date = info_exp['time_range'][2]
df_action = info_exp['s_context_action']['df'].copy()
df_exploit = df_action.loc[(df_action[self.date_col]==last_date)&(df_action[self.exploit_col]=='True')]
df_exploit_last = df_exploit[df_exploit.groupby(context_summary.index.names)[self.lasttime_col].transform(max) == df_exploit[self.lasttime_col]]
df_exploit_last.set_index(context_summary.index.names, inplace=True)
df_exploit_last = df_exploit_last[[self.action_col]]
df_exploit_last.columns = pd.MultiIndex.from_product([['Last'], ['Exploit Action']])
context_summary = pd.merge(context_summary, df_exploit_last, left_index=True, right_index=True, how='inner')
return context_summary
def plot_trends(self, df_wide, info_exp, s, tmp_pic_folder):
tmp_pic_names = []
time_range = info_exp['time_range']
summary_all = info_exp[s]['table_summary']
if s == 's_context':
excluded_list = set(summary_all.loc[summary_all['Result_Type'].str.startswith('Excluded')].index.values)
else:
excluded_list = []
for i in set(df_wide.index.values):
# Skip the excluded ones
if i in excluded_list:
continue
# Title and file name
title_text = 'Exp {0} - Context: {1}'.format(i[0], ', '.join([x for x in i[1:]]))
pic_name = '{0}_{1}_{2}-{3}.png'.format(i[0], ''.join([x for x in i[1:]]), time_range[0], time_range[1])
pic_path = os.path.join(tmp_pic_folder, pic_name)
fig, axs = plt.subplots(1, 3, figsize=(16,4), sharex=True)
# Plot Count
df_count = df_wide.loc[i, [self.date_col, self.count_col]].copy()
df_count[self.date_col] = pd.to_datetime(df_count[self.date_col])
df_count.set_index([self.date_col], inplace=True)
df_count[self.count_col].plot(ax=axs[0], marker='.', x_compat=True)
axs[0].set_title('Daily Count')
axs[0].set_xlabel("")
axs[0].set_ylabel(self.count_col)
axs[0].legend(loc='upper right', framealpha=0.4)
# Plot Reward
for p, pre in enumerate(['', 'Cum_']):
dfi = df_wide.loc[i, [self.date_col, pre+self.reward_avg_col, pre+self.ci_upper_col, pre+self.ci_lower_col]]
dfi[self.date_col] = pd.to_datetime(dfi[self.date_col])
dfi.columns = dfi.columns.map('|'.join).str.strip('|')
df_plot = dfi.set_index([self.date_col])
reward_lines = [x for x in df_plot.columns if self.reward_avg_col in x]
axs[p+1].plot(df_plot[reward_lines], marker='.')
for g in self.groups:
group_band = [x for x in df_plot.columns if x not in reward_lines and g in x]
axs[p+1].fill_between(df_plot.index, df_plot[group_band[0]], df_plot[group_band[1]], alpha=0.2, label=g)
subtitle = 'Daily Reward' if pre=='' else 'Cumulative Reward'
axs[p+1].set_title(subtitle)
axs[p+1].set_xlabel("")
axs[p+1].set_ylabel(self.reward_col)
axs[p+1].legend(loc='upper right', framealpha=0.4)
# Formats
axs[0].xaxis.set_major_locator(MultipleLocator(max(1, df_count.shape[0]//8)))
axs[0].xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y'))
fig.autofmt_xdate(rotation=30, ha='right')
fig.suptitle(title_text, fontsize=18)
fig.tight_layout(rect=[0, 0.03, 1, 0.9])
if os.path.isfile(pic_path):
os.remove(pic_path)
plt.savefig(pic_path, dpi=200)
plt.close(fig)
tmp_pic_names.append(pic_name)
return tmp_pic_names
def edit_html(self, exp, info_exp, html_template, pic_names):
# Titles
date_min = info_exp['time_range'][0]
date_max = info_exp['time_range'][1]
html_exp = html_template
html_exp = html_exp.replace('TBD_TITLE', 'Experiment {0} - Context Explorer'.format(exp))
html_exp = html_exp.replace('TBD_DATES', '{0} - {1}'.format(date_min, date_max))
html_exp = html_exp.replace('TBD_EXPID', str(exp))
# Style
html_exp = html_exp.replace('TBD_FONT_NAME', self.font_name)
html_exp = html_exp.replace('TBD_FONT_FAMILY', self.font_family)
# [1] Overall - Trend
p1_pics = [p for p in pic_names if p=='{0}_{1}_{2}-{3}.png'.format(exp, 'All', date_min, date_max)]
p1_pics = ''.join(['<img src="pic/{0}" width="1200"><br>'.format(p) for p in p1_pics])
html_exp = html_exp.replace('TBD_OverallPlot', p1_pics)
# [2] Context - Latest and Cumulative Performance
summary_all = info_exp['s_context']['table_summary'].copy()
html_exp = html_exp.replace('TBD_NIDX', str(len(summary_all.index.names)))
summary_all.reset_index(col_level=1, col_fill='Context', inplace=True)
summary_all.columns.names = [None, None]
p2_table_pos = summary_all.loc[summary_all['Result_Type'].str.endswith('positive')].drop(columns=['Result_Type'], level=0)
p2_table_neg = summary_all.loc[summary_all['Result_Type'].str.endswith('negative')].drop(columns=['Result_Type'], level=0).copy()
p2_table_pos_html = p2_table_pos.to_html(index=False)
p2_table_neg_html = p2_table_neg.to_html(index=False)
html_exp = html_exp.replace('TBD_ContextTable_Positive', p2_table_pos_html)
html_exp = html_exp.replace('TBD_ContextTable_Negative', p2_table_neg_html)
html_exp = html_exp.replace('TBD_LASTDATE', info_exp['time_range'][2])
html_exp = html_exp.replace('TBD_LOG_FILE', os.path.basename(info_exp['log_path']))
# [3] Context - Trend
if p2_table_pos.shape[0]>0:
pos_list = p2_table_pos['Context'][info_exp['s_context']['features']].astype(str).sum(axis=1).str.replace(' ', '').values
p3_pics_pos = [p for l in pos_list for p in pic_names if l in p]
p3_pics_pos = ''.join(['<img src="pic/{0}" width="1200"><br>'.format(p) for p in p3_pics_pos])
html_exp = html_exp.replace('TBD_ContextPlot_Positive', p3_pics_pos)
else:
html_exp = html_exp.replace('TBD_ContextPlot_Positive', '')
if p2_table_neg.shape[0]>0:
neg_list = p2_table_neg['Context'][info_exp['s_context']['features']].astype(str).sum(axis=1).str.replace(' ', '').values
p3_pics_neg = [p for l in neg_list for p in pic_names if l in p]
p3_pics_neg = ''.join(['<img src="pic/{0}" width="1200"><br>'.format(p) for p in p3_pics_neg])
html_exp = html_exp.replace('TBD_ContextPlot_Negative', p3_pics_neg)
else:
html_exp = html_exp.replace('TBD_ContextPlot_Negative', '')
return html_exp
def export_html(self, exp, dates, html_exp):
html_outpath = os.path.join(*[self.output_folder, exp, 'Context_Explorer_{0}_{1}-{2}.html'.format(exp, dates[0], dates[1])])
with open(html_outpath, 'w') as o:
o.write(html_exp)
return html_outpath
def log_all(self, exp, dates, info_exp):
log_path = os.path.join(*[self.output_folder, exp, 'log_all_contexts_{0}_{1}-{2}.xlsx'.format(exp, dates[0], dates[1])])
info_exp['s_context']['table_summary'].to_excel(log_path)
return log_path
def print_process(iteration, total, prefix='', suffix='', decimals=0, length=30, empty='-', fill='|'):
percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
filledLength = int(round(length * iteration / total))
bar = fill * filledLength + empty * (length - filledLength)
print('\r%s |%s| %s%% %s' % (prefix, bar, percent, suffix), end = '\r')
if iteration == total:
print()
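
The `ips_control` method above estimates the reward of the control (default-action) policy with Inverse Propensity Scoring when the logs contain no real control group. A minimal, dependency-free sketch of that calculation follows. It assumes, as in `read_df`, that every logged event counts toward the total and that control events are those where the default action was chosen. The function name `ips_estimate` and the sample numbers are invented for illustration.

```python
def ips_estimate(rewards, probs, chose_default):
    """IPS estimate of the average reward of always taking the default action.

    rewards:       observed reward of each logged event
    probs:         probability with which the logged action was chosen
    chose_default: True where the logged action was the default action
    """
    n_total = len(rewards)
    # Only events that took the default action contribute, each weighted by 1/probability
    weighted = sum(r / p for r, p, d in zip(rewards, probs, chose_default) if d)
    return weighted / n_total

# Invented sample data
rewards = [1.0, 0.0, 1.0, 1.0]
probs = [0.5, 0.25, 0.5, 0.25]
chose_default = [True, False, True, False]
estimate = ips_estimate(rewards, probs, chose_default)
```

Dividing by the total number of events, rather than by the number of default-action events, is what makes the weighted sum an estimate of the default policy's average reward.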


@ -0,0 +1,151 @@
<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
* {box-sizing: border-box}
body, html {
margin: 0;
font-family: TBD_FONT_NAME, TBD_FONT_FAMILY;
background-color: #000000;
padding: 10px;
}
table {
border-collapse: collapse;
font-size: 12px;
}
th, td {
text-align: center;
padding: 4px;
border: 1px solid #d9d9d9;
}
tr:nth-child(even){background-color: #f2f2f2}
tr:nth-child(odd){background-color: #ffffff}
td:nth-child(-n+TBD_NIDX){background-color: #e6e6e6;}
th {
background-color: #cccccc;
}
ul li { padding: 3px 3px; }
.tablink {
background-color: #555;
color: white;
float: left;
border: none;
outline: none;
cursor: pointer;
padding: 14px 16px;
font-size: 17px;
font-family: TBD_FONT_NAME, TBD_FONT_FAMILY;
width: 25%;
}
.tablink:hover {
background-color: #777;
}
.tabcontent {
display: none;
padding: 80px 20px;
height: 100%;
}
#Experiment {background-color: #f0f5f5;}
#OverallTrend {background-color: #ecf9f2;}
#ContextSummary {background-color: #e6f7ff;}
#ContextTrend {background-color: #fff7e6;}
</style>
</head>
<body>
<h1 style="color:white;">TBD_TITLE</h1>
<h3 style="color:white;">TBD_DATES</h3>
<button class="tablink" onclick="openPage('Experiment', this, '#75a3a3')" id="defaultOpen">Experiment</button>
<button class="tablink" onclick="openPage('OverallTrend', this, 'green')">Overall Trend</button>
<button class="tablink" onclick="openPage('ContextSummary', this, '#008ae6')">Context Summary</button>
<button class="tablink" onclick="openPage('ContextTrend', this, 'orange')">Context Trend</button>
<div id="Experiment" class="tabcontent">
<h3>Experiment Summary</h3>
<ul>
<li>Experiment ID: TBD_EXPID</li>
<li>Report Time Span: TBD_DATES</li>
</ul>
<h3> Navigate this Report: </h3>
<ul>
<li>Overall Trend: Plots for daily count and the main reward metric trend over time for the entire experiment.</li>
<li>Context Summary: A summary table for the latest and cumulative statistics by context. </li>
<li>Context Trend: Plots for daily count and the main reward metric trend over time by context. </li>
</ul>
</div>
<div id="OverallTrend" class="tabcontent">
<h3>Overall - Trend</h3>
Rewards for the control group are estimated with Inverse Propensity Scoring (IPS). Reference: <a href="https://en.wikipedia.org/wiki/Inverse_probability_weighting">Wikipedia</a>, <a href="https://github.com/microsoft/mwt-ds/blob/master/images/MWT-WhitePaper.pdf">Paper</a> page 12.
<br><br>
TBD_OverallPlot
</div>
<div id="ContextSummary" class="tabcontent">
<h3>Context Summary - Latest and Cumulative Performance</h3>
<ul>
<li>Latest Performance: data from the last full day -- TBD_LASTDATE</li>
<li>Cumulative Performance: data aggregated for all dates within TBD_DATES </li>
</ul>
<b>Top Positive Movements</b>
TBD_ContextTable_Positive
<br>
<b>Top Negative Movements</b>
TBD_ContextTable_Negative
<br><br>
<i><font size="2">* Note: Some contexts are filtered out due to sample size constraints and sensitivity limits. Information on all contexts is available in the <a href="TBD_LOG_FILE">log file</a>.</font></i>
</div>
<div id="ContextTrend" class="tabcontent">
<h3>Context - Trend</h3>
<b>Top Positive Movements</b><br>
TBD_ContextPlot_Positive
<br>
<b>Top Negative Movements</b><br>
TBD_ContextPlot_Negative
</div>
<script>
function openPage(pageName,elmnt,color) {
var i, tabcontent, tablinks;
tabcontent = document.getElementsByClassName("tabcontent");
for (i = 0; i < tabcontent.length; i++) {
tabcontent[i].style.display = "none";
}
tablinks = document.getElementsByClassName("tablink");
for (i = 0; i < tablinks.length; i++) {
tablinks[i].style.backgroundColor = "";
}
document.getElementById(pageName).style.display = "block";
elmnt.style.backgroundColor = color;
}
// Get the element with id="defaultOpen" and click on it
document.getElementById("defaultOpen").click();
</script>
</body>
</html>
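
The template above contains `TBD_*` placeholders that `edit_html` fills in with a chain of `str.replace` calls. The sketch below shows the same idea with a dictionary of tokens. The helper name `fill_template` and the one-line template are illustrative only; the values are taken from the sample report.

```python
def fill_template(template, values):
    """Replace each placeholder token in the template with its value (hypothetical helper)."""
    html = template
    for token, value in values.items():
        html = html.replace(token, str(value))
    return html

# Shortened template for illustration only
template = '<h1>TBD_TITLE</h1><h3>TBD_DATES</h3><li>Experiment ID: TBD_EXPID</li>'
html = fill_template(template, {
    'TBD_TITLE': 'Experiment NewTest - Context Explorer',
    'TBD_DATES': '2020-02-13 - 2020-03-13',
    'TBD_EXPID': 'NewTest',
})
```

Plain string replacement like this only works reliably when no token is a prefix of another token and no inserted value contains a token itself.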


@ -0,0 +1,10 @@
import sys
from context_explorer import ContextExplorer
def run_context_explorer(config_path):
ce = ContextExplorer(config_path)
exp_data = ce.generate_report()
if __name__ == "__main__":
# Pass the path to the config file to run Context Explorer
run_context_explorer(sys.argv[1])


@ -0,0 +1,387 @@
<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
* {box-sizing: border-box}
body, html {
  margin: 0;
  font-family: Arial, sans-serif;
  background-color: #000000;
  padding: 10px;
}
table {
  border-collapse: collapse;
  font-size: 12px;
}
th, td {
  text-align: center;
  padding: 4px;
  border: 1px solid #d9d9d9;
}
tr:nth-child(even){background-color: #f2f2f2}
tr:nth-child(odd){background-color: #ffffff}
td:nth-child(-n+4){background-color: #e6e6e6;}
th {
  background-color: #cccccc;
}
ul li { padding: 3px 3px; }
.tablink {
  background-color: #555;
  color: white;
  float: left;
  border: none;
  outline: none;
  cursor: pointer;
  padding: 14px 16px;
  font-size: 17px;
  font-family: Arial, sans-serif;
  width: 25%;
}
.tablink:hover {
  background-color: #777;
}
.tabcontent {
  display: none;
  padding: 80px 20px;
  height: 100%;
}
#Experiment {background-color: #f0f5f5;}
#OverallTrend {background-color: #ecf9f2;}
#ContextSummary {background-color: #e6f7ff;}
#ContextTrend {background-color: #fff7e6;}
</style>
</head>
<body>
<h1 style="color:white;">Experiment NewTest - Context Explorer</h1>
<h3 style="color:white;">2020-02-13 - 2020-03-13</h3>
<button class="tablink" onclick="openPage('Experiment', this, '#75a3a3')" id="defaultOpen">Experiment</button>
<button class="tablink" onclick="openPage('OverallTrend', this, 'green')">Overall Trend</button>
<button class="tablink" onclick="openPage('ContextSummary', this, '#008ae6')">Context Summary</button>
<button class="tablink" onclick="openPage('ContextTrend', this, 'orange')">Context Trend</button>
<div id="Experiment" class="tabcontent">
<h3>Experiment Summary</h3>
<ul>
<li>Experiment ID: NewTest</li>
<li>Report Time Span: 2020-02-13 - 2020-03-13</li>
</ul>
<h3> Navigate this Report: </h3>
<ul>
<li>Overall Trend: Plots for daily count and the main reward metric trend over time for the entire experiment.</li>
<li>Context Summary: A summary table for the latest and cumulative statistics by context. </li>
<li>Context Trend: Plots for daily count and the main reward metric trend over time by context. </li>
</ul>
</div>
<div id="OverallTrend" class="tabcontent">
<h3>Overall - Trend</h3>
Rewards for the control group are estimated with Inverse Propensity Scoring (IPS). References: <a href="https://en.wikipedia.org/wiki/Inverse_probability_weighting">Wikipedia</a>, <a href="https://github.com/microsoft/mwt-ds/blob/master/images/MWT-WhitePaper.pdf">Paper</a> (page 12).
<br><br>
<img src="pic/NewTest_All_2020-02-13-2020-03-13.png" width="1200"><br>
</div>
<div id="ContextSummary" class="tabcontent">
<h3>Context Summary - Latest and Cumulative Performance</h3>
<ul>
<li>Latest Performance: data from the last full day -- 2020-03-13</li>
<li>Cumulative Performance: data aggregated for all dates within 2020-02-13 - 2020-03-13 </li>
</ul>
<b>Top Positive Movements</b>
<table border="1" class="dataframe">
<thead>
<tr>
<th colspan="4" halign="left">Context</th>
<th colspan="2" halign="left">Count</th>
<th colspan="5" halign="left">Reward_Average</th>
<th colspan="2" halign="left">Cum_Count</th>
<th colspan="5" halign="left">Cum_Reward_Average</th>
<th>Last</th>
</tr>
<tr>
<th>Exp</th>
<th>CallType</th>
<th>MediaType</th>
<th>NetworkType</th>
<th>Control</th>
<th>Treatment</th>
<th>Control</th>
<th>Treatment</th>
<th>Delta</th>
<th>p-value</th>
<th>sig</th>
<th>Control</th>
<th>Treatment</th>
<th>Control</th>
<th>Treatment</th>
<th>Delta</th>
<th>p-value</th>
<th>sig</th>
<th>Exploit Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>NewTest</td>
<td>1_1</td>
<td>Audio</td>
<td>wifi</td>
<td>28</td>
<td>616</td>
<td>-0.094817</td>
<td>0.038415</td>
<td>0.133232</td>
<td>0.000000</td>
<td>*</td>
<td>737</td>
<td>18711</td>
<td>-0.080374</td>
<td>0.036286</td>
<td>0.116660</td>
<td>0.0</td>
<td>*</td>
<td>{'7': 1}</td>
</tr>
<tr>
<td>NewTest</td>
<td>1_1</td>
<td>Audio</td>
<td>wired</td>
<td>20</td>
<td>613</td>
<td>-0.078481</td>
<td>0.075195</td>
<td>0.153676</td>
<td>0.000000</td>
<td>*</td>
<td>762</td>
<td>18716</td>
<td>-0.076536</td>
<td>0.073399</td>
<td>0.149936</td>
<td>0.0</td>
<td>*</td>
<td>{'2': 1}</td>
</tr>
<tr>
<td>NewTest</td>
<td>1_1</td>
<td>Video</td>
<td>wifi</td>
<td>26</td>
<td>600</td>
<td>0.013455</td>
<td>0.064231</td>
<td>0.050775</td>
<td>0.000000</td>
<td>*</td>
<td>751</td>
<td>18724</td>
<td>0.012004</td>
<td>0.061820</td>
<td>0.049817</td>
<td>0.0</td>
<td>*</td>
<td>{'1': 1}</td>
</tr>
<tr>
<td>NewTest</td>
<td>1_1</td>
<td>Video</td>
<td>wired</td>
<td>14</td>
<td>660</td>
<td>-0.036584</td>
<td>0.075915</td>
<td>0.112499</td>
<td>0.000000</td>
<td>*</td>
<td>757</td>
<td>18947</td>
<td>-0.065755</td>
<td>0.070655</td>
<td>0.136410</td>
<td>0.0</td>
<td>*</td>
<td>{'7': 1}</td>
</tr>
<tr>
<td>NewTest</td>
<td>GVC</td>
<td>Audio</td>
<td>wifi</td>
<td>26</td>
<td>632</td>
<td>-0.052450</td>
<td>0.061070</td>
<td>0.113519</td>
<td>0.000000</td>
<td>*</td>
<td>759</td>
<td>18702</td>
<td>-0.049284</td>
<td>0.062814</td>
<td>0.112098</td>
<td>0.0</td>
<td>*</td>
<td>{'3': 1}</td>
</tr>
<tr>
<td>NewTest</td>
<td>GVC</td>
<td>Video</td>
<td>wifi</td>
<td>32</td>
<td>605</td>
<td>-0.023427</td>
<td>0.087424</td>
<td>0.110851</td>
<td>0.000000</td>
<td>*</td>
<td>732</td>
<td>18587</td>
<td>-0.000829</td>
<td>0.077970</td>
<td>0.078800</td>
<td>0.0</td>
<td>*</td>
<td>{'5': 1}</td>
</tr>
<tr>
<td>NewTest</td>
<td>GVC</td>
<td>Video</td>
<td>wired</td>
<td>18</td>
<td>621</td>
<td>0.057706</td>
<td>0.085413</td>
<td>0.027708</td>
<td>0.000067</td>
<td>*</td>
<td>1203</td>
<td>18708</td>
<td>0.064632</td>
<td>0.081562</td>
<td>0.016930</td>
<td>0.0</td>
<td>*</td>
<td>{'1': 1}</td>
</tr>
</tbody>
</table>
<br>
<b>Top Negative Movements</b>
<table border="1" class="dataframe">
<thead>
<tr>
<th colspan="4" halign="left">Context</th>
<th colspan="2" halign="left">Count</th>
<th colspan="5" halign="left">Reward_Average</th>
<th colspan="2" halign="left">Cum_Count</th>
<th colspan="5" halign="left">Cum_Reward_Average</th>
<th>Last</th>
</tr>
<tr>
<th>Exp</th>
<th>CallType</th>
<th>MediaType</th>
<th>NetworkType</th>
<th>Control</th>
<th>Treatment</th>
<th>Control</th>
<th>Treatment</th>
<th>Delta</th>
<th>p-value</th>
<th>sig</th>
<th>Control</th>
<th>Treatment</th>
<th>Control</th>
<th>Treatment</th>
<th>Delta</th>
<th>p-value</th>
<th>sig</th>
<th>Exploit Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>NewTest</td>
<td>GVC</td>
<td>Audio</td>
<td>wired</td>
<td>486</td>
<td>653</td>
<td>0.110349</td>
<td>0.073671</td>
<td>-0.036677</td>
<td>0.0</td>
<td>*</td>
<td>13150</td>
<td>18905</td>
<td>0.107776</td>
<td>0.067073</td>
<td>-0.040703</td>
<td>0.0</td>
<td>*</td>
<td>{'0': 1}</td>
</tr>
</tbody>
</table>
<br><br>
<i><font size="2">* Note: Some contexts are filtered out due to sample size constraints and sensitivity limits. Information on all contexts is available in the <a href="log_all_contexts_NewTest_2020-02-13-2020-03-13.xlsx">log file</a>.</font></i>
</div>
<div id="ContextTrend" class="tabcontent">
<h3>Context - Trend</h3>
<b>Top Positive Movements</b><br>
<img src="pic/NewTest_1_1Audiowifi_2020-02-13-2020-03-13.png" width="1200"><br>
<img src="pic/NewTest_1_1Audiowired_2020-02-13-2020-03-13.png" width="1200"><br>
<img src="pic/NewTest_1_1Videowifi_2020-02-13-2020-03-13.png" width="1200"><br>
<img src="pic/NewTest_1_1Videowired_2020-02-13-2020-03-13.png" width="1200"><br>
<img src="pic/NewTest_GVCAudiowifi_2020-02-13-2020-03-13.png" width="1200"><br>
<img src="pic/NewTest_GVCVideowifi_2020-02-13-2020-03-13.png" width="1200"><br>
<img src="pic/NewTest_GVCVideowired_2020-02-13-2020-03-13.png" width="1200"><br>
<br>
<b>Top Negative Movements</b><br>
<img src="pic/NewTest_GVCAudiowired_2020-02-13-2020-03-13.png" width="1200"><br>
</div>
<script>
function openPage(pageName, elmnt, color) {
  var i, tabcontent, tablinks;
  tabcontent = document.getElementsByClassName("tabcontent");
  for (i = 0; i < tabcontent.length; i++) {
    tabcontent[i].style.display = "none";
  }
  tablinks = document.getElementsByClassName("tablink");
  for (i = 0; i < tablinks.length; i++) {
    tablinks[i].style.backgroundColor = "";
  }
  document.getElementById(pageName).style.display = "block";
  elmnt.style.backgroundColor = color;
}
// Get the element with id="defaultOpen" and click on it
document.getElementById("defaultOpen").click();
</script>
</body>
</html>
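
The Overall Trend tab of the report states that control-group rewards are estimated with Inverse Propensity Scoring. A minimal sketch of the standard IPS estimator for a policy that always plays a default action, assuming each logged event carries the chosen action, the probability with which it was chosen, and the observed reward. The function name, the tuple layout, and the toy data are illustrative and are not taken from `context_explorer`:

```python
def ips_estimate(events, default_action):
    """Estimate the average reward of always playing `default_action`.

    `events` is a list of (action, probability, reward) tuples, where
    `probability` is the logging policy's probability of the logged action.
    """
    if not events:
        raise ValueError("events must not be empty")
    total = 0.0
    for action, probability, reward in events:
        # Only events where the logged action matches the evaluated policy
        # contribute; each is re-weighted by 1 / probability.
        if action == default_action:
            total += reward / probability
    # Divide by the number of all events, not only the matching ones.
    return total / len(events)


# Toy log (made-up numbers): action 0 is treated as the default action.
events = [
    (0, 0.5, 1.0),
    (1, 0.5, 0.0),
    (0, 0.25, 2.0),
    (1, 0.75, 1.0),
]
control_estimate = ips_estimate(events, default_action=0)
print(control_estimate)  # (1.0/0.5 + 2.0/0.25) / 4 = 2.5
```

Events in which the default action was logged with a low probability receive a large weight, which is one reason contexts with few control samples can produce noisy estimates.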

Binary data
ContextExplorer/sample_output/ce_report_screenshot.PNG Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 351 KiB
