sample notebook changes and readme (#282)

* sample notebook changes and readme first draft

* addressing PR comments and adding EA screenshots

* bump requirement version of rai_core_flask in raiwidgets (#283)

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* Fix annotations on False Positive/False Negative Performance Plot (#284)

The individual model view has a chart showing false positive and false negative rates (defined in `PerformancePlot.tsx`). The axis of this plot was in percent, but the annotations on the bars themselves were decimal. The problem was that `PerformancePlot.tsx` passed an incorrect key from `PerformanceMetrics.ts` into `FormatMetrics.ts`: in the `PerformanceMetric` map, the names are `fallout_rate` and `miss_rate`. This fixes that mismatch.

Had to force the merge in because, for some reason, the successful builds were not registered.

Signed-off-by: Richard Edgar <riedgar@microsoft.com>

* update notice file (#286)

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove ipywidgets requirement and also duplicate (unused) gevent (#287)

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* move footer up in axis config dialog, add legend text, fix indexing for (#288)

what if and inspection view, add big data support, add map shift dialog

* Add localisation files to prettierignore (#290)

The language-specific files `en.*.json` are produced by the localisation builds, not by us. Remove them from prettier.

Signed-off-by: Richard Edgar <riedgar@microsoft.com>

* fix label alignment, fix what if for categoricals, change index to come (#292)

from explanation dataset

* Minor style fix (#289)

* style fix

* fabric build-in

* notebook adjustments to make them work with latest changes

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* remove debugging statement

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

* yarn lintfix ignore README

Signed-off-by: Roman Lutz <rolutz@microsoft.com>

Co-authored-by: Roman Lutz <rolutz@microsoft.com>
Co-authored-by: Richard Edgar <riedgar@microsoft.com>
Co-authored-by: Ilya Matiach <ilmat@microsoft.com>
Co-authored-by: Bo <71688188+zhb000@users.noreply.github.com>
This commit is contained in:
Mehrnoosh Sameki 2021-01-29 02:37:57 -05:00 committed by GitHub
Parent d755d87b93
Commit eeff0eeba5
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
23 changed files: 739 additions and 602261 deletions

View file

@ -3,6 +3,7 @@
/dist
/coverage
_NOTICE.md
README.md
raiwidgets/raiwidgets/widget
# Language specific files are not generated by us

README.md — 280 changes
View file

@ -1,22 +1,276 @@
![Responsible AI Widgets Python Build](https://github.com/microsoft/responsible-ai-widgets/workflows/Responsible%20AI%20Widgets/badge.svg) ![Responsible AI Core Flask Python Build](https://github.com/microsoft/responsible-ai-widgets/workflows/Responsible%20AI%20Core%20Flask/badge.svg) ![Node.js CI](https://github.com/microsoft/responsible-ai-widgets/workflows/Node.js%20CI/badge.svg) ![Node.js CD](https://github.com/microsoft/responsible-ai-widgets/workflows/Node.js%20CD/badge.svg) ![MIT license](https://img.shields.io/badge/License-MIT-blue.svg) ![PyPI raiwidgets](https://img.shields.io/pypi/v/raiwidgets?color=blue) ![PyPI rai_core_flask](https://img.shields.io/pypi/v/rai_core_flask?color=blue) ![npm fairness](https://img.shields.io/npm/v/@responsible-ai/fairness?label=npm%20%40responsible-ai%2Ffairness) ![npm interpret](https://img.shields.io/npm/v/@responsible-ai/interpret?label=npm%20%40responsible-ai%2Finterpret) ![npm mlchartlib](https://img.shields.io/npm/v/@responsible-ai/mlchartlib?label=npm%20%40responsible-ai%2Fmlchartlib) ![npm core-ui](https://img.shields.io/npm/v/@responsible-ai/core-ui?label=npm%20%40responsible-ai%2Fcore-ui)
# Responsible AI Widgets
# Responsible-AI-Widgets
Responsible-AI-Widgets provides a collection of model and data exploration and assessment user interfaces that enable better understanding of AI systems. Together, these interfaces empower developers and stakeholders of AI systems to develop and monitor AI more responsibly. Currently, there are three widgets demonstrating how to interpret models and assess their errors and fairness issues.
This repository contains the Jupyter notebooks with examples to showcase how to use these widgets.
# Contents
- [Overview of Responsible-AI-Widgets](#intro)
- [Interpretability Dashboard](#interpretability-dashboard)
- [Error Analysis Dashboard](#error-dashboard)
- [Fairness Dashboard](#fairness-dashboard)
- [Supported Models](#supported-models)
- [Getting Started](#getting-started)
<a name="intro"></a>
# Overview of Responsible-AI-Widgets
Responsible-AI-Widgets extends the [Interpret-Community](https://github.com/interpretml/interpret-community) and [Fairlearn](https://github.com/fairlearn/fairlearn) repositories and provides user interfaces for model interpretability and fairness assessment of machine learning models. It introduces Error Analysis, a toolkit to identify and diagnose errors in machine learning models. The following table shows a list of the user interfaces available in this repository:
|User Interface|Description|Use Case (assessing a loan allocation model to accept or deny home loan applicants)|
|--|--|--|
|Interpretability Dashboard| User interface for [Interpret-Community](https://github.com/interpretml/interpret-community) which enables you to 1) evaluate your model by observing its performance metrics, 2) explore your dataset statistics, 3) understand the most important factors impacting your model's overall (global) and individual (local) predictions, 4) debug models by performing a variety of feature perturbation operations (e.g., what-if analysis and Individual Conditional Expectation plots), and 5) understand your model's explanations across different demographics.|Use the Interpretability dashboard to understand which factors have the most impact on your model's accept/deny decisions. Observe this for the whole population, for a subset of applicants (e.g., females), and for individuals (such as why Mary's loan got rejected).|
|Error Analysis (+ Interpretability) Dashboard| Use the Error Analysis dashboard to 1) ***identify*** cohorts with a high error rate versus the benchmark and visualize how the error rate is distributed, and 2) ***diagnose*** the root causes of the errors by visually diving deeper into the characteristics of data and models (via its embedded interpretability capabilities).|Use Error Analysis to discover that the model has a higher error rate for a specific cohort (e.g., females with income <$50K) vs. the rest of the population. Next, use the embedded interpretability capabilities of this dashboard to understand the most impactful factors responsible for this subset's erroneous predictions. Moreover, use interpretability to inspect some individuals of that cohort receiving erroneous predictions, understand their feature importance values, and perform what-if analysis on them to better diagnose the contributing error factors.|
|Fairness Dashboard| User interface for [Fairlearn](https://github.com/fairlearn/fairlearn) which enables you to use common fairness metrics to assess which groups of people may be negatively impacted (females vs. males vs. non-binary gender). Also explore Fairlearn's state-of-the-art unfairness mitigation algorithms to mitigate fairness issues in your classification and regression models.|Use the Fairness dashboard to assess harm of allocation (i.e., to understand whether your loan allocation model approves more applications from a specific advantaged group). Use the Fairness dashboard to assess harm of quality of service (i.e., understand how your model performs on applications from your qualified male group vs. qualified female/non-binary gender groups). Navigate trade-offs between fairness and performance of your loan allocation model. Use [Fairlearn](https://github.com/fairlearn/fairlearn)'s mitigation algorithms to mitigate the observed fairness issues.|
Besides the above functionalities, this repository provides foundational blocks such as
- A shared Flask service layer, which also maintains utilities to determine the environment that it is running in so that it can configure the local Flask service accordingly. This layer is published in the ```rai_core_flask``` package on PyPI.
- A base TypeScript library with common controls used across responsible AI dashboards. For information on how to contribute, please refer to our [Contributor Guide](#Contributing).
## Example Notebooks
- [Interpretability for binary classification (employee attrition)](https://github.com/microsoft/responsible-ai-widgets/blob/master/notebooks/interpretability-dashboard-employee-attrition.ipynb)
- [Fairness assessment of a loan allocation model](https://github.com/microsoft/responsible-ai-widgets/blob/master/notebooks/fairness-dashboard-loan-allocation.ipynb)
- [Joint Example: Interpretability and fairness assessment of a loan allocation model](https://github.com/microsoft/responsible-ai-widgets/blob/master/notebooks/fairness-interpretability-dashboard-loan-allocation.ipynb)
- [Error analysis and interpretability of a census income prediction model](https://github.com/microsoft/responsible-ai-widgets/blob/master/notebooks/erroranalysis-interpretability-dashboard-census.ipynb)
- [Error analysis and interpretability of a breast cancer prediction model](https://github.com/microsoft/responsible-ai-widgets/blob/master/notebooks/erroranalysis-interpretability-dashboard-breast-cancer.ipynb)
<a name="interpretability dashboard"></a>
# Interpretability Dashboard
Please refer to [Interpret-Community](https://github.com/interpretml/interpret-community)'s README and [sample notebooks](https://github.com/interpretml/interpret-community/tree/master/notebooks) to learn how you can train and generate model explanations. Once your model is trained and your explanation object is generated, load the interpretability visualization dashboard in your notebook to understand and interpret your model:
```python
from raiwidgets import ExplanationDashboard
ExplanationDashboard(global_explanation, model, dataset=x_test, true_y=y_test)
```
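The `global_explanation` object passed above comes from Interpret-Community. A minimal sketch of producing it (assuming a trained `model`, your `x_train`/`x_test` splits, and lists of `feature_names` and `classes`; see the Interpret-Community sample notebooks for complete examples):

```python
# A minimal sketch, assuming a trained model plus x_train/x_test splits.
# feature_names and classes are placeholders for your dataset's column names and labels.
from interpret.ext.blackbox import TabularExplainer

# SHAP-based explainer wrapping the trained model
explainer = TabularExplainer(model,
                             x_train,
                             features=feature_names,
                             classes=classes)

# Aggregate (global) explanation computed over a representative test sample
global_explanation = explainer.explain_global(x_test)
```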
Once you load the visualization dashboard, you can investigate different aspects of your dataset and trained model via four tab views:
* Model Performance
* Data Explorer
* Aggregate Feature Importance
* Individual Feature Importance and what-if
---
**NOTE**
Click on "Open in a new tab" on the top left corner to get a better view of the dashboard in a new tab.
---
You can further create custom cohorts (subgroups of your dataset) to explore the insights across different subgroups (e.g., women vs. men). The created cohorts can contain more than one filter (e.g., age < 30 and sex = female) and will be visible from all of the four tabs. The following sections demonstrate the visualization dashboard capabilities on a [classification model trained on the employee attrition dataset](https://github.com/microsoft/responsible-ai-widgets/blob/master/notebooks/interpretability-dashboard-employee-attrition.ipynb). Besides the default cohort (including the whole dataset), there are two additional cohorts created: employees with Age <= 35 and employees with Age > 35.
![Visualization Dashboard Cohorts](./img/Interpretability-Cohorts.png)
### Model performance
This tab enables you to evaluate your model by observing its performance metrics and prediction probabilities/classes/values across different cohorts.
![Visualization Dashboard Cohorts](./img/Interpretability-ModelPerformance.png)
### Dataset explorer
You can explore your dataset statistics by selecting different filters along the X, Y, and color axes of this tab to slice your data into different dimensions.
![Visualization Dashboard Cohorts](./img/Interpretability-DatasetExplorer.png)
The following plots provide a global view of the trained model along with its predictions and explanations.
### Aggregate feature importance (global explanation)
This view consists of two charts:
|Plot|Description|
|----|-----------|
|Feature Importance| Explore the top K important features that impact your overall model predictions (a.k.a. global explanation). Use the slider to show additional less important feature values. Select up to three cohorts to see their feature importance values side by side.|
|Dependence Plot|Click on any of the feature bars in the feature importance graph to see the relationship between the values of the selected feature and its corresponding feature importance values. Overall, this plot shows how values of the selected feature impact the model's predictions.|
![Visualization Dashboard Global](./img/Interpretability-GlobalExplanation.png)
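The same global importance values shown in the Feature Importance chart can also be inspected programmatically; a short sketch (assuming the `global_explanation` object produced in the earlier sketch):

```python
# A short sketch, assuming the global_explanation object created earlier.
# Features ranked from most to least important, as plotted in the dashboard.
sorted_global_names = global_explanation.get_ranked_global_names()
sorted_global_values = global_explanation.get_ranked_global_values()

# Convenience view as a {feature: importance} dictionary
importance_dict = global_explanation.get_feature_importance_dict()
print(list(importance_dict.items())[:5])
```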
### Individual feature importance (local explanation) and what-if
You can click on any individual data point on the scatter plot to view its local feature importance values (local explanation) and individual conditional expectation (ICE) plot below. These are the capabilities covered in this tab:
|Plot|Description|
|----|-----------|
|Feature Importance Plot|Shows the top K (configurable K) important features for an individual prediction. Helps illustrate the local behavior of the underlying model on a specific data point.|
|Individual Conditional Expectation (ICE)| Allows feature value changes from a minimum value to a maximum value. Helps illustrate how the data point's prediction changes when a feature changes.|
|Perturbation Exploration (what-if analysis)|Allows changes to feature values of the selected data point to observe resulting changes to prediction value. You can then save your hypothetical what-if data point.|
![Visualization Dashboard Global](./img/Interpretability-LocalExplanation.png)
![Visualization Dashboard Global](./img/Interpretability-WhatIf.gif)
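The local importance values rendered when you click a data point can likewise be computed directly; a minimal sketch (assuming the `explainer` and `x_test` from the earlier sketch):

```python
# A minimal sketch, assuming the explainer and x_test from the previous sketch.
# Explain a single data point, e.g. the first row of the test set.
local_explanation = explainer.explain_local(x_test[0:1])

# Per-instance feature rankings, matching what the dashboard shows for that point
sorted_local_names = local_explanation.get_ranked_local_names()
sorted_local_values = local_explanation.get_ranked_local_values()
```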
<a name="error dashboard"></a>
# Error Analysis Dashboard
Error Analysis, the latest addition to the Responsible AI open-source toolkit collection, drills deeper to provide a better understanding of your machine learning model's behaviors. Use Error Analysis to identify cohorts with higher error rates and diagnose the root causes behind these errors. Combined with [Fairlearn](https://github.com/fairlearn/fairlearn) and [Interpret-Community](https://github.com/interpretml/interpret-community), practitioners can perform a wide variety of assessment operations to build responsible machine learning. Use this dashboard to:
1. Evaluate Cohorts: Learn how errors distribute across different cohorts at different levels of granularity
2. Explore Predictions: Use built-in interpretability features or combine with InterpretML for boosted debugging capability
3. Interactive Dashboard: View customizable pre-built visuals to quickly identify errors and diagnose root causes
Run the dashboard via:
```python
from raiwidgets import ErrorAnalysisDashboard
ErrorAnalysisDashboard(global_explanation, dashboard_pipeline, dataset=X_test_original,
                       true_y=y_test, categorical_features=categorical_features)
```
Once you load the visualization dashboard, you can investigate different aspects of your dataset and trained model via two stages:
* Identification
* Diagnosis
---
**NOTE**
Click on "Open in a new tab" on the top left corner to get a better view of the dashboard in a new tab.
---
## Identification of Errors
Error Analysis identifies cohorts of data with higher error rate than the overall benchmark. These discrepancies might occur when the system or model underperforms for specific demographic groups or infrequently observed input conditions in the training data.
### Different Methods for Error Identification
1. Decision Tree: Discover cohorts with high error rates across multiple features using the binary tree visualization. Investigate indicators such as error rate, error coverage, and data representation for each discovered cohort.
2. Error Heatmap: Once you form hypotheses of the most impactful features for failure, use the Error Heatmap to further investigate how one or two input features impact the error rate across cohorts.
## Diagnosis of Errors
After identifying cohorts with higher error rates, Error Analysis enables debugging and exploring these cohorts further. Gain deeper insights about the model or the data through data exploration and model explanation. The different methods for error diagnosis are:
1. Data Exploration, which explores dataset statistics and feature distributions. Compare cohort data statistics with those of other cohorts or with the benchmark data. Investigate whether certain cohorts are underrepresented or whether their feature distribution is significantly different from the overall data.
2. Global Explanation, which explores the top K important features that impact the overall model predictions (global explanation) for a selected cohort of data. Understand how feature values impact model predictions. Compare explanations with those from other cohorts or the benchmark.
3. Local Explanation, which enables observing the raw data in the Instance View. Understand how each data point received a correct or incorrect prediction. Visually identify any missing features or label noise that could lead to issues. Explore local feature importance values (local explanation) and individual conditional expectation (ICE) plots.
4. What-if analysis (Perturbation Exploration), which applies changes to the feature values of a selected data point and observes the resulting changes to the prediction.
<a name="fairness dashboard"></a>
# Fairness Dashboard
Please refer to [Fairlearn](https://github.com/fairlearn/fairlearn)'s README and [user guide](https://fairlearn.github.io/v0.5.0/user_guide/index.html) to learn how you can assess and mitigate your model's fairness issues. Once your model is trained, load the Fairness dashboard in your notebook to understand how your model's predictions impact different groups (e.g., different ethnicities). Compare multiple models along different fairness and performance metrics.
## Setup and a single-model assessment
To assess a single model's fairness and performance, the dashboard widget can be launched within a Jupyter notebook as follows:
```python
from raiwidgets import FairnessDashboard
# A_test contains your sensitive features (e.g., age, binary gender)
# y_true contains ground truth labels
# y_pred contains prediction labels
FairnessDashboard(sensitive_features=A_test,
                  y_true=Y_test.tolist(),
                  y_pred=[y_pred.tolist()])
```
Once you load the visualization dashboard, the widget walks the user through the assessment setup, where the user is asked to select:
![Fairness Dashboard Sensitive Feature](./img/Fairness-Intro.png)
1. The sensitive feature of interest (e.g., ```binary gender``` or ```age```).
![Fairness Dashboard Fairness Metric](./img/Fairness-SensitiveMetric.png)
2. The performance metric (e.g., model precision) along which to evaluate the overall model performance.
![Fairness Dashboard Fairness Metric](./img/Fairness-PerformanceMetric.png)
3. The fairness metric (e.g., demographic parity ratio) along which to evaluate any disparities across groups.
![Fairness Dashboard Fairness Metric](./img/Fairness-FairnessMetric.png)
These selections are then used to obtain the visualization of the model's impact on the subgroups (e.g., you might be interested in considering non-binary gender for fairness testing and select "demographic parity ratio" as the metric of interest to see how females and males are selected to get a loan).
![Fairness Dashboard Fairness Assessment View 1](./img/Fairness-SelectionRate.png)
![Fairness Dashboard Fairness Assessment View 2](./img/Fairness-DisparityInPerformance.png)
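The per-group and disparity metrics the dashboard visualizes can also be computed directly with Fairlearn. A small sketch, assuming a recent Fairlearn release (keyword-style `MetricFrame`) and the `A_test`, `Y_test`, and `y_pred` arrays from the dashboard call above:

```python
# A small sketch using Fairlearn's metrics, assuming A_test, Y_test and y_pred
# from the FairnessDashboard call above (y_pred here is a single model's predictions).
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_ratio

# Per-group view of performance and selection rate, grouped by the sensitive feature
mf = MetricFrame(metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
                 y_true=Y_test,
                 y_pred=y_pred,
                 sensitive_features=A_test)
print(mf.by_group)

# A single disparity number, analogous to the dashboard's demographic parity ratio
print(demographic_parity_ratio(Y_test, y_pred, sensitive_features=A_test))
```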
## Comparing multiple models
The dashboard also enables comparison of multiple models, such as the models produced by different learning algorithms and different mitigation approaches, including Fairlearn's [GridSearch](https://fairlearn.github.io/v0.5.0/api_reference/fairlearn.reductions.html#fairlearn.reductions.GridSearch), [ExponentiatedGradient](https://fairlearn.github.io/v0.5.0/api_reference/fairlearn.reductions.html#fairlearn.reductions.ExponentiatedGradient), and [ThresholdOptimizer](https://fairlearn.github.io/v0.5.0/api_reference/fairlearn.postprocessing.html#fairlearn.postprocessing.ThresholdOptimizer).
As before, select the sensitive feature and the performance metric. The model comparison view then depicts the performance and disparity of all the provided models in a scatter plot. This allows you to examine trade-offs between performance and fairness. Each of the dots can be clicked to open the assessment of the corresponding model. The figure below shows the model comparison view with ```binary gender``` selected as the sensitive feature and accuracy rate selected as the performance metric.
![Fairness Dashboard Model Comparison](./img/Fairness-ModelComparison.png)
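A sketch of how such a set of candidate models might be produced and handed to the dashboard, using Fairlearn's `GridSearch` under a demographic parity constraint (the estimator, grid size, and variable names `X_train`, `y_train`, `A_train`, `X_test`, `y_test`, `A_test` are illustrative, mirroring the loan-allocation notebook):

```python
# A sketch of a mitigation sweep followed by model comparison in the dashboard.
# X_train/y_train/A_train and X_test/y_test/A_test are assumed train/test splits
# plus sensitive-feature columns; the estimator and grid size are illustrative.
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import GridSearch, DemographicParity
from raiwidgets import FairnessDashboard

unmitigated = LogisticRegression(solver="liblinear").fit(X_train, y_train)

sweep = GridSearch(LogisticRegression(solver="liblinear"),
                   constraints=DemographicParity(),
                   grid_size=20)
sweep.fit(X_train, y_train, sensitive_features=A_train)

# One entry per model; the dashboard draws one dot per entry
# in the performance/disparity scatter plot.
predictions = {"unmitigated": unmitigated.predict(X_test)}
for i, predictor in enumerate(sweep.predictors_):
    predictions[f"grid_{i}"] = predictor.predict(X_test)

FairnessDashboard(sensitive_features=A_test,
                  y_true=y_test,
                  y_pred=predictions)
```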
<a name="supported models"></a>
# Supported Models
This interpretability and error analysis API supports models that are trained on datasets in Python `numpy.array`, `pandas.DataFrame`, `iml.datatypes.DenseData`, or `scipy.sparse.csr_matrix` format.
The explanation functions of [Interpret-Community](https://github.com/interpretml/interpret-community) accept both models and pipelines as input as long as the model or pipeline implements a `predict` or `predict_proba` function that conforms to the scikit-learn convention. If not compatible, you can wrap your model's prediction function in a wrapper that transforms the output into the supported format (`predict` or `predict_proba` of scikit-learn) and pass that wrapper to your selected interpretability techniques.
If a pipeline script is provided, the explanation function assumes that the running pipeline script returns a prediction. The repository also supports models trained via **PyTorch**, **TensorFlow**, and **Keras** deep learning frameworks.
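For a model whose prediction output does not already match the scikit-learn convention described above, a minimal wrapper sketch might look like the following (the `raw_model` and its `score` method are hypothetical placeholders):

```python
# A minimal sketch of adapting a non-conforming model so explainers can call it.
# `raw_model` and its `score` method are hypothetical placeholders.
import numpy as np

class PredictWrapper:
    """Adapts an arbitrary prediction function to the scikit-learn predict() convention."""

    def __init__(self, raw_model):
        self.raw_model = raw_model

    def predict(self, X):
        # Convert whatever the underlying model returns into a 1-D numpy array of labels
        raw_output = self.raw_model.score(X)  # hypothetical prediction call
        return np.asarray(raw_output).reshape(-1)

# wrapped_model = PredictWrapper(raw_model)
# explainer = TabularExplainer(wrapped_model, x_train, features=feature_names, classes=classes)
```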
<a name="getting started"></a>
# Getting Started
This repository uses Anaconda to simplify package and environment management.
To setup on your local machine:
<details><summary><strong><em>Install Python module, packages and necessary distributions</em></strong></summary>
```
pip install responsible-ai-widgets
```
If you intend to run repository tests:
```
pip install -r requirements.txt
```
</details>
<details>
<summary><strong><em>Set up and run Jupyter Notebook server </em></strong></summary>
Install and run Jupyter Notebook
```
# if needed:
pip install jupyter
# then:
jupyter notebook
```
</details>
This project provides responsible AI user interfaces for
[Fairlearn](https://fairlearn.github.io) and
[interpret-community](https://interpret.ml), as well as foundational building
blocks that they rely on.
These include
- a shared service layer which also maintains utilities to
determine the environment that it is running in so that it can configure the
local flask service accordingly.
- a base typescript library with common controls used across responsible AI
dashboards
For information on how to contribute please refer to our
[Contributor Guide](./CONTRIBUTING.md)
## Maintainers

New binary files added (contents not shown in the diff):

- img/EA-Heatmap.png (new file, 156 KiB)
- img/EA-TreeMap.png (new file, 160 KiB)
- img/Fairness-DisparityInPerformance.png (new file, 63 KiB)
- img/Fairness-FairnessMetric.png (new file, 98 KiB)
- img/Fairness-Intro.png (new file, 126 KiB)
- img/Fairness-ModelComparison.png (new file, 85 KiB)
- img/Fairness-PerformanceMetric.png (new file, 75 KiB)
- img/Fairness-SelectionRate.png (new file, 56 KiB)
- img/Fairness-SensitiveMetric.png (new file, 66 KiB)
- img/Interpretability-Cohorts.png (new file, 35 KiB)
- img/Interpretability-DatasetExplorer.png (new file, 208 KiB)
- img/Interpretability-GlobalExplanation.png (new file, 290 KiB)
- img/Interpretability-LocalExplanation.png (new file, 427 KiB)
- img/Interpretability-ModelPerformance.png (new file, 289 KiB)
- img/Interpretability-WhatIf.gif (new file, 13 MiB)

View file

@ -4,76 +4,40 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
"# Analyze Errors and Explore Interpretability of Models\n",
"\n",
"This notebook demonstrates how to use the Responsible AI Widget's Error Analysis dashboard to understand a model trained on the Breast Cancer dataset. The goal of this sample notebook is to classify breast cancer diagnosis with scikit-learn and explore model errors and explanations:\n",
"\n",
"1. Train a LightGBM classification model using Scikit-learn\n",
"2. Run Interpret-Community's 'explain_model' globally and locally to generate model explanations.\n",
"3. Visualize model errors and global and local explanations with the Error Analysis visualization dashboard."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Explain binary classification model predictions\n",
"_**This notebook showcases how to use the interpret-community repo to help interpret and visualize predictions from a binary classification model.**_\n",
"\n",
"\n",
"## Table of Contents\n",
"\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Project](#Project)\n",
"1. [Run model explainer locally at training time](#Explain)\n",
" 1. Train a binary classification model\n",
" 1. Explain the model\n",
" 1. Generate global explanations\n",
" 1. Generate local explanations\n",
"1. [Visualize results](#Visualize)\n",
"1. [Next steps](#Next)"
"## Install Required Packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade interpret-community\n",
"%pip install --upgrade raiwidgets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='Introduction'></a>\n",
"## 1. Introduction\n",
"## Explain\n",
"\n",
"This notebook illustrates how to locally use interpret-community to help interpret binary classification model predictions at training time. It demonstrates the API calls needed to obtain the global and local interpretations along with an interactive visualization dashboard for discovering patterns in data and explanations.\n",
"\n",
"Three tabular data explainers are demonstrated: \n",
"- TabularExplainer (SHAP)\n",
"- MimicExplainer (global surrogate)\n",
"- PFIExplainer.\n",
"\n",
"| ![Interpretability Toolkit Architecture](./img/interpretability-architecture.png) |\n",
"|:--:|\n",
"| *Interpretability Toolkit Architecture* |\n",
"\n",
"<a id='Project'></a> \n",
"## 2. Project\n",
"\n",
"The goal of this project is to classify breast cancer diagnosis with scikit-learn and locally running the model explainer:\n",
"\n",
"1. Train a SVM classification model using Scikit-learn\n",
"2. Run 'explain_model' globally and locally with full dataset in local mode, which doesn't contact any Azure services.\n",
"3. Visualize the global and local explanations with the visualization dashboard.\n",
"\n",
"<a id='Setup'></a>\n",
"## 3. Setup\n",
"\n",
"If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
"If you are using Jupyter Labs run the following command:\n",
"```\n",
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='Explain'></a>\n",
"## 4. Run model explainer locally at training time"
"### Run model explainer at training time"
]
},
{
@ -83,29 +47,11 @@
"outputs": [],
"source": [
"from sklearn.datasets import load_breast_cancer\n",
"from sklearn import svm\n",
"from lightgbm import LGBMClassifier\n",
"\n",
"# Explainers:\n",
"# 1. SHAP Tabular Explainer\n",
"#from interpret.ext.blackbox import TabularExplainer\n",
"from interpret.ext.blackbox import TabularExplainer\n",
"\n",
"# OR\n",
"\n",
"# 2. Mimic Explainer\n",
"#from interpret.ext.blackbox import MimicExplainer\n",
"\n",
"# You can use one of the following four interpretable models as a global surrogate to the black box model\n",
"#from interpret.ext.glassbox import LGBMExplainableModel\n",
"#from interpret.ext.glassbox import LinearExplainableModel\n",
"#from interpret.ext.glassbox import SGDExplainableModel\n",
"#from interpret.ext.glassbox import DecisionTreeExplainableModel\n",
"\n",
"# OR\n",
"\n",
"# 3. PFI Explainer\n",
"#from interpret.ext.blackbox import PFIExplainer "
"# SHAP Tabular Explainer\n",
"from interpret.ext.blackbox import TabularExplainer"
]
},
{
@ -140,7 +86,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train a SVM classification model, which you want to explain"
"### Train a LightGBM classification model, which you want to explain"
]
},
{
@ -149,8 +95,6 @@
"metadata": {},
"outputs": [],
"source": [
"#clf = svm.SVC(gamma=0.001, C=100., probability=True)\n",
"#model = clf.fit(x_train, y_train)\n",
"clf = LGBMClassifier(n_estimators=1)\n",
"model = clf.fit(x_train, y_train)"
]
@ -172,39 +116,7 @@
"explainer = TabularExplainer(model, \n",
" x_train, \n",
" features=breast_cancer_data.feature_names, \n",
" classes=classes)\n",
"\n",
"\n",
"\n",
"\n",
"# 2. Using MimicExplainer\n",
"# augment_data is optional and if true, oversamples the initialization examples to improve surrogate model accuracy to fit original model. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
"# explainer = MimicExplainer(model, \n",
"# x_train, \n",
"# LGBMExplainableModel, \n",
"# augment_data=True, \n",
"# max_num_of_augmentations=10, \n",
"# features=breast_cancer_data.feature_names, \n",
"# classes=classes)\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"# 3. Using PFIExplainer\n",
"\n",
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
"# Note that if a metric function is provided a higher value must be better.\n",
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
"# Default metrics: \n",
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
"# Mean absolute error for regression\n",
"\n",
"# explainer = PFIExplainer(model, \n",
"# features=breast_cancer_data.feature_names, \n",
"# classes=classes)"
" classes=classes)"
]
},
{
@ -223,10 +135,7 @@
"source": [
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
"global_explanation = explainer.explain_global(x_test)\n",
"\n",
"# Note: if you used the PFIExplainer in the previous step, use the next line of code instead\n",
"# global_explanation = explainer.explain_global(x_test, true_labels=y_test)"
"global_explanation = explainer.explain_global(x_test)"
]
},
{
@ -290,7 +199,6 @@
"metadata": {},
"outputs": [],
"source": [
"# Note: Do not run this cell if using PFIExplainer, it does not support local explanations\n",
"# You can pass a specific data point or a group of data points to the explain_local function\n",
"\n",
"# E.g., Explain the first data point in the test set\n",
@ -318,9 +226,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='Visualize'></a>\n",
"## 5. Visualize\n",
"Load the visualization dashboards"
"## Visualize\n",
"### [Optional] Load the interpretability visualization dashboard"
]
},
{
@ -329,8 +236,7 @@
"metadata": {},
"outputs": [],
"source": [
"from raiwidgets import ExplanationDashboard\n",
"ExplanationDashboard(global_explanation, model, datasetX=x_test, true_y=y_test)"
"from raiwidgets import ExplanationDashboard"
]
},
{
@ -339,23 +245,34 @@
"metadata": {},
"outputs": [],
"source": [
"from raiwidgets import ErrorAnalysisDashboard\n",
"ErrorAnalysisDashboard(global_explanation, model, dataset=x_test, true_y=y_test)"
"ExplanationDashboard(global_explanation, model, dataset=x_test, true_y=y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='Next'></a>\n",
"## 6. Next steps\n",
"Learn about other use cases of the explain package on a:\n",
" \n",
"1. [Training time: regression problem](./explain-regression-local.ipynb)\n",
"1. [Training time: multiclass classification problem](./explain-multiclass-classification-local.ipynb)\n",
"1. Explain models with engineered features:\n",
" 1. [Simple feature transformations](./simple-feature-transformations-explain-local.ipynb)\n",
" 1. [Advanced feature transformations](./advanced-feature-transformations-explain-local.ipynb)"
"### Analyze model errors and explanations using Error Analysis dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from raiwidgets import ErrorAnalysisDashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"ErrorAnalysisDashboard(global_explanation, model, dataset=x_test, true_y=y_test)"
]
}
],

View file

@ -4,76 +4,40 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
"# Analyze Errors and Explore Interpretability of Models\n",
"\n",
"This notebook demonstrates how to use the Responsible AI Widget's Error Analysis dashboard to understand a model trained on the Census dataset. The goal of this sample notebook is to classify breast cancer diagnosis with scikit-learn and explore model errors and explanations:\n",
"\n",
"1. Train a LightGBM classification model using Scikit-learn\n",
"2. Run Interpret-Community's 'explain_model' globally and locally to generate model explanations.\n",
"3. Visualize model errors and global and local explanations with the Error Analysis visualization dashboard."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Analyze categorical binary classification model predictions\n",
"_**This notebook showcases how to use error analysis to help understand the errors in a binary classification model.**_\n",
"\n",
"\n",
"## Table of Contents\n",
"\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Project](#Project)\n",
"1. [Run model explainer locally at training time](#Explain)\n",
" 1. Train a binary classification model\n",
" 1. Explain the model\n",
" 1. Generate global explanations\n",
" 1. Generate local explanations\n",
"1. [Visualize results](#Analyze)\n",
"1. [Next steps](#Next)"
"## Install Required Packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade interpret-community\n",
"%pip install --upgrade raiwidgets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='Introduction'></a>\n",
"## 1. Introduction\n",
"## Explain\n",
"\n",
"This notebook illustrates how to locally use interpret-community to help interpret binary classification model predictions at training time. It demonstrates the API calls needed to obtain the global and local interpretations along with an interactive visualization dashboard for discovering patterns in data and explanations.\n",
"\n",
"Three tabular data explainers are demonstrated: \n",
"- TabularExplainer (SHAP)\n",
"- MimicExplainer (global surrogate)\n",
"- PFIExplainer.\n",
"\n",
"| ![Interpretability Toolkit Architecture](./img/interpretability-architecture.png) |\n",
"|:--:|\n",
"| *Interpretability Toolkit Architecture* |\n",
"\n",
"<a id='Project'></a> \n",
"## 2. Project\n",
"\n",
"The goal of this project is to classify breast cancer diagnosis with scikit-learn and locally running the model explainer:\n",
"\n",
"1. Train a SVM classification model using Scikit-learn\n",
"2. Run 'explain_model' globally and locally with full dataset in local mode, which doesn't contact any Azure services.\n",
"3. Visualize the global and local explanations with the visualization dashboard.\n",
"\n",
"<a id='Setup'></a>\n",
"## 3. Setup\n",
"\n",
"If you are using Jupyter notebooks, the extensions should be installed automatically with the package.\n",
"If you are using Jupyter Labs run the following command:\n",
"```\n",
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='Explain'></a>\n",
"## 4. Run model explainer locally at training time"
"### Run model explainer at training time"
]
},
{
@ -88,14 +52,7 @@
"import zipfile\n",
"from lightgbm import LGBMClassifier\n",
"\n",
"# Explainers:\n",
"# 1. SHAP Tabular Explainer\n",
"#from interpret.ext.blackbox import TabularExplainer\n",
"from interpret.ext.blackbox import TabularExplainer\n",
"\n",
"# OR\n",
"\n",
"# 2. Mimic Explainer\n",
"# Explainer Used: Mimic Explainer\n",
"from interpret.ext.blackbox import MimicExplainer\n",
"from interpret.ext.glassbox import LinearExplainableModel\n",
"from interpret.ext.glassbox import LGBMExplainableModel\n",
@ -241,41 +198,9 @@
"# 1. Using SHAP TabularExplainer\n",
"model_task = ModelTask.Classification\n",
"explainer = MimicExplainer(model, X_train_original, LGBMExplainableModel,\n",
" augment_data=True, max_num_of_augmentations=10,\n",
" features=features, classes=classes, model_task=model_task,\n",
" transformations=feat_pipe)\n",
"\n",
"\n",
"\n",
"\n",
"# 2. Using MimicExplainer\n",
"# augment_data is optional and if true, oversamples the initialization examples to improve surrogate model accuracy to fit original model. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
"# explainer = MimicExplainer(model, \n",
"# x_train, \n",
"# LGBMExplainableModel, \n",
"# augment_data=True, \n",
"# max_num_of_augmentations=10, \n",
"# features=breast_cancer_data.feature_names, \n",
"# classes=classes)\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"# 3. Using PFIExplainer\n",
"\n",
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
"# Note that if a metric function is provided a higher value must be better.\n",
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
"# Default metrics: \n",
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
"# Mean absolute error for regression\n",
"\n",
"# explainer = PFIExplainer(model, \n",
"# features=breast_cancer_data.feature_names, \n",
"# classes=classes)"
" augment_data=True, max_num_of_augmentations=10,\n",
" features=features, classes=classes, model_task=model_task,\n",
" transformations=feat_pipe)"
]
},
{
@ -294,10 +219,7 @@
"source": [
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
"global_explanation = explainer.explain_global(X_test_original)\n",
"\n",
"# Note: if you used the PFIExplainer in the previous step, use the next line of code instead\n",
"# global_explanation = explainer.explain_global(x_test, true_labels=y_test)"
"global_explanation = explainer.explain_global(X_test_original)"
]
},
{
@ -344,9 +266,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='Analyze'></a>\n",
"## 5. Analyze\n",
"Analyze using Error Analysis Dashboard"
"## Analyze\n",
"### Analyze model errors and explanations using Error Analysis dashboard"
]
},
{
@ -373,21 +294,6 @@
" true_y=y_test, categorical_features=categorical_features,\n",
" true_y_dataset=y_test_full)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='Next'></a>\n",
"## 6. Next steps\n",
"Learn about other use cases of the explain package on a:\n",
" \n",
"1. [Training time: regression problem](./explain-regression-local.ipynb)\n",
"1. [Training time: multiclass classification problem](./explain-multiclass-classification-local.ipynb)\n",
"1. Explain models with engineered features:\n",
" 1. [Simple feature transformations](./simple-feature-transformations-explain-local.ipynb)\n",
" 1. [Advanced feature transformations](./advanced-feature-transformations-explain-local.ipynb)"
]
}
],
"metadata": {
@ -411,7 +317,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
"version": "3.7.6"
}
},
"nbformat": 4,

View file

@ -6,7 +6,7 @@
"source": [
"# Using Fairness Dashboard with Loan Allocation Data\n",
"\n",
"This notebook shows how to use [`Fairlearn`](https://fairlearn.github.io/) and the Fairness dashboard to generate predictors for the Census dataset. This dataset is a classification problem - given a range of data about 32,000 individuals, predict whether their annual income is above or below fifty thousand dollars per year.\n",
"This notebook shows how to use [`Fairlearn`](https://fairlearn.github.io/) and the Responsible AI Widget's Fairness dashboard to generate predictors for the Census dataset. This dataset is a classification problem - given a range of data about 32,000 individuals, predict whether their annual income is above or below fifty thousand dollars per year.\n",
"\n",
"For the purposes of this notebook, we shall treat this as a loan decision problem. We will pretend that the label indicates whether or not each individual repaid a loan in the past. We will use the data to train a predictor to predict whether previously unseen individuals will repay a loan or not. The assumption is that the model predictions are used to decide whether an individual should be offered a loan.\n",
"\n",
@ -136,7 +136,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training a fairness-unaware predictor\n",
"## Train a fairness-unaware predictor\n",
"\n",
"To show the effect of `Fairlearn` we will first train a standard ML predictor that does not incorporate fairness For speed of demonstration, we use a simple logistic regression estimator from `sklearn`:"
]
@ -287,7 +287,6 @@
" dominant_all[name] = predictor.predict(X_test)\n",
"\n",
"FairnessDashboard(sensitive_features=sensitive_features_test, \n",
" sensitive_feature_names=['Sex', 'Race'],\n",
" y_true=y_test.tolist(),\n",
" y_pred=dominant_all)"
]
@ -300,13 +299,6 @@
"\n",
"By clicking on individual models on the plot, we can inspect their metrics for disparity and accuracy in greater detail. In a real example, we would then pick the model which represented the best trade-off between accuracy and disparity given the relevant business constraints."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@ -325,7 +317,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
"version": "3.7.6"
}
},
"nbformat": 4,

View file

@ -4,36 +4,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"# Assess Fairness, Explore Interpretability, and Mitigate Fairness Issues \n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Assess Fairness and Mitigate Unfairness\n",
"## Overview of Tutorial\n",
"This notebook is Part 4 of a four part workshop that demonstrates how to use [InterpretML](interpret.ml) and [Fairlearn](fairlearn.org) (and their integrations with Azure Machine Learning) to understand and analyze models better. The different components of the workshop are as follows:\n",
"\n",
"- Part 1: [Interpretability with glassbox models (EBM)](https://github.com/microsoft/ResponsibleAI-Airlift/blob/main/Interpret/EBM/Interpretable%20Classification%20Methods.ipynb)\n",
"- Part 2: [Explain blackbox models with SHAP (and upload explanations to Azure Machine Learning)](https://github.com/microsoft/ResponsibleAI-Airlift/blob/main/Interpret/SHAP/explain-model-SHAP.ipynb)\n",
"- Part 3: [Run Interpretability on Azure Machine Learning](https://github.com/microsoft/ResponsibleAI-Airlift/blob/main/Interpret/SHAP/explain-model-Azure.ipynb)\n",
"- Part 4: [Model fairness assessment and unfairness mitigation](https://github.com/microsoft/ResponsibleAI-Airlift/blob/main/Fairness/AI-fairness-Census.ipynb) (HERE)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"This notebook shows how to use `Fairlearn` and `InterpretML` and their visualizations dashboards to understand a binary classification model. The classification model has been trained on census data, which given a range of data about 32,000 individuals, predicts whether their annual income is above or below fifty thousand dollars per year.\n",
"This notebook demonstrates how to use [InterpretML](interpret.ml), [Fairlearn](fairlearn.org), and the Responsible AI Widget's Fairness and Interpretability dashboards to understand a model trained on the Census dataset. This dataset is a classification problem - given a range of data about 32,000 individuals, predict whether their annual income is above or below fifty thousand dollars per year.\n",
"\n",
"For the purposes of this notebook, we shall treat this as a loan decision problem. We will pretend that the label indicates whether or not each individual repaid a loan in the past. We will use the data to train a predictor to predict whether previously unseen individuals will repay a loan or not. The assumption is that the model predictions are used to decide whether an individual should be offered a loan.\n",
"\n",
"We will first train a fairness-unaware predictor and show that it leads to unfair decisions under a specific notion of fairness called *demographic parity*. We then mitigate unfairness by applying the `GridSearch` algorithm from `Fairlearn` package."
"We will first train a fairness-unaware predictor, load its global and local explanations, and use the interpretability and fairness dashboards to demonstrate how this model leads to unfair decisions (under a specific notion of fairness called *demographic parity*). We then mitigate unfairness by applying the `GridSearch` algorithm from `Fairlearn` package.\n"
]
},
{
@ -49,12 +26,9 @@
"metadata": {},
"outputs": [],
"source": [
"#%pip install --upgrade fairlearn==0.5.0\n",
"#%pip uninstall --yes interpret-community\n",
"#%pip install --upgrade interpret-community[visualization]\n",
"#%pip install --upgrade azureml-interpret\n",
"#%pip install --upgrade azureml-contrib-interpret\n",
"#%pip install --upgrade azureml-contrib-fairness"
"%pip install --upgrade fairlearn\n",
"%pip install --upgrade interpret-community\n",
"%pip install --upgrade raiwidgets"
]
},
{
@ -119,17 +93,6 @@
"X_raw[\"race\"].value_counts().to_dict()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#for feature_with_nan_value in ['native-country', 'workclass', 'occupation']:\n",
"# X_raw[feature_with_nan_value].cat.add_categories('N/A', inplace=True)\n",
"# X_raw[feature_with_nan_value].fillna('N/A', inplace=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -344,7 +307,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can load this predictor into the Fairness dashboard, and examine how it is unfair (there is a warning about AzureML since we are not yet integrated with that product):"
"We can load this predictor into the Fairness dashboard, and examine how it is unfair:"
]
},
{
@ -365,7 +328,6 @@
"\n",
"y_pred = model.predict(X_test)\n",
"\n",
"\n",
"FairnessDashboard(sensitive_features=sensitive_features_test,\n",
" y_true=y_test,\n",
" y_pred=y_pred)"
@ -384,7 +346,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mitigation with GridSearch\n",
"## Mitigation with Fairlearn (GridSearch)\n",
"\n",
"The `GridSearch` class in `Fairlearn` implements a simplified version of the exponentiated gradient reduction of [Agarwal et al. 2018](https://arxiv.org/abs/1803.02453). The user supplies a standard ML estimator, which is treated as a blackbox. `GridSearch` works by generating a sequence of relabellings and reweightings, and trains a predictor for each.\n",
"\n",
@ -431,7 +393,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We could load these predictors into the Fairness dashboard now. However, the plot would be somewhat confusing due to their number. In this case, we are going to remove the predictors which are dominated in the error-disparity space by others from the sweep (note that the disparity will only be calculated for the protected attribute; other potentially protected attributes will not be mitigated). In general, one might not want to do this, since there may be other considerations beyond the strict optimisation of error and disparity (of the given protected attribute)."
"We could load these predictors into the Fairness dashboard now. However, the plot would be somewhat confusing due to their number. In this case, we are going to remove the predictors which are dominated in the error-disparity space by others from the sweep (note that the disparity will only be calculated for the sensitive feature). In general, one might not want to do this, since there may be other considerations beyond the strict optimization of error and disparity (of the given protected attribute)."
]
},
{
@ -476,6 +438,8 @@
"metadata": {},
"outputs": [],
"source": [
"from raiwidgets import FairnessDashboard\n",
"\n",
"dashboard_all = {}\n",
"for name, predictor in all_models_dict.items():\n",
" value = predictor.predict(X_test_prep)\n",
@ -485,53 +449,11 @@
"for name, predictor in dominant_models_dict.items():\n",
" dominant_all[name] = predictor.predict(X_test_prep)\n",
"\n",
"FairnessDashboard(sensitive_features=sensitive_features_test,\n",
"FairnessDashboard(sensitive_features=sensitive_features_test, \n",
" y_true=y_test,\n",
" y_pred=dominant_all)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using SHAP KernelExplainer\n",
"# model.steps[-1][1] returns the trained classification model\n",
"# winner_model = models_all['grid_6']\n",
"# explainer = KernelExplainer(winner_model, \n",
"# initialization_examples=X_train, \n",
"# features=X_raw.columns, \n",
"# classes=['Rejected', 'Approved'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#global_exp=explainer.explain_global(X_test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#global_exp.get_feature_importance_dict()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#ExplanationDashboard(global_exp, winner_model, datasetX=X_train, trueY=Y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -547,220 +469,6 @@
"\n",
"By clicking on individual models on the plot, we can inspect their metrics for disparity and accuracy in greater detail. In a real example, we would then pick the model which represented the best trade-off between accuracy and disparity given the relevant business constraints."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AzureML Integration\n",
"\n",
"We will now go through a brief example of the AzureML integration.\n",
"\n",
"The required package can be installed via:\n",
"\n",
"```\n",
"pip install azureml-contrib-fairness \n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Connect To Workspace\n",
"\n",
"Just like in the previous tutorials, we will need to connect to a [workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py).\n",
"\n",
"The following code will allow you to create a workspace if you don't already have one created. You must have an Azure subscription to create a workspace:\n",
"\n",
"```python\n",
"from azureml.core import Workspace\n",
"ws = Workspace.create(name='myworkspace',\n",
" subscription_id='<azure-subscription-id>',\n",
" resource_group='myresourcegroup',\n",
" create_resource_group=True,\n",
" location='eastus2')\n",
"```\n",
"\n",
"**If you are running this on a Notebook VM, you can import the existing workspace.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Registering Models\n",
"\n",
"The fairness dashboard is designed to integrate with registered models, so we need to do this for the models we want in the Studio portal. The assumption is that the names of the models specified in the dashboard dictionary correspond to the `id`s (i.e. `<name>:<version>` pairs) of registered models in the workspace."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we register each of the models in the `dashboard_predicted` dictionary into the workspace. For this, we have to save each model to a file, and then register that file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import joblib\n",
"import os\n",
"from azureml.core import Model, Experiment, Run\n",
"\n",
"os.makedirs('models', exist_ok=True)\n",
"def register_model(name, model):\n",
" print(\"Registering \", name)\n",
" model_path = f\"models/{name}.pkl\"\n",
" joblib.dump(value=model, filename=model_path)\n",
" registered_model = Model.register(model_path=model_path,\n",
" model_name=name,\n",
" workspace=ws)\n",
" print(\"Registered \", registered_model.id)\n",
" return registered_model.id\n",
"\n",
"model_name_id_mapping = dict()\n",
"for name, model in dashboard_all.items():\n",
" m_id = register_model(name, model)\n",
" model_name_id_mapping[name] = m_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, produce new predictions dictionaries, with the updated names:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dashboard_all_ids = dict()\n",
"for name, y_pred in dashboard_all.items():\n",
" dashboard_all_ids[model_name_id_mapping[name]] = y_pred"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Uploading a dashboard\n",
"\n",
"We create a _dashboard dictionary_ using Fairlearn's `metrics` package. The `_create_group_metric_set` method has arguments similar to the Dashboard constructor, except that the sensitive features are passed as a dictionary (to ensure that names are available), and we must specify the type of prediction. Note that we use the `dashboard_registered` dictionary we just created:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sf = { 'sex': sensitive_features_test.sex, 'race': sensitive_features_test.race }\n",
"\n",
"from fairlearn.metrics._group_metric_set import _create_group_metric_set\n",
"\n",
"\n",
"dash_dict_all = _create_group_metric_set(y_true=y_test,\n",
" predictions=dashboard_all_ids,\n",
" sensitive_features=sf,\n",
" prediction_type='binary_classification')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we import our `contrib` package which contains the routine to perform the upload:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can create an Experiment, then a Run, and upload our dashboard to it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(ws, \"Fairlearn_InterpretML_Census_Demo\")\n",
"print(exp)\n",
"\n",
"run = exp.start_logging()\n",
"try:\n",
" dashboard_title = \"Upload MultiAsset from Grid Search with Census Data Notebook\"\n",
" upload_id = upload_dashboard_dictionary(run,\n",
" dash_dict_all,\n",
" dashboard_name=dashboard_title)\n",
" print(\"\\nUploaded to id: {0}\\n\".format(upload_id))\n",
"\n",
"    # Sanity check: download the dashboard we just uploaded\n",
"    downloaded_dict = download_dashboard_by_upload_id(run, upload_id)\n",
"finally:\n",
" run.complete()\n"
]
},
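{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick, illustrative sanity check we can compare the dictionary downloaded from the run with the one we uploaded. This sketch assumes `downloaded_dict` was populated in the `try` block above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sanity check: the downloaded dashboard should match what we uploaded\n",
"print(dash_dict_all == downloaded_dict)"
]
},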
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Uploading explanations\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.interpret.explanation.explanation_client import ExplanationClient\n",
"\n",
"\n",
"client = ExplanationClient.from_run(run)\n",
"client.upload_model_explanation(global_explanation, comment = \"census data global explanation\")"
]
}
],
"metadata": {
@ -779,7 +487,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
"version": "3.7.6"
}
},
"nbformat": 4,

File differences are hidden because one or more lines are too long

View file

@ -0,0 +1,377 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using Interpretability Dashboard with Employee Attrition Data\n",
"\n",
"This notebook illustrates creating explanations for a binary classification model, employee attrition classification, that uses one to one and one to many feature transformations from raw data to engineered features. It will showcase raw feature transformations with three tabular data explainers: TabularExplainer (SHAP), MimicExplainer (global surrogate), and PFIExplainer.\n",
"\n",
"Problem: Employee attrition classification with scikit-learn (run model explainer locally)\n",
"\n",
"1. Transform raw features to engineered features.\n",
"2. Train a classification model using Scikit-learn.\n",
"3. Run 'explain_model' globally and locally with full dataset.\n",
"4. Visualize the global and local explanations with the interpretability visualization dashboard."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install Required Packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade interpret-community"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After installing packages, you must close and reopen the notebook as well as restarting the kernel."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explain\n",
"\n",
"### Run model explainer at training time"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.pipeline import Pipeline\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
"from lightgbm import LGBMClassifier\n",
"import pandas as pd\n",
"import numpy as np\n",
"from urllib.request import urlretrieve\n",
"import zipfile\n",
"\n",
"# SHAP Tabular Explainer\n",
"from interpret.ext.blackbox import TabularExplainer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load the employee attrition data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"outdirname = 'dataset.6.21.19'\n",
"zipfilename = outdirname + '.zip'\n",
"urlretrieve('https://publictestdatasets.blob.core.windows.net/data/' + zipfilename, zipfilename)\n",
"with zipfile.ZipFile(zipfilename, 'r') as unzip:\n",
" unzip.extractall('.')\n",
"attritionData = pd.read_csv('./WA_Fn-UseC_-HR-Employee-Attrition.csv')\n",
"\n",
"# Dropping Employee count as all values are 1 and hence attrition is independent of this feature\n",
"attritionData = attritionData.drop(['EmployeeCount'], axis=1)\n",
"# Dropping Employee Number since it is merely an identifier\n",
"attritionData = attritionData.drop(['EmployeeNumber'], axis=1)\n",
"attritionData = attritionData.drop(['Over18'], axis=1)\n",
"\n",
"# Since all values are 80\n",
"attritionData = attritionData.drop(['StandardHours'], axis=1)\n",
"\n",
"# Converting target variables from string to numerical values\n",
"target_map = {'Yes': 0, 'No': 1}\n",
"attritionData[\"Attrition_numerical\"] = attritionData[\"Attrition\"].apply(lambda x: target_map[x])\n",
"target = attritionData[\"Attrition_numerical\"]\n",
"\n",
"attritionXData = attritionData.drop(['Attrition_numerical', 'Attrition'], axis=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Split data into train and test\n",
"from sklearn.model_selection import train_test_split\n",
"x_train, x_test, y_train, y_test = train_test_split(attritionXData, \n",
" target, \n",
" test_size=0.2,\n",
" random_state=0,\n",
" stratify=target)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Creating dummy columns for each categorical feature\n",
"categorical = []\n",
"for col, value in attritionXData.iteritems():\n",
" if value.dtype == 'object':\n",
" categorical.append(col)\n",
" \n",
"# Store the numerical columns in a list numerical\n",
"numerical = attritionXData.columns.difference(categorical) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transform raw features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can explain raw features by either using a `sklearn.compose.ColumnTransformer` or a list of fitted transformer tuples. The cell below uses `sklearn.compose.ColumnTransformer`. In case you want to run the example with the list of fitted transformer tuples, comment the cell below and uncomment the cell that follows after. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.compose import ColumnTransformer\n",
"\n",
"# We create the preprocessing pipelines for both numeric and categorical data.\n",
"numeric_transformer = Pipeline(steps=[\n",
" ('imputer', SimpleImputer(strategy='median')),\n",
" ('scaler', StandardScaler())])\n",
"\n",
"categorical_transformer = Pipeline(steps=[\n",
" ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),\n",
" ('onehot', OneHotEncoder(handle_unknown='ignore'))])\n",
"\n",
"transformations = ColumnTransformer(\n",
" transformers=[\n",
" ('num', numeric_transformer, numerical),\n",
" ('cat', categorical_transformer, categorical)])\n",
"\n",
"# Append classifier to preprocessing pipeline.\n",
"# Now we have a full prediction pipeline.\n",
"clf = Pipeline(steps=[('preprocessor', transformations),\n",
" ('classifier', LGBMClassifier())])\n"
]
},
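{
"cell_type": "markdown",
"metadata": {},
"source": [
"The commented-out cell below is a sketch of the alternative approach: a list of fitted transformer tuples, here built with the `sklearn-pandas` `DataFrameMapper`. It is illustrative only and assumes `sklearn-pandas` is installed; uncomment it (and comment out the `ColumnTransformer` cell above) if you prefer that route."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch (assumes sklearn-pandas is installed):\n",
"# %pip install sklearn-pandas\n",
"# from sklearn_pandas import DataFrameMapper\n",
"\n",
"# Impute and standardize numeric features; one-hot encode categorical features\n",
"# numeric_transformations = [([f], Pipeline(steps=[\n",
"#     ('imputer', SimpleImputer(strategy='median')),\n",
"#     ('scaler', StandardScaler())])) for f in numerical]\n",
"# categorical_transformations = [([f], OneHotEncoder(handle_unknown='ignore', sparse=False)) for f in categorical]\n",
"# transformations = numeric_transformations + categorical_transformations\n",
"\n",
"# clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),\n",
"#                       ('classifier', LGBMClassifier())])"
]
},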
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train a LightGBM classification model, which you want to explain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = clf.fit(x_train, y_train)"
]
},
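{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, we can check the model's accuracy on the held-out test set before explaining it; the cell below is a quick sanity-check sketch, not a full evaluation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check: accuracy of the fitted pipeline on the held-out test set\n",
"from sklearn.metrics import accuracy_score\n",
"\n",
"print('Test accuracy: {:.3f}'.format(accuracy_score(y_test, model.predict(x_test))))"
]
},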
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Explain your model predictions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 1. Using SHAP TabularExplainer\n",
"# clf.steps[-1][1] returns the trained classification model\n",
"explainer = TabularExplainer(clf.steps[-1][1], \n",
" initialization_examples=x_train, \n",
" features=attritionXData.columns, \n",
" classes=['Leaving', 'Staying'], \n",
" transformations=transformations)\n",
"\n",
"\n",
"# 2. Using MimicExplainer\n",
"# augment_data is optional and if true, oversamples the initialization examples to improve surrogate model accuracy to fit original model. Useful for high-dimensional data where the number of rows is less than the number of columns. \n",
"# max_num_of_augmentations is optional and defines max number of times we can increase the input data size.\n",
"# LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel\n",
"# explainer = MimicExplainer(clf.steps[-1][1], \n",
"# x_train, \n",
"# LGBMExplainableModel, \n",
"# augment_data=True, \n",
"# max_num_of_augmentations=10, \n",
"# features=attritionXData.columns, \n",
"# classes=[\"Leaving\", \"Staying\"], \n",
"# transformations=transformations)\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"# 3. Using PFIExplainer\n",
"\n",
"# Use the parameter \"metric\" to pass a metric name or function to evaluate the permutation. \n",
"# Note that if a metric function is provided a higher value must be better.\n",
"# Otherwise, take the negative of the function or set the parameter \"is_error_metric\" to True.\n",
"# Default metrics: \n",
"# F1 Score for binary classification, F1 Score with micro average for multiclass classification and\n",
"# Mean absolute error for regression\n",
"\n",
"# explainer = PFIExplainer(clf.steps[-1][1], \n",
"# features=x_train.columns, \n",
"# transformations=transformations,\n",
"# classes=[\"Leaving\", \"Staying\"])\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate global explanations\n",
"Explain overall model predictions (global explanation)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data\n",
"# x_train can be passed as well, but with more examples explanations will take longer although they may be more accurate\n",
"global_explanation = explainer.explain_global(x_test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print out a dictionary that holds the sorted feature importance names and values\n",
"print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate local explanations\n",
"Explain local data points (individual instances)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You can pass a specific data point or a group of data points to the explain_local function\n",
"# E.g., Explain the first data point in the test set\n",
"instance_num = 1\n",
"local_explanation = explainer.explain_local(x_test[:instance_num])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Get the prediction for the first member of the test set and explain why model made that prediction\n",
"prediction_value = clf.predict(x_test)[instance_num]\n",
"\n",
"sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]\n",
"sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('local importance values: {}'.format(sorted_local_importance_values))\n",
"print('local importance names: {}'.format(sorted_local_importance_names))"
]
},
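{
"cell_type": "markdown",
"metadata": {},
"source": [
"For easier reading, the ranked names and values can be paired up, for example in a small DataFrame. The sketch below assumes a single explained row (as in `x_test[:instance_num]` above), so it indexes the first entry of each list."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative: pair ranked feature names with their local importance values for the explained row\n",
"local_importance_df = pd.DataFrame({\n",
"    'feature': sorted_local_importance_names[0],\n",
"    'importance': sorted_local_importance_values[0]\n",
"})\n",
"local_importance_df.head(10)"
]
},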
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize\n",
"Load the interpretability visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from raiwidgets import ExplanationDashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ExplanationDashboard(global_explanation, model, dataset=x_test, true_y=y_test.values)"
]
}
],
"metadata": {
"authors": [
{
"name": "mesameki"
}
],
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}