documentation migration from aka.ms/econml (#640)

* initial commit for aka.ms/econml doc migration

* fix rst error

* update motivating examples

* fix links

* add intro to causal inference

* formatting

* update copyright

* finishing touches

* update gitignore

* host images locally

* avoid buggy sphinx version

Co-authored-by: Fabio Vera <fabiovera@microsoft.com>
fverac 2022-07-27 11:07:39 -04:00 committed by GitHub
Parent a0e21bb231
Commit b4832c0120
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
15 changed files: 190 additions and 58 deletions

1
.gitignore vendored

@ -10,7 +10,6 @@ __pycache__/
*.log
*.out
*.synctex.gz
*.pdf
# C extensions
*.so


@ -35,6 +35,7 @@ jobs:
foreach ($file in $editedFiles) {
switch -Wildcard ($file) {
"README.md" { Continue }
".gitignore" { Continue }
"econml/_version.py" { Continue }
"prototypes/*" { Continue }
"images/*" { Continue }
@ -70,7 +71,7 @@ jobs:
- script: 'pip install git+https://github.com/slundberg/shap.git@d1d2700acc0259f211934373826d5ff71ad514de'
displayName: 'Install specific version of shap'
- script: 'pip install sphinx sphinx_rtd_theme'
- script: 'pip install sphinx!=5.1.0 sphinx_rtd_theme'
displayName: 'Install sphinx'
- script: 'python setup.py build_sphinx -W'

Binary data
doc/Causal-Inference-User-Guide-v4-022520.pdf Normal file

Binary file not shown.


@ -21,7 +21,7 @@ sys.path.insert(0, os.path.abspath('econml'))
# -- Project information -----------------------------------------------------
project = 'econml'
copyright = '2019, Microsoft Research'
copyright = '2022, Microsoft Research'
author = 'Microsoft Research'
version = econml.__version__
release = econml.__version__
@ -119,7 +119,7 @@ html_theme_options = {
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = ['_static']
html_extra_path = ['map.svg']
html_extra_path = ['map.svg', 'Causal-Inference-User-Guide-v4-022520.pdf', "spec/img"]
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.

10
doc/spec/causal_intro.rst Normal file

@ -0,0 +1,10 @@
Introduction to Causal Inference
=================================
If you are new to causal inference, it may be helpful to walk through a quick overview of concepts and techniques that we refer to over the course of the documentation. Below we provide a high-level introduction to causal inference tailored for EconML:
.. raw:: html
<iframe src="../Causal-Inference-User-Guide-v4-022520.pdf" width="700" height="388"> </iframe>
The folks at DoWhy also have a broader introduction `here <https://causalinference.gitlab.io/kdd-tutorial/>`__.

77
doc/spec/faq.rst Normal file

@ -0,0 +1,77 @@
Frequently Asked Questions (FAQ)
====================================================================
When should I use EconML?
--------------------------
EconML is designed to answer causal questions: what will happen in response to some change in behavior,
prices, or conditions? These questions require different methods than forecasting questions:
what will happen next if everything continues as it has been?
What are the advantages of EconML?
-----------------------------------
EconML offers the broadest range of cutting-edge AI models designed specifically to answer causal questions.
The EconML models also build on familiar Python packages, allowing users to easily select the best model for their question.
Finally, EconML includes custom interpreters to create presentation-ready output.
How do I know if the results make sense?
----------------------------------------
Try comparing the consistency of your estimates across multiple models, including some that make
stronger structural assumptions like linear relationships and some that do not. Pay attention to the
standard errors as well as the point estimates—imprecise estimates should be interpreted accordingly.
While researchers can introduce bias by narrowly fishing for estimates that match their prior, it is also important
to use your expertise to evaluate results. If you estimate that a 5% decrease in price generates
an implausible 5000% increase in sales, you should carefully review your code!
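As a rough illustration of the comparison described above, the sketch below checks whether approximate 95% intervals from several models share a common value. The model names, point estimates, and standard errors are made-up numbers for illustration only, and the helper functions are hypothetical rather than part of EconML. Overlapping intervals are only a coarse consistency check, not a formal test.

```python
def confidence_interval(point, stderr, z=1.96):
    """Return an approximate 95% interval for one estimate."""
    return point - z * stderr, point + z * stderr


def estimates_agree(estimates, z=1.96):
    """True if the intervals of all models share at least one common value."""
    intervals = [confidence_interval(p, se, z) for p, se in estimates.values()]
    highest_low = max(low for low, _ in intervals)
    lowest_high = min(high for _, high in intervals)
    return highest_low <= lowest_high


# Hypothetical price-elasticity estimates as (point estimate, standard error).
estimates = {
    "linear model": (-1.10, 0.15),
    "forest model": (-0.95, 0.30),
    "meta-learner": (-1.30, 0.25),
}
print(estimates_agree(estimates))
```

If the check fails, or if one interval is far wider than the others, that is a cue to look more closely at the assumptions behind each model before trusting any single number.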
I'm getting causal estimates that don't make sense. What next?
----------------------------------------------------------------
First, carefully check your code for errors and try several causal models.
If your estimates are consistent, but implausible, you may have a confounding variable that hasn't been measured in your data.
Think carefully about the source of the data you are using: was there something unusual going on
during the period when the data were collected (for example a holiday or an economic downturn)?
Is there something unusual about your sample (for example, all men with pre-existing heart conditions)?
What if I don't have a good instrument, can't run an experiment, and don't observe all confounders?
------------------------------------------------------------------------------------------------------------
In this case, no statistical approach can perfectly isolate the causal effect of the treatment on the outcome.
DML, OrthoForest, or MetaLearners, each fitted with all the confounders you can observe,
will deliver the best available approximation of the causal effect by minimizing the bias from those confounders.
Be aware that some bias from unobserved confounders will remain in these estimates.
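The remaining bias can be seen in a small simulation. The sketch below is an assumption-laden toy, not an EconML estimator: it uses simple stratification on one recorded binary confounder in place of DML or a MetaLearner, and the data-generating process, function names, and coefficients are invented for illustration.

```python
import random


def simulate_confounding(n=40000, true_effect=1.0, seed=1):
    """Simulate (observed confounder, treated, outcome) rows; one confounder stays hidden."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        observed = rng.random() < 0.5   # confounder we record
        hidden = rng.random() < 0.5     # confounder we do not record
        # Both confounders make treatment more likely.
        p_treat = 0.2 + 0.3 * observed + 0.3 * hidden
        treated = rng.random() < p_treat
        y = true_effect * treated + 2.0 * observed + 1.0 * hidden + rng.gauss(0, 1)
        rows.append((observed, treated, y))
    return rows


def diff_in_means(rows):
    """Mean outcome of treated rows minus mean outcome of untreated rows."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


def stratified_estimate(rows):
    """Average the within-stratum differences, weighting each stratum by its size."""
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        total += diff_in_means(stratum) * len(stratum) / len(rows)
    return total


rows = simulate_confounding()
print("naive:", round(diff_in_means(rows), 2))
print("adjusted for observed confounder:", round(stratified_estimate(rows), 2))
```

In this toy setup, adjusting for the recorded confounder moves the estimate closer to the true effect of 1.0, but the hidden confounder still pushes it upward.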
How can I test whether I'm identifying the causal effect?
------------------------------------------------------------
You are identifying a valid causal effect if and only if the underlying assumptions of the causal model
assumed by the estimation routine are correct. Those are often hard to test (though the `DoWhy <https://py-why.github.io/dowhy/>`__ package may help).
Having made those assumptions, the EconML package allows you to fit the best causal model you can.
Many models will store a final stage fit metric that can be used to validate how well the causal model predicts out of sample,
which is a good diagnostic as to the quality of your model.
How do I give feedback?
------------------------------------
This project welcomes contributions and suggestions. Most contributions require you to agree to
a Contributor License Agreement (CLA) declaring that you have the right to, and actually do,
grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., label, comment).
Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct.
For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Binary data
doc/spec/img/Attribution.png Normal file

Binary file not shown.

Size: 85 KiB

Binary data
doc/spec/img/Recommendation.png Normal file

Binary file not shown.

Size: 20 KiB

Binary data
doc/spec/img/Segmentation.png Normal file

Binary file not shown.

Size: 33 KiB

Binary data
doc/spec/img/imgFamiliar.png Normal file

Binary file not shown.

Size: 2.4 KiB

Binary data
doc/spec/img/imgFlexible.png Normal file

Binary file not shown.

Size: 2.5 KiB

Binary data
doc/spec/img/imgUnified.png Normal file

Binary file not shown.

Size: 2.6 KiB


@ -31,55 +31,76 @@ python API.
Motivating Examples
===================
Customer Targeting
------------------
EconML is designed to measure the causal effect of some treatment variable(s) T on an outcome variable Y, controlling for a set of features X. Use cases include:
An important problem in modern business analytics is building automated tools to prioritize customer
acquisition and personalize customer interactions to increase sales and revenue. Typically businesses
will offer personalized incentives to customers to increase spend or increase the level of
engagement via more human resources. Any such personalized intervention corresponds to a monetary
investment and the main question that business analytics are called to answer is: what is the return
on investment (ROI)?
Recommendation A/B testing
-----------------------------
Analyzing the ROI is inherently a treatment effect question: what was the effect of any investment
on a particular customer on its spend? Understanding how this return on investment varies across
customers can enable more targeted investment policies and increased ROI via better targeting. Using historical
data from deployed investments, and estimating the heterogeneous treatment effect via any of
the proposed methods, business analysts can learn in an automated manner, data-driven
customer targeting and prioritization policies.
*Interpret experiments with imperfect compliance*
Personalized Pricing
--------------------
.. image:: img/Recommendation.png
:alt: Recommendation A/B testing logo
Personalized discounts have become very widespread in the digital economy. To set the optimal
personalized discount policy a business needs to understand what is the effect
of a drop in price on the demand of a customer for a product as a function of customer
characteristics. The estimation of such personalized demand elasticities can also be
phrased in the language of heterogeneous treatment effects, where the treatment
is the price (or typically log of price) on the demand (or typically log of demand)
as a function of observable features of the customer. Hence, estimation of heterogeneous
treatment effects can lead to optimal pricing policies.
**Question**: A travel website would like to know whether joining a membership program
causes users to spend more time engaging with the website.
**Problem**: They can't look directly at existing data, comparing members and non-members,
because the customers who chose to become members are likely already more engaged than other users.
Nor can they run a direct A/B test because they can't force users to sign up for membership.
**Solution**: The company had run an earlier experiment to test the value of a new,
faster sign-up process. EconML's DRIV estimator uses this experimental nudge towards membership
as an instrument that generates random variation in the likelihood of membership.
The DRIV model adjusts for the fact that not every customer who was offered the easier sign-up
became a member and returns the effect of membership rather than the effect of receiving the quick sign-up.
Link to jupyter notebook:
`Recommendation A/B Testing <https://github.com/microsoft/EconML/blob/main/notebooks/CustomerScenarios/Case%20Study%20-%20Recommendation%20AB%20Testing%20at%20An%20Online%20Travel%20Company.ipynb>`__
More details:
`Trip Advisor Case Study <https://www.microsoft.com/en-us/research/uploads/prod/2020/04/MSR_ALICE_casestudy_2020.pdf>`__
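The instrument idea in this example can be sketched with the classic Wald (ratio) estimator, which is a much simpler stand-in for DRIV and is not how EconML implements it. Everything below is an illustrative assumption: the simulated data, the function names, and the constant membership effect of 2.0.

```python
import random


def simulate_membership(n=40000, true_effect=2.0, seed=0):
    """Simulate (nudge, member, outcome) rows with an unobserved engagement confounder."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        engagement = rng.gauss(0.0, 1.0)   # unobserved confounder
        nudge = rng.random() < 0.5         # randomized easier sign-up (the instrument)
        # Membership is more likely after the nudge and for engaged users,
        # so compliance with the nudge is imperfect.
        p_member = 0.2 + 0.3 * nudge + (0.2 if engagement > 0 else 0.0)
        member = rng.random() < p_member
        outcome = true_effect * member + 1.5 * engagement + rng.gauss(0.0, 1.0)
        rows.append((nudge, member, outcome))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_difference(rows):
    """Compare members with non-members directly (biased by engagement)."""
    return mean(y for _, t, y in rows if t) - mean(y for _, t, y in rows if not t)


def wald_estimate(rows):
    """Effect of the nudge on the outcome divided by its effect on membership."""
    outcome_gap = mean(y for z, _, y in rows if z) - mean(y for z, _, y in rows if not z)
    uptake_gap = mean(float(t) for z, t, _ in rows if z) - mean(float(t) for z, t, _ in rows if not z)
    return outcome_gap / uptake_gap


rows = simulate_membership()
print("naive:", round(naive_difference(rows), 2))
print("instrumented:", round(wald_estimate(rows), 2))
```

In this simulation the naive comparison overstates the effect because engaged users self-select into membership, while the ratio estimate rescales the effect of the nudge by how much it actually changed membership.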
Stratification in Clinical Trials
----------------------------------------
Customer Segmentation
----------------------
Which patients should be selected for a clinical trial? If we want to demonstrate
that a clinical treatment has an effect on at least some subset of a population, then
fully randomized clinical trials are inappropriate as they will solely estimate
average effects. Using heterogeneous treatment effect techniques, we can use
observational data to come up with estimates of these effects and identify
good candidate patients for a clinical trial that our model estimates have high
treatment effects.
*Estimate individualized responses to incentives*
Learning Click-Through-Rates
----------------------------
.. image:: img/Segmentation.png
:alt: Customer Segmentation logo
In the design of a page layout and more importantly in ad placement, it is important
to understand the click-through-rate of page components (e.g. ads) on different positions
of a page. Even though the modern approach is to run multiple A/B tests, when such
page components involve revenue considerations (such as ad placement), then observational
data can help guide correct A/B tests to run. Heterogeneous treatment effect estimation
can provide estimates of the click-through-rate of page components from
observational data. In this setting, the treatment is simply whether the component is
placed on that page position and the response is whether the user clicked on it.
**Question**: A media subscription service would like to offer targeted discounts
through a personalized pricing plan.
**Problem**: They observe many features of their customers,
but are not sure which customers will respond most to a lower price.
**Solution**: EconML's DML estimator uses price variations in existing data,
along with a rich set of user features, to estimate heterogeneous price sensitivities
that vary with multiple customer features.
The tree interpreter provides a presentation-ready summary of the key features
that explain the biggest differences in responsiveness to a discount.
Link to jupyter notebook:
`Customer Segmentation <https://github.com/microsoft/EconML/blob/main/notebooks/CustomerScenarios/Case%20Study%20-%20Customer%20Segmentation%20at%20An%20Online%20Media%20Company.ipynb>`__.
Multi-investment Attribution
-----------------------------
*Distinguish the effects of multiple outreach efforts*
.. image:: img/Attribution.png
:alt: Multi-investment Attribution logo
**Question**: A startup would like to know the most effective approach for recruiting new customers:
price discounts, technical support to ease adoption, or a combination of the two.
**Problem**: The risk of losing customers makes experiments across outreach efforts too expensive.
So far, customers have been offered incentives strategically,
for example larger businesses are more likely to get technical support.
**Solution**: EconML's Doubly Robust Learner model jointly estimates the effects of multiple discrete treatments.
The model uses flexible functions of observed customer features to filter out confounding correlations
in existing data and deliver the causal effect of each effort on revenue.
Link to jupyter notebook:
`Multi-investment Attribution <https://github.com/microsoft/EconML/blob/main/notebooks/CustomerScenarios/Case%20Study%20-%20Multi-investment%20Attribution%20at%20A%20Software%20Company.ipynb>`__.

32
doc/spec/overview.rst Normal file

@ -0,0 +1,32 @@
Overview
=========
EconML is a Python package that applies the power of machine learning techniques to estimate individualized causal responses from observational or experimental data. The suite of estimation methods provided in EconML represents the latest advances in causal machine learning. By incorporating individual machine learning steps into interpretable causal models, these methods improve the reliability of what-if predictions and make causal analysis quicker and easier for a broad set of users.
EconML is open source software developed by the `ALICE <https://www.microsoft.com/en-us/research/project/alice/>`__ team at Microsoft Research.
.. raw:: html
<p></p>
<div class="ms-grid " style = "text-align: left; box-sizing: border-box; display: block; margin-left: auto; margin-right: auto; max-width: 1600px; position: relative; padding-left: 0; padding-right: 0; width: 100%;">
<div class="ms-row" style = "text-align: left; box-sizing: border-box; -webkit-box-align: stretch; align-items: stretch; display: flex; flex-wrap: wrap; margin-left: 3px; margin-right: 3px;">
<div class="m-col-8-24 x-hidden-focus" style = "text-align: left; box-sizing: border-box; float: left; margin: 0; padding-left: 1vw; padding-right: 1vw; position: relative; width: 33.33333%;">
<p style="text-align:center;"><img loading="lazy" class="size-full wp-image-656358 aligncenter x-hidden-focus" src="../imgFlexible.png" alt="Flexible icon" width="92" height="92"></p><p style="text-align: center"><b>Flexible</b></p><p class="x-hidden-focus">Allows for flexible model forms that do not impose strong assumptions, including models of heterogeneous responses to treatment.</p><p> </p></div>
<div class="m-col-8-24" style = "text-align: left; box-sizing: border-box; float: left; margin: 0; padding-left: 1vw; padding-right: 1vw; position: relative; width: 33.33333%;">
<p style="text-align:center;"><img loading="lazy" class="size-full wp-image-656355 aligncenter" src="../imgUnified.png" alt="Unified icon" width="92" height="92"></p><p style="text-align: center"><b>Unified</b></p><p>Broad set of methods representing latest advances in the econometrics and machine learning literature within a unified API.</p><p> </p></div>
<div class="m-col-8-24" style = "text-align: left; box-sizing: border-box; float: left; margin: 0; padding-left: 1vw; padding-right: 1vw; position: relative; width: 33.33333%;">
<p style="text-align:center;"><img loading="lazy" class="size-full wp-image-656352 aligncenter" src="../imgFamiliar.png" alt="Familiar icon" width="92" height="92"></p><p style="text-align: center"><b>Familiar Interface</b></p><p class="x-hidden-focus">Built on standard Python packages for machine learning and data analysis.</p><p> </p></div>
<p></p> </div>
</div>
**Why causality?**
Decision-makers need estimates of causal impacts to answer what-if questions about shifts in policy - such as changes in product pricing for businesses or new treatments for health professionals.
**Why not just a vanilla machine learning solution?**
Most current machine learning tools are designed to forecast what will happen next under the present strategy, but cannot be interpreted to predict the effects of particular changes in behavior.
**Why causal machine learning/EconML?**
Existing solutions to answer what-if questions are expensive. Decision-makers can engage in active experimentation like A/B testing or employ highly trained economists who use traditional statistical models to infer causal effects from previously collected data.


@ -1,19 +1,10 @@
EconML User Guide
=================
Causal machine learning applies the power of machine learning techniques to answer causal questions.
* Decision-makers need estimates of causal impacts to answer what-if questions about shifts in policy - such as changes in product pricing for businesses or new treatments for health professionals.
* Most current machine learning tools are designed to forecast what will happen next under the present strategy, but cannot be interpreted to predict the effects of particular changes in behavior.
* Existing solutions to answer what-if questions are expensive. Decision-makers can engage in active experimentation like A/B testing or employ highly trained economists who use traditional statistical models to infer causal effects from previously collected data.
The EconML Python SDK, developed by the ALICE team at MSR New England, incorporates individual machine learning steps into interpretable causal models. By reducing the need for expert judgment, these innovations improve the reliability of what-if predictions and empower data scientists without extensive economic training to conduct causal analysis using existing data.
.. toctree::
overview
motivation
causal_intro
api
flowchart
comparison
@ -23,6 +14,7 @@ The EconML Python SDK, developed by the ALICE team at MSR New England, incorpora
inference
interpretability
references
faq
.. todo::
benchmark