documentation migration from aka.ms/econml (#640)

* initial commit for aka.ms/econml doc migration
* fix rst error
* update motivating examples
* fix links
* add intro to causal inference
* formatting
* update copyright
* finishing touches
* update gitignore
* host images locally
* avoid buggy sphinx version

Co-authored-by: Fabio Vera <fabiovera@microsoft.com>
2022-07-27 11:07:39 -04:00 · 2022-07-27 11:07:39 -04:00 · b4832c0120
--- a/.gitignore
+++ b/.gitignore
@@ -10,7 +10,6 @@ __pycache__/
 *.log
 *.out
 *.synctex.gz
-*.pdf

 # C extensions
 *.so
--- a/azure-pipelines.yml
+++ b/azure-pipelines.yml
@@ -35,6 +35,7 @@ jobs:
        foreach ($file in $editedFiles) {
          switch -Wildcard ($file) {
            "README.md" { Continue }
+            ".gitignore" { Continue }
            "econml/_version.py" { Continue }
            "prototypes/*" { Continue }
            "images/*" { Continue }
@@ -70,7 +71,7 @@ jobs:
      - script: 'pip install git+https://github.com/slundberg/shap.git@d1d2700acc0259f211934373826d5ff71ad514de'
        displayName: 'Install specific version of shap'  

-      - script: 'pip install sphinx sphinx_rtd_theme'
+      - script: 'pip install sphinx!=5.1.0 sphinx_rtd_theme'
        displayName: 'Install sphinx'

      - script: 'python setup.py build_sphinx -W'
--- a/doc/Causal-Inference-User-Guide-v4-022520.pdf
+++ b/doc/Causal-Inference-User-Guide-v4-022520.pdf
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -21,7 +21,7 @@ sys.path.insert(0, os.path.abspath('econml'))
 # -- Project information -----------------------------------------------------

 project = 'econml'
-copyright = '2019, Microsoft Research'
+copyright = '2022, Microsoft Research'
 author = 'Microsoft Research'
 version = econml.__version__
 release = econml.__version__
@@ -119,7 +119,7 @@ html_theme_options = {
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
 # html_static_path = ['_static']
-html_extra_path = ['map.svg']
+html_extra_path = ['map.svg', 'Causal-Inference-User-Guide-v4-022520.pdf', 'spec/img']

 # Custom sidebar templates, must be a dictionary that maps document names
 # to template names.
--- a/doc/spec/causal_intro.rst
+++ b/doc/spec/causal_intro.rst
@@ -0,0 +1,10 @@
+Introduction to Causal Inference
+=================================
+
+If you are new to causal inference, it may be helpful to walk through a quick overview of the concepts and techniques that we refer to throughout the documentation. Below we provide a high-level introduction to causal inference tailored for EconML:
+
+.. raw:: html
+
+    <iframe src="../Causal-Inference-User-Guide-v4-022520.pdf" width="700" height="388"> </iframe>
+
+The folks at DoWhy also have a broader introduction `here <https://causalinference.gitlab.io/kdd-tutorial/>`__.
--- a/doc/spec/faq.rst
+++ b/doc/spec/faq.rst
@@ -0,0 +1,77 @@
+Frequently Asked Questions (FAQ)
+====================================================================
+
+When should I use EconML?
+--------------------------
+
+EconML is designed to answer causal questions: what will happen in response to some change in behavior, 
+prices, or conditions? These questions require different methods than forecasting questions: 
+what will happen next if everything continues as it has been?
+
+
+What are the advantages of EconML?
+-----------------------------------
+
+EconML offers a broad range of machine learning models designed specifically to answer causal questions.
+The EconML models also build on familiar Python packages, allowing users to easily select the best model for their question. 
+Finally, EconML includes custom interpreters to create presentation-ready output.
+
+
+How do I know if the results make sense?
+----------------------------------------
+
+Try comparing the consistency of your estimates across multiple models, including some that make
+stronger structural assumptions like linear relationships and some that do not. Pay attention to the
+standard errors as well as the point estimates; imprecise estimates should be interpreted accordingly.
+While researchers can introduce bias by narrowly fishing for estimates that match their priors, it is also important
+to use your expertise to evaluate results. If you estimate that a 5% decrease in price generates
+an implausible 5000% increase in sales, you should carefully review your code!
+
+I'm getting causal estimates that don't make sense. What next?
+----------------------------------------------------------------
+First, carefully check your code for errors and try several causal models.
+If your estimates are consistent but implausible, you may have a confounding variable that hasn’t been measured in your data.
+Think carefully about the source of the data you are using: was there something unusual going on
+during the period when the data were collected (for example, a holiday or an economic downturn)?
+Is there something unusual about your sample (for example, all men with pre-existing heart conditions)?
+
+
+What if I don't have a good instrument, can't run an experiment, and don't observe all confounders?
+------------------------------------------------------------------------------------------------------------
+In this case, no statistical approach can perfectly isolate the causal effect of the treatment on the outcome.
+DML, OrthoForest, or MetaLearners, each fit with all the confounders you can observe,
+will deliver the best available approximation of the causal effect by removing the bias from those observed confounders.
+Be aware that some bias from unobserved confounders remains when using these estimates.
+
+
+How can I test whether I'm identifying the causal effect?
+------------------------------------------------------------
+You are identifying a valid causal effect if and only if the underlying assumptions of the causal model
+assumed by the estimation routine are correct. Those are often hard to test (though the `DoWhy <https://py-why.github.io/dowhy/>`__ package may help).
+Having made those assumptions, the EconML package allows you to fit the best causal model you can.
+Many models will store a final stage fit metric that can be used to validate how well the causal model predicts out of sample, 
+which is a good diagnostic as to the quality of your model.
+
+
+How do I give feedback?
+------------------------------------
+
+This project welcomes contributions and suggestions. Most contributions require you to agree to
+a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, 
+grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
+
+
+When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
+a CLA and decorate the PR appropriately (e.g., label, comment). 
+Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
+
+
+This project has adopted the Microsoft Open Source Code of Conduct. 
+For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
+
+
+
+
+
+
+
--- a/doc/spec/img/Attribution.png
+++ b/doc/spec/img/Attribution.png
--- a/doc/spec/img/Recommendation.png
+++ b/doc/spec/img/Recommendation.png
--- a/doc/spec/img/Segmentation.png
+++ b/doc/spec/img/Segmentation.png
--- a/doc/spec/img/imgFamiliar.png
+++ b/doc/spec/img/imgFamiliar.png
--- a/doc/spec/img/imgFlexible.png
+++ b/doc/spec/img/imgFlexible.png
--- a/doc/spec/img/imgUnified.png
+++ b/doc/spec/img/imgUnified.png
--- a/doc/spec/motivation.rst
+++ b/doc/spec/motivation.rst
@@ -31,55 +31,76 @@ python API.
 Motivating Examples
 ===================

-Customer Targeting
------------------
+EconML is designed to measure the causal effect of some treatment variable(s) T on an outcome variable Y, controlling for a set of features X. Use cases include:

-An important problem in modern business analytics is building automated tools to prioritize customer
-acquisition and personalize customer interactions to increase sales and revenue. Typically businesses
-will offer personalized incentives to customers to increase spend or increase the level of
-engagement via more human resources. Any such personalized intervention corresponds to a monetary
-investment and the main question that business analytics are called to answer is: what is the return
-on investment (ROI)? 
+Recommendation A/B testing
+-----------------------------

-Analyzing the ROI is inherently a treatment effect question: what was the effect of any investment
-on a particular customer on its spend? Understanding how these return on investment varies across
-customers can enable more targeted investment policies and increased ROI via better targeting. Using historical
-data from deployed investments, and estimating the heterogeneous treatment effect via any of
-the proposed methods, business analysts can learn in an automated manner, data-driven
-customer targeting and prioritization policies.
+*Interpret experiments with imperfect compliance*

-Personalized Pricing
--------------------
+.. image:: img/Recommendation.png
+  :alt: Recommendation A/B testing logo

-Personalized discounts have become very widespread in the digital economy. To set the optimal
-personalized discount policy a business needs to understand what is the effect
-of a drop in price on the demand of a customer for a product as a function of customer
-characteristics. The estimation of such personalized demand elasticities can also be
-phrased in the language of heterogeneous treatment effects, where the treatment 
-is the price (or typically log of price) on the demand (or typically log of demand)
-as a function of observable features of the customer. Hence, estimation of heterogeneous
-treatment effects can lead to optimal pricing policies.
+**Question**: A travel website would like to know whether joining a membership program
+causes users to spend more time engaging with the website. 
+
+**Problem**: They can’t look directly at existing data, comparing members and non-members,
+because the customers who chose to become members are likely already more engaged than other users.
+Nor can they run a direct A/B test because they can’t force users to sign up for membership. 
+
+**Solution**: The company had run an earlier experiment to test the value of a new,
+faster sign-up process. EconML’s DRIV estimator uses this experimental nudge towards membership
+as an instrument that generates random variation in the likelihood of membership. 
+The DRIV model adjusts for the fact that not every customer who was offered the easier sign-up
+became a member and returns the effect of membership rather than the effect of receiving the quick sign-up.
+
+Link to Jupyter notebook:
+`Recommendation A/B Testing <https://github.com/microsoft/EconML/blob/main/notebooks/CustomerScenarios/Case%20Study%20-%20Recommendation%20AB%20Testing%20at%20An%20Online%20Travel%20Company.ipynb>`__
+
+More details:
+`Trip Advisor Case Study <https://www.microsoft.com/en-us/research/uploads/prod/2020/04/MSR_ALICE_casestudy_2020.pdf>`__


-Stratification in Clinical Trials
----------------------------------------
+Customer Segmentation
+----------------------

-Which patients should be selected for a clinical trial? If we want to demonstrate
-that a clinical treatment has an effect on at least some subset of a population, then
-fully randomized clinical trials are inappropriate as they will solely estimate
-average effects. Using heterogeneous treatment effect techniques, we can use
-observational data to come up with estimates of these effects and identify
-good candidate patients for a clinical trial that our model estimates have high
-treatment effects.
+*Estimate individualized responses to incentives*

-Learning Click-Through-Rates
----------------------------
+.. image:: img/Segmentation.png
+  :alt: Customer Segmentation logo

-In the design of a page layout and more importantly in ad placement, it is important
-to understand the click-through-rate of page components (e.g. ads) on different positions
-of a page. Even though the modern approach is to run multiple A/B tests, when such
-page components involve revenue considerations (such as ad placement), then observational
-data can help guide correct A/B tests to run. Heterogeneous treatment effect estimation
-can provide estimates of the click-through-rate of page components from
-observational data. In this setting, the treatment is simply whether the component is
-placed on that page position and the response is whether the user clicked on it.
+**Question**: A media subscription service would like to offer targeted discounts
+through a personalized pricing plan. 
+
+**Problem**: They observe many features of their customers,
+but are not sure which customers will respond most to a lower price. 
+
+**Solution**: EconML’s DML estimator uses price variations in existing data, 
+along with a rich set of user features, to estimate heterogeneous price sensitivities
+that vary with multiple customer features. 
+The tree interpreter provides a presentation-ready summary of the key features
+that explain the biggest differences in responsiveness to a discount.
+
+Link to Jupyter notebook:
+`Customer Segmentation <https://github.com/microsoft/EconML/blob/main/notebooks/CustomerScenarios/Case%20Study%20-%20Customer%20Segmentation%20at%20An%20Online%20Media%20Company.ipynb>`__.
+
+Multi-investment Attribution
+-----------------------------
+*Distinguish the effects of multiple outreach efforts*
+
+.. image:: img/Attribution.png
+  :alt: Multi-investment Attribution logo
+
+**Question**: A startup would like to know the most effective approach for recruiting new customers: 
+price discounts, technical support to ease adoption, or a combination of the two. 
+
+**Problem**: The risk of losing customers makes experiments across outreach efforts too expensive.
+So far, customers have been offered incentives strategically;
+for example, larger businesses are more likely to get technical support.
+
+**Solution**: EconML’s Doubly Robust Learner model jointly estimates the effects of multiple discrete treatments. 
+The model uses flexible functions of observed customer features to filter out confounding correlations
+in existing data and deliver the causal effect of each effort on revenue.
+
+Link to Jupyter notebook:
+`Multi-investment Attribution <https://github.com/microsoft/EconML/blob/main/notebooks/CustomerScenarios/Case%20Study%20-%20Multi-investment%20Attribution%20at%20A%20Software%20Company.ipynb>`__.
--- a/doc/spec/overview.rst
+++ b/doc/spec/overview.rst
@@ -0,0 +1,32 @@
+Overview
+=========
+
+EconML is a Python package that applies the power of machine learning techniques to estimate individualized causal responses from observational or experimental data. The suite of estimation methods provided in EconML represents the latest advances in causal machine learning. By incorporating individual machine learning steps into interpretable causal models, these methods improve the reliability of what-if predictions and make causal analysis quicker and easier for a broad set of users.
+
+EconML is open source software developed by the `ALICE <https://www.microsoft.com/en-us/research/project/alice/>`__ team at Microsoft Research.
+
+.. raw:: html
+
+    <p></p>
+    <div class="ms-grid " style = "text-align: left; box-sizing: border-box; display: block; margin-left: auto; margin-right: auto; max-width: 1600px; position: relative; padding-left: 0; padding-right: 0; width: 100%;">
+            <div class="ms-row" style = "text-align: left; box-sizing: border-box; -webkit-box-align: stretch; align-items: stretch; display: flex; flex-wrap: wrap; margin-left: 3px; margin-right: 3px;">
+                    <div class="m-col-8-24 x-hidden-focus" style = "text-align: left; box-sizing: border-box; float: left; margin: 0; padding-left: 1vw; padding-right: 1vw; position: relative; width: 33.33333%;">
+                    <p style="text-align:center;"><img loading="lazy" class="size-full wp-image-656358 aligncenter x-hidden-focus" src="../imgFlexible.png" alt="Flexible icon" width="92" height="92"></p><p style="text-align: center"><b>Flexible</b></p><p class="x-hidden-focus">Allows for flexible model forms that do not impose strong assumptions, including models of heterogeneous responses to treatment.</p><p>	</p></div>
+            <div class="m-col-8-24" style = "text-align: left; box-sizing: border-box; float: left; margin: 0; padding-left: 1vw; padding-right: 1vw; position: relative; width: 33.33333%;">
+            <p style="text-align:center;"><img loading="lazy" class="size-full wp-image-656355 aligncenter" src="../imgUnified.png" alt="Unified icon" width="92" height="92"></p><p style="text-align: center"><b>Unified</b></p><p>Broad set of methods representing latest advances in the econometrics and machine learning literature within a unified API.</p><p>	</p></div>
+            <div class="m-col-8-24" style = "text-align: left; box-sizing: border-box; float: left; margin: 0; padding-left: 1vw; padding-right: 1vw; position: relative; width: 33.33333%;">
+            <p style="text-align:center;"><img loading="lazy" class="size-full wp-image-656352 aligncenter" src="../imgFamiliar.png" alt="Familiar icon" width="92" height="92"></p><p style="text-align: center"><b>Familiar Interface</b></p><p class="x-hidden-focus">Built on standard Python packages for machine learning and data analysis.</p><p>	</p></div>
+        <p></p>		</div>
+        </div>
+
+**Why causality?**
+
+Decision-makers need estimates of causal impacts to answer what-if questions about shifts in policy - such as changes in product pricing for businesses or new treatments for health professionals.
+
+**Why not just a vanilla machine learning solution?**
+
+Most current machine learning tools are designed to forecast what will happen next under the present strategy, but cannot be interpreted to predict the effects of particular changes in behavior. 
+
+**Why causal machine learning/EconML?**
+
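+Existing solutions to answer what-if questions are expensive. Decision-makers can engage in active experimentation like A/B testing or employ highly trained economists who use traditional statistical models to infer causal effects from previously collected data. Causal machine learning aims to lower that cost by estimating causal effects from existing data with less reliance on expert judgment.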
--- a/doc/spec/spec.rst
+++ b/doc/spec/spec.rst
@@ -1,19 +1,10 @@
 EconML User Guide
 =================

-Causal machine learning applies the power of machine learning techniques to answer causal questions.  
-
-* Decision-makers need estimates of causal impacts to answer what-if questions about shifts in policy - such as changes in product pricing for businesses or new treatments for health professionals.
-
-* Most current machine learning tools are designed to forecast what will happen next under the present strategy, but cannot be interpreted to predict the effects of particular changes in behavior. 
-
-* Existing solutions to answer what-if questions are expensive. Decision-makers can engage in active experimentation like A/B testing or employ highly trained economists who use traditional statistical models to infer causal effects from previously collected data. 
-
-The EconML Python SDK, developed by the ALICE team at MSR New England, incorporates individual machine learning steps into interpretable causal models. By reducing the need for expert judgment, these innovations improve the reliability of what-if predictions and empower data scientists without extensive economic training to conduct causal analysis using existing data. 
-
-
 .. toctree::
+    overview
    motivation
+    causal_intro
    api
    flowchart
    comparison
@@ -23,6 +14,7 @@ The EconML Python SDK, developed by the ALICE team at MSR New England, incorpora
    inference
    interpretability
    references
+    faq

 .. todo::
    benchmark