Commit graph

58 commits

Author SHA1 Message Date
Maggie Hei a9330c0c56
Change the summary table format (#407)
* change the summary table format
* fix a bug where InferenceResults instances couldn't be pickled
* assign the right column names in the dowhy wrapper when the input is a pandas DataFrame
2021-02-19 20:33:34 -05:00
Miruna Oprescu a4ba61cb5a
Fixed column input names bugs. (#398)
* Fixes #375, fixes #376
* Added tests
2021-02-19 12:52:15 -05:00
Maggie Hei f3e46f4097
Enable calling dowhy from econml (#400)
* add the ability to call dowhy through econml

* fix a shap bug when parsing const_marginal_effect
2021-02-11 11:34:37 -05:00
Julian Aronowitz a53d8bdc7a
grammar fix (#364)
deleted an extra 'the'

Co-authored-by: Keith Battocchi <kebatt@microsoft.com>
2021-01-19 10:03:07 -05:00
vsyrgkanis a27fa3cdaa
Vasilis/docs (#370)
* restructured modules into folders; added deprecation warnings for old naming conventions

* re-structured public module in docs

* removed automl for now from docs until import is fixed

* fixed wording in hyperparameter tuning

* fixed reference to honest forest in docs

* added verbosity to bootstrap

* moved bootstrap to private
2021-01-19 08:50:10 -05:00
Miruna Oprescu fb3484615c
Add details on choosing first stage models (#372)
* Add details on choosing first stage models

* Modified docs, README and added a new notebook
2021-01-15 13:14:45 -05:00
vsyrgkanis 35c5418618
Average Treatment Effect methods to all estimators (#365)
* added ate inference methods
2021-01-14 10:39:08 -05:00
vsyrgkanis f0b0e5b9f3
Enabling summary() even when inference not available (#363)
* enable summary inference with stderr = None
2021-01-13 12:31:51 -05:00
vsyrgkanis 1dd73c73d0
Deprecate `n_splits` with `cv` (#362)
* deprecated n_splits with cv
2021-01-12 18:38:21 -05:00
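The `n_splits` → `cv` rename above follows the usual deprecated-keyword pattern; a generic sketch (the function and parameter handling here are illustrative, not EconML's actual signatures):

```python
import warnings

def fit(cv=3, n_splits="deprecated"):
    """Accept the old `n_splits` name, but warn and map it onto `cv`."""
    if n_splits != "deprecated":
        warnings.warn("'n_splits' is deprecated; use 'cv' instead.",
                      DeprecationWarning)
        cv = n_splits
    return cv

print(fit(cv=5))        # new name
print(fit(n_splits=4))  # old name still works, with a deprecation warning
```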
vsyrgkanis b5c25ccb52
RScorer class for causal model selection (#361)
* added rscorer for model selection

* added readme on model selection
2021-01-12 07:15:47 -05:00
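The R-score used for causal model selection can be sketched in plain Python. This toy version assumes already-residualized `y_res`, `t_res` and a candidate effect vector `theta`; the library's `RScorer` handles the residualization itself:

```python
def rscore(y_res, t_res, theta):
    """R-score of a candidate CATE on residualized data: 1 minus the MSE of
    (y_res ~ theta(x) * t_res) relative to the MSE of predicting y_res with
    zero effect. Higher is better."""
    n = len(y_res)
    loss = sum((yr - th * tr) ** 2 for yr, tr, th in zip(y_res, t_res, theta)) / n
    base = sum(yr ** 2 for yr in y_res) / n
    return 1 - loss / base

y_res = [1.0, -1.0, 2.0, -2.0]
t_res = [1.0, -1.0, 1.0, -1.0]
perfect = [1.0, 1.0, 2.0, 2.0]            # theta(x_i) * t_res == y_res exactly
print(rscore(y_res, t_res, perfect))      # 1.0 (zero residual loss)
print(rscore(y_res, t_res, [0.0] * 4))    # 0.0 (no better than zero effect)
```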
vsyrgkanis 9a9687558b
Refitting `model_final` and nuisance averaging (#360)
* Support refitting only final model in DML after changing estimator parameters

* Add support for monte carlo nuisance estimation, with multiple k-fold draws.

* added rlearner residuals_ property that returns fitted residuals for training data (fixes #350)

* fixed flaky cate interpreter test

* added refit example in the dml notebook
2021-01-10 17:26:09 -05:00
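The Monte Carlo nuisance estimation above averages cross-fitted nuisance predictions over several random fold draws; a toy sketch where the "first-stage model" is just a fold-wise mean (purely illustrative, not the library's implementation):

```python
import random

def crossfit_predict(y, n_folds, seed):
    """One cross-fitted 'nuisance' prediction: each point is predicted by the
    mean of y over the other folds (a stand-in for a real first-stage model)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    pred = [0.0] * len(y)
    for f in folds:
        rest = [y[i] for i in idx if i not in f]
        m = sum(rest) / len(rest)
        for i in f:
            pred[i] = m
    return pred

def monte_carlo_nuisance(y, n_folds=2, n_draws=5):
    """Average cross-fitted predictions over several random fold draws."""
    draws = [crossfit_predict(y, n_folds, seed) for seed in range(n_draws)]
    return [sum(d[i] for d in draws) / n_draws for i in range(len(y))]

print(monte_carlo_nuisance([1.0, 2.0, 3.0, 4.0]))
```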
vsyrgkanis ce2f2b54e1
some small fixes to the debiased lasso (#358)
* some small fixes to the debiased lasso

* added parallelism across rows of design matrix to run each lassocv in parallel. added n_jobs param to debiased lasso and to sparselineardml

* added n_jobs to multioutput debiasedlasso

* added separate options for alpha for the covariance matrix estimation

* added extra alpha options in sparselineardrlearner and sparselineardml
2021-01-09 08:56:09 -05:00
vsyrgkanis bb042d541d
Cython implementation of GRF and CausalForestDML (#341)
* added backend option in orf, added verbosity, restructured static functions

* added cython grf module that implements generalized random forests

* added Cython version of causal forest and causal forest dml

* deprecating older CausalForest

* updates to CF and ORF notebook

* restructured dml into a folder. Deprecated ForestDML in favor of CausalForestDML.

* Removed two legacy files in our main folder.

* deprecating ensemble.SubsampledHonestForest

* made drlearner use the non-deprecated regression forest.

* Enable setuptools build process

* fixed flaky random_state test

* fixed tests and api consistency

* updated tables and library flow chart

* enforce sklearn 0.24.

* fixed _cross_val_predict

* added an option for max background samples to make shap computation more reasonable

* fixed error_score param in gcvlist due to sklearn upgrade

* added shap cells in DML notebook

* added shap values to GRF notebook

* fixed a bug in the way input_feature_names were used in summary; enabled shap to use input feature names

* updated readme; removed autoreload from notebooks

* added shap specific notebook

* updated dowhy notebook
2021-01-08 22:29:56 -05:00
Maggie Hei 3df959d120
add shap value features on each estimator (depends on master branch of shap) (#336)
* add shap value features on each estimator

* fix setup dependency

* set an upper bound for the sklearn dependency
2020-12-25 15:10:41 -05:00
vsyrgkanis cbb5948e4e
Added GridSearchCV list that can help auto select among multiple models (#328)
* added a gridsearchcv list that can help auto-select among multiple models for the first stages.
2020-12-16 16:45:23 -05:00
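The idea of a grid search over a *list* of candidate model families can be sketched without sklearn. The families and the toy cross-validation scorer below are hypothetical stand-ins:

```python
from itertools import product

def grid_search_list(candidates, cv_score):
    """Pick the best (model family, params) pair across several families,
    mimicking a GridSearchCV that ranges over a list of estimators."""
    best = None
    for family, grid in candidates:
        keys = list(grid)
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            score = cv_score(family, params)
            if best is None or score > best[0]:
                best = (score, family, params)
    return best

def toy_cv_score(family, params):
    """Pretend 'ridge' with alpha=1.0 cross-validates best."""
    target = {"ridge": {"alpha": 1.0}, "forest": {"depth": 3}}
    return 1.0 if params == target[family] else 0.5

best = grid_search_list(
    [("ridge", {"alpha": [0.1, 1.0]}), ("forest", {"depth": [2, 3]})],
    toy_cv_score,
)
print(best)  # (1.0, 'ridge', {'alpha': 1.0})
```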
Miruna Oprescu e55b093ec6
Retain column names from input dataframes/series (#337)
* Retain column names from input dataframes/series

Functions like `cate_feature_names`, `summary`, `population_summary`
display the input column names if inputs are pandas DataFrames or
Series.
In this commit:
* changed cate_estimator, ortho_learner and inference classes to set
  input names upon fit
* moved input verification to the base class for some estimators
* modified customer scenario notebooks to reflect changes
* added tests

Co-authored-by: Keith Battocchi <kebatt@microsoft.com>
2020-12-15 22:13:44 -05:00
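Retaining input column names at fit time usually reduces to duck-typing on pandas' `.columns` attribute, with generic fallbacks for plain arrays. A minimal sketch (the mixin and fallback naming are illustrative, not EconML's actual implementation):

```python
class NamedFitMixin:
    """Record input column names at fit time so later summaries can display
    them; fall back to generic names for plain arrays."""
    def fit(self, X):
        cols = getattr(X, "columns", None)   # pandas DataFrames expose .columns
        if cols is not None:
            self.input_names_ = list(cols)
        else:
            self.input_names_ = [f"X{i}" for i in range(len(X[0]))]
        return self

class FakeFrame:                              # stand-in for a pandas DataFrame
    columns = ["age", "income"]

print(NamedFitMixin().fit(FakeFrame()).input_names_)   # ['age', 'income']
print(NamedFitMixin().fit([[1, 2, 3]]).input_names_)   # ['X0', 'X1', 'X2']
```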
vsyrgkanis 5cc93a13e3
Update README.md (#320)
* Update README.md
2020-11-17 17:32:19 -05:00
vsyrgkanis 839c2253f3
Vasilis/orf speed (#316)
* orf speedup by moving pointwise effect outside of class

* removed the one-hot encoding from the nuisance and parameter estimators in the discrete ORF and passed the one-hot encoding directly as T. Removed cross-fitting for Y_hat in the first stage of the discrete ORF, which was there by mistake. Removed the creation of split_indices when split_indices is None and we are in the first stage, since we are not doing cross-fitting there. Removed the use of np.insert, as it was slower than setting a slice to a constant.

* fixed a bug in the code that removes first-stage cross-fitting. Replaced PolynomialFeatures(degree=1, include_bias=True) with np.hstack

* added the option of global residualization to the continuous treatment ortho forest. This now replicates the grf functionality exactly. Added some missing tests regarding the shape of ORF output and fixed some bad shapes according to the API for column y or column t. Added tests for the global residualization. Replaced the polynomial fit_transform in the second-stage param func with hstack.

* added a causal forest module that is enabled via the global residualization option of the continuous treatment ORF. Added all tests and notebook examples

* added inference to BLBInference so that all ortho forests and causal forests have effect_inference.

* orthoforest notebook added inference.

* fixed shape api in inference of ortho forest

* replaced super(class, obj) with just super()

* changed ORF names to DMLOrthoForest and DROrthoForest and deprecated old names with warning. Made Regwrapper private

* use stratified kfold in dmlorthoforest when discrete_treatment=True, in both local and global residualization. Also added a check that all treatments are represented in the nuisance estimator, similar to the drorthoforest.
2020-11-17 15:02:09 -05:00
Keith Battocchi 022de3513d Pass fit arguments by keyword 2020-11-17 00:49:14 -05:00
Keith Battocchi 43894368b9 Switch examples to 'auto' inference 2020-11-12 16:23:23 -05:00
Keith Battocchi 2a7f540624 Update old DML alias usages 2020-11-12 16:23:23 -05:00
vsyrgkanis 61cd136636
Enabled feature_importances_ for our ForestDML and ForestDRLearner estimators (#306)
This required changing the subsampled honest forest code a bit so that it does not alter the arrays of the tree structures of sklearn but rather stores two additional arrays required for prediction. This adds around 1.5 times the original running time, making it slightly slower due to the extra memory allocation.

However, this enables correct feature_importance calculation and, in the future, correct SHAP calculation (fixes #297), as the tree entries are now consistent with a tree in a RandomForestRegressor, so shap logic can be applied if we recast the subsampled honest forest as a RandomForestRegressor (additivity of shap will still be violated, since the prediction of the subsampled honest forest is not just the aggregation of the predictions across the trees but a more complex weighted average). We can still call shap and get meaningful shap numbers. One discrepancy is that shap explains a different value than what effect returns, since it explains the average of the predictions of the honest tree regressors, whereas the prediction of an honest forest is not the average of the tree predictions. A full solution to this small discrepancy would require reworking Shap's tree explainer algorithm to account for such alternative aggregations of tree predictors.

* changed the subsampled honest forest to not alter the entries of each tree but rather create auxiliary numpy arrays that store the numerator and denominator of every node. This enables consistent feature_importance calculation and also potentially more accurate shap_values calculation.

* added feature importances to the DR learner example notebook

* added feature_importances_ to DML example notebook

* enabled feature_importances_ for forestDML and forestDRLearner as an attribute

* fixed doctest in the subsampled honest forest, which was producing old feature_importances_. Added tests that the feature_importances_ API works in test_drlearner and test_dml.

* Transformed sparse matrices to dense matrices after the dot product in parallel_add_trees_ of ensemble.py. This gives a 6-fold speed-up, as we were previously doing many slicing operations on sparse matrices, which are very slow!
2020-11-09 16:37:29 -05:00
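The numerator/denominator bookkeeping described above changes how the forest aggregates: an honest-forest prediction is a ratio of sums across trees, not a mean of per-tree predictions. A toy illustration of that difference (the leaf pairs are made-up numbers):

```python
def forest_predict(leaves):
    """Honest-forest aggregation: each tree contributes a (numerator,
    denominator) pair from the leaf a point falls into; the forest
    prediction is sum(num) / sum(den), not the mean of per-tree ratios."""
    num = sum(n for n, d in leaves)
    den = sum(d for n, d in leaves)
    return num / den

leaves = [(6.0, 2.0), (2.0, 4.0)]       # (sum of y in leaf, leaf count) per tree
forest = forest_predict(leaves)          # (6+2)/(2+4) = 4/3
mean_of_trees = sum(n / d for n, d in leaves) / len(leaves)  # (3.0+0.5)/2 = 1.75
print(forest, mean_of_trees)             # the two aggregations differ
```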
Miruna Oprescu 8026439010
Added new customer scenario notebooks (#290)
* Added new customer scenario notebook. 
* Changed images in the other customer scenario notebooks.
2020-10-20 16:03:00 -04:00
Keith Battocchi 9d53a89568 Fix broken link 2020-08-11 16:04:44 -04:00
Keith Battocchi 5b981e172d Remove balanced class weights from metalearner notebook 2020-08-11 16:04:44 -04:00
Miruna Oprescu 62f0be674c
Moprescu/econml dowhy customer scenarios (#255)
* Added recommendation A/B testing with EconML+DoWhy

* Added customer segmentation notebook for econml and dowhy

* Updated notebooks to use DoWhy version 0.4

Co-authored-by: Maggie Hei <mehei@microsoft.com>
2020-06-09 11:46:21 -04:00
Maggie Hei 4eca7bd71c
Add two customer scenarios case study (#230) 2020-03-06 22:55:13 -05:00
Maggie Hei 922ecbfeaf
Update readme, add example for interpreter and inference (#224)
* update README, add example for interpreter and inference
2020-02-25 17:49:16 -05:00
Maggie Hei f42251e274
Mehei/otherinferences (#203)
* add analytical effect/marginal effect/constant marginal effect inferences for DML and DRLearner
* add coefficient inference and intercept inference for linear final model
* add population summary inference given dataset X
2020-02-14 19:25:16 -05:00
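Analytical inference of the kind added here (asymptotically normal estimates for effects, coefficients, and intercepts) boils down to Wald intervals built from a point estimate and a standard error. A minimal sketch, not the library's API:

```python
def norm_interval(point, stderr, z=1.959964):
    """Two-sided normal (Wald) confidence interval: point ± z * stderr.
    z ≈ 1.96 gives a 95% interval."""
    return point - z * stderr, point + z * stderr

lo, hi = norm_interval(2.0, 0.5)
print(round(lo, 3), round(hi, 3))  # 1.02 2.98
```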
v-keacqu e72422d423
AutomatedML for EconML (#213)
Enable easy use of Auto ML by wrapping any estimator
2020-02-14 16:36:21 -05:00
Miruna Oprescu 70cd6d3145
Added `blb` inference option to the OrthoForest (#214)
* Added `blb` inference option to the OrthoForest

* Added Bootstrap of Little Bags inference to the ORF classes
* Added tests and updated notebook
* Fixed the marginal effect shape when T is a vector
* Fixed bugs and reorganized class functionality

* Addressed PR comments
2020-02-07 18:20:06 -05:00
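A heavily simplified sketch of the Bootstrap of Little Bags (`blb`) idea: group per-tree predictions into small bags and read uncertainty off the between-bag spread. The real estimator also involves a within-bag correction term that is omitted here:

```python
def blb_stderr(tree_preds, bag_size):
    """Crude bootstrap-of-little-bags sketch: split per-tree predictions
    into 'little bags', take each bag's mean, and use the standard error
    of the bag means as the uncertainty estimate."""
    bags = [tree_preds[i:i + bag_size] for i in range(0, len(tree_preds), bag_size)]
    means = [sum(b) / len(b) for b in bags]
    m = sum(means) / len(means)
    var = sum((x - m) ** 2 for x in means) / (len(means) - 1)
    return (var / len(means)) ** 0.5

preds = [1.0, 2.0, 1.5, 2.5, 1.0, 3.0, 2.0, 2.0]   # made-up per-tree predictions
print(blb_stderr(preds, bag_size=2))                # 0.125
```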
vasilismsr dbcbae5ee8 Vasilis/cate interpreters (#177)
Added CATE interpreters and notebooks
2019-12-07 01:49:06 -05:00
vasilismsr f7cf669083
Added Non Parametric DML with the weighting trick and ForestDML, ForestDRLearner with CIs (#170)
Added a SubsampledHonestForest scikit-learn extension, which is a regression forest that implements honesty and instead of bootstrap, performs subsampling to construct each tree. It also offers predict_interval via the bootstrap of little bags approach and the asymptotic normal characterization of the prediction estimate.

Added NonParamDMLCateEstimator, which is essentially another meta-learner with an arbitrary final stage that supports fit and predict (albeit fit must accept sample_weight). This is based on the observation that, when the treatment is single-dimensional or binary, one can view the RLearner problem as a weighted regression.

Added ForestDMLCateEstimator (which is essentially a causal forest implemented slightly differently, by viewing it as a weighted non-parametric regression and piggybacking on scikit-learn tree construction) and has bootstrap of little bags based inference. This is essentially a NonParamDMLCateEstimator with a SubsampledHonestForest final model.

Also added ForestDRLearner, which uses the doubly robust approach and uses an honest forest for each pseudo outcome regression. This also offers non-parametric confidence intervals to the Doubly Robust estimation classes. This is essentially a DRLearner with a SubsampledHonestForest final model.

Side additions:

re-organized the inference class hierarchy to maximize code re-use.
added monte_carlo folder and monte_carlo experiments for LinearDMLCateEstimator and SubsampledHonestForest
2019-12-04 18:17:34 -05:00
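The "weighting trick" behind NonParamDMLCateEstimator: with a single-dimensional treatment, the RLearner objective becomes a weighted regression of `Y_res / T_res` on X with sample weights `T_res**2`. For a constant effect this collapses to the no-intercept OLS slope of `Y_res` on `T_res`, which the sketch below (toy residuals, not library code) verifies:

```python
def weighted_effect(y_res, t_res):
    """R-learner weighting trick for a scalar effect: regress Y_res / T_res
    on a constant with sample weights T_res**2. Algebraically this equals
    the no-intercept OLS slope of Y_res on T_res."""
    num = sum((t * t) * (y / t) for y, t in zip(y_res, t_res))  # Σ w · target
    den = sum(t * t for t in t_res)                             # Σ w
    return num / den

y_res = [1.0, -2.0, 3.0, 0.5]    # residualized outcomes (toy numbers)
t_res = [0.5, -1.0, 1.5, 0.25]   # residualized treatments
theta = weighted_effect(y_res, t_res)
ols = sum(y * t for y, t in zip(y_res, t_res)) / sum(t * t for t in t_res)
print(theta, ols)                # identical by construction (both 2.0 here)
```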
Keith Battocchi 9b6c939e61
Add fit_cate_intercept to DML, rework feature generation (#174)
Add fit_cate_intercept to DML, rework feature generation
2019-11-21 18:31:24 -05:00
Maggie Hei 818c8320d6 automate the first stage model T and update DML notebook (#172)
* automate the first stage model T and update DML notebook
* Changed model defaults in ORF and fixed a bug in WeightedKFold
2019-11-21 17:58:34 -05:00
Maggie Hei b7e826e268
support multi treatment in meta learners (#141)
* support multi treatment in meta learners
2019-11-13 11:58:22 -05:00
vasilismsr 72a7022d99
Vasilis/drlearner (#137)
This PR creates the DRLearner class, which replaces the DoublyRobustLearner from the metalearners. Many updates to it:
* cross-fitting (made DRLearner a child of _OrthoLearner)
* confidence intervals (LinearDRLearner and asymptotically normal based intervals)
* access to nuisance and final models for interpretability and debugging
* support for multiple treatments

Detailed commits:

* separating drlearner

* removed doubly robust learner from metalearners

* removed drlearner test from metalearners and started a new test file for drlearner.

* changed tests in dml to adhere to the CATE API. Some calls to effect had T0 and T1 as non-keyword arguments; they should be keyword arguments. A bug fix in the treatment expansion mixin revealed this mistake.

* added exhaustive tests for drlearner. 

* fixed a docstring bug regarding how split is called to generate cross-fit folds

* changed bootstrap tests to conform with keyword-only effect and effect_interval

* bootstrap tests, fixing bugs related to positional arguments T0, T1

* added checks of fitted dims for W and Z during scoring in _OrthoLearner. 

* fixed docstrings regarding n_splits in multiple places. Fixed the metalearners notebook to call the new doubly robust learner.

* changed statsmodels inference input properties to reflect that we only support a specific subset of covariance types

* added comment on overlapping tests between DML and DRLearner

* added TODO for allowing for 2d y of shape (n,1) and also added test that T can also be a vector

* added TODO so that we merge functionality between statsmodelsinference and statsmodelsinferencediscrete

* fixed docstring in dml. Added an inverse_onehot encoding utility function with a corresponding test and used it in the nuisances in DMLCateEstimator and DRLearner. Made nuisances a keyword argument with no default in DRLearner.

* made statsmodelslinearregression be child of BaseEstimator

* added a comment on a code design choice in model_final of drlearner, related to the multitask model_final
2019-11-11 13:29:07 -05:00
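The inverse_onehot utility mentioned above inverts a drop-first one-hot encoding of a discrete treatment. A pure-Python sketch of the same mapping (the library version operates on numpy arrays):

```python
def inverse_onehot(rows):
    """Invert a drop-first one-hot encoding: an all-zero row is the baseline
    treatment 0; otherwise the treatment index is 1 + position of the 1."""
    return [0 if 1 not in row else 1 + row.index(1) for row in rows]

# columns encode treatments 1 and 2; [0, 0] is the baseline treatment
print(inverse_onehot([[0, 0], [1, 0], [0, 1]]))  # [0, 1, 2]
```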
vasilismsr 5bf6dc7557
Fixing bug with DiscreteTreatmentOrthoForest effect shapes (#146)
* fixed an ortho forest effect function bug. The FunctionTransformer was not doing the one-hot encoding at all; once the one-hot encoding was added, things work correctly.

* fixed a dimension-related bug in the orthoforest notebook
2019-11-10 14:30:33 -05:00
Keith Battocchi 6d99f9d6e1
Migrate notebooks to new-style inference (#128) 2019-11-05 17:51:00 -05:00
Keith Battocchi 9b33b10e2a Improve treatment expansion 2019-10-29 16:52:07 -04:00
Keith Battocchi 3e216bd957 Add statsmodels inference to LinearDMLCateEstimator 2019-10-29 16:52:07 -04:00
Keith Battocchi f77a4e52a2 Refactor DML classes to enable cross-fit later 2019-10-29 16:52:07 -04:00
Keith Battocchi 0f0d1ec702
Enable deep IV to be used with vectors (#59)
* Enable deep IV to be used with vectors

* Reduce training epochs in Deep IV tests
2019-05-31 17:31:08 -04:00
Maggie Hei 07dee11b09
add discrete treatment example (#58) 2019-05-06 14:01:12 -04:00
Keith Battocchi 296fabb383
Reorder effect arguments; allow scalar treatments (#49)
Reorder effect arguments; allow scalar treatments
2019-05-03 10:41:57 -04:00
Maggie Hei c09692c6bf
Mehei/scoring (#44)
* add scoring function
* add score example on dml notebook
2019-05-02 20:34:27 -04:00
GregLewis 986f0e905b Deep iv notebook edits (#50)
Improved example and formatting
2019-05-02 18:41:04 -04:00
Keith Battocchi 288f39e0fb Enable setting all Keras fit options (to allow callbacks, etc.) 2019-05-01 12:06:33 -04:00
Greg Lewis df7e1c99cc Hell yeah, I'm doing this! 2019-04-30 19:26:01 -04:00
Keith Battocchi 3bf88b5277
Create Deep IV notebook (#32) 2019-04-10 20:09:20 -04:00