Commit graph

58 commits

Author SHA1 Message Date
Maggie Hei a9330c0c56
Change the summary table format (#407)
* change the summary table format
* fix a bug where InferenceResults instances couldn't be pickled
* assign the right column names in the dowhy wrapper when the input is a pandas DataFrame
2021-02-19 20:33:34 -05:00
Miruna Oprescu a4ba61cb5a
Fixed column input names bugs. (#398)
* Fixes #375, fixes #376
* Added tests
2021-02-19 12:52:15 -05:00
Maggie Hei f3e46f4097
Enable calling dowhy from econml (#400)
* add the ability to call dowhy through econml

* fix a shap bug when parsing const_marginal_effect
2021-02-11 11:34:37 -05:00
Julian Aronowitz a53d8bdc7a
grammar fix (#364)
deleted an extra 'the'

Co-authored-by: Keith Battocchi <kebatt@microsoft.com>
2021-01-19 10:03:07 -05:00
vsyrgkanis a27fa3cdaa
Vasilis/docs (#370)
* restructured modules into folders; added deprecation warnings for old naming conventions

* re-structured public module in docs

* removed automl for now from docs until import is fixed

* fixed wording in hyperparameter tuning

* fixed reference to honest forest in docs

* added verbosity to bootstrap

* moved bootstrap to private
2021-01-19 08:50:10 -05:00
Miruna Oprescu fb3484615c
Add details on choosing first stage models (#372)
* Add details on choosing first stage models

* Modified docs, README and added a new notebook
2021-01-15 13:14:45 -05:00
vsyrgkanis 35c5418618
Average Treatment Effect methods to all estimators (#365)
* added ate inference methods
2021-01-14 10:39:08 -05:00
vsyrgkanis f0b0e5b9f3
Enabling summary() even when inference not available (#363)
* enable summary inference with stderr = None
2021-01-13 12:31:51 -05:00
vsyrgkanis 1dd73c73d0
Deprecate `n_splits` with `cv` (#362)
* deprecated n_splits with cv
2021-01-12 18:38:21 -05:00
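The `n_splits` → `cv` rename above follows the usual deprecated-keyword pattern; a generic sketch (the function and parameter handling here are illustrative, not EconML's actual signatures):

```python
import warnings

def fit(cv=3, n_splits="deprecated"):
    """Accept the old `n_splits` name, but warn and map it onto `cv`."""
    if n_splits != "deprecated":
        warnings.warn("'n_splits' is deprecated; use 'cv' instead.",
                      DeprecationWarning)
        cv = n_splits
    return cv

print(fit(cv=5))        # new name
print(fit(n_splits=4))  # old name still works, with a deprecation warning
```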
vsyrgkanis b5c25ccb52
RScorer class for causal model selection (#361)
* added rscorer for model selection

* added readme on model selection
2021-01-12 07:15:47 -05:00
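The R-score used for causal model selection can be sketched in plain Python. This toy version assumes already-residualized `y_res`, `t_res` and a candidate effect vector `theta`; the library's `RScorer` handles the residualization itself:

```python
def rscore(y_res, t_res, theta):
    """R-score of a candidate CATE on residualized data: 1 minus the MSE of
    (y_res ~ theta(x) * t_res) relative to the MSE of predicting y_res with
    zero effect. Higher is better."""
    n = len(y_res)
    loss = sum((yr - th * tr) ** 2 for yr, tr, th in zip(y_res, t_res, theta)) / n
    base = sum(yr ** 2 for yr in y_res) / n
    return 1 - loss / base

y_res = [1.0, -1.0, 2.0, -2.0]
t_res = [1.0, -1.0, 1.0, -1.0]
perfect = [1.0, 1.0, 2.0, 2.0]            # theta(x_i) * t_res == y_res exactly
print(rscore(y_res, t_res, perfect))      # 1.0 (zero residual loss)
print(rscore(y_res, t_res, [0.0] * 4))    # 0.0 (no better than zero effect)
```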
vsyrgkanis 9a9687558b
Refitting `model_final` and nuisance averaging (#360)
* Support refitting only final model in DML after changing estimator parameters

* Add support for monte carlo nuisance estimation, with multiple k-fold draws.

* added rlearner residuals_ property that returns fitted residuals for training data (fixes #350)

* fixed flaky cate interpreter test

* added refit example in the dml notebook
2021-01-10 17:26:09 -05:00
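The Monte Carlo nuisance estimation above averages cross-fitted nuisance predictions over several random fold draws; a toy sketch where the "first-stage model" is just a fold-wise mean (purely illustrative, not the library's implementation):

```python
import random

def crossfit_predict(y, n_folds, seed):
    """One cross-fitted 'nuisance' prediction: each point is predicted by the
    mean of y over the other folds (a stand-in for a real first-stage model)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    pred = [0.0] * len(y)
    for f in folds:
        rest = [y[i] for i in idx if i not in f]
        m = sum(rest) / len(rest)
        for i in f:
            pred[i] = m
    return pred

def monte_carlo_nuisance(y, n_folds=2, n_draws=5):
    """Average cross-fitted predictions over several random fold draws."""
    draws = [crossfit_predict(y, n_folds, seed) for seed in range(n_draws)]
    return [sum(d[i] for d in draws) / n_draws for i in range(len(y))]

print(monte_carlo_nuisance([1.0, 2.0, 3.0, 4.0]))
```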
vsyrgkanis ce2f2b54e1
some small fixes to the debiased lasso (#358)
* some small fixes to the debiased lasso

* added parallelism across rows of design matrix to run each lassocv in parallel. added n_jobs param to debiased lasso and to sparselineardml

* added n_jobs to multioutput debiasedlasso

* added separate options for alpha for the covariance matrix estimation

* added extra alpha options in sparselineardrlearner and sparselineardml
2021-01-09 08:56:09 -05:00
vsyrgkanis bb042d541d
Cython implementation of GRF and CausalForestDML (#341)
* added backend option in orf, added verbosity, restructured static functions

* added cython grf module that implements generalized random forests

* added Cython version of causal forest and causal forest dml

* deprecating older CausalForest

* updates to CF and ORF notebook

* restructured dml into a folder. Deprecated ForestDML in favor of CausalForestDML.

* Removed two legacy files in our main folder.

* deprecating ensemble.SubsampledHonestForest

* made drlearner use the non-deprecated regression forest.

* Enable setuptools build process

* fixed flaky random_state test

* fixed tests and api consistency

* updated tables and library flow chart

* enforce sklearn 0.24.

* fixed _cross_val_predict

* added an option for max background samples to make shap computation more reasonable

* fixed error_score param in gcvlist due to sklearn upgrade

* added shap cells in DML notebook

* added shap values to GRF notebook

* fixed a bug in the way input_feature_names were used in summary; enabled shap to use input feature names

* updated readme; removed autoreload from notebooks

* added shap specific notebook

* updated dowhy notebook
2021-01-08 22:29:56 -05:00
Maggie Hei 3df959d120
add shap value features on each estimator (depends on master branch of shap) (#336)
* add shap value features on each estimator

* fix setup dependency

* set an upper bound for the sklearn dependency
2020-12-25 15:10:41 -05:00
vsyrgkanis cbb5948e4e
Added GridSearchCV list that can help auto select among multiple models (#328)
* added a gridsearchcv list that can help auto-select among multiple models for the first stages.
2020-12-16 16:45:23 -05:00
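The idea of a grid search over a *list* of candidate model families can be sketched without sklearn. The families and the toy cross-validation scorer below are hypothetical stand-ins:

```python
from itertools import product

def grid_search_list(candidates, cv_score):
    """Pick the best (model family, params) pair across several families,
    mimicking a GridSearchCV that ranges over a list of estimators."""
    best = None
    for family, grid in candidates:
        keys = list(grid)
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            score = cv_score(family, params)
            if best is None or score > best[0]:
                best = (score, family, params)
    return best

def toy_cv_score(family, params):
    """Pretend 'ridge' with alpha=1.0 cross-validates best."""
    target = {"ridge": {"alpha": 1.0}, "forest": {"depth": 3}}
    return 1.0 if params == target[family] else 0.5

best = grid_search_list(
    [("ridge", {"alpha": [0.1, 1.0]}), ("forest", {"depth": [2, 3]})],
    toy_cv_score,
)
print(best)  # (1.0, 'ridge', {'alpha': 1.0})
```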
Miruna Oprescu e55b093ec6
Retain column names from input dataframes/series (#337)
* Retain column names from input dataframes/series

Functions like `cate_feature_names`, `summary`, `population_summary`
display the input column names if inputs are pandas DataFrames or
Series.
In this commit:
* changed cate_estimator, ortho_learner and inference classes to set
  input names upon fit
* moved input verification to the base class for some estimators
* modified customer scenario notebooks to reflect changes
* added tests

Co-authored-by: Keith Battocchi <kebatt@microsoft.com>
2020-12-15 22:13:44 -05:00
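Retaining input column names at fit time usually reduces to duck-typing on pandas' `.columns` attribute, with generic fallbacks for plain arrays. A minimal sketch (the mixin and fallback naming are illustrative, not EconML's actual implementation):

```python
class NamedFitMixin:
    """Record input column names at fit time so later summaries can display
    them; fall back to generic names for plain arrays."""
    def fit(self, X):
        cols = getattr(X, "columns", None)   # pandas DataFrames expose .columns
        if cols is not None:
            self.input_names_ = list(cols)
        else:
            self.input_names_ = [f"X{i}" for i in range(len(X[0]))]
        return self

class FakeFrame:                              # stand-in for a pandas DataFrame
    columns = ["age", "income"]

print(NamedFitMixin().fit(FakeFrame()).input_names_)   # ['age', 'income']
print(NamedFitMixin().fit([[1, 2, 3]]).input_names_)   # ['X0', 'X1', 'X2']
```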
vsyrgkanis 5cc93a13e3
Update README.md (#320)
* Update README.md
2020-11-17 17:32:19 -05:00
vsyrgkanis 839c2253f3
Vasilis/orf speed (#316)
* orf speedup by moving pointwise effect outside of class

* removed the one-hot encoding from the nuisance and parameter estimators in the discrete ORF and passed the one-hot encoding directly as T. Removed cross-fitting for Y_hat in the first stage of the discrete ORF, which was there by mistake. Removed the creation of split_indices when split_indices is None and we are in the first stage, since we are not doing cross-fitting there. Removed the use of np.insert, as it was slower than setting a slice to a constant.

* fixed a bug in the code that removes first-stage cross-fitting. Replaced PolynomialFeatures(degree=1, include_bias=True) with np.hstack

* added the option of global residualization to the continuous treatment ortho forest. This now replicates the grf functionality exactly. Added some missing tests regarding the shape of ORF output and fixed some bad shapes according to the API for column y or column t. Added tests for the global residualization. Replaced the polynomial fit_transform in the second-stage param func with hstack.

* added a causal forest module that is enabled via the global residualization option of the continuous treatment ORF. Added all tests and notebook examples

* added inference to BLBInference so that all ortho forests and causal forests have effect_inference.

* orthoforest notebook added inference.

* fixed shape api in inference of ortho forest

* replaced super(class, obj) with just super()

* changed ORF names to DMLOrthoForest and DROrthoForest and deprecated old names with warning. Made Regwrapper private

* use stratified kfold in dmlorthoforest when discrete_treatment=True, in both local and global residualization. Also added a check that all treatments are represented in the nuisance estimator, similar to the drorthoforest.
2020-11-17 15:02:09 -05:00
Keith Battocchi 022de3513d Pass fit arguments by keyword 2020-11-17 00:49:14 -05:00
Keith Battocchi 43894368b9 Switch examples to 'auto' inference 2020-11-12 16:23:23 -05:00
Keith Battocchi 2a7f540624 Update old DML alias usages 2020-11-12 16:23:23 -05:00
vsyrgkanis 61cd136636
Enabled feature_importances_ for our ForestDML and ForestDRLearner estimators (#306)
This required changing the subsampled honest forest code a bit so that it does not alter the arrays of the tree structures of sklearn but rather stores two additional arrays required for prediction. This adds around 1.5 times the original running time, making it slightly slower due to the extra memory allocation.

However, this enables correct feature_importance calculation and, in the future, correct SHAP calculation (fixes #297), as the tree entries are now consistent with a tree in a RandomForestRegressor, so shap logic can be applied if we recast the subsampled honest forest as a RandomForestRegressor (additivity of shap will still be violated, since the prediction of the subsampled honest forest is not just the aggregation of the predictions across the trees but a more complex weighted average). We can still call shap and get meaningful shap numbers. One discrepancy is that shap explains a different value than what effect returns, since it explains the average of the predictions of the honest tree regressors, whereas the prediction of an honest forest is not the average of the tree predictions. A full solution to this small discrepancy would require reworking Shap's tree explainer algorithm to account for such alternative aggregations of tree predictors.

* changed the subsampled honest forest to not alter the entries of each tree but rather create auxiliary numpy arrays that store the numerator and denominator of every node. This enables consistent feature_importance calculation and also potentially more accurate shap_values calculation.

* added feature importances to the DR learner example notebook

* added feature_importances_ to DML example notebook

* enabled feature_importances_ for forestDML and forestDRLearner as an attribute

* fixed doctest in the subsampled honest forest, which was producing old feature_importances_. Added tests that the feature_importances_ API works in test_drlearner and test_dml.

* Transformed sparse matrices to dense matrices after the dot product in parallel_add_trees_ of ensemble.py. This gives a 6-fold speed-up, as we were previously doing many slicing operations on sparse matrices, which are very slow!
2020-11-09 16:37:29 -05:00
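The numerator/denominator bookkeeping described above changes how the forest aggregates: an honest-forest prediction is a ratio of sums across trees, not a mean of per-tree predictions. A toy illustration of that difference (the leaf pairs are made-up numbers):

```python
def forest_predict(leaves):
    """Honest-forest aggregation: each tree contributes a (numerator,
    denominator) pair from the leaf a point falls into; the forest
    prediction is sum(num) / sum(den), not the mean of per-tree ratios."""
    num = sum(n for n, d in leaves)
    den = sum(d for n, d in leaves)
    return num / den

leaves = [(6.0, 2.0), (2.0, 4.0)]       # (sum of y in leaf, leaf count) per tree
forest = forest_predict(leaves)          # (6+2)/(2+4) = 4/3
mean_of_trees = sum(n / d for n, d in leaves) / len(leaves)  # (3.0+0.5)/2 = 1.75
print(forest, mean_of_trees)             # the two aggregations differ
```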
Miruna Oprescu 8026439010
Added new customer scenario notebooks (#290)
* Added new customer scenario notebook. 
* Changed images in the other customer scenario notebooks.
2020-10-20 16:03:00 -04:00
Keith Battocchi 9d53a89568 Fix broken link 2020-08-11 16:04:44 -04:00
Keith Battocchi 5b981e172d Remove balanced class weights from metalearner notebook 2020-08-11 16:04:44 -04:00
Miruna Oprescu 62f0be674c
Moprescu/econml dowhy customer scenarios (#255)
* Added recommendation A/B testing with EconML+DoWhy

* Added customer segmentation notebook for econml and dowhy

* Updated notebooks to use DoWhy version 0.4

Co-authored-by: Maggie Hei <mehei@microsoft.com>
2020-06-09 11:46:21 -04:00
Maggie Hei 4eca7bd71c
Add two customer scenarios case study (#230) 2020-03-06 22:55:13 -05:00
Maggie Hei 922ecbfeaf
Update readme, add example for interpreter and inference (#224)
* update README, add example for interpreter and inference
2020-02-25 17:49:16 -05:00
Maggie Hei f42251e274
Mehei/otherinferences (#203)
* add analytical effect/marginal effect/constant marginal effect inferences for DML and DRLearner
* add coefficient inference and intercept inference for linear final model
* add population summary inference given dataset X
2020-02-14 19:25:16 -05:00
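Analytical inference of the kind added here (asymptotically normal estimates for effects, coefficients, and intercepts) boils down to Wald intervals built from a point estimate and a standard error. A minimal sketch, not the library's API:

```python
def norm_interval(point, stderr, z=1.959964):
    """Two-sided normal (Wald) confidence interval: point ± z * stderr.
    z ≈ 1.96 gives a 95% interval."""
    return point - z * stderr, point + z * stderr

lo, hi = norm_interval(2.0, 0.5)
print(round(lo, 3), round(hi, 3))  # 1.02 2.98
```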
v-keacqu e72422d423
AutomatedML for EconML (#213)
Enable easy use of Auto ML by wrapping any estimator
2020-02-14 16:36:21 -05:00
Miruna Oprescu 70cd6d3145
Added `blb` inference option to the OrthoForest (#214)
* Added `blb` inference option to the OrthoForest

* Added Bootstrap of Little Bags inference to the ORF classes
* Added tests and updated notebook
* Fixed the marginal effect shape when T is a vector
* Fixed bugs and reorganized class functionality

* Addressed PR comments
2020-02-07 18:20:06 -05:00
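A heavily simplified sketch of the Bootstrap of Little Bags (`blb`) idea: group per-tree predictions into small bags and read uncertainty off the between-bag spread. The real estimator also involves a within-bag correction term that is omitted here:

```python
def blb_stderr(tree_preds, bag_size):
    """Crude bootstrap-of-little-bags sketch: split per-tree predictions
    into 'little bags', take each bag's mean, and use the standard error
    of the bag means as the uncertainty estimate."""
    bags = [tree_preds[i:i + bag_size] for i in range(0, len(tree_preds), bag_size)]
    means = [sum(b) / len(b) for b in bags]
    m = sum(means) / len(means)
    var = sum((x - m) ** 2 for x in means) / (len(means) - 1)
    return (var / len(means)) ** 0.5

preds = [1.0, 2.0, 1.5, 2.5, 1.0, 3.0, 2.0, 2.0]   # made-up per-tree predictions
print(blb_stderr(preds, bag_size=2))                # 0.125
```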
vasilismsr dbcbae5ee8 Vasilis/cate interpreters (#177)
Added CATE interpreters and notebooks
2019-12-07 01:49:06 -05:00
vasilismsr f7cf669083
Added Non Parametric DML with the weighting trick and ForestDML, ForestDRLearner with CIs (#170)
Added a SubsampledHonestForest scikit-learn extension, which is a regression forest that implements honesty and instead of bootstrap, performs subsampling to construct each tree. It also offers predict_interval via the bootstrap of little bags approach and the asymptotic normal characterization of the prediction estimate.

Added NonParamDMLCateEstimator, which is essentially another meta-learner with an arbitrary final stage that supports fit and predict (albeit fit must accept sample_weight). This is based on the observation that, when the treatment is single-dimensional or binary, one can view the RLearner problem as a weighted regression.

Added ForestDMLCateEstimator (which is essentially a causal forest implemented slightly differently, by viewing it as a weighted non-parametric regression and piggybacking on scikit-learn tree construction) and has bootstrap of little bags based inference. This is essentially a NonParamDMLCateEstimator with a SubsampledHonestForest final model.

Also added ForestDRLearner, which uses the doubly robust approach and uses an honest forest for each pseudo outcome regression. This also offers non-parametric confidence intervals to the Doubly Robust estimation classes. This is essentially a DRLearner with a SubsampledHonestForest final model.

Side additions:

re-organized the inference class hierarchy to maximize code re-use.
added monte_carlo folder and monte_carlo experiments for LinearDMLCateEstimator and SubsampledHonestForest
2019-12-04 18:17:34 -05:00
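The "weighting trick" behind NonParamDMLCateEstimator: with a single-dimensional treatment, the RLearner objective becomes a weighted regression of `Y_res / T_res` on X with sample weights `T_res**2`. For a constant effect this collapses to the no-intercept OLS slope of `Y_res` on `T_res`, which the sketch below (toy residuals, not library code) verifies:

```python
def weighted_effect(y_res, t_res):
    """R-learner weighting trick for a scalar effect: regress Y_res / T_res
    on a constant with sample weights T_res**2. Algebraically this equals
    the no-intercept OLS slope of Y_res on T_res."""
    num = sum((t * t) * (y / t) for y, t in zip(y_res, t_res))  # Σ w · target
    den = sum(t * t for t in t_res)                             # Σ w
    return num / den

y_res = [1.0, -2.0, 3.0, 0.5]    # residualized outcomes (toy numbers)
t_res = [0.5, -1.0, 1.5, 0.25]   # residualized treatments
theta = weighted_effect(y_res, t_res)
ols = sum(y * t for y, t in zip(y_res, t_res)) / sum(t * t for t in t_res)
print(theta, ols)                # identical by construction (both 2.0 here)
```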
Keith Battocchi 9b6c939e61
Add fit_cate_intercept to DML, rework feature generation (#174)
Add fit_cate_intercept to DML, rework feature generation
2019-11-21 18:31:24 -05:00
Maggie Hei 818c8320d6 automate the first stage model T and update DML notebook (#172)
* automate the first stage model T and update DML notebook
* Changed model defaults in ORF and fixed a bug in WeightedKFold
2019-11-21 17:58:34 -05:00
Maggie Hei b7e826e268
support multi treatment in meta learners (#141)
* support multi treatment in meta learners
2019-11-13 11:58:22 -05:00
vasilismsr 72a7022d99
Vasilis/drlearner (#137)
This PR creates the DRLearner class, which replaces the DoublyRobustLearner from the metalearners. Many updates to it:
* cross-fitting (made DRLearner a child of _OrthoLearner)
* confidence intervals (LinearDRLearner and asymptotically normal based intervals)
* access to nuisance and final models for interpretability and debugging
* support for multiple treatments

Detailed commits:

* separating drlearner

* removed doubly robust learner from metalearners

* removed drlearner test from metalearners and started a new test file for drlearner.

* changed tests in dml to adhere to the CATE API. Some calls to effect had T0 and T1 as non-keyword arguments; they should be keyword arguments. A bug fix in the treatment expansion mixin revealed this mistake.

* added exhaustive tests for drlearner. 

* fixed a docstring bug regarding how split is called to generate cross-fit folds

* changed bootstrap tests to conform with keyword-only effect and effect_interval

* bootstrap tests, fixing bugs related to positional arguments T0, T1

* added checks of fitted dims for W and Z during scoring in _OrthoLearner. 

* fixed docstrings regarding n_splits in multiple places. Fixed the metalearners notebook to call the new doubly robust learner.

* changed statsmodels inference input properties to reflect that we only support a specific subset of covariance types

* added comment on overlapping tests between DML and DRLearner

* added TODO for allowing for 2d y of shape (n,1) and also added test that T can also be a vector

* added TODO so that we merge functionality between statsmodelsinference and statsmodelsinferencediscrete

* fixed docstring in dml. Added an inverse_onehot encoding utility function with a corresponding test and used it in the nuisances in DMLCateEstimator and DRLearner. Made nuisances a keyword argument with no default in DRLearner.

* made statsmodelslinearregression be child of BaseEstimator

* added a comment on a code design choice in model_final of drlearner, related to the multitask model_final
2019-11-11 13:29:07 -05:00
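The inverse_onehot utility mentioned above inverts a drop-first one-hot encoding of a discrete treatment. A pure-Python sketch of the same mapping (the library version operates on numpy arrays):

```python
def inverse_onehot(rows):
    """Invert a drop-first one-hot encoding: an all-zero row is the baseline
    treatment 0; otherwise the treatment index is 1 + position of the 1."""
    return [0 if 1 not in row else 1 + row.index(1) for row in rows]

# columns encode treatments 1 and 2; [0, 0] is the baseline treatment
print(inverse_onehot([[0, 0], [1, 0], [0, 1]]))  # [0, 1, 2]
```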
vasilismsr 5bf6dc7557
Fixing bug with DiscreteTreatmentOrthoForest effect shapes (#146)
* fixed an ortho forest effect function bug. The FunctionTransformer was not doing the one-hot encoding at all; once the one-hot encoding was added, things work correctly.

* fixed a dimension-related bug in the orthoforest notebook
2019-11-10 14:30:33 -05:00
Keith Battocchi 6d99f9d6e1
Migrate notebooks to new-style inference (#128) 2019-11-05 17:51:00 -05:00
Keith Battocchi 9b33b10e2a Improve treatment expansion 2019-10-29 16:52:07 -04:00
Keith Battocchi 3e216bd957 Add statsmodels inference to LinearDMLCateEstimator 2019-10-29 16:52:07 -04:00
Keith Battocchi f77a4e52a2 Refactor DML classes to enable cross-fit later 2019-10-29 16:52:07 -04:00
Keith Battocchi 0f0d1ec702
Enable deep IV to be used with vectors (#59)
* Enable deep IV to be used with vectors

* Reduce training epochs in Deep IV tests
2019-05-31 17:31:08 -04:00
Maggie Hei 07dee11b09
add discrete treatment example (#58) 2019-05-06 14:01:12 -04:00
Keith Battocchi 296fabb383
Reorder effect arguments; allow scalar treatments (#49)
Reorder effect arguments; allow scalar treatments
2019-05-03 10:41:57 -04:00
Maggie Hei c09692c6bf
Mehei/scoring (#44)
* add scoring function
* add score example on dml notebook
2019-05-02 20:34:27 -04:00
GregLewis 986f0e905b Deep iv notebook edits (#50)
Improved example and formatting
2019-05-02 18:41:04 -04:00
Keith Battocchi 288f39e0fb Enable setting all Keras fit options (to allow callbacks, etc.) 2019-05-01 12:06:33 -04:00
Greg Lewis df7e1c99cc Hell yeah, I'm doing this! 2019-04-30 19:26:01 -04:00
Keith Battocchi 3bf88b5277
Create Deep IV notebook (#32) 2019-04-10 20:09:20 -04:00