- In the "early stopping" method, runs converged slowly when a Shapley value was exactly zero, because the change was always treated as 100%. Now, if a Shapley value is exactly zero in two consecutive runs, it is treated as 'converging'.
- Since the Shapley estimator supports the estimation of multiple vectors of Shapley values at the same time, this change also introduces some logic to keep track of the convergence for each vector individually.
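A minimal sketch of the convergence check described above, not the actual estimator code. The function name `has_converged`, the list-of-lists input, and the 0.05 threshold are assumptions made for illustration:

```python
def has_converged(previous, current, threshold=0.05):
    """Return one convergence flag per vector of Shapley values.

    `previous` and `current` hold the vectors from two consecutive runs.
    A vector counts as converged when every value in it changed by less
    than `threshold` (relative change). Hypothetical helper.
    """
    flags = []
    for old_vec, new_vec in zip(previous, current):
        vector_ok = True
        for old, new in zip(old_vec, new_vec):
            if old == 0.0 and new == 0.0:
                # Exactly zero in both runs: treat as converged instead of
                # counting the change as 100%.
                continue
            if old == 0.0:
                # The value moved away from zero, so the relative change is
                # undefined; treat it as not converged.
                vector_ok = False
                break
            if abs(new - old) / abs(old) >= threshold:
                vector_ok = False
                break
        flags.append(vector_ok)
    return flags


prev = [[0.0, 1.00, -2.00], [0.0, 1.0, 3.0]]
curr = [[0.0, 1.01, -2.02], [0.0, 1.5, 3.0]]
flags = has_converged(prev, curr)  # first vector converged, second did not
```

Tracking one flag per vector lets a caller stop updating vectors that have already converged while the others continue.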
Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>
- Speed up by reducing tolerance threshold
- Increase default value for num_samples_conditional to improve accuracy
- Fix bug when num_samples_conditional is set to a higher number than the actual number of given samples
- Clarify tolerance parameter in docstring
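One way to guard against requesting more samples than are available is to cap the request at the data size. This is only a sketch under that assumption; `draw_conditional_samples` is a hypothetical helper and the commit may have fixed the bug differently:

```python
import random


def draw_conditional_samples(samples, num_samples_conditional, seed=0):
    """Draw samples without replacement, capped at the available count."""
    # random.sample raises ValueError when asked for more items than exist,
    # so clamp the requested number first.
    n = min(num_samples_conditional, len(samples))
    rng = random.Random(seed)
    return rng.sample(samples, n)


data = [0.1, 0.4, 0.7]
drawn = draw_conditional_samples(data, num_samples_conditional=1000)
```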
Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>
- Shapley config parameter now clearly distinguishes between num_subset_sampling and num_permutations
- Now uses a quasi-random sequence generator to sample permutations more uniformly. This should improve convergence (i.e., more accurate estimations)
- Fix bug in early stopping where negative percentages could occur, causing the estimator to falsely stop too early.
- The early stopping criterion is now applied to the change of each Shapley value individually instead of the average change. This should ensure less variance in the results between two runs and more accurate results.
- Change the default minimum percentage-change threshold for early stopping to 0.05. This balances out the potentially slower runtime from the change above.
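To illustrate the idea of drawing permutations from a quasi-random sequence, the sketch below maps points of a Halton sequence to permutations by sorting their coordinates. This is an assumed technique chosen for illustration, not necessarily the generator or mapping used in the estimator; all names are hypothetical:

```python
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


def van_der_corput(index, base):
    """Radical inverse of `index` in the given base, a value in [0, 1)."""
    result, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        result += digit / denom
    return result


def halton_permutation(index, num_players):
    """Turn the `index`-th Halton point into a permutation of the players."""
    if num_players > len(PRIMES):
        raise ValueError("this sketch supports at most 10 players")
    # One coordinate per player, each from a different prime base.
    point = [van_der_corput(index, PRIMES[d]) for d in range(num_players)]
    # Sorting the players by their coordinate yields a permutation.
    return sorted(range(num_players), key=lambda d: point[d])


perms = [halton_permutation(i, 4) for i in range(1, 9)]
```

Because successive Halton points fill the unit cube more evenly than independent random draws, the resulting orderings tend to be less clustered, which is the motivation stated in the commit.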
Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>
Before, numeric arrays with the wrong dtype would cause an error when summing them up and using np.isclose. Now, the dtype is explicitly changed to float64.
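A small sketch of the explicit conversion, assuming an array that holds numbers but was created with the `object` dtype; the variable names are illustrative:

```python
import numpy as np

# Numeric content stored with a non-float dtype.
values = np.array([0.25, 0.25, 0.5], dtype=object)

# Convert explicitly so that summation and np.isclose operate on floats.
values = values.astype(np.float64)

total = np.sum(values)
sums_to_one = bool(np.isclose(total, 1.0))
```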
Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>
Previously, one dimension was dropped in the encoding. However, this requires assuming that there are no unknown categories, since these would also be mapped to a zero vector, which then coincides with the encoding of one of the known categories. Now, there are as many dimensions as categories, which allows unknown categories to be mapped to a zero vector.
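The encoding described here can be sketched in plain Python as follows. `one_hot_encode` is a hypothetical helper, not the library function:

```python
def one_hot_encode(values, categories):
    """Encode values with one dimension per known category.

    Unknown categories map to the all-zero vector, which no known
    category uses because no dimension is dropped.
    """
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        if value in index:
            row[index[value]] = 1
        encoded.append(row)
    return encoded


encoded = one_hot_encode(["red", "blue", "purple"], ["red", "green", "blue"])
# "purple" is unknown, so its row stays all zeros.
```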
Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>
The new name 'feature_relevance' will help clarify the type of functionality to expect from the module.
Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>
The new parameter indicates whether existing causal mechanisms should be overridden by ones inferred from the data. This would also include uncertainties of the model selection when computing confidence intervals.
Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>
Enhancement: warn about unobserved graph variables
Make `causal_model.CausalModel` constructor emit a
`UserWarning` in case there are graph variables
(common causes, instruments, effect modifiers)
that are not contained in the observed data (`self._data`).
Furthermore, log additional information at logger level
`logging.WARNING` to help debug typos in variable names
and point out data variables that are not used in a
given graph.
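The behaviour can be sketched roughly as below. `check_graph_variables` and its arguments are hypothetical and stand in for the logic inside the constructor:

```python
import logging
import warnings

logger = logging.getLogger(__name__)


def check_graph_variables(graph_variables, data_columns):
    """Warn about graph variables missing from the data (hypothetical helper)."""
    graph_variables = set(graph_variables)
    data_columns = set(data_columns)
    unobserved = sorted(graph_variables - data_columns)
    unused = sorted(data_columns - graph_variables)
    if unobserved:
        warnings.warn(
            f"{len(unobserved)} graph variable(s) are not in the data: {unobserved}",
            UserWarning,
        )
        # Extra detail to help spot typos in variable names.
        logger.warning("Data columns not used in the graph: %s", unused)
    return unobserved, unused


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unobserved, unused = check_graph_variables(
        graph_variables=["age", "income", "educaton"],  # note the typo
        data_columns=["age", "income", "education"],
    )
```

Listing the unused data columns next to the unobserved graph variables makes a misspelled name easy to match with its intended column.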
Signed-off-by: Moritz Freidank <freidankm@gmail.com>
Co-authored-by: Moritz Freidank <freidankm@gmail.com>
* fix: significance_level
https://github.com/py-why/dowhy/issues/809
Signed-off-by: Michael Klesel <michael@klesel.info>
* fix: default number of simulations for CI
https://github.com/py-why/dowhy/issues/841
Signed-off-by: Michael Klesel <michael@klesel.info>
* remove: self._linear_model (not used in class)
Signed-off-by: Michael Klesel <michael@klesel.info>
---------
Signed-off-by: Michael Klesel <michael@klesel.info>
* use arguments in estimate effect that were not being passed to the functional version
Signed-off-by: Padarn Wilson <padarn.wilson@grabtaxi.com>
* fix formatting
---------
Signed-off-by: Padarn Wilson <padarn.wilson@grabtaxi.com>
This is required to have all necessary dependencies and it also makes this workflow consistent with the nightly tests.
Signed-off-by: Peter Goetz <pego@amazon.com>
This is mainly for consistency, so we can get better aggregate metrics in Google Analytics. For users it probably makes no difference.
Signed-off-by: Peter Goetz <pego@amazon.com>
This makes it compatible with Numba version 0.56.4. The way the NumPy version was specified before caused version 1.24.0 to be installed, which is incompatible; see also https://github.com/numba/numba/pull/8691. Numba is a transitive dependency coming in through econml and sparse.
Signed-off-by: Peter Goetz <pego@amazon.com>
* Initial commit
* Add data to CausalEstimate (workaround for methods that depend on data)
Signed-off-by: Andres Morales <andresmor@microsoft.com>
* remove treatment_name from estimator object
Signed-off-by: Andres Morales <andresmor@microsoft.com>
* remove outcome_name from causal_estimators
* Remove _outcome, _treatment, _outcome_name from estimator object
* Restore notebooks
Signed-off-by: Andres Morales <andresmor@microsoft.com>
* Add removed code
Signed-off-by: Andres Morales <andresmor@microsoft.com>
* Restore notebook
Signed-off-by: Andres Morales <andresmor@microsoft.com>
* Remove treatment and outcome as parameters
Signed-off-by: Andres Morales <andresmor@microsoft.com>
* apply formatting
Signed-off-by: Andres Morales <andresmor@microsoft.com>
* Remove treatment and outcome parameters
Signed-off-by: Andres Morales <andresmor@microsoft.com>
* Apply formatting
Signed-off-by: Andres Morales <andresmor@microsoft.com>
* feat: Add scaffolding for overrule, including basic test
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Update dependencies for overrule
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Add the full set of overrule code
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* style: Black styling on overrule code
style: Black formatting on beam_search
style: Black formatting for ruleset
style: Black format utils
style: Black formatting on load_process_data_BCS
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* fix: Change np.matmul to np.dot
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* fix: Update to appropriate matmul notation for latest CVXPY
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Remove unnecessary overrule code
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Minimum working example for OverRule
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* test: Minimum viable test for OverRule
test: Fix test to work with new interface
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Print rules with option to recompute metrics
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Improve printing of results for refutation
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Pass in additional arguments to OverRule
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* docs: Add docstrings to ruleset.py
docs: Docstrings for assess_overlap.py
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Update logger
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* docs: Add additional docstrings
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* fix: Path bug
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Add notebook with a toy example to demonstrate OverRule
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Add back using LP coeff by default
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* docs: Typing and docstrings
docs: Consistent module docstrings
docs: Add docstrings to overrule/utils.py
docs: Add docstrings to overrule/BCS/beam_search.py
docs: Add docstrings and typing to load_process_data_BCS
docs: Add docstrings and type hints to BCS/overlap_boolean_rule.py
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* docs: Fix and rename notebook for demonstrating overrule
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Use default_rng instead of setting a global seed in sample_Unif
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* fix: Use rng in place of numpy.random
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* fix: Replace list with numpy array to fix type error
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* docs: Fix type hint on ref_range
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Add option to only fit overlap or support rules
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* docs: Flesh out example notebook with parameters
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Add thresh_override as an argument for configuration
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Functional API for overrule has defaults
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* docs: Add API reference
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* ci: Fix support rule test
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* chore: Update poetry.lock for cvxpy dependency
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Remove `XGBClassifier` as default classifier
To avoid dependency on `xgboost`, replace `XGBClassifier` as the default
propensity score model with `RandomForestClassifier` from `sklearn`
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* fix: Fix logic so that when verbose=True, silent=False
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Remove seaborn dependency from overrule notebook
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Add option to pass random seed to support estimation
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* docs: Update notebook to use random seed on support estimation
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* fix: Typo
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Add PSID dataset (observational controls for Lalonde)
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* fix: Prevent fitting overlap rules if all samples in overlap region
One of the assertions in OverlapBooleanRule will trip if all samples are
in the overlap region.
This commit adds a more informative error if the assertion gets tripped,
and raises a more informative warning upstream if all samples are in the
overlap region
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Add function that can be used with target_units
`refute.filter_dataframe(df)` can be used to filter a dataframe to units
that are in the overlap/support region.
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* docs: Clarify notebook intro
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Clarify how to read rules in refuter output
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* fix: Return a copy when filtering
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* docs: Update notebook with Lalonde example
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* docs: Add return to docstring
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* docs: Add citation for pricing problem
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* feat: Change progressbar error to warning
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
* chore: Update lockfile
Signed-off-by: Michael Oberst <michael.k.oberst@gmail.com>
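For the `default_rng` change listed above, the general pattern looks roughly like this. `sample_uniform` is a hypothetical stand-in and not the actual `sample_Unif` implementation:

```python
import numpy as np


def sample_uniform(low, high, size, seed=None):
    """Draw uniform samples from a local generator (hypothetical helper)."""
    # A local Generator keeps the draw reproducible without calling
    # np.random.seed, so the global random state is left untouched.
    rng = np.random.default_rng(seed)
    return rng.uniform(low, high, size=size)


first = sample_uniform(0.0, 1.0, size=5, seed=42)
second = sample_uniform(0.0, 1.0, size=5, seed=42)
```

Passing the same seed gives the same draws, while callers that rely on the global NumPy random state are unaffected.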