Revise user guide entry for intrinsic causal influence

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>
This commit is contained in:
Patrick Bloebaum 2023-02-24 13:31:38 -08:00 committed by Patrick Blöbaum
Parent a1dcccbc80
Commit 12168ea7bd
1 changed file with 57 additions and 48 deletions


@@ -6,51 +6,64 @@ By quantifying intrinsic causal influence, we answer the question:
How strong is the causal influence of a source node on a target node
that is not inherited from the parents of the source node?
Naturally, descendants will have a zero intrinsic causal influence on the target node. This method is based on the paper:
Dominik Janzing, Patrick Blöbaum, Lenon Minorics, Philipp Faller, Atalanti Mastakouri. `Quantifying intrinsic causal contributions via structure preserving interventions <https://arxiv.org/abs/2007.00714>`_
arXiv:2007.00714, 2021
Let's consider an example from the paper to understand the type of influence being measured here. Imagine a schedule of
three trains, ``Train A``, ``Train B`` and ``Train C``, where the departure time of ``Train C`` depends on the arrival time of ``Train B``,
and the departure time of ``Train B`` depends on the arrival time of ``Train A``. Suppose ``Train A`` typically experiences much
longer delays than ``Train B`` and ``Train C``. The question we want to answer is: How strong is the influence of each train
on the delay of ``Train C``?
While there are various definitions of influence in the literature, we are interested in the *intrinsic causal influence*,
which measures the influence of a node that has not been inherited from its parents, that is, the influence of the noise
of a node. The reason for this is that, while ``Train C`` has to wait for ``Train B``, ``Train B`` mostly inherits the delay from
``Train A``. Thus, ``Train A`` should be identified as the node that contributes the most to the delay of ``Train C``.
See the :ref:`Understanding the method <understand-method>` section for another example and more details.
How to use it
^^^^^^^^^^^^^^
To see how the method works, let us generate some data following the example above:
>>> import numpy as np, pandas as pd, networkx as nx
>>> from dowhy import gcm
>>> np.random.seed(10) # to reproduce these results
>>> X = abs(np.random.normal(loc=0, scale=5, size=1000))
>>> Y = X + abs(np.random.normal(loc=0, scale=1, size=1000))
>>> Z = Y + abs(np.random.normal(loc=0, scale=1, size=1000))
>>> data = pd.DataFrame(data=dict(X=X, Y=Y, Z=Z))
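The noise terms of the three nodes are deliberately generated with different scales. One quick way to inspect this in the data is to compare the empirical spread of each node's own noise term, which here is simply the difference to its parent (a small sketch using only the variables defined above):

>>> # Each node's own noise is its difference to its parent (X has no parent).
>>> np.std(X), np.std(Y - X), np.std(Z - Y)  # the spread of X's noise is much larger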
Note the larger standard deviation of the 'noise' in :math:`X`.
Next, we will model cause-effect relationships as a structural causal model and fit it to the data. Here, we are using
the auto module to automatically assign causal mechanisms:
>>> causal_model = gcm.StructuralCausalModel(nx.DiGraph([('X', 'Y'), ('Y', 'Z')])) # X -> Y -> Z
>>> gcm.auto.assign_causal_mechanisms(causal_model, data)
>>> gcm.fit(causal_model, data)
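As an optional plausibility check (not required for the steps below), we can draw a few samples from the fitted model with ``gcm.draw_samples`` and compare them against the observed data:

>>> # A handful of samples generated from the fitted SCM; they should resemble `data`.
>>> gcm.draw_samples(causal_model, num_samples=5)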
Finally, we can ask for the intrinsic causal influences of ancestors on a node of interest (e.g., :math:`Z`):
>>> contributions = gcm.intrinsic_causal_influence(causal_model, 'Z')
>>> contributions
{'X': 8.736841722582117, 'Y': 0.4491606897202768, 'Z': 0.35377942123477574}
Note that, although we use a linear relationship here, the method can also handle arbitrary non-linear relationships.
**Interpreting the results:** We estimated the intrinsic causal influence of ancestors of
:math:`Z`, including itself, on its variance (the default measure). These contributions sum up to the variance of :math:`Z`.
We observe that ~92% of the variance of :math:`Z` comes from :math:`X`.
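To obtain relative shares like the ~92% mentioned above, we can simply normalize the returned contributions (a small convenience snippet based on the dictionary from above):

>>> total = sum(contributions.values())
>>> {node: round(100 * value / total, 1) for node, value in contributions.items()}
{'X': 91.6, 'Y': 4.7, 'Z': 3.7}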
.. _understand-method:
Understanding the method
^^^^^^^^^^^^^^^^^^^^^^^^^
Let's look at a different example to explain the intuition behind the notion of "intrinsic" causal influence further:
A charity event is organised to collect funds to help an orphanage. At the end of the event,
a donation box is passed around to each participant. Since the donation is voluntary, some may
@@ -59,12 +72,12 @@ causal influence we seek to measure here.
anything to the collective donation after all. Each person's contribution then is simply the
amount they donated.
To measure the intrinsic causal influence of a source
node on a target node, we need a functional causal model. For instance, we can assume that the
causal model of each node follows an additive noise model (ANM), i.e. :math:`X_j := f_j
(\textrm{PA}_j) + N_j`, where :math:`\textrm{PA}_j` are the parents of node :math:`X_j` in the causal graph,
and :math:`N_j` is the independent unobserved noise term. To compute the "intrinsic" contribution of ancestors of :math:`X_n` to
some property (e.g. variance or entropy) of the marginal distribution of :math:`X_n`, we first
have to set up our causal graph, and learn the causal model of each node from the dataset.
Consider a causal graph :math:`X \rightarrow Y \rightarrow Z` as in the code example above,
@@ -94,40 +107,36 @@ quantify the intrinsic causal influence of :math:`X, Y` and :math:`Z` to
>>> from dowhy.gcm.uncertainty import estimate_variance
>>> prediction_model_from_noises_to_target = gcm.ml.create_linear_regressor()
>>> node_to_contribution = gcm.intrinsic_causal_influence(causal_model, 'Z',
>>>                                                       prediction_model_from_noises_to_target,
>>>                                                       attribution_func=lambda x, _: estimate_variance(x))
Here, we explicitly set the variance as the property we are interested in via the ``attribution_func`` parameter.
.. note::
While using the variance as an uncertainty estimator gives valuable information about the
contribution of nodes to the squared deviations in the target, one might rather be interested
in other quantities, such as absolute deviations. This can also be simply computed by replacing
the ``attribution_func`` with a custom function:
>>> mean_absolute_deviation_estimator = lambda x, _: np.mean(abs(x))
>>> node_to_contribution = gcm.intrinsic_causal_influence(causal_model, 'Z',
>>>                                                       prediction_model_from_noises_to_target,
>>>                                                       attribution_func=mean_absolute_deviation_estimator)
If the choice of a prediction model is unclear, the prediction model parameter can also be set
to "auto".
**Remark on using the mean for the attribution:** Although the ``attribution_func`` can be customized for a given use
case, not all definitions make sense. For instance,
using the **mean** does not provide any meaningful results. This is because the way influences are estimated is based
on the concept of Shapley values. To understand this better, we can look at a general property of Shapley values, which
states that the sum of Shapley values, in our case the sum of the attributions, adds up to :math:`\nu(T) - \nu(\{\})`.
Here, :math:`\nu` is a set function (in our case, the expectation of the ``attribution_func``), and :math:`T` is the full
set of all players (in our case, all noise variables).
Now, if we use the mean, :math:`\nu(T)` becomes :math:`\mathbb{E}_\mathbf{N}[\mathbb{E}[Y | \mathbf{N}]] = \mathbb{E}[Y]`,
because the target variable :math:`Y` depends deterministically on all noise variables :math:`\mathbf{N}` in the graphical
causal model. Similarly, :math:`\nu(\{\})` becomes :math:`\mathbb{E}[Y | \{\}] = \mathbb{E}[Y]`. This would result in
:math:`\mathbb{E}_\mathbf{N}[\mathbb{E}[Y | \mathbf{N}]] - \mathbb{E}[Y | \{\}] = 0`, i.e. the resulting attributions
are close to 0. For more details, see Section 3.3 of the paper.
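With the default variance measure from the first example, this sum property can be illustrated directly: as stated above, the contributions should add up to the variance of :math:`Z`, up to estimation error (a small sketch reusing ``data`` and ``contributions`` from above):

>>> # As stated above, the contributions should add up to (roughly) the variance of Z.
>>> np.var(data['Z'].to_numpy()), sum(contributions.values())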