Refactor graph.py, fcms.py and cms.py in gcm module
- fcms.py is now called causal_mechanisms.py
- cms.py is now called causal_models.py
- StochasticModel and ConditionalStochasticModel are now part of causal_mechanisms.py instead of graph.py
- graph.py is moved to the main dowhy module in preparation to replace the CausalModel class
- causal_models.py now only contains the causal models ProbabilisticCausalModel, StructuralCausalModel and InvertibleStructuralCausalModel. It also has all the validation methods related to cms.
- The PredictionModel class is now part of the gcm.ml module instead of causal_mechanisms.py

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>
Parent: 0fb1314c5d
Commit: 961ffc6373
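For downstream code that still imports from the old modules, the moves in this commit can be summarised as a lookup table. The following is a minimal sketch, not part of the commit: `MOVES`, `new_module` and `rewrite_import` are hypothetical helper names, and the `"*"` fallback entries are an assumption for symbols whose new location is not visible in the diff.

```python
from typing import Dict, List

# Old module -> {symbol: new module}. "*" is a fallback for every other symbol
# (an assumption of this sketch). Named entries mirror imports changed in the diff.
MOVES: Dict[str, Dict[str, str]] = {
    "dowhy.gcm.cms": {"*": "dowhy.gcm.causal_models"},
    "dowhy.gcm.fcms": {
        "PredictionModel": "dowhy.gcm.ml",
        "ClassificationModel": "dowhy.gcm.ml",
        "InvertibleFunction": "dowhy.gcm.ml.regression",
        "*": "dowhy.gcm.causal_mechanisms",
    },
    "dowhy.gcm.graph": {
        "StochasticModel": "dowhy.gcm.causal_mechanisms",
        "ConditionalStochasticModel": "dowhy.gcm.causal_mechanisms",
        "FunctionalCausalModel": "dowhy.gcm.causal_mechanisms",
        "InvertibleFunctionalCausalModel": "dowhy.gcm.causal_mechanisms",
        "CAUSAL_MECHANISM": "dowhy.gcm.causal_models",
        "PARENTS_DURING_FIT": "dowhy.gcm.causal_models",
        "clone_causal_models": "dowhy.gcm.causal_models",
        "validate_causal_dag": "dowhy.gcm.causal_models",
        "validate_causal_graph": "dowhy.gcm.causal_models",
        "validate_node": "dowhy.gcm.causal_models",
        "validate_causal_model_assignment": "dowhy.gcm.causal_models",
        "validate_local_structure": "dowhy.gcm.causal_models",
        "validate_node_has_causal_model": "dowhy.gcm.causal_models",
        "*": "dowhy.graph",
    },
}


def new_module(old_module: str, symbol: str) -> str:
    """Return the module a symbol lives in after the refactoring."""
    table = MOVES.get(old_module)
    if table is None:
        return old_module  # module was not touched by this commit
    return table.get(symbol, table["*"])


def rewrite_import(old_module: str, symbols: List[str]) -> List[str]:
    """Split one old 'from X import a, b' statement into the new statements."""
    grouped: Dict[str, List[str]] = {}
    for symbol in symbols:
        grouped.setdefault(new_module(old_module, symbol), []).append(symbol)
    return [
        "from %s import %s" % (module, ", ".join(sorted(names)))
        for module, names in sorted(grouped.items())
    ]
```

For instance, `rewrite_import("dowhy.gcm.graph", ["ConditionalStochasticModel", "is_root_node"])` yields one import from `dowhy.gcm.causal_mechanisms` and one from `dowhy.graph`, which matches the pattern of the import changes in the hunks below.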
@@ -46,10 +46,10 @@ dowhy.gcm.auto module
 :undoc-members:
 :show-inheritance:

-dowhy.gcm.cms module
+dowhy.gcm.causal_models module
 --------------------

-.. automodule:: dowhy.gcm.cms
+.. automodule:: dowhy.gcm.causal_models
 :members:
 :undoc-members:
 :show-inheritance:
@@ -118,10 +118,10 @@ dowhy.gcm.divergence module
 :undoc-members:
 :show-inheritance:

-dowhy.gcm.fcms module
+dowhy.gcm.causal_mechanisms module
 ---------------------

-.. automodule:: dowhy.gcm.fcms
+.. automodule:: dowhy.gcm.causal_mechanisms
 :members:
 :undoc-members:
 :show-inheritance:
@@ -142,10 +142,10 @@ dowhy.gcm.fitting\_sampling module
 :undoc-members:
 :show-inheritance:

-dowhy.gcm.graph module
+dowhy.graph module
 ----------------------

-.. automodule:: dowhy.gcm.graph
+.. automodule:: dowhy.graph
 :members:
 :undoc-members:
 :show-inheritance:

@@ -18,13 +18,13 @@ distinguishes between these two types of nodes.

 For root nodes such as :math:`X`, the distribution :math:`P_x` is modeled using a stochastic model.
 Non-root nodes such as :math:`Y` are modelled using a *conditional* stochastic model. DoWhy's gcm package
-defines corresponding interfaces for both, namely :class:`~dowhy.gcm.graph.StochasticModel` and
-:class:`~dowhy.gcm.graph.ConditionalStochasticModel`.
+defines corresponding interfaces for both, namely :class:`~dowhy.gcm.causal_mechanisms.StochasticModel` and
+:class:`~dowhy.gcm.causal_mechanisms.ConditionalStochasticModel`.
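To make the two interfaces concrete, here is a minimal, standard-library-only sketch of classes with the same method shape (`fit`, `draw_samples`, `clone`) as the abstract classes shown later in this diff. `GaussianRootModel` and `LinearGaussianChildModel` are hypothetical names, plain lists stand in for `np.ndarray`, and the Gaussian and linear assumptions are illustrative choices, not what DoWhy ships.

```python
import random
import statistics
from typing import List, Optional


class GaussianRootModel:
    """Stand-in with the method shape of StochasticModel (root nodes)."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._seed = seed
        self._rng = random.Random(seed)
        self.mean: Optional[float] = None
        self.std: Optional[float] = None

    def fit(self, X: List[float]) -> None:
        # Estimate the parameters of a normal distribution from the data.
        self.mean = statistics.fmean(X)
        self.std = statistics.pstdev(X)

    def draw_samples(self, num_samples: int) -> List[float]:
        return [self._rng.gauss(self.mean, self.std) for _ in range(num_samples)]

    def clone(self) -> "GaussianRootModel":
        # Same hyperparameters, but unfitted.
        return GaussianRootModel(self._seed)


class LinearGaussianChildModel:
    """Stand-in with the method shape of ConditionalStochasticModel (one parent)."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._seed = seed
        self._rng = random.Random(seed)
        self.slope = 0.0
        self.intercept = 0.0
        self.noise_std = 0.0

    def fit(self, X: List[float], Y: List[float]) -> None:
        # Ordinary least squares for a single parent, then residual spread.
        mx, my = statistics.fmean(X), statistics.fmean(Y)
        var = sum((x - mx) ** 2 for x in X)
        self.slope = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / var
        self.intercept = my - self.slope * mx
        residuals = [y - (self.intercept + self.slope * x) for x, y in zip(X, Y)]
        self.noise_std = statistics.pstdev(residuals)

    def draw_samples(self, parent_samples: List[float]) -> List[float]:
        # One sample of the child per row of parent samples.
        return [
            self.intercept + self.slope * x + self._rng.gauss(0.0, self.noise_std)
            for x in parent_samples
        ]

    def clone(self) -> "LinearGaussianChildModel":
        return LinearGaussianChildModel(self._seed)
```

The difference between the two is only in the signatures: a root-node model is fitted on the node's own data and sampled by count, while a non-root model is fitted on parent and target data and sampled conditionally on parent samples.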

 The gcm package also provides ready-to-use implementations, such as :class:`~dowhy.gcm.stochastic_models
 .ScipyDistribution` or :class:`~dowhy.gcm.stochastic_models.BayesianGaussianMixtureDistribution` for
-:class:`~dowhy.gcm.graph.StochasticModel`, and :class:`~dowhy.gcm.fcms.AdditiveNoiseModel` for
-:class:`~dowhy.gcm.graph.ConditionalStochasticModel`.
+:class:`~dowhy.gcm.causal_mechanisms.StochasticModel`, and :class:`~dowhy.gcm.causal_mechanisms.AdditiveNoiseModel` for
+:class:`~dowhy.gcm.causal_mechanisms.ConditionalStochasticModel`.

 Knowing that, we can now start to manually assign causal models to nodes according to our needs.
 Say, we know from domain knowledge, that our root node X follows a normal distribution. In this
@@ -38,7 +38,7 @@ case, we can explicitly assign this:
 >>> causal_model.set_causal_mechanism('X', gcm.ScipyDistribution(norm))

 For the non-root node Y, let's use an additive noise model (ANM), represented by the
-:class:`~dowhy.gcm.fcms.AdditiveNoiseModel` class. It has a
+:class:`~dowhy.gcm.causal_mechanisms.AdditiveNoiseModel` class. It has a
 structural assignment of the form: :math:`Y := f(X) + N`. Here, f is a deterministic prediction
 function, whereas N is a noise term. Let's put all of this together:

@@ -48,7 +48,7 @@ function, whereas N is a noise term. Let's put all of this together:

 The rather interesting part here is the ``prediction_model``, which corresponds to our function
 :math:`f` above. This prediction model must satisfy the contract defined by
-:class:`~dowhy.gcm.fcms.PredictionModel`, i.e. it must implement the methods::
+:class:`~dowhy.gcm.ml.PredictionModel`, i.e. it must implement the methods::

 def fit(self, X: np.ndarray, Y: np.ndarray) -> None: ...
 def predict(self, X: np.ndarray) -> np.ndarray: ...
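The `fit`/`predict` contract and the additive noise form Y := f(X) + N from the documentation hunks above can be sketched together without DoWhy. This is an illustration under stated assumptions: `OriginLinePredictor` and `AdditiveNoiseSketch` are hypothetical classes, plain lists replace `np.ndarray`, the noise is assumed Gaussian, and only the method names `draw_noise_samples`, `evaluate`, `draw_samples` and `estimate_noise` are taken from the abstract classes in this diff.

```python
import random
from typing import List, Optional


class OriginLinePredictor:
    """Hypothetical model offering the fit/predict pair described above."""

    def __init__(self) -> None:
        self.coefficient = 0.0

    def fit(self, X: List[float], Y: List[float]) -> None:
        # Least-squares slope of a line through the origin.
        self.coefficient = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)

    def predict(self, X: List[float]) -> List[float]:
        return [self.coefficient * x for x in X]


class AdditiveNoiseSketch:
    """Y := f(X) + N, with f given by a prediction model and N Gaussian noise."""

    def __init__(self, prediction_model, noise_std: float, seed: Optional[int] = None) -> None:
        self.prediction_model = prediction_model
        self.noise_std = noise_std
        self._rng = random.Random(seed)

    def draw_noise_samples(self, num_samples: int) -> List[float]:
        return [self._rng.gauss(0.0, self.noise_std) for _ in range(num_samples)]

    def evaluate(self, parent_samples: List[float], noise_samples: List[float]) -> List[float]:
        # Deterministic part f(X) plus the supplied noise N.
        predictions = self.prediction_model.predict(parent_samples)
        return [f + n for f, n in zip(predictions, noise_samples)]

    def draw_samples(self, parent_samples: List[float]) -> List[float]:
        return self.evaluate(parent_samples, self.draw_noise_samples(len(parent_samples)))

    def estimate_noise(self, target_samples: List[float], parent_samples: List[float]) -> List[float]:
        # Invertibility with respect to the noise: N = Y - f(X).
        predictions = self.prediction_model.predict(parent_samples)
        return [y - f for y, f in zip(target_samples, predictions)]
```

Because the noise enters additively, `estimate_noise` recovers exactly the noise that `evaluate` was given, which is the property the invertible model classes in this commit rely on.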
@@ -79,9 +79,9 @@ Using ground truth models
 In some scenarios the ground truth models might be known and should be used instead. Let's
 assume we know that our relationships are linear with coefficients :math:`\alpha = 2` and
 :math:`\beta = 3`. Let's make use of this knowledge by creating a custom prediction model that
-implements the :class:`~dowhy.gcm.fcms.PredictionModel` interface:
+implements the :class:`~dowhy.gcm.ml.PredictionModel` interface:

->>> class MyCustomModel(gcm.PredictionModel):
+>>> class MyCustomModel(gcm.ml.PredictionModel):
 >>>     def __init__(self, coefficient):
 >>>         self.coefficient = coefficient
 >>>

@ -13,15 +13,14 @@ from .anomaly_scorers import (
|
|||
MedianDeviationScorer,
|
||||
RescaledMedianCDFQuantileScorer,
|
||||
)
|
||||
from .cms import FunctionalCausalModel, InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
|
||||
from .causal_mechanisms import AdditiveNoiseModel, ClassifierFCM, PostNonlinearModel
|
||||
from .causal_models import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
|
||||
from .confidence_intervals import confidence_intervals
|
||||
from .confidence_intervals_cms import bootstrap_sampling, fit_and_compute
|
||||
from .density_estimators import GaussianMixtureDensityEstimator, KernelDensityEstimator1D
|
||||
from .distribution_change import distribution_change, distribution_change_of_graphs
|
||||
from .fcms import AdditiveNoiseModel, ClassificationModel, ClassifierFCM, PostNonlinearModel, PredictionModel
|
||||
from .feature_relevance import feature_relevance_distribution, feature_relevance_sample, parent_relevance
|
||||
from .fitting_sampling import draw_samples, fit
|
||||
from .graph import ConditionalStochasticModel, DirectedGraph, FunctionalCausalModel, StochasticModel, is_root_node
|
||||
from .independence_test import (
|
||||
approx_kernel_based,
|
||||
generalised_cov_based,
|
||||
|
@ -30,6 +29,7 @@ from .independence_test import (
|
|||
regression_based,
|
||||
)
|
||||
from .influence import arrow_strength, intrinsic_causal_influence
|
||||
from .ml import ClassificationModel, PredictionModel
|
||||
from .stochastic_models import BayesianGaussianMixtureDistribution, EmpiricalDistribution, ScipyDistribution
|
||||
from .unit_change import unit_change
|
||||
from .validation import RejectionResult, refute_causal_structure, refute_invertible_model
|
||||
|
|
|
@ -4,10 +4,15 @@ import networkx as nx
|
|||
import numpy as np
|
||||
import pandas as pd
|
||||
|
||||
from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
|
||||
from dowhy.gcm.fcms import PredictionModel
|
||||
from dowhy.gcm.graph import get_ordered_predecessors, is_root_node, node_connected_subgraph_view, validate_causal_dag
|
||||
from dowhy.gcm.causal_models import (
|
||||
InvertibleStructuralCausalModel,
|
||||
ProbabilisticCausalModel,
|
||||
StructuralCausalModel,
|
||||
validate_causal_dag,
|
||||
)
|
||||
from dowhy.gcm.ml.prediction_model import PredictionModel
|
||||
from dowhy.gcm.util.general import shape_into_2d
|
||||
from dowhy.graph import get_ordered_predecessors, is_root_node, node_connected_subgraph_view
|
||||
|
||||
|
||||
def compute_data_from_noise(causal_model: StructuralCausalModel, noise_data: pd.DataFrame) -> pd.DataFrame:
|
||||
|
|
|
@ -9,11 +9,12 @@ from dowhy.gcm import config
|
|||
from dowhy.gcm._noise import compute_noise_from_data, get_noise_dependent_function, noise_samples_of_ancestors
|
||||
from dowhy.gcm.anomaly_scorer import AnomalyScorer
|
||||
from dowhy.gcm.anomaly_scorers import MedianCDFQuantileScorer, RescaledMedianCDFQuantileScorer
|
||||
from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel
|
||||
from dowhy.gcm.graph import ConditionalStochasticModel, get_ordered_predecessors, is_root_node, validate_causal_dag
|
||||
from dowhy.gcm.causal_mechanisms import ConditionalStochasticModel
|
||||
from dowhy.gcm.causal_models import InvertibleStructuralCausalModel, ProbabilisticCausalModel, validate_causal_dag
|
||||
from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values
|
||||
from dowhy.gcm.stats import permute_features
|
||||
from dowhy.gcm.util.general import shape_into_2d
|
||||
from dowhy.graph import get_ordered_predecessors, is_root_node
|
||||
|
||||
|
||||
def conditional_anomaly_scores(
|
||||
|
|
|
@ -13,10 +13,11 @@ from sklearn.model_selection import KFold, train_test_split
|
|||
from sklearn.preprocessing import MultiLabelBinarizer
|
||||
|
||||
from dowhy.gcm import config
|
||||
from dowhy.gcm.cms import ProbabilisticCausalModel
|
||||
from dowhy.gcm.fcms import AdditiveNoiseModel, ClassificationModel, ClassifierFCM, PredictionModel
|
||||
from dowhy.gcm.graph import CAUSAL_MECHANISM, get_ordered_predecessors, is_root_node, validate_causal_model_assignment
|
||||
from dowhy.gcm.causal_mechanisms import AdditiveNoiseModel, ClassifierFCM
|
||||
from dowhy.gcm.causal_models import CAUSAL_MECHANISM, ProbabilisticCausalModel, validate_causal_model_assignment
|
||||
from dowhy.gcm.ml import (
|
||||
ClassificationModel,
|
||||
PredictionModel,
|
||||
create_hist_gradient_boost_classifier,
|
||||
create_hist_gradient_boost_regressor,
|
||||
create_lasso_regressor,
|
||||
|
@ -49,6 +50,7 @@ from dowhy.gcm.util.general import (
|
|||
set_random_seed,
|
||||
shape_into_2d,
|
||||
)
|
||||
from dowhy.graph import get_ordered_predecessors, is_root_node
|
||||
|
||||
_LIST_OF_POTENTIAL_CLASSIFIERS_GOOD = [
|
||||
partial(create_logistic_regression_classifier, max_iter=1000),
|
||||
|
|
|
@ -1,5 +1,4 @@
|
|||
"""This module defines multiple implementations of the abstract class :class:`~dowhy.gcm.graph.FunctionalCausalModel`
|
||||
(FCM)
|
||||
"""This module implements different causal mechanisms.
|
||||
|
||||
Classes in this module should be considered experimental, meaning there might be breaking API changes in the future.
|
||||
"""
|
||||
|
@ -10,52 +9,69 @@ from typing import List, Optional
|
|||
|
||||
import numpy as np
|
||||
|
||||
from dowhy.gcm.graph import FunctionalCausalModel, InvertibleFunctionalCausalModel, StochasticModel
|
||||
from dowhy.gcm.ml import ClassificationModel, PredictionModel
|
||||
from dowhy.gcm.ml.regression import InvertibleFunction
|
||||
from dowhy.gcm.util.general import is_categorical, shape_into_2d
|
||||
|
||||
|
||||
class PredictionModel:
|
||||
"""Represents general prediction model implementations. Each prediction model should provide a fit and a predict
|
||||
method."""
|
||||
class StochasticModel(ABC):
|
||||
"""A stochastic model represents a model used for causal mechanisms for root nodes in a graphical causal model."""
|
||||
|
||||
@abstractmethod
|
||||
def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
|
||||
def fit(self, X: np.ndarray) -> None:
|
||||
"""Fits the model according to the data."""
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def predict(self, X: np.ndarray) -> np.ndarray:
|
||||
def draw_samples(self, num_samples: int) -> np.ndarray:
|
||||
"""Draws samples for the fitted model."""
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def clone(self):
|
||||
"""
|
||||
Clones the prediction model using the same hyper parameters but not fitted.
|
||||
|
||||
:return: An unfitted clone of the prediction model.
|
||||
"""
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class ClassificationModel(PredictionModel):
|
||||
@abstractmethod
|
||||
def predict_probabilities(self, X: np.array) -> np.ndarray:
|
||||
raise NotImplementedError
|
||||
class ConditionalStochasticModel(ABC):
|
||||
"""A conditional stochastic model represents a model used for causal mechanisms for non-root nodes in a graphical
|
||||
causal model."""
|
||||
|
||||
@property
|
||||
@abstractmethod
|
||||
def classes(self) -> List[str]:
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class InvertibleFunction:
|
||||
@abstractmethod
|
||||
def evaluate(self, X: np.ndarray) -> np.ndarray:
|
||||
"""Applies the function on the input."""
|
||||
def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
|
||||
"""Fits the model according to the data."""
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def evaluate_inverse(self, X: np.ndarray) -> np.ndarray:
|
||||
"""Returns the outcome of applying the inverse of the function on the inputs."""
|
||||
def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
|
||||
"""Draws samples for the fitted model."""
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def clone(self):
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class FunctionalCausalModel(ConditionalStochasticModel):
|
||||
"""Represents a Functional Causal Model (FCM), a specific type of conditional stochastic model, that is defined
|
||||
as:
|
||||
Y := f(X, N), N: Noise
|
||||
"""
|
||||
|
||||
def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
|
||||
return self.evaluate(parent_samples, self.draw_noise_samples(parent_samples.shape[0]))
|
||||
|
||||
@abstractmethod
|
||||
def draw_noise_samples(self, num_samples: int) -> np.ndarray:
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def evaluate(self, parent_samples: np.ndarray, noise_samples: np.ndarray) -> np.ndarray:
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class InvertibleFunctionalCausalModel(FunctionalCausalModel, ABC):
|
||||
@abstractmethod
|
||||
def estimate_noise(self, target_samples: np.ndarray, parent_samples: np.ndarray) -> np.ndarray:
|
||||
raise NotImplementedError
|
||||
|
||||
|
|
@ -7,15 +7,26 @@ from typing import Any, Callable, Optional, Union
|
|||
|
||||
import networkx as nx
|
||||
|
||||
from dowhy.gcm.graph import (
|
||||
CAUSAL_MECHANISM,
|
||||
from dowhy.gcm.causal_mechanisms import (
|
||||
ConditionalStochasticModel,
|
||||
DirectedGraph,
|
||||
FunctionalCausalModel,
|
||||
InvertibleFunctionalCausalModel,
|
||||
StochasticModel,
|
||||
clone_causal_models,
|
||||
)
|
||||
from dowhy.graph import (
|
||||
DirectedGraph,
|
||||
HasNodes,
|
||||
get_ordered_predecessors,
|
||||
is_root_node,
|
||||
validate_acyclic,
|
||||
validate_node_in_graph,
|
||||
)
|
||||
|
||||
# This constant is used as key when storing/accessing models as causal mechanisms in graph node attributes
|
||||
CAUSAL_MECHANISM = "causal_mechanism"
|
||||
# This constant is used as key when storing the parents of a node during fitting. It's used for validation purposes
|
||||
# afterwards.
|
||||
PARENTS_DURING_FIT = "parents_during_fit"
|
||||
|
||||
|
||||
class ProbabilisticCausalModel:
|
||||
|
@ -83,7 +94,7 @@ class InvertibleStructuralCausalModel(StructuralCausalModel):
|
|||
:func:`~dowhy.gcm.whatif.counterfactual_samples`. This is a subclass of
|
||||
:class:`~dowhy.gcm.cms.StructuralCausalModel` and has further restrictions on the class of causal mechanisms.
|
||||
Here, the mechanisms of non-root nodes need to be invertible with respect to the noise,
|
||||
such as :class:`~dowhy.gcm.fcms.PostNonlinearModel`.
|
||||
such as :class:`~dowhy.gcm.causal_mechanisms.PostNonlinearModel`.
|
||||
"""
|
||||
|
||||
def set_causal_mechanism(
|
||||
|
@ -93,3 +104,60 @@ class InvertibleStructuralCausalModel(StructuralCausalModel):
|
|||
|
||||
def causal_mechanism(self, node: Any) -> Union[StochasticModel, InvertibleFunctionalCausalModel]:
|
||||
return super().causal_mechanism(node)
|
||||
|
||||
|
||||
def validate_causal_dag(causal_graph: DirectedGraph) -> None:
|
||||
validate_acyclic(causal_graph)
|
||||
validate_causal_graph(causal_graph)
|
||||
|
||||
|
||||
def validate_causal_graph(causal_graph: DirectedGraph) -> None:
|
||||
for node in causal_graph.nodes:
|
||||
validate_node(causal_graph, node)
|
||||
|
||||
|
||||
def validate_node(causal_graph: DirectedGraph, node: Any) -> None:
|
||||
validate_causal_model_assignment(causal_graph, node)
|
||||
validate_local_structure(causal_graph, node)
|
||||
|
||||
|
||||
def validate_causal_model_assignment(causal_graph: DirectedGraph, target_node: Any) -> None:
|
||||
validate_node_has_causal_model(causal_graph, target_node)
|
||||
|
||||
causal_model = causal_graph.nodes[target_node][CAUSAL_MECHANISM]
|
||||
|
||||
if is_root_node(causal_graph, target_node):
|
||||
if not isinstance(causal_model, StochasticModel):
|
||||
raise RuntimeError(
|
||||
"Node %s is a root node and, thus, requires a StochasticModel, "
|
||||
"but a %s was found!" % (target_node, causal_model)
|
||||
)
|
||||
elif not isinstance(causal_model, ConditionalStochasticModel):
|
||||
raise RuntimeError(
|
||||
"Node %s has parents and, thus, requires a ConditionalStochasticModel, "
|
||||
"but a %s was found!" % (target_node, causal_model)
|
||||
)
|
||||
|
||||
|
||||
def validate_local_structure(causal_graph: DirectedGraph, node: Any) -> None:
|
||||
if PARENTS_DURING_FIT not in causal_graph.nodes[node] or causal_graph.nodes[node][
|
||||
PARENTS_DURING_FIT
|
||||
] != get_ordered_predecessors(causal_graph, node):
|
||||
raise RuntimeError(
"The causal mechanism of node %s is not fitted to the graphical structure! Fit all "
"causal models in the graph first. If the mechanism is already fitted based on the causal "
"parents, consider updating the persisted parents for that node manually." % node
|
||||
)
|
||||
|
||||
|
||||
def validate_node_has_causal_model(causal_graph: HasNodes, node: Any) -> None:
|
||||
validate_node_in_graph(causal_graph, node)
|
||||
|
||||
if CAUSAL_MECHANISM not in causal_graph.nodes[node]:
|
||||
raise ValueError("Node %s has no assigned causal mechanism!" % node)
|
||||
|
||||
|
||||
def clone_causal_models(source: HasNodes, destination: HasNodes):
|
||||
for node in destination.nodes:
|
||||
if CAUSAL_MECHANISM in source.nodes[node]:
|
||||
destination.nodes[node][CAUSAL_MECHANISM] = source.nodes[node][CAUSAL_MECHANISM].clone()
|
|
@@ -10,7 +10,7 @@ import numpy as np
 import pandas as pd

 from dowhy.gcm import auto
-from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
+from dowhy.gcm.causal_models import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
 from dowhy.gcm.fitting_sampling import fit

 # A convenience function when computing confidence intervals specifically for non-deterministic causal queries. This

@ -14,22 +14,19 @@ from statsmodels.stats.multitest import multipletests
|
|||
from tqdm import tqdm
|
||||
|
||||
from dowhy.gcm.auto import AssignmentQuality, assign_causal_mechanisms
|
||||
from dowhy.gcm.cms import ProbabilisticCausalModel
|
||||
from dowhy.gcm.divergence import auto_estimate_kl_divergence
|
||||
from dowhy.gcm.fitting_sampling import draw_samples, fit_causal_model_of_target
|
||||
from dowhy.gcm.graph import (
|
||||
from dowhy.gcm.causal_mechanisms import ConditionalStochasticModel
|
||||
from dowhy.gcm.causal_models import (
|
||||
PARENTS_DURING_FIT,
|
||||
ConditionalStochasticModel,
|
||||
DirectedGraph,
|
||||
ProbabilisticCausalModel,
|
||||
clone_causal_models,
|
||||
get_ordered_predecessors,
|
||||
is_root_node,
|
||||
node_connected_subgraph_view,
|
||||
validate_causal_dag,
|
||||
)
|
||||
from dowhy.gcm.divergence import auto_estimate_kl_divergence
|
||||
from dowhy.gcm.fitting_sampling import draw_samples, fit_causal_model_of_target
|
||||
from dowhy.gcm.independence_test.kernel import kernel_based
|
||||
from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values
|
||||
from dowhy.gcm.util.general import shape_into_2d
|
||||
from dowhy.graph import DirectedGraph, get_ordered_predecessors, is_root_node, node_connected_subgraph_view
|
||||
|
||||
_logger = logging.getLogger(__name__)
|
||||
|
||||
|
|
|
@@ -16,11 +16,11 @@ from joblib import Parallel, delayed
 from tqdm import tqdm

 import dowhy.gcm.config as config
-from dowhy.gcm.graph import DirectedGraph, get_ordered_predecessors
 from dowhy.gcm.independence_test import kernel_based
 from dowhy.gcm.util import plot
 from dowhy.gcm.util.general import set_random_seed
 from dowhy.gcm.validation import _get_non_descendants
+from dowhy.graph import DirectedGraph, get_ordered_predecessors

 COLORS = list(mcolors.TABLEAU_COLORS.values())

@@ -6,7 +6,7 @@ import numpy as np
 import pandas as pd

 from dowhy.gcm import feature_relevance
-from dowhy.gcm.cms import StructuralCausalModel
+from dowhy.gcm.causal_models import StructuralCausalModel
 from dowhy.gcm.shapley import ShapleyConfig

@ -10,13 +10,13 @@ from typing import Any, Callable, Dict, Optional, Tuple, Union
|
|||
import numpy as np
|
||||
import pandas as pd
|
||||
|
||||
from dowhy.gcm.cms import StructuralCausalModel
|
||||
from dowhy.gcm.fcms import ProbabilityEstimatorModel
|
||||
from dowhy.gcm.causal_mechanisms import ProbabilityEstimatorModel
|
||||
from dowhy.gcm.causal_models import StructuralCausalModel, validate_node
|
||||
from dowhy.gcm.fitting_sampling import draw_samples
|
||||
from dowhy.gcm.graph import get_ordered_predecessors, is_root_node, validate_node
|
||||
from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values
|
||||
from dowhy.gcm.stats import marginal_expectation
|
||||
from dowhy.gcm.util.general import shape_into_2d, variance_of_deviations, variance_of_matching_values
|
||||
from dowhy.graph import get_ordered_predecessors, is_root_node
|
||||
|
||||
|
||||
def parent_relevance(
|
||||
|
|
|
@ -11,14 +11,13 @@ import pandas as pd
|
|||
from tqdm import tqdm
|
||||
|
||||
from dowhy.gcm import config
|
||||
from dowhy.gcm.cms import ProbabilisticCausalModel
|
||||
from dowhy.gcm.graph import (
|
||||
from dowhy.gcm.causal_models import (
|
||||
PARENTS_DURING_FIT,
|
||||
get_ordered_predecessors,
|
||||
is_root_node,
|
||||
ProbabilisticCausalModel,
|
||||
validate_causal_dag,
|
||||
validate_causal_model_assignment,
|
||||
)
|
||||
from dowhy.graph import get_ordered_predecessors, is_root_node
|
||||
|
||||
|
||||
def fit(causal_model: ProbabilisticCausalModel, data: pd.DataFrame):
|
||||
|
|
|
@ -1,201 +0,0 @@
|
|||
"""This module defines the fundamental interfaces and functions related to causal graphs in graphical causal models.
|
||||
|
||||
Classes and functions in this module should be considered experimental, meaning there might be breaking API changes in
|
||||
the future.
|
||||
"""
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from typing import Any, List
|
||||
|
||||
import networkx as nx
|
||||
import numpy as np
|
||||
from networkx.algorithms.dag import has_cycle
|
||||
from typing_extensions import Protocol
|
||||
|
||||
# This constant is used as key when storing/accessing models as causal mechanisms in graph node attributes
|
||||
CAUSAL_MECHANISM = "causal_mechanism"
|
||||
|
||||
# This constant is used as key when storing the parents of a node during fitting. It's used for validation purposes
|
||||
# afterwards.
|
||||
PARENTS_DURING_FIT = "parents_during_fit"
|
||||
|
||||
|
||||
class HasNodes(Protocol):
|
||||
"""This protocol defines a trait for classes having nodes."""
|
||||
|
||||
@property
|
||||
@abstractmethod
|
||||
def nodes(self):
|
||||
""":returns Dict[Any, Dict[Any, Any]]"""
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class HasEdges(Protocol):
|
||||
"""This protocol defines a trait for classes having edges."""
|
||||
|
||||
@property
|
||||
@abstractmethod
|
||||
def edges(self):
|
||||
""":returns a Dict[Tuple[Any, Any], Dict[Any, Any]]"""
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class DirectedGraph(HasNodes, HasEdges, Protocol):
|
||||
"""A protocol representing a directed graph as needed by graphical causal models.
|
||||
|
||||
This protocol specifically defines a subset of the networkx.DiGraph class, which makes that class automatically
compatible with DirectedGraph. While in most cases a networkx.DiGraph is the class of choice when constructing
a causal graph, anyone can choose to provide their own implementation of the DirectedGraph interface.
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def predecessors(self, node):
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class StochasticModel(ABC):
|
||||
"""A stochastic model represents a model used for causal mechanisms for root nodes in a graphical causal model."""
|
||||
|
||||
@abstractmethod
|
||||
def fit(self, X: np.ndarray) -> None:
|
||||
"""Fits the model according to the data."""
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def draw_samples(self, num_samples: int) -> np.ndarray:
|
||||
"""Draws samples for the fitted model."""
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def clone(self):
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class ConditionalStochasticModel(ABC):
|
||||
"""A conditional stochastic model represents a model used for causal mechanisms for non-root nodes in a graphical
|
||||
causal model."""
|
||||
|
||||
@abstractmethod
|
||||
def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
|
||||
"""Fits the model according to the data."""
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
|
||||
"""Draws samples for the fitted model."""
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def clone(self):
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class FunctionalCausalModel(ConditionalStochasticModel):
|
||||
"""Represents a Functional Causal Model (FCM), a specific type of conditional stochastic model, that is defined
|
||||
as:
|
||||
Y := f(X, N), N: Noise
|
||||
"""
|
||||
|
||||
def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
|
||||
return self.evaluate(parent_samples, self.draw_noise_samples(parent_samples.shape[0]))
|
||||
|
||||
@abstractmethod
|
||||
def draw_noise_samples(self, num_samples: int) -> np.ndarray:
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def evaluate(self, parent_samples: np.ndarray, noise_samples: np.ndarray) -> np.ndarray:
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class InvertibleFunctionalCausalModel(FunctionalCausalModel, ABC):
|
||||
@abstractmethod
|
||||
def estimate_noise(self, target_samples: np.ndarray, parent_samples: np.ndarray) -> np.ndarray:
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
def is_root_node(causal_graph: DirectedGraph, node: Any) -> bool:
|
||||
return list(causal_graph.predecessors(node)) == []
|
||||
|
||||
|
||||
def get_ordered_predecessors(causal_graph: DirectedGraph, node: Any) -> List[Any]:
|
||||
"""This function returns predecessors of a node in a well-defined order.
|
||||
|
||||
This is necessary, because we select subsets of columns in Dataframes by using a node's parents, and these parents
|
||||
might not be returned in a reliable order.
|
||||
"""
|
||||
return sorted(causal_graph.predecessors(node))
|
||||
|
||||
|
||||
def node_connected_subgraph_view(g: DirectedGraph, node: Any) -> Any:
|
||||
"""Returns a view of the provided graph g that contains only nodes connected to the node passed in"""
|
||||
# can't use nx.node_connected_component, because it doesn't work with DiGraphs.
|
||||
# Hence a manual loop:
|
||||
return nx.induced_subgraph(g, [n for n in g.nodes if nx.has_path(g, n, node)])
|
||||
|
||||
|
||||
def clone_causal_models(source: HasNodes, destination: HasNodes):
|
||||
for node in destination.nodes:
|
||||
if CAUSAL_MECHANISM in source.nodes[node]:
|
||||
destination.nodes[node][CAUSAL_MECHANISM] = source.nodes[node][CAUSAL_MECHANISM].clone()
|
||||
|
||||
|
||||
def validate_acyclic(causal_graph: DirectedGraph) -> None:
|
||||
if has_cycle(causal_graph):
|
||||
raise RuntimeError("The graph contains a cycle, but an acyclic graph is expected!")
|
||||
|
||||
|
||||
def validate_causal_dag(causal_graph: DirectedGraph) -> None:
|
||||
validate_acyclic(causal_graph)
|
||||
validate_causal_graph(causal_graph)
|
||||
|
||||
|
||||
def validate_causal_graph(causal_graph: DirectedGraph) -> None:
|
||||
for node in causal_graph.nodes:
|
||||
validate_node(causal_graph, node)
|
||||
|
||||
|
||||
def validate_node(causal_graph: DirectedGraph, node: Any) -> None:
|
||||
validate_causal_model_assignment(causal_graph, node)
|
||||
validate_local_structure(causal_graph, node)
|
||||
|
||||
|
||||
def validate_causal_model_assignment(causal_graph: DirectedGraph, target_node: Any) -> None:
|
||||
validate_node_has_causal_model(causal_graph, target_node)
|
||||
|
||||
causal_model = causal_graph.nodes[target_node][CAUSAL_MECHANISM]
|
||||
|
||||
if is_root_node(causal_graph, target_node):
|
||||
if not isinstance(causal_model, StochasticModel):
|
||||
raise RuntimeError(
|
||||
"Node %s is a root node and, thus, requires a StochasticModel, "
|
||||
"but a %s was found!" % (target_node, causal_model)
|
||||
)
|
||||
elif not isinstance(causal_model, ConditionalStochasticModel):
|
||||
raise RuntimeError(
|
||||
"Node %s has parents and, thus, requires a ConditionalStochasticModel, "
|
||||
"but a %s was found!" % (target_node, causal_model)
|
||||
)
|
||||
|
||||
|
||||
def validate_local_structure(causal_graph: DirectedGraph, node: Any) -> None:
|
||||
if PARENTS_DURING_FIT not in causal_graph.nodes[node] or causal_graph.nodes[node][
|
||||
PARENTS_DURING_FIT
|
||||
] != get_ordered_predecessors(causal_graph, node):
|
||||
raise RuntimeError(
|
||||
"The causal mechanism of node %s is not fitted to the graphical structure! Fit all"
|
||||
"causal models in the graph first. If the mechanism is already fitted based on the causal"
|
||||
"parents, consider to update the persisted parents for that node manually." % node
|
||||
)
|
||||
|
||||
|
||||
def validate_node_has_causal_model(causal_graph: HasNodes, node: Any) -> None:
|
||||
validate_node_in_graph(causal_graph, node)
|
||||
|
||||
if CAUSAL_MECHANISM not in causal_graph.nodes[node]:
|
||||
raise ValueError("Node %s has no assigned causal mechanism!" % node)
|
||||
|
||||
|
||||
def validate_node_in_graph(causal_graph: HasNodes, node: Any) -> None:
|
||||
if node not in causal_graph.nodes:
|
||||
raise ValueError("Node %s can not be found in the given graph!" % node)
|
|
@@ -4,7 +4,7 @@ import numpy as np
 from scipy import stats

 from dowhy.gcm.auto import AssignmentQuality, select_model
-from dowhy.gcm.fcms import PredictionModel
+from dowhy.gcm.ml import PredictionModel
 from dowhy.gcm.util.general import is_categorical, shape_into_2d

@ -14,22 +14,22 @@ from numpy.matlib import repmat
|
|||
import dowhy.gcm.auto as auto
|
||||
from dowhy.gcm import feature_relevance_sample
|
||||
from dowhy.gcm._noise import compute_data_from_noise, compute_noise_from_data, noise_samples_of_ancestors
|
||||
from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
|
||||
from dowhy.gcm.divergence import estimate_kl_divergence_of_probabilities
|
||||
from dowhy.gcm.fcms import ClassificationModel, ClassifierFCM, PredictionModel, ProbabilityEstimatorModel
|
||||
from dowhy.gcm.fitting_sampling import draw_samples
|
||||
from dowhy.gcm.graph import (
|
||||
ConditionalStochasticModel,
|
||||
get_ordered_predecessors,
|
||||
is_root_node,
|
||||
node_connected_subgraph_view,
|
||||
from dowhy.gcm.causal_mechanisms import ClassifierFCM, ConditionalStochasticModel, ProbabilityEstimatorModel
|
||||
from dowhy.gcm.causal_models import (
|
||||
InvertibleStructuralCausalModel,
|
||||
ProbabilisticCausalModel,
|
||||
StructuralCausalModel,
|
||||
validate_causal_dag,
|
||||
validate_node,
|
||||
)
|
||||
from dowhy.gcm.divergence import estimate_kl_divergence_of_probabilities
|
||||
from dowhy.gcm.fitting_sampling import draw_samples
|
||||
from dowhy.gcm.ml import ClassificationModel, PredictionModel
|
||||
from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values
|
||||
from dowhy.gcm.stats import marginal_expectation
|
||||
from dowhy.gcm.uncertainty import estimate_entropy_of_probabilities, estimate_variance
|
||||
from dowhy.gcm.util.general import has_categorical, is_categorical, means_difference, set_random_seed, shape_into_2d
|
||||
from dowhy.graph import get_ordered_predecessors, is_root_node, node_connected_subgraph_view
|
||||
|
||||
_logger = logging.getLogger(__name__)
|
||||
|
||||
|
|
|
@ -1,9 +1,10 @@
|
|||
"""This module defines implementations of :class:`~dowhy.gcm.fcms.PredictionModel` used by the different
|
||||
:class:`~dowhy.gcm.graph.FunctionalCausalModel` implementations, such as :class:`~dowhy.gcm.fcms.PostNonlinearModel` or
|
||||
:class:`~dowhy.gcm.fcms.AdditiveNoiseModel`.
|
||||
"""This module defines implementations of :class:`~dowhy.gcm.ml.PredictionModel` used by the different
|
||||
:class:`~dowhy.gcm.graph.FunctionalCausalModel` implementations, such as :class:`~dowhy.gcm.causal_mechanisms.PostNonlinearModel` or
|
||||
:class:`~dowhy.gcm.causal_mechanisms.AdditiveNoiseModel`.
|
||||
"""
|
||||
|
||||
from .classification import (
|
||||
ClassificationModel,
|
||||
SklearnClassificationModel,
|
||||
create_gaussian_process_classifier,
|
||||
create_hist_gradient_boost_classifier,
|
||||
|
@ -11,7 +12,9 @@ from .classification import (
|
|||
create_polynom_logistic_regression_classifier,
|
||||
create_random_forest_classifier,
|
||||
)
|
||||
from .prediction_model import PredictionModel
|
||||
from .regression import (
|
||||
InvertibleFunction,
|
||||
SklearnRegressionModel,
|
||||
create_elastic_net_regressor,
|
||||
create_gaussian_process_regressor,
|
||||
|
|
|
@ -6,7 +6,7 @@ from autogluon import tabular
|
|||
from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config
|
||||
from packaging import version
|
||||
|
||||
from dowhy.gcm.fcms import ClassificationModel, PredictionModel
|
||||
from dowhy.gcm.ml import ClassificationModel, PredictionModel
|
||||
from dowhy.gcm.util.general import shape_into_2d
|
||||
|
||||
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
"""Functions and classes in this module should be considered experimental, meaning there might be breaking API changes
|
||||
in the future.
|
||||
"""
|
||||
|
||||
from abc import abstractmethod
|
||||
from typing import List
|
||||
|
||||
import numpy as np
|
||||
|
@ -10,6 +10,8 @@ from packaging import version
|
|||
from sklearn.pipeline import make_pipeline
|
||||
from sklearn.preprocessing import PolynomialFeatures
|
||||
|
||||
from dowhy.gcm.ml.prediction_model import PredictionModel
|
||||
|
||||
if version.parse(sklearn.__version__) < version.parse("1.0"):
|
||||
from sklearn.experimental import enable_hist_gradient_boosting # noqa
|
||||
|
||||
|
@ -25,11 +27,21 @@ from sklearn.naive_bayes import GaussianNB
|
|||
from sklearn.neighbors import KNeighborsClassifier
|
||||
from sklearn.svm import SVC
|
||||
|
||||
from dowhy.gcm.fcms import ClassificationModel
|
||||
from dowhy.gcm.ml.regression import SklearnRegressionModel
|
||||
from dowhy.gcm.util.general import auto_apply_encoders, shape_into_2d
|
||||
|
||||
|
||||
class ClassificationModel(PredictionModel):
|
||||
@abstractmethod
|
||||
def predict_probabilities(self, X: np.array) -> np.ndarray:
|
||||
raise NotImplementedError
|
||||
|
||||
@property
|
||||
@abstractmethod
|
||||
def classes(self) -> List[str]:
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class SklearnClassificationModel(SklearnRegressionModel, ClassificationModel):
|
||||
def predict_probabilities(self, X: np.array) -> np.ndarray:
|
||||
return shape_into_2d(self._sklearn_mdl.predict_proba(auto_apply_encoders(X, self._encoders)))
|
||||
|
|
|
@ -0,0 +1,25 @@
|
|||
from abc import abstractmethod
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
class PredictionModel:
|
||||
"""Represents general prediction model implementations. Each prediction model should provide a fit and a predict
|
||||
method."""
|
||||
|
||||
@abstractmethod
|
||||
def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def predict(self, X: np.ndarray) -> np.ndarray:
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def clone(self):
|
||||
"""
|
||||
Clones the prediction model using the same hyperparameters but not fitted.
|
||||
|
||||
:return: An unfitted clone of the prediction model.
|
||||
"""
|
||||
raise NotImplementedError
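To illustrate the interface defined in this new file, here is a minimal sketch of a concrete prediction model. `PredictionModel` is reproduced locally so the example runs on its own. `MeanPredictionModel` is a hypothetical class that is not part of the commit, and the example assumes `Y` is passed as a 2D array.

```python
from abc import abstractmethod

import numpy as np


class PredictionModel:
    """Local copy of the interface from the diff, to keep the sketch self-contained."""

    @abstractmethod
    def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
        raise NotImplementedError

    @abstractmethod
    def predict(self, X: np.ndarray) -> np.ndarray:
        raise NotImplementedError

    @abstractmethod
    def clone(self):
        raise NotImplementedError


class MeanPredictionModel(PredictionModel):
    """Hypothetical model that ignores X and always predicts the training mean of Y."""

    def __init__(self) -> None:
        self._mean = None

    def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
        # Column-wise mean of the targets; Y is assumed to have shape (n_samples, n_targets).
        self._mean = np.mean(Y, axis=0)

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Repeat the stored mean once per input row.
        return np.tile(self._mean, (X.shape[0], 1))

    def clone(self) -> "MeanPredictionModel":
        # A fresh, unfitted instance; this model has no hyperparameters to copy.
        return MeanPredictionModel()


X = np.arange(6, dtype=float).reshape(3, 2)
Y = np.array([[1.0], [2.0], [3.0]])

model = MeanPredictionModel()
model.fit(X, Y)
predictions = model.predict(X)
unfitted_copy = model.clone()
```

The `clone` method returns a new object rather than a copy of the fitted state, which matches the docstring's requirement of an unfitted clone.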
|
|
@ -1,7 +1,7 @@
|
|||
"""Functions and classes in this module should be considered experimental, meaning there might be breaking API changes
|
||||
in the future.
|
||||
"""
|
||||
|
||||
from abc import abstractmethod
|
||||
from typing import Any
|
||||
|
||||
import numpy as np
|
||||
|
@ -24,7 +24,7 @@ from sklearn.linear_model import ElasticNetCV, LassoCV, LassoLarsIC, LinearRegre
|
|||
from sklearn.neighbors import KNeighborsRegressor
|
||||
from sklearn.svm import SVR
|
||||
|
||||
from dowhy.gcm.fcms import InvertibleFunction, PredictionModel
|
||||
from dowhy.gcm.ml.prediction_model import PredictionModel
|
||||
from dowhy.gcm.util.general import auto_apply_encoders, auto_fit_encoders, shape_into_2d
|
||||
|
||||
|
||||
|
@ -122,6 +122,18 @@ def create_polynom_regressor(degree: int = 2, **kwargs_linear_model) -> SklearnR
|
|||
)
|
||||
|
||||
|
||||
class InvertibleFunction:
|
||||
@abstractmethod
|
||||
def evaluate(self, X: np.ndarray) -> np.ndarray:
|
||||
"""Applies the function on the input."""
|
||||
raise NotImplementedError
|
||||
|
||||
@abstractmethod
|
||||
def evaluate_inverse(self, X: np.ndarray) -> np.ndarray:
|
||||
"""Returns the outcome of applying the inverse of the function on the inputs."""
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class InvertibleIdentityFunction(InvertibleFunction):
|
||||
def evaluate(self, X: np.ndarray) -> np.ndarray:
|
||||
return X
|
||||
|
|
|
@ -13,8 +13,8 @@ from sklearn.cluster import KMeans
|
|||
from sklearn.metrics import silhouette_score
|
||||
from sklearn.mixture import BayesianGaussianMixture
|
||||
|
||||
from dowhy.gcm.causal_mechanisms import StochasticModel
|
||||
from dowhy.gcm.divergence import estimate_kl_divergence_continuous
|
||||
from dowhy.gcm.graph import StochasticModel
|
||||
from dowhy.gcm.util.general import shape_into_2d
|
||||
|
||||
_CONTINUOUS_DISTRIBUTIONS = [
|
||||
|
|
|
@ -9,7 +9,7 @@ import pandas as pd
|
|||
from sklearn.linear_model._base import LinearModel
|
||||
from sklearn.utils.validation import check_is_fitted
|
||||
|
||||
from dowhy.gcm.fcms import PredictionModel
|
||||
from dowhy.gcm.ml.prediction_model import PredictionModel
|
||||
from dowhy.gcm.ml.regression import SklearnRegressionModel
|
||||
from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values
|
||||
|
||||
|
|
|
@ -11,9 +11,9 @@ import numpy as np
|
|||
import pandas as pd
|
||||
from statsmodels.stats.multitest import multipletests
|
||||
|
||||
from dowhy.gcm.cms import InvertibleStructuralCausalModel
|
||||
from dowhy.gcm.graph import DirectedGraph, get_ordered_predecessors, is_root_node, validate_causal_graph
|
||||
from dowhy.gcm.causal_models import InvertibleStructuralCausalModel, validate_causal_graph
|
||||
from dowhy.gcm.independence_test import kernel_based
|
||||
from dowhy.graph import DirectedGraph, get_ordered_predecessors, is_root_node
|
||||
|
||||
|
||||
class RejectionResult(Enum):
|
||||
|
|
|
@ -10,15 +10,19 @@ import numpy as np
|
|||
import pandas as pd
|
||||
|
||||
from dowhy.gcm._noise import compute_noise_from_data
|
||||
from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
|
||||
from dowhy.gcm.fcms import ClassifierFCM
|
||||
from dowhy.gcm.causal_mechanisms import ClassifierFCM
|
||||
from dowhy.gcm.causal_models import (
|
||||
InvertibleStructuralCausalModel,
|
||||
ProbabilisticCausalModel,
|
||||
StructuralCausalModel,
|
||||
validate_causal_dag,
|
||||
)
|
||||
from dowhy.gcm.fitting_sampling import draw_samples
|
||||
from dowhy.gcm.graph import (
|
||||
from dowhy.graph import (
|
||||
DirectedGraph,
|
||||
get_ordered_predecessors,
|
||||
is_root_node,
|
||||
node_connected_subgraph_view,
|
||||
validate_causal_dag,
|
||||
validate_node_in_graph,
|
||||
)
|
||||
|
||||
|
|
|
@ -0,0 +1,75 @@
|
|||
"""This module defines the fundamental interfaces and functions related to causal graphs.
|
||||
|
||||
Classes and functions in this module should be considered experimental, meaning there might be breaking API changes in
|
||||
the future.
|
||||
"""
|
||||
|
||||
from abc import abstractmethod
|
||||
from typing import Any, List
|
||||
|
||||
import networkx as nx
|
||||
from networkx.algorithms.dag import has_cycle
|
||||
from typing_extensions import Protocol
|
||||
|
||||
|
||||
class HasNodes(Protocol):
|
||||
"""This protocol defines a trait for classes having nodes."""
|
||||
|
||||
@property
|
||||
@abstractmethod
|
||||
def nodes(self):
|
||||
""":returns Dict[Any, Dict[Any, Any]]"""
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class HasEdges(Protocol):
|
||||
"""This protocol defines a trait for classes having edges."""
|
||||
|
||||
@property
|
||||
@abstractmethod
|
||||
def edges(self):
|
||||
""":returns a Dict[Tuple[Any, Any], Dict[Any, Any]]"""
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
class DirectedGraph(HasNodes, HasEdges, Protocol):
|
||||
"""A protocol representing a directed graph as needed by graphical causal models.
|
||||
|
||||
This protocol specifically defines a subset of the networkx.DiGraph class, which makes that class automatically
|
||||
compatible with DirectedGraph. While in most cases a networkx.DiGraph is the class of choice when constructing
|
||||
a causal graph, anyone can choose to provide their own implementation of the DirectedGraph interface.
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def predecessors(self, node):
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
def is_root_node(causal_graph: DirectedGraph, node: Any) -> bool:
|
||||
return list(causal_graph.predecessors(node)) == []
|
||||
|
||||
|
||||
def get_ordered_predecessors(causal_graph: DirectedGraph, node: Any) -> List[Any]:
|
||||
"""This function returns predecessors of a node in a well-defined order.
|
||||
|
||||
This is necessary because we select subsets of columns in DataFrames by using a node's parents, and these parents
|
||||
might not be returned in a reliable order.
|
||||
"""
|
||||
return sorted(causal_graph.predecessors(node))
|
||||
|
||||
|
||||
def node_connected_subgraph_view(g: DirectedGraph, node: Any) -> Any:
|
||||
"""Returns a view of the provided graph g that contains only nodes connected to the node passed in."""
|
||||
# can't use nx.node_connected_component, because it doesn't work with DiGraphs.
|
||||
# Hence, a manual loop:
|
||||
return nx.induced_subgraph(g, [n for n in g.nodes if nx.has_path(g, n, node)])
|
||||
|
||||
|
||||
def validate_acyclic(causal_graph: DirectedGraph) -> None:
|
||||
if has_cycle(causal_graph):
|
||||
raise RuntimeError("The graph contains a cycle, but an acyclic graph is expected!")
|
||||
|
||||
|
||||
def validate_node_in_graph(causal_graph: HasNodes, node: Any) -> None:
|
||||
if node not in causal_graph.nodes:
|
||||
raise ValueError("Node %s can not be found in the given graph!" % node)
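The `DirectedGraph` docstring notes that any class exposing `nodes`, `edges` and `predecessors` can stand in for `networkx.DiGraph`. The sketch below shows what such an implementation could look like. `TinyDiGraph` is a hypothetical class, and `is_root_node` and `get_ordered_predecessors` are reproduced from the diff so the example has no dependency on dowhy or networkx.

```python
from typing import Any, Dict, Iterator, List, Tuple


class TinyDiGraph:
    """Hypothetical minimal graph exposing nodes, edges and predecessors()."""

    def __init__(self, edges: List[Tuple[Any, Any]]) -> None:
        self._nodes: Dict[Any, Dict[Any, Any]] = {}
        self._edges: Dict[Tuple[Any, Any], Dict[Any, Any]] = {}
        for parent, child in edges:
            self._nodes.setdefault(parent, {})
            self._nodes.setdefault(child, {})
            self._edges[(parent, child)] = {}

    @property
    def nodes(self) -> Dict[Any, Dict[Any, Any]]:
        return self._nodes

    @property
    def edges(self) -> Dict[Tuple[Any, Any], Dict[Any, Any]]:
        return self._edges

    def predecessors(self, node: Any) -> Iterator[Any]:
        # Yield every parent that has an edge pointing at the given node.
        return (parent for (parent, child) in self._edges if child == node)


def is_root_node(causal_graph: TinyDiGraph, node: Any) -> bool:
    return list(causal_graph.predecessors(node)) == []


def get_ordered_predecessors(causal_graph: TinyDiGraph, node: Any) -> List[Any]:
    # Sorting gives the same parent order regardless of edge insertion order.
    return sorted(causal_graph.predecessors(node))


# Edges are inserted as Z -> Y first, then X -> Y.
graph = TinyDiGraph([("Z", "Y"), ("X", "Y")])
x_is_root = is_root_node(graph, "X")
y_is_root = is_root_node(graph, "Y")
parents_of_y = get_ordered_predecessors(graph, "Y")
```

Here the edges are added with `Z` before `X`, yet `parents_of_y` comes back sorted, which is the stable ordering the docstring of `get_ordered_predecessors` asks for. Helpers that call networkx functions directly, such as `node_connected_subgraph_view`, would likely still need a real networkx graph.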
|
|
@ -5,7 +5,7 @@ from flaky import flaky
|
|||
from pytest import approx, importorskip, mark
|
||||
from sklearn.model_selection import train_test_split
|
||||
|
||||
from dowhy.gcm.fcms import AdditiveNoiseModel, ClassifierFCM
|
||||
from dowhy.gcm.causal_mechanisms import AdditiveNoiseModel, ClassifierFCM
|
||||
|
||||
autogluon = importorskip("dowhy.gcm.ml.autogluon")
|
||||
from dowhy.gcm.ml.autogluon import AutoGluonClassifier, AutoGluonRegressor
|
||||
|
|
|
@ -9,13 +9,13 @@ from dowhy.gcm import (
|
|||
InverseDensityScorer,
|
||||
InvertibleStructuralCausalModel,
|
||||
MedianCDFQuantileScorer,
|
||||
PredictionModel,
|
||||
attribute_anomalies,
|
||||
auto,
|
||||
fit,
|
||||
)
|
||||
from dowhy.gcm.anomaly import _relative_frequency, attribute_anomaly_scores
|
||||
from dowhy.gcm.density_estimators import GaussianMixtureDensityEstimator
|
||||
from dowhy.gcm.ml import PredictionModel
|
||||
|
||||
|
||||
@flaky(max_runs=3)
|
||||
|
|
|
@ -5,15 +5,9 @@ import pytest
|
|||
from flaky import flaky
|
||||
from pytest import approx
|
||||
|
||||
from dowhy.gcm import (
|
||||
AdditiveNoiseModel,
|
||||
EmpiricalDistribution,
|
||||
ProbabilisticCausalModel,
|
||||
draw_samples,
|
||||
fit,
|
||||
is_root_node,
|
||||
)
|
||||
from dowhy.gcm import AdditiveNoiseModel, EmpiricalDistribution, ProbabilisticCausalModel, draw_samples, fit
|
||||
from dowhy.gcm.ml import create_linear_regressor
|
||||
from dowhy.graph import is_root_node
|
||||
|
||||
|
||||
@flaky(max_runs=2)
|
||||
|
|
|
@ -16,11 +16,11 @@ from dowhy.gcm import (
|
|||
intrinsic_causal_influence,
|
||||
)
|
||||
from dowhy.gcm._noise import noise_samples_of_ancestors
|
||||
from dowhy.gcm.graph import node_connected_subgraph_view
|
||||
from dowhy.gcm.influence import intrinsic_causal_influence_sample
|
||||
from dowhy.gcm.ml import create_hist_gradient_boost_classifier, create_linear_regressor_with_given_parameters
|
||||
from dowhy.gcm.uncertainty import estimate_entropy_of_probabilities, estimate_variance
|
||||
from dowhy.gcm.util.general import apply_one_hot_encoding, fit_one_hot_encoders
|
||||
from dowhy.graph import node_connected_subgraph_view
|
||||
from tests.gcm.test_noise import _persist_parents
|
||||
|
||||
|
||||
|
|
|
@ -6,7 +6,6 @@ from flaky import flaky
|
|||
|
||||
from dowhy.gcm import (
|
||||
AdditiveNoiseModel,
|
||||
DirectedGraph,
|
||||
EmpiricalDistribution,
|
||||
InvertibleStructuralCausalModel,
|
||||
StructuralCausalModel,
|
||||
|
@ -14,12 +13,13 @@ from dowhy.gcm import (
|
|||
)
|
||||
from dowhy.gcm._noise import compute_data_from_noise, compute_noise_from_data, get_noise_dependent_function
|
||||
from dowhy.gcm.auto import assign_causal_mechanisms
|
||||
from dowhy.gcm.graph import PARENTS_DURING_FIT, get_ordered_predecessors
|
||||
from dowhy.gcm.causal_models import PARENTS_DURING_FIT
|
||||
from dowhy.gcm.ml import (
|
||||
create_linear_regressor,
|
||||
create_linear_regressor_with_given_parameters,
|
||||
create_logistic_regression_classifier,
|
||||
)
|
||||
from dowhy.graph import DirectedGraph, get_ordered_predecessors
|
||||
|
||||
|
||||
def test_given_data_with_known_noise_values_when_compute_data_from_noise_then_returns_correct_values():
|
||||
|
|