Refactor graph.py, fcms.py and cms.py in gcm module

- fcms.py is now called causal_mechanisms.py
- cms.py is now called causal_models.py
- StochasticModel, ConditionalStochasticModel, FunctionalCausalModel and InvertibleFunctionalCausalModel are now part of causal_mechanisms.py instead of graph.py
- graph.py has been moved to the top-level dowhy package (dowhy.graph) in preparation for replacing the CausalModel class
- causal_models.py now contains only the causal model classes ProbabilisticCausalModel, StructuralCausalModel and InvertibleStructuralCausalModel, together with all validation functions related to causal models.
- PredictionModel, ClassificationModel and InvertibleFunction are now part of the gcm.ml module instead of causal_mechanisms.py
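
Downstream code that imported from the old modules has to be updated. The sketch below is a hypothetical migration helper (not part of dowhy): the table records, per old module, where each moved symbol lives after this commit, as read off the import changes in this diff, and `rewrite_import` splits a single-line `from ... import ...` statement accordingly. Only symbols visible in this diff are covered.

```python
from typing import Dict, List

# Old module -> {new module -> moved symbols}, taken from the import changes in this commit.
_MOVES: Dict[str, Dict[str, List[str]]] = {
    "dowhy.gcm.cms": {
        "dowhy.gcm.causal_models": [
            "ProbabilisticCausalModel", "StructuralCausalModel", "InvertibleStructuralCausalModel",
        ],
    },
    "dowhy.gcm.fcms": {
        "dowhy.gcm.causal_mechanisms": [
            "AdditiveNoiseModel", "PostNonlinearModel", "ClassifierFCM", "ProbabilityEstimatorModel",
        ],
        "dowhy.gcm.ml": ["PredictionModel", "ClassificationModel", "InvertibleFunction"],
    },
    "dowhy.gcm.graph": {
        "dowhy.gcm.causal_mechanisms": [
            "StochasticModel", "ConditionalStochasticModel",
            "FunctionalCausalModel", "InvertibleFunctionalCausalModel",
        ],
        "dowhy.gcm.causal_models": [
            "CAUSAL_MECHANISM", "PARENTS_DURING_FIT", "clone_causal_models",
            "validate_causal_dag", "validate_causal_graph", "validate_node",
            "validate_causal_model_assignment", "validate_local_structure",
            "validate_node_has_causal_model",
        ],
        "dowhy.graph": [
            "DirectedGraph", "HasNodes", "is_root_node", "get_ordered_predecessors",
            "node_connected_subgraph_view", "validate_acyclic", "validate_node_in_graph",
        ],
    },
}


def rewrite_import(line: str) -> List[str]:
    """Rewrite 'from <old module> import a, b' into one import line per new module.

    Lines that do not import from a moved module are returned unchanged.
    """
    head, sep, names = line.strip().partition(" import ")
    old_module = head[len("from "):] if head.startswith("from ") else ""
    if not sep or old_module not in _MOVES:
        return [line]
    # Invert the table for this old module: symbol -> new module.
    symbol_to_module = {
        symbol: new_module
        for new_module, symbols in _MOVES[old_module].items()
        for symbol in symbols
    }
    grouped: Dict[str, List[str]] = {}
    for name in (n.strip() for n in names.split(",")):
        # Symbols missing from the table stay under the old module so nothing is silently dropped.
        grouped.setdefault(symbol_to_module.get(name, old_module), []).append(name)
    return [
        "from %s import %s" % (module, ", ".join(sorted(symbols)))
        for module, symbols in sorted(grouped.items())
    ]
```

For example, `rewrite_import("from dowhy.gcm.fcms import AdditiveNoiseModel, PredictionModel")` yields one line importing from `dowhy.gcm.causal_mechanisms` and one importing from `dowhy.gcm.ml`, matching the changes in the diff. Code that only uses the package-level names (e.g. `gcm.PredictionModel`) should be unaffected, since `dowhy/gcm/__init__.py` keeps re-exporting them.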

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>
Patrick Bloebaum 2023-04-28 13:32:46 -07:00, committed by Patrick Blöbaum
Parent 0fb1314c5d
Commit 961ffc6373
32 changed files with 329 additions and 317 deletions

@ -46,10 +46,10 @@ dowhy.gcm.auto module
:undoc-members:
:show-inheritance:
dowhy.gcm.cms module
dowhy.gcm.causal_models module
--------------------
.. automodule:: dowhy.gcm.cms
.. automodule:: dowhy.gcm.causal_models
:members:
:undoc-members:
:show-inheritance:
@ -118,10 +118,10 @@ dowhy.gcm.divergence module
:undoc-members:
:show-inheritance:
dowhy.gcm.fcms module
dowhy.gcm.causal_mechanisms module
---------------------
.. automodule:: dowhy.gcm.fcms
.. automodule:: dowhy.gcm.causal_mechanisms
:members:
:undoc-members:
:show-inheritance:
@ -142,10 +142,10 @@ dowhy.gcm.fitting\_sampling module
:undoc-members:
:show-inheritance:
dowhy.gcm.graph module
dowhy.graph module
----------------------
.. automodule:: dowhy.gcm.graph
.. automodule:: dowhy.graph
:members:
:undoc-members:
:show-inheritance:

@ -18,13 +18,13 @@ distinguishes between these two types of nodes.
For root nodes such as :math:`X`, the distribution :math:`P_x` is modeled using a stochastic model.
Non-root nodes such as :math:`Y` are modelled using a *conditional* stochastic model. DoWhy's gcm package
defines corresponding interfaces for both, namely :class:`~dowhy.gcm.graph.StochasticModel` and
:class:`~dowhy.gcm.graph.ConditionalStochasticModel`.
defines corresponding interfaces for both, namely :class:`~dowhy.gcm.causal_mechanisms.StochasticModel` and
:class:`~dowhy.gcm.causal_mechanisms.ConditionalStochasticModel`.
The gcm package also provides ready-to-use implementations, such as :class:`~dowhy.gcm.stochastic_models
.ScipyDistribution` or :class:`~dowhy.gcm.stochastic_models.BayesianGaussianMixtureDistribution` for
:class:`~dowhy.gcm.graph.StochasticModel`, and :class:`~dowhy.gcm.fcms.AdditiveNoiseModel` for
:class:`~dowhy.gcm.graph.ConditionalStochasticModel`.
:class:`~dowhy.gcm.causal_mechanisms.StochasticModel`, and :class:`~dowhy.gcm.causal_mechanisms.AdditiveNoiseModel` for
:class:`~dowhy.gcm.causal_mechanisms.ConditionalStochasticModel`.
Knowing that, we can now start to manually assign causal models to nodes according to our needs.
Say, we know from domain knowledge, that our root node X follows a normal distribution. In this
@ -38,7 +38,7 @@ case, we can explicitly assign this:
>>> causal_model.set_causal_mechanism('X', gcm.ScipyDistribution(norm))
For the non-root node Y, let's use an additive noise model (ANM), represented by the
:class:`~dowhy.gcm.fcms.AdditiveNoiseModel` class. It has a
:class:`~dowhy.gcm.causal_mechanisms.AdditiveNoiseModel` class. It has a
structural assignment of the form: :math:`Y := f(X) + N`. Here, f is a deterministic prediction
function, whereas N is a noise term. Let's put all of this together:
@ -48,7 +48,7 @@ function, whereas N is a noise term. Let's put all of this together:
The rather interesting part here is the ``prediction_model``, which corresponds to our function
:math:`f` above. This prediction model must satisfy the contract defined by
:class:`~dowhy.gcm.fcms.PredictionModel`, i.e. it must implement the methods::
:class:`~dowhy.gcm.ml.PredictionModel`, i.e. it must implement the methods::
def fit(self, X: np.ndarray, Y: np.ndarray) -> None: ...
def predict(self, X: np.ndarray) -> np.ndarray: ...
@ -79,9 +79,9 @@ Using ground truth models
In some scenarios the ground truth models might be known and should be used instead. Let's
assume, we know that our relationship are linear with coefficients :math:`\alpha = 2` and
:math:`\beta = 3`. Let's make use of this knowledge by creating a custom prediction model that
implements the :class:`~dowhy.gcm.fcms.PredictionModel` interface:
implements the :class:`~dowhy.gcm.ml.PredictionModel` interface:
>>> class MyCustomModel(gcm.PredictionModel):
>>> import dowhy.gcm.ml.prediction_model
>>> class MyCustomModel(gcm.ml.PredictionModel):
>>> def __init__(self, coefficient):
>>> self.coefficient = coefficient
>>>
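
The contract that moved to `dowhy.gcm.ml.PredictionModel` is just `fit`, `predict` and `clone`. The following is a minimal, self-contained sketch of the ground-truth model described above, assuming only numpy: it duck-types that contract instead of subclassing `gcm.ml.PredictionModel` so that it runs without dowhy installed.

```python
import numpy as np


class MyCustomModel:
    """Ground-truth linear prediction model f(X) = coefficient * X.

    Sketch only: in dowhy this class would subclass gcm.ml.PredictionModel; here it
    only mirrors the fit/predict/clone contract.
    """

    def __init__(self, coefficient: float):
        self.coefficient = coefficient

    def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
        # The coefficient is known a priori, so there is nothing to learn from the data.
        pass

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.coefficient * np.asarray(X, dtype=float)

    def clone(self) -> "MyCustomModel":
        # Same hyperparameters, new (unfitted) instance, as the clone() docstring requires.
        return MyCustomModel(self.coefficient)
```

With dowhy installed, an instance such as `MyCustomModel(2)` would be passed as the `prediction_model` of an `AdditiveNoiseModel`, as in the snippet above.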

@ -13,15 +13,14 @@ from .anomaly_scorers import (
MedianDeviationScorer,
RescaledMedianCDFQuantileScorer,
)
from .cms import FunctionalCausalModel, InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
from .causal_mechanisms import AdditiveNoiseModel, ClassifierFCM, PostNonlinearModel
from .causal_models import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
from .confidence_intervals import confidence_intervals
from .confidence_intervals_cms import bootstrap_sampling, fit_and_compute
from .density_estimators import GaussianMixtureDensityEstimator, KernelDensityEstimator1D
from .distribution_change import distribution_change, distribution_change_of_graphs
from .fcms import AdditiveNoiseModel, ClassificationModel, ClassifierFCM, PostNonlinearModel, PredictionModel
from .feature_relevance import feature_relevance_distribution, feature_relevance_sample, parent_relevance
from .fitting_sampling import draw_samples, fit
from .graph import ConditionalStochasticModel, DirectedGraph, FunctionalCausalModel, StochasticModel, is_root_node
from .independence_test import (
approx_kernel_based,
generalised_cov_based,
@ -30,6 +29,7 @@ from .independence_test import (
regression_based,
)
from .influence import arrow_strength, intrinsic_causal_influence
from .ml import ClassificationModel, PredictionModel
from .stochastic_models import BayesianGaussianMixtureDistribution, EmpiricalDistribution, ScipyDistribution
from .unit_change import unit_change
from .validation import RejectionResult, refute_causal_structure, refute_invertible_model

@ -4,10 +4,15 @@ import networkx as nx
import numpy as np
import pandas as pd
from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
from dowhy.gcm.fcms import PredictionModel
from dowhy.gcm.graph import get_ordered_predecessors, is_root_node, node_connected_subgraph_view, validate_causal_dag
from dowhy.gcm.causal_models import (
InvertibleStructuralCausalModel,
ProbabilisticCausalModel,
StructuralCausalModel,
validate_causal_dag,
)
from dowhy.gcm.ml.prediction_model import PredictionModel
from dowhy.gcm.util.general import shape_into_2d
from dowhy.graph import get_ordered_predecessors, is_root_node, node_connected_subgraph_view
def compute_data_from_noise(causal_model: StructuralCausalModel, noise_data: pd.DataFrame) -> pd.DataFrame:

@ -9,11 +9,12 @@ from dowhy.gcm import config
from dowhy.gcm._noise import compute_noise_from_data, get_noise_dependent_function, noise_samples_of_ancestors
from dowhy.gcm.anomaly_scorer import AnomalyScorer
from dowhy.gcm.anomaly_scorers import MedianCDFQuantileScorer, RescaledMedianCDFQuantileScorer
from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel
from dowhy.gcm.graph import ConditionalStochasticModel, get_ordered_predecessors, is_root_node, validate_causal_dag
from dowhy.gcm.causal_mechanisms import ConditionalStochasticModel
from dowhy.gcm.causal_models import InvertibleStructuralCausalModel, ProbabilisticCausalModel, validate_causal_dag
from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values
from dowhy.gcm.stats import permute_features
from dowhy.gcm.util.general import shape_into_2d
from dowhy.graph import get_ordered_predecessors, is_root_node
def conditional_anomaly_scores(

@ -13,10 +13,11 @@ from sklearn.model_selection import KFold, train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
from dowhy.gcm import config
from dowhy.gcm.cms import ProbabilisticCausalModel
from dowhy.gcm.fcms import AdditiveNoiseModel, ClassificationModel, ClassifierFCM, PredictionModel
from dowhy.gcm.graph import CAUSAL_MECHANISM, get_ordered_predecessors, is_root_node, validate_causal_model_assignment
from dowhy.gcm.causal_mechanisms import AdditiveNoiseModel, ClassifierFCM
from dowhy.gcm.causal_models import CAUSAL_MECHANISM, ProbabilisticCausalModel, validate_causal_model_assignment
from dowhy.gcm.ml import (
ClassificationModel,
PredictionModel,
create_hist_gradient_boost_classifier,
create_hist_gradient_boost_regressor,
create_lasso_regressor,
@ -49,6 +50,7 @@ from dowhy.gcm.util.general import (
set_random_seed,
shape_into_2d,
)
from dowhy.graph import get_ordered_predecessors, is_root_node
_LIST_OF_POTENTIAL_CLASSIFIERS_GOOD = [
partial(create_logistic_regression_classifier, max_iter=1000),

@ -1,5 +1,4 @@
"""This module defines multiple implementations of the abstract class :class:`~dowhy.gcm.graph.FunctionalCausalModel`
(FCM)
"""This module implements different causal mechanisms.
Classes in this module should be considered experimental, meaning there might be breaking API changes in the future.
"""
@ -10,52 +9,69 @@ from typing import List, Optional
import numpy as np
from dowhy.gcm.graph import FunctionalCausalModel, InvertibleFunctionalCausalModel, StochasticModel
from dowhy.gcm.ml import ClassificationModel, PredictionModel
from dowhy.gcm.ml.regression import InvertibleFunction
from dowhy.gcm.util.general import is_categorical, shape_into_2d
class PredictionModel:
"""Represents general prediction model implementations. Each prediction model should provide a fit and a predict
method."""
class StochasticModel(ABC):
"""A stochastic model represents a model used for causal mechanisms for root nodes in a graphical causal model."""
@abstractmethod
def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
def fit(self, X: np.ndarray) -> None:
"""Fits the model according to the data."""
raise NotImplementedError
@abstractmethod
def predict(self, X: np.ndarray) -> np.ndarray:
def draw_samples(self, num_samples: int) -> np.ndarray:
"""Draws samples for the fitted model."""
raise NotImplementedError
@abstractmethod
def clone(self):
"""
Clones the prediction model using the same hyper parameters but not fitted.
:return: An unfitted clone of the prediction model.
"""
raise NotImplementedError
class ClassificationModel(PredictionModel):
@abstractmethod
def predict_probabilities(self, X: np.array) -> np.ndarray:
raise NotImplementedError
class ConditionalStochasticModel(ABC):
"""A conditional stochastic model represents a model used for causal mechanisms for non-root nodes in a graphical
causal model."""
@property
@abstractmethod
def classes(self) -> List[str]:
raise NotImplementedError
class InvertibleFunction:
@abstractmethod
def evaluate(self, X: np.ndarray) -> np.ndarray:
"""Applies the function on the input."""
def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
"""Fits the model according to the data."""
raise NotImplementedError
@abstractmethod
def evaluate_inverse(self, X: np.ndarray) -> np.ndarray:
"""Returns the outcome of applying the inverse of the function on the inputs."""
def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
"""Draws samples for the fitted model."""
raise NotImplementedError
@abstractmethod
def clone(self):
raise NotImplementedError
class FunctionalCausalModel(ConditionalStochasticModel):
"""Represents a Functional Causal Model (FCM), a specific type of conditional stochastic model, that is defined
as:
Y := f(X, N), N: Noise
"""
def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
return self.evaluate(parent_samples, self.draw_noise_samples(parent_samples.shape[0]))
@abstractmethod
def draw_noise_samples(self, num_samples: int) -> np.ndarray:
raise NotImplementedError
@abstractmethod
def evaluate(self, parent_samples: np.ndarray, noise_samples: np.ndarray) -> np.ndarray:
raise NotImplementedError
class InvertibleFunctionalCausalModel(FunctionalCausalModel, ABC):
@abstractmethod
def estimate_noise(self, target_samples: np.ndarray, parent_samples: np.ndarray) -> np.ndarray:
raise NotImplementedError
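
The interfaces that now live in causal_mechanisms.py form a small hierarchy: a `FunctionalCausalModel` samples by drawing noise and then evaluating `Y := f(X, N)`, and an invertible one can also recover `N` from observed data. The sketch below copies that hierarchy in simplified form (fit and clone omitted) and adds a hypothetical `ToyAdditiveNoiseModel` with a fixed, known `f`. It is not dowhy's `AdditiveNoiseModel`, which learns `f` and the noise distribution from data.

```python
from abc import ABC, abstractmethod

import numpy as np


class ConditionalStochasticModel(ABC):
    """Simplified copy of the interface (fit and clone omitted for brevity)."""

    @abstractmethod
    def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
        raise NotImplementedError


class FunctionalCausalModel(ConditionalStochasticModel):
    """Y := f(X, N): sampling draws noise first, then evaluates the assignment."""

    def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
        return self.evaluate(parent_samples, self.draw_noise_samples(parent_samples.shape[0]))

    @abstractmethod
    def draw_noise_samples(self, num_samples: int) -> np.ndarray:
        raise NotImplementedError

    @abstractmethod
    def evaluate(self, parent_samples: np.ndarray, noise_samples: np.ndarray) -> np.ndarray:
        raise NotImplementedError


class InvertibleFunctionalCausalModel(FunctionalCausalModel):
    @abstractmethod
    def estimate_noise(self, target_samples: np.ndarray, parent_samples: np.ndarray) -> np.ndarray:
        raise NotImplementedError


class ToyAdditiveNoiseModel(InvertibleFunctionalCausalModel):
    """Hypothetical Y := coefficient * X + N with N ~ Normal(0, noise_std**2)."""

    def __init__(self, coefficient: float, noise_std: float, seed: int = 0):
        self.coefficient = coefficient
        self.noise_std = noise_std
        self._rng = np.random.default_rng(seed)

    def draw_noise_samples(self, num_samples: int) -> np.ndarray:
        return self._rng.normal(loc=0.0, scale=self.noise_std, size=(num_samples, 1))

    def evaluate(self, parent_samples: np.ndarray, noise_samples: np.ndarray) -> np.ndarray:
        return self.coefficient * parent_samples + noise_samples

    def estimate_noise(self, target_samples: np.ndarray, parent_samples: np.ndarray) -> np.ndarray:
        # Additive noise is invertible with respect to N: N = Y - f(X).
        return target_samples - self.coefficient * parent_samples
```

Because `draw_samples` is implemented once in `FunctionalCausalModel`, a concrete mechanism only has to say how to draw noise and how to combine it with the parents.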

@ -7,15 +7,26 @@ from typing import Any, Callable, Optional, Union
import networkx as nx
from dowhy.gcm.graph import (
CAUSAL_MECHANISM,
from dowhy.gcm.causal_mechanisms import (
ConditionalStochasticModel,
DirectedGraph,
FunctionalCausalModel,
InvertibleFunctionalCausalModel,
StochasticModel,
clone_causal_models,
)
from dowhy.graph import (
DirectedGraph,
HasNodes,
get_ordered_predecessors,
is_root_node,
validate_acyclic,
validate_node_in_graph,
)
# This constant is used as key when storing/accessing models as causal mechanisms in graph node attributes
CAUSAL_MECHANISM = "causal_mechanism"
# This constant is used as key when storing the parents of a node during fitting. It's used for validation purposes
# afterwards.
PARENTS_DURING_FIT = "parents_during_fit"
class ProbabilisticCausalModel:
@ -83,7 +94,7 @@ class InvertibleStructuralCausalModel(StructuralCausalModel):
:func:`~dowhy.gcm.whatif.counterfactual_samples`. This is a subclass of
:class:`~dowhy.gcm.cms.StructuralCausalModel` and has further restrictions on the class of causal mechanisms.
Here, the mechanisms of non-root nodes need to be invertible with respect to the noise,
such as :class:`~dowhy.gcm.fcms.PostNonlinearModel`.
such as :class:`~dowhy.gcm.causal_mechanisms.PostNonlinearModel`.
"""
def set_causal_mechanism(
@ -93,3 +104,60 @@ class InvertibleStructuralCausalModel(StructuralCausalModel):
def causal_mechanism(self, node: Any) -> Union[StochasticModel, InvertibleFunctionalCausalModel]:
return super().causal_mechanism(node)
def validate_causal_dag(causal_graph: DirectedGraph) -> None:
validate_acyclic(causal_graph)
validate_causal_graph(causal_graph)
def validate_causal_graph(causal_graph: DirectedGraph) -> None:
for node in causal_graph.nodes:
validate_node(causal_graph, node)
def validate_node(causal_graph: DirectedGraph, node: Any) -> None:
validate_causal_model_assignment(causal_graph, node)
validate_local_structure(causal_graph, node)
def validate_causal_model_assignment(causal_graph: DirectedGraph, target_node: Any) -> None:
validate_node_has_causal_model(causal_graph, target_node)
causal_model = causal_graph.nodes[target_node][CAUSAL_MECHANISM]
if is_root_node(causal_graph, target_node):
if not isinstance(causal_model, StochasticModel):
raise RuntimeError(
"Node %s is a root node and, thus, requires a StochasticModel, "
"but a %s was found!" % (target_node, causal_model)
)
elif not isinstance(causal_model, ConditionalStochasticModel):
raise RuntimeError(
"Node %s has parents and, thus, requires a ConditionalStochasticModel, "
"but a %s was found!" % (target_node, causal_model)
)
def validate_local_structure(causal_graph: DirectedGraph, node: Any) -> None:
if PARENTS_DURING_FIT not in causal_graph.nodes[node] or causal_graph.nodes[node][
PARENTS_DURING_FIT
] != get_ordered_predecessors(causal_graph, node):
raise RuntimeError(
"The causal mechanism of node %s is not fitted to the graphical structure! Fit all"
"causal models in the graph first. If the mechanism is already fitted based on the causal"
"parents, consider to update the persisted parents for that node manually." % node
)
def validate_node_has_causal_model(causal_graph: HasNodes, node: Any) -> None:
validate_node_in_graph(causal_graph, node)
if CAUSAL_MECHANISM not in causal_graph.nodes[node]:
raise ValueError("Node %s has no assigned causal mechanism!" % node)
def clone_causal_models(source: HasNodes, destination: HasNodes):
for node in destination.nodes:
if CAUSAL_MECHANISM in source.nodes[node]:
destination.nodes[node][CAUSAL_MECHANISM] = source.nodes[node][CAUSAL_MECHANISM].clone()
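
The validation functions that moved into causal_models.py check, per node, that a mechanism is assigned under the `CAUSAL_MECHANISM` key, that its type matches the node's role (root nodes need a `StochasticModel`, all others a `ConditionalStochasticModel`), and that the parents stored under `PARENTS_DURING_FIT` match the current graph. The sketch below re-implements that logic in one hypothetical `check_node` function over a hypothetical dict-backed `TinyGraph`, so it runs without dowhy or networkx; the two marker classes only stand in for the real interfaces.

```python
from typing import Any, Dict, Iterable, List

CAUSAL_MECHANISM = "causal_mechanism"
PARENTS_DURING_FIT = "parents_during_fit"


class StochasticModel:
    """Stand-in for the root-node interface in causal_mechanisms.py."""


class ConditionalStochasticModel:
    """Stand-in for the non-root interface in causal_mechanisms.py."""


class TinyGraph:
    """Hypothetical minimal graph: an attribute dict per node plus a parent list per node."""

    def __init__(self, parents: Dict[Any, List[Any]]):
        self._parents = parents
        self.nodes: Dict[Any, Dict[str, Any]] = {node: {} for node in parents}

    def predecessors(self, node: Any) -> Iterable[Any]:
        return iter(self._parents[node])


def check_node(graph: TinyGraph, node: Any) -> None:
    """Mirrors validate_causal_model_assignment followed by validate_local_structure."""
    if node not in graph.nodes:
        raise ValueError("Node %s can not be found in the given graph!" % node)
    if CAUSAL_MECHANISM not in graph.nodes[node]:
        raise ValueError("Node %s has no assigned causal mechanism!" % node)
    mechanism = graph.nodes[node][CAUSAL_MECHANISM]
    parents = sorted(graph.predecessors(node))
    expected = StochasticModel if not parents else ConditionalStochasticModel
    if not isinstance(mechanism, expected):
        raise RuntimeError(
            "Node %s requires a %s, but a %s was found!" % (node, expected.__name__, type(mechanism).__name__)
        )
    # The parents recorded while fitting must match the graph as it is now.
    if graph.nodes[node].get(PARENTS_DURING_FIT) != parents:
        raise RuntimeError("The causal mechanism of node %s is not fitted to the graphical structure!" % node)
```

In dowhy, the fitting step is what records `PARENTS_DURING_FIT`, so a freshly assigned but unfitted mechanism fails the second check; the sketch reproduces that by setting the key by hand.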

@ -10,7 +10,7 @@ import numpy as np
import pandas as pd
from dowhy.gcm import auto
from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
from dowhy.gcm.causal_models import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
from dowhy.gcm.fitting_sampling import fit
# A convenience function when computing confidence intervals specifically for non-deterministic causal queries. This

@ -14,22 +14,19 @@ from statsmodels.stats.multitest import multipletests
from tqdm import tqdm
from dowhy.gcm.auto import AssignmentQuality, assign_causal_mechanisms
from dowhy.gcm.cms import ProbabilisticCausalModel
from dowhy.gcm.divergence import auto_estimate_kl_divergence
from dowhy.gcm.fitting_sampling import draw_samples, fit_causal_model_of_target
from dowhy.gcm.graph import (
from dowhy.gcm.causal_mechanisms import ConditionalStochasticModel
from dowhy.gcm.causal_models import (
PARENTS_DURING_FIT,
ConditionalStochasticModel,
DirectedGraph,
ProbabilisticCausalModel,
clone_causal_models,
get_ordered_predecessors,
is_root_node,
node_connected_subgraph_view,
validate_causal_dag,
)
from dowhy.gcm.divergence import auto_estimate_kl_divergence
from dowhy.gcm.fitting_sampling import draw_samples, fit_causal_model_of_target
from dowhy.gcm.independence_test.kernel import kernel_based
from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values
from dowhy.gcm.util.general import shape_into_2d
from dowhy.graph import DirectedGraph, get_ordered_predecessors, is_root_node, node_connected_subgraph_view
_logger = logging.getLogger(__name__)

@ -16,11 +16,11 @@ from joblib import Parallel, delayed
from tqdm import tqdm
import dowhy.gcm.config as config
from dowhy.gcm.graph import DirectedGraph, get_ordered_predecessors
from dowhy.gcm.independence_test import kernel_based
from dowhy.gcm.util import plot
from dowhy.gcm.util.general import set_random_seed
from dowhy.gcm.validation import _get_non_descendants
from dowhy.graph import DirectedGraph, get_ordered_predecessors
COLORS = list(mcolors.TABLEAU_COLORS.values())

@ -6,7 +6,7 @@ import numpy as np
import pandas as pd
from dowhy.gcm import feature_relevance
from dowhy.gcm.cms import StructuralCausalModel
from dowhy.gcm.causal_models import StructuralCausalModel
from dowhy.gcm.shapley import ShapleyConfig

@ -10,13 +10,13 @@ from typing import Any, Callable, Dict, Optional, Tuple, Union
import numpy as np
import pandas as pd
from dowhy.gcm.cms import StructuralCausalModel
from dowhy.gcm.fcms import ProbabilityEstimatorModel
from dowhy.gcm.causal_mechanisms import ProbabilityEstimatorModel
from dowhy.gcm.causal_models import StructuralCausalModel, validate_node
from dowhy.gcm.fitting_sampling import draw_samples
from dowhy.gcm.graph import get_ordered_predecessors, is_root_node, validate_node
from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values
from dowhy.gcm.stats import marginal_expectation
from dowhy.gcm.util.general import shape_into_2d, variance_of_deviations, variance_of_matching_values
from dowhy.graph import get_ordered_predecessors, is_root_node
def parent_relevance(

@ -11,14 +11,13 @@ import pandas as pd
from tqdm import tqdm
from dowhy.gcm import config
from dowhy.gcm.cms import ProbabilisticCausalModel
from dowhy.gcm.graph import (
from dowhy.gcm.causal_models import (
PARENTS_DURING_FIT,
get_ordered_predecessors,
is_root_node,
ProbabilisticCausalModel,
validate_causal_dag,
validate_causal_model_assignment,
)
from dowhy.graph import get_ordered_predecessors, is_root_node
def fit(causal_model: ProbabilisticCausalModel, data: pd.DataFrame):

@ -1,201 +0,0 @@
"""This module defines the fundamental interfaces and functions related to causal graphs in graphical causal models.
Classes and functions in this module should be considered experimental, meaning there might be breaking API changes in
the future.
"""
from abc import ABC, abstractmethod
from typing import Any, List
import networkx as nx
import numpy as np
from networkx.algorithms.dag import has_cycle
from typing_extensions import Protocol
# This constant is used as key when storing/accessing models as causal mechanisms in graph node attributes
CAUSAL_MECHANISM = "causal_mechanism"
# This constant is used as key when storing the parents of a node during fitting. It's used for validation purposes
# afterwards.
PARENTS_DURING_FIT = "parents_during_fit"
class HasNodes(Protocol):
"""This protocol defines a trait for classes having nodes."""
@property
@abstractmethod
def nodes(self):
""":returns Dict[Any, Dict[Any, Any]]"""
raise NotImplementedError
class HasEdges(Protocol):
"""This protocol defines a trait for classes having edges."""
@property
@abstractmethod
def edges(self):
""":returns a Dict[Tuple[Any, Any], Dict[Any, Any]]"""
raise NotImplementedError
class DirectedGraph(HasNodes, HasEdges, Protocol):
"""A protocol representing a directed graph as needed by graphical causal models.
This protocol specifically defines a subset of the networkx.DiGraph class, which make that class automatically
compatible with DirectedGraph. While in most cases a networkx.DiGraph is the class of choice when constructing
a causal graph, anyone can choose to provide their own implementation of the DirectGraph interface.
"""
@abstractmethod
def predecessors(self, node):
raise NotImplementedError
class StochasticModel(ABC):
"""A stochastic model represents a model used for causal mechanisms for root nodes in a graphical causal model."""
@abstractmethod
def fit(self, X: np.ndarray) -> None:
"""Fits the model according to the data."""
raise NotImplementedError
@abstractmethod
def draw_samples(self, num_samples: int) -> np.ndarray:
"""Draws samples for the fitted model."""
raise NotImplementedError
@abstractmethod
def clone(self):
raise NotImplementedError
class ConditionalStochasticModel(ABC):
"""A conditional stochastic model represents a model used for causal mechanisms for non-root nodes in a graphical
causal model."""
@abstractmethod
def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
"""Fits the model according to the data."""
raise NotImplementedError
@abstractmethod
def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
"""Draws samples for the fitted model."""
raise NotImplementedError
@abstractmethod
def clone(self):
raise NotImplementedError
class FunctionalCausalModel(ConditionalStochasticModel):
"""Represents a Functional Causal Model (FCM), a specific type of conditional stochastic model, that is defined
as:
Y := f(X, N), N: Noise
"""
def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
return self.evaluate(parent_samples, self.draw_noise_samples(parent_samples.shape[0]))
@abstractmethod
def draw_noise_samples(self, num_samples: int) -> np.ndarray:
raise NotImplementedError
@abstractmethod
def evaluate(self, parent_samples: np.ndarray, noise_samples: np.ndarray) -> np.ndarray:
raise NotImplementedError
class InvertibleFunctionalCausalModel(FunctionalCausalModel, ABC):
@abstractmethod
def estimate_noise(self, target_samples: np.ndarray, parent_samples: np.ndarray) -> np.ndarray:
raise NotImplementedError
def is_root_node(causal_graph: DirectedGraph, node: Any) -> bool:
return list(causal_graph.predecessors(node)) == []
def get_ordered_predecessors(causal_graph: DirectedGraph, node: Any) -> List[Any]:
"""This function returns predecessors of a node in a well-defined order.
This is necessary, because we select subsets of columns in Dataframes by using a node's parents, and these parents
might not be returned in a reliable order.
"""
return sorted(causal_graph.predecessors(node))
def node_connected_subgraph_view(g: DirectedGraph, node: Any) -> Any:
"""Returns a view of the provided graph g that contains only nodes connected to the node passed in"""
# can't use nx.node_connected_component, because it doesn't work with DiGraphs.
# Hence a manual loop:
return nx.induced_subgraph(g, [n for n in g.nodes if nx.has_path(g, n, node)])
def clone_causal_models(source: HasNodes, destination: HasNodes):
for node in destination.nodes:
if CAUSAL_MECHANISM in source.nodes[node]:
destination.nodes[node][CAUSAL_MECHANISM] = source.nodes[node][CAUSAL_MECHANISM].clone()
def validate_acyclic(causal_graph: DirectedGraph) -> None:
if has_cycle(causal_graph):
raise RuntimeError("The graph contains a cycle, but an acyclic graph is expected!")
def validate_causal_dag(causal_graph: DirectedGraph) -> None:
validate_acyclic(causal_graph)
validate_causal_graph(causal_graph)
def validate_causal_graph(causal_graph: DirectedGraph) -> None:
for node in causal_graph.nodes:
validate_node(causal_graph, node)
def validate_node(causal_graph: DirectedGraph, node: Any) -> None:
validate_causal_model_assignment(causal_graph, node)
validate_local_structure(causal_graph, node)
def validate_causal_model_assignment(causal_graph: DirectedGraph, target_node: Any) -> None:
validate_node_has_causal_model(causal_graph, target_node)
causal_model = causal_graph.nodes[target_node][CAUSAL_MECHANISM]
if is_root_node(causal_graph, target_node):
if not isinstance(causal_model, StochasticModel):
raise RuntimeError(
"Node %s is a root node and, thus, requires a StochasticModel, "
"but a %s was found!" % (target_node, causal_model)
)
elif not isinstance(causal_model, ConditionalStochasticModel):
raise RuntimeError(
"Node %s has parents and, thus, requires a ConditionalStochasticModel, "
"but a %s was found!" % (target_node, causal_model)
)
def validate_local_structure(causal_graph: DirectedGraph, node: Any) -> None:
if PARENTS_DURING_FIT not in causal_graph.nodes[node] or causal_graph.nodes[node][
PARENTS_DURING_FIT
] != get_ordered_predecessors(causal_graph, node):
raise RuntimeError(
"The causal mechanism of node %s is not fitted to the graphical structure! Fit all"
"causal models in the graph first. If the mechanism is already fitted based on the causal"
"parents, consider to update the persisted parents for that node manually." % node
)
def validate_node_has_causal_model(causal_graph: HasNodes, node: Any) -> None:
validate_node_in_graph(causal_graph, node)
if CAUSAL_MECHANISM not in causal_graph.nodes[node]:
raise ValueError("Node %s has no assigned causal mechanism!" % node)
def validate_node_in_graph(causal_graph: HasNodes, node: Any) -> None:
if node not in causal_graph.nodes:
raise ValueError("Node %s can not be found in the given graph!" % node)
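
The `DirectedGraph` protocol (now in `dowhy.graph`) only asks for `nodes`, `edges` and `predecessors(node)`, so any class with that shape can be used in place of a `networkx.DiGraph`. The sketch below is a hypothetical edge-list implementation together with copies of `is_root_node` and `get_ordered_predecessors`; since networkx's cycle check is not available for a custom class, `has_cycle` is replaced by a plain three-color depth-first search.

```python
from typing import Any, Dict, Iterable, List, Tuple


class EdgeListGraph:
    """Hypothetical DirectedGraph implementation backed by an edge list."""

    def __init__(self, edges: Iterable[Tuple[Any, Any]]):
        self._edges: Dict[Tuple[Any, Any], Dict[Any, Any]] = {}
        self._nodes: Dict[Any, Dict[Any, Any]] = {}
        self._parents: Dict[Any, List[Any]] = {}
        for source, target in edges:
            self._edges[(source, target)] = {}
            self._nodes.setdefault(source, {})
            self._nodes.setdefault(target, {})
            self._parents.setdefault(target, []).append(source)

    @property
    def nodes(self) -> Dict[Any, Dict[Any, Any]]:
        return self._nodes

    @property
    def edges(self) -> Dict[Tuple[Any, Any], Dict[Any, Any]]:
        return self._edges

    def predecessors(self, node: Any) -> Iterable[Any]:
        return iter(self._parents.get(node, []))


def is_root_node(graph: EdgeListGraph, node: Any) -> bool:
    return list(graph.predecessors(node)) == []


def get_ordered_predecessors(graph: EdgeListGraph, node: Any) -> List[Any]:
    # Sorted so that selecting DataFrame columns by parents is deterministic.
    return sorted(graph.predecessors(node))


def has_cycle(graph: EdgeListGraph) -> bool:
    """Three-color DFS over parent links; a cycle exists iff we reach a node still on the stack."""
    white, gray, black = 0, 1, 2
    color = {node: white for node in graph.nodes}

    def visit(node: Any) -> bool:
        color[node] = gray
        for parent in graph.predecessors(node):
            if color[parent] == gray or (color[parent] == white and visit(parent)):
                return True
        color[node] = black
        return False

    return any(color[node] == white and visit(node) for node in graph.nodes)
```

Walking parent links instead of child links is enough here, because a graph has a cycle exactly when its reversed graph does.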

@ -4,7 +4,7 @@ import numpy as np
from scipy import stats
from dowhy.gcm.auto import AssignmentQuality, select_model
from dowhy.gcm.fcms import PredictionModel
from dowhy.gcm.ml import PredictionModel
from dowhy.gcm.util.general import is_categorical, shape_into_2d

@ -14,22 +14,22 @@ from numpy.matlib import repmat
import dowhy.gcm.auto as auto
from dowhy.gcm import feature_relevance_sample
from dowhy.gcm._noise import compute_data_from_noise, compute_noise_from_data, noise_samples_of_ancestors
from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
from dowhy.gcm.divergence import estimate_kl_divergence_of_probabilities
from dowhy.gcm.fcms import ClassificationModel, ClassifierFCM, PredictionModel, ProbabilityEstimatorModel
from dowhy.gcm.fitting_sampling import draw_samples
from dowhy.gcm.graph import (
ConditionalStochasticModel,
get_ordered_predecessors,
is_root_node,
node_connected_subgraph_view,
from dowhy.gcm.causal_mechanisms import ClassifierFCM, ConditionalStochasticModel, ProbabilityEstimatorModel
from dowhy.gcm.causal_models import (
InvertibleStructuralCausalModel,
ProbabilisticCausalModel,
StructuralCausalModel,
validate_causal_dag,
validate_node,
)
from dowhy.gcm.divergence import estimate_kl_divergence_of_probabilities
from dowhy.gcm.fitting_sampling import draw_samples
from dowhy.gcm.ml import ClassificationModel, PredictionModel
from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values
from dowhy.gcm.stats import marginal_expectation
from dowhy.gcm.uncertainty import estimate_entropy_of_probabilities, estimate_variance
from dowhy.gcm.util.general import has_categorical, is_categorical, means_difference, set_random_seed, shape_into_2d
from dowhy.graph import get_ordered_predecessors, is_root_node, node_connected_subgraph_view
_logger = logging.getLogger(__name__)

@ -1,9 +1,10 @@
"""This module defines implementations of :class:`~dowhy.gcm.fcms.PredictionModel` used by the different
:class:`~dowhy.gcm.graph.FunctionalCausalModel` implementations, such as :class:`~dowhy.gcm.fcms.PostNonlinearModel` or
:class:`~dowhy.gcm.fcms.AdditiveNoiseModel`.
"""This module defines implementations of :class:`~dowhy.gcm.ml.PredictionModel` used by the different
:class:`~dowhy.gcm.graph.FunctionalCausalModel` implementations, such as :class:`~dowhy.gcm.causal_mechanisms.PostNonlinearModel` or
:class:`~dowhy.gcm.causal_mechanisms.AdditiveNoiseModel`.
"""
from .classification import (
ClassificationModel,
SklearnClassificationModel,
create_gaussian_process_classifier,
create_hist_gradient_boost_classifier,
@ -11,7 +12,9 @@ from .classification import (
create_polynom_logistic_regression_classifier,
create_random_forest_classifier,
)
from .prediction_model import PredictionModel
from .regression import (
InvertibleFunction,
SklearnRegressionModel,
create_elastic_net_regressor,
create_gaussian_process_regressor,

@ -6,7 +6,7 @@ from autogluon import tabular
from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config
from packaging import version
from dowhy.gcm.fcms import ClassificationModel, PredictionModel
from dowhy.gcm.ml import ClassificationModel, PredictionModel
from dowhy.gcm.util.general import shape_into_2d

@ -1,7 +1,7 @@
"""Functions and classes in this module should be considered experimental, meaning there might be breaking API changes
in the future.
"""
from abc import abstractmethod
from typing import List
import numpy as np
@ -10,6 +10,8 @@ from packaging import version
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from dowhy.gcm.ml.prediction_model import PredictionModel
if version.parse(sklearn.__version__) < version.parse("1.0"):
from sklearn.experimental import enable_hist_gradient_boosting # noqa
@ -25,11 +27,21 @@ from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from dowhy.gcm.fcms import ClassificationModel
from dowhy.gcm.ml.regression import SklearnRegressionModel
from dowhy.gcm.util.general import auto_apply_encoders, shape_into_2d
class ClassificationModel(PredictionModel):
@abstractmethod
def predict_probabilities(self, X: np.array) -> np.ndarray:
raise NotImplementedError
@property
@abstractmethod
def classes(self) -> List[str]:
raise NotImplementedError
class SklearnClassificationModel(SklearnRegressionModel, ClassificationModel):
def predict_probabilities(self, X: np.array) -> np.ndarray:
return shape_into_2d(self._sklearn_mdl.predict_proba(auto_apply_encoders(X, self._encoders)))
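
`ClassificationModel`, now defined in gcm/ml/classification.py, extends the `PredictionModel` contract with `predict_probabilities` and a `classes` property. The following hypothetical majority-class classifier is a minimal sketch of that contract, assuming numpy only; it duck-types the interface rather than subclassing `gcm.ml.ClassificationModel`.

```python
from typing import List

import numpy as np


class MajorityClassModel:
    """Hypothetical classifier that ignores X and uses the training label frequencies."""

    def __init__(self):
        self._classes: List[str] = []
        self._probabilities = np.empty(0)

    def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
        # np.unique returns the labels sorted, which fixes the column order of the probabilities.
        labels, counts = np.unique(np.asarray(Y).ravel(), return_counts=True)
        self._classes = [str(label) for label in labels]
        self._probabilities = counts / counts.sum()

    def predict_probabilities(self, X: np.ndarray) -> np.ndarray:
        # One row per sample, one column per class (in the order of `classes`).
        return np.tile(self._probabilities, (np.asarray(X).shape[0], 1))

    def predict(self, X: np.ndarray) -> np.ndarray:
        # The most frequent training label for every sample, as a column vector.
        best = int(np.argmax(self._probabilities))
        return np.array([self._classes[best]] * np.asarray(X).shape[0]).reshape(-1, 1)

    @property
    def classes(self) -> List[str]:
        return self._classes

    def clone(self) -> "MajorityClassModel":
        return MajorityClassModel()
```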

@ -0,0 +1,25 @@
from abc import abstractmethod
import numpy as np
class PredictionModel:
"""Represents general prediction model implementations. Each prediction model should provide a fit and a predict
method."""
@abstractmethod
def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
raise NotImplementedError
@abstractmethod
def predict(self, X: np.ndarray) -> np.ndarray:
raise NotImplementedError
@abstractmethod
def clone(self):
"""
Clones the prediction model using the same hyper parameters but not fitted.
:return: An unfitted clone of the prediction model.
"""
raise NotImplementedError

@ -1,7 +1,7 @@
"""Functions and classes in this module should be considered experimental, meaning there might be breaking API changes
in the future.
"""
from abc import abstractmethod
from typing import Any
import numpy as np
@ -24,7 +24,7 @@ from sklearn.linear_model import ElasticNetCV, LassoCV, LassoLarsIC, LinearRegre
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from dowhy.gcm.fcms import InvertibleFunction, PredictionModel
from dowhy.gcm.ml.prediction_model import PredictionModel
from dowhy.gcm.util.general import auto_apply_encoders, auto_fit_encoders, shape_into_2d
@ -122,6 +122,18 @@ def create_polynom_regressor(degree: int = 2, **kwargs_linear_model) -> SklearnR
)
class InvertibleFunction:
@abstractmethod
def evaluate(self, X: np.ndarray) -> np.ndarray:
"""Applies the function on the input."""
raise NotImplementedError
@abstractmethod
def evaluate_inverse(self, X: np.ndarray) -> np.ndarray:
"""Returns the outcome of applying the inverse of the function on the inputs."""
raise NotImplementedError
class InvertibleIdentityFunction(InvertibleFunction):
def evaluate(self, X: np.ndarray) -> np.ndarray:
return X
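The two abstract methods of `InvertibleFunction` imply a round-trip contract: `evaluate_inverse(evaluate(x))` should give back `x` on the function's domain. A minimal sketch with a hypothetical exponential function (not part of the diff; the interface is copied locally) makes that contract concrete:

```python
from abc import abstractmethod

import numpy as np


class InvertibleFunction:
    """Local copy of the interface from the diff."""

    @abstractmethod
    def evaluate(self, X: np.ndarray) -> np.ndarray:
        raise NotImplementedError

    @abstractmethod
    def evaluate_inverse(self, X: np.ndarray) -> np.ndarray:
        raise NotImplementedError


class InvertibleExponentialFunction(InvertibleFunction):
    """Hypothetical example: f(x) = exp(x), whose inverse is log(y) for y > 0."""

    def evaluate(self, X: np.ndarray) -> np.ndarray:
        return np.exp(X)

    def evaluate_inverse(self, X: np.ndarray) -> np.ndarray:
        # Only defined for strictly positive inputs, i.e. the range of exp.
        return np.log(X)
```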


@ -13,8 +13,8 @@ from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import BayesianGaussianMixture
from dowhy.gcm.causal_mechanisms import StochasticModel
from dowhy.gcm.divergence import estimate_kl_divergence_continuous
from dowhy.gcm.graph import StochasticModel
from dowhy.gcm.util.general import shape_into_2d
_CONTINUOUS_DISTRIBUTIONS = [


@ -9,7 +9,7 @@ import pandas as pd
from sklearn.linear_model._base import LinearModel
from sklearn.utils.validation import check_is_fitted
from dowhy.gcm.fcms import PredictionModel
from dowhy.gcm.ml.prediction_model import PredictionModel
from dowhy.gcm.ml.regression import SklearnRegressionModel
from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values


@ -11,9 +11,9 @@ import numpy as np
import pandas as pd
from statsmodels.stats.multitest import multipletests
from dowhy.gcm.cms import InvertibleStructuralCausalModel
from dowhy.gcm.graph import DirectedGraph, get_ordered_predecessors, is_root_node, validate_causal_graph
from dowhy.gcm.causal_models import InvertibleStructuralCausalModel, validate_causal_graph
from dowhy.gcm.independence_test import kernel_based
from dowhy.graph import DirectedGraph, get_ordered_predecessors, is_root_node
class RejectionResult(Enum):


@ -10,15 +10,19 @@ import numpy as np
import pandas as pd
from dowhy.gcm._noise import compute_noise_from_data
from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
from dowhy.gcm.fcms import ClassifierFCM
from dowhy.gcm.causal_mechanisms import ClassifierFCM
from dowhy.gcm.causal_models import (
InvertibleStructuralCausalModel,
ProbabilisticCausalModel,
StructuralCausalModel,
validate_causal_dag,
)
from dowhy.gcm.fitting_sampling import draw_samples
from dowhy.gcm.graph import (
from dowhy.graph import (
DirectedGraph,
get_ordered_predecessors,
is_root_node,
node_connected_subgraph_view,
validate_causal_dag,
validate_node_in_graph,
)

dowhy/graph.py

@ -0,0 +1,75 @@
"""This module defines the fundamental interfaces and functions related to causal graphs.
Classes and functions in this module should be considered experimental, meaning there might be breaking API changes in
the future.
"""
from abc import abstractmethod
from typing import Any, List
import networkx as nx
from networkx.algorithms.dag import has_cycle
from typing_extensions import Protocol
class HasNodes(Protocol):
"""This protocol defines a trait for classes having nodes."""
@property
@abstractmethod
def nodes(self):
""":returns: Dict[Any, Dict[Any, Any]]"""
raise NotImplementedError
class HasEdges(Protocol):
"""This protocol defines a trait for classes having edges."""
@property
@abstractmethod
def edges(self):
""":returns: Dict[Tuple[Any, Any], Dict[Any, Any]]"""
raise NotImplementedError
class DirectedGraph(HasNodes, HasEdges, Protocol):
"""A protocol representing a directed graph as needed by graphical causal models.
This protocol specifically defines a subset of the networkx.DiGraph class, which makes that class automatically
compatible with DirectedGraph. While in most cases a networkx.DiGraph is the class of choice when constructing
a causal graph, anyone can choose to provide their own implementation of the DirectedGraph interface.
"""
@abstractmethod
def predecessors(self, node):
raise NotImplementedError
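Because `DirectedGraph` is a `Protocol`, compatibility is structural: any object exposing `nodes`, `edges` and `predecessors` qualifies, without inheriting from it. The sketch below illustrates this under a few stated assumptions: the three protocols are condensed into one, the standard-library `typing.Protocol` stands in for `typing_extensions.Protocol`, and `runtime_checkable` is added only so `isinstance` can demonstrate the match (the protocols in the diff are not decorated with it). `DictGraph` is a hypothetical class.

```python
from abc import abstractmethod
from typing import Any, Dict, Iterator, List, Protocol, Tuple, runtime_checkable


@runtime_checkable
class DirectedGraph(Protocol):
    """Condensed stand-in for HasNodes + HasEdges + DirectedGraph from the diff."""

    @property
    @abstractmethod
    def nodes(self): ...

    @property
    @abstractmethod
    def edges(self): ...

    @abstractmethod
    def predecessors(self, node): ...


class DictGraph:
    """Hypothetical minimal graph. Note that it does NOT inherit from DirectedGraph."""

    def __init__(self, edges: List[Tuple[Any, Any]]):
        # Same shapes as documented in the protocols: node -> attrs, (u, v) -> attrs.
        self._edges: Dict[Tuple[Any, Any], Dict[Any, Any]] = {edge: {} for edge in edges}
        self._nodes: Dict[Any, Dict[Any, Any]] = {}
        for u, v in edges:
            self._nodes.setdefault(u, {})
            self._nodes.setdefault(v, {})

    @property
    def nodes(self) -> Dict[Any, Dict[Any, Any]]:
        return self._nodes

    @property
    def edges(self) -> Dict[Tuple[Any, Any], Dict[Any, Any]]:
        return self._edges

    def predecessors(self, node: Any) -> Iterator[Any]:
        # Like networkx.DiGraph.predecessors, return an iterator over parent nodes.
        return iter([u for (u, v) in self._edges if v == node])
```

A runtime-checkable `isinstance` check only verifies that the attributes exist, not their signatures or return types, so static type checking remains the stronger guarantee.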
def is_root_node(causal_graph: DirectedGraph, node: Any) -> bool:
return list(causal_graph.predecessors(node)) == []
def get_ordered_predecessors(causal_graph: DirectedGraph, node: Any) -> List[Any]:
"""This function returns predecessors of a node in a well-defined order.
This is necessary because we select subsets of columns in DataFrames using a node's parents, and these parents
might not be returned in a reliable order.
"""
return sorted(causal_graph.predecessors(node))
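The point of `get_ordered_predecessors` is that two graphs with the same parent set but different iteration order yield the same parent list, and hence the same DataFrame column order. A minimal sketch, using a hypothetical `ParentMapGraph` stub instead of networkx and copies of the two helpers from the diff:

```python
from typing import Any, Dict, Iterator, List


class ParentMapGraph:
    """Hypothetical stand-in graph: maps each node to its parents, iterated in insertion order."""

    def __init__(self, parents: Dict[Any, List[Any]]):
        self._parents = parents

    def predecessors(self, node: Any) -> Iterator[Any]:
        return iter(self._parents.get(node, []))


# Same logic as is_root_node and get_ordered_predecessors in the diff.
def is_root_node(causal_graph, node: Any) -> bool:
    return list(causal_graph.predecessors(node)) == []


def get_ordered_predecessors(causal_graph, node: Any) -> List[Any]:
    return sorted(causal_graph.predecessors(node))


# The raw parent order differs between the two graphs...
g1 = ParentMapGraph({"Y": ["Z", "X"]})
g2 = ParentMapGraph({"Y": ["X", "Z"]})
# ...but the ordered predecessors agree, so column selection is deterministic.
ordered_1 = get_ordered_predecessors(g1, "Y")
ordered_2 = get_ordered_predecessors(g2, "Y")
```

One implicit requirement follows from using `sorted`: node labels must be mutually comparable, so mixing, for example, `int` and `str` node names would make `get_ordered_predecessors` raise a `TypeError`.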
def node_connected_subgraph_view(g: DirectedGraph, node: Any) -> Any:
"""Returns a view of the provided graph g induced by the given node and all nodes with a directed path to it."""
# can't use nx.node_connected_component, because it doesn't work with DiGraphs.
# Hence, a manual loop:
return nx.induced_subgraph(g, [n for n in g.nodes if nx.has_path(g, n, node)])
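Since `nx.has_path(g, n, node)` follows edge direction (and is trivially true for `n == node`), the view keeps `node` and its ancestors, but drops its descendants and unrelated nodes. The same node set can be recovered with a breadth-first walk over predecessors. The sketch below is a dependency-free illustration; the helper name `nodes_with_path_to` and its parent-map input are hypothetical.

```python
from collections import deque
from typing import Any, Dict, List, Set


def nodes_with_path_to(parents: Dict[Any, List[Any]], target: Any) -> Set[Any]:
    """Hypothetical helper: the nodes that node_connected_subgraph_view would keep for target.

    Walks edges backwards from target, so it collects target plus all of its ancestors.
    """
    seen = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

For the graph X → Y → Z ← W with Z → Q, the set for `Z` is `{X, Y, W, Z}`: the descendant `Q` is excluded.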
def validate_acyclic(causal_graph: DirectedGraph) -> None:
if has_cycle(causal_graph):
raise RuntimeError("The graph contains a cycle, but an acyclic graph is expected!")
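`validate_acyclic` delegates the actual cycle detection to networkx's `has_cycle`. As a dependency-free illustration of what such a check does, the sketch below swaps in Kahn's algorithm (this is not a claim about how networkx implements it). The node-list and edge-list inputs and the name `has_cycle_kahn` are hypothetical; edges are assumed to reference only listed nodes.

```python
from collections import deque
from typing import Any, Dict, Iterable, List, Tuple


def has_cycle_kahn(nodes: Iterable[Any], edges: Iterable[Tuple[Any, Any]]) -> bool:
    """Hypothetical stand-in for has_cycle based on Kahn's algorithm."""
    nodes = list(nodes)
    in_degree: Dict[Any, int] = {n: 0 for n in nodes}
    children: Dict[Any, List[Any]] = {n: [] for n in nodes}
    for u, v in edges:
        children[u].append(v)
        in_degree[v] += 1
    # Repeatedly remove nodes without remaining incoming edges.
    queue = deque(n for n in nodes if in_degree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for child in children[n]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)
    # Nodes never removed lie on a cycle or are reachable from one.
    return removed < len(nodes)


def validate_acyclic(nodes, edges) -> None:
    # Raises the same error as validate_acyclic in the diff.
    if has_cycle_kahn(nodes, edges):
        raise RuntimeError("The graph contains a cycle, but an acyclic graph is expected!")
```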
def validate_node_in_graph(causal_graph: HasNodes, node: Any) -> None:
if node not in causal_graph.nodes:
raise ValueError("Node %s cannot be found in the given graph!" % node)


@ -5,7 +5,7 @@ from flaky import flaky
from pytest import approx, importorskip, mark
from sklearn.model_selection import train_test_split
from dowhy.gcm.fcms import AdditiveNoiseModel, ClassifierFCM
from dowhy.gcm.causal_mechanisms import AdditiveNoiseModel, ClassifierFCM
autogluon = importorskip("dowhy.gcm.ml.autogluon")
from dowhy.gcm.ml.autogluon import AutoGluonClassifier, AutoGluonRegressor


@ -9,13 +9,13 @@ from dowhy.gcm import (
InverseDensityScorer,
InvertibleStructuralCausalModel,
MedianCDFQuantileScorer,
PredictionModel,
attribute_anomalies,
auto,
fit,
)
from dowhy.gcm.anomaly import _relative_frequency, attribute_anomaly_scores
from dowhy.gcm.density_estimators import GaussianMixtureDensityEstimator
from dowhy.gcm.ml import PredictionModel
@flaky(max_runs=3)


@ -5,15 +5,9 @@ import pytest
from flaky import flaky
from pytest import approx
from dowhy.gcm import (
AdditiveNoiseModel,
EmpiricalDistribution,
ProbabilisticCausalModel,
draw_samples,
fit,
is_root_node,
)
from dowhy.gcm import AdditiveNoiseModel, EmpiricalDistribution, ProbabilisticCausalModel, draw_samples, fit
from dowhy.gcm.ml import create_linear_regressor
from dowhy.graph import is_root_node
@flaky(max_runs=2)


@ -16,11 +16,11 @@ from dowhy.gcm import (
intrinsic_causal_influence,
)
from dowhy.gcm._noise import noise_samples_of_ancestors
from dowhy.gcm.graph import node_connected_subgraph_view
from dowhy.gcm.influence import intrinsic_causal_influence_sample
from dowhy.gcm.ml import create_hist_gradient_boost_classifier, create_linear_regressor_with_given_parameters
from dowhy.gcm.uncertainty import estimate_entropy_of_probabilities, estimate_variance
from dowhy.gcm.util.general import apply_one_hot_encoding, fit_one_hot_encoders
from dowhy.graph import node_connected_subgraph_view
from tests.gcm.test_noise import _persist_parents


@ -6,7 +6,6 @@ from flaky import flaky
from dowhy.gcm import (
AdditiveNoiseModel,
DirectedGraph,
EmpiricalDistribution,
InvertibleStructuralCausalModel,
StructuralCausalModel,
@ -14,12 +13,13 @@ from dowhy.gcm import (
)
from dowhy.gcm._noise import compute_data_from_noise, compute_noise_from_data, get_noise_dependent_function
from dowhy.gcm.auto import assign_causal_mechanisms
from dowhy.gcm.graph import PARENTS_DURING_FIT, get_ordered_predecessors
from dowhy.gcm.causal_models import PARENTS_DURING_FIT
from dowhy.gcm.ml import (
create_linear_regressor,
create_linear_regressor_with_given_parameters,
create_logistic_regression_classifier,
)
from dowhy.graph import DirectedGraph, get_ordered_predecessors
def test_given_data_with_known_noise_values_when_compute_data_from_noise_then_returns_correct_values():