Refactor graph.py, fcms.py and cms.py in gcm module

- fcms.py is now called causal_mechanisms.py
- cms.py is now called causal_models.py
- StochasticModel and ConditionalStochasticModel are now part of causal_mechanisms.py instead of graph.py
- graph.py is moved to the main dowhy module in preparation to replace the CausalModel class
- causal_models.py now only contains the causal models ProbabilisticCausalModel, StructuralCausalModel and InvertibleStructuralCausalModel. It also has all the validation methods related to cms.
- The PredictionModel class is now part of the gcm.ml module instead of causal_mechanisms.py

Signed-off-by: Patrick Bloebaum <bloebp@amazon.com>
2023-04-28 13:32:46 -07:00 · 2023-04-28 13:32:46 -07:00 · 961ffc6373
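
For code that still imports from the old module paths, the renames listed in the commit message amount to a per-name lookup. The following is a minimal sketch of that lookup, assuming single-line `from ... import ...` statements. `OLD_MODULES`, `NEW_HOME` and `rewrite_import` are hypothetical names introduced here for illustration and are not part of dowhy. The table only covers names whose new location is visible in this diff.

```python
import re
from collections import defaultdict
from typing import List

# Modules that this commit renames or moves.
OLD_MODULES = {"dowhy.gcm.cms", "dowhy.gcm.fcms", "dowhy.gcm.graph"}

# New location per name, read off the import changes in this diff.
# Hypothetical lookup table for illustration; it is not exhaustive.
NEW_HOME = {
    # formerly dowhy.gcm.cms
    "ProbabilisticCausalModel": "dowhy.gcm.causal_models",
    "StructuralCausalModel": "dowhy.gcm.causal_models",
    "InvertibleStructuralCausalModel": "dowhy.gcm.causal_models",
    # formerly dowhy.gcm.fcms
    "AdditiveNoiseModel": "dowhy.gcm.causal_mechanisms",
    "PostNonlinearModel": "dowhy.gcm.causal_mechanisms",
    "ClassifierFCM": "dowhy.gcm.causal_mechanisms",
    "ProbabilityEstimatorModel": "dowhy.gcm.causal_mechanisms",
    "PredictionModel": "dowhy.gcm.ml",
    "ClassificationModel": "dowhy.gcm.ml",
    # formerly dowhy.gcm.graph
    "StochasticModel": "dowhy.gcm.causal_mechanisms",
    "ConditionalStochasticModel": "dowhy.gcm.causal_mechanisms",
    "FunctionalCausalModel": "dowhy.gcm.causal_mechanisms",
    "InvertibleFunctionalCausalModel": "dowhy.gcm.causal_mechanisms",
    "CAUSAL_MECHANISM": "dowhy.gcm.causal_models",
    "PARENTS_DURING_FIT": "dowhy.gcm.causal_models",
    "clone_causal_models": "dowhy.gcm.causal_models",
    "validate_causal_dag": "dowhy.gcm.causal_models",
    "validate_causal_model_assignment": "dowhy.gcm.causal_models",
    "validate_node": "dowhy.gcm.causal_models",
    "DirectedGraph": "dowhy.graph",
    "get_ordered_predecessors": "dowhy.graph",
    "is_root_node": "dowhy.graph",
    "node_connected_subgraph_view": "dowhy.graph",
}

_IMPORT = re.compile(r"^from\s+(\S+)\s+import\s+(.+)$")


def rewrite_import(line: str) -> List[str]:
    """Rewrite one single-line import from a renamed module.

    Lines that do not import from one of OLD_MODULES are returned unchanged.
    A name missing from NEW_HOME raises KeyError instead of being guessed.
    """
    match = _IMPORT.match(line.strip())
    if match is None or match.group(1) not in OLD_MODULES:
        return [line]

    # Group the imported names by the module they live in after this commit.
    by_module = defaultdict(list)
    for name in match.group(2).split(","):
        name = name.strip()
        if name:
            by_module[NEW_HOME[name]].append(name)

    return [
        f"from {module} import {', '.join(sorted(names))}"
        for module, names in sorted(by_module.items())
    ]


if __name__ == "__main__":
    old = (
        "from dowhy.gcm.graph import ConditionalStochasticModel, "
        "get_ordered_predecessors, is_root_node, validate_causal_dag"
    )
    for new_line in rewrite_import(old):
        print(new_line)
```

Applied to the old `dowhy.gcm.graph` import line from anomaly.py, the sketch splits the names across the same three target modules that the diff uses for that file. Parenthesized multi-line imports are out of scope for this sketch.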
--- a/docs/source/dowhy.gcm.rst
+++ b/docs/source/dowhy.gcm.rst
@@ -46,10 +46,10 @@ dowhy.gcm.auto module
   :undoc-members:
   :show-inheritance:

-dowhy.gcm.cms module
+dowhy.gcm.causal_models module
 --------------------

-.. automodule:: dowhy.gcm.cms
+.. automodule:: dowhy.gcm.causal_models
   :members:
   :undoc-members:
   :show-inheritance:
@@ -118,10 +118,10 @@ dowhy.gcm.divergence module
   :undoc-members:
   :show-inheritance:

-dowhy.gcm.fcms module
+dowhy.gcm.causal_mechanisms module
 ---------------------

-.. automodule:: dowhy.gcm.fcms
+.. automodule:: dowhy.gcm.causal_mechanisms
   :members:
   :undoc-members:
   :show-inheritance:
@@ -142,10 +142,10 @@ dowhy.gcm.fitting\_sampling module
   :undoc-members:
   :show-inheritance:

-dowhy.gcm.graph module
+dowhy.graph module
 ----------------------

-.. automodule:: dowhy.gcm.graph
+.. automodule:: dowhy.graph
   :members:
   :undoc-members:
   :show-inheritance:
--- a/docs/source/user_guide/gcm_based_inference/customizing_model_assignment.rst
+++ b/docs/source/user_guide/gcm_based_inference/customizing_model_assignment.rst
@@ -18,13 +18,13 @@ distinguishes between these two types of nodes.

 For root nodes such as :math:`X`, the distribution :math:`P_x` is modeled using a stochastic model.
 Non-root nodes such as :math:`Y` are modelled using a *conditional* stochastic model. DoWhy's gcm package
-defines corresponding interfaces for both, namely :class:`~dowhy.gcm.graph.StochasticModel` and
-:class:`~dowhy.gcm.graph.ConditionalStochasticModel`.
+defines corresponding interfaces for both, namely :class:`~dowhy.gcm.causal_mechanisms.StochasticModel` and
+:class:`~dowhy.gcm.causal_mechanisms.ConditionalStochasticModel`.

 The gcm package also provides ready-to-use implementations, such as :class:`~dowhy.gcm.stochastic_models
 .ScipyDistribution` or :class:`~dowhy.gcm.stochastic_models.BayesianGaussianMixtureDistribution` for
-:class:`~dowhy.gcm.graph.StochasticModel`, and :class:`~dowhy.gcm.fcms.AdditiveNoiseModel` for
-:class:`~dowhy.gcm.graph.ConditionalStochasticModel`.
+:class:`~dowhy.gcm.causal_mechanisms.StochasticModel`, and :class:`~dowhy.gcm.causal_mechanisms.AdditiveNoiseModel` for
+:class:`~dowhy.gcm.causal_mechanisms.ConditionalStochasticModel`.

 Knowing that, we can now start to manually assign causal models to nodes according to our needs.
 Say, we know from domain knowledge, that our root node X follows a normal distribution. In this
@@ -38,7 +38,7 @@ case, we can explicitly assign this:
 >>> causal_model.set_causal_mechanism('X', gcm.ScipyDistribution(norm))

 For the non-root node Y, let's use an additive noise model (ANM), represented by the
-:class:`~dowhy.gcm.fcms.AdditiveNoiseModel` class. It has a
+:class:`~dowhy.gcm.causal_mechanisms.AdditiveNoiseModel` class. It has a
 structural assignment of the form: :math:`Y := f(X) + N`. Here, f is a deterministic prediction
 function, whereas N is a noise term. Let's put all of this together:

@@ -48,7 +48,7 @@ function, whereas N is a noise term. Let's put all of this together:

 The rather interesting part here is the ``prediction_model``, which corresponds to our function
 :math:`f` above. This prediction model must satisfy the contract defined by
-:class:`~dowhy.gcm.fcms.PredictionModel`, i.e. it must implement the methods::
+:class:`~dowhy.gcm.ml.PredictionModel`, i.e. it must implement the methods::

    def fit(self, X: np.ndarray, Y: np.ndarray) -> None: ...
    def predict(self, X: np.ndarray) -> np.ndarray: ...
@@ -79,9 +79,9 @@ Using ground truth models
 In some scenarios the ground truth models might be known and should be used instead. Let's
 assume, we know that our relationship are linear with coefficients :math:`\alpha = 2` and
 :math:`\beta = 3`. Let's make use of this knowledge by creating a custom prediction model that
-implements the :class:`~dowhy.gcm.fcms.PredictionModel` interface:
+implements the :class:`~dowhy.gcm.ml.PredictionModel` interface:

->>> class MyCustomModel(gcm.PredictionModel):
+>>> import dowhy.gcm.ml.prediction_model
+>>> class MyCustomModel(gcm.ml.PredictionModel):
 >>>     def __init__(self, coefficient):
 >>>         self.coefficient = coefficient
 >>>
--- a/dowhy/gcm/__init__.py
+++ b/dowhy/gcm/__init__.py
@@ -13,15 +13,14 @@ from .anomaly_scorers import (
    MedianDeviationScorer,
    RescaledMedianCDFQuantileScorer,
 )
-from .cms import FunctionalCausalModel, InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
+from .causal_mechanisms import AdditiveNoiseModel, ClassifierFCM, PostNonlinearModel
+from .causal_models import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
 from .confidence_intervals import confidence_intervals
 from .confidence_intervals_cms import bootstrap_sampling, fit_and_compute
 from .density_estimators import GaussianMixtureDensityEstimator, KernelDensityEstimator1D
 from .distribution_change import distribution_change, distribution_change_of_graphs
-from .fcms import AdditiveNoiseModel, ClassificationModel, ClassifierFCM, PostNonlinearModel, PredictionModel
 from .feature_relevance import feature_relevance_distribution, feature_relevance_sample, parent_relevance
 from .fitting_sampling import draw_samples, fit
-from .graph import ConditionalStochasticModel, DirectedGraph, FunctionalCausalModel, StochasticModel, is_root_node
 from .independence_test import (
    approx_kernel_based,
    generalised_cov_based,
@@ -30,6 +29,7 @@ from .independence_test import (
    regression_based,
 )
 from .influence import arrow_strength, intrinsic_causal_influence
+from .ml import ClassificationModel, PredictionModel
 from .stochastic_models import BayesianGaussianMixtureDistribution, EmpiricalDistribution, ScipyDistribution
 from .unit_change import unit_change
 from .validation import RejectionResult, refute_causal_structure, refute_invertible_model
--- a/dowhy/gcm/_noise.py
+++ b/dowhy/gcm/_noise.py
@@ -4,10 +4,15 @@ import networkx as nx
 import numpy as np
 import pandas as pd

-from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
-from dowhy.gcm.fcms import PredictionModel
-from dowhy.gcm.graph import get_ordered_predecessors, is_root_node, node_connected_subgraph_view, validate_causal_dag
+from dowhy.gcm.causal_models import (
+    InvertibleStructuralCausalModel,
+    ProbabilisticCausalModel,
+    StructuralCausalModel,
+    validate_causal_dag,
+)
+from dowhy.gcm.ml.prediction_model import PredictionModel
 from dowhy.gcm.util.general import shape_into_2d
+from dowhy.graph import get_ordered_predecessors, is_root_node, node_connected_subgraph_view


 def compute_data_from_noise(causal_model: StructuralCausalModel, noise_data: pd.DataFrame) -> pd.DataFrame:
--- a/dowhy/gcm/anomaly.py
+++ b/dowhy/gcm/anomaly.py
@@ -9,11 +9,12 @@ from dowhy.gcm import config
 from dowhy.gcm._noise import compute_noise_from_data, get_noise_dependent_function, noise_samples_of_ancestors
 from dowhy.gcm.anomaly_scorer import AnomalyScorer
 from dowhy.gcm.anomaly_scorers import MedianCDFQuantileScorer, RescaledMedianCDFQuantileScorer
-from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel
-from dowhy.gcm.graph import ConditionalStochasticModel, get_ordered_predecessors, is_root_node, validate_causal_dag
+from dowhy.gcm.causal_mechanisms import ConditionalStochasticModel
+from dowhy.gcm.causal_models import InvertibleStructuralCausalModel, ProbabilisticCausalModel, validate_causal_dag
 from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values
 from dowhy.gcm.stats import permute_features
 from dowhy.gcm.util.general import shape_into_2d
+from dowhy.graph import get_ordered_predecessors, is_root_node


 def conditional_anomaly_scores(
--- a/dowhy/gcm/auto.py
+++ b/dowhy/gcm/auto.py
@@ -13,10 +13,11 @@ from sklearn.model_selection import KFold, train_test_split
 from sklearn.preprocessing import MultiLabelBinarizer

 from dowhy.gcm import config
-from dowhy.gcm.cms import ProbabilisticCausalModel
-from dowhy.gcm.fcms import AdditiveNoiseModel, ClassificationModel, ClassifierFCM, PredictionModel
-from dowhy.gcm.graph import CAUSAL_MECHANISM, get_ordered_predecessors, is_root_node, validate_causal_model_assignment
+from dowhy.gcm.causal_mechanisms import AdditiveNoiseModel, ClassifierFCM
+from dowhy.gcm.causal_models import CAUSAL_MECHANISM, ProbabilisticCausalModel, validate_causal_model_assignment
 from dowhy.gcm.ml import (
+    ClassificationModel,
+    PredictionModel,
    create_hist_gradient_boost_classifier,
    create_hist_gradient_boost_regressor,
    create_lasso_regressor,
@@ -49,6 +50,7 @@ from dowhy.gcm.util.general import (
    set_random_seed,
    shape_into_2d,
 )
+from dowhy.graph import get_ordered_predecessors, is_root_node

 _LIST_OF_POTENTIAL_CLASSIFIERS_GOOD = [
    partial(create_logistic_regression_classifier, max_iter=1000),
--- a/dowhy/gcm/causal_mechanisms.py
+++ b/dowhy/gcm/causal_mechanisms.py
@@ -1,5 +1,4 @@
-"""This module defines multiple implementations of the abstract class :class:`~dowhy.gcm.graph.FunctionalCausalModel`
-(FCM)
+"""This module implements different causal mechanisms.

 Classes in this module should be considered experimental, meaning there might be breaking API changes in the future.
 """
@@ -10,52 +9,69 @@ from typing import List, Optional

 import numpy as np

-from dowhy.gcm.graph import FunctionalCausalModel, InvertibleFunctionalCausalModel, StochasticModel
+from dowhy.gcm.ml import ClassificationModel, PredictionModel
+from dowhy.gcm.ml.regression import InvertibleFunction
 from dowhy.gcm.util.general import is_categorical, shape_into_2d


-class PredictionModel:
-    """Represents general prediction model implementations. Each prediction model should provide a fit and a predict
-    method."""
+class StochasticModel(ABC):
+    """A stochastic model represents a model used for causal mechanisms for root nodes in a graphical causal model."""

    @abstractmethod
-    def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
+    def fit(self, X: np.ndarray) -> None:
+        """Fits the model according to the data."""
        raise NotImplementedError

    @abstractmethod
-    def predict(self, X: np.ndarray) -> np.ndarray:
+    def draw_samples(self, num_samples: int) -> np.ndarray:
+        """Draws samples for the fitted model."""
        raise NotImplementedError

    @abstractmethod
    def clone(self):
-        """
-        Clones the prediction model using the same hyper parameters but not fitted.
-
-        :return: An unfitted clone of the prediction model.
-        """
        raise NotImplementedError


-class ClassificationModel(PredictionModel):
-    @abstractmethod
-    def predict_probabilities(self, X: np.array) -> np.ndarray:
-        raise NotImplementedError
+class ConditionalStochasticModel(ABC):
+    """A conditional stochastic model represents a model used for causal mechanisms for non-root nodes in a graphical
+    causal model."""

-    @property
    @abstractmethod
-    def classes(self) -> List[str]:
-        raise NotImplementedError
-
-
-class InvertibleFunction:
-    @abstractmethod
-    def evaluate(self, X: np.ndarray) -> np.ndarray:
-        """Applies the function on the input."""
+    def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
+        """Fits the model according to the data."""
        raise NotImplementedError

    @abstractmethod
-    def evaluate_inverse(self, X: np.ndarray) -> np.ndarray:
-        """Returns the outcome of applying the inverse of the function on the inputs."""
+    def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
+        """Draws samples for the fitted model."""
+        raise NotImplementedError
+
+    @abstractmethod
+    def clone(self):
+        raise NotImplementedError
+
+
+class FunctionalCausalModel(ConditionalStochasticModel):
+    """Represents a Functional Causal Model (FCM), a specific type of conditional stochastic model, that is defined
+    as:
+        Y := f(X, N), N: Noise
+    """
+
+    def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
+        return self.evaluate(parent_samples, self.draw_noise_samples(parent_samples.shape[0]))
+
+    @abstractmethod
+    def draw_noise_samples(self, num_samples: int) -> np.ndarray:
+        raise NotImplementedError
+
+    @abstractmethod
+    def evaluate(self, parent_samples: np.ndarray, noise_samples: np.ndarray) -> np.ndarray:
+        raise NotImplementedError
+
+
+class InvertibleFunctionalCausalModel(FunctionalCausalModel, ABC):
+    @abstractmethod
+    def estimate_noise(self, target_samples: np.ndarray, parent_samples: np.ndarray) -> np.ndarray:
        raise NotImplementedError


--- a/dowhy/gcm/causal_models.py
+++ b/dowhy/gcm/causal_models.py
@@ -7,15 +7,26 @@ from typing import Any, Callable, Optional, Union

 import networkx as nx

-from dowhy.gcm.graph import (
-    CAUSAL_MECHANISM,
+from dowhy.gcm.causal_mechanisms import (
    ConditionalStochasticModel,
-    DirectedGraph,
    FunctionalCausalModel,
    InvertibleFunctionalCausalModel,
    StochasticModel,
-    clone_causal_models,
 )
+from dowhy.graph import (
+    DirectedGraph,
+    HasNodes,
+    get_ordered_predecessors,
+    is_root_node,
+    validate_acyclic,
+    validate_node_in_graph,
+)
+
+# This constant is used as key when storing/accessing models as causal mechanisms in graph node attributes
+CAUSAL_MECHANISM = "causal_mechanism"
+# This constant is used as key when storing the parents of a node during fitting. It's used for validation purposes
+# afterwards.
+PARENTS_DURING_FIT = "parents_during_fit"


 class ProbabilisticCausalModel:
@@ -83,7 +94,7 @@ class InvertibleStructuralCausalModel(StructuralCausalModel):
    :func:`~dowhy.gcm.whatif.counterfactual_samples`. This is a subclass of
    :class:`~dowhy.gcm.cms.StructuralCausalModel` and has further restrictions on the class of causal mechanisms.
    Here, the mechanisms of non-root nodes need to be invertible with respect to the noise,
-    such as :class:`~dowhy.gcm.fcms.PostNonlinearModel`.
+    such as :class:`~dowhy.gcm.causal_mechanisms.PostNonlinearModel`.
    """

    def set_causal_mechanism(
@@ -93,3 +104,60 @@ class InvertibleStructuralCausalModel(StructuralCausalModel):

    def causal_mechanism(self, node: Any) -> Union[StochasticModel, InvertibleFunctionalCausalModel]:
        return super().causal_mechanism(node)
+
+
+def validate_causal_dag(causal_graph: DirectedGraph) -> None:
+    validate_acyclic(causal_graph)
+    validate_causal_graph(causal_graph)
+
+
+def validate_causal_graph(causal_graph: DirectedGraph) -> None:
+    for node in causal_graph.nodes:
+        validate_node(causal_graph, node)
+
+
+def validate_node(causal_graph: DirectedGraph, node: Any) -> None:
+    validate_causal_model_assignment(causal_graph, node)
+    validate_local_structure(causal_graph, node)
+
+
+def validate_causal_model_assignment(causal_graph: DirectedGraph, target_node: Any) -> None:
+    validate_node_has_causal_model(causal_graph, target_node)
+
+    causal_model = causal_graph.nodes[target_node][CAUSAL_MECHANISM]
+
+    if is_root_node(causal_graph, target_node):
+        if not isinstance(causal_model, StochasticModel):
+            raise RuntimeError(
+                "Node %s is a root node and, thus, requires a StochasticModel, "
+                "but a %s was found!" % (target_node, causal_model)
+            )
+    elif not isinstance(causal_model, ConditionalStochasticModel):
+        raise RuntimeError(
+            "Node %s has parents and, thus, requires a ConditionalStochasticModel, "
+            "but a %s was found!" % (target_node, causal_model)
+        )
+
+
+def validate_local_structure(causal_graph: DirectedGraph, node: Any) -> None:
+    if PARENTS_DURING_FIT not in causal_graph.nodes[node] or causal_graph.nodes[node][
+        PARENTS_DURING_FIT
+    ] != get_ordered_predecessors(causal_graph, node):
+        raise RuntimeError(
+            "The causal mechanism of node %s is not fitted to the graphical structure! Fit all"
+            "causal models in the graph first. If the mechanism is already fitted based on the causal"
+            "parents, consider to update the persisted parents for that node manually." % node
+        )
+
+
+def validate_node_has_causal_model(causal_graph: HasNodes, node: Any) -> None:
+    validate_node_in_graph(causal_graph, node)
+
+    if CAUSAL_MECHANISM not in causal_graph.nodes[node]:
+        raise ValueError("Node %s has no assigned causal mechanism!" % node)
+
+
+def clone_causal_models(source: HasNodes, destination: HasNodes):
+    for node in destination.nodes:
+        if CAUSAL_MECHANISM in source.nodes[node]:
+            destination.nodes[node][CAUSAL_MECHANISM] = source.nodes[node][CAUSAL_MECHANISM].clone()
--- a/dowhy/gcm/confidence_intervals_cms.py
+++ b/dowhy/gcm/confidence_intervals_cms.py
@@ -10,7 +10,7 @@ import numpy as np
 import pandas as pd

 from dowhy.gcm import auto
-from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
+from dowhy.gcm.causal_models import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
 from dowhy.gcm.fitting_sampling import fit

 # A convenience function when computing confidence intervals specifically for non-deterministic causal queries. This
--- a/dowhy/gcm/distribution_change.py
+++ b/dowhy/gcm/distribution_change.py
@@ -14,22 +14,19 @@ from statsmodels.stats.multitest import multipletests
 from tqdm import tqdm

 from dowhy.gcm.auto import AssignmentQuality, assign_causal_mechanisms
-from dowhy.gcm.cms import ProbabilisticCausalModel
-from dowhy.gcm.divergence import auto_estimate_kl_divergence
-from dowhy.gcm.fitting_sampling import draw_samples, fit_causal_model_of_target
-from dowhy.gcm.graph import (
+from dowhy.gcm.causal_mechanisms import ConditionalStochasticModel
+from dowhy.gcm.causal_models import (
    PARENTS_DURING_FIT,
-    ConditionalStochasticModel,
-    DirectedGraph,
+    ProbabilisticCausalModel,
    clone_causal_models,
-    get_ordered_predecessors,
-    is_root_node,
-    node_connected_subgraph_view,
    validate_causal_dag,
 )
+from dowhy.gcm.divergence import auto_estimate_kl_divergence
+from dowhy.gcm.fitting_sampling import draw_samples, fit_causal_model_of_target
 from dowhy.gcm.independence_test.kernel import kernel_based
 from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values
 from dowhy.gcm.util.general import shape_into_2d
+from dowhy.graph import DirectedGraph, get_ordered_predecessors, is_root_node, node_connected_subgraph_view

 _logger = logging.getLogger(__name__)

--- a/dowhy/gcm/falsify.py
+++ b/dowhy/gcm/falsify.py
@@ -16,11 +16,11 @@ from joblib import Parallel, delayed
 from tqdm import tqdm

 import dowhy.gcm.config as config
-from dowhy.gcm.graph import DirectedGraph, get_ordered_predecessors
 from dowhy.gcm.independence_test import kernel_based
 from dowhy.gcm.util import plot
 from dowhy.gcm.util.general import set_random_seed
 from dowhy.gcm.validation import _get_non_descendants
+from dowhy.graph import DirectedGraph, get_ordered_predecessors

 COLORS = list(mcolors.TABLEAU_COLORS.values())

--- a/dowhy/gcm/feature.py
+++ b/dowhy/gcm/feature.py
@@ -6,7 +6,7 @@ import numpy as np
 import pandas as pd

 from dowhy.gcm import feature_relevance
-from dowhy.gcm.cms import StructuralCausalModel
+from dowhy.gcm.causal_models import StructuralCausalModel
 from dowhy.gcm.shapley import ShapleyConfig


--- a/dowhy/gcm/feature_relevance.py
+++ b/dowhy/gcm/feature_relevance.py
@@ -10,13 +10,13 @@ from typing import Any, Callable, Dict, Optional, Tuple, Union
 import numpy as np
 import pandas as pd

-from dowhy.gcm.cms import StructuralCausalModel
-from dowhy.gcm.fcms import ProbabilityEstimatorModel
+from dowhy.gcm.causal_mechanisms import ProbabilityEstimatorModel
+from dowhy.gcm.causal_models import StructuralCausalModel, validate_node
 from dowhy.gcm.fitting_sampling import draw_samples
-from dowhy.gcm.graph import get_ordered_predecessors, is_root_node, validate_node
 from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values
 from dowhy.gcm.stats import marginal_expectation
 from dowhy.gcm.util.general import shape_into_2d, variance_of_deviations, variance_of_matching_values
+from dowhy.graph import get_ordered_predecessors, is_root_node


 def parent_relevance(
--- a/dowhy/gcm/fitting_sampling.py
+++ b/dowhy/gcm/fitting_sampling.py
@@ -11,14 +11,13 @@ import pandas as pd
 from tqdm import tqdm

 from dowhy.gcm import config
-from dowhy.gcm.cms import ProbabilisticCausalModel
-from dowhy.gcm.graph import (
+from dowhy.gcm.causal_models import (
    PARENTS_DURING_FIT,
-    get_ordered_predecessors,
-    is_root_node,
+    ProbabilisticCausalModel,
    validate_causal_dag,
    validate_causal_model_assignment,
 )
+from dowhy.graph import get_ordered_predecessors, is_root_node


 def fit(causal_model: ProbabilisticCausalModel, data: pd.DataFrame):
--- a/dowhy/gcm/graph.py
+++ b/dowhy/gcm/graph.py
@@ -1,201 +0,0 @@
-"""This module defines the fundamental interfaces and functions related to causal graphs in graphical causal models.
-
-Classes and functions in this module should be considered experimental, meaning there might be breaking API changes in
-the future.
-"""
-
-from abc import ABC, abstractmethod
-from typing import Any, List
-
-import networkx as nx
-import numpy as np
-from networkx.algorithms.dag import has_cycle
-from typing_extensions import Protocol
-
-# This constant is used as key when storing/accessing models as causal mechanisms in graph node attributes
-CAUSAL_MECHANISM = "causal_mechanism"
-
-# This constant is used as key when storing the parents of a node during fitting. It's used for validation purposes
-# afterwards.
-PARENTS_DURING_FIT = "parents_during_fit"
-
-
-class HasNodes(Protocol):
-    """This protocol defines a trait for classes having nodes."""
-
-    @property
-    @abstractmethod
-    def nodes(self):
-        """:returns Dict[Any, Dict[Any, Any]]"""
-        raise NotImplementedError
-
-
-class HasEdges(Protocol):
-    """This protocol defines a trait for classes having edges."""
-
-    @property
-    @abstractmethod
-    def edges(self):
-        """:returns a Dict[Tuple[Any, Any], Dict[Any, Any]]"""
-        raise NotImplementedError
-
-
-class DirectedGraph(HasNodes, HasEdges, Protocol):
-    """A protocol representing a directed graph as needed by graphical causal models.
-
-    This protocol specifically defines a subset of the networkx.DiGraph class, which make that class automatically
-    compatible with DirectedGraph. While in most cases a networkx.DiGraph is the class of choice when constructing
-    a causal graph, anyone can choose to provide their own implementation of the DirectGraph interface.
-    """
-
-    @abstractmethod
-    def predecessors(self, node):
-        raise NotImplementedError
-
-
-class StochasticModel(ABC):
-    """A stochastic model represents a model used for causal mechanisms for root nodes in a graphical causal model."""
-
-    @abstractmethod
-    def fit(self, X: np.ndarray) -> None:
-        """Fits the model according to the data."""
-        raise NotImplementedError
-
-    @abstractmethod
-    def draw_samples(self, num_samples: int) -> np.ndarray:
-        """Draws samples for the fitted model."""
-        raise NotImplementedError
-
-    @abstractmethod
-    def clone(self):
-        raise NotImplementedError
-
-
-class ConditionalStochasticModel(ABC):
-    """A conditional stochastic model represents a model used for causal mechanisms for non-root nodes in a graphical
-    causal model."""
-
-    @abstractmethod
-    def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
-        """Fits the model according to the data."""
-        raise NotImplementedError
-
-    @abstractmethod
-    def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
-        """Draws samples for the fitted model."""
-        raise NotImplementedError
-
-    @abstractmethod
-    def clone(self):
-        raise NotImplementedError
-
-
-class FunctionalCausalModel(ConditionalStochasticModel):
-    """Represents a Functional Causal Model (FCM), a specific type of conditional stochastic model, that is defined
-    as:
-        Y := f(X, N), N: Noise
-    """
-
-    def draw_samples(self, parent_samples: np.ndarray) -> np.ndarray:
-        return self.evaluate(parent_samples, self.draw_noise_samples(parent_samples.shape[0]))
-
-    @abstractmethod
-    def draw_noise_samples(self, num_samples: int) -> np.ndarray:
-        raise NotImplementedError
-
-    @abstractmethod
-    def evaluate(self, parent_samples: np.ndarray, noise_samples: np.ndarray) -> np.ndarray:
-        raise NotImplementedError
-
-
-class InvertibleFunctionalCausalModel(FunctionalCausalModel, ABC):
-    @abstractmethod
-    def estimate_noise(self, target_samples: np.ndarray, parent_samples: np.ndarray) -> np.ndarray:
-        raise NotImplementedError
-
-
-def is_root_node(causal_graph: DirectedGraph, node: Any) -> bool:
-    return list(causal_graph.predecessors(node)) == []
-
-
-def get_ordered_predecessors(causal_graph: DirectedGraph, node: Any) -> List[Any]:
-    """This function returns predecessors of a node in a well-defined order.
-
-    This is necessary, because we select subsets of columns in Dataframes by using a node's parents, and these parents
-    might not be returned in a reliable order.
-    """
-    return sorted(causal_graph.predecessors(node))
-
-
-def node_connected_subgraph_view(g: DirectedGraph, node: Any) -> Any:
-    """Returns a view of the provided graph g that contains only nodes connected to the node passed in"""
-    # can't use nx.node_connected_component, because it doesn't work with DiGraphs.
-    # Hence a manual loop:
-    return nx.induced_subgraph(g, [n for n in g.nodes if nx.has_path(g, n, node)])
-
-
-def clone_causal_models(source: HasNodes, destination: HasNodes):
-    for node in destination.nodes:
-        if CAUSAL_MECHANISM in source.nodes[node]:
-            destination.nodes[node][CAUSAL_MECHANISM] = source.nodes[node][CAUSAL_MECHANISM].clone()
-
-
-def validate_acyclic(causal_graph: DirectedGraph) -> None:
-    if has_cycle(causal_graph):
-        raise RuntimeError("The graph contains a cycle, but an acyclic graph is expected!")
-
-
-def validate_causal_dag(causal_graph: DirectedGraph) -> None:
-    validate_acyclic(causal_graph)
-    validate_causal_graph(causal_graph)
-
-
-def validate_causal_graph(causal_graph: DirectedGraph) -> None:
-    for node in causal_graph.nodes:
-        validate_node(causal_graph, node)
-
-
-def validate_node(causal_graph: DirectedGraph, node: Any) -> None:
-    validate_causal_model_assignment(causal_graph, node)
-    validate_local_structure(causal_graph, node)
-
-
-def validate_causal_model_assignment(causal_graph: DirectedGraph, target_node: Any) -> None:
-    validate_node_has_causal_model(causal_graph, target_node)
-
-    causal_model = causal_graph.nodes[target_node][CAUSAL_MECHANISM]
-
-    if is_root_node(causal_graph, target_node):
-        if not isinstance(causal_model, StochasticModel):
-            raise RuntimeError(
-                "Node %s is a root node and, thus, requires a StochasticModel, "
-                "but a %s was found!" % (target_node, causal_model)
-            )
-    elif not isinstance(causal_model, ConditionalStochasticModel):
-        raise RuntimeError(
-            "Node %s has parents and, thus, requires a ConditionalStochasticModel, "
-            "but a %s was found!" % (target_node, causal_model)
-        )
-
-
-def validate_local_structure(causal_graph: DirectedGraph, node: Any) -> None:
-    if PARENTS_DURING_FIT not in causal_graph.nodes[node] or causal_graph.nodes[node][
-        PARENTS_DURING_FIT
-    ] != get_ordered_predecessors(causal_graph, node):
-        raise RuntimeError(
-            "The causal mechanism of node %s is not fitted to the graphical structure! Fit all"
-            "causal models in the graph first. If the mechanism is already fitted based on the causal"
-            "parents, consider to update the persisted parents for that node manually." % node
-        )
-
-
-def validate_node_has_causal_model(causal_graph: HasNodes, node: Any) -> None:
-    validate_node_in_graph(causal_graph, node)
-
-    if CAUSAL_MECHANISM not in causal_graph.nodes[node]:
-        raise ValueError("Node %s has no assigned causal mechanism!" % node)
-
-
-def validate_node_in_graph(causal_graph: HasNodes, node: Any) -> None:
-    if node not in causal_graph.nodes:
-        raise ValueError("Node %s can not be found in the given graph!" % node)
--- a/dowhy/gcm/independence_test/generalised_cov_measure.py
+++ b/dowhy/gcm/independence_test/generalised_cov_measure.py
@@ -4,7 +4,7 @@ import numpy as np
 from scipy import stats

 from dowhy.gcm.auto import AssignmentQuality, select_model
-from dowhy.gcm.fcms import PredictionModel
+from dowhy.gcm.ml import PredictionModel
 from dowhy.gcm.util.general import is_categorical, shape_into_2d


--- a/dowhy/gcm/influence.py
+++ b/dowhy/gcm/influence.py
@@ -14,22 +14,22 @@ from numpy.matlib import repmat
 import dowhy.gcm.auto as auto
 from dowhy.gcm import feature_relevance_sample
 from dowhy.gcm._noise import compute_data_from_noise, compute_noise_from_data, noise_samples_of_ancestors
-from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
-from dowhy.gcm.divergence import estimate_kl_divergence_of_probabilities
-from dowhy.gcm.fcms import ClassificationModel, ClassifierFCM, PredictionModel, ProbabilityEstimatorModel
-from dowhy.gcm.fitting_sampling import draw_samples
-from dowhy.gcm.graph import (
-    ConditionalStochasticModel,
-    get_ordered_predecessors,
-    is_root_node,
-    node_connected_subgraph_view,
+from dowhy.gcm.causal_mechanisms import ClassifierFCM, ConditionalStochasticModel, ProbabilityEstimatorModel
+from dowhy.gcm.causal_models import (
+    InvertibleStructuralCausalModel,
+    ProbabilisticCausalModel,
+    StructuralCausalModel,
    validate_causal_dag,
    validate_node,
 )
+from dowhy.gcm.divergence import estimate_kl_divergence_of_probabilities
+from dowhy.gcm.fitting_sampling import draw_samples
+from dowhy.gcm.ml import ClassificationModel, PredictionModel
 from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values
 from dowhy.gcm.stats import marginal_expectation
 from dowhy.gcm.uncertainty import estimate_entropy_of_probabilities, estimate_variance
 from dowhy.gcm.util.general import has_categorical, is_categorical, means_difference, set_random_seed, shape_into_2d
+from dowhy.graph import get_ordered_predecessors, is_root_node, node_connected_subgraph_view

 _logger = logging.getLogger(__name__)

--- a/dowhy/gcm/ml/__init__.py
+++ b/dowhy/gcm/ml/__init__.py
@@ -1,9 +1,10 @@
-"""This module defines implementations of :class:`~dowhy.gcm.fcms.PredictionModel` used by the different
-:class:`~dowhy.gcm.graph.FunctionalCausalModel` implementations, such as :class:`~dowhy.gcm.fcms.PostNonlinearModel` or
-:class:`~dowhy.gcm.fcms.AdditiveNoiseModel`.
+"""This module defines implementations of :class:`~dowhy.gcm.ml.PredictionModel` used by the different
+:class:`~dowhy.gcm.causal_mechanisms.FunctionalCausalModel` implementations, such as :class:`~dowhy.gcm.causal_mechanisms.PostNonlinearModel` or
+:class:`~dowhy.gcm.causal_mechanisms.AdditiveNoiseModel`.
 """

 from .classification import (
+    ClassificationModel,
    SklearnClassificationModel,
    create_gaussian_process_classifier,
    create_hist_gradient_boost_classifier,
@ -11,7 +12,9 @@ from .classification import (
    create_polynom_logistic_regression_classifier,
    create_random_forest_classifier,
 )
+from .prediction_model import PredictionModel
 from .regression import (
+    InvertibleFunction,
    SklearnRegressionModel,
    create_elastic_net_regressor,
    create_gaussian_process_regressor,
--- a/dowhy/gcm/ml/autogluon.py
+++ b/dowhy/gcm/ml/autogluon.py
@ -6,7 +6,7 @@ from autogluon import tabular
 from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config
 from packaging import version

-from dowhy.gcm.fcms import ClassificationModel, PredictionModel
+from dowhy.gcm.ml import ClassificationModel, PredictionModel
 from dowhy.gcm.util.general import shape_into_2d


--- a/dowhy/gcm/ml/classification.py
+++ b/dowhy/gcm/ml/classification.py
@ -1,7 +1,7 @@
 """Functions and classes in this module should be considered experimental, meaning there might be breaking API changes
 in the future.
 """
-
+from abc import abstractmethod
 from typing import List

 import numpy as np
@ -10,6 +10,8 @@ from packaging import version
 from sklearn.pipeline import make_pipeline
 from sklearn.preprocessing import PolynomialFeatures

+from dowhy.gcm.ml.prediction_model import PredictionModel
+
 if version.parse(sklearn.__version__) < version.parse("1.0"):
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa

@ -25,11 +27,21 @@ from sklearn.naive_bayes import GaussianNB
 from sklearn.neighbors import KNeighborsClassifier
 from sklearn.svm import SVC

-from dowhy.gcm.fcms import ClassificationModel
 from dowhy.gcm.ml.regression import SklearnRegressionModel
 from dowhy.gcm.util.general import auto_apply_encoders, shape_into_2d


+class ClassificationModel(PredictionModel):
+    @abstractmethod
+    def predict_probabilities(self, X: np.array) -> np.ndarray:
+        raise NotImplementedError
+
+    @property
+    @abstractmethod
+    def classes(self) -> List[str]:
+        raise NotImplementedError
+
+
 class SklearnClassificationModel(SklearnRegressionModel, ClassificationModel):
    def predict_probabilities(self, X: np.array) -> np.ndarray:
        return shape_into_2d(self._sklearn_mdl.predict_proba(auto_apply_encoders(X, self._encoders)))
--- a/dowhy/gcm/ml/prediction_model.py
+++ b/dowhy/gcm/ml/prediction_model.py
@ -0,0 +1,25 @@
+from abc import abstractmethod
+
+import numpy as np
+
+
+class PredictionModel:
+    """Represents general prediction model implementations. Each prediction model should provide a fit and a predict
+    method."""
+
+    @abstractmethod
+    def fit(self, X: np.ndarray, Y: np.ndarray) -> None:
+        raise NotImplementedError
+
+    @abstractmethod
+    def predict(self, X: np.ndarray) -> np.ndarray:
+        raise NotImplementedError
+
+    @abstractmethod
+    def clone(self):
+        """
+        Clones the prediction model using the same hyper parameters but not fitted.
+
+        :return: An unfitted clone of the prediction model.
+        """
+        raise NotImplementedError
--- a/dowhy/gcm/ml/regression.py
+++ b/dowhy/gcm/ml/regression.py
@ -1,7 +1,7 @@
 """Functions and classes in this module should be considered experimental, meaning there might be breaking API changes
 in the future.
 """
-
+from abc import abstractmethod
 from typing import Any

 import numpy as np
@ -24,7 +24,7 @@ from sklearn.linear_model import ElasticNetCV, LassoCV, LassoLarsIC, LinearRegre
 from sklearn.neighbors import KNeighborsRegressor
 from sklearn.svm import SVR

-from dowhy.gcm.fcms import InvertibleFunction, PredictionModel
+from dowhy.gcm.ml.prediction_model import PredictionModel
 from dowhy.gcm.util.general import auto_apply_encoders, auto_fit_encoders, shape_into_2d


@ -122,6 +122,18 @@ def create_polynom_regressor(degree: int = 2, **kwargs_linear_model) -> SklearnR
    )


+class InvertibleFunction:
+    @abstractmethod
+    def evaluate(self, X: np.ndarray) -> np.ndarray:
+        """Applies the function on the input."""
+        raise NotImplementedError
+
+    @abstractmethod
+    def evaluate_inverse(self, X: np.ndarray) -> np.ndarray:
+        """Returns the outcome of applying the inverse of the function on the inputs."""
+        raise NotImplementedError
+
+
 class InvertibleIdentityFunction(InvertibleFunction):
    def evaluate(self, X: np.ndarray) -> np.ndarray:
        return X
--- a/dowhy/gcm/stochastic_models.py
+++ b/dowhy/gcm/stochastic_models.py
@ -13,8 +13,8 @@ from sklearn.cluster import KMeans
 from sklearn.metrics import silhouette_score
 from sklearn.mixture import BayesianGaussianMixture

+from dowhy.gcm.causal_mechanisms import StochasticModel
 from dowhy.gcm.divergence import estimate_kl_divergence_continuous
-from dowhy.gcm.graph import StochasticModel
 from dowhy.gcm.util.general import shape_into_2d

 _CONTINUOUS_DISTRIBUTIONS = [
--- a/dowhy/gcm/unit_change.py
+++ b/dowhy/gcm/unit_change.py
@ -9,7 +9,7 @@ import pandas as pd
 from sklearn.linear_model._base import LinearModel
 from sklearn.utils.validation import check_is_fitted

-from dowhy.gcm.fcms import PredictionModel
+from dowhy.gcm.ml.prediction_model import PredictionModel
 from dowhy.gcm.ml.regression import SklearnRegressionModel
 from dowhy.gcm.shapley import ShapleyConfig, estimate_shapley_values

--- a/dowhy/gcm/validation.py
+++ b/dowhy/gcm/validation.py
@ -11,9 +11,9 @@ import numpy as np
 import pandas as pd
 from statsmodels.stats.multitest import multipletests

-from dowhy.gcm.cms import InvertibleStructuralCausalModel
-from dowhy.gcm.graph import DirectedGraph, get_ordered_predecessors, is_root_node, validate_causal_graph
+from dowhy.gcm.causal_models import InvertibleStructuralCausalModel, validate_causal_graph
 from dowhy.gcm.independence_test import kernel_based
+from dowhy.graph import DirectedGraph, get_ordered_predecessors, is_root_node


 class RejectionResult(Enum):
--- a/dowhy/gcm/whatif.py
+++ b/dowhy/gcm/whatif.py
@ -10,15 +10,19 @@ import numpy as np
 import pandas as pd

 from dowhy.gcm._noise import compute_noise_from_data
-from dowhy.gcm.cms import InvertibleStructuralCausalModel, ProbabilisticCausalModel, StructuralCausalModel
-from dowhy.gcm.fcms import ClassifierFCM
+from dowhy.gcm.causal_mechanisms import ClassifierFCM
+from dowhy.gcm.causal_models import (
+    InvertibleStructuralCausalModel,
+    ProbabilisticCausalModel,
+    StructuralCausalModel,
+    validate_causal_dag,
+)
 from dowhy.gcm.fitting_sampling import draw_samples
-from dowhy.gcm.graph import (
+from dowhy.graph import (
    DirectedGraph,
    get_ordered_predecessors,
    is_root_node,
    node_connected_subgraph_view,
-    validate_causal_dag,
    validate_node_in_graph,
 )

--- a/dowhy/graph.py
+++ b/dowhy/graph.py
@ -0,0 +1,75 @
+"""This module defines the fundamental interfaces and functions related to causal graphs.
+
+Classes and functions in this module should be considered experimental, meaning there might be breaking API changes in
+the future.
+"""
+
+from abc import abstractmethod
+from typing import Any, List
+
+import networkx as nx
+from networkx.algorithms.dag import has_cycle
+from typing_extensions import Protocol
+
+
+class HasNodes(Protocol):
+    """This protocol defines a trait for classes having nodes."""
+
+    @property
+    @abstractmethod
+    def nodes(self):
+        """:returns Dict[Any, Dict[Any, Any]]"""
+        raise NotImplementedError
+
+
+class HasEdges(Protocol):
+    """This protocol defines a trait for classes having edges."""
+
+    @property
+    @abstractmethod
+    def edges(self):
+        """:returns a Dict[Tuple[Any, Any], Dict[Any, Any]]"""
+        raise NotImplementedError
+
+
+class DirectedGraph(HasNodes, HasEdges, Protocol):
+    """A protocol representing a directed graph as needed by graphical causal models.
+
+    This protocol specifically defines a subset of the networkx.DiGraph class, which makes that class automatically
+    compatible with DirectedGraph. While in most cases a networkx.DiGraph is the class of choice when constructing
+    a causal graph, anyone can choose to provide their own implementation of the DirectedGraph interface.
+    """
+
+    @abstractmethod
+    def predecessors(self, node):
+        raise NotImplementedError
+
+
+def is_root_node(causal_graph: DirectedGraph, node: Any) -> bool:
+    return list(causal_graph.predecessors(node)) == []
+
+
+def get_ordered_predecessors(causal_graph: DirectedGraph, node: Any) -> List[Any]:
+    """This function returns predecessors of a node in a well-defined order.
+
+    This is necessary, because we select subsets of columns in Dataframes by using a node's parents, and these parents
+    might not be returned in a reliable order.
+    """
+    return sorted(causal_graph.predecessors(node))
+
+
+def node_connected_subgraph_view(g: DirectedGraph, node: Any) -> Any:
+    """Returns a view of the provided graph g that contains only the nodes with a directed path to the given node."""
+    # can't use nx.node_connected_component, because it doesn't work with DiGraphs.
+    # Hence, a manual loop:
+    return nx.induced_subgraph(g, [n for n in g.nodes if nx.has_path(g, n, node)])
+
+
+def validate_acyclic(causal_graph: DirectedGraph) -> None:
+    if has_cycle(causal_graph):
+        raise RuntimeError("The graph contains a cycle, but an acyclic graph is expected!")
+
+
+def validate_node_in_graph(causal_graph: HasNodes, node: Any) -> None:
+    if node not in causal_graph.nodes:
+        raise ValueError("Node %s can not be found in the given graph!" % node)
--- a/tests/gcm/ml/test_autogluon.py
+++ b/tests/gcm/ml/test_autogluon.py
@ -5,7 +5,7 @@ from flaky import flaky
 from pytest import approx, importorskip, mark
 from sklearn.model_selection import train_test_split

-from dowhy.gcm.fcms import AdditiveNoiseModel, ClassifierFCM
+from dowhy.gcm.causal_mechanisms import AdditiveNoiseModel, ClassifierFCM

 autogluon = importorskip("dowhy.gcm.ml.autogluon")
 from dowhy.gcm.ml.autogluon import AutoGluonClassifier, AutoGluonRegressor
--- a/tests/gcm/test_anomaly_attribution.py
+++ b/tests/gcm/test_anomaly_attribution.py
@ -9,13 +9,13 @@ from dowhy.gcm import (
    InverseDensityScorer,
    InvertibleStructuralCausalModel,
    MedianCDFQuantileScorer,
-    PredictionModel,
    attribute_anomalies,
    auto,
    fit,
 )
 from dowhy.gcm.anomaly import _relative_frequency, attribute_anomaly_scores
 from dowhy.gcm.density_estimators import GaussianMixtureDensityEstimator
+from dowhy.gcm.ml import PredictionModel


@flaky(max_runs=3)
--- a/tests/gcm/test_graph.py
+++ b/tests/gcm/test_graph.py
@ -5,15 +5,9 @@ import pytest
 from flaky import flaky
 from pytest import approx

-from dowhy.gcm import (
-    AdditiveNoiseModel,
-    EmpiricalDistribution,
-    ProbabilisticCausalModel,
-    draw_samples,
-    fit,
-    is_root_node,
-)
+from dowhy.gcm import AdditiveNoiseModel, EmpiricalDistribution, ProbabilisticCausalModel, draw_samples, fit
 from dowhy.gcm.ml import create_linear_regressor
+from dowhy.graph import is_root_node


@flaky(max_runs=2)
--- a/tests/gcm/test_intrinsic_influence.py
+++ b/tests/gcm/test_intrinsic_influence.py
@ -16,11 +16,11 @@ from dowhy.gcm import (
    intrinsic_causal_influence,
 )
 from dowhy.gcm._noise import noise_samples_of_ancestors
-from dowhy.gcm.graph import node_connected_subgraph_view
 from dowhy.gcm.influence import intrinsic_causal_influence_sample
 from dowhy.gcm.ml import create_hist_gradient_boost_classifier, create_linear_regressor_with_given_parameters
 from dowhy.gcm.uncertainty import estimate_entropy_of_probabilities, estimate_variance
 from dowhy.gcm.util.general import apply_one_hot_encoding, fit_one_hot_encoders
+from dowhy.graph import node_connected_subgraph_view
 from tests.gcm.test_noise import _persist_parents


--- a/tests/gcm/test_noise.py
+++ b/tests/gcm/test_noise.py
@ -6,7 +6,6 @@ from flaky import flaky

 from dowhy.gcm import (
    AdditiveNoiseModel,
-    DirectedGraph,
    EmpiricalDistribution,
    InvertibleStructuralCausalModel,
    StructuralCausalModel,
@ -14,12 +13,13 @@ from dowhy.gcm import (
 )
 from dowhy.gcm._noise import compute_data_from_noise, compute_noise_from_data, get_noise_dependent_function
 from dowhy.gcm.auto import assign_causal_mechanisms
-from dowhy.gcm.graph import PARENTS_DURING_FIT, get_ordered_predecessors
+from dowhy.gcm.causal_models import PARENTS_DURING_FIT
 from dowhy.gcm.ml import (
    create_linear_regressor,
    create_linear_regressor_with_given_parameters,
    create_logistic_regression_classifier,
 )
+from dowhy.graph import DirectedGraph, get_ordered_predecessors


 def test_given_data_with_known_noise_values_when_compute_data_from_noise_then_returns_correct_values():