diff --git a/docs/source/gcm/index.rst b/docs/source/gcm/index.rst
new file mode 100644
index 000000000..bbe09ddeb
--- /dev/null
+++ b/docs/source/gcm/index.rst
@@ -0,0 +1,4 @@
.. toctree::
   :maxdepth: 2

   user_guide/index
diff --git a/docs/source/gcm/user_guide/answering_causal_questions/attribute_distributional_changes.rst b/docs/source/gcm/user_guide/answering_causal_questions/attribute_distributional_changes.rst
new file mode 100644
index 000000000..45e192657
--- /dev/null
+++ b/docs/source/gcm/user_guide/answering_causal_questions/attribute_distributional_changes.rst
@@ -0,0 +1,51 @@
Attributing Distributional Changes
==================================

When attributing distribution changes, we answer the question:

    What mechanism in my system changed between two sets of data?

For example, in a distributed computing system, we want to know why an important system metric changed in a negative way.

How to use it
^^^^^^^^^^^^^

To see how the method works, let's take the example from above and assume we have a system of three services X, Y, Z,
producing latency numbers. The first dataset ``data_old`` is from before the deployment, ``data_new`` from after the
deployment:

>>> import networkx as nx, numpy as np, pandas as pd
>>> from dowhy import gcm
>>> from scipy.stats import halfnorm
>>>
>>> X = halfnorm.rvs(size=1000, loc=0.5, scale=0.2)
>>> Y = halfnorm.rvs(size=1000, loc=1.0, scale=0.2)
>>> Z = np.maximum(X, Y) + np.random.normal(loc=0, scale=1, size=1000)
>>> data_old = pd.DataFrame(data=dict(X=X, Y=Y, Z=Z))
>>>
>>> X = halfnorm.rvs(size=1000, loc=0.5, scale=0.2)
>>> Y = halfnorm.rvs(size=1000, loc=1.0, scale=0.2)
>>> Z = X + Y + np.random.normal(loc=0, scale=1, size=1000)
>>> data_new = pd.DataFrame(data=dict(X=X, Y=Y, Z=Z))

The change here simulates an accidental conversion of multi-threaded code into sequential code (waiting for X and Y in
parallel vs. waiting for them sequentially).

Next, we'll model the cause-effect relationships as a probabilistic causal model:

>>> causal_model = gcm.ProbabilisticCausalModel(nx.DiGraph([('X', 'Z'), ('Y', 'Z')]))  # X -> Z <- Y
>>> gcm.auto_assign_causal_models(causal_model, based_on=data_old)

Finally, we attribute changes in distributions to changes in causal mechanisms:

>>> attributions = gcm.distribution_change(causal_model, data_old, data_new, 'Z')
>>> attributions
{'X': -0.0066425020480165905, 'Y': 0.009816959724738061, 'Z': 0.21957816956354193}

As we can see, :math:`Z` got the highest attribution score here, which matches what we would
expect, given that we changed the mechanism for variable :math:`Z` in our data generation.

As the reader may have noticed, there is no fitting step involved when using this method. The
reason is that this function calls ``fit`` internally. To be precise, it makes two copies of the
causal graph and fits one copy to the first dataset and the other to the second dataset.
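Attribution scores like these are often easier to compare side by side in a chart. Below is a
minimal sketch of how one might visualize them; it assumes ``matplotlib`` is installed, which is not
required by ``dowhy.gcm`` itself:

>>> import matplotlib.pyplot as plt
>>>
>>> # Plot the attribution score of each node's causal mechanism.
>>> plt.bar(list(attributions.keys()), list(attributions.values()))
>>> plt.ylabel('Attribution of change in Z')
>>> plt.show()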
diff --git a/docs/source/gcm/user_guide/answering_causal_questions/computing_counterfactuals.rst b/docs/source/gcm/user_guide/answering_causal_questions/computing_counterfactuals.rst
new file mode 100644
index 000000000..5021ceaa1
--- /dev/null
+++ b/docs/source/gcm/user_guide/answering_causal_questions/computing_counterfactuals.rst
@@ -0,0 +1,83 @@
Computing Counterfactuals
=========================

By computing counterfactuals, we answer the question:

    I observed a certain outcome z for a variable Z, where variable X was set to a value x. What
    would have happened to the value of Z, had I intervened on X to assign it a different value x'?

As a concrete example, we can imagine the following:

    I'm seeing unhealthily high levels of my `cholesterol LDL `_ (Z=10). I didn't take any
    medication against it in recent months (X=0). What would have happened to my cholesterol LDL
    level (Z), had I taken a medication dosage of 5g a day (X := 5)?

How to use it
^^^^^^^^^^^^^

To see how the method works, let's generate some data:

>>> import networkx as nx, numpy as np, pandas as pd
>>> from dowhy import gcm
>>>
>>> X = np.random.normal(loc=0, scale=1, size=1000)
>>> Y = 2*X + np.random.normal(loc=0, scale=1, size=1000)
>>> Z = 3*Y + np.random.normal(loc=0, scale=1, size=1000)
>>> training_data = pd.DataFrame(data=dict(X=X, Y=Y, Z=Z))

Next, we'll model the cause-effect relationships as an invertible SCM and fit it to the data:

>>> causal_model = gcm.InvertibleStructuralCausalModel(nx.DiGraph([('X', 'Y'), ('Y', 'Z')]))  # X -> Y -> Z
>>> gcm.auto_assign_causal_models(causal_model, training_data, gcm.AutoAssignQuality.GOOD)
>>>
>>> gcm.fit(causal_model, training_data)

Finally, let's compute the counterfactual when intervening on X:

>>> gcm.estimate_counterfactuals(
>>>     causal_model,
>>>     {'X': lambda x: 2},
>>>     observed_data=pd.DataFrame(data=dict(X=[1], Y=[2], Z=[3])))
   X         Y         Z
0  2  4.034229  9.073294

As we can see, :math:`X` takes our treatment/intervention value of 2, and :math:`Y` and :math:`Z`
take deterministic values based on our trained causal models and the fixed observed data. That is,
based on the data generation process, if :math:`X = 1` and :math:`Y = 2`, we would expect :math:`Z`
to be 6. But we *observed* :math:`Z = 3`, which means the noise value for :math:`Z` in this
particular sample is approximately -2.98. Given that we know this hidden noise factor, we can
estimate the counterfactual value of :math:`Z`, had we set :math:`X := 2`, which is approximately
9.07 (as can be seen in the result above).

This shows that the observed data is used to calculate the noise data in the system. We can also
provide these noise values directly:

>>> gcm.estimate_counterfactuals(
>>>     causal_model,
>>>     {'X': lambda x: 2},
>>>     noise_data=pd.DataFrame(data=dict(X=[0], Y=[-0.007913], Z=[-2.97568])))
   X         Y         Z
0  2  4.034229  9.073293

As we can see, with :math:`X = 2` and :math:`Y \approx 4.03`, :math:`Z` should be approximately 12.
But since we know the hidden noise for this sample is approximately -2.98, the counterfactual
outcome is again :math:`Z \approx 9.07`.

Understanding the method
^^^^^^^^^^^^^^^^^^^^^^^^

Counterfactuals are very similar to :doc:`simulate_impact_of_interventions`, with one important
difference: when performing interventions, we look into the future; for counterfactuals, we look
into an alternative past. To reflect this in the computation, when performing interventions, we
generate all noise using our causal models. For counterfactuals, we use the noise from the actual
observed data.

To expand on our example above, assume there are other factors that contribute to cholesterol
levels, e.g. exercise or genetic predisposition. While we *assume* that medication helps against
high LDL levels, it's important to take into account all other factors that could also help against
it. We want to prove *what* has helped. Hence, it's important to use the noise from the real data,
not noise generated by our models. Otherwise, I might be able to reduce my cholesterol LDL level in
the counterfactual world where I take medication (X := 5), not because I took the medication, but
because the *generated noise* of Z also just happened to be low and so caused a low value for Z. By
taking the *real* noise value of Z (derived from the observed data of Z), I can show that it was
the medication that helped.
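To build intuition for how these noise values are recovered, we can redo the computation from the
example above by hand. This is only a sketch: it assumes the fitted models recovered the
ground-truth coefficients (2 and 3) of our data generation process exactly, whereas the actual
fitted coefficients deviate slightly (which is why the estimates above read 4.03 and 9.07 rather
than exactly 4 and 9):

>>> # Observed sample: X=1, Y=2, Z=3.
>>> y_noise = 2 - 2 * 1        # Y = 2*X + noise  =>  y_noise = 0
>>> z_noise = 3 - 3 * 2        # Z = 3*Y + noise  =>  z_noise = -3
>>> # Counterfactual: set X := 2 and reuse the recovered noise values.
>>> y_cf = 2 * 2 + y_noise
>>> z_cf = 3 * y_cf + z_noise
>>> y_cf, z_cf
(4, 9)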
diff --git a/docs/source/gcm/user_guide/answering_causal_questions/index.rst b/docs/source/gcm/user_guide/answering_causal_questions/index.rst
new file mode 100644
index 000000000..cd0e017de
--- /dev/null
+++ b/docs/source/gcm/user_guide/answering_causal_questions/index.rst
@@ -0,0 +1,13 @@
Answering Causal Questions
==========================

In the following sub-sections, we'll dive deeper into the causal questions that GCM-based inference in
DoWhy can answer, explain the concepts behind them, and show how to interpret the results.


.. toctree::
   :maxdepth: 3

   simulate_impact_of_interventions
   computing_counterfactuals
   attribute_distributional_changes
diff --git a/docs/source/gcm/user_guide/answering_causal_questions/simulate_impact_of_interventions.rst b/docs/source/gcm/user_guide/answering_causal_questions/simulate_impact_of_interventions.rst
new file mode 100644
index 000000000..efd6f49c4
--- /dev/null
+++ b/docs/source/gcm/user_guide/answering_causal_questions/simulate_impact_of_interventions.rst
@@ -0,0 +1,50 @@
Simulating the Impact of Interventions
======================================

By simulating the impact of interventions, we answer the question:

    What will happen to the variable Z if I intervene on Y?

How to use it
^^^^^^^^^^^^^

To see how the method works, let's generate some data:

>>> import numpy as np, pandas as pd
>>>
>>> X = np.random.normal(loc=0, scale=1, size=1000)
>>> Y = 2*X + np.random.normal(loc=0, scale=1, size=1000)
>>> Z = 3*Y + np.random.normal(loc=0, scale=1, size=1000)
>>> training_data = pd.DataFrame(data=dict(X=X, Y=Y, Z=Z))

Next, we'll model the cause-effect relationships as a probabilistic causal model and fit it to the data:

>>> import networkx as nx
>>> from dowhy import gcm
>>>
>>> causal_model = gcm.ProbabilisticCausalModel(nx.DiGraph([('X', 'Y'), ('Y', 'Z')]))  # X -> Y -> Z
>>> gcm.auto_assign_causal_models(causal_model, training_data)
>>> gcm.fit(causal_model, training_data)

Finally, let's perform an intervention on X:

>>> samples = gcm.perform_intervention(causal_model, {'X': lambda x: 1}, num_samples_to_draw=1000)
>>> samples.head()
   X         Y          Z
0  1  3.481467  12.475105
1  1  1.282945   3.279435
2  1  2.508717   7.907412
3  1  2.077061   5.506252
4  1  1.400568   6.097633

As we can see, X is now fixed at the constant value 1. This is known as an atomic intervention.
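Because we know the true data generation process in this example, we can sanity-check the
interventional samples. Under the ground-truth equations above, :math:`E[Y \mid do(X:=1)] = 2 \cdot 1 = 2`
and :math:`E[Z \mid do(X:=1)] = 3 \cdot 2 \cdot 1 = 6`. A minimal sketch of such a check (the
exact values will vary from run to run):

>>> # Sample means should be close to the theoretical interventional means.
>>> samples[['Y', 'Z']].mean()  # expect values near 2 and 6, respectively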
We can also perform shift interventions, where we shift the random variable X by some value:

>>> samples = gcm.perform_intervention(causal_model, {'X': lambda x: x + 0.5}, num_samples_to_draw=1000)
>>> samples.head()
          X         Y          Z
0 -0.542813  0.031771   1.195391
1  1.615089  2.156833   6.704683
2  1.340949  1.910316   5.882468
3  1.837919  4.360685  12.565738
4  3.791410  8.361918  25.477725
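One way to quantify the impact of such a shift is to compare the mean of Z under the intervention
with its mean in the original training data. As a sketch based on the ground-truth coefficients of
our synthetic data, shifting X by 0.5 should raise the mean of Z by roughly
:math:`3 \cdot 2 \cdot 0.5 = 3` (the exact estimate will vary from run to run):

>>> # Average effect of the shift intervention on Z, relative to the observed data.
>>> samples['Z'].mean() - training_data['Z'].mean()  # expect a value near 3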
diff --git a/docs/source/gcm/user_guide/customizing_model_assignment.rst b/docs/source/gcm/user_guide/customizing_model_assignment.rst
new file mode 100644
index 000000000..10d364d64
--- /dev/null
+++ b/docs/source/gcm/user_guide/customizing_model_assignment.rst
@@ -0,0 +1,4 @@
Customizing Model Assignment
============================

TODO
\ No newline at end of file
diff --git a/docs/source/gcm/user_guide/index.rst b/docs/source/gcm/user_guide/index.rst
new file mode 100644
index 000000000..d56f4e04f
--- /dev/null
+++ b/docs/source/gcm/user_guide/index.rst
@@ -0,0 +1,10 @@
GCM User Guide
==============

.. toctree::
   :maxdepth: 1
   :glob:

   introduction
   answering_causal_questions/index
   customizing_model_assignment
diff --git a/docs/source/gcm/user_guide/introduction.rst b/docs/source/gcm/user_guide/introduction.rst
new file mode 100644
index 000000000..46fdea3f0
--- /dev/null
+++ b/docs/source/gcm/user_guide/introduction.rst
@@ -0,0 +1,163 @@
Introduction
============

Graphical causal model-based inference, or GCM-based inference for short, is an experimental addition to DoWhy that
currently works separately from DoWhy's main API. Its experimental status also means that its API may
undergo breaking changes in the future. It will eventually form part of a joint, new API. We
welcome your comments.

The ``dowhy.gcm`` package provides a variety of ways to answer causal questions, and we'll go through them in detail in
section :doc:`answering_causal_questions/index`. Before diving into them, however, let's understand
the basic building blocks and usage patterns it is built upon.

The basic building blocks
^^^^^^^^^^^^^^^^^^^^^^^^^

All main features of the GCM-based inference in DoWhy are built around the concept of **graphical causal models**. A
graphical causal model consists of a causal directed acyclic graph (DAG) of variables and a **causal mechanism** for
each of the variables. A causal mechanism defines the conditional distribution of a variable given its parents in the
graph or, in the case of root node variables, simply its distribution.

The most general case of a GCM is a **probabilistic causal model** (PCM), where causal mechanisms are defined by
**conditional stochastic models** and **stochastic models**. In the ``dowhy.gcm`` package, these are represented by
:class:`~ProbabilisticCausalModel`, :class:`~ConditionalStochasticModel`, and :class:`~StochasticModel`.

.. image:: pcm.png
   :width: 80%
   :align: center

|

In practical terms, however, we often use **structural causal models** (SCMs) to represent our GCMs,
where the causal mechanisms are defined by **functional causal models** (FCMs) for non-root nodes and **stochastic
models** for root nodes. An SCM implements the same traits as a PCM, but on top of that, its FCMs allow us to
reason *further* about its data generation process based on parents and noise, and hence allow us, e.g., to compute
counterfactuals.

.. image:: scm.png
   :width: 80%
   :align: center

|

To keep this introduction simple, we will stick with SCMs for now.

As mentioned above, a causal mechanism describes how the values of a node are influenced by the values of its parent
nodes. We will dive much deeper into the details of causal mechanisms and their meaning in section
:doc:`customizing_model_assignment`. But for this introduction, we will treat them as an opaque thing that is needed
to answer causal questions. With that in mind, the typical steps involved in answering a causal question are:

1. **Modeling cause-effect relationships as a GCM (causal graph + causal mechanisms):**
::

    causal_model = StructuralCausalModel(nx.DiGraph([('X', 'Y'), ('Y', 'Z')]))  # X -> Y -> Z
    auto_assign_causal_models(causal_model, based_on=data)

2. **Fitting the GCM to the data:**
::

    fit(causal_model, data)

3. **Answering a causal query based on the GCM:**
::

    results = <causal_query>(causal_model, ...)

Where ``<causal_query>`` can be one of multiple functions explained in
:doc:`answering_causal_questions/index`.

Let's look at each of these steps in more detail.

Step 1: Modeling cause-effect relationships as a structural causal model (SCM)
------------------------------------------------------------------------------

The first step is to model the cause-effect relationships between the variables relevant
to our use case. We do that in the form of a causal graph, i.e. a directed acyclic
graph (DAG) where an edge X→Y implies that X causes Y. Statistically, a causal graph encodes the
conditional independence relations between variables. Using the `networkx <https://networkx.org/>`__
library, we can create causal graphs. In the snippet below, we create a chain X→Y→Z:

>>> import networkx as nx
>>> causal_graph = nx.DiGraph([('X', 'Y'), ('Y', 'Z')])

To answer causal questions using causal graphs, we also need to know the nature of the underlying
data-generating process of the variables. A causal graph by itself, being a diagram, does not
contain any information about this data-generating process. To introduce it, we use an SCM that's
built on top of our causal graph:

>>> from dowhy import gcm
>>> causal_model = gcm.StructuralCausalModel(causal_graph)

This causal model now allows us to assign causal mechanisms to each node in the form of functional causal models.
Section :doc:`customizing_model_assignment` explains how this can be done explicitly, but for now, we'll rely on
our auto-assign feature, which automatically determines a good set of default functional causal
models based on the data we work with.

At this point we would normally load our dataset. For this introduction, we generate
some synthetic data instead. The API takes data in the form of Pandas DataFrames:

>>> import numpy as np, pandas as pd
>>>
>>> X = np.random.normal(loc=0, scale=1, size=1000)
>>> Y = 2 * X + np.random.normal(loc=0, scale=1, size=1000)
>>> Z = 3 * Y + np.random.normal(loc=0, scale=1, size=1000)
>>> data = pd.DataFrame(data=dict(X=X, Y=Y, Z=Z))
>>> data.head()
          X         Y          Z
0 -2.253500 -3.638579 -10.370047
1 -1.078337 -2.114581  -6.028030
2 -0.962719 -2.157896  -5.750563
3 -0.300316 -0.440721  -2.619954
4  0.127419  0.158185   1.555927

Note how the columns X, Y, Z correspond to our nodes X, Y, Z in the graph constructed above. We can also see how the
values of X influence the values of Y and how the values of Y influence the values of Z in this dataset.

In the real world, this data comes as an opaque stream of values, where we don't know how one
variable influences another. The SCM-based approach can help us deconstruct these causal
relationships, even though we didn't know them beforehand.

Now that we have the data, let's automatically assign a functional causal model (FCM) to each node in the graph,
based on the data:

>>> gcm.auto_assign_causal_models(causal_model, based_on=data)

While this function provides a good default, section :doc:`customizing_model_assignment` explains
how we can manually optimize the choice of models for our problem and improve our results.

Step 2: Fitting the SCM to the data
-----------------------------------

With the data at hand and the graph constructed earlier, we can now train the SCM using ``fit``:

>>> gcm.fit(causal_model, data)

Fitting means we learn the generative models of the variables in the SCM according to the data.

Step 3: Answering a causal query based on the SCM
-------------------------------------------------

The last step, answering a causal question, is our actual goal. For example, we could ask the question:

    What will happen to the variable Z if I intervene on Y?

This can be done via the ``perform_intervention`` function. Here's how:

>>> samples = gcm.perform_intervention(causal_model,
>>>                                    {'Y': lambda y: 2.34},
>>>                                    num_samples_to_draw=1000)
>>> samples.head()
          X     Y         Z
0  1.186229  2.34  7.042395
1 -0.758809  2.34  6.281761
2 -1.177379  2.34  7.803705
3 -1.211356  2.34  6.901429
4 -0.100224  2.34  7.541113

This intervention says: "I'll ignore any causal effects of X on Y, and set every value of Y
to 2.34." So the distribution of X will remain unchanged, whereas Y is fixed at the value 2.34
and Z responds to the intervened Y according to its causal model.
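We can sanity-check this against the ground-truth equations of our synthetic data: with Y fixed at
2.34, Z = 3*Y + noise should average about :math:`3 \cdot 2.34 = 7.02`, while X keeps its original
standard normal distribution. A minimal sketch of such a check (exact values will vary from run to
run):

>>> samples['Z'].mean()                      # expect a value near 3 * 2.34 = 7.02
>>> samples['X'].mean(), samples['X'].std()  # expect values near 0 and 1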
With this knowledge, we can now dive deeper into the meaning and usage of causal queries in section
:doc:`answering_causal_questions/index`.
diff --git a/docs/source/gcm/user_guide/pcm.png b/docs/source/gcm/user_guide/pcm.png
new file mode 100644
index 000000000..fb9d3c41f
Binary files /dev/null and b/docs/source/gcm/user_guide/pcm.png differ
diff --git a/docs/source/gcm/user_guide/scm.png b/docs/source/gcm/user_guide/scm.png
new file mode 100644
index 000000000..0cf133bad
Binary files /dev/null and b/docs/source/gcm/user_guide/scm.png differ
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 36eba2ea6..99636c623 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -35,6 +35,12 @@

    example_notebooks/nb_advanced_index

+.. toctree::
+   :maxdepth: 2
+   :caption: GCM-based inference (Experimental)
+
+   gcm/index
+
 .. toctree::
    :maxdepth: 2
    :caption: Package