Causal graphs represent the flow of information in the underlying data generating process of a given data set. They comprise a collection of causal relationships between variables in the data set, which dictate how a given variable causally affects other variables. It is important to note that the causal graph does not define how a specific node is functionally related to its parents; that information is encoded in a structural causal model. The causal graph simply shows how information flows from one node (i.e., a variable in a data set) to another. As a result, causal graphs are a fundamental element of Causal AI. The majority of Causal AI tasks, such as causal modeling, counterfactual reasoning, causal effect estimation, root cause analysis, algorithmic recourse, and causal fairness, rely on an accurate and comprehensive causal graph that correctly reflects the underlying data generating process.
Causal graphs can be discovered from observational data, which is the goal of causal discovery. An essential part of this discovery process is providing prior knowledge about specific causal relationships, usually supplied by domain experts, which substantially reduces computation time and improves the accuracy of causal discovery. Many types of prior knowledge can be provided, such as forbidden edges, directed edges, and tiers of causal relationships. This type of human-guided causal discovery is a key component in decisionOS.
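To illustrate why prior knowledge shrinks the search, here is a minimal sketch that counts the candidate directed edges left after applying forbidden edges and tiers. It is plain Python and does not use the cai-causal-graph API. The `candidate_edges` helper, the node names, and the reading of tiers as "a node in a later tier cannot cause a node in an earlier tier" are assumptions made for illustration.

```python
from itertools import permutations


def candidate_edges(nodes, forbidden=(), tiers=None):
    """Return the directed edges (source, destination) still allowed by prior knowledge.

    `forbidden` holds (source, destination) pairs that an expert has ruled out.
    `tiers` is an ordered list of node groups; this sketch assumes that a node in a
    later tier cannot cause a node in an earlier tier.
    """
    forbidden = set(forbidden)
    tier_of = {}
    for index, group in enumerate(tiers or []):
        for node in group:
            tier_of[node] = index

    allowed = []
    for source, destination in permutations(nodes, 2):
        if (source, destination) in forbidden:
            continue
        # Nodes that were not assigned a tier are unconstrained by the tier rule.
        if source in tier_of and destination in tier_of and tier_of[source] > tier_of[destination]:
            continue
        allowed.append((source, destination))
    return allowed


# Hypothetical nodes, used only for illustration.
nodes = ['A', 'B', 'C', 'D']
unconstrained = candidate_edges(nodes)
constrained = candidate_edges(
    nodes,
    forbidden=[('C', 'B')],
    tiers=[['A'], ['B', 'C'], ['D']],
)
print(len(unconstrained), len(constrained))  # 12 6
```

In this toy case the tiers remove five of the twelve possible directed edges and the forbidden edge removes one more, so a discovery algorithm would have half as many candidates to consider.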
The cai-causal-graph package provides a user-friendly implementation of a causal graph class (CausalGraph) that allows you to easily define mixed graphs that can represent various types of causal graphs. See the Types of Causal Graphs section below for information on different types of causal graphs.
You can find a quickstart to see how to easily build a basic graph, with further details provided in the Causal Graph documentation page. For a full list of all the classes and methods, please see the provided reference docs. For example, these are the reference docs for the CausalGraph class.
Types of Causal Graphs
A Directed Acyclic Graph (DAG) is the most common type of mixed graph used to represent a causal graph. It has only directed edges between nodes (->) and permits no cycles.
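The "no cycles" requirement can be checked mechanically. The sketch below is a plain-Python illustration using Kahn's algorithm on a list of (source, destination) pairs. It is not the package's own implementation, and the `is_acyclic` helper is a hypothetical name.

```python
from collections import defaultdict, deque


def is_acyclic(edges):
    """Return True if the directed edges (source, destination) contain no cycle.

    Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    A cycle exists exactly when some nodes can never be removed.
    """
    children = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for source, destination in edges:
        children[source].append(destination)
        in_degree[destination] += 1
        nodes.update((source, destination))

    queue = deque(node for node in nodes if in_degree[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)
    return removed == len(nodes)


print(is_acyclic([('A', 'B'), ('B', 'C')]))              # True
print(is_acyclic([('A', 'B'), ('B', 'C'), ('C', 'A')]))  # False
```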
A Completed Partially Directed Acyclic Graph (CPDAG) can contain directed (->) and undirected (--) edges. In this case, an undirected edge implies that a causal relationship exists but can point either way, i.e., A -- B can be resolved to either A -> B or A <- B.
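As a rough illustration of resolving undirected edges, the sketch below enumerates every orientation of a set of undirected edges and keeps those that leave the graph acyclic. It is plain Python with hypothetical helper names, not the package's API. It also omits one constraint that a genuine CPDAG imposes: an orientation must not create a new collider (v-structure) that is absent from the CPDAG.

```python
from itertools import product


def has_cycle(edges):
    """Detect a directed cycle by repeatedly removing nodes with no incoming edges."""
    remaining = set(edges)
    nodes = {node for edge in edges for node in edge}
    while nodes:
        targets = {destination for _, destination in remaining}
        roots = nodes - targets
        if not roots:
            return True  # every remaining node has a parent, so a cycle exists
        nodes -= roots
        remaining = {(s, d) for s, d in remaining if s not in roots}
    return False


def resolve_undirected(directed, undirected):
    """Yield every acyclic orientation of the undirected edges as a sorted edge list."""
    for flips in product((False, True), repeat=len(undirected)):
        oriented = [(b, a) if flip else (a, b) for (a, b), flip in zip(undirected, flips)]
        candidate = list(directed) + oriented
        if not has_cycle(candidate):
            yield sorted(candidate)


# A -- B together with a fixed B -> C gives two candidate DAGs.
for dag in resolve_undirected(directed=[('B', 'C')], undirected=[('A', 'B')]):
    print(dag)
# [('A', 'B'), ('B', 'C')]
# [('B', 'A'), ('B', 'C')]
```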
A Maximal Ancestral Graph (MAG) can encode all the information that a CPDAG can, but also provides information such as whether a latent confounder is likely to exist or selection bias is likely to be present. Specifically, MAGs may also contain bi-directed edges A <> B, which imply the existence of a latent confounder between the respective variables. Additionally, an undirected edge A -- B in a MAG implies the existence of a latent selection bias variable leading to the association being observed between A and B.
A Partial Ancestral Graph (PAG) describes an equivalence class of MAGs. PAGs may also contain "wild-card" or "circle" edges (-o), where the circle end can resolve to either a tail or an arrowhead, i.e., A -o B can be resolved to A -- B or A -> B. The o end is referred to as "unknown" in this package.
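The resolution of a circle end can be written down as a small lookup. The sketch below uses its own two-character notation (left end mark, right end mark) and a hypothetical `resolve_edge` helper. It does not use the package's EdgeType class, and the `oo` symbol (a circle at both ends) is an assumed extension of the same rule rather than something defined on this page.

```python
from itertools import product

# End marks on each side of a two-character edge symbol such as '->' or '-o'.
# 'o' is the unknown ("circle") mark, which may turn out to be a tail or an arrowhead.
LEFT_OPTIONS = {'-': ['-'], '<': ['<'], 'o': ['-', '<']}
RIGHT_OPTIONS = {'-': ['-'], '>': ['>'], 'o': ['-', '>']}


def resolve_edge(symbol):
    """Return every fully resolved edge symbol that a (possibly wild-card) edge can become."""
    left, right = symbol
    return [
        left_mark + right_mark
        for left_mark, right_mark in product(LEFT_OPTIONS[left], RIGHT_OPTIONS[right])
    ]


print(resolve_edge('-o'))  # ['--', '->']
print(resolve_edge('->'))  # ['->']
print(resolve_edge('oo'))  # ['--', '->', '<-', '<>']
```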
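| Type of Graph | Directed edges | Undirected edges | Latent confounder edges | Wild-card edges |
|---|---|---|---|---|
| DAG | Yes | No | No | No |
| CPDAG | Yes | Yes | No | No |
| MAG | Yes | Yes | Yes | No |
| PAG | Yes | Yes | Yes | Yes |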
See EdgeType for all the supported edge types in this package. Note that the CausalGraph class can contain all the aforementioned edge types, and can therefore represent the entire hierarchy of DAGs, CPDAGs, MAGs, and PAGs.
Discovering a single DAG for a given data set is difficult. Certain causal graphs are indistinguishable from each other with only observational data, because they encode the same conditional independencies between variables. The set of such graphs is called the Markov equivalence class (MEC) for a particular set of nodes.
Multiple DAGs/CPDAGs/MAGs/PAGs can be consistent with the same MEC. For instance, if you identify the structure X -> Y -> Z, then the corresponding data would show that X is independent of Z given Y. However, the graphical structures X <- Y <- Z and X <- Y -> Z would lead to the exact same conditional independence test result as above. Only if the graphical structure found was a collider connection X -> Y <- Z would you be able to identify the structure from observational data, because the data would tell you that X and Z are independent, but become dependent given Y.
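The difference between these structures can be seen in simulated data. The sketch below draws samples from a chain, a fork, and a collider over X, Y, and Z, then compares the marginal correlation of X and Z with their partial correlation given Y. It assumes linear-Gaussian relationships with unit coefficients, for which a zero partial correlation corresponds to conditional independence. All function names are hypothetical, and only the standard library is used.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, zs, ys):
    """Correlation of X and Z after linearly adjusting both for Y."""
    r_xz, r_xy, r_zy = correlation(xs, zs), correlation(xs, ys), correlation(zs, ys)
    return (r_xz - r_xy * r_zy) / math.sqrt((1 - r_xy ** 2) * (1 - r_zy ** 2))


def simulate(structure, n=20000, seed=0):
    """Draw n samples of (X, Y, Z) from a linear-Gaussian model with the given structure."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        if structure == 'chain':     # X -> Y -> Z
            x = rng.gauss(0, 1)
            y = x + rng.gauss(0, 1)
            z = y + rng.gauss(0, 1)
        elif structure == 'fork':    # X <- Y -> Z
            y = rng.gauss(0, 1)
            x = y + rng.gauss(0, 1)
            z = y + rng.gauss(0, 1)
        else:                        # collider: X -> Y <- Z
            x = rng.gauss(0, 1)
            z = rng.gauss(0, 1)
            y = x + z + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


for structure in ('chain', 'fork', 'collider'):
    xs, ys, zs = simulate(structure)
    marginal = correlation(xs, zs)
    conditional = partial_correlation(xs, zs, ys)
    print(f'{structure}: corr(X, Z) = {marginal:.2f}, corr(X, Z | Y) = {conditional:.2f}')
```

Under these assumptions the chain and the fork should both show a clearly non-zero marginal correlation and a partial correlation close to zero, so the data cannot tell them apart. The collider should show the reverse pattern, which is what makes it identifiable.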
In a CPDAG, the -- edge implies the existence of an edge that can point in either direction, <- or ->. In a MAG or a PAG, the -- edge implies the existence of a latent selection variable. When resolving this, it can be possible to resolve to no edge at all. In a PAG, the -- edge is a possible outcome of a wildcard edge (for example