# Causal Graph

Causal graphs represent the underlying data-generation process of a given data set. They comprise a collection of causal relationships between variables in the data set, which dictate how a given variable causally affects another variable. As a result, causal graphs are one of the most fundamental building blocks in causality.

Causal graphs consist of *nodes*, i.e. variables, and *edges*, i.e. their causal relationships. A directed edge `->` between two nodes `A` and `B` (written `A -> B`) implies that `A` is a causal driver of `B`, but not the other way around. A causal graph is essentially a collection of such nodes and edges that aims to fully specify the data-generating process of the corresponding data. There are several types of edges besides the directed edge, and consequently several types of causal graphs that can be captured by the CausalGraph class. See the Types of Causal Graphs section for more information on this.

## Constructing a Causal Graph

### Adding Nodes and Edges

You can easily add nodes and edges to the CausalGraph class by using the add_node / add_nodes_from and add_edge methods, as shown below.

```python
from cai_causal_graph import CausalGraph

# construct the causal graph object
causal_graph = CausalGraph()

# add a single node to the causal graph
causal_graph.add_node('A')

# add several nodes at once to the causal graph
causal_graph.add_nodes_from(['B', 'C', 'D'])

# add edges to the causal graph
causal_graph.add_edge('A', 'B')  # this adds a directed edge (i.e., an edge from A to B) by default
causal_graph.add_edge('B', 'E')  # if the node does not exist, it gets added automatically
```
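The bookkeeping behind these calls can be pictured with a small, stdlib-only toy model. `ToyGraph` below is a hypothetical stand-in written for illustration, not the CausalGraph implementation: it keeps nodes in a set, stores directed edges as `(source, destination)` pairs, and auto-creates missing endpoints in `add_edge`, mirroring the behaviour described in the comment above.

```python
from typing import Iterable, List, Set, Tuple


class ToyGraph:
    """Hypothetical stand-in for illustration only; not cai_causal_graph.CausalGraph."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        self.edges: List[Tuple[str, str]] = []  # directed (source, destination) pairs

    def add_node(self, name: str) -> None:
        self.nodes.add(name)

    def add_nodes_from(self, names: Iterable[str]) -> None:
        for name in names:
            self.add_node(name)

    def add_edge(self, source: str, destination: str) -> None:
        # endpoints that do not exist yet are added automatically
        self.add_nodes_from([source, destination])
        self.edges.append((source, destination))


g = ToyGraph()
g.add_node('A')
g.add_nodes_from(['B', 'C', 'D'])
g.add_edge('A', 'B')
g.add_edge('B', 'E')  # 'E' did not exist and is created here
print(sorted(g.nodes))  # ['A', 'B', 'C', 'D', 'E']
```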

### Node Variable Types

Any node added to a causal graph will, by default, have an unspecified variable type. It is, however, possible to specify
different variable types via the `variable_type` argument. For a full list of variable types, see
NodeVariableType. For instance, you can add a binary node `F`, as shown below.

```python
from cai_causal_graph import NodeVariableType

# add a binary node F
causal_graph.add_node('F', variable_type=NodeVariableType.BINARY)
```

These are the different variable types that are supported (these can be accessed via the NodeVariableType enumeration):

- `NodeVariableType.UNSPECIFIED` (default for new nodes)
- `NodeVariableType.CONTINUOUS`
- `NodeVariableType.BINARY`
- `NodeVariableType.MULTICLASS`
- `NodeVariableType.ORDINAL`

### Edge Types

Any edge added to a causal graph will, by default, be a directed edge. It is, however, possible to specify different
edge types via the `edge_type` argument. For instance, you can add an undirected edge `A -- C`, as shown below, which
can be resolved to either `A -> C` or `A <- C` in a later downstream task.

```python
from cai_causal_graph import EdgeType

# add an undirected edge between A and C
causal_graph.add_edge('A', 'C', edge_type=EdgeType.UNDIRECTED_EDGE)
```

These are the different edge types that are supported by the CausalGraph class (these can be accessed via the EdgeType enumeration):

- `EdgeType.DIRECTED_EDGE` (`->`) (default for new edges)
- `EdgeType.UNDIRECTED_EDGE` (`--`)
- `EdgeType.BIDIRECTED_EDGE` (`<>`)
- `EdgeType.UNKNOWN_EDGE` (`oo`)
- `EdgeType.UNKNOWN_DIRECTED_EDGE` (`o>`)
- `EdgeType.UNKNOWN_UNDIRECTED_EDGE` (`o-`)
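As a quick reference for the symbols in this list, the sketch below parses an edge written in this notation into its endpoints and the name of the matching EdgeType member. `EDGE_SYMBOLS` and `parse_edge` are hypothetical helpers, not part of cai_causal_graph, and they assume node names contain no whitespace.

```python
from typing import Dict, Tuple

# Symbol -> EdgeType member name, taken from the list above.
EDGE_SYMBOLS: Dict[str, str] = {
    '->': 'DIRECTED_EDGE',
    '--': 'UNDIRECTED_EDGE',
    '<>': 'BIDIRECTED_EDGE',
    'oo': 'UNKNOWN_EDGE',
    'o>': 'UNKNOWN_DIRECTED_EDGE',
    'o-': 'UNKNOWN_UNDIRECTED_EDGE',
}


def parse_edge(text: str) -> Tuple[str, str, str]:
    """Hypothetical helper: split 'A -> B' into (source, edge type name, destination).

    Assumes node names contain no whitespace.
    """
    source, symbol, destination = text.split()
    if symbol not in EDGE_SYMBOLS:
        raise ValueError(f'unknown edge symbol: {symbol!r}')
    return source, EDGE_SYMBOLS[symbol], destination


print(parse_edge('A -- C'))  # ('A', 'UNDIRECTED_EDGE', 'C')
```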

See the Types of Causal Graphs section for more information on edge types.

## Interacting with a Causal Graph

### Accessing Nodes

The CausalGraph class stores nodes via Node objects. It is possible to obtain a list of these Node objects by accessing the nodes property.

```python
from typing import List

from cai_causal_graph.graph_components import Node

# query a list of nodes
list_of_nodes: List[Node] = causal_graph.nodes
```

If, instead of obtaining a list of Node objects, you wish to obtain a list of string node identifiers, you can call the get_node_names method:

```python
# obtain a list of node identifiers
node_names: List[str] = causal_graph.get_node_names()
```

It is also possible to query a specific node using its string identifier by means of the
get_node and get_nodes methods.
The former method returns a single node, while the latter returns a list of nodes. More concretely, the
get_nodes method accepts a single node identifier (yielding a list
containing only one Node object), a list of node identifiers (yielding a list
containing the corresponding Node objects), or the default `None` (yielding a
list of all nodes).

```python
# query a specific node
node_object: Node = causal_graph.get_node(identifier='node_1')

# query a specific node with get_nodes
node_objects: List[Node] = causal_graph.get_nodes(identifier='node_1')

# query a list of nodes with get_nodes
node_objects: List[Node] = causal_graph.get_nodes(identifier=['node_1', 'node_2'])

# query all nodes with get_nodes; equivalent to the nodes property
node_objects: List[Node] = causal_graph.get_nodes()
```

Some nodes can be classified as *inputs* or *outputs*, which means that they have either no incoming edges or no
outgoing edges, respectively. Inputs can be thought of as source nodes and outputs can be thought of as sink nodes. A
list of such nodes within a causal graph can be obtained via the
get_inputs and
get_outputs methods:

```python
# get a list of inputs; these have no incoming edges
list_of_inputs: List[Node] = causal_graph.get_inputs()

# get a list of outputs; these have no outgoing edges
list_of_outputs: List[Node] = causal_graph.get_outputs()
```
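The input/output definitions above can be checked on a plain edge list. This is a minimal sketch assuming a graph with only directed edges given as `(source, destination)` pairs; `inputs_and_outputs` is a hypothetical helper and does not reproduce how CausalGraph treats other edge types.

```python
from typing import List, Set, Tuple


def inputs_and_outputs(nodes: List[str], edges: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (sources, sinks) for a graph with only directed (source, destination) edges."""
    has_incoming: Set[str] = {dst for _, dst in edges}
    has_outgoing: Set[str] = {src for src, _ in edges}
    inputs = [n for n in nodes if n not in has_incoming]
    outputs = [n for n in nodes if n not in has_outgoing]
    return inputs, outputs


nodes = ['A', 'B', 'C', 'D']
edges = [('A', 'B'), ('B', 'C'), ('A', 'C')]
print(inputs_and_outputs(nodes, edges))  # (['A', 'D'], ['C', 'D'])
```

Under the definition given above, an isolated node such as `D` has neither incoming nor outgoing edges, so it appears in both lists.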

You can check for the existence of a node using the node_exists method:

```python
# returns True if the node exists and False otherwise
causal_graph.node_exists(identifier='node_1')
```

### Accessing Edges

The CausalGraph class stores edges via Edge objects. It is possible to obtain a list of these Edge objects by accessing the edges property.

```python
from typing import List

from cai_causal_graph.graph_components import Edge

# query a list of edges
list_of_edges: List[Edge] = causal_graph.edges
```

It is also possible to query a specific edge using the string identifier of its source node and/or its destination node
by means of the get_edge and
get_edges methods. The former method returns a single edge, while the latter
returns a list of edges. If the `source` argument of get_edges is not provided, i.e. it is `None`, then a list of all
edges connecting to the `destination` node is returned, and vice versa. If neither `source` nor `destination` is
provided, a list of all edges is returned.

```python
# query a specific edge
edge_object: Edge = causal_graph.get_edge(source='node_1', destination='node_2')

# query all edges originating from node_1
node_1_edges: List[Edge] = causal_graph.get_edges(source='node_1')

# query all edges terminating at node_2
node_2_edges: List[Edge] = causal_graph.get_edges(destination='node_2')

# query all edges with get_edges; equivalent to the edges property
edge_objects: List[Edge] = causal_graph.get_edges()
```
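The way `None` acts as a wildcard for `source` and `destination` can be sketched on plain tuples. `filter_edges` is a hypothetical helper operating on `(source, destination)` pairs rather than Edge objects; it only illustrates the filtering semantics described above.

```python
from typing import List, Optional, Tuple

EdgePair = Tuple[str, str]  # (source, destination); a stand-in for the library's Edge objects


def filter_edges(
    edges: List[EdgePair], source: Optional[str] = None, destination: Optional[str] = None
) -> List[EdgePair]:
    """Keep edges matching the given endpoints; a None argument matches anything."""
    return [
        (src, dst)
        for src, dst in edges
        if (source is None or src == source) and (destination is None or dst == destination)
    ]


edges = [('node_1', 'node_2'), ('node_1', 'node_3'), ('node_3', 'node_2')]
print(filter_edges(edges, source='node_1'))       # [('node_1', 'node_2'), ('node_1', 'node_3')]
print(filter_edges(edges, destination='node_2'))  # [('node_1', 'node_2'), ('node_3', 'node_2')]
print(filter_edges(edges) == edges)               # True
```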

Alternatively, you can also provide a tuple of node identifiers via the get_edge_by_pair method to query an edge:

```python
# query a specific edge
edge_object: Edge = causal_graph.get_edge_by_pair(pair=('node_1', 'node_2'))
```

You can check for the existence of an edge using the edge_exists method:

```python
# returns True if the edge exists and False otherwise
causal_graph.edge_exists(source='node_1', destination='node_2')
```

Importantly, each of the above queries can also include the `edge_type` of the relevant edge. By default, the
`edge_type` argument of the above methods is `None`, which means the edge is queried regardless of its type. However, in
some settings you may wish to further specify the `edge_type` (see the Edge Types section above for more information
on the available types as defined by the EdgeType enumeration). If the edge does not
exist with that type (note that it may exist with a different type), then a
`cai_causal_graph.exceptions.CausalGraphErrors.EdgeDoesNotExistError` is raised.

```python
from cai_causal_graph import EdgeType

# query for the edge knowing that it is undirected
edge_object: Edge = causal_graph.get_edge(
    source='node_1', destination='node_2', edge_type=EdgeType.UNDIRECTED_EDGE
)
```
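The type check described above can be sketched with a dictionary keyed by `(source, destination)`. Everything here is hypothetical: `EDGES`, this standalone `get_edge`, and the local `EdgeDoesNotExistError` class only imitate the documented behaviour, and, unlike a real undirected edge, the lookup in this sketch is order-sensitive.

```python
from typing import Dict, Optional, Tuple


class EdgeDoesNotExistError(KeyError):
    """Local stand-in for the library exception named above."""


# Hypothetical store: (source, destination) -> edge type name
EDGES: Dict[Tuple[str, str], str] = {('node_1', 'node_2'): 'UNDIRECTED_EDGE'}


def get_edge(source: str, destination: str, edge_type: Optional[str] = None) -> Tuple[str, str, str]:
    stored = EDGES.get((source, destination))
    # a missing edge and an existing edge of a different type are treated the same way
    if stored is None or (edge_type is not None and stored != edge_type):
        raise EdgeDoesNotExistError((source, destination, edge_type))
    return source, destination, stored


print(get_edge('node_1', 'node_2'))  # ('node_1', 'node_2', 'UNDIRECTED_EDGE')
try:
    get_edge('node_1', 'node_2', edge_type='DIRECTED_EDGE')
except EdgeDoesNotExistError as err:
    print('not found:', err)
```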

Lastly, while the get_edges method returns all edges regardless of type, it is possible to obtain a list containing only edges of a certain type. The following methods are available to do this:

- get_directed_edges
- get_undirected_edges
- get_bidirected_edges
- get_unknown_edges
- get_unknown_directed_edges
- get_unknown_undirected_edges
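Conceptually, these methods partition the edge set by type. A minimal sketch, assuming edges are stored as hypothetical `(source, destination, type name)` triples, groups them with `collections.defaultdict`:

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Hypothetical edge records: (source, destination, edge type name)
edges: List[Tuple[str, str, str]] = [
    ('A', 'B', 'DIRECTED_EDGE'),
    ('A', 'C', 'UNDIRECTED_EDGE'),
    ('B', 'D', 'DIRECTED_EDGE'),
    ('C', 'D', 'BIDIRECTED_EDGE'),
]

# group (source, destination) pairs by their edge type
by_type: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
for src, dst, edge_type in edges:
    by_type[edge_type].append((src, dst))

print(by_type['DIRECTED_EDGE'])  # [('A', 'B'), ('B', 'D')]
print(by_type['UNKNOWN_EDGE'])   # [] because no edge has this type
```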

### Manipulating Nodes and Edges

You can delete a node from the CausalGraph object by calling the delete_node method. This also removes any edges that were previously connecting that node to any other nodes. There is also the remove_node method, which does exactly the same thing and only exists because the words "delete" and "remove" are often used interchangeably.

```python
# delete the node and all incoming / outgoing edges; does the same as remove_node
causal_graph.delete_node(identifier='node_1')
```
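Removing a node together with its incident edges amounts to filtering the edge list. `delete_node` below is a hypothetical, stdlib-only function on a node set and a list of `(source, destination)` pairs; it returns new collections instead of mutating a graph object.

```python
from typing import List, Set, Tuple


def delete_node(
    nodes: Set[str], edges: List[Tuple[str, str]], identifier: str
) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Drop a node and every edge that touches it (sketch, not the library code)."""
    if identifier not in nodes:
        raise KeyError(identifier)
    remaining_nodes = nodes - {identifier}
    remaining_edges = [(s, d) for s, d in edges if identifier not in (s, d)]
    return remaining_nodes, remaining_edges


nodes = {'node_1', 'node_2', 'node_3'}
edges = [('node_1', 'node_2'), ('node_2', 'node_3'), ('node_3', 'node_1')]
new_nodes, new_edges = delete_node(nodes, edges, 'node_1')
print(sorted(new_nodes), new_edges)  # ['node_2', 'node_3'] [('node_2', 'node_3')]
```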

Sometimes you may wish to replace a node, or simply rename it. This can be done using the replace_node method:

```python
# replace a node with a new one
causal_graph.replace_node(node_id='node_1', new_node_id='node_new')
```
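If the edges attached to the old node are carried over to the new identifier (an assumption made for this sketch), renaming reduces to rewriting edge endpoints. `rename_node` is a hypothetical helper on plain `(source, destination)` pairs.

```python
from typing import List, Tuple


def rename_node(edges: List[Tuple[str, str]], node_id: str, new_node_id: str) -> List[Tuple[str, str]]:
    """Rewrite every edge endpoint equal to node_id (assumes edges are carried over)."""

    def swap(name: str) -> str:
        return new_node_id if name == node_id else name

    return [(swap(s), swap(d)) for s, d in edges]


edges = [('node_1', 'node_2'), ('node_3', 'node_1')]
print(rename_node(edges, 'node_1', 'node_new'))  # [('node_new', 'node_2'), ('node_3', 'node_new')]
```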

Similar to deleting a node, you can delete an edge using the delete_edge method (which is the same as the remove_edge method).

```python
# delete the edge; does the same as remove_edge
causal_graph.delete_edge(source='node_1', destination='node_2')
```

## Working with Other Graph Formats

### Skeleton

The CausalGraph class has a skeleton
property that returns the skeleton of the underlying causal graph (as a Skeleton
object), which contains the same nodes and edges but only has undirected edges. For instance, the skeleton of
`A -> B <> C` would be `A -- B -- C`.

```python
from cai_causal_graph import Skeleton

# query the skeleton of the causal graph
skeleton_object: Skeleton = causal_graph.skeleton
```
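Computing a skeleton amounts to forgetting edge types and orientation. This sketch uses the `A -> B <> C` example from above, storing edges as hypothetical `(source, symbol, destination)` triples and turning each one into an unordered `frozenset` pair:

```python
from typing import FrozenSet, List, Set, Tuple

# (source, edge symbol, destination) triples for A -> B <> C
edges: List[Tuple[str, str, str]] = [('A', '->', 'B'), ('B', '<>', 'C')]

# drop the edge type and orientation: each edge becomes an unordered pair
skeleton: Set[FrozenSet[str]] = {frozenset((src, dst)) for src, _, dst in edges}

print(sorted(sorted(pair) for pair in skeleton))  # [['A', 'B'], ['B', 'C']]
```

Using a `frozenset` means the pair `{A, B}` is the same object regardless of which endpoint was the source, which is exactly the information an undirected edge keeps.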

### Adjacency Matrix

It is also possible to query the adjacency matrix $A$ of the underlying causal graph. This is a $p \times p$ matrix, where $p$ is the number of nodes, with elements $A_{ij} = 1$ if there is a directed edge from node $i$ to node $j$. If there is an undirected edge between nodes $i$ and $j$, then $A_{ij} = A_{ji} = 1$. All other entries are $0$. A fully undirected causal graph will therefore have a symmetric adjacency matrix. You can query the adjacency matrix using the adjacency_matrix property, as shown below. Note that adjacency matrices are only defined for causal graphs with directed or undirected edges; if the CausalGraph object contains any other edge types, querying the property will raise an error.

```python
import numpy

# obtain the adjacency matrix
adjacency: numpy.ndarray = causal_graph.adjacency_matrix
```
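The construction of $A$ described above can be sketched in pure Python, with nested lists standing in for the NumPy array. `adjacency_matrix` here is a hypothetical function that takes edges as `(source, symbol, destination)` triples using the `->` and `--` notation and, like the property described above, refuses any other edge type.

```python
from typing import List, Tuple


def adjacency_matrix(nodes: List[str], edges: List[Tuple[str, str, str]]) -> List[List[int]]:
    """Build A with A[i][j] = 1 for i -> j; an undirected i -- j sets both entries."""
    index = {name: k for k, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, symbol, dst in edges:
        i, j = index[src], index[dst]
        if symbol == '->':
            matrix[i][j] = 1
        elif symbol == '--':
            matrix[i][j] = matrix[j][i] = 1
        else:
            # mirrors the documented restriction to directed / undirected edges
            raise ValueError(f'adjacency matrix undefined for edge type {symbol!r}')
    return matrix


print(adjacency_matrix(['A', 'B', 'C'], [('A', '->', 'B'), ('B', '--', 'C')]))
# [[0, 1, 0], [0, 0, 1], [0, 1, 0]]
```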

The to_numpy method is equivalent to querying the adjacency matrix, but also returns a list of node names:

```python
from typing import List

import numpy

# obtain the adjacency matrix together with the node names
adjacency: numpy.ndarray
node_names: List[str]
adjacency, node_names = causal_graph.to_numpy()
```

Naturally, you can also construct a CausalGraph instance from an adjacency matrix by means
of the `CausalGraph.from_adjacency_matrix` method. Note that this method allows you to
pass a list of node names that can be used to construct node identifiers. If the `node_names` argument is not
provided, the default node names will instead be `"node_x"`, where x is between 1 and $p$ (the number of nodes).

```python
from cai_causal_graph import CausalGraph

# construct a causal graph from an adjacency matrix
causal_graph: CausalGraph = CausalGraph.from_adjacency_matrix(adjacency, node_names)
```
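Going the other way, the sketch below decodes a matrix into edges: a pair with $A_{ij} = A_{ji} = 1$ becomes an undirected edge and a one-sided entry a directed one. `edges_from_adjacency` is a hypothetical helper, not the library's `from_adjacency_matrix`; its default names follow the `node_x` convention (x from 1 to $p$) stated above.

```python
from typing import List, Optional, Tuple


def edges_from_adjacency(
    matrix: List[List[int]], node_names: Optional[List[str]] = None
) -> List[Tuple[str, str, str]]:
    """Decode A into (source, symbol, destination) triples.

    Symmetric non-zero pairs become undirected edges; one-sided entries become directed ones.
    """
    p = len(matrix)
    names = node_names if node_names is not None else [f'node_{x}' for x in range(1, p + 1)]
    edges: List[Tuple[str, str, str]] = []
    for i in range(p):
        for j in range(p):
            if not matrix[i][j]:
                continue
            if matrix[j][i]:
                if i < j:  # emit each undirected pair only once
                    edges.append((names[i], '--', names[j]))
            else:
                edges.append((names[i], '->', names[j]))
    return edges


print(edges_from_adjacency([[0, 1, 0], [0, 0, 1], [0, 1, 0]]))
# [('node_1', '->', 'node_2'), ('node_2', '--', 'node_3')]
```

A consequence of this encoding is that a pair of opposing directed edges `i -> j` and `j -> i` produces the same matrix as a single undirected edge `i -- j`, so the two cannot be told apart from $A$ alone.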

Open-source packages often rely on `networkx` for their causal graph objects, specifically the `networkx.DiGraph` class,
which can represent DAGs, i.e. acyclic graphs with only directed edges. It is straightforward to transform a
CausalGraph instance to a `networkx.DiGraph` (or a `networkx.Graph`) instance, and vice
versa, as shown below.

```python
import networkx

from cai_causal_graph import CausalGraph

# CausalGraph to networkx.DiGraph or networkx.Graph, depending on the edge types in the CausalGraph instance
networkx_digraph: networkx.DiGraph = causal_graph.to_networkx()  # if graph is fully directed
networkx_graph: networkx.Graph = causal_graph.to_networkx()  # if graph is fully undirected

# networkx.DiGraph to CausalGraph
causal_graph: CausalGraph = CausalGraph.from_networkx(networkx_digraph)

# networkx.Graph to CausalGraph
causal_graph: CausalGraph = CausalGraph.from_networkx(networkx_graph)
```

Open-source packages also utilize a Graph Modelling Language (GML) string to represent a mixed graph. It is possible to convert to and from GML strings as shown below.

```python
# CausalGraph to GML string
gml_graph: str = causal_graph.to_gml_string()

# GML string to CausalGraph
causal_graph: CausalGraph = CausalGraph.from_gml_string(gml_graph)
```

Lastly, the CausalGraph object is serializable and can therefore be converted to / from a dictionary:

```python
# CausalGraph to dictionary
causal_graph_dict: dict = causal_graph.to_dict()

# dictionary to CausalGraph
causal_graph: CausalGraph = CausalGraph.from_dict(causal_graph_dict)
```