# Utilities

## Identify Useful Subsets of a Causal Graph

The `cai-causal-graph`

package allows for identifying useful subsets of a causal graph. These can be helpful in
interpreting a causal graph, but may also be used directly in some applications such as causal effect estimation. One
example of such a subset is the set of confounders between two variables.

### Identifying Confounders

The `cai-causal-graph`

package implements the `cai_causal_graph.identify_utils.identify_confounders`

utility function,
which allows you to identify the set of confounders between two variables in a directed acyclic graph (DAG).

Confounders are defined to be nodes in the causal graph that are (minimal) ancestors of both the source and destination
nodes. Note that, in this package, any parents of confounders (that do not have other direct causal paths to the
destination) are not returned as confounders themselves, even though they may be confounders in other definitions.
Hence, only **minimal** confounders are returned.

`from typing import List`

from cai_causal_graph import CausalGraph

from cai_causal_graph.identify_utils import identify_confounders

# define a causal graph that is a DAG

cg = CausalGraph()

cg.add_edge('z', 'u')

cg.add_edge('u', 'x')

cg.add_edge('u', 'y')

cg.add_edge('x', 'y')

# compute confounders between source and destination; output: ['u']

confounder_variables: List[str] = identify_confounders(cg, node_1='x', node_2='y')

### Identifying Instruments

The `cai-causal-graph`

package implements the `cai_causal_graph.identify_utils.identify_instruments`

utility function,
which allows you to identify a list of potential instrumental variables for the causal effect of one node on another
node in a directed acyclic graph (DAG).

An instrumental variable for the causal effect of `source`

on `destination`

satisfies the following criteria:

`1. There is a causal effect between the `instrument` and the `source`.`

2. The `instrument` has a causal effect on the `destination` _only_ through the `source`.

3. There is no confounding between the `instrument` and the `destination`.

`from typing import List`

from cai_causal_graph import CausalGraph

from cai_causal_graph.identify_utils import identify_instruments

# define a causal graph that is a DAG

cg = CausalGraph()

cg.add_edge('z', 'x')

cg.add_edge('u', 'x')

cg.add_edge('u', 'y')

cg.add_edge('x', 'y')

# find the instruments between 'x' and 'y'; output: ['z']

instrumental_variables: List[str] = identify_instruments(cg, source='x', destination='y')

### Identifying Mediators

The `cai-causal-graph`

package implements the `cai_causal_graph.identify_utils.identify_mediators`

utility function,
which allows you to identify a list of potential mediator variables for the causal effect of one node on another node
in a directed acyclic graph (DAG).

`A mediator variable for the causal effect of `source` on `destination` satisfies the following criteria:`

1. There is a causal effect between the `source` and the `mediator`.

2. There is a causal effect between the `mediator` and the `destination`.

3. The `mediator` blocks all directed causal paths between the `source` and the `destination`.

4. There is no directed causal path from any confounder between `source` and `destination` to the `mediator`.

`from typing import List`

from cai_causal_graph import CausalGraph

from cai_causal_graph.identify_utils import identify_mediators

# define a causal graph

cg = CausalGraph()

cg.add_edge('x', 'm')

cg.add_edge('m', 'y')

cg.add_edge('u', 'x')

cg.add_edge('u', 'y')

cg.add_edge('x', 'y')

# find the mediators between 'x' and 'y'; output: ['m']

mediator_variables: List[str] = identify_mediators(cg, source='x', destination='y')

### Identifying Markov Boundary

The `cai-causal-graph`

package implements the `cai_causal_graph.identify_utils.identify_markov_boundary`

utility
function, which allows you to identify the Markov boundary for a variable in a directed acyclic graph (DAG) or for a
variable in an undirected graph.

The Markov boundary is defined as the minimal Markov blanket. The Markov blanket is defined as the set of variables
such that if you condition on them, it makes your variable of interest (`node`

in this case) conditionally independent
of all other variables. The Markov boundary is minimal meaning that you cannot drop any variables from it for the
conditional independence condition to still hold.

For a DAG, provided as a CausalGraph instance, the Markov boundary of a node is defined as its parents, its children, and the other parents of its children.

For an undirected graph, provided as a Skeleton instance, the Markov boundary of a node is simply defined as its neighbors.

See https://en.wikipedia.org/wiki/Markov_blanket for further information. The code example below uses the graph from this site.

`from typing import List`

from cai_causal_graph import CausalGraph, Skeleton

from cai_causal_graph.identify_utils import identify_markov_boundary

# define a causal graph

cg = CausalGraph()

cg.add_edge('u', 'b')

cg.add_edge('v', 'c')

cg.add_edge('b', 'a') # 'b' is a parent of 'a'

cg.add_edge('c', 'a') # 'c' is a parent of 'a'

cg.add_edge('a', 'd') # 'd' is a child of 'a'

cg.add_edge('a', 'e') # 'e' is a child of 'a'

cg.add_edge('w', 'f')

cg.add_edge('f', 'd') # 'f' is a parent of 'd', which is a child of 'a'

cg.add_edge('d', 'x')

cg.add_edge('d', 'y')

cg.add_edge('g', 'e') # 'g' is a parent of 'e', which is a child of 'a'

cg.add_edge('g', 'z')

# compute Markov boundary for node 'a'; output: ['b', 'c', 'd', 'e', 'f', 'g']

# parents: 'b' and 'c', children: 'd' and 'e', and other parents of children are 'f' and 'g'

# note the order may not match but the elements will be those six.

markov_boundary: List[str] = identify_markov_boundary(cg, node='a')

# use causal graph from above and get is skeleton

skeleton: Skeleton = cg.skeleton

# compute Markov boundary for node 'a'; output: ['b', 'c', 'd', 'e']

# as we have no directional information in the undirected skeleton, the neighbors of 'a' are returned.

# note the order may not match but the elements will be those four.

markov_boundary: List[str] = identify_markov_boundary(skeleton, node='a')