
identify_utils

identify_confounders

def identify_confounders(graph: CausalGraph, node_1: NodeLike,
node_2: NodeLike) -> List[str]

Identify all confounders between node_1 and node_2 in the provided graph.

A confounder between node_1 and node_2 is a node that is a (minimal) ancestor of both node_1 and node_2. Being a minimal ancestor here means that the node is not an ancestor of other confounder nodes, unless it has another directed path to either node_1 or node_2 that does not go through other confounder nodes.

This method returns the full list of confounders. It is up to the user to decide which confounder(s) to use for downstream tasks such as causal effect estimation. Note, however, that the list of (minimal) confounders returned by this method is a sufficient adjustment set for causal effect estimation, so it is advisable to adjust for all returned variables.
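For reference, this is what a sufficient adjustment set allows. Writing X for node_1 (taken here as the treatment), Y for node_2 (the outcome) and Z for the returned confounders, and assuming Z is a valid adjustment set as stated above, the standard back-door adjustment formula expresses the interventional distribution in terms of observational quantities:

```latex
P\bigl(y \mid \mathrm{do}(x)\bigr) = \sum_{z} P(y \mid x, z)\, P(z)
```

The sum runs over all joint values z of the variables in Z (an integral for continuous variables).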

Example:

from typing import List
from cai_causal_graph import CausalGraph
from cai_causal_graph.identify_utils import identify_confounders

# define a causal graph
cg = CausalGraph()
cg.add_edge('z', 'u')
cg.add_edge('u', 'x')
cg.add_edge('u', 'y')
cg.add_edge('x', 'y')

# compute the confounders between 'x' and 'y'; output: ['u']
confounder_variables: List[str] = identify_confounders(cg, node_1='x', node_2='y')

Arguments:

  • graph: The causal graph given by a CausalGraph instance. This must be a DAG, i.e. it must contain only directed edges and be acyclic; otherwise, a TypeError is raised.
  • node_1: The first node or its identifier.
  • node_2: The second node or its identifier.

Returns:

A list of all confounders between node_1 and node_2.
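To make the definition of a confounder given above concrete, here is a plain-Python sketch over a list of (parent, child) edge tuples. It is not the library's implementation and does not use cai_causal_graph; the helpers parents_of, ancestors_avoiding and sketch_confounders are hypothetical. It assumes one reading of the minimality rule: candidates are the common ancestors that reach each node without passing through the other, and a candidate is kept if it reaches node_1 or node_2 along a path that avoids all other candidates (using the candidates in place of "other confounder nodes" sidesteps the circular definition).

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def parents_of(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Map each node to the set of its direct parents."""
    parents: Dict[str, Set[str]] = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    return parents


def ancestors_avoiding(parents: Dict[str, Set[str]], target: str, blocked: Set[str]) -> Set[str]:
    """Nodes with a directed path to `target` that never passes through a node in `blocked`."""
    seen: Set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, set()):
            if parent not in blocked and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def sketch_confounders(edges: Iterable[Edge], node_1: str, node_2: str) -> List[str]:
    parents = parents_of(edges)
    # Candidates: common ancestors, ignoring paths that reach one node only via the other.
    candidates = ancestors_avoiding(parents, node_1, {node_2}) & ancestors_avoiding(parents, node_2, {node_1})
    confounders = []
    for c in candidates:
        others = candidates - {c}
        # Assumed minimality rule: keep c if it reaches node_1 or node_2 avoiding every other candidate.
        reaches_1 = c in ancestors_avoiding(parents, node_1, {node_2} | others)
        reaches_2 = c in ancestors_avoiding(parents, node_2, {node_1} | others)
        if reaches_1 or reaches_2:
            confounders.append(c)
    return sorted(confounders)


# Same graph as the example above: z -> u, u -> x, u -> y, x -> y.
print(sketch_confounders([('z', 'u'), ('u', 'x'), ('u', 'y'), ('x', 'y')], 'x', 'y'))  # ['u']
```

Blocking the other node when collecting ancestors matters: in a chain z -> x -> y, node z is an ancestor of both 'x' and 'y', but only through 'x', so it is not treated as a confounder.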

identify_instruments

def identify_instruments(graph: CausalGraph,
source: NodeLike,
destination: NodeLike,
max_num_paths: int = 25) -> List[str]

Identify all instrumental variables for the causal effect of source on destination in the provided graph.

An instrumental variable for the causal effect of source on destination satisfies the following criteria:

1. The `instrument` has a causal effect on the `source`.
2. The `instrument` has a causal effect on the `destination` _only_ through the `source`.
3. There is no confounding between the `instrument` and the `destination`.

This method returns the full list of instrumental variables. Not all of them need to be used in instrumental variable regression (e.g. for causal effect estimation); it is up to the user to decide which instruments to use.
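The three criteria above can be sketched in plain Python over a list of (parent, child) edges. The helper sketch_instruments is hypothetical, not the library's implementation, and it ignores max_num_paths. Criterion 3 is approximated (and labelled as such in the code): the instrument and the destination must share no common ancestor, ignoring paths through the source and, for the destination, paths through the instrument itself.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def parents_of(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Map each node to the set of its direct parents."""
    parents: Dict[str, Set[str]] = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    return parents


def ancestors_avoiding(parents: Dict[str, Set[str]], target: str, blocked: Set[str]) -> Set[str]:
    """Nodes with a directed path to `target` that never passes through a node in `blocked`."""
    seen: Set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, set()):
            if parent not in blocked and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def sketch_instruments(edges: Iterable[Edge], source: str, destination: str) -> List[str]:
    parents = parents_of(edges)
    # Criterion 1: candidates are the ancestors of the source.
    upstream = ancestors_avoiding(parents, source, set())
    # Nodes that can reach the destination without passing through the source.
    bypass = ancestors_avoiding(parents, destination, {source})
    instruments = []
    for z in upstream:
        # Criterion 2: every directed path from z to the destination must go through the source.
        if z == destination or z in bypass:
            continue
        # Criterion 3 (approximation): z and the destination share no common ancestor,
        # ignoring paths that run through the source or, for the destination, through z.
        common = ancestors_avoiding(parents, z, {source}) & ancestors_avoiding(parents, destination, {source, z})
        if not common:
            instruments.append(z)
    return sorted(instruments)


# Same graph as the example below: z -> x, u -> x, u -> y, x -> y.
print(sketch_instruments([('z', 'x'), ('u', 'x'), ('u', 'y'), ('x', 'y')], 'x', 'y'))  # ['z']
```

Here 'u' fails criterion 2 because it reaches 'y' directly, while 'z' affects 'y' only through 'x' and has no common cause with 'y'.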

Example:

from typing import List
from cai_causal_graph import CausalGraph
from cai_causal_graph.identify_utils import identify_instruments

# define a causal graph
cg = CausalGraph()
cg.add_edge('z', 'x')
cg.add_edge('u', 'x')
cg.add_edge('u', 'y')
cg.add_edge('x', 'y')

# find the instruments for the causal effect of 'x' on 'y'; output: ['z']
instrumental_variables: List[str] = identify_instruments(cg, source='x', destination='y')

Arguments:

  • graph: The causal graph given by a CausalGraph instance. This must be a DAG, i.e. it must contain only directed edges and be acyclic; otherwise, a TypeError is raised.
  • source: The source node or its identifier.
  • destination: The destination node or its identifier.
  • max_num_paths: The maximum number of paths to consider between the source and destination. Default is 25.

Returns:

A list of instrumental variables for the causal effect of source on destination.

identify_mediators

def identify_mediators(graph: CausalGraph,
source: NodeLike,
destination: NodeLike,
max_num_paths: int = 25) -> List[str]

Identify all mediators for the causal effect of source on destination in the provided graph.

A mediator variable for the causal effect of source on destination satisfies the following criteria:

1. The `source` has a causal effect on the `mediator`.
2. The `mediator` has a causal effect on the `destination`.
3. The `mediator` blocks all directed causal paths between the `source` and the `destination`.
4. There is no directed causal path from any confounder between `source` and `destination` to the `mediator`.

This method returns the full list of mediator variables. It is up to the user to decide which mediators to use for downstream tasks such as causal effect estimation.
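As a rough sketch of criteria 1, 2 and 4 above, the plain-Python helper below (hypothetical, not part of cai_causal_graph) works on a list of (parent, child) edges. Two assumptions are labelled in the code: the common ancestors of the source and destination stand in for "confounders", and criterion 4 is read as excluding only those paths into the mediator that bypass the source. It does not check criterion 3 separately and ignores max_num_paths.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def adjacency(edges: Iterable[Edge]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (parents, children) maps for a list of directed edges."""
    parents: Dict[str, Set[str]] = {}
    children: Dict[str, Set[str]] = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
        children.setdefault(src, set()).add(dst)
    return parents, children


def reachable(adj: Dict[str, Set[str]], start: str, blocked: Set[str]) -> Set[str]:
    """Nodes reachable from `start` by following `adj`, never entering a node in `blocked`."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, set()):
            if nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def sketch_mediators(edges: Iterable[Edge], source: str, destination: str) -> List[str]:
    parents, children = adjacency(edges)
    # Criteria 1 and 2: the node is a descendant of the source and an ancestor of the destination.
    on_path = reachable(children, source, {destination}) & reachable(parents, destination, {source})
    # Assumption: common ancestors of source and destination stand in for the confounders.
    common_causes = reachable(parents, source, {destination}) & reachable(parents, destination, {source})
    mediators = []
    for m in on_path:
        # Criterion 4 (as read here): no common cause reaches m except through the source.
        if not (reachable(parents, m, {source}) & common_causes):
            mediators.append(m)
    return sorted(mediators)


# Same graph as the example below: x -> m, m -> y, u -> x, u -> y, x -> y.
print(sketch_mediators([('x', 'm'), ('m', 'y'), ('u', 'x'), ('u', 'y'), ('x', 'y')], 'x', 'y'))  # ['m']
```

Adding an edge u -> m to that graph would make the sketch drop 'm', since the confounder 'u' would then reach the mediator without going through 'x'.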

Example:

from typing import List
from cai_causal_graph import CausalGraph
from cai_causal_graph.identify_utils import identify_mediators

# define a causal graph
cg = CausalGraph()
cg.add_edge('x', 'm')
cg.add_edge('m', 'y')
cg.add_edge('u', 'x')
cg.add_edge('u', 'y')
cg.add_edge('x', 'y')

# find the mediators for the causal effect of 'x' on 'y'; output: ['m']
mediator_variables: List[str] = identify_mediators(cg, source='x', destination='y')

Arguments:

  • graph: The causal graph given by a CausalGraph instance. This must be a DAG, i.e. it must contain only directed edges and be acyclic; otherwise, a TypeError is raised.
  • source: The source node or its identifier.
  • destination: The destination node or its identifier.
  • max_num_paths: The maximum number of paths to consider between the source and destination. Default is 25.

Returns:

A list of mediator variables for the causal effect of source on destination.

identify_markov_boundary

def identify_markov_boundary(graph: Union[CausalGraph, Skeleton],
node: NodeLike) -> List[str]

Identify the Markov boundary of the specified node in the provided graph.

The Markov boundary is defined as the minimal Markov blanket. A Markov blanket of a variable is a set of variables that, when conditioned on, makes the variable of interest (node in this case) conditionally independent of all other variables. The Markov boundary is minimal, meaning that no variable can be removed from it without the conditional independence condition failing.
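Stated formally, with V the set of all variables in the graph, a set MB(X) is a Markov blanket of a variable X when

```latex
X \perp\!\!\!\perp \bigl(V \setminus (\{X\} \cup \mathrm{MB}(X))\bigr) \mid \mathrm{MB}(X)
```

and it is a Markov boundary when, in addition, no proper subset of MB(X) satisfies the same statement.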

For a directed acyclic graph (DAG), provided as a CausalGraph instance, the Markov boundary of node 'A' is defined as the parents of 'A', the children of 'A', and the other parents of the children of 'A'.

For an undirected graph, provided as a Skeleton instance, the Markov boundary of node 'A' is simply defined as the neighbors of 'A'.

See https://en.wikipedia.org/wiki/Markov_blanket for further information.
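For intuition, the two rules above can be written directly in plain Python over an edge list. This is a hypothetical sketch (markov_boundary_dag and markov_boundary_undirected are not part of cai_causal_graph), assuming the graph is given as (parent, child) tuples; for the undirected case the same tuples are simply read as unordered pairs.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (parent, child) for a DAG; an unordered pair for an undirected graph


def markov_boundary_dag(edges: Iterable[Edge], node: str) -> Set[str]:
    """Parents, children, and the other parents of the children of `node`."""
    edges = list(edges)
    parents = {src for src, dst in edges if dst == node}
    children = {dst for src, dst in edges if src == node}
    co_parents = {src for src, dst in edges if dst in children and src != node}
    return parents | children | co_parents


def markov_boundary_undirected(edges: Iterable[Edge], node: str) -> Set[str]:
    """Neighbors of `node`, ignoring edge direction."""
    edges = list(edges)
    return {b for a, b in edges if a == node} | {a for a, b in edges if b == node}


# Edge list of the CausalGraph example below.
edges = [('u', 'b'), ('v', 'c'), ('b', 'a'), ('c', 'a'), ('a', 'd'), ('a', 'e'),
         ('w', 'f'), ('f', 'd'), ('d', 'x'), ('d', 'y'), ('g', 'e'), ('g', 'z')]
print(sorted(markov_boundary_dag(edges, 'a')))         # ['b', 'c', 'd', 'e', 'f', 'g']
print(sorted(markov_boundary_undirected(edges, 'a')))  # ['b', 'c', 'd', 'e']
```

Both functions return sets, which is why the outputs are sorted before printing; this mirrors the note in the examples that the order of the returned list may differ.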

Example for CausalGraph:

from typing import List
from cai_causal_graph import CausalGraph
from cai_causal_graph.identify_utils import identify_markov_boundary

# define a causal graph
cg = CausalGraph()
cg.add_edge('u', 'b')
cg.add_edge('v', 'c')
cg.add_edge('b', 'a') # 'b' is a parent of 'a'
cg.add_edge('c', 'a') # 'c' is a parent of 'a'
cg.add_edge('a', 'd') # 'd' is a child of 'a'
cg.add_edge('a', 'e') # 'e' is a child of 'a'
cg.add_edge('w', 'f')
cg.add_edge('f', 'd') # 'f' is a parent of 'd', which is a child of 'a'
cg.add_edge('d', 'x')
cg.add_edge('d', 'y')
cg.add_edge('g', 'e') # 'g' is a parent of 'e', which is a child of 'a'
cg.add_edge('g', 'z')

# compute Markov boundary for node 'a'; output: ['b', 'c', 'd', 'e', 'f', 'g']
# parents: 'b' and 'c', children: 'd' and 'e', and other parents of children are 'f' and 'g'
# note: the order may differ, but the elements will be these six.
markov_boundary: List[str] = identify_markov_boundary(cg, node='a')

Example for Skeleton:

from typing import List
from cai_causal_graph import Skeleton
from cai_causal_graph.identify_utils import identify_markov_boundary

# use the causal graph from the example above and get its skeleton
skeleton: Skeleton = cg.skeleton

# compute Markov boundary for node 'a'; output: ['b', 'c', 'd', 'e']
# as we have no directional information in the undirected skeleton, the neighbors of 'a' are returned.
# note: the order may differ, but the elements will be these four.
markov_boundary: List[str] = identify_markov_boundary(skeleton, node='a')

Arguments:

  • graph: The graph given by a CausalGraph or Skeleton instance. If a CausalGraph is provided, it must be a DAG, i.e. it must contain only directed edges and be acyclic; otherwise, a TypeError is raised.
  • node: The node or its identifier.

Returns:

A list of all node identifiers for the nodes in the Markov boundary of node.

identify_colliders

def identify_colliders(graph: CausalGraph,
unshielded_only: bool = False) -> List[str]

Identify all the collider nodes in the provided graph.

Arguments:

  • graph: The graph given by a CausalGraph instance.
  • unshielded_only: If True, only unshielded colliders are returned. If False, all colliders are returned. Default is False. A collider is unshielded if there is no edge between any of its parents.

Returns:

A list of all node identifiers for the collider nodes in the graph.
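The following hypothetical sketch (not the library's implementation) finds colliders in a list of (parent, child) edges. It assumes a collider is any node with at least two parents, as in a -> c <- b, and applies the unshielded rule exactly as stated in the argument description above: a collider is kept only if none of its parents are joined by an edge.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def sketch_colliders(edges: Iterable[Edge], unshielded_only: bool = False) -> List[str]:
    edges = list(edges)
    # Unordered pairs of adjacent nodes, used to test whether two parents are connected.
    adjacent = {frozenset(edge) for edge in edges}
    parents: Dict[str, Set[str]] = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    colliders = []
    for node, node_parents in parents.items():
        if len(node_parents) < 2:
            continue  # a collider needs at least two parents
        if unshielded_only and any(frozenset(pair) in adjacent for pair in combinations(node_parents, 2)):
            continue  # shielded: some pair of parents is connected by an edge
        colliders.append(node)
    return sorted(colliders)


# 'c' has parents 'a' and 'b' (not adjacent); 'e' has parents 'd' and 'f', joined by d -> f.
edges = [('a', 'c'), ('b', 'c'), ('d', 'e'), ('f', 'e'), ('d', 'f')]
print(sketch_colliders(edges))                        # ['c', 'e']
print(sketch_colliders(edges, unshielded_only=True))  # ['c']
```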