time_series_causal_graph

TimeSeriesCausalGraph

from cai_causal_graph.time_series_causal_graph import TimeSeriesCausalGraph

class TimeSeriesCausalGraph(CausalGraph)

A causal graph for time series data.

The node in a time series causal graph will have additional metadata that provides the time information of the node together with the variable name. The two additional metadata are:

cai_causal_graph.type_definitions.TIME_LAG: the time difference with respect to the reference time 0
cai_causal_graph.type_definitions.VARIABLE_NAME: the name of the variable (without the lag information)

Attributes

_NodeCls: Type[TimeSeriesNode]
_EdgeCls: Type[Edge]
_SummaryGraphCls: Type[CausalGraph]
_lag_to_nodes: Dict[int, List[TimeSeriesNode]]
_variable_name_to_nodes: Dict[str, List[TimeSeriesNode]]
from_adjacency_matrix: Callable[[], TimeSeriesCausalGraph]
from_skeleton: Callable[[], TimeSeriesCausalGraph]
from_networkx: Callable[[], TimeSeriesCausalGraph]
from_gml_string: Callable[[], TimeSeriesCausalGraph]
nodes: List[TimeSeriesNode]
edges: List[TimeSeriesEdge]
get_nodes: Callable[[], List[TimeSeriesNode]]
get_node: Callable[[], TimeSeriesNode]

Methods

init

def __init__(input_list: Optional[List[NodeLike]] = None,
             output_list: Optional[List[NodeLike]] = None,
             fully_connected: bool = False,
             meta: Optional[dict] = None)

Initialize the time series causal graph.

Example usage:

 from cai_causal_graph import CausalGraph, TimeSeriesCausalGraph


 # How to initialize a TimeSeriesCausalGraph directly
 ts_cg = TimeSeriesCausalGraph()
 ts_cg.add_edge('X1 lag(n=1)', 'X1', edge_type=EdgeType.DIRECTED_EDGE)
 ts_cg.add_edge('X2 lag(n=1)', 'X2', edge_type=EdgeType.DIRECTED_EDGE)
 ts_cg.add_edge('X1', 'X3', edge_type=EdgeType.DIRECTED_EDGE)


 # How to initialize a TimeSeriesCausalGraph from a CausalGraph
 cg = CausalGraph()
 cg.add_edge('X1 lag(n=1)', 'X1', edge_type=EdgeType.DIRECTED_EDGE)
 cg.add_edge('X2 lag(n=1)', 'X2', edge_type=EdgeType.DIRECTED_EDGE)
 cg.add_edge('X1', 'X3', edge_type=EdgeType.DIRECTED_EDGE)

 # The time series causal graph will have the same nodes and edges as the causal graph,
 # but it is aware of the time information so 'X1 lag(n=1)' and 'X1' represent the same
 # variable but at different times.
 ts_cg = TimeSeriesCausalGraph.from_causal_graph(cg)

Arguments:

input_list: List of objects coercable to TimeSeriesNode. Each element is treated as an input node, if full_connected parameter is True. Otherwise, the nodes will simply be added to the graph with no edges.
output_list: List of objects coercable to TimeSeriesNode. Each element is treated as an output node, if fully_connected parameter is True. Otherwise, the nodes will simply be added to the graph with no edges.
fully_connected: If set to True, create a fully-connected bipartite directed graph, with all inputs connected to all outputs. If no input_list and no output_list is provided, an empty graph will be created. If either or both are provided, but this is False (default), then the nodes will be added but not connected by edges.
meta: Any metadata defined on the graph. The keys must be strings, but no requirement is placed on the values of the dictionary. Default is None. If passed, meta is shallow-copied.

is_stationary_graph

def is_stationary_graph() -> bool

Check if the graph is stationary. That is, if the graph is time invariant.

If there exists the edge X(t-1) -> X(t), then there must be the same edge X(t-2) -> X(t-1), etc.

Returns:

True if the graph is stationary, False otherwise.

get_stationary_graph

def get_stationary_graph() -> TimeSeriesCausalGraph

Make the graph stationary by adding the missing edges if needed.

If there exists the edge X(t-1) -> X(t), then there must be the edge X(t-2) -> X(t-1), etc.

get_minimal_graph

def get_minimal_graph() -> TimeSeriesCausalGraph

Return a minimal time series causal graph.

The minimal graph is the graph with the minimal number of edges that is equivalent to the original graph. In other words, it is a graph that has no edges whose destination is not time delta 0.

Example: Input graph:

- X1(t-2)-> X1(t-1) -> X2(t-1) -> X2(t), X1(t) -> X2(t), X1(t-1) -> X2(t-1)

Minimal graph:

- X1(t-1) -> X1(t) -> X2(t)

Returns:

The minimal graph as a TimeSeriesCausalGraph object.

get_topological_order

def get_topological_order(
        return_all: bool = False,
        respect_time_ordering: bool = True
) -> Union[List[str], List[List[str]]]

Return either a single or all topological orders of the graph.

A topological order is a non-unique permutation of the nodes such that an edge from 'A' to 'B' implies that 'A' appears before 'B' in the topological sort order. Generating all possible topological orders may be expensive for large graphs.

It is only possible to get topological order if the graph is a valid DAG.

For more details, see get_topological_order.

Arguments:

return_all: If True, return all the possible topological orders. Default is False.
respect_time_ordering: If True, return the topological order that is ordered in time. Default is True. For example, if the graph is 'Y lag(n=1)' -> 'Y' <- 'X', then ['X', 'Y lag(n=1)', 'Y'] and ['Y lag(n=1)', 'X', 'Y'] are both valid topological orders. However, only the second one would respect time ordering. If both return_all and respect_time_ordering are True, then only all topological orders that respect time are returned, not all valid topological orders.

Returns:

either a list of strings identifying a single topological order, or a list of lists identifying all possible topological orders.

is_minimal_graph

def is_minimal_graph() -> bool

Return True if the graph is minimal.

See get_minimal_graph for more details.

get_summary_graph

def get_summary_graph() -> CausalGraph

Return a summary graph.

Collapse graph in time into a single node per variable (column name). This can become cyclic and bi-directed as X(t-1) -> Y and Y(t-1) -> X would become X <-> Y.

There are several cases to consider. Assume the edge in consideration is called B.

if the edge is not in the summary graph, add it
if there is an edge A is in the summary graph, then:
- if A and B have the same direction, keep the direction
- if A and B have different directions, make it bi-directed
- if one of the two is already bi-directed, keep it bi-directed

Returns:

The summary graph as a CausalGraph object.

extend_graph

def extend_graph(backward_steps: Optional[int] = None,
                 forward_steps: Optional[int] = None,
                 include_all_parents: bool = True) -> TimeSeriesCausalGraph

Return an extended graph.

Extend the graph in time by adding nodes for each variable at each time step from backward_steps to forward_steps. If a backward step of n is specified, it means that the graph will be extended in order to include nodes back to time -n. For example, if t is the reference time, if backward_steps is 2, the graph will be extended to include nodes back to time t-2. This does not mean that the graph will be extended to only include nodes at time t-2, but rather that it will include nodes at time t-2 and all the nodes that are connected to them as specified by the minimal graph.

By default, this will also add all parents of newly added nodes. This is done to ensure that all nodes corresponding to the same variable name have consistent parents, which is particularly useful for causal modeling tasks. This can be controlled by the include_all_parents parameter. If set to False, then only nodes up-to backward_steps back in time are added.

If both backward_steps and forward_steps are None, the original graph is returned.

Arguments:

backward_steps: Number of steps to extend the graph backwards in time. If None, do not extend backwards.
forward_steps: Number of steps to extend the graph forwards in time. If None, do not extend forwards.
include_all_parents: If True, any nodes and edges required to predict nodes up to backward_steps ago will also be added, in addition to the nodes/edges normally added by this method. This may mean that nodes further back in time than backward_steps will be added, if they are parents of any nodes up to backward_steps ago. Default is True, meaning this extra nodes/edges are added.

include_all_parents is only valid when specifying a backward_steps, and will have no effect on the logic of forward_steps since by definition all the parents of the added nodes in the future will either already exist in the minimal graph or be added.

Returns:

Extended graph with nodes for each variable at each time step from backward_steps to forward_steps.

get_variable_names_from_node_names

@staticmethod
def get_variable_names_from_node_names(node_names: List[str]) -> List[str]

Return a list of variable names from a list of node names.

This is useful for converting a list of node names into a list of variable names. Variables are the elementary (unique) units of a time series causal graph.

Example: ['X', 'X lag(n=1)', 'Y', 'Z lag(n=2)'] -> ['X', 'Y', 'Z']

Arguments:

node_names: A list of node names.

Returns:

A sorted list of variable names.

get_all_variable_names

def get_all_variable_names() -> List[str]

Return all variable names in the provided causal graph.

The returned list is sorted by variable name.

add_node

@reset_cached_attributes_decorator
def add_node(identifier: Optional[NodeLike] = None,
             variable_name: Optional[str] = None,
             time_lag: Optional[int] = None,
             *,
             variable_type: NodeVariableType = NodeVariableType.UNSPECIFIED,
             meta: Optional[dict] = None,
             node: Optional[TimeSeriesNode] = None) -> TimeSeriesNode

Add a node to the time series graph. See add_node for more details.

In addition to the add_node method, this method also populates the metadata of the node with the variable name and the time lag.

Arguments:

identifier: The identifier of the time series node.
variable_name: The variable name of the time series node.
time_lag: The time lag of the time series node.
variable_type: The type of the variable.
meta: The metadata of the time series node.
node: The node to add.

Returns:

The added node.

replace_node

@reset_cached_attributes_decorator
def replace_node(
        node_id: NodeLike,
        new_node_id: Optional[NodeLike] = None,
        time_lag: Optional[int] = None,
        variable_name: Optional[str] = None,
        *,
        variable_type: NodeVariableType = NodeVariableType.UNSPECIFIED,
        meta: Optional[dict] = None)

Replace a node in the graph.

See replace_node for more details.

delete_node

@reset_cached_attributes_decorator
def delete_node(identifier: NodeLike)

Delete a node from the graph.

See delete_node for more details.

delete_edge

@reset_cached_attributes_decorator
def delete_edge(source: NodeLike,
                destination: NodeLike,
                *,
                edge_type: Optional[EdgeType] = None)

Delete an edge from the graph.

See delete_edge for more details.

add_time_edge

@reset_cached_attributes_decorator
def add_time_edge(source_variable: str,
                  source_time: int,
                  destination_variable: str,
                  destination_time: int,
                  *,
                  meta: Optional[dict] = None,
                  validate: bool = True) -> Edge

Add a time edge to the graph from the variable at the source time to the variable at the destination time.

Arguments:

source_variable: The name of the source variable.
source_time: The time of the source variable.
destination_variable: The name of the destination variable.
destination_time: The time of the destination variable.
meta: The metadata for the edge.
validate: Whether to perform validation checks. The validation checks will raise if any cycles are introduced to the graph by adding the edge. There is no guarantees about the behavior of the resulting graph if this is disabled specifically to introduce cycles. This should only be used to speed up this method in situations where it is known the new edge will not add cycles, for example when copying a graph. Default is True.

Returns:

The edge that was added. Example:

- `add_time_edge('x', -2, 'x', 2)` will add an edge from the variable `x` at time -2 to the variable `x` at
    time 2.

from_causal_graph

@classmethod
def from_causal_graph(cls, causal_graph: CausalGraph) -> TimeSeriesCausalGraph

Instantiate a TimeSeriesCausalGraph from a

CausalGraph object. If the CausalGraph is already a TimeSeriesCausalGraph, it is returned as is.

This is useful for converting a CausalGraph from a single time step into a TimeSeriesCausalGraph.

Arguments:

causal_graph: The causal graph as a CausalGraph object.

Returns:

A TimeSeriesCausalGraph object.

from_adjacency_matrices

@classmethod
def from_adjacency_matrices(
        cls,
        adjacency_matrices: Dict[int, numpy.ndarray],
        variable_names: Optional[List[Union[NodeLike, int]]] = None,
        construct_minimal: bool = True) -> TimeSeriesCausalGraph

Instantiate a TimeSeriesCausalGraph from a dictionary of

adjacency matrices.

Keys are the time deltas. This is useful for converting a list of adjacency matrices into a TimeSeriesCausalGraph.

For example, the adjacency matrix with time delta -1 is stored in adjacency_matrices[-1] and would correspond to X(t-1) -> X, where X is the set of nodes.

Example:

 import numpy
 from cai_causal_graph import TimeSeriesCausalGraph

 # define the adjacency matrices
 adjacency_matrices = {
     -2: numpy.array([[0, 0, 0], [1, 0, 0], [0, 0, 1]]),
     -1: numpy.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]]),
     0: numpy.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]]),
 }

 # construct a time series causal graph from the adjacency matrices
 graph = TimeSeriesCausalGraph.from_adjacency_matrices(adjacency_matrices, variable_names=['X', 'Y', 'Z'])

 # query the edges of the constructed time series causal graph
 graph.get_edges()
 # output:
 # [
 #   Edge("X", "Y", type=->),
 #   Edge("X", "Z", type=->),
 #   Edge("X lag(n=1)", "Y", type=->),
 #   Edge("Y", "Z", type=->),
 #   Edge("Y lag(n=1)", "X", type=->),
 #   Edge("Y lag(n=2)", "X", type=->),
 #   Edge("Z lag(n=2)", "Z", type=->)
 # ]

Arguments:

adjacency_matrices: A dictionary of adjacency matrices. Keys are the time delta.
variable_names: A list of variable names. If not provided, the variable names are integers starting from 0. Node names must correspond to the variable names and must not contain the lag.
construct_minimal: Whether to return a minimal time series graph. Default is True.

Returns:

A time series causal graph.

adjacency_matrices

@property
def adjacency_matrices() -> Dict[int, numpy.ndarray]

Return the adjacency matrix dictionary of the minimal time series causal graph.

The keys are the time deltas and the values are the adjacency matrices.

maxlag

@property
def maxlag() -> Optional[int]

Return the absolute maximum backward time lag of the graph. Retained for backwards compatibility.

For example, if the graph is X lag(n=2) -> Y future(n=1), the maximum backward lag is 2.

If the graph is empty or only forward lags are present, None is returned. Otherwise, the maximum backward lag is a non-negative integer where 0 is returned if only contemporaneous nodes are present.

max_forward_lag

@property
def max_forward_lag() -> Optional[int]

Return the maximum forward time lag of the graph.

For example, if the graph is X lag(n=2) -> Y future(n=1), the maximum forward lag is 1.

If the graph is empty or only backward lags are present, None is returned. Otherwise, the maximum forward lag is a non-negative integer where 0 is returned if only contemporaneous nodes are present.

max_backward_lag

@property
def max_backward_lag() -> Optional[int]

Return the absolute maximum backward time lag of the graph.

For example, if the graph is X lag(n=2) -> Y future(n=1), the maximum backward lag is 2.

If the graph is empty or only forward lags are present, None is returned. Otherwise, the maximum backward lag is a non-negative integer where 0 is returned if only contemporaneous nodes are present.

variables

@property
def variables() -> Optional[List[str]]

Return the variables in the graph.

Variables differ from nodes in that they do not contain the lag. For example, if the graph contains the node 'X1 lag(n=2)', the variable is 'X1'.

to_numpy_by_lag

def to_numpy_by_lag() -> Tuple[Dict[int, numpy.ndarray], List[str]]

Return the adjacency matrices of the minimal time series causal graph ordered by the time delta.

Different time deltas are represented by different adjacency matrices. The keys of the dictionary are the time deltas and the values are the adjacency matrices.

Returns:

A tuple containing the dictionary for the adjacency matrices and the variable names.

get_nodes_at_lag

def get_nodes_at_lag(time_lag: int = 0) -> List[TimeSeriesNode]

Return all nodes at time delta time_lag.

Arguments:

time_lag: Time lag to return nodes for. Default is 0.

get_nodes_for_variable_name

def get_nodes_for_variable_name(variable_name: str) -> List[TimeSeriesNode]

Return all nodes for the variable variable_name.

Arguments:

variable_name: Variable name to return nodes for.

get_contemporaneous_nodes

def get_contemporaneous_nodes(node: NodeLike) -> List[TimeSeriesNode]

Return all nodes that are contemporaneous (i.e. have the same time_lag) to the provided node.