time_series_causal_graph
TimeSeriesCausalGraph
from cai_causal_graph.time_series_causal_graph import TimeSeriesCausalGraph
class TimeSeriesCausalGraph(CausalGraph)
A causal graph for time series data.
The node in a time series causal graph will have additional metadata that provides the time information of the node together with the variable name. The two additional metadata are:
cai_causal_graph.type_definitions.TIME_LAG
: the time difference with respect to the reference time 0cai_causal_graph.type_definitions.VARIABLE_NAME
: the name of the variable (without the lag information)
Attributes
- _NodeCls: Type[TimeSeriesNode]
- _EdgeCls: Type[Edge]
- _SummaryGraphCls: Type[CausalGraph]
- _lag_to_nodes: Dict[int, List[TimeSeriesNode]]
- _variable_name_to_nodes: Dict[str, List[TimeSeriesNode]]
- from_adjacency_matrix: Callable[[], TimeSeriesCausalGraph]
- from_skeleton: Callable[[], TimeSeriesCausalGraph]
- from_networkx: Callable[[], TimeSeriesCausalGraph]
- from_gml_string: Callable[[], TimeSeriesCausalGraph]
- nodes: List[TimeSeriesNode]
- edges: List[TimeSeriesEdge]
- get_nodes: Callable[[], List[TimeSeriesNode]]
- get_node: Callable[[], TimeSeriesNode]
Methods
__init__
def __init__(input_list: Optional[List[NodeLike]] = None,
output_list: Optional[List[NodeLike]] = None,
fully_connected: bool = False,
meta: Optional[dict] = None)
Initialize the time series causal graph.
Example usage:
from cai_causal_graph import CausalGraph, TimeSeriesCausalGraph
# How to initialize a TimeSeriesCausalGraph directly
ts_cg = TimeSeriesCausalGraph()
ts_cg.add_edge('X1 lag(n=1)', 'X1', edge_type=EdgeType.DIRECTED_EDGE)
ts_cg.add_edge('X2 lag(n=1)', 'X2', edge_type=EdgeType.DIRECTED_EDGE)
ts_cg.add_edge('X1', 'X3', edge_type=EdgeType.DIRECTED_EDGE)
# How to initialize a TimeSeriesCausalGraph from a CausalGraph
cg = CausalGraph()
cg.add_edge('X1 lag(n=1)', 'X1', edge_type=EdgeType.DIRECTED_EDGE)
cg.add_edge('X2 lag(n=1)', 'X2', edge_type=EdgeType.DIRECTED_EDGE)
cg.add_edge('X1', 'X3', edge_type=EdgeType.DIRECTED_EDGE)
# The time series causal graph will have the same nodes and edges as the causal graph,
# but it is aware of the time information so 'X1 lag(n=1)' and 'X1' represent the same
# variable but at different times.
ts_cg = TimeSeriesCausalGraph.from_causal_graph(cg)
Arguments:
input_list
: List of objects coercable to TimeSeriesNode. Each element is treated as an input node, iffull_connected
parameter isTrue
. Otherwise, the nodes will simply be added to the graph with no edges.output_list
: List of objects coercable to TimeSeriesNode. Each element is treated as an output node, iffully_connected
parameter isTrue
. Otherwise, the nodes will simply be added to the graph with no edges.fully_connected
: If set toTrue
, create a fully-connected bipartite directed graph, with all inputs connected to all outputs. If noinput_list
and nooutput_list
is provided, an empty graph will be created. If either or both are provided, but this isFalse
(default), then the nodes will be added but not connected by edges.meta
: Any metadata defined on the graph. The keys must be strings, but no requirement is placed on the values of the dictionary. Default isNone
. If passed, meta is shallow-copied.
is_stationary_graph
def is_stationary_graph() -> bool
Check if the graph is stationary. That is, if the graph is time invariant.
If there exists the edge X(t-1) -> X(t)
, then there must be the same edge X(t-2) -> X(t-1)
, etc.
Returns:
True if the graph is stationary, False otherwise.
get_stationary_graph
def get_stationary_graph() -> TimeSeriesCausalGraph
Make the graph stationary by adding the missing edges if needed.
If there exists the edge X(t-1) -> X(t)
, then there must be the edge X(t-2) -> X(t-1)
, etc.
get_minimal_graph
def get_minimal_graph() -> TimeSeriesCausalGraph
Return a minimal time series causal graph.
The minimal graph is the graph with the minimal number of edges that is equivalent to the original graph. In other words, it is a graph that has no edges whose destination is not time delta 0.
Example: Input graph:
- X1(t-2)-> X1(t-1) -> X2(t-1) -> X2(t), X1(t) -> X2(t), X1(t-1) -> X2(t-1)
Minimal graph:
- X1(t-1) -> X1(t) -> X2(t)
Returns:
The minimal graph as a TimeSeriesCausalGraph object.
get_topological_order
def get_topological_order(
return_all: bool = False,
respect_time_ordering: bool = True
) -> Union[List[str], List[List[str]]]
Return either a single or all topological orders of the graph.
A topological order is a non-unique permutation of the nodes such that an edge from 'A'
to 'B'
implies
that 'A'
appears before 'B'
in the topological sort order. Generating all possible topological orders may
be expensive for large graphs.
It is only possible to get topological order if the graph is a valid DAG.
For more details, see get_topological_order.
Arguments:
return_all
: IfTrue
, return all the possible topological orders. Default isFalse
.respect_time_ordering
: IfTrue
, return the topological order that is ordered in time. Default isTrue
. For example, if the graph is'Y lag(n=1)' -> 'Y' <- 'X'
, then['X', 'Y lag(n=1)', 'Y']
and['Y lag(n=1)', 'X', 'Y']
are both valid topological orders. However, only the second one would respect time ordering. If bothreturn_all
andrespect_time_ordering
areTrue
, then only all topological orders that respect time are returned, not all valid topological orders.
Returns:
either a list of strings identifying a single topological order, or a list of lists identifying all possible topological orders.
is_minimal_graph
def is_minimal_graph() -> bool
Return True
if the graph is minimal.
See get_minimal_graph for more details.
get_summary_graph
def get_summary_graph() -> CausalGraph
Return a summary graph.
Collapse graph in time into a single node per variable (column name).
This can become cyclic and bi-directed as X(t-1) -> Y
and Y(t-1) -> X
would become X <-> Y
.
There are several cases to consider. Assume the edge in consideration is called B.
- if the edge is not in the summary graph, add it
- if there is an edge A is in the summary graph, then:
- if A and B have the same direction, keep the direction
- if A and B have different directions, make it bi-directed
- if one of the two is already bi-directed, keep it bi-directed
Returns:
The summary graph as a CausalGraph object.
extend_graph
def extend_graph(backward_steps: Optional[int] = None,
forward_steps: Optional[int] = None,
include_all_parents: bool = True) -> TimeSeriesCausalGraph
Return an extended graph.
Extend the graph in time by adding nodes for each variable at each time step from backward_steps
to
forward_steps
. If a backward step of n is specified, it means that the graph will be extended in order to
include nodes back to time -n. For example, if t is the reference time, if backward_steps
is 2, the graph
will be extended to include nodes back to time t-2. This does not mean that the graph will be extended to only
include nodes at time t-2, but rather that it will include nodes at time t-2 and all the nodes that are
connected to them as specified by the minimal graph.
By default, this will also add all parents of newly added nodes. This is done to ensure that all nodes
corresponding to the same variable name have consistent parents, which is particularly useful for
causal modeling tasks. This can be controlled by the include_all_parents
parameter. If set to False
, then
only nodes up-to backward_steps
back in time are added.
If both backward_steps
and forward_steps
are None, the original graph is returned.
Arguments:
backward_steps
: Number of steps to extend the graph backwards in time. If None, do not extend backwards.forward_steps
: Number of steps to extend the graph forwards in time. If None, do not extend forwards.include_all_parents
: IfTrue
, any nodes and edges required to predict nodes up tobackward_steps
ago will also be added, in addition to the nodes/edges normally added by this method. This may mean that nodes further back in time thanbackward_steps
will be added, if they are parents of any nodes up tobackward_steps
ago. Default isTrue
, meaning this extra nodes/edges are added.
include_all_parents
is only valid when specifying a backward_steps
, and will have no effect on the
logic of forward_steps
since by definition all the parents of the added nodes in the future will either
already exist in the minimal graph or be added.
Returns:
Extended graph with nodes for each variable at each time step from backward_steps
to forward_steps
.
get_variable_names_from_node_names
@staticmethod
def get_variable_names_from_node_names(node_names: List[str]) -> List[str]
Return a list of variable names from a list of node names.
This is useful for converting a list of node names into a list of variable names. Variables are the elementary (unique) units of a time series causal graph.
Example: ['X', 'X lag(n=1)', 'Y', 'Z lag(n=2)'] -> ['X', 'Y', 'Z']
Arguments:
node_names
: A list of node names.
Returns:
A sorted list of variable names.
get_all_variable_names
def get_all_variable_names() -> List[str]
Return all variable names in the provided causal graph.
The returned list is sorted by variable name.
add_node
@reset_cached_attributes_decorator
def add_node(identifier: Optional[NodeLike] = None,
variable_name: Optional[str] = None,
time_lag: Optional[int] = None,
*,
variable_type: NodeVariableType = NodeVariableType.UNSPECIFIED,
meta: Optional[dict] = None,
node: Optional[TimeSeriesNode] = None) -> TimeSeriesNode
Add a node to the time series graph. See add_node for more details.
In addition to the add_node method, this method also populates the metadata of the node with the variable name and the time lag.
Arguments:
identifier
: The identifier of the time series node.variable_name
: The variable name of the time series node.time_lag
: The time lag of the time series node.variable_type
: The type of the variable.meta
: The metadata of the time series node.node
: The node to add.
Returns:
The added node.
replace_node
@reset_cached_attributes_decorator
def replace_node(
node_id: NodeLike,
new_node_id: Optional[NodeLike] = None,
time_lag: Optional[int] = None,
variable_name: Optional[str] = None,
*,
variable_type: NodeVariableType = NodeVariableType.UNSPECIFIED,
meta: Optional[dict] = None)
Replace a node in the graph.
See replace_node for more details.
delete_node
@reset_cached_attributes_decorator
def delete_node(identifier: NodeLike)
Delete a node from the graph.
See delete_node for more details.
delete_edge
@reset_cached_attributes_decorator
def delete_edge(source: NodeLike,
destination: NodeLike,
*,
edge_type: Optional[EdgeType] = None)
Delete an edge from the graph.
See delete_edge for more details.
add_time_edge
@reset_cached_attributes_decorator
def add_time_edge(source_variable: str,
source_time: int,
destination_variable: str,
destination_time: int,
*,
meta: Optional[dict] = None,
validate: bool = True) -> Edge
Add a time edge to the graph from the variable at the source time to the variable at the destination time.
Arguments:
source_variable
: The name of the source variable.source_time
: The time of the source variable.destination_variable
: The name of the destination variable.destination_time
: The time of the destination variable.meta
: The metadata for the edge.validate
: Whether to perform validation checks. The validation checks will raise if any cycles are introduced to the graph by adding the edge. There is no guarantees about the behavior of the resulting graph if this is disabled specifically to introduce cycles. This should only be used to speed up this method in situations where it is known the new edge will not add cycles, for example when copying a graph. Default isTrue
.
Returns:
The edge that was added. Example:
- `add_time_edge('x', -2, 'x', 2)` will add an edge from the variable `x` at time -2 to the variable `x` at
time 2.
from_causal_graph
@classmethod
def from_causal_graph(cls, causal_graph: CausalGraph) -> TimeSeriesCausalGraph
Instantiate a TimeSeriesCausalGraph from a
CausalGraph object. If the CausalGraph is already a TimeSeriesCausalGraph, it is returned as is.
This is useful for converting a CausalGraph from a single time step into a TimeSeriesCausalGraph.
Arguments:
causal_graph
: The causal graph as a CausalGraph object.
Returns:
A TimeSeriesCausalGraph object.
from_adjacency_matrices
@classmethod
def from_adjacency_matrices(
cls,
adjacency_matrices: Dict[int, numpy.ndarray],
variable_names: Optional[List[Union[NodeLike, int]]] = None,
construct_minimal: bool = True) -> TimeSeriesCausalGraph
Instantiate a TimeSeriesCausalGraph from a dictionary of
adjacency matrices.
Keys are the time deltas. This is useful for converting a list of adjacency matrices into a TimeSeriesCausalGraph.
For example, the adjacency matrix with time delta -1 is stored in adjacency_matrices[-1] and would correspond
to X(t-1) -> X
, where X
is the set of nodes.
Example:
import numpy
from cai_causal_graph import TimeSeriesCausalGraph
# define the adjacency matrices
adjacency_matrices = {
-2: numpy.array([[0, 0, 0], [1, 0, 0], [0, 0, 1]]),
-1: numpy.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]]),
0: numpy.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]]),
}
# construct a time series causal graph from the adjacency matrices
graph = TimeSeriesCausalGraph.from_adjacency_matrices(adjacency_matrices, variable_names=['X', 'Y', 'Z'])
# query the edges of the constructed time series causal graph
graph.get_edges()
# output:
# [
# Edge("X", "Y", type=->),
# Edge("X", "Z", type=->),
# Edge("X lag(n=1)", "Y", type=->),
# Edge("Y", "Z", type=->),
# Edge("Y lag(n=1)", "X", type=->),
# Edge("Y lag(n=2)", "X", type=->),
# Edge("Z lag(n=2)", "Z", type=->)
# ]
Arguments:
adjacency_matrices
: A dictionary of adjacency matrices. Keys are the time delta.variable_names
: A list of variable names. If not provided, the variable names are integers starting from 0. Node names must correspond to the variable names and must not contain the lag.construct_minimal
: Whether to return a minimal time series graph. Default isTrue
.
Returns:
A time series causal graph.
adjacency_matrices
@property
def adjacency_matrices() -> Dict[int, numpy.ndarray]
Return the adjacency matrix dictionary of the minimal time series causal graph.
The keys are the time deltas and the values are the adjacency matrices.
maxlag
@property
def maxlag() -> Optional[int]
Return the absolute maximum backward time lag of the graph. Retained for backwards compatibility.
For example, if the graph is X lag(n=2) -> Y future(n=1), the maximum backward lag is 2.
If the graph is empty or only forward lags are present, None is returned. Otherwise, the maximum backward lag is a non-negative integer where 0 is returned if only contemporaneous nodes are present.
max_forward_lag
@property
def max_forward_lag() -> Optional[int]
Return the maximum forward time lag of the graph.
For example, if the graph is X lag(n=2) -> Y future(n=1), the maximum forward lag is 1.
If the graph is empty or only backward lags are present, None is returned. Otherwise, the maximum forward lag is a non-negative integer where 0 is returned if only contemporaneous nodes are present.
max_backward_lag
@property
def max_backward_lag() -> Optional[int]
Return the absolute maximum backward time lag of the graph.
For example, if the graph is X lag(n=2) -> Y future(n=1), the maximum backward lag is 2.
If the graph is empty or only forward lags are present, None is returned. Otherwise, the maximum backward lag is a non-negative integer where 0 is returned if only contemporaneous nodes are present.
variables
@property
def variables() -> Optional[List[str]]
Return the variables in the graph.
Variables differ from nodes in that they do not contain the lag. For example, if the graph contains the node 'X1 lag(n=2)', the variable is 'X1'.
to_numpy_by_lag
def to_numpy_by_lag() -> Tuple[Dict[int, numpy.ndarray], List[str]]
Return the adjacency matrices of the minimal time series causal graph ordered by the time delta.
Different time deltas are represented by different adjacency matrices. The keys of the dictionary are the time deltas and the values are the adjacency matrices.
Returns:
A tuple containing the dictionary for the adjacency matrices and the variable names.
get_nodes_at_lag
def get_nodes_at_lag(time_lag: int = 0) -> List[TimeSeriesNode]
Return all nodes at time delta time_lag
.
Arguments:
time_lag
: Time lag to return nodes for. Default is0
.
get_nodes_for_variable_name
def get_nodes_for_variable_name(variable_name: str) -> List[TimeSeriesNode]
Return all nodes for the variable variable_name
.
Arguments:
variable_name
: Variable name to return nodes for.
get_contemporaneous_nodes
def get_contemporaneous_nodes(node: NodeLike) -> List[TimeSeriesNode]
Return all nodes that are contemporaneous (i.e. have the same time_lag) to the provided node.