Documented internals#

Classes#

class datafold.pcfold.kernels.RadialBasisKernel(required_metric, distance=None)[source]#

Abstract base class for radial basis kernels.

“A radial basis function (RBF) is a real-valued function whose value depends only on the distance between the input and some fixed point.” (taken from Wikipedia)

Parameters:: required_metric (str) – metric required for kernel
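
As an illustration of the quoted definition, the following minimal sketch evaluates a Gaussian radial basis function with NumPy. The function name `gaussian_rbf`, the Gaussian form, and the `epsilon` scaling are assumptions chosen for the example; they are not taken from the datafold kernel classes.

```python
import numpy as np


def gaussian_rbf(x, center, epsilon=1.0):
    """Gaussian RBF (illustrative): depends on x only through ||x - center||."""
    r = np.linalg.norm(np.asarray(x) - np.asarray(center))
    return np.exp(-(r**2) / (2.0 * epsilon))


center = np.zeros(2)
# Two different points with the same distance to the center ...
a = gaussian_rbf([1.0, 0.0], center)
b = gaussian_rbf([0.0, -1.0], center)
# ... yield the same function value.
print(a, b)
```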

class datafold.dynfold.dynsystem.LinearDynamicalSystem(sys_type, sys_mode, is_controlled=False, is_control_affine=False, is_time_invariant=True)[source]#

Evolve linear dynamical system forward in time.

There are various definitions of a linear dynamical system; the specific form is selected with the parameters sys_type and sys_mode.

Parameters:

sys_type (Literal['differential', 'flowmap']) –
Type of linear system:
- “differential”
- “flowmap”
sys_mode (Literal['matrix', 'spectral']) –
Whether the system is evaluated with
- “matrix” (i.e. $A$ or $\mathcal{A}$ are given)
- “spectral” (i.e. eigenpairs of $A$ or $\mathcal{A}$ are given)
is_controlled (bool) – Whether the system is controlled. If set to True a control matrix must be passed to setup_matrix_system (currently there is no implementation for spectral systems)
is_control_affine (bool) – Whether the system is control affine. The control matrix must be a 3-dim. array (tensor) and the implementation is based on Peitz et al. [2020].
is_time_invariant (bool) – If True, the system internally always starts with time=0. This is irrespective of the time given in the time values. If the initial time is larger than zero, the internal times are corrected to the requested time.

References

[Kutz et al., 2016] (pages 3 ff.)

compute_spectral_system_states(states)[source]#

Compute the spectral states of the system.

If the linear system is written in its spectral form:

$\begin{aligned} \Psi_r \Lambda^n \Psi_l x_0 &= x_n \\ \Psi_r \Lambda^n b_0 &= x_n \end{aligned}$

then $b_0$ is the spectral state, which is computed in this function. The input does not need to be an initial state; arbitrary states can be passed.

In the context of dynamic mode decomposition, the spectral state is also often referred to as “amplitudes”; see, e.g., Kutz et al. [2016], page 8. In the context of EDMD, where the DMD model acts on a dictionary space, the spectral states are the evaluation of the Koopman eigenfunctions; see, e.g., Williams et al. [2015], Eq. 3 or 6.

There are two alternatives for computing the states:

1. By using the right eigenvectors and solving in a least-squares sense

$\Psi_r b_0 = x_0$

2. By using the left eigenvectors and computing the matrix-vector product

$\Psi_l x_0 = b_0$

If the left eigenvectors were set during setup_spectral_system(), then alternative 2 is always used; otherwise alternative 1 is used.
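
The two alternatives can be compared in a short NumPy sketch. It does not call datafold; the example matrix `A` and the normalization of the left eigenvectors as the inverse of the right eigenvector matrix are assumptions made for the illustration.

```python
import numpy as np

# Example system matrix (assumed for illustration) and its eigenpairs.
A = np.array([[0.9, 0.2], [0.0, 0.5]])
eigenvalues, psi_r = np.linalg.eig(A)
# Left eigenvectors as rows, normalized such that psi_l @ psi_r = I.
psi_l = np.linalg.inv(psi_r)

# States in column orientation with shape (state_length, n_states).
x0 = np.array([[1.0, 2.0, -1.0], [0.0, -1.0, 3.0]])

# Alternative 1: solve Psi_r b_0 = x_0 in a least-squares sense.
b0_lstsq = np.linalg.lstsq(psi_r, x0, rcond=None)[0]
# Alternative 2: matrix product with the left eigenvectors.
b0_left = psi_l @ x0

# The spectral form reproduces the matrix form x_n = A^n x_0.
n = 5
x_n_spectral = psi_r @ np.diag(eigenvalues**n) @ b0_left
x_n_matrix = np.linalg.matrix_power(A, n) @ x0
```

Both alternatives give the same spectral states here because the right eigenvector matrix is square and invertible; alternative 2 avoids solving a linear system.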

Parameters:: states (ndarray) – The states of original data space in column-orientation (state_length, n_states).
Returns:: spectrally aligned states
Return type:: numpy.ndarray

evolve_system(initial_conditions, *, time_values, control_input=None, overwrite_sys_matrix=None, time_delta=None, time_series_ids=None, feature_names_out=None)[source]#

Evolve specified linear dynamical system.

The system evolves depending on the specified system solver (set during initialization).

Parameters:

initial_conditions (ndarray) – Single initial condition of shape (n_features,) or multiple initial conditions of shape (n_features, n_initial_conditions).
time_values (Union[ndarray, float, int, list]) – Time values to evaluate the linear system at $t \in \mathbb{R}^{+}$
control_input (Optional[ndarray]) – Control states over the time horizon acting on the system dynamics. The array has to have a shape of (n_timesteps, n_control_features) for a single initial condition and be a tensor with (n_initial_condition, n_timesteps, n_control_features) for multiple initial conditions.
overwrite_sys_matrix (Optional[ndarray]) – Primarily for performance reasons the system matrix $A$ can also be overwritten. An example is to include linear post-mappings of the system (e.g. a projection matrix $P$, so that only some quantities of interest are returned; $A^{*} = P \cdot A$).
time_delta (Optional[float]) – Time delta $\Delta t$. This is a required parameter in a “flowmap” system.
time_series_ids (Optional[ndarray]) – Unique integer time series IDs of shape (n_initial_conditions,) for each respective initial condition. Defaults to (0, 1, 2, …).
feature_names_out (Union[Index, list, None]) – Unique feature columns names of shape (n_features,). Defaults to (0, 1, 2, …).

Returns:

Time series for each initial condition, each time series has shape (n_time_values, n_features)

Return type:

TSCDataFrame
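
The following sketch mimics the evolution of a “flowmap” system in “matrix” mode with plain NumPy. It assumes the common flowmap form $x_{n+1} = A x_n$ with $t = n \Delta t$; the helper `evolve_flowmap` is hypothetical and is not part of datafold. The result has one row per time value, matching the shape (n_time_values, n_features) stated above.

```python
import numpy as np


def evolve_flowmap(A, x0, time_values, time_delta):
    """Evaluate x(t) = A^(t / time_delta) x0 for times on the sampling grid."""
    # Convert the requested times to integer step counts.
    steps = np.rint(np.asarray(time_values) / time_delta).astype(int)
    return np.vstack([np.linalg.matrix_power(A, n) @ x0 for n in steps])


A = np.array([[0.5, 0.0], [0.0, 2.0]])  # example system matrix (assumed)
x0 = np.array([1.0, 1.0])  # single initial condition, shape (n_features,)
time_series = evolve_flowmap(A, x0, time_values=[0.0, 0.1, 0.2, 0.3], time_delta=0.1)
print(time_series.shape)
```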

property is_differential_system: bool#

Indicate whether the linear system is of “differential” type.

The system uses either the matrix $\mathcal{A}$ or the spectral components to evolve the system.

property is_flowmap_system: bool#

Indicate whether the linear system is a “flowmap” system.

The system uses either the matrix $A$ or its spectral components to evolve the system.

is_linear_system_setup(raise_error_if_not_setup=False)[source]#

Indicate whether the linear system is set up.

Return type:: bool

property is_matrix_mode: bool#

Indicate whether the linear system is in “matrix” mode.

The system uses either matrix $A$ for flowmap or $\mathcal{A}$ for a differential system.

property is_spectral_mode: bool#

Indicate whether the linear system is in “spectral” mode.

The system uses the spectral components of either matrix $A$ for flowmap or $\mathcal{A}$ for differential.
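
To illustrate the “spectral” mode for a “differential” system, the sketch below assumes the standard form $\dot{x} = \mathcal{A} x$ with solution $x(t) = \Psi_r \exp(\Lambda t) b_0$. This form is an assumption based on the usual DMD literature and is evaluated with NumPy only, not with datafold.

```python
import numpy as np

# Rotation generator: dx1/dt = x2, dx2/dt = -x1 (example, assumed).
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
eigenvalues, psi_r = np.linalg.eig(A)  # complex eigenpairs (+-1j)

x0 = np.array([1.0, 0.0])
b0 = np.linalg.solve(psi_r, x0)  # spectral state of the initial condition

t = 0.75
# Spectral evaluation: only the diagonal exp(Lambda * t) is needed.
x_t = psi_r @ (np.exp(eigenvalues * t) * b0)
x_t_real = x_t.real  # the imaginary part vanishes up to round-off

# Analytic solution for comparison.
x_t_exact = np.array([np.cos(t), -np.sin(t)])
```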

setup_matrix_system(system_matrix, *, control_matrix=None)[source]#

Set up linear system with system matrix.

Parameters:

system_matrix – The system matrix (either $A$ for flowmap or $\mathcal{A}$ for differential type).
control_matrix – The control matrix. Required if the linear system is controlled.

Returns:

self

Return type:

LinearDynamicalSystem

setup_spectral_system(eigenvectors_right, eigenvalues, eigenvectors_left=None, control_matrix=None)[source]#

Set up linear system with spectral components of system matrix.

If the left eigenvectors (attribute eigenvectors_left_) are available, the spectral state $b_0$ of the initial condition is always computed with the second case (see the note in evolve_linear_system()), because this is more efficient.

Parameters:

eigenvectors_right (ndarray) – The right eigenvectors $\Psi_r$ of system matrix.
eigenvalues (ndarray) – The eigenvalues $\Lambda$ of system matrix.
eigenvectors_left (Optional[ndarray]) – The left eigenvectors $\Psi_l$ of system matrix.
control_matrix (Optional[ndarray]) – An additional control matrix (note that currently the control matrix is not described in spectral components).

Returns:

self

Return type:

LinearDynamicalSystem

class datafold.pcfold.timeseries.accessor.TSCAccessor(tsc_df)[source]#

Extension functions for TSCDataFrame.

See documentation for regular pandas accessors.

The functions are available through the accessor tsc, for example,

tsc_object.tsc.normalize_time()

Parameters:: tsc_df (TSCDataFrame) – time series collection data to carry out accessor functions on

assign_ids_const_delta(drop_samples=False)[source]#

Split time series with irregular time sampling frequencies in new time series of intervals with constant time sampling.

This function only considers time series with irregular delta_time and aims to split these into sub time series of finite delta_time. Time series with a finite delta_time may only receive a new ID, but the samples remain the same.

The detection of constant sampling intervals is carried out with the second time differences. To detect a new sub time series, the sampling rate must be constant for three samples (i.e. two time differences). This means the function does not assign sample pairs (time series of length two) when dealing with completely irregular time series. Instead, the samples of irregular intervals are dropped. The main reason for this is that the assignment is not unique.

Parameters:: drop_samples – If True, the function drops samples from irregular sampled intervals (up to entire time series). If dropping samples is required for assignment and the parameter is set to False, a ValueError is raised.
Returns:: Time series collection with re-allocated time series if irregular time series are present in the TSCDataFrame. It returns None if all time series have a completely irregular time sampling.
Return type:: Optional[TSCDataFrame]
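
The idea of detecting constant sampling with second time differences can be sketched with NumPy as follows. This is an illustration of the principle under the stated three-sample rule, not the datafold implementation; all variable names are made up for the example.

```python
import numpy as np

# Time values of one irregularly sampled time series (example data).
time_values = np.array([0.0, 1.0, 2.0, 3.0, 5.0, 9.0, 10.0, 11.0, 12.0])

# A zero second difference at position i means that the samples
# i, i+1, i+2 are equally spaced (two equal time differences).
second_diff = np.diff(time_values, n=2)
is_const_triple = np.isclose(second_diff, 0.0)

# Mark every sample that belongs to at least one equally spaced triple.
in_const_interval = np.zeros(time_values.shape[0], dtype=bool)
for i in np.flatnonzero(is_const_triple):
    in_const_interval[i : i + 3] = True

# Samples outside of any constant interval would have to be dropped.
irregular_samples = time_values[~in_const_interval]
print(irregular_samples)
```

In this example the sample at time 5.0 does not belong to any interval with constant sampling, which corresponds to the case where drop_samples=True is required.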

assign_ids_sequential()[source]#

Assign time series IDs sequentially starting from zero in a collection.

Note that this operation is performed in place and overwrites the existing time series IDs.

Returns:: The data with time series IDs in sequential order.
Return type:: TSCDataFrame

assign_ids_train_test(train_indices, test_indices, return_dropped=False)[source]#

Split and assign time series IDs based on training and test indices.

Note that the indices included in train_indices and test_indices must be disjoint (i.e. a sample cannot be both in train_indices and test_indices). Samples whose indices are included in neither the training nor the test indices are dropped.

Parameters:

train_indices (ndarray) – The indices to indicate which samples are included in the training set.
test_indices (ndarray) – The indices to indicate which samples are included in the test set.
return_dropped (bool) – If True, a DataFrame is returned as the third return value, which includes all samples that were included in neither the training indices nor the test indices.

Returns:

TSCDataFrame – The time series collection for training.
TSCDataFrame – The time series collection for testing.
pandas.DataFrame – The dropped samples; only returned if return_dropped=True.

check_const_time_delta()[source]#

Check if all time series have the same time-delta.

Return type:: Union[Series, float]

check_const_timesteps()[source]#

Check that all time series have the same number of timesteps. The time values themselves can differ between the time series.

Return type:: int

check_contain_required_ids(required_ids, check_order=False)[source]#

Check that the time series collection contains exactly the required IDs.

classmethod check_equal_delta_time(X, Y, atol=1e-15, require_const=False)[source]#

Check if two time series collections have the same delta times.

Parameters:

X (TSCDataFrame) – First time series collection.
Y (TSCDataFrame) – Second time series collection.
atol – Tolerance passed to equal_const_delta_time()
require_const – If True, both X and Y must have constant delta times.

Raises:

TSCException – If the delta times are not constant with require_const=True.

Return type:

tuple[Union[float, Series], Union[float, Series]]

check_equal_timevalues()[source]#

Check if all time series in the collection have identical time values.

Return type:: None

check_finite()[source]#

Check if all values are finite (i.e. the data does not contain nan or inf).

Return type:: None

check_min_features(min_features)[source]#

Check if there is a minimum number of features included in the collection.

Return type:: None

check_min_samples(min_samples)[source]#

Check if there is a minimum number of samples included in the collection.

Return type:: None

check_non_overlapping_timeseries()[source]#

Check if all time series have disjoint time values (do not overlap).

Return type:: None

check_normalized_time()[source]#

Check if time series collection has normalized time.

Return type:: None

See also

TSCAccessor.normalize_time()

check_required_min_timesteps(required_min_timesteps)[source]#

Check if all time series in the collection have a minimum number of time steps.

Parameters:: required_min_timesteps (int) – value
Return type:: None

check_required_n_timeseries(required_n_timeseries)[source]#

Check if the collection contains exactly the required number of time series.

Parameters:: required_n_timeseries (int) – value
Return type:: None

check_required_time_delta(required_time_delta)[source]#

Check if time series collection has required time-delta.

Parameters:: required_time_delta (Union[Series, float, int]) – single value or per time series
Return type:: None

check_timeseries_same_length()[source]#

Check if time series in the collection have the same length.

Return type:: None

check_tsc(*, ensure_all_finite=True, ensure_min_samples=1, ensure_min_features=1, ensure_same_length=False, ensure_const_delta_time=True, ensure_delta_time=None, ensure_same_time_values=False, ensure_normalized_time=False, ensure_n_timeseries=None, ensure_min_timesteps=None, ensure_n_timesteps=None, ensure_no_degenerate_ts=True, ensure_dtype_time=None)[source]#

Validate time series properties.

This combines the individual check functions that are also contained in TSCAccessor.

Parameters:

ensure_all_finite (bool) – If True, check if all values are finite (no ‘nan’ or ‘inf’ values).
ensure_min_samples (int) – If provided, check that the frame has at least the required number of samples.
ensure_same_length (bool) – If True, check if all time series have the same length.
ensure_const_delta_time (bool) – If True, check that all time series have the same time-delta.
ensure_delta_time (Optional[float]) – If provided, check that time series have required time-delta.
ensure_same_time_values (bool) – If True, check that all time series share the same time values.
ensure_normalized_time (bool) – If True, check if the time values are normalized.
ensure_n_timeseries (Optional[int]) – If provided, check if the required number of time series are present.
ensure_n_timesteps (Optional[int]) – If provided, check that all time series have exactly the number of timesteps specified.
ensure_min_timesteps (Optional[int]) – If provided, check if every time series has the required minimum of time steps.
ensure_no_degenerate_ts (bool) – If True, make sure that no degenerate (single sampled) time series are present.
ensure_dtype_time – Check the data type of the time index.

Returns:

validated time series collection (without changes)

Return type:

TSCDataFrame

drop_last_n_samples(n_samples)[source]#

Drop last n samples per time series in the collection.

Parameters:: n_samples – Number of samples to drop.

Returns:: reduced time series collection
Return type:: TSCDataFrame

classmethod equal_const_delta_time(dt1, dt2, atol=1e-15)[source]#

Return True if the two time deltas should be treated as equal.

Parameters:

dt1 (float) – First delta time.
dt2 (float) – Second delta time.
atol – Acceptable absolute tolerance between the two delta times. This is relevant for delta times with floating point arithmetic which can introduce “numerical noise” (breaking the exact equidistant spacing).

Return type:

bool
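
The role of the absolute tolerance can be illustrated with a small sketch. The helper `equal_within_atol` is hypothetical and only assumes a plain absolute-difference comparison; the actual comparison in datafold may differ.

```python
def equal_within_atol(dt1, dt2, atol=1e-15):
    """Treat two delta times as equal if they differ by at most atol."""
    return abs(dt1 - dt2) <= atol


# Floating point arithmetic introduces "numerical noise":
noisy_dt = 0.1 * 3  # not exactly 0.3 in double precision
exact_match = noisy_dt == 0.3
tolerant_match = equal_within_atol(noisy_dt, 0.3)
clearly_different = equal_within_atol(0.3, 0.31)
print(exact_match, tolerant_match, clearly_different)
```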

fill_timeseries_with_last_state(n_timesteps)[source]#

Extend every time series with fewer than n_timesteps samples to a length of n_timesteps by repeating the last available state.
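
A minimal sketch of the filling step for a single time series, stored as a NumPy array of shape (n_timesteps, n_features), could look as follows. The helper `fill_with_last_state` is hypothetical; the actual method operates on a TSCDataFrame and also has to extend the time index, which is omitted here.

```python
import numpy as np


def fill_with_last_state(series, n_timesteps):
    """Pad a (n_steps, n_features) array with copies of its last row."""
    series = np.asarray(series)
    n_missing = n_timesteps - series.shape[0]
    if n_missing <= 0:
        return series  # already long enough, nothing to fill
    padding = np.repeat(series[-1:], n_missing, axis=0)
    return np.vstack([series, padding])


short_series = np.array([[1.0, 10.0], [2.0, 20.0]])
filled = fill_with_last_state(short_series, n_timesteps=4)
print(filled)
```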

iter_timevalue_window(window_size, offset, per_time_series=False)[source]#

Iterator over time series windows.

Parameters:

window_size (int) – The number of samples for each window. Note that the window size is not guaranteed; windows are usually shorter in the last iterations if the number of samples is not a multiple of window_size.
offset (int) – A positive integer value that indicates by how much the next window should be shifted. If offset=window_size, then the windows are non-overlapping.
per_time_series (bool) – Treat every time series separately when iterating. This is recommended if the time series in a collection have disjoint time values.

Returns:

An iterator for the windowed time series data.

Return type:

Generator[TSCDataFrame]
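
The interplay of window_size and offset can be sketched with a plain Python generator over a list of samples. The function `iter_windows` is hypothetical, and the exact windowing semantics of the datafold iterator (which works on time values of a TSCDataFrame) may differ.

```python
def iter_windows(samples, window_size, offset):
    """Yield consecutive windows; the last windows may be shorter."""
    for start in range(0, len(samples), offset):
        yield samples[start : start + window_size]


samples = list(range(7))

# offset == window_size: non-overlapping windows, the last one is shorter.
non_overlapping = list(iter_windows(samples, window_size=3, offset=3))
# offset < window_size: neighboring windows share samples.
overlapping = list(iter_windows(samples, window_size=3, offset=2))
print(non_overlapping)
print(overlapping)
```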

normalize_time()[source]#

Normalize time in time series collection.

A TSCDataFrame with normalized time has the following properties:

the global time starts at zero
delta_time is constant and equal to one

Note that at least one time series starts at time zero, but others can start at a later time.

Returns:: normalized data with same shape as input
Return type:: TSCDataFrame
Raises:: TSCException – If time delta between all time series is not constant.
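
The two properties of normalized time can be reproduced with a short NumPy sketch for two time series that share a constant time delta. The variable names are made up for the example, and the computation (shift by the global start time, divide by the time delta) is an assumption about how such a normalization is commonly done.

```python
import numpy as np

# Time values of two time series with the same constant time delta.
series_a = np.array([10.0, 12.0, 14.0])
series_b = np.array([14.0, 16.0, 18.0])
delta_time = 2.0

# Shift by the global start time and rescale by the time delta.
global_start = min(series_a.min(), series_b.min())
normalized_a = (series_a - global_start) / delta_time
normalized_b = (series_b - global_start) / delta_time
print(normalized_a, normalized_b)
```

After normalization the first series starts at zero, the second one starts later, and both have a time delta of one.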

plot_density2d(time, xresolution, yresolution, covariance=None)[source]#

Plot the density for a given time.

For this:

Take the first two columns of the underlying data frame and interpret them as x and y coordinates.

Place Gaussian bells onto these coordinates and sum up the values of the corresponding probability density functions (PDF).

The PDF must be evaluated on a fine-granular grid.

Parameters:

time – time value at which to draw the density
xresolution (int) – resolution in x direction
yresolution (int) – resolution in y direction
covariance (Optional[ndarray]) – covariance of Gaussian bells

Returns:

axis handle

Return type:

matplotlib object
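
The density computation described in the steps above can be sketched without plotting. The helper `density_on_grid` is hypothetical; it assumes isotropic Gaussian bells with a scalar `variance` and a grid margin of one unit around the data, whereas the method above accepts a general covariance.

```python
import numpy as np


def density_on_grid(points, xresolution, yresolution, variance=0.05):
    """Sum isotropic 2-dim. Gaussian PDFs centered at the points on a grid."""
    xs = np.linspace(points[:, 0].min() - 1.0, points[:, 0].max() + 1.0, xresolution)
    ys = np.linspace(points[:, 1].min() - 1.0, points[:, 1].max() + 1.0, yresolution)
    grid_x, grid_y = np.meshgrid(xs, ys)  # both of shape (yresolution, xresolution)
    density = np.zeros_like(grid_x)
    for px, py in points:
        sq_dist = (grid_x - px) ** 2 + (grid_y - py) ** 2
        density += np.exp(-sq_dist / (2.0 * variance)) / (2.0 * np.pi * variance)
    return xs, ys, density


# First two columns of the data at one time value (example coordinates).
points = np.array([[0.0, 0.0], [1.0, 0.5], [0.2, 1.0]])
xs, ys, density = density_on_grid(points, xresolution=100, yresolution=80)
```

The resulting array could then be passed to a plotting routine such as matplotlib's `pcolormesh(xs, ys, density)`.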

shift_matrices(snapshot_orientation='col', validate=True)[source]#

Computes shift matrices from time series data.

Both shift matrices have the same shape with (n_features, n_snapshots-1) or (n_snapshots-1, n_features), depending on snapshot_orientation.

Parameters:

snapshot_orientation (str) – Orientation of snapshots (system states at time) either in rows (“row”) or column-wise (“col”)
validate (bool) – If True, validation steps (constant sampling and that each time series has at least two samples) are performed.

Return type:

tuple[ndarray, ndarray]

Returns:

numpy.ndarray – shift matrix for time steps (0,1,2,…,N-1)
numpy.ndarray – shift matrix for time steps (1,2,…,N)

Raises:

TSCException – If time series collection has no constant time delta.
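
For a single time series with column-oriented snapshots, the two shift matrices are simple slices, as the following NumPy sketch shows. The example data and the least-squares estimate of the system matrix at the end are illustrative additions (a common use of shift matrices in DMD) and are not taken from the datafold implementation.

```python
import numpy as np

# Generate snapshots x_{k+1} = A x_k of a known linear system (example).
A = np.array([[0.9, 0.2], [0.0, 0.5]])
n_snapshots = 5
snapshots = np.empty((2, n_snapshots))  # shape (n_features, n_snapshots)
snapshots[:, 0] = [1.0, 1.0]
for k in range(1, n_snapshots):
    snapshots[:, k] = A @ snapshots[:, k - 1]

# Shift matrices for snapshot_orientation="col".
shift_start = snapshots[:, :-1]  # time steps (0, 1, ..., N-1)
shift_end = snapshots[:, 1:]  # time steps (1, 2, ..., N)

# Illustrative use: least-squares estimate of the system matrix.
A_estimate = shift_end @ np.linalg.pinv(shift_start)
```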

Time series ID	start time	end time	dt
1	1	10	2
2	1	10	1
3	3	13	3

Modules#

datafold.dynfold.base Module#

Classes#

`TSCBase`()	Base class for mixins in datafold.
`TSCPredictMixin`()	Mixin to provide functionality for models that train on time series data.
`TSCTransformerMixin`()	Mixin to provide functionality for point cloud and time series transformations.

Class Inheritance Diagram#

Inheritance diagram of datafold.dynfold.base.TSCBase, datafold.dynfold.base.TSCPredictMixin, datafold.dynfold.base.TSCTransformerMixin

datafold.pcfold.distance Module#

Functions#

`compute_distance_matrix`(X[, Y, metric, ...])	Compute distance matrix with different settings and backends.
`get_backend_distance_algorithm`(backend[, ...])	Selects and validates the backend class for distance matrix computation.
`init_distance_algorithm`([backend, metric, ...])	Initialize a distance matrix by name and keywords.

Classes#

`BruteForceDist`(metric[, exact_numeric, cut_off])	Computes all distance pairs in the distance matrix.
`DistanceAlgorithm`(metric, is_symmetric[, ...])	Abstract base class for distance matrix algorithms (dense or sparse).
`GuessOptimalDist`(metric, is_symmetric[, ...])	Tries to guess a suitable algorithm based on sparsity, metric and installed backends algorithms.
`RDist`(cut_off[, kmin, metric])	Sparse distance matrix algorithm rdist, for point clouds with manifold assumption.
`ScipyKdTreeDist`(cut_off[, metric, kmin])	Sparse distance matrix computation using scipy's kd-tree implementation.
`SklearnBalltreeDist`(cut_off[, metric, kmin])	Distance matrix using ball tree implementation from scikit-learn.
`SklearnKNN`(metric, k, **backend_options)

Class Inheritance Diagram#

Inheritance diagram of datafold.pcfold.distance.BruteForceDist, datafold.pcfold.distance.DistanceAlgorithm, datafold.pcfold.distance.GuessOptimalDist, datafold.pcfold.distance.RDist, datafold.pcfold.distance.ScipyKdTreeDist, datafold.pcfold.distance.SklearnBalltreeDist, datafold.pcfold.distance.SklearnKNN

datafold.pcfold.eigsolver Module#

Functions#

`compute_kernel_eigenpairs`(kernel, ...[, ...])	Compute eigenvalues and -vectors from kernel matrix with consideration of matrix properties.
`compute_kernel_svd`(kernel_matrix, n_svdtriplet)
`scipy_eigsolver`(kernel, kernel_matrix, ...)	Compute eigenpairs of kernel matrix with scipy backend.
`scipy_svdsolver`(kernel_matrix, ...)	Decompose a (possibly rectangular) kernel matrix into singular value components.

Classes#

NumericalMathError(message)

Use for numerical problems/issues, such as singular matrices or too large imaginary part.

Class Inheritance Diagram#

Inheritance diagram of datafold.pcfold.eigsolver.NumericalMathError