Documented internals#

Classes#

class datafold.pcfold.kernels.RadialBasisKernel(required_metric, distance=None)[source]#

Abstract base class for radial basis kernels.

“A radial basis function (RBF) is a real-valued function whose value depends only on the distance between the input and some fixed point.” (taken from Wikipedia)

Parameters:

required_metric (str) – metric required for kernel
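
The quoted definition can be illustrated with a Gaussian radial basis function. This is a minimal sketch and not datafold's implementation: the function name `gaussian_rbf`, the parameter `epsilon` and the scaling inside the exponential are assumptions chosen for illustration.

```python
import math


def gaussian_rbf(x, center, epsilon=1.0):
    """Gaussian RBF: the value depends only on the Euclidean distance to `center`."""
    distance = math.dist(x, center)  # Euclidean distance between the two points
    return math.exp(-(distance**2) / (2 * epsilon))


# Two different points at the same distance from the center give the same value.
value_a = gaussian_rbf((1.0, 0.0), (0.0, 0.0))
value_b = gaussian_rbf((0.0, -1.0), (0.0, 0.0))
```

Because only the distance enters the function, a kernel of this kind can be evaluated directly on a precomputed distance matrix.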

class datafold.dynfold.dynsystem.LinearDynamicalSystem(sys_type, sys_mode, is_controlled=False, is_control_affine=False, is_time_invariant=True)[source]#

Evolve linear dynamical system forward in time.

There are various definitions of a linear dynamical system; the specific form is selected with the parameters sys_type and sys_mode.

Parameters:
  • sys_type (Literal['differential', 'flowmap']) –

    Type of linear system:

    • “differential”

    • “flowmap”

  • sys_mode (Literal['matrix', 'spectral']) –

    Whether the system is evaluated with

    • “matrix” (i.e. A or \mathcal{A} are given)

    • “spectral” (i.e. eigenpairs of A or \mathcal{A} are given)

  • is_controlled (bool) – Whether the system is controlled. If set to True, a control matrix must be passed to setup_matrix_system (currently there is no implementation for spectral systems).

  • is_control_affine (bool) – Whether the system is control affine. The control matrix must be a 3-dim. array (tensor) and the implementation is based on Peitz et al. [2020].

  • is_time_invariant (bool) – If True, the system internally always starts with time=0. This is irrespective of the time given in the time values. If the initial time is larger than zero, the internal times are corrected to the requested time.

References

[Kutz et al., 2016] (pages 3 ff.)
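
The two system types can be sketched in "matrix" mode with plain NumPy. The sketch assumes the common conventions x_{n+1} = A x_n for a "flowmap" system and dx/dt = \mathcal{A} x for a "differential" system; the helper names `evolve_flowmap` and `evolve_differential` are hypothetical and not part of datafold.

```python
import numpy as np


def evolve_flowmap(A, x0, n_steps):
    """Flowmap type (assumed form): x_n = A^n x_0 for n = 0, ..., n_steps."""
    states = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        states.append(A @ states[-1])
    return np.array(states)  # shape (n_steps + 1, n_features)


def evolve_differential(A_cal, x0, time_values):
    """Differential type (assumed form): x(t) = exp(A_cal t) x_0 via an eigendecomposition."""
    eigenvalues, eigvec_right = np.linalg.eig(A_cal)
    b0 = np.linalg.solve(eigvec_right, np.asarray(x0, dtype=complex))
    states = [eigvec_right @ (np.exp(eigenvalues * t) * b0) for t in time_values]
    return np.real(np.array(states))  # shape (n_time_values, n_features)


A_cal = np.array([[0.0, 1.0], [-1.0, 0.0]])  # generator of a planar rotation
x0 = np.array([1.0, 0.0])
time_delta = 0.1

# Flow map that advances the rotation by one time_delta: A = exp(A_cal * time_delta)
A = np.array(
    [
        [np.cos(time_delta), np.sin(time_delta)],
        [-np.sin(time_delta), np.cos(time_delta)],
    ]
)

flow_states = evolve_flowmap(A, x0, n_steps=5)
diff_states = evolve_differential(A_cal, x0, time_delta * np.arange(6))
```

Both calls describe the same trajectory, once in discrete steps and once in continuous time, so `flow_states` and `diff_states` agree up to floating point error.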

compute_spectral_system_states(states)[source]#

Compute the spectral states of the system.

If the linear system is written in its spectral form:

\Psi_r \Lambda^n \Psi_l x_0 = x_n

\Psi_r \Lambda^n b_0 = x_n

then b_0 is the spectral state, which is computed in this function. The input does not need to be an initial state; arbitrary states can be passed.

In the context of dynamic mode decomposition, the spectral state is also often referred to as “amplitudes”. E.g., see Kutz et al. [2016], page 8. In the context of EDMD, where the DMD model acts on a dictionary space, the spectral states are the evaluation of the Koopman eigenfunctions. See e.g., Williams et al. [2015] Eq. 3 or 6.

There are two alternatives in how to compute the states.

  1. By using the right eigenvectors and solving in a least-squares sense

    \Psi_r b_0 = x_0

  2. or by using the left eigenvectors and computing the matrix-vector product

    \Psi_l x_0 = b_0

If the left eigenvectors were set during setup_spectral_system(), then alternative 2 is always used; otherwise alternative 1 is used.
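
The two alternatives can be compared in a small NumPy sketch. It is independent of datafold and assumes that the left eigenvectors are stored as the rows of \Psi_l = \Psi_r^{-1}, which is consistent with the spectral form above; the variable names are chosen for illustration.

```python
import numpy as np

# Upper triangular matrix with distinct eigenvalues (2, 3, 4), hence diagonalizable
A = np.array([[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 4.0]])
eigenvalues, eigvec_right = np.linalg.eig(A)

# Assumption: left eigenvectors are scaled such that eigvec_left @ eigvec_right = I
eigvec_left = np.linalg.inv(eigvec_right)

# States in column orientation: (state_length, n_states)
states = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, -1.0]])

# Alternative 1: solve  Psi_r b_0 = x_0  in a least-squares sense
b0_lstsq = np.linalg.lstsq(eigvec_right, states, rcond=None)[0]

# Alternative 2: matrix product  Psi_l x_0 = b_0
b0_left = eigvec_left @ states

# Spectral form for n = 2:  Psi_r Lambda^2 b_0 = x_2
x2_spectral = eigvec_right @ np.diag(eigenvalues**2) @ b0_left
```

For a square, well-conditioned \Psi_r both alternatives give the same spectral states; alternative 2 avoids solving a linear system for every call.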

Parameters:

states (ndarray) – The states of original data space in column-orientation (state_length, n_states).

Returns:

spectrally aligned states

Return type:

numpy.ndarray

evolve_system(initial_conditions, *, time_values, control_input=None, overwrite_sys_matrix=None, time_delta=None, time_series_ids=None, feature_names_out=None)[source]#

Evolve specified linear dynamical system.

The system evolves depending on the specified system solver (set during initialization).

Parameters:
  • initial_conditions (ndarray) – Single initial condition of shape (n_features,) or multiple initial conditions of shape (n_features, n_initial_conditions).

  • time_values (Union[ndarray, float, int, list]) – Time values to evaluate the linear system at t \in \mathbb{R}^{+}

  • control_input (Optional[ndarray]) – Control states over the time horizon acting on the system dynamics. The array must have shape (n_timesteps, n_control_features) for a single initial condition and must be a tensor of shape (n_initial_condition, n_timesteps, n_control_features) for multiple initial conditions.

  • overwrite_sys_matrix (Optional[ndarray]) – Primarily for performance reasons the system matrix A can also be overwritten. An example is to include linear post-mappings of the system (e.g. a projection matrix P, resulting in only returning some quantities of interest; A^{*} = P \cdot A).

  • time_delta (Optional[float]) – Time delta \Delta t. This is a required parameter in a “flowmap” system.

  • time_series_ids (Optional[ndarray]) – Unique integer time series IDs of shape (n_initial_conditions,) for each respective initial condition. Defaults to (0, 1, 2, …).

  • feature_names_out (Union[Index, list, None]) – Unique feature columns names of shape (n_features,). Defaults to (0, 1, 2, …).

Returns:

Time series for each initial condition, each time series has shape (n_time_values, n_features)

Return type:

TSCDataFrame
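
How time_values and time_delta interact for a “flowmap” system in “spectral” mode can be sketched as follows. The sketch assumes the evaluation x(t) = \Psi_r \Lambda^{t / \Delta t} b_0, and the helper `evolve_flowmap_spectral` is hypothetical; it returns a plain array rather than a TSCDataFrame.

```python
import numpy as np


def evolve_flowmap_spectral(eigvec_right, eigenvalues, b0, time_values, time_delta):
    """Evaluate x(t) = Psi_r Lambda^(t / time_delta) b_0 for every requested time value."""
    exponents = np.asarray(time_values, dtype=float) / time_delta
    # Row k holds all eigenvalues raised to the k-th exponent
    powers = np.power(
        eigenvalues.astype(complex)[np.newaxis, :], exponents[:, np.newaxis]
    )
    return (powers * b0[np.newaxis, :]) @ eigvec_right.T  # (n_time_values, n_features)


A = np.array([[0.9, 0.2], [0.0, 0.5]])
eigenvalues, eigvec_right = np.linalg.eig(A)
x0 = np.array([1.0, 1.0])
b0 = np.linalg.solve(eigvec_right, x0)  # spectral state of the initial condition

time_delta = 0.5
time_values = np.array([0.0, 0.5, 1.0, 1.5])
states = np.real(
    evolve_flowmap_spectral(eigvec_right, eigenvalues, b0, time_values, time_delta)
)

# Reference: repeated application of the system matrix
reference = np.array([np.linalg.matrix_power(A, k) @ x0 for k in range(4)])
```

The result has one row per time value, matching the shape (n_time_values, n_features) stated for each returned time series.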

property is_differential_system: bool#

Indicate whether the linear system is of “differential” type.

The system uses either the matrix \mathcal{A} or the spectral components to evolve the system.

property is_flowmap_system: bool#

Indicate whether the linear system is a “flowmap” system.

The system uses either the matrix A or its spectral components to evolve the system.

is_linear_system_setup(raise_error_if_not_setup=False)[source]#

Indicate whether the linear system is set up.

Return type:

bool

property is_matrix_mode: bool#

Indicate whether the linear system is in “matrix” mode.

The system uses either matrix A for flowmap or \mathcal{A} for a differential system.

property is_spectral_mode: bool#

Indicate whether the linear system is in “spectral” mode.

The system uses the spectral components of either matrix A for flowmap or \mathcal{A} for differential.

setup_matrix_system(system_matrix, *, control_matrix=None)[source]#

Set up linear system with system matrix.

Parameters:
  • system_matrix – The system matrix (either A for flowmap or \mathcal{A} for differential type).

  • control_matrix – The control matrix. Required if the linear system is controlled.

Returns:

self

Return type:

LinearDynamicalSystem

setup_spectral_system(eigenvectors_right, eigenvalues, eigenvectors_left=None, control_matrix=None)[source]#

Set up linear system with spectral components of system matrix.

If the left eigenvectors (attribute eigenvectors_left_) are available, the spectral state b_0 of an initial condition is always computed with the second alternative described in compute_spectral_system_states() (matrix-vector product with the left eigenvectors), because this is more efficient.

Parameters:
  • eigenvectors_right (ndarray) – The right eigenvectors \Psi_r of system matrix.

  • eigenvalues (ndarray) – The eigenvalues \Lambda of system matrix.

  • eigenvectors_left (Optional[ndarray]) – The left eigenvectors \Psi_l of system matrix.

  • control_matrix (Optional[ndarray]) – An additional control matrix (note that currently the control matrix is not described in spectral components).

Returns:

self

Return type:

LinearDynamicalSystem

class datafold.pcfold.timeseries.accessor.TSCAccessor(tsc_df)[source]#

Extension functions for TSCDataFrame.

See documentation for regular pandas accessors.

The functions are available through the accessor tsc, for example,

tsc_object.tsc.normalize_time()

Parameters:

tsc_df (TSCDataFrame) – time series collection data to carry out accessor functions on

assign_ids_const_delta(drop_samples=False)[source]#

Split time series with irregular time sampling frequencies into new time series of intervals with constant time sampling.

This function only considers time series with irregular delta_time and aims to split these into sub time series of finite delta_time. Time series with a finite delta_time may only receive a new ID, but the samples remain the same.

The detection of constant sampling intervals is carried out with the second time differences. To detect a new sub time series, the sampling rate must be constant for at least three samples (i.e. two time differences). This means the function does not assign sample pairs (time series of length two) when dealing with completely irregular time series. Instead, the samples of irregular intervals are dropped. The main reason for this is that the assignment is not unique.
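
The idea of requiring two equal time differences can be illustrated with a simplified greedy split on a sorted list of time values. This is only a sketch under that assumption, not the algorithm used by datafold; the function name `split_const_delta` and the tolerance parameter are hypothetical.

```python
def split_const_delta(times, atol=1e-12):
    """Greedily split sorted time values into runs of constant spacing (>= 3 samples)."""
    segments = []
    start = 0
    n = len(times)
    while start < n - 2:
        delta = times[start + 1] - times[start]
        end = start + 1
        # Extend the run while the next time difference equals the first one
        while end + 1 < n and abs((times[end + 1] - times[end]) - delta) <= atol:
            end += 1
        if end - start >= 2:  # three samples, i.e. two equal time differences
            segments.append(times[start : end + 1])
            start = end + 1
        else:
            start += 1  # this sample cannot be assigned and is dropped
    return segments


segments = split_const_delta([0, 1, 2, 3, 5, 7, 9, 9.5, 11])
ambiguous = split_const_delta([0, 1, 2, 4, 6])
```

In the second call the sample at time 2 could belong either to the run (0, 1, 2) or to the run (2, 4, 6). The greedy sketch picks the first option and drops the samples at 4 and 6, which shows why the assignment is not unique.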

Parameters:

drop_samples – If True, the function drops samples from irregular sampled intervals (up to entire time series). If dropping samples is required for assignment and the parameter is set to False, a ValueError is raised.

Returns:

Time series collection with re-allocated time series if irregular time series are present in the TSCDataFrame. It returns None if all time series have a completely irregular time sampling.

Return type:

Optional[TSCDataFrame]

assign_ids_sequential()[source]#

Assign time series IDs sequentially starting from zero in a collection.

Note that this operation is performed in place and overwrites the existing time series IDs.

Returns:

The data with time series IDs in sequential order.

Return type:

TSCDataFrame

assign_ids_train_test(train_indices, test_indices, return_dropped=False)[source]#

Split and assign time series IDs based on training and test indices.

Note that the indices included in train_indices and test_indices must be disjoint (i.e. a sample cannot be both in train_indices and test_indices). Samples whose indices are included in neither the training nor the test indices are dropped.

Parameters:
  • train_indices (ndarray) – The indices to indicate which samples are included in the training set.

  • test_indices (ndarray) – The indices to indicate which samples are included in the test set.

  • return_dropped (bool) – If True, a DataFrame is returned as the third return value, which includes all samples that were included in neither the training indices nor the test indices.

Returns:

  • TSCDataFrame – The time series collection for training.

  • TSCDataFrame – The time series collection for testing.

  • pandas.DataFrame – The dropped samples; only returned if return_dropped=True.

check_const_time_delta()[source]#

Check if all time series have the same time-delta.

Return type:

Union[Series, float]

check_const_timesteps()[source]#

Check that all time series have the same number of timesteps. The time values themselves can differ between the time series.

Return type:

int

check_contain_required_ids(required_ids, check_order=False)[source]#

Check that the time series collection contains exactly the required IDs.

classmethod check_equal_delta_time(X, Y, atol=1e-15, require_const=False)[source]#

Check if two time series collections have the same delta times.

Return type:

tuple[Union[float, Series], Union[float, Series]]

check_equal_timevalues()[source]#

Check if all time series in the collection have identical time values.

Return type:

None

check_finite()[source]#

Check if all values are finite (i.e. the collection does not contain nan or inf).

Return type:

None

check_min_features(min_features)[source]#

Check if there is a minimum number of features included in the collection.

Return type:

None

check_min_samples(min_samples)[source]#

Check if there is a minimum number of samples included in the collection.

Return type:

None

check_non_overlapping_timeseries()[source]#

Check if all time series have disjoint time values (do not overlap).

Return type:

None

check_normalized_time()[source]#

Check if time series collection has normalized time.

Return type:

None

check_required_min_timesteps(required_min_timesteps)[source]#

Check if all time series in the collection have a minimum number of time steps.

Parameters:

required_min_timesteps (int) – minimum number of time steps that each time series must have

Return type:

None

check_required_n_timeseries(required_n_timeseries)[source]#

Check if the collection contains exactly the required number of time series.

Parameters:

required_n_timeseries (int) – number of time series that the collection must contain

Return type:

None

check_required_time_delta(required_time_delta)[source]#

Check if time series collection has required time-delta.

Parameters:

required_time_delta (Union[Series, float, int]) – single value or per time series

Return type:

None

check_timeseries_same_length()[source]#

Check if time series in the collection have the same length.

Return type:

None

check_tsc(*, ensure_all_finite=True, ensure_min_samples=1, ensure_min_features=1, ensure_same_length=False, ensure_const_delta_time=True, ensure_delta_time=None, ensure_same_time_values=False, ensure_normalized_time=False, ensure_n_timeseries=None, ensure_min_timesteps=None, ensure_n_timesteps=None, ensure_no_degenerate_ts=True, ensure_dtype_time=None)[source]#

Validate time series properties.

This bundles the individual check functions that are also available in TSCAccessor.

Parameters:
  • ensure_all_finite (bool) – If True, check if all values are finite (no ‘nan’ or ‘inf’ values).

  • ensure_min_samples (int) – If provided, check that the frame has at least the required number of samples.

  • ensure_same_length (bool) – If True, check if all time series have the same length.

  • ensure_const_delta_time (bool) – If True, check that all time series have the same time-delta.

  • ensure_delta_time (Optional[float]) – If provided, check that time series have required time-delta.

  • ensure_same_time_values (bool) – If True, check that all time series share the same time values.

  • ensure_normalized_time (bool) – If True, check if the time values are normalized.

  • ensure_n_timeseries (Optional[int]) – If provided, check if the required number of time series are present.

  • ensure_n_timesteps (Optional[int]) – If provided, check that all time series have exactly the specified number of timesteps.

  • ensure_min_timesteps (Optional[int]) – If provided, check if every time series has the required minimum of time steps.

  • ensure_no_degenerate_ts (bool) – If True, make sure that no degenerate (single sampled) time series are present.

  • ensure_dtype_time – Check the data type of the time index.

Returns:

validated time series collection (without changes)

Return type:

TSCDataFrame

drop_last_n_samples(n_samples)[source]#

Drop last n samples per time series in the collection.

Parameters:

n_samples – Number of samples to drop.

Returns:

reduced time series collection

Return type:

TSCDataFrame

classmethod equal_const_delta_time(dt1, dt2, atol=1e-15)[source]#

Return True if the two time deltas should be treated as equal.

Parameters:
  • dt1 (float) – First delta time.

  • dt2 (float) – Second delta time.

  • atol – Acceptable absolute tolerance between the two delta times. This is relevant for delta times with floating point arithmetic which can introduce “numerical noise” (breaking the exact equidistant spacing).

Return type:

bool
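
Why an absolute tolerance is needed can be seen with standard floating point arithmetic. The helper `deltas_equal` below is a hypothetical stand-in that only mirrors the documented parameters dt1, dt2 and atol; it is not datafold's implementation.

```python
import math


def deltas_equal(dt1, dt2, atol=1e-15):
    """Treat two time deltas as equal if they differ by at most `atol`."""
    return math.isclose(dt1, dt2, rel_tol=0.0, abs_tol=atol)


dt_noisy = 0.1 + 0.2  # floating point result is 0.30000000000000004
exact_match = dt_noisy == 0.3
tolerant_match = deltas_equal(dt_noisy, 0.3)
```

The exact comparison fails although both values represent the same sampling interval, while the comparison with tolerance accepts them.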

fill_timeseries_with_last_state(n_timesteps)[source]#

Extend every time series that has fewer than n_timesteps samples to a length of n_timesteps by repeating its last available state.
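
The padding can be sketched on plain Python lists. The sketch only covers the state values and ignores how the time index is extended; the function name `fill_with_last_state` and the dictionary layout (time series ID mapped to a list of states) are assumptions for illustration.

```python
def fill_with_last_state(series, n_timesteps):
    """Pad every time series shorter than n_timesteps with copies of its last state."""
    filled = {}
    for ts_id, states in series.items():
        n_missing = max(0, n_timesteps - len(states))
        filled[ts_id] = list(states) + [states[-1]] * n_missing
    return filled


filled = fill_with_last_state({0: [1.0, 2.0, 3.0], 1: [5.0]}, n_timesteps=3)
```

Time series that already have n_timesteps or more samples are returned unchanged.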

iter_timevalue_window(window_size, offset, per_time_series=False)[source]#

Iterator over time series windows.

Parameters:
  • window_size (int) – The number of samples for each window. Note that the window size is not guaranteed; the last windows are usually shorter if the number of samples is not a multiple of window_size.

  • offset (int) – A positive integer value that indicates by how much the next window should be shifted. If offset=window_size, then the windows are non-overlapping.

  • per_time_series (bool) – Treat every time series separately when iterating. This is recommended if the time series in a collection have disjoint time values.

Returns:

An iterator for the windowed time series data.

Return type:

Generator[TSCDataFrame]
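
The interplay of window_size and offset can be sketched with a generator over a plain list of time values. This is an illustrative assumption about the windowing scheme, not datafold's implementation, and the name `iter_windows` is hypothetical.

```python
def iter_windows(time_values, window_size, offset):
    """Yield windows of up to `window_size` values, each shifted by `offset`."""
    for start in range(0, len(time_values), offset):
        yield time_values[start : start + window_size]


time_values = list(range(7))
non_overlapping = list(iter_windows(time_values, window_size=3, offset=3))
overlapping = list(iter_windows(time_values, window_size=3, offset=2))
```

With seven samples and a window size of three, the last window in both cases contains only the single remaining sample, which illustrates that the window size is not guaranteed.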

normalize_time()[source]#

Normalize time in time series collection.

A TSCDataFrame with normalized time has the following properties:

  • the global time starts at zero

  • delta_time is constant one

Note that at least one time series starts at time zero, but others can start at a later time.

Returns:

normalized data with same shape as input

Return type:

TSCDataFrame

Raises:

TSCException – If time delta between all time series is not constant.
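
The two properties of normalized time can be sketched on a dictionary that maps time series IDs to sorted time values. The function `normalize_time_sketch` is hypothetical, uses exact comparisons (suitable for the integer times in the example) and raises a ValueError where datafold documents a TSCException.

```python
def normalize_time_sketch(collection):
    """Shift the global start to zero and rescale so that the time delta is one."""
    deltas = {
        t[i + 1] - t[i] for t in collection.values() for i in range(len(t) - 1)
    }
    if len(deltas) != 1:
        raise ValueError("time delta is not constant across all time series")
    delta = deltas.pop()
    global_start = min(t[0] for t in collection.values())
    return {
        ts_id: [round((value - global_start) / delta) for value in t]
        for ts_id, t in collection.items()
    }


normalized = normalize_time_sketch({0: [10, 12, 14], 1: [14, 16]})
```

In the example, time series 0 starts at the global time zero, while time series 1 starts at the normalized time 2.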

plot_density2d(time, xresolution, yresolution, covariance=None)[source]#

Plot the density for a given time.

For this:

  • Take the first two columns of the underlying data frame and interpret them as x and y coordinates.

  • Place Gaussian bells onto these coordinates and sum up the values of the corresponding probability density functions (PDF).

  • The PDF must be evaluated on a fine-grained grid.

Parameters:
  • time – time value at which to draw the density

  • xresolution (int) – resolution in x direction

  • yresolution (int) – resolution in y direction

  • covariance (Optional[ndarray]) – covariance of Gaussian bells

Returns:

axis handle

Return type:

matplotlib object
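
The density computation described in the steps above can be sketched without the plotting part. The sketch assumes isotropic Gaussian bells with a scalar variance and pads the grid by an arbitrary margin of 3 around the data; the function name `density_on_grid` and both choices are assumptions, not datafold's implementation.

```python
import numpy as np


def density_on_grid(points, xresolution, yresolution, variance=1.0):
    """Sum isotropic Gaussian PDFs centred at `points` (n_points, 2) on a regular grid."""
    x = np.linspace(points[:, 0].min() - 3, points[:, 0].max() + 3, xresolution)
    y = np.linspace(points[:, 1].min() - 3, points[:, 1].max() + 3, yresolution)
    grid_x, grid_y = np.meshgrid(x, y)  # both of shape (yresolution, xresolution)
    density = np.zeros_like(grid_x)
    for px, py in points:
        sq_dist = (grid_x - px) ** 2 + (grid_y - py) ** 2
        density += np.exp(-sq_dist / (2 * variance)) / (2 * np.pi * variance)
    return x, y, density


points = np.array([[0.0, 0.0], [1.0, 1.0]])  # x and y coordinates at one time value
x, y, density = density_on_grid(points, xresolution=41, yresolution=31)
```

The returned array could then be passed to a plotting routine such as a filled contour plot; a finer grid gives a smoother picture at higher cost.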

shift_matrices(snapshot_orientation='col', validate=True)[source]#

Computes shift matrices from time series data.

Both shift matrices have the same shape with (n_features, n_snapshots-1) or (n_snapshots-1, n_features), depending on snapshot_orientation.

Parameters:
  • snapshot_orientation (str) – Orientation of snapshots (system states at time) either in rows (“row”) or column-wise (“col”)

  • validate (bool) – If True, validation steps (constant sampling and that each time series has at least two samples) are performed.

Return type:

tuple[ndarray, ndarray]

Raises:

TSCException – If time series collection has no constant time delta.

See also

DMDFull
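
For a single time series with constant sampling, the shift matrices and their typical use in a DMD-style least-squares fit can be sketched as follows. The helper `shift_matrices_sketch` is hypothetical, works on a plain column-oriented array and does not handle collections of several time series.

```python
import numpy as np


def shift_matrices_sketch(snapshots):
    """Split column-oriented snapshots (n_features, n_snapshots) into shifted pairs."""
    return snapshots[:, :-1], snapshots[:, 1:]


# Generate snapshots from a known linear system x_{n+1} = A x_n
A = np.array([[0.9, 0.1], [0.0, 0.8]])
snapshots = np.empty((2, 6))
snapshots[:, 0] = [1.0, 2.0]
for k in range(1, 6):
    snapshots[:, k] = A @ snapshots[:, k - 1]

shift_start, shift_end = shift_matrices_sketch(snapshots)

# Least-squares estimate of the system matrix from the two shift matrices
A_estimate = shift_end @ np.linalg.pinv(shift_start)
```

Both matrices have shape (n_features, n_snapshots-1), and for noise-free data from a linear system the estimate recovers A.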

shift_time_by_delta(shift_t)[source]#

Shift all time values from the time series by a constant value.

Parameters:

shift_t (float) – positive or negative time shift value

Returns:

same shape as input

Return type:

TSCDataFrame

shift_time_per_time_series(shift_values=None, ensure_identical_values=False, return_shift_values=False)[source]#

Shift each time series by a value given in shift_values. If shift_values is None, then each time series is shifted such that it starts at time zero. This may be beneficial when dealing with time series data from autonomous systems.

Parameters:
  • shift_values (Optional[Series]) – If provided, then the series must contain the shift value for each time series. If None then the shift values are computed such that each time series has an initial time value of zero.

  • ensure_identical_values – A flag that performs an extra routine that counteracts numerical noise after the time values are shifted. Note that this is only possible if the time series collection is equally spaced (otherwise the parameter is ignored).

  • return_shift_values – If True, the applied shift_values are returned. This is useful if the parameter shift_values is None.

Returns:

shifted time series collection and shift_values (optional)

Return type:

TSCDataFrame, pd.Series

time_derivative(scheme='center', diff_order=1, accuracy=2, shift_index=False)[source]#

Compute finite differences in time for each time series.

Note

The boundary samples at which no finite difference scheme of the set accuracy is possible are dropped. Applying lower-accuracy schemes at the boundary is not implemented.

Parameters:
  • scheme – The finite difference scheme ‘backward’, ‘center’ or ‘forward’.

  • diff_order (int) – The order of the derivative.

  • accuracy (int) – The accuracy (even positive integer) of the derivative scheme.

  • shift_index (bool) – If True, then the time is shifted such that no future samples are included. For example, for the coefficients [-1, 0, 1], the computed time derivative for time 1 is then shifted to time 2. The option is intended for scheme=’center’. The parameter has no effect for scheme=’backward’ and is discouraged for scheme=’forward’.

Returns:

The finite difference time series. The boundary samples are removed, i.e. the number of samples decrease accordingly.

Return type:

Union[pd.DataFrame, TSCDataFrame]
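
The effect of dropping boundary samples and of shift_index can be sketched for the first derivative with the central coefficients [-1, 0, 1] (accuracy 2). The function `central_difference` is a hypothetical helper on plain lists with constant sampling, not datafold's implementation.

```python
def central_difference(time_values, values, shift_index=False):
    """First derivative with coefficients [-1, 0, 1] / (2 * dt); boundaries are dropped."""
    dt = time_values[1] - time_values[0]
    derivative = [
        (values[i + 1] - values[i - 1]) / (2 * dt) for i in range(1, len(values) - 1)
    ]
    # With shift_index=True each result is attached to the last sample it uses
    times = time_values[2:] if shift_index else time_values[1:-1]
    return list(times), derivative


time_values = [0, 1, 2, 3, 4]
values = [t**2 for t in time_values]  # samples of f(t) = t^2 with f'(t) = 2t

times_center, derivative = central_difference(time_values, values)
times_shifted, _ = central_difference(time_values, values, shift_index=True)
```

Five samples yield three derivative values because the first and last sample have no neighbor on one side. With shift_index=True the derivative computed around time 1 is reported at time 2, so it only depends on samples that are available at the reported time.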

time_values_overview()[source]#

Generate table with overview of time values.

Example of how the table looks:

==============  ==========  ========  ==
Time series ID  start time  end time  dt
==============  ==========  ========  ==
1               1           10        2
2               1           10        1
3               3           13        3
==============  ==========  ========  ==

Returns:

overview

Return type:

pandas.DataFrame

class datafold.pcfold.timeseries.collection.TSCException(message)[source]#

Error raised if TSC is not correct.

Modules#

datafold.dynfold.base Module#

Classes#

TSCBase()

Base class for mixins in datafold.

TSCPredictMixin()

Mixin to provide functionality for models that train on time series data.

TSCTransformerMixin()

Mixin to provide functionality for point cloud and time series transformations.

Class Inheritance Diagram#

Inheritance diagram of datafold.dynfold.base.TSCBase, datafold.dynfold.base.TSCPredictMixin, datafold.dynfold.base.TSCTransformerMixin

datafold.pcfold.distance Module#

Functions#

compute_distance_matrix(X[, Y, metric, ...])

Compute distance matrix with different settings and backends.

get_backend_distance_algorithm(backend[, ...])

Selects and validates the backend class for distance matrix computation.

init_distance_algorithm([backend, metric, ...])

Initialize a distance matrix by name and keywords.

Classes#

BruteForceDist(metric[, exact_numeric, cut_off])

Computes all distance pairs in the distance matrix.

DistanceAlgorithm(metric, is_symmetric[, ...])

Abstract base class for distance matrix algorithms (dense or sparse).

GuessOptimalDist(metric, is_symmetric[, ...])

Tries to guess a suitable algorithm based on sparsity, metric and installed backend algorithms.

RDist(cut_off[, kmin, metric])

Sparse distance matrix algorithm rdist, for point clouds with manifold assumption.

ScipyKdTreeDist(cut_off[, metric, kmin])

Sparse distance matrix computation using scipy's kd-tree implementation.

SklearnBalltreeDist(cut_off[, metric, kmin])

Distance matrix using ball tree implementation from scikit-learn.

SklearnKNN(metric, k, **backend_options)

Class Inheritance Diagram#

Inheritance diagram of datafold.pcfold.distance.BruteForceDist, datafold.pcfold.distance.DistanceAlgorithm, datafold.pcfold.distance.GuessOptimalDist, datafold.pcfold.distance.RDist, datafold.pcfold.distance.ScipyKdTreeDist, datafold.pcfold.distance.SklearnBalltreeDist, datafold.pcfold.distance.SklearnKNN

datafold.pcfold.eigsolver Module#

Functions#

compute_kernel_eigenpairs(kernel, ...[, ...])

Compute eigenvalues and -vectors from kernel matrix with consideration of matrix properties.

compute_kernel_svd(kernel_matrix, n_svdtriplet)

scipy_eigsolver(kernel, kernel_matrix, ...)

Compute eigenpairs of kernel matrix with scipy backend.

scipy_svdsolver(kernel_matrix, ...)

Decompose a (possibly rectangular) kernel matrix into singular value components.

Classes#

NumericalMathError(message)

Use for numerical problems/issues, such as singular matrices or too large imaginary part.

Class Inheritance Diagram#

Inheritance diagram of datafold.pcfold.eigsolver.NumericalMathError