ConeKernel

class datafold.pcfold.ConeKernel(zeta=0.0, epsilon=1.0, fd_accuracy=4, distance=None)

Bases: TSCManifoldKernel

Compute a dynamically adapted cone kernel on time series collection data.

The equations below describe the kernel evaluation. They are taken from the paper listed under References (Giannakis [2015]).

A single kernel evaluation between the time series samples x (with index i) and y (with index j) is computed with

K(x, y) = \exp
\left(
-\frac{\vert\vert \omega_{ij}\vert\vert^2}
{\varepsilon \Delta t^2 \vert\vert \xi_i \vert\vert \vert\vert \xi_j \vert\vert }
\left[ (1-\zeta \cos^2 \theta_i)(1-\zeta \cos^2 \theta_j) \right]^{0.5}
\right)

where,

\cos \theta_i =
\frac{(\xi_i, \omega_{ij})}
{\vert\vert \xi_i \vert\vert \vert\vert \omega_{ij} \vert\vert}

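defines the angle \theta_i between the vector field approximation \xi_i and the difference vector \omega_{ij} (\theta_j is defined analogously, with \xi_j in place of \xi_i),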

\omega_{ij} = y - x

is the difference vector between the two samples,

\Delta t

is the (constant) sampling interval of the time series,

\varepsilon

is an additional scaling parameter of the kernel bandwidth,

\zeta

is the parameter to control the angular influence, and

\xi_i = \Delta_p x_i = \sum_{k=-p/2}^{p/2} w_k x_{i+k}

is the approximation of the dynamical vector field (the time derivative at sample i). The approximation is carried out with \Delta_p, a p-th order accurate central finite difference scheme (in the sense that \xi_i = \dot{x}(t_i) + \mathcal{O}(\Delta t^p)) with associated weights w.
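The central stencil behind \xi_i can be sketched in NumPy as follows. This is an illustration under assumptions, not datafold's implementation (the library computes the derivative via TSCAccessor.time_derivative()): the helper names `central_fd_weights` and `fd_time_derivative` are hypothetical, p must be even, and, as the variable name timederiv_X suggests, \xi_i is assumed to estimate the time derivative itself, so the weights are derived for unit spacing and the 1/\Delta t factor is applied separately.

```python
import numpy as np
from math import factorial


def central_fd_weights(p):
    """Hypothetical helper: weights w_k (k = -p/2, ..., p/2) of a central
    first-derivative stencil of accuracy order p for unit spacing, i.e.
    sum_k w_k x(t + k*dt) = dt * dx/dt + O(dt^(p+1))."""
    if p <= 0 or p % 2:
        raise ValueError("p must be a positive even integer")
    offsets = np.arange(-(p // 2), p // 2 + 1).astype(float)
    # Taylor matching: sum_k w_k k^m / m! = 1 for m = 1 and 0 for m = 0, 2, ..., p
    A = np.array([offsets**m / factorial(m) for m in range(p + 1)])
    b = np.zeros(p + 1)
    b[1] = 1.0
    return np.linalg.solve(A, b)


def fd_time_derivative(x, dt, p):
    """Hypothetical helper: xi_i = (1/dt) * sum_k w_k x_{i+k} for every i
    that has a full stencil.

    x has shape (n_samples, n_features); the first and last p/2 samples have
    no full stencil, so the result has n_samples - p rows.
    """
    w = central_fd_weights(p)
    n, h = x.shape[0], p // 2
    xi = np.zeros((n - p, x.shape[1]))
    for k, w_k in zip(range(-h, h + 1), w):
        xi += w_k * x[h + k : n - h + k]
    return xi / dt
```

For p=2 the weights are (-1/2, 0, 1/2), i.e. \xi_i = (x_{i+1} - x_{i-1}) / (2\Delta t); for p=4 they are (1/12, -2/3, 0, 2/3, -1/12). Because the stencil is exact for polynomials up to degree p, the p=4 estimate of the derivative of t^3 is exact up to rounding.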

Note

In the centered finite difference scheme, the time values are shifted such that no samples from the future are used. For example, the scheme x_{t+1} - x_{t-1} evaluated at time t is assigned the new time value t+1. See also TSCAccessor.time_derivative().
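The time shift can be illustrated for the p=2 scheme. The sketch below (NumPy only, hypothetical helper name, constant sampling assumed) computes the centered difference and labels each value with the time of the latest sample in its stencil instead of the stencil center.

```python
import numpy as np


def centered_diff_with_shifted_time(t, x):
    """Hypothetical helper: p = 2 scheme (x_{i+1} - x_{i-1}) / (2 dt),
    labelled with t_{i+1} instead of t_i.

    Returns (time_values, derivative_values); t[0] and t[1] receive no value.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    dt = t[1] - t[0]  # constant sampling interval assumed
    values = (x[2:] - x[:-2]) / (2.0 * dt)
    # the estimate centred at t[i] uses x[i-1], x[i], x[i+1] and is assigned
    # the time of its latest sample, t[i+1]; hence the labels start at t[2]
    return t[2:], values
```

For example, with t = (0, 1, 2, 3, 4) and x = t^2 the function returns the values (2, 4, 6), the exact derivatives at t = 1, 2, 3, labelled with the times (2, 3, 4).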

Parameters:
  • zeta (float) – A scalar in [0, 1) that controls the angular influence. The weight from one point to a neighboring point increases if the relative displacement vector is aligned with the dynamical flow. The special case zeta=0 corresponds to the so-called “Non-Linear Laplacian Spectral Analysis” (NLSA) kernel.

  • epsilon (float) – An additional scaling parameter with which the kernel scale can be adapted to the actual time sampling frequency.

  • fd_accuracy (int) – The accuracy order of the centered finite difference scheme (p in the description above). Note that the higher the order, the more samples are required in a warm-up phase, in which the centered scheme cannot yet be evaluated with the given accuracy. All samples from this warm-up phase are dropped in the kernel evaluation.
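The kernel formula at the top of this section, with its parameters zeta and epsilon, can be sketched as a dense NumPy computation. This is an illustration, not datafold's code: the function name `cone_kernel` and its signature are hypothetical, the derivative estimates `xi_x`, `xi_y` are assumed to be precomputed and aligned row by row with the samples (warm-up already dropped), nonzero derivative norms are assumed, and \theta_j is computed from \xi_j and \omega_{ij} analogously to \theta_i. In the demo, pairing each \xi_i with its stencil center is a choice of this sketch; the library's alignment under the time shift may differ.

```python
import numpy as np


def cone_kernel(x, y, xi_x, xi_y, dt, zeta=0.0, epsilon=1.0):
    """Hypothetical dense cone kernel: K[i, j] = K(x_i, y_j).

    x, y       -- samples, shapes (n_x, d) and (n_y, d)
    xi_x, xi_y -- time derivative estimates aligned row by row with x and y
    dt         -- constant sampling interval
    """
    if not 0.0 <= zeta < 1.0:
        raise ValueError("zeta must lie in [0, 1)")

    omega = y[np.newaxis, :, :] - x[:, np.newaxis, :]  # omega_ij = y_j - x_i
    dist2 = np.einsum("ijd,ijd->ij", omega, omega)       # ||omega_ij||^2
    norm_omega = np.sqrt(dist2)
    norm_xi_x = np.linalg.norm(xi_x, axis=1)             # ||xi_i||
    norm_xi_y = np.linalg.norm(xi_y, axis=1)             # ||xi_j||

    with np.errstate(invalid="ignore", divide="ignore"):
        cos_i = np.einsum("id,ijd->ij", xi_x, omega) / (norm_xi_x[:, None] * norm_omega)
        cos_j = np.einsum("jd,ijd->ij", xi_y, omega) / (norm_xi_y[None, :] * norm_omega)
    # For omega_ij = 0 the angle is undefined (0/0 gives NaN); the exponent
    # is 0 there anyway, so NaN is replaced by 0
    cos_i = np.nan_to_num(cos_i)
    cos_j = np.nan_to_num(cos_j)

    angular = np.sqrt((1.0 - zeta * cos_i**2) * (1.0 - zeta * cos_j**2))
    bandwidth = epsilon * dt**2 * norm_xi_x[:, None] * norm_xi_y[None, :]
    return np.exp(-dist2 / bandwidth * angular)


# Demo: unit circle sampled with dt = 0.1, p = 2 central differences
dt = 0.1
t = np.arange(0.0, 2.0 * np.pi, dt)
traj = np.column_stack([np.cos(t), np.sin(t)])
xi = (traj[2:] - traj[:-2]) / (2.0 * dt)  # no full stencil for first/last sample
samples = traj[1:-1]                       # pair xi_i with its stencil centre
K = cone_kernel(samples, samples, xi, xi, dt, zeta=0.9)
```

With zeta=0 the angular factor equals 1 and the expression reduces to the NLSA kernel. For zeta close to 1, a pair displaced along the flow receives a larger weight than a pair at the same distance displaced across it. The (n_x, n_y, d) array for omega is fine for a sketch but memory-hungry for large data sets.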

Variables:
  • timederiv_X (np.ndarray) – The time derivative from the finite difference scheme for the reference data X. Required for the component-wise evaluation and only available after a pairwise evaluation of the kernel.

  • norm_timederiv_X (np.ndarray) – Norm of the time derivative for the reference data X. Required for the component-wise evaluation and only available after a pairwise evaluation of the kernel.

References

Giannakis [2015] (the equations are taken from the arXiv version)

Methods Summary

__call__(X[, Y])

Compute kernel matrix.

Methods Documentation

__call__(X, Y=None, **kernel_kwargs)

Compute kernel matrix.

Parameters:
  • X (DataFrame) – The reference time series collection of shape (n_samples_X, n_features_X).

  • Y (Optional[DataFrame]) – The query time series collection of shape (n_samples_Y, n_features_Y). If Y is not provided, then Y=X.

  • **kernel_kwargs – None (no additional keyword arguments are used by this kernel).

Returns:

The kernel matrix with time information.

Return type:

TSCDataFrame