ConeKernel

class datafold.pcfold.ConeKernel(zeta=0.0, epsilon=1.0, fd_accuracy=4, distance=None)

Bases: TSCManifoldKernel

Compute a dynamically adapted cone kernel on time series collection data.

The equations below describe the kernel evaluation and are taken from the paper listed under References.

A single kernel evaluation between the time series samples $x = x_i$ and $y = y_j$ is computed with

$K(x, y) = \exp \left( -\frac{\vert\vert \omega_{ij}\vert\vert^2} {\varepsilon \Delta t^2 \vert\vert \xi_i \vert\vert \vert\vert \xi_j \vert\vert } \left[ (1-\zeta \cos^2 \theta_i)(1-\zeta \cos^2 \theta_j) \right]^{0.5} \right)$

where,

$\cos \theta_i = \frac{(\xi_i, \omega_{ij})} {\vert\vert \xi_i \vert\vert \vert\vert \omega_{ij} \vert\vert}$

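defines the angle $\theta_i$ between the time derivative $\xi_i$ and the difference vector $\omega_{ij}$ ($\cos \theta_j$ is defined analogously with $\xi_j$),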

$\omega_{ij} = y - x$

is the difference vector between the two samples,

$\Delta t$

is the (constant) sampling interval of the time series,

$\varepsilon$

is an additional scaling parameter of the kernel bandwidth,

$\zeta$

is the parameter to control the angular influence, and

$\xi_i = \Delta_p x_i = \sum_{k=-p/2}^{p/2} w_k x_{i+k}$

is the approximation of the dynamical vector field at sample $x_i$. The approximation is carried out with $\Delta_p$, a central finite difference scheme of accuracy order $p$ (i.e., with a truncation error of $\mathcal{O}(\Delta t^p)$) and associated weights $w$.
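The definitions above can be traced in a short NumPy sketch that evaluates $K(x, y)$ for a single sample pair. It is not datafold's implementation: the functions `finite_difference` and `cone_kernel` are hypothetical, only the standard central weights for $p = 2$ and $p = 4$ are tabulated, and the sketch assumes that $\xi_i$ is a derivative estimate (the weights $w$ include the factor $1/\Delta t$), which keeps the exponent dimensionless given the $\Delta t^2$ in its denominator.

```python
import numpy as np

# Standard central finite-difference weights for the first derivative,
# ordered k = -p/2, ..., p/2 (only p = 2 and p = 4 are tabulated here).
CENTRAL_WEIGHTS = {
    2: np.array([-1 / 2, 0.0, 1 / 2]),
    4: np.array([1 / 12, -2 / 3, 0.0, 2 / 3, -1 / 12]),
}


def finite_difference(x, i, dt, p=4):
    """Estimate xi_i ~ dx/dt at row i of an (n_samples, n_features) array.

    Assumption: xi_i is a derivative estimate, i.e. the weights include 1/dt.
    """
    half = p // 2
    if i - half < 0 or i + half >= len(x):
        raise ValueError("sample i has no complete central stencil")
    window = x[i - half : i + half + 1]  # rows x_{i-p/2}, ..., x_{i+p/2}
    return CENTRAL_WEIGHTS[p] @ window / dt


def cone_kernel(x, y, xi_i, xi_j, dt, epsilon=1.0, zeta=0.0):
    """Evaluate K(x, y) for a single sample pair, following the equations above."""
    omega = y - x
    norm_omega = np.linalg.norm(omega)
    if norm_omega == 0.0:
        return 1.0  # identical samples: the exponent vanishes
    norm_xi_i, norm_xi_j = np.linalg.norm(xi_i), np.linalg.norm(xi_j)
    cos_i = xi_i @ omega / (norm_xi_i * norm_omega)
    cos_j = xi_j @ omega / (norm_xi_j * norm_omega)
    angular = np.sqrt((1 - zeta * cos_i**2) * (1 - zeta * cos_j**2))
    scaled = norm_omega**2 / (epsilon * dt**2 * norm_xi_i * norm_xi_j)
    return float(np.exp(-scaled * angular))


# Usage: a uniformly sampled unit circle.
dt = 0.1
t = dt * np.arange(63)
x = np.column_stack([np.cos(t), np.sin(t)])
xi_5, xi_8 = finite_difference(x, 5, dt), finite_difference(x, 8, dt)
k = cone_kernel(x[5], x[8], xi_5, xi_8, dt, zeta=0.5)
```

Since $\cos^2$ is insensitive to the sign of $\omega_{ij}$, swapping $x$ and $y$ (together with $\xi_i$ and $\xi_j$) leaves the kernel value unchanged.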

Note

In the centered finite difference, the time values are shifted such that no samples are taken from the future. For example, the scheme $x_{t+1} - x_{t-1}$ evaluated at time $t$ is assigned the new time value $t+1$. See also TSCAccessor.time_derivative().
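The time shift in the note can be illustrated with the plain scheme $x_{t+1} - x_{t-1}$ on a uniformly sampled series. The helper `shifted_central_difference` is hypothetical and is not the code behind TSCAccessor.time_derivative(); it omits the division by $2\Delta t$ so that it matches the scheme exactly as written.

```python
import numpy as np


def shifted_central_difference(time, values):
    """Apply the scheme x_{t+1} - x_{t-1} and stamp each result with time t+1.

    Hypothetical helper: the result centered at time t is assigned the time of
    the newest sample it uses, so no value depends on samples after its stamp.
    """
    time = np.asarray(time)
    values = np.asarray(values, dtype=float)
    diffs = values[2:] - values[:-2]  # x_{t+1} - x_{t-1} for t = time[1], ..., time[-2]
    new_time = time[2:]  # shifted time value t+1
    return new_time, diffs


time = np.arange(6)  # 0, 1, ..., 5
values = time.astype(float) ** 2
new_time, diffs = shifted_central_difference(time, values)
# new_time == [2, 3, 4, 5], diffs == [4, 8, 12, 16]
```

The first two time values receive no result under this convention, which is the warm-up effect that grows with the stencil width.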

Parameters:

zeta (float) – A scalar in $[0, 1)$ that controls the angular influence. The weight from one point to a neighboring point is increased if the relative displacement vector is aligned with the dynamical flow. The special case zeta=0 corresponds to the so-called “Non-Linear Laplacian Spectral Analysis” kernel (NLSA).
epsilon (float) – An additional scaling parameter with which the kernel scale can be adapted to the actual time sampling frequency.
fd_accuracy (int) – The accuracy of the centered finite difference scheme ($p$ in the description). Note that the higher the order, the more samples are required in a warm-up phase in which the centered scheme cannot be evaluated with the given accuracy. All samples from this warm-up phase are dropped in the kernel evaluation.
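A small sketch makes the effect of zeta concrete. The helpers `angular_factor` and `kernel_value` are hypothetical and not part of datafold. With the same flow direction at both samples, a displacement aligned with the flow gives the angular factor $1 - \zeta$, while an orthogonal displacement of equal length gives $1$. Because the factor multiplies the exponent, aligned neighbors receive larger kernel values, and zeta=0 removes the angular dependence (the NLSA case).

```python
import numpy as np


def angular_factor(xi_i, xi_j, omega, zeta):
    """Return [(1 - zeta cos^2 theta_i)(1 - zeta cos^2 theta_j)]^0.5."""
    norm_omega = np.linalg.norm(omega)
    cos_i = xi_i @ omega / (np.linalg.norm(xi_i) * norm_omega)
    cos_j = xi_j @ omega / (np.linalg.norm(xi_j) * norm_omega)
    return float(np.sqrt((1 - zeta * cos_i**2) * (1 - zeta * cos_j**2)))


def kernel_value(xi_i, xi_j, omega, zeta, dt=0.1, epsilon=1.0):
    """Kernel value exp(-scaled squared distance * angular factor)."""
    scaled = omega @ omega / (epsilon * dt**2 * np.linalg.norm(xi_i) * np.linalg.norm(xi_j))
    return float(np.exp(-scaled * angular_factor(xi_i, xi_j, omega, zeta)))


flow = np.array([1.0, 0.0])    # identical flow direction at both samples
along = np.array([0.1, 0.0])   # displacement aligned with the flow
across = np.array([0.0, 0.1])  # displacement of equal length, orthogonal to the flow

for zeta in (0.0, 0.5, 0.9):
    k_along = kernel_value(flow, flow, along, zeta)
    k_across = kernel_value(flow, flow, across, zeta)
    print(f"zeta={zeta}: along={k_along:.3f}, across={k_across:.3f}")
# zeta=0.0: along=0.368, across=0.368
# zeta=0.5: along=0.607, across=0.368
# zeta=0.9: along=0.905, across=0.368
```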

Variables:

timederiv_X (np.ndarray) – The time derivative from the finite difference scheme for the reference data X. Required for the component-wise evaluation and only available after a pairwise evaluation of the kernel.
norm_timederiv_X (np.ndarray) – Norm of the time derivative for the reference data X. Required for the component-wise evaluation and only available after a pairwise evaluation of the kernel.

References

Giannakis [2015] (the equations are taken from the arXiv version)

Methods Summary

__call__(X[, Y])

Compute kernel matrix.

Methods Documentation

__call__(X, Y=None, **kernel_kwargs)

Compute kernel matrix.

Parameters:

X (DataFrame) – The reference time series collection of shape (n_samples_X, n_features_X).
Y (Optional[DataFrame]) – The query time series collection of shape (n_samples_Y, n_features_Y). If Y is not provided, then Y=X.
**kernel_kwargs – None

Returns:

The kernel matrix with time information.

Return type:

TSCDataFrame
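For intuition about the full kernel matrix, here is a vectorized NumPy sketch for a reference set X and an optional query set Y that defaults to X. The function `cone_kernel_matrix` is hypothetical and is not datafold's `__call__`: it works on plain arrays instead of TSCDataFrame, takes precomputed time derivatives as input (similar in spirit to the cached timederiv_X and norm_timederiv_X), and returns a matrix whose rows index the query samples, which is a choice made for this sketch.

```python
import numpy as np


def cone_kernel_matrix(X, xi_X, dt, Y=None, xi_Y=None, epsilon=1.0, zeta=0.0):
    """Dense cone-kernel matrix with shape (n_samples_Y, n_samples_X).

    X and Y are (n_samples, n_features) arrays; xi_X and xi_Y hold the
    matching time-derivative estimates. If Y is None, then Y = X.
    """
    if Y is None:
        Y, xi_Y = X, xi_X
    omega = Y[:, None, :] - X[None, :, :]  # omega[a, b] = y_a - x_b
    norm_omega = np.linalg.norm(omega, axis=-1)
    norm_xi_X = np.linalg.norm(xi_X, axis=1)  # computed once per reference sample
    norm_xi_Y = np.linalg.norm(xi_Y, axis=1)
    # Coinciding samples have omega = 0; use a dummy norm so the cosines are 0, not NaN.
    safe_norm = np.where(norm_omega > 0.0, norm_omega, 1.0)
    cos_x = np.einsum("abf,bf->ab", omega, xi_X) / (norm_xi_X[None, :] * safe_norm)
    cos_y = np.einsum("abf,af->ab", omega, xi_Y) / (norm_xi_Y[:, None] * safe_norm)
    angular = np.sqrt((1.0 - zeta * cos_x**2) * (1.0 - zeta * cos_y**2))
    scaled = norm_omega**2 / (epsilon * dt**2 * np.outer(norm_xi_Y, norm_xi_X))
    return np.exp(-scaled * angular)


# Usage: samples on the unit circle, with the analytic derivative standing in
# for a finite-difference estimate.
dt = 0.5
t = dt * np.arange(12)
X = np.column_stack([np.cos(t), np.sin(t)])
xi = np.column_stack([-np.sin(t), np.cos(t)])

K = cone_kernel_matrix(X, xi, dt, zeta=0.5)
K_query = cone_kernel_matrix(X, xi, dt, Y=X[:3], xi_Y=xi[:3], zeta=0.5)
```

Passing a subset of the samples as Y returns the corresponding rows of the full matrix, and the diagonal of the Y = X case equals one because $\omega_{ii} = 0$.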