datafold.pcfold Package

The lowest level of datafold provides data structures, objects directly associated with data (e.g., kernels), and fundamental algorithms on data (e.g., distance matrix computation and eigensolvers). datafold provides two data structures:

  • PCManifold for point cloud data under the manifold assumption. The data structure is derived from numpy.ndarray and attaches a kernel that describes the local proximity between points. All kernels implemented in datafold share the base class PCManifoldKernel. The data structure encapsulates the complexity of recurring routines in kernel methods; for example, it computes sparse or dense kernel matrices for different distance metrics, and eigenpairs with different backends.

  • TSCDataFrame is a collection of time series data. The data structure's base class is pandas.DataFrame, and a single object can index multiple time series together with the time values of potentially multi-dimensional time series. The data structure is mainly required for system identification models. Because the time series in a collection can have different properties, e.g., different lengths or time values, many methods are available to validate a model's assumptions about the data.
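
The layout described for TSCDataFrame can be sketched in plain pandas. This is not the TSCDataFrame API: stack_time_series and time_delta_per_series are hypothetical helpers, and the row index level names ID and time are assumptions chosen for illustration. The sketch shows two series of different lengths and sampling intervals stored in one frame, plus one example of a property check (a constant time step per series).

```python
import numpy as np
import pandas as pd


def stack_time_series(series, feature_names):
    """Stack time series into one frame indexed by an (ID, time) MultiIndex.

    ``series`` maps a time series ID to a pair ``(time_values, values)``,
    where ``values`` has shape (n_time_values, n_features).
    """
    frames = []
    for ts_id, (time_values, values) in series.items():
        index = pd.MultiIndex.from_arrays(
            [np.full(len(time_values), ts_id), np.asarray(time_values)],
            names=["ID", "time"],
        )
        frames.append(pd.DataFrame(values, index=index, columns=feature_names))
    return pd.concat(frames)


def time_delta_per_series(frame):
    """Return each series' sampling interval, or NaN if it is not constant."""
    deltas = {}
    for ts_id, group in frame.groupby(level="ID"):
        diffs = np.diff(group.index.get_level_values("time").to_numpy())
        constant = len(diffs) > 0 and np.allclose(diffs, diffs[0])
        deltas[ts_id] = diffs[0] if constant else np.nan
    return pd.Series(deltas, name="delta_time")


# Two series with different lengths and different sampling intervals.
frame = stack_time_series(
    {
        0: (np.array([0.0, 0.1, 0.2, 0.3]), np.arange(8.0).reshape(4, 2)),
        1: (np.array([0.0, 0.5, 1.0]), np.arange(6.0).reshape(3, 2)),
    },
    feature_names=["x1", "x2"],
)
print(frame)
print(time_delta_per_series(frame))
```

Keeping the series ID as the outer index level means that selecting or grouping by ID returns one complete time series, which is what model validation typically operates on.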

There are other low-level machine learning algorithms and tasks directly connected to the data structures. To estimate the scale of a GaussianKernel, the algorithms in PCManifold.optimize_parameters() are suitable. For TSCDataFrame, these tasks include splitting time series into training and test sets (TSCKfoldSeries or TSCKFoldTime) and measuring error metrics between predicted and true time series (TSCMetric).
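
The two time series tasks in the paragraph above can be sketched with plain NumPy. This is not the TSCKfoldSeries or TSCMetric implementation: kfold_series_ids and rmse_per_series are hypothetical helpers, series are held in a dict of arrays instead of a TSCDataFrame, and the root-mean-squared error stands in for whichever metric a TSCMetric would be configured with.

```python
import numpy as np


def kfold_series_ids(series_ids, n_splits, seed=None):
    """Yield (train_ids, test_ids) pairs over whole time series.

    Every series ID appears in exactly one test fold, and a series is never
    divided between the training and the test set.
    """
    ids = np.asarray(series_ids)
    if seed is not None:
        ids = np.random.default_rng(seed).permutation(ids)
    for test_ids in np.array_split(ids, n_splits):
        yield np.setdiff1d(ids, test_ids), test_ids


def rmse_per_series(y_true, y_pred):
    """Root-mean-squared error per series.

    Both arguments map a series ID to an array of shape (n_time_values, n_features).
    """
    return {
        ts_id: float(np.sqrt(np.mean((y_true[ts_id] - y_pred[ts_id]) ** 2)))
        for ts_id in y_true
    }


for train_ids, test_ids in kfold_series_ids(range(5), n_splits=3):
    print("train:", train_ids, "test:", test_ids)

errors = rmse_per_series(
    {0: np.zeros((3, 1)), 1: np.zeros((2, 1))},
    {0: np.ones((3, 1)), 1: np.full((2, 1), 2.0)},
)
print(errors)
```

Splitting over whole series keeps the correlated samples of one trajectory on one side of the split. Splitting on time values, the idea behind TSCKFoldTime, instead holds out time intervals across the collection.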

Functions

allocate_time_series_tensor(n_time_series, ...)

Allocate a time series tensor that complies with TSCDataFrame.from_tensor().

estimate_cutoff(pcm[, n_subsample, k, ...])

Estimates a good choice of cut-off for a Gaussian radial basis kernel, given a certain tolerance below which the kernel values are considered zero.

estimate_scale(pcm[, tol, cut_off])

Estimates the Gaussian kernel scale (epsilon) for a Gaussian kernel, given a certain tolerance below which the kernel values are considered zero.

pcm_remove_outlier(pcm, kmin, cut_off)

Remove all points that do not have a minimum number of neighbors inside the distance range.

pcm_subsample(pcm[, n_samples, ...])

Subsample a manifold point cloud with a uniform sample density.
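
The relation between a cut-off, a tolerance, and the Gaussian scale, and the neighbor-count criterion for outliers, can be sketched as follows. This is an illustration, not the code of estimate_scale or pcm_remove_outlier: scale_from_cutoff and remove_outliers are hypothetical helpers, and the kernel is assumed to be written as exp(-d²/ε). If datafold uses a different convention (for example, an extra factor of 1/2 in the exponent), the formula for ε changes accordingly.

```python
import numpy as np


def scale_from_cutoff(cut_off, tol=1e-8):
    """Return epsilon such that exp(-cut_off**2 / epsilon) == tol.

    Assumes the Gaussian kernel is written as exp(-d**2 / epsilon); then every
    pair of points farther apart than cut_off has a kernel value below tol.
    """
    if not 0.0 < tol < 1.0:
        raise ValueError("tol must lie in the open interval (0, 1)")
    return cut_off**2 / -np.log(tol)


def remove_outliers(X, kmin, cut_off):
    """Keep only points with at least kmin other points within distance cut_off."""
    X = np.asarray(X, dtype=float)
    # Dense pairwise Euclidean distances: O(n^2) memory, small point clouds only.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Count neighbors inside the cut-off, excluding the point itself.
    n_neighbors = np.count_nonzero(dist <= cut_off, axis=1) - 1
    mask = n_neighbors >= kmin
    return X[mask], mask


# Three clustered points and one isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
X_clean, kept = remove_outliers(X, kmin=2, cut_off=0.5)
epsilon = scale_from_cutoff(cut_off=0.5, tol=1e-8)
print(kept, epsilon)
```

Solving tol = exp(-c²/ε) for ε gives ε = c² / (-ln tol), so a smaller tolerance yields a smaller scale for the same cut-off.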

Classes

ConeKernel([zeta, epsilon, fd_accuracy, ...])

Compute a dynamically adapted cone kernel on time series collection data.

ContinuousNNKernel(k_neighbor, delta[, distance])

Compute the continuous k nearest-neighbor adjacency graph.

CubicKernel([distance])

Cubic radial basis kernel.

DmapKernelFixed(internal_kernel, *[, ...])

Diffusion map kernel with fixed kernel bandwidth.

GaussianKernel([epsilon, distance])

Gaussian radial basis kernel.

InitialCondition()

Helper functions to create and validate initial conditions for time series predictions.

InverseMultiquadricKernel([epsilon, distance])

Inverse multiquadric radial basis kernel.

InverseQuadraticKernel([epsilon])

Inverse quadratic radial basis kernel.

MultiquadricKernel([epsilon, distance])

Multiquadric radial basis kernel.

PCManifold(data[, kernel, dist_kwargs])

Represent a point cloud lying near a manifold with a kernel.

PCManifoldKernel([is_symmetric, ...])

Abstract base class for kernels evaluated on static point clouds or time series.

QuinticKernel([distance])

Quintic radial basis kernel.

TSCDataFrame(*args[, fixed_delta, validate])

Data frame to store time series collections.

TSCKFoldTime([n_splits])

K-fold splits on time values.

TSCKfoldSeries([n_splits, shuffle, random_state])

K-fold splits on entire time series.

TSCMetric(metric, mode[, scaling])

Compute metrics for time series collection data.

TSCScoring(tsc_metric[, greater_is_better])

Create scoring function from TSCMetric.

TSCWindowFoldTime(test_window_length[, ...])

Assign windows of test samples starting from the end of the time series collection.
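
A Gaussian kernel matrix with a cut-off, followed by a diffusion map eigendecomposition, can be sketched in dense NumPy. This is not the implementation of GaussianKernel or DmapKernelFixed: gaussian_kernel_matrix and diffusion_map are hypothetical helpers, the kernel is assumed to be exp(-d²/ε), and a dense symmetric eigensolver replaces the sparse matrices and configurable backends that PCManifold offers. The normalization follows the standard diffusion maps construction with a density parameter alpha.

```python
import numpy as np


def gaussian_kernel_matrix(X, epsilon, cut_off=None):
    """Dense Gaussian kernel exp(-d**2 / epsilon); entries beyond cut_off set to 0."""
    X = np.asarray(X, dtype=float)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dist / epsilon)
    if cut_off is not None:
        K[sq_dist > cut_off**2] = 0.0
    return K


def diffusion_map(K, n_eigenpairs, alpha=1.0):
    """Largest eigenpairs of the diffusion map Markov matrix built from K.

    alpha controls the density normalization (alpha=1 factors out the
    sampling density of the points).
    """
    d = K.sum(axis=1)
    K_alpha = K / np.outer(d**alpha, d**alpha)
    d_alpha = K_alpha.sum(axis=1)
    # S = D^-1/2 K_alpha D^-1/2 is symmetric and similar to the row-stochastic
    # matrix P = D^-1 K_alpha, so a symmetric eigensolver can be used.
    sqrt_d = np.sqrt(d_alpha)
    S = K_alpha / np.outer(sqrt_d, sqrt_d)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:n_eigenpairs]
    # Map eigenvectors of S back to right eigenvectors of P.
    psi = eigvecs[:, order] / sqrt_d[:, None]
    return eigvals[order], psi


# Points sampled on the unit circle.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t)])
K = gaussian_kernel_matrix(X, epsilon=0.1, cut_off=1.0)
eigvals, psi = diffusion_map(K, n_eigenpairs=3)
print(eigvals)
```

The leading eigenpair of a Markov matrix is trivial (eigenvalue 1 with a constant eigenvector), so the geometry is typically read from the following eigenvectors. Zeroing entries beyond the cut-off is what makes a sparse representation worthwhile for large point clouds.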

Class Inheritance Diagram

Inheritance diagram of datafold.pcfold.kernels.ConeKernel, datafold.pcfold.kernels.ContinuousNNKernel, datafold.pcfold.kernels.CubicKernel, datafold.pcfold.kernels.DmapKernelFixed, datafold.pcfold.kernels.GaussianKernel, datafold.pcfold.timeseries.collection.InitialCondition, datafold.pcfold.kernels.InverseMultiquadricKernel, datafold.pcfold.kernels.InverseQuadraticKernel, datafold.pcfold.kernels.MultiquadricKernel, datafold.pcfold.pointcloud.PCManifold, datafold.pcfold.kernels.PCManifoldKernel, datafold.pcfold.kernels.QuinticKernel, datafold.pcfold.timeseries.collection.TSCDataFrame, datafold.pcfold.timeseries.metric.TSCKFoldTime, datafold.pcfold.timeseries.metric.TSCKfoldSeries, datafold.pcfold.timeseries.metric.TSCMetric, datafold.pcfold.timeseries.metric.TSCScoring, datafold.pcfold.timeseries.metric.TSCWindowFoldTime