TSCWindowFoldTime

class datafold.pcfold.TSCWindowFoldTime(test_window_length, window_offset=0, train_min_timesteps=None)

Bases: TSCCrossValidationSplit

Assign windows of test samples starting from the end of the time series collection.

This method is useful for time series collections with gaps (time intervals with no data). Specifically, the time values of the time series should not overlap and should be ordered in time (e.g., time series ID 0 should not start after ID 1).

The windows are set with these rules:

  • The iteration is in reverse order, i.e., the first test window is placed at the last time samples of the last ID. The benefit is that when the number of splits is reduced, an optimization procedure still uses the latest time samples. This window has the most predictive power because it contains the most recent samples.

  • The window is always of the same size and only placed within a time series (i.e., no overlapping). If the next window cannot be placed within a time series, then the remaining samples are not included in any test set.

  • If a time series has fewer samples than test_window_length, then this time series is not considered for testing.
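The rules above can be illustrated with a minimal pure-Python sketch. It is not datafold's implementation: the helper name `place_test_windows` and the representation of the collection as a list of per-ID sample counts are assumptions made for illustration, and the sketch assumes that window placement restarts at the end of each time series.

```python
def place_test_windows(series_lengths, test_window_length, window_offset=0):
    """Return (series_id, start, stop) test windows, latest window first.

    Hypothetical helper: `start` and `stop` are positions within one
    time series, and `stop` is exclusive.
    """
    windows = []
    # Rule 1: visit the time series in reverse order, last ID first.
    for series_id in reversed(range(len(series_lengths))):
        stop = series_lengths[series_id]
        # Rules 2 and 3: place a window only while a full window fits
        # inside this time series; shorter series are skipped entirely.
        while stop - test_window_length >= 0:
            start = stop - test_window_length
            windows.append((series_id, start, stop))
            # Leave `window_offset` samples between consecutive windows.
            stop = start - window_offset
    return windows
```

For example, `place_test_windows([10, 3, 7], test_window_length=4, window_offset=1)` returns `[(2, 3, 7), (0, 6, 10), (0, 1, 5)]`: the series with ID 1 is too short for a window, and the leading samples of the series with IDs 0 and 2 end up in no test set. In this sketch, the length of the returned list plays the role of the number of splits.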

Parameters:
  • test_window_length (int) – The length of a window for samples included in testing.

  • window_offset (int) – The offset to the next possible test window. In a single long time series, the offset equals the gap between windows.

  • train_min_timesteps (Optional[int]) – The minimum number of time steps required for training. If a time series has fewer samples than the required minimum (e.g., because the time series is split by a test window), then this time series is dropped.
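The effect of train_min_timesteps can be sketched as follows for a single time series and a single test window. This is an illustration only, not datafold's code: the helper name `training_segments` is hypothetical, and the sketch assumes that the samples before and after the test window both remain available for training and that `None` means every non-empty segment is kept.

```python
def training_segments(n_samples, test_start, test_stop, train_min_timesteps=None):
    """Split one time series around a test window into training segments.

    Hypothetical helper: returns (kept, dropped), two lists of
    (start, stop) position pairs with an exclusive `stop`.
    """
    # Cutting out the test window leaves one block before and one after it.
    candidates = [(0, test_start), (test_stop, n_samples)]
    # Assumption: without a minimum, any non-empty segment is kept.
    minimum = 1 if train_min_timesteps is None else train_min_timesteps
    kept, dropped = [], []
    for start, stop in candidates:
        length = stop - start
        if length == 0:
            continue  # nothing left on this side of the test window
        if length >= minimum:
            kept.append((start, stop))
        else:
            dropped.append((start, stop))
    return kept, dropped
```

With ten samples, a test window covering positions 4 to 7 and `train_min_timesteps=3`, the call `training_segments(10, 4, 8, 3)` keeps the segment `(0, 4)` and drops the two trailing samples in `(8, 10)`.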

Methods Summary

get_n_splits([X, y, groups])

Number of splits.

plot_splits(X[, test_set])

Plot the test, training and dropped samples.

split(X[, y, groups])

Yield windows of indices for training and testing for a time series collection with non-overlapping time series.

Methods Documentation

get_n_splits(X=None, y=None, groups=None)

Number of splits.

Parameters:
  • X (Optional[TSCDataFrame]) – The time series data to split. Although the default is None, this parameter is mandatory to compute the number of splits.

  • y – ignored

  • groups – ignored

Return type:

int

plot_splits(X, test_set=None)

Plot the test, training and dropped samples.

Parameters:
  • X (TSCDataFrame) – The time series data to split.

  • test_set – A completely separate test set. If provided, the test windows produced by TSCWindowFoldTime are labeled as validation sets.

Return type:

None

split(X, y=None, groups=None)

Yield windows of indices for training and testing for a time series collection with non-overlapping time series.

Parameters:
  • X (TSCDataFrame) – The data to split.

  • y – ignored

  • groups – ignored

Yields:
  • numpy.ndarray – train indices

  • numpy.ndarray – test indices