LocalRegressionSelection

class datafold.dynfold.LocalRegressionSelection(*, eps_med_scale=3, n_subsample=inf, strategy='dim', intrinsic_dim=2, regress_threshold=0.9, bandwidth_type='median', random_state=None)

Bases: BaseEstimator, TSCTransformerMixin

Automatic selection of functionally independent geometric harmonic vectors for a parsimonious data manifold embedding.

To measure functional dependency, a local linear regression is performed between eigenvectors: the larger the residual of an eigenvector, the more information it adds, and the more likely it is to be included in the embedding.

The kernel used for the local linear regression has a scale of

scale = bandwidth_type(distances) / eps_med_scale

In the referenced paper this is described on page 6, Eq. 11.
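As a rough illustration of the idea above, the following Python sketch computes the kernel scale from the median pairwise distance and uses it in a leave-one-out local linear regression with a single predictor. It is not datafold's implementation: the Gaussian kernel form, the normalization of the residual, and the helper names kernel_scale and local_linear_residual are assumptions made for this example.

```python
import math
from statistics import median


def kernel_scale(values, eps_med_scale=3.0):
    """scale = median(pairwise distances) / eps_med_scale for a 1-D predictor."""
    dists = [abs(a - b) for i, a in enumerate(values) for b in values[i + 1:]]
    return median(dists) / eps_med_scale


def local_linear_residual(predictor, response, eps_med_scale=3.0):
    """Leave-one-out local linear regression of `response` on `predictor`.

    Returns a normalized residual: small if `response` is (locally) a function
    of `predictor`, close to one or above if it carries new information.
    """
    scale = kernel_scale(predictor, eps_med_scale)
    n = len(predictor)
    squared_error = 0.0
    for i in range(n):
        sw = sx = sy = sxx = sxy = 0.0
        for j in range(n):
            if j == i:  # leave the query point out of its own fit
                continue
            x = predictor[j] - predictor[i]
            w = math.exp(-(x * x) / (scale * scale))  # assumed Gaussian kernel
            y = response[j]
            sw += w
            sx += w * x
            sy += w * y
            sxx += w * x * x
            sxy += w * x * y
        denom = sw * sxx - sx * sx
        if denom > 1e-12:
            # Closed-form weighted least squares; the intercept is the
            # prediction at the query point because x is centered there.
            slope = (sw * sxy - sx * sy) / denom
            prediction = (sy - slope * sx) / sw
        else:
            prediction = sy / sw  # degenerate case: weighted mean
        squared_error += (response[i] - prediction) ** 2
    return math.sqrt(squared_error / sum(v * v for v in response))


# Toy data: cosine harmonics sampled on an 8 x 8 grid of the unit square.
grid = [(k + 0.5) / 8 for k in range(8)]
points = [(x, y) for x in grid for y in grid]
phi_1 = [math.cos(math.pi * x) for x, _ in points]
phi_2 = [math.cos(2 * math.pi * x) for x, _ in points]  # function of phi_1
phi_3 = [math.cos(math.pi * y) for _, y in points]      # new direction

res_dependent = local_linear_residual(phi_1, phi_2)
res_independent = local_linear_residual(phi_1, phi_3)
```

Here phi_2 is a function of phi_1 and yields a small residual, while phi_3 varies along a different coordinate and yields a residual near one, so it would be the better candidate for an embedding coordinate.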

Parameters:
  • eps_med_scale – Divisor applied to the bandwidth statistic of the distances to obtain the kernel scale of the local linear regression.

  • n_subsample – Number of uniformly randomly selected samples used to reduce the computational cost of the linear regressions. Smaller values speed up the selection at the cost of accuracy. The minimum value is 100 samples.

  • strategy – Selection strategy:

    • “dim” - select a fixed number of eigenvectors, given by the expected dimension

    • “threshold” - select all eigenvectors whose residual is above the threshold (variable number of eigenvectors)

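  • intrinsic_dim – Number of eigenvectors with the largest residuals to select, if strategy=“dim”.

  • regress_threshold – Residual threshold; eigenvectors whose local regression residual is above it are selected, if strategy=“threshold”.

  • bandwidth_type – Statistic of the distances used to compute the kernel scale, either “median” or “mean”.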

  • random_state (Optional[int]) – Seed for the random generator; only used if the data is subsampled.

Variables:
  • evec_indices – Indices of the selected eigenvectors.

  • residuals – Residuals of the local regression for the eigenvectors.
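The two strategies can be sketched in plain Python as follows. The helper select_indices is hypothetical, and both the strict comparison for “threshold” and the ascending order of the returned indices are assumptions rather than documented behavior.

```python
def select_indices(residuals, strategy="dim", intrinsic_dim=2, regress_threshold=0.9):
    """Pick eigenvector indices from per-eigenvector residuals (illustrative only)."""
    if strategy == "dim":
        # Rank by residual, largest first, and keep a fixed number of indices.
        ranked = sorted(range(len(residuals)), key=lambda i: residuals[i], reverse=True)
        return sorted(ranked[:intrinsic_dim])
    if strategy == "threshold":
        # Keep every index whose residual is above the threshold.
        return [i for i, r in enumerate(residuals) if r > regress_threshold]
    raise ValueError(f"unknown strategy: {strategy!r}")


residuals = [1.0, 0.95, 0.07, 0.6, 0.03]  # made-up values for illustration

by_dim = select_indices(residuals, strategy="dim", intrinsic_dim=2)
# -> [0, 1]
by_threshold = select_indices(residuals, strategy="threshold", regress_threshold=0.5)
# -> [0, 1, 3]
```

With “dim” the size of the result is fixed in advance, while with “threshold” it depends on how many residuals clear the threshold.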

References

[Dsilva et al., 2018]

Methods Summary

fit(X[, y])

Select indices according to strategy.

get_feature_names_out([input_features])

inverse_transform(X)

n/a.

transform(X)

Select parsimonious representation of full set of eigenvectors.

Methods Documentation

fit(X, y=None, **fit_params)

Select indices according to strategy.

Parameters:
  • X (Union[TSCDataFrame, ndarray]) – Eigenvectors of shape (n_samples, n_eigenvectors) to make selection on.

  • y (None) – ignored

  • **fit_params (Dict[str, object]) – None

Returns:

self

Return type:

LocalRegressionSelection

get_feature_names_out(input_features=None)

inverse_transform(X)

n/a.

Warning

Not implemented.

transform(X)

Select parsimonious representation of full set of eigenvectors.

Parameters:

X (Union[TSCDataFrame, ndarray]) – Eigenvectors of shape (n_samples, n_eigenvectors) to carry out selection.

Returns:

Selected eigenvectors, same type as X, of shape (n_samples, n_evec_indices)

Return type:

TSCDataFrame, pandas.DataFrame, numpy.ndarray
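Conceptually, the transform step amounts to keeping only the selected eigenvector columns. The sketch below shows this with plain Python lists; the helper select_columns and the example values are hypothetical and do not reflect how datafold handles TSCDataFrame or ndarray inputs internally.

```python
def select_columns(eigenvectors, evec_indices):
    """Keep only the columns listed in `evec_indices` (rows are samples)."""
    return [[row[i] for i in evec_indices] for row in eigenvectors]


# 3 samples x 4 eigenvectors (made-up values)
X = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
evec_indices = [0, 2]  # e.g. the result of a previous selection step

X_selected = select_columns(X, evec_indices)
shape = (len(X_selected), len(X_selected[0]))  # (n_samples, n_evec_indices)
```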