LocalRegressionSelection
- class datafold.dynfold.LocalRegressionSelection(*, eps_med_scale=3, n_subsample=inf, strategy='dim', intrinsic_dim=2, regress_threshold=0.9, bandwidth_type='median', random_state=None)
Bases: BaseEstimator, TSCTransformerMixin
Automatic selection of functionally independent geometric harmonic vectors for a parsimonious data manifold embedding.
To measure functional dependency, a local linear regression is performed: the larger the regression residual of an eigenvector, the more information it adds, and the more likely it is to be included in the embedding.
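The idea can be illustrated with a minimal NumPy sketch. It is not datafold's implementation: the function name `local_linear_residual`, the Gaussian kernel form, the leave-one-out weighting, the normalization, and the fixed `scale` value are all assumptions made for illustration.

```python
import numpy as np


def local_linear_residual(predictors, target, scale):
    """Normalized leave-one-out residual of a kernel-weighted local linear fit.

    Hypothetical helper for illustration only (not part of datafold).
    predictors: array of shape (n_samples, n_predictors)
    target:     array of shape (n_samples,)
    scale:      kernel scale (assumed Gaussian kernel exp(-d^2 / scale^2))
    """
    predictors = np.asarray(predictors, dtype=float)
    target = np.asarray(target, dtype=float)
    n_samples = predictors.shape[0]

    # Squared Euclidean distances between all pairs of predictor rows.
    diffs = predictors[:, None, :] - predictors[None, :, :]
    sq_dists = np.sum(diffs**2, axis=-1)
    weights = np.exp(-sq_dists / scale**2)
    np.fill_diagonal(weights, 0.0)  # leave the point itself out of its own fit

    predictions = np.empty(n_samples)
    for i in range(n_samples):
        # Local linear model centered at sample i: the intercept is the prediction.
        design = np.hstack([np.ones((n_samples, 1)), predictors - predictors[i]])
        sqrt_w = np.sqrt(weights[i])
        coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], target * sqrt_w, rcond=None)
        predictions[i] = coef[0]

    return np.sqrt(np.sum((target - predictions) ** 2) / np.sum(target**2))


rng = np.random.default_rng(0)
theta = rng.uniform(0.0, np.pi, size=200)
phi_1 = np.cos(theta)[:, None]                      # first harmonic
phi_2 = np.cos(2 * theta)                           # a function of phi_1
phi_3 = np.cos(rng.uniform(0.0, np.pi, size=200))   # unrelated to phi_1

residual_dependent = local_linear_residual(phi_1, phi_2, scale=0.2)
residual_independent = local_linear_residual(phi_1, phi_3, scale=0.2)
print(residual_dependent, residual_independent)
```

In this toy setup the dependent vector `phi_2` is predicted well from `phi_1` and yields a small residual, while the unrelated vector `phi_3` yields a residual close to one and would be the better candidate for an embedding coordinate.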
The kernel used for the local linear regression has a scale of
scale = bandwidth_type(distances) / eps_med_scale
In the referenced paper this is described on page 6, Eq. 11.
…
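The scale formula above can be written out as a short NumPy sketch. The helper name `kernel_scale`, the use of Euclidean pairwise distances, and the exclusion of zero self-distances are assumptions for illustration; the library may compute the distances differently.

```python
import numpy as np


def kernel_scale(X, eps_med_scale=3, bandwidth_type="median"):
    """scale = bandwidth_type(distances) / eps_med_scale (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt(np.sum(diffs**2, axis=-1))
    # Assumption: use each pair once and skip the zero self-distances.
    pairwise = dists[np.triu_indices_from(dists, k=1)]
    reducer = {"median": np.median, "mean": np.mean}[bandwidth_type]
    return reducer(pairwise) / eps_med_scale


X = np.array([[0.0], [1.0], [4.0]])  # pairwise distances: 1, 4, 3
scale_median = kernel_scale(X)                        # median 3 divided by 3
scale_mean = kernel_scale(X, bandwidth_type="mean")   # mean 8/3 divided by 3
print(scale_median, scale_mean)
```

A larger `eps_med_scale` gives a smaller kernel scale, so each local regression relies on a tighter neighborhood.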
- Parameters:
eps_med_scale – Epsilon scale in kernel of the local linear regression.
n_subsample – Number of samples, drawn uniformly at random, used to reduce the computational cost of the local linear regressions. Lower values shorten the run time of the selection at the cost of accuracy. The minimum value is 100 samples.
strategy –
"dim" - select a fixed number of eigenvectors, given by intrinsic_dim
"threshold" - select all eigenvectors whose residual is above regress_threshold (variable number of eigenvectors)
intrinsic_dim – Number of eigenvectors with the largest residuals to select, if strategy="dim".
regress_threshold – Threshold on the local regression residual; eigenvectors with a residual above it are included, if strategy="threshold".
bandwidth_type – "median" or "mean"; the statistic of the distances used in the kernel scale.
random_state (Optional[int]) – Seed for the random generator if the data is subsampled.
- Variables:
evec_indices –
residuals –
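The two selection strategies described under Parameters can be sketched as follows. This is an illustrative reading of the parameter descriptions, not the library's code: the helper name `select_indices` and the example residual values are hypothetical.

```python
import numpy as np


def select_indices(residuals, strategy="dim", intrinsic_dim=2, regress_threshold=0.9):
    """Pick eigenvector indices from residuals (hypothetical helper)."""
    residuals = np.asarray(residuals, dtype=float)
    if strategy == "dim":
        # Indices of the intrinsic_dim largest residuals, in ascending index order.
        largest = np.argsort(residuals)[::-1][:intrinsic_dim]
        return np.sort(largest)
    if strategy == "threshold":
        # All indices whose residual is strictly above the threshold.
        return np.flatnonzero(residuals > regress_threshold)
    raise ValueError(f"unknown strategy: {strategy!r}")


residuals = np.array([1.0, 0.95, 0.2, 0.93, 0.1])  # made-up values
idx_dim = select_indices(residuals, strategy="dim", intrinsic_dim=2)
idx_threshold = select_indices(residuals, strategy="threshold", regress_threshold=0.9)

# Selecting columns of an eigenvector matrix with the chosen indices.
eigenvectors = np.arange(15.0).reshape(3, 5)  # shape (n_samples, n_eigenvectors)
selected = eigenvectors[:, idx_threshold]
print(idx_dim, idx_threshold, selected.shape)
```

With "dim" the number of selected eigenvectors is fixed in advance, whereas with "threshold" it depends on how many residuals exceed the threshold.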
References
Methods Summary
fit(X[, y]) – Select indices according to strategy.
get_feature_names_out([input_features]) – n/a.
transform(X) – Select parsimonious representation of full set of eigenvectors.
Methods Documentation
- fit(X, y=None, **fit_params)
Select indices according to strategy.
- Parameters:
X (Union[TSCDataFrame, ndarray]) – Eigenvectors of shape (n_samples, n_eigenvectors) to make the selection on.
y (None) – ignored
- Returns:
self
- Return type:
- transform(X)
Select parsimonious representation of full set of eigenvectors.
- Parameters:
X (Union[TSCDataFrame, ndarray]) – Eigenvectors of shape (n_samples, n_eigenvectors) to carry out the selection on.
- Returns:
same type as X of shape (n_samples, n_evec_indices)
- Return type: