scikit-learn Transformers
tsfresh includes three scikit-learn compatible transformers, which allow you to easily incorporate feature extraction and feature selection from time series into your existing machine learning pipelines.
A scikit-learn pipeline lets you assemble several pre-processing steps that are executed in sequence and can therefore be cross-validated together while setting different parameters (for more details about scikit-learn's pipeline, take a look at the official documentation [1]). The tsfresh transformers allow you to extract and filter time series features during this pre-processing sequence.
The first two estimators in tsfresh are the FeatureAugmenter, which extracts the features, and the FeatureSelector, which performs the feature selection.
It is preferable to combine the extraction and filtering of the features in a single step to avoid unnecessary feature calculations. Hence, the third estimator, the RelevantFeatureAugmenter, performs both the extraction and the filtering of the features in one step.
Example
The following example combines tsfresh's RelevantFeatureAugmenter and a RandomForestClassifier into a single pipeline. This pipeline can then fit both the transformer and the classifier in one step.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from tsfresh.examples import load_robot_execution_failures
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures
from tsfresh.transformers import RelevantFeatureAugmenter

# Download the dataset, then load the time series and the target vector
download_robot_execution_failures()
df_ts, y = load_robot_execution_failures()

pipeline = Pipeline([
    ('augmenter', RelevantFeatureAugmenter(column_id='id', column_sort='time')),
    ('classifier', RandomForestClassifier()),
])

# X only carries the ids as its index; the features are added by the augmenter
X = pd.DataFrame(index=y.index)

# The time series container cannot be passed through fit(), so set it on the step
pipeline.set_params(augmenter__timeseries_container=df_ts)
pipeline.fit(X, y)
The parameters of the RelevantFeatureAugmenter correspond to the parameters of the top-level convenience function extract_relevant_features(). In the above example, we only set the names of two columns, column_id='id' and column_sort='time' (see Data Formats for more details on those parameters).
Because we cannot pass the time series container directly as a parameter to the augmenter step when calling fit or transform on a sklearn.pipeline.Pipeline, we have to set it manually by calling pipeline.set_params(augmenter__timeseries_container=df_ts).
In general, you can change the time series container from which the features are extracted by calling either the pipeline's set_params() method or the transformer's set_timeseries_container() method.
For further examples, see the Jupyter notebook 02 sklearn Pipeline.ipynb in the notebooks folder of the tsfresh GitHub repository.