tsfresh includes three scikit-learn compatible transformers, which allow you to easily incorporate feature extraction and feature selection from time series into your existing machine learning pipelines.
The scikit-learn pipeline allows you to assemble several pre-processing steps that are executed in sequence and can therefore be cross-validated together while setting different parameters (for more details about scikit-learn’s pipeline, take a look at the official documentation). Our tsfresh transformers allow you to extract and filter the time series features during this pre-processing sequence.
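To make "cross-validated together while setting different parameters" concrete, here is a minimal sketch that uses only scikit-learn. The step names `select` and `classifier`, the synthetic dataset and the parameter grid are illustrative assumptions, not part of tsfresh.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data, only for illustration
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)

# Two steps executed in sequence: a pre-processing step and a final estimator
pipeline = Pipeline([
    ("select", SelectKBest(k=5)),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Parameters of individual steps are addressed as <step name>__<parameter>
param_grid = {
    "select__k": [2, 5],
    "classifier__C": [0.1, 1.0],
}

# Every parameter combination is cross-validated on the whole pipeline,
# so the pre-processing is re-fitted on each training fold
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
```

The same `<step name>__<parameter>` naming is what lets a pipeline forward settings to a tsfresh transformer used as one of its steps.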
The first two estimators in tsfresh are the FeatureAugmenter, which extracts the features, and the FeatureSelector, which performs the feature selection algorithm.
It is preferable to combine extracting and filtering of the features in a single step to avoid unnecessary feature calculations. Hence, the RelevantFeatureAugmenter combines both the extraction and filtering of the features in a single step.
In the following example you see how we combine tsfresh’s
RelevantFeatureAugmenter and a
RandomForestClassifier into a single pipeline. This pipeline can then fit both our
transformer and the classifier in one step.
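from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from tsfresh.examples import load_robot_execution_failures
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures
from tsfresh.transformers import RelevantFeatureAugmenter
import pandas as pd

# Download dataset
download_robot_execution_failures()

pipeline = Pipeline([
    ('augmenter', RelevantFeatureAugmenter(column_id='id', column_sort='time')),
    ('classifier', RandomForestClassifier()),
])

df_ts, y = load_robot_execution_failures()
# X has no feature columns yet, only the ids as its index
X = pd.DataFrame(index=y.index)

# Hand the time series container to the augmenter step, then fit both steps
pipeline.set_params(augmenter__timeseries_container=df_ts)
pipeline.fit(X, y)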
The parameters of the RelevantFeatureAugmenter correspond to the parameters of the top-level convenience function extract_relevant_features().
In the above example, we only set the names of two columns, column_id='id' and column_sort='time'
(see Data Formats for more details on those parameters).
Because we cannot pass the time series container directly as a parameter to the augmenter step when calling fit or transform on a sklearn.pipeline.Pipeline, we have to set it manually by calling pipeline.set_params(augmenter__timeseries_container=df_ts).
In general, you can change the time series container from which the features are extracted by calling either the pipeline’s set_params() method or the transformer’s set_timeseries_container() method.
For further examples, see the Jupyter Notebook 02 sklearn Pipeline.ipynb in the notebooks folder of the tsfresh GitHub repository.