scikit-learn Transformers

tsfresh includes three scikit-learn compatible transformers, which allow you to easily incorporate feature extraction and feature selection from time series into your existing machine learning pipelines.

The scikit-learn pipeline allows you to assemble several pre-processing steps that will be executed in sequence and thus, can be cross-validated together while setting different parameters (for more details about the scikit-learn’s pipeline, take a look at the official documentation [1]). Our tsfresh transformers allow you to extract and filter the time series features during these pre-processing sequence.

The first two estimators in tsfresh are the FeatureAugmenter, which extracts the features, and the FeatureSelector, which performs the feature selection algorithm. It is preferable to combine extracting and filtering of the features in a single step to avoid unnecessary feature calculations. Hence, the RelevantFeatureAugmenter combines both the extraction and filtering of the features in a single step.

Example

In the following example you see how we combine tsfresh’s RelevantFeatureAugmenter and a RandomForestClassifier into a single pipeline. This pipeline can then fit both our transformer and the classifier in one step.

from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from tsfresh.examples import load_robot_execution_failures
from tsfresh.transformers import RelevantFeatureAugmenter
import pandas as pd

# Download dataset
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures
download_robot_execution_failures()

pipeline = Pipeline([
            ('augmenter', RelevantFeatureAugmenter(column_id='id', column_sort='time')),
            ('classifier', RandomForestClassifier()),
            ])

df_ts, y = load_robot_execution_failures()
X = pd.DataFrame(index=y.index)

pipeline.set_params(augmenter__timeseries_container=df_ts)
pipeline.fit(X, y)

The parameters of the RelevantFeatureAugmenter correspond to the parameters of the top-level convenience function extract_relevant_features(). In the above example, we only set the names of two columns column_id='id', column_sort='time' (see Data Formats for more details on those parameters).

Because we cannot pass the time series container directly as a parameter to the augmenter step when calling fit or transform on a sklearn.pipeline.Pipeline, we have to set it manually by calling pipeline.set_params(augmenter__timeseries_container=df_ts). In general, you can change the time series container from which the features are extracted by calling either the pipeline’s set_params() method or the transformers set_timeseries_container() method.

For further examples, visit the Jupyter Notebook 02 sklearn Pipeline.ipynb in the notebooks folder of the tsfresh github repository.

References