tsfresh includes three scikit-learn-compatible transformers, which you can easily add to an existing data science pipeline. If you are not familiar with scikit-learn’s pipeline, we recommend taking a look at the official scikit-learn documentation.
The purpose of such a pipeline is to assemble several preprocessing steps that can be cross-validated together while setting different parameters. The tsfresh transformers let you extract and filter time series features as steps of such a preprocessing sequence.
The first two estimators contained in tsfresh are an extraction transformer, which extracts the features, and the FeatureSelector, which only performs the feature selection algorithm. It is preferable to combine extraction and filtering to avoid unnecessary feature extraction. Hence, tsfresh also provides the RelevantFeatureAugmenter, which performs both the extraction and the filtering of the features in a single step.
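Why combining the two steps saves work can be sketched in plain Python. The idea: during fit, every candidate feature is extracted once and the relevant ones are selected; during transform, only the selected features are computed for new series. Everything below is hypothetical and written only for illustration: the class ToyRelevantAugmenter, its four candidate features, and especially the selection rule (keep a feature if its per-class means differ by more than a threshold). tsfresh’s own selection is based on statistical hypothesis tests and its feature set is much larger, so treat this as a sketch of the concept, not of tsfresh’s implementation.

```python
from statistics import mean

# Hypothetical candidate feature calculators: name -> function of a series.
# tsfresh ships a far larger set; these four only illustrate the idea.
CANDIDATES = {
    "mean": mean,
    "maximum": max,
    "minimum": min,
    "length": len,
}


class ToyRelevantAugmenter:
    """Toy sketch of 'extract everything in fit, compute only relevant
    features in transform'. Not tsfresh's implementation."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.timeseries_container = None  # dict: series id -> list of values
        self.selected_ = []
        self.computed_ = []  # (id, feature) pairs computed by the last call

    def set_timeseries_container(self, container):
        self.timeseries_container = container
        return self

    def _extract(self, ids, names):
        features = {}
        for ts_id in ids:
            values = self.timeseries_container[ts_id]
            features[ts_id] = {}
            for name in names:
                self.computed_.append((ts_id, name))
                features[ts_id][name] = CANDIDATES[name](values)
        return features

    def fit(self, ids, y):
        # 1. Extract every candidate feature for the training series.
        self.computed_ = []
        features = self._extract(ids, CANDIDATES)
        # 2. Keep a feature only if its per-class means differ by more than
        #    `threshold` (a toy rule; tsfresh uses hypothesis tests instead).
        self.selected_ = []
        for name in CANDIDATES:
            by_class = {}
            for ts_id in ids:
                by_class.setdefault(y[ts_id], []).append(features[ts_id][name])
            class_means = [mean(v) for v in by_class.values()]
            if max(class_means) - min(class_means) > self.threshold:
                self.selected_.append(name)
        return self

    def transform(self, ids):
        # Only the features selected in fit() are computed here.
        self.computed_ = []
        return self._extract(ids, self.selected_)


container = {1: [0, 1, 2], 2: [0, 2, 4], 3: [5, 6, 7], 4: [5, 7, 9]}
labels = {1: 0, 2: 0, 3: 1, 4: 1}

aug = ToyRelevantAugmenter(threshold=1.0).set_timeseries_container(container)
aug.fit([1, 2, 3, 4], labels)
print(aug.selected_)  # ['mean', 'maximum', 'minimum']; 'length' is uninformative here
features = aug.transform([1, 3])
print(len(aug.computed_))  # 6: three features for two series, 'length' is skipped
```

With a large candidate set and only a few relevant features, the savings at transform time grow accordingly; a separate extractor followed by a separate selector would have to compute every candidate again for each new batch.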
In the following example, we combine tsfresh’s RelevantFeatureAugmenter and a RandomForestClassifier into a single pipeline, which can then fit both the transformer and the classifier in one step.
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from tsfresh.examples import load_robot_execution_failures
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures
from tsfresh.transformers import RelevantFeatureAugmenter
import pandas as pd

# Download the dataset
download_robot_execution_failures()

pipeline = Pipeline([
    ('augmenter', RelevantFeatureAugmenter(column_id='id', column_sort='time')),
    ('classifier', RandomForestClassifier()),
])

df_ts, y = load_robot_execution_failures()
# X starts empty, with one row per time series id; the augmenter adds the features
X = pd.DataFrame(index=y.index)

# The time series container cannot be passed through fit(), so set it explicitly
pipeline.set_params(augmenter__timeseries_container=df_ts)
pipeline.fit(X, y)
The parameters of the augmenter transformer correspond to the parameters of tsfresh’s top-level convenience function for extracting relevant features. In the example, we only set the names of two columns, column_id='id' and column_sort='time' (see Data Formats for an explanation of those parameters).
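The two column names describe a flat, long-format table: column_id says which rows belong to the same time series, column_sort gives the order of those rows, and every remaining column is treated as a separate kind of time series. The pure-Python helper group_time_series below is hypothetical, and the rows and the column names F_x and F_y are made-up sample data; it only illustrates the grouping these parameters imply, not how tsfresh implements it.

```python
from collections import defaultdict


def group_time_series(rows, column_id, column_sort):
    """Hypothetical helper: split flat, long-format rows into one ordered
    series per (id, kind), treating every column other than column_id and
    column_sort as a separate kind of time series."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[column_id]].append(row)

    result = {}
    for ts_id, id_rows in grouped.items():
        id_rows.sort(key=lambda r: r[column_sort])  # order rows by time stamp
        kinds = [k for k in id_rows[0] if k not in (column_id, column_sort)]
        result[ts_id] = {kind: [r[kind] for r in id_rows] for kind in kinds}
    return result


# Rows arrive unordered; column_sort restores the time order within each id.
rows = [
    {"id": 1, "time": 1, "F_x": -1, "F_y": 2},
    {"id": 1, "time": 0, "F_x": 0, "F_y": 1},
    {"id": 2, "time": 0, "F_x": 3, "F_y": 0},
    {"id": 2, "time": 1, "F_x": 4, "F_y": -2},
]
series = group_time_series(rows, column_id="id", column_sort="time")
print(series)
# {1: {'F_x': [0, -1], 'F_y': [1, 2]}, 2: {'F_x': [3, 4], 'F_y': [0, -2]}}
```

Features are then computed per id and per kind, which is why the resulting feature matrix has one row per id, matching the index of X in the example.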
Because we cannot pass the time series container directly as a parameter to the augmenter step when calling fit or transform on a sklearn.pipeline.Pipeline, we have to set it manually by calling pipeline.set_params(augmenter__timeseries_container=df_ts). In general, you can change the time series container from which the features are extracted either by calling the pipeline’s set_params() method or by using the transformer’s own setter method for the container.
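The double-underscore key augmenter__timeseries_container follows scikit-learn’s convention for nested parameters: the part before the first __ names the pipeline step, and the rest is forwarded to that step’s set_params(). In scikit-learn, this makes the call equivalent to setting timeseries_container on pipeline.named_steps['augmenter']. The classes ToyPipeline and ToyStep below are hypothetical stand-ins that reproduce only this routing; they are not scikit-learn code, and unlike the real Pipeline the toy rejects keys without a step prefix.

```python
class ToyStep:
    """Hypothetical step that accepts arbitrary parameters as attributes."""

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class ToyPipeline:
    """Hypothetical stand-in for sklearn.pipeline.Pipeline that only
    reproduces how nested 'step__param' keys are routed."""

    def __init__(self, steps):
        self.named_steps = dict(steps)

    def set_params(self, **params):
        for key, value in params.items():
            # 'augmenter__timeseries_container' -> ('augmenter', 'timeseries_container')
            step_name, sep, param = key.partition("__")
            if not sep or step_name not in self.named_steps:
                # This toy only supports nested keys that name an existing step.
                raise ValueError(f"Invalid parameter {key!r}")
            self.named_steps[step_name].set_params(**{param: value})
        return self


augmenter, classifier = ToyStep(), ToyStep()
pipe = ToyPipeline([("augmenter", augmenter), ("classifier", classifier)])

df_ts = {"id": [1, 1], "time": [0, 1], "F_x": [0.0, -1.0]}  # stand-in for the real container
pipe.set_params(augmenter__timeseries_container=df_ts)
print(augmenter.timeseries_container is df_ts)      # True
print(hasattr(classifier, "timeseries_container"))  # False
```

Only the named step receives the container; the classifier is untouched, which is why the step name in the key must match the name given to the augmenter when the pipeline was built.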
For further examples, see the Jupyter Notebook pipeline_example.ipynb in the notebooks folder of the tsfresh package.