tsfresh.convenience package

Submodules

tsfresh.convenience.relevant_extraction module

tsfresh.convenience.relevant_extraction.extract_relevant_features(timeseries_container, y, X=None, default_fc_parameters=None, kind_to_fc_parameters=None, column_id=None, column_sort=None, column_kind=None, column_value=None, parallelization=None, show_warnings=False, disable_progressbar=False, profile=False, profiling_filename='profile.txt', profiling_sorting='cumulative', test_for_binary_target_binary_feature='fisher', test_for_binary_target_real_feature='mann', test_for_real_target_binary_feature='mann', test_for_real_target_real_feature='kendall', fdr_level=0.05, hypotheses_independent=False, n_processes=2, chunksize=None)

High-level convenience function that extracts time series features from timeseries_container and returns the feature matrix X, possibly augmented with the features that are relevant with respect to the target vector y.

For more details see the documentation of extract_features() and select_features().

Examples

>>> from tsfresh.examples import load_robot_execution_failures
>>> from tsfresh import extract_relevant_features
>>> df, y = load_robot_execution_failures()
>>> X = extract_relevant_features(df, y, column_id='id', column_sort='time')
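The call above identifies each time series through an id column and orders it through a sort column. As a rough sketch of how a container that also uses a kind column and a value column might be laid out, the following builds a small stacked DataFrame. The column names 'id', 'time', 'kind' and 'value', the sensor names and all numbers are made-up assumptions for illustration and are not taken from the robot execution failures data. The sketch also assumes that y is indexed by the same ids that appear in the id column.

```python
import pandas as pd

# Hypothetical toy data: two entities (ids 1 and 2), two kinds of
# measurement per entity, three time steps per series.
rows = []
for entity_id in (1, 2):
    for kind in ("temperature", "pressure"):
        for t in range(3):
            rows.append(
                {"id": entity_id, "time": t, "kind": kind, "value": float(entity_id * 10 + t)}
            )

timeseries_container = pd.DataFrame(rows)

# One target value per entity, indexed by the ids used in the id column
# (an assumption of this sketch).
y = pd.Series([0, 1], index=[1, 2])

# Each (id, kind) pair forms one time series in this layout.
n_series = timeseries_container.groupby(["id", "kind"]).ngroups
```

With a layout like this, the corresponding arguments would be column_id='id', column_sort='time', column_kind='kind' and column_value='value'.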
Parameters:
  • timeseries_container – The pandas.DataFrame with the time series to compute the features for, or a dictionary of pandas.DataFrames. See extract_features().
  • X (pandas.DataFrame) – A DataFrame containing additional features
  • y (pandas.Series) – The target vector
  • default_fc_parameters (dict) – Mapping from feature calculator names to parameters. Only those names which are keys in this dict will be calculated. See the class ComprehensiveFCParameters for more information.
  • kind_to_fc_parameters (dict) – Mapping from kind names to objects of the same type as the ones for default_fc_parameters. If you put a kind as a key here, the fc_parameters object stored as its value will be used for that kind instead of default_fc_parameters.
  • column_id (str) – The name of the id column to group by.
  • column_sort (str) – The name of the sort column.
  • column_kind (str) – The name of the column keeping record on the kind of the value.
  • column_value (str) – The name for the column keeping the value itself.
  • parallelization (str) – Either 'per_sample' or 'per_kind'; see _extract_features_parallel_per_sample(), _extract_features_parallel_per_kind() and Parallelization for details. If None, the algorithm chooses a parallelization technique itself, based on general heuristics.
  • chunksize (None or int) – The size of one chunk for the parallelization.
  • n_processes (int) – The number of processes to use for parallelization.
  • disable_progressbar (bool) – Do not show a progressbar while doing the calculation.
  • profile (bool) – Turn on profiling during feature extraction
  • profiling_sorting (basestring) – How to sort the profiling results (see the documentation of the profiling package for more information)
  • profiling_filename (basestring) – Where to save the profiling results.
  • test_for_binary_target_binary_feature (str) – Which test to be used for binary target, binary feature (currently unused)
  • test_for_binary_target_real_feature (str) – Which test to be used for binary target, real feature
  • test_for_real_target_binary_feature (str) – Which test to be used for real target, binary feature (currently unused)
  • test_for_real_target_real_feature (str) – Which test to be used for real target, real feature (currently unused)
  • fdr_level (float) – The FDR level that should be respected. This is the theoretical expected percentage of irrelevant features among all created features.
  • hypotheses_independent (bool) – Can the significance of the features be assumed to be independent? Normally, this should be set to False, as the features are never independent (e.g. mean and median).
  • write_selection_report (bool) – Whether to store the selection report after the Benjamini-Hochberg procedure has finished.
  • result_dir (str) – Where to store the selection report
  • show_warnings (bool) – Show warnings during the feature extraction (needed for debugging of calculators).
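The relationship between default_fc_parameters and kind_to_fc_parameters described above can be sketched in plain Python. The dictionaries below assume the usual settings layout, in which a calculator name maps either to None (no parameters) or to a list of parameter dicts. The chosen calculator names, the parameter values and the kind name 'temperature' are illustrative assumptions. The helper settings_for_kind is hypothetical and only mirrors the documented override rule; it is not part of tsfresh.

```python
# Illustrative settings (assumed values, not library defaults).
default_fc_parameters = {
    "mean": None,  # calculator that takes no parameters
    "large_standard_deviation": [{"r": 0.05}, {"r": 0.1}],  # one feature per parameter set
}

# Settings that apply only to the (hypothetical) kind "temperature".
kind_to_fc_parameters = {
    "temperature": {"maximum": None, "minimum": None},
}


def settings_for_kind(kind, default_fc_parameters, kind_to_fc_parameters):
    """Hypothetical helper: return the kind-specific settings if the kind
    is a key in kind_to_fc_parameters, otherwise the default settings."""
    return kind_to_fc_parameters.get(kind, default_fc_parameters)


temperature_settings = settings_for_kind(
    "temperature", default_fc_parameters, kind_to_fc_parameters
)
pressure_settings = settings_for_kind(
    "pressure", default_fc_parameters, kind_to_fc_parameters
)
```

In this sketch only 'maximum' and 'minimum' would be computed for the 'temperature' kind, while every other kind falls back to the defaults.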

Returns:

Feature matrix X, possibly extended with relevant time series features.
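Which features count as relevant is governed by fdr_level and hypotheses_independent. The following is a minimal, self-contained sketch of a Benjamini-Hochberg style selection over a list of p-values, not the library's own implementation. It assumes that the dependent case (hypotheses_independent=False) is handled with the Benjamini-Yekutieli correction, which divides the thresholds by a harmonic sum. The function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, fdr_level=0.05, hypotheses_independent=False):
    """Return the sorted indices of the p-values that are rejected
    (i.e. the features kept as relevant) at the given FDR level."""
    m = len(p_values)
    if m == 0:
        return []

    # Assumed correction for dependent hypotheses (Benjamini-Yekutieli):
    # c(m) = sum_{i=1..m} 1/i. For independent hypotheses c(m) = 1.
    c = 1.0 if hypotheses_independent else sum(1.0 / i for i in range(1, m + 1))

    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value is below its threshold
    # k / (m * c) * fdr_level; all hypotheses up to that rank are rejected.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / (m * c) * fdr_level:
            k_max = rank

    return sorted(order[:k_max])


# Made-up p-values for four hypothetical features.
p_values = [0.001, 0.02, 0.03, 0.5]
kept_if_independent = benjamini_hochberg(p_values, hypotheses_independent=True)
kept_if_dependent = benjamini_hochberg(p_values, hypotheses_independent=False)
```

The stricter thresholds in the dependent case keep fewer features for the same p-values, which is why the choice of hypotheses_independent changes the size of the returned feature matrix.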

Module contents

The convenience submodule contains methods that allow the user to extract and filter features conveniently.