tsfresh.convenience package

Submodules

tsfresh.convenience.relevant_extraction module

tsfresh.convenience.relevant_extraction.extract_relevant_features(timeseries_container, y, X=None, default_fc_parameters=None, kind_to_fc_parameters=None, column_id=None, column_sort=None, column_kind=None, column_value=None, show_warnings=False, disable_progressbar=False, profile=False, profiling_filename='profile.txt', profiling_sorting='cumulative', test_for_binary_target_binary_feature='fisher', test_for_binary_target_real_feature='mann', test_for_real_target_binary_feature='mann', test_for_real_target_real_feature='kendall', fdr_level=0.05, hypotheses_independent=False, n_jobs=2, chunksize=None, ml_task='auto')

High-level convenience function that extracts time series features from timeseries_container and returns the feature matrix X, possibly augmented with the features that are relevant with respect to the target vector y.

For more details see the documentation of extract_features() and select_features().
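The relationship to extract_features() and select_features() can be sketched as a two-step pipeline. This is a rough approximation under stated assumptions, not the function's actual implementation: the helper name extract_then_select is hypothetical, the imputation step is assumed to be needed because the selection step does not accept NaN values, and the handling of the optional X argument is left out.

```python
def extract_then_select(timeseries_container, y, column_id, column_sort=None):
    """Hypothetical helper: extract all features, then keep the relevant ones."""
    # Imports are local so the sketch can be defined without tsfresh installed.
    from tsfresh import extract_features, select_features
    from tsfresh.utilities.dataframe_functions import impute

    # Step 1: compute the full feature matrix, one row per id.
    features = extract_features(
        timeseries_container, column_id=column_id, column_sort=column_sort
    )
    # Step 2: replace NaN and infinite values so the significance tests can run.
    features = impute(features)
    # Step 3: keep only the features that are relevant for the target y.
    return select_features(features, y)
```

With the robot execution failures data, a call would look like extract_then_select(df, y, column_id='id', column_sort='time'), where y is indexed by the ids of the time series.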

Examples

>>> from tsfresh.examples import load_robot_execution_failures
>>> from tsfresh import extract_relevant_features
>>> df, y = load_robot_execution_failures()
>>> X = extract_relevant_features(df, y, column_id='id', column_sort='time')
Parameters:
  • timeseries_container – The pandas.DataFrame with the time series to compute the features for, or a dictionary of pandas.DataFrames. See extract_features().
  • X (pandas.DataFrame) – A DataFrame containing additional features
  • y (pandas.Series) – The target vector
  • default_fc_parameters (dict) – Mapping from feature calculator names to parameters. Only the names that are keys in this dict are calculated. See the ComprehensiveFCParameters class for more information.
  • kind_to_fc_parameters (dict) – Mapping from kind names to objects of the same type as default_fc_parameters. If a kind appears as a key here, its value is used for that kind instead of default_fc_parameters.
  • column_id (str) – The name of the id column to group by.
  • column_sort (str) – The name of the sort column.
  • column_kind (str) – The name of the column keeping record on the kind of the value.
  • column_value (str) – The name for the column keeping the value itself.
  • chunksize (None or int) – The size of one chunk for the parallelisation
  • n_jobs (int) – The number of processes to use for parallelization. If zero, no parallelization is used.
  • disable_progressbar (bool) – Do not show a progressbar while doing the calculation.
  • profile (bool) – Turn on profiling during feature extraction
  • profiling_sorting (basestring) – How to sort the profiling results (see the documentation of the profiling package for more information)
  • profiling_filename (basestring) – Where to save the profiling results.
  • test_for_binary_target_binary_feature (str) – Which test to be used for binary target, binary feature (currently unused)
  • test_for_binary_target_real_feature (str) – Which test to be used for binary target, real feature
  • test_for_real_target_binary_feature (str) – Which test to be used for real target, binary feature (currently unused)
  • test_for_real_target_real_feature (str) – Which test to be used for real target, real feature (currently unused)
  • fdr_level (float) – The FDR level that should be respected. This is the theoretical expected percentage of irrelevant features among all features selected as relevant.
  • hypotheses_independent (bool) – Can the significance of the features be assumed to be independent? Normally, this should be set to False as the features are never independent (e.g. mean and median)
  • ml_task (str) – The intended machine learning task. Either ‘classification’, ‘regression’ or ‘auto’. Defaults to ‘auto’, meaning the intended task is inferred from y. If y has a boolean, integer or object dtype, the task is assumed to be classification, else regression.
  • show_warnings (bool) – Show warnings during the feature extraction (needed for debugging of calculators).
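
The shape of default_fc_parameters and kind_to_fc_parameters can be illustrated with plain dictionaries. The sketch below is only an illustration: the kind name "temperature", the chosen calculator names and the helper settings_for_kind are assumptions made for the example, and the helper merely mirrors the override rule described above rather than any library code.

```python
# Calculator name -> None (no parameters) or a list of parameter dicts.
# The calculator names used here are assumed to exist in tsfresh.
default_fc_parameters = {
    "mean": None,
    "quantile": [{"q": 0.1}, {"q": 0.9}],
}

# Kind name -> settings of the same shape; "temperature" is a made-up kind.
kind_to_fc_parameters = {
    "temperature": {"maximum": None, "minimum": None},
}


def settings_for_kind(kind):
    """Hypothetical helper: per-kind settings win, otherwise use the default."""
    return kind_to_fc_parameters.get(kind, default_fc_parameters)
```

Under this rule, the "temperature" kind gets only its maximum and minimum computed, while any kind without an entry falls back to the default settings.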

Returns:

Feature matrix X, possibly extended with relevant time series features.
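
To make the effect of fdr_level and hypotheses_independent more concrete, here is a minimal sketch of a step-up FDR procedure. It assumes the selection step behaves like the Benjamini-Hochberg rule for independent hypotheses and like the more conservative Benjamini-Yekutieli rule otherwise; the function fdr_select, the feature names and the p-values are all hypothetical and not taken from the library.

```python
def fdr_select(p_values, fdr_level=0.05, hypotheses_independent=False):
    """Return the feature names kept by a step-up FDR procedure.

    p_values maps feature names to p-values (illustrative input only).
    """
    m = len(p_values)
    if m == 0:
        return []
    # Benjamini-Hochberg uses no penalty; Benjamini-Yekutieli divides the
    # thresholds by the harmonic sum 1 + 1/2 + ... + 1/m for dependent tests.
    if hypotheses_independent:
        penalty = 1.0
    else:
        penalty = sum(1.0 / i for i in range(1, m + 1))
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        # Remember the largest rank whose p-value is below its threshold.
        if p <= rank * fdr_level / (m * penalty):
            cutoff = rank
    return [name for name, _ in ranked[:cutoff]]


# Made-up p-values for four features.
example = {"x__mean": 0.001, "x__median": 0.02, "x__variance": 0.03, "x__length": 0.6}
kept_dependent = fdr_select(example, fdr_level=0.05)
kept_independent = fdr_select(example, fdr_level=0.05, hypotheses_independent=True)
```

In this example the default setting keeps only x__mean, while assuming independence keeps x__mean, x__median and x__variance, which shows why hypotheses_independent=False is the more cautious choice.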

Module contents

The convenience submodule contains methods that allow the user to extract and filter features conveniently.