tsfresh.convenience package

Submodules

tsfresh.convenience.relevant_extraction module

tsfresh.convenience.relevant_extraction.extract_relevant_features(timeseries_container, y, X=None, default_fc_parameters=None, kind_to_fc_parameters=None, column_id=None, column_sort=None, column_kind=None, column_value=None, show_warnings=False, disable_progressbar=False, profile=False, profiling_filename='profile.txt', profiling_sorting='cumulative', test_for_binary_target_binary_feature='fisher', test_for_binary_target_real_feature='mann', test_for_real_target_binary_feature='mann', test_for_real_target_real_feature='kendall', fdr_level=0.05, hypotheses_independent=False, n_jobs=2, chunksize=None, ml_task='auto')

High-level convenience function that extracts time series features from timeseries_container and then returns the feature matrix X, possibly augmented with the features that are relevant with respect to the target vector y.

For more details see the documentation of extract_features() and select_features().

Examples

>>> from tsfresh.examples import load_robot_execution_failures
>>> from tsfresh import extract_relevant_features
>>> df, y = load_robot_execution_failures()
>>> X = extract_relevant_features(df, y, column_id='id', column_sort='time')
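
The call above computes the default feature set for every kind of time series. The following is a minimal sketch of how default_fc_parameters and kind_to_fc_parameters might be combined to restrict that set. The kind name "F_x" is an assumption about the column names of the example data, the helper functions fc_parameters_for and extract_with_settings are hypothetical names introduced only for illustration, and the fallback logic follows the parameter descriptions on this page rather than the library source.

```python
# Feature calculators applied to every kind without its own entry.
# Keys are calculator names; values are None (no parameters) or a list of
# parameter dicts, one per parameter combination to compute.
default_fc_parameters = {
    "mean": None,
    "maximum": None,
    "quantile": [{"q": 0.1}, {"q": 0.9}],
}

# Per-kind override: for the kind "F_x" (assumed column name) only these
# calculators are used instead of the defaults.
kind_to_fc_parameters = {
    "F_x": {"minimum": None, "number_peaks": [{"n": 3}]},
}


def fc_parameters_for(kind):
    """Hypothetical helper: which settings apply to a kind, as described on this page."""
    return kind_to_fc_parameters.get(kind, default_fc_parameters)


def extract_with_settings(df, y):
    """Hypothetical wrapper that passes both settings objects to tsfresh."""
    # Imported lazily so the settings above can be inspected without tsfresh.
    from tsfresh import extract_relevant_features

    return extract_relevant_features(
        df,
        y,
        column_id="id",
        column_sort="time",
        default_fc_parameters=default_fc_parameters,
        kind_to_fc_parameters=kind_to_fc_parameters,
    )
```

With df and y loaded as in the example above, extract_with_settings(df, y) would be called in place of the plain extract_relevant_features call.
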
Parameters:
  • timeseries_container – The pandas.DataFrame with the time series to compute the features for, or a dictionary of pandas.DataFrames. See extract_features().
  • X (pandas.DataFrame) – A DataFrame containing additional features
  • y (pandas.Series) – The target vector
  • default_fc_parameters (dict) – Mapping from feature calculator names to parameters. Only those names which are keys in this dict will be calculated. See the ComprehensiveFCParameters class for more information.
  • kind_to_fc_parameters (dict) – Mapping from kind names to objects of the same type as the ones for default_fc_parameters. If you put a kind as a key here, the fc_parameters object given as its value will be used for that kind instead of default_fc_parameters.
  • column_id (str) – The name of the id column to group by.
  • column_sort (str) – The name of the sort column.
  • column_kind (str) – The name of the column keeping record on the kind of the value.
  • column_value (str) – The name for the column keeping the value itself.
  • chunksize (None or int) – The size of one chunk that is submitted to a worker process for parallelisation. One chunk element is a single time series for one id and one kind, so a chunksize of 10 means that one task calculates all features for 10 time series. If it is set to None, heuristics that depend on the distributor are used to find a suitable chunksize. If you run into out-of-memory exceptions, you can try the dask distributor with a smaller chunksize.
  • n_jobs (int) – The number of processes to use for parallelization. If zero, no parallelization is used.
  • disable_progressbar (bool) – Do not show a progressbar while doing the calculation.
  • profile (bool) – Turn on profiling during feature extraction.
  • profiling_sorting (basestring) – How to sort the profiling results (see the documentation of the profiling package for more information)
  • profiling_filename (basestring) – Where to save the profiling results.
  • test_for_binary_target_binary_feature (str) – Which test to be used for binary target, binary feature (currently unused)
  • test_for_binary_target_real_feature (str) – Which test to be used for binary target, real feature
  • test_for_real_target_binary_feature (str) – Which test to be used for real target, binary feature (currently unused)
  • test_for_real_target_real_feature (str) – Which test to be used for real target, real feature (currently unused)
  • fdr_level (float) – The FDR level that should be respected. This is the theoretical expected percentage of irrelevant features among all created features.
  • hypotheses_independent (bool) – Can the significance of the features be assumed to be independent? Normally, this should be set to False, as the features are never independent (e.g. mean and median).
  • ml_task (str) – The intended machine learning task. Either ‘classification’, ‘regression’ or ‘auto’. Defaults to ‘auto’, meaning the intended task is inferred from y. If y has a boolean, integer or object dtype, the task is assumed to be classification, else regression.
  • show_warnings (bool) – Show warnings during the feature extraction (needed for debugging of calculators).
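
The chunksize description in the parameter list can be pictured with a small sketch. It only illustrates the idea of grouping (id, kind) time series into tasks. The function name partition and the sample ids and kinds are hypothetical, and the code is not taken from tsfresh's distributor implementation.

```python
from itertools import islice


def partition(series_keys, chunksize):
    """Yield lists of at most `chunksize` (id, kind) keys, one list per task."""
    iterator = iter(series_keys)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


# Three ids with two kinds each give six individual time series.
series_keys = [(series_id, kind) for series_id in (1, 2, 3) for kind in ("F_x", "F_y")]

# With chunksize=4, the first task covers four time series and the second two.
tasks = list(partition(series_keys, chunksize=4))
```

A smaller chunksize produces more, lighter tasks, which is why lowering it can help when a single task runs out of memory.
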

Returns:

Feature matrix X, possibly extended with relevant time series features.
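
Which features count as relevant in the returned matrix depends on fdr_level and hypotheses_independent. The sketch below shows one common way such a decision is made from per-feature p-values: a Benjamini-Hochberg step-up procedure, with the Benjamini-Yekutieli correction when independence is not assumed. That tsfresh uses exactly this procedure is an assumption here, and select_by_fdr is a hypothetical name, not a tsfresh function.

```python
def select_by_fdr(p_values, fdr_level=0.05, hypotheses_independent=False):
    """Return the sorted indices of hypotheses rejected by a step-up FDR procedure."""
    m = len(p_values)
    if m == 0:
        return []
    # Benjamini-Hochberg assumes independence (factor 1). Benjamini-Yekutieli
    # divides the thresholds by the harmonic sum 1 + 1/2 + ... + 1/m instead.
    if hypotheses_independent:
        correction = 1.0
    else:
        correction = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value lies under its threshold.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * fdr_level / (m * correction):
            largest_rank = rank
    # All hypotheses up to that rank are rejected, i.e. the features are kept.
    return sorted(order[:largest_rank])


p_values = [0.001, 0.015, 0.02, 0.5]
kept_if_independent = select_by_fdr(p_values, hypotheses_independent=True)
kept_if_dependent = select_by_fdr(p_values, hypotheses_independent=False)
```

In this sketch the dependent variant is the more conservative one, so it keeps fewer features for the same fdr_level.
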

Module contents

The convenience submodule contains methods that allow the user to extract and filter features conveniently.