tsfresh.convenience package

Submodules

tsfresh.convenience.relevant_extraction module

tsfresh.convenience.relevant_extraction.extract_relevant_features(timeseries_container, y, X=None, default_fc_parameters=None, kind_to_fc_parameters=None, column_id=None, column_sort=None, column_kind=None, column_value=None, parallelization=None, show_warnings=False, disable_progressbar=False, profile=False, profiling_filename='profile.txt', profiling_sorting='cumulative', test_for_binary_target_binary_feature='fisher', test_for_binary_target_real_feature='mann', test_for_real_target_binary_feature='mann', test_for_real_target_real_feature='kendall', fdr_level=0.05, hypotheses_independent=False, n_processes=2, chunksize=None)

High-level convenience function that extracts time series features from timeseries_container and then returns the feature matrix X, possibly augmented with the extracted features that are relevant with respect to the target vector y.

For more details see the documentation of extract_features() and select_features().

Examples

>>> from tsfresh.examples import load_robot_execution_failures
>>> from tsfresh import extract_relevant_features
>>> df, y = load_robot_execution_failures()
>>> X = extract_relevant_features(df, y, column_id='id', column_sort='time')
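The column_id, column_sort, column_kind and column_value arguments describe how the rows of the input are split into individual time series. The following is a minimal pure-Python sketch of that idea, not tsfresh's implementation: the sample rows, the helper names `split_series` and `toy_feature_matrix`, and the two toy features are all hypothetical, and the `kind__feature` column naming is only assumed here for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format rows; the column names are illustrative only.
rows = [
    {"id": 1, "time": 1, "kind": "F_x", "value": 2.0},
    {"id": 1, "time": 0, "kind": "F_x", "value": 1.0},
    {"id": 1, "time": 0, "kind": "F_y", "value": 5.0},
    {"id": 1, "time": 1, "kind": "F_y", "value": 7.0},
    {"id": 2, "time": 0, "kind": "F_x", "value": 4.0},
    {"id": 2, "time": 1, "kind": "F_x", "value": 8.0},
    {"id": 2, "time": 0, "kind": "F_y", "value": 1.0},
    {"id": 2, "time": 1, "kind": "F_y", "value": 3.0},
]


def split_series(rows, column_id, column_sort, column_kind, column_value):
    """Group rows into one time-ordered list of values per (id, kind)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[column_id], row[column_kind])].append(row)
    return {
        key: [r[column_value] for r in sorted(grp, key=lambda r: r[column_sort])]
        for key, grp in groups.items()
    }


def toy_feature_matrix(series):
    """Build one row per id and one column per (kind, feature) pair."""
    matrix = defaultdict(dict)
    for (sample_id, kind), values in series.items():
        matrix[sample_id][f"{kind}__mean"] = mean(values)
        matrix[sample_id][f"{kind}__last"] = values[-1]
    return dict(matrix)


series = split_series(rows, "id", "time", "kind", "value")
features = toy_feature_matrix(series)
```

Each id ends up as one row of the feature matrix, which is why the target vector y is expected to hold one entry per id.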

Parameters:
    timeseries_container – The pandas.DataFrame with the time series to compute the features for, or a dictionary of pandas.DataFrames. See `extract_features()`.
    X (pandas.DataFrame) – A DataFrame containing additional features.
    y (pandas.Series) – The target vector.
    default_fc_parameters (dict) – Mapping from feature calculator names to parameters. Only those names which are keys in this dict will be calculated. See the ComprehensiveFCParameters class for more information.
    kind_to_fc_parameters (dict) – Mapping from kind names to objects of the same type as the ones for default_fc_parameters. If you put a kind as a key here, the fc_parameters object (which is the value) will be used instead of the default_fc_parameters.
    column_id (str) – The name of the id column to group by.
    column_sort (str) – The name of the sort column.
    column_kind (str) – The name of the column keeping record of the kind of the value.
    column_value (str) – The name of the column keeping the value itself.
    parallelization (str) – Either `'per_sample'` or `'per_kind'`; see `_extract_features_parallel_per_sample()`, `_extract_features_parallel_per_kind()` and Parallelization for details. If None, the algorithm chooses a parallelization technique itself based on some general heuristics.
    chunksize (None or int) – The size of one chunk for the parallelization.
    n_processes (int) – The number of processes to use for parallelization.
    show_warnings (bool) – Show warnings during the feature extraction (needed for debugging of calculators).
    disable_progressbar (bool) – Do not show a progress bar while doing the calculation.
    profile (bool) – Turn on profiling during feature extraction.
    profiling_sorting (basestring) – How to sort the profiling results (see the documentation of the profiling package for more information).
    profiling_filename (basestring) – Where to save the profiling results.
    test_for_binary_target_binary_feature (str) – Which test to use for a binary target and a binary feature (currently unused).
    test_for_binary_target_real_feature (str) – Which test to use for a binary target and a real feature.
    test_for_real_target_binary_feature (str) – Which test to use for a real target and a binary feature (currently unused).
    test_for_real_target_real_feature (str) – Which test to use for a real target and a real feature (currently unused).
    fdr_level (float) – The FDR level that should be respected; this is the theoretically expected percentage of irrelevant features among all created features.
    hypotheses_independent (bool) – Can the significance of the features be assumed to be independent? Normally, this should be set to False, as the features are never independent (e.g. mean and median).
    write_selection_report (bool) – Whether to store the selection report after the Benjamini Hochberg procedure has finished.
    result_dir (str) – Where to store the selection report.
Returns:	Feature matrix X, possibly extended with relevant time series features.
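
The fdr_level and hypotheses_independent parameters control the Benjamini Hochberg procedure mentioned in the parameter descriptions. The following is a minimal pure-Python sketch of such a step-up procedure, not tsfresh's own code. The function name `select_by_fdr` and the example p-values are hypothetical, and the sketch assumes that the dependent case (hypotheses_independent=False) is handled with the Benjamini-Yekutieli harmonic-sum correction.

```python
def select_by_fdr(p_values, fdr_level=0.05, hypotheses_independent=False):
    """Return the feature names kept by a step-up FDR procedure.

    p_values maps a feature name to the p-value of its relevance test.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Assumption: under dependence the thresholds are shrunk by the
    # harmonic sum 1 + 1/2 + ... + 1/m (Benjamini-Yekutieli).
    if hypotheses_independent:
        correction = 1.0
    else:
        correction = sum(1.0 / i for i in range(1, m + 1))
    # Rank features from most to least significant.
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    # Find the largest rank k whose p-value lies under k * q / (m * c).
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank * fdr_level / (m * correction):
            cutoff = rank
    # Everything up to that rank is kept as relevant.
    return [name for name, _ in ranked[:cutoff]]


# Hypothetical p-values for four extracted features.
p = {"a": 0.001, "b": 0.011, "c": 0.03, "d": 0.5}
kept_independent = select_by_fdr(p, fdr_level=0.05, hypotheses_independent=True)
kept_dependent = select_by_fdr(p, fdr_level=0.05, hypotheses_independent=False)
```

With these numbers the independent variant keeps a, b and c, while the more conservative dependent variant keeps only a and b. Lowering fdr_level tightens every threshold in the same way.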

Module contents

The convenience submodule contains methods that allow the user to extract and filter features conveniently.