tsfresh.utilities package

Submodules

tsfresh.utilities.dataframe_functions module

Utility functions for converting DataFrames to the internal normalized format (see normalize_input_to_internal_representation) and for handling NaN and inf values in DataFrames.

tsfresh.utilities.dataframe_functions.check_for_nans_in_columns(df, columns=None)

Helper function that checks the DataFrame for NaN values and raises a ValueError if any are found.

Parameters:
  • df (pandas.DataFrame) – the pandas DataFrame to test for NaNs
  • columns (list) – a list of columns to test for NaNs. If None, all columns of the DataFrame are tested.
Returns:

None

Return type:

None

Raises:

ValueError – if NaNs are found in the DataFrame.
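
A minimal sketch of the documented check, assuming pandas and NumPy. It is an illustrative re-implementation, not tsfresh's own code, and the error message is made up for the example.

```python
import numpy as np
import pandas as pd


def check_for_nans_in_columns(df, columns=None):
    """Raise a ValueError if one of `columns` (default: all columns) contains NaN."""
    if columns is None:
        columns = df.columns
    # isnull() flags missing cells; any() collapses each column to a single bool
    nan_columns = [col for col in columns if df[col].isnull().any()]
    if nan_columns:
        raise ValueError("Columns {} of DataFrame must not contain NaN values".format(nan_columns))


df = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, 3.0]})
check_for_nans_in_columns(df, columns=["a"])  # column "a" is clean, so nothing happens
```

Calling check_for_nans_in_columns(df) without a column list would raise, because column "b" contains a NaN.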

tsfresh.utilities.dataframe_functions.get_range_values_per_column(df)

Retrieves the finite max, min and median values per column in the DataFrame df and stores them in three dictionaries. Those dictionaries, col_to_max, col_to_min and col_to_median, map the column name to the maximal, minimal or median value of that column.

If a column does not contain any finite values at all, a 0 is stored instead.

Parameters: df (pandas.DataFrame) – the DataFrame from which to compute the columnwise max, min and median
Returns: dictionaries mapping column names to max, min and median values
Return type: (dict, dict, dict)
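
The following sketch shows one way to compute the three dictionaries, assuming pandas and NumPy. It only illustrates the documented behaviour (finite values only, 0 for columns without any finite value) and is not tsfresh's actual implementation.

```python
import numpy as np
import pandas as pd


def get_range_values_per_column(df):
    """Return (col_to_max, col_to_min, col_to_median) computed over finite values only."""
    col_to_max, col_to_min, col_to_median = {}, {}, {}
    for col in df.columns:
        values = df[col].to_numpy(dtype=np.float64)
        finite = values[np.isfinite(values)]  # drop NaN, -inf and +inf
        if finite.size == 0:
            # no finite value at all: store 0 in every dictionary
            col_to_max[col] = col_to_min[col] = col_to_median[col] = 0.0
        else:
            col_to_max[col] = float(np.max(finite))
            col_to_min[col] = float(np.min(finite))
            col_to_median[col] = float(np.median(finite))
    return col_to_max, col_to_min, col_to_median


df = pd.DataFrame({"a": [1.0, np.inf, 3.0, np.nan, 10.0],
                   "b": [np.nan, -np.inf, np.nan, np.inf, np.nan]})
col_to_max, col_to_min, col_to_median = get_range_values_per_column(df)
# col_to_max == {"a": 10.0, "b": 0.0}, col_to_min == {"a": 1.0, "b": 0.0}
# col_to_median == {"a": 3.0, "b": 0.0}
```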

tsfresh.utilities.dataframe_functions.impute(df_impute)

Columnwise replaces all NaNs and infs in the DataFrame df_impute with median/extreme values from the same columns. This is done as follows: each occurring inf or NaN in df_impute is replaced by

  • -inf -> min
  • +inf -> max
  • NaN -> median

If the column does not contain finite values at all, it is filled with zeros.

This function modifies df_impute in place. After that, df_impute is guaranteed to not contain any non-finite values. Also, all columns will be guaranteed to be of type np.float64.

Parameters: df_impute (pandas.DataFrame) – DataFrame to impute
Returns: df_impute – the imputed DataFrame
Return type: pandas.DataFrame
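
A minimal sketch of the replacement rules above, assuming pandas and NumPy. It mirrors the documented -inf -> min, +inf -> max, NaN -> median mapping with a zero fallback, but it is an illustration rather than tsfresh's code.

```python
import numpy as np
import pandas as pd


def impute(df_impute):
    """Replace -inf/+inf/NaN columnwise by the finite min/max/median, in place."""
    for col in df_impute.columns:
        values = df_impute[col].to_numpy(dtype=np.float64)
        finite = values[np.isfinite(values)]
        if finite.size == 0:
            # column without finite values: everything becomes 0
            col_min = col_max = col_median = 0.0
        else:
            col_min, col_max, col_median = finite.min(), finite.max(), np.median(finite)
        values = np.where(values == -np.inf, col_min, values)
        values = np.where(values == np.inf, col_max, values)
        values = np.where(np.isnan(values), col_median, values)
        df_impute[col] = values  # the float64 array also fixes the column dtype
    return df_impute


df = pd.DataFrame({"a": [1.0, np.inf, -np.inf, np.nan, 3.0], "b": [np.nan] * 5})
impute(df)
# df["a"] is now [1.0, 3.0, 1.0, 2.0, 3.0] and df["b"] is all zeros
```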

tsfresh.utilities.dataframe_functions.impute_dataframe_range(df_impute, col_to_max, col_to_min, col_to_median)

Columnwise replaces all NaNs, -inf and +inf values in the DataFrame df_impute with median/extreme values taken from the provided dictionaries.

This is done as follows: Each occurring inf or NaN in df_impute is replaced by

  • -inf -> by value in col_to_min
  • +inf -> by value in col_to_max
  • NaN -> by value in col_to_median

If a column of df_impute is not found in one of the dictionaries, this method raises a ValueError. A ValueError is also raised if one of the replacement values is not finite.

This function modifies df_impute in place. Afterwards df_impute is guaranteed to not contain any non-finite values. Also, all columns will be guaranteed to be of type np.float64.

Parameters:
  • df_impute (pandas.DataFrame) – DataFrame to impute
  • col_to_max (dict) – Dictionary mapping column names to max values
  • col_to_min (dict) – Dictionary mapping column names to min values
  • col_to_median (dict) – Dictionary mapping column names to median values
Returns: df_impute – the imputed DataFrame
Return type: pandas.DataFrame
Raises:

ValueError – if a column of df_impute is missing in col_to_max, col_to_min or col_to_median, or if a replacement value is non-finite
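
A sketch of the same replacement driven by externally supplied dictionaries, assuming pandas and NumPy. The validation order and the error messages are assumptions made for the illustration; tsfresh's implementation may check things differently.

```python
import numpy as np
import pandas as pd


def impute_dataframe_range(df_impute, col_to_max, col_to_min, col_to_median):
    """Replace -inf/+inf/NaN with per-column values from the given dicts, in place."""
    columns = df_impute.columns
    for mapping in (col_to_max, col_to_min, col_to_median):
        # every column needs an entry in all three dictionaries
        missing = [col for col in columns if col not in mapping]
        if missing:
            raise ValueError("No replacement values for columns {}".format(missing))
        # replacement values themselves must be finite
        if not all(np.isfinite(mapping[col]) for col in columns):
            raise ValueError("Replacement values must be finite")
    for col in columns:
        values = df_impute[col].to_numpy(dtype=np.float64)
        values = np.where(values == -np.inf, col_to_min[col], values)
        values = np.where(values == np.inf, col_to_max[col], values)
        values = np.where(np.isnan(values), col_to_median[col], values)
        df_impute[col] = values
    return df_impute


df = pd.DataFrame({"x": [np.nan, np.inf, -np.inf, 5.0]})
impute_dataframe_range(df, col_to_max={"x": 10.0}, col_to_min={"x": -10.0}, col_to_median={"x": 0.5})
# df["x"] is now [0.5, 10.0, -10.0, 5.0]
```

Separating the dictionaries from the imputation lets the replacement values be computed once, for example on training data, and reused on other data with the same columns.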

tsfresh.utilities.dataframe_functions.impute_dataframe_zero(df_impute)

Replaces all NaNs, -infs and +infs in the DataFrame df_impute with 0s. df_impute is modified in place, and all its columns are converted to dtype np.float64.

Parameters: df_impute (pandas.DataFrame) – DataFrame to impute
Returns: df_impute – the imputed DataFrame
Return type: pandas.DataFrame
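
A minimal sketch of the zero imputation, assuming pandas and NumPy; it is an illustration of the documented behaviour, not the library's code.

```python
import numpy as np
import pandas as pd


def impute_dataframe_zero(df_impute):
    """Replace NaN, -inf and +inf with 0 and cast every column to float64, in place."""
    for col in df_impute.columns:
        values = df_impute[col].to_numpy(dtype=np.float64)
        # keep finite entries, put 0.0 wherever the value is NaN or +/-inf
        df_impute[col] = np.where(np.isfinite(values), values, 0.0)
    return df_impute


df = pd.DataFrame({"a": [1, 2, 3], "b": [np.nan, np.inf, -np.inf]})
impute_dataframe_zero(df)
# df["a"] is now float64 [1.0, 2.0, 3.0] and df["b"] is [0.0, 0.0, 0.0]
```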

tsfresh.utilities.dataframe_functions.make_forecasting_frame(x, kind, max_timeshift, rolling_direction)

Takes a single time series x and constructs a DataFrame df and a target vector y that can be used for a time series forecasting task.

The returned df contains, for every time stamp in x, the last max_timeshift data points as a new time series, so that it can be used to fit a time series forecasting model.

See Time series forecasting for a detailed description of the rolling process and how the feature matrix and target vector are derived.

The returned time series container df contains the rolled time series as a flat DataFrame, the first format described in Data Formats.

When x is a pandas.Series, the index will be used as id.

Parameters:
  • x (np.array or pd.Series) – the single time series
  • kind (str) – the kind of the time series
  • rolling_direction (int) – The sign decides whether to roll backwards (if the sign is positive) or forwards in “time”
  • max_timeshift (int) – If not None, shift only up to max_timeshift. If None, shift as often as possible.
Returns:

time series container df, target vector y

Return type:

(pd.DataFrame, pd.Series)
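
To make the construction concrete, here is a hypothetical, simplified sketch assuming pandas. It only rolls backwards, ignores rolling_direction, and names each window after the time stamp it predicts; the helper name make_forecasting_frame_sketch is made up, and the exact ids and row layout tsfresh produces may differ.

```python
import pandas as pd


def make_forecasting_frame_sketch(x, kind, max_timeshift):
    """Hypothetical simplified version: backward windows only, no rolling_direction.

    For every time stamp t_i (i >= 1) the window holds up to max_timeshift
    values strictly before t_i, and the target is the value of x at t_i.
    """
    x = pd.Series(x)  # a plain array gets the index 0..n-1
    rows, targets = [], {}
    for i in range(1, len(x)):
        start = 0 if max_timeshift is None else max(0, i - max_timeshift)
        window_id = x.index[i]  # the index of the predicted point is used as id
        for j in range(start, i):
            rows.append({"id": window_id, "time": x.index[j], "value": x.iloc[j], "kind": kind})
        targets[window_id] = x.iloc[i]
    df = pd.DataFrame(rows, columns=["id", "time", "value", "kind"])
    y = pd.Series(targets)
    return df, y


df, y = make_forecasting_frame_sketch([10, 11, 12, 13], kind="price", max_timeshift=2)
# window 3 contains the values 11 and 12; y holds the targets 11, 12 and 13
```

Each row of y lines up with one window id in df, so features extracted per id can be used directly as the feature matrix for predicting y.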

tsfresh.utilities.dataframe_functions.restrict_input_to_index(df_or_dict, column_id, index)

Restrict df_or_dict to those ids contained in index.

Parameters:
  • df_or_dict (pandas.DataFrame or dict) – a pandas DataFrame or a dictionary.
  • column_id (basestring) – the name of the id column. It must be present in the pandas DataFrame or in all DataFrames in the dictionary. It is not allowed to have NaN values in this column.
  • index (Iterable or pandas.Series) – Index containing the ids
Returns: df_or_dict_restricted – the restricted df_or_dict
Return type: dict or pandas.DataFrame

Raises:

TypeError – if df_or_dict is not of type dict or pandas.DataFrame
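
A minimal sketch of the restriction, assuming pandas; it relies on Series.isin and illustrates the documented behaviour rather than reproducing tsfresh's code. The error message is made up.

```python
import pandas as pd


def restrict_input_to_index(df_or_dict, column_id, index):
    """Keep only the rows whose column_id value is contained in index."""
    if isinstance(df_or_dict, pd.DataFrame):
        return df_or_dict[df_or_dict[column_id].isin(index)]
    if isinstance(df_or_dict, dict):
        # apply the same filter to the DataFrame of every kind
        return {kind: df[df[column_id].isin(index)] for kind, df in df_or_dict.items()}
    raise TypeError("df_or_dict should be of type dict or pandas.DataFrame")


df = pd.DataFrame({"id": [1, 1, 2, 3], "value": [0.1, 0.2, 0.3, 0.4]})
restricted = restrict_input_to_index(df, "id", [1, 3])
# restricted keeps the rows with id 1 and 3: values 0.1, 0.2 and 0.4
```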

tsfresh.utilities.dataframe_functions.roll_time_series(df_or_dict, column_id, column_sort, column_kind, rolling_direction, max_timeshift=None)

This method creates sub-windows of the time series. It rolls the (sorted) DataFrames for each kind and each id separately in the “time” domain, which is represented by the sort order of the column given by column_sort.

For each rolling step, a new id is created by the scheme “id={id}, shift={shift}”, where id is the former id of the time series and shift is the number of “time” shifts.

A few remarks:

  • This method will create new IDs!
  • The sign of rolling defines the direction of time rolling, a positive value means we are going back in time
  • It is possible to shift time series of different lengths, but we assume that the time series are uniformly sampled
  • For more information, please see Time series forecasting.
Parameters:
  • df_or_dict (pandas.DataFrame or dict) – a pandas DataFrame or a dictionary. The required shape/form of the object depends on the rest of the passed arguments.
  • column_id (basestring or None) – the name of the id column. It must be present in the pandas DataFrame or in all DataFrames in the dictionary. It is not allowed to have NaN values in this column.
  • column_sort (basestring or None) – if not None, sort the rows by this column. It is not allowed to have NaN values in this column.
  • column_kind (basestring or None) – It can only be used when passing a pandas DataFrame (the dictionary is already assumed to be grouped by the kind). It must be present in the DataFrame and no NaN values are allowed. If the kind column is not passed, it is assumed that each column in the pandas DataFrame (except the id or sort column) is a possible kind.
  • rolling_direction (int) – The sign decides whether to roll backwards or forwards in “time”
  • max_timeshift (int) – If not None, shift only up to max_timeshift. If None, shift as often as possible.
Returns:

The rolled data frame or dictionary of data frames

Return type:

the same type as df_or_dict
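
The id scheme can be illustrated with a hypothetical, simplified sketch assuming pandas. It handles a single DataFrame without a kind column and only a positive rolling direction, interpreting shift s as cutting the last s points off the sorted series (going back in time by s steps). The function name roll_time_series_sketch is made up, and tsfresh's actual windowing may differ in detail.

```python
import pandas as pd


def roll_time_series_sketch(df, column_id, column_sort, max_timeshift=None):
    """Hypothetical simplified rolling: one DataFrame, no kind column, positive direction."""
    pieces = []
    for old_id, group in df.groupby(column_id):
        group = group.sort_values(column_sort)
        n = len(group)
        max_shift = n - 1 if max_timeshift is None else min(n - 1, max_timeshift)
        for shift in range(max_shift + 1):
            # shift s drops the last s time steps of this id's series
            window = group.iloc[: n - shift].copy()
            window[column_id] = "id={}, shift={}".format(old_id, shift)
            pieces.append(window)
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame({"id": [1, 1, 1, 2, 2], "time": [0, 1, 2, 0, 1],
                   "value": [1.0, 2.0, 3.0, 4.0, 5.0]})
rolled = roll_time_series_sketch(df, "id", "time")
# 5 new ids: "id=1, shift=0" .. "id=1, shift=2" and "id=2, shift=0", "id=2, shift=1"
```

Note how the two ids of different lengths produce a different number of windows, which is why uniform sampling is assumed: a shift is counted in rows, not in real time units.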

tsfresh.utilities.profiling module

Contains methods to start and stop the profiler that measures the runtime of the different feature calculators.

tsfresh.utilities.profiling.end_profiling(profiler, filename, sorting=None)

Helper function to stop the profiling process and write the profiled data to the given filename. Before writing, the stats are sorted by the passed sorting.

Parameters:
  • profiler (cProfile.Profile) – An already started profiler (typically created by start_profiling).
  • filename (basestring) – The name of the output file to save the profile.
  • sorting (basestring) – The sorting of the statistics passed to the sort_stats function.
Returns:

None

Return type:

None

Start and stop the profiler with:

>>> profiler = start_profiling()
>>> # Do something you want to profile
>>> end_profiling(profiler, "out.txt", "cumulative")

tsfresh.utilities.profiling.start_profiling()

Helper function to start the profiling process and return the profiler (to close it later).

Returns: a started profiler.
Return type: cProfile.Profile

Start and stop the profiler with:

>>> profiler = start_profiling()
>>> # Do something you want to profile
>>> end_profiling(profiler, "out.txt", "cumulative")
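
For illustration, the two helpers can be approximated with the standard library's cProfile and pstats modules. This is a sketch under the assumption that the output file is the plain-text report of pstats.Stats.print_stats; the format tsfresh actually writes may differ.

```python
import cProfile
import pstats


def start_profiling():
    """Create a cProfile profiler, enable it and return it."""
    profiler = cProfile.Profile()
    profiler.enable()
    return profiler


def end_profiling(profiler, filename, sorting=None):
    """Stop the profiler and write its stats, sorted by `sorting`, to `filename`."""
    profiler.disable()
    with open(filename, "w") as stream:
        stats = pstats.Stats(profiler, stream=stream)
        if sorting is not None:
            stats.sort_stats(sorting)  # e.g. "cumulative" or "tottime"
        stats.print_stats()


profiler = start_profiling()
sum(i * i for i in range(10000))  # the code to profile
end_profiling(profiler, "out.txt", "cumulative")
```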

tsfresh.utilities.string_manipulation module

tsfresh.utilities.string_manipulation.convert_to_output_format(param)

Helper function to convert parameters to a valid string that can be used in a column name. It does the opposite of the parsing used in the from_columns function.

The parameters are sorted by their name and written out in the form

<param name>_<param value>__<param name>_<param value>__ ...

If a <param_value> is a string, this method wraps it in double quotes, so "<param_value>".

Parameters: param (dict) – The dictionary of parameters to write out
Returns: The string of parsed parameters
Return type: str
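
A minimal sketch of the encoding rule, in plain Python. It follows the format described above (sorted names, "_" between name and value, "__" between parameters, strings in double quotes) and is meant as an illustration, not as tsfresh's exact code.

```python
def convert_to_output_format(param):
    """Encode a parameter dict as '<name>_<value>__<name>_<value>...'."""

    def format_value(value):
        # string values are wrapped in double quotes; everything else uses str()
        return '"' + value + '"' if isinstance(value, str) else str(value)

    return "__".join(
        "{}_{}".format(key, format_value(param[key])) for key in sorted(param)
    )


convert_to_output_format({"lag": 3, "f_agg": "mean"})  # returns 'f_agg_"mean"__lag_3'
```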

tsfresh.utilities.string_manipulation.get_config_from_string(parts)

Helper function to extract the configuration of a certain function from the column name. The column name parts (split by “__”) should be passed to this function. It skips the kind name and the function name and only uses the parameter parts. These parts are split on “_” into the parameter name and the parameter value. This value is transformed into a Python object (for example, “(1, 2, 3)” is transformed into a tuple consisting of the ints 1, 2 and 3).

Returns None if no parameters are in the column name.

Parameters: parts (list) – The column name split up on “__”
Returns: a dictionary with all parameters which are encoded in the column name
Return type: dict
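
A sketch of the parsing step using the standard library's ast.literal_eval. Two details are assumptions of this illustration: each part is split on its last “_” so that parameter names such as f_agg stay intact, and special values like nan or inf, which literal_eval cannot parse, are not handled. The column name in the example is illustrative.

```python
import ast


def get_config_from_string(parts):
    """Rebuild the parameter dict from a column name split on '__'."""
    param_parts = parts[2:]  # parts[0] is the kind, parts[1] the function name
    if not param_parts:
        return None
    config = {}
    for part in param_parts:
        # split on the last "_" so that names like "f_agg" are kept whole
        name, value = part.rsplit("_", 1)
        # literal_eval turns '"mean"' into 'mean' and '(1, 2, 3)' into a tuple
        config[name] = ast.literal_eval(value)
    return config


column = 'temperature__agg_autocorrelation__f_agg_"mean"__maxlag_40'
get_config_from_string(column.split("__"))  # returns {'f_agg': 'mean', 'maxlag': 40}
```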

Module contents

This utilities submodule contains several utility functions. Those should only be used internally inside tsfresh.