tsfresh.utilities package
Submodules
tsfresh.utilities.dataframe_functions module
Utility functions for converting DataFrames to the internal normalized format (see normalize_input_to_internal_representation) and for handling NaN and inf values in DataFrames.
tsfresh.utilities.dataframe_functions.check_for_nans_in_columns(df, columns=None)
Helper function to check for NaN in the DataFrame and raise a ValueError if there is one.
Parameters:
- df (pandas.DataFrame) – the pandas DataFrame to test for NaNs
- columns (list) – a list of columns to test for NaNs. If left empty, all columns of the DataFrame will be tested.
Returns: None
Return type: None
Raises: ValueError – if NaNs are found in the DataFrame.
tsfresh.utilities.dataframe_functions.get_range_values_per_column(df)
Retrieves the finite max, min and median values per column in the DataFrame df and stores them in three dictionaries. These dictionaries, col_to_max, col_to_min and col_to_median, map each column name to the maximal, minimal or median value of that column.
If a column does not contain any finite values at all, a 0 is stored instead.
Parameters: df (pandas.DataFrame) – the DataFrame to get the columnwise max, min and median from
Returns: Dictionaries mapping column names to max, min and median values
Return type: (dict, dict, dict)
tsfresh.utilities.dataframe_functions.impute(df_impute)
Columnwise replaces all NaNs and infs in the DataFrame df_impute with average/extreme values from the same columns. Each occurring inf or NaN in df_impute is replaced as follows:
- -inf -> min
- +inf -> max
- NaN -> median
If a column does not contain any finite values at all, it is filled with zeros.
This function modifies df_impute in place. Afterwards, df_impute is guaranteed not to contain any non-finite values, and all columns are guaranteed to be of type np.float64.
Parameters: df_impute (pandas.DataFrame) – DataFrame to impute
Returns: the imputed DataFrame df_impute
Return type: pandas.DataFrame
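The replacement rule described for impute can be illustrated with a minimal sketch. It is not the tsfresh implementation: a plain Python list stands in for one DataFrame column, the helper name impute_column_sketch is hypothetical, and the sketch returns a new list instead of modifying its input in place.

```python
import math
from statistics import median


def impute_column_sketch(values):
    """Return a copy of `values` with -inf, +inf and NaN replaced.

    Hypothetical helper: a plain list stands in for one DataFrame column.
    """
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        # No finite value to derive a replacement from: fill with zeros.
        return [0.0] * len(values)

    col_min, col_max, col_median = min(finite), max(finite), median(finite)
    result = []
    for v in values:
        if math.isnan(v):
            result.append(float(col_median))  # NaN -> median
        elif v == math.inf:
            result.append(float(col_max))     # +inf -> max
        elif v == -math.inf:
            result.append(float(col_min))     # -inf -> min
        else:
            result.append(float(v))
    return result


column = [1.0, float("nan"), 3.0, float("inf"), float("-inf"), 8.0]
imputed = impute_column_sketch(column)
all_missing = impute_column_sketch([float("nan"), float("inf")])
```

Only the finite entries (1.0, 3.0 and 8.0 here) contribute to the replacement values, which is why a column without any finite entry falls back to zeros.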
tsfresh.utilities.dataframe_functions.impute_dataframe_range(df_impute, col_to_max, col_to_min, col_to_median)
Columnwise replaces all NaNs, -inf and +inf in the DataFrame df_impute with average/extreme values from the provided dictionaries. Each occurring inf or NaN in df_impute is replaced as follows:
- -inf -> by the value in col_to_min
- +inf -> by the value in col_to_max
- NaN -> by the value in col_to_median
If a column of df_impute is not found in one of the dictionaries, this method raises a ValueError. A ValueError is also raised if one of the replacement values is not finite.
This function modifies df_impute in place. Afterwards, df_impute is guaranteed not to contain any non-finite values, and all columns are guaranteed to be of type np.float64.
Parameters:
- df_impute (pandas.DataFrame) – DataFrame to impute
- col_to_max (dict) – Dictionary mapping column names to max values
- col_to_min (dict) – Dictionary mapping column names to min values
- col_to_median (dict) – Dictionary mapping column names to median values
Returns: the imputed DataFrame df_impute
Return type: pandas.DataFrame
Raises: ValueError – if a column of df_impute is missing in col_to_max, col_to_min or col_to_median, or if a replacement value is not finite
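A typical reason to pass the dictionaries explicitly is to reuse replacement values derived from one data set when imputing another. The following is a minimal sketch of that rule and of the two error conditions, not the tsfresh implementation: a dict of plain lists stands in for the DataFrame, the helper name impute_with_ranges_sketch is hypothetical, and the result is returned as a new dict rather than written back in place.

```python
import math


def impute_with_ranges_sketch(columns, col_to_max, col_to_min, col_to_median):
    """Replace non-finite entries using externally supplied values.

    Hypothetical helper: `columns` maps column names to lists of floats.
    """
    for name in columns:
        for mapping in (col_to_max, col_to_min, col_to_median):
            if name not in mapping:
                raise ValueError("column %r is missing in a dictionary" % name)
            if not math.isfinite(mapping[name]):
                raise ValueError("replacement for %r is not finite" % name)

    imputed = {}
    for name, values in columns.items():
        new_values = []
        for v in values:
            if math.isnan(v):
                new_values.append(float(col_to_median[name]))
            elif v == math.inf:
                new_values.append(float(col_to_max[name]))
            elif v == -math.inf:
                new_values.append(float(col_to_min[name]))
            else:
                new_values.append(float(v))
        imputed[name] = new_values
    return imputed


# Replacement values, e.g. derived earlier from a training set (made up here).
col_to_max = {"x": 10.0}
col_to_min = {"x": -2.0}
col_to_median = {"x": 4.0}

test_columns = {"x": [float("inf"), float("nan"), float("-inf"), 5.0]}
result = impute_with_ranges_sketch(test_columns, col_to_max, col_to_min, col_to_median)
```

Calling the sketch with a column name that is absent from any of the three dictionaries raises a ValueError, mirroring the behaviour documented above.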
tsfresh.utilities.dataframe_functions.impute_dataframe_zero(df_impute)
Replaces all NaNs, -infs and +infs in the DataFrame df_impute with 0s. df_impute is modified in place, and all its columns are converted to dtype np.float64.
Parameters: df_impute (pandas.DataFrame) – DataFrame to impute
Returns: the imputed DataFrame df_impute
Return type: pandas.DataFrame
tsfresh.utilities.dataframe_functions.normalize_input_to_internal_representation(df_or_dict, column_id, column_sort, column_kind, column_value)
Try to transform any given input to the internal representation of time series, which is a mapping from string (the kind) to a pandas DataFrame with exactly two columns (the value and the id).
This function can transform pandas DataFrames in different formats, or dictionaries mapping to pandas DataFrames in different formats. It is used internally in the extract_features function and should not be called by the user.
Parameters:
- df_or_dict (pandas.DataFrame or dict) – a pandas DataFrame or a dictionary. The required shape/form of the object depends on the rest of the passed arguments.
- column_id (basestring or None) – if not None, it must be present in the pandas DataFrame or in all DataFrames in the dictionary. It is not allowed to have NaN values in this column. If this column name is None, a new column will be added to the pandas DataFrame (or all pandas DataFrames in the dictionary) and the same id for all entries is assumed.
- column_sort (basestring or None) – if not None, sort the rows by this column. Then, the column is dropped. It is not allowed to have NaN values in this column.
- column_kind (basestring or None) – It can only be used when passing a pandas DataFrame (the dictionary is already assumed to be grouped by the kind). It must be present in the DataFrame and no NaN values are allowed. The DataFrame will be grouped by the values in the kind column and each group will be one entry in the resulting mapping. If the kind column is not passed, it is assumed that each column in the pandas DataFrame (except the id or sort column) is a possible kind and the DataFrame is split up into as many DataFrames as there are columns. The exception is when a value column is given: then it is assumed that there is only one column.
- column_value (basestring or None) – If it is given, it must be present and not-NaN in the pandas DataFrame (or all pandas DataFrames in the dictionary). If it is None, it is assumed that there is only a single remaining column in the DataFrame(s) (otherwise an exception is raised).
Returns: A tuple of 3 elements: the normalized DataFrame as a dictionary mapping from the kind (as a string) to the corresponding DataFrame, the name of the id column and the name of the value column
Return type: (dict, basestring, basestring)
Raises: ValueError – when the passed combination of parameters is wrong or does not fit the input DataFrame or dict.
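To make the shape of the internal representation concrete, here is a minimal sketch for the case where an id, sort, kind and value column are all given. It is not the tsfresh implementation: a list of row dictionaries stands in for the input DataFrame, a list of two-key dictionaries stands in for each resulting two-column DataFrame, and the name normalize_sketch and the sample data are hypothetical.

```python
from collections import defaultdict


def normalize_sketch(rows, column_id, column_sort, column_kind, column_value):
    """Group flat rows by kind, keeping only the id and value of each row.

    Hypothetical helper: `rows` is a list of dicts, one dict per observation.
    """
    kind_to_rows = defaultdict(list)
    # Sort by the sort column first; the sort column itself is then dropped.
    for row in sorted(rows, key=lambda r: r[column_sort]):
        kind_to_rows[str(row[column_kind])].append(
            {column_id: row[column_id], column_value: row[column_value]}
        )
    return dict(kind_to_rows), column_id, column_value


rows = [
    {"id": 1, "time": 1, "kind": "temperature", "value": 20.5},
    {"id": 1, "time": 0, "kind": "temperature", "value": 20.1},
    {"id": 1, "time": 0, "kind": "pressure", "value": 1.01},
    {"id": 2, "time": 0, "kind": "temperature", "value": 18.3},
]

kind_to_rows, id_name, value_name = normalize_sketch(
    rows, "id", "time", "kind", "value"
)
```

The result has one entry per kind ("temperature" and "pressure"), and each entry carries only the id and value of its rows, ordered by the former sort column.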
tsfresh.utilities.dataframe_functions.restrict_input_to_index(df_or_dict, column_id, index)
Restrict df_or_dict to those ids contained in index.
Parameters:
- df_or_dict (pandas.DataFrame or dict) – a pandas DataFrame or a dictionary.
- column_id (basestring) – it must be present in the pandas DataFrame or in all DataFrames in the dictionary. It is not allowed to have NaN values in this column.
- index (Iterable or pandas.Series) – Index containing the ids
Returns: the restricted df_or_dict
Return type: dict or pandas.DataFrame
Raises: TypeError – if df_or_dict is not of type dict or pandas.DataFrame
tsfresh.utilities.helper_functions module
Some helper functions.
tsfresh.utilities.helper_functions.calculate_best_chunksize(iterable_list, settings)
Helper function to calculate the best chunksize for a given number of elements to calculate, or use the one in the settings object.
The formula is more or less an empirical result.
Parameters:
- iterable_list – A list which defines how many calculations there need to be.
- settings – The settings object where the chunksize may already be given (or not).
Returns: The chunksize which should be used.
TODO: Investigate which is the best chunk size for different settings.
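The formula itself is not stated here, so the following is only a sketch of what such a heuristic can look like. It borrows the rule that CPython's multiprocessing.Pool.map applies when no chunksize is given (roughly four chunks per worker, rounded up); whether tsfresh uses the same constants is an assumption. The function name best_chunksize_sketch and its parameters are hypothetical and replace the settings object with explicit arguments.

```python
def best_chunksize_sketch(n_items, n_workers, preset=None, chunks_per_worker=4):
    """Pick a chunksize so that each worker receives several chunks.

    Hypothetical helper; `preset` plays the role of a chunksize that is
    already given in the settings.
    """
    if preset is not None:
        # A chunksize that is already configured takes precedence.
        return preset
    chunksize, extra = divmod(n_items, n_workers * chunks_per_worker)
    if extra:
        # Round up so that no items are left over.
        chunksize += 1
    return chunksize


auto = best_chunksize_sketch(n_items=100, n_workers=4)
fixed = best_chunksize_sketch(n_items=100, n_workers=4, preset=10)
```

Smaller chunks balance the load better across workers, while larger chunks reduce the overhead of handing work to them; the factor of four is one common compromise.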
tsfresh.utilities.profiling module
Contains methods to start and stop the profiler that checks the runtime of the different feature calculators.
tsfresh.utilities.profiling.end_profiling(profiler, filename, sorting=None)
Helper function to stop the profiling process and write out the profiled data into the given filename. Before this, sort the stats by the passed sorting.
Parameters:
- profiler (cProfile.Profile) – An already started profiler (probably by start_profiling).
- filename (basestring) – The name of the output file to save the profile.
- sorting (basestring) – The sorting of the statistics passed to the sort_stats function.
Returns: None
Return type: None
Start and stop the profiler with:
>>> profiler = start_profiling()
>>> # Do something you want to profile
>>> end_profiling(profiler, "out.txt", "cumulative")
tsfresh.utilities.profiling.start_profiling()
Helper function to start the profiling process and return the profiler (to close it later).
Returns: a started profiler.
Return type: cProfile.Profile
Start and stop the profiler with:
>>> profiler = start_profiling()
>>> # Do something you want to profile
>>> end_profiling(profiler, "out.txt", "cumulative")
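For readers who want to see how such a start/stop pair can be built on the standard library, here is a minimal sketch using cProfile and pstats. It is not the tsfresh source: the names start_profiling_sketch and end_profiling_sketch are hypothetical, and skipping the sort when sorting is None is an assumption of this sketch.

```python
import cProfile
import io
import os
import pstats
import tempfile


def start_profiling_sketch():
    """Create a profiler, start it and hand it back to the caller."""
    profiler = cProfile.Profile()
    profiler.enable()
    return profiler


def end_profiling_sketch(profiler, filename, sorting=None):
    """Stop the profiler and write the (optionally sorted) stats to a file."""
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    if sorting is not None:
        # Accepts the keys understood by pstats, e.g. "cumulative" or "time".
        stats.sort_stats(sorting)
    stats.print_stats()
    with open(filename, "w") as f:
        f.write(stream.getvalue())


profiler = start_profiling_sketch()
total = sum(i * i for i in range(10000))  # the work to be profiled
report_path = os.path.join(tempfile.mkdtemp(), "out.txt")
end_profiling_sketch(profiler, report_path, "cumulative")
```

The argument order matters: the output filename comes second and the sorting key third, matching the end_profiling signature documented above.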