How to add a custom feature
If you want to extract custom-made features from your time series, tsfresh allows you to do so in a few simple steps:
Step 1. Decide which type of feature you want to implement
tsfresh supports two types of feature calculation methods:
1. simple
2. combiner
The difference lies in the number of features calculated for a single time series. A feature calculator is simple if it returns exactly one feature (1.), and it is a combiner if it returns multiple features (2.). So if you want to add a single feature, you should select 1., the simple feature calculator. If, however, it is better to calculate multiple features at the same time (e.g., to perform auxiliary calculations only once for all features), then you should choose type 2.
Step 2. Write the feature calculator
Depending on which type of feature calculator you are implementing, you can use the following feature calculator skeletons:
1. simple features
You can write a simple feature calculator that returns exactly one feature, without parameters as follows:
from tsfresh.feature_extraction.feature_calculators import set_property

@set_property("fctype", "simple")
def your_feature_calculator(x):
    """
    The description of your feature

    :param x: the time series to calculate the feature of
    :type x: pandas.Series
    :return: the value of this feature
    :return type: bool, int or float
    """
    # Calculation of feature as float, int or bool
    result = f(x)
    return result
or with parameters:
@set_property("fctype", "simple")
def your_feature_calculator(x, p1, p2, ...):
    """
    Description of your feature

    :param x: the time series to calculate the feature of
    :type x: pandas.Series
    :param p1: description of your parameter p1
    :type p1: type of your parameter p1
    :param p2: description of your parameter p2
    :type p2: type of your parameter p2
    ...
    :return: the value of this feature
    :return type: bool, int or float
    """
    # Calculation of feature as float, int or bool
    # (f is a placeholder for your own calculation)
    result = f(x, p1, p2)
    return result
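As a concrete illustration of the skeleton with parameters, here is a minimal sketch of a simple feature calculator. The name `fraction_above_threshold` and its parameter `t` are hypothetical and chosen only for this example. The fallback definition of `set_property` is a stand-in that merely attaches the attribute, so that the snippet also runs where tsfresh is not installed.

```python
import numpy as np

try:
    from tsfresh.feature_extraction.feature_calculators import set_property
except ImportError:
    # Stand-in for illustration only: attaches the attribute to the function.
    def set_property(key, value):
        def decorate(func):
            setattr(func, key, value)
            return func
        return decorate


@set_property("fctype", "simple")
def fraction_above_threshold(x, t):
    """
    Fraction of values in x that are strictly greater than t (hypothetical example)

    :param x: the time series to calculate the feature of
    :type x: pandas.Series
    :param t: the threshold
    :type t: float
    :return: the value of this feature
    :return type: float
    """
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        # No values to compare, so the fraction is undefined
        return np.nan
    return float(np.mean(x > t))


print(fraction_above_threshold([1.0, 2.0, 3.0, 4.0], t=2.5))  # 0.5
```

The calculator returns a single number per time series, which is what makes it a simple feature.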
2. combiner features
Alternatively, you can write a combiner feature calculator that returns multiple features as follows:
from tsfresh.feature_extraction.feature_calculators import set_property
from tsfresh.utilities.string_manipulation import convert_to_output_format

@set_property("fctype", "combiner")
def your_feature_calculator(x, param):
    """
    Short description of your feature (should be a one-liner as we parse the first line of the description)

    Long detailed description: add some equations, add some references. What kind of statistics is the feature
    capturing? When should you use it? When not?

    :param x: the time series to calculate the feature of
    :type x: pandas.Series
    :param param: contains dictionaries {"p1": x, "p2": y, ...} with p1 float, p2 int ...
    :type param: list
    :return: list of tuples (s, f) where s are the parameters, serialized as a string,
             and f the respective feature value as bool, int or float
    :return type: list
    """
    # Do some pre-processing if needed for all parameters
    # f is a function that calculates the feature value for each single parameter combination
    return [(convert_to_output_format(config), f(x, config)) for config in param]
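To make the combiner skeleton more tangible, the following sketch computes several quantiles of a time series in one pass. The name `value_quantiles` and the parameter key `"q"` are hypothetical. The fallback definitions are simplified stand-ins for illustration, used only when tsfresh is not installed.

```python
import numpy as np

try:
    from tsfresh.feature_extraction.feature_calculators import set_property
    from tsfresh.utilities.string_manipulation import convert_to_output_format
except ImportError:
    # Simplified stand-ins for illustration only.
    def set_property(key, value):
        def decorate(func):
            setattr(func, key, value)
            return func
        return decorate

    def convert_to_output_format(param):
        return "__".join("{}_{}".format(k, param[k]) for k in sorted(param))


@set_property("fctype", "combiner")
def value_quantiles(x, param):
    """
    Quantiles of x for several levels q (hypothetical example)

    :param x: the time series to calculate the feature of
    :type x: pandas.Series
    :param param: contains dictionaries {"q": level} with level a float in [0, 1]
    :type param: list
    :return: list of tuples (s, f) where s are the parameters, serialized as a string,
             and f the respective feature value as float
    :return type: list
    """
    # Shared work, done once for all parameter combinations
    x = np.asarray(x, dtype=float)
    levels = [config["q"] for config in param]
    values = np.quantile(x, levels)

    return [(convert_to_output_format(config), float(value))
            for config, value in zip(param, values)]


features = value_quantiles([1, 2, 3, 4, 5], [{"q": 0.0}, {"q": 0.5}, {"q": 1.0}])
print([value for _, value in features])  # [1.0, 3.0, 5.0]
```

The conversion to an array and the quantile computation happen once, regardless of how many parameter dictionaries are passed. This is the situation in which a combiner is preferable to several simple calculators.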
Writing your own time-based feature calculators
Writing your own time-based feature calculators works the same way as writing any other feature calculator. Only two additional properties must be set using the @set_property decorator:
Adding @set_property("input", "pd.Series") tells the function that its input is a pd.Series rather than a numpy array. This allows the index to be used automatically.
Adding @set_property("index_type", pd.DatetimeIndex) tells the function that the input has a DatetimeIndex, allowing it to perform calculations based on time data types.
For example, if we want to write a function that calculates the time between the first and last measurement, it could look something like this:
import pandas as pd
from tsfresh.feature_extraction.feature_calculators import set_property

@set_property("input", "pd.Series")
@set_property("index_type", pd.DatetimeIndex)
def timespan(x, param):
    ix = x.index
    # Get the difference between the last timestamp and the first timestamp in seconds,
    # then convert to hours.
    times_seconds = (ix[-1] - ix[0]).total_seconds()
    return times_seconds / float(3600)
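As a further illustration, the sketch below defines another time-based calculator and calls it directly on a small series. The name `mean_sampling_interval` is hypothetical, and declaring it as a simple feature without parameters is an assumption made for this example. The fallback `set_property` is a stand-in used only when tsfresh is not installed.

```python
import pandas as pd

try:
    from tsfresh.feature_extraction.feature_calculators import set_property
except ImportError:
    # Stand-in for illustration only: attaches the attribute to the function.
    def set_property(key, value):
        def decorate(func):
            setattr(func, key, value)
            return func
        return decorate


@set_property("fctype", "simple")
@set_property("input", "pd.Series")
@set_property("index_type", pd.DatetimeIndex)
def mean_sampling_interval(x):
    """
    Mean time between consecutive measurements in seconds (hypothetical example)

    :param x: the time series to calculate the feature of
    :type x: pandas.Series
    :return: the value of this feature
    :return type: float
    """
    # Differences between consecutive timestamps in seconds; the first entry
    # has no predecessor, is NaN, and is ignored by mean().
    deltas = x.index.to_series().diff().dt.total_seconds()
    return float(deltas.mean())


index = pd.to_datetime(
    ["2020-01-01 00:00:00", "2020-01-01 00:00:10", "2020-01-01 00:00:30"]
)
series = pd.Series([1.0, 2.0, 3.0], index=index)
print(mean_sampling_interval(series))  # 15.0
```

Because the function only reads `x.index`, the values of the series do not influence the result. This is typical for features that describe the sampling pattern rather than the signal.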
Step 3. Add custom settings for your feature
Finally, you need to add your new custom feature to the extraction settings; otherwise it is not used during extraction.
To do this, create a new settings object (by default, tsfresh uses tsfresh.feature_extraction.settings.ComprehensiveFCParameters) and add your function as a key to the dictionary.
As a value, use either None if your function does not need parameters, or a list of the parameters you want to use (as dictionaries).
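from tsfresh.feature_extraction.settings import ComprehensiveFCParameters
settings = ComprehensiveFCParameters()
settings[f] = [{"n": 1}, {"n": 2}]  # f is your feature calculator function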
After that, make sure you pass your newly created settings in the call to extract_features.
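Putting these pieces together, a minimal sketch of the whole flow could look as follows. The feature `range_scaled`, the column names `id`, `time` and `value`, and the helper `run_extraction` are hypothetical. The sketch uses a plain dictionary as the settings object and assumes that `extract_features` accepts it through `default_fc_parameters` with the function itself as key, as described above. The fallback `set_property` is a stand-in used only when tsfresh is not installed.

```python
import pandas as pd

try:
    from tsfresh.feature_extraction.feature_calculators import set_property
except ImportError:
    # Stand-in for illustration only: attaches the attribute to the function.
    def set_property(key, value):
        def decorate(func):
            setattr(func, key, value)
            return func
        return decorate


@set_property("fctype", "simple")
def range_scaled(x, n):
    """
    Range of x divided by n (hypothetical example)
    """
    return float(max(x) - min(x)) / n


# The custom calculator is used as the key, its parameter sets as the value
settings = {range_scaled: [{"n": 1}, {"n": 2}]}

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": [0, 1, 2, 0, 1, 2],
    "value": [1.0, 3.0, 2.0, 5.0, 5.0, 8.0],
})


def run_extraction(data):
    # Imported here so that the settings above can be built without tsfresh
    from tsfresh import extract_features

    return extract_features(
        data,
        column_id="id",
        column_sort="time",
        default_fc_parameters=settings,
    )
```

Calling `run_extraction(df)` should then return one row per `id` and one column per parameter set of `range_scaled`; the exact column names depend on the tsfresh version.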
Step 4. Make a pull request
We would be very happy if you contributed your custom features to tsfresh.
To do this, add your feature to the feature_calculators.py file and append your feature (as a name) with safe default parameters to the name_to_param dictionary inside the tsfresh.feature_extraction.settings.ComprehensiveFCParameters constructor:
name_to_param.update({
    # here are the existing settings
    ...
    # Now the settings of your feature calculator
    "your_feature_calculator": [{"p1": x, "p2": y, ...} for x, y in ...],
})
Keep in mind that the different feature extraction settings (e.g. tsfresh.feature_extraction.settings.EfficientFCParameters, tsfresh.feature_extraction.settings.MinimalFCParameters or tsfresh.feature_extraction.settings.ComprehensiveFCParameters) include different sets of feature calculators. You can control which feature extraction settings object will include your new feature calculator by giving your function attributes like "minimal" or "high_comp_cost". See the classes in tsfresh.feature_extraction.settings for more information.
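Such attributes can be attached with the same decorator that sets the feature type. The sketch below marks a hypothetical calculator `value_range` as "minimal". Using `True` as the attribute value is an assumption here, so check the classes in tsfresh.feature_extraction.settings for how the attribute is evaluated in your version. The fallback `set_property` is a stand-in used only when tsfresh is not installed.

```python
try:
    from tsfresh.feature_extraction.feature_calculators import set_property
except ImportError:
    # Stand-in for illustration only: attaches the attribute to the function.
    def set_property(key, value):
        def decorate(func):
            setattr(func, key, value)
            return func
        return decorate


@set_property("fctype", "simple")
@set_property("minimal", True)  # assumption: a truthy value marks the feature as minimal
def value_range(x):
    """
    Difference between the largest and the smallest value of x (hypothetical example)
    """
    return float(max(x) - min(x))


print(value_range([1, 5, 3]))  # 4.0
```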
After that, add some tests and make a pull request to our GitHub repo. We happily accept partially implemented feature calculators, which we can finalize together.