Changelog
tsfresh uses Semantic Versioning
Version 0.20.1
- Added Features
Make tsfresh compatible with numpy 1.24 (#1018) and pandas 2.0 (#1028)
- Bugfixes/Typos/Documentation:
Use pandas Index.equals in check_if_pandas_series (#963)
Updates to package layout, CI/CD and developer setup
Version 0.20.0
- Breaking Change
The matrixprofile package becomes an optional dependency
- Bugfixes/Typos/Documentation:
Fix feature extraction of Friedrich coefficients for pandas>1.3.5
Fix file paths after example notebooks were moved
Version 0.19.0
- Breaking Change
Drop Python 3.6 support due to dependency on statsmodels 0.13
- Added Features
Improve documentation (#831, #834, #851, #853, #870)
Add absolute_maximum and mean_n_absolute_max features (#833)
Make settings picklable (#845, #847, #910)
Disable multiprocessing for n_jobs=1 (#852)
Add black, isort, and pre-commit (#876)
- Bugfixes/Typos/Documentation:
Fix conversion of time-series into sequence for lempel_ziv_complexity (#806)
Fix range count config (#827)
Reword documentation (#893)
Fix statsmodels deprecation issues (#898, #912)
Fix typo in requirements (#903)
Bump statsmodels to v0.13 (#
Updated references
Version 0.18.0
- Added Features
Allow arbitrary rolling sizes (#766)
Allow for multiclass significance tests (#762)
Add multiclass option to RelevantFeatureAugmenter (#782)
Addition of matrix_profile feature (#793)
Added new query similarity counter feature (#798)
Add root mean square feature (#813)
- Bugfixes/Typos/Documentation:
Do not send coverage of notebook tests to codecov (#759)
Fix typos in notebook (#757, #780)
Fix output format of make_forecasting_frame (#758)
Fix badges and remove benchmark test
Fix BY notebook plot (#760)
Improve the time series forecasting example (#763)
Also suppress warnings in dask (#769)
Update relevant_feature_augmenter.py (#779)
Fix column names in quick_start.rst (#778)
Improve relevance table function documentation (#781)
Fixed #789 Typo in “how to add custom feature” (#790)
Convert to the correct type on warnings (#799)
Fix minor typos in the docs (#802)
Add unwanted filetypes to gitignore (#819)
Fix build and test failures (#815)
Fix imputing documentation (#800)
Bump the scikit-learn version (#822)
Version 0.17.0
We changed the default branch from “master” to “main”.
- Breaking Change
Changed constructed id in roll_time_series from string to tuple (#700)
Same for add_sub_time_series_index (#720)
- Added Features
Implemented the Lempel-Ziv-Complexity and the Fourier Entropy (#688)
Prevent #524 by adding an assert for common identifiers (#690)
Added permutation entropy (#691)
Added a logo :-) (#694)
Implemented the Benford distribution feature (#689)
Reworked the notebooks (#701, #704)
Speed up the result pivoting (#705)
Add a test for the dask bindings (#719)
Refactor input data iteration to need less memory (#707)
Added benchmark tests (#710)
Make dask a possible input format (#736)
- Bugfixes:
Fixed a bug in the selection that caused all regression tasks with an unordered index to be wrong (#715)
Fixed readthedocs (#695, #696)
Fix spark and dask after #705 and for id columns not named “id” (#712)
Fix in the forecasting notebook (#729)
Let tsfresh choose the value column if possible (#722)
Move from coveralls github action to codecov (#734)
Improve speed of data processing (#735)
Fix for newer, more strict pandas versions (#737)
Fix documentation for feature calculators (#743)
Version 0.16.0
- Breaking Change
Fix the sorting of the parameters in the feature names (#656). The feature names now consist of a sorted list of all parameters. That used to be true for all non-combiner features and is now also true for combiner features. If you relied on the actual feature names, this is a breaking change.
Change the id after the rolling (#668). The old id of your data is now kept. Additionally, we improved the way dataframes without a time column are rolled and how the new sub-time series are named. The documentation was also improved considerably.
- Added Features
Added variation coefficient (#654)
Added the datetimeindex explanation from the notebook to the docs (#661)
Optimize RelevantFeatureAugmenter to avoid re-extraction (#669)
Added a function add_sub_time_series_index (#666)
Added Dockerfile
Speed optimizations and speed testing script (#681)
- Bugfixes
Increase the extracted AR coefficients to the full parameter range (#662)
Documentation fixes (#663, #664, #665)
Rewrote the sample_entropy feature calculator (#681). It is now faster and (hopefully) more correct, but your results will change!
Version 0.15.1
Changelog and documentation fixes
Version 0.15.0
- Added Features
Add count_above and count_below feature (#632)
Add convenience bindings for dask dataframes and pyspark dataframes (#651)
- Bugfixes
Fix documentation build and feature table in sphinx (#637, #631, #627)
Add scripts to API documentation
Skip dask test for older python versions (#649)
Add missing distributor keyword (#648)
Fix tuple input for cwt (#645)
Version 0.14.1
Fix travis deployment
Version 0.14.0
- Breaking Change
Replace Benjamini-Hochberg implementation with statsmodels implementation (#570)
- Refactoring and Documentation
travis.yml (#605)
gitignore (#608)
Fix docstring of c3 (#590)
Feature/pep8 (#607)
- Added Features
Improve test coverage (#609)
Add “autolag” parameter to augmented_dickey_fuller() (#612)
- Bugfixes
Fix filtering on warnings with multiprocessing on Windows (#610)
Remove outdated logging config (#621)
Fix the kernel and the naming of a notebook (#626)
Version 0.13.0
Drop python 2.7 support (#568)
- Fixed bugs
Fix cache in friedrich_coefficients and agg_linear_trend (#593)
Added a check for wrong column names and a test for this check (#586)
Make sure to not install the tests folder (#599)
Make sure there is at least a single column which we can use for data (#589)
Avoid division by zero in energy_ratio_by_chunks (#588)
Ensure that get_moment() uses float computations (#584)
Preserve index when column_value and column_kind not provided (#576)
Add @set_property(“input”, “pd.Series”) when needed (#582)
Fix off-by-one error in longest strike features (fixes #577) (#578)
Add set_property import (#572)
Fix typo (#571)
Fix indexing of melted normalized input (#563)
Fix travis (#569)
Remove warnings (#583)
Update to newest python version (#594)
- Optimizations
Early return from change_quantiles if ql >= qh (#591)
Optimize mean_second_derivative_central (#587)
Improve performance with NumPy’s sum function (#567)
Optimize mean_change (fixes issue #542) and correct documentation (#574)
Version 0.12.0
- fixed bugs
wrong calculation of friedrich coefficients
feature selection selected too many features
an ignored max_timeshift parameter in roll_time_series
add deprecation warning for python 2
added support for index based features
- new feature calculator
linear_trend_timewise
enable the RelevantFeatureAugmenter to be used in cross validated pipelines
increased scipy dependency to 1.2.0
Version 0.11.2
change chunking in energy_ratio_by_chunks to use all data points
fix warning for spkt_welch_density
adapt default settings for “value_count” and “range_count”
- added
maxlag parameter to agg_autocorrelation function
the kind column of the input DataFrame is now cast to str; previously derived FC_Settings may become invalid
in extract_features, default_fc_parameters is only set to ComprehensiveFCParameters() if kind_to_fc_parameters is also None
removed pyscaffold
use an asymptotic algorithm to derive Kendall’s tau
Version 0.11.1
general performance improvements
removed hard pinning of dependencies
- fixed bugs
the stock price forecasting notebook
the multi classification notebook
Version 0.11.0
- new feature calculators:
fft_aggregated
cid_ce
renamed mean_second_derivate_central to mean_second_derivative_central
add warning if no relevant features were found in feature selection
add columns_to_ignore parameter to from_columns method
add distribution module, which contains support for distributed feature extraction on Dask
Version 0.10.1
split test suite into unit and integration tests
- fixed the following bugs
use name of value column as time series kind
prevent the spawning of subprocesses, which led to high memory consumption
fix deployment from travis to pypi
Version 0.10.0
- new feature calculators:
partial autocorrelation
added list of calculated features to documentation
- added two ipython notebooks to
illustrate PCA on features
illustrate the Benjamini Yekutieli procedure
- fixed the following bugs
improper quotation of Dickey-Fuller settings
Version 0.9.0
- new feature calculators:
ratio_beyond_r_sigma
energy_ratio_by_chunks
number_crossing_m
c3
angle & abs for fft coefficients
agg_autocorrelation
p-Value and usedLag for augmented_dickey_fuller
change_quantiles
- changed the calculation of the following features:
fft_coefficients
autocorrelation
time_reversal_asymmetry_statistic
- removed the following feature calculators:
large_number_of_peak
mean_autocorrelation
mean_abs_change_quantiles
add support for multi classification in the feature selection
improved description of the rolling mechanism
added function make_forecasting_frame method for forecasting tasks
internally ditched the pandas representation of the time series, yielding drastic speed improvements
replaced the feature calculator types aggregate, aggregate with parameter, and apply with simple and combiner
add test for the ipython notebooks
added notebook to inspect dft features
make sure that RelevantFeatureAugmenter always imputes
- fixed the following bugs
impute was replacing whole columns by mean
fft coefficients were only calculated on a truncated part
allow to suppress warnings from impute function
added missing lag in time_reversal_asymmetry_statistic
Version 0.8.1
- new features:
linear trend
agg trend
- new sklearn compatible transformers
PerColumnImputer
- fixed bugs
make mannwhitneyu method compatible with scipy > v0.18.0
added caching to travis
internally, added serial calculation of features
Version 0.8.0
- Breaking API changes:
removed the feature extraction settings object, replaced by keyword arguments and a plain dictionary (fc_parameters)
removed the feature selection settings object, replaced by keyword arguments
added notebook with examples of new API
added chapter in docs about the new API
adjusted old notebooks and documentation to new API
Version 0.7.1
added a maximum shift parameter to the rolling utility
added a FAQ entry about how to use tsfresh on Windows
- drastically decreased the runtime of the following features
cwt_coefficient
index_mass_quantile
number_peaks
large_standard_deviation
symmetry_looking
removed baseline unit tests
- bugfixes:
per-sample parallel imputing was done on chunks, which gave non-deterministic results
imputing on dtypes other than float32 did not work properly
several improvements to documentation
Version 0.7.0
new rolling utility to use tsfresh for time series forecasting tasks
- bugfixes:
index_mass_quantile was using global index of time series container
an index with same name as id_column was breaking parallelization
friedrich_coefficients and max_langevin_fixed_point were occasionally stalling
Version 0.6.0
progress bar for feature selection
new feature: estimation of largest fixed point of deterministic dynamics
new notebook: demonstration of how to use tsfresh in a pipeline with train and test datasets
removed the “no logging handler” warning
fixed bug in the RelevantFeatureAugmenter regarding the evaluate_only_added_features parameter
Version 0.5.0
new example: driftbif simulation
further improvements of the parallelization
language improvements in the documentation
performance improvements for some features
performance improvements for the impute function
new feature and feature renaming: sum_of_recurring_values, sum_of_recurring_data_points
Version 0.4.0
fixed several bugs: checking of UCI dataset, out of index error for mean_abs_change_quantiles
added a progress bar denoting the progress of the extraction process
added parallelization per sample
added unit tests for comparing results of feature extraction to older snapshots
added “high_comp_cost” attribute
added ReasonableFeatureExtraction settings only calculating features without “high_comp_cost” attribute
Version 0.3.1
fixed several bugs: closing multiprocessing pools / index out of range cwt calculator / division by 0 in index_mass_quantile
now all warnings are disabled by default
for a singular type time series data, the name of value column is used as feature prefix
Version 0.3.0
fixed bug with parsing of “NUMBER_OF_CPUS” environment variable
now features are calculated in parallel for each type
Version 0.2.0
now p-values are calculated in parallel
fixed bugs for constant features
allow time series columns to be named 0
moved uci repository datasets to github mirror
added feature calculator sample_entropy
added MinimalFeatureExtraction settings
fixed bug in calculation of fourier coefficients
Version 0.1.2
added support for python 3.5.2
fixed a bug that made the naming of features non-deterministic
Version 0.1.1
mainly fixes for the read-the-docs documentation, the pypi readme and so on
Version 0.1.0
Initial version :)