tsfresh.scripts package

Submodules

tsfresh.scripts.measure_execution_time module

class tsfresh.scripts.measure_execution_time.CombinerTask(*args, **kwargs)[source]

Bases: Task

Collect the results of all tasks into a single result.csv file.

complete()[source]

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.
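The default completeness rule described above can be approximated without Luigi. The following is a minimal sketch, assuming that outputs are plain file paths; `is_complete` is a hypothetical helper for illustration and is not part of Luigi or tsfresh.

```python
import os
import tempfile


def is_complete(output_paths):
    """Return True only if there is at least one output and all of them exist."""
    if not output_paths:
        # No outputs at all: treated as not complete, as described above.
        return False
    return all(os.path.exists(path) for path in output_paths)


with tempfile.TemporaryDirectory() as tmp_dir:
    existing = os.path.join(tmp_dir, "result.csv")
    missing = os.path.join(tmp_dir, "missing.csv")
    with open(existing, "w") as handle:
        handle.write("done\n")

    status_existing = is_complete([existing])
    status_mixed = is_complete([existing, missing])
    status_empty = is_complete([])
```

A task with one present and one absent output counts as incomplete, so it would be scheduled again.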

max_batch_size = 1: Maximum number of tasks to run together as a batch. Infinite by default

output()[source]

The output that this Task produces.

The output of the Task determines whether the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note: If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output

requires()[source]

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires

run()[source]

The task run method, to be overridden in a subclass.

See Task.run
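To illustrate what collecting results into a single result.csv can look like, here is a minimal standard-library sketch. It assumes, purely for illustration, that each upstream task wrote one JSON file containing a flat dictionary; the function name `combine_results`, the file layout, and the field names are hypothetical and do not reflect the actual tsfresh implementation.

```python
import csv
import glob
import json
import os
import tempfile


def combine_results(input_dir, output_path):
    """Merge every *.json file in input_dir (one flat dict each) into one CSV."""
    rows = []
    for path in sorted(glob.glob(os.path.join(input_dir, "*.json"))):
        with open(path) as handle:
            rows.append(json.load(handle))

    # Use the union of all keys so rows with missing fields still fit.
    fieldnames = sorted({key for row in rows for key in row})
    with open(output_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Two fake per-task result files (hypothetical field names).
    for index, duration in enumerate([0.5, 1.25]):
        with open(os.path.join(tmp_dir, f"run_{index}.json"), "w") as handle:
            json.dump({"try_number": index, "time": duration}, handle)

    result_path = os.path.join(tmp_dir, "result.csv")
    combined_count = combine_results(tmp_dir, result_path)
    with open(result_path, newline="") as handle:
        combined_rows = list(csv.DictReader(handle))
```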

class tsfresh.scripts.measure_execution_time.DataCreationTask(*args, **kwargs)[source]

Bases: Task

Create random data for testing

max_batch_size = 1: Maximum number of tasks to run together as a batch. Infinite by default

num_ids: Parameter whose value is an int.

output()[source]

The output that this Task produces.

The output of the Task determines whether the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note: If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output

random_seed: Parameter whose value is an int.

run()[source]

The task run method, to be overridden in a subclass.

See Task.run

time_series_length: Parameter whose value is an int.

class tsfresh.scripts.measure_execution_time.FullTimingTask(*args, **kwargs)[source]

Bases: Task

Run tsfresh with all calculators for comparison

max_batch_size = 1: Maximum number of tasks to run together as a batch. Infinite by default

n_jobs: Parameter whose value is an int.

num_ids: Parameter whose value is an int.

output()[source]

The output that this Task produces.

The output of the Task determines whether the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note: If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output

random_seed: Parameter whose value is an int.

requires()

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires

run()[source]

The task run method, to be overridden in a subclass.

See Task.run

time_series_length: Parameter whose value is an int.

class tsfresh.scripts.measure_execution_time.TimingTask(*args, **kwargs)[source]

Bases: Task

Run tsfresh with the given parameters

feature_parameter

Parameter whose value is a dict.

In the task definition, use

class MyTask(luigi.Task):
  tags = luigi.DictParameter()

  def run(self):
    logging.info("Find server with role: %s", self.tags['role'])
    server = aws.ec2.find_my_resource(self.tags)

At the command line, use

$ luigi --module my_tasks MyTask --tags <JSON string>

Simple example with two tags:

$ luigi --module my_tasks MyTask --tags '{"role": "web", "env": "staging"}'

It can be used to define dynamic parameters when you do not know the exact list of your parameters (e.g. a list of tags that is constructed dynamically outside Luigi), or when you have a complex parameter containing logically related values (like a database connection config).

It is possible to provide a JSON schema against which the given value is validated:

class MyTask(luigi.Task):
  tags = luigi.DictParameter(
    schema={
      "type": "object",
      "patternProperties": {
        ".*": {"type": "string", "enum": ["web", "staging"]},
      }
    }
  )

  def run(self):
    logging.info("Find server with role: %s", self.tags['role'])
    server = aws.ec2.find_my_resource(self.tags)

Using this schema, the following command will work:

$ luigi --module my_tasks MyTask --tags '{"role": "web", "env": "staging"}'

while this command will fail because the parameter is not valid:

$ luigi --module my_tasks MyTask --tags '{"role": "UNKNOWN_VALUE", "env": "staging"}'
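The effect of the schema on the two commands above can be reproduced by hand. The following standard-library sketch parses the JSON string and applies the same constraints (object with string values drawn from a fixed set). It is only an approximation for illustration: `parse_tags` and `ALLOWED_VALUES` are hypothetical names, and Luigi itself relies on JSON schema validation rather than code like this.

```python
import json

ALLOWED_VALUES = {"web", "staging"}  # mirrors the "enum" in the schema above


def parse_tags(json_string):
    """Parse a --tags style JSON string and apply the same constraints by hand."""
    tags = json.loads(json_string)
    if not isinstance(tags, dict):
        raise ValueError("tags must be a JSON object")
    for key, value in tags.items():
        if not isinstance(value, str) or value not in ALLOWED_VALUES:
            raise ValueError(f"invalid value for {key!r}: {value!r}")
    return tags


# Accepted: every value is one of the allowed strings.
valid_tags = parse_tags('{"role": "web", "env": "staging"}')

# Rejected: "UNKNOWN_VALUE" is not in the allowed set.
try:
    parse_tags('{"role": "UNKNOWN_VALUE", "env": "staging"}')
    rejected = False
except ValueError:
    rejected = True
```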

Finally, the provided schema can be a custom validator:

custom_validator = jsonschema.Draft4Validator(
  schema={
    "type": "object",
    "patternProperties": {
      ".*": {"type": "string", "enum": ["web", "staging"]},
    }
  }
)

class MyTask(luigi.Task):
  tags = luigi.DictParameter(schema=custom_validator)

  def run(self):
    logging.info("Find server with role: %s", self.tags['role'])
    server = aws.ec2.find_my_resource(self.tags)

max_batch_size = 1: Maximum number of tasks to run together as a batch. Infinite by default

n_jobs: Parameter whose value is an int.

num_ids: Parameter whose value is an int.

output()[source]

The output that this Task produces.

The output of the Task determines whether the Task needs to be run: the task is considered finished if and only if all of its outputs exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note: If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output

random_seed: Parameter whose value is an int.

requires()

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires

run()[source]

The task run method, to be overridden in a subclass.

See Task.run

time_series_length: Parameter whose value is an int.

try_number: Parameter whose value is an int.

tsfresh.scripts.run_tsfresh module

This script can be run with:

python run_tsfresh.py path_to_your_csv.csv

A corresponding CSV file containing time series features will be saved as path_to_your_csv.features.csv

There are a few limitations, though:

Currently, only the first 50 values are sampled.
Your CSV must be space delimited.
The output is saved as path_to_your_csv.features.csv
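Because the input must be space delimited, it can help to see how such a file is produced. The sketch below writes a small space-delimited CSV with the standard library and derives the output name according to the last list item above. Both helper functions are hypothetical, and the naming rule in `default_output_name` is an assumption inferred from that list item rather than a quotation of the script.

```python
import csv
import os
import tempfile


def write_space_delimited(path, rows):
    """Write rows to path with a single space between values."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter=" ").writerows(rows)


def default_output_name(input_path):
    """Assumed naming rule: replace the extension with '.features.csv'."""
    return os.path.splitext(input_path)[0] + ".features.csv"


with tempfile.TemporaryDirectory() as tmp_dir:
    input_path = os.path.join(tmp_dir, "path_to_your_csv.csv")
    write_space_delimited(input_path, [[1.0, 2.5, 3.0], [4.0, 5.5, 6.0]])
    with open(input_path) as handle:
        first_line = handle.readline().strip()

output_name = default_output_name("path_to_your_csv.csv")
```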

tsfresh.scripts.run_tsfresh.main(console_args=None)[source]

tsfresh.scripts.test_timing module

tsfresh.scripts.test_timing.measure_temporal_complexity()[source]

tsfresh.scripts.test_timing.plot_results()[source]

tsfresh.scripts.test_timing.simulate_with_length(length, df)[source]
