Context
When working with muxed packet files generated from ground testing data, we often have to inefficiently parse thousands of files, filtering for the APID of interest. This is quite slow because create_dataset parses each file serially.
Driving Requirements
Support an optional n_workers kwarg to create_dataset to enable multiprocessing of input packet files.
Implementation Requirements
To avoid the overhead of creating many processes, multiprocessing inside create_dataset should spin up n_workers processes (or n_files processes if n_files < n_workers) and send roughly an equal number of packet files to each worker process.
This parallelization should apply only to the parsing loop itself, with all post-processing to numpy data types occurring afterwards in the main process.
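The two requirements above can be sketched as follows. This is a rough illustration, not the real implementation: parse_file is a hypothetical stand-in for the per-file parsing step, and the chunking strategy is an assumption.

```python
# Sketch of the proposed parallel parsing loop (assumed names, not real code).
import itertools
from multiprocessing import Pool


def parse_file(path):
    """Hypothetical stand-in for per-file packet parsing (APID filtering)."""
    # Real code would parse packets from `path` and return raw field values.
    return [{"source": path, "packet": i} for i in range(2)]


def parse_files_parallel(packet_files, n_workers=None):
    """Parse files in parallel; numpy post-processing happens after merging."""
    if not n_workers or n_workers <= 1:
        # Serial fallback preserves the current behavior.
        per_file_results = [parse_file(f) for f in packet_files]
    else:
        # Never spin up more processes than there are files to parse.
        n_procs = min(n_workers, len(packet_files))
        with Pool(processes=n_procs) as pool:
            # chunksize spreads the files roughly evenly across workers.
            chunksize = max(1, len(packet_files) // n_procs)
            per_file_results = pool.map(parse_file, packet_files,
                                        chunksize=chunksize)
    # Flatten per-file results back into one packet list, preserving file order,
    # so downstream numpy post-processing is unchanged.
    return list(itertools.chain.from_iterable(per_file_results))
```

Because the workers only run the parsing loop, the merged list can be handed to the existing numpy post-processing unchanged.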
Considerations
Make sure we aren't forgetting about any consistency checking that occurs in the packet parsing loop (e.g. duplicate detection or anything like that). I don't think this will be an issue, but it is good to think about: if each worker only sees its own subset of files, any check that spans files must run on the merged result.
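One way to keep such a check correct under parallelism is to run it after merging the per-worker results. The sketch below assumes duplicates are identified by an (APID, sequence count) pair; the actual key and packet structure would depend on what the parsing loop tracks today.

```python
# Sketch of a post-merge duplicate check (assumed key, not the real one).
def check_duplicates(packets):
    """Return (apid, seq_count) keys seen more than once in the merged list.

    When workers parse disjoint sets of files, a per-worker check would miss
    duplicates that span files, so this must run on the combined result.
    """
    seen = set()
    duplicates = []
    for pkt in packets:
        key = (pkt["apid"], pkt["seq_count"])
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

Running this once on the merged list keeps the semantics identical to the current serial loop, at the cost of a single extra pass over the packets.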