Commit 61319c1

feat: Add synthetic time-series data generator
This commit introduces a new generator for creating synthetic longitudinal (time-series) data, which is essential for testing the time-series capabilities of the ml-grid pipeline.

Key changes include:

- SyntheticTSDataGenerator class: A new class dedicated to generating long-format time-series data. It creates a DataFrame with client_idcode and timestamp columns, where each patient has a specified number of timepoints.
- Realistic Data Structure: The generator mirrors the structure of real-world longitudinal data, making it suitable for end-to-end pipeline testing. It reuses feature naming conventions from the existing SyntheticDataGenerator for consistency.
- generate_synthetic_ts_data function: A convenience wrapper function is added for easy instantiation and use of the new generator.
- Example Usage: The if __name__ == "__main__" block is updated to include a comprehensive example of generating, imputing, and saving a synthetic time-series dataset, alongside the existing tabular data example. This demonstrates its usage and integration with existing utility functions like mean_impute_dataframe.
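For readers without the diff at hand, the long-format layout described above can be sketched roughly as follows. This is an illustrative stand-in, not the code added in this commit: the function name sketch_synthetic_ts_data, its parameters, the feature_N column names, and the daily timestamp spacing are all assumptions. Only the client_idcode and timestamp column names come from the commit message. The mean fill at the end uses plain pandas in place of mean_impute_dataframe, whose signature is not shown here.

```python
import numpy as np
import pandas as pd


def sketch_synthetic_ts_data(n_patients=5, n_timepoints=4, n_features=3,
                             missing_frac=0.1, seed=0):
    """Hypothetical sketch of a long-format time-series generator.

    Not the SyntheticTSDataGenerator from the commit; names and defaults
    are assumptions chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    n_rows = n_patients * n_timepoints

    # Long format: one row per (patient, timepoint) pair.
    client_ids = np.repeat(
        [f"P{i:04d}" for i in range(n_patients)], n_timepoints
    )

    # Assumed daily spacing, restarting from the same date for each patient.
    start = pd.Timestamp("2020-01-01")
    day_offsets = np.tile(np.arange(n_timepoints), n_patients)
    timestamps = start + pd.to_timedelta(day_offsets, unit="D")

    df = pd.DataFrame({"client_idcode": client_ids, "timestamp": timestamps})

    # Placeholder feature names; the real generator reuses the naming
    # conventions of SyntheticDataGenerator, which are not shown here.
    for j in range(n_features):
        values = rng.normal(size=n_rows)
        values[rng.random(n_rows) < missing_frac] = np.nan
        df[f"feature_{j}"] = values

    return df


df = sketch_synthetic_ts_data()

# Stand-in for mean_impute_dataframe: fill gaps with per-column means.
feature_cols = [c for c in df.columns if c.startswith("feature_")]
df[feature_cols] = df[feature_cols].fillna(df[feature_cols].mean())
```

Each patient contributes exactly n_timepoints rows, so the frame has n_patients * n_timepoints rows in total, which is the property a downstream time-series pipeline test would typically rely on.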
1 parent 07d4a77 commit 61319c1

1 file changed

Lines changed: 446 additions & 0 deletions
