
Commit e403bb2: init (root commit, 0 parents)

15 files changed: 3647 additions, 0 deletions

README.md (new file: 40 additions, 0 deletions)

# Sleepiness Data Processing Library

A high-performance Python library for processing large JSON data streams from clinical studies.

## Features

- **Streaming JSON Parsing**: Uses `ijson` to handle very large JSON files with a minimal memory footprint.
- **Schema Discovery**: Scans the data and automatically infers its schema.
- **Data Grouping**: Groups data by any field (e.g., data type, device ID) into separate files.
- **Export**: Exports filtered data to JSON.
- **Rich & Tqdm**: Formatted terminal output and progress bars.
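
Schema discovery over a stream can be done in a single pass. The following is a minimal, stdlib-only sketch, not the library's implementation: it records every dotted field path together with the JSON types observed at that path. The names `json_type` and `infer_schema` and the sample records are hypothetical; arrays are recorded as a type but not descended into.

```python
import numbers
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def json_type(value: Any) -> str:
    """Map a decoded JSON value to its JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before numbers: bool is a subclass of int
        return "boolean"
    if isinstance(value, numbers.Number):  # int, float, and Decimal (ijson's default for non-integers)
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Single pass over the records: map each dotted field path to the JSON types seen there."""
    schema: Dict[str, Set[str]] = defaultdict(set)

    def walk(value: Any, path: str) -> None:
        schema[path].add(json_type(value))
        if isinstance(value, dict):  # descend into objects only; arrays are not descended into
            for key, child in value.items():
                walk(child, f"{path}.{key}")

    for record in records:
        for key, value in record.items():
            walk(value, key)
    return dict(schema)


if __name__ == "__main__":
    # Hypothetical records; the real data-stream layout may differ.
    records = [
        {"dataStream": {"dataType": {"name": "dk.cachet.carp.heartbeat"}}, "value": 72},
        {"dataStream": {"dataType": {"name": "example.other_type"}}, "value": None},
    ]
    for path, types in sorted(infer_schema(records).items()):
        print(path, sorted(types))
    # Prints:
    # dataStream ['object']
    # dataStream.dataType ['object']
    # dataStream.dataType.name ['string']
    # value ['null', 'number']
```

Because `infer_schema` consumes its input one record at a time, it can be fed a generator such as the one returned by `ijson.items(f, "item")` for a file whose top level is an array, without loading the whole file into memory.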

## Installation

From the repository root:

```bash
pip install .
```

## Usage

```python
from sleepiness import SleepinessData

# Initialize with a large JSON file
sd = SleepinessData("data/phase-1-1/data-streams.json")

# Scan the data and print the inferred schema
sd.print_schema()

# Group records by data type into separate files (written to output_by_type)
sd.group_by_field("dataStream.dataType.name", "output_by_type")

# Export only the records of a specific data type to a single JSON file
sd.export_to_json("heartbeat_data.json", data_type="dk.cachet.carp.heartbeat")
```
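
`group_by_field` takes a dotted path into each nested record, and `export_to_json` filters records by data type. The following is a minimal stdlib sketch of how grouping and filtered export can be done in a streaming fashion; it is not the library's code. The helpers `get_path`, `group_to_files`, and `export_filtered`, the JSON Lines output format for groups, and the sample record layout are all assumptions for illustration.

```python
import json
import tempfile
from collections import defaultdict
from contextlib import ExitStack
from pathlib import Path
from typing import Any, Dict, Iterable, TextIO, Union


def get_path(record: Dict[str, Any], dotted: str) -> Any:
    """Resolve a dotted path like 'dataStream.dataType.name'; return None if any key is missing."""
    value: Any = record
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def group_to_files(records: Iterable[Dict[str, Any]], field: str, out_dir: Union[str, Path]) -> Dict[str, int]:
    """Write each record as one JSON line to <out_dir>/<value>.jsonl, one file per distinct value.

    Records are consumed one at a time, so only one open handle per group is kept.
    Returns the number of records written per value.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    counts: Dict[str, int] = defaultdict(int)
    handles: Dict[str, TextIO] = {}
    with ExitStack() as stack:  # closes every handle on exit, even on error
        for record in records:
            value = str(get_path(record, field))  # records missing the field go to "None.jsonl"
            safe = value.replace("/", "_")  # keep the value usable as a file name
            if safe not in handles:
                handles[safe] = stack.enter_context(open(out / f"{safe}.jsonl", "w", encoding="utf-8"))
            handles[safe].write(json.dumps(record) + "\n")
            counts[value] += 1
    return dict(counts)


def export_filtered(records: Iterable[Dict[str, Any]], field: str, wanted: Any, out_file: Union[str, Path]) -> int:
    """Stream records whose `field` equals `wanted` into one JSON array file; return the count written."""
    written = 0
    with open(out_file, "w", encoding="utf-8") as fh:
        fh.write("[")
        for record in records:
            if get_path(record, field) == wanted:
                fh.write(("," if written else "") + json.dumps(record))
                written += 1
        fh.write("]")
    return written


if __name__ == "__main__":
    # Hypothetical records; the real data-stream layout may differ.
    sample = [
        {"dataStream": {"dataType": {"name": "dk.cachet.carp.heartbeat"}}, "value": 72},
        {"dataStream": {"dataType": {"name": "example.other_type"}}, "value": 3},
        {"dataStream": {"dataType": {"name": "dk.cachet.carp.heartbeat"}}, "value": 75},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        print(group_to_files(sample, "dataStream.dataType.name", Path(tmp) / "output_by_type"))
        print(export_filtered(sample, "dataStream.dataType.name",
                              "dk.cachet.carp.heartbeat", Path(tmp) / "heartbeat_data.json"))
```

Both functions accept any iterable, so a generator of records from a streaming parser can be passed in directly. Grouping keeps one file handle open per distinct value, which may need a cap when a field (such as a device ID) has many distinct values.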

## Project Structure

- `src/sleepiness/reader.py`: Core logic for streaming and processing JSON data.
