v0.2.0

SamoraHunter released this 23 Mar 10:22

92c96f3

Release v0.2.0

Database Backend Implementation

This release introduces a database backend built on SQLAlchemy. It replaces the legacy file-based system as the default storage mechanism.

New Features

Database Support: Added support for SQLite (default) and PostgreSQL.
- Defaults to a local {project_name}.db SQLite database if no connection string is provided.
- Supports in-memory SQLite for testing.
Schema Management: The pipeline now handles automatic table creation and schema updates for:
- Raw Data: raw_data tables (e.g., raw_data.raw_bloods).
- Annotations: MedCAT annotations tables.
- Features: Feature vectors with JSON serialization for sparse/high-dimensional data.
Migration Utility: Added pat2vec/util/migrate_to_db.py to migrate existing file-based projects to the new database structure.
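As a rough illustration of the "JSON serialization for sparse/high-dimensional data" idea above, the sketch below stores one patient's feature vector as a JSON string in an in-memory SQLite database. It uses the standard-library sqlite3 module rather than SQLAlchemy to stay dependency-free, and the table name `features`, the column `feature_json`, and the helpers `store_features` and `load_features` are hypothetical, not taken from pat2vec.

```python
import json
import sqlite3


def store_features(conn, client_idcode, features):
    """Store one feature vector as JSON, keeping only non-zero entries."""
    # Hypothetical table layout; the real pat2vec schema may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        "client_idcode TEXT PRIMARY KEY, feature_json TEXT NOT NULL)"
    )
    # Dropping zero-valued entries keeps wide, sparse vectors compact.
    sparse = {name: value for name, value in features.items() if value}
    conn.execute(
        "INSERT OR REPLACE INTO features VALUES (?, ?)",
        (client_idcode, json.dumps(sparse)),
    )
    conn.commit()


def load_features(conn, client_idcode):
    """Return the stored feature dict, or an empty dict if none exists."""
    row = conn.execute(
        "SELECT feature_json FROM features WHERE client_idcode = ?",
        (client_idcode,),
    ).fetchone()
    return json.loads(row[0]) if row else {}


# In-memory SQLite, as one might use in tests.
conn = sqlite3.connect(":memory:")
store_features(conn, "P001", {"bmi_mean": 24.5, "hba1c_count": 0, "age": 61})
restored = load_features(conn, "P001")
```

A single JSON column avoids altering the table every time a new feature name appears, at the cost of not being able to filter on individual features in plain SQL.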

Configuration Changes

Added storage_backend option to config_class (values: 'database', 'file').
Added db_connection_string option to config_class.
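The interaction between these two options and the SQLite default can be sketched as follows. `StorageSettings` and `resolve_connection_string` are hypothetical stand-ins written for this example, not the actual config_class; the only details taken from the notes are the option names, the allowed values, and the `{project_name}.db` default. The `sqlite:///` prefix is the standard SQLAlchemy URL form for a relative SQLite file.

```python
from dataclasses import dataclass
from typing import Optional

VALID_BACKENDS = ("database", "file")


@dataclass
class StorageSettings:
    """Hypothetical stand-in for the storage fields of config_class."""

    project_name: str
    storage_backend: str = "database"  # the new default per this release
    db_connection_string: Optional[str] = None

    def __post_init__(self) -> None:
        if self.storage_backend not in VALID_BACKENDS:
            raise ValueError(
                f"storage_backend must be one of {VALID_BACKENDS}, "
                f"got {self.storage_backend!r}"
            )


def resolve_connection_string(settings: StorageSettings) -> Optional[str]:
    """Return the URL to connect with, or None for the file backend."""
    if settings.storage_backend == "file":
        return None
    if settings.db_connection_string:
        return settings.db_connection_string
    # Assumed fallback: a local SQLite file named after the project.
    return f"sqlite:///{settings.project_name}.db"


default_url = resolve_connection_string(StorageSettings("demo"))
memory_url = resolve_connection_string(
    StorageSettings("demo", db_connection_string="sqlite:///:memory:")
)
legacy_url = resolve_connection_string(
    StorageSettings("demo", storage_backend="file")
)
```

Under these assumptions, an explicit connection string always wins, and selecting the 'file' backend bypasses the database entirely.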

Technical Improvements

Centralized Data Retrieval: Implemented get_df_from_db and updated retrieve_patient_data to abstract data access.
Performance: Implemented batch insertion and automatic index creation on key columns (e.g., client_idcode, timestamps) to speed up writes and queries.
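A minimal sketch of the batch-insert-then-index pattern described above, again using the standard-library sqlite3 module instead of SQLAlchemy. The table `raw_bloods`, the columns `sample_time` and `value`, and the helpers `load_batch` and `rows_for_patient` are assumptions made for illustration; only `client_idcode` comes from the notes.

```python
import sqlite3

ROWS = [
    ("P001", "2024-01-05T09:00:00", 5.1),
    ("P001", "2024-02-11T10:30:00", 5.4),
    ("P002", "2024-01-20T08:15:00", 6.0),
]


def load_batch(conn, rows):
    """Insert all rows in one transaction, then index the lookup columns."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_bloods ("
        "client_idcode TEXT NOT NULL, sample_time TEXT NOT NULL, value REAL)"
    )
    # One executemany call inside one transaction avoids a commit per row.
    with conn:
        conn.executemany("INSERT INTO raw_bloods VALUES (?, ?, ?)", rows)
    # Hypothetical index covering the patient id and timestamp columns.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS ix_raw_bloods_client_time "
        "ON raw_bloods (client_idcode, sample_time)"
    )


def rows_for_patient(conn, client_idcode):
    """Fetch one patient's rows in time order; the index serves this query."""
    return conn.execute(
        "SELECT sample_time, value FROM raw_bloods "
        "WHERE client_idcode = ? ORDER BY sample_time",
        (client_idcode,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_batch(conn, ROWS)
patient_rows = rows_for_patient(conn, "P001")
```

Creating the index after the bulk load, rather than before it, means the database builds the index once instead of updating it for every inserted row.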
