
v0.2.0


SamoraHunter released this on 23 Mar 10:22 (22 commits to main since this release).

Release v0.2.0

Database Backend Implementation

This release introduces a SQLAlchemy-based database backend, which replaces the legacy file-based system as the default storage mechanism.

New Features

  • Database Support: Added support for SQLite (default) and PostgreSQL.
    • Defaults to a local {project_name}.db SQLite database if no connection string is provided.
    • Supports in-memory SQLite for testing.
  • Schema Management: The pipeline now handles automatic table creation and schema updates for:
    • Raw Data: raw_data tables (e.g., raw_data.raw_bloods).
    • Annotations: MedCAT annotations tables.
    • Features: Feature vectors with JSON serialization for sparse/high-dimensional data.
  • Migration Utility: Added pat2vec/util/migrate_to_db.py to migrate existing file-based projects to the new database structure.
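As a rough illustration of two of the features above (automatic table creation and JSON serialization of sparse feature vectors, run against in-memory SQLite as in testing), here is a minimal sketch. It uses Python's standard-library sqlite3 module instead of SQLAlchemy so it runs without extra dependencies, and the table name, column names, and helper functions are hypothetical, not pat2vec's actual schema or API.

```python
import json
import sqlite3


def create_features_table(conn: sqlite3.Connection) -> None:
    # Hypothetical layout for illustration; pat2vec's real features table may differ.
    # IF NOT EXISTS makes table creation safe to run on every pipeline start.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        " client_idcode TEXT NOT NULL,"
        " timestamp TEXT NOT NULL,"
        " feature_json TEXT NOT NULL)"
    )


def to_sparse_json(vector: dict) -> str:
    # Keep only non-zero entries so wide, mostly-empty vectors stay compact.
    sparse = {name: value for name, value in vector.items() if value != 0}
    return json.dumps(sparse, sort_keys=True)


def store_features(conn: sqlite3.Connection, client_idcode: str, timestamp: str, vector: dict) -> None:
    conn.execute(
        "INSERT INTO features VALUES (?, ?, ?)",
        (client_idcode, timestamp, to_sparse_json(vector)),
    )


def load_features(conn: sqlite3.Connection, client_idcode: str) -> list:
    # Deserialize each stored JSON blob back into a {feature_name: value} dict.
    rows = conn.execute(
        "SELECT timestamp, feature_json FROM features"
        " WHERE client_idcode = ? ORDER BY timestamp",
        (client_idcode,),
    ).fetchall()
    return [(ts, json.loads(blob)) for ts, blob in rows]


# In-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
create_features_table(conn)
store_features(conn, "P001", "2024-01-01", {"bmi": 24.5, "hb_mean": 0.0, "crp_max": 12.0})
print(load_features(conn, "P001"))
```

Storing the vector as a single JSON column avoids creating one SQL column per feature, which matters when the feature space is large and sparse; the trade-off is that individual features cannot be indexed or filtered directly in SQL.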

Configuration Changes

  • Added a storage_backend option to config_class (accepted values: 'database', 'file').
  • Added a db_connection_string option to config_class.
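A minimal sketch of how these two options could combine with the default described under New Features (a local {project_name}.db SQLite file when no connection string is given). StorageConfig and resolve_connection_string are hypothetical stand-ins, not config_class's real constructor or pat2vec's internal logic; the returned strings follow SQLAlchemy's URL format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageConfig:
    # Hypothetical stand-in for the relevant config_class options.
    project_name: str
    storage_backend: str = "database"  # 'database' (default) or 'file'
    db_connection_string: Optional[str] = None


def resolve_connection_string(cfg: StorageConfig) -> Optional[str]:
    """Return a SQLAlchemy-style URL, or None when the legacy file backend is selected."""
    if cfg.storage_backend == "file":
        return None
    if cfg.storage_backend != "database":
        raise ValueError(f"unknown storage_backend: {cfg.storage_backend!r}")
    if cfg.db_connection_string:
        # An explicit string wins, e.g. PostgreSQL or in-memory SQLite ("sqlite://").
        return cfg.db_connection_string
    # Fall back to a local SQLite file named after the project.
    return f"sqlite:///{cfg.project_name}.db"


print(resolve_connection_string(StorageConfig("my_project")))
print(resolve_connection_string(
    StorageConfig("my_project", db_connection_string="postgresql://user:pass@localhost:5432/pat2vec")
))
```

With no connection string this yields sqlite:///my_project.db; the PostgreSQL URL shown is a placeholder, not a real deployment setting.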

Technical Improvements

  • Centralized Data Retrieval: Implemented get_df_from_db and updated retrieve_patient_data to abstract data access.
  • Performance: Implemented batch insertion and automatic index creation on primary keys (e.g., client_idcode, timestamps) to improve query performance.
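The batch-insertion and indexing ideas can be sketched as follows, again with the standard-library sqlite3 module standing in for SQLAlchemy. The raw_bloods table layout, the index name, and the helpers (including fetch_patient_rows, which only loosely mirrors what a function like get_df_from_db might do) are illustrative assumptions, not pat2vec's code.

```python
import sqlite3
from typing import Iterable, Sequence


def create_raw_table(conn: sqlite3.Connection, table: str) -> None:
    # Illustrative schema. Table names are interpolated, so pass trusted names only.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        " client_idcode TEXT NOT NULL,"
        " timestamp TEXT NOT NULL,"
        " test_name TEXT,"
        " value REAL)"
    )
    # Composite index on the lookup keys speeds up per-patient, time-bounded queries.
    conn.execute(
        f"CREATE INDEX IF NOT EXISTS idx_{table}_client_ts"
        f" ON {table} (client_idcode, timestamp)"
    )


def insert_batches(conn: sqlite3.Connection, table: str,
                   rows: Iterable[Sequence], batch_size: int = 1000) -> int:
    """Insert rows in fixed-size executemany batches within one transaction."""
    sql = f"INSERT INTO {table} VALUES (?, ?, ?, ?)"
    total = 0
    batch = []
    with conn:  # commits on success, rolls back if any batch fails
        for row in rows:
            batch.append(row)
            if len(batch) >= batch_size:
                conn.executemany(sql, batch)
                total += len(batch)
                batch.clear()
        if batch:  # flush the final partial batch
            conn.executemany(sql, batch)
            total += len(batch)
    return total


def fetch_patient_rows(conn: sqlite3.Connection, table: str,
                       client_idcode: str, start: str, end: str) -> list:
    # Parameterised query whose WHERE clause matches the index columns.
    return conn.execute(
        f"SELECT timestamp, test_name, value FROM {table}"
        " WHERE client_idcode = ? AND timestamp BETWEEN ? AND ?"
        " ORDER BY timestamp",
        (client_idcode, start, end),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_raw_table(conn, "raw_bloods")
rows = [("P001", f"2024-01-{d:02d}", "haemoglobin", 130.0 + d) for d in range(1, 11)]
inserted = insert_batches(conn, "raw_bloods", rows, batch_size=4)
print(inserted)
print(fetch_patient_rows(conn, "raw_bloods", "P001", "2024-01-03", "2024-01-05"))
```

Batching keeps memory bounded for large extracts while still paying the transaction commit cost only once; the index ordering (client_idcode first, then timestamp) suits queries that filter by patient and then by a time window.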