v0.2.0
Release v0.2.0
Database Backend Implementation
This release introduces a robust database backend using SQLAlchemy, which replaces the legacy file-based system as the default storage mechanism.
New Features
- Database Support: Added support for SQLite (default) and PostgreSQL.
  - Defaults to a local {project_name}.db SQLite database if no connection string is provided.
  - Supports in-memory SQLite for testing.
- Schema Management: The pipeline now handles automatic table creation and schema updates for:
  - Raw Data: raw_data tables (e.g., raw_data.raw_bloods).
  - Annotations: MedCAT annotations tables.
  - Features: Feature vectors with JSON serialization for sparse/high-dimensional data.
- Migration Utility: Added pat2vec/util/migrate_to_db.py to migrate existing file-based projects to the new database structure.
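
As a rough illustration of the JSON-serialized feature storage described above, the sketch below stores a sparse feature vector in an in-memory SQLite database. It uses Python's standard-library sqlite3 module rather than SQLAlchemy to stay dependency-free, and the table name `features`, the column `feature_json`, the helper functions, and the sample values are all hypothetical; they are not taken from the pat2vec code base.

```python
import json
import sqlite3


def encode_features(features):
    """Serialize a feature dict to JSON, dropping zero-valued entries.

    Hypothetical helper: omitting zeros keeps sparse vectors compact.
    """
    return json.dumps({k: v for k, v in features.items() if v != 0}, sort_keys=True)


def decode_features(payload):
    """Inverse of encode_features; absent keys are implicitly zero."""
    return json.loads(payload)


# In-memory SQLite database, as one might use in a test.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS features ("
    "client_idcode TEXT NOT NULL, "
    "feature_json TEXT NOT NULL)"
)

# Made-up example row; the feature names are illustrative only.
row = {"age": 54, "bloods_hb_mean": 13.2, "annotation_count": 0}
conn.execute(
    "INSERT INTO features (client_idcode, feature_json) VALUES (?, ?)",
    ("P001", encode_features(row)),
)

stored = conn.execute(
    "SELECT feature_json FROM features WHERE client_idcode = ?", ("P001",)
).fetchone()[0]
restored = decode_features(stored)
```

Storing the vector as a single JSON text column avoids creating one database column per feature, which is one plausible reason to choose this layout for high-dimensional data.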
Configuration Changes
- Added storage_backend option to config_class (values: 'database', 'file').
- Added db_connection_string option to config_class.
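
The interaction between these two options and the SQLite default can be sketched as follows. `ExampleConfig` and `resolve_connection_string` are hypothetical stand-ins written for this illustration, not pat2vec's actual config_class or its internals; the URL forms follow SQLAlchemy's database URL conventions, and the assumption that the default file lives in the working directory is ours.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleConfig:
    """Hypothetical stand-in for config_class, reduced to the new options."""

    project_name: str
    storage_backend: str = "database"  # 'database' or 'file'
    db_connection_string: Optional[str] = None

    def __post_init__(self):
        if self.storage_backend not in ("database", "file"):
            raise ValueError(
                f"storage_backend must be 'database' or 'file', got {self.storage_backend!r}"
            )


def resolve_connection_string(cfg):
    """Return the connection URL to use, or None for the file backend.

    Assumption: with no explicit string, fall back to a local SQLite file
    named after the project.
    """
    if cfg.storage_backend != "database":
        return None
    if cfg.db_connection_string:
        return cfg.db_connection_string
    return f"sqlite:///{cfg.project_name}.db"


default_url = resolve_connection_string(ExampleConfig("demo"))
memory_url = resolve_connection_string(
    ExampleConfig("demo", db_connection_string="sqlite:///:memory:")
)
file_url = resolve_connection_string(ExampleConfig("demo", storage_backend="file"))
```

A PostgreSQL deployment would pass a URL of the form `postgresql://user:password@host/dbname` in the same way.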
Technical Improvements
- Centralized Data Retrieval: Implemented get_df_from_db and updated retrieve_patient_data to abstract data access.
- Performance: Implemented batch insertion and automatic index creation on primary keys (e.g., client_idcode, timestamps) to improve query performance.
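
To make the batch-insertion and indexing idea concrete, here is a minimal sketch using Python's standard-library sqlite3 module in place of SQLAlchemy. The table layout, the column names other than client_idcode, the index name, the batch size, and the generated rows are all assumptions made for illustration; the release's actual implementation may differ.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size=500):
    """Insert rows in fixed-size chunks, committing once per chunk.

    Hypothetical helper: one executemany call per batch avoids a
    round trip and a commit for every individual row.
    """
    sql = (
        "INSERT INTO raw_bloods (client_idcode, sample_timestamp, value) "
        "VALUES (?, ?, ?)"
    )
    for start in range(0, len(rows), batch_size):
        with conn:  # commits on success, rolls back on error
            conn.executemany(sql, rows[start:start + batch_size])


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_bloods ("
    "client_idcode TEXT, sample_timestamp TEXT, value REAL)"
)
# Index the columns that lookups are expected to filter on.
conn.execute(
    "CREATE INDEX IF NOT EXISTS ix_raw_bloods_client_time "
    "ON raw_bloods (client_idcode, sample_timestamp)"
)

# Synthetic rows, generated only to exercise the helper.
rows = [
    (f"P{i % 10:03d}", f"2024-01-{(i % 28) + 1:02d}", float(i))
    for i in range(1200)
]
insert_in_batches(conn, rows)

row_count = conn.execute("SELECT COUNT(*) FROM raw_bloods").fetchone()[0]
index_names = [
    r[0]
    for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    )
]
```

Using `IF NOT EXISTS` keeps index creation idempotent, so it can run safely every time the pipeline starts.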