|
1 | 1 | # VirtualDB |
2 | 2 |
|
3 | | -VirtualDB provides a unified query interface across heterogeneous datasets with |
4 | | -different experimental condition structures and terminologies. Each dataset |
5 | | -defines experimental conditions in its own way, with properties stored at |
6 | | -different hierarchy levels (repository, dataset, or field) and using different |
7 | | -naming conventions. VirtualDB uses an external YAML configuration to map these |
8 | | -varying structures to a common schema, normalize factor level names (e.g., |
9 | | -"D-glucose", "dextrose", "glu" all become "glucose"), and enable cross-dataset |
10 | | -queries with standardized field names and values. |
| 3 | +VirtualDB provides a SQL query interface across heterogeneous HuggingFace |
| 4 | +datasets using an in-memory DuckDB database. Each dataset defines experimental |
| 5 | +conditions in its own way, with properties stored at different hierarchy levels |
| 6 | +(repository, dataset, or field) and using different naming conventions. |
| 7 | +VirtualDB uses an external YAML configuration to map these varying structures |
| 8 | +to a common schema, normalize factor level names (e.g., "D-glucose", |
| 9 | +"dextrose", "glu" all become "glucose"), and enable cross-dataset queries with |
| 10 | +standardized field names and values. |
11 | 11 |
|
12 | | -## API Reference |
| 12 | +For primary datasets, VirtualDB creates: |
13 | 13 |
|
14 | | -::: tfbpapi.virtual_db.VirtualDB |
15 | | - options: |
16 | | - show_root_heading: true |
17 | | - show_source: true |
| 14 | +- **`<db_name>_meta`** -- one row per sample with derived metadata columns |
| 15 | +- **`<db_name>`** -- full measurement-level data joined to the metadata view |
18 | 16 |
|
19 | | -### Helper Functions |
| 17 | +For comparative analysis datasets, VirtualDB creates: |
20 | 18 |
|
21 | | -::: tfbpapi.virtual_db.get_nested_value |
22 | | - options: |
23 | | - show_root_heading: true |
| 19 | +- **`<db_name>_expanded`** -- the raw data with composite ID fields parsed |
| 20 | + into `<link_field>_source` (aliased to configured `db_name`) and |
| 21 | + `<link_field>_id` (sample_id) columns |
| 22 | + |
| 23 | +See the [configuration guide](virtual_db_configuration.md) for setup details |
| 24 | +and the [tutorial](tutorials/virtual_db_tutorial.ipynb) for usage examples. |
| 25 | + |
| 26 | +## API Reference |
24 | 27 |
|
25 | | -::: tfbpapi.virtual_db.normalize_value |
| 28 | +::: tfbpapi.virtual_db.VirtualDB |
26 | 29 | options: |
27 | 30 | show_root_heading: true |
| 31 | + show_source: true |
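
The factor-level normalization described in the new overview text ("D-glucose", "dextrose", "glu" all become "glucose") can be illustrated with a minimal sketch. This is not VirtualDB's implementation: the alias table below stands in for the external YAML configuration, whose schema is not shown in this diff, and `FACTOR_ALIASES`, `build_lookup`, and `normalize_factor_level` are hypothetical names chosen for illustration. Case-insensitive matching is likewise an assumption.

```python
# Hypothetical alias table; the real mapping lives in VirtualDB's external
# YAML configuration, whose exact structure is not shown in this diff.
FACTOR_ALIASES = {
    "glucose": ["D-glucose", "dextrose", "glu"],
}


def build_lookup(aliases):
    """Invert {canonical: [alias, ...]} into a lowercase alias -> canonical map."""
    lookup = {}
    for canonical, names in aliases.items():
        # The canonical name maps to itself so already-normalized values pass through.
        lookup[canonical.lower()] = canonical
        for name in names:
            lookup[name.lower()] = canonical
    return lookup


def normalize_factor_level(value, lookup):
    """Return the canonical name for value, or value unchanged if it is unknown."""
    return lookup.get(value.strip().lower(), value)


LOOKUP = build_lookup(FACTOR_ALIASES)
raw_levels = ["D-glucose", "dextrose", "glu", "galactose"]
normalized = [normalize_factor_level(v, LOOKUP) for v in raw_levels]
```

Unknown values such as "galactose" are returned unchanged in this sketch, so a missing alias does not silently drop data. Whether VirtualDB behaves the same way is not stated in the diff.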