Commit c2b87ae
Merge pull request #82 from VEZY/attributes-in-table -> Columnar Attribute Backend + Traversal Integration
### Summary
This PR migrates MTG attributes to a **typed, per-symbol columnar backend** and updates traversal/query paths to leverage typed data while preserving current semantics (`nothing` in traversal outputs for absent attributes, `ignore_nothing` behavior, etc.).
---
### 1. New attribute backend (table-like, per symbol)
Implemented core types:
- `Column{T}`
- `SymbolBucket`
- `MTGAttributeStore`
- `NodeAttrRef`
- `ColumnarAttrs`
- `ColumnarStore` (marker type)
Design:
- Attributes are stored in typed columns (`Vector{T}`) grouped by symbol (e.g. `:Leaf`, `:Internode`).
- Node-to-row and row-to-node mappings provide O(1) access from node to its data row.
- This improves type stability and memory locality compared to per-node `Dict{...,Any}`.
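The layout described above can be illustrated with a toy, dependency-free sketch. `ToyBucket`, `push_node!`, and `get_attr` are hypothetical names used only for illustration; they are not the `SymbolBucket` or `Column{T}` types from this PR, whose internals are not shown here.

```julia
# Toy sketch only: hypothetical names, not the types added in this PR.
struct ToyBucket
    columns::Dict{Symbol,AbstractVector}   # one concretely typed vector per attribute
    node_to_row::Dict{Int,Int}             # node id -> row index
    row_to_node::Vector{Int}               # row index -> node id
end

ToyBucket() = ToyBucket(Dict{Symbol,AbstractVector}(), Dict{Int,Int}(), Int[])

# Append one row for `node_id`; `attrs` must supply a value for every column.
function push_node!(b::ToyBucket, node_id::Int; attrs...)
    push!(b.row_to_node, node_id)
    b.node_to_row[node_id] = length(b.row_to_node)
    for (key, val) in attrs
        push!(b.columns[key], val)
    end
    return b
end

# Node id -> row -> value: two constant-time lookups.
get_attr(b::ToyBucket, node_id::Int, key::Symbol) =
    b.columns[key][b.node_to_row[node_id]]

leaf = ToyBucket()
leaf.columns[:area] = Float64[]      # typed column, not Vector{Any}
leaf.columns[:n_leaflets] = Int[]
push_node!(leaf, 7; area=12.5, n_leaflets=3)
push_node!(leaf, 9; area=8.0, n_leaflets=5)

area_of_9 = get_attr(leaf, 9, :area)
```

Because each column is a `Vector{Float64}` or `Vector{Int}` rather than a `Dict` of `Any` values, code that works on one column at a time can be specialized by the compiler, and values of the same attribute sit next to each other in memory.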
---
### 2. Default backend switched to columnar
- `read_mtg` now always uses `ColumnarAttrs`.
- Parsed MTG attributes are converted into the typed columnar store by default.
- Manual node construction still accepts `Dict`/`NamedTuple` input, which is auto-converted to `ColumnarAttrs`.
---
### 3. New explicit attribute API
Added/standardized:
- `attribute(node, key; default=nothing)`
- `attribute!(node, key, value)`
- `attributes(node; format=:namedtuple | :dict)`
- `attribute_names(node)`
Schema operations (per symbol):
- `add_column!(mtg, symbol, key, T; default=...)`
- `drop_column!(mtg, symbol, key)`
- `rename_column!(mtg, symbol, from, to)`
Note: schema changes are **symbol-bucket scoped** (apply to all nodes of that symbol).
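To make "symbol-bucket scoped" concrete, here is a minimal sketch on a toy structure. `ToySchema` and the `toy_*` functions are hypothetical stand-ins, not the `add_column!`/`drop_column!`/`rename_column!` implementations from this PR; the sketch only shows the idea that a schema change touches one column for every row of a symbol at once.

```julia
# Hypothetical stand-in for one symbol's bucket; not the PR's SymbolBucket.
struct ToySchema
    nrows::Int
    columns::Dict{Symbol,AbstractVector}
end

# Add a typed column and fill every existing row with `default`.
function toy_add_column!(s::ToySchema, key::Symbol, ::Type{T}; default) where {T}
    haskey(s.columns, key) && error("column $key already exists")
    s.columns[key] = T[default for _ in 1:s.nrows]
    return s
end

# Remove a column for all rows of this symbol.
function toy_drop_column!(s::ToySchema, key::Symbol)
    delete!(s.columns, key)
    return s
end

# Rename a column without copying its data.
function toy_rename_column!(s::ToySchema, from::Symbol, to::Symbol)
    haskey(s.columns, to) && error("column $to already exists")
    s.columns[to] = pop!(s.columns, from)
    return s
end

leaf = ToySchema(3, Dict{Symbol,AbstractVector}())   # three Leaf nodes
toy_add_column!(leaf, :area, Float64; default=0.0)
toy_add_column!(leaf, :tmp, Int; default=0)
toy_rename_column!(leaf, :area, :leaf_area)
toy_drop_column!(leaf, :tmp)
```

In this sketch a bucket for another symbol, such as `:Internode`, would be a separate `ToySchema` and would not be affected by these calls.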
---
### 4. Tables.jl integration
Added tabular sources:
- `symbol_table(mtg, symbol)` (per-symbol view)
- `mtg_table(mtg)` (unified view in traversal order)
Behavior:
- Unified view uses `missing` for attributes absent on some symbols.
- Works directly with `DataFrame(...)`.
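The `missing`-filling behavior of a unified view can be sketched without any packages. Everything below is illustrative: `unified_rows` is a hypothetical helper, the data are made up, and sorting by node id is only a stand-in for traversal order. It is not how `mtg_table` is implemented.

```julia
# Hypothetical per-symbol data; names and values are illustrative only.
leaf      = (node = [3, 5], area = [12.5, 8.0])
internode = (node = [2, 4], len = [4.0, 6.5])

# Build rows that all share the union of the column names, using `missing`
# where a symbol has no such column.
function unified_rows(tables)
    keys_all = Symbol[]
    for (_, cols) in tables, k in keys(cols)
        k in keys_all || push!(keys_all, k)
    end
    rows = NamedTuple[]
    for (symbol, cols) in tables
        for i in eachindex(cols.node)
            vals = [haskey(cols, k) ? cols[k][i] : missing for k in keys_all]
            attrs = NamedTuple{Tuple(keys_all)}(Tuple(vals))
            push!(rows, merge((symbol = symbol,), attrs))
        end
    end
    # Assumption: node ids stand in for traversal order in this toy example.
    sort!(rows; by = r -> r.node)
    return rows
end

rows = unified_rows([:Leaf => leaf, :Internode => internode])
```

Each resulting row has the fields `symbol`, `node`, `area`, and `len`; Leaf rows carry `missing` for `len` and Internode rows carry `missing` for `area`.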
---
### 5. Descendants/ancestors: typed inference + optimized paths
- Return eltypes are now inferred from typed columns.
- The `type=` keyword argument of `descendants`/`ancestors` is deprecated.
- `ignore_nothing` semantics preserved.
Implemented:
- Multi-key descendants (`descendants(node, [:k1, :k2], ...)`) with correct grouped row output and key order.
- Parity between the in-place and allocating variants.
- Fast columnar lookup path (`unsafe_getindex` with query plan).
---
### 6. Hybrid descendants strategy
Added strategy control:
- `descendants_strategy(mtg)`
- `descendants_strategy!(mtg, :pointer | :indexed | :auto)`
Backend behavior:
- `:pointer`: direct tree traversal (as before)
- `:indexed`: DFS interval index (`tin/tout/dfs_order`) for fast repeated subtree queries
- `:auto`: switches based on query/mutation dynamics (the default)
Structural mutations mark the index as dirty and trigger a rebuild when needed.
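The idea behind a DFS interval index with a dirty flag can be sketched as follows. This is a toy under stated assumptions: `ToyIndex`, `rebuild!`, `toy_descendants`, and `add_child!` are hypothetical names, the tree is a plain child-list array, and the `:auto` switching heuristics are not modeled. Only the field names `tin`, `tout`, and `dfs_order` are taken from the description above.

```julia
# Toy tree stored as child lists; node ids are 1..n. Hypothetical names.
mutable struct ToyIndex
    children::Vector{Vector{Int}}
    tin::Vector{Int}         # position of each node in dfs_order
    tout::Vector{Int}        # position of the last node in its subtree
    dfs_order::Vector{Int}   # nodes in pre-order
    dirty::Bool
end

ToyIndex(children) = ToyIndex(children, Int[], Int[], Int[], true)

# Rebuild tin/tout/dfs_order with a pre-order DFS from `root`.
function rebuild!(ix::ToyIndex, root::Int=1)
    n = length(ix.children)
    ix.tin = zeros(Int, n)
    ix.tout = zeros(Int, n)
    empty!(ix.dfs_order)
    function visit(v)
        push!(ix.dfs_order, v)
        ix.tin[v] = length(ix.dfs_order)
        foreach(visit, ix.children[v])
        ix.tout[v] = length(ix.dfs_order)
    end
    visit(root)
    ix.dirty = false
    return ix
end

# Descendants of v are a contiguous slice of dfs_order (v itself excluded).
function toy_descendants(ix::ToyIndex, v::Int)
    ix.dirty && rebuild!(ix)
    return ix.dfs_order[ix.tin[v]+1:ix.tout[v]]
end

# Structural mutation: append a new node under `parent`, mark the index stale.
function add_child!(ix::ToyIndex, parent::Int)
    push!(ix.children, Int[])
    new_id = length(ix.children)
    push!(ix.children[parent], new_id)
    ix.dirty = true
    return new_id
end

ix = ToyIndex([[2, 5], [3, 4], Int[], Int[], Int[]])
before = toy_descendants(ix, 2)   # builds the index on first use
new_id = add_child!(ix, 3)        # marks the index dirty
after = toy_descendants(ix, 2)    # triggers one rebuild, then slices
```

The trade-off the sketch shows is that repeated queries between mutations cost only a slice, while every structural mutation forces a full rebuild on the next query, which is why a pointer-based traversal can still be preferable for growth-heavy workloads.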
---
### 7. Structural mutation integration
Updated mutation paths to keep store/index consistent:
- insert binds newly created nodes into the columnar store
- delete/prune removes nodes from store rows
- node structure changes mark subtree index dirty
This keeps traversal and attribute access coherent after growth operations.
---
### 8. Benchmark suite expansion (ASV)
Benchmarks now cover:
- traversal updates (node indexing vs explicit attribute API)
- descendants/ancestors one-key and multi-key
- mixed-symbol queries with `ignore_nothing=true/false`
- repeated small queries (`children`, `parent`, `ancestors`, `descendants`)
- mutation API surface (`insert/delete/prune/select/transform/write`)
- table conversion (`symbol_table`, `mtg_table`)
The benchmark harness was also made robust across package versions with respect to `Symbol` vs. `String` filters and keys.
---
### 9. Tests/docs updates
Tests added/extended for:
- columnar backend correctness
- schema operations add/drop/rename
- descendants multi-key correctness and parity
- traversal strategy behavior
- Tables.jl views and DataFrame conversion
Docs updated to explain:
- new attribute API
- columnar backend behavior
- strategy modes (`:pointer`, `:indexed`, `:auto`)
- performance guidance for growth-heavy vs query-heavy workloads
---
### Important notes for reviewers
- The large speedups observed in some `descendants` workloads are real; they come from typed, multi-attribute columnar retrieval.
- Mutation-heavy and some short-query workloads are still areas of active optimization.
- Public traversal semantics were preserved (notably `nothing` handling in mixed-symbol traversals).

46 files changed
Lines changed: 2165 additions & 464 deletions
File tree
- benchmark
- docs/src
- the_mtg
- tutorials
- src
- compute_MTG
- filter
- conversion
- read_MTG
- types
- test