
Commit c2b87ae

Merge pull request #82 from VEZY/attributes-in-table

Columnar Attribute Backend + Traversal Integration
### Summary

This PR migrates MTG attributes to a **typed, per-symbol columnar backend** and updates traversal/query paths to leverage typed data while preserving current semantics (`nothing` in traversal outputs for absent attributes, `ignore_nothing` behavior, etc.).

---

### 1. New attribute backend (table-like, per symbol)

Implemented core types:

- `Column{T}`
- `SymbolBucket`
- `MTGAttributeStore`
- `NodeAttrRef`
- `ColumnarAttrs`
- `ColumnarStore` (marker type)

Design:

- Attributes are stored in typed columns (`Vector{T}`) grouped by symbol (e.g. `:Leaf`, `:Internode`).
- Node-to-row and row-to-node mappings provide O(1) access from a node to its data row.
- This improves type stability and memory locality compared to per-node `Dict{...,Any}` storage.

---

### 2. Default backend switched to columnar

- `read_mtg` now always uses `ColumnarAttrs`.
- Parsed MTG attributes are converted into the typed columnar store by default.
- Manual node construction still accepts `Dict`/`NamedTuple` input, which is automatically converted to `ColumnarAttrs`.

---

### 3. New explicit attribute API

Added/standardized:

- `attribute(node, key; default=nothing)`
- `attribute!(node, key, value)`
- `attributes(node; format=:namedtuple | :dict)`
- `attribute_names(node)`

Schema operations (per symbol):

- `add_column!(mtg, symbol, key, T; default=...)`
- `drop_column!(mtg, symbol, key)`
- `rename_column!(mtg, symbol, from, to)`

Note: schema changes are **scoped to a symbol bucket**: they apply to all nodes of that symbol.

---

### 4. Tables.jl integration

Added tabular sources:

- `symbol_table(mtg, symbol)` (per-symbol view)
- `mtg_table(mtg)` (unified view in traversal order)

Behavior:

- The unified view uses `missing` for attributes that are absent on some symbols.
- Works directly with `DataFrame(...)`.

---

### 5. Descendants/ancestors: typed inference + optimized paths

- Return eltypes are now inferred from the typed columns.
- The `type=` keyword for `descendants`/`ancestors` is deprecated.
- `ignore_nothing` semantics are preserved.
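To make the storage model and the `nothing` semantics concrete, here is a minimal Python sketch (Python rather than Julia; `AttributeStore`, `set_attribute` and `collect` are hypothetical names, not the package API). Each symbol owns a bucket of columns, a node maps to a `(symbol, row)` pair, schema operations act on a whole bucket, and a traversal-style query returns `None` (standing in for Julia's `nothing`) for nodes whose symbol lacks the attribute unless `ignore_nothing` is set. Python lists are untyped, so this only illustrates the layout and semantics, not the type-stability gains of `Vector{T}` columns.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolBucket:
    """All rows for one symbol (e.g. "Leaf"): one list per attribute column."""
    columns: dict = field(default_factory=dict)      # attribute name -> list of values
    row_to_node: list = field(default_factory=list)  # row index -> node id


class AttributeStore:
    """Hypothetical per-symbol columnar store; not the package's Julia types."""

    def __init__(self):
        self.buckets = {}      # symbol -> SymbolBucket
        self.node_to_row = {}  # node id -> (symbol, row): O(1) node -> data lookup

    def add_node(self, node_id, symbol, attrs):
        bucket = self.buckets.setdefault(symbol, SymbolBucket())
        row = len(bucket.row_to_node)
        bucket.row_to_node.append(node_id)
        for key in attrs:
            # A new column is back-filled with None for rows added earlier.
            bucket.columns.setdefault(key, [None] * row)
        for key, column in bucket.columns.items():
            column.append(attrs.get(key))  # None where this node lacks the key
        self.node_to_row[node_id] = (symbol, row)

    def attribute(self, node_id, key, default=None):
        symbol, row = self.node_to_row[node_id]
        column = self.buckets[symbol].columns.get(key)
        return default if column is None else column[row]

    def set_attribute(self, node_id, key, value):
        symbol, row = self.node_to_row[node_id]
        self.buckets[symbol].columns[key][row] = value  # key must already be a column

    # Schema operations touch a whole symbol bucket, never a single node.
    def add_column(self, symbol, key, default=None):
        bucket = self.buckets[symbol]
        bucket.columns[key] = [default] * len(bucket.row_to_node)

    def drop_column(self, symbol, key):
        del self.buckets[symbol].columns[key]

    def rename_column(self, symbol, old, new):
        columns = self.buckets[symbol].columns
        columns[new] = columns.pop(old)


def collect(store, order, key, ignore_nothing=False):
    """Gather `key` over nodes in traversal order; None plays the role of `nothing`."""
    values = []
    for node_id in order:
        value = store.attribute(node_id, key)
        if value is None and ignore_nothing:
            continue
        values.append(value)
    return values


store = AttributeStore()
store.add_node(1, "Internode", {"length": 0.1})
store.add_node(2, "Leaf", {"area": 3.0})
store.add_node(3, "Internode", {"length": 0.2})
store.add_node(4, "Leaf", {"area": 5.0})
order = [1, 2, 3, 4]

print(collect(store, order, "area"))                       # [None, 3.0, None, 5.0]
print(collect(store, order, "area", ignore_nothing=True))  # [3.0, 5.0]

store.add_column("Leaf", "age", default=0)  # every Leaf row gets age 0
print(store.attribute(1, "age"))            # None: Internode has no "age" column
```

Keeping the mapping at the symbol level is what makes `add_column!`-style operations cheap and uniform: a new column is one allocation per bucket rather than one dictionary update per node.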
Implemented:

- Multi-key descendants (`descendants(node, [:k1, :k2], ...)`) with correct grouped row output and key order.
- Parity between in-place and allocating variants.
- Fast columnar lookup path (`unsafe_getindex` with a query plan).

---

### 6. Hybrid descendants strategy

Added strategy control:

- `descendants_strategy(mtg)`
- `descendants_strategy!(mtg, :pointer | :indexed | :auto)`

Backend behavior:

- `:pointer`: direct tree traversal (as before)
- `:indexed`: DFS interval index (`tin/tout/dfs_order`) for fast repeated subtree queries
- `:auto` (the default): switches based on query/mutation dynamics

Structural mutations mark the index as dirty and trigger a rebuild when needed.

---

### 7. Structural mutation integration

Updated mutation paths to keep the store and index consistent:

- insertion binds newly created nodes into the columnar store
- deletion/pruning removes the affected nodes' rows from the store
- node structure changes mark the subtree index as dirty

This keeps traversal and attribute access coherent after growth operations.

---

### 8. Benchmark suite expansion (ASV)

Benchmarks now cover:

- traversal updates (node indexing vs explicit attribute API)
- descendants/ancestors, one-key and multi-key
- mixed-symbol queries with `ignore_nothing=true/false`
- repeated small queries (`children`, `parent`, `ancestors`, `descendants`)
- mutation API surface (`insert/delete/prune/select/transform/write`)
- table conversion (`symbol_table`, `mtg_table`)

The benchmark harness was also made robust across versions for Symbol/String filters and keys.

---

### 9. Tests/docs updates

Tests added/extended for:

- columnar backend correctness
- schema operations (add/drop/rename)
- descendants multi-key correctness and parity
- traversal strategy behavior
- Tables.jl views and DataFrame conversion

Docs updated to explain:

- the new attribute API
- columnar backend behavior
- strategy modes (`:pointer`, `:indexed`, `:auto`)
- performance guidance for growth-heavy vs query-heavy workloads

---

### Important notes for reviewers

- The large speedups in some descendants workloads are real for typed/multi-attribute columnar retrieval.
- Mutation-heavy and some short-query workloads are still areas of active optimization.
- Public traversal semantics were preserved (notably `nothing` handling in mixed-symbol traversals).
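The unified view described under Tables.jl integration can be sketched in the same hypothetical Python style. Here a `MISSING` sentinel plays the role of Julia's `missing`, kept distinct from `None` (`nothing`), and marks attributes whose column does not exist for a node's symbol. This is not the package's `mtg_table`, which is a Tables.jl source; it only shows how rows from different symbol buckets can be merged in traversal order.

```python
MISSING = object()  # sentinel standing in for Julia's `missing` (distinct from None)


def unified_table(buckets, order):
    """Merge per-symbol columns into one column-oriented table.

    buckets: symbol -> {attribute name -> list of values, one per row}
    order:   (symbol, row) pairs in traversal order
    Assumes no attribute is itself named "symbol".
    """
    # Union of attribute names across all symbols, in first-seen order.
    names = []
    for columns in buckets.values():
        for name in columns:
            if name not in names:
                names.append(name)

    table = {"symbol": []}
    for name in names:
        table[name] = []

    for symbol, row in order:
        columns = buckets[symbol]
        table["symbol"].append(symbol)
        for name in names:
            column = columns.get(name)
            # This symbol has no such column at all: fill with MISSING.
            table[name].append(MISSING if column is None else column[row])
    return table


buckets = {
    "Internode": {"length": [0.1, 0.2]},
    "Leaf": {"area": [3.0, 5.0]},
}
order = [("Internode", 0), ("Leaf", 0), ("Internode", 1), ("Leaf", 1)]
table = unified_table(buckets, order)
# table["length"] -> [0.1, MISSING, 0.2, MISSING]
# table["area"]   -> [MISSING, 3.0, MISSING, 5.0]
```

A per-symbol view in this model is just one bucket's `columns` dict, already rectangular, which is why it needs no filling.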
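The `:indexed` strategy relies on a DFS interval index (`tin`/`tout`/`dfs_order`). The sketch below, in Python with hypothetical names (`IntervalIndex`, `mark_dirty`), shows the general technique: a pre-order traversal places every subtree in a contiguous slice of `dfs_order`, so a descendants query becomes a slice once the index exists. As the PR describes, structural mutations only mark the index dirty and it is rebuilt lazily on the next query. How `:auto` chooses between strategies is not specified here, so it is not modeled.

```python
class IntervalIndex:
    """Hypothetical DFS interval index over a tree stored as child lists."""

    def __init__(self, children, root):
        self.children = children  # node -> list of children (shared, not copied)
        self.root = root
        self.tin, self.tout, self.dfs_order = {}, {}, []
        self.dirty = True  # nothing built yet

    def mark_dirty(self):
        """Call after any structural mutation (insert/delete/prune)."""
        self.dirty = True

    def _rebuild(self):
        self.tin, self.tout, self.dfs_order = {}, {}, []
        stack = [(self.root, False)]
        while stack:
            node, finished = stack.pop()
            if finished:
                # Last dfs_order position inside this node's subtree (inclusive).
                self.tout[node] = len(self.dfs_order) - 1
                continue
            self.tin[node] = len(self.dfs_order)
            self.dfs_order.append(node)
            stack.append((node, True))  # revisit after all descendants
            # Push children in reverse so they are visited left to right.
            for child in reversed(self.children.get(node, [])):
                stack.append((child, False))
        self.dirty = False

    def descendants(self, node):
        if self.dirty:
            self._rebuild()  # lazy rebuild, only when a query needs the index
        # A subtree is a contiguous slice of dfs_order; skip the node itself.
        return self.dfs_order[self.tin[node] + 1 : self.tout[node] + 1]


children = {1: [2, 5], 2: [3, 4], 5: [6]}
index = IntervalIndex(children, root=1)
print(index.descendants(1))  # [2, 3, 4, 5, 6]
print(index.descendants(2))  # [3, 4]

children[4] = [7]   # structural mutation: node 7 grows under node 4
index.mark_dirty()  # without this, the next query would return a stale slice
print(index.descendants(2))  # [3, 4, 7]
```

Pointer traversal has no maintenance cost but walks the subtree on every query; the interval index pays one O(n) rebuild after a batch of mutations and then answers each subtree query with a slice. That trade-off matches the PR's distinction between growth-heavy and query-heavy workloads.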
2 parents 9b958ad + 28cac59 commit c2b87ae

46 files changed

Lines changed: 2165 additions & 464 deletions


AGENT.md

Lines changed: 11 additions & 0 deletions
@@ -21,6 +21,7 @@
 - `src/compute_MTG/indexing.jl`
 - `src/compute_MTG/check_filters.jl`
 - `src/types/Node.jl`
+- `src/types/Attributes.jl`
 - `src/compute_MTG/node_funs.jl`
 
 ## Practical Optimization Rules
@@ -30,4 +31,14 @@
 - `descendants!(buffer, node, key; ...)`
 - Keep filter checks branch-light when no filters are provided.
 - Keep key access on typed attribute containers (`NamedTuple`, `MutableNamedTuple`, typed dicts) in specialized methods when possible.
+- Prefer explicit attribute APIs:
+- `attribute(node, key)`
+- `attribute!(node, key, value)`
+- `attributes(node; format=:namedtuple|:dict)`
+- `add_column!` / `drop_column!` / `rename_column!`
 - Preserve API behavior and add tests for every optimization that changes internals.
+
+## Benchmark Tiers
+- `small` (~10k nodes): full matrix including API-surface benchmarks
+- `medium` (~100k nodes): hot-path matrix
+- `large` (~300k nodes): critical hot paths only

Project.toml

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,7 @@ MutableNamedTuples = "af6c499f-54b4-48cc-bbd2-094bba7533c7"
 OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
 Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
 SHA = "ea8e919c-243c-51af-8825-aaa63cd721ce"
+Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
 XLSX = "fdbf4ff8-1666-58a4-91e7-1b58723a45e0"
 
 [compat]
@@ -25,6 +26,7 @@ MetaGraphsNext = "0.5, 0.6, 0.7"
 MutableNamedTuples = "0.1"
 OrderedCollections = "1.4"
 SHA = "0.7, 1"
+Tables = "1"
 XLSX = "0.7, 0.8, 0.9, 0.10"
 julia = "1.6"

benchmark/Project.toml

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
 [deps]
 BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
 MultiScaleTreeGraph = "dd4a991b-8a45-4075-bede-262ee62d5583"
+Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
+
+[sources]
+MultiScaleTreeGraph = {path = ".."}

benchmark/README.md

Lines changed: 18 additions & 3 deletions
@@ -10,6 +10,21 @@ julia --project=benchmark benchmark/benchmarks.jl
 
 Workloads currently covered:
 
-- full-tree traversal
-- traversal + data extraction from descendants
-- repeated many-small queries (`children`, `parent`, `ancestors`)
+- tiered datasets: `small` (~10k nodes), `medium` (~100k), `large` (~300k)
+- full-tree traversal and traversal updates
+- one/multi-attribute updates using node indexing (`node[:k]`)
+- one/multi-attribute updates using explicit API (`attribute`/`attribute!`)
+- one symbol (`:Leaf`) and mixed symbols (`[:Leaf, :Internode]`)
+- descendants queries
+- one/many attributes, one symbol
+- one/many attributes, mixed symbols
+- `ignore_nothing=true/false`
+- in-place and allocating variants
+- repeated many-small queries
+- `children`, `parent`, `ancestors`, `ancestors!`
+- repeated descendants retrieval with and without in-place buffers
+- API surface (small tier)
+- insertion/deletion/pruning
+- `transform!`, `select!`
+- `symbol_table` / `mtg_table`
+- `write_mtg`
