Hublabel#239
…it returns, and which is a different size on mac
…rs that validate it
…opefully query off the right orientations
…ses that all can run
…rtices from a net graph child. Add a bunch of comments explaining why I am confused by the distance index orientation bookkeeping.
…ord access attempts
…t that instead of unpicking it
fix by GPT-5 mini thru GitHub Copilot
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
adamnovak
left a comment
This looks pretty good, but there are still some fairly big functions that are neither called nor documented (and appear to be removable), and it looks like something went wrong with an attempt to add doc comments by find-and-replace which probably needs to be cleaned up.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Adam Novak <anovak@soe.ucsc.edu>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The committed Binder-generated bindings had drifted from the headers and no
longer compiled:
- is_regular_snarl was simplified to a single (const net_handle_t&) overload,
but the bindings still referenced the removed 2-arg/3-arg forms.
- get_start_endpoint/get_end_endpoint now return endpoint_t (not const), but
the bindings cast to const endpoint_t.
Regenerated all bindings with make_and_run_binder.py. To make regeneration work
again on this branch:
- Add the Homebrew include dir to Binder's clang invocation so it can find
Boost (where CMake's find_package(Boost) locates it).
- Mark bdsg/ch.hpp (contraction hierarchy, C++20 + Boost Graph) BINDER_IGNORE
so Binder skips it. It was never bound and cannot be cleanly auto-bound;
its Boost/<ranges> includes and #include sites are ignored.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
adamnovak
left a comment
I still think the doc comment syntax isn't all quite right, but this is probably good enough to ship!
 * For each node reachable from node along forward (out-) edges, the candidate
 * distance is compared against what the already-built labels can already
 * express (binary_intersection_ch of labels_back[cur] against labels[node]).
 * If the existing labels do not already cover the distance, node's rank is
 * appended as a hub to labels_back[cur] and the search continues from cur;
 * otherwise that branch is pruned.
This is a summary of how the implementation currently works (which would be better in the function body), not documentation for what the function promises to accomplish or requires in order to do it.
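To illustrate the distinction, here is a minimal sketch of a contract-style doc comment, with the "how" moved into the function body. It is not the code in bdsg/ch.hpp: HubEntry, NO_PATH, and min_distance_via_common_hub are hypothetical names, and the function is only the generic two-hop lookup (minimum over shared hubs), chosen because it is small enough to document fully.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-ins for illustration; not the types used in bdsg/ch.hpp.
struct HubEntry {
    uint32_t hub_rank;  // rank of the hub node
    uint32_t dist;      // distance between the labelled node and the hub
};
constexpr uint32_t NO_PATH = std::numeric_limits<uint32_t>::max();

/**
 * Get the shortest distance that two hub labels can express together.
 *
 * Both labels must be sorted by increasing hub_rank. Neither label is
 * modified.
 *
 * @param forward  label of the source node (distances from source to hub)
 * @param backward label of the target node (distances from hub to target)
 * @return the smallest forward.dist + backward.dist over hubs present in
 *         both labels, or NO_PATH if the labels share no hub.
 */
inline uint32_t min_distance_via_common_hub(const std::vector<HubEntry>& forward,
                                            const std::vector<HubEntry>& backward) {
    // Debug-only check of the documented precondition.
    auto by_rank = [](const HubEntry& a, const HubEntry& b) { return a.hub_rank < b.hub_rank; };
    assert(std::is_sorted(forward.begin(), forward.end(), by_rank));
    assert(std::is_sorted(backward.begin(), backward.end(), by_rank));

    // Implementation note (this is the part that belongs in the body):
    // walk both sorted lists in lockstep, like the merge step of merge sort.
    uint32_t best = NO_PATH;
    std::size_t i = 0, j = 0;
    while (i < forward.size() && j < backward.size()) {
        if (forward[i].hub_rank < backward[j].hub_rank) {
            ++i;
        } else if (forward[i].hub_rank > backward[j].hub_rank) {
            ++j;
        } else {
            uint32_t total = forward[i].dist + backward[j].dist;
            if (total < best) {
                best = total;
            }
            ++i;
            ++j;
        }
    }
    return best;
}
```

The doc comment says what the caller must provide and what comes back; the lockstep walk is an implementation detail that can change without touching the documentation.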
 * Seeds labels[node] with node itself at distance 0, then explores backward
 * (in-) edges. For each node from which node is reachable, the candidate
 * distance is compared against what the already-built labels can already
 * express (binary_intersection_ch of labels[cur] against labels_back[node]).
 * If not already covered, node's rank is appended as a hub to labels[cur] and
 * the search continues; otherwise that branch is pruned.
This also seems more like a summary.
/**
 * These are used to interpret snarl_tree_records, which is just a vector of ints.
 * Uses a SnarlTreeRecord as the main class for defining and interpreting the records.
 * SnarlTreeRecords are given a base pointer, which points to the start of the record
 * Each record starts with a "tag", which defines which type of record it is. The
 * contents of the record are defined below:
 */

/*Root record
 */
This doesn't apply to a particular member after the comment, and so can't really be a doc comment.
/** Root record
 * - The (single) root vector has the format:
 * [root tag, # connected components (N), # nodes (M), min_node_id, max depth [pointer to node/snarl/chain record] x N], [pointer to node+ node offset] x M]
 * The root vector stores the root of every connected component, which can be a
 * node, snarl, or chain
 */
const static size_t ROOT_RECORD_SIZE = 6;
This is going to get used as the doc comment for just the first variable. It's really about the whole group of them, but these pseudo-structs are not a thing Doxygen lets you document.
I guess that makes as much sense as anywhere else to put the text in the docs?
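One option that might fit here is a Doxygen member group, which attaches a shared description to a run of members instead of to the first one only. The sketch below assumes member groups (`@name` plus `///@{` and `///@}`) render acceptably in this project's generated docs, which has not been checked. RootRecordLayoutSketch, ROOT_TAG_OFFSET, and COMPONENT_COUNT_OFFSET are hypothetical; only ROOT_RECORD_SIZE = 6 comes from the quoted diff.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper struct, used only to give the constants a home.
struct RootRecordLayoutSketch {
    /** @name Root record
     *  Layout of the (single) root vector. This text is attached to the
     *  whole group below rather than to the first constant alone.
     */
    ///@{
    /// Size constant from the quoted diff; its exact meaning is not stated there.
    static constexpr std::size_t ROOT_RECORD_SIZE = 6;
    /// Offset of the root tag (hypothetical name and value).
    static constexpr std::size_t ROOT_TAG_OFFSET = 0;
    /// Offset of the connected component count (hypothetical name and value).
    static constexpr std::size_t COMPONENT_COUNT_OFFSET = 1;
    ///@}
};

// Sanity check that the illustrative offsets fall inside the fixed-size part.
inline void check_root_layout() {
    assert(RootRecordLayoutSketch::ROOT_TAG_OFFSET < RootRecordLayoutSketch::ROOT_RECORD_SIZE);
    assert(RootRecordLayoutSketch::COMPONENT_COUNT_OFFSET < RootRecordLayoutSketch::ROOT_RECORD_SIZE);
}
```

Each constant keeps its own one-line `///` description, so nothing is thrown away when the group text is attached.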
/**
 *
 * The "tags" for defining what kind of record we're looking at. These are the first entry in any
 * record. They will be a record_t and a bit vector indicating connectivity.
 * The bit vector will be the last 6 bits of the tag
 *
 * Each bit represents one type of connectivity:
 * start-start, start-end, start-tip, end-end, end-tip, tip-tip
 *
 * The remainder of the tag will be the record_t of the record
 */
/////////// Methods for interpreting the tags for each snarl tree record
I think these both are going to end up as the documentation for get_record_type().
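For readers following the quoted comment, the tag layout it describes can be sketched as below. This is a reading of the comment, not the libbdsg implementation: it assumes "last 6 bits" means the 6 least-significant bits, that the connectivity kinds occupy bits in the order listed, and it uses a made-up RecordKind enum because the real record_t values are not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record types; the real record_t enum is not shown in this thread.
enum class RecordKind : uint64_t { Root = 1, Node = 2, Snarl = 3, Chain = 4 };

// Connectivity kinds in the order the comment lists them.
// Which bit each one occupies is an assumption.
enum Connectivity : unsigned {
    START_START = 0, START_END, START_TIP, END_END, END_TIP, TIP_TIP
};

constexpr unsigned CONNECTIVITY_BITS = 6;
constexpr uint64_t CONNECTIVITY_MASK = (uint64_t{1} << CONNECTIVITY_BITS) - 1;

// Pack a record kind and a 6-bit connectivity vector into one tag.
inline uint64_t make_tag(RecordKind kind, uint64_t connectivity) {
    assert(connectivity <= CONNECTIVITY_MASK);  // only 6 bits are available
    return (static_cast<uint64_t>(kind) << CONNECTIVITY_BITS) | connectivity;
}

// The record kind is everything above the connectivity bits.
inline RecordKind tag_record_kind(uint64_t tag) {
    return static_cast<RecordKind>(tag >> CONNECTIVITY_BITS);
}

// Test a single connectivity bit.
inline bool tag_has_connectivity(uint64_t tag, Connectivity c) {
    return ((tag >> c) & 1u) != 0;
}
```

Under these assumptions a single integer read gives both the record type and all six connectivity flags, which is presumably why the two share the first entry of every record.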
/** Functions to add children to a chain. Assumes that the chain is well formed up to here.
 * These will always be called in order going forward in the chain.
 * The chain is actually composed of snarl records and trivial snarl records, but we
 * add things by node and snarl.
 * We need to keep a SnarlTreeRecordWriter to the last thing that we added (snarl or trivial snarl),
 * so that when we add nodes we either add them to the end of the last trivial snarl or
 * (if we have too many nodes in the last trivial snarl or if the last thing on the chain is a snarl)
 * create a new trivial snarl .
 * Each (trivial/simple) snarl record is flanked by the size of the record, so it will be
 * [chain info] ts size | ts record | s size | s record | s size | ts size | ts | ts size ....
 *
 */

//Add a snarl to the end of the chain and return a SnarlRecordWriter pointing to it
SnarlRecordWriter add_snarl(size_t snarl_size, record_t type, size_t hhl_size, size_t previous_child_offset);
This block about "Functions to add children to a chain." is the doc comment for add_snarl, and the line about what add_snarl in particular does gets thrown away.
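As background for the quoted block, the "flanked by the size of the record" idea can be sketched as follows. This is a simplification, not the chain record format itself: it assumes every record carries its size on both sides and stores everything in one flat vector, and append_flanked, next_record, and previous_record are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Append a record wrapped as: size | payload... | size.
inline void append_flanked(std::vector<std::size_t>& records,
                           const std::vector<std::size_t>& payload) {
    records.push_back(payload.size());
    records.insert(records.end(), payload.begin(), payload.end());
    records.push_back(payload.size());
}

// Given the offset of a record's leading size entry, return the offset of
// the next record's leading size entry (or records.size() at the end).
inline std::size_t next_record(const std::vector<std::size_t>& records, std::size_t offset) {
    assert(offset < records.size());
    return offset + records[offset] + 2;  // skip leading size, payload, trailing size
}

// Given the offset of a record's leading size entry, return the offset of
// the previous record's leading size entry.
inline std::size_t previous_record(const std::vector<std::size_t>& records, std::size_t offset) {
    assert(offset >= 2 && offset <= records.size());  // a record must precede this one
    std::size_t previous_size = records[offset - 1];  // trailing size of the previous record
    return offset - previous_size - 2;
}
```

Storing the size on both sides is what lets a reader step backward as cheaply as forward, without scanning from the start of the chain.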
 * existing labels already cover, and node_dists is left reset to INF_INT.
 */
void build_backward_labels(int node, CHOverlay& ov, vector<DIST_UINT>& node_dists, vector<vector<HubRecord>>& labels, vector<vector<HubRecord>>& labels_back) {
    auto in_node = node;
I don't know what Doxygen will do if we have two doc comments for the same function in different places.
But since we're not binding ch.hpp they won't end up on the web page Python docs, so maybe we don't care as much?
libbdsg changes to go with merging the hublabel branch of vg