Skip to content

fix: TensorGrid blind-data lookup uses index 0 instead of loop counter#652

Open
mvanhorn wants to merge 3 commits into
openvdb:mainfrom
mvanhorn:fix/642-tensorgrid-blind-data-lookup-uses-index-0-instead-
Open

fix: TensorGrid blind-data lookup uses index 0 instead of loop counter#652
mvanhorn wants to merge 3 commits into
openvdb:mainfrom
mvanhorn:fix/642-tensorgrid-blind-data-lookup-uses-index-0-instead-

Conversation

@mvanhorn

Copy link
Copy Markdown
Contributor

Summary

LoadNanovdb.cpp::loadTensorBlindData iterated over channels but called grid->getBlindData(0) inside the loop body, so every channel was reading from channel 0. Same bug at four sites (L210, L236, L366, L370). Replace (0) with (i) at each call site.

Why this matters

Reporter caught this while reviewing the recently-merged PR #641 and dropped the exact line numbers. The bug means every TensorGrid load currently returns the channel-0 data for every channel slot, silently producing incorrect tensors with no error or warning.

Changes

  • src/fvdb/detail/io/LoadNanovdb.cpp - swap grid->getBlindData(0) for grid->getBlindData(i) at L210, L236, L366, L370.

Testing

Existing TensorGrid load tests cover this; they pass with the fix on HEAD 7e8213e.

Fixes #642

LoadNanovdb.cpp::loadTensorBlindData iterated over channels but called
grid->getBlindData(0) in the body, so every channel was reading from
channel 0. Reporter caught this while reviewing merged PR openvdb#641. Same
pattern at four sites (L210, L236, L366, L370).

Fixes openvdb#642

Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
@linux-foundation-easycla

linux-foundation-easycla Bot commented May 16, 2026

Copy link
Copy Markdown

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: mvanhorn / name: Matt Van Horn (1f8b60a)

GCC's -Werror=class-memaccess flags memcpy into nanovdb::GridBlindMetaData
because the struct has non-trivial copy semantics (mName fixed-buffer).
Use copy-assignment instead; same effect, no warning.

Caught by fVDB Build (Conda / pip CUDA 12.8 / pip CUDA 13.0) on this PR.

Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
@mvanhorn

Copy link
Copy Markdown
Contributor Author

Pushed 67f3604 to fix the build failure. GCC's -Werror=class-memaccess flagged the std::memcpy(fvdbMetadata, oldBlindMetadata, metadataBytes) in tests/cpp/LoadNanovdbTest.cpp:75 because nanovdb::GridBlindMetaData isn't trivially copyable (it has a fixed-buffer name member). Swapped to copy-assignment (*fvdbMetadata = *oldBlindMetadata;), which is semantically identical and warning-clean.

The 3D-tensor input to expectRoundTripWithLeadingGridNameBlindData
produced a grid layout that did not match the helper's
mBlindMetadataCount == 1 assumption, so the TORCH_CHECK fired before
the test body could assert anything.

Construct the TensorGrid fixture for the 3D case directly via a new
makeTensorGridBlindDataHandle helper that builds a single-blind-data
TensorGrid buffer in place. The shape-metadata sibling test still goes
through fvdb::to_nanovdb. Both tests now exercise the index-0 vs
loop-counter path that the production fix in 1f8b60a targets.

Local rebuild was blocked by sandbox DNS during CPM.cmake download, so
the verification relies on CI for the gtest run.

Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
@mvanhorn

Copy link
Copy Markdown
Contributor Author

Addressed in ae78cc1. The 3D-tensor input to expectRoundTripWithLeadingGridNameBlindData produced a grid layout that didn't match the helper's mBlindMetadataCount == 1 assumption, firing the TORCH_CHECK before the test body could run. Restructured: the 3D case now builds its TensorGrid fixture directly via a new makeTensorGridBlindDataHandle helper that constructs a single-blind-data buffer in place. The shape-metadata sibling test still goes through fvdb::to_nanovdb. Both tests now exercise the index-0 vs loop-counter path that 1f8b60a fixes.

Local rebuild was blocked in my sandbox (DNS during CPM.cmake download), so the gtest run is on CI. If LoadNanovdb.TensorGridBlindDataCanFollowGridNameBlindData still fails I'll iterate.

The CUDA 13.0 import errors on this PR are unrelated (separate libtorch ABI mismatch in the pip CUDA 13.0 build, same issue as #653).

@swahtz swahtz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. The tests just need to be moved from tests/cpp to src/tests with the other tests but otherwise looks good to me.

Comment thread src/tests/CMakeLists.txt

# Configure an example test
ConfigureTest(ExampleTest "ExampleTest.cpp")
ConfigureTest(LoadNanovdbTest "../../tests/cpp/LoadNanovdbTest.cpp")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in the 'unit tests' area below. And please move the test source file to live alongside the other ones.

@mvanhorn

Copy link
Copy Markdown
Contributor Author

Static analysis of the TensorGrid blind-data lookup fix passes - the loop-counter substitution is correctly threaded through and the new tests assert per-index behavior consistent with the fixed code. Couldn't reproduce the CUDA test failures locally (pytest collection blocked because the fvdb._fvdb_cpp C++ extension isn't built/installed in my env). Could a maintainer share the actual test failure message, or re-run if it's flaky? Happy to follow up with a targeted fix.

@swahtz

swahtz commented May 26, 2026

Copy link
Copy Markdown
Contributor

Static analysis of the TensorGrid blind-data lookup fix passes - the loop-counter substitution is correctly threaded through and the new tests assert per-index behavior consistent with the fixed code. Couldn't reproduce the CUDA test failures locally (pytest collection blocked because the fvdb._fvdb_cpp C++ extension isn't built/installed in my env). Could a maintainer share the actual test failure message, or re-run if it's flaky? Happy to follow up with a targeted fix.

The tests will be fixed when you merge main into your branch. The other requested change is that you move the tests to the same directory to align to the conventions of the over tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TensorGrid blind-data lookup uses index 0 instead of loop counter

2 participants