fix: count parameters shared across modules once (#322, #377, #358, #303)#397
Open
Mikyx-1 wants to merge 3 commits into
Open
fix: count parameters shared across modules once (#322, #377, #358, #303)#397Mikyx-1 wants to merge 3 commits into
Mikyx-1 wants to merge 3 commits into
Conversation
…#327) A module instance shared by several parents (e.g. one nn.ReLU() passed into every block, as in the reported VNet) was counted incorrectly, inflating the total parameters — especially when combined with nested ModuleLists. Two root causes, both stemming from a shared module having one parent recorded instead of many: 1. Hierarchy (torchinfo.py): the pre-hook captured (var_name, depth, parent_info) at registration time and kept only the last parent, so every execution of a shared module reported the wrong parent. This scrambled the layer tree and mis-grouped children. Fixed by resolving the parent dynamically at execution time: accumulate every structural context a module is reached through, maintain a runtime call stack via the pre/post hooks, and select the context whose nearest executing ancestor is the current stack top. Single-parent modules are unchanged. 2. Counting (layer_info.py): leftover_params() excluded recursive children from its subtraction, re-attributing a recursive child's params (already counted at their real occurrence) to the parent — counting a shared parameterized module once per parent. Fixed with a shared _leftover() helper that subtracts each distinct child once (keyed by layer_id) and skips recursive subtrees. Adds the SharedModuleInNestedList fixture and a regression test. Verified no behavioral change for existing models (RecursiveNet, ReuseReLU, ReuseLinear, SimpleRNN, etc. all produce identical output). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Yep#377) torchinfo built total_params by summing each row's num_params, with no deduplication across parameter tensors. When one tensor is referenced by multiple distinct modules (weight tying -- tied embeddings / lm_head, shared projection heads, etc.) it was counted once per referencing module, overestimating the total (e.g. flan-t5-small reported 93,410,688 vs the true 76,961,152). This differs from the module-instance sharing fixed in TylerYep#327: tied tensors live on different module objects, so id(module)-based recursion detection doesn't catch them. Take parameter totals from the root module instead. A module's named_parameters() already deduplicates shared tensors (remove_duplicate defaults to True) and includes submodules not run in the forward pass, so this matches `sum(p.numel() for p in model.parameters())`. A module whose parameters were all already counted by an earlier row is marked "(recursive)" so the per-row counts still sum to the total. Add TiedWeightsModel + test_tied_weights, and regenerate the flan-t5 snapshot (Python 3.14). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes parameter over-counting for any model that shares parameters, so
summary(...).total_paramsalways equalssum(p.numel() for p in model.parameters())— what PyTorch itself reports.Important
Stacked on #396 (fix for #327). This branch contains the #327 commit as its base, so the diff currently shows both commits. Please merge #396 first; afterwards this PR's diff reduces to just the second commit (
fix: count parameters shared across modules once). The two fixes are complementary and only together resolve all the linked issues — see below.The two complementary bugs
Shared parameters reach torchinfo two ways, and each was counted multiple times:
lm_head, shared projection heads (T5, SD, LLMs)#327 dedups by
id(module), so case B slips through (tied tensors live on different module objects). This PR closes case B.Root cause (B)
total_paramswas built bottom-up by summing each row'snum_params, with no dedup across parameter tensors. A tensor referenced by N modules was added N times.Fix
Take parameter totals from the root module.
Module.named_parameters()already deduplicates shared tensors (remove_duplicate=True) and includes submodules not run in the forward pass, so this matchessum(p.numel() for p in model.parameters())exactly. A module whose parameters were all already counted by an earlier row is marked(recursive), so the per-row column still sums to the total.Issues resolved (verified)
model.info())lm_headdouble-countedmodel.parameters())YOLO is fixed by the #327 half (module reuse); the tied-weight cases by this half. Both are needed.
Tests
TiedWeightsModelfixture +test_tied_weightsregression test.flan_t5_small.outsnapshot regenerated (Python 3.14): total93,410,688→76,961,152, decoder embedding row now correctly(recursive).🤖 Generated with Claude Code