fix: count parameters shared across modules once (#322, #377, #358, #303) by Mikyx-1 · Pull Request #397 · TylerYep/torchinfo

Mikyx-1 · 2026-06-07T10:15:19Z

Summary

Fixes parameter over-counting for any model that shares parameters, so summary(...).total_params always equals sum(p.numel() for p in model.parameters()) — what PyTorch itself reports.

Important

Stacked on #396 (fix for #327). This branch contains the #327 commit as its base, so the diff currently shows both commits. Please merge #396 first; afterwards this PR's diff reduces to just the second commit (fix: count parameters shared across modules once). The two fixes are complementary and only together resolve all the linked issues — see below.

The two complementary bugs

Shared parameters reach torchinfo two ways, and each was counted multiple times:

Case	Mechanism	Fixed by
A	One module instance reused under several parents (e.g. YOLO detection layers)	#327 / #396
B	One parameter tensor referenced by distinct modules — weight tying: tied embeddings / `lm_head`, shared projection heads (T5, SD, LLMs)	this PR

#327 dedups by id(module), so case B slips through (tied tensors live on different module objects). This PR closes case B.
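A minimal sketch of why case B escapes module-identity dedup. `TinyTiedLM` is a hypothetical toy model, not part of torchinfo or its test suite; it only illustrates that tied tensors sit on two different module objects.

```python
from torch import nn


class TinyTiedLM(nn.Module):
    """Hypothetical toy model with a tied embedding / lm_head weight."""

    def __init__(self, vocab: int = 10, dim: int = 4) -> None:
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)             # weight shape: (vocab, dim)
        self.lm_head = nn.Linear(dim, vocab, bias=False)  # weight shape: (vocab, dim)
        self.lm_head.weight = self.embed.weight           # weight tying

    def forward(self, tokens):
        return self.lm_head(self.embed(tokens))


model = TinyTiedLM()

# Two different module objects, so dedup keyed on id(module) sees two entries...
distinct_modules = id(model.embed) != id(model.lm_head)
# ...but both hold the very same Parameter object.
same_tensor = model.embed.weight is model.lm_head.weight
# PyTorch itself reports the tied tensor once.
num_unique_params = len(list(model.parameters()))
```

Here `distinct_modules` and `same_tensor` are both true, which is exactly the combination that module-level recursion detection cannot catch.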

Root cause (B)

total_params was built bottom-up by summing each row's num_params, with no dedup across parameter tensors. A tensor referenced by N modules was added N times.
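The over-count can be reproduced without torchinfo. The sketch below mimics a bottom-up sum over per-module counts; it is an illustration of the failure mode under that assumption, not torchinfo's actual code, and the tiny tied model is hypothetical.

```python
from torch import nn

# Hypothetical tied pair: the head reuses the embedding's weight tensor.
embed = nn.Embedding(10, 4)
head = nn.Linear(4, 10, bias=False)
head.weight = embed.weight
model = nn.ModuleDict({"embed": embed, "lm_head": head})

# Bottom-up: add up each module's own (non-recursive) parameters.
# Nothing here checks whether a tensor was already seen on another module.
naive_total = sum(
    p.numel()
    for module in model.modules()
    for p in module.parameters(recurse=False)
)

# What PyTorch reports: each distinct tensor once.
true_total = sum(p.numel() for p in model.parameters())
```

With one tensor referenced by two modules, `naive_total` is twice `true_total`; with N referencing modules it would be N times.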

Fix

Take parameter totals from the root module. Module.named_parameters() already deduplicates shared tensors (remove_duplicate=True) and includes submodules not run in the forward pass, so this matches sum(p.numel() for p in model.parameters()) exactly. A module whose parameters were all already counted by an earlier row is marked (recursive), so the per-row column still sums to the total.
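A sketch of the idea, assuming a recent PyTorch where `named_parameters()` accepts `remove_duplicate`. The row-building loop and its `seen` set are illustrative assumptions about how a "(recursive)" marker could be derived; they are not copied from this PR's implementation.

```python
from torch import nn

# Hypothetical tied pair, as in weight tying between an embedding and lm_head.
embed = nn.Embedding(10, 4)
head = nn.Linear(4, 10, bias=False)
head.weight = embed.weight
model = nn.ModuleDict({"embed": embed, "lm_head": head})

# Root-level view: the default call drops repeated tensors.
deduped = [name for name, _ in model.named_parameters()]
all_refs = [name for name, _ in model.named_parameters(remove_duplicate=False)]
total = sum(p.numel() for _, p in model.named_parameters())

# Per-row view: count a tensor only the first time it is seen, and flag a
# module as recursive when it has parameters but none of them are new.
seen: set[int] = set()
rows = []
for name, module in model.named_children():
    own = list(module.parameters(recurse=False))
    new = [p for p in own if id(p) not in seen]
    seen.update(id(p) for p in own)
    rows.append((name, sum(p.numel() for p in new), bool(own) and not new))
```

In this toy case the `lm_head` row contributes zero new parameters and is flagged, so the per-row counts add up to the root total.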

Issues resolved (verified)

Issue	Symptom	Result on this branch
#322	overcount 1.78–2.37× (YOLO, T5, SD)	✅ YOLOv8n 5,257,936 → 3,157,200; YOLOv10n 4,932,416 → 2,775,520 (matches `model.info()`)
#377	tied embeddings / `lm_head` double-counted	✅ counted once by default
#358	flan-t5-small reports 128M, expected 76,961,152	✅ 76,961,152 (exact)
#303	flan-t5-small reports 109M, expected ~77M	✅ 76,961,152 (= `model.parameters()`)

YOLO is fixed by the #327 half (module reuse); the tied-weight cases by this half. Both are needed.

Tests

New TiedWeightsModel fixture + test_tied_weights regression test.
flan_t5_small.out snapshot regenerated (Python 3.14): total 93,410,688 → 76,961,152; the decoder embedding row is now correctly marked (recursive).
Full suite green (snapshots regenerated under Python 3.14 to match CI); zero regressions.

🤖 Generated with Claude Code

…#327)

A module instance shared by several parents (e.g. one nn.ReLU() passed into every block, as in the reported VNet) was counted incorrectly, inflating the total parameters, especially when combined with nested ModuleLists.

Two root causes, both stemming from a shared module having one parent recorded instead of many:

1. Hierarchy (torchinfo.py): the pre-hook captured (var_name, depth, parent_info) at registration time and kept only the last parent, so every execution of a shared module reported the wrong parent. This scrambled the layer tree and mis-grouped children. Fixed by resolving the parent dynamically at execution time: accumulate every structural context a module is reached through, maintain a runtime call stack via the pre/post hooks, and select the context whose nearest executing ancestor is the current stack top. Single-parent modules are unchanged.

2. Counting (layer_info.py): leftover_params() excluded recursive children from its subtraction, re-attributing a recursive child's params (already counted at their real occurrence) to the parent, which counted a shared parameterized module once per parent. Fixed with a shared _leftover() helper that subtracts each distinct child once (keyed by layer_id) and skips recursive subtrees.

Adds the SharedModuleInNestedList fixture and a regression test. Verified no behavioral change for existing models (RecursiveNet, ReuseReLU, ReuseLinear, SimpleRNN, etc. all produce identical output).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
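The runtime call stack mentioned in this commit message can be illustrated with PyTorch's standard forward hooks. This is a simplified sketch, not the code in torchinfo.py: `Block`, `labels`, `pre_hook`, and `post_hook` are hypothetical names, and it only records the stack top as the parent rather than selecting among accumulated structural contexts.

```python
import torch
from torch import nn


class Block(nn.Module):
    """Hypothetical block that receives its activation from the caller."""

    def __init__(self, act: nn.Module) -> None:
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.act = act

    def forward(self, x):
        return self.act(self.linear(x))


shared_act = nn.ReLU()  # one instance handed to every block
block0, block1 = Block(shared_act), Block(shared_act)
model = nn.Sequential(block0, block1)

labels = {
    id(model): "root", id(block0): "block0", id(block1): "block1",
    id(block0.linear): "linear0", id(block1.linear): "linear1",
    id(shared_act): "relu",
}

stack: list[nn.Module] = []        # modules currently executing
calls: list[tuple[str, str]] = []  # (module label, parent label) per execution


def pre_hook(module, args):
    # The parent is whatever module is executing right now, resolved at call time.
    parent = labels[id(stack[-1])] if stack else "<none>"
    calls.append((labels[id(module)], parent))
    stack.append(module)


def post_hook(module, args, output):
    stack.pop()


for module in model.modules():  # modules() yields each instance once
    module.register_forward_pre_hook(pre_hook)
    module.register_forward_hook(post_hook)

model(torch.zeros(1, 4))
relu_parents = [parent for name, parent in calls if name == "relu"]
```

The shared ReLU is hooked once but executes twice, and each execution resolves to a different parent, which a parent captured at registration time could not express.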

…Yep#377)

torchinfo built total_params by summing each row's num_params, with no deduplication across parameter tensors. When one tensor is referenced by multiple distinct modules (weight tying -- tied embeddings / lm_head, shared projection heads, etc.) it was counted once per referencing module, overestimating the total (e.g. flan-t5-small reported 93,410,688 vs the true 76,961,152).

This differs from the module-instance sharing fixed in TylerYep#327: tied tensors live on different module objects, so id(module)-based recursion detection doesn't catch them.

Take parameter totals from the root module instead. A module's named_parameters() already deduplicates shared tensors (remove_duplicate defaults to True) and includes submodules not run in the forward pass, so this matches `sum(p.numel() for p in model.parameters())`. A module whose parameters were all already counted by an earlier row is marked "(recursive)" so the per-row counts still sum to the total.

Add TiedWeightsModel + test_tied_weights, and regenerate the flan-t5 snapshot (Python 3.14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


Mikyx-1 and others added 3 commits June 7, 2026 15:21

fix: cast TiedWeightsModel.forward return to satisfy mypy no-any-return

c8683b3

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Mikyx-1 wants to merge 3 commits into TylerYep:main from Mikyx-1:fix/issue-322
