Skip to content

fix: count parameters shared across modules once (#322, #377, #358, #303)#397

Open
Mikyx-1 wants to merge 3 commits into
TylerYep:mainfrom
Mikyx-1:fix/issue-322
Open

fix: count parameters shared across modules once (#322, #377, #358, #303)#397
Mikyx-1 wants to merge 3 commits into
TylerYep:mainfrom
Mikyx-1:fix/issue-322

Conversation

@Mikyx-1

@Mikyx-1 Mikyx-1 commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes parameter over-counting for any model that shares parameters, so summary(...).total_params always equals sum(p.numel() for p in model.parameters()) — what PyTorch itself reports.

Important

Stacked on #396 (fix for #327). This branch contains the #327 commit as its base, so the diff currently shows both commits. Please merge #396 first; afterwards this PR's diff reduces to just the second commit (fix: count parameters shared across modules once). The two fixes are complementary and only together resolve all the linked issues — see below.

The two complementary bugs

Shared parameters reach torchinfo two ways, and each was counted multiple times:

Mechanism Fixed by
A One module instance reused under several parents (e.g. YOLO detection layers) #327 / #396
B One parameter tensor referenced by distinct modules — weight tying: tied embeddings / lm_head, shared projection heads (T5, SD, LLMs) this PR

#327 dedups by id(module), so case B slips through (tied tensors live on different module objects). This PR closes case B.

Root cause (B)

total_params was built bottom-up by summing each row's num_params, with no dedup across parameter tensors. A tensor referenced by N modules was added N times.

Fix

Take parameter totals from the root module. Module.named_parameters() already deduplicates shared tensors (remove_duplicate=True) and includes submodules not run in the forward pass, so this matches sum(p.numel() for p in model.parameters()) exactly. A module whose parameters were all already counted by an earlier row is marked (recursive), so the per-row column still sums to the total.

Issues resolved (verified)

Issue Symptom Result on this branch
#322 overcount 1.78–2.37× (YOLO, T5, SD) ✅ YOLOv8n 5,257,936 → 3,157,200; YOLOv10n 4,932,416 → 2,775,520 (matches model.info())
#377 tied embeddings / lm_head double-counted ✅ counted once by default
#358 flan-t5-small reports 128M, expected 76,961,152 76,961,152 (exact)
#303 flan-t5-small reports 109M, expected ~77M 76,961,152 (= model.parameters())

YOLO is fixed by the #327 half (module reuse); the tied-weight cases by this half. Both are needed.

Tests

  • New TiedWeightsModel fixture + test_tied_weights regression test.
  • flan_t5_small.out snapshot regenerated (Python 3.14): total 93,410,68876,961,152, decoder embedding row now correctly (recursive).
  • Full suite green (snapshots regenerated under Python 3.14 to match CI); zero regressions.

🤖 Generated with Claude Code

Mikyx-1 and others added 3 commits June 7, 2026 15:21
…#327)

A module instance shared by several parents (e.g. one nn.ReLU() passed
into every block, as in the reported VNet) was counted incorrectly,
inflating the total parameters — especially when combined with nested
ModuleLists.

Two root causes, both stemming from a shared module having one parent
recorded instead of many:

1. Hierarchy (torchinfo.py): the pre-hook captured (var_name, depth,
   parent_info) at registration time and kept only the last parent, so
   every execution of a shared module reported the wrong parent. This
   scrambled the layer tree and mis-grouped children. Fixed by resolving
   the parent dynamically at execution time: accumulate every structural
   context a module is reached through, maintain a runtime call stack via
   the pre/post hooks, and select the context whose nearest executing
   ancestor is the current stack top. Single-parent modules are unchanged.

2. Counting (layer_info.py): leftover_params() excluded recursive children
   from its subtraction, re-attributing a recursive child's params (already
   counted at their real occurrence) to the parent — counting a shared
   parameterized module once per parent. Fixed with a shared _leftover()
   helper that subtracts each distinct child once (keyed by layer_id) and
   skips recursive subtrees.

Adds the SharedModuleInNestedList fixture and a regression test. Verified
no behavioral change for existing models (RecursiveNet, ReuseReLU,
ReuseLinear, SimpleRNN, etc. all produce identical output).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Yep#377)

torchinfo built total_params by summing each row's num_params, with no
deduplication across parameter tensors. When one tensor is referenced by
multiple distinct modules (weight tying -- tied embeddings / lm_head,
shared projection heads, etc.) it was counted once per referencing
module, overestimating the total (e.g. flan-t5-small reported
93,410,688 vs the true 76,961,152).

This differs from the module-instance sharing fixed in TylerYep#327: tied tensors
live on different module objects, so id(module)-based recursion detection
doesn't catch them.

Take parameter totals from the root module instead. A module's
named_parameters() already deduplicates shared tensors (remove_duplicate
defaults to True) and includes submodules not run in the forward pass, so
this matches `sum(p.numel() for p in model.parameters())`. A module whose
parameters were all already counted by an earlier row is marked
"(recursive)" so the per-row counts still sum to the total.

Add TiedWeightsModel + test_tied_weights, and regenerate the flan-t5
snapshot (Python 3.14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant