[auto_docstring] needs to be only run on __doc__ #45056

Open
ArthurZucker wants to merge 3 commits into main from fix-auto-doc

Conversation

@ArthurZucker
Collaborator

@ArthurZucker ArthurZucker commented Mar 27, 2026

What does this PR do?

This took a while because I wanted to check benchmarks.
It's not a huge win, but a win is a win.

@ArthurZucker ArthurZucker marked this pull request as ready for review March 27, 2026 11:36
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker marked this pull request as draft March 27, 2026 13:09
@ArthurZucker
Collaborator Author

Benchmark Update 4 — Decoration speedup (warm process, without PyTorch)

Setup: same Python process, all imports and caches already warm (inspect signature cache, regex, auto-module). Both branches measured in the same process using explicit sys.path injection to bypass the editable install. 50 rounds × 3 real config classes.
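A minimal sketch of the kind of warm-process harness described above (the function names and structure here are my own illustration, not the actual benchmark script from this PR):

```python
import time

def best_of(fn, rounds=50):
    """Run fn over several warm rounds and keep the minimum wall time,
    which filters out scheduler and GC noise."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def per_class_us(decorate_all, n_classes, rounds=50):
    # Per-class decoration cost in microseconds: best total round time
    # divided by the number of classes decorated per round.
    return best_of(decorate_all, rounds) / n_classes * 1e6
```

Taking the minimum over 50 rounds is what makes the sub-microsecond branch numbers below resolvable in a warm process.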


Decoration cost per class

| `@auto_docstring` call | cost | what it does |
|---|---|---|
| branch | ~0.35 µs / class | stores a `_LazyDocClass` closure |
| main | ~1 106 µs / class | generates the full docstring eagerly |
| ratio | ~3 160× | |

```
branch: 0.001 ms / 3 classes  =  0.35 µs/class   ← just stores a closure
main:   3.317 ms / 3 classes  = 1106 µs/class    ← full generation happens here
```

Cached cls.__doc__ access after generation: ~60 ns/class on both (identical).
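For illustration, deferring generation to the first `__doc__` access can be sketched with a metaclass property. `generate_docstring`, `_LazyDocMeta`, and `lazy_docstring` are hypothetical stand-ins, not the PR's actual `_LazyDocClass` implementation:

```python
def generate_docstring(cls):
    # Stand-in for the expensive eager path (signature inspection, regex, ...).
    return f"Auto-generated docs for {cls.__name__}."

class _LazyDocMeta(type):
    @property
    def __doc__(cls):
        cached = cls.__dict__.get("_cached_doc")
        if cached is None:
            cached = generate_docstring(cls)  # the ~1 ms cost, paid once
            cls._cached_doc = cached          # later reads are ~ns
        return cached

def lazy_docstring(cls):
    # Decoration itself only rebuilds the class under the metaclass;
    # nothing expensive runs here (the cheap per-class path).
    ns = {k: v for k, v in cls.__dict__.items()
          if k not in ("__dict__", "__weakref__")}
    return _LazyDocMeta(cls.__name__, cls.__bases__, ns)
```

Reading `SomeClass.__doc__` then generates once and serves the cached string afterwards, matching the generate-once-then-cheap-reads pattern in the numbers above.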


What this means for inference / training

| | main | branch |
|---|---|---|
| `from transformers import LlamaConfig` | pays ~1 ms to generate the doc immediately | pays ~0.35 µs to store a closure |
| `model.forward(inputs)` | `__doc__` never touched | `__doc__` never touched |
| `LlamaConfig.__doc__` (explicit access) | ~0 ns (already generated) | ~1 ms (generated once, then cached) |
| `LlamaConfig.__doc__` again | ~60 ns | ~60 ns |

Inference and training never read __doc__. On main, each from transformers import Xxx pays ~1 ms to generate the docstring whether or not it is ever used. On branch, that cost is deferred and only paid if .__doc__ is explicitly accessed.


Why this does not show up in cold-process import benchmarks

The ~1 ms generation cost is negligible compared to Python startup (~200 ms) + transformers package init (~600 ms) + optional PyTorch import (~1 500 ms). The cold-process noise floor is ~50 ms, so a ~1–5 ms per-class saving is invisible there. The benefit accumulates across all decorated classes but is swamped by startup variance in single-class measurements.

@ArthurZucker ArthurZucker marked this pull request as ready for review March 27, 2026 13:47
