
Optimization: bulk-parse symbol tables (~20x faster iter_symbols)#653

Open
DanielBotnik wants to merge 1 commit into eliben:main from DanielBotnik:faster-symbol-table

Conversation

@DanielBotnik

@DanielBotnik DanielBotnik commented May 16, 2026

SymbolTableSection parsed every Elf_Sym individually through the recursive construct parser, each with its own stream seek, and resolved every name with a separate per-symbol cstring stream parse. On binaries with many symbols this dominates load time.

iter_symbols() now:

  • reads the whole table once and decodes it with a single struct.Struct (struct.iter_unpack in the common equal-stride case, a fallback offset loop for padded tables);
  • slices all names out of one string-table read;
  • interns the st_info/st_other sub-Containers, of which there are at most 256 distinct values each per table, turning 2N allocations into <=512 and dropping 4N enum lookups;
  • builds the Symbol entry Container via `__new__` + `__dict__` assignment.
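
The first two bullets can be illustrated with a minimal sketch. It assumes the little-endian ELF32 `Elf_Sym` layout and uses hypothetical helper names (`decode_symbols`, `name_at`); it is not the code from this PR.

```python
import struct

# ELF32 Elf_Sym, little-endian: st_name, st_value, st_size (u32 each),
# st_info, st_other (u8 each), st_shndx (u16) -> 16 bytes.
ELF32_SYM_LE = struct.Struct("<IIIBBH")


def decode_symbols(table, entsize, sym_struct=ELF32_SYM_LE):
    """Return one tuple of raw fields per symbol entry (hypothetical helper)."""
    if entsize == sym_struct.size:
        # Common case: entries are tightly packed, so a single iter_unpack
        # pass decodes the whole table. Trim any trailing partial entry,
        # because iter_unpack requires an exact multiple of the struct size.
        usable = len(table) - len(table) % entsize
        return list(sym_struct.iter_unpack(table[:usable]))
    if entsize < sym_struct.size:
        raise ValueError("entry size is smaller than the symbol struct")
    # Fallback: entries are padded to a larger stride, so unpack at each offset.
    count = len(table) // entsize
    return [sym_struct.unpack_from(table, i * entsize) for i in range(count)]


def name_at(strtab, offset):
    """Slice a NUL-terminated name out of an already-read string table."""
    end = strtab.find(b"\x00", offset)
    if end == -1:
        end = len(strtab)
    return strtab[offset:end].decode("utf-8", errors="replace")
```

Both helpers work on `bytes` that were read once up front, so no per-symbol stream seek is involved.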

Measured best-of-6, iter_symbols over a whole .symtab:
MIPS BE32, 8675 syms: 205 ms -> 10 ms (~20x)
ARM LE32, 6221 syms: 144 ms -> 7 ms (~21x)
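
A best-of-N timing like the one quoted above could be reproduced with something along these lines. This is only a sketch of the general approach, not necessarily the harness used for these numbers; `best_of` is a hypothetical helper and the file path in the usage comment is a placeholder.

```python
import timeit


def best_of(func, repeat=6):
    """Run func `repeat` times and return the fastest run in milliseconds."""
    return min(timeit.repeat(func, repeat=repeat, number=1)) * 1000.0


# Hypothetical usage with pyelftools (path is a placeholder):
#
#   from elftools.elf.elffile import ELFFile
#   with open("some-binary", "rb") as f:
#       symtab = ELFFile(f).get_section_by_name(".symtab")
#       print(best_of(lambda: list(symtab.iter_symbols())))
```

Taking the minimum rather than the mean reduces the influence of scheduling noise on short measurements.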

@DanielBotnik

Author

Also, I couldn't find the project formatter :(

I'll go through and manually revert the unnecessary changes.

@DanielBotnik DanielBotnik force-pushed the faster-symbol-table branch 10 times, most recently from 86eac14 to b1d6c8b Compare May 16, 2026 17:16
@eliben eliben requested a review from sevaa May 18, 2026 13:06
@eliben

eliben commented May 18, 2026

Owner

@sevaa this seems of interest to your work to speed up pyelftools

@DanielBotnik DanielBotnik force-pushed the faster-symbol-table branch 2 times, most recently from eeb3ada to be378e3 Compare May 21, 2026 20:33
@DanielBotnik

Author

Rebased onto main and added type annotations.

SymbolTableSection parsed every Elf_Sym individually through the
recursive `construct` parser, each with its own stream seek, and
resolved every name with a separate per-symbol cstring stream parse.
On binaries with many symbols this dominates load time.

iter_symbols() now reads the whole table once and decodes it with a
single struct.Struct (struct.iter_unpack), slicing all names out of
one string-table read; the per-symbol get_symbol() keeps the original
construct parse as the authoritative single-symbol path. The
st_info/st_other sub-Containers (<=256 distinct values per table) are
interned, and the entry Container is built via __new__ + __dict__
assignment (what Container.__init__ does anyway), which measured about
15% faster for this function than Container(**fields).
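
The interning and `__new__` + `__dict__` construction can be sketched as follows. The `Container` class here is a simplified stand-in, not construct's actual class, and `make_container` / `intern_st_info` are hypothetical names; the bind/type tables are abbreviated.

```python
class Container:
    """Simplified stand-in for an attribute bag; NOT construct's real class."""

    def __init__(self, **kw):
        self.__dict__.update(kw)


# Abbreviated value -> name tables for the two halves of st_info.
ST_BIND = {0: "STB_LOCAL", 1: "STB_GLOBAL", 2: "STB_WEAK"}
ST_TYPE = {0: "STT_NOTYPE", 1: "STT_OBJECT", 2: "STT_FUNC",
           3: "STT_SECTION", 4: "STT_FILE"}


def make_container(fields):
    # Skip __init__ and its keyword-argument unpacking: allocate the object
    # and hand it the prepared dict directly.
    obj = Container.__new__(Container)
    obj.__dict__ = fields
    return obj


def intern_st_info(cache, st_info):
    """Return one shared sub-Container per distinct st_info byte."""
    cached = cache.get(st_info)
    if cached is None:
        bind, typ = st_info >> 4, st_info & 0xF
        cached = cache[st_info] = make_container({
            "bind": ST_BIND.get(bind, bind),  # unnamed values stay raw ints
            "type": ST_TYPE.get(typ, typ),
        })
    return cached
```

Because `st_info` is a single byte, the cache can never hold more than 256 entries per table. One trade-off of this sketch is that the shared sub-Containers are mutable, so a caller that modifies one would affect every symbol that shares it.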

Measured best-of-6, iter_symbols over a whole .symtab:

  MIPS BE32, 8675 syms:  205 ms -> 10 ms  (~20x)
  ARM  LE32, 6221 syms:  144 ms ->  7 ms  (~21x)

Class/endianness are read from self.structs (always set wherever the
construct path worked), not self.elffile, so the fast path covers
callers that build a SymbolTableSection without an ELFFile (e.g. cle's
dynamic-symbol table) exactly as the old construct path did.
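
As an illustration of deriving the decoder from class and endianness alone, here is a minimal sketch. `sym_struct_for` is a hypothetical helper, and passing `structs.elfclass` and `structs.little_endian` as its arguments is an assumption about how the values would be sourced, not a quote from the patch.

```python
import struct


def sym_struct_for(elfclass, little_endian):
    """Pick the Elf_Sym layout for a given ELF class and byte order."""
    order = "<" if little_endian else ">"
    if elfclass == 32:
        # st_name, st_value, st_size, st_info, st_other, st_shndx
        return struct.Struct(order + "IIIBBH")
    if elfclass == 64:
        # st_name, st_info, st_other, st_shndx, st_value, st_size
        return struct.Struct(order + "IBBHQQ")
    raise ValueError("unsupported ELF class: %r" % (elfclass,))
```

The explicit `<` / `>` prefix also disables native alignment padding, so the sizes come out as the 16 and 24 bytes the ELF specification defines.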

The enum/bitfield members are decoded through value->name maps placed
in elf/enums.py next to their forward enums, mirroring the existing
DW_FORM_raw2name convention; an unnamed value stays the raw integer,
matching construct's _default_=Pass Enum semantics, so the returned
Symbol objects are indistinguishable from the slow path.
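
A value->name map with the raw-integer fallback might look like the sketch below. The dictionary names are hypothetical and the enum is abbreviated; the `_default_` entry uses `None` as a placeholder for construct's `Pass`.

```python
# Abbreviated forward enum (name -> value), with a placeholder default entry.
EXAMPLE_ST_TYPE = {
    "STT_NOTYPE": 0,
    "STT_OBJECT": 1,
    "STT_FUNC": 2,
    "_default_": None,  # placeholder for construct's Pass
}

# Reverse map built once, next to the forward enum; skip the default entry.
EXAMPLE_ST_TYPE_raw2name = {
    value: name
    for name, value in EXAMPLE_ST_TYPE.items()
    if name != "_default_"
}


def decode_enum(raw2name, value):
    """Return the symbolic name, or the raw integer if the value is unnamed."""
    return raw2name.get(value, value)
```

With this shape, `decode_enum(EXAMPLE_ST_TYPE_raw2name, 2)` yields the name while an unlisted value such as 13 is returned unchanged.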

Validated: full unittest suite (118), the readelf comparison across
all 62 test binaries, and the examples suite all pass; symbol output
is byte-identical to stock for MIPS BE32, ARM LE32 and x86-64 LE64.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@DanielBotnik DanielBotnik force-pushed the faster-symbol-table branch from be378e3 to 2b4e734 Compare May 28, 2026 20:01
@DanielBotnik

Author

Hey, rebased again :)

@DanielBotnik

Author

Hey @sevaa, any way we can push this forward? Thanks :)

@DanielBotnik

Author

@eliben any chance you can take a look at this?

@eliben

eliben commented Jun 5, 2026

Owner

> @eliben any chance you can take a look at this?

@sevaa is the one here most interested in run-time performance optimizations. It's summer time, perhaps he's out on vacation?

You'll have to be patient, @DanielBotnik, sorry. You can obviously use your fork for now - this is OSS, after all.

@sevaa

sevaa commented Jun 5, 2026

Collaborator

Just preoccupied elsewhere. It's OSS, as you mentioned :) I'll take a look.
