Optimization: bulk-parse symbol tables (~20x faster iter_symbols)#653
Open
DanielBotnik wants to merge 1 commit into
Conversation
Author
|
Also, I couldn't find the project formatter :( I'll go and manually revert the unnecessary changes. |
86eac14 to
b1d6c8b
Compare
Owner
|
@sevaa this seems of interest to your work to speed up pyelftools |
eeb3ada to
be378e3
Compare
Author
|
Rebased onto main and added Type Annotations |
SymbolTableSection parsed every Elf_Sym individually through the recursive `construct` parser, each with its own stream seek, and resolved every name with a separate per-symbol cstring stream parse. On binaries with many symbols this dominates load time.

iter_symbols() now reads the whole table once and decodes it with a single struct.Struct (struct.iter_unpack), slicing all names out of one string-table read; the per-symbol get_symbol() keeps the original construct parse as the authoritative single-symbol path. The st_info/st_other sub-Containers (<=256 distinct values per table) are interned, and the entry Container is built via __new__ + __dict__ assignment (what Container.__init__ does anyway) -- measured ~15% of this function over Container(**fields).

Measured best-of-6, iter_symbols over a whole .symtab:
MIPS BE32, 8675 syms: 205 ms -> 10 ms (~20x)
ARM LE32, 6221 syms: 144 ms -> 7 ms (~21x)

Class/endianness are read from self.structs (always set wherever the construct path worked), not self.elffile, so the fast path covers callers that build a SymbolTableSection without an ELFFile (e.g. cle's dynamic-symbol table) exactly as the old construct path did.

The enum/bitfield members are decoded through value->name maps placed in elf/enums.py next to their forward enums, mirroring the existing DW_FORM_raw2name convention; an unnamed value stays the raw integer, matching construct's _default_=Pass Enum semantics, so the returned Symbol objects are indistinguishable from the slow path.

Validated: full unittest suite (118), the readelf comparison across all 62 test binaries, and the examples suite all pass; symbol output is byte-identical to stock for MIPS BE32, ARM LE32 and x86-64 LE64.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
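For readers skimming the thread, the bulk-decode idea in the commit message can be sketched roughly as follows. This is not the code from the PR: it is a minimal standalone illustration that assumes a 32-bit little-endian Elf32_Sym layout, returns plain dicts instead of pyelftools Containers, leaves st_other undecoded, and uses hypothetical names (`bulk_parse_symbols`, `name_at`, `ST_BIND_NAMES`, `ST_TYPE_NAMES`) that do not exist in pyelftools.

```python
import struct

# Elf32_Sym, little-endian: st_name, st_value, st_size (u32 each),
# st_info, st_other (u8 each), st_shndx (u16) = 16 bytes per entry.
ELF32_SYM_LE = struct.Struct("<IIIBBH")

# Hypothetical value->name maps covering a few standard ELF values only.
# An unnamed value is left as the raw integer.
ST_BIND_NAMES = {0: "STB_LOCAL", 1: "STB_GLOBAL", 2: "STB_WEAK"}
ST_TYPE_NAMES = {0: "STT_NOTYPE", 1: "STT_OBJECT", 2: "STT_FUNC",
                 3: "STT_SECTION", 4: "STT_FILE"}


def name_at(strtab: bytes, offset: int) -> str:
    """Slice one NUL-terminated name out of an already-read string table."""
    end = strtab.find(b"\0", offset)
    if end == -1:
        end = len(strtab)
    return strtab[offset:end].decode("utf-8", errors="replace")


def bulk_parse_symbols(symtab: bytes, strtab: bytes, sym_struct=ELF32_SYM_LE):
    """Decode a whole symbol table from two byte buffers in one pass."""
    info_cache = {}  # at most 256 distinct st_info bytes, so intern them
    symbols = []
    # iter_unpack requires len(symtab) to be a multiple of sym_struct.size.
    for st_name, st_value, st_size, st_info, st_other, st_shndx in \
            sym_struct.iter_unpack(symtab):
        info = info_cache.get(st_info)
        if info is None:
            bind, typ = st_info >> 4, st_info & 0xF
            info = info_cache[st_info] = {
                "bind": ST_BIND_NAMES.get(bind, bind),
                "type": ST_TYPE_NAMES.get(typ, typ),
            }
        symbols.append({
            "name": name_at(strtab, st_name),
            "st_value": st_value,
            "st_size": st_size,
            "st_info": info,  # shared object: treat as read-only
            "st_other": st_other,
            "st_shndx": st_shndx,
        })
    return symbols
```

The trade-off shown here is the one the commit message describes: two bulk reads and one precompiled `struct.Struct` replace a seek and a recursive parse per symbol, at the cost of sharing the interned `st_info` objects between entries.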
be378e3 to
2b4e734
Compare
Author
|
Hey, Rebased again :) |
Author
|
Hey @sevaa, any way we can push this forward? Thanks :) |
Author
|
@eliben any chance you can take a look at this? |
Owner
@sevaa is the one here most interested in run-time performance optimizations. It's summer time, perhaps he's out on vacation? You'll have to be patient, @DanielBotnik, sorry. You can obviously use your fork for now - this is OSS, after all. |
Collaborator
|
Just preoccupied elsewhere. It's OSS, as you mentioned :) I'll take a look. |
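SymbolTableSection parsed every Elf_Sym individually through the recursive `construct` parser, each with its own stream seek, and resolved every name with a separate per-symbol cstring stream parse. On binaries with many symbols this dominates load time.

iter_symbols() now reads the whole table once and decodes it with a single struct.Struct (struct.iter_unpack), slicing all names out of one string-table read.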
Measured best-of-6, iter_symbols over a whole .symtab:
MIPS BE32, 8675 syms: 205 ms -> 10 ms (~20x)
ARM LE32, 6221 syms: 144 ms -> 7 ms (~21x)
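
The PR does not include its benchmark script, so the following is only a guess at how a best-of-6 figure like the ones above could be reproduced. The helper names `best_of_ms` and `time_iter_symbols` are hypothetical; the pyelftools calls used (`ELFFile`, `get_section_by_name`, `iter_symbols`) are the library's existing public API.

```python
import timeit


def best_of_ms(func, repeat=6):
    """Run func `repeat` times, one call per run, and return the fastest in ms."""
    return min(timeit.repeat(func, repeat=repeat, number=1)) * 1000.0


def time_iter_symbols(path, repeat=6):
    """Time a full iter_symbols() pass over .symtab of the ELF file at `path`."""
    # Imported here so best_of_ms stays usable without pyelftools installed.
    from elftools.elf.elffile import ELFFile

    with open(path, "rb") as f:
        symtab = ELFFile(f).get_section_by_name(".symtab")
        if symtab is None:
            raise ValueError("no .symtab section in " + path)
        # Exhaust the iterator so every symbol is actually decoded.
        return best_of_ms(lambda: sum(1 for _ in symtab.iter_symbols()), repeat)
```

Taking the minimum rather than the mean is the usual choice for this kind of micro-benchmark, since slower runs mostly reflect interference from the rest of the system. Running `time_iter_symbols` against the same binary on main and on this branch would give a before/after pair comparable to the table above.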