Fix metadata path leaking outside cache when an ancestor dir contains '.py'#762
Open
Kymi808 wants to merge 1 commit into
Open
Fix metadata path leaking outside cache when an ancestor dir contains '.py'#762Kymi808 wants to merge 1 commit into
Kymi808 wants to merge 1 commit into
Conversation
`_copy_script_and_other_resources_in_importable_dir` derived the cache
metadata sibling via `importable_local_file.split(".py")[0] + ".json"`. The
intent (per the comment immediately below) is to swap the trailing `.py`
extension for `.json`, but `str.split(".py")` splits on every occurrence and
`[0]` takes everything before the first.
For any cache path with `.py` in an ancestor directory — `.pyenv`,
`.pycache`, pypy paths — the prefix gets truncated to before that directory.
On a pyenv user's machine
`/home/u/.pyenv/.../accuracy/<hash>/accuracy.py` becomes `/home/u/.json`,
so the metadata is written outside the cache tree (and gets clobbered by every
subsequent `evaluate.load()`).
Use `os.path.splitext(...)[0]` to strip only the trailing extension, matching
the comment's intent. Adds a regression test using a tmp dir whose ancestor
contains `.pyenv` and asserts the metadata lands next to the copied script
and not at `<root>/.json`.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
_copy_script_and_other_resources_in_importable_dir(src/evaluate/loading.py:333) derives the cache metadata JSON sibling of the loaded script:The intent (per the comment two lines below — *"the filename is .py in our case, so better rename to filenam.json instead of filename.py.json") is to swap the trailing
.pyextension for.json. Butstr.split(".py")splits on every occurrence of.pyand[0]keeps everything before the first.For any user whose cache path has
.pysomewhere in an ancestor directory —.pyenv,.pycache, pypy install paths, project directories containing.py— the prefix gets truncated to before that directory.A pyenv user with an importable file at
/home/u/.pyenv/versions/.../evaluate_modules/metrics/accuracy/<hash>/accuracy.pyends up writing the metadata to
/home/u/.json— outside the cache tree — and every subsequentevaluate.load(...)clobbers it.Fix
Use
os.path.splitext(...)[0]so only the trailing extension is stripped, matching the comment's stated intent.Why this isn't a behavior change someone is relying on
grep -rn '"original file path"\|"local file path"\|meta_path'shows nothing in the repo reads the metadata back — it's write-only informational JSON. The only effect of the bug is junk JSON files at unexpected paths.Test plan
Added
test_copy_script_metadata_path_when_ancestor_dir_contains_pyintests/test_load.py. It calls_copy_script_and_other_resources_in_importable_dirwithimportable_directory_path = <tmp>/.pyenv/evaluate_modulesand asserts:<tmp>/.pyenv/evaluate_modules/<hash>/accuracy.json(next to the script);<tmp>/.json.The test fails on
mainwithAssertionError: meta file missing at <expected path>and passes with this change.pytest tests/test_load.py::test_copy_script_metadata_path_when_ancestor_dir_contains_py→ 1 passed.