fix: implement thread-safe in-memory cache in data_loader.py #642

Merged
komalharshita merged 3 commits into komalharshita:main from anshul23102:fix/data-loader-cache
Jun 5, 2026

@anshul23102
Contributor

@anshul23102 anshul23102 commented May 26, 2026

Description

utils/data_loader.py declared _projects_cache = None, but load_all_projects() never read or wrote that variable. Every call to load_all_projects() opened and parsed projects.json from disk unconditionally. As a result, requests to /, /api/recommend, and /project/<id> each trigger at least one redundant file read.

Root Cause

_projects_cache = None  # defined but never checked or populated

def load_all_projects():
    with open(DATA_FILE, "r", encoding="utf-8") as f:
        return json.load(f)  # reads the file on every call

Related Issue

Closes #271

Type of Change

  • Bug fix

Changes Made

  • utils/data_loader.py:
    • Added _cache_lock = threading.Lock() for thread safety.
    • Implemented double-checked locking in load_all_projects() so the file is read at most once until clear_cache() is called.
    • clear_cache() acquires the lock before resetting _projects_cache = None to prevent partial reads during cache invalidation.

Testing Done

  1. First call to load_all_projects(): file is read and result is cached.
  2. Subsequent calls: cached list returned without file I/O.
  3. clear_cache(): resets the cache; next call re-reads the file.
  4. Concurrent calls from multiple threads: no duplicate reads due to the lock.
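
A check along the lines of step 4 could look like the sketch below. It is self-contained and does not import the project code: slow_read() is a hypothetical stand-in for opening and parsing projects.json, and load_cached(), run_concurrently(), and read_count are illustrative names, not part of this PR.

```python
import threading
import time

_cache = None
_lock = threading.Lock()
read_count = 0  # how many times the simulated disk read ran


def slow_read():
    """Hypothetical stand-in for opening and parsing projects.json."""
    global read_count
    read_count += 1    # only ever called while _lock is held
    time.sleep(0.05)   # widen the race window
    return [{"id": 1, "title": "demo"}]


def load_cached():
    """Double-checked locking around slow_read()."""
    global _cache
    cached = _cache
    if cached is not None:
        return cached
    with _lock:
        if _cache is None:
            _cache = slow_read()
        return _cache


def run_concurrently(n_threads=8):
    """Call load_cached() from n_threads threads released at the same moment."""
    results = []
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()  # all threads hit the cold cache together
        results.append(load_cached())

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_concurrently()
assert read_count == 1                           # one read despite eight callers
assert all(r is results[0] for r in results)     # every thread got the same list
```

Removing the second check inside the lock should make the first assertion fail, which is a quick way to confirm the test actually exercises the race.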

Checklist

  • My code follows the style and formatting of this project
  • I have tested my changes locally and they work as expected
  • There are no merge conflicts with the base branch
  • This PR is linked to the correct issue

_projects_cache was declared but load_all_projects() never read or
wrote it, causing a redundant disk read on every request. Added
double-checked locking with threading.Lock so the JSON file is
read once and reused safely across concurrent requests.
clear_cache() now acquires the same lock before resetting.
@vercel

vercel Bot commented May 26, 2026

@anshul23102 is attempting to deploy a commit to the komalsony234-1530's projects Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions Bot added the gssoc-2026, type:bug, type:performance, and type:security labels and removed the type:bug and gssoc-2026 labels May 26, 2026
@anshul23102
Contributor Author

@komalharshita this PR is ready for review and all CI checks pass. Could you please add the relevant labels? It helps with tracking. Thank you!

@anshul23102
Contributor Author

Hi @komalharshita, just a gentle check-in on this PR. It has been a couple of days since the last activity. Happy to make any changes if you have feedback. Thanks for your time!

@anshul23102
Contributor Author

Gentle ping -- this PR has been open for 2 days with no activity. Could you please review it when you get a chance? Happy to make any adjustments.

Merges upstream single-check cache with thread-safe double-checked
locking. Keeps the _cache_lock guard to prevent duplicate file reads
under concurrent requests.
@github-actions github-actions Bot added the gssoc-2026 and type:bug labels Jun 2, 2026
@komalharshita
Owner

Looks safe to merge now

@komalharshita komalharshita merged commit 5b9046e into komalharshita:main Jun 5, 2026
4 of 8 checks passed

Development

Successfully merging this pull request may close these issues.

[Bug]: In-memory cache in data_loader.py is defined but never used

2 participants