Vine: Add caching support for selective task re-execution #4376

Open
talha129 wants to merge 4 commits into cooperative-computing-lab:master from talha129:task-caching

talha129 commented Mar 9, 2026

Proposed Changes

This change is motivated by the caching requirements described in "Efficiently Reproducing Distributed Workflows in Notebook-based Systems" (Azaz et al., arXiv:2603.26965), as suggested by Dr. Douglas Thain.

Task Result Caching (enable_tasks_cache)

Adds opt-in memoization of Task results via Manager.enable_tasks_cache(). On first execution, task outputs are fingerprinted and stored to a local cache directory. On subsequent submissions with identical function and arguments, results are returned directly from cache without dispatching to a worker. Cache state persists across manager restarts via a JSON transaction log. A new test TR_vine_python_cache.sh verifies first-run execution, cache hits on re-submission, and correct cache misses for new arguments.
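As a rough illustration of the behavior described above, the following is a minimal sketch and not the code in this PR. `SketchTasksCache`, its `run` method, the `transactions.jsonl` file name, and the one-JSON-object-per-line log format are all hypothetical names and assumptions chosen for the example. The lookup key here is derived from the function and its arguments, and calling the function locally stands in for dispatching a task to a worker.

```python
import hashlib
import json
import os
import pickle
import tempfile


class SketchTasksCache:
    """Hypothetical stand-in for a task-result cache; not the TaskVine implementation."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
        # Assumed log format: one JSON object per line.
        self.log_path = os.path.join(cache_dir, "transactions.jsonl")
        self.index = {}
        # Replay the log so cache state survives a restart.
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    entry = json.loads(line)
                    self.index[entry["key"]] = entry["file"]

    @staticmethod
    def fingerprint(func, args, kwargs):
        # Simplified key: function name, bytecode, and pickled arguments.
        # Bytecode alone does not cover constants or closures, so a real
        # fingerprint would need to be more thorough.
        h = hashlib.sha256()
        h.update(func.__qualname__.encode())
        h.update(func.__code__.co_code)
        h.update(pickle.dumps((args, sorted(kwargs.items()))))
        return h.hexdigest()

    def run(self, func, *args, **kwargs):
        """Return (result, was_cache_hit)."""
        key = self.fingerprint(func, args, kwargs)
        if key in self.index:
            with open(self.index[key], "rb") as f:
                return pickle.load(f), True
        # Cache miss: a local call stands in for dispatching to a worker.
        result = func(*args, **kwargs)
        path = os.path.join(self.cache_dir, key + ".pkl")
        with open(path, "wb") as f:
            pickle.dump(result, f)
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "file": path}) + "\n")
        self.index[key] = path
        return result, False


def square(x):
    return x * x


cache_dir = tempfile.mkdtemp()
cache = SketchTasksCache(cache_dir)
first = cache.run(square, 4)    # first run: executes the function
second = cache.run(square, 4)   # same function and arguments: served from cache
other = cache.run(square, 5)    # new arguments: executes again
restarted = SketchTasksCache(cache_dir).run(square, 4)  # new instance replays the log
```

The three cases exercised at the end mirror what the description says the new test checks: first-run execution, a hit on re-submission, and a miss for new arguments. The final line adds the restart case.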

Merge Checklist

The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.

  • make test: Run local tests prior to pushing.
  • make format: Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
  • make lint: Run lint on source code prior to pushing.
  • Manual Update: Update the manual to reflect user-visible changes.
  • Type Labels: Select a github label for the type: bugfix, enhancement, etc.
  • Product Labels: Select a github label for the product: TaskVine, Makeflow, etc.
  • PR RTM: Mark your PR as ready to merge.

talha129 changed the title from "ADD: Caching support for task previosuly completed" to "NBReplay - Add caching support for selective task re-execution" Mar 9, 2026

dthain (Member) commented Mar 9, 2026

Please add a test case that exercises the new feature in taskvine/test.
You might look at vine_python_tag.py and TR_vine_python_tag.sh as examples to start from.

dthain (Member) commented Mar 18, 2026

It looks like your tests ran, all except the newly added one.
I think you need to make the TR_vine_python_cache.sh executable in order for it to be recognized as a test.

talha129 (Author) commented

@dthain thanks for pointing it out. I have made it executable. I'm still unable to run the complete test suite locally (7 tests fail), even though the new test file works fine. I'm running on macOS, so I wanted to know whether there is a previously logged issue about this.

talha129 changed the title from "NBReplay - Add caching support for selective task re-execution" to "Vine: Add caching support for selective task re-execution" Mar 23, 2026
talha129 requested a review from btovar April 1, 2026 20:00

btovar (Member) left a comment

Just some minor comments.
I'd suggest moving each `if self._tasks_cache: ...code...` block into its own function. Inside these functions, the first lines should be `if not self._tasks_cache:` followed by `return`. I think that will make it clear what is part of the cache and what is part of the regular TaskVine task management.
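The structure being suggested might look roughly like the sketch below. Everything here is hypothetical and written only to show the guard-clause pattern: `SketchManager`, `SketchTasksCache`, `_cache_lookup`, `_cache_store`, and this simplified `submit` are not the TaskVine API, and calling the function directly stands in for real task dispatch.

```python
class SketchTasksCache:
    """Hypothetical cache object; instances are truthy, None means caching is off."""

    def __init__(self):
        self.results = {}


class SketchManager:
    """Hypothetical manager showing cache logic isolated behind guard clauses."""

    def __init__(self):
        self._tasks_cache = None
        self.dispatched = []

    def enable_tasks_cache(self):
        self._tasks_cache = SketchTasksCache()

    def _cache_lookup(self, key):
        # Guard clause first: nothing below runs unless caching is enabled.
        if not self._tasks_cache:
            return None
        return self._tasks_cache.results.get(key)

    def _cache_store(self, key, result):
        if not self._tasks_cache:
            return
        self._tasks_cache.results[key] = result

    def submit(self, key, func, *args):
        # Regular task management calls the helpers unconditionally;
        # there is no `if self._tasks_cache:` branch in this method.
        # Note: this sketch cannot cache a result that is itself None.
        cached = self._cache_lookup(key)
        if cached is not None:
            return cached
        self.dispatched.append(key)
        result = func(*args)  # stands in for running the task on a worker
        self._cache_store(key, result)
        return result


plain = SketchManager()
plain.submit("abs(-3)", abs, -3)
plain.submit("abs(-3)", abs, -3)   # caching off: dispatched twice

cached = SketchManager()
cached.enable_tasks_cache()
cached.submit("abs(-3)", abs, -3)
repeat = cached.submit("abs(-3)", abs, -3)   # caching on: second call is a hit
```

With this layout, the cache-specific code lives entirely in the underscore-prefixed helpers, and the main path reads the same whether or not caching is enabled.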

Comment thread taskvine/src/bindings/python3/ndcctools/taskvine/manager.py Outdated
self._update_status_display()

# Drain cached queue before blocking on C runtime.
if self._task_cache and self._cached_queue:

Member

Same as above: let's isolate this into its own method, `wait_for_tag_with_cache` or something like that.
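One way to read this suggestion is sketched below, under several assumptions: the cached queue holds `(tag, result)` pairs, a plain boolean stands in for the cache object, and a second in-memory queue stands in for the blocking call into the C runtime. The class and its attributes are hypothetical; only the method name comes from the comment above.

```python
from collections import deque


class SketchManager:
    """Hypothetical manager; queues and flag are stand-ins for illustration."""

    def __init__(self, cache_enabled=False):
        self._tasks_cache = cache_enabled   # stand-in for the real cache object
        self._cached_queue = deque()        # (tag, result) pairs served from cache
        self._runtime_queue = deque()       # stand-in for results from the C runtime

    def _wait_for_tag_with_cache(self, tag):
        # Guard clause: with caching disabled this helper is a no-op.
        if not self._tasks_cache:
            return None
        for i, (task_tag, _result) in enumerate(self._cached_queue):
            if task_tag == tag:
                entry = self._cached_queue[i]
                del self._cached_queue[i]
                return entry
        return None

    def wait_for_tag(self, tag):
        # Drain the cached queue before falling through to the runtime.
        hit = self._wait_for_tag_with_cache(tag)
        if hit is not None:
            return hit
        # Stand-in for the blocking wait on the C runtime.
        for i, (task_tag, _result) in enumerate(self._runtime_queue):
            if task_tag == tag:
                entry = self._runtime_queue[i]
                del self._runtime_queue[i]
                return entry
        return None


manager = SketchManager(cache_enabled=True)
manager._cached_queue.append(("t1", "cached result"))
manager._runtime_queue.append(("t1", "fresh result"))

first = manager.wait_for_tag("t1")    # served from the cached queue
second = manager.wait_for_tag("t1")   # cached queue is empty, falls through
third = manager.wait_for_tag("t1")    # nothing left in either queue
```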

Comment thread taskvine/src/bindings/python3/ndcctools/taskvine/vine_cache.py Outdated
…cache and TaskCache to _tasks_cache and TasksCache

talha129 (Author) commented

@btovar I have moved all caching logic into its own isolated functions and made the other changes as requested.

talha129 requested a review from btovar April 15, 2026 22:25

btovar (Member) left a comment

looks good! ready to merge?
