Features:
- Add client.submit_many() with TaskRef for batch submission and
automatic topological ordering of dependencies.
- Expose ResultRef.commit_hash so users can inspect lineage, diff,
and manage commits directly from a result reference.
- Add Client.clear() as a convenience wrapper for gc(timedelta(days=0)).
- Add CASHET_DIR environment variable fallback for store_dir.
- CLI improvements: cashet get pretty-prints strings/dicts/lists,
cashet clear alias, and human-readable disk sizes in stats.
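The automatic topological ordering behind `submit_many` can be illustrated with the standard library's `graphlib`. This is an illustrative sketch of the ordering step only, not `cashet`'s actual code; the task names and dependency map are made up:

```python
# Illustrative sketch of dependency ordering for batch submission.
# Not cashet's implementation; the dependency map below is hypothetical.
from graphlib import TopologicalSorter

# Each task maps to the set of tasks whose results it consumes.
task_deps = {
    "load": set(),
    "clean": {"load"},
    "join": {"clean", "load"},
    "report": {"join"},
}

# static_order() yields tasks so that every dependency precedes its dependents;
# a batch submitter can then submit tasks in this order.
order = list(TopologicalSorter(task_deps).static_order())
print(order)
```

`graphlib` also raises `CycleError` on circular dependencies, which is the natural place for a batch API to reject malformed task graphs.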
Notebook & dynamic source support:
- Add dill dependency for source extraction in notebooks/REPLs.
- Tiered fallback in get_func_source(): inspect.getsource -> dill ->
stable bytecode representation (co_code + co_consts).
- Functions defined in Jupyter, IPython, exec(), and lambdas now
hash correctly and invalidate cache on semantic changes.
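The tiered fallback can be sketched roughly as follows. This is a simplified sketch, not `cashet`'s real `get_func_source()`; `dill` is treated as optional here, and the function name is hypothetical:

```python
# Simplified sketch of tiered source resolution; cashet's real
# get_func_source() may differ in detail.
import inspect

def get_func_source_sketch(func):
    # Tier 1: works for functions defined in ordinary .py files
    try:
        return inspect.getsource(func)
    except (OSError, TypeError):
        pass
    # Tier 2: dill can often recover source in notebooks and REPLs
    try:
        import dill.source
        return dill.source.getsource(func)
    except Exception:
        pass
    # Tier 3: stable bytecode representation, available for any live function
    code = func.__code__
    return repr((code.co_code, code.co_consts))

# exec()-defined functions have no file on disk, so tiers 1-2 can fail;
# the bytecode tier still produces a stable, hashable string.
ns = {}
exec("def double(x):\n    return x * 2", ns)
src = get_func_source_sketch(ns["double"])
```

Because the tier-3 representation depends only on the compiled code object, redefining an identical function yields the identical string, while a semantic change to the body changes it.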
Thread safety & correctness:
- Fix LocalExecutor check-then-act race with a shared lock registry
keyed by store path. Concurrent submits deduplicate across threads
and even across separate Client instances sharing the same store.
- Fix client.get() to populate commit_hash on the temporary ResultRef.
Robustness:
- Replace string-building _stable_repr with progressive _stable_hash
to avoid multi-megabyte intermediate strings for large args.
- Add length-prefixed hashing tags to prevent collision attacks.
- Add cycle detection in _stable_repr_to and _stable_hash for
recursive data structures (lists, dicts, sets, objects).
- Add input validation to submit_many for clear TypeError messages
on bad task shapes.
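The three hashing changes can be sketched together: feed values into a digest incrementally instead of building one giant repr string, length-prefix every chunk so adjacent values cannot shift boundaries, and track container identities to survive cycles. A minimal sketch assuming `blake2b`; `cashet`'s `_stable_hash` may use a different digest and tag scheme:

```python
# Minimal sketch of progressive, length-prefixed hashing with cycle
# detection; digest choice and tag names are assumptions, not cashet's.
import hashlib

def stable_hash(obj):
    h = hashlib.blake2b(digest_size=16)
    _feed(obj, h, seen=set())
    return h.hexdigest()

def _feed(obj, h, seen):
    def tag(kind, payload=b""):
        # type tag + length prefix prevents boundary-shifting collisions
        h.update(kind)
        h.update(len(payload).to_bytes(8, "big"))
        h.update(payload)
    container = isinstance(obj, (list, tuple, dict, set, frozenset))
    if container:
        if id(obj) in seen:
            tag(b"CYC")          # recursive structure: mark and stop
            return
        seen.add(id(obj))
    if isinstance(obj, str):
        tag(b"STR", obj.encode())
    elif isinstance(obj, bytes):
        tag(b"BYT", obj)
    elif isinstance(obj, (bool, int, float, type(None))):
        tag(b"NUM", repr(obj).encode())
    elif isinstance(obj, (list, tuple)):
        tag(b"SEQ")
        for item in obj:
            _feed(item, h, seen)
    elif isinstance(obj, dict):
        tag(b"MAP")
        for k in sorted(obj, key=repr):
            _feed(k, h, seen)
            _feed(obj[k], h, seen)
    elif isinstance(obj, (set, frozenset)):
        tag(b"SET")
        for item in sorted(obj, key=repr):
            _feed(item, h, seen)
    else:
        tag(b"OBJ", repr(obj).encode())
    if container:
        seen.discard(id(obj))
```

Removing the id from `seen` after recursion means shared (DAG-like) substructures hash normally and only true cycles hit the `CYC` marker; the progressive `h.update` calls keep memory flat regardless of argument size.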
Tests:
- Add coverage for submit_many, TaskRef wiring, progressive hash
collision resistance, disk_bytes stats, CASHET_DIR env var,
recursive structures, dynamic/bytecode hashing, and concurrent
deduplication both within one Client and across multiple Clients.
- Verify all 8 pipeline scenarios execute correctly inside an actual
Jupyter kernel via nbconvert.
Docs:
- Document commit_hash, Client.clear(), Jupyter/REPL support,
and thread safety guarantees in README.md.
README.md (+54 −4):
@@ -176,7 +176,7 @@ Fix the `join_crm` function and re-run the script. Steps 1-2 return instantly fr
 ### 3. Reproducible Notebook Results

-Share a result with a colleague and they can verify exactly how it was produced:
+`cashet` is designed to work in Jupyter notebooks and IPython sessions. Share a result with a colleague and they can verify exactly how it was produced:

 ```python
 # your notebook
@@ -340,13 +340,22 @@ Submit a function for execution. Returns a `ResultRef` — a lazy handle to the
+`cashet` works seamlessly in Jupyter notebooks, IPython, and the Python REPL. It uses a tiered source-resolution strategy:
+
+1. **`inspect.getsource()`** — for normal `.py` files
+2. **`dill.source.getsource()`** — for interactive sessions with live history
+3. **`dis.Bytecode` fallback** — for any live function, even after a kernel restart
+
+This means you can define functions in a notebook cell, rerun the cell with changes, and `cashet` will correctly invalidate the cache based on the new code.
+
+```python
+# In a notebook cell
+client = Client()
+
+def preprocess(data):
+    return [x * 2 for x in data]
+
+ref = client.submit(preprocess, [1, 2, 3])
+```
+
+Change the cell body and rerun — the cache invalidates automatically.
+
+### Thread Safety
+
+`cashet` is safe to use from multiple threads (and processes sharing the same store directory). Concurrent submissions of the same uncached task are deduplicated: the function executes **exactly once** and all callers receive the same cached result.
+
+```python
+import threading
+
+def worker():
+    c = Client()  # separate Client instance, same store
+    c.submit(expensive_func, arg)
+
+threads = [threading.Thread(target=worker) for _ in range(10)]
+for t in threads:
+    t.start()
+for t in threads:
+    t.join()
+# expensive_func ran only once
+```
+
 ### `ResultRef`

 A lazy reference to a stored result. Pass it as an argument to chain tasks: