
Fix: volatile store partial writes through table flipping #806

Open
shyba wants to merge 3 commits into neo/edge from fix/volatile_tableflip

Conversation


@shyba shyba commented Mar 29, 2026

Bug: during the max-TTL reset, the volatile cache can receive partial writes, causing groups and links to point to stale data. See the tests.

Proposal: dual table approach with promotion and table flip at TTL/2

  • starts with two tables, new and old
  • writes go to new
  • reads check both, promoting from old to new (except for groups)
  • at TTL/2: old is wiped and the tables flip, so old becomes new and new becomes old

This PR demonstrates the issue. Feel free to fix it another way, but I believe this is the safest option without any coordinator or mutex-like behaviour to make writes and resets atomic.
As a bonus, it acts somewhat like an LRU.
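The four steps above can be sketched as follows (a minimal Python model for illustration only; the actual store is an Erlang/ETS table, and `DualTable`, `put`, `get`, and `flip` are hypothetical names, not the PR's API):

```python
class DualTable:
    """Toy model of the dual-table scheme: two generations, flipped at TTL/2."""

    def __init__(self):
        self.new = {}   # all writes land here
        self.old = {}   # previous generation, wiped at the next flip

    def put(self, key, value):
        self.new[key] = value

    def get(self, key, is_group=False):
        if key in self.new:
            return self.new[key]
        if key in self.old:
            value = self.old[key]
            if not is_group:            # groups are never promoted
                self.new[key] = value   # promote-on-read keeps hot data alive
            return value
        return None

    def flip(self):
        # Runs every TTL/2: wipe the old generation, then swap the roles.
        self.old.clear()
        self.old, self.new = self.new, self.old
```

An entry that is read at least once per half-TTL keeps getting promoted and survives indefinitely; an idle entry ages through one flip into `old` and is dropped by the next, so nothing lives past the max TTL without being touched.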

@shyba shyba marked this pull request as ready for review March 29, 2026 03:14
@shyba shyba changed the title Fix: volatile store partial writes through tablet flipping Fix: volatile store partial writes through table flipping Mar 29, 2026
@shyba shyba force-pushed the fix/volatile_tableflip branch 3 times, most recently from ae27de2 to 5875cf7 Compare March 29, 2026 16:48
@samcamwilliams
Collaborator

Nice find.

> promotes old to new (except groups)

Sounds complex (why no groups?) and costly, though... There has to be a simpler way than that?

@shyba
Author

shyba commented Mar 30, 2026

Thanks.
Groups can have children that aren't going to be promoted, so we would need to either promote all of the children or simply not promote groups. Writing a group whose children are dead is one way of breaking consistency through dangling references.

The logic is mainly that promotion happens from links to leaves. The other way around would leave dangling references, while this way leaves only orphans, which are fine (they are cleaned up on the next cycle and don't cause any errors).
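The dangling-reference hazard can be shown in a few lines (a toy Python model; `g`, `c1`, and `c2` are hypothetical keys, and a "group" here is just a key holding a list of child keys):

```python
# The old generation holds a group and its two children.
new, old = {}, {"g": ["c1", "c2"], "c1": b"data1", "c2": b"data2"}

# If a read promoted the group but not its children, a flip that wipes
# `old` would leave "g" pointing at keys that no longer exist anywhere:
new["g"] = old["g"]
old.clear()                     # the TTL/2 flip wipes the old table
dangling = [c for c in new["g"] if c not in new]

# Skipping group promotion inverts the failure mode: after the flip we
# may hold promoted children with no surviving parent ("orphans"), which
# still read correctly and are reclaimed by the next wipe.
```

Here `dangling` ends up as `["c1", "c2"]`: the promoted group references data that the flip has already destroyed, which is exactly the broken state the exception for groups avoids.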

I measured performance to be very close, but yes, it does have a cost. I was able to optimize most of it away by pattern matching on old/new hits so I don't promote twice.

The core issue is write atomicity. There are many ways to solve it: a controller process, contexts, a per-record expiry/generation, etc. However, this approach solves it by giving the writer a new table to write to long before the erase hits the old table, so the erase can never remove half of an in-flight write.
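The invariant in the paragraph above reduces to a toy sketch (Python for illustration; the key names are made up): as long as writes only ever target one table and the periodic erase only ever targets the other, a wipe firing between two parts of the same write cannot produce a partial message.

```python
new, old = {}, {}

def write(parts):
    for k, v in parts:
        new[k] = v            # every write targets the new table

def wipe_old():
    old.clear()               # the TTL/2 erase never sees the new table

# A wipe landing mid-write cannot remove any part already written:
write([("msg/part1", 1)])
wipe_old()                    # fires between the two halves of the write
write([("msg/part2", 2)])
complete = all(k in new for k in ("msg/part1", "msg/part2"))
```

Both halves of the message survive in `new`; the write only becomes erasable a full flip later, after its table has been demoted to `old`.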

As a bonus, it acts somewhat like an LRU (though far less precise), and cleanup is fast since it just wipes the whole table.

@shyba shyba force-pushed the fix/volatile_tableflip branch from 5875cf7 to 01d2796 Compare March 31, 2026 14:09
shyba added 3 commits April 8, 2026 16:54
Demonstrates the "link to link: not_found" bug: hb_cache:read gets
lazy links, TTL reset fires and wipes the table, then
ensure_all_loaded fails because the data behind the links is gone.
The old max-ttl wiped the entire ETS table on a timer, causing
dangling links when hb_cache writes span a reset boundary. The new
approach uses two tables: writes go to both, reads check "new" first
with promote-on-read from "old", and every TTL/2 the old table is
wiped and roles flip. Active data survives via promotion; idle data
expires atomically — no partial messages, no per-item timestamps,
no cleanup sweeps.
Pattern-match on ets-flip presence instead of calling get_tables.
@shyba shyba force-pushed the fix/volatile_tableflip branch from 01d2796 to e0117d8 Compare April 8, 2026 19:55