Skip to content

miner predictions: bigtable storage backend (--storage.backend)#265

Open
Thykof wants to merge 7 commits into
mainfrom
bigtable-prediction-storage
Open

miner predictions: bigtable storage backend (--storage.backend)#265
Thykof wants to merge 7 commits into
mainfrom
bigtable-prediction-storage

Conversation

@Thykof
Copy link
Copy Markdown
Collaborator

@Thykof Thykof commented May 25, 2026

Summary

  • New --storage.backend CLI flag (default postgres, alt bigtable). Postgres path is unchanged for retrocompat with other validators.
  • When bigtable is selected, save_responses ships each CORRECT prediction to a Bigtable row keyed {asset}#{start_time_unix}#{miner_id:06d} as raw float32.tobytes(). The Postgres prediction column carries a sentinel {"stored": "bigtable"} and a new nullable bigtable_key column carries the row key.
  • Two Bigtable tables (one per prompt_label, low/high) absorb retention via per-table GC policy (60d low, 12d high), so cleanup_old_history stays untouched. The table itself encodes the prompt label, so the row key does not include it.
  • Reads use a single range scan over {asset}#{start_time_unix}# and filter by the Postgres-provided bigtable_key whitelist. get_predictions_by_request hydrates Bigtable-backed rows back into the existing [start_ts, time_increment, *paths] shape so reward.py/CRPS scoring is backend-agnostic.

Design notes

  • Composition, not polymorphism. Bigtable logic lives in synth/validator/bigtable_prediction_storage.py and is injected into MinerDataHandler. The rest of miner_data_handler.py stays Postgres-focused.
  • Bigtable credentials come from env vars (BIGTABLE_PROJECT, BIGTABLE_INSTANCE, BIGTABLE_TABLE_LOW, BIGTABLE_TABLE_HIGH), matching the Postgres-vars pattern. Only --storage.backend is CLI.
  • google-cloud-bigtable is added to requirements.txt and lazy-imported in neurons/validator.py; the import is guarded by TYPE_CHECKING in miner_data_handler.py so Postgres-only validators never load the dep.
  • Anti-hotspotting: the row key leads with {asset} (~12 distinct values), which fans writes across many tablets so sequential start_time values don't pile on one tablet. Reordering the key fields casually would regress this.
  • miner_id zero-padded to 6 digits so lexicographic range scans match numeric order.
  • Failure mode: any non-zero mutate_rows status raises so save_responses' @retry(3) kicks in. Per-row decode errors on the read path log and hydrate to [] (treated as no-prediction) — one bad blob can't crash the whole scoring batch.
  • label_from_time_length raises ValueError on unknown time_length. Loud failure beats silently routing a future third cycle to the low table.
  • Per the wiki, ticket 05 (GCS prediction storage) is superseded by this change.

Test plan

  • pytest tests/ — 146 passed, 0 regressions.
  • tests/test_bigtable_prediction_storage.py — row-key format (zero-padded), float32 round-trip, table routing (low vs high), skip-on-invalid-format, raise-on-write-failure, missing-row hydration, per-row decode guard, range-scan filtering against the Postgres whitelist, unknown time_length raise.
  • tests/test_miner_data_handler.pysave_responses stores sentinel + bigtable_key, get_predictions_by_request substitutes the Bigtable blob back in, missing Bigtable rows hydrate to [].
  • black + flake8 clean on changed files. mypy delta: 0 new errors (file has pre-existing SQLAlchemy + decorator typing noise).
  • Testnet smoke: deploy with --storage.backend bigtable, verify rows show up in miner_predictions_low/miner_predictions_high, CRPS scores match a postgres-backend control.

🤖 Generated with Claude Code

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an optional Bigtable storage backend for miner prediction payloads, keeping the existing Postgres schema/flows mostly intact while offloading large prediction blobs to Bigtable when enabled via CLI.

Changes:

  • Introduces --storage.backend (postgres default, bigtable optional) and lazy Bigtable initialization in the validator.
  • Stores Bigtable-backed predictions as a sentinel JSON in Postgres plus a new nullable miner_predictions.bigtable_key, and hydrates Bigtable payloads on read.
  • Adds a Bigtable storage module and accompanying unit/integration tests, plus an Alembic migration and env var documentation.

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
synth/validator/bigtable_prediction_storage.py Implements Bigtable read/write + float32 serialization for prediction paths.
synth/validator/miner_data_handler.py Injects optional Bigtable storage into save/hydrate paths.
neurons/validator.py Adds backend selection and lazy import/initialization of Bigtable storage.
synth/utils/config.py Adds --storage.backend CLI flag and help text.
synth/db/models.py Adds nullable bigtable_key column to MinerPrediction.
alembic/versions/7c32a2966205_add_bigtable_key_to_miner_predictions.py Migration to add/drop bigtable_key.
synth/validator/forward.py Threads prompt_label through to save_responses.
synth/validator/storage_backend.py Centralizes backend constants + Bigtable sentinel.
requirements.txt Adds google-cloud-bigtable dependency.
.env.example Documents required Bigtable env vars.
tests/test_bigtable_prediction_storage.py Unit tests for routing/serialization/skip logic.
tests/test_miner_data_handler.py Integration tests for sentinel/key storage + hydration behavior.
Comments suppressed due to low confidence (1)

synth/validator/miner_data_handler.py:516

  • The exception log message in get_predictions_by_request says in get_miner_prediction, which is misleading when debugging failures in this method. It should reference get_predictions_by_request (and ideally avoid copy/paste drift).
        except Exception as e:
            bt.logging.exception(
                f"in get_miner_prediction (got an exception): {e}"
            )
            return None

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread synth/utils/config.py Outdated
Comment thread synth/validator/miner_data_handler.py Outdated
Comment thread synth/validator/bigtable_prediction_storage.py
Comment thread synth/validator/bigtable_prediction_storage.py Outdated
Comment thread synth/utils/config.py Outdated
Comment thread synth/validator/bigtable_prediction_storage.py
Comment thread synth/validator/forward.py Outdated
Comment thread synth/validator/miner_data_handler.py Outdated
Comment thread synth/validator/miner_data_handler.py
Comment thread synth/validator/miner_data_handler.py Outdated
Comment on lines 441 to 448
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_predictions_by_request should take in argument the object validator_request to avoid fetching it again from database.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.

Comment thread synth/validator/prompt_config.py Outdated
Comment on lines +99 to +103
def label_from_time_length(time_length: int) -> str:
"""Return the prompt label ('low' or 'high') for a given time_length."""
if time_length == HIGH_FREQUENCY.time_length:
return HIGH_FREQUENCY.label
return LOW_FREQUENCY.label
Comment thread synth/validator/miner_data_handler.py Outdated
Comment on lines +399 to +416
@@ -361,6 +413,7 @@ def get_predictions_by_request(
MinerPrediction.prediction,
MinerPrediction.format_validation,
MinerPrediction.process_time,
MinerPrediction.bigtable_key,
Comment on lines +3 to +8
Each (asset, start_time, miner) is one row, value = raw float32 bytes shaped
(num_simulations x num_timesteps). Two tables are used so retention is
enforced by per-table GC policy: one for the `low` prompt and one for the
`high` prompt. The table choice already encodes the prompt label, so the
row key omits it.



def test_start_time_to_unix_treats_naive_as_utc():
# Naive vs +00:00 vs trailing Z should all produce the same unix int.
Comment thread synth/utils/config.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 3 comments.

Comment on lines +213 to +222
is_correct = (
format_validation == response_validation_v2.CORRECT
)
bigtable_key = bigtable_keys.get(miner_uid)
if self.bigtable_storage is not None and is_correct:
prediction_column: typing.Any = BIGTABLE_SENTINEL
elif is_correct:
prediction_column = prediction
else:
prediction_column = []
Comment on lines +472 to +475
paths_by_key = self.bigtable_storage.read_predictions(
validator_request,
[r.bigtable_key for r in bigtable_rows],
)
Comment on lines +1107 to +1112
def read_predictions(self, validator_request, keys):
assert keys == [expected_key]
assert (
validator_request.time_length == simulation_input.time_length
)
return {expected_key: prediction[2:]}
@Thykof Thykof force-pushed the bigtable-prediction-storage branch from 8740a36 to d687e97 Compare May 26, 2026 16:03
@Thykof Thykof requested a review from Copilot May 26, 2026 16:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 1 comment.

Comment thread tests/test_miner_data_handler.py Outdated
Comment on lines +992 to +998
with db_engine.connect() as connection:
thinned_vr = connection.execute(
select(ValidatorRequest).where(
ValidatorRequest.id == thinned_vr_id
)
).one()
assert handler.get_predictions_by_request(thinned_vr) == []
Thykof and others added 7 commits May 27, 2026 18:49
New --storage.backend CLI flag (default `postgres`, alt `bigtable`). When
set to `bigtable`, save_responses ships each CORRECT prediction to a
Bigtable row keyed `{asset}#{prompt_label}#{start_time}#{miner_id}` as raw
float32 bytes; the Postgres `prediction` column then holds a sentinel JSON
and a new `bigtable_key` column carries the row key.

Two Bigtable tables (one per prompt label) absorb retention via per-table
GC policy, so cleanup_old_history stays untouched. Bigtable wiring is
isolated to synth/validator/bigtable_prediction_storage.py via composition;
miner_data_handler.py keeps its postgres focus and only branches when a
storage object is injected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…helper)

- shorter --storage.backend help text
- bt.logging.warning when a miner_uid is missing from miner_id_map on
  bigtable writes (was silently skipped)
- fix get_predictions_by_request docstring (copy/paste drift from
  get_miner_prediction) and the matching exception log message
- extract _hydrate_from_bigtable helper so get_predictions_by_request
  stays scannable

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r_request

The new prompt_label kwarg pushed onto save_responses and
query_available_miners_and_save_responses was redundant — every caller had
time_length in hand. Move derivation inside the storage class:

- `prompt_config.label_from_time_length` is now public (moved from a
  private helper in miner_data_handler).
- `BigtablePredictionStorage.write_predictions` takes simulation_input
  and derives the label internally; the prompt_label kwarg is gone from
  the public API.
- `BigtablePredictionStorage.read_predictions` takes a validator_request
  row and the list of keys; shape (num_simulations, num_timesteps) is
  derived from the row. Callers no longer build per-row metadata tuples.
- `save_responses` and `query_available_miners_and_save_responses` drop
  their prompt_label parameter; the neurons/validator.py call site stops
  threading it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Combines the Copilot bug fixes with the row-key / read-path tightening
that follows from having two tables (the table already encodes the prompt,
so the key doesn't need to).

Hardening:
- write_predictions raises on any non-zero mutate_rows status so
  save_responses' @Retry(3) kicks in. Previously a failed write still
  returned a bigtable_key, leaving Postgres pointing at nothing — those
  rows hydrated as [] later and silently dropped scoring data.
- read_predictions wraps the per-row decode in try/except: corrupted /
  mis-shaped blobs no longer crash the whole scoring batch; the bad row
  hydrates as [] and the scoring loop continues with the other miners.

Row-key + read path:
- Row key is now `{asset}#{start_time_unix}#{miner_id:06d}` — drops the
  prompt_label (redundant when the table already encodes it), uses a
  unix timestamp instead of an ISO string for deterministic parity
  between write-side (simulation_input.start_time) and read-side
  (validator_request.start_time from Postgres), zero-pads miner_id so
  lexicographic range scans match numeric order.
- read_predictions switches from N point lookups to a single range scan
  over `{asset}#{start_time_unix}#`. Filters the result by the
  Postgres-provided key whitelist so density-tapered rows (which still
  exist on the Bigtable side until GC) don't leak into the result.
- _start_time_to_unix normalizes naive ISO strings as UTC so the
  derived row-key prefix matches what Postgres-derived datetimes
  produce on read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- get_predictions_by_request(validator_request) — was (validator_request_id).
  Eliminates the duplicate `select(ValidatorRequest...)` we were doing for
  hydration metadata; reward.py and tests already have the row in hand.
- label_from_time_length raises ValueError on unknown time_length. Bigtable
  routes writes to one of two tables off this label, so a silent fallback
  to LOW could mis-shelve a future third cycle. Loud failure beats quiet
  mis-routing.
- --storage.backend uses STORAGE_BACKEND_CHOICES / STORAGE_BACKEND_POSTGRES
  from synth.validator.storage_backend instead of duplicating string
  literals.
- get_predictions_by_request docstring updated to be honest about the
  mixed Row / SimpleNamespace return — consumers access by attribute and
  don't care, but the docstring was previously claiming a uniform tuple
  shape.
- Drop the misleading "trailing Z" mention from
  test_start_time_to_unix_treats_naive_as_utc — datetime.fromisoformat
  doesn't parse Z on Python <3.11, and callers always pass
  isoformat() output with +00:00 anyway.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- get_validator_requests_to_score now selects ValidatorRequest.num_simulations
  and populates it on the returned vr. Without this, _hydrate_from_bigtable
  crashes on int(None) at runtime — Bigtable backend would have failed on
  the first scoring cycle. Surface introduced by the v2 refactor that
  routed validator_request through to read_predictions.
- save_responses raises if the Bigtable backend returns no key for a row
  it's about to mark as sentinel. The scenario shouldn't happen given
  write_predictions' current contract (returns a key for every CORRECT
  known miner, raises on mutate failure), but the dependency was implicit
  — make the invariant explicit.
- Integration test now asserts validator_request.num_simulations matches
  simulation_input.num_simulations, so a future regression in
  get_validator_requests_to_score gets caught at test time instead of
  in prod.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Core `connection.execute(select(ValidatorRequest)...).one()` happened
to work (Core expands the ORM class to columns and the resulting Row
exposes them by name), but read as ORM/Session usage to reviewers and
would actually break under Session. Switch to the explicit Session +
scalar_one() pattern so the entity-vs-row intent is unambiguous.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Thykof Thykof force-pushed the bigtable-prediction-storage branch from 869dc09 to db21434 Compare May 27, 2026 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants