feat(curtailment): persistence + preview selector by rongxin-liu · Pull Request #185 · block/proto-fleet

rongxin-liu · 2026-05-06T17:09:33Z

Background

Curtailment is the proto-fleet feature that reduces a fleet's mining power on demand — operators preview a plan, start an event, verify reduction via telemetry, then restore safely.

The contract foundation (#118) shipped the proto surface with handler stubs returning Unimplemented. The admin layer (#173) added AdminTerminateEvent, three admin-gated override fields, session-only registration of the recovery RPC, and requireAdminFromContext. This PR is the next ticket: persistence + the first operator-facing RPC. Builds on #118, #173, and the command-preflight groundwork from #110. Supersedes the closed #141.

Summary

Lands the curtailment persistence layer plus a working read-only PreviewCurtailmentPlan. Preview computes a plan but does not write events, write target rows, dispatch commands, or start reconciliation. The remaining curtailment RPCs continue to return Unimplemented and ship in subsequent tickets.

Six logical commits, layered top-down:

6f7d59c Foundation — migration 000040, sqlc queries, store interface + sqlstore, domain models, FixedKw mode + tests, TestCurtailmentEventStateNumericPins guard for the AdminTerminateEvent validator from feat(curtailment): admin RPC, override fields, and session-only auth #173.
c6d9b9a Selector — pure-logic BuildPlan with the dual-signal filter, ranking, and mode hand-off. 10 unit tests.
b5e20b1 Service + handler — orchestration, the cross-table candidate-aggregation query, proto translators, fleetd startup wiring.
8b2f64e Service tests — 14 cases against an in-memory fake store covering scope resolution, the filter matrix, cooldown vs. EMERGENCY bypass, the maintenance-pair gate, the admin-gated override, cross-tenant isolation, and insufficient-load detail propagation.
0f3cc14 Lint cleanup — golangci exhaustive / gosec / unconvert.
58c90ec … d329ff3 Review-pass fixes — bug fixes from a multi-reviewer pass plus a comment scrub and small simplifications. Detailed below.

What changed

Migration `000040`

curtailment_event — full lifecycle columns plus the admin fields shipped in feat(curtailment): admin RPC, override fields, and session-only auth #173 (allow_unbounded, effective_batch_size). CHECK constraints for the maintenance-pair invariant and non-empty external_source / external_reference / idempotency_key / reason. Partial UNIQUE indexes for idempotency, webhook dedupe, and active-event lookup.
curtailment_target — composite PK (curtailment_event_id, device_identifier). Partial indexes for pending work and active-by-device lookup.
curtailment_reconciler_heartbeat — singleton row with CHECK (id = 1). Seeded so the staleness alert always reads a row.
curtailment_org_config — per-org tunables. One row per existing non-deleted org seeded in the same transaction; down migration drops the table.

Preview pipeline

Service.Preview validates the request → resolves scope (whole-org / device-list; device-set returns Unimplemented) → loads org config → resolves the effective candidate_min_power_w (admin-gated request override > org default) → fetches active-event and cooldown exclusion sets (EMERGENCY priority bypasses cooldown) → classifies candidates with skip-reason attribution → hands the partition to BuildPlan + FixedKw.
BuildPlan is pure (no I/O, no time, no shared state): dual-signal filter (power_w >= candidate_min_power_w AND hash_rate_hs > 0, with distinct skip reasons for phantom-load / dead-monitor / below-threshold) → rank by avg_efficiency descending with unknown-efficiency last → mode hand-off.
FixedKw walks the ranking and produces target-reached / undershoot-tolerated / insufficient-load. tolerance_kw == 0 collapses the undershoot branch.
handlers/curtailment.Handler accepts an optional *Service: nil keeps the existing stub-test contract; populated wires the real implementation. The admin gate from feat(curtailment): admin RPC, override fields, and session-only auth #173 still runs first when the request carries an override field. OutcomeInsufficientLoad maps to InvalidArgument carrying available_kw, requested_kw, tolerance_kw, and per-reason exclusion counts (Connect-RPC error-detail propagation is a future enhancement).
cmd/fleetd/main.go instantiates NewSQLCurtailmentStore + curtailmentDomain.NewService and threads them into the curtailment route registration.
A new domain/curtailment/models package holds boundary types so the rest of the curtailment domain doesn't import sqlc-generated code.

Review-pass fixes

Layered on top of commits 1–4, from a multi-reviewer pass. All P0 / P1 behavior findings addressed; coverage-only items routed to follow-ups.

P0 — New orgs created post-migration no longer break Preview with NotFound. GetOrgConfig is now an INSERT ... ON CONFLICT DO UPDATE upsert that always returns a row with table-level DEFAULTs.
P1 — ListCurtailmentCandidatesByOrg no longer crashes the row scan when a device has no device_status row joined (COALESCE(ds.status::text, ''::text)::text).
P1 — InsufficientLoadDetail per-reason counters (below-threshold / phantom-load / dead-monitor) now populate; BuildPlan tracks each category locally and merges into the rejection detail post-mode.Select.
P1 — Empty DeviceStatus is skipped with SkipStaleTelemetry instead of admitted as eligible.
P1 — Device-list scope with cross-org device_identifiers rejects with NotFound listing the unrecognized IDs instead of returning InsufficientLoad.
P2 — latest_hourly CTE is org-scoped and 24h-bucket-bounded.
P2 — candidate_min_power_w_override = 0 is rejected at the service layer.
P2 — Capability gate (partial). Devices missing driver metadata are skipped with SkipCurtailFullUnsupported. The full plugin-registry-driven check is follow-up work.
P3 — reason_selected is derived from the request strategy via strategyReasonLabel(s) instead of hard-coded.
P3 — Migration's per-org seed excludes soft-deleted organizations.

Operational notes for reviewers

Write-on-read for curtailment_org_config — GetOrgConfig upserts on every Preview call. The conflict arm is a PK-keyed no-op; on first hit for a new org it inserts a row with table DEFAULTs (max_duration_default_sec=14400, candidate_min_power_w=1500, post_event_cooldown_sec=600).
latest_hourly CTE plan — the new shape pulls strictly fewer rows (org-scoped + 24h window).
Migration immutability — 000040 is not yet on main, so this PR edits it in place. The next ticket inherits the post-merge immutability rule.

What is intentionally not in this PR

StartCurtailment, Curtail dispatch, reconciler, and drift handling.
StopCurtailment and staggered restore. The effective_batch_size column lands here; the restorer that populates it lives in a follow-up.
GetActiveCurtailment, ListCurtailmentEvents, UpdateCurtailmentEvent implementations.
CurtailmentActiveFilter registration on commandSvc.
An admin RPC for mutating curtailment_org_config — the table and seed exist; the write surface lands later.
Device-set scope resolution; full plugin-registry-driven curtail_full capability gating (partial driver-metadata gate landed inline); webhook-triggered curtailment; Fleet-level efficiency events; smart PDU / outlet / rack target controls.
Frontend settings page and preview/start flow.
Coverage-only follow-ups from the review pass: handler / translate.go / SQL-integration tests, migration CHECK DB-level test, OutcomeUndershootTolerated end-to-end test, InsufficientLoad printf format pinning, Strategy / Mode / Level drop-or-wire decision, P3 polish.

Test plan

buf lint clean; golangci-lint run clean for the changed packages; sqlc generate clean against the migration; go build ./server/... clean.
go test ./internal/domain/curtailment/... ./internal/handlers/curtailment/... ./internal/domain/activity/models/... — 39+ cases green.
TestCurtailmentEventStateNumericPins asserts the validator pins from feat(curtailment): admin RPC, override fields, and session-only auth #173 (CANCELLED == 6, FAILED == 7).
modes/FixedKw covers all three outcomes, boundary ordering, empty-input guard, and constructor input validation.
selector_test.go covers the dual-signal matrix, pre-filtered forwarding, ranking (worst-efficiency-first, unknown-last, stable equal-efficiency), realized + remaining kW arithmetic, and an end-to-end FixedKw integration.
service_test.go covers (against an in-memory fake store): happy path; request validation; scope resolution; the pre-selector filter matrix; maintenance-pair gate; cooldown vs. EMERGENCY bypass; active-event exclusion; candidate_min_power_w_override precedence; cross-tenant isolation; insufficient-load detail propagation; the partial capability gate; and four review-pass regressions (EmptyDeviceStatusSkipsAsStale, DeviceListScopeRejectsCrossOrgIdentifiers, DualSignalCountersPropagateIntoInsufficientLoadDetail, MissingDriverSkipsAsCurtailFullUnsupported).

Closes #140
Refs #110
Refs #118
Refs #173

🤖 Generated with Claude Code

…-2 foundation) Lays the foundation for the BE-2 ticket (#140): the curtailment persistence layer, sqlc-backed store, domain models, the FIXED_KW mode implementation, and the enum-stability guard for the AdminTerminateEvent validator pinned in BE-1.x (#173). Migration 000040: - curtailment_event with full lifecycle columns plus the BE-1.x admin-only fields (allow_unbounded BOOLEAN, effective_batch_size INT). CHECK constraints enforce maintenance-pair consistency, non-empty external source/reference/idempotency_key, and non-empty reason. Partial UNIQUE indexes cover idempotency, webhook dedupe, and active-event lookup. - curtailment_target with composite PK (event_id, device_identifier), partial indexes for pending work and active-by-device schedule lookup. - curtailment_reconciler_heartbeat singleton seeded at migration time so the staleness alert always has a row to read. - curtailment_org_config with per-org tunables seeded one row per existing org in the same migration transaction; down migration drops the table. Domain layer: - server/internal/domain/curtailment/models defines the boundary shapes (Event, Target, OrgConfig, Heartbeat, EventState/TargetState typed wrappers) so selector/handler/modes do not import sqlc-generated code. - server/internal/domain/curtailment/modes ships the Mode interface and the FixedKw implementation. Pure logic — no I/O, no time, no shared state. Covers the three design-doc outcomes: target reached (overshoot bounded by last-added candidate), undershoot tolerated (only with explicit positive tolerance_kw), and insufficient curtailable load (with a structured InsufficientLoadDetail the handler can echo back). Store layer: - interfaces/curtailment.go defines the org-scoped CurtailmentStore; v1 surface is the minimum needed to support Preview plus the basic event/target CRUD primitives so store tests can verify the schema constraints round-trip. - sqlstores/curtailment.go implements the interface using the sqlc- generated queries (GetCurtailmentOrgConfig, ListActiveCurtailedDevicesByOrg, ListRecentlyResolvedCurtailedDevicesByOrg, InsertCurtailmentEvent, GetCurtailmentEventByUUID, InsertCurtailmentTarget, ListCurtailmentTargetsByEvent, GetCurtailmentReconcilerHeartbeat). BE-1.x guard: - TestCurtailmentEventStateNumericPins asserts CANCELLED == 6 and FAILED == 7 at build time. The AdminTerminateEventRequest validator pins on (buf.validate.field).enum.in: [6, 7]; this test fails CI before any future enum reorder can silently desynchronize the validator. Selector + handler implementation lands in follow-up commits on this branch. Refs #140 Refs #118 Refs #173 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…de hand-off) Pure-logic BuildPlan(inputs, preFiltered, candidateMinPowerW, mode) → Plan. Three responsibilities, separable for testing: 1. Dual-signal filter. Each candidate is admitted only when both power_w >= candidate_min_power_w AND hash_rate_hs > 0. The skip reason vocabulary surfaces the diagnostic flavor: - phantom_load_no_hash: drawing power but not hashing (stuck firmware) - power_telemetry_unreliable: hashing but power reads near zero (broken AC monitor) - below_candidate_min_power_w: both signals fail (likely fully idle) 2. Ranking by avg_efficiency descending — worst J/H first. Devices with nil avg_efficiency rank LAST (not first via a COALESCE-to-zero artifact), so an unranked miner doesn't accidentally get treated as best-in-class. Stable sort preserves input order on ties. 3. Mode hand-off. The ranked list goes to mode.Select; the mode owns the stop condition. The selector maps the Result back to a Plan carrying selected/skipped lists, realized + remaining kW, outcome, and the InsufficientLoadDetail when applicable. Status / pairing / capability / cooldown / active-event filters happen upstream in the service layer before reaching the selector; their skip reasons arrive via the preFiltered argument and forward through unchanged. Tests cover all three dual-signal branches, pre-filtered forwarding, ranking worst-first, unknown-efficiency-last, stable equal-efficiency ordering, realized/remaining kW arithmetic, and an end-to-end FixedKw integration. Insufficient-load propagation includes the selector-supplied exclusion counts so the handler can echo CandidateMinPowerW and per-reason counts back to the caller without a re-query. Refs #140 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wires PreviewCurtailmentPlan end-to-end on top of the BE-2 foundation. Service layer (server/internal/domain/curtailment/service.go): - Validates the request (mode/level/target_kw/tolerance + maintenance pair). - Resolves scope: whole_org and device_list pass through; device_set is rejected as Unimplemented (DeviceSetStore wiring is a follow-up). - Loads org config and resolves the effective candidate_min_power_w — per-org default tier or the admin-gated request override (the handler enforces the role gate before the service runs, per BE-1.x contract). - Pulls active-event and cooldown exclusion sets from the store; EMERGENCY priority bypasses cooldown, matching the design doc. - Calls ListCandidates and partitions the cross-table rows into (eligible, preFiltered) with skip-reason attribution: pairing, device_status (UPDATING / REBOOT_REQUIRED / OFFLINE / MAINTENANCE), stale telemetry, cooldown, active-event. The maintenance override pair admits MAINTENANCE-state miners only when both flags are true. - Hands the partition to the existing selector + FixedKw mode, which applies the dual-signal filter, ranks worst-efficiency-first, and produces a Plan with selected/skipped lists, realized + remaining kW, and an InsufficientLoadDetail when the rejection branch fires. Store + sqlc: - ListCurtailmentCandidatesByOrg joins device + latest device_metrics (15-min freshness) + latest device_metrics_hourly + device_pairing + device_status. LEFT JOINs leave nil pointers on absent rows so the service can attribute stale-telemetry skips. - ListCandidates added to CurtailmentStore + SQLCurtailmentStore. - New domain Candidate model carries the cross-table state independent of sqlc-generated types. Handler (server/internal/handlers/curtailment/handler.go): - NewHandler(service *curtailment.Service): the service is optional — nil keeps the existing stub-test contract (PreviewCurtailmentPlan returns Unimplemented), and the production wiring at fleetd startup passes a real service. - PreviewCurtailmentPlan body: admin gate from BE-1.x runs first when the request carries an override field; then session.OrganizationID drives the service call; then the Plan is translated to the proto response. OutcomeInsufficientLoad maps to InvalidArgument with a message body carrying available/requested kW and per-reason exclusion counts (Connect-RPC error-detail propagation is a future enhancement). - translate.go isolates the proto<->service-shape conversion so the service stays decoupled from generated proto types and is testable with plain Go inputs. Wiring (server/cmd/fleetd/main.go): - NewSQLCurtailmentStore + NewService instantiated alongside the other domain services; passed into NewHandler at the curtailment route registration. Existing handler tests pass nil and continue to assert the Unimplemented stub contract for the non-Preview RPCs and the BE-1.x admin-gate behavior; they all still pass after the constructor change. Refs #140 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… / scope / org-isolation matrix 14 cases against an in-memory fake CurtailmentStore. Pins the behaviors the BE-2 acceptance criteria call out plus the cross-tenant isolation invariant. Each case exercises the Preview orchestration without touching the database; the fake captures arguments so tests can assert that the service threads the caller's org_id through every store call. Coverage: - Happy path: 3 ranked miners, FIXED_KW target reached at the second. - Request validation: unsupported mode, unbalanced maintenance pair, zero/negative target. - Scope resolution: device-set returns Unimplemented (deferred); device-list narrows the store query to the supplied identifiers; empty device list is rejected. - Pre-selector filter matrix: each non-eligible device_status / pairing / staleness branch maps to its expected SkipReason; the lone ACTIVE + PAIRED + fresh miner is the only one selected. - Maintenance pair: MAINTENANCE-state miner is admitted only when both include_maintenance and force_include_maintenance are true on the request. - Cooldown: NORMAL priority calls the cooldown lookup with the org's post_event_cooldown_sec; EMERGENCY skips the lookup entirely. - Active-event exclusion: device locked in another non-terminal event is skipped with reason active_event. - candidate_min_power_w override: the request's admin-gated override takes precedence over the per-org default. - Cross-tenant isolation: every store method receives the caller's org_id; the fake captures and asserts no leakage from a populated other-org row set. - Insufficient-load propagation: AvailableKW / RequestedKW / CandidateMinPowerW / per-reason exclusion counts forward into the detail the handler echoes back to the caller. The fake panics on methods Preview does not exercise (InsertEvent, GetEventByUUID, InsertTarget, ListTargetsByEvent, GetHeartbeat) so a regression that newly invokes one surfaces immediately rather than silently consuming a zero value. Refs #140 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector · 2026-05-06T17:14:38Z

💡 Codex Review

proto-fleet/server/sqlc/queries/curtailment.sql

Line 188 in 8b2f64e

ds.status::text AS device_status,

Coalesce nullable device status in candidate query

device_status is selected from a LEFT JOIN as ds.status::text, so devices without a status row produce NULL. The generated sqlc row type for this column is string, and scanning a SQL NULL into a Go string fails the whole query. In practice, a single unstatused device causes ListCurtailmentCandidatesByOrg to error and PreviewCurtailmentPlan to return an internal failure instead of a plan.

proto-fleet/server/migrations/000040_create_curtailment.up.sql

Lines 28 to 30 in 8b2f64e

    
           INSERT INTO curtailment_org_config (org_id) 
        
               SELECT id FROM organization 
        
               ON CONFLICT (org_id) DO NOTHING;

Populate curtailment config for organizations created later

This migration only backfills curtailment_org_config rows for organizations that already exist at migration time. Service.Preview hard-depends on GetOrgConfig and returns an error when the row is missing, so any organization created after this migration will be unable to run preview until a manual insert happens. Add an automatic row-creation path (trigger/app write) or a safe fallback on read.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-06T17:21:17Z

💡 Codex Review

proto-fleet/server/sqlc/queries/curtailment.sql

Line 188 in 88b525d

ds.status::text AS device_status,

Coalesce nullable device_status before scanning

ListCurtailmentCandidatesByOrg selects ds.status::text from a LEFT JOIN, so devices without a device_status row produce NULL; sqlc generated DeviceStatus as a non-null string, which will fail scan at runtime (converting NULL to string is unsupported) and make PreviewCurtailmentPlan return an internal error for affected orgs. This is reachable because device_status is not guaranteed to exist for every device row.

proto-fleet/server/migrations/000040_create_curtailment.up.sql

Lines 28 to 30 in 88b525d

    
           INSERT INTO curtailment_org_config (org_id) 
        
               SELECT id FROM organization 
        
               ON CONFLICT (org_id) DO NOTHING;

Provision curtailment config for newly created orgs

This migration only backfills curtailment_org_config from organizations present at migration time, but does not add any mechanism for future organizations; new org creation (e.g., SQLUserStore.CreateAdminUserWithOrganization) inserts into organization without inserting a matching curtailment config row, so GetOrgConfig returns NotFound and Preview fails for every org created after this migration.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

github-actions · 2026-05-06T17:22:25Z

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

Reviewed pull request diff only (98904856f189d82db44b62872d317116831526bf...d329ff326c0ed6461811eb6059bfc4cc77952546, exact PR three-dot diff)

Model: gpt-5.4

💡 Click "edited" above to see previous reviews for this PR.

Review Summary

Overall Risk: MEDIUM

Findings

[MEDIUM] Preview now performs a real config-table write on every read

Category: Reliability
Location: server/internal/domain/stores/sqlstores/curtailment.go:38
Description: GetOrgConfig() now unconditionally calls EnsureCurtailmentOrgConfig(). The new SQL behind that helper uses INSERT ... ON CONFLICT DO UPDATE SET org_id = EXCLUDED.org_id RETURNING ..., so an existing row is updated on every preview request. Because curtailment_org_config also has a BEFORE UPDATE trigger that rewrites updated_at, this is not a no-op read path.
Impact: Every preview hits the primary with a write, creates avoidable WAL/churn, hot-locks the same org row under concurrent previews, and makes updated_at look like the config was modified by read traffic.
Recommendation: Split this into a true read path: SELECT first, and only on ErrNoRows do an INSERT ... ON CONFLICT DO NOTHING followed by SELECT (or use a CTE that writes only when missing).

[MEDIUM] Preview can select miners that the plugin layer cannot actually curtail

Category: Plugin
Location: server/internal/domain/curtailment/service.go:218
Description: The new selector only rejects candidates with missing driver_name. It does not verify that the resolved device/model advertises curtail_full support, and the file itself notes this is still a TODO. A populated driver name is not equivalent to curtailment capability.
Impact: Preview can overstate available curtailable load and return plans containing devices that the plugin boundary will reject later. That makes the new API produce plans that are not executable on mixed-capability fleets.
Recommendation: Gate eligibility on actual curtailment capability, not just driver metadata. Inject a capability provider/registry lookup into preview and skip any candidate that lacks curtail_full.

[MEDIUM] Candidate ordering is nondeterministic for equal-efficiency miners

Category: Correctness
Location: server/sqlc/queries/curtailment.sql:207
Description: The final candidate query has no ORDER BY. In the selector, ties are intentionally left to sort.SliceStable() so input order is preserved for equal/unknown efficiencies, but PostgreSQL does not guarantee row order without an explicit sort.
Impact: Identical preview requests can pick different devices whenever multiple miners have the same hourly efficiency or no efficiency row at all. Because fixed-kW selection stops once the target is met, this can make plans flap across calls and will make future execution/retry behavior non-idempotent.
Recommendation: Add a deterministic tie-breaker before the selector runs, e.g. an ORDER BY on stable fields such as efficiency-preserving rank inputs plus device_identifier, or add an explicit secondary sort key in Go.

Notes

I did not find an auth bypass, SQL injection, command injection, or pool-hijack path in the reviewed hunks. The main risks in this diff are reliability and plan correctness in the new curtailment preview path.

_{Generated by Codex Security Review |

Triggered by: @rongxin-liu |

Review workflow run}

CI's golangci-lint surfaced four issues on #185: - exhaustive: EventState.IsTerminal switch was missing the non-terminal cases. Adding pending/active/restoring as an explicit no-match arm preserves the same return value while satisfying the linter. - exhaustive: priorityName switch hid UNSPECIFIED / NORMAL / HIGH behind the default arm. Listing them explicitly documents that HIGH (rejected by the proto validator before this code runs) collapses to NORMAL. - gosec G115: the uint32 -> int32 cast on candidate_min_power_w_override needed an explicit bounds check. The buf.validate ceiling already caps the value well below int32 max, but a defense-in-depth guard surfaces a clean InvalidArgument if interceptor wiring is ever bypassed instead of silently wrapping. - unconvert: int32(*req.CandidateMinPowerWOverride) in the service was a no-op since the field is already *int32. Drop the conversion.

chatgpt-codex-connector · 2026-05-06T17:31:17Z

💡 Codex Review

proto-fleet/server/sqlc/queries/curtailment.sql

Line 188 in 0f3cc14

ds.status::text AS device_status,

Coalesce nullable device status in candidate query

ListCurtailmentCandidatesByOrg LEFT JOINs device_status, so ds.status is NULL when a device has no status row, but the query selects ds.status::text directly. sqlc generated DeviceStatus as a plain string, so scanning a NULL here will fail at runtime and make PreviewCurtailmentPlan return an internal error for any org containing such devices. Please coalesce this column (or otherwise make the type nullable end-to-end) before scanning.

proto-fleet/server/migrations/000040_create_curtailment.up.sql

Lines 28 to 30 in 0f3cc14

    
           INSERT INTO curtailment_org_config (org_id) 
        
               SELECT id FROM organization 
        
               ON CONFLICT (org_id) DO NOTHING;

Backfill curtailment config for newly created orgs

This migration only inserts curtailment_org_config rows for organizations that already exist at migration time. Organizations created afterward will not get a config row, and GetOrgConfig currently treats that as NotFound, which blocks preview/start flows for those tenants until someone inserts the row manually. Add a creation hook/trigger or lazy upsert path so every new org has defaults automatically.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

… dual-signal counters, latest_hourly scope, cross-org guard, capability-meta gate Layered on top of the four logical commits already on this branch, addressing review findings (multi-reviewer pass, with three of the critical items previously flagged on Codex review runs without resolution). P0 - Lazy-upsert curtailment_org_config in GetOrgConfig so orgs created after migration 000040 don't break Preview with NotFound. New EnsureCurtailmentOrgConfig query (INSERT ... ON CONFLICT DO UPDATE) always returns a row with table-level DEFAULTs. P1 - COALESCE(ds.status::text, ''::text)::text in ListCurtailmentCandidatesByOrg so a NULL device_status row no longer crashes the scan; sqlc now generates DeviceStatus as string. - Skip empty-DeviceStatus candidates with SkipStaleTelemetry rather than admitting them as eligible (closes the COALESCE-fallout). - Thread dual-signal exclusion counts through BuildPlan into Result.InsufficientDetail so per-reason counts reach the rejection detail (was always zero for below-threshold/phantom-load/dead-monitor). - Reject device-list scope with cross-org device_identifiers via NotFound listing the unrecognized IDs (org-ownership boundary the BE-2 plan called for). P2 - Org-scope and 24h-bucket the latest_hourly CTE (INNER JOIN device d3 ON ... org_id = $1, WHERE bucket > NOW() - 24h) so Preview is no longer a global TimescaleDB hot spot. - Reject candidate_min_power_w_override < 1 at the service layer as a backstop for callers that bypass the proto validator. - Defense-in-depth capability gate: skip candidates whose discovered_device LEFT JOIN missed (DriverName == nil or empty) with SkipCurtailFullUnsupported. Full plugin-registry check still defers to BE-3 / BE-4. P3 - Derive reason_selected from the request strategy enum via strategyReasonLabel(s) so a future strategy forces the surface to be touched. - Exclude soft-deleted orgs from the migration's per-org seed. Tests: four additive regression tests in service_test.go pinning the new behaviors: - TestService_Preview_EmptyDeviceStatusSkipsAsStale - TestService_Preview_DeviceListScopeRejectsCrossOrgIdentifiers - TestService_Preview_DualSignalCountersPropagateIntoInsufficientLoadDetail - TestService_Preview_MissingDriverSkipsAsCurtailFullUnsupported Verified: go test ./internal/domain/curtailment/... ./internal/handlers/curtailment/... green; golangci-lint 0 issues; go build ./... clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Removes BE-1.x / BE-2 / BE-3 / BE-4 / BE-5 references and the related PR/issue numbers from in-code comments across the curtailment surface. Comments now describe behavior in terms of feature work (Start dispatch, restorer, plugin registry, etc.) rather than internal ticket scopes, matching the project convention of keeping process artifacts out of shipped source. Touches: cmd/fleetd/main.go, internal/domain/curtailment/{selector, service,service_test,enum_stability_test}.go, internal/domain/stores/ interfaces/curtailment.go, internal/handlers/curtailment/handler.go, migrations/000040_create_curtailment.up.sql, sqlc/queries/curtailment.sql, and the regenerated server/generated/sqlc/curtailment.sql.go. No behavior change. Tests, lint, and build remain green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 58c90ec0ed

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-06T19:01:18Z

+-- and the caller can read the effective row in a single round trip.
+INSERT INTO curtailment_org_config (org_id)
+VALUES (sqlc.arg('org_id'))
+ON CONFLICT (org_id) DO UPDATE SET org_id = EXCLUDED.org_id


Replace no-op conflict update with non-mutating ensure

ON CONFLICT ... DO UPDATE SET org_id = EXCLUDED.org_id is not a read-only no-op here: curtailment_org_config has a BEFORE UPDATE trigger (update_curtailment_org_config_updated_at in migration 000040) that rewrites updated_at, so every Preview call performs a real row update. This introduces continuous write amplification on a hot read path and corrupts the semantic meaning of updated_at (it becomes "last preview read" rather than "last config change").

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-06T19:01:18Z

+    AND d.deleted_at IS NULL
+    AND (
+        sqlc.narg('device_identifiers')::text[] IS NULL
+        OR d.device_identifier = ANY(sqlc.narg('device_identifiers')::text[])
+    );


Add ORDER BY to make tie-breaking deterministic

This candidate query returns rows without an ORDER BY, but selector logic relies on stable ordering for equal-efficiency miners (sort.SliceStable preserves input order). Without deterministic SQL ordering, identical inputs can produce different selected miners across calls whenever efficiencies tie near the target boundary, making preview results non-reproducible.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86e78aeff1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-06T19:07:27Z

+	}
+
+	// Loop ended without hitting target — totalW is the full-set sum.
+	if toleranceW > 0 && totalW >= targetW-toleranceW {


Reject tolerance bands that make zero-load plans succeed

The undershoot branch treats any request with tolerance_kw >= target_kw as successful even when no load is available, because totalW >= targetW - toleranceW becomes true for totalW == 0. With current proto validation (tolerance_kw >= 0 only), a caller can get OutcomeUndershootTolerated and a 200 Preview response with an empty candidate list for a positive target, which is a false-success plan. Please add a guard (for example reject tolerance_kw > target_kw, or otherwise prevent zero/near-zero availability from entering the tolerated-success path).

Useful? React with 👍 / 👎.

…tes allocation Simplification pass over the curtailment surface — comment trims plus one small allocation fix. No behavior change. Comments - service.go: shorten the cross-org-guard rationale and the capability-gate TODO; per-PR depth lives in the ticket, not the source. - selector.go: replace the 20-line BuildPlan doc that re-narrated the function body with a 6-line summary. Kept the load-bearing "pure: no time, no I/O, no shared state" invariant. - modes/mode.go: trim per-Outcome doc comments; keep the non-obvious "small overshoot is unavoidable since miners are atomic" note. Code - translate.go: collapse the levelName two-if-same-return into a single OR-check. - service.go: pre-allocate skipped with len(cands) capacity in classifyCandidates so a low-power fleet doesn't re-grow the slice on every iteration. Verified: go test ./internal/domain/curtailment/... ./internal/handlers/curtailment/... green; golangci-lint 0 issues; go build ./... clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d329ff326c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-06T19:20:27Z

+		Outcome:            OutcomeInsufficientLoad,
+		Selected:           nil,
+		RealizedReductionW: totalW,
+		InsufficientDetail: &detail,


Report zero reduction when insufficient load selects no miners

In the insufficient-load branch, Selected is empty but RealizedReductionW is set to totalW, which violates the mode contract that realized reduction is the sum of selected candidates. This makes downstream Plan math contradictory (EstimatedReductionKW can be non-zero while zero miners are selected, and remaining power becomes 0), which can corrupt diagnostics for any caller using Service.Preview directly or if the handler later returns plans for insufficient cases.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-06T19:20:27Z

+        sqlc.narg('device_identifiers')::text[] IS NULL
+        OR d.device_identifier = ANY(sqlc.narg('device_identifiers')::text[])


Treat empty device identifier filters as whole-org scope

The filter condition only treats NULL as whole-org scope; an empty array ([]) falls into ANY('{}'), which matches nothing. This conflicts with the store contract that nil/empty should both mean whole-org and can silently produce empty candidate sets (and false insufficient-load results) when callers pass an empty slice instead of nil.

Useful? React with 👍 / 👎.

rongxin-liu · 2026-05-06T20:43:22Z

Closing in favor of a fresh PR off the same branch. Earlier iterations accumulated stale review feedback (now resolved in code) that adds noise to ongoing review. Re-opening with the same diff and PR description on the same branch; will link the replacement here once it's up.

rongxin-liu · 2026-05-06T20:43:44Z

Replacement: #188. Same branch, same diff, same description — fresh review thread.

rongxin-liu and others added 4 commits May 6, 2026 18:16

Copilot AI review requested due to automatic review settings May 6, 2026 17:09

rongxin-liu requested a review from a team as a code owner May 6, 2026 17:09

github-actions Bot added the server label May 6, 2026

github-actions Bot assigned rongxin-liu May 6, 2026

Copilot started reviewing on behalf of rongxin-liu May 6, 2026 17:10 View session

Regenerated codes

88b525d

rongxin-liu and others added 2 commits May 6, 2026 20:54

chatgpt-codex-connector Bot reviewed May 6, 2026

View reviewed changes

rongxin-liu closed this May 6, 2026

		sqlc.narg('device_identifiers')::text[] IS NULL
		OR d.device_identifier = ANY(sqlc.narg('device_identifiers')::text[])

Conversation

rongxin-liu commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Background

Summary

What changed

Migration 000040

Preview pipeline

Review-pass fixes

Operational notes for reviewers

What is intentionally not in this PR

Test plan

Uh oh!

chatgpt-codex-connector Bot commented May 6, 2026

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot commented May 6, 2026

💡 Codex Review

Uh oh!

github-actions Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔐 Codex Security Review

Review Summary

Findings

[MEDIUM] Preview now performs a real config-table write on every read

[MEDIUM] Preview can select miners that the plugin layer cannot actually curtail

[MEDIUM] Candidate ordering is nondeterministic for equal-efficiency miners

Notes

Uh oh!

chatgpt-codex-connector Bot commented May 6, 2026

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

rongxin-liu commented May 6, 2026

Uh oh!

rongxin-liu commented May 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rongxin-liu commented May 6, 2026 •

edited

Loading

Migration `000040`

github-actions Bot commented May 6, 2026 •

edited

Loading