fix(self-hosted): improve support for bigquery backend #3488
Draft
aantti wants to merge 4 commits into
Summary
For additional context, see supabase/supabase#44104.
Fixed
1. `supabase_mode_source_schemas_updated?` always returns `false` when sources exist

Regression introduced via PR #2001 ("feat: BigQuery V1-V2 pipeline compat adjustments", commit `6cd81cebf`). Pre-regression line was `Enum.all?(checks) and not Enum.empty?(sources)`.

The fix is slightly stricter than the pre-regression code, which passed when no schemas had been cached yet. The new `not Enum.empty?(checks)` guard catches that case explicitly. Made safe by the seed-task throttle fix (fix 5).

2. `update_supabase_source_schemas/0` casts to a non-existent process name

`lib/logflare/single_tenant.ex:330-331`:

Observable symptom: BQ tables seeded at boot stay at the 3 default columns (`timestamp | id | event_message`) instead of being enriched from `priv/supabase/ingest_samples/<source>.json`.

Fix: use the 3-arity name + a synchronous variant so the seed task actually waits for the BQ patch and the PG `source_schemas` insert to complete. Adds `Schema.update_sync/3` with a matching `handle_call`, an explicit `Cachex.del` on `SourceSchemas.Cache` for each seeded source (in case the WAL → CacheBuster path isn't connected in self-hosted), a `try/catch :exit` so one missing supervisor doesn't fail the whole batch, and a 60s `Task.await_many` timeout to accommodate BQ latency under retry backoff.

3. Logger config violates OTP's `:logger_olp` invariant

`config/prod.exs` after #3455 (commit `c93825683`, 2026-05-14):

Visible signature:

Fix: add `flush_qlen: 20_000` so the invariant holds while preserving the high-throughput intent of the original tuning.

4. Rate-limit cascade orphans BQ tables on cold boot
`create_supabase_sources/0` creates 9 source rows in a tight loop; each auto-starts a source supervisor that fires `init_table!` in a `Tasks.start_child`. The 9 concurrent `create_dataset` + `create_table` calls breach BigQuery's "5 dataset metadata-update operations per dataset per 10 seconds" quota; 4 typically fail with `rateLimitExceeded`. `init_table!` logs the error and exits without retry, leaving 4 sources without BQ tables. Studio's `logs.all` CTE then can't validate.

Existing workaround is to restart the analytics container: the dataset already exists, so the second boot's `create_table` calls don't hit the metadata-update quota path. RedChops's supabase/supabase#44104 thread describes this state.

Fix: wrap `create_dataset` and `create_table` inside `init_table!/6` with exponential backoff + jitter, up to 5 attempts. Detects rate-limit specifically via the Tesla error body (`rate limit` / `rateLimitExceeded` / `quota`); non-rate-limit errors propagate unchanged. Backoff schedule: ~1s, 2s, 4s, 8s base with up to +50% jitter; total sleep time ranges from ~15s to ~22s, which comfortably outlasts BQ's 10s quota window.

5. Sync seed task is silenced when racing log traffic poisons the throttle timer
`Schema.handle_cast/2` gates schema updates on `schema_needs_update?/3`, which respects a per-source `next_update` timer that gets pushed forward in `log_error_and_update_state/3` after a failed `patch_table` call. When Vector (or any client) posts a log event to the freshly-open Bandit port before `init_table!` has finished creating the source's BQ table, the resulting 404 from `patch_table` pushes `next_update` several seconds into the future. The subsequent `update_supabase_source_schemas/0` seed call then enters `handle_cast` (via the new `handle_call` wrapper) but is short-circuited by the stale timer: the seed returns `:ok` but the schema is never enriched. That source stays at 3 fields, and the `Enum.all?` health predicate fails forever.

Observable signature: a `[warning] Source schema update error!` log line during startup, followed by one source stuck at `length(bigquery_schema.fields) == 3` even after seeding "completes." `/health` returns 503 indefinitely.

Fix: force `next_update: 0` in the `handle_call` wrapper so the sync seed path always reaches `patch_bigquery_table`, regardless of prior throttle state. The existing `handle_cast` path used by real log ingestion is unchanged and still honours the throttle.

6. `postgres.logs` seed missing fields

Studio's PG-logs queries reference `parsed.{application_name,detail,hint,query}`, but the seed sample omits them, so the self-hosted BQ schema never grew those columns and Studio's Postgres/Cron tabs returned 400 at query-parse time. Added the missing fields to `priv/supabase/ingest_samples/postgres.logs.json`.

7. `BQ → PG` translator chokes on `LIKE` against JSONB columns

Studio's pg_cron logs chart query 500s on self-hosted with the Postgres analytics backend because `event_message LIKE '%cron job%'` references the CTE-projected JSONB column, and PostgreSQL has no `jsonb ~~ text` operator.

The dialect translator already handles this for `REGEXP_CONTAINS` (cast LHS to text). Apply the same treatment to `Like`/`ILike` AST nodes.

Verified by translating the actual failing Studio query:

- `WHERE event_message LIKE '%cron job%'` → `jsonb ~~ unknown`
- `WHERE event_message::TEXT LIKE '%cron job%'` → valid PG

Includes a regression test mirroring `casts CTE identifiers to text when used with regex operators`.

Related work in supabase/supabase

`depends_on` for `analytics`.
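
Illustration for fix 1: `Enum.all?/1` returns `true` for an empty list, so a predicate built only on `Enum.all?(checks)` passes vacuously before any schema has been cached. The sketch below shows the guarded form described in fix 1. It assumes `checks` is a list of booleans, one per source; the module and function names are hypothetical and are not the ones in `single_tenant.ex`.

```elixir
defmodule SchemaCheckSketch do
  # Hypothetical stand-in for the health predicate: true only when at
  # least one check exists and every check passed.
  def schemas_updated?(checks) when is_list(checks) do
    Enum.all?(checks) and not Enum.empty?(checks)
  end
end
```

With this shape, `SchemaCheckSketch.schemas_updated?([])` is `false`, whereas a bare `Enum.all?([])` would be `true`.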
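
Illustration for fix 3: OTP's logger overload protection expects the three queue-length thresholds to satisfy `sync_mode_qlen =< drop_mode_qlen =< flush_qlen`, with `drop_mode_qlen` greater than 1. The sketch below is a standalone check of that relationship, not code from this PR. The module name is hypothetical, and the threshold values used in the note after it are made up for illustration; they are not the values in `config/prod.exs`.

```elixir
defmodule OlpInvariantSketch do
  # Defaults documented for OTP logger handlers.
  @defaults %{sync_mode_qlen: 10, drop_mode_qlen: 200, flush_qlen: 1_000}

  # Returns true when the thresholds, after filling in defaults for any
  # missing key, satisfy sync =< drop =< flush and drop > 1.
  def valid?(opts) do
    %{sync_mode_qlen: sync, drop_mode_qlen: drop, flush_qlen: flush} =
      Map.merge(@defaults, Map.new(opts))

    drop > 1 and sync <= drop and drop <= flush
  end
end
```

For example, raising `drop_mode_qlen` to a hypothetical 10_000 without touching `flush_qlen` leaves the default `flush_qlen` of 1_000 below it, so `valid?/1` returns `false`; adding `flush_qlen: 20_000` makes it `true` again.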
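
Illustration for fix 4: a minimal sketch of exponential backoff with jitter under the schedule described there (5 attempts, 1s/2s/4s/8s base delays, up to +50% jitter). It is not the PR's implementation. `RetrySketch`, the `{:error, body}` shape with a binary body, and the injectable `sleep_fun` are assumptions made so the example is self-contained; the real code inspects a Tesla error body.

```elixir
defmodule RetrySketch do
  @max_attempts 5
  @base_ms 1_000

  # Base delay doubles per retry: 1s, 2s, 4s, 8s for attempts 1..4.
  def base_delay_ms(attempt), do: @base_ms * Integer.pow(2, attempt - 1)

  # Adds a random extra between 0 and 50% of the base delay (inclusive).
  def jittered_delay_ms(attempt) do
    base = base_delay_ms(attempt)
    base + :rand.uniform(div(base, 2) + 1) - 1
  end

  # Assumed detection rule: case-insensitive substring match on the body.
  def rate_limited?(body) when is_binary(body) do
    lowered = String.downcase(body)
    Enum.any?(["rate limit", "ratelimitexceeded", "quota"], &String.contains?(lowered, &1))
  end

  def rate_limited?(_body), do: false

  # Calls `fun` until it succeeds, fails with a non-rate-limit error,
  # or the attempt budget is used up. `sleep_fun` receives milliseconds.
  def with_retry(fun, sleep_fun), do: do_retry(fun, sleep_fun, 1)

  defp do_retry(fun, sleep_fun, attempt) do
    case fun.() do
      {:error, body} = error ->
        if attempt < @max_attempts and rate_limited?(body) do
          sleep_fun.(jittered_delay_ms(attempt))
          do_retry(fun, sleep_fun, attempt + 1)
        else
          error
        end

      other ->
        other
    end
  end
end
```

In real use the second argument would be `&Process.sleep/1`. The four base delays sum to 15s, and with maximum jitter to 22.5s, which matches the range quoted in fix 4.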
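
Illustration for fix 5: a reduced sketch of a throttled `handle_cast` and a synchronous `handle_call` wrapper that resets `next_update` before delegating. It only models the control flow described in fix 5. `ThrottleSketch`, the message shapes, the `patched` counter, and passing `now` in the message are all hypothetical simplifications; the real `Schema` server patches BigQuery instead of incrementing a counter.

```elixir
defmodule ThrottleSketch do
  use GenServer

  @impl true
  def init(state), do: {:ok, state}

  # Async path: the update is skipped while the throttle timer
  # (`next_update`) is still in the future.
  @impl true
  def handle_cast({:update, now}, state) do
    if now >= state.next_update do
      {:noreply, %{state | patched: state.patched + 1}}
    else
      {:noreply, state}
    end
  end

  # Sync path: clear the timer first, then reuse the cast logic so the
  # update always runs regardless of earlier failures.
  @impl true
  def handle_call({:update_sync, now}, _from, state) do
    {:noreply, new_state} = handle_cast({:update, now}, %{state | next_update: 0})
    {:reply, :ok, new_state}
  end
end
```

With a state of `%{next_update: 10_000, patched: 0}` and `now` set to 5, the cast leaves `patched` at 0 while the call increments it to 1.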