fix(self-hosted): improve support for bigquery backend #3488

Draft
aantti wants to merge 4 commits into Logflare:main from aantti:fix/self-hosted-bq

aantti commented May 16, 2026


Summary

For additional context, see supabase/supabase#44104.

Fixed

1. supabase_mode_source_schemas_updated? always returns false when sources exist

Regression introduced via PR #2001 ("feat: BigQuery V1-V2 pipeline compat adjustments", commit 6cd81cebf). Pre-regression line was Enum.all?(checks) and not Enum.empty?(sources).

The fix is slightly stricter than the pre-regression code, which passed when no schemas had been cached yet. The new not Enum.empty?(checks) guard catches that case explicitly. The stricter check is made safe by the seed-task throttle fix (item 5).
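The guard matters because Enum.all?/1 is vacuously true for an empty list. A minimal sketch of the predicate shape, assuming a plain list of booleans stands in for the per-source checks (the module and function names here are illustrative, not Logflare's):

```elixir
defmodule SchemaCheckSketch do
  # `checks` stands in for the per-source "schema has been enriched" booleans.
  # Hypothetical helper; the real predicate lives in Logflare's single-tenant code.
  def schemas_updated?(checks) when is_list(checks) do
    # An empty list must not count as "all sources updated".
    not Enum.empty?(checks) and Enum.all?(checks)
  end
end

IO.inspect(Enum.all?([]), label: "Enum.all?([])")
IO.inspect(SchemaCheckSketch.schemas_updated?([]), label: "guarded, empty")
IO.inspect(SchemaCheckSketch.schemas_updated?([true, true]), label: "guarded, all true")
```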

2. update_supabase_source_schemas/0 casts to a non-existent process name

lib/logflare/single_tenant.ex:330-331:

Backends.via_source(source, Schema)
|> Schema.update(log_event, source)

Observable symptom: BQ tables seeded at boot stay at the 3 default columns (timestamp | id | event_message) instead of being enriched from priv/supabase/ingest_samples/<source>.json.

Fix: use the 3-arity name plus a synchronous variant so the seed task actually waits for the BQ patch and the PG source_schemas insert to complete. The change adds:

  • Schema.update_sync/3 with a matching handle_call
  • an explicit Cachex.del on SourceSchemas.Cache for each seeded source (in case the WAL → CacheBuster path isn't connected in self-hosted)
  • a try/catch :exit so one missing supervisor doesn't fail the whole batch
  • a 60s Task.await_many timeout to accommodate BQ latency under retry backoff
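For context on why the original bug was silent: GenServer.cast/2 returns :ok even when the target name is not registered, whereas GenServer.call/3 exits with :noproc. The sketch below shows the general pattern of a guarded synchronous call fanned out over tasks. SeedSketch, EchoSketch, and safe_call/3 are hypothetical names for illustration only; this is not the PR's actual Schema.update_sync/3.

```elixir
defmodule EchoSketch do
  # Stand-in for a per-source schema process (hypothetical).
  use GenServer

  @impl true
  def init(arg), do: {:ok, arg}

  @impl true
  def handle_call(msg, _from, state), do: {:reply, {:echo, msg}, state}
end

defmodule SeedSketch do
  # Wrap the synchronous call so a missing process yields an error tuple
  # instead of crashing the calling task.
  def safe_call(server, msg, timeout \\ 5_000) do
    try do
      {:ok, GenServer.call(server, msg, timeout)}
    catch
      :exit, reason -> {:error, reason}
    end
  end

  # One task per target; 60s overall timeout, as described in the fix.
  def seed_all(servers, msg) do
    servers
    |> Enum.map(fn server -> Task.async(fn -> safe_call(server, msg) end) end)
    |> Task.await_many(60_000)
  end
end

{:ok, pid} = GenServer.start_link(EchoSketch, nil)

# A cast to an unregistered name succeeds silently.
IO.inspect(GenServer.cast(:missing_sketch_server, :ping), label: "cast")

# The guarded call surfaces the failure without failing the batch.
IO.inspect(SeedSketch.seed_all([pid, :missing_sketch_server], :ping), label: "seed_all")
```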

3. Logger config violates OTP's :logger_olp invariant

config/prod.exs after #3455 (commit c93825683, 2026-05-14):

config :logger, :default_handler,
  config: %{
    sync_mode_qlen: 10_000,
    drop_mode_qlen: 10_000
  }

Visible signature:

Could not attach default Logger handler: {:handler_not_added, {:invalid_olp_levels, %{drop_mode_qlen: 10000, flush_qlen: 1000, sync_mode_qlen: 10000}}}

Fix: add flush_qlen: 20_000 so the invariant holds while preserving the high-throughput intent of the original tuning.
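The invariant, as I understand OTP's overload-protection check, is sync_mode_qlen <= drop_mode_qlen <= flush_qlen (with drop_mode_qlen > 1). Because flush_qlen was left at its default of 1000 (visible in the error above), raising drop_mode_qlen to 10_000 broke it. A sketch of the check follows; OlpLevelsSketch is a hypothetical module, and the sync/drop defaults are the documented OTP defaults as I recall them, so treat them as assumptions.

```elixir
defmodule OlpLevelsSketch do
  # Assumed OTP defaults; flush_qlen: 1000 matches the error message above.
  @defaults %{sync_mode_qlen: 10, drop_mode_qlen: 200, flush_qlen: 1000}

  # Mirrors the ordering constraint OTP enforces on handler overload levels.
  def valid?(config) when is_map(config) do
    %{sync_mode_qlen: sync, drop_mode_qlen: drop, flush_qlen: flush} =
      Map.merge(@defaults, config)

    drop > 1 and sync <= drop and drop <= flush
  end
end

before_fix = %{sync_mode_qlen: 10_000, drop_mode_qlen: 10_000}
after_fix = Map.put(before_fix, :flush_qlen, 20_000)

IO.inspect(OlpLevelsSketch.valid?(before_fix), label: "before fix")
IO.inspect(OlpLevelsSketch.valid?(after_fix), label: "after fix")
```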

4. Rate-limit cascade orphans BQ tables on cold boot

create_supabase_sources/0 creates 9 source rows in a tight loop; each auto-starts a source supervisor that fires init_table! in a Tasks.start_child. The 9 concurrent create_dataset + create_table calls breach BigQuery's "5 dataset metadata-update operations per dataset per 10 seconds" quota; 4 typically fail with rateLimitExceeded. init_table! logs the error and exits without retry, leaving 4 sources without BQ tables. Studio's logs.all CTE then can't validate.

The existing workaround is to restart the analytics container: the dataset already exists, so the second boot's create_table calls don't hit the metadata-update quota path. RedChops's supabase/supabase#44104 thread describes this state.

Fix: wrap create_dataset and create_table inside init_table!/6 with exponential backoff + jitter, up to 5 attempts. Detects rate-limit specifically via the Tesla error body (rate limit / rateLimitExceeded / quota); non-rate-limit errors propagate unchanged. Backoff schedule: ~1s, 2s, 4s, 8s base with up to +50% jitter; total sleep time ranges from ~15s to ~22s - comfortably outlasts BQ's 10s quota window.
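The retry shape described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: BackoffSketch is a hypothetical module, the rate-limit check here matches on the inspected error term rather than a real Tesla response body, and the sleep function is injectable so the example runs without waiting.

```elixir
defmodule BackoffSketch do
  @max_attempts 5
  @base_ms 1_000

  # Runs `fun` up to 5 times, sleeping between attempts only for rate-limit errors.
  def with_backoff(fun, opts \\ []) do
    sleep = Keyword.get(opts, :sleep, &Process.sleep/1)
    do_attempt(fun, 1, sleep)
  end

  defp do_attempt(fun, attempt, sleep) do
    case fun.() do
      {:error, reason} = error ->
        if rate_limited?(reason) and attempt < @max_attempts do
          sleep.(delay_ms(attempt))
          do_attempt(fun, attempt + 1, sleep)
        else
          # Non-rate-limit errors, or the final failure, propagate unchanged.
          error
        end

      other ->
        other
    end
  end

  # 1s, 2s, 4s, 8s base, plus up to +50% random jitter.
  def delay_ms(attempt) do
    base = @base_ms * Integer.pow(2, attempt - 1)
    base + :rand.uniform(div(base, 2) + 1) - 1
  end

  # Assumption: the error term contains one of these markers when inspected.
  def rate_limited?(reason) do
    text = reason |> inspect() |> String.downcase()
    String.contains?(text, ["rate limit", "ratelimitexceeded", "quota"])
  end
end

# Worst-case and best-case total sleep across the four waits, in ms.
IO.inspect(Enum.sum([1_000, 2_000, 4_000, 8_000]), label: "min total")
IO.inspect(Enum.sum([1_500, 3_000, 6_000, 12_000]), label: "max total")
```

With four sleeps before the fifth attempt, the totals work out to 15s without jitter and 22.5s at maximum jitter, matching the range quoted above.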

5. Sync seed task is silenced when racing log traffic poisons the throttle timer

Schema.handle_cast/2 gates schema updates on schema_needs_update?/3, which respects a per-source next_update timer that gets pushed forward in log_error_and_update_state/3 after a failed patch_table call. When Vector (or any client) posts a log event to the freshly-open Bandit port before init_table! has finished creating the source's BQ table, the resulting 404 from patch_table pushes next_update several seconds into the future. The subsequent update_supabase_source_schemas/0 seed call then enters handle_cast (via the new handle_call wrapper) but is short-circuited by the stale timer: the seed returns :ok but the schema is never enriched. That source stays at 3 fields, and the Enum.all? health predicate fails forever.

Observable signature: a [warning] Source schema update error! log line during startup, followed by one source stuck at length(bigquery_schema.fields) == 3 even after seeding "completes." /health returns 503 indefinitely.

Fix: force next_update: 0 in the handle_call wrapper so the sync seed path always reaches patch_bigquery_table, regardless of prior throttle state. The existing handle_cast path used by real log ingestion is unchanged and still honours the throttle.
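A toy model of the race and the fix, written as pure functions so it can be run in isolation. ThrottleSketch, maybe_update/3, update_sync/3, and the 5s retry interval are all hypothetical stand-ins; the real logic lives in Schema's handle_cast and handle_call.

```elixir
defmodule ThrottleSketch do
  # Hypothetical retry interval applied after a failed patch.
  @retry_after_ms 5_000

  # Cast-style path: honours the throttle and pushes it forward on failure.
  def maybe_update(%{next_update: next} = state, now_ms, patch_fun) do
    if now_ms >= next do
      case patch_fun.() do
        :ok -> {:updated, state}
        {:error, _reason} -> {:failed, %{state | next_update: now_ms + @retry_after_ms}}
      end
    else
      {:skipped, state}
    end
  end

  # Sync seed path: reset the timer first so the patch is always attempted.
  def update_sync(state, now_ms, patch_fun) do
    maybe_update(%{state | next_update: 0}, now_ms, patch_fun)
  end
end

state = %{next_update: 0}

# Early log traffic hits a missing table and poisons the timer.
{:failed, poisoned} = ThrottleSketch.maybe_update(state, 1_000, fn -> {:error, :not_found} end)

# One second later, the throttled path skips the seed entirely...
IO.inspect(ThrottleSketch.maybe_update(poisoned, 2_000, fn -> :ok end), label: "throttled")

# ...while the sync path ignores the stale timer.
IO.inspect(ThrottleSketch.update_sync(poisoned, 2_000, fn -> :ok end), label: "sync")
```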

6. postgres.logs seed missing fields

Studio's PG-logs queries reference parsed.{application_name,detail,hint,query}, but the seed sample omits them - so the self-hosted BQ schema never grew those columns and Studio's Postgres/Cron tabs returned 400 at query-parse time. Added the missing fields to priv/supabase/ingest_samples/postgres.logs.json.

7. BQ → PG translator chokes on LIKE against JSONB columns

Studio's pg_cron logs chart query 500s on self-hosted with the Postgres analytics backend because event_message LIKE '%cron job%' references the CTE-projected JSONB column, and PostgreSQL has no jsonb ~~ text operator.

The dialect translator already handles this for REGEXP_CONTAINS (cast LHS to text). Apply the same treatment to Like / ILike AST nodes.

Verified by translating the actual failing Studio query:

  • before: WHERE event_message LIKE '%cron job%' → jsonb ~~ unknown
  • after: WHERE event_message::TEXT LIKE '%cron job%' → valid PG

Includes a regression test mirroring the existing "casts CTE identifiers to text when used with regex operators" test.
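The rewrite can be pictured on a toy AST. The tuple shapes below are invented for illustration and are not the translator's real node types. This sketch also casts every LIKE / ILIKE left-hand side, whereas the real translator presumably targets only the affected identifiers.

```elixir
defmodule LikeCastSketch do
  # Toy nodes: {:like | :ilike, lhs, pattern}, {:and, l, r},
  # {:cast, expr, :text}, {:ident, name}, {:string, value}.
  def cast_like({op, lhs, pattern}) when op in [:like, :ilike] do
    {op, to_text(lhs), pattern}
  end

  def cast_like({:and, left, right}), do: {:and, cast_like(left), cast_like(right)}
  def cast_like(other), do: other

  # Avoid double-casting an expression that is already text.
  defp to_text({:cast, _expr, :text} = already), do: already
  defp to_text(expr), do: {:cast, expr, :text}

  # Minimal renderer, enough to show the before/after SQL.
  def to_sql({:like, l, p}), do: "#{to_sql(l)} LIKE #{to_sql(p)}"
  def to_sql({:ilike, l, p}), do: "#{to_sql(l)} ILIKE #{to_sql(p)}"
  def to_sql({:and, l, r}), do: "#{to_sql(l)} AND #{to_sql(r)}"
  def to_sql({:cast, expr, :text}), do: "#{to_sql(expr)}::TEXT"
  def to_sql({:ident, name}), do: name
  def to_sql({:string, value}), do: "'#{value}'"
end

ast = {:like, {:ident, "event_message"}, {:string, "%cron job%"}}

IO.puts(LikeCastSketch.to_sql(ast))
IO.puts(ast |> LikeCastSketch.cast_like() |> LikeCastSketch.to_sql())
```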

Related work in supabase/supabase

  • Vector should have depends_on for analytics.
  • Extend Vector configuration.

Comment thread lib/logflare/single_tenant.ex Outdated
aantti removed their assignment May 16, 2026

aantti commented May 16, 2026


@Ziinc, @djwhitt fyi

Ziinc marked this pull request as draft May 19, 2026 14:35