Skip to content

ClickHouse usage analytics: events/gauges tables with daily MV#3

Open
lohanidamodar wants to merge 94 commits intomainfrom
claude/rebuild-analytics-clickhouse-OHWGZ
Open

ClickHouse usage analytics: events/gauges tables with daily MV#3
lohanidamodar wants to merge 94 commits intomainfrom
claude/rebuild-analytics-clickhouse-OHWGZ

Conversation

@lohanidamodar
Copy link
Copy Markdown
Contributor

@lohanidamodar lohanidamodar commented Mar 14, 2026

Summary

Complete rewrite of the usage analytics library with a two-table architecture optimized for both real-time analytics and billing.

Architecture

  • Events table (MergeTree) — raw request events with dedicated columns for path, method, status, resource, resourceId, country (LowCardinality), userAgent
  • Gauges table (MergeTree) — simple resource snapshots (metric, value, time, tags)
  • Daily MV (SummingMergeTree) — pre-aggregates events by metric + tenant + day for fast billing
  • No periods — query-time aggregation via toStartOfHour/toStartOfDay instead of write-time fan-out
  • Single write path — collect(metric, value, type, tags) routes to correct table; event columns auto-extracted from tags

Key Changes

  • Two separate tables instead of one — events have 7 extra columns gauges don't need
  • Plain MergeTree for both tables — raw appends, query-time aggregation
  • Daily MV with minimal schema (metric, value, time, tenant)
  • LowCardinality(String) for country column
  • Bloom filter indexes on all filterable columns
  • String tenant — setTenant(?string)
  • Utopia Query for all read operations — parameterized queries, no SQL injection risk

API

Write

  • collect(metric, value, type, tags) — buffer with auto-flush
  • addBatch(metrics, type) — direct batch insert
  • flush() — write buffered metrics

Read

  • find(queries, type) / count(queries, type) / sum(queries, attr, type)
  • getTotal(metric, queries, type) — SUM for events, argMax for gauges
  • getTotalBatch(metrics, queries, type) — batch totals
  • getTimeSeries(metrics, interval, start, end, queries, zeroFill, type)

Billing (Daily MV)

  • findDaily(queries) / sumDaily(queries) / sumDailyBatch(metrics, queries)

Test Plan

  • Unit tests for Metric schema, getters, validation
  • Integration tests for ClickHouse and Database adapters
  • PHPStan level max passing
  • Linter passing
  • Security audit — no SQL injection vulnerabilities
  • CI green (CodeQL, Tests, Linter)

- Database adapter
- ClickHouse adapter
- Removed hardcoded column definitions in Usage class, replacing with dynamic schema derived from SQL adapter.
- Introduced new Query class for building ClickHouse queries with fluent interface.
- Added support for advanced query operations including find and count methods.
- Enhanced error handling and SQL injection prevention mechanisms.
- Created comprehensive usage guide for ClickHouse adapter.
- Added unit tests for Query class to ensure functionality and robustness.
- Maintained backward compatibility with existing methods while improving overall architecture.
lohanidamodar added a commit that referenced this pull request Apr 7, 2026
- Refactor Metric class from mutable ArrayObject to immutable DTO using
  constructor property promotion with readonly properties (review #5)
- Add Metric::fromArray() factory for constructing from associative arrays
- Store extra dimensions (path, method, status, etc.) in extras map
- Remove setUseFinal option; always use FINAL in ClickHouse queries (review #6)
- Remove unused Metric import from ClickHouse adapter
- Update all Metric tests to use new constructor and fromArray() factory

Review items already addressed in prior commits: projectId->tenant (#1),
region removal (#2), no purgeExpired on facade (#3), no HTTP log methods
on Metric (#4), composer test script (#7).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lohanidamodar and others added 2 commits April 8, 2026 03:11
… aggregation

- Replace SummingMergeTree + ReplacingMergeTree with single MergeTree table
- Remove period column and fan-out; add type column (event/gauge)
- Raw append with UUID IDs instead of deterministic dedup
- Add SummingMergeTree materialized view for billing (events only)
- Query-time aggregation: SUM for events, argMax for gauges
- New methods: addBatch, getTimeSeries, getTotal, getTotalBatch
- Remove: increment, set, incrementBatch, setBatch, all *ByPeriod* methods
- Single collect() method with type parameter replaces collect/collectSet

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add Usage::TYPE_EVENT and Usage::TYPE_GAUGE constants
- Use constants in validation and type comparisons
- Add SummingMergeTree daily aggregation MV for events (toStartOfDay)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@lohanidamodar lohanidamodar force-pushed the claude/rebuild-analytics-clickhouse-OHWGZ branch from 3b32515 to 15112a0 Compare April 8, 2026 04:47
- Change tenant type from ?int to ?string everywhere (Adapter, Usage,
  ClickHouse, Database, Metric)
- ClickHouse tenant column: Nullable(String) instead of Nullable(UInt64)
- Fix tenant key mismatch: validateMetricsBatch now checks '$tenant'
  matching resolveTenantFromMetric
- Fix MV GROUP BY: conditional on sharedTables (no tenant column when
  sharedTables=false)
- Fix billing/daily target tables: conditional tenant column and ORDER BY
- Add collect() validation: empty metric name and negative value checks
- Fix ltrim() misuse in buildWhereClause: getTenantFilter now returns
  bare condition without ' AND ' prefix
- Fix SQL.php: replace 'Audit' references with 'Usage', remove unused
  Database import
- Fix parseQueries: use Int64 param type for value attribute instead of
  String
- Rewrite all tests (UsageBase, ClickHouseTest, MetricTest) for new API:
  replace increment/set/period-based methods with addBatch/collect/
  getTotal/getTotalBatch/getTimeSeries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lohanidamodar and others added 14 commits April 8, 2026 05:26
- Remove $useFinal property and setUseFinal() (MergeTree doesn't support FINAL)
- Remove buildTableReference $useFinal param
- Fix resolveTenantFromMetric mixed type handling
- Remove unreachable branch in Database::getTotal()
- Remove always-true count check in addBatch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove separate billing table and MV (monthly aggregation)
- Daily MV uses same column definitions as source table
- Billing queries use daily table (SUM over daily aggregated rows)
- Only events are pre-aggregated; gauges query raw table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split the single MergeTree table into two separate tables:
- Events table with dedicated columns for path, method, status, resource, resourceId
- Gauges table with simple metric/value/time/tags schema

Event-specific columns are automatically extracted from tags during addBatch.
The daily SummingMergeTree MV now aggregates by metric, resource, resourceId.
All read methods accept an optional $type parameter to target specific tables,
with null querying both tables transparently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Query the pre-aggregated daily SummingMergeTree table for fast
billing/analytics instead of scanning raw events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- country: LowCardinality(Nullable(String)) for efficient low-cardinality storage
- userAgent: Nullable(String) with bloom filter index
- Both extracted from tags into dedicated columns like other event fields
- Added getCountry() and getUserAgent() getters on Metric

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Daily table only has metric, value, time, resource, resourceId, tenant.
No path/status/userAgent/country/tags — those don't aggregate meaningfully.
MV groups by metric, resource, resourceId, tenant, day.
ORDER BY includes resource and resourceId for efficient billing queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use Metric::EVENT_COLUMNS to extract all event columns from tags
instead of hardcoding the list. Now country and userAgent are properly
stored in dedicated columns instead of being left in tags JSON.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Daily table now only has metric, value, time, tenant.
One row per metric per project per day — optimal for billing.
Resource-level breakdown queries the raw events table directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single query with GROUP BY metric for summing multiple metrics
from the daily table. Returns array<string, int>.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +1968 to +1974
if ($eventTotal > 0 && $gaugeTotal > 0) {
// A metric shouldn't be in both tables; return whichever is nonzero
// In practice, callers specify type for ambiguous cases
return $eventTotal + $gaugeTotal;
}

return $eventTotal > 0 ? $eventTotal : $gaugeTotal;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 getTotal() sums semantically incompatible aggregations

The inline comment says "return whichever is nonzero" but the code actually returns $eventTotal + $gaugeTotal. These two values are computed with fundamentally different aggregation semantics — SUM(value) for events and argMax(value, time) for gauges — so adding them together produces a number with no valid meaning.

In the common case where a metric name only exists in one table this is harmless (one of the two is 0). But if a metric name were ever present in both tables (e.g. during a data migration or a misconfigured adapter), callers would silently receive a corrupted total without any warning.

Consider throwing (or at least returning the dominant value) instead:

if ($eventTotal > 0 && $gaugeTotal > 0) {
    // Metric exists in both tables — this is unexpected; callers should specify $type.
    throw new \RuntimeException(
        "Metric '{$metric}' found in both events and gauges tables. Specify \$type explicitly."
    );
}

Or, if silent fallback is preferred, at least fix the comment to match what the code does.

foreach ($json['data'] as $row) {
$metricName = $row['metric'] ?? '';
$bucketTime = $row['bucket'] ?? '';
$value = (int) ($row['agg_value'] ?? 0);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 getTimeSeriesFromTable() truncates float bucket values to int

$value = (int) ($row['agg_value'] ?? 0) will silently truncate any fractional part returned by ClickHouse. For gauge metrics the aggregation expression is argMax(value, time), and for event metrics it is SUM(value). Large event sums can exceed the range of a PHP int on 32-bit hosts, and gauge values can legitimately be fractional (e.g. storage in fractional GB) if the schema ever changes to Float64.

The declared return type in the docblock is array{total: int, data: array<array{value: int, date: string}>}, which locks in int — but the abstract Adapter interface at the getTimeSeries() level does not restrict callers to integer-only metrics. Consider casting to (float) internally and widening the return type, or at minimum documenting the truncation explicitly so future schema changes surface the discrepancy immediately.

@lohanidamodar lohanidamodar changed the title Add ClickHouse adapter and Usage analytics library ClickHouse usage analytics: events/gauges tables with daily MV Apr 8, 2026
lohanidamodar and others added 4 commits April 8, 2026 08:00
Updated for events/gauges split, event-specific columns,
daily MV, query-time aggregation, billing methods, and
complete API reference with examples.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…country/userAgent

- Remove stale 'type' key from addBatch() @param array shape in Usage.php, Adapter.php, Database.php
- Fix mixed-to-string cast in ClickHouse.php event column extraction with type-safe checks
- Reduce path size from 1024 to 255 and userAgent size from 512 to 255 in Metric::getEventSchema() to stay within MySQL 768-byte index limit
- Update MetricTest assertions: 11 attributes, 9 indexes, 7 EVENT_COLUMNS
- Update ClickHouseTest: userAgent/country are now event columns, not tags

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add UsageQuery class extending Query with a custom groupByInterval method
that enables time-bucketed aggregated queries. When present in the queries
array, the ClickHouse adapter switches from raw row returns to aggregated
results grouped by time bucket (SUM for events, argMax for gauges).

Supported intervals: 1m, 5m, 15m, 1h, 1d, 1w, 1M.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +1727 to +1728
$parsed = $this->parseQueries($queries, Usage::TYPE_EVENT);
$whereData = $this->buildWhereClause($parsed['filters'], $parsed['params']);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Event-specific column filters silently break findDaily/sumDaily/sumDailyBatch

parseQueries($queries, Usage::TYPE_EVENT) validates query attributes against the full event schema (which includes path, method, status, resource, resourceId, userAgent), so no exception is thrown client-side. However the daily table only contains metric, value, time, and optionally tenant. A caller passing Query::equal('path', '/v1/...') produces WHERE \path` = {param_0:String}` against the daily table, causing a ClickHouse "No such column 'path'" server error at runtime.

The same issue affects sumDaily() (line 1761) and sumDailyBatch() (line 1802). A lightweight fix is to validate the attribute name against the daily table's actual column set before building the clause:

private const DAILY_TABLE_COLUMNS = ['metric', 'value', 'time', 'tenant'];

// in findDaily/sumDaily/sumDailyBatch, before parseQueries:
foreach ($queries as $query) {
    $attr = $query->getAttribute();
    if ($attr !== '' && !in_array($attr, self::DAILY_TABLE_COLUMNS, true)) {
        throw new \InvalidArgumentException(
            "Column '{$attr}' does not exist in the daily table. Allowed: "
            . implode(', ', self::DAILY_TABLE_COLUMNS)
        );
    }
}

Comment on lines +304 to +315
if ($type === 'event') {
// Additive: sum values for the same metric
if (isset($this->buffer[$key])) {
$this->buffer[$key]['value'] += $value;
} else {
$this->buffer[$key] = [
'metric' => $metric,
'value' => $value,
'type' => $type,
'tags' => $tags,
];
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 collect() silently drops event tags on repeated same-metric calls, corrupting path/method attribution

The buffer key is $metric . ':' . $type. When two events for the same metric name arrive with different tags — e.g. collect('bandwidth', 100, 'event', ['path' => '/v1/files']) followed by collect('bandwidth', 200, 'event', ['path' => '/v1/functions']) — only the value is summed; the second call's tags are silently discarded. The single ClickHouse row written at flush time carries path='/v1/files' and value=300, misattributing 200 bytes to the wrong endpoint.

Given that the system has dedicated path, method, status, resource, and resourceId columns specifically for per-path analytics, this aggregation strategy loses the metadata granularity that the schema was designed to preserve. Either:

  • Use addBatch() directly (bypassing the buffer) when per-event tag fidelity matters, or
  • Change the buffer key to include a hash of the tags so that events with distinct contexts remain separate entries.

Query::parse() uses static::isMethod() which allows UsageQuery
to extend the valid method list. Without this override, parsing
'groupByInterval("time","1h")' throws "Invalid query method".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants