| 1 | +# Resource analytics plan |
| 2 | + |
| 3 | +This document describes **general resource analytics** for TechDiary: view events and time-series insights for any trackable `(resource_type, resource_id)`, not only articles. It extends the intent of [GitHub issue #114](https://github.com/techdiary-dev/techdiary.dev/issues/114) and the writer-analytics line item in `comeback-strategy.md`. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## Goals |
| 8 | + |
| 9 | +- Give **owners** visibility into **reach** (views over time) for content they control. |
| 10 | +- Reuse the same **polymorphic resource keys** the app already uses for reactions, bookmarks, and comments (`resource_type` + `resource_id`). |
| 11 | +- Keep **heavy write traffic** off PostgreSQL by storing view events in **ClickHouse** (or an equivalent columnar/analytics store). |
| 12 | + |
| 13 | +Non-goals for v1: |
| 14 | + |
| 15 | +- Real-time public view counters on every page. |
| 16 | +- Analytics for resources the current user does not own (except future admin tooling). |
| 17 | +- Perfect “read time” without client instrumentation (optional later via scroll/time signals). |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## Product surface |
| 22 | + |
| 23 | +- **Dashboard (or editor)** entry: “Insights” / “Analytics” for a resource the user owns. |
| 24 | +- **Summary cards:** **impressions** (every append / page load in range) and **unique viewers** (`uniq(session_id)` in range); optional all-time headlines from ClickHouse. |
| 25 | +- **Time-series chart:** primary line = **unique viewers per day**; optional second series or toggle = **impressions per day** (see [Metrics](#metrics-impressions-vs-unique-viewers)). Range **7d / 30d / 90d** (default **30d** per #114). |
| 26 | +- **Engagement from Postgres:** reaction count and bookmark count where the schema supports that `resource_type` (today: **ARTICLE**, **GIST** for reactions; bookmarks per existing `bookmarks` rules). |
| 27 | + |
| 28 | +First consumer: **published article** detail views. Second: **gist** public views, using the same pipeline. |
| 29 | + |
| 30 | +--- |
| 31 | + |
| 32 | +## Data model (ClickHouse) |
| 33 | + |
| 34 | +Single events table keyed by resource, not by article only. |
| 35 | + |
| 36 | +```sql |
| 37 | +CREATE TABLE resource_views ( |
| 38 | + resource_type LowCardinality(String), -- e.g. ARTICLE, GIST (allowlist) |
| 39 | + resource_id UUID, |
| 40 | + viewer_id Nullable(UUID), -- NULL when anonymous |
| 41 | + session_id String, -- stable anonymous bucket (cookie / hash) |
| 42 | + referrer Nullable(String), |
| 43 | + country_code Nullable(String), -- optional, from geo later |
| 44 | + viewed_at DateTime DEFAULT now() |
| 45 | +) |
| 46 | +ENGINE = MergeTree() |
| 47 | +PARTITION BY toYYYYMM(viewed_at) |
| 48 | +ORDER BY (resource_type, resource_id, viewed_at); |
| 49 | +``` |
| 50 | + |
| 51 | +**Deduped “views” metric (v1):** `uniq(session_id)` per calendar day per resource, for a chosen window: |
| 52 | + |
| 53 | +```sql |
| 54 | +SELECT |
| 55 | + toDate(viewed_at) AS date, |
| 56 | + uniq(session_id) AS views |
| 57 | +FROM resource_views |
| 58 | +WHERE resource_type = {type:String} |
| 59 | + AND resource_id = {id:UUID} |
| 60 | + AND viewed_at >= now() - INTERVAL 30 DAY |
| 61 | +GROUP BY date |
| 62 | +ORDER BY date; |
| 63 | +``` |
| 64 | + |
| 65 | +Adjust the interval via an app-level parameter (7 / 30 / 90 days, or a custom value). |
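One possible way to turn the app-level range parameter into a safe day count before it reaches the query is sketched below. The helper name `normalizeRangeDays`, the 30-day fallback, and the 1 to 365 day bounds for custom values are assumptions for illustration; the plan itself only fixes the 7 / 30 / 90 presets and the 30-day default.

```typescript
// Hypothetical helper: the bounds for custom ranges are an assumption, not part of the plan.
const DEFAULT_RANGE_DAYS = 30;
const MIN_RANGE_DAYS = 1;
const MAX_RANGE_DAYS = 365;

function normalizeRangeDays(input: unknown): number {
  // Anything that is not a finite number falls back to the default range.
  if (typeof input !== "number" || !isFinite(input)) {
    return DEFAULT_RANGE_DAYS;
  }
  // Drop fractional days, then clamp custom values into the allowed window.
  const whole = Math.trunc(input);
  return Math.min(MAX_RANGE_DAYS, Math.max(MIN_RANGE_DAYS, whole));
}
```

Normalizing to a bounded integer up front means the value can be passed to the query as a numeric parameter rather than interpolated as free text.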
| 66 | + |
| 67 | +### Metrics: impressions vs unique viewers |
| 68 | + |
| 69 | +Append-on-each-load stores **one row per event**. That supports **two KPIs from the same table** without changing ingestion: |
| 70 | + |
| 71 | +| Metric | Meaning | Per-day aggregation | |
| 72 | +|--------|---------|---------------------| |
| 73 | +| **Impressions** | Total times the page was loaded and sent an event (includes refreshes, repeat loads same session). | `count()` | |
| 74 | +| **Unique viewers** | Distinct sessions that saw the resource at least once that day. | `uniq(session_id)` | |
| 75 | + |
| 76 | +**Single query — daily series for both** (same filter as above): |
| 77 | + |
| 78 | +```sql |
| 79 | +SELECT |
| 80 | + toDate(viewed_at) AS date, |
| 81 | + count() AS impressions, |
| 82 | + uniq(session_id) AS unique_viewers |
| 83 | +FROM resource_views |
| 84 | +WHERE resource_type = {type:String} |
| 85 | + AND resource_id = {id:UUID} |
| 86 | + AND viewed_at >= now() - INTERVAL 30 DAY |
| 87 | +GROUP BY date |
| 88 | +ORDER BY date; |
| 89 | +``` |
| 90 | + |
| 91 | +**Range totals** (summary cards): impressions are additive, so `sum(impressions)` over the returned daily rows equals `count()` over the whole window. Unique viewers are not additive: a session that returns on three different days counts three times in the sum of daily uniques but only once in a single `uniq(session_id)` over the window. Use **one global `uniq(session_id)`** for “unique viewers in period”; use the **sum of daily uniques** only if you want a stacked “reach by day” figure, and document which one you show. Note that `uniq` is an approximate aggregate; `uniqExact(session_id)` returns exact counts at a higher memory cost. |
| 92 | + |
| 93 | +**Recommendation for v1 summary:** report **unique viewers in period** as `uniq(session_id)` with the date filter only (one number), and **impressions in period** as `count()` with the same filter. |
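To make the difference between the two "unique" totals concrete, here is a small in-memory sketch. It does not talk to ClickHouse; the `ViewEvent` shape and the `summarize` helper are hypothetical stand-ins for rows of `resource_views` and for the aggregate queries described above.

```typescript
// Hypothetical in-memory mirror of two resource_views columns.
interface ViewEvent {
  sessionId: string;
  viewedAt: string; // ISO 8601 timestamp in UTC
}

interface PeriodSummary {
  impressions: number;       // like count() over the window
  uniqueViewers: number;     // like one uniq(session_id) over the window
  sumOfDailyUniques: number; // like summing the per-day uniq(session_id) values
}

function summarize(events: ViewEvent[]): PeriodSummary {
  const periodSessions = new Set<string>();
  const sessionsByDay = new Map<string, Set<string>>();

  for (const e of events) {
    periodSessions.add(e.sessionId);
    const day = e.viewedAt.slice(0, 10); // YYYY-MM-DD, comparable to toDate(viewed_at)
    let daySessions = sessionsByDay.get(day);
    if (!daySessions) {
      daySessions = new Set<string>();
      sessionsByDay.set(day, daySessions);
    }
    daySessions.add(e.sessionId);
  }

  let sumOfDailyUniques = 0;
  sessionsByDay.forEach((daySessions) => {
    sumOfDailyUniques += daySessions.size;
  });

  return {
    impressions: events.length,
    uniqueViewers: periodSessions.size,
    sumOfDailyUniques,
  };
}

// Session "a" loads the page twice on day one and once on day two; "b" visits on day two.
const summary = summarize([
  { sessionId: "a", viewedAt: "2025-05-01T09:00:00Z" },
  { sessionId: "a", viewedAt: "2025-05-01T09:05:00Z" },
  { sessionId: "a", viewedAt: "2025-05-02T10:00:00Z" },
  { sessionId: "b", viewedAt: "2025-05-02T11:00:00Z" },
]);
// summary: impressions 4, uniqueViewers 2, sumOfDailyUniques 3
```

The returning session is the reason the summed daily figure (3) exceeds the period figure (2), which is why the summary card should come from a single window-wide aggregate.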
| 94 | + |
| 95 | +--- |
| 96 | + |
| 97 | +## Ingestion |
| 98 | + |
| 99 | +### HTTP API |
| 100 | + |
| 101 | +- **Route:** `POST /api/analytics/view` (or `pageview` if you prefer naming parity with #114). |
| 102 | +- **Body (JSON):** `{ "resource_type": "ARTICLE", "resource_id": "<uuid>", "session_id": "<string>" }`. |
| 103 | +- **Behavior:** validate allowlist + UUID, insert one row, return **204** quickly. Optionally enable ClickHouse asynchronous inserts (`async_insert = 1` with `wait_for_async_insert = 0`) or use a fire-and-forget pattern so the client never blocks on ClickHouse latency. |
| 104 | + |
| 105 | +### Client |
| 106 | + |
| 107 | +- **Fire-and-forget** `fetch` on the **public** resource page (article read, gist read) inside a small client component mounted once per page. |
| 108 | +- **Session id:** a stable anonymous id stored in a long-lived cookie or `localStorage`, with a fallback when neither is available. Logged-in users still send `session_id` for dedupe; the route may optionally set `viewer_id` from server-side context, or that enrichment can wait for a later iteration. |
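The fire-and-forget call described above might look roughly like the sketch below. The function name `trackResourceView` and the `FetchLike` type are hypothetical; the fetch implementation is injected so the logic can be exercised outside a browser, and `keepalive` is set on the assumption that the request should survive a quick navigation away from the page.

```typescript
// Minimal structural type for the parts of fetch this sketch relies on (an assumption).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; keepalive: boolean }
) => Promise<unknown>;

function trackResourceView(
  fetchImpl: FetchLike,
  resourceType: string,
  resourceId: string,
  sessionId: string
): void {
  const body = JSON.stringify({
    resource_type: resourceType,
    resource_id: resourceId,
    session_id: sessionId,
  });

  try {
    // Deliberately not awaited: the page never waits on analytics.
    fetchImpl("/api/analytics/view", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body,
      keepalive: true,
    }).catch(() => {
      // Swallow network errors: a failed view event must not affect the reader.
    });
  } catch {
    // Swallow synchronous failures from the injected implementation as well.
  }
}

// In a client component this could be called once on mount, for example:
//   trackResourceView((url, init) => fetch(url, init), "ARTICLE", articleId, sessionId);
```

Injecting the fetch function is only a convenience for testing; a real tracker component could call the global `fetch` directly.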
| 109 | + |
| 110 | +### Allowlist |
| 111 | + |
| 112 | +Reject unknown `resource_type` values at the API to avoid garbage dimensions. Start with: `ARTICLE`, `GIST`. Expand deliberately (e.g. `SERIES`) when there is a clear owner and a public URL. |
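A dependency-free sketch of the allowlist and UUID validation is shown below. The plan places input validation in a Zod schema (`analytics.input.ts`); this plain function is only a stand-in to illustrate the checks. The name `parseViewEvent` and the 128-character cap on `session_id` are assumptions.

```typescript
// Allowlist from the plan; extend deliberately when new resource types are added.
const ALLOWED_RESOURCE_TYPES = ["ARTICLE", "GIST"] as const;
type TrackableResourceType = (typeof ALLOWED_RESOURCE_TYPES)[number];

interface ViewEventInput {
  resource_type: TrackableResourceType;
  resource_id: string;
  session_id: string;
}

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const MAX_SESSION_ID_LENGTH = 128; // assumed cap to keep junk values out of the table

// Returns the validated event, or null when the body should be rejected.
function parseViewEvent(body: unknown): ViewEventInput | null {
  if (typeof body !== "object" || body === null) return null;
  const { resource_type, resource_id, session_id } = body as Record<string, unknown>;

  if (typeof resource_type !== "string") return null;
  if ((ALLOWED_RESOURCE_TYPES as readonly string[]).indexOf(resource_type) === -1) return null;

  if (typeof resource_id !== "string" || !UUID_PATTERN.test(resource_id)) return null;

  if (typeof session_id !== "string") return null;
  if (session_id.length === 0 || session_id.length > MAX_SESSION_ID_LENGTH) return null;

  return {
    resource_type: resource_type as TrackableResourceType,
    resource_id,
    session_id,
  };
}
```

The ingest route could respond with a 4xx status when this returns `null` and insert the row otherwise.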
| 113 | + |
| 114 | +### Abuse / noise |
| 115 | + |
| 116 | +- Rate limit by IP at the edge or in the route (follow existing API patterns). |
| 117 | +- Optional: drop obvious bots via `User-Agent` heuristics later. |
| 118 | + |
| 119 | +--- |
| 120 | + |
| 121 | +## Read path (server) |
| 122 | + |
| 123 | +### Server action (or cached query helper) |
| 124 | + |
| 125 | +- **Name:** e.g. `getResourceAnalytics({ resource_type, resource_id, rangeDays })`. |
| 126 | +- **Steps:** |
| 127 | + 1. Resolve current user (`authID()`). |
| 128 | + 2. **Assert ownership** for `(resource_type, resource_id)` via a small internal map: |
| 129 | + - `ARTICLE` → author_id matches user. |
| 130 | + - `GIST` → owner matches user. |
| 131 | + - (Future types → same pattern.) |
| 132 | + 3. Run the ClickHouse aggregation query. |
| 133 | + 4. Optionally merge **reaction** / **bookmark** counts from SQLKit repositories for that resource. |
| 134 | + |
| 135 | +Return a typed payload, e.g. `{ series: { date, impressions, unique_viewers }[], totals: { impressions, unique_viewers }, reactions: number, bookmarks: number }` (shape can evolve). Chart can default to **unique viewers** with impressions as secondary or toggle. |
| 136 | + |
| 137 | +**Caching:** read-only, user-specific data — **do not** use `'use cache'` on functions that call `cookies()` / session. Per-request or short-lived client cache (TanStack Query) is appropriate. |
| 138 | + |
| 139 | +--- |
| 140 | + |
| 141 | +## Code layout (suggested) |
| 142 | + |
| 143 | +| Area | Path / artifact | |
| 144 | +| --------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | |
| 145 | +| ClickHouse client singleton | `src/backend/persistence/clickhouse.client.ts` | |
| 146 | +| Analytics server actions | `src/backend/services/analytics.actions.ts` | |
| 147 | +| Zod input | `src/backend/services/inputs/analytics.input.ts` | |
| 148 | +| Ingest route | `src/app/api/analytics/view/route.ts` | |
| 149 | +| Client tracker | e.g. `src/components/analytics/ResourceViewTracker.tsx` | |
| 150 | +| Chart | e.g. `src/components/analytics/ResourceAnalyticsChart.tsx` (Recharts or existing chart lib) | |
| 151 | +| Dashboard page | e.g. `src/app/(dashboard-editor)/dashboard/analytics/[resourceType]/[resourceId]/page.tsx` or nested under article/gist edit flows | |
| 152 | + |
| 153 | +Keep **ownership checks** in one module (e.g. `assertResourceAnalyticsAccess`) so new resource types only add a branch there. |
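As a rough illustration of the single-module ownership check, the sketch below keeps the per-type rule in one map. It assumes the caller has already loaded the resource row from Postgres; the `owner_id` column name for gists and the error messages are hypothetical, since the plan only says "owner matches user".

```typescript
type OwnedResourceType = "ARTICLE" | "GIST";

// Which column identifies the owner for each resource type.
// "author_id" follows the plan; "owner_id" for gists is an assumed column name.
const OWNER_FIELD: Record<OwnedResourceType, string> = {
  ARTICLE: "author_id",
  GIST: "owner_id",
};

// Throws unless userId owns the given row. Adding a resource type means
// adding one entry to OWNER_FIELD rather than a new code path.
function assertResourceAnalyticsAccess(
  resourceType: OwnedResourceType,
  row: Record<string, unknown> | null,
  userId: string | null
): void {
  if (!userId) {
    throw new Error("Unauthenticated");
  }
  if (!row) {
    throw new Error("Resource not found");
  }
  if (row[OWNER_FIELD[resourceType]] !== userId) {
    throw new Error("Forbidden");
  }
}
```

A server action such as `getResourceAnalytics` could call this right after resolving the current user and before running any ClickHouse query.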
| 154 | + |
| 155 | +--- |
| 156 | + |
| 157 | +## Environment variables |
| 158 | + |
| 159 | +Add to the server env schema (e.g. `@t3-oss/env-nextjs`): |
| 160 | + |
| 161 | +- `CLICKHOUSE_HOST` |
| 162 | +- `CLICKHOUSE_DATABASE` |
| 163 | +- `CLICKHOUSE_USERNAME` |
| 164 | +- `CLICKHOUSE_PASSWORD` |
| 165 | + |
| 166 | +Optional: `CLICKHOUSE_URL` if using a single connection string provider. |
| 167 | + |
| 168 | +When these variables are missing, the ingest route should consistently **no-op or return 503**, and the dashboard should show a clear “analytics unavailable” state so that local development without ClickHouse still works. |
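One way to centralize the "analytics unavailable" decision is a small config reader that returns `null` when any variable is absent, sketched below. The function and interface names are hypothetical, and allowing an empty password (as on some local ClickHouse setups) is an assumption.

```typescript
interface ClickHouseConfig {
  host: string;
  database: string;
  username: string;
  password: string;
}

// Returns null when analytics is not configured, so callers can no-op or
// show an "analytics unavailable" state instead of throwing at import time.
function readClickHouseConfig(
  env: Record<string, string | undefined>
): ClickHouseConfig | null {
  const host = env["CLICKHOUSE_HOST"];
  const database = env["CLICKHOUSE_DATABASE"];
  const username = env["CLICKHOUSE_USERNAME"];
  const password = env["CLICKHOUSE_PASSWORD"];

  // Host, database, and username must be non-empty; the password only has to
  // be defined (assumption: an empty password is acceptable for local setups).
  if (!host || !database || !username || password === undefined) {
    return null;
  }
  return { host, database, username, password };
}
```

Both the ingest route and the dashboard read path could branch on the same `null` result, which keeps their behavior consistent when ClickHouse is not configured.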
| 169 | + |
| 170 | +--- |
| 171 | + |
| 172 | +## Dependencies |
| 173 | + |
| 174 | +- `@clickhouse/client` — official client for insert/query. |
| 175 | + |
| 176 | +--- |
| 177 | + |
| 178 | +## Rollout order |
| 179 | + |
| 180 | +1. Provision ClickHouse (Cloud or Docker) and create `resource_views`. |
| 181 | +2. Add env vars + `clickhouse.client.ts` + connection health check (optional). |
| 182 | +3. Implement `POST /api/analytics/view` with allowlist validation. |
| 183 | +4. Add `ResourceViewTracker` to **published article** page (and optionally gist page). |
| 184 | +5. Implement `getResourceAnalytics` + ownership assertions. |
| 185 | +6. Build dashboard UI: summary cards (**impressions** + **unique viewers**) + range toggle + chart (unique viewers primary; impressions optional second series or toggle). |
| 186 | +7. Wire entry points from article editor / dashboard nav. |
| 187 | +8. Production monitoring: insert errors, query latency, row growth. |
| 188 | + |
| 189 | +--- |
| 190 | + |
| 191 | +## Relationship to PostgreSQL |
| 192 | + |
| 193 | +- **Authoritative for identity and engagement:** users, articles, gists, reactions, bookmarks remain in Postgres. |
| 194 | +- **Analytics store:** append-only view events; cheap `count()` and `uniq(session_id)` per day. |
| 195 | +- **Issue #114 acceptance criteria** map as follows: |
| 196 | + - View count / time series → ClickHouse: **unique viewers** time series (`uniq(session_id)` by day); **impressions** (`count()` by day) from the same rows. |
| 197 | + - Reaction and bookmark counts → existing repositories by `resource_type` / `resource_id`. |
| 198 | + - “Estimated reads” → future enhancement (scroll depth, time-on-page); not required for v1. |
| 199 | + |
| 200 | +--- |
| 201 | + |
| 202 | +## Open decisions |
| 203 | + |
| 204 | +- **Public view counter:** whether to show aggregate views on the article page (requires either a cached aggregate job or a separate materialized summary). |
| 205 | +- **GDPR / retention:** retention window for `resource_views` and deletion story when a user or resource is removed (ClickHouse TTL or periodic cleanup). |
| 206 | +- **Cross-resource dashboard:** one page listing “all my content” with sparklines — nice follow-up after single-resource insights ship. |