Commit 2aefe5a

feat: add resource analytics plan documentation
- Introduced a comprehensive document outlining the resource analytics plan for TechDiary.
- Defined goals, product surface, data model, ingestion methods, and server actions for tracking resource views.
- Established metrics for impressions and unique viewers, along with API specifications for data ingestion.
1 parent a34d490 commit 2aefe5a

1 file changed

Lines changed: 206 additions & 0 deletions

docs/resource-analytics-plan.md

# Resource analytics plan

This document describes **general resource analytics** for TechDiary: view events and time-series insights for any trackable `(resource_type, resource_id)`, not only articles. It extends the intent of [GitHub issue #114](https://github.com/techdiary-dev/techdiary.dev/issues/114) and the writer-analytics line item in `comeback-strategy.md`.

---

## Goals

- Give **owners** visibility into **reach** (views over time) for content they control.
- Reuse the same **polymorphic resource keys** the app already uses for reactions, bookmarks, and comments (`resource_type` + `resource_id`).
- Keep **heavy write traffic** off PostgreSQL by storing view events in **ClickHouse** (or an equivalent columnar/analytics store).

Non-goals for v1:

- Real-time public view counters on every page.
- Analytics for resources the current user does not own (except future admin tooling).
- Perfect “read time” without client instrumentation (optional later via scroll/time signals).

---

## Product surface

- **Dashboard (or editor)** entry: “Insights” / “Analytics” for a resource the user owns.
- **Summary cards:** **impressions** (every recorded page load in the range) and **unique viewers** (`uniq(session_id)` in the range); optionally, all-time headline numbers from ClickHouse.
- **Time-series chart:** primary line = **unique viewers per day**; optional second series or toggle = **impressions per day** (see [Metrics](#metrics-impressions-vs-unique-viewers)). Range **7d / 30d / 90d** (default **30d** per #114).
- **Engagement from Postgres:** reaction count and bookmark count where the schema supports that `resource_type` (today: **ARTICLE** and **GIST** for reactions; bookmarks follow the existing `bookmarks` rules).

First consumer: **published article** detail views. Second: **gist** public views, using the same pipeline.

---

## Data model (ClickHouse)

A single events table, keyed by resource rather than by article only.

```sql
CREATE TABLE resource_views (
    resource_type LowCardinality(String),  -- e.g. ARTICLE, GIST (allowlist)
    resource_id   UUID,
    viewer_id     Nullable(UUID),          -- NULL when anonymous
    session_id    String,                  -- stable anonymous bucket (cookie / hash)
    referrer      Nullable(String),
    country_code  Nullable(String),        -- optional, from geo later
    viewed_at     DateTime DEFAULT now()
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(viewed_at)
ORDER BY (resource_type, resource_id, viewed_at);
```

**Deduped “views” metric (v1):** `uniq(session_id)` per calendar day per resource, for a chosen window:

```sql
SELECT
    toDate(viewed_at) AS date,
    uniq(session_id) AS views
FROM resource_views
WHERE resource_type = {type:String}
  AND resource_id = {id:UUID}
  AND viewed_at >= now() - INTERVAL 30 DAY
GROUP BY date
ORDER BY date;
```

The 30-day window is hard-coded above for readability. In the app, pass the window length (7 / 30 / 90 days, or a custom value) as a parameter. `now() - INTERVAL N DAY` is a rolling window, so the oldest calendar day in the result usually covers only part of that day.
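
One way to pass the window length, sketched under assumptions. The helper names (`resolveDays`, `dailyViewsParams`, `DAILY_VIEWS_QUERY`) are hypothetical, as are the custom-range format `"<n>d"` and the 365-day cap. The query swaps the literal `INTERVAL 30 DAY` for `toIntervalDay({days:UInt32})`, so the number is bound as a parameter instead of being concatenated into SQL.

```typescript
// Hypothetical helpers for the 7d / 30d / 90d / custom range picker.
export type AnalyticsRange = "7d" | "30d" | "90d";

export const DEFAULT_RANGE: AnalyticsRange = "30d";

const RANGE_DAYS: Record<AnalyticsRange, number> = { "7d": 7, "30d": 30, "90d": 90 };

// Same query as above, with the window length bound as the {days} parameter.
export const DAILY_VIEWS_QUERY = `
SELECT
    toDate(viewed_at) AS date,
    uniq(session_id) AS views
FROM resource_views
WHERE resource_type = {type:String}
  AND resource_id = {id:UUID}
  AND viewed_at >= now() - toIntervalDay({days:UInt32})
GROUP BY date
ORDER BY date`;

// Maps a range string to a day count. Custom values look like "14d" and are clamped.
export function resolveDays(range?: string, maxCustomDays = 365): number {
  if (range === undefined) return RANGE_DAYS[DEFAULT_RANGE];
  // Own-property check, so inherited keys like "constructor" are not treated as presets.
  if (Object.prototype.hasOwnProperty.call(RANGE_DAYS, range)) {
    return RANGE_DAYS[range as AnalyticsRange];
  }
  const match = /^(\d+)d$/.exec(range);
  if (!match) throw new Error(`Unsupported range: ${range}`);
  return Math.min(Math.max(Number(match[1]), 1), maxCustomDays);
}

// Parameter object whose keys match the {type}, {id} and {days} placeholders.
export function dailyViewsParams(type: string, id: string, range?: string) {
  return { type, id, days: resolveDays(range) };
}
```

With `@clickhouse/client`, the query text and the params object would typically be passed to `client.query({ query, query_params, format: 'JSONEachRow' })`. Unknown range strings throw instead of silently falling back to the default, so a malformed URL parameter stays visible.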

### Metrics: impressions vs unique viewers

Appending a row on each load stores **one row per event**. That supports **two KPIs from the same table** without changing ingestion:

| Metric | Meaning | Per-day aggregation |
|--------|---------|---------------------|
| **Impressions** | Total number of times the page was loaded and sent an event (includes refreshes and repeat loads in the same session). | `count()` |
| **Unique viewers** | Distinct sessions that saw the resource at least once that day. | `uniq(session_id)` |

**Single query: daily series for both** (same filter as above):

```sql
SELECT
    toDate(viewed_at) AS date,
    count() AS impressions,
    uniq(session_id) AS unique_viewers
FROM resource_views
WHERE resource_type = {type:String}
  AND resource_id = {id:UUID}
  AND viewed_at >= now() - INTERVAL 30 DAY
GROUP BY date
ORDER BY date;
```
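Each day is bucketed on its own, so per-day uniques do not add up to uniques over the whole window. The toy sketch below uses hypothetical names. Exact `Set` counts stand in for ClickHouse's aggregates, and the code is only an illustration, not something to run against real data. It makes the difference concrete on four events:

```typescript
// Toy model of resource_views rows. Exact Set-based counts stand in for ClickHouse aggregates.
interface ViewEvent {
  session_id: string;
  date: string; // what toDate(viewed_at) would return, as YYYY-MM-DD
}

// Per-day unique sessions, like GROUP BY date with a distinct count of session_id.
function dailyUniques(events: ViewEvent[]): Map<string, number> {
  const sessionsByDay = new Map<string, Set<string>>();
  for (const e of events) {
    const sessions = sessionsByDay.get(e.date) ?? new Set<string>();
    sessions.add(e.session_id);
    sessionsByDay.set(e.date, sessions);
  }
  const result = new Map<string, number>();
  sessionsByDay.forEach((sessions, date) => result.set(date, sessions.size));
  return result;
}

function rangeTotals(events: ViewEvent[]) {
  let sumOfDailyUniques = 0;
  dailyUniques(events).forEach((n) => {
    sumOfDailyUniques += n;
  });
  return {
    impressions: events.length, // count() over the window = sum of daily count()
    uniqueViewersInPeriod: new Set(events.map((e) => e.session_id)).size,
    sumOfDailyUniques,
  };
}

const events: ViewEvent[] = [
  { session_id: "a", date: "2024-05-01" },
  { session_id: "a", date: "2024-05-01" }, // refresh: another impression, same viewer
  { session_id: "b", date: "2024-05-01" },
  { session_id: "a", date: "2024-05-02" }, // "a" returns the next day
];

console.log(rangeTotals(events));
// → { impressions: 4, uniqueViewersInPeriod: 2, sumOfDailyUniques: 3 }
```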

**Range totals** (summary cards): either sum the returned daily rows, or run a separate aggregate query with the same `WHERE` clause and `count()` / `uniq(session_id)` over the whole window. For impressions the two approaches give the same number. For unique viewers they do not: a session seen on three different days counts once in a window-wide `uniq(session_id)` but three times in a sum of daily uniques. Use **one window-wide `uniq(session_id)`** for “unique viewers in period”. Use the **sum of daily uniques** for a stacked “reach by day” figure. Document which one the UI shows. ClickHouse's `uniq` is itself an approximate count; `uniqExact` gives exact numbers at a higher memory cost.

**Recommendation for v1 summary:** report **unique viewers in period** as a single `uniq(session_id)` over the whole date range (no per-day grouping), and **impressions in period** as `count()` with the same filter.

---

## Ingestion

### HTTP API

- **Route:** `POST /api/analytics/view` (or `pageview` if you prefer naming parity with #114).
- **Body (JSON):** `{ "resource_type": "ARTICLE", "resource_id": "<uuid>", "session_id": "<string>" }`.
- **Behavior:** validate the allowlist and UUID, insert one row, and return **204** quickly. To keep ClickHouse latency off the client, either use a fire-and-forget pattern or enable ClickHouse asynchronous inserts without waiting for the flush (`async_insert: 1`, `wait_for_async_insert: 0`). With the second option, insert errors no longer reach the route.
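
A minimal sketch of the validation and response logic, assuming the request body has already been parsed as JSON. It uses plain TypeScript rather than Zod; the suggested `analytics.input.ts` would likely hold the Zod equivalent. The `insertView` function is injected. The names and the 128-character session cap are hypothetical.

```typescript
// Hypothetical ingest logic for POST /api/analytics/view (plain TypeScript, no Zod).
export const ALLOWED_RESOURCE_TYPES = ["ARTICLE", "GIST"] as const;
export type ResourceType = (typeof ALLOWED_RESOURCE_TYPES)[number];

export interface ViewEventInput {
  resource_type: ResourceType;
  resource_id: string;
  session_id: string;
}

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const MAX_SESSION_ID_LENGTH = 128; // assumed cap that keeps junk strings out of the table

// Returns a validated event, or null for anything outside the allowlist or expected format.
export function parseViewEvent(body: unknown): ViewEventInput | null {
  if (typeof body !== "object" || body === null) return null;
  const { resource_type, resource_id, session_id } = body as Record<string, unknown>;
  if (!(ALLOWED_RESOURCE_TYPES as readonly unknown[]).includes(resource_type)) return null;
  if (typeof resource_id !== "string" || !UUID_RE.test(resource_id)) return null;
  if (typeof session_id !== "string" || session_id.length === 0) return null;
  if (session_id.length > MAX_SESSION_ID_LENGTH) return null;
  return { resource_type: resource_type as ResourceType, resource_id, session_id };
}

// Returns the status code for the route. The insert is started but not awaited.
export async function handleViewEvent(
  body: unknown,
  insertView: (event: ViewEventInput) => Promise<void>,
): Promise<number> {
  const event = parseViewEvent(body);
  if (event === null) return 400;
  insertView(event).catch((err) => console.error("analytics insert failed", err));
  return 204;
}
```

In `route.ts`, `POST` would call `handleViewEvent(await req.json(), insert)`, treating a JSON parse failure as 400, and return `new Response(null, { status })`. On serverless runtimes, an un-awaited insert may be cut short once the response is sent. There, awaiting the insert with ClickHouse async inserts enabled may be the safer variant.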

### Client

- **Fire-and-forget** `fetch` on the **public** resource page (article read, gist read), sent from a small client component mounted once per page.
- **Session id:** an anonymous, stable id stored in a long-lived cookie, or in `localStorage` with a fallback when storage is unavailable. Logged-in users still send `session_id` so deduplication works the same way for them. The route may optionally fill `viewer_id` from the server-side session, or that enrichment can wait for a later iteration.
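
A sketch of the session-id logic behind the client tracker. Storage and id generation are injected so it runs outside the browser; a cookie-backed adapter could implement the same `KeyValueStore` interface. The storage key, the helper names, and the fallback to a per-load id when storage throws are assumptions, not settled design.

```typescript
// Hypothetical client-side helpers for the view tracker.
// Storage and id generation are injected so the logic also runs outside a browser.
export interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const SESSION_KEY = "td_analytics_session"; // assumed storage key

// Reuses a stored id when possible. Storage failures fall back to a fresh per-load id.
export function getOrCreateSessionId(store: KeyValueStore | null, makeId: () => string): string {
  let existing: string | null = null;
  try {
    existing = store ? store.getItem(SESSION_KEY) : null;
  } catch {
    existing = null; // e.g. storage blocked by browser privacy settings
  }
  if (existing) return existing;

  const id = makeId();
  try {
    store?.setItem(SESSION_KEY, id);
  } catch {
    // Not persisted: the next page load will count as a new session.
  }
  return id;
}

// JSON body for POST /api/analytics/view, matching the documented request shape.
export function buildViewPayload(resourceType: string, resourceId: string, sessionId: string): string {
  return JSON.stringify({ resource_type: resourceType, resource_id: resourceId, session_id: sessionId });
}
```

In a tracker component, a `useEffect` could call `getOrCreateSessionId(window.localStorage, () => crypto.randomUUID())` and then send the payload without awaiting it, for example with `fetch("/api/analytics/view", { method: "POST", headers: { "Content-Type": "application/json" }, body, keepalive: true })`. `keepalive` lets the request complete even if the reader navigates away immediately.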

### Allowlist

Reject unknown `resource_type` values at the API to avoid garbage dimensions. Start with: `ARTICLE`, `GIST`. Expand deliberately (e.g. `SERIES`) when there is a clear owner and a public URL.

### Abuse / noise

- Rate limit by IP at the edge or in the route (follow existing API patterns).
- Optional: drop obvious bots via `User-Agent` heuristics later.
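
The plan defers to existing API patterns for rate limiting. Purely to illustrate the idea, here is a hypothetical fixed-window limiter keyed by IP. It keeps state in process memory, so it only limits per server instance, and it never evicts old keys. Production would need a shared store or the edge provider's limiter.

```typescript
// Hypothetical fixed-window rate limiter. State lives in this process only.
export class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // True if `key` (e.g. the client IP) may make another request at time `now`.
  allow(key: string, now: number = Date.now()): boolean {
    const current = this.windows.get(key);
    if (current === undefined || now - current.start >= this.windowMs) {
      // First request, or the previous window expired: start a new window.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```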

---

## Read path (server)

### Server action (or cached query helper)

- **Name:** e.g. `getResourceAnalytics({ resource_type, resource_id, rangeDays })`.
- **Steps:**
  1. Resolve the current user (`authID()`).
  2. **Assert ownership** of `(resource_type, resource_id)` via a small internal map:
     - `ARTICLE` → `author_id` matches the user.
     - `GIST` → the gist's owner matches the user.
     - (Future types → same pattern.)
  3. Run the ClickHouse aggregation query.
  4. Optionally merge **reaction** / **bookmark** counts from the SQLKit repositories for that resource.
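
The ownership step could be a single function with one lookup per resource type; the code layout section below names this module `assertResourceAnalyticsAccess`. In this sketch the owner lookups are injected. In the app they would read `author_id` or the gist owner through the existing SQLKit repositories. All names are hypothetical. Typing the map as `Record<ResourceType, OwnerLookup>` means that adding a resource type without an ownership rule fails to compile.

```typescript
// Hypothetical sketch: one owner lookup per resource type, checked in one place.
export type ResourceType = "ARTICLE" | "GIST";

// Resolves the owning user id, or null when the resource does not exist.
export type OwnerLookup = (resourceId: string) => Promise<string | null>;

export class AnalyticsAccessError extends Error {}

export async function assertResourceAnalyticsAccess(
  ownerLookups: Record<ResourceType, OwnerLookup>,
  userId: string | null,
  resourceType: ResourceType,
  resourceId: string,
): Promise<void> {
  if (userId === null) throw new AnalyticsAccessError("Not signed in");
  const ownerId = await ownerLookups[resourceType](resourceId);
  // Missing and foreign resources fail the same way, so ids cannot be probed.
  if (ownerId !== userId) throw new AnalyticsAccessError("Resource not found");
}
```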

Return a typed payload, e.g. `{ series: { date, impressions, unique_viewers }[], totals: { impressions, unique_viewers }, reactions: number, bookmarks: number }` (the shape can evolve). The chart can default to **unique viewers**, with impressions as a secondary series or toggle.
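
Two practical details arise when building `series` from the daily query. First, ClickHouse's JSON output formats quote 64-bit integers, such as the `UInt64` results of `count()` and `uniq()`, as strings by default. Second, `GROUP BY date` returns no row for days without views, so a chart would silently skip them. A hypothetical `toSeries` helper could handle both. It assumes the app passes the window's first and last day, computed in the same time zone ClickHouse uses for `toDate`.

```typescript
// Row shape as returned by JSONEachRow. UInt64 aggregates arrive as strings by default.
interface RawDailyRow {
  date: string; // "YYYY-MM-DD"
  impressions: string | number;
  unique_viewers: string | number;
}

export interface DailyPoint {
  date: string;
  impressions: number;
  unique_viewers: number;
}

// Adds `days` to a YYYY-MM-DD string using UTC arithmetic, so DST shifts cannot skip a day.
function addDays(isoDate: string, days: number): string {
  const d = new Date(`${isoDate}T00:00:00Z`);
  d.setUTCDate(d.getUTCDate() + days);
  return d.toISOString().slice(0, 10);
}

// Coerces counts to numbers and emits one point per day in [startDate, endDate], zero-filling gaps.
export function toSeries(rows: RawDailyRow[], startDate: string, endDate: string): DailyPoint[] {
  const byDate = new Map(rows.map((r) => [r.date, r] as const));
  const series: DailyPoint[] = [];
  // ISO dates compare correctly as strings.
  for (let date = startDate; date <= endDate; date = addDays(date, 1)) {
    const row = byDate.get(date);
    series.push({
      date,
      impressions: row ? Number(row.impressions) : 0,
      unique_viewers: row ? Number(row.unique_viewers) : 0,
    });
  }
  return series;
}
```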

**Caching:** this is read-only but user-specific data, so **do not** use `'use cache'` on functions that call `cookies()` or read the session. Per-request fetching or a short-lived client cache (TanStack Query) is appropriate.

---

## Code layout (suggested)

| Area | Path / artifact |
| --------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| ClickHouse client singleton | `src/backend/persistence/clickhouse.client.ts` |
| Analytics server actions | `src/backend/services/analytics.actions.ts` |
| Zod input | `src/backend/services/inputs/analytics.input.ts` |
| Ingest route | `src/app/api/analytics/view/route.ts` |
| Client tracker | e.g. `src/components/analytics/ResourceViewTracker.tsx` |
| Chart | e.g. `src/components/analytics/ResourceAnalyticsChart.tsx` (Recharts or existing chart lib) |
| Dashboard page | e.g. `src/app/(dashboard-editor)/dashboard/analytics/[resourceType]/[resourceId]/page.tsx` or nested under article/gist edit flows |

Keep **ownership checks** in one module (e.g. `assertResourceAnalyticsAccess`) so new resource types only add a branch there.

---

## Environment variables

Add to the server env schema (e.g. `@t3-oss/env-nextjs`):

- `CLICKHOUSE_HOST`
- `CLICKHOUSE_DATABASE`
- `CLICKHOUSE_USERNAME`
- `CLICKHOUSE_PASSWORD`

Optional: `CLICKHOUSE_URL`, if the provider supplies a single connection string.
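
A sketch of reading these variables into one optional config object, so callers can branch on "configured or not" instead of checking each variable. The helper name is an assumption, as are the precedence of `CLICKHOUSE_URL` over `CLICKHOUSE_HOST` and treating an empty password as valid. The resulting fields line up with the `url`, `database`, `username`, and `password` options that recent versions of `@clickhouse/client`'s `createClient` accept.

```typescript
// Hypothetical helper: turns the env vars above into an optional ClickHouse config.
// Returns null when anything required is missing, so callers can degrade gracefully.
export interface ClickHouseConfig {
  url: string;
  database: string;
  username: string;
  password: string;
}

type Env = Record<string, string | undefined>;

export function readClickHouseConfig(env: Env): ClickHouseConfig | null {
  // Assumption: CLICKHOUSE_URL, when set, takes precedence over CLICKHOUSE_HOST.
  const url = env.CLICKHOUSE_URL || env.CLICKHOUSE_HOST;
  const database = env.CLICKHOUSE_DATABASE;
  const username = env.CLICKHOUSE_USERNAME;
  // An empty password is allowed (common for a local default user); only a missing one is not.
  const password = env.CLICKHOUSE_PASSWORD;
  if (!url || !database || !username || password === undefined) return null;
  return { url, database, username, password };
}
```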

When the ClickHouse variables are missing, the ingest route should behave consistently (always a no-op, or always **503**). The dashboard should show a clear “analytics unavailable” state, so local development without ClickHouse still runs.

---

## Dependencies

- `@clickhouse/client`: the official client for inserts and queries.

---

## Rollout order

1. Provision ClickHouse (Cloud or Docker) and create `resource_views`.
2. Add env vars + `clickhouse.client.ts` + connection health check (optional).
3. Implement `POST /api/analytics/view` with allowlist validation.
4. Add `ResourceViewTracker` to **published article** page (and optionally gist page).
5. Implement `getResourceAnalytics` + ownership assertions.
6. Build dashboard UI: summary cards (**impressions** + **unique viewers**) + range toggle + chart (unique viewers primary; impressions optional second series or toggle).
7. Wire entry points from article editor / dashboard nav.
8. Production monitoring: insert errors, query latency, row growth.

---

## Relationship to PostgreSQL

- **Authoritative for identity and engagement:** users, articles, gists, reactions, bookmarks remain in Postgres.
- **Analytics store:** append-only view events; cheap `count()` and `uniq(session_id)` per day.
- **Issue #114 acceptance criteria** map as follows:
  - View count / time series → ClickHouse: **unique viewers** time series (`uniq(session_id)` by day); **impressions** (`count()` by day) from the same rows.
  - Reaction and bookmark counts → existing repositories by `resource_type` / `resource_id`.
  - “Estimated reads” → future enhancement (scroll depth, time-on-page); not required for v1.

---

## Open decisions

- **Public view counter:** whether to show aggregate views on the article page (requires either a cached aggregate job or a separate materialized summary).
- **GDPR / retention:** retention window for `resource_views` and the deletion story when a user or resource is removed (ClickHouse TTL or periodic cleanup).
- **Cross-resource dashboard:** one page listing “all my content” with sparklines; a nice follow-up after single-resource insights ship.
