
stani-chirk/logwick


Logwick

License: AGPL-3.0 · Node.js 20+

Server log analytics for AI bot detection, AI agent monitoring, and AI visibility — including the traffic your JavaScript analytics never shows.

Most analytics tools count pageviews. Logwick classifies who made the request: real users, known crawlers, AI training scrapers, AI user-fetch agents, archive bots, and suspicious automated traffic — straight from your edge logs, on your own machine, with no data leaving your infrastructure. No GA, Plausible, or client-side snippet required — if you can get JSONL from your CDN or origin, you get the full picture.

Most AI crawlers and fetch agents don't run JavaScript, so they never show up in GA or Plausible. Raw HTTP logs are where they do show up, and that is what Logwick reads.

Dashboard UI

Run npm run dashboard-api, then open http://127.0.0.1:8787/ (static UI + read-only /api/*, 127.0.0.1 only — no data leaves your machine). See docs/dashboard.md.

Screenshot: Logwick dashboard with summary, timeseries, and traffic taxonomy.

Screenshot: session drill-down with the flow Sankey and the session list.


Why Logwick

You're flying blind on ~40% of your traffic. GA and similar tools see only browser sessions. Bots, AI crawlers, and automated clients skip JavaScript entirely — they're invisible in your dashboard but very real in your server logs.

Your logs already have the answer. Logwick takes the JSONL your CDN or edge produces, classifies every request against a multi-phase ruleset (UA patterns, path heuristics, behavioral signals), and rolls the results up into sessions you can explore. Everything comes from server-side logs, not a tracking tag on the page.
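To make "multi-phase ruleset" concrete, here is a minimal JavaScript sketch. It is not Logwick's classifier: the rule lists, the labels, the `classify` function, and the 120-requests-per-minute threshold are all hypothetical. It only shows how UA patterns, path heuristics, and a behavioral signal can be checked in order, with the first phase that matches deciding the label.

```javascript
// Hypothetical multi-phase classifier. Rule lists, labels, and the rate
// threshold are illustrative assumptions, not Logwick's generated ruleset.
const UA_RULES = [
  { pattern: /GPTBot|ClaudeBot|CCBot/i, label: "ai_training" },
  { pattern: /ChatGPT-User|Perplexity-User/i, label: "ai_user_fetch" },
  { pattern: /Googlebot|bingbot/i, label: "search_crawler" },
];

// Paths that ordinary visitors to most sites never request.
const PROBE_PATHS = [/^\/wp-login\.php/, /^\/\.env/, /^\/\.git\//];

const MAX_HUMAN_RPM = 120; // hypothetical requests-per-minute ceiling

function classify(request, requestsPerMinute) {
  // Phase 1: self-declared User-Agent patterns.
  for (const rule of UA_RULES) {
    if (rule.pattern.test(request.ua ?? "")) return rule.label;
  }
  // Phase 2: path heuristics for probe-like requests.
  if (PROBE_PATHS.some((re) => re.test(request.path))) return "security_probe";
  // Phase 3: a behavioral signal, here only the client's request rate.
  if (requestsPerMinute > MAX_HUMAN_RPM) return "suspicious_automation";
  return "likely_human";
}
```

Checking the User-Agent first lets a self-declared crawler be labeled before weaker path or rate signals are consulted. Logwick's own rules are generated from a YAML registry (see the classification docs), so treat this only as an illustration of the layering.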

Stays on your machine. SQLite on disk, dashboard on localhost. No SaaS account, no data upload, no vendor lock-in. You control retention. Logwick never phones home.

AI crawler analytics & visibility

Logwick is server-side analytics built for AI bot traffic monitoring, AI crawler detection, and AI agent monitoring from your own server logs — cookieless, self-hosted, no tracking tag.

AI visibility (from logs, not rank trackers). Many “AI visibility” products measure whether your brand appears in ChatGPT or Perplexity answers. Logwick answers the upstream question: which AI assistants and crawlers actually fetch your pages, and which URLs they hit — GPTBot, ClaudeBot, ChatGPT-User, Perplexity, Gemini Deep Research, and dozens more. That is the server-side signal behind visibility: who is reading your site so an AI can answer someone else.

Logwick splits AI traffic into three intents (not one generic “bot” bucket):

| AI traffic type | Examples | What you learn |
|---|---|---|
| Training corpus | GPTBot, ClaudeBot, CCBot | Who is scraping pages for model training |
| User-fetch | ChatGPT-User, Perplexity, Gemini Deep Research | Real-time fetches when a user asks an AI: the strongest AI visibility signal in raw HTTP logs |
| Search index | OAI-SearchBot, PerplexityBot (on robots/sitemap) | AI search indexers probing your site |
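A rough sketch of how the three intents in this table can be derived from User-Agent tokens. The token list covers only agents whose UA tokens are well known (GPTBot, ClaudeBot, CCBot, ChatGPT-User, Perplexity-User, OAI-SearchBot, PerplexityBot). It is an illustrative assumption rather than Logwick's registry, and it leaves out agents such as Gemini Deep Research. The `aiIntent` helper is hypothetical.

```javascript
// Hypothetical UA-token registry mapping AI agents to the three intents.
// Illustrative only: not Logwick's registry, and not exhaustive.
const AI_AGENTS = [
  { token: "GPTBot", vendor: "OpenAI", intent: "training" },
  { token: "ClaudeBot", vendor: "Anthropic", intent: "training" },
  { token: "CCBot", vendor: "Common Crawl", intent: "training" },
  { token: "ChatGPT-User", vendor: "OpenAI", intent: "user_fetch" },
  { token: "Perplexity-User", vendor: "Perplexity", intent: "user_fetch" },
  { token: "OAI-SearchBot", vendor: "OpenAI", intent: "search_index" },
  { token: "PerplexityBot", vendor: "Perplexity", intent: "search_index" },
];

// Returns { vendor, intent } for the first token found in the UA, or null.
function aiIntent(userAgent) {
  const ua = (userAgent ?? "").toLowerCase();
  const hit = AI_AGENTS.find((agent) => ua.includes(agent.token.toLowerCase()));
  return hit ? { vendor: hit.vendor, intent: hit.intent } : null;
}
```

Substring matching is cheap, but it trusts the client's self-description; a UA string alone does not prove which vendor sent the request.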

You also get GPTBot / ChatGPT bot tracking at vendor level, bot detection from server logs (“who is crawling my website?”), and Sankey flows that map each AI vendor to the exact paths it fetched.
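Under the hood, a vendor-to-path Sankey reduces to counting (source, target) pairs. A minimal sketch, assuming each request already carries a `family` label (for example from a UA classifier) and a `path`; the `sankeyEdges` function and the record shape are hypothetical. The same aggregation works for link-preview bots mapped to the URLs people share.

```javascript
// Hypothetical aggregation of classified requests into weighted Sankey edges.
// Input records are assumed to look like { family, path }.
function sankeyEdges(records) {
  const byFamily = new Map(); // family -> Map(path -> count)
  for (const { family, path } of records) {
    if (!family) continue; // skip requests with no bot/agent family
    const paths = byFamily.get(family) ?? new Map();
    paths.set(path, (paths.get(path) ?? 0) + 1);
    byFamily.set(family, paths);
  }
  const edges = [];
  for (const [source, paths] of byFamily) {
    for (const [target, value] of paths) edges.push({ source, target, value });
  }
  // Heaviest flows first, which is usually what a chart wants to draw.
  return edges.sort((a, b) => b.value - a.value);
}
```

Nested maps avoid building string keys, so paths containing any character are counted correctly.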


What you get

| Capability | Details |
|---|---|
| AI & bot classification | AI crawler detection and AI scraper identification (GPTBot, ClaudeBot, Common Crawl…), plus search crawlers, link-preview bots, archive scrapers, and security probes, split by training / user-fetch / search index, with explainable rules (classification, traffic taxonomy) |
| Session rollups | Group requests into sessions with idle-timeout logic; engagement-style signals from HTTP behavior (methods, paths, timing, response sizes) |
| Social sharing analysis | See when your pages are shared on social and messaging platforms: link-preview bots (Facebook, LinkedIn, Slack, Telegram, Discord, X/Twitter…) mapped to the exact URLs they fetch, in a dedicated sharing flow (traffic taxonomy, dashboard) |
| Local-first storage | Single SQLite file, idempotent ingestion; all processing stays on your machine |
| Explore & drill down | Read-only JSON API over SQLite (summary, timeseries, breakdowns, sessions) on 127.0.0.1, no auth token needed (dashboard API) |
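Session rollups with an idle timeout can be sketched as follows. This is a hypothetical illustration, not Logwick's sessionizer: it keys clients by IP plus User-Agent, assumes a 30-minute idle timeout, and assumes records shaped like `{ ip, ua, ts, path, bytes }` with `ts` in milliseconds. The fields returned at the end are simple engagement-style signals built only from HTTP facts.

```javascript
// Hypothetical idle-timeout sessionizer. Client key, default timeout, and
// record shape are assumptions, not Logwick's schema.
function sessionize(requests, idleMs = 30 * 60 * 1000) {
  const sorted = [...requests].sort((a, b) => a.ts - b.ts);
  const current = new Map(); // client key -> its currently open session
  const sessions = [];
  for (const r of sorted) {
    const key = `${r.ip}|${r.ua}`;
    let s = current.get(key);
    // Open a new session for an unseen client or after a gap longer than idleMs.
    if (!s || r.ts - s.end > idleMs) {
      s = { key, start: r.ts, end: r.ts, requests: 0, bytes: 0, paths: new Set() };
      current.set(key, s);
      sessions.push(s);
    }
    s.end = r.ts;
    s.requests += 1;
    s.bytes += r.bytes ?? 0;
    s.paths.add(r.path);
  }
  // Engagement-style signals derived from HTTP behavior only.
  return sessions.map((s) => ({
    key: s.key,
    start: s.start,
    durationMs: s.end - s.start,
    requests: s.requests,
    distinctPaths: s.paths.size,
    bytes: s.bytes,
  }));
}
```

Sorting first means out-of-order log lines still land in the right session; the trade-off is holding the whole batch in memory, which a streaming implementation would avoid.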

Who needs this

| Who | What you can answer |
|---|---|
| SEO / content teams | AI visibility and crawl mix over time: which AI assistants fetch your content, plus search bots, link previews, and archives |
| Infra & SRE | Suspicious paths, probe-like traffic, and automated clients alongside “normal” requests, without shipping logs to a vendor |
| Indie hackers & small sites | Know exactly who's on your site without GA, Plausible, or any client-side snippet |
| SaaS & API products | Patterns that look like scraping or bulk automated use, grounded in HTTP facts rather than pageviews |

How Logwick compares

Logwick is server-side analytics that sits between privacy-first web analytics and AI / bot traffic dashboards: it reads your raw server / edge logs instead of a client-side tag — so it catches the automated traffic those tools never see, and keeps everything on your own machine.

| Tool | Data source | Catches AI agents & bots that skip JS | Self-hosted / local | Open source |
|---|---|---|---|---|
| Logwick | Server / edge JSONL logs | ✅ UA + path + behavior taxonomy | ✅ SQLite + localhost | ✅ AGPL-3.0 |
| Cloudflare AI Crawl Control | Cloudflare edge | ✅ AI crawlers | ❌ SaaS, Cloudflare-only | |
| Dark Visitors | Tracking + agent list | ✅ AI agents | ❌ SaaS | |
| GoAccess | Server logs | ⚠️ generic bots, no AI taxonomy | | |
| GA4 / Plausible / Umami / Matomo (JS) | Client-side JS tag | ❌ bots & AI don't run JS | varies | varies |

If you already run Cloudflare AI Crawl Control or a JavaScript analytics tool, Logwick complements them: a local, vendor-neutral view of the same traffic, rebuilt from your own logs and never leaving your infrastructure.

Frequently asked

  • Is Logwick an open-source / self-hosted alternative to Cloudflare AI Crawl Control? Yes. It provides AI bot traffic monitoring and crawler analytics from any edge or CDN that can emit JSONL (not only Cloudflare), runs locally, and is licensed under AGPL-3.0.
  • Does Logwick measure “AI visibility” like GEO rank trackers? Not LLM answer rankings. Logwick shows which AI agents and crawlers fetch your URLs from server logs — the upstream traffic signal (training crawl vs user-fetch vs search index), including which pages ChatGPT-User or Perplexity hit in real time.
  • How do I track GPTBot or ChatGPT on my site? Ingest your edge JSONL with npm run process, then open the dashboard. GPTBot, ChatGPT-User, and other vendors are classified by family, with per-path breakdowns and AI user-fetch Sankey flows.
  • How is it different from GoAccess or other log analyzers? Logwick adds a multi-phase traffic taxonomy (humans, search crawlers, AI training/user-fetch, link previews, security probes) and rolls requests into sessions, instead of only counting hits.
  • Why not just use Google Analytics, Plausible, or Umami? Those rely on a JavaScript tag, and AI agents and most bots don't execute JavaScript — they're invisible there. Logwick reads server logs, so it sees them.
  • Does any data leave my machine? No. Processing is local, storage is a single SQLite file, and the dashboard binds to 127.0.0.1.

Quick start

npm install   # from the repository root after clone
npm run process -- --config config/process.example.json --target-id demo --db data/analytics/http-analytics.db --input path/to/logs.jsonl
npm run dashboard-api -- --db data/analytics/http-analytics.db --port 8787
# browser: http://127.0.0.1:8787/
curl -s "http://127.0.0.1:8787/api/summary"

You can set LOGWICK_ANALYTICS_DB to the .db path and omit --db when starting the dashboard API (see Environment variables). Full setup: docs/getting-started.md.

Documentation lives in docs/ (architecture, CLI, dashboard API, classification, persistence).


How it works

Drop your logs in — get a dashboard of humans vs bots in seconds.

Your edge (Nginx, Caddy, Cloudflare, Fastly — anything that writes JSONL) is already recording every request. Logwick takes that file, classifies each entry, sessionizes the traffic, and stores the result in a local SQLite database. A read-only HTTP API on 127.0.0.1 serves a static browser dashboard and JSON metrics for exploration.

  Your edge / CDN                  Logwick pipeline
 ┌─────────────┐                 ┌──────────────────────────────────┐
 │  Access log │   JSONL on disk │  classify → sessionize →         │
 │   (JSONL)   │ ──────────────► │  SQLite  →  dashboard API        │
 └─────────────┘                 │             (127.0.0.1:8787)     │
                                 └──────────────────────────────────┘
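Idempotent ingestion means re-processing the same file must not double-count requests. A minimal sketch of that idea for JSONL input, assuming records carry `ts`, `ip`, `method`, `path`, and `status` fields (your edge's field names will differ). The `ingestJsonl` function is hypothetical and keeps seen keys in memory; a SQLite-backed pipeline would more typically rely on a UNIQUE constraint plus `INSERT OR IGNORE`.

```javascript
// Hypothetical idempotent JSONL ingestion. Field names used in the key are
// assumptions about your log format, not Logwick's schema.
function ingestJsonl(text, seen = new Set()) {
  const accepted = [];
  let malformed = 0;
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue; // tolerate blank lines
    let rec;
    try {
      rec = JSON.parse(line);
    } catch {
      malformed += 1; // count a bad line instead of aborting the run
      continue;
    }
    if (rec === null || typeof rec !== "object") {
      malformed += 1; // valid JSON, but not a log record
      continue;
    }
    // Natural key for a request; records with identical keys count once.
    const key = [rec.ts, rec.ip, rec.method, rec.path, rec.status].join("|");
    if (seen.has(key)) continue;
    seen.add(key);
    accepted.push(rec);
  }
  return { accepted, malformed, seen };
}
```

A key built only from these fields collapses two genuinely distinct requests that happen to share them; including a request ID, if your edge logs one, avoids that.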

Log fetch and shipping from your edge are not in this repo — you add ingest when you need it. That keeps the project small and deployment-agnostic.
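Because ingest is left to you, a small adapter is often all the glue you need. As one hypothetical example, this sketch flattens Caddy v2 JSON access-log entries (assuming the default `time_format`, where `ts` is Unix seconds as a float, and header values are arrays) into a simpler record. The output field names are assumptions, not Logwick's input schema; check docs/getting-started.md for what your process config expects.

```javascript
// Hypothetical adapter from Caddy v2 JSON access logs to a flat record.
// Output field names are assumptions, not Logwick's input schema.
function fromCaddy(entry) {
  const req = entry.request ?? {};
  const headers = req.headers ?? {};
  const uri = req.uri ?? "/";
  return {
    // Caddy's default JSON encoder writes ts as Unix seconds (float).
    ts: typeof entry.ts === "number" ? Math.round(entry.ts * 1000) : null,
    // Newer Caddy versions add client_ip; fall back to remote_ip.
    ip: req.client_ip ?? req.remote_ip ?? null,
    method: req.method ?? null,
    host: req.host ?? null,
    path: uri.split("?")[0], // drop the query string
    ua: (headers["User-Agent"] ?? [])[0] ?? null, // header values are arrays
    status: entry.status ?? null,
    bytes: entry.size ?? 0,
  };
}

// Convert a Caddy log (one JSON object per line) into normalized JSONL.
function caddyToJsonl(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line))
    .filter((entry) => entry && typeof entry.request === "object") // access entries only
    .map((entry) => JSON.stringify(fromCaddy(entry)))
    .join("\n");
}
```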

JSONL in → classify & sessionize → SQLite → explore locally.


Tech stack

| Area | In short |
|---|---|
| Runtime & repo | Node.js 20+, npm workspaces (`apps/*`, `packages/*`), ES modules |
| Pipeline & data | JSONL → CLI process → SQLite (better-sqlite3) |
| Dashboard | Read-only JSON API on Node http plus static UI at http://127.0.0.1:8787/ |
| Rules & quality | JSON Schema / Ajv, YAML registry → generated rules, ESLint 9, node --test |

Per-package dependencies and tooling: docs/tech-stack.md.


Changelog

CHANGELOG.md — release history (current: 1.1.0).


Privacy

HTTP logs often include IP addresses and User-Agents (personal data in many jurisdictions). Logwick never phones home — all processing happens locally, and retention and access control are yours to define. Keep secrets in env and local config — not in git.


Maintainers

See MAINTAINERS.md for contact, commercial inquiries, and the @http-logs/* package naming note.

Workspace packages are published under the npm scope @http-logs/*; the product name is Logwick.


License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). The full text is in LICENSE. Root package.json declares license: AGPL-3.0-only (SPDX).

For closed-source use, or any use that would not comply with the AGPL's terms, a commercial license is required. If you use Logwick inside a company or in a product without releasing your modifications under the AGPL, contact hi@r-sun.ai (Raising Sun s.r.o.) for a commercial license.

Paid options

| Offering | What you get |
|---|---|
| Commercial license | Use without AGPL obligations inside your company or product |
| Extended detection signatures | Broader bot/AI patterns and heuristics, updated on a cadence |
| Integration & consulting | Pipeline design, customization, training |
| Custom support | Priority support, bug fixes, feature requests |

Third-party npm dependencies remain under their own licenses; they are not automatically AGPL. Audit node_modules or use a license tool before you ship a product build.


Contributing & security

  • CONTRIBUTING.md — workspaces, lint boundaries, how to contribute.
  • SECURITY.md — responsible disclosure (do not file public issues for sensitive reports).

About

Self-hosted server log analytics for AI bot detection, AI agent monitoring, and AI visibility. Classifies GPTBot, ClaudeBot, ChatGPT-User, Perplexity & more from edge JSONL — no JS tag, no SaaS. Open-source alternative to Cloudflare AI Crawl Control.
