Relevance Engine — Backend

High-throughput web relevancy scoring service built in Rust. Crawls URLs at scale, extracts structured metadata, and delegates AI-powered relevancy scoring via the Anthropic Claude API. Designed for sustained operation at 10K+ URLs/day with minimal resource overhead.

Architecture

┌──────────────────────────────────────────────────────┐
│                     Axum HTTP API                    │
│  /campaigns  /urls  /config  /jobs  /stats  /schema  │
└────────┬───────────────┬───────────────┬─────────────┘
         │               │               │
    ┌────▼────┐    ┌─────▼─────┐   ┌─────▼─────┐
    │ Crawler │    │  Scorer   │   │   Queue   │
    │ Service │    │ (Claude)  │   │  Worker   │
    └────┬────┘    └─────┬─────┘   └─────┬─────┘
         │               │               │
         └───────────────▼───────────────┘
                   PostgreSQL

The backend exposes a REST API consumed by the admin portal. Long-running operations (crawling, scoring) are dispatched through a job queue backed by PostgreSQL, allowing the API to return immediately while work proceeds asynchronously.
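The job lifecycle behind that queue can be sketched roughly as follows. This is an illustrative model only: the `Job` and `JobStatus` names mirror the descriptions in this README (status, retry tracking, error capture on the `jobs` table), not the actual source.

```rust
// Hypothetical sketch of the job lifecycle the queue worker drives.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JobStatus {
    Pending,
    Running,
    Completed,
    Failed,
    Cancelled,
}

struct Job {
    status: JobStatus,
    attempts: u32,
    max_attempts: u32,
}

impl Job {
    /// A failed attempt either re-queues the job for retry or marks it
    /// failed, matching the retry tracking described for the jobs table.
    fn record_failure(&mut self) {
        self.attempts += 1;
        self.status = if self.attempts >= self.max_attempts {
            JobStatus::Failed
        } else {
            JobStatus::Pending
        };
    }
}

fn main() {
    let mut job = Job { status: JobStatus::Running, attempts: 0, max_attempts: 3 };
    job.record_failure();
    assert_eq!(job.status, JobStatus::Pending); // re-queued, retries remain
}
```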

Tech Stack

| Layer | Choice | Rationale |
| --- | --- | --- |
| Language | Rust 1.93+ | Memory safety, zero-cost abstractions, predictable latency |
| Framework | Axum 0.8 | Tower-based, async-native, composable middleware |
| Database | PostgreSQL via SQLx 0.8 | Compile-time query checking, async, migrations |
| HTTP Client | Reqwest 0.12 | Connection pooling, gzip/brotli, redirect handling |
| HTML Parsing | Scraper 0.22 | CSS selector-based extraction, built on html5ever |
| Observability | tracing + tracing-subscriber | Structured logging, env-based filtering |

Project Structure

src/
├── main.rs              # Entrypoint: pool init, migrations, router composition
├── api/
│   ├── mod.rs           # Route tree
│   ├── campaigns.rs     # CRUD + start + results
│   ├── urls.rs          # Bulk import, list, delete, CSV export
│   ├── config.rs        # Runtime config CRUD (stored in DB)
│   ├── jobs.rs          # Queue inspection and cancellation
│   └── crawl.rs         # Manual crawl trigger
├── models/
│   └── mod.rs           # Domain types: UrlMetadata, Campaign, RelevancyCheck, Job, SystemConfig
├── crawler/
│   └── mod.rs           # URL fetcher + HTML metadata extractor
├── scorer/
│   └── mod.rs           # Claude API integration (batched scoring)
├── queue/
│   └── mod.rs           # Background job worker (poll + dispatch)
├── db/
│   └── mod.rs           # Database utilities
└── config/
    └── mod.rs           # Typed application configuration

migrations/
└── 20260221_initial.sql # Schema + indexes + seed config

API Reference

All endpoints return JSON. Base path: /api.

Health

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Returns "ok" |

Campaigns

| Method | Path | Description |
| --- | --- | --- |
| GET | /campaigns | List campaigns (paginated: ?page=&per_page=) |
| POST | /campaigns | Create campaign { name, reference_url?, urls[] } |
| GET | /campaigns/:id | Get campaign detail |
| PUT | /campaigns/:id | Update campaign fields |
| DELETE | /campaigns/:id | Delete campaign and associated results |
| POST | /campaigns/:id/start | Queue campaign for processing |
| GET | /campaigns/:id/results | Paginated relevancy check results |

URLs

| Method | Path | Description |
| --- | --- | --- |
| GET | /urls | List URLs (paginated) |
| POST | /urls | Bulk import { urls: string[] }. Deduplicates on insert. |
| GET | /urls/:id | Get single URL metadata |
| DELETE | /urls/:id | Remove URL from database |
| GET | /urls/export | CSV export of all URL metadata |
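Deduplication on bulk import can be sketched like this. The URL normalization shown (trailing-slash trim, lowercasing) is an assumption for illustration; the real service likely also relies on a database UNIQUE constraint as a backstop.

```rust
use std::collections::HashSet;

// Illustrative sketch only: drop duplicate URLs from an incoming batch
// before insert, mirroring the "deduplicates on insert" behavior of
// POST /urls. The normalization rules here are assumptions.
fn dedupe_urls(urls: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    urls.into_iter()
        .filter(|u| seen.insert(u.trim_end_matches('/').to_ascii_lowercase()))
        .collect()
}

fn main() {
    let batch = vec![
        "https://example.com/a".to_string(),
        "https://example.com/a/".to_string(), // trailing-slash duplicate
        "https://example.com/b".to_string(),
    ];
    assert_eq!(dedupe_urls(batch).len(), 2);
}
```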

Configuration

| Method | Path | Description |
| --- | --- | --- |
| GET | /config | List all config entries |
| GET | /config/:key | Get single config value |
| POST | /config | Upsert { key, value, description? } |

Runtime-editable keys seeded at migration:

  • scoring_weights — { relevancy, quality, authority } point allocation
  • decision_thresholds — { approve, reject } score boundaries
  • schema_auto_reject — array of schema.org types to filter pre-AI
  • ai_model — Claude model identifier
  • ai_batch_size — URLs per API call
  • crawl_concurrency — max parallel fetches
  • crawl_timeout_secs — per-request timeout
  • stale_threshold_days — cache expiry window
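Applying decision_thresholds could look roughly like the sketch below. Only the approve/reject keys come from this README; the middle "review" band and the example threshold values are assumptions.

```rust
// Hypothetical sketch of threshold-based decisions. Scores at or above
// the approve boundary pass, scores at or below the reject boundary are
// dropped, and anything between lands in an assumed review band.
#[derive(Debug, PartialEq)]
enum Decision {
    Approve,
    Review,
    Reject,
}

fn decide(score: f64, approve: f64, reject: f64) -> Decision {
    if score >= approve {
        Decision::Approve
    } else if score <= reject {
        Decision::Reject
    } else {
        Decision::Review
    }
}

fn main() {
    // Example boundaries: approve at 75+, reject at 40 or below.
    assert_eq!(decide(80.0, 75.0, 40.0), Decision::Approve);
    assert_eq!(decide(55.0, 75.0, 40.0), Decision::Review);
    assert_eq!(decide(30.0, 75.0, 40.0), Decision::Reject);
}
```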

Jobs

| Method | Path | Description |
| --- | --- | --- |
| GET | /jobs | List all jobs (paginated) |
| GET | /jobs/:id | Get job detail |
| POST | /jobs/:id/cancel | Cancel a pending or running job |

Stats

| Method | Path | Description |
| --- | --- | --- |
| GET | /stats/dashboard | Aggregated operational metrics |
| GET | /stats/costs | Cost breakdown (daily/weekly/monthly) |

Schema Management

| Method | Path | Description |
| --- | --- | --- |
| GET | /schema/columns | List custom columns on url_metadata |
| POST | /schema/columns | Add a column to url_metadata dynamically |

Database Schema

Four core tables plus a job queue:

  • url_metadata — Crawled URL data with GIN-indexed schema_types for fast filtering
  • campaigns — Scoring runs with cost/performance tracking
  • relevancy_checks — Per-URL scores and AI reasoning, foreign-keyed to campaigns
  • system_config — JSONB key-value store for runtime configuration
  • jobs — Queue table with status, priority, retry tracking, and error capture

Indexes cover the primary query patterns: domain lookup, status filtering, score ordering, campaign association, and schema type containment.

Setup

Prerequisites

  • Rust 1.93+ (rustup update stable)
  • PostgreSQL 15+
  • An Anthropic API key (for scoring)

Database

# Creates the database parsed from DATABASE_URL in .env
# Idempotent: safe to run multiple times
./scripts/create-db.sh

# Or pass the URL directly:
DATABASE_URL=postgres://user:pass@localhost:5432/relevance_engine ./scripts/create-db.sh

Configuration

cp .env.example .env
# Edit .env:
#   DATABASE_URL=postgres://localhost:5432/relevance_engine
#   BIND_ADDR=0.0.0.0:3001
#   ANTHROPIC_API_KEY=sk-ant-...

Build and Run

cargo build --release
./target/release/relevance-engine

Migrations run automatically on startup via SQLx.

Development

cargo run
# Server starts on http://localhost:3001
# Logs controlled via RUST_LOG env var

Design Decisions

PostgreSQL as job queue. For this throughput (10K jobs/day), a dedicated broker adds operational complexity without proportional benefit. The jobs table with SELECT ... FOR UPDATE SKIP LOCKED guarantees each pending job is claimed by exactly one worker at a time; combined with the table's retry tracking, this gives reliable at-least-once processing with zero additional infrastructure.
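The claim query behind that pattern can be sketched as below. Table and column names here are assumptions for illustration; the actual schema lives in migrations/20260221_initial.sql.

```rust
// Sketch of the SKIP LOCKED claim pattern described above. Two workers
// running this concurrently can never claim the same row: the second
// worker simply skips any candidate row the first has locked.
const CLAIM_JOB: &str = "\
    UPDATE jobs
       SET status = 'running', started_at = now()
     WHERE id = (
             SELECT id FROM jobs
              WHERE status = 'pending'
              ORDER BY priority DESC, created_at
              FOR UPDATE SKIP LOCKED
              LIMIT 1
           )
    RETURNING id";

fn main() {
    assert!(CLAIM_JOB.contains("FOR UPDATE SKIP LOCKED"));
}
```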

Config in database, not files. The spec requires non-technical operators to change scoring prompts, weights, and thresholds without redeployment. JSONB columns give schema flexibility; the API layer validates on read.

Schema pre-filtering. URLs with schema types like Product or JobPosting are rejected before reaching the Claude API, targeting 35-45% cost reduction on typical datasets.
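The filter itself is a cheap set-membership check before any API call. A minimal sketch, assuming schema_auto_reject holds plain schema.org type names and the function name is illustrative:

```rust
// Minimal sketch of the pre-AI schema filter: reject a URL if any of its
// extracted schema.org types appears in the configured auto-reject list.
fn should_auto_reject(schema_types: &[&str], auto_reject: &[&str]) -> bool {
    schema_types.iter().any(|t| auto_reject.contains(t))
}

fn main() {
    let reject_list = ["Product", "JobPosting"];
    assert!(should_auto_reject(&["Article", "Product"], &reject_list));
    assert!(!should_auto_reject(&["Article", "BlogPosting"], &reject_list));
}
```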

Batch scoring. URLs are sent to Claude in configurable batches (default: 20) to amortize prompt overhead and reduce API call count.
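Batch assembly amounts to chunking the pending URL list. The default of 20 comes from this README; batch_urls is an illustrative helper name, not the actual function:

```rust
// Sketch of batch assembly for the scorer: split pending URLs into
// API-call-sized groups so one Claude request scores many URLs.
fn batch_urls(urls: &[String], batch_size: usize) -> Vec<&[String]> {
    urls.chunks(batch_size).collect()
}

fn main() {
    let urls: Vec<String> = (0..45).map(|i| format!("https://example.com/{i}")).collect();
    let batches = batch_urls(&urls, 20); // default ai_batch_size
    assert_eq!(batches.len(), 3); // 20 + 20 + 5
    assert_eq!(batches[2].len(), 5);
}
```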

License

Proprietary. All rights reserved.
