Transparent, reproducible benchmarks comparing Pyxle against popular web frameworks.
Every framework lives in its own frameworks/<name>/ directory and implements identical endpoints with the same business logic, database schema, and response format. There are no submodules and nothing fetched at runtime — clone the repo, run one script, and you get the same numbers we publish.
./bench.sh # API throughput suite (default), all 7 frameworks
./bench.sh --suite=ssr # SSR latency: Pyxle SSR vs Next.js
./bench.sh --suite=all # both suites
./bench.sh --suite=all --workers=auto # use every CPU core
bench.sh creates a local .venv, installs every dependency, builds the Pyxle and Next.js apps, starts the servers each suite needs, health-checks them, verifies endpoint parity, runs the benchmark, and tears everything down. The first run takes a few minutes (installs + builds); pass --skip-install --skip-build on later runs to reuse the prepared tree.
Same-box vs off-box. The quick-start command above runs the load generator on the same machine as the servers. That is convenient, but the generator competes with the servers for the same CPU, so it caps throughput and is not how we produce published numbers. For trustworthy results we run a multithreaded generator (oha) on a separate box. See Methodology and Reproducing the published numbers.
Measures requests/second across seven frameworks hitting identical API endpoints.
| Framework | Language | Category | Port | Server |
|---|---|---|---|---|
| Pyxle | Python | Full-stack (SSR + API) | 8001 | pyxle serve (uvicorn) |
| FastAPI | Python | API | 8002 | uvicorn |
| Django | Python | Full-stack | 8003 | uvicorn (ASGI) |
| Flask | Python | Micro | 8004 | gunicorn (gthread) |
| Express | Node.js | API | 8005 | built-in (cluster for multi-core) |
| Hono | Node.js | Ultralight API | 8006 | @hono/node-server (cluster) |
| Next.js | Node.js | Full-stack (SSR) | 3000 | next start (cluster for multi-core) |
Endpoints (identical across every framework):
| Endpoint | Method | Description |
|---|---|---|
| /api/json | GET | Return a static JSON object (pure serialization overhead) |
| /api/db | GET | Read one random row from SQLite (framework + DB) |
| /api/queries?n=5 | GET | Read 5 random rows from SQLite (query loop) |
| /api/queries?n=20 | GET | Read 20 random rows (heavier workload) |
| /api/form | POST | Parse JSON body, validate, return response |
| /health | GET | Minimal health check (raw routing overhead) |
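
To make "identical response format" concrete, the following is a minimal sketch of how two JSON responses could be compared by shape rather than by value. It is not the code in bench/verify-parity.mjs; the helper names `shapeOf` and `sameShape` and the example field names (`id`, `name`, `value`) are assumptions made for illustration.

```javascript
// Reduce a JSON value to its "shape": keys and value types, with values dropped.
function shapeOf(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    // Describe an array by its first element (assumes homogeneous arrays).
    return value.length === 0 ? [] : [shapeOf(value[0])];
  }
  if (typeof value === "object") {
    const shape = {};
    // Sort keys so that key order in the response does not affect the result.
    for (const key of Object.keys(value).sort()) {
      shape[key] = shapeOf(value[key]);
    }
    return shape;
  }
  return typeof value; // "string", "number", or "boolean"
}

// Two responses have the same shape when their reduced forms serialize identically.
function sameShape(a, b) {
  return JSON.stringify(shapeOf(a)) === JSON.stringify(shapeOf(b));
}

// Hypothetical /api/db responses from two frameworks: different rows, same shape.
const fromFrameworkA = { id: 17, name: "row-17", value: 42 };
const fromFrameworkB = { value: 9, id: 803, name: "row-803" };
console.log(sameShape(fromFrameworkA, fromFrameworkB)); // true
console.log(sameShape(fromFrameworkA, { id: "17", name: "row-17" })); // false
```

Comparing shapes instead of bytes lets endpoints that return random rows still be checked for parity.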
Measures dynamic server-side rendering — the full page re-rendered on every request — for Pyxle SSR (frameworks/pyxle-ssr/, port 8011) vs Next.js (frameworks/nextjs/, port 3000):
| Page | Path | What it stresses |
|---|---|---|
| Landing | / | Baseline SSR overhead |
| Heavy | /heavy | A 300-row data table |
| Complex | /complex | A nested dashboard |
Both apps render equivalent pages — the same loader data and the same DOM structure (verified by row/heading counts, see bench/verify-parity.mjs). They are not byte-identical: Next.js's React Server Component model serializes the rendered tree into the response as hydration data, so a Next page is ~2–3× the bytes of the equivalent Pyxle page. That difference is reported (Bytes / response), not hidden — it's a real, architectural property of each framework.
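
As an illustration of checking equivalence by row and heading counts while reporting size separately, here is a rough sketch. It is an assumption about how such a check could look, not the implementation in bench/verify-parity.mjs; the function names and the sample HTML are hypothetical.

```javascript
// Count opening tags such as <tr> or <tr class="odd"> in an HTML string.
function countTag(html, tag) {
  const pattern = new RegExp(`<${tag}(\\s[^>]*)?>`, "gi");
  return (html.match(pattern) || []).length;
}

// Summarize the structure to compare, plus the response size that is only reported.
function structureOf(html) {
  const headingTags = ["h1", "h2", "h3", "h4", "h5", "h6"];
  return {
    rows: countTag(html, "tr"),
    headings: headingTags.reduce((n, tag) => n + countTag(html, tag), 0),
    bytes: new TextEncoder().encode(html).length,
  };
}

// Pages are "equivalent" when row and heading counts match; bytes may differ.
function equivalentPages(a, b) {
  const sa = structureOf(a);
  const sb = structureOf(b);
  return sa.rows === sb.rows && sa.headings === sb.headings;
}

// Hypothetical pages: the second carries extra serialized data after the markup.
const pageA = "<h1>Heavy</h1><table><tr><td>1</td></tr><tr><td>2</td></tr></table>";
const pageB = pageA + "<script>/* serialized hydration data */</script>";
console.log(equivalentPages(pageA, pageB)); // true
console.log(structureOf(pageB).bytes > structureOf(pageA).bytes); // true
```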
All suites use a SQLite database with 1,000 deterministically-seeded rows (WAL mode), identical across frameworks.
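
Deterministic seeding means every framework reads the same 1,000 rows on every machine. The sketch below shows one way to get that property with a small seeded PRNG (mulberry32). The actual seed helper under frameworks/shared/ may work differently; the seed value, the column names, and the value range here are assumptions.

```javascript
// mulberry32: a small, fast 32-bit PRNG. The same seed yields the same sequence.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Build `count` rows from a fixed seed. Column names are hypothetical.
function seedRows(seed, count = 1000) {
  const rand = mulberry32(seed);
  const rows = [];
  for (let i = 1; i <= count; i++) {
    rows.push({ id: i, name: `item-${i}`, value: Math.floor(rand() * 10000) });
  }
  return rows;
}

// Two independent runs with the same seed produce identical data.
const firstRun = seedRows(42);
const secondRun = seedRows(42);
console.log(firstRun.length); // 1000
console.log(JSON.stringify(firstRun) === JSON.stringify(secondRun)); // true
```

Avoiding Math.random() is the point: it cannot be seeded, so two machines would otherwise benchmark different data.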
./bench.sh [options]
Topology:
(default) Same box: servers on 127.0.0.1, generator local
(autocannon). Convenient; NOT for published numbers.
--serve-only Only start + health-check servers (bound to 0.0.0.0)
and idle. Run on the SERVER box.
--generate-only Only run the generator against --target. Run on the
dedicated CLIENT box (defaults to --generator=oha).
--suite=api|ssr|all Which suite to run (default: api)
--frameworks=a,b,... (api) Only benchmark these frameworks
--host=HOST Host the servers bind to (default 127.0.0.1;
serve-only defaults to 0.0.0.0)
--target=HOST (generate-only) Server host/IP to generate load against
--generator=NAME autocannon (same-box only) or oha (off-box / published)
--duration=N Measured seconds per repetition (default 10)
--connections=N,M,... Concurrency levels (e.g. 10,50,100)
--reps=N Repetitions per cell; median reported (default 1)
--warmup-duration=N Seconds of warmup load before measuring (default 5)
--latency-rate=QPS (ssr, oha) Extra fixed-rate, coordinated-omission-
corrected latency pass at QPS
--workers=N|auto Run every framework with N worker processes (auto=nproc)
--output=FILE Save JSON results to FILE
--skip-install Don't (re)install dependencies
--skip-build Don't (re)build the Pyxle / Next.js apps
--skip-parity Don't run the pre-run parity verification
-h, --help Show this help
The runners under bench/ can also be invoked directly once servers are up:
cd bench && node verify-parity.mjs --suite=all # prove parity first
cd bench && node api.mjs --only=pyxle,hono --generator=oha --reps=5
cd bench && node scaling.mjs ../results/<1w>.json ../results/<16w>.json
A benchmark is only useful if it's fair. This suite is built around five properties:
- Off-box load generation. For published numbers the generator runs on a separate instance from the servers and hits them over the private network, so the measurement reflects the server, not the harness sharing its CPU. The generator is oha (multithreaded Rust); single-threaded autocannon is kept only for quick same-box checks and becomes the ceiling at high request rates.
- A correctness gate. Every measured cell carries its non-2xx / error / timeout counts. Any cell with a single non-2xx, connection error, or timeout is marked INVALID, excluded from the ranking, and makes the run exit non-zero, so a framework that is erroring is never silently published as a throughput number. A parity check (verify-parity.mjs) runs before the benchmark and asserts every framework returns 200 with the same JSON shape (API) and equivalent rendered content (SSR).
- Per-core, scaling, and aggregate, reported separately. A single 16-worker aggregate conflates framework speed with core count. We run a 1-worker pass and an N-worker pass and report all three: single-worker req/s (per-core efficiency), the 1→N scaling factor, and the N-worker aggregate (scaling.mjs).
- Honest latency. Every SSR row reports the full distribution (p50/p90/p99/p99.9/max), not just the median. An optional fixed-rate pass (--latency-rate, oha -q with coordinated-omission correction) holds a sub-saturation request rate so the latency numbers aren't distorted by the harness falling behind.
- Repetition, warmup, and full disclosure. Each cell is measured --reps times; the API suite rotates which framework runs first each repetition and the SSR suite alternates which of the two servers is measured first, so first-mover / thermal / noisy-neighbour bias never lands on the same framework. The headline is the median; variance (stddev, coefficient of variation) is reported and saved (a single rep reports variance as —, not ±0%). A timed --warmup-duration drives load to steady state before measuring. Every result JSON records the generator (and its version), whether it ran off-box, worker count, connection levels, durations, reps, instance types, and the exact version of every framework (results/.framework-versions.json). A file is stamped meta.publishable only when measured off-box with oha.
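
The correctness gate, the median-plus-variance summary, and the start-order rotation described above can be sketched as follows. This is an illustrative approximation, not the code in bench/stats.mjs or bench/api.mjs: the field names (`reqPerSec`, `non2xx`, `errors`, `timeouts`) are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption.

```javascript
// A cell is valid only if no repetition saw a non-2xx, connection error, or timeout.
function isValidCell(reps) {
  return reps.every((r) => r.non2xx === 0 && r.errors === 0 && r.timeouts === 0);
}

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Summarize one cell: median headline, with stddev and coefficient of variation.
// With a single rep, variance is undefined, so stddev and cv are null (shown as a dash).
function summarizeCell(reps) {
  const rps = reps.map((r) => r.reqPerSec);
  const mean = rps.reduce((sum, v) => sum + v, 0) / rps.length;
  let stddev = null;
  if (rps.length >= 2) {
    const sumSq = rps.reduce((sum, v) => sum + (v - mean) ** 2, 0);
    stddev = Math.sqrt(sumSq / (rps.length - 1));
  }
  return {
    valid: isValidCell(reps),
    median: median(rps),
    stddev,
    cv: stddev === null ? null : stddev / mean,
  };
}

// Rotate which framework runs first on each repetition.
function rotate(frameworks, rep) {
  const k = rep % frameworks.length;
  return [...frameworks.slice(k), ...frameworks.slice(0, k)];
}

// Made-up numbers, purely to exercise the functions.
const exampleReps = [
  { reqPerSec: 100, non2xx: 0, errors: 0, timeouts: 0 },
  { reqPerSec: 110, non2xx: 0, errors: 0, timeouts: 0 },
  { reqPerSec: 90, non2xx: 0, errors: 0, timeouts: 0 },
];
console.log(summarizeCell(exampleReps)); // { valid: true, median: 100, stddev: 10, cv: 0.1 }
console.log(rotate(["pyxle", "fastapi", "hono"], 1)); // [ 'fastapi', 'hono', 'pyxle' ]
```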
- The async Python frameworks (Pyxle, FastAPI, Django) run on uvicorn (with uvicorn[standard]/uvloop pinned); Flask runs on gunicorn with gthread workers, since it's synchronous. With --workers N each framework gets the same N worker processes. Python: uvicorn/gunicorn --workers N; Node: the cluster module (workers share one listening socket, the kernel load-balances, no reverse-proxy hop); Pyxle: pyxle serve --workers N. The 1-worker baseline is genuinely one process for every framework, Flask included. Flask additionally runs a fixed, disclosed number of gthread threads per worker (default 8, in results/.framework-versions.json), which is its production model for a synchronous app; threads are not extra processes.
- Pyxle runs via pyxle serve (production mode) with CSRF disabled for a fair POST comparison. Pyxle's health probe is /api/healthz (file-based routing); every other framework uses /health. Both return {"status":"ok"}.
- Each framework's code is idiomatic, not artificially optimized or handicapped: the DB endpoints use a persistent per-worker connection and a process-lifetime prepared statement in every framework, and /api/json emits compact JSON everywhere.
- SSR suite: measures dynamic SSR, with the page re-rendered on every request (Pyxle @server vs Next.js force-dynamic). For static/cacheable content Next.js pre-renders and serves at static-file speed; this benchmark deliberately targets pages that must render per request, and discloses the per-response transfer size of each.
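
Because Pyxle's health probe lives at a different path from the others, a health checker has to special-case it. A minimal sketch, using the ports from the framework table above; the helper names `healthUrl` and `isHealthy` are hypothetical and this is not how bench.sh itself performs the check. The Pyxle SSR app on port 8011 is left out because its probe path is not stated here.

```javascript
// API-suite ports, taken from the framework table.
const PORTS = {
  pyxle: 8001,
  fastapi: 8002,
  django: 8003,
  flask: 8004,
  express: 8005,
  hono: 8006,
  nextjs: 3000,
};

// Build the health-probe URL: Pyxle uses /api/healthz, every other framework /health.
function healthUrl(name, host = "127.0.0.1") {
  const port = PORTS[name];
  if (port === undefined) throw new Error(`unknown framework: ${name}`);
  const path = name === "pyxle" ? "/api/healthz" : "/health";
  return `http://${host}:${port}${path}`;
}

// Both probes are expected to return {"status":"ok"}.
function isHealthy(bodyText) {
  try {
    const body = JSON.parse(bodyText);
    return body !== null && typeof body === "object" && body.status === "ok";
  } catch {
    return false; // not valid JSON
  }
}

console.log(healthUrl("pyxle")); // http://127.0.0.1:8001/api/healthz
console.log(healthUrl("hono", "10.0.0.10")); // http://10.0.0.10:8006/health
console.log(isHealthy('{"status":"ok"}')); // true
```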
The numbers on pyxle.dev/benchmarks are produced with the generator off-box. Launch two instances of the same type (we use c7i.4xlarge, 16 vCPU) in the same availability zone / cluster placement group so the network hop between them is minimal, and attach a security group that lets the client reach the server on the framework ports (3000, 8001–8006, 8011) over the private network.
# On BOTH boxes — toolchain (Python 3.11, Node 20, oha). Re-login afterward so
# the new PATH (python3.11 + oha, added to ~/.bashrc) takes effect:
bash scripts/ec2-setup.sh && exec "$SHELL" -l
# --- SERVER box (private IP e.g. 10.0.0.10) ---------------------------------
# single-worker baseline, then the all-cores run (two server lifecycles):
PYXLE_LOCAL_SRC=$HOME/pyxle ./bench.sh --serve-only --suite=all # 1 worker
# ... run the client passes below against it, Ctrl-C, then:
PYXLE_LOCAL_SRC=$HOME/pyxle ./bench.sh --serve-only --suite=all --workers=auto
# --- CLIENT box -------------------------------------------------------------
export BENCH_INSTANCE_TYPE=c7i.4xlarge
export BENCH_SERVER_INSTANCE=c7i.4xlarge BENCH_CLIENT_INSTANCE=c7i.4xlarge
SRV=10.0.0.10
# against the 1-worker server (--suite=all writes run-1w-api.json + run-1w-ssr.json):
./bench.sh --generate-only --target=$SRV --generator=oha --suite=all \
--reps=5 --duration=30 --warmup-duration=20 --connections=100 \
--output=results/run-1w.json
# against the all-cores server:
./bench.sh --generate-only --target=$SRV --generator=oha --suite=all \
--reps=5 --duration=30 --warmup-duration=20 --connections=100 \
--latency-rate=2000 --output=results/run-16w.json
# --- Combine the API passes into the per-core / scaling / aggregate table ---
cd bench && node scaling.mjs ../results/run-1w-api.json ../results/run-16w-api.json
--connections must be high enough to saturate the N-worker server while the client's own CPU stays well under load; confirm that a fast framework scales ~linearly 1→N. If the top frameworks all converge to one number, the client is still the bottleneck: use a bigger client box. Tear the instances down when finished.
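
The per-core / scaling / aggregate arithmetic, and the "top frameworks converge" warning sign, can be sketched as below. This is not the logic of bench/scaling.mjs. The function names, the `topN` and `tolerance` thresholds, and every number are assumptions chosen for illustration, not measurements.

```javascript
// Combine a 1-worker and an N-worker result (both in req/s) into the three reported views.
function scalingReport(singleWorkerRps, multiWorkerRps, workers) {
  const scalingFactor = multiWorkerRps / singleWorkerRps;
  return {
    perCore: singleWorkerRps, // per-core efficiency
    aggregate: multiWorkerRps, // N-worker throughput
    scalingFactor, // 1 -> N speedup
    efficiency: scalingFactor / workers, // 1.0 would be perfectly linear
  };
}

// Heuristic: if the top few aggregates sit within a small relative spread,
// the load generator, not the servers, may be the limit.
function looksClientBound(aggregates, topN = 3, tolerance = 0.05) {
  const top = [...aggregates].sort((a, b) => b - a).slice(0, topN);
  if (top.length < 2) return false;
  const spread = (top[0] - top[top.length - 1]) / top[0];
  return spread <= tolerance;
}

// Made-up numbers: 10,000 req/s on 1 worker, 120,000 req/s on 16 workers.
console.log(scalingReport(10000, 120000, 16));
// { perCore: 10000, aggregate: 120000, scalingFactor: 12, efficiency: 0.75 }
console.log(looksClientBound([200000, 198000, 195000, 60000])); // true
console.log(looksClientBound([200000, 150000, 90000])); // false
```

A scaling factor well below N for every framework, together with converging aggregates, points at the client rather than at the servers.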
benchmarks/
├── README.md
├── bench.sh # One-command runner (topology + CLI-arg driven)
├── scripts/
│ └── ec2-setup.sh # AL2023 toolchain incl. pinned oha
├── bench/ # Load-test runners
│ ├── loadgen.mjs # autocannon + oha backends → one metrics shape
│ ├── stats.mjs # reps → median + variance
│ ├── api.mjs # API throughput suite
│ ├── ssr.mjs # SSR + full-stack suite (Pyxle SSR vs Next.js)
│ ├── scaling.mjs # per-core / scaling / aggregate report
│ └── verify-parity.mjs # pre-run parity + payload-size check
├── frameworks/ # One self-contained app per framework
│ ├── pyxle/ pyxle-ssr/ fastapi/ django/ flask/ express/ hono/ nextjs/
│ └── shared/ # Shared DB seed helpers
└── results/ # Timestamped JSON per run (gitignored)
Everything needed to reproduce the published numbers is in this one repository — no submodules, no separately-cloned apps. A fresh clone plus the two-box runbook above re-runs every comparison shown on pyxle.dev/benchmarks. Absolute numbers vary with hardware; the relative standings and the methodology are what the page reports, and each published figure links to the raw result JSON it came from.
To add a new framework to the API suite:
- Create frameworks/<name>/ with the app code.
- Implement all six endpoints with identical logic and response format.
- Add it (name, port, language) to the FRAMEWORKS registry in bench/api.mjs and the expectations in bench/verify-parity.mjs.
- Add it to the server-start block in bench.sh.
- Run cd bench && node verify-parity.mjs --suite=api. It must pass before any numbers count.
- Update this README.
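
For the registry step, an entry carries a name, a port, and a language. The real shape of FRAMEWORKS in bench/api.mjs is not shown here, so the object layout below is an assumption, as are the `addFramework` helper, the example name `myframework`, and port 8007. The sketch only shows the kind of sanity check worth doing before adding an entry.

```javascript
// Hypothetical registry layout: one { name, port, language } object per framework.
const FRAMEWORKS = [
  { name: "pyxle", port: 8001, language: "Python" },
  { name: "fastapi", port: 8002, language: "Python" },
  { name: "hono", port: 8006, language: "Node.js" },
];

// Return a new registry with `entry` appended, rejecting incomplete or clashing entries.
function addFramework(registry, entry) {
  for (const field of ["name", "port", "language"]) {
    if (entry[field] === undefined) throw new Error(`missing field: ${field}`);
  }
  if (registry.some((f) => f.name === entry.name)) {
    throw new Error(`duplicate name: ${entry.name}`);
  }
  if (registry.some((f) => f.port === entry.port)) {
    throw new Error(`port already in use: ${entry.port}`);
  }
  return [...registry, entry];
}

// "myframework" and port 8007 are placeholders, not part of the suite.
const updated = addFramework(FRAMEWORKS, { name: "myframework", port: 8007, language: "Python" });
console.log(updated.length); // 4
```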
MIT