Pyxle Framework Benchmark Suite

Transparent, reproducible benchmarks comparing Pyxle against popular web frameworks.

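Every framework lives in its own frameworks/<name>/ directory and implements identical endpoints with the same business logic, database schema, and response format. There are no submodules and no separately cloned apps. Clone the repo and run one script, and you run exactly the comparisons we publish. Absolute numbers depend on your hardware, and published figures come from the two-box setup described under Reproducing the published numbers.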

Quick start

```bash
./bench.sh                 # API throughput suite (default), all 7 frameworks
./bench.sh --suite=ssr     # SSR latency: Pyxle SSR vs Next.js
./bench.sh --suite=all     # both suites
./bench.sh --suite=all --workers=auto   # use every CPU core
```

bench.sh creates a local .venv, installs every dependency, builds the Pyxle and Next.js apps, starts the servers each suite needs, health-checks them, verifies endpoint parity, runs the benchmark, and tears everything down. The first run takes a few minutes (installs + builds); pass --skip-install --skip-build on later runs to reuse the prepared tree.

Same-box vs off-box. The quick-start commands above run the load generator on the same machine as the servers. That is convenient, but the generator competes with the servers for CPU, which caps measured throughput, so this is not how we produce published numbers. For trustworthy results we run a multithreaded generator (oha) on a separate box. See Methodology and Reproducing the published numbers.

The two suites

API throughput (`--suite=api`)

Measures requests per second for seven frameworks serving identical API endpoints.

| Framework | Language | Category | Port | Server |
|---|---|---|---|---|
| Pyxle | Python | Full-stack (SSR + API) | 8001 | `pyxle serve` (uvicorn) |
| FastAPI | Python | API | 8002 | uvicorn |
| Django | Python | Full-stack | 8003 | uvicorn (ASGI) |
| Flask | Python | Micro | 8004 | gunicorn (gthread) |
| Express | Node.js | API | 8005 | built-in (cluster for multi-core) |
| Hono | Node.js | Ultralight API | 8006 | @hono/node-server (cluster) |
| Next.js | Node.js | Full-stack (SSR) | 3000 | `next start` (cluster for multi-core) |

Endpoints (identical across every framework):

| Endpoint | Method | Description |
|---|---|---|
| `/api/json` | GET | Return a static JSON object (pure serialization overhead) |
| `/api/db` | GET | Read one random row from SQLite (framework + DB) |
| `/api/queries?n=5` | GET | Read 5 random rows from SQLite (query loop) |
| `/api/queries?n=20` | GET | Read 20 random rows (heavier workload) |
| `/api/form` | POST | Parse JSON body, validate, return response |
| `/health` | GET | Minimal health check (raw routing overhead); Pyxle serves it at `/api/healthz` |
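Before the benchmark runs, the parity check (bench/verify-parity.mjs) asserts that every framework returns 200 with the same JSON shape for these endpoints. `/api/db` and `/api/queries` return random rows, so "shape" has to mean structure rather than values. Below is a minimal sketch of one way to compare shapes. It assumes shape means the set of keys plus the JSON type at each position. `jsonShape` and `sameShape` are hypothetical helpers, not the functions in verify-parity.mjs, and the row field names in the example are made up.

```javascript
// Hypothetical helper: reduce a parsed JSON value to its "shape", i.e. the
// JSON type at every position, ignoring the actual values.
function jsonShape(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    // Assumption: arrays are homogeneous, so the first element's shape
    // stands in for all of them ("[]" marks an empty array).
    return value.length === 0 ? "[]" : [jsonShape(value[0])];
  }
  if (typeof value === "object") {
    const shape = {};
    // Sort keys so that key order in the response does not matter.
    for (const key of Object.keys(value).sort()) {
      shape[key] = jsonShape(value[key]);
    }
    return shape;
  }
  return typeof value; // "string" | "number" | "boolean"
}

// Two responses match if their shapes serialize identically.
function sameShape(a, b) {
  return JSON.stringify(jsonShape(a)) === JSON.stringify(jsonShape(b));
}

// Example: two /api/db-style rows with different random values.
const fromPyxle = { id: 17, randomNumber: 4821 };
const fromHono = { randomNumber: 903, id: 512 };
console.log(sameShape(fromPyxle, fromHono)); // true
console.log(sameShape(fromPyxle, { id: "17", randomNumber: 4821 })); // false
```

Comparing types instead of values lets the check catch a framework that serializes an id as a string, without failing on the random data itself.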

SSR latency (`--suite=ssr`)

Measures dynamic server-side rendering — the full page re-rendered on every request — for Pyxle SSR (frameworks/pyxle-ssr/, port 8011) vs Next.js (frameworks/nextjs/, port 3000):

| Page | Path | What it stresses |
|---|---|---|
| Landing | `/` | Baseline SSR overhead |
| Heavy | `/heavy` | A 300-row data table |
| Complex | `/complex` | A nested dashboard |

Both apps render equivalent pages — the same loader data and the same DOM structure (verified by row/heading counts, see bench/verify-parity.mjs). They are not byte-identical: Next.js's React Server Component model serializes the rendered tree into the response as hydration data, so a Next page is ~2–3× the bytes of the equivalent Pyxle page. That difference is reported (Bytes / response), not hidden — it's a real, architectural property of each framework.
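Counting rows and headings shows that two pages share the same structure even when their bytes differ. The sketch below illustrates the idea under stated assumptions. It uses regular expressions, which are adequate for markup we generate ourselves but not for arbitrary HTML. `pageFingerprint` and `equivalentPages` are hypothetical names, and the real verify-parity.mjs may count different elements.

```javascript
// Hypothetical structural fingerprint of a rendered page: how many table
// rows and headings it contains, plus its size in bytes.
function pageFingerprint(html) {
  const count = (re) => (html.match(re) || []).length;
  return {
    rows: count(/<tr[\s>]/gi),           // <tr> / <tr ...>, not <thead> or </tr>
    headings: count(/<h[1-6][\s>]/gi),   // <h1>..<h6>, not <head>
    bytes: Buffer.byteLength(html, "utf8"),
  };
}

// Pages are "equivalent" when row and heading counts match. Byte size is
// reported but not compared, since the two frameworks legitimately differ.
function equivalentPages(a, b) {
  const fa = pageFingerprint(a);
  const fb = pageFingerprint(b);
  return {
    equivalent: fa.rows === fb.rows && fa.headings === fb.headings,
    a: fa,
    b: fb,
  };
}

const pyxle = "<h1>Heavy</h1><table><tr><td>1</td></tr><tr><td>2</td></tr></table>";
const next = '<h1 class="t">Heavy</h1><table><tr><td>1</td></tr><tr><td>2</td></tr></table>' +
  "<script>/* hydration data */</script>";
console.log(equivalentPages(pyxle, next).equivalent); // true
```

Byte size is returned alongside the counts so it can be reported, as the Bytes / response column does, without affecting the pass/fail decision.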

Both suites use a SQLite database with 1,000 deterministically seeded rows (WAL mode), identical across frameworks.
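Deterministic seeding means every framework's database holds the same 1,000 rows. One way to achieve that is a fixed-seed PRNG. This sketch uses mulberry32, a small, widely used 32-bit generator. The `{ id, randomNumber }` row layout, the value range, and the default seed are assumptions for illustration; the real helpers live in frameworks/shared/. The sketch only generates the rows and does not write them to SQLite.

```javascript
// mulberry32: a small seeded 32-bit PRNG. The same seed yields the same
// sequence on every machine, so every framework gets identical rows.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Hypothetical row generator: ids 1..count, each with an integer in
// 1..10000. The real schema may differ.
function seedRows(count = 1000, seed = 42) {
  const rand = mulberry32(seed);
  const rows = [];
  for (let id = 1; id <= count; id++) {
    rows.push({ id, randomNumber: 1 + Math.floor(rand() * 10000) });
  }
  return rows;
}
```

Because the generator state is just one 32-bit integer, the result does not depend on the platform or the language runtime's own `Math.random`.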

CLI options

```text
./bench.sh [options]

Topology:
  (default)               Same box: servers on 127.0.0.1, generator local
                            (autocannon). Convenient; NOT for published numbers.
  --serve-only            Only start + health-check servers (bound to 0.0.0.0)
                            and idle. Run on the SERVER box.
  --generate-only         Only run the generator against --target. Run on the
                            dedicated CLIENT box (defaults to --generator=oha).

  --suite=api|ssr|all     Which suite to run (default: api)
  --frameworks=a,b,...    (api) Only benchmark these frameworks
  --host=HOST             Host the servers bind to (default 127.0.0.1;
                            serve-only defaults to 0.0.0.0)
  --target=HOST           (generate-only) Server host/IP to generate load against
  --generator=NAME        autocannon (same-box only) or oha (off-box / published)
  --duration=N            Measured seconds per repetition (default 10)
  --connections=N,M,...   Concurrency levels (e.g. 10,50,100)
  --reps=N                Repetitions per cell; median reported (default 1)
  --warmup-duration=N     Seconds of warmup load before measuring (default 5)
  --latency-rate=QPS      (ssr, oha) Extra fixed-rate, coordinated-omission-
                            corrected latency pass at QPS
  --workers=N|auto        Run every framework with N worker processes (auto=nproc)
  --output=FILE           Save JSON results to FILE
  --skip-install          Don't (re)install dependencies
  --skip-build            Don't (re)build the Pyxle / Next.js apps
  --skip-parity           Don't run the pre-run parity verification
  -h, --help              Show this help
```

The runners under bench/ can also be invoked directly once servers are up:

```bash
cd bench
node verify-parity.mjs --suite=all     # prove parity first
node api.mjs --only=pyxle,hono --generator=oha --reps=5
node scaling.mjs ../results/<1w>.json ../results/<16w>.json
```

Methodology

A benchmark is only useful if it's fair. This suite is built around five properties:

Off-box load generation. For published numbers the generator runs on a separate instance from the servers and hits them over the private network, so the measurement reflects the server, not the harness sharing its CPU. The generator is oha (multithreaded Rust); single-threaded autocannon is kept only for quick same-box checks and becomes the ceiling at high request rates.
A correctness gate. Every measured cell carries its non-2xx, error, and timeout counts. A cell with even one non-2xx response, connection error, or timeout is marked INVALID and excluded from the ranking, and it makes the run exit non-zero. A framework that is erroring is therefore never silently published as a throughput number. A parity check (verify-parity.mjs) runs before the benchmark and asserts that every framework returns 200 with the same JSON shape (API) and equivalent rendered content (SSR).
Per-core, scaling, and aggregate — reported separately. A single 16-worker aggregate conflates framework speed with core count. We run a 1-worker pass and an N-worker pass and report all three: single-worker req/s (per-core efficiency), the 1→N scaling factor, and the N-worker aggregate (scaling.mjs).
Honest latency. Every SSR row reports the full distribution (p50/p90/p99/p99.9/max), not just the median. An optional fixed-rate pass (--latency-rate, oha -q with coordinated-omission correction) holds a sub-saturation request rate so the latency numbers aren't distorted by the harness falling behind.
Repetition, warmup, and full disclosure. Each cell is measured --reps times. The API suite rotates which framework runs first in each repetition, and the SSR suite alternates which of the two servers is measured first, so first-mover, thermal, and noisy-neighbour bias does not consistently land on the same framework. The headline figure is the median. Variance (standard deviation and coefficient of variation) is reported and saved; a single-rep run reports variance as —, not ±0%. A timed --warmup-duration drives load to steady state before measuring. Every result JSON records the generator and its version, whether it ran off-box, the worker count, connection levels, durations, reps, instance types, and the exact version of every framework (results/.framework-versions.json). A file is stamped meta.publishable only when it was measured off-box with oha.
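The correctness gate, the median/variance reporting, and the rotated run order fit in a few small functions. Everything here is an assumption about data layout. A "cell" is taken to be `{ name, reqPerSec: [one value per rep], non2xx, errors, timeouts }`, which is not necessarily the schema used by bench/stats.mjs or the result JSON. The sketch uses the sample standard deviation (n − 1) and returns `null` for a single rep instead of a misleading 0. `runOrder` shows the rotation idea; with two servers it simply alternates, as the SSR suite does.

```javascript
// Correctness gate: one non-2xx, error, or timeout invalidates the cell.
function isValidCell(cell) {
  return cell.non2xx === 0 && cell.errors === 0 && cell.timeouts === 0;
}

function median(xs) {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Sample standard deviation and coefficient of variation (stddev / mean).
// With a single rep there is nothing to measure, so report null, not 0.
function variance(xs) {
  if (xs.length < 2) return { stddev: null, cv: null };
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const ss = xs.reduce((acc, x) => acc + (x - mean) ** 2, 0);
  const stddev = Math.sqrt(ss / (xs.length - 1));
  return { stddev, cv: stddev / mean };
}

// Run order for repetition `rep`: rotate so a different framework goes first.
function runOrder(frameworks, rep) {
  const k = rep % frameworks.length;
  return [...frameworks.slice(k), ...frameworks.slice(0, k)];
}

// Rank valid cells by median req/s. Invalid cells are excluded from the
// ranking, and any invalid cell makes the whole run fail (exit code 1).
function summarize(cells) {
  const ranked = cells
    .filter(isValidCell)
    .map((c) => ({ name: c.name, median: median(c.reqPerSec), ...variance(c.reqPerSec) }))
    .sort((a, b) => b.median - a.median);
  const invalid = cells.filter((c) => !isValidCell(c)).map((c) => c.name);
  return { ranked, invalid, exitCode: invalid.length > 0 ? 1 : 0 };
}
```

The median rather than the mean keeps one noisy repetition from moving the headline, while the coefficient of variation shows how much that noise actually mattered.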

Apples-to-apples server configuration

The async Python frameworks (Pyxle, FastAPI, Django) run on uvicorn (with uvicorn[standard]/uvloop pinned); Flask runs on gunicorn with gthread workers, since it's synchronous. With --workers N each framework gets the same N worker processes — Python: uvicorn/gunicorn --workers N; Node: the cluster module (workers share one listening socket, the kernel load-balances, no reverse-proxy hop); Pyxle: pyxle serve --workers N. The 1-worker baseline is genuinely one process for every framework, Flask included. Flask additionally runs a fixed, disclosed number of gthread threads per worker (default 8, in results/.framework-versions.json) — its production model for a synchronous app; threads are not extra processes.
Pyxle runs via pyxle serve (production mode) with CSRF disabled for a fair POST comparison. Pyxle's health probe is /api/healthz (file-based routing); every other framework uses /health. Both return {"status":"ok"}.
Each framework's code is idiomatic — not artificially optimized or handicapped: the DB endpoints use a persistent per-worker connection and a process-lifetime prepared statement in every framework, and /api/json emits compact JSON everywhere.
SSR suite: measures dynamic SSR — the page re-rendered on every request (Pyxle @server vs Next.js force-dynamic). For static/cacheable content Next.js pre-renders and serves at static-file speed; this benchmark deliberately targets pages that must render per request, and discloses the per-response transfer size of each.
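On the Node side, `--workers N` means one primary process forking N workers that all listen on the same port. Below is a minimal sketch using Node's built-in `node:cluster`, `node:http`, and `node:os` modules. `resolveWorkers` and `startCluster` are hypothetical helpers, not the code in frameworks/express/ or frameworks/hono/. Note that on Linux, Node's default cluster policy is round-robin in the primary process. Connections are left to the operating system only when `cluster.schedulingPolicy` is set to `cluster.SCHED_NONE`, which the sketch does to match the shared-socket model described above. Whether the repo's apps set it is an assumption.

```javascript
const cluster = require("node:cluster");
const http = require("node:http");
const os = require("node:os");

// Hypothetical: turn a --workers value ("auto" or a positive integer) into
// a process count. "auto" uses every available CPU, like nproc.
function resolveWorkers(flag) {
  if (flag === "auto") {
    // availableParallelism() exists on Node >= 18.14; fall back to cpus().
    return typeof os.availableParallelism === "function"
      ? os.availableParallelism()
      : os.cpus().length;
  }
  if (!/^[1-9]\d*$/.test(String(flag))) {
    throw new Error(`invalid --workers value: ${flag}`);
  }
  return Number(flag);
}

// Hypothetical: fork `workers` processes that share one listening socket.
function startCluster(workers, port, handler) {
  if (cluster.isPrimary) {
    // Leave connection distribution to the OS; must be set before fork().
    cluster.schedulingPolicy = cluster.SCHED_NONE;
    for (let i = 0; i < workers; i++) cluster.fork();
  } else {
    http.createServer(handler).listen(port);
  }
}

// Usage (not run here): a /health-style handler on every core.
// startCluster(resolveWorkers("auto"), 8005, (req, res) => {
//   res.writeHead(200, { "content-type": "application/json" });
//   res.end(JSON.stringify({ status: "ok" }));
// });
```

The 1-worker baseline corresponds to `resolveWorkers("1")`, so the same code path serves both passes.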

Reproducing the published numbers (two-box)

The numbers on pyxle.dev/benchmarks are produced with the generator off-box. Launch two instances of the same type (we use c7i.4xlarge, 16 vCPU) in the same availability zone or cluster placement group so the network hop between them is minimal. Give them a security group that lets the client reach the server on the framework ports (3000, 8001–8006, 8011) over the private network.

```bash
# On BOTH boxes — toolchain (Python 3.11, Node 20, oha). Re-login afterward so
# the new PATH (python3.11 + oha, added to ~/.bashrc) takes effect:
bash scripts/ec2-setup.sh && exec "$SHELL" -l

# --- SERVER box (private IP e.g. 10.0.0.10) ---------------------------------
#   single-worker baseline, then the all-cores run (two server lifecycles):
PYXLE_LOCAL_SRC=$HOME/pyxle ./bench.sh --serve-only --suite=all              # 1 worker
#   ... run the client passes below against it, Ctrl-C, then:
PYXLE_LOCAL_SRC=$HOME/pyxle ./bench.sh --serve-only --suite=all --workers=auto

# --- CLIENT box -------------------------------------------------------------
export BENCH_INSTANCE_TYPE=c7i.4xlarge
export BENCH_SERVER_INSTANCE=c7i.4xlarge BENCH_CLIENT_INSTANCE=c7i.4xlarge
SRV=10.0.0.10
# against the 1-worker server (--suite=all writes run-1w-api.json + run-1w-ssr.json):
./bench.sh --generate-only --target=$SRV --generator=oha --suite=all \
  --reps=5 --duration=30 --warmup-duration=20 --connections=100 \
  --output=results/run-1w.json
# against the all-cores server:
./bench.sh --generate-only --target=$SRV --generator=oha --suite=all \
  --reps=5 --duration=30 --warmup-duration=20 --connections=100 \
  --latency-rate=2000 --output=results/run-16w.json

# --- Combine the API passes into the per-core / scaling / aggregate table ---
cd bench && node scaling.mjs ../results/run-1w-api.json ../results/run-16w-api.json
```

--connections must be high enough to saturate the N-worker server while the client's own CPU stays well below full utilization. Confirm that a fast framework scales roughly linearly from 1 to N workers. If the top frameworks all converge on the same number, the client is still the bottleneck, and you need a bigger client box. Tear the instances down when you are finished.
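The 1-worker and N-worker passes combine into the three numbers described under Methodology: per-core req/s, the 1→N scaling factor, and the N-worker aggregate. The sketch below shows that arithmetic together with a check for the "top frameworks converge" symptom. It assumes each pass has already been reduced to a map of framework name → median req/s. `scalingReport` and `clientLikelySaturated` are hypothetical, and the top-3 / 5% thresholds are illustrative choices, not what bench/scaling.mjs uses.

```javascript
// Inputs: { name: medianReqPerSec } for the 1-worker and N-worker passes.
function scalingReport(oneWorker, nWorker, workers) {
  return Object.keys(oneWorker).map((name) => {
    const factor = nWorker[name] / oneWorker[name];
    return {
      name,
      perCore: oneWorker[name],     // single-worker req/s (per-core efficiency)
      aggregate: nWorker[name],     // N-worker req/s
      factor,                       // 1 -> N scaling factor
      efficiency: factor / workers, // 1.0 means perfectly linear scaling
    };
  });
}

// Heuristic: if the top `k` aggregates sit within `tolerance` (as a fraction
// of the best) of each other, suspect the load generator, not the servers.
function clientLikelySaturated(report, k = 3, tolerance = 0.05) {
  const top = report
    .map((r) => r.aggregate)
    .sort((a, b) => b - a)
    .slice(0, k);
  if (top.length < k) return false;
  return (top[0] - top[top.length - 1]) / top[0] <= tolerance;
}
```

Reporting efficiency (factor ÷ workers) alongside the raw factor makes "roughly linear" concrete: values near 1.0 mean the server, not the client, is the limit.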

Project structure

```text
benchmarks/
├── README.md
├── bench.sh                  # One-command runner (topology + CLI-arg driven)
├── scripts/
│   └── ec2-setup.sh          # AL2023 toolchain incl. pinned oha
├── bench/                    # Load-test runners
│   ├── loadgen.mjs           # autocannon + oha backends → one metrics shape
│   ├── stats.mjs             # reps → median + variance
│   ├── api.mjs               # API throughput suite
│   ├── ssr.mjs               # SSR + full-stack suite (Pyxle SSR vs Next.js)
│   ├── scaling.mjs           # per-core / scaling / aggregate report
│   └── verify-parity.mjs     # pre-run parity + payload-size check
├── frameworks/               # One self-contained app per framework
│   ├── pyxle/  pyxle-ssr/  fastapi/  django/  flask/  express/  hono/  nextjs/
│   └── shared/               # Shared DB seed helpers
└── results/                  # Timestamped JSON per run (gitignored)
```

Reproducibility

Everything needed to reproduce the published numbers is in this one repository — no submodules, no separately-cloned apps. A fresh clone plus the two-box runbook above re-runs every comparison shown on pyxle.dev/benchmarks. Absolute numbers vary with hardware; the relative standings and the methodology are what the page reports, and each published figure links to the raw result JSON it came from.

Contributing

To add a new framework to the API suite:

Create frameworks/<name>/ with the app code.
Implement all six endpoints with identical logic and response format.
Add it (name, port, language) to the FRAMEWORKS registry in bench/api.mjs and the expectations in bench/verify-parity.mjs.
Add it to the server-start block in bench.sh.
Run cd bench && node verify-parity.mjs --suite=api — it must pass before any numbers count.
Update this README.
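The registry step amounts to adding one entry to a list. Below is a sketch of what that might look like. It assumes the registry is a plain array of `{ name, port, language }` objects, since only those three fields are named in the steps above. The existing ports come from the API table, while `newframework` and port 8007 are placeholders. `checkRegistry` is a hypothetical pre-flight check, not part of the repo. It treats 8011 as reserved because the Pyxle SSR app uses that port.

```javascript
// Hypothetical shape of the FRAMEWORKS registry in bench/api.mjs.
const FRAMEWORKS = [
  { name: "pyxle", port: 8001, language: "Python" },
  { name: "fastapi", port: 8002, language: "Python" },
  { name: "django", port: 8003, language: "Python" },
  { name: "flask", port: 8004, language: "Python" },
  { name: "express", port: 8005, language: "Node.js" },
  { name: "hono", port: 8006, language: "Node.js" },
  { name: "nextjs", port: 3000, language: "Node.js" },
  { name: "newframework", port: 8007, language: "Python" }, // placeholder entry
];

// Hypothetical check: every entry has a name and language, and no name or
// port is reused (including ports reserved for the SSR suite).
function checkRegistry(frameworks, reserved = [8011]) {
  const problems = [];
  const seenNames = new Set();
  const seenPorts = new Set(reserved);
  for (const { name, port, language } of frameworks) {
    if (!name || !language) problems.push(`entry on port ${port} is missing name or language`);
    if (seenNames.has(name)) problems.push(`duplicate name: ${name}`);
    if (seenPorts.has(port)) problems.push(`port ${port} already in use (${name})`);
    seenNames.add(name);
    seenPorts.add(port);
  }
  return problems;
}

console.log(checkRegistry(FRAMEWORKS)); // []
```

A check like this catches port collisions before any server starts, which is cheaper to debug than a failed health check.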

License

MIT
