pyxle-dev/benchmarks

Pyxle Framework Benchmark Suite

Transparent, reproducible benchmarks comparing Pyxle against popular web frameworks.

Every framework lives in its own frameworks/<name>/ directory and implements identical endpoints with the same business logic, database schema, and response format. There are no submodules and nothing fetched at runtime — clone the repo, run one script, and you get the same numbers we publish.

Quick start

./bench.sh                 # API throughput suite (default), all 7 frameworks
./bench.sh --suite=ssr     # SSR latency: Pyxle SSR vs Next.js
./bench.sh --suite=all     # both suites
./bench.sh --suite=all --workers=auto   # use every CPU core

bench.sh creates a local .venv, installs every dependency, builds the Pyxle and Next.js apps, starts the servers each suite needs, health-checks them, verifies endpoint parity, runs the benchmark, and tears everything down. The first run takes a few minutes (installs + builds); pass --skip-install --skip-build on later runs to reuse the prepared tree.

Same-box vs off-box. The quick-start command above runs the load generator on the same machine as the servers — convenient, but the generator competes for the same CPU, so it caps throughput and is not how we produce published numbers. For trustworthy results we run the generator on a separate box with a multithreaded generator (oha). See Methodology and Reproducing the published numbers.

The two suites

API throughput (--suite=api)

Measures requests/second for seven frameworks serving identical API endpoints.

| Framework | Language | Category | Port | Server |
|---|---|---|---|---|
| Pyxle | Python | Full-stack (SSR + API) | 8001 | pyxle serve (uvicorn) |
| FastAPI | Python | API | 8002 | uvicorn |
| Django | Python | Full-stack | 8003 | uvicorn (ASGI) |
| Flask | Python | Micro | 8004 | gunicorn (gthread) |
| Express | Node.js | API | 8005 | built-in (cluster for multi-core) |
| Hono | Node.js | Ultralight API | 8006 | @hono/node-server (cluster) |
| Next.js | Node.js | Full-stack (SSR) | 3000 | next start (cluster for multi-core) |

Endpoints (identical across every framework):

| Endpoint | Method | Description |
|---|---|---|
| /api/json | GET | Return a static JSON object (pure serialization overhead) |
| /api/db | GET | Read one random row from SQLite (framework + DB) |
| /api/queries?n=5 | GET | Read 5 random rows from SQLite (query loop) |
| /api/queries?n=20 | GET | Read 20 random rows from SQLite (heavier workload) |
| /api/form | POST | Parse JSON body, validate, return response |
| /health | GET | Minimal health check (raw routing overhead); Pyxle serves it at /api/healthz |

SSR latency (--suite=ssr)

Measures dynamic server-side rendering — the full page re-rendered on every request — for Pyxle SSR (frameworks/pyxle-ssr/, port 8011) vs Next.js (frameworks/nextjs/, port 3000):

| Page | Path | What it stresses |
|---|---|---|
| Landing | / | Baseline SSR overhead |
| Heavy | /heavy | A 300-row data table |
| Complex | /complex | A nested dashboard |

Both apps render equivalent pages — the same loader data and the same DOM structure (verified by row/heading counts, see bench/verify-parity.mjs). They are not byte-identical: Next.js's React Server Component model serializes the rendered tree into the response as hydration data, so a Next page is ~2–3× the bytes of the equivalent Pyxle page. That difference is reported (Bytes / response), not hidden — it's a real, architectural property of each framework.

All suites use a SQLite database with 1,000 deterministically-seeded rows (WAL mode), identical across frameworks.
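The seeding is described only as deterministic. One way to get that property is a small seeded PRNG, sketched below. This is an illustration under assumptions, not the repository's seed helper: the `mulberry32` generator, the seed value `42`, and the `id` / `randomNumber` column names are all hypothetical.

```javascript
// Hypothetical sketch: deterministic row generation with a seeded PRNG.
// mulberry32 is a small, well-known 32-bit generator; the real seed helper may differ.
function mulberry32(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    // Map the 32-bit result to a float in [0, 1).
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Build `count` rows; the same seed always yields the same rows.
function seedRows(count, seed) {
  const rand = mulberry32(seed);
  const rows = [];
  for (let id = 1; id <= count; id++) {
    // Hypothetical column: an integer in 1..10000.
    rows.push({ id, randomNumber: 1 + Math.floor(rand() * 10000) });
  }
  return rows;
}

const rows = seedRows(1000, 42);
console.log(rows.length, rows[0], rows[rows.length - 1]);
```

Because the rows depend only on the seed, every framework's database can be rebuilt from scratch and still hold the same 1,000 rows.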

CLI options

./bench.sh [options]

Topology:
  (default)               Same box: servers on 127.0.0.1, generator local
                            (autocannon). Convenient; NOT for published numbers.
  --serve-only            Only start + health-check servers (bound to 0.0.0.0)
                            and idle. Run on the SERVER box.
  --generate-only         Only run the generator against --target. Run on the
                            dedicated CLIENT box (defaults to --generator=oha).

  --suite=api|ssr|all     Which suite to run (default: api)
  --frameworks=a,b,...    (api) Only benchmark these frameworks
  --host=HOST             Host the servers bind to (default 127.0.0.1;
                            serve-only defaults to 0.0.0.0)
  --target=HOST           (generate-only) Server host/IP to generate load against
  --generator=NAME        autocannon (same-box only) or oha (off-box / published)
  --duration=N            Measured seconds per repetition (default 10)
  --connections=N,M,...   Concurrency levels (e.g. 10,50,100)
  --reps=N                Repetitions per cell; median reported (default 1)
  --warmup-duration=N     Seconds of warmup load before measuring (default 5)
  --latency-rate=QPS      (ssr, oha) Extra fixed-rate, coordinated-omission-
                            corrected latency pass at QPS
  --workers=N|auto        Run every framework with N worker processes (auto=nproc)
  --output=FILE           Save JSON results to FILE
  --skip-install          Don't (re)install dependencies
  --skip-build            Don't (re)build the Pyxle / Next.js apps
  --skip-parity           Don't run the pre-run parity verification
  -h, --help              Show this help

The runners under bench/ can also be invoked directly once servers are up:

cd bench && node verify-parity.mjs --suite=all     # prove parity first
cd bench && node api.mjs --only=pyxle,hono --generator=oha --reps=5
cd bench && node scaling.mjs ../results/<1w>.json ../results/<16w>.json

Methodology

A benchmark is only useful if it's fair. This suite is built around five properties:

  1. Off-box load generation. For published numbers the generator runs on a separate instance from the servers and hits them over the private network, so the measurement reflects the server, not the harness sharing its CPU. The generator is oha (multithreaded Rust); single-threaded autocannon is kept only for quick same-box checks and becomes the ceiling at high request rates.

  2. A correctness gate. Every measured cell carries its non-2xx / error / timeout counts. Any cell with a single non-2xx, connection error, or timeout is marked INVALID, excluded from the ranking, and makes the run exit non-zero — a framework that is erroring is never silently published as a throughput number. A parity check (verify-parity.mjs) runs before the benchmark and asserts every framework returns 200 with the same JSON shape (API) and equivalent rendered content (SSR).

  3. Per-core, scaling, and aggregate — reported separately. A single 16-worker aggregate conflates framework speed with core count. We run a 1-worker pass and an N-worker pass and report all three: single-worker req/s (per-core efficiency), the 1→N scaling factor, and the N-worker aggregate (scaling.mjs).

  4. Honest latency. Every SSR row reports the full distribution (p50/p90/p99/p99.9/max), not just the median. An optional fixed-rate pass (--latency-rate, oha -q with coordinated-omission correction) holds a sub-saturation request rate so the latency numbers aren't distorted by the harness falling behind.

  5. Repetition, warmup, and full disclosure. Each cell is measured --reps times; the API suite rotates which framework runs first each repetition and the SSR suite alternates which of the two servers is measured first, so first-mover / thermal / noisy-neighbour bias never lands on the same framework every time. The headline is the median; variance (stddev, coefficient of variation) is reported and saved, and a single-rep run reports no variance figure rather than a misleading ±0%. A timed --warmup-duration drives load to steady state before measuring. Every result JSON records the generator (and its version), whether it ran off-box, worker count, connection levels, durations, reps, instance types, and the exact version of every framework (results/.framework-versions.json). A file is stamped meta.publishable only when measured off-box with oha.
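To make the correctness gate and the median/variance reporting concrete, here is a minimal sketch of reducing one cell's repetitions to a summary. It is not the code in bench/stats.mjs: the field names (`reqPerSec`, `non2xx`, `errors`, `timeouts`) are hypothetical, and the choice of the sample standard deviation (n - 1 denominator) is an assumption.

```javascript
// Hypothetical sketch of summarizing one benchmark cell across repetitions.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function summarizeCell(reps) {
  // Correctness gate: any non-2xx, connection error, or timeout invalidates the cell.
  const valid = reps.every((r) => r.non2xx === 0 && r.errors === 0 && r.timeouts === 0);
  const samples = reps.map((r) => r.reqPerSec);
  // With a single repetition there is no variance to report, so leave it null
  // instead of printing a misleading 0.
  const summary = { valid, reps: samples.length, median: median(samples), stddev: null, cv: null };
  if (samples.length > 1) {
    const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
    // Assumption: sample standard deviation (n - 1); the suite may use n instead.
    const variance = samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (samples.length - 1);
    summary.stddev = Math.sqrt(variance);
    summary.cv = summary.stddev / mean; // coefficient of variation
  }
  return summary;
}

// Illustrative inputs only, not measured results.
const clean = { non2xx: 0, errors: 0, timeouts: 0 };
const summary = summarizeCell([
  { reqPerSec: 100, ...clean },
  { reqPerSec: 110, ...clean },
  { reqPerSec: 90, ...clean },
]);
console.log(summary);
```

A caller would exclude any summary with `valid === false` from the ranking and exit non-zero, as described in property 2.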

Apples-to-apples server configuration

  • The async Python frameworks (Pyxle, FastAPI, Django) run on uvicorn (with uvicorn[standard]/uvloop pinned); Flask runs on gunicorn with gthread workers, since it's synchronous. With --workers N each framework gets the same N worker processes — Python: uvicorn/gunicorn --workers N; Node: the cluster module (workers share one listening socket, the kernel load-balances, no reverse-proxy hop); Pyxle: pyxle serve --workers N. The 1-worker baseline is genuinely one process for every framework, Flask included. Flask additionally runs a fixed, disclosed number of gthread threads per worker (default 8, in results/.framework-versions.json) — its production model for a synchronous app; threads are not extra processes.
  • Pyxle runs via pyxle serve (production mode) with CSRF disabled for a fair POST comparison. Pyxle's health probe is /api/healthz (file-based routing); every other framework uses /health. Both return {"status":"ok"}.
  • Each framework's code is idiomatic — not artificially optimized or handicapped: the DB endpoints use a persistent per-worker connection and a process-lifetime prepared statement in every framework, and /api/json emits compact JSON everywhere.
  • SSR suite: measures dynamic SSR — the page re-rendered on every request (Pyxle @server vs Next.js force-dynamic). For static/cacheable content Next.js pre-renders and serves at static-file speed; this benchmark deliberately targets pages that must render per request, and discloses the per-response transfer size of each.

Reproducing the published numbers (two-box)

The numbers on pyxle.dev/benchmarks are produced with the generator off-box. Launch two instances of the same type (we use c7i.4xlarge, 16 vCPU) in the same availability zone / cluster placement group so the network hop between them is minimal, and attach a security group that lets the client reach the server on the framework ports (3000, 8001–8006, 8011) over the private network.

# On BOTH boxes — toolchain (Python 3.11, Node 20, oha). Re-login afterward so
# the new PATH (python3.11 + oha, added to ~/.bashrc) takes effect:
bash scripts/ec2-setup.sh && exec "$SHELL" -l

# --- SERVER box (private IP e.g. 10.0.0.10) ---------------------------------
#   single-worker baseline, then the all-cores run (two server lifecycles):
PYXLE_LOCAL_SRC=$HOME/pyxle ./bench.sh --serve-only --suite=all              # 1 worker
#   ... run the client passes below against it, Ctrl-C, then:
PYXLE_LOCAL_SRC=$HOME/pyxle ./bench.sh --serve-only --suite=all --workers=auto

# --- CLIENT box -------------------------------------------------------------
export BENCH_INSTANCE_TYPE=c7i.4xlarge
export BENCH_SERVER_INSTANCE=c7i.4xlarge BENCH_CLIENT_INSTANCE=c7i.4xlarge
SRV=10.0.0.10
# against the 1-worker server (--suite=all writes run-1w-api.json + run-1w-ssr.json):
./bench.sh --generate-only --target=$SRV --generator=oha --suite=all \
  --reps=5 --duration=30 --warmup-duration=20 --connections=100 \
  --output=results/run-1w.json
# against the all-cores server:
./bench.sh --generate-only --target=$SRV --generator=oha --suite=all \
  --reps=5 --duration=30 --warmup-duration=20 --connections=100 \
  --latency-rate=2000 --output=results/run-16w.json

# --- Combine the API passes into the per-core / scaling / aggregate table ---
cd bench && node scaling.mjs ../results/run-1w-api.json ../results/run-16w-api.json

--connections must be high enough to saturate the N-worker server while the client's own CPU stays well below saturation; to confirm, check that a fast framework scales roughly linearly from 1 to N workers. If the top frameworks all converge to one number, the client is still the bottleneck: use a bigger client box. Tear the instances down when finished.
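The linear-scaling check can be expressed as simple arithmetic on the two passes. The sketch below is not bench/scaling.mjs: the function name, the `efficiency` figure (scaling factor divided by worker count), and the input numbers are all assumptions made for illustration.

```javascript
// Hypothetical sketch: derive per-core, scaling, and aggregate figures
// from a 1-worker pass and an N-worker pass of the same framework.
function scalingReport(singleWorkerRps, multiWorkerRps, workers) {
  const scalingFactor = multiWorkerRps / singleWorkerRps;
  return {
    perCore: singleWorkerRps,   // single-worker req/s
    aggregate: multiWorkerRps,  // N-worker req/s
    scalingFactor,              // how many times faster N workers are than one
    // Assumed extra metric: 1.0 would mean perfectly linear scaling.
    efficiency: scalingFactor / workers,
  };
}

// Made-up inputs, not measured results.
const report = scalingReport(10000, 120000, 16);
console.log(report); // scalingFactor is 12, efficiency is 0.75
```

An efficiency far below 1.0 for the fastest frameworks, or near-identical aggregates across frameworks, would point at the client rather than the servers.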

Project structure

benchmarks/
├── README.md
├── bench.sh                  # One-command runner (topology + CLI-arg driven)
├── scripts/
│   └── ec2-setup.sh          # AL2023 toolchain incl. pinned oha
├── bench/                    # Load-test runners
│   ├── loadgen.mjs           # autocannon + oha backends → one metrics shape
│   ├── stats.mjs             # reps → median + variance
│   ├── api.mjs               # API throughput suite
│   ├── ssr.mjs               # SSR + full-stack suite (Pyxle SSR vs Next.js)
│   ├── scaling.mjs           # per-core / scaling / aggregate report
│   └── verify-parity.mjs     # pre-run parity + payload-size check
├── frameworks/               # One self-contained app per framework
│   ├── pyxle/  pyxle-ssr/  fastapi/  django/  flask/  express/  hono/  nextjs/
│   └── shared/               # Shared DB seed helpers
└── results/                  # Timestamped JSON per run (gitignored)

Reproducibility

Everything needed to reproduce the published numbers is in this one repository — no submodules, no separately-cloned apps. A fresh clone plus the two-box runbook above re-runs every comparison shown on pyxle.dev/benchmarks. Absolute numbers vary with hardware; the relative standings and the methodology are what the page reports, and each published figure links to the raw result JSON it came from.

Contributing

To add a new framework to the API suite:

  1. Create frameworks/<name>/ with the app code.
  2. Implement all six endpoints with identical logic and response format.
  3. Add it (name, port, language) to the FRAMEWORKS registry in bench/api.mjs and the expectations in bench/verify-parity.mjs.
  4. Add it to the server-start block in bench.sh.
  5. Run cd bench && node verify-parity.mjs --suite=api — it must pass before any numbers count.
  6. Update this README.
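When wiring a new framework into the parity expectations, it helps to know what "same JSON shape" can mean in practice. The sketch below is one possible reading, not the code in bench/verify-parity.mjs: `shapeOf` and `sameShape` are hypothetical helpers that compare key names and value types while ignoring the values themselves.

```javascript
// Hypothetical sketch of a structural JSON comparison.
function shapeOf(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    // Describe an array by the shape of its first element.
    return value.length === 0 ? "array<>" : `array<${shapeOf(value[0])}>`;
  }
  if (typeof value === "object") {
    // Sort keys so property order does not affect the comparison.
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${k}:${shapeOf(value[k])}`).join(",")}}`;
  }
  return typeof value; // "string", "number", "boolean"
}

function sameShape(a, b) {
  return shapeOf(a) === shapeOf(b);
}

// The health payload below is the one documented in this README.
console.log(shapeOf({ status: "ok" }));
console.log(sameShape({ id: 1, value: 7 }, { value: 99, id: 2 }));
```

A check like this catches a framework that returns a number as a string or omits a field, while still tolerating the random row values that differ between requests.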

License

MIT
