Production-oriented API for scraping demo pages with Playwright, Express, TypeScript, and Swagger/OpenAPI documentation.
## Features

- Express API with route, controller, service, middleware, and utility layers
- Playwright Chromium scraping through a shared browser service
- Swagger UI at `/docs` and OpenAPI JSON at `/openapi.json`
- Zod request validation, Helmet, CORS, compression, and rate limiting
- Winston structured logging
- Jest unit tests and Playwright API/e2e tests
- Docker image with Playwright browser dependency installation
## Requirements

- Node.js 20+
- npm
- Docker (optional, for container deployment)
## Quick Start

```bash
npm install
cp .env.example .env
npm run dev
```

The API runs on http://localhost:3000 by default.
Useful URLs:
- Root: http://localhost:3000/ (redirects to `/docs`)
- Health: http://localhost:3000/health
- Swagger UI: http://localhost:3000/docs
- OpenAPI JSON: http://localhost:3000/openapi.json
## Scripts

```bash
npm run dev            # Start TypeScript dev server
npm run build          # Compile src/ to dist/
npm start              # Run compiled app
npm run typecheck      # TypeScript check without emit
npm run lint           # ESLint
npm run format:check   # Prettier check
npm test               # Jest tests
npm run test:e2e       # Playwright API/e2e tests
npm run docker:build   # Build Docker image
npm run docker:compose # Start with Docker Compose
npm run zip            # Build and create distribution zip
```

## Endpoints

### GET /

Redirects to `/docs`.
### GET /health

Returns service health, uptime, environment, and timestamp.
### GET /api/scrape/sites

Scrapes available test site links from `SCRAPER_BASE_URL`.
### GET /api/scrape/ecommerce/products?limit=10

Query parameters:

- `limit`: integer from 1 to 50, default 10
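As a sketch of the documented `limit` rules (the helper name is hypothetical, not taken from the actual codebase, which validates with Zod), the query parameter could be parsed like this:

```typescript
// Hypothetical helper mirroring the documented rules:
// an integer from 1 to 50, defaulting to 10 when the parameter is absent.
function parseLimit(raw: string | undefined, max = 50, fallback = 10): number {
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1 || n > max) {
    throw new RangeError(`limit must be an integer from 1 to ${max}`);
  }
  return n;
}

console.log(parseLimit(undefined)); // → 10 (default applied)
console.log(parseLimit("25")); // → 25
```

Rejecting rather than silently clamping out-of-range values keeps the behavior consistent with schema-level validation.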
### POST /api/scrape/run

Content-Type: application/json

```json
{
  "target": "ecommerce",
  "limit": 10
}
```

Supported targets:

- `test-sites`
- `ecommerce`
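The request body above can be sketched as a small runtime check. This is illustrative only (the type and function names are hypothetical); the real project validates with Zod schemas under `src/schemas`:

```typescript
// Hypothetical stand-in for the Zod request schema described above.
type ScrapeTarget = "test-sites" | "ecommerce";

interface ScrapeRunRequest {
  target: ScrapeTarget;
  limit?: number; // 1..50; the server applies its own default
}

function validateScrapeRun(body: unknown): ScrapeRunRequest {
  if (typeof body !== "object" || body === null) {
    throw new TypeError("body must be a JSON object");
  }
  const b = body as { target?: unknown; limit?: unknown };
  if (b.target !== "test-sites" && b.target !== "ecommerce") {
    throw new TypeError('target must be "test-sites" or "ecommerce"');
  }
  if (
    b.limit !== undefined &&
    (typeof b.limit !== "number" ||
      !Number.isInteger(b.limit) ||
      b.limit < 1 ||
      b.limit > 50)
  ) {
    throw new RangeError("limit must be an integer from 1 to 50");
  }
  return { target: b.target, limit: b.limit as number | undefined };
}
```

Rejecting unknown targets up front means the controller layer never hands an unsupported value to the scraper service.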
### POST /api/tests/smoke

Content-Type: application/json

```json
{
  "target": "test-sites"
}
```

## Response Format

Success responses:

```json
{
  "success": true,
  "data": {},
  "meta": {
    "durationMs": 123,
    "timestamp": "2026-05-07T00:00:00.000Z"
  }
}
```

Error responses:
```json
{
  "success": false,
  "error": "Error message",
  "meta": {
    "durationMs": 12,
    "timestamp": "2026-05-07T00:00:00.000Z"
  }
}
```

## Project Structure

```text
playwright-with-nodejs/
|-- src/
|   |-- app.ts
|   |-- index.ts
|   |-- server.ts
|   |-- scraper.ts
|   |-- scraper.test.ts
|   |-- config/
|   |-- controllers/
|   |-- middlewares/
|   |-- routes/
|   |-- schemas/
|   |-- services/
|   |-- types/
|   `-- utils/
|-- tests/
|   `-- e2e/
|-- docker/
|   `-- entrypoint.sh
|-- Dockerfile
|-- docker-compose.yml
|-- jest.config.js
|-- playwright.config.ts
|-- package.json
|-- tsconfig.json
`-- README.md
```
dist/, coverage/, test-results/, and playwright-report/ are generated outputs.
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `NODE_ENV` | `development` | Runtime environment: development, production, or test |
| `PORT` | `3000` | HTTP port |
| `HOST` | `0.0.0.0` | Bind host |
| `LOG_LEVEL` | `info` | Winston log level |
| `CORS_ORIGIN` | `*` | Allowed CORS origin; comma-separated for multiple origins |
| `SCRAPER_BASE_URL` | `https://webscraper.io/test-sites` | Test site source URL |
| `SCRAPER_ECOMMERCE_URL` | `https://webscraper.io/test-sites/e-commerce/allinone` | E-commerce source URL |
| `PLAYWRIGHT_HEADLESS` | `true` | Browser headless mode |
| `PLAYWRIGHT_TIMEOUT_MS` | `30000` | Playwright page/navigation timeout |
| `SCRAPER_MAX_LIMIT` | `50` | Maximum scrape limit used by configuration |
| `RATE_LIMIT_WINDOW_MS` | `60000` | Rate limit window |
| `RATE_LIMIT_MAX` | `60` | Max requests per rate limit window |
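A minimal sketch of how these variables might be read into a typed config object, with the table's defaults applied (helper names are hypothetical; the real logic lives under `src/config`):

```typescript
// Hypothetical config loader mirroring the defaults in the table above.
function intFromEnv(
  env: Record<string, string | undefined>,
  key: string,
  fallback: number
): number {
  const raw = env[key];
  const n = raw === undefined ? fallback : Number(raw);
  if (!Number.isFinite(n)) throw new Error(`${key} must be a number`);
  return n;
}

function loadConfig(env: Record<string, string | undefined> = process.env) {
  return {
    nodeEnv: env.NODE_ENV ?? "development",
    port: intFromEnv(env, "PORT", 3000),
    host: env.HOST ?? "0.0.0.0",
    logLevel: env.LOG_LEVEL ?? "info",
    // CORS_ORIGIN is comma-separated for multiple origins.
    corsOrigin: (env.CORS_ORIGIN ?? "*").split(",").map((s) => s.trim()),
    scraperBaseUrl: env.SCRAPER_BASE_URL ?? "https://webscraper.io/test-sites",
    headless: (env.PLAYWRIGHT_HEADLESS ?? "true") !== "false",
    timeoutMs: intFromEnv(env, "PLAYWRIGHT_TIMEOUT_MS", 30000),
    maxLimit: intFromEnv(env, "SCRAPER_MAX_LIMIT", 50),
    rateLimitWindowMs: intFromEnv(env, "RATE_LIMIT_WINDOW_MS", 60000),
    rateLimitMax: intFromEnv(env, "RATE_LIMIT_MAX", 60),
  };
}
```

Centralizing env parsing like this keeps the rest of the codebase free of raw `process.env` access.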
## Docker

Build and run:

```bash
npm run docker:build
docker run -p 3000:3000 --env-file .env playwright-api:latest
```

Or use Compose:

```bash
npm run docker:compose
```

The Dockerfile uses Debian slim instead of Alpine so Playwright can install Chromium and OS dependencies more reliably. If `package-lock.json` exists, Docker uses `npm ci`; otherwise it falls back to `npm install`. For repeatable production builds, generate and commit `package-lock.json`.
## Production Notes

- Set `NODE_ENV=production`
- Configure `CORS_ORIGIN` instead of using `*`
- Use `LOG_LEVEL=warn` or `error`
- Commit `package-lock.json` for deterministic installs
- Send logs to external storage in production
- Put the service behind HTTPS/reverse proxy
- Set CPU and memory limits for the container
- Monitor `/health`
- Validate target sites are reachable from the deployment environment
## Notes

- Scraping and e2e tests depend on external network access to webscraper.io.
- Swagger uses a relative OpenAPI server URL (`/`) so the "Try it out" button calls the same origin used to open `/docs`.
- Endpoint-level `headless` values are accepted by current schemas for compatibility, but browser mode is controlled by `PLAYWRIGHT_HEADLESS`.
- `src/scraper.ts` is a legacy standalone scraper used by Jest tests; the main API uses the service/controller stack under `src/services` and `src/controllers`.
## License

MIT
