The Memory of Your Internet — Archive Everything You Browse.
English | Chinese
A self-hosted personal web archiving system that automatically captures and preserves web pages you visit in Chrome — HTML, CSS, JavaScript, images, and all. When the original page goes offline, you can still browse your archived copy with styles and layout intact.
```
Chrome + Tampermonkey ──HTTP POST──▶ Go Server ──▶ PostgreSQL (metadata)
(auto-capture on                         │         + File System (assets)
 page load)                              │
                                         ▼
                                      Web UI ──▶ Browse / Search / Replay
```
- A Tampermonkey userscript runs in your browser, automatically capturing the full DOM and resources once the page finishes loading. If significant DOM changes occur afterward, it submits one additional update.
- The Go server acknowledges archive/update requests as soon as the snapshot metadata is stored. Resource downloading, deduplication, URL rewriting, and snapshot finalization continue in the background.
- A built-in Web UI lets you list, search, and replay any archived page — fully offline, no external dependencies.
- High-fidelity replay — CSSOM serialization, computed-style inlining, and anti-refresh protection reproduce pages as faithfully to the original as possible
- Full-page capture — HTML, CSS, JS, images, fonts; resource URLs are rewritten to local paths
- Cross-origin resource recovery — server-side extraction and download of resources blocked by CORS
- Content-hash deduplication — identical resources shared across pages are stored only once (SHA-256)
- Version history — same URL archived multiple times, distinguished by timestamp
- Timeline view — browse all snapshots of a URL on a visual timeline (like web.archive.org), with prev/next navigation between snapshots
- Smart dedup — session-level + server-level dedup prevents redundant captures; content-hash comparison skips unchanged pages
- Dynamic content support — captures the live DOM state; MutationObserver triggers one auto-update if significant changes occur after initial capture
- SPA-aware — detects SPA navigation, resets capture state per route
- Anti-refresh protection — archived pages are frozen: timers, WebSockets, and navigation APIs are neutralized
- Web UI — responsive interface to browse, full-text search (page content, URL, and title), filter by date range and domain, and replay archived pages
- RESTful API — programmatic access to all archiving and query operations
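To make the content-hash deduplication concrete, here is a small sketch of how a resource's storage path could be derived from its SHA-256 digest. This is an illustration only: the two-level sharding scheme is inferred from the `data/resources/ab/cd/<sha256>` layout shown in the Data Directory section, not taken from the server source.

```shell
# Sketch of the assumed dedup layout: shard by the first four hex chars
# of the SHA-256 digest (on macOS, replace sha256sum with: shasum -a 256).
hash=$(printf 'body { color: red; }' | sha256sum | cut -d' ' -f1)
d1=$(printf %s "$hash" | cut -c1-2)
d2=$(printf %s "$hash" | cut -c3-4)
path="data/resources/${d1}/${d2}/${hash}.css"
echo "$path"
```

Two pages referencing the same stylesheet produce the same digest, so they resolve to the same file and it is stored only once.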
- PostgreSQL 14+
- Chrome or Firefox + Tampermonkey extension (v5.3+)
The fastest way to get started. Docker Compose will set up both the server and PostgreSQL automatically.
```bash
# Clone the repository
git clone https://github.com/icodeface/wayback-archiver.git
cd wayback-archiver

# Start all services
docker compose up -d

# View logs
docker compose logs -f wayback
```

The server will be available at http://localhost:8080. Skip to step 4 (Install the Userscript).
For detailed Docker configuration and deployment options, see docs/DOCKER.md.
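If you want the archive data on a specific host disk, a compose override file is one option. This is a sketch, not the project's documented setup: the service name `wayback` is taken from the logs command above, while the container-side path `/app/data` is a hypothetical placeholder that must match the image's actual `DATA_DIR` (check docs/DOCKER.md and the repository's docker-compose.yml before using it):

```yaml
# docker-compose.override.yml (illustrative; verify paths against the image)
services:
  wayback:
    volumes:
      - /mnt/archive/data:/app/data
```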
Download the latest release from the Releases page:

- macOS: `wayback-server-darwin-amd64.tar.gz` (Intel) or `wayback-server-darwin-arm64.tar.gz` (Apple Silicon)
- Linux: `wayback-server-linux-amd64.tar.gz` or `wayback-server-linux-arm64.tar.gz`
- Windows: `wayback-server-windows-amd64.zip`
- Userscript: `wayback-userscript.js`

Extract the archive:

```bash
# macOS/Linux
tar -xzf wayback-server-*.tar.gz

# Windows: extract the .zip file
```

Building from source? See docs/BUILD.md for manual compilation instructions.
```bash
# PostgreSQL uses your current system username as the default database user.
# If your system username is alice, this is equivalent to: createdb -U alice wayback
createdb wayback

# Run the schema (init_db.sql is included in the release archive)
psql wayback < init_db.sql
```

Upgrade note for existing installs: any installation upgrading from a version earlier than 1.1.0 to 1.1.0 or later needs the `pages.snapshot_state` column. Current server builds automatically apply this change during startup; if your database user does not have `ALTER TABLE` permission, run the following idempotent SQL manually first:

```sql
ALTER TABLE pages ADD COLUMN IF NOT EXISTS snapshot_state VARCHAR(16);
UPDATE pages SET snapshot_state = 'ready' WHERE snapshot_state IS NULL OR snapshot_state = '';
ALTER TABLE pages ALTER COLUMN snapshot_state SET DEFAULT 'pending';
ALTER TABLE pages ALTER COLUMN snapshot_state SET NOT NULL;
```
```bash
# Optional: create a .env file for custom configuration
# See the Configuration section below for available options
./wayback-server
```

The server starts at http://localhost:8080 by default.
If you need a proxy for downloading external resources:

```bash
export http_proxy=http://127.0.0.1:7897
export https_proxy=http://127.0.0.1:7897
./wayback-server
```

- Download `wayback-userscript.js` from the Releases page
- Open the Tampermonkey dashboard in your browser
- Click "Create a new script"
- Paste the contents of `wayback-userscript.js`
- Save and enable
Chrome users: Right-click the Tampermonkey icon → Manage extension, then enable the "Allow user scripts" toggle. Firefox does not require this step.
That's it. Pages are automatically archived as soon as they load. Open http://localhost:8080 to browse your archive.
Puppeteer Integration: For automated archiving, see docs/PUPPETEER.md.
Environment variables (or .env file in the project root):
The server automatically loads .env from the working directory if it exists. You can also set environment variables directly.
| Variable | Default | Description |
|---|---|---|
| `DB_HOST` | `localhost` | PostgreSQL host |
| `DB_PORT` | `5432` | PostgreSQL port |
| `DB_USER` | `postgres` | Database user. PostgreSQL defaults to the current system username, so leaving this unset is usually recommended. |
| `DB_PASSWORD` | (empty) | Database password |
| `DB_NAME` | `wayback` | Database name |
| `DB_SSLMODE` | `disable` | PostgreSQL SSL mode used by the server connection |
| `SERVER_HOST` | `127.0.0.1` | Server bind address (`0.0.0.0` = all interfaces, `127.0.0.1` = localhost only) |
| `SERVER_PORT` | `8080` | HTTP server port |
| `ALLOWED_ORIGINS` | `http://localhost:8080,http://127.0.0.1:8080` | CORS allowed origins (comma-separated). For remote deployment, add your domain: `https://your-domain.com` |
| `DATA_DIR` | `./data` | Storage directory for HTML and resources |
| `LOG_DIR` | `./data/logs` | Log file directory |
| `AUTH_PASSWORD` | (empty) | HTTP Basic Auth password (disabled when empty; username: `wayback`). REQUIRED for remote deployment |
| `RESOURCE_METADATA_CACHE_MB` | 10% of system memory | Metadata cache budget for resource URL lookups plus HTTP freshness/validator reuse and revalidation. `RESOURCE_CACHE_MB` is still accepted as a legacy alias. |
| `ENABLE_DEBUG_API` | `false` | Enable `/api/debug/*` endpoints (memstats, gc, pprof). Keep disabled unless you are actively debugging. |
| `COMPRESSION_LEVEL` | `-1` | Compression level: 1 (fastest) to 9 (best), -1 (default/balanced). Response compression is always enabled and auto-negotiated via `Accept-Encoding` |
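Putting a few of these together, a minimal `.env` for an instance reachable from other machines on a LAN might look like this (all values are illustrative, not recommendations; note that binding to `0.0.0.0` makes setting `AUTH_PASSWORD` essential):

```bash
DB_NAME=wayback
SERVER_HOST=0.0.0.0
SERVER_PORT=8080
ALLOWED_ORIGINS=http://192.168.1.50:8080
DATA_DIR=./data
AUTH_PASSWORD=change_me
```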
Server-side (responses): always enabled, auto-negotiated

- Clients that send `Accept-Encoding: gzip` get compressed responses
- Clients that don't support gzip get uncompressed responses
- No configuration needed - works automatically

Client-side (uploads): configurable in browser/src/config.ts

- For local deployment (default): keep `ENABLE_COMPRESSION: false`
  - Localhost transfer is already fast
  - No CPU overhead from compression
- For remote deployment: set `ENABLE_COMPRESSION: true`
  - 95%+ reduction for uploads (large HTML snapshots)
  - Rebuild the userscript: `cd browser && npm run build`
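The 95%+ figure is plausible because snapshot HTML is highly repetitive markup, which gzip handles extremely well. A quick local sketch (the ratio on real pages will vary; this only illustrates the order of magnitude):

```shell
# Compare raw vs gzipped size of repetitive HTML, roughly simulating a
# large DOM snapshot made of many similar elements.
html=$(yes '<div class="post"><p>hello world</p></div>' | head -2000)
raw=$(( $(printf '%s' "$html" | wc -c) ))
gz=$(( $(printf '%s' "$html" | gzip -9 | wc -c) ))
echo "raw=${raw} bytes, gzipped=${gz} bytes"
```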
For deploying to a remote server, see REMOTE_DEPLOYMENT.md for detailed instructions.
Quick setup:
```bash
# .env configuration
ALLOWED_ORIGINS=https://your-domain.com
AUTH_PASSWORD=your_secure_password
SERVER_HOST=0.0.0.0

# browser/src/config.ts
SERVER_URL: 'https://your-domain.com/api/archive'
AUTH_PASSWORD: 'your_secure_password'
ENABLE_COMPRESSION: true  # Enable upload compression for remote deployment
```

Security Notes:

- Always use HTTPS for remote deployment
- Set a strong `AUTH_PASSWORD`
- Limit `ALLOWED_ORIGINS` to trusted domains only
- `Origin: null` is intentionally rejected because it also covers sandboxed iframes and data/file-backed opaque origins
- Both CORS and Basic Auth are required for security (defense in depth)
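One common way to satisfy the HTTPS requirement is a reverse proxy in front of the Go server. The following nginx sketch is an assumption, not the project's documented setup: the domain and certificate paths are placeholders, and the 32m body limit mirrors the server's 32 MiB decompressed-JSON cap mentioned under Capture Notes (note that nginx limits the on-the-wire body, which is smaller when upload compression is enabled):

```nginx
server {
    listen 443 ssl;
    server_name your-domain.com;

    ssl_certificate     /etc/ssl/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/ssl/your-domain.com/privkey.pem;

    # Snapshot uploads can be large; mirror the server's body limit
    client_max_body_size 32m;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```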
Performance Notes:
- Enable `ENABLE_COMPRESSION` in browser config for remote deployment
- Reduces upload bandwidth by 95%+ (especially for large HTML snapshots)
- Response compression is automatic (no configuration needed)
- Minimal CPU overhead, significant network savings
Capture Notes:
- `/api/archive` and `/api/archive/:id` reject decompressed JSON bodies larger than 32 MiB with HTTP `413`
- Cross-origin iframe snapshots remain enabled, but the browser bridge now signs requests and returns frame HTML over a private `MessageChannel` instead of public `window.postMessage`
- Resource downloads only forward cookies that still match the target URL under browser rules (`hostOnly`, `domain`, `path`, `secure`, expiry, `SameSite`, partition top-level site)
- The browser script skips local/private targets including `localhost`, `127.0.0.1`, `172.16.0.0/12`, `169.254.0.0/16`, `::1`, `fc00::/7`, `fe80::/10`, and `.local`
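The actual skip logic lives in the userscript (browser/src), but the listed patterns can be approximated in shell for intuition. This is a rough sketch only: the real check parses CIDR ranges properly, while these glob patterns only match common literal forms.

```shell
# Rough approximation of the private/local target skip list above.
is_private_host() {
  case "$1" in
    localhost|*.local) return 0 ;;
    127.*) return 0 ;;                                  # IPv4 loopback
    172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 0 ;; # 172.16.0.0/12
    169.254.*) return 0 ;;                              # IPv4 link-local
    ::1|f[cd]??:*|fe80:*) return 0 ;;                   # ::1, fc00::/7, fe80::/10
    *) return 1 ;;
  esac
}
is_private_host '172.20.1.5' && echo 'skip'
is_private_host 'example.com' || echo 'archive'
```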
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/version` | Server version and build info |
| POST | `/api/archive` | Create a page archive |
| PUT | `/api/archive/:id` | Update an existing archive snapshot |
| GET | `/api/pages` | List all archived pages |
| GET | `/api/pages/:id` | Get page details |
| GET | `/api/pages/:id/content` | Get page content as Markdown (for AI/LLM consumption) |
| GET | `/api/search?q=keyword` | Search pages by URL or title |
| GET | `/api/pages/timeline?url=URL` | Get all snapshots of a URL (timeline view) |
| GET | `/api/logs` | List available log files |
| GET | `/api/logs/latest` | Get latest log file content (supports `?tail=N&grep=keyword`) |
| GET | `/api/logs/:filename` | Get log file content (supports `?tail=N&grep=keyword`) |
| GET | `/view/:id` | Replay an archived page |
| GET | `/timeline?url=URL` | Visual timeline page for a URL |
| GET | `/logs` | Server logs viewer |
Returns `{ status, page_id, action }`, where `action` is `created` or `unchanged` (content identical; only `last_visited` is updated).

When `action` is `created`, the response is sent immediately after the page row and raw HTML are stored; resource downloads and HTML rewriting continue in the background.

The server tracks background create progress with `snapshot_state` (`pending`, `ready`, `failed`). Only `ready` snapshots are treated as fully deduplicated; if the background finalize fails, a later capture of the same URL and content hash reuses the same page row and automatically retries the finalize instead of getting stuck as `unchanged`.

Accepts the same body as POST. Returns immediately once the server has accepted the update request. If the content changed, resource re-processing and the final snapshot swap continue in the background; old HTML is queued for delayed deletion after a successful swap. Returns `{ status, page_id, action }`, where `action` is `updated` or `unchanged`.

The request body `url` must exactly match the existing page URL for `:id`; otherwise the server rejects the update with HTTP 400 to prevent cross-page snapshot corruption.
wayback-archiver/
├── Makefile # Build, test, cross-compile
├── bin/ # Build output (server binary + userscript)
├── browser/ # Tampermonkey userscript (TypeScript)
│ ├── src/
│ │ ├── main.ts # Entry point & orchestration
│ │ ├── config.ts # Constants
│ │ ├── types.ts # TypeScript interfaces
│ │ ├── page-filter.ts # URL filtering logic
│ │ ├── page-freezer.ts # Freeze page runtime state
│ │ ├── dom-collector.ts # DOM serialization
│ │ └── archiver.ts # Server communication
│ ├── dist/ # Built userscript
│ └── build.js # Bundle script
│
├── server/ # Go backend
│ ├── cmd/server/main.go # Entry point
│ ├── internal/
│ │ ├── api/ # HTTP handlers (modular)
│ │ ├── config/ # Environment-based config
│ │ ├── database/ # PostgreSQL operations
│ │ ├── logging/ # File-based logging with rotation
│ │ ├── models/ # Data models
│ │ └── storage/ # File storage & dedup
│ └── web/ # Web UI static files
│
├── .env.example # Configuration template
└── tests/ # Test suites
├── browser/ # Browser-side tests
└── server/ # Server-side & E2E tests
data/
├── html/ # HTML snapshots, organized by date
│ └── 2026/03/09/
│ └── <timestamp>_<hash>.html
├── logs/ # Server logs, rotated by size (10MB) and date (7-day retention)
│ ├── wayback-2026-03-12.001.log
│ └── wayback-2026-03-12.002.log
└── resources/ # Deduplicated static resources
└── ab/cd/
└── <sha256>.css
See docs/BUILD.md for build instructions, cross-compilation, and testing.
This project includes an Agent skill for AI-assisted querying and exploration of your archived pages. Use it to search, analyze, and interact with your archive through natural language.
- Some cross-origin resources may still fail due to server-side 403/404 responses
- Dynamically injected scripts (loaded via JS at runtime) may not be captured
- Tracking pixels and analytics URLs with dynamic parameters are not preserved (they don't affect page rendering)
- Very large media files (video, large images) will consume significant storage
MIT


