ArchiveBox now stores schedules in the database and lets the orchestrator materialize them into queued Crawl records at the right time. You no longer need host cron, user crontabs, or a separate archivebox_scheduler container when archivebox server is running.
- archivebox schedule ... creates a CrawlSchedule record plus a sealed template Crawl.
- The long-running global orchestrator inside archivebox server watches enabled schedules.
- When a schedule becomes due, the orchestrator creates a new queued Crawl.
- That queued crawl is processed the same way as UI/API-submitted work.
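The flow above can be pictured with a small in-memory model. This is only an illustrative sketch, not ArchiveBox's code: the `Schedule` and `Crawl` classes, their fields, and `materialize_due` are hypothetical names, and the choice to skip missed runs instead of replaying them is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Schedule:
    """Hypothetical stand-in for a CrawlSchedule row plus its template crawl."""
    url: str
    interval: timedelta
    next_run: datetime
    enabled: bool = True


@dataclass
class Crawl:
    """Hypothetical stand-in for a queued Crawl record."""
    url: str
    status: str = "queued"


def materialize_due(schedules: list[Schedule], now: datetime) -> list[Crawl]:
    """Create one queued crawl per enabled schedule that is due at `now`."""
    queued = []
    for schedule in schedules:
        if not schedule.enabled or schedule.next_run > now:
            continue
        # Build a fresh queued crawl from the schedule; the schedule itself is reused.
        queued.append(Crawl(url=schedule.url))
        # Assumption: advance past `now` so a long pause does not enqueue a backlog.
        while schedule.next_run <= now:
            schedule.next_run += schedule.interval
    return queued


start = datetime(2024, 1, 1, 0, 0)
schedules = [
    Schedule("https://example.com/feed.xml", timedelta(days=1), start),
    Schedule("https://example.org/paused.xml", timedelta(days=1), start, enabled=False),
]
first = materialize_due(schedules, start + timedelta(minutes=5))
second = materialize_due(schedules, start + timedelta(minutes=10))
print([crawl.url for crawl in first], len(second))
```

The second call returns nothing because the first call already moved the schedule's next run to the following day, which mirrors the idea that each due time produces exactly one queued crawl.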
One-shot foreground flows such as archivebox add ... continue to process only the crawl they were asked to run. They do not also sweep and execute unrelated scheduled crawls.
cd ~/archivebox/data
archivebox schedule --every=daily --depth=1 https://example.com/feed.xml
archivebox schedule --every='0 */6 * * *' https://example.com/feed.xml
archivebox schedule --show
archivebox schedule --clear
archivebox schedule --run-all
archivebox schedule --foreground
Accepted schedule formats:
- Aliases: minute, hour, day, week, month, year, daily, weekly, monthly, yearly
- Cron expressions: e.g. 0 */6 * * *
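To see which times a cron expression such as 0 */6 * * * selects, the following sketch evaluates a five-field expression against a datetime. It is a simplified matcher written for illustration (the helper names `field_matches` and `cron_matches` are hypothetical) and says nothing about how ArchiveBox parses schedules internally. It handles only `*`, `*/n`, single numbers, `a-b` ranges, and comma lists; month and weekday names, `7` for Sunday, and cron's special rule for combining day-of-month with day-of-week are left out.

```python
from datetime import datetime

# (minimum, maximum) for minute, hour, day of month, month, day of week
BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def field_matches(spec: str, value: int, low: int, high: int) -> bool:
    """Check one cron field (e.g. '*/6' or '1-5') against a value."""
    for part in spec.split(","):
        rng, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        if start <= value <= end and (value - start) % step_n == 0:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` falls on a minute selected by `expr`."""
    specs = expr.split()
    # cron counts Sunday as 0, while datetime.weekday() counts Monday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    values = [when.minute, when.hour, when.day, when.month, cron_weekday]
    return all(
        field_matches(spec, value, low, high)
        for spec, value, (low, high) in zip(specs, values, BOUNDS)
    )


day = datetime(2024, 1, 1)
hours = [h for h in range(24) if cron_matches("0 */6 * * *", day.replace(hour=h))]
print(hours)  # [0, 6, 12, 18]
```

In other words, 0 */6 * * * fires at minute 0 of every sixth hour, four times a day.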
archivebox schedule --run-all enqueues every enabled schedule immediately.
archivebox schedule --foreground runs the global orchestrator in the foreground, which is useful outside archivebox server if you want a dedicated long-running scheduler/worker process without the web UI.
Running archivebox schedule --every=day with no import_path creates a recurring maintenance schedule that queues archivebox://update crawls.
With the new orchestrator flow, you only need the main archivebox service:
services:
  archivebox:
    image: archivebox/archivebox:dev
    command: server --quick-init 0.0.0.0:8000
    volumes:
      - ./data:/data
Create schedules with:
docker compose run --rm archivebox schedule --every=weekly --depth=1 https://example.com/feed.xml
docker compose run --rm archivebox schedule --show
If the main archivebox server container is already running, its orchestrator will pick up future scheduled runs automatically. There is no scheduler sidecar to restart.
Archive a Twitter mirror once a week:
archivebox schedule --every=weekly --depth=1 'https://nitter.net/ArchiveBoxApp'
Archive a subreddit and linked discussions once a week:
archivebox config --set URL_WHITELIST='^http(s)?:\/\/(.+)?teddit\.net\/?.*$'
archivebox schedule --every=weekly --overwrite --depth=1 'https://teddit.net/r/DataHoarder/'
Archive Hacker News every day:
archivebox config --set URL_BLACKLIST='^http(s)?:\/\/(.+\.)?(youtube\.com|amazon\.com)\/.*$'
archivebox schedule --every=daily --depth=1 'https://news.ycombinator.com'
Queue a daily maintenance update:
archivebox schedule --every=day
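Patterns like the ones passed to URL_WHITELIST and URL_BLACKLIST in the examples above can be tried against sample URLs in a Python shell before they are saved to config. The sketch below assumes patterns are applied with `re.search` and no extra flags; how ArchiveBox actually compiles and applies them is not stated here, and the `hits` helper and sample URLs are made up for the example. It also compares two spellings of a two-domain pattern to show why alternatives are best wrapped in a single group.

```python
import re

# Pattern copied from the URL_WHITELIST example.
WHITELIST = r'^http(s)?:\/\/(.+)?teddit\.net\/?.*$'

# Two spellings of "youtube.com or amazon.com":
# without and with one group around both alternatives.
UNGROUPED = r'^http(s)?:\/\/(.+\.)?(youtube\.com)|(amazon\.com)\/.*$'
GROUPED = r'^http(s)?:\/\/(.+\.)?(youtube\.com|amazon\.com)\/.*$'


def hits(pattern: str, url: str) -> bool:
    """Return True if `pattern` matches anywhere in `url` (assumes re.search)."""
    return re.search(pattern, url) is not None


print(hits(WHITELIST, "https://teddit.net/r/DataHoarder/"))
print(hits(WHITELIST, "https://example.com/"))

samples = [
    "https://www.youtube.com/watch?v=1",
    "https://news.ycombinator.com",
    "https://example.com/?ref=amazon.com/x",
]
for url in samples:
    print(url, hits(UNGROUPED, url), hits(GROUPED, url))
```

The ungrouped spelling also matches the last sample URL: a top-level `|` splits the whole pattern into two independent alternatives, so the second half matches `amazon.com/` anywhere in the string, not only in the host.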