# Releases: ScrapingBee/scrapingbee-cli
## v1.4.1 - crawl extension fix (non-seed pages)
### Highlights
Patch release closing bug SCR-371: `scrapingbee crawl` with `--extract-rules`,
`--ai-extract-rules`, or `--ai-query` now writes every discovered page as `N.json`, not
just the seed.

The full crawl + extract + `export` → CSV pipeline produces N-row CSVs
instead of silently collapsing to 1 row.
### What's fixed
The v1.4.0 "Crawl extension priority" fix only covered the seed URL. Discovered pages
still fell through to the URL-path heuristic and were saved as `N.html` despite a JSON
body, so `scrapingbee export --format csv` silently dropped every non-seed page.

`_preferred_extension_from_scrape_params` now forces `"json"` for `--extract-rules`,
`--ai-extract-rules`, and `--ai-query`, so every crawled page is written as `N.json`.

As a side effect, the `_url` column in exported CSVs is now populated for every row
(the manifest records the correct `.json` path per URL).
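The release note names `_preferred_extension_from_scrape_params` as the helper that now forces JSON. The following is a minimal sketch of that behavior, assuming a dict of scrape parameters keyed by flag name; the helper's real signature in the codebase may differ:

```python
# Sketch only: illustrates the fix described above, not the actual CLI source.
from urllib.parse import urlparse

def preferred_extension(scrape_params: dict, url: str) -> str:
    # Any extraction parameter means the response body is JSON,
    # so the output extension must be .json regardless of URL path.
    extraction_keys = {"extract_rules", "ai_extract_rules", "ai_query"}
    if extraction_keys & scrape_params.keys():
        return "json"
    # Old URL-path heuristic that misfired for non-seed pages:
    last_segment = urlparse(url).path.rsplit("/", 1)[-1]
    return last_segment.rsplit(".", 1)[-1] if "." in last_segment else "html"

print(preferred_extension({"extract_rules": {"title": "h1"}}, "https://example.com/p"))
# extraction params win -> "json"
print(preferred_extension({}, "https://example.com/p"))
# no extraction params -> falls back to the path heuristic -> "html"
```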
### Upgrade
```shell
pip install --upgrade scrapingbee-cli
scrapingbee --version  # => 1.4.1
```
See CHANGELOG.md for the full entry.
### What's Changed
- [SCR-371] Fix `crawl --extract-rules` saving non-seed pages as `.html` by @kostas-jakeliunas-sb in #18
Full Changelog: v1.4.0...v1.4.1
## v1.4.0 — Smart Extract, tutorial, and path language
### What's Changed
- v1.4.0 — Smart Extract, tutorial, and path language by @sahilsunny in #17
Full Changelog: v1.3.1...v1.4.0
## v1.3.1 Scraping configurations, parameter flexibility, and security improvements
### What's Changed
- Scraping config and underscore support by @sahilsunny in #16
Full Changelog: v1.3.0...v1.3.1
## v1.3.0 Security hardening for shell execution features
### What's Changed
- v1.3.0 — Security hardening for shell execution features by @sahilsunny in #15
Full Changelog: v1.2.3...v1.3.0
## v1.2.3 ChatGPT params, bug fixes, better user agent header and DX
### What's Changed
- Add version and platform info to User-Agent header by @kostas-jakeliunas-sb in #12
- Add advanced usage examples documentation by @kostas-jakeliunas-sb in #13
- ChatGPT params, bug fixes, exact credits, better DX by @sahilsunny in #14
### New Contributors
- @kostas-jakeliunas-sb made their first contribution in #12
Full Changelog: v1.2.2...v1.2.3
## v1.2.2 Restructure plugin layout, upgrade AGENTS.md, add skill directories for Copilot and OpenCode
### What's Changed
- Restructure plugin layout, upgrade AGENTS.md, add skill directories for Copilot and OpenCode by @sahilsunny in #11
Full Changelog: v1.2.1...v1.2.2
## v1.2.1 Fixed Claude marketplace file typo
## v1.2.0 — AI extraction, --update-csv, cron scheduling, crawl filtering, per-command options
### What's Changed
- feat: v1.2.0 — AI extraction, --update-csv, cron scheduling, crawl filtering, per-command options by @sahilsunny in #6
Full Changelog: v1.1.0...v1.2.0
## v1.1.0 Shell-Safe UX, Position-Independent Options, Pipelines & Batch
### Highlights
- **Global options work anywhere** — `scrapingbee google "query" --verbose --output-file out.json` now works. No more "must appear before the subcommand" errors.
- **No shell quoting needed** — `--extract-field organic_results.url` (dot syntax replaces `[]` brackets), `--duration short` (replaces `"<4"`). Every option is shell-safe.
- **Pipelines without jq** — chain any search → batch in one line:
  `scrapingbee google "query" --extract-field organic_results.url > urls.txt`
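The dot syntax behind `--extract-field` (e.g. `organic_results.url`) can be pictured as walking the JSON response key by key, fanning out over lists. This is a hedged sketch of the idea, not the CLI's actual implementation:

```python
# Illustrative only: resolves a dot path against a parsed JSON response.
def extract_field(data, dot_path: str):
    """Walk dot_path through dicts/lists; lists fan out to one value each."""
    for key in dot_path.split("."):
        if isinstance(data, list):
            data = [item[key] for item in data]  # apply the key across the list
        else:
            data = data[key]
    return data

response = {"organic_results": [{"url": "https://a.example"},
                                {"url": "https://b.example"}]}
print(extract_field(response, "organic_results.url"))
# ['https://a.example', 'https://b.example']
```

One URL per result, ready to write line by line into `urls.txt` for the next pipeline stage.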
### New Commands
- `export` — merge batch/crawl output to CSV, NDJSON, or TXT
- `schedule` — run any command on a repeating interval (`--every 1h`) with automatic change detection (`--auto-diff`)
### New Global Flags
`--extract-field`, `--fields`, `--diff-dir`, `--resume`, `--no-progress`, `--chunk-size`/`--chunk-overlap` (RAG output), `--retries`/`--backoff`
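A sliding-window chunker illustrates what `--chunk-size`/`--chunk-overlap` imply for RAG output; whether the CLI counts characters or tokens, and how it handles boundaries, are assumptions here:

```python
# Illustrative sliding-window chunker; not the CLI's actual implementation.
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    # Each chunk starts chunk_size - chunk_overlap characters after the last,
    # so consecutive chunks share chunk_overlap characters of context.
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```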
### Batch & Crawl
- Concurrency control, progress counter, resume interrupted jobs
- Change detection via `--diff-dir` (MD5-based, unchanged files skipped)
- Enriched `manifest.json` with `credits_used`, `latency_ms`, `content_md5`, `fetched_at`
- Crawl from sitemap (`--from-sitemap`), crawl resume, crawl manifest
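The MD5-based change detection behind `--diff-dir` amounts to a per-file hash comparison against the previous run. A minimal sketch, assuming one file per page in the diff directory (the CLI's actual file layout may differ):

```python
# Illustrative only: skip a page when its MD5 matches the previous run's copy.
import hashlib
import pathlib

def changed_since(diff_dir: str, name: str, new_content: bytes) -> bool:
    """True if new_content differs from the copy saved under diff_dir."""
    old = pathlib.Path(diff_dir) / name
    if not old.exists():
        return True  # never fetched before, so treat as changed
    return (hashlib.md5(old.read_bytes()).hexdigest()
            != hashlib.md5(new_content).hexdigest())
```

A run would call this per page and write the new file only when it returns `True`, which is how unchanged files get skipped.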
### Pipelines (no jq, no sed)
| Pattern | Command |
|---|---|
| SERP → scrape | `google QUERY --extract-field organic_results.url > urls.txt` → `scrape --input-file urls.txt` |
| Amazon search → details → CSV | `amazon-search QUERY --extract-field products.asin > asins.txt` → `amazon-product --input-file asins.txt` → `export --format csv` |
| YouTube search → metadata | `youtube-search QUERY --extract-field results.link > vids.txt` → `youtube-metadata --input-file vids.txt` |
| Monitor for changes | `scrape --input-file urls.txt --diff-dir old_run/ --output-dir new_run/` |
### AI Agent Support
Multi-tool agent compatibility — skill and pipeline agent definitions for Claude Code, Cursor, Amp, RooCode, Windsurf, Kiro, Gemini CLI, GitHub Copilot, Augment Code, OpenCode, Amazon Q, and Factory AI.
### Testing
- 343 unit tests (help-output tests for every command and parameter)
- 182 E2E tests (0 skipped)
- CI via GitHub Actions (Python 3.10–3.13)