Releases: ScrapingBee/scrapingbee-cli

v1.4.1 - crawl extension fix (non-seed pages)

17 Apr 13:59
bde6788

Highlights

Patch release closing bug SCR-371: scrapingbee crawl with --extract-rules,
--ai-extract-rules, or --ai-query now writes every discovered page as N.json, not
just the seed.

The full crawl + extract + export → CSV pipeline produces N-row CSVs
instead of silently collapsing to 1 row.

What's fixed

  • The v1.4.0 "Crawl extension priority" fix only covered the seed URL. Discovered pages still fell through to the URL-path heuristic and were saved as N.html despite a JSON body, so scrapingbee export --format csv silently dropped every non-seed page.
  • _preferred_extension_from_scrape_params now forces "json" for --extract-rules, --ai-extract-rules, and --ai-query, so every crawled page is written as N.json.
  • The _url column in exported CSVs is also populated for every row as a side effect (the manifest now records the correct .json path per URL).
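The selection rule described above can be sketched roughly as follows. This is an illustrative reconstruction, not the project's actual code: the function name preferred_extension, the parameter keys (extract_rules, ai_extract_rules, ai_query), and the "html" fallback are assumptions made for the example.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical parameter keys mirroring the CLI flags; the real internal names may differ.
JSON_FORCING_PARAMS = ("extract_rules", "ai_extract_rules", "ai_query")


def preferred_extension(scrape_params: dict, url: str) -> str:
    """Pick a file extension for a crawled page (illustrative sketch only)."""
    # Any extraction option yields a JSON body, so it takes priority over the URL,
    # for the seed and for every discovered page alike.
    if any(scrape_params.get(name) for name in JSON_FORCING_PARAMS):
        return "json"
    # Otherwise fall back to a URL-path heuristic, defaulting to "html" (assumed default).
    ext = posixpath.splitext(urlparse(url).path)[1].lstrip(".").lower()
    return ext or "html"
```

With a rule of this shape, a discovered page such as https://example.com/p/1.html would be saved with a .json extension whenever an extraction option is set, which is what lets a later export step pick it up.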

Upgrade

pip install --upgrade scrapingbee-cli
scrapingbee --version # => 1.4.1

See CHANGELOG.md for the full entry.

What's Changed

Full Changelog: v1.4.0...v1.4.1

v1.4.0 — Smart Extract, tutorial, and path language

02 Apr 11:26
ea13734

What's Changed

  • v1.4.0 — Smart Extract, tutorial, and path language by @sahilsunny in #17

Full Changelog: v1.3.1...v1.4.0

v1.3.1 Scraping configurations, parameter flexibility, and security improvements

30 Mar 13:30
84b29f7

What's Changed

Full Changelog: v1.3.0...v1.3.1

v1.3.0 Security hardening for shell execution features

27 Mar 15:26
5ff0104

What's Changed

  • v1.3.0 — Security hardening for shell execution features by @sahilsunny in #15

Full Changelog: v1.2.3...v1.3.0

v1.2.3 ChatGPT params, bug fixes, better user agent header and DX

25 Mar 07:42
b0c92e3

What's Changed

New Contributors

Full Changelog: v1.2.2...v1.2.3

v1.2.2 Restructure plugin layout, upgrade AGENTS.md, add skill directories for Copilot and OpenCode

16 Mar 16:18
0ffa36f

What's Changed

  • Restructure plugin layout, upgrade AGENTS.md, add skill directories for Copilot and OpenCode by @sahilsunny in #11

Full Changelog: v1.2.1...v1.2.2

v1.2.1 Fixed claude marketplace file typo

16 Mar 15:26
2014d37

What's Changed

Full Changelog: v1.2.0...v1.2.1

v1.2.0 — AI extraction, --update-csv, cron scheduling, crawl filtering, per-command options

16 Mar 11:36
24dcbb0

What's Changed

  • feat: v1.2.0 — AI extraction, --update-csv, cron scheduling, crawl filtering, per-command options by @sahilsunny in #6

Full Changelog: v1.1.0...v1.2.0

v1.1.0 Shell-Safe UX, Position-Independent Options, Pipelines & Batch

03 Mar 15:44
d565756

Highlights

  • Global options work anywhere: scrapingbee google "query" --verbose --output-file out.json now works. No more "must appear before the subcommand" errors.
  • No shell quoting needed: --extract-field organic_results.url (dot syntax replaces [] brackets), --duration short (replaces "<4"). Every option is shell-safe.
  • Pipelines without jq — chain any search → batch in one line: scrapingbee google "query" --extract-field organic_results.url > urls.txt
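To make the dot syntax concrete, here is a minimal sketch of how a path such as organic_results.url could be resolved against a JSON response. The function extract_field and its list-handling semantics (each path segment is applied to every element of a list) are assumptions for illustration; the CLI's actual resolution rules may differ.

```python
from typing import Any


def extract_field(data: Any, path: str) -> list:
    """Resolve a dot-separated path, fanning out over lists (illustrative sketch)."""
    current = [data]
    for key in path.split("."):
        matched = []
        for item in current:
            # A list is traversed element by element; anything else is a single candidate.
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    matched.append(candidate[key])
        current = matched
    # Flatten one level so a final list value yields one output entry per element.
    values = []
    for value in current:
        if isinstance(value, list):
            values.extend(value)
        else:
            values.append(value)
    return values


response = {"organic_results": [{"url": "https://a.example"}, {"url": "https://b.example"}]}
urls = extract_field(response, "organic_results.url")
```

Writing one value per line from a result like urls is what makes the output usable as an --input-file for a follow-up batch command.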

New Commands

  • export — merge batch/crawl output to CSV, NDJSON, or TXT
  • schedule — run any command on a repeating interval (--every 1h) with automatic change detection (--auto-diff)

New Global Flags

--extract-field, --fields, --diff-dir, --resume, --no-progress, --chunk-size/--chunk-overlap (RAG output), --retries/--backoff

Batch & Crawl

  • Concurrency control, progress counter, resume interrupted jobs
  • Change detection via --diff-dir (MD5-based, unchanged files skipped)
  • Enriched manifest.json with credits_used, latency_ms, content_md5, fetched_at
  • Crawl from sitemap (--from-sitemap), crawl resume, crawl manifest
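The MD5-based change detection mentioned above can be illustrated with a short sketch. It assumes a comparison of the freshly fetched body against the file saved by a previous run; the helper names content_md5 and is_unchanged are hypothetical and not taken from the CLI's source.

```python
import hashlib
from pathlib import Path


def content_md5(body: bytes) -> str:
    """Hex MD5 digest of a response body, used as a change fingerprint (not for security)."""
    return hashlib.md5(body).hexdigest()


def is_unchanged(body: bytes, previous_file: Path) -> bool:
    """Return True when the new body matches the file from an earlier run (illustrative)."""
    # No earlier file means there is nothing to compare against, so treat it as changed.
    if not previous_file.is_file():
        return False
    return content_md5(previous_file.read_bytes()) == content_md5(body)
```

A run that skips pages for which is_unchanged returns True would write only new or modified content, which matches the "unchanged files skipped" behaviour described for --diff-dir.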

Pipelines (no jq, no sed)

Pattern                         Command
SERP → scrape                   google QUERY --extract-field organic_results.url > urls.txt
                                scrape --input-file urls.txt
Amazon search → details → CSV   amazon-search QUERY --extract-field products.asin > asins.txt
                                amazon-product --input-file asins.txt
                                export --format csv
YouTube search → metadata       youtube-search QUERY --extract-field results.link > vids.txt
                                youtube-metadata --input-file vids.txt
Monitor for changes             scrape --input-file urls.txt --diff-dir old_run/ --output-dir new_run/

AI Agent Support

Multi-tool agent compatibility — skill and pipeline agent definitions for Claude Code, Cursor, Amp, RooCode, Windsurf, Kiro, Gemini CLI, GitHub Copilot, Augment Code, OpenCode,
Amazon Q, and Factory AI.

Testing

  • 343 unit tests (help-output tests for every command and parameter)
  • 182 E2E tests (0 skipped)
  • CI via GitHub Actions (Python 3.10–3.13)

v1.0.1 SKILL.md Fix

24 Feb 18:47
3ad6a66

What's Changed

Full Changelog: v1.0.0...v1.0.1