All notable changes to this project are documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Crawl + extraction non-seed extension (SCR-371) — the v1.4.0 "Crawl extension priority" fix only covered the seed URL. Discovered pages still fell through to the URL-path heuristic and were saved as `N.html` despite a JSON body, so `scrapingbee export --format csv` silently dropped every non-seed page (1-row CSVs). `_preferred_extension_from_scrape_params` now forces `"json"` for `--extract-rules`, `--ai-extract-rules`, and `--ai-query`, so every crawled page — not just the seed — is written as `N.json`. As a side effect, the `_url` column in exported CSVs is now populated for every row (the manifest records the correct `.json` path per URL).
- `pyproject.toml` project URLs — added `Changelog` and `Issues` entries so PyPI surfaces direct links to CHANGELOG.md and the GitHub issue tracker alongside Homepage / Documentation / Repository.
- `tutorial` command — interactive step-by-step guide to CLI features (`--chapter N`, `--reset`, `--list`, `--output-dir`).
- `--confirm` flag for `crawl` — pass `--confirm yes` to skip the interactive discovery-phase prompt in scripts.
- Discovery-phase warning for `crawl` — when `--extract-rules`, `--ai-query`, `--ai-extract-rules`, `--return-page-text`, or bare screenshot is active, crawl warns that each page will cost two requests and prompts to confirm.
- Binary URL skip in crawl discovery — crawl skips the HTML discovery re-request for URLs with known binary extensions (`.jpg`, `.png`, `.pdf`, `.css`, `.js`, etc.) that can never contain links.
- `scrapingbee --scraping-config` auto-routes to the scrape command. URL is optional when using a saved config.
- `scrapingbee --resume` (bare) discovers incomplete batches in the current directory and shows resume commands.
- `--extract-field` and `--fields` in batch mode — `--extract-field` works with individual files; `--fields` filters JSON output keys and supports dot notation across all output formats.
- `--flatten-depth` for export CSV to control nesting depth (default 5).
- `--overwrite` flag to skip file overwrite prompts.
- Batch metadata saved to `.batch_meta.json` for resume discovery.
- Structured user-agent headers (`User-Agent-Client`, `User-Agent-Environment`, `User-Agent-OS`).
- `scripts/sync-skills.sh` — syncs the canonical `.agents/skills/` tree to all AI platform directories.
- `--smart-extract` — client-side extraction with path language; auto-detects JSON, HTML, XML, CSV, Markdown, and plain text.
- Path language for `--smart-extract`: `.key`, `(escaped key)`, `[0]`, `[0:5]` slicing, `[0,3]` multi-index, `[keys]`/`[values]`, `...key` recursive search, `~N` context expansion, `[=filter]`/`[key=filter]` value filters, `[=/pattern/]` regex filters, `|` OR, `&` AND.
- JSON schema mode for `--smart-extract` — accepts the same format as `--extract-rules` for structured extraction.
- Full path language support in `--extract-field` and `--fields`.
- Auto-parse JSON strings during dot-path traversal so nested stringified JSON is traversed transparently.
- Chainable `~N` context expansion — works anywhere in the path and can be chained: `...text[=*$49*]~2.h3` finds a value, goes up N levels, then continues traversal.
- `[!=pattern]` negation filter — exclude values or dicts matching a pattern: `...div[class!=sidebar]`.
- `[*=pattern]` glob key filter — find dicts where any key's value matches: `...*[*=faq]`.
- Scalar-only value matching — filters only match strings/numbers/booleans, not dicts or lists (prevents false positives).
- List flattening in recursive search — `...section[id=faq]` now correctly finds individual elements, not nested lists.
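A rough sketch of the dot-path traversal with stringified-JSON auto-parsing described above (illustrative only; `resolve_path` is a hypothetical helper, not the CLI's internal function):

```python
import json

def resolve_path(data, path):
    """Walk a dot-path like "a.b.c" through nested dicts.
    When a step lands on a string that parses as JSON, parse it
    and keep walking (stringified-JSON transparency)."""
    current = data
    for key in path.split("."):
        if isinstance(current, str):
            try:
                current = json.loads(current)
            except json.JSONDecodeError:
                return None
        if isinstance(current, dict):
            current = current.get(key)
        else:
            return None
    return current

# A nested object stored as a JSON-encoded string is still traversable:
doc = {"product": json.dumps({"buybox": {"price": 49.99}})}
resolve_path(doc, "product.buybox.price")  # 49.99
```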
- `or N` falsy-value bug — `--retries 0`, `--backoff 0`, and similar zero-value numeric options now work correctly across all commands (37 occurrences).
- Crawl extension priority — `--extract-rules`, `--ai-extract-rules`, and `--ai-query` now correctly produce `.json` output files instead of `.html`.
- Back navigation at first step — pressing ← at the first tutorial step now shows "Already at the first step." instead of silently re-running it.
- Tutorial sort option text — CH13-S01 (Amazon) and CH14-S01 (Walmart) `what_to_notice` text now lists the correct valid sort values.
- Tutorial CH16-S01 now saves the ChatGPT response to a file with inline preview.
- Tutorials CH02-S01 and CH03-S01 now save output to a file with inline preview.
- Output routing — CSV and NDJSON batch output now correctly uses `--output-file`. Individual files use `--output-dir`.
- Flag validation — conflicting flags now produce clear errors instead of being silently ignored.
- `TextIOWrapper` leak in CSV stdout output no longer closes stdout.
- File write errors now show clean messages instead of Python tracebacks.
- Network errors caught globally with human-friendly messages.
- Dropped batch results from async exceptions now counted as failures.
- `--update-csv` requires a CSV input file.
- `--resume` requires `--output-dir`.
- Negative `--concurrency` rejected.
- `--every` warns when seconds are rounded to minutes.
- Auth distinguishes network errors from invalid API keys.
- Crawl project spider mode rejects API params with a helpful error showing `ScrapingBeeRequest` usage.
- Spider name detection fixed when `--project` is set (dots in spider names no longer misidentified as URLs).
- Export `--columns` errors with available column names when no columns match.
- "global --input-file" wording removed from all error messages.
- `--output-format` choices simplified to `csv` and `ndjson`. Default (no flag) writes individual files.
- Crawl concurrency defaults to 1 (with a warning) when the usage API fails, instead of a silent 16.
- Fast-search credit cost corrected to 10 in all skill reference files (was incorrectly documented as 5).
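The `or N` bug above is the classic Python `value or default` idiom, which treats `0` as "not provided". A minimal illustration with hypothetical helper names:

```python
def retries_buggy(cli_value):
    # `or` treats 0 as falsy and silently substitutes the
    # default -- the bug fixed across 37 call sites.
    return cli_value or 3

def retries_fixed(cli_value):
    # Only substitute the default when the flag was truly absent.
    return 3 if cli_value is None else cli_value

retries_buggy(0)  # 3  (wrong: --retries 0 was ignored)
retries_fixed(0)  # 0  (correct)
```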
- Skill reference docs updated: `--purchased` (YouTube), `--save-pattern` (crawl), `--flatten-depth` (export), `content_md5` (batch output manifest).
- AI platform agent files synced across `.agents/`, `.github/`, `.kiro/`, `.opencode/`, and `plugins/`; the `.github/` agent renamed from `.agent.md` to `.md`.
- `--scraping-config` parameter for the `scrape` and `crawl` commands. Apply a pre-saved scraping configuration by name from the ScrapingBee dashboard. Inline options override config settings.
- `--start-page` for `walmart-search` to paginate search results.
- `--device` for `walmart-product` to select device type (desktop, mobile, tablet).
- `--purchased` filter for `youtube-search` to filter by purchased content.
- Parameter value flexibility. Choice parameters now accept both hyphens and underscores interchangeably (e.g. `--sort-by price-low` and `--sort-by price_low` both work).
- Improved internal validation.
- `scrapingbee unsafe` command for managing advanced feature status.
- Audit logging.
- `scrapingbee logout` resets all advanced feature settings.
- ChatGPT `--search`, `--add-html`, `--country-code` flags: The `chatgpt` command now supports web-enhanced responses (`--search true`), full HTML inclusion (`--add-html true`), and geolocation (`--country-code gb`). `--search false` is silently ignored (only `true` sends the param).
- Auto-prepend `https://`: URLs without a scheme (e.g. `example.com`) now automatically get `https://` prepended, matching curl/httpie behavior. Works for `scrape`, `crawl`, and `--from-sitemap`.
- `--extract-field` path suggestions: When `--extract-field` doesn't match any data, the CLI now prints a warning with all available dot-paths instead of silent empty output.
- Exact credit costs in `--verbose`: SERP commands (Google, Fast Search, Amazon, Walmart, YouTube, ChatGPT) now show exact credit costs based on request parameters (e.g. `Credit Cost: 10` for Google light requests) instead of estimated ranges.
- Unit tests for all v1.2.3 changes: 39 new unit tests in `tests/unit/test_v122_fixes.py` plus 8 new e2e tests (FX-01 through FX-08).
- CLI documentation page: Full docs at https://www.scrapingbee.com/documentation/cli/ — installation, authentication, all commands, parameters, pipelines, and examples.
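The `https://` auto-prepend amounts to a scheme check before the request is built. An illustrative stand-in, not the CLI's actual code:

```python
import re

def normalize_url(raw: str) -> str:
    """Prepend https:// when the URL carries no scheme,
    mirroring curl/httpie behaviour."""
    # RFC 3986 scheme: a letter followed by letters/digits/+/-/.
    if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", raw):
        return raw
    return "https://" + raw

normalize_url("example.com")         # "https://example.com"
normalize_url("http://example.com")  # unchanged
```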
- `--allowed-domains` crawl bug: Fixed a bug where `--allowed-domains` caused crawls to produce no output. Scrapy's built-in `OffsiteMiddleware` was reading the spider's `allowed_domains` attribute and filtering out all ScrapingBee proxy requests. Renamed to `_cli_allowed_domains` to avoid the conflict.
- `--max-depth` with non-HTML modes: Disabled Scrapy's built-in `DepthMiddleware`, which incorrectly incremented depth on discovery re-fetches, breaking `--max-depth` when using `--ai-query`, `--return-page-markdown`, or other non-HTML output modes.
- Misleading screenshot warning removed: `--screenshot-full-page true` without `--screenshot` no longer prints a false "has no effect" warning — the API handles it correctly and produces a valid screenshot.
- Fast Search credit cost: Corrected from 5 to 10 credits in the estimated fallback.
- Installation recommendation: Docs now recommend `uv tool install scrapingbee-cli` over `pip install` for an isolated, globally-available installation without virtual environment management.
- Version bumped to 1.2.3 across `pyproject.toml`, `__init__.py`, all skill files, and plugin manifests.
- Plugin directory restructured: Separated the marketplace catalog from plugin content. The plugin now lives at `plugins/scrapingbee-cli/` with its own `.claude-plugin/plugin.json`, matching the Claude Code marketplace spec.
- `marketplace.json` fixed: Moved top-level `description` to `metadata.description`, updated plugin `source` to `./plugins/scrapingbee-cli`, removed the non-spec `$schema` field.
- `AGENTS.md` upgraded: Now comprehensive and self-contained — covers all commands, options, pipelines, extraction, crawling, scheduling, credit costs, troubleshooting, and known limitations. Serves as the single source of truth for tools that read `AGENTS.md` (Codex CLI, Cursor, Windsurf, Amp, RooCode, Continue, and others).
- GitHub Copilot skills: Added `.github/skills/scrapingbee-cli/` for Copilot skill discovery.
- OpenCode skills: Added `.opencode/skills/scrapingbee-cli/` for OpenCode skill discovery.
- `sync-skills.sh` updated: Now syncs skills to `.github/skills/` and `.opencode/skills/` in addition to the existing destinations.
- Marketplace plugin install: Changed `"source": "."` to `"source": "./"` in `.claude-plugin/marketplace.json` to match Claude Code's marketplace schema validator.
- `--update-csv`: Fetch fresh data and update the input CSV file in-place with the latest results. Replaces the old `--diff-dir` workflow.
- Cron-based `schedule`: `schedule --every INTERVAL --name NAME CMD` registers a cron job. Multiple named schedules are supported. Use `--list` to view active schedules with running time, `--stop NAME` or `--stop all` to remove them. Replacing a schedule prompts for confirmation.
- Per-command options: Options are now per-command (shown via `scrapingbee [command] --help`) instead of global. API-specific options are grouped (Search, Filters, Locale, etc.).
- `--output-format [files|csv|ndjson]`: Choose batch output format — `files` (default, individual files), `csv` (single CSV), or `ndjson` (streaming JSON lines to stdout).
- `--deduplicate`: Normalize URLs and remove duplicates before batch processing. Also available on `export` for removing duplicate CSV rows.
- `--sample N`: Process only N random items from the input file — useful for testing configurations cheaply.
- `--input-column`: CSV input support — `--input-file data.csv --input-column url` reads from a named or indexed column.
- `--post-process`: Pipe each batch result through a shell command (e.g. `--post-process 'jq .title'`) before saving.
- Crawl `--include-pattern`/`--exclude-pattern`: Regex filters for which links the crawler follows.
- Crawl `--save-pattern`: Only save pages matching this regex. Other pages are visited for link discovery but not saved. Useful for crawling through category pages to reach detail pages.
- Rich batch progress: The progress display now shows `[N/total] 50 req/s | ETA 2m 30s | Failures: 3%`.
- Export `--flatten`: Recursively flatten nested JSON dicts to dot-notation CSV columns. Lists of dicts are index-expanded (e.g. `buybox.0.price`).
- Export `--columns`: Cherry-pick CSV columns by name. Rows missing all selected columns are dropped.
- Auth validates API key: `scrapingbee auth` now calls the usage endpoint to verify the key before saving.
- Logout checks schedules: `scrapingbee logout` warns about active schedules and offers to stop them.
- Active schedule hint: Every command shows a one-line reminder when schedules are running.
- Crawl resilience: `parse()` catches errors from non-HTML responses (JSON, plain text) instead of crashing.
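The `--flatten` behaviour above can be sketched as a small recursive helper (illustrative; the CLI's implementation may differ in edge cases such as empty containers):

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into dot-notation keys,
    index-expanding lists (buybox.0.price style)."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = ((str(i), v) for i, v in enumerate(value))
    else:
        return {prefix: value}  # scalar leaf
    flat = {}
    for key, v in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(v, path))
    return flat

flatten({"buybox": [{"price": 9.99}], "title": "Widget"})
# {"buybox.0.price": 9.99, "title": "Widget"}
```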
- `--diff-dir`: Removed from batch and export. Use `--update-csv` for refreshing data instead.
- `--auto-diff`: Removed completely.
- `--daemon`: Removed from schedule. Schedule now uses cron jobs that persist across sessions.
- Global flags: Options are now per-command. Removed the option reorder logic and `--option=value` rejection.
- Crawl broken on Scrapy 2.14: Fixed the `start` → `start_requests` rename that broke the spider.
- Crawl `--max-pages`: Now enforced at both the spider and Scrapy downloader level (`CLOSESPIDER_PAGECOUNT`). Counts actual fetched pages, not discovered URLs.
- Crawl JSON response crash: `_extract_hrefs_from_response` now catches `ValueError` when the response is JSON instead of HTML.
- `robots.txt`: Disabled `ROBOTSTXT_OBEY` since ScrapingBee handles robots.txt compliance.
- "Batch complete" sent to stderr: Moved back to stdout.
- CSV export for product pages: The `_find_main_list` heuristic improved to not expand reviews/variants from single-item detail pages.
- Integration test fixes: Fixed hyphenated choice values (`best-match`, `this-week`), changed Amazon `--country us` to `gb`.
- Shell-safe YouTube duration aliases: `--duration short/medium/long` as aliases for `"<4"`/`"4-20"`/`">20"`. Raw values still work (backward compatible).
- Position-independent global options: `--verbose`, `--output-file`, and all other global flags now work when placed after the subcommand (e.g. `scrapingbee google --verbose "query"`), in addition to before it.
- Shell-safe `--extract-field` dot syntax: `--extract-field organic_results.url` replaces the old bracket syntax (`organic_results[].url`). No shell quoting needed.
- `AGENTS.md`: Added a project-root context file for tools that have no plugin/skill system. Read automatically by Amp, RooCode, Windsurf, Kilo Code, and OpenAI Codex CLI (the only mechanism Codex supports). Contains install, auth, all commands, global flags, pipeline recipes, credit costs, and troubleshooting — self-contained so no SKILL.md is needed.
- Multi-tool agent compatibility: The `scraping-pipeline` agent is now placed in all major AI coding tool directories — `.gemini/agents/` (Gemini CLI), `.github/agents/` (GitHub Copilot), `.augment/agents/` (Augment Code), `.factory/droids/` (Factory AI), `.kiro/agents/` (Kiro IDE), `.opencode/agents/` (OpenCode). All use the same markdown+YAML content as `.claude/agents/` (already covers Claude Code, Cursor, Amp, RooCode, Windsurf, Augment Code). Amazon Q gets `.amazonq/cli-agents/scraping-pipeline.json` (JSON format required by that tool).
- Multi-tool skill compatibility: `SKILL.md` is mirrored to `.agents/skills/scrapingbee-cli/` (Amp + RooCode + OpenCode — none read `AGENTS.md` for skills) and `.kiro/skills/scrapingbee-cli/` (Kiro uses `.kiro/steering/` for context, not `AGENTS.md`). Windsurf and Kilo Code are covered by `AGENTS.md` instead (both read it natively), so no dedicated skill directories are needed for them.
- `.claude-plugin/marketplace.json`: Added a Claude Code plugin marketplace manifest so the `scrapingbee-cli` GitHub repo is recognized as a self-contained plugin marketplace. Enables users to install via Claude Code's plugin system (`/plugins install scrapingbee@scrapingbee`) after registering the marketplace. Declares the single `scrapingbee-cli` plugin with `source: "."` pointing to the repo root, where `skills/` is discovered automatically.
- `--extract-field` global flag: Extract values from a JSON response using a path expression and output one value per line — e.g. `--extract-field organic_results.url` extracts each URL from a SERP, ready to pipe into `--input-file`. Supports array expansion (`key.subkey`) and top-level scalars/lists (`key`). Takes precedence over `--fields`.
- `--fields` global flag: Filter JSON response output to specified comma-separated top-level keys — e.g. `--fields title,price,rating`. Works on single-object and list responses.
- Per-request metadata in batch manifest: `write_batch_output_to_dir` now writes `manifest.json` (alongside the numbered output files) with per-item metadata: `{"input": {"file": "N.ext", "fetched_at": "<ISO-8601>", "http_status": 200}}`. Enables time-series analysis for price monitoring, change detection, and audit trails.
- Enriched crawl manifest: `crawl` `manifest.json` now uses the same enriched per-item format: `{"url": {"file": "N.ext", "fetched_at": "<ISO-8601>", "http_status": 200}}`.
- `export --diff-dir`: Compare a new batch/crawl directory with a previous one and output only items whose content has changed or are new. Unchanged items (same file content by MD5) are skipped. Prints a count of skipped items to stderr.
- `google --search-type ai_mode`: Added the `ai_mode` search type to the `--search-type` choice list (returns an AI-generated answer).
- `youtube-metadata` accepts full URLs: The command now auto-extracts the video ID from full YouTube URLs (`youtube.com/watch?v=...`, `youtu.be/...`, `/shorts/...`), enabling direct piping from `youtube-search --extract-field results.link` without `sed`.
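The ID extraction amounts to matching a few URL shapes for an 11-character video ID. A hedged sketch, not the CLI's actual code:

```python
import re

def extract_video_id(url: str) -> str:
    """Pull the 11-char video ID out of common YouTube URL
    shapes; bare IDs pass through unchanged."""
    patterns = [
        r"[?&]v=([A-Za-z0-9_-]{11})",    # youtube.com/watch?v=ID
        r"youtu\.be/([A-Za-z0-9_-]{11})",
        r"/shorts/([A-Za-z0-9_-]{11})",
    ]
    for pat in patterns:
        m = re.search(pat, url)
        if m:
            return m.group(1)
    return url

extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ")  # "dQw4w9WgXcQ"
```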
- Claude Skill — Pipelines section: `SKILL.md` now has a prominent Pipelines table at the top listing the 6 main multi-step patterns with exact one-liner commands.
- Claude Skill — Pipeline subagent: `.claude/agents/scraping-pipeline.md` defines an isolated subagent that orchestrates full scraping pipelines (credit check → search → batch → export) without polluting the main conversation context.
- Claude Skill — `--extract-field` examples added to all search command docs: The `fast-search`, `amazon-search`, `walmart-search`, `youtube-search`, and `google` docs now include a "Pipeline" section showing how to chain into the downstream batch command.
- Claude Skill — Change monitoring pattern: `patterns.md` documents the `--diff-dir` monitoring workflow and notes that `manifest.json` now includes `fetched_at`/`http_status` per item for time-series analysis.
- `youtube-search` response normalization: The command now parses the raw YouTube API payload and outputs a clean JSON structure — `results` is a proper array (not a JSON-encoded string) with flat fields: `link` (full `https://www.youtube.com/watch?v=…` URL), `video_id`, `title`, `channel`, `views`, `published`, `duration`. Enables `--extract-field results.link` to work directly for piping into `youtube-metadata`.
- `walmart-search` → `walmart-product` pipeline: Search results include a top-level `id` field per product (e.g. `"921722537"`), enabling `--extract-field products.id` on `walmart-search QUERY` to feed `walmart-product` — an exact parallel to the Amazon search → product pipeline. Docs updated to document this pipeline.
- Claude Skill — `walmart-search → walmart-product` pipeline: `walmart/search.md`, `patterns.md`, and the `SKILL.md` pipeline table updated to document `--extract-field products.id` → `walmart-product`.
- Claude Skill — YouTube search output schema corrected: `reference/youtube/search-output.md` now documents the clean normalized schema (`link`, `video_id`, `title`, `channel`, `views`, `published`, `duration`).
- Tests: Unit tests for `_normalize_youtube_search` (8 tests: results array, link construction, title/channel extraction, `video_id` field, items without `videoId` skipped, already-array passthrough, invalid JSON passthrough, other fields preserved).
- Tests: Unit tests for `write_batch_output_to_dir` manifest writing (5 tests: correct structure, errors omitted, skipped items omitted, no manifest when all fail, screenshot subdir in manifest path).
- Tests: Unit tests for `_extract_field_values` (7 tests: array subkey, top-level scalar/list, missing key, invalid JSON, missing subkey items, empty array) and `_filter_fields` (5 tests: dict filter, nonexistent keys, empty fields, invalid JSON, list filter). Global `--extract-field`, `--fields`, and `ai_mode` coverage in CLI help tests.
- Tests: Unit tests for `export --diff-dir` (4 tests: all unchanged, changed item, new item, mixed). Unit test for the new dict-valued manifest format in CSV export.
- `schedule` command: `scrapingbee schedule --every INTERVAL CMD` repeatedly runs any scrapingbee command at a fixed interval (supports `30s`, `5m`, `1h`, `2d`). `--auto-diff` automatically injects `--diff-dir` from the previous run for change detection across runs.
- `--diff-dir` global option: Compare batch/crawl output with a previous run — unchanged files (by MD5) are not re-written and are marked `"unchanged": true` in `manifest.json`. Works with all batch commands.
- RAG-ready chunked output: `scrape --chunk-size N [--chunk-overlap M]` splits text/markdown responses into overlapping NDJSON chunks (each line: `{"url", "chunk_index", "total_chunks", "content", "fetched_at"}`). Ready for vector DB ingestion or LLM context windows.
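Overlapping chunking of this kind can be sketched as follows (character-based for illustration; the CLI's `chunk_text` may differ, and it also attaches URL and index metadata per NDJSON line):

```python
def chunk_text(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `size` characters,
    where each chunk starts `size - overlap` characters after
    the previous one, so consecutive chunks share `overlap` chars."""
    if size <= 0 or overlap >= size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

chunk_text("abcdefghij", size=4, overlap=2)
# ["abcd", "cdef", "efgh", "ghij", "ij"]
```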
- Enriched batch manifest: `manifest.json` now includes `credits_used` (from the `Spb-Cost` header), `latency_ms` (request timing), and `content_md5` (MD5 hash of the response body) per item. `content_md5` powers the `--diff-dir` change detection.
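The `content_md5`-based change detection can be illustrated with a small sketch (hypothetical helper names; the CLI's internals may differ):

```python
import hashlib

def content_md5(body: bytes) -> str:
    """Hash the raw response body, as stored per item in manifest.json."""
    return hashlib.md5(body).hexdigest()

def is_unchanged(prev_md5: str, body: bytes) -> bool:
    # Skip re-writing when the previous run produced identical bytes.
    return prev_md5 == content_md5(body)

is_unchanged(content_md5(b"<html>same</html>"), b"<html>same</html>")  # True
```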
- Estimated credit costs in verbose mode: SERP endpoints (Google, Fast Search, Amazon, Walmart, YouTube, ChatGPT) don't return the `Spb-Cost` header. `--verbose` now shows an estimated credit cost from hardcoded values in `credits.py` when the header is absent.
- E2E test suite: 182 end-to-end tests covering all commands, batch/crawl, export, schedule, diff-dir, verbose output, and edge cases.
- Tests: Unit tests for `read_input_file`, crawl spider manifest fields (`credits_used`, `latency_ms`), estimated credit cost display, `chunk_text`, `_parse_duration`, and the schedule helpers.
- Progress counter: Batch runs now print a per-item `[n/total]` counter to stderr as each item completes (with an `(error)` or `(skipped)` suffix when applicable). Suppress with the global `--no-progress` flag.
- CSV export: `scrapingbee export --format csv` flattens JSON batch/crawl output to a tabular CSV. API responses with a top-level list (e.g. `organic_results`, `products`, `results`) expand to one row per item; single-object responses (e.g. product pages) produce one row per file. Nested dicts/arrays are serialised as JSON strings. A `_url` column is added when `manifest.json` is present.
- Chained workflow docs: `reference/usage/patterns.md` now includes end-to-end pipeline recipes: SERP → scrape result pages, Amazon search → product details (with CSV export), YouTube search → video metadata, and batch SERP for many queries.
- Resume (batch): The `--resume` global flag skips already-completed items when re-running a batch command against an existing `--output-dir`. Completed items are detected by scanning for `N.<ext>` files (`.err` files are not treated as complete). Applies to all batch commands: `scrape`, `google`, `fast-search`, `amazon-product`, `amazon-search`, `walmart-product`, `walmart-search`, `youtube-metadata`, `youtube-search`, `chatgpt`.
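The completed-item scan behind `--resume` can be sketched roughly as follows (a simplified stand-in for `_find_completed_n`; exact behaviour may differ):

```python
from pathlib import Path

def find_completed(output_dir: str) -> set[int]:
    """Collect item numbers that already have a real output file.
    `.err` files and non-numeric filenames do not count as complete;
    subdirectories (e.g. for screenshots) are scanned too."""
    done = set()
    for path in Path(output_dir).rglob("*"):
        if path.is_file() and path.suffix != ".err" and path.stem.isdigit():
            done.add(int(path.stem))
    return done
```

A batch run would then skip any input index found in this set while re-fetching everything else.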
- Resume (crawl): `--resume` also resumes an interrupted crawl: the existing `manifest.json` is loaded to pre-populate already-visited URLs, preventing re-fetching.
- Crawl manifest: `crawl` now writes `manifest.json` (URL → relative filename map) to the output directory when the crawl finishes, enabling resume and export.
- Sitemap ingestion: `crawl --from-sitemap <url>` fetches a sitemap (or sitemap index) and crawls all discovered URLs. Handles `<sitemapindex>` recursively (depth limit 2) and both namespaced and bare XML.
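Namespace-tolerant sitemap parsing can be sketched like this (illustrative only; the CLI additionally fetches nested sitemap indexes recursively up to depth 2):

```python
import xml.etree.ElementTree as ET

def sitemap_urls(xml_text: str) -> list[str]:
    """Collect <loc> values from a <urlset> or <sitemapindex>,
    handling both namespaced and bare XML by stripping the
    '{namespace}' prefix from each tag before comparing."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]

sitemap_urls(
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/a</loc></url></urlset>"
)
# ["https://example.com/a"]
```

For a `<sitemapindex>`, the returned `<loc>` values are themselves sitemap URLs, which a crawler would fetch and parse again.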
- Export command: `scrapingbee export --input-dir <dir> [--format ndjson|txt]` merges numbered batch/crawl output files into a single stream. NDJSON mode enriches each record with `_url` when a `manifest.json` is present; TXT mode emits `# URL` headers followed by page text. Output respects `--output-file`.
- CI: GitHub Actions workflow (`.github/workflows/ci.yml`) runs unit tests across Python 3.10–3.13 on every push and pull request.
- Tests: Unit tests for `validate_batch_run` (credit and concurrency checks).
- Tests: Unit tests for `_find_main_list`, `_flatten_value`, and `export --format csv` (17 tests covering flat objects, list expansion, non-JSON skipping, manifest URL enrichment, and empty-input error).
- Tests: Unit tests for `_find_completed_n` (nonexistent dir, numbered files, ignores `.err`, ignores non-numeric stems, finds files in subdirectories).
- Tests: Unit tests for `run_batch_async` skip-n (resume) behaviour: skipped items are marked `skipped=True` with an empty body; an empty `skip_n` processes all items.
- Tests: Unit tests for the crawl double-fetch discovery mechanism (`parse()` triggers discovery when no links are found; `_parse_discovery_links_only()` follows links without saving).
- Tests: Help-output tests for every command (youtube-search, youtube-metadata, walmart-search, walmart-product, amazon-product, amazon-search, fast-search, chatgpt, crawl, export, schedule, usage, scrape, google) — verifying key params appear in `--help`. YouTube choice constants tests. Global option reordering tests (15 edge cases). Total: 343 unit tests.
- Claude Skill: `reference/usage/patterns.md` — multi-step workflow recipes: crawl + AI extraction (Option A one-command; Option B crawl-then-batch), batch SERP pipeline.
- Claude Skill: Prerequisites section at the top of `SKILL.md` so AI agents install the CLI and authenticate before issuing commands.
- Claude Skill: Output schemas (truncated JSON examples) added to all API reference docs: `google`, `fast-search`, `amazon-product`, `amazon-search`, `walmart-product`, `walmart-search`, `youtube-search`, `youtube-metadata`, `chatgpt`.
- Claude Skill: `reference/troubleshooting.md` — decision tree covering empty responses, 403/429 errors, `.err` files, crawls stopping early, `--ai-query` returning null, missing output files, and proxy recommendations.
- Claude Skill: `reference/batch/export.md` — documents the `export` command and `--resume` flag with examples.
- Claude Skill: `reference/scrape/extraction.md` — documents the `--ai-query` and `--ai-extract-rules` response formats with JSON examples.
- Claude Skill: `reference/scrape/strategies.md` — "Why use ScrapingBee instead of WebFetch or curl?" section explaining automatic proxy rotation, CAPTCHA handling, and JS rendering as reasons to prefer ScrapingBee for all web scraping tasks.
- Claude Skill: `reference/crawl/overview.md` — documents sitemap mode (`--from-sitemap`), resume (`--resume`), `manifest.json`, and the three crawl modes (Scrapy project, URL-based, sitemap-based).
- Claude Skill: `SKILL.md` frontmatter `version` corrected from `1.3.0` to `1.1.0` to match `pyproject.toml`.
- Claude Skill: `reference/crawl/overview.md` now accurately documents the double-fetch discovery mechanism: `--return-page-text` (and other non-HTML options) triggers a second plain-HTML fetch for link discovery, costing 2 credits per affected page. `--return-page-markdown` is exempt because markdown links are extracted directly.
- Claude Skill: Removed spurious `add_html`/`full_html` references from `reference/chatgpt/overview.md` (the ChatGPT command has no `--add-html` option).
- Claude Skill: `reference/usage/patterns.md` Option B uses `--preset extract-links` for concrete URL discovery and documents that crawl output files are numbered (no URL manifest).
- Tests: `test_root_version` now asserts the exact `__version__` string instead of the fragile `"1.0" in out` substring check.
- Claude Skill: Removed the invalid `tags` key from `SKILL.md` frontmatter so it validates against the allowed properties (`name`, `description`, `version`, etc.).
- CLI for the ScrapingBee API: `scrapingbee` with subcommands for scrape, batch, crawl, usage, auth, and specialized tools (Google, Fast Search, Amazon, Walmart, YouTube, ChatGPT).
- Space-separated option syntax (`--option value`); `--option=value` is rejected.
- Claude Skill documentation under `skills/scrapingbee-cli/` for AI-assisted usage.