Plasmate as a lightweight scraping backend - no Chrome needed #1055

@dbhurley

ScrapeGraph-AI currently uses Chrome/Playwright for fetching pages. For many use cases (especially content extraction and data scraping from server-rendered pages), the full Chrome rendering pipeline is overkill.

Plasmate is an open-source browser engine (Rust, Apache 2.0) that parses HTML and outputs structured semantic content. No rendering, no GPU, no 300MB Chrome process.

For scraping workflows:

  • 30MB memory instead of 300MB per instance
  • 16.6x fewer tokens per page (saves LLM costs in AI-powered extraction)
  • Works as a single binary: pip install plasmate
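To get a rough feel for what those numbers would mean at scale, here is a back-of-the-envelope sketch. The 300MB, 30MB and 16.6x figures are the claims from this issue, not measurements; the function name and the workload values passed to it are hypothetical.

```python
# Figures quoted in this issue (claims, not independently measured).
CHROME_MB_PER_INSTANCE = 300
PLASMATE_MB_PER_INSTANCE = 30
TOKEN_REDUCTION_FACTOR = 16.6


def estimate_savings(instances: int, tokens_per_page: int, pages: int) -> dict:
    """Estimate memory and token savings for a hypothetical workload."""
    memory_saved_mb = instances * (CHROME_MB_PER_INSTANCE - PLASMATE_MB_PER_INSTANCE)
    tokens_before = tokens_per_page * pages
    tokens_after = tokens_before / TOKEN_REDUCTION_FACTOR
    return {
        "memory_saved_mb": memory_saved_mb,
        "tokens_before": tokens_before,
        "tokens_after": tokens_after,
    }


# Hypothetical workload: 10 parallel instances, 100 pages of 8,300 tokens each.
savings = estimate_savings(instances=10, tokens_per_page=8300, pages=100)
```

Real savings would depend on the pages being scraped and on how the extracted content is passed to the LLM.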

Plasmate could work as an alternative fetcher for static pages, with Chrome as the fallback for SPAs (single-page applications).
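A minimal sketch of that fallback idea, assuming both fetchers can be wrapped as plain `url -> text` callables. None of these names come from ScrapeGraph-AI or Plasmate; `fetch_with_fallback`, `light`, `heavy` and the `min_chars` threshold are all hypothetical, and the "too little text" check is just one possible heuristic for spotting a client-rendered page.

```python
from typing import Callable, Tuple

# Hypothetical fetcher signature: takes a URL, returns extracted text.
Fetcher = Callable[[str], str]


def fetch_with_fallback(
    url: str,
    light: Fetcher,
    heavy: Fetcher,
    min_chars: int = 200,
) -> Tuple[str, str]:
    """Try the lightweight fetcher first, fall back to the browser-based one.

    Returns (content, source), where source is "light" or "heavy".
    """
    try:
        content = light(url)
    except Exception:
        # Lightweight fetch failed outright: use the full browser.
        return heavy(url), "heavy"

    # Assumed heuristic: an SPA shell yields almost no text without JS,
    # so a very short result triggers the browser fallback.
    if len(content.strip()) < min_chars:
        return heavy(url), "heavy"

    return content, "light"
```

In practice `light` would wrap the Plasmate call and `heavy` the existing Chrome/Playwright path, so static pages never start a browser and SPAs still render correctly.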

Not a sales pitch - it's free and open source. Just think it could be useful for the project.

https://github.com/plasmate-labs/plasmate
