A simple Model Context Protocol (MCP) server that wraps the steel-dev API for visiting websites with browser automation.
## Quick Start

1. **Install the package:**

   ```bash
   npm install -g @jharding_npm/mcp-server-steel-scraper
   ```

2. **Add to your MCP client configuration:**

   ```json
   {
     "mcpServers": {
       "steel-scraper": {
         "command": "npx",
         "args": ["@jharding_npm/mcp-server-steel-scraper", "--mode=both"],
         "env": {
           "STEEL_API_URL": "http://localhost:3000"
         }
       }
     }
   }
   ```

3. **Start using the stateless `visit_with_browser` tool, or the stateful interactive tools.**

## Features
- **Dual Modes**: Run stateless scraping, stateful interaction, or both via `--mode=stateless|stateful|both` (default: `both`)
- **Stateless Tool**: `visit_with_browser` - Visit websites using steel-dev API
- **Stateful Tools**: Create sessions and interact with pages (navigate, click, type, scroll, snapshot)
- **Flexible Return Types**: HTML, markdown, readability, or cleaned HTML
- **Local/Remote Support**: Works with local or remote steel-dev instances
- **Browser Automation**: Screenshot capture, PDF generation, proxy support
- **Smart Length Management**: Single `maxLength` parameter with intelligent defaults and automatic content/metadata split
- **Clean Output by Default**: Minimal metadata output perfect for 7B models and summarization
- **Verbose Mode**: Optional full metadata when detailed information is needed
- **TypeScript**: Fully typed implementation
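
As an illustration of the `--mode` flag described above, here is a minimal sketch of how such a flag could be parsed. The names `parseMode` and `enabledToolSets` are hypothetical and not taken from the server source; only the three mode values and the `both` default come from this README.

```typescript
// Hypothetical helpers: the real server's argument parsing may differ.
type Mode = "stateless" | "stateful" | "both";

const MODES: readonly Mode[] = ["stateless", "stateful", "both"];

function parseMode(argv: readonly string[]): Mode {
  // Use the first `--mode=<value>` argument, if any.
  const flag = argv.find((arg) => arg.startsWith("--mode="));
  if (flag === undefined) return "both"; // documented default
  const value = flag.slice("--mode=".length);
  if (!MODES.includes(value as Mode)) {
    throw new Error(`Invalid --mode value "${value}"; expected one of ${MODES.join(", ")}`);
  }
  return value as Mode;
}

// Which tool sets a given mode exposes.
function enabledToolSets(mode: Mode): { stateless: boolean; stateful: boolean } {
  return {
    stateless: mode !== "stateful",
    stateful: mode !== "stateless",
  };
}
```

In a Node entry point this would typically be called as `parseMode(process.argv.slice(2))`.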
## Installation
### Option 1: NPM Package (Recommended)
Install the package globally to use it with npx:
```bash
npm install -g @jharding_npm/mcp-server-steel-scraper
```

Or use it directly with npx without installing:

```bash
npx @jharding_npm/mcp-server-steel-scraper
```

### Option 2: Build from Source

1. Clone this repository:

   ```bash
   git clone <repository-url>
   cd mcp-server-steel-scraper
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Build the project:

   ```bash
   npm run build
   ```

## Configuration

The server uses environment variables for configuration:

- `STEEL_API_URL`: The steel-dev API endpoint (default: `http://localhost:3000`)
- `STEEL_TIMEOUT`: Request timeout in milliseconds (default: `30000`)
- `STEEL_RETRIES`: Number of retry attempts (default: `3`)
- `STEEL_LOCAL`: Set to `true` when using a local Steel instance for stateful sessions
- `STEEL_BASE_URL`: Base URL for the Steel Sessions API (default: `https://api.steel.dev`, or `http://localhost:3000` when `STEEL_LOCAL=true`)
- `STEEL_API_KEY`: Required for cloud mode stateful sessions
- `STEEL_SESSION_TIMEOUT_MS`: Session timeout in milliseconds (default: `900000`)
- `STEEL_GLOBAL_WAIT_SECONDS`: Optional delay after each stateful action (default: `0`)
- `STEEL_IDLE_TIMEOUT_MS`: Auto-release idle sessions after this many milliseconds (default: `600000`, set to `0` to disable)

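
To make the defaults above concrete, the following sketch resolves these variables into a typed object. The `SteelConfig` interface and the `loadConfig` and `intFromEnv` helpers are assumptions made for illustration; the actual `src/config.ts` may be organized differently.

```typescript
// Sketch only: field names mirror the documented variables, but the real
// config module may be structured differently.
interface SteelConfig {
  apiUrl: string;
  timeoutMs: number;
  retries: number;
  local: boolean;
  baseUrl: string;
  apiKey: string | undefined; // required for cloud mode stateful sessions
  sessionTimeoutMs: number;
  globalWaitSeconds: number;
  idleTimeoutMs: number; // 0 disables idle auto-release
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer, falling back to the default when unset or invalid.
function intFromEnv(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}

function loadConfig(env: Env): SteelConfig {
  const local = env.STEEL_LOCAL === "true";
  return {
    apiUrl: env.STEEL_API_URL ?? "http://localhost:3000",
    timeoutMs: intFromEnv(env.STEEL_TIMEOUT, 30000),
    retries: intFromEnv(env.STEEL_RETRIES, 3),
    local,
    // The Sessions API base URL depends on whether a local instance is used.
    baseUrl: env.STEEL_BASE_URL ?? (local ? "http://localhost:3000" : "https://api.steel.dev"),
    apiKey: env.STEEL_API_KEY,
    sessionTimeoutMs: intFromEnv(env.STEEL_SESSION_TIMEOUT_MS, 900000),
    globalWaitSeconds: intFromEnv(env.STEEL_GLOBAL_WAIT_SECONDS, 0),
    idleTimeoutMs: intFromEnv(env.STEEL_IDLE_TIMEOUT_MS, 600000),
  };
}
```

Passing the environment in as a parameter (for example `loadConfig(process.env)`) keeps the function easy to test with plain objects.
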
Copy `env.example` to `.env` and modify as needed:

```bash
cp env.example .env
```

## Usage

```bash
# Development mode
npm run dev

# Auto-rebuild on changes (recommended for npm link workflows)
npm run build:watch

# Production mode
npm start

# Only stateless scraping tools
npm start -- --mode=stateless

# Only stateful interactive tools
npm start -- --mode=stateful

# Both tool sets (default)
npm start -- --mode=both
```

## MCP Client Configuration

Add this server to your MCP client configuration. Here are examples for popular LLM clients:

```json
{
  "mcpServers": {
    "steel-scraper": {
      "command": "npx",
      "args": ["@jharding_npm/mcp-server-steel-scraper", "--mode=stateless"],
      "env": {
        "STEEL_API_URL": "http://localhost:3000"
      }
    }
  }
}
```

To expose the stateful interactive tools, add `--mode=stateful` or `--mode=both` to the `args` array.

To use a remote steel-dev instance, set `STEEL_API_URL` to its address:

```json
{
  "mcpServers": {
    "steel-scraper": {
      "command": "npx",
      "args": ["@jharding_npm/mcp-server-steel-scraper", "--mode=stateless"],
      "env": {
        "STEEL_API_URL": "https://your-steel-dev-instance.com"
      }
    }
  }
}
```

If you've installed the package globally with `npm install -g @jharding_npm/mcp-server-steel-scraper`, you can use:

```json
{
  "mcpServers": {
    "steel-scraper": {
      "command": "mcp-server-steel-scraper",
      "env": {
        "STEEL_API_URL": "http://localhost:3000"
      }
    }
  }
}
```

To run a local build instead, point `node` at the compiled entry point:

```json
{
  "mcpServers": {
    "steel-scraper": {
      "command": "node",
      "args": ["/path/to/mcp-server-steel-scraper/dist/index.js"],
      "env": {
        "STEEL_API_URL": "http://localhost:3000"
      }
    }
  }
}
```

## Available Tools

The stateless tool set provides one tool, `visit_with_browser`, with the following parameters:

- `url` (required): The URL to visit
- `format` (optional): Content formats to extract. Use `["html"]` for raw HTML source (may be very large), `["markdown"]` for clean formatted text converted from HTML (recommended for reading), `["readability"]` for Mozilla Readability format, or `["cleaned_html"]` for cleaned HTML. You can request multiple formats (default: `["markdown"]`)
- `screenshot` (optional): Take a screenshot of the page, returned as a base64-encoded image (default: `false`)
- `pdf` (optional): Generate a PDF of the page, returned as a base64-encoded PDF (default: `false`)
- `proxyUrl` (optional): Proxy URL to use for the request (e.g., `"http://proxy:port"`)
- `delay` (optional): Delay in seconds to wait after page load before scraping (default: `0`)
- `logUrl` (optional): URL to send logs to for debugging purposes
- `maxLength` (optional): Maximum characters to return. Smart defaults: markdown=8000, readability=10000, html=15000, cleaned_html=12000. For markdown, automatically reserves space for metadata
- `verboseMode` (optional): Return full metadata instead of clean content-focused output (default: `false`). Use when you need detailed visit information

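
The parameter list above maps naturally onto a typed argument object. The sketch below shows one way to validate raw arguments and fill in the documented defaults. The `VisitArgs` type and the `normalizeVisitArgs` function are hypothetical names, and the specific validation rules are assumptions rather than the server's actual schema.

```typescript
// Assumed shape, derived from the parameter list; the real tool schema may differ.
type Format = "html" | "markdown" | "readability" | "cleaned_html";
const FORMATS: readonly Format[] = ["html", "markdown", "readability", "cleaned_html"];

interface VisitArgs {
  url: string;
  format: Format[];
  screenshot: boolean;
  pdf: boolean;
  delay: number;
  verboseMode: boolean;
  proxyUrl?: string;
  logUrl?: string;
  maxLength?: number;
}

function isFormat(value: unknown): value is Format {
  return typeof value === "string" && (FORMATS as readonly string[]).includes(value);
}

// Validate raw arguments and apply the documented defaults.
function normalizeVisitArgs(raw: Record<string, unknown>): VisitArgs {
  const { url, format = ["markdown"], delay = 0, maxLength } = raw;
  if (typeof url !== "string" || url.length === 0) {
    throw new Error("url is required and must be a non-empty string");
  }
  if (!Array.isArray(format) || format.length === 0 || !format.every(isFormat)) {
    throw new Error(`format must be a non-empty array of: ${FORMATS.join(", ")}`);
  }
  if (typeof delay !== "number" || delay < 0) {
    throw new Error("delay must be a non-negative number of seconds");
  }
  if (maxLength !== undefined && (typeof maxLength !== "number" || maxLength <= 0)) {
    throw new Error("maxLength must be a positive number");
  }
  return {
    url,
    format: format as Format[],
    screenshot: raw.screenshot === true,
    pdf: raw.pdf === true,
    delay,
    verboseMode: raw.verboseMode === true,
    proxyUrl: typeof raw.proxyUrl === "string" ? raw.proxyUrl : undefined,
    logUrl: typeof raw.logUrl === "string" ? raw.logUrl : undefined,
    maxLength: maxLength as number | undefined,
  };
}
```
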
Basic website visit:

```json
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://example.com"
  }
}
```

Advanced visit with multiple formats:

```json
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://example.com",
    "format": ["markdown", "html"],
    "screenshot": true,
    "delay": 2
  }
}
```

Simple visit with smart defaults (perfect for 7B models):

```json
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://example.com",
    "format": ["markdown"]
  }
}
```

Custom length limit (automatically handles content vs metadata split):

```json
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://en.wikipedia.org/wiki/Long_Article",
    "format": ["markdown"],
    "maxLength": 5000
  }
}
```

Verbose mode when you need detailed visit information:

```json
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://example.com",
    "format": ["markdown"],
    "maxLength": 8000,
    "verboseMode": true
  }
}
```

With proxy and PDF generation:

```json
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://example.com",
    "format": ["readability"],
    "pdf": true,
    "proxyUrl": "http://proxy:8080"
  }
}
```

## Stateful Tools

When running with `--mode=stateful` or `--mode=both`, the server exposes stateful tools that let the LLM interact with a live page.
Stateful sessions are created via the Steel Sessions API and connected over CDP (Chrome DevTools Protocol).

- `session_create` - Create a new Steel session and connect
- `session_release` - Release the current session
- `navigate` - Navigate to a URL
- `search` - Open Google search results for a query
- `click` - Click an element by label
- `type` - Type into an element by label
- `scroll_down` / `scroll_up` - Scroll the page
- `go_back` - Navigate back
- `wait` - Wait a few seconds for dynamic content
- `snapshot` - Annotated screenshot + labels list
- `snapshot_unmarked` - Screenshot without labels
- `page_content` - Return page HTML or text

Create a session:

```json
{
  "tool": "session_create",
  "arguments": { "timeoutMs": 900000 }
}
```

Navigate:

```json
{
  "tool": "navigate",
  "arguments": { "url": "https://example.com" }
}
```

Get an annotated snapshot (labels + image):

```json
{
  "tool": "snapshot",
  "arguments": {}
}
```

Click a labeled element:

```json
{
  "tool": "click",
  "arguments": { "label": 3 }
}
```

Type into a labeled input:

```json
{
  "tool": "type",
  "arguments": { "label": 5, "text": "hello", "replaceText": true }
}
```

The server automatically handles content length optimization:

- Unified Length Control: Single `maxLength` parameter handles both content and metadata
- Automatic Content/Metadata Split: For markdown, reserves 10% for metadata, uses 90% for content
- Smart Defaults: Reasonable defaults when no length is specified (markdown=8000, readability=10000, html=15000, cleaned_html=12000)
- Better Truncation: Avoids double-truncation issues that could result in incomplete content
- Conversion Detection: Automatically detects when HTML-to-markdown conversion may have failed
- Warning System: Provides warnings when content appears truncated or incomplete

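
The 90/10 split described above can be sketched as a small budgeting function. This is an illustration under assumptions, not the server's code: `lengthBudget` and `truncate` are hypothetical names, the defaults are those listed for the `maxLength` parameter, and giving non-markdown formats the whole budget is an assumption.

```typescript
type ContentFormat = "html" | "markdown" | "readability" | "cleaned_html";

// Defaults as listed for the `maxLength` parameter.
const DEFAULT_MAX_LENGTH: Record<ContentFormat, number> = {
  markdown: 8000,
  readability: 10000,
  html: 15000,
  cleaned_html: 12000,
};

interface LengthBudget {
  total: number;
  metadata: number;
  content: number;
}

function lengthBudget(format: ContentFormat, maxLength?: number): LengthBudget {
  const total = maxLength ?? DEFAULT_MAX_LENGTH[format];
  // Markdown reserves 10% for metadata. Assumption: other formats reserve nothing.
  const metadata = format === "markdown" ? Math.floor(total / 10) : 0;
  return { total, metadata, content: total - metadata };
}

// Truncate once, against the content budget, and report whether anything was cut.
function truncate(text: string, limit: number): { text: string; truncated: boolean } {
  return text.length > limit
    ? { text: text.slice(0, limit), truncated: true }
    : { text, truncated: false };
}
```

Computing the budget first and truncating the content a single time is one way to avoid the double-truncation problem mentioned in the list.
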
Simple usage with smart defaults. This automatically uses 8000 characters, reserving 800 for metadata and 7200 for content:

```json
{
  "url": "https://example.com",
  "format": ["markdown"]
}
```

Custom length, split automatically. This uses 5000 characters in total, reserving 500 for metadata and 4500 for content:

```json
{
  "url": "https://example.com",
  "format": ["markdown"],
  "maxLength": 5000
}
```

This approach ensures you get complete, properly formatted content while maintaining simple, intuitive parameter management.

For large, complex pages like Amazon.com, follow these best practices: use `["readability"]` (the most reliable format for complex pages), set a reasonable `maxLength`, and add a `delay` so the main content can load:

```json
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://www.amazon.com",
    "format": ["readability"],
    "maxLength": 5000,
    "delay": 3
  }
}
```

- HTML: Returns raw HTML source (can be 900,000+ characters for Amazon)
- Readability: Mozilla Readability format (most reliable, good for complex pages)
- Markdown: Converts HTML to clean, readable text (may fail on complex pages like Amazon)
- Cleaned HTML: Cleaned HTML with better structure

Note: Markdown conversion may fail on complex, JavaScript-heavy pages like Amazon. Use `["readability"]` for the most reliable results.

If you get HTML instead of Markdown:

- The steel-dev API may not support markdown conversion for that page type
- Try using `format: ["readability"]` instead for better text extraction
- Complex pages with heavy JavaScript may not convert properly

If you get truncated content:

- The page may be too large for the specified `maxLength`
- Try increasing `maxLength` or using a longer `delay`
- Consider using `format: ["readability"]` for more reliable truncation

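
To illustrate the "HTML instead of Markdown" case, here is a rough heuristic for spotting raw HTML in a response that was requested as markdown. It is only a sketch: the function names, the tag list, and the threshold of five tags are assumptions, and the server's own conversion detection may work differently.

```typescript
// Heuristic sketch, not the server's actual detection logic.
function looksLikeHtml(text: string): boolean {
  // Inspect only the beginning of the document to keep the check cheap.
  const sample = text.slice(0, 2000).toLowerCase();
  if (sample.trimStart().startsWith("<!doctype html") || sample.includes("<html")) {
    return true;
  }
  // Count common structural tags; converted markdown should contain few or none.
  const tags = sample.match(/<\/?(div|span|script|style|head|body|table|p)\b[^>]*>/g) ?? [];
  return tags.length >= 5;
}

// Return a warning when markdown was requested but the content still looks like HTML.
function conversionWarning(requestedFormat: string, content: string): string | undefined {
  if (requestedFormat === "markdown" && looksLikeHtml(content)) {
    return 'Content looks like raw HTML; markdown conversion may have failed. Try format: ["readability"].';
  }
  return undefined;
}
```
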
Use the `delay` parameter to wait for content to load. This example waits 5 seconds and raises `maxLength` to allow longer content from a complex page:

```json
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://www.amazon.com",
    "format": ["markdown"],
    "delay": 5,
    "maxLength": 10000
  }
}
```

The server is designed with 7B models in mind, providing clean, content-focused output by default:

- Content Summarization: Perfect for weaker models that need to summarize web content
- Content Analysis: Ideal for processing large amounts of text
- Context Optimization: Maximizes the content-to-metadata ratio automatically

Default Mode (clean output):

```
# Article Title
This is the actual content...
```

Verbose Mode (`verboseMode: true`):

```
SUCCESS: Successfully scraped https://example.com
Method: full-browser-automation (stealth browser, anti-detection)
Format: markdown
Status Code: 200
Processing Time: 1250ms
Content Length: 5000 characters
Content Type: text/html
Timestamp: 2024-01-15T10:30:00.000Z
Title: Article Title
Description: Article description
Language: en
Screenshot: Available (base64)
Links Found: 15
SCRAPED CONTENT:
# Article Title
This is the actual content...
```

- Maximum Content Space: Removes ~200-300 characters of metadata overhead
- Cleaner Output: Direct content without verbose headers
- Better for 7B Models: Focuses the model's attention on the actual content
- Preserves Warnings: Still shows important warnings if conversion issues occur
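
The difference between the two output modes can be sketched as a single formatting function. The `VisitResult` shape and `formatOutput` are hypothetical, and the verbose header here includes only a subset of the fields shown in the sample output; warnings are kept in both modes, as described in the list above.

```typescript
// Hypothetical result shape used only for this illustration.
interface VisitResult {
  url: string;
  format: string;
  statusCode: number;
  processingTimeMs: number;
  title?: string;
  content: string;
  warnings: string[];
}

function formatOutput(result: VisitResult, verboseMode = false): string {
  const warnings = result.warnings.map((w) => `WARNING: ${w}`);
  if (!verboseMode) {
    // Clean mode: any warnings, then the content with no metadata header.
    return [...warnings, result.content].join("\n");
  }
  const header = [
    `SUCCESS: Successfully scraped ${result.url}`,
    `Format: ${result.format}`,
    `Status Code: ${result.statusCode}`,
    `Processing Time: ${result.processingTimeMs}ms`,
    `Content Length: ${result.content.length} characters`,
    ...(result.title ? [`Title: ${result.title}`] : []),
  ];
  return [...header, ...warnings, "SCRAPED CONTENT:", result.content].join("\n");
}
```
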
For summarization tasks, use the default clean output. The content/metadata split for `maxLength` is handled automatically:

```json
{
  "tool": "visit_with_browser",
  "arguments": {
    "url": "https://article-to-summarize.com",
    "format": ["markdown"],
    "maxLength": 10000
  }
}
```

This MCP server expects a steel-dev API instance running with the following endpoints:

- `POST /scrape` - Main scraping endpoint
- `GET /health` - Health check endpoint (optional)
- `GET /info` - API information endpoint (optional)
- `POST /v1/sessions` - Create a stateful browser session
- `POST /v1/sessions/{id}/release` - Release a stateful session

Example `POST /scrape` request body:

```json
{
  "url": "https://example.com",
  "format": ["html", "markdown"],
  "screenshot": true,
  "pdf": false,
  "proxyUrl": "http://proxy:8080",
  "delay": 2,
  "logUrl": "https://logs.example.com"
}
```

Example response:

```json
{
  "content": {
    "html": "<html>...</html>",
    "markdown": "# Title\nContent..."
  },
  "metadata": {
    "title": "Page Title",
    "description": "Page description",
    "statusCode": 200,
    "timestamp": "2024-01-15T10:30:00.000Z"
  },
  "links": [
    {"url": "https://example.com/link1", "text": "Link Text"}
  ],
  "screenshot": "base64...",
  "pdf": "base64..."
}
```

Project structure:

```
src/
├── index.ts          # Main MCP server implementation
├── steel-api.ts      # Steel-dev API wrapper
└── config.ts         # Configuration management
```

Available scripts:

- `npm run build` - Build TypeScript to JavaScript
- `npm run start` - Run the built server
- `npm run dev` - Run in development mode with tsx

To modify or extend the tools:

- Modify the tool schema in `src/index.ts`
- Update the `SteelAPI` class in `src/steel-api.ts` if needed
- Rebuild and test

The server includes comprehensive error handling:
- Network errors are caught and returned as error responses
- Invalid parameters are validated
- Steel-dev API errors are properly forwarded
- Timeout handling for long-running requests
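
The timeout and retry behavior listed above could look roughly like the sketch below. It is not the server's implementation: `withTimeout`, `callWithRetries`, and the `ToolResponse` shape are hypothetical, and treating the retry count as attempts made after the first try is an assumption.

```typescript
// Hypothetical response shape: failures are returned, not thrown.
interface ToolResponse {
  isError: boolean;
  text: string;
}

// Reject if `work` does not settle within `timeoutMs`.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  // Clear the timer whichever promise settles first.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Run `attempt` up to `retries + 1` times and convert the final failure
// into an error response instead of throwing.
async function callWithRetries(
  attempt: () => Promise<string>,
  retries: number,
  timeoutMs: number,
): Promise<ToolResponse> {
  let lastError: unknown;
  for (let i = 0; i <= retries; i++) {
    try {
      return { isError: false, text: await withTimeout(attempt(), timeoutMs) };
    } catch (error) {
      lastError = error;
    }
  }
  const message = lastError instanceof Error ? lastError.message : String(lastError);
  return { isError: true, text: `Request failed after ${retries + 1} attempts: ${message}` };
}
```

Here `attempt` stands in for whatever function issues the request to the steel-dev API, with `retries` and `timeoutMs` taken from `STEEL_RETRIES` and `STEEL_TIMEOUT`.
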

## License

MIT