Bright Data Python SDK 🐍


Modern async-first Python SDK for Bright Data APIs with dataclass payloads, Jupyter notebooks, comprehensive platform support, and a CLI tool - built for data scientists and developers.


✨ Features

🎯 For Data Scientists

  • 📓 5 Jupyter Notebooks - Complete tutorials from quickstart to batch processing
  • 🐼 Pandas Integration - Native DataFrame support with examples
  • 📊 Data Analysis Ready - Built-in visualization, export to CSV/Excel
  • 💰 Cost Tracking - Budget management and cost analytics
  • 🔄 Progress Bars - tqdm integration for batch operations
  • 💾 Caching Support - joblib integration for development (see the sketch after this list)
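
A minimal caching sketch for the bullet above, using joblib's on-disk Memory to memoize a wrapper function so repeated notebook runs don't re-scrape (and re-pay for) the same URL. The wrapper function and cache directory here are illustrative, not an SDK API; see the notebooks for the SDK's own patterns.

from joblib import Memory
from brightdata import BrightDataClient

memory = Memory(".brightdata_cache", verbose=0)  # on-disk cache directory (illustrative)
client = BrightDataClient()

@memory.cache  # memoizes by URL; delete the cache directory to force fresh scrapes
def scrape_url(url: str):
    result = client.scrape.generic.url(url)
    return result.to_dict()  # cache plain data rather than the result object

data = scrape_url("https://example.com")  # first call hits the API
data = scrape_url("https://example.com")  # second call is served from disk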

🏗️ Core Features

  • 🚀 Async-first architecture with sync wrappers for compatibility
  • 🎨 Dataclass Payloads - Runtime validation, IDE autocomplete, helper methods
  • 📝 Markdown Output - Export results as GitHub-flavored markdown tables
  • 🌐 Web scraping via Web Unlocker proxy service
  • 🔍 SERP API - Google, Bing, Yandex search results
  • 📦 Platform scrapers - LinkedIn, Amazon, ChatGPT, Facebook, Instagram
  • 🎯 Dual namespace - scrape (URL-based) + search (discovery)
  • 🖥️ CLI Tool - brightdata command for terminal usage

🛡️ Enterprise Grade

  • 🔒 100% type safety - Dataclasses + TypedDict definitions
  • 502+ comprehensive tests - Unit, integration, and E2E
  • Resource efficient - Single shared AsyncEngine
  • 🎨 Rich result objects - Timing, cost tracking, method tracking
  • 🔐 .env file support - Automatic loading via python-dotenv
  • 🛡️ SSL error handling - Helpful guidance for certificate issues
  • 📊 Function-level monitoring - Track which SDK methods are used

📓 Jupyter Notebooks (NEW!)

Perfect for data scientists! Interactive tutorials with examples:

  1. 01_quickstart.ipynb - Get started in 5 minutes
  2. 02_pandas_integration.ipynb - Work with DataFrames
  3. 03_amazon_scraping.ipynb - Amazon deep dive
  4. 04_linkedin_jobs.ipynb - Job market analysis
  5. 05_batch_processing.ipynb - Scale to 1000s of URLs

📦 Installation

pip install brightdata-sdk

Or install from source:

git clone https://github.com/brightdata/sdk-python.git
cd sdk-python
pip install -e .

🚀 Quick Start

Authentication

Set your API token as an environment variable:

export BRIGHTDATA_API_TOKEN="your_api_token_here"
export BRIGHTDATA_CUSTOMER_ID="your_customer_id"  # Optional

Or use a .env file (automatically loaded):

# .env
BRIGHTDATA_API_TOKEN=your_api_token_here
BRIGHTDATA_CUSTOMER_ID=your_customer_id  # Optional

Or pass credentials directly:

from brightdata import BrightDataClient

client = BrightDataClient(
    token="your_api_token",
    customer_id="your_customer_id"  # Optional
)

Simple Web Scraping

from brightdata import BrightDataClient

# Initialize client (auto-loads token from environment)
client = BrightDataClient()

# Scrape any website (sync wrapper)
result = client.scrape.generic.url("https://example.com")

if result.success:
    print(f"Success: {result.success}")
    print(f"Data: {result.data[:200]}...")
    print(f"Time: {result.elapsed_ms():.2f}ms")
else:
    print(f"Error: {result.error}")

Using Dataclass Payloads (Type-Safe ✨)

from brightdata import BrightDataClient
from brightdata.payloads import AmazonProductPayload, LinkedInJobSearchPayload

client = BrightDataClient()

# Amazon with validated payload
payload = AmazonProductPayload(
    url="https://amazon.com/dp/B123456789",
    reviews_count=50  # Runtime validated!
)
print(f"ASIN: {payload.asin}")  # Helper property

result = client.scrape.amazon.products(**payload.to_dict())

# LinkedIn job search with validation
job_payload = LinkedInJobSearchPayload(
    keyword="python developer",
    location="New York",
    remote=True
)
print(f"Remote search: {job_payload.is_remote_search}")

jobs = client.search.linkedin.jobs(**job_payload.to_dict())

Pandas Integration for Data Scientists 🐼

import pandas as pd
from brightdata import BrightDataClient

client = BrightDataClient()

# Scrape multiple products
urls = ["https://amazon.com/dp/B001", "https://amazon.com/dp/B002"]
results = []

for url in urls:
    result = client.scrape.amazon.products(url=url)
    if result.success:
        results.append({
            'title': result.data.get('title'),
            'price': result.data.get('final_price'),
            'rating': result.data.get('rating'),
            'cost': result.cost
        })

# Convert to DataFrame
df = pd.DataFrame(results)
print(df.describe())

# Export to CSV
df.to_csv('products.csv', index=False)

Platform-Specific Scraping

Amazon Products

# Scrape specific product URLs
result = client.scrape.amazon.products(
    url="https://amazon.com/dp/B0CRMZHDG8",
    timeout=65
)

# Extract reviews with filters
result = client.scrape.amazon.reviews(
    url="https://amazon.com/dp/B0CRMZHDG8",
    pastDays=30,
    keyWord="quality",
    numOfReviews=100
)

# Scrape seller information
result = client.scrape.amazon.sellers(
    url="https://amazon.com/sp?seller=AXXXXXXXXX"
)

# NEW: Search Amazon by keyword and filters
result = client.search.amazon.products(
    keyword="laptop",
    min_price=50000,    # $500 in cents
    max_price=200000,   # $2000 in cents
    prime_eligible=True,
    condition="new"
)

# Search by category
result = client.search.amazon.products(
    keyword="wireless headphones",
    category="electronics"
)

LinkedIn Data

# URL-based extraction
result = client.scrape.linkedin.profiles(
    url="https://linkedin.com/in/johndoe"
)

result = client.scrape.linkedin.jobs(
    url="https://linkedin.com/jobs/view/123456"
)

result = client.scrape.linkedin.companies(
    url="https://linkedin.com/company/microsoft"
)

result = client.scrape.linkedin.posts(
    url="https://linkedin.com/feed/update/..."
)

# Discovery/search operations
result = client.search.linkedin.jobs(
    keyword="python developer",
    location="New York",
    remote=True,
    experienceLevel="mid"
)

result = client.search.linkedin.profiles(
    firstName="John",
    lastName="Doe"
)

result = client.search.linkedin.posts(
    profile_url="https://linkedin.com/in/johndoe",
    start_date="2025-01-01",
    end_date="2025-12-31"
)

ChatGPT Interactions

# Send single prompt to ChatGPT
result = client.scrape.chatgpt.prompt(
    prompt="Explain Python async programming",
    country="us",
    web_search=True
)

# Batch prompts
result = client.scrape.chatgpt.prompts(
    prompts=["What is Python?", "What is JavaScript?", "Compare them"],
    web_searches=[False, False, True]
)

Facebook Data

# Scrape posts from profile
result = client.scrape.facebook.posts_by_profile(
    url="https://facebook.com/profile",
    num_of_posts=10,
    start_date="01-01-2025",
    end_date="12-31-2025",
    timeout=240
)

# Scrape posts from group
result = client.scrape.facebook.posts_by_group(
    url="https://facebook.com/groups/example",
    num_of_posts=20,
    timeout=240
)

# Scrape specific post
result = client.scrape.facebook.posts_by_url(
    url="https://facebook.com/post/123456",
    timeout=240
)

# Scrape comments from post
result = client.scrape.facebook.comments(
    url="https://facebook.com/post/123456",
    num_of_comments=100,
    start_date="01-01-2025",
    end_date="12-31-2025",
    timeout=240
)

# Scrape reels from profile
result = client.scrape.facebook.reels(
    url="https://facebook.com/profile",
    num_of_posts=50,
    timeout=240
)

Instagram Data

# Scrape Instagram profile
result = client.scrape.instagram.profiles(
    url="https://instagram.com/username",
    timeout=240
)

# Scrape specific post
result = client.scrape.instagram.posts(
    url="https://instagram.com/p/ABC123",
    timeout=240
)

# Scrape comments from post
result = client.scrape.instagram.comments(
    url="https://instagram.com/p/ABC123",
    timeout=240
)

# Scrape specific reel
result = client.scrape.instagram.reels(
    url="https://instagram.com/reel/ABC123",
    timeout=240
)

# Discover posts from profile (with filters)
result = client.search.instagram.posts(
    url="https://instagram.com/username",
    num_of_posts=10,
    start_date="01-01-2025",
    end_date="12-31-2025",
    post_type="reel",
    timeout=240
)

# Discover reels from profile
result = client.search.instagram.reels(
    url="https://instagram.com/username",
    num_of_posts=50,
    start_date="01-01-2025",
    end_date="12-31-2025",
    timeout=240
)

Search Engine Results (SERP)

# Google search
result = client.search.google(
    query="python tutorial",
    location="United States",
    language="en",
    num_results=20
)

# Access results
for item in result.data:
    print(f"{item['position']}. {item['title']}")
    print(f"   {item['url']}")

# Bing search
result = client.search.bing(
    query="python tutorial",
    location="United States"
)

# Yandex search
result = client.search.yandex(
    query="python tutorial",
    location="Russia"
)

Async Usage

For better performance with multiple operations, use async:

import asyncio
from brightdata import BrightDataClient

async def scrape_multiple():
    # Use async context manager for engine lifecycle
    async with BrightDataClient() as client:
        # Scrape multiple URLs concurrently
        results = await client.scrape.generic.url_async([
            "https://example1.com",
            "https://example2.com",
            "https://example3.com"
        ])
        
        for result in results:
            print(f"Success: {result.success}")

asyncio.run(scrape_multiple())

Important: When using *_async methods, always use the async context manager (async with BrightDataClient() as client). Sync wrappers (methods without _async) handle this automatically.


🆕 What's New in v2.0.0

🆕 Latest Updates (December 2025)

  • Markdown Output Format - NEW! Export results as GitHub-flavored markdown
  • Amazon Search API - NEW parameter-based product discovery with correct dataset
  • LinkedIn Job Search Fixed - Now builds URLs from keywords internally
  • Trigger Interface - Manual trigger/poll/fetch control for all platforms
  • 30 Sync Wrapper Fixes - ALL sync methods work (scrapers + SERP + generic)
  • Batch Operations Fixed - Returns List[ScrapeResult] correctly
  • Auto-Create Zones - Now enabled by default (was opt-in)
  • Improved Zone Names - sdk_unlocker, sdk_serp, sdk_browser
  • Full Sync/Async Examples - README now shows both patterns for all features

🎓 For Data Scientists

  • 5 Jupyter Notebooks - Complete interactive tutorials
  • Pandas Integration - Native DataFrame support with examples
  • Batch Processing Guide - Scale to 1000s of URLs with progress bars
  • Cost Management - Budget tracking and optimization
  • Visualization Examples - matplotlib/seaborn integration

🎨 Dataclass Payloads (Major Upgrade)

  • Runtime Validation - Catch errors at instantiation time
  • Helper Properties - .asin, .is_remote_search, .domain, etc.
  • IDE Autocomplete - Full IntelliSense support
  • Default Values - Smart defaults (e.g., country="US")
  • to_dict() Method - Easy API conversion
  • Consistent Model - Same pattern as result models

🖥️ CLI Tool

  • brightdata command - Use SDK from terminal
  • Scrape operations - brightdata scrape amazon products ...
  • Search operations - brightdata search amazon products --keyword ...
  • Output formats - JSON, pretty-print, minimal

🏗️ Architecture Improvements

  • Single AsyncEngine - Shared across all scrapers (8x efficiency)
  • Resource Optimization - Reduced memory footprint
  • Enhanced Error Messages - Clear, actionable error messages
  • 502+ Tests Passing - Comprehensive test coverage (99.4%)

🆕 Platforms & Features

  • Amazon Search - Keyword-based product discovery
  • Facebook Scraper - Posts (profile/group/URL), Comments, Reels
  • Instagram Scraper - Profiles, Posts, Comments, Reels
  • Instagram Search - Posts and Reels discovery with filters

🏗️ Architecture

Hierarchical Service Access

The SDK provides a clean, intuitive interface organized by operation type:

client = BrightDataClient()

# URL-based extraction (scrape namespace)
client.scrape.amazon.products(url="...")
client.scrape.linkedin.profiles(url="...")
client.scrape.facebook.posts_by_profile(url="...")
client.scrape.instagram.profiles(url="...")
client.scrape.generic.url(url="...")

# Parameter-based discovery (search namespace)
client.search.amazon.products(keyword="...", min_price=..., max_price=...)
client.search.linkedin.jobs(keyword="...", location="...")
client.search.instagram.posts(url="...", num_of_posts=10)
client.search.google(query="...")
client.scrape.chatgpt.prompt(prompt="...")

# Direct service access (advanced)
client.web_unlocker.fetch(url="...")
client.crawler.discover(url="...")  # Coming soon

Core Components

  • BrightDataClient - Main entry point with authentication and .env support
  • ScrapeService - URL-based data extraction
  • SearchService - Parameter-based discovery
  • Result Models - ScrapeResult, SearchResult, CrawlResult with method tracking
  • Platform Scrapers - Amazon, LinkedIn, ChatGPT, Facebook, Instagram with registry pattern
  • SERP Services - Google, Bing, Yandex search
  • Type System - 100% type safety with TypedDict
  • Constants Module - Centralized configuration (no magic numbers)
  • SSL Helpers - Platform-specific error guidance
  • Function Detection - Automatic SDK function tracking for monitoring

📚 API Reference

Client Initialization

client = BrightDataClient(
    token="your_token",               # Auto-loads from BRIGHTDATA_API_TOKEN if not provided
    customer_id="your_customer_id",   # Auto-loads from BRIGHTDATA_CUSTOMER_ID (optional)
    timeout=30,                        # Default timeout in seconds
    web_unlocker_zone="sdk_unlocker",  # Web Unlocker zone name (default)
    serp_zone="sdk_serp",              # SERP API zone name (default)
    browser_zone="sdk_browser",        # Browser API zone name (default)
    auto_create_zones=True,            # Auto-create missing zones (default: True)
    validate_token=False               # Validate token on init (default: False)
)

Environment Variables:

  • BRIGHTDATA_API_TOKEN - Your API token (required)
  • BRIGHTDATA_CUSTOMER_ID - Your customer ID (optional)

Both are automatically loaded from the environment or a .env file.

Connection Testing

# Test API connection
is_valid = await client.test_connection()
is_valid = client.test_connection_sync()  # Synchronous version

# Get account information
info = await client.get_account_info()
info = client.get_account_info_sync()

print(f"Zones: {info['zone_count']}")
print(f"Active zones: {[z['name'] for z in info['zones']]}")

Zone Management

The SDK can automatically create required zones if they don't exist, or you can manage zones manually.

Automatic Zone Creation

Enable automatic zone creation when initializing the client:

client = BrightDataClient(
    token="your_token",
    auto_create_zones=True  # Automatically create zones if missing
)

# Zones are created on first API call
async with client:
    # sdk_unlocker, sdk_serp, and sdk_browser zones created automatically if needed
    result = await client.scrape.amazon.products(url="...")

Manual Zone Management

List and manage zones programmatically:

# List all zones
zones = await client.list_zones()
zones = client.list_zones_sync()  # Synchronous version

for zone in zones:
    print(f"Zone: {zone['name']} (Type: {zone.get('type', 'unknown')})")

# Advanced: Use ZoneManager directly
from brightdata import ZoneManager

async with client.engine:
    zone_manager = ZoneManager(client.engine)

    # Ensure specific zones exist
    await zone_manager.ensure_required_zones(
        web_unlocker_zone="my_custom_zone",
        serp_zone="my_serp_zone"
    )

Zone Creation API:

  • Endpoint: POST https://api.brightdata.com/zone
  • Zones are created via the Bright Data API
  • Supported zone types: unblocker, serp, browser
  • Automatically handles duplicate zones gracefully
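
For reference, a minimal sketch of calling the zone-creation endpoint directly with httpx. The endpoint and Bearer-token auth are stated above; the JSON body shape below is an assumption for illustration, so confirm the exact fields against the Bright Data API docs (or simply let auto_create_zones handle it).

import os
import httpx

token = os.environ["BRIGHTDATA_API_TOKEN"]
response = httpx.post(
    "https://api.brightdata.com/zone",
    headers={"Authorization": f"Bearer {token}"},
    # Field names are assumed for illustration, not a confirmed schema
    json={"zone": {"name": "sdk_unlocker", "type": "unblocker"}},
)
response.raise_for_status()
print(response.json())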

Result Objects

All operations return rich result objects with timing and metadata:

result = client.scrape.amazon.products(url="...")

# Access data
result.success       # bool - Operation succeeded
result.data          # Any - Scraped data
result.error         # str | None - Error message if failed
result.cost          # float | None - Cost in USD
result.platform      # str | None - Platform name (e.g., "linkedin", "amazon")
result.method        # str | None - Method used: "web_scraper", "web_unlocker", "browser_api"

# Timing information
result.elapsed_ms()              # Total time in milliseconds
result.get_timing_breakdown()    # Detailed timing dict

# Serialization
result.to_dict()                      # Convert to dictionary
result.to_json(indent=2)              # JSON string
result.to_markdown()                  # GitHub-flavored markdown (NEW!)
result.save_to_file("result.json")    # Save as JSON
result.save_to_file("result.md", format="markdown")  # Save as markdown (NEW!)

🖥️ CLI Usage

The SDK includes a powerful CLI tool:

# Help
brightdata --help

# Scrape Amazon product (URL is positional argument)
brightdata scrape amazon products \
  "https://amazon.com/dp/B0CRMZHDG8"

# Search LinkedIn jobs
brightdata search linkedin jobs \
  --keyword "python developer" \
  --location "New York" \
  --remote \
  --output-file jobs.json

# Search Google (query is positional argument)
brightdata search google \
  "python tutorial" \
  --location "United States"

# Generic web scraping (URL is positional argument)
brightdata scrape generic \
  "https://example.com" \
  --response-format raw \
  --output-format pretty

Available Commands

Scrape Operations:

  • brightdata scrape amazon products/reviews/sellers
  • brightdata scrape linkedin profiles/jobs/companies/posts
  • brightdata scrape facebook posts-profile/posts-group/comments/reels
  • brightdata scrape instagram profiles/posts/comments/reels
  • brightdata scrape chatgpt prompt
  • brightdata scrape generic url

Search Operations:

  • brightdata search amazon products
  • brightdata search linkedin jobs/profiles/posts
  • brightdata search instagram posts/reels
  • brightdata search google/bing/yandex
  • brightdata search chatgpt

CLI Output Formats

The CLI supports two different format parameters for different purposes:

Global Output Format (--output-format)

Controls how results are displayed (available for ALL commands):

# JSON format (default) - Full structured output
brightdata scrape amazon products "https://amazon.com/dp/B123" --output-format json

# Pretty format - Human-readable with formatted output
brightdata scrape amazon products "https://amazon.com/dp/B123" --output-format pretty

# Minimal format - Just the data, no metadata
brightdata scrape amazon products "https://amazon.com/dp/B123" --output-format minimal

# Markdown format - GitHub-flavored tables (NEW!)
brightdata scrape amazon products "https://amazon.com/dp/B123" --output-format markdown

Generic Scraper Response Format (--response-format)

Controls what the API returns (generic scraper only):

# Raw format (default) - Returns HTML/text as-is
brightdata scrape generic "https://example.com" --response-format raw

# JSON format - API attempts to parse as JSON
brightdata scrape generic "https://api.example.com/data" --response-format json

Note: You can combine both:

brightdata scrape generic "https://example.com" \
  --response-format raw \
  --output-format pretty

Markdown Output Format (NEW!)

Export results as GitHub-flavored markdown tables - perfect for reports and documentation:

# CLI: Markdown output
brightdata search google "python tutorial" --output-format markdown

# Save to file
brightdata search google "python tutorial" \
  --output-format markdown \
  --output-file report.md

SDK: Markdown methods

from brightdata import BrightDataClient

client = BrightDataClient()
result = client.search.google(query="python tutorial", num_results=5)

# Generate markdown
md = result.to_markdown()
print(md)

# Save as markdown
result.save_to_file("report.md", format="markdown")

Example Output:

# Result: ✅ Success

## Metadata

| Field | Value |
|-------|-------|
| Cost | $0.0010 USD |
| Time | 1234.56ms |

## Data

| position | title | url |
|----------|-------|-----|
| 1 | The Python Tutorial | https://docs.python.org/3/tutorial/ |
| 2 | Python Tutorial - W3Schools | https://www.w3schools.com/python/ |
| 3 | Learn Python | https://www.learnpython.org/ |

🐼 Pandas Integration

Perfect for data analysis workflows:

import pandas as pd
from tqdm import tqdm
from brightdata import BrightDataClient
from brightdata.payloads import AmazonProductPayload

client = BrightDataClient()

# Batch scrape with progress bar
urls = ["https://amazon.com/dp/B001", "https://amazon.com/dp/B002"]
results = []

for url in tqdm(urls, desc="Scraping"):
    payload = AmazonProductPayload(url=url)
    result = client.scrape.amazon.products(**payload.to_dict())
    
    if result.success:
        results.append({
            'asin': payload.asin,
            'title': result.data.get('title'),
            'price': result.data.get('final_price'),
            'rating': result.data.get('rating'),
            'cost': result.cost,
            'elapsed_ms': result.elapsed_ms()
        })

# Create DataFrame
df = pd.DataFrame(results)

# Analysis
print(df.describe())
print(f"Total cost: ${df['cost'].sum():.4f}")
print(f"Avg rating: {df['rating'].mean():.2f}")

# Export
df.to_csv('amazon_products.csv', index=False)
df.to_excel('amazon_products.xlsx', index=False)

# Visualization
import matplotlib.pyplot as plt
df.plot(x='asin', y='rating', kind='bar', title='Product Ratings')
plt.show()

See notebooks/02_pandas_integration.ipynb for complete examples.


🎨 Dataclass Payloads

All payloads are now dataclasses with runtime validation:

Amazon Payloads

from brightdata.payloads import AmazonProductPayload, AmazonReviewPayload

# Product with validation
payload = AmazonProductPayload(
    url="https://amazon.com/dp/B123456789",
    reviews_count=50,
    images_count=10
)

# Helper properties
print(payload.asin)        # "B123456789"
print(payload.domain)      # "amazon.com"
print(payload.is_secure)   # True

# Convert to API dict
api_dict = payload.to_dict()  # Excludes None values

LinkedIn Payloads

from brightdata.payloads import LinkedInJobSearchPayload

payload = LinkedInJobSearchPayload(
    keyword="python developer",
    location="San Francisco",
    remote=True,
    experienceLevel="mid"
)

# Helper properties
print(payload.is_remote_search)  # True

# Use with client
result = client.search.linkedin.jobs(**payload.to_dict())

ChatGPT Payloads

from brightdata.payloads import ChatGPTPromptPayload

payload = ChatGPTPromptPayload(
    prompt="Explain async programming",
    web_search=True
)

# Default values
print(payload.country)  # "US" (default)
print(payload.uses_web_search)  # True

Validation Examples

# Runtime validation catches errors early
try:
    AmazonProductPayload(url="invalid-url")
except ValueError as e:
    print(e)  # "url must be valid HTTP/HTTPS URL"

try:
    AmazonProductPayload(
        url="https://amazon.com/dp/B123",
        reviews_count=-1
    )
except ValueError as e:
    print(e)  # "reviews_count must be non-negative"

🔧 Advanced Usage

Batch Operations

# Scrape multiple URLs concurrently
urls = [
    "https://amazon.com/dp/B001",
    "https://amazon.com/dp/B002",
    "https://amazon.com/dp/B003"
]

results = client.scrape.amazon.products(url=urls)

for result in results:
    if result.success:
        print(f"{result.data['title']}: ${result.data['price']}")

Platform-Specific Options

# Amazon reviews with filters
result = client.scrape.amazon.reviews(
    url="https://amazon.com/dp/B123",
    pastDays=7,              # Last 7 days only
    keyWord="quality",       # Filter by keyword
    numOfReviews=50          # Limit to 50 reviews
)

# LinkedIn jobs with extensive filters
result = client.search.linkedin.jobs(
    keyword="python developer",
    location="New York",
    country="us",
    jobType="full-time",
    experienceLevel="mid",
    remote=True,
    company="Microsoft",
    timeRange="past-week"
)

Sync vs Async Examples - Full Coverage

All SDK methods support both sync and async patterns. Choose based on your needs:

Amazon Products

# SYNC - Simple scripts
result = client.scrape.amazon.products(url="https://amazon.com/dp/B123")

# ASYNC - Concurrent operations
import asyncio

async def scrape_amazon():
    async with BrightDataClient() as client:
        result = await client.scrape.amazon.products_async(url="https://amazon.com/dp/B123")
        return result

result = asyncio.run(scrape_amazon())

Amazon Search

# SYNC - Simple keyword search
result = client.search.amazon.products(keyword="laptop", prime_eligible=True)

# ASYNC - Batch keyword searches
async def search_amazon():
    async with BrightDataClient() as client:
        result = await client.search.amazon.products_async(
            keyword="laptop",
            min_price=50000,
            max_price=200000,
            prime_eligible=True
        )
        return result

result = asyncio.run(search_amazon())

LinkedIn Scraping

# SYNC - Single profile
result = client.scrape.linkedin.profiles(url="https://linkedin.com/in/johndoe")

# ASYNC - Multiple profiles concurrently
async def scrape_linkedin():
    async with BrightDataClient() as client:
        urls = ["https://linkedin.com/in/person1", "https://linkedin.com/in/person2"]
        results = await client.scrape.linkedin.profiles_async(url=urls)
        return results

results = asyncio.run(scrape_linkedin())

LinkedIn Job Search

# SYNC - Simple job search
result = client.search.linkedin.jobs(keyword="python", location="NYC", remote=True)

# ASYNC - Advanced search with filters
async def search_jobs():
    async with BrightDataClient() as client:
        result = await client.search.linkedin.jobs_async(
            keyword="python developer",
            location="New York",
            experienceLevel="mid",
            jobType="full-time",
            remote=True
        )
        return result

result = asyncio.run(search_jobs())

SERP API (Google, Bing, Yandex)

# SYNC - Quick Google search
result = client.search.google(query="python tutorial", location="United States")

# ASYNC - Multiple search engines concurrently
async def search_all_engines():
    async with BrightDataClient() as client:
        google = await client.search.google_async(query="python", num_results=10)
        bing = await client.search.bing_async(query="python", num_results=10)
        yandex = await client.search.yandex_async(query="python", num_results=10)
        return google, bing, yandex

results = asyncio.run(search_all_engines())

Facebook Scraping

# SYNC - Single profile posts
result = client.scrape.facebook.posts_by_profile(
    url="https://facebook.com/profile",
    num_of_posts=10
)

# ASYNC - Multiple sources
async def scrape_facebook():
    async with BrightDataClient() as client:
        profile_posts = await client.scrape.facebook.posts_by_profile_async(
            url="https://facebook.com/zuck",
            num_of_posts=10
        )
        group_posts = await client.scrape.facebook.posts_by_group_async(
            url="https://facebook.com/groups/programming",
            num_of_posts=10
        )
        return profile_posts, group_posts

results = asyncio.run(scrape_facebook())

Instagram Scraping

# SYNC - Single profile
result = client.scrape.instagram.profiles(url="https://instagram.com/instagram")

# ASYNC - Profile + posts
async def scrape_instagram():
    async with BrightDataClient() as client:
        profile = await client.scrape.instagram.profiles_async(
            url="https://instagram.com/instagram"
        )
        posts = await client.scrape.instagram.posts_async(
            url="https://instagram.com/p/ABC123"
        )
        return profile, posts

results = asyncio.run(scrape_instagram())

ChatGPT

# SYNC - Single prompt
result = client.scrape.chatgpt.prompt(prompt="Explain Python", web_search=True)

# ASYNC - Batch prompts
async def ask_chatgpt():
    async with BrightDataClient() as client:
        result = await client.scrape.chatgpt.prompts_async(
            prompts=["What is Python?", "What is JavaScript?"],
            web_searches=[False, True]
        )
        return result

result = asyncio.run(ask_chatgpt())

Generic Web Scraping

# SYNC - Single URL
result = client.scrape.generic.url(url="https://example.com")

# ASYNC - Concurrent scraping
async def scrape_multiple():
    async with BrightDataClient() as client:
        results = await client.scrape.generic.url_async([
            "https://example1.com",
            "https://example2.com",
            "https://example3.com"
        ])
        return results

results = asyncio.run(scrape_multiple())

When to Use Sync vs Async

Use Sync When:

  • ✅ Simple scripts or notebooks
  • ✅ Single operations at a time
  • ✅ Learning or prototyping
  • ✅ Sequential workflows

Use Async When:

  • ✅ Scraping multiple URLs concurrently
  • ✅ Combining multiple API calls
  • ✅ Production applications
  • ✅ Performance-critical operations

Note: Sync wrappers (e.g., profiles()) internally use asyncio.run() and cannot be called from within an existing async context. Use *_async methods when you're already in an async function.
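
A minimal sketch of that rule: the commented-out line marks where the sync wrapper would raise a RuntimeError (asyncio.run() cannot start inside a running event loop), and the *_async call replaces it.

import asyncio
from brightdata import BrightDataClient

async def main():
    async with BrightDataClient() as client:
        # client.scrape.linkedin.profiles(url=...)  # would raise RuntimeError here
        return await client.scrape.linkedin.profiles_async(
            url="https://linkedin.com/in/johndoe"
        )

result = asyncio.run(main())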

SSL Certificate Error Handling

The SDK includes comprehensive SSL error handling with platform-specific guidance:

from brightdata import BrightDataClient
from brightdata.exceptions import SSLError

try:
    client = BrightDataClient()
    result = client.scrape.generic.url("https://example.com")
except SSLError as e:
    # Helpful error message with platform-specific fix instructions
    print(e)
    # On macOS, suggests:
    # - pip install --upgrade certifi
    # - Running Install Certificates.command
    # - Setting SSL_CERT_FILE environment variable

Common SSL fixes:

# Option 1: Upgrade certifi
pip install --upgrade certifi

# Option 2: Set SSL_CERT_FILE (macOS/Linux)
export SSL_CERT_FILE=$(python -m certifi)

# Option 3: Run Install Certificates (macOS python.org installers)
/Applications/Python\ 3.x/Install\ Certificates.command

Code Quality Improvements (PR #6)

Recent architectural refactoring includes:

1. Centralized Constants Module

All magic numbers moved to constants.py:

from brightdata.constants import (
    DEFAULT_POLL_INTERVAL,      # 10 seconds
    DEFAULT_POLL_TIMEOUT,       # 600 seconds
    DEFAULT_TIMEOUT_SHORT,      # 180 seconds
    DEFAULT_TIMEOUT_MEDIUM,     # 240 seconds
    DEFAULT_COST_PER_RECORD,    # 0.001 USD
)

2. Method Field Instead of Fallback

Results now track which method was used:

result = client.scrape.amazon.products(url="...")
print(result.method)  # "web_scraper", "web_unlocker", or "browser_api"

3. Function-Level Monitoring

Automatic tracking of which SDK functions are called:

# Automatically detected and sent in API requests
result = client.scrape.linkedin.profiles(url="...")
# Internal: sdk_function="profiles" sent to Bright Data

4. Service Class Separation

Clean separation of concerns:

  • ScrapeService - URL-based extraction
  • SearchService - Parameter-based discovery
  • CrawlerService - Web crawling (coming soon)
  • WebUnlockerService - Direct proxy access

5. Enhanced SSL Error Handling

Platform-specific guidance for certificate issues:

from brightdata.utils.ssl_helpers import (
    is_ssl_certificate_error,
    get_ssl_error_message
)

🧪 Testing

The SDK includes 502+ comprehensive tests:

# Run all tests
pytest tests/

# Run specific test suites
pytest tests/unit/              # Unit tests
pytest tests/integration/       # Integration tests
pytest tests/e2e/               # End-to-end tests

# Run with coverage
pytest tests/ --cov=brightdata --cov-report=html

🏛️ Design Philosophy

  • Client is single source of truth for configuration
  • Authentication "just works" with minimal setup
  • Fail fast and clearly when credentials are missing/invalid
  • Each platform is an expert in its domain
  • Scrape vs Search distinction is clear and consistent
  • Build for future - registry pattern enables intelligent routing (see the sketch below)
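
An illustrative sketch of what a registry pattern enables (not the SDK's actual internals): scrapers register against URL patterns, and a router dispatches each URL to the matching scraper, falling back to a generic one.

import re
from typing import Callable

SCRAPER_REGISTRY: dict[str, Callable[[str], str]] = {}

def register(pattern: str):
    """Associate a URL regex with a scraper function."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        SCRAPER_REGISTRY[pattern] = fn
        return fn
    return decorator

@register(r"amazon\.com")
def scrape_amazon(url: str) -> str:
    return f"amazon scraper handling {url}"

@register(r"linkedin\.com")
def scrape_linkedin(url: str) -> str:
    return f"linkedin scraper handling {url}"

def route(url: str) -> str:
    # Intelligent routing: first matching pattern wins, generic as fallback
    for pattern, scraper in SCRAPER_REGISTRY.items():
        if re.search(pattern, url):
            return scraper(url)
    return f"generic scraper handling {url}"

print(route("https://amazon.com/dp/B123"))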

📖 Documentation

  • Jupyter Notebooks (interactive) - see the notebooks/ directory
  • Code Examples
  • API Documentation


🔧 Troubleshooting

SSL Certificate Errors (macOS)

If you encounter SSL certificate verification errors, especially on macOS:

SSL: CERTIFICATE_VERIFY_FAILED

The SDK will provide helpful, platform-specific guidance. Quick fixes:

# Option 1: Upgrade certifi
pip install --upgrade certifi

# Option 2: Set SSL_CERT_FILE environment variable
export SSL_CERT_FILE=$(python -m certifi)

# Option 3: Run Install Certificates (macOS with python.org installer)
/Applications/Python\ 3.x/Install\ Certificates.command

# Option 4: Install via Homebrew (if using Homebrew Python)
brew install ca-certificates

Missing Token

# Error: BRIGHTDATA_API_TOKEN not found in environment

# Solution 1: Create .env file
echo "BRIGHTDATA_API_TOKEN=your_token" > .env

# Solution 2: Export environment variable
export BRIGHTDATA_API_TOKEN="your_token"

# Solution 3: Pass directly to client
client = BrightDataClient(token="your_token")

Import Errors

# If you get import errors, ensure package is installed
pip install --upgrade brightdata-sdk

# For development installation
pip install -e .

🤝 Contributing

Contributions are welcome! Check the GitHub repository for contribution guidelines.

Development Setup

git clone https://github.com/brightdata/sdk-python.git
cd sdk-python

# Install with dev dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run tests
pytest tests/

📊 Project Stats

  • Production Code: ~9,000 lines
  • Test Code: ~4,000 lines
  • Documentation: 5 Jupyter notebooks + 10 examples
  • Test Coverage: 502+ tests passing (Unit, Integration, E2E)
  • Supported Platforms: Amazon, LinkedIn, ChatGPT, Facebook, Instagram, Generic Web
  • Supported Search Engines: Google, Bing, Yandex
  • Type Safety: 100% (Dataclasses + TypedDict)
  • Resource Efficiency: Single shared AsyncEngine
  • Data Science Ready: Pandas, tqdm, joblib integration
  • CLI Tool: Full-featured command-line interface
  • Code Quality: Enterprise-grade (typed, tested, pre-commit hooks)

📝 License

MIT License - see LICENSE file for details.



💡 Examples

Complete Workflow Example

from brightdata import BrightDataClient

# Initialize (auto-loads from .env or environment)
client = BrightDataClient()

# Test connection
if client.test_connection_sync():
    print("✅ Connected to Bright Data API")
    
    # Get account info
    info = client.get_account_info_sync()
    print(f"Active zones: {info['zone_count']}")
    
    # Scrape Amazon product
    product = client.scrape.amazon.products(
        url="https://amazon.com/dp/B0CRMZHDG8"
    )
    
    if product.success:
        print(f"Product: {product.data[0]['title']}")
        print(f"Price: {product.data[0]['final_price']}")
        print(f"Rating: {product.data[0]['rating']}")
        print(f"Cost: ${product.cost:.4f}")
    
    # Search LinkedIn jobs
    jobs = client.search.linkedin.jobs(
        keyword="python developer",
        location="San Francisco",
        remote=True
    )
    
    if jobs.success:
        print(f"Found {len(jobs.data)} jobs")
    
    # Scrape Facebook posts
    fb_posts = client.scrape.facebook.posts_by_profile(
        url="https://facebook.com/zuck",
        num_of_posts=10,
        timeout=240
    )
    
    if fb_posts.success:
    print(f"Scraped {len(fb_posts.data)} Facebook posts")
    
    # Scrape Instagram profile
    ig_profile = client.scrape.instagram.profiles(
        url="https://instagram.com/instagram",
        timeout=240
    )
    
    if ig_profile.success:
        print(f"Profile: {ig_profile.data[0]['username']}")
        print(f"Followers: {ig_profile.data[0]['followers_count']}")
    
    # Search Google
    search_results = client.search.google(
        query="python async tutorial",
        location="United States",
        num_results=10
    )
    
    if search_results.success:
        for i, item in enumerate(search_results.data[:5], 1):
            print(f"{i}. {item.get('title', 'N/A')}")

Interactive CLI Demo

Run the included demo to explore the SDK interactively:

python demo_sdk.py

🙏 Acknowledgments

Built with best practices from:

  • Modern Python packaging (PEP 518, 621)
  • Async/await patterns
  • Type safety (PEP 484, 544, dataclasses)
  • Enterprise-grade engineering standards
  • Data science workflows (pandas, jupyter)

Built For

  • 🎓 Data Scientists - Jupyter notebooks, pandas integration, visualization examples
  • 👨‍💻 Developers - Type-safe API, comprehensive docs, CLI tool
  • 🏢 Enterprises - Production-ready, well-tested, resource-efficient

🌟 Why Choose This SDK?

  • Data Scientist Friendly - 5 Jupyter notebooks, pandas examples, visualization guides
  • Type Safe - Dataclass payloads with runtime validation
  • Enterprise Ready - 502+ tests, resource efficient, production-proven
  • Well Documented - Interactive notebooks + code examples + API docs
  • Easy to Use - CLI tool, intuitive API, helpful error messages
  • Actively Maintained - Regular updates, bug fixes, new features

Ready to start scraping? Get your API token at brightdata.com and try our quickstart notebook!