1 | 1 | # Bright Data Python SDK |
2 | 2 |
3 | | -The official Python SDK for [Bright Data](https://brightdata.com) APIs. Scrape any website, get SERP results, bypass bot detection and CAPTCHAs. |
| 3 | +The official Python SDK for [Bright Data](https://brightdata.com) APIs. Scrape any website, get SERP results, bypass bot detection and CAPTCHAs, and access 100+ ready-made datasets. |
4 | 4 |
5 | 5 | [](https://www.python.org/) |
6 | 6 | [](LICENSE) |
@@ -135,6 +135,55 @@ async with BrightDataClient() as client: |
135 | 135 | - `client.scrape.instagram` - profiles, posts, comments, reels |
136 | 136 | - `client.scrape.facebook` - posts, comments, reels |
137 | 137 |
| 138 | +## Datasets API |
| 139 | + |
| 140 | +Access 100+ ready-made datasets from Bright Data — pre-collected, structured data from popular platforms. |
| 141 | + |
| 142 | +```python |
| 143 | +async with BrightDataClient() as client: |
| 144 | +    # Filter a dataset (returns a snapshot_id) |
| 145 | +    snapshot_id = await client.datasets.imdb_movies( |
| 146 | +        filter={"name": "title", "operator": "includes", "value": "black"}, |
| 147 | +        records_limit=5 |
| 148 | +    ) |
| 149 | + |
| 150 | +    # Download when ready (polls until snapshot is complete) |
| 151 | +    data = await client.datasets.imdb_movies.download(snapshot_id) |
| 152 | +    print(f"Got {len(data)} records") |
| 153 | + |
| 154 | +    # Quick sample: .sample() auto-discovers fields, no filter needed |
| 155 | +    # Works on any dataset |
| 156 | +    snapshot_id = await client.datasets.imdb_movies.sample(records_limit=5) |
| 157 | +``` |
| 158 | + |
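The "polls until snapshot is complete" behaviour noted in the snippet above can be pictured as a generic polling loop. The following is a minimal sketch under stated assumptions, not the SDK's implementation: `poll_until_ready`, `check_status`, `make_fake_status`, the `"ready"`/`"failed"`/`"building"` status strings, and the interval and timeout defaults are all hypothetical names chosen for illustration.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def poll_until_ready(
    check_status: Callable[[], Awaitable[str]],
    interval: float = 1.0,
    timeout: float = 60.0,
) -> int:
    """Call check_status() until it returns "ready"; return the number of checks made.

    Hypothetical helper: the status strings and defaults are assumptions,
    not values taken from the Bright Data SDK.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        status = await check_status()
        if status == "ready":
            return attempts
        if status == "failed":
            raise RuntimeError("snapshot failed")
        # Give up if the next sleep would run past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError("snapshot not ready before timeout")
        await asyncio.sleep(interval)


def make_fake_status(ready_after: int) -> Callable[[], Awaitable[str]]:
    """Build a stand-in status check that reports "ready" on the Nth call."""
    calls = 0

    async def check() -> str:
        nonlocal calls
        calls += 1
        return "ready" if calls >= ready_after else "building"

    return check
```

For example, `asyncio.run(poll_until_ready(make_fake_status(3), interval=0.01))` performs three checks before returning. A fixed interval keeps the sketch short; a real client might prefer backoff.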
| 159 | +**Export results to file:** |
| 160 | + |
| 161 | +```python |
| 162 | +from brightdata.datasets import export |
| 163 | + |
| 164 | +export(data, "results.json") # JSON |
| 165 | +export(data, "results.csv") # CSV |
| 166 | +export(data, "results.jsonl") # JSONL |
| 167 | +``` |
| 168 | + |
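The export calls above appear to pick the output format from the file extension. As a rough illustration of that idea using only the standard library, here is a sketch; `export_records` is a hypothetical stand-in and makes no claim about how `brightdata.datasets.export` is actually implemented or what options it accepts.

```python
import csv
import json
from pathlib import Path
from typing import Any


def export_records(records: list[dict[str, Any]], path: str) -> None:
    """Write records to path, choosing the format from the file extension.

    Hypothetical helper for illustration; not part of the Bright Data SDK.
    """
    target = Path(path)
    suffix = target.suffix.lower()
    if suffix == ".json":
        target.write_text(json.dumps(records, indent=2), encoding="utf-8")
    elif suffix == ".jsonl":
        # One JSON object per line.
        with target.open("w", encoding="utf-8") as fh:
            for record in records:
                fh.write(json.dumps(record) + "\n")
    elif suffix == ".csv":
        # Union of keys in first-seen order, so rows with missing fields still fit.
        fieldnames = list(dict.fromkeys(key for record in records for key in record))
        with target.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)
    else:
        raise ValueError(f"unsupported extension: {suffix!r}")
```

In the CSV branch, missing fields are written as empty cells and nested values are stringified as-is, which may or may not match what the SDK does.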
| 169 | +**Available dataset categories:** |
| 170 | +- **E-commerce:** Amazon, Walmart, Shopee, Lazada, Zalando, Zara, H&M, Shein, IKEA, Sephora, and more |
| 171 | +- **Business intelligence:** ZoomInfo, PitchBook, Owler, Slintel, VentureRadar, Manta |
| 172 | +- **Jobs & HR:** Glassdoor (companies, reviews, jobs), Indeed (companies, jobs), Xing |
| 173 | +- **Reviews:** Google Maps, Yelp, G2, Trustpilot, TrustRadius |
| 174 | +- **Social media:** Pinterest (posts, profiles), Facebook Pages |
| 175 | +- **Real estate:** Zillow, Airbnb, and 8+ regional platforms |
| 176 | +- **Luxury brands:** Chanel, Dior, Prada, Balenciaga, Hermes, YSL, and more |
| 177 | +- **Entertainment:** IMDB, NBA, Goodreads |
| 178 | + |
| 179 | +**Discover available fields:** |
| 180 | + |
| 181 | +```python |
| 182 | +metadata = await client.datasets.imdb_movies.get_metadata() |
| 183 | +for name, field in metadata.fields.items(): |
| 184 | +    print(f"{name}: {field.type}") |
| 185 | +``` |
| 186 | + |
138 | 187 | ## Async Usage |
139 | 188 |
140 | 189 | Run multiple requests concurrently: |
|