
Commit 50a697a

add browserapi support as well as rest of the datasets

1 parent 6699b56 · commit 50a697a

154 files changed: 4588 additions & 80 deletions


CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -1,5 +1,13 @@
 # Bright Data Python SDK Changelog
 
+## Version 2.2.2 - Browser API, Scraper Studio, 175 Datasets
+
+- **Browser API**: Connect to cloud Chrome via CDP WebSocket. The SDK builds the `wss://` URL; you connect with Playwright/Puppeteer (`client.browser.get_connect_url()`)
+- **Scraper Studio**: Trigger and fetch results from custom scrapers built in Bright Data's IDE (`client.scraper_studio.run()`)
+- **75 more datasets**: Agoda, AutoZone, BBC, Best Buy, Bluesky, Booking, Costco, eBay, Etsy, GitHub, Google News/Play/Shopping, Home Depot, Kroger, Lowe's, Macy's, Microcenter, Ozon, Quora, Realtor, Reddit, Snapchat, TikTok Shop, Tokopedia, Vimeo, Wayfair, Wikipedia, Wildberries, X/Twitter, Yahoo Finance, Zoopla, and more — **175 total**
+
+---
+
 ## Version 2.2.1 - 100 Datasets API
 
 ### ✨ New Features
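
The Browser API bullet above compresses a two-step flow: ask the SDK for the CDP endpoint, then attach a browser library to it. A minimal sketch of that flow under stated assumptions: Playwright is installed, `client.browser.get_connect_url()` returns the `wss://` URL as the changelog says, and whether that call must be awaited is a guess.

```python
# Hedged sketch of the Browser API flow described in the changelog entry above.
# Assumptions: playwright is installed (pip install playwright), and
# client.browser.get_connect_url() returns the wss:// CDP endpoint as stated.
# Whether get_connect_url() is awaitable is an assumption; drop the await
# if the SDK exposes it synchronously.
import asyncio
import os

from playwright.async_api import async_playwright

from brightdata import BrightDataClient


async def main() -> None:
    async with BrightDataClient(token=os.environ["BRIGHTDATA_API_TOKEN"]) as client:
        # SDK builds the wss:// URL...
        cdp_url = await client.browser.get_connect_url()

        # ...and we connect to the cloud Chrome instance over CDP with Playwright
        async with async_playwright() as pw:
            browser = await pw.chromium.connect_over_cdp(cdp_url)
            page = await browser.new_page()
            await page.goto("https://example.com")
            print(await page.title())
            await browser.close()


asyncio.run(main())
```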

notebooks/04_web_unlocker.ipynb

Lines changed: 2 additions & 2 deletions
@@ -326,13 +326,13 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": ".venv",
    "language": "python",
    "name": "python3"
   },
   "language_info": {
    "name": "python",
-   "version": "3.11.0"
+   "version": "3.11.10"
   }
  },
  "nbformat": 4,

notebooks/05_scraper_studio.ipynb

Lines changed: 313 additions & 0 deletions
@@ -0,0 +1,313 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scraper Studio - Custom Scrapers via SDK\n",
    "\n",
    "Trigger and fetch results from your custom scrapers built via Bright Data's Scraper Studio (AI Agent, IDE, or templates).\n",
    "\n",
    "## What You'll Learn\n",
    "1. Setup and authentication\n",
    "2. Trigger a custom scraper\n",
    "3. Fetch results when ready\n",
    "4. Check job status\n",
    "5. Multiple inputs\n",
    "\n",
    "---\n",
    "\n",
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "API Token: 7011787d-2...3336\n",
      "Setup complete!\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "from dotenv import load_dotenv\n",
    "load_dotenv()\n",
    "\n",
    "API_TOKEN = os.getenv(\"BRIGHTDATA_API_TOKEN\")\n",
    "if not API_TOKEN:\n",
    "    raise ValueError(\"Set BRIGHTDATA_API_TOKEN in .env file\")\n",
    "\n",
    "print(f\"API Token: {API_TOKEN[:10]}...{API_TOKEN[-4:]}\")\n",
    "print(\"Setup complete!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize Client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Client initialized\n"
     ]
    }
   ],
   "source": [
    "from brightdata import BrightDataClient\n",
    "\n",
    "client = BrightDataClient(token=API_TOKEN)\n",
    "await client.__aenter__()\n",
    "\n",
    "# Your collector ID from Scraper Studio dashboard\n",
    "COLLECTOR_ID = \"c_mly0sa6x10hshxi8jb\"  # Replace with your collector ID\n",
    "\n",
    "print(\"Client initialized\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Single URL - Trigger\n",
    "\n",
    "Trigger the scraper. Returns immediately with a job object containing the `response_id`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Job triggered: d2t1771835182154rujjlatrcl4o\n"
     ]
    }
   ],
   "source": [
    "# Trigger - returns immediately\n",
    "job = await client.scraper_studio.trigger(\n",
    "    collector=COLLECTOR_ID,\n",
    "    input={\"url\": \"https://www.sahibinden.com/ilan/emlak-konut-satilik-golden-gate-1287846580/detay\"},\n",
    ")\n",
    "print(f\"Job triggered: {job.response_id}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Single URL - Fetch\n",
    "\n",
    "Try to fetch the result. If not ready yet, re-run this cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Got 1 record(s)\n",
      "  title: GOLDEN GATE\n",
      "  price: {'value': 6600000, 'currency': 'TRY', 'symbol': '₺'}\n",
      "  property_size: 100\n",
      "  room_count: 3\n",
      "  building_age: 6-10 arası\n"
     ]
    }
   ],
   "source": [
    "# Fetch - single attempt, re-run if not ready\n",
    "try:\n",
    "    data = await job.fetch()\n",
    "    print(f\"Got {len(data)} record(s)\")\n",
    "    for record in data:\n",
    "        for key, value in list(record.items())[:5]:\n",
    "            print(f\"  {key}: {value}\")\n",
    "except Exception as e:\n",
    "    print(f\"Not ready yet: {e}\\nRe-run this cell in a few seconds.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Check Job Status\n",
    "\n",
    "Check the status of a previously triggered job using its job ID (from the Scraper Studio dashboard)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Job ID: j_mly4pzxd1mj4u0gjj8\n",
      "Status: done\n",
      "Collector: c_mly0sa6x10hshxi8jb\n",
      "Inputs: 1\n",
      "Lines: 1\n",
      "Success rate: 1\n",
      "Job time: 106996ms\n"
     ]
    }
   ],
   "source": [
    "# Check status of a known job\n",
    "JOB_ID = \"j_mly4pzxd1mj4u0gjj8\"  # Replace with your job ID\n",
    "\n",
    "info = await client.scraper_studio.status(job_id=JOB_ID)\n",
    "\n",
    "print(f\"Job ID: {info.id}\")\n",
    "print(f\"Status: {info.status}\")\n",
    "print(f\"Collector: {info.collector}\")\n",
    "print(f\"Inputs: {info.inputs}\")\n",
    "print(f\"Lines: {info.lines}\")\n",
    "print(f\"Success rate: {info.success_rate}\")\n",
    "print(f\"Job time: {info.job_time}ms\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Multiple Inputs\n",
    "\n",
    "`run()` accepts a list of inputs, triggers each, polls, and returns combined results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "urls = [\n",
    "    {\"url\": \"https://www.sahibinden.com/ilan/emlak-konut-satilik-golden-gate-1287846580/detay\"},\n",
    "    {\"url\": \"https://www.sahibinden.com/ilan/emlak-konut-satilik-golden-gate-1287846581/detay\"},\n",
    "]\n",
    "\n",
    "multi_data = await client.scraper_studio.run(\n",
    "    collector=COLLECTOR_ID,\n",
    "    input=urls,\n",
    "    timeout=300,\n",
    ")\n",
    "\n",
    "print(f\"Got {len(multi_data)} total record(s)\")\n",
    "for record in multi_data:\n",
    "    print(f\"  - {record.get('title', 'N/A')}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Save Results\n",
    "\n",
    "Save the scraped data to a JSON file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "with open(\"scraper_studio_results.json\", \"w\", encoding=\"utf-8\") as f:\n",
    "    json.dump(data, f, indent=2, ensure_ascii=False)\n",
    "\n",
    "print(f\"Saved {len(data)} record(s) to scraper_studio_results.json\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Cleanup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "await client.__aexit__(None, None, None)\n",
    "print(\"Client closed.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Summary\n",
    "\n",
    "| Method | What it does |\n",
    "|--------|-------------|\n",
    "| `client.scraper_studio.run(collector, input)` | Trigger + poll + return data |\n",
    "| `client.scraper_studio.trigger(collector, input)` | Trigger only, returns job object |\n",
    "| `job.fetch()` | Single fetch attempt |\n",
    "| `job.wait_and_fetch(timeout)` | Poll until data arrives |\n",
    "| `client.scraper_studio.status(job_id)` | Check job status |\n",
    "| `client.scraper_studio.fetch(response_id)` | Fetch results by response_id |\n",
    "\n",
    "## Resources\n",
    "\n",
    "- [Scraper Studio Dashboard](https://brightdata.com/cp/data_collector)\n",
    "- [API Reference](https://docs.brightdata.com/api-reference/scraper-studio-api/)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
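
The notebook's summary table lists `job.wait_and_fetch(timeout)` and `client.scraper_studio.fetch(response_id)` but never executes them. A hedged sketch of where they would fit in the same flow; the method names and argument lists come from the table, while the keyword spellings, collector ID, and URL are placeholders.

```python
# Sketch of the two summary-table methods the notebook does not run:
# job.wait_and_fetch(timeout) and client.scraper_studio.fetch(response_id).
# Signatures are taken from the table above; collector ID and URL are
# placeholders, and treating fetch()'s response_id as positional is a guess.
import asyncio
import os

from brightdata import BrightDataClient


async def main() -> None:
    async with BrightDataClient(token=os.environ["BRIGHTDATA_API_TOKEN"]) as client:
        job = await client.scraper_studio.trigger(
            collector="c_your_collector_id",  # placeholder collector ID
            input={"url": "https://example.com"},
        )

        # Poll until data arrives, instead of the single-shot job.fetch()
        data = await job.wait_and_fetch(timeout=300)
        print(f"Got {len(data)} record(s)")

        # Fetch the same results again by response_id, e.g. one saved earlier
        again = await client.scraper_studio.fetch(job.response_id)
        print(f"Refetched {len(again)} record(s)")


asyncio.run(main())
```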
