A comprehensive Bluesky scraper that collects posts, profiles, followers, and full comment threads from searches, users, or direct post URLs. It helps teams and analysts turn public Bluesky activity into clean, structured datasets for research, discovery, and automation.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
This project provides an all-in-one solution for extracting structured public data from Bluesky in a consistent and repeatable way. It removes the friction of manual browsing and fragmented tools by centralizing multiple collection modes into a single workflow. It is designed for researchers, marketers, analysts, and developers who need scalable access to Bluesky conversations and social graphs.
- Collects posts, profiles, followers, and follow relationships in one tool
- Supports keyword-based searches, user-centric extraction, and direct post lookups
- Designed for repeated runs to keep datasets fresh and comparable over time
- Outputs structured records ready for analysis or downstream pipelines
| Feature | Description |
|---|---|
| Multi-mode extraction | Collect posts, profiles, followers, follows, or profile details from a single configuration. |
| Keyword search | Discover conversations and trends by searching posts with specific keywords. |
| User-based scraping | Fetch posts, followers, or follows for specific user handles. |
| Engagement metrics | Capture replies, reposts, and likes for posts when available. |
| Scalable runs | Handle small tests or large data pulls with configurable limits. |
| Field Name | Field Description |
|---|---|
| kind | Type of record such as post, profile, follower, or follow. |
| query | Keyword or user handle that generated the record. |
| id | Unique identifier of the post or profile. |
| authorHandle | Bluesky handle associated with the record. |
| authorName | Display name of the author or profile. |
| text | Post content or profile bio text. |
| createdAt | Timestamp indicating when the post was created. |
| replyCount | Number of replies on a post, if available. |
| repostCount | Number of reposts on a post, if available. |
| likeCount | Number of likes on a post, if available. |
| url | Direct link to the post or profile. |
```json
[
  {
    "kind": "post",
    "query": "web scraping",
    "id": "3kq7xexample",
    "authorHandle": "example.bsky.social",
    "authorName": "Example User",
    "text": "Web scraping is becoming essential for modern data workflows.",
    "createdAt": "2024-06-01T14:32:10.000Z",
    "replyCount": 4,
    "repostCount": 12,
    "likeCount": 57,
    "url": "https://bsky.app/profile/example.bsky.social/post/3kq7xexample"
  }
]
```
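As a sketch of how a downstream consumer might work with these records, the snippet below filters the scraper's JSON output to post records and ranks them by total engagement. The field names follow the output schema above; the ranking logic itself is illustrative and not part of the scraper.

```python
import json

# A record shaped like the sample output above
records = json.loads("""
[
  {
    "kind": "post",
    "query": "web scraping",
    "id": "3kq7xexample",
    "authorHandle": "example.bsky.social",
    "authorName": "Example User",
    "text": "Web scraping is becoming essential for modern data workflows.",
    "createdAt": "2024-06-01T14:32:10.000Z",
    "replyCount": 4,
    "repostCount": 12,
    "likeCount": 57,
    "url": "https://bsky.app/profile/example.bsky.social/post/3kq7xexample"
  }
]
""")

# Keep only post records (other kinds: profile, follower, follow)
posts = [r for r in records if r["kind"] == "post"]

# Engagement = replies + reposts + likes, treating missing counts as 0
for p in posts:
    p["engagement"] = (
        p.get("replyCount", 0) + p.get("repostCount", 0) + p.get("likeCount", 0)
    )

top = sorted(posts, key=lambda p: p["engagement"], reverse=True)
print(top[0]["id"], top[0]["engagement"])  # → 3kq7xexample 73
```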
```
Bluesky Scraper | All-In-One | $1.5 / 1K
├── src/
│   ├── main.py
│   ├── actions/
│   │   ├── search_posts.py
│   │   ├── get_user_posts.py
│   │   ├── get_followers.py
│   │   ├── get_follows.py
│   │   └── get_profile.py
│   ├── parsers/
│   │   ├── post_parser.py
│   │   └── profile_parser.py
│   ├── utils/
│   │   ├── request_helper.py
│   │   └── rate_limit.py
│   └── config/
│       └── settings.example.json
├── data/
│   ├── sample_input.json
│   └── sample_output.json
├── requirements.txt
└── README.md
```
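The `config/settings.example.json` file in the tree suggests run options are supplied as JSON. A hypothetical configuration might look like the following; the field names here are illustrative assumptions, not the project's actual schema:

```json
{
  "mode": "search_posts",
  "query": "web scraping",
  "maxResults": 500,
  "requestDelaySeconds": 1.0
}
```

In this sketch, `mode` would select one of the action modules (search, user posts, followers, follows, or profile), and `maxResults` would cap the number of saved records per run.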
- Market researchers use it to track emerging topics on Bluesky, so they can identify early trends and shifts in discussion.
- Growth teams use it to discover active creators and audiences, so they can plan partnerships or outreach.
- Data analysts use it to collect engagement metrics, so they can measure resonance and content performance.
- Developers use it to build social datasets, so they can enrich internal tools or analytics dashboards.
**What types of data can be collected?**
You can collect posts by keyword, user posts, followers, follows, and profile details, all in a structured format.

**Is there a limit on how much data can be extracted?**
The maximum number of saved results is configurable. You can start with small limits for testing or scale up for larger datasets.

**Can it be used for recurring data collection?**
Yes, it is suitable for repeated runs, making it useful for monitoring changes, trends, or audience growth over time.

**What formats does the output support?**
The extracted data is structured and ready for export into common formats such as JSON, CSV, or database ingestion workflows.
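As an illustration of the JSON-to-CSV path, the sketch below flattens post records into CSV using only the Python standard library. The record fields follow the output schema above; the conversion step itself is an example of a downstream workflow, not a feature of the scraper.

```python
import csv
import io

# Records shaped like the scraper's output schema
records = [
    {
        "kind": "post",
        "id": "3kq7xexample",
        "authorHandle": "example.bsky.social",
        "text": "Web scraping is becoming essential for modern data workflows.",
        "likeCount": 57,
    },
]

FIELDS = ["kind", "id", "authorHandle", "text", "likeCount"]

buf = io.StringIO()
# extrasaction="ignore" drops any fields not listed in FIELDS
writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
writer.writeheader()
writer.writerows(records)

csv_text = buf.getvalue()
print(csv_text.splitlines()[0])  # → kind,id,authorHandle,text,likeCount
```

Writing to a `StringIO` buffer keeps the example self-contained; in practice the same `DictWriter` call works against a file opened with `newline=""`.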
- **Primary metric:** Processes keyword-based searches at an average rate of several hundred posts per minute, depending on query complexity.
- **Reliability metric:** Consistently achieves high completion rates across repeated runs with stable results.
- **Efficiency metric:** Optimized request handling minimizes redundant calls and balances throughput with stability.
- **Quality metric:** Captured records maintain high field completeness, including text content and engagement metrics where available.
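The `rate_limit.py` helper in the project tree presumably throttles requests to balance throughput with stability. A minimal sketch of that general idea, a fixed-interval limiter, might look like this; the class name and interval are assumptions, not the project's actual API:

```python
import time


class FixedIntervalLimiter:
    """Block so successive calls are at least `interval` seconds apart."""

    def __init__(self, interval: float):
        self.interval = interval
        self._last = 0.0  # monotonic timestamp of the previous call

    def wait(self) -> None:
        now = time.monotonic()
        remaining = self.interval - (now - self._last)
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.monotonic()


# The first call passes immediately; the next two each enforce a gap.
limiter = FixedIntervalLimiter(interval=0.05)
start = time.monotonic()
for _ in range(3):
    limiter.wait()
elapsed = time.monotonic() - start
print(f"elapsed: {elapsed:.3f}s (at least two 0.05 s gaps)")
```

A production limiter would typically also handle server backoff signals (e.g. HTTP 429 responses), which this sketch omits.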
