nova99355cyberk/bluesky-scraper-all-in-one-1-5-1k

Bluesky Scraper | All-In-One

A comprehensive Bluesky scraper that collects posts, profiles, followers, and full comment threads from searches, users, or direct post URLs. It helps teams and analysts turn public Bluesky activity into clean, structured datasets for research, discovery, and automation.

Created by Bitbash, built to showcase our approach to scraping and automation.

Introduction

This project provides an all-in-one solution for extracting structured public data from Bluesky in a consistent and repeatable way. It removes the friction of manual browsing and fragmented tools by centralizing multiple collection modes into a single workflow. It is designed for researchers, marketers, analysts, and developers who need scalable access to Bluesky conversations and social graphs.

Built for scalable social data collection

  • Collects posts, profiles, followers, and follow relationships in one tool
  • Supports keyword-based searches, user-centric extraction, and direct post lookups
  • Designed for repeated runs to keep datasets fresh and comparable over time
  • Outputs structured records ready for analysis or downstream pipelines
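
As a rough illustration of keeping repeated runs comparable, the sketch below merges two runs and keeps one record per `(kind, id)` pair, letting the newer run win. The `merge_runs` helper is hypothetical and not part of this project; it only assumes that records carry the `kind` and `id` fields described in this README.

```python
from __future__ import annotations

from typing import Iterable


def merge_runs(previous: Iterable[dict], latest: Iterable[dict]) -> list[dict]:
    """Merge two runs, keeping one record per (kind, id) pair.

    Records from `latest` replace older copies, so engagement counts
    reflect the most recent run.
    """
    merged: dict[tuple, dict] = {}
    for record in list(previous) + list(latest):
        key = (record.get("kind"), record.get("id"))
        merged[key] = record  # later records overwrite earlier ones
    return list(merged.values())


# Illustrative data only, not real scraper output.
old_run = [{"kind": "post", "id": "a1", "likeCount": 3}]
new_run = [
    {"kind": "post", "id": "a1", "likeCount": 9},
    {"kind": "post", "id": "b2", "likeCount": 1},
]
combined = merge_runs(old_run, new_run)
```

Keying on `(kind, id)` rather than `id` alone is a design assumption: it avoids collisions if a post and a profile ever share an identifier.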

Features

| Feature | Description |
| --- | --- |
| Multi-mode extraction | Collect posts, profiles, followers, follows, or profile details from a single configuration. |
| Keyword search | Discover conversations and trends by searching posts with specific keywords. |
| User-based scraping | Fetch posts, followers, or follows for specific user handles. |
| Engagement metrics | Capture replies, reposts, and likes for posts when available. |
| Scalable runs | Handle small tests or large data pulls with configurable limits. |
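
To make the idea of a single configuration with a selectable mode and a configurable limit more concrete, here is a minimal sketch. The `RunConfig` class, the mode names, and the `queries` and `max_results` keys are all assumptions made for illustration; the project's actual settings file may use different names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical mode names mirroring the collection modes listed above.
VALID_MODES = {"search_posts", "user_posts", "followers", "follows", "profile"}


@dataclass
class RunConfig:
    mode: str            # which collection mode to run (assumed name)
    queries: list[str]   # keywords or user handles (assumed name)
    max_results: int = 100  # cap on saved results (assumed name and default)

    def __post_init__(self) -> None:
        # Fail early on invalid settings instead of during a long run.
        if self.mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if not self.queries:
            raise ValueError("at least one keyword or handle is required")
        if self.max_results <= 0:
            raise ValueError("max_results must be positive")


# A small test run before scaling up the limit.
small_test = RunConfig(mode="search_posts", queries=["web scraping"], max_results=25)
```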

What Data This Scraper Extracts

| Field Name | Field Description |
| --- | --- |
| kind | Type of record such as post, profile, follower, or follow. |
| query | Keyword or user handle that generated the record. |
| id | Unique identifier of the post or profile. |
| authorHandle | Bluesky handle associated with the record. |
| authorName | Display name of the author or profile. |
| text | Post content or profile bio text. |
| createdAt | Timestamp indicating when the post was created. |
| replyCount | Number of replies on a post, if available. |
| repostCount | Number of reposts on a post, if available. |
| likeCount | Number of likes on a post, if available. |
| url | Direct link to the post or profile. |
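
For downstream code, the fields above could be modeled as a typed record. The sketch below is one possible shape, not the project's own parser: the `Record` class and `parse_record` function are hypothetical, and treating `createdAt` and the three counts as optional is an assumption based on the "if available" wording.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Record:
    kind: str
    query: str
    id: str
    authorHandle: str
    authorName: str
    text: str
    url: str
    # Assumed optional: non-post records may not carry these fields.
    createdAt: Optional[datetime] = None
    replyCount: Optional[int] = None
    repostCount: Optional[int] = None
    likeCount: Optional[int] = None


def parse_record(raw: dict) -> Record:
    """Convert one raw JSON object into a Record, parsing the timestamp."""
    created = raw.get("createdAt")
    return Record(
        kind=raw["kind"],
        query=raw["query"],
        id=raw["id"],
        authorHandle=raw["authorHandle"],
        authorName=raw["authorName"],
        text=raw["text"],
        url=raw["url"],
        # Replace the trailing "Z" so older Python versions can parse it.
        createdAt=datetime.fromisoformat(created.replace("Z", "+00:00")) if created else None,
        replyCount=raw.get("replyCount"),
        repostCount=raw.get("repostCount"),
        likeCount=raw.get("likeCount"),
    )
```

Required fields are read with `raw[...]` so that a malformed record raises `KeyError` instead of silently producing partial data.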

Example Output

[
  {
    "kind": "post",
    "query": "web scraping",
    "id": "3kq7xexample",
    "authorHandle": "example.bsky.social",
    "authorName": "Example User",
    "text": "Web scraping is becoming essential for modern data workflows.",
    "createdAt": "2024-06-01T14:32:10.000Z",
    "replyCount": 4,
    "repostCount": 12,
    "likeCount": 57,
    "url": "https://bsky.app/profile/example.bsky.social/post/3kq7xexample"
  }
]
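
Output in this shape can be loaded with the standard `json` module and ranked by engagement. The sketch below is only an illustration: `engagement_total` is a hypothetical helper, and summing replies, reposts, and likes with equal weight is an arbitrary choice.

```python
import json

# The example record from above, inlined so the snippet is self-contained.
sample = """
[
  {
    "kind": "post",
    "query": "web scraping",
    "id": "3kq7xexample",
    "authorHandle": "example.bsky.social",
    "authorName": "Example User",
    "text": "Web scraping is becoming essential for modern data workflows.",
    "createdAt": "2024-06-01T14:32:10.000Z",
    "replyCount": 4,
    "repostCount": 12,
    "likeCount": 57,
    "url": "https://bsky.app/profile/example.bsky.social/post/3kq7xexample"
  }
]
"""

records = json.loads(sample)


def engagement_total(record: dict) -> int:
    """Sum the engagement counts, treating missing or null values as zero."""
    return sum(record.get(key) or 0 for key in ("replyCount", "repostCount", "likeCount"))


# Keep only posts and sort them from most to least engaged.
posts = [r for r in records if r.get("kind") == "post"]
ranked = sorted(posts, key=engagement_total, reverse=True)
```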

Directory Structure Tree

Bluesky Scraper | All-In-One | $1.5 / 1K/
├── src/
│   ├── main.py
│   ├── actions/
│   │   ├── search_posts.py
│   │   ├── get_user_posts.py
│   │   ├── get_followers.py
│   │   ├── get_follows.py
│   │   └── get_profile.py
│   ├── parsers/
│   │   ├── post_parser.py
│   │   └── profile_parser.py
│   ├── utils/
│   │   ├── request_helper.py
│   │   └── rate_limit.py
│   └── config/
│       └── settings.example.json
├── data/
│   ├── sample_input.json
│   └── sample_output.json
├── requirements.txt
└── README.md

Use Cases

  • Market researchers use it to track emerging topics on Bluesky, so they can identify early trends and shifts in discussion.
  • Growth teams use it to discover active creators and audiences, so they can plan partnerships or outreach.
  • Data analysts use it to collect engagement metrics, so they can measure resonance and content performance.
  • Developers use it to build social datasets, so they can enrich internal tools or analytics dashboards.

FAQs

What types of data can be collected?
You can collect posts by keyword, user posts, followers, follows, and profile details, all in a structured format.

Is there a limit on how much data can be extracted?
The maximum number of saved results is configurable. You can start with small limits for testing or scale up for larger datasets.

Can it be used for recurring data collection?
Yes, it is suitable for repeated runs, making it useful for monitoring changes, trends, or audience growth over time.

What formats does the output support?
The extracted data is structured and ready for export to common formats such as JSON or CSV, or for ingestion into a database.
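
As an example of exporting records to CSV, the sketch below uses Python's standard `csv` module. The `records_to_csv` helper is hypothetical and not part of this project; the column order simply follows the field list in this README.

```python
import csv
import io

# Column order taken from the field list in this README.
FIELDS = [
    "kind", "query", "id", "authorHandle", "authorName", "text",
    "createdAt", "replyCount", "repostCount", "likeCount", "url",
]


def records_to_csv(records: list) -> str:
    """Serialize records to CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        restval="",             # fill fields that a record does not have
        extrasaction="ignore",  # skip keys that are not in FIELDS
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Illustrative record with only some fields present.
csv_text = records_to_csv([
    {"kind": "post", "id": "3kq7xexample", "text": "Hello, Bluesky", "likeCount": 57},
])
```

To write a file instead of a string, the same writer can be pointed at a handle opened with `open(path, "w", newline="", encoding="utf-8")`.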


Performance Benchmarks and Results

Primary Metric: Processes keyword-based searches at an average rate of several hundred posts per minute, depending on query complexity.

Reliability Metric: Consistently achieves high completion rates across repeated runs with stable results.

Efficiency Metric: Optimized request handling minimizes redundant calls and balances throughput with stability.

Quality Metric: Captured records maintain high field completeness, including text content and engagement metrics where available.

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery. Bitbash nailed it."

Syed
Digital Strategist
★★★★★
