techmillicentbooker/twitter-x-profile-bio-icp-classifier-bio-keywords-extractor

Twitter(X) Profile Bio ICP Classifier - Bio Keywords Extractor

Analyze Twitter (X) profile bios to automatically classify users based on keyword matching and intent signals. This project helps transform unstructured bio text into actionable audience segments, making Twitter bio keyword classification practical and scalable. It delivers clear insights for teams building ICPs, researching audiences, or qualifying leads.

Telegram   WhatsApp   Gmail   Website

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for twitter-x-profile-bio-icp-classifier-bio-keywords-extractor you've just found your team — Let’s Chat. 👆👆

Introduction

This project analyzes public Twitter (X) profile bios and classifies profiles by matching bio text against customizable keyword groups. It solves the problem of manually reviewing bios by turning free-text descriptions into structured, searchable signals. It is designed for marketers, founders, analysts, and community managers who need fast and consistent audience segmentation.

Bio-Driven Audience Classification

  • Parses profile bios from structured profile datasets
  • Matches bios against configurable keyword categories
  • Assigns one or more matched keywords per profile
  • Flags profiles with no matches as Unclassified
  • Outputs clean, structured results ready for analysis
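The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual code: the keyword groups, the `normalize` and `classify` helpers, and the choice to represent an unmatched profile as `["Unclassified"]` are all assumptions. Case-insensitive substring matching is assumed because the example output in this README matches "founders" to `founder`.

```python
import re

# Hypothetical keyword groups: label -> terms that trigger the label.
KEYWORD_GROUPS = {
    "saas": ["saas", "software as a service"],
    "founder": ["founder", "co-founder"],
    "marketing": ["marketing", "growth"],
}

UNCLASSIFIED = "Unclassified"  # assumed representation of "no match"


def normalize(bio: str) -> str:
    """Lowercase the bio and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", bio.lower()).strip()


def classify(bio: str, groups: dict[str, list[str]] = KEYWORD_GROUPS) -> list[str]:
    """Return every label with at least one term in the bio, else [UNCLASSIFIED]."""
    text = normalize(bio)
    matched = [
        label
        for label, terms in groups.items()
        if any(term in text for term in terms)  # plain substring match
    ]
    return matched or [UNCLASSIFIED]
```

Labels come back in the order the groups are defined, so a bio that mentions both SaaS and founding a company would receive both labels. Substring matching is simple but can over-match (for example, "growth" inside an unrelated phrase), so word-boundary matching may be a better fit for short or ambiguous terms.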

Features

| Feature | Description |
| --- | --- |
| Keyword Group Matching | Match bios against multiple keyword categories such as SaaS, Marketing, or Developers. |
| Multi-Label Classification | Assign more than one keyword when a bio matches multiple categories. |
| Custom Taxonomies | Define and adjust keyword groups to fit any ICP or audience model. |
| Structured Output | Produces normalized JSON records for easy downstream processing. |
| Noise Reduction | Filters irrelevant bios by clearly marking unclassified profiles. |
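As an illustration of a custom taxonomy, the sketch below parses a keyword configuration from JSON. The schema of `keywords.example.json` is not shown in this README, so the label-to-terms layout, the example terms, and the `load_keyword_groups` helper are assumptions made for the example.

```python
import json

# Hypothetical config contents; the real schema of keywords.example.json
# is not documented in this README.
EXAMPLE_CONFIG = """
{
  "SaaS": ["SaaS", "B2B software"],
  "Marketing": ["marketing", "SEO"],
  "Developers": ["developer", "engineer"]
}
"""


def load_keyword_groups(raw: str) -> dict[str, list[str]]:
    """Parse a JSON taxonomy and lowercase every term for case-insensitive matching."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("taxonomy must be a JSON object of label -> list of terms")
    groups = {}
    for label, terms in data.items():
        if not isinstance(terms, list) or not all(isinstance(t, str) for t in terms):
            raise ValueError(f"terms for {label!r} must be a list of strings")
        # Drop empty terms and duplicates while keeping the configured order.
        groups[label] = list(dict.fromkeys(t.strip().lower() for t in terms if t.strip()))
    return groups


groups = load_keyword_groups(EXAMPLE_CONFIG)
```

Validating the file up front means a malformed taxonomy fails immediately instead of silently producing unclassified profiles.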

What Data This Scraper Extracts

| Field Name | Field Description |
| --- | --- |
| name | Display name of the Twitter (X) profile. |
| bio | Raw biography text written by the user. |
| followers_count | Total number of followers for the profile. |
| following_count | Total number of accounts the profile follows. |
| profile_url | Direct URL to the Twitter (X) profile. |
| matched_keywords | List of keywords detected in the bio text. |
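For downstream code, the fields listed above could be captured as a typed record with a small validation helper. The `ProfileRecord` type, the `REQUIRED_FIELDS` mapping, and `validate_record` are hypothetical names introduced for this sketch; the field names and types follow the list above and the example output.

```python
from typing import TypedDict


class ProfileRecord(TypedDict):
    """One output record, mirroring the documented fields."""
    name: str
    bio: str
    followers_count: int
    following_count: int
    profile_url: str
    matched_keywords: list[str]


# Expected runtime type for each documented field.
REQUIRED_FIELDS = {
    "name": str,
    "bio": str,
    "followers_count": int,
    "following_count": int,
    "profile_url": str,
    "matched_keywords": list,
}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems
```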

Example Output

[
  {
    "name": "Spencer Walden",
    "bio": "I build things: Building software to help founders",
    "followers_count": 795,
    "following_count": 378,
    "profile_url": "https://x.com/Swaldy",
    "matched_keywords": [
      "founder"
    ]
  },
  {
    "name": "Yash Desai 🚀 Shipr.Dev",
    "bio": "Founder @ShiprDev | Simplifying SaaS Development",
    "followers_count": 2201,
    "following_count": 2139,
    "profile_url": "https://x.com/yhdesai",
    "matched_keywords": [
      "saas",
      "founder"
    ]
  }
]
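Because the output is plain JSON, it can be summarized with the standard library alone. The sketch below uses a trimmed copy of the two records above to count how often each keyword appears and to select the profiles that carry a given keyword; the `with_keyword` helper is a hypothetical name.

```python
import json
from collections import Counter

# Trimmed copy of the example output (only the fields used here).
OUTPUT = json.loads("""
[
  {"profile_url": "https://x.com/Swaldy", "matched_keywords": ["founder"]},
  {"profile_url": "https://x.com/yhdesai", "matched_keywords": ["saas", "founder"]}
]
""")

# How many profiles carry each keyword.
keyword_counts = Counter(k for rec in OUTPUT for k in rec["matched_keywords"])


def with_keyword(records, keyword):
    """Return the records whose matched_keywords include the given keyword."""
    return [r for r in records if keyword in r["matched_keywords"]]


saas_urls = [r["profile_url"] for r in with_keyword(OUTPUT, "saas")]
```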

Directory Structure Tree

Twitter(X) Profile Bio ICP Classifier - Bio Keywords Extractor/
├── src/
│   ├── runner.py
│   ├── classifier/
│   │   ├── keyword_matcher.py
│   │   └── normalizer.py
│   ├── io/
│   │   ├── dataset_loader.py
│   │   └── output_writer.py
│   └── config/
│       └── keywords.example.json
├── data/
│   ├── input.sample.json
│   └── output.sample.json
├── requirements.txt
└── README.md

Use Cases

  • Marketing teams use it to segment Twitter audiences, so they can run highly targeted outreach campaigns.
  • Founders use it to identify peers and potential partners, so they can build stronger networks faster.
  • Growth analysts use it to classify large user datasets, so they can understand niche positioning trends.
  • Community managers use it to tag members by role, so they can personalize engagement and content.
  • Lead generation teams use it to qualify prospects, so they can focus only on high-intent profiles.

FAQs

Can I customize the keyword categories? Yes. Keyword groups are fully configurable, allowing you to define your own labels and matching terms based on your ICP or market.

What happens if a bio matches multiple categories? The profile will be assigned all matching keywords, enabling multi-label classification for richer analysis.

How are unclassified profiles handled? Profiles with no keyword matches are clearly labeled as Unclassified, making it easy to filter or review them separately.
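One way to review unclassified profiles separately is to split the output into two lists. This README does not show how an unclassified record is encoded, so the sketch treats either an empty `matched_keywords` list or the single label `"Unclassified"` as unclassified; both conventions, and the helper names, are assumptions.

```python
UNCLASSIFIED = "Unclassified"  # assumed label; the exact encoding is not documented


def is_unclassified(record: dict) -> bool:
    """True when the record has no keywords or only the Unclassified label."""
    keywords = record.get("matched_keywords") or []
    return not keywords or keywords == [UNCLASSIFIED]


def split_records(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (classified, needs_review) without modifying the input."""
    classified, needs_review = [], []
    for record in records:
        (needs_review if is_unclassified(record) else classified).append(record)
    return classified, needs_review
```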

Is this suitable for large datasets? Yes. The classification logic is lightweight and designed to scale efficiently across large profile datasets.


Performance Benchmarks and Results

Primary Metric: Processes thousands of profiles per minute with keyword matching latency measured in milliseconds per bio.
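Throughput depends on hardware, bio length, and taxonomy size, so it is worth measuring on your own data. The sketch below times a simple substring matcher over synthetic bios; the `match` function, the keyword list, and the sample bios are stand-ins for illustration and are not the project's benchmark harness.

```python
import time

KEYWORDS = ["saas", "founder", "marketing", "developer"]  # hypothetical terms


def match(bio: str, keywords: list[str] = KEYWORDS) -> list[str]:
    """Case-insensitive substring match, standing in for the real matcher."""
    text = bio.lower()
    return [k for k in keywords if k in text]


def profiles_per_minute(bios: list[str], repeats: int = 3) -> float:
    """Best-of-N throughput estimate for match() over the given bios."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for bio in bios:
            match(bio)
        best = min(best, time.perf_counter() - start)
    return len(bios) / best * 60


sample_bios = ["Founder @ShiprDev | Simplifying SaaS Development"] * 10_000
rate = profiles_per_minute(sample_bios)
print(f"{rate:,.0f} profiles/minute on this machine")
```

Taking the best of several runs reduces noise from other processes; real datasets with longer bios or larger taxonomies will produce lower figures.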

Reliability Metric: Consistent classification results with stable matching behavior across repeated runs.

Efficiency Metric: Low memory footprint due to simple text normalization and dictionary-based matching.

Quality Metric: High precision for clearly defined keywords, producing clean and interpretable classification outputs suitable for ICP modeling.
