Analyze Twitter (X) profile bios and automatically classify users by keyword matching. This project turns unstructured bio text into structured audience segments for teams building ICPs (ideal customer profiles), researching audiences, or qualifying leads.
Created by Bitbash to showcase our approach to scraping and automation.
This project analyzes public Twitter (X) profile bios and classifies profiles by matching bio text against customizable keyword groups. It solves the problem of manually reviewing bios by turning free-text descriptions into structured, searchable signals. It is designed for marketers, founders, analysts, and community managers who need fast and consistent audience segmentation.
- Parses profile bios from structured profile datasets
- Matches bios against configurable keyword categories
- Assigns one or more matched keywords per profile
- Flags profiles with no matches as Unclassified
- Outputs clean, structured results ready for analysis
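
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: the `KEYWORD_GROUPS` contents, the `normalize` and `classify` names, plain substring matching, and the `["Unclassified"]` return value are all assumptions. Substring matching is at least consistent with the sample output, where a bio containing "founders" is tagged `founder`.

```python
import re

# Hypothetical keyword groups: label -> terms that trigger the label.
KEYWORD_GROUPS = {
    "founder": ["founder", "co-founder"],
    "saas": ["saas"],
    "developer": ["developer", "engineer"],
}


def normalize(bio: str) -> str:
    """Lowercase the bio and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", bio.lower()).strip()


def classify(bio: str, groups: dict[str, list[str]] = KEYWORD_GROUPS) -> list[str]:
    """Return every label whose terms occur in the bio, or ["Unclassified"]."""
    text = normalize(bio)
    matched = [
        label
        for label, terms in groups.items()
        if any(term in text for term in terms)  # assumed: plain substring match
    ]
    return matched or ["Unclassified"]


print(classify("Founder @ShiprDev | Simplifying SaaS Development"))
print(classify("Coffee, cats, and long walks"))
```

Labels come back in the order the groups are defined, so the ordering may differ from the sample output shown later in this README.
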
| Feature | Description |
|---|---|
| Keyword Group Matching | Match bios against multiple keyword categories such as SaaS, Marketing, or Developers. |
| Multi-Label Classification | Assign more than one keyword when a bio matches multiple categories. |
| Custom Taxonomies | Define and adjust keyword groups to fit any ICP or audience model. |
| Structured Output | Produces normalized JSON records for easy downstream processing. |
| Noise Reduction | Marks profiles with no keyword match as Unclassified so they can be filtered out or reviewed separately. |
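
To make the custom taxonomy idea concrete, here is one way a keyword config could be loaded and cleaned. The JSON layout (category label mapped to a list of terms) and the `load_keyword_groups` helper are assumptions for illustration; the real `keywords.example.json` may be structured differently.

```python
import json

# Hypothetical config layout; the project's actual file may differ.
EXAMPLE_CONFIG = """
{
  "SaaS": ["saas", "b2b software"],
  "Marketing": ["marketer", "growth", "SEO", "seo"],
  "Developers": ["developer", "engineer"]
}
"""


def load_keyword_groups(raw: str) -> dict[str, list[str]]:
    """Parse the config and normalize terms: trimmed, lowercased, de-duplicated."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("keyword config must be a JSON object")
    groups: dict[str, list[str]] = {}
    for label, terms in data.items():
        if not isinstance(terms, list) or not all(isinstance(t, str) for t in terms):
            raise ValueError(f"terms for {label!r} must be a list of strings")
        cleaned = [t.strip().lower() for t in terms if t.strip()]
        # dict.fromkeys drops duplicates while keeping the original order.
        groups[label] = list(dict.fromkeys(cleaned))
    return groups


groups = load_keyword_groups(EXAMPLE_CONFIG)
print(groups["Marketing"])
```

Validating the config up front means a malformed category fails loudly at load time rather than silently matching nothing.
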
| Field Name | Field Description |
|---|---|
| name | Display name of the Twitter (X) profile. |
| bio | Raw biography text written by the user. |
| followers_count | Total number of followers for the profile. |
| following_count | Total number of accounts the profile follows. |
| profile_url | Direct URL to the Twitter (X) profile. |
| matched_keywords | List of keywords detected in the bio text. |
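
For downstream code, the fields in the table can be modeled as a small typed record. The `ProfileRecord` class and its `from_dict` helper below are hypothetical names, not part of the project; the sketch only mirrors the six documented fields.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileRecord:
    """One classified profile, mirroring the documented output fields."""

    name: str
    bio: str
    followers_count: int
    following_count: int
    profile_url: str
    matched_keywords: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: dict) -> "ProfileRecord":
        """Build a record from a parsed JSON object, coercing counts to int."""
        return cls(
            name=raw["name"],
            bio=raw.get("bio", ""),
            followers_count=int(raw["followers_count"]),
            following_count=int(raw["following_count"]),
            profile_url=raw["profile_url"],
            matched_keywords=list(raw.get("matched_keywords", [])),
        )


record = ProfileRecord.from_dict(
    {
        "name": "Spencer Walden",
        "bio": "I build things: Building software to help founders",
        "followers_count": 795,
        "following_count": 378,
        "profile_url": "https://x.com/Swaldy",
        "matched_keywords": ["founder"],
    }
)
print(record.matched_keywords)
```
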
```json
[
  {
    "name": "Spencer Walden",
    "bio": "I build things: Building software to help founders",
    "followers_count": 795,
    "following_count": 378,
    "profile_url": "https://x.com/Swaldy",
    "matched_keywords": [
      "founder"
    ]
  },
  {
    "name": "Yash Desai 🚀 Shipr.Dev",
    "bio": "Founder @ShiprDev | Simplifying SaaS Development",
    "followers_count": 2201,
    "following_count": 2139,
    "profile_url": "https://x.com/yhdesai",
    "matched_keywords": [
      "saas",
      "founder"
    ]
  }
]
```
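
Because each record carries a `matched_keywords` list, segment sizes can be computed with a simple tally. The snippet below is a sketch using the two sample records above, trimmed to the fields it needs; the variable names are illustrative.

```python
from collections import Counter

# The two sample records from this README, reduced to the fields used here.
profiles = [
    {"name": "Spencer Walden", "matched_keywords": ["founder"]},
    {"name": "Yash Desai 🚀 Shipr.Dev", "matched_keywords": ["saas", "founder"]},
]

# Count how many profiles fall into each keyword segment.
segment_sizes = Counter(
    keyword for profile in profiles for keyword in profile["matched_keywords"]
)

print(segment_sizes.most_common())  # [('founder', 2), ('saas', 1)]
```

A profile with several matched keywords is counted once per keyword, so the totals can exceed the number of profiles.
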
Twitter(X) Profile Bio ICP Classifier - Bio Keywords Extractor/
├── src/
│ ├── runner.py
│ ├── classifier/
│ │ ├── keyword_matcher.py
│ │ └── normalizer.py
│ ├── io/
│ │ ├── dataset_loader.py
│ │ └── output_writer.py
│ └── config/
│ └── keywords.example.json
├── data/
│ ├── input.sample.json
│ └── output.sample.json
├── requirements.txt
└── README.md
- Marketing teams use it to segment Twitter audiences, so they can run highly targeted outreach campaigns.
- Founders use it to identify peers and potential partners, so they can build stronger networks faster.
- Growth analysts use it to classify large user datasets, so they can understand niche positioning trends.
- Community managers use it to tag members by role, so they can personalize engagement and content.
- Lead generation teams use it to qualify prospects, so they can focus on high-intent profiles.
Can I customize the keyword categories? Yes. Keyword groups are fully configurable, allowing you to define your own labels and matching terms based on your ICP or market.
What happens if a bio matches multiple categories? The profile will be assigned all matching keywords, enabling multi-label classification for richer analysis.
How are unclassified profiles handled? Profiles with no keyword matches are clearly labeled as Unclassified, making it easy to filter or review them separately.
Is this suitable for large datasets? Yes. The classification logic is lightweight and designed to scale efficiently across large profile datasets.
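
As an illustration of handling unclassified profiles separately, the sketch below splits a result set into two lists. How the project actually encodes "Unclassified" in the output is not specified here, so the helper treats both an empty `matched_keywords` list and a literal `["Unclassified"]` value as unclassified; both conventions and the function names are assumptions.

```python
def is_unclassified(profile: dict) -> bool:
    """True when a profile has no real keyword match (assumed conventions)."""
    keywords = profile.get("matched_keywords") or []
    return not keywords or keywords == ["Unclassified"]


def split_profiles(profiles: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (classified, unclassified) without modifying the input."""
    classified = [p for p in profiles if not is_unclassified(p)]
    unclassified = [p for p in profiles if is_unclassified(p)]
    return classified, unclassified


sample = [
    {"name": "A", "matched_keywords": ["saas", "founder"]},
    {"name": "B", "matched_keywords": []},
    {"name": "C", "matched_keywords": ["Unclassified"]},
]
classified, unclassified = split_profiles(sample)
print(len(classified), len(unclassified))  # 1 2
```
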
Primary Metric: Processes thousands of profiles per minute with keyword matching latency measured in milliseconds per bio.
Reliability Metric: Consistent classification results with stable matching behavior across repeated runs.
Efficiency Metric: Low memory footprint due to simple text normalization and dictionary-based matching.
Quality Metric: High precision for clearly defined keywords, producing clean and interpretable classification outputs suitable for ICP modeling.
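
To check throughput and run-to-run consistency on your own data, a small timing harness along these lines may help. It is a rough sketch: `match_keywords` is a stand-in for the real matcher, and the measured rate depends entirely on your hardware, bio lengths, and keyword count.

```python
import time


def match_keywords(bio: str, terms: list[str]) -> list[str]:
    """Stand-in matcher: lowercase substring search over a flat term list."""
    text = bio.lower()
    return [term for term in terms if term in text]


def profiles_per_minute(bios: list[str], terms: list[str], repeats: int = 3) -> float:
    """Time several full passes and report throughput from the fastest one."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for bio in bios:
            match_keywords(bio, terms)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very small inputs.
    return len(bios) / max(best, 1e-9) * 60


bios = ["Founder @ShiprDev | Simplifying SaaS Development"] * 10_000
terms = ["founder", "saas", "developer"]

rate = profiles_per_minute(bios, terms)
print(f"{rate:,.0f} profiles per minute")

# Matching has no randomness, so repeated runs should agree exactly.
first = [match_keywords(b, terms) for b in bios[:100]]
second = [match_keywords(b, terms) for b in bios[:100]]
print(first == second)  # True
```
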
