This script cleans up a Twitter/X archive export by extracting only your public content and converting it to clean JSON format. It keeps:
- tweets.json - Your 14,863 public tweets
- likes.json - 25,553 tweets you liked
- followers.json - 1,690 followers
- following.json - 1,176 accounts you follow
- profile.json - Profile information
- community-tweets.json - Community notes
- note-tweets.json - Note tweets
- Media folders - 780MB of images/videos from your tweets
It removes:

- All direct messages (9.6GB of media + 220MB of text)
- Ad tracking data (ad-engagements, ad-impressions)
- Grok AI chat history
- Deleted tweets
- IP audit logs
- Device tokens and personalization data
Usage:

```bash
# Run with defaults
python3 cleanup_twitter_archive.py

# Point at an extracted archive
python3 cleanup_twitter_archive.py /path/to/twitter-archive

# Also choose the output folder name
python3 cleanup_twitter_archive.py /path/to/archive output_folder_name
```

Output structure:

```
twitter_archive_clean/
├── README.txt              # Human-readable summary
├── cleanup_report.json     # Detailed JSON report
├── data/                   # Clean JSON files
│   ├── tweets.json
│   ├── likes.json
│   ├── followers.json
│   ├── following.json
│   ├── profile.json
│   ├── community-tweets.json
│   ├── note-tweets.json
│   ├── tweet-headers.json
│   └── account.json
└── media/                  # Media files
    ├── tweets_media/
    ├── profile_media/
    └── community_tweet_media/
```
- Original size: ~11GB (with DMs and tracking data)
- Cleaned size: 819MB
- Space saved: 9.89GB
- Tweets extracted: 14,863
- Likes extracted: 25,553
- Time period: check your tweets for the exact date range
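The stats above leave the time period open. A minimal sketch for computing it, assuming the `created_at` format Twitter uses (`date_range` is a hypothetical helper, not part of the script; in practice pass the list loaded from `twitter_archive_clean/data/tweets.json`):

```python
from datetime import datetime

# Twitter's created_at format, e.g. "Fri Jun 20 18:43:40 +0000 2025"
FMT = "%a %b %d %H:%M:%S %z %Y"

def date_range(tweets):
    """Return (earliest, latest) datetimes for a list of {'tweet': {...}} items."""
    dates = [datetime.strptime(item['tweet']['created_at'], FMT) for item in tweets]
    return min(dates), max(dates)

# Inline sample data; load tweets.json here instead for a real archive
sample = [
    {"tweet": {"created_at": "Fri Jun 20 18:43:40 +0000 2025"}},
    {"tweet": {"created_at": "Mon Mar 02 09:15:00 +0000 2009"}},
]
first, last = date_range(sample)
print(f"{first.year}-{last.year}")  # → 2009-2025
```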
The script converts from Twitter's JavaScript format:

```javascript
window.YTD.tweets.part0 = [{...}]
```

to clean JSON arrays:

```json
[
  {
    "tweet": {
      "full_text": "...",
      "created_at": "...",
      ...
    }
  }
]
```

To load and iterate the cleaned tweets:

```python
import json

# Load tweets
with open('twitter_archive_clean/data/tweets.json') as f:
    tweets = json.load(f)

# Iterate through tweets
for item in tweets:
    tweet = item['tweet']
    print(f"{tweet['created_at']}: {tweet['full_text']}")
```

To count tweets per year:

```python
import json
from collections import Counter
from datetime import datetime

with open('twitter_archive_clean/data/tweets.json') as f:
    tweets = json.load(f)

years = Counter()
for item in tweets:
    date_str = item['tweet']['created_at']
    # Parse: "Fri Jun 20 18:43:40 +0000 2025"
    date = datetime.strptime(date_str, "%a %b %d %H:%M:%S %z %Y")
    years[date.year] += 1

for year, count in sorted(years.items()):
    print(f"{year}: {count} tweets")
```

To dump all tweet text:

```python
import json

with open('twitter_archive_clean/data/tweets.json') as f:
    tweets = json.load(f)

for item in tweets:
    print(item['tweet']['full_text'])
```

This script works with any Twitter/X archive export. Just:
- Download your archive from Twitter/X
- Extract it
- Run this script on the extracted folder
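If you want to inspect a raw archive file yourself before running the script, the `window.YTD.*` conversion can be sketched as follows (`ytd_to_json` is a hypothetical helper illustrating the idea, not the script's actual code):

```python
import json
import re

def ytd_to_json(text):
    """Strip the `window.YTD.<name>.partN = ` assignment prefix and
    parse the remaining payload as JSON (sketch; the script may differ)."""
    payload = re.sub(r"^window\.YTD\.[\w-]+\.part\d+\s*=\s*", "", text, count=1)
    return json.loads(payload)

raw = 'window.YTD.tweets.part0 = [{"tweet": {"full_text": "hello"}}]'
print(ytd_to_json(raw)[0]['tweet']['full_text'])  # → hello
```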
- The script does not modify your original archive
- All original files remain intact
- Safe to run multiple times (overwrites previous clean archive)
- Validates JSON structure to ensure data integrity
- Works with the standard Twitter archive format as of 2025