Twitter Archive Cleanup Script

This script cleans up a Twitter/X archive export by extracting only public content and converting it to clean JSON format.

What It Does

Keeps (Converted to JSON):

tweets.json - Your 14,863 public tweets
likes.json - 25,553 tweets you liked
followers.json - 1,690 followers
following.json - 1,176 accounts you follow
profile.json - Profile information
community-tweets.json - Community notes
note-tweets.json - Note tweets
Media folders - 780MB of images/videos from your tweets

Removes (Saves ~9.89 GB):

All direct messages (9.6GB of media + 220MB of text)
Ad tracking data (ad-engagements, ad-impressions)
Grok AI chat history
Deleted tweets
IP audit logs
Device tokens and personalization data

Usage

Basic Usage (Current Directory)

python3 cleanup_twitter_archive.py

Specify Archive Path

python3 cleanup_twitter_archive.py /path/to/twitter-archive

Specify Output Directory

python3 cleanup_twitter_archive.py /path/to/archive output_folder_name

Output Structure

twitter_archive_clean/
├── README.txt              # Human-readable summary
├── cleanup_report.json     # Detailed JSON report
├── data/                   # Clean JSON files
│   ├── tweets.json
│   ├── likes.json
│   ├── followers.json
│   ├── following.json
│   ├── profile.json
│   ├── community-tweets.json
│   ├── note-tweets.json
│   ├── tweet-headers.json
│   └── account.json
└── media/                  # Media files
    ├── tweets_media/
    ├── profile_media/
    └── community_tweet_media/

Results from This Archive

Original size: ~11GB (with DMs and tracking data)
Cleaned size: 819MB
Space saved: 9.89GB
Tweets extracted: 14,863
Likes extracted: 25,553
Time period: Check your tweets for date range

JSON Format

The script converts from Twitter's JavaScript format:

window.YTD.tweets.part0 = [{...}]

To clean JSON arrays:

[
  {
    "tweet": {
      "full_text": "...",
      "created_at": "...",
      ...
    }
  }
]

Working with the Data

Python Example

import json

# Load tweets
with open('twitter_archive_clean/data/tweets.json') as f:
    tweets = json.load(f)

# Iterate through tweets
for item in tweets:
    tweet = item['tweet']
    print(f"{tweet['created_at']}: {tweet['full_text']}")

Count Tweets by Year

from collections import Counter
from datetime import datetime

with open('twitter_archive_clean/data/tweets.json') as f:
    tweets = json.load(f)

years = Counter()
for item in tweets:
    date_str = item['tweet']['created_at']
    # Parse: "Fri Jun 20 18:43:40 +0000 2025"
    date = datetime.strptime(date_str, "%a %b %d %H:%M:%S %z %Y")
    years[date.year] += 1

for year, count in sorted(years.items()):
    print(f"{year}: {count} tweets")

Extract All Tweet Text

with open('twitter_archive_clean/data/tweets.json') as f:
    tweets = json.load(f)

for item in tweets:
    print(item['tweet']['full_text'])

Reusability

This script works with any Twitter/X archive export. Just:

Download your archive from Twitter/X
Extract it
Run this script on the extracted folder

Notes

The script does not modify your original archive
All original files remain intact
Safe to run multiple times (overwrites previous clean archive)
Validates JSON structure to ensure data integrity
Works with the standard Twitter archive format as of 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Twitter Archive Cleanup Script

What It Does

Keeps (Converted to JSON):

Removes (Saves ~9.89 GB):

Usage

Basic Usage (Current Directory)

Specify Archive Path

Specify Output Directory

Output Structure

Results from This Archive

JSON Format

Working with the Data

Python Example

Count Tweets by Year

Extract All Tweet Text

Reusability

Notes

FilesExpand file tree

USAGE.md

Latest commit

History

USAGE.md

File metadata and controls

Twitter Archive Cleanup Script

What It Does

Keeps (Converted to JSON):

Removes (Saves ~9.89 GB):

Usage

Basic Usage (Current Directory)

Specify Archive Path

Specify Output Directory

Output Structure

Results from This Archive

JSON Format

Working with the Data

Python Example

Count Tweets by Year

Extract All Tweet Text

Reusability

Notes