How to build your own .md file

An agent-executable plan for extracting a writing voice from a Twitter/X archive. Paste this into Claude Code (or any coding agent) along with your archive path and author name.

Prerequisites

A Twitter/X data archive (.zip from Settings > Your Account > Download an archive)
Unzip it. The file you need is data/tweets.js
Node.js installed, with npx tsx available to run TypeScript scripts

Tell the agent:

Build a writing profile for [AUTHOR NAME] using the process in PROCESS.md.
My Twitter archive is at [PATH TO data/tweets.js].
Output the final profile to [AUTHOR]_profile.md.
Ask me to validate findings before finalizing.

Phase 1: Pre-process the archive

Input: data/tweets.js
Output: tweet-index.json

Write and run a script that:

Reads data/tweets.js and strips the window.YTD.tweets.part0 = prefix to get valid JSON
Filters out:
- Retweets (full_text starts with "RT @")
- Pure replies (full_text starts with "@")
- URL-only tweets (no text after stripping https?://\S+)
- Empty text
Extracts per tweet: id, text (with URLs stripped), date (created_at), likes (favorite_count as int), rts (retweet_count as int)
Sorts by likes descending
Writes to tweet-index.json
Logs total tweets scanned and total indexed

Checkpoint: Print "[N] tweets scanned, [M] indexed" before proceeding.
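A minimal sketch of the Phase 1 filter-and-index logic, written as pure functions so it can be checked on a small inline sample first. The raw field names follow the list above; the `{ tweet: {...} }` wrapper around each record, the `id_str` field, and the helper names (`parseArchive`, `buildIndex`, `toInt`) are assumptions, so inspect your own archive before relying on them.

```typescript
// One record in tweet-index.json, following the field list above.
interface IndexedTweet {
  id: string;
  text: string;
  date: string;
  likes: number;
  rts: number;
}

// Assumed raw record shape; every field is optional because exports vary.
interface RawTweet {
  id_str?: string;
  id?: string;
  full_text?: string;
  created_at?: string;
  favorite_count?: string | number;
  retweet_count?: string | number;
}

type ArchiveEntry = RawTweet | { tweet: RawTweet };

// Everything before the first "[" is the "window.YTD.tweets.part0 = " prefix.
function parseArchive(raw: string): ArchiveEntry[] {
  const start = raw.indexOf("[");
  if (start < 0) throw new Error("No JSON array found in archive file");
  return JSON.parse(raw.slice(start)) as ArchiveEntry[];
}

function toInt(value: string | number | undefined): number {
  const n = parseInt(String(value ?? "0"), 10);
  return Number.isNaN(n) ? 0 : n;
}

function buildIndex(raw: string): { scanned: number; index: IndexedTweet[] } {
  const entries = parseArchive(raw);
  const index: IndexedTweet[] = [];
  for (const entry of entries) {
    // Assumption: some exports wrap each record as { tweet: {...} }.
    const t: RawTweet = "tweet" in entry ? entry.tweet : entry;
    const full = (t.full_text ?? "").trim();
    if (full.startsWith("RT @")) continue; // retweet
    if (full.startsWith("@")) continue; // pure reply
    const text = full
      .replace(/https?:\/\/\S+/g, "") // strip URLs
      .replace(/[ \t]{2,}/g, " ") // close the gap a stripped URL leaves
      .trim();
    if (text === "") continue; // URL-only or empty
    index.push({
      id: t.id_str ?? t.id ?? "",
      text,
      date: t.created_at ?? "",
      likes: toInt(t.favorite_count),
      rts: toInt(t.retweet_count),
    });
  }
  index.sort((a, b) => b.likes - a.likes);
  return { scanned: entries.length, index };
}

// Tiny inline sample standing in for the contents of data/tweets.js.
const sampleArchive =
  "window.YTD.tweets.part0 = " +
  JSON.stringify([
    { tweet: { id_str: "1", full_text: "Ship early.", created_at: "Mon Jan 01 00:00:00 +0000 2024", favorite_count: "5", retweet_count: "1" } },
    { tweet: { id_str: "2", full_text: "RT @someone: nice", favorite_count: "100", retweet_count: "9" } },
    { tweet: { id_str: "3", full_text: "@friend thanks", favorite_count: "2", retweet_count: "0" } },
    { tweet: { id_str: "4", full_text: "https://example.com/a", favorite_count: "7", retweet_count: "0" } },
    { tweet: { id_str: "5", full_text: "Read this https://example.com/b now", favorite_count: "42", retweet_count: "3" } },
  ]);

const { scanned, index } = buildIndex(sampleArchive);
console.log(`${scanned} tweets scanned, ${index.length} indexed`);
```

To run it on a real archive, read the file with Node's `fs.readFileSync(path, "utf8")`, pass the string to `buildIndex`, and write `JSON.stringify(index, null, 2)` to tweet-index.json with `fs.writeFileSync`.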

Phase 2: Quantitative analysis

Input: tweet-index.json (use top 500 by likes)
Output: Sections written to the profile markdown file

Run each analysis as a separate task. Read the tweet index, compute the numbers, and write each section to the output file. Do not combine analyses into one pass.

2a. Shape

Compute:

Median word count per tweet
% under 10 words, under 20 words, under 50 words
% that are a single sentence (no period-separated clauses, no line breaks creating multiple statements)
Average likes by word count bracket (1-5, 6-10, 11-15, 16-20, 21-30, 31+)

Write as ## Numbers section.
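One way the word-count numbers above could be computed, as a sketch. It assumes tweets are already loaded as `{ text, likes }` records from tweet-index.json and that words are split on whitespace; `shapeStats` and the other names are illustrative. The single-sentence percentage is left out because it needs a sentence-boundary rule the plan leaves to the agent.

```typescript
interface Tweet {
  text: string;
  likes: number;
}

// Word count brackets from the list above: [label, min, max].
const BRACKETS: Array<[string, number, number]> = [
  ["1-5", 1, 5],
  ["6-10", 6, 10],
  ["11-15", 11, 15],
  ["16-20", 16, 20],
  ["21-30", 21, 30],
  ["31+", 31, Infinity],
];

// Assumption: a "word" is any run of non-whitespace characters.
function wordCount(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

function median(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function shapeStats(tweets: Tweet[]) {
  const counts = tweets.map((t) => wordCount(t.text));
  const total = Math.max(counts.length, 1);
  const pctUnder = (limit: number): number =>
    (100 * counts.filter((c) => c < limit).length) / total;
  const byBracket = BRACKETS.map(([label, min, max]) => {
    const likes = tweets
      .filter((_, i) => counts[i] >= min && counts[i] <= max)
      .map((t) => t.likes);
    // null marks an empty bracket so it is not mistaken for zero likes.
    const avgLikes =
      likes.length === 0 ? null : likes.reduce((sum, n) => sum + n, 0) / likes.length;
    return { label, n: likes.length, avgLikes };
  });
  return {
    medianWords: median(counts),
    under10: pctUnder(10),
    under20: pctUnder(20),
    under50: pctUnder(50),
    byBracket,
  };
}

// Made-up sample; a real run would pass the top 500 records by likes.
const sampleTweets: Tweet[] = [
  { text: "Ship it", likes: 100 },
  { text: "Do the work", likes: 50 },
  { text: "Small bets compound faster than big plans do", likes: 40 },
  { text: "The best way to learn a craft is to ship work every day", likes: 10 },
];

const shape = shapeStats(sampleTweets.slice(0, 500));
console.log(JSON.stringify(shape, null, 2));
```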

2b. Perspective

Compute:

% of tweets containing "you" or "your"
% containing "I" or "my"
% containing "we" or "our"
Same ratios for top 100 only

Write as ## Perspective section.
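The pronoun ratios can be sketched with whole-word regular expressions. This version assumes case-insensitive matching (so a lowercase "i" counts) and treats "yourself" as not containing "your"; both are choices the plan does not spell out, and `perspectiveStats` is an illustrative name.

```typescript
interface Tweet {
  text: string;
  likes: number;
}

// Whole-word, case-insensitive patterns for each perspective.
const PRONOUNS = {
  you: /\b(?:you|your)\b/i,
  i: /\b(?:i|my)\b/i,
  we: /\b(?:we|our)\b/i,
};

// Percentage of tweets whose text matches the pattern.
function pct(tweets: Tweet[], pattern: RegExp): number {
  if (tweets.length === 0) return 0;
  return (100 * tweets.filter((t) => pattern.test(t.text)).length) / tweets.length;
}

function perspectiveStats(tweets: Tweet[], topN = 100) {
  const byLikes = [...tweets].sort((a, b) => b.likes - a.likes);
  const row = (set: Tweet[]) => ({
    you: pct(set, PRONOUNS.you),
    i: pct(set, PRONOUNS.i),
    we: pct(set, PRONOUNS.we),
  });
  // "all" covers the full input; "top" repeats the ratios for the top N by likes.
  return { all: row(byLikes), top: row(byLikes.slice(0, topN)) };
}

// Made-up sample; topN is 2 here only so the small sample shows a difference.
const samplePosts: Tweet[] = [
  { text: "You get what you tolerate", likes: 90 },
  { text: "I ship my work daily", likes: 60 },
  { text: "We become our habits", likes: 30 },
  { text: "Know yourself first", likes: 10 },
];

const perspective = perspectiveStats(samplePosts, 2);
console.log(JSON.stringify(perspective, null, 2));
```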

2c. Punctuation fingerprint

Compute average per tweet:

Periods, commas, line breaks (\n), question marks, exclamation points, colons, semicolons, em dashes
% ending with period vs no punctuation vs other
Average likes for period-ending vs bare-word-ending tweets

Write as ## Punctuation section.
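A sketch of the punctuation counts above. It assumes "no punctuation" means the tweet ends in an ASCII letter or digit, so an emoji ending lands in "other"; adjust `endingType` if the author's corpus needs a different rule. All function names are illustrative.

```typescript
interface Tweet {
  text: string;
  likes: number;
}

// Marks from the list above; the em dash is written as an escape (U+2014).
const MARKS: Array<[string, RegExp]> = [
  ["periods", /\./g],
  ["commas", /,/g],
  ["lineBreaks", /\n/g],
  ["questionMarks", /\?/g],
  ["exclamationPoints", /!/g],
  ["colons", /:/g],
  ["semicolons", /;/g],
  ["emDashes", /\u2014/g],
];

type Ending = "period" | "bare" | "other";

function endingType(text: string): Ending {
  const trimmed = text.trim();
  if (trimmed.endsWith(".")) return "period";
  // Assumption: a bare ending is a final letter or digit.
  if (/[A-Za-z0-9]$/.test(trimmed)) return "bare";
  return "other";
}

function mean(values: number[]): number | null {
  return values.length === 0 ? null : values.reduce((s, n) => s + n, 0) / values.length;
}

function punctuationStats(tweets: Tweet[]) {
  // Average occurrences of each mark per tweet.
  const perTweet: Record<string, number | null> = {};
  for (const [name, pattern] of MARKS) {
    perTweet[name] = mean(tweets.map((t) => (t.text.match(pattern) ?? []).length));
  }
  // Share of tweets and average likes for each ending type.
  const kinds: Ending[] = ["period", "bare", "other"];
  const endings = kinds.map((kind) => {
    const group = tweets.filter((t) => endingType(t.text) === kind);
    return {
      kind,
      pct: tweets.length === 0 ? 0 : (100 * group.length) / tweets.length,
      avgLikes: mean(group.map((t) => t.likes)),
    };
  });
  return { perTweet, endings };
}

// Made-up sample tweets.
const punct = punctuationStats([
  { text: "Ship it.", likes: 10 },
  { text: "Ship it", likes: 30 },
  { text: "Why wait?\nStart: now", likes: 20 },
  { text: "Stop, then go!", likes: 40 },
]);
console.log(JSON.stringify(punct, null, 2));
```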

2d. Structure templates

Classify each of the top 500 tweets:

Single sentence
Two-part parallel ("X does A. Y does B.")
Conditional ("If X, then Y.")
Numbered list ("1. ... 2. ...")
Explicit contrast ("X vs. Y")
Other (describe)

Give percentages. Write as ## Structure templates section.

2e. Opening patterns

Classify the opening of each tweet:

Observation/declaration (starts with noun or statement of fact)
Numbered list (starts with "1." or "1)")
Conditional (starts with "If" or "When")
Imperative verb (starts with a command)
Quote (starts with attributed quote)
Question

Give percentages. Write as ## Opening patterns section.
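A rule-based first pass at the opening classes could look like the sketch below. It is a heuristic, not a grammatical analysis: the imperative check relies on a hypothetical starter list of command verbs, conditional is tested before question (so "When do you quit?" counts as conditional), and anything unmatched falls through to observation. Treat the output as candidates for the agent to review.

```typescript
type Opening =
  | "numbered list"
  | "quote"
  | "conditional"
  | "question"
  | "imperative"
  | "observation";

// Hypothetical starter list of command verbs; extend it for the author's corpus.
const IMPERATIVE_VERBS = new Set([
  "stop", "start", "build", "write", "read", "learn", "ask",
  "find", "make", "take", "avoid", "focus", "don't",
]);

function classifyOpening(text: string): Opening {
  const t = text.trim();
  if (/^1[.)]/.test(t)) return "numbered list";
  // Straight or curly opening double quote.
  if (/^["\u201C]/.test(t)) return "quote";
  if (/^(?:if|when)\b/i.test(t)) return "conditional";
  // The first sentence ends in "?" before any other terminator or line break.
  if (/^[^.!?\n]*\?/.test(t)) return "question";
  const first = (t.match(/^[A-Za-z'\u2019]+/) ?? [""])[0].toLowerCase();
  if (IMPERATIVE_VERBS.has(first)) return "imperative";
  return "observation";
}

// Percentage of texts falling into each opening class.
function openingPercentages(texts: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const text of texts) {
    const kind = classifyOpening(text);
    counts[kind] = (counts[kind] ?? 0) + 1;
  }
  const pcts: Record<string, number> = {};
  for (const kind of Object.keys(counts)) {
    pcts[kind] = (100 * counts[kind]) / texts.length;
  }
  return pcts;
}

// Made-up sample openings.
const openingShares = openingPercentages([
  "If you wait, you lose",
  "If it is easy, it is crowded",
  "Why wait?",
  "Markets reward patience",
]);
console.log(openingShares);
```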

2f. Closing patterns

Analyze the final word of each tweet:

20 most frequent final words with counts
Dominant part of speech of final word
% ending with period vs no punctuation
Average engagement for each ending type
Structural closing patterns (punchline inversion, imperative ending, question ending, simple declarative)

Write as ## Closing patterns section.
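The final-word frequency count is the most mechanical item above and can be sketched as follows. It assumes a word is a run of ASCII letters or digits with optional internal apostrophes, so trailing punctuation and emoji are ignored; part of speech and the structural closing patterns still need the agent's reading. `finalWord` and `topFinalWords` are illustrative names.

```typescript
interface Tweet {
  text: string;
  likes: number;
}

// Last word of the text, lowercased, or null when the text has no words.
function finalWord(text: string): string | null {
  const words = text.toLowerCase().match(/[a-z0-9]+(?:['\u2019][a-z]+)*/g);
  return words ? words[words.length - 1] : null;
}

// Most frequent final words as [word, count], ties broken alphabetically.
function topFinalWords(tweets: Tweet[], limit = 20): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const t of tweets) {
    const w = finalWord(t.text);
    if (w === null) continue;
    counts.set(w, (counts.get(w) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// Made-up sample tweets.
const closers = topFinalWords([
  { text: "Do the work.", likes: 40 },
  { text: "Trust the work", likes: 20 },
  { text: "Play long games!", likes: 10 },
  { text: "Don't", likes: 5 },
]);
console.log(closers);
```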

2g. Verb mood

Classify each tweet by dominant verb mood:

Declarative (statement of fact)
Imperative (command)
Conditional (if/when)
Interrogative (question)

Give percentages and average engagement per mood. Write as ## Verb mood section.

Phase 3: Rhetorical pattern extraction

Input: tweet-index.json (top 300 tweets for 3a, top 500 for 3b and 3c)
Output: Sections appended to the profile

3a. Rhetorical moves

Read the top 300 tweets. Identify every recurring rhetorical move. For each:

Name it
Describe the mechanics (what makes it work, not just what it is)
Give 3-5 example tweets with exact text

Look for but don't limit to: contrast pairs, reframes, quantification of abstract ideas, humor through understatement, uncomfortable truths, compressed frameworks, definitional reframes.

Write as ## Rhetorical patterns section.

3b. Word-level mechanics

Read the top 500 tweets. Run separate searches for each device:

Alliterative contrasts: Find every tweet where two compared concepts start with the same letter. List each with exact text and letter pair.
Matched meter: Find couplets where both lines have equal or near-equal syllable counts.
Chiasmus: Find A-B / B-A word-order reversals.
Circular loops: Find tweets where the ending echoes or returns to the beginning.
Internal rhyme: Find sound echoes within or across lines.
Negation flips: Find tweets using the same words with "don't" / "no" added or removed to invert meaning.
Paradox: Find self-contradictory statements that resolve into insight.
Ending word analysis: Do punchlines tend toward monosyllabic or polysyllabic words? Anglo-Saxon or Latinate?
Case patterns: Does lowercase vs capitalized serve a deliberate purpose? Compare engagement.

Write as ## Word-level mechanics section with subsections.
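Most of these devices need a reader, but some can be pre-filtered in code. As one example, the sketch below flags chiasmus candidates by looking for content words in the order A ... B ... B ... A. The stop-word list is a hypothetical minimal one and the function names are illustrative; matches are candidates for the agent to confirm, not findings.

```typescript
// Hypothetical minimal stop-word list; a real run would use a fuller one.
const STOP_WORDS = new Set([
  "a", "an", "the", "to", "of", "in", "on", "for", "and", "or", "but",
  "is", "are", "it", "that", "you", "your", "don't", "not", "do", "can", "what",
]);

// Lowercased words with stop words removed.
function contentWords(text: string): string[] {
  const words = text.toLowerCase().match(/[a-z0-9]+(?:['\u2019][a-z]+)*/g) ?? [];
  return words.filter((w) => !STOP_WORDS.has(w));
}

// Returns [A, B] pairs whose content words appear as A ... B ... B ... A.
function chiasmusPairs(text: string): Array<[string, string]> {
  const words = contentWords(text);
  const found = new Set<string>();
  const pairs: Array<[string, string]> = [];
  for (let i = 0; i < words.length; i++) {
    // l must leave room for at least two words between the outer pair.
    for (let l = words.length - 1; l > i + 2; l--) {
      if (words[l] !== words[i]) continue;
      // Look for a different word that occurs twice strictly between i and l.
      const seen = new Set<string>();
      for (let j = i + 1; j < l; j++) {
        const b = words[j];
        if (b === words[i]) continue;
        const key = `${words[i]}|${b}`;
        if (seen.has(b) && !found.has(key)) {
          found.add(key);
          pairs.push([words[i], b]);
        }
        seen.add(b);
      }
    }
  }
  return pairs;
}

// Made-up sample lines.
for (const line of ["Work to live, don't live to work", "Do the work every day"]) {
  console.log(line, "=>", JSON.stringify(chiasmusPairs(line)));
}
```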

3c. Contrast frames

Read the top 500 tweets. Find every tweet containing a contrast or comparison. Classify the syntactic frame:

Reframe (single sentence repositioning)
Parallel declaration ("X does A. Y does B.")
Paradox (self-contradiction resolving to insight)
Conditional reveal ("If X, [surprising Y]")
Juxtaposed pair (two words, no verb)
Progression (numbered escalation)
Explicit vs ("X vs Y")
Negation flip ("Not X, Y" or "No one X, everyone Y")
Expectation subversion ("What you think: ... What's true: ...")
Labeled contrast ("[Label A]: X. [Label B]: Y.")
Chiastic inversion (ABBA word swap)
Cyclical (A->B->C->A loop)
Any other frames discovered

Rank by frequency. Give percentages and 3+ examples of each. Write as ## Contrast frames section.

3d. Colon / punctuation as device

Find every tweet using a colon. Classify:

Label for visual/thread (structural, not rhetorical)
Setup:punchline pivot (rhetorical)

For rhetorical colons: what's the word count before vs after the colon? Compare engagement to non-colon tweets. Write as ## Colon as pivot section (or name it for whichever punctuation device is most prominent).
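The before/after word counts and the engagement comparison can be sketched as below. The code splits at the first colon that is followed by whitespace or the end of the text, which skips times such as 10:30; that rule is an assumption. It does not separate structural from rhetorical colons, which still takes a read of each tweet.

```typescript
interface Tweet {
  text: string;
  likes: number;
}

function wordCount(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

function mean(values: number[]): number | null {
  return values.length === 0 ? null : values.reduce((s, n) => s + n, 0) / values.length;
}

// Word counts on each side of the first colon followed by whitespace or end of text.
function splitAtColon(text: string): { before: number; after: number } | null {
  const m = /:(?=\s|$)/.exec(text);
  if (m === null) return null;
  return {
    before: wordCount(text.slice(0, m.index)),
    after: wordCount(text.slice(m.index + 1)),
  };
}

function colonStats(tweets: Tweet[]) {
  const withColon: Array<{ before: number; after: number; likes: number }> = [];
  const withoutLikes: number[] = [];
  for (const t of tweets) {
    const parts = splitAtColon(t.text);
    if (parts === null) withoutLikes.push(t.likes);
    else withColon.push({ ...parts, likes: t.likes });
  }
  return {
    colonTweets: withColon.length,
    avgWordsBefore: mean(withColon.map((c) => c.before)),
    avgWordsAfter: mean(withColon.map((c) => c.after)),
    avgLikesColon: mean(withColon.map((c) => c.likes)),
    avgLikesNoColon: mean(withoutLikes),
  };
}

// Made-up sample tweets.
const colon = colonStats([
  { text: "The secret: start", likes: 30 },
  { text: "Rule one: never stop moving", likes: 50 },
  { text: "Meet at 10:30 sharp", likes: 10 },
]);
console.log(colon);
```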

Phase 4: Negative space and constraints

Input: Top 500 tweets
Output: Sections appended to the profile

4a. What the voice never says

Read the top 500 tweets and check for the absence of each category:

Personal emotions ("I feel...", "I'm excited...")
Apologies or self-correction
Current events or news commentary
Personal biographical details (daily activities, travel, meals)
Engagement asks (like, share, follow, RT)
Gratitude performances ("so grateful...", "humbled...")
Vulnerability theater ("I'll be honest...", "real talk...")
Motivational cliches ("believe in yourself", "never give up")
Own success metrics (revenue, follower counts)
Pop culture references (movies, TV, sports, music)
Political opinions
Religious references
Complaining or negativity about specific people or groups
Self-promotion with direct calls to purchase

For each: confirm truly absent, or count rare exceptions and quote them. Write as ## What the voice never says section.

4b. Banned words

Based on everything analyzed so far, generate a list of words and phrases that would break this voice. Categories:

Corporate jargon
Filler phrases and hedging language
Buzzwords
Transition words that add no meaning
Cliche phrases

Write as ## Never use these words section.

4c. Anti-patterns

List structural habits the voice avoids, such as long explanations, preamble, summary paragraphs, excessive adjectives, em dashes, self-congratulation, and hedging qualifiers. Write as ## What the voice never does section.

Phase 5: Voice description and themes

Input: All analysis completed in phases 2-4
Output: Sections added to the profile (the Voice section goes at the top, the rest are appended)

5a. Voice description

Write a one-paragraph description of the voice based on all quantitative and qualitative findings. No hedging. Declarative sentences only. Write as ## Voice section at the top of the file.

5b. Signature vocabulary

Find the 15-20 most frequently used meaningful words (exclude stop words). Note what the word choices reveal about the worldview. Write as ## Signature vocabulary section.
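The frequency count behind the vocabulary list can be sketched as a plain word tally. The stop-word list here is a hypothetical minimal one and `topWords` is an illustrative name; what counts as "meaningful", and what the choices reveal about the worldview, is still a judgment call for the agent.

```typescript
// Hypothetical minimal stop-word list; swap in a fuller one for real use.
const STOP_WORDS = new Set([
  "a", "an", "the", "to", "of", "in", "on", "for", "and", "or", "but",
  "is", "are", "be", "it", "that", "this", "with", "as", "at", "by",
  "you", "your", "i", "my", "we", "our", "not", "do", "don't",
]);

// Most frequent non-stop words as [word, count], ties broken alphabetically.
function topWords(texts: string[], limit = 20): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const text of texts) {
    const words = text.toLowerCase().match(/[a-z0-9]+(?:['\u2019][a-z]+)*/g) ?? [];
    for (const w of words) {
      if (STOP_WORDS.has(w)) continue;
      counts.set(w, (counts.get(w) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// Made-up sample texts.
const vocabulary = topWords(
  ["Build leverage. Leverage compounds.", "The work compounds", "Leverage is the work"],
  3,
);
console.log(vocabulary);
```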

5c. Tone

Describe the tone in 3-5 bullet points based on the data. Not what the voice says, but how it feels. Write as ## Tone section.

5d. Themes

Identify 5-8 recurring themes across the corpus. For each, give a name and 2-3 example phrasings. Write as ## Themes section.

Phase 6: Teaching examples

Input: tweet-index.json + all analysis
Output: Final sections of the profile

6a. Rewrite pairs

Select 10 ideas from the corpus. For each, write:

A generic version (how a default writer would say it - long, hedging, jargon)
The actual tweet (compressed, direct)

Write as ## Rewrite pairs section.

6b. Reference examples

Select 20-30 of the highest-performing tweets that best represent the voice. Choose for variety across rhetorical moves, structural patterns, and themes. Write as ## Reference tweets section.

6c. Long-form rules

If the corpus contains any long-form writing (threads, essays), describe the format: paragraph length, opening style, closing style, use of visuals. Write as ## Article format section. Skip if no long-form exists.

Phase 7: Validate with the author

This step is mandatory. Do not skip it.

Present all findings to the author grouped by category:

Quantitative findings (shape, perspective, punctuation, structure, openings, closings, verb mood)
Rhetorical moves identified
Word-level mechanics found
Contrast frames cataloged
Negative space (what the voice never says)
Banned words and anti-patterns

For each group, ask: "Which of these are intentional?"

Remove anything the author identifies as accidental or coincidental. The file should only contain deliberate patterns.

Phase 8: Assemble and order

Combine all validated sections into a single markdown file. Final section order:

## Voice
## Numbers
## Perspective
## Punctuation
## Structure templates
## What performs
## Sentence structure
## Rhetorical patterns
## Word-level mechanics
## Closing patterns
## Contrast frames
## Verb mood
## Colon as pivot
## What the voice never says
## Opening patterns
## Signature vocabulary
## Tone
## Themes
## Never use these words
## What the voice never does
## Article format
## Rewrite pairs
## Reference tweets

Write the final file to [AUTHOR]_profile.md.
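A sketch of the assembly step, assuming the validated sections are held in memory as a map from heading name to body text. That data shape and the `assembleProfile` helper are hypothetical; the ordering comes from the list above. Sections that are missing or empty, such as Article format when no long-form exists, are skipped.

```typescript
// Final section order, copied from the list above.
const SECTION_ORDER: string[] = [
  "Voice", "Numbers", "Perspective", "Punctuation", "Structure templates",
  "What performs", "Sentence structure", "Rhetorical patterns",
  "Word-level mechanics", "Closing patterns", "Contrast frames", "Verb mood",
  "Colon as pivot", "What the voice never says", "Opening patterns",
  "Signature vocabulary", "Tone", "Themes", "Never use these words",
  "What the voice never does", "Article format", "Rewrite pairs",
  "Reference tweets",
];

// Joins the sections that exist, in the fixed order, into one markdown string.
function assembleProfile(sections: { [name: string]: string | undefined }): string {
  const parts: string[] = [];
  for (const name of SECTION_ORDER) {
    const body = sections[name];
    if (body === undefined || body.trim() === "") continue;
    parts.push(`## ${name}\n\n${body.trim()}`);
  }
  return parts.join("\n\n") + "\n";
}

// Made-up section bodies, supplied out of order on purpose.
const profile = assembleProfile({
  Tone: "Dry.",
  Voice: "Blunt.\n",
  "Article format": "",
});
console.log(profile);
```

Writing the result to [AUTHOR]_profile.md is then a single call to Node's `fs.writeFileSync`.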

Notes

1,000 posts is a minimum corpus. 10,000+ gives statistical confidence.
Engagement data is critical. Without it you're measuring output, not signal.
Run each analysis in phases 2-4 separately. Combining them produces shallow results.
The file is never finished. New patterns surface over time.
This process works for any medium. Adapt phase 1 for newsletters, blog posts, transcripts, or books.
