browndw

David Brown 👋

Computational Linguistics • Text Analysis • Corpus Analysis

I develop tools and methods for computational text analysis, focusing on corpus linguistics, rhetorical analysis, and statistical approaches to language variation. My work ranges from desktop applications for researchers to R and Python packages for text analysis workflows.

📈 Downloads

Platform	Package	Description
CRAN	pseudobibeR	Linguistic feature extraction
CRAN	mda.biber	Multi-Dimensional Analysis (MDA)
CRAN	spell.replacer	Probabilistic spelling correction
PyPI	docuscospacy	Support for spaCy DocuScope models
PyPI	pybiber	Biber feature extraction
PyPI	google_ngrams	Ngram processing
PyPI	moodswing	Sentiment trajectories

Development Packages (GitHub): mda.biber • quanteda.extras • vnc • ngramr.plus

Platform	Resource	Description
Hugging Face	en_docusco_spacy	DocuScope spaCy model
Hugging Face	HAP-E corpus	Human-AI parallel texts
Hugging Face	HAP-E mini	Human-AI parallel texts (mini)

🛠️ Featured Applications

DocuScope Corpus Analysis

Desktop & Web Applications for Corpus Analysis and Concordancing

DocuScope CA Desktop - Standalone desktop application combining part-of-speech tagging with DocuScope rhetorical analysis
DocuScope CA Online - Web-based version for corpus analysis with frequency tables, KWIC, and comparative analysis

Features: Corpus processing, frequency analysis, keyword-in-context tables, corpus comparison, advanced plotting
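As an illustration of what a keyword-in-context (KWIC) table contains, here is a minimal Python sketch. It is not the DocuScope CA implementation: the function name `kwic`, the regex tokenizer, and the `window` parameter are assumptions made for this example only.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, node word, right context) for each hit.

    Hypothetical helper for illustration; tokenization is a simple
    lowercase word split, not a linguistic tokenizer.
    """
    tokens = re.findall(r"\w+", text.lower())
    node = keyword.lower()
    rows = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows

sample = "The corpus is small. A larger corpus gives more stable frequencies."
rows = kwic(sample, "corpus", window=2)

# Print the hits aligned on the node word, as in a concordance view.
for left, node, right in rows:
    print(f"{left:>20} | {node} | {right}")
```

Each row keeps the node word in a fixed column so that recurring patterns to its left and right are easy to scan.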

📦 R Packages for Text Analysis

Statistical & Linguistic Analysis

mda.biber - Multi-Dimensional Analysis (MDA) for linguistic variation across genres and registers
pseudobibeR - Extract 67 lexicogrammatical features from parsed text data for register analysis
quanteda.extras - Extended corpus functions for keyness, dispersion, and collocational analysis
vnc - Variability-Based Neighbor Clustering for data-driven periodization in historical linguistics
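To make the idea of keyness concrete, the following Python sketch computes the two-term log-likelihood statistic commonly used in corpus comparison. It is a standalone illustration under stated assumptions, not the API of quanteda.extras or any package listed here; the function name and argument names are hypothetical.

```python
import math

def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    """Two-term log-likelihood keyness for one word.

    freq_* are the word's counts in each corpus and size_* are the
    corpus sizes in tokens. Hypothetical helper for illustration.
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected counts if the word were equally frequent in both corpora.
    exp_target = size_target * total_freq / total_size
    exp_ref = size_ref * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_target, exp_target), (freq_ref, exp_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Same relative frequency in both corpora: no keyness.
same_rate = log_likelihood(10, 10, 1000, 1000)

# Four times as frequent in the target corpus: a larger value.
higher_rate = log_likelihood(20, 5, 1000, 1000)
```

Values above 3.84 correspond to p < 0.05 on a chi-squared distribution with one degree of freedom, which is the usual threshold for reporting a word as key.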

Data & Utilities

ngramr.plus - Extract frequency data from Google Books Ngram datasets across multiple English varieties
spell.replacer - Fast probabilistic spelling correction based on COCA frequency data

🐍 Python Packages

Text Processing & Analysis

docuscospacy - spaCy models trained on the DocuScope and CLAWS7 tagsets for rhetorical analysis
pybiber - Python implementation of Biber's linguistic feature extraction
google_ngrams - Process Google Ngram data with Variability-Based Neighbor Clustering
moodswing - Sentiment trajectory analysis

🤗 Hugging Face Resources

Specialized Models

en_docusco_spacy - Custom spaCy model trained on the DocuScope and CLAWS7 tagsets; it powers the DocuScope applications above

Research Corpora

HAP-E: Human-AI Parallel Corpus - Parallel corpus of human and AI-generated texts for comparative analysis
HAP-E Mini - Smaller version of the Human-AI parallel corpus for quick testing

Browse all resources: browndw on Hugging Face

📚 Teaching & Documentation

Course Materials

cmu.textstat - R package for Carnegie Mellon's Special Topics in Statistics & Data Science course
textstat_docs - Comprehensive documentation and tutorials for statistical text analysis
cmu-textstat-docs - Course documentation and lab materials

Documentation Sites

DocuScope Documentation - Comprehensive guides for DocuScope rhetorical analysis tools
Presentations - Conference presentations and workshop materials

🔬 Research Focus

My work centers on developing computational methods for:

Multi-Dimensional Analysis - Statistical approaches to linguistic variation
Corpus Linguistics - Tools for large-scale text analysis and comparison
Rhetorical Analysis - Computational approaches to discourse analysis
Register & Genre Analysis - Automated classification of text types
Historical Linguistics - Quantitative approaches to language change

📄 Recent Publications (2024-2025)

Journal Articles

DeLuca, L. S., Reinhart, A., Weinberg, G., Laudenbach, M., Miller, S., & Brown, D. W. (2025). Developing Students' Statistical Expertise Through Writing in the Age of AI. Journal of Statistics and Data Science Education, 1-13. https://doi.org/10.1080/26939169.2025.2497547
Reinhart, A., Markey, B., Laudenbach, M., Pantusen, K., Yurko, R., Weinberg, G., & Brown, D. W. (2025). Do LLMs write like humans? Variation in grammatical and rhetorical styles. Proceedings of the National Academy of Sciences, 122(8), e2422455122. https://doi.org/10.1073/pnas.2422455122
Markey, B., Brown, D. W., Laudenbach, M., & Kohler, A. (2024). Dense and disconnected: Analyzing the sedimented style of ChatGPT-generated text at scale. Written Communication, 41(4), 571-600. https://doi.org/10.1177/07410883241263528
Laudenbach, M., Brown, D. W., Guo, Z., Ishizaki, S., Reinhart, A., & Weinberg, G. (2024). Visualizing formative feedback in statistics writing: An exploratory study of student motivation using DocuScope Write & Audit. Assessing Writing, 60, 100830. https://doi.org/10.1016/j.asw.2024.100830

Book Chapters

Brown, D. W. (2024). Dictionaries, Language Ideologies, and Language Attitudes. In E. Finegan & M. Adams (Eds.), The Cambridge Handbook of the Dictionary (pp. 277-300). Cambridge University Press. https://doi.org/10.1017/9781108864435.015

🤝 Connect

💬 Ask me about: Corpus linguistics, text analysis methods, R/Python for linguistics
Collaborate on: Open-source text analysis tools, corpus linguistics research
🎓 Teaching: Statistical methods for text analysis, computational linguistics
