browndw/README.md

David Brown 👋

Computational Linguistics • Text Analysis • Corpus Analysis

I develop tools and methods for computational text analysis, focusing on corpus linguistics, rhetorical analysis, and statistical approaches to language variation. My work spans from desktop applications for researchers to R and Python packages for text analysis workflows.

πŸ› οΈ Tech Stack

R Python Rust spaCy Streamlit Jupyter

CRAN PyPI Hugging Face Docker DocuScope

📊 Stats & Impact

[Images: GitHub profile summary; top languages by repo; top languages by commits]

📈 Downloads

| Platform | Package | Description | Total Downloads |
| --- | --- | --- | --- |
| CRAN | pseudobibeR | Linguistic feature extraction | (badge) |
| CRAN | mda.biber | Multi-Dimensional Analysis (MDA) | (badge) |
| CRAN | spell.replacer | Probabilistic spell correction | (badge) |
| PyPI | docuscospacy | Support for spaCy DocuScope models | (badge) |
| PyPI | pybiber | Biber feature extraction | (badge) |
| PyPI | google_ngrams | Ngram processing | (badge) |
| PyPI | moodswing | Sentiment trajectories | (badge) |

Development Packages (GitHub): mda.biber • quanteda.extras • vnc • ngramr.plus

| Platform | Resource | Description | Downloads per Month |
| --- | --- | --- | --- |
| Hugging Face | en_docusco_spacy | DocuScope spaCy model | (badge) |
| Hugging Face | HAP-E corpus | Human-AI parallel texts | (badge) |
| Hugging Face | HAP-E mini | Human-AI parallel texts (mini) | (badge) |

πŸ› οΈ Featured Applications

DocuScope Corpus Analysis

Desktop & Web Applications for Corpus Analysis and Concordancing

  • DocuScope CA Desktop - Standalone desktop application combining part-of-speech tagging with DocuScope rhetorical analysis
  • DocuScope CA Online - Web-based version for corpus analysis with frequency tables, KWIC, and comparative analysis

Features: Corpus processing, frequency analysis, keyword-in-context tables, corpus comparison, advanced plotting
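
For readers unfamiliar with keyword-in-context (KWIC) tables, the idea can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the DocuScope CA implementation: the function name `kwic`, its `window` parameter, and the regex tokenizer are all assumptions made for the example.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left, node, right) tuples for each case-insensitive match.

    Hypothetical helper for illustration; tokenization is a simple
    word-character regex, not a linguistic tokenizer.
    """
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Up to `window` tokens on each side of the node word
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows

sample = "The corpus is small. A larger corpus gives more stable frequency estimates."
for left, node, right in kwic(sample, "corpus"):
    # Right-align the left context so the node words line up in a column
    print(f"{left:>20} | {node} | {right}")
```

Aligning the node word in a fixed column is what makes recurring patterns on either side easy to scan.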


📦 R Packages for Text Analysis

Statistical & Linguistic Analysis

  • mda.biber - Multi-Dimensional Analysis (MDA) for linguistic variation across genres and registers CRAN
  • pseudobibeR - Extract 67 lexicogrammatical features from parsed text data for register analysis CRAN
  • quanteda.extras - Extended corpus functions for keyness, dispersion, and collocational analysis
  • vnc - Variability-Based Neighbor Clustering for data-driven periodization in historical linguistics
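
As a rough illustration of what a keyness statistic measures, here is a minimal sketch of the two-term log-likelihood (G2) score widely used in corpus linguistics. It is written in Python for brevity and is not the quanteda.extras API; the function name `log_likelihood` and its argument names are hypothetical, and the assumption is that "keyness" here refers to a statistic of this kind.

```python
import math

def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    """Two-term log-likelihood (G2) for one word in a target vs. reference corpus.

    Hypothetical helper for illustration only.
    """
    total = freq_target + freq_ref
    # Expected counts if the word were equally frequent (per token) in both corpora
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    g2 = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# A word seen 30 times in a 1,000-token target and 10 times in a 1,000-token reference
score = log_likelihood(30, 10, 1000, 1000)
print(round(score, 2))
```

A score of zero means the word has the same relative frequency in both corpora; larger values indicate a stronger difference, in either direction.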

Data & Utilities

  • ngramr.plus - Extract frequency data from Google Books Ngram datasets across multiple English varieties
  • spell.replacer - Fast probabilistic spelling correction based on COCA frequency data CRAN

🐍 Python Packages

Text Processing & Analysis

  • docuscospacy - spaCy models trained on DocuScope and the CLAWS7 tagset for rhetorical analysis PyPI
  • pybiber - Python implementation of Biber's linguistic feature extraction PyPI
  • google_ngrams - Process Google Ngram data with Variability-Based Neighbor Clustering PyPI
  • moodswing - Sentiment trajectory analysis PyPI

🤗 Hugging Face Resources

Specialized Models

  • en_docusco_spacy - Custom spaCy model trained on DocuScope and the CLAWS7 tagset, powering the DocuScope applications above

Research Corpora

Browse all resources: browndw on Hugging Face


📚 Teaching & Documentation

Course Materials

  • cmu.textstat - R package for Carnegie Mellon's Special Topics in Statistics & Data Science course
  • textstat_docs - Comprehensive documentation and tutorials for statistical text analysis
  • cmu-textstat-docs - Course documentation and lab materials

Documentation Sites


🔬 Research Focus

My work centers on developing computational methods for:

  • Multi-Dimensional Analysis - Statistical approaches to linguistic variation
  • Corpus Linguistics - Tools for large-scale text analysis and comparison
  • Rhetorical Analysis - Computational approaches to discourse analysis
  • Register & Genre Analysis - Automated classification of text types
  • Historical Linguistics - Quantitative approaches to language change

📄 Recent Publications (2024-2025)

Journal Articles

  • DeLuca, L. S., Reinhart, A., Weinberg, G., Laudenbach, M., Miller, S., & Brown, D. W. (2025). Developing Students' Statistical Expertise Through Writing in the Age of AI. Journal of Statistics and Data Science Education, 1-13. https://doi.org/10.1080/26939169.2025.2497547

  • Reinhart, A., Markey, B., Laudenbach, M., Pantusen, K., Yurko, R., Weinberg, G., & Brown, D. W. (2025). Do LLMs write like humans? Variation in grammatical and rhetorical styles. Proceedings of the National Academy of Sciences, 122(8), e2422455122. https://doi.org/10.1073/pnas.2422455122

  • Markey, B., Brown, D. W., Laudenbach, M., & Kohler, A. (2024). Dense and disconnected: Analyzing the sedimented style of ChatGPT-generated text at scale. Written Communication, 41(4), 571-600. https://doi.org/10.1177/07410883241263528

  • Laudenbach, M., Brown, D. W., Guo, Z., Ishizaki, S., Reinhart, A., & Weinberg, G. (2024). Visualizing formative feedback in statistics writing: An exploratory study of student motivation using DocuScope Write & Audit. Assessing Writing, 60, 100830. https://doi.org/10.1016/j.asw.2024.100830

Book Chapters

  • Brown, D. W. (2024). Dictionaries, Language Ideologies, and Language Attitudes. In E. Finegan & M. Adams (Eds.), The Cambridge Handbook of the Dictionary (pp. 277-300). Cambridge University Press. https://doi.org/10.1017/9781108864435.015

🤝 Connect

  • 💬 Ask me about: Corpus linguistics, text analysis methods, R/Python for linguistics
  • Collaborate on: Open-source text analysis tools, corpus linguistics research
  • 🎓 Teaching: Statistical methods for text analysis, computational linguistics
