HigherEdData/CA-Private-Colleges

CA-Private-Colleges

Data, analysis, and figures for the paper "Public Power, Private Debts: The Misuse of California's Tax Intercept Power for Private Colleges" by Dalié Jiménez, Jonathan Glater, Andrew Martin & Charlie Eaton, forthcoming in the Yale Law Journal Forum (2026).

Project Overview

This repository contains the data pipeline and analysis for a study of California's Interagency Intercept Collection (IIC) program, focusing on private colleges and universities. The project extracts offset data from public records obtained via the California Public Records Act (PRA), cleans and normalizes it, and produces the figures and tables used in the paper.

Repository Structure

CA-Private-Colleges/
|-- PRA Response/
|-- code/
|-- data/
|   |-- raw/
|   |-- cleaned/
|-- figures/
|-- README.md

PRA Response/

Unedited responses from the Franchise Tax Board (OCR added to enable text extraction, but content is otherwise unmodified).

code/

Scripts are numbered in pipeline order:

File                                   Description
-------------------------------------  ------------------------------------------------------------------
0_extract_iic_offset_2018_2023.py      Extracts tabular data from PRA response PDFs into a normalized CSV
1_clean_iic_data.R                     Cleans and normalizes the extracted CSV for analysis
2_verify_extraction.R                  Verification checks on the extracted and cleaned data
3_iic_offset_data_analysis.R           Main analysis: figures and tables for the paper
4_iic_agency_enrollments_analysis.r    Enrollment timeline analysis and figures
PROJECT CONTEXT.md                     Detailed documentation of transformation rules, parser behavior, and data quality notes
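
For readers new to pdfplumber, the general shape of a PDF-to-CSV extraction step can be sketched as follows. This is not the parser in 0_extract_iic_offset_2018_2023.py, whose actual rules are documented in PROJECT CONTEXT.md. The function names and the repeated-header rule below are hypothetical and chosen only for illustration.

```python
import csv
import io


def normalize_rows(tables):
    """Flatten a list of tables into one header plus data rows.

    Assumption (hypothetical rule): every table repeats the same header
    row, so repeats after the first are dropped. Cells are stripped and
    None becomes an empty string.
    """
    header, rows = None, []
    for table in tables:
        for raw in table:
            row = [(cell or "").strip() for cell in raw]
            if not any(row):
                continue  # skip fully empty rows
            if header is None:
                header = row
            elif row != header:
                rows.append(row)
    return header, rows


def extract_tables(pdf_path):
    """Collect every table pdfplumber finds, page by page."""
    import pdfplumber  # imported here so normalize_rows works without it

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables


def to_csv_text(header, rows):
    """Serialize the header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the row cleanup separate from the PDF reading makes it possible to check the normalization logic on small hand-written inputs without opening a PDF.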

data/

  • raw/ — Raw CSV data extracted directly from the PRA response documents
  • cleaned/ — Cleaned and normalized datasets used to create the figures and tables in the paper

figures/

All figures (charts) and tables generated by the analysis code, as used in the paper.

Reproducing the Analysis

  1. Extract data from PDFs (requires Python 3 + pdfplumber):
    python3 code/0_extract_iic_offset_2018_2023.py
    
  2. Run the R scripts in order (requires R + the packages listed below):
    
     1_clean_iic_data.R
     2_verify_extraction.R
     3_iic_offset_data_analysis.R
     4_iic_agency_enrollments_analysis.r
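
The steps above could also be chained from a small driver script. The sketch below is not part of the repository. It assumes the scripts are run from the repository root, that they live in code/, and that `python3` and `Rscript` are on the PATH. The names `PIPELINE`, `build_command`, and `run_pipeline` are hypothetical.

```python
import subprocess
from pathlib import Path

# Script names are taken from the list above. Running them from the
# repository root with `Rscript` is an assumption, not a documented
# requirement of the project.
PIPELINE = [
    "0_extract_iic_offset_2018_2023.py",
    "1_clean_iic_data.R",
    "2_verify_extraction.R",
    "3_iic_offset_data_analysis.R",
    "4_iic_agency_enrollments_analysis.r",
]


def build_command(script, code_dir="code"):
    """Choose an interpreter from the file extension (case-insensitive)."""
    path = str(Path(code_dir) / script)
    suffix = Path(script).suffix.lower()
    if suffix == ".py":
        return ["python3", path]
    if suffix == ".r":
        return ["Rscript", path]
    raise ValueError(f"no interpreter known for {script}")


def run_pipeline(code_dir="code"):
    """Run each step in order, stopping at the first failure."""
    for script in PIPELINE:
        subprocess.run(build_command(script, code_dir), check=True)
```

Calling `run_pipeline()` from the repository root would run the five steps in sequence. Because `check=True` raises on a non-zero exit code, a failed extraction or verification step stops the later analysis scripts from running on stale data.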
    

Dependencies

Python 3 with pdfplumber (pip install pdfplumber)
R with tidyverse, lubridate, scales, readr, flextable

Data Sources

All source data was obtained via PRA requests to the California Franchise Tax Board. Original documents are preserved unmodified in PRA Response/.

Citation

If you use this data or code, please cite:

Jiménez, Dalié, Jonathan Glater, Andrew Martin & Charlie Eaton. "Public Power, Private Debts: The Misuse of California's Tax Intercept Power for Private Colleges." Yale Law Journal Forum (forthcoming 2026).

License

This work is licensed under a Creative Commons Attribution 4.0 International License (CC-BY-4.0).

You are free to share and adapt this material for any purpose, provided you give appropriate attribution to the authors and the Yale Law Journal Forum.
