This project analyzes NYC 311 noise complaint data to explore spatial and temporal reporting patterns and assess data quality considerations in large-scale civic datasets. The analysis focuses on understanding when and where noise complaints are reported, how reporting varies over time and location, and what limitations exist when interpreting historical complaint data.
This work is intended as an exploratory and analytical exercise, not as an operational or enforcement decision-making tool. Historical 311 complaint data reflects reporting behavior and access to services, which may vary across communities and over time, and should not be treated as a direct proxy for underlying incident rates.
- When do noise complaints tend to be reported?
- How does reported complaint volume vary by borough and time of day?
- Are there observable seasonal or temporal reporting patterns?
- What limitations arise when using historical complaint data for trend analysis or forecasting?
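The first two questions map to straightforward pandas aggregations. A minimal sketch, assuming columns named `created_date` and `borough` (a guess at the export schema, illustrated here with a tiny hypothetical sample):

```python
import pandas as pd

# Hypothetical mini-sample; the real data comes from 311_noise_complaints_2024.csv.
df = pd.DataFrame({
    "created_date": pd.to_datetime([
        "2024-01-05 23:10", "2024-01-06 01:45", "2024-07-04 22:30",
    ]),
    "borough": ["BROOKLYN", "BROOKLYN", "QUEENS"],
})

# When are complaints reported? Count by hour of day.
by_hour = df["created_date"].dt.hour.value_counts().sort_index()

# How does volume vary by borough and hour of day?
by_borough_hour = (
    df.groupby(["borough", df["created_date"].dt.hour])
      .size()
      .rename("complaints")
)
print(by_hour)
print(by_borough_hour)
```

The same group-by pattern extends to month or weekday for the seasonality question.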
This project explicitly avoids recommending enforcement actions or resource allocation strategies. During later graduate research on AI and predictive policing, I examined how using historical data to guide future enforcement decisions can reinforce existing biases and create feedback loops that disproportionately impact marginalized communities.
As a result, forecasting and trend analysis in this repository are presented as methodological demonstrations and diagnostic tools rather than prescriptions for action. Any temporal models are included to illustrate analytical techniques and to highlight the uncertainty and limitations inherent in complaint-based data.
- Python (pandas, numpy, matplotlib, seaborn, scikit-learn) for data cleaning, exploratory analysis, and time series modeling.
- SQL for querying, aggregating, joining, and validating structured data.
- Jupyter Notebooks for transparent, reproducible analysis.
NYC 311 Service Requests filtered to noise complaints (sample dataset: 311_noise_complaints_2024.csv). Fields include date and time, complaint type, borough, and geolocation information.
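A typical loading step parses the timestamp up front so time-based grouping works later. The column names below are assumptions about the export schema, demonstrated on an in-memory stand-in for the CSV:

```python
import io
import pandas as pd

# Stand-in for data_raw/311_noise_complaints_2024.csv; the column names
# (created_date, complaint_type, borough, latitude, longitude) are
# assumptions about the export schema.
sample_csv = io.StringIO(
    "created_date,complaint_type,borough,latitude,longitude\n"
    "2024-03-01 22:15:00,Noise - Residential,MANHATTAN,40.78,-73.97\n"
    "2024-03-02 02:40:00,Noise - Street/Sidewalk,BRONX,40.85,-73.87\n"
)

# Parse dates at load time rather than converting later.
df = pd.read_csv(sample_csv, parse_dates=["created_date"])

# Basic sanity checks before any analysis.
print(df.shape)
print(df["created_date"].min(), df["created_date"].max())
```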
nyc-noise/
├── data_raw/ # Raw data (unmodified source files)
│ └── 311_noise_complaints_2024.csv
├── data_processed/ # Cleaned/aggregated data ready for analysis
├── notebooks/ # Jupyter notebooks for EDA, forecasting, mapping
│ └── nyc_311_noise_analysis.ipynb
├── src/ # Python scripts for cleaning, feature engineering
├── assets/ # Images/plots for README and reports
├── dashboards/ # Tableau/Power BI dashboards
├── reports/ # Project reports or summaries
├── sql/
│ └── init_table.sql # Drops/creates table + loads CSV
├── scripts/
│ └── setup_db.py # Creates database + runs init_table.sql
│
├── environment.yml # Conda environment (alternative to requirements.txt)
├── LICENSE # Open-source license
└── README.md # Project overview and instructions
Baseline exploratory analysis and initial time series modeling to examine seasonality and reporting patterns. Forecasting components are included for methodological illustration and evaluation of model limitations.
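In that methodological spirit, a seasonal-naive baseline (predict this weekday's count from the same weekday last week) is often the honest benchmark any forecasting model should beat. A sketch on synthetic daily counts with a weekly pattern (not real results):

```python
import numpy as np
import pandas as pd

# Synthetic daily complaint counts with a weekly cycle; stands in for
# aggregated 311 data purely for illustration.
rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=56, freq="D")
weekly = np.tile([30, 28, 27, 29, 40, 55, 50], 8)
counts = pd.Series(weekly + rng.integers(-3, 4, size=56), index=idx)

# Seasonal-naive forecast: each day predicted by the same weekday last week.
forecast = counts.shift(7)

# Evaluate on the last 28 days, once the 7-day lag is available.
test = counts.index[-28:]
mae = (counts[test] - forecast[test]).abs().mean()
print(f"Seasonal-naive MAE: {mae:.2f}")
```

If a more elaborate model cannot beat this baseline by a clear margin, that itself is a useful finding about the limits of complaint-based data.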
- Expand data validation checks and summary tables.
- Add written interpretation of observed patterns and known data limitations.
- Create a simple dashboard for exploratory, non-operational visualization.
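The planned validation checks could follow a simple named-check pattern; the field names and borough whitelist below are assumptions, applied to a hypothetical cleaned frame:

```python
import pandas as pd

# Hypothetical cleaned frame; in practice, load the processed CSV here.
df = pd.DataFrame({
    "created_date": pd.to_datetime(["2024-02-01 21:00", "2024-06-15 03:30"]),
    "borough": ["QUEENS", "STATEN ISLAND"],
    "complaint_type": ["Noise - Residential", "Noise - Commercial"],
})

VALID_BOROUGHS = {"BRONX", "BROOKLYN", "MANHATTAN", "QUEENS", "STATEN ISLAND"}

# Each check is a named boolean so failures are easy to report.
checks = {
    "no_missing_timestamps": df["created_date"].notna().all(),
    "dates_within_2024": df["created_date"].dt.year.eq(2024).all(),
    "boroughs_recognized": df["borough"].isin(VALID_BOROUGHS).all(),
}
failed = [name for name, ok in checks.items() if not ok]
print("all checks passed" if not failed else f"failed: {failed}")
```

New checks (duplicate records, coordinate bounds, etc.) slot in as extra dictionary entries.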
This repository is intended as a demonstration of data cleaning, validation, exploratory analysis, and documentation practices in a public-sector context. Findings should be interpreted cautiously and within the broader social, demographic, and institutional factors that influence service request data.
This project uses PostgreSQL for data storage and Conda for environment management.
Follow the steps below to set up the environment, load the data, and generate the cleaned datasets.
This setup script assumes PostgreSQL is installed locally and that the user can run createdb and psql without interactive password prompts. If PostgreSQL is not available, you can still run the Python notebook directly using the raw CSV.
- PostgreSQL installed and accessible via `psql`
- Conda (or Mamba) installed
- A PostgreSQL user with permission to create databases and tables
From the repo root (`nyc-noise/`), create and activate the environment:

```shell
conda env create -f environment.yml
conda activate nycnoise
```

Run the setup script to create the database, build tables, and load data:

```shell
python scripts/setup_db.py
```

If PostgreSQL is available, this will:
- Create a database called `nyc_noise` if it does not already exist.
- Run `sql/init_table.sql` to create two tables: `noise_complaints_2024` (raw, full schema) and `noise_complaints_clean` (slimmed, analysis-ready schema).
- Export the cleaned SQL dataset to `data_processed/noise_complaints_clean_sql.csv`.
You can also use the Jupyter notebook to produce a parallel cleaned dataset:

```shell
jupyter notebook notebooks/nyc_311_noise_analysis.ipynb
```

The notebook will:
- Clean and transform the raw dataset with pandas.
- Save an additional file to `data_processed/noise_complaints_clean_py.csv`.
- Export key visualizations into `assets/` for use in the README or reports.
Check the processed files in your repo:

```shell
head data_processed/noise_complaints_clean_sql.csv
head data_processed/noise_complaints_clean_py.csv
```

You can connect Tableau, Python, or other tools directly to these CSVs.
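Since the SQL and pandas pipelines should produce equivalent outputs, a lightweight diagnostic is to compare the two exports. A sketch, assuming both files share a `unique_key` identifier column (a guess at the schema, shown here with in-memory stand-ins):

```python
import io
import pandas as pd

# Stand-ins for the two cleaned exports; in practice, read the CSVs from
# data_processed/. The unique_key column is an assumed identifier.
sql_csv = io.StringIO("unique_key,borough\n101,QUEENS\n102,BRONX\n")
py_csv = io.StringIO("unique_key,borough\n101,QUEENS\n102,BRONX\n")

sql_df = pd.read_csv(sql_csv)
py_df = pd.read_csv(py_csv)

# Row counts should match, and the same records should appear in both.
same_rows = len(sql_df) == len(py_df)
same_keys = set(sql_df["unique_key"]) == set(py_df["unique_key"])
print("pipelines agree" if same_rows and same_keys else "pipelines diverge")
```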
- By default, the script uses `PGUSER` if it is set; otherwise it uses your OS username.
- To override, set the `PGUSER` environment variable before running the script:

```shell
PGUSER=your_pg_username python scripts/setup_db.py
```

