NashC/cashflow_analysis

Cash Flow Analysis

A Python application that analyzes personal cash flow from bank CSV exports. It categorizes transactions with >90% accuracy using 200+ regex patterns, and it integrates mortgage data so that only the interest portion of each payment counts as an expense.

What it does

  • Parses CSV exports from Chase, Wells Fargo, Bank of America, and generic bank formats with automatic encoding detection
  • Classifies every transaction into one of four flow types: INCOME, EXPENSE, INTERNAL_TRANSFER, or EXCLUDED (debt payments)
  • Categorizes transactions into 50+ categories using layered pattern matching (regex, fuzzy matching, merchant aliases) with confidence scoring
  • Calculates true net cash flow by excluding internal transfers and debt principal payments
  • Integrates mortgage payment data, separating principal (wealth transfer, excluded) from interest (true expense, included)
  • Runs entirely locally: no external APIs, no cloud dependencies
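
The layered categorization described above can be sketched roughly as follows. This is an illustration only, not the project's actual code: the rules, aliases, category names, confidence values, and the `categorize` function are all hypothetical, and the fuzzy layer uses the standard library's `difflib.SequenceMatcher` in place of fuzzywuzzy so the snippet has no dependencies.

```python
import re
from difflib import SequenceMatcher

# Hypothetical rules for illustration; the real ones live in config/config.yaml.
REGEX_RULES = [
    (re.compile(r"\b(starbucks|dunkin)\b", re.I), "Coffee Shops"),
    (re.compile(r"\b(shell|chevron|exxon)\b", re.I), "Gas"),
]
MERCHANT_ALIASES = {"amzn mktp": "Shopping", "wholefds": "Groceries"}


def categorize(description, fuzzy_threshold=0.8):
    """Return (category, confidence) for a transaction description."""
    text = description.lower()

    # Layer 1: a regex hit is treated as a full-confidence match.
    for pattern, category in REGEX_RULES:
        if pattern.search(text):
            return category, 1.0

    # Layer 2: a known merchant alias appearing verbatim in the description.
    for alias, category in MERCHANT_ALIASES.items():
        if alias in text:
            return category, 0.9

    # Layer 3: fuzzy-compare each alias with the start of the description
    # and use the similarity ratio itself as the confidence score.
    best_category, best_score = "Uncategorized", 0.0
    for alias, category in MERCHANT_ALIASES.items():
        score = SequenceMatcher(None, alias, text[: len(alias)]).ratio()
        if score > best_score:
            best_category, best_score = category, score
    if best_score >= fuzzy_threshold:
        return best_category, round(best_score, 2)

    return "Uncategorized", 0.0


print(categorize("STARBUCKS #1234 SEATTLE"))
print(categorize("AMZN Mktp US*2K4"))
print(categorize("WHOLEFD MARKET"))
```

Ordering the layers from strictest to loosest means a cheap, precise match always wins, and the fuzzy fallback only runs for descriptions that nothing else recognized.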

Quick Start

cd cashflow_analysis
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Generate sample data and run analysis
python cashflow_analyzer.py --generate-sample

# Analyze your own bank export
python3 -m src.main data/your_bank_export.csv

# Enhanced analysis with mortgage data
python3 enhanced_analysis.py

# Interactive dashboard
python run_dashboard.py

How the cash flow formula works

Net Cash Flow = Income - True Expenses

Excluded from expenses:
  - Internal transfers (to savings or investments; the money stays in your system)
  - Credit card payments (already counted when originally spent)
  - Mortgage principal (wealth transfer, not an operating cost)

Included as expenses:
  - Mortgage interest (true operating cost)
  - All other outflows that leave your financial system

This distinction matters. Without it, savings contributions look like expenses and mortgage principal distorts your expense ratio.
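
A small worked example of the formula, as a sketch under stated assumptions: the transaction tuples, the `mortgage_payment` dict, and the `net_cash_flow` function are hypothetical, amounts are positive magnitudes, and the mortgage payment is assumed to be tracked separately from the bank transactions.

```python
from decimal import Decimal

# Hypothetical month of activity: (flow_type, amount).
transactions = [
    ("INCOME", Decimal("6000.00")),
    ("EXPENSE", Decimal("2500.00")),
    ("INTERNAL_TRANSFER", Decimal("1000.00")),  # moved to savings
    ("EXCLUDED", Decimal("800.00")),            # credit card payment
]
# Hypothetical mortgage payment, already split into its two parts.
mortgage_payment = {"principal": Decimal("700.00"), "interest": Decimal("900.00")}


def net_cash_flow(transactions, mortgage_interest=Decimal("0")):
    """Income minus true expenses; transfers and excluded flows are ignored."""
    income = sum((amt for flow, amt in transactions if flow == "INCOME"), Decimal("0"))
    expenses = sum((amt for flow, amt in transactions if flow == "EXPENSE"), Decimal("0"))
    # Only the interest portion of the mortgage counts as an expense.
    return income - (expenses + mortgage_interest)


# Naive view: every outflow, including the full mortgage payment, is an expense.
naive_outflows = sum(
    (amt for flow, amt in transactions if flow != "INCOME"), Decimal("0")
) + sum(mortgage_payment.values(), Decimal("0"))

print(net_cash_flow(transactions, mortgage_payment["interest"]))  # 2600.00
print(Decimal("6000.00") - naive_outflows)                        # 100.00
```

With these made-up numbers, the naive calculation reports 100.00 of net cash flow while the formula above reports 2600.00. The difference is the savings transfer, the credit card payment, and the mortgage principal.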

Project Structure

src/
├── core/              # Transaction models, categorization constants, exceptions
├── data/              # CSV loader (multi-bank), mortgage loader, balance validator
├── categorization/    # Flow classifier (4-tier priority) and regex categorizer
├── analysis/          # Core cash flow metrics and enhanced mortgage integration
├── visualization/     # Dash dashboard
└── utils/             # Sample data generator
config/config.yaml     # Categorization rules, confidence thresholds, merchant aliases
tests/                 # Unit tests for flow classification and cash flow calculations

Testing

python -m pytest tests/ -v

Tests cover flow classification logic (ensuring transactions are assigned the correct INCOME/EXPENSE/INTERNAL_TRANSFER/EXCLUDED type) and net cash flow calculations (verifying that transfers and debt payments are properly excluded).

Tech Stack

Python 3.13, pandas, NumPy, Plotly/Dash, PyYAML, fuzzywuzzy. Uses Decimal arithmetic throughout for financial precision.
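
As a brief illustration of why Decimal matters for money, here is a standalone snippet that is not taken from the project. Binary floats cannot represent most decimal fractions exactly, so sums and rounding can drift by a cent.

```python
from decimal import Decimal, ROUND_HALF_UP

# Floats accumulate binary representation error.
float_total = 0.1 + 0.2
print(float_total == 0.3)  # False

# Decimals built from strings stay exact.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))  # True

# Rounding to cents is explicit and predictable.
cents = Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(cents)  # 2.68
```

Constructing Decimals from strings rather than floats is what keeps the values exact; `Decimal(0.1)` would carry the float's error along with it.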
