Data scientist and builder working on public-data fraud analytics, program-integrity tools, and AI-assisted developer workflows.
I build applied ML/data systems and small developer-focused products — especially tools that turn messy real-world data or repetitive engineering workflows into something clearer, faster, and easier to act on.
My public work is split across a few areas:
- Fraud / program-integrity analytics
- Public-data investigation tooling
- LLM-assisted workflows
- Developer utilities and API tooling
- Lightweight SaaS/product experiments
Fraud-lead research pipeline over 11.4M public PPP loans, validated against real enforcement outcomes from DOJ/SBA-OIG sources.
This project focuses on turning public records into transparent, defensible investigative leads — not accusations.
Highlights:
- Built a local analytical warehouse from public SBA/DOJ data
- Used anomaly detection and ML experiments to rank potentially suspicious loans
- Added positive-unlabeled learning and LightGBM experiments
- Benchmarked signals against known prosecuted cases
- Included bootstrap confidence intervals and clear model limitations
- Built Streamlit views for analyst-style review
- Added LLM-assisted entity resolution, retrieval, and similar-case workflows
- Framed outputs responsibly as statistical leads, not proof of fraud
Stack: Python · DuckDB · pandas · scikit-learn · LightGBM · Streamlit · LLM workflows · graph/retrieval methods
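As a rough illustration of the bootstrap confidence intervals mentioned in the highlights, the sketch below computes a percentile bootstrap interval for precision among the top-k ranked records. It is not the project's code: the function names, the `(score, label)` pair layout, and the synthetic toy data are all assumptions made for the example, and it uses only the Python standard library.

```python
import random

def precision_at_k(pairs, k):
    """Share of the k highest-scored records whose label is 1 (known positive)."""
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)

def bootstrap_ci(pairs, k, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample records with replacement, re-rank, re-score."""
    rng = random.Random(seed)
    n = len(pairs)
    stats = sorted(
        precision_at_k([pairs[rng.randrange(n)] for _ in range(n)], k)
        for _ in range(n_boot)
    )
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

# Synthetic toy data (hypothetical): about 10% positives that tend to score higher.
toy_rng = random.Random(42)
toy_pairs = []
for _ in range(500):
    label = 1 if toy_rng.random() < 0.1 else 0
    score = toy_rng.random() + (0.5 if label else 0.0)
    toy_pairs.append((score, label))

point = precision_at_k(toy_pairs, 50)
lo, hi = bootstrap_ci(toy_pairs, 50)
print(f"precision@50 = {point:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

When the only labels are known prosecuted cases, unlabeled records may still include undetected fraud, so a precision figure computed this way is better read as a conservative estimate than as a true rate.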
I use GitHub organizations to separate different kinds of work instead of mixing every project into one personal account.
Developer tooling for app localization and translation workflows.
Shipi18n is focused on making internationalization easier for developers: translating locale files, preserving placeholders, supporting JSON/i18n workflows, and integrating translation into existing build or automation pipelines.
Examples of work in this area:
- Translation APIs for developers
- CLI tooling for locale-file translation
- Vite/plugin-style integrations
- Demo repos and framework examples
- Automation around i18n files, placeholders, and translation memory
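To make "preserving placeholders" concrete, here is a minimal sketch of one way to check that a translated string keeps the same placeholders as its source. It is not Shipi18n's implementation. The supported placeholder styles (`{name}`, `{{name}}`, `%s`, `%d`) and the helper names are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical placeholder styles: {{ name }}, {name}, %s, %d
PLACEHOLDER_RE = re.compile(r"\{\{\s*\w+\s*\}\}|\{\w+\}|%[sd]")

def placeholders(text):
    """Count placeholders in text, ignoring whitespace inside double braces."""
    return Counter(re.sub(r"\s+", "", m) for m in PLACEHOLDER_RE.findall(text))

def placeholders_preserved(source, translated):
    """True when both strings contain the same placeholders the same number of times."""
    return placeholders(source) == placeholders(translated)

print(placeholders_preserved("Hello, {name}!", "Hola, {name}!"))
print(placeholders_preserved("Hello, {name}!", "Hola, {nombre}!"))
```

A check like this could run after translation in a CLI or build step, flagging any locale entry whose placeholders were renamed, dropped, or duplicated.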
Developer tooling around API specs, structured outputs, and engineering workflow automation.
This workstream covers tools that help developers work faster with specs, schemas, generated code, API documentation, and related automation.
Examples of work in this area:
- API/spec utilities
- Schema and contract tooling
- Developer workflow automation
- Lightweight AI-assisted coding tools
- Experiments around turning specs into usable project assets
I am building public portfolio projects so that I have work I can discuss openly when client work cannot be shared in detail.
My emphasis is on:
- Messy real-world data
- Transparent assumptions
- Reproducible pipelines
- Honest validation
- Practical analyst workflows
- LLMs used as workflow support, not magic black boxes
- Small tools that solve specific developer pain points
Python · SQL · DuckDB · pandas · scikit-learn · LightGBM · Streamlit · AWS · LLM APIs · LangChain · LangGraph · JavaScript/TypeScript · GitHub Actions
- Fraud detection and program integrity
- Public-sector and government data systems
- Applied ML that survives honest evaluation
- LLMs for retrieval, triage, entity resolution, and analyst workflows
- Reproducible data pipelines
- Developer tools and workflow automation
- API/spec tooling
- Lightweight SaaS products