Visa AI Hackathon Submission

Core Stack: FastAPI • LangGraph • React • Google Gemini 3.1 Pro Preview • Recharts
In the modern financial world, compliance is a bottleneck. Data teams spend thousands of hours manually mapping columns (e.g., "Is 'Billing_Loc' an address?"), running strict arithmetic checks (Basel III), and cross-referencing global standards like GDPR and PCI DSS. A single human error can lead to millions in fines.
FinAUDIT was born to solve this.
It is not just a tool; it is an Autonomous AI Auditor. It combines the mathematical certainty of code (Deterministic) with the reasoning power of Large Language Models (Probabilistic). This document tells the story of how a raw file becomes a boardroom-ready audit report.
"Data enters the system, but the risk stays out."
The journey begins when a user uploads a CSV or Excel file via the secured React Frontend. In a traditional system, this file might be stored dangerously. In FinAUDIT, we use a Zero-Copy PII Architecture.
Before any processing happens, the Python Backend (FastAPI) captures the file stream in-memory.
- Profiling: We use `pandas` to extract only statistical metadata.
- Data Discard: The raw rows (containing Names, CC Numbers) are discarded immediately after profiling.
- Artifact Creation: The system generates a `metadata` object that represents the shape of the data, not the content.
```json
// The only data that survives Ingestion
{
  "total_rows": 50000,
  "columns": {
    "transaction_amnt": {
      "type": "float",
      "null_percentage": 0.02,
      "min": -50.0
    },
    "customer_ssn": { "type": "string", "unique_count": 49000 },
    "residency_code": { "type": "string", "null_percentage": 15.4 }
  }
}
```

Technical Note: This JSON is safe to send to any LLM because it contains no PII, yet it describes the data perfectly.
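The profiling step described above can be sketched in a few lines of `pandas`. This is a minimal illustration, not the actual implementation; the function name and the exact statistics kept are assumptions:

```python
import io
import pandas as pd

def profile_stream(raw_bytes: bytes) -> dict:
    """Illustrative sketch: profile an uploaded CSV entirely in memory,
    keeping only statistical metadata and discarding the raw rows."""
    df = pd.read_csv(io.BytesIO(raw_bytes))
    metadata = {"total_rows": len(df), "columns": {}}
    for col in df.columns:
        series = df[col]
        stats = {
            "type": str(series.dtype),
            "null_percentage": round(float(series.isna().mean()), 4),
            "unique_count": int(series.nunique()),
        }
        if pd.api.types.is_numeric_dtype(series):
            stats["min"] = float(series.min())
        metadata["columns"][col] = stats
    del df  # the raw rows are discarded; only the metadata survives
    return metadata
```

Note that the file never touches disk: the stream is parsed from memory, and only the resulting dictionary outlives the request.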
"Before the Agent thinks, the Code verifies."
We cannot hallucinate compliance. A Credit Card number is either masked (PCI DSS pass) or it isn't (Fail). We built a custom RulesEngine in core/rules_engine.py that acts as the deterministic foundation.
The system does not require users to tag columns. It uses Advanced Regex Semantics to "understand" the schema.
| Concept | Patterns Detected (Partial List) |
| :--- | :--- |
| Address | `r"address\|domicile\|residency\|municipality\|territory\|provenance"` |
| KYC | `r"tin\|ein\|ssn\|passport\|national_id\|govt_id\|driver_license"` |
| Financials | `r"principal\|exposure\|remitter\|beneficiary\|ledger\|gl_code"` |
| Security | `r"token\|encrypted\|cipher\|hash\|salt\|key_id"` |
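The table above can be read as a small dispatch function. The following is an illustrative reconstruction (the real patterns live in `core/rules_engine.py`; the function name and the per-concept grouping here are assumptions):

```python
import re

# Hypothetical reconstruction of the concept-to-pattern table.
CONCEPT_PATTERNS = {
    "address": re.compile(r"address|domicile|residency|municipality|territory|provenance", re.I),
    "kyc": re.compile(r"tin|ein|ssn|passport|national_id|govt_id|driver_license", re.I),
    "financials": re.compile(r"principal|exposure|remitter|beneficiary|ledger|gl_code", re.I),
    "security": re.compile(r"token|encrypted|cipher|hash|salt|key_id", re.I),
}

def map_columns(column_names: list[str]) -> dict[str, list[str]]:
    """Assign each column to every concept whose pattern it matches."""
    mapping: dict[str, list[str]] = {c: [] for c in CONCEPT_PATTERNS}
    for name in column_names:
        for concept, pattern in CONCEPT_PATTERNS.items():
            if pattern.search(name):
                mapping[concept].append(name)
    return mapping
```

Because matching is case-insensitive and substring-based, a column like `Billing_Residency` is recognized as an address without any user tagging.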
Once columns are mapped, the engine executes 30+ strict binary checks across 5 standards:

- Visa CEDP:
  - Check: `visa_no_unauthorized_storage`
  - Logic: IF column matches `pan|credit_card` AND format is `raw` (detected via sample stats) → FAIL.
- GDPR:
  - Check: `gdpr_storage_limitation`
  - Logic: MUST have `retention|purge|ttl` columns present.
- Basel III:
  - Check: `basel_amount_accuracy`
  - Logic: `min` value of `amount` columns must be >= 0 (No negative exposure).
- AML/FATF:
  - Check: `aml_suspicious_patterns`
  - Logic: Data must contain both `amount` (Volume) and `timestamp` (Velocity) to support audit.
  - Check: `aml_kyc_identifier` (Must be Present).

Output: A `scores` dictionary (0-100) for every dimension.
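Two of the checks above, and the fold into a 0-100 score, might look like this. A minimal sketch only: the function signatures and the metadata shape are assumptions, and the real engine runs 30+ such checks:

```python
import re

def basel_amount_accuracy(metadata: dict) -> bool:
    """Illustrative check: every amount-like column must have min >= 0."""
    amount_cols = [
        stats for name, stats in metadata["columns"].items()
        if "amount" in name.lower() or "amnt" in name.lower()
    ]
    return all(stats.get("min", 0) >= 0 for stats in amount_cols)

def gdpr_storage_limitation(metadata: dict) -> bool:
    """Illustrative check: a retention/purge/ttl column must be present."""
    return any(re.search(r"retention|purge|ttl", name, re.I)
               for name in metadata["columns"])

def score(checks: dict[str, bool]) -> int:
    """Fold a dimension's binary checks into a 0-100 score."""
    return round(100 * sum(checks.values()) / len(checks)) if checks else 100
```

Each check is pure code over metadata, so the result is reproducible: the same file always produces the same scores.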
"The logic finds the errors. The AI explains the risk."
This is the crown jewel. We use LangGraph to orchestrate a State Machine of AI Agents. It is a 4-step pipeline that mimics a human audit team.
Data flows through the graph in this schema:
```python
class AgentState(TypedDict):
    metadata: dict      # The Safe JSON from Chapter 1
    scores: dict        # The Rules Result from Chapter 2
    dataset_type: str   # e.g., "High-Velocity Transaction Ledger"
    insights: str       # "Health is low due to PCI failures"
    privacy_check: str  # "Passed: No SSN in keys"
    analysis: dict      # Final Report
```

The AI architecture is not a black box. It is a Sequential State Machine where each agent passes a structured "State Object" to the next.
**Node 1: Privacy Guard**

- Role: The Gatekeeper.
- Input: Raw `metadata` (Column names).
- Action: It executes a pre-LLM scan using a restricted keyword list (`ssn`, `password`, `secret`).
- Logic:
  - Safe: "No PII keys found. Proceed to analysis."
  - Unsafe: "ALERT: Column 'user_password' detected." → ABORT.
- Why?: To strictly prevent the LLM from even seeing potentially compromised schema keys.
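The gatekeeper logic is deliberately simple enough to audit by eye. A sketch under assumed names (the keyword list and the `abort` flag are illustrative, not the actual implementation):

```python
BLOCKED_KEYWORDS = ("ssn", "password", "secret")  # restricted list (illustrative)

def privacy_guard(state: dict) -> dict:
    """Pre-LLM gatekeeper: halt before any compromised schema key reaches the model."""
    for column in state["metadata"]["columns"]:
        for keyword in BLOCKED_KEYWORDS:
            if keyword in column.lower():
                state["privacy_check"] = f"ALERT: Column '{column}' detected."
                state["abort"] = True
                return state
    state["privacy_check"] = "No PII keys found. Proceed to analysis."
    state["abort"] = False
    return state
```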
**Node 2: Metadata Analyst**

- Role: The Context Engine.
- Input: `metadata` + `privacy_check_result`.
- Action: It looks at the combination of columns to determine the dataset's purpose.
- Decision Tree:
  - IF `amounts` + `dates` + `gl_code` ARE PRESENT → Classify as "Financial Ledger".
  - IF `passport` + `dob` + `address` ARE PRESENT → Classify as "KYC Identity Data".
- Why?: A "Missing Address" is critical for KYC data but irrelevant for a General Ledger. Context changes the rules.
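The decision tree above can be expressed as a short classifier. This is a hedged sketch: the fallback label and the exact patterns are assumptions:

```python
import re

def classify_dataset(columns: list[str]) -> str:
    """Illustrative rendering of the Metadata Analyst's decision tree."""
    names = " ".join(columns).lower()

    def has(*patterns: str) -> bool:
        return all(re.search(p, names) for p in patterns)

    if has(r"amount|amnt", r"date|timestamp", r"gl_code"):
        return "Financial Ledger"
    if has(r"passport", r"dob", r"address"):
        return "KYC Identity Data"
    return "General Tabular Data"  # assumed fallback label
```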
**Node 3: Insights Agent**

- Role: The Quantitative Scientist.
- Input: `scores` (from the Rules Engine) + `dataset_type`.
- Action: It translates raw numbers into narrative trends.
- Output Example: "Health Score is 45/100. While GDPR compliance is perfect (100%), the dataset fails the 'Visa CEDP' check because 15% of rows in the 'Credit_Card' column appear unmasked."
- Why?: The LLM needs a summarized "view" of the math, not just a raw dump of 50 score variables.
**Node 4: Gemini Advisory**

- Role: Chief Compliance Officer.
- Model: Gemini 3.1 Pro Preview (chosen for its 128k context window).
- Input: `Insight Narrative` + `Detailed Rule Failures` + `Regulatory Text`.
- Action: It generates the remediation strategy.
- Logic:
  - Identify: Which failure carries the highest legal penalty? (e.g., Unmasked PAN > Missing Date.)
  - Prioritize: Label fixes as CRITICAL, HIGH, or MEDIUM.
  - Prescribe: Write specific SQL/Python remediation steps (e.g., "Run `UPDATE table SET pan = MASK(pan)`").
- Why?: Compliance is about priority. You fix the jail-time risks first.
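The four nodes above run as a sequential state machine. The production system wires them with LangGraph; the plain-Python sketch below shows only the control flow, with stub nodes standing in for the real agents:

```python
from typing import Callable

def run_pipeline(state: dict, nodes: list[Callable[[dict], dict]]) -> dict:
    """Pass the state object through each node in order; the Privacy
    Guard can set an abort flag to halt the run early."""
    for node in nodes:
        state = node(state)
        if state.get("abort"):
            break
    return state

# Stub nodes standing in for the four agents (illustrative only).
pipeline = [
    lambda s: {**s, "privacy_check": "Passed", "abort": False},  # Node 1: Privacy Guard
    lambda s: {**s, "dataset_type": "Financial Ledger"},         # Node 2: Metadata Analyst
    lambda s: {**s, "insights": "Health Score is 45/100"},       # Node 3: Insights Agent
    lambda s: {**s, "analysis": {"priority": "CRITICAL"}},       # Node 4: Advisory (LLM call)
]
```

The key property is that each node only ever sees the structured state, never the raw data, so the privacy guarantee from Chapter 1 holds across the whole graph.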
"An independent auditor, accessible 24/7."
Users can chat with their data. To optimize for latency and cost, we use a Split-Stack Architecture:
- The Auditor (Backend): Uses Gemini 3.1 Pro Preview for the heavy lifting (Generating the report).
- The Chatbot (Frontend Interaction): Uses Gemini 3 Flash Preview.
- Why? Gemini 3 Flash Preview is fast and cost-effective. It takes the Report generated by Gemini 3.1 Pro Preview as context and answers user questions.
- Example User Query: "Why did we fail the PCI check?"
- Bot Response: "We failed because column 'CC_Num' was detected with unmasked values, violating Visa CEDP Requirement 3."
"Compliance at a glance."
The Dashboard acts as the mission control.
- Tech: React 18 + Vite (for blink-speed HMR) + Recharts.
- Visualization:
- Health Dial: An animated gauge showing the global trust score.
- Radar Charts: Comparing performance across dimensions (AML vs GDPR vs Visa).
- Remediation List: A sorted list of actionable fixes, prioritized by the AI (Critical first).
- UX Detail: We deliberately worked around the known "Recharts resizing" bug using absolute positioning, ensuring the dashboard renders correctly on 4K monitors and laptops alike.
"How do we know the AI didn't lie?"
FinAUDIT uses a Cryptographic Ledger to prove that every report is authentic. Think of it like a Digital Wax Seal on an envelope.
Imagine you are sending a secret letter:
- The Fingerprint: We take the entire audit report and put it through a mathematical shredder (SHA-256) that turns it into a unique string of characters called a "Hash".
- Analogy: If you change even one comma in the report, the Hash changes completely.
- The Signature: We stamp this Hash with our Private Key (which only the system possesses).
- The Verification: Anyone with our Public Key can unlock the stamp and check the Hash. If it matches, they know the report hasn't been touched since we created it.
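The fingerprint step can be shown with the standard library alone. This sketch covers only the SHA-256 hash; the RSA-2048 signing and verification steps are omitted here to keep it self-contained:

```python
import hashlib
import json

def fingerprint(report: dict) -> str:
    """SHA-256 fingerprint over a canonical JSON encoding of the report.
    (The real system then signs this hash with its RSA private key.)"""
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

report = {"health_score": 45, "verdict": "FAIL"}
original = fingerprint(report)

# Change even one character and the fingerprint changes completely.
report["verdict"] = "FAIL."
assert fingerprint(report) != original
```

Canonical encoding (sorted keys, fixed separators) matters: the same report must always hash to the same fingerprint regardless of dictionary order.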
Every API response includes a provenance block:
```json
"provenance": {
  "timestamp": "2024-02-02T12:00:00Z",
  "fingerprint": "a1b2c3d4...",            // The SHA-256 Hash
  "signature": "base64_rsa_signature...",  // The RSA-2048 Signed Hash
  "algorithm": "RSA-SHA256"
}
```

Why is this better than Blockchain? It provides the same Immutability (you can't fake it) but instantly, without waiting for miners or paying gas fees. It is the perfect "Lite" solution for high-speed audits.
```mermaid
graph TD
    A[User Upload] -->|Stream| B(FastAPI Backend)
    B -->|Pandas| C{Profiling}
    C -->|Metadata Only| D[Rules Engine]
    D -->|Scores| E[LangGraph Agent]
    E -->|Node 1| F[Privacy Guard]
    F -->|Node 2| G[Metadata Analyst]
    G -->|Node 3| H[Insights Agent]
    H -->|Node 4| I[Gemini 3.1 Advisory]
    I -->|JSON Report| J[React Dashboard]
```
1. Clone the Repo

```bash
git clone https://github.com/GaneshArihanth/FinAUDIT.git
cd FinAUDIT
```

2. Backend Setup

```bash
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

3. Frontend Setup

```bash
cd frontend
npm install
npm run dev
```

4. Configuration (`.env`): You must provide keys for the Split-Stack AI:

```bash
# For the heavy analysis (Agent)
GOOGLE_API_KEY=AIzaSy_Gemini3.1_Key...
# For the fast chat (Chatbot)
GOOGLE_CHAT_API_KEY=AIzaSy_Gemini3_Key...
```

Traditional compliance is Reactive: you fix issues after the audit fails. FinAUDIT is Proactive: it tells you the risk the moment the data is born.
By strictly separating the "How" (Code/Regex) from the "Why" (AI/Gemini), we have built a system that is:
- Hallucination-Proof: The AI cannot invent a passing score.
- Privacy-Preserving: Designed for Banking standards.
- Explainable: Every decision traces back to a specific Rule ID.
FinAUDIT: Trust your data.