Production-grade fraud detection system using a TensorFlow autoencoder + classifier ensemble with Vertex AI pipeline patterns, FastAPI serving, and real-time model monitoring.
```
                  +-----------------------+
                  |   Transaction Input   |
                  +-----------+-----------+
                              |
                  +-----------v-----------+
                  |     Preprocessing     |
                  |    (Scale + Encode)   |
                  +-----------+-----------+
                              |
            +-----------------+-----------------+
            |                                   |
+-----------v-----------+           +-----------v-----------+
|      Autoencoder      |           |   Binary Classifier   |
|  (Anomaly Detection)  |           |     (Focal Loss)      |
|                       |           |                       |
|  Input -> 64 -> 32    |           |  Input -> 128(relu)   |
|  -> 16 -> 32 -> 64    |           |  -> Drop -> 64(relu)  |
|  -> Output            |           |  -> Drop -> 32(relu)  |
|                       |           |  -> 1(sigmoid)        |
|  Trained on legit     |           |  Trained with focal   |
|  transactions only    |           |  loss (alpha=0.75)    |
+-----------+-----------+           +-----------+-----------+
            |                                   |
            | Reconstruction Error              | Fraud Probability
            | (normalized)                      |
            +-----------------+-----------------+
                              |
                  +-----------v-----------+
                  |   Ensemble Scoring    |
                  |                       |
                  |  score = 0.4 * AE     |
                  |        + 0.6 * CLF    |
                  +-----------+-----------+
                              |
                  +-----------v-----------+
                  |  Risk Classification  |
                  |                       |
                  |  < 0.3   -> LOW       |
                  |  < 0.6   -> MEDIUM    |
                  |  < 0.85  -> HIGH      |
                  |  >= 0.85 -> CRITICAL  |
                  +-----------------------+
```
The training pipeline follows a two-stage approach designed for the severe class imbalance inherent in fraud detection (2% fraud rate):
- Stage 1 - Train a symmetric dense autoencoder on legitimate transactions only
  - The model learns the distribution of normal (legitimate) transaction features
  - Reconstruction error on unseen transactions serves as an anomaly score
  - Architecture: `Input(N) -> 64 -> 32 -> 16 -> 32 -> 64 -> Output(N)`
- Stage 2 - Train a binary classifier on SMOTE-resampled data
  - Uses focal loss (alpha=0.75, gamma=2.0) to handle the class imbalance
  - Focal loss down-weights well-classified examples, focusing training on hard examples
  - Architecture: `Input(N) -> 128(relu) -> Dropout(0.3) -> 64(relu) -> Dropout(0.3) -> 32(relu) -> 1(sigmoid)`
- Ensemble - Combine autoencoder anomaly scores with classifier probabilities
  - Calibrate normalization parameters on legitimate-transaction statistics
  - Weighted fusion: `score = 0.4 * normalized_ae_error + 0.6 * classifier_prob`
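The focal-loss behavior described above can be illustrated with a small pure-Python sketch (for clarity only; the repo's custom implementation is the TensorFlow version under `src/model`):

```python
import math


def binary_focal_loss(y_true: int, p: float,
                      alpha: float = 0.75, gamma: float = 2.0,
                      eps: float = 1e-7) -> float:
    """Per-example binary focal loss.

    p_t is the predicted probability of the true class. The (1 - p_t)^gamma
    factor down-weights well-classified examples, and alpha_t up-weights
    the rare positive (fraud) class.
    """
    p = min(max(p, eps), 1.0 - eps)            # clip for numerical stability
    p_t = p if y_true == 1 else 1.0 - p
    alpha_t = alpha if y_true == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_pos = binary_focal_loss(1, 0.95)  # confident, correct -> tiny loss
hard_pos = binary_focal_loss(1, 0.30)  # misclassified fraud -> large loss
easy_neg = binary_focal_loss(0, 0.05)  # confident, correct -> tiny loss
```

With gamma=2.0, a confidently correct example (p_t=0.95) contributes roughly 400x less loss than a misclassified fraud case (p_t=0.30), which is what keeps the abundant easy negatives from swamping the gradient.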
```bash
# Run training locally
python -m src.pipeline.local_runner --n-transactions 100000 --ae-epochs 50 --clf-epochs 30
```

```python
# Or use the Python API
from src.training.train import run_training_pipeline, TrainingConfig

config = TrainingConfig(n_transactions=100_000)
result = run_training_pipeline(config)
```

```bash
# Start the API server
uvicorn src.serving.endpoint:app --host 0.0.0.0 --port 8000
```
```bash
# Score a transaction
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "amount": 9500.00,
    "merchant_category": "electronics",
    "hour_of_day": 3,
    "day_of_week": 6,
    "distance_from_home": 250.0,
    "distance_from_last_transaction": 180.0,
    "ratio_to_median_purchase": 15.0,
    "is_foreign": true,
    "uses_chip": false,
    "uses_pin": false,
    "online_order": true
  }'
```

Response:

```json
{
  "fraud_score": 0.87,
  "risk_level": "CRITICAL",
  "ae_score": 0.82,
  "clf_score": 0.91,
  "latency_ms": 3.45
}
```

Launch the Gradio demo:

```bash
python app.py
```

Three-tab interface:
- Detect Fraud - Input transaction fields and get real-time fraud scoring
- Batch Analysis - Upload CSV for batch processing with downloadable results
- Model Info - Architecture details, metrics, and monitoring explanation
Drift detection (`DriftDetector`):
- Kolmogorov-Smirnov two-sample test per feature
- Configurable significance level and drift-fraction threshold
- Alerts when more than 30% of features show a significant distribution shift

Performance tracking (`PerformanceTracker`):
- Rolling-window false positive rate (FPR) and latency percentiles (P50/P95/P99)
- Risk-level distribution monitoring
- Configurable alert thresholds for FPR and latency degradation
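The per-feature KS drift check described above can be sketched in NumPy (a simplified illustration: it thresholds the raw KS statistic directly, whereas the repo's detector applies a configurable significance test):

```python
import numpy as np


def ks_statistic(ref: np.ndarray, prod: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical CDFs."""
    ref, prod = np.sort(ref), np.sort(prod)
    values = np.concatenate([ref, prod])
    cdf_ref = np.searchsorted(ref, values, side="right") / len(ref)
    cdf_prod = np.searchsorted(prod, values, side="right") / len(prod)
    return float(np.max(np.abs(cdf_ref - cdf_prod)))


def drift_alert(ref_data: np.ndarray, prod_data: np.ndarray,
                ks_threshold: float = 0.1,
                drift_fraction: float = 0.3) -> bool:
    """Alert when more than `drift_fraction` of features drift.

    `ks_threshold` is an illustrative cutoff on the raw statistic, not
    the significance level the repo's DriftDetector uses.
    """
    n_features = ref_data.shape[1]
    drifted = sum(
        ks_statistic(ref_data[:, j], prod_data[:, j]) > ks_threshold
        for j in range(n_features)
    )
    return drifted / n_features > drift_fraction
```

A feature whose production distribution has shifted (e.g. a mean moved by two standard deviations) yields a KS statistic near 1, while an unchanged feature stays near 0, so the alert fires only when a substantial fraction of the feature set has moved.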
```python
from src.monitoring.drift_detector import DriftDetector
from src.monitoring.performance_tracker import PerformanceTracker

detector = DriftDetector(reference_data=X_train, feature_names=feature_names)
report = detector.detect(production_data)
print(detector.summary(report))
```

Pipeline definition using the KFP SDK for Google Cloud Vertex AI deployment:

```
Generate Data -> Preprocess -> Train Model -> Evaluate
```
```python
# Compile the pipeline
from kfp import compiler
from src.pipeline.vertex_pipeline import fraud_pipeline

compiler.Compiler().compile(
    pipeline_func=fraud_pipeline,
    package_path="fraud_pipeline.yaml",
)
```

```
fraud-detection-tf/
├── src/
│   ├── data/          # Schema, synthetic generator, preprocessing
│   ├── model/         # Autoencoder, classifier, focal loss, ensemble
│   ├── training/      # Training pipeline, evaluation, tf.data utilities
│   ├── serving/       # FastAPI endpoint, request logging, schemas
│   ├── monitoring/    # Drift detection, performance tracking
│   ├── pipeline/      # Vertex AI pipeline definition, local runner
│   ├── deploy/        # Hugging Face Hub push utilities
│   └── utils/         # Device detection, GPU configuration
├── tests/             # Comprehensive test suite (30+ tests)
├── configs/           # YAML configuration files
├── app.py             # Gradio demo application
├── Dockerfile         # Multi-stage build (GPU trainer + slim server)
└── .github/workflows/ # CI/CD pipeline
```
```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v --cov=src

# Lint
ruff check src/ tests/
ruff format src/ tests/

# Type check
mypy src/ --ignore-missing-imports
```

- Focal Loss: Custom implementation handles the ~50:1 class imbalance without the degradation that naive oversampling alone can cause
- SMOTE Resampling: Synthetic minority oversampling on training set only to prevent data leakage
- Ensemble Approach: Combines unsupervised anomaly detection with supervised classification for robust fraud scoring
- KS-test Monitoring: Statistical drift detection catches distribution shift before model degradation
- Rule-based Fallback: API serves predictions even without a trained model using interpretable heuristics
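The rule-based fallback can be sketched as a handful of additive heuristics (illustrative rules and weights only; the repo's actual heuristics live under `src/serving`):

```python
def fallback_fraud_score(txn: dict) -> float:
    """Interpretable heuristic score used when no trained model is loaded.

    Each rule adds weight for a common fraud signal; the sum is capped at 1.0.
    Field names match the /predict request schema; the weights are
    hypothetical, not the served values.
    """
    score = 0.0
    if txn.get("ratio_to_median_purchase", 1.0) > 5.0:
        score += 0.35  # purchase far above this customer's typical size
    if txn.get("is_foreign") and not txn.get("uses_chip"):
        score += 0.25  # foreign transaction without chip verification
    if txn.get("online_order") and not txn.get("uses_pin"):
        score += 0.15  # card-not-present order without PIN
    if txn.get("hour_of_day", 12) < 5:
        score += 0.15  # late-night activity
    if txn.get("distance_from_home", 0.0) > 100.0:
        score += 0.10  # transaction far from the home location
    return min(score, 1.0)
```

Applied to the example transaction from the curl request above (a 3 a.m. foreign online order at 15x the median purchase), every rule fires and the fallback returns the maximum score, so the API can still surface an actionable CRITICAL signal with no model on disk.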
Apache 2.0