
TradeGuard AI Testing Framework

Overview

TradeGuard AI Testing Framework is an AI-powered quality engineering solution designed for validating Large Language Model (LLM) applications in the Capital Markets domain. The framework leverages Python, Pytest, and DeepEval to evaluate the accuracy, consistency, and regulatory compliance of AI-generated responses related to market surveillance and trade monitoring.

The primary objective of the framework is to ensure that AI systems can correctly identify and explain suspicious trading behaviours while maintaining alignment with financial regulations and market abuse guidelines.


Business Context

Financial institutions, investment banks, and regulatory technology teams increasingly use AI-powered systems to assist analysts in detecting potential market abuse and suspicious trading activities.

Regulators such as the Monetary Authority of Singapore (MAS), and regulatory frameworks such as the EU's Markets in Financial Instruments Directive II (MiFID II), require firms to implement effective surveillance mechanisms capable of identifying manipulative trading practices and maintaining market integrity.

This framework provides a structured approach to validating AI-driven surveillance use cases against realistic trading scenarios.


Key Detection Scenarios

The framework evaluates AI responses against various market manipulation patterns, including:

1. Wash Trades

Detection of transactions where there is no genuine change in beneficial ownership despite apparent trading activity.

2. Layering

Identification of multiple deceptive orders placed at different price levels to create a false impression of market demand or supply.

3. Spoofing

Detection of large orders placed with no genuine intention to execute (non-bona fide orders), used to mislead other market participants and cancelled before execution.

4. Normal Trades

Validation that legitimate trades are not incorrectly flagged, i.e. guarding against false positives.
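To make the wash-trade definition concrete, here is a minimal rule-based sketch that derives a ground-truth label in the FLAG: YES / FLAG: NO format for each trade, including FLAG: NO for a normal trade. The `Trade` fields (`buyer_owner`, `seller_owner`, etc.) and the helper names are hypothetical and not taken from `data/synthetic_trades.json`; real surveillance rules are considerably richer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    # Hypothetical record layout; the real dataset schema may differ.
    trade_id: str
    buyer_owner: str   # beneficial owner behind the buy side
    seller_owner: str  # beneficial owner behind the sell side
    quantity: int
    price: float


def is_potential_wash_trade(trade: Trade) -> bool:
    """Toy rule: flag when both sides share the same beneficial owner,
    i.e. the trade produces no genuine change in beneficial ownership."""
    return trade.buyer_owner == trade.seller_owner


def expected_flag(trade: Trade) -> str:
    """Ground-truth label in the FLAG: YES / FLAG: NO format."""
    return "FLAG: YES" if is_potential_wash_trade(trade) else "FLAG: NO"


trades = [
    Trade("T1", "FUND_A", "FUND_A", 1_000, 101.50),  # same owner on both sides
    Trade("T2", "FUND_A", "FUND_B", 500, 101.60),    # normal trade, must not be flagged
]
for t in trades:
    print(t.trade_id, expected_flag(t))
```

Labels like these give the evaluation a fixed expectation to compare the model's answer against, so the normal-trade case directly exercises false-positive detection.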


Architecture

Trade Dataset (JSON)
        ↓
Target LLM (Ollama / OpenAI)
        ↓
Generated Trade Analysis
        ↓
DeepEval Evaluation Framework
        ↓
LLM Judge Assessment (OpenAI GPT-4o)
        ↓
Pass / Fail Decision + Metrics Score
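The flow in the diagram can be sketched as a small orchestration function. This is an illustration under stated assumptions, not the repository's code: the target model and the judge are passed in as plain callables (in the real framework these roles are played by Ollama/OpenAI and the GPT-4o judge via DeepEval), and `run_pipeline`, `EvaluationResult`, and the 0.7 pass threshold are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvaluationResult:
    trade_id: str
    analysis: str  # text produced by the target LLM
    score: float   # judge score, assumed to lie in [0, 1]
    passed: bool


def run_pipeline(
    trades: list[dict],
    target_model: Callable[[str], str],
    judge: Callable[[str, str], float],
    threshold: float = 0.7,  # assumed pass mark, not taken from the repo
) -> list[EvaluationResult]:
    results = []
    for trade in trades:
        # Trade Dataset (JSON) -> prompt for the target LLM
        prompt = (
            "Analyse this trade for market abuse. "
            "Answer with FLAG: YES or FLAG: NO and explain why.\n"
            + json.dumps(trade, sort_keys=True)
        )
        analysis = target_model(prompt)  # Generated Trade Analysis
        score = judge(prompt, analysis)  # LLM Judge Assessment
        results.append(
            EvaluationResult(trade["trade_id"], analysis, score, score >= threshold)
        )
    return results  # Pass / Fail Decision + Metrics Score


# Offline stand-ins so the sketch runs without any LLM.
def fake_model(prompt: str) -> str:
    return "FLAG: YES. Buyer and seller share the same beneficial owner."


def fake_judge(prompt: str, answer: str) -> float:
    return 0.9 if "FLAG:" in answer else 0.0


results = run_pipeline([{"trade_id": "T1", "buyer": "A", "seller": "A"}], fake_model, fake_judge)
print(results[0].passed, results[0].score)
```

Injecting the model and judge as callables keeps the orchestration testable offline and lets the same loop run against a local TinyLlama or a hosted model.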

Evaluation Approach — LLM-as-a-Judge

The framework adopts the LLM-as-a-Judge evaluation methodology.

For each surveillance scenario:

  1. Trade data is supplied as structured JSON input
  2. The target AI model analyses the trading activity
  3. DeepEval compares the generated response against predefined expectations
  4. An independent evaluator LLM scores the response based on:
    • Answer Relevancy — Is the response relevant to the trade description?
    • Faithfulness — Is the reasoning faithful to the source data?
    • Detection Accuracy — Was the correct FLAG: YES / FLAG: NO decision made?

This approach makes it possible to test free-text AI responses at scale, where traditional rule-based assertions on exact output strings would be too brittle.
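The Detection Accuracy criterion can also be complemented by a deterministic check once the FLAG decision has been extracted from the response. A minimal sketch, assuming the target model is prompted to include a literal `FLAG: YES` or `FLAG: NO` in its answer; `extract_flag` and `detection_accuracy` are hypothetical names, and Answer Relevancy and Faithfulness still require the judge LLM.

```python
from __future__ import annotations

import re
from typing import Optional

# Matches "FLAG: YES" / "FLAG: NO" anywhere in the text, case-insensitively.
# The trailing \b stops "FLAG: NOT SURE" from being read as "NO".
FLAG_PATTERN = re.compile(r"FLAG:\s*(YES|NO)\b", re.IGNORECASE)


def extract_flag(response: str) -> Optional[str]:
    """Return "YES" or "NO" from the first FLAG marker, or None if absent."""
    match = FLAG_PATTERN.search(response)
    return match.group(1).upper() if match else None


def detection_accuracy(responses: list[str], expected: list[str]) -> float:
    """Fraction of responses whose FLAG matches the expected label.
    A response with no parsable FLAG counts as incorrect."""
    if len(responses) != len(expected):
        raise ValueError("responses and expected must be the same length")
    if not responses:
        return 0.0
    correct = sum(extract_flag(r) == e.upper() for r, e in zip(responses, expected))
    return correct / len(responses)


responses = [
    "FLAG: YES. Same beneficial owner on both sides.",
    "flag: no - ordinary client order.",
    "I am not sure.",
]
print(detection_accuracy(responses, ["YES", "NO", "NO"]))
```

Treating a missing FLAG as a failure is a deliberate choice here: an answer that never commits to a decision should not count towards accuracy.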


Technology Stack

Layer                    Technology
Language                 Python 3.11+
Test Runner              Pytest
LLM Evaluation           DeepEval
Local LLM                Ollama (TinyLlama / Mistral)
Evaluation Judge         OpenAI GPT-4o
Data Format              JSON
CI/CD                    GitHub Actions
Dependency Management    pip + requirements.txt

Project Structure

TradeGuardAI/
├── data/
│   └── synthetic_trades.json     ← Trade test dataset
├── tests/
│   ├── test_trade_evaluation.py  ← DeepEval test cases
│   └── validate_trades.py        ← Rule-based assertions
├── utils/
│   ├── read_data.py              ← Trade data loader
│   └── llm_client.py             ← LLM API client
├── evaluators/
│   └── ollama_evaluator.py       ← Custom DeepEval evaluator
├── prompts/                      ← LLM system prompts
├── reports/                      ← Test execution reports
├── conftest.py                   ← Pytest fixtures and setup
├── pytest.ini                    ← Pytest configuration
├── requirements.txt              ← Project dependencies
├── .env                          ← Local environment variables (not committed)
└── .github/
    └── workflows/
        └── ci.yml                ← GitHub Actions CI pipeline
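The repository's `utils/read_data.py` is not reproduced here. The following is a hypothetical loader showing one way to read and sanity-check `data/synthetic_trades.json`, assuming the file holds a JSON array of objects that each carry a `trade_id` and an `expected_flag` key (both field names are assumptions).

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

# Hypothetical required fields; adjust to the real dataset schema.
REQUIRED_KEYS = {"trade_id", "expected_flag"}


def load_trades(path: str | Path) -> list[dict]:
    """Load a JSON array of trade records and check each has the required keys."""
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a JSON array of trade records")
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            raise ValueError(f"{path}: record {i} is not a JSON object")
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"{path}: record {i} is missing {sorted(missing)}")
    return data


# Demo against a throwaway file (illustrative records, not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "synthetic_trades.json"
    sample.write_text(
        json.dumps([{"trade_id": "T1", "expected_flag": "YES"}]), encoding="utf-8"
    )
    trades = load_trades(sample)
print(len(trades), trades[0]["trade_id"])
```

In `conftest.py`, a loader like this could back a `@pytest.fixture(scope="session")` so the dataset is parsed and validated once per test run rather than in every test.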

Getting Started

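Prerequisites

  • Python 3.11+
  • Git
  • Ollama installed and running locally (serves the target model at http://localhost:11434)
  • An OpenAI API key for the GPT-4o evaluation judge
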
Installation

# Clone the repository
git clone https://github.com/subhlabh610/TradeGuardAI.git
cd TradeGuardAI

# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate        # Windows
source venv/bin/activate     # Mac/Linux

# Install dependencies
pip install -r requirements.txt

Configuration

Create a .env file in the project root:

OPENAI_API_KEY=your_openai_key_here
BASE_URL=http://localhost:11434
MODEL_NAME=tinyllama
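Python does not read `.env` files on its own. Loading the file into the process environment is assumed to happen elsewhere (for example with python-dotenv's `load_dotenv()`, if that package is among the dependencies). The sketch below uses a hypothetical `load_settings` helper to read the three variables, falling back to the local Ollama values above.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]  # needed only by the GPT-4o judge
    base_url: str                  # Ollama server address
    model_name: str                # target model served by Ollama


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables (hypothetical helper).
    Defaults mirror the example .env values for a local Ollama setup."""
    env = os.environ if env is None else env
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY"),
        base_url=env.get("BASE_URL", "http://localhost:11434").rstrip("/"),
        model_name=env.get("MODEL_NAME", "tinyllama"),
    )


settings = load_settings({"MODEL_NAME": "mistral"})
print(settings.base_url, settings.model_name)
```

Accepting an explicit mapping keeps the helper easy to test without touching the real environment.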

Pull Ollama Model

ollama pull tinyllama
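Once the model is pulled, Ollama serves it over a local HTTP API at `BASE_URL`. Below is a minimal standard-library client sketch against Ollama's `/api/generate` endpoint with streaming disabled, which returns a single JSON object whose `response` field holds the generated text. The repository's `utils/llm_client.py` may be implemented differently, and `build_generate_request` and `generate` are hypothetical names.

```python
from __future__ import annotations

import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Prepare a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running Ollama server)."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["response"]
```

With Ollama running, `generate("http://localhost:11434", "tinyllama", prompt)` returns the model's text. Splitting request construction from the network call lets the request shape be unit-tested offline.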

Run Tests

# Run full test suite
python -m pytest

# Run with verbose output
python -m pytest -v

# Run specific test file
python -m pytest tests/test_trade_evaluation.py -v

CI/CD Pipeline

The project includes a GitHub Actions CI pipeline that:

  • Triggers on every push and pull request to main
  • Installs Python 3.11 and project dependencies
  • Installs Ollama and pulls TinyLlama model
  • Runs the full test suite
  • Reports pass/fail results

Pipeline configuration: .github/workflows/ci.yml


Framework Benefits

  • Automated validation of AI-powered surveillance systems
  • Support for realistic Capital Markets trade scenarios
  • Regulatory-focused testing strategy aligned with MAS and MiFID II
  • Reusable Pytest test suites with fixture-based architecture
  • Explainable evaluation results using DeepEval metrics
  • Scalable approach for AI model benchmarking
  • CI/CD integration for continuous quality assurance

Why This Matters

Banks and financial institutions must ensure that AI-driven surveillance systems can reliably identify market abuse patterns while supporting regulatory compliance, reducing operational risk, and protecting market integrity.

A single undetected wash trade or spoofing pattern can result in significant regulatory penalties, reputational damage, and market instability. This framework provides the quality assurance layer that bridges AI development and production deployment in regulated financial environments.


Author

Sulabh Gupta — Senior SDET | AI Quality Engineering | Capital Markets

