kspeiris/bot-detection_01

Coordinated Bot Detection Platform

Hero image for the Coordinated Bot Detection Platform

Python badge Flask badge Selenium badge scikit-learn badge Pandas badge

A web-based behavioural analytics project for detecting individual bots and coordinated bot groups using event logging, feature engineering, machine learning, and short-window similarity analysis.


📖 Overview

This project simulates realistic user journeys through a small web application, records browser interaction events, transforms those events into behavioural fingerprints, and scores sessions for bot risk.

It is designed to demonstrate a full detection workflow:

  • Ingestion: capture click, scroll, keydown, and mousemove events.
  • Session analytics: build behaviour features for each session.
  • Window analytics: split sessions into short windows for finer-grained scoring.
  • Rule-based detection: flag suspicious timing and repetition patterns.
  • ML detection: classify human vs bot behaviour with trained models.
  • Coordination detection: find bot sessions that move with unusually high similarity in a narrow time range.
  • Dashboarding: present risk scores, coordination groups, and supporting evidence in a visual admin UI.
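
The window analytics step can be illustrated with a short sketch. It assumes each event is a `(timestamp_seconds, event_type)` tuple and uses a hypothetical 5-second window length; the function name, event shape, and window size are illustrative and may not match the repository's implementation.

```python
from collections import defaultdict


def split_into_windows(events, window_seconds=5.0):
    """Group (timestamp, event_type) tuples into fixed-length windows.

    Window 0 starts at the earliest event timestamp. Returns a dict
    mapping window index -> list of events that fall in that window.
    """
    if not events:
        return {}
    ordered = sorted(events, key=lambda e: e[0])
    start = ordered[0][0]
    windows = defaultdict(list)
    for ts, event_type in ordered:
        # Integer index of the window this event belongs to.
        index = int((ts - start) // window_seconds)
        windows[index].append((ts, event_type))
    return dict(windows)


session = [
    (0.0, "click"),
    (1.2, "mousemove"),
    (4.9, "scroll"),
    (5.0, "keydown"),
    (12.3, "click"),
]
windows = split_into_windows(session)
```

Each window can then be passed through the same feature extraction used for full sessions, which is what allows scoring at a finer granularity.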

✨ Why This Project Stands Out

  • Attractive end-to-end demo with both a user-facing flow and an admin dashboard.
  • Supports multiple bot behaviours: fast, human_like, and coordinated.
  • Combines rules, machine learning, and graph-style grouping rather than relying on one signal.
  • Uses both full-session and fixed-window features, which makes coordination analysis more reliable.
  • Includes visual assets, training scripts, simulation scripts, and dashboard controls in one repo.

🚀 Key Features

| Area | What it does |
| --- | --- |
| Web UI | Multi-page Flask app for landing, login, search, browse, form, and dashboard views |
| Event Logging | Stores browser activity into data/events.csv with actor and bot labels |
| Feature Extraction | Creates features.csv and window_features.csv from raw event streams |
| Rule Engine | Flags high event rate, low timing variance, click-without-movement, and repetitive patterns |
| ML Models | Trains Logistic Regression and Random Forest classifiers for bot detection |
| Coordination Engine | Uses cosine similarity plus time-gap thresholds to identify suspicious bot pairs and groups |
| Risk Fusion | Combines rules, ML scores, and coordination evidence into a final risk score |
| Demo Automation | Runs Selenium bots directly from the dashboard or via standalone scripts |

๐Ÿ—‚๏ธ Project Structure

bot-detection_01/
|-- app.py
|-- bot_simulation/
|-- coordination_analysis/
|-- data/
|-- detection/
|-- feature_extraction/
|-- ingestion/
|-- models/
|-- processing/
|-- static/
|-- storage/
`-- templates/

๐Ÿ–ผ๏ธ Screenshots

These screenshots highlight the user-facing flow and the detection dashboard experience.

🧩 UI Gallery

Screenshot 1

Screenshot 2

Screenshot 3

Screenshot 4

๐Ÿ—๏ธ System Architecture

The platform follows a layered pipeline from event capture to fused risk scoring and visual review.

System architecture

🔗 Mermaid Diagram

flowchart LR
    A[Human Users] --> B[Flask Web App]
    A2[Selenium Bot Simulations] --> B

    B --> C[Event Logging API]
    C --> D[data/events.csv]
    C --> E[storage/raw_events/events.jsonl]

    D --> F[Feature Extraction]
    F --> G[data/features.csv]
    F --> H[data/window_features.csv]

    G --> I[Session Rule Engine]
    G --> J[Session ML Models]
    H --> K[Window ML Models]
    H --> L[Coordination Engine]

    I --> M[Risk Fusion]
    J --> M
    K --> M
    L --> M

    M --> N[Alert Payloads]
    N --> O[Admin Dashboard]

    O --> P[Session Risk Table]
    O --> Q[Risk Bands and Posture]
    O --> R[Coordinated Group Alerts]
    O --> S[Timeline and Explanations]

🔄 Architecture Flow

  1. Browser activity is captured from the user app pages and posted to the logging endpoint.
  2. Raw event data is stored in CSV form and optionally normalized through ingestion helpers.
  3. Feature engineering builds both session-level and fixed-window behavioural features.
  4. Session and window models estimate the probability of bot behaviour.
  5. Rule checks add deterministic reasons for suspicious timing or repetition.
  6. Coordination analysis compares bot windows and groups highly similar near-synchronous sessions.
  7. Risk fusion combines the signals into a final alert shown in the admin dashboard.
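
Steps 1 and 2 above (capture, then CSV storage) can be sketched with the standard library alone. The column names below are a hypothetical schema chosen to match the "actor and bot labels" description; the real layout of data/events.csv may differ, and the Flask route that would call this helper is omitted.

```python
import csv
import os

# Hypothetical column layout; the repository's actual schema may differ.
FIELDS = ["session_id", "timestamp", "event_type", "actor", "is_bot"]


def append_event(path, event):
    """Append one event dict to a CSV file, writing the header on first use."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        # Missing keys are stored as empty strings; unknown keys are dropped.
        writer.writerow({name: event.get(name, "") for name in FIELDS})
```

A logging endpoint would typically parse the posted JSON body and call a helper like `append_event("data/events.csv", payload)` once per event.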

๐Ÿ›ก๏ธ Detection Logic

๐Ÿ“ 1. Rule-Based Signals

The rule engine highlights sessions with signals such as:

  • Very high event rate
  • Very low timing variance
  • Many clicks without mouse movement
  • High interaction repetition
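
A minimal sketch of how the four signals above could be checked is shown here. The thresholds, flag names, and the `(timestamp, event_type)` event shape are all assumptions made for illustration; they are not taken from detection/rules.py.

```python
from collections import Counter
from statistics import pstdev


def rule_flags(events, max_rate=15.0, min_gap_std=0.02,
               max_blind_clicks=3, max_repetition=0.6):
    """Return the names of triggered rules for (timestamp, event_type) tuples.

    All thresholds are hypothetical defaults.
    """
    events = sorted(events, key=lambda e: e[0])
    if len(events) < 3:
        return []
    times = [ts for ts, _ in events]
    kinds = [kind for _, kind in events]
    gaps = [b - a for a, b in zip(times, times[1:])]
    duration = times[-1] - times[0]

    flags = []
    # Very high event rate (events per second).
    if duration > 0 and len(events) / duration > max_rate:
        flags.append("high_event_rate")
    # Very low variance in inter-event timing.
    if pstdev(gaps) < min_gap_std:
        flags.append("low_timing_variance")

    # Count clicks with no mousemove since the previous click.
    blind_clicks, moved = 0, False
    for kind in kinds:
        if kind == "mousemove":
            moved = True
        elif kind == "click":
            if not moved:
                blind_clicks += 1
            moved = False
    if blind_clicks > max_blind_clicks:
        flags.append("click_without_movement")

    # Share of the most frequent consecutive event pair.
    bigrams = Counter(zip(kinds, kinds[1:]))
    if bigrams.most_common(1)[0][1] / (len(kinds) - 1) > max_repetition:
        flags.append("repetitive_pattern")
    return flags


bot_like = [(i * 0.05, "click") for i in range(20)]
human_like = [(0.0, "mousemove"), (0.8, "click"), (2.1, "scroll"),
              (2.9, "mousemove"), (4.4, "click"), (6.0, "keydown")]
```

With these defaults, `bot_like` trips all four rules while `human_like` trips none. Returning named flags rather than a single boolean keeps the reasons available for the dashboard's explanations.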

🤖 2. Individual Session Models

The training pipeline uses engineered features such as:

  • Event counts and ratios
  • Mean, min, max, and standard deviation of inter-event timing
  • Session duration and event rate
  • Idle behaviour
  • Sequence entropy
  • Repetition score
  • Behaviour bigram frequencies
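
The feature families listed above can be computed from a raw event list roughly as follows. This is a sketch under stated assumptions: events are `(timestamp, event_type)` tuples, "sequence entropy" is taken to be the Shannon entropy of the event-type distribution, the repetition score is the share of the most common bigram, and the 2-second idle threshold is hypothetical. The definitions in processing/feature_engine.py may differ.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def session_features(events, idle_threshold=2.0):
    """Build a feature dict from (timestamp, event_type) tuples."""
    if not events:
        raise ValueError("session has no events")
    events = sorted(events, key=lambda e: e[0])
    times = [ts for ts, _ in events]
    kinds = [kind for _, kind in events]
    # A single-event session has no gaps; fall back to one zero gap.
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    duration = times[-1] - times[0]
    counts = Counter(kinds)
    total = len(kinds)

    # Shannon entropy (bits) of the event-type distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

    bigrams = Counter(zip(kinds, kinds[1:]))
    n_bigrams = max(total - 1, 1)

    return {
        "event_count": total,
        "click_ratio": counts["click"] / total,
        "gap_mean": mean(gaps),
        "gap_min": min(gaps),
        "gap_max": max(gaps),
        "gap_std": pstdev(gaps),
        "duration": duration,
        "event_rate": total / duration if duration > 0 else 0.0,
        # Idle behaviour: share of gaps longer than the idle threshold.
        "idle_ratio": sum(g > idle_threshold for g in gaps) / len(gaps),
        "sequence_entropy": entropy,
        "repetition_score": max(bigrams.values(), default=0) / n_bigrams,
        "bigram_freq": {f"{a}>{b}": c / n_bigrams for (a, b), c in bigrams.items()},
    }


features = session_features(
    [(0.0, "click"), (1.0, "click"), (2.0, "scroll"), (4.0, "click")]
)
```

A session made up of a single repeated event type yields zero entropy and a repetition score of 1.0, which is the kind of pattern the bot simulations are expected to produce.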

๐Ÿ•ธ๏ธ 3. Coordination Analysis

Coordinated bot detection focuses on:

  • Selecting bot-labelled windows as candidates
  • Building a cosine similarity matrix from behavioural features
  • Marking highly similar windows that start within a short time threshold
  • Clustering suspicious pairs into larger coordinated groups
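
The pairing and clustering steps above can be sketched without external libraries. The window dict keys (`session_id`, `start`, `features`), the 0.95 similarity threshold, and the 2-second start gap are assumptions for illustration, and union-find is used here as one simple way to merge pairs into groups; detection/coordination_engine.py may cluster differently.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def coordinated_groups(windows, min_similarity=0.95, max_start_gap=2.0):
    """Group sessions whose windows are highly similar and start close together.

    `windows` is a list of dicts with hypothetical keys:
    'session_id', 'start' (seconds), and 'features' (list of floats).
    """
    parent = {w["session_id"]: w["session_id"] for w in windows}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    linked = set()
    for a, b in combinations(windows, 2):
        if a["session_id"] == b["session_id"]:
            continue
        close_in_time = abs(a["start"] - b["start"]) <= max_start_gap
        if close_in_time and cosine_similarity(a["features"], b["features"]) >= min_similarity:
            parent[find(a["session_id"])] = find(b["session_id"])
            linked.update((a["session_id"], b["session_id"]))

    groups = {}
    for sid in linked:
        groups.setdefault(find(sid), set()).add(sid)
    return sorted(sorted(group) for group in groups.values())


candidate_windows = [
    {"session_id": "bot_a", "start": 100.0, "features": [10, 0.10, 5]},
    {"session_id": "bot_b", "start": 100.5, "features": [10, 0.12, 5]},
    {"session_id": "bot_c", "start": 101.0, "features": [11, 0.10, 5]},
    {"session_id": "bot_d", "start": 300.0, "features": [10, 0.10, 5]},  # similar but late
    {"session_id": "bot_e", "start": 100.2, "features": [1, 9.00, 0]},   # on time but dissimilar
]
groups = coordinated_groups(candidate_windows)
```

Here `bot_a`, `bot_b`, and `bot_c` end up in one group, while `bot_d` fails the time-gap check and `bot_e` fails the similarity check. In practice the feature vectors would usually be standardized first so that large-valued features do not dominate the cosine score.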

โš–๏ธ 4. Final Risk Fusion

Final risk is derived from a weighted blend of:

  • Rule score
  • Individual model score
  • Coordination score

This helps the system avoid depending on any single signal source.
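
A weighted blend of this kind can be written in a few lines. The weights (0.3, 0.5, 0.2) and the band cut-offs below are hypothetical values chosen for the example; the actual numbers used in detection/risk_fusion.py are not stated in this README.

```python
def fuse_risk(rule_score, model_score, coordination_score,
              weights=(0.3, 0.5, 0.2)):
    """Blend three scores in [0, 1] into one risk score in [0, 1].

    The default weights are illustrative assumptions.
    """
    scores = (rule_score, model_score, coordination_score)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("all scores must be in [0, 1]")
    # Normalize by the weight sum so the result stays in [0, 1].
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def risk_band(score, medium=0.4, high=0.7):
    """Map a fused score to a band; the cut-offs are hypothetical."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


fused = fuse_risk(rule_score=0.2, model_score=0.9, coordination_score=0.0)
band = risk_band(fused)
```

In this example a confident model score with weak rule evidence and no coordination evidence lands in the medium band, which shows how the blend tempers any single signal.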

👥 Included Bot Profiles

| Bot Type | Behaviour |
| --- | --- |
| fast | Moves quickly through the workflow with low hesitation and compressed timing |
| human_like | Uses more variation, slower typing, and less obviously robotic pacing |
| coordinated | Launches multiple similar sessions in parallel to simulate grouped automation |

๐Ÿ—ƒ๏ธ Data and Artifacts

The repository already includes generated artifacts and samples under:

🧰 Getting Started

✅ Prerequisites

  • Python 3.10+
  • Google Chrome installed
  • A Chrome WebDriver compatible with your installed Chrome version

📦 Install Dependencies

Create a virtual environment and install the libraries used in the codebase:

pip install flask pandas joblib matplotlib scikit-learn selenium

Optional:

pip install xgboost

โ–ถ๏ธ Run the Project

1. ๐ŸŒ Start the Flask application

python app.py

The app runs locally at:

http://127.0.0.1:5000

2. 🧪 Generate behaviour

Use the UI manually:

  • Open /
  • Move through Login, Search, Browse, and Form
  • Open /dashboard to inspect results

Or run the built-in bot simulations:

python bot_simulation/fast_bot.py
python bot_simulation/human_like_bot.py
python bot_simulation/coordinated_bots.py

3. 🧬 Extract features

python feature_extraction/extract_features.py

4. 🧠 Train models

python models/train_model.py
python models/train_window_model.py

5. ๐Ÿ” Run coordination analysis

python coordination_analysis/detect_coordination.py

📊 Dashboard Workflow

The admin dashboard at /dashboard is the easiest way to demonstrate the full system.

  • Run Fast Bot, Human-Like Bot, or Coordinated Bots
  • Refresh analytics from the UI
  • Review session rows, risk bands, posture state, and coordinated alert groups
  • Use Run Full Demo for an end-to-end showcase

🧩 Core Modules

| Module | Responsibility |
| --- | --- |
| app.py | Flask app, routes, dashboard payload generation, and demo orchestration |
| processing/feature_engine.py | Session and window feature extraction |
| detection/rules.py | Deterministic suspicious-behaviour rules |
| detection/individual_model.py | Dataset validation and training input preparation |
| detection/coordination_engine.py | Similarity scoring, suspicious pair detection, and clustering |
| detection/risk_fusion.py | Final fused risk calculation |
| models/train_model.py | Session-level model training and feature importance output |
| models/train_window_model.py | Window-level model training |
| bot_simulation/ | Selenium-based bot behaviour generators |

🎨 Visual Assets

The repo includes visual assets for demos and documentation:

  • static/images/landing-hero.png
  • static/images/human-journey.png
  • static/images/bot-vs-human-behaviour.png
  • static/images/coordinated-bot-cluster.png
  • static/images/security-operations-dashboard.png

🧱 Reliability Notes

This repository is reliable as a research demo and coursework-style detection platform, with a clear and reproducible flow from simulation to analytics. For production use, you would typically add:

  • A pinned requirements.txt
  • Structured logging and error monitoring
  • Persistent database storage instead of CSV-first storage
  • Authentication and access controls for dashboard actions
  • Model versioning and experiment tracking
  • More robust evaluation on larger real-world datasets

🔮 Future Improvements

  • Stream events through an API and queue instead of writing directly to CSV
  • Add live charts with websocket updates
  • Introduce device fingerprint and network-level signals
  • Persist alerts and case review notes in a database
  • Add unit tests and integration tests for feature extraction and scoring
  • Containerize the application for easier deployment

📄 License

This repository does not currently include a license file. Add one if you plan to share or publish the project externally.

About

Behavioral bot detection platform with ML, coordination analysis, Selenium simulation, and a live risk dashboard.
