LogSentry — SIEM Log Analyzer & Threat Detection Engine

A multi-source log analysis and threat detection tool built in Python, mirroring the internal architecture of enterprise SIEM platforms such as Splunk and IBM QRadar.

Overview

LogSentry is a Python-based security log analyzer that processes raw log files from multiple sources, applies threat detection rules, correlates alerts across sources, enriches findings with real-world threat intelligence, and produces a timestamped HTML incident report.

It supports three log formats out of the box:

SSH — /var/log/auth.log and /var/log/secure
Apache/nginx — Combined Log Format (access.log)
Windows Security — Binary .evtx event log files

The tool is designed around the same fundamental pipeline that powers enterprise SIEM platforms — ingest, normalise, detect, correlate, report — rebuilt from scratch in Python to demonstrate deep understanding of the internals.

Why LogSentry?

Most security students use pre-built SIEM tools. LogSentry was built from scratch to prove understanding of what those tools actually do internally:

What commercial SIEMs do	What LogSentry does
Universal forwarders / log collectors	Custom parsers per log format
Common Information Model (CIM)	Uniform event dictionary schema
SPL/AQL correlation rules	Python detector modules
Notable Events / Offenses	`correlator.py` timeline engine
Threat intelligence feeds (TAXII)	AbuseIPDB API integration
MITRE ATT&CK app for Splunk	`mitre.py` with STIX 2.0 dataset
Dashboards, PDF exports	Jinja2 HTML report

Architecture

LogSentry is built around a strict four-layer pipeline with complete separation of concerns:

Raw Log Files
      │
      ▼
┌─────────────┐
│   PARSERS   │  ssh_parser.py / apache_parser.py / windows_parser.py
└─────────────┘
      │  List of normalised event dicts
      ▼
┌─────────────┐
│  DETECTORS  │  brute_force.py / off_hours.py / multi_target.py / web_attacks.py
└─────────────┘
      │  List of alert dicts
      ▼
┌─────────────┐
│ CORRELATOR  │  correlator.py
└─────────────┘
      │  Grouped attack timelines
      ▼
┌─────────────┐
│  REPORTER   │  report_generator.py + report.html (Jinja2)
└─────────────┘
      │
      ▼
Timestamped HTML Report

The core design rule: A detector never reads a file. A parser never makes a judgment. A reporter never analyses data. Each module has exactly one job.

Project Structure

logsentry/
├── main.py                    # Orchestration entry point
├── correlator.py              # Cross-source timeline builder
├── mitre.py                   # MITRE ATT&CK technique enrichment
├── enrichment.py              # AbuseIPDB API integration
├── report_generator.py        # Jinja2 HTML report builder
├── requirements.txt
├── .env                       # API keys (never commit this)
│
├── parsers/
│   ├── __init__.py
│   ├── ssh_parser.py          # auth.log / secure parser
│   ├── apache_parser.py       # access.log parser
│   └── windows_parser.py      # .evtx binary parser
│
├── detectors/
│   ├── __init__.py
│   ├── brute_force.py         # Threshold + CRITICAL escalation
│   ├── off_hours.py           # Temporal anomaly detection
│   ├── multi_target.py        # Credential stuffing detection
│   └── web_attacks.py         # SQL/traversal/command/scan detection
│
├── templates/
│   └── report.html            # Jinja2 HTML template
│
├── sample_logs/
│   ├── auth.log               # 4-scenario SSH test data
│   └── access.log             # 3-scenario Apache test data
│
└── output/                    # Generated reports saved here

Features

Multi-format parsing — SSH, Apache CLF, Windows EVTX binary
Brute force detection with CRITICAL escalation on confirmed account compromise
Off-hours & weekend access detection for insider threat modelling
Credential stuffing detection via multi-target username enumeration
Web attack detection — SQL injection, directory traversal, command injection, vulnerability scanning
Cross-source correlation — links alerts from different log types to the same attacker IP
MITRE ATT&CK mapping — every alert mapped to a real technique ID (STIX 2.0)
AbuseIPDB enrichment — live threat intelligence context for every attacker IP
HTML report generation — professional timestamped incident reports via Jinja2

Parsers

All parsers output a uniform event dictionary so that every detector can consume events from any log source without modification:

{
    "timestamp":  "Mar 29 02:14:33",   # raw string from log
    "ip":         "192.168.1.105",     # always normalised string
    "username":   "root",              # N/A when not available
    "event_type": "failed",            # "failed" | "success" | "invalid_user"
    "source":     "ssh",               # "ssh" | "apache" | "windows"

    # Apache-only
    "method":     "GET",
    "url":        "/index.php",
    "status":     200,                 # int — needed for numeric comparison
    "size":       "1234",              # str — display only

    # Windows-only
    "event_id":   4625,                # int
}

SSH Parser

Processes /var/log/auth.log and /var/log/secure using three compiled regex patterns:

FAILED_PATTERN      = r'(\w+\s+\d+\s+\d+:\d+:\d+).*Failed password for (\S+) from (\S+) port'
SUCCESS_PATTERN     = r'(\w+\s+\d+\s+\d+:\d+:\d+).*Accepted password for (\S+) from (\S+) port'
INVALID_USER_PATTERN = r'(\w+\s+\d+\s+\d+:\d+:\d+).*Invalid user (\S+) from (\S+) port'

Event type	Meaning	Attack implication
`failed`	Username exists, password wrong	Password spraying / brute force
`invalid_user`	Username does not exist	Account enumeration / credential stuffing
`success`	Successful authentication	Legitimate or post-compromise access

Files are opened with encoding='utf-8', errors='ignore' to handle non-UTF-8 bytes that occasionally appear in production log files from special usernames or log aggregators.
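As a rough illustration of how the three patterns could feed the uniform event dictionary, here is a minimal sketch. The function names `parse_ssh_line` and `parse_ssh_file` are hypothetical and not taken from `ssh_parser.py`; only the patterns, the field names, and the `open()` arguments come from the text above.

```python
import re

# Patterns copied from the section above, compiled once at import time.
FAILED_PATTERN = re.compile(
    r'(\w+\s+\d+\s+\d+:\d+:\d+).*Failed password for (\S+) from (\S+) port')
SUCCESS_PATTERN = re.compile(
    r'(\w+\s+\d+\s+\d+:\d+:\d+).*Accepted password for (\S+) from (\S+) port')
INVALID_USER_PATTERN = re.compile(
    r'(\w+\s+\d+\s+\d+:\d+:\d+).*Invalid user (\S+) from (\S+) port')

PATTERNS = [
    ("failed", FAILED_PATTERN),
    ("success", SUCCESS_PATTERN),
    ("invalid_user", INVALID_USER_PATTERN),
]


def parse_ssh_line(line):
    """Return a normalised event dict, or None if no pattern matches (hypothetical helper)."""
    for event_type, pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            timestamp, username, ip = match.groups()
            return {
                "timestamp": timestamp,
                "ip": ip,
                "username": username,
                "event_type": event_type,
                "source": "ssh",
            }
    return None


def parse_ssh_file(path):
    """Parse a whole auth.log, skipping lines that are not authentication events."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return [event for event in map(parse_ssh_line, handle) if event]
```

Lines that match none of the patterns (cron sessions, sudo messages and so on) are simply dropped, so detectors only ever see authentication events.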

Apache Parser

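Processes Combined Log Format (abbreviated CLF in this README, not to be confused with the older Common Log Format), the format commonly used for Apache and nginx access logs. The sample line shows the leading fields the parser captures; Combined Log Format appends the referer and user agent after the response size: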

203.0.113.47 - - [29/Mar/2026:00:01:15 +0000] "GET /wp-admin HTTP/1.1" 404 512

The regex captures: client IP, timestamp, HTTP method, URL path, status code, response size. The status code is immediately converted to int at parse time for numeric comparison in detectors. All URLs are lowercased at parse time to eliminate case-based evasion techniques (SeLeCt → select).
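The capture-and-normalise step might look like the sketch below. The regex is an assumption written for this example (the actual pattern in `apache_parser.py` is not shown in this README), and the `"request"` event type is a hypothetical placeholder, since the README only defines event types for authentication logs.

```python
import re

# Assumed pattern: matches the leading fields and ignores any trailing
# referer / user-agent fields.
ACCESS_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_access_line(line):
    """Return a normalised event dict for one access-log line, or None."""
    match = ACCESS_PATTERN.match(line)
    if match is None:
        return None
    return {
        "timestamp": match["timestamp"],
        "ip": match["ip"],
        "username": "N/A",
        "event_type": "request",          # hypothetical placeholder value
        "source": "apache",
        "method": match["method"],
        "url": match["url"].lower(),      # defeats case-based evasion
        "status": int(match["status"]),   # int for numeric comparison
        "size": match["size"],            # kept as str, display only
    }
```

Called on the sample line above, this yields `status` as the integer 404 and `url` as `/wp-admin`.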

Windows Event Log Parser

Windows .evtx files are binary. The parser uses python-evtx to decode each record into an XML string, then navigates the DOM using xml.etree.ElementTree.

Every tag requires the Microsoft event schema namespace prefix:

{http://schemas.microsoft.com/win/2004/08/events/event}

Event ID	Event Name	Has IP?
4625	Failed Logon	Yes
4624	Successful Logon	Yes
4740	Account Lockout	No (records the caller computer name rather than an IP)
4720	User Account Created	No (local event)
4672	Special Privileges Assigned	No (local event)
4732	Member Added to Security Group	No (local event)
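The namespace handling can be sketched with the standard library alone. The XML below is a hand-written, heavily trimmed record standing in for what python-evtx would return. The `EVENT_TYPES` mapping and the function name are assumptions for illustration; `TargetUserName` and `IpAddress` are the data field names Windows uses in logon events.

```python
import xml.etree.ElementTree as ET

# Prefix map so every path step can carry the Microsoft event namespace.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hand-written sample, trimmed to the fields read below.
SAMPLE_XML = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID>4625</EventID>
    <TimeCreated SystemTime="2026-03-29T02:14:33.000000Z"/>
  </System>
  <EventData>
    <Data Name="TargetUserName">administrator</Data>
    <Data Name="IpAddress">203.0.113.47</Data>
  </EventData>
</Event>"""

EVENT_TYPES = {4625: "failed", 4624: "success"}  # assumed mapping


def parse_event_xml(xml_string):
    """Convert one decoded event record into the uniform event dict."""
    root = ET.fromstring(xml_string)
    event_id = int(root.findtext("e:System/e:EventID", namespaces=NS))
    created = root.find("e:System/e:TimeCreated", NS)
    data = {
        item.get("Name"): (item.text or "")
        for item in root.iterfind("e:EventData/e:Data", NS)
    }
    return {
        "timestamp": created.get("SystemTime"),
        "ip": data.get("IpAddress", "N/A"),
        "username": data.get("TargetUserName", "N/A"),
        "event_type": EVENT_TYPES.get(event_id, "other"),
        "source": "windows",
        "event_id": event_id,
    }
```

Without the `e:` prefix on each path step, `find` returns `None`, because the unqualified tag name `System` does not exist in a namespaced document.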

Detectors

Brute Force Detector

File: detectors/brute_force.py

Detects repeated failed login attempts from a single IP, and escalates to CRITICAL if the same IP later succeeds.

Algorithm:

Separate failed_events and success_events from input
failure_counts = Counter(e["ip"] for e in failed_events)
For each IP where count >= BRUTE_FORCE_THRESHOLD (5):
- Collect targeted usernames, first/last timestamps → HIGH alert
If same IP appears in success_events → CRITICAL alert (account compromised)
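The steps above can be sketched roughly as follows. The alert dictionary keys and the `account_compromise` type label are assumptions, not the actual output format of `brute_force.py`. The sketch follows the listed algorithm literally: it checks whether the IP appears among the successes, not whether the success came after the failures.

```python
from collections import Counter

BRUTE_FORCE_THRESHOLD = 5


def detect_brute_force(events):
    """Return HIGH alerts for noisy IPs, plus CRITICAL if the IP also succeeded."""
    failed_events = [e for e in events if e["event_type"] == "failed"]
    success_ips = {e["ip"] for e in events if e["event_type"] == "success"}
    failure_counts = Counter(e["ip"] for e in failed_events)

    alerts = []
    for ip, count in failure_counts.items():
        if count < BRUTE_FORCE_THRESHOLD:
            continue
        attempts = [e for e in failed_events if e["ip"] == ip]
        alerts.append({
            "type": "brute_force",
            "severity": "HIGH",
            "ip": ip,
            "count": count,
            "usernames": sorted({e["username"] for e in attempts}),
            "first_seen": attempts[0]["timestamp"],   # events assumed in log order
            "last_seen": attempts[-1]["timestamp"],
        })
        if ip in success_ips:
            alerts.append({
                "type": "account_compromise",
                "severity": "CRITICAL",
                "ip": ip,
                "count": count,
            })
    return alerts
```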

Why threshold = 5?

Behaviour	Typical count	Result
Legitimate user forgot password	1–3	No alert
Legitimate user, caps lock	2–4	No alert
Manual attacker testing	5–20	Alert fires
Automated tool (hydra, medusa)	100–10,000	Alert fires

5 is a common middle ground: it matches Fail2Ban's default maxretry value and is typical of commercial SIEM rulesets.

Off-Hours Detector

File: detectors/off_hours.py

Analyses successful logins only (failed attempts at odd hours are already caught by the brute force detector — no double alerting). Detects access outside business hours and on weekends.

WORK_START   = 8     # 08:00
WORK_END     = 18    # 18:00
WEEKEND_DAYS = [5, 6] # Saturday, Sunday

Condition	Severity	Rationale
Weekend login	CRITICAL	Almost no legitimate reason; highest insider-threat signal
Weekday off-hours login	HIGH	Suspicious — requires investigation
Weekday in-hours login	No alert	Normal

Why this matters: Insider threats use legitimate credentials. A disgruntled employee accessing the database server at 2 AM on a Sunday is as dangerous as an external attacker. The tool flags the anomaly — a human analyst with organisational context makes the final call.

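Timestamp year injection: Syslog timestamps in the RFC 3164 format do not include the year, so the current year is injected before parsing (a log written in December and analysed the following January is therefore dated one year too late):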

full_ts = f"{datetime.now().year} {ts_str}"
datetime.strptime(full_ts, "%Y %b %d %H:%M:%S")
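Putting the constants and the year injection together, the severity decision could be sketched as below. The function name, the optional `year` parameter (added here so the result does not depend on when the code runs), and the choice to treat 18:00 itself as off-hours are all assumptions.

```python
from datetime import datetime

WORK_START = 8          # 08:00
WORK_END = 18           # 18:00
WEEKEND_DAYS = [5, 6]   # Saturday, Sunday (datetime.weekday(): Monday is 0)


def off_hours_severity(ts_str, year=None):
    """Return "CRITICAL", "HIGH", or None for a successful login timestamp."""
    if year is None:
        year = datetime.now().year
    moment = datetime.strptime(f"{year} {ts_str}", "%Y %b %d %H:%M:%S")
    if moment.weekday() in WEEKEND_DAYS:
        return "CRITICAL"
    if not (WORK_START <= moment.hour < WORK_END):
        return "HIGH"
    return None
```

For example, `off_hours_severity("Mar 29 02:14:33", year=2026)` falls on a Sunday and is rated CRITICAL, while the same time one day later (a Monday) is rated HIGH.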

Multi-Target Detector

File: detectors/multi_target.py

Detects credential stuffing and account enumeration — one IP probing many different usernames.

ip_to_usernames = defaultdict(set)  # set deduplicates automatically

Uses defaultdict(set) instead of defaultdict(list) to ensure each username is counted once per IP regardless of how many times it was attempted. An IP with 100 attempts at root counts as 1 unique target, not 100.

Combines both failed and invalid_user events — a real credential stuffing attack generates both (some usernames exist on the system, some don't). Looking at only one type gives an incomplete picture.

Threshold: 3 unique usernames from one IP = credential stuffing alert (HIGH).
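A minimal sketch of that logic, assuming the uniform event dictionary from the Parsers section. The alert keys and the `credential_stuffing` label are hypothetical.

```python
from collections import defaultdict

MULTI_TARGET_THRESHOLD = 3


def detect_multi_target(events):
    """Flag IPs that tried at least MULTI_TARGET_THRESHOLD distinct usernames."""
    ip_to_usernames = defaultdict(set)
    for event in events:
        # Both event types count: some probed usernames exist, some do not.
        if event["event_type"] in ("failed", "invalid_user"):
            ip_to_usernames[event["ip"]].add(event["username"])

    return [
        {
            "type": "credential_stuffing",
            "severity": "HIGH",
            "ip": ip,
            "usernames": sorted(usernames),
        }
        for ip, usernames in ip_to_usernames.items()
        if len(usernames) >= MULTI_TARGET_THRESHOLD
    ]
```

Because the values are sets, an IP that hammers `root` a hundred times still has a set of size 1 and stays below the threshold.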

Web Attacks Detector

File: detectors/web_attacks.py

Analyses Apache access log events across four attack categories:

SQL Injection — 13 keyword patterns including union select, or 1=1, drop table, exec(, cast(, admin'--. Uses break after first match to generate one alert per request (prevents triple-alerting on a single URL that matches multiple patterns).

Directory Traversal — 7 patterns including ../, ..\, /etc/passwd, /etc/shadow, boot.ini, /proc/self. A status 200 response on a traversal URL is flagged specially — it means the server actually returned the file.

Command Injection — 12 patterns including ; cat, ; whoami, | cat, && ls, `whoami`, $(whoami). Receives CRITICAL severity — successful command injection is full remote code execution (RCE).

Vulnerability Scanner Detection — counts 404 responses per IP across the entire log. Fires after the main loop (a two-pass algorithm) because the threshold decision requires knowing the total count, not a running total.

Attack Type	Severity	MITRE Technique
Command injection	CRITICAL	T1059
SQL injection	HIGH	T1190
Directory traversal	HIGH	T1083
Vulnerability scanning	MEDIUM	T1595
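The per-request `break` and the two-pass scanner check can be sketched together. The pattern lists here are small illustrative subsets, the alert keys and type labels are hypothetical, and the sketch assumes each event's `url` is already lowercased and decoded so that patterns containing spaces can match.

```python
from collections import Counter

# Illustrative subsets only; the pattern lists described above are longer.
SQLI_PATTERNS = ["union select", "or 1=1", "drop table"]
TRAVERSAL_PATTERNS = ["../", "..\\", "/etc/passwd"]
CMDI_PATTERNS = ["; cat", "| cat", "$(whoami)"]
SCAN_THRESHOLD = 10

# (alert type, severity, patterns); type names are hypothetical labels.
CATEGORIES = [
    ("command_injection", "CRITICAL", CMDI_PATTERNS),
    ("sql_injection", "HIGH", SQLI_PATTERNS),
    ("directory_traversal", "HIGH", TRAVERSAL_PATTERNS),
]


def detect_web_attacks(events):
    alerts = []
    not_found = Counter()

    # Pass 1: per-request pattern checks, plus a tally of 404s per IP.
    for event in events:
        if event["status"] == 404:
            not_found[event["ip"]] += 1
        for alert_type, severity, patterns in CATEGORIES:
            for pattern in patterns:
                if pattern in event["url"]:
                    alerts.append({"type": alert_type, "severity": severity,
                                   "ip": event["ip"], "url": event["url"],
                                   "status": event["status"]})
                    break  # one alert per category per request

    # Pass 2: the scan decision needs the final 404 totals.
    for ip, count in not_found.items():
        if count >= SCAN_THRESHOLD:
            alerts.append({"type": "vulnerability_scan", "severity": "MEDIUM",
                           "ip": ip, "count": count})
    return alerts
```

Keeping `status` on the alert lets the reporter highlight a traversal request that returned 200.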

Correlator

File: correlator.py

Individual alerts are noise. A correlated timeline is the attack story.

The correlator does not detect new threats. It links existing alerts from different detectors into a unified attack narrative grouped by source IP — mirroring the work of a Tier 2 SOC analyst.

Algorithm:

defaultdict(list) groups all alerts by IP address
Count unique alert types per IP using set()
Skip IPs with only 1 unique type — not a coordinated attack
For IPs with 2+ unique types: collect sources involved, build chronological timeline, inherit highest severity
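A compact sketch of these four steps is shown below. It assumes each alert carries `ip`, `type`, `severity`, `source`, and a `timestamp` that has already been converted to a sortable value such as a `datetime` (raw syslog strings do not sort chronologically). The key names and the severity ranking table are assumptions.

```python
from collections import defaultdict

SEVERITY_RANK = {"MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}


def correlate(alerts):
    """Group alerts by IP and keep only IPs with two or more distinct alert types."""
    alerts_by_ip = defaultdict(list)
    for alert in alerts:
        alerts_by_ip[alert["ip"]].append(alert)

    incidents = []
    for ip, group in alerts_by_ip.items():
        alert_types = {alert["type"] for alert in group}
        if len(alert_types) < 2:
            continue  # a single technique is not treated as a coordinated attack
        incidents.append({
            "ip": ip,
            "alert_types": sorted(alert_types),
            "sources": sorted({alert["source"] for alert in group}),
            "timeline": sorted(group, key=lambda alert: alert["timestamp"]),
            # The incident inherits the highest severity among its alerts.
            "severity": max((alert["severity"] for alert in group),
                            key=lambda level: SEVERITY_RANK.get(level, 0)),
        })
    return incidents
```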

Why 2 unique types as the threshold?

1 type → could be an automated scanner with no human behind it
2 types → attacker pivoted from one technique to another, which is by definition a multi-stage attack
3+ types → full kill chain — the highest-value intelligence in the report

Example full kill chain (single IP):

00:01  Apache  → Vulnerability scan (18 × 404)
00:02  Apache  → SQL injection (UNION SELECT)
00:03  Apache  → Directory traversal (../../../etc/passwd → 200)
00:03  SSH     → Brute force (45 failed attempts on root)
00:03  SSH     → Root account COMPROMISED
02:17  SSH     → Off-hours login as dbadmin (lateral movement)

Without correlation: 6 separate low-context alerts. With correlation: one CRITICAL incident with a 6-event timeline.

External Integrations

MITRE ATT&CK

File: mitre.py

Maps every LogSentry alert to a real MITRE ATT&CK technique ID using the official mitreattack-python library and the enterprise-attack.json STIX 2.0 dataset (~75MB, downloaded once at setup).

from mitreattack.stix20 import MitreAttackData
mitre_data = MitreAttackData("enterprise-attack.json")

technique = mitre_data.get_object_by_attack_id(
    technique_id,
    "attack-pattern"   # MUST be "attack-pattern", NOT "technique"
)

Detector	Event Type	MITRE ID	Technique Name
brute_force	SSH brute force	T1110.001	Password Guessing
brute_force	Account compromise	T1078	Valid Accounts
off_hours	Off-hours/weekend login	T1078.003	Local Accounts
multi_target	Credential stuffing	T1110.004	Credential Stuffing
web_attacks	SQL injection	T1190	Exploit Public-Facing App
web_attacks	Directory traversal	T1083	File and Directory Discovery
web_attacks	Command injection	T1059	Command and Scripting Interpreter
web_attacks	Vulnerability scan	T1595	Active Scanning

The integration includes graceful fallback — if the MITRE library fails for any reason, the main program never crashes. It returns a minimal dict with the technique ID and URL only.
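The fallback behaviour might be structured like this sketch. To keep the example self-contained, the library call is passed in as a `lookup` callable (for instance a lambda wrapping `mitre_data.get_object_by_attack_id(technique_id, "attack-pattern")`). The function names and returned keys are assumptions rather than the actual `mitre.py` interface.

```python
def technique_url(technique_id):
    """Build the attack.mitre.org URL; sub-technique T1110.001 maps to /T1110/001/."""
    return "https://attack.mitre.org/techniques/" + technique_id.replace(".", "/") + "/"


def enrich_with_mitre(technique_id, lookup):
    """Return technique details, degrading to ID and URL only if the lookup fails.

    lookup: callable taking a technique ID and returning a mapping with a
    "name" key, or None. It may raise; any exception triggers the fallback.
    """
    fallback = {"id": technique_id, "url": technique_url(technique_id)}
    try:
        technique = lookup(technique_id)
        if technique is None:
            return fallback
        return {**fallback, "name": technique["name"]}
    except Exception:
        # Enrichment is optional: never let it take down the main pipeline.
        return fallback
```

The alert still links to the correct technique page even when the dataset file is missing or the library raises.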

AbuseIPDB

File: enrichment.py

Queries the AbuseIPDB API for every attacker IP found in the logs, adding real-world threat intelligence context.

Endpoint: https://api.abuseipdb.com/api/v2/check
Method: GET
Free tier: 1,000 checks/day
Lookback window: 90 days (maxAgeInDays: 90)

Response fields used:

abuseConfidenceScore (0–100): 0–24 = low, 25–74 = moderate, 75–100 = high
totalReports: how many times reported globally
lastReportedAt: timestamp of most recent report
countryCode: origin country
isp: internet service provider

Important: AbuseIPDB enriches existing alerts. It does not create new alerts. An IP with a score of 0 that appears in a brute force event is still a brute force alert.
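For illustration, the request and the score bands could be sketched with the standard library as follows. The project lists `requests` as a dependency, so `urllib` is a substitution made here to keep the example self-contained, and the function names and summary keys are hypothetical. The endpoint, query parameters, `Key` header, and response fields are those of the AbuseIPDB v2 check API.

```python
from urllib.parse import urlencode
from urllib.request import Request

CHECK_URL = "https://api.abuseipdb.com/api/v2/check"


def build_check_request(ip, api_key, max_age_days=90):
    """Build (but do not send) the GET request for one IP."""
    query = urlencode({"ipAddress": ip, "maxAgeInDays": max_age_days})
    return Request(
        f"{CHECK_URL}?{query}",
        headers={"Key": api_key, "Accept": "application/json"},
        method="GET",
    )


def confidence_label(score):
    """Map abuseConfidenceScore to the bands used in this README."""
    if score >= 75:
        return "high"
    if score >= 25:
        return "moderate"
    return "low"


def summarise(payload):
    """Pick the fields used in the report out of a decoded JSON response."""
    data = payload["data"]
    return {
        "score": data["abuseConfidenceScore"],
        "label": confidence_label(data["abuseConfidenceScore"]),
        "total_reports": data["totalReports"],
        "last_reported": data["lastReportedAt"],
        "country": data["countryCode"],
        "isp": data["isp"],
    }
```

The request object can be passed to `urllib.request.urlopen` and the response body decoded with `json.load` before calling `summarise`.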

Report Generation

File: report_generator.py + templates/report.html

Generates a professional timestamped HTML report using Jinja2:

from jinja2 import Environment, FileSystemLoader

env      = Environment(loader=FileSystemLoader("./templates"))
template = env.get_template("report.html")
html_output = template.render(**report_data)

Output filename format: logsentry_report_YYYYMMDD_HHMMSS.html

Each run produces a uniquely named file, so previous reports are never overwritten and history is preserved for compliance and incident post-mortems.
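The filename can be derived from the current time in a couple of lines. This is a sketch: the function name and the `now` parameter (added so the result is reproducible) are assumptions.

```python
from datetime import datetime
from pathlib import Path


def report_path(output_dir="output", now=None):
    """Return the path for a new report, e.g. output/logsentry_report_20260329_021433.html."""
    now = now or datetime.now()
    return Path(output_dir) / f"logsentry_report_{now:%Y%m%d_%H%M%S}.html"
```

Names built this way are unique to the second, so two runs started within the same second would share a filename.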

Report sections:

Executive summary (total alerts, severity breakdown)
Correlated attack timelines (grouped by IP, chronological)
SSH alerts table
Apache alerts table
Windows alerts table
Per-alert MITRE ATT&CK technique links
AbuseIPDB enrichment data per attacker IP

Sample Log Scenarios

The sample_logs/ directory contains carefully crafted test data covering 7 scenarios.

auth.log — 4 scenarios

IP	Scenario	Detectors triggered	Max severity
203.0.113.47	Full kill chain	Multi-target + Brute force + Off-hours	CRITICAL
198.51.100.22	Credential stuffing, no success	Multi-target + Brute force	HIGH
45.33.32.156	Simple brute force	Brute force	HIGH
192.168.x.x	Legitimate weekend logins	Off-hours (false positive demo)	CRITICAL

203.0.113.47 full timeline:

00:02  → Username enumeration (admin, ubuntu, deploy, git)   HIGH
00:02  → 45 failed root login attempts                        HIGH
00:03  → ROOT ACCOUNT COMPROMISED                             CRITICAL
02:17  → Returns as dbadmin (lateral movement, off-hours)     CRITICAL

access.log — 3 scenarios

IP	Scenario	Detectors triggered	Max severity
203.0.113.47	Full web attack chain	Scan + SQLi + Traversal + CMDi	CRITICAL
198.51.100.22	SQL + traversal combo	SQLi + Traversal	HIGH
45.33.32.156	Pure vulnerability scanner	Scanner detection	MEDIUM

Correlated: 203.0.113.47 appears in both logs → the correlator produces a single CRITICAL incident spanning: web recon → SQL injection → directory traversal (passwd file leaked) → SSH brute force → root compromise.

Installation

Requirements: Python 3.10+

# 1. Clone the repository
git clone https://github.com/Doumit04/LogSentry.git
cd LogSentry

# 2. Create and activate virtual environment
python -m venv .venv
.venv\Scripts\activate        # Windows
source .venv/bin/activate     # Linux / macOS

# 3. Install dependencies
pip install -r requirements.txt

# 4. Download MITRE ATT&CK dataset (one time, ~75MB)
# (fetched from MITRE's public cti repository, which hosts the STIX 2.0 bundle)
curl -L -o enterprise-attack.json https://raw.githubusercontent.com/mitre/cti/master/enterprise-attack/enterprise-attack.json

# 5. Configure API key
cp .env.example .env
# Edit .env and add your AbuseIPDB API key

Dependencies:

requests
python-dotenv
jinja2
colorama
python-evtx
mitreattack-python

Usage

# Analyse SSH logs
python main.py --log sample_logs/auth.log --type ssh

# Analyse Apache logs
python main.py --log sample_logs/access.log --type apache

# Analyse Windows Event Logs
python main.py --log sample_logs/Security.evtx --type windows

# Analyse all sources at once
python main.py --log sample_logs/ --type all

Reports are saved to the output/ directory with a timestamp in the filename.

Configuration

Edit .env to configure API keys:

ABUSEIPDB_API_KEY=your_api_key_here

Edit detection thresholds in the relevant detector files:

Constant	Default	File	Rationale
`BRUTE_FORCE_THRESHOLD`	5	`brute_force.py`	Matches Fail2Ban's default maxretry
`MULTI_TARGET_THRESHOLD`	3	`multi_target.py`	2 = shared workstation; 3 = enumeration
`SCAN_THRESHOLD`	10	`web_attacks.py`	OSSEC default
`WORK_START`	8	`off_hours.py`	Start of assumed business hours (08:00)
`WORK_END`	18	`off_hours.py`	End of assumed business hours (18:00)

Design Decisions

False positive philosophy: LogSentry is designed to surface everything suspicious and let a human decide. This mirrors real SIEM behaviour. Setting thresholds so high that only obvious attacks raise an alert leads to missed detections. In security, a false positive wastes an analyst's time, whereas a false negative lets an attacker through.

Parsing vs detection: The most important distinction in this codebase. A parser converts a raw log line into a normalised dictionary — it makes no judgments. A detector receives normalised events and asks one question: does this data match a specific attack pattern? This separation means adding a new log source requires writing one new parser with zero changes to any detector, and adding a new detection rule requires writing one new detector with zero changes to any parser.

Why correlation is the highest-value feature: A 404 scanning alert could be a broken link aggregator. A brute force alert could be a misconfigured CI/CD system. An off-hours login could be a developer in a different timezone. The same three events correlated to a single IP — in order, over 45 minutes — are unambiguous: a human attacker performed reconnaissance, validated credentials via brute force, and then used the compromised account during off-hours to avoid detection. The correlator transforms noise into a decision-ready incident report.

What This Project Demonstrates

Skill Area	Specific Knowledge
SIEM architecture	Ingest → Normalise → Detect → Correlate → Report pipeline
Log format parsing	Syslog, Combined Log Format, Windows EVTX binary
Regex engineering	Greedy vs lazy quantifiers, anchor strings, evasion-resistant normalisation
Python standard library	`Counter`, `defaultdict`, `re`, `datetime.strptime`, generator expressions
Threat detection logic	Threshold-based, behavioural anomaly, multi-source correlation
MITRE ATT&CK	STIX 2.0, technique taxonomy, kill chain phases, sub-technique URLs
Threat intelligence APIs	REST integration, authentication headers, confidence scores
Windows event logs	EVTX binary format, XML namespace handling, Event ID taxonomy
False positive management	Threshold tuning rationale, SOC analyst workflow, insider threat modelling
Software design	Single responsibility principle applied to security tooling

Documentation

A 41-page deep technical reference document is included in the docs/ directory, covering every algorithmic decision, regex pattern, threshold rationale, XML schema, API integration, and architectural trade-off in the project.

Topics covered: SSH Parser · Apache Parser · Windows EVTX Parser · Brute Force Detector · Off-Hours Detector · Multi-Target Detector · Web Attack Detector · Correlator · Reporter · MITRE ATT&CK Integration · AbuseIPDB Integration · Jinja2 Reporting

Built by Tony Doumit — March 2026
