Anomaly Detection In Network Traffic

This project uses two unsupervised learning techniques, Isolation Forest and autoencoders, to detect anomalies in network traffic data. Such anomalies can indicate cybersecurity threats, unauthorized access, or system malfunctions.

KDD Cup 1999 Dataset

A benchmark dataset for network intrusion detection or anomaly detection in network traffic data.

The full dataset contains millions of connection records, each labeled as normal or as an attack; this project uses the 10% subset (kddcup.data_10_percent_corrected).
Features include protocol, duration, service, source bytes, and content-based features.

Available at:

Project Objective

Detect unusual network activity patterns using unsupervised anomaly detection.
Implement and compare:
- Isolation Forest
- Autoencoder Neural Network
Score records using anomaly scores (Isolation Forest) and reconstruction error (Autoencoder), then evaluate both models against the labels.
Visualize and interpret results.

Tools & Technologies

Python 3.10+
Scikit-learn – for Isolation Forest and preprocessing
TensorFlow / Keras – for Autoencoder implementation
Pandas / NumPy – for data handling
Matplotlib / Seaborn – for visualization

Workflow

Data Preparation

Load kddcup.data_10_percent_corrected and assign column names.
Encode categorical features using Label Encoding.
Normalize numeric features using StandardScaler.
Map labels to binary: 0 = normal, 1 = anomaly.
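
The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code. The helper names `load_kdd` and `prepare_features`, the label column name `label`, and the choice to scale every column after encoding are assumptions made for the example. The categorical column names follow the KDD Cup 1999 schema.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Categorical columns in the KDD Cup 1999 schema.
CATEGORICAL_COLS = ["protocol_type", "service", "flag"]


def load_kdd(path_or_buffer, column_names):
    """Read the raw data file, which has no header row (hypothetical helper)."""
    return pd.read_csv(path_or_buffer, header=None, names=column_names)


def prepare_features(df, label_col="label", categorical_cols=CATEGORICAL_COLS):
    """Return (X, y): scaled features and binary labels (0 = normal, 1 = anomaly)."""
    # Raw KDD labels end with a period, e.g. "normal." or "smurf."
    y = (df[label_col].str.rstrip(".") != "normal").astype(int).to_numpy()

    features = df.drop(columns=[label_col]).copy()
    for col in categorical_cols:
        # Label Encoding: each category becomes an integer code.
        features[col] = LabelEncoder().fit_transform(features[col])

    # Assumption: all columns, including the encoded ones, are standardized.
    X = StandardScaler().fit_transform(features.astype(float))
    return X, y
```

In practice `column_names` would be the 41 feature names from kddcup.names followed by the label column.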

Isolation Forest Model

The Isolation Forest model is implemented in this notebook: https://github.com/paramveerkaur1/anomaly-detection-in-network-traffic/blob/main/anomaly-detection-using-isolation-forest.ipynb

Workflow:

Train an Isolation Forest model on the dataset.
Predict anomalies (scikit-learn returns -1 = anomaly, 1 = normal).
Map the predictions to the binary label encoding (1 = anomaly, 0 = normal) for evaluation.
Evaluate using F1-score, precision, recall, and ROC AUC.
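
As a rough sketch of this workflow, assuming `X` is the scaled feature matrix and `y` the binary labels from the preparation step: the function name `run_isolation_forest` and the values for `n_estimators`, `contamination`, and `random_state` are illustrative assumptions, not settings taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score


def run_isolation_forest(X, y, contamination=0.1, random_state=42):
    """Fit an Isolation Forest and score it against binary labels (hypothetical helper)."""
    model = IsolationForest(
        n_estimators=100, contamination=contamination, random_state=random_state
    )
    model.fit(X)

    # predict() returns -1 for anomalies and 1 for normal points;
    # remap to the project's encoding: 1 = anomaly, 0 = normal.
    y_pred = np.where(model.predict(X) == -1, 1, 0)

    # score_samples() is lower for more abnormal points, so negate it
    # to get a score that increases with anomalousness for ROC AUC.
    anomaly_score = -model.score_samples(X)

    metrics = {
        "precision": precision_score(y, y_pred, zero_division=0),
        "recall": recall_score(y, y_pred, zero_division=0),
        "f1": f1_score(y, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y, anomaly_score),
    }
    return metrics, y_pred
```

ROC AUC is computed from the continuous score rather than the hard predictions, so it does not depend on the `contamination` cutoff.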

Autoencoder Model

The Autoencoder model is implemented in this notebook: https://github.com/paramveerkaur1/anomaly-detection-in-network-traffic/blob/main/anomaly_detection_using_autoencoder.ipynb

Workflow:

Build a deep autoencoder neural network.
Train only on normal records to learn expected behavior.
Use reconstruction error to identify outliers.
Set the anomaly threshold from the reconstruction error on normal records (e.g., the 95th percentile).
Evaluate predictions using:
- Classification Report (Precision, Recall, F1-Score)
- Confusion Matrix
- ROC AUC Score
- Reconstruction Error Distribution plots
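
The autoencoder workflow might look roughly like the sketch below. The layer sizes, the helper names (`build_autoencoder`, `reconstruction_error`, `fit_threshold`, `evaluate`), and the use of mean squared error per record are assumptions for illustration; the notebook's actual architecture may differ.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score


def build_autoencoder(n_features, hidden=32, bottleneck=8):
    """Symmetric dense autoencoder; layer sizes are illustrative assumptions."""
    # Imported lazily so the NumPy helpers below work without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=(n_features,))
    x = layers.Dense(hidden, activation="relu")(inputs)
    x = layers.Dense(bottleneck, activation="relu")(x)
    x = layers.Dense(hidden, activation="relu")(x)
    outputs = layers.Dense(n_features, activation="linear")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model


def reconstruction_error(X, X_reconstructed):
    """Mean squared reconstruction error for each record."""
    return np.mean((np.asarray(X) - np.asarray(X_reconstructed)) ** 2, axis=1)


def fit_threshold(errors_normal, percentile=95.0):
    """Threshold taken as a percentile of the error on normal records."""
    return float(np.percentile(errors_normal, percentile))


def evaluate(y_true, errors, threshold):
    """Flag records above the threshold as anomalies (1) and score the result."""
    y_pred = (errors > threshold).astype(int)
    return {
        "report": classification_report(y_true, y_pred, digits=3, zero_division=0),
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
        "roc_auc": roc_auc_score(y_true, errors),
    }


# Usage sketch (epochs and batch size are placeholder values):
#   model = build_autoencoder(X.shape[1])
#   model.fit(X[y == 0], X[y == 0], epochs=20, batch_size=256, verbose=0)
#   errors = reconstruction_error(X, model.predict(X, verbose=0))
#   results = evaluate(y, errors, fit_threshold(errors[y == 0]))
```

Because the threshold is a percentile of the error on normal records, roughly 5% of normal traffic is flagged by construction at the 95th percentile; raising the percentile trades recall for fewer false alarms.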

Output

Isolation Forest Model:
- Confusion Matrix (figure)
- ROC Curve (figure)

Autoencoder Model:
- Confusion Matrix (figure)
- Autoencoder Model Summary (figure)

For questions or suggestions, contact: 14paramveer@gmail.com
