This project uses two unsupervised learning techniques, Isolation Forest and Autoencoders, to detect anomalies in network traffic data. These anomalies could indicate potential cybersecurity threats, unauthorized access, or system malfunctions.
The project uses the KDD Cup 1999 dataset, a benchmark dataset for network intrusion detection and anomaly detection in network traffic.
- Contains millions of connection records labeled as normal or attack.
- Features include protocol, duration, service, source bytes, and content-based features.
Available at:
- https://www.kaggle.com/datasets/galaxyh/kdd-cup-1999-data
- http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
- Detect unusual network activity patterns using unsupervised anomaly detection.
- Implement and compare:
  - Isolation Forest
  - Autoencoder Neural Network
- Evaluate model performance using reconstruction error and anomaly scores.
- Visualize and interpret results.
- Python 3.10+
- Scikit-learn – for Isolation Forest and preprocessing
- TensorFlow / Keras – for Autoencoder implementation
- Pandas / NumPy – for data handling
- Matplotlib / Seaborn – for visualization
- Load `kddcup.data_10_percent_corrected` and assign column names.
- Encode categorical features using Label Encoding.
- Normalize numeric features using `StandardScaler`.
- Map labels to binary: `0` = normal, `1` = anomaly.
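The preprocessing steps can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the inline sample, the shortened `COLUMNS` list, and the `preprocess` helper are assumptions made for the example (the real file has 41 features plus the label, and its label values end with a period, e.g. `normal.`). The sketch also assumes scaling is applied to every column after encoding.

```python
import io

import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Tiny inline sample in the KDD Cup 1999 CSV layout, truncated to a few
# columns for illustration only.
SAMPLE = """0,tcp,http,SF,181,5450,normal.
0,tcp,http,SF,239,486,normal.
0,icmp,ecr_i,SF,1032,0,smurf.
0,tcp,private,S0,0,0,neptune.
"""

# Hypothetical, shortened column list for this sample.
COLUMNS = ["duration", "protocol_type", "service", "flag",
           "src_bytes", "dst_bytes", "label"]
CATEGORICAL = ["protocol_type", "service", "flag"]


def preprocess(df):
    """Return (X, y): scaled feature matrix and binary labels."""
    df = df.copy()
    # Map labels to binary: 0 = normal, 1 = anomaly (any attack type).
    y = (df.pop("label") != "normal.").astype(int).to_numpy()
    # Label-encode each categorical column independently.
    for col in CATEGORICAL:
        df[col] = LabelEncoder().fit_transform(df[col])
    # Standardize every column to zero mean and unit variance
    # (assumption: encoded categorical columns are scaled too).
    X = StandardScaler().fit_transform(df.astype(float))
    return X, y


df = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS)
X, y = preprocess(df)
print(X.shape, y.tolist())
```

To run this on the real data, replace the `io.StringIO(SAMPLE)` argument with the path to `kddcup.data_10_percent_corrected` and pass the full list of column names.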
Isolation Forest model for anomaly detection in network traffic is implemented here: https://github.com/paramveerkaur1/anomaly-detection-in-network-traffic/blob/main/anomaly-detection-using-isolation-forest.ipynb
Workflow:
- Train an Isolation Forest model on the dataset.
- Predict anomalies (`-1` = anomaly, `1` = normal) and map to encoded values.
- Convert predictions for evaluation.
- Evaluate using F1-score, precision, recall, and ROC AUC.
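As a rough sketch of this workflow, the example below fits scikit-learn's `IsolationForest`, maps its `-1`/`1` output to the `1`/`0` label encoding, and computes the listed metrics. The synthetic data and the hyperparameters (`n_estimators=100`, `contamination=0.05`) are assumptions for illustration, not the settings used in the notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import (f1_score, precision_score, recall_score,
                             roc_auc_score)

rng = np.random.default_rng(0)

# Synthetic stand-in for the preprocessed features: a dense "normal"
# cloud plus a small set of widely scattered outliers.
X_normal = rng.normal(0.0, 1.0, size=(950, 5))
X_attack = rng.uniform(-8.0, 8.0, size=(50, 5))
X = np.vstack([X_normal, X_attack])
y_true = np.r_[np.zeros(950, dtype=int), np.ones(50, dtype=int)]

# Illustrative hyperparameters (assumptions, not the notebook's values).
model = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
model.fit(X)

# predict() returns -1 for anomalies and 1 for normal points;
# convert to the 1 = anomaly, 0 = normal encoding used for the labels.
raw = model.predict(X)
y_pred = np.where(raw == -1, 1, 0)

# score_samples() is higher for normal points, so negate it to get a
# score where larger means more anomalous (needed for ROC AUC).
scores = -model.score_samples(X)

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
```

Note that ROC AUC is computed from the continuous anomaly scores rather than the hard predictions, since the hard predictions depend on the chosen `contamination` value.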
Autoencoder model for anomaly detection in network traffic is implemented here: https://github.com/paramveerkaur1/anomaly-detection-in-network-traffic/blob/main/anomaly_detection_using_autoencoder.ipynb
Workflow:
- Build a deep autoencoder neural network.
- Train only on normal records to learn expected behavior.
- Use reconstruction error to identify outliers.
- Determine threshold (e.g., 95th percentile of error).
- Evaluate predictions using:
  - Classification Report (Precision, Recall, F1-Score)
  - Confusion Matrix
  - ROC AUC Score
  - Reconstruction Error Distribution plots
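The reconstruction-error thresholding step in the autoencoder workflow can be illustrated with NumPy alone. This is a sketch under stated assumptions: the helper names (`reconstruction_error`, `fit_threshold`, `flag_anomalies`) are hypothetical, mean squared error is assumed as the error measure, and the "reconstructions" are synthetic stand-ins for what a trained autoencoder would return (for a Keras model, typically `model.predict(X)`).

```python
import numpy as np


def reconstruction_error(X, X_hat):
    """Per-record mean squared error between inputs and reconstructions."""
    return np.mean((X - X_hat) ** 2, axis=1)


def fit_threshold(errors_normal, percentile=95.0):
    """Threshold taken as a percentile of the errors on normal records."""
    return float(np.percentile(errors_normal, percentile))


def flag_anomalies(errors, threshold):
    """Return 1 where the error exceeds the threshold, else 0."""
    return (errors > threshold).astype(int)


rng = np.random.default_rng(0)
X_normal = rng.normal(size=(1000, 8))
X_attack = rng.normal(size=(100, 8))

# Synthetic stand-in for autoencoder output: normal records are
# reconstructed closely, attack records poorly.
X_normal_hat = X_normal + rng.normal(scale=0.1, size=X_normal.shape)
X_attack_hat = X_attack + rng.normal(scale=1.0, size=X_attack.shape)

err_normal = reconstruction_error(X_normal, X_normal_hat)
err_attack = reconstruction_error(X_attack, X_attack_hat)

# The threshold is fitted on normal records only, mirroring how the
# autoencoder itself is trained only on normal traffic.
threshold = fit_threshold(err_normal, percentile=95.0)
pred_normal = flag_anomalies(err_normal, threshold)
pred_attack = flag_anomalies(err_attack, threshold)

print(f"threshold={threshold:.4f}")
print(f"flagged normal={pred_normal.mean():.2%} attack={pred_attack.mean():.2%}")
```

Using the 95th percentile of the normal-record errors means roughly 5% of normal records are flagged by construction, so the percentile directly trades false positives against recall.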
Isolation Forest Model:
- Confusion Matrix
- ROC Curve
Autoencoder Model:
- Confusion Matrix
- Autoencoder Model Summary
For questions or suggestions, contact: 14paramveer@gmail.com