A from-scratch implementation of the Decision Tree algorithm for multi-class classification. The project demonstrates the core mechanics of the model without relying on high-level library abstractions for the tree itself.
This repository contains a pure Python implementation of a Decision Tree, evaluated on the UCI Covertype Dataset. The goal is to predict forest cover types based on 54 cartographic variables (elevation, slope, soil types, etc.).
- Mathematical Core: Implemented both Information Gain (Entropy) and Gini Impurity as splitting criteria.
- Data Pipeline: Custom Quartile Binning strategy to transform continuous environmental data into meaningful categorical buckets.
- Overfitting Control: A robust Post-Pruning algorithm that uses a validation set to trim the tree for better generalization.
- Vivid Analytics:
  - Structural visualization of the tree using networkx.
  - Comprehensive performance metrics including Confusion Matrices and Per-Class accuracy.
  - Feature Importance analysis based on split frequency.
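
To make the splitting criteria and the binning step concrete, here is a minimal sketch using only the standard library. The function names (`entropy`, `gini`, `quartile_bin`) are illustrative assumptions and are not taken from `final.py`; the repository's own code may compute quartiles differently.

```python
from bisect import bisect_right
from collections import Counter
from math import log2
from statistics import quantiles


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def quartile_bin(values):
    """Map each continuous value to a bucket 0..3 using the column's quartiles.

    Hypothetical helper: the cut points come from statistics.quantiles,
    which is one of several reasonable quartile definitions.
    """
    cuts = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    # bisect_right counts how many cut points are <= v, i.e. the bucket index
    return [bisect_right(cuts, v) for v in values]
```

A perfectly balanced two-class node has entropy 1.0 and Gini impurity 0.5, while a pure node scores 0 on both, which is why either measure can drive the search for a split.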
Rather than calling a ready-made library model, this implementation builds the tree itself, recursively:
- Splitting: Exhaustive search for the feature/value pair that gives the best split under the chosen criterion (highest Information Gain, or largest reduction in Gini Impurity).
- Pruning: Bottom-up traversal to compare accuracy before and after removing nodes, ensuring the simplest effective model.
- Visualization: Dynamic mapping of the tree nodes into a directed graph for interpretability.
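
The splitting and pruning steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in `final.py`: the tree is a nested dict, each split is a binary "feature equals value" test on already-binned data, and a subtree is collapsed whenever validation accuracy does not drop. All function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels):
    """Try every (feature, value) pair; return the one with the highest information gain."""
    base, n = entropy(labels), len(labels)
    best = (None, None, 0.0)  # (feature index, value, gain)
    for f in range(len(rows[0])):
        for v in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] == v]
            right = [y for r, y in zip(rows, labels) if r[f] != v]
            if not left or not right:
                continue  # a split must send samples to both sides
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if gain > best[2]:
                best = (f, v, gain)
    return best


def build(rows, labels):
    """Recursively grow the tree; every node stores its majority class."""
    node = {"label": Counter(labels).most_common(1)[0][0]}
    f, v, _ = best_split(rows, labels)
    if f is None:
        return node  # no split improves purity, so this is a leaf
    node["feature"], node["value"] = f, v
    yes = [(r, y) for r, y in zip(rows, labels) if r[f] == v]
    no = [(r, y) for r, y in zip(rows, labels) if r[f] != v]
    node["yes"] = build(*zip(*yes))
    node["no"] = build(*zip(*no))
    return node


def predict(node, row):
    while "feature" in node:
        node = node["yes"] if row[node["feature"]] == node["value"] else node["no"]
    return node["label"]


def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)


def prune(node, tree, val_rows, val_labels):
    """Bottom-up pruning: turn a subtree into a leaf unless validation accuracy drops."""
    if "feature" not in node:
        return
    prune(node["yes"], tree, val_rows, val_labels)
    prune(node["no"], tree, val_rows, val_labels)
    before = accuracy(tree, val_rows, val_labels)
    saved = {k: node.pop(k) for k in ("feature", "value", "yes", "no")}
    if accuracy(tree, val_rows, val_labels) < before:
        node.update(saved)  # pruning hurt accuracy, so restore the split
```

Calling `prune(tree, tree, val_rows, val_labels)` starts the traversal at the root. Collapsing on ties (equal accuracy before and after) is a design choice made here to favor the smaller tree; a stricter rule would prune only when accuracy improves.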
- Python 3.x
- Pandas, Numpy, Matplotlib, Seaborn, Scikit-learn, NetworkX
- Clone the repository:
git clone https://github.com/NegarR6/Covertype-DecisionTree.git
- Install dependencies:
pip install -r requirements.txt
- Run the analysis:
python final.py
This project is licensed under the MIT License - see the LICENSE file for details.
Developed by Negar Rezaei | GitHub @NegarR6