K-Nearest Neighbours (KNN) is a supervised ML model that finds patterns within labelled datasets for classification and regression. This ML model is simple and quick, so no indexes (as in my other repos) are needed.
- Given a data point, calculate the distance from that point to every other point in the dataset.
- Return the $K$ nearest points (smallest distances).
- $K$ is a hyper-parameter pre-defined by the user.
- For regression, take the average of the values of the $K$ nearest neighbours.
- For classification, take the majority-vote class of the $K$ nearest neighbours.
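The steps above can be sketched in a few lines of Python (the function and variable names here are illustrative, not the repo's actual API):

```python
import math
from collections import Counter

def knn_predict(train, query, k, mode="classification"):
    """train: list of (features, label) pairs; query: a feature tuple."""
    # Step 1-2: distance from the query to every training point, keep the K nearest
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))
    k_labels = [label for _, label in neighbours[:k]]
    if mode == "regression":
        return sum(k_labels) / k  # average of the K nearest values
    return Counter(k_labels).most_common(1)[0][0]  # majority-vote class

points = [((0, 0), 0), ((1, 0), 0), ((5, 5), 1), ((6, 5), 1)]
knn_predict(points, (0.5, 0.2), k=3)  # -> 0
```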
To calculate the distance from one data point to another, we can use the Euclidean distance equation:
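For two points $\mathbf{p} = (p_1, \dots, p_n)$ and $\mathbf{q} = (q_1, \dots, q_n)$ with $n$ features:

$$d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$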
From here, we can sort the Euclidean distances from nearest to furthest and take the $K$ smallest.
From there, we take the average of the $K$ nearest neighbours' values for regression, and the majority-vote class (using a simple iterative counter) for classification. This implementation uses label encoding for the class labels.
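As an illustration (the exact encoding in `main.py` may differ), label encoding simply maps each distinct class name to an integer, so the majority vote can be tallied over ints:

```python
def label_encode(labels):
    # Assign an integer to each distinct class, in order of first appearance
    mapping = {}
    encoded = []
    for label in labels:
        mapping.setdefault(label, len(mapping))
        encoded.append(mapping[label])
    return encoded, mapping

def majority_vote(encoded):
    # Simple iterative counter: tally each class, return the most frequent
    counts = {}
    for cls in encoded:
        counts[cls] = counts.get(cls, 0) + 1
    return max(counts, key=counts.get)

encoded, mapping = label_encode(["cat", "dog", "cat", "cat"])
majority_vote(encoded)  # -> 0, i.e. "cat"
```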
The main.py file is the entry point for running the K-Nearest Neighbour predictions. It initializes the dataset, label-encodes the features, runs the Euclidean distance calculation on the given points, and outputs the predicted class (0 or 1).
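A hypothetical end-to-end flow mirroring that pipeline might look like the sketch below; the dataset, the `classifier` signature, and the point format are assumptions here, so check `main.py` for the real interface:

```python
import math
from collections import Counter

# Hypothetical dataset: (height, weight) features with string class labels
dataset = [((170, 60), "A"), ((175, 65), "A"), ((150, 45), "B"), ((152, 48), "B")]

# 1. Label-encode the classes ("A" -> 0, "B" -> 1)
classes = sorted({label for _, label in dataset})
encoded = [(features, classes.index(label)) for features, label in dataset]

# 2. Classify a new point by majority vote among the K nearest neighbours
def classifier(point, data, k=3):
    nearest = sorted(data, key=lambda pair: math.dist(pair[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

classifier((172, 62), encoded)  # -> 0
```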
To run the predictions, follow these steps:
- Ensure you have Python 3 installed on your system.
- Clone this repository and navigate to the project directory.
- Run the following commands to install any necessary dependencies and execute the script:
```bash
# Clone the repository
git clone https://github.com/oskccy/knn-from-scratch.git
cd knn-from-scratch

# Run the predictions (input any point using the `classifier` function)
python3 main.py
```