K-Nearest Neighbours (KNN) is a supervised ML model that finds patterns within labelled datasets for classification and regression. This ML model is simple and quick, so no indexes (as in my other repos) are needed.
- Given a data point, calculate the distance from that point to every other point in the dataset.
- Return the $K$ nearest points (smallest distances).
- $K$ is a hyper-parameter pre-defined by the user.
- For regression, take the average of the values of the $K$ nearest neighbours.
- For classification, take the majority-vote class of the $K$ nearest neighbours.
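The steps above can be sketched in a few lines of Python (the function and variable names here are illustrative, not the repo's actual API):

```python
import math
from collections import Counter

def knn_predict(train, query, k, mode="classification"):
    """train: list of (features, label) pairs; query: a feature tuple."""
    # Step 1-2: distance from the query to every training point, keep the K nearest
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))
    k_labels = [label for _, label in neighbours[:k]]
    if mode == "regression":
        return sum(k_labels) / k  # average of the K nearest values
    return Counter(k_labels).most_common(1)[0][0]  # majority-vote class

points = [((0, 0), 0), ((1, 0), 0), ((5, 5), 1), ((6, 5), 1)]
knn_predict(points, (0.5, 0.2), k=3)  # -> 0
```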
To calculate the distance from one data point to another, we can use the Euclidean distance equation:
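For two points $\mathbf{p} = (p_1, \dots, p_n)$ and $\mathbf{q} = (q_1, \dots, q_n)$ with $n$ features:

$$d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$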
From here, we can sort the Euclidean distances from nearest to furthest and take the $K$ smallest.
From there, we take the average of the $K$ nearest neighbours' values for regression, and the majority-vote class (using a simple iterative counter) for classification. This implementation uses label encoding for the class labels.
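As an illustration (the exact encoding in `main.py` may differ), label encoding simply maps each distinct class name to an integer, so the majority vote can be tallied over ints:

```python
def label_encode(labels):
    # Assign an integer to each distinct class, in order of first appearance
    mapping = {}
    encoded = []
    for label in labels:
        mapping.setdefault(label, len(mapping))
        encoded.append(mapping[label])
    return encoded, mapping

def majority_vote(encoded):
    # Simple iterative counter: tally each class, return the most frequent
    counts = {}
    for cls in encoded:
        counts[cls] = counts.get(cls, 0) + 1
    return max(counts, key=counts.get)

encoded, mapping = label_encode(["cat", "dog", "cat", "cat"])
majority_vote(encoded)  # -> 0, i.e. "cat"
```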
The main.py file is the entry point for running the K-Nearest Neighbour predictions. It initializes the dataset, label-encodes the features, runs the Euclidean distance calculation on the given points, and outputs the predicted class (0 or 1).
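A hypothetical end-to-end flow mirroring that pipeline might look like the sketch below; the dataset, the `classifier` signature, and the point format are assumptions here, so check `main.py` for the real interface:

```python
import math
from collections import Counter

# Hypothetical dataset: (height, weight) features with string class labels
dataset = [((170, 60), "A"), ((175, 65), "A"), ((150, 45), "B"), ((152, 48), "B")]

# 1. Label-encode the classes ("A" -> 0, "B" -> 1)
classes = sorted({label for _, label in dataset})
encoded = [(features, classes.index(label)) for features, label in dataset]

# 2. Classify a new point by majority vote among the K nearest neighbours
def classifier(point, data, k=3):
    nearest = sorted(data, key=lambda pair: math.dist(pair[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

classifier((172, 62), encoded)  # -> 0
```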
To run the predictions, follow these steps:
- Ensure you have Python 3 installed on your system.
- Clone this repository and navigate to the project directory.
- Run the following commands to install any necessary dependencies and execute the script:
```bash
# Clone the repository
git clone https://github.com/oskccy/knn-from-scratch.git
cd knn-from-scratch

# Run the predictions (input any point using the `classifier` function)
python3 main.py
```