Python KNN Algorithm And Sample Application

python

Hello;
In this paper we will talk about the closest neighbor(knn / K neighborhood) algorithm used in classification technique. The narrative be clear on behalf of a class attribute to a certain value, for example, the particular value of the class attribute when we talk about the guy I’ll tell him.

KNN_similarity
KNN_similarity

Knn algorithm; (in the space formed by the elements which have the class attribute set to a new instance (of the class nature non-specific) added to this when, for example, an algorithm that determines whether it should be included in the closest class. It uses a k variable to determine the class closest to it. This variable K represents the number of elements that are closest to the instance. The following 3 distance functions are commonly used when calculating the distance between the elements.

Euclidean function is a function that we have been familiar with since elementary school or high school. In fact, don’t be afraid of your eyes, all derived from one main function. We get the form of the Manhattan distance for the value q=1 in Minkowski, and the Euclidean distance for the value q=2. Therefore, we will use minkowski’s general formula to implement. Thus, the person will only change the Q value for other distance measurements.

If we summarize the algorithm with a few items;

  • Add a new instance to a space made up of elements with class attributes.
  • Set the K count.
  • Calculate the distance between the new sample and the elements.(Using one of the offset functions))
  • Look at the class attribute of the closest K element.
  • Include an instance of the class whose class value is greater than the number of elements that are closest to it.–

Learn KNN Algorithm Python

The theoretical part of the algorithm ends here. Now let’s do both practice and practice with Python. As you know, the first thing we need is an appropriate data. We’re going to use the data set here for the application we’re going to do. In this data set, the color values(r, g, b) of the human face (SIMA) in pictures of people of different ages,sexes,races are taken. There are also non-human face samples from the color values in the dataset. As a result of these values taken 4. in milk, this value is 1 if it belongs to a human face, if not we have a class that holds a value of 2 which is. Because our data set is too large, we will only use a fraction of the data set because our personal computers will be forced to process it(at least mine). Since the class separation in the data set starts at 50.859 lines, we will read the data set considering this information. Although it is a bit easier to parse the data set with libraries like pandas, we will not use any Library for the parse process in this application. I thought it would be better to share the code as a whole rather than share the part and make the necessary comment.

The view of our application in 3D space is as follows. Our blue dot example(class is not certain), red dots represent the skin(SIMA) class, and green dots represent the non-skin class. As you can see clearly in the graph, our example is included in the class closest to it by our application.

data
data

https://www.kalogroup.com.au/