Abstract

k-nearest neighbor (kNN) is one of the basic procedures underlying various machine learning methods. In kNN, the similarity of a query to a neighboring sample is typically measured by a metric such as Euclidean distance. The process starts by mapping the training dataset onto a one-dimensional distance space based on the computed similarities, and then labels the query with the majority label (for classification) or the mean of the labels (for regression) of the k nearest neighbors. The number of nearest neighbors, k, is chosen according to the desired level of accuracy. However, two distinct samples may lie at equal distances from the query yet at different angles in the feature space. To distinguish such samples, the similarity of the query to each one should be weighted by the angle between the query and that sample, so that angular information differentiates the two otherwise equal distances. This idea can be analyzed in the context of dependency and exploited to increase the precision of a classifier. From this point of view, instead of using kNN, the query is labeled according to its nearest dependent neighbors, which are determined by a joint function built on both similarity and dependency. This method may therefore be called dependent NN (d-NN). To demonstrate d-NN, it is applied to synthetic datasets with different statistical distributions and to four benchmark datasets: Pima Indian, Hepatitis, approximate Sinc, and CASP. Results show the superiority of d-NN in terms of accuracy and computation cost compared to other popular machine learning methods.
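To illustrate the idea, the sketch below implements a minimal d-NN-style classifier in Python/NumPy. Since the abstract does not specify the joint function, the cosine-based dependency term, the linear combination, and the weight `alpha` are illustrative assumptions, not the authors' actual formulation.

```python
import numpy as np

def dnn_predict(X_train, y_train, query, k=5, alpha=0.5):
    """Classify `query` by its k nearest *dependent* neighbors.

    Combines Euclidean distance with an angular (cosine-based)
    dependency term. The linear combination and the weight `alpha`
    are assumptions for illustration, not the paper's exact joint
    function.
    """
    # Euclidean distance from the query to every training sample.
    dists = np.linalg.norm(X_train - query, axis=1)

    # Cosine of the angle between the query and each sample: two
    # samples at equal distance can still differ in this term.
    eps = 1e-12
    cos_sim = (X_train @ query) / (
        np.linalg.norm(X_train, axis=1) * np.linalg.norm(query) + eps
    )

    # Joint score: lower means "nearer and more dependent". Angular
    # information breaks ties between equally distant samples.
    joint = alpha * dists + (1.0 - alpha) * (1.0 - cos_sim)

    # Majority vote over the k samples with the smallest joint score.
    nearest = np.argsort(joint)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage: two clusters labeled by the sign of the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)
print(dnn_predict(X, y, np.array([0.8, -0.3]), k=7))  # expected: 1
```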

  • Publication date: 2017-06