摘要

Searching for Approximate Nearest Neighbors (ANN) of high dimensional floating point data has to compute their expensive Euclidean distances, and the memory occupancy rate is high. In order to fix such problem, an algorithm is proposed to effectively and efficiently map high dimensional floating point data into low dimensional binary codes, while preserving the normalized distance similarity. As a result, Hamming distances can be used to instead the Euclidean distances during ANN search process. To guarantee the ANN search performance of obtained binary codes, the distribution adaptive binary labels of the training data are firstly acquired based on the look-up mechanism. Then, the classifaction planes are obtained on the basis of SVM algorithm, and the one with the highest entropy value is chosen as the final hashing function. In order to further improve the retrieval performance, the retrieval system based on multiple binary codes is proposed. During the training process, different kinds of original encoding centers are chosen to obtain multiple hashing functions and multiple binary codes. During the search stage, the points with the minimal average Hamming distances are returned as query results. Experiments show that the proposed algorithm can efficiently and effectively map the floating point data into superior binary codes, and has excellent ANN search performance when compared with other state-of-art methods.

全文