Approximate Order-Sensitive k-NN Queries over Correlated High-Dimensional Data

作者:Gu, Yu*; Guo, Yandan; Song, Yang; Zhou, Xiangmin; Yu, Ge
来源:IEEE Transactions on Knowledge and Data Engineering, 2018, 30(11): 2037-2050.
DOI:10.1109/TKDE.2018.2812153

摘要

The k Nearest Neighbor (k-NN) query has been gaining more importance in extensive applications involving information retrieval, data mining, and databases. Specifically, in order to trade off accuracy for efficiency, approximate solutions for the k-NN query are extensively explored. However, the precision is usually order-insensitive, which is defined on the result set instead of the result sequence. In many situations, it cannot reasonably reflect the query result quality. In this paper, we focus on the approximate k-NN query problem with the order-sensitive precision requirement and propose a novel scheme based on the projection-filter-refinement framework. Basically, we adopt PCA to project the high-dimensional data objects into the low-dimensional space. Then, a filter condition is inferred to execute efficient pruning over the projected data. In addition, an index strategy named OR-tree is proposed to reduce the I/O cost. The extensive experiments based on several real-world data sets and a synthetic data set are conducted to verify the effectiveness and efficiency of the proposed solution. Compared to the state-of-the-art methods, our method can support order-sensitive k-NN queries with higher result precision while retaining satisfactory CPU and I/O efficiency.