摘要

The 2-body correlation 2-BCF) is a group of statistical measurements that found applications in many scientific domains. One type of 2-BCF named the Spatial Distance Histogram (SDH) is of vital importance in describing the physical features of natural systems. While a naive way of computing SDH requires quadratical time, efficient algorithms based on resolving nodes in spatial trees have been developed. A key decision in the design of such algorithms is to choose a proper underlying data structure: our previous work utilizes quad-tree (oct-tree for 3D data) and, in this paper, we study a kd-tree-based solution. Although it is easy to see that both implementations have the same time complexity O(N2d-1/d), where d is the number of dimensions of the dataset, a thorough comparison of their actual running time under different scenarios is conducted. In particular, we present an analytical model to rigorously quantify the running time of dual-tree algorithms. Our analysis suggests that the kd-tree-based implementation outperforms the quad-/oct-tree solution under a wide range of data sizes and query parameters. Specifically, such performance advantage is shown as a speedup up to 1.23x over the quad-tree algorithm for 2D data, and 1.39x over the oct-tree for 3D data, respectively. Results of extensive experiments run on synthetic and real datasets confirm our findings.

全文