Discovering the k Representative Skyline Over a Sliding Window

作者:Bai, Mei*; Xin, Junchang; Wang, Guoren; Zhang, Luming; Zimmermann, Roger; Yuan, Ye; Wu, Xindong
来源:IEEE Transactions on Knowledge and Data Engineering, 2016, 28(8): 2041-2056.
DOI:10.1109/TKDE.2016.2546242

摘要

A representative skyline contains k skyline points that can represent its corresponding full skyline. The existing measuring criteria of k representative skylines are specifically designed for static data, and they cannot effectively handle streaming data. In this paper, we focus on the problem of calculating the k representative skyline over data streams. First, we propose a new criterion to choose k skyline points as the k representative skyline for data stream environments, termed the k largest dominance skyline (k-LDS), which is representative to the entire data set and is highly stable over the streaming data. Second, we propose an efficient exact algorithm, called Prefix-based Algorithm (PBA), to solve the k-LDS problem in a 2-dimensional space. The time complexity of PBA is only O((M - k) x k) where M is the size of the full skyline set. Third, the k-LDS problem for a d-dimensional (d >= 3) space turns out to be very complex. Therefore, a greedy algorithm is designed to answer k-LDS queries. To further accelerate the calculation, we propose a epsilon-greedy algorithm which can achieve an approximate factor of 1/(1+epsilon) (1 - 1/root e). Experimental results on both synthetic and real-world data show that our k-LDS significantly outperforms its competitors in data stream environments. Furthermore, we demonstrate that the proposed epsilon-greedy algorithm can solve epsilon-LDS efficiently and with a competitive accuracy.