摘要
A representative skyline contains k skyline points that can represent its corresponding full skyline. The existing measuring criteria of k representative skylines are specifically designed for static data, and they cannot effectively handle streaming data. In this paper, we focus on the problem of calculating the k representative skyline over data streams. First, we propose a new criterion to choose k skyline points as the k representative skyline for data stream environments, termed the k largest dominance skyline (k-LDS), which is representative to the entire data set and is highly stable over the streaming data. Second, we propose an efficient exact algorithm, called Prefix-based Algorithm (PBA), to solve the k-LDS problem in a 2-dimensional space. The time complexity of PBA is only O((M - k) x k) where M is the size of the full skyline set. Third, the k-LDS problem for a d-dimensional (d >= 3) space turns out to be very complex. Therefore, a greedy algorithm is designed to answer k-LDS queries. To further accelerate the calculation, we propose a epsilon-greedy algorithm which can achieve an approximate factor of 1/(1+epsilon) (1 - 1/root e). Experimental results on both synthetic and real-world data show that our k-LDS significantly outperforms its competitors in data stream environments. Furthermore, we demonstrate that the proposed epsilon-greedy algorithm can solve epsilon-LDS efficiently and with a competitive accuracy.
- 出版日期2016-8-1
- 单位东北大学