Abstract

Although skyline queries are very useful in areas such as decision support, market analysis, and personalized services, they have not been extensively studied in the context of uncertain data. Existing work on answering probabilistic skyline queries either requires the user to define a threshold (Pei et al., 2007) or returns all probabilistic skyline objects (Atallah and Qi, 2009). However, the threshold is difficult to set: if it is too high, important results may be lost, whereas if it is too low, or if there is no threshold at all, many low-quality results may be returned (Hua et al., 2011; Le and Cao, 2012; Le et al., 2013). In this paper, we identify two main challenges in answering probabilistic skyline queries. The first is defining which probabilistic skyline tuples are interesting enough to return to users. The second is finding these tuples efficiently without enumerating all possible worlds. We address the first challenge by introducing the bestpro-skyline query, which extends the dominance principle to also take into account the skyline probability of the probabilistic skyline tuples. This approach prunes the result set to a very small number of the most interesting probabilistic skyline tuples without requiring any user-defined threshold. We address the second challenge by using formulas from probability theory to calculate skyline probabilities directly, without considering any possible worlds, and by developing algorithms that prune the search space. Experiments show that our solution finds the 17 interesting probabilistic skyline tuples among 13,095 tuples within 19 s on a real data set. Our solution outperforms a naive solution by up to three orders of magnitude in computation time.
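
To illustrate what calculating skyline probabilities without enumerating possible worlds can look like, the following is a minimal sketch under stated assumptions. It assumes a tuple-level uncertainty model in which each tuple exists independently with a known probability, and it assumes that smaller attribute values are preferred. Under those assumptions, the skyline probability of a tuple is its own existence probability multiplied by the probability that none of its dominators exists. The model, the function names, and the sample data are illustrative assumptions, not details taken from the paper; the brute-force enumeration is included only as a cross-check.

```python
from itertools import product
from math import prod


def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_prob_direct(data):
    """Closed-form skyline probability for each (values, prob) pair,
    assuming tuples exist independently of one another."""
    return [
        p * prod(1.0 - q for other, q in data if dominates(other, values))
        for values, p in data
    ]


def skyline_prob_enumerate(data):
    """Brute-force cross-check: sum the weight of every possible world
    in which the tuple exists and no existing tuple dominates it."""
    n = len(data)
    totals = [0.0] * n
    for world in product([False, True], repeat=n):
        # Probability of this particular combination of present/absent tuples.
        weight = prod(p if present else 1.0 - p
                      for (_, p), present in zip(data, world))
        for i, (values, _) in enumerate(data):
            if world[i] and not any(
                world[j] and dominates(data[j][0], values) for j in range(n)
            ):
                totals[i] += weight
    return totals


# Hypothetical sample data: (attribute values, existence probability).
data = [((1, 4), 0.5), ((2, 2), 0.8), ((3, 3), 0.9), ((4, 1), 0.3)]
direct = skyline_prob_direct(data)
brute = skyline_prob_enumerate(data)
```

In this sample, only (3, 3) has a dominator, namely (2, 2), so its skyline probability drops from its existence probability of 0.9 to 0.9 × (1 − 0.8). The direct computation costs at most one pass over the other tuples per tuple, whereas the enumeration visits 2^n worlds, which is why a closed form matters for data sets of the size reported above.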

  • Publication date: 2016-5-1