Abstract

In this letter, we propose an unsupervised salient object detection method for three-dimensional (3-D) videos. The method efficiently exploits both temporal and depth information, and employs a multiscale architecture with graph-based refinement to improve accuracy and robustness. First, each input video frame is segmented into nonoverlapping superpixels by combining appearance and depth information; the segmentation is repeated with different parameters to build the multiscale architecture. Second, the initial saliency score of each superpixel at each scale is computed via global contrast, defined over appearance, depth, and motion cues from two consecutive frames. Third, the initial saliency at each scale is refined by smoothing over graphs built from three spatial-temporal feature priors: color, depth, and motion. Finally, the result is obtained by fusing the refined saliency maps across the three scales. Experiments on two widely used datasets show that our method outperforms state-of-the-art algorithms in terms of accuracy, robustness, and reliability.
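The per-scale pipeline summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian spatial weighting, the similarity-weighted graph smoothing, and the plain averaging fusion are simplifying assumptions standing in for the paper's actual contrast, refinement, and fusion formulations, and superpixels are reduced to feature/position vectors.

```python
import math

def global_contrast(features, positions, sigma=0.5):
    """Initial saliency of each superpixel: average feature distance to all
    other superpixels, weighted by spatial proximity (an assumed weighting;
    the paper's contrast also uses depth and motion cues in `features`)."""
    n = len(features)
    saliency = []
    for i in range(n):
        s = 0.0
        for j in range(n):
            if i == j:
                continue
            feat_dist = math.dist(features[i], features[j])
            spat_w = math.exp(-math.dist(positions[i], positions[j]) ** 2
                              / (2 * sigma ** 2))
            s += spat_w * feat_dist
        saliency.append(s / (n - 1))
    return saliency

def graph_smooth(saliency, features, iters=10, sigma=0.5):
    """Refine saliency by repeatedly averaging each node's score with
    feature-similar neighbours on a fully connected affinity graph
    (a stand-in for the paper's graph-based refinement)."""
    n = len(saliency)
    w = [[math.exp(-math.dist(features[i], features[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    s = list(saliency)
    for _ in range(iters):
        s = [sum(w[i][j] * s[j] for j in range(n)) / sum(w[i]) for i in range(n)]
    return s

def fuse(scale_maps):
    """Fuse refined per-superpixel saliency across scales by averaging
    (assumed; the paper may use a learned or weighted fusion)."""
    return [sum(vals) / len(vals) for vals in zip(*scale_maps)]
```

A superpixel whose combined appearance/depth/motion feature stands out from the rest of the frame receives a high contrast score, and the graph smoothing suppresses isolated noisy scores before the scales are fused.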