Abstract

The last few decades have witnessed rapid development in visual saliency detection, as it can detect objects of interest in cluttered environments and thereby facilitate a wide range of applications. However, traditional visual saliency detection models rely primarily on image features and may face great challenges on low-contrast video streams captured in low-lighting scenarios. This paper proposes a dynamic multimodal fusion based visual saliency detection model for low-contrast videos, which combines saliency information from the spatial, frequency, and temporal domains. In the spatial domain, super-pixel covariance is utilized to compute region dissimilarity under low-lighting conditions; in the frequency domain, an amplitude-spectrum-tuned method is used to suppress background noise; in the temporal domain, incremental learning is employed to efficiently update the background model from high-dimensional video streams. Extensive experiments have been conducted to validate the effectiveness of the proposed model.
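To make the spatial-domain idea concrete, the sketch below shows one common way to build a region covariance descriptor for a super-pixel and compare two regions with a generalized-eigenvalue (Förstner-style) covariance distance. The choice of per-pixel features and of this particular distance are assumptions for illustration, not necessarily the exact formulation used in the paper:

```python
import numpy as np

def region_covariance(features):
    # features: (n_pixels, d) array of per-pixel feature vectors inside one
    # super-pixel (e.g. intensity, gradients, pixel coordinates).
    # Returns the (d, d) covariance descriptor of the region.
    return np.cov(features, rowvar=False)

def covariance_dissimilarity(c1, c2, eps=1e-6):
    # Distance between two covariance descriptors based on the generalized
    # eigenvalues of (c1, c2): sqrt(sum_i log^2 lambda_i).
    # eps regularizes near-singular covariances.
    d = c1.shape[0]
    c1 = c1 + eps * np.eye(d)
    c2 = c2 + eps * np.eye(d)
    # Generalized eigenvalues of (c1, c2) via c2^{-1} c1.
    eigvals = np.real(np.linalg.eigvals(np.linalg.solve(c2, c1)))
    return float(np.sqrt(np.sum(np.log(eigvals) ** 2)))
```

A region whose feature statistics differ strongly from those of its neighbors then receives a large dissimilarity, and hence a high spatial saliency score; identical regions score zero.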