Abstract

In practical applications of video object segmentation, a major challenge is maintaining accuracy and robustness in complex dynamic scenes and under large camera displacement. This paper proposes a novel object-segmentation approach that combines information from the temporal, spatial, and frequency domains so that their strengths complement one another. Four components drawn from these three domains (motion, color, luminance, and spectral residual) are used to construct the temporal-spatial-frequency saliency (TSFS) model, and a determining rule merges the components into the final saliency map. The proposed model is evaluated on two representative video-sequence datasets. The experimental results indicate that the model is more accurate, robust, and effective than other state-of-the-art methods, and that it satisfies the requirements of object segmentation in complex dynamic scenes and under large camera displacement.
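To make the frequency-domain component and the fusion step concrete, the sketch below shows a minimal NumPy implementation of spectral residual saliency (Hou and Zhang's classic formulation) together with a hypothetical fusion function. The fusion here is a simple normalized weighted average; the paper's actual determining rule for merging the motion, color, luminance, and spectral-residual components may differ, and the function names (`spectral_residual_saliency`, `fuse_saliency`, `mean_filter`) are illustrative assumptions, not the authors' API.

```python
import numpy as np

def mean_filter(a, k=3):
    """Simple k x k mean filter with edge padding (NumPy only)."""
    pad = k // 2
    p = np.pad(a, pad, mode="edge")
    out = np.zeros_like(a, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)

def spectral_residual_saliency(gray):
    """Spectral residual saliency: the log-amplitude spectrum minus its
    local average highlights the 'unexpected' (salient) frequencies."""
    f = np.fft.fft2(gray)
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    residual = log_amp - mean_filter(log_amp, 3)
    # Reconstruct with the residual amplitude and the original phase.
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / (sal.max() + 1e-8)

def fuse_saliency(maps, weights=None):
    """Hypothetical fusion: normalized weighted average of component
    saliency maps (e.g. motion, color, luminance, spectral residual)."""
    maps = [m / (m.max() + 1e-8) for m in maps]
    if weights is None:
        weights = [1.0] * len(maps)
    fused = sum(w * m for w, m in zip(weights, maps))
    return fused / (fused.max() + 1e-8)
```

In practice the fused map would be thresholded to obtain the segmentation mask; the determining rule in the paper governs how each component contributes per frame.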