Abstract

Immersive videoconferences add a new dimension to remote collaboration by bringing participants together in a common virtual space. To achieve this, the conferencing system must extract the foreground from each incoming video stream in real time and transfer it into the shared virtual space. The method presented in this paper is distinguished by the fact that no prior training or assumptions about the video content are required during foreground extraction. A temporally coherent mask is built from motion cues in the video stream and provides a set of hard constraints; based on these constraints, a graph cut algorithm produces the pixel-accurate foreground segmentation. The results are evaluated with a state-of-the-art perceptual metric to give an objective assessment of the method's accuracy and reliability. Furthermore, the approach exploits parallel execution to achieve real-time processing.
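As a rough illustration of the pipeline described above, the sketch below derives a motion cue from consecutive frames, turns it into a constraint mask, and refines that mask with a graph cut. This is a minimal Python/OpenCV sketch, not the authors' implementation: `cv2.grabCut` is used as a stand-in graph cut solver (it optimizes its own color-model energy, which is not necessarily the paper's formulation), the threshold and dilation parameters are illustrative assumptions, and the paper's full temporal-coherence machinery is not reproduced.

```python
import cv2
import numpy as np

# Illustrative parameters (assumed, not from the paper).
MOTION_THRESH = 25   # frame-difference threshold
DILATE_ITERS = 10    # widens the motion mask into a "probable foreground" band

def segment_foreground(prev_gray, curr_gray, curr_bgr):
    """Motion cue -> hard-constraint mask -> graph cut segmentation."""
    # 1. Motion cue: thresholded absolute difference of consecutive frames.
    diff = cv2.absdiff(curr_gray, prev_gray)
    _, motion = cv2.threshold(diff, MOTION_THRESH, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(motion) == 0:
        return np.zeros_like(curr_gray)  # no motion: nothing to segment

    # 2. Constraint mask: moving pixels are definite foreground, a dilated
    #    band around them is probable foreground, everything else is
    #    definite background.
    kernel = np.ones((3, 3), np.uint8)
    band = cv2.dilate(motion, kernel, iterations=DILATE_ITERS)
    mask = np.full(curr_gray.shape, cv2.GC_BGD, np.uint8)
    mask[band > 0] = cv2.GC_PR_FGD
    mask[motion > 0] = cv2.GC_FGD

    # 3. Graph cut refinement that honors the hard constraints.
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(curr_bgr, mask, None, bgd_model, fgd_model,
                iterCount=1, mode=cv2.GC_INIT_WITH_MASK)

    # Definite and probable foreground pixels form the final binary mask.
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                    255, 0).astype(np.uint8)
```

In a real conferencing pipeline this function would run per frame, with the previous grayscale frame cached between calls; the single GrabCut iteration keeps the per-frame cost low, which is in the spirit of the real-time goal stated in the abstract.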

  • Publication date: 2012-12