Abstract

Some human detection or tracking algorithms output a low-dimensional representation of the human body, such as a bounding box. Although this representation suffices for some tasks, a more accurate and detailed point-wise representation is more useful for pose estimation and action recognition. A refinement process can produce a point-wise mask of the human body from its low-dimensional representation. In this paper, we address the problem of refining low-dimensional human shapes using RGB-D data and propose a novel, accurate method for it. The algorithm combines low-level cues, such as shape and color, with high-level observations, such as the estimated ground plane, in a multi-layer graph cut framework. Shape prior information is learned by training a classifier. Unlike some existing work, our method does not use or carry over features from the internal steps of the methods that provide the bounding box, so it can operate on the output of any similar shape provider. Extensive experiments demonstrate that the proposed technique significantly outperforms other suitable methods. Moreover, we extend a previously published refinement method by incorporating more generic cues so that it can serve this purpose.