摘要

Both region-based methods and direct methods have become popular in recent years for tracking the 6-dof pose of an object from monocular video sequences. Region-based methods estimate the pose of the object by maximizing the discrimination between statistical foreground and background appearance models, while direct methods aim to minimize the photometric error through direct image alignment. In practice, region-based methods only care about the pixels within a narrow band of the object contour due to the level-set-based probabilistic formulation, leaving the foreground pixels beyond the evaluation band unused. On the other hand, direct methods only utilize the raw pixel information of the object, but ignore the statistical properties of foreground and background regions. In this paper, we find it beneficial to combine these two kinds of methods together. We construct a new probabilistic formulation for 3D object tracking by combining statistical constraints from region-based methods and photometric constraints from direct methods. In this way, we take advantage of both statistical property and raw pixel values of the image in a complementary manner. Moreover, in order to achieve better performance when tracking heterogeneous objects in complex scenes, we propose to increase the distinctiveness of foreground and background statistical models by partitioning the global foreground and background regions into a small number of sub-regions around the object contour. We demonstrate the effectiveness of the proposed novel strategies on a newly constructed real-world dataset containing different types of objects with ground-truth poses. Further experiments on several challenging public datasets also show that our method obtains competitive or even superior tracking results compared to previous works. In comparison with the recent state-of-art region-based method, the proposed hybrid method is proved to be more stable under silhouette pose ambiguities with a slightly lower tracking accuracy.