Abstract

We propose a novel method for robust 6-DOF pose tracking of rigid objects from monocular images. In our method, 3D object tracking is achieved by directly aligning video frames with dynamic templates rendered from a textured 3D object model. Unlike previous methods, which usually align video frames against a small number of discrete templates, we render an online textured model to create dynamic templates in continuous pose space, conditioned on the previously estimated object pose. In this way, the pose estimator converges easily to the optimal state. Moreover, the rendered template also helps detect occluded regions through comparison with the current frame, making our method highly robust to partial occlusions. The performance of our method is further improved by introducing a generic representation of dense image features, which we call extended dense feature fields (EDFF). Different kinds of pixel-level image features can be added to the EDFF and optimized simultaneously in a unified Gauss-Newton optimization scheme. Owing to the dynamic templates rendered from the textured model and the complementary features in the EDFF, our method can handle poorly textured and specular objects, as well as lighting variation and heavy occlusions. Although our method is simple and straightforward, it achieves results that are competitive with or even superior to the state of the art on challenging datasets.
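To make the optimization idea concrete, the following is a minimal, self-contained sketch of Gauss-Newton alignment on a toy 3-DOF problem (2D rotation plus translation), standing in for the 6-DOF template-to-frame alignment described above. It is not the paper's implementation: the residual here is a geometric point-to-point error rather than a dense feature-field residual, and all function names (`rot`, `residuals`, `jacobian`, `gauss_newton`) are illustrative.

```python
import numpy as np

def rot(theta):
    """2D rotation matrix for angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def residuals(p, src, dst):
    """Stacked residuals r(p) = R(theta) x + t - y for all points."""
    theta, tx, ty = p
    return (src @ rot(theta).T + np.array([tx, ty]) - dst).ravel()

def jacobian(p, src):
    """Analytic Jacobian dr/dp, shape (2N, 3)."""
    theta = p[0]
    c, s = np.cos(theta), np.sin(theta)
    dR = np.array([[-s, -c], [c, -s]])  # dR/dtheta
    J = np.zeros((2 * len(src), 3))
    J[:, 0] = (src @ dR.T).ravel()      # column for theta
    J[0::2, 1] = 1.0                    # x-residuals w.r.t. tx
    J[1::2, 2] = 1.0                    # y-residuals w.r.t. ty
    return J

def gauss_newton(src, dst, iters=20, tol=1e-10):
    """Iterate p <- p - (J^T J)^{-1} J^T r until the step is tiny."""
    p = np.zeros(3)
    for _ in range(iters):
        r = residuals(p, src, dst)
        J = jacobian(p, src)
        delta = np.linalg.solve(J.T @ J, J.T @ r)
        p -= delta
        if np.linalg.norm(delta) < tol:
            break
    return p
```

In the paper's setting the residual would instead stack per-pixel differences between the EDFF of the rendered template and that of the current frame, but the normal-equation update `(J^T J) delta = J^T r` has the same form.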