Abstract

Video annotation plays an important role in content-based video retrieval. In this paper, we propose an automatic method for determining the identity of a person in live video from a fixed camera by exploiting a novel type of contextual information: motion patterns. While subjects move around in the field of view (FOV) of the camera, their body motion is captured simultaneously by two different sensing modalities: the camera itself and smartphones equipped with inertial sensors. Classification models are then trained to recognize motion patterns from the raw motion data of each modality. To identify a subject who appears in the camera video, we define a distance metric that quantifies the similarity between the motion sequence recognized from the video and each of the motion sequences recognized from the smartphones. Once the most similar smartphone sequence is found, the identity associated with the corresponding phone is used to annotate the video frames, together with the time and the camera location. To test the feasibility and performance of the proposed method, we conduct extensive experiments, and the method achieves impressive results.
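
The matching step described above can be illustrated with a minimal sketch. The abstract does not specify the distance metric, so the sketch assumes a simple stand-in: the fraction of time steps at which two equally sampled motion-label sequences disagree. The function names (`mismatch_rate`, `identify`), the label vocabulary, the phone identifiers, and the `max_distance` threshold are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple


def mismatch_rate(a: List[str], b: List[str]) -> float:
    """Fraction of aligned time steps whose motion labels differ.

    Assumption: both sequences are sampled at the same rate and start at
    the same time; only the overlapping prefix is compared.
    """
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("cannot compare empty motion sequences")
    return sum(x != y for x, y in zip(a, b)) / n


def identify(
    video_seq: List[str],
    phone_seqs: Dict[str, List[str]],
    max_distance: float = 0.5,  # hypothetical rejection threshold
) -> Tuple[Optional[str], float]:
    """Return the phone identity whose sequence is closest to the video's.

    Returns (None, best_distance) when even the closest phone sequence is
    farther away than max_distance, or when no phone sequences are given.
    """
    best_id: Optional[str] = None
    best_dist = float("inf")
    for phone_id, seq in phone_seqs.items():
        dist = mismatch_rate(video_seq, seq)
        if dist < best_dist:
            best_id, best_dist = phone_id, dist
    if best_dist > max_distance:
        return None, best_dist
    return best_id, best_dist


# Illustrative data only: labels recognized per time window.
video = ["walk", "walk", "stand", "walk", "stand", "stand"]
phones = {
    "alice": ["walk", "walk", "stand", "walk", "stand", "walk"],
    "bob": ["stand", "stand", "walk", "stand", "walk", "walk"],
}
matched_id, distance = identify(video, phones)
print(matched_id, distance)
```

In this sketch, the identity returned by `identify` would be attached to the video frames along with the timestamp and camera location. A real system would likely need a metric that tolerates clock offset between the camera and the phones, which this stand-in does not handle.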

Full text