Abstract

Representing the features of different types of human action in unconstrained videos is challenging due to camera motion, cluttered backgrounds, and occlusions. This paper aims to obtain an effective and compact action representation using the length-variable edge trajectory (LV-ET) and the spatio-temporal motion skeleton (STMS). First, to better capture long-term motion information for action representation, a novel edge-based trajectory extraction strategy is introduced that tracks moving edge points without limiting trajectory length; tracking terminates depending not only on the optical flow field but also on the position of the optical flow vector in the next frame. Thus, only a compact subset of action-related edge points in each frame is used to generate length-variable edge trajectories. Second, observing that different types of action produce distinctive trajectory patterns, we introduce a new trajectory descriptor, the spatio-temporal motion skeleton: the LV-ET is first encoded with both orientation and magnitude features, and the STMS is then computed by motion clustering. Comparative experiments on three unconstrained human action datasets demonstrate the effectiveness of the proposed method.
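To make the trajectory-extraction idea concrete, the following is a minimal NumPy-only sketch of length-variable tracking through precomputed dense optical-flow fields. The function name `track_trajectories`, the seed points, and the magnitude-based termination threshold `min_mag` are illustrative assumptions, not the paper's actual criteria; the sketch only shows how trajectory lengths can vary with the underlying motion rather than being fixed in advance.

```python
import numpy as np

def track_trajectories(flows, seeds, min_mag=0.5):
    """Track seed points through a sequence of dense optical-flow fields.

    flows : list of (H, W, 2) arrays, flow from frame t to t+1.
    seeds : (N, 2) array of (x, y) start positions in frame 0.
    A trajectory ends when the flow magnitude at its current position
    drops below `min_mag` or the point leaves the frame, so trajectory
    lengths vary with the motion (an LV-ET-style, not fixed-length, track).
    Note: this termination rule is a simplified stand-in for the paper's
    criterion based on the optical flow vector position in the next frame.
    """
    H, W = flows[0].shape[:2]
    trajectories = [[tuple(p)] for p in seeds.astype(float)]
    alive = [True] * len(trajectories)
    for flow in flows:
        for i, traj in enumerate(trajectories):
            if not alive[i]:
                continue
            x, y = traj[-1]
            ix, iy = int(round(x)), int(round(y))
            if not (0 <= ix < W and 0 <= iy < H):
                alive[i] = False          # point left the frame
                continue
            dx, dy = flow[iy, ix]
            if np.hypot(dx, dy) < min_mag:
                alive[i] = False          # motion too weak: stop tracking
                continue
            traj.append((x + dx, y + dy))
    return trajectories

# Toy flow: uniform rightward motion in the left half, no motion on the right.
flow = np.zeros((8, 8, 2))
flow[:, :4, 0] = 1.0                      # 1 px/frame rightward in left half
flows = [flow] * 6
trajs = track_trajectories(flows, np.array([[0, 4], [6, 4]]))
print([len(t) for t in trajs])            # → [5, 1]: lengths vary with motion
```

The left seed keeps moving until it reaches the static region, while the right seed stops immediately, yielding trajectories of different lengths from the same video.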