Abstract

Action recognition in computer vision has high academic research value, strong commercial potential, and broad application prospects. To improve action recognition accuracy, this paper proposes two dynamic descriptors based on dense trajectories. First, to capture the local position information of where an action occurs, dense sampling is performed in motion regions obtained by constraining and clustering the optical flow. Second, motion corners of the object are selected as feature points and tracked to obtain motion trajectories. Finally, gradient information and optical flow gradient information are extracted from the video cubes centered on the trajectories; auto-correlation and normalization are then applied to these two kinds of information to obtain two dynamic descriptors, the 3D histograms of oriented gradients in trajectory-centered cubes with auto-correlation and the 3D histograms of oriented optical flow gradients with auto-correlation, which resist, to a certain degree, interference caused by camera motion and complex backgrounds. However, the diversity of realistic videos means that dynamic or static descriptors alone cannot achieve accurate action classification. A new framework is therefore proposed in which the dynamic and static descriptors are fused and complement each other to further improve action recognition accuracy. Using leave-one-out cross-validation on the Weizmann and UCF-Sports datasets, the proposed method achieves recognition accuracies of 100% and 96.00%; using four-fold cross-validation on the KTH and YouTube datasets, it achieves 97.17% and 88.23%, outperforming the referenced methods.
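To make the sampling-and-tracking stage of the pipeline concrete, the following is a minimal sketch, not the paper's actual implementation: it uses OpenCV's Farneback dense flow to build a motion mask (a simple magnitude threshold stands in for the constraining and clustering step), samples corner points only inside that mask, and tracks them with pyramidal Lucas-Kanade to form short trajectories. All parameter values (`TRAJ_LEN`, `FLOW_THRESH`, `MAX_CORNERS`) are hypothetical and chosen only for illustration.

```python
import cv2
import numpy as np

# Hypothetical parameters -- not taken from the paper, chosen only for illustration.
TRAJ_LEN = 15        # frames per trajectory
FLOW_THRESH = 1.0    # flow magnitude (pixels/frame) above which a pixel counts as "motion"
MAX_CORNERS = 400

def motion_mask(prev_gray, gray):
    """Dense Farneback flow; keep pixels whose flow magnitude exceeds a threshold."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    return (mag > FLOW_THRESH).astype(np.uint8) * 255

def sample_and_track(frames):
    """Sample corner points inside motion regions and track them into trajectories."""
    if len(frames) < 2:
        return []
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    # Restrict corner sampling to regions where the optical flow indicates motion.
    mask = motion_mask(grays[0], grays[1])
    pts = cv2.goodFeaturesToTrack(grays[0], MAX_CORNERS, 0.01, 5, mask=mask)
    if pts is None:
        return []
    tracks = [[p.ravel()] for p in pts]
    prev_pts = pts
    for t in range(1, min(TRAJ_LEN, len(grays))):
        # Track the sampled points frame to frame with pyramidal Lucas-Kanade.
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(grays[t - 1], grays[t],
                                                       prev_pts, None)
        for track, p, ok in zip(tracks, next_pts, status.ravel()):
            if ok:
                track.append(p.ravel())
        prev_pts = next_pts
    # Keep only trajectories that were tracked over the full length.
    return [np.array(tr) for tr in tracks if len(tr) == TRAJ_LEN]
```

In the full method, the video cube centered on each surviving trajectory would then be used to compute the gradient-based and optical-flow-gradient-based histograms before the auto-correlation and normalization steps described above.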