Abstract

Current video surveillance systems are not designed to raise an automatic alert in situations that put people's lives at risk, such as accidents, assaults and terrorism. This is because these systems cannot analyze, at a sufficiently high processing speed, the huge volume of video signals coming from cameras installed around the world. Faced with this situation, scientific communities are combining efforts to design algorithms and hardware that accelerate the processing of video signals. However, most of the methods proposed to date are too complex to be implemented in hardware at the place where the video camera is installed. In this paper, we report a novel, significantly reduced feature set used to design an analysis algorithm of significantly lower complexity that recognizes human actions from video sequences. The proposed method is based on natural domain knowledge of the human figure, such as the proportions of the human body and foot positions. The analysis is characterized by working on sub-sequences of the entire video signal, processing a small fragment of the whole image, estimating the location of the region of interest, using simple operations (addition, subtraction, multiplication, division), extracting a reduced number of features per frame (6 features), and using a combination of four linear classifiers (one perceptron and three support vector machines) arranged in a hierarchical structure. The method is evaluated on two datasets cited in the human action recognition literature, the Weizmann and UIUC datasets. Results show that for the Weizmann dataset, the correct classification rate (CCR) is 99.95% under the LOOCV Protocol and 98.38% under Protocol 60-40, which is comparable to or even higher than that of current state-of-the-art methods.
Confusion matrices were also obtained for the UIUC dataset, where the CCR is 100% under the LOOCV Protocol and 99.35% under Protocol 60-40. The experimental results are promising: the method uses far fewer features than other methods (between 85 and 113 times fewer) and offers the possibility of processing more than 200 fps.
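
The hierarchical combination of linear classifiers mentioned in the abstract can be pictured with a minimal Python sketch. The abstract does not specify how the four classifiers are arranged, so the layout here is an assumption: a root classifier (standing in for the perceptron) picks one of three groups, and one linear classifier per group (standing in for the three support vector machines) picks the action. At inference time a trained linear SVM reduces to a linear score, which is why both kinds of node share one function. The names `linear_predict`, `classify_hierarchical`, the group and action labels, and all weights are hypothetical placeholders, not values from the paper.

```python
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]
# A linear model maps each label to (weights, bias).
LinearModel = Dict[str, Tuple[Vector, float]]


def linear_predict(model: LinearModel, x: Vector) -> str:
    """Return the label whose linear score w.x + b is largest."""
    def score(label: str) -> float:
        w, b = model[label]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(model, key=score)


def classify_hierarchical(root: LinearModel,
                          leaves: Dict[str, LinearModel],
                          x: Vector) -> str:
    """Route x through the root classifier, then through that group's classifier."""
    group = linear_predict(root, x)
    return linear_predict(leaves[group], x)


# Placeholder weights (NOT trained values); 6 features per frame as in the abstract.
ROOT: LinearModel = {
    "group_a": ([1, 0, 0, 0, 0, 0], 0.0),
    "group_b": ([0, 1, 0, 0, 0, 0], 0.0),
    "group_c": ([0, 0, 1, 0, 0, 0], 0.0),
}
LEAVES: Dict[str, LinearModel] = {
    "group_a": {"action_1": ([0, 0, 0, 1, 0, 0], 0.0),
                "action_2": ([0, 0, 0, -1, 0, 0], 0.0)},
    "group_b": {"action_3": ([0, 0, 0, 1, 0, 0], 0.0),
                "action_4": ([0, 0, 0, -1, 0, 0], 0.0)},
    "group_c": {"action_5": ([0, 0, 0, 0, 1, 0], 0.0),
                "action_6": ([0, 0, 0, 0, -1, 0], 0.0)},
}

if __name__ == "__main__":
    frame_features = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # hypothetical 6-feature vector
    print(classify_hierarchical(ROOT, LEAVES, frame_features))
```

Each prediction in this sketch costs two small sets of six-term dot products, which illustrates why a design built on a handful of features and linear decision functions suits low-complexity hardware.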

  • Publication date: 2015-1