Abstract

The bag-of-features model is a distinctive and robust approach to detecting human actions in videos. The discriminative power of this model relies heavily on the quantization of the video features into visual words, because the quantization determines how well the visual words describe the human action. Random forests have proven efficient at transforming the features into distinctive visual words. A major disadvantage of the random forest is that it makes binary decisions on the feature values and thus does not take into account the uncertainty of those values. We propose a soft-assignment random forest, a generalization of the random forest in which the binary decisions inside the tree nodes are replaced by a sigmoid function. The slope of the sigmoid models the degree of uncertainty about a feature's value. The results demonstrate that the soft-assignment random forest significantly improves action detection accuracy compared to the original random forest. Human actions that are hard to detect, because they involve interactions with or manipulations of some (typically small) item, are consistently improved. The most prominent improvements are reported for a person handing, throwing, dropping, hauling, taking, closing or opening some item. Using the soft-assignment random forest, we improve on the state of the art on the IXMAS and UT-Interaction datasets.
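The core idea, replacing each binary node decision by a sigmoid, can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the dictionary-based tree layout, the names `soft_leaf_weights`, `threshold` and `slope`, and the choice to weight each leaf by the product of the branch probabilities along its path. It is not the authors' implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_leaf_weights(node, x, weight=1.0, out=None):
    """Distribute a feature vector x softly over the leaves of one tree.

    Assumed (hypothetical) node layout:
      internal node: {"feature", "threshold", "slope", "left", "right"}
      leaf node:     {"leaf": leaf_id}
    A hard tree would send x entirely left or right; here the sigmoid
    gives the fraction of the weight that goes to the right child.
    """
    if out is None:
        out = {}
    if "leaf" in node:
        out[node["leaf"]] = out.get(node["leaf"], 0.0) + weight
        return out
    # A large slope approaches a hard split; a small slope expresses
    # more uncertainty about the feature value.
    p_right = sigmoid(node["slope"] * (x[node["feature"]] - node["threshold"]))
    soft_leaf_weights(node["left"], x, weight * (1.0 - p_right), out)
    soft_leaf_weights(node["right"], x, weight * p_right, out)
    return out


# Toy tree with three leaves (each leaf plays the role of a visual word).
tree = {
    "feature": 0, "threshold": 0.5, "slope": 10.0,
    "left": {"leaf": 0},
    "right": {
        "feature": 1, "threshold": 0.0, "slope": 10.0,
        "left": {"leaf": 1},
        "right": {"leaf": 2},
    },
}

# Feature 0 lies exactly on the threshold, so the weight splits evenly
# at the root instead of being forced to one side.
weights = soft_leaf_weights(tree, [0.5, 0.3])
```

In this sketch the leaf weights of a single feature vector sum to one, so they could be accumulated into a bag-of-features histogram in place of the single count that a hard assignment would contribute.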

  • Publication date: 2013-6
