Abstract

Image description generation has great practical value in online image search. Inspired by recent advances in the study of the neocortex, we design a deep image understanding framework that implements a description generator for general images involving human activities. Unlike existing work on image description, which treats the task as a retrieval problem rather than attempting to understand the image, our framework recognizes the human-object interaction (HOI) activity in the image based on a co-occurrence analysis of the 3-D spatial layout and generates a natural language description of what is actually happening in the image. We propose a deep hierarchical model for image recognition and a syntactic tree-based model for natural language generation. To support online image search, the two models are designed to uniformly extract features from humans and different object classes and to produce well-formed sentences that describe exactly what is happening in the image. Through experiments on a dataset containing images from the phrasal recognition dataset, the six-class sports dataset, and the UIUC Pascal sentence dataset, we demonstrate that our framework outperforms state-of-the-art methods in recognizing HOI activities and generating image descriptions.