Abstract

Text plays an important role in video retrieval, indexing, and understanding. However, detecting and extracting it is challenging because of varying backgrounds, low contrast between text and non-text regions, and perspective distortion. In this paper, we propose a novel two-phase approach to this problem based on discriminative features and edge density. The first phase defines and extracts a novel feature, called edge distribution entropy, and uses it to remove most non-text regions. The second phase employs a support vector machine (SVM) to further distinguish real text regions from non-text ones. To generate the SVM inputs, three additional novel features are defined and extracted from each region: foreground pixel distribution entropy, skeleton/size ratio, and edge density. Once text regions have been detected, text is extracted from those regions that are surrounded by sufficient edge pixels. A comparative study on two publicly available datasets shows that the proposed method significantly outperforms four selected state-of-the-art methods in text detection and extraction accuracy.
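The abstract names edge density and edge distribution entropy but does not define them. The following is a minimal sketch that rests on assumed definitions, not the authors' formulas. It treats edge density as the fraction of edge pixels in a candidate region's binary edge map. It treats edge distribution entropy as the Shannon entropy (in bits) of edge-pixel counts over a grid of sub-blocks. The function names, the `EdgeMap` type, and the `grid` parameter are hypothetical.

```python
import math
from typing import List

# Hypothetical representation: binary edge map of one candidate region,
# 1 = edge pixel, 0 = non-edge pixel (e.g. the output of an edge detector).
EdgeMap = List[List[int]]


def edge_density(edge_map: EdgeMap) -> float:
    """Assumed definition: fraction of pixels in the region that are edge pixels."""
    total = sum(len(row) for row in edge_map)
    if total == 0:
        return 0.0
    edges = sum(sum(row) for row in edge_map)
    return edges / total


def edge_distribution_entropy(edge_map: EdgeMap, grid: int = 4) -> float:
    """Assumed definition: Shannon entropy (bits) of edge-pixel counts
    over a grid x grid partition of the region. Illustration only."""
    h = len(edge_map)
    w = len(edge_map[0]) if h else 0
    counts = [0] * (grid * grid)
    for y, row in enumerate(edge_map):
        for x, v in enumerate(row):
            if v:
                # Map pixel (y, x) to its sub-block index in the grid.
                cell = (y * grid // h) * grid + (x * grid // w)
                counts[cell] += 1
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    # 8x8 region whose left half is all edge pixels.
    left_half = [[1] * 4 + [0] * 4 for _ in range(8)]
    print(edge_density(left_half))               # 0.5
    print(edge_distribution_entropy(left_half))  # 3.0
```

Under these assumed definitions, edges spread evenly over many sub-blocks yield a high entropy. Edges concentrated in a single sub-block yield an entropy of zero. In the example, the edges fill 8 of the 16 sub-blocks equally, so the entropy is log2(8) = 3 bits. A region's scores could then be thresholded in a first filtering pass or passed as features to a classifier such as an SVM. The thresholds and feature combination would be further assumptions.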