Abstract

Textual information in images and video frames constitutes a valuable source of high-level semantics for multimedia indexing and retrieval systems. Text detection is the most crucial step in a multimedia text extraction system, and although it has been studied extensively over the past decade, there is still no generic architecture that works for both artificial and scene text in multimedia content. In this paper, we propose a system for detecting both artificial and scene text in images and video frames. The system is based on a machine learning stage that uses a Random Forest classifier and a highly discriminative feature set produced by a new texture operator called Multilevel Adaptive Color edge Local Binary Pattern (MACeLBP). MACeLBP describes the spatial distribution of color edges at multiple adaptive levels of contrast. A gradient-based algorithm is then applied to separate text lines from one another and to refine their localization. The whole algorithm is embedded in a multiresolution framework to make the detection of text lines invariant to scale. Finally, an optional connected-component step segments text lines into words based on the distances between the resulting components. The experimental results, obtained with a concise evaluation methodology, demonstrate the superior performance of the proposed text detection system for artificial and scene text in images and video frames.
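To give a concrete sense of what a local binary pattern feature looks like, the following is a minimal sketch of a thresholded 8-neighbour LBP on a grayscale image, with histograms concatenated over several contrast thresholds. It is not the MACeLBP operator itself: the function names, the use of grayscale intensities instead of color edges, and the fixed threshold values (0, 10, 40) are all assumptions made for illustration, whereas the paper's levels are adaptive.

```python
# Illustrative sketch only: a plain thresholded LBP, not the paper's MACeLBP.
# All names and the fixed thresholds below are hypothetical.

# Neighbours visited clockwise, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c, threshold=0):
    """Return the 8-bit pattern for the interior pixel (r, c)."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # Set the bit when the neighbour exceeds the centre by more
        # than the contrast threshold.
        if img[r + dr][c + dc] - center > threshold:
            code |= 1 << bit
    return code


def lbp_histogram(img, threshold=0):
    """Count how often each of the 256 patterns occurs in the image interior."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c, threshold)] += 1
    return hist


def multilevel_features(img, thresholds=(0, 10, 40)):
    """Concatenate one histogram per contrast level (fixed levels, an assumption)."""
    feats = []
    for t in thresholds:
        feats.extend(lbp_histogram(img, t))
    return feats


if __name__ == "__main__":
    patch = [[9, 9, 9],
             [0, 5, 0],
             [0, 0, 0]]
    print(lbp_code(patch, 1, 1))             # low-contrast level
    print(lbp_code(patch, 1, 1, threshold=10))  # higher-contrast level
    print(len(multilevel_features(patch)))
```

A feature vector of this kind, computed per image window, is the sort of input that a classifier such as a Random Forest could be trained on to separate text from non-text regions.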

  • Publication date: 2013-8
