摘要

Almost all binarization methods have a few parameters that require setting. However, they do not usually achieve their upper-bound performance unless the parameters are individually set and optimized for each input document image. In this work, a learning framework for the optimization of the binarization methods is introduced, which is designed to determine the optimal parameter values for a document image. The framework, which works with any binarization method, has a standard structure, and performs three main steps: (i) extracts features, (ii) estimates optimal parameters, and (iii) learns the relationship between features and optimal parameters. First, an approach is proposed to generate numerical feature vectors from 20 data. The statistics of various maps are extracted and then combined into a final feature vector, in a nonlinear way. The optimal behavior is learned using support vector regression (SVR). Although the framework works with any binarization method, two methods are considered as typical examples in this work: the grid-based Sauvola method, and Lu's method, which placed first in the DIBCO'09 contest. The experiments are performed on the DIBC0'09 and H-DIBCO'10 datasets, and combinations of these datasets with promising results.

  • 出版日期2013-3