Abstract

This paper proposes a knowledge-based system for recognizing historical Mongolian documents, in which words exhibit remarkable variation and overlapping characters. Following the characteristics of Mongolian word formation, the system combines a holistic scheme and a segmentation-based scheme for word recognition. Several types of words, as well as isolated suffixes, that cannot be segmented into glyph-units or do not require segmentation are recognized with the holistic scheme. The remaining words are recognized with the segmentation-based scheme, which is the focus of this paper. In this scheme, we exploit knowledge of glyph characteristics to segment words into glyph-units. Convolutional neural networks are employed both for word recognition in the holistic scheme and for glyph-unit recognition in the segmentation-based scheme. Based on an analysis of recognition errors in the segmentation-based scheme, we enhance the system by integrating three strategies into glyph-unit recognition: incorporating baseline information, grouping glyph-units, and recognizing under-segmented and over-segmented fragments. The proposed system achieves a word accuracy of 80.86% on the Mongolian Kanjur test samples.