Abstract

Character segmentation is a key step in Uyghur handwriting recognition, but cursive writing and stroke drift make it difficult. To address this problem, a character segmentation algorithm based on multi-information fusion is proposed. The strokes of a word are extracted, segmented, and clustered into two types of sections: main sections and affix sections. Fuzzy section matching is then used to obtain a robust sequence of over-segmentation primitives, which reduces the interference from stroke drift. The matching information is estimated with a Gaussian model of the matching position. The recognition confidence is derived from the character classifier outputs by confidence transformation, and the semantic information is obtained from word data statistics. A Markov model of character sequences is presented, and the formula for the posterior probability of a word is derived based on the Bayes criterion. The optimal path, and with it the optimal segmentation result, is found by weighted fusion of these information sources. Experiments show that the proposed algorithm effectively improves the accuracy and stability of character segmentation.
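The final step, searching for the optimal path by weighted information fusion, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each candidate character (a merge of consecutive over-segmentation primitives) already carries a recognition confidence and a matching probability, that the semantic information is a first-order (bigram) character model, and that the three terms are combined as a weighted sum of log-probabilities. The function name `best_segmentation`, its parameters, the start symbol `<s>`, and all numbers in the example are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# A candidate character: (label, recognition confidence, matching probability).
Candidate = Tuple[str, float, float]


def best_segmentation(
    n_primitives: int,
    max_span: int,
    candidates: Callable[[int, int], List[Candidate]],
    bigram: Callable[[str, str], float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> Tuple[float, List[Tuple[int, int, str]]]:
    """Dynamic-programming search over merges of over-segmentation primitives.

    candidates(i, j) lists the characters that primitives i..j-1 could form.
    bigram(prev, cur) is an assumed semantic model P(cur | prev).
    All probabilities are assumed to be strictly positive (e.g. smoothed).
    """
    w_rec, w_match, w_sem = weights
    # best[j] maps the last label ending at position j to
    # (score, start position of that character, previous label).
    best: List[Dict[str, Tuple[float, Optional[int], Optional[str]]]] = [
        {} for _ in range(n_primitives + 1)
    ]
    best[0]["<s>"] = (0.0, None, None)

    for j in range(1, n_primitives + 1):
        for i in range(max(0, j - max_span), j):
            for label, p_rec, p_match in candidates(i, j):
                for prev, (prev_score, _, _) in best[i].items():
                    # Weighted fusion of the three information sources.
                    score = (
                        prev_score
                        + w_rec * math.log(p_rec)
                        + w_match * math.log(p_match)
                        + w_sem * math.log(bigram(prev, label))
                    )
                    if label not in best[j] or score > best[j][label][0]:
                        best[j][label] = (score, i, prev)

    if not best[n_primitives]:
        raise ValueError("no complete segmentation path")

    # Trace back from the best final label to recover the character boundaries.
    label = max(best[n_primitives], key=lambda k: best[n_primitives][k][0])
    total = best[n_primitives][label][0]
    path: List[Tuple[int, int, str]] = []
    j = n_primitives
    while j > 0:
        _, i, prev = best[j][label]
        path.append((i, j, label))
        j, label = i, prev
    path.reverse()
    return total, path


# Toy example with three primitives and made-up scores.
table = {
    (0, 1): [("a", 0.9, 0.9)],
    (1, 2): [("b", 0.5, 0.5)],
    (2, 3): [("c", 0.9, 0.9)],
    (0, 2): [("x", 0.4, 0.4)],
    (1, 3): [("y", 0.9, 0.9)],
}
total, path = best_segmentation(
    n_primitives=3,
    max_span=2,
    candidates=lambda i, j: table.get((i, j), []),
    bigram=lambda prev, cur: 0.5,  # uniform placeholder for the semantic model
)
```

In this toy example the search merges the second and third primitives into one character, so `path` is `[(0, 1, "a"), (1, 3, "y")]`. Because the semantic model is first-order, keeping one best score per (position, last label) pair is enough to recover the globally best path under these assumptions.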

Full text