Abstract

A method for detecting characters that were wrongly substituted, deleted, or inserted in the interior of Japanese sentences and 'bunsetsu', based on an mth-order Markov chain model, was proposed earlier and found to be useful. With that method, however, erroneous characters at the end of a sentence or 'bunsetsu' are difficult to detect, because the Markov chain probabilities for an erroneous character at the end position do not remain below the critical value T the same number of times as they do for an interior error. This paper proposes a method for detecting erroneous characters at the end of sentences and 'bunsetsu' that uses a 'skipped Markov chain model' in addition to the 'connected Markov chain model'. Experiments with newspaper articles show that the proposed method is useful for correcting erroneous characters located at the end of sentences and 'bunsetsu'.
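The end-position difficulty described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain count-based mth-order character model, uses Latin letters as stand-ins for Japanese characters, and the names `train`, `low_probability_positions`, and `windows_covering` are hypothetical. The 'skipped Markov chain model' is not sketched, because the abstract does not define it.

```python
from collections import Counter


def train(corpus, m):
    """Count (m+1)-grams and their m-character contexts in a list of strings."""
    grams, contexts = Counter(), Counter()
    for text in corpus:
        for i in range(m, len(text)):
            grams[text[i - m:i + 1]] += 1
            contexts[text[i - m:i]] += 1
    return grams, contexts


def low_probability_positions(text, grams, contexts, m, threshold):
    """Return indices i whose estimated P(text[i] | previous m chars) < threshold.

    Assumption: an unseen context is given probability 0.0 (no smoothing).
    """
    low = []
    for i in range(m, len(text)):
        seen = contexts[text[i - m:i]]
        prob = grams[text[i - m:i + 1]] / seen if seen else 0.0
        if prob < threshold:
            low.append(i)
    return low


def windows_covering(position, length, m):
    """Number of chain probabilities that involve the character at `position`."""
    return sum(1 for i in range(m, length) if i - m <= position <= i)


# Toy data: a second-order model trained on one repetitive string.
grams, contexts = train(["abcabcabc"], m=2)
interior = low_probability_positions("abcabxabc", grams, contexts, 2, 0.5)
at_end = low_probability_positions("abcabcabx", grams, contexts, 2, 0.5)
```

In this toy setting an interior error takes part in m + 1 consecutive chain probabilities (once as the predicted character, m times as context), while an error in the final position takes part in only one. A detector that waits for a run of low probabilities therefore has less evidence at the end position, which is one reading of why the abstract adds a second model there.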

  • Publication date: 2011-3
