Abstract
Word segmentation or word extraction is always the first step of subject extraction. Because Chinese text has no delimiters between words, segmenting it is rather complicated. This paper puts forward a novel text subject extraction method based on contextual co-occurrence and describes an approach to extracting the subject sentence from Chinese text using character contextual co-occurrence data. The new approach is fast, skips the segmentation step, and can be applied to texts of multiple styles. The results of three experiments show that the approach achieves high accuracy on multi-style text, reaching 77.19% on news text. A comparative experiment shows no loss of accuracy.
- Publication date: 2003
- Affiliation: Shanghai Jiao Tong University
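
The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of the general idea of picking a subject sentence from character co-occurrence without word segmentation. It is not the paper's algorithm. The function names (`split_sentences`, `char_pairs`, `subject_sentence`), the use of adjacent character pairs as the co-occurrence unit, and the averaging score are all assumptions made for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Split on common Chinese and Western sentence-ending punctuation.
    parts = re.split(r"[。！？!?\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def char_pairs(sentence):
    # Adjacent character pairs stand in for "contextual co-occurrence"
    # (an assumption). Punctuation and whitespace are dropped first.
    chars = [c for c in sentence if c.isalnum()]
    return list(zip(chars, chars[1:]))


def subject_sentence(text):
    # Hypothetical scoring: a sentence scores higher when more of its
    # character pairs recur elsewhere in the document.
    sentences = split_sentences(text)
    counts = Counter(p for s in sentences for p in char_pairs(s))

    def score(sentence):
        pairs = char_pairs(sentence)
        if not pairs:
            return 0.0
        # Only pairs seen more than once contribute; normalise by length.
        return sum(counts[p] for p in pairs if counts[p] > 1) / len(pairs)

    return max(sentences, key=score) if sentences else ""


if __name__ == "__main__":
    sample = "上海交通大学位于上海。交通大学历史悠久。今天天气很好。"
    print(subject_sentence(sample))
```

Because the sketch works on raw characters, no dictionary or segmenter is needed, which mirrors the motivation given in the abstract. A real system would likely need a wider context window and a more careful weighting scheme.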