
Automatically partitioning Chinese text into sentence groups is important for discourse-based statistical machine translation systems. This paper presents an approach to this problem. First, each sentence in a discourse is represented as a feature vector. Second, a special hierarchical clustering algorithm is applied to represent the discourse as a sentence group tree. A local reoccurrence measure is proposed for selecting key phrases and evaluating their weights. Experimental results show that our approach is promising.
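
The two steps can be illustrated with a minimal sketch. It is not the algorithm used in this paper: the abstract does not specify the clustering procedure or the local reoccurrence measure, so the sketch assumes that only adjacent segments may merge (so that every group is a contiguous run of sentences) and uses raw term counts as a placeholder for key-phrase weights. The names `cosine` and `build_group_tree` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_group_tree(sentences):
    """Merge the most similar pair of adjacent segments until one tree remains.

    Each sentence is a non-empty list of tokens, and at least one sentence
    is required. Leaves are sentence indices; internal nodes are
    (left, right) tuples.
    """
    # Assumption: raw term counts stand in for the paper's key-phrase weights.
    nodes = [(i, Counter(tokens)) for i, tokens in enumerate(sentences)]
    while len(nodes) > 1:
        # Assumption: only neighbouring segments merge, so groups stay contiguous.
        best = max(range(len(nodes) - 1),
                   key=lambda i: cosine(nodes[i][1], nodes[i + 1][1]))
        (left, lv), (right, rv) = nodes[best], nodes[best + 1]
        # The merged segment's vector is the sum of its children's vectors.
        nodes[best:best + 2] = [((left, right), lv + rv)]
    return nodes[0][0]
```

Given four tokenized sentences in which the first two share their vocabulary and the last two share a different vocabulary, `build_group_tree` pairs sentences 0 and 1, then 2 and 3, before joining the two groups at the root. Cutting such a tree at a chosen depth yields a flat partition into sentence groups.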
