Abstract

Link detection (LDT) is the task, within the Topic Detection and Tracking (TDT) evaluation, of determining whether two stories discuss the same topic. The key issue is to measure the relevance between two stories correctly. Most research on LDT describes a story (a text devoted to reporting news) as a set of independent words, and determines the relevance between two stories from the proportion and weights of the words they share. Although this approach has brought substantial improvement, inadequate description of word sense and semantics still harms the accuracy of LDT. In this paper we propose an online semantic tree, which is constructed hierarchically from the most relevant words extracted from previous story streams. In an online semantic tree, the sense of a word is described by a series of words forming a closed sense loop, and the semantic relation among words is measured by the depth and width of the levels at which the words are located. For LDT, an online semantic tree is built for each story, and the relevance between two stories is determined by measuring the KL divergence between their online semantic trees. The method performs well on the TDT4 corpus: in testing, its Min Norm CDet is 0.2274 lower than that of the baseline.
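To make the final comparison step concrete, the following is a minimal sketch of a KL divergence between two stories. It is not the method of the paper: the abstract does not say how an online semantic tree is turned into a probability distribution, so the sketch assumes each story has already been reduced to a plain list of words. The names `word_distribution` and `story_divergence`, the additive smoothing constant `alpha`, and the direction of the divergence are all illustrative assumptions.

```python
import math
from collections import Counter


def word_distribution(words, vocab, alpha=1.0):
    """Smoothed unigram distribution over vocab (additive smoothing, assumed)."""
    counts = Counter(words)
    total = len(words) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """D_KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def story_divergence(words_a, words_b, alpha=1.0):
    """KL divergence from story A to story B over their shared vocabulary.

    Assumption: each story is a flat word list standing in for its
    online semantic tree; the tree structure itself is not modelled here.
    """
    vocab = set(words_a) | set(words_b)
    p = word_distribution(words_a, vocab, alpha)
    q = word_distribution(words_b, vocab, alpha)
    return kl_divergence(p, q)


if __name__ == "__main__":
    a = ["earthquake", "rescue", "earthquake", "aid"]
    b = ["earthquake", "aid", "relief"]
    c = ["election", "vote", "candidate"]
    # A lower divergence suggests the two stories are more likely linked.
    print(story_divergence(a, b), story_divergence(a, c))
```

Under these assumptions, a pair of stories would be judged linked when the divergence falls below a threshold tuned on held-out data. Smoothing keeps every probability non-zero, so the logarithm is always defined.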

Full text