Abstract

Word sense induction (WSI) is important to many natural language processing tasks because word sense ambiguity is pervasive in linguistic expressions. Most existing WSI algorithms cannot capture both lexical semantics and syntactic relations without excessive task-specific feature engineering. Moreover, it remains a challenge to find a sense clustering method that can automatically and properly determine the number of senses of a polysemous word. In this paper, we learn continuous semantic space representations for ambiguous instances via recursive context composition, which allows us to capture lexical semantics and syntactic relations simultaneously. Using the learned representations, we further adapt rival penalization competitive learning to perform instance-based word sense clustering, which allows us to determine the number of word senses automatically. We validate the effectiveness of our method on the SEMEVAL-2010 WSI dataset. Experimental results show that our method improves the quality of word sense clustering over several competitive baselines.

Full text