Abstract

This paper presents a supervised framework for extracting keywords from meeting transcripts, a genre that differs significantly from written text and from other speech domains such as broadcast news. In addition to the traditional frequency- or position-based clues, we investigate a variety of novel features, including linguistically motivated term specificity features, decision-making sentence-related features, prosodic prominence scores, and a group of features derived from summary sentences. To generate better system summaries, we propose a feedback loop mechanism under a supervised framework to leverage the relationship between keywords and summary sentences. Experiments are performed on the ICSI meeting corpus using both human transcripts and automatic speech recognition (ASR) outputs. Results show that the proposed supervised framework outperforms both unsupervised term frequency-inverse document frequency (TF-IDF) weighting and a supervised keyphrase extraction system known for its satisfactory performance on written text. We conduct extensive analysis to demonstrate the effectiveness of the newly proposed features and of the feedback mechanism used to generate summaries. Furthermore, we show promising results using n-best recognition output to address the problem of recognition errors.

  • Publication date: 2011-3