摘要

In this work, we formulate the problem of keyword spotting as a non-uniform error automatic speech recognition (ASR) problem and propose a model training methodology based on the non-uniform minimum classification error (MCE) approach. The main idea is to adapt the fundamental MCE criteria to reflect the cost-sensitive notion in that errors on keywords are much more significant than errors on non-keywords in an automatic speech recognition task. The notion of cost sensitivity leads to emphasis of keyword models in parameter optimization. Then we present a system which takes advantage of the weighted finite-state transducer (WFST) framework to efficiently implement the non-uniform MCE. To enhance the approach of non-uniform error cost minimization for keyword spotting, we further formulate a technique called "adaptive boosted non-uniform MCE" which incorporates the idea of boosting. We validate the proposed framework on two challenging large-scale spontaneous conversational telephone speech (CTS) datasets in two different languages (English and Mandarin). Experimental results show our framework can achieve consistent and significant spotting performance gains over both the maximum likelihood estimation (MLE) baseline and conventional discriminatively-trained systems with uniform error cost.

  • 出版日期2015-2