Abstract

Large vocabulary continuous speech recognition is particularly difficult for low-resource languages. In the scenario we focus on here, there is a very limited amount of acoustic training data in the target language, but more plentiful data in other languages. We investigate both feature-level and model-level approaches. The first is based on the MLP framework, in which we train multiple feature streams individually using the Automatic Speech Attribute Transcription (ASAT) strategy and a data sampling method, and we present a multilingual training mode that uses data from the non-target languages to obtain more discriminative features. At the model level, we apply the recently proposed Subspace Gaussian Mixture Model (SGMM) to obtain further gains. Finally, by combining the two strategies in the multilingual training mode, we achieve a large improvement of more than 13% absolute over a conventional baseline.

Full text