Abstract

Batch normalization has shown success in image classification and other image processing tasks by reducing internal covariate shift during the training of deep network models. In this paper, we propose applying batch normalization to speech recognition within the hybrid NN-HMM framework. We evaluate the performance of this method in the acoustic model of the hybrid system on a speaker-independent speech recognition task using several Chinese datasets. Compared with the previous best model we used on these Chinese datasets, batch normalization achieves a relative word error rate (WER) reduction of 8%-13%, while requiring only 60% of the training iterations of the original model.