Abstract

This paper presents a pitch-range (PR) based feature set for age and gender classification. PR features represent pitch changes over time. The performance of the proposed feature set is compared with that of Mel-frequency cepstral coefficients (MFCCs), energy, relative spectral transform-perceptual linear prediction (RASTA-PLP), and fundamental frequency (F0) features. Voice activity detection (VAD) is performed to extract speech utterances before feature extraction. Two classifiers, k-Nearest Neighbors (kNN) and Support Vector Machines (SVM), are used to evaluate the effectiveness of the feature sets. Experimental results are reported for the aGender database. Both the kNN and SVM classifiers achieve their highest accuracy rates with the proposed PR feature set in age + gender and age classification. In age + gender classification using 3PR features with the SVM classifier, middle-aged female speakers are recognized with an accuracy of 92.86%, followed by senior female speakers with 83.61%, children with 83.02%, middle-aged male speakers with 73.58%, young female speakers with 67.35%, and senior male speakers with 34.33%. Low classification accuracies are observed for young male speakers.

  • Publication date: 2015-11