Abstract

The classification of human age and gender from speech and face images is a challenging task with important real-life applications that are expected to grow in the future. Deep neural networks (DNNs) and convolutional neural networks (CNNs) are among the state-of-the-art feature extractors and classifiers and have proven very effective on problems with complex feature spaces. In this work, we propose a new cost function for fine-tuning two DNNs jointly. The proposed cost function is evaluated on speech utterances and unconstrained face images for the age and gender classification task. The proposed classifier design consists of two DNNs trained on different feature sets extracted from the same input data. For speech, Mel-frequency cepstral coefficients (MFCCs) together with the fundamental frequency (F0) form the first feature set, and shifted delta cepstral (SDC) coefficients form the second. For face images, facial appearance forms the first feature set and depth information forms the second. Jointly training the two DNNs with the proposed cost function improved classification accuracy and reduced over-fitting for both the speech-based and the image-based systems. Extensive experiments were conducted to evaluate the performance and accuracy of the proposed work on two publicly available databases: the Age-Annotated Database of German Telephone Speech (aGender) and the Adience database. The overall accuracy of the proposed system is 56.06% for seven speaker classes, and the overall exact accuracy is 63.78% for the Adience database.
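The abstract does not specify the form of the proposed joint cost function. As a minimal illustrative sketch only, the following assumes a common formulation for jointly fine-tuning two classifiers on different feature sets of the same input: the sum of each network's cross-entropy loss plus a symmetric-KL agreement penalty that encourages the two DNNs to produce consistent posteriors. The function names, the agreement term, and the weight `lam` are assumptions, not the paper's actual design.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def joint_cost(logits_a, logits_b, labels, lam=0.1):
    """Hypothetical joint cost: cross-entropy of both networks plus a
    symmetric KL term penalizing disagreement between their outputs."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    n = labels.shape[0]
    eps = 1e-12
    ce_a = -np.log(pa[np.arange(n), labels] + eps).mean()
    ce_b = -np.log(pb[np.arange(n), labels] + eps).mean()
    # symmetric KL divergence between the two posterior distributions
    kl = 0.5 * ((pa * np.log((pa + eps) / (pb + eps))).sum(axis=1)
                + (pb * np.log((pb + eps) / (pa + eps))).sum(axis=1)).mean()
    return ce_a + ce_b + lam * kl
```

Under this sketch, gradients of the agreement term flow into both networks, so each DNN is regularized by the other's predictions, which is one plausible way a joint cost could reduce over-fitting relative to training the two networks independently.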

  • Publication date: 2017-11-1