Audio Recapture Detection With Convolutional Neural Networks

Lin, Xiaodan*; Liu, Jingxian; Kang, Xiangui

doi:10.1109/TMM.2016.2571999

Abstract

In this paper, we investigate how deep neural networks can effectively learn features for audio forensic problems. Using a preliminary feature preprocessing step based on electric network frequency (ENF) analysis, we propose a convolutional neural network (CNN) for training on, and classifying, genuine and recaptured audio recordings. The network learns hierarchical representations that capture the ENF components at multiple levels of detail, and these representations are then used for classification. The proposed method works on short audio clips of 2-second duration, whereas state-of-the-art methods may fail on clips this short. Experimental results demonstrate that the proposed network achieves high detection accuracy when each ENF harmonic component is represented as a single-channel input. Performance improves further with a combined input representation that incorporates both the fundamental ENF and its harmonics. We also study the convergence behavior of the network and the effect of analysis windows of various sizes. A performance comparison against a support tensor machine demonstrates the advantage of using a CNN for audio recapture detection. Moreover, visualizing the intermediate feature maps provides some insight into what the deep neural network actually learns and how it makes decisions.
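As a rough illustration of the kind of ENF-based preprocessing described above, the following sketch extracts narrow spectral bands around a nominal grid frequency and its harmonics from a short clip and stacks them as input channels. It is not the authors' implementation: the helper name `enf_feature_map`, the 50 Hz nominal frequency, the window length, hop, FFT size, the ±1 Hz band, and the per-channel max normalization are all illustrative assumptions.

```python
import numpy as np


def enf_feature_map(x, fs, nominal_hz=50.0, harmonics=(1, 2, 3),
                    win_len=4096, hop=1024, nfft=32768, half_band_hz=1.0):
    """Hypothetical ENF preprocessing sketch (not the paper's exact method).

    For each harmonic k, keep the STFT magnitude in a narrow band around
    k * nominal_hz and stack the bands as channels.
    Returns an array of shape (len(harmonics), n_bins, n_frames).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < win_len:
        raise ValueError("clip is shorter than the analysis window")
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    # Windowed frames as columns: shape (win_len, n_frames).
    frames = np.stack(
        [x[i * hop:i * hop + win_len] * window for i in range(n_frames)], axis=1
    )
    # Zero-padded FFT (nfft > win_len) gives finer bin spacing around the ENF.
    spec = np.abs(np.fft.rfft(frames, n=nfft, axis=0))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    bin_width = fs / nfft
    n_bins = int(round(2 * half_band_hz / bin_width)) + 1
    channels = []
    for k in harmonics:
        center = k * nominal_hz
        # First bin of the band: the bin closest to (center - half_band_hz).
        start = int(np.argmin(np.abs(freqs - (center - half_band_hz))))
        band = spec[start:start + n_bins, :]
        # Scale each channel to a maximum of about 1 so weak harmonics stay visible.
        channels.append(band / (band.max() + 1e-12))
    return np.stack(channels, axis=0)
```

Calling it with `harmonics=(1,)` yields a single-channel input, analogous to feeding one ENF component at a time, while several harmonics produce a multi-channel stack, which is one possible way to build a combined representation. Changing `win_len` trades time resolution for frequency resolution; that trade-off is what a study of analysis-window sizes explores.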

Publication date: 2016-08
Affiliations: Sun Yat-sen University; Huaqiao University