Shallow and deep feature fusion for digital audio tampering detection

Bibliographic Details
Main Authors: Zhifeng Wang, Yao Yang, Chunyan Zeng, Shuai Kong, Shixiong Feng, Nan Zhao
Format: Article
Language: English
Published: SpringerOpen 2022-08-01
Series: EURASIP Journal on Advances in Signal Processing
Subjects: Electronic network frequency; Audio forensics; Deep learning; Feature fusion
Online Access: https://doi.org/10.1186/s13634-022-00900-4
_version_ 1811215870011113472
author Zhifeng Wang
Yao Yang
Chunyan Zeng
Shuai Kong
Shixiong Feng
Nan Zhao
collection DOAJ
description Abstract Digital audio tampering detection can be used to verify the authenticity of digital audio. However, most current methods either compare the ENF continuity of the audio visually against a standard electronic network frequency (ENF) database or extract features for classification by machine learning methods. ENF databases are usually difficult to obtain, visual methods have weak feature representation, and machine learning methods lose more information in their features, resulting in low detection accuracy. This paper proposes a method that fuses shallow and deep features to make full use of ENF information, exploiting the complementary nature of features at different levels to describe more accurately the inconsistencies that tampering operations introduce into raw digital audio. First, the audio signal is band-pass filtered to obtain the ENF component, and the discrete Fourier transform (DFT) and Hilbert transform are applied to obtain the phase and instantaneous frequency of the ENF component. Second, the mean value of the sequence variation is used as the shallow feature; the feature matrix obtained by framing and reshaping the ENF sequence is used as the input of a convolutional neural network (CNN); and fitting-coefficient features are obtained by curve fitting. Third, the CNN extracts the local details of the ENF from the feature matrix, and a deep neural network (DNN) extracts the global information of the ENF from the fitting-coefficient features; together, the global and local information form the deep features of the ENF. The shallow and deep features are then fused using an attention mechanism, which gives greater weight to features useful for classification and suppresses invalid features. Finally, tampered audio is detected by dimensionality reduction and fitting with a DNN containing two fully connected layers, and classification is performed using a Softmax layer. The method achieves 97.03% accuracy on three classic databases: Carioca 1, Carioca 2, and New Spanish. In addition, it achieves an accuracy of 88.31% on the newly constructed database GAUDI-DI. Experimental results show that the proposed method outperforms state-of-the-art methods.
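The first steps named in the abstract (band-pass filtering, Hilbert transform, instantaneous frequency, and a shallow feature built from the sequence variation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the ideal FFT-mask band-pass, the 49-51 Hz band, the 50 Hz nominal frequency, the 1000 Hz sampling rate, the synthetic test signal, and all function names are assumptions made here, and "mean value of the sequence variation" is read as the mean absolute sample-to-sample change. The DFT-based phase estimate, the CNN/DNN branches, and the attention fusion are not shown.

```python
import numpy as np


def bandpass_fft(x, fs, f_lo, f_hi):
    """Ideal band-pass: zero every FFT bin outside [f_lo, f_hi] Hz.

    Assumption: stands in for the paper's band-pass filter, whose design
    is not given in the abstract.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def analytic_signal(x):
    """Analytic signal x + j*H{x}, built by suppressing negative frequencies."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)


def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz from the unwrapped analytic phase."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)


def shallow_feature(inst_freq):
    """Mean absolute sample-to-sample variation (one reading of the abstract)."""
    return float(np.mean(np.abs(np.diff(inst_freq))))


def make_demo(fs=1000, seconds=4, nominal=50.0, splice=False):
    """Hypothetical test signal: mains hum plus noise, optionally with a phase jump."""
    t = np.arange(int(fs * seconds)) / fs
    hum = np.sin(2 * np.pi * nominal * t)
    if splice:
        half = len(t) // 2
        # Simulate a splice: the second half restarts with a shifted phase.
        hum[half:] = np.sin(2 * np.pi * nominal * t[half:] + np.pi / 2)
    noise = 0.05 * np.random.default_rng(0).standard_normal(len(t))
    return hum + noise


if __name__ == "__main__":
    fs = 1000
    for label, spliced in (("original", False), ("spliced", True)):
        enf = bandpass_fft(make_demo(fs=fs, splice=spliced), fs, 49.0, 51.0)
        f_inst = instantaneous_frequency(enf, fs)
        print(label, "shallow feature:", shallow_feature(f_inst))
```

On this synthetic signal the spliced recording should yield a larger variation feature than the untouched one, which is the kind of inconsistency the shallow feature is meant to expose; real recordings would need a proper filter design and edge handling.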
first_indexed 2024-04-12T06:30:49Z
format Article
id doaj.art-e5d9227ddd77471e88f5a1dcb81a70e3
institution Directory Open Access Journal
issn 1687-6180
language English
last_indexed 2024-04-12T06:30:49Z
publishDate 2022-08-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Advances in Signal Processing
spelling doaj.art-e5d9227ddd77471e88f5a1dcb81a70e3 2022-12-22T03:44:02Z eng SpringerOpen EURASIP Journal on Advances in Signal Processing 1687-6180 2022-08-01 20221120 10.1186/s13634-022-00900-4 Shallow and deep feature fusion for digital audio tampering detection
Zhifeng Wang (0), Yao Yang (1), Chunyan Zeng (2), Shuai Kong (3), Shixiong Feng (4), Nan Zhao (5)
(0) Department of Digital Media Technology, Central China Normal University
(1) Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology
(2) Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology
(3) Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology
(4) Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology
(5) Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology
title Shallow and deep feature fusion for digital audio tampering detection
topic Electronic network frequency
Audio forensics
Deep learning
Feature fusion
url https://doi.org/10.1186/s13634-022-00900-4