Spoofing speech detection using temporal convolutional neural network

Spoofing speech detection aims to differentiate spoofing speech from natural speech. Frame-based features are used in most previous work. Although multiple frames or dynamic features are used to form a super-vector to represent the temporal information, the time span covered by these fea...

Bibliographic details
Main authors:	Xiao, Xiong; Li, Haizhou; Tian, Xiaohai; Chng, Eng Siong
Other authors:	School of Computer Science and Engineering
Format:	Conference Paper
Language:	English
Published:	2018
Subjects:	DRNTU::Engineering::Computer science and engineering; Convolutional Neural Network (CNN); Speech Detection
Online access:	https://hdl.handle.net/10356/89639 http://hdl.handle.net/10220/47064

_version_	1826130578410307584
author	Xiao, Xiong Li, Haizhou Tian, Xiaohai Chng, Eng Siong
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering Xiao, Xiong Li, Haizhou Tian, Xiaohai Chng, Eng Siong
author_sort	Xiao, Xiong
collection	NTU
description	Spoofing speech detection aims to differentiate spoofing speech from natural speech. Frame-based features are used in most previous work. Although multiple frames or dynamic features are used to form a super-vector to represent the temporal information, the time span covered by these features is not sufficient. Most systems fail to detect non-vocoder or unit-selection-based spoofing attacks. In this work, we propose to use a temporal convolutional neural network (CNN) based classifier for spoofing speech detection. The temporal CNN first convolves the feature trajectories with a set of filters, then extracts the maximum responses of these filters within a time window using a max-pooling layer. Thanks to max-pooling, we can extract useful information from a long temporal span without concatenating a large number of neighbouring frames, as is done in a feedforward deep neural network (DNN). Five types of features are employed to assess the performance of the proposed classifier. Experimental results on the ASVspoof 2015 corpus show that the temporal CNN based classifier is effective for synthetic speech detection. Specifically, the proposed method brings a significant performance boost for unit-selection-based spoofing speech detection.
first_indexed	2024-10-01T07:58:39Z
format	Conference Paper
id	ntu-10356/89639
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T07:58:39Z
publishDate	2018
record_format	dspace
spelling	ntu-10356/89639 2020-03-07T11:48:46Z Spoofing speech detection using temporal convolutional neural network Xiao, Xiong Li, Haizhou Tian, Xiaohai Chng, Eng Siong School of Computer Science and Engineering 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) NTU-UBC Research Centre of Excellence in Active Living for the Elderly Temasek Laboratories DRNTU::Engineering::Computer science and engineering Convolutional Neural Network (CNN) Speech Detection Spoofing speech detection aims to differentiate spoofing speech from natural speech. Frame-based features are used in most previous work. Although multiple frames or dynamic features are used to form a super-vector to represent the temporal information, the time span covered by these features is not sufficient. Most systems fail to detect non-vocoder or unit-selection-based spoofing attacks. In this work, we propose to use a temporal convolutional neural network (CNN) based classifier for spoofing speech detection. The temporal CNN first convolves the feature trajectories with a set of filters, then extracts the maximum responses of these filters within a time window using a max-pooling layer. Thanks to max-pooling, we can extract useful information from a long temporal span without concatenating a large number of neighbouring frames, as is done in a feedforward deep neural network (DNN). Five types of features are employed to assess the performance of the proposed classifier. Experimental results on the ASVspoof 2015 corpus show that the temporal CNN based classifier is effective for synthetic speech detection. Specifically, the proposed method brings a significant performance boost for unit-selection-based spoofing speech detection. NRF (Natl Research Foundation, S’pore) Accepted version 2018-12-18T07:45:21Z 2019-12-06T17:30:03Z 2018-12-18T07:45:21Z 2019-12-06T17:30:03Z 2016-12-01 2016 Conference Paper Tian, X., Xiao, X., Chng, E. S., & Li, H. (2016). Spoofing speech detection using temporal convolutional neural network. 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). doi:10.1109/APSIPA.2016.7820738 https://hdl.handle.net/10356/89639 http://hdl.handle.net/10220/47064 10.1109/APSIPA.2016.7820738 200465 en © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/APSIPA.2016.7820738]. 6 p. application/pdf
spellingShingle	DRNTU::Engineering::Computer science and engineering Convolutional Neural Network (CNN) Speech Detection Xiao, Xiong Li, Haizhou Tian, Xiaohai Chng, Eng Siong Spoofing speech detection using temporal convolutional neural network
title	Spoofing speech detection using temporal convolutional neural network
title_full	Spoofing speech detection using temporal convolutional neural network
title_fullStr	Spoofing speech detection using temporal convolutional neural network
title_full_unstemmed	Spoofing speech detection using temporal convolutional neural network
title_short	Spoofing speech detection using temporal convolutional neural network
title_sort	spoofing speech detection using temporal convolutional neural network
topic	DRNTU::Engineering::Computer science and engineering Convolutional Neural Network (CNN) Speech Detection
url	https://hdl.handle.net/10356/89639 http://hdl.handle.net/10220/47064
work_keys_str_mv	AT xiaoxiong spoofingspeechdetectionusingtemporalconvolutionalneuralnetwork AT lihaizhou spoofingspeechdetectionusingtemporalconvolutionalneuralnetwork AT tianxiaohai spoofingspeechdetectionusingtemporalconvolutionalneuralnetwork AT chngengsiong spoofingspeechdetectionusingtemporalconvolutionalneuralnetwork
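
The description field above outlines the classifier's front end: feature trajectories are convolved along time with a set of filters, and a max-pooling layer keeps the largest response of each filter within a time window. The following is a minimal pure-Python sketch of that idea only. The function names, the filter shape (K frames by D feature dimensions), the non-overlapping pooling windows, and the toy input are illustrative assumptions; the record does not state the paper's actual filter sizes, window lengths, or features.

```python
from typing import List

Frames = List[List[float]]  # T frames, each holding D feature values


def temporal_convolve(frames: Frames, filt: Frames) -> List[float]:
    """Slide one filter (K frames x D dims) along the time axis.

    Only positions where the filter fits entirely are evaluated, so the
    result has T - K + 1 responses.
    """
    k = len(filt)
    responses = []
    for t in range(len(frames) - k + 1):
        acc = 0.0
        for dk in range(k):
            for d, weight in enumerate(filt[dk]):
                acc += weight * frames[t + dk][d]
        responses.append(acc)
    return responses


def max_pool(responses: List[float], window: int) -> List[float]:
    """Keep the maximum response in each non-overlapping time window."""
    return [max(responses[i:i + window])
            for i in range(0, len(responses), window)]


def temporal_cnn_features(frames: Frames, filters: List[Frames],
                          window: int) -> List[List[float]]:
    """Convolve with every filter, then max-pool each response sequence."""
    return [max_pool(temporal_convolve(frames, f), window) for f in filters]


if __name__ == "__main__":
    # Toy 1-dimensional trajectory with two bumps (assumed data).
    toy_frames = [[0.0], [0.0], [1.0], [0.0], [0.0], [0.0], [2.0], [0.0]]
    # Hypothetical 2-frame filter that responds to a rising value.
    rise_filter = [[-1.0], [1.0]]
    print(temporal_cnn_features(toy_frames, [rise_filter], window=4))
```

In this sketch each pooled value summarizes a whole window of frames while the filter itself spans only K frames, which is how pooling widens the temporal context without stacking many neighbouring frames into one input vector.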
