Evaluating naturalness of voice impersonations by subjective and objective measures

This project investigates why some impersonated voices are able to deceive people and whether it is possible to quantify the voices. As there is a rise in crimes related to voice impersonation, it is important to know how a person changes his/her voice to another person’s voice and what...

Bibliographic Details
Main Author: Ng, Chen Yi.
Other Authors: Pina Marziliano
Format: Final Year Project (FYP)
Language: English
Published: 2013
Subjects: DRNTU::Engineering
Online Access: http://hdl.handle.net/10356/54271
_version_ 1811681014642114560
author Ng, Chen Yi.
author2 Pina Marziliano
author_facet Pina Marziliano
Ng, Chen Yi.
author_sort Ng, Chen Yi.
collection NTU
description This project investigates why some impersonated voices are able to deceive people and whether such voices can be quantified. As crimes related to voice impersonation are on the rise, it is important to know how a person changes his/her voice into another person’s voice and which factors a human listener uses to determine whether a voice is disguised. Both subjective and objective measures are used in this project. The project has a database of voices from three speakers. Each speaker has 9 voices: 8 impersonated voices and 1 natural voice. Each voice has 9 sentences, so there are 243 files in total (3 × 9 × 9) for subjective testing. For the subjective measure, a trial test was conducted to find out how well a person can distinguish impersonated voices. A random generator was created to randomize the voices in the database that are used in the trial test. A graphical user interface was built to play the voices one by one and to let the listener record his/her decision on whether each voice is disguised. Each listener rated all voices from the 3 speakers. The test showed that about 86% of the listeners were able to correctly identify the natural voice of the speakers. All of the listeners were also able to identify 5 out of 8 impersonated voices from each speaker. The objective measure examines the effects of changing the pitch and formants of a voice, and the range of pitch and formants over which a synthesized voice sounds natural to a human. Pitch refers to the fundamental frequency of the voice. Formants are defined as the spectral peaks of the sound spectrum |p(f)| [1], and they denote the vowels. The investigation found that changing the pitch of a voice changes the perceived gender of the source voice, and changing the formants changes the perceived age category of the voice. If the source voice is that of a middle-aged male, changing the formants can turn it into the voice of a young male or of a child. For a voice to sound natural, a correct combination of pitch and formants is required. Over a pitch range of 50 Hz to 500 Hz with a step size of 30 Hz and a formant range of 0.1 to 2 with a step size of 0.1, there is at least one natural-sounding synthesized voice at each pitch step. These two parameters make it possible to quantify a voice, since the range of pitch and formants over which a synthesized voice sounds natural can be found. Humans can clearly differentiate between a disguised voice and a person’s everyday voice. For future work, the subjective testing can be carried out on a larger scale to obtain more accurate results and to cover more parameters, such as the uniqueness and the naturalness of a voice. The database used for the test can be expanded with the natural-sounding synthesized voices from the objective measurement. For the objective measurement, a more accurate range of formants that changes the age category of a voice can be investigated.
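The counts and ranges given in the description can be checked with a short sketch. This is only an illustration under stated assumptions, not the project's actual code: the function names, the integer labels for speakers, voices and sentences, the fixed seed, and the inclusion of both range endpoints are all hypothetical, and the description does not say how the formant values (0.1 to 2) are applied to a voice.

```python
import itertools
import random

# Counts taken from the description; the integer labels are assumed.
SPEAKERS = 3
VOICES_PER_SPEAKER = 9    # 8 impersonated + 1 natural
SENTENCES_PER_VOICE = 9


def build_playlist(seed=0):
    """Return every (speaker, voice, sentence) triple in a shuffled order."""
    items = list(itertools.product(
        range(1, SPEAKERS + 1),
        range(1, VOICES_PER_SPEAKER + 1),
        range(1, SENTENCES_PER_VOICE + 1),
    ))
    # A seeded generator keeps the shuffled order reproducible (assumption).
    random.Random(seed).shuffle(items)
    return items


def parameter_grid():
    """Return the pitch values (Hz) and formant values from the stated ranges."""
    pitches = list(range(50, 501, 30))                      # 50, 80, ..., 500
    formants = [round(0.1 * k, 1) for k in range(1, 21)]    # 0.1, 0.2, ..., 2.0
    return pitches, formants


if __name__ == "__main__":
    playlist = build_playlist()
    pitches, formants = parameter_grid()
    print(len(playlist), len(pitches), len(formants))
```

With both endpoints included, the stated ranges give 16 pitch values and 20 formant values, so an exhaustive sweep would cover 320 pitch/formant combinations.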
first_indexed 2024-10-01T03:34:13Z
format Final Year Project (FYP)
id ntu-10356/54271
institution Nanyang Technological University
language English
last_indexed 2024-10-01T03:34:13Z
publishDate 2013
record_format dspace
spelling ntu-10356/54271 2023-07-07T16:53:56Z Evaluating naturalness of voice impersonations by subjective and objective measures Ng, Chen Yi. Pina Marziliano School of Electrical and Electronic Engineering Talal Bin Amin DRNTU::Engineering Bachelor of Engineering 2013-06-18T04:17:08Z 2013-06-18T04:17:08Z 2013 2013 Final Year Project (FYP) http://hdl.handle.net/10356/54271 en Nanyang Technological University 97 p. application/pdf
spellingShingle DRNTU::Engineering
Ng, Chen Yi.
Evaluating naturalness of voice impersonations by subjective and objective measures
title Evaluating naturalness of voice impersonations by subjective and objective measures
topic DRNTU::Engineering
url http://hdl.handle.net/10356/54271
work_keys_str_mv AT ngchenyi evaluatingnaturalnessofvoiceimpersonationsbysubjectiveandobjectivemeasures