Voice-Based Gender Recognition Model Using FRT and Light GBM

Bibliographic Details
Main Authors: Priya Kannapiran, Mohamed Mansoor Roomi Sindha
Format: Article
Language: English
Published: Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek 2023-01-01
Series: Tehnički Vjesnik
Online Access: https://hrcak.srce.hr/file/417688
Description
Summary: Voice-based gender recognition is vital in many computer-aided voice analysis applications, such as human-computer interaction and fraudulent call identification. Training a machine learning model to classify a speaker as male or female from a voice signal requires a strongly discriminative feature. This work proposes a gradient boosting model used in conjunction with a novel Cumulative Point Index (CPI) feature, computed by the Forward Rajan Transform (FRT), for gender recognition from voice signals. First, the voice signals are preprocessed to remove non-informative silent periods, then framed and windowed so that each frame can be treated as approximately stationary. The CPI is then computed from the first coefficients of the FRT, and the values are concatenated to form a feature set that is used to train a Light Gradient Boosting Machine (LightGBM) to recognize the gender. This approach provides better accuracy and faster training than state-of-the-art techniques. Experimental results show that the FRT-based CPI feature (FRTCPI) outperforms other standard features used in the literature. The proposed features, in combination with LightGBM, are also shown to achieve an accuracy of 95.26% with a computation time of 2.25 s on large, challenging datasets such as the Speech Accent Archive, the Voice Gender Dataset, Common Voice, and the Texas Instruments/Massachusetts Institute of Technology corpus.
ISSN: 1330-3651, 1848-6339
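
Illustrative sketch: the feature-extraction steps named in the Summary (framing, windowing, taking the first FRT coefficient as the CPI) might look roughly like the Python below. This is not the authors' implementation. It assumes one commonly cited formulation of the Rajan Transform (recursive pairwise sums and absolute differences over a power-of-two-length sequence), a Hamming window, and arbitrary frame and hop sizes, none of which are specified in this record. All function names are hypothetical, and silence removal is omitted.

```python
import math


def rajan_transform(x):
    """Forward Rajan Transform (assumed formulation).

    Splits the sequence into halves, forms element-wise sums and absolute
    differences, and recurses on each. Length must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("sequence length must be a power of two")
    if n == 1:
        return list(x)
    half = n // 2
    sums = [x[i] + x[i + half] for i in range(half)]
    diffs = [abs(x[i] - x[i + half]) for i in range(half)]
    return rajan_transform(sums) + rajan_transform(diffs)


def hamming(n):
    """Hamming window of length n (window choice is an assumption)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames, dropping any incomplete tail."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def cpi_features(signal, frame_len=256, hop=128):
    """Return one CPI value (first FRT coefficient) per windowed frame.

    frame_len and hop are placeholder values, not taken from the article.
    """
    window = hamming(frame_len)
    features = []
    for frame in frame_signal(signal, frame_len, hop):
        windowed = [s * w for s, w in zip(frame, window)]
        features.append(rajan_transform(windowed)[0])
    return features
```

Under this formulation the first coefficient equals the sum of the frame's samples, so `cpi_features` yields one scalar per frame. The resulting per-recording feature vectors could then be passed to a gradient boosting classifier such as `lightgbm.LGBMClassifier`; how the article concatenates the values and configures the model is not stated here.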