EMD-based method to improve the efficiency of speech/pause segmentation

Bibliographic Details
Main Authors:	A.K. Alimuradov, A.Yu. Tychkov, P.P. Churakov, A.V. Ageykin, A.V. Kuz'min, M.A. Mitrokhin, I.A. Chernov
Format:	Article
Language:	English
Published:	Penza State University Publishing House 2021-09-01
Series:	Известия высших учебных заведений. Поволжский регион:Технические науки (University Proceedings. Volga Region. Technical Sciences)
Subjects:	speech signal processing; speech segmentation; voiced and unvoiced speech; empirical mode decomposition

Description
Summary:	Background. Speech/pause segmentation, that is, the accurate detection of the beginning and end boundaries of voiced speech, unvoiced speech, and pauses, is one of the most important tasks in speech applications. It is especially important when analyzing distribution speed, acceleration, and entropy of voiced and unvoiced speech sections and pauses, and when analyzing the average duration of pauses. The aim of the work is to improve the efficiency of speech/pause segmentation using a method based on empirical mode decomposition. Materials and methods. The work uses improved complete ensemble empirical mode decomposition with adaptive noise, a technique for the adaptive decomposition of non-stationary signals. The method was implemented in the MATLAB (MathWorks) mathematical modeling environment. Results. A decomposition-based method has been developed for use at the preprocessing stage: from the original speech signals it forms a set of new signals for analysis that contain the most reliable information about the beginning and end boundaries of voiced speech, unvoiced speech, and pauses. The influence of the decomposition method and of the duration of the analyzed signal fragments on the efficiency of speech/pause segmentation has been assessed. Segmentation methods based on the analysis of zero-crossing rate, short-term energy, and one-dimensional Mahalanobis distance were used. Conclusions. The research results show that the proposed method increases the efficiency of segmentation of voiced and unvoiced speech sections by 13.96% for the method based on the analysis of zero-crossing rate; by 8.24% for the method based on the analysis of short-term energy; by 5.72% for the method based on the combined analysis of zero-crossing rate and short-term energy; and by 17.85% for the method based on the analysis of one-dimensional Mahalanobis distance.
ISSN:	2072-3059
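
The three frame-level measures named in the summary (zero-crossing rate, short-term energy, and one-dimensional Mahalanobis distance) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation and it omits the decomposition-based preprocessing entirely. The function names, the 20 ms frame length, the 8 kHz sampling rate, the synthetic test signal, and the choice to treat the first ten frames as a pause are all assumptions made for illustration.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def short_term_energy(frames):
    """Mean squared amplitude per frame."""
    return np.mean(frames ** 2, axis=1)

def mahalanobis_1d(feature, pause_mask):
    """Distance of each frame's feature from the pause statistics.

    In one dimension the Mahalanobis distance reduces to |x - mu| / sigma,
    where mu and sigma are estimated from frames assumed to be pauses.
    """
    mu = feature[pause_mask].mean()
    sigma = feature[pause_mask].std()
    return np.abs(feature - mu) / (sigma + 1e-12)

# Synthetic stand-in for speech (assumption): noise, a 200 Hz tone, noise.
fs = 8000                                   # sampling rate in Hz
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(3 * fs // 2)  # 1.5 s of low-level noise
t = np.arange(fs // 2) / fs
x[fs // 2:fs] += 0.5 * np.sin(2 * np.pi * 200 * t)  # tone from 0.5 s to 1.0 s

frames = frame_signal(x, frame_len=160, hop=80)  # 20 ms frames, 10 ms hop
zcr = zero_crossing_rate(frames)
energy = short_term_energy(frames)

# Assume the first ten frames contain only a pause (background noise).
pause_mask = np.zeros(len(frames), dtype=bool)
pause_mask[:10] = True
dist = mahalanobis_1d(energy, pause_mask)

# Frames far from the pause statistics are labeled as signal.
is_signal = dist > 3.0                      # threshold is an assumed value
```

In this sketch the tone frames have high energy and a low zero-crossing rate, while the noise frames show the opposite pattern, which is the contrast that threshold-based segmentation relies on.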

Similar Items