Summary: | Background. Speech/pause segmentation, that is, accurate detection of the boundaries
where voiced speech, unvoiced speech, and pauses begin and end, is one of the most important
tasks in speech applications. It is especially important when analyzing the distribution
speed, acceleration, and entropy of voiced and unvoiced speech sections and pauses, and
when analyzing the average duration of pauses. The aim of the work is to improve
the efficiency of speech/pause segmentation based on the method of empirical mode
decomposition. Materials and methods. The work uses a unique technology for adaptive
decomposition of non-stationary signals: improved complete ensemble empirical mode
decomposition with adaptive noise. The method was implemented in the MATLAB (MathWorks)
mathematical modeling environment.
Results. A decomposition-based method has been developed for preprocessing the original
speech signals. It forms a set of new signals for analysis that contain the most reliable
information about where voiced speech, unvoiced speech, and pauses begin and end. The
influence of the decomposition method and of the duration of the analyzed signal fragments
on the efficiency of speech/pause segmentation has been assessed. Segmentation was performed
with methods based on the analysis of zero-crossing rate, short-term energy, and
one-dimensional Mahalanobis distance. Conclusions.
The results show that the proposed method increases the efficiency of segmenting
voiced and unvoiced speech sections: by 13.96% for
the method based on the analysis of zero-crossing rate; by 8.24% for the method based on
the analysis of short-term energy; by 5.72% for the method based on the combined analysis
of zero-crossing rate and short-term energy; by 17.85% for the method based on the analysis
of one-dimensional Mahalanobis distance.
|
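As a rough illustration of the three frame-level features named in the summary (zero-crossing rate, short-term energy, and one-dimensional Mahalanobis distance), the following is a minimal Python sketch. It is not the authors' MATLAB implementation, and it omits the decomposition-based preprocessing stage entirely. The synthetic signal, the frame length, the hop size, the threshold, and the choice of the first four frames as a pause reference are all assumptions made only for this example.

```python
import math
import random


def split_frames(samples, frame_len, hop):
    """Cut the signal into fixed-length frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_term_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def mahalanobis_1d(value, reference):
    """|value - mean| / std, with mean and sample std taken from `reference`."""
    mean = sum(reference) / len(reference)
    var = sum((r - mean) ** 2 for r in reference) / (len(reference) - 1)
    if var == 0.0:
        return 0.0 if value == mean else math.inf
    return abs(value - mean) / math.sqrt(var)


# Synthetic test signal (assumption): 0.1 s of weak noise standing in for a pause,
# followed by 0.1 s of a 200 Hz sine standing in for voiced speech.
SAMPLE_RATE = 8000
rng = random.Random(0)
pause = [rng.uniform(-0.01, 0.01) for _ in range(800)]
voiced = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]
signal = pause + voiced

FRAME_LEN = 200   # 25 ms at 8 kHz (assumed)
HOP = 200         # non-overlapping frames (assumed)
THRESHOLD = 3.0   # distance above which a frame is labeled speech (assumed)

frames = split_frames(signal, FRAME_LEN, HOP)
zcr = [zero_crossing_rate(f) for f in frames]
energy = [short_term_energy(f) for f in frames]

# Assume the first four frames are known to be pause and use them as the reference.
reference_energy = energy[:4]
distance = [mahalanobis_1d(e, reference_energy) for e in energy]
is_speech = [d > THRESHOLD for d in distance]

for i, (z, e, s) in enumerate(zip(zcr, energy, is_speech)):
    print(f"frame {i}: zcr={z:.3f} energy={e:.6f} speech={s}")
```

In this sketch the distance is computed on short-term energy only; the same `mahalanobis_1d` helper could be applied to the zero-crossing rate, and in the setting described above each feature would be computed on the signals produced by the decomposition stage rather than on the raw waveform.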