Surveillance of sound environment by machine learning

Bibliographic Details
Main Author: Yu, Xiang
Other Authors: Jiang Xudong
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2022
Subjects:
Online Access: https://hdl.handle.net/10356/163693
Description
Summary: Environmental sound recognition and classification is an important topic in the field of sound event research. A computer can be made to mimic the human ear's hearing function, recognizing transient sound signals and assigning them to corresponding category labels. Environmental sounds carry a great deal of key information, and acoustic scene classification and sound event detection are core technologies for computing and analyzing natural acoustic scenes. They are essential to modern applications such as smart robots, airport noise monitoring, autonomous driving, and intelligent public-security surveillance.

At present, ambient sound recognition poses many challenges. On one hand, unlike speech and music, ambient sound has complex and variable frequency-domain features and time-domain structures, especially in scenes with multiple sound events. In the frequency domain, a sound may show distinct peaks in the spectrum, as with an impact sound, or its energy may be spread across the whole spectrum, as with wind or noise. In the time domain, a sound can be transient, continuous, or intermittent. It is therefore both important and challenging to design a sound recognition system that accounts for these varied characteristics, and making a computer perceive and understand an acoustic scene as the human ear does remains a research hotspot in audio signal processing. On the other hand, open-source datasets of environmental sound events are very limited, so making effective use of the limited data to obtain an accurate and reliable model is equally important.

Using a spectrogram, a sound signal can be visualized and quantified as a time-frequency analysis of the magnitude spectrum in a 2D plane. This poses a challenge for sound event classification, because spectral amplitudes alone are not sufficient to distinguish sound classes. In this project, a process called the "Regularized 2D complex-log-Fourier transform" was introduced to address this problem. The method, first proposed by Professor Jiang Xudong and Professor Ren Jianfeng, analyzes both the phase spectrum and the amplitude spectrum for sound event classification. Principal Component Analysis (PCA) is then applied to remove unnecessary sound features from the samples. Finally, the Mahalanobis distance (MD) is computed for sound class identification.
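
The feature-extraction step can be illustrated with a short sketch. The code below is a minimal approximation, assuming a mono signal sampled at 16 kHz and using `scipy.signal.stft` with illustrative window sizes and a small regularization constant `eps`; it does not reproduce the exact regularization or parameters of Jiang and Ren's Regularized 2D complex-log-Fourier transform, only the general idea of taking a 2D Fourier transform of a regularized log-magnitude spectrogram and keeping both amplitude and phase.

```python
import numpy as np
from scipy.signal import stft

def log_fourier_features(x, fs=16000, eps=1e-6):
    """Sketch of 2D complex-log-Fourier feature extraction.

    Computes a magnitude spectrogram via the STFT, takes its regularized
    logarithm, then applies a 2D FFT so that both the amplitude and the
    phase of the result are available as features. The constant `eps`
    and the window sizes are illustrative choices, not the values from
    the original method.
    """
    _, _, Z = stft(x, fs=fs, nperseg=512, noverlap=256)
    log_spec = np.log(np.abs(Z) + eps)   # regularized log-magnitude spectrogram
    F = np.fft.fft2(log_spec)            # 2D complex Fourier transform
    return np.abs(F), np.angle(F)        # amplitude spectrum, phase spectrum
```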
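The PCA step for discarding uninformative feature dimensions could then look like the following sketch, using scikit-learn's `PCA`. The random placeholder feature matrix and the number of retained components (64) are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

# X: (n_samples, n_features) matrix of flattened spectral features.
# A random matrix stands in for real features extracted from audio clips.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1024))

pca = PCA(n_components=64)        # retained dimensionality is illustrative
X_reduced = pca.fit_transform(X)  # project samples onto the principal axes
print(X_reduced.shape)            # (200, 64)
```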
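Finally, a hedged sketch of Mahalanobis-distance classification: each class is represented by its mean vector in the reduced feature space, and a test sample is assigned to the class whose mean is nearest in Mahalanobis distance. Using a single pooled covariance and a pseudo-inverse for numerical stability are simplifications for illustration, not necessarily the project's exact formulation.

```python
import numpy as np

def mahalanobis_classify(X_train, y_train, x):
    """Assign sample `x` to the class whose mean is nearest in
    Mahalanobis distance, with a covariance estimated from the
    training data (pooled over all classes for simplicity)."""
    cov = np.cov(X_train, rowvar=False)
    cov_inv = np.linalg.pinv(cov)    # pseudo-inverse for numerical stability
    best_label, best_dist = None, np.inf
    for label in np.unique(y_train):
        mu = X_train[y_train == label].mean(axis=0)
        d = x - mu
        dist = float(d @ cov_inv @ d)  # squared Mahalanobis distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```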