Urban sound analysis and synthesis using artificial intelligence

With the advent of artificial intelligence and machine learning, multiple industries have undergone revolutions of different kinds. For example, convolutional neural networks have drastically changed the conventional ways for computers to capture features of images and video, also known as computer vi...

Bibliographic Details
Main Author: Guo, Zixun
Other Authors: Gan Woon Seng
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2020
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/141355
_version_ 1826113410349137920
author Guo, Zixun
author2 Gan Woon Seng
author_facet Gan Woon Seng
Guo, Zixun
author_sort Guo, Zixun
collection NTU
description With the advent of artificial intelligence and machine learning, multiple industries have undergone revolutions of different kinds. For example, convolutional neural networks have drastically changed the conventional ways for computers to capture features of images and video, also known as computer vision. In the audio domain, artificial intelligence has been widely used in areas such as sound classification and speech-to-text conversion. In this work, I mainly focus on the use of artificial intelligence in urban sound analysis and processing, where it has been shown to perform much better than conventional methods. Unlike images or videos, analog sound has to be sampled and quantized in order to be stored in digital format. This work considers only digital sound, since neural networks operate on digital values. Digital sound also has its own set of features, such as sampling frequency and bit depth. Various research works have also used frequency-domain sound features such as bandwidth. One important feature of digital sound, the sampling frequency, is normally 8 kHz or higher. This raises issues in audio processing, since one second of audio contains thousands of discrete digital values or more. To process large numbers of sound samples sequentially, this work focuses on recurrent neural networks, a type of network structure with its own memory mechanism that can deal with long-term dependencies. I focus on two topics: audio captioning and audio synthesis. First, captioning using AI has been widely used in the field of computer vision; audio captioning, in turn, would help people with hearing difficulties perceive sound information. Second, audio data collection can be time-consuming and costly. However, by learning audio patterns and inter-dependencies, sound synthesis could generate sound more efficiently.
first_indexed 2024-10-01T03:22:47Z
format Final Year Project (FYP)
id ntu-10356/141355
institution Nanyang Technological University
language English
last_indexed 2024-10-01T03:22:47Z
publishDate 2020
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/141355 2023-07-07T18:38:57Z Urban sound analysis and synthesis using artificial intelligence Guo, Zixun Gan Woon Seng School of Electrical and Electronic Engineering Smart Nation TRANS Lab Information Communication Institute of Singapore Furi Andi Karnapi EWSGAN@ntu.edu.sg, furi@ntu.edu.sg Engineering::Electrical and electronic engineering Bachelor of Engineering (Electrical and Electronic Engineering) 2020-06-08T02:17:26Z 2020-06-08T02:17:26Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/141355 en A3090-191 application/pdf Nanyang Technological University
spellingShingle Engineering::Electrical and electronic engineering
Guo, Zixun
Urban sound analysis and synthesis using artificial intelligence
title Urban sound analysis and synthesis using artificial intelligence
title_full Urban sound analysis and synthesis using artificial intelligence
title_fullStr Urban sound analysis and synthesis using artificial intelligence
title_full_unstemmed Urban sound analysis and synthesis using artificial intelligence
title_short Urban sound analysis and synthesis using artificial intelligence
title_sort urban sound analysis and synthesis using artificial intelligence
topic Engineering::Electrical and electronic engineering
url https://hdl.handle.net/10356/141355
work_keys_str_mv AT guozixun urbansoundanalysisandsynthesisusingartificialintelligence