Bird species identification using spectrograms and convolutional neural networks


Bibliographic Details
Main Author: Saad, Aymen
Format: Thesis
Language: English
Published: 2020
Subjects: TK Electrical engineering. Electronics Nuclear engineering
Online Access: http://eprints.utm.my/93040/1/AymenSaadMSKE2020.pdf
collection ePrints
description Birds are particularly useful ecological indicators because they respond quickly to changes in their environment, so studies of bird diversity are indispensable. Domain experts classify birds manually to achieve accurate results, but the process becomes tedious as the amount of data grows. Meanwhile, bioacoustic monitoring employs automated recorders to collect large-scale audio data of fauna vocalizations, and audio at this scale cannot practically be analysed by hand. Machine learning is therefore a more practical approach. Previously, Convolutional Neural Network (CNN) approaches have achieved excellent results using augmented spectrogram images of the audio. CNN architectures such as Cube, Inception-v3, DenseNet, ResNet, and ConvNets offer high accuracy but at a high computational cost. These architectures suit large-scale classification of up to 1000 species because their deep layers extract high-level features from the spectrogram image. However, many devices intended to run these models have limited computational resources and strict power consumption constraints. This study therefore aims to optimize a CNN-based birdcall classifier for embedded platforms. A low-complexity CNN model, MobileNet-v2, was employed, which is sufficient for small-scale classification such as identifying ten bird species. The dataset used to train our model is taken from the Xeno-canto repository. Each audio recording is resampled to 16 kHz and segmented into 1-second samples. An algorithm to splice the audio according to the label is proposed. Each sample that contains a birdcall signal is then augmented into three samples, and noise-only samples are removed from the dataset. Spectrogram images of the samples are obtained using short-time Fourier transform (STFT) and Mel-frequency cepstral coefficient (MFCC) conversion, and all images are then resized to 224×224×1 using Matlab 2019b.
To verify our model, we compare it with a high-complexity CNN model, ResNet-50. The MobileNet-v2 model reduces the computational cost of ResNet-50 by 86% with a slight trade-off in accuracy. Compared to ResNet-50, the accuracy of MobileNet-v2 drops by 12% with STFT but by only 2% with MFCC, which makes MobileNet-v2 with MFCC conversion the best CNN model for device applications with a small number of classes.
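The preprocessing steps named in the description (1-second segmentation at 16 kHz, removal of noise-only samples, STFT conversion, and resizing to 224×224×1) can be illustrated with a minimal NumPy sketch. This is not the thesis's Matlab implementation: the function names, the RMS energy threshold used to decide that a clip is noise-only, the STFT frame and hop sizes, and the nearest-neighbour resize are all assumptions chosen for illustration. Label-based splicing, augmentation, and MFCC conversion are omitted because the record does not specify how they were done.

```python
import numpy as np

SAMPLE_RATE = 16_000  # 16 kHz, the rate given in the description


def split_into_seconds(audio, sample_rate=SAMPLE_RATE):
    """Cut a 1-D signal into non-overlapping 1-second clips; drop the remainder."""
    n_clips = len(audio) // sample_rate
    return audio[: n_clips * sample_rate].reshape(n_clips, sample_rate)


def drop_quiet_clips(clips, rms_threshold=0.01):
    """Keep clips whose RMS energy exceeds a threshold.

    The threshold test is a hypothetical stand-in for noise-only removal.
    """
    rms = np.sqrt(np.mean(clips ** 2, axis=1))
    return clips[rms > rms_threshold]


def stft_log_magnitude(clip, frame_len=512, hop=128):
    """Log-magnitude STFT with a Hann window; returns (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(clip) - frame_len) // hop
    frames = np.stack(
        [clip[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(spectrum).T


def resize_nearest(image, size=224):
    """Nearest-neighbour resize to size x size, adding a trailing channel axis."""
    rows = np.arange(size) * image.shape[0] // size
    cols = np.arange(size) * image.shape[1] // size
    return image[rows][:, cols][..., np.newaxis]


def audio_to_images(audio):
    """Run the sketched pipeline and return an array of shape (n, 224, 224, 1)."""
    clips = drop_quiet_clips(split_into_seconds(audio))
    return np.stack([resize_nearest(stft_log_magnitude(c)) for c in clips])


# Synthetic 3.5-second recording: faint noise, with a 3 kHz tone in seconds 1 to 3.
rng = np.random.default_rng(0)
audio = 0.001 * rng.standard_normal(int(3.5 * SAMPLE_RATE))
t = np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE
audio[SAMPLE_RATE : 3 * SAMPLE_RATE] += 0.5 * np.sin(2 * np.pi * 3000 * t)

images = audio_to_images(audio)
print(images.shape)
```

In this synthetic example the first second contains only faint noise and is discarded, so two of the three full 1-second clips reach the spectrogram stage. A real pipeline would need a more careful birdcall detector than a fixed energy threshold.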
first_indexed 2024-03-05T20:58:36Z
format Thesis
id utm.eprints-93040
institution Universiti Teknologi Malaysia - ePrints
language English
last_indexed 2024-03-05T20:58:36Z
publishDate 2020
record_format dspace
spelling utm.eprints-93040 2021-11-07T06:00:37Z http://eprints.utm.my/93040/ Bird species identification using spectrograms and convolutional neural networks Saad, Aymen TK Electrical engineering. Electronics Nuclear engineering 2020 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/93040/1/AymenSaadMSKE2020.pdf Saad, Aymen (2020) Bird species identification using spectrograms and convolutional neural networks. Masters thesis, Universiti Teknologi Malaysia, Faculty of Engineering - School of Electrical Engineering. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:135943
topic TK Electrical engineering. Electronics Nuclear engineering