Deep Learning for Audio Event Detection and Tagging on Low-Resource Datasets

In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, having only a list of events present in each recording without any temporal information for training. Secondly, deep neural networks need a very large amou...


Bibliographic Details
Main Authors: Veronica Morfi, Dan Stowell
Format: Article
Language: English
Published: MDPI AG 2018-08-01
Series: Applied Sciences
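Subjects: deep learning, multi-task learning, audio event detection, audio tagging, weak learning, low-resource data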
Online Access: http://www.mdpi.com/2076-3417/8/8/1397
_version_ 1818839239111999488
author Veronica Morfi
Dan Stowell
collection DOAJ
description In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, having only a list of events present in each recording without any temporal information for training. Secondly, deep neural networks need a very large amount of labelled training data to achieve good quality performance, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio transcription into multiple intermediate tasks in order to improve the training performance when dealing with this kind of low-resource datasets. We evaluate three data-efficient approaches of training a stacked convolutional and recurrent neural network for the intermediate tasks. Our results show that different methods of training have different advantages and disadvantages.
first_indexed 2024-12-19T03:51:07Z
format Article
id doaj.art-f252b1f5cff746c499f6da1be2cb1395
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-12-19T03:51:07Z
publishDate 2018-08-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-f252b1f5cff746c499f6da1be2cb1395 | 2022-12-21T20:37:00Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2018-08-01 | 8 | 8 | 1397 | 10.3390/app8081397 | app8081397 | Deep Learning for Audio Event Detection and Tagging on Low-Resource Datasets | Veronica Morfi (0) | Dan Stowell (1) | Machine Listening Lab, Centre for Digital Music (C4DM), Queen Mary University of London, London E1 4NS, UK | Machine Listening Lab, Centre for Digital Music (C4DM), Queen Mary University of London, London E1 4NS, UK | In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, having only a list of events present in each recording without any temporal information for training. Secondly, deep neural networks need a very large amount of labelled training data to achieve good quality performance, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio transcription into multiple intermediate tasks in order to improve the training performance when dealing with this kind of low-resource datasets. We evaluate three data-efficient approaches of training a stacked convolutional and recurrent neural network for the intermediate tasks. Our results show that different methods of training have different advantages and disadvantages. | http://www.mdpi.com/2076-3417/8/8/1397 | deep learning | multi-task learning | audio event detection | audio tagging | weak learning | low-resource data
title Deep Learning for Audio Event Detection and Tagging on Low-Resource Datasets
topic deep learning
multi-task learning
audio event detection
audio tagging
weak learning
low-resource data
url http://www.mdpi.com/2076-3417/8/8/1397