Deep Learning for Audio Event Detection and Tagging on Low-Resource Datasets

In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, having only a list of events present in each recording without any temporal information for training. Secondly, deep neural networks need a very large amou...

Bibliographic Details
Main Authors:	Veronica Morfi, Dan Stowell
Format:	Article
Language:	English
Published:	MDPI AG 2018-08-01
Series:	Applied Sciences
Subjects:	deep learning; multi-task learning; audio event detection; audio tagging; weak learning; low-resource data
Online Access:	http://www.mdpi.com/2076-3417/8/8/1397

author	Veronica Morfi; Dan Stowell
collection	DOAJ
description	In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, having only a list of events present in each recording without any temporal information for training. Secondly, deep neural networks need a very large amount of labelled training data to achieve good quality performance, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio transcription into multiple intermediate tasks in order to improve the training performance when dealing with this kind of low-resource datasets. We evaluate three data-efficient approaches of training a stacked convolutional and recurrent neural network for the intermediate tasks. Our results show that different methods of training have different advantages and disadvantages.
first_indexed	2024-12-19T03:51:07Z
format	Article
id	doaj.art-f252b1f5cff746c499f6da1be2cb1395
institution	Directory Open Access Journal
issn	2076-3417
language	English
last_indexed	2024-12-19T03:51:07Z
publishDate	2018-08-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
doi	10.3390/app8081397
volume	8
issue	8
article_number	1397
affiliation	Veronica Morfi; Dan Stowell: Machine Listening Lab, Centre for Digital Music (C4DM), Queen Mary University of London, London E1 4NS, UK
title	Deep Learning for Audio Event Detection and Tagging on Low-Resource Datasets
topic	deep learning; multi-task learning; audio event detection; audio tagging; weak learning; low-resource data
url	http://www.mdpi.com/2076-3417/8/8/1397

Similar Items