Automatic Spatial Audio Scene Classification in Binaural Recordings of Music

The aim of the study was to develop a method for the automatic classification of three spatial audio scenes, differing in the horizontal distribution of foreground and background audio content around a listener, in binaurally rendered recordings of music. For the purpose of the study, audio recordings were synthesized using thirteen sets of binaural room impulse responses (BRIRs), representing the room acoustics of both semi-anechoic and reverberant venues. Head movements were not considered in the study. The proposed method was assumption-free with regard to the number and characteristics of the audio sources. A least absolute shrinkage and selection operator (LASSO) was employed as a classifier. According to the results, it is possible to automatically identify the spatial scenes using a combination of binaural and spectro-temporal features. The method exhibits satisfactory classification accuracy when it is trained and then tested on different stimuli that were nevertheless synthesized using the same BRIRs (accuracy ranging from 74% to 98%), even in highly reverberant conditions. However, the generalizability of the method needs to be further improved. This study demonstrates that, in addition to the binaural cues, the Mel-frequency cepstral coefficients constitute an important carrier of spatial information, imperative for the classification of spatial audio scenes.
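As outlined above, the method renders music excerpts binaurally by convolving them with BRIRs, extracts binaural and spectro-temporal (MFCC) features, and classifies the scenes with a LASSO-type model. The following is a minimal illustrative sketch of such a pipeline, not the authors' implementation; the specific feature choices (a broadband ILD plus 20 MFCCs), the use of L1-penalised logistic regression as the LASSO-style classifier, and all variable and file names are assumptions made for the example.

```python
# Illustrative sketch only (not the paper's code): BRIR-based binaural rendering,
# MFCC + simple binaural feature extraction, and an L1-regularised classifier.
import numpy as np
import librosa
from scipy.signal import fftconvolve
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def render_binaural(mono, brir_left, brir_right):
    """Convolve a mono source with left/right BRIRs to obtain a binaural signal."""
    left = fftconvolve(mono, brir_left)
    right = fftconvolve(mono, brir_right)
    n = min(len(left), len(right))
    return left[:n], right[:n]

def extract_features(left, right, sr):
    """Spectro-temporal (MFCC) and a crude binaural (broadband ILD) feature vector."""
    mid = 0.5 * (left + right)                      # mid signal for MFCC analysis
    mfcc = librosa.feature.mfcc(y=mid, sr=sr, n_mfcc=20).mean(axis=1)
    ild = 10.0 * np.log10((np.mean(left**2) + 1e-12) / (np.mean(right**2) + 1e-12))
    return np.concatenate([mfcc, [ild]])

# X: feature matrix assembled from many rendered excerpts; y: scene labels (e.g. 0/1/2).
# An L1-penalised logistic regression stands in for the LASSO-type classifier here.
# X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
# clf = LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=5000)
# clf.fit(X_tr, y_tr)
# print("accuracy:", clf.score(X_te, y_te))
```

In the study itself, features were computed per excerpt and the classifier was trained and tested across stimuli synthesized with the thirteen BRIR sets; the commented lines above only indicate where those steps would slot in.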

Bibliographic Details
Main Authors: Sławomir K. Zieliński, Hyunkook Lee
Format: Article
Language: English
Published: MDPI AG, 2019-04-01
Series: Applied Sciences
ISSN: 2076-3417
DOI: 10.3390/app9091724
Subjects: binaural audio; machine-listening; machine-learning; spatial audio scene classification
Author Affiliations: Faculty of Computer Science, Białystok University of Technology, 15-351 Białystok, Poland; Applied Psychoacoustics Laboratory (APL), University of Huddersfield, Huddersfield HD1 3DH, UK
Online Access: https://www.mdpi.com/2076-3417/9/9/1724