Automatic Spatial Audio Scene Classification in Binaural Recordings of Music
The aim of the study was to develop a method for automatic classification of three spatial audio scenes, differing in the horizontal distribution of foreground and background audio content around a listener in binaurally rendered recordings of music. For the purpose of the study, audio recordings were synthesized using thirteen sets of binaural room impulse responses (BRIRs), representing the room acoustics of both semi-anechoic and reverberant venues. Head movements were not considered in the study. The proposed method was assumption-free with regard to the number and characteristics of the audio sources. A least absolute shrinkage and selection operator (LASSO) was employed as a classifier. According to the results, it is possible to automatically identify the spatial scenes using a combination of binaural and spectro-temporal features. The method exhibits satisfactory classification accuracy when it is trained and then tested on different stimuli synthesized using the same BRIRs (accuracy ranging from 74% to 98%), even in highly reverberant conditions. However, the generalizability of the method needs to be further improved. The study demonstrates that, in addition to binaural cues, Mel-frequency cepstral coefficients (MFCCs) constitute an important carrier of spatial information, imperative for the classification of spatial audio scenes.
Main Authors: | Sławomir K. Zieliński (Faculty of Computer Science, Białystok University of Technology, 15-351 Białystok, Poland); Hyunkook Lee (Applied Psychoacoustics Laboratory (APL), University of Huddersfield, Huddersfield HD1 3DH, UK) |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-04-01 |
Series: | Applied Sciences, Vol. 9, Issue 9, Article 1724 |
ISSN: | 2076-3417 |
DOI: | 10.3390/app9091724 |
Subjects: | binaural audio; machine-listening; machine-learning; spatial audio scene classification |
Online Access: | https://www.mdpi.com/2076-3417/9/9/1724 |
Collection: | DOAJ (Directory of Open Access Journals) |
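
For readers who want a concrete picture of the pipeline the abstract describes, the sketch below is a minimal, hypothetical illustration rather than the authors' implementation. It assumes Python with librosa and scikit-learn; per-band interaural level differences stand in for the binaural cues, time-averaged MFCCs of each ear channel stand in for the spectro-temporal features, and the LASSO classifier is approximated by L1-regularised multinomial logistic regression. The function names, feature set, and scene-label encoding are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def binaural_features(left, right, sr, n_fft=1024, n_mfcc=13):
    """One feature vector per excerpt: per-band ILDs + time-averaged MFCCs."""
    # Crude binaural cue: interaural level difference (dB) per STFT band,
    # averaged over time (a stand-in for the cues used in the paper).
    spec_l = np.abs(librosa.stft(left, n_fft=n_fft)) ** 2
    spec_r = np.abs(librosa.stft(right, n_fft=n_fft)) ** 2
    ild = 10.0 * np.log10((spec_l.mean(axis=1) + 1e-12) /
                          (spec_r.mean(axis=1) + 1e-12))

    # Spectro-temporal features: mean MFCCs of each ear channel.
    mfcc_l = librosa.feature.mfcc(y=left, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    mfcc_r = librosa.feature.mfcc(y=right, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    return np.concatenate([ild, mfcc_l, mfcc_r])


def train_scene_classifier(excerpts, labels, sr=48000):
    """excerpts: list of (left, right) signals; labels: scene ids, e.g. 0/1/2
    for three hypothetical foreground/background layouts."""
    X = np.vstack([binaural_features(l, r, sr) for l, r in excerpts])
    y = np.asarray(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # L1-regularised multinomial logistic regression as a LASSO-style
    # classifier: the penalty drives irrelevant feature weights to zero.
    clf = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=5000),
    )
    clf.fit(X_tr, y_tr)
    print(f"hold-out accuracy: {clf.score(X_te, y_te):.2f}")
    return clf
```

In this framing, the L1 penalty performs the implicit feature selection associated with the LASSO, which offers one way to probe whether the binaural cues or the MFCCs carry more of the spatial information highlighted in the abstract.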