ATOSE: Audio Tagging with One-Sided Joint Embedding
Main Authors: | Jaehwan Lee, Daekyeong Moon, Jik-Soo Kim, Minkyoung Cho |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-08-01 |
Series: | Applied Sciences |
Subjects: | deep learning, music auto-tagging, joint embedding |
Online Access: | https://www.mdpi.com/2076-3417/13/15/9002 |
author | Jaehwan Lee, Daekyeong Moon, Jik-Soo Kim, Minkyoung Cho |
---|---|
collection | DOAJ |
description | Audio auto-tagging is the process of assigning labels to audio clips for better categorization and management of audio file databases. With the advent of advanced artificial intelligence technologies, there has been increasing interest in directly using raw audio data as input for deep learning models in order to perform tagging and eliminate the need for preprocessing. Unfortunately, most current studies of audio auto-tagging cannot effectively reflect the semantic relationships between tags—for instance, the connection between “classical music” and “cello”. In this paper, we propose a novel method that can enhance audio auto-tagging performance via joint embedding. Our model has been carefully designed and architected to recognize the semantic information within the tag domains. In our experiments using the MagnaTagATune (MTAT) dataset, which has high inter-tag correlations, and the Speech Commands dataset, which has no inter-tag correlations, we showed that our approach improves the performance of existing models when there are strong inter-tag correlations. |
first_indexed | 2024-03-11T00:31:13Z |
format | Article |
id | doaj.art-489030b544fe42939c295ac31b7c60d0 |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-11T00:31:13Z |
publishDate | 2023-08-01 |
publisher | MDPI AG |
series | Applied Sciences |
doi | 10.3390/app13159002 |
volume | 13 |
issue | 15 |
article_number | 9002 |
affiliation | Jaehwan Lee: Com2uS Corporation, Seoul 08506, Republic of Korea; Daekyeong Moon, Jik-Soo Kim, Minkyoung Cho: Department of Computer Engineering, Myongji University, Yongin 17058, Republic of Korea |
title | ATOSE: Audio Tagging with One-Sided Joint Embedding |
topic | deep learning, music auto-tagging, joint embedding |
url | https://www.mdpi.com/2076-3417/13/15/9002 |