ATOSE: Audio Tagging with One-Sided Joint Embedding

Audio auto-tagging is the process of assigning labels to audio clips for better categorization and management of audio file databases. With the advent of advanced artificial intelligence technologies, there has been increasing interest in directly using raw audio data as input for deep learning mode...

Bibliographic Details
Main Authors: Jaehwan Lee, Daekyeong Moon, Jik-Soo Kim, Minkyoung Cho
Format: Article
Language: English
Published: MDPI AG 2023-08-01
Series: Applied Sciences
Subjects: deep learning; music auto-tagging; joint embedding
Online Access: https://www.mdpi.com/2076-3417/13/15/9002
_version_ 1797587011604840448
author Jaehwan Lee
Daekyeong Moon
Jik-Soo Kim
Minkyoung Cho
collection DOAJ
description Audio auto-tagging is the process of assigning labels to audio clips for better categorization and management of audio file databases. With the advent of advanced artificial intelligence technologies, there has been increasing interest in directly using raw audio data as input for deep learning models in order to perform tagging and eliminate the need for preprocessing. Unfortunately, most current studies of audio auto-tagging cannot effectively reflect the semantic relationships between tags—for instance, the connection between “classical music” and “cello”. In this paper, we propose a novel method that can enhance audio auto-tagging performance via joint embedding. Our model has been carefully designed and architected to recognize the semantic information within the tag domains. In our experiments using the MagnaTagATune (MTAT) dataset, which has high inter-tag correlations, and the Speech Commands dataset, which has no inter-tag correlations, we showed that our approach improves the performance of existing models when there are strong inter-tag correlations.
first_indexed 2024-03-11T00:31:13Z
format Article
id doaj.art-489030b544fe42939c295ac31b7c60d0
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-11T00:31:13Z
publishDate 2023-08-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-489030b544fe42939c295ac31b7c60d0
2023-11-18T22:40:24Z
eng
MDPI AG
Applied Sciences
2076-3417
2023-08-01
volume 13, issue 15, article 9002
10.3390/app13159002
ATOSE: Audio Tagging with One-Sided Joint Embedding
Jaehwan Lee (Com2uS Corporation, Seoul 08506, Republic of Korea)
Daekyeong Moon (Department of Computer Engineering, Myongji University, Yongin 17058, Republic of Korea)
Jik-Soo Kim (Department of Computer Engineering, Myongji University, Yongin 17058, Republic of Korea)
Minkyoung Cho (Department of Computer Engineering, Myongji University, Yongin 17058, Republic of Korea)
https://www.mdpi.com/2076-3417/13/15/9002
deep learning
music auto-tagging
joint embedding
title ATOSE: Audio Tagging with One-Sided Joint Embedding
topic deep learning
music auto-tagging
joint embedding
url https://www.mdpi.com/2076-3417/13/15/9002