Auxiliary self-supervision to metric learning for music similarity-based retrieval and auto-tagging.
In the realm of music information retrieval, similarity-based retrieval and auto-tagging serve as essential components. Similarity-based retrieval involves automatically analyzing a music track and fetching analogous tracks from a database. Auto-tagging, on the other hand, assesses a music track to...
Main Authors: | Taketo Akama, Hiroaki Kitano, Katsuhiro Takematsu, Yasushi Miyajima, Natalia Polouliakh |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS) 2023-01-01 |
Series: | PLoS ONE |
Online Access: | https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0294643&type=printable |
_version_ | 1797392940229722112 |
---|---|
author | Taketo Akama Hiroaki Kitano Katsuhiro Takematsu Yasushi Miyajima Natalia Polouliakh |
author_sort | Taketo Akama |
collection | DOAJ |
description | In the realm of music information retrieval, similarity-based retrieval and auto-tagging serve as essential components. Similarity-based retrieval involves automatically analyzing a music track and fetching analogous tracks from a database. Auto-tagging, on the other hand, assesses a music track to deduce associated tags, such as genre and mood. Given the limitations and non-scalability of human supervision signals, it becomes crucial for models to learn from alternative sources to enhance their performance. Contrastive learning-based self-supervised learning, which exclusively relies on learning signals derived from music audio data, has demonstrated its efficacy in the context of auto-tagging. In this work, we propose a model that builds on the self-supervised learning approach to address the similarity-based retrieval challenge by introducing our method of metric learning with a self-supervised auxiliary loss. Furthermore, diverging from conventional self-supervised learning methodologies, we discovered the advantages of concurrently training the model with both self-supervision and supervision signals, without freezing pre-trained models. We also found that refraining from employing augmentation during the fine-tuning phase yields better results. Our experimental results confirm that the proposed methodology enhances retrieval and tagging performance metrics in two distinct scenarios: one where human-annotated tags are consistently available for all music tracks, and another where such tags are accessible only for a subset of music tracks. |
first_indexed | 2024-03-08T23:54:52Z |
format | Article |
id | doaj.art-905cc4c4bce5426facb87b1cc96ef5e5 |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
last_indexed | 2024-03-08T23:54:52Z |
publishDate | 2023-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-905cc4c4bce5426facb87b1cc96ef5e5 2023-12-13T05:32:39Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2023-01-01 18 11 e0294643 10.1371/journal.pone.0294643 Auxiliary self-supervision to metric learning for music similarity-based retrieval and auto-tagging. Taketo Akama Hiroaki Kitano Katsuhiro Takematsu Yasushi Miyajima Natalia Polouliakh In the realm of music information retrieval, similarity-based retrieval and auto-tagging serve as essential components. Similarity-based retrieval involves automatically analyzing a music track and fetching analogous tracks from a database. Auto-tagging, on the other hand, assesses a music track to deduce associated tags, such as genre and mood. Given the limitations and non-scalability of human supervision signals, it becomes crucial for models to learn from alternative sources to enhance their performance. Contrastive learning-based self-supervised learning, which exclusively relies on learning signals derived from music audio data, has demonstrated its efficacy in the context of auto-tagging. In this work, we propose a model that builds on the self-supervised learning approach to address the similarity-based retrieval challenge by introducing our method of metric learning with a self-supervised auxiliary loss. Furthermore, diverging from conventional self-supervised learning methodologies, we discovered the advantages of concurrently training the model with both self-supervision and supervision signals, without freezing pre-trained models. We also found that refraining from employing augmentation during the fine-tuning phase yields better results. Our experimental results confirm that the proposed methodology enhances retrieval and tagging performance metrics in two distinct scenarios: one where human-annotated tags are consistently available for all music tracks, and another where such tags are accessible only for a subset of music tracks. https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0294643&type=printable |
spellingShingle | Taketo Akama Hiroaki Kitano Katsuhiro Takematsu Yasushi Miyajima Natalia Polouliakh Auxiliary self-supervision to metric learning for music similarity-based retrieval and auto-tagging. PLoS ONE |
title | Auxiliary self-supervision to metric learning for music similarity-based retrieval and auto-tagging. |
title_sort | auxiliary self supervision to metric learning for music similarity based retrieval and auto tagging |
url | https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0294643&type=printable |
work_keys_str_mv | AT taketoakama auxiliaryselfsupervisiontometriclearningformusicsimilaritybasedretrievalandautotagging AT hiroakikitano auxiliaryselfsupervisiontometriclearningformusicsimilaritybasedretrievalandautotagging AT katsuhirotakematsu auxiliaryselfsupervisiontometriclearningformusicsimilaritybasedretrievalandautotagging AT yasushimiyajima auxiliaryselfsupervisiontometriclearningformusicsimilaritybasedretrievalandautotagging AT nataliapolouliakh auxiliaryselfsupervisiontometriclearningformusicsimilaritybasedretrievalandautotagging |
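
The description in this record says the model is trained with metric learning plus a self-supervised auxiliary loss, but the record gives no formulas. As a rough illustration only, the Python sketch below assumes a triplet loss for the supervised metric-learning term and an InfoNCE-style contrastive loss for the self-supervised term. Both loss choices, all function names, and the `margin`, `temperature`, and `aux_weight` values are hypothetical and are not taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    """Cosine similarity; assumes both embeddings are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Supervised term (assumed): pull a same-tag track closer than a
    different-tag track by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def info_nce_loss(anchor, augmented, others, temperature=0.1):
    """Self-supervised term (assumed): identify the augmented view of the
    same track among embeddings of other tracks."""
    pos = math.exp(cosine(anchor, augmented) / temperature)
    neg = sum(math.exp(cosine(anchor, o) / temperature) for o in others)
    return -math.log(pos / (pos + neg))


def combined_loss(anchor, positive, negative, augmented, others, aux_weight=0.5):
    """Metric-learning loss plus a weighted auxiliary loss, so both signals
    contribute to the same training step."""
    supervised = triplet_loss(anchor, positive, negative)
    auxiliary = info_nce_loss(anchor, augmented, others)
    return supervised + aux_weight * auxiliary


if __name__ == "__main__":
    # Toy 2-D embeddings, chosen by hand for illustration.
    anchor = [1.0, 0.0]
    positive = [0.9, 0.1]      # shares a tag with the anchor
    negative = [0.0, 1.0]      # different tag
    augmented = [0.95, 0.05]   # augmented view of the anchor track
    others = [[0.0, 1.0], [-1.0, 0.0]]
    print(combined_loss(anchor, positive, negative, augmented, others))
```

Summing the two terms in one objective mirrors the description's point about training with supervision and self-supervision concurrently instead of pre-training and then freezing the model. How the two terms are actually weighted in the paper is not stated in this record.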