Audio retrieval with natural language queries
We consider the task of retrieving audio using free-form natural language queries. To study this problem, which has received limited attention in the existing literature, we introduce challenging new benchmarks for text-based audio retrieval using text annotations sourced from the AudioCaps and Clot...
Main Authors: | Oncescu, A-M; Sophia, AS; Henriques, JF; Akata, Z; Albanie, S |
---|---|
Format: | Conference item |
Language: | English |
Published: | International Speech Communication Association, 2021 |
_version_ | 1826307889074012160 |
---|---|
author | Oncescu, A-M; Sophia, AS; Henriques, JF; Akata, Z; Albanie, S |
author_facet | Oncescu, A-M; Sophia, AS; Henriques, JF; Akata, Z; Albanie, S |
author_sort | Oncescu, A-M |
collection | OXFORD |
description | We consider the task of retrieving audio using free-form natural language queries. To study this problem, which has received limited attention in the existing literature, we introduce challenging new benchmarks for text-based audio retrieval using text annotations sourced from the AudioCaps and Clotho datasets. We then employ these benchmarks to establish baselines for cross-modal audio retrieval, where we demonstrate the benefits of pre-training on diverse audio tasks. We hope that our benchmarks will inspire further research into cross-modal text-based audio retrieval with free-form text queries. |
first_indexed | 2024-03-07T07:09:47Z |
format | Conference item |
id | oxford-uuid:cc107781-c321-459d-ba0b-70cef6c52222 |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T07:09:47Z |
publishDate | 2021 |
publisher | International Speech Communication Association |
record_format | dspace |
spelling | oxford-uuid:cc107781-c321-459d-ba0b-70cef6c52222; 2022-06-13T12:45:52Z; Audio retrieval with natural language queries; Conference item; http://purl.org/coar/resource_type/c_5794; uuid:cc107781-c321-459d-ba0b-70cef6c52222; English; Symplectic Elements; International Speech Communication Association; 2021; Oncescu, A-M; Sophia, AS; Henriques, JF; Akata, Z; Albanie, S; We consider the task of retrieving audio using free-form natural language queries. To study this problem, which has received limited attention in the existing literature, we introduce challenging new benchmarks for text-based audio retrieval using text annotations sourced from the AudioCaps and Clotho datasets. We then employ these benchmarks to establish baselines for cross-modal audio retrieval, where we demonstrate the benefits of pre-training on diverse audio tasks. We hope that our benchmarks will inspire further research into cross-modal text-based audio retrieval with free-form text queries. |
title | Audio retrieval with natural language queries |