Audio captioning and retrieval with improved cross-modal objectives
Automated Audio Captioning (AAC) is the task of generating descriptive captions from an input audio clip, while Language-Based Audio Retrieval (LBAR) is the task of retrieving the most relevant audio clip based on an input text query. AAC requires a model that is not only able to comprehend the acou...
Main Author: | Koh, Andrew Jin Jie |
---|---|
Other Authors: | Chng Eng Siong |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/172437 |
Related records
- Cross-modal graph with meta concepts for video captioning
  by: Wang, Hao, et al.
  Published: (2022)
- Improved image captioning techniques with comparative study
  by: He, Cari
  Published: (2021)
- Audio pattern discovery and retrieval
  by: Wang, Lei
  Published: (2013)
- Deep learning-based image captioning
  by: Chong, Kaydon
  Published: (2019)
- Neural image and video captioning (NIVC)
  by: Lee, Jeremy Kian Kiat
  Published: (2022)