A comprehensive multimodal dataset for contactless lip reading and acoustic analysis
Abstract Small-scale motion detection using non-invasive remote sensing techniques has recently garnered significant interest in the field of speech recognition. Our dataset paper aims to facilitate the enhancement and restoration of speech information from diverse data sources for speakers. In this paper, we introduce a novel multimodal dataset based on Radio Frequency, visual, text, audio, laser and lip landmark information, also called RVTALL. Specifically, the dataset consists of 7.5 GHz Channel Impulse Response (CIR) data from ultra-wideband (UWB) radars, 77 GHz frequency modulated continuous wave (FMCW) data from millimeter wave (mmWave) radar, visual and audio information, lip landmarks and laser data, offering a unique multimodal approach to speech recognition research. Meanwhile, a depth camera is adopted to record the landmarks of the subject’s lip and voice. Approximately 400 minutes of annotated speech profiles are provided, which are collected from 20 participants speaking 5 vowels, 15 words, and 16 sentences. The dataset has been validated and has potential for the investigation of lip reading and multimodal speech recognition.
Main Authors: | Yao Ge, Chong Tang, Haobo Li, Zikang Chen, Jingyan Wang, Wenda Li, Jonathan Cooper, Kevin Chetty, Daniele Faccio, Muhammad Imran, Qammer H. Abbasi |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-12-01 |
Series: | Scientific Data |
Online Access: | https://doi.org/10.1038/s41597-023-02793-w |
---|---|
author | Yao Ge; Chong Tang; Haobo Li; Zikang Chen; Jingyan Wang; Wenda Li; Jonathan Cooper; Kevin Chetty; Daniele Faccio; Muhammad Imran; Qammer H. Abbasi |
collection | DOAJ |
description | Abstract Small-scale motion detection using non-invasive remote sensing techniques has recently garnered significant interest in the field of speech recognition. Our dataset paper aims to facilitate the enhancement and restoration of speech information from diverse data sources for speakers. In this paper, we introduce a novel multimodal dataset based on Radio Frequency, visual, text, audio, laser and lip landmark information, also called RVTALL. Specifically, the dataset consists of 7.5 GHz Channel Impulse Response (CIR) data from ultra-wideband (UWB) radars, 77 GHz frequency modulated continuous wave (FMCW) data from millimeter wave (mmWave) radar, visual and audio information, lip landmarks and laser data, offering a unique multimodal approach to speech recognition research. Meanwhile, a depth camera is adopted to record the landmarks of the subject’s lip and voice. Approximately 400 minutes of annotated speech profiles are provided, which are collected from 20 participants speaking 5 vowels, 15 words, and 16 sentences. The dataset has been validated and has potential for the investigation of lip reading and multimodal speech recognition. |
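The description above fully specifies the corpus composition (20 participants; 5 vowels, 15 words, 16 sentences; seven modality streams). A minimal sketch of how those stated counts combine, in Python; all identifiers and the modality naming are hypothetical illustrations, not the dataset's actual file layout or API:

```python
from dataclasses import dataclass

# Modalities named in the abstract (RVTALL: RF via UWB CIR and mmWave FMCW,
# plus visual, text, audio, laser, and lip landmarks). The string keys here
# are assumptions for illustration only.
MODALITIES = ["uwb_cir", "mmwave_fmcw", "video", "audio",
              "laser", "lip_landmarks", "text"]

# Counts stated in the abstract.
VOWELS, WORDS, SENTENCES, PARTICIPANTS = 5, 15, 16, 20

@dataclass
class Utterance:
    participant: int  # 1..20
    category: str     # "vowel" | "word" | "sentence"
    item: int         # index within its category
    modality: str     # one of MODALITIES

def items_per_modality() -> int:
    """Distinct (participant, item) combinations per modality stream,
    ignoring any per-item repetitions, which the abstract does not state."""
    return PARTICIPANTS * (VOWELS + WORDS + SENTENCES)

print(items_per_modality())  # 720
```

So each modality stream covers 20 × (5 + 15 + 16) = 720 distinct participant–item combinations, which the ~400 minutes of annotated recordings are spread across.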
format | Article |
id | doaj.art-edd6a19b27234b20a9751230252188b5 |
institution | Directory Open Access Journal |
issn | 2052-4463 |
language | English |
publishDate | 2023-12-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Data |
spelling | doaj.art-edd6a19b27234b20a9751230252188b5. Nature Portfolio, Scientific Data, ISSN 2052-4463, 2023-12-01, vol. 10, iss. 1, pp. 1–17, doi:10.1038/s41597-023-02793-w. A comprehensive multimodal dataset for contactless lip reading and acoustic analysis. Authors and affiliations: Yao Ge, Chong Tang, Zikang Chen, Jingyan Wang, Jonathan Cooper, Muhammad Imran, Qammer H. Abbasi (James Watt School of Engineering, University of Glasgow); Haobo Li, Daniele Faccio (School of Physics & Astronomy, University of Glasgow); Wenda Li (School of Science and Engineering, University of Dundee); Kevin Chetty (Department of Security and Crime Science, University College London). https://doi.org/10.1038/s41597-023-02793-w |
title | A comprehensive multimodal dataset for contactless lip reading and acoustic analysis |
url | https://doi.org/10.1038/s41597-023-02793-w |