A comprehensive multimodal dataset for contactless lip reading and acoustic analysis

Abstract: Small-scale motion detection using non-invasive remote sensing techniques has recently attracted significant interest in speech recognition. This dataset paper aims to facilitate the enhancement and restoration of speech information from diverse data sources. We introduce RVTALL, a novel multimodal dataset combining Radio frequency, Visual, Text, Audio, Laser, and Lip-landmark information. Specifically, the dataset consists of 7.5 GHz channel impulse response (CIR) data from ultra-wideband (UWB) radars, 77 GHz frequency-modulated continuous-wave (FMCW) data from a millimeter-wave (mmWave) radar, visual and audio recordings, lip landmarks, and laser data, offering a unique multimodal basis for speech recognition research. A depth camera records each subject's lip landmarks alongside their voice. Approximately 400 minutes of annotated speech profiles are provided, collected from 20 participants speaking 5 vowels, 15 words, and 16 sentences. The dataset has been validated and supports the investigation of lip reading and multimodal speech recognition.
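
The abstract fixes the corpus dimensions (20 participants, each producing 5 vowels, 15 words, and 16 sentences) and lists six sensing modalities. The Python sketch below shows one way such a corpus might be enumerated and how a lip-motion spectrogram could be extracted from a UWB channel impulse response recording via a short-time Fourier transform. The directory layout, file names, `.npy` storage, 100 Hz slow-time frame rate, and range-bin index are all hypothetical illustrations, not the dataset's documented structure; consult the data descriptor itself for the actual organization.

```python
# Minimal sketch of iterating an RVTALL-style corpus and inspecting a radar
# recording. All paths, file names, and array layouts are hypothetical.
from pathlib import Path

import numpy as np
from scipy import signal

# Corpus dimensions reported in the abstract: 20 participants, each
# speaking 5 vowels, 15 words, and 16 sentences.
PARTICIPANTS = [f"subject_{i:02d}" for i in range(1, 21)]
CATEGORIES = {"vowels": 5, "words": 15, "sentences": 16}
MODALITIES = ["uwb_cir", "mmwave_fmcw", "audio", "video", "laser", "lip_landmarks"]


def sample_paths(root: Path, subject: str, category: str, item: int) -> dict[str, Path]:
    """Build per-modality file paths for one utterance (hypothetical layout)."""
    stem = f"{subject}_{category}_{item:02d}"
    return {m: root / subject / category / m / f"{stem}.npy" for m in MODALITIES}


def lip_motion_spectrogram(cir: np.ndarray, frame_rate: float, range_bin: int):
    """STFT magnitude of one UWB range bin over slow time.

    `cir` is assumed complex-valued with shape (slow_time, range_bins);
    `frame_rate` is the slow-time sampling rate in Hz.
    """
    slow_time = cir[:, range_bin]
    slow_time = slow_time - slow_time.mean()  # suppress the static-clutter response
    f, t, z = signal.stft(slow_time, fs=frame_rate, nperseg=128, noverlap=96)
    return f, t, np.abs(z)


if __name__ == "__main__":
    root = Path("RVTALL")  # hypothetical extraction directory
    paths = sample_paths(root, PARTICIPANTS[0], "vowels", 1)
    if paths["uwb_cir"].exists():
        cir = np.load(paths["uwb_cir"])
        f, t, spec = lip_motion_spectrogram(cir, frame_rate=100.0, range_bin=20)
        print(f"spectrogram shape: {spec.shape}")
```

The single-range-bin STFT is a common way to expose the kind of small-scale motion the abstract describes: periodic lip and jaw movement phase-modulates the reflected signal, and removing the mean before the transform makes the resulting micro-Doppler content easier to see.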


Bibliographic Details
Main Authors: Yao Ge, Chong Tang, Haobo Li, Zikang Chen, Jingyan Wang, Wenda Li, Jonathan Cooper, Kevin Chetty, Daniele Faccio, Muhammad Imran, Qammer H. Abbasi
Format: Article
Language: English
Published: Nature Portfolio, 2023-12-01
Series: Scientific Data
ISSN: 2052-4463
Online Access: https://doi.org/10.1038/s41597-023-02793-w
Author Affiliations
Yao Ge, Chong Tang, Zikang Chen, Jingyan Wang, Jonathan Cooper, Muhammad Imran, Qammer H. Abbasi: James Watt School of Engineering, University of Glasgow
Haobo Li, Daniele Faccio: School of Physics & Astronomy, University of Glasgow
Wenda Li: School of Science and Engineering, University of Dundee
Kevin Chetty: Department of Security and Crime Science, University College London