A comprehensive multimodal dataset for contactless lip reading and acoustic analysis

Abstract: Small-scale motion detection using non-invasive remote sensing techniques has recently attracted significant interest in speech recognition. This dataset paper aims to facilitate the enhancement and restoration of speech information from diverse data sources. We introduce RVTALL, a novel multimodal dataset combining Radio frequency, Visual, Text, Audio, Laser, and Lip-landmark information. Specifically, the dataset consists of 7.5 GHz channel impulse response (CIR) data from ultra-wideband (UWB) radars, 77 GHz frequency-modulated continuous-wave (FMCW) data from a millimeter-wave (mmWave) radar, visual and audio recordings, lip landmarks, and laser data, offering a unique multimodal basis for speech recognition research. A depth camera records each subject's lip landmarks alongside their voice. Approximately 400 minutes of annotated speech profiles are provided, collected from 20 participants speaking 5 vowels, 15 words, and 16 sentences. The dataset has been validated and supports the investigation of lip reading and multimodal speech recognition.
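
The abstract fixes the corpus dimensions (20 participants, each producing 5 vowels, 15 words, and 16 sentences) and lists six sensing modalities. The Python sketch below shows one way such a corpus might be enumerated and how a lip-motion spectrogram could be extracted from a UWB channel impulse response recording via a short-time Fourier transform. The directory layout, file names, `.npy` storage, 100 Hz slow-time frame rate, and range-bin index are all hypothetical illustrations, not the dataset's documented structure; consult the data descriptor itself for the actual organization.

```python
# Minimal sketch of iterating an RVTALL-style corpus and inspecting a radar
# recording. All paths, file names, and array layouts are hypothetical.
from pathlib import Path

import numpy as np
from scipy import signal

# Corpus dimensions reported in the abstract: 20 participants, each
# speaking 5 vowels, 15 words, and 16 sentences.
PARTICIPANTS = [f"subject_{i:02d}" for i in range(1, 21)]
CATEGORIES = {"vowels": 5, "words": 15, "sentences": 16}
MODALITIES = ["uwb_cir", "mmwave_fmcw", "audio", "video", "laser", "lip_landmarks"]


def sample_paths(root: Path, subject: str, category: str, item: int) -> dict[str, Path]:
    """Build per-modality file paths for one utterance (hypothetical layout)."""
    stem = f"{subject}_{category}_{item:02d}"
    return {m: root / subject / category / m / f"{stem}.npy" for m in MODALITIES}


def lip_motion_spectrogram(cir: np.ndarray, frame_rate: float, range_bin: int):
    """STFT magnitude of one UWB range bin over slow time.

    `cir` is assumed complex-valued with shape (slow_time, range_bins);
    `frame_rate` is the slow-time sampling rate in Hz.
    """
    slow_time = cir[:, range_bin]
    slow_time = slow_time - slow_time.mean()  # suppress the static-clutter response
    f, t, z = signal.stft(slow_time, fs=frame_rate, nperseg=128, noverlap=96)
    return f, t, np.abs(z)


if __name__ == "__main__":
    root = Path("RVTALL")  # hypothetical extraction directory
    paths = sample_paths(root, PARTICIPANTS[0], "vowels", 1)
    if paths["uwb_cir"].exists():
        cir = np.load(paths["uwb_cir"])
        f, t, spec = lip_motion_spectrogram(cir, frame_rate=100.0, range_bin=20)
        print(f"spectrogram shape: {spec.shape}")
```

The single-range-bin STFT is a common way to expose the kind of small-scale motion the abstract describes: periodic lip and jaw movement phase-modulates the reflected signal, and removing the mean before the transform makes the resulting micro-Doppler content easier to see.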


Bibliographic Details
Main Authors: Yao Ge, Chong Tang, Haobo Li, Zikang Chen, Jingyan Wang, Wenda Li, Jonathan Cooper, Kevin Chetty, Daniele Faccio, Muhammad Imran, Qammer H. Abbasi
Format: Article
Language: English
Published: Nature Portfolio, 2023-12-01
Series: Scientific Data
ISSN: 2052-4463
Online Access: https://doi.org/10.1038/s41597-023-02793-w
Author Affiliations
Yao Ge, Chong Tang, Zikang Chen, Jingyan Wang, Jonathan Cooper, Muhammad Imran, Qammer H. Abbasi: James Watt School of Engineering, University of Glasgow
Haobo Li, Daniele Faccio: School of Physics & Astronomy, University of Glasgow
Wenda Li: School of Science and Engineering, University of Dundee
Kevin Chetty: Department of Security and Crime Science, University College London