ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

Bibliographic Details
Main Authors: D. V. Ivanko, I. S. Kipyatkova, A. L. Ronzhin, A. A. Karpov
Format: Article
Language: English
Published: Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) 2016-05-01
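Series: Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
Subjects: audio-visual integration; audio-visual speech recognition; multimodal analysis; multimodal fusion; deep learning
Online Access: http://ntv.ifmo.ru/file/article/15494.pdf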
author D. V. Ivanko
I. S. Kipyatkova
A. L. Ronzhin
A. A. Karpov
collection DOAJ
description This paper is an analytical review of recent advances in audio-visual (AV) fusion (integration) of multimodal information. We discuss the main challenges and report on approaches to addressing them. One of the most important tasks in AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review, we set out the basic principles of AV speech recognition and classify the audio and visual features of speech. Special attention is paid to systematizing the existing techniques and AV data fusion methods. In the second part, based on our analysis of the research area, we provide a consolidated list of tasks and applications that use AV fusion, and indicate the methods, techniques, and audio and video features used. We propose a classification of AV integration approaches and discuss the advantages and disadvantages of the different approaches. We draw conclusions and offer our assessment of the future of the field of AV fusion. In further research, we plan to implement a system for audio-visual recognition of continuous Russian speech using advanced methods of multimodal fusion.
first_indexed 2024-04-12T00:13:29Z
format Article
id doaj.art-1f187f72b84342a885a768c78312c876
institution Directory Open Access Journal
issn 2226-1494
2500-0373
language English
last_indexed 2024-04-12T00:13:29Z
publishDate 2016-05-01
publisher Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University)
record_format Article
series Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
spelling doaj.art-1f187f72b84342a885a768c78312c876; 2022-12-22T03:55:54Z; eng; Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University); Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki; ISSN 2226-1494, 2500-0373; 2016-05-01; vol. 16, no. 3, pp. 387-401; DOI 10.17586/2226-1494-2016-16-3-387-401; ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION; D. V. Ivanko, I. S. Kipyatkova, A. L. Ronzhin, A. A. Karpov; http://ntv.ifmo.ru/file/article/15494.pdf; audio-visual integration, audio-visual speech recognition, multimodal analysis, multimodal fusion, deep learning
title ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION
topic audio-visual integration
audio-visual speech recognition
multimodal analysis
multimodal fusion
deep learning
url http://ntv.ifmo.ru/file/article/15494.pdf