ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION
This paper is an analytical review of recent achievements in audio-visual (AV) fusion (integration) of multimodal information. We discuss the main challenges and report on approaches to address them. One of the most important tasks of AV integration is to understand h...
Main Authors: | D.V. Ivanko, I. S. Kipyatkova, A. L. Ronzhin, A. A. Karpov |
---|---|
Format: | Article |
Language: | English |
Published: | Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University), 2016-05-01 |
Series: | Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki |
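Subjects: | audio-visual integration; audio-visual speech recognition; multimodal analysis; multimodal fusion; deep learning |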
Online Access: | http://ntv.ifmo.ru/file/article/15494.pdf |
_version_ | 1811193676711329792 |
---|---|
author | D.V. Ivanko, I. S. Kipyatkova, A. L. Ronzhin, A. A. Karpov |
collection | DOAJ |
description | This paper is an analytical review of recent achievements in audio-visual (AV) fusion (integration) of multimodal information. We discuss the main challenges and report on approaches to address them. One of the most important tasks of AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and classify the audio and visual features of speech. Special attention is paid to systematizing the existing techniques and AV data fusion methods. In the second part, based on our analysis of the research area, we provide a consolidated list of tasks and applications that use AV fusion, and we indicate the methods, techniques, and audio and video features they use. We propose a classification of AV integration approaches and discuss the advantages and disadvantages of each. We draw conclusions and offer our assessment of the future of the AV fusion field. In further research we plan to implement a system for audio-visual recognition of continuous Russian speech using advanced multimodal fusion methods. |
first_indexed | 2024-04-12T00:13:29Z |
format | Article |
id | doaj.art-1f187f72b84342a885a768c78312c876 |
institution | Directory of Open Access Journals |
issn | 2226-1494 2500-0373 |
language | English |
last_indexed | 2024-04-12T00:13:29Z |
publishDate | 2016-05-01 |
publisher | Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) |
record_format | Article |
series | Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki |
spelling | doaj.art-1f187f72b84342a885a768c78312c876; 2022-12-22T03:55:54Z; eng; Vol. 16, No. 3, pp. 387-401; DOI: 10.17586/2226-1494-2016-16-3-387-401 |
title | ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION |
topic | audio-visual integration; audio-visual speech recognition; multimodal analysis; multimodal fusion; deep learning |
url | http://ntv.ifmo.ru/file/article/15494.pdf |