Improving automatic speech recognition through head pose driven visual grounding

In this paper, we present a multimodal speech recognition system for real-world scene description tasks. Given a visual scene, the system dynamically biases its language model based on the content of the scene and the visual attention of the speaker. Visual attention is used to focus on likely ob...

Bibliographic Details
Main Author:	Vosoughi, Soroush
Other Authors:	Massachusetts Institute of Technology. Media Laboratory
Format:	Article
Language:	en_US
Published:	Association for Computing Machinery 2014
Online Access:	http://hdl.handle.net/1721.1/86943 https://orcid.org/0000-0002-2564-8909

Similar Items

Automatic visual speech recognition
by: Irwan Widjojo, et al.
Published: (2016)

An automatic child-directed speech detector for the study of child language development
by: Vosoughi, Soroush, et al.
Published: (2013)

Heading reference-assisted pose estimation for ground vehicles
by: Wang, Han, et al.
Published: (2020)

Automatic detection and verification of rumors on Twitter
by: Vosoughi, Soroush
Published: (2015)

Interactions of caregiver speech and early word learning in the Speechome corpus : computational explorations
by: Vosoughi, Soroush
Published: (2011)

Tweet Acts: A Speech Act Classifier for Twitter
by: Vosoughi, Soroush, et al.
Published: (2016)

A longitudinal study of prosodic exaggeration in child-directed speech
by: Vosoughi, Soroush, et al.
Published: (2013)

Pronunciation learning for automatic speech recognition
by: Badr, Ibrahim
Published: (2011)

Analysis and modeling of non-native speech for automatic speech recognition
by: Livescu, Karen, 1975-
Published: (2013)

A Semi-Automatic Method for Efficient Detection of Stories on Social Media
by: Vosoughi, Soroush, et al.
Published: (2016)

Dual-pivot pose determination of human head based on head movement
by: Yusoff, Fakhrul Hazman, et al.
Published: (2007)

Speaker-machine interaction in automatic speech recognition.
Published: (2004)

The Use of Distinctive Features for Automatic Speech Recognition
by: Meng, Helen Mei-Ling
Published: (2023)

A new structure for automatic speech recognition
by: Duchnowski, Paul
Published: (2005)

Signal enhancement for automatic recognition of noisy speech
Published: (2004)

The use of distinctive features for automatic speech recognition
by: Meng, Helen M
Published: (2005)

Using graphone models in automatic speech recognition
by: Wang, Stanley Xinlei
Published: (2010)

Automatic acquisition of language models for speech recognition
by: McCandless, Michael Kyle
Published: (2007)

Signal enhancement for automatic recognition of noisy speech
by: Verbout, Shawn M. (Shawn Matthew)
Published: (2007)

Automatic Acquisition of Language Models for Speech Recognition
by: McCandless, Michael Kyle
Published: (2023)

Automatic Detection and Categorization of Election-Related Tweets
by: Vijayaraghavan, Prashanth, et al.
Published: (2016)

Towards a face recognition system : face detection, face registration, and head pose estimation
by: Ying, Ying
Published: (2014)

Grounding language models in spatiotemporal context
by: Roy, Brandon C., et al.
Published: (2014)

Automatic Estimation of Transcription Accuracy and Difficulty
by: Vosoughi, Soroush, et al.
Published: (2012)

Neural techniques for modeling visually grounded speech
by: Leidal, Kenneth (Kenneth Knute)
Published: (2018)

HeadLock : wide-range head pose estimation for low resolution video
by: DeCamp, Philip (Philip James)
Published: (2008)

Automatic speech recognition for air traffic control communications
by: Poh, Leston Choo Kiat
Published: (2023)

A comparison of auditory models for automatic speech recognition
by: Jankowski, Charles Robert
Published: (2005)

The use of speaker correlation information for automatic speech recognition
by: Hazen, Timothy J. (Timothy James), 1969-
Published: (2009)

Multi-level acoustic modeling for automatic speech recognition
by: Chang, Hung-An, Ph. D. Massachusetts Institute of Technology
Published: (2012)

Automatic Speech Recognition for Air Traffic Control Communications
by: Badrinath, Sandeep, et al.
Published: (2022)

Multilingual techniques for low resource automatic speech recognition
by: Chuangsuwanich, Ekapol
Published: (2016)

Automatic refinement of hidden Markov models for speech recognition
by: Schlueter, Stephen
Published: (2008)

Automatic acoustic measurement optimization for segmental speech recognition
by: Muzumdar, Manish D. (Manish Deepak)
Published: (2008)

Feature-based pronunciation modeling for automatic speech recognition
by: Livescu, Karen, 1975-
Published: (2008)

A computational model for the automatic recognition of affect in speech
by: Fernandez, Raul
Published: (2005)

Pose-robust face recognition
by: Xie, Zhuofan
Published: (2020)

Articulatory features for robust visual speech recognition
by: Saenko, Ekaterina, 1976-
Published: (2005)

The usefulness of automatic speech recognition (ASR) Eyespeak Software in improving Iraqi EFL students’ pronunciation
by: Sidgi, Lina Fathi Sidig, et al.
Published: (2017)

Mutually reinforcing motion-pose framework for pose invariant action recognition
by: Ramanathan, Manoj, et al.
Published: (2020)