Improving automatic speech recognition through head pose driven visual grounding
In this paper, we present a multimodal speech recognition system for real-world scene description tasks. Given a visual scene, the system dynamically biases its language model based on the content of the visual scene and visual attention of the speaker. Visual attention is used to focus on likely ob...
Main Author: | Vosoughi, Soroush |
---|---|
Other Authors: | Massachusetts Institute of Technology. Media Laboratory |
Format: | Article |
Language: | en_US |
Published: | Association for Computing Machinery, 2014 |
Online Access: | http://hdl.handle.net/1721.1/86943 https://orcid.org/0000-0002-2564-8909 |
Similar Items
- Automatic visual speech recognition
  by: Irwan Widjojo, et al.
  Published: (2016)
- An automatic child-directed speech detector for the study of child language development
  by: Vosoughi, Soroush, et al.
  Published: (2013)
- Heading reference-assisted pose estimation for ground vehicles
  by: Wang, Han, et al.
  Published: (2020)
- Automatic detection and verification of rumors on Twitter
  by: Vosoughi, Soroush
  Published: (2015)
- Interactions of caregiver speech and early word learning in the Speechome corpus: computational explorations
  by: Vosoughi, Soroush
  Published: (2011)