MovieQA: Understanding Stories in Movies through Question-Answering
We introduce the MovieQA dataset which aims to evaluate automatic story comprehension from both video and text. The dataset consists of 14,944 questions about 408 movies with high semantic diversity. The questions range from simpler "Who" did "What" to "Whom", to "Why" and "How" certain events occurred. Each question comes with a set of five possible answers, a correct one and four deceiving answers provided by human annotators. Our dataset is unique in that it contains multiple sources of information - video clips, plots, subtitles, scripts, and DVS. We analyze our data through various statistics and methods. We further extend existing QA techniques to show that question-answering with such open-ended semantics is hard. We make this data set public along with an evaluation benchmark to encourage inspiring work in this challenging domain.
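The record describes a multiple-choice benchmark: each question comes with five candidate answers, exactly one of them correct, grounded in sources such as plots, subtitles, scripts, video clips, and DVS. A minimal sketch of one such item and the accuracy metric it implies follows; all class and field names here are hypothetical, since the released dataset defines its own schema:

```python
from dataclasses import dataclass, field

@dataclass
class MovieQAItem:
    # Hypothetical field names; the official release uses its own format.
    qid: str
    movie_id: str
    question: str
    answers: list                    # exactly five candidate answers
    correct_index: int               # index (0-4) of the single correct answer
    sources: list = field(default_factory=list)  # e.g. "plot", "subtitles", "video_clip"

def accuracy(predictions, items):
    """Fraction of items whose predicted index matches the correct one."""
    hits = sum(1 for pred, item in zip(predictions, items)
               if pred == item.correct_index)
    return hits / len(items)

item = MovieQAItem(
    qid="q0", movie_id="m0",
    question="Why does the protagonist leave town?",
    answers=["A", "B", "C", "D", "E"],
    correct_index=2,
    sources=["plot"],
)
# With five candidates per question, random guessing yields ~20% expected accuracy.
print(accuracy([2], [item]))  # -> 1.0
```

This framing explains why the evaluation benchmark mentioned in the abstract can score systems with a single number: exact-match accuracy over the five-way choice.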
Main Authors: | Tapaswi, Makarand; Zhu, Yukun; Stiefelhagen, Rainer; Torralba, Antonio; Urtasun, Raquel; Fidler, Sanja |
---|---|
Other Authors: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Format: | Article |
Language: | en_US |
Published: | Institute of Electrical and Electronics Engineers (IEEE), 2018 |
Online Access: | http://hdl.handle.net/1721.1/113894 https://orcid.org/0000-0003-4915-0256 |
_version_ | 1811076672600932352 |
---|---|
author | Tapaswi, Makarand Zhu, Yukun Stiefelhagen, Rainer Torralba, Antonio Urtasun, Raquel Fidler, Sanja |
author2 | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
author_facet | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Tapaswi, Makarand Zhu, Yukun Stiefelhagen, Rainer Torralba, Antonio Urtasun, Raquel Fidler, Sanja |
author_sort | Tapaswi, Makarand |
collection | MIT |
description | We introduce the MovieQA dataset which aims to evaluate automatic story comprehension from both video and text. The dataset consists of 14,944 questions about 408 movies with high semantic diversity. The questions range from simpler "Who" did "What" to "Whom", to "Why" and "How" certain events occurred. Each question comes with a set of five possible answers, a correct one and four deceiving answers provided by human annotators. Our dataset is unique in that it contains multiple sources of information - video clips, plots, subtitles, scripts, and DVS. We analyze our data through various statistics and methods. We further extend existing QA techniques to show that question-answering with such open-ended semantics is hard. We make this data set public along with an evaluation benchmark to encourage inspiring work in this challenging domain. Keywords: Motion pictures,
Visualization, Semantics, Voltage control, Cognition, Natural languages, Computer vision |
first_indexed | 2024-09-23T10:25:45Z |
format | Article |
id | mit-1721.1/113894 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T10:25:45Z |
publishDate | 2018 |
publisher | Institute of Electrical and Electronics Engineers (IEEE) |
record_format | dspace |
spelling | mit-1721.1/113894 2022-09-30T21:03:48Z MovieQA: Understanding Stories in Movies through Question-Answering Tapaswi, Makarand; Zhu, Yukun; Stiefelhagen, Rainer; Torralba, Antonio; Urtasun, Raquel; Fidler, Sanja. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. 2018-02-26T21:43:32Z 2016-12 2016-06 Article http://purl.org/eprint/type/ConferencePaper 978-1-4673-8851-1 http://hdl.handle.net/1721.1/113894 Tapaswi, Makarand, et al. "MovieQA: Understanding Stories in Movies through Question-Answering." 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June 2016, Las Vegas, Nevada, IEEE, 2016, pp. 4631–40. https://orcid.org/0000-0003-4915-0256 en_US http://dx.doi.org/10.1109/CVPR.2016.501 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Institute of Electrical and Electronics Engineers (IEEE) arXiv |
spellingShingle | Tapaswi, Makarand Zhu, Yukun Stiefelhagen, Rainer Torralba, Antonio Urtasun, Raquel Fidler, Sanja MovieQA: Understanding Stories in Movies through Question-Answering |
title | MovieQA: Understanding Stories in Movies through Question-Answering |
title_full | MovieQA: Understanding Stories in Movies through Question-Answering |
title_fullStr | MovieQA: Understanding Stories in Movies through Question-Answering |
title_full_unstemmed | MovieQA: Understanding Stories in Movies through Question-Answering |
title_short | MovieQA: Understanding Stories in Movies through Question-Answering |
title_sort | movieqa understanding stories in movies through question answering |
url | http://hdl.handle.net/1721.1/113894 https://orcid.org/0000-0003-4915-0256 |
work_keys_str_mv | AT tapaswimakarand movieqaunderstandingstoriesinmoviesthroughquestionanswering AT zhuyukun movieqaunderstandingstoriesinmoviesthroughquestionanswering AT stiefelhagenrainer movieqaunderstandingstoriesinmoviesthroughquestionanswering AT torralbaantonio movieqaunderstandingstoriesinmoviesthroughquestionanswering AT urtasunraquel movieqaunderstandingstoriesinmoviesthroughquestionanswering AT fidlersanja movieqaunderstandingstoriesinmoviesthroughquestionanswering |