Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books

Books are a rich source of both fine-grained information, how a character, an object or a scene looks like, as well as high-level semantics, what someone is thinking, feeling and how these states evolve through a story. This paper aims to align books to their movie releases in order to provide rich...

Bibliographic Details
Main Authors:	Zhu, Yukun, Kiros, Ryan, Zemel, Rich, Salakhutdinov, Ruslan, Urtasun, Raquel, Torralba, Antonio, Fidler, Sanja
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	en_US
Published:	Institute of Electrical and Electronics Engineers (IEEE) 2017
Online Access:	http://hdl.handle.net/1721.1/112996 https://orcid.org/0000-0003-4915-0256

_version_	1811074537628893184
author	Zhu, Yukun Kiros, Ryan Zemel, Rich Salakhutdinov, Ruslan Urtasun, Raquel Torralba, Antonio Fidler, Sanja
author2	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Zhu, Yukun Kiros, Ryan Zemel, Rich Salakhutdinov, Ruslan Urtasun, Raquel Torralba, Antonio Fidler, Sanja
author_sort	Zhu, Yukun
collection	MIT
description	Books are a rich source of both fine-grained information, how a character, an object or a scene looks like, as well as high-level semantics, what someone is thinking, feeling and how these states evolve through a story. This paper aims to align books to their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in the current datasets. To align movies and books we propose a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. We propose a context-aware CNN to combine information from multiple sources. We demonstrate good quantitative performance for movie/book alignment and show several qualitative examples that showcase the diversity of tasks our model can be used for.
first_indexed	2024-09-23T09:51:23Z
format	Article
id	mit-1721.1/112996
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T09:51:23Z
publishDate	2017
publisher	Institute of Electrical and Electronics Engineers (IEEE)
record_format	dspace
spelling	mit-1721.1/112996 2022-09-30T17:17:22Z Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books Zhu, Yukun Kiros, Ryan Zemel, Rich Salakhutdinov, Ruslan Urtasun, Raquel Torralba, Antonio Fidler, Sanja Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Torralba, Antonio Books are a rich source of both fine-grained information, how a character, an object or a scene looks like, as well as high-level semantics, what someone is thinking, feeling and how these states evolve through a story. This paper aims to align books to their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in the current datasets. To align movies and books we propose a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. We propose a context-aware CNN to combine information from multiple sources. We demonstrate good quantitative performance for movie/book alignment and show several qualitative examples that showcase the diversity of tasks our model can be used for. Natural Sciences and Engineering Research Council of Canada Canadian Institute for Advanced Research Samsung (Firm) Google (Firm) United States. Office of Naval Research (ONR-N00014-14-1-0232) 2017-12-29T20:28:58Z 2017-12-29T20:28:58Z 2016-02 2015-12 Article http://purl.org/eprint/type/ConferencePaper 978-1-4673-8391-2 http://hdl.handle.net/1721.1/112996 Zhu, Yukun, et al. "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books." 2015 IEEE International Conference on Computer Vision (ICCV), 7-13 December, 2015, Santiago, Chile, IEEE, 2015, pp. 19–27. https://orcid.org/0000-0003-4915-0256 en_US http://dx.doi.org/10.1109/ICCV.2015.11 2015 IEEE International Conference on Computer Vision (ICCV) Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Institute of Electrical and Electronics Engineers (IEEE) arXiv
spellingShingle	Zhu, Yukun Kiros, Ryan Zemel, Rich Salakhutdinov, Ruslan Urtasun, Raquel Torralba, Antonio Fidler, Sanja Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books
title	Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books
title_sort	aligning books and movies towards story like visual explanations by watching movies and reading books
url	http://hdl.handle.net/1721.1/112996 https://orcid.org/0000-0003-4915-0256

Similar Items