Multimodal generative models for storytelling

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2021

Bibliographic Details
Main Author: Bensaid, Eden.
Other Authors: Jacob Andreas and Hendrik Strobelt.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology, 2021
Subjects: Electrical Engineering and Computer Science.
Online Access: https://hdl.handle.net/1721.1/130680
Abstract

Storytelling is an open-ended task that entails creative thinking and requires a constant flow of ideas. Generative models have recently gained momentum thanks to their ability to identify the inner structure of complex data and to learn efficiently from unlabeled data [34]. Natural language generation (NLG) for storytelling is especially challenging because the generated text must follow an overall theme while remaining creative and diverse enough to engage the reader [26]. Competitive story generation models still suffer from repetition [19], cannot consistently condition on a theme [51], and struggle to produce a grounded, evolving storyboard [43]. Published story visualization architectures that generate images require a descriptive text of the scene to be illustrated [30]. It therefore seems promising to evaluate an interactive multimodal generative platform that collaborates with writers on the complex task of story generation. In co-creation, writers contribute their creative thinking, while generative models help sustain the constant flow of ideas the task requires. In this work, we introduce a system and a web-based demo, FairyTailor¹, for machine-in-the-loop visual story co-creation. Users can create a cohesive children's story by weaving generated texts and retrieved images with their own input.

FairyTailor adds a visual modality and modifies the text generation process to produce a coherent and creative sequence of text and images. To our knowledge, this is the first dynamic tool for multimodal story generation that allows interactive co-creation of both texts and images. It also allows users to give feedback on co-created stories and share their results. We release the demo source code² for other researchers' use.

Statement of Responsibility: by Eden Bensaid.
Degree: M.Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Notes: Cataloged from the official PDF of thesis. Includes bibliographical references (pages 41-45).
Physical Description: 45 pages (application/pdf)
Identifier: 1251773235
Added to Repository: 2021-05-24
Rights: MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available at http://dspace.mit.edu/handle/1721.1/7582.