Multimodal generative models for storytelling
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2021
Main Author: | Bensaid, Eden. |
---|---|
Other Authors: | Jacob Andreas and Hendrik Strobelt. |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology, 2021 |
Online Access: | https://hdl.handle.net/1721.1/130680 |
Collection: | MIT |
Record ID: | mit-1721.1/130680 |
Institution: | Massachusetts Institute of Technology |
Department: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. |
Degree: | M.Eng. |
Notes: | Cataloged from the official PDF of thesis. Includes bibliographical references (pages 41-45). |
Abstract: | Storytelling is an open-ended task that entails creative thinking and requires a constant flow of ideas. Generative models have recently gained momentum thanks to their ability to identify the inner structure of complex data and to learn efficiently from unlabeled data [34]. Natural language generation (NLG) for storytelling is especially challenging because the generated text must follow an overall theme while remaining creative and diverse enough to engage the reader [26]. Competitive story generation models still suffer from repetition [19], are unable to consistently condition on a theme [51], and struggle to produce a grounded, evolving storyboard [43]. Published story visualization architectures that generate images require a descriptive text of the scene to be illustrated [30]. It therefore seems promising to evaluate an interactive multimodal generative platform that collaborates with writers on the complex story-generation task. In co-creation, writers contribute their creative thinking, while generative models help sustain the constant flow of ideas. In this work, we introduce a system and a web-based demo, FairyTailor¹, for machine-in-the-loop visual story co-creation. Users can create a cohesive children's story by weaving generated texts and retrieved images with their own input. FairyTailor adds another modality and modifies the text generation process to produce a coherent and creative sequence of text and images. To our knowledge, this is the first dynamic tool for multimodal story generation that allows interactive co-creation of both texts and images. The tool also lets users give feedback on co-created stories and share their results. We release the demo source code² for other researchers' use. |
Physical Description: | 45 pages, application/pdf |
Date Accessioned: | 2021-05-24T19:40:13Z |
Other Identifier: | 1251773235 |
Rights: | MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. http://dspace.mit.edu/handle/1721.1/7582 |
Subjects: | Electrical Engineering and Computer Science. |