Enabling data-driven video production with storytelling methodologies


Bibliographic Details
Main Author: Dong, Yi
Other Authors: Miao Chun Yan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2022
Subjects: Engineering::Computer science and engineering::Computer applications::Arts and humanities
Online Access: https://hdl.handle.net/10356/159081
Description
Video has become an increasingly dominant form of storytelling. Current video storytelling research focuses mainly on short clips that are usually a few seconds long and contain limited, focused content; this setting is relatively well studied because large annotated datasets exist. For long videos, however, we lack datasets annotated by domain experts. Moreover, a video story is an interdisciplinary subject that involves different modalities, so a holistic view is needed to facilitate effective storytelling. I aim to fill this gap from both the data perspective and the model perspective. The key contribution of this thesis is a more holistic view of video storytelling that combines data-driven approaches with cinematographic and psychological insights. The thesis considers both a single video story and a repository of video stories. For a single video story, I study how to model both long-term and short-term interactions among its segments using visual and structural characteristics, focusing on story-based video summarization and video paragraph captioning. For a repository of videos, I study video streaming driven by the viewer's emotional status to create a video therapy for the cognitively impaired elderly.

Firstly, Chapter 3 extends traditional user-interest-based video summarization to a holistic, story-based, cinematography-aware approach via domain-specific editing idioms. To address the shortage of storytelling datasets with professional editors' decisions, the Television Commercial (TVC) dataset is proposed, containing 618 professional TVC summarization pairs annotated with editing decisions by domain experts. Existing efforts rely on datasets that capture only user interests, whereas professional editors take a more holistic view that includes domain-specific interest, cinematography rules, and common summarization metrics. The summarization models are built on the established concept of editing idioms to incorporate rules of thumb for conveying a narrative, so users can efficiently explore different narrative styles through various combinations of editing idioms from a variety of domains.

Secondly, Chapter 4 extends segment-level recurrence video paragraph captioning to a holistic graph-based approach via an extra-hop mechanism in Transformers. Video paragraph captioning needs to capture interdependent information, since multiple events coexist and even overlap in a video. I advocate enhancing the self-attention mechanism by hopping across related segments to propagate surrounding and contextual information. Specifically, I propose video graph Transformers that merge the input segment information into an event correlation graph; segment-level recurrence is then extended with this graph-attending ability to represent the video story with more holistic information, which helps generate more accurate and coherent paragraph captions.

Thirdly, Chapter 5 investigates viewer emotion-aware video storytelling as a non-pharmacological therapy for the cognitively impaired elderly. I design a system that selects video content from a repository based on the viewer's emotional status. Specifically, selection is formulated as a sequential decision problem that accounts for the long-term effects of each choice, and numerical results verify the effectiveness and robustness of the algorithm.

The critical insight of this thesis is that machine learning approaches combined with traditional cinematography and psychological effects can provide a multi-domain, multi-scale understanding of the input video and enable video production with strong storytelling capabilities. The real-world deployment of the proposed approaches is described in Chapter 6. Finally, Chapter 7 discusses recent advances and future directions in this field.
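The editing-idiom mechanism described for Chapter 3 can be pictured as composable scoring rules over candidate shots that are combined with base interest scores. This is a minimal sketch under invented assumptions — the idiom names, shot fields, and scoring scheme below are illustrative only, not the thesis's published implementation:

```python
# Hypothetical editing idioms as composable scoring rules. Each idiom maps a
# shot list to per-shot score boosts; combining idioms yields a narrative style.

def idiom_open_on_product(shots):
    """Boost the first shot that shows the product (a TVC convention)."""
    scores = [0.0] * len(shots)
    for i, shot in enumerate(shots):
        if shot.get("shows_product"):
            scores[i] = 1.0
            break
    return scores

def idiom_end_on_logo(shots):
    """Boost the last shot that shows a logo, to close the summary on it."""
    scores = [0.0] * len(shots)
    for i in range(len(shots) - 1, -1, -1):
        if shots[i].get("is_logo"):
            scores[i] = 1.0
            break
    return scores

def summarize(shots, idioms, budget):
    """Combine idiom boosts with base interest and keep the top-scoring shots."""
    totals = [shot.get("interest", 0.0) for shot in shots]
    for idiom in idioms:
        for i, boost in enumerate(idiom(shots)):
            totals[i] += boost
    ranked = sorted(range(len(shots)), key=lambda i: -totals[i])
    keep = sorted(ranked[:budget])  # preserve original story order
    return [shots[i] for i in keep]

shots = [
    {"interest": 0.2, "shows_product": True},
    {"interest": 0.9},
    {"interest": 0.1},
    {"interest": 0.3, "is_logo": True},
]
summary = summarize(shots, [idiom_open_on_product, idiom_end_on_logo], budget=3)
```

Swapping in a different idiom list changes the narrative style without retraining anything, which matches the abstract's claim that users can explore styles through idiom combinations.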
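The extra-hop mechanism described for Chapter 4 can be illustrated as self-attention restricted by an event-correlation graph, so each segment gathers context from related — possibly non-adjacent — segments. This toy single-head, plain-Python sketch uses invented features and edges; it is not the thesis's actual architecture:

```python
import math

def graph_attention(features, edges):
    """features: one equal-length vector per video segment.
    edges: set of (i, j) pairs marking correlated events (undirected).
    Each segment attends only to itself and its graph neighbours."""
    n = len(features)
    allowed = [{i} for i in range(n)]
    for i, j in edges:
        allowed[i].add(j)
        allowed[j].add(i)
    out = []
    for i in range(n):
        # dot-product scores against the segments this one may attend to
        scores = {j: sum(a * b for a, b in zip(features[i], features[j]))
                  for j in allowed[i]}
        m = max(scores.values())
        weights = {j: math.exp(s - m) for j, s in scores.items()}  # softmax
        z = sum(weights.values())
        # weighted average of the attended segment features
        mixed = [sum(weights[j] * features[j][k] for j in allowed[i]) / z
                 for k in range(len(features[i]))]
        out.append(mixed)
    return out

segs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = graph_attention(segs, edges={(0, 2)})  # segment 1 has no correlated events
```

An isolated segment attends only to itself and is returned unchanged, while connected segments mix in each other's features — the "hop" that propagates contextual information across related events.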
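The sequential decision formulation in Chapter 5 can be sketched as a small Markov decision process solved by value iteration: the viewer's emotional state is the state, the chosen video category is the action, and discounting captures the long-term effect of each selection. Every state, action, transition probability, and reward below is invented for illustration; the thesis's actual formulation is not reproduced here:

```python
STATES = ["calm", "agitated"]
ACTIONS = ["nature", "family", "music"]

# P[s][a] = {next_state: probability}; R[s][a] = immediate reward (made up)
P = {
    "calm":     {"nature": {"calm": 0.9, "agitated": 0.1},
                 "family": {"calm": 0.7, "agitated": 0.3},
                 "music":  {"calm": 0.8, "agitated": 0.2}},
    "agitated": {"nature": {"calm": 0.6, "agitated": 0.4},
                 "family": {"calm": 0.3, "agitated": 0.7},
                 "music":  {"calm": 0.5, "agitated": 0.5}},
}
R = {"calm":     {"nature": 1.0, "family": 1.2, "music": 1.1},
     "agitated": {"nature": 0.5, "family": 0.2, "music": 0.4}}

def value_iteration(gamma=0.9, iters=200):
    """Compute state values and a greedy selection policy."""
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        V = {s: max(R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())
                    for a in ACTIONS)
             for s in STATES}
    policy = {s: max(ACTIONS,
                     key=lambda a: R[s][a] + gamma *
                         sum(p * V[t] for t, p in P[s][a].items()))
              for s in STATES}
    return V, policy
```

A greedy selector would pick the highest immediate reward in every state; discounting future value instead can prefer a lower-reward video now if it steers the viewer toward a calmer, higher-value state later — the long-term effect the abstract refers to.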
Thesis Details
School: Interdisciplinary Graduate School (IGS)
Research Centre: Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (LILY)
Supervisor Contact: ASCYMiao@ntu.edu.sg
Degree: Doctor of Philosophy
Deposited: 2022-06-07
Citation: Dong, Y. (2022). Enabling data-driven video production with storytelling methodologies. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/159081
License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).