Alive Scene: Participatory Multimodal AI Framework for Collective Narratives in Dynamic 3D Scene


Bibliographic Details
Main Author: Cheng, Chi-Li
Other Authors: Nagakura, Takehiko
Format: Thesis
Published: Massachusetts Institute of Technology, 2024
Online Access: https://hdl.handle.net/1721.1/157340
Description: This thesis introduces "Alive Scene," an online participatory platform for recording dynamic 3D environments and building collective interpretations of objects, events, and atmospheres within them. For instance, a user can browse a recording of a room and describe objects or events to locate them; or select a time frame, adjust the camera angle, and add a comment to share a new narrative of the scene with others. Unlike traditional digital formats such as simple videos or 3D models, this platform is both three-dimensional and temporal, and its views are searchable with natural-language sentences and sorted by relevance. By building the platform and testing it with human subjects, this thesis demonstrates that such a participatory medium for dynamic 3D environments fosters communal knowledge and enhances the spatial understanding of individual users. Alive Scene produces rich, semantic-level communication among users, akin to the dynamic propagation of cultural memes. The Alive Scene system integrates two techniques that are currently among the most widely used and efficient: 3D scene reconstruction using Gaussian splatting, and semantic linking of human perceptions through the Contrastive Language-Image Pretraining (CLIP) model. The platform continually enriches its collection of users' views and interpretations through interactions with this semantic AI system, archiving user inputs and suggesting new avenues for exploring diverse perspectives. The streamlined interaction interface promotes user engagement and facilitates the discovery of related views and perceptions. The user test employs a dynamic 3D scene of a student lounge, recorded at four different times, and involves 20 participants generating a total of 235 inputs.

Four types of interactive behavior were observed in users' views and interpretations: Disagreement, Simple Agreement, Sharing Perception by adding comments, and Adjusting Views. The analysis indicates evolutionary trends: initially, users express disagreements and provide objective, general comments. As the platform gathers these inputs, a transition occurs in which users begin sharing more subjective information and reinterpreting others' views. Eventually, users adjust camera angles when the captions are agreeable. Visualizations of this analysis illustrate that these dynamic behavioral changes facilitate the development of collective perception. Further investigation could benefit from more elaborate 3D scenes, additional recording times, and a larger number of participants.
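The search behavior described above — ranking stored views by relevance to a natural-language sentence — can be sketched as cosine-similarity ranking in a shared embedding space, as CLIP provides for text and images. The function name and the toy vectors below are illustrative assumptions, not code from the thesis; a real deployment would substitute CLIP-encoded query and view embeddings.

```python
import numpy as np

def rank_views_by_query(query_emb, view_embs):
    """Rank stored view embeddings by cosine similarity to a query embedding.

    In a CLIP-style setup, query_emb would come from the text encoder and
    view_embs from the image encoder; here, small toy vectors stand in for
    them to illustrate the relevance sort.
    """
    # Normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    v = view_embs / np.linalg.norm(view_embs, axis=1, keepdims=True)
    sims = v @ q
    # Indices of views, most relevant first.
    order = np.argsort(-sims)
    return order, sims[order]

# Toy example: three stored "views" and one "query" embedding.
views = np.array([[1.0, 0.0, 0.0],
                  [0.7, 0.7, 0.0],
                  [0.0, 1.0, 0.0]])
query = np.array([1.0, 0.1, 0.0])
order, scores = rank_views_by_query(query, views)
print(order)  # most relevant view index first
```

The same ranking works unchanged whatever the embedding dimensionality, since only dot products of unit vectors are involved.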
Department: Massachusetts Institute of Technology. Department of Architecture
Degree: S.M.
Date Issued: 2024-05
Author ORCID: 0009-0001-0774-4577
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/