An fMRI dataset of 1,102 natural videos for visual event understanding
A visual event, such as a dog running in a park, communicates complex relationships between objects and their environment. The human visual system is tasked with transforming these spatiotemporal events into meaningful outputs so we can effectively interact with our environment. To form a useful representation of the event, the visual system utilizes many visual processes, from object recognition to motion perception. Thus, studying the neural correlates of visual event understanding requires brain responses that capture the entire transformation from video-based stimuli to high-level conceptual understanding. However, despite its ecological importance and computational richness, there does not yet exist a dataset to sufficiently study visual event understanding. Here we release the Algonauts Action Videos (AAV) dataset, composed of high-quality functional magnetic resonance imaging (fMRI) brain responses to 1,102 richly annotated naturalistic video stimuli. We detail AAV's experimental design and highlight its high quality and reliable activation throughout the visual and parietal cortices. Initial analyses show the signal contained in AAV reflects numerous visual processes representing different aspects of visual event understanding, from scene recognition to action recognition to memorability processing. Since AAV captures an ecologically relevant and complex visual process, this dataset can be used to study how various aspects of visual perception integrate to form a meaningful understanding of a video. Additionally, we demonstrate its utility as a model evaluation benchmark to bridge the gap between visual neuroscience and video-based computer vision research.
Main Author: | Lahner, Benjamin |
---|---|
Other Authors: | Oliva, Aude |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2022 |
Online Access: | https://hdl.handle.net/1721.1/144631 |
_version_ | 1811087405659193344 |
---|---|
author | Lahner, Benjamin |
author2 | Oliva, Aude |
author_facet | Oliva, Aude Lahner, Benjamin |
author_sort | Lahner, Benjamin |
collection | MIT |
description | A visual event, such as a dog running in a park, communicates complex relationships between objects and their environment. The human visual system is tasked with transforming these spatiotemporal events into meaningful outputs so we can effectively interact with our environment. To form a useful representation of the event, the visual system utilizes many visual processes, from object recognition to motion perception. Thus, studying the neural correlates of visual event understanding requires brain responses that capture the entire transformation from video-based stimuli to high-level conceptual understanding. However, despite its ecological importance and computational richness, there does not yet exist a dataset to sufficiently study visual event understanding. Here we release the Algonauts Action Videos (AAV) dataset, composed of high-quality functional magnetic resonance imaging (fMRI) brain responses to 1,102 richly annotated naturalistic video stimuli. We detail AAV's experimental design and highlight its high quality and reliable activation throughout the visual and parietal cortices. Initial analyses show the signal contained in AAV reflects numerous visual processes representing different aspects of visual event understanding, from scene recognition to action recognition to memorability processing. Since AAV captures an ecologically relevant and complex visual process, this dataset can be used to study how various aspects of visual perception integrate to form a meaningful understanding of a video. Additionally, we demonstrate its utility as a model evaluation benchmark to bridge the gap between visual neuroscience and video-based computer vision research. |
first_indexed | 2024-09-23T13:45:35Z |
format | Thesis |
id | mit-1721.1/144631 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T13:45:35Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/144631 2022-08-30T03:27:29Z An fMRI dataset of 1,102 natural videos for visual event understanding Lahner, Benjamin Oliva, Aude Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science A visual event, such as a dog running in a park, communicates complex relationships between objects and their environment. The human visual system is tasked with transforming these spatiotemporal events into meaningful outputs so we can effectively interact with our environment. To form a useful representation of the event, the visual system utilizes many visual processes, from object recognition to motion perception. Thus, studying the neural correlates of visual event understanding requires brain responses that capture the entire transformation from video-based stimuli to high-level conceptual understanding. However, despite its ecological importance and computational richness, there does not yet exist a dataset to sufficiently study visual event understanding. Here we release the Algonauts Action Videos (AAV) dataset composed of high quality functional magnetic resonance imaging brain responses to 1,102 richly annotated naturalistic video stimuli. We detail AAV's experimental design and highlight its high quality and reliable activation throughout the visual and parietal cortices. Initial analyses show the signal contained in AAV reflects numerous visual processes representing different aspects of visual event understanding, from scene recognition to action recognition to memorability processing. Since AAV captures an ecologically-relevant and complex visual process, this dataset can be used to study how various aspects of visual perception integrate to form a meaningful understanding of a video. Additionally, we demonstrate its utility as a model evaluation benchmark to bridge the gap between visual neuroscience and video-based computer vision research. S.M.
2022-08-29T16:00:48Z 2022-08-29T16:00:48Z 2022-05 2022-06-21T19:25:39.641Z Thesis https://hdl.handle.net/1721.1/144631 0000-0002-1821-490X In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Lahner, Benjamin An fMRI dataset of 1,102 natural videos for visual event understanding |
title | An fMRI dataset of 1,102 natural videos for visual event understanding |
title_full | An fMRI dataset of 1,102 natural videos for visual event understanding |
title_fullStr | An fMRI dataset of 1,102 natural videos for visual event understanding |
title_full_unstemmed | An fMRI dataset of 1,102 natural videos for visual event understanding |
title_short | An fMRI dataset of 1,102 natural videos for visual event understanding |
title_sort | fmri dataset of 1 102 natural videos for visual event understanding |
url | https://hdl.handle.net/1721.1/144631 |
work_keys_str_mv | AT lahnerbenjamin anfmridatasetof1102naturalvideosforvisualeventunderstanding AT lahnerbenjamin fmridatasetof1102naturalvideosforvisualeventunderstanding |