Statistics of High-level Scene Context

Context is critical to our ability to recognize environments and to search for objects within them: contextual associations have been shown to modulate reaction time and object recognition accuracy, as well as influence the distribution of eye movements and patterns of brain activations. However, we...

Bibliographic Details
Main Author: Michelle R. Greene
Format: Article
Language: English
Published: Frontiers Media S.A. 2013-10-01
Series: Frontiers in Psychology
Subjects: Data Mining; Scene Recognition; ensemble; scene; bag of words; context
Online Access: http://journal.frontiersin.org/Journal/10.3389/fpsyg.2013.00777/full
author Michelle R. Greene
collection DOAJ
description Context is critical to our ability to recognize environments and to search for objects within them: contextual associations have been shown to modulate reaction time and object recognition accuracy, as well as influence the distribution of eye movements and patterns of brain activations. However, we have not yet systematically quantified the relationships between objects and their scene environments. Here I seek to fill this gap by providing descriptive statistics of object-scene relationships. A total of 48,167 objects were hand-labeled in 3,499 scenes using the LabelMe tool (Russell, Torralba, Murphy & Freeman, 2008). From these data, I computed a variety of descriptive statistics at three different levels of analysis: the ensemble statistics that describe the density and spatial distribution of unnamed things in the scene; the bag of words level where scenes are described by the list of objects contained within them; and the structural level where the spatial distribution and relationships between the objects are measured. The utility of each level of description for scene categorization was assessed through the use of linear classifiers, and the plausibility of each level for modeling human rapid scene categorization is discussed. Ensemble statistics were found to be the most informative (per feature), and also best explained human patterns of categorization errors. Although a bag of words classifier had similar performance to human observers, it had a markedly different pattern of errors. Some objects are more useful than others, and ceiling classification performance could be achieved using only the 64 most informative objects. As object location tends not to vary as a function of category, structural information provided little additional information. Additionally, these data provide valuable information on natural scene redundancy that can be exploited for machine vision, and can help researchers in visual cognition design new data-driven experiments.
first_indexed 2024-12-21T22:33:05Z
format Article
id doaj.art-ef56acc9f8664343b2bdd67b0f27dcc2
institution Directory of Open Access Journals
issn 1664-1078
language English
last_indexed 2024-12-21T22:33:05Z
publishDate 2013-10-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Psychology
spelling doaj.art-ef56acc9f8664343b2bdd67b0f27dcc2; 2022-12-21T18:48:03Z; eng; Frontiers Media S.A.; Frontiers in Psychology; 1664-1078; 2013-10-01; 4; 10.3389/fpsyg.2013.00777; 54269; Statistics of High-level Scene Context; Michelle R. Greene; 0; Stanford University; http://journal.frontiersin.org/Journal/10.3389/fpsyg.2013.00777/full; Data Mining; Scene Recognition; ensemble; scene; bag of words; context
title Statistics of High-level Scene Context
topic Data Mining
Scene Recognition
ensemble
scene
bag of words
context
url http://journal.frontiersin.org/Journal/10.3389/fpsyg.2013.00777/full