Summary: Over the past 15 years, work on visual salience has been restricted to models of low-level, bottom-up salience that estimate the salience of every pixel in an image. This study concerns the question of how to measure the salience of objects: given an image and a list of areas of interest (AOIs), can we assign salience scores to the AOIs that reflect their visual prominence? There is increasing evidence that fixation locations are best explained at the object level, and an object-level notion of visual salience can easily be combined with other object features, task relevance, and concepts such as scene context. However, extracting scores for AOIs from the saliency maps output by existing models is a non-trivial task. Using simple psychophysical (1/f-noise) stimuli, we demonstrate that simple methods for assigning salience scores to AOIs (such as taking the maximum, mean, or sum of the relevant pixels in the salience map) produce unintuitive results, such as predicting that larger objects are less salient. We also evaluate object salience models over a range of tasks and compare them to empirical data. Beyond predicting the number of fixations to different objects in a scene, we also estimate the difficulty of visual search trials and incorporate visual salience into language production tasks. We present a simple object-based salience model (based on comparing the likelihood of an AOI given the rest of the image to the likelihood of a typical patch of the same area) that gives intuitive results for the 1/f-noise stimuli and performs as well as existing methods on empirical datasets.
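As a minimal sketch of the baseline pooling methods described in the summary (not the paper's own code; the function and variable names, and the assumption of NumPy arrays, are illustrative), assigning an AOI score from a pixel-level saliency map might look like:

```python
import numpy as np

def aoi_pooled_scores(saliency_map: np.ndarray, aoi_mask: np.ndarray) -> dict:
    """Pool per-pixel salience values inside an AOI.

    saliency_map : 2-D float array from any pixel-level salience model
    aoi_mask     : 2-D boolean array marking the AOI's pixels
    Returns the max-, mean-, and sum-pooled scores discussed in the summary.
    """
    values = saliency_map[aoi_mask]
    return {
        "max": float(values.max()),
        "mean": float(values.mean()),   # can rate large objects as less salient
        "sum": float(values.sum()),     # grows with AOI area, favouring large objects
    }
```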