Summary: | Referring expression generation (REG) presents the converse problem to visual search: given a scene and a specified target, how does one generate a description that would allow somebody else to quickly and accurately locate the target? Previous work in psycholinguistics and natural language processing that has addressed this question identifies only a limited role for vision in this task. That work, which relies largely on simple scenes, tends to treat vision as a pre-process for extracting feature categories relevant to disambiguation. However, the visual search literature suggests that some descriptions are better than others at enabling listeners to search efficiently within complex stimuli. This paper presents the results of a study testing whether speakers are sensitive to visual features that allow them to compose such `good' descriptions. Our results show that visual properties (salience, clutter, area, and distance) influence REG for targets embedded in images from the *Where's Wally?* books, which are an order of magnitude more complex than traditional stimuli. Referring expressions for large, salient targets are shorter than those for smaller, less salient targets, and targets within highly cluttered scenes are described using more words. We also find that speakers are more likely to mention non-target landmarks that are large, salient, and in close proximity to the target. These findings identify a key role for visual salience in language production decisions and highlight the importance of scene complexity for REG.