Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

We describe a model of object recognition as machine translation. In this model, recognition is a process of annotating image regions with words. Firstly, images are segmented into regions, which are classified into region types using a variety of features. A mapping between region types and keyword...

Bibliographic Details
Main Authors: Duygulu, P, Barnard, K, Freitas, J, Forsyth, D
Format: Conference item
Published: Springer Berlin Heidelberg 2002
Description: We describe a model of object recognition as machine translation. In this model, recognition is a process of annotating image regions with words. First, images are segmented into regions, which are classified into region types using a variety of features. A mapping between region types and the keywords supplied with the images is then learned, using a method based on EM. This process is analogous to learning a lexicon from an aligned bitext. For the implementation we describe, these words are nouns taken from a large vocabulary. On a large test set, the method can predict numerous words with high accuracy. Simple methods identify words that cannot be predicted well. We show how to group words that are individually difficult to predict into clusters that can be predicted well. For example, we cannot predict the distinction between train and locomotive using the current set of features, but we can predict the underlying concept. The method is trained on a substantial collection of images. Extensive experimental results illustrate the strengths and weaknesses of the approach.
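The lexicon-learning step in the description can be illustrated with a small sketch. This is not the authors' implementation: it assumes an IBM Model 1 style EM, a standard way to learn a lexicon from an aligned bitext, and treats each image as a bag of region-type tokens paired with a bag of keywords. The names `learn_lexicon`, `best_word`, `toy_pairs`, and the `b_...` region-type labels are hypothetical and exist only for this example.

```python
from collections import defaultdict


def learn_lexicon(pairs, iterations=20):
    """Estimate p(word | region_type) with IBM Model 1 style EM.

    pairs: list of (region_types, words), one entry per annotated image.
    Returns a nested dict t with t[region_type][word] = probability.
    Assumption: every word in an image is generated by exactly one of
    that image's region types; no null alignment is modeled.
    """
    region_vocab = {r for regions, _ in pairs for r in regions}
    word_vocab = {w for _, words in pairs for w in words}

    # Start from a uniform translation table.
    uniform = 1.0 / len(word_vocab)
    t = {r: {w: uniform for w in word_vocab} for r in region_vocab}

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)

        # E-step: share each word among the regions of its image,
        # in proportion to the current translation probabilities.
        for regions, words in pairs:
            for w in words:
                norm = sum(t[r][w] for r in regions)
                for r in regions:
                    posterior = t[r][w] / norm
                    count[r][w] += posterior
                    total[r] += posterior

        # M-step: renormalize the expected counts per region type.
        for r in region_vocab:
            for w in word_vocab:
                t[r][w] = count[r][w] / total[r]
    return t


def best_word(t, region_type):
    """Return the most probable word for a region type."""
    return max(t[region_type], key=t[region_type].get)


# Toy data: three "images", each with two region types and two keywords.
toy_pairs = [
    (["b_sky", "b_grass"], ["sky", "grass"]),
    (["b_sky", "b_water"], ["sky", "water"]),
    (["b_grass", "b_tiger"], ["grass", "tiger"]),
]

lexicon = learn_lexicon(toy_pairs)
for region in sorted(lexicon):
    print(region, "->", best_word(lexicon, region))
```

In the toy data, `b_water` only ever co-occurs with the words sky and water, so co-occurrence counts alone cannot separate them. EM resolves this because `b_sky` already explains sky in the other image, which shifts probability mass for `b_water` toward water.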
Institution: University of Oxford, Department of Computer Science