Image similarity using an ensemble of context-sensitive models
Image similarity has been extensively studied in computer vision. In recent years, machine-learned models have shown their ability to encode more semantics than traditional multivariate metrics. However, in labelling semantic similarity, assigning a numerical score to a pair of images is impractical...
Main Authors: | Liao, Z; Chen, M |
---|---|
Format: | Conference item |
Language: | English |
Published: | Association for Computing Machinery, 2024 |
_version_ | 1817932557687193600 |
---|---|
author | Liao, Z; Chen, M |
author_facet | Liao, Z; Chen, M |
author_sort | Liao, Z |
collection | OXFORD |
description | Image similarity has been extensively studied in computer vision. In recent years, machine-learned models have shown their ability to encode more semantics than traditional multivariate metrics. However, when labelling semantic similarity, assigning a numerical score to a pair of images is impractical, which makes it difficult to improve and compare models on the task. In this work, we present a more intuitive approach to building and comparing image similarity models based on labelled data in the form of A:R vs B:R, i.e., determining whether an image A is closer to a reference image R than another image B is. We address the challenges of sparse sampling in the image space (R, A, B) and biases in models trained with context-based data by using an ensemble model. Our testing results show that the constructed ensemble model performs ∼5% better than the best individual context-sensitive models. It also performs better than models that were directly fine-tuned using mixed imagery data, as well as existing deep embeddings, e.g., CLIP [30] and DINO [3]. This work demonstrates that context-based labelling and model training can be effective when an appropriate ensemble approach is used to alleviate the limitation due to sparse sampling. |
first_indexed | 2024-12-09T03:39:49Z |
format | Conference item |
id | oxford-uuid:c510c7e3-8715-479b-b215-e596864abd1f |
institution | University of Oxford |
language | English |
last_indexed | 2024-12-09T03:39:49Z |
publishDate | 2024 |
publisher | Association for Computing Machinery |
record_format | dspace |
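
The description above defines labels of the form A:R vs B:R: for a reference image R, decide whether image A is closer to R than image B is. The following is a minimal sketch of how such triplet labels could be used to score individual models and a simple ensemble. It is not the authors' implementation. The record does not say how their ensemble combines models, so majority voting is an assumption here, and the three toy distance functions, the 2-D feature vectors, and all function names are hypothetical stand-ins for real image similarity models.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]
# Assumption: a "model" is any function returning a distance between
# a reference vector and a candidate vector (smaller means more similar).
DistanceFn = Callable[[Vector, Vector], float]
# A labelled triplet (R, A, B, label); label is True when A is closer to R.
Triplet = Tuple[Vector, Vector, Vector, bool]


def euclidean(r: Vector, x: Vector) -> float:
    """Toy model 1: Euclidean distance over all features."""
    return sum((p - q) ** 2 for p, q in zip(r, x)) ** 0.5


def manhattan(r: Vector, x: Vector) -> float:
    """Toy model 2: Manhattan distance over all features."""
    return sum(abs(p - q) for p, q in zip(r, x))


def first_axis_only(r: Vector, x: Vector) -> float:
    """Toy model 3: deliberately biased, looks at the first feature only."""
    return abs(r[0] - x[0])


def a_is_closer(model: DistanceFn, r: Vector, a: Vector, b: Vector) -> bool:
    """Answer the A:R vs B:R question with a single model."""
    return model(r, a) < model(r, b)


def ensemble_a_is_closer(
    models: Sequence[DistanceFn], r: Vector, a: Vector, b: Vector
) -> bool:
    """Answer the A:R vs B:R question by strict majority vote (assumed rule)."""
    votes = sum(a_is_closer(m, r, a, b) for m in models)
    return votes * 2 > len(models)


def triplet_accuracy(
    predict: Callable[[Vector, Vector, Vector], bool],
    triplets: Sequence[Triplet],
) -> float:
    """Fraction of labelled triplets on which the predictor agrees with the label."""
    correct = sum(predict(r, a, b) == label for r, a, b, label in triplets)
    return correct / len(triplets)


if __name__ == "__main__":
    models = [euclidean, manhattan, first_axis_only]
    # Hypothetical labelled triplets, invented purely for illustration.
    triplets = [
        ((0.0, 0.0), (1.0, 0.0), (0.5, 3.0), True),
        ((0.0, 0.0), (2.0, 2.0), (1.0, 1.0), False),
    ]
    for m in models:
        acc = triplet_accuracy(lambda r, a, b: a_is_closer(m, r, a, b), triplets)
        print(m.__name__, acc)
    ens = triplet_accuracy(
        lambda r, a, b: ensemble_a_is_closer(models, r, a, b), triplets
    )
    print("ensemble", ens)
```

Because every model is reduced to a yes/no answer on the same triplet, models with incomparable distance scales can be compared and combined without calibrating their scores, which is one practical appeal of comparative labels over numerical similarity scores.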