Text-Conditioned Outfit Recommendation With Hybrid Attention Layer
Text-conditioned outfit recommendation aims to recommend a whole fashion outfit that satisfies the compatibility between the recommended items and given items and adheres to the text condition like “Paradise Tropical Vacation” or “60s Style”. Using text de...
Main Authors: | Xin Wang, Yueqi Zhong |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2024-01-01 |
Series: | IEEE Access |
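Subjects: | Fashion recommendation; conditional recommendation; multimedia recommendation; visual fashion analysis; transformer |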
Online Access: | https://ieeexplore.ieee.org/document/10373836/ |
_version_ | 1797366819518939136 |
---|---|
author | Xin Wang; Yueqi Zhong |
author_facet | Xin Wang; Yueqi Zhong |
author_sort | Xin Wang |
collection | DOAJ |
description | Text-conditioned outfit recommendation aims to recommend a whole fashion outfit in which the recommended items are compatible with the given items and which adheres to a text condition such as “Paradise Tropical Vacation” or “60s Style”. Using a text description as a condition gives users a flexible and accurate way to retrieve and recommend fashion items, but this problem is underexplored by existing studies. A challenge of text-conditioned outfit recommendation is how to encode and fuse the outfit text description with the fashion item images and text. To address this, this paper proposes a framework featuring a hybrid attention layer that models the relationship between the outfit text description and the fashion items for condition compliance, and the relationship among the fashion items for internal compatibility. To encode fashion item features, our method uses pre-trained FashionCLIP as an extractor, which significantly reduces the number of trainable parameters compared with previous methods that train a CNN from scratch. Whole outfits are generated by iteratively adding compatible items to a given partial outfit. Compared with state-of-the-art methods on the Polyvore disjoint and non-disjoint datasets, our approach achieves a 3% relative improvement in compatibility prediction AUC, a 5% relative improvement in fill-in-the-blank accuracy, and a 19% relative improvement, on average, in complementary item retrieval recall at different ranks. In addition, we demonstrate that our approach can recommend a whole outfit that has inner compatibility and adheres to the text description. |
first_indexed | 2024-03-08T17:10:04Z |
format | Article |
id | doaj.art-ba203a0ee6ab4335a5d996e86f93d3eb |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-08T17:10:04Z |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-ba203a0ee6ab4335a5d996e86f93d3eb; 2024-01-04T00:02:40Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; vol. 12; pp. 281-293; 10.1109/ACCESS.2023.3346933; 10373836; Text-Conditioned Outfit Recommendation With Hybrid Attention Layer; Xin Wang (0), https://orcid.org/0000-0002-4315-1867; Yueqi Zhong (1), https://orcid.org/0000-0003-2056-7672; (0) College of Textiles, Donghua University, Shanghai, China; (1) Key Laboratory of Textile Science and Technology, Ministry of Education, Shanghai, China; Text-conditioned outfit recommendation aims to recommend a whole fashion outfit in which the recommended items are compatible with the given items and which adheres to a text condition such as “Paradise Tropical Vacation” or “60s Style”. Using a text description as a condition gives users a flexible and accurate way to retrieve and recommend fashion items, but this problem is underexplored by existing studies. A challenge of text-conditioned outfit recommendation is how to encode and fuse the outfit text description with the fashion item images and text. To address this, this paper proposes a framework featuring a hybrid attention layer that models the relationship between the outfit text description and the fashion items for condition compliance, and the relationship among the fashion items for internal compatibility. To encode fashion item features, our method uses pre-trained FashionCLIP as an extractor, which significantly reduces the number of trainable parameters compared with previous methods that train a CNN from scratch. Whole outfits are generated by iteratively adding compatible items to a given partial outfit. Compared with state-of-the-art methods on the Polyvore disjoint and non-disjoint datasets, our approach achieves a 3% relative improvement in compatibility prediction AUC, a 5% relative improvement in fill-in-the-blank accuracy, and a 19% relative improvement, on average, in complementary item retrieval recall at different ranks. In addition, we demonstrate that our approach can recommend a whole outfit that has inner compatibility and adheres to the text description.; https://ieeexplore.ieee.org/document/10373836/; Fashion recommendation; conditional recommendation; multimedia recommendation; visual fashion analysis; transformer |
spellingShingle | Xin Wang; Yueqi Zhong; Text-Conditioned Outfit Recommendation With Hybrid Attention Layer; IEEE Access; Fashion recommendation; conditional recommendation; multimedia recommendation; visual fashion analysis; transformer |
title | Text-Conditioned Outfit Recommendation With Hybrid Attention Layer |
title_full | Text-Conditioned Outfit Recommendation With Hybrid Attention Layer |
title_fullStr | Text-Conditioned Outfit Recommendation With Hybrid Attention Layer |
title_full_unstemmed | Text-Conditioned Outfit Recommendation With Hybrid Attention Layer |
title_short | Text-Conditioned Outfit Recommendation With Hybrid Attention Layer |
title_sort | text conditioned outfit recommendation with hybrid attention layer |
topic | Fashion recommendation; conditional recommendation; multimedia recommendation; visual fashion analysis; transformer |
url | https://ieeexplore.ieee.org/document/10373836/ |
work_keys_str_mv | AT xinwang textconditionedoutfitrecommendationwithhybridattentionlayer AT yueqizhong textconditionedoutfitrecommendationwithhybridattentionlayer |
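
The description states that whole outfits are generated by iteratively adding compatible items to a given partial outfit, under a text condition. The sketch below illustrates only that general idea with a simple greedy loop. It is not the authors' model: the names `cosine`, `score`, `complete_outfit`, and the weight `alpha` are hypothetical, items and the text condition are assumed to be plain embedding vectors, and cosine similarity stands in for the learned hybrid attention layer.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(candidate, outfit, text_vec, alpha=0.5):
    """Hypothetical score: blend of text compliance and mean compatibility.

    Assumption: cosine similarity replaces the learned attention-based scoring
    described in the abstract; alpha is an illustrative weight, not a paper value.
    """
    compliance = cosine(candidate, text_vec)
    if outfit:
        compat = sum(cosine(candidate, item) for item in outfit) / len(outfit)
    else:
        compat = 0.0
    return alpha * compliance + (1 - alpha) * compat


def complete_outfit(partial, candidates, text_vec, target_size):
    """Greedily add the best-scoring unused candidate until target_size is reached.

    partial: list of item vectors already in the outfit.
    candidates: dict mapping item name -> item vector.
    Returns the names of the added items, in the order they were chosen.
    """
    outfit = list(partial)
    remaining = dict(candidates)
    chosen = []
    while len(outfit) < target_size and remaining:
        best = max(remaining, key=lambda name: score(remaining[name], outfit, text_vec))
        outfit.append(remaining.pop(best))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    partial = [[1.0, 0.0]]
    text_vec = [1.0, 0.0]
    candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
    print(complete_outfit(partial, candidates, text_vec, target_size=3))
```

In this toy setup each step rescores the remaining candidates against the growing outfit, so earlier choices influence later ones, which is the property the iterative generation in the description relies on.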