Text-Conditioned Outfit Recommendation With Hybrid Attention Layer

Text-conditioned outfit recommendation aims to recommend a whole fashion outfit whose recommended items are compatible with the given items and which adheres to a text condition such as “Paradise Tropical Vacation” or “60s Style”. Using text de...


Bibliographic Details
Main Authors: Xin Wang, Yueqi Zhong
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
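Subjects: Fashion recommendation, conditional recommendation, multimedia recommendation, visual fashion analysis, transformer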
Online Access: https://ieeexplore.ieee.org/document/10373836/
_version_ 1797366819518939136
author Xin Wang
Yueqi Zhong
author_facet Xin Wang
Yueqi Zhong
author_sort Xin Wang
collection DOAJ
description Text-conditioned outfit recommendation aims to recommend a whole fashion outfit whose recommended items are compatible with the given items and which adheres to a text condition such as “Paradise Tropical Vacation” or “60s Style”. Using a text description as a condition gives users a flexible and accurate way to retrieve and recommend fashion items, but this problem is underexplored in existing studies. A key challenge of text-conditioned outfit recommendation is how to encode and fuse the outfit text description with the fashion item images and text. To address this, this paper proposes a framework featuring a hybrid attention layer that models the relationship between the outfit text description and the fashion items for condition compliance, and the relationship among the fashion items for internal compatibility. To encode fashion item features, our method uses pre-trained FashionCLIP as an extractor, which significantly reduces the number of trainable parameters compared with previous methods that train a CNN from scratch. Whole outfits are generated by iteratively adding compatible items to a given partial outfit. Compared with state-of-the-art methods on the Polyvore disjoint and non-disjoint datasets, our approach achieves a 3% relative improvement in compatibility prediction AUC, a 5% relative improvement in fill-in-the-blank accuracy, and a 19% relative improvement, on average, in complementary item retrieval recall at different ranks. We also demonstrate that our approach can recommend a whole outfit that is internally compatible and adheres to the text description.
first_indexed 2024-03-08T17:10:04Z
format Article
id doaj.art-ba203a0ee6ab4335a5d996e86f93d3eb
institution Directory of Open Access Journals
issn 2169-3536
language English
last_indexed 2024-03-08T17:10:04Z
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-ba203a0ee6ab4335a5d996e86f93d3eb
2024-01-04T00:02:40Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
12
281-293
10.1109/ACCESS.2023.3346933
10373836
Text-Conditioned Outfit Recommendation With Hybrid Attention Layer
Xin Wang (0), https://orcid.org/0000-0002-4315-1867
Yueqi Zhong (1), https://orcid.org/0000-0003-2056-7672
(0) College of Textiles, Donghua University, Shanghai, China
(1) Key Laboratory of Textile Science and Technology, Ministry of Education, Shanghai, China
Text-conditioned outfit recommendation aims to recommend a whole fashion outfit whose recommended items are compatible with the given items and which adheres to a text condition such as “Paradise Tropical Vacation” or “60s Style”. Using a text description as a condition gives users a flexible and accurate way to retrieve and recommend fashion items, but this problem is underexplored in existing studies. A key challenge of text-conditioned outfit recommendation is how to encode and fuse the outfit text description with the fashion item images and text. To address this, this paper proposes a framework featuring a hybrid attention layer that models the relationship between the outfit text description and the fashion items for condition compliance, and the relationship among the fashion items for internal compatibility. To encode fashion item features, our method uses pre-trained FashionCLIP as an extractor, which significantly reduces the number of trainable parameters compared with previous methods that train a CNN from scratch. Whole outfits are generated by iteratively adding compatible items to a given partial outfit. Compared with state-of-the-art methods on the Polyvore disjoint and non-disjoint datasets, our approach achieves a 3% relative improvement in compatibility prediction AUC, a 5% relative improvement in fill-in-the-blank accuracy, and a 19% relative improvement, on average, in complementary item retrieval recall at different ranks. We also demonstrate that our approach can recommend a whole outfit that is internally compatible and adheres to the text description.
https://ieeexplore.ieee.org/document/10373836/
Fashion recommendation
conditional recommendation
multimedia recommendation
visual fashion analysis
transformer
spellingShingle Xin Wang
Yueqi Zhong
Text-Conditioned Outfit Recommendation With Hybrid Attention Layer
IEEE Access
Fashion recommendation
conditional recommendation
multimedia recommendation
visual fashion analysis
transformer
title Text-Conditioned Outfit Recommendation With Hybrid Attention Layer
title_full Text-Conditioned Outfit Recommendation With Hybrid Attention Layer
title_fullStr Text-Conditioned Outfit Recommendation With Hybrid Attention Layer
title_full_unstemmed Text-Conditioned Outfit Recommendation With Hybrid Attention Layer
title_short Text-Conditioned Outfit Recommendation With Hybrid Attention Layer
title_sort text conditioned outfit recommendation with hybrid attention layer
topic Fashion recommendation
conditional recommendation
multimedia recommendation
visual fashion analysis
transformer
url https://ieeexplore.ieee.org/document/10373836/
work_keys_str_mv AT xinwang textconditionedoutfitrecommendationwithhybridattentionlayer
AT yueqizhong textconditionedoutfitrecommendationwithhybridattentionlayer
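
The description in this record says that whole outfits are generated by iteratively adding compatible items to a given partial outfit, under a text condition. The following is a minimal sketch of that greedy loop only, not the authors' implementation. The names `cosine`, `score`, `complete_outfit`, and the weight `alpha` are hypothetical, the two-dimensional vectors are toy stand-ins for FashionCLIP embeddings, and the cosine-based score is an assumed placeholder for the paper's hybrid attention layer.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(candidate, outfit_vecs, text_vec, alpha=0.5):
    """Placeholder score (an assumption, not the paper's model):
    mean similarity to the current outfit items, blended with
    similarity to the text-condition vector."""
    compat = sum(cosine(candidate, v) for v in outfit_vecs) / len(outfit_vecs)
    return alpha * compat + (1 - alpha) * cosine(candidate, text_vec)


def complete_outfit(partial, candidates, text_vec, target_size):
    """Greedily add the best-scoring candidate until the outfit is full
    or the candidate pool is empty. Returns item names in the order added."""
    outfit = dict(partial)   # copy so the inputs are not mutated
    pool = dict(candidates)
    while len(outfit) < target_size and pool:
        current = list(outfit.values())
        best = max(pool, key=lambda name: score(pool[name], current, text_vec))
        outfit[best] = pool.pop(best)
    return list(outfit)


# Toy data: axis 0 loosely means "summer", axis 1 loosely means "winter".
partial = {"shirt": [1.0, 0.0]}
candidates = {
    "shorts": [0.9, 0.1],
    "parka": [0.0, 1.0],
    "sandals": [0.8, 0.2],
}
text_vec = [1.0, 0.0]  # stand-in for an encoded text condition

result = complete_outfit(partial, candidates, text_vec, target_size=3)
print(result)
```

In a realistic setting the placeholder `score` would be replaced by a trained model that scores a candidate against both the partial outfit and the text description; the greedy loop itself would stay the same.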