Text-Conditioned Outfit Recommendation With Hybrid Attention Layer

Text-conditioned outfit recommendation aims to recommend a whole fashion outfit that satisfies the compatibility between the recommended items and given items and adheres to the text condition like “Paradise Tropical Vacation” or “60s Style”. Using text de...

Bibliographic Details
Main Authors:	Xin Wang, Yueqi Zhong
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Fashion recommendation; conditional recommendation; multimedia recommendation; visual fashion analysis; transformer
Online Access:	https://ieeexplore.ieee.org/document/10373836/

_version_	1797366819518939136
author	Xin Wang Yueqi Zhong
author_facet	Xin Wang Yueqi Zhong
author_sort	Xin Wang
collection	DOAJ
description	Text-conditioned outfit recommendation aims to recommend a whole fashion outfit whose items are compatible with the given items and that adheres to a text condition such as “Paradise Tropical Vacation” or “60s Style”. Using a text description as a condition gives users a flexible and accurate way to retrieve and recommend fashion items, but this problem is underexplored by existing studies. A key challenge of text-conditioned outfit recommendation is how to encode and fuse the outfit text description with the fashion item images and text. To address this, this paper proposes a framework featuring a hybrid attention layer that models the relationship between the outfit text description and the fashion items for condition compliance, and the relationship among fashion items for internal compatibility. To encode fashion item features, our method uses pre-trained FashionCLIP as an extractor, which significantly reduces the number of trainable parameters compared with previous methods that train a CNN from scratch. Whole outfits are generated by iteratively adding compatible items to a given partial outfit. Compared with state-of-the-art methods on the Polyvore disjoint and non-disjoint datasets, our approach achieves a 3% relative improvement in compatibility prediction AUC, a 5% relative improvement in fill-in-the-blank accuracy, and a 19% relative improvement on average in complementary item retrieval recall at different ranks. In addition, we demonstrate that our approach can recommend a whole outfit that is internally compatible and adheres to the text description.
first_indexed	2024-03-08T17:10:04Z
format	Article
id	doaj.art-ba203a0ee6ab4335a5d996e86f93d3eb
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-03-08T17:10:04Z
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-ba203a0ee6ab4335a5d996e86f93d3eb | 2024-01-04T00:02:40Z | eng | IEEE | IEEE Access | 2169-3536 | 2024-01-01 | 12 | 281 | 293 | 10.1109/ACCESS.2023.3346933 | 10373836 | Text-Conditioned Outfit Recommendation With Hybrid Attention Layer | Xin Wang 0 https://orcid.org/0000-0002-4315-1867 | Yueqi Zhong 1 https://orcid.org/0000-0003-2056-7672 | College of Textiles, Donghua University, Shanghai, China | Key Laboratory of Textile Science and Technology, Ministry of Education, Shanghai, China | https://ieeexplore.ieee.org/document/10373836/ | Fashion recommendation | conditional recommendation | multimedia recommendation | visual fashion analysis | transformer
spellingShingle	Xin Wang Yueqi Zhong Text-Conditioned Outfit Recommendation With Hybrid Attention Layer IEEE Access Fashion recommendation conditional recommendation multimedia recommendation visual fashion analysis transformer
title	Text-Conditioned Outfit Recommendation With Hybrid Attention Layer
title_full	Text-Conditioned Outfit Recommendation With Hybrid Attention Layer
title_fullStr	Text-Conditioned Outfit Recommendation With Hybrid Attention Layer
title_full_unstemmed	Text-Conditioned Outfit Recommendation With Hybrid Attention Layer
title_short	Text-Conditioned Outfit Recommendation With Hybrid Attention Layer
title_sort	text conditioned outfit recommendation with hybrid attention layer
topic	Fashion recommendation conditional recommendation multimedia recommendation visual fashion analysis transformer
url	https://ieeexplore.ieee.org/document/10373836/
work_keys_str_mv	AT xinwang textconditionedoutfitrecommendationwithhybridattentionlayer AT yueqizhong textconditionedoutfitrecommendationwithhybridattentionlayer
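
The description states that whole outfits are generated by iteratively adding compatible items to a given partial outfit. The sketch below illustrates only that greedy loop under loud assumptions: the two-dimensional vectors are made-up stand-ins for FashionCLIP embeddings, and `score` is a toy cosine-similarity heuristic, not the paper's hybrid attention layer. The names `cosine`, `score`, and `complete_outfit` are hypothetical and do not come from the article.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score(candidate, outfit_vecs, text_vec):
    """Toy stand-in score (assumption): mean similarity to the items already
    in the outfit, plus similarity to the text-condition vector."""
    compat = (
        sum(cosine(candidate, item) for item in outfit_vecs) / len(outfit_vecs)
        if outfit_vecs
        else 0.0
    )
    return compat + cosine(candidate, text_vec)


def complete_outfit(partial, candidates, text_vec, target_size):
    """Greedily add the best-scoring candidate until the outfit is full.

    partial:    list of (name, vector) pairs already in the outfit
    candidates: dict mapping item name -> vector
    """
    outfit = list(partial)
    pool = dict(candidates)  # copy so the caller's dict is not mutated
    while len(outfit) < target_size and pool:
        current = [vec for _, vec in outfit]
        best = max(pool, key=lambda name: score(pool[name], current, text_vec))
        outfit.append((best, pool.pop(best)))
    return [name for name, _ in outfit]


if __name__ == "__main__":
    partial = [("shirt", [1.0, 0.0])]
    candidates = {
        "shorts": [0.9, 0.1],
        "parka": [0.0, 1.0],
        "sandals": [0.8, 0.2],
    }
    text_vec = [1.0, 0.0]  # pretend embedding of a text condition
    print(complete_outfit(partial, candidates, text_vec, target_size=3))
```

Each pass re-scores the remaining candidates against the outfit as it currently stands, so an item added early influences what is chosen later. A learned compatibility model would replace `score` without changing the loop.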

Similar Items