Weakly Supervised Cross-Domain Mixed Dish Detection With Mean-Teacher
A mixed dish, which combines different types of dishes on one plate, is a popular kind of food in East and Southeast Asia. Identifying the dish types in a mixed dish is essential for dietary tracking, which has gained increasing research attention recently. Nevertheless, mixed dish detection is a challenging...
Main Authors: | Lixi Deng, Xu Zhang, Zhijie Shang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | Cross domain, detection, food recognition, weakly supervised |
Online Access: | https://ieeexplore.ieee.org/document/9247093/ |
_version_ | 1818662628214439936 |
---|---|
author | Lixi Deng, Xu Zhang, Zhijie Shang |
author_facet | Lixi Deng, Xu Zhang, Zhijie Shang |
author_sort | Lixi Deng |
collection | DOAJ |
description | A mixed dish, which combines different types of dishes on one plate, is a popular kind of food in East and Southeast Asia. Identifying the dish types in a mixed dish is essential for dietary tracking, which has gained increasing research attention recently. Nevertheless, mixed dish detection is a challenging task because of the large visual variance among dishes in different canteens, which is known as the domain shift problem. Since collecting and annotating sufficient training samples in each canteen is difficult, a more practical approach is to develop detection models that can adapt quickly to cross-canteen mixed-dish detection with less supervision. To this end, we propose a novel framework called Weakly-supervised Mean Teacher Network (WMT-Net) that addresses this detection task in a weakly supervised manner, where bounding box annotations are not required in the target domain. The proposed WMT-Net constructs Mean Teacher learning by maintaining image-level consistency between the teacher and student modules. Specifically, WMT-Net first learns instance-level information from the source dataset in a fully supervised fashion for the student model. Then the whole architecture is optimized with weakly supervised learning: 1) weakly supervised training of the student model to reduce the domain gap in global semantics between source data and target data, and 2) image-level consistency to align the image-level predictions of the teacher model and the student model. Experimental results on a mixed-dish dataset show that even though WMT-Net is trained in a weakly supervised fashion on the target domain, its performance is very close to that of a model trained in a fully supervised fashion, which verifies the effectiveness of WMT-Net. In addition, WMT-Net achieves 44.6% mAP on Pascal VOC to Clipart cross-domain detection, an improvement of 7.2% mAP over the state-of-the-art method, which further demonstrates its generalization capability. |
first_indexed | 2024-12-17T05:03:58Z |
format | Article |
id | doaj.art-9e05ecb06e0a48ecaabd5a7b4a742863 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-17T05:03:58Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-9e05ecb06e0a48ecaabd5a7b4a742863; 2022-12-21T22:02:28Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 201236; 201246; 10.1109/ACCESS.2020.3035715; 9247093; Weakly Supervised Cross-Domain Mixed Dish Detection With Mean-Teacher; Lixi Deng (0, https://orcid.org/0000-0002-3969-1940); Xu Zhang (1); Zhijie Shang (2); Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; National Computer Network Emergency Response Technical Team, Coordination Center of China, Beijing, China; Information and Communication Branch, State Grid Corporation of China, Beijing, China; https://ieeexplore.ieee.org/document/9247093/; Cross domain; detection; food recognition; weakly supervised |
spellingShingle | Lixi Deng Xu Zhang Zhijie Shang Weakly Supervised Cross-Domain Mixed Dish Detection With Mean-Teacher IEEE Access Cross domain detection food recognition weakly supervised |
title | Weakly Supervised Cross-Domain Mixed Dish Detection With Mean-Teacher |
title_full | Weakly Supervised Cross-Domain Mixed Dish Detection With Mean-Teacher |
title_fullStr | Weakly Supervised Cross-Domain Mixed Dish Detection With Mean-Teacher |
title_full_unstemmed | Weakly Supervised Cross-Domain Mixed Dish Detection With Mean-Teacher |
title_short | Weakly Supervised Cross-Domain Mixed Dish Detection With Mean-Teacher |
title_sort | weakly supervised cross domain mixed dish detection with mean teacher |
topic | Cross domain, detection, food recognition, weakly supervised |
url | https://ieeexplore.ieee.org/document/9247093/ |
work_keys_str_mv | AT lixideng weaklysupervisedcrossdomainmixeddishdetectionwithmeanteacher AT xuzhang weaklysupervisedcrossdomainmixeddishdetectionwithmeanteacher AT zhijieshang weaklysupervisedcrossdomainmixeddishdetectionwithmeanteacher |
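
The description field in this record mentions two mechanisms: a teacher model that tracks the student (Mean Teacher), and an image-level consistency term that aligns teacher and student predictions. The record gives no implementation details, so the following is only a minimal sketch under stated assumptions, not the authors' WMT-Net code. The function names, the decay value, the max-over-boxes aggregation, and the squared-error loss are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               decay: float = 0.99) -> Dict[str, float]:
    """Move each teacher weight toward the student weight by an
    exponential moving average (the usual Mean Teacher update).
    The decay of 0.99 is an assumed default, not a value from the paper."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}


def image_level_scores(instance_scores: List[List[float]]) -> List[float]:
    """Collapse per-box class scores into one score per class.
    Taking the maximum over boxes is an assumption; the record does not
    say how image-level predictions are formed."""
    if not instance_scores:
        raise ValueError("need at least one box")
    num_classes = len(instance_scores[0])
    return [max(box[c] for box in instance_scores) for c in range(num_classes)]


def consistency_loss(student_scores: List[float],
                     teacher_scores: List[float]) -> float:
    """Mean squared difference between student and teacher image-level
    predictions (squared error is an assumed choice of distance)."""
    if len(student_scores) != len(teacher_scores):
        raise ValueError("score vectors must have the same length")
    total = sum((s - t) ** 2 for s, t in zip(student_scores, teacher_scores))
    return total / len(student_scores)


if __name__ == "__main__":
    # Toy example: two boxes, two dish classes, for one target-domain image.
    student_boxes = [[0.2, 0.7], [0.9, 0.1]]
    teacher_boxes = [[0.3, 0.5], [0.9, 0.2]]
    loss = consistency_loss(image_level_scores(student_boxes),
                            image_level_scores(teacher_boxes))
    print("image-level consistency loss:", loss)

    # After a student optimisation step, the teacher follows by EMA.
    print(ema_update({"w": 1.0}, {"w": 0.0}, decay=0.9))
```

Only image-level quantities enter the consistency term in this sketch, which matches the record's statement that bounding box annotations are not required in the target domain.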