Weakly Supervised Cross-Domain Mixed Dish Detection With Mean-Teacher

Bibliographic Details
Main Authors: Lixi Deng, Xu Zhang, Zhijie Shang
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Cross domain; detection; food recognition; weakly supervised
Online Access: https://ieeexplore.ieee.org/document/9247093/
collection DOAJ
description A mixed dish, which combines different types of dishes on one plate, is a popular kind of food in East and Southeast Asia. Identifying the dish types in a mixed dish is essential for dietary tracking, which has attracted increasing research attention recently. Nevertheless, mixed dish detection is challenging because of the large visual variance among dishes in different canteens, known as the domain shift problem. Since collecting and annotating sufficient training samples in each canteen is difficult, a more practical approach is to develop detection models that adapt quickly to cross-canteen mixed dish detection with less supervision. To this end, we propose a novel framework called the Weakly-supervised Mean Teacher Network (WMT-Net), which addresses this detection task in a weakly supervised manner: bounding box annotations are not required in the target domain. WMT-Net builds Mean Teacher learning by maintaining image-level consistency between the teacher and student modules. Specifically, the student model of WMT-Net first learns instance-level information from the source dataset in a fully supervised fashion. The whole architecture is then optimized with weakly supervised learning: 1) weakly supervised training of the student model to reduce the domain gap in global semantics between source and target data, and 2) image-level consistency to align the image-level predictions of the teacher and student models. Experimental results on a mixed dish dataset show that, even though WMT-Net is trained in a weakly supervised fashion on the target domain, its performance is very close to that of a model trained in a fully supervised fashion, which verifies the effectiveness of WMT-Net. In addition, WMT-Net achieves 44.6% mAP on Pascal VOC to Clipart cross-domain detection, an improvement of 7.2% mAP over the state-of-the-art method, which further demonstrates its generalization capability.
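The two training signals named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no formulas, so the max-over-boxes aggregation, the binary cross-entropy and squared-error loss forms, the decay value `alpha`, and all function names below are assumptions chosen for illustration. Only the general Mean Teacher idea (the teacher tracks an exponential moving average of the student weights) is standard.

```python
import math


def ema_update(teacher, student, alpha=0.99):
    """Mean Teacher step: teacher weights become an exponential moving
    average of the student weights. alpha is an assumed decay value."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def image_level_scores(instance_probs):
    """Collapse per-box class probabilities into one score per class.
    Assumption: max over boxes; the paper may aggregate differently."""
    num_classes = len(instance_probs[0])
    return [max(box[c] for box in instance_probs) for c in range(num_classes)]


def weak_label_loss(scores, labels, eps=1e-7):
    """Binary cross-entropy against image-level labels, so no bounding
    boxes are needed for target-domain images."""
    total = 0.0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)


def consistency_loss(student_scores, teacher_scores):
    """Mean squared difference between student and teacher image-level
    predictions (an assumed form of the consistency term)."""
    diffs = [(s - t) ** 2 for s, t in zip(student_scores, teacher_scores)]
    return sum(diffs) / len(diffs)


# Toy target-domain image: two predicted boxes, two dish classes.
student_boxes = [[0.9, 0.2], [0.1, 0.7]]
teacher_boxes = [[0.8, 0.3], [0.2, 0.6]]
student_img = image_level_scores(student_boxes)
teacher_img = image_level_scores(teacher_boxes)
loss = weak_label_loss(student_img, [1, 1]) + consistency_loss(student_img, teacher_img)
```

In this sketch only the student would receive gradients from `loss`; the teacher is refreshed with `ema_update` after each student step.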
first_indexed 2024-12-17T05:03:58Z
format Article
id doaj.art-9e05ecb06e0a48ecaabd5a7b4a742863
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-17T05:03:58Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
doi 10.1109/ACCESS.2020.3035715
volume 8
pages 201236-201246
affiliation Lixi Deng (https://orcid.org/0000-0002-3969-1940): Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Xu Zhang: National Computer Network Emergency Response Technical Team, Coordination Center of China, Beijing, China
Zhijie Shang: Information and Communication Branch, State Grid Corporation of China, Beijing, China
topic Cross domain
detection
food recognition
weakly supervised