Visual food recognition using artificial intelligence

Food-related research has gained increasing attention for its importance in people's daily lives. A proper understanding of daily food intake benefits not only an individual's health but also society as a whole. Visual food classification is the task of recognizing different food dishes...


Bibliographic Details
Main Author: Zhao, Heng
Other Authors: Yap Kim Hui
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Engineering::Electrical and electronic engineering
Online Access:https://hdl.handle.net/10356/161182
description Food-related research has gained increasing attention for its importance in people's daily lives. A proper understanding of daily food intake benefits not only an individual's health but also society as a whole. Visual food classification is the task of recognizing different food dishes from pictures. Early approaches relied on traditional classification methods based on hand-crafted image features. More recent progress uses popular deep architectures such as convolutional neural networks (CNNs) and performs classification directly on visual features extracted from the trained network. In this thesis, we explore several issues in visual food classification using deep architectures. Our first work, in Chapter 3, develops a compact network for mobile visual food recognition. We propose a joint-learning distilled network (JDNet) that achieves high food classification accuracy with a compact student network by learning from a large teacher network. Both networks are trained simultaneously while knowledge is transferred between them using the proposed knowledge distillation (KD) techniques. With the joint model, we achieve strong performance compared to state-of-the-art large models. Chapters 4 and 5 address the data scarcity often encountered in visual food classification. Existing frameworks rely heavily on many-shot training of a deep network on large-scale food datasets. However, for many food categories it is difficult to collect a large number of training images.
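The joint teacher-student training described above follows the general knowledge-distillation recipe. JDNet's exact loss formulation is not given in this abstract, so the following is only a minimal sketch of a standard distillation objective; the temperature `T`, mixing weight `alpha`, and all function names are illustrative assumptions, not the thesis' own design.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.7):
    """Weighted sum of (a) the KL divergence between the teacher's and
    student's temperature-softened distributions and (b) the student's
    cross-entropy against the ground-truth label index."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    ce = -math.log(softmax(student_logits)[label])
    # T*T rescales the soft-target term, following the usual KD convention.
    return alpha * (T * T) * kl + (1 - alpha) * ce
```

A higher `alpha` pushes the student to mimic the teacher's softened outputs; a lower one weights the hard labels more heavily.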
In view of this, we study visual food classification under low-shot learning scenarios: 1) few-shot learning (FSL), which performs classification using only a few labeled samples per category; and 2) zero-shot learning (ZSL), which aims to classify new categories that are unseen during network training. Our second work, in Chapter 4, integrates few-shot and many-shot learning. Traditional few-shot learning cannot properly address visual food classification because of the complex characteristics and large variations of food images. In addition, most few-shot frameworks cannot classify many-shot and few-shot categories at the same time. Hence, we propose a fusion learning method that unifies many-shot and few-shot classification in a single framework. It leverages image features and text embeddings, and adopts a graph convolutional network (GCN) to capture inter-class correlations between food categories. Our method achieves state-of-the-art few-shot and fusion classification performance on several food benchmark datasets. Our third work, in Chapter 5, focuses on zero-shot learning, where no images of new categories are available during network training and semantic information therefore plays an important role. We propose a bi-directional visual-semantic autoencoder network (VSAN) dedicated to exploring rich visual-semantic interactions and generating discriminative representations in both the visual and semantic spaces. VSAN generates discriminative visual features that incorporate semantic information through a proposed attribute autoencoder network, and generates new semantic attribute and class label embeddings that preserve visual relations across classes through a proposed visual hierarchy. Comprehensive experiments on four benchmark datasets demonstrate the superior performance of VSAN over prior zero-shot learning works.
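The abstract does not specify the fusion framework's GCN architecture, so as a rough illustration only, a single graph-convolution step over class nodes (adjacency weights from inter-class correlations, node features initialized from text embeddings) can be sketched as follows; the self-loop and row-normalization choices are assumptions, not the thesis' method.

```python
def matmul(X, Y):
    """Naive dense matrix product for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def gcn_layer(A, H, W):
    """One graph-convolution step: add self-loops to the class adjacency A,
    row-normalise it, average neighbour features H, project by W, ReLU."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    for i in range(n):
        s = sum(A_hat[i])
        A_hat[i] = [v / s for v in A_hat[i]]
    Z = matmul(matmul(A_hat, H), W)
    return [[max(0.0, v) for v in row] for row in Z]
```

The intuition matching the text: a few-shot class node connected to correlated many-shot classes inherits part of their (better-estimated) feature statistics through this propagation.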
In summary, this thesis addresses different aspects and tasks of visual food classification using deep learning techniques. We propose: 1) a joint-learning distilled network (JDNet) for compact network design; 2) a fusion learning framework that integrates few-shot and many-shot classification; and 3) a bi-directional visual-semantic autoencoder network (VSAN) for zero-shot learning.
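VSAN's bi-directional architecture is not detailed in this abstract. What can be sketched is the inference step shared by most embedding-based zero-shot methods, which the description implies: a visual feature projected into the semantic space is matched against the attribute vectors of unseen classes. The class names, attribute dimensions, and cosine scoring below are all illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_predict(projected_feature, class_attributes):
    """Return the unseen class whose semantic attribute vector best matches
    the image feature after projection into the semantic space."""
    return max(class_attributes,
               key=lambda c: cosine(projected_feature, class_attributes[c]))
```

For example, with hypothetical attribute vectors `{"noodle_soup": [1, 0, 1], "leaf_salad": [0, 1, 0]}`, a projected feature close to `[0.9, 0.1, 0.8]` would be assigned to `noodle_soup` even though no `noodle_soup` images were seen in training.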
School: School of Electrical and Electronic Engineering (contact: EKHYap@ntu.edu.sg)
Degree: Doctor of Philosophy
Citation: Zhao, H. (2022). Visual food recognition using artificial intelligence. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/161182
DOI: 10.32657/10356/161182
License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).