Comparative study of neural network architectures for lightweight image recognition tasks in augmented reality
Main Author: | Anindita, Roy |
---|---|
Other Authors: | Seah Hock Soon |
Format: | Final Year Project (FYP) |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Computer and Information Science; Augmented reality |
Online Access: | https://hdl.handle.net/10356/175218 |
_version_ | 1811692048913268736 |
---|---|
author | Anindita, Roy |
author2 | Seah Hock Soon |
author_facet | Seah Hock Soon Anindita, Roy |
author_sort | Anindita, Roy |
collection | NTU |
description | This study provides a comparative analysis of neural network architectures for image recognition
tasks, particularly in the context of lightweight augmented reality applications, such as on mobile
devices. The study focuses on four popular neural network architectures: Convolutional Neural
Networks (CNNs), MobileNet, EfficientNet, and ResNet-50. The study uses the HG14 dataset,
which is tailored to hand interaction and application control in augmented reality applications.
The dataset consists of images captured from a first-person view, making it particularly
well suited to augmented reality applications and wearable technologies.
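As a rough illustration of working with such a dataset, the sketch below loads first-person gesture images from class-labelled folders. It assumes a TensorFlow/Keras workflow, a hypothetical `HG14/` directory layout, and arbitrary input and batch sizes; none of these details are taken from the report.
```python
import tensorflow as tf

IMG_SIZE = (224, 224)   # assumed input resolution, not stated in the abstract
BATCH_SIZE = 32         # assumed batch size

# One sub-folder per gesture class is assumed; the real directory layout may differ.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "HG14/train",
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="categorical",
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "HG14/val",
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="categorical",
)
```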
The methodology used for this study involves training various models on the HG14 dataset and
evaluating model performance using validation accuracies in addition to test evaluation metrics
such as accuracy, precision, recall, and F1-score. To optimize model performance, different
techniques, including data augmentation, transfer learning, and hyperparameter tuning, were
applied. All architectures were implemented in Python in Jupyter Notebook with access to an
NVIDIA Tesla P100 GPU.
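A minimal sketch of the test evaluation described above is shown below, assuming a TensorFlow/Keras model and scikit-learn for the metrics; the names `model` and `test_ds` are placeholders for a trained classifier and a non-shuffled test split, not artefacts from the report.
```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# `model` is a trained Keras classifier and `test_ds` a non-shuffled test split
# yielding (images, one-hot labels); both are placeholders for illustration.
y_true = np.concatenate([np.argmax(y, axis=1) for _, y in test_ds])  # ground-truth class indices
y_pred = np.argmax(model.predict(test_ds), axis=1)                   # predicted class indices

# Macro averaging weights every gesture class equally.
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("f1-score :", f1_score(y_true, y_pred, average="macro"))
```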
Out of the architectures examined, MobileNet had the best performance due to its optimization
for mobile and embedded applications. The efficiency of MobileNet can be attributed to its
depthwise separable convolutions, which allow for high accuracy and low memory requirements.
Transfer learning, with MobileNet pre-trained on the ImageNet dataset, was beneficial in
generalizing to the HG14 dataset, with high validation accuracy and high test accuracy,
precision, recall, and F1-score. MobileNet’s lightweight architecture led to faster model training,
allowing for more flexibility in experimenting with different parameters to enhance model
performance.
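The following is a minimal transfer-learning sketch of the kind described above: MobileNet pre-trained on ImageNet with a new classification head. It assumes a TensorFlow/Keras setup, the 14 gesture classes of HG14, and illustrative hyperparameters; `train_ds` and `val_ds` stand for the training and validation splits, and none of the values are the report's actual settings.
```python
import tensorflow as tf

NUM_CLASSES = 14  # HG14 hand-gesture classes (assumed)

base = tf.keras.applications.MobileNet(weights="imagenet", include_top=False,
                                        input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained depthwise-separable backbone

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.mobilenet.preprocess_input(inputs)  # scale pixels as MobileNet expects
x = base(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)  # datasets as loaded earlier
```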
EfficientNet, which uses a compound scaling method that balances model complexity with
efficiency, also proved to be a capable architecture for image recognition tasks in AR
applications. ResNet-50 showed performance comparable to both MobileNet and EfficientNet,
although its higher computational demands may limit its suitability in real-time AR scenarios.
Basic CNN models showed lower performance due to hardware dependencies and the
resource-intensive nature of their architecture, which also make them less suitable for
real-time AR scenarios.
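For context, EfficientNet's compound scaling grows network depth, width, and input resolution together from a single coefficient. The sketch below uses the baseline coefficients published in the original EfficientNet paper (Tan and Le, 2019) and is purely explanatory; it is not code from the report.
```python
# Published EfficientNet-B0 coefficients; depth, width, and resolution scaling bases.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: int) -> tuple[float, float, float]:
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

for phi in range(4):  # roughly the first few EfficientNet variants
    depth, width, resolution = compound_scale(phi)
    print(f"phi={phi}: depth x{depth:.2f}, width x{width:.2f}, resolution x{resolution:.2f}")
```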
The study also examined different techniques to improve model performance. Data augmentation
yielded mixed results: EfficientNet was able to handle changes to the input data, but the
performance of the other architectures was negatively affected by the augmentations. This
highlighted the need to consider the nature of the dataset when implementing techniques to
improve model performance. Hyperparameter tuning led to a notable increase in the performance
of the base CNN model, but had negligible impact on the other architectures. Overall, transfer
learning was found to be the technique that consistently had a positive impact on model
performance across all four architectures.
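A minimal sketch of the kind of data augmentation evaluated above is shown below, assuming Keras preprocessing layers; the specific transforms and ranges are illustrative choices, and `train_ds` is a placeholder for the training split rather than the report's pipeline.
```python
import tensorflow as tf

# Illustrative transforms only; the augmentation scheme used in the report is not specified here.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),  # rotations of up to roughly +/-36 degrees
    tf.keras.layers.RandomZoom(0.1),
])

# Applied on the fly to training batches only; validation and test data stay unchanged.
augmented_train_ds = train_ds.map(
    lambda images, labels: (augment(images, training=True), labels)
)
```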
The MobileNet transfer learning model was found to be the best choice for hand gesture
recognition in AR applications. The model offers a good balance of accuracy and adaptability,
which is important for real-time and lightweight applications. The results from this study provide
valuable insights into how to select and optimize neural network architectures for image
recognition tasks in AR contexts, laying the groundwork for future advancements in embedded AR systems.
The research framework outlined provides a structured approach to conducting detailed
comparative analyses of neural network architectures, making it easier to reach informed
decisions when developing and deploying AR applications. |
first_indexed | 2024-10-01T06:29:36Z |
format | Final Year Project (FYP) |
id | ntu-10356/175218 |
institution | Nanyang Technological University |
language | English |
last_indexed | 2024-10-01T06:29:36Z |
publishDate | 2024 |
publisher | Nanyang Technological University |
record_format | dspace |
spelling | ntu-10356/175218 2024-05-10T15:40:23Z Comparative study of neural network architectures for lightweight image recognition tasks in augmented reality Anindita, Roy Seah Hock Soon School of Computer Science and Engineering ASHSSEAH@ntu.edu.sg Computer and Information Science Augmented reality Bachelor's degree 2024-04-21T12:57:02Z 2024-04-21T12:57:02Z 2024 Final Year Project (FYP) Anindita, R. (2024). Comparative study of neural network architectures for lightweight image recognition tasks in augmented reality. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175218 https://hdl.handle.net/10356/175218 en application/pdf Nanyang Technological University |
spellingShingle | Computer and Information Science Augmented reality Anindita, Roy Comparative study of neural network architectures for lightweight image recognition tasks in augmented reality |
title | Comparative study of neural network architectures for lightweight image recognition tasks in augmented reality |
title_full | Comparative study of neural network architectures for lightweight image recognition tasks in augmented reality |
title_fullStr | Comparative study of neural network architectures for lightweight image recognition tasks in augmented reality |
title_full_unstemmed | Comparative study of neural network architectures for lightweight image recognition tasks in augmented reality |
title_short | Comparative study of neural network architectures for lightweight image recognition tasks in augmented reality |
title_sort | comparative study of neural network architectures for lightweight image recognition tasks in augmented reality |
topic | Computer and Information Science Augmented reality |
url | https://hdl.handle.net/10356/175218 |
work_keys_str_mv | AT aninditaroy comparativestudyofneuralnetworkarchitecturesforlightweightimagerecognitiontasksinaugmentedreality |