Masked Autoencoders in Computer Vision: A Comprehensive Survey

Masked autoencoders (MAE) are a Transformer-based deep learning method. Originally applied to images, the approach has since been extended to video, audio, and other temporal prediction tasks. In computer vision, MAE performs well on classification, prediction, and object detection tasks. In t...

Bibliographic Details
Main Authors:	Zexian Zhou, Xiaojing Liu
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Computer vision survey; MAE; masked autoencoders; masked image modeling
Online Access:	https://ieeexplore.ieee.org/document/10278410/

_version_	1797655648967589888
author	Zexian Zhou Xiaojing Liu
author_facet	Zexian Zhou Xiaojing Liu
author_sort	Zexian Zhou
collection	DOAJ
description	Masked autoencoders (MAE) are a Transformer-based deep learning method. Originally applied to images, the approach has since been extended to video, audio, and other temporal prediction tasks. In computer vision, MAE performs well on classification, prediction, and object detection tasks. In terms of specific applications, MAE has achieved notable results in medical treatment, geography, 3D point clouds, and machine troubleshooting. Since its introduction at the end of 2021, more than 300 related preprints have appeared, and MAE work has featured prominently at top-tier computer vision conferences during 2022 and 2023. Given the current popularity of MAE and its prospects for future development, we conduct a relatively comprehensive survey of MAE, mainly covering articles that have been officially published so far. We review and classify the improvements to MAE and present representative applications in computer vision. Finally, we discuss possible future research directions and development areas based on the characteristics of MAE, in the hope that this work can serve as a reference for future work on MAE.
first_indexed	2024-03-11T17:17:31Z
format	Article
id	doaj.art-76f70a1ab61048a9a49f27e20e803fc4
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-03-11T17:17:31Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-76f70a1ab61048a9a49f27e20e803fc4 | 2023-10-19T23:01:40Z | eng | IEEE | IEEE Access | 2169-3536 | 2023-01-01 | 11 | 113560 | 113579 | 10.1109/ACCESS.2023.3323383 | 10278410 | Masked Autoencoders in Computer Vision: A Comprehensive Survey | Zexian Zhou | 0 | https://orcid.org/0000-0001-5948-2102 | Xiaojing Liu | 1 | https://orcid.org/0000-0002-5571-4735 | Department of Computer Technology and Application, Qinghai University, Xining, China | Department of Computer Technology and Application, Qinghai University, Xining, China | Masked autoencoders (MAE) are a Transformer-based deep learning method. Originally applied to images, the approach has since been extended to video, audio, and other temporal prediction tasks. In computer vision, MAE performs well on classification, prediction, and object detection tasks. In terms of specific applications, MAE has achieved notable results in medical treatment, geography, 3D point clouds, and machine troubleshooting. Since its introduction at the end of 2021, more than 300 related preprints have appeared, and MAE work has featured prominently at top-tier computer vision conferences during 2022 and 2023. Given the current popularity of MAE and its prospects for future development, we conduct a relatively comprehensive survey of MAE, mainly covering articles that have been officially published so far. We review and classify the improvements to MAE and present representative applications in computer vision. Finally, we discuss possible future research directions and development areas based on the characteristics of MAE, in the hope that this work can serve as a reference for future work on MAE. | https://ieeexplore.ieee.org/document/10278410/ | Computer vision survey | MAE | masked autoencoders | masked image modeling
spellingShingle	Zexian Zhou Xiaojing Liu Masked Autoencoders in Computer Vision: A Comprehensive Survey IEEE Access Computer vision survey MAE masked autoencoders masked image modeling
title	Masked Autoencoders in Computer Vision: A Comprehensive Survey
title_full	Masked Autoencoders in Computer Vision: A Comprehensive Survey
title_fullStr	Masked Autoencoders in Computer Vision: A Comprehensive Survey
title_full_unstemmed	Masked Autoencoders in Computer Vision: A Comprehensive Survey
title_short	Masked Autoencoders in Computer Vision: A Comprehensive Survey
title_sort	masked autoencoders in computer vision a comprehensive survey
topic	Computer vision survey; MAE; masked autoencoders; masked image modeling
url	https://ieeexplore.ieee.org/document/10278410/
work_keys_str_mv	AT zexianzhou maskedautoencodersincomputervisionacomprehensivesurvey AT xiaojingliu maskedautoencodersincomputervisionacomprehensivesurvey
