Survey on Sequence Data Augmentation

Bibliographic Details
Main Author: GE Yizhou, XU Xiang, YANG Suorong, ZHOU Qing, SHEN Furao
Format: Article
Language: Chinese (zho)
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press 2021-07-01
Series: Jisuanji kexue yu tansuo
Subjects: sequence data, data augmentation, deep learning
Online Access:http://fcst.ceaj.org/CN/abstract/abstract2790.shtml
author GE Yizhou, XU Xiang, YANG Suorong, ZHOU Qing, SHEN Furao
collection DOAJ
description To pursue higher accuracy, deep learning models have become increasingly complex, with ever deeper networks. More parameters mean that more data are needed to train a model. However, manual labeling is costly, and in some specialized fields practical constraints make data hard to collect. As a result, data insufficiency is a very common problem. Data augmentation alleviates it by artificially generating new data. The success of data augmentation in computer vision has prompted researchers to apply similar methods to sequence data. This paper describes not only time-domain methods such as flipping and cropping but also augmentation methods in the frequency domain. In addition to experience-based and knowledge-based methods, it gives detailed descriptions of machine learning models used for automatic data generation, such as GANs. It covers methods that have been widely applied to sequence data such as text, audio and time series, together with the good performance they have achieved on tasks like medical diagnosis and emotion classification. Despite differences in data type, these methods are built on similar ideas. Using these ideas as a guide, the paper introduces data augmentation methods for different types of sequence data and closes with discussion and future prospects.
first_indexed 2024-12-17T01:02:32Z
format Article
id doaj.art-83a3639ae0c54ed7ac698d82c6b37c6c
institution Directory Open Access Journal
issn 1673-9418
language zho
last_indexed 2024-12-17T01:02:32Z
publishDate 2021-07-01
publisher Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
record_format Article
series Jisuanji kexue yu tansuo
volume 15
issue 7
pages 1207-1219
doi 10.3778/j.issn.1673-9418.2012062
affiliation 1. Science and Technology on Communication Information Security Control Laboratory, Jiaxing, Zhejiang 314033, China
2. No.36 Research Institute, China Electronics Technology Group Corporation, Jiaxing, Zhejiang 314033, China
3. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
title Survey on Sequence Data Augmentation
topic sequence data
data augmentation
deep learning
url http://fcst.ceaj.org/CN/abstract/abstract2790.shtml
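The abstract names time-domain flipping and cropping and frequency-domain augmentation without showing what they do to a sequence. Below is a minimal Python sketch, assuming a univariate sequence stored as a list of floats. The helper names `flip`, `random_crop`, `dft`, `idft` and `frequency_mask` are hypothetical and are not taken from the paper. For the frequency-domain step it uses simple band masking of the discrete Fourier spectrum, chosen here only for illustration; it is not necessarily one of the specific methods the survey covers.

```python
# Hypothetical helpers for illustration only; not the survey authors' code.
import cmath
import random
from typing import List, Sequence


def flip(x: Sequence[float]) -> List[float]:
    """Time-domain flipping: reverse the order of the samples."""
    return list(reversed(x))


def random_crop(x: Sequence[float], length: int, rng: random.Random) -> List[float]:
    """Time-domain cropping: keep one random contiguous window of `length` samples."""
    if not 0 < length <= len(x):
        raise ValueError("length must be in 1..len(x)")
    start = rng.randint(0, len(x) - length)  # randint includes both end points
    return list(x[start:start + length])


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform, written out for clarity."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def frequency_mask(x: Sequence[float], width: int, rng: random.Random) -> List[float]:
    """Frequency-domain augmentation (assumed variant): zero a random band of bins.

    The DC bin (k = 0) is never masked, so the mean of the signal is preserved.
    Each masked bin k is zeroed together with its mirror n - k, which keeps the
    spectrum conjugate-symmetric and therefore the reconstructed signal real.
    """
    n = len(x)
    if not 1 <= width <= n // 2:
        raise ValueError("width must be in 1..len(x)//2")
    spectrum = dft(x)
    start = rng.randint(1, n // 2 - width + 1)  # band lies within bins 1..n//2
    for k in range(start, start + width):
        spectrum[k] = 0
        spectrum[n - k] = 0
    # Imaginary parts are only floating-point residue, so keep the real parts.
    return [v.real for v in idft(spectrum)]


if __name__ == "__main__":
    rng = random.Random(0)
    signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
    print(flip(signal))
    print(random_crop(signal, 4, rng))
    print([round(v, 3) for v in frequency_mask(signal, 1, rng)])
```

The DFT is spelled out naively so the sketch needs only the standard library; a practical pipeline would use an FFT such as `numpy.fft.rfft` and `numpy.fft.irfft`, and would keep the masked band narrow relative to the sequence length so the augmented sample still resembles the original.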