Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks

Bibliographic Details
Main Authors: Huidong Tang, Sayaka Kamei, Yasuhiko Morimoto
Format: Article
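Language: English
Published: MDPI AG 2023-01-01
Series: Algorithms
Subjects: artificial intelligence; natural language processing; text classification; data augmentation; robustness improvement
Online Access: https://www.mdpi.com/1999-4893/16/1/59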
collection DOAJ
description Text classification is widely studied in natural language processing (NLP). Deep learning models, including large pre-trained models like BERT and DistilBERT, have achieved impressive results in text classification tasks. However, these models’ robustness against adversarial attacks remains an area of concern. To address this concern, we propose three data augmentation methods to improve the robustness of such pre-trained models. We evaluated our methods on four text classification datasets by fine-tuning DistilBERT on the augmented datasets and exposing the resulting models to adversarial attacks to evaluate their robustness. In addition to enhancing the robustness, our proposed methods can improve the accuracy and F1-score on three datasets. We also conducted comparison experiments with two existing data augmentation methods. We found that one of our proposed methods demonstrates a similar improvement in terms of performance, but all demonstrate a superior robustness improvement.
id doaj.art-37238516e2ff434b84d904a3e9a821b3
institution Directory Open Access Journal
issn 1999-4893
doi 10.3390/a16010059
volume 16, issue 1, article 59
affiliation Graduate School of Advanced Science and Engineering, Hiroshima University, Kagamiyama 1-7-1, Higashi-Hiroshima 739-8521, Japan (all three authors)
topic artificial intelligence
natural language processing
text classification
data augmentation
robustness improvement