Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks


Bibliographic Details
Main Authors: Huidong Tang, Sayaka Kamei, Yasuhiko Morimoto
Format: Article
Language: English
Published: MDPI AG 2023-01-01
Series: Algorithms
Subjects:
Online Access: https://www.mdpi.com/1999-4893/16/1/59
Description
Summary: Text classification is widely studied in natural language processing (NLP). Deep learning models, including large pre-trained models like BERT and DistilBERT, have achieved impressive results in text classification tasks. However, these models' robustness against adversarial attacks remains an area of concern. To address this concern, we propose three data augmentation methods to improve the robustness of such pre-trained models. We evaluated our methods on four text classification datasets by fine-tuning DistilBERT on the augmented datasets and exposing the resulting models to adversarial attacks to assess their robustness. In addition to enhancing robustness, our proposed methods improve accuracy and F1-score on three of the datasets. We also conducted comparison experiments with two existing data augmentation methods. One of our proposed methods achieves a similar performance improvement, but all three yield a superior improvement in robustness.
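This record does not spell out the paper's three proposed augmentation methods. Purely as an illustration of what text data augmentation for classification looks like in practice, here is a minimal sketch of a generic EDA-style random-swap baseline (all function names are hypothetical; this is NOT one of the authors' methods):

```python
import random

def random_swap(text: str, n_swaps: int = 1, seed: int = 0) -> str:
    """Augment a sentence by swapping n_swaps random word pairs.

    A generic EDA-style baseline for illustration only; the paper's
    three proposed methods are not described in this record.
    """
    rng = random.Random(seed)
    words = text.split()
    if len(words) < 2:
        return text
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

# Generate several augmented variants per training example; these
# variants would then be added to the fine-tuning set for a model
# such as DistilBERT.
sample = "the movie was surprisingly good and well acted"
augmented = [random_swap(sample, n_swaps=1, seed=s) for s in range(3)]
```

Each augmented variant preserves the original bag of words while perturbing word order, which is the kind of label-preserving noise such baselines rely on.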
ISSN: 1999-4893