Text augmentation using a graph-based approach and clonal selection algorithm
Main Authors: | Hadeer Ahmed, Issa Traore, Mohammad Mamun, Sherif Saad |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-03-01 |
Series: | Machine Learning with Applications |
Subjects: | Data augmentation; Unstructured data; Cybersecurity; Text generation; Clonal selection |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666827023000051 |
_version_ | 1797903548617326592 |
---|---|
author | Hadeer Ahmed, Issa Traore, Mohammad Mamun, Sherif Saad |
collection | DOAJ |
description | Annotated data is critical for machine learning models, but producing large amounts of data with high-quality labeling is a time-consuming and labor-intensive process. Natural language processing (NLP) and machine learning models have traditionally relied on the labels given by human annotators with varying degrees of competency, training, and experience. These kinds of labels are incredibly problematic because they are defined and enforced by arbitrary and ambiguous standards. In order to solve these issues of insufficient high-quality labels, researchers are now investigating automated methods for enhancing training and testing data sets. In this paper, we demonstrate how our proposed method improves the quality and quantity of data in two cybersecurity problems (fake news identification & sensitive data leak) by employing the clonal selection algorithm (CLONALG) and abstract meaning representation (AMR) graphs, and how it improves the performance of a classifier by at least 5% on two datasets. |
first_indexed | 2024-04-10T09:34:42Z |
format | Article |
id | doaj.art-d3cd4511a5bb4fa4bd92c99b19f69d0c |
institution | Directory of Open Access Journals |
issn | 2666-8270 |
language | English |
last_indexed | 2024-04-10T09:34:42Z |
publishDate | 2023-03-01 |
publisher | Elsevier |
record_format | Article |
series | Machine Learning with Applications |
spelling | doaj.art-d3cd4511a5bb4fa4bd92c99b19f69d0c; 2023-02-18T04:17:43Z; eng; Elsevier; Machine Learning with Applications; ISSN 2666-8270; 2023-03-01; vol. 11, article 100452; Text augmentation using a graph-based approach and clonal selection algorithm; Hadeer Ahmed (ECE Department, University of Victoria, British Columbia, Canada; corresponding author), Issa Traore (ECE Department, University of Victoria, British Columbia, Canada), Mohammad Mamun (National Research Council Canada, New Brunswick, Canada), Sherif Saad (School of Computer Science, University of Windsor, Ontario, Canada); http://www.sciencedirect.com/science/article/pii/S2666827023000051; Data augmentation; Unstructured data; Cybersecurity; Text generation; Clonal selection |
title | Text augmentation using a graph-based approach and clonal selection algorithm |
topic | Data augmentation; Unstructured data; Cybersecurity; Text generation; Clonal selection |
url | http://www.sciencedirect.com/science/article/pii/S2666827023000051 |