Text augmentation using a graph-based approach and clonal selection algorithm

Annotated data is critical for machine learning models, but producing large amounts of data with high-quality labeling is a time-consuming and labor-intensive process. Natural language processing (NLP) and machine learning models have traditionally relied on the labels given by human annotators with...


Bibliographic Details
Main Authors: Hadeer Ahmed, Issa Traore, Mohammad Mamun, Sherif Saad
Format: Article
Language: English
Published: Elsevier 2023-03-01
Series: Machine Learning with Applications
Subjects: Data augmentation; Unstructured data; Cybersecurity; Text generation; Clonal selection
Online Access: http://www.sciencedirect.com/science/article/pii/S2666827023000051
author Hadeer Ahmed
Issa Traore
Mohammad Mamun
Sherif Saad
collection DOAJ
description Annotated data is critical for machine learning models, but producing large amounts of data with high-quality labeling is a time-consuming and labor-intensive process. Natural language processing (NLP) and machine learning models have traditionally relied on the labels given by human annotators with varying degrees of competency, training, and experience. These kinds of labels are incredibly problematic because they are defined and enforced by arbitrary and ambiguous standards. In order to solve these issues of insufficient high-quality labels, researchers are now investigating automated methods for enhancing training and testing data sets. In this paper, we demonstrate how our proposed method improves the quality and quantity of data in two cybersecurity problems (fake news identification & sensitive data leak) by employing the clonal selection algorithm (CLONALG) and abstract meaning representation (AMR) graphs, and how it improves the performance of a classifier by at least 5% on two datasets.
first_indexed 2024-04-10T09:34:42Z
format Article
id doaj.art-d3cd4511a5bb4fa4bd92c99b19f69d0c
institution Directory Open Access Journal
issn 2666-8270
language English
last_indexed 2024-04-10T09:34:42Z
publishDate 2023-03-01
publisher Elsevier
record_format Article
series Machine Learning with Applications
spelling doaj.art-d3cd4511a5bb4fa4bd92c99b19f69d0c; 2023-02-18T04:17:43Z; eng; Elsevier; Machine Learning with Applications; 2666-8270; 2023-03-01; 11; 100452
Text augmentation using a graph-based approach and clonal selection algorithm
Hadeer Ahmed (ECE Department, University of Victoria, British Columbia, Canada; corresponding author)
Issa Traore (ECE Department, University of Victoria, British Columbia, Canada)
Mohammad Mamun (National Research Council Canada, New Brunswick, Canada)
Sherif Saad (School of Computer Science, University of Windsor, Ontario, Canada)
title Text augmentation using a graph-based approach and clonal selection algorithm
topic Data augmentation
Unstructured data
Cybersecurity
Text generation
Clonal selection
url http://www.sciencedirect.com/science/article/pii/S2666827023000051