EvoImp: Multiple Imputation of Multi-label Classification data with a genetic algorithm.

Bibliographic Details
Main Authors: Antonio Fernando Lavareda Jacob Junior, Fabricio Almeida do Carmo, Adamo Lima de Santana, Ewaldo Eder Carvalho Santana, Fabio Manoel Franca Lobato
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0297147&type=printable
author Antonio Fernando Lavareda Jacob Junior
Fabricio Almeida do Carmo
Adamo Lima de Santana
Ewaldo Eder Carvalho Santana
Fabio Manoel Franca Lobato
collection DOAJ
description Missing data is a prevalent problem that requires attention, as most data analysis techniques are unable to handle it. This is particularly critical in Multi-Label Classification (MLC), where only a few studies have investigated missing data. MLC differs from Single-Label Classification (SLC) by allowing an instance to be associated with multiple classes. Movie classification is a didactic example, since a movie can be both "drama" and "biography" simultaneously. One of the most common missing data treatment methods is data imputation, which seeks plausible values to fill in the missing ones. In this scenario, we propose a novel imputation method based on a multi-objective genetic algorithm for optimizing multiple data imputations, called Multiple Imputation of Multi-label Classification data with a genetic algorithm, or simply EvoImp. We applied the proposed method in multi-label learning and evaluated its performance using six synthetic databases, considering various missing-value distribution scenarios. The method was compared with other state-of-the-art imputation strategies, such as K-Means Imputation (KMI) and weighted K-Nearest Neighbors Imputation (WKNNI). The results showed that the proposed method outperformed the baselines in all scenarios, achieving the best values for the Exact Match, Accuracy, and Hamming Loss evaluation measures. The superior results were consistent across dataset domains and sizes, demonstrating the robustness of EvoImp. Thus, EvoImp represents a feasible solution to missing data treatment for multi-label learning.
first_indexed 2024-03-08T12:29:40Z
format Article
id doaj.art-de743979045a47dc9e3f72e23b035bbd
institution Directory of Open Access Journals
issn 1932-6203
language English
last_indexed 2024-03-08T12:29:40Z
publishDate 2024-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-de743979045a47dc9e3f72e23b035bbd | 2024-01-22T05:31:23Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2024-01-01 | 19(1): e0297147 | 10.1371/journal.pone.0297147
title EvoImp: Multiple Imputation of Multi-label Classification data with a genetic algorithm.
url https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0297147&type=printable