Addressing Imbalance Problem for Multi Label Classification of Scholarly Articles

Bibliographic Details
Main Authors: Aiman Hafeez, Tariq Ali, Asif Nawaz, Saif Ur Rehman, Azhar Imran Mudasir, Abdulaziz A. Alsulami, Ali Alqahtani
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Multi label classification, imbalanced dataset, resampling, multi label classifier
Online Access: https://ieeexplore.ieee.org/document/10177756/
author Aiman Hafeez
Tariq Ali
Asif Nawaz
Saif Ur Rehman
Azhar Imran Mudasir
Abdulaziz A. Alsulami
Ali Alqahtani
collection DOAJ
description Scientific document classification is an important area of machine learning. At present, the categories of scientific documents are identified manually. Predefined taxonomies for categorizing scientific documents already exist, such as the Association for Computing Machinery Computing Classification System (ACM CCS) and Bibsonomy. These taxonomies help authors choose categories for their manuscripts. Because research work often spans several domains, the assignment takes the form of Multi-Label Classification (MLC), in which more than one class can be assigned to a single document. Two distinct approaches are used to solve MLC problems: Problem Transformation and Algorithm Adaptation. Problem transformation converts the MLC dataset into one or more single-label datasets, whereas algorithm adaptation modifies a single classifier so that it can predict multiple labels. Various document classification techniques appear in the literature, but none of them pays much attention to the problem of imbalance in Multi-Label Datasets (MLDs), even though many effective techniques for dealing with imbalance are available. The goal of this study is to find an effective technique for balancing datasets before multi-label classification, so that classes with fewer instances are predicted better. Six MLDs, nine transformation techniques, and seven classifiers are evaluated in this research work. The proposed research will result in a more accurate recommendation of a research topic for a document. Statistical tests indicate that LPROS is the best resampling technique for imbalanced MLDs. Compared with the other classifiers, the BRkNN classifier performs better for MLC. This research will facilitate the classification of documents into their respective classes, which can be used by various citation indexes.
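The two ideas named in the description, problem transformation and resampling before classification, can be illustrated with a small sketch. It is not the authors' implementation: it shows a label-powerset transformation (each distinct label set becomes one class) followed by a simplified random oversampling of rare label sets, in the spirit of LPROS. The function names, the `percent` parameter, the "below mean size" definition of a minority label set, and the toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def label_powerset(Y):
    """Problem transformation: map each distinct label set to one class id."""
    classes = {}
    ids = [classes.setdefault(frozenset(labels), len(classes)) for labels in Y]
    return ids, classes


def lp_random_oversample(X, Y, percent=25, seed=0):
    """Clone random samples from rare label sets (simplified, assumed rule).

    A label set counts as a minority here if it has fewer samples than the
    mean number of samples per label set. Clones are added round-robin
    across minority label sets until `percent` percent of the original
    dataset size has been added.
    """
    rng = random.Random(seed)
    bags = defaultdict(list)  # label set -> indices of samples carrying it
    for i, labels in enumerate(Y):
        bags[frozenset(labels)].append(i)

    mean_size = len(Y) / len(bags)
    minority = [idx for idx in bags.values() if len(idx) < mean_size]
    to_add = int(len(Y) * percent / 100)

    chosen = []
    while minority and len(chosen) < to_add:
        for idx in minority:
            if len(chosen) == to_add:
                break
            chosen.append(rng.choice(idx))

    X_new = list(X) + [X[i] for i in chosen]
    Y_new = list(Y) + [Y[i] for i in chosen]
    return X_new, Y_new


# Hypothetical toy data: eight documents, labels "ml" and "db".
X = [f"doc{i}" for i in range(8)]
Y = [{"ml"}, {"ml"}, {"ml"}, {"ml"}, {"ml", "db"}, {"db"}, {"ml"}, {"ml"}]

class_ids, classes = label_powerset(Y)
X_bal, Y_bal = lp_random_oversample(X, Y, percent=50, seed=1)
```

In this toy case the label set {"ml"} has six documents and the other two label sets have one each, so only the two rare label sets receive clones. Any single-label or multi-label classifier can then be trained on the resampled data.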
first_indexed 2024-03-12T21:53:29Z
format Article
id doaj.art-bf4f0976099f4a099f08a377b4c2879d
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-12T21:53:29Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-bf4f0976099f4a099f08a377b4c2879d
Updated: 2023-07-25T23:00:51Z
Language: eng
Publisher: IEEE
Journal: IEEE Access (ISSN 2169-3536)
Published: 2023-01-01, vol. 11, pp. 74500-74516
DOI: 10.1109/ACCESS.2023.3293852
IEEE Xplore document: 10177756
Title: Addressing Imbalance Problem for Multi Label Classification of Scholarly Articles
Authors:
[0] Aiman Hafeez
[1] Tariq Ali, https://orcid.org/0000-0002-4974-1569
[2] Asif Nawaz, https://orcid.org/0000-0002-9920-8527
[3] Saif Ur Rehman, https://orcid.org/0000-0002-5810-6479
[4] Azhar Imran Mudasir, https://orcid.org/0000-0003-3598-2780
[5] Abdulaziz A. Alsulami, https://orcid.org/0000-0003-2931-8744
[6] Ali Alqahtani, https://orcid.org/0000-0002-7111-8810
Affiliations:
[0] University Institute of Information Technology (UIIT), PMAS Arid Agriculture University, Rawalpindi, Pakistan
[1] University Institute of Information Technology (UIIT), PMAS Arid Agriculture University, Rawalpindi, Pakistan
[2] University Institute of Information Technology (UIIT), PMAS Arid Agriculture University, Rawalpindi, Pakistan
[3] University Institute of Information Technology (UIIT), PMAS Arid Agriculture University, Rawalpindi, Pakistan
[4] Department of Creative Technologies, Faculty of Computing and Artificial Intelligence, Air University, Islamabad, Pakistan
[5] Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
[6] Department of Networks and Communications Engineering, College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia
title Addressing Imbalance Problem for Multi Label Classification of Scholarly Articles
topic Multi label classification
imbalanced dataset
resampling
multi label classifier
url https://ieeexplore.ieee.org/document/10177756/