Addressing Imbalance Problem for Multi Label Classification of Scholarly Articles
Main Authors: | Aiman Hafeez, Tariq Ali, Asif Nawaz, Saif Ur Rehman, Azhar Imran Mudasir, Abdulaziz A. Alsulami, Ali Alqahtani |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/10177756/ |
_version_ | 1797772613073764352 |
---|---|
author | Aiman Hafeez, Tariq Ali, Asif Nawaz, Saif Ur Rehman, Azhar Imran Mudasir, Abdulaziz A. Alsulami, Ali Alqahtani |
collection | DOAJ |
description | Scientific document classification is an important field of machine learning. Currently, the categories of scientific documents are identified manually. Predefined taxonomies for categorizing scientific documents already exist, such as the Association for Computing Machinery Computing Classification System (ACM CCS) and BibSonomy; these taxonomies help authors assign categories to their manuscripts. Because a research work can draw on several domains, assigning it categories is a Multi-Label Classification (MLC) problem, in which more than one class can be assigned to a single document. Two distinct approaches are used to address MLC: Problem Transformation and Algorithm Adaptation. Problem transformation converts the MLC dataset into one or more single-label datasets, whereas algorithm adaptation modifies a single classifier so that it can predict multiple labels. Various document classification techniques exist in the literature, but none of them has paid much attention to the problem of imbalance in Multi-Label Datasets (MLDs), even though many effective techniques for dealing with imbalance are available. The goal of this study is to find an effective technique for balancing datasets before multi-label classification, so as to obtain better predictions for the classes with fewer instances. Six MLDs, nine transformation techniques, and seven classifiers are evaluated in this work. The proposed research will result in more accurate recommendations of research topics for a document. According to statistical tests, LPROS is the best resampling technique for imbalanced MLDs. Compared with the other classifiers, BRkNN performs better for MLC. This research will facilitate the classification of documents into their respective classes, which can be used by various citation indexes. |
first_indexed | 2024-03-12T21:53:29Z |
format | Article |
id | doaj.art-bf4f0976099f4a099f08a377b4c2879d |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-12T21:53:29Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
volume | 11 |
pages | 74500–74516 |
doi | 10.1109/ACCESS.2023.3293852 |
affiliation | Aiman Hafeez, Tariq Ali, Asif Nawaz, Saif Ur Rehman: University Institute of Information Technology (UIIT), PMAS Arid Agriculture University, Rawalpindi, Pakistan; Azhar Imran Mudasir: Department of Creative Technologies, Faculty of Computing and Artificial Intelligence, Air University, Islamabad, Pakistan; Abdulaziz A. Alsulami: Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia; Ali Alqahtani: Department of Networks and Communications Engineering, College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia |
orcid | Tariq Ali: https://orcid.org/0000-0002-4974-1569; Asif Nawaz: https://orcid.org/0000-0002-9920-8527; Saif Ur Rehman: https://orcid.org/0000-0002-5810-6479; Azhar Imran Mudasir: https://orcid.org/0000-0003-3598-2780; Abdulaziz A. Alsulami: https://orcid.org/0000-0003-2931-8744; Ali Alqahtani: https://orcid.org/0000-0002-7111-8810 |
title | Addressing Imbalance Problem for Multi Label Classification of Scholarly Articles |
topic | Multi label classification, imbalanced dataset, resampling, multi label classifier |
url | https://ieeexplore.ieee.org/document/10177756/ |
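The description above contrasts problem transformation with algorithm adaptation. Two widely used problem-transformation methods are Binary Relevance (BR), which produces one binary dataset per label, and Label Powerset (LP), which turns every distinct combination of labels into a single class. The record does not say which nine transformation techniques the study used, so the following is only an illustrative sketch: the toy label matrix `Y` and the helpers `binary_relevance` and `label_powerset` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Toy multi-label targets (hypothetical): one row per document, one column per
# category; 1 means the document belongs to that category.
Y: List[List[int]] = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]

def binary_relevance(Y: List[List[int]]) -> List[List[int]]:
    """Binary Relevance: one single-label (binary) target vector per label."""
    n_labels = len(Y[0])
    return [[row[j] for row in Y] for j in range(n_labels)]

def label_powerset(Y: List[List[int]]) -> Tuple[List[int], Dict[Tuple[int, ...], int]]:
    """Label Powerset: map each distinct labelset to one multi-class id."""
    classes: Dict[Tuple[int, ...], int] = {}
    targets: List[int] = []
    for row in Y:
        key = tuple(row)
        if key not in classes:
            classes[key] = len(classes)  # first time this labelset is seen
        targets.append(classes[key])
    return targets, classes

br_targets = binary_relevance(Y)
lp_targets, lp_classes = label_powerset(Y)
print(br_targets)  # [[1, 0, 1, 1], [0, 1, 0, 1], [1, 0, 1, 0]]
print(lp_targets)  # [0, 1, 0, 2]
```

BR yields three independent binary problems here, while LP yields one three-class problem; LP keeps label co-occurrence but can produce many rare labelsets, which is one way imbalance enters multi-label data.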
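The description above says that imbalance in MLDs has received little attention, but the record does not state how the study measured it. A common choice in the multi-label resampling literature is the per-label imbalance ratio (IRLbl: count of the most frequent label divided by the count of a given label) and its average over all labels (MeanIR). The sketch below assumes those definitions; whether the paper uses them is not confirmed here, and the helper names `irlbl` and `mean_ir` and the toy matrix are hypothetical.

```python
from typing import List, Sequence

def irlbl(Y: Sequence[Sequence[int]]) -> List[float]:
    """Imbalance ratio per label: count of the most frequent label / count of this label."""
    counts = [sum(row[j] for row in Y) for j in range(len(Y[0]))]
    most = max(counts)
    # Assumes every label has at least one positive example (otherwise division by zero).
    return [most / c for c in counts]

def mean_ir(Y: Sequence[Sequence[int]]) -> float:
    """Average imbalance ratio over all labels; 1.0 means every label is equally frequent."""
    scores = irlbl(Y)
    return sum(scores) / len(scores)

# Hypothetical toy label matrix: label 0 appears 4 times, labels 1 and 2 twice each.
Y = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]
print(irlbl(Y))  # [1.0, 2.0, 2.0]
print(mean_ir(Y))
```

Labels with an IRLbl above MeanIR are usually treated as minority labels, which is what a resampling step tries to boost before classification.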
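The description above reports LPROS as the best resampling technique for imbalanced MLDs. Its name points to label-powerset random oversampling: each distinct labelset is treated as one class, and random copies of samples from the rarer labelsets are added. The record gives no parameters, so the following is a simplified sketch under stated assumptions, not the paper's procedure: minority labelsets are those below the mean labelset size, growth is capped at `percentage`% of the original dataset, and the leftover budget is spread over the remaining minority labelsets. The function `lp_ros` and its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def lp_ros(
    X: Sequence[Sequence[float]],
    Y: Sequence[Sequence[int]],
    percentage: float = 25.0,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Simplified label-powerset random oversampling (hypothetical helper)."""
    rng = random.Random(seed)
    # Group sample indices by their full labelset (the label powerset view).
    bags: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(Y):
        bags[tuple(row)].append(i)

    mean_size = len(Y) / len(bags)
    # Minority labelsets, smallest first so the rarest are served first.
    minority = sorted(
        (key for key, idx in bags.items() if len(idx) < mean_size),
        key=lambda key: len(bags[key]),
    )
    budget = int(len(Y) * percentage / 100)  # max number of clones to add

    X_new = [list(x) for x in X]
    Y_new = [list(y) for y in Y]
    for pos, key in enumerate(minority):
        # Split whatever budget is left evenly over the bags still to process.
        share = budget // (len(minority) - pos)
        n_clone = min(math.ceil(mean_size - len(bags[key])), share)
        for _ in range(n_clone):
            i = rng.choice(bags[key])  # clone a random sample of this labelset
            X_new.append(list(X[i]))
            Y_new.append(list(Y[i]))
        budget -= n_clone
    return X_new, Y_new

# Hypothetical toy data: labelset [1, 0] dominates; [0, 1] and [1, 1] are rare.
X = [[float(i)] for i in range(8)]
Y = [[1, 0]] * 6 + [[0, 1], [1, 1]]
X_bal, Y_bal = lp_ros(X, Y, percentage=50.0)
print(len(Y_bal), Y_bal.count([0, 1]), Y_bal.count([1, 1]))  # 12 3 3
```

Only minority labelsets gain samples; the majority labelset is left untouched, and the budget cap keeps the dataset from growing without bound.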
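The description above finds BRkNN to be the strongest of the seven classifiers. BRkNN is usually described as an adaptation of k-nearest neighbours that is conceptually equivalent to binary relevance with a kNN base learner, except that the neighbour search is done once and shared by all labels. The sketch below assumes a plain Euclidean distance and a per-label threshold of 0.5 on the fraction of neighbours carrying each label; the helper `brknn_predict`, its parameters, and the toy data are hypothetical and do not reproduce the paper's configuration.

```python
import math
from typing import List, Sequence

def brknn_predict(
    X_train: Sequence[Sequence[float]],
    Y_train: Sequence[Sequence[int]],
    x: Sequence[float],
    k: int = 3,
    threshold: float = 0.5,
) -> List[int]:
    """Binary-relevance kNN sketch (hypothetical helper)."""
    # One neighbour search, shared by every label: this is what distinguishes
    # BRkNN from training a separate kNN classifier per label.
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    neighbours = order[:k]
    prediction = []
    for j in range(len(Y_train[0])):
        # Fraction of the k neighbours that carry label j.
        confidence = sum(Y_train[i][j] for i in neighbours) / k
        prediction.append(1 if confidence >= threshold else 0)
    return prediction

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
Y_train = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [1, 1]]
print(brknn_predict(X_train, Y_train, [0.2, 0.2]))  # [1, 0]
print(brknn_predict(X_train, Y_train, [5.2, 5.2]))  # [0, 1]
```

The published BRkNN-a and BRkNN-b variants add rules for the case where no label passes the threshold; this sketch omits them and can therefore return an all-zero prediction.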