Multidisciplinary classification for Indonesian scientific articles abstract using pre-trained BERT model


Bibliographic Details
Main Authors: Antonius Angga Kurniawan, Sarifuddin Madenda, Setia Wirawan, Ruddy J. Suhatril (Universitas Gunadarma)
Format: Article
Language: English
Published: Universitas Ahmad Dahlan, 2023-07-01
Series: IJAIN (International Journal of Advances in Intelligent Informatics), vol. 9, no. 2, pp. 331-346
DOI: 10.26555/ijain.v9i2.1051
ISSN: 2442-6571, 2548-3161
Subjects: abstract; BERT; classification; hyperparameter tuning; multidisciplinary
Online Access: http://ijain.org/index.php/IJAIN/article/view/1051
Description: Scientific articles increasingly contain multidisciplinary content. This makes it difficult for researchers to find relevant information, and some submissions are irrelevant to the journal's discipline. Categorizing articles and assessing their relevance can therefore aid both researchers and journals. Existing research still focuses on single-category predictive outcomes. This research takes a new approach by applying multidisciplinary classification to Indonesian scientific article abstracts using a pre-trained BERT model, showing the relevance of each category within an abstract. The dataset consists of 9,000 abstracts spanning 9 disciplinary categories, to which text preprocessing was applied. The classification model was built by combining the pre-trained BERT model with an artificial neural network. Hyperparameter fine-tuning was performed to determine the optimal combination of batch size, learning rate, number of epochs, and data ratio. The best combination is a learning rate of 1e-5, batch size 32, 3 epochs, and a 9:1 data ratio, with a validation accuracy of 90.8%. The model's confusion matrix results were compared with confusion matrix results produced by experts; in this comparison, the highest accuracy obtained by the model is 99.56%. A software prototype uses the most accurate model to classify new data, displaying the top two prediction probabilities and the dominant category. This research produces a model that can be used to solve Indonesian text classification problems.
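The description above outlines the training and inference setup: a pre-trained BERT encoder with a classification head, fine-tuned at learning rate 1e-5, batch size 32, 3 epochs, and a 9:1 data ratio, then used to report the two most probable categories. The following is a minimal sketch of that setup, not the authors' actual code: the IndoBERT checkpoint, the Hugging Face Transformers/PyTorch toolkit, the single linear classification head (standing in for the paper's artificial neural network), the placeholder abstracts and labels, and the reading of "9:1 data ratio" as a train/validation split are all assumptions.

# Minimal fine-tuning sketch based on the record's description; all names below
# that are not in the record (checkpoint, toolkit, placeholder data) are assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "indobenchmark/indobert-base-p1"   # assumed Indonesian BERT checkpoint
NUM_LABELS = 9                                  # 9 disciplinary categories (from the description)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

# Placeholder data; the real dataset is 9,000 preprocessed Indonesian abstracts.
abstracts = ["Contoh abstrak pertama ...", "Contoh abstrak kedua ..."]
labels = [0, 1]

enc = tokenizer(abstracts, padding=True, truncation=True, max_length=512, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))

# Best hyperparameters reported in the description: lr 1e-5, batch size 32,
# 3 epochs, 9:1 data ratio (interpreted here as a train/validation split).
train_size = int(0.9 * len(dataset))
train_set, val_set = random_split(dataset, [train_size, len(dataset) - train_size])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for epoch in range(3):
    for input_ids, attention_mask, y in train_loader:
        loss = model(input_ids=input_ids, attention_mask=attention_mask, labels=y).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Inference: report the top-two category probabilities and the dominant category,
# mirroring the output of the prototype described in the record.
model.eval()
with torch.no_grad():
    new = tokenizer("Abstrak baru untuk diklasifikasikan ...", truncation=True,
                    max_length=512, return_tensors="pt")
    probs = torch.softmax(model(**new).logits, dim=-1).squeeze(0)
    top2 = torch.topk(probs, k=2)
    print("Dominant category:", top2.indices[0].item(), "p =", round(top2.values[0].item(), 3))
    print("Second category:  ", top2.indices[1].item(), "p =", round(top2.values[1].item(), 3))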