LSTM-CNN Hybrid Model Performance Improvement with BioWordVec for Biomedical Report Big Data Classification

Rising leukemia mortality has driven rapid growth in publications on the disease. This growth has greatly expanded the biomedical literature, making manual extraction of relevant material on leukemia increasingly difficult. Text class...

Bibliographic Details
Main Authors: Dian Kurniasari, Warsono, Mustofa Usman, Favorisen Rosyking Lumbanraja, Wamiliana
Format: Article
Language: English
Published: Magister Program of Material Sciences, Graduate School of Universitas Sriwijaya 2024-04-01
Series: Science and Technology Indonesia
Subjects: leukemia, biowordvec, hybrid lstm-cnn, hybrid resampling
Online Access: https://sciencetechindonesia.com/index.php/jsti/article/view/1124
_version_ 1797221959770046464
author Dian Kurniasari
Warsono
Mustofa Usman
Favorisen Rosyking Lumbanraja
Wamiliana
author_facet Dian Kurniasari
Warsono
Mustofa Usman
Favorisen Rosyking Lumbanraja
Wamiliana
author_sort Dian Kurniasari
collection DOAJ
description Rising leukemia mortality has driven rapid growth in publications on the disease. This growth has greatly expanded the biomedical literature, making manual extraction of relevant material on leukemia increasingly difficult. Text classification is one approach to retrieving relevant, high-quality information from the biomedical literature. This research proposes an LSTM-CNN hybrid model for imbalanced data classification on a dataset of PubMed abstracts centred on leukemia. Random Undersampling and Random Oversampling are combined to address the data imbalance. The classification model’s performance is improved by using BioWordVec, a pre-trained word embedding created specifically for the biomedical domain. Model evaluation indicates that hybrid resampling combined with domain-specific pre-trained word embeddings can enhance classification performance, achieving accuracy, precision, recall, and F1-score of 99.55%, 99%, 100%, and 99%, respectively. The results suggest that this approach could be an alternative technique for obtaining information about leukemia.
first_indexed 2024-04-24T13:13:43Z
format Article
id doaj.art-327635c156534b30a65fc5525a8d4785
institution Directory Open Access Journal
issn 2580-4405
2580-4391
language English
last_indexed 2024-04-24T13:13:43Z
publishDate 2024-04-01
publisher Magister Program of Material Sciences, Graduate School of Universitas Sriwijaya
record_format Article
series Science and Technology Indonesia
spelling doaj.art-327635c156534b30a65fc5525a8d4785
2024-04-04T23:39:37Z
eng
Magister Program of Material Sciences, Graduate School of Universitas Sriwijaya
Science and Technology Indonesia
2580-4405
2580-4391
2024-04-01
9
2
273
283
10.26554/sti.2024.9.2.273-283
1074
LSTM-CNN Hybrid Model Performance Improvement with BioWordVec for Biomedical Report Big Data Classification
Dian Kurniasari 0
Warsono 1
Mustofa Usman 2
Favorisen Rosyking Lumbanraja 3
Wamiliana 4
Doctoral Student at the Faculty of Mathematics and Natural Sciences, Universitas Lampung, Bandar Lampung, 35145, Indonesia
Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Bandar Lampung, 35145, Indonesia
Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Bandar Lampung, 35145, Indonesia
Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Bandar Lampung, 35145, Indonesia
Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Bandar Lampung, 35145, Indonesia
Rising leukemia mortality has driven rapid growth in publications on the disease. This growth has greatly expanded the biomedical literature, making manual extraction of relevant material on leukemia increasingly difficult. Text classification is one approach to retrieving relevant, high-quality information from the biomedical literature. This research proposes an LSTM-CNN hybrid model for imbalanced data classification on a dataset of PubMed abstracts centred on leukemia. Random Undersampling and Random Oversampling are combined to address the data imbalance. The classification model’s performance is improved by using BioWordVec, a pre-trained word embedding created specifically for the biomedical domain. Model evaluation indicates that hybrid resampling combined with domain-specific pre-trained word embeddings can enhance classification performance, achieving accuracy, precision, recall, and F1-score of 99.55%, 99%, 100%, and 99%, respectively. The results suggest that this approach could be an alternative technique for obtaining information about leukemia.
https://sciencetechindonesia.com/index.php/jsti/article/view/1124
leukemia
biowordvec
hybrid lstm-cnn
hybrid resampling
spellingShingle Dian Kurniasari
Warsono
Mustofa Usman
Favorisen Rosyking Lumbanraja
Wamiliana
LSTM-CNN Hybrid Model Performance Improvement with BioWordVec for Biomedical Report Big Data Classification
Science and Technology Indonesia
leukemia
biowordvec
hybrid lstm-cnn
hybrid resampling
title LSTM-CNN Hybrid Model Performance Improvement with BioWordVec for Biomedical Report Big Data Classification
title_full LSTM-CNN Hybrid Model Performance Improvement with BioWordVec for Biomedical Report Big Data Classification
title_fullStr LSTM-CNN Hybrid Model Performance Improvement with BioWordVec for Biomedical Report Big Data Classification
title_full_unstemmed LSTM-CNN Hybrid Model Performance Improvement with BioWordVec for Biomedical Report Big Data Classification
title_short LSTM-CNN Hybrid Model Performance Improvement with BioWordVec for Biomedical Report Big Data Classification
title_sort lstm cnn hybrid model performance improvement with biowordvec for biomedical report big data classification
topic leukemia
biowordvec
hybrid lstm-cnn
hybrid resampling
url https://sciencetechindonesia.com/index.php/jsti/article/view/1124
work_keys_str_mv AT diankurniasari lstmcnnhybridmodelperformanceimprovementwithbiowordvecforbiomedicalreportbigdataclassification
AT warsono lstmcnnhybridmodelperformanceimprovementwithbiowordvecforbiomedicalreportbigdataclassification
AT mustofausman lstmcnnhybridmodelperformanceimprovementwithbiowordvecforbiomedicalreportbigdataclassification
AT favorisenrosykinglumbanraja lstmcnnhybridmodelperformanceimprovementwithbiowordvecforbiomedicalreportbigdataclassification
AT wamiliana lstmcnnhybridmodelperformanceimprovementwithbiowordvecforbiomedicalreportbigdataclassification
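
The abstract in this record describes combining Random Undersampling and Random Oversampling to handle class imbalance. A minimal sketch of that general idea follows, using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the function name `hybrid_resample`, the `target_per_class` parameter, and the toy data are all hypothetical, and the record does not state what target class size or ratio the paper used.

```python
import random
from collections import defaultdict


def hybrid_resample(texts, labels, target_per_class, seed=0):
    """Bring every class to `target_per_class` examples.

    Hypothetical helper: classes larger than the target are randomly
    undersampled, classes smaller than the target are randomly oversampled.
    """
    rng = random.Random(seed)

    # Group the examples by their class label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target_per_class:
            # Random undersampling: draw without replacement.
            chosen = rng.sample(items, target_per_class)
        else:
            # Random oversampling: keep all originals, then add
            # duplicates drawn with replacement.
            extra = rng.choices(items, k=target_per_class - len(items))
            chosen = items + extra
        out_texts.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_texts, out_labels


if __name__ == "__main__":
    # Toy data (assumed): 10 majority-class and 2 minority-class abstracts.
    texts = [f"abstract {i}" for i in range(12)]
    labels = ["majority"] * 10 + ["minority"] * 2
    new_texts, new_labels = hybrid_resample(texts, labels, target_per_class=6)
    print({c: new_labels.count(c) for c in sorted(set(new_labels))})
```

Choosing a target between the minority and majority class sizes means fewer majority examples are discarded than with pure undersampling and fewer minority duplicates are created than with pure oversampling. Resampling of this kind is normally applied to the training split only, so that evaluation data keeps its original class distribution.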