LSTM-CNN Hybrid Model Performance Improvement with BioWordVec for Biomedical Report Big Data Classification
Main Authors: | Dian Kurniasari, Warsono, Mustofa Usman, Favorisen Rosyking Lumbanraja, Wamiliana |
---|---|
Format: | Article |
Language: | English |
Published: | Magister Program of Material Sciences, Graduate School of Universitas Sriwijaya, 2024-04-01 |
Series: | Science and Technology Indonesia |
Subjects: | |
Online Access: | https://sciencetechindonesia.com/index.php/jsti/article/view/1124 |
author | Dian Kurniasari, Warsono, Mustofa Usman, Favorisen Rosyking Lumbanraja, Wamiliana |
---|---|
collection | DOAJ |
description | The rise in leukemia mortality has fueled a rapid growth in publications on the disease. This growth has greatly expanded the biomedical literature, making manual extraction of relevant material on leukemia increasingly difficult. Text classification is one approach to retrieving relevant, high-quality information from the biomedical literature. This research proposes an LSTM-CNN hybrid model to classify imbalanced data in a dataset of leukemia-focused PubMed abstracts. Random Undersampling and Random Oversampling are combined to address the data imbalance problem, and classification performance is improved by using BioWordVec, a pre-trained word embedding created specifically for the biomedical domain. Model evaluation indicates that hybrid resampling combined with domain-specific pre-trained word embeddings can enhance classification performance, achieving an accuracy, precision, recall, and F1-score of 99.55%, 99%, 100%, and 99%, respectively. The results suggest that this approach could serve as an alternative technique for obtaining information about leukemia. |
first_indexed | 2024-04-24T13:13:43Z |
format | Article |
id | doaj.art-327635c156534b30a65fc5525a8d4785 |
institution | Directory Open Access Journal |
issn | 2580-4405, 2580-4391 |
language | English |
last_indexed | 2024-04-24T13:13:43Z |
publishDate | 2024-04-01 |
publisher | Magister Program of Material Sciences, Graduate School of Universitas Sriwijaya |
record_format | Article |
series | Science and Technology Indonesia |
doi | 10.26554/sti.2024.9.2.273-283 |
citation | Science and Technology Indonesia, vol. 9, no. 2, pp. 273-283 (2024-04-01) |
affiliations | Dian Kurniasari: Doctoral Student at the Faculty of Mathematics and Natural Sciences, Universitas Lampung, Bandar Lampung, 35145, Indonesia; Warsono, Mustofa Usman, Wamiliana: Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Bandar Lampung, 35145, Indonesia; Favorisen Rosyking Lumbanraja: Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Bandar Lampung, 35145, Indonesia |
title | LSTM-CNN Hybrid Model Performance Improvement with BioWordVec for Biomedical Report Big Data Classification |
topic | leukemia, biowordvec, hybrid lstm-cnn, hybrid resampling |
url | https://sciencetechindonesia.com/index.php/jsti/article/view/1124 |
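
The abstract states that Random Undersampling and Random Oversampling are combined to handle class imbalance, but it does not give the target class sizes or the authors' implementation. The following is a minimal pure-Python sketch of that kind of hybrid resampling, assuming a single user-chosen target count per class. The function name `hybrid_resample`, the `target_per_class` and `seed` parameters, and the "relevant"/"irrelevant" labels are hypothetical illustrations, not the authors' code. Libraries such as imbalanced-learn provide `RandomUnderSampler` and `RandomOverSampler` for the same two steps.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def hybrid_resample(
    samples: Sequence[T],
    labels: Sequence[Hashable],
    target_per_class: int,
    seed: int = 0,
) -> Tuple[List[T], List[Hashable]]:
    """Hypothetical helper: resample every class to exactly `target_per_class` items.

    Classes larger than the target are randomly undersampled (drawn without
    replacement); smaller classes are randomly oversampled (all originals are
    kept, then random originals are duplicated until the target is reached).
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if target_per_class < 1:
        raise ValueError("target_per_class must be at least 1")
    rng = random.Random(seed)

    # Group samples by class label, preserving input order.
    by_class: Dict[Hashable, List[T]] = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    out: List[Tuple[T, Hashable]] = []
    for label, items in by_class.items():
        if len(items) >= target_per_class:
            # Random undersampling of a majority class (no replacement).
            chosen = rng.sample(items, target_per_class)
        else:
            # Random oversampling of a minority class (duplicates drawn with replacement).
            extra = rng.choices(items, k=target_per_class - len(items))
            chosen = items + extra
        out.extend((x, label) for x in chosen)

    # Shuffle so that classes are interleaved before training.
    rng.shuffle(out)
    resampled_x = [x for x, _ in out]
    resampled_y = [y for _, y in out]
    return resampled_x, resampled_y


if __name__ == "__main__":
    # Hypothetical toy data: 10 "relevant" abstracts vs 2 "irrelevant" ones.
    docs = [f"abstract_{i}" for i in range(12)]
    labels = ["relevant"] * 10 + ["irrelevant"] * 2
    _, y = hybrid_resample(docs, labels, target_per_class=6)
    print(Counter(y))
```

Undersampling alone discards majority-class data and oversampling alone only duplicates minority-class data; meeting at an intermediate target limits both effects. How the authors chose the class sizes is not stated in this record, so the target value here is an assumption.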