Cluster analysis and ensemble transfer learning for COVID-19 classification from computed tomography scans

Bibliographic Details
Main Authors: Lyubomir Gotsev, Ivan Mitkov, Eugenia Kovatcheva, Boyan Jekov, Roumen Nikolov, Elena Shoikova, Milena Petkova
Format: Article
Language: English
Published: Universitas Ahmad Dahlan 2022-07-01
Series: IJAIN (International Journal of Advances in Intelligent Informatics)
Subjects: covid-19; computed tomography; clustering; transfer learning; ensemble learning
Online Access: https://ijain.org/index.php/IJAIN/article/view/817
_version_ 1811261242382221312
author Lyubomir Gotsev
Ivan Mitkov
Eugenia Kovatcheva
Boyan Jekov
Roumen Nikolov
Elena Shoikova
Milena Petkova
collection DOAJ
description The paper presents a brief analysis of publications that use the public SARS-CoV-2 dataset, which consists of patients’ computed tomography scans collected in hospitals in Brazil, together with an experimental setup that addresses the data challenges identified. The analysis shows that all protocols but one suffer from data leakage, which arises because the dataset does not group images by patient. Each patient is represented by several scans, so data from the same individual may occur in both the training and test sets and produce misleading results. Furthermore, only one paper proposed ensemble learning, using VGG-16, ResNet50, and Xception as base models. We therefore proposed and tested the following strategy to mitigate the identified risks of bias: data standardization and normalization to achieve proper contrast and resolution; k-means clustering and a group shuffle split to avoid data leakage; and augmentation and ensemble transfer learning to deal with limited sample size and overfitting. Compared with the earlier ensemble approach, the current one stacks VGG-16, DenseNet-201, and Inception v3 and achieves higher accuracy (99.3%), the second highest in the related work; most significantly, it applies augmentation and cluster analysis to avoid overestimation. The paper also reports metrics that are critical in the medical domain: negative predictive value (99.55%), false positive rate (0.89%), false negative rate (0.42%), and false discovery rate (0.83%). The strategy has two main advantages: it reduces data pitfalls and decreases generalization error. It can serve as a baseline for improving performance and mitigating the risk of bias in the field.
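The leakage problem and the reported error rates in the description can be illustrated with a minimal sketch. It is not the authors' code: the function names `group_shuffle_split` and `medical_metrics` are hypothetical, the split is a small standard-library re-implementation of the general group-wise splitting idea, and it assumes that each scan already carries a group id (for example a patient id, or a k-means cluster label standing in for one). The confusion-matrix counts in the usage lines are made up for illustration and are not the paper's results.

```python
import random
from collections import defaultdict


def group_shuffle_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so that no group lands in both train and test.

    groups[i] is the group id of sample i. Here it is assumed to be a
    patient id or a cluster label used as a stand-in for one.
    """
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)

    group_ids = sorted(by_group)
    if len(group_ids) < 2:
        raise ValueError("need at least two groups to split")
    random.Random(seed).shuffle(group_ids)

    # Hold out whole groups: at least one, and never all of them.
    n_test = max(1, round(test_fraction * len(group_ids)))
    n_test = min(n_test, len(group_ids) - 1)

    test = [i for g in group_ids[:n_test] for i in by_group[g]]
    train = [i for g in group_ids[n_test:] for i in by_group[g]]
    return sorted(train), sorted(test)


def medical_metrics(tp, fp, tn, fn):
    """Compute NPV, FPR, FNR and FDR from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    return {
        "npv": tn / (tn + fn),  # negative predictive value
        "fpr": fp / (fp + tn),  # false positive rate
        "fnr": fn / (fn + tp),  # false negative rate
        "fdr": fp / (fp + tp),  # false discovery rate
    }


if __name__ == "__main__":
    # Ten scans belonging to four hypothetical groups.
    groups = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
    train, test = group_shuffle_split(groups, test_fraction=0.25, seed=42)
    shared = {groups[i] for i in train} & {groups[i] for i in test}
    print("groups shared by train and test:", shared)

    # Illustrative counts only.
    print(medical_metrics(tp=90, fp=5, tn=95, fn=10))
```

Splitting by group rather than by image is what keeps scans of the same individual on one side of the split; a plain random split over images would not guarantee that.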
first_indexed 2024-04-12T19:00:56Z
format Article
id doaj.art-60afb833dfab4bb0a1099e2c3984c064
institution Directory of Open Access Journals
issn 2442-6571
2548-3161
language English
last_indexed 2024-04-12T19:00:56Z
publishDate 2022-07-01
publisher Universitas Ahmad Dahlan
record_format Article
series IJAIN (International Journal of Advances in Intelligent Informatics)
spelling doaj.art-60afb833dfab4bb0a1099e2c3984c064
2022-12-22T03:20:09Z
eng
Universitas Ahmad Dahlan
IJAIN (International Journal of Advances in Intelligent Informatics)
2442-6571
2548-3161
2022-07-01
8
2
135
150
10.26555/ijain.v8i2.817
205
Cluster analysis and ensemble transfer learning for COVID-19 classification from computed tomography scans
Lyubomir Gotsev (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
Ivan Mitkov (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
Eugenia Kovatcheva (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
Boyan Jekov (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
Roumen Nikolov (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
Elena Shoikova (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
Milena Petkova (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
https://ijain.org/index.php/IJAIN/article/view/817
covid-19
computed tomography
clustering
transfer learning
ensemble learning
title Cluster analysis and ensemble transfer learning for COVID-19 classification from computed tomography scans
topic covid-19
computed tomography
clustering
transfer learning
ensemble learning
url https://ijain.org/index.php/IJAIN/article/view/817