On the Application of Advanced Machine Learning Methods to Analyze Enhanced, Multimodal Data from Persons Infected with COVID-19

Bibliographic Details
Main Authors: Wenhuan Zeng, Anupam Gautam, Daniel H. Huson
Format: Article
Language: English
Published: MDPI AG 2021-01-01
Series: Computation
Subjects: COVID-19, machine learning, deep learning, NLP, weather, sentiment analysis
Online Access: https://www.mdpi.com/2079-3197/9/1/4
_version_ 1797415535544107008
author Wenhuan Zeng
Anupam Gautam
Daniel H. Huson
author_sort Wenhuan Zeng
collection DOAJ
description The current COVID-19 pandemic, caused by the rapid worldwide spread of the SARS-CoV-2 virus, is having severe consequences for human health and the world economy. The virus affects individuals differently: many infected patients show only mild symptoms, while others become critically ill. To lessen the impact of the epidemic, one problem is to determine which factors play an important role in the progression of the disease in a patient. Here, we construct an enhanced, structured COVID-19 dataset from more than one source, using natural language processing to add local weather conditions and country-specific research sentiment. The enhanced structured dataset contains 301,363 samples and 43 features, and we applied both machine learning and deep learning algorithms to it to forecast each patient’s survival probability. In addition, we import alignment sequence data to improve the performance of the model. Applying Extreme Gradient Boosting (XGBoost) to the enhanced structured dataset achieves 97% accuracy in predicting patient survival, with climatic factors, followed by age, showing the highest importance. Similarly, a Multi-Layer Perceptron (MLP) achieves 98% accuracy. This work suggests that it is useful to enhance the available data, which consist mostly of basic patient information, with additional, potentially important features such as weather conditions. The explored models suggest that textual weather descriptions can improve outcome forecasting.
first_indexed 2024-03-09T05:50:01Z
format Article
id doaj.art-3f2573c06f9046fc9b8cae5baf8ed958
institution Directory of Open Access Journals
issn 2079-3197
language English
last_indexed 2024-03-09T05:50:01Z
publishDate 2021-01-01
publisher MDPI AG
record_format Article
series Computation
spelling doaj.art-3f2573c06f9046fc9b8cae5baf8ed958
2023-12-03T12:18:11Z
eng
MDPI AG
Computation
2079-3197
2021-01-01
9
1
4
10.3390/computation9010004
On the Application of Advanced Machine Learning Methods to Analyze Enhanced, Multimodal Data from Persons Infected with COVID-19
Wenhuan Zeng
Anupam Gautam
Daniel H. Huson
Institute for Bioinformatics and Medical Informatics, University of Tübingen, Sand 14, 72076 Tübingen, Germany
https://www.mdpi.com/2079-3197/9/1/4
COVID-19
machine learning
deep learning
NLP
weather
sentiment analysis
title On the Application of Advanced Machine Learning Methods to Analyze Enhanced, Multimodal Data from Persons Infected with COVID-19
title_sort on the application of advanced machine learning methods to analyze enhanced multimodal data from persons infected with covid 19
topic COVID-19
machine learning
deep learning
NLP
weather
sentiment analysis
url https://www.mdpi.com/2079-3197/9/1/4
work_keys_str_mv AT wenhuanzeng ontheapplicationofadvancedmachinelearningmethodstoanalyzeenhancedmultimodaldatafrompersonsinfectedwithcovid19
AT anupamgautam ontheapplicationofadvancedmachinelearningmethodstoanalyzeenhancedmultimodaldatafrompersonsinfectedwithcovid19
AT danielhhuson ontheapplicationofadvancedmachinelearningmethodstoanalyzeenhancedmultimodaldatafrompersonsinfectedwithcovid19