Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data

COVID-19 has had an enormous negative impact on human lives and the world economy. To help fight this pandemic, this study evaluates different database systems and selects the most suitable for storing, handling, and mining COVID-19 data. We evaluate different SQL and No...

Bibliographic Details
Main Authors: João Antas, Rodrigo Rocha Silva, Jorge Bernardino
Format: Article
Language: English
Published: MDPI AG 2022-02-01
Series: Computers
Subjects: big data, COVID-19, Data Mining, SQL and NoSQL databases
Online Access: https://www.mdpi.com/2073-431X/11/2/29
_version_ 1797481546709467136
author João Antas
Rodrigo Rocha Silva
Jorge Bernardino
author_facet João Antas
Rodrigo Rocha Silva
Jorge Bernardino
author_sort João Antas
collection DOAJ
description COVID-19 has had an enormous negative impact on human lives and the world economy. To help fight this pandemic, this study evaluates different database systems and selects the most suitable for storing, handling, and mining COVID-19 data. We evaluate different SQL and NoSQL database systems using the following metrics: query runtime, memory used, CPU used, and storage size. The database systems assessed were Microsoft SQL Server, MongoDB, and Cassandra. We also evaluate Data Mining algorithms, including Decision Trees, Random Forest, Naive Bayes, and Logistic Regression, using data classification tests in the Orange Data Mining software. Classification tests were performed using cross-validation on a table with about 3 million records, including COVID-19 exams with patients’ symptoms. The Random Forest algorithm obtained the best average accuracy, recall, precision, and F1 Score in the COVID-19 predictive model built in the mining stage. In the performance evaluation, MongoDB produced the best results for almost all tests with a large data volume.
first_indexed 2024-03-09T22:16:09Z
format Article
id doaj.art-4f8708967c85482495b0f11887330756
institution Directory Open Access Journal
issn 2073-431X
language English
last_indexed 2024-03-09T22:16:09Z
publishDate 2022-02-01
publisher MDPI AG
record_format Article
series Computers
spelling doaj.art-4f8708967c85482495b0f11887330756
2023-11-23T19:22:52Z
eng
MDPI AG
Computers
2073-431X
2022-02-01
11, 2, 29
10.3390/computers11020029
Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data
João Antas (0), Rodrigo Rocha Silva (1), Jorge Bernardino (2)
(0) Polytechnic of Coimbra, Coimbra Institute of Engineering (ISEC), 3030-199 Coimbra, Portugal
(1) Centre of Informatics and Systems of University of Coimbra (CISUC), 3030-290 Coimbra, Portugal
(2) Polytechnic of Coimbra, Coimbra Institute of Engineering (ISEC), 3030-199 Coimbra, Portugal
COVID-19 has had an enormous negative impact on human lives and the world economy. To help fight this pandemic, this study evaluates different database systems and selects the most suitable for storing, handling, and mining COVID-19 data. We evaluate different SQL and NoSQL database systems using the following metrics: query runtime, memory used, CPU used, and storage size. The database systems assessed were Microsoft SQL Server, MongoDB, and Cassandra. We also evaluate Data Mining algorithms, including Decision Trees, Random Forest, Naive Bayes, and Logistic Regression, using data classification tests in the Orange Data Mining software. Classification tests were performed using cross-validation on a table with about 3 million records, including COVID-19 exams with patients’ symptoms. The Random Forest algorithm obtained the best average accuracy, recall, precision, and F1 Score in the COVID-19 predictive model built in the mining stage. In the performance evaluation, MongoDB produced the best results for almost all tests with a large data volume.
https://www.mdpi.com/2073-431X/11/2/29
big data
COVID-19
Data Mining
SQL and NoSQL databases
spellingShingle João Antas
Rodrigo Rocha Silva
Jorge Bernardino
Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data
Computers
big data
COVID-19
Data Mining
SQL and NoSQL databases
title Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data
title_full Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data
title_fullStr Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data
title_full_unstemmed Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data
title_short Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data
title_sort assessment of sql and nosql systems to store and mine covid 19 data
topic big data
COVID-19
Data Mining
SQL and NoSQL databases
url https://www.mdpi.com/2073-431X/11/2/29
work_keys_str_mv AT joaoantas assessmentofsqlandnosqlsystemstostoreandminecovid19data
AT rodrigorochasilva assessmentofsqlandnosqlsystemstostoreandminecovid19data
AT jorgebernardino assessmentofsqlandnosqlsystemstostoreandminecovid19data