A Comparison of Infectious Disease Forecasting Methods across Locations, Diseases, and Time

Accurate infectious disease forecasting can inform efforts to prevent outbreaks and mitigate adverse impacts. This study compares the performance of statistical, machine learning (ML), and deep learning (DL) approaches in forecasting infectious disease incidences across different countries and time...

Bibliographic Details
Main Authors:	Samuel Dixon, Ravikiran Keshavamurthy, Daniel H. Farber, Andrew Stevens, Karl T. Pazdernik, Lauren E. Charles
Format:	Article
Language:	English
Published:	MDPI AG 2022-01-01
Series:	Pathogens
Subjects:	infectious disease forecasting prediction big data multi-feature fusion machine learning deep learning
Online Access:	https://www.mdpi.com/2076-0817/11/2/185

collection	DOAJ
description	Accurate infectious disease forecasting can inform efforts to prevent outbreaks and mitigate adverse impacts. This study compares the performance of statistical, machine learning (ML), and deep learning (DL) approaches in forecasting infectious disease incidences across different countries and time intervals. We forecasted three diverse diseases: campylobacteriosis, typhoid, and Q-fever, using a wide variety of features (n = 46) from public datasets, e.g., landscape, climate, and socioeconomic factors. We compared autoregressive statistical models to two tree-based ML models (extreme gradient boosted trees [XGB] and random forest [RF]) and two DL models (multi-layer perceptron and encoder–decoder model). The disease models were trained on data from seven different countries at the region-level between 2009–2017. Forecasting performance of all models was assessed using mean absolute error, root mean square error, and Poisson deviance across Australia, Israel, and the United States for the months of January through August of 2018. The overall model results were compared across diseases as well as various data splits, including country, regions with highest and lowest cases, and the forecasted months out (i.e., nowcasting, short-term, and long-term forecasting). Overall, the XGB models performed the best for all diseases and, in general, tree-based ML models performed the best when looking at data splits. There were a few instances where the statistical or DL models had minutely smaller error metrics for specific subsets of typhoid, which is a disease with very low case counts. Feature importance per disease was measured by using four tree-based ML models (i.e., XGB and RF with and without region name as a feature). The most important feature groups included previous case counts, region name, population counts and density, mortality causes of neonatal to under 5 years of age, sanitation factors, and elevation. This study demonstrates the power of ML approaches to incorporate a wide range of factors to forecast various diseases, regardless of location, more accurately than traditional statistical approaches.
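The description names three error metrics (mean absolute error, root mean square error, and Poisson deviance) without defining them. The sketch below shows one common way to compute each for count forecasts. It is an illustration only, not the authors' code: the function names and the sample case counts are hypothetical, and the choice to average the deviance over observations and to treat the y·log(y/μ) term as 0 when y = 0 is an assumption that the abstract does not confirm.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between observed and forecast counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between observed and forecast counts."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mean_poisson_deviance(y_true, y_pred):
    """Mean Poisson deviance: (2/n) * sum(y*log(y/mu) - (y - mu)).

    Assumption: the y*log(y/mu) term is taken as 0 when y == 0,
    and forecasts mu must be strictly positive.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if p <= 0:
            raise ValueError("Poisson deviance needs strictly positive forecasts")
        log_term = t * math.log(t / p) if t > 0 else 0.0
        total += log_term - (t - p)
    return 2.0 * total / len(y_true)


# Hypothetical monthly case counts for one region (not data from the study).
observed = [0, 2, 5, 3]
forecast = [1.0, 2.0, 4.0, 3.0]

mae_value = mae(observed, forecast)
rmse_value = rmse(observed, forecast)
deviance_value = mean_poisson_deviance(observed, forecast)
```

Unlike the two absolute-scale errors, the deviance weights a miss relative to the size of the forecast, which is one reason it is often reported alongside them for low-count series such as the typhoid data mentioned above.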
id	doaj.art-09a0888d41de429f974cee7a1392729a
institution	Directory Open Access Journal
issn	2076-0817
doi	10.3390/pathogens11020185
volume	11
issue	2
article_number	185
affiliation	Pacific Northwest National Laboratory, Richland, WA 99354, USA (all six authors)

Similar Items