A systematic review of data mining and machine learning for air pollution epidemiology

Abstract. Background: Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to mak...


Bibliographic Details
Main Authors: Colin Bellinger, Mohomed Shazan Mohomed Jabbar, Osmar Zaïane, Alvaro Osornio-Vargas
Format: Article
Language: English
Published: BMC 2017-11-01
Series: BMC Public Health
Subjects: Epidemiology, Air pollution, Exposure, Data mining, Big data, Machine learning
Online Access: http://link.springer.com/article/10.1186/s12889-017-4914-3
collection DOAJ
description Abstract
Background: Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology.
Methods: We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We searched PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed.
Results: Our search queries returned 400 research articles. Applying our inclusion/exclusion criteria in a fine-grained analysis reduced the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. As potential new directions, we identify deep learning and geospatial pattern mining as two burgeoning areas of data mining with good potential for future applications in air pollution epidemiology.
Conclusions: We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air pollution epidemiology. The potential to support air pollution epidemiology continues to grow with advancements in data mining related to temporal and geospatial mining, and deep learning. This is further supported by new sensors and storage media that enable larger, better-quality data. This suggests that many more fruitful applications can be expected in the future.
first_indexed 2024-12-10T04:04:57Z
id doaj.art-456d628825ca46088e901008a39b7c55
institution Directory of Open Access Journals
issn 1471-2458
last_indexed 2024-12-10T04:04:57Z
spelling doaj.art-456d628825ca46088e901008a39b7c55 2022-12-22T02:02:52Z eng BMC BMC Public Health 1471-2458 2017-11-01 171119 10.1186/s12889-017-4914-3
Colin Bellinger (Department of Computing Science, University of Alberta)
Mohomed Shazan Mohomed Jabbar (Department of Computing Science, University of Alberta)
Osmar Zaïane (Department of Computing Science, University of Alberta)
Alvaro Osornio-Vargas (Department of Paediatrics, University of Alberta)
topic Epidemiology
Air pollution
Exposure
Data mining
Big data
Machine learning