Machine learning algorithms applied to wildfire data in California's central valley
This study focuses on using Machine Learning methods to predict wildfires within California's Central Valley. The specific areas within the Central Valley were Yosemite Valley, Sequoias, and Kings Canyon since these areas can be considered wildfire hotspots. This topic is relevant since Califor...
Main Authors: | Kassandra Hernandez, Aaron B. Hoskins |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2024-03-01 |
Series: | Trees, Forests and People |
Subjects: | Wildfire prediction, Machine learning, Decision trees, Random forest, Neural networks, Naïve Bayes |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666719324000244 |
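
The abstract of this record contrasts a random split with a chronological split of the training and testing sets. The sketch below illustrates the difference on toy data only; the record layout, the cutoff date, the test fraction, and the function names are all assumptions for illustration and are not taken from the article.

```python
import random
from datetime import date


def chronological_split(records, cutoff):
    """Train on records dated before the cutoff, test on the rest."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records, then hold out a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy records: one per year from 2012 to 2023 with a made-up label.
records = [
    {"date": date(year, 7, 1), "fire": year % 3 == 0}
    for year in range(2012, 2024)
]

# Chronological: every test record is later than every training record.
chrono_train, chrono_test = chronological_split(records, cutoff=date(2021, 1, 1))

# Random: test records are drawn from anywhere in the 2012-2023 range.
rand_train, rand_test = random_split(records, test_fraction=0.25, seed=42)
```

With a chronological split the model is evaluated only on years it has never seen, which is generally a harder test than a random split that mixes years across both sets.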
_version_ | 1797289707958173696 |
---|---|
author | Kassandra Hernandez Aaron B. Hoskins |
author_facet | Kassandra Hernandez Aaron B. Hoskins |
author_sort | Kassandra Hernandez |
collection | DOAJ |
description | This study uses machine learning methods to predict wildfires within California's Central Valley. The specific areas studied were Yosemite Valley, Sequoias, and Kings Canyon, since these areas can be considered wildfire hotspots. The topic is relevant because California has seen an increase in wildfires, with annual burned forest area increasing by 172 % from 1996 to 2021 (ABC 2024). The algorithms were selected based on previous research that conducted similar studies. From that research, it was hypothesized that the best-performing algorithm for predicting wildfires would be Random Forest. The novelty of this study stems from its focus on the specific areas mentioned above, where many wildfires have occurred over the years. The overall goal is to determine the best machine learning algorithm for predicting wildfires in the Central Valley and to use the results to improve wildfire prevention in these regions. The methods implemented were Decision Trees, Random Forest, Naïve Bayes, and Neural Networks. The dataset was assembled from MERRA-2 and USGS Landsat 8 satellite data, along with fire history from 2012 to 2023 within these regions. The dataset was used in two variations: a random split and a chronological split into training and testing sets. The best-performing algorithm on this dataset was Decision Trees at 550 maximum splits, with an F1-Score of 0.689. The F1-Score ranges between 0 and 1, with a score of 0.7 or higher deemed to indicate a good model for predictions. This result indicates that the randomly split data has better predictive power than the chronologically split data, as seen in the confusion matrices for the chronological split, which have zero true positives for every method except Naïve Bayes. Overall, the results show that Decision Trees with a larger maximum number of splits predict more accurately whether a fire will occur within the given regions. The conclusion drawn from this result is that Decision Trees can be a useful tool for predicting wildfires in California's Central Valley. This research could be applied to help optimize wildfire prevention resources within these areas. |
first_indexed | 2024-03-07T19:08:54Z |
format | Article |
id | doaj.art-ef28abedc43248dea9d96d1f01b9317d |
institution | Directory Open Access Journal |
issn | 2666-7193 |
language | English |
last_indexed | 2024-03-07T19:08:54Z |
publishDate | 2024-03-01 |
publisher | Elsevier |
record_format | Article |
series | Trees, Forests and People |
spelling | doaj.art-ef28abedc43248dea9d96d1f01b9317d 2024-03-01T05:07:36Z eng Elsevier Trees, Forests and People 2666-7193 2024-03-01 15 100516 Machine learning algorithms applied to wildfire data in California's central valley Kassandra Hernandez [0] Aaron B. Hoskins [1] [0] Department of Mechanical Engineering, California State University, Fresno, USA [1] Corresponding author.; Department of Mechanical Engineering, California State University, Fresno, USA http://www.sciencedirect.com/science/article/pii/S2666719324000244 Wildfire prediction Machine learning Decision trees Random forest Neural networks Naïve Bayes |
spellingShingle | Kassandra Hernandez Aaron B. Hoskins Machine learning algorithms applied to wildfire data in California's central valley Trees, Forests and People Wildfire prediction Machine learning Decision trees Random forest Neural networks Naïve Bayes |
title | Machine learning algorithms applied to wildfire data in California's central valley |
title_sort | machine learning algorithms applied to wildfire data in california s central valley |
topic | Wildfire prediction Machine learning Decision trees Random forest Neural networks Naïve Bayes |
url | http://www.sciencedirect.com/science/article/pii/S2666719324000244 |
work_keys_str_mv | AT kassandrahernandez machinelearningalgorithmsappliedtowildfiredataincaliforniascentralvalley AT aaronbhoskins machinelearningalgorithmsappliedtowildfiredataincaliforniascentralvalley |
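
The abstract reports results as F1-Scores and notes that most methods had zero true positives on the chronological split. As a minimal sketch of how the F1-Score follows from confusion-matrix counts, the function below uses the standard definition F1 = 2·TP / (2·TP + FP + FN). The counts in the examples are made up for illustration and are not the article's results; returning 0.0 when the denominator is zero is an assumed convention.

```python
def f1_score(tp, fp, fn):
    """F1 from confusion-matrix counts: 2*TP / (2*TP + FP + FN).

    Returns 0.0 when there are no true positives, false positives, or
    false negatives at all (an assumed convention for the undefined case).
    """
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


# Hypothetical counts, not taken from the article.
balanced = f1_score(tp=30, fp=10, fn=10)      # 60 / 80 = 0.75
no_true_pos = f1_score(tp=0, fp=5, fn=12)     # numerator is 0, so F1 is 0.0
```

Because true positives appear in the numerator, any model with zero true positives scores an F1 of 0 regardless of how many true negatives it gets right.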