River reach-level machine learning estimation of nutrient concentrations in Great Britain

Nitrogen (N) and phosphorus (P) are essential nutrients that are necessary for plant growth and support life in aquatic ecosystems. However, excessive N and P can cause algal blooms that deplete oxygen, kill fish, and release toxins that are harmful to humans. Estimates of N and P levels...

Bibliographic Details
Main Authors: Chak-Hau Michael Tso, Eugene Magee, David Huxley, Michael Eastman, Matthew Fry
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-09-01
Series: Frontiers in Water
Subjects: river network; machine learning; nutrients; water quality; random forest
Online Access: https://www.frontiersin.org/articles/10.3389/frwa.2023.1244024/full
author Chak-Hau Michael Tso
Eugene Magee
David Huxley
Michael Eastman
Matthew Fry
author_sort Chak-Hau Michael Tso
collection DOAJ
description Nitrogen (N) and phosphorus (P) are essential nutrients that are necessary for plant growth and support life in aquatic ecosystems. However, excessive N and P can cause algal blooms that deplete oxygen, kill fish, and release toxins that are harmful to humans. Estimates of N and P levels in rivers are typically calculated at station or grid (>1 km) scale; it is therefore difficult to visualise how water quality evolves as water travels downstream. Using a high-resolution reach-scale river network and associating each reach with land cover fractions and catchment descriptors, we trained random forest models on aggregated data (2010–2020) from the Environment Agency Open Water Quality Data Archive for 2,343 stations to predict long-term nitrate and orthophosphate concentrations at each river reach in Great Britain (GB). We trained the models and made predictions separately for each season to investigate potential differences in feature importance. Our models predicted concentrations with an average testing coefficient of determination (R²) of 0.71 for nitrate and 0.58 for orthophosphate using 5-fold cross-validation. Performance was slightly better for higher Strahler stream orders, highlighting the challenges of making predictions in small streams. Our results revealed that arable and horticultural land use is the strongest and most reliable predictor for nitrate, while floodplain extent and standard percentage runoff are stronger predictors for orthophosphate. Nationally, higher orthophosphate concentrations were observed in urbanised areas. This study shows how combining a river network model with machine learning can readily provide a network-wide understanding of the spatial distribution of water quality.
first_indexed 2024-03-11T23:10:19Z
format Article
id doaj.art-45d4e492919c48d982691a3278c86369
institution Directory Open Access Journal
issn 2624-9375
language English
last_indexed 2024-03-11T23:10:19Z
publishDate 2023-09-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Water
spelling doaj.art-45d4e492919c48d982691a3278c86369 2023-09-21T09:19:37Z eng
Frontiers Media S.A.
Frontiers in Water, ISSN 2624-9375, 2023-09-01, vol. 5, article 1244024
DOI 10.3389/frwa.2023.1244024
River reach-level machine learning estimation of nutrient concentrations in Great Britain
Chak-Hau Michael Tso (UK Centre for Ecology and Hydrology, Lancaster, United Kingdom; Centre of Excellence for Environmental Data Science, Lancaster, United Kingdom)
Eugene Magee (UK Centre for Ecology and Hydrology, Wallingford, United Kingdom; Formerly Data Science MSc Programme, School of Computing and Communications, Lancaster University, Lancaster, United Kingdom)
David Huxley (Formerly Data Science MSc Programme, School of Computing and Communications, Lancaster University, Lancaster, United Kingdom)
Michael Eastman (UK Centre for Ecology and Hydrology, Wallingford, United Kingdom; Met Office, Exeter, United Kingdom)
Matthew Fry (Centre of Excellence for Environmental Data Science, Lancaster, United Kingdom; UK Centre for Ecology and Hydrology, Wallingford, United Kingdom)
https://www.frontiersin.org/articles/10.3389/frwa.2023.1244024/full
river network; machine learning; nutrients; water quality; random forest
title River reach-level machine learning estimation of nutrient concentrations in Great Britain
topic river network
machine learning
nutrients
water quality
random forest
url https://www.frontiersin.org/articles/10.3389/frwa.2023.1244024/full