Comparing quantile regression spline analyses and supervised machine learning for environmental quality assessment at coastal marine aquaculture installations

Bibliographic Details
Main Authors: Kleopatra Leontidou, Verena Rubel, Thorsten Stoeck
Format: Article
Language: English
Published: PeerJ Inc. 2023-06-01
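Series: PeerJ
Subjects: Benthic monitoring; eDNA metabarcoding; Bacterial indicators; Quantile regression splines; Supervised machine learning; Organic enrichment
Online Access: https://peerj.com/articles/15425.pdf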
author Kleopatra Leontidou
Verena Rubel
Thorsten Stoeck
collection DOAJ
description Organic enrichment associated with marine finfish aquaculture is a local stressor of marine coastal ecosystems. To maintain ecosystem services, biomonitoring programs focusing on benthic diversity are required. Traditionally, impact indices are determined by extracting and identifying benthic macroinvertebrates from samples. However, this method is time-consuming and expensive, with low upscaling potential. A more rapid, inexpensive, and robust method to infer the environmental quality of marine environments is eDNA metabarcoding of bacterial communities. To infer the environmental quality of coastal habitats from metabarcoding data, two taxonomy-free approaches have been successfully applied for different geographical regions and monitoring goals, namely quantile regression splines (QRS) and supervised machine learning (SML). However, their comparative performance remains untested for monitoring the impact of organic enrichment introduced by aquaculture on marine coastal environments. We compared the performance of QRS and SML using bacterial metabarcoding data to infer the environmental quality of 230 aquaculture samples collected from seven farms in Norway and seven farms in Scotland along an organic enrichment gradient. As a measure of environmental quality, we used the Infaunal Quality Index (IQI) calculated from benthic macrofauna data (reference index). The QRS analysis plotted the abundance of amplicon sequence variants (ASVs) as a function of the IQI; ASVs with a defined abundance peak were assigned to eco-groups, from which a molecular IQI was subsequently calculated. In contrast, the SML approach built a random forest model to directly predict the macrofauna-based IQI. Our results show that both QRS and SML perform well in inferring the environmental quality, with 89% and 90% accuracy, respectively. For both geographic regions, there was high correspondence between the reference IQI and both inferred molecular IQIs (p < 0.001), with the SML model showing a higher coefficient of determination than QRS. Among the 20 most important ASVs identified by the SML approach, 15 were congruent with the good quality spline ASV indicators identified via QRS for both Norwegian and Scottish salmon farms. More research on the response of the ASVs to organic enrichment and the co-influence of other environmental parameters is necessary to eventually select the most powerful stressor-specific indicators. Even though both approaches are promising for inferring environmental quality from metabarcoding data, SML proved more powerful in handling natural variability. Improving the SML model still requires the addition of new samples, which can reduce the background noise introduced by high spatio-temporal variability. Overall, we recommend developing a powerful SML approach that can then be applied to monitor the impact of aquaculture on marine ecosystems based on eDNA metabarcoding data.
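The description reports agreement between the reference IQI and the inferred molecular IQIs as an accuracy and a coefficient of determination. The record does not say how these were computed, so the following is only a minimal sketch under stated assumptions: accuracy is taken to mean the share of samples whose inferred IQI falls in the same quality class as the reference IQI, the class boundaries in `ASSUMED_CLASSES` are placeholders, and the sample values are made up. All function names are hypothetical.

```python
from statistics import mean

# Assumed class boundaries on the 0-1 IQI scale, as (lower bound, label).
# Placeholders for illustration; not taken from the record above.
ASSUMED_CLASSES = [
    (0.75, "high"),
    (0.64, "good"),
    (0.44, "moderate"),
    (0.24, "poor"),
    (0.0, "bad"),
]


def iqi_class(value, classes=ASSUMED_CLASSES):
    """Return the label of the first class whose lower bound the value reaches."""
    for lower, label in classes:
        if value >= lower:
            return label
    return classes[-1][1]  # values below every bound fall into the last class


def class_accuracy(reference, inferred):
    """Fraction of samples placed in the same class by both indices."""
    hits = sum(iqi_class(r) == iqi_class(p) for r, p in zip(reference, inferred))
    return hits / len(reference)


def r_squared(reference, inferred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ref_mean = mean(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, inferred))
    ss_tot = sum((r - ref_mean) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Made-up values, for illustration only (not data from the study).
reference_iqi = [0.80, 0.70, 0.50, 0.30, 0.10]
inferred_iqi = [0.78, 0.60, 0.52, 0.35, 0.15]

print(class_accuracy(reference_iqi, inferred_iqi))
print(r_squared(reference_iqi, inferred_iqi))
```

Reporting both numbers is useful because they answer different questions: class agreement matters for regulatory status decisions, while the coefficient of determination describes how closely the inferred values track the reference index along the whole gradient.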
first_indexed 2024-03-09T06:54:41Z
format Article
id doaj.art-f76bfed17d514842b4422f2b3676c10f
institution Directory Open Access Journal
issn 2167-8359
language English
last_indexed 2024-03-09T06:54:41Z
publishDate 2023-06-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-f76bfed17d514842b4422f2b3676c10f; 2023-12-03T10:09:07Z; eng; PeerJ Inc.; PeerJ; 2167-8359; 2023-06-01; 11; e15425; 10.7717/peerj.15425
title Comparing quantile regression spline analyses and supervised machine learning for environmental quality assessment at coastal marine aquaculture installations
topic Benthic monitoring
eDNA metabarcoding
Bacterial indicators
Quantile regression splines
Supervised machine learning
Organic enrichment
url https://peerj.com/articles/15425.pdf