An approach based on multivariate distribution and Gaussian copulas to predict groundwater quality using DNN models in a data scarce environment


Bibliographic Details
Main Authors: Ayoub Nafii, Houda Lamane, Abdeslam Taleb, Ali El Bilali
Format: Article
Language: English
Published: Elsevier 2023-01-01
Series: MethodsX
Subjects: An approach based on copulas to predict groundwater quality using DNN models with small data
Online Access: http://www.sciencedirect.com/science/article/pii/S2215016123000389
author Ayoub Nafii
Houda Lamane
Abdeslam Taleb
Ali El Bilali
collection DOAJ
description Machine learning (ML) models have become a valuable tool in water resources modelling. However, they require large datasets for training and validation, which poses challenges in data-scarce environments, particularly in poorly monitored basins. In such scenarios, a Virtual Sample Generation (VSG) method can help overcome this limitation when developing ML models. The main aim of this manuscript is to introduce a novel VSG method based on multivariate distribution and a Gaussian copula, called MVD-VSG, which generates appropriate virtual combinations of groundwater quality parameters to train a Deep Neural Network (DNN) for predicting the Entropy Weighted Water Quality Index (EWQI) of aquifers, even from small datasets. The MVD-VSG is original and was validated in its initial application using sufficient observed datasets collected from two aquifers. The validation results showed that, from only 20 original samples, the MVD-VSG provided enough accuracy to predict EWQI with a Nash-Sutcliffe efficiency (NSE) of 0.87. The companion publication of this method paper is El Bilali et al. [1].
• Development of MVD-VSG to generate virtual combinations of groundwater parameters in a data-scarce environment.
• Training a deep neural network to predict groundwater quality.
• Validation of the method with sufficient observed datasets and sensitivity analysis.
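The description above does not spell out the MVD-VSG algorithm, so the following is only a minimal sketch of the general idea it names: fitting a Gaussian copula to a small sample and drawing virtual samples from it. It is not the authors' implementation. The function names `gaussian_copula_vsg` and `nse`, the use of rank-based pseudo-observations, and the use of empirical quantiles for the marginals are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm, rankdata


def gaussian_copula_vsg(X, n_virtual, seed=0):
    """Draw virtual samples from a Gaussian copula fitted to X (hypothetical helper).

    X is an (n_samples, n_parameters) array of observed values. The marginals are
    taken from the empirical quantiles of each column, which is an assumption of
    this sketch and not something stated in the record.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)

    # 1. Rank-transform each column to pseudo-observations strictly inside (0, 1).
    U = rankdata(X, axis=0) / (n + 1)
    # 2. Map the pseudo-observations to standard normal scores.
    Z = norm.ppf(U)
    # 3. Estimate the copula correlation matrix from the normal scores.
    R = np.corrcoef(Z, rowvar=False)
    # 4. Sample correlated standard normals and map them back to uniforms.
    Z_new = rng.multivariate_normal(np.zeros(d), R, size=n_virtual)
    U_new = norm.cdf(Z_new)
    # 5. Push the uniforms through each column's empirical quantile function.
    return np.column_stack(
        [np.quantile(X[:, j], U_new[:, j]) for j in range(d)]
    )


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / variance
```

In a workflow like the one described, the virtual samples would be used to train the DNN and `nse` would score its EWQI predictions against held-out observations. Because the marginals here are empirical quantiles, the virtual values never leave the observed range of each parameter; a parametric marginal fit would be needed to extrapolate beyond it.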
first_indexed 2024-03-13T03:33:02Z
format Article
id doaj.art-0a62beb22f974116bcf3a821e43796ae
institution Directory Open Access Journal
issn 2215-0161
language English
last_indexed 2024-03-13T03:33:02Z
publishDate 2023-01-01
publisher Elsevier
record_format Article
series MethodsX
spelling doaj.art-0a62beb22f974116bcf3a821e43796ae
Record timestamp: 2023-06-24T05:17:05Z
Language code: eng
Publisher: Elsevier
Journal: MethodsX, ISSN 2215-0161
Published: 2023-01-01, volume 10, article 102034
Affiliations:
Ayoub Nafii: Hassan II University of Casablanca, Faculty of sciences and techniques of Mohammedia, Morocco; River Basin Agency of Bouregreg and Chaouia, 13000 Benslimane, Morocco (corresponding author)
Houda Lamane: Hassan II University of Casablanca, Faculty of sciences and techniques of Mohammedia, Morocco
Abdeslam Taleb: Hassan II University of Casablanca, Faculty of sciences and techniques of Mohammedia, Morocco
Ali El Bilali: Hassan II University of Casablanca, Faculty of sciences and techniques of Mohammedia, Morocco; River Basin Agency of Bouregreg and Chaouia, 13000 Benslimane, Morocco (corresponding author)
title An approach based on multivariate distribution and Gaussian copulas to predict groundwater quality using DNN models in a data scarce environment
topic An approach based on copulas to predict groundwater quality using DNN models with small data
url http://www.sciencedirect.com/science/article/pii/S2215016123000389