Imputation of precipitation data in northeast Brazil

Abstract This article evaluates four statistical methods of multiple imputation to fill in the missing data of daily precipitation in Northeast Brazil (NEB). We used a daily database collected by 94 rain gauges distributed in NEB from January 1, 1986 to December 31, 2015. The methods were: random sa...

Bibliographic Details
Main Authors: DANIELE T. RODRIGUES, WEBER A. GONÇALVES, CLÁUDIO MOISÉS S. E SILVA, MARIA HELENA C. SPYRIDES, PAULO SÉRGIO LÚCIO
Format: Article
Language: English
Published: Academia Brasileira de Ciências 2023-06-01
Series: Anais da Academia Brasileira de Ciências
Subjects: Bootstrap; missing data; multiple imputation; semiarid
Online Access: http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0001-37652023000301101&lng=en&tlng=en
_version_ 1797810380598149120
author DANIELE T. RODRIGUES
WEBER A. GONÇALVES
CLÁUDIO MOISÉS S. E SILVA
MARIA HELENA C. SPYRIDES
PAULO SÉRGIO LÚCIO
author_sort DANIELE T. RODRIGUES
collection DOAJ
description Abstract This article evaluates four statistical methods of multiple imputation for filling in missing daily precipitation data in Northeast Brazil (NEB). We used a daily database collected by 94 rain gauges distributed across NEB from January 1, 1986 to December 31, 2015. The methods were: random sampling from the observed values; predictive mean matching; Bayesian linear regression; and the bootstrap expectation maximization algorithm (BootEM). To compare these methods, missing data from the original series were initially excluded. The next step was to create three scenarios for each method, in which 10%, 20% and 30% of the data were removed at random. The BootEM method presented the best statistical results, with the average bias between the complete series and the imputed series ranging between -0.91 and 1.30 mm/day and Pearson correlation values of 0.96, 0.91 and 0.86 for 10%, 20% and 30% missing data, respectively. We conclude that this is an adequate method for the reconstruction of historical precipitation data in NEB.
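The evaluation protocol in the abstract (start from a complete series, remove 10%, 20% and 30% of the values at random, impute, then compare against the complete series) can be sketched as follows. This is a minimal illustration, not the authors' code: it implements only the simplest of the four methods (random sampling from the observed values), runs on synthetic data, and every function name is hypothetical. The sign convention for the bias (imputed minus complete) and the choice to compute both metrics over the whole series are assumptions; the record does not state them.

```python
import math
import random


def mask_at_random(series, fraction, rng):
    """Return a copy of `series` with `fraction` of its entries set to None."""
    n_missing = round(len(series) * fraction)
    missing_idx = set(rng.sample(range(len(series)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(series)]


def impute_random_sample(masked, rng):
    """Fill each gap with a value drawn at random from the observed values."""
    observed = [v for v in masked if v is not None]
    return [rng.choice(observed) if v is None else v for v in masked]


def mean_bias(complete, imputed):
    """Average of (imputed - complete); the sign convention is an assumption."""
    return sum(i - c for c, i in zip(complete, imputed)) / len(complete)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def evaluate(complete, fractions=(0.10, 0.20, 0.30), seed=0):
    """Run the mask-impute-compare loop once per missing-data scenario."""
    rng = random.Random(seed)
    results = {}
    for fraction in fractions:
        masked = mask_at_random(complete, fraction, rng)
        imputed = impute_random_sample(masked, rng)
        results[fraction] = (mean_bias(complete, imputed),
                             pearson(complete, imputed))
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic daily rainfall in mm/day: mostly dry days, occasional wet days.
    synthetic = [rng.expovariate(0.2) if rng.random() < 0.3 else 0.0
                 for _ in range(1000)]
    for fraction, (bias, r) in evaluate(synthetic).items():
        print(f"{fraction:.0%} missing: bias={bias:+.3f} mm/day, r={r:.3f}")
```

Swapping `impute_random_sample` for another imputer with the same signature would let the same loop compare several methods under identical masks, which is the kind of comparison the abstract describes.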
first_indexed 2024-03-13T07:07:58Z
format Article
id doaj.art-cbb0532cb0ae4a39a87dea866979efad
institution Directory Open Access Journal
issn 1678-2690
language English
last_indexed 2024-03-13T07:07:58Z
publishDate 2023-06-01
publisher Academia Brasileira de Ciências
record_format Article
series Anais da Academia Brasileira de Ciências
spelling doaj.art-cbb0532cb0ae4a39a87dea866979efad
2023-06-06T07:36:18Z
eng
Academia Brasileira de Ciências
Anais da Academia Brasileira de Ciências
1678-2690
2023-06-01
95
2
10.1590/0001-3765202320210737
Imputation of precipitation data in northeast Brazil
DANIELE T. RODRIGUES https://orcid.org/0000-0003-4307-2832
WEBER A. GONÇALVES https://orcid.org/0000-0002-5073-8527
CLÁUDIO MOISÉS S. E SILVA https://orcid.org/0000-0002-2251-7348
MARIA HELENA C. SPYRIDES https://orcid.org/0000-0001-8087-1962
PAULO SÉRGIO LÚCIO https://orcid.org/0000-0002-8170-934X
Abstract This article evaluates four statistical methods of multiple imputation for filling in missing daily precipitation data in Northeast Brazil (NEB). We used a daily database collected by 94 rain gauges distributed across NEB from January 1, 1986 to December 31, 2015. The methods were: random sampling from the observed values; predictive mean matching; Bayesian linear regression; and the bootstrap expectation maximization algorithm (BootEM). To compare these methods, missing data from the original series were initially excluded. The next step was to create three scenarios for each method, in which 10%, 20% and 30% of the data were removed at random. The BootEM method presented the best statistical results, with the average bias between the complete series and the imputed series ranging between -0.91 and 1.30 mm/day and Pearson correlation values of 0.96, 0.91 and 0.86 for 10%, 20% and 30% missing data, respectively. We conclude that this is an adequate method for the reconstruction of historical precipitation data in NEB.
http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0001-37652023000301101&lng=en&tlng=en
Bootstrap
missing data
multiple imputation
semiarid
title Imputation of precipitation data in northeast Brazil
topic Bootstrap
missing data
multiple imputation
semiarid
url http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0001-37652023000301101&lng=en&tlng=en