CyL-GHI: Global Horizontal Irradiance Dataset Containing 18 Years of Refined Data at 30-Min Granularity from 37 Stations Located in Castile and León (Spain)
Accurate solar forecasting lately relies on advances in the field of artificial intelligence and on the availability of databases with large amounts of information on meteorological variables. In this paper, we present the methodology applied to introduce a large-scale, public, and solar irradiance...
Main Authors: | Llinet Benavides Cesar, Miguel Ángel Manso Callejo, Calimanut-Ionut Cira, Ramon Alcarria |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-03-01 |
Series: | Data |
Subjects: | global horizontal irradiance; weather measurements; extended area; Spain region |
Online Access: | https://www.mdpi.com/2306-5729/8/4/65 |
_version_ | 1827745404940189696 |
---|---|
author | Llinet Benavides Cesar; Miguel Ángel Manso Callejo; Calimanut-Ionut Cira; Ramon Alcarria |
author_sort | Llinet Benavides Cesar |
collection | DOAJ |
description | Accurate solar forecasting lately relies on advances in the field of artificial intelligence and on the availability of databases with large amounts of information on meteorological variables. In this paper, we present the methodology applied to introduce a large-scale, public solar irradiance dataset, CyL-GHI, containing refined data from 37 stations found within the Spanish region of Castile and León (Spanish: Castilla y León, or CyL). In addition to the data cleaning steps, the procedure also features steps that enable the addition of meteorological and geographical variables that complement the value of the initial data. The proposed dataset, resulting from applying the processing methodology, is delivered both in raw format and with the quality processing applied, and continuously covers 18 years (the period from 1 January 2002 to 31 December 2019), with a temporal resolution of 30 min. CyL-GHI can be of great importance in studies focused on the spatio-temporal characteristics of solar irradiance data, because the geographical information considered enables a regional analysis of the phenomena (the 37 stations cover a land area larger than 94,226 km²). Afterwards, three popular artificial intelligence algorithms were optimised and tested on CyL-GHI, their performance values being offered as baselines against which to compare other forecasting implementations. Furthermore, the ERA5 values corresponding to the studied area were analysed and compared with performance values delivered by the trained models. The inclusion of previous observations of neighbours as input to an optimised Random Forest model (applying a spatio-temporal approach) improved the predictive capability of the machine learning models by almost 3%. |
first_indexed | 2024-03-11T05:06:34Z |
format | Article |
id | doaj.art-eadd32a27a4a4947b47afccdf2681a18 |
institution | Directory Open Access Journal |
issn | 2306-5729 |
language | English |
last_indexed | 2024-03-11T05:06:34Z |
publishDate | 2023-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Data |
spelling | doaj.art-eadd32a27a4a4947b47afccdf2681a18; 2023-11-17T18:53:19Z; eng; MDPI AG; Data; 2306-5729; 2023-03-01; 8; 4; 65; 10.3390/data8040065; CyL-GHI: Global Horizontal Irradiance Dataset Containing 18 Years of Refined Data at 30-Min Granularity from 37 Stations Located in Castile and León (Spain); Llinet Benavides Cesar [0]; Miguel Ángel Manso Callejo [1]; Calimanut-Ionut Cira [2]; Ramon Alcarria [3]; Departamento de Ingeniería Topográfica y Cartográfica, Escuela Técnica Superior de Ingenieros en Topografía, Geodesia y Cartografía, Universidad Politécnica de Madrid, Calle Mercator, 2, 28031 Madrid, Spain; Departamento de Ingeniería Topográfica y Cartográfica, Escuela Técnica Superior de Ingenieros en Topografía, Geodesia y Cartografía, Universidad Politécnica de Madrid, Calle Mercator, 2, 28031 Madrid, Spain; Departamento de Ingeniería Topográfica y Cartográfica, Escuela Técnica Superior de Ingenieros en Topografía, Geodesia y Cartografía, Universidad Politécnica de Madrid, Calle Mercator, 2, 28031 Madrid, Spain; Departamento de Ingeniería Topográfica y Cartográfica, Escuela Técnica Superior de Ingenieros en Topografía, Geodesia y Cartografía, Universidad Politécnica de Madrid, Calle Mercator, 2, 28031 Madrid, Spain; Accurate solar forecasting lately relies on advances in the field of artificial intelligence and on the availability of databases with large amounts of information on meteorological variables. In this paper, we present the methodology applied to introduce a large-scale, public, and solar irradiance dataset, CyL-GHI, containing refined data from 37 stations found within the Spanish region of Castile and León (Spanish: Castilla y León, or CyL). In addition to the data cleaning steps, the procedure also features steps that enable the addition of meteorological and geographical variables that complement the value of the initial data. The proposed dataset, resulting from applying the processing methodology, is delivered both in raw format and with the quality processing applied, and continuously covers 18 years (the period from 1 January 2002 to 31 December 2019), with a temporal resolution of 30 min. CyL-GHI can result in great importance in studies focused on the spatial-temporal characteristics of solar irradiance data, due to the geographical information considered that enables a regional analysis of the phenomena (the 37 stations cover a land area larger than 94,226 km²). Afterwards, three popular artificial intelligence algorithms were optimised and tested on CyL-GHI, their performance values being offered as baselines to compare other forecasting implementations. Furthermore, the ERA5 values corresponding to the studied area were analysed and compared with performance values delivered by the trained models. The inclusion of previous observations of neighbours as input to an optimised Random Forest model (applying a spatio-temporal approach) improved the predictive capability of the machine learning models by almost 3%.; https://www.mdpi.com/2306-5729/8/4/65; global horizontal irradiance; weather measurements; extended area; Spain region |
title | CyL-GHI: Global Horizontal Irradiance Dataset Containing 18 Years of Refined Data at 30-Min Granularity from 37 Stations Located in Castile and León (Spain) |
topic | global horizontal irradiance; weather measurements; extended area; Spain region |
url | https://www.mdpi.com/2306-5729/8/4/65 |