Architecture and prototype of a WLCG data lake for HL-LHC


Bibliographic Details
Main Authors: Bird Ian, Campana Simone, Girone Maria, Espinal Xavier, McCance Gavin, Schovancová Jaroslava
Format: Article
Language: English
Published: EDP Sciences, 2019-01-01
Series: EPJ Web of Conferences
Online Access:https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_04024.pdf
ISSN: 2100-014X
DOI: 10.1051/epjconf/201921404024
Volume: 214, article 04024
Source: Directory of Open Access Journals (DOAJ)

Abstract
The computing strategy document for HL-LHC identifies storage as one of the main WLCG challenges a decade from now. If today's computing model were naively applied, the ATLAS and CMS experiments would need an order of magnitude more storage resources than funding agencies could realistically provide at today's cost. The evolution of the computing facilities, and the way storage will be organized and consolidated, will play a key role in addressing this possible shortage of resources. In this contribution we will describe the architecture of a WLCG data lake, intended as a storage service geographically distributed across large data centers connected by fast, low-latency networks. We will present the experience with our first prototype, showing how the concept, implemented at different scales, can serve different needs, from regional and national consolidation of storage to an international data provisioning service. We will highlight how the system leverages its distributed nature, economies of scale and different classes of storage to optimise hardware and operational costs through a set of policy-driven decisions concerning data placement and data retention. We will discuss how the system leverages or interoperates with existing federated storage solutions. Finally, we will describe the possible data processing models in this environment and present our first benchmarks.