Architecture and prototype of a WLCG data lake for HL-LHC
The computing strategy document for HL-LHC identifies storage as one of the main WLCG challenges a decade from now. Under the naive assumption of applying today's computing model, the ATLAS and CMS experiments will need one order of magnitude more storage resources than what could be realisti...
Main Authors: | Bird, Ian; Campana, Simone; Girone, Maria; Espinal, Xavier; McCance, Gavin; Schovancová, Jaroslava |
---|---|
Format: | Article |
Language: | English |
Published: | EDP Sciences, 2019-01-01 |
Series: | EPJ Web of Conferences |
Online Access: | https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_04024.pdf |
_version_ | 1818621589277638656 |
---|---|
author | Bird, Ian; Campana, Simone; Girone, Maria; Espinal, Xavier; McCance, Gavin; Schovancová, Jaroslava |
author_sort | Bird Ian |
collection | DOAJ |
description | The computing strategy document for HL-LHC identifies storage as one of the main WLCG challenges a decade from now. Under the naive assumption of applying today's computing model, the ATLAS and CMS experiments will need one order of magnitude more storage resources than the funding agencies could realistically provide at today's cost. The evolution of the computing facilities and the way storage will be organized and consolidated will play a key role in how this possible shortage of resources will be addressed. In this contribution we will describe the architecture of a WLCG data lake, intended as a storage service geographically distributed across large data centers connected by a fast, low-latency network. We will present the experience with our first prototype, showing how the concept, implemented at different scales, can serve different needs, from regional and national consolidation of storage to an international data provisioning service. We will highlight how the system leverages its distributed nature, economies of scale and different classes of storage to optimise the hardware and operational cost through a set of policy-driven decisions concerning data placement and data retention. We will discuss how the system leverages or interoperates with existing federated storage solutions. Finally, we will describe the possible data processing models in this environment and present our first benchmarks. |
first_indexed | 2024-12-16T18:11:40Z |
format | Article |
id | doaj.art-39d26284213c4605b3da1ce99b8758a9 |
institution | Directory of Open Access Journals |
issn | 2100-014X |
language | English |
last_indexed | 2024-12-16T18:11:40Z |
publishDate | 2019-01-01 |
publisher | EDP Sciences |
record_format | Article |
series | EPJ Web of Conferences |
spelling | doaj.art-39d26284213c4605b3da1ce99b8758a9; 2022-12-21T22:21:46Z; eng; EDP Sciences; EPJ Web of Conferences; ISSN 2100-014X; 2019-01-01; volume 214; article 04024; DOI 10.1051/epjconf/201921404024; epjconf_chep2018_04024 |
title | Architecture and prototype of a WLCG data lake for HL-LHC |
url | https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_04024.pdf |