A FAIR and AI-ready Higgs boson decay dataset

Abstract To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, ste...

Bibliographic Details
Main Authors: Yifan Chen, E. A. Huerta, Javier Duarte, Philip Harris, Daniel S. Katz, Mark S. Neubauer, Daniel Diaz, Farouk Mokhtar, Raghav Kansal, Sang Eon Park, Volodymyr V. Kindratenko, Zhizhen Zhao, Roger Rusack
Format: Article
Language: English
Published: Nature Portfolio 2022-02-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-021-01109-0
author Yifan Chen
E. A. Huerta
Javier Duarte
Philip Harris
Daniel S. Katz
Mark S. Neubauer
Daniel Diaz
Farouk Mokhtar
Raghav Kansal
Sang Eon Park
Volodymyr V. Kindratenko
Zhizhen Zhao
Roger Rusack
author_sort Yifan Chen
collection DOAJ
description Abstract To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We use additional available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to visualize and explore this dataset. This study marks the first in a planned series of articles that will guide scientists in the creation of FAIR AI models and datasets in high energy particle physics.
first_indexed 2024-12-24T00:03:02Z
format Article
id doaj.art-d67a4ff4df854871839ca49644de7ebb
institution Directory Open Access Journal
issn 2052-4463
language English
last_indexed 2024-12-24T00:03:02Z
publishDate 2022-02-01
publisher Nature Portfolio
record_format Article
series Scientific Data
spelling doaj.art-d67a4ff4df854871839ca49644de7ebb
2022-12-21T17:25:05Z
eng
Nature Portfolio
Scientific Data
2052-4463
2022-02-01
91110
10.1038/s41597-021-01109-0
A FAIR and AI-ready Higgs boson decay dataset
Yifan Chen 0
E. A. Huerta 1
Javier Duarte 2
Philip Harris 3
Daniel S. Katz 4
Mark S. Neubauer 5
Daniel Diaz 6
Farouk Mokhtar 7
Raghav Kansal 8
Sang Eon Park 9
Volodymyr V. Kindratenko 10
Zhizhen Zhao 11
Roger Rusack 12
University of Illinois at Urbana-Champaign
Argonne National Laboratory
University of California San Diego, La Jolla
Halıcıoğlu Data Science Institute, La Jolla
University of Illinois at Urbana-Champaign
University of Illinois at Urbana-Champaign
University of California San Diego, La Jolla
University of California San Diego, La Jolla
University of California San Diego, La Jolla
Massachusetts Institute of Technology, Cambridge
University of Illinois at Urbana-Champaign
University of Illinois at Urbana-Champaign
The University of Minnesota
title A FAIR and AI-ready Higgs boson decay dataset
url https://doi.org/10.1038/s41597-021-01109-0