A FAIR and AI-ready Higgs boson decay dataset
Abstract: To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, ste...
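Main Authors: | Yifan Chen, E. A. Huerta, Javier Duarte, Philip Harris, Daniel S. Katz, Mark S. Neubauer, Daniel Diaz, Farouk Mokhtar, Raghav Kansal, Sang Eon Park, Volodymyr V. Kindratenko, Zhizhen Zhao, Roger Rusack |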
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2022-02-01 |
Series: | Scientific Data |
Online Access: | https://doi.org/10.1038/s41597-021-01109-0 |
_version_ | 1819277873836457984 |
---|---|
author | Yifan Chen E. A. Huerta Javier Duarte Philip Harris Daniel S. Katz Mark S. Neubauer Daniel Diaz Farouk Mokhtar Raghav Kansal Sang Eon Park Volodymyr V. Kindratenko Zhizhen Zhao Roger Rusack |
author_sort | Yifan Chen |
collection | DOAJ |
description | Abstract: To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We use additional available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to visualize and explore this dataset. This study marks the first in a planned series of articles that will guide scientists in the creation of FAIR AI models and datasets in high energy particle physics. |
first_indexed | 2024-12-24T00:03:02Z |
format | Article |
id | doaj.art-d67a4ff4df854871839ca49644de7ebb |
institution | Directory Open Access Journal |
issn | 2052-4463 |
language | English |
last_indexed | 2024-12-24T00:03:02Z |
publishDate | 2022-02-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Data |
spelling | doaj.art-d67a4ff4df854871839ca49644de7ebb; 2022-12-21T17:25:05Z; eng; Nature Portfolio; Scientific Data; 2052-4463; 2022-02-01; 91110; 10.1038/s41597-021-01109-0; A FAIR and AI-ready Higgs boson decay dataset; Authors: Yifan Chen (0), E. A. Huerta (1), Javier Duarte (2), Philip Harris (3), Daniel S. Katz (4), Mark S. Neubauer (5), Daniel Diaz (6), Farouk Mokhtar (7), Raghav Kansal (8), Sang Eon Park (9), Volodymyr V. Kindratenko (10), Zhizhen Zhao (11), Roger Rusack (12); Affiliations (as listed): University of Illinois at Urbana-Champaign; Argonne National Laboratory; University of California San Diego, La Jolla; Halıcıoğlu Data Science Institute, La Jolla; University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign; University of California San Diego, La Jolla; University of California San Diego, La Jolla; University of California San Diego, La Jolla; Massachusetts Institute of Technology, Cambridge; University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign; The University of Minnesota; Abstract: To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We use additional available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to visualize and explore this dataset. This study marks the first in a planned series of articles that will guide scientists in the creation of FAIR AI models and datasets in high energy particle physics.; https://doi.org/10.1038/s41597-021-01109-0 |
title | A FAIR and AI-ready Higgs boson decay dataset |
title_sort | fair and ai ready higgs boson decay dataset |
url | https://doi.org/10.1038/s41597-021-01109-0 |