Fast Training Data Generation for Machine Learning Analysis of Cosmic Ray Showers

Applying Machine Learning (ML) methods to the analysis of muon lateral distributions in Extensive Air Showers detected by citizen science projects, while taking into account the spatial distribution of detectors, requires enormous training data sets. Therefore, generating these data sets with typica...


Bibliographic Details
Main Authors: Tomasz Hachaj, Lukasz Bibrzycki, Marcin Piekarczyk
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Cosmic ray shower; simulation; data generation; detectors; machine learning
Online Access: https://ieeexplore.ieee.org/document/10019257/
_version_ 1797902239920029696
author Tomasz Hachaj
Lukasz Bibrzycki
Marcin Piekarczyk
author_facet Tomasz Hachaj
Lukasz Bibrzycki
Marcin Piekarczyk
author_sort Tomasz Hachaj
collection DOAJ
description Applying Machine Learning (ML) methods to the analysis of muon lateral distributions in Extensive Air Showers detected by citizen science projects, while taking into account the spatial distribution of detectors, requires enormous training data sets. Generating these data sets with typical Monte Carlo (MC) generators like CORSIKA is therefore computationally prohibitive. Here we present a method which, by applying special augmentation procedures, produces a training dataset that is compatible in all essential aspects with the data produced by regular MC computations while avoiding their time overhead. We utilize the Nishimura-Kamata-Greisen (NKG) distribution, which was proven to be an attractive alternative to full-fledged simulations. The simulation of $10^{4}$ muons at ground level takes just a few seconds using our implementation of the NKG approach. For $10^{6}$ muons this figure is still around 1 minute. For comparison, a CORSIKA-based simulation performed on the Prometheus supercomputer at the CYFRONET computing center, in which an ensemble of $\sim 100$ showers initiated by a particle of $10^{16}$ eV resulted in $\sim 10^{4}$ muons and $\sim 10^{5}$ electrons, required computation time of the order of a few days.
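Illustrative note (not part of the catalogued record): the description names the NKG distribution as the basis of the fast generator but does not show how positions are drawn from it. The sketch below is one possible way to sample ground positions from an NKG-shaped lateral density, rho(r) ~ (r/r_m)^(s-2) * (1 + r/r_m)^(s-4.5). It is not the authors' implementation. The function name `sample_nkg_positions`, the shower age `s=1.2`, the Moliere radius `r_m=80.0` m, and the vertical shower axis are all assumptions chosen for illustration. It relies on the substitution u = x/(1+x) with x = r/r_m, under which the radial probability density r*rho(r) becomes a Beta(s, 4.5-2s) distribution.

```python
import math
import random


def sample_nkg_positions(n, s=1.2, r_m=80.0, seed=0):
    """Draw n (x, y) ground positions in metres around a vertical shower axis.

    Assumes an NKG-shaped lateral density; s and r_m are placeholder values,
    not parameters taken from the catalogued article.
    """
    if not 0.0 < s < 2.25:
        raise ValueError("shower age s must lie in (0, 2.25)")
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        # With u = x/(1+x) and x = r/r_m, r*rho(r) dr maps to Beta(s, 4.5-2s).
        u = rng.betavariate(s, 4.5 - 2.0 * s)
        # Clamp just below 1 so the inverse map never divides by zero.
        u = min(u, 1.0 - 1e-12)
        r = r_m * u / (1.0 - u)
        # The density is azimuthally symmetric, so the angle is uniform.
        phi = rng.uniform(0.0, 2.0 * math.pi)
        points.append((r * math.cos(phi), r * math.sin(phi)))
    return points


if __name__ == "__main__":
    muons = sample_nkg_positions(10_000)
    print(len(muons), "positions sampled")
```

Because each position costs only one Beta draw and one uniform draw, this kind of sampler avoids tracking individual particle interactions, which is in line with the short run times quoted in the description.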
first_indexed 2024-04-10T09:14:34Z
format Article
id doaj.art-a63a60f715fb48a0aae887256371923d
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-10T09:14:34Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-a63a60f715fb48a0aae887256371923d
2023-02-21T00:01:15Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
vol. 11, pp. 7410-7419
10.1109/ACCESS.2023.3237800
10019257
Fast Training Data Generation for Machine Learning Analysis of Cosmic Ray Showers
Tomasz Hachaj, https://orcid.org/0000-0003-1390-9021, Institute of Computer Science, Pedagogical University of Krakow, Krakow, Poland
Lukasz Bibrzycki, https://orcid.org/0000-0002-6117-4894, Institute of Computer Science, Pedagogical University of Krakow, Krakow, Poland
Marcin Piekarczyk, https://orcid.org/0000-0003-3699-9955, Institute of Computer Science, Pedagogical University of Krakow, Krakow, Poland
https://ieeexplore.ieee.org/document/10019257/
Cosmic ray shower
simulation
data generation
detectors
machine learning
title Fast Training Data Generation for Machine Learning Analysis of Cosmic Ray Showers
topic Cosmic ray shower
simulation
data generation
detectors
machine learning
url https://ieeexplore.ieee.org/document/10019257/