Deep polygenic neural network for predicting and identifying yield-associated genes in Indonesian rice accessions

Abstract As the fourth most populous country in the world, Indonesia must increase the annual rice production rate to achieve national food security by 2050. One possible solution comes from the nanoscopic level: a genetic variant called Single Nucleotide Polymorphism (SNP), which can express signif...


Bibliographic Details
Main Authors: Nicholas Dominic, Tjeng Wawan Cenggoro, Arif Budiarto, Bens Pardamean
Format: Article
Language: English
Published: Nature Portfolio 2022-08-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-022-16075-9
author Nicholas Dominic
Tjeng Wawan Cenggoro
Arif Budiarto
Bens Pardamean
collection DOAJ
description Abstract As the fourth most populous country in the world, Indonesia must increase its annual rice production rate to achieve national food security by 2050. One possible solution comes from the nanoscopic level: a genetic variant called a Single Nucleotide Polymorphism (SNP), which can express significant yield-associated genes. The prior benchmark for this study used a statistical genetics model that involved neither SNP position information nor an attention mechanism. Hence, we developed a novel deep polygenic neural network, named the NucleoNet model, to address these limitations. The NucleoNets were constructed from a combination of prominent components: positional SNP encoding, the context vector, wide models, Elastic Net, and Shannon’s entropy loss. This polygenic modeling obtained a Mean Squared Error (MSE) of up to 2.779 with a Symmetric Mean Absolute Percentage Error (SMAPE) of 47.156%, while revealing 15 new important SNPs. Furthermore, the NucleoNets reduced the MSE score by up to 32.28% compared to the Ordinary Least Squares (OLS) model. Through the ablation study, we learned that combining Xavier initialization for the weights with Normal-distribution initialization for the biases surfaced a more varied set of important SNPs across 12 chromosomes. Our findings confirmed that the NucleoNet model successfully outperformed the OLS model and identified SNPs important to Indonesian rice yields.
first_indexed 2024-04-13T02:18:09Z
format Article
id doaj.art-74aec2005e394d31bc3f88f22147aee6
institution Directory of Open Access Journals
issn 2045-2322
language English
last_indexed 2024-04-13T02:18:09Z
publishDate 2022-08-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-74aec2005e394d31bc3f88f22147aee6; 2022-12-22T03:07:04Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2022-08-01; 12; 1; 1; 16; 10.1038/s41598-022-16075-9; Deep polygenic neural network for predicting and identifying yield-associated genes in Indonesian rice accessions; Nicholas Dominic (BINUS Graduate Program, Bina Nusantara University); Tjeng Wawan Cenggoro (School of Computer Science, Bina Nusantara University); Arif Budiarto (School of Computer Science, Bina Nusantara University); Bens Pardamean (BINUS Graduate Program, Bina Nusantara University); https://doi.org/10.1038/s41598-022-16075-9
title Deep polygenic neural network for predicting and identifying yield-associated genes in Indonesian rice accessions
url https://doi.org/10.1038/s41598-022-16075-9