Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach

Machine learning and modeling approaches have been used to classify protein sequences for a broad set of tasks including predicting protein function, structure, expression, and localization. Some recent studies have successfully predicted whether a given gene is expressed as mRNA or even translated...


Bibliographic Details
Main Authors: Kyoung Tak Cho, Taner Z. Sen, Carson M. Andorf
Format: Article
Language: English
Published: Frontiers Media S.A. 2022-05-01
Series: Frontiers in Artificial Intelligence
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/frai.2022.830170/full
collection DOAJ
description Machine learning and modeling approaches have been used to classify protein sequences for a broad set of tasks, including predicting protein function, structure, expression, and localization. Some recent studies have successfully predicted whether a given gene is expressed as mRNA, or even whether it is translated into protein. However, not all genes are expressed in every condition and tissue, so predicting condition-specific expression remains a challenge. To address this gap, we developed a machine learning approach to predict tissue-specific gene expression across 23 different tissues in maize, based solely on DNA promoter and protein sequences. For class labels, we defined high and low expression levels for mRNA and protein abundance, and we optimized classifiers by systematically exploring various methods and combinations of k-mer sequences in a two-phase approach. In the first phase, we developed Markov model classifiers for each tissue and built a feature vector from their predictions. In the second phase, the feature vector was used as input to a Bayesian network for final classification. Our results show that these methods can achieve classification accuracy of up to 95% when predicting gene expression for individual tissues. By relying on sequence alone, our method works in settings where costly experimental data are unavailable and reveals useful insights into the functional, evolutionary, and regulatory characteristics of genes.
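The first phase described above (one Markov model classifier per tissue, whose predictions are collected into a feature vector) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the class and function names, the model order, the add-one smoothing, the tissue names, and the toy sequences are all assumptions made for the example. The second-phase Bayesian network is not shown.

```python
import math
from collections import defaultdict


class MarkovModel:
    """Order-k Markov chain over a fixed alphabet (hypothetical helper).

    Add-one smoothing is an assumption chosen to keep the sketch simple.
    """

    def __init__(self, order, alphabet):
        self.order = order
        self.alphabet = alphabet
        # counts[context][next_symbol] -> number of times seen in training
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for i in range(self.order, len(seq)):
                context = seq[i - self.order:i]
                self.counts[context][seq[i]] += 1
        return self

    def log_likelihood(self, seq):
        total = 0.0
        for i in range(self.order, len(seq)):
            context = seq[i - self.order:i]
            ctx_counts = self.counts.get(context, {})
            numerator = ctx_counts.get(seq[i], 0) + 1
            denominator = sum(ctx_counts.values()) + len(self.alphabet)
            total += math.log(numerator / denominator)
        return total


def train_tissue_classifier(high_seqs, low_seqs, order=2, alphabet="ACGT"):
    """Return a scorer: positive means the 'high' model fits better."""
    high = MarkovModel(order, alphabet).fit(high_seqs)
    low = MarkovModel(order, alphabet).fit(low_seqs)

    def score(seq):
        return high.log_likelihood(seq) - low.log_likelihood(seq)

    return score


def feature_vector(seq, classifiers):
    """One 0/1 prediction per tissue, in sorted tissue-name order."""
    return [1 if classifiers[t](seq) > 0 else 0 for t in sorted(classifiers)]


# Toy data, invented for illustration (not maize sequences).
classifiers = {
    "root": train_tissue_classifier(
        high_seqs=["ACACACACAC", "CACACACA"],
        low_seqs=["GTGTGTGTGT", "TGTGTGTG"],
    ),
    "leaf": train_tissue_classifier(
        high_seqs=["GTGTGTGT"],
        low_seqs=["ACACACAC"],
    ),
}

vector = feature_vector("ACACAC", classifiers)  # order: leaf, root
```

In the approach the abstract describes, a vector like `vector` would then be passed to a second-phase classifier; for protein sequences the `alphabet` argument would be the amino-acid alphabet rather than "ACGT".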
first_indexed 2024-04-13T21:06:20Z
id doaj.art-09766cf15a494c589ee629b81dfd6a33
institution Directory Open Access Journal
issn 2624-8212
doi 10.3389/frai.2022.830170
volume 5
article_number 830170
affiliation Kyoung Tak Cho: Department of Computer Science, Iowa State University, Ames, IA, United States
Taner Z. Sen: USDA-ARS, Crop Improvement and Genetics Research Unit, Albany, CA, United States
Carson M. Andorf: USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA, United States
topic maize genetics
gene expression
protein abundance
mRNA abundance
machine learning