Machine learning predicts translation initiation sites in neurologic diseases with nucleotide repeat expansions.

A number of neurologic diseases associated with expanded nucleotide repeats, including an inherited form of amyotrophic lateral sclerosis, have an unconventional form of translation called repeat-associated non-AUG (RAN) translation. It has been speculated that the repeat regions in the RNA fold int...

Bibliographic Details
Main Authors:	Alec C Gleason, Ghanashyam Ghadge, Jin Chen, Yoshifumi Sonobe, Raymond P Roos
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2022-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0256411

collection	DOAJ
description	A number of neurologic diseases associated with expanded nucleotide repeats, including an inherited form of amyotrophic lateral sclerosis, have an unconventional form of translation called repeat-associated non-AUG (RAN) translation. It has been speculated that the repeat regions in the RNA fold into secondary structures in a length-dependent manner, promoting RAN translation. Repeat protein products are translated, accumulate, and may contribute to disease pathogenesis. Nucleotides that flank the repeat region, especially ones closest to the initiation site, are believed to enhance translation initiation. A machine learning model has been published to help identify ATG and near-cognate translation initiation sites; however, this model has diminished predictive power due to its extensive feature selection and limited training data. Here, we overcome these limitations and increase prediction accuracy by: a) capturing the effect of nucleotides most critical for translation initiation via feature reduction, b) implementing an alternative machine learning algorithm better suited for limited data, c) building comprehensive and balanced training data (via sampling without replacement) that include previously unavailable sequences, and d) splitting ATG and near-cognate translation initiation codon data to train two separate models. We also design a supplementary scoring system to provide an additional prognostic assessment of model predictions. The resultant models have high performance, with ~85-88% accuracy, exceeding that of the previously published model by >18%. The models presented here are used to identify translation initiation sites in genes associated with a number of neurologic repeat expansion disorders. The results confirm a number of sites of translation initiation upstream of the expanded repeats that have been found experimentally, and predict sites that are not yet established.
first_indexed	2024-04-13T04:13:44Z
format	Article
id	doaj.art-d6fa9877712d48a4a070728a9c9742da
institution	Directory Open Access Journal
issn	1932-6203
language	English
last_indexed	2024-04-13T04:13:44Z
publishDate	2022-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
title	Machine learning predicts translation initiation sites in neurologic diseases with nucleotide repeat expansions.
url	https://doi.org/10.1371/journal.pone.0256411
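Illustrative note: two of the data-preparation steps named in the description (splitting ATG from near-cognate start codons, and balancing training data by sampling without replacement) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `(sequence, codon, label)` tuple layout, and the toy records are all hypothetical, and a near-cognate codon is taken here to be one that differs from ATG at exactly one position.

```python
import random

def is_near_cognate(codon):
    """True if the codon differs from ATG at exactly one position (assumed definition)."""
    return len(codon) == 3 and sum(a != b for a, b in zip(codon, "ATG")) == 1

def split_by_start_codon(examples):
    """Partition hypothetical (sequence, codon, label) tuples into ATG and near-cognate sets."""
    atg = [e for e in examples if e[1] == "ATG"]
    near = [e for e in examples if is_near_cognate(e[1])]
    return atg, near

def balance(examples, seed=0):
    """Downsample the larger class so positives and negatives are equal in number.

    random.Random.sample draws without replacement, so no example is repeated.
    """
    pos = [e for e in examples if e[2] == 1]
    neg = [e for e in examples if e[2] == 0]
    n = min(len(pos), len(neg))
    rng = random.Random(seed)
    return rng.sample(pos, n) + rng.sample(neg, n)

# Toy records, invented purely for illustration (label 1 = initiation site).
examples = [
    ("s1", "ATG", 1), ("s2", "ATG", 0), ("s3", "ATG", 0),
    ("s4", "CTG", 1), ("s5", "GTG", 1), ("s6", "CTG", 0),
    ("s7", "AAA", 1),  # neither ATG nor near-cognate, so it is dropped
]
atg_set, near_set = split_by_start_codon(examples)
atg_train = balance(atg_set)
near_train = balance(near_set)
```

Each balanced set would then be used to train its own model; the choice of learning algorithm and features is left open here because the record does not specify them.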


Similar Items