Deep learning of human polyadenylation sites at nucleotide resolution reveals molecular determinants of site usage and relevance in disease

Bibliographic Details
Main Authors: Emily Kunce Stroup, Zhe Ji
Format: Article
Language: English
Published: Nature Portfolio 2023-11-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-023-43266-3
Collection: DOAJ
Abstract: The genomic distribution of cleavage and polyadenylation (polyA) sites should be co-evolutionally optimized with the local gene structure. Otherwise, spurious polyadenylation can cause premature transcription termination and generate aberrant proteins. To obtain mechanistic insights into polyA site optimization across the human genome, we develop deep/machine learning models to identify genome-wide putative polyA sites at unprecedented nucleotide-level resolution and calculate their strength and usage in the genomic context. Our models quantitatively measure position-specific motif importance and their crosstalk in polyA site formation and cleavage heterogeneity. The intronic site expression is governed by the surrounding splicing landscape. The usage of alternative polyA sites in terminal exons is modulated by their relative locations and distance to downstream genes. Finally, we apply our models to reveal thousands of disease- and trait-associated genetic variants altering polyadenylation activity. Altogether, our models represent a valuable resource to dissect molecular mechanisms mediating genome-wide polyA site expression and characterize their functional roles in human diseases.
ISSN: 2041-1723
Affiliations: Emily Kunce Stroup and Zhe Ji, Department of Pharmacology, Feinberg School of Medicine, Northwestern University