Deep learning of human polyadenylation sites at nucleotide resolution reveals molecular determinants of site usage and relevance in disease
Main Authors: | Emily Kunce Stroup, Zhe Ji |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-11-01 |
Series: | Nature Communications |
Online Access: | https://doi.org/10.1038/s41467-023-43266-3 |
author | Emily Kunce Stroup Zhe Ji |
collection | DOAJ |
description | The genomic distribution of cleavage and polyadenylation (polyA) sites should be co-evolutionarily optimized with the local gene structure. Otherwise, spurious polyadenylation can cause premature transcription termination and generate aberrant proteins. To obtain mechanistic insights into polyA site optimization across the human genome, we develop deep/machine learning models to identify genome-wide putative polyA sites at unprecedented nucleotide-level resolution and calculate their strength and usage in the genomic context. Our models quantitatively measure position-specific motif importance and the crosstalk between motifs in polyA site formation and cleavage heterogeneity. Intronic site expression is governed by the surrounding splicing landscape. The usage of alternative polyA sites in terminal exons is modulated by their relative locations and distance to downstream genes. Finally, we apply our models to reveal thousands of disease- and trait-associated genetic variants altering polyadenylation activity. Altogether, our models represent a valuable resource to dissect molecular mechanisms mediating genome-wide polyA site expression and characterize their functional roles in human diseases. |
institution | Directory of Open Access Journals |
issn | 2041-1723 |
affiliation | Department of Pharmacology, Feinberg School of Medicine, Northwestern University (both authors) |
volume | 14 |