Machine learning model for sequence-driven DNA G-quadruplex formation

Abstract: We describe a sequence-based computational model to predict DNA G-quadruplex (G4) formation. The model was developed using large-scale machine learning from an extensive experimental G4-formation dataset, recently obtained for the human genome via G4-seq methodology. Our model differentiate...


Bibliographic Details
Main Authors: Aleksandr B. Sahakyan, Vicki S. Chambers, Giovanni Marsico, Tobias Santner, Marco Di Antonio, Shankar Balasubramanian
Format: Article
Language: English
Published: Nature Portfolio 2017-11-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-017-14017-4
Description: We describe a sequence-based computational model to predict DNA G-quadruplex (G4) formation. The model was developed using large-scale machine learning from an extensive experimental G4-formation dataset, recently obtained for the human genome via G4-seq methodology. Our model differentiates many widely accepted putative quadruplex sequences that do not actually form stable genomic G4 structures, correctly assessing the G4 folding potential of over 700,000 such sequences in the human genome. Moreover, our approach reveals the relative importance of sequence-based features coming from both within the G4 motifs and their flanking regions. The developed model can be applied to any DNA sequence or genome to characterise sequence-driven intramolecular G4 formation propensities.
ISSN: 2045-2322
Institution: Directory Open Access Journal
Author Affiliations: Department of Chemistry, University of Cambridge, Lensfield Road (all authors)