Tissue-specific impact of stem-loops and quadruplexes on cancer breakpoints formation

Abstract Background: Chromosomal rearrangements are typical phenomena in cancer genomes, causing gene disruptions and fusions, corruption of regulatory elements, and damage to chromosome integrity. Among the factors contributing to genomic instability are non-B DNA structures, with stem-loops and quadr...

Bibliographic Details
Main Authors: Kseniia Cheloshkina, Maria Poptsova
Format: Article
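Language: English
Published: BMC 2019-05-01
Series: BMC Cancer
Subjects: Stem-loops; Quadruplexes; Non-B motifs; DNA secondary structures; Cancer genomes; Cancer mutations
Online Access: http://link.springer.com/article/10.1186/s12885-019-5653-x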
author Kseniia Cheloshkina
Maria Poptsova
collection DOAJ
description Abstract
Background: Chromosomal rearrangements are typical phenomena in cancer genomes, causing gene disruptions and fusions, corruption of regulatory elements, and damage to chromosome integrity. Among the factors contributing to genomic instability are non-B DNA structures, with stem-loops and quadruplexes being the most prevalent. We aimed to investigate the impact of these two classes of non-B DNA structures on cancer breakpoint hotspots using a machine learning approach.
Methods: Because the data are extremely imbalanced and a reliable estimate of predictive power was required, we developed a procedure for building and evaluating machine learning models. We built logistic regression models predicting cancer breakpoint hotspots from the densities of stem-loops and quadruplexes, jointly and separately. We also tested Random Forest models under different resampling schemes (leave-one-out cross-validation, train-test split, 3-fold cross-validation) and class balancing techniques (oversampling, stratification, synthetic minority oversampling).
Results: We analyzed 487,425 breakpoints from 2234 samples covering 10 cancer types available from the International Cancer Genome Consortium. We showed that the distributions of breakpoint hotspots in different types of cancer are not correlated, confirming the heterogeneous nature of cancer. The stem-loop-based model best explains the blood, brain, liver, and prostate cancer breakpoint hotspot profiles, while the quadruplex-based model performs better for bone, breast, ovary, pancreatic, and skin cancer. For the overall cancer profile and uterus cancer, the joint model shows the highest performance. For particular datasets the constructed models reach high predictive power using just one predictor, and in the majority of cases the model built on both predictors does not improve performance.
Conclusion: Despite the heterogeneity in the distribution of breakpoint hotspots across cancer types, our results demonstrate an association between cancer breakpoint hotspots and both stem-loops and quadruplexes. For approximately half of the cancer types stem-loops are the most influential factor, while for the others it is quadruplexes. This reflects differences in the regulatory potential of stem-loops and quadruplexes at the tissue-specific level, which are yet to be discovered at the genome-wide scale. The analysis demonstrates that the influence of stem-loops and quadruplexes on breakpoint hotspot formation is tissue-specific.
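The abstract names stratification and oversampling as ways to handle extremely imbalanced hotspot labels under 3-fold cross-validation. The following is a minimal sketch of how those two ideas can fit together, not the authors' pipeline: the function names `stratified_folds` and `oversample_minority`, the synthetic labels, and the choice to oversample only the training portion of each split are all illustrative assumptions.

```python
import random
from collections import Counter


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds so each fold keeps a similar class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


def oversample_minority(indices, labels, seed=0):
    """Duplicate random minority-class indices until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for idx in indices:
        by_class.setdefault(labels[idx], []).append(idx)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for y in sorted(by_class):
        idxs = by_class[y]
        balanced.extend(idxs)
        balanced.extend(rng.choice(idxs) for _ in range(target - len(idxs)))
    return balanced


# Hypothetical data: 1 = hotspot window (rare), 0 = any other genomic window.
labels = [1] * 6 + [0] * 294
folds = stratified_folds(labels, k=3)

for i, test_idx in enumerate(folds):
    # Training set = all folds except the held-out one.
    train_idx = [j for f in folds if f is not test_idx for j in f]
    # Balance the training set only; the test fold keeps its natural ratio.
    train_bal = oversample_minority(train_idx, labels)
    print(i, Counter(labels[j] for j in train_bal),
          Counter(labels[j] for j in test_idx))
```

Oversampling after the split, rather than before it, keeps duplicated minority examples out of the held-out fold, so the evaluation is not inflated by copies of training points.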
first_indexed 2024-12-12T22:48:07Z
format Article
id doaj.art-abe76469e3f54e4eb37849905021798d
institution Directory of Open Access Journals
issn 1471-2407
language English
last_indexed 2024-12-12T22:48:07Z
publishDate 2019-05-01
publisher BMC
record_format Article
series BMC Cancer
affiliation Faculty of Computer Science, National Research University Higher School of Economics (both authors)
doi 10.1186/s12885-019-5653-x
topic Stem-loops
Quadruplexes
Non-B motifs
DNA secondary structures
Cancer genomes
Cancer mutations
url http://link.springer.com/article/10.1186/s12885-019-5653-x