Towards the genomic sequence code of DNA fragility for machine learning
Genomic DNA breakages and the subsequent insertion and deletion mutations are important contributors to genome instability and linked diseases. Unlike the research in point mutations, the relationship between DNA sequence context and the propensity for strand breaks remains elusive. Here, by analyzi...
Main Authors: | Pflughaupt, P; Abdullah, A; Masuda, K; Sahakyan, A |
---|---|
Format: | Journal article |
Language: | English |
Published: | Oxford University Press, 2024 |
author | Pflughaupt, P; Abdullah, A; Masuda, K; Sahakyan, A |
---|---|
collection | OXFORD |
description | Genomic DNA breakages and the subsequent insertion and deletion mutations are important contributors to genome instability and linked diseases. Unlike the research in point mutations, the relationship between DNA sequence context and the propensity for strand breaks remains elusive. Here, by analyzing the differences and commonalities across myriads of genomic breakage datasets, we extract the sequence-linked rules and patterns behind DNA fragility. We show the overall deconvolution of the sequence influence into short-, mid- and long-range effects, and the stressor-dependent differences in defining the range and compositional effects on DNA fragility. We summarize and release our feature compendium as a library that can be seamlessly incorporated into genomic machine learning procedures, where DNA fragility is of concern, and train a generalized DNA fragility model on cancer-associated breakages. Structural variants (SVs) tend to stabilize regions in which they emerge, with the effect most pronounced for pathogenic SVs. In contrast, the effects of chromothripsis are seen across regions less prone to breakages. We find that viral integration may bring genome fragility, particularly for cancer-associated viruses. Overall, this work offers novel insights into the genomic sequence basis of DNA fragility and presents a powerful machine learning resource to further enhance our understanding of genome (in)stability and evolution. |
format | Journal article |
id | oxford-uuid:f8a10a7f-e33a-49e7-89c9-d831d0138e3a |
institution | University of Oxford |
language | English |
publishDate | 2024 |
publisher | Oxford University Press |
title | Towards the genomic sequence code of DNA fragility for machine learning |