SRBreak: A read-depth and split-read framework to identify breakpoints of different events inside simple copy-number variable regions


Bibliographic Details
Main Authors: HOANG T NGUYEN, James Boocock, Tony R Merriman, Mik A Black
Format: Article
Language: English
Published: Frontiers Media S.A., 2016-09-01
Series: Frontiers in Genetics
Online Access: http://journal.frontiersin.org/Journal/10.3389/fgene.2016.00160/full
collection DOAJ
description Copy-number variation (CNV) has been associated with increased risk of complex diseases. High-throughput sequencing (HTS) technologies facilitate the detection of copy-number variable regions (CNVRs) and their breakpoints, which helps in understanding the structure of genomes as well as their evolution. Various approaches have been proposed for detecting CNV breakpoints, but it remains challenging for tools based on a single analysis method to identify them. It has been shown, however, that pipelines which integrate multiple approaches report more reliable breakpoints. Here, based on HTS data, we have developed a pipeline to identify approximate breakpoints (±10 bp) relating to different ancestral events within a specific CNVR. The pipeline combines read-depth and split-read information to infer breakpoints, using information from multiple samples to allow an imputation approach to be taken. The main steps are: using a normal mixture model to cluster samples into different groups, applying simple kernel-based approaches to maximise the information obtained from read-depth and split-read data, and then inferring the common breakpoints of each group. The pipeline takes split-read information directly from the CIGAR strings of BAM files, without a re-alignment step. On simulated data sets, it was able to report breakpoints for very low-coverage samples, including those for which only single-end reads were available. When applied to three loci from existing human resequencing data sets (NEGR1, LCE3, IRGM), the pipeline showed good concordance with results from the 1000 Genomes Project (92%, 100% and 82%, respectively). The package is available at https://github.com/hoangtn/SRBreak, and also as a docker-based application at https://registry.hub.docker.com/u/hoangtn/srbreak/.
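The abstract names three computational ingredients: a normal mixture model that groups samples by read depth, simple kernel-based smoothing of breakpoint evidence, and split-read evidence taken directly from BAM CIGAR strings. The Python sketch below illustrates each idea in isolation using only the standard library. It is not SRBreak's implementation (that lives in the GitHub repository). The function names `clip_positions`, `kernel_peak` and `gaussian_mixture_1d`, the triangular kernel, the 10 bp bandwidth and the quantile-based EM initialisation are all hypothetical choices made for illustration, and the sketch assumes per-sample read depth has already been normalised and that soft clips stand in for split reads.

```python
import math
import re
import statistics
from collections import Counter

# One CIGAR operation = a length followed by an op code (SAM specification).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Ops that consume reference bases: M, D, N, =, X.
REF_CONSUMING = set("MDN=X")


def clip_positions(pos, cigar):
    """Hypothetical helper: candidate breakpoints from soft clips in one read.

    pos is the 1-based leftmost aligned reference position (SAM POS).
    A leading soft clip gives a candidate at pos; a trailing soft clip
    gives one at the first reference base after the aligned block.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    ops = [(n, op) for n, op in ops if op != "H"]  # hard clips carry no bases
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    points = []
    if ops and ops[0][1] == "S":
        points.append(pos)
    if ops and ops[-1][1] == "S":
        points.append(pos + ref_len)
    return points


def kernel_peak(points, bandwidth=10):
    """Hypothetical helper: position with the largest triangular-kernel support."""
    counts = Counter(points)
    best, best_score = None, -1.0
    for centre in sorted(counts):
        score = sum(c * max(0.0, 1.0 - abs(p - centre) / bandwidth)
                    for p, c in counts.items())
        if score > best_score:
            best, best_score = centre, score
    return best


def gaussian_mixture_1d(x, k, iters=200):
    """Hypothetical helper: fit a k-component 1-D normal mixture by EM.

    Returns (labels, means) with components ordered by increasing mean,
    so label 0 is the lowest read-depth group.
    """
    n = len(x)
    xs = sorted(x)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # quantile init
    var = [max(statistics.pvariance(x), 1e-6)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for xi in x:
            logs = [math.log(w) - (xi - m) ** 2 / (2 * v)
                    - 0.5 * math.log(2 * math.pi * v)
                    for w, m, v in zip(weights, means, var)]
            top = max(logs)
            e = [math.exp(lg - top) for lg in logs]
            s = sum(e)
            resp.append([ei / s for ei in e])
        # M-step: update mixing weights, means and variances.
        for j in range(k):
            r = [row[j] for row in resp]
            nj = max(sum(r), 1e-12)
            weights[j] = nj / n
            means[j] = sum(ri * xi for ri, xi in zip(r, x)) / nj
            var[j] = max(sum(ri * (xi - means[j]) ** 2
                             for ri, xi in zip(r, x)) / nj, 1e-6)
    # Relabel components so that labels follow increasing mean depth.
    order = sorted(range(k), key=lambda j: means[j])
    rank = {j: i for i, j in enumerate(order)}
    labels = [rank[max(range(k), key=lambda j: row[j])] for row in resp]
    return labels, [means[j] for j in order]
```

For example, `gaussian_mixture_1d([0.02, 0.05, 0.48, 0.52, 0.50, 1.01, 0.98, 1.03], k=3)` should place the first two samples in group 0, the next three in group 1 and the last three in group 2. Soft-clip positions from all reads in one group could then be pooled and passed to `kernel_peak`. Pooling across samples is one plausible way to extract a breakpoint from low-coverage data, in the spirit of the multi-sample approach the abstract describes.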
institution Directory of Open Access Journals
issn 1664-8021
volume 7
doi 10.3389/fgene.2016.00160
affiliation Otago University
Virtual Institute of Statistical Genetics, New Zealand
Mount Sinai School of Medicine
Cao Thang College of Technology
topic Copy number variant (CNV)
structural variation (SV)
read depth
breakpoint cluster region
split read