BreakAlign: a Perl program to align chimaeric (split) genomic NGS reads and allow visual confirmation of novel retroviral integrations

Abstract Background Retroviruses replicate by integrating a DNA copy into a host chromosome. Detecting novel retroviral integrations (ones not in the reference genome sequence of the host) from genomic NGS data is bioinformatically challenging and frequently produces many false positives. One common...


Bibliographic Details
Main Authors: Emanuele Marchi, Mathew Jones, Paul Klenerman, John Frater, Gkikas Magiorkinis, Robert Belshaw
Format: Article
Language: English
Published: BMC 2022-04-01
Series: BMC Bioinformatics
Subjects: NGS, Retrovirus, Provirus, Integration, Insertion, Detection
Online Access: https://doi.org/10.1186/s12859-022-04621-1
author Emanuele Marchi
Mathew Jones
Paul Klenerman
John Frater
Gkikas Magiorkinis
Robert Belshaw
collection DOAJ
description Abstract
Background: Retroviruses replicate by integrating a DNA copy into a host chromosome. Detecting novel retroviral integrations (ones not in the reference genome sequence of the host) from genomic NGS data is bioinformatically challenging and frequently produces many false positives. One common method of confirmation is visual inspection of an alignment of the chimaeric (split) reads that span a putative novel retroviral integration site. We perceived the need for a program that would facilitate this by producing a multiple alignment containing both the viral and host regions that flank an integration.
Results: BreakAlign is a Perl program that uses blastn to produce such a multiple alignment. In addition to the NGS dataset and a reference viral sequence, the program requires either (a) the ~500 nt host genome sequence that spans the putative integration or (b) coordinates of this putative integration in an installed copy of the reference human genome (multiple integrations can be processed automatically). BreakAlign is freely available from https://github.com/marchiem/breakalign and is accompanied by example files allowing a test run.
Conclusion: BreakAlign will confirm and facilitate characterisation of both (a) germline integrations of endogenous retroviruses and (b) somatic integrations of exogenous retroviruses such as HIV and HTLV. Although developed for use with genomic short-read NGS (second generation) data and retroviruses, it should also be useful for long-read (third generation) data and any mobile element with at least one conserved flanking region.
first_indexed 2024-04-14T08:28:33Z
format Article
id doaj.art-f93c3eb8817c4d83a9e60b1652121508
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-14T08:28:33Z
publishDate 2022-04-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-f93c3eb8817c4d83a9e60b1652121508 2022-12-22T02:03:58Z eng BMC BMC Bioinformatics 1471-2105 2022-04-01 23 1 1 8 10.1186/s12859-022-04621-1
Emanuele Marchi (Nuffield Department of Medicine, University of Oxford)
Mathew Jones (Nuffield Department of Medicine, University of Oxford)
Paul Klenerman (Nuffield Department of Medicine, University of Oxford)
John Frater (Nuffield Department of Medicine, University of Oxford)
Gkikas Magiorkinis (Department of Hygiene, Epidemiology and Medical Statistics, Medical School, National and Kapodistrian University of Athens)
Robert Belshaw (Department of Biology, College of Science and Technology, Wenzhou-Kean University)
title BreakAlign: a Perl program to align chimaeric (split) genomic NGS reads and allow visual confirmation of novel retroviral integrations
topic NGS
Retrovirus
Provirus
Integration
Insertion
Detection
url https://doi.org/10.1186/s12859-022-04621-1