GASOLINE: detecting germline and somatic structural variants from long-reads data

Abstract Long-read sequencing allows analyses of single nucleic-acid molecules and produces sequences in the order of tens to hundreds kilobases. Its application to whole-genome analyses allows identification of complex genomic structural-variants (SVs) with unprecedented resolution. SV identificati...

Bibliographic Details
Main Authors: Alberto Magi, Gianluca Mattei, Alessandra Mingrino, Chiara Caprioli, Chiara Ronchini, Gianmaria Frigè, Roberto Semeraro, Marta Baragli, Davide Bolognini, Emanuela Colombo, Luca Mazzarella, Pier Giuseppe Pelicci
Format: Article
Language: English
Published: Nature Portfolio 2023-11-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-48285-0
collection DOAJ
description Abstract Long-read sequencing allows analyses of single nucleic-acid molecules and produces sequences on the order of tens to hundreds of kilobases. Its application to whole-genome analyses allows identification of complex genomic structural variants (SVs) with unprecedented resolution. SV identification, however, requires complex computational methods, based on either read-depth or intra- and inter-alignment signature approaches, which are limited by the size or type of SVs they can detect. Moreover, most currently available tools only detect germline variants, thus requiring separate computation of sample pairs for comparative analyses. To overcome these limits, we developed a novel tool (Germline And SOmatic structuraL varIants detectioN and gEnotyping; GASOLINE) that groups SV signatures using a sophisticated clustering procedure based on a modified reciprocal overlap criterion, and is designed to identify germline SVs from single samples and somatic SVs from paired test and control samples. GASOLINE is a collection of Perl, R and Fortran code; it analyzes aligned data in BAM format and produces VCF files with statistically significant somatic SVs. Germline or somatic analysis of 30× sequencing coverage experiments requires 4–5 h with 20 threads. GASOLINE outperformed currently available methods in the detection of both germline and somatic SVs in synthetic and real long-read datasets. Notably, when applied to a paired metastatic melanoma and matched-normal sample, GASOLINE identified five genuine somatic SVs that were missed using five different sequencing technologies and state-of-the-art SV calling approaches. Thus, GASOLINE identifies germline and somatic SVs with unprecedented accuracy and resolution, outperforming currently available state-of-the-art long-read WGS computational methods.
first_indexed 2024-03-08T22:40:36Z
format Article
id doaj.art-9b134809fa3345dab06c684dc490b0e0
institution Directory Open Access Journal
issn 2045-2322
language English
last_indexed 2024-03-08T22:40:36Z
publishDate 2023-11-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-9b134809fa3345dab06c684dc490b0e0 2023-12-17T12:13:05Z eng
Nature Portfolio, Scientific Reports, ISSN 2045-2322, 2023-11-01, vol. 13, no. 1, pp. 1-11, 10.1038/s41598-023-48285-0
GASOLINE: detecting germline and somatic structural variants from long-reads data
Alberto Magi, Department of Information Engineering, University of Florence
Gianluca Mattei, Department of Information Engineering, University of Florence
Alessandra Mingrino, Department of Experimental and Clinical Medicine, University of Florence
Chiara Caprioli, Department of Experimental Oncology, IEO European Institute of Oncology IRCCS
Chiara Ronchini, Department of Experimental Oncology, IEO European Institute of Oncology IRCCS
Gianmaria Frigè, Department of Experimental Oncology, IEO European Institute of Oncology IRCCS
Roberto Semeraro, Department of Experimental and Clinical Medicine, University of Florence
Marta Baragli, Department of Information Engineering, University of Florence
Davide Bolognini, Department of Experimental and Clinical Medicine, University of Florence
Emanuela Colombo, Department of Experimental Oncology, IEO European Institute of Oncology IRCCS
Luca Mazzarella, Department of Experimental Oncology, IEO European Institute of Oncology IRCCS
Pier Giuseppe Pelicci, Department of Experimental Oncology, IEO European Institute of Oncology IRCCS
https://doi.org/10.1038/s41598-023-48285-0
title GASOLINE: detecting germline and somatic structural variants from long-reads data
url https://doi.org/10.1038/s41598-023-48285-0