Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data.


Bibliographic Details
Main Authors: Bo-Young Kim, Jung Hoon Park, Hye-Yeong Jo, Soo Kyung Koo, Mi-Hyun Park
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC5549930?pdf=render
Description: Insertion and deletion (INDEL) mutations, the most common type of structural variation, are associated with several human diseases. The detection of INDELs through next-generation sequencing (NGS) is becoming more common due to decreasing costs, increasing efficiency, and the sensitivity improvements demonstrated by various sequencing platforms and analytical tools. However, many errors are still associated with INDEL variant calling, and distinguishing INDELs from errors in NGS remains challenging. To evaluate INDEL calling from whole-exome sequencing (WES) data, we performed Sanger sequencing for all INDELs called by several calling algorithms. We compared the performance of four algorithms (i.e., GATK, SAMtools, Dindel, and Freebayes) for INDEL detection from the same sample. We examined the sensitivity and positive predictive value (PPV) of GATK (90.2% and 89.5%, respectively), SAMtools (75.3% and 94.4%, respectively), Dindel (90.1% and 88.6%, respectively), and Freebayes (80.1% and 94.4%, respectively). GATK had the highest sensitivity. Furthermore, we identified INDELs with high PPV (intersection of 4 algorithms: 98.7%; intersection of 3 algorithms: 97.6%; intersection of GATK and SAMtools: 97.6%). We present two key sources of difficulty in accurate INDEL detection: 1) the presence of repeats, and 2) heterozygous INDELs. We suggest accessible algorithms that selectively reduce error rates and thereby facilitate INDEL detection. Our study may also serve as a basis for understanding the accuracy and completeness of INDEL detection.
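The sensitivity and PPV figures in the description, and the effect of intersecting call sets, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Indel` type, the `evaluate` helper, and the toy variants are hypothetical, and the set of validated INDELs stands in for the Sanger-confirmed calls. It assumes sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP).

```python
from typing import NamedTuple, Set, Tuple


class Indel(NamedTuple):
    """Hypothetical minimal INDEL key: position plus alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def evaluate(calls: Set[Indel], validated: Set[Indel]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) of a call set against validated INDELs."""
    tp = len(calls & validated)   # called and confirmed
    fp = len(calls - validated)   # called but not confirmed
    fn = len(validated - calls)   # confirmed but missed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy data (invented for illustration, not from the study).
a = Indel("chr1", 100, "AT", "A")
b = Indel("chr1", 250, "G", "GCC")
c = Indel("chr2", 40, "CAG", "C")
d = Indel("chr3", 75, "T", "TA")
e = Indel("chr4", 10, "GA", "G")   # false positive of caller X
f = Indel("chr5", 90, "C", "CT")   # false positive of caller Y

validated = {a, b, c, d}
caller_x = {a, b, c, e}
caller_y = {a, b, f}

sens_x, ppv_x = evaluate(caller_x, validated)
# Keeping only INDELs reported by both callers trades sensitivity for PPV.
sens_xy, ppv_xy = evaluate(caller_x & caller_y, validated)

print(f"caller X:     sensitivity={sens_x:.2f}, PPV={ppv_x:.2f}")
print(f"intersection: sensitivity={sens_xy:.2f}, PPV={ppv_xy:.2f}")
```

In this toy case the intersection removes both false positives but also drops true INDELs found by only one caller, which mirrors the trade-off behind the high-PPV intersection sets reported above.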
ISSN: 1932-6203
Citation: PLoS ONE 12(8): e0182272
DOI: 10.1371/journal.pone.0182272