Improving PacBio long read accuracy by short read alignment.
The recent development of third generation sequencing (TGS) generates much longer reads than second generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback in most TGS techn...
Main Authors: | Kin Fai Au, Jason G Underwood, Lawrence Lee, Wing Hung Wong |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS) 2012-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC3464235?pdf=render |
_version_ | 1818113085622190080 |
---|---|
author | Kin Fai Au Jason G Underwood Lawrence Lee Wing Hung Wong |
author_facet | Kin Fai Au Jason G Underwood Lawrence Lee Wing Hung Wong |
author_sort | Kin Fai Au |
collection | DOAJ |
description | The recent development of third generation sequencing (TGS) generates much longer reads than second generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback in most TGS technologies. Here we present a computational method, LSC, to perform error correction of TGS long reads (LR) by SGS short reads (SR). Aiming to reduce the error rate in homopolymer runs in the main TGS platform, the PacBio® RS, LSC applies a homopolymer compression (HC) transformation strategy to increase the sensitivity of SR-LR alignment without sacrificing alignment accuracy. We applied LSC to 100,000 PacBio long reads from human brain cerebellum RNA-seq data and 64 million single-end 75 bp reads from human brain RNA-seq data. The results show that LSC can correct PacBio long reads and reduce the error rate more than threefold. The improved accuracy greatly benefits many downstream analyses, such as directional gene isoform detection in RNA-seq studies. Compared with another hybrid correction tool, LSC achieves over double the sensitivity with similar specificity. |
first_indexed | 2024-12-11T03:29:14Z |
format | Article |
id | doaj.art-d10b2cc84a8f4a56acf073f5adb524ba |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
last_indexed | 2024-12-11T03:29:14Z |
publishDate | 2012-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-d10b2cc84a8f4a56acf073f5adb524ba; 2022-12-22T01:22:25Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2012-01-01; 7; 10; e46679; 10.1371/journal.pone.0046679; Improving PacBio long read accuracy by short read alignment.; Kin Fai Au, Jason G Underwood, Lawrence Lee, Wing Hung Wong; The recent development of third generation sequencing (TGS) generates much longer reads than second generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback in most TGS technologies. Here we present a computational method, LSC, to perform error correction of TGS long reads (LR) by SGS short reads (SR). Aiming to reduce the error rate in homopolymer runs in the main TGS platform, the PacBio® RS, LSC applies a homopolymer compression (HC) transformation strategy to increase the sensitivity of SR-LR alignment without sacrificing alignment accuracy. We applied LSC to 100,000 PacBio long reads from human brain cerebellum RNA-seq data and 64 million single-end 75 bp reads from human brain RNA-seq data. The results show that LSC can correct PacBio long reads and reduce the error rate more than threefold. The improved accuracy greatly benefits many downstream analyses, such as directional gene isoform detection in RNA-seq studies. Compared with another hybrid correction tool, LSC achieves over double the sensitivity with similar specificity.; http://europepmc.org/articles/PMC3464235?pdf=render |
spellingShingle | Kin Fai Au Jason G Underwood Lawrence Lee Wing Hung Wong Improving PacBio long read accuracy by short read alignment. PLoS ONE |
title | Improving PacBio long read accuracy by short read alignment. |
title_full | Improving PacBio long read accuracy by short read alignment. |
title_fullStr | Improving PacBio long read accuracy by short read alignment. |
title_full_unstemmed | Improving PacBio long read accuracy by short read alignment. |
title_short | Improving PacBio long read accuracy by short read alignment. |
title_sort | improving pacbio long read accuracy by short read alignment |
url | http://europepmc.org/articles/PMC3464235?pdf=render |
work_keys_str_mv | AT kinfaiau improvingpacbiolongreadaccuracybyshortreadalignment AT jasongunderwood improvingpacbiolongreadaccuracybyshortreadalignment AT lawrencelee improvingpacbiolongreadaccuracybyshortreadalignment AT winghungwong improvingpacbiolongreadaccuracybyshortreadalignment |
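
The homopolymer compression (HC) transformation named in the description can be illustrated with a minimal sketch. This is not LSC's actual code: the function names `hc_compress` and `hc_decompress` and the example reads are hypothetical, and the sketch only assumes that HC means collapsing each run of identical bases to a single base while remembering the run lengths. It shows why two reads that differ only in homopolymer length become identical, and therefore easier to align, after compression.

```python
from itertools import groupby


def hc_compress(seq):
    """Collapse each homopolymer run to one base.

    Returns the compressed string and the original length of each run,
    so the uncompressed sequence can be restored later.
    """
    bases = []
    run_lengths = []
    for base, run in groupby(seq):
        bases.append(base)
        run_lengths.append(sum(1 for _ in run))
    return "".join(bases), run_lengths


def hc_decompress(compressed, run_lengths):
    """Inverse of hc_compress: expand each base back to its run length."""
    return "".join(base * n for base, n in zip(compressed, run_lengths))


# Hypothetical reads from the same locus; the long read has
# homopolymer-length errors (one extra A, one extra T).
long_read = "AAACCGTTTTA"
short_read = "AACCGTTTA"

lr_hc, lr_runs = hc_compress(long_read)    # "ACGTA", [3, 2, 1, 4, 1]
sr_hc, sr_runs = hc_compress(short_read)   # "ACGTA", [2, 2, 1, 3, 1]

# In compressed space the two reads match exactly, so an aligner would
# not have to pay for the homopolymer-length differences.
print(lr_hc == sr_hc)
```

In this toy case the run lengths recorded from the short read could then be used to restore a corrected sequence with `hc_decompress(lr_hc, sr_runs)`; how LSC itself chooses the corrected run lengths is not stated in this record.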