Improving PacBio long read accuracy by short read alignment.

Recently developed third generation sequencing (TGS) technologies generate much longer reads than second generation sequencing (SGS) and thus provide an opportunity to address problems that are difficult to study with SGS alone. However, higher raw read error rates are an intrinsic drawback of most TGS technologies. Here we present a computational method, LSC, that corrects errors in TGS long reads (LR) using SGS short reads (SR). To reduce errors in homopolymer runs on the main TGS platform, the PacBio® RS, LSC applies a homopolymer compression (HC) transformation that increases the sensitivity of SR-LR alignment without sacrificing alignment accuracy. We applied LSC to 100,000 PacBio long reads from human brain cerebellum RNA-seq data and 64 million single-end 75 bp reads from human brain RNA-seq data. The results show that LSC can reduce the error rate of PacBio long reads more than threefold. The improved accuracy greatly benefits many downstream analyses, such as directional gene isoform detection in RNA-seq studies. Compared with another hybrid correction tool, LSC can achieve more than twice the sensitivity with similar specificity.

Bibliographic Details
Main Authors: Kin Fai Au, Jason G Underwood, Lawrence Lee, Wing Hung Wong
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2012-01-01
Series: PLoS ONE
Citation: PLoS ONE 7(10): e46679
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0046679
Online Access: http://europepmc.org/articles/PMC3464235?pdf=render
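The homopolymer compression (HC) transformation named in the abstract can be illustrated with a short Python sketch. It assumes HC means collapsing every run of identical bases to a single base while keeping the run lengths; this record does not describe LSC's actual implementation, and the function names below (`homopolymer_compress`, `homopolymer_expand`, `original_offsets`) are hypothetical. The example shows why HC can raise SR-LR alignment sensitivity: a read whose errors are extra bases inside homopolymer runs compresses to the same string as the error-free sequence.

```python
from itertools import accumulate, groupby


def homopolymer_compress(seq: str) -> tuple[str, list[int]]:
    """Collapse each run of identical bases to a single base (hypothetical helper).

    Returns the compressed sequence and the length of each collapsed run,
    so the original sequence can be reconstructed later.
    """
    bases: list[str] = []
    run_lengths: list[int] = []
    for base, run in groupby(seq.upper()):
        bases.append(base)
        run_lengths.append(sum(1 for _ in run))
    return "".join(bases), run_lengths


def homopolymer_expand(compressed: str, run_lengths: list[int]) -> str:
    """Invert homopolymer_compress by repeating each base by its run length."""
    if len(compressed) != len(run_lengths):
        raise ValueError("compressed sequence and run lengths differ in length")
    return "".join(base * n for base, n in zip(compressed, run_lengths))


def original_offsets(run_lengths: list[int]) -> list[int]:
    """Start coordinate in the uncompressed sequence of each compressed base."""
    # accumulate(..., initial=0) yields 0 followed by the running totals;
    # dropping the last total leaves one start offset per run.
    return list(accumulate(run_lengths, initial=0))[:-1]


if __name__ == "__main__":
    reference = "AAACGGTT"
    long_read = "AAAACGGGTT"  # hypothetical read with two homopolymer-length errors
    ref_hc, ref_runs = homopolymer_compress(reference)
    read_hc, read_runs = homopolymer_compress(long_read)
    print(ref_hc, read_hc)                                      # ACGT ACGT
    print(ref_hc == read_hc)                                    # True
    print(original_offsets(ref_runs))                           # [0, 3, 4, 6]
    print(homopolymer_expand(read_hc, read_runs) == long_read)  # True
```

Keeping `run_lengths` alongside the compressed string is a design choice of this sketch: alignment can be done in compressed space, and `original_offsets` maps each compressed position back to a coordinate in the uncompressed read. Because HC discards homopolymer length, a sketch like this cannot by itself recover the correct run length; how LSC handles that step is not stated in this record.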