High-quality draft assemblies of mammalian genomes from massively parallel sequence data

Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use th...

Bibliographic Details
Main Authors: Gnerre, Sante; MacCallum, Iain; Przybylski, Dariusz; Ribeiro, Felipe J.; Burton, Joshua; Walker, Bruce J.; Sharpe, Ted; Hall, Giles; Shea, Terrance P.; Sykes, Sean; Berlin, Aaron M.; Aird, Daniel; Costello, Maura; Daza, Riza; Williams, Louise; Nicol, Robert; Gnirke, Andreas; Nusbaum, Chad; Jaffe, David B.; Lander, Eric Steven
Other Authors: Massachusetts Institute of Technology. Department of Biology
Format: Article
Language: en_US
Published: National Academy of Sciences 2011
Online Access: http://hdl.handle.net/1721.1/64820
author Gnerre, Sante
MacCallum, Iain
Przybylski, Dariusz
Ribeiro, Felipe J.
Burton, Joshua
Walker, Bruce J.
Sharpe, Ted
Hall, Giles
Shea, Terrance P.
Sykes, Sean
Berlin, Aaron M.
Aird, Daniel
Costello, Maura
Daza, Riza
Williams, Louise
Nicol, Robert
Gnirke, Andreas
Nusbaum, Chad
Jaffe, David B.
Lander, Eric Steven
author2 Massachusetts Institute of Technology. Department of Biology
author_sort Gnerre, Sante
collection MIT
description Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.
first_indexed 2024-09-23T16:44:06Z
format Article
id mit-1721.1/64820
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T16:44:06Z
publishDate 2011
publisher National Academy of Sciences
record_format dspace
spelling mit-1721.1/64820
2022-09-29T21:07:42Z
Lander, Eric S
Lander, Eric S.
National Institutes of Health (U.S.)
National Human Genome Research Institute (U.S.) (Grant U54HG003067)
National Human Genome Research Institute (U.S.) (Grant R01HG003474)
National Institute of Allergy and Infectious Diseases (U.S.) (Contract HHSN2722009000018C)
2011-07-15T16:53:31Z
2011-07-15T16:53:31Z
2010-12
2010-10
Article
http://purl.org/eprint/type/JournalArticle
0027-8424
1091-6490
http://hdl.handle.net/1721.1/64820
Gnerre, S. et al. “High-quality Draft Assemblies of Mammalian Genomes from Massively Parallel Sequence Data.” Proceedings of the National Academy of Sciences 108.4 (2010): 1513-1518.
en_US
http://dx.doi.org/10.1073/pnas.1017351108
Proceedings of the National Academy of Sciences of the United States of America
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
application/pdf
National Academy of Sciences
PNAS
title High-quality draft assemblies of mammalian genomes from massively parallel sequence data
url http://hdl.handle.net/1721.1/64820