High-quality draft assemblies of mammalian genomes from massively parallel sequence data
Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use th...
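Main Authors: | Gnerre, Sante; MacCallum, Iain; Przybylski, Dariusz; Ribeiro, Felipe J.; Burton, Joshua; Walker, Bruce J.; Sharpe, Ted; Hall, Giles; Shea, Terrance P.; Sykes, Sean; Berlin, Aaron M.; Aird, Daniel; Costello, Maura; Daza, Riza; Williams, Louise; Nicol, Robert; Gnirke, Andreas; Nusbaum, Chad; Jaffe, David B.; Lander, Eric Steven |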
---|---|
Other Authors: | Massachusetts Institute of Technology. Department of Biology |
Format: | Article |
Language: | en_US |
Published: | National Academy of Sciences, 2011 |
Online Access: | http://hdl.handle.net/1721.1/64820 |
_version_ | 1811096465924161536 |
---|---|
author | Gnerre, Sante MacCallum, Iain Przybylski, Dariusz Ribeiro, Felipe J. Burton, Joshua Walker, Bruce J. Sharpe, Ted Hall, Giles Shea, Terrance P. Sykes, Sean Berlin, Aaron M. Aird, Daniel Costello, Maura Daza, Riza Williams, Louise Nicol, Robert Gnirke, Andreas Nusbaum, Chad Jaffe, David B. Lander, Eric Steven |
author2 | Massachusetts Institute of Technology. Department of Biology |
collection | MIT |
description | Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd. |
first_indexed | 2024-09-23T16:44:06Z |
format | Article |
id | mit-1721.1/64820 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T16:44:06Z |
publishDate | 2011 |
publisher | National Academy of Sciences |
record_format | dspace |
spelling | mit-1721.1/64820; 2022-09-29T21:07:42Z; Lander, Eric S; Lander, Eric S.; National Institutes of Health (U.S.); National Human Genome Research Institute (U.S.) (Grant U54HG003067); National Human Genome Research Institute (U.S.) (Grant R01HG003474); National Institute of Allergy and Infectious Diseases (U.S.) (Contract HHSN2722009000018C); 2011-07-15T16:53:31Z; 2011-07-15T16:53:31Z; 2010-12; 2010-10; Article; http://purl.org/eprint/type/JournalArticle; 0027-8424; 1091-6490; http://hdl.handle.net/1721.1/64820; Gnerre, S. et al. “High-quality Draft Assemblies of Mammalian Genomes from Massively Parallel Sequence Data.” Proceedings of the National Academy of Sciences 108.4 (2010): 1513-1518; en_US; http://dx.doi.org/10.1073/pnas.1017351108; Proceedings of the National Academy of Sciences of the United States of America; Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.; application/pdf; National Academy of Sciences; PNAS |
title | High-quality draft assemblies of mammalian genomes from massively parallel sequence data |
url | http://hdl.handle.net/1721.1/64820 |