Sequencing <it>Medicago truncatula </it>expressed sequenced tags using 454 Life Sciences technology

Bibliographic Details
Main Authors: Xiao Yongli, May Gregory D, Goldberg Susanne MD, Haas Brian J, Cheung Foo, Town Christopher D
Format: Article
Language: English
Published: BMC 2006-10-01
Series: BMC Genomics
Online Access: http://www.biomedcentral.com/1471-2164/7/272
Description
Summary: <p>Abstract</p> <p>Background</p> <p>In this study, we addressed whether a single 454 Life Sciences GS20 sequencing run enables new gene discovery from a normalized cDNA library, and whether the short reads produced by this technology are of value in gene structure annotation.</p> <p>Results</p> <p>A single 454 GS20 sequencing run on adapter-ligated cDNA from a normalized cDNA library generated 292,465 reads, which cleaning reduced to 252,384 reads with an average length of 92 nucleotides. Clustering and assembly yielded a total of 184,599 unique sequences containing over 400 simple sequence repeats (SSRs). The 454 sequences generated hits to more genes than a comparable amount of sequence from the <it>Medicago truncatula </it>Gene Index (MtGI). Although short, the 454 reads are long enough to map to a unique genome location as effectively as the longer ESTs produced by conventional sequencing. Functional interpretation of the sequences, carried out by Gene Ontology (GO) assignment from matches to <it>Arabidopsis</it>, showed that they cover a broad range of GO categories. A total of 53,796 assemblies and singletons (29%) had no match in the existing MtGI. Among these previously unobserved <it>Medicago </it>transcripts, thousands had matches in a comprehensive protein database and one or more of the TIGR Plant Gene Indices. Approximately 20% of these novel sequences could be found in the <it>Medicago </it>genome sequence. A total of 70,026 reads generated by the 454 technology were mapped to 785 finished <it>Medicago </it>BACs using PASA, and over 1,000 gene models required modification.
In parallel with the 454 sequencing, 4,445 5' reads were generated from the same library by conventional sequencing; the assembled sequences showed that the library contains about 52% full-length cDNAs encoding proteins from 50 to over 500 amino acids in length.</p> <p>Conclusion</p> <p>Because of the large number of reads it affords, 454 DNA sequencing technology is effective in revealing the expression of transcripts from a broad range of GO categories, including many rare transcripts in normalized cDNA libraries, although only a limited portion of each transcript's sequence is uncovered. As with longer ESTs, 454 reads can be mapped uniquely onto genomic sequence to provide support for, and modifications of, gene predictions.</p>
ISSN: 1471-2164