The Consensus Coding Sequence (CCDS) Project: Identifying a Common Protein-Coding Gene Set for the Human and Mouse Genomes

Bibliographic Details
Main Authors:	Kellis, Manolis; Lin, Michael F.
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	en_US
Published:	Cold Spring Harbor Laboratory Press 2012
Online Access:	http://hdl.handle.net/1721.1/72151

collection	MIT
description	Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.
Funding:	National Human Genome Research Institute (U.S.) (Grant number 1U54HG004555-01); Wellcome Trust (London, England) (Grant numbers WT062023, WT077198)
Dates:	2012-08-15T17:40:09Z; 2009-04; 2008-12
Type:	Article (http://purl.org/eprint/type/JournalArticle)
ISSN:	1088-9051
Citation:	Pruitt, K. D. et al. “The Consensus Coding Sequence (CCDS) Project: Identifying a Common Protein-coding Gene Set for the Human and Mouse Genomes.” Genome Research 19.7 (2009): 1316–1323. Copyright © 2009 by Cold Spring Harbor Laboratory Press
DOI:	http://dx.doi.org/10.1101/gr.080531.108
Journal:	Genome Research
License:	Creative Commons Attribution-NonCommercial 3.0 Unported License, http://creativecommons.org/licenses/by-nc/3.0/
File format:	application/pdf
