More than 1,000 putative new human signalling proteins revealed by EST data mining.

Bibliographic Details
Main Authors: Schultz, J, Doerks, T, Ponting, C, Copley, R, Bork, P
Format: Journal article
Language: English
Published: 2000
Collection: OXFORD
Description: Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes, but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based searches with a domain identification protocol, that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.
Record ID: oxford-uuid:f6f17c29-8976-4ebd-b63c-57584d4260be
Institution: University of Oxford