Mining protein function from text using term-based support vector machines

Abstract. Background: Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology t...

Bibliographic Details
Main Authors:	Rice Simon B, Nenadic Goran, Stapley Benjamin J
Format:	Article
Language:	English
Published:	BMC 2005-05-01
Series:	BMC Bioinformatics

author	Rice Simon B Nenadic Goran Stapley Benjamin J
collection	DOAJ
description	Abstract. Background: Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. Results: The results evaluated by curators were modest, and quite variable for different problems: in many cases we have relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. Conclusion: A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available, and a significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2.
first_indexed	2024-12-18T06:20:27Z
format	Article
id	doaj.art-c61ddb506bb5414da0b84e8848ca1a55
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-12-18T06:20:27Z
publishDate	2005-05-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-c61ddb506bb5414da0b84e8848ca1a55 | 2022-12-21T21:18:10Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2005-05-01 | 6 | Suppl 1 | S22 | 10.1186/1471-2105-6-S1-S22 | Mining protein function from text using term-based support vector machines | Rice Simon B | Nenadic Goran | Stapley Benjamin J
