Overview of the BioCreative III Workshop

Bibliographic Details
Main Authors: Valencia, Alfonso; Wilbur, W; Cohen, Kevin B; Krallinger, Martin; Lu, Zhiyong; Arighi, Cecilia N; Hirschman, Lynette; Wu, Cathy H
Format: Article
Language: English
Published: BMC 2011-10-01
Series: BMC Bioinformatics
collection DOAJ
description Abstract
Background: The overall goal of the BioCreative Workshops is to promote the development of text mining and text processing tools that are useful to the communities of researchers and database curators in the biological sciences. To this end, BioCreative I was held in 2004, BioCreative II in 2007, and BioCreative II.5 in 2009. Each of these workshops involved human-annotated test data for several basic tasks in text mining applied to the biomedical literature. Participants in the workshops were invited to compete in the tasks by constructing software systems to perform the tasks automatically and were given scores based on their performance. The results of these workshops have benefited the community in several ways. They have 1) provided evidence for the most effective methods currently available to solve specific problems; 2) revealed the current state of the art for performance on those problems; and 3) provided gold standard data and results on that data by which future advances can be gauged. This special issue contains overview papers for the three tasks of BioCreative III.
Results: The BioCreative III Workshop was held in September of 2010 and continued the tradition of a challenge evaluation on several tasks judged basic to effective text mining in biology, including a gene normalization (GN) task and two protein-protein interaction (PPI) tasks. In total, the Workshop involved the work of twenty-three teams. Thirteen teams participated in the GN task, which required the assignment of EntrezGene IDs to all named genes in full-text papers without any species information being provided to a system. Ten teams participated in the PPI article classification task (ACT), which required a system to classify and rank a PubMed® record as belonging to an article either having or not having “PPI relevant” information. Eight teams participated in the PPI interaction method task (IMT), where systems were given full-text documents and were required to extract the experimental methods used to establish PPIs and a text segment supporting each such method. Gold standard data was compiled for each of these tasks and participants competed in developing systems to perform the tasks automatically.
BioCreative III also introduced a new interactive task (IAT), run as a demonstration task. The goal was to develop an interactive system to facilitate a user’s annotation of the unique database identifiers for all the genes appearing in an article. This task included ranking genes by importance (based preferably on the amount of described experimental information regarding genes). There was also an optional task to assist the user in finding the most relevant articles about a given gene. For BioCreative III, a user advisory group (UAG) was assembled and played an important role 1) in producing some of the gold standard annotations for the GN task, 2) in critiquing IAT systems, and 3) in providing guidance for a future, more rigorous evaluation of IAT systems. Six teams participated in the IAT demonstration task and received feedback on their systems from the UAG. Besides the innovations that made the GN and PPI tasks more realistic and practical, and the introduction of the IAT, discussions were begun on community data standards to promote interoperability and on user requirements and evaluation metrics to address the utility and usability of systems.
Conclusions: In this paper we give a brief history of the BioCreative Workshops and how they relate to other text mining competitions in biology. This is followed by a synopsis of the three tasks (GN, PPI, and IAT) in BioCreative III, with figures for best participant performance on the GN and PPI tasks. These results are discussed and compared with results from previous BioCreative Workshops, and we conclude that the best performing systems for GN, PPI-ACT and PPI-IMT in realistic settings are not sufficient for fully automatic use. This provides evidence for the importance of interactive systems, and in the remainder of the paper we present our vision of how best to construct an interactive system for a GN or PPI-like task.
first_indexed 2024-04-13T06:20:23Z
format Article
id doaj.art-dd6a12004ec64bb881cee18c67c381d1
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-13T06:20:23Z
publishDate 2011-10-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-dd6a12004ec64bb881cee18c67c381d1; 2022-12-22T02:58:40Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2011-10-01; 12; Suppl 8; S1; 10.1186/1471-2105-12-S8-S1; Overview of the BioCreative III Workshop