Next generation community assessment of biomedical entity recognition web servers: metrics, performance, interoperability aspects of BeCalm

Abstract. Background: Shared tasks and community challenges represent key instruments to promote research, collaboration and determine the state of the art of biomedical and chemical text mining technologies. Traditionally, such tasks relied on the comparison of automatically generated results against...

Bibliographic Details
Main Authors:	Martin Pérez-Pérez, Gael Pérez-Rodríguez, Aitor Blanco-Míguez, Florentino Fdez-Riverola, Alfonso Valencia, Martin Krallinger, Anália Lourenço
Format:	Article
Language:	English
Published:	BMC 2019-06-01
Series:	Journal of Cheminformatics
Subjects:	Named entity recognition; Shared task; REST-API; TIPS; BeCalm metaserver; Patent mining
Online Access:	http://link.springer.com/article/10.1186/s13321-019-0363-6

_version_	1818279824472408064
author	Martin Pérez-Pérez, Gael Pérez-Rodríguez, Aitor Blanco-Míguez, Florentino Fdez-Riverola, Alfonso Valencia, Martin Krallinger, Anália Lourenço
author_facet	Martin Pérez-Pérez, Gael Pérez-Rodríguez, Aitor Blanco-Míguez, Florentino Fdez-Riverola, Alfonso Valencia, Martin Krallinger, Anália Lourenço
author_sort	Martin Pérez-Pérez
collection	DOAJ
description	Abstract. Background: Shared tasks and community challenges represent key instruments to promote research, collaboration and determine the state of the art of biomedical and chemical text mining technologies. Traditionally, such tasks relied on the comparison of automatically generated results against a so-called Gold Standard dataset of manually labelled textual data, regardless of efficiency and robustness of the underlying implementations. Due to the rapid growth of unstructured data collections, including patent databases and particularly the scientific literature, there is a pressing need to generate, assess and expose robust big data text mining solutions to semantically enrich documents in real time. To address this pressing need, a novel track called “Technical interoperability and performance of annotation servers” was launched under the umbrella of the BioCreative text mining evaluation effort. The aim of this track was to enable the continuous assessment of technical aspects of text annotation web servers, specifically of online biomedical named entity recognition systems of interest for medicinal chemistry applications. Results: A total of 15 out of 26 registered teams successfully implemented online annotation servers. They returned predictions during a two-month period in predefined formats and were evaluated through the BeCalm evaluation platform, specifically developed for this track. The track encompassed three levels of evaluation, i.e. data format considerations, technical metrics and functional specifications. Participating annotation servers were implemented in seven different programming languages and covered 12 general entity types. The continuous evaluation of server responses accounted for testing periods of low activity and moderate to high activity, encompassing overall 4,092,502 requests from three different document provider settings. The median response time was below 3.74 s, with a median of 10 annotations/document. Most of the servers showed great reliability and stability, being able to process over 100,000 requests in a 5-day period. Conclusions: The presented track was a novel experimental task that systematically evaluated the technical performance aspects of online entity recognition systems. It raised the interest of a significant number of participants. Future editions of the competition will address the ability to process documents in bulk as well as to annotate full-text documents.
first_indexed	2024-12-12T23:39:28Z
format	Article
id	doaj.art-ec61dffa83fc48f58dfe789f670c0509
institution	Directory Open Access Journal
issn	1758-2946
language	English
last_indexed	2024-12-12T23:39:28Z
publishDate	2019-06-01
publisher	BMC
record_format	Article
series	Journal of Cheminformatics
spelling	doaj.art-ec61dffa83fc48f58dfe789f670c0509 2022-12-22T00:07:16Z eng BMC Journal of Cheminformatics 1758-2946 2019-06-01 111116 10.1186/s13321-019-0363-6 Next generation community assessment of biomedical entity recognition web servers: metrics, performance, interoperability aspects of BeCalm. Martin Pérez-Pérez [0], Gael Pérez-Rodríguez [1], Aitor Blanco-Míguez [2], Florentino Fdez-Riverola [3], Alfonso Valencia [4], Martin Krallinger [5], Anália Lourenço [6]. [0] Department of Computer Science, ESEI, University of Vigo; [1] Department of Computer Science, ESEI, University of Vigo; [2] Department of Computer Science, ESEI, University of Vigo; [3] Department of Computer Science, ESEI, University of Vigo; [4] Life Science Department, Barcelona Supercomputing Centre (BSC-CNS); [5] Life Science Department, Barcelona Supercomputing Centre (BSC-CNS); [6] Department of Computer Science, ESEI, University of Vigo. Abstract. Background: Shared tasks and community challenges represent key instruments to promote research, collaboration and determine the state of the art of biomedical and chemical text mining technologies. Traditionally, such tasks relied on the comparison of automatically generated results against a so-called Gold Standard dataset of manually labelled textual data, regardless of efficiency and robustness of the underlying implementations. Due to the rapid growth of unstructured data collections, including patent databases and particularly the scientific literature, there is a pressing need to generate, assess and expose robust big data text mining solutions to semantically enrich documents in real time. To address this pressing need, a novel track called “Technical interoperability and performance of annotation servers” was launched under the umbrella of the BioCreative text mining evaluation effort. The aim of this track was to enable the continuous assessment of technical aspects of text annotation web servers, specifically of online biomedical named entity recognition systems of interest for medicinal chemistry applications. Results: A total of 15 out of 26 registered teams successfully implemented online annotation servers. They returned predictions during a two-month period in predefined formats and were evaluated through the BeCalm evaluation platform, specifically developed for this track. The track encompassed three levels of evaluation, i.e. data format considerations, technical metrics and functional specifications. Participating annotation servers were implemented in seven different programming languages and covered 12 general entity types. The continuous evaluation of server responses accounted for testing periods of low activity and moderate to high activity, encompassing overall 4,092,502 requests from three different document provider settings. The median response time was below 3.74 s, with a median of 10 annotations/document. Most of the servers showed great reliability and stability, being able to process over 100,000 requests in a 5-day period. Conclusions: The presented track was a novel experimental task that systematically evaluated the technical performance aspects of online entity recognition systems. It raised the interest of a significant number of participants. Future editions of the competition will address the ability to process documents in bulk as well as to annotate full-text documents. http://link.springer.com/article/10.1186/s13321-019-0363-6 Named entity recognition; Shared task; REST-API; TIPS; BeCalm metaserver; Patent mining
spellingShingle	Martin Pérez-Pérez Gael Pérez-Rodríguez Aitor Blanco-Míguez Florentino Fdez-Riverola Alfonso Valencia Martin Krallinger Anália Lourenço Next generation community assessment of biomedical entity recognition web servers: metrics, performance, interoperability aspects of BeCalm Journal of Cheminformatics Named entity recognition Shared task REST-API TIPS BeCalm metaserver Patent mining
title	Next generation community assessment of biomedical entity recognition web servers: metrics, performance, interoperability aspects of BeCalm
title_sort	next generation community assessment of biomedical entity recognition web servers metrics performance interoperability aspects of becalm
topic	Named entity recognition; Shared task; REST-API; TIPS; BeCalm metaserver; Patent mining
url	http://link.springer.com/article/10.1186/s13321-019-0363-6

Similar Items