BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain
Biodiversity is the assortment of life on earth covering evolutionary, ecological, biological, and social forms. To preserve life in all its variety and richness, it is imperative to monitor the current state of biodiversity and its change over time and to understand the forces driving it. This need...
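Main Authors: | Nora Abdelmageed, Felicitas Löffler, Leila Feddoul, Alsayed Algergawy, Sheeba Samuel, Jitendra Gaikwad, Anahita Kazem, Birgitta König-Ries |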
---|---|
Format: | Article |
Language: | English |
Published: | Pensoft Publishers 2022-10-01 |
Series: | Biodiversity Data Journal |
Subjects: | |
Online Access: | https://bdj.pensoft.net/article/89481/download/pdf/ |
_version_ | 1797948445715070976 |
---|---|
author | Nora Abdelmageed, Felicitas Löffler, Leila Feddoul, Alsayed Algergawy, Sheeba Samuel, Jitendra Gaikwad, Anahita Kazem, Birgitta König-Ries |
author_sort | Nora Abdelmageed |
collection | DOAJ |
description | Biodiversity is the assortment of life on earth covering evolutionary, ecological, biological, and social forms. To preserve life in all its variety and richness, it is imperative to monitor the current state of biodiversity and its change over time and to understand the forces driving it. This need has resulted in numerous works being published in this field. With this, a large amount of textual data (publications) and metadata (e.g. dataset description) has been generated. To support the management and analysis of these data, two techniques from computer science are of interest, namely Named Entity Recognition (NER) and Relation Extraction (RE). While the former enables better content discovery and understanding, the latter fosters the analysis by detecting connections between entities and, thus, allows us to draw conclusions and answer relevant domain-specific questions. To automatically predict entities and their relations, machine/deep learning techniques could be used. The training and evaluation of those techniques require labelled corpora. In this paper, we present two gold-standard corpora for Named Entity Recognition (NER) and Relation Extraction (RE) generated from biodiversity datasets metadata and abstracts that can be used as evaluation benchmarks for the development of new computer-supported tools that require machine learning or deep learning techniques. These corpora are manually labelled and verified by biodiversity experts. In addition, we explain the detailed steps of constructing these datasets. Moreover, we demonstrate the underlying ontology for the classes and relations used to annotate such corpora. |
first_indexed | 2024-04-10T21:43:26Z |
format | Article |
id | doaj.art-41c8f5a44f4449799c7c9623c1112c1c |
institution | Directory of Open Access Journals |
issn | 1314-2828 |
language | English |
last_indexed | 2024-04-10T21:43:26Z |
publishDate | 2022-10-01 |
publisher | Pensoft Publishers |
record_format | Article |
series | Biodiversity Data Journal |
spelling | doaj.art-41c8f5a44f4449799c7c9623c1112c1c; 2023-01-18T21:08:31Z; eng; Pensoft Publishers; Biodiversity Data Journal; 1314-2828; 2022-10-01; 10124; 10.3897/BDJ.10.e89481; 89481; BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain; Nora Abdelmageed (0), Felicitas Löffler (1), Leila Feddoul (2), Alsayed Algergawy (3), Sheeba Samuel (4), Jitendra Gaikwad (5), Anahita Kazem (6), Birgitta König-Ries (7); Michael-Stifel-Center for Data-Driven and Simulation Science; Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena; Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena; Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena; Michael-Stifel-Center for Data-Driven and Simulation Science; Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena; German Center for Integrative Biodiversity Research (iDiv); German Center for Integrative Biodiversity Research (iDiv); Biodiversity is the assortment of life on earth covering evolutionary, ecological, biological, and social forms. To preserve life in all its variety and richness, it is imperative to monitor the current state of biodiversity and its change over time and to understand the forces driving it. This need has resulted in numerous works being published in this field. With this, a large amount of textual data (publications) and metadata (e.g. dataset description) has been generated. To support the management and analysis of these data, two techniques from computer science are of interest, namely Named Entity Recognition (NER) and Relation Extraction (RE). While the former enables better content discovery and understanding, the latter fosters the analysis by detecting connections between entities and, thus, allows us to draw conclusions and answer relevant domain-specific questions. To automatically predict entities and their relations, machine/deep learning techniques could be used. The training and evaluation of those techniques require labelled corpora. In this paper, we present two gold-standard corpora for Named Entity Recognition (NER) and Relation Extraction (RE) generated from biodiversity datasets metadata and abstracts that can be used as evaluation benchmarks for the development of new computer-supported tools that require machine learning or deep learning techniques. These corpora are manually labelled and verified by biodiversity experts. In addition, we explain the detailed steps of constructing these datasets. Moreover, we demonstrate the underlying ontology for the classes and relations used to annotate such corpora.; https://bdj.pensoft.net/article/89481/download/pdf/; entity annotation; relation annotation; Named Enti |
title | BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain |
topic | entity annotation; relation annotation; Named Enti |
url | https://bdj.pensoft.net/article/89481/download/pdf/ |