BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain

Biodiversity is the assortment of life on earth covering evolutionary, ecological, biological, and social forms. To preserve life in all its variety and richness, it is imperative to monitor the current state of biodiversity and its change over time and to understand the forces driving it. This need...


Bibliographic Details
Main Authors: Nora Abdelmageed, Felicitas Löffler, Leila Feddoul, Alsayed Algergawy, Sheeba Samuel, Jitendra Gaikwad, Anahita Kazem, Birgitta König-Ries
Format: Article
Language: English
Published: Pensoft Publishers 2022-10-01
Series: Biodiversity Data Journal
Subjects:
Online Access: https://bdj.pensoft.net/article/89481/download/pdf/
_version_ 1797948445715070976
author Nora Abdelmageed
Felicitas Löffler
Leila Feddoul
Alsayed Algergawy
Sheeba Samuel
Jitendra Gaikwad
Anahita Kazem
Birgitta König-Ries
author_facet Nora Abdelmageed
Felicitas Löffler
Leila Feddoul
Alsayed Algergawy
Sheeba Samuel
Jitendra Gaikwad
Anahita Kazem
Birgitta König-Ries
author_sort Nora Abdelmageed
collection DOAJ
description Biodiversity is the assortment of life on earth covering evolutionary, ecological, biological, and social forms. To preserve life in all its variety and richness, it is imperative to monitor the current state of biodiversity and its change over time and to understand the forces driving it. This need has resulted in numerous works being published in this field. With this, a large amount of textual data (publications) and metadata (e.g. dataset description) has been generated. To support the management and analysis of these data, two techniques from computer science are of interest, namely Named Entity Recognition (NER) and Relation Extraction (RE). While the former enables better content discovery and understanding, the latter fosters the analysis by detecting connections between entities and, thus, allows us to draw conclusions and answer relevant domain-specific questions. To automatically predict entities and their relations, machine/deep learning techniques could be used. The training and evaluation of those techniques require labelled corpora. In this paper, we present two gold-standard corpora for Named Entity Recognition (NER) and Relation Extraction (RE) generated from biodiversity datasets metadata and abstracts that can be used as evaluation benchmarks for the development of new computer-supported tools that require machine learning or deep learning techniques. These corpora are manually labelled and verified by biodiversity experts. In addition, we explain the detailed steps of constructing these datasets. Moreover, we demonstrate the underlying ontology for the classes and relations used to annotate such corpora.
first_indexed 2024-04-10T21:43:26Z
format Article
id doaj.art-41c8f5a44f4449799c7c9623c1112c1c
institution Directory Open Access Journal
issn 1314-2828
language English
last_indexed 2024-04-10T21:43:26Z
publishDate 2022-10-01
publisher Pensoft Publishers
record_format Article
series Biodiversity Data Journal
spelling doaj.art-41c8f5a44f4449799c7c9623c1112c1c 2023-01-18T21:08:31Z eng Pensoft Publishers Biodiversity Data Journal 1314-2828 2022-10-01 10 1 24 10.3897/BDJ.10.e89481 89481 BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain Nora Abdelmageed 0; Felicitas Löffler 1; Leila Feddoul 2; Alsayed Algergawy 3; Sheeba Samuel 4; Jitendra Gaikwad 5; Anahita Kazem 6; Birgitta König-Ries 7 Michael-Stifel-Center for Data-Driven and Simulation Science; Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena; Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena; Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena; Michael-Stifel-Center for Data-Driven and Simulation Science; Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena; German Center for Integrative Biodiversity Research (iDiv); German Center for Integrative Biodiversity Research (iDiv) Biodiversity is the assortment of life on earth covering evolutionary, ecological, biological, and social forms. To preserve life in all its variety and richness, it is imperative to monitor the current state of biodiversity and its change over time and to understand the forces driving it. This need has resulted in numerous works being published in this field. With this, a large amount of textual data (publications) and metadata (e.g. dataset description) has been generated. To support the management and analysis of these data, two techniques from computer science are of interest, namely Named Entity Recognition (NER) and Relation Extraction (RE). While the former enables better content discovery and understanding, the latter fosters the analysis by detecting connections between entities and, thus, allows us to draw conclusions and answer relevant domain-specific questions. To automatically predict entities and their relations, machine/deep learning techniques could be used. The training and evaluation of those techniques require labelled corpora. In this paper, we present two gold-standard corpora for Named Entity Recognition (NER) and Relation Extraction (RE) generated from biodiversity datasets metadata and abstracts that can be used as evaluation benchmarks for the development of new computer-supported tools that require machine learning or deep learning techniques. These corpora are manually labelled and verified by biodiversity experts. In addition, we explain the detailed steps of constructing these datasets. Moreover, we demonstrate the underlying ontology for the classes and relations used to annotate such corpora. https://bdj.pensoft.net/article/89481/download/pdf/ entity annotation; relation annotation; Named Enti
spellingShingle Nora Abdelmageed
Felicitas Löffler
Leila Feddoul
Alsayed Algergawy
Sheeba Samuel
Jitendra Gaikwad
Anahita Kazem
Birgitta König-Ries
BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain
Biodiversity Data Journal
entity annotation
relation annotation
Named Enti
title BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain
title_full BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain
title_fullStr BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain
title_full_unstemmed BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain
title_short BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain
title_sort biodivnere gold standard corpora for named entity recognition and relation extraction in the biodiversity domain
topic entity annotation
relation annotation
Named Enti
url https://bdj.pensoft.net/article/89481/download/pdf/