Knowledge-Based Intelligent Text Simplification for Biological Relation Extraction

Relation extraction from biological publications plays a pivotal role in accelerating scientific discovery and advancing medical research. While vast amounts of this knowledge are stored within the published literature, extracting it manually from this continually growing volume of documents is becom...

Bibliographic Details
Main Authors: Jaskaran Gill, Madhu Chetty, Suryani Lim, Jennifer Hallinan
Format: Article
Language: English
Published: MDPI AG 2023-12-01
Series: Informatics
Subjects: sentence simplification, named entity recognition, relation extraction, BioBERT, BERN2
Online Access: https://www.mdpi.com/2227-9709/10/4/89
author Jaskaran Gill
Madhu Chetty
Suryani Lim
Jennifer Hallinan
collection DOAJ
description Relation extraction from biological publications plays a pivotal role in accelerating scientific discovery and advancing medical research. While vast amounts of this knowledge are stored within the published literature, extracting it manually from this continually growing volume of documents is becoming increasingly arduous. Recently, attention has turned to automating relation extraction with pre-trained Large Language Models (LLMs) and deep-learning algorithms. However, the complex syntactic structure of biological sentences, with nested entities and domain-specific terminology, and the shortage of annotated training corpora pose major challenges in accurately capturing entity relationships from unstructured data. To address these issues, in this paper, we propose a Knowledge-based Intelligent Text Simplification (KITS) approach focused on the accurate extraction of biological relations. KITS accurately captures the relational context of the binary relations within a sentence while preventing changes in the meaning of the sentences it simplifies. Experiments on the Learning Language in Logic (LLL) dataset show that, measured with well-known performance metrics, the proposed technique increased precision by 21% while simplifying only 25% of the sentences. Combined with BioBERT, a popular pre-trained LLM, the proposed method outperformed other state-of-the-art methods.
first_indexed 2024-03-08T20:40:33Z
format Article
id doaj.art-1ddbcedea6cf41579092181c67614499
institution Directory Open Access Journal
issn 2227-9709
language English
last_indexed 2024-03-08T20:40:33Z
publishDate 2023-12-01
publisher MDPI AG
record_format Article
series Informatics
spelling doaj.art-1ddbcedea6cf41579092181c67614499
2023-12-22T14:15:47Z
eng
MDPI AG
Informatics
2227-9709
2023-12-01
Volume 10, Issue 4, Article 89
10.3390/informatics10040089
Knowledge-Based Intelligent Text Simplification for Biological Relation Extraction
Jaskaran Gill, Madhu Chetty, Suryani Lim, Jennifer Hallinan (all: Health Innovation and Transformation Centre, Federation University, Ballarat, VIC 3842, Australia)
https://www.mdpi.com/2227-9709/10/4/89
sentence simplification
named entity recognition
relation extraction
BioBERT
BERN2
title Knowledge-Based Intelligent Text Simplification for Biological Relation Extraction
topic sentence simplification
named entity recognition
relation extraction
BioBERT
BERN2
url https://www.mdpi.com/2227-9709/10/4/89