Building a search tool for compositely annotated entities using Transformer-based approach: Case study in Biosimulation Model Search Engine (BMSE) [version 1; peer review: 2 approved]

Transformer-based approaches to natural language processing (NLP) tasks, such as BERT and GPT, are gaining popularity due to their ability to achieve high performance. These approaches benefit from using enormous data sizes to create pre-trained models and the ability to understand the co...


Bibliographic Details
Main Authors: Yuda Munarko, David Nickerson, Anand Rampadarath
Format: Article
Language: English
Published: F1000 Research Ltd 2023-02-01
Series: F1000Research
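Subjects: Transformer, BERT, biosimulation model search engine, semantic annotation, CASBERT, Physiome Model Repository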
Online Access: https://f1000research.com/articles/12-162/v1
collection DOAJ
description Transformer-based approaches to natural language processing (NLP) tasks, such as BERT and GPT, are gaining popularity due to their ability to achieve high performance. These approaches benefit from pre-training on enormous datasets and from their ability to understand the context of words in a sentence. Their use in the information retrieval domain is thought to increase effectiveness and efficiency. This paper demonstrates an implementation of a BERT-based method (CASBERT) to build a search tool over data compositely annotated using ontologies. The data was a collection of biosimulation models written using the CellML standard in the Physiome Model Repository (PMR). Structurally, a biosimulation model consists of basic entities, constants and variables, that make up higher-level entities such as components, reactions, and the model itself. Finding these entities at their specific level is useful for purposes such as variable reuse, experiment setup, and model audit. Initially, we created embeddings representing compositely annotated entities for constant and variable search (the lowest-level entities). These low-level entity embeddings were then combined vertically and efficiently to create higher-level entity embeddings for searching components, models, images, and simulation setups. Our approach is general, so it can be used to create search tools for other data semantically annotated with ontologies, for example biosimulation models encoded in the SBML format. Our tool is named Biosimulation Model Search Engine (BMSE).
first_indexed 2024-03-11T18:36:01Z
format Article
id doaj.art-913877e8152e4b9b8e1dfcddc10e08b0
institution Directory Open Access Journal
issn 2046-1402
language English
last_indexed 2024-03-11T18:36:01Z
publishDate 2023-02-01
publisher F1000 Research Ltd
record_format Article
series F1000Research
spelling doaj.art-913877e8152e4b9b8e1dfcddc10e08b0 2023-10-13T00:00:00Z eng F1000 Research Ltd F1000Research 2046-1402 2023-02-01 12141628
Yuda Munarko (https://orcid.org/0000-0002-9656-3945), Auckland Bioengineering Institute, University of Auckland, Auckland, 1010, New Zealand
David Nickerson, Auckland Bioengineering Institute, University of Auckland, Auckland, 1010, New Zealand
Anand Rampadarath (https://orcid.org/0000-0001-8830-6212), Auckland Bioengineering Institute, University of Auckland, Auckland, 1010, New Zealand
title Building a search tool for compositely annotated entities using Transformer-based approach: Case study in Biosimulation Model Search Engine (BMSE) [version 1; peer review: 2 approved]
topic Transformer
BERT
biosimulation model search engine
semantic annotation
CASBERT
Physiome Model Repository
eng
url https://f1000research.com/articles/12-162/v1