Building a search tool for compositely annotated entities using Transformer-based approach: Case study in Biosimulation Model Search Engine (BMSE) [version 1; peer review: 2 approved]
The Transformer-based approaches to solving natural language processing (NLP) tasks such as BERT and GPT are gaining popularity due to their ability to achieve high performance. These approaches benefit from using enormous data sizes to create pre-trained models and the ability to understand the co...
Main Authors: | Yuda Munarko, David Nickerson, Anand Rampadarath |
---|---|
Format: | Article |
Language: | English |
Published: | F1000 Research Ltd, 2023-02-01 |
Series: | F1000Research |
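Subjects: | Transformer; BERT; biosimulation model; search engine; semantic annotation; CASBERT; Physiome Model Repository |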
Online Access: | https://f1000research.com/articles/12-162/v1 |
_version_ | 1827794586701922304 |
---|---|
author | Yuda Munarko, David Nickerson, Anand Rampadarath |
collection | DOAJ |
description | The Transformer-based approaches to solving natural language processing (NLP) tasks such as BERT and GPT are gaining popularity due to their ability to achieve high performance. These approaches benefit from using enormous data sizes to create pre-trained models and the ability to understand the context of words in a sentence. Their use in the information retrieval domain is thought to increase effectiveness and efficiency. This paper demonstrates a BERT-based method (CASBERT) implementation to build a search tool over data annotated compositely using ontologies. The data was a collection of biosimulation models written using the CellML standard in the Physiome Model Repository (PMR). A biosimulation model structurally consists of basic entities of constants and variables that construct higher-level entities such as components, reactions, and the model. Finding these entities specific to their level is beneficial for various purposes regarding variable reuse, experiment setup, and model audit. Initially, we created embeddings representing compositely-annotated entities for constant and variable search (lowest level entity). Then, these low-level entity embeddings were vertically and efficiently combined to create higher-level entity embeddings to search components, models, images, and simulation setups. Our approach was general, so it can be used to create search tools with other data semantically annotated with ontologies - biosimulation models encoded in the SBML format, for example. Our tool is named Biosimulation Model Search Engine (BMSE). |
first_indexed | 2024-03-11T18:36:01Z |
format | Article |
id | doaj.art-913877e8152e4b9b8e1dfcddc10e08b0 |
institution | Directory of Open Access Journals |
issn | 2046-1402 |
language | English |
last_indexed | 2024-03-11T18:36:01Z |
publishDate | 2023-02-01 |
publisher | F1000 Research Ltd |
record_format | Article |
series | F1000Research |
spelling | doaj.art-913877e8152e4b9b8e1dfcddc10e08b0; 2023-10-13T00:00:00Z; eng; F1000 Research Ltd; F1000Research; 2046-1402; 2023-02-01; 12141628; Yuda Munarko (https://orcid.org/0000-0002-9656-3945), David Nickerson, Anand Rampadarath (https://orcid.org/0000-0001-8830-6212); all three authors: Auckland Bioengineering Institute, University of Auckland, Auckland, 1010, New Zealand; https://f1000research.com/articles/12-162/v1 |
title | Building a search tool for compositely annotated entities using Transformer-based approach: Case study in Biosimulation Model Search Engine (BMSE) [version 1; peer review: 2 approved] |
topic | Transformer; BERT; biosimulation model; search engine; semantic annotation; CASBERT; Physiome Model Repository |
url | https://f1000research.com/articles/12-162/v1 |