Document Retrieval System for Biomedical Question Answering

Bibliographic Details
Main Authors:	Harun Bolat, Baha Şen
Format:	Article
Language:	English
Published:	MDPI AG 2024-03-01
Series:	Applied Sciences
Subjects:	information retrieval, document retrieval, biomedical question answering, search engine, natural language processing
Online Access:	https://www.mdpi.com/2076-3417/14/6/2613

collection	DOAJ
description	In this paper, we describe our biomedical document retrieval system and answer extraction module, which are components of a biomedical question answering system. Approximately 26.5 million PubMed articles are indexed as a corpus with the Apache Lucene text search engine. Our proposed system consists of three parts. The first is the question analysis module, which analyzes the question and enriches it with biomedical concepts related to its wording. The second is the document retrieval module, in which the system is tested with different information retrieval models, such as the Vector Space Model, Okapi BM25, and Query Likelihood. The third is the document re-ranking module, which re-orders the documents retrieved in the previous step. We evaluated the system on the training questions of BioASQ challenge Task 6B. In the document retrieval phase, Query Likelihood with Dirichlet smoothing gave the best MAP score. In the re-ranking phase, we used the sequential dependence model, but it produced a lower MAP score than the retrieval phase. To find the sentences containing the answer, the similarity calculation includes the named entities (from Named Entity Recognition, NER), UMLS Concept Unique Identifiers (CUIs), and UMLS Semantic Types of the words in the question. With this approach, we observed a performance improvement of roughly 25% for the top 20 results over the other method used in this study, which relies solely on textual similarity.
first_indexed	2024-04-24T18:34:48Z
id	doaj.art-c299ee39aef445e8ad8cfd00cb7fff49
institution	Directory of Open Access Journals
issn	2076-3417
last_indexed	2024-04-24T18:34:48Z
doi	10.3390/app14062613
volume	14
issue	6
article_number	2613
affiliation	Computer Engineering Department, Ankara Yıldırım Beyazıt University, 06010 Ankara, Turkey (Harun Bolat, Baha Şen)
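
The description reports that Query Likelihood with Dirichlet smoothing gave the best MAP score in the document retrieval phase. The Python sketch below is a minimal illustration, not the authors' implementation. It scores documents by log P(q|d) = Σ_w c(w,q) · log((tf(w,d) + μ·P(w|C)) / (|d| + μ)) and evaluates the rankings with MAP.

The sketch rests on these assumptions:
- Text is pre-tokenized.
- Collection statistics are taken from the toy corpus itself.
- μ = 2000, a commonly used default. The record does not state the authors' value.
- Query terms absent from the collection are skipped.
- Average precision is uncut. BioASQ's official MAP may be computed differently.

The function names and documents are hypothetical. In Lucene itself, this scoring model is provided by LMDirichletSimilarity.

```python
import math
from collections import Counter


def dirichlet_ql_score(query_terms, doc_terms, coll_tf, coll_len, mu=2000.0):
    """Return log P(q | d) under a Dirichlet-smoothed unigram language model."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term, qtf in Counter(query_terms).items():
        p_coll = coll_tf.get(term, 0) / coll_len
        if p_coll == 0.0:
            # Term never occurs in the collection; skipping it is an assumption of this sketch.
            continue
        p_doc = (tf[term] + mu * p_coll) / (doc_len + mu)
        score += qtf * math.log(p_doc)
    return score


def rank_documents(query_terms, docs, mu=2000.0):
    """Rank doc ids by descending query-likelihood score (ties broken by id)."""
    coll_tf = Counter(t for terms in docs.values() for t in terms)
    coll_len = sum(coll_tf.values())
    scores = {d: dirichlet_ql_score(query_terms, terms, coll_tf, coll_len, mu)
              for d, terms in docs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


def average_precision(ranking, relevant):
    """Uncut AP: mean of precision@k over the ranks k that hold a relevant doc."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over (ranking, relevant_set) pairs, one pair per question."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy corpus of already-tokenized "abstracts" (hypothetical data).
docs = {
    "d1": "brca1 mutation breast cancer risk".split(),
    "d2": "insulin regulates glucose".split(),
    "d3": "brca1 gene repair".split(),
}
run1 = rank_documents("brca1 breast cancer".split(), docs)
run2 = rank_documents("insulin glucose".split(), docs)
print(run1)  # ['d1', 'd3', 'd2']
print(run2)  # ['d2', 'd3', 'd1']
print(mean_average_precision([(run1, {"d3"}), (run2, {"d2"})]))  # 0.75
```

Every document receives the smoothing mass μ·P(w|C), so shorter documents score higher when no query term matches. This is why d3 ranks above d1 for the second query.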
