DATLMedQA: A Data Augmentation and Transfer Learning Based Solution for Medical Question Answering

With the outbreak of COVID-19 prompting an increased focus on self-care, more and more people hope to obtain disease knowledge from the Internet. In response to this demand, medical question answering and question generation tasks have become an important part of natural language processing...


Bibliographic Details
Main Authors: Shuohua Zhou, Yanping Zhang
Format: Article
Language: English
Published: MDPI AG 2021-11-01
Series: Applied Sciences
Subjects: BERT; GPT-2; XGBoost; T5-Small; medical question answering; transfer learning
Online Access: https://www.mdpi.com/2076-3417/11/23/11251
author Shuohua Zhou
Yanping Zhang
collection DOAJ
description With the outbreak of COVID-19 prompting an increased focus on self-care, more and more people hope to obtain disease knowledge from the Internet. In response to this demand, medical question answering and question generation tasks have become an important part of natural language processing (NLP). However, samples of medical questions and answers are limited, and existing question generation systems cannot fully meet non-professionals' needs for medical questions. In this research, we propose a BERT medical pretraining model that uses GPT-2 for question augmentation and T5-Small for topic extraction, calculates the cosine similarity of the extracted topics, and uses XGBoost for prediction. With GPT-2 augmentation, the prediction accuracy of our model exceeds that of the state-of-the-art (SOTA) model. Our experimental results demonstrate the outstanding performance of our model in medical question answering and question generation tasks, and its great potential to solve other biomedical question answering challenges.
first_indexed 2024-03-10T04:57:29Z
format Article
id doaj.art-a6aafa3cc21845c6a0ff50ddffdacb8c
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-10T04:57:29Z
publishDate 2021-11-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-a6aafa3cc21845c6a0ff50ddffdacb8c
2023-11-23T02:05:02Z
eng
MDPI AG
Applied Sciences
2076-3417
2021-11-01
volume 11, issue 23, article 11251
10.3390/app112311251
DATLMedQA: A Data Augmentation and Transfer Learning Based Solution for Medical Question Answering
Shuohua Zhou (Department of Informatics, King’s College London, Strand, London WC2R 2LS, UK)
Yanping Zhang (Department of Computer Science, School of Engineering and Applied Science, Gonzaga University, Spokane, WA 99258, USA)
https://www.mdpi.com/2076-3417/11/23/11251
title DATLMedQA: A Data Augmentation and Transfer Learning Based Solution for Medical Question Answering
topic BERT
GPT-2
XGBoost
T5-Small
medical question answering
transfer learning
url https://www.mdpi.com/2076-3417/11/23/11251