Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature

Abstract Background: Drug-drug interaction (DDI) information retrieval (IR) from the PubMed literature is an important natural language processing (NLP) task. This study is the first to examine active learning (AL) in DDI IR analysis. DDI IR analysis from PubMed abstracts faces the challenges of relatively...

Bibliographic Details
Main Authors:	Weixin Xie, Kunjie Fan, Shijun Zhang, Lang Li
Format:	Article
Language:	English
Published:	BMC 2023-05-01
Series:	Journal of Biomedical Semantics
Subjects:	Active learning; Deep learning; Drug-drug interaction; Information retrieval; Random negative sampling; Positive sampling
Online Access:	https://doi.org/10.1186/s13326-023-00287-7

_version_	1797811174836797440
author	Weixin Xie, Kunjie Fan, Shijun Zhang, Lang Li
author_facet	Weixin Xie, Kunjie Fan, Shijun Zhang, Lang Li
author_sort	Weixin Xie
collection	DOAJ
description	Abstract Background: Drug-drug interaction (DDI) information retrieval (IR) from the PubMed literature is an important natural language processing (NLP) task. This study is the first to examine active learning (AL) in DDI IR analysis. DDI IR analysis of PubMed abstracts faces the challenge of relatively few positive DDI samples among an overwhelmingly large number of negative samples. Random negative sampling and positive sampling are purposely designed to improve the efficiency of AL analysis. The consistency of random negative sampling and positive sampling is shown in the paper. Results: PubMed abstracts are divided into two pools. The screened pool contains all abstracts that pass the DDI keyword query in PubMed, while the unscreened pool includes all other abstracts. DDI IR analysis precision is evaluated and compared at a prespecified recall rate of 0.95. In the screened pool IR analysis using a support vector machine (SVM), similarity sampling plus uncertainty sampling improves the precision over uncertainty sampling alone, from 0.89 to 0.92. In the unscreened pool IR analysis, integrating random negative sampling, positive sampling, and similarity sampling improves the precision over uncertainty sampling alone, from 0.72 to 0.81. When the SVM is replaced with a deep learning method, all sampling schemes consistently improve DDI AL analysis in both the screened pool and the unscreened pool. Deep learning significantly improves precision over SVM: 0.96 vs. 0.92 in the screened pool and 0.90 vs. 0.81 in the unscreened pool. Conclusions: Integrating various sampling schemes and deep learning algorithms into AL significantly improves DDI IR analysis from the literature. Random negative sampling and positive sampling are highly effective methods for improving AL analysis where the positive and negative samples are extremely imbalanced.
first_indexed	2024-03-13T07:19:47Z
format	Article
id	doaj.art-189d0ce3178c4a2d85c8f2c51d63d498
institution	Directory Open Access Journal
issn	2041-1480
language	English
last_indexed	2024-03-13T07:19:47Z
publishDate	2023-05-01
publisher	BMC
record_format	Article
series	Journal of Biomedical Semantics
spelling	doaj.art-189d0ce3178c4a2d85c8f2c51d63d498 2023-06-04T11:42:22Z eng BMC Journal of Biomedical Semantics 2041-1480 2023-05-01 141112 10.1186/s13326-023-00287-7 Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature. Weixin Xie, Kunjie Fan, Shijun Zhang, Lang Li (all: Department of Biomedical Informatics, Ohio State University). https://doi.org/10.1186/s13326-023-00287-7 Active learning; Deep learning; Drug-drug interaction; Information retrieval; Random negative sampling; Positive sampling
title	Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature
topic	Active learning; Deep learning; Drug-drug interaction; Information retrieval; Random negative sampling; Positive sampling
url	https://doi.org/10.1186/s13326-023-00287-7
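
The abstract above reports precision at a prespecified recall rate of 0.95. As a rough illustration of what that metric measures, the sketch below ranks documents by classifier score and returns the precision at the first cutoff where recall reaches the target. The function name `precision_at_recall` and the toy scores and labels are hypothetical, tied scores are not given special handling, and this is not the authors' evaluation code.

```python
def precision_at_recall(scores, labels, target_recall=0.95):
    """Precision at the shortest ranked list whose recall meets target_recall.

    scores: classifier scores, higher means more likely positive (assumed).
    labels: 1 for a positive (e.g. DDI-relevant) document, 0 otherwise.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank documents from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for n_retrieved, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= target_recall:
            # Recall target reached: report precision over the retrieved set.
            return true_pos / n_retrieved
    raise ValueError("target_recall must be at most 1.0")


# Toy data (hypothetical): three positives among six documents.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(precision_at_recall(scores, labels, target_recall=0.95))  # 0.75
```

With three positives, a recall of 0.95 requires retrieving all of them, which happens at the fourth ranked document, so three of the four retrieved documents are relevant.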

Similar Items