Deep learning of mutation-gene-drug relations from the literature

Abstract Background Molecular biomarkers that can predict drug efficacy in cancer patients are crucial components for the advancement of precision medicine. However, identifying these molecular biomarkers remains a laborious and challenging task. Next-generation sequencing of patients and preclinica...

Bibliographic Details
Main Authors:	Kyubum Lee, Byounggun Kim, Yonghwa Choi, Sunkyu Kim, Wonho Shin, Sunwon Lee, Sungjoon Park, Seongsoon Kim, Aik Choon Tan, Jaewoo Kang
Format:	Article
Language:	English
Published:	BMC 2018-01-01
Series:	BMC Bioinformatics
Subjects:	Deep learning; Convolutional neural networks; Information extraction; Text mining; NLP; BioNLP
Online Access:	http://link.springer.com/article/10.1186/s12859-018-2029-1

author	Kyubum Lee, Byounggun Kim, Yonghwa Choi, Sunkyu Kim, Wonho Shin, Sunwon Lee, Sungjoon Park, Seongsoon Kim, Aik Choon Tan, Jaewoo Kang
collection	DOAJ
description	Abstract
Background: Molecular biomarkers that can predict drug efficacy in cancer patients are crucial components for the advancement of precision medicine. However, identifying these molecular biomarkers remains a laborious and challenging task. Next-generation sequencing of patients and preclinical models has increasingly led to the identification of novel gene-mutation-drug relations, and these results have been reported and published in the scientific literature.
Results: Here, we present two new computational methods that utilize all the PubMed articles as domain-specific background knowledge to assist in the extraction and curation of gene-mutation-drug relations from the literature. The first method uses the Biomedical Entity Search Tool (BEST) scoring results as some of the features to train the machine learning classifiers. The second method uses not only the BEST scoring results, but also word vectors, constructed from and trained on numerous documents such as PubMed abstracts and Google News articles, in a deep convolutional neural network model. Using the features obtained from both the BEST search engine scores and word vectors, we extract mutation-gene and mutation-drug relations from the literature using machine learning classifiers such as random forest and deep convolutional neural networks. Our methods achieved better results compared with the state-of-the-art methods. We used our proposed features in a simple machine learning model, and obtained F1-scores of 0.96 and 0.82 for mutation-gene and mutation-drug relation classification, respectively. We also developed a deep learning classification model using convolutional neural networks, BEST scores, and the word embeddings that are pre-trained on PubMed or Google News data. Using deep learning, the classification accuracy improved, and F1-scores of 0.96 and 0.86 were obtained for the mutation-gene and mutation-drug relations, respectively.
Conclusion: We believe that our computational methods described in this research could be used as an important tool in identifying molecular biomarkers that predict drug responses in cancer patients. We also built a database of these mutation-gene-drug relations that were extracted from all the PubMed abstracts. We believe that our database can prove to be a valuable resource for precision medicine researchers.
first_indexed	2024-12-19T10:04:56Z
format	Article
id	doaj.art-386d88c6d68142de95188c596a5735a1
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-12-19T10:04:56Z
publishDate	2018-01-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-386d88c6d68142de95188c596a5735a1 2022-12-21T20:26:32Z eng BMC BMC Bioinformatics 1471-2105 2018-01-01 191113 10.1186/s12859-018-2029-1
Deep learning of mutation-gene-drug relations from the literature
Kyubum Lee (0): Department of Computer Science and Engineering, Korea University
Byounggun Kim (1): Interdisciplinary Graduate Program in Bioinformatics, Korea University
Yonghwa Choi (2): Department of Computer Science and Engineering, Korea University
Sunkyu Kim (3): Department of Computer Science and Engineering, Korea University
Wonho Shin (4): Interdisciplinary Graduate Program in Bioinformatics, Korea University
Sunwon Lee (5): Department of Computer Science and Engineering, Korea University
Sungjoon Park (6): Department of Computer Science and Engineering, Korea University
Seongsoon Kim (7): Department of Computer Science and Engineering, Korea University
Aik Choon Tan (8): Translational Bioinformatics and Cancer Systems Biology Laboratory, Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus
Jaewoo Kang (9): Department of Computer Science and Engineering, Korea University
title	Deep learning of mutation-gene-drug relations from the literature
topic	Deep learning; Convolutional neural networks; Information extraction; Text mining; NLP; BioNLP
url	http://link.springer.com/article/10.1186/s12859-018-2029-1
