Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery

Abstract Analysis of compound–protein interactions (CPIs) has become a crucial prerequisite for drug discovery and drug repositioning. In vitro experiments are commonly used in identifying CPIs, but it is not feasible to discover the molecular and proteomic space only through experimental approaches...


Bibliographic Details
Main Authors:	Munhwan Lee, Hyeyeon Kim, Hyunwhan Joe, Hong-Gee Kim
Format:	Article
Language:	English
Published:	BMC 2019-07-01
Series:	Journal of Cheminformatics
Subjects:	Deep neural networks Machine learning Compound–protein interaction Proteochemometrics Cheminformatics
Online Access:	http://link.springer.com/article/10.1186/s13321-019-0368-1

_version_	1831714533577588736
author	Munhwan Lee Hyeyeon Kim Hyunwhan Joe Hong-Gee Kim
author_facet	Munhwan Lee Hyeyeon Kim Hyunwhan Joe Hong-Gee Kim
author_sort	Munhwan Lee
collection	DOAJ
description	Abstract Analysis of compound–protein interactions (CPIs) has become a crucial prerequisite for drug discovery and drug repositioning. In vitro experiments are commonly used in identifying CPIs, but it is not feasible to discover the molecular and proteomic space only through experimental approaches. Machine learning’s advances in predicting CPIs have made significant contributions to drug discovery. Deep neural networks (DNNs), which have recently been applied to predict CPIs, performed better than other shallow classifiers. However, such techniques commonly require a considerable volume of dense data for each training target. Although the number of publicly available CPI data has grown rapidly, public data is still sparse and has a large number of measurement errors. In this paper, we propose a novel method, Multi-channel PINN, to fully utilize sparse data in terms of representation learning. With representation learning, Multi-channel PINN can utilize three approaches of DNNs which are a classifier, a feature extractor, and an end-to-end learner. Multi-channel PINN can be fed with both low and high levels of representations and incorporates each of them by utilizing all approaches within a single model. To fully utilize sparse public data, we additionally explore the potential of transferring representations from training tasks to test tasks. As a proof of concept, Multi-channel PINN was evaluated on fifteen combinations of feature pairs to investigate how they affect the performance in terms of highest performance, initial performance, and convergence speed. The experimental results obtained indicate that the multi-channel models using protein features performed better than single-channel models or multi-channel models using compound features. Therefore, Multi-channel PINN can be advantageous when used with appropriate representations. Additionally, we pretrained models on a training task then finetuned them on a test task to figure out whether Multi-channel PINN can capture general representations for compounds and proteins. We found that there were significant differences in performance between pretrained models and non-pretrained models.
first_indexed	2024-12-21T00:01:15Z
format	Article
id	doaj.art-5918a419c0d742fe8da8eb821a2da4bd
institution	Directory Open Access Journal
issn	1758-2946
language	English
last_indexed	2024-12-21T00:01:15Z
publishDate	2019-07-01
publisher	BMC
record_format	Article
series	Journal of Cheminformatics
spelling	doaj.art-5918a419c0d742fe8da8eb821a2da4bd 2022-12-21T19:22:36Z eng BMC Journal of Cheminformatics 1758-2946 2019-07-01 111116 10.1186/s13321-019-0368-1 Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery Munhwan Lee0 Hyeyeon Kim1 Hyunwhan Joe2 Hong-Gee Kim3 Biomedical Knowledge Engineering Laboratory, Seoul National University Biomedical Knowledge Engineering Laboratory, Seoul National University Biomedical Knowledge Engineering Laboratory, Seoul National University Biomedical Knowledge Engineering Laboratory, Seoul National University Abstract Analysis of compound–protein interactions (CPIs) has become a crucial prerequisite for drug discovery and drug repositioning. In vitro experiments are commonly used in identifying CPIs, but it is not feasible to discover the molecular and proteomic space only through experimental approaches. Machine learning’s advances in predicting CPIs have made significant contributions to drug discovery. Deep neural networks (DNNs), which have recently been applied to predict CPIs, performed better than other shallow classifiers. However, such techniques commonly require a considerable volume of dense data for each training target. Although the number of publicly available CPI data has grown rapidly, public data is still sparse and has a large number of measurement errors. In this paper, we propose a novel method, Multi-channel PINN, to fully utilize sparse data in terms of representation learning. With representation learning, Multi-channel PINN can utilize three approaches of DNNs which are a classifier, a feature extractor, and an end-to-end learner. Multi-channel PINN can be fed with both low and high levels of representations and incorporates each of them by utilizing all approaches within a single model. To fully utilize sparse public data, we additionally explore the potential of transferring representations from training tasks to test tasks. As a proof of concept, Multi-channel PINN was evaluated on fifteen combinations of feature pairs to investigate how they affect the performance in terms of highest performance, initial performance, and convergence speed. The experimental results obtained indicate that the multi-channel models using protein features performed better than single-channel models or multi-channel models using compound features. Therefore, Multi-channel PINN can be advantageous when used with appropriate representations. Additionally, we pretrained models on a training task then finetuned them on a test task to figure out whether Multi-channel PINN can capture general representations for compounds and proteins. We found that there were significant differences in performance between pretrained models and non-pretrained models. http://link.springer.com/article/10.1186/s13321-019-0368-1 Deep neural networks Machine learning Compound–protein interaction Proteochemometrics Cheminformatics
spellingShingle	Munhwan Lee Hyeyeon Kim Hyunwhan Joe Hong-Gee Kim Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery Journal of Cheminformatics Deep neural networks Machine learning Compound–protein interaction Proteochemometrics Cheminformatics
title	Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery
title_full	Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery
title_fullStr	Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery
title_full_unstemmed	Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery
title_short	Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery
title_sort	multi channel pinn investigating scalable and transferable neural networks for drug discovery
topic	Deep neural networks Machine learning Compound–protein interaction Proteochemometrics Cheminformatics
url	http://link.springer.com/article/10.1186/s13321-019-0368-1
work_keys_str_mv	AT munhwanlee multichannelpinninvestigatingscalableandtransferableneuralnetworksfordrugdiscovery AT hyeyeonkim multichannelpinninvestigatingscalableandtransferableneuralnetworksfordrugdiscovery AT hyunwhanjoe multichannelpinninvestigatingscalableandtransferableneuralnetworksfordrugdiscovery AT honggeekim multichannelpinninvestigatingscalableandtransferableneuralnetworksfordrugdiscovery

Similar Items