Fully-Connected Neural Networks with Reduced Parameterization for Predicting Histological Types of Lung Cancer from Somatic Mutations

Several challenges appear in the application of deep learning to genomic data. First, the dimensionality of input can be orders of magnitude greater than the number of samples, which makes the model prone to overfitting the training dataset. Second, each input variable’s contribution to the predic...

Bibliographic Details
Main Authors:	Kazuma Kobayashi, Amina Bolatkan, Shuichiro Shiina, Ryuji Hamamoto
Format:	Article
Language:	English
Published:	MDPI AG 2020-08-01
Series:	Biomolecules
Subjects:	deep learning, Diet Networks, lung cancer, interpretable neural networks
Online Access:	https://www.mdpi.com/2218-273X/10/9/1249

_version_	1827707360004538368
author	Kazuma Kobayashi, Amina Bolatkan, Shuichiro Shiina, Ryuji Hamamoto
author_facet	Kazuma Kobayashi, Amina Bolatkan, Shuichiro Shiina, Ryuji Hamamoto
author_sort	Kazuma Kobayashi
collection	DOAJ
description	Several challenges appear in the application of deep learning to genomic data. First, the dimensionality of input can be orders of magnitude greater than the number of samples, which makes the model prone to overfitting the training dataset. Second, each input variable’s contribution to the prediction is usually difficult to interpret, owing to multiple nonlinear operations. Third, genetic data features sometimes have no innate structure. To alleviate these problems, we propose a modification to Diet Networks by adding element-wise input scaling. The original Diet Networks concept can considerably reduce the number of parameters of the fully-connected layers by taking the transposed data matrix as an input to its auxiliary network. The efficacy of the proposed architecture was evaluated on a binary classification task for lung cancer histology (adenocarcinoma versus squamous cell carcinoma) based on somatic mutation profiles. The dataset consisted of 950 cases, and 5-fold cross-validation was performed to evaluate the model performance. The model achieved a prediction accuracy of around 80%, and our modification markedly stabilized the learning process. In addition, the latent representations acquired inside the model allowed us to interpret the relationships between somatic mutation sites for the prediction.
first_indexed	2024-03-10T16:44:36Z
format	Article
id	doaj.art-206b1c8352e345b78117c7e86d330ac4
institution	Directory of Open Access Journals
issn	2218-273X
language	English
last_indexed	2024-03-10T16:44:36Z
publishDate	2020-08-01
publisher	MDPI AG
record_format	Article
series	Biomolecules
spelling	doaj.art-206b1c8352e345b78117c7e86d330ac4; 2023-11-20T11:41:02Z; eng; MDPI AG; Biomolecules; 2218-273X; 2020-08-01; 10; 9; 1249; 10.3390/biom10091249; Fully-Connected Neural Networks with Reduced Parameterization for Predicting Histological Types of Lung Cancer from Somatic Mutations; Kazuma Kobayashi (0); Amina Bolatkan (1); Shuichiro Shiina (2); Ryuji Hamamoto (3); (0) Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; (1) Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; (2) Department of Diagnostic Imaging and Interventional Oncology, Graduate School of Medicine, Juntendo University, 2-1-1 Hongo, Bunkyo-ku, Tokyo 113-8421, Japan; (3) Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; https://www.mdpi.com/2218-273X/10/9/1249; deep learning; Diet Networks; lung cancer; interpretable neural networks
title	Fully-Connected Neural Networks with Reduced Parameterization for Predicting Histological Types of Lung Cancer from Somatic Mutations
topic	deep learning, Diet Networks, lung cancer, interpretable neural networks
url	https://www.mdpi.com/2218-273X/10/9/1249
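
The description field in this record says that Diet Networks reduce the parameters of the fully-connected layers by feeding the transposed data matrix to an auxiliary network, and that the article adds element-wise input scaling. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The function names, layer sizes, activations, and the exact form of the scaling (one learnable multiplier per input feature) are all assumptions made for illustration.

```python
import numpy as np

def init_params(n_features, n_train, n_aux=16, n_hidden=8, seed=0):
    """Hypothetical parameter set; the sizes are illustrative, not from the article."""
    rng = np.random.default_rng(seed)
    return {
        "A1": rng.normal(0.0, 0.1, (n_train, n_aux)),   # auxiliary net, layer 1
        "A2": rng.normal(0.0, 0.1, (n_aux, n_hidden)),  # auxiliary net, layer 2
        "scale": np.ones(n_features),                   # element-wise input scaling (assumed form)
        "w_out": rng.normal(0.0, 0.1, n_hidden),        # output layer weights
        "b_out": 0.0,                                   # output layer bias
    }

def diet_forward(X, X_train, params):
    """Forward pass of a Diet-Networks-style binary classifier (sketch)."""
    # Each row of the transposed training matrix describes one input feature
    # by its values across the training samples.
    feat_emb = X_train.T                                 # (n_features, n_train)
    # The auxiliary network predicts each feature's row of the first-layer
    # weight matrix, so that matrix is computed rather than stored.
    h_aux = np.tanh(feat_emb @ params["A1"])             # (n_features, n_aux)
    W1 = h_aux @ params["A2"]                            # (n_features, n_hidden)
    # Scale every input feature by its own multiplier before the first layer.
    X_scaled = X * params["scale"]
    hidden = np.maximum(0.0, X_scaled @ W1)              # ReLU
    logits = hidden @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logits))                 # probability of class 1

def n_trainable(params):
    """Count scalar parameters in the sketch."""
    return sum(np.size(v) for v in params.values())

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic binary mutation matrix: 100 samples x 5000 mutation sites.
    X = rng.integers(0, 2, size=(100, 5000)).astype(float)
    params = init_params(n_features=5000, n_train=100)
    probs = diet_forward(X, X, params)
    print(probs.shape, n_trainable(params), 5000 * 8)
```

In this sketch the first-layer weights depend on the auxiliary network and the per-feature scales only, so the parameter count grows with the number of training samples and auxiliary units rather than with the product of feature count and hidden width. A real training loop, the loss, and the cross-validation described in the record are omitted.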

Similar Items