Multiple instance neural networks based on sparse attention for cancer detection using T-cell receptor sequences



Bibliographic Details
Main Authors: Younghoon Kim, Tao Wang, Danyi Xiong, Xinlei Wang, Seongoh Park
Format: Article
Language: English
Published: BMC 2022-11-01
Series: BMC Bioinformatics
Subjects: Multiple instance learning; Instance selection; Primary instance; Sparsemax
Online Access: https://doi.org/10.1186/s12859-022-05012-2
author Younghoon Kim
Tao Wang
Danyi Xiong
Xinlei Wang
Seongoh Park
collection DOAJ
description Abstract Early detection of cancer has been widely studied because of its paramount importance in biomedicine. Among the different types of data used for this purpose, T cell receptors (TCRs) have recently come into the spotlight owing to the growing appreciation of the role of the host immune system in tumor biology. However, the one-to-many correspondence between a patient and multiple TCR sequences prevents researchers from simply adopting classical statistical or machine learning methods. Recent work has modeled this type of data in the framework of multiple instance learning (MIL). Although applying MIL to cancer detection from TCR sequences is novel and has shown adequate performance in several tumor types, there is still room for improvement, especially for certain cancer types. Furthermore, explainable neural network models have not been fully investigated for this application. In this article, we propose multiple instance neural networks based on sparse attention (MINN-SA) to enhance both performance and explainability in cancer detection. The sparse attention structure drops out uninformative instances in each bag, achieving both interpretability and better predictive performance in combination with the skip connection. Our experiments show that MINN-SA yields the highest average area under the ROC curve across 10 different types of cancer, compared with existing MIL approaches. Moreover, the estimated attention weights show that MINN-SA can identify the TCRs that are specific for tumor antigens within the same T cell repertoire.
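The abstract names sparse attention (the record's keywords mention sparsemax) as the mechanism that drops uninformative instances from a bag. The following is a minimal sketch of that general idea, not the authors' MINN-SA implementation: the scoring form `tanh(H @ V) @ w`, the function name `sparse_attention_pool`, and the random parameters are all assumptions made for illustration, and the skip connection mentioned in the abstract is not modeled.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of a score vector onto the probability simplex.

    Unlike softmax, the result can contain exact zeros.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum    # True for the first k(z) entries
    k_max = k[support][-1]                 # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max  # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)


def sparse_attention_pool(H, V, w):
    """Pool a bag of instance embeddings H (n x d) into one vector.

    V (d x m) and w (m,) are hypothetical attention parameters; in a real
    model they would be learned rather than drawn at random.
    """
    scores = np.tanh(H @ V) @ w            # one scalar score per instance
    attn = sparsemax(scores)               # sparse weights that sum to 1
    return attn @ H, attn


# Toy bag: 6 instances (e.g. encoded TCR sequences) with 4 features each.
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))
V = rng.normal(size=(4, 3))
w = rng.normal(size=3)
bag_vector, attn = sparse_attention_pool(H, V, w)
print(attn)
```

Instances whose attention weight is exactly zero contribute nothing to the bag representation, which is what makes the surviving weights readable as an instance selection.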
first_indexed 2024-04-13T20:32:49Z
format Article
id doaj.art-e56c1ff0b1594ab8853966c341b2fbde
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-13T20:32:49Z
publishDate 2022-11-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-e56c1ff0b1594ab8853966c341b2fbde 2022-12-22T02:31:08Z eng BMC BMC Bioinformatics 1471-2105 2022-11-01 23(1):1-17 10.1186/s12859-022-05012-2
Multiple instance neural networks based on sparse attention for cancer detection using T-cell receptor sequences
Younghoon Kim (Department of Industrial and Management Systems Engineering, Kyung Hee University)
Tao Wang (Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center)
Danyi Xiong (Department of Statistical Science, Southern Methodist University)
Xinlei Wang (Department of Statistical Science, Southern Methodist University)
Seongoh Park (School of Mathematics, Statistics and Data Science, Sungshin Women’s University)
https://doi.org/10.1186/s12859-022-05012-2
Multiple instance learning; Instance selection; Primary instance; Sparsemax
title Multiple instance neural networks based on sparse attention for cancer detection using T-cell receptor sequences
topic Multiple instance learning
Instance selection
Primary instance
Sparsemax
url https://doi.org/10.1186/s12859-022-05012-2