Multiple instance neural networks based on sparse attention for cancer detection using T-cell receptor sequences
Abstract Early detection of cancers has been widely explored because of its paramount importance in biomedicine. Among the different types of data used for this task, T cell receptors (TCRs) have recently come into the spotlight owing to the growing appreciation of the roles of...
Main Authors: | Younghoon Kim, Tao Wang, Danyi Xiong, Xinlei Wang, Seongoh Park |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2022-11-01 |
Series: | BMC Bioinformatics |
Subjects: | Multiple instance learning, Instance selection, Primary instance, Sparsemax |
Online Access: | https://doi.org/10.1186/s12859-022-05012-2 |
_version_ | 1817970331348893696 |
---|---|
author | Younghoon Kim; Tao Wang; Danyi Xiong; Xinlei Wang; Seongoh Park |
author_facet | Younghoon Kim; Tao Wang; Danyi Xiong; Xinlei Wang; Seongoh Park |
author_sort | Younghoon Kim |
collection | DOAJ |
description | Abstract Early detection of cancers has been widely explored because of its paramount importance in biomedicine. Among the different types of data used for this task, T cell receptors (TCRs) have recently come into the spotlight owing to the growing appreciation of the role of the host immune system in tumor biology. However, the one-to-many correspondence between a patient and multiple TCR sequences prevents researchers from simply adopting classical statistical or machine learning methods. Recent work has modeled this type of data in the context of multiple instance learning (MIL). Despite the novel application of MIL to cancer detection using TCR sequences and its demonstrated adequate performance in several tumor types, there is still room for improvement, especially for certain cancer types. Furthermore, explainable neural network models have not been fully investigated for this application. In this article, we propose multiple instance neural networks based on sparse attention (MINN-SA) to enhance both the performance and the explainability of cancer detection. The sparse attention structure drops out uninformative instances in each bag, achieving both interpretability and better predictive performance in combination with the skip connection. Our experiments show that MINN-SA yields the highest area under the ROC curve on average across 10 different types of cancer, compared with existing MIL approaches. Moreover, we observe from the estimated attention weights that MINN-SA can identify the TCRs that are specific for tumor antigens in the same T cell repertoire. |
first_indexed | 2024-04-13T20:32:49Z |
format | Article |
id | doaj.art-e56c1ff0b1594ab8853966c341b2fbde |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-04-13T20:32:49Z |
publishDate | 2022-11-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-e56c1ff0b1594ab8853966c341b2fbde; 2022-12-22T02:31:08Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2022-11-01; 231117; 10.1186/s12859-022-05012-2; Multiple instance neural networks based on sparse attention for cancer detection using T-cell receptor sequences; Younghoon Kim (Department of Industrial and Management Systems Engineering, Kyung Hee University); Tao Wang (Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center); Danyi Xiong (Department of Statistical Science, Southern Methodist University); Xinlei Wang (Department of Statistical Science, Southern Methodist University); Seongoh Park (School of Mathematics, Statistics and Data Science, Sungshin Women’s University); https://doi.org/10.1186/s12859-022-05012-2; Multiple instance learning; Instance selection; Primary instance; Sparsemax |
spellingShingle | Younghoon Kim; Tao Wang; Danyi Xiong; Xinlei Wang; Seongoh Park; Multiple instance neural networks based on sparse attention for cancer detection using T-cell receptor sequences; BMC Bioinformatics; Multiple instance learning; Instance selection; Primary instance; Sparsemax |
title | Multiple instance neural networks based on sparse attention for cancer detection using T-cell receptor sequences |
title_sort | multiple instance neural networks based on sparse attention for cancer detection using t cell receptor sequences |
topic | Multiple instance learning; Instance selection; Primary instance; Sparsemax |
url | https://doi.org/10.1186/s12859-022-05012-2 |
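
The abstract describes a sparse attention structure that drops uninformative instances from each bag, and the record lists Sparsemax as a keyword. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function names `sparsemax` and `sparse_attention_pool`, the toy scores, and the plain weighted-sum pooling are assumptions made for illustration, and the paper's scoring network and skip connection are not reproduced here.

```python
import numpy as np


def sparsemax(z):
    """Project a score vector onto the probability simplex (sparsemax).

    Unlike softmax, the result can contain exact zeros, so low-scoring
    instances receive no attention weight at all.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    # The support is the largest k with 1 + k * z_(k) > sum of the top-k scores.
    support = 1.0 + k * z_sorted > cumsum
    k_z = k[support][-1]
    tau = (cumsum[k_z - 1] - 1.0) / k_z    # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)


def sparse_attention_pool(instances, scores):
    """Pool a bag of instance embeddings using sparsemax attention weights.

    instances: array of shape (n_instances, n_features), hypothetical embeddings
    scores:    array of shape (n_instances,), hypothetical attention scores
    """
    weights = sparsemax(scores)
    bag_embedding = weights @ np.asarray(instances, dtype=float)
    return bag_embedding, weights


if __name__ == "__main__":
    # Toy bag: three instances with two features each (made-up numbers).
    bag = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    pooled, attn = sparse_attention_pool(bag, [2.0, 1.0, -1.0])
    print("attention weights:", attn)
    print("bag embedding:", pooled)
```

In this toy bag the first score dominates, so sparsemax assigns it the full weight and sets the other two weights to exactly zero; the zero-weight instances are the ones that would be read as uninformative for the bag-level prediction.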