Improved prediction of MHC-peptide binding using protein language models

Bibliographic Details
Main Authors: Nasser Hashemi, Boran Hao, Mikhail Ignatov, Ioannis Ch. Paschalidis, Pirooz Vakili, Sandor Vajda, Dima Kozakov
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-08-01
Series: Frontiers in Bioinformatics
Subjects: MHC class I; deep learning; transformers; natural language processing; cellular immune system
Online Access: https://www.frontiersin.org/articles/10.3389/fbinf.2023.1207380/full
author Nasser Hashemi
Boran Hao
Mikhail Ignatov
Ioannis Ch. Paschalidis
Pirooz Vakili
Sandor Vajda
Dima Kozakov
author_sort Nasser Hashemi
collection DOAJ
description Major histocompatibility complex Class I (MHC-I) molecules bind to peptides derived from intracellular antigens and present them on the surface of cells, allowing the immune system (T cells) to detect them. Elucidating this presentation process is essential for the regulation and potential manipulation of the cellular immune system. Predicting whether a given peptide binds to an MHC molecule is an important step in this process and has motivated many computational approaches to the problem. NetMHCpan, a pan-specific model for predicting the binding of peptides to any MHC molecule, is one of the most widely used methods; it addresses this binary classification problem using shallow neural networks. The recent success of Deep Learning (DL) methods, especially Natural Language Processing (NLP)-based pretrained models, in various applications, including protein structure determination, motivated us to explore their use for this problem. Specifically, we consider the application of deep learning models pretrained on large datasets of protein sequences to predict MHC Class I-peptide binding. Using the standard performance metrics in this area, and the same training and test sets, we show that our models outperform NetMHCpan4.1, currently considered the state of the art.
first_indexed 2024-03-12T14:26:47Z
format Article
id doaj.art-e4ac602f29ef4f8cb40a092b72c4058b
institution Directory of Open Access Journals
issn 2673-7647
language English
last_indexed 2024-03-12T14:26:47Z
publishDate 2023-08-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Bioinformatics
spelling doaj.art-e4ac602f29ef4f8cb40a092b72c4058b
2023-08-18T05:12:08Z
eng
Frontiers Media S.A.
Frontiers in Bioinformatics
2673-7647
2023-08-01
volume 3
doi 10.3389/fbinf.2023.1207380
article 1207380
Nasser Hashemi (Division of Systems Engineering, Boston University, Boston, MA, United States)
Boran Hao (Department of Electrical and Computer Engineering, Boston University, Boston, MA, United States)
Mikhail Ignatov (Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, United States; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, United States)
Ioannis Ch. Paschalidis (Division of Systems Engineering, Boston University, Boston, MA, United States; Department of Electrical and Computer Engineering, Boston University, Boston, MA, United States; Department of Biomedical Engineering, Boston University, Boston, MA, United States)
Pirooz Vakili (Division of Systems Engineering, Boston University, Boston, MA, United States)
Sandor Vajda (Division of Systems Engineering, Boston University, Boston, MA, United States; Department of Biomedical Engineering, Boston University, Boston, MA, United States; Department of Chemistry, Boston University, Boston, MA, United States)
Dima Kozakov (Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, United States; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, United States; Department of Biomedical Engineering, Boston University, Boston, MA, United States)
title Improved prediction of MHC-peptide binding using protein language models
topic MHC class I
deep learning
transformers
natural language processing
cellular immune system
url https://www.frontiersin.org/articles/10.3389/fbinf.2023.1207380/full