Deep generative models for T cell receptor protein sequences

Probabilistic models of adaptive immune repertoire sequence distributions can be used to infer the expansion of immune cells in response to stimulus, differentiate genetic from environmental factors that determine repertoire sharing, and evaluate the suitability of various target immune sequences fo...

Bibliographic Details
Main Authors: Kristian Davidsen, Branden J Olson, William S DeWitt III, Jean Feng, Elias Harkins, Philip Bradley, Frederick A Matsen IV
Format: Article
Language: English
Published: eLife Sciences Publications Ltd 2019-09-01
Series: eLife
Subjects: T cell receptor; variational autoencoder; repertoire modeling; vaccine; T cell expansion
Online Access: https://elifesciences.org/articles/46935
_version_ 1818019896252956672
author Kristian Davidsen
Branden J Olson
William S DeWitt III
Jean Feng
Elias Harkins
Philip Bradley
Frederick A Matsen IV
author_facet Kristian Davidsen
Branden J Olson
William S DeWitt III
Jean Feng
Elias Harkins
Philip Bradley
Frederick A Matsen IV
author_sort Kristian Davidsen
collection DOAJ
description Probabilistic models of adaptive immune repertoire sequence distributions can be used to infer the expansion of immune cells in response to stimulus, differentiate genetic from environmental factors that determine repertoire sharing, and evaluate the suitability of various target immune sequences for stimulation via vaccination. Classically, these models are defined in terms of a probabilistic V(D)J recombination model which is sometimes combined with a selection model. In this paper we take a different approach, fitting variational autoencoder (VAE) models parameterized by deep neural networks to T cell receptor (TCR) repertoires. We show that simple VAE models can perform accurate cohort frequency estimation, learn the rules of VDJ recombination, and generalize well to unseen sequences. Further, we demonstrate that VAE-like models can distinguish between real sequences and sequences generated according to a recombination-selection model, and that many characteristics of VAE-generated sequences are similar to those of real sequences.
first_indexed 2024-04-14T07:58:48Z
format Article
id doaj.art-c830ffb8a0de4a7489ad9c4b44a47423
institution Directory Open Access Journal
issn 2050-084X
language English
last_indexed 2024-04-14T07:58:48Z
publishDate 2019-09-01
publisher eLife Sciences Publications Ltd
record_format Article
series eLife
spelling doaj.art-c830ffb8a0de4a7489ad9c4b44a47423
2022-12-22T02:04:58Z
eng
eLife Sciences Publications Ltd
eLife
2050-084X
2019-09-01
8
10.7554/eLife.46935
Deep generative models for T cell receptor protein sequences
Kristian Davidsen (0) https://orcid.org/0000-0002-3821-6902
Branden J Olson (1) https://orcid.org/0000-0003-1951-8822
William S DeWitt III (2) https://orcid.org/0000-0002-6802-9139
Jean Feng (3) https://orcid.org/0000-0003-2041-3104
Elias Harkins (4)
Philip Bradley (5) https://orcid.org/0000-0002-0224-6464
Frederick A Matsen IV (6) https://orcid.org/0000-0003-0607-6025
Affiliation (authors 0-6): University of Washington, Seattle, United States; Fred Hutchinson Cancer Research Center, Seattle, United States
Probabilistic models of adaptive immune repertoire sequence distributions can be used to infer the expansion of immune cells in response to stimulus, differentiate genetic from environmental factors that determine repertoire sharing, and evaluate the suitability of various target immune sequences for stimulation via vaccination. Classically, these models are defined in terms of a probabilistic V(D)J recombination model which is sometimes combined with a selection model. In this paper we take a different approach, fitting variational autoencoder (VAE) models parameterized by deep neural networks to T cell receptor (TCR) repertoires. We show that simple VAE models can perform accurate cohort frequency estimation, learn the rules of VDJ recombination, and generalize well to unseen sequences. Further, we demonstrate that VAE-like models can distinguish between real sequences and sequences generated according to a recombination-selection model, and that many characteristics of VAE-generated sequences are similar to those of real sequences.
https://elifesciences.org/articles/46935
T cell receptor
variational autoencoder
repertoire modeling
vaccine
T cell expansion
spellingShingle Kristian Davidsen
Branden J Olson
William S DeWitt III
Jean Feng
Elias Harkins
Philip Bradley
Frederick A Matsen IV
Deep generative models for T cell receptor protein sequences
eLife
T cell receptor
variational autoencoder
repertoire modeling
vaccine
T cell expansion
title Deep generative models for T cell receptor protein sequences
title_full Deep generative models for T cell receptor protein sequences
title_fullStr Deep generative models for T cell receptor protein sequences
title_full_unstemmed Deep generative models for T cell receptor protein sequences
title_short Deep generative models for T cell receptor protein sequences
title_sort deep generative models for t cell receptor protein sequences
topic T cell receptor
variational autoencoder
repertoire modeling
vaccine
T cell expansion
url https://elifesciences.org/articles/46935
work_keys_str_mv AT kristiandavidsen deepgenerativemodelsfortcellreceptorproteinsequences
AT brandenjolson deepgenerativemodelsfortcellreceptorproteinsequences
AT williamsdewittiii deepgenerativemodelsfortcellreceptorproteinsequences
AT jeanfeng deepgenerativemodelsfortcellreceptorproteinsequences
AT eliasharkins deepgenerativemodelsfortcellreceptorproteinsequences
AT philipbradley deepgenerativemodelsfortcellreceptorproteinsequences
AT frederickamatseniv deepgenerativemodelsfortcellreceptorproteinsequences