Predicting the capsid architecture of phages from metagenomic data

Bibliographic Details
Main Authors: Diana Y. Lee, Caitlin Bartels, Katelyn McNair, Robert A. Edwards, Manal A. Swairjo, Antoni Luque
Format: Article
Language: English
Published: Elsevier 2022-01-01
Series: Computational and Structural Biotechnology Journal
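Subjects: Tailed bacteriophages; Icosahedral capsids; Physical modeling; Machine learning; Metagenomes; Gut microbiome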
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037021005419
author Diana Y. Lee
Caitlin Bartels
Katelyn McNair
Robert A. Edwards
Manal A. Swairjo
Antoni Luque
author_sort Diana Y. Lee
collection DOAJ
description Tailed phages are viruses that infect bacteria and are the most abundant biological entities on Earth. Their ecological, evolutionary, and biogeochemical roles on the planet stem from their genomic diversity. Known tailed phage genomes range from 10 to 735 kilobase pairs thanks to the size variability of the protective protein capsids that store them. However, the role of tailed phage capsid diversity in ecosystems is unclear. A fundamental gap is the difficulty of associating genomic information with viral capsids in the environment. To address this problem, we introduce a computational approach to predict the capsid architecture (T-number) of tailed phages using the sequence of a single gene, the major capsid protein. This approach relies on an allometric model that relates the genome length and capsid architecture of tailed phages. This allometric model was applied to isolated phage genomes to generate a library that associates major capsid proteins with putative capsid architectures. This library was used to train machine learning methods, and the most computationally scalable model investigated (random forest) was applied to human gut metagenomes. Compared to isolated phages, the analysis of gut data reveals a large abundance of mid-sized (T = 7) capsids, as expected, followed by a relatively large frequency of jumbo-like tailed phage capsids (T ≥ 25) and small capsids (T = 4) that have been under-sampled. We discuss how to increase the method’s accuracy and how to extend the approach to other viruses. The computational pipeline introduced here opens the door to monitoring the ongoing evolution and selection of viral capsids across ecosystems.
first_indexed 2024-04-11T05:20:36Z
format Article
id doaj.art-bb9fd56f2c484fccadd66958287af1a4
institution Directory of Open Access Journals
issn 2001-0370
language English
last_indexed 2024-04-11T05:20:36Z
publishDate 2022-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj.art-bb9fd56f2c484fccadd66958287af1a4
2022-12-24T04:51:04Z
eng
Elsevier
Computational and Structural Biotechnology Journal
2001-0370
2022-01-01
Volume 20, pages 721-732
Predicting the capsid architecture of phages from metagenomic data
Diana Y. Lee [0]
Caitlin Bartels [1]
Katelyn McNair [2]
Robert A. Edwards [3]
Manal A. Swairjo [4]
Antoni Luque [5]
[0] Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA
[1] Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA
[2] Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA
[3] Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Flinders Accelerator for Microbiome Exploration, Flinders University, Bedford Park, GPO Box 2100, Adelaide 5001, South Australia, Australia
[4] Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Department of Chemistry and Biochemistry, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA
[5] Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Department of Mathematics & Statistics, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Corresponding author at: Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA.
Tailed phages are viruses that infect bacteria and are the most abundant biological entities on Earth. Their ecological, evolutionary, and biogeochemical roles on the planet stem from their genomic diversity. Known tailed phage genomes range from 10 to 735 kilobase pairs thanks to the size variability of the protective protein capsids that store them. However, the role of tailed phage capsid diversity in ecosystems is unclear. A fundamental gap is the difficulty of associating genomic information with viral capsids in the environment. To address this problem, we introduce a computational approach to predict the capsid architecture (T-number) of tailed phages using the sequence of a single gene, the major capsid protein. This approach relies on an allometric model that relates the genome length and capsid architecture of tailed phages. This allometric model was applied to isolated phage genomes to generate a library that associates major capsid proteins with putative capsid architectures. This library was used to train machine learning methods, and the most computationally scalable model investigated (random forest) was applied to human gut metagenomes. Compared to isolated phages, the analysis of gut data reveals a large abundance of mid-sized (T = 7) capsids, as expected, followed by a relatively large frequency of jumbo-like tailed phage capsids (T ≥ 25) and small capsids (T = 4) that have been under-sampled. We discuss how to increase the method’s accuracy and how to extend the approach to other viruses. The computational pipeline introduced here opens the door to monitoring the ongoing evolution and selection of viral capsids across ecosystems.
http://www.sciencedirect.com/science/article/pii/S2001037021005419
Tailed bacteriophages
Icosahedral capsids
Physical modeling
Machine learning
Metagenomes
Gut microbiome
title Predicting the capsid architecture of phages from metagenomic data
topic Tailed bacteriophages
Icosahedral capsids
Physical modeling
Machine learning
Metagenomes
Gut microbiome
url http://www.sciencedirect.com/science/article/pii/S2001037021005419
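
The description field says the approach relies on an allometric model relating genome length to capsid architecture (T-number). The record does not give the model's form or fitted parameters, so the following is only a minimal sketch under stated assumptions: it enumerates the icosahedral T-numbers allowed by the Caspar-Klug rule T = h^2 + hk + k^2 and snaps a continuous estimate to the nearest allowed value. The power-law form, the function names, and the prefactor and exponent values are hypothetical placeholders, not the authors' published model.

```python
import math


def valid_t_numbers(t_max):
    """Return the sorted Caspar-Klug T-numbers T = h^2 + h*k + k^2 up to t_max."""
    found = set()
    # T >= h^2, so h never needs to exceed floor(sqrt(t_max)).
    for h in range(1, math.isqrt(t_max) + 1):
        for k in range(0, h + 1):  # k <= h avoids counting (h, k) and (k, h) twice
            t = h * h + h * k + k * k
            if t <= t_max:
                found.add(t)
    return sorted(found)


def raw_t_from_genome_length(genome_kbp, prefactor, exponent):
    """Hypothetical power law T_raw = prefactor * genome_kbp ** exponent.

    ASSUMPTION: both the functional form and the parameters are placeholders;
    the fitted allometric model is not given in this record.
    """
    return prefactor * genome_kbp ** exponent


def nearest_valid_t(t_raw, allowed):
    """Snap a continuous estimate to the closest allowed T-number."""
    return min(allowed, key=lambda t: abs(t - t_raw))


if __name__ == "__main__":
    allowed = valid_t_numbers(30)
    print(allowed)
    # Illustrative parameters only, not fitted values.
    t_raw = raw_t_from_genome_length(8.0, prefactor=1.0, exponent=2.0 / 3.0)
    print(nearest_valid_t(t_raw, allowed))
```

The exponent 2/3 in the usage lines reflects only a naive geometric expectation (capsid surface scales with T, enclosed volume with genome length); any real use would need the parameters reported in the article itself.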