Predicting the capsid architecture of phages from metagenomic data
Tailed phages are viruses that infect bacteria and are the most abundant biological entities on Earth. Their ecological, evolutionary, and biogeochemical roles in the planet stem from their genomic diversity. Known tailed phage genomes range from 10 to 735 kilobase pairs thanks to the size variabili...
Main Authors: | Diana Y. Lee, Caitlin Bartels, Katelyn McNair, Robert A. Edwards, Manal A. Swairjo, Antoni Luque |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2022-01-01 |
Series: | Computational and Structural Biotechnology Journal |
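Subjects: | Tailed bacteriophages, Icosahedral capsids, Physical modeling, Machine learning, Metagenomes, Gut microbiome |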
Online Access: | http://www.sciencedirect.com/science/article/pii/S2001037021005419 |
_version_ | 1828088068739956736 |
---|---|
author | Diana Y. Lee; Caitlin Bartels; Katelyn McNair; Robert A. Edwards; Manal A. Swairjo; Antoni Luque |
author_sort | Diana Y. Lee |
collection | DOAJ |
description | Tailed phages are viruses that infect bacteria and are the most abundant biological entities on Earth. Their ecological, evolutionary, and biogeochemical roles in the planet stem from their genomic diversity. Known tailed phage genomes range from 10 to 735 kilobase pairs thanks to the size variability of the protective protein capsids that store them. However, the role of tailed phage capsids’ diversity in ecosystems is unclear. A fundamental gap is the difficulty of associating genomic information with viral capsids in the environment. To address this problem, here, we introduce a computational approach to predict the capsid architecture (T-number) of tailed phages using the sequence of a single gene—the major capsid protein. This approach relies on an allometric model that relates the genome length and capsid architecture of tailed phages. This allometric model was applied to isolated phage genomes to generate a library that associated major capsid proteins and putative capsid architectures. This library was used to train machine learning methods, and the most computationally scalable model investigated (random forest) was applied to human gut metagenomes. Compared to isolated phages, the analysis of gut data reveals a large abundance of mid-sized (T = 7) capsids, as expected, followed by a relatively large frequency of jumbo-like tailed phage capsids (T ≥ 25) and small capsids (T = 4) that have been under-sampled. We discussed how to increase the method’s accuracy and how to extend the approach to other viruses. The computational pipeline introduced here opens the doors to monitor the ongoing evolution and selection of viral capsids across ecosystems. |
first_indexed | 2024-04-11T05:20:36Z |
format | Article |
id | doaj.art-bb9fd56f2c484fccadd66958287af1a4 |
institution | Directory of Open Access Journals |
issn | 2001-0370 |
language | English |
last_indexed | 2024-04-11T05:20:36Z |
publishDate | 2022-01-01 |
publisher | Elsevier |
record_format | Article |
series | Computational and Structural Biotechnology Journal |
spelling | doaj.art-bb9fd56f2c484fccadd66958287af1a4; 2022-12-24T04:51:04Z; eng; Elsevier; Computational and Structural Biotechnology Journal; 2001-0370; 2022-01-01; vol. 20, pp. 721-732; Predicting the capsid architecture of phages from metagenomic data. Authors: Diana Y. Lee (0), Caitlin Bartels (1), Katelyn McNair (2), Robert A. Edwards (3), Manal A. Swairjo (4), Antoni Luque (5). Affiliations: (0) Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA. (1) Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA. (2) Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA. (3) Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Flinders Accelerator for Microbiome Exploration, Flinders University, Bedford Park, GPO Box 2100, Adelaide 5001, South Australia, Australia. (4) Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Department of Chemistry and Biochemistry, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA. (5) Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Department of Mathematics & Statistics, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Corresponding author at: Viral Information Institute, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA. URL: http://www.sciencedirect.com/science/article/pii/S2001037021005419. Keywords: Tailed bacteriophages; Icosahedral capsids; Physical modeling; Machine learning; Metagenomes; Gut microbiome |
title | Predicting the capsid architecture of phages from metagenomic data |
topic | Tailed bacteriophages; Icosahedral capsids; Physical modeling; Machine learning; Metagenomes; Gut microbiome |
url | http://www.sciencedirect.com/science/article/pii/S2001037021005419 |
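
The description refers to capsid architectures by T-number (T = 4, T = 7, T ≥ 25). As a minimal illustration, assuming the classic Caspar-Klug definition T = h² + hk + k² for non-negative integers h and k (not both zero), the sketch below lists the allowed T-numbers and snaps a continuous estimate to the nearest one. The function names `t_numbers` and `nearest_t` are hypothetical, and the raw estimate is passed in directly because this record does not give the parameters of the article's allometric model; this is not the authors' pipeline.

```python
def t_numbers(limit: int) -> list[int]:
    """List classic Caspar-Klug T-numbers, T = h^2 + h*k + k^2, up to `limit`."""
    values = set()
    h = 0
    while h * h <= limit:
        k = 0
        while h * h + h * k + k * k <= limit:
            t = h * h + h * k + k * k
            if t > 0:  # exclude the degenerate case h = k = 0
                values.add(t)
            k += 1
        h += 1
    return sorted(values)


def nearest_t(raw_estimate: float, limit: int = 100) -> int:
    """Snap a continuous estimate (hypothetical model output) to the closest valid T."""
    return min(t_numbers(limit), key=lambda t: abs(t - raw_estimate))


if __name__ == "__main__":
    print(t_numbers(25))
    print(nearest_t(6.2))
```

Snapping to the nearest allowed value is only one possible discretization; a real model might instead assign ranges of genome length to each T-number.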