Microsatellite Diversity, Complexity, and Host Range of Mycobacteriophage Genomes of the Siphoviridae Family
The incidence, distribution, and variation of simple sequence repeats (SSRs) in viruses are instrumental in understanding the functional and evolutionary aspects of repeat sequences. Full-length genome sequences retrieved from NCBI were used for extraction and analysis of repeat sequences using IMEx...
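
The repeat extraction summarized above was performed with IMEx, which also handles imperfect repeats. As a rough illustration of the underlying idea only, the following minimal sketch scans a DNA string for perfect mono- to hexa-nucleotide repeats with regular expressions. The function names and the minimum repeat counts are assumptions chosen for this example; they are not the IMEx parameters used in the article.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative values only,
# not the IMEx settings used in the article).
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """Return True if the motif is not a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect SSRs.

    Coordinates are 0-based and end-exclusive. Hypothetical helper: it finds
    perfect repeats only and does not resolve overlaps between motif sizes.
    """
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs such as 'GG' that are really a shorter repeat.
            if is_primitive(motif):
                length = match.end() - match.start()
                hits.append((match.start(), match.end(), motif, length // size))
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_perfect_ssrs("ATGGGGGGGTTCACACACAGT"):
        print(hit)
```

On the toy sequence in the usage example, the sketch reports one G mono-nucleotide run and one CA di-nucleotide run. A production tool would additionally allow mismatches and merge neighbouring repeats into compound SSRs.
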
Main Authors: | Chaudhary Mashhood Alam, Asif Iqbal, Anjana Sharma, Alan H. Schulman, Safdar Ali |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2019-03-01 |
Series: | Frontiers in Genetics |
Subjects: | Mycobacteriophage; simple sequence repeats; imperfect microsatellite extractor; dMAX; host range |
Online Access: | https://www.frontiersin.org/article/10.3389/fgene.2019.00207/full |
_version_ | 1818988464188686336 |
---|---|
author | Chaudhary Mashhood Alam; Asif Iqbal; Anjana Sharma; Alan H. Schulman; Safdar Ali |
author_sort | Chaudhary Mashhood Alam |
collection | DOAJ |
description | The incidence, distribution, and variation of simple sequence repeats (SSRs) in viruses are instrumental in understanding the functional and evolutionary aspects of repeat sequences. Full-length genome sequences retrieved from NCBI were used for extraction and analysis of repeat sequences using IMEx software. We have also developed two MATLAB-based tools for extraction of gene locations from GenBank in tabular format and simulation of this data with SSR incidence data. The present study, encompassing 147 Mycobacteriophage genomes, revealed 25,284 SSRs and 1,127 compound SSRs (cSSRs) through IMEx. Mono- to hexa-nucleotide motifs were present. The SSR count per genome ranged from 78 (M100) to 342 (M58), while cSSR incidence ranged from 1 (M138) to 17 (M28, M73). Though cSSRs were present in all the genomes, their frequency and SSR-to-cSSR conversion percentage varied from 1.08% (M138 with 93 SSRs) to 8.33% (M116 with 96 SSRs). In terms of localization, the SSRs were predominantly localized to coding regions (∼78%). Interestingly, genomes of around 50 kb contained a number of SSRs/cSSRs similar to that in a 110 kb genome, suggesting functional relevance for SSRs, which was substantiated by variation in motif constitution between species with different host range. The three species with broad host range (M97, M100, M116) have around 90% of their mono-nucleotide repeat motifs composed of G or C, and only M16 has both A and T mono-nucleotide motifs. Around 20% of the di-nucleotide repeat motifs in the genomes exhibiting a broad host range were CT/TC, which were either absent or represented to a much lesser extent in the other genomes. |
first_indexed | 2024-12-20T19:23:00Z |
format | Article |
id | doaj.art-845ddcd2187b445080f2e4e5e2f2f649 |
institution | Directory of Open Access Journals |
issn | 1664-8021 |
language | English |
last_indexed | 2024-12-20T19:23:00Z |
publishDate | 2019-03-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-845ddcd2187b445080f2e4e5e2f2f649; 2022-12-21T19:28:56Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2019-03-01; 10; 10.3389/fgene.2019.00207; 393588. Microsatellite Diversity, Complexity, and Host Range of Mycobacteriophage Genomes of the Siphoviridae Family. Authors: Chaudhary Mashhood Alam (0, 1); Asif Iqbal (2); Anjana Sharma (3); Alan H. Schulman (4, 5); Safdar Ali (6, 7). Affiliations: (0) Luke/BI Plant Genome Dynamics Lab, Institute of Biotechnology and Viikki Plant Science Centre, University of Helsinki, Helsinki, Finland; (1) Ingenious e-Brain Solutions, Gurugram, India; (2) PIRO Technologies Private Limited, New Delhi, India; (3) Department of Biomedical Sciences, SRCASW, University of Delhi, New Delhi, India; (4) Luke/BI Plant Genome Dynamics Lab, Institute of Biotechnology and Viikki Plant Science Centre, University of Helsinki, Helsinki, Finland; (5) Natural Resources Institute Finland (Luke), Helsinki, Finland; (6) Department of Biomedical Sciences, SRCASW, University of Delhi, New Delhi, India; (7) Department of Biological Sciences, Aliah University, Kolkata, India. Abstract: The incidence, distribution, and variation of simple sequence repeats (SSRs) in viruses are instrumental in understanding the functional and evolutionary aspects of repeat sequences. Full-length genome sequences retrieved from NCBI were used for extraction and analysis of repeat sequences using IMEx software. We have also developed two MATLAB-based tools for extraction of gene locations from GenBank in tabular format and simulation of this data with SSR incidence data. The present study, encompassing 147 Mycobacteriophage genomes, revealed 25,284 SSRs and 1,127 compound SSRs (cSSRs) through IMEx. Mono- to hexa-nucleotide motifs were present. The SSR count per genome ranged from 78 (M100) to 342 (M58), while cSSR incidence ranged from 1 (M138) to 17 (M28, M73). Though cSSRs were present in all the genomes, their frequency and SSR-to-cSSR conversion percentage varied from 1.08% (M138 with 93 SSRs) to 8.33% (M116 with 96 SSRs). In terms of localization, the SSRs were predominantly localized to coding regions (∼78%). Interestingly, genomes of around 50 kb contained a number of SSRs/cSSRs similar to that in a 110 kb genome, suggesting functional relevance for SSRs, which was substantiated by variation in motif constitution between species with different host range. The three species with broad host range (M97, M100, M116) have around 90% of their mono-nucleotide repeat motifs composed of G or C, and only M16 has both A and T mono-nucleotide motifs. Around 20% of the di-nucleotide repeat motifs in the genomes exhibiting a broad host range were CT/TC, which were either absent or represented to a much lesser extent in the other genomes. URL: https://www.frontiersin.org/article/10.3389/fgene.2019.00207/full. Keywords: Mycobacteriophage; simple sequence repeats; imperfect microsatellite extractor; dMAX; host range |
title | Microsatellite Diversity, Complexity, and Host Range of Mycobacteriophage Genomes of the Siphoviridae Family |
topic | Mycobacteriophage; simple sequence repeats; imperfect microsatellite extractor; dMAX; host range |
url | https://www.frontiersin.org/article/10.3389/fgene.2019.00207/full |
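
The SSR-to-cSSR conversion percentages quoted in the description field are consistent with a simple ratio: the number of cSSRs divided by the number of SSRs, times 100. The sketch below checks that arithmetic against the reported figures. The function name is hypothetical, and the cSSR count of 8 for M116 is inferred here from the reported 8.33% of 96 SSRs rather than stated in the record.

```python
def cssr_percent(cssr_count, ssr_count):
    """Percentage of SSRs that take part in compound SSRs (assumed definition)."""
    if ssr_count <= 0:
        raise ValueError("ssr_count must be positive")
    return 100.0 * cssr_count / ssr_count


if __name__ == "__main__":
    # M138: 1 cSSR among 93 SSRs, as reported in the abstract.
    print(round(cssr_percent(1, 93), 2))
    # M116: 96 SSRs; a count of 8 cSSRs is inferred from the reported 8.33%.
    print(round(cssr_percent(8, 96), 2))
    # All 147 genomes pooled: 1,127 cSSRs among 25,284 SSRs.
    print(round(cssr_percent(1127, 25284), 2))
```

Pooling all genomes gives a conversion of roughly 4.5%, which sits between the per-genome extremes reported for M138 and M116.
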