Microsatellite Diversity, Complexity, and Host Range of Mycobacteriophage Genomes of the Siphoviridae Family

The incidence, distribution, and variation of simple sequence repeats (SSRs) in viruses are instrumental in understanding the functional and evolutionary aspects of repeat sequences. Full-length genome sequences retrieved from NCBI were used for extraction and analysis of repeat sequences using IMEx...

Bibliographic Details
Main Authors: Chaudhary Mashhood Alam, Asif Iqbal, Anjana Sharma, Alan H. Schulman, Safdar Ali
Format: Article
Language: English
Published: Frontiers Media S.A. 2019-03-01
Series: Frontiers in Genetics
Subjects: Mycobacteriophage; simple sequence repeats; imperfect microsatellite extractor; dMAX; host range
Online Access: https://www.frontiersin.org/article/10.3389/fgene.2019.00207/full
author Chaudhary Mashhood Alam
Asif Iqbal
Anjana Sharma
Alan H. Schulman
Safdar Ali
collection DOAJ
description The incidence, distribution, and variation of simple sequence repeats (SSRs) in viruses are instrumental in understanding the functional and evolutionary aspects of repeat sequences. Full-length genome sequences retrieved from NCBI were used for extraction and analysis of repeat sequences using IMEx software. We have also developed two MATLAB-based tools for extraction of gene locations from GenBank in tabular format and simulation of this data with SSR incidence data. The present study, encompassing 147 Mycobacteriophage genomes, revealed 25,284 SSRs and 1,127 compound SSRs (cSSRs) through IMEx. Mono- to hexa-nucleotide motifs were present. The SSR count per genome ranged from 78 (M100) to 342 (M58), while cSSR incidence ranged from 1 (M138) to 17 (M28, M73). Though cSSRs were present in all the genomes, their frequency varied: the SSR-to-cSSR conversion percentage ranged from 1.08 (M138 with 93 SSRs) to 8.33 (M116 with 96 SSRs). In terms of localization, the SSRs were found predominantly in coding regions (∼78%). Interestingly, genomes of around 50 kb contained a similar number of SSRs/cSSRs to a 110 kb genome, suggesting functional relevance for SSRs, which was substantiated by variation in motif constitution between species with different host ranges. The three species with broad host range (M97, M100, M116) have around 90% of their mono-nucleotide repeat motifs composed of G or C, and only M16 has both A and T mono-nucleotide motifs. Around 20% of the di-nucleotide repeat motifs in the genomes exhibiting a broad host range were CT/TC, which were either absent or represented to a much lesser extent in the other genomes.
first_indexed 2024-12-20T19:23:00Z
format Article
id doaj.art-845ddcd2187b445080f2e4e5e2f2f649
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-12-20T19:23:00Z
publishDate 2019-03-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-845ddcd2187b445080f2e4e5e2f2f649
2022-12-21T19:28:56Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2019-03-01
10
10.3389/fgene.2019.00207
393588
Chaudhary Mashhood Alam: Luke/BI Plant Genome Dynamics Lab, Institute of Biotechnology and Viikki Plant Science Centre, University of Helsinki, Helsinki, Finland; Ingenious e-Brain Solutions, Gurugram, India
Asif Iqbal: PIRO Technologies Private Limited, New Delhi, India
Anjana Sharma: Department of Biomedical Sciences, SRCASW, University of Delhi, New Delhi, India
Alan H. Schulman: Luke/BI Plant Genome Dynamics Lab, Institute of Biotechnology and Viikki Plant Science Centre, University of Helsinki, Helsinki, Finland; Natural Resources Institute Finland (Luke), Helsinki, Finland
Safdar Ali: Department of Biomedical Sciences, SRCASW, University of Delhi, New Delhi, India; Department of Biological Sciences, Aliah University, Kolkata, India
title Microsatellite Diversity, Complexity, and Host Range of Mycobacteriophage Genomes of the Siphoviridae Family
topic Mycobacteriophage
simple sequence repeats
imperfect microsatellite extractor
dMAX
host range
url https://www.frontiersin.org/article/10.3389/fgene.2019.00207/full