MIDAS: Computer application for the identification of exact and inaccurate microsatellites in genomic sequences

Bibliographic Details
Main Author: Carlos M. Martínez Ortiz
Format: Article
Language: English
Published: ECIMED 2019-01-01
Series: Revista Cubana de Informática Médica
Online Access: http://revinformatica.sld.cu/index.php/rcim/article/view/302
Description
Summary: Microsatellites are short tandem repeats that are frequent and diverse in the genomes of all species, and they serve as important markers in many areas of genomics-based research. These markers have been associated with a significant number of human diseases. Vaccine development has shown that pathogens can evade the immune response simply by altering the composition of repeat sequences in their genes. Numerous computer applications detect these sequences, but they do not meet all expectations because they apply divergent criteria and approaches to the detection problem. MIDAS implements a non-heuristic solution based on two combinatorial algorithms run in series: the first detects exact microsatellites, and the second, if the model parameters allow it, extends them to their optimal inaccurate (imperfect) version. The application takes a genomic sequence in GBFF or FASTA format as input and reports the position of each microsatellite in the sequence, together with sizes, alignments, flanks and other statistics. The algorithm is highly efficient and comprehensive, detecting all possible repeat sequences regardless of their nucleotide composition.
Keywords: SSR; microsatellite; molecular marker; data mining; algorithms
ISSN: 1684-1859
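
To make the first stage described in the summary concrete, here is a minimal Python sketch of exhaustive exact-microsatellite detection. It is not the MIDAS implementation, which this record does not describe in detail. The function name `find_exact_microsatellites` and the parameters `max_motif` and `min_repeats` are assumptions introduced for illustration only. The second stage (extension to imperfect repeats) is omitted because the record gives no scoring model for it.

```python
def find_exact_microsatellites(seq, max_motif=6, min_repeats=3):
    """Return sorted (start, motif, repeats) tuples for exact tandem repeats.

    Illustrative sketch only: start is 0-based, and both thresholds are
    hypothetical defaults, not values taken from MIDAS.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for k in range(1, max_motif + 1):          # try every motif length
        i = 0
        while i + k <= n:
            motif = seq[i:i + k]
            # Skip motifs that are repeats of a shorter unit (e.g. "CACA"),
            # since they are already reported at the shorter motif length.
            if any(k % d == 0 and motif == motif[:d] * (k // d)
                   for d in range(1, k)):
                i += 1
                continue
            # Extend the run while the next k bases equal the motif.
            j = i + k
            while seq[j:j + k] == motif:
                j += k
            repeats = (j - i) // k
            if repeats >= min_repeats:
                hits.append((i, motif, repeats))
                i = j                          # resume after the repeat run
            else:
                i += 1
    return sorted(hits)


if __name__ == "__main__":
    for start, motif, repeats in find_exact_microsatellites("GGCACACACATTTTTG"):
        print(start, motif, repeats)
```

The scan makes no assumption about nucleotide composition: every motif of every allowed length is tested at every position, which mirrors the non-heuristic, exhaustive character the summary attributes to the tool.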