An efficient strategy using k-mers to analyse 16S rRNA sequences

Bibliographic Details
Main Authors: Marcel Martínez-Porchas, Francisco Vargas-Albores
Format: Article
Language: English
Published: Elsevier 2017-07-01
Series: Heliyon
Subjects: Bioinformatics, Microbiology, Biological sciences
Online Access: http://www.sciencedirect.com/science/article/pii/S2405844017312501
collection DOAJ
description K-mers have been used successfully to improve metagenomics studies, including taxonomic classification and de novo assembly, and can be used to retrieve sequences of interest from the available databases. The aim of this manuscript was to propose a simple but efficient strategy to generate k-mers and to use them to obtain and analyse 16S rRNA sequence fragments in silico. A total of 513,309 bacterial sequences from the SILVA database were considered, and homemade PHP scripts were used to search for specific nucleotide chains, recover fragments of bacterial sequences, perform calculations and organize information. Consensus sequences matching conserved regions were constructed by aligning most of the primers used in the literature. Sequences of k nucleotides (9- to 15-mers) were extracted from the resulting primer contigs. Frequency analysis revealed that k-mer size was inversely proportional to the occurrence of k-mers in the different conserved regions, suggesting a stringency relationship: short k-mers produced high numbers of duplicate reactions, large k-mers recovered a lower proportion of sequences, and 12-mers gave the best results. Using 12-mers with the proposed method to obtain and study sequences was found to be a reliable approach for the analysis of 16S rRNA sequences, and the strategy could probably be extended to other biomarkers. Additional applications, such as evaluating the degree of conservation, designing primers and other calculations, are proposed as examples.
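The k-mer extraction and frequency step described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' PHP scripts: the function names `kmers` and `count_hits`, the consensus string and the three target sequences are all invented for the example, and exact substring matching is an assumption about how a k-mer is considered "found" in a sequence.

```python
from collections import Counter


def kmers(seq, k):
    """Return every overlapping substring of length k in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_hits(kmer_list, sequences):
    """For each k-mer, count how many sequences contain it at least once."""
    hits = Counter()
    for kmer in kmer_list:
        hits[kmer] = sum(1 for s in sequences if kmer in s)
    return hits


# Toy data, invented for illustration only (not real 16S rRNA or primer data).
consensus = "ACGTACGGTCAGTCCA"           # stands in for a primer contig
sequences = [
    "TTT" + consensus + "GGG",           # contains the whole consensus
    "TT" + consensus[:12] + "AAAA",      # contains only its first 12 bases
    "GGGGCCCCTTTTAAAAGGGG",              # unrelated sequence
]

# Compare k-mer sizes over the 9- to 15-mer range mentioned in the abstract.
for k in range(9, 16):
    hits = count_hits(kmers(consensus, k), sequences)
    matched = sum(1 for n in hits.values() if n > 0)
    print(k, "k-mers:", len(hits), "matched in at least one sequence:", matched)
```

On real data the same loop would run over the database sequences rather than three toy strings, and the per-k counts are what a frequency comparison between k-mer sizes would be built from.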
first_indexed 2024-12-13T13:38:13Z
format Article
id doaj.art-f4ed3b4abf164824b046d3d1180a0095
institution Directory Open Access Journal
issn 2405-8440
language English
last_indexed 2024-12-13T13:38:13Z
publishDate 2017-07-01
publisher Elsevier
record_format Article
series Heliyon
spelling doaj.art-f4ed3b4abf164824b046d3d1180a0095; 2022-12-21T23:43:41Z; eng; Elsevier; Heliyon; ISSN 2405-8440; 2017-07-01; vol. 3, no. 7; doi:10.1016/j.heliyon.2017.e00370
topic Bioinformatics
Microbiology
Biological sciences