A signal processing method for alignment-free metagenomic binning: multi-resolution genomic binary patterns

Algorithms in bioinformatics use textual representations of genetic information, sequences of the characters A, T, G and C represented computationally as strings or sub-strings. Signal and related image processing methods offer a rich source of alternative descriptors as they are designed to work in...


Bibliographic details
Authors: Kouchaki, S, Tapinos, A, Robertson, D
Format: Journal article
Language: English
Published: Springer Nature 2019
Collection: OXFORD
Description: Algorithms in bioinformatics use textual representations of genetic information: sequences of the characters A, T, G and C, represented computationally as strings or sub-strings. Signal processing and related image processing methods offer a rich source of alternative descriptors, as they are designed to work in the presence of noisy data without the need for exact matching. Here we introduce a method, multi-resolution local binary patterns (MLBP), adapted from image processing, to extract local 'texture' changes from nucleotide sequence data. We apply this feature space to the alignment-free binning of metagenomic data. The effectiveness of MLBP is demonstrated using both simulated and real human gut microbial communities. Sequence reads or contigs can be represented as vectors, and their 'texture' compared efficiently using machine learning algorithms that perform dimensionality reduction, to capture eigengenome information, and clustering (here using randomized singular value decomposition and BH-tSNE). The intuition behind our method is that the MLBP feature vectors permit sequence comparisons without the need for explicit pairwise matching. We demonstrate that this approach outperforms existing methods based on k-mer frequencies. The signal processing method, MLBP, thus offers a viable alternative feature space to textual representations of sequence data. The source code for our Multi-resolution Genomic Binary Patterns method can be found at https://github.com/skouchaki/MrGBP.
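To make the idea of a sequence 'texture' concrete, the following is a minimal sketch of a one-dimensional local binary pattern feature vector computed at several window sizes. It is not the authors' implementation (see the MrGBP repository for that). The nucleotide-to-number mapping `NUC_VALUE`, the function names `lbp_codes` and `mlbp_features`, the default `radii`, and the "neighbour >= centre" thresholding rule are all assumptions chosen for illustration, since the abstract does not specify them. The dimensionality reduction and clustering steps (randomized SVD, BH-tSNE) are not shown.

```python
from collections import Counter

# Hypothetical numeric mapping of nucleotides; the abstract does not give one.
NUC_VALUE = {"A": 1, "C": 2, "G": 3, "T": 4}


def lbp_codes(signal, radius):
    """Return one binary-pattern code per position of a 1-D signal.

    Each of the 2 * radius neighbours contributes one bit, set when the
    neighbour is greater than or equal to the centre value (assumed rule).
    """
    codes = []
    for i in range(radius, len(signal) - radius):
        centre = signal[i]
        neighbours = signal[i - radius:i] + signal[i + 1:i + radius + 1]
        code = 0
        for bit, value in enumerate(neighbours):
            if value >= centre:
                code |= 1 << bit
        codes.append(code)
    return codes


def mlbp_features(sequence, radii=(1, 2, 3)):
    """Concatenate normalised code histograms over several radii."""
    # Characters outside A, C, G, T (for example N) are skipped.
    signal = [NUC_VALUE[base] for base in sequence.upper() if base in NUC_VALUE]
    features = []
    for radius in radii:
        n_codes = 1 << (2 * radius)  # number of distinct codes at this radius
        counts = Counter(lbp_codes(signal, radius))
        total = sum(counts.values()) or 1  # avoid division by zero on short input
        features.extend(counts.get(code, 0) / total for code in range(n_codes))
    return features


if __name__ == "__main__":
    vector = mlbp_features("ACGTACGTACGGTTAACC")
    print(len(vector))  # 4 + 16 + 64 histogram bins for radii 1, 2, 3
```

Under these assumptions every read or contig maps to a fixed-length vector regardless of its length, which is what allows sequences to be compared without explicit pairwise matching.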
Identifier: oxford-uuid:751c79b2-7d48-420d-9da7-3bac9b8d3b81
Institution: University of Oxford