The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences

Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequentia...

Bibliographic Details
Main Authors: Tapinos, A; Constantinides, B; Phan, M; Kouchaki, S; Cotten, M; Robertson, D
Format: Journal article
Published: MDPI 2019
Description: Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
ID: oxford-uuid:94df4046-7b0b-4dda-8cb3-522f2935dffd
Institution: University of Oxford