The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences
Main Authors: | Tapinos, A; Constantinides, B; Phan, M; Kouchaki, S; Cotten, M; Robertson, D |
---|---|
Format: | Journal article |
Published: | MDPI 2019 |
_version_ | 1797082938737688576 |
---|---|
author | Tapinos, A; Constantinides, B; Phan, M; Kouchaki, S; Cotten, M; Robertson, D |
collection | OXFORD |
description | Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data. |
first_indexed | 2024-03-07T01:35:01Z |
format | Journal article |
id | oxford-uuid:94df4046-7b0b-4dda-8cb3-522f2935dffd |
institution | University of Oxford |
last_indexed | 2024-03-07T01:35:01Z |
publishDate | 2019 |
publisher | MDPI |
record_format | dspace |
spelling | oxford-uuid:94df4046-7b0b-4dda-8cb3-522f2935dffd; 2022-03-26T23:42:21Z; Journal article; http://purl.org/coar/resource_type/c_dcae04bc; uuid:94df4046-7b0b-4dda-8cb3-522f2935dffd; Symplectic Elements at Oxford; MDPI; 2019 |
title | The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences |