Improved Large-Scale Homology Search by Two-Step Seed Search Using Multiple Reduced Amino Acid Alphabets
Metagenomic analysis, a technique used to comprehensively analyze microorganisms present in the environment, requires performing high-precision homology searches on large amounts of sequencing data, the size of which has increased dramatically with the development of next-generation sequencing. NCBI...
Main Authors: | Kazuki Takabatake, Kazuki Izawa, Motohiro Akikawa, Keisuke Yanagisawa, Masahito Ohue, Yutaka Akiyama |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2021-09-01 |
Series: | Genes |
Subjects: | homology search, genome sequence, metagenomic analysis, reduced amino acid |
Online Access: | https://www.mdpi.com/2073-4425/12/9/1455 |
_version_ | 1797519103756337152 |
---|---|
author | Kazuki Takabatake, Kazuki Izawa, Motohiro Akikawa, Keisuke Yanagisawa, Masahito Ohue, Yutaka Akiyama |
author_facet | Kazuki Takabatake, Kazuki Izawa, Motohiro Akikawa, Keisuke Yanagisawa, Masahito Ohue, Yutaka Akiyama |
author_sort | Kazuki Takabatake |
collection | DOAJ |
description | Metagenomic analysis, a technique used to comprehensively analyze microorganisms present in the environment, requires performing high-precision homology searches on large amounts of sequencing data, the size of which has increased dramatically with the development of next-generation sequencing. NCBI BLAST is the most widely used software for performing homology searches, but its speed is insufficient for the throughput of current DNA sequencers. In this paper, we propose a new, high-performance homology search algorithm that employs a two-step seed search strategy using multiple reduced amino acid alphabets to identify highly similar subsequences. Additionally, we evaluated the validity of the proposed method against several existing tools. Our method was faster than any other existing program for ≤120,000 queries, while DIAMOND, an existing tool, was the fastest method for >120,000 queries. |
first_indexed | 2024-03-10T07:38:24Z |
format | Article |
id | doaj.art-1ea5b658173645f8b306b4851f46bb40 |
institution | Directory Open Access Journal |
issn | 2073-4425 |
language | English |
last_indexed | 2024-03-10T07:38:24Z |
publishDate | 2021-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Genes |
spelling | doaj.art-1ea5b658173645f8b306b4851f46bb40; 2023-11-22T13:15:24Z; eng; MDPI AG; Genes; 2073-4425; 2021-09-01; 12; 9; 1455; 10.3390/genes12091455; Improved Large-Scale Homology Search by Two-Step Seed Search Using Multiple Reduced Amino Acid Alphabets; Kazuki Takabatake (0); Kazuki Izawa (1); Motohiro Akikawa (2); Keisuke Yanagisawa (3); Masahito Ohue (4); Yutaka Akiyama (5); Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan; Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan; Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan; Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan; Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan; Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan; Metagenomic analysis, a technique used to comprehensively analyze microorganisms present in the environment, requires performing high-precision homology searches on large amounts of sequencing data, the size of which has increased dramatically with the development of next-generation sequencing. NCBI BLAST is the most widely used software for performing homology searches, but its speed is insufficient for the throughput of current DNA sequencers. In this paper, we propose a new, high-performance homology search algorithm that employs a two-step seed search strategy using multiple reduced amino acid alphabets to identify highly similar subsequences. Additionally, we evaluated the validity of the proposed method against several existing tools. Our method was faster than any other existing program for ≤120,000 queries, while DIAMOND, an existing tool, was the fastest method for >120,000 queries.; https://www.mdpi.com/2073-4425/12/9/1455; homology search; genome sequence; metagenomic analysis; reduced amino acid |
spellingShingle | Kazuki Takabatake Kazuki Izawa Motohiro Akikawa Keisuke Yanagisawa Masahito Ohue Yutaka Akiyama Improved Large-Scale Homology Search by Two-Step Seed Search Using Multiple Reduced Amino Acid Alphabets Genes homology search genome sequence metagenomic analysis reduced amino acid |
title | Improved Large-Scale Homology Search by Two-Step Seed Search Using Multiple Reduced Amino Acid Alphabets |
title_full | Improved Large-Scale Homology Search by Two-Step Seed Search Using Multiple Reduced Amino Acid Alphabets |
title_fullStr | Improved Large-Scale Homology Search by Two-Step Seed Search Using Multiple Reduced Amino Acid Alphabets |
title_full_unstemmed | Improved Large-Scale Homology Search by Two-Step Seed Search Using Multiple Reduced Amino Acid Alphabets |
title_short | Improved Large-Scale Homology Search by Two-Step Seed Search Using Multiple Reduced Amino Acid Alphabets |
title_sort | improved large scale homology search by two step seed search using multiple reduced amino acid alphabets |
topic | homology search, genome sequence, metagenomic analysis, reduced amino acid |
url | https://www.mdpi.com/2073-4425/12/9/1455 |
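
The description above mentions a two-step seed search over multiple reduced amino acid alphabets but does not give the details. The following is only a minimal sketch of the general idea, not the authors' algorithm: step one finds short exact matches under a coarse reduced alphabet, and step two keeps only those hits that still match over a longer window under a finer alphabet. The groupings `COARSE_GROUPS` and `FINE_GROUPS`, the window lengths, and all function names are hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical amino acid groupings, chosen only for illustration.
# FINE_GROUPS is a refinement of COARSE_GROUPS (every fine group sits
# inside one coarse group); neither is taken from the paper.
COARSE_GROUPS = ["AST", "C", "DEN", "FWY", "G", "H", "ILMV", "KQR", "P"]
FINE_GROUPS = ["A", "ST", "C", "DE", "N", "FY", "W", "G", "H",
               "ILV", "M", "KR", "Q", "P"]


def make_mapping(groups):
    """Map each residue to the first letter of its group."""
    return {aa: group[0] for group in groups for aa in group}


def reduce_seq(seq, mapping):
    """Rewrite a protein sequence in a reduced alphabet."""
    return "".join(mapping[aa] for aa in seq)


def index_kmers(reduced, k):
    """Index every k-mer of an already reduced sequence by start position."""
    index = defaultdict(list)
    for i in range(len(reduced) - k + 1):
        index[reduced[i:i + k]].append(i)
    return index


def two_step_seeds(query, subject, k_coarse=3, k_fine=5):
    """Return (query_pos, subject_pos) pairs that pass both steps."""
    coarse = make_mapping(COARSE_GROUPS)
    fine = make_mapping(FINE_GROUPS)
    q_coarse, s_coarse = reduce_seq(query, coarse), reduce_seq(subject, coarse)
    q_fine, s_fine = reduce_seq(query, fine), reduce_seq(subject, fine)
    subject_index = index_kmers(s_coarse, k_coarse)

    seeds = []
    for qi in range(len(q_coarse) - k_coarse + 1):
        # Step 1: exact k-mer lookup under the coarse alphabet.
        for si in subject_index.get(q_coarse[qi:qi + k_coarse], []):
            # Step 2: require a longer exact match under the finer alphabet.
            q_win = q_fine[qi:qi + k_fine]
            s_win = s_fine[si:si + k_fine]
            if len(q_win) == k_fine and q_win == s_win:
                seeds.append((qi, si))
    return seeds
```

For example, `two_step_seeds("STDKL", "GGTSEKIGG")` keeps the hit at query position 0 and subject position 2, because the two windows agree under both alphabets. With the query `"ATDKL"` the coarse step still finds candidates, but the finer alphabet separates A from S and the candidate is rejected.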