Improved Large-Scale Homology Search by Two-Step Seed Search Using Multiple Reduced Amino Acid Alphabets

Bibliographic Details
Main Authors: Kazuki Takabatake, Kazuki Izawa, Motohiro Akikawa, Keisuke Yanagisawa, Masahito Ohue, Yutaka Akiyama
Format: Article
Language: English
Published: MDPI AG, 2021-09-01
Series: Genes
Subjects: homology search; genome sequence; metagenomic analysis; reduced amino acid
Online Access: https://www.mdpi.com/2073-4425/12/9/1455
collection DOAJ
description Metagenomic analysis, a technique used to comprehensively analyze microorganisms present in the environment, requires performing high-precision homology searches on large amounts of sequencing data, the size of which has increased dramatically with the development of next-generation sequencing. NCBI BLAST is the most widely used software for performing homology searches, but its speed is insufficient for the throughput of current DNA sequencers. In this paper, we propose a new, high-performance homology search algorithm that employs a two-step seed search strategy using multiple reduced amino acid alphabets to identify highly similar subsequences. Additionally, we evaluated the validity of the proposed method against several existing tools. Our method was faster than any other existing program for ≤120,000 queries, while DIAMOND, an existing tool, was the fastest method for >120,000 queries.
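The abstract names the core idea, a seed search over reduced amino acid alphabets carried out in two steps, but this record gives no details of the authors' algorithm. The Python sketch below illustrates the general technique only, under loudly stated assumptions: the two alphabets (COARSE_GROUPS, FINE_GROUPS), the seed length k, the verification window and the min_matches threshold are hypothetical choices for this example, and the helper names are invented. Step 1 finds exact k-mer matches after translating both sequences into a coarse reduced alphabet; step 2 keeps a candidate only if enough positions in a window starting at the hit also agree under a finer reduced alphabet.

```python
from collections import defaultdict

# Hypothetical reduced alphabets chosen only for illustration; the paper's actual
# alphabets are not given in this record. FINE refines COARSE: every fine group
# lies inside exactly one coarse group.
COARSE_GROUPS = ["LVIMC", "AGSTP", "FYW", "EDNQKRH"]
FINE_GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def make_mapping(groups):
    """Map each amino acid to a one-letter code for its group ('a', 'b', ...)."""
    return {aa: chr(ord("a") + i) for i, group in enumerate(groups) for aa in group}


COARSE = make_mapping(COARSE_GROUPS)
FINE = make_mapping(FINE_GROUPS)


def reduce_seq(seq, mapping):
    """Translate a protein sequence; residues outside the alphabet become '?'."""
    return "".join(mapping.get(aa, "?") for aa in seq)


def build_index(db_seqs, k):
    """Index every coarse-alphabet k-mer of each database sequence."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(db_seqs):
        reduced = reduce_seq(seq, COARSE)
        for pos in range(len(reduced) - k + 1):
            kmer = reduced[pos:pos + k]
            if "?" not in kmer:  # skip k-mers containing unknown residues
                index[kmer].append((seq_id, pos))
    return index


def two_step_seeds(query, db_seqs, index, k, window, min_matches):
    """Step 1: exact coarse k-mer hits. Step 2: keep a hit only if at least
    `min_matches` of the `window` positions starting at the hit agree under FINE."""
    q_coarse = reduce_seq(query, COARSE)
    q_fine = reduce_seq(query, FINE)
    hits = []
    for qpos in range(len(q_coarse) - k + 1):
        # Step 1: look up the query's coarse k-mer (get() does not insert keys).
        for seq_id, dpos in index.get(q_coarse[qpos:qpos + k], []):
            # Step 2: compare a window under the finer alphabet; the window is
            # truncated at sequence ends, so hits near an end may be dropped.
            d_fine = reduce_seq(db_seqs[seq_id][dpos:dpos + window], FINE)
            q_win = q_fine[qpos:qpos + window]
            matches = sum(a == b and a != "?" for a, b in zip(q_win, d_fine))
            if matches >= min_matches:
                hits.append((qpos, seq_id, dpos))
    return hits
```

For example, `two_step_seeds(query, db, build_index(db, 3), k=3, window=6, min_matches=5)` returns (query position, database sequence index, database position) triples. Indexing under a coarse alphabet keeps the number of distinct k-mers small and lets substitutions within a group still produce seeds, while the finer second check discards many chance seed hits before any costly alignment; whether the published tool balances the two steps this way is not stated in this record.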
id doaj.art-1ea5b658173645f8b306b4851f46bb40
institution Directory of Open Access Journals
issn 2073-4425
doi 10.3390/genes12091455
citation Genes 2021, 12(9), 1455
affiliation Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan (all six authors)
topic homology search
genome sequence
metagenomic analysis
reduced amino acid