A fast parallel algorithm for finding the longest common sequence of multiple biosequences

Abstract Background Searching for the longest common sequence (LCS) of multiple biosequences is one of the most fundamental tasks in bioinformatics. In this paper, we present a parallel algorithm named FAST_LCS to speedup the computation for finding LCS...

Full description

Bibliographic Details
Main Authors:	Wan Andrew, Chen Yixin, Liu Wei
Format:	Article
Language:	English
Published:	BMC 2006-12-01
Series:	BMC Bioinformatics

_version_	1811282662373982208
author	Wan Andrew Chen Yixin Liu Wei
author_facet	Wan Andrew Chen Yixin Liu Wei
author_sort	Wan Andrew
collection	DOAJ
description	<p>Abstract</p> <p>Background</p> <p>Searching for the longest common sequence (LCS) of multiple biosequences is one of the most fundamental tasks in bioinformatics. In this paper, we present a parallel algorithm named FAST_LCS to speedup the computation for finding LCS.</p> <p>Results</p> <p>A fast parallel algorithm for LCS is presented. The algorithm first constructs a novel successor table to obtain all the identical pairs and their levels. It then obtains the LCS by tracing back from the identical character pairs at the last level. Effective pruning techniques are developed to significantly reduce the computational complexity. Experimental results on gene sequences in the <it>tigr </it>database show that our algorithm is optimal and much more efficient than other leading LCS algorithms.</p> <p>Conclusion</p> <p>We have developed one of the fastest parallel LCS algorithms on an MPP parallel computing model. For two sequences <it>X </it>and <it>Y </it>with lengths <it>n </it>and <it>m</it>, respectively, the memory required is max{4(<it>n</it>+1)+4(<it>m</it>+1), <it>L</it>}, where <it>L </it>is the number of identical character pairs. The time complexity is O(<it>L</it>) for sequential execution, and <it>O</it>(\|LCS(<it>X</it>, <it>Y</it>)\|) for parallel execution, where \|LCS(<it>X</it>, <it>Y</it>)\| is the length of the LCS of <it>X </it>and <it>Y</it>. For <it>n </it>sequences <it>X</it><sub>1</sub>, <it>X</it><sub>2</sub>, ..., <it>X</it><sub><it>n</it></sub>, the time complexity is O(<it>L</it>) for sequential execution, and <it>O</it>(\|LCS(<it>X</it><sub>1</sub>, <it>X</it><sub>2</sub>, ..., <it>X</it><sub><it>n</it></sub>)\|) for parallel execution. Experimental results support our analysis by showing significant improvement of the proposed method over other leading LCS algorithms.</p>
first_indexed	2024-04-13T01:56:35Z
format	Article
id	doaj.art-dfc264ab7a144869a5bfbfaef4619528
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-04-13T01:56:35Z
publishDate	2006-12-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-dfc264ab7a144869a5bfbfaef46195282022-12-22T03:07:45ZengBMCBMC Bioinformatics1471-21052006-12-017Suppl 4S410.1186/1471-2105-7-S4-S4A fast parallel algorithm for finding the longest common sequence of multiple biosequencesWan AndrewChen YixinLiu Wei<p>Abstract</p> <p>Background</p> <p>Searching for the longest common sequence (LCS) of multiple biosequences is one of the most fundamental tasks in bioinformatics. In this paper, we present a parallel algorithm named FAST_LCS to speedup the computation for finding LCS.</p> <p>Results</p> <p>A fast parallel algorithm for LCS is presented. The algorithm first constructs a novel successor table to obtain all the identical pairs and their levels. It then obtains the LCS by tracing back from the identical character pairs at the last level. Effective pruning techniques are developed to significantly reduce the computational complexity. Experimental results on gene sequences in the <it>tigr </it>database show that our algorithm is optimal and much more efficient than other leading LCS algorithms.</p> <p>Conclusion</p> <p>We have developed one of the fastest parallel LCS algorithms on an MPP parallel computing model. For two sequences <it>X </it>and <it>Y </it>with lengths <it>n </it>and <it>m</it>, respectively, the memory required is max{4(<it>n</it>+1)+4(<it>m</it>+1), <it>L</it>}, where <it>L </it>is the number of identical character pairs. The time complexity is O(<it>L</it>) for sequential execution, and <it>O</it>(\|LCS(<it>X</it>, <it>Y</it>)\|) for parallel execution, where \|LCS(<it>X</it>, <it>Y</it>)\| is the length of the LCS of <it>X </it>and <it>Y</it>. For <it>n </it>sequences <it>X</it><sub>1</sub>, <it>X</it><sub>2</sub>, ..., <it>X</it><sub><it>n</it></sub>, the time complexity is O(<it>L</it>) for sequential execution, and <it>O</it>(\|LCS(<it>X</it><sub>1</sub>, <it>X</it><sub>2</sub>, ..., <it>X</it><sub><it>n</it></sub>)\|) for parallel execution. Experimental results support our analysis by showing significant improvement of the proposed method over other leading LCS algorithms.</p>
spellingShingle	Wan Andrew Chen Yixin Liu Wei A fast parallel algorithm for finding the longest common sequence of multiple biosequences BMC Bioinformatics
title	A fast parallel algorithm for finding the longest common sequence of multiple biosequences
title_full	A fast parallel algorithm for finding the longest common sequence of multiple biosequences
title_fullStr	A fast parallel algorithm for finding the longest common sequence of multiple biosequences
title_full_unstemmed	A fast parallel algorithm for finding the longest common sequence of multiple biosequences
title_short	A fast parallel algorithm for finding the longest common sequence of multiple biosequences
title_sort	fast parallel algorithm for finding the longest common sequence of multiple biosequences
work_keys_str_mv	AT wanandrew afastparallelalgorithmforfindingthelongestcommonsequenceofmultiplebiosequences AT chenyixin afastparallelalgorithmforfindingthelongestcommonsequenceofmultiplebiosequences AT liuwei afastparallelalgorithmforfindingthelongestcommonsequenceofmultiplebiosequences AT wanandrew fastparallelalgorithmforfindingthelongestcommonsequenceofmultiplebiosequences AT chenyixin fastparallelalgorithmforfindingthelongestcommonsequenceofmultiplebiosequences AT liuwei fastparallelalgorithmforfindingthelongestcommonsequenceofmultiplebiosequences

A fast parallel algorithm for finding the longest common sequence of multiple biosequences

Similar Items