MPI-PHYLIP: parallelizing computationally intensive phylogenetic analysis routines for the analysis of large protein families.

Phylogenetic study of protein sequences provides unique and valuable insights into the molecular and genetic basis of important medical and epidemiological problems as well as insights about the origins and development of physiological features in present day organisms. Consensus phylogenies based o...

Bibliographic Details
Main Authors: Alexander J Ropelewski, Hugh B Nicholas, Ricardo R Gonzalez Mendez
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2010-11-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC2981553?pdf=render
author Alexander J Ropelewski
Hugh B Nicholas
Ricardo R Gonzalez Mendez
collection DOAJ
description Phylogenetic study of protein sequences provides unique and valuable insights into the molecular and genetic basis of important medical and epidemiological problems as well as insights about the origins and development of physiological features in present-day organisms. Consensus phylogenies based on the bootstrap and other resampling methods play a crucial part in analyzing the robustness of the trees produced for these analyses. Our focus was to increase the number of bootstrap replications that can be performed on large protein datasets using the maximum parsimony, distance matrix, and maximum likelihood methods. We have modified the PHYLIP package using MPI to enable large-scale phylogenetic study of protein sequences, using a statistically robust number of bootstrapped datasets, to be performed in a moderate amount of time. This paper discusses the methodology used to parallelize the PHYLIP programs and reports the performance of the parallel PHYLIP programs that are relevant to the study of protein evolution on several protein datasets. Calculations that currently take a few days on a state-of-the-art desktop workstation are reduced to calculations that can be performed over lunchtime on a modern parallel computer. Of the three protein methods tested, the maximum likelihood method scales the best, followed by the distance method, and then the maximum parsimony method. However, the maximum likelihood method requires significant memory resources, which limits its application to more moderately sized protein datasets.
first_indexed 2024-12-20T22:37:06Z
format Article
id doaj.art-725716e6756e4a79848a6cfbacac013e
institution Directory of Open Access Journals
issn 1932-6203
language English
last_indexed 2024-12-20T22:37:06Z
publishDate 2010-11-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-725716e6756e4a79848a6cfbacac013e; 2022-12-21T19:24:35Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2010-11-01; vol. 5, no. 11, e13999; doi:10.1371/journal.pone.0013999
title MPI-PHYLIP: parallelizing computationally intensive phylogenetic analysis routines for the analysis of large protein families.
url http://europepmc.org/articles/PMC2981553?pdf=render