Improved orthologous databases to ease protozoan targets inference

Bibliographic Details
Main Authors:	Nelson Kotowski, Rodrigo Jardim, Alberto M. R. Dávila
Format:	Article
Language:	English
Published:	BMC 2015-09-01
Series:	Parasites & Vectors
Subjects:	Comparative genomics Homology inference Target identification Protozoa Orthologous database Distant homology
Online Access:	https://doi.org/10.1186/s13071-015-1090-0

collection	DOAJ
description	Background: Homology inference helps in identifying similarities as well as differences among organisms, giving better insight into how closely related one organism is to another. In addition, comparative genomics pipelines are widely adopted tools built from different bioinformatics applications and algorithms. In this article, we propose a methodology to build improved orthologous databases with the potential to aid protozoan target identification, one of the many tasks that benefit from comparative genomics tools. Methods: Our analyses are based on OrthoSearch, a comparative genomics pipeline originally designed to infer orthologs through protein-profile comparison, using an HMM-based reciprocal best hits approach. Our methodology allows OrthoSearch to compare two orthologous databases and generate a new, improved one, which can later be used to infer potential protozoan targets through a similarity analysis against the human genome. Results: The protein sequences of the Cryptosporidium hominis, Entamoeba histolytica and Leishmania infantum genomes were comparatively analyzed against three orthologous databases: (i) EggNOG KOG, (ii) ProtozoaDB and (iii) KEGG Orthology (KO). This allowed us to create two new orthologous databases, “KO + EggNOG KOG” and “KO + EggNOG KOG + ProtozoaDB”, with 16,938 and 27,701 orthologous groups, respectively. These new databases were then used in a regular OrthoSearch run. Comparing the protozoan species against the “KO + EggNOG KOG” and “KO + EggNOG KOG + ProtozoaDB” databases, we detected the following numbers of orthologous groups and coverage (the relation between the inferred orthologous groups and the species' total number of proteins), listed for each database in that order: Cryptosporidium hominis: 1,821 (11 %) and 3,254 (12 %); Entamoeba histolytica: 2,245 (13 %) and 5,305 (19 %); Leishmania infantum: 2,702 (16 %) and 4,760 (17 %).
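The reciprocal best hits (RBH) criterion mentioned in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not OrthoSearch itself: it assumes the pairwise scores (for example, HMM bit scores from the profile comparison) have already been computed, it omits profile building and score thresholds, and the function names, the search orientation (proteins against group profiles and back) and all identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical score table: query id -> {subject id: score}. Scores are
# assumed to be precomputed (e.g. HMM bit scores); higher means more similar.
ScoreTable = Dict[str, Dict[str, float]]


def best_hits(scores: ScoreTable) -> Dict[str, str]:
    """Return the single highest-scoring subject for every query.

    Subjects are visited in sorted order and max() keeps the first maximum,
    so ties resolve to the lexicographically smallest subject id.
    """
    best: Dict[str, str] = {}
    for query, subjects in scores.items():
        if subjects:  # queries without any hit have no best hit
            best[query] = max(sorted(subjects.items()), key=lambda kv: kv[1])[0]
    return best


def reciprocal_best_hits(forward: ScoreTable, reverse: ScoreTable) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is, in turn, b's best hit."""
    best_fwd = best_hits(forward)
    best_rev = best_hits(reverse)
    return sorted((a, b) for a, b in best_fwd.items() if best_rev.get(b) == a)


# Toy example with made-up protein ids and orthologous-group ids.
forward = {  # species proteins searched against group profiles
    "p1": {"OG1": 250.0, "OG2": 40.0},
    "p2": {"OG1": 90.0, "OG2": 30.0},
    "p3": {"OG2": 120.0},
}
reverse = {  # group profiles searched against species proteins
    "OG1": {"p1": 245.0, "p2": 95.0},
    "OG2": {"p3": 118.0, "p1": 35.0},
}
print(reciprocal_best_hits(forward, reverse))
```

In this toy data, p2 is dropped: its best hit is OG1, but OG1's best hit is p1, so the pair is not reciprocal.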
Using our HMM-based methodology and the largest of the new orthologous databases, we inferred 13 orthologous groups that represent potential protozoan targets; these were found owing to our distant homology approach. We also report the numbers of species-specific, pairwise shared and core groups from these analyses, depicted in Venn diagrams. Conclusions: The orthologous databases generated by our HMM-based methodology provide a broader dataset, with more orthologous groups than the original databases used as input. They may be used for several homology inference analyses, annotation tasks and protozoan target identification.
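The species-specific, pair-to-pair and core counts shown in Venn diagrams come from a simple set partition: each orthologous group is assigned to the exact combination of species in which it was detected. The sketch below illustrates that counting for three species; the function names and group IDs are hypothetical and the data are made up, so it is not the authors' code.

```python
from typing import Dict, FrozenSet, Set, Tuple


def venn_regions(found: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Place each orthologous group in the exact set of species where it was found."""
    regions: Dict[FrozenSet[str], Set[str]] = {}
    for og in set().union(*found.values()):
        members = frozenset(sp for sp, ids in found.items() if og in ids)
        regions.setdefault(members, set()).add(og)
    return regions


def summarize(found: Dict[str, Set[str]]) -> Tuple[Dict[str, int], Dict[Tuple[str, ...], int], int]:
    """Count species-specific, pairwise-only and core groups (assumes three species)."""
    regions = venn_regions(found)
    specific = {next(iter(k)): len(v) for k, v in regions.items() if len(k) == 1}
    pairwise = {tuple(sorted(k)): len(v) for k, v in regions.items() if len(k) == 2}
    core = sum(len(v) for k, v in regions.items() if len(k) == len(found))
    return specific, pairwise, core


# Made-up orthologous groups detected per species.
found = {
    "C. hominis": {"OG1", "OG2", "OG3"},
    "E. histolytica": {"OG1", "OG2", "OG4"},
    "L. infantum": {"OG1", "OG5"},
}
print(summarize(found))
```

Keying regions by a frozenset of species names keeps every Venn region disjoint, so the region sizes add up to the total number of distinct groups.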
first_indexed	2024-03-13T07:28:52Z
id	doaj.art-da467e4b95f745e696889ab1f83deb64
institution	Directory of Open Access Journals
issn	1756-3305
last_indexed	2024-03-13T07:28:52Z
affiliation	Computational and Systems Biology Laboratory, Oswaldo Cruz Institute, FIOCRUZ (all three authors)

Similar Items