KEGG_Extractor: An Effective Extraction Tool for KEGG Orthologs
The KEGG Orthology (KO) database is a widely used molecular function reference database which can be used to conduct functional annotation of most microorganisms. At present, there are many KEGG tools based on the KO entries for annotating functional orthologs. However, determining how to efficientl...
Main Authors: | Chao Zhang, Zhongwei Chen, Miming Zhang, Shulei Jia |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2023-02-01 |
Series: | Genes |
Subjects: | KEGG; KEGG Orthology (KO); keyword matching; methodology |
Online Access: | https://www.mdpi.com/2073-4425/14/2/386 |
_version_ | 1797620845617610752 |
---|---|
author | Chao Zhang, Zhongwei Chen, Miming Zhang, Shulei Jia |
author_facet | Chao Zhang, Zhongwei Chen, Miming Zhang, Shulei Jia |
author_sort | Chao Zhang |
collection | DOAJ |
description | The KEGG Orthology (KO) database is a widely used molecular function reference database that can be used for functional annotation of most microorganisms. At present, there are many KEGG tools based on the KO entries for annotating functional orthologs. However, the difficulty of efficiently extracting and sorting KEGG annotation results still hinders subsequent genome analysis. There is a lack of effective measures for quickly extracting and classifying the gene sequences and species information of the KEGG annotations. Here, we present a supporting tool, KEGG_Extractor, for species-specific gene extraction and classification, which outputs its results through an iterative keyword matching algorithm. It can extract and classify not only amino acid sequences but also nucleotide sequences, and it has proved to be fast and efficient for microbial analysis. Analysis of the ancient Wood-Ljungdahl (WL) pathway with KEGG_Extractor revealed that ~226 archaeal strains contained WL pathway-related genes. Most of them were <i>Methanococcus maripaludis</i>, <i>Methanosarcina mazei</i> and members of the <i>Methanobacterium</i>, <i>Thermococcus</i> and <i>Methanosarcina</i> genera. Using KEGG_Extractor, the ARWL database was constructed, which had high accuracy and complement. This tool helps to link genes with the KEGG pathway and promotes the reconstruction of molecular networks. Availability and implementation: KEGG_Extractor is freely available on GitHub. |
first_indexed | 2024-03-11T08:47:19Z |
format | Article |
id | doaj.art-d0a4cb2308a141db8a12c66fede3a095 |
institution | Directory Open Access Journal |
issn | 2073-4425 |
language | English |
last_indexed | 2024-03-11T08:47:19Z |
publishDate | 2023-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Genes |
spelling | doaj.art-d0a4cb2308a141db8a12c66fede3a095; 2023-11-16T20:42:16Z; eng; MDPI AG; Genes; 2073-4425; 2023-02-01; vol. 14, no. 2, art. 386; 10.3390/genes14020386; KEGG_Extractor: An Effective Extraction Tool for KEGG Orthologs; Chao Zhang (0), Zhongwei Chen (1), Miming Zhang (2), Shulei Jia (3); (0) Marine Sustainable Development Research Center, Third Institute of Oceanography, Xiamen 361102, China; (1) Nantong Marine Environmental Monitoring Center, Ministry of Natural Resources, Nantong 226002, China; (2) Third Institute of Oceanography, Ministry of Natural Resources, Xiamen 361005, China; (3) Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China; The KEGG Orthology (KO) database is a widely used molecular function reference database that can be used for functional annotation of most microorganisms. At present, there are many KEGG tools based on the KO entries for annotating functional orthologs. However, the difficulty of efficiently extracting and sorting KEGG annotation results still hinders subsequent genome analysis. There is a lack of effective measures for quickly extracting and classifying the gene sequences and species information of the KEGG annotations. Here, we present a supporting tool, KEGG_Extractor, for species-specific gene extraction and classification, which outputs its results through an iterative keyword matching algorithm. It can extract and classify not only amino acid sequences but also nucleotide sequences, and it has proved to be fast and efficient for microbial analysis. Analysis of the ancient Wood-Ljungdahl (WL) pathway with KEGG_Extractor revealed that ~226 archaeal strains contained WL pathway-related genes. Most of them were <i>Methanococcus maripaludis</i>, <i>Methanosarcina mazei</i> and members of the <i>Methanobacterium</i>, <i>Thermococcus</i> and <i>Methanosarcina</i> genera. Using KEGG_Extractor, the ARWL database was constructed, which had high accuracy and complement. This tool helps to link genes with the KEGG pathway and promotes the reconstruction of molecular networks. Availability and implementation: KEGG_Extractor is freely available on GitHub.; https://www.mdpi.com/2073-4425/14/2/386; KEGG; KEGG Orthology (KO); keyword matching; methodology |
spellingShingle | Chao Zhang Zhongwei Chen Miming Zhang Shulei Jia KEGG_Extractor: An Effective Extraction Tool for KEGG Orthologs Genes KEGG KEGG Orthology (KO) keyword matching methodology |
title | KEGG_Extractor: An Effective Extraction Tool for KEGG Orthologs |
title_full | KEGG_Extractor: An Effective Extraction Tool for KEGG Orthologs |
title_fullStr | KEGG_Extractor: An Effective Extraction Tool for KEGG Orthologs |
title_full_unstemmed | KEGG_Extractor: An Effective Extraction Tool for KEGG Orthologs |
title_short | KEGG_Extractor: An Effective Extraction Tool for KEGG Orthologs |
title_sort | kegg extractor an effective extraction tool for kegg orthologs |
topic | KEGG; KEGG Orthology (KO); keyword matching; methodology |
url | https://www.mdpi.com/2073-4425/14/2/386 |
work_keys_str_mv | AT chaozhang keggextractoraneffectiveextractiontoolforkeggorthologs AT zhongweichen keggextractoraneffectiveextractiontoolforkeggorthologs AT mimingzhang keggextractoraneffectiveextractiontoolforkeggorthologs AT shuleijia keggextractoraneffectiveextractiontoolforkeggorthologs |
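
The description in this record says that KEGG_Extractor produces its output through an iterative keyword matching algorithm, but the record gives no implementation details. The Python sketch below only illustrates the general idea under stated assumptions: sequences arrive as FASTA text, keywords (for example genus names) are matched case-insensitively against the header lines, and each keyword pass removes its matches from the pool before the next pass. The function names `parse_fasta` and `classify_by_keywords` and the example records are hypothetical and are not taken from the tool itself.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def classify_by_keywords(records, keywords):
    """Assign each record to the first keyword found in its header.

    Assumption: one pass per keyword; matched records leave the pool,
    so later keywords only see what earlier keywords did not claim.
    Returns (groups, remaining), where remaining holds unmatched records.
    """
    groups = {kw: [] for kw in keywords}
    remaining = list(records)
    for kw in keywords:
        needle = kw.lower()
        unmatched = []
        for header, seq in remaining:
            if needle in header.lower():
                groups[kw].append((header, seq))
            else:
                unmatched.append((header, seq))
        remaining = unmatched
    return groups, remaining


# Made-up records for illustration only; not real KEGG output.
EXAMPLE_FASTA = """>seq1 Methanosarcina mazei
MKT
AAA
>seq2 Methanococcus maripaludis
MVL
>seq3 Escherichia coli
MSS
"""

if __name__ == "__main__":
    groups, rest = classify_by_keywords(
        parse_fasta(EXAMPLE_FASTA), ["Methanosarcina", "Methanococcus"]
    )
    for keyword, hits in groups.items():
        print(keyword, [header for header, _ in hits])
    print("unmatched", [header for header, _ in rest])
```

Because plain substring matching is used, the same routine works for amino acid and nucleotide FASTA files alike; a real tool would likely need stricter matching (for example on KO identifiers or organism codes) to avoid accidental hits.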