KEGG_Extractor: An Effective Extraction Tool for KEGG Orthologs

The KEGG Orthology (KO) database is a widely used molecular function reference database which can be used to conduct functional annotation of most microorganisms. At present, there are many KEGG tools based on the KO entries for annotating functional orthologs. However, determining how to efficientl...

Bibliographic Details
Main Authors: Chao Zhang, Zhongwei Chen, Miming Zhang, Shulei Jia
Format: Article
Language: English
Published: MDPI AG 2023-02-01
Series: Genes
Subjects: KEGG; KEGG Orthology (KO); keyword matching; methodology
Online Access: https://www.mdpi.com/2073-4425/14/2/386
_version_ 1797620845617610752
author Chao Zhang
Zhongwei Chen
Miming Zhang
Shulei Jia
collection DOAJ
description The KEGG Orthology (KO) database is a widely used reference database of molecular functions that can be used for the functional annotation of most microorganisms. Many KEGG tools currently annotate functional orthologs based on KO entries. However, efficiently extracting and sorting KEGG annotation results remains an obstacle to downstream genome analysis, and effective means of quickly extracting and classifying the gene sequences and species information in KEGG annotations are lacking. Here, we present a supporting tool, KEGG_Extractor, for species-specific gene extraction and classification, which outputs its results through an iterative keyword-matching algorithm. It can extract and classify both amino acid and nucleotide sequences, and it has proved to be fast and efficient for microbial analysis. Analysis of the ancient Wood-Ljungdahl (WL) pathway with KEGG_Extractor reveals that ~226 archaeal strains contain WL pathway-related genes. Most of them are Methanococcus maripaludis, Methanosarcina mazei, and members of the genera Methanobacterium, Thermococcus, and Methanosarcina. Using KEGG_Extractor, the ARWL database was constructed, which had a high accuracy and complement. This tool helps to link genes with KEGG pathways and promotes the reconstruction of molecular networks. Availability and implementation: KEGG_Extractor is freely available from GitHub.
first_indexed 2024-03-11T08:47:19Z
format Article
id doaj.art-d0a4cb2308a141db8a12c66fede3a095
institution Directory of Open Access Journals
issn 2073-4425
language English
last_indexed 2024-03-11T08:47:19Z
publishDate 2023-02-01
publisher MDPI AG
record_format Article
series Genes
spelling doaj.art-d0a4cb2308a141db8a12c66fede3a095
2023-11-16T20:42:16Z
eng
MDPI AG
Genes
2073-4425
2023-02-01
14
2
386
10.3390/genes14020386
KEGG_Extractor: An Effective Extraction Tool for KEGG Orthologs
Chao Zhang (0)
Zhongwei Chen (1)
Miming Zhang (2)
Shulei Jia (3)
Marine Sustainable Development Research Center, Third Institute of Oceanography, Xiamen 361102, China
Nantong Marine Environmental Monitoring Center, Ministry of Natural Resources, Nantong 226002, China
Third Institute of Oceanography, Ministry of Natural Resources, Xiamen 361005, China
Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
https://www.mdpi.com/2073-4425/14/2/386
KEGG
KEGG Orthology (KO)
keyword matching
methodology
title KEGG_Extractor: An Effective Extraction Tool for KEGG Orthologs
topic KEGG
KEGG Orthology (KO)
keyword matching
methodology
url https://www.mdpi.com/2073-4425/14/2/386