PATtyFams: Protein families for the microbial genomes in the PATRIC database
The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Reso...
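Main Authors: | James J Davis, Svetlana Gerdes, Gary J Olsen, Robert Olson, Gordon D Pusch, Maulik Shukla, Veronika Vonstein, Alice R Wattam, Hyunseung Yoo |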
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2016-02-01 |
Series: | Frontiers in Microbiology |
Subjects: | Comparative genomics, Genome annotation, metabolic modeling, RAST, FIGfams |
Online Access: | http://journal.frontiersin.org/Journal/10.3389/fmicb.2016.00118/full |
_version_ | 1811300917152055296 |
---|---|
author | James J Davis, Svetlana Gerdes, Gary J Olsen, Robert Olson, Gordon D Pusch, Maulik Shukla, Veronika Vonstein, Alice R Wattam, Hyunseung Yoo |
author_facet | James J Davis, Svetlana Gerdes, Gary J Olsen, Robert Olson, Gordon D Pusch, Maulik Shukla, Veronika Vonstein, Alice R Wattam, Hyunseung Yoo |
author_sort | James J Davis |
collection | DOAJ |
description | The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods. |
first_indexed | 2024-04-13T06:58:29Z |
format | Article |
id | doaj.art-0f8e26c2d75b448e8852ebf30e4399f3 |
institution | Directory Open Access Journal |
issn | 1664-302X |
language | English |
last_indexed | 2024-04-13T06:58:29Z |
publishDate | 2016-02-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Microbiology |
spelling | doaj.art-0f8e26c2d75b448e8852ebf30e4399f3; 2022-12-22T02:57:10Z; eng; Frontiers Media S.A.; Frontiers in Microbiology; 1664-302X; 2016-02-01; 7; 10.3389/fmicb.2016.00118; 179115; PATtyFams: Protein families for the microbial genomes in the PATRIC database; James J Davis (University of Chicago, Argonne National Laboratory), Svetlana Gerdes (Argonne National Laboratory, Fellowship for Interpretation of Genomes), Gary J Olsen (University of Illinois at Urbana-Champaign), Robert Olson (Argonne National Laboratory, University of Chicago), Gordon D Pusch (Argonne National Laboratory, Fellowship for Interpretation of Genomes), Maulik Shukla (Argonne National Laboratory, University of Chicago), Veronika Vonstein (Argonne National Laboratory, Fellowship for Interpretation of Genomes), Alice R Wattam (Virginia Tech University), Hyunseung Yoo (Argonne National Laboratory, University of Chicago); The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.; http://journal.frontiersin.org/Journal/10.3389/fmicb.2016.00118/full; Comparative genomics, Genome annotation, metabolic modeling, RAST, FIGfams |
spellingShingle | James J Davis, Svetlana Gerdes, Gary J Olsen, Robert Olson, Gordon D Pusch, Maulik Shukla, Veronika Vonstein, Alice R Wattam, Hyunseung Yoo; PATtyFams: Protein families for the microbial genomes in the PATRIC database; Frontiers in Microbiology; Comparative genomics, Genome annotation, metabolic modeling, RAST, FIGfams |
title | PATtyFams: Protein families for the microbial genomes in the PATRIC database |
title_sort | pattyfams protein families for the microbial genomes in the patric database |
topic | Comparative genomics, Genome annotation, metabolic modeling, RAST, FIGfams |
url | http://journal.frontiersin.org/Journal/10.3389/fmicb.2016.00118/full |
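
The description in this record outlines a two-stage idea: proteins are first grouped by their assigned function, and each function-based group is then split into families with a Markov Cluster algorithm (MCL). The following is a minimal sketch of that idea, not the authors' implementation. It assumes function labels and pairwise similarity scores are already available as plain dictionaries; the names `mcl`, `build_families`, `functions`, and `similarity`, the toy protein identifiers, and the inflation value of 2.0 are all hypothetical choices made for illustration. It does not call RAST or the MCL program.

```python
from collections import defaultdict


def normalize_columns(m):
    """Scale each column of a square matrix so that it sums to 1."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / s if s else 0.0
    return out


def mcl(sim, inflation=2.0, iterations=50, eps=1e-6):
    """Toy Markov clustering on a symmetric similarity matrix (list of lists).

    Returns clusters as sorted lists of row indices.
    """
    n = len(sim)
    # Add self-loops, then make the matrix column-stochastic.
    m = [[sim[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power and renormalize the columns.
        m = normalize_columns([[v ** inflation for v in row] for row in m])
    # Each row that still holds mass marks an attractor; the columns with
    # mass in that row are the members of its cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


def build_families(functions, similarity):
    """Group proteins by function label, then split each group with mcl().

    functions:  dict mapping protein id -> function label (assumed input).
    similarity: dict mapping frozenset({id_a, id_b}) -> score (assumed input).
    """
    groups = defaultdict(list)
    for pid, func in sorted(functions.items()):
        groups[func].append(pid)
    families = []
    for func, members in sorted(groups.items()):
        n = len(members)
        sim = [[0.0 if i == j else
                similarity.get(frozenset((members[i], members[j])), 0.0)
                for j in range(n)] for i in range(n)]
        for cluster in mcl(sim):
            families.append((func, [members[i] for i in cluster]))
    return families


# Hypothetical toy input: three proteins share a function label, but only
# p1 and p2 are similar to each other.
functions = {
    "p1": "DNA polymerase I",
    "p2": "DNA polymerase I",
    "p3": "DNA polymerase I",
    "p4": "Ribosomal protein L2",
}
similarity = {frozenset(("p1", "p2")): 0.9}

for func, members in build_families(functions, similarity):
    print(func, members)
```

In this toy input, p3 shares a function label with p1 and p2 but has no similarity to either, so the clustering step places it in its own family. Grouping by function first keeps each similarity matrix small, which is the scalability argument the description makes.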