PATtyFams: Protein families for the microbial genomes in the PATRIC database

The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Reso...


Bibliographic Details
Main Authors: James J Davis, Svetlana Gerdes, Gary J Olsen, Robert Olson, Gordon D Pusch, Maulik Shukla, Veronika Vonstein, Alice R Wattam, Hyunseung Yoo
Format: Article
Language: English
Published: Frontiers Media S.A. 2016-02-01
Series: Frontiers in Microbiology
Subjects: Comparative genomics, Genome annotation, Metabolic modeling, RAST, FIGfams
Online Access: http://journal.frontiersin.org/Journal/10.3389/fmicb.2016.00118/full
author James J Davis
Svetlana Gerdes
Gary J Olsen
Robert Olson
Gordon D Pusch
Maulik Shukla
Veronika Vonstein
Alice R Wattam
Hyunseung Yoo
author_sort James J Davis
collection DOAJ
description The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.
first_indexed 2024-04-13T06:58:29Z
format Article
id doaj.art-0f8e26c2d75b448e8852ebf30e4399f3
institution Directory Open Access Journal
issn 1664-302X
language English
last_indexed 2024-04-13T06:58:29Z
publishDate 2016-02-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Microbiology
spelling doaj.art-0f8e26c2d75b448e8852ebf30e4399f3; 2022-12-22T02:57:10Z; eng; Frontiers Media S.A.; Frontiers in Microbiology; 1664-302X; 2016-02-01; 7; 10.3389/fmicb.2016.00118; 179115
PATtyFams: Protein families for the microbial genomes in the PATRIC database
James J Davis (University of Chicago; Argonne National Laboratory)
Svetlana Gerdes (Argonne National Laboratory; Fellowship for Interpretation of Genomes)
Gary J Olsen (University of Illinois at Urbana-Champaign)
Robert Olson (Argonne National Laboratory; University of Chicago)
Gordon D Pusch (Argonne National Laboratory; Fellowship for Interpretation of Genomes)
Maulik Shukla (Argonne National Laboratory; University of Chicago)
Veronika Vonstein (Argonne National Laboratory; Fellowship for Interpretation of Genomes)
Alice R Wattam (Virginia Tech University)
Hyunseung Yoo (Argonne National Laboratory; University of Chicago)
title PATtyFams: Protein families for the microbial genomes in the PATRIC database
topic Comparative genomics
Genome annotation
metabolic modeling
RAST
FIGfams
url http://journal.frontiersin.org/Journal/10.3389/fmicb.2016.00118/full