Optimizing UniFrac with OpenACC Yields Greater Than One Thousand Times Speed Increase

Bibliographic Details
Main Authors: Igor Sfiligoi, George Armstrong, Antonio Gonzalez, Daniel McDonald, Rob Knight
Format: Article
Language: English
Published: American Society for Microbiology, 2022-06-01
Series: mSystems
Online Access: https://journals.asm.org/doi/10.1128/msystems.00028-22
Collection: Directory of Open Access Journals (DOAJ)
Abstract: UniFrac is an important tool in microbiome research that is used for phylogenetically comparing microbiome profiles to one another (beta diversity). Striped UniFrac recently added the ability to split the problem into many independent subproblems, exhibiting nearly linear scaling but suffering from memory contention. Here, we adapt UniFrac to graphics processing units using OpenACC, enabling greater than 1,000× computational improvement, and apply it to 307,237 samples, the largest 16S rRNA V4 uniformly preprocessed microbiome data set analyzed to date.
Importance: UniFrac is an important tool in microbiome research that is used for phylogenetically comparing microbiome profiles to one another. Here, we adapt UniFrac to operate on graphics processing units, enabling a 1,000× computational improvement. To highlight this advance, we perform what may be the largest microbiome analysis to date, applying UniFrac to 307,237 16S rRNA V4 microbiome samples preprocessed with Deblur. These scaling improvements turn UniFrac into a real-time tool for common data sets and unlock new research questions as more microbiome data are collected.
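The "independent subproblems" of Striped UniFrac can be illustrated with a minimal sketch of unweighted UniFrac in C with OpenACC pragmas. It is not the authors' implementation or loop ordering: it assumes a hypothetical precomputed embedding in which `presence[b * n_samples + s]` is 1 when branch `b` has an observed descendant in sample `s`, and the names `stripe_count`, `unifrac_stripes` and `stripes_to_matrix` are illustrative only. Stripe `k` holds the distance between each sample `j` and sample `(j + k + 1) % n`, so every cell can be computed without touching any other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout assumed for this sketch (not the published code):
 * presence[b * n_samples + s] == 1 if branch b has at least one descendant
 * tip observed in sample s, else 0; lengths[b] is the length of branch b. */

/* Stripes needed so that every unordered sample pair appears at least once. */
static size_t stripe_count(size_t n_samples) { return n_samples / 2; }

/* Unweighted UniFrac stored as stripes: dist[k * n_samples + j] is the
 * distance between sample j and sample (j + k + 1) % n_samples.
 * Each (k, j) cell is independent, so the cells can be spread across
 * workers or, as here, across GPU threads. */
void unifrac_stripes(const unsigned char *presence, const double *lengths,
                     size_t n_branches, size_t n_samples, double *dist)
{
    assert(n_samples >= 2);
    size_t n_stripes = stripe_count(n_samples);

    /* Compiled without OpenACC support, these pragmas are ignored and the
     * loops simply run serially on the CPU. */
    #pragma acc parallel loop collapse(2) \
        copyin(presence[0:n_branches * n_samples], lengths[0:n_branches]) \
        copyout(dist[0:n_stripes * n_samples])
    for (size_t k = 0; k < n_stripes; k++) {
        for (size_t j = 0; j < n_samples; j++) {
            size_t other = (j + k + 1) % n_samples;
            double unique_len = 0.0; /* branch length seen in only one sample */
            double union_len = 0.0;  /* branch length seen in either sample */
            for (size_t b = 0; b < n_branches; b++) {
                int a = presence[b * n_samples + j];
                int c = presence[b * n_samples + other];
                unique_len += lengths[b] * (a ^ c);
                union_len += lengths[b] * (a | c);
            }
            dist[k * n_samples + j] = union_len > 0.0 ? unique_len / union_len : 0.0;
        }
    }
}

/* Expand the stripes into a full symmetric n x n matrix (zero diagonal). */
void stripes_to_matrix(const double *dist, size_t n_samples, double *matrix)
{
    size_t n_stripes = stripe_count(n_samples);
    for (size_t i = 0; i < n_samples * n_samples; i++)
        matrix[i] = 0.0;
    for (size_t k = 0; k < n_stripes; k++) {
        for (size_t j = 0; j < n_samples; j++) {
            size_t other = (j + k + 1) % n_samples;
            double d = dist[k * n_samples + j];
            matrix[j * n_samples + other] = d;
            matrix[other * n_samples + j] = d;
        }
    }
}
```

Putting the per-branch loop innermost keeps each output cell in a private accumulator, which avoids write conflicts between threads; the published work instead discusses memory contention in its own layout, which this sketch does not attempt to reproduce. With an even sample count the last stripe computes each of its pairs twice, which is harmless because the distance is symmetric.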
ISSN: 2379-5077
Volume: 7
Issue: 3
DOI: 10.1128/msystems.00028-22
Affiliations:
Igor Sfiligoi: San Diego Supercomputer Center, University of California, San Diego, La Jolla, California, USA
George Armstrong: Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, California, USA
Antonio Gonzalez, Daniel McDonald, Rob Knight: Department of Pediatrics, University of California, San Diego, La Jolla, California, USA
Subjects: microbiome, GPU, OpenACC, optimization, UniFrac