A comprehensive evaluation of microbial differential abundance analysis methods: current status and potential solutions

Abstract. Background: Differential abundance analysis (DAA) is a central statistical task in microbiome data analysis. A robust and powerful DAA tool can help identify highly confident microbial candidates for further biological validation. Numerous DAA tools have been proposed in the past decade ad...

Bibliographic Details
Main Authors: Lu Yang, Jun Chen
Format: Article
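Language: English
Published: BMC 2022-08-01
Series: Microbiome
Subjects: Microbiome; Metagenomics; Statistical methods; Differential abundance analysis; False discovery rate; Compositional effects
Online Access: https://doi.org/10.1186/s40168-022-01320-0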
_version_ 1811340418885877760
author Lu Yang
Jun Chen
author_facet Lu Yang
Jun Chen
author_sort Lu Yang
collection DOAJ
description Abstract. Background: Differential abundance analysis (DAA) is a central statistical task in microbiome data analysis. A robust and powerful DAA tool can help identify highly confident microbial candidates for further biological validation. Numerous DAA tools have been proposed in the past decade addressing the special characteristics of microbiome data, such as zero inflation and compositional effects. Disturbingly, different DAA tools can sometimes produce quite discordant results, opening up the possibility of cherry-picking a tool that favors one’s own hypothesis. To recommend the best DAA tool or practice to the field, a comprehensive evaluation that covers as many biologically relevant scenarios as possible is critically needed. Results: We performed by far the most comprehensive evaluation of existing DAA tools using real data-based simulations. We found that DAA methods that explicitly address compositional effects, such as ANCOM-BC, ALDEx2, metagenomeSeq (fitFeatureModel), and DACOMP, did have improved false-positive control. However, they are still not optimal: type I error inflation or low statistical power was observed in many settings. The recent LDM method generally had the best power, but its false-positive control in the presence of strong compositional effects was not satisfactory. Overall, none of the evaluated methods is simultaneously robust, powerful, and flexible, which makes selecting the best DAA tool difficult. To meet these analysis needs, we designed an optimized procedure, ZicoSeq, drawing on the strengths of the existing DAA methods. We show that ZicoSeq generally controlled false positives across settings, and its power was among the highest. Application of the DAA methods to a large collection of real datasets revealed patterns similar to those observed in the simulation studies.
Conclusions: Based on the benchmarking study, we conclude that none of the existing DAA methods evaluated can be applied blindly to any real microbiome dataset. The applicability of an existing DAA method depends on specific settings, which are usually unknown a priori. To circumvent the difficulty of selecting the best DAA tool in practice, we designed ZicoSeq, which addresses the major challenges in DAA and remedies the drawbacks of existing DAA methods. ZicoSeq can be applied to microbiome datasets from diverse settings and is a useful DAA tool for robust microbiome biomarker discovery.
first_indexed 2024-04-13T18:41:37Z
format Article
id doaj.art-4c245bb82c744a438f7b3ddcf9ae8a7c
institution Directory Open Access Journal
issn 2049-2618
language English
last_indexed 2024-04-13T18:41:37Z
publishDate 2022-08-01
publisher BMC
record_format Article
series Microbiome
spelling doaj.art-4c245bb82c744a438f7b3ddcf9ae8a7c
2022-12-22T02:34:42Z
eng
BMC
Microbiome
2049-2618
2022-08-01
101123
10.1186/s40168-022-01320-0
A comprehensive evaluation of microbial differential abundance analysis methods: current status and potential solutions
Lu Yang (Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic)
Jun Chen (Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic)
https://doi.org/10.1186/s40168-022-01320-0
Microbiome
Metagenomics
Statistical methods
Differential abundance analysis
False discovery rate
Compositional effects
spellingShingle Lu Yang
Jun Chen
A comprehensive evaluation of microbial differential abundance analysis methods: current status and potential solutions
Microbiome
Microbiome
Metagenomics
Statistical methods
Differential abundance analysis
False discovery rate
Compositional effects
title A comprehensive evaluation of microbial differential abundance analysis methods: current status and potential solutions
title_full A comprehensive evaluation of microbial differential abundance analysis methods: current status and potential solutions
title_fullStr A comprehensive evaluation of microbial differential abundance analysis methods: current status and potential solutions
title_full_unstemmed A comprehensive evaluation of microbial differential abundance analysis methods: current status and potential solutions
title_short A comprehensive evaluation of microbial differential abundance analysis methods: current status and potential solutions
title_sort comprehensive evaluation of microbial differential abundance analysis methods current status and potential solutions
topic Microbiome
Metagenomics
Statistical methods
Differential abundance analysis
False discovery rate
Compositional effects
url https://doi.org/10.1186/s40168-022-01320-0
work_keys_str_mv AT luyang acomprehensiveevaluationofmicrobialdifferentialabundanceanalysismethodscurrentstatusandpotentialsolutions
AT junchen acomprehensiveevaluationofmicrobialdifferentialabundanceanalysismethodscurrentstatusandpotentialsolutions
AT luyang comprehensiveevaluationofmicrobialdifferentialabundanceanalysismethodscurrentstatusandpotentialsolutions
AT junchen comprehensiveevaluationofmicrobialdifferentialabundanceanalysismethodscurrentstatusandpotentialsolutions