Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier
Microbiome samples harvested from urban environments can be informative in predicting the geographic location of unknown samples. The idea that different cities may have geographically disparate microbial signatures can be utilized to predict the geographical location based on city-specific microbio...
Main Authors: | Samuel Anyaso-Samuel, Archie Sachdeva, Subharup Guha, Somnath Datta |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2021-04-01 |
Series: | Frontiers in Genetics |
Subjects: | metagenomics, machine learning, ensemble classifier, microbiome, geolocation |
Online Access: | https://www.frontiersin.org/articles/10.3389/fgene.2021.642282/full |
_version_ | 1819295073816281088 |
---|---|
author | Samuel Anyaso-Samuel, Archie Sachdeva, Subharup Guha, Somnath Datta |
author_sort | Samuel Anyaso-Samuel |
collection | DOAJ |
description | Microbiome samples harvested from urban environments can be informative in predicting the geographic location of unknown samples. The idea that different cities may have geographically disparate microbial signatures can be utilized to predict the geographical location based on city-specific microbiome samples. We implemented this idea by first using standard bioinformatics procedures to pre-process the raw metagenomics samples provided by the CAMDA organizers. We then trained several component classifiers and a robust ensemble classifier with data generated from taxonomy-dependent and taxonomy-free approaches. We also implemented class weighting and an optimal oversampling technique to overcome the class imbalance in the primary data. In each instance, the component classifiers performed differently, whereas the ensemble classifier consistently yielded optimal performance. Finally, we predicted the source cities of mystery samples provided by the organizers. Our results highlight the unreliability of relying on a single classification algorithm to classify metagenomic samples by source origin. By combining several component classifiers via the ensemble approach, we obtained classification results that were as good as the best-performing component classifier. |
first_indexed | 2024-12-24T04:36:25Z |
format | Article |
id | doaj.art-4b3a846c050043d0a2e1aad008363dbf |
institution | Directory of Open Access Journals |
issn | 1664-8021 |
language | English |
last_indexed | 2024-12-24T04:36:25Z |
publishDate | 2021-04-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-4b3a846c050043d0a2e1aad008363dbf; 2022-12-21T17:15:09Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2021-04-01; 12; 10.3389/fgene.2021.642282; 642282; Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier; Samuel Anyaso-Samuel; Archie Sachdeva; Subharup Guha; Somnath Datta; https://www.frontiersin.org/articles/10.3389/fgene.2021.642282/full; metagenomics; machine learning; ensemble classifier; microbiome; geolocation |
title | Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier |
topic | metagenomics, machine learning, ensemble classifier, microbiome, geolocation |
url | https://www.frontiersin.org/articles/10.3389/fgene.2021.642282/full |