CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification
Fungal species identification from metagenomic data is a highly challenging task. The Internal Transcribed Spacer (ITS) region is a potential DNA marker for fungal taxonomy prediction. Computational approaches, especially deep learning algorithms, are highly efficient for better pattern recognition and c...
Main Authors: | Ritwika Das, Anil Rai, Dwijesh Chandra Mishra |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-03-01 |
Series: | Genes |
Subjects: | CNN; fungi ITS; <i>k</i>-mer; KNN; Naïve-Bayes; random forest |
Online Access: | https://www.mdpi.com/2073-4425/14/3/634 |
_version_ | 1797611559506149376 |
---|---|
author | Ritwika Das; Anil Rai; Dwijesh Chandra Mishra |
collection | DOAJ |
description | Fungal species identification from metagenomic data is a highly challenging task. The Internal Transcribed Spacer (ITS) region is a potential DNA marker for fungal taxonomy prediction. Computational approaches, especially deep learning algorithms, are highly efficient at pattern recognition and classification of large datasets compared to in silico techniques such as BLAST and machine learning methods. In this study, we present CNN_FunBar, a convolutional neural network-based approach for the classification of fungal ITS sequences from UNITE+INSDC reference datasets. The effects of convolution kernel size, number of filters, <i>k</i>-mer size, degree of diversity and category-wise frequency of ITS sequences on the classification performance of CNN models were assessed at all taxonomic levels (species, genus, family, order, class and phylum). The CNN models achieved >93% average accuracy at all levels when classifying ITS sequences from balanced datasets with 500 sequences per category using 6-mer frequency features. A comparative study showed that CNN_FunBar can outperform machine learning-based algorithms (SVM, KNN, Naïve-Bayes and Random Forest) as well as existing fungal taxonomy prediction software (funbarRF, Mothur, RDP Classifier and SINTAX). The present study will be helpful for fungal taxonomy classification using large metagenomic datasets. |
first_indexed | 2024-03-11T06:30:24Z |
format | Article |
id | doaj.art-c30f255481584667b886060e593451cc |
institution | Directory of Open Access Journals |
issn | 2073-4425 |
language | English |
last_indexed | 2024-03-11T06:30:24Z |
publishDate | 2023-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Genes |
spelling | doaj.art-c30f255481584667b886060e593451cc; 2023-11-17T11:17:18Z; eng; MDPI AG; Genes; 2073-4425; 2023-03-01; vol. 14, no. 3, art. 634; doi:10.3390/genes14030634; CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification; Ritwika Das, Anil Rai, Dwijesh Chandra Mishra (all: Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India); https://www.mdpi.com/2073-4425/14/3/634; CNN; fungi ITS; <i>k</i>-mer; KNN; Naïve-Bayes; random forest |
title | CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification |
topic | CNN; fungi ITS; <i>k</i>-mer; KNN; Naïve-Bayes; random forest |
url | https://www.mdpi.com/2073-4425/14/3/634 |
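
The description above mentions 6-mer frequency features as input to the CNN models. As a rough illustration of what such a feature vector might look like, the following Python sketch counts overlapping k-mers in a DNA sequence and normalizes them to relative frequencies. The function name `kmer_frequencies`, the lexicographic ordering of k-mers, the normalization by the number of valid windows, and the skipping of windows containing ambiguous bases are all assumptions made for this example; the record does not say how the authors computed their features.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=6, alphabet="ACGT"):
    """Return relative k-mer frequencies in lexicographic k-mer order.

    Hypothetical helper for illustration only; not the CNN_FunBar code.
    """
    seq = sequence.upper()
    # All possible k-mers over the alphabet (4**k entries for DNA).
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumption: windows with ambiguous bases (e.g. N) are ignored,
    # so only counts of valid k-mers enter the total.
    valid_total = sum(counts[m] for m in kmers)
    if valid_total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / valid_total for m in kmers]
```

With the default `k=6` the vector has 4^6 = 4096 entries per sequence, which would then be fed to a classifier; smaller `k` values give shorter, coarser vectors.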