CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification

Fungal species identification from metagenomic data is a highly challenging task. The Internal Transcribed Spacer (ITS) region is a potential DNA marker for fungal taxonomy prediction. Computational approaches, especially deep learning algorithms, are highly efficient for better pattern recognition and c...


Bibliographic Details
Main Authors: Ritwika Das, Anil Rai, Dwijesh Chandra Mishra
Format: Article
Language: English
Published: MDPI AG 2023-03-01
Series: Genes
Subjects: CNN, fungi ITS, k-mer, KNN, Naïve-Bayes, random forest
Online Access: https://www.mdpi.com/2073-4425/14/3/634
_version_ 1797611559506149376
author Ritwika Das
Anil Rai
Dwijesh Chandra Mishra
collection DOAJ
description Fungal species identification from metagenomic data is a highly challenging task. The Internal Transcribed Spacer (ITS) region is a potential DNA marker for fungal taxonomy prediction. Computational approaches, especially deep learning algorithms, are more efficient at pattern recognition and classification of large datasets than in silico techniques such as BLAST and machine learning methods. In this study, we present CNN_FunBar, a convolutional neural network-based approach for classifying fungal ITS sequences from the UNITE+INSDC reference datasets. The effects of convolution kernel size, number of filters, k-mer size, degree of diversity and category-wise frequency of ITS sequences on the classification performance of CNN models were assessed at all taxonomic levels (species, genus, family, order, class and phylum). The CNN models achieved >93% average accuracy at all levels when classifying ITS sequences from balanced datasets with 500 sequences per category using 6-mer frequency features. A comparative study showed that CNN_FunBar can outperform machine learning-based algorithms (SVM, KNN, Naïve-Bayes and Random Forest) as well as existing fungal taxonomy prediction software (funbarRF, Mothur, RDP Classifier and SINTAX). The present study will be helpful for fungal taxonomy classification using large metagenomic datasets.
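The 6-mer frequency features mentioned in the description can be illustrated with a minimal sketch. This record does not give the authors' actual feature pipeline, so the function name `kmer_frequencies`, the lexicographic ordering over A, C, G and T, the normalization to relative frequencies, and the skipping of k-mers that contain ambiguous bases are all assumptions made for illustration only.

```python
from itertools import product


def kmer_frequencies(sequence, k=6):
    """Return a relative k-mer frequency vector of length 4**k.

    Hypothetical helper: ordering, normalization and handling of
    ambiguous bases are illustrative assumptions, not the published method.
    """
    alphabet = "ACGT"
    # Map every possible k-mer to a fixed position (lexicographic order).
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    seq = sequence.upper()
    total = 0
    # Slide a window of width k over the sequence.
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:  # skip windows containing N or other symbols
            counts[pos] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    features = kmer_frequencies("ACGTACGTACGTTTGACA", k=6)
    print(len(features))  # one entry per possible 6-mer
```

With k = 6 each sequence becomes a fixed-length vector of 4^6 = 4096 values, regardless of the length of the ITS sequence, which is what allows sequences of varying length to be fed to a fixed-input classifier.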
first_indexed 2024-03-11T06:30:24Z
format Article
id doaj.art-c30f255481584667b886060e593451cc
institution Directory of Open Access Journals
issn 2073-4425
language English
last_indexed 2024-03-11T06:30:24Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series Genes
spelling doaj.art-c30f255481584667b886060e593451cc
2023-11-17T11:17:18Z
eng
MDPI AG
Genes
2073-4425
2023-03-01
Volume 14, Issue 3, Article 634
10.3390/genes14030634
CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification
Ritwika Das, Anil Rai, Dwijesh Chandra Mishra
Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India (all three authors)
CNN, fungi ITS, k-mer, KNN, Naïve-Bayes, random forest
title CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification
topic CNN
fungi ITS
k-mer
KNN
Naïve-Bayes
random forest
url https://www.mdpi.com/2073-4425/14/3/634