Optimizing classification efficiency with machine learning techniques for pattern matching

Abstract The study proposes a novel model for DNA sequence classification that combines machine learning methods and a pattern-matching algorithm. This model aims to effectively categorize DNA sequences based on their features and enhance the accuracy and efficiency of DNA sequence classification. T...

Bibliographic Details
Main Authors:	Belal A. Hamed, Osman Ali Sadek Ibrahim, Tarek Abd El-Hafeez
Format:	Article
Language:	English
Published:	SpringerOpen 2023-07-01
Series:	Journal of Big Data
Subjects:	Bioinformatics, Feature extraction, Pattern matching, Machine learning, DNA sequences
Online Access:	https://doi.org/10.1186/s40537-023-00804-6

_version_	1797769415927791616
author	Belal A. Hamed, Osman Ali Sadek Ibrahim, Tarek Abd El-Hafeez
author_facet	Belal A. Hamed, Osman Ali Sadek Ibrahim, Tarek Abd El-Hafeez
author_sort	Belal A. Hamed
collection	DOAJ
description	The study proposes a novel model for DNA sequence classification that combines machine learning methods and a pattern-matching algorithm. This model aims to effectively categorize DNA sequences based on their features and enhance the accuracy and efficiency of DNA sequence classification. The performance of the proposed model is evaluated using various machine learning algorithms, and the results indicate that the SVM linear classifier achieves the highest accuracy and F1 score among the tested algorithms. This finding suggests that the proposed model can provide better overall performance than other algorithms in DNA sequence classification. In addition, the proposed model is compared to two suggested algorithms, namely FLPM and PAPM, and the results show that the proposed model outperforms these algorithms in terms of accuracy and efficiency. The study further explores the impact of pattern length on the accuracy and time complexity of each algorithm. The results show that as the pattern length increases, the execution time of each algorithm varies. For a pattern length of 5, SVM Linear and EFLPM have the lowest execution time of 0.0035 s. However, at a pattern length of 25, SVM Linear has the lowest execution time of 0.0012 s. The experimental results of the proposed model show that SVM Linear has the highest accuracy and F1 score among the tested algorithms. SVM Linear achieved an accuracy of 0.963 and an F1 score of 0.97, indicating that it can provide the best overall performance in DNA sequence classification. Naive Bayes also performs well with an accuracy of 0.838 and an F1 score of 0.94. The proposed model offers a valuable contribution to the field of DNA sequence analysis by providing a novel approach to pre-processing and feature extraction. The model’s potential applications include drug discovery, personalized medicine, and disease diagnosis. The study’s findings highlight the importance of considering the impact of pattern length on the accuracy and time complexity of DNA sequence classification algorithms.
first_indexed	2024-03-12T21:08:42Z
format	Article
id	doaj.art-1bf264a0a5374d8287019a16513fa39c
institution	Directory of Open Access Journals
issn	2196-1115
language	English
last_indexed	2024-03-12T21:08:42Z
publishDate	2023-07-01
publisher	SpringerOpen
record_format	Article
series	Journal of Big Data
spelling	doaj.art-1bf264a0a5374d8287019a16513fa39c | 2023-07-30T11:17:42Z | eng | SpringerOpen | Journal of Big Data | 2196-1115 | 2023-07-01 | 101118 | 10.1186/s40537-023-00804-6 | Optimizing classification efficiency with machine learning techniques for pattern matching | Belal A. Hamed (0), Osman Ali Sadek Ibrahim (1), Tarek Abd El-Hafeez (2) | (0) Department of Computer Science, Faculty of Science, Minia University; (1) Department of Computer Science, Faculty of Science, Minia University; (2) Department of Computer Science, Faculty of Science, Minia University | https://doi.org/10.1186/s40537-023-00804-6 | Bioinformatics, Feature extraction, Pattern matching, Machine learning, DNA sequences
title	Optimizing classification efficiency with machine learning techniques for pattern matching
topic	Bioinformatics, Feature extraction, Pattern matching, Machine learning, DNA sequences
url	https://doi.org/10.1186/s40537-023-00804-6
work_keys_str_mv	AT belalahamed optimizingclassificationefficiencywithmachinelearningtechniquesforpatternmatching AT osmanalisadekibrahim optimizingclassificationefficiencywithmachinelearningtechniquesforpatternmatching AT tarekabdelhafeez optimizingclassificationefficiencywithmachinelearningtechniquesforpatternmatching
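
The description in this record refers to pattern length, accuracy, and F1 score without defining them. The following is a minimal Python sketch of those terms, not the authors' method: it assumes fixed-length overlapping patterns (k-mers) as a stand-in for the paper's feature extraction, which this record does not describe, and it computes accuracy and F1 for binary labels. The function names `kmer_counts` and `accuracy_and_f1` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a DNA sequence.

    Assumption: k plays the role of the "pattern length" in the record;
    the paper's own feature extraction may differ.
    """
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    print(kmer_counts("ACGTACGT", 3))
    print(accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1]))
```

A longer pattern length yields fewer windows per sequence but a larger space of possible patterns, which is one reason both accuracy and execution time can change with pattern length.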

Similar Items