Particle Swarm Optimization-Assisted Multilayer Ensemble Model to predict DNA 4mC sites

Bibliographic Details
Main Authors: Sajeeb Saha, PhD; Rajib Kumar Halder, MPhil; Mohammed Nasir Uddin, PhD
Format: Article
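Language: English
Published: Elsevier, 2023-01-01
Series: Informatics in Medicine Unlocked
Subjects: DNA N4-Methylcytosine; Machine learning; Feature selection; Ensemble model; Particle swarm optimization (PSO)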
Online Access:http://www.sciencedirect.com/science/article/pii/S2352914823002204
collection DOAJ
description DNA methylation is an epigenetic modification that plays a crucial role in various biological processes, including gene expression regulation, cell differentiation, and the development of diseases such as cancer. Identifying DNA methylation patterns is essential for understanding its functional implications. Traditional experimental methods for detecting DNA methylation are costly, time-consuming, and inefficient for analyzing large-scale sequencing data. In this research, we explore the application of machine learning techniques to accurately identify DNA methylation sites. We develop a Particle Swarm Optimization-Assisted Multilayer Ensemble Model (PSO-MEM) whose main contributions are extracting semantic features from genetic sequences, optimizing feature dimensions to reduce classification errors, developing a multilayer dynamic approach that transfers learned information between layers during classification, and incorporating ensemble techniques to improve prediction performance. To evaluate the proposed model, we compare it with existing models on eight publicly available datasets. The results demonstrate the efficacy of our approach, with AUC values of 91.99%, 92.80%, 90.28%, 91.03%, 93.09%, 90.79%, 90.68%, and 91.88% for the C. elegans, D. melanogaster, A. thaliana, E. coli, G. subterraneus, G. pickeringi, F. vesca, and R. chinensis datasets, respectively. The results highlight the potential of machine learning techniques for efficient and reliable identification of DNA methylation sites in large-scale genomic data, facilitating advancements in understanding epigenetic modifications and their functional implications.
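The abstract names PSO-based optimization of feature dimensions but does not describe its mechanics. A minimal sketch, assuming the common binary PSO formulation of Kennedy and Eberhart (each particle is a 0/1 mask over the feature vector, and a sigmoid of the velocity gives the probability that a feature is kept), is shown below; it is not the authors' implementation. The names `binary_pso_select`, `toy_fitness` and `INFORMATIVE`, and all parameter values (`w`, `c1`, `c2`, `v_max`), are hypothetical. The toy fitness stands in for what a real pipeline would likely use: a cross-validated classification error on the selected sequence features.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def binary_pso_select(
    n_features: int,
    fitness: Callable[[Mask], float],  # lower is better (e.g. an error rate)
    n_particles: int = 20,
    n_iters: int = 50,
    w: float = 0.7,    # inertia weight (illustrative value)
    c1: float = 1.5,   # pull toward each particle's own best mask
    c2: float = 1.5,   # pull toward the swarm's best mask
    v_max: float = 4.0,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Binary PSO with a sigmoid transfer; returns the best mask and its fitness."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity so bit probabilities stay away from exactly 0 or 1.
                v = max(-v_max, min(v_max, v))
                vel[i][d] = v
                # The sigmoid of the velocity is the probability that bit d is set.
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy objective standing in for a classifier's validation error:
# features 0, 3 and 5 are "informative"; every kept feature costs a small penalty.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    missed = sum(1 for j in INFORMATIVE if mask[j] == 0)
    return missed + 0.05 * sum(mask)


best_mask, best_fit = binary_pso_select(10, toy_fitness)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
print("fitness:", best_fit)
```

The size penalty in the toy objective mirrors the stated goal of reducing feature dimensions: among masks that keep every informative feature, smaller masks score better. Replacing `toy_fitness` with a function that trains a classifier on the masked features and returns its validation error would turn this into a wrapper feature selector; how PSO-MEM couples the selected features to its multilayer ensemble is not specified in this record.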
issn 2352-9148
citation Informatics in Medicine Unlocked, vol. 42, article 101374 (2023)
affiliation Department of Computer Science and Engineering, Jagannath University, Dhaka, 1100, Bangladesh (all authors)
corresponding_author Rajib Kumar Halder, MPhil, Jagannath University, Dept. of Computer Science and Engineering, 9-10 Chittaranjan Ave, Dhaka, 1100, Bangladesh
topic DNA N4-Methylcytosine
Machine learning
Feature selection
Ensemble model
Particle swarm optimization (PSO)