Predicting liver cancer on epigenomics data using machine learning

Bibliographic Details
Main Authors: Vishalkumar Vekariya, Kalpdrum Passi, Chakresh Kumar Jain
Format: Article
Language: English
Published: Frontiers Media S.A. 2022-09-01
Series: Frontiers in Bioinformatics
Subjects: epigenomics; histone; DNA methylation; human genome; RNA
Online Access: https://www.frontiersin.org/articles/10.3389/fbinf.2022.954529/full
collection DOAJ
description Epigenomics is the branch of biology concerned with phenotype modifications that do not change the cell's DNA sequence. Epigenetic modifications alter the properties of DNA, which can ultimately prevent the affected DNA functions from being carried out. These alterations arise in cancer cells and, according to the authors, are the only cause of cancer. The liver is the metabolic cleansing center of the human body and the only organ that can regenerate itself, but liver cancer can halt this cleansing function. In this research, machine learning techniques are used to predict the gene expression of liver cells for liver hepatocellular carcinoma (LIHC), which is the third leading cause of cancer death and affects five hundred thousand people per year. The LIHC data comprise four types: methylation, histone, human genome, and RNA sequences. The data were accessed from The Cancer Genome Atlas (TCGA) using open-source tools in the R programming language. The proposed method considers 1,000 features across the four data types. Nine feature selection methods and eight classification methods were compared, using 5-fold cross-validation and different training-to-test ratios, to select the best model. The best model used 140 features chosen by ReliefF feature selection with the XGBoost classifier, achieving an AUC of 1.0 and an accuracy of 99.67% in predicting liver cancer.
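The feature-selection step named in the description can be illustrated with a small sketch. The code below is not the authors' pipeline: it is a simplified Relief-style scorer for binary labels (a hand-written stand-in for ReliefF), run on synthetic data inside a 5-fold split. The function names (relief_scores, k_fold_indices, make_toy_data), the toy data, and the choice to score features on the training folds only are all assumptions made for illustration. The classifier stage (XGBoost in the paper) is left out.

```python
import random


def relief_scores(X, y, n_neighbors=3):
    """Relief-style feature weights for binary labels (simplified stand-in for ReliefF)."""
    n, d = len(X), len(X[0])
    # Per-feature value ranges, used to scale differences into [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        # Rank all other samples by distance, then split into hits and misses.
        others = sorted((k for k in range(n) if k != i),
                        key=lambda k: dist(X[i], X[k]))
        hits = [k for k in others if y[k] == y[i]][:n_neighbors]
        misses = [k for k in others if y[k] != y[i]][:n_neighbors]
        for j in range(d):
            # Penalise features that differ between same-class neighbours ...
            if hits:
                weights[j] -= sum(diff(X[i], X[k], j) for k in hits) / (len(hits) * n)
            # ... and reward features that differ between opposite-class neighbours.
            if misses:
                weights[j] += sum(diff(X[i], X[k], j) for k in misses) / (len(misses) * n)
    return weights


def k_fold_indices(n, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def make_toy_data(n=60, seed=0):
    """Synthetic data (hypothetical): feature 0 tracks the label, features 1-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label + rng.gauss(0, 0.1)] + [rng.random() for _ in range(3)])
        y.append(label)
    return X, y


X, y = make_toy_data()
folds = k_fold_indices(len(X), k=5)

top_per_fold = []
for test_idx in folds:
    held_out = set(test_idx)
    train = [i for i in range(len(X)) if i not in held_out]
    # Score features on the training portion only, so the held-out fold
    # does not influence which features are selected.
    w = relief_scores([X[i] for i in train], [y[i] for i in train])
    top_per_fold.append(max(range(len(w)), key=w.__getitem__))

print("top-ranked feature in each fold:", top_per_fold)
```

In a fuller pipeline, the selected features from each training split would be passed to a classifier and evaluated on the held-out fold. The record does not say whether the authors performed selection inside or outside the cross-validation loop.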
first_indexed 2024-12-10T06:12:32Z
id doaj.art-7d04d3c1667f4262bf1a1c892d85975d
institution Directory Open Access Journal
issn 2673-7647
last_indexed 2024-12-10T06:12:32Z
spelling doaj.art-7d04d3c1667f4262bf1a1c892d85975d 2022-12-22T01:59:32Z
Volume: 2
DOI: 10.3389/fbinf.2022.954529
Article number: 954529
Author affiliations:
Vishalkumar Vekariya: School of Engineering and Computer Science, Laurentian University, Sudbury, ON, Canada
Kalpdrum Passi: School of Engineering and Computer Science, Laurentian University, Sudbury, ON, Canada
Chakresh Kumar Jain: Department of Biotechnology, Jaypee Institute of Information Technology, Noida, India
topic epigenomics
histone
DNA methylation
human genome
RNA