Predicting liver cancer on epigenomics data using machine learning
Epigenomics is the branch of biology concerned with phenotype modifications that do not change the cell's DNA sequence. Epigenetic modifications change the properties of DNA, which ultimately prevents such DNA actions from being executed. These alterations arise in the canc...
Main Authors: | Vishalkumar Vekariya, Kalpdrum Passi, Chakresh Kumar Jain |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2022-09-01 |
Series: | Frontiers in Bioinformatics |
Subjects: | epigenomics; histone; DNA methylation; human genome; RNA |
Online Access: | https://www.frontiersin.org/articles/10.3389/fbinf.2022.954529/full |
_version_ | 1818032762978828288 |
---|---|
author | Vishalkumar Vekariya; Kalpdrum Passi; Chakresh Kumar Jain |
author_sort | Vishalkumar Vekariya |
collection | DOAJ |
description | Epigenomics is the branch of biology concerned with phenotype modifications that do not change the cell's DNA sequence. Epigenetic modifications change the properties of DNA, which ultimately prevents such DNA actions from being executed. These alterations arise in cancer cells, which is the only cause of cancer. The liver is the metabolic cleansing center of the human body and the only organ that can regenerate itself, but liver cancer can stop the cleansing of the body. In this research, machine learning techniques are used to predict the gene expression of liver cells for liver hepatocellular carcinoma (LIHC), which is the third leading cause of cancer death and affects five hundred thousand people per year. The LIHC data include four types: methylation, histone, human genome, and RNA sequences. The data were accessed from The Cancer Genome Atlas (TCGA) using open-source tools in the R programming language. The proposed method considers 1,000 features across the four types of data. Nine feature selection methods and eight classification methods were compared to select the best model, using 5-fold cross-validation and different training-to-test ratios. The best model used 140 features selected by ReliefF with an XGBoost classifier and predicted liver cancer with an AUC of 1.0 and an accuracy of 99.67%. |
first_indexed | 2024-12-10T06:12:32Z |
format | Article |
id | doaj.art-7d04d3c1667f4262bf1a1c892d85975d |
institution | Directory Open Access Journal |
issn | 2673-7647 |
language | English |
last_indexed | 2024-12-10T06:12:32Z |
publishDate | 2022-09-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Bioinformatics |
spelling | doaj.art-7d04d3c1667f4262bf1a1c892d85975d; 2022-12-22T01:59:32Z; eng; Frontiers Media S.A.; Frontiers in Bioinformatics; ISSN 2673-7647; 2022-09-01; vol. 2; doi:10.3389/fbinf.2022.954529; article 954529; Predicting liver cancer on epigenomics data using machine learning; Vishalkumar Vekariya (School of Engineering and Computer Science, Laurentian University, Sudbury, ON, Canada); Kalpdrum Passi (School of Engineering and Computer Science, Laurentian University, Sudbury, ON, Canada); Chakresh Kumar Jain (Department of Biotechnology, Jaypee Institute of Information Technology, Noida, India); https://www.frontiersin.org/articles/10.3389/fbinf.2022.954529/full; keywords: epigenomics, histone, DNA methylation, human genome, RNA |
title | Predicting liver cancer on epigenomics data using machine learning |
topic | epigenomics; histone; DNA methylation; human genome; RNA |
url | https://www.frontiersin.org/articles/10.3389/fbinf.2022.954529/full |
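
The description above outlines a workflow of feature selection followed by classification, evaluated with 5-fold cross-validation and AUC. The sketch below illustrates that general shape only and is not the authors' pipeline. It runs on synthetic data rather than TCGA data, uses a basic two-class Relief scorer as a simplified stand-in for ReliefF, and uses a nearest-centroid scorer as a stand-in for XGBoost. All function names (`relief_scores`, `centroid_scores`, `auc`, `cross_validate`) and all parameter values are hypothetical choices made for illustration.

```python
import random


def relief_scores(X, y, n_iter=50, seed=0):
    """Basic two-class Relief weights; higher means the feature separates classes better."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Per-feature range, used to put all features on a comparable scale.
    spans = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for _ in range(n_iter):
        i = rng.randrange(n)
        same = [k for k in range(n) if k != i and y[k] == y[i]]
        other = [k for k in range(n) if y[k] != y[i]]
        hit = min(same, key=lambda k: dist(X[i], X[k]))    # nearest same-class sample
        miss = min(other, key=lambda k: dist(X[i], X[k]))  # nearest other-class sample
        for j in range(d):
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n_iter
    return weights


def centroid_scores(X_train, y_train, X_test, cols):
    """Stand-in classifier: larger score means closer to the class-1 centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        return [sum(r[j] for r in rows) / len(rows) for j in cols]

    c0, c1 = centroid(0), centroid(1)

    def sq_dist(x, c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(cols, c))

    return [sq_dist(x, c0) - sq_dist(x, c1) for x in X_test]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, k=5, n_keep=2, seed=0):
    """Stratified k-fold CV; features are re-selected on each training fold only."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    order.sort(key=lambda i: y[i])  # stable sort keeps the shuffle within each class
    results = []
    for f in range(k):
        test = order[f::k]  # every k-th index, so both classes appear in each fold
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        w = relief_scores(X_tr, y_tr)
        cols = sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:n_keep]
        scores = centroid_scores(X_tr, y_tr, [X[i] for i in test], cols)
        results.append(auc(scores, [y[i] for i in test]))
    return results


# Synthetic data (an assumption for this sketch): 60 samples, 8 features,
# of which only the first two carry class signal.
data_rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [[data_rng.gauss(3.0 * t, 1.0), data_rng.gauss(3.0 * t, 1.0)]
     + [data_rng.gauss(0.0, 1.0) for _ in range(6)] for t in y]

fold_aucs = cross_validate(X, y)
mean_auc = sum(fold_aucs) / len(fold_aucs)
print("per-fold AUC:", [round(a, 3) for a in fold_aucs])
print("mean AUC:", round(mean_auc, 3))
```

Selecting features inside each training fold, rather than once on the full data set, keeps the held-out samples from influencing which features are chosen. Whether the published study did this is not stated in the record.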