Cancer Classification Based on the Features of Itemset Sequence Pattern of TP53 Protein Code Using Deep Miden - KNN

Bibliographic Details
Main Authors: Marji Marji, Imam Cholissodin, Dian Eka Ratnawati, Edy Santoso, Nurul Hidayat
Format: Article
Language: English
Published: University of Brawijaya, 2022-09-01
Series: JITeCS (Journal of Information Technology and Computer Science)
Online Access: https://jitecs.ub.ac.id/index.php/jitecs/article/view/401
ISSN: 2540-9433, 2540-9824
Volume/Issue: 7(1)
DOI: 10.25126/jitecs.202271401
Affiliations: Computer Science Faculty, Brawijaya University (Marji Marji); Brawijaya University (Imam Cholissodin, Dian Eka Ratnawati, Edy Santoso, Nurul Hidayat)
Collection: DOAJ

Full description

Cancer is a disease that is still difficult to identify today. One of its causes is genetic modification caused by mutations in the p53 gene. Healthy cells have the wild-type (normal) p53 protein, which is able to regulate DNA separation. If the DNA mutates, cancer becomes difficult to detect because the composition of the protein has changed. Bioinformatics combines biology and information engineering (TI) to manage data, and one application of data mining in bioinformatics is in the development of the pharmaceutical and medical industries. Data mining classification can use a variety of methods, including K-Nearest Neighbor (KNN), C4.5, ID3, and others; one of the most reliable of these is KNN. This study developed two algorithms. The first was a modification of the k-fold method, which divided the data into training data and test data, with the test-1 and test-2 data split into slices. The second, Deep Miden, was a method for selecting the itemset sequence pattern with the largest information gain, whether of 2 itemsets, 3 itemsets, or more. The best accuracy, 96.00%, was obtained through computational testing on a server while varying the number of Deep Miden itemset sequence patterns and the value of k in the KNN classifier.
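The description outlines Deep Miden only at a high level. The Python sketch below is a rough illustration of the general idea: it ranks candidate itemset sequence patterns of several lengths by information gain, then classifies with KNN. It makes several assumptions that are not in the record. An itemset sequence pattern is taken to be a contiguous run of amino-acid letters, a sample's features are the presence (1) or absence (0) of each selected pattern, and neighbours are compared by Hamming distance. The function names and toy sequences are also hypothetical. This is not the authors' implementation, and it omits their modified k-fold split.

```python
import math
from collections import Counter


def patterns(seq, n):
    """All contiguous length-n substrings of a protein sequence (assumed pattern definition)."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(seqs, labels, pattern):
    """Entropy reduction from splitting the samples on presence/absence of `pattern`."""
    has = [lab for s, lab in zip(seqs, labels) if pattern in s]
    lacks = [lab for s, lab in zip(seqs, labels) if pattern not in s]
    remainder = sum(len(part) / len(labels) * entropy(part) for part in (has, lacks) if part)
    return entropy(labels) - remainder


def top_patterns(seqs, labels, lengths, m):
    """Pool candidate patterns of every length in `lengths` and keep the m with highest gain.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    candidates = set().union(*(patterns(s, n) for s in seqs for n in lengths))
    ranked = sorted(candidates, key=lambda p: (-information_gain(seqs, labels, p), p))
    return ranked[:m]


def featurize(seq, selected):
    """Binary feature vector: 1 if the sequence contains the pattern, else 0."""
    return [1 if p in seq else 0 for p in selected]


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training vectors nearest to x in Hamming distance."""
    nearest = sorted(
        (sum(a != b for a, b in zip(tx, x)), ty) for tx, ty in zip(train_x, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy, made-up sequences and labels, for illustration only.
train = ["MEEPQSD", "MEEPQSE", "MKKPQSD", "MKKPQSE"]
y = ["A", "A", "B", "B"]
selected = top_patterns(train, y, lengths=(2, 3), m=2)
X = [featurize(s, selected) for s in train]
print(selected)                                                # ['EE', 'EEP']
print(knn_predict(X, y, featurize("MEEPQSA", selected), k=3))  # A
print(knn_predict(X, y, featurize("MKKPQSA", selected), k=3))  # B
```

In this sketch, changing `lengths` and `m` loosely corresponds to varying the number of Deep Miden patterns. Looping over several values of `k` would correspond to the k variations the description reports. The paper's own pattern definition, distance measure, and data-splitting procedure may differ.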