Cancer Classification Based on the Features of Itemset Sequence Pattern of TP53 Protein Code Using Deep Miden - KNN
Cancer remains a difficult disease to identify. One cause of cancer is genetic modification resulting from mutations in the p53 gene. Healthy cells have a wild-type (normal) p53 protein that is able to manage DNA separation. If DNA mutates, it will be difficult to detect c...
Main Authors: | Marji Marji, Imam Cholissodin, Dian Eka Ratnawati, Edy Santoso, Nurul Hidayat |
---|---|
Format: | Article |
Language: | English |
Published: | University of Brawijaya 2022-09-01 |
Series: | JITeCS (Journal of Information Technology and Computer Science) |
Online Access: | https://jitecs.ub.ac.id/index.php/jitecs/article/view/401 |
_version_ | 1797248755046547456 |
---|---|
author | Marji Marji, Imam Cholissodin, Dian Eka Ratnawati, Edy Santoso, Nurul Hidayat |
collection | DOAJ |
description |
Cancer remains a difficult disease to identify. One cause of cancer is genetic modification resulting from mutations in the p53 gene. Healthy cells have a wild-type (normal) p53 protein that is able to manage DNA separation. If the DNA mutates, it will be difficult to detect cancer because the composition of the protein has changed. Bioinformatics combines biology and information engineering (TI) to manage data. One application of data mining in bioinformatics is the development of the pharmaceutical and medical industries. Data mining classification can use a variety of methods, including K-Nearest Neighbor (KNN), C4.5, ID3, and several others. KNN is one of the most reliable data classification methods. This study developed two algorithms. The first modified the k-fold method to divide the data into training data and test data, with the test-1 and test-2 data made into slices. The second selected the itemset sequence patterns with the largest information gain, whether of 2 itemsets, 3 itemsets, and so on (Deep Miden). The best accuracy, 96.00%, was obtained in computational tests run on a server across variations in the number of Deep Miden itemset sequence patterns and several values of k for the KNN classification method.
|
first_indexed | 2024-04-24T20:19:37Z |
format | Article |
id | doaj.art-86293154478541028904fd5ab219378c |
institution | Directory Open Access Journal |
issn | 2540-9433 2540-9824 |
language | English |
last_indexed | 2024-04-24T20:19:37Z |
publishDate | 2022-09-01 |
publisher | University of Brawijaya |
record_format | Article |
series | JITeCS (Journal of Information Technology and Computer Science) |
spelling | doaj.art-86293154478541028904fd5ab219378c; 2024-03-22T08:34:19Z; eng; University of Brawijaya; JITeCS (Journal of Information Technology and Computer Science); 2540-9433; 2540-9824; 2022-09-01; 7; 1; 10.25126/jitecs.202271401; Cancer Classification Based on the Features of Itemset Sequence Pattern of TP53 Protein Code Using Deep Miden - KNN; Marji Marji (Computer Science Faculty, Brawijaya University); Imam Cholissodin (Brawijaya University); Dian Eka Ratnawati (Brawijaya University); Edy Santoso (Brawijaya University); Nurul Hidayat (Brawijaya University); https://jitecs.ub.ac.id/index.php/jitecs/article/view/401 |
title | Cancer Classification Based on the Features of Itemset Sequence Pattern of TP53 Protein Code Using Deep Miden - KNN |
url | https://jitecs.ub.ac.id/index.php/jitecs/article/view/401 |
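
The abstract describes ranking itemset sequence patterns by information gain and classifying with KNN, but this record does not give the details of Deep Miden or of the modified k-fold split. The following is therefore only a minimal sketch under stated assumptions: an "itemset sequence pattern" is taken to be a contiguous run of 2 or 3 amino-acid codes, each pattern becomes a presence/absence feature, and KNN uses Hamming distance with a majority vote. The function names and the toy sequences are hypothetical and are not real TP53 data; the modified k-fold step is not reproduced.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def patterns(seq, sizes=(2, 3)):
    """All contiguous substrings of the given lengths.

    Assumption: this stands in for the paper's 2-itemset, 3-itemset, ... patterns.
    """
    return {seq[i:i + k] for k in sizes for i in range(len(seq) - k + 1)}


def information_gain(pattern, seqs, labels):
    """Entropy reduction from splitting the samples on presence of the pattern."""
    present = [y for s, y in zip(seqs, labels) if pattern in s]
    absent = [y for s, y in zip(seqs, labels) if pattern not in s]
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - remainder


def select_patterns(seqs, labels, top_n, sizes=(2, 3)):
    """Keep the top_n patterns by information gain (ties broken alphabetically)."""
    candidates = set().union(*(patterns(s, sizes) for s in seqs))
    ranked = sorted(candidates, key=lambda p: (-information_gain(p, seqs, labels), p))
    return ranked[:top_n]


def encode(seq, selected):
    """Binary feature vector: 1 if the pattern occurs in the sequence, else 0."""
    return [1 if p in seq else 0 for p in selected]


def knn_predict(train_vectors, train_labels, query, k=3):
    """Majority vote among the k training vectors nearest in Hamming distance."""
    distances = sorted(
        (sum(a != b for a, b in zip(vec, query)), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy, invented sequences and labels (illustration only, not TP53 data).
train_seqs = ["MEEPQSD", "MEEPLSD", "MEKPQSD", "MEKPLSD"]
train_labels = ["normal", "normal", "cancer", "cancer"]

selected = select_patterns(train_seqs, train_labels, top_n=4)
train_vectors = [encode(s, selected) for s in train_seqs]
prediction = knn_predict(train_vectors, train_labels, encode("MEEPQSA", selected), k=3)
```

In this sketch both the number of selected patterns (`top_n`) and `k` are free parameters, which mirrors the abstract's statement that accuracy was measured across variations of the pattern count and several k values.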