Predicting Cancer Tissue-of-Origin by a Machine Learning Method Using DNA Somatic Mutation Data
Patients with carcinoma of unknown primary (CUP) account for 3–5% of all cancer cases. A large number of metastatic cancers require further diagnosis to determine their tissue of origin. However, diagnosis of CUP and identification of its primary site are challenging. Previous studies have suggested...
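Main Authors: | Xiaojun Liu, Lianxing Li, Lihong Peng, Bo Wang, Jidong Lang, Qingqing Lu, Xizhe Zhang, Yi Sun, Geng Tian, Huajun Zhang, Liqian Zhou |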
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2020-07-01 |
Series: | Frontiers in Genetics |
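Subjects: | somatic mutation; machine learning; random forest; patients with carcinoma of unknown primary; tissue of origin |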
Online Access: | https://www.frontiersin.org/article/10.3389/fgene.2020.00674/full |
_version_ | 1819168584294727680 |
---|---|
author | Xiaojun Liu, Lianxing Li, Lihong Peng, Bo Wang, Jidong Lang, Qingqing Lu, Xizhe Zhang, Yi Sun, Geng Tian, Huajun Zhang, Liqian Zhou |
collection | DOAJ |
description | Patients with carcinoma of unknown primary (CUP) account for 3–5% of all cancer cases. A large number of metastatic cancers require further diagnosis to determine their tissue of origin. However, diagnosis of CUP and identification of its primary site are challenging. Previous studies have suggested that molecular profiling of tissue-specific genes could be useful in inferring the primary tissue of a tumor. The purpose of this study was to evaluate how well somatic mutations detected in a tumor identify the cancer tissue of origin. We downloaded the somatic mutation datasets from the International Cancer Genome Consortium project. The random forest algorithm was used to extract features, and a classifier was built using logistic regression. Specifically, the somatic mutations of 300 genes were extracted; these genes are significantly enriched in functions such as cell-to-cell adhesion. In addition, the prediction accuracy of tissue-of-origin inference for 3,374 cancer samples across 13 cancer types reached 81% in 10-fold cross-validation. Our method could be useful in the identification of cancer tissue of origin, as well as the diagnosis and treatment of cancers. |
first_indexed | 2024-12-22T19:05:56Z |
format | Article |
id | doaj.art-d5130ff8ad7e43e0b4aed6e77037be02 |
institution | Directory of Open Access Journals |
issn | 1664-8021 |
language | English |
last_indexed | 2024-12-22T19:05:56Z |
publishDate | 2020-07-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-d5130ff8ad7e43e0b4aed6e77037be02; 2022-12-21T18:15:50Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2020-07-01; 11; 10.3389/fgene.2020.00674; 537483; Predicting Cancer Tissue-of-Origin by a Machine Learning Method Using DNA Somatic Mutation Data; Xiaojun Liu (School of Computer Science, Hunan University of Technology, Zhuzhou, China); Lianxing Li (Chifeng Municipal Hospital, Chifeng, China); Lihong Peng (School of Computer Science, Hunan University of Technology, Zhuzhou, China); Bo Wang (Genesis Beijing Co., Ltd., Beijing, China); Jidong Lang (Genesis Beijing Co., Ltd., Beijing, China); Qingqing Lu (Genesis Beijing Co., Ltd., Beijing, China); Xizhe Zhang (Chifeng Municipal Hospital, Chifeng, China); Yi Sun (Chifeng Municipal Hospital, Chifeng, China); Geng Tian (Genesis Beijing Co., Ltd., Beijing, China); Huajun Zhang (College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, China); Liqian Zhou (School of Computer Science, Hunan University of Technology, Zhuzhou, China); https://www.frontiersin.org/article/10.3389/fgene.2020.00674/full; somatic mutation; machine learning; random forest; patients with carcinoma of unknown primary; tissue of origin |
title | Predicting Cancer Tissue-of-Origin by a Machine Learning Method Using DNA Somatic Mutation Data |
topic | somatic mutation; machine learning; random forest; patients with carcinoma of unknown primary; tissue of origin |
url | https://www.frontiersin.org/article/10.3389/fgene.2020.00674/full |
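
The description outlines a three-step approach: random forest feature extraction, a logistic regression classifier, and 10-fold cross-validation. The following is a minimal sketch of that kind of pipeline, assuming scikit-learn and NumPy are available. It is not the authors' implementation: the data are synthetic rather than ICGC mutation calls, and the helper names (`make_toy_mutation_matrix`, `build_pipeline`), the 30-gene cutoff, and the tree count are illustrative assumptions (the record reports 300 selected genes).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def make_toy_mutation_matrix(n_per_class=40, n_genes=200, n_classes=3, seed=0):
    """Synthetic binary sample-by-gene matrix (1 = gene mutated). Not ICGC data."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for label in range(n_classes):
        # Every gene starts with a 5% background mutation rate.
        block = (rng.random((n_per_class, n_genes)) < 0.05).astype(int)
        # Ten hypothetical "marker" genes per class are mutated in ~60% of samples.
        start = label * 10
        block[:, start:start + 10] = (rng.random((n_per_class, 10)) < 0.6).astype(int)
        X_parts.append(block)
        y_parts.append(np.full(n_per_class, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


def build_pipeline(n_selected_genes=30, seed=0):
    """Random forest importance ranking followed by logistic regression."""
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=seed),
        max_features=n_selected_genes,
        threshold=-np.inf,  # disable the importance threshold; keep the top-n genes
    )
    classifier = LogisticRegression(max_iter=1000)
    return make_pipeline(selector, classifier)


X, y = make_toy_mutation_matrix()
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# The selector sits inside the pipeline, so it is refit on each training fold.
scores = cross_val_score(build_pipeline(), X, y, cv=cv, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"10-fold CV accuracy: {mean_accuracy:.2f}")
```

Placing the gene selection step inside the pipeline means the held-out fold never influences which genes are chosen, which avoids an optimistic bias in the cross-validated accuracy. Whether the original study selected genes inside or outside the cross-validation loop is not stated in this record.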