Predicting Cancer Tissue-of-Origin by a Machine Learning Method Using DNA Somatic Mutation Data

Bibliographic Details
Main Authors: Xiaojun Liu, Lianxing Li, Lihong Peng, Bo Wang, Jidong Lang, Qingqing Lu, Xizhe Zhang, Yi Sun, Geng Tian, Huajun Zhang, Liqian Zhou
Format: Article
Language: English
Published: Frontiers Media S.A. 2020-07-01
Series: Frontiers in Genetics
Subjects: somatic mutation; machine learning; random forest; patients with carcinoma of unknown primary; tissue of origin
Online Access: https://www.frontiersin.org/article/10.3389/fgene.2020.00674/full
ISSN: 1664-8021
Volume: 11
DOI: 10.3389/fgene.2020.00674
Affiliations: Xiaojun Liu, Lihong Peng, Liqian Zhou (School of Computer Science, Hunan University of Technology, Zhuzhou, China); Lianxing Li, Xizhe Zhang, Yi Sun (Chifeng Municipal Hospital, Chifeng, China); Bo Wang, Jidong Lang, Qingqing Lu, Geng Tian (Genesis Beijing Co., Ltd., Beijing, China); Huajun Zhang (College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, China)
Indexed in: Directory of Open Access Journals (DOAJ)

Abstract
Patients with carcinoma of unknown primary (CUP) account for 3–5% of all cancer cases. A large number of metastatic cancers require further diagnosis to determine their tissue of origin. However, diagnosing CUP and identifying its primary site are challenging. Previous studies have suggested that molecular profiling of tissue-specific genes could be useful in inferring the primary tissue of a tumor. The purpose of this study was to evaluate how well somatic mutations detected in a tumor can identify the cancer tissue of origin. We downloaded the somatic mutation datasets from the International Cancer Genome Consortium (ICGC) project. The random forest algorithm was used to select features, and a classifier was built using logistic regression. Specifically, somatic mutations in 300 genes were selected; these genes are significantly enriched in functions such as cell-to-cell adhesion. The tissue-of-origin prediction accuracy for 3,374 cancer samples across 13 cancer types reached 81% in 10-fold cross-validation. Our method could be useful in identifying the cancer tissue of origin, as well as in cancer diagnosis and treatment.
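The abstract describes a pipeline of per-gene somatic-mutation features, a selected gene panel, a logistic regression classifier, and 10-fold cross-validation. The sketch below illustrates only the classification and evaluation steps under loudly stated assumptions; it is not the authors' code. Assumed (not from the paper): mutations arrive as hypothetical `(sample_id, gene)` pairs; the gene panel is passed in already chosen (the random-forest ranking the authors used is not reproduced); the classifier is a multinomial logistic regression fitted by plain batch gradient descent with illustrative hyperparameters `lr`, `epochs`, `l2`; and `build_matrix`, `fit_logreg`, `predict`, `cross_val_accuracy` are hypothetical helper names. It uses only the Python standard library so it runs without extra packages.

```python
# Hypothetical sketch: binary gene-mutation features + multinomial logistic
# regression + stratified k-fold cross-validation, standard library only.
# The gene panel is assumed to be chosen beforehand (the paper uses random
# forest for that step; it is not reproduced here).
import math
import random
from collections import defaultdict


def build_matrix(mutations, samples, genes):
    """Encode (sample_id, gene) records as 0/1 rows over the gene panel."""
    col = {g: j for j, g in enumerate(genes)}
    mutated = defaultdict(set)
    for sample_id, gene in mutations:
        if gene in col:  # mutations outside the panel are ignored
            mutated[sample_id].add(col[gene])
    return [[1.0 if j in mutated[s] else 0.0 for j in range(len(genes))]
            for s in samples]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_logreg(X, y, n_classes, lr=0.5, epochs=200, l2=1e-3):
    """Multinomial logistic regression fitted by full-batch gradient descent.

    lr, epochs and l2 are illustrative values, not tuned settings.
    """
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for xi, yi in zip(X, y):
            p = softmax([sum(w * v for w, v in zip(W[k], xi)) + b[k]
                         for k in range(n_classes)])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == yi else 0.0)  # dLoss/dscore_k
                gb[k] += err
                for j, v in enumerate(xi):
                    gW[k][j] += err * v
        for k in range(n_classes):
            b[k] -= lr * gb[k] / n
            for j in range(d):
                W[k][j] -= lr * (gW[k][j] / n + l2 * W[k][j])
    return W, b


def predict(model, x):
    """Return the index of the highest-scoring class."""
    W, b = model
    scores = [sum(w * v for w, v in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


def cross_val_accuracy(X, y, n_classes, k=10, seed=0):
    """Stratified k-fold CV: each class's shuffled samples are dealt round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    correct = total = 0
    for fold in folds:
        if not fold:
            continue
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_logreg([X[i] for i in train], [y[i] for i in train], n_classes)
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
        total += len(fold)
    return correct / total


if __name__ == "__main__":
    # Toy data: three hypothetical tissues, each marked by one mutated gene.
    genes = ["GENE_A", "GENE_B", "GENE_C"]
    samples = [f"s{i}" for i in range(30)]
    labels = [i // 10 for i in range(30)]
    mutations = [(s, genes[c]) for s, c in zip(samples, labels)]
    X = build_matrix(mutations, samples, genes)
    print(cross_val_accuracy(X, labels, n_classes=3))  # prints 1.0
```

On real mutation data one would normally use a library implementation (for example scikit-learn) and, as general good practice, choose the gene panel inside each training fold so that feature selection never sees the held-out samples.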