RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data
The usage of expressed somatic mutations may have a unique advantage in identifying active cancer driver mutations. However, accurately calling mutations from RNA-seq data is difficult due to confounding factors such as RNA-editing, reverse transcription, and gap alignment. In the present study, we...
Main Authors: | Qihan Long, Yangyang Yuan, Miaoxin Li |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2022-06-01 |
Series: | Frontiers in Genetics |
Subjects: | cancer; somatic mutation; RNA; RNA-Seq; machine learning; RNA-SSNV |
Online Access: | https://www.frontiersin.org/articles/10.3389/fgene.2022.865313/full |
_version_ | 1811238213904236544 |
---|---|
author | Qihan Long, Yangyang Yuan, Miaoxin Li |
author_facet | Qihan Long, Yangyang Yuan, Miaoxin Li |
author_sort | Qihan Long |
collection | DOAJ |
description | Expressed somatic mutations may offer a unique advantage in identifying active cancer driver mutations. However, accurately calling mutations from RNA-seq data is difficult because of confounding factors such as RNA editing, reverse transcription, and gap alignment. In the present study, we propose a framework, RNA-SSNV (https://github.com/pmglab/RNA-SSNV), to call somatic single nucleotide variants (SSNVs) from tumor bulk RNA-seq data. Using a multi-filtering strategy and a machine-learning classification model trained on curated features, RNA-SSNV achieved the best precision–recall rate (0.880–0.884) in a testing dataset and robustly retained an AUC of 0.94 for the precision–recall curve in three adult-based TCGA (The Cancer Genome Atlas) validation datasets. We further showed that the somatic mutations called by RNA-SSNV tended to have a higher functional impact and therapeutic power in known driver genes. Furthermore, VAF (variant allele fraction) analysis revealed that subclones harboring expressed mutations had an evolutionary selective advantage and that RNA had higher detection power to rescue DNA-omitted mutations. In sum, RNA-SSNV will be a useful approach to accurately call expressed somatic mutations for a more insightful analysis of cancer driver genes and carcinogenic mechanisms. |
first_indexed | 2024-04-12T12:36:46Z |
format | Article |
id | doaj.art-e61ec7d513504eb2b107197c0bce3ce1 |
institution | Directory Open Access Journal |
issn | 1664-8021 |
language | English |
last_indexed | 2024-04-12T12:36:46Z |
publishDate | 2022-06-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-e61ec7d513504eb2b107197c0bce3ce1; 2022-12-22T03:32:52Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2022-06-01; 13; 10.3389/fgene.2022.865313; 865313; RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data; Qihan Long (0, 1, 2); Yangyang Yuan (3, 4, 5); Miaoxin Li (6, 7, 8, 9, 10); (0, 3, 6) Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China; (1, 4, 7) Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China; (2, 5, 8) Center for Disease Genome Research, Sun Yat-Sen University, Guangzhou, China; (9) Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China; (10) Key Laboratory of Tropical Disease Control (SYSU), Ministry of Education, Guangzhou, China; https://www.frontiersin.org/articles/10.3389/fgene.2022.865313/full; cancer; somatic mutation; RNA; RNA-Seq; machine learning; RNA-SSNV |
title | RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data |
topic | cancer; somatic mutation; RNA; RNA-Seq; machine learning; RNA-SSNV |
url | https://www.frontiersin.org/articles/10.3389/fgene.2022.865313/full |