RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data

The use of expressed somatic mutations may offer a unique advantage in identifying active cancer driver mutations. However, accurately calling mutations from RNA-seq data is difficult because of confounding factors such as RNA editing, reverse transcription, and gap alignment. In the present study, we...

Bibliographic Details
Main Authors: Qihan Long, Yangyang Yuan, Miaoxin Li
Format: Article
Language: English
Published: Frontiers Media S.A. 2022-06-01
Series: Frontiers in Genetics
Subjects: cancer; somatic mutation; RNA; RNA-Seq; machine learning; RNA-SSNV
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2022.865313/full
_version_ 1811238213904236544
author Qihan Long
Yangyang Yuan
Miaoxin Li
author_facet Qihan Long
Yangyang Yuan
Miaoxin Li
author_sort Qihan Long
collection DOAJ
description The use of expressed somatic mutations may offer a unique advantage in identifying active cancer driver mutations. However, accurately calling mutations from RNA-seq data is difficult because of confounding factors such as RNA editing, reverse transcription, and gap alignment. In the present study, we proposed a framework (named RNA-SSNV, https://github.com/pmglab/RNA-SSNV) to call somatic single nucleotide variants (SSNVs) from tumor bulk RNA-seq data. Based on a comprehensive multi-filtering strategy and a machine-learning classification model trained with comprehensively curated features, RNA-SSNV achieved the best precision–recall rate (0.880–0.884) in a testing dataset and robustly retained an AUC of 0.94 for the precision–recall curve in three validation adult-based TCGA (The Cancer Genome Atlas) datasets. We further showed that the somatic mutations called by RNA-SSNV tended to have a higher functional impact and therapeutic power in known driver genes. Furthermore, VAF (variant allele fraction) analysis revealed that subclones harboring expressed mutations had an evolutionary selection advantage and that RNA had higher detection power to rescue DNA-omitted mutations. In sum, RNA-SSNV will be a useful approach for accurately calling expressed somatic mutations, enabling a more insightful analysis of cancer driver genes and carcinogenic mechanisms.
first_indexed 2024-04-12T12:36:46Z
format Article
id doaj.art-e61ec7d513504eb2b107197c0bce3ce1
institution Directory of Open Access Journals
issn 1664-8021
language English
last_indexed 2024-04-12T12:36:46Z
publishDate 2022-06-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-e61ec7d513504eb2b107197c0bce3ce1
2022-12-22T03:32:52Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2022-06-01
13
10.3389/fgene.2022.865313
865313
RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data
Qihan Long: Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China; Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China; Center for Disease Genome Research, Sun Yat-Sen University, Guangzhou, China
Yangyang Yuan: Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China; Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China; Center for Disease Genome Research, Sun Yat-Sen University, Guangzhou, China
Miaoxin Li: Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China; Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China; Center for Disease Genome Research, Sun Yat-Sen University, Guangzhou, China; Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China; Key Laboratory of Tropical Disease Control (SYSU), Ministry of Education, Guangzhou, China
https://www.frontiersin.org/articles/10.3389/fgene.2022.865313/full
cancer
somatic mutation
RNA
RNA-Seq
machine learning
RNA-SSNV
spellingShingle Qihan Long
Yangyang Yuan
Miaoxin Li
RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data
Frontiers in Genetics
cancer
somatic mutation
RNA
RNA-Seq
machine learning
RNA-SSNV
title RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data
title_full RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data
title_fullStr RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data
title_full_unstemmed RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data
title_short RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data
title_sort rna ssnv a reliable somatic single nucleotide variant identification framework for bulk rna seq data
topic cancer
somatic mutation
RNA
RNA-Seq
machine learning
RNA-SSNV
url https://www.frontiersin.org/articles/10.3389/fgene.2022.865313/full
work_keys_str_mv AT qihanlong rnassnvareliablesomaticsinglenucleotidevariantidentificationframeworkforbulkrnaseqdata
AT yangyangyuan rnassnvareliablesomaticsinglenucleotidevariantidentificationframeworkforbulkrnaseqdata
AT miaoxinli rnassnvareliablesomaticsinglenucleotidevariantidentificationframeworkforbulkrnaseqdata