Detecting and understanding meaningful cancerous mutations based on computational models of mRNA splicing

Abstract Cancer research has long relied on non-silent mutations. Yet, it has become overwhelmingly clear that silent mutations can affect gene expression and cancer cell fitness. One fundamental mechanism that apparently silent mutations can severely disrupt is alternative splicing. Here we introdu...

Full description

Bibliographic Details
Main Authors: Nicolas Lynn, Tamir Tuller
Format: Article
Language:English
Published: Nature Portfolio 2024-03-01
Series:npj Systems Biology and Applications
Online Access:https://doi.org/10.1038/s41540-024-00351-7
Description
Summary:Abstract Cancer research has long relied on non-silent mutations. Yet, it has become overwhelmingly clear that silent mutations can affect gene expression and cancer cell fitness. One fundamental mechanism that apparently silent mutations can severely disrupt is alternative splicing. Here we introduce Oncosplice, a tool that scores mutations based on models of proteomes generated using aberrant splicing predictions. Oncosplice leverages a highly accurate neural network that predicts splice sites within arbitrary mRNA sequences, a greedy transcript constructor that considers alternate arrangements of splicing blueprints, and an algorithm that grades the functional divergence between proteins based on evolutionary conservation. By applying this tool to 12M somatic mutations we identify 8K deleterious variants that are significantly depleted within the healthy population; we demonstrate the tool’s ability to identify clinically validated pathogenic variants with a positive predictive value of 94%; we show strong enrichment of predicted deleterious mutations across pan-cancer drivers. We also achieve improved patient survival estimation using a proposed set of novel cancer-involved genes. Ultimately, this pipeline enables accelerated insight-gathering of sequence-specific consequences for a class of understudied mutations and provides an efficient way of filtering through massive variant datasets – functionalities with immediate experimental and clinical applications.
ISSN:2056-7189