Cancer Type Classification in Liquid Biopsies Based on Sparse Mutational Profiles Enabled through Data Augmentation and Integration

Bibliographic Details
Main Authors: Alexandra Danyi, Myrthe Jager, Jeroen de Ridder
Format: Article
Language: English
Published: MDPI AG 2021-12-01
Series: Life
Subjects: deep learning, genomics, genetic variability, bioinformatics
Online Access: https://www.mdpi.com/2075-1729/12/1/1
_version_ 1818062722265251840
author Alexandra Danyi
Myrthe Jager
Jeroen de Ridder
collection DOAJ
description Identifying the cell of origin of cancer is important to guide treatment decisions. Machine learning approaches have been proposed to classify the cell of origin based on somatic mutation profiles from solid biopsies. However, solid biopsies can cause complications and certain tumors are not accessible. Liquid biopsies are promising alternatives but their somatic mutation profile is sparse and current machine learning models fail to perform in this setting. We propose an improved method to deal with sparsity in liquid biopsy data. Firstly, data augmentation is performed on sparse data to enhance model robustness. Secondly, we employ data integration to merge information from: (i) SNV density; (ii) SNVs in driver genes and (iii) trinucleotide motifs. Our adapted method achieves an average accuracy of 0.88 and 0.65 on data where only 70% and 2% of SNVs are retained, compared to 0.83 and 0.41 with the original model, respectively. The method and results presented here open the way for application of machine learning in the detection of the cell of origin of cancer from liquid biopsy data.
first_indexed 2024-12-10T14:08:43Z
format Article
id doaj.art-65fd0467d5884c749b7a7a2633ce13d7
institution Directory Open Access Journal
issn 2075-1729
language English
last_indexed 2024-12-10T14:08:43Z
publishDate 2021-12-01
publisher MDPI AG
record_format Article
series Life
spelling doaj.art-65fd0467d5884c749b7a7a2633ce13d7
2022-12-22T01:45:34Z
eng
MDPI AG
Life
2075-1729
2021-12-01
12
1
1
10.3390/life12010001
Cancer Type Classification in Liquid Biopsies Based on Sparse Mutational Profiles Enabled through Data Augmentation and Integration
Alexandra Danyi (0)
Myrthe Jager (1)
Jeroen de Ridder (2)
(0) Center for Molecular Medicine, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
(1) Center for Molecular Medicine, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
(2) Center for Molecular Medicine, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
https://www.mdpi.com/2075-1729/12/1/1
deep learning
genomics
genetic variability
bioinformatics
title Cancer Type Classification in Liquid Biopsies Based on Sparse Mutational Profiles Enabled through Data Augmentation and Integration
topic deep learning
genomics
genetic variability
bioinformatics
url https://www.mdpi.com/2075-1729/12/1/1