TLsub: A transfer learning based enhancement to accurately detect mutations with wide-spectrum sub-clonal proportion

Mutation detection is a routine task in sequencing data analysis, and existing tools typically combine signals from a set of overlapping sequencing reads. However, subclonal mutations, which are reported to contribute to tumor recurrence and metastasis, are sometimes...

Bibliographic Details
Main Author: Tian Zheng
Format: Article
Language: English
Published: Frontiers Media S.A. 2022-11-01
Series: Frontiers in Genetics
Subjects: genetics; structural variation; machine learning; next generation sequencing; mutation detection
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2022.981269/full
author Tian Zheng
collection DOAJ
description Mutation detection is a routine task in sequencing data analysis, and existing tools typically combine signals from a set of overlapping sequencing reads. However, subclonal mutations, which are reported to contribute to tumor recurrence and metastasis, are sometimes filtered out by these signals. As the clonal proportion decreases, the signals often become ambiguous, and complicated interactions among signals break the IID (independent and identically distributed) assumption made by most machine learning models. Although mutation callers can lower their thresholds, doing so introduces a significant number of false positives. The main aim here was to detect subclonal mutations with high specificity when sample purities or clonal proportions are ambiguous. We proposed a novel machine learning approach for filtering false positive calls to accurately detect mutations across a wide spectrum of subclonal proportions. We carried out a series of experiments on both simulated and real datasets and compared the method with several state-of-the-art approaches, including freebayes, MuTect2, Sentieon and SiNVICT. The results demonstrated that the proposed method adapts well to differently diluted sequencing signals and can significantly reduce false positives when detecting subclonal mutations. The code has been uploaded to https://github.com/TrinaZ/TL-fpFilter for academic use only.
first_indexed 2024-04-12T07:17:31Z
format Article
id doaj.art-b5698f09336b4a42bff74f011018d6f9
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-04-12T07:17:31Z
publishDate 2022-11-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-b5698f09336b4a42bff74f011018d6f9 | 2022-12-22T03:42:26Z | eng | Frontiers Media S.A. | Frontiers in Genetics | 1664-8021 | 2022-11-01 | 13 | 10.3389/fgene.2022.981269 | 981269 | TLsub: A transfer learning based enhancement to accurately detect mutations with wide-spectrum sub-clonal proportion | Tian Zheng (0); Tian Zheng (1) | (0) Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China; (1) Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China | https://www.frontiersin.org/articles/10.3389/fgene.2022.981269/full | genetics; structural variation; machine learning; next generation sequencing; mutation detection
title TLsub: A transfer learning based enhancement to accurately detect mutations with wide-spectrum sub-clonal proportion
topic genetics
structural variation
machine learning
next generation sequencing
mutation detection
url https://www.frontiersin.org/articles/10.3389/fgene.2022.981269/full