CNV-P: a machine-learning framework for predicting high confident copy number variations

Bibliographic Details
Main Authors: Taifu Wang, Jinghua Sun, Xiuqing Zhang, Wen-Jing Wang, Qing Zhou
Format: Article
Language: English
Published: PeerJ Inc. 2021-12-01
Series: PeerJ
Subjects: Copy number variant, Machine learning, Genome sequencing
Online Access: https://peerj.com/articles/12564.pdf
collection DOAJ
description Background: Copy-number variants (CNVs) are recognized as a major cause of genetic disorders, and disease research depends on reliable detection of CNVs from genome sequencing data. However, current CNV detection software has high false-positive rates and needs further improvement. Methods: We propose CNV-P, a novel post-processing approach for CNV prediction: a machine-learning framework that efficiently removes false-positive fragments from the output of CNV detection tools. CNV signals around each putative CNV fragment, such as read depth (RD), split reads (SR), and read pairs (RP), are used as features to train a classifier. Results: On several real biological datasets, our models classified CNVs with over 90% precision and 85% recall, greatly improving on the performance of state-of-the-art algorithms. CNV-P is also robust across CNV sizes and sequencing platforms. Conclusions: Our framework for classifying high-confidence CNVs could improve both basic research and clinical diagnosis of genetic diseases.
first_indexed 2024-03-09T07:07:36Z
format Article
id doaj.art-ad731120167f4781abc75638c1c55012
institution Directory Open Access Journal
issn 2167-8359
language English
last_indexed 2024-03-09T07:07:36Z
publishDate 2021-12-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-ad731120167f4781abc75638c1c55012; 2023-12-03T09:28:10Z; eng; PeerJ Inc.; PeerJ; 2167-8359; 2021-12-01; 9; e12564; 10.7717/peerj.12564; CNV-P: a machine-learning framework for predicting high confident copy number variations; Taifu Wang (BGI-Shenzhen, Shenzhen, China); Jinghua Sun (BGI-Shenzhen, Shenzhen, China); Xiuqing Zhang (BGI-Shenzhen, Shenzhen, China); Wen-Jing Wang (BGI-Shenzhen, Shenzhen, China); Qing Zhou (BGI-Shenzhen, Shenzhen, China); https://peerj.com/articles/12564.pdf; Copy number variant; Machine learning; Genome sequencing
topic Copy number variant
Machine learning
Genome sequencing
url https://peerj.com/articles/12564.pdf