CNV-P: a machine-learning framework for predicting high confident copy number variations
Background Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data has been a strong demand for disease research. However, current software for detecting CNVs has high false-positive rates, which needs f...
Main Authors: | Taifu Wang, Jinghua Sun, Xiuqing Zhang, Wen-Jing Wang, Qing Zhou |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc. 2021-12-01 |
Series: | PeerJ |
Subjects: | Copy number variant, Machine learning, Genome sequencing |
Online Access: | https://peerj.com/articles/12564.pdf |
_version_ | 1827608196563337216 |
---|---|
author | Taifu Wang Jinghua Sun Xiuqing Zhang Wen-Jing Wang Qing Zhou |
author_facet | Taifu Wang Jinghua Sun Xiuqing Zhang Wen-Jing Wang Qing Zhou |
author_sort | Taifu Wang |
collection | DOAJ |
description | Background Copy-number variants (CNVs) are recognized as one of the major causes of genetic disorders, and reliable detection of CNVs from genome sequencing data is in strong demand for disease research. However, current CNV-detection software has high false-positive rates and needs further improvement. Methods Here, we propose CNV-P, a novel post-processing approach for CNV prediction: a machine-learning framework that efficiently removes false-positive fragments from the output of CNV-detection tools. A series of CNV signals around the putative CNV fragments, such as read depth (RD), split reads (SR) and read pairs (RP), were defined as features to train a classifier. Results Predictions on several real biological datasets showed that our models classify CNVs with over 90% precision and 85% recall, which greatly improves on the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to CNV size and sequencing platform. Conclusions Our framework for classifying high-confidence CNVs could improve both basic research and clinical diagnosis of genetic diseases. |
first_indexed | 2024-03-09T07:07:36Z |
format | Article |
id | doaj.art-ad731120167f4781abc75638c1c55012 |
institution | Directory Open Access Journal |
issn | 2167-8359 |
language | English |
last_indexed | 2024-03-09T07:07:36Z |
publishDate | 2021-12-01 |
publisher | PeerJ Inc. |
record_format | Article |
series | PeerJ |
spelling | doaj.art-ad731120167f4781abc75638c1c55012; 2023-12-03T09:28:10Z; eng; PeerJ Inc.; PeerJ; 2167-8359; 2021-12-01; 9; e12564; 10.7717/peerj.12564; CNV-P: a machine-learning framework for predicting high confident copy number variations; Taifu Wang (BGI-Shenzhen, Shenzhen, China); Jinghua Sun (BGI-Shenzhen, Shenzhen, China); Xiuqing Zhang (BGI-Shenzhen, Shenzhen, China); Wen-Jing Wang (BGI-Shenzhen, Shenzhen, China); Qing Zhou (BGI-Shenzhen, Shenzhen, China); https://peerj.com/articles/12564.pdf; Copy number variant; Machine learning; Genome sequencing |
title | CNV-P: a machine-learning framework for predicting high confident copy number variations |
title_sort | cnv p a machine learning framework for predicting high confident copy number variations |
topic | Copy number variant Machine learning Genome sequencing |
url | https://peerj.com/articles/12564.pdf |