Improving CNV Detection Performance in Microarray Data Using a Machine Learning-Based Approach

Bibliographic Details
Main Authors: Chul Jun Goh, Hyuk-Jung Kwon, Yoonhee Kim, Seunghee Jung, Jiwoo Park, Isaac Kise Lee, Bo-Ram Park, Myeong-Ji Kim, Min-Jeong Kim, Min-Seob Lee
Format: Article
Language: English
Published: MDPI AG 2023-12-01
Series: Diagnostics
Subjects: CNV; genome-wide SNP array; Korean newborn; machine learning; genomic wave
Online Access: https://www.mdpi.com/2075-4418/14/1/84
author Chul Jun Goh
Hyuk-Jung Kwon
Yoonhee Kim
Seunghee Jung
Jiwoo Park
Isaac Kise Lee
Bo-Ram Park
Myeong-Ji Kim
Min-Jeong Kim
Min-Seob Lee
author_sort Chul Jun Goh
collection DOAJ
description Copy number variation (CNV) is a primary source of structural variation in the human genome, leading to several disorders. Therefore, analyzing neonatal CNVs is crucial for managing CNV-related chromosomal disabilities. However, genomic waves can hinder accurate CNV analysis. To mitigate the influences of the waves, we adopted a machine learning approach and developed a new method that uses a modified log R ratio instead of the commonly used log R ratio. Validation results using samples with known CNVs demonstrated the superior performance of our method. We analyzed a total of 16,046 Korean newborn samples using the new method and identified CNVs related to 39 genetic disorders in 342 cases. The most frequently detected CNV-related disorder was Joubert syndrome 4. The accuracy of our method was further confirmed by analyzing a subset of the detected results using NGS and comparing them with our results. The utilization of a genome-wide single nucleotide polymorphism array with wave offset was shown to be a powerful method for identifying CNVs in neonatal cases. The accurate screening and the ability to identify various disease susceptibilities offered by our new method could facilitate the identification of CNV-associated chromosomal disease etiologies.
first_indexed 2024-03-08T15:09:06Z
format Article
id doaj.art-1d169bfa2e214cc4b0bd34619280351a
institution Directory Open Access Journal
issn 2075-4418
language English
last_indexed 2024-03-08T15:09:06Z
publishDate 2023-12-01
publisher MDPI AG
record_format Article
series Diagnostics
spelling doaj.art-1d169bfa2e214cc4b0bd34619280351a
2024-01-10T14:53:55Z
eng
MDPI AG
Diagnostics
2075-4418
2023-12-01
14
1
84
10.3390/diagnostics14010084
Improving CNV Detection Performance in Microarray Data Using a Machine Learning-Based Approach
Chul Jun Goh 0
Hyuk-Jung Kwon 1
Yoonhee Kim 2
Seunghee Jung 3
Jiwoo Park 4
Isaac Kise Lee 5
Bo-Ram Park 6
Myeong-Ji Kim 7
Min-Jeong Kim 8
Min-Seob Lee 9
Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea
Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea
Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea
Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea
Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea
Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea
Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea
Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea
Diagnomics, Inc., 5795 Kearny Villa Rd., San Diego, CA 92123, USA
Eone-Diagnomics Genome Center, Inc., 143, Gaetbeol-ro, Yeonsu-gu, Incheon 21999, Republic of Korea
Copy number variation (CNV) is a primary source of structural variation in the human genome, leading to several disorders. Therefore, analyzing neonatal CNVs is crucial for managing CNV-related chromosomal disabilities. However, genomic waves can hinder accurate CNV analysis. To mitigate the influences of the waves, we adopted a machine learning approach and developed a new method that uses a modified log R ratio instead of the commonly used log R ratio. Validation results using samples with known CNVs demonstrated the superior performance of our method. We analyzed a total of 16,046 Korean newborn samples using the new method and identified CNVs related to 39 genetic disorders in 342 cases. The most frequently detected CNV-related disorder was Joubert syndrome 4. The accuracy of our method was further confirmed by analyzing a subset of the detected results using NGS and comparing them with our results. The utilization of a genome-wide single nucleotide polymorphism array with wave offset was shown to be a powerful method for identifying CNVs in neonatal cases. The accurate screening and the ability to identify various disease susceptibilities offered by our new method could facilitate the identification of CNV-associated chromosomal disease etiologies.
https://www.mdpi.com/2075-4418/14/1/84
CNV
genome-wide SNP array
Korean newborn
machine learning
genomic wave
title Improving CNV Detection Performance in Microarray Data Using a Machine Learning-Based Approach
topic CNV
genome-wide SNP array
Korean newborn
machine learning
genomic wave
url https://www.mdpi.com/2075-4418/14/1/84