Visualizing Population Structures by Multidimensional Scaling of Smoothed PCA-Transformed Data

Bibliographic Details
Main Author: Dimitrios Charalampidis
Format: Article
Language: English
Published: IEEE, 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10041144/
Subjects: Dimensionality reduction; multidimensional scaling; PCA; population structure; single nucleotide polymorphisms
Author affiliation: Electrical and Computer Engineering Department, The University of New Orleans, New Orleans, LA, USA
Author ORCID: https://orcid.org/0000-0002-4311-5428
Volume: 11
Pages: 13594–13604
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3243573
Collection: Directory of Open Access Journals (DOAJ)

Description
Population structure can be revealed using Single Nucleotide Polymorphisms (SNPs), which are genetic variations found in the DNA sequences of individuals. Because of the large number of SNPs, SNP data are often visualized through dimensionality reduction. Although Principal Component Analysis (PCA) has been used extensively for SNP data visualization, other dimensionality reduction methods have been shown to be more successful in revealing complex population structures. Nevertheless, these techniques often suffer either from a reduced ability to preserve the global structure of the SNP data, namely the relative genetic distances between subpopulations, or from high computational cost. In this work, a method that applies Multidimensional Scaling (MDS) to smoothed PCA-transformed data (MSSPD) is proposed. MSSPD reveals population structures in 2D maps while preserving the global structure more effectively than other techniques. In terms of computational efficiency, MSSPD is comparable to the fastest SNP visualization methods.
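The abstract names three stages: a PCA projection, a smoothing step, and MDS down to a 2D map. It does not say how the smoothing is done or which MDS variant or distance is used. The Python/NumPy sketch below is therefore an illustration under stated assumptions and not the author's MSSPD implementation. Every part of it is a hypothetical choice made for the example: the k-nearest-neighbour averaging in `knn_smooth`, the classical (Torgerson) MDS in `classical_mds`, the score `centroid_distance_correlation` used as a rough proxy for "global structure" (relative distances between subpopulations), and the toy genotype simulation.

```python
import numpy as np


def pca_scores(genotypes, n_components):
    """Project individuals onto the leading principal components.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1, 2).
    """
    centered = genotypes - genotypes.mean(axis=0)  # center each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)  # thin SVD
    return u[:, :n_components] * s[:n_components]  # PC scores per individual


def knn_smooth(scores, k):
    """HYPOTHETICAL smoothing step: average each point with its k nearest
    neighbours in PCA space (the paper's actual smoothing is not given here)."""
    dists = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, : k + 1]  # k neighbours plus the point itself
    return scores[nearest].mean(axis=1)


def classical_mds(points, n_dims=2):
    """Classical (Torgerson) MDS on squared Euclidean distances."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    n = sq_dists.shape[0]
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    gram = -0.5 * centering @ sq_dists @ centering  # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]  # largest eigenvalues first
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


def centroid_distance_correlation(high_dim, low_dim, labels):
    """HYPOTHETICAL global-structure score: Pearson correlation between the
    pairwise distances of subpopulation centroids before and after embedding."""
    groups = np.unique(labels)

    def centroid_dists(z):
        centroids = np.array([z[labels == g].mean(axis=0) for g in groups])
        d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
        return d[np.triu_indices(len(groups), k=1)]  # upper triangle, no diagonal

    return np.corrcoef(centroid_dists(high_dim), centroid_dists(low_dim))[0, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_per_pop, n_snps = 30, 500
    # Toy data (hypothetical): three subpopulations with drifted allele frequencies.
    base = rng.uniform(0.1, 0.9, n_snps)
    freqs = [np.clip(base + rng.normal(0.0, sd, n_snps), 0.01, 0.99) for sd in (0.05, 0.05, 0.15)]
    genotypes = np.vstack([rng.binomial(2, f, size=(n_per_pop, n_snps)) for f in freqs]).astype(float)
    labels = np.repeat(np.arange(3), n_per_pop)

    pcs = pca_scores(genotypes, n_components=10)
    embedding = classical_mds(knn_smooth(pcs, k=5), n_dims=2)
    print(embedding.shape)  # (90, 2)
    print("centroid-distance correlation:", centroid_distance_correlation(pcs, embedding, labels))
```

One caveat applies to this design. Classical MDS on plain Euclidean distances reproduces the PCA of its input points, so in this sketch the MDS stage only re-projects the smoothed scores. A faithful reimplementation would need the distance measure and MDS variant specified in the full paper.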