Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components

Bibliographic Details
Main Authors: Min Wang, Steven M Kornblau, Kevin R Coombes
Format: Article
Language: English
Published: SAGE Publishing 2018-05-01
Series: Cancer Informatics
Online Access: https://doi.org/10.1177/1176935118771082
Description: Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises 2 challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally extensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure is competitive with the best methods when considering both accuracy and speed and is the most accurate when the number of objects is small compared with the number of attributes. We applied the method to a proteomics data set from patients with acute myeloid leukemia. Proteins in the apoptosis pathway could be explained using 6 PCs. By clustering the proteins in PC space, we were able to replace the PCs by 6 “biological components,” 3 of which could be immediately interpreted from the current literature. We expect this approach combining PCA with clustering to be widely applicable.
ISSN: 1176-9351
Collection: Directory of Open Access Journals (DOAJ)
Volume: 17
Author Affiliations:
Min Wang: Mathematical Biosciences Institute, The Ohio State University, Columbus, OH, USA
Steven M Kornblau: Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Kevin R Coombes: Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA