Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components
Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises 2 challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely h...
Main Authors: | Min Wang, Steven M Kornblau, Kevin R Coombes |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2018-05-01 |
Series: | Cancer Informatics |
Online Access: | https://doi.org/10.1177/1176935118771082 |
_version_ | 1818244733848256512 |
---|---|
author | Min Wang Steven M Kornblau Kevin R Coombes |
author_sort | Min Wang |
collection | DOAJ |
description | Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises 2 challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally extensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure is competitive with the best methods when considering both accuracy and speed and is the most accurate when the number of objects is small compared with the number of attributes. We applied the method to a proteomics data set from patients with acute myeloid leukemia. Proteins in the apoptosis pathway could be explained using 6 PCs. By clustering the proteins in PC space, we were able to replace the PCs by 6 “biological components,” 3 of which could be immediately interpreted from the current literature. We expect this approach combining PCA with clustering to be widely applicable. |
first_indexed | 2024-12-12T14:21:43Z |
format | Article |
id | doaj.art-12c8c825ad494676a98b048715b80cef |
institution | Directory Open Access Journal |
issn | 1176-9351 |
language | English |
last_indexed | 2024-12-12T14:21:43Z |
publishDate | 2018-05-01 |
publisher | SAGE Publishing |
record_format | Article |
series | Cancer Informatics |
spelling | doaj.art-12c8c825ad494676a98b048715b80cef; 2022-12-22T00:21:47Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2018-05-01; 17; 10.1177/1176935118771082; Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components; Min Wang (0), Steven M Kornblau (1), Kevin R Coombes (2); (0) Mathematical Biosciences Institute, The Ohio State University, Columbus, OH, USA; (1) Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; (2) Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA; https://doi.org/10.1177/1176935118771082 |
title | Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components |
url | https://doi.org/10.1177/1176935118771082 |
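
The first challenge named in the abstract, deciding how many principal components are significant, can be illustrated with the broken-stick rule, a classic heuristic for this decision. The sketch below is only an illustration: it is not the automated graphical Bayesian procedure the abstract describes, and it does not use the PCDimension package. The function names and the example eigenvalues are hypothetical.

```python
def broken_stick_expectations(p):
    """Expected variance fractions if a unit stick is broken at random into p pieces.

    The k-th largest piece has expected length (1/p) * sum_{i=k}^{p} 1/i.
    """
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def count_significant_pcs(eigenvalues):
    """Count leading PCs whose variance fraction exceeds the broken-stick expectation."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    observed = [ev / total for ev in ordered]
    expected = broken_stick_expectations(len(ordered))

    count = 0
    for obs, exp in zip(observed, expected):
        if obs <= exp:
            break  # stop at the first component that does not beat chance
        count += 1
    return count


# Hypothetical eigenvalues (PC variances), invented for illustration only.
example_eigenvalues = [5.0, 2.5, 0.5, 0.4, 0.3, 0.2, 0.1]
n_pcs = count_significant_pcs(example_eigenvalues)
print(n_pcs)
```

With these made-up eigenvalues the first two components exceed their broken-stick expectations and the third does not, so the rule retains two. In practice the eigenvalues would come from a PCA of the data matrix.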