Unsupervised Outlier Profile Analysis
In much of the analysis of high-throughput genomic data, “interesting” genes have been selected based on assessment of differential expression between two groups or generalizations thereof. Most of the literature focuses on changes in mean expression or the entire distribution. In this article, we e...
Main Authors: | Debashis Ghosh, Song Li |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2014-01-01 |
Series: | Cancer Informatics |
Online Access: | https://doi.org/10.4137/CIN.S13969 |
_version_ | 1828822606335180800 |
---|---|
author | Debashis Ghosh; Song Li |
collection | DOAJ |
description | In much of the analysis of high-throughput genomic data, “interesting” genes have been selected based on assessment of differential expression between two groups or generalizations thereof. Most of the literature focuses on changes in mean expression or the entire distribution. In this article, we explore the use of C(α) tests, which have been applied in other genomic data settings. Their use for the outlier expression problem, in particular with continuous data, is problematic but nevertheless motivates new statistics that give an unsupervised analog to previously developed outlier profile analysis approaches. Some simulation studies are used to evaluate the proposal. A bivariate extension is described that can accommodate data from two platforms on matched samples. The proposed methods are applied to data from a prostate cancer study. |
first_indexed | 2024-12-12T13:17:12Z |
format | Article |
id | doaj.art-9a81802812a8458bb15b4f5a3533bb3b |
institution | Directory of Open Access Journals |
issn | 1176-9351 |
language | English |
last_indexed | 2024-12-12T13:17:12Z |
publishDate | 2014-01-01 |
publisher | SAGE Publishing |
record_format | Article |
series | Cancer Informatics |
spelling | doaj.art-9a81802812a8458bb15b4f5a3533bb3b; 2022-12-22T00:23:23Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2014-01-01; 13s4; 10.4137/CIN.S13969; Unsupervised Outlier Profile Analysis; Debashis Ghosh (0); Song Li (1); (0) Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA; (1) Duke Institute for Genome Sciences and Policy, Duke University, Durham, NC, USA; In much of the analysis of high-throughput genomic data, “interesting” genes have been selected based on assessment of differential expression between two groups or generalizations thereof. Most of the literature focuses on changes in mean expression or the entire distribution. In this article, we explore the use of C(α) tests, which have been applied in other genomic data settings. Their use for the outlier expression problem, in particular with continuous data, is problematic but nevertheless motivates new statistics that give an unsupervised analog to previously developed outlier profile analysis approaches. Some simulation studies are used to evaluate the proposal. A bivariate extension is described that can accommodate data from two platforms on matched samples. The proposed methods are applied to data from a prostate cancer study.; https://doi.org/10.4137/CIN.S13969 |
title | Unsupervised Outlier Profile Analysis |
url | https://doi.org/10.4137/CIN.S13969 |