Unsupervised Outlier Profile Analysis

In much of the analysis of high-throughput genomic data, “interesting” genes have been selected based on assessment of differential expression between two groups or generalizations thereof. Most of the literature focuses on changes in mean expression or the entire distribution. In this article, we e...


Bibliographic Details
Main Authors: Debashis Ghosh, Song Li
Format: Article
Language: English
Published: SAGE Publishing 2014-01-01
Series: Cancer Informatics
Online Access: https://doi.org/10.4137/CIN.S13969
collection DOAJ
description In much of the analysis of high-throughput genomic data, “interesting” genes have been selected based on assessment of differential expression between two groups or generalizations thereof. Most of the literature focuses on changes in mean expression or the entire distribution. In this article, we explore the use of C(α) tests, which have been applied in other genomic data settings. Their use for the outlier expression problem, in particular with continuous data, is problematic but nevertheless motivates new statistics that give an unsupervised analog to previously developed outlier profile analysis approaches. Some simulation studies are used to evaluate the proposal. A bivariate extension is described that can accommodate data from two platforms on matched samples. The proposed methods are applied to data from a prostate cancer study.
first_indexed 2024-12-12T13:17:12Z
id doaj.art-9a81802812a8458bb15b4f5a3533bb3b
institution Directory Open Access Journal
issn 1176-9351
last_indexed 2024-12-12T13:17:12Z
spelling doaj.art-9a81802812a8458bb15b4f5a3533bb3b; 2022-12-22T00:23:23Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2014-01-01; 13s4; 10.4137/CIN.S13969; Unsupervised Outlier Profile Analysis; Debashis Ghosh; Song Li; Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA; Duke Institute for Genome Sciences and Policy, Duke University, Durham, NC, USA; https://doi.org/10.4137/CIN.S13969