Efficiency Analysis of Competing Tests for Finding Differentially Expressed Genes in Lung Adenocarcinoma

In this study, we introduce and use Efficiency Analysis to compare differences in the apparent internal and external consistency of competing normalization methods and tests for identifying differentially expressed genes. Using publicly available data, two lung adenocarcinoma datasets were analyzed...

Bibliographic Details
Main Authors: Satish Patel, Rick Jordan, James Lyons-Weiler, Hai Hu
Format: Article
Language: English
Published: SAGE Publishing 2008-01-01
Series: Cancer Informatics
Subjects: efficiency; microarray; feature selection; evaluation
Online Access: http://la-press.com/article.php?article_id=904
_version_ 1818970433201897472
author Satish Patel
Rick Jordan
James Lyons-Weiler
Hai Hu
author_facet Satish Patel
Rick Jordan
James Lyons-Weiler
Hai Hu
author_sort Satish Patel
collection DOAJ
description In this study, we introduce and use Efficiency Analysis to compare differences in the apparent internal and external consistency of competing normalization methods and tests for identifying differentially expressed genes. Using publicly available data, two lung adenocarcinoma datasets were analyzed using caGEDA (http://bioinformatics2.pitt.edu/GE2/GEDA.html) to measure the degree of differential gene expression between two populations. The datasets were randomly split into at least two subsets, each subset was analyzed for differentially expressed genes between the two sample groups, and the resulting gene lists were compared for overlapping genes. Efficiency Analysis is an intuitive method that compares, across a range of tests, the percentage of overlap among the genes each test finds in two or more data subsets. Tests that yield consistent gene lists across independently analyzed splits are preferred to those that yield less consistent inferences. For example, a method that exhibits 50% overlap in the top 100 genes from two studies should be preferred to a method that exhibits 5% overlap in the top 100 genes. The same procedure was performed using all normalization and transformation methods available through caGEDA. The ‘best’ test was then further evaluated with internal cross-validation, using a Naïve Bayes classification algorithm to estimate generalizable sample classification errors. A novel test, termed D1 (a derivative of the J5 test), was found to be the most consistent and to exhibit the lowest overall classification error and the highest sensitivity and specificity. The D1 test relaxes the assumption that few genes are differentially expressed. Efficiency Analysis can be misleading if the tests exhibit a bias in any particular dimension (e.g. expression intensity); we therefore explored intensity-scaled and segmented J5 tests using data in which all genes are scaled to share the same intensity distribution range. Efficiency Analysis correctly predicted the ‘best’ test and normalization method using the Beer dataset and also performed well with the Bhattacharjee dataset based on both efficiency and classification accuracy criteria.
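The split-and-overlap procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the caGEDA implementation: the function names, the sample layout (one dict of gene to expression value per sample), and the difference-of-means score are all assumptions chosen for illustration, and the score is a simple stand-in rather than the J5 or D1 test.

```python
import random
from statistics import mean


def split_in_two(samples, rng):
    """Randomly split a list of samples into two halves."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_difference_scores(group_a, group_b):
    """Hypothetical stand-in test: per-gene difference of group means.

    This is NOT the J5 or D1 test from the article.
    """
    genes = group_a[0].keys()
    return {
        g: mean(s[g] for s in group_a) - mean(s[g] for s in group_b)
        for g in genes
    }


def top_genes(scores, n):
    """Return the n genes with the largest absolute score."""
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return set(ranked[:n])


def percent_overlap(list_1, list_2, n):
    """Percentage of the top-n genes shared by two gene lists."""
    return 100.0 * len(list_1 & list_2) / n


def split_overlap(group_a, group_b, score_fn, n, seed=0):
    """Split both sample groups once, score each split, compare top-n lists."""
    rng = random.Random(seed)
    a1, a2 = split_in_two(group_a, rng)
    b1, b2 = split_in_two(group_b, rng)
    top_1 = top_genes(score_fn(a1, b1), n)
    top_2 = top_genes(score_fn(a2, b2), n)
    return percent_overlap(top_1, top_2, n)
```

Under these assumptions, calling `split_overlap` with several candidate scoring functions on the same data and the same `n` gives one overlap percentage per test, and the test with the higher overlap would be the preferred one in the sense the abstract describes. A fuller analysis would repeat the random split many times and average the result.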
first_indexed 2024-12-20T14:36:24Z
format Article
id doaj.art-928fca3a2147410282eb6ee13b7045e1
institution Directory Open Access Journal
issn 1176-9351
language English
last_indexed 2024-12-20T14:36:24Z
publishDate 2008-01-01
publisher SAGE Publishing
record_format Article
series Cancer Informatics
spelling doaj.art-928fca3a2147410282eb6ee13b7045e1 2022-12-21T19:37:27Z eng SAGE Publishing Cancer Informatics 1176-9351 2008-01-01 6 389 421 http://la-press.com/article.php?article_id=904 efficiency microarray feature selection evaluation
title Efficiency Analysis of Competing Tests for Finding Differentially Expressed Genes in Lung Adenocarcinoma
title_sort efficiency analysis of competing tests for finding differentially expressed genes in lung adenocarcinoma
topic efficiency
microarray
feature selection
evaluation
url http://la-press.com/article.php?article_id=904
work_keys_str_mv AT satishpatel efficiencyanalysisofcompetingtestsforfindingdifferentiallyexpressedgenesinlungadenocarcinoma
AT rickjordan efficiencyanalysisofcompetingtestsforfindingdifferentiallyexpressedgenesinlungadenocarcinoma
AT jameslyonsweiler efficiencyanalysisofcompetingtestsforfindingdifferentiallyexpressedgenesinlungadenocarcinoma
AT haihu efficiencyanalysisofcompetingtestsforfindingdifferentiallyexpressedgenesinlungadenocarcinoma