Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection


Bibliographic Details
Main Authors: Urbanowicz Ryan J, Kiralis Jeff, Fisher Jonathan M, Moore Jason H
Format: Article
Language: English
Published: BMC 2012-09-01
Series: BioData Mining
Subjects: EDM, COR, GAMETES, SNP, Model detection, Epistasis, Simulation, Model, Genetics
Online Access: http://www.biodatamining.org/content/5/1/15
collection DOAJ
description Abstract

Background: Algorithms designed to detect complex genetic disease associations are initially evaluated using simulated datasets. Typical evaluations vary constraints that influence the correct detection of underlying models (i.e. number of loci, heritability, and minor allele frequency). Such studies neglect to account for model architecture (i.e. the unique specification and arrangement of penetrance values comprising the genetic model), which alone can influence the detectability of a model. In order to design a simulation study which efficiently takes architecture into account, a reliable metric is needed for model selection.

Results: We evaluate three metrics as predictors of relative model detection difficulty derived from previous works: (1) Penetrance table variance (PTV), (2) customized odds ratio (COR), and (3) our own Ease of Detection Measure (EDM), calculated from the penetrance values and respective genotype frequencies of each simulated genetic model. We evaluate the reliability of these metrics across three very different data search algorithms, each with the capacity to detect epistatic interactions. We find that a model’s EDM and COR are each stronger predictors of model detection success than heritability.

Conclusions: This study formally identifies and evaluates metrics which quantify model detection difficulty. We utilize these metrics to intelligently select models from a population of potential architectures. This allows for an improved simulation study design which accounts for differences in detection difficulty attributed to model architecture. We implement the calculation and utilization of EDM and COR into GAMETES, an algorithm which rapidly and precisely generates pure, strict, n-locus epistatic models.
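The abstract describes metrics computed from a model's penetrance values and genotype frequencies. As a rough illustration only, the Python sketch below computes prevalence, heritability, and a penetrance table variance for a small two-locus table. It is not taken from the article or from GAMETES: the function names, the Hardy-Weinberg assumption, the example table, and the reading of PTV as the unweighted variance of the table entries are all assumptions made here, and the heritability formula is the standard textbook one. EDM and COR are not defined in this record, so they are not attempted.

```python
from itertools import product


def genotype_frequencies(maf):
    """Hardy-Weinberg frequencies for genotypes AA, Aa, aa (assumed here)."""
    p, q = 1.0 - maf, maf
    return [p * p, 2 * p * q, q * q]


def joint_frequencies(mafs):
    """Joint genotype frequencies for independent loci, in row-major order."""
    freqs = []
    for combo in product(*(genotype_frequencies(m) for m in mafs)):
        f = 1.0
        for x in combo:
            f *= x
        freqs.append(f)
    return freqs


def prevalence(penetrance, freqs):
    """Population prevalence K: frequency-weighted mean penetrance."""
    return sum(f * w for f, w in zip(penetrance, freqs))


def heritability(penetrance, freqs):
    """Standard formula: sum_i w_i (f_i - K)^2 / (K (1 - K))."""
    k = prevalence(penetrance, freqs)
    weighted_var = sum(w * (f - k) ** 2 for f, w in zip(penetrance, freqs))
    return weighted_var / (k * (1.0 - k))


def penetrance_table_variance(penetrance):
    """Unweighted variance of the table entries (an assumed reading of PTV)."""
    mean = sum(penetrance) / len(penetrance)
    return sum((f - mean) ** 2 for f in penetrance) / len(penetrance)


# Hypothetical two-locus table (rows: AA, Aa, aa; columns: BB, Bb, bb),
# flattened row by row. With both minor allele frequencies at 0.5 every
# single-locus marginal penetrance is equal, so neither locus has a main effect.
example_table = [0.0, 0.1, 0.0,
                 0.1, 0.0, 0.1,
                 0.0, 0.1, 0.0]
example_freqs = joint_frequencies([0.5, 0.5])

print("prevalence:", prevalence(example_table, example_freqs))
print("heritability:", heritability(example_table, example_freqs))
print("PTV:", penetrance_table_variance(example_table))
```

Two tables with the same heritability can still differ in their unweighted variance, which is one way to see why architecture is treated as a separate factor from heritability.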
first_indexed 2024-12-19T02:52:18Z
id doaj.art-e2bfeff8932340e28ecd2bab92c9d349
institution Directory Open Access Journal
issn 1756-0381
last_indexed 2024-12-19T02:52:18Z
spelling doaj.art-e2bfeff8932340e28ecd2bab92c9d349; 2022-12-21T20:38:35Z; eng; BMC; BioData Mining; 1756-0381; 2012-09-01; 5; 1; 15; 10.1186/1756-0381-5-15; http://www.biodatamining.org/content/5/1/15
topic EDM
COR
GAMETES
SNP
Model detection
Epistasis
Simulation
Model
Genetics