A hybrid gene selection approach to create the S1500+ targeted gene sets for use in high-throughput transcriptomics.

Bibliographic Details
Main Authors: Deepak Mav, Ruchir R Shah, Brian E Howard, Scott S Auerbach, Pierre R Bushel, Jennifer B Collins, David L Gerhold, Richard S Judson, Agnes L Karmaus, Elizabeth A Maull, Donna L Mendrick, B Alex Merrick, Nisha S Sipes, Daniel Svoboda, Richard S Paules
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC5819766?pdf=render
author Deepak Mav
Ruchir R Shah
Brian E Howard
Scott S Auerbach
Pierre R Bushel
Jennifer B Collins
David L Gerhold
Richard S Judson
Agnes L Karmaus
Elizabeth A Maull
Donna L Mendrick
B Alex Merrick
Nisha S Sipes
Daniel Svoboda
Richard S Paules
collection DOAJ
description Changes in gene expression can help reveal the mechanisms of disease processes and the mode of action for toxicities and adverse effects on cellular responses induced by exposures to chemicals, drugs, and environmental agents. The U.S. Tox21 Federal collaboration, which currently quantifies the biological effects of nearly 10,000 chemicals via quantitative high-throughput screening (qHTS) in in vitro model systems, is now making an effort to incorporate gene expression profiling into the existing battery of assays. Whole transcriptome analyses performed on large numbers of samples using microarrays or RNA-Seq are currently cost-prohibitive. Accordingly, the Tox21 Program is pursuing a high-throughput transcriptomics (HTT) method that focuses on the targeted detection of gene expression for a carefully selected subset of the transcriptome, which could potentially reduce the cost 10-fold and allow for the analysis of larger numbers of samples. To identify the optimal transcriptome subset, genes were sought that are (1) representative of the highly diverse biological space, (2) capable of serving as a proxy for expression changes in unmeasured genes, and (3) sufficient to provide coverage of well-described biological pathways. A hybrid method for gene selection is presented herein that combines data-driven and knowledge-driven concepts into one cohesive method. Our approach is modular, applicable to any species, and facilitates a robust, quantitative evaluation of performance. In particular, we were able to perform gene selection such that the resulting set of "sentinel genes" adequately represents all known canonical pathways from the Molecular Signatures Database (MSigDB v4.0) and can be used to infer expression changes for the remainder of the transcriptome. The resulting computational model allowed us to choose a purely data-driven subset of 1500 sentinel genes, referred to as the S1500 set, which was then augmented using a knowledge-driven selection of additional genes to create the final S1500+ gene set. Our results indicate that the sentinel genes selected can be used to accurately predict pathway perturbations and biological relationships for samples under study.
first_indexed 2024-12-10T05:11:07Z
format Article
id doaj.art-ac4c3100313c431db2f048cc682fb4c1
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-10T05:11:07Z
publishDate 2018-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-ac4c3100313c431db2f048cc682fb4c1 2022-12-22T02:01:07Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2018-01-01 13 2 e0191105 10.1371/journal.pone.0191105
title A hybrid gene selection approach to create the S1500+ targeted gene sets for use in high-throughput transcriptomics.
url http://europepmc.org/articles/PMC5819766?pdf=render