Textrous!: extracting semantic textual meaning from gene sets.

The unbiased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial to understanding biological themes, validating experimental data, and planning future experiments. To derive biomedically relevant inform...

Bibliographic Details
Main Authors: Hongyu Chen, Bronwen Martin, Caitlin M Daimon, Sana Siddiqui, Louis M Luttrell, Stuart Maudsley
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3639949?pdf=render
author Hongyu Chen
Bronwen Martin
Caitlin M Daimon
Sana Siddiqui
Louis M Luttrell
Stuart Maudsley
collection DOAJ
description The unbiased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial to understanding biological themes, validating experimental data, and planning future experiments. To derive biomedically relevant information from simple gene lists, a mathematical association to scientific language and meaningful words or sentences is crucial. Unfortunately, existing software for deriving meaningful and easily appreciable scientific textual 'tokens' from large gene sets either relies on controlled vocabularies (Medical Subject Headings, Gene Ontology, BioCarta) or employs Boolean text searching and co-occurrence models that are incapable of detecting indirect links in the literature. As an improvement on existing web-based informatics tools, we have developed Textrous!, a web-based framework for extracting biomedical semantic meaning from an input gene set of arbitrary length. Textrous! employs natural language processing techniques, including latent semantic indexing (LSI), sentence splitting, word tokenization, part-of-speech tagging, and noun-phrase chunking, to mine MEDLINE abstracts, PubMed Central articles, articles from the Online Mendelian Inheritance in Man (OMIM), and Mammalian Phenotype annotation obtained from Jackson Laboratories. Textrous! can generate meaningful output even from very small input datasets, using two different text extraction methodologies (collective and individual) for selecting, ranking, clustering, and visualizing English words obtained from the user data. Textrous! therefore facilitates the output of quantitatively significant and easily appreciable semantic words and phrases linked to both individual gene and batch genomic data.
first_indexed 2024-12-20T11:24:10Z
format Article
id doaj.art-283dbe02c34d4c90acd3136d45ee634a
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-20T11:24:10Z
publishDate 2013-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-283dbe02c34d4c90acd3136d45ee634a 2022-12-21T19:42:26Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2013-01-01 8 4 e62665 10.1371/journal.pone.0062665
title Textrous!: extracting semantic textual meaning from gene sets.
url http://europepmc.org/articles/PMC3639949?pdf=render