Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction
Abstract. Background: This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference into Functions) as defined in ENTREZ-Gene based on a MEDLINE record. Inputs for this task include both a gene and a pointer to a M...
Main Authors: | Mottaz Anaïs, Ehrler Frédéric, Tbahriti Imad, Gobeill Julien, Veuthey Anne-Lise, Ruch Patrick |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2008-04-01 |
Series: | BMC Bioinformatics |
_version_ | 1818514418629083136 |
---|---|
author | Mottaz Anaïs; Ehrler Frédéric; Tbahriti Imad; Gobeill Julien; Veuthey Anne-Lise; Ruch Patrick |
author_sort | Mottaz Anaïs |
collection | DOAJ |
description | Abstract. Background: This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference into Functions) as defined in ENTREZ-Gene based on a MEDLINE record. Inputs for this task include both a gene and a pointer to a MEDLINE reference. In the suggested approach we merge two independent sentence extraction strategies. The first proposed strategy (LASt) uses argumentative features, inspired by discourse-analysis models. The second extraction scheme (GOEx) uses an automatic text categorizer to estimate the density of Gene Ontology categories in every sentence, thus providing a full ranking of all possible candidate GeneRiFs. A combination of the two approaches is proposed, which also aims at reducing the size of the selected segment by filtering out non-content-bearing rhetorical phrases. Results: Based on the TREC-2003 Genomics collection for GeneRiF identification, the LASt extraction strategy is already competitive (52.78%). When used in a combined approach, the extraction task clearly shows improvement, achieving a Dice score of over 57% (+10%). Conclusions: Argumentative representation levels and conceptual density estimation using Gene Ontology contents appear complementary for functional annotation in proteomics. |
first_indexed | 2024-12-11T00:15:32Z |
format | Article |
id | doaj.art-b2ba5ef4a5234c18a4e1dd6d621c6ac0 |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-11T00:15:32Z |
publishDate | 2008-04-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-b2ba5ef4a5234c18a4e1dd6d621c6ac0; 2022-12-22T01:27:58Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2008-04-01; 9; Suppl 3; S9; 10.1186/1471-2105-9-S3-S9; Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction; Mottaz Anaïs; Ehrler Frédéric; Tbahriti Imad; Gobeill Julien; Veuthey Anne-Lise; Ruch Patrick |
title | Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction |
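
The Dice score quoted in the Results measures word overlap between an extracted sentence and the reference GeneRiF. The sketch below is a minimal illustration of a token-set Dice coefficient; the tokenization rule, the function names `tokens` and `dice`, and the example sentences are assumptions made for illustration and are not taken from the paper or from the TREC-2003 evaluation scripts, which this record does not describe.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric tokens (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice(candidate, reference):
    """Token-set Dice coefficient: 2 * |A & B| / (|A| + |B|)."""
    a, b = tokens(candidate), tokens(reference)
    if not a and not b:
        # Two empty strings are treated as identical by convention here.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    # Hypothetical candidate sentence and reference annotation.
    candidate = "BRCA1 regulates DNA repair"
    reference = "BRCA1 regulates repair of DNA"
    print(round(dice(candidate, reference), 4))
```

A score of 1.0 means the two token sets are identical, and 0.0 means they share no tokens; averaging such scores over all gene/abstract pairs gives a collection-level figure comparable in kind to the percentages reported above.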