The first step in the development of text mining technology for cancer risk assessment: identifying and organizing scientific evidence in risk assessment literature

Bibliographic Details
Main Authors: Sun Lin, Silins Ilona, Korhonen Anna, Stenius Ulla
Format: Article
Language: English
Published: BMC 2009-09-01
Series: BMC Bioinformatics
Online Access:http://www.biomedcentral.com/1471-2105/10/303

Abstract

Background

One of the most neglected areas of biomedical Text Mining (TM) is the development of systems based on carefully assessed user needs. We have recently investigated the user needs of an important task yet to be tackled by TM, namely Cancer Risk Assessment (CRA). Here we take the first step towards the development of TM technology for this task: identifying and organizing the scientific evidence required for CRA in a taxonomy capable of supporting extensive data gathering from the biomedical literature.

Results

The taxonomy is based on expert annotation of 1297 abstracts downloaded from relevant PubMed journals. It classifies 1742 unique keywords found in the corpus into 48 classes that specify the core evidence required for CRA. We report promising results from inter-annotator agreement tests and from automatic classification of PubMed abstracts into taxonomy classes. We also report a simple user test in a near real-world CRA scenario; together with the other evaluations, it demonstrates that the resources we have built are well defined, accurate, and applicable in practice.

Conclusion

We present our annotation guidelines and a tool we designed for expert annotation of PubMed abstracts. We also present a corpus annotated for keywords and document relevance, along with the taxonomy that organizes the keywords into classes defining the core evidence for CRA. As the evaluation demonstrates, the materials we have constructed provide a good basis for classifying CRA literature along multiple dimensions. They can support current manual CRA as well as facilitate the development of a TM-based approach. We discuss extending the taxonomy further via manual and machine learning approaches, and the subsequent steps required to develop TM technology for the needs of CRA.

ISSN: 1471-2105
DOI: 10.1186/1471-2105-10-303