Text Mining the History of Medicine.

Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining...


Bibliographic Details
Main Authors: Paul Thompson, Riza Theresa Batista-Navarro, Georgios Kontonatsios, Jacob Carter, Elizabeth Toon, John McNaught, Carsten Timmermann, Michael Worboys, Sophia Ananiadou
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2016-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4703377?pdf=render
collection DOAJ
description Historical text archives constitute a rich and diverse source of information, which is becoming increasingly accessible thanks to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data efficiently. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestion of synonyms for user-entered query terms, exploration of the different concepts mentioned within search results, or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and changes in vocabulary, terminology, language structure and style compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. First, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and the relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges, as the basis for the development of a publicly accessible, semantically oriented search system. The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform.
first_indexed 2024-12-12T11:16:54Z
id doaj.art-4ba24474b444401cb10d9844197798a1
institution Directory Open Access Journal
issn 1932-6203
last_indexed 2024-12-12T11:16:54Z
spelling doaj.art-4ba24474b444401cb10d9844197798a1; 2022-12-22T00:26:08Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2016-01-01; vol. 11, no. 1, e0144717; doi:10.1371/journal.pone.0144717