Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records
In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural lang...
Main Authors: | Luo, Yuan; Szolovits, Peter |
---|---|
Other Authors: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Format: | Article |
Language: | en_US |
Published: | Libertas Academica, Ltd 2017 |
Online Access: | http://hdl.handle.net/1721.1/106905 https://orcid.org/0000-0001-8411-6403 |
_version_ | 1826197115762638848 |
---|---|
author | Luo, Yuan; Szolovits, Peter |
author2 | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
collection | MIT |
description | In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing to narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving those annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem as the interval query problem, for which the optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis of the basic interval tree query algorithm and show that it is nonoptimal when applied to a collection of 13 query types from Allen’s interval algebra. We then study two closely related state-of-the-art interval query algorithms and propose query reformulations and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic stabbing-max query time and solves the stabbing-interval query tasks on all of Allen’s relations in logarithmic time, attaining the theoretical lower bound. Update time remains logarithmic and the space requirement remains linear. We also discuss interval management in external memory models and higher dimensions. |
first_indexed | 2024-09-23T10:42:44Z |
format | Article |
id | mit-1721.1/106905 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T10:42:44Z |
publishDate | 2017 |
publisher | Libertas Academica, Ltd |
record_format | dspace |
spelling | mit-1721.1/106905; 2022-09-27T14:27:35Z; Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records; Luo, Yuan; Szolovits, Peter; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; National Institutes of Health (U.S.) (Grants 5U54 LM008748 and 1U54 HG007963); 2017-02-10T19:40:21Z; 2017-02-10T19:40:21Z; 2016-07; 2016-06; Article; http://purl.org/eprint/type/JournalArticle; 1178-2226; http://hdl.handle.net/1721.1/106905; Luo, Yuan, and Peter Szolovits. “Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records.” Biomedical Informatics Insights (2016): 29.; https://orcid.org/0000-0001-8411-6403; en_US; http://dx.doi.org/10.4137/bii.s38916; Biomedical Informatics Insights; Creative Commons Attribution-NonCommercial 3.0 Unported licence; http://creativecommons.org/licenses/by-nc/3.0/; application/pdf; Libertas Academica, Ltd; Libertas Academica |
title | Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records |
url | http://hdl.handle.net/1721.1/106905 https://orcid.org/0000-0001-8411-6403 |
work_keys_str_mv | AT luoyuan efficientqueriesofstandoffannotationsfornaturallanguageprocessingonelectronicmedicalrecords AT szolovitspeter efficientqueriesofstandoffannotationsfornaturallanguageprocessingonelectronicmedicalrecords |
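
The description above frames stand-off annotation retrieval as an interval query problem over Allen's 13 relations. As a rough illustration of that framing only, the Python sketch below stores annotations as character-offset intervals and answers relation and stabbing queries with a plain linear scan. This is not the article's algorithm: the scan takes linear time per query, whereas the article reports logarithmic time. The `Annotation` class, the function names, the half-open offset convention, and the sample clinical sentence are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: content kept apart from the text, anchored by offsets."""
    start: int  # offset of the first character (inclusive)
    end: int    # offset just past the last character (exclusive); assumes start < end
    label: str


def allen_relation(a: Annotation, b: Annotation) -> str:
    """Return the single Allen relation (one of 13) that holds from a to b."""
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Only proper partial overlap remains at this point.
    return "overlaps" if a.start < b.start else "overlapped-by"


def query(annotations, relation, target):
    """Naive O(n) scan: annotations standing in `relation` to `target`."""
    return [a for a in annotations if allen_relation(a, target) == relation]


def stab(annotations, position):
    """Naive O(n) stabbing query: annotations whose span covers `position`."""
    return [a for a in annotations if a.start <= position < a.end]


# Hypothetical note text: "Patient denies chest pain."
sentence = Annotation(0, 26, "Sentence")
symptom = Annotation(15, 25, "Symptom")  # "chest pain"
notes = [
    sentence,
    Annotation(8, 14, "Negation"),       # "denies"
    symptom,
    Annotation(15, 20, "Token"),         # "chest"
    Annotation(21, 25, "Token"),         # "pain"
]

print([a.label for a in query(notes, "during", sentence)])
# ['Negation', 'Symptom', 'Token', 'Token']
print([a.label for a in stab(notes, 16)])
# ['Sentence', 'Symptom', 'Token']
```

A frozen dataclass keeps each annotation immutable and hashable, which suits annotations that are anchored to fixed text positions. Replacing the list scan with an interval tree or a similar index is where a logarithmic bound like the one described would come from.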