Negation recognition in clinical natural language processing using a combination of the NegEx algorithm and a convolutional neural network

Bibliographic Details
Main Authors: Guillermo Argüello-González, José Aquino-Esperanza, Daniel Salvador, Rosa Bretón-Romero, Carlos Del Río-Bermudez, Jorge Tello, Sebastian Menke
Format: Article
Language: English
Published: BMC 2023-10-01
Series: BMC Medical Informatics and Decision Making
Subjects: Negation; NegEx; CNN; Electronic health records; Clinical Natural Language Processing
Online Access: https://doi.org/10.1186/s12911-023-02301-5
author Guillermo Argüello-González
José Aquino-Esperanza
Daniel Salvador
Rosa Bretón-Romero
Carlos Del Río-Bermudez
Jorge Tello
Sebastian Menke
collection DOAJ
description Abstract
Background: Important clinical information about patients is held in unstructured free-text fields of Electronic Health Records (EHRs). While this information can be extracted using clinical Natural Language Processing (cNLP), recognizing negation modifiers remains an important challenge. A wide range of cNLP applications have been developed to detect the negation of medical entities in clinical free text; however, effective solutions for languages other than English are scarce. This study aimed to develop a solution for negation recognition in Spanish EHRs based on a combination of a customized rule-based NegEx layer and a convolutional neural network (CNN).
Methods: Based on our previous experience in real world evidence (RWE) studies using information embedded in EHRs, negation recognition was simplified into a binary problem (‘affirmative’ vs. ‘non-affirmative’ class). For the NegEx layer, negation rules were obtained from a publicly available Spanish corpus and enriched with custom ones, whereas the CNN binary classifier was trained on EHRs annotated for clinical named entities (cNEs) and negation markers by medical doctors.
Results: The proposed negation recognition pipeline obtained precision, recall, and F1-score of 0.93, 0.94, and 0.94 for the ‘affirmative’ class, and 0.86, 0.84, and 0.85 for the ‘non-affirmative’ class, respectively. To validate the generalization capabilities of our methodology, we applied the negation recognition pipeline to EHRs (6,710 cNEs) from a different data source distribution than the training corpus and obtained consistent performance metrics for the ‘affirmative’ and ‘non-affirmative’ classes (0.95, 0.97, and 0.96; and 0.90, 0.83, and 0.86 for precision, recall, and F1-score, respectively). Lastly, we evaluated the pipeline against two publicly available Spanish negation corpora, IULA and NUBes, obtaining state-of-the-art metrics (1.00, 0.99, and 0.99; and 1.00, 0.93, and 0.96 for precision, recall, and F1-score, respectively).
Conclusion: Negation recognition is a source of low precision in the retrieval of cNEs from the free text of EHRs. Combining a customized rule-based NegEx layer with a CNN binary classifier outperformed many other current approaches. RWE studies benefit greatly from the correct recognition of negation, as it reduces false-positive detections of cNEs, which would otherwise undoubtedly reduce the credibility of cNLP systems.
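As an illustration of the rule-based half of the approach described in the abstract, the following is a minimal NegEx-style sketch in Python. It is not the authors' implementation: the trigger list, the terminator list, the window size, and the function names are all hypothetical, and the CNN classifier and the way the two components are combined are not shown because the abstract does not describe them. The sketch only looks for a negation trigger in a short window before the entity and maps the result onto the binary ‘affirmative’ vs. ‘non-affirmative’ labels.

```python
import re

# Hypothetical Spanish negation triggers, for illustration only. The rules
# used in the paper (public corpus plus custom additions) are not listed in
# the abstract.
PRE_TRIGGERS = ["no", "sin", "niega", "descarta", "ausencia de"]
# Hypothetical words that end a negation scope.
TERMINATORS = ["pero", "aunque", "excepto"]
# Assumed maximum number of tokens inspected before the entity.
WINDOW = 5


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def classify(sentence, entity):
    """Return 'non-affirmative' if a trigger precedes the entity, else 'affirmative'."""
    tokens = tokenize(sentence)
    ent = tokenize(entity)

    # Locate the first occurrence of the entity as a token sequence.
    start = None
    for i in range(len(tokens) - len(ent) + 1):
        if tokens[i:i + len(ent)] == ent:
            start = i
            break
    if start is None:
        raise ValueError("entity not found in sentence")

    # Tokens to the left of the entity, limited to the window.
    left = tokens[max(0, start - WINDOW):start]

    # Cut the scope at the terminator closest to the entity, if any.
    for i in range(len(left) - 1, -1, -1):
        if left[i] in TERMINATORS:
            left = left[i + 1:]
            break

    # Match single-word and multi-word triggers on whole-word boundaries.
    scope = " ".join(left)
    for trigger in PRE_TRIGGERS:
        if re.search(rf"\b{re.escape(trigger)}\b", scope):
            return "non-affirmative"
    return "affirmative"


if __name__ == "__main__":
    print(classify("Paciente sin fiebre ni tos", "fiebre"))
    print(classify("No tiene tos pero presenta fiebre", "fiebre"))
```

In this sketch, "sin" negates "fiebre" in the first sentence, while "pero" closes the scope of "No" in the second, so "fiebre" there is treated as affirmative. Real NegEx variants also handle post-entity triggers and pseudo-negations, which are omitted here.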
first_indexed 2024-03-10T17:43:01Z
format Article
id doaj.art-a4996e079f03446fa09bc8cb90a21571
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-03-10T17:43:01Z
publishDate 2023-10-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-a4996e079f03446fa09bc8cb90a21571; 2023-11-20T09:38:25Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2023-10-01; 23119; 10.1186/s12911-023-02301-5
Guillermo Argüello-González (MedSavana SL)
José Aquino-Esperanza (MedSavana SL)
Daniel Salvador (MedSavana SL)
Rosa Bretón-Romero (Savana Research)
Carlos Del Río-Bermudez (Savana Research)
Jorge Tello (MedSavana SL)
Sebastian Menke (MedSavana SL)
title Negation recognition in clinical natural language processing using a combination of the NegEx algorithm and a convolutional neural network
topic Negation
NegEx
CNN
Electronic health records
Clinical Natural Language Processing
url https://doi.org/10.1186/s12911-023-02301-5