Information extraction from free text for aiding transdiagnostic psychiatry: constructing NLP pipelines tailored to clinicians’ needs

Abstract. Background: Developing predictive models for precision psychiatry is challenging because of unavailability of the necessary data: extracting useful information from existing electronic health record (EHR) data is not straightforward, and available clinical trial datasets are often not repres...


Bibliographic Details
Main Authors:	Rosanne J. Turner, Femke Coenen, Femke Roelofs, Karin Hagoort, Aki Härmä, Peter D. Grünwald, Fleur P. Velders, Floortje E. Scheepers
Format:	Article
Language:	English
Published:	BMC 2022-06-01
Series:	BMC Psychiatry
Subjects:	Transdiagnostic psychiatry, Natural language processing, Machine learning, Depression, Hamilton, SF-36
Online Access:	https://doi.org/10.1186/s12888-022-04058-z

author	Rosanne J. Turner, Femke Coenen, Femke Roelofs, Karin Hagoort, Aki Härmä, Peter D. Grünwald, Fleur P. Velders, Floortje E. Scheepers
collection	DOAJ
description	Abstract. Background: Developing predictive models for precision psychiatry is challenging because the necessary data are unavailable: extracting useful information from existing electronic health record (EHR) data is not straightforward, and available clinical trial datasets are often not representative of heterogeneous patient groups. The aim of this study was to construct a natural language processing (NLP) pipeline that extracts variables for building predictive models from EHRs. We specifically tailor the pipeline to extract information on outcomes of psychiatric treatment trajectories, applicable throughout the entire spectrum of mental health disorders (“transdiagnostic”). Methods: A qualitative study into the beliefs of clinical staff on measuring treatment outcomes was conducted to construct a candidate list of variables to extract from the EHR. To investigate whether the proposed variables are suitable for measuring treatment effects, the resulting themes were compared, through systematic review, to transdiagnostic outcome measures currently used in psychiatry research and to the Hamilton Depression Rating Scale (HDRS) as a gold standard, resulting in an ideal set of variables. To extract these from EHR data, a semi-rule-based NLP pipeline was constructed and tailored to the candidate variables using Prodigy. Classification accuracy and F1 scores were calculated, and pipeline output was compared to HDRS scores using clinical notes from patients admitted in 2019 and 2020. Results: Analysis of 34 questionnaires answered by clinical staff resulted in four themes defining treatment outcomes: symptom reduction, general well-being, social functioning and personalization. Systematic review revealed 242 different transdiagnostic outcome measures, with the 36-item Short-Form Survey for quality of life (SF36) being used most consistently and showing substantial overlap with the themes from the qualitative study. Comparing SF36 to HDRS scores in 26 studies revealed moderate to good correlations (0.62 to 0.79) and good positive predictive values (0.75 to 0.88). The NLP pipeline, developed with notes from 22,170 patients, reached an accuracy of 95 to 99 percent (F1 scores: 0.38 to 0.86) on detecting these themes, evaluated on data from 361 patients. Conclusions: The NLP pipeline developed in this study extracts outcome measures from the EHR that cater specifically to the needs of clinical staff and align with outcome measures used to detect treatment effects in clinical trials.
first_indexed	2024-12-12T13:02:20Z
format	Article
id	doaj.art-bff1fd3a8a034deb9d011be226b18832
institution	Directory of Open Access Journals
issn	1471-244X
language	English
last_indexed	2024-12-12T13:02:20Z
publishDate	2022-06-01
publisher	BMC
record_format	Article
series	BMC Psychiatry
spelling	doaj.art-bff1fd3a8a034deb9d011be226b18832; 2022-12-22T00:23:45Z; eng; BMC; BMC Psychiatry; 1471-244X; 2022-06-01; vol. 22, no. 1, pp. 1-11; 10.1186/s12888-022-04058-z; Information extraction from free text for aiding transdiagnostic psychiatry: constructing NLP pipelines tailored to clinicians’ needs; Rosanne J. Turner (University Medical Center Utrecht, Brain Center), Femke Coenen (University Medical Center Utrecht, Brain Center), Femke Roelofs (University Medical Center Utrecht, Brain Center), Karin Hagoort (University Medical Center Utrecht, Brain Center), Aki Härmä (Philips Research), Peter D. Grünwald (Machine Learning Group, CWI), Fleur P. Velders (University Medical Center Utrecht, Brain Center), Floortje E. Scheepers (University Medical Center Utrecht, Brain Center); https://doi.org/10.1186/s12888-022-04058-z; Transdiagnostic psychiatry, Natural language processing, Machine learning, Depression, Hamilton, SF-36
title	Information extraction from free text for aiding transdiagnostic psychiatry: constructing NLP pipelines tailored to clinicians’ needs
topic	Transdiagnostic psychiatry, Natural language processing, Machine learning, Depression, Hamilton, SF-36
url	https://doi.org/10.1186/s12888-022-04058-z
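
The description above reports 95 to 99 percent accuracy alongside F1 scores as low as 0.38. The two figures can coexist when the theme being detected is rare, because accuracy is dominated by the many correctly rejected negatives while F1 ignores them. The sketch below illustrates this with a made-up confusion matrix; the counts are hypothetical and are not taken from the study, and the function names are illustrative only.

```python
# Hypothetical illustration: high accuracy with a low F1 score on a rare class.
# The counts below are invented for this example and do NOT come from the study.

def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all items that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; true negatives play no role."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# 1,000 sentences, of which only 20 actually mention the theme.
tp, fp, fn, tn = 8, 14, 12, 966

acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(f"accuracy={acc:.3f} F1={f1:.3f}")  # accuracy=0.974 F1=0.381
```

With these invented counts, the classifier misses most true mentions yet still scores above 97 percent accuracy, which is why both metrics are usually read together for rare categories.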

Similar Items