Development of Query Strategies to Identify a Histologic Lymphoma Subtype in a Large Linked Database System

Background: Large linked databases (LLDB) represent a novel resource for cancer outcomes research. However, accurately identifying a patient population of interest within these LLDBs can be challenging. Our research group developed a fully integrated platform that provides a means of combining...

Bibliographic Details
Main Authors: Michael Graiser, Susan G. Moore, Rochelle Victor, Ashley Hilliard, Leroy Hill, Michael S. Keehan, Christopher R. Flowers M.D., M.S.
Format: Article
Language: English
Published: SAGE Publishing 2007-01-01
Series: Cancer Informatics
Online Access: https://doi.org/10.1177/117693510700300017
author Michael Graiser
Susan G. Moore
Rochelle Victor
Ashley Hilliard
Leroy Hill
Michael S. Keehan
Christopher R. Flowers M.D., M.S.
collection DOAJ
description Background: Large linked databases (LLDB) represent a novel resource for cancer outcomes research. However, accurately identifying a patient population of interest within these LLDBs can be challenging. Our research group developed a fully integrated platform that combines independent legacy databases into a single cancer-focused LLDB system. We compared the sensitivity and specificity of several SQL-based query strategies for identifying a histologic lymphoma subtype in this LLDB to determine the most accurate legacy data source for identifying a specific cancer patient population. Methods: Query strategies were developed to identify patients with follicular lymphoma from an LLDB of cancer registry data, electronic medical records (EMR), laboratory, administrative, pharmacy, and other clinical data. Queries were performed using common diagnostic codes (ICD-9), cancer registry histology codes (ICD-O), and text searches of EMRs. We reviewed medical records and pathology reports to confirm each diagnosis and calculated the sensitivity and specificity for each query strategy. Results: Together the queries identified 1538 potential cases of follicular lymphoma. Review of pathology and other medical reports confirmed 415 cases of follicular lymphoma, 300 pathology-verified and 115 verified from other medical reports. The query using ICD-O codes was highly specific (96%). Queries using text strings varied in sensitivity (range 7–92%) and specificity (range 86–99%). Queries using ICD-9 codes were both less sensitive (34–44%) and less specific (35–87%). Conclusions: Queries of linked cancer databases that include cancer registry data should use ICD-O codes or structured free-text searches to identify patient populations with a precise histologic diagnosis.
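The sensitivity and specificity calculation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so this assumes that the pool of reviewed candidates (every patient returned by any query) serves as the reference population and that chart review supplies the confirmed cases. The function name, parameter names, and the toy patient IDs are hypothetical and are not data from the study.

```python
def sensitivity_specificity(flagged, confirmed, candidates):
    """Return (sensitivity, specificity) for one query strategy.

    flagged    -- patient IDs returned by the query strategy
    confirmed  -- patient IDs confirmed as true cases by record review
    candidates -- all patient IDs that were reviewed (assumed reference pool)
    """
    candidates = set(candidates)
    flagged = set(flagged) & candidates
    confirmed = set(confirmed) & candidates
    negatives = candidates - confirmed          # reviewed, not confirmed

    true_pos = len(flagged & confirmed)         # confirmed cases the query found
    true_neg = len(negatives - flagged)         # non-cases the query left out

    sens = true_pos / len(confirmed) if confirmed else float("nan")
    spec = true_neg / len(negatives) if negatives else float("nan")
    return sens, spec


# Toy illustration with made-up IDs (not study data).
candidates = set(range(1, 11))   # 10 hypothetical reviewed patients
confirmed = {1, 2, 3, 4}         # 4 hypothetical confirmed cases
flagged = {1, 2, 3, 7}           # what one hypothetical query returned

sens, spec = sensitivity_specificity(flagged, confirmed, candidates)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Under this assumption, a query that returns few patients can be highly specific while still missing most confirmed cases, which is consistent with the wide sensitivity range the abstract reports for text-string queries.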
first_indexed 2024-12-21T02:52:35Z
format Article
id doaj.art-22185058d5d345e99da2b623e5bd5570
institution Directory Open Access Journal
issn 1176-9351
language English
last_indexed 2024-12-21T02:52:35Z
publishDate 2007-01-01
publisher SAGE Publishing
record_format Article
series Cancer Informatics
affiliation Michael Graiser, Susan G. Moore, Rochelle Victor, Ashley Hilliard, Leroy Hill, Christopher R. Flowers M.D., M.S.: Emory University School of Medicine, Winship Cancer Institute, Oncology Informatics, 1365 Clifton Road, N.E., Atlanta, GA, U.S.A.
Michael S. Keehan: NuTec Health Systems, LaGrange, TX, U.S.A.
url https://doi.org/10.1177/117693510700300017