Development of Query Strategies to Identify a Histologic Lymphoma Subtype in a Large Linked Database System
Background: Large linked databases (LLDBs) represent a novel resource for cancer outcomes research. However, accurately identifying a patient population of interest within these LLDBs can be challenging. Our research group developed a fully integrated platform that provides a means of combining...
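Main Authors: | Michael Graiser, Susan G. Moore, Rochelle Victor, Ashley Hilliard, Leroy Hill, Michael S. Keehan, Christopher R. Flowers |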
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2007-01-01 |
Series: | Cancer Informatics |
Online Access: | https://doi.org/10.1177/117693510700300017 |
_version_ | 1819016750040088576 |
---|---|
author | Michael Graiser Susan G. Moore Rochelle Victor Ashley Hilliard Leroy Hill Michael S. Keehan Christopher R. Flowers M.D., M.S. |
author_facet | Michael Graiser Susan G. Moore Rochelle Victor Ashley Hilliard Leroy Hill Michael S. Keehan Christopher R. Flowers M.D., M.S. |
author_sort | Michael Graiser |
collection | DOAJ |
description | Background: Large linked databases (LLDBs) represent a novel resource for cancer outcomes research. However, accurately identifying a patient population of interest within these LLDBs can be challenging. Our research group developed a fully integrated platform that combines independent legacy databases into a single cancer-focused LLDB system. We compared the sensitivity and specificity of several SQL-based query strategies for identifying a histologic lymphoma subtype in this LLDB to determine the most accurate legacy data source for identifying a specific cancer patient population. Methods: Query strategies were developed to identify patients with follicular lymphoma from an LLDB of cancer registry data, electronic medical records (EMRs), laboratory, administrative, pharmacy, and other clinical data. Queries were performed using common diagnostic codes (ICD-9), cancer registry histology codes (ICD-O), and text searches of EMRs. We reviewed medical records and pathology reports to confirm each diagnosis and calculated the sensitivity and specificity of each query strategy. Results: Together, the queries identified 1538 potential cases of follicular lymphoma. Review of pathology and other medical reports confirmed 415 cases of follicular lymphoma: 300 pathology-verified and 115 verified from other medical reports. The query using ICD-O codes was highly specific (96%). Queries using text strings varied in sensitivity (range 7–92%) and specificity (range 86–99%). Queries using ICD-9 codes were both less sensitive (34–44%) and less specific (35–87%). Conclusions: Queries of linked cancer databases that include cancer registry data should use ICD-O codes or structured free-text searches to identify patient populations with a precise histologic diagnosis. |
first_indexed | 2024-12-21T02:52:35Z |
format | Article |
id | doaj.art-22185058d5d345e99da2b623e5bd5570 |
institution | Directory Open Access Journal |
issn | 1176-9351 |
language | English |
last_indexed | 2024-12-21T02:52:35Z |
publishDate | 2007-01-01 |
publisher | SAGE Publishing |
record_format | Article |
series | Cancer Informatics |
spelling | doaj.art-22185058d5d345e99da2b623e5bd5570; 2022-12-21T19:18:25Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2007-01-01; 3; 10.1177/117693510700300017; Development of Query Strategies to Identify a Histologic Lymphoma Subtype in a Large Linked Database System; Michael Graiser (0), Susan G. Moore (1), Rochelle Victor (2), Ashley Hilliard (3), Leroy Hill (4), Michael S. Keehan (5), Christopher R. Flowers M.D., M.S. (6); (0–4, 6) Emory University School of Medicine, Winship Cancer Institute, Oncology Informatics, 1365 Clifton Road, N.E., Atlanta, GA, U.S.A.; (5) NuTec Health Systems, LaGrange, TX, U.S.A.; https://doi.org/10.1177/117693510700300017 |
spellingShingle | Michael Graiser Susan G. Moore Rochelle Victor Ashley Hilliard Leroy Hill Michael S. Keehan Christopher R. Flowers M.D., M.S. Development of Query Strategies to Identify a Histologic Lymphoma Subtype in a Large Linked Database System Cancer Informatics |
title | Development of Query Strategies to Identify a Histologic Lymphoma Subtype in a Large Linked Database System |
title_sort | development of query strategies to identify a histologic lymphoma subtype in a large linked database system |
url | https://doi.org/10.1177/117693510700300017 |
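
The abstract reports a sensitivity and specificity for each query strategy, obtained by comparing query results against chart-review confirmation. The sketch below shows one way such figures could be computed from sets of patient identifiers. It is illustrative only: the function name, the toy identifiers, and the choice of the reviewed candidate pool as the reference population are assumptions, since the record does not state how the authors defined their denominators.

```python
def sensitivity_specificity(flagged, confirmed, candidates):
    """Return (sensitivity, specificity) for one query strategy.

    flagged    -- patient IDs returned by the query (hypothetical input)
    confirmed  -- patient IDs confirmed by record review
    candidates -- all patient IDs that were reviewed (assumed reference pool)
    """
    candidates = set(candidates)
    # Restrict both sets to the reviewed pool so the counts are comparable.
    flagged = set(flagged) & candidates
    confirmed = set(confirmed) & candidates
    negatives = candidates - confirmed

    true_positives = len(flagged & confirmed)
    true_negatives = len(negatives - flagged)

    # Sensitivity: share of confirmed cases that the query found.
    sens = true_positives / len(confirmed) if confirmed else float("nan")
    # Specificity: share of non-cases that the query correctly left out.
    spec = true_negatives / len(negatives) if negatives else float("nan")
    return sens, spec


# Toy data, not taken from the study.
candidates = set(range(1, 11))   # ten reviewed patients
confirmed = {1, 2, 3, 4}         # four confirmed cases
flagged = {1, 2, 3, 7}           # query finds three cases and one non-case

sens, spec = sensitivity_specificity(flagged, confirmed, candidates)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Running the same function once per strategy (for example, one set of flagged IDs per ICD-9, ICD-O, or text-search query) would give a comparable pair of figures for each.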