High-throughput multimodal automated phenotyping (MAP) with application to PheWAS

Bibliographic Details
Main Authors: Liao, Katherine P, Sun, Jiehuan, Cai, Tianrun A, Link, Nicholas, Hong, Chuan, Huang, Jie, Huffman, Jennifer E, Gronsbell, Jessica, Zhang, Yichi, Ho, Yuk-Lam, Castro, Victor, Gainer, Vivian, Murphy, Shawn N, O’Donnell, Christopher J, Gaziano, J Michael, Cho, Kelly, Szolovits, Peter, Kohane, Isaac S, Yu, Sheng, Cai, Tianxi
Format: Article
Language: English
Published: Oxford University Press (OUP) 2021
Online Access: https://hdl.handle.net/1721.1/134057
collection MIT
description © 2019 The Author(s). Objective: Electronic health records linked with biorepositories are a powerful platform for translational studies. A major bottleneck is the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP). Materials and Methods: We developed a mapping method that leverages the Unified Medical Language System to automatically identify the ICD and NLP concepts relevant to a specific phenotype. Along with health care utilization, aggregated ICD and NLP counts were jointly analyzed by fitting an ensemble of latent mixture models. The multimodal automated phenotyping (MAP) algorithm yields a predicted probability of the phenotype for each patient and a threshold for classifying participants as having the phenotype (yes/no). The algorithm was validated using labeled data for 16 phenotypes from a biorepository and further tested in phenome-wide association studies (PheWAS) in an independent cohort for 2 single nucleotide polymorphisms with known associations. Results: The MAP algorithm achieved higher or similar AUC and F-scores compared with the ICD code across all 16 phenotypes. The features assembled via the automated approach had accuracy comparable to those assembled via manual curation (AUC 0.943 for MAP vs 0.941 for manual curation). The PheWAS results suggest that the MAP approach detected previously validated associations with higher power than the standard PheWAS method based on ICD codes. Conclusion: The MAP approach increased the accuracy of phenotype definition while maintaining scalability, thereby facilitating use in studies requiring large-scale phenotyping, such as PheWAS.
first_indexed 2024-09-23T09:06:50Z
format Article
id mit-1721.1/134057
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T09:06:50Z
publishDate 2021
publisher Oxford University Press (OUP)
record_format dspace
spelling mit-1721.1/134057 2021-10-28T04:47:57Z 2021-10-27T19:57:50Z 2021-10-27T19:57:50Z 2019 2021-01-26T19:09:17Z Article http://purl.org/eprint/type/JournalArticle https://hdl.handle.net/1721.1/134057 en 10.1093/JAMIA/OCZ066 Journal of the American Medical Informatics Association Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Oxford University Press (OUP) bioRxiv
url https://hdl.handle.net/1721.1/134057