High-throughput multimodal automated phenotyping (MAP) with application to PheWAS
© 2019 The Author(s). Objective: Electronic health records linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput...
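Main Authors: | Liao, Katherine P; Sun, Jiehuan; Cai, Tianrun A; Link, Nicholas; Hong, Chuan; Huang, Jie; Huffman, Jennifer E; Gronsbell, Jessica; Zhang, Yichi; Ho, Yuk-Lam; Castro, Victor; Gainer, Vivian; Murphy, Shawn N; O’Donnell, Christopher J; Gaziano, J Michael; Cho, Kelly; Szolovits, Peter; Kohane, Isaac S; Yu, Sheng; Cai, Tianxi |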
---|---|
Format: | Article |
Language: | English |
Published: | Oxford University Press (OUP) 2021 |
Online Access: | https://hdl.handle.net/1721.1/134057 |
_version_ | 1826192139073093632 |
---|---|
author | Liao, Katherine P Sun, Jiehuan Cai, Tianrun A Link, Nicholas Hong, Chuan Huang, Jie Huffman, Jennifer E Gronsbell, Jessica Zhang, Yichi Ho, Yuk-Lam Castro, Victor Gainer, Vivian Murphy, Shawn N O’Donnell, Christopher J Gaziano, J Michael Cho, Kelly Szolovits, Peter Kohane, Isaac S Yu, Sheng Cai, Tianxi |
author_sort | Liao, Katherine P |
collection | MIT |
description | © 2019 The Author(s). Objective: Electronic health records linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP). Materials and Methods: We developed a mapping method for automatically identifying relevant ICD and NLP concepts for a specific phenotype leveraging the Unified Medical Language System. Along with health care utilization, aggregated ICD and NLP counts were jointly analyzed by fitting an ensemble of latent mixture models. The multimodal automated phenotyping (MAP) algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying participants as having the phenotype or not. The algorithm was validated using labeled data for 16 phenotypes from a biorepository and further tested in phenome-wide association studies (PheWAS) in an independent cohort for 2 single-nucleotide polymorphisms with known associations. Results: The MAP algorithm achieved higher or similar AUC and F-scores compared to the ICD code across all 16 phenotypes. The features assembled via the automated approach had comparable accuracy to those assembled via manual curation (AUC 0.943 for MAP vs 0.941 for manual curation). The PheWAS results suggest that the MAP approach detected previously validated associations with higher power when compared to the standard PheWAS method based on ICD codes. Conclusion: The MAP approach increased the accuracy of phenotype definition while maintaining scalability, thereby facilitating use in studies requiring large-scale phenotyping, such as PheWAS. |
first_indexed | 2024-09-23T09:06:50Z |
format | Article |
id | mit-1721.1/134057 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T09:06:50Z |
publishDate | 2021 |
publisher | Oxford University Press (OUP) |
record_format | dspace |
spelling | mit-1721.1/134057 2021-10-28T04:47:57Z 2021-10-27T19:57:50Z 2021-10-27T19:57:50Z 2019 2021-01-26T19:09:17Z Article http://purl.org/eprint/type/JournalArticle https://hdl.handle.net/1721.1/134057 en 10.1093/JAMIA/OCZ066 Journal of the American Medical Informatics Association Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Oxford University Press (OUP) bioRxiv |
title | High-throughput multimodal automated phenotyping (MAP) with application to PheWAS |
url | https://hdl.handle.net/1721.1/134057 |
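
The description above says that aggregated ICD and NLP counts are analyzed by fitting an ensemble of latent mixture models, yielding a per-patient probability and a yes/no threshold. The sketch below illustrates that general idea only and is not the authors' MAP implementation. It makes several assumptions that do not come from the record: each feature is modeled as a two-component Gaussian mixture on log(1 + count), the ensemble is a plain average of the per-feature posteriors, health care utilization is ignored, and the 0.5 cutoff is arbitrary. The names `fit_mixture` and `map_like_probability` are hypothetical.

```python
import math


def _posterior_high(x, pi, mu, var):
    """P(component 1 | x_i) under a two-component 1-D Gaussian mixture."""
    out = []
    for v in x:
        dens = [
            pi[k] * math.exp(-((v - mu[k]) ** 2) / (2 * var[k]))
            / math.sqrt(2 * math.pi * var[k])
            for k in (0, 1)
        ]
        total = dens[0] + dens[1]
        out.append(dens[1] / total if total > 0 else 0.5)
    return out


def fit_mixture(x, n_iter=200):
    """Fit the mixture by EM; return P(high-mean component | x_i) per patient."""
    n = len(x)
    mean = sum(x) / n
    spread = sum((v - mean) ** 2 for v in x) / n or 1.0
    # Start the two components at the extremes of the data.
    pi, mu, var = [0.5, 0.5], [min(x), max(x)], [spread, spread]
    for _ in range(n_iter):
        r = _posterior_high(x, pi, mu, var)          # E-step
        n1 = sum(r)
        n0 = n - n1
        if min(n0, n1) < 1e-9:                       # one component vanished
            break
        mu = [                                       # M-step
            sum((1 - ri) * v for ri, v in zip(r, x)) / n0,
            sum(ri * v for ri, v in zip(r, x)) / n1,
        ]
        var = [                                      # floored to avoid collapse
            max(sum((1 - ri) * (v - mu[0]) ** 2 for ri, v in zip(r, x)) / n0, 1e-6),
            max(sum(ri * (v - mu[1]) ** 2 for ri, v in zip(r, x)) / n1, 1e-6),
        ]
        pi = [n0 / n, n1 / n]
    post = _posterior_high(x, pi, mu, var)
    if mu[1] < mu[0]:                                # keep "1" = high-count group
        post = [1 - p for p in post]
    return post


def map_like_probability(icd_counts, nlp_counts):
    """Average the posteriors from one mixture per feature (assumed ensemble rule)."""
    features = [
        [math.log1p(c) for c in icd_counts],
        [math.log1p(c) for c in nlp_counts],
    ]
    posteriors = [fit_mixture(f) for f in features]
    return [sum(p) / len(posteriors) for p in zip(*posteriors)]


if __name__ == "__main__":
    # Made-up counts for illustration: 8 low-count and 5 high-count patients.
    icd = [0, 0, 1, 0, 1, 0, 0, 1, 8, 10, 12, 9, 11]
    nlp = [0, 1, 0, 0, 2, 1, 0, 0, 15, 20, 12, 18, 25]
    probs = map_like_probability(icd, nlp)
    labels = [p > 0.5 for p in probs]                # 0.5 cutoff is an assumption
    print(list(zip([round(p, 3) for p in probs], labels)))
```

No labeled data is used for fitting here, which is the property that makes this family of methods scalable across many phenotypes; labels would only be needed to evaluate AUC or F-scores afterwards.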