High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP)

Full description

Bibliographic details
Main authors: Zhang, Yichi, Cai, Tianrun, Yu, Sheng, Cho, Kelly, Hong, Chuan, Sun, Jiehuan, Huang, Jie, Ho, Yuk-Lam, Ananthakrishnan, Ashwin N, Xia, Zongqi, Shaw, Stanley Y, Gainer, Vivian, Castro, Victor, Link, Nicholas, Honerlaw, Jacqueline, Huang, Sicong, Gagnon, David, Karlson, Elizabeth W, Plenge, Robert M, Szolovits, Peter, Savova, Guergana, Churchill, Susanne, O’Donnell, Christopher, Murphy, Shawn N, Gaziano, J Michael, Kohane, Isaac, Cai, Tianxi, Liao, Katherine P
Contributors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: English
Published / Created: Springer Science and Business Media LLC 2021
Online access: https://hdl.handle.net/1721.1/134827
Description: Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1–2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).
Copyright: © 2019, The Author(s), under exclusive licence to Springer Nature Limited.
Published in: Nature Protocols (2019)
DOI: 10.1038/S41596-019-0227-6
Rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
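The abstract lists a phenotype probability for every patient and a yes/no phenotype classification among PheCAP's final products, but the record does not say how the probability becomes a yes/no call. One common approach, assumed here purely for illustration and not taken from the PheCAP paper or software, is to choose a probability cutoff on the chart-reviewed (gold-standard) subset that reaches a target specificity, then apply that cutoff to all patients. The Python sketch below shows that step. The function names `specificity_cutoff`, `classify`, and `specificity`, the 0.8 target, and the scores are all hypothetical.

```python
import math
from typing import List, Sequence


def specificity_cutoff(probs: Sequence[float], labels: Sequence[int],
                       target_spec: float = 0.95) -> float:
    """Hypothetical helper: smallest cutoff c, chosen among the gold-standard
    negatives' scores, such that calling p > c "yes" keeps specificity
    >= target_spec on the chart-reviewed set."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not 0.0 < target_spec <= 1.0:
        raise ValueError("target_spec must be in (0, 1]")
    negatives = sorted(p for p, y in zip(probs, labels) if y == 0)
    if not negatives:
        raise ValueError("need at least one chart-reviewed negative")
    # Specificity at cutoff c is the share of negatives scoring <= c, so at
    # least k negatives must fall at or below c. The small epsilon keeps a
    # product that should be an integer from rounding up to the next one.
    k = max(1, math.ceil(target_spec * len(negatives) - 1e-9))
    return negatives[k - 1]


def classify(probs: Sequence[float], cutoff: float) -> List[int]:
    """Binary phenotype call for each patient: 1 = yes, 0 = no."""
    return [1 if p > cutoff else 0 for p in probs]


def specificity(calls: Sequence[int], labels: Sequence[int]) -> float:
    """Share of gold-standard negatives that were called 'no'."""
    negative_calls = [c for c, y in zip(calls, labels) if y == 0]
    if not negative_calls:
        raise ValueError("need at least one chart-reviewed negative")
    return sum(1 for c in negative_calls if c == 0) / len(negative_calls)


# Hypothetical model scores on a small chart-reviewed set.
labeled_probs = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80, 0.90, 0.95]
gold_labels = [0, 0, 0, 0, 1, 0, 1, 1]

cutoff = specificity_cutoff(labeled_probs, gold_labels, target_spec=0.8)
labeled_calls = classify(labeled_probs, cutoff)
print(cutoff, specificity(labeled_calls, gold_labels))

# Apply the same cutoff to every patient, labeled or not.
all_probs = [0.02, 0.40, 0.33, 0.99]
print(classify(all_probs, cutoff))
```

Targeting specificity is only one option; a cutoff could instead be chosen for a desired sensitivity or positive predictive value, depending on how the resulting cohort will be used.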