High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP)
© 2019, The Author(s), under exclusive licence to Springer Nature Limited. Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, a...
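Main Authors: | Zhang, Yichi; Cai, Tianrun; Yu, Sheng; Cho, Kelly; Hong, Chuan; Sun, Jiehuan; Huang, Jie; Ho, Yuk-Lam; Ananthakrishnan, Ashwin N; Xia, Zongqi; Shaw, Stanley Y; Gainer, Vivian; Castro, Victor; Link, Nicholas; Honerlaw, Jacqueline; Huang, Sicong; Gagnon, David; Karlson, Elizabeth W; Plenge, Robert M; Szolovits, Peter; Savova, Guergana; Churchill, Susanne; O’Donnell, Christopher; Murphy, Shawn N; Gaziano, J Michael; Kohane, Isaac; Cai, Tianxi; Liao, Katherine P |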
---|---|
Contributors: | |
Format: | Article |
Language: | English |
Published / Created: | Springer Science and Business Media LLC, 2021 |
Online Access: | https://hdl.handle.net/1721.1/134827 |
_version_ | 1826212153450823680 |
---|---|
author | Zhang, Yichi; Cai, Tianrun; Yu, Sheng; Cho, Kelly; Hong, Chuan; Sun, Jiehuan; Huang, Jie; Ho, Yuk-Lam; Ananthakrishnan, Ashwin N; Xia, Zongqi; Shaw, Stanley Y; Gainer, Vivian; Castro, Victor; Link, Nicholas; Honerlaw, Jacqueline; Huang, Sicong; Gagnon, David; Karlson, Elizabeth W; Plenge, Robert M; Szolovits, Peter; Savova, Guergana; Churchill, Susanne; O’Donnell, Christopher; Murphy, Shawn N; Gaziano, J Michael; Kohane, Isaac; Cai, Tianxi; Liao, Katherine P |
author2 | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
author_sort | Zhang, Yichi |
collection | MIT |
description | © 2019, The Author(s), under exclusive licence to Springer Nature Limited. Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1–2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no). |
first_indexed | 2024-09-23T15:16:53Z |
format | Article |
id | mit-1721.1/134827 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T15:16:53Z |
publishDate | 2021 |
publisher | Springer Science and Business Media LLC |
record_format | dspace |
spelling | mit-1721.1/134827 2023-12-18T19:58:39Z 2021-10-27T20:09:22Z 2021-10-27T20:09:22Z 2019 2021-03-26T16:44:36Z Article http://purl.org/eprint/type/JournalArticle https://hdl.handle.net/1721.1/134827 en 10.1038/S41596-019-0227-6 Nature Protocols Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf Springer Science and Business Media LLC PMC |
title | High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP) |
url | https://hdl.handle.net/1721.1/134827 |