Data-driven identification of ageing-related diseases from electronic health records
Abstract Reducing the burden of late-life morbidity requires an understanding of the mechanisms of ageing-related diseases (ARDs), defined as diseases that accumulate with increasing age. This has been hampered by the lack of formal criteria to identify ARDs. Here, we present a framework to identify...
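Main Authors: | Valerie Kuan, Helen C. Fraser, Melanie Hingorani, Spiros Denaxas, Arturo Gonzalez-Izquierdo, Kenan Direk, Dorothea Nitsch, Rohini Mathur, Constantinos A. Parisinos, R. Thomas Lumbers, Reecha Sofat, Ian C. K. Wong, Juan P. Casas, Janet M. Thornton, Harry Hemingway, Linda Partridge, Aroon D. Hingorani |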
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio 2021-02-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-021-82459-y |
_version_ | 1818432340147306496 |
---|---|
author | Valerie Kuan Helen C. Fraser Melanie Hingorani Spiros Denaxas Arturo Gonzalez-Izquierdo Kenan Direk Dorothea Nitsch Rohini Mathur Constantinos A. Parisinos R. Thomas Lumbers Reecha Sofat Ian C. K. Wong Juan P. Casas Janet M. Thornton Harry Hemingway Linda Partridge Aroon D. Hingorani |
author_sort | Valerie Kuan |
collection | DOAJ |
description | Abstract Reducing the burden of late-life morbidity requires an understanding of the mechanisms of ageing-related diseases (ARDs), defined as diseases that accumulate with increasing age. This has been hampered by the lack of formal criteria to identify ARDs. Here, we present a framework to identify ARDs using two complementary methods consisting of unsupervised machine learning and actuarial techniques, which we applied to electronic health records (EHRs) from 3,009,048 individuals in England using primary care data from the Clinical Practice Research Datalink (CPRD) linked to the Hospital Episode Statistics admitted patient care dataset between 1 April 2010 and 31 March 2015 (mean age 49.7 years (s.d. 18.6), 51% female, 70% white ethnicity). We grouped 278 high-burden diseases into nine main clusters according to their patterns of disease onset, using a hierarchical agglomerative clustering algorithm. Four of these clusters, encompassing 207 diseases spanning diverse organ systems and clinical specialties, had rates of disease onset that clearly increased with chronological age. However, the ages of onset for these four clusters were strikingly different, with median age of onset 82 years (IQR 82–83) for Cluster 1, 77 years (IQR 75–77) for Cluster 2, 69 years (IQR 66–71) for Cluster 3 and 57 years (IQR 54–59) for Cluster 4. Fitting to ageing-related actuarial models confirmed that the vast majority of these 207 diseases had a high probability of being ageing-related. Cardiovascular diseases and cancers were highly represented, while benign neoplastic, skin and psychiatric conditions were largely absent from the four ageing-related clusters. Our framework identifies and clusters ARDs and can form the basis for fundamental and translational research into ageing pathways. |
first_indexed | 2024-12-14T16:03:38Z |
format | Article |
id | doaj.art-a4cfa63aeff54b99ac644b906f56e8bb |
institution | Directory Open Access Journal |
issn | 2045-2322 |
language | English |
last_indexed | 2024-12-14T16:03:38Z |
publishDate | 2021-02-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-a4cfa63aeff54b99ac644b906f56e8bb; 2022-12-21T22:55:08Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2021-02-01; vol. 11, no. 1, pp. 1-17; 10.1038/s41598-021-82459-y; Data-driven identification of ageing-related diseases from electronic health records; Valerie Kuan (Institute of Health Informatics, University College London); Helen C. Fraser (Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, University College London); Melanie Hingorani (Moorfields Eye Hospital); Spiros Denaxas (Institute of Health Informatics, University College London); Arturo Gonzalez-Izquierdo (Institute of Health Informatics, University College London); Kenan Direk (Institute of Health Informatics, University College London); Dorothea Nitsch (Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine); Rohini Mathur (Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine); Constantinos A. Parisinos (Institute of Health Informatics, University College London); R. Thomas Lumbers (Institute of Health Informatics, University College London); Reecha Sofat (Institute of Health Informatics, University College London); Ian C. K. Wong (School of Pharmacy, University College London); Juan P. Casas (Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School); Janet M. Thornton (European Molecular Biology Laboratory - European Bioinformatics Institute EMBL-EBI, Wellcome Genome Campus); Harry Hemingway (Institute of Health Informatics, University College London); Linda Partridge (Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, University College London); Aroon D. Hingorani (Health Data Research UK London, University College London); https://doi.org/10.1038/s41598-021-82459-y |
title | Data-driven identification of ageing-related diseases from electronic health records |
title_sort | data driven identification of ageing related diseases from electronic health records |
url | https://doi.org/10.1038/s41598-021-82459-y |
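
The abstract in this record states that diseases were grouped by their patterns of onset using a hierarchical agglomerative clustering algorithm, but the record does not give the distance metric, the linkage rule, or the underlying data. The sketch below is only an illustration of the general technique under assumptions of its own: Euclidean distance, average linkage, four hypothetical age bands, and invented onset counts for six made-up diseases. None of the names or numbers come from the article.

```python
from itertools import combinations
from math import dist


def normalise(profile):
    """Scale an onset profile so its values sum to 1 (compare shape, not size)."""
    total = sum(profile)
    return [value / total for value in profile]


def average_linkage(cluster_a, cluster_b, points):
    """Mean Euclidean distance over all cross-cluster pairs (assumed linkage)."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(dist(points[a], points[b]) for a, b in pairs) / len(pairs)


def agglomerate(points, n_clusters):
    """Bottom-up clustering: start from singletons and repeatedly merge the
    two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], points),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Invented onset counts per hypothetical age band (20-39, 40-59, 60-79, 80+).
toy_profiles = {
    "late_onset_a": [1, 4, 20, 75],
    "late_onset_b": [2, 5, 23, 70],
    "mid_onset_a": [5, 40, 45, 10],
    "mid_onset_b": [4, 42, 44, 10],
    "early_onset_a": [60, 25, 10, 5],
    "early_onset_b": [55, 30, 10, 5],
}

names = list(toy_profiles)
points = [normalise(toy_profiles[name]) for name in names]
clusters = agglomerate(points, n_clusters=3)
labelled = sorted(sorted(names[i] for i in cluster) for cluster in clusters)

for group in labelled:
    print(group)
```

With these toy inputs the profiles that peak in the same age band end up in the same cluster. A real analysis would more likely rely on an established library implementation and would need to justify its choice of metric, linkage, and number of clusters.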