Data-driven identification of ageing-related diseases from electronic health records

Abstract Reducing the burden of late-life morbidity requires an understanding of the mechanisms of ageing-related diseases (ARDs), defined as diseases that accumulate with increasing age. This has been hampered by the lack of formal criteria to identify ARDs. Here, we present a framework to identify...

Bibliographic Details
Main Authors: Valerie Kuan, Helen C. Fraser, Melanie Hingorani, Spiros Denaxas, Arturo Gonzalez-Izquierdo, Kenan Direk, Dorothea Nitsch, Rohini Mathur, Constantinos A. Parisinos, R. Thomas Lumbers, Reecha Sofat, Ian C. K. Wong, Juan P. Casas, Janet M. Thornton, Harry Hemingway, Linda Partridge, Aroon D. Hingorani
Format: Article
Language: English
Published: Nature Portfolio 2021-02-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-021-82459-y
author Valerie Kuan
Helen C. Fraser
Melanie Hingorani
Spiros Denaxas
Arturo Gonzalez-Izquierdo
Kenan Direk
Dorothea Nitsch
Rohini Mathur
Constantinos A. Parisinos
R. Thomas Lumbers
Reecha Sofat
Ian C. K. Wong
Juan P. Casas
Janet M. Thornton
Harry Hemingway
Linda Partridge
Aroon D. Hingorani
collection DOAJ
description Abstract Reducing the burden of late-life morbidity requires an understanding of the mechanisms of ageing-related diseases (ARDs), defined as diseases that accumulate with increasing age. This has been hampered by the lack of formal criteria to identify ARDs. Here, we present a framework to identify ARDs using two complementary methods consisting of unsupervised machine learning and actuarial techniques, which we applied to electronic health records (EHRs) from 3,009,048 individuals in England using primary care data from the Clinical Practice Research Datalink (CPRD) linked to the Hospital Episode Statistics admitted patient care dataset between 1 April 2010 and 31 March 2015 (mean age 49.7 years (s.d. 18.6), 51% female, 70% white ethnicity). We grouped 278 high-burden diseases into nine main clusters according to their patterns of disease onset, using a hierarchical agglomerative clustering algorithm. Four of these clusters, encompassing 207 diseases spanning diverse organ systems and clinical specialties, had rates of disease onset that clearly increased with chronological age. However, the ages of onset for these four clusters were strikingly different, with median age of onset 82 years (IQR 82–83) for Cluster 1, 77 years (IQR 75–77) for Cluster 2, 69 years (IQR 66–71) for Cluster 3 and 57 years (IQR 54–59) for Cluster 4. Fitting to ageing-related actuarial models confirmed that the vast majority of these 207 diseases had a high probability of being ageing-related. Cardiovascular diseases and cancers were highly represented, while benign neoplastic, skin and psychiatric conditions were largely absent from the four ageing-related clusters. Our framework identifies and clusters ARDs and can form the basis for fundamental and translational research into ageing pathways.
first_indexed 2024-12-14T16:03:38Z
format Article
id doaj.art-a4cfa63aeff54b99ac644b906f56e8bb
institution Directory Open Access Journal
issn 2045-2322
language English
last_indexed 2024-12-14T16:03:38Z
publishDate 2021-02-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-a4cfa63aeff54b99ac644b906f56e8bb
2022-12-21T22:55:08Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2021-02-01
vol. 11, no. 1, pp. 1–17
10.1038/s41598-021-82459-y
Data-driven identification of ageing-related diseases from electronic health records
Valerie Kuan (Institute of Health Informatics, University College London)
Helen C. Fraser (Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, University College London)
Melanie Hingorani (Moorfields Eye Hospital)
Spiros Denaxas (Institute of Health Informatics, University College London)
Arturo Gonzalez-Izquierdo (Institute of Health Informatics, University College London)
Kenan Direk (Institute of Health Informatics, University College London)
Dorothea Nitsch (Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine)
Rohini Mathur (Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine)
Constantinos A. Parisinos (Institute of Health Informatics, University College London)
R. Thomas Lumbers (Institute of Health Informatics, University College London)
Reecha Sofat (Institute of Health Informatics, University College London)
Ian C. K. Wong (School of Pharmacy, University College London)
Juan P. Casas (Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School)
Janet M. Thornton (European Molecular Biology Laboratory - European Bioinformatics Institute EMBL-EBI, Wellcome Genome Campus)
Harry Hemingway (Institute of Health Informatics, University College London)
Linda Partridge (Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, University College London)
Aroon D. Hingorani (Health Data Research UK London, University College London)
title Data-driven identification of ageing-related diseases from electronic health records
url https://doi.org/10.1038/s41598-021-82459-y
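
The abstract in this record describes grouping diseases by their pattern of onset across age using hierarchical agglomerative clustering. The following is a minimal sketch of that general technique, not the authors' implementation. The disease names and onset rates are invented toy data, and the choices of Euclidean distance, average linkage and sum-to-one normalisation are assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: onset rate in four consecutive age bands.
# The names and numbers are invented for illustration only.
ONSET_RATES = {
    "late_onset_a": [0.1, 0.5, 3.0, 12.0],
    "late_onset_b": [0.2, 0.6, 3.5, 11.0],
    "early_onset_a": [8.0, 5.0, 2.0, 1.0],
    "early_onset_b": [9.0, 4.5, 2.5, 0.8],
    "flat": [3.0, 3.1, 2.9, 3.0],
}


def normalise(curve):
    """Scale a curve to sum to 1 so that shape, not overall burden, is compared."""
    total = sum(curve)
    return [value / total for value in curve]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    distances = [
        euclidean(profiles[i], profiles[j]) for i in cluster_a for j in cluster_b
    ]
    return sum(distances) / len(distances)


def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[name] for name in sorted(profiles)]
    while len(clusters) > n_clusters:
        # combinations() yields index pairs with a < b, so deleting b is safe.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], profiles
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


profiles = {name: normalise(curve) for name, curve in ONSET_RATES.items()}
clusters = agglomerate(profiles, n_clusters=3)
print(clusters)
```

With this toy input the two rising curves end up together, the two falling curves end up together, and the flat curve stays on its own. For real data, a library routine such as SciPy's hierarchical clustering functions would likely be preferable to this hand-rolled loop.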