Summary: | <p>Cardiovascular disease (CVD) is the leading cause of morbidity and mortality worldwide. The electrocardiogram (ECG) is an important clinical measure of cardiac activity. A major challenge in incorporating ECG time series into CVD risk metrics is extracting features from them and classifying them into the appropriate ECG abnormality groups. We therefore set out to address this challenge with machine learning.</p>
<p>We used machine learning to analyse 12-lead, 500 Hz, 10-s ECG recordings acquired with a Mortara device and to classify the ECG signals of 25,019 participants in a large cohort study, the China Kadoorie Biobank (CKB). We compared the performance of 11 representative traditional machine learning algorithms on four-class classification into normal, “arrhythmia”, “ischemia”, and “hypertrophy”. By extracting 72 novel features, we improved four-class classification accuracy from 53.5% (using only the Mortara features) to 77.3%. We demonstrated that machine learning models can classify ECGs with high accuracy without any knowledge of the diagnostic criteria, and that the top features identified by the best model (SGB-F84) differ substantially from those commonly used in the clinic.</p>
<p>We further proposed a novel family of neural network architectures, the Layer-wise Convex Network (LCN), and a neural architecture search algorithm, AutoNet, to classify raw ECG signals end-to-end without signal denoising, preprocessing, or feature extraction. We benchmarked AutoNet-LCN against the state-of-the-art ResNet-based model on three datasets: CKB, PhysioNet, and ICBEB. The LCNs generated by AutoNet have no more than 2% of the parameters of the state-of-the-art architectures, yet outperformed them on all three datasets by a wide margin (a 9-16% improvement in F1 score) within 2 hours of architecture search, compared with the weeks to months of trial and error that human researchers spend in the conventional deep learning model development process. The neural networks found by AutoNet-LCN are robust to variations in noise level, ECG signal length, sampling frequency, number of leads, amplitude scale, ECG abnormality type, and the cohort size of the study population.</p>
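<p>To make the F1 comparison concrete, the sketch below computes a multi-class F1 score from predicted and true labels. The summary does not state how per-class F1 values were averaged, so macro-averaging (the unweighted mean over classes) is an assumption here, and the function name <code>macro_f1</code> is hypothetical. The class names reuse the four CKB groups purely for illustration.</p>

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores.

    Assumption: macro-averaging; other averaging schemes (e.g. weighted)
    would give different numbers on imbalanced classes.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was not p
            fn[t] += 1          # missed an instance of true class t
    scores = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 if the class never occurs
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(labels)


labels = ["normal", "arrhythmia", "ischemia", "hypertrophy"]
y_true = ["normal", "normal", "arrhythmia", "ischemia", "hypertrophy"]
y_pred = ["normal", "arrhythmia", "arrhythmia", "ischemia", "normal"]
print(round(macro_f1(y_true, y_pred, labels), 4))  # 0.5417
```

<p>Macro-averaging weights every abnormality class equally, so a model cannot score well simply by predicting the majority "normal" class.</p>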
<p>Finally, the labels in the CKB were generated by the deterministic, rule-based Minnesota code, which in theory a neural network can approximate to arbitrary precision. To address this issue, we proposed a novel paradigm: learning from alternative labels. As a proof of concept, we used AutoNet-LCN to predict participants’ age from their 10-s ECG waveforms in the CKB dataset. We trained AutoNet-LCN on the normal population and tested it on the normal, “arrhythmia”, “ischemia”, and “hypertrophy” classes. We developed a gender-agnostic model as well as gender-stratified models, which achieved mean absolute errors of 5.7 years (R<sup>2</sup> = 44.1%), 5.6 years (R<sup>2</sup> = 45.4%), and 6.2 years (R<sup>2</sup> = 34.7%) in the normal class for the gender-agnostic, female, and male models, respectively. A large absolute deviation of the predicted “heart age” from chronological age suggests higher CVD risk: a high “heart age” was associated with “hypertrophy”, “ischemia”, and “hypertension”, whereas a low “heart age” was associated with “arrhythmia” and “hypotension”. The “heart age” may serve as an intuitive risk score for cardiovascular health and warrants further study of its associations with different CVD outcomes.</p>
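<p>The age-prediction metrics above can be sketched as follows. This assumes R<sup>2</sup> is the coefficient of determination, 1 − SS<sub>res</sub>/SS<sub>tot</sub>; the summary does not say whether a squared correlation was used instead. The "heart age" gap is taken as predicted minus chronological age, so a positive gap means an older-than-expected heart. All function names and the toy ages are hypothetical and are not CKB data.</p>

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ages (years)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (an assumed definition)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def heart_age_gap(chronological, predicted):
    """Predicted 'heart age' minus chronological age; > 0 means an 'older' heart."""
    return [p - c for c, p in zip(chronological, predicted)]


ages = [50, 60, 70, 40]        # toy chronological ages
predicted = [55, 58, 64, 43]   # toy model outputs
print(mean_absolute_error(ages, predicted))  # 4.0
print(round(r_squared(ages, predicted), 3))  # 0.852
print(heart_age_gap(ages, predicted))        # [5, -2, -6, 3]
```

<p>Under this convention, the absolute value of the gap is the deviation discussed above, and its sign separates a high "heart age" from a low one.</p>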
|