Deep learning methods for genome-based prediction of drug resistance in Mycobacterium tuberculosis



Bibliographic details
Main author: Wang, C
Contributors: Clifton, D
Format: Thesis
Language: English
Published / Created: 2023
Description
Summary:<p>Tuberculosis is a highly lethal infectious disease, causing approximately 1.5 million deaths annually. Recent years have seen a concerning rise in drug-resistant tuberculosis cases, calling for new approaches to drug susceptibility testing. The conventional gold standard, bacterial culture testing, can take up to six weeks to return results, potentially leading to incorrect treatment and worse patient outcomes. Whole-genome sequencing offers the prospect of rapid and accurate genomic drug susceptibility testing. Achieving this requires a substantial number of <em>Mycobacterium tuberculosis</em> (MTB) isolates with both genomic sequencing and phenotypic drug resistance data, both to expand our understanding of MTB drug resistance mechanisms and to improve the predictive performance of genomic testing.</p> <p>The <em>Comprehensive Resistance Prediction for Tuberculosis: an International Consortium</em> (CRyPTIC), a global collaboration, has granted access to two extensive datasets, enabling the training of accurate prediction models. In this doctoral thesis, which is based on the CRyPTIC datasets, we explore and develop a variety of machine learning and deep learning methods. Our research includes binary classifiers for predicting MTB resistance labels, multi-class classification and ordinal regression for predicting the minimum inhibitory concentrations (MICs) of drugs for MTB isolates, and calibration models to improve the uncertainty estimates of our predictions.</p> <p>In addition, we observe that current approaches ignore valuable attribute information associated with mutations. Based on this observation, we introduce MTB-HINE-BERT, an innovative attention-based deep learning framework that integrates heterogeneous information networks to enhance the prediction of MTB resistance from mutations. Notably, it improves the area under the receiver operating characteristic curve (AUROC) for pyrazinamide and amikacin by 2.2% and 1.9%, respectively.</p> <p>Furthermore, we address a limitation of training a separate model for each drug: doing so neglects resistance co-occurrence and shared resistance mechanisms across drugs. To this end, we employ multi-task learning and meta-learning techniques. The multi-task learning model improves the AUROC for eight drugs. We also propose a hyper-logistic regression model, which enables simultaneous training across drugs and marginally improves overall predictive performance.</p>
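The summary mentions ordinal regression for MIC prediction but does not specify the formulation. The minimal sketch below assumes a proportional-odds (cumulative logit) model, one common choice. It shows how a single linear score and ordered cut-points yield probabilities over a doubling-dilution series. The dilution values, thresholds, and helper names (`ordinal_probs`, `nll`, `predict_mic`) are hypothetical illustrations, not the thesis's implementation.

```python
import numpy as np

# Hypothetical doubling-dilution series (mg/L); real MIC plate ranges differ by drug.
DILUTIONS = np.array([0.125, 0.25, 0.5, 1.0, 2.0, 4.0])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ordinal_probs(score, thresholds):
    """Proportional-odds model (assumed): P(y <= k | x) = sigmoid(theta_k - score).

    `score` is a scalar linear predictor w.x; `thresholds` holds K-1 strictly
    increasing cut-points theta_1 < ... < theta_{K-1} for K ordered MIC classes.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    if np.any(np.diff(thresholds) <= 0):
        raise ValueError("thresholds must be strictly increasing")
    cdf = sigmoid(thresholds - score)          # P(y <= k) for k = 1..K-1
    cdf = np.concatenate(([0.0], cdf, [1.0]))  # pad with P(y <= 0) = 0 and P(y <= K) = 1
    return np.diff(cdf)                        # P(y = k); non-negative, sums to 1


def nll(scores, labels, thresholds):
    """Negative log-likelihood of integer class labels (0..K-1), the training loss."""
    probs = np.array([ordinal_probs(s, thresholds) for s in scores])
    return -np.sum(np.log(probs[np.arange(len(labels)), labels] + 1e-12))


def predict_mic(score, thresholds, dilutions=DILUTIONS):
    """Return the dilution of the most probable ordinal class."""
    probs = ordinal_probs(score, thresholds)
    return dilutions[int(np.argmax(probs))]
```

For example, with cut-points `[-2, -1, 0, 1, 2]`, a score of 0.5 places the most probability on the 1.0 mg/L class. Unlike a softmax over unordered MIC classes, raising the score shifts probability toward higher dilutions, which respects the ordering of the MIC scale.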