Transforming outcome measures in plastic surgery


Bibliographic Details
Main Author: Harrison, C
Other Authors: Rodrigues, J
Format: Thesis
Language: English
Published: 2022
Description
Summary: The CLEFT-Q is a patient-reported outcome measure (PROM) designed to measure the elements of health most important to people born with a cleft lip or palate (CL/P). Response burden is considered a major barrier to its uptake in routine clinical practice in the UK and abroad. In this thesis, I aimed to develop, validate, and deploy a computerised adaptive testing (CAT) version of the questionnaire that can provide accurate scores for a respondent based on an individualised selection of items. This is anticipated to reduce response burden and facilitate the questionnaire's uptake. I also aimed to explore future avenues for clinical CAT research, including the quantification of measurement precision and the application of multidimensional item response theory and regression tree learning.

In a series of Monte Carlo simulations, I compared popular score estimators and item selection criteria for use in CLEFT-Q CAT algorithms, based on Rasch models calibrated from the item responses of 2434 participants in the CLEFT-Q field test. I validated these algorithms in an external sample of 536 patients undergoing routine CL/P care, and set stopping rules for the assessments during a workshop involving patients, parents, researchers, and clinicians. I developed a user interface for the software and a web application to assist with score interpretation. I deployed the platform during routine clinical practice at a single centre and interviewed six patients and four clinicians to understand the perceived impact of the system on clinical care.

Through simulation studies, I compared the measurement precision of Rasch-based CAT algorithms with CAT algorithms built on graded response models (GRMs). I developed a multidimensional GRM to describe the relationship between items and factors in the CLEFT-Q appearance scales, built this into a CAT algorithm, and demonstrated how it might work in practice. Finally, I built a range of CAT algorithms that use regression trees to incorporate clinician-reported data into the assessment, and compared their efficiency with that of Rasch-based counterparts.

There were no meaningful differences in performance between popular score estimators and item selection criteria, and I chose to proceed with standard Bayesian techniques (expected a posteriori scoring and minimum expected posterior variance item selection). In the validation study, the algorithms provided accurate measurements from fewer items, although accuracy decreased with decreasing assessment length. The standard error of measurement (SE) was higher for these Rasch-based algorithms than has been recommended for GRM-based CAT algorithms. In the stopping rule workshop, stakeholders voted to reduce the CLEFT-Q scale lengths by 1-4 items each; at these assessment lengths, the root mean squared error between full-length scale scores and CAT scores was 2-5 points out of 100. In clinical practice, the platform was perceived to enhance communication between patients and clinicians, improve consultation focus, facilitate multidisciplinary team working, and improve the early detection of health-related quality of life issues.

I demonstrated that while GRM-based algorithms achieve a lower SE, they are unlikely to provide more accurate point estimates than Rasch-based algorithms. The multidimensional GRM CAT algorithm was able to measure the appearance of the face based on item responses to scales relating to facial components (for example, the appearance of the nose, lips, and teeth), but it did not provide obvious gains in efficiency. Regression trees preferentially used item responses over clinical variables for latent construct measurement and performed worse than Rasch-based algorithms, likely because binary data splitting loses measurement granularity.

The CLEFT-Q CAT system reduces assessment length, and its use in clinical care is perceived to enhance communication and patient experience. This may drive routine, standardised, patient-centred outcome measurement in CL/P and can be rapidly translated into other clinical fields. The high SE values are likely an artefact of Rasch model parsimony, and SE heuristics used in GRM-based CAT systems should not necessarily be generalised to these Rasch-based algorithms. Future work should continue to cautiously explore the potential of multidimensional item response theory to improve assessment efficiency, and other methods for incorporating non-PROM data into latent variable models should be evaluated.
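The core CAT loop named above (expected a posteriori scoring with minimum expected posterior variance item selection) can be sketched in miniature. The Python sketch below uses a hypothetical five-item dichotomous Rasch bank with grid quadrature over a standard-normal prior; the difficulty values are illustrative only and are not CLEFT-Q calibrations (the real scales are polytomous).

```python
import numpy as np

# Hypothetical item bank: Rasch difficulty parameters (illustrative, not CLEFT-Q)
difficulties = np.array([-1.5, -0.5, 0.0, 0.8, 1.6])

# Quadrature grid over the latent trait theta, standard-normal prior
grid = np.linspace(-4, 4, 121)
prior = np.exp(-0.5 * grid**2)
prior /= prior.sum()

def p_endorse(theta, b):
    """Rasch model: probability of endorsing an item with difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def posterior(responses):
    """Normalised posterior over the grid, given (item, response) pairs."""
    post = prior.copy()
    for item, x in responses:
        p = p_endorse(grid, difficulties[item])
        post *= p if x == 1 else (1.0 - p)
    return post / post.sum()

def eap(post):
    """Expected a posteriori score and posterior variance."""
    mean = np.sum(grid * post)
    var = np.sum((grid - mean) ** 2 * post)
    return mean, var

def next_item(responses, administered):
    """Minimum expected posterior variance item selection."""
    post = posterior(responses)
    best, best_epv = None, np.inf
    for item in range(len(difficulties)):
        if item in administered:
            continue
        p1 = np.sum(post * p_endorse(grid, difficulties[item]))  # predictive P(x=1)
        # Average the posterior variance over the two possible responses
        epv = 0.0
        for x, px in ((1, p1), (0, 1.0 - p1)):
            _, var = eap(posterior(responses + [(item, x)]))
            epv += px * var
        if epv < best_epv:
            best, best_epv = item, epv
    return best
```

Before any responses, `next_item([], set())` selects the item whose difficulty lies closest to the prior mean; each administered response then tightens the posterior, and the assessment stops once a chosen stopping rule (for example, a posterior-variance threshold or a fixed item count) is met.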
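For the Rasch-versus-GRM comparison, the key modelling difference is that the graded response model adds a discrimination parameter and polytomous category thresholds. A minimal sketch of GRM category probabilities, with illustrative parameter values (again, not CLEFT-Q calibrations):

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Graded response model: probabilities of response categories 0..K-1.

    a: discrimination parameter; thresholds: increasing category
    boundaries b_1 < ... < b_{K-1}. Category probabilities are the
    differences of the cumulative boundary curves P(X >= k).
    """
    thresholds = np.asarray(thresholds, dtype=float)
    # Cumulative P(X >= k) for k = 1..K-1, padded with P(X >= 0) = 1 and 0
    star = 1.0 / (1.0 + np.exp(-a * (theta - thresholds)))
    cum = np.concatenate(([1.0], star, [0.0]))
    return cum[:-1] - cum[1:]
```

Setting `a = 1` for every item and collapsing to two categories recovers the Rasch special case, which is one way to see why the GRM can achieve a lower standard error (it uses more information per item) without necessarily shifting the point estimates.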