Can we predict discordant RECIST 1.1 evaluations in double read clinical trials?

Background: In lung cancer clinical trials with imaging, blinded independent central review (BICR) with double reads is recommended to reduce evaluation bias, and the Response Evaluation Criteria in Solid Tumors (RECIST) are still widely used. We retrospectively analyzed the inter-reader discrepancy rate over time, the risk factors for discrepancies related to baseline evaluations, and the potential of machine learning to predict inter-reader discrepancies.

Materials and methods: We retrospectively analyzed five BICR clinical trials of patients on immunotherapy or targeted therapy for lung cancer. Double reads of 1724 patients, involving 17 radiologists, were performed using RECIST 1.1. We evaluated the rate of discrepancies over time for four endpoints: progressive disease declared (PDD), date of progressive disease (DOPD), best overall response (BOR), and date of first response (DOFR). Risk factors associated with discrepancies were analyzed, and two predictive models were evaluated.
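To make the endpoint comparison concrete, below is a minimal sketch of how a per-endpoint inter-reader discrepancy rate could be computed from double-read data. The column names and toy records are hypothetical illustrations, not taken from the study.

```python
# Minimal sketch: per-endpoint inter-reader discrepancy rates from
# double-read data. Columns and records are hypothetical examples.
import pandas as pd

# One row per patient; each endpoint is recorded once per reader.
reads = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "bor_reader1": ["PR", "SD", "PD", "CR"],
    "bor_reader2": ["PR", "PD", "PD", "PR"],
    "dopd_reader1": ["2021-03-01", None, "2021-05-10", None],
    "dopd_reader2": ["2021-03-01", "2021-04-12", "2021-06-02", None],
})

def discrepancy_rate(df: pd.DataFrame, col1: str, col2: str) -> float:
    """Fraction of patients whose two readers disagree on an endpoint.

    Two missing values (e.g. neither reader declared progression)
    count as agreement, so they are excluded from the disagreements.
    """
    r1, r2 = df[col1], df[col2]
    disagree = (r1 != r2) & ~(r1.isna() & r2.isna())
    return disagree.mean()

for endpoint, (c1, c2) in {
    "BOR": ("bor_reader1", "bor_reader2"),
    "DOPD": ("dopd_reader1", "dopd_reader2"),
}.items():
    print(f"{endpoint}: {discrepancy_rate(reads, c1, c2):.1%}")
```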

Results: At the end of the trials, discrepancy rates did not differ between trials. On average, the discrepancy rates were 21.0%, 41.0%, 28.8%, and 48.8% for PDD, DOPD, BOR, and DOFR, respectively. Over time, the discrepancy rate was higher for DOFR than for DOPD, and rates increased as the trials progressed, even after accrual was completed. It was rare for readers to find no disease at all; in fewer than 7% of patients, at least one reader selected only non-measurable disease (non-target lesions, NTLs). Readers often selected some of their target lesions (TLs) and NTLs in different organs, in 36.0-57.9% and 60.5-73.5% of patients, respectively. Only rarely (4.0-8.1% of patients) did the two readers select all of their TLs in different locations. Significant risk factors differed depending on the endpoint and trial considered. Predictive performance was poor overall, but the positive predictive value was higher than 80%; the best classification was obtained for BOR.

Conclusion: Predicting discordance rates requires knowledge of patient accrual, patient survival, and the probability of discordance over time. In lung cancer trials, although risk factors for inter-reader discrepancies are known, they are only weakly significant, and the ability to predict discrepancies from baseline data is limited. To improve prediction accuracy, it would be necessary to enhance baseline-derived features or create new ones, taking into account other risk factors and investigating optimal reader pairings.
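As an illustration of the baseline-data prediction discussed above, the sketch below trains a classifier on baseline-derived features and reports its positive predictive value on held-out patients. The feature set, the model choice (a random forest), and the synthetic labels are all assumptions made for illustration; the abstract does not specify which two predictive models the authors compared.

```python
# Sketch: predicting inter-reader discordance from baseline features,
# scored by positive predictive value (precision). All features and
# labels below are synthetic assumptions, not the study's data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Hypothetical baseline-derived features per patient.
X = np.column_stack([
    rng.integers(1, 6, n),   # number of target lesions selected
    rng.normal(60, 25, n),   # baseline sum of longest diameters (mm)
    rng.integers(1, 4, n),   # number of organs with disease
])
# Synthetic label: 1 if the double read was discordant for BOR.
y = (rng.random(n) < 0.3).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)
ppv = precision_score(y_te, model.predict(X_te), zero_division=0)
print(f"PPV on held-out patients: {ppv:.1%}")
```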

Bibliographic Details
Main Authors: Hubert Beaumont, Antoine Iannessi
Format: Article
Language: English
Published: Frontiers Media S.A., 2023-10-01
Series: Frontiers in Oncology
ISSN: 2234-943X
Subjects: clinical trial; Interobserver variation; RECIST; computed tomography; lung cancer
Online Access:https://www.frontiersin.org/articles/10.3389/fonc.2023.1239570/full