Reliability of crowdsourced data and patient-reported outcome measures in cough-based COVID-19 screening

Bibliographic Details
Main Authors: Hao Xiong, Shlomo Berkovsky, Mohamed Ali Kâafar, Adam Jaffe, Enrico Coiera, Roneel V. Sharan
Format: Article
Language: English
Published: Nature Portfolio 2022-12-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-022-26492-5

Abstract

Mass community testing is a critical means of monitoring the spread of the COVID-19 pandemic. Polymerase chain reaction (PCR) testing is the gold standard for detecting severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, but the test is invasive, test centers may not be readily available, and laboratory results can take several days to arrive. Various machine-learning-based alternatives to PCR screening for SARS-CoV-2 have been proposed, including cough sound analysis. Cough classification models appear to be a robust means of predicting infective status, but collecting reliable PCR-confirmed data for their development is challenging, and recent work using unverified crowdsourced data is seen as a viable alternative. In this study, we report experiments that assess cough classification models trained (i) on data from PCR-confirmed COVID-19 subjects and (ii) on data from individuals self-reporting their infective status. We evaluate both sets of models on PCR-confirmed data. Models trained on PCR-confirmed data perform better than those trained on patient-reported data. Models using PCR-confirmed data also exploit more stable predictive features and converge faster. Crowdsourced cough data is less reliable than PCR-confirmed data for developing predictive models for COVID-19, and this raises concerns about the utility of patient-reported outcome data in developing other clinical predictive models when better gold-standard data are available.

Additional Details
ISSN: 2045-2322
Indexed in: Directory of Open Access Journals (DOAJ)

Author Affiliations
Hao Xiong: Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University
Shlomo Berkovsky: Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University
Mohamed Ali Kâafar: Department of Computing, Macquarie University
Adam Jaffe: School of Women’s and Children’s Health, Faculty of Medicine, University of New South Wales
Enrico Coiera: Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University
Roneel V. Sharan: Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University