Assessing the Quality of Mobile Health-Related Apps: Interrater Reliability Study of Two Guides

Bibliographic Details
Main Authors: Miró, Jordi; Llorens-Vernet, Pere
Format: Article
Language: English
Published: JMIR Publications 2021-04-01
Series: JMIR mHealth and uHealth
Online Access: https://mhealth.jmir.org/2021/4/e26471
_version_ 1818886253292027904
author Miró, Jordi
Llorens-Vernet, Pere
collection DOAJ
description Background: There is a huge number of health-related apps available, and the number is growing fast. However, many of them have been developed without any kind of quality control. Several guides have been developed to support the development of high-quality apps and to enable existing apps to be assessed.
Objective: The main aim of this study was to examine the interrater reliability of a new guide, the Mobile App Development and Assessment Guide (MAG), and to compare it with one of the most used guides in the field, the Mobile App Rating Scale (MARS). We also examined whether the interrater reliability of the measures is consistent across multiple types of apps and stakeholders.
Methods: To study the interrater reliability of the MAG and MARS, we evaluated the 4 most downloaded health apps for chronic health conditions in the medical category for iOS and Android devices (ie, App Store and Google Play). A group of 8 reviewers independently evaluated the quality of the apps using the MAG and MARS. The reviewers were representative of the individuals who would be most knowledgeable about and interested in the use and development of health-related apps, and included different types of stakeholders: clinical researchers, engineers, health care professionals, and end users as potential patients. We calculated the Krippendorff alpha for every category in the 2 guides, for each type of reviewer and every app, separately and combined.
Results: Only a few categories of the MAG and MARS demonstrated high interrater reliability. Although the MAG was found to be superior, there was considerable variation in the scores between the different types of reviewers. The categories with the highest interrater reliability in the MAG were “Security” (α=0.78) and “Privacy” (α=0.73). In addition, 2 other categories, “Usability” and “Safety,” were very close to compliance (health care professionals: α=0.62 and 0.61, respectively). The total interrater reliability of the MAG (ie, for all categories) was 0.45, whereas the total interrater reliability of the MARS was 0.29.
Conclusions: This study shows that some categories of the MAG have significant interrater reliability. Importantly, the data show that the MAG scores are better than those provided by the MARS, which is the most commonly used guide in the area. However, there is great variability in the responses, which seems to be associated with subjective interpretation by the reviewers.
first_indexed 2024-12-19T16:18:24Z
format Article
id doaj.art-33dad88a60c24b47b847672d2940a577
institution Directory Open Access Journal
issn 2291-5222
language English
last_indexed 2024-12-19T16:18:24Z
publishDate 2021-04-01
publisher JMIR Publications
record_format Article
series JMIR mHealth and uHealth
spelling doaj.art-33dad88a60c24b47b847672d2940a577 | 2022-12-21T20:14:34Z | eng | JMIR Publications | JMIR mHealth and uHealth | 2291-5222 | 2021-04-01 | 9 | 4 | e26471 | 10.2196/26471 | Assessing the Quality of Mobile Health-Related Apps: Interrater Reliability Study of Two Guides | Miró, Jordi | Llorens-Vernet, Pere | https://mhealth.jmir.org/2021/4/e26471
title Assessing the Quality of Mobile Health-Related Apps: Interrater Reliability Study of Two Guides
url https://mhealth.jmir.org/2021/4/e26471