Generalizability and usefulness of artificial intelligence for skin cancer diagnostics: An algorithm validation study

Bibliographic Details
Main Authors: Niels K. Ternov, Anders N. Christensen, Peter J. T. Kampen, Gustav Als, Tine Vestergaard, Lars Konge, Martin Tolsgaard, Lisbet R. Hölmich, Pascale Guitera, Annette H. Chakera, Morten R. Hannemose
Format: Article
Language: English
Published: Wiley 2022-12-01
Series: JEADV Clinical Practice
Online Access: https://doi.org/10.1002/jvc2.59
ISSN: 2768-6566
Citation: JEADV Clinical Practice, 2022, Volume 1, Issue 4, pp. 344–354
Collection: Directory of Open Access Journals (DOAJ)
Keywords: artificial intelligence; melanoma; skin cancer; skin cancer prevention and early detection

Affiliations
Niels K. Ternov, Lisbet R. Hölmich, Annette H. Chakera: Department of Plastic Surgery, Herlev and Gentofte University Hospital, Copenhagen, Denmark
Anders N. Christensen, Peter J. T. Kampen, Gustav Als, Morten R. Hannemose: Department of Applied Mathematics and Computer Science, Technical University of Denmark, Lyngby, Denmark
Tine Vestergaard: Department of Dermatology and Allergy Center, Odense University Hospital, Odense, Denmark
Lars Konge, Martin Tolsgaard: Copenhagen Academy for Medical Education and Simulation, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark
Pascale Guitera: Melanoma Institute Australia, Sydney; The University of Sydney, Sydney, New South Wales, Australia

Abstract
Background: Artificial intelligence can be trained to outperform dermatologists in image‐based skin cancer diagnostics. However, the networks' sensitivity to biases and overfitting may hamper their clinical applicability.
Objectives: The aim of this study was to explain the potential consequences of implementing convolutional neural networks for stand‐alone melanoma diagnostics and skin lesion triage.
Methods: In this algorithm validation study on retrospective data, we reproduced and evaluated the performance of state‐of‐the‐art artificial intelligence (convolutional neural networks) for skin cancer diagnostics. The networks were trained on 25,331 annotated dermoscopic skin lesion images from an open‐source data set (ISIC‐2019) and tested on a novel data set (AISC‐2021) consisting of 26,591 annotated dermoscopic skin lesion images. We tested the trained algorithms' ability to generalize to new data and their diagnostic performance in two simulations (melanoma diagnostics and skin lesion triage).
Results: The trained algorithms were significantly less accurate on images of nevi, melanomas and actinic keratoses from the AISC‐2021 data set than on images from the ISIC‐2019 data set (p < 0.003). Almost one‐third (31.1%) of the melanomas were misclassified during the melanoma diagnostics simulation, irrespective of their Breslow thickness. Furthermore, the algorithms marked 92.7% of the lesions ‘suspicious’ during the triage simulation, which yielded a triage sensitivity of 99.7% and a specificity of 8.2%.
Conclusions: Although state‐of‐the‐art artificial intelligence outperforms dermatologists on image‐based skin lesion classification within an artificial setting, additional data and technological advances are needed before clinical implementation.
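The triage figures in the abstract (sensitivity, specificity and the share of lesions marked ‘suspicious’) follow from standard confusion-matrix definitions. The Python sketch below shows those definitions under stated assumptions. The counts are hypothetical and are not the study's data. The `Confusion` class and the `expected_flagged` helper are illustrative names and are not part of the authors' code. If the melanoma diagnostics simulation is treated as a binary melanoma/non-melanoma decision (also an assumption), the 31.1% of misclassified melanomas corresponds to the miss rate, that is, 1 minus sensitivity.

```python
"""Minimal sketch of confusion-matrix triage metrics.

Assumption: all counts are hypothetical and chosen only to illustrate the
formulas. They are not taken from the AISC-2021 or ISIC-2019 data.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # malignant lesions marked 'suspicious'
    fn: int  # malignant lesions not marked (missed)
    tn: int  # benign lesions not marked
    fp: int  # benign lesions marked 'suspicious'

    @property
    def sensitivity(self) -> float:
        # Share of malignant lesions that the triage marks as suspicious.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of benign lesions that the triage leaves unmarked.
        return self.tn / (self.tn + self.fp)

    @property
    def miss_rate(self) -> float:
        # Share of malignant lesions that are missed (1 - sensitivity).
        return self.fn / (self.tp + self.fn)

    @property
    def flagged_fraction(self) -> float:
        # Share of all lesions marked 'suspicious', malignant or benign.
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.fp) / total


def expected_flagged(sens: float, spec: float, prevalence: float) -> float:
    """Expected share of lesions marked suspicious at a given malignancy prevalence."""
    return sens * prevalence + (1.0 - spec) * (1.0 - prevalence)


if __name__ == "__main__":
    # Hypothetical triage set: 100 malignant and 900 benign lesions.
    c = Confusion(tp=99, fn=1, tn=72, fp=828)
    print(f"sensitivity={c.sensitivity:.3f}")  # sensitivity=0.990
    print(f"specificity={c.specificity:.3f}")  # specificity=0.080
    print(f"flagged={c.flagged_fraction:.3f}")  # flagged=0.927
```

The `expected_flagged` formula shows why a specificity near 8% is a practical problem. When specificity is that low, almost every benign lesion is also marked. The share of lesions referred onward therefore stays close to 1 regardless of how sensitive the triage is, and this is consistent with the high share of ‘suspicious’ lesions reported in the abstract.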