Effects of Label Noise on Deep Learning-Based Skin Cancer Classification
Recent studies have shown that deep learning is capable of classifying dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification utilize non-biopsy-verified training images. This imperfect ground truth introduces a systematic error, but the effects...
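Main Authors: | Achim Hekler, Jakob N. Kather, Eva Krieghoff-Henning, Jochen S. Utikal, Friedegund Meier, Frank F. Gellrich, Julius Upmeier zu Belzen, Lars French, Justin G. Schlager, Kamran Ghoreschi, Tabea Wilhelm, Heinz Kutzner, Carola Berking, Markus V. Heppt, Sebastian Haferkamp, Wiebke Sondermann, Dirk Schadendorf, Bastian Schilling, Benjamin Izar, Roman Maron, Max Schmitt, Stefan Fröhling, Daniel B. Lipka, Titus J. Brinker |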
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2020-05-01 |
Series: | Frontiers in Medicine |
Subjects: | dermatology; artificial intelligence; label noise; skin cancer; melanoma; nevi |
Online Access: | https://www.frontiersin.org/article/10.3389/fmed.2020.00177/full |
_version_ | 1818850781401448448 |
---|---|
author | Achim Hekler; Jakob N. Kather; Eva Krieghoff-Henning; Jochen S. Utikal; Friedegund Meier; Frank F. Gellrich; Julius Upmeier zu Belzen; Lars French; Justin G. Schlager; Kamran Ghoreschi; Tabea Wilhelm; Heinz Kutzner; Carola Berking; Markus V. Heppt; Sebastian Haferkamp; Wiebke Sondermann; Dirk Schadendorf; Bastian Schilling; Benjamin Izar; Roman Maron; Max Schmitt; Stefan Fröhling; Daniel B. Lipka; Titus J. Brinker |
author_sort | Achim Hekler |
collection | DOAJ |
description | Recent studies have shown that deep learning is capable of classifying dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification utilize non-biopsy-verified training images. This imperfect ground truth introduces a systematic error, but the effects on classifier performance are currently unknown. Here, we systematically examine the effects of label noise by training and evaluating convolutional neural networks (CNNs) with 804 images of melanoma and nevi labeled either by dermatologists or by biopsy. The CNNs are evaluated on a test set of 384 images by means of 4-fold cross-validation, comparing the outputs with either the corresponding dermatological or the biopsy-verified diagnosis. With identical ground truths for training and test labels, the CNNs achieve accuracies of 75.03% (95% CI: 74.39–75.66%) for dermatological and 73.80% (95% CI: 73.10–74.51%) for biopsy-verified labels. However, if the CNN is trained and tested with different ground truths, accuracy drops significantly to 64.53% (95% CI: 63.12–65.94%, p < 0.01) on a non-biopsy-verified and to 64.24% (95% CI: 62.66–65.83%, p < 0.01) on a biopsy-verified test set. In conclusion, deep learning methods for skin cancer classification are highly sensitive to label noise, and future work should use biopsy-verified training images to mitigate this problem. |
first_indexed | 2024-12-19T06:54:35Z |
format | Article |
id | doaj.art-7c8a3d919b4d4e7da94887fa5381fe8d |
institution | Directory of Open Access Journals |
issn | 2296-858X |
language | English |
last_indexed | 2024-12-19T06:54:35Z |
publishDate | 2020-05-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Medicine |
spelling | doaj.art-7c8a3d919b4d4e7da94887fa5381fe8d; 2022-12-21T20:31:34Z; eng; Frontiers Media S.A.; Frontiers in Medicine; 2296-858X; 2020-05-01; 7; 10.3389/fmed.2020.00177; 536659; Effects of Label Noise on Deep Learning-Based Skin Cancer Classification; Achim Hekler (National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany); Jakob N. Kather (National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany / Department of Medicine III, RWTH University Hospital Aachen, Aachen, Germany); Eva Krieghoff-Henning (National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany); Jochen S. Utikal (Department of Dermatology, Heidelberg University, Mannheim, Germany / Skin Cancer Unit, German Cancer Research Center, Heidelberg, Germany); Friedegund Meier (Skin Cancer Center at the University Cancer Centre and National Center for Tumor Diseases Dresden, Dresden, Germany / Department of Dermatology, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany); Frank F. Gellrich (Skin Cancer Center at the University Cancer Centre and National Center for Tumor Diseases Dresden, Dresden, Germany / Department of Dermatology, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany); Julius Upmeier zu Belzen (Berlin Institute of Health (BIH), Charité, Berlin, Germany); Lars French (Department of Dermatology and Allergology, Ludwig Maximilian University of Munich, Munich, Germany); Justin G. Schlager (Department of Dermatology and Allergology, Ludwig Maximilian University of Munich, Munich, Germany); Kamran Ghoreschi (Department of Dermatology, Venereology and Allergology, Charité–Universitätsmedizin Berlin, Berlin, Germany); Tabea Wilhelm (Department of Dermatology, Venereology and Allergology, Charité–Universitätsmedizin Berlin, Berlin, Germany); Heinz Kutzner (Dermatopathology Laboratory, Friedrichshafen, Germany); Carola Berking (Department of Dermatology, University Hospital Erlangen, Erlangen, Germany); Markus V. Heppt (Department of Dermatology, University Hospital Erlangen, Erlangen, Germany); Sebastian Haferkamp (Department of Dermatology, University Hospital Regensburg, Regensburg, Germany); Wiebke Sondermann (Department of Dermatology, University Hospital Essen, Essen, Germany); Dirk Schadendorf (Department of Dermatology, University Hospital Essen, Essen, Germany); Bastian Schilling (Department of Dermatology, University Hospital Würzburg, Würzburg, Germany); Benjamin Izar (Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, United States); Roman Maron (National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany); Max Schmitt (National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany); Stefan Fröhling (National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany / Translational Cancer Epigenomics, Division of Translational Medical Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany); Daniel B. Lipka (National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany / Translational Cancer Epigenomics, Division of Translational Medical Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany / Faculty of Medicine, Medical Center, Otto-von-Guericke-University, Magdeburg, Germany); Titus J. Brinker (National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany); https://www.frontiersin.org/article/10.3389/fmed.2020.00177/full; dermatology; artificial intelligence; label noise; skin cancer; melanoma; nevi |
title | Effects of Label Noise on Deep Learning-Based Skin Cancer Classification |
topic | dermatology; artificial intelligence; label noise; skin cancer; melanoma; nevi |
url | https://www.frontiersin.org/article/10.3389/fmed.2020.00177/full |