Deep learning-based research on the influence of training data size for breast cancer pathology detection


Bibliographic Details
Main Authors: Chongyang Cui, Shangchun Fan, Han Lei, Xiaolei Qu, Dezhi Zheng
Format: Article
Language: English
Published: Wiley 2019-12-01
Series: The Journal of Engineering
Online Access: https://digital-library.theiet.org/content/journals/10.1049/joe.2018.9093
Description
Summary: In the pathological diagnosis of breast cancer, problems include a shortage of pathologists, difficulty in labeling samples, and the heavy workload of manual diagnosis. Deep learning-based computer-assisted pathology analysis systems have therefore been developed to diagnose breast cancer and have achieved impressive results. However, large training sets are difficult to obtain because pathological images are scarce and labeling is costly, so the size of the training set should be planned before building a computer-assisted breast cancer pathology analysis system. Here, the authors present a study to determine the optimal training set size needed to achieve high classification accuracy when developing such a system. They trained two kinds of CNNs on six different training set sizes and tested the resulting systems on a total of 10,000 images. All images were acquired from the Camelyon17 challenge. The authors propose a scheme for determining the training set size and the model size when developing computer-assisted breast cancer pathology analysis systems, which can readily be applied to develop systems for other pathological images.
ISSN: 2051-3305
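
Illustrative sketch (not from the article): the experimental design described in the summary, training at several training set sizes and scoring each model on one fixed test set, might be outlined as below. Everything here is an assumption made for illustration: the data are synthetic rather than Camelyon17 patches, a nearest-centroid classifier stands in for the CNNs, and the six sizes and all function names are hypothetical. Only the count of six sizes and the 10,000 test samples echo the summary.

```python
import numpy as np


def make_data(n, rng):
    """Synthetic two-class data standing in for image patches (hypothetical)."""
    labels = rng.integers(0, 2, size=n)
    # Class 1 is shifted by 1.0 in each of 8 features; noise has unit variance.
    features = rng.normal(size=(n, 8)) + labels[:, None] * 1.0
    return features, labels


def fit_centroids(x, y):
    """Nearest-centroid classifier: a lightweight stand-in for a CNN."""
    return np.stack([x[y == c].mean(axis=0) for c in (0, 1)])


def accuracy(centroids, x, y):
    """Fraction of samples whose nearest centroid matches the true label."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return float((dists.argmin(axis=1) == y).mean())


def learning_curve(train_sizes, n_test=10_000, seed=0):
    """Train on nested subsets of one pool and score on one fixed test set."""
    rng = np.random.default_rng(seed)
    x_pool, y_pool = make_data(max(train_sizes), rng)
    x_test, y_test = make_data(n_test, rng)
    results = {}
    for size in train_sizes:
        # Each smaller training set is a prefix of the larger ones, so the
        # only thing that changes between runs is the amount of data.
        centroids = fit_centroids(x_pool[:size], y_pool[:size])
        results[size] = accuracy(centroids, x_test, y_test)
    return results


if __name__ == "__main__":
    sizes = [50, 100, 500, 1000, 5000, 10000]  # six hypothetical sizes
    for size, acc in learning_curve(sizes).items():
        print(f"{size:>6} training samples -> test accuracy {acc:.3f}")
```

Holding the test set fixed across all runs keeps the accuracies comparable, so the point where the curve flattens suggests a training set size beyond which extra labeling effort yields little gain.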