Summary: | Early detection and diagnosis of breast cancer are crucial to improving patient survival rates. Pathologists and radiologists therefore need a computer-aided diagnosis system that supports their clinical diagnoses effectively and efficiently. However, most breast cancer recognition models suffer from the sample scarcity problem, which causes serious overfitting and lowers recognition performance. To alleviate this problem, a simple, effective model called “refinement, correlation, adaptive” (RCA) is proposed for breast cancer recognition from the perspective of fine-grained feature selection. First, an innovative multiview efficient range-based gene selection algorithm performs the first-layer feature “refinement,” which helps suppress noisy information in the original feature space. Then, the second-layer cross-modal “correlation” mining extracts more discriminative but low-dimensional information from the heterogeneous features. This reduces the feature dimensionality to a value that fits the sample size well and alleviates overfitting. Finally, the last-layer decision-tree-guided “adaptive” feature selection is performed using the gradient boosting decision tree algorithm. The RCA model was validated on two well-known datasets. The experimental results demonstrate that the proposed RCA model addresses the sample scarcity problem well. It outperforms state-of-the-art baselines, especially in terms of accuracy and the area of the Kiviat diagram; the largest improvements in these two metrics are 2.39% and 1121, respectively. Moreover, an online diagnosis system based on the RCA model is proposed. It provides rapid and effective breast cancer recognition, which should make clinical diagnoses more convenient and narrow the gap between theoretical research and practical application.
|
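
The first-layer “refinement” idea, scoring features by how well their per-class value ranges separate, can be illustrated with a minimal sketch. This is not the multiview algorithm proposed in the paper: it is a simplified, single-view stand-in that assumes each class's range is its mean plus or minus a scaled standard deviation and ranks features by how little those ranges overlap. The function names (`effective_range`, `overlap_scores`, `select_top_k`), the scaling factor `gamma`, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev


def effective_range(values, prior, gamma=1.732):
    """Return (low, high) = mean -/+ (1 - prior) * gamma * std for one class.

    `gamma` is an assumed scaling factor, not a value taken from the paper.
    """
    mu = mean(values)
    half_width = (1.0 - prior) * gamma * pstdev(values)
    return mu - half_width, mu + half_width


def overlap_scores(X, y):
    """Score each feature in [0, 1]; higher means the class ranges overlap less."""
    classes = sorted(set(y))
    n_samples = len(y)
    scores = []
    for j in range(len(X[0])):
        # One (low, high) range per class for feature j.
        ranges = []
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            ranges.append(effective_range(vals, len(vals) / n_samples))
        # Total pairwise overlap length between class ranges.
        overlap = sum(
            max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
            for a, b in combinations(ranges, 2)
        )
        span = max(r[1] for r in ranges) - min(r[0] for r in ranges)
        # A zero span means the feature cannot separate the classes at all.
        scores.append(0.0 if span == 0 else max(0.0, 1.0 - overlap / span))
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = overlap_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: feature 0 separates the two classes, feature 1 does not.
X_toy = [[0.0, 5.0], [1.0, 6.0], [10.0, 5.0], [11.0, 6.0]]
y_toy = [0, 0, 1, 1]
print(select_top_k(X_toy, y_toy, 1))
```

In a pipeline shaped like the one described above, the retained columns would then feed the correlation-mining layer, and a gradient boosting model's feature importances would drive the final selection; those later layers are not sketched here.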